Tom showed me a link to The J1 Forth CPU, a very small processor which is coded in Verilog (only 200 lines!) and can run very fast on existing FPGA boards.
It is quite an intriguing design.
Forth is an intriguing if somewhat archaic programming language. In the bygone ages of my youth, I experimented a bit with FigForth on my Atari 400, which I found to be a pretty interesting and fun programming environment, but I found most Forth code to be rather inscrutable and hard to understand, at least, it was hard if you let more than a few days pass between when you wrote it and when you read it. Nevertheless, I don’t particularly fear Forth, and have used another somewhat similar language, namely Postscript, off and on over the years.
Forth is a stack-oriented language. In languages such as C, we are used to having a call-stack which contains not just return addresses, but also local variables. Forth is a different in that local variables aren’t typically stored on the stack. Instead, Forth splits things that C would place on a single stack into two separate stacks: the data stack (often just called “the stack”) and the return stack (or “rstack”). Forth “words” (procedures) generally specify actions that manipulate one or both stacks. For instance, the “plus” word, might take the two values at the top of the data stack and replace them with their sum. The “exit” or “;” word pulls a return address off the rstack, and transfers it to the IP (instruction pointer, basically the PC).
The J1 architecture is reasonably novel in that is optimized for the execution of Forth. It is a 16 bit machine, which has an internal state consisting of a 33 (not sure why not 32, but that’s what the docs say) deep data stack, a 32 deep return stack, and a 13 bit program counter. No condition flags, modes, or registers. It has 8K words (16 bit words) of RAM, and an additional 8K words for memory mapped I/O. Instructions fall into one of 5 classes:
- If the MSB of the instruction is a one, the remaining 15 bits specify a literal value which is pushed onto the data stack.< ./li>
- If the upper three bits are all zero, then the remaining 13 bits specify a new value for the PC (an unconditional jump).
- If the upper three bits are “001”, then the machine pops the data stack, and if the value is zero, the remaining 13 bits specify the new PC (conditional jump).
- If the upper three bits are “010”, then the remaining 13 bits specify a call address. The current PC is pushed to the return stack, and the remaining 13 bits specify the new PC.
- The remaining case (upper three bits “011”) specify an ALU instruction. This single instruction can implement many Forth words directly. The remaining 13 bits are split into 7 fields which together specify a complex action. Four bits pick one of 16 operations that are performed on the top value (and for binary operations, the next value) of the stack. A single bits says whether the top value should be copied to the next value on the data stack. Another bit specifies whether the return address should be copied to the PC. Another bit specifies whether the top of the data stack should get copied to the rstack. Two two-bit fields give a set of signed increments/decrements for the data and rstack. And a last bit specifies if a RAM store operation should happen (the next value gets stored in the address pointed at by the top value).
And that’s pretty much it! Many basewords are very, very simple and can be implemented in just one or two op codes. The resulting machine runs very fast. It’s really quite clever: all the more so because it’s actually possible to understand how it works.
It actually is fairly close to the Hack Platform defined in Noam Nisan and Shimon Schocken’s book “The Elements of Computing Systems”. The “Hack” machine has only two kinds of instructions: an “address” instruction (similar to the literal instruction above) and the “compute” instruction, similar to the last: it specifies what function the ALU computes, the destination for the ALU, and a jump condition. They use this chip to implement a more conventional stack-based virtual machine. The two machines have similar specifications and power.
I found them both to be interesting. I’ll be reading the J1 Verilog code closely.