The J1 Forth CPU, running on an FPGA

December 2, 2010 | FPGA, Hardware, My Projects | By: Mark VandeWettering

Tom showed me a link to The J1 Forth CPU, a very small processor which is coded in Verilog (only 200 lines!) and can run very fast on existing FPGA boards.

It is quite an intriguing design.

Forth is an intriguing if somewhat archaic programming language. In the bygone ages of my youth, I experimented a bit with FigForth on my Atari 400, which I found to be a pretty interesting and fun programming environment. But I found most Forth code rather inscrutable, at least if you let more than a few days pass between when you wrote it and when you read it. Nevertheless, I don’t particularly fear Forth, and have used another somewhat similar language, namely PostScript, off and on over the years.

Forth is a stack-oriented language. In languages such as C, we are used to having a call stack which contains not just return addresses, but also local variables. Forth is different in that local variables aren’t typically stored on the stack. Instead, Forth splits things that C would place on a single stack into two separate stacks: the data stack (often just called “the stack”) and the return stack (or “rstack”). Forth “words” (procedures) generally specify actions that manipulate one or both stacks. For instance, the “plus” word might take the two values at the top of the data stack and replace them with their sum. The “exit” or “;” word pulls a return address off the rstack and transfers it to the IP (instruction pointer, basically the PC).
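To make the two-stack idea concrete, here is a minimal sketch in Python. The class and method names are made up for illustration, and the choice to push `ip + 1` as the return address on a call is an assumption, not something taken from any particular Forth.

```python
class TwoStackMachine:
    """Toy model of Forth's split between a data stack and a return stack."""

    def __init__(self):
        self.data = []    # data stack ("the stack"): operands and results
        self.rstack = []  # return stack: return addresses only
        self.ip = 0       # instruction pointer

    def push(self, value):
        # Put a value on top of the data stack.
        self.data.append(value)

    def plus(self):
        # "plus": replace the top two data stack values with their sum.
        b = self.data.pop()
        a = self.data.pop()
        self.data.append(a + b)

    def call(self, target):
        # Assumption: the return address is the instruction after the call.
        self.rstack.append(self.ip + 1)
        self.ip = target

    def exit(self):
        # "exit" / ";": pull a return address off the rstack into the IP.
        self.ip = self.rstack.pop()


m = TwoStackMachine()
m.push(2)
m.push(3)
m.plus()
print(m.data)  # [5]
```

Note that calling and returning never touch the data stack, which is what lets a word leave its results there for the caller.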

The J1 architecture is reasonably novel in that it is optimized for the execution of Forth. It is a 16-bit machine, which has an internal state consisting of a 33 (not sure why not 32, but that’s what the docs say) deep data stack, a 32 deep return stack, and a 13-bit program counter. No condition flags, modes, or registers. It has 8K words (16-bit words) of RAM, and an additional 8K words for memory-mapped I/O. Instructions fall into one of 5 classes:

  • If the MSB of the instruction is a one, the remaining 15 bits specify a literal value which is pushed onto the data stack.
  • If the upper three bits are all zero, then the remaining 13 bits specify a new value for the PC (an unconditional jump).
  • If the upper three bits are “001”, then the machine pops the data stack, and if the value is zero, the remaining 13 bits specify the new PC (conditional jump).
  • If the upper three bits are “010”, then the remaining 13 bits specify a call address. The current PC is pushed to the return stack, and the remaining 13 bits specify the new PC.
  • The remaining case (upper three bits “011”) specifies an ALU instruction. This single instruction can implement many Forth words directly. The remaining 13 bits are split into 7 fields which together specify a complex action. Four bits pick one of 16 operations that are performed on the top value (and for binary operations, the next value) of the stack. A single bit says whether the top value should be copied to the next value on the data stack. Another bit specifies whether the return address should be copied to the PC. Another bit specifies whether the top of the data stack should get copied to the rstack. Two two-bit fields give a set of signed increments/decrements for the data stack and rstack. And a last bit specifies if a RAM store operation should happen (the next value gets stored in the address pointed at by the top value).

And that’s pretty much it! Many basewords are very, very simple and can be implemented in just one or two op codes. The resulting machine runs very fast. It’s really quite clever: all the more so because it’s actually possible to understand how it works.
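The five instruction classes described above can be sketched as a small decoder. This is a rough Python illustration based only on the description in this post, not on the J1 Verilog source. The function name and the class labels are made up, and the ALU payload is returned undecoded because the exact bit positions of its fields aren’t given here.

```python
def decode(insn):
    """Classify a 16-bit instruction word into one of the five classes.

    Returns a (kind, payload) tuple. The kind names are illustrative labels.
    """
    insn &= 0xFFFF  # keep only the low 16 bits

    # MSB set: the remaining 15 bits are a literal to push on the data stack.
    if insn & 0x8000:
        return ("literal", insn & 0x7FFF)

    # MSB clear: the upper three bits select the class, leaving 13 bits.
    top3 = insn >> 13
    payload = insn & 0x1FFF
    kinds = {
        0b000: "jump",          # unconditional jump, payload is the new PC
        0b001: "jump_if_zero",  # pop the data stack, jump if the value is zero
        0b010: "call",          # push the PC to the rstack, payload is the new PC
        0b011: "alu",           # payload holds the ALU fields (left undecoded)
    }
    return (kinds[top3], payload)


print(decode(0x8005))  # ('literal', 5)
print(decode(0x4123))  # ('call', 291)
```

Because the MSB is checked first, the remaining four classes only ever see top three bits in the range “000” to “011”, so the lookup table is complete.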

It actually is fairly close to the Hack Platform defined in Noam Nisan and Shimon Schocken’s book “The Elements of Computing Systems”. The “Hack” machine has only two kinds of instructions: an “address” instruction (similar to the literal instruction above) and the “compute” instruction, similar to the last: it specifies what function the ALU computes, the destination for the ALU, and a jump condition. They use this chip to implement a more conventional stack-based virtual machine. The two machines have similar specifications and power.

I found them both to be interesting. I’ll be reading the J1 Verilog code closely.


Comment from Jeremy
Time 12/2/2010 at 1:36 pm

Presumably that should be “If the MSB of the instruction is a *one*, the remaining 15 bits specify a literal value which is pushed onto the data stack” or the encoding makes no sense from your description.

Comment from Mark VandeWettering
Time 12/2/2010 at 2:00 pm

Ah, correct, fixed.

Comment from Tom
Time 12/2/2010 at 2:42 pm

The data stack has the top stack item in its own register for speed. That’s why the data stack is 33 deep (32 item stack + top register).

Comment from Diane VA3DB
Time 12/8/2010 at 8:08 pm

Oh gosh, this brings back memories. I did FIG-FORTH years ago on an AIM-65, typed it in myself, lots of editing and saving to cassette tape. Painful. I even interfaced it to an 8″ floppy disk drive controller I wire-wrapped up and had it saving FORTH pages. The good ol’ days.

Comment from Wim Valcke
Time 1/2/2011 at 1:02 pm

Hm, the firmware does not compile.

Has anyone tried this?

Running make j1.bin in the firmware directory gives an error:

version.fs:2: bad number
: builddate d# 1293998289. d# >>>+0100<<< ;

My Forth knowledge is nearly zero. Could someone help me out?

Comment from Munch
Time 3/25/2011 at 10:41 am

I don’t know the specific brand of Forth used to program the J1, but I would suspect that d# is a parsing word which is not capable of handling + as a sign character. Try removing the + character.

Comment from Munch
Time 3/25/2011 at 10:43 am

@Mark: I’m rather surprised that you’re unfamiliar with stack CPUs over history. The J1 is only the latest of a very large number of stack-architecture CPUs. On a more commercial front, you should check out the GreenArrays embedded controller chips too. Google search “GA144” will bring you to some information on that chip.

Comment from Mark VandeWettering
Time 3/25/2011 at 1:01 pm

I really haven’t been paying much attention to Forth since my early microcomputer days, as most of what made Forth attractive (compactness and self-hosting) has faded in importance for most of the applications I’m interested in. I was interested in the J1 processor mostly because it represents a simple architecture that I could experiment with in FPGA form, not for any particular love of the stack-based architecture or Forth.

That being said, I’m looking forward to having a Gameduino to play with, along with its embedded J1 processor.

Comment from Ken Boak
Time 4/26/2015 at 1:17 pm


Got my ZPUino running a very slow J1 emulator. See Blog