A few more notes on my implementation of Conway’s Life…

I received a couple of email and twitter queries about various aspects of the code, so I thought I would add a followup to yesterday’s post.

First of all, I received a query as to whether it was “open source”. Frankly, the code is so minimal and so obvious I didn’t even consider it worthy of protection from any license. Consider it in the public domain. I do appreciate it when people who use this code give a nod back to my website as the origin of the code, but it’s not mandatory.

Seriously. It’s not a big deal.

Next, realize that I wrote the code in just a few minutes (less than thirty). It’s written in just about the most obvious way I could conceive. I really was just looking at a way to test my understanding of the U8glib library, and this seemed like a good way to go. There are a number of ways the code could be improved.

First, this implementation doesn’t use the full resolution of the 128×64 display: it only uses 96×64. The reason is quite simple: the way the code is written, it needs to maintain 2 buffers (the “current” and “next” arrays) each of which hold 128×64 bits. If you multiply out 128x64x2 and divide by 8 to get the number of bits needed, you’ll come up with 2048: coincidently, exactly the same amount of RAM in an ATMega328. But you’ll need some extra memory to hold temporary variables, buffers used by the U8glib, the stack for subroutine calls, etc… Trimming the resolution back to 96×64 provides enough space to actually run. But we could actually use the full resolution by being slightly more clever: instead of having two full size arrays, we could do more of the updates “in place”, and only buffer a couple of extra lines to hold out temporary values before we overwrite them. I have the code in my head, but a few odd details about it aren’t worked out yet, I hope to have nearly as simple an implementation worked out today some time. Stay tuned.

Second, it’s slow. Doing 8 independent bit lookups in the array, the increments, and the compares? It’s a straightforward implementation, but not a fast one. In particular, the shift operations on every pixel lookup are probably really a bad idea on the Atmel AVR: these chips don’t include a barrel shifter, so to shift by 7 takes 7 cycles. We could probably factor out a lot of these calculations with some work. It’s possible to use techniques that can process all the bits inside a given word “in parallel” by implementing bitwise arithmetic, but since the AVR is only an 8 bit processor, the gains are less than they would be on a 32 bit processor, and handling all the edge cases is kind of a nuisance. I remember doing this back in my college days, but I’d have to work out all the details again. You can look at this article from StackOverflow to get some hints and pointers to other implementations. Some of the ideas are good, some not.

Third: the display is really, really tiny. It might be fun to use the builtin scaling of the U8G driver to magnify it by two. Then, of course, we’ll only need to do 64×32, which should run around 4x as fast, and should be easier to see. It will also use a lot less memory, which makes it more useful as a module in (say) another program like a DIY watch.

Lastly, I was told that initializing the random number generator in this way would work, but it seems to generate the same seed each time. I’ll have to figure out a better way to initialize the seed on RESET.

Stay tuned for an improved version.