A simple clock, running on the Arduino Uno…

In a previous post, I wrote some interrupt-driven code to refresh a multiplexed 8-digit, 7-segment display. Every 2 milliseconds, the Arduino triggers an interrupt, clocks out sixteen bits to a pair of chained 74HC595 shift registers, and returns. That displays a single digit. The next interrupt displays the next digit, and so on until all eight are displayed. Because of persistence of vision, you don’t perceive any flicker, and you get a nice display.

One thing bothered me about the previous code: on a standard Arduino, it used about 18% of all available CPU cycles to do this refresh. That’s a lot of time to dedicate to just the display. It’s not hard to see why: the code calls digitalWrite directly, and it also calls shiftOut, which itself uses digitalWrite.

The problem is that digitalWrite is slow, really slow.

The reason digitalWrite is slow is that it’s actually doing an awful lot. At the hardware level, all the digital pins of an Arduino (which are numbered 0…13) are individual bits in one of several eight-bit hardware ports. These ports are really just memory addresses. On the Uno’s AVR chip, each port has a data direction register (which tells you whether each bit in the port is an input or an output), an input register (which tells you whether the associated pin is currently high or low), and a data register, in which you set or clear individual bits to drive outputs high or low. On every call, digitalWrite has to translate the pin number into a port and a bit, check whether the pin is currently doing PWM, and only then change the bit.

On the Arduino Uno, pins 8-13 are attached to the lowest bits of PORTB. You can assign byte values to PORTB, and the associated outputs will get driven high or low, all in one instruction. You can even set multiple bits at the same time, which is impossible to do with digitalWrite.
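
To make the pin-to-bit mapping concrete, here is a small sketch that compiles on a desktop machine rather than on the Arduino. The variable `fake_portb` is only a stand-in for the real PORTB register, and the helper names (`uno_portb_mask`, `set_pins_high`, `set_pins_low`) are made up for illustration. The one thing taken from the hardware is that Uno pins 8-13 are bits 0-5 of PORTB.

```cpp
#include <assert.h>
#include <stdint.h>

// Stand-in for the real PORTB register so this compiles off-target.
// On an actual Uno you would read and write PORTB itself.
static uint8_t fake_portb = 0;

// Hypothetical helper: bit mask within PORTB for Uno digital pins 8..13.
// Pin 8 is bit 0, pin 9 is bit 1, and so on up to pin 13 on bit 5.
uint8_t uno_portb_mask(int pin)
{
    assert(pin >= 8 && pin <= 13);
    return static_cast<uint8_t>(1u << (pin - 8));
}

// Drive every pin in the mask high with a single update of the port.
void set_pins_high(uint8_t mask)
{
    fake_portb |= mask;
}

// Drive every pin in the mask low with a single update of the port.
void set_pins_low(uint8_t mask)
{
    fake_portb &= static_cast<uint8_t>(~mask);
}
```

For example, `set_pins_high(uno_portb_mask(10) | uno_portb_mask(11))` raises pins 10 and 11 together, where digitalWrite would need two separate calls.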

So, why use digitalWrite?

It’s more portable. All Arduinos have pins 0-13, but some map them to different underlying ports (Arduino Megas, for instance, have more I/O ports and a different mapping). By using digitalWrite, you don’t need to know which port corresponds to which pin; that’s all part of the library. The Teensy LC that I was testing with yesterday doesn’t even have the same underlying chip architecture, but the libraries that emulate the Arduino all behave perfectly, and I don’t have to change even a single line to get it to work with the Teensy. That’s pretty cool.

But it is pretty slow.

My testing yesterday showed that on the Arduino Uno, using digitalWrite meant that the critical part of the interrupt routine took 320 microseconds. The Arduino Uno runs at 16 MHz and can basically run one instruction per cycle, so it is taking about 5120 instructions (320 µs × 16 cycles per µs) just to clock out the sixteen bits into the shift registers. Each bit requires about three pin toggles (set the data line, then raise and lower the clock), so that’s 48 toggles, and each one is taking around 100 instructions.
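
The arithmetic behind those numbers can be checked at compile time. This is only a back-of-the-envelope sketch: the constant and function names are invented here, and the inputs are the figures quoted in this post (16 MHz, 320 µs, sixteen bits, three toggles per bit, one interrupt every 2 ms).

```cpp
#include <stdint.h>

// Figures quoted in the post; the names are hypothetical.
constexpr uint32_t kClockHz       = 16000000; // Uno clock, 16 MHz
constexpr uint32_t kIrqMicros     = 320;      // measured time in the interrupt routine
constexpr uint32_t kPeriodMicros  = 2000;     // one interrupt every 2 ms
constexpr uint32_t kBitsPerIrq    = 16;       // two chained 8-bit shift registers
constexpr uint32_t kTogglesPerBit = 3;        // set data, clock high, clock low

// Clock cycles spent in the measured part of one interrupt.
constexpr uint32_t cycles_per_irq()
{
    return (kClockHz / 1000000) * kIrqMicros;
}

// Average clock cycles per pin toggle (integer division, rounds down).
constexpr uint32_t cycles_per_toggle()
{
    return cycles_per_irq() / (kBitsPerIrq * kTogglesPerBit);
}

// Percentage of CPU time spent in the measured part of the interrupt.
constexpr uint32_t cpu_percent()
{
    return (100 * kIrqMicros) / kPeriodMicros;
}

static_assert(cycles_per_irq() == 5120, "16 cycles per us * 320 us");
static_assert(cycles_per_toggle() == 106, "5120 / 48, rounded down");
static_assert(cpu_percent() == 16, "320 us out of every 2000 us");
```

The 16% that falls out of this covers only the measured 320 µs, which is in the same ballpark as the roughly 18% quoted earlier for the whole refresh.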

I knew I could do better.

As I said, pins 8-13 are the low bits of PORTB. By directly accessing those bits, we can speed things up a lot.

Without further explanation, here’s the old, short, portable code that calls digitalWrite and shiftOut routines:

[sourcecode lang="cpp"]
void
LED_irq(void)
{
    digitalWrite(latchpin, LOW);

    // select the digit…
    shiftOut(datapin, clockpin, MSBFIRST, col[segcnt]) ;
    // and select the segments
    shiftOut(datapin, clockpin, MSBFIRST, segbuf[segcnt]) ;

    digitalWrite(latchpin, HIGH) ;
    segcnt ++ ;
    segcnt &= NDIGITS_MASK ;
}
[/sourcecode]

The faster version, which runs about 8x faster but is not portable, looks like this:

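[sourcecode lang="cpp"]
// Uno pins 8-13 are bits 0-5 of PORTB, so:
//   0b00000100 is pin 10 (data)
//   0b00001000 is pin 11 (clock)
//   0b00010000 is pin 12 (latch)
void
LED_irq(void)
{
    // latch low
    PORTB &= ~0b00010000 ;

    // select the digit, most significant bit first
    for (int i=0; i<8; i++) {
        if (col[segcnt] & (0x80 >> i))
            PORTB |= 0b00000100 ;
        else
            PORTB &= ~0b00000100 ;
        // pulse the clock
        PORTB |= 0b00001000 ;
        PORTB &= ~0b00001000 ;
    }

    // and select the segments
    for (int i=0; i<8; i++) {
        if (segbuf[segcnt] & (0x80 >> i))
            PORTB |= 0b00000100 ;
        else
            PORTB &= ~0b00000100 ;
        // pulse the clock
        PORTB |= 0b00001000 ;
        PORTB &= ~0b00001000 ;
    }

    // latch high
    PORTB |= 0b00010000 ;

    segcnt ++ ;
    segcnt &= NDIGITS_MASK ;
}
[/sourcecode]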
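
The two loops are identical apart from the byte they send, so they could be folded into one helper. The following is a minimal sketch of that idea, not the code from the repository. It compiles on a desktop machine: `sim_portb` is a stand-in for PORTB, `shifted_in` is a simulated shift register that samples the data line on each rising clock edge, and all of the function and constant names are made up for illustration. On real hardware you would replace `sim_portb` with PORTB and drop the simulation lines.

```cpp
#include <assert.h>
#include <stdint.h>

// Stand-in for PORTB so the sketch can be compiled and checked on a PC.
static uint8_t sim_portb = 0;

// Same bit positions as the code above (Uno pins 10, 11 and 12).
const uint8_t DATA_BIT  = 0b00000100;
const uint8_t CLOCK_BIT = 0b00001000;
const uint8_t LATCH_BIT = 0b00010000;

// Simulation only: the sixteen bits the chained registers have seen.
static uint16_t shifted_in = 0;

// Pulse the clock once. The simulated register samples the data line
// on the rising edge, as a 74HC595 does.
static void clock_pulse()
{
    assert((sim_portb & CLOCK_BIT) == 0); // clock must start low
    sim_portb |= CLOCK_BIT;
    uint16_t bit = (sim_portb & DATA_BIT) ? 1 : 0;
    shifted_in = static_cast<uint16_t>((shifted_in << 1) | bit);
    sim_portb &= static_cast<uint8_t>(~CLOCK_BIT);
}

// Clock out one byte, most significant bit first. Walking the mask from
// 0x80 down to 0x01 gives the same order as testing (0x80 >> i).
void shift_out_msb_first(uint8_t value)
{
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        if (value & mask)
            sim_portb |= DATA_BIT;
        else
            sim_portb &= static_cast<uint8_t>(~DATA_BIT);
        clock_pulse();
    }
}

// Send the digit-select byte and then the segment byte, framed by the latch.
void send_digit(uint8_t column, uint8_t segments)
{
    sim_portb &= static_cast<uint8_t>(~LATCH_BIT);
    shift_out_msb_first(column);
    shift_out_msb_first(segments);
    sim_portb |= LATCH_BIT;
}
```

Whether the extra function call costs anything measurable on the Uno depends on what the compiler inlines, so it is worth timing before swapping it in.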


If you want the complete code, you can get it from my GitHub repository.