The other day, my friend Jeff sent me an email detailing the latest benchmarking of his new 8 core Intel box. He had written a program to compute return rates for all possible draws in video poker. A few years ago, this program had taken a month of computer time to complete, and he wondered how how quickly it would run on his new box. The answer: around 12 hours.
This got me thinking back to my experience using Simon Singh’s “Cipher Challenge” that I did back in 2000. I hadn’t tried to solve Stage 9, because at the time it seemed a daunting task, requiring computational resources far in excess of what I had at the time. But Jeff’s email got me thinking, so I decided to run an experiment: how long would it really take?
I didn’t have a fast, optimized DES implementation of my own, so I decided to dig around. It turns out that OpenSSL contains an implementation of DES, and a few minutes of writing some code verified that it could actually decode the challenge message. So, the question was: how long would it take to break this code?
Stage 8 had revealed 8 of the 56 key bits, leaving a keyspace of 2^48 entries left (about 281 trillion entries). So, I coded up a small test program that did a million DES_set_key calls, followed by a decryption of the message. I found that it took about 21 microseconds to do a test. On average, we’ll search half the keyspace (around 140 trillion keys) before we find the key, so if you multiply it out, you get…. 93.65 years. We have eight cores, which brings it down to about 11.7 years. Not so good…
But here’s the thing: we probably don’t have to really decode the entire message. If we believe that the output is ASCII, then each of the output bytes should have the top bit clear. Any block which doesn’t have this feature indicates our key is on the wrong track. The odds of getting all those bits clear by random chance is one in 256. If we try to only decode the first six blocks, then our chance that we got something wrong is about one in 2^48th, which is just about the size of the keyspace, so we’d probably wouldn’t see very many false positives. If we modify the rule to just check the first six blocks, then it only takes about 1.16 microseconds to do a key set and a decode. Now my single machine can get the run time down to 236 days. Still pretty darned slow actually. It’s clear that openssl’s DES implementation isn’t really up to the task. I suspect that there are software implementations which are several times faster than this. I’ll try to dig one up and see what could really be done.
Addendum: When I got home, I dug up a copy of Eric Young’s libdes library, and ran some basic speed tests using the “speed” program that’s part of the distributions. I wasn’t running it on the same machine, so the numbers aren’t directly comparable, but my home machine quite a bit slower. Nevertheless, I got some interesting numbers:
[chessboard] % ./speed Doing set_key for 10 seconds 29270676 set_key's in 10.00 seconds Doing des_encrypt's for 10 seconds 54502334 des_encrypt's in 9.93 second Doing des_cbc_encrypt on 1024 byte blocks for 10 seconds 402292 des_cbc_encrypt's of 1024 byte blocks in 10.00 second Doing des_ede_cbc_encrypt on 1024 byte blocks for 10 seconds 146870 des_ede_cbc_encrypt's of 1024 byte blocks in 10.01 second Doing crypt for 10 seconds 1731166 crypts in 9.92 second set_key per sec = 2927067.60 ( 0.342uS) DES raw ecb bytes per sec = 43909231.82 ( 0.182uS) DES cbc bytes per sec = 41194700.80 ( 0.194uS) DES ede cbc bytes per sec = 15024463.54 ( 0.532uS) crypt per sec = 174512.70 ( 5.730uS)
If you add the set_key time to 48 times the raw ecb encode, you get about 9 microseconds. Using both of its two cores, that works out to about 20 years. That’s about 31 times slower, or nearly 8x slower per core.