Elementary Crypto Lesson

I’ve been interested in codes and cryptography for quite some time. I find them at the fascinating intersection of history, mathematics and computer science: all topics that I like to read about and experiment with. Let me give you a basic crypto lesson, with a moral at the end.

Let’s say that all your messages are coded just with capital letters, and you remove all spaces. This gives you an alphabet of (suprise!) 26 letters. Let’s say that you wish to encode a message. You think to yourself: golly, I know what I’ll do. I’ll convert each letter its corresponding number in the range of 1-26. That will make it confusing! So my message HELLO will translate into

8-5-12-12-15

But that doesn’t seem very hard. People could crack that pretty simply. What to do… what to do…

Well, I could scramble the letter order. Perhaps if A was represented by 13, and B by 8, and so on, they couldn’t figure it out. But if you do cryptograms in the newspaper, you know that even with a modest amount of code text, you can crack these things pretty easily using frequency analysis and the like.

Let’s go back to our simple code again. Imagine that we had a second text, the same length as the first that we could use as a key. To encode we add the two numbers together, and if the result is greater than 26, we subtract 26. That will certainly jumble up the frequencies, preventing some kinds of analysis, and since the key is long, techniques for polyalphabetic ciphers won’t really work either.

But there is a serious flaw. Imagine that you could guess a word in the cipher text. Perhaps if the message were addressed to me, it would contain BRAINWAGON or even worse VANDEWETTERING. You could try to subtract these words out of the cipher text, and if normal words popped out the other side, you would have a partial decrypt of both the message and the encoding stream. This may seem difficult if all you are used to is normal substitution ciphers, but in fact it is dead simple to break.

One way to avoid this problem is to use what is called a one time pad. If the encoding stream is truly random, then when you can’t recover the encoding stream (it is, after all, perfectly random). One time pads are perfectly secure, with the caveat that you can never, ever, ever reuse a one time pad. Why? Because then you could subtract the two messages, the one time pad data drops out, and you are left with the simple, easy to break case listed above.

Why the cryptography lesson? Because Bruce Schneier (author of the excellent book Applied Cryptography) points out that no less than Microsoft makes this exact error. When you save a Word document, it reencodes it with precisely the same stream, and therefore if you have access to multiple versions of the same document, you can recover the entire document with elementary cryptanalysis.

This is one of the reasons I like open source: you can audit software to find errors like this, and work to correct them quickly.

Anyway, if you like cryptography, privacy and information issues, subscribe to Bruce’s Cryptogram. It’s good stuff.