Category Archives: Cryptography

The Chaocipher revealed! from Cipher Mysteries

Stumbling through articles on Slashdot, I found a pretty nifty article on one of my favorite subjects: historical cryptography. The story goes that back in 1918, a cipher system/machine was invented by John F. Byrne. Rumor says that it was very strong, and yet could be implemented using a mechanism that would fit in a cigar box. The details of this invention were never publicly released. However, the widow of Byrne’s son, John Byrne Jr., recently decided to donate his notes to the National Cryptologic Museum, and the first publications are beginning to trickle out. Moshe Rubin has a new paper that describes the workings of the algorithm in sufficient detail that it should be possible to write an implementation in whatever language you like and experiment with it. It’s too late for me to start today, but expect a Python reference implementation in the next few days:

The Chaocipher revealed! | Cipher Mysteries.

A cursory glance over the implementation suggests that the key space is basically 26! * 26! or about:

162,644,002,617,632,464,507,038,883,409,628,607,021,056,000,000,000,000

By comparison, the German Army Enigma (three rotors) had a keyspace of only 562,064,881,159,999,426,560, and the Navy Enigma’s keyspace was only about 1,000 times larger. So if all things were equal, we might expect the Chaocipher to be a lot harder to crack. But all things are probably not equal. I’ll be pondering this over the next few days.
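
Just to sanity-check that number, the arithmetic fits in a couple of lines of Python (the Enigma figure is the one quoted above):

import math

# Two independently scrambled 26-letter alphabets give 26! * 26! starting states.
chaocipher_keys = math.factorial(26) ** 2
army_enigma_keys = 562_064_881_159_999_426_560   # the three-rotor figure quoted above

print("Chaocipher keyspace: {:,}".format(chaocipher_keys))
print("Ratio to the Army Enigma: {:.2e}".format(chaocipher_keys / army_enigma_keys))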

Schneier on Homomorphic Encryption Breakthrough

A couple of weeks ago during lunch, someone had mentioned that a breakthrough in the world of cryptography had occurred: that someone had succeeded in creating something called a “homomorphic encryption scheme”. The thing was, nobody at the table really understood what that was all about. I did a brief bit of reading on it, and once I got the basic idea, I realized that yes indeed, it was a pretty amazing result, and pretty surprising. But rather than try to describe it to you, I’ll toss you to this recent link on Bruce Schneier’s website for his explanation.

Schneier on Security: Homomorphic Encryption Breakthrough
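
If you want a concrete (if toy) feel for what “homomorphic” means before you click over: plain, unpadded RSA is already homomorphic with respect to multiplication, since (m1^e)(m2^e) = (m1*m2)^e mod n. That isn’t the breakthrough (the new result supports arbitrary computation on ciphertexts), but the little sketch below at least shows the flavor of operating on encrypted data without decrypting it.

# Toy illustration of a homomorphic property: unpadded ("textbook") RSA is
# multiplicatively homomorphic. This is NOT the new fully homomorphic scheme;
# it only shows the idea of computing on ciphertexts without decrypting them.
n, e, d = 3233, 17, 2753        # tiny textbook key: n = 61 * 53

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(11)
assert decrypt(c1 * c2 % n) == 7 * 11   # multiplying ciphertexts multiplies the plaintexts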

A Simple Python Program for Factoring…

Sometimes, your interests converge. Over at Programming Praxis, there was a coding challenge to implement Monte Carlo factorization. The last couple of days, I have been thinking a bit more about factorization, so it seemed like a good idea to give it a whack. I cribbed a version of the Miller-Rabin primality tester off the web, and then coded the rest up in Python.

#!/usr/bin/env python

# 
# a basic implementation of the Pollard rho factorization
# Written by Mark VandeWettering <markv@pixar.com>
#

import sys
import locale
import random

class FactorError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def miller_rabin_pass(a, n):
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in xrange(s-1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1

def isprime(n):
    for repeat in xrange(20):
        a = 0
        while a == 0:
            a = random.randrange(n)
        if not miller_rabin_pass(a, n):
            return False
    return True



def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def findfactor(n):
    # Pollard rho with Floyd cycle detection, trying up to 50 different
    # polynomials x*x + c before giving up.
    for c in range(1, 50):
        x = y = random.randint(1, n-1)
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        while True:
            t = gcd(n, abs(x-y))
            if t == 1:
                # no factor yet: advance x one step, y two steps
                x = (x * x + c) % n
                y = (y * y + c) % n
                y = (y * y + c) % n
            elif t == n:
                # the cycle closed without finding a factor; try another c
                break
            else:
                return t
    raise FactorError("couldn't find a factor.")

def factor(n):
    # repeatedly split off factors until what remains is (probably) prime
    r = []
    while True:
        if isprime(n):
            r.append(n)
            break
        try:
            f = findfactor(n)
            r.append(f)
            n = n / f
        except FactorError, msg:
            r.append(n)
            break
    r.sort()
    return r

def doit(n):
    flist = factor(n)
    print locale.format("%d", n, True), "="
    for f in flist:
        print "\t%s" % locale.format("%d", f, True)

locale.setlocale(locale.LC_ALL, "")

doit(2**98-1)

This is a fairly challenging factorization, and after a few seconds, it outputs:

316,912,650,057,057,350,374,175,801,343 =
	3
	43
	127
	4,363,953,127,297
	4,432,676,798,593

Taking my own code challenge…

Back in 2007, I made a blog post entitled “Code Challenge”, in which I put forth a cipher for readers to break.

Code Challenge | brainwagon.

Nobody took the challenge. But what was slightly odder was that I had no recollection of making this challenge, or even what type of cipher it was. The only clue that I had was that it was apparently in response to an episode of the increasingly mediocre television show Numb3rs.

So, I considered it a challenge sent from past Mark to future Mark. The first thing I had to do was try to identify the type of cipher. I had made the post in June of 2007, and I guessed that I would have been watching episodes from that season (season 3), though possibly in reruns. I relatively quickly located “The Janus List” and its “straddling checkerboard cipher” as the likely method. Indeed, as I read the description of the cipher (try this blog entry for details), it began to come back to me.

So, I tried to think of how to crack it. Here’s the insight: the straddling checkerboard cipher is essentially a simple substitution cipher, except that each letter is encoded as either one or two digits. What makes it easy is that eight letters are encoded as single digits and the remaining twenty as digit pairs, but every digit pair begins with one of the two digits that are never used on their own. So you can proceed by guessing which two digits are those prefixes and then scanning through the message. If you end up with more than 26 distinct symbols (or 28, if you include period and slash, as most straddling checkerboard implementations do), then you know you haven’t gotten the right answer. A quick look at the single-digit frequencies in my case indicated that the digits 6 and 8 were vastly more frequent than the others, so I guessed that those were the prefixes. When I busted the cipher up into symbols, I ended up with 23 distinct symbols, which seemed entirely appropriate. I could then assign a random letter to each of those 23 symbols, and I’d have a simple cryptogram.
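
Here’s a sketch of that splitting step in Python (not the code I actually used, and the letter assignment is arbitrary): guess the two prefix digits, carve the ciphertext into one- and two-digit symbols, and bail out if the guess produces too many distinct symbols.

import string

def split_symbols(digits, prefixes):
    """Carve a straddling-checkerboard ciphertext into one- and two-digit symbols."""
    symbols, i = [], 0
    while i < len(digits):
        if digits[i] in prefixes:            # a prefix digit starts a two-digit symbol
            symbols.append(digits[i:i+2])
            i += 2
        else:                                # any other digit stands alone
            symbols.append(digits[i])
            i += 1
    return symbols

def to_cryptogram(digits, prefixes):
    """Relabel the distinct symbols with letters, giving an ordinary cryptogram."""
    labels = string.ascii_uppercase + "./"   # at most 28 symbols, counting period and slash
    symbols = split_symbols(digits, prefixes)
    mapping = {}
    for s in symbols:
        if s not in mapping:
            if len(mapping) == len(labels):
                raise ValueError("too many distinct symbols; wrong prefix guess?")
            mapping[s] = labels[len(mapping)]
    return "".join(mapping[s] for s in symbols)

# e.g. to_cryptogram(ciphertext_digits, "68") for the guess that 6 and 8 are the prefixes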

Here’s the resulting cryptogram:

WRAFSIFFKVIQKFINCOMGWIDUFANVKFWKVVIOIDCRICEIOBHKODCAGRIOQRACRMHUGOHBKBT
MRKJIKTOIKDMLAVUOIDHUWALMHUKOIOIKDANVWRAFCHNVOKWUTKWIMHUOFITLHNBIANVIPC
IIDANVTMCTIJIOKNDWINKCAHUFADHUBWKNMHNIQATTKCWUKTTMQHOERKODINHUVRWHKCWUK
TTMDICHDIWRAFSIFFKVIFUCCIFFLUTTM

Mind you, I might have accidentally assigned a letter to itself here, so it’s not quite like a normal cryptogram. But it isn’t hard to make some guesses. ‘I’ is the most common letter, so it is probably E. The last three characters are TTM, which implies a doubled letter right at the end and suggests LLY. The doubled digraph FF occurs three times, which suggests that F might map to S. Putting these together makes the last word look like SuccESSluLLY (undeciphered letters in lowercase). Wow! U and C both look like they map to themselves, and L is F. The message opens with wraS, which suggests THIS: W becomes T, R becomes H, A becomes I. Now we have traction: K is A, S is M, V is G…
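
Mechanically, applying those guesses is a one-liner: substitute the letters you believe in, and print a dot for everything else. Something like this, with the guesses accumulated so far:

def partial_decode(cryptogram, guesses):
    """Apply a partial substitution key, showing unknown letters as dots."""
    return "".join(guesses.get(c, ".") for c in cryptogram)

guesses = {"I": "E", "F": "S", "T": "L", "M": "Y", "U": "U", "C": "C",
           "L": "F", "W": "T", "R": "H", "A": "I", "K": "A", "S": "M", "V": "G"}

Running that over the cryptogram gives: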

THISMESSAGE.ASE.C.Y.TE.USI.GASTAGGE.E.CHEC.E...A..CI.HE..HICHY.U....A.L
YHA.EAL.EA.YFIGU.E..UTIFY.UA.E.EA.I.GTHISC..G.ATULATEY.U.SELF...EI.GE.C
EE.I.GLYCLE.E.A..TE.ACI.USI..U.TA.Y..E.ILLACTUALLY....HA..E..UGHT.ACTUA
LLY.EC..ETHISMESSAGESUCCESSFULLY

It’s a short skip and a jump to the complete decode:

THISMESSAGEWASENCRYPTEDUSINGASTAGGEREDCHECKERBOARDCIPHERWHICHYOUPROBABL
YHAVEALREADYFIGUREDOUTIFYOUAREREADINGTHISCONGRATULATEYOURSELFONBEINGEXC
EEDINGLYCLEVERANDTENACIOUSIDOUBTANYONEWILLACTUALLYWORKHARDENOUGHTOACTUA
LLYDECODETHISMESSAGESUCCESSFULLY

Apparently, I was tenacious enough.

Revisiting Simon Singh’s Cipher Challenge…

The other day, my friend Jeff sent me an email detailing the latest benchmarking of his new 8-core Intel box. He had written a program to compute return rates for all possible draws in video poker. A few years ago, this program had taken a month of computer time to complete, and he wondered how quickly it would run on his new box. The answer: around 12 hours.

This got me thinking back to Simon Singh’s “Cipher Challenge”, which I worked on back in 2000. I hadn’t tried to solve Stage 9, because at the time it seemed a daunting task, requiring computational resources far in excess of what I had. But Jeff’s email got me thinking, so I decided to run an experiment: how long would it really take?

I didn’t have a fast, optimized DES implementation of my own, so I decided to dig around. It turns out that OpenSSL contains an implementation of DES, and a few minutes of writing some code verified that it could actually decode the challenge message. So, the question was: how long would it take to break this code?

Stage 8 had revealed 8 of the 56 key bits, leaving a keyspace of 2^48 keys (about 281 trillion). So I coded up a small test program that did a million iterations of a DES_set_key call followed by a decryption of the message, and found that each trial took about 21 microseconds. On average, we’ll search half the keyspace (around 140 trillion keys) before we find the key, so if you multiply it out, you get… 93.65 years. We have eight cores, which brings it down to about 11.7 years. Not so good…
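
The back-of-the-envelope arithmetic, for anyone who wants to check me:

# Rough check of the estimate above: 2^48 remaining keys, ~21 microseconds per trial.
keyspace = 2 ** 48
seconds = (keyspace / 2) * 21e-6          # on average we search half the keys
years = seconds / (365.25 * 24 * 3600)
print("{:.2f} years on one core, {:.1f} years on eight".format(years, years / 8))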

But here’s the thing: we probably don’t have to decode the entire message. If we believe that the output is ASCII, then each of the output bytes should have its top bit clear. Any block which doesn’t have this property indicates our key is on the wrong track. The odds of a random block having all eight of those bits clear are one in 256. If we decode only the first six blocks, then the chance of a false positive is about one in 2^48, which is just about the size of the keyspace, so we probably wouldn’t see very many false positives. With the rule modified to check just the first six blocks, it only takes about 1.16 microseconds to do a key set and a decode. Now my single machine can get the run time down to 236 days. Still pretty darned slow, actually. It’s clear that OpenSSL’s DES implementation isn’t really up to the task. I suspect that there are software implementations which are several times faster than this. I’ll try to dig one up and see what could really be done.
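
Here’s roughly what that early-reject test looks like, sketched here with PyCryptodome’s DES module instead of the OpenSSL calls I actually used (and assuming ECB mode just for the sake of the sketch): set the key, decrypt only the first six 8-byte blocks, and throw the key out unless every byte looks like 7-bit ASCII.

from Crypto.Cipher import DES    # PyCryptodome; stands in for the OpenSSL routines here

def plausible_key(key, ciphertext, blocks=6):
    """Decrypt just the first few blocks and keep the key only if every
    output byte has its top bit clear (i.e. could be 7-bit ASCII)."""
    plain = DES.new(key, DES.MODE_ECB).decrypt(ciphertext[:8 * blocks])   # key: 8 bytes
    return all(b < 0x80 for b in plain)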

Addendum: When I got home, I dug up a copy of Eric Young’s libdes library and ran some basic speed tests using the “speed” program that’s part of the distribution. I wasn’t running it on the same machine, so the numbers aren’t directly comparable (my home machine is quite a bit slower), but I got some interesting numbers:

[chessboard] % ./speed
Doing set_key for 10 seconds
29270676 set_key's in 10.00 seconds
Doing des_encrypt's for 10 seconds
54502334 des_encrypt's in 9.93 second
Doing des_cbc_encrypt on 1024 byte blocks for 10 seconds
402292 des_cbc_encrypt's of 1024 byte blocks in 10.00 second
Doing des_ede_cbc_encrypt on 1024 byte blocks for 10 seconds
146870 des_ede_cbc_encrypt's of 1024 byte blocks in 10.01 second
Doing crypt for 10 seconds
1731166 crypts in 9.92 second
set_key            per sec =   2927067.60 (    0.342uS)
DES raw ecb bytes  per sec =  43909231.82 (    0.182uS)
DES cbc bytes      per sec =  41194700.80 (    0.194uS)
DES ede cbc bytes  per sec =  15024463.54 (    0.532uS)
crypt              per sec =    174512.70 (    5.730uS)

If you add the set_key time to 48 times the raw ECB per-byte time (six 8-byte blocks), you get about 9 microseconds per key. Using both of this machine’s two cores, that works out to about 20 years: roughly 31 times slower than the earlier estimate, or nearly 8x slower per core.
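
Again, the arithmetic:

# set_key plus 48 bytes of raw ECB, using the libdes timings above, spread over two cores
per_key = 0.342e-6 + 48 * 0.182e-6
seconds = (2 ** 47) * per_key / 2
years = seconds / (365.25 * 24 * 3600)
print("{:.1f} microseconds per key, about {:.0f} years on two cores".format(per_key * 1e6, years))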

Matt Blaze: Security and Human Behavior 2009

For all the people who’ve come to see this as my ham radio blog, I apologize, but I do have other interests, and computer security is one of them. It’s not that I am any kind of expert, but I have played around quite a bit with various bits of computer security and cryptography over the years, and I enjoy reading up on it. Recently a small interdisciplinary workshop on this subject was held at MIT, and Matt Blaze was kind enough to make recordings of the workshop. I’m loading these onto my iPhone (perhaps the last new thing it will see before my new iPhone 3GS arrives!). I haven’t checked the audio quality yet, but hopefully the sessions will be interesting…

Matt Blaze: Security and Human Behavior 2009

Keyboard Acoustic Emanations Revisited

While my blog has been dominated by radio related stuff lately, I do continue to be interested in lots of different subjects, including various topics related to computer security and codes. While scanning my feeds today, I found reference to this work, which I hadn’t seen before, but which I find interesting both for its security implications and its use of machine learning. Very cool.

Keyboard Acoustic Emanations Revisited

We examine the problem of keyboard acoustic emanations. We present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. There is no need for a labeled training recording. Moreover the recognizer bootstrapped this way can even recognize random text such as passwords: In our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning.

M4 Message Breaking Project

You might have noticed, if you are a long-time reader of this blog, that I’m fascinated by codes and ciphers, particularly the kind that were developed before computers really came on the scene. That’s why I’m finding the M4 Message Breaking Project interesting: they are attempting to break three cipher intercepts, presumably produced by the Nazi 4-rotor Naval Enigma machine, two of which remain unbroken.

Years ago, when Simon Singh’s The Code Book came out, he ran a cipher challenge that invited readers to compete for a $10,000 prize by being the first to break 10 codes. I broke 7 out of the 10 (all the ones I thought I had a shot at), including a 3-rotor Enigma encrypt. Breaking the 4-rotor variant with a much shorter message is a significant challenge, and they’ve managed to break one of the three already.

I’ve got their distributed client running on my machine. We shall see how it goes. 🙂


DocuColor Tracking Dot Decoding Guide

Xerox’s DocuColor color laser printers use a watermarking technique that inserts codes into every printed document: a grid of yellow dots in the printout that encodes the date, time, and printer serial number. Presumably these codes are inserted to make it simpler for the Secret Service to track the printers’ use in creating counterfeit money. What’s kind of cool, though, is that the EFF has figured out how to decode them. Interesting bit: the dots are easy to see when viewed under an intense blue light, like one of those blue Photon LEDs.

Minnesota court takes dim view of encryption

Sigh. Things like this really depress me. Minnesota court takes dim view of encryption | CNET News.com

A Minnesota appeals court has ruled that the presence of encryption software on a computer may be viewed as evidence of criminal intent.

From the PGP FAQ:

Who uses PGP?
People who value privacy use PGP. Politicians running election campaigns, taxpayers storing IRS records, therapists protecting clients’ files, entrepreneurs guarding trade secrets, journalists protecting their sources, and people seeking romance are a few of the law abiding citizens who use PGP to keep their computer files and their e-mail confidential.

Businesses also use PGP. Suppose you’re a corporate manager and you need to e-mail an employee about his job performance. You may be required by law to keep this e-mail confidential. Suppose you’re a saleswoman, and you must communicate over public computer networks with a branch office about your customer list. You may be compelled by your company and the law to keep this list confidential. These are a few reasons why businesses use encryption to protect their customers, their employees, and themselves.

PGP also helps secure financial transactions. For example, the Electronic Frontier Foundation uses PGP to encrypt members’ charge account numbers, so that members can pay dues via e-mail.

Whether this individual is guilty or not, this seems incredibly ill-conceived.

Elementary Crypto Lesson

I’ve been interested in codes and cryptography for quite some time. I find them at the fascinating intersection of history, mathematics and computer science: all topics that I like to read about and experiment with. Let me give you a basic crypto lesson, with a moral at the end.

Let’s say that all your messages are coded just with capital letters, and you remove all spaces. This gives you an alphabet of (surprise!) 26 letters. Let’s say that you wish to encode a message. You think to yourself: golly, I know what I’ll do. I’ll convert each letter to its corresponding number in the range of 1-26. That will make it confusing! So my message HELLO will translate into

8-5-12-12-15

But that doesn’t seem very hard. People could crack that pretty simply. What to do… what to do…

Well, I could scramble the letter order. Perhaps if A was represented by 13, and B by 8, and so on, they couldn’t figure it out. But if you do cryptograms in the newspaper, you know that even with a modest amount of code text, you can crack these things pretty easily using frequency analysis and the like.

Let’s go back to our simple code again. Imagine that we had a second text, the same length as the first, that we could use as a key. To encode, we add the two numbers together, and if the result is greater than 26, we subtract 26. That will certainly jumble up the frequencies, preventing some kinds of analysis, and since the key is as long as the message, techniques for attacking polyalphabetic ciphers won’t really work either.
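
In code, that scheme is only a few lines. Here’s a sketch using the same 1-to-26 numbering as the HELLO example:

def to_numbers(text):
    """A=1, B=2, ... Z=26, as in the HELLO example above."""
    return [ord(c) - ord("A") + 1 for c in text]

def to_letters(numbers):
    return "".join(chr(n + ord("A") - 1) for n in numbers)

def running_key_encrypt(message, key_text):
    """Add message and key values; subtract 26 whenever the sum exceeds 26."""
    out = []
    for m, k in zip(to_numbers(message), to_numbers(key_text)):
        c = m + k
        if c > 26:
            c -= 26
        out.append(c)
    return to_letters(out)

print(running_key_encrypt("HELLO", "WORLD"))   # the key text must be at least as long as the message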

But there is a serious flaw. Imagine that you could guess a word in the cipher text. Perhaps if the message were addressed to me, it would contain BRAINWAGON or even worse VANDEWETTERING. You could try to subtract these words out of the cipher text, and if normal words popped out the other side, you would have a partial decrypt of both the message and the encoding stream. This may seem difficult if all you are used to is normal substitution ciphers, but in fact it is dead simple to break.

One way to avoid this problem is to use what is called a one time pad. If the encoding stream is truly random, then you can’t recover it this way (it is, after all, perfectly random). One time pads are perfectly secure, with the caveat that you can never, ever, ever reuse a one time pad. Why? Because then you can subtract the two messages; the one time pad data drops out, and you are left with the simple, easy-to-break case listed above.
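
The reuse attack is just as short: subtract the two ciphertexts letter by letter, and the pad cancels, leaving the letter-by-letter difference of the two plaintexts, which is exactly the breakable situation described above. A sketch (working mod 26 rather than with the 1-to-26 bookkeeping):

def subtract_ciphertexts(c1, c2):
    """(m1 + pad) - (m2 + pad) leaves m1 - m2: the pad drops out entirely."""
    return "".join(chr((ord(a) - ord(b)) % 26 + ord("A")) for a, b in zip(c1, c2))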

Why the cryptography lesson? Because Bruce Schneier (author of the excellent book Applied Cryptography) points out that no less than Microsoft makes this exact error. When you save a Word document, it re-encrypts it with precisely the same key stream, and therefore if you have access to multiple versions of the same document, you can recover the entire document with elementary cryptanalysis.

This is one of the reasons I like open source: you can audit software to find errors like this, and work to correct them quickly.

Anyway, if you like cryptography, privacy, and information issues, subscribe to Bruce’s Crypto-Gram. It’s good stuff.

Tor: An anonymous Internet communication system

This EFF-funded project sounds very interesting. It attempts to provide anonymity by making traffic analysis difficult, using something called an onion router. I’ll have to read more about it.

Tor: An anonymous Internet communication system

Tor is a toolset for a wide range of organizations and people that want to improve their safety and security on the Internet. Using Tor can help you anonymize web browsing and publishing, instant messaging, IRC, SSH, and more. Tor also provides a platform on which software developers can build new applications with built-in anonymity, safety, and privacy features.