Category Archives: Computer Science

A Simple Python Program for Factoring…

Sometimes, your interests converge. Over on Programming Praxis, there was a coding challenge to implement Monte Carlo factorization. The last couple of days, I have been thinking a bit more about factorization, so it seemed like a good idea to give it a whack. I cribbed a version of a Miller-Rabin primality tester off the web, and then coded the rest up in Python.

#!/usr/bin/env python

#
# A basic implementation of Pollard rho factorization, with a
# Miller-Rabin primality test to decide when to stop.
# Written by Mark VandeWettering <markv@pixar.com>
#

import locale
import random

class FactorError(Exception):
    """Raised when findfactor can't find a nontrivial factor."""
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def miller_rabin_pass(a, n):
    # Write n-1 as d * 2**s with d odd, then test whether a is a
    # witness to the compositeness of n.
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1
        s += 1

    a_to_power = pow(a, d, n)
    if a_to_power == 1:
        return True
    for i in range(s - 1):
        if a_to_power == n - 1:
            return True
        a_to_power = (a_to_power * a_to_power) % n
    return a_to_power == n - 1

def isprime(n):
    # Twenty Miller-Rabin rounds with random bases: a composite
    # slips through with probability at most 4**-20.
    for repeat in range(20):
        a = 0
        while a == 0:
            a = random.randrange(n)
        if not miller_rabin_pass(a, n):
            return False
    return True

def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def findfactor(n):
    # Pollard rho with Floyd cycle finding: x steps once and y steps
    # twice through the sequence x -> (x*x + c) % n.  A cycle modulo
    # an unknown factor of n shows up as a nontrivial gcd.
    for c in range(1, 50):
        x = y = random.randint(1, n - 1)
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        while True:
            t = gcd(n, abs(x - y))
            if t == 1:
                x = (x * x + c) % n
                y = (y * y + c) % n
                y = (y * y + c) % n
            elif t == n:
                break       # degenerate cycle: retry with a new c
            else:
                return t
    raise FactorError("couldn't find a factor.")

def factor(n):
    # Split off factors until what remains is prime.
    r = []
    while True:
        if isprime(n):
            r.append(n)
            break
        try:
            f = findfactor(n)
            # f may itself be composite, so factor it recursively
            r.extend(factor(f))
            n = n // f
        except FactorError:
            r.append(n)
            break
    r.sort()
    return r

def doit(n):
    flist = factor(n)
    print(locale.format_string("%d", n, grouping=True), "=")
    for f in flist:
        print("\t%s" % locale.format_string("%d", f, grouping=True))

locale.setlocale(locale.LC_ALL, "")

doit(2 ** 98 - 1)

This is a fairly challenging factorization, and after a few seconds, it outputs:

316,912,650,057,057,350,374,175,801,343 =
	3
	43
	127
	4,363,953,127,297
	4,432,676,798,593

Taking my own code challenge…

Back in 2007, I made a blog post entitled “Code Challenge”, putting forth a cipher for readers to break.

Code Challenge | brainwagon.

Nobody took the challenge. What was odder still was that I had no recollection of making this challenge, or even what type of cipher it was. The only clue I had was that it was apparently in response to an episode of the increasingly mediocre television show Numb3rs.

So, I considered it a challenge sent from past Mark to future Mark. The first thing I had to do was identify the type of cipher. I had made the posting in June of 2007, and I guessed that I would have been watching episodes from that season (season 3), possibly in reruns. I relatively quickly located “The Janus List” and its “straddling checkerboard cipher” as the likely method. Indeed, as I read the description of the cipher (try this blog entry for details), it began to come back to me.

So, I tried to think of how to crack it. Here’s the insight: the straddling checkerboard cipher is essentially a simple substitution cipher, except that each letter is encoded as either one or two ciphertext characters. What makes it easy is that eight letters encode as a single character and twenty as two characters, but every two-character code begins with one of the two digits not used as a single-character code. So, you can proceed by guessing what those two prefix characters are, and then scanning through the message. If you end up with more than 26 distinct symbols (or 28, if you include the period and slash that most straddling checkerboard implementations do), then you know you guessed wrong. In my case, a quick look at the single-character frequencies indicated that the characters 6 and 8 were vastly more frequent than the others, so I guessed that those were the prefixes. When I busted the cipher up into symbols, I ended up with 23 distinct symbols, which seemed entirely appropriate. I could then assign a random letter to each of those 23 symbols, and I’d have a simple cryptogram.
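Here’s a minimal sketch of that splitting step, assuming the ciphertext digits are in a string named digits (the function and variable names are mine):

def split_symbols(digits, prefixes):
    # Carve the digit string into one- and two-character symbols:
    # a prefix digit always consumes the digit that follows it.
    symbols = []
    i = 0
    while i < len(digits):
        if digits[i] in prefixes:
            symbols.append(digits[i:i + 2])
            i += 2
        else:
            symbols.append(digits[i])
            i += 1
    return symbols

symbols = split_symbols(digits, "68")
print(len(set(symbols)))    # much over 28 distinct symbols means a bad guess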

Here’s the resulting cryptogram:

WRAFSIFFKVIQKFINCOMGWIDUFANVKFWKVVIOIDCRICEIOBHKODCAGRIOQRACRMHUGOHBKBT
MRKJIKTOIKDMLAVUOIDHUWALMHUKOIOIKDANVWRAFCHNVOKWUTKWIMHUOFITLHNBIANVIPC
IIDANVTMCTIJIOKNDWINKCAHUFADHUBWKNMHNIQATTKCWUKTTMQHOERKODINHUVRWHKCWUK
TTMDICHDIWRAFSIFFKVIFUCCIFFLUTTM

Mind you, I might have accidentally assigned a letter to itself here, so it’s not quite like a normal cryptogram. But it isn’t hard to make some guesses. ‘I’ is the most common letter, so it is probably E. The last three characters are TTM, which implies a double letter right at the end, suggesting LLY. The double-letter digraph FF occurs three times, which suggests that F might map to S. Putting these together makes the last word look like SuccESSluLLY. Wow! U and C both look like they map to themselves. L is F. The message starts TraS, which suggests THIS: R becomes H, A becomes I. Now we have traction: K is A, S is M, V is G…
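Mechanically, applying a partial key takes only a few lines of Python (a sketch; cryptogram is assumed to hold the ciphertext above, and the mapping collects the guesses so far):

# Partial key from the guesses above; unsolved symbols print as dots.
partial = {"I": "E", "F": "S", "T": "L", "M": "Y", "U": "U", "C": "C",
           "L": "F", "W": "T", "R": "H", "A": "I", "K": "A", "S": "M",
           "V": "G"}

def apply_key(text, key):
    return "".join(key.get(ch, ".") for ch in text)

print(apply_key(cryptogram, partial))

Running that over the cryptogram gives: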

THISMESSAGE.ASE.C.Y.TE.USI.GASTAGGE.E.CHEC.E...A..CI.HE..HICHY.U....A.L
YHA.EAL.EA.YFIGU.E..UTIFY.UA.E.EA.I.GTHISC..G.ATULATEY.U.SELF...EI.GE.C
EE.I.GLYCLE.E.A..TE.ACI.USI..U.TA.Y..E.ILLACTUALLY....HA..E..UGHT.ACTUA
LLY.EC..ETHISMESSAGESUCCESSFULLY

It’s a short skip and a jump to the complete decode:

THISMESSAGEWASENCRYPTEDUSINGASTAGGEREDCHECKERBOARDCIPHERWHICHYOUPROBABL
YHAVEALREADYFIGUREDOUTIFYOUAREREADINGTHISCONGRATULATEYOURSELFONBEINGEXC
EEDINGLYCLEVERANDTENACIOUSIDOUBTANYONEWILLACTUALLYWORKHARDENOUGHTOACTUA
LLYDECODETHISMESSAGESUCCESSFULLY

Apparently, I was tenacious enough.

Revisiting Simon Singh’s Cipher Challenge…

The other day, my friend Jeff sent me an email detailing the latest benchmarking of his new 8-core Intel box. He had written a program to compute return rates for all possible draws in video poker. A few years ago, this program had taken a month of computer time to complete, and he wondered how quickly it would run on his new box. The answer: around 12 hours.

This got me thinking back to my experience with Simon Singh’s “Cipher Challenge”, which I worked on back in 2000. I hadn’t tried to solve Stage 9, because at the time it seemed a daunting task, requiring computational resources far in excess of what I had. But Jeff’s email got me thinking, so I decided to run an experiment: how long would it really take?

I didn’t have a fast, optimized DES implementation of my own, so I decided to dig around. It turns out that OpenSSL contains an implementation of DES, and a few minutes of writing some code verified that it could actually decode the challenge message. So, the question was: how long would it take to break this code?

Stage 8 had revealed 8 of the 56 key bits, leaving a keyspace of 2^48 keys (about 281 trillion). So, I coded up a small test program that did a million iterations of DES_set_key, each followed by a decryption of the message, and found that each trial took about 21 microseconds. On average, we’ll search half the keyspace (around 140 trillion keys) before we find the key, so if you multiply it out, you get… 93.65 years. We have eight cores, which brings it down to about 11.7 years. Not so good…
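The back-of-the-envelope arithmetic, as a few lines of Python:

keyspace = 2 ** 48                  # keys remaining after Stage 8
trials = keyspace // 2              # on average, search half the keyspace
seconds = trials * 21e-6            # 21 microseconds per key tested
years = seconds / (365.25 * 24 * 3600)
print(years)                        # about 93.7 years on one core
print(years / 8)                    # about 11.7 years across eight cores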

But here’s the thing: we probably don’t have to decode the entire message. If we believe that the output is ASCII, then each of the output bytes should have the top bit clear. Any block which doesn’t have this feature indicates our key is wrong. The odds of a random eight-byte block having all eight top bits clear is one in 256. If we decode only the first six blocks, then the chance that a wrong key slips past all six checks is about one in 2^48, which is just about the size of the keyspace, so we probably wouldn’t see very many false positives. With the rule modified to check just the first six blocks, it only takes about 1.16 microseconds to do a key set and a decode. Now my single machine can get the run time down to 236 days. Still pretty darned slow, actually. It’s clear that OpenSSL’s DES implementation isn’t really up to the task. I suspect that there are software implementations which are several times faster than this. I’ll try to dig one up and see what could really be done.
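A sketch of that early-reject test (des_decrypt_block is a hypothetical stand-in for the OpenSSL key-schedule-plus-ECB-decrypt calls the real program used):

def plausible_ascii(key, blocks):
    # Check only the first six 8-byte blocks.  A wrong key survives
    # all six with probability (1/256)**6 = 2**-48.
    for block in blocks[:6]:
        plain = des_decrypt_block(key, block)   # hypothetical helper
        if any(b & 0x80 for b in plain):
            return False    # a high bit is set: not ASCII, reject key
    return True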

Addendum: When I got home, I dug up a copy of Eric Young’s libdes library, and ran some basic speed tests using the “speed” program that’s part of the distribution. I wasn’t running it on the same machine, so the numbers aren’t directly comparable; my home machine is quite a bit slower. Nevertheless, I got some interesting numbers:

[chessboard] % ./speed
Doing set_key for 10 seconds
29270676 set_key's in 10.00 seconds
Doing des_encrypt's for 10 seconds
54502334 des_encrypt's in 9.93 second
Doing des_cbc_encrypt on 1024 byte blocks for 10 seconds
402292 des_cbc_encrypt's of 1024 byte blocks in 10.00 second
Doing des_ede_cbc_encrypt on 1024 byte blocks for 10 seconds
146870 des_ede_cbc_encrypt's of 1024 byte blocks in 10.01 second
Doing crypt for 10 seconds
1731166 crypts in 9.92 second
set_key            per sec =   2927067.60 (    0.342uS)
DES raw ecb bytes  per sec =  43909231.82 (    0.182uS)
DES cbc bytes      per sec =  41194700.80 (    0.194uS)
DES ede cbc bytes  per sec =  15024463.54 (    0.532uS)
crypt              per sec =    174512.70 (    5.730uS)

If you add the set_key time to 48 times the raw ECB encrypt time, you get about 9 microseconds per key. Using both of its two cores, that works out to about 20 years. That’s about 31 times slower overall, or nearly 8x slower per core.

Matt Blaze: Security and Human Behavior 2009

For all the people who’ve come to see this as my ham radio blog, I apologize, but I do have other interests, and computer security is one of them. It’s not that I am any kind of expert, but I have played around quite a bit with various bits of computer security and cryptography over the years, and enjoy reading up on it. Recently a small interdisciplinary workshop on this subject was held at MIT, and Matt Blaze was kind enough to make recordings of the sessions. I’m loading these onto my iPhone (perhaps the last new thing it will see before my new iPhone 3GS arrives!). I haven’t checked the audio quality yet, but hopefully the sessions will be interesting…

Matt Blaze: Security and Human Behavior 2009

The Next 700 Programming Languages

I learned that computer scientist Peter Landin passed away recently. Landin’s research helped refine the direction of my college studies, and was always a great pleasure to read. His derivation and explanation of the SECD machine served as the basis for a more mature and clear understanding of many aspects of programming languages and their evaluation. And, of course, there was his classic paper “The Next 700 Programming Languages”, which provides an interesting perspective on programming languages: where they were, where they are going, and what is significant about their design. Great stuff!

The Next 700 Programming Languages

Addendum: Here’s my attempt to embed a link to the paper.

Alpha-Beta Search is Hard To Debug

So, I’ve spent a few hours over the last couple of weeks trying to debug milhouse, the computer checkers program that I tinkered together over the last few years. It’s kind of maddening: in so many ways it seems to work, but it has a number of behaviors which can only be classified as bugs. Here’s an example. Consider this simple 2K vs. 1K endgame (entirely typical, and the kind of endgame that even I can play out).

[board diagram: board1]

It’s easy, right? Sure enough, Milhouse finds the 14-move sequence that basically forces the White king to flee the double corner, where disaster awaits. The search is very quick, taking just a fraction of a second.

But here’s the thing: nominally more difficult puzzles seem to confuse milhouse entirely. Let’s modify the puzzle to make the winning sequence just a bit longer.

[board diagram: board2]

Obviously, this is just 8 half-moves (or plies) away from the previous position. (You can see that if White does anything but jigger back and forth, he does nothing but hasten his own demise.) But for some odd reason, a 25-ply search doesn’t discover the winning line.

So, here’s the thing: alpha-beta search is a lot like binary search. Writing a bug-free binary search is remarkably challenging. Try it sometime: there are all sorts of odd edge conditions you can get wrong if you aren’t careful. Jon Bentley uses it as the prototypical example of how to develop and debug software in his classic work Programming Pearls. And alpha-beta search is 10x worse, for a number of reasons:

  1. The algorithm has the same sort of hidden subtlety that binary search does. There are many edge conditions (is the invariant alpha < beta, or alpha <= beta? do cutoffs occur when the score is > alpha, or >= alpha?) that aren’t consistently described in the classic literature on algorithms; see the sketch after this list.
  2. It operates on (typically) very large search trees. It’s hard to find bugs by examining the search tree because of its raw size, and because many minor problems (such as the comparison issues above) are effectively hidden because they often don’t affect the outcome of searches.
  3. There is lots of information about alpha beta search on the web, but almost all available sources suffer from poorly described invariants, and often don’t implement the necessary complications (such as transposition tables) which further complicate the design.
  4. If you do find code that works and is non-trivial, you can’t understand it. Game programmers love to “optimize” code, which means that all useful patterns and structure have been stripped out of the program, leaving only a hopeless hodgepodge.
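To make the subtleties concrete, here’s a minimal fail-hard negamax formulation of alpha-beta, the sketch referred to in item 1 above. It isn’t milhouse’s code: evaluate, moves, and apply are hypothetical helpers, and evaluate scores from the point of view of the side to move.

def alphabeta(pos, depth, alpha, beta):
    # Invariant: alpha < beta on entry.  Fail-hard convention: the
    # returned score is clamped to the [alpha, beta] window.
    if depth == 0 or not moves(pos):
        return evaluate(pos)
    for m in moves(pos):
        score = -alphabeta(apply(pos, m), depth - 1, -beta, -alpha)
        if score >= beta:
            return beta      # cutoff: in this convention, >= beta prunes
        if score > alpha:
            alpha = score    # raise the lower bound of the window
    return alpha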

Okay, I’m just bitching. I’ve got 99% of this code right, but the 1% is causing me grief.

Addendum: Oh, and here’s another interesting thing. You might remember from a previous blog posting that I counted the number of possible checkers positions for each number of pieces. If you look at the table, you’ll find that there are about a quarter of a million positions with three pieces or fewer. My checkers program can evaluate about twenty-five million positions a second, yet I’m not finding the winning line. It’s not too hard to imagine what the problem is. Let’s assume that the branching factor for this problem is roughly eight. If the winning line is 25 ply deep, the full search tree is 8^25 nodes. That’s 3.7E22 positions. Even with alpha-beta pruning, that’s a huge number of nodes. This is where transposition tables are supposed to come in. There are many paths that lead to equivalent positions, so there is no reason to re-search the same trees over and over again. So, you maintain a table that records the outcomes of previous searches, and if the same position shows up, you already know the answer and don’t have to search.

Obviously, mine isn’t working too well. If it were, I’d expect to quickly expand the search perimeter to find nodes I hadn’t seen before, and locate the winning line.
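For the record, the basic shape of a transposition table is simple. Here’s a sketch (not milhouse’s actual code; hash_position stands in for a Zobrist-style position hash, and alphabeta is the search sketched above):

table = {}

def search(pos, depth, alpha, beta):
    key = hash_position(pos)            # hypothetical position hash
    if key in table and table[key][0] >= depth:
        return table[key][1]            # already searched at least this deep
    score = alphabeta(pos, depth, alpha, beta)
    table[key] = (depth, score)
    return score

# A real table must also record whether the stored score is exact or
# just an alpha/beta bound -- precisely the sort of subtlety that
# makes these searches hard to get right.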

TEMPEST: A Signal Problem – Hack a Day

I’m not 100% obsessed (more like 98%) with radio topics: this morning, I found this link on Hack a Day, which pointed to several articles having to do with TEMPEST. I’ve blogged about TEMPEST before, but for those who haven’t heard of it, it’s a way of eavesdropping on electronic devices by listening for their unintended electromagnetic emissions. I’d seen some of these articles before, but I hadn’t seen this work on eavesdropping on USB keyboard emissions:


Compromising Electromagnetic Emanations of Keyboards Experiment 2/2 from Martin Vuagnoux on Vimeo.

I also hadn’t seen TEMPEST: A Signal Problem, a paper recently released under a Freedom of Information request, detailing the history of TEMPEST. Very interesting.

Re-animating the PDP-11/70

A few years ago, Tom Duff and I each wrote an emulator for the PDP-1 so we could play the original version of Spacewar! I learned a lot about old computers in the week or so it took me to do it, and I must admit that I’ve retained a fascination with old computers ever since. Tom mentioned that he has a front panel from an old PDP-11, and has talked about a project where he wires the front panel to a more modern machine running a PDP-11 emulator, which seemed like a cool idea. After all, modern computers just don’t have enough blinking lights. Here’s a link to a project which does precisely that, using an inexpensive Zilog microcontroller connected over Ethernet. It also includes links to other similar and interesting projects. Check it out.

Re-animating the PDP-11/70

Google Code University

One of the greatest things about computing technology is simply how much information is available to anyone who is interested. Technical reports, papers, and most importantly, open source software tools are all available to whoever wants them. This represents a significant democratization of technology. Major universities like MIT are even making entire college-level courses available to the public.

It’s nice to see that Google is trying to do the same by establishing the Google Code University. Lots of good information available here.

Google Code University – Google Code

This website provides tutorials and sample course content so CS students and educators can learn more about current computing technologies and paradigms. In particular, this content is Creative Commons licensed which makes it easy for CS educators to use in their own classes.

The Courses section contains tutorials, lecture slides, and problem sets for a variety of topic areas:

  • AJAX Programming
  • Algorithms
  • Distributed Systems
  • Web Security
  • Languages

Keyboard Acoustic Emanations Revisited

While my blog has been dominated by radio related stuff lately, I do continue to be interested in lots of different subjects, including various topics related to computer security and codes. While scanning my feeds today, I found reference to this work, which I hadn’t seen before, but which I find interesting both for its security implications and its use of machine learning. Very cool.

Keyboard Acoustic Emanations Revisited

We examine the problem of keyboard acoustic emanations. We present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. There is no need for a labeled training recording. Moreover the recognizer bootstrapped this way can even recognize random text such as passwords: In our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning.

@ SIGGRAPH 2008

Well, I’m sitting at The Standard (a frankly far too chic hotel for a forty-something computer geek like myself); it’s not quite 7 A.M., and I’ve spent my first day at SIGGRAPH. I’m here mostly for recruiting: sitting in the booth, answering questions, and showing up at our Pixar User’s Group Meeting. We are handing out 20th anniversary RenderMan walking teapots: very nice. I managed to get one for Josh, but haven’t picked up one of my own; I’ll try to later and get a picture here.

I’m not really attending papers (you can get links here), but there seems to be quite a bit of buzz about Intel’s Larrabee architecture. Broadly speaking, the trend in GPUs has been to slowly expand both the number and the capability of the different functional units: more shader units that can execute more general code, and more texture units whose results are available to more of the pipeline. Larrabee leapfrogs this: we are back to having x86 cores (not my favorite architecture, but ubiquitous) which are fully general, linked together by a fast shared cache, with scheduling done in software. To me, this represents the obvious end game of the evolution of GPUs. Companies like nVidia have been trying to tell us that we can use GPUs to do more general computation: Intel has delivered an architecture where that claim is much more obviously true.

Oh well, I’m gonna get some pancakes at IHOP, then walk the show floor a bit. I want to try to see what the state of the art in stereo monitors is, and maybe see who I can bump into. I’ve got booth duty again at 1:30 (more teapots handed out at 2:00) and then the User’s Group meeting at 6:30. If any readers are at SIGGRAPH, feel free to come by the booth and say hi.