Daily Archives: 7/23/2010

Primality testing with Perl regexs

Okay, here’s a little pet peeve of mine. Somebody brought up that it’s possible to do primality testing using Perl regular expressions. This has been kicking around for a while, I’m not sure who originated the idea, but you can find links to it with a Google search. As an example, consider the Perl one liner here:

Primality testing with Perl regexs@Everything2.com

It’s a very cute hack, so where is my peeve?

The regular expression they list isn’t a regular expression.

It seems a bit of a quibble. I’ve had people argue that whatever the meaning of regular expressions might be, the “regular expressions” matched by popular “regular expression” libraries like Perl and Python should supercede, because they are in fact so ubiquitous. I might argue, except for one thing:

Real regular expressions can be matched really quickly, but are seldom matched very quickly at all by Perl, Python, or Ruby. Russ Cox wrote a nice article about this. Yes, perl regular expressions can match all sorts of complex things that real regular expressions can’t, but if all you’ve got is a real regular expression, perl’s performance is typically an order of magnitude slower than necessary, and in fact, perl can be arbitrarily slow. This should come as no surprise: since the perl implementation of primality testing does work, it obviously must be slow: it is, after all, capable of any arbitrary computation.

It’s not like doing it correctly is even difficult: we implemented all this stuff back in Ginnie Lo’s compiler class back in the early 80s, and it was old back then. We made lexical analyzers that began by compiling NFAs for all the tokens in the language, and then converting them into DFAs, which marched along and processed files in linear time.

Okay, it’s my peeve, but there is some good, beautiful computer science underlying it all.