Regular Expression Matching Should Be Simple And Fast
Yesterday Tom mentioned this article, which was linked from Lambda the Ultimate, and I finally got around to reading it today.
Regular Expression Matching Can Be Simple And Fast | Lambda the Ultimate
The short summary: regular expression matching in common scripting languages like Perl and Python is many orders of magnitude slower than it needs to be for fairly large classes of regular expressions. I guess I find this a little startling: the topic is pretty well understood, and has been for roughly forty years. Ken Thompson wrote a paper on the topic in 1968, and a straightforward implementation of the ideas in it runs a heck of a lot faster.
Interestingly enough, I learned how this stuff worked way back in my third year of college. I took a compiler course, where we learned all about Thompson’s algorithm: how to convert regular expressions into non-deterministic finite automata, and then how to compile those into deterministic automata. This algorithm formed the basis of the lexer generator in that course, and it was not difficult to write (or even understand). I’m left wondering: what are they teaching kids in Computer Science these days? Oy.
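To give a feel for how little code the idea needs, here is a minimal sketch in Python. It is not the code from the article or from Thompson's paper, just my own illustration under some assumptions: the names NFA, compile_regex, closure and full_match are made up for this example, the pattern language is limited to literal characters, |, *, +, ? and parentheses (no escapes or character classes), and it simulates the NFA directly by tracking the set of live states rather than compiling it down to a deterministic automaton.

```python
class NFA:
    """States are integers. Each state has epsilon edges and at most one
    character edge, in the spirit of Thompson's construction."""

    def __init__(self):
        self.eps = []   # eps[s]: states reachable from s without consuming input
        self.sym = []   # sym[s]: (char, target) or None

    def new_state(self):
        self.eps.append([])
        self.sym.append(None)
        return len(self.eps) - 1


def compile_regex(pattern):
    """Recursive-descent parse of pattern; returns (nfa, start, accept)."""
    nfa = NFA()
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        start, end = concatenation()
        while peek() == '|':
            pos += 1
            s2, e2 = concatenation()
            s, e = nfa.new_state(), nfa.new_state()
            nfa.eps[s] += [start, s2]       # branch into either side
            nfa.eps[end].append(e)
            nfa.eps[e2].append(e)
            start, end = s, e
        return start, end

    def concatenation():
        start = end = nfa.new_state()       # begins as an empty fragment
        while peek() not in (None, '|', ')'):
            s2, e2 = repetition()
            nfa.eps[end].append(s2)         # chain fragments together
            end = e2
        return start, end

    def repetition():
        nonlocal pos
        start, end = atom()
        while peek() in ('*', '+', '?'):
            op = pattern[pos]
            pos += 1
            s, e = nfa.new_state(), nfa.new_state()
            nfa.eps[s].append(start)
            nfa.eps[end].append(e)
            if op in '*?':
                nfa.eps[s].append(e)        # zero occurrences allowed
            if op in '*+':
                nfa.eps[end].append(start)  # loop back for another occurrence
            start, end = s, e
        return start, end

    def atom():
        nonlocal pos
        ch = peek()
        if ch == '(':
            pos += 1
            frag = alternation()
            if peek() != ')':
                raise ValueError("missing ')'")
            pos += 1
            return frag
        if ch is None or ch in '*+?':
            raise ValueError("nothing to repeat")
        pos += 1
        s, e = nfa.new_state(), nfa.new_state()
        nfa.sym[s] = (ch, e)                # single literal character edge
        return s, e

    start, accept = alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return nfa, start, accept


def closure(nfa, states):
    """All states reachable from `states` through epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        for t in nfa.eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def full_match(pattern, text):
    """True if the whole of text matches pattern. No backtracking: each
    input character is processed once against the set of live states."""
    nfa, start, accept = compile_regex(pattern)
    current = closure(nfa, {start})
    for ch in text:
        moved = {nfa.sym[s][1] for s in current
                 if nfa.sym[s] is not None and nfa.sym[s][0] == ch}
        current = closure(nfa, moved)
    return accept in current
```

Calling full_match("a(b|c)*d", "abcbd") walks the input once, carrying a set of possible states along. Because that set can never be larger than the number of states in the NFA, the running time is bounded by roughly the pattern length times the text length, with no exponential blowup. A pattern such as "a?" repeated 25 times followed by "a" repeated 25 times, matched against 25 a's, is exactly the sort of thing that gives a backtracking matcher trouble, and this sketch handles it without fuss.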
Here are some good examples of how to properly write a regular expression library.
Addendum: During the same time period, I had a personal epiphany. At the time we were using a VAX-11/750, and a fast version of fgrep called match, written by Peter Bain, came across the old mod.sources newsgroup. It used the dedicated MATCHC instruction of the VAX to search for matching strings. The funny thing was, soon after, a whole bunch of even faster implementations came down the pike. It taught me that being smart was often better than having better hardware.