From the code cellar: a program to find haiku in text…

While surfing around this weekend, I was reminded of the old sweetcode.org website (sadly no more, though you can still see it courtesy of the Internet Archive’s Wayback Machine) and found an old chunk of code I had archived, written by Danny O’Brien and based on an idea by Don Marti. It scans a text, breaks it into syllables, and then outputs any haiku it finds (segments of text whose words split into lines of 5, 7, and 5 syllables). The program also lets you require that a haiku start with a capital letter and end with a period.
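For the curious, here is a rough sketch of how such a scan might work. This is not Danny O’Brien’s code: it is written in Python for illustration, and the names `find_haiku`, `fill_line`, and `rough_syllables` are made up for this example. The syllable counter here is a crude vowel-group guess and will miscount plenty of words, so a better counter can be passed in through the `count` parameter.

```python
import re


def rough_syllables(word):
    """Crude estimate: one syllable per run of vowels, never fewer than one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fill_line(words, start, target, count):
    """Take words from index `start` until exactly `target` syllables are used.

    Returns the index just past the last word taken, or None if the words
    overshoot the target or the text runs out first.
    """
    total = 0
    i = start
    while i < len(words) and total < target:
        total += count(words[i])
        i += 1
    return i if total == target else None


def find_haiku(text, count=rough_syllables, strict=True):
    """Yield each 5-7-5 run of words in `text` as a list of three lines.

    With strict=True, the first word must start with a capital letter and
    the last word must end with a period.
    """
    words = text.split()
    for start in range(len(words)):
        if strict and not words[start][0].isupper():
            continue
        lines = []
        pos = start
        for target in (5, 7, 5):
            end = fill_line(words, pos, target, count)
            if end is None:
                break  # this starting word cannot begin a haiku
            lines.append(" ".join(words[pos:end]))
            pos = end
        else:
            # All three lines filled exactly; apply the ending rule.
            if strict and not words[pos - 1].endswith("."):
                continue
            yield lines
```

Calling `find_haiku(text)` on a long string and printing each result line by line gives output in the same shape as the samples in this post. Trying every word as a possible starting point is the simplest approach, and it is fast enough for a single novel.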

It was a bit broken, but a few minutes of hacking got it working again, and I fed it one of my favorites: The Adventures of Sherlock Holmes. Here is a smattering of some of the better ones it found.

He never spoke of
the softer passions, save with
a gibe and a sneer.

You understand? I
am to be neutral? To
do nothing whatever.

She knows that the King
is capable of having
her waylaid and searched.

But then, when I found
how I had betrayed myself
I began to think.

Some letters get more
worn than others, and some wear
only on one side.

I can stand this strain
no longer; I shall go mad
if it continues.

Twice he struck at the
chamber door without any
reply from within.

He put out his hand
and coldly grasped that which she
extended to him.

It is all dark to
me. But perhaps it may grow
lighter as we go.

I’m not sure why I find these so amusing.

3 thoughts on “From the code cellar: a program to find haiku in text…”

Brian M Rosen 5/25/2010 at 4:30 pm

Oh…this is great. This needs to be combined with the google book project somehow. Or perhaps a contest where you present 5 haikus from a single source and see how identifiable it is…

Alan Yates 5/26/2010 at 12:52 am

How does the syllable parser work, rules or a dictionary?

Mark VandeWettering (Post author) 5/26/2010 at 8:14 am

Mostly a dictionary. As part of their speech research, CMU publishes a dictionary of pronunciations in an easy-to-parse format. If a word isn’t in the dictionary, the program falls back to a simpler but less accurate rule-based algorithm. I’ve noted a couple of mistakes in the output, but it seems to work out pretty well.

http://www.speech.cs.cmu.edu/cgi-bin/cmudict
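A minimal sketch of that dictionary-plus-fallback idea, in Python. It is an illustration, not the program’s actual code. It assumes the plain-text CMU dictionary layout: one entry per line, the word followed by its phonemes, vowel phonemes ending in a stress digit, `;;;` marking comment lines, and alternate pronunciations written like `WORD(1)`. The function names and the fallback rule (count vowel groups, drop a trailing silent “e”) are my own assumptions.

```python
import re


def parse_cmudict(lines):
    """Build a {word: syllable_count} table from CMU-dictionary-style lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        # Strip the "(1)" style suffix that marks alternate pronunciations.
        word = re.sub(r"\(\d+\)$", "", word).lower()
        # Vowel phonemes carry a stress digit, so count those.
        syllables = sum(p[-1].isdigit() for p in phones)
        table.setdefault(word, syllables)  # keep the first pronunciation
    return table


def fallback_syllables(word):
    """Rule-based guess for words missing from the dictionary (assumed rule)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # A trailing "e" is usually silent, except in endings like "-le".
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def count_syllables(word, table):
    """Look the word up in the table, falling back to the rule-based guess."""
    key = re.sub(r"[^a-z']", "", word.lower())  # drop punctuation
    if key in table:
        return table[key]
    return fallback_syllables(key)
```

To try it, read the downloaded dictionary file line by line into `parse_cmudict`, then call `count_syllables(word, table)` for each word of the text.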

Comments are closed.
