Okay, it’s been a while since I posted anything: I’ve been busy with travel and the holidays, and now I’m trying to get my home office/shack set up so I can pursue some other projects. It is one of those rooms that has piles of crap, some of which I haven’t seen in years, so I’m carefully working through it, tossing stuff that is useless and organizing the remainder.
While I’m at it, I’m also trying to make the space a bit more engaging, so I’ll spend more time there. My eventual goal is to get some of my radios in here so I can listen to more shortwave and ham traffic, but I also spend a fair amount of my off hours on IRC (mostly on the #hamradio channel on irc.freenode.net using my callsign K6HX as my nickname). Instead of forcing myself to sit reading a screen, I thought it would be less intrusive and allow me to get more work done if I could monitor the channel by having a voice synthesizer read the messages that appear in the #hamradio channel. That way, I could just go about my business, but still hear the conversations.
So, that was my idea.
My first thought was to simply run pidgin (a fairly nice IRC client that runs on multiple platforms) and just use the pidgin-festival plugin for voice synthesis. But it turned out to be more difficult than I had hoped: nothing I tried seemed to make it work. When I cracked open the source code after an hour of flailing, I was annoyed to find a bunch of really questionable code (hence my comment on Twitter earlier about chimps writing code), and I soured on the idea. I turned off the computer and went to bed.
On the drive home from work yesterday, I thought about the problem again. I knew that festival could do the speech synthesis. I knew that the curses-based IRC client epic5 could write a log file of each message. I then hatched a simple idea: I’d write a little bit of Python glue that would basically act like tail -f: it would repeatedly scan for new lines at the end of the irc.log file, use popen to open a pipe to the festival voice synthesizer, spew out what needed to be read, and then continue.
So, here’s the code!
[sourcecode lang="python"]
#!/usr/bin/env python
import os
import re
import time

# "<nick> message" lines and "* nick does something" action lines
msgpat = re.compile(r"<([A-Za-z0-9_]+)> (.*)$")
actpat = re.compile(r"\* (.*)$")

def say(nick, s):
    # pipe the text to festival's text-to-speech mode
    p = os.popen("festival --tts", "w")
    p.write("%s says %s" % (nick, s))
    p.close()

def act(s):
    p = os.popen("festival --tts", "w")
    p.write(s)
    p.close()

def process(s):
    m = msgpat.match(s)
    if m:
        say(m.group(1), m.group(2))
    else:
        m = actpat.match(s)
        if m:
            act(m.group(1))

f = open("irc.log", "r")
f.seek(0, 2)  # skip to the end of the log so only new lines are read
while True:
    l = f.readline()
    while l:
        process(l.strip())
        l = f.readline()
    # nothing new yet: wait a bit rather than spinning
    time.sleep(1)
[/sourcecode]
It works fairly well. There are a couple of things that I need to work on. First, I think the main loop is fairly inelegant. What I want is a version of readline() that works more like a read from a pipe: blocking when there is no further data to be read. But overall, this code mostly prevents the busy wait that would result without the sleep call.
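One way to tidy up that main loop might be to hide the polling inside a small generator, so the caller just iterates over new lines as they show up. This is only a sketch: it still polls with sleep rather than truly blocking, and the names follow, poll_interval and max_idle_polls are made up for illustration (max_idle_polls exists only so the loop can be stopped when trying it out).

```python
import time

def follow(f, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines appended to f, sleeping while nothing new arrives.

    max_idle_polls is a hypothetical knob for experiments: stop after that
    many consecutive empty polls. None means keep waiting forever.
    """
    idle = 0
    partial = ""
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = f.readline()
        if not chunk:
            # At end of file: wait a bit instead of spinning.
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        partial += chunk
        if partial.endswith("\n"):
            # Only hand back a line once the writer has finished it.
            yield partial.rstrip("\n")
            partial = ""
```

With that in place, the bottom of the script could shrink to something like `for line in follow(f): process(line.strip())`.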
There really isn’t any reason not to use an IRC client library and capture the events that you want to speak directly. I didn’t go that way because frankly I didn’t want to spend more than 10 minutes testing out the idea. I’ll probably code that up soon though.
Other things:
- It doesn’t do good things with emoticons like :-). You could build a dictionary to map things like that to words like smile or grin. Ditto for things like “hmmm” and “hahaha”.
- It announces the speaker before every message. If the next message is the same person, it could just say what they said without the redundant introduction.
- It doesn’t generalize to reading my Twitter feed.
- It should also drop things that look like URLs (they aren’t interesting to hear read out loud).
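A few of those could probably be handled with a small cleanup step before the text goes to festival. Here is a rough sketch of the emoticon, URL and repeated-speaker ideas. The EMOTICONS table, the URL pattern, and the clean and Announcer names are all assumptions made up for this example, and it doesn’t try to deal with “hmmm” or “hahaha”.

```python
import re

# Hypothetical emoticon-to-word table; extend to taste.
EMOTICONS = {":-)": "smile", ":)": "smile", ";-)": "wink", ":-(": "frown"}

# A deliberately loose guess at what "looks like a URL".
URL_PAT = re.compile(r"https?://\S+|www\.\S+")

def clean(text):
    """Drop URLs and replace known emoticons with words."""
    text = URL_PAT.sub("", text)
    words = [EMOTICONS.get(w, w) for w in text.split()]
    return " ".join(words)

class Announcer:
    """Remembers the last speaker so repeated introductions can be skipped."""

    def __init__(self):
        self.last_nick = None

    def phrase(self, nick, text):
        text = clean(text)
        if not text:
            return None  # nothing left worth reading aloud
        if nick == self.last_nick:
            return text  # same speaker as before: skip the introduction
        self.last_nick = nick
        return "%s says %s" % (nick, text)
```

The say function would then write whatever phrase returns to festival, and skip the message entirely when it returns None.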
Funny, I had the same idea about the same time as you. My program wound up being written in Perl, and using espeak and mplayer, rather than Festival. Some nice features were that it filtered out log timestamps, and only spoke lines that were written by someone with a nick (rather than everything in the log, like join/part messages).
Another enhancement you might consider is assigning different voices to different nicks.