I’m in serious danger of digressing into trying to learn more about speech synthesis. The discovery of CSpeak, with its relative lack of intelligibility and its inability to properly articulate consonants, has me oddly intrigued, though. I’ve been digging for more information, and the source code for rsynth yielded a link to:
Dennis Klatt’s Software for a parallel/cascade formant synthesizer
Klatt’s synthesizer includes a 40 parameter model (compared to the 7 from CSpeak) and includes at least some limited tables for vowels and consonants. It also includes FORTRAN (ugh) source for the model. I’ll have to look at it more carefully, but I’m bookmarking it for later consumption.
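For a taste of what a formant synthesizer does, here is a minimal Python sketch of the cascade idea: a buzzy source pushed through a chain of two-pole digital resonators, one per formant. The resonator difference equation and coefficient formulas are the standard ones used for this kind of model. Everything else is my own illustration and not taken from Klatt's FORTRAN: the function names, the 10 kHz sample rate, the impulse-train source, and the rough formant frequencies and bandwidths for an "ah"-like vowel are all assumptions.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Run a signal through a single two-pole resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, sample_rate):
    """Chain resonators in series; formants is a list of (freq_hz, bw_hz)."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, sample_rate)
    return signal


def impulse_train(f0_hz, n_samples, sample_rate):
    """Crude voicing source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Illustrative values only (assumed, not from Klatt's tables).
SAMPLE_RATE = 10000
AH_FORMANTS = [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)]

source = impulse_train(100.0, SAMPLE_RATE // 2, SAMPLE_RATE)  # 0.5 s at 100 Hz
vowel = cascade(source, AH_FORMANTS, SAMPLE_RATE)
```

The `vowel` list holds unscaled float samples; to hear it you would still need to normalize it and write it out as audio. A real synthesizer along these lines also needs a better glottal source, a noise source, and a parallel branch, which is where most of those extra parameters go.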
Addendum: The comp.speech archive might have some useful bits in it, including implementations of this model.
I spent part of today typing the code in. It has a couple of nonstandard bits in it, so no output yet…
You might also want to get a copy of “From Text to Speech: The MITalk System” by Allen, Hunnicutt and Klatt.
I’ve been pretty impressed with the speech quality of eSpeak, myself, but I haven’t looked to see how it actually does it. It seems to be eminently hackable, too, in the little that I’ve done with it.