I’ve been thinking about making a kind of “codec2 robot” that people can telnet to and get responses from, and toward that end, I thought I’d see how it did on synthetic speech, since I thought being able to use a speech synthesizer for responses would be good. I had a file generated from my “mscript” Morse code tutorial generator thingie lying around, and passed it through Codec2.
Original sound clip, voice generated on a MacBook, downsampled to 8 kHz.
Compressed via Codec 2.
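For anyone who wants to try this at home, the round trip is only a couple of commands. This is a sketch from memory: the tool names, flags, and the 3200 bit/s mode (sox for resampling, c2enc/c2dec from the codec2 source tree) are assumptions you may need to adjust for your build. The DRY_RUN guard just prints each command, so the sketch runs even without the tools installed.

```shell
# Sketch of the test pipeline. sox, c2enc, c2dec, play and the 3200 bit/s
# mode are assumptions -- adjust for your sox/codec2 installation.
# DRY_RUN=1 (the default here) only prints each command; unset it to run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi; }

IN=voice.wav        # hypothetical clip from the speech synthesizer
RAW=voice_8k.raw    # 8 kHz, 16-bit signed, mono raw PCM (what c2enc wants)

# Downsample to 8 kHz mono raw samples
run sox "$IN" -r 8000 -c 1 -e signed -b 16 -t raw "$RAW"

# Round trip through Codec 2
run c2enc 3200 "$RAW" voice.c2
run c2dec 3200 voice.c2 voice_out.raw

# Listen to the reconstructed audio
run play -r 8000 -e signed -b 16 -t raw voice_out.raw
```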
It’s quite startling: the segments containing Morse simply disappear. I’m not sure I’d call that a flaw, but it may limit the usability of the codec in certain cases (like the CW practice nets held on some repeaters). Does anyone know if D-STAR/AMBE does a similarly bad job of reconstructing these relatively pure tones?
Just curious.
Thanks Mark, that is an interesting test. Speech codecs are highly optimised for human speech. This is especially true of low bit rate codecs, which fit a model of human speech production and send model parameters (rather than the waveform) to the decoder. When a signal falls outside that model, the codec falls over.
In this example, very few humans can emit a single pure tone, and the rate at which the CW tone is keyed (switched on and off) is faster than a human can articulate sounds.
A classic case is passing DTMF tones, very important for telephony circuits that use codecs like G.729. The usual approach is to hook up a DTMF detector in parallel with the speech encoder. When valid DTMF tones are detected, the codec is disabled and a special frame is transmitted that just says to the decoder “synthesise these DTMF tones pls”.
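The encoder-side switch can be sketched in a few lines. Everything here is invented for illustration — the frame strings, and a simple valid-digit check standing in for a real tone detector — but it shows the shape of the idea:

```shell
# Toy model of the DTMF bypass: a detector runs in parallel with the
# speech encoder; when it reports a valid digit, the vocoder is skipped
# and a tiny event frame goes out instead. Frame formats are made up.
encode_frame() {
  digit=$1   # empty when the detector saw no valid DTMF digit
  case "$digit" in
    [0-9A-D*#]) echo "DTMF-EVENT digit=$digit" ;;         # out-of-band event
    *)          echo "SPEECH-FRAME <vocoder parameters>" ;;
  esac
}

encode_frame 5    # -> DTMF-EVENT digit=5
encode_frame ""   # -> SPEECH-FRAME <vocoder parameters>
```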
Here is the cool thing. As this is an open source codec, we could add algorithms to handle CW, just like many commercial vocoders handle DTMF. Part of the process at the encoder could be decoding (demodulating?) the morse into ASCII.
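Once the keying has been demodulated into dots and dashes, the Morse-to-ASCII step really is just a table lookup. A quick sketch (letters only; the hard part, pulling clean dits and dahs out of the audio, is not shown):

```shell
# Map one demodulated Morse symbol (dots and dashes) to a letter.
morse_char() {
  case "$1" in
    .-) echo A ;;   -...) echo B ;; -.-.) echo C ;; -..) echo D ;;
    .) echo E ;;    ..-.) echo F ;; --.) echo G ;;  ....) echo H ;;
    ..) echo I ;;   .---) echo J ;; -.-) echo K ;;  .-..) echo L ;;
    --) echo M ;;   -.) echo N ;;   ---) echo O ;;  .--.) echo P ;;
    --.-) echo Q ;; .-.) echo R ;;  ...) echo S ;;  -) echo T ;;
    ..-) echo U ;;  ...-) echo V ;; .--) echo W ;;  -..-) echo X ;;
    -.--) echo Y ;; --..) echo Z ;; *) echo '?' ;;
  esac
}

# Decode a space-separated string of Morse symbols into ASCII.
morse_to_ascii() {
  out=""
  for sym in $1; do out="$out$(morse_char "$sym")"; done
  echo "$out"
}

morse_to_ascii "-.-. --.-"   # prints CQ
```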
Cheers,
David
Sure thing, David. Sending Morse, DTMF, or other signalling tones might be useful in many applications of interest to hams. I was actually just testing to see whether the synthesized voice was still intelligible, and it is indeed. I am trying to set up an echo server, just for fun/testing.
It’s great fun to test the limits of codec2 (and other voice encoding schemes) by, for instance, setting up an audio loopback and singing into the mike or playing a musical instrument. People interested in playing with speech synthesis and codec2 (which work very well together) might want to copy the bash script I’ve posted here:
http://ve9qrp.blogspot.com/2010/11/festival-speech-synthesis-and-codec2.html
One approach to the issues you raise here, Mark, would be to have multiple codecs at hand and a standard for metadata indicating which codec is in play. It’s a great pity that the D-STAR protocols do not include such metadata for their Digital Voice channel, which makes it very difficult to swap in Codec2 seamlessly.
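To make the metadata idea concrete, here is a toy receiver-side dispatch. The codec IDs and the “header|payload” framing are invented purely for illustration, not any real protocol:

```shell
# Toy per-frame codec metadata: a short header names the codec so the
# receiver can pick a decoder, and codecs can be swapped mid-stream.
decode_frame() {
  frame=$1
  codec=${frame%%|*}     # header before the first '|'
  payload=${frame#*|}
  case "$codec" in
    C2)   echo "decode with Codec2: $payload" ;;
    AMBE) echo "decode with AMBE: $payload" ;;
    *)    echo "unknown codec '$codec', dropping frame" ;;
  esac
}

decode_frame "C2|<one frame of Codec2 bits>"   # -> decode with Codec2: <one frame of Codec2 bits>
```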
As for an echo server, I’ve been playing with a sort of ‘Skype call testing’ scenario, using this script:
ncat --udp -l $LOCAL_HOST $LOCAL_PORT --sh-exec 'netstat -nu | grep
$LOCAL_HOST:$LOCAL_PORT | head -1 | cut -c 45-57 > /tmp/out.ip ;
timelimit -t10 -T1 cat > /tmp/out.c2 ; cat /tmp/morig.c2 /tmp/out.c2 >
/tmp/out2.c2 ; ncat --udp `cat /tmp/out.ip` $LOCAL_PORT < /tmp/out2.c2'
This copies the first 10 seconds of a streamed UDP transmission to our local host, then plays back a canned message (morig.c2) followed by the stored 10 seconds. The only difficulty is that if more than one person connects at the same time, I'm not sure netstat will properly report each person's IP address. If there is a way to ensure that netstat prints the connections in chronological order, then this script's concept will work.
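One possible fix for the netstat race: if your ncat is the Nmap one, I believe it exports the peer's address and source port to --sh-exec children as NCAT_REMOTE_ADDR and NCAT_REMOTE_PORT (check your man page — this is an assumption about your ncat version), so each connection can build its own reply without grepping netstat at all. Sketched here as a helper script you would point --sh-exec at:

```shell
# Per-connection handler for:
#   ncat --udp -l $LOCAL_HOST $LOCAL_PORT --sh-exec /tmp/c2echo.sh
# Relies on ncat exporting NCAT_REMOTE_ADDR/NCAT_REMOTE_PORT (an
# assumption; verify against your ncat man page). Written to a file so
# nothing here needs an open port to run.
cat > /tmp/c2echo.sh <<'EOF'
#!/bin/sh
# Record up to 10 seconds from this peer, using $$ so concurrent
# connections don't clobber each other's temp files.
timelimit -t10 -T1 cat > /tmp/out.$$.c2
cat /tmp/morig.c2 /tmp/out.$$.c2 > /tmp/out2.$$.c2
# Reply straight to the address and port this peer sent from.
ncat --udp "$NCAT_REMOTE_ADDR" "$NCAT_REMOTE_PORT" < /tmp/out2.$$.c2
rm -f /tmp/out.$$.c2 /tmp/out2.$$.c2
EOF
chmod +x /tmp/c2echo.sh
```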
A matching script to connect to this server, or to make C2 QSOs is at http://dl.dropbox.com/u/2142578/Blog%20Files/c2qso.sh