28 | October | 2009 | brainwagon

This is just a brief note to myself to archive, well, what amounts to a single command that processes an audio file, but to stand as a placeholder for an interesting topic that I know virtually nothing about. 🙂

Here’s the basic idea: Amateur radio SSB communications are limited to narrow bandwidths (around 2.7 kHz is considered fairly typical). Unlike conventional recording, where fidelity might be the goal, what matters in amateur communications is how easily the voice can be understood. Therefore, it’s fairly common for rigs to employ some kind of speech processing circuitry to modify the incoming voice signal to maximize its intelligibility. Often this is some hardware inside your rig, but it’s not unheard of for this functionality to be provided via software, either running on an embedded processor in your transceiver, or even as an external process running on your computer.

I was interested in trying to sort out what kind of processing was needed. It seemed to me that there were several components.

Resample the audio down to a lower sample rate (say around 8 kHz).
Bandpass filter the audio. My voice is fairly deep, but the frequencies below about 300 Hz aren’t very important for intelligibility. Wasting energy by sending them is probably misguided. Similarly for frequencies above 2700 Hz or so, not for intelligibility, but because we don’t want our signals to be wide.
We want to have some kind of a “noise gate”. Background noise should drop out entirely.
Companding. We want lots of power to be placed in all the places where the noise gate isn’t on.
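As a rough illustration of the noise gate step in the list above, here is a minimal Python sketch. It is not what sox or any rig actually does: the function name `noise_gate`, the 160-sample frame (20 ms at 8 kHz), and the -40 dB threshold are all assumptions picked for illustration. It simply silences any frame whose RMS level falls below the threshold.

```python
import math


def noise_gate(samples, frame_len=160, threshold_db=-40.0):
    """Zero every frame whose RMS level is below threshold_db.

    samples are floats in the range -1.0..1.0, so levels are in dB
    relative to full scale. frame_len and threshold_db are
    illustrative defaults, not values taken from any real rig.
    """
    gated = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db >= threshold_db:
            gated.extend(frame)                # loud enough: pass through
        else:
            gated.extend([0.0] * len(frame))   # too quiet: mute the frame
    return gated


# One quiet frame (about -60 dB) followed by one loud frame (about -6 dB).
quiet = [0.001] * 160
loud = [0.5] * 160
result = noise_gate(quiet + loud)
```

A hard gate like this switches abruptly at frame boundaries, which is part of why a compander with attack and decay times is the gentler tool.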

To prototype this idea, I recorded some audio (on my iPhone, it was convenient):

CQ from K6HX at 44.1 kHz, 16-bit stereo sound, no processing…

I then used sox, the audio “swiss army knife”, to process this sound with the following command:

sox memo.wav -r 8000 -c 1 compand.wav \
  highpass 300 lowpass 2400 \
  compand 0.1,0.1 -60,-60,-40,-10,-20,-8,-10,-6,-6,-5,-3,-3 -6 \
  polyphase

The command is a bit cryptic, but here’s the explanation. It specifies the input as memo.wav and the output as compand.wav, resampled to 8 kHz with only one channel. The rest of the command specifies the processing to be done. The highpass and lowpass effects trim the audio spectrum (I couldn’t remember how to specify bandpass filters with sox, but this works). They are followed by the “compand” effect, which specifies attack and decay times of 0.1 seconds, followed by a list of points on a transfer curve. Each pair specifies an input level in dB and the corresponding output level. Hence, -60 dB signals get mapped to -60 dB, but -40 dB signals get boosted to -10 dB, -20 dB to -8 dB, and so on. The overall signal is then reduced by 6 dB to avoid clipping. The polyphase effect specifies a reasonably high quality rate converter to be used to downsample to 8 kHz.
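To make the list of input/output pairs more concrete, here is a small Python sketch that treats them as points on a piecewise-linear curve in dB and interpolates between them. This is only a model of the idea, not sox’s implementation: the name `compand_db` is made up, the extra (0, 0) end point and the unity slope below the first point are assumptions, and the attack/decay smoothing is ignored entirely.

```python
# Transfer points from the command above: (input dB, output dB).
POINTS = [(-60, -60), (-40, -10), (-20, -8), (-10, -6), (-6, -5), (-3, -3)]


def compand_db(level_db, points, gain_db=0.0):
    """Map an input level in dB to an output level in dB.

    Linear interpolation between neighbouring points, plus an overall
    gain. Behaviour outside the listed points is an assumption.
    """
    pts = sorted(points)
    if pts[-1][0] < 0.0:
        pts.append((0.0, 0.0))  # assumed end point: 0 dB in, 0 dB out
    if level_db <= pts[0][0]:
        # Assumed: below the first point the curve has unity slope.
        out = pts[0][1] + (level_db - pts[0][0])
    elif level_db >= pts[-1][0]:
        out = pts[-1][1]
    else:
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= level_db <= x1:
                out = y0 + (y1 - y0) * (level_db - x0) / (x1 - x0)
                break
    return out + gain_db


for level in (-60, -40, -30, -20):
    print(level, "dB in ->", compand_db(level, POINTS, gain_db=-6.0), "dB out")
```

With the -6 dB overall gain included, a -40 dB input comes out at -16 dB and a -60 dB input at -66 dB, which shows how strongly the quiet-but-not-silent parts of the voice are lifted relative to the background.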

And, here’s the resulting voice file:

Same file, K6HX calling CQ, but at 8 kHz, bandpass filtered and companded

The overall quality seems quite good, the intelligibility seems high, and the result is reasonably free of annoying artifacts. The only problem I see is in the pronunciation of “KAY” in my callsign: we lose most of the explosive sound at the beginning, and it sounds a lot like “AY”. I can probably adjust the attack/decay a bit to help that. I’ll muck with it some more in the future. Ultimately my idea would be to implement a simple speech processor like this as a portable standalone project. Using portaudio it should be very straightforward.

The Knightsqrss mailing list still intrigues me, although my own qrss beacon has been off the air for a while (I’m trying to get around to homebrewing its replacement, but have been distracted by other facets of life for a while). Recently, Eddie G3ZJO offered this rather interesting circuit which I scratched my head about for a while:

This “WSPR-ORGAN” is an interesting idea. It uses a 74HC86 Quad Exclusive OR gate in a clever way. Two of the gates are configured as oscillators/inverters fed by a pair of identical crystals, one running with a fixed capacitance, the other with some capacitance switched in via a control signal. The output of these two oscillators is fed into a third XOR gate which serves as a mixer, producing the sum and difference of the two signals. Eddie is interested in the difference, which is small (in the audible range) and which has little drift, since the two oscillators will likely have identical drift characteristics, which cancel out in the difference signal. What we essentially have is a very precise voltage controlled oscillator which puts out a signal in the audio range. Pretty neat.
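To convince myself that an exclusive-OR gate really does act as a mixer, here is a small Python simulation sketch. It is a toy model, not Eddie’s circuit: the two “oscillators” are ideal square waves at 10.0 kHz and 10.1 kHz (scaled far down from real crystal frequencies so it runs quickly), a moving average stands in for a low-pass filter, and the function names are made up. The averaged XOR output swings up and down at the difference frequency.

```python
def square_wave(freq_hz, t):
    """Ideal 50% duty square wave: 1 in the first half of each cycle."""
    return 1 if (freq_hz * t) % 1.0 < 0.5 else 0


def beat_frequency(f1_hz, f2_hz, sample_rate_hz=1_000_000,
                   duration_s=0.1, window=1000):
    """Estimate the low-frequency beat at the output of an XOR 'mixer'."""
    n_samples = int(sample_rate_hz * duration_s)
    mixed = [
        square_wave(f1_hz, n / sample_rate_hz)
        ^ square_wave(f2_hz, n / sample_rate_hz)
        for n in range(n_samples)
    ]
    running = 0      # sum of the most recent `window` XOR outputs
    high = False     # Schmitt-trigger state, to ignore ripple
    rising = 0       # number of low-to-high transitions seen
    for n, bit in enumerate(mixed):
        running += bit
        if n >= window:
            running -= mixed[n - window]
        if n < window - 1:
            continue  # wait until the averaging window is full
        avg = running / window
        if not high and avg > 0.7:
            high = True
            rising += 1
        elif high and avg < 0.3:
            high = False
    return rising / duration_s


estimate_hz = beat_frequency(10_000.0, 10_100.0)
print("estimated beat:", estimate_hz, "Hz")
```

The estimate should land close to the 100 Hz difference between the two inputs. The sum frequency is there too, but the averaging removes it, just as a simple RC filter would.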

The resulting discussion on the Knights was similarly interesting, and led me to a series of interesting web pages. I hadn’t seen the 74HC86 used in this application before, but a little quick Googling led me to Ian, VK3KRI’s page on using the 74HC86 as a crystal synthesizer. A diversion into the world of “subharmonic” mixers yielded a link to this interesting receiver design for QRSS, which uses a common 5.0688 MHz oscillator that is tricked into operating at twice the frequency to provide good reception around the QRSS watering hole at 10.140 MHz. This page also links to LA8AK’s page, which explains some of the interesting mixer ideas. Too much for me to absorb before coffee, but all cool stuff.
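The arithmetic behind the 5.0688 MHz trick is easy to check. The sketch below assumes a direct-conversion style receiver in which the mixer effectively works at twice the crystal frequency, so a signal shows up as an audio tone at the difference. The function name is made up for illustration.

```python
CRYSTAL_HZ = 5_068_800   # the common 5.0688 MHz crystal
QRSS_HZ = 10_140_000     # QRSS watering hole at 10.140 MHz


def audio_offset_hz(crystal_hz, signal_hz, harmonic=2):
    """Difference between the signal and the effective mixing frequency."""
    return signal_hz - harmonic * crystal_hz


offset = audio_offset_hz(CRYSTAL_HZ, QRSS_HZ)
print(offset)  # 2400
```

Twice 5.0688 MHz is 10.1376 MHz, so a signal at 10.140 MHz would land at 2.4 kHz, comfortably inside the audio range of a sound card.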

Addendum: Checking the prices at Digikey reveals that 5.0688 MHz crystals are in fact a very common item, and can be had for about $0.40 apiece. Frankly, the 10.140 MHz crystals offered by Expanded Spectrum Systems aren’t exactly burdensome, but it’s interesting to see other designs which make use of even more common crystal frequencies.