This is just a brief note to myself to archive, well, what amounts to a single command that processes an audio file, but to stand as a placeholder for an interesting topic that I know virtually nothing about. 🙂
Here’s the basic idea: amateur radio SSB communications are limited to narrow bandwidths (around 2.7 kHz is considered fairly typical). Unlike conventional recording, where fidelity is usually the goal, the goal in amateur communications is to make the voice as easy to understand as possible. It’s therefore fairly common for rigs to employ some kind of speech processing circuitry that modifies the incoming voice signal to maximize its intelligibility. Often this is hardware inside your rig, but it’s not unheard of for this functionality to be provided in software, either running on an embedded processor in your transceiver or as an external process running on your computer.
I was interested in trying to sort out what kind of processing was needed. It seemed to me that there were several components.
- Resample the audio down to a lower sample rate (say around 8 kHz).
- Bandpass filter the audio. My voice is fairly deep, but the frequencies below about 300 Hz aren’t very important for intelligibility, so spending transmit energy on them is probably misguided. Frequencies above 2700 Hz or so should go too, not for intelligibility, but because we don’t want our signals to be wide.
- We want to have some kind of a “noise gate”. Background noise should drop out entirely.
- Companding. We want lots of power to be placed in all the places where the noise gate isn’t on.
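As a rough illustration of the noise gate item in the list above, here is a minimal Python sketch. It assumes the samples are floats between -1 and 1. The function name `noise_gate`, the -50 dB threshold and the 80-sample frame (10 ms at 8 kHz) are illustrative placeholders, not values taken from sox or from any particular rig.

```python
import math

def noise_gate(samples, threshold_db=-50.0, frame=80):
    """Mute every frame whose RMS level is below threshold_db (dB re full scale)."""
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        # A silent block has no defined level in dB; treat it as minus infinity.
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db >= threshold_db:
            out.extend(block)                # loud enough: pass through unchanged
        else:
            out.extend([0.0] * len(block))   # too quiet: drop out entirely
    return out

# 10 ms of faint hiss (about -60 dB) followed by 10 ms of "voice" (about -6 dB)
gated = noise_gate([0.001] * 80 + [0.5] * 80)
```

A hard on/off gate like this can click at frame boundaries; a real processor would probably fade the gain in and out instead.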
To prototype this idea, I recorded some audio (on my iPhone, it was convenient):
CQ from K6HX at 44.1 kHz, 16-bit stereo sound, no processing…
I then used the audio “swiss army knife”, sox, to process this sound using the following command:
sox memo.wav -r 8000 -c 1 compand.wav \
highpass 300 lowpass 2400 \
compand 0.1,0.1 -60,-60,-40,-10,-20,-8,-10,-6,-6,-5,-3,-3 -6 \
polyphase
The command is a bit cryptic, but here’s the explanation. It specifies the input as memo.wav and the output as compand.wav, resampled to 8 kHz and reduced to one channel. The rest of the command specifies the processing to be done. The highpass and lowpass effects trim the audio spectrum (I couldn’t remember how to specify bandpass filters with sox, but this works). They are followed by a “compand” effect, which specifies attack and decay times of 0.1 seconds, followed by a list of points on the transfer curve. Each pair gives an input level in dB and the output level it maps to. Hence, -60 dB signals get mapped to -60 dB, but -40 dB signals get boosted to -10 dB, -20 dB to -8 dB, and so on. The overall signal is then reduced by 6 dB to avoid clipping. The polyphase effect specifies a reasonably high quality rate converter to be used to downsample to 8 kHz.
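To make the transfer curve concrete, here is a small Python sketch that evaluates the same list of points. It is a simplification under stated assumptions, not sox’s actual implementation: it interpolates linearly in dB between the listed points, assumes an extra 0,0 point at the top, assumes levels below the first point keep that point’s offset, and ignores the attack and decay smoothing. The names `POINTS`, `GAIN_DB` and `compand_db` are made up for this example.

```python
# (input dB, output dB) pairs from the compand command, plus an assumed 0,0 end point
POINTS = [(-60, -60), (-40, -10), (-20, -8), (-10, -6), (-6, -5), (-3, -3), (0, 0)]
GAIN_DB = -6  # the trailing "-6" in the command: overall gain applied after the curve

def compand_db(in_db):
    """Map a steady input level in dB to an output level in dB."""
    first_in, first_out = POINTS[0]
    last_in, last_out = POINTS[-1]
    if in_db <= first_in:
        # Assumption: below the first point, keep that point's in/out offset.
        return in_db + (first_out - first_in) + GAIN_DB
    if in_db >= last_in:
        return last_out + GAIN_DB
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if x0 <= in_db <= x1:
            # Straight-line interpolation between neighbouring points
            return y0 + (y1 - y0) * (in_db - x0) / (x1 - x0) + GAIN_DB

for level in (-60, -40, -30, -20, -10):
    print(level, "dB in ->", compand_db(level), "dB out")
```

Under these assumptions a steady -40 dB input comes out at -16 dB (boosted to -10 dB, then cut by 6 dB), while anything from -40 dB up to -3 dB is squeezed into a range only about 7 dB wide, which is where the “lots of power” comes from.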
And, here’s the resulting voice file:
Same file, K6HX calling CQ, but at 8 kHz, bandpass filtered and companded
The overall quality seems quite good: the intelligibility seems high, and the audio is reasonably free of annoying artifacts. The only problem I hear is in the pronunciation of “KAY” in my callsign. We lose most of the plosive sound at the beginning, and it sounds a lot like “AY”. I can probably adjust the attack/decay times a bit to help with that. I’ll muck with it some more in the future. Ultimately, my idea would be to implement a simple speech processor like this as a portable standalone project. Using portaudio, it should be very straightforward.
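One plausible reason for the lost “K” is that the level estimate driving the compander lags behind short bursts. The sketch below is a generic one-pole envelope follower with separate attack and decay times. It is an assumption about how such smoothing typically works, not a copy of sox’s code, and the name `envelope` and the 10 ms burst length are invented for illustration.

```python
import math

def envelope(samples, rate, attack_s, decay_s):
    """Track the signal level with exponential attack and decay smoothing."""
    attack = 1.0 - math.exp(-1.0 / (rate * attack_s))
    decay = 1.0 - math.exp(-1.0 / (rate * decay_s))
    level = 0.0
    out = []
    for x in samples:
        target = abs(x)
        # Rising signal: use the attack coefficient; falling signal: use decay.
        coef = attack if target > level else decay
        level += (target - level) * coef
        out.append(level)
    return out

rate = 8000
burst = [1.0] * 80  # a hypothetical 10 ms full-scale burst, standing in for the "K"
slow = envelope(burst, rate, attack_s=0.1, decay_s=0.1)[-1]
fast = envelope(burst, rate, attack_s=0.01, decay_s=0.1)[-1]
print(f"0.1 s attack: {slow:.3f}   0.01 s attack: {fast:.3f}")
```

With a 0.1 second attack, the follower has reached only about a tenth of the true level by the end of a 10 ms burst, so the compander would still be treating the burst as a very quiet signal when it is already over. A shorter attack closes most of that gap, which suggests the attack time is the first thing to try shortening.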