Audio processing for amateur radio voice communications…
This is just a brief note to myself to archive, well, what amounts to a single command that processes an audio file, but to stand as a placeholder for an interesting topic that I know virtually nothing about. 🙂
Here’s the basic idea: amateur radio SSB communications are limited to narrow bandwidths (around 2.7 kHz is considered fairly typical). Unlike conventional recording, where fidelity might be the goal, what matters in amateur communications is how well the listener can understand the voice. Therefore, it’s fairly common for rigs to employ some kind of speech processing circuitry that modifies the incoming voice signal to maximize its intelligibility. Often this is some hardware inside your rig, but it’s not unheard of for this functionality to be provided via software, either running on an embedded processor in your transceiver, or even as an external process running on your computer.
I was interested in trying to sort out what kind of processing was needed. It seemed to me that there were several components.
- Resample the audio down to a lower sample rate (say around 8 kHz).
- Bandpass filter the audio. My voice is fairly deep, but the frequencies below about 300 Hz aren’t very important for intelligibility, so spending transmitter power on them is probably wasted. Frequencies above 2700 Hz or so get cut too, not for intelligibility, but because we don’t want our signals to be wide.
- Add some kind of “noise gate”. Background noise should drop out entirely.
- Companding. Wherever the noise gate is open, we want the signal to carry as much power as possible.
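As a rough illustration of the noise gate step, here is a minimal Python sketch that mutes any short block of samples whose RMS level falls below a threshold. The function names, the -50 dB threshold, and the 80-sample block size are all made up for the example and are not from any particular rig or tool; a real gate would normally fade in and out rather than switch abruptly.

```python
import math


def rms_db(block):
    """Return the RMS level of a block in dB relative to full scale (1.0)."""
    if not block:
        return float("-inf")
    mean_square = sum(s * s for s in block) / len(block)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def noise_gate(samples, threshold_db=-50.0, block_size=80):
    """Zero every block whose RMS level is below threshold_db.

    threshold_db and block_size are illustrative assumptions;
    80 samples is 10 ms at an 8 kHz sample rate.
    """
    out = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        if rms_db(block) < threshold_db:
            out.extend([0.0] * len(block))  # gate closed: mute the block
        else:
            out.extend(block)               # gate open: pass it through
    return out
```

Samples are assumed to be floats in the range -1.0 to 1.0, so a constant 0.001 sits at about -60 dB and would be muted, while speech at normal levels passes through unchanged.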
To prototype this idea, I recorded some audio (on my iPhone, it was convenient):
CQ from K6HX at 44.1 kHz, 16-bit stereo sound, no processing…
I then used sox, the audio “Swiss Army knife”, to process this sound with the following command:
sox memo.wav -r 8000 -c 1 compand.wav \
    highpass 300 lowpass 2400 \
    compand 0.1,0.1 -60,-60,-40,-10,-20,-8,-10,-6,-6,-5,-3,-3 -6 \
    polyphase
The command is a bit cryptic, but here’s the explanation. It specifies the input as being memo.wav, and the output will be compand.wav, resampled to 8 kHz and only one channel. The rest of the command specifies some processing to be done. The highpass and lowpass commands are there to trim the audio spectrum (I couldn’t remember how to specify bandpass filters with sox, but this works), followed by a “compand” function, which specifies attacks and decays at 0.1 seconds, followed by a list of gains. Each pair specifies an input power in dB and the output power it maps to. Hence, -60 dB signals get mapped to -60 dB, but -40 dB signals get boosted to -10 dB, -20 dB to -8 dB, and so on. The overall signal is reduced by 6 dB to avoid clipping. The polyphase command specifies a reasonably high quality rate converter to be used to downsample to 8 kHz.
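To make the list of gain pairs a little less cryptic, here is a small Python sketch that evaluates the transfer function at a given input level. The breakpoints and the -6 dB gain are copied from the command above; everything else is an assumption for illustration: straight-line interpolation between breakpoints, a unity slope below the first one, and an implied final point at 0,0 (which is my reading of the sox manual). It ignores attack and decay entirely, and sox's actual curve may differ in detail.

```python
# Breakpoints (input dB, output dB) from the compand argument in the post,
# plus an implied (0, 0) end point, which is an assumption about sox.
POINTS = [(-60, -60), (-40, -10), (-20, -8), (-10, -6), (-6, -5), (-3, -3), (0, 0)]
GAIN_DB = -6  # the trailing "-6" applied to the overall signal


def compand_db(level_db, points=POINTS, gain_db=GAIN_DB):
    """Map a steady input level in dB to an output level in dB."""
    first_in, first_out = points[0]
    if level_db <= first_in:
        # Assumed unity slope below the first breakpoint.
        return level_db - first_in + first_out + gain_db
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if level_db <= x1:
            # Linear interpolation between neighboring breakpoints.
            return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0) + gain_db
    return points[-1][1] + gain_db  # clamp above the last breakpoint


if __name__ == "__main__":
    for level in (-80, -60, -50, -40, -20, -3):
        print(level, "->", compand_db(level))
```

Under these assumptions a steady -40 dB input comes out at -16 dB (boosted to -10 dB, then reduced by the 6 dB overall gain), while anything at or below -60 dB is simply passed along 6 dB quieter.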
And, here’s the resulting voice file:
Same file, K6HX calling CQ, but at 8 kHz, bandpass filtered and companded
The overall quality seems quite good: intelligibility seems high, and the result is reasonably free of annoying artifacts. The only problem I hear is in the pronunciation of “KAY” in my callsign: we lose most of the explosive sound at the beginning, and it sounds a lot like “AY”. I can probably adjust the attack/decay a bit to help that. I’ll muck with it some more in the future. Ultimately my idea would be to implement a simple speech processor like this as a portable standalone project. Using portaudio it should be very straightforward.
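One way to get a feel for the attack setting is to watch how a level detector responds to a sudden onset. The Python sketch below uses a simple one-pole smoother with separate attack and decay time constants. That smoother is an assumption chosen for illustration, not sox's actual detector, and the function name and defaults are hypothetical. Whether this lag is what actually eats the start of “KAY” is a guess that would need testing with different settings.

```python
import math


def envelope(samples, sample_rate=8000, attack=0.1, decay=0.1):
    """Track signal level with a one-pole smoother (illustrative only).

    attack and decay are time constants in seconds; the detector in a
    real compander may be computed differently.
    """
    attack_coeff = math.exp(-1.0 / (attack * sample_rate))
    decay_coeff = math.exp(-1.0 / (decay * sample_rate))
    level = 0.0
    out = []
    for s in samples:
        magnitude = abs(s)
        # Use the attack constant while rising, the decay constant while falling.
        coeff = attack_coeff if magnitude > level else decay_coeff
        level = coeff * level + (1.0 - coeff) * magnitude
        out.append(level)
    return out


if __name__ == "__main__":
    step = [1.0] * 800  # 0.1 s of a full-scale onset at 8 kHz
    for attack in (0.1, 0.01):
        env = envelope(step, attack=attack)
        print("attack", attack, "level after 20 ms:", round(env[159], 3))
```

With a 0.1 second time constant the detected level is still well below the true level 20 ms after the onset, so a short consonant is over before the detector has caught up; a shorter attack closes that gap much faster.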
Comments
Comment from Mark VandeWettering
Time 10/28/2009 at 4:39 pm
I believe that it applies the filters in left to right order, but I’m a bit confused as to the way it does resampling. I specified the “polyphase” resampler, which it warns is obsolete, and that I should check the manual page, but the man page is remarkably unhelpful in telling me what the RIGHT answer is. Oh well.
(I’ve broken the command line across multiple lines so you have a chance of seeing it in all its glory.)
Comment from Guido
Time 11/1/2009 at 4:50 am
Would it not be most efficient if just the fundamental frequencies of a voice were tracked and transmitted?
So the voice signal is transformed into a sequence of frequencies to which a PLL or DDS is tuned over time.
73, Guido
Comment from BBG
Time 6/11/2010 at 8:50 pm
I think that the information you have posted is quite informative. I have been doing a lot of searches to try and locate this info. It is clear to me that you are super educated in the communications space. I realize that it is extremely difficult work to come up with new stuff… continue the good work. It is appreciated.
Comment from AC0KG
Time 10/28/2009 at 2:24 pm
Does sox apply the filters in any particular order? What difference might it make to the resulting audio quality? I would expect that the downsampling should be one of the last filters applied.
Might be interesting to try to identify a fundamental or primary pitch in the original sample and, if it falls above or below an ideal range for intelligibility in the processed signal, pitch-shift it toward the threshold. This might reduce the loss of usable content that falls outside the high/low cut-off at the expense of making the user sound less like himself or herself.