Audio compression compared…

April 29, 2003 | My Projects | By: Mark VandeWettering

I’ve been interested in compression technology of all kinds for what seems like forever. On Saturday I did some informal testing of the Ogg Vorbis codec and the new speech-only codec, Speex.

I compress all my music (small collection that it is) using Ogg Vorbis. I’ve found that 96kbps Ogg is either indistinguishable from or slightly better than 128kbps MP3, and I like the idea that I’m using an open codec to store my music. Ogg Vorbis is well supported on all the platforms I have (via plugins for WinAmp and XMMS), and just seems like a good choice.

I’ve noticed that even 64kbps Ogg files sound pretty good, certainly good enough to encode old time radio shows and the like. Lately I’ve been hearing a great deal about Speex, an Ogg codec designed especially for encoding speech. It is tuned for 8, 16 or 32kHz audio. So I decided to try using Speex to encode some material, and compare it to normal Ogg Vorbis at 64kbps.

For my source material, I used Rayzur’s Edge Star Wars: Second Strike, Act I. My original source material is the MP3 download, which is 30.5 megabytes, and is encoded with joint stereo at 128kbps. I used the venerable mpg123 to convert it back into a 547 megabyte wav file, ready for recompression. (Yes, I know this isn’t really a good test, since I used compressed audio source material. Sue me.)

First I tried recompressing the audio directly into Ogg Vorbis, specifying a 64kbps average bitrate. The resulting file is 21 megabytes, and has a bitrate of roughly 53kbps. I was expecting something in the 15 megabyte range (half the size of the MP3, since 64kbps is half of 128kbps), but it appears that the original doesn’t really use 128kbps even though that’s its nominal bitrate. Since this is an “old time radio” like program, there are probably many places where silence suppression can be used, and this reduces the overall bitrate considerably.
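The file sizes above are enough to check that hunch with a little arithmetic. Here is a rough sketch in Python. It assumes that mpg123 wrote ordinary CD-style PCM (44.1kHz, 16-bit, stereo), which the post doesn’t actually state, and the function name is just made up for illustration.

```python
# Assumption: the decoded WAV is CD-style PCM (44.1 kHz, 16-bit, stereo).
WAV_BYTES_PER_SECOND = 44_100 * 2 * 2  # sample rate * bytes per sample * channels


def average_kbps(file_mb, wav_mb=547.0):
    """Average bitrate of a compressed file, given the size of the decoded WAV.

    Running time is WAV size divided by the WAV byte rate, so the exact
    definition of "megabyte" cancels out and only the size ratio matters.
    The small WAV header is ignored.
    """
    wav_kbps = WAV_BYTES_PER_SECOND * 8 / 1000  # 1411.2 kbps for CD-style PCM
    return file_mb / wav_mb * wav_kbps


# Running time, assuming 1 MB = 10**6 bytes.
minutes = 547.0 * 1_000_000 / WAV_BYTES_PER_SECOND / 60

print(f"running time: about {minutes:.0f} minutes")
print(f"MP3 source:   about {average_kbps(30.5):.0f} kbps")
print(f"Ogg Vorbis:   about {average_kbps(21.0):.0f} kbps")
```

Under those assumptions the MP3 works out to an average of roughly 79kbps, well short of its nominal 128kbps, and the Ogg file to about 54kbps, which is close to the 53kbps figure above.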

I then set out to encode it with Speex. Speex wants audio at 8kHz, 16kHz, or 32kHz, so I had to resample it. I used the venerable sox utility at its highest quality setting, which was slow, but got the job done. When I downsampled the audio to 16kHz, I noticed considerable popping, which was caused by audio clipping, so I had to reduce the overall level of the audio to prevent it. Once I did that, I encoded the result in Speex’s wideband mode. The resulting file was 8.5 megabytes, more than a factor of three smaller than the original MP3. Unfortunately, the audio wasn’t really of acceptable quality. I could detect occasional squawks, and levels seemed to shift occasionally at the beginning and end of words, as if some kind of silence detector got fooled.
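For the curious, the level reduction amounts to something like the sketch below. This is not the command that was actually used, just an illustration in Python of the idea: scale 16-bit samples down before resampling so that any overshoot from the resampling filter has some headroom. The function names and the 0.8 target are arbitrary choices for the example.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def headroom_gain(samples, target_peak=0.8):
    """Return a gain (at most 1.0) that puts the loudest sample at
    target_peak of 16-bit full scale. target_peak is an arbitrary choice."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return 1.0  # silence: nothing to scale
    return min(1.0, target_peak * INT16_MAX / peak)


def apply_gain(samples, gain):
    """Scale samples by gain, rounding and clamping to the 16-bit range."""
    scaled = []
    for s in samples:
        v = int(round(s * gain))
        scaled.append(max(INT16_MIN, min(INT16_MAX, v)))
    return scaled


# Tiny made-up signal that touches full scale in both directions.
loud = [0, 16000, -32768, 32767]
quieter = apply_gain(loud, headroom_gain(loud))
print(quieter)
```

Audio that already sits near full scale can overshoot slightly when it is filtered during resampling, which is one plausible source of the popping described above.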

I then retried the encoding in ultra-wideband mode. The result was considerably better: not quite as nice as the Ogg Vorbis version, but I found it entirely acceptable. The good news is that the resulting file was only 9.5 megabytes, barely bigger than the wideband version, and still less than half the size of the Ogg Vorbis version.

Using Speex in ultra-wideband mode, you can get about sixty hours of audio onto a single CD with entirely reasonable quality. Recently I purchased Douglas Adams’s Dirk Gently’s Holistic Detective Agency and The Long Dark Tea-Time of the Soul in audio book form. Both books would easily fit on a single 128MB CompactFlash card.
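The sixty hour figure can be sanity-checked the same way as the bitrates. The sketch below is back-of-the-envelope only: it assumes a 700 megabyte CD, megabytes of 10**6 bytes, and a decoded WAV in CD-style PCM (44.1kHz, 16-bit, stereo), none of which is spelled out above, and the function name is made up.

```python
# Assumptions: CD-style PCM for the 547 MB WAV, and 1 MB = 10**6 bytes.
WAV_BYTES_PER_SECOND = 44_100 * 2 * 2  # sample rate * bytes per sample * channels
MB = 1_000_000


def hours_that_fit(capacity_mb, encoded_mb=9.5, wav_mb=547.0):
    """Hours of audio that fit in capacity_mb, if everything compresses
    like the test program did (9.5 MB in ultra-wideband mode)."""
    program_hours = wav_mb * MB / WAV_BYTES_PER_SECOND / 3600
    return capacity_mb / encoded_mb * program_hours


print(f"700 MB CD:   about {hours_that_fit(700):.0f} hours")
print(f"128 MB card: about {hours_that_fit(128):.1f} hours")
```

With those numbers a CD holds a bit over sixty hours, and a 128MB card somewhere between eleven and twelve hours, provided the material compresses about as well as this one test program did.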

I’ll have to experiment with this some more.