My recent playing with SSTV images coming from ARISSat-1 has made me think a bit more about SSTV. I used two different applications to decode the images (MMSSTV on Windows, and Multiscan on OS X), and got slightly different results from each. This leads me to ask: just what are the limits of performance for analog SSTV imagery, and how closely do existing applications approach these limits?
Whenever I ask a question like this, there seem to be two complementary approaches: doing research and doing experiments.
Luckily, courtesy of Google, research has never really been easier. Relatively quickly I found this article by Lionel and Roland Cordesses:
It's a great article, with many good ideas. The authors have apparently implemented them in the F2DC SSTV program, which is apparently in need of some repair (it was written for Borland tools and Win95) but could provide the basis for a new, good implementation. Their paper provides some basic ideas, though, and I'm not sure I agree 100% with their implementation of them, so perhaps it is more valuable as inspiration.
They use a fairly old but still useful technique based upon the Hough Transform to do high quality deskewing of the incoming image signal. Much of the paper is devoted to details of making that work. In the past, I've used RANSAC to figure out sync pulse locations in my NOAA satellite decoder. Their technique is almost certainly less computationally intensive, but RANSAC can match more complex models (including Doppler shift, which is important for NOAA images), and I find RANSAC less "fussy" to implement (it is relatively indifferent to the model you choose, which makes modularization easier).
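To make the RANSAC idea concrete, here is a minimal sketch of fitting a straight line (sync position versus line number) to detections that include some false syncs. It is not the code from my NOAA decoder and not the Hough approach from the paper. The function names, the tolerance, and the synthetic numbers (4922.1 samples per line, an offset of 100) are all made up for illustration.

```python
import random

def ransac_line(points, iterations=200, tolerance=2.0, seed=0):
    """Fit y = slope * x + intercept to (x, y) points, ignoring outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Hypothesise a line through two randomly chosen detections.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Count the detections that agree with the hypothesis.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine with ordinary least squares over the best consensus set.
    n = len(best_inliers)
    mean_x = sum(x for x, _ in best_inliers) / n
    mean_y = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in best_inliers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in best_inliers)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, best_inliers

def make_sync_detections(lines=60, samples_per_line=4922.1, offset=100.0, seed=1):
    """Synthetic sync positions (hypothetical numbers); some are false syncs."""
    rng = random.Random(seed)
    points = []
    for k in range(lines):
        pos = offset + samples_per_line * k + rng.uniform(-0.5, 0.5)
        if k % 7 == 3:
            pos += rng.uniform(200, 2000)  # spurious detection far from the true sync
        points.append((k, pos))
    return points

if __name__ == "__main__":
    slope, intercept, inliers = ransac_line(make_sync_detections())
    print("samples per line:", slope, "first sync at:", intercept)
```

The fitted slope is the true number of samples per scan line, which is exactly the quantity needed to deskew the image. Swapping in a more complicated model (say, one with a Doppler term) only changes the hypothesis and residual steps, which is why I find this approach easy to modularize.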
The other part of their paper deals with robustly estimating frequency. The simplest (and worst-performing) demodulators estimate frequency from just two or three samples surrounding the current pixel location. The authors instead determine frequency by convolving a window function 37 samples long around the current position. Since they are sampling at 44100 samples per second, that window spans about 0.84 ms. In the Martin-1 mode they use, a pixel is about 0.57 ms long, so they are looking at a window of just about 1.5 pixels. That seems okay to me, but I'm confused by some of the choices.
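Here is a rough sketch of what a windowed frequency estimate can look like. I'm not claiming this is the authors' estimator: it is a plain quadrature demodulator that mixes the audio down by an assumed 1900 Hz carrier, smooths with a Hann window, and reads the frequency off the phase advance between adjacent samples. The function names, the Hann window, and the 1900 Hz carrier are my own choices.

```python
import cmath
import math

def hann(taps):
    """Hann weights with non-zero endpoints, normalised to sum to 1."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 0.5) / taps) for k in range(taps)]
    total = sum(w)
    return [v / total for v in w]

def estimate_frequency(samples, centre, sample_rate, taps=37, carrier=1900.0):
    """Estimate the tone frequency in Hz around index `centre`.

    Needs taps // 2 samples before `centre` and taps // 2 + 1 after it.
    """
    w = hann(taps)
    half = taps // 2
    step = 2 * math.pi * carrier / sample_rate

    def baseband(n):
        # Mix down by `carrier`, then low-pass with the window centred on n.
        return sum(w[k] * samples[n - half + k]
                   * cmath.exp(-1j * step * (n - half + k))
                   for k in range(taps))

    # The phase advance over one sample is the offset from `carrier`.
    rotation = baseband(centre + 1) * baseband(centre).conjugate()
    return carrier + cmath.phase(rotation) * sample_rate / (2 * math.pi)

if __name__ == "__main__":
    rate = 44100
    for true_hz in (1500.0, 1900.0, 2300.0):
        tone = [math.cos(2 * math.pi * true_hz * m / rate) for m in range(200)]
        print(true_hz, estimate_frequency(tone, 100, rate))
```

Setting taps to 3 turns this into one of the "two or three sample" estimators, so the same function can be used to compare short and long windows. With a window this short the negative-frequency image of the real input is only partly rejected, so even a clean tone comes out approximately rather than exactly.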
They choose to digitize at 44100 samples per second, which seems excessive to me. SSTV information is confined to the region between 1200 and 2300 Hz, so according to the Nyquist theorem, even a sample rate of 8 kHz contains all the information needed to do a decode (assuming the signal actually is bandwidth limited). I was thinking of using 11025 samples per second. The window function corresponding to the 37-tap filter would then be about 9 samples long, which would still provide the same level of frequency discrimination, but at a lower computational cost. I can't imagine any practical reason to use finer sampling (DSP experts can point out any errors in my thinking in comments).
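The arithmetic behind those numbers is simple enough to check in a few lines. The helper names below are mine; the only inputs are the figures already quoted (37 taps, 44100 and 11025 samples per second, a 2300 Hz upper band edge).

```python
def window_ms(taps, sample_rate):
    """Duration of a filter window in milliseconds."""
    return 1000.0 * taps / sample_rate

def equivalent_taps(taps, old_rate, new_rate):
    """Number of taps that spans the same time at a different sample rate."""
    return taps * new_rate / old_rate

def nyquist_rate(highest_hz):
    """Sample rate that must be exceeded to capture content up to highest_hz."""
    return 2.0 * highest_hz

if __name__ == "__main__":
    print(window_ms(37, 44100))               # about 0.839 ms
    print(equivalent_taps(37, 44100, 11025))  # 9.25, so roughly 9 taps
    print(nyquist_rate(2300.0))               # 4600.0 Hz
```

Rounding 9.25 down to 9 keeps the tap count odd, which is convenient for a filter centred on the current sample, at the cost of a window that is very slightly shorter in time.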
The cool part of their system is that they estimate the signal-to-noise ratio of the incoming signal (by comparing power inside and outside the SSTV bandwidth) and use longer filters to estimate frequency when the SNR is poor. This trades resolution for noise immunity, which seems in practice to be quite effective.
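A sketch of how an in-band versus out-of-band SNR estimate might drive the filter length is below. This is my guess at the general shape, not the paper's algorithm. It assumes the noise is white across the analysed band, uses a slow direct DFT to stay dependency-free, and the thresholds and tap counts in choose_taps are invented for illustration.

```python
import cmath
import math
import random

def band_powers(block, sample_rate, low=1200.0, high=2300.0):
    """Return (in-band, out-of-band) mean power per DFT bin of a Hann-windowed block."""
    n = len(block)
    windowed = [block[m] * (0.5 - 0.5 * math.cos(2 * math.pi * (m + 0.5) / n))
                for m in range(n)]
    in_band, out_band = [], []
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        coeff = sum(windowed[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))
        freq = k * sample_rate / n
        (in_band if low <= freq <= high else out_band).append(abs(coeff) ** 2)
    return sum(in_band) / len(in_band), sum(out_band) / len(out_band)

def estimate_snr_db(block, sample_rate):
    """In-band SNR estimate in dB; out-of-band bins stand in for the noise floor."""
    in_mean, out_mean = band_powers(block, sample_rate)
    # Floor the excess so a block with no detectable signal reports -60 dB.
    excess = max(in_mean - out_mean, 1e-6 * out_mean)
    return 10.0 * math.log10(excess / out_mean)

def choose_taps(snr_db, base_taps=37):
    """Pick a longer (odd) filter as SNR drops. Thresholds are hypothetical."""
    if snr_db >= 20.0:
        return base_taps
    if snr_db >= 10.0:
        return 2 * base_taps + 1
    return 4 * base_taps + 1

def noisy_tone(sigma, rng, hz=1900.0, rate=11025, count=256):
    """Unit-amplitude test tone plus Gaussian noise of standard deviation sigma."""
    return [math.cos(2 * math.pi * hz * m / rate) + rng.gauss(0.0, sigma)
            for m in range(count)]

if __name__ == "__main__":
    rng = random.Random(3)
    for sigma in (0.05, 1.0):
        snr = estimate_snr_db(noisy_tone(sigma, rng), 11025)
        print(sigma, round(snr, 1), choose_taps(snr))
```

In a real receiver the radio's own passband shapes the out-of-band noise, so the white-noise assumption would need checking against actual recordings before trusting the absolute numbers.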
It would be interesting to make a modern implementation, using gcc and fftw3 as a basis, and documenting all of the design choices. I think it would also be good to test the decoders against both AWGN (additive white Gaussian noise) and perhaps an HF channel simulator to judge their performance. I'm most interested in the Robot36 mode, since that seems to be a common choice for satellites, but Scottie and Martin modes are also essential.
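For the AWGN half of that test plan, the noise injection itself is only a few lines. This is a sketch with a made-up function name. Note that the SNR here is measured over the full sampled bandwidth rather than just the SSTV passband, so the numbers are not directly comparable to an in-band figure.

```python
import math
import random

def add_awgn(samples, snr_db, seed=None):
    """Return a copy of samples with white Gaussian noise at the given SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

def measured_snr_db(clean, noisy):
    """SNR in dB recovered from a clean/noisy pair, for checking the generator."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    rate = 11025
    clean = [math.sin(2 * math.pi * 1900.0 * m / rate) for m in range(20000)]
    noisy = add_awgn(clean, 10.0, seed=7)
    print(measured_snr_db(clean, noisy))
```

Running each decoder over the same test image at a ladder of SNR values, with fixed seeds, would give repeatable comparisons between applications.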
If anyone else has any interesting references on SSTV transmission and reception, feel free to add them via comments.