While my blog has been dominated by radio-related stuff lately, I do continue to be interested in lots of different subjects, including various topics related to computer security and codes. While scanning my feeds today, I found a reference to this work, which I hadn’t seen before, but which I find interesting both for its security implications and its use of machine learning. Very cool.
Keyboard Acoustic Emanations Revisited
We examine the problem of keyboard acoustic emanations. We present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. There is no need for a labeled training recording. Moreover the recognizer bootstrapped this way can even recognize random text such as passwords: In our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning.
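To get a feel for how the statistics of English can stand in for labeled training data, here is a toy sketch of the Hidden Markov Model idea mentioned in the abstract. It is not the authors' implementation: the four-letter alphabet, the three acoustic clusters ("A", "B", "C"), and every probability in the tables are numbers I made up for illustration. The hidden states are letters, the observations are unlabeled keystroke-sound clusters, and Viterbi decoding picks the letter sequence that best balances "what the key sounded like" against "which letters tend to follow each other".

```python
# Toy illustration only: the alphabet, the cluster names and all
# probabilities below are invented for this example.
STATES = ["t", "h", "n", "e"]

# P(first letter): uniform in this toy model.
START = {s: 0.25 for s in STATES}

# P(next letter | current letter): a made-up stand-in for English bigrams.
TRANS = {
    "t": {"t": 0.05, "h": 0.60, "n": 0.05, "e": 0.30},
    "h": {"t": 0.10, "h": 0.05, "n": 0.05, "e": 0.80},
    "n": {"t": 0.40, "h": 0.05, "n": 0.05, "e": 0.50},
    "e": {"t": 0.30, "h": 0.10, "n": 0.50, "e": 0.10},
}

# P(sound cluster | letter): cluster "B" is ambiguous between "h" and "n",
# and by sound alone "n" is slightly more likely.
EMIT = {
    "t": {"A": 0.8, "B": 0.1, "C": 0.1},
    "h": {"A": 0.1, "B": 0.7, "C": 0.2},
    "n": {"A": 0.1, "B": 0.8, "C": 0.1},
    "e": {"A": 0.1, "B": 0.1, "C": 0.8},
}


def acoustic_only(observations):
    """Pick the best-sounding letter for each keystroke independently."""
    return "".join(max(STATES, key=lambda s: EMIT[s][o]) for o in observations)


def viterbi(observations):
    """Return the most probable letter sequence under the toy HMM."""
    if not observations:
        return ""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (START[s] * EMIT[s][observations[0]], s) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for cur in STATES:
            # Best predecessor for `cur`, weighted by the bigram probability.
            prob, path = max(
                (best[prev][0] * TRANS[prev][cur], best[prev][1])
                for prev in STATES
            )
            step[cur] = (prob * EMIT[cur][obs], path + cur)
        best = step
    return max(best.values())[1]


if __name__ == "__main__":
    sounds = ["A", "B", "C"]
    print(acoustic_only(sounds))  # tne
    print(viterbi(sounds))        # the
```

Judged by sound alone, the middle keystroke looks like an "n", but the bigram constraints pull the decoder to "the". For anything longer than a few keystrokes, a real decoder would work with log probabilities to avoid underflow.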