Retrieving Sound from Vibrations Captured on Video

September 2, 2014
MIT project

Using a simple camera, researchers have managed to reconstruct intelligible speech from the visually observed vibrations in a packet of potato chips and the leaves of a pot plant.

Turning a packet of potato chips into a microphone: this is the feat achieved by a research team from the Massachusetts Institute of Technology (MIT), in collaboration with researchers from Microsoft Research and Adobe. To pull this off, they used not a miniaturised gadget in the best James Bond tradition but a standard video camera backed by a powerful algorithm. “When sound hits an object, it causes the object to vibrate,” explains Abe Davis, a graduate student in electrical engineering and computer science at MIT and lead author of the recently published scientific paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. Previously people didn’t realise that this information was there.” So ‘all’ the research team had to do was draw on this information to recreate the sound that had struck the packet of chips.

2,000 to 6,000 frames per second

The latest experiments use a high-speed camera capable of capturing 2,000 to 6,000 frames per second. This is much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second. Successive frames pick up tiny movements caused by the vibrations, sometimes on a scale of thousandths of a pixel. By way of comparison, a single frame of 1080p high-definition video contains more than two million pixels. This is where the algorithm comes in: it analyses the variations from one frame to the next and uses them to reconstruct the sounds that caused the vibrations. Despite some glitches, the results can be heard and understood. In one set of experiments, the researchers recovered intelligible speech from the vibrations of a potato-chip bag filmed from 15 feet (about 4.6 metres) away through soundproof glass, and recorded it to a soundtrack.
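To make the frame-to-frame idea concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, whose details the article does not give. It is a simple gradient-based estimate applied to a synthetic one-dimensional "video", and every name and value in it (make_frame, estimate_shift, the 200 Hz tone, the 0.001-pixel amplitude) is an assumption chosen for illustration. It shows that a displacement of a thousandth of a pixel can still be measured when many pixels are pooled together.

```python
import math

def make_frame(shift, width=64, period=16.0):
    """Render a 1-D sinusoidal texture displaced by `shift` pixels (synthetic data)."""
    return [math.sin(2 * math.pi * (x - shift) / period) for x in range(width)]

def estimate_shift(frame, reference):
    """Least-squares sub-pixel shift of `frame` relative to `reference`.

    Uses the first-order model frame(x) ~= reference(x) - shift * gradient(x),
    pooling the evidence from every interior pixel.
    """
    num = 0.0
    den = 0.0
    for x in range(1, len(reference) - 1):
        # Central-difference spatial gradient of the reference frame.
        grad = (reference[x + 1] - reference[x - 1]) / 2.0
        num += (frame[x] - reference[x]) * grad
        den += grad * grad
    return -num / den

def recover_signal(frames):
    """Turn a frame sequence into one motion sample per frame."""
    reference = frames[0]
    return [estimate_shift(f, reference) for f in frames]

# Hypothetical setup: a 200 Hz tone moves the texture by at most 0.001 pixel,
# filmed at 2,000 frames per second (the lower figure quoted in the article).
FPS = 2000
TONE_HZ = 200
AMPLITUDE_PX = 0.001
true_motion = [AMPLITUDE_PX * math.sin(2 * math.pi * TONE_HZ * n / FPS)
               for n in range(200)]
frames = [make_frame(s) for s in true_motion]
recovered = recover_signal(frames)

worst = max(abs(r - t) for r, t in zip(recovered, true_motion))
print(f"largest error: {worst:.2e} pixels")
```

Each frame yields one sample of the recovered signal, so the camera's frame rate plays the role of an audio sampling rate. Real footage adds sensor noise and two-dimensional motion, which this sketch deliberately leaves out.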

Surveillance and medicine

A second experiment was as conclusive as the first. This time, instead of the packet of chips, the researchers filmed a pot plant. The video demonstration shows that when the vibrations of the leaves were recorded and translated into sound, the words of the nursery rhyme ‘Mary Had a Little Lamb’ could be made out. As for the uses to which the discovery will be put, Davis says that “Big Brother won't be able to hear anything that anyone ever says all of a sudden,” adding, however: “But it is possible that you could use this to discover sound in situations where you couldn’t before.” The technique is likely to interest the police, who could use it to recover sound from video surveillance footage, as well as the military and intelligence services, who may be keen to add it to their range of surveillance and espionage methods. On a less sinister note, doctors may also be able to make good use of the technique, for example to measure a premature infant’s pulse by filming its wrist.

© L’Atelier BNP Paribas