Researchers from the California Institute of Technology in Pasadena have built smart glasses that translate images into sounds that can be intuitively understood without training. The device, called vOICe (OIC stands for "Oh! I See"), is a pair of dark glasses with an attached camera, connected to a computer. It's based on an algorithm of the same name developed in 1992 by Dutch engineer Peter Meijer. The system converts pixels in the camera's video feed into sound, mapping each pixel's vertical position to pitch and its brightness to volume.
A cluster of dark pixels at the bottom of the frame sounds quiet and low-pitched, while a bright patch at the top sounds loud and high-pitched. The way a sound changes over time is governed by how the image looks when scanned left to right across the frame. Headphones send the processed sound to the wearer's ears.
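The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meijer's actual implementation: the function names, the 500 to 5000 Hz range, the exponential spacing of pitches and the 400 samples per image column are all hypothetical choices made for the example.

```python
import math

# Assumed, illustrative pitch range; not taken from the vOICe software.
LOW_HZ = 500.0
HIGH_HZ = 5000.0


def row_frequency(row, n_rows):
    """Vertical position -> pitch. Row 0 is the top of the image and gets
    the highest frequency; the bottom row gets the lowest."""
    height = (n_rows - 1 - row) / max(n_rows - 1, 1)  # 1.0 at top, 0.0 at bottom
    return LOW_HZ * (HIGH_HZ / LOW_HZ) ** height


def image_to_sound(image, sample_rate=8000, samples_per_column=400):
    """Convert a greyscale image into audio samples.

    image: list of rows (top row first), each a list of brightness values
           between 0.0 (dark) and 1.0 (bright).
    The image is scanned column by column, left to right, so horizontal
    position becomes time.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    freqs = [row_frequency(row, n_rows) for row in range(n_rows)]

    samples = []
    for col in range(n_cols):
        for i in range(samples_per_column):
            t = (col * samples_per_column + i) / sample_rate
            # Brightness -> volume: each pixel contributes a sine wave at its
            # row's pitch, scaled by how bright the pixel is.
            value = sum(
                image[row][col] * math.sin(2 * math.pi * freqs[row] * t)
                for row in range(n_rows)
            )
            samples.append(value / n_rows)  # keep the mix within -1.0..1.0
    return samples


# A 2x2 image with one bright pixel in the top-left corner.
image = [
    [1.0, 0.0],
    [0.0, 0.0],
]
samples = image_to_sound(image)
```

In this toy image, the first half of the output is a single high tone and the second half is silence, because the right-hand column is entirely dark. The samples could then be written to a sound file, for example with Python's standard `wave` module.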
"You're taking something from vision and you're putting it into audition," says Caltech's Noelle Stiles, who works on vOICe. "Your brain is doing the opposite - it's taking in all of the sounds and it's making sense of them visually."
Integrating the senses
In a paper published this week, Stiles and her colleague Shinsuke Shimojo explain that mapping visuals to sound in this way reflects how we integrate data from different senses. Perceiving a rose, for instance, means experiencing more than just its colour - its scent, the texture of its petals and the rustle of its leaves also count.
Shimojo and Stiles worked to understand how people intuitively map objects to sounds. They asked sighted volunteers to match images (stripes, spots and natural textures) to sounds, while blind volunteers were asked to feel textures and select sounds that seemed to correspond to them. The pattern of choices directly shaped vOICe's algorithms and seemed to produce an intuitive result.
Tested on the device, blind people with no experience of using it were able to match the shapes to the sounds as often as those who had been trained - both groups performed 33 per cent better than chance. But when the encoding was reversed, so that a high part of the image became a low pitch and a bright part of the image became a quiet sound, volunteers found it harder to match image to sound.
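The reversed encoding used for comparison amounts to flipping both parts of the mapping. The short Python sketch below illustrates the idea for a single pixel; `encode_pixel`, its parameters and the 500 to 5000 Hz range are hypothetical and are not drawn from the study or from the vOICe software.

```python
def encode_pixel(height, brightness, reversed_mapping=False,
                 low_hz=500.0, high_hz=5000.0):
    """Return (frequency_hz, volume) for one pixel.

    height:     0.0 at the bottom of the image, 1.0 at the top.
    brightness: 0.0 for a dark pixel, 1.0 for a bright one.
    """
    if reversed_mapping:
        # Reversed condition: high becomes low-pitched, bright becomes quiet.
        height = 1.0 - height
        brightness = 1.0 - brightness
    # Assumed exponential pitch scale between low_hz and high_hz.
    frequency = low_hz * (high_hz / low_hz) ** height
    return frequency, brightness


# A bright pixel at the top of the frame, under each encoding.
normal = encode_pixel(1.0, 1.0)
flipped = encode_pixel(1.0, 1.0, reversed_mapping=True)
```

Under the normal encoding the bright, high pixel is loud and high-pitched; under the reversed one the same pixel is silent and assigned the lowest pitch.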
Intuitive mapping
"The result that select natural stimuli could be intuitive with sensory substitution, with or without training, was unexpected," the researchers write."This research shows it's not just important how much information you provide, but whether you provide it in a way that the person can intuitively make sense of," says Ione Fine of the University of Washington in Seattle. "They are basically saying that the magic bullet is going to be finding an intuitive mapping system and not relying on training," she says.
Fine points out that there's a gulf between distinguishing patterns in a lab and using vOICe to observe and understand the real world. But she says the work could ultimately help design better vision aids. Traditionally, these have relied on training users to understand the patterns they produce when converting vision to other stimuli. "It's much better to find something intuitive and easy to use," she says.
Stiles and Shimojo are now using functional magnetic resonance imaging (fMRI) data to analyse activity within the brain, looking for that intuitive mapping system.
Journal reference: Scientific Reports, DOI: 10.1038/srep15628