Video to audio conversion for visually impaired

Joseph M Redfern


Supervised by Kirill Sidorov; Moderated by George Theodorakopoulos

As of 2012 there were 285 million visually impaired people in the world, of whom 246 million had low vision and 39 million were totally blind. While medical research into treating these conditions is underway, computer scientists can also help! Assuming a visually impaired person retains hearing and is willing to wear a webcam on their head (and carry a portable computer in a pocket or backpack), it seems possible to develop a piece of software that would analyse the video stream from the camera (and possibly from other sensors, such as a laser rangefinder) and "convert" it into audio information which would help the person to navigate.

In particular, it would be interesting to:

(1) Investigate the main problems encountered by blind people (from the medical literature and experimentally). What is the most useful information that they crucially need in order to navigate and perform everyday tasks?

(2) Investigate what similar projects have achieved. There was, in particular, this one:


The above method suffers from an important drawback: the resulting output is too low-level and hence difficult to interpret without substantial training. Can it be improved?

(3) As a result of the above investigation, design a piece of software which would perform some sort of video-to-sound conversion. Bear in mind that the general problem of image-to-text conversion is extremely difficult. However, for the purposes of this project, it would be sufficient if the software were functional only in some limited, controlled scenarios. Reporting the positions of people in an indoor scene would be an example of one such scenario.
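To make the "reporting positions of people" scenario concrete, here is a minimal sketch of how a detection from some person or face detector (the detector itself is assumed to exist and is not shown) could be turned into a spoken-style message. The thresholds and the box-height-as-distance cue are illustrative assumptions, not part of the proposal:

```python
def describe_detection(bbox, frame_width, frame_height):
    """Convert a person bounding box (x, y, w, h), in pixels, into a short
    message suitable for speech synthesis.

    Assumptions (hypothetical, for illustration only):
    - the frame is split into thirds for left / ahead / right;
    - a taller bounding box means the person is closer.
    """
    x, y, w, h = bbox
    centre_x = x + w / 2.0
    third = frame_width / 3.0
    if centre_x < third:
        side = "left"
    elif centre_x > 2 * third:
        side = "right"
    else:
        side = "ahead"
    proximity = "near" if h > frame_height / 3.0 else "far"
    return "person {}, {}".format(side, proximity)
```

A text-to-speech engine (or the sonification described below) could then voice each message as detections arrive, frame by frame.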

We speculate that a good solution to this problem would involve detecting salient features in images (such as faces, or fast-moving objects) and encoding them with some sound-patterns (melodies?) placed in the acoustic stereo field.
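One way to realise the stereo-field idea is constant-power panning: an object's horizontal position in the frame sets the left/right balance of its tone, and some other property (here, hypothetically, its distance) sets the pitch. The sketch below synthesises such a panned tone with NumPy; the specific frequency mapping is an assumption for illustration:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second

def pan_gains(x_norm):
    """Constant-power stereo panning.

    x_norm is the object's horizontal position, normalised to [0, 1]
    (0 = far left of the image). Returns (left_gain, right_gain) with
    left^2 + right^2 == 1, so perceived loudness stays constant.
    """
    angle = x_norm * (np.pi / 2)
    return np.cos(angle), np.sin(angle)

def object_to_tone(x_norm, distance_m, duration_s=0.3):
    """Encode one detected object as a short stereo tone.

    Hypothetical mapping: closer objects get a higher pitch
    (an object 2 m away maps to 440 Hz).
    """
    freq = 880.0 / max(distance_m, 0.25)
    t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s),
                    endpoint=False)
    mono = 0.5 * np.sin(2 * np.pi * freq * t)
    left, right = pan_gains(x_norm)
    # Shape (n_samples, 2): column 0 = left channel, column 1 = right.
    return np.column_stack((left * mono, right * mono))
```

Distinct melodies or timbres per object class (faces vs. obstacles, say) could then be layered on top of this positional encoding.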

A first-class mark would be awarded for a working prototype which would allow the examiner to walk around the room with their eyes closed, relying only on the audio output from your software, while avoiding obstacles and possibly performing some simple tasks (e.g. picking up an object from the floor).

Initial Plan (03/02/2015) [Zip Archive]

Final Report (05/05/2015) [Zip Archive]

Publication Form