New computer software program excels at lip reading

March 17, 2017, University of Oxford
Lips close-up. Credit: Shutterstock

A new computer software program has the potential to lip-read more accurately than people and to help those with hearing loss, Oxford University researchers have found.

Watch, Attend and Spell (WAS) is a new artificial intelligence (AI) software system developed at Oxford in collaboration with the company DeepMind.

The AI system uses computer vision and machine learning methods to learn how to lip-read from a dataset made up of more than 5,000 hours of TV footage, gathered from six different programmes including Newsnight, BBC Breakfast and Question Time. The videos contained more than 118,000 sentences in total, and a vocabulary of 17,500 words.
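For readers curious about the general shape of such a system, the sketch below illustrates a "watch, attend and spell" pipeline in miniature: a small visual front-end encodes the mouth-region frames, and an attentive character decoder spells out the sentence. It is written in PyTorch purely for illustration; the layer sizes, frame resolution and class names are assumptions for the example, not details of the Oxford/DeepMind model.

import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Illustrative attention-based lip-reading model (not the authors' code)."""
    def __init__(self, vocab_size=40, feat_dim=512, hidden=256):
        super().__init__()
        # "Watch": a small CNN turns each 64x64 greyscale mouth crop into a feature vector.
        self.watch = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # "Attend": attention over the encoded video features.
        self.attend = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        # "Spell": a recurrent decoder emits one character at a time.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden * 2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frames, prev_chars):
        # frames: (batch, time, 1, 64, 64); prev_chars: (batch, output_len) character ids
        b, t = frames.shape[:2]
        feats = self.watch(frames.reshape(b * t, *frames.shape[2:])).reshape(b, t, -1)
        enc, _ = self.encoder(feats)                 # per-frame video states
        emb = self.embed(prev_chars)
        ctx, _ = self.attend(emb, enc, enc)          # each output step attends to the video
        dec, _ = self.decoder(torch.cat([emb, ctx], dim=-1))
        return self.out(dec)                         # per-step character logits

A training loop would compare these logits with the next characters of the transcript (for example with a cross-entropy loss) and, at test time, decode greedily or with a beam search to produce the sentence.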

The research team compared the ability of the machine and a human expert to work out what was being said in silent video by focusing solely on each speaker's lip movements. They found that the software system was more accurate than the professional: the human lip-reader correctly identified 12 per cent of the words, while the WAS software recognised 50 per cent of the words in the dataset without error. The machine's mistakes were small, such as missing an "s" at the end of a word or single-letter misspellings.
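As a rough illustration of the word-level comparison quoted above, the short Python snippet below counts the fraction of reference words a transcript reproduces, using a simple sequence alignment. The function name and example sentence are invented for the example, and the study's actual scoring protocol may differ.

import difflib

def words_correct(reference: str, hypothesis: str) -> float:
    # Fraction of reference words the hypothesis reproduces, found via a
    # simple word-level sequence alignment.
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref) if ref else 0.0

# Invented example: a near-miss transcript that only drops a final "s".
print(words_correct("ministers agreed the budgets", "ministers agreed the budget"))  # 0.75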

The technology could support a number of developments, including helping the hard of hearing to navigate the world around them. Speaking on the technology's core value, Jesal Vishnuram, Technology Research Manager at Action on Hearing Loss, said: 'Action on Hearing Loss welcomes the development of new technology that helps people who are deaf or have a hearing loss to have better access to television through superior real-time subtitling.

'It is great to see research being conducted in this area, and new breakthroughs are welcomed by Action on Hearing Loss where they improve accessibility for people with a hearing loss. AI lip-reading technology would be able to enhance the accuracy and speed of speech-to-text, especially in noisy environments, and we encourage further research in this area and look forward to seeing new advances being made.'

Commenting on the potential uses for WAS, Joon Son Chung, lead author of the study and a graduate student at Oxford's Department of Engineering, said: 'Lip-reading is an impressive and challenging skill, so WAS can hopefully offer support to this task - for example, suggesting hypotheses for professional lip readers to verify using their expertise. There are also a host of other applications, such as dictating instructions to a phone in a noisy environment, dubbing archival silent films, resolving multi-talker simultaneous speech and improving the performance of automated speech recognition in general.'

The research team comprised Joon Son Chung and Professor Andrew Zisserman at Oxford, where the research was carried out, together with Dr Andrew Senior and Dr Oriol Vinyals at DeepMind. Professor Zisserman commented: 'This project really benefitted from being able to bring together the expertise from Oxford and DeepMind.'


1 comment


dlethe, Mar 17, 2017
It won't be long until law enforcement starts using this technology. Imagine what they could do with a telescope hooked up to a webcam feeding this software, or if they unleash it for use in airports; near borders; on the floor of the New York Stock Exchange; anywhere people get together where somebody could benefit from knowing what people are saying "privately" to each other.

What a wonderful way to get around wiretapping laws.

You heard it here first ... we'll need to get some anti lip-tapping laws to protect our privacy.
