Smart listeners and smooth talkers

Nov 17, 2011
Sounds recognised from an audio recording. Credit: Xunying Liu

Human-like performance in speech technology could be just around the corner, thanks to a new research project that links three UK universities.

Human conversation is rich and it’s messy. When we communicate, we constantly adjust to those around us and to the environment we’re in; we leave words out because the context provides meaning; we rush or hesitate, or change direction; we overlap with other speakers; and, crucially, we’re expressive.

No wonder, then, that it has proved so challenging to build machines that interact with people naturally, with human-like performance and behaviour.

Nevertheless, there have been remarkable advances in speech-to-text technology and speech synthesisers over recent decades. Current systems speed up the transcription of dictation, add automatic captions to video clips, enable automated ticket booking and improve the quality of life for people who rely on assistive technology.

However, today’s speech technology is limited by its inability to acquire knowledge about people or situations, to adapt, to learn from mistakes, to generalise and to sound naturally expressive. “To make the technology more usable and natural, and open up a wide range of new applications, requires field-changing research,” explained Professor Phil Woodland of Cambridge’s Department of Engineering.

Along with scientists at the Universities of Edinburgh and Sheffield, Professor Woodland and colleagues Drs Mark Gales and Bill Byrne have begun a five-year, £6.2 million project funded by the Engineering and Physical Sciences Research Council to provide the foundations of a new generation of speech technology.

Complex pattern matching

Speech technology systems rely on powerful techniques for learning statistical models known as Hidden Markov Models (HMMs). Trained on large quantities of real speech data, HMMs model the relationship between the basic speech sounds of a language and how these are realised in audio waveforms.
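To make the idea concrete, here is a minimal sketch of how an HMM scores a sequence of observations using the standard forward algorithm. It assumes a hypothetical two-state, left-to-right toy model emitting discrete symbols `x` and `y`; all names and probabilities are invented for illustration, and real recognisers use continuous (typically Gaussian mixture) output densities over acoustic feature vectors rather than symbol tables.

```python
from typing import Dict, List

# Hypothetical toy model: two left-to-right states emitting discrete symbols.
STATES: List[str] = ["s1", "s2"]
START: Dict[str, float] = {"s1": 1.0, "s2": 0.0}
TRANS: Dict[str, Dict[str, float]] = {
    "s1": {"s1": 0.6, "s2": 0.4},
    "s2": {"s2": 1.0},
}
EMIT: Dict[str, Dict[str, float]] = {
    "s1": {"x": 0.8, "y": 0.2},
    "s2": {"x": 0.1, "y": 0.9},
}


def forward_likelihood(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> float:
    """Return P(observations | model), summed over all hidden state paths."""
    # alpha[s]: probability of the observations seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev].get(s, 0.0) for prev in states)
            * emit_p[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())


print(forward_likelihood(["x", "y"], STATES, START, TRANS, EMIT))
```

For the input `["x", "y"]` only two state paths are possible (s1 then s1, and s1 then s2), contributing 0.096 and 0.288, so the likelihood is about 0.384. The forward recursion reaches this total without enumerating paths one by one, which is what makes it practical for long recordings.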

It’s a complex undertaking. For speech recognition, the system must work with a continuous stream of acoustic data, with few or no pauses between individual words. To determine where each word starts and stops, the system uses its HMMs to match the pattern of successive sounds (or phonemes) against its built-in pronunciation dictionary, assigning a probability score to the sounds most likely to follow the first one to complete a word. It then takes into account the structure of the language, since some word sequences are much more likely than others.
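The word-boundary search described above can be sketched as dynamic programming over a string of phonemes. This simplified example assumes the phonemes have already been recognised (real decoders search jointly over acoustic and language-model scores) and uses a hypothetical lexicon and bigram table built around the classic "ice cream" / "I scream" ambiguity; the floor probability for unseen word pairs is a crude stand-in for proper smoothing.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical pronunciation dictionary: phoneme tuple -> word.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("ay",): "I",
    ("ay", "s"): "ice",
    ("s", "k", "r", "iy", "m"): "scream",
    ("k", "r", "iy", "m"): "cream",
}

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "I"): 0.2,
    ("<s>", "ice"): 0.05,
    ("I", "scream"): 0.01,
    ("ice", "cream"): 0.5,
}


def segment(
    phones: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> Optional[List[str]]:
    """Return the most probable word sequence covering all phones, or None."""
    # best[i] maps the last word -> (log probability, words) covering phones[:i].
    best: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(len(phones) + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(len(phones)):
        for prev, (score, words) in best[i].items():
            # Try every dictionary word that starts at phoneme i.
            for j in range(i + 1, len(phones) + 1):
                word = lexicon.get(tuple(phones[i:j]))
                if word is None:
                    continue
                # Unseen word pairs fall back to a small floor probability.
                cand = score + math.log(bigram.get((prev, word), floor))
                if word not in best[j] or cand > best[j][word][0]:
                    best[j][word] = (cand, words + [word])
    if not best[-1]:
        return None  # no way to cover the phonemes with dictionary words
    return max(best[-1].values(), key=lambda entry: entry[0])[1]


print(segment(["ay", "s", "k", "r", "iy", "m"], LEXICON, BIGRAM))
```

Both segmentations match the phonemes exactly; the language model decides, preferring "ice cream" (0.05 × 0.5 = 0.025) over "I scream" (0.2 × 0.01 = 0.002).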

Adapt, train and talk

A key focus of the new project is to build systems that are adaptive, enabling them to acclimatise automatically to particular speakers and to learn from their mistakes. Ultimately, the new systems will be able to make sense of challenging audio clips, efficiently detecting who spoke what, when and how.
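One well-established way to adapt a recogniser to a particular speaker is maximum a posteriori (MAP) adaptation, which pulls each Gaussian mean in the model towards that speaker's data in proportion to how much data has been seen. The sketch below illustrates that general technique only and is not a description of the project's own methods; the prior weight `tau` and the per-frame occupancies (which would normally come from aligning the speaker's audio against the model) are assumptions.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    occupancies: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """MAP update of one Gaussian mean (assumes tau > 0):

    new_mean = (tau * prior_mean + sum_t(gamma_t * x_t)) / (tau + sum_t(gamma_t))

    where gamma_t is the probability that frame x_t belongs to this Gaussian.
    """
    occ_total = sum(occupancies)
    adapted = []
    for d in range(len(prior_mean)):
        # Occupancy-weighted sum of the speaker's frames in dimension d.
        weighted_sum = sum(g * x[d] for g, x in zip(occupancies, frames))
        adapted.append((tau * prior_mean[d] + weighted_sum) / (tau + occ_total))
    return adapted


# Little speaker data: the mean moves only part of the way towards it.
print(map_adapt_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 2.0]], [1.0, 1.0], tau=2.0))
```

With `tau=2.0` and two fully assigned frames whose average is (2.0, 2.0), the adapted mean lands halfway, at (1.0, 1.0). As more speaker data accumulates, the estimate moves towards the speaker's own average, so the system relies on its general model when data is scarce and on the speaker when data is plentiful.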

Unsupervised training is also crucial, as Professor Woodland explained: “Systems are currently pre-trained with the sort of data they are trying to recognise – so a dictation system is trained with dictation data – but this is a significant commercial barrier as each new application requires specific types of data. Our approach is to build systems that are trained on a very wide range of data types and enable detailed system adaptation to the particular situation of interest. To access and structure the data, without needing manual transcripts, we are developing approaches that allow the system to train itself from a large quantity of unlabelled speech data.”
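Learning without transcripts rests on the same statistical machinery: HMMs are normally estimated with the Baum-Welch algorithm, an instance of expectation-maximisation (EM) that infers hidden structure from the data itself. As a much-simplified illustration of learning from unlabelled data, the sketch below runs EM on a one-dimensional, two-component Gaussian mixture. The data, initial means and iteration count are hypothetical, and this is not the project's training procedure.

```python
import math
from typing import List, Sequence, Tuple


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def em_gmm_1d(
    data: Sequence[float],
    init_means: Sequence[float],
    iterations: int = 50,
    var_floor: float = 1e-3,
) -> Tuple[List[float], List[float], List[float]]:
    """Fit mixture weights, means and variances to unlabelled data with EM."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: each component's posterior responsibility for each point,
        # computed in the log domain so distant points cannot give 0/0.
        resp = []
        for x in data:
            logs = [math.log(w) + log_gaussian(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            unnorm = [math.exp(l - top) for l in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate parameters from the soft assignments.
        # (Assumes no component's total occupancy collapses to zero.)
        for c in range(k):
            occ = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / occ
            variances[c] = max(
                var_floor,
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / occ,
            )
            weights[c] = occ / len(data)
    return weights, means, variances


samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = em_gmm_1d(samples, init_means=[0.0, 6.0])
print(means)
```

No sample carries a label, yet the two components settle on the two clusters near 1.0 and 5.0. Baum-Welch applies the same expectation and maximisation steps to HMM state sequences instead of mixture components.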

“One very interesting aspect of the work is that the fundamental HMMs are also generators of speech, and so the adaptive technology underlying speech recognition is also being applied to the development of personalised speech synthesis systems,” added Professor Woodland. New systems will take into account expressiveness and intention in speech, enabling devices to be built that respond to an individual’s voice, vocabulary, accent and expressions.
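The point that HMMs are also generators can be shown directly: because the model defines a probability distribution over state and observation sequences, it can be sampled to produce new output. The sketch below uses a hypothetical two-state toy model with discrete symbols. Practical HMM-based synthesisers do not sample at random but generate the most likely smooth trajectory of acoustic parameters, so treat this only as an illustration of the generative property.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy model: two left-to-right states emitting discrete symbols.
START: Dict[str, float] = {"s1": 1.0, "s2": 0.0}
TRANS: Dict[str, Dict[str, float]] = {
    "s1": {"s1": 0.6, "s2": 0.4},
    "s2": {"s2": 1.0},
}
EMIT: Dict[str, Dict[str, float]] = {
    "s1": {"x": 0.8, "y": 0.2},
    "s2": {"x": 0.1, "y": 0.9},
}


def sample_hmm(
    length: int,
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Generate (state sequence, observation sequence) by ancestral sampling."""
    rng = random.Random(seed)

    def draw(dist: Dict[str, float]) -> str:
        # Pick one key of dist with probability proportional to its value.
        keys = list(dist)
        return rng.choices(keys, weights=[dist[key] for key in keys], k=1)[0]

    states: List[str] = []
    observations: List[str] = []
    state = draw(start_p)
    for t in range(length):
        if t > 0:
            state = draw(trans_p[state])  # move to the next hidden state
        states.append(state)
        observations.append(draw(emit_p[state]))  # emit a symbol from it
    return states, observations


print(sample_hmm(10, START, TRANS, EMIT, seed=1))
```

Different seeds give different but plausible sequences. Once the model reaches `s2` it stays there, mirroring the left-to-right structure commonly used to model speech sounds.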

The three university teams have already made considerable contributions to the field, and many techniques used in current speech recognition systems were developed by the engineers involved in the new project. The new programme grant enables them to take a broader view and to work with companies that are interested in how speech technology could transform our lives at home and at work. Applications already planned include a personalised voice-controlled device to help elderly people interact with control systems in the home, and a portable device that lets users create a searchable text version of any audio they encounter in their everyday lives.
