August 11, 2010

Research aims to improve speech recognition software

by Binghamton University

Anyone who has used an automated airline reservation system has experienced both the promise and the frustration inherent in today's automatic speech recognition technology. When it works, the computer "understands" that you want to book a flight to Austin rather than Boston, for example. Research conducted by Binghamton University's Stephen Zahorian aims to improve the accuracy of such programs.

Zahorian, a professor of electrical and computer engineering, recently received a grant of nearly half a million dollars from the Air Force Office of Scientific Research. The funds will support the two-year development of a multi-language, multi-speaker audio database that will be available for spoken-language processing research. Zahorian and his team plan to gather and annotate recordings of several hundred speakers each in English, Spanish and Mandarin Chinese.

"The challenge," he said, "is to get speech recognition working better in real-life situations."

That's why the samples in the new database will come from publicly available sources such as YouTube.

Zahorian's team will annotate each sample, creating a more detailed version of closed captioning, including time stamps and descriptions of background sounds. Once the human listener has finished with the transcription, automatic speech recognition algorithms will be used to align the recording with the captions. Next, software will be developed to verify and correct errors in the time alignment.
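
As a rough illustration of what checking a time alignment might involve, the sketch below flags time-stamped caption segments whose timing looks suspicious. It is not the team's software: the Segment record, the find_alignment_problems function and the rule that segments on one caption track should not overlap are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated stretch of a recording (hypothetical layout)."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str     # transcribed words or a background-sound description


def find_alignment_problems(segments):
    """Return (index, message) pairs for segments with suspicious timing."""
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        # A segment must last longer than zero seconds.
        if seg.end <= seg.start:
            problems.append((i, "non-positive duration"))
        # Assumption: segments on one caption track should not overlap.
        if seg.start < prev_end:
            problems.append((i, "overlaps previous segment"))
        prev_end = max(prev_end, seg.end)
    return problems


captions = [
    Segment(0.0, 1.2, "book a flight"),
    Segment(1.0, 2.0, "to austin"),
    Segment(2.5, 2.5, "[door slams]"),
]
for index, message in find_alignment_problems(captions):
    print(index, message)
```

A check like this can only point a human or a later correction step at likely errors; it does not decide what the correct time stamps are.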

"Speech-recognition algorithms begin by mimicking what your ear does," Zahorian said. "But we want the algorithms to extract just the most useful characteristics of the speech, not all of the possible data. That's because more detail can actually hurt performance, past a certain point."

The field of automatic speech recognition has a long history, dating back to projects at Bell Labs before the computer age. These days, much of the technology relies on algorithms that convert sounds into numbers.

In his research, Zahorian represents speech as a picture in a time-frequency plane. He then uses image-processing techniques to extract features of the speech, an approach that has led him to focus more on time than on frequency.
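
One common way to obtain such a time-frequency picture is a spectrogram: the signal is cut into short overlapping frames, and the strength of each frequency is measured in every frame. The sketch below is a generic textbook version, not Zahorian's feature extraction; the function name, the frame length of 64 samples, the hop of 32 samples and the 8 kHz test tone are assumptions chosen for illustration. It uses only the Python standard library, so it is slow; real systems would typically use an FFT routine.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of frequency-bin magnitudes."""
    # Hann window: tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Direct discrete Fourier transform for the non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# A 1000 Hz tone sampled at 8000 Hz; each bin is 8000 / 64 = 125 Hz wide.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
picture = spectrogram(tone)
strongest_bins = [max(range(len(row)), key=row.__getitem__) for row in picture]
print(len(picture), "frames; strongest bin per frame:", strongest_bins)
```

Each row of the result is one moment in time and each column one frequency band, which is what allows the output to be treated like an image.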

When researchers are ready to test an algorithm, they rely on a common set of databases held by the Linguistic Data Consortium. Zahorian's unusual image-based approach has given his team some of the best results ever reported for automatic speech recognition experiments using two of the consortium's best-known databases.

The database Zahorian develops with the new funding will join these others, offering researchers around the world a new way to test their theories with samples of real-life speech.

Some mistakes are inevitable, given the variations in pitch, tone and pronunciation from person to person.

Still, the field does have a clear standard, Zahorian said: "In order to be useful, a system should have a word-error rate of no more than 10 percent."
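
Word-error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The sketch below shows that standard calculation; the function name and the example sentences are illustrative and do not come from Zahorian's work.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five gives a word-error rate of 20 percent.
print(word_error_rate("book a flight to austin", "book a flight to boston"))
```

By this measure, mishearing "Austin" as "Boston" in a five-word request already puts the system at twice the 10 percent threshold Zahorian cites.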

Zahorian is interested in language modeling (if someone has said these three words, what is the fourth word likely to be?) as well as conversation modeling, that is, predicting when the speakers will switch. He's also intrigued by the potential to make advances by using established methods from other fields, including the neural networks developed by researchers working in artificial intelligence.
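
The "fourth word" question can be illustrated with a simple counting model: record which word followed each three-word sequence in some training text, then predict the most frequent follower. This is a minimal sketch of that idea, not a description of the models Zahorian uses; the function names and the three-sentence training text are made up for the example.

```python
from collections import Counter, defaultdict


def train_next_word_model(sentences, context_size=3):
    """Count which word follows each run of `context_size` words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            model[context][words[i + context_size]] += 1
    return model


def predict_next_word(model, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    counts = model.get(tuple(word.lower() for word in context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy training text, invented for this example.
training_text = [
    "book a flight to austin",
    "book a flight to austin",
    "book a flight to boston",
]
model = train_next_word_model(training_text)
print(predict_next_word(model, ["a", "flight", "to"]))
```

A recognizer can combine a prediction like this with the acoustic evidence, so that a word the sound alone leaves ambiguous is resolved toward the more likely continuation.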

He sees a future in which automatic speech recognition will enable technology to extract the meaning of speech as well as the words.

"The dream," Zahorian said, "is that someday travelers will be able to speak into a little gadget that will translate what they've said into another language instantly and accurately."

Provided by Binghamton University

Citation: Research aims to improve speech recognition software (2010, August 11) retrieved 27 April 2024 from https://phys.org/news/2010-08-aims-speech-recognition-software.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
