Doing what the brain does -- how computers learn to listen

Aug 14, 2009

(PhysOrg.com) -- We see, hear and feel, and make sense of countless diverse, quickly changing stimuli in our environment seemingly without effort. However, doing what our brains do with ease is often an impossible task for computers. Researchers at the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences and the Wellcome Trust Centre for Neuroimaging in London have now developed a mathematical model which could significantly improve the automatic recognition and processing of spoken language. In the future, algorithms of this kind, which imitate brain mechanisms, could help machines to perceive the world around them.

Many people will have personal experience of how difficult it is for computers to deal with spoken language. For example, people who "communicate" with the automated telephone systems now commonly used by many organisations need a great deal of patience. If you speak just a little too quickly or slowly, if your pronunciation isn't clear, or if there is background noise, the system often fails to work properly. The reason for this is that the computer programs used until now rely on processes that are particularly sensitive to perturbations. When computers process language, they primarily attempt to recognise characteristic features in the frequencies of the voice in order to recognise words.

"It is likely that the brain uses a different process", says Stefan Kiebel from the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences. The researcher presumes that the analysis of temporal sequences plays an important role in this. "Many perceptual stimuli in our environment could be described as temporal sequences." Music and spoken language, for example, are composed of sequences of different lengths which are hierarchically ordered. According to the scientist's hypothesis, the brain classifies the various signals from the smallest, fast-changing components (e.g., single sound units like "e" or "u") up to big, slow-changing elements (e.g., the topic).

The information at various temporal levels is probably much more significant for the processing of perceptual stimuli than previously thought. "The brain constantly searches for temporal structure in the environment in order to deduce what will happen next", the scientist explains. In this way, the brain can, for example, often predict the next sound units based on the slow-changing information. Thus, if the topic of conversation is the hot summer, "su…" will more likely be the beginning of the word "sun" than the word "supper".
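The "sun" versus "supper" example can be illustrated with a minimal Python sketch. It is not the researchers' model: the topics, the word lists and every probability below are invented for illustration, and the hypothetical function `predict_word` simply renormalises topic-dependent word probabilities over the words that match the sounds heard so far.

```python
# Toy illustration only. The topics, words and probabilities are invented
# for this example and are not taken from the study.
WORD_GIVEN_TOPIC = {
    "hot summer": {"sun": 0.6, "supper": 0.1, "shade": 0.3},
    "evening meal": {"sun": 0.05, "supper": 0.7, "soup": 0.25},
}


def predict_word(prefix, topic):
    """Return the probability of each word that starts with `prefix`,
    given the slow-changing context `topic` (hypothetical helper)."""
    candidates = {
        word: prob
        for word, prob in WORD_GIVEN_TOPIC[topic].items()
        if word.startswith(prefix)
    }
    total = sum(candidates.values())
    if total == 0:
        return {}  # no known word matches the sounds heard so far
    # Renormalise so the remaining candidates sum to one.
    return {word: prob / total for word, prob in candidates.items()}


if __name__ == "__main__":
    for topic in WORD_GIVEN_TOPIC:
        guesses = predict_word("su", topic)
        best = max(guesses, key=guesses.get)
        print(f"topic={topic!r}: best guess for 'su...' is {best!r}")
```

With these made-up numbers, the same fragment "su" resolves to "sun" when the topic is the hot summer and to "supper" when the topic is the evening meal, which is the sense in which slow-changing information constrains the prediction of fast-changing sounds.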

To test this hypothesis, the researchers constructed a mathematical model which was designed to imitate, in a highly simplified manner, the neuronal processes which occur during the comprehension of speech. Neuronal processes were described by algorithms which processed speech at several temporal levels. The model succeeded in processing speech; it recognised individual speech sounds and syllables. In contrast to other artificial speech recognition devices, it was able to process sped-up speech sequences. Furthermore, it had the brain's ability to "predict" the next speech sound. If a prediction turned out to be wrong because the researchers made an unfamiliar syllable out of the familiar sounds, the model was able to detect the error.

The "language" with which the model was tested was simplified - it consisted of the four vowels a, e, i and o, which were combined to make "syllables" consisting of four sounds. "In the first instance we wanted to check whether our general assumption was right", Kiebel explains. With more time and effort, consonants, which are more difficult to differentiate from each other, could be included, and further hierarchical levels for words and sentences could be incorporated alongside individual sounds and syllables. Thus, the model could, in principle, be applied to natural language.
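The idea of predicting the next sound and noticing when a familiar set of sounds forms an unfamiliar syllable can be sketched in a few lines of Python. This is a deliberately crude stand-in, not the published model, which describes neuronal dynamics at several temporal levels. The three "known" syllables and the helper names `predict_next` and `first_surprise` are assumptions made up for this example.

```python
# Toy illustration only. The known syllables are invented; the study's
# actual stimuli and model are not reproduced here.
VOWELS = ("a", "e", "i", "o")
KNOWN_SYLLABLES = ("aeio", "oiea", "eaoi")  # hypothetical four-sound syllables


def predict_next(heard):
    """Return the set of sounds that could follow `heard` in a known syllable."""
    return {
        syllable[len(heard)]
        for syllable in KNOWN_SYLLABLES
        if syllable.startswith(heard) and len(syllable) > len(heard)
    }


def first_surprise(sounds):
    """Return the index of the first sound that violates the prediction,
    or None if the whole sequence matches a known syllable so far."""
    heard = ""
    for index, sound in enumerate(sounds):
        if sound not in predict_next(heard):
            return index  # prediction error: unexpected sound
        heard += sound
    return None


if __name__ == "__main__":
    print(predict_next("ae"))      # sounds expected after "a", "e"
    print(first_surprise("aeio"))  # familiar syllable
    print(first_surprise("aeoi"))  # familiar sounds, unfamiliar order
```

Here the slow level is the syllable and the fast level is the individual sound: knowing which syllables exist narrows down the next sound, and a sound outside that prediction is flagged as an error at the position where it occurs.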

"The crucial point, from a neuroscientific perspective, is that the reactions of the model were similar to what would be observed in the human brain", Stefan Kiebel says. This indicates that the researchers' model could represent the processes in the brain. At the same time, the model provides new approaches for practical applications in the field of artificial speech recognition.

More information: Stefan J. Kiebel, Katharina von Kriegstein, Jean Daunizeau, Karl J. Friston; Recognizing sequences of sequences; PLoS Computational Biology, August 14th, 2009.

Source: Max-Planck-Gesellschaft
