A computer can pick out speech even amid cacophony

Nov 26, 2008
Schematic diagram of SHoUT

(PhysOrg.com) -- Using a recent development in speech recognition, it is possible to search through television news programmes provided the recognition system has been trained beforehand. PhD candidate Marijn Huijbregts from the University of Twente (Netherlands) has, however, taken things even further: he has developed Spoken Document Retrieval for audio and video files that the speech recognition system has not yet been trained to deal with.

This version of speech recognition works well even if there is a great deal of unexpected background noise. Huijbregts received his doctorate from the Faculty of Electrical Engineering, Mathematics and Computer Science on 21 November.

Information can be retrieved from text very quickly using, for example, the index in a book or a search engine such as Google. However, it is much more difficult to search in audio and video files, as they do not have an easily searchable index. Speech recognition can simplify this process, because most of the information in audio and video files comes from speech and speech recognition transforms that speech into text. A Spoken Document Retrieval (SDR) system builds on this: it makes it possible to search directly in audio and video materials, just as if you were searching in ordinary text documents. In other words, a sort of Google for audio and video.
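To illustrate why a transcript makes audio searchable, here is a minimal sketch of a keyword index built over time-stamped transcript segments. It is not taken from any SDR system described in this article; the file names, the sample sentences and the function names `build_index` and `search` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical recogniser output: (file name, start time in seconds, text).
SEGMENTS = [
    ("news.mp4", 0.0, "the prime minister visited the flood area"),
    ("news.mp4", 12.5, "flood damage is estimated at ten million euros"),
    ("lecture.mp3", 3.0, "speech recognition turns audio into text"),
]


def build_index(segments):
    """Map every word to the set of segment numbers in which it occurs."""
    index = defaultdict(set)
    for i, (_, _, text) in enumerate(segments):
        for word in text.lower().split():
            index[word].add(i)
    return index


def search(index, segments, query):
    """Return (file, start time) for segments containing all query words."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [(segments[i][0], segments[i][1]) for i in sorted(hits)]


INDEX = build_index(SEGMENTS)
print(search(INDEX, SEGMENTS, "flood damage"))
```

Because each hit carries a file name and a start time, a search result can point straight to the moment in the recording where the words were spoken.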

Evening news on television

The Human Media Interaction group at the University of Twente had previously developed an SDR system for an evening television news programme. Search terms could be used to look for specific topics. The system had been specially trained using newspaper texts and 20 hours of news programmes. The SDR system for the evening news programme worked well because, in that situation, it was more or less known what was going to be said and there was little background noise. When the system was applied to other video files without further training, it did not perform well. Huijbregts then wondered whether he could develop an SDR system for which almost no training data would be needed, but which could nevertheless deal with unknown audio and video files satisfactorily.

SHoUT

With unknown audio and video files, it is not clear beforehand what is going to happen: who is speaking, what is being said and what sort of background noises are present. Huijbregts therefore developed an SDR system that was robust enough to deal with these unknown situations. It is called SHoUT (this acronym corresponds to the Dutch version of ‘Speech Recognition Research at University of Twente’). An SDR system can be described as robust if it can deal with all audio and video files under all sorts of circumstances, for example when there is background noise or when people are not speaking clearly.

SHoUT works in three stages. Firstly, the system distinguishes between speech and other sounds. For example, background music is filtered out from speech. Secondly, the system identifies different speakers and gives them labels. Finally, the automatic speech recognition takes place: the system transforms speech into text. You can now search the text file for relevant topics using keywords, just as Google searches through text files on the Internet.
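The data flow through the three stages can be sketched as follows. This is a toy illustration, not SHoUT's code or interface: real stages rely on acoustic and language models, whereas here each stage simply reads annotations already stored in hypothetical `Chunk` objects. All names (`Chunk`, `detect_speech`, `label_speakers`, `recognise`, `find`) and the sample data are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # start time in seconds
    kind: str     # "speech" or "music" (toy stand-in for the audio signal)
    voice: str    # toy voice signature used to tell speakers apart
    words: str    # text a recogniser would produce for this chunk


def detect_speech(chunks):
    """Stage 1: keep speech, drop other sounds such as music."""
    return [c for c in chunks if c.kind == "speech"]


def label_speakers(chunks):
    """Stage 2: give each distinct voice an anonymous label (SPK01, ...)."""
    labels = {}
    labelled = []
    for c in chunks:
        label = labels.setdefault(c.voice, f"SPK{len(labels) + 1:02d}")
        labelled.append((label, c))
    return labelled


def recognise(labelled):
    """Stage 3: turn each labelled speech chunk into a transcript line."""
    return [(c.start, label, c.words) for label, c in labelled]


def transcribe(chunks):
    return recognise(label_speakers(detect_speech(chunks)))


def find(transcript, keyword):
    """Keyword search over the transcript: returns (start time, speaker)."""
    key = keyword.lower()
    return [(t, spk) for t, spk, text in transcript if key in text.lower().split()]


AUDIO = [
    Chunk(0.0, "music", "", ""),
    Chunk(5.0, "speech", "a", "good evening this is the news"),
    Chunk(9.0, "speech", "b", "the flood reached the city"),
    Chunk(14.0, "speech", "a", "back to the studio"),
]

TRANSCRIPT = transcribe(AUDIO)
print(find(TRANSCRIPT, "flood"))
```

The speaker labels are anonymous: the second stage only decides which chunks belong to the same voice, not who that person is.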

The first version of SHoUT is already available, but Huijbregts is developing it further. SHoUT and other demonstrations of SDR systems can be found on Huijbregts' website (wwwhome.cs.utwente.nl/~huijbreg/).

Provided by University of Twente, Netherlands
