'Googling' through unique audio material: towards a better search result

Jul 04, 2012

Searching and finding in audio archives can be improved if we take a different look at the underlying technology and allow for how the results are used. This provides a better picture of the problems and the points for improvement. Laurens van der Werff demonstrated this in his PhD thesis 'Evaluation of Noisy Transcripts for Spoken Document Retrieval', which he will defend on 5 July at the University of Twente.

Van der Werff's research was carried out within the project CHoral, which focuses on making spoken audio material from the past accessible. Dutch archives and other heritage institutions look after many hundreds of thousands of hours of audio material such as interviews with witnesses of a special event but also, for example, all transmissions of national and regional radio organisations.

If this unique audio material can be disclosed well then it will make a valuable contribution to research in the area of language use and dialect, regional and national politics, and history. CHoral is one of 18 projects from the NWO research programme CATCH (Continuous Access to Cultural Heritage) which has a total budget of more than 15 million euros and is working on the accessibility of Dutch cultural heritage.

Improved evaluation of transcripts

Automatic speech recognition in combination with search technology offers the possibility of searching through sound files: the spoken word is converted into a written text (transcript) that you can subsequently search as 'usual'. Many research labs worldwide are working hard on improving the quality of speech recognition. However, for applications in search systems, and certainly for heritage collections, these improvements do not always deliver the maximum benefit.
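As a rough illustration of searching a transcript as 'usual', the Python sketch below indexes a small time-stamped transcript and looks up where a word was spoken. The transcript data and the function names `build_index` and `search` are invented for this example; they are not taken from CHoral or SHoUT, and a real system would also have to cope with recognition errors.

```python
from collections import defaultdict

# Hypothetical time-stamped transcript: (start time in seconds, recognised word).
# A speech recogniser would normally produce this; here it is written by hand.
transcript = [
    (0.0, "the"), (0.4, "bombing"), (0.9, "of"), (1.1, "rotterdam"),
    (12.5, "radio"), (13.0, "oranje"), (20.2, "rotterdam"),
]

def build_index(words):
    """Map each lower-cased word to the list of times at which it was spoken."""
    index = defaultdict(list)
    for start, word in words:
        index[word.lower()].append(start)
    return index

def search(index, query):
    """Return the start times (in seconds) at which the query word occurs."""
    return index.get(query.lower(), [])

index = build_index(transcript)
hits = search(index, "Rotterdam")
print(hits)  # the positions in the recording where playback could start
```

Each hit is an offset into the recording, so a user can jump straight to the relevant passage instead of listening to the whole tape.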

For heritage collections, Van der Werff proposed a new way of evaluating the quality of automatically generated transcripts that pays more attention to how historians and other end-users want to use the search results. This offers the possibility of an improved analysis of where problems occur and provides leads for optimisation. Due to the limited frame of reference in the heritage sector on which optimisations can be based, this approach is a most welcome step forwards.
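One way to picture an evaluation that looks at search results rather than at the transcript itself is sketched below in Python. It compares which documents a query retrieves from a hand-made reference transcript with those it retrieves from an automatic transcript. This is only an illustrative assumption, not the measure proposed in the thesis; the overlap score, the example documents and the names `retrieve` and `result_overlap` are all hypothetical.

```python
def retrieve(transcripts, query):
    """Return the ids of documents whose transcript contains the query word."""
    q = query.lower()
    return {doc_id for doc_id, text in transcripts.items()
            if q in text.lower().split()}

def result_overlap(reference, automatic, queries):
    """Average overlap (Jaccard index) between the results obtained on the
    reference transcripts and on the automatic transcripts."""
    scores = []
    for query in queries:
        ref_hits = retrieve(reference, query)
        auto_hits = retrieve(automatic, query)
        union = ref_hits | auto_hits
        # If neither version returns anything, the results agree completely.
        scores.append(len(ref_hits & auto_hits) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

# Hypothetical data: the recogniser misheard "oranje" as "orange" in document b.
reference = {"a": "the bombing of rotterdam", "b": "speech on radio oranje"}
automatic = {"a": "the bombing of rotterdam", "b": "speech on radio orange"}
queries = ["rotterdam", "oranje"]

score = result_overlap(reference, automatic, queries)
print(score)
```

In this toy case a single misrecognised word leaves most of the transcript intact, yet it makes one of the two queries fail completely, which is the kind of effect a user-oriented evaluation is meant to expose.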

Specific challenges of heritage material

The audio material in heritage collections has a number of special characteristics. Many sound tapes have not been digitised, most have not been manually transcribed, and they have no metadata or only superficial metadata. Furthermore, the recordings often feature non-professional speakers with a lot of noise in the background. Many of the speakers occur in only a single sound fragment, so very little training material is available for a computer. This is a typical problem within speech recognition, and it is exacerbated by the small geographic area in which Dutch is spoken. Another complicating factor is that this heritage data is mostly used in a highly specific manner. As a result of all of these special characteristics, an approach that works well with news data, for example, cannot be automatically applied to this unique material.

Applications of the optimised technology

The techniques from the CHoral project were used, for example, on collections from the Rotterdam Municipal Archive (transmissions of Radio Rijnmond; the website 'Brandgrens' with eyewitness accounts of the bombing of Rotterdam), the NIOD (Radio Oranje with speeches by Queen Wilhelmina during World War II; eyewitness accounts from survivors of Buchenwald) and the interview archive of Aletta/IAVV.

The knowledge and techniques from CHoral have also helped to lay the basis for the open source speech recognition package SHoUT (University of Twente) that has been further developed within the CATCH valorisation programme CATCHPlus (www.catchplus.nl). Using this software each archive can now, in principle, make its audio sources accessible without the need for its own in-house specialists. SHoUT is already being used for the national website 'Verteld Verleden' ['Spoken Past'], through which all audio sources in the Netherlands will be accessible in the future.

Further information: www.nwo.nl/catch and www.nwo.nl/catch/choral
