Smarter searching in archives using newly developed interface

Sep 03, 2012

Large quantities of data are flowing into archives each day: newspapers and books are being digitised, while video material is being supplied directly in digital format. Search engine technology is therefore growing in importance. All of this digitised material provides a wealth of information for researchers in the humanities and social sciences, but can they also find what they are looking for amongst these so-called 'big data'?

According to Marc Bron, PhD student at the Intelligent Systems Lab Amsterdam (ISLA) at the University of Amsterdam, that depends on various factors. For certain material, researchers know that it is in the archive and which search terms they should use to retrieve it. However, in the majority of cases researchers come to the archive with a research question and they must first search for suitable material and explore the content of the archive.

Finding relevant material

One important difficulty in finding relevant material lies in formulating the query that is entered into the search engine. The search terms used by researchers can differ from the terminology archivists use to describe the material, even though both mean more or less the same thing. For example, a researcher might enter the term 'migrant', whereas an archivist has used the term 'foreigner'. A second problem arises once material has been found: researchers cannot establish whether they have collected all of the relevant material, or whether the archive still holds other interesting items that they are not yet aware of.

Explorative interface provides a solution

In order to tackle these problems, Bron has developed an explorative interface together with colleagues at ISLA, the Centre for Television in Transition of Utrecht University and the Netherlands Institute for Sound and Vision. This interface is called MeRDES, an acronym for Media Researchers' Data Exploration Suite. It can be used to compare the outcomes of different queries in rich archives, such as those of the Netherlands Institute for Sound and Vision.

Researchers can visualise the number of programmes that are relevant for each of the queries in order to form an impression about how much information is available for different aspects of a subject. For example, using this approach the growing use of the term 'migrant' in archive material can be compared with the use of the term 'foreigner'. The amount of material available for a subject and how this compares to other subjects can exert a considerable influence on the approach used for the research and the questions that can ultimately be answered.
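The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the MeRDES implementation: the archive records, the function names `programmes_per_year` and `compare_queries`, and the simple whole-word matching are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy archive: (broadcast year, archivist's description).
# These records are invented for illustration and come from no real archive.
ARCHIVE = [
    (1975, "report on foreigner labour in factories"),
    (1975, "interview with a foreigner family"),
    (1990, "debate on migrant housing"),
    (1990, "news item about a foreigner shopkeeper"),
    (2005, "documentary on migrant children at school"),
    (2005, "talk show about migrant integration"),
]


def programmes_per_year(archive, term):
    """Count, per year, the records whose description contains `term` as a whole word."""
    term = term.lower()
    return Counter(
        year
        for year, description in archive
        if term in description.lower().split()
    )


def compare_queries(archive, terms):
    """Return {year: {term: count}} so several queries can be compared side by side."""
    counts = {term: programmes_per_year(archive, term) for term in terms}
    years = sorted({year for year, _ in archive})
    return {year: {term: counts[term][year] for term in terms} for year in years}


if __name__ == "__main__":
    # Print one row per year with the number of matching programmes per query.
    for year, row in compare_queries(ARCHIVE, ["migrant", "foreigner"]).items():
        print(year, row)
```

A real system would query the archive's own search index and plot the counts rather than print them; the sketch only shows how per-query counts over time make a shift in terminology visible.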

Marc Bron and postdoc Jasmijn Van Gorp (Utrecht University) tested the interface by carrying out a user study with 40 media scientists. Bron presented the outcomes of their research at the International conference of the Special Interest Group on Information Retrieval (SIGIR) that was held from 12 to 16 August in Portland (Oregon, United States). A demo of the interface is available at: zookma.science.uva.nl/merdesdemo.
