(PhysOrg.com) -- Millions of hours of old shows sit collecting dust in the basements of TV and radio broadcasters. Digging through these audiovisual treasure troves is becoming faster and easier thanks to software developed by European researchers.
In recent years many public and private organisations have embarked on initiatives to digitise collections of recordings from decades past in an effort to gain new insights into history and preserve the audiovisual content for posterity.
Sifting through these collections of analogue magnetic tapes, they have uncovered long-lost footage of historical events and interviews with historical figures. But they have also encountered numerous problems.
“We don’t have the resources to digitise, describe and index all content in detail, so we need some sort of automated or semi-automated method,” says Jean-François Cosandier, the head of the documentation and archive department at Radio Suisse Romande, a radio station in the French-speaking part of Switzerland.
Not least among the challenges these archival archaeologists face is identifying what an old recording contains and cataloguing the digital copy for easy access and retrieval in the future.
“Some archives have collections of recordings that are well documented, but many do not,” says Philippe Scohy, a project manager at Memnon Archiving Services in Brussels. Some hold hundreds of thousands of hours of content without knowing what is in it.
Because old recordings lack metadata describing their content, it can take an archivist as long as five or six hours to catalogue a one-hour radio interview, even though perhaps only a few minutes of that interview will be of interest.
“Given the amount of old media being digitised and the problems of identifying and cataloguing it, any tool that makes the archivist’s job easier is a welcome development,” Scohy notes.
Memnon is currently marketing a set of tools, IPI-Manager, intended to do just that. Developed in the EU-funded Memories project, the tools automate the more laborious aspects of the archiving process, helping archivists index and sort media collections faster and more easily. That in turn should make more historical content accessible to more people and help ensure its preservation for future generations. For example, Radio Suisse Romande, a partner in the Memories project, plans to use the tools to help make its 80-year-old collection of audio recordings accessible and searchable online.
“We have digitised a quarter of our old analogue archive, so there is still a lot more work to be done,” Cosandier says. And, he adds, the development of new and more effective techniques is not justified solely by efforts to digitise old content. Nowadays archivists have to deal with a diverse range of audio documents, from radio programmes and speeches to conferences and university courses. With traditional methods of analysis and indexing, it would be almost impossible to archive this content and make it accessible.
Tools to dig up the past and the future
By analysing audio content, the Memories tools are able to identify different features of a recording. Used to catalogue a radio interview, for example, they detect when a question is asked and an answer given by recognising the exchange between speakers. The system then automatically tags each question and answer pair to let future listeners jump to different parts of the interview at the click of a button. Similarly, the Memories researchers developed a tool to automatically detect and tag the start, end and commercial breaks of different shows by recognising their trademark jingles.
“An old tape might be labelled with the shows that are on it, but more often than not an archivist is given no clue as to what order they are in or how long they run without watching the whole thing,” Scohy says. “Our tools provide that information.”
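The question-and-answer tagging described above can be illustrated with a minimal sketch. It assumes that an earlier analysis step has already split the recording into speaker turns and that the interviewer's label is known; the `Turn` type, the `tag_qa_pairs` function and the sample timings are hypothetical and are not taken from the Memories tools.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One uninterrupted stretch of speech by a single speaker (hypothetical type)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float


def tag_qa_pairs(turns, interviewer):
    """Pair each interviewer turn with the turn that immediately follows it.

    Assumption: an interviewer turn followed by a turn from someone else
    is treated as one question-and-answer pair.
    """
    pairs = []
    i = 0
    while i < len(turns) - 1:
        question, answer = turns[i], turns[i + 1]
        if question.speaker == interviewer and answer.speaker != interviewer:
            pairs.append({
                "question_start": question.start,
                "answer_start": answer.start,
                "answer_end": answer.end,
            })
            i += 2  # skip past the answer that was just paired
        else:
            i += 1
    return pairs


# Invented sample data: two exchanges between a host and a guest.
turns = [
    Turn("host", 0.0, 12.5),
    Turn("guest", 12.5, 95.0),
    Turn("host", 95.0, 101.0),
    Turn("guest", 101.0, 180.0),
]
pairs = tag_qa_pairs(turns, "host")
for pair in pairs:
    print(pair)
```

Each resulting entry holds the time offsets a player would need to jump straight to a given question or answer.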
For recordings of one or more people speaking, voice recognition technology can also be applied: once trained, it automatically identifies the speakers, while a speech-to-text application turns the spoken content into text.
To provide search functionality, the Memories team developed a sophisticated search tool adapted from information-gathering methods that have been tried and tested in genetic and genomic applications. It is based on the statistical association of the occurrences of words.
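The article does not say which statistic the search tool uses. As one common way to measure the statistical association of word occurrences, the sketch below computes pointwise mutual information (PMI) between words that appear in the same transcript. The function name and the sample transcripts are invented for illustration, and PMI is an assumed stand-in, not the Memories method.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Score every pair of words that co-occur in at least one document.

    PMI = log2( P(a and b) / (P(a) * P(b)) ), with probabilities estimated
    as the fraction of documents containing the word or the pair.
    """
    n = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())  # count each word once per document
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    scores = {}
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        p_pair = count / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[pair] = math.log2(p_pair / (p_a * p_b))
    return scores


# Invented sample transcripts standing in for speech-to-text output.
transcripts = [
    "election results geneva",
    "election campaign geneva",
    "jazz festival montreux",
    "jazz concert montreux",
]
scores = pmi_table(transcripts)
print(scores[frozenset({"election", "geneva"})])
```

Words that tend to appear together receive a positive score, so a search for one term could also surface recordings that mention its strongly associated terms.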
In the case of music, the Memories researchers at Mist Technologies/Audionamix and Technion (Haifa) developed a tool to “unmix” the different channels that make up a song. Called Single Sensor Source Separation (SSSS), the software is able to differentiate between instruments, separating the sound of a trumpet from a piano, for example, and making it possible to identify different stages in a tune. The current version works best with mono recordings and can also be used to help digitally remaster them into stereo and surround sound, Scohy notes.
Open archives for future-proof content
The overall Memories architecture is based on the Open Archival Information System (OAIS) model, a standard originally developed by the Consultative Committee for Space Data Systems (CCSDS). Its aim is to future-proof digital content by storing and cataloguing it in such a way that it does not become obsolete and inaccessible as a result of technological progress.
“By adopting the OAIS approach we are trying to ensure that content is around for a very long time, not just years but thousands of years,” Scohy says.
With Memnon actively marketing products based on the work done in Memories and expecting its first sales imminently, preserving audiovisual memories for the future should be a little less of a challenge.
More information: Memories project: www.memories-project.eu/