share this!
1
3
Share
Email

April 10, 2018

Researchers develop more comprehensive acoustic scene analysis method

by Chinese Association of Automation

Researchers have demonstrated an improved method for audio analysis machines to process our noisy world. Their approach hinges on the combination of scalograms and spectrograms—the visual representations of audio—as well as convolutional neural networks (CNNs), the learning tool machines use to better analyze visual images. In this case, the visual images are used to analyze audio to better identify and classify sound.

The team published their results in the journal IEEE/CAA Journal of Automatica Sinica (JAS), a joint publication of the IEEE and the Chinese Association of Automation.

"Machines have made great progress in the analysis of speech and music, but general sound analysis has been lagging a big behind—usually, mostly isolated sound 'events' such as gun shots and the like have been targeted in the past," said Björn Schuller, a professor and chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg in Germany, who led the research. "Real-world audio is usually a highly blended mix of different sound sources—each of which have different states and traits."

Schuller points to the sound of a car as an example. It's not a singular audio event; rather different parts of the car's parts, its tires interacting with the road, the car's brand and speed all provide their own unique signatures.

"At the same time, there may be music or speech in the car," said Schuller, who is also an associate professor of Machine Learning at Imperial College London, and a visiting professor in the School of Computer Science and Technology at the Harbin Institute of Technology in China. "Once computers can understand all parts of this 'acoustic scene', they will be considerably better at decomposing it into each part and attribute each part as described."

Spectrograms provide a visual representation of audio scenes, but they have a fixed time-frequency resolution, that is the time at which frequencies change. Scalograms, on the other hand, offer a more detailed visual representation of acoustic scenes than spectrograms, for instance, acoustic scenes like the music or the speech or other sounds in the car now can be better represented.

"There are usually multiple sounds happening in one scene so... there should be multiple frequencies and they change with time," said Zhao Ren, an author on the paper and a Ph.D. candidate at the University of Augsburg who works with Schuller. "Fortunately, scalograms could solve this problem exactly since it incorporates multiple scales."

"Scalograms can be employed to help spectrograms in extracting features for acoustic scene classification," Ren said, and both spectrograms and scalograms need to be able to learn to continue improving.

"Further, pre-trained neural networks build a bridge between [the] image and audio processing."

The pre-trained neural networks the authors used are Convolutional Neural Networks (CNNs). CNNs are inspired by how neurons work in animals' visual cortex and the artificial neural networks can be used to successfully process visual imagery. Such networks are crucial in machine learning, and in this case, helping improve the scalograms.

CNNs receive some training before they're applied to a scene, but they mostly learn from exposure. By learning sounds from a combination of different frequencies and scales, the algorithm can better predict the sources and, eventually, predict the result of an unusual noise, such as a car engine malfunction.

"The ultimate goal is machine hearing/listening in a holistic fashion... across speech, music, and sound just like a human being would," Schuller said, noting that this would combine with the already advanced work in speech analysis to provide a richer and deeper understanding, "to then be able to get 'the whole picture' in the audio."

More information: Zhao Ren et al, Deep Scalogram Representations for Acoustic Scene Classification, IEEE/CAA Journal of Automatica Sinica (2018). DOI: 10.1109/JAS.2018.7511066

Provided by Chinese Association of Automation

Citation: Researchers develop more comprehensive acoustic scene analysis method (2018, April 10) retrieved 16 August 2024 from https://phys.org/news/2018-04-comprehensive-acoustic-scene-analysis-method.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

'Bat detectives' train new algorithms to discern bat calls in noisy recordings

4 shares

Feedback to editors

Researchers develop more comprehensive acoustic scene analysis method

The banana apocalypse is near, but biologists might have found a key to their survival

Scientists pinpoint dino-killing asteroid's origin: past Jupiter

Soundscape study shows how underground acoustics can amplify soil health

Scottish and Irish rocks confirmed as rare record of 'snowball Earth'

Blind cavefish have extraordinary taste buds that increase with age, research reveals

'Mercury bomb' threatens millions as Arctic temperatures rise, study warns

Team develops method for control over single-molecule photoswitching

X-ray irradiation technique helps to control cancer-causing poison in corn

Physicists uncover new phenomena in fractional quantum Hall effects

Researchers observe 'locked' electron pairs in a superconductor cuprate

Relevant PhysicsForums posts

Optimization of barrel length in pneumatic cannons

Why Are My Bending Moment Signs Incorrect in the Direct Stiffness Method?

Compressive strength of aluminum

Need help with determining thickness of steel bars

Airflow Restriction in Tobacco Smoking Pipe

Baltimore's Francis Scott Key Bridge Collapses after Ship Strike

'Bat detectives' train new algorithms to discern bat calls in noisy recordings

Read my lips: New technology spells out what's said when audio fails

Facial expression more important to conveying emotion in music than in speech

Don't shout to be heard—just reduce your acoustic reverberation

System correlates recorded speech with images, could lead to fully automated speech recognition

Computer learns to recognize sounds by watching video

Tiny probe that senses deep in the lung set to shed light on disease

MIT and NASA engineers demonstrate a new kind of airplane wing

When Concorde first took to the sky 50 years ago

Paper sensors remove the sting of diabetic testing

Micropores let oxygen and nutrients inside biofabricated tissues

Understanding dynamic stall at high speeds

Medical Xpress

Tech Xplore

Science X

Researchers develop more comprehensive acoustic scene analysis method

The banana apocalypse is near, but biologists might have found a key to their survival

Scientists pinpoint dino-killing asteroid's origin: past Jupiter

Soundscape study shows how underground acoustics can amplify soil health

Scottish and Irish rocks confirmed as rare record of 'snowball Earth'

Blind cavefish have extraordinary taste buds that increase with age, research reveals

'Mercury bomb' threatens millions as Arctic temperatures rise, study warns

Team develops method for control over single-molecule photoswitching

X-ray irradiation technique helps to control cancer-causing poison in corn

Physicists uncover new phenomena in fractional quantum Hall effects

Researchers observe 'locked' electron pairs in a superconductor cuprate

Relevant PhysicsForums posts

Related Stories

'Bat detectives' train new algorithms to discern bat calls in noisy recordings

Read my lips: New technology spells out what's said when audio fails

Facial expression more important to conveying emotion in music than in speech

Don't shout to be heard—just reduce your acoustic reverberation

System correlates recorded speech with images, could lead to fully automated speech recognition

Computer learns to recognize sounds by watching video

Recommended for you

Tiny probe that senses deep in the lung set to shed light on disease

MIT and NASA engineers demonstrate a new kind of airplane wing

When Concorde first took to the sky 50 years ago

Paper sensors remove the sting of diabetic testing

Micropores let oxygen and nutrients inside biofabricated tissues

Understanding dynamic stall at high speeds

Newsletter sign up

Donate and enjoy an ad-free experience