November 9, 2010 feature
Human-computer music performances use system that links music and musical gestures (w/ Video)
(PhysOrg.com)—Every musical sound comes from a specific way that an instrument is played. With modern technology such as sensors, signal processing, and sometimes machine learning algorithms, researchers can determine the precise musical gesture used to produce a particular sound on an instrument. The ability to recreate musical gestures from sounds can be used for interactive human-computer music performances, music transcription, and other innovative applications.
In a new study, researchers have developed a method for capturing musical gestures and mapping them to sounds that overcomes some of the disadvantages of previous methods. Adam Tindale, Ajay Kapur, and George Tzanetakis – all trained musicians and computer scientists working at the University of Victoria in Victoria, Canada – have described the new method in a study to be published in IEEE Transactions on Multimedia. The method grew out of the authors' experiences in developing instruments for interactive human-computer music performances. At the time, Tindale and Kapur were both completing their PhDs at the University of Victoria. Tindale now works at the Alberta College of Art and Design and Kapur works at the California Institute of the Arts and the New Zealand School of Music.
As the researchers explain in their study, there are two main approaches for capturing musical gestures. One approach is direct acquisition, which involves attaching permanent sensors to instruments to create "hyper-instruments." However, this approach is often invasive to performers and requires modification of expensive instruments. The second approach is indirect acquisition, which uses a microphone to capture the sound and then applies sophisticated signal processing and machine learning algorithms to extract gestures from it. This approach requires large amounts of training data.
The researchers' new method is something of a hybrid of these two approaches. They temporarily attach sensors to an instrument to capture musical gestures and use a microphone to capture sound. This data is analyzed, and the gesture-sound mappings are used to train machine learning models to extract gestures from sounds only. These machine learning models then create what the researchers call a "surrogate sensor," which behaves like the original invasive sensor but is not attached to the instrument. The surrogate sensor can determine the musical gestures based only on the analyzed sound captured from the microphone.
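The train-then-replace idea can be illustrated with a minimal sketch. It is not the researchers' implementation: the synthetic drum strikes, the single "brightness" audio feature, the strike-position sensor reading, and the simple least-squares line are all assumptions chosen to keep the example short. The point is only that the sensor supplies the training labels automatically, and that afterward the trained model estimates the gesture from audio alone.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_strike(rng):
    """Hypothetical stand-in for one hit on a sensor-equipped drum.

    Returns (audio_feature, sensor_reading). Both the feature and its
    relationship to the strike position are invented for illustration.
    """
    position = rng.random()  # sensor reading: 0.0 = center, 1.0 = edge
    brightness = 800.0 + 1200.0 * position + rng.gauss(0.0, 30.0)
    return brightness, position


def train_surrogate(n_strikes=500, seed=0):
    """Train on (audio, sensor) pairs; return a function that needs audio only."""
    rng = random.Random(seed)
    strikes = [simulate_strike(rng) for _ in range(n_strikes)]
    features = [feature for feature, _ in strikes]
    labels = [position for _, position in strikes]  # labels come from the sensor
    slope, intercept = fit_line(features, labels)

    def surrogate(brightness):
        # Estimate the sensor reading from the audio feature alone.
        return slope * brightness + intercept

    return surrogate


surrogate_sensor = train_surrogate()
estimate = surrogate_sensor(1400.0)  # feature from a drum with no sensor attached
print(f"estimated strike position: {estimate:.2f}")
```

In this sketch no one annotates the training strikes by hand, since each label is read directly from the (simulated) sensor. A real system would extract features from recorded audio and would likely use a richer model than a single straight line.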
"The main advantage of the method is that it allows large amounts of training data for machine learning algorithms to be acquired without human annotation simply by playing an instrument enhanced with sensors," Tzanetakis told PhysOrg.com. "Once the surrogate sensor has been trained and its performance evaluated, then it can be used in place of the actual physical sensor on unmodified instruments of the same type."
The system overcomes the drawbacks of both earlier approaches: it doesn't hinder the performers or their instruments during a performance, and it doesn't require large amounts of processing and analysis, which greatly reduces the time needed. For instance, data from snare drum samples that took nearly a week of manual labor can be processed in less than an hour with the new method.
The researchers demonstrated the system with professional musicians playing electronic sitars and electronic snare drums. The results showed that the trained surrogate sensors could accurately determine musical gestures even when the musician performing was not the one whose playing had been used to train the surrogate sensor. Further, any kind of performing (such as improvisation or playing a song) could be used to train the surrogate sensor.
In the future, the researchers plan to make the system sensitive to additional features, such as training it to recognize the type of mouthpiece used on a woodwind instrument and the string on which a note is played on a violin. Both Tindale and Kapur currently use the system during musical performances, as shown in the accompanying videos.
"Basically, the gesture information extracted is typically used in the following ways," Tzanetakis explained. "(1) To map performance information to sound and a music generation algorithm that react in real-time expressively to the music; (2) to drive animations and visuals; (3) to synthesize sounds parametrically (for example, a synthesized drum sound might change based on where the snare drum is hit); and (4) to do analysis of the music played (for example, automatically tracking the tempo) in the context of computer-controlled robotic instruments interacting with humans."
Copyright 2010 PhysOrg.com.