Speech signal processing—enhancing voice conversion models

December 27, 2016
Researchers in Japan have created a new voice conversion method using an adaptive restricted Boltzmann machine - a model capable of deconstructing speech and rebuilding it to sound like a different person speaking. Crucially, this model works without the need for parallel data from two speakers for training, meaning target voices can say words and sentences not used in training. Credit: University of Electro-Communications

Altering a person's voice so that it sounds like another person's is a useful technique in areas such as security and privacy. This computational technique, known as voice conversion (VC), usually requires parallel data from two speakers to achieve a natural-sounding conversion. Parallel data consists of recordings of two people saying the same sentences, covering the necessary vocabulary, which are then time-matched and used to create a new target voice for the original speaker.
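The article does not say how parallel recordings are time-matched. One common approach is dynamic time warping (DTW), which pairs up frames from two utterances of the same sentence even when the speakers talk at different speeds. The sketch below is illustrative only: the function name `dtw_path`, the Euclidean frame distance, and the toy one-dimensional "features" are all assumptions, not details from the researchers' work.

```python
import numpy as np

def dtw_path(x, y):
    """Align two feature sequences (frames x dims) with basic DTW.

    Returns a list of (i, j) pairs meaning frame i of x matches frame j of y.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n, m = len(x), len(y)
    # cost[i, j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # assumed Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Walk back from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),  # both sequences advance
            (cost[i - 1, j], i - 1, j),          # only x advances
            (cost[i, j - 1], i, j - 1),          # only y advances
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path

# Toy example: speaker B holds the first sound one frame longer than speaker A.
speaker_a = np.array([[0.0], [1.0], [2.0]])
speaker_b = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_path(speaker_a, speaker_b))
```

The aligned frame pairs are what a parallel-trained system would learn its source-to-target mapping from, which is why both speakers must have recorded the same sentences.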

However, parallel data brings problems in speech processing, not least the need for exactly matching vocabulary between the two speakers, which leaves no training corpus for vocabulary outside the pre-defined set. Now, Toru Nakashika at the University of Electro-Communications in Tokyo and co-workers have created a model that uses non-parallel data to create a target voice. In other words, the target voice can say words and sentences not used in model training.

Their new VC method is based on the simple premise that the acoustic features of speech are made up of two layers: neutral phonological information belonging to no specific person, and 'speaker identity' features that make words sound like they are coming from a particular speaker. Nakashika's model, called an adaptive restricted Boltzmann machine, deconstructs speech, retaining the neutral phonological information but replacing the speaker-specific information with that of the target speaker.
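The two-layer idea can be made concrete with a toy sketch. The code below is not the published model: it assumes one shared weight matrix plus a per-speaker adaptation matrix and bias, uses random values in place of trained parameters, and uses simple mean-field encode and decode steps. The class name `ToyAdaptiveRBM`, the speaker labels, and all dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyAdaptiveRBM:
    """Toy speaker-adaptive RBM: shared weights W, per-speaker A and b.

    Parameters are random here; a real system would learn them from
    (non-parallel) recordings of each speaker.
    """

    def __init__(self, n_visible, n_hidden, speakers, seed=0):
        rng = np.random.default_rng(seed)
        # Speaker-independent parameters (the "neutral" layer).
        self.W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
        self.c = np.zeros(n_hidden)
        # Speaker-specific parameters (the "speaker identity" layer).
        self.A = {s: np.eye(n_visible)
                     + rng.normal(scale=0.05, size=(n_visible, n_visible))
                  for s in speakers}
        self.b = {s: rng.normal(scale=0.1, size=n_visible) for s in speakers}

    def encode(self, v, speaker):
        # Hidden activations, computed with the source speaker's adapted weights.
        W_s = self.A[speaker] @ self.W
        return sigmoid(v @ W_s + self.c)

    def decode(self, h, speaker):
        # Rebuild acoustic features using the chosen speaker's parameters.
        W_s = self.A[speaker] @ self.W
        return self.b[speaker] + h @ W_s.T

    def convert(self, v, source, target):
        # Keep the hidden code, swap the speaker-specific parameters.
        return self.decode(self.encode(v, source), target)

model = ToyAdaptiveRBM(n_visible=4, n_hidden=3, speakers=["alice", "bob"])
frames = np.ones((5, 4))  # 5 frames of 4-dimensional placeholder features
converted = model.convert(frames, source="alice", target="bob")
print(converted.shape)
```

In this sketch, conversion amounts to encoding with one speaker's parameters and decoding with another's, so no step requires the two speakers to have said the same sentences.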

After training, the model performed comparably to existing parallel-trained models, with the added advantage that new phonemic sounds can be generated for the target speaker, which enables the target speaker's voice to be generated in a different language.

More information: Toru Nakashika et al. Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2016). DOI: 10.1109/TASLP.2016.2593263
