December 27, 2016

Speech signal processing—enhancing voice conversion models

by University of Electro-Communications

Altering a person's voice so that it sounds like someone else's is useful in areas such as security and privacy. This computational technique, known as voice conversion (VC), usually requires parallel data from two speakers to achieve a natural-sounding conversion. Parallel data consists of recordings of two people saying the same sentences, covering the necessary vocabulary; the recordings are then time-matched and used to create a new target voice for the original speaker.
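The time-matching of parallel recordings is commonly done with dynamic time warping (DTW). The article does not say which alignment method was used, so the following is only a minimal sketch of classic DTW, assuming each recording has already been turned into a sequence of feature frames. The function name `dtw_align` and the toy one-dimensional frames are illustrative, not taken from the study.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two frame sequences with classic dynamic time warping.

    src, tgt: arrays of shape (num_frames, feature_dim).
    Returns a list of (src_index, tgt_index) pairs from first to last frame.
    """
    n, m = len(src), len(tgt)
    # Euclidean distance between every source frame and every target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)

    # cost[i, j] = cheapest way to align the first i source frames
    # with the first j target frames.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # advance both speakers
                cost[i - 1, j],      # advance source only
                cost[i, j - 1],      # advance target only
            )

    # Walk back from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy example: the second speaker lingers on the first sound.
speaker_a = np.array([[0.0], [1.0], [2.0]])
speaker_b = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_align(speaker_a, speaker_b))
```

The aligned frame pairs are what a parallel-trained model would learn from, which is why both speakers must have recorded the same sentences.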

However, parallel data poses problems in speech processing, not least the need for exactly matching vocabulary between the two speakers, which leaves no training material for vocabulary outside the pre-defined training set. Now, Toru Nakashika at the University of Electro-Communications in Tokyo and co-workers have created a model capable of using non-parallel data to create a target voice. In other words, the target voice can say sentences and vocabulary not used in model training.

Their new VC method is based on the simple premise that the acoustic features of speech are made up of two layers: neutral phonological information belonging to no specific person, and 'speaker identity' features that make words sound as if they are coming from a particular speaker. Nakashika's model, called an adaptive restricted Boltzmann machine, helps deconstruct speech, retaining the neutral phonological information but replacing speaker-specific information with that of the target speaker.
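The two-layer premise can be illustrated with a deliberately simple toy. This is not the adaptive restricted Boltzmann machine from the paper, which is a probabilistic energy-based model; it is a linear stand-in that assumes each speaker's identity is an invertible matrix plus an offset applied to a neutral code. All names here (`make_speaker`, `to_neutral`, `from_neutral`, `convert`) and the random parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # size of one acoustic feature frame (arbitrary for this toy)

def make_speaker(rng, dim):
    """Hypothetical speaker identity: a near-identity matrix and an offset."""
    A = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))
    b = rng.standard_normal(dim)
    return A, b

def to_neutral(frames, speaker):
    """Strip the speaker layer: solve A z = (v - b) for every frame v."""
    A, b = speaker
    return np.linalg.solve(A, (frames - b).T).T

def from_neutral(neutral, speaker):
    """Add a speaker layer back on top of neutral content: v = A z + b."""
    A, b = speaker
    return neutral @ A.T + b

def convert(frames, source, target):
    """Keep the neutral content, swap the source identity for the target's."""
    return from_neutral(to_neutral(frames, source), target)

source = make_speaker(rng, dim)
target = make_speaker(rng, dim)

# Five frames of made-up neutral content, spoken by the source speaker.
content = rng.standard_normal((5, dim))
spoken_by_source = from_neutral(content, source)

converted = convert(spoken_by_source, source, target)
# In this toy, the converted frames match what the target would have produced.
print(np.allclose(converted, from_neutral(content, target)))
```

Because the speaker parameters are separate from the content, nothing in the conversion step requires the two speakers to have said the same sentences, which is the property the non-parallel approach relies on.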

After training, the model performed comparably to existing parallel-trained models, with the added advantage that new phonemic sounds can be generated for the target speaker, which enables generation of the target speaker's speech in a different language.

More information: Toru Nakashika et al. Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2016). DOI: 10.1109/TASLP.2016.2593263

Provided by University of Electro-Communications

Citation: Speech signal processing—enhancing voice conversion models (2016, December 27) retrieved 10 July 2024 from https://phys.org/news/2016-12-speech-processingenhancing-voice-conversion.html
