February 6, 2019

New method for high-speed synthesis of natural voices

by Research Organization of Information and Systems

A research team at the National Institute of Informatics (NII; Tokyo, Japan), including Xin Wang, Shinji Takaki and Junichi Yamagishi, has developed a neural source-filter (NSF) model for high-speed, high-quality voice synthesis. The technique combines recent deep-learning algorithms with a classical speech production model dating back to the 1960s. It not only generates high-quality voice waveforms that closely resemble the human voice, but also allows its neural networks to be trained stably.

To date, many speech synthesis systems have adopted the vocoder approach, a method for synthesizing speech waveforms that is widely used in cellular-phone networks and other applications. However, the quality of the speech waveforms synthesized by these methods has remained inferior to that of the human voice. In 2016, an influential overseas technology company proposed WaveNet, a speech-synthesis method based on deep-learning algorithms, and demonstrated that it could synthesize high-quality speech waveforms resembling the human voice. One drawback of WaveNet, however, is the extremely complex structure of its neural networks. They demand large quantities of voice data for training, and parameter tuning and other laborious trial-and-error procedures must be repeated many times before accurate predictions can be obtained.

Overview and achievements of the research

One of the best-known vocoders is the source-filter vocoder, which was developed in the 1960s and remains in widespread use today. The NII research team combined the conventional source-filter vocoder method with modern neural-network algorithms to develop a new technique for synthesizing high-quality speech waveforms resembling the human voice. One advantage of this neural source-filter (NSF) method is the simple structure of its neural networks, which require only about one hour of voice data for training and can produce correct predictions without extensive parameter tuning. Moreover, large-scale listening tests have demonstrated that speech waveforms produced by NSF techniques are comparable in quality to those generated by WaveNet.
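For readers unfamiliar with the classical source-filter idea mentioned above, the following is a minimal Python sketch of it: a periodic excitation (the "source") is passed through a resonant filter standing in for the vocal tract (the "filter"). It illustrates only the conventional concept, not the NII team's NSF implementation. The function names, the single two-pole resonator, and all parameter values (100 Hz pitch, 700 Hz formant, 16 kHz sampling rate) are assumptions chosen for illustration.

```python
import math


def pulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit impulse per pitch period (a crude voiced excitation)."""
    period = round(sample_rate / f0_hz)  # samples per pitch period
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so the filter is stable
    theta = 2.0 * math.pi * centre_hz / sample_rate      # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz=100.0, formant_hz=700.0, bandwidth_hz=100.0,
               sample_rate=16000, duration_s=0.05):
    """Generate a short vowel-like waveform: excitation followed by filtering."""
    n_samples = int(sample_rate * duration_s)
    source = pulse_train(f0_hz, sample_rate, n_samples)
    return resonator(source, formant_hz, bandwidth_hz, sample_rate)


waveform = synthesize()
```

Here `waveform` holds 50 ms of samples whose pitch is set by the excitation and whose timbre is set by the filter. In this hand-designed version both stages use fixed, manually chosen parameters; according to the article, the NSF method instead combines this kind of structure with neural networks trained on recorded voice data.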

Because the theoretical basis of NSF differs from the patented technologies used by influential overseas ICT companies, the adoption of NSF techniques is likely to spur new technological advances in speech synthesis. For this reason, the source code implementing the NSF method has been made available to the public at no cost, allowing it to be widely used.

More information: Neural source-filter-based waveform model for statistical parametric speech synthesis. arxiv.org/abs/1810.11946

Source code: github.com/nii-yamagishilab/pr … ject-CURRENNT-public

Trained models (may be executed to generate English-language voices): github.com/nii-yamagishilab/pr … ect-CURRENNT-scripts

Voice samples (Japanese or English): nii-yamagishilab.github.io/samples-nsf/index.html

Provided by Research Organization of Information and Systems

Citation: New method for high-speed synthesis of natural voices (2019, February 6) retrieved 29 June 2024 from https://phys.org/news/2019-02-method-high-speed-synthesis-natural-voices.html

