November 1, 2017

New text-to-speech tool for DIY voiceovers—from soft, sad and sultry to scary

by Ron Hoory and Alex Sorin, IBM

The animation world is rich in lovable and memorable characters, each with its own unique voice and personality, and animators, writers, and designers keep coming up with new games, film ideas, villains, and heroes. Creating voiceovers for these characters is a time-consuming and expensive process that often involves auditioning voice actors and booking studio time to record them.

What if you could take a text, generate speech, and go on from there to create a whole range of new voices just by changing vocal aspects such as pitch, rhythm, and timbre? Our team at IBM Research-Haifa is building on top of Watson text-to-speech technology to create customizable voices. Our vision of a way to easily create new, distinct, expressive voices led us to develop an automated voice creation process that is fast and flexible.

Our vision comes to life

Our vision has already come to life in cooperation with Sesame Street and the IBM Research Education team. We participated in an IBM-Sesame Street pilot at Georgia's Gwinnett County Public Schools in April-May 2017. Sesame Workshop content and Watson Education technology were introduced into classrooms for the first time, using an app for learning new vocabulary. Our challenge was to synthesize voices for new Sesame characters that would make kids smile, much as familiar characters like Ernie, Big Bird, and Elmo do.

Using voice as a tool

Credit: IBM

The IBM Virtual Voice Creator is a web-based tool that starts with three standard text-to-speech voices available for American English in the Watson Developer Cloud (WDC) text-to-speech (TTS) service. Using the tool, we can change different parameters and transform these standard voices into new virtual voices.

Think of it as a kind of mixing console, like the ones sound engineers use – but for voice manipulation. Sliders control each vocal aspect, such as pitch, speed, timbre, and breathiness, and the settings can be combined in endless ways. The GUI also creates a "visual signature" of the parameter controls: a visual representation that changes shape as the parameters are manipulated. It's easy to play around and create new voices. Happy Lisa can quickly turn into a wicked witch, and sullen Michael can be recreated as a cheerful little boy.
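To make the slider idea concrete, here is a minimal sketch of how a set of slider values could be turned into markup for a speech synthesizer. It is not the Virtual Voice Creator's implementation: the `VoicePreset` class and `to_ssml` function are hypothetical names, and the sketch covers only pitch and speed, using the `<prosody>` element from the W3C SSML standard. Timbre and breathiness have no standard SSML equivalent and are left out, and engines differ in exactly which attribute values they accept.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class VoicePreset:
    """Hypothetical slider settings; field names are illustrative only."""
    name: str
    pitch_percent: int = 0    # relative pitch shift; -20 means 20% lower
    rate_percent: int = 100   # speaking rate; 100 means unchanged


def to_ssml(text: str, preset: VoicePreset) -> str:
    """Wrap text in an SSML <prosody> element reflecting the preset."""
    pitch = f"{preset.pitch_percent:+d}%"   # signed, e.g. "-20%"
    rate = f"{preset.rate_percent:d}%"      # unsigned, e.g. "80%"
    return (
        f'<speak><prosody pitch="{pitch}" rate="{rate}">'
        f"{escape(text)}</prosody></speak>"  # escape &, <, > in the text
    )


# Two presets derived from the same underlying voice (values are made up).
witch = VoicePreset("wicked witch", pitch_percent=-20, rate_percent=80)
boy = VoicePreset("cheerful boy", pitch_percent=30, rate_percent=115)

print(to_ssml("Come closer, my dear.", witch))
print(to_ssml("Let's go play!", boy))
```

Keeping each character as a small, named preset means the same script line can be rendered in several voices by swapping the preset rather than re-recording anything.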

Animators can use this tool to create new voices for game characters or cartoon heroes. All they have to do is choose a standard voice and play around with the sliders, using their imagination to shape the voice persona of the new character. Then they can add the text and get audio output in their "new" voice for the soundtrack, without any need for voice actors or recording studios.

The technology also enables emerging games in which the scripts are generated online, so the audio cannot be recorded in advance.

The entertainment field is just one example. We're very excited about the possibilities of applying this text-to-speech technology in any context that needs multiple distinct voices created on demand, such as education and advertising.

Provided by IBM

Citation: New text-to-speech tool for DIY voiceovers—from soft, sad and sultry to scary (2017, November 1) retrieved 17 July 2024 from https://phys.org/news/2017-11-text-to-speech-tool-diy-voiceoversfrom-soft.html

