Speech Synthesizer Helps Movie Critic

June 15, 2010 By Phillip F. Schewe, Inside Science News Service
Matthew Aylett and colleagues are aiming to capture a more conversational style of synthesized speech.

The voices you hear on message services are often created artificially by fitting together short audio snippets from a large library of vocalized words and sounds. Scientists are now moving beyond the older generic voices to produce what, to many listeners, sounds like an "actual" person speaking.
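
As a rough illustration of the snippet-joining idea, here is a toy Python sketch of "unit selection." For each target sound, it picks one recorded snippet from a library so that pitch jumps between neighbouring snippets stay as small as possible, then joins the snippets end to end. The Unit class, the pitch-only join cost and the sample data are hypothetical simplifications, not CereProc's actual method. Real systems also score how well each snippet fits the target and smooth the joins.

```python
# Toy unit-selection sketch (hypothetical data and costs; not CereProc's method).
from dataclasses import dataclass


@dataclass
class Unit:
    phone: str     # phonetic label of the recorded snippet
    pitch: float   # average pitch in Hz (illustrative feature only)
    samples: list  # placeholder audio samples


def join_cost(a: Unit, b: Unit) -> float:
    # Penalize audible pitch jumps at the boundary between two snippets.
    return abs(a.pitch - b.pitch)


def select_units(target: list[str], library: list[Unit]) -> list[Unit]:
    """Choose one unit per target phone, minimizing total join cost (Viterbi search)."""
    if not target:
        return []
    candidates = [[u for u in library if u.phone == p] for p in target]
    if any(not c for c in candidates):
        raise ValueError("library has no snippet for some target phone")
    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(0.0, -1) for _ in candidates[0]]]
    for i in range(1, len(target)):
        row = []
        for u in candidates[i]:
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            ))
        best.append(row)
    # Trace back from the cheapest unit for the final phone.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(target) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


def concatenate(units: list[Unit]) -> list[float]:
    # Naive end-to-end join; real systems crossfade or smooth at each boundary.
    out: list[float] = []
    for u in units:
        out.extend(u.samples)
    return out
```

With two candidate snippets for each of the sounds "h" and "ai", the search keeps the low-pitched pair (120 Hz then 125 Hz), because that pairing has the smallest pitch jump at the join.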

Voice synthesis technology had a recent success when a personalized text-to-speech (TTS) system was crafted for the movie critic Roger Ebert, who has thyroid cancer and can no longer speak. The synthetic Ebert speech was created from many hours of his voice recorded during past television programs. The company that made Ebert's system, CereProc, can perform TTS conversions quickly, which allows it to offer more than just generic-voice synthesis.

Matthew Aylett, one of the founders of CereProc, said that he and his colleagues do not necessarily aim for smooth voicings. Instead, he said, "We want the variation which gives a voice a fresh and natural feel to it. This means not getting our voice talents to speak in a boring and neutral way but capturing their more conversational speech style."

At CereProc they are even able to build a certain amount of emotion into their voices, as well as create regional accents. The company, located in Edinburgh, Scotland, has been successful, for example, in reproducing Scottish and Irish accented speech.

For "cloning" the voices of specific people (they did one for President George Bush, for example), he said they sometimes have to resort to "found" recordings. The trouble with such snippets is that the recordings were often made under a variety of acoustic conditions, which then have to be corrected in making a final voice.

Roger Ebert learned about the President Bush "voice" and asked if CereProc could assemble a voice for him using the large inventory of recordings from his television show. So far they have used about five hours of Ebert's voice to produce a voice bank of about 300,000 phonetic sounds.

Timothy Bunnell, a scientist at Nemours Biomedical Research in Wilmington, Del., is working to make personalized voices accessible to everyone, especially people with neurodegenerative diseases.

Bunnell's voice synthesis system is prepared for people who can still speak. By contrast, Aylett's synthetic speech programs, such as the one for Roger Ebert, can be built from existing recordings of people who can no longer speak.

Carrying out text-to-speech synthesis for children is more difficult. "It is difficult for young children to record enough speech with the required degree of consistency and precision needed to build a high-quality synthetic voice," said Bunnell. The main problem, he said, is not the nature of the utterances but the amount of variability in children's speech, which is much greater than in adults' speech.

Aylett enjoys synthesis research. "It's fun to use the synthesizer, but it’s even better to see it helping people who really need it."
