Researcher gives subjects their voice

Researcher gives subjects their voice
Credit: Rupal Patel.

Stephen Hawking and a 9-​​year-​​old girl with a speech dis­order most likely use the same syn­thetic voice. It's called Per­fect Paul and it's easy to under­stand, espe­cially in acousti­cally chaotic envi­ron­ments like class­rooms full of chil­dren. While new, more natural-​​sounding voices are avail­able, Per­fect Paul remains the most oft-​​used syn­thetic voice in the com­mu­nity of dis­or­dered speakers.

But Per­fect Paul con­veys none of the per­son­ality inherent in vocal iden­tity, explains Rupal Patel, an asso­ciate pro­fessor of com­puter sci­ence and speech lan­guage pathology and audi­ology.

"What we're trying to do is improve the quality," she said, "but also increase the per­son­al­iza­tion of those voices, by not just making it a little kid's voice, but making it that little kid's voice."

Backed by a grant from the National Sci­ence Foun­da­tion, Patel and her research team are devel­oping ways to create per­son­al­ized syn­thetic voices that resemble users' vocal iden­ti­ties while remaining as under­stand­able as those of the healthy donors.

In the first iter­a­tion of the project, which Patel calls VocaliD (pro­nounced vocality, for Vocal Iden­tity), her team com­pu­ta­tion­ally merged the acoustics of a sus­tained vowel sound from a child with a speech dis­order, like this:

with the acoustics of a full sen­tence spoken by a healthy speaker of the same demo­graphic, like this:

The result is a clear, syn­thetic voice with the per­son­ality of the end user:

These voices have already elicited great responses from par­ents; one said, "If [my son] had been able to talk, this is what he would sound like." How­ever, the early ver­sion of VocaliD used a difficult-​​to-​​scale  approach that is not easily repro­ducible. Patel said, "We'd like to be able to allow users to create new voices as they mature in the same way a nat­ural voice evolves."

With the sup­port of another grant from the National Sci­ence Foun­da­tion, her team is cur­rently adding phys­i­o­log­ical infor­ma­tion on top of the acoustics.  "When you hear speech, it's a com­bi­na­tion of your source and your filter," Patel said. The source, she explained, derives from the voice box in the larynx whereas the filter is deter­mined by the shape and length of the vocal tract.

Vocal characteristics—such as pitch, breath­i­ness, and loudness—all emerge from the vocal folds in the larynx and give rise to vocal iden­tity. Mod­u­lating those fea­tures by changing the shape of our mouths and moving our tongues gives rise to dis­tinct vowel and con­so­nant sounds, which, Patel said, are typ­i­cally impaired in dis­or­dered speech.

Using data from a set of sen­sors placed on par­tic­i­pants' tongues and mouths, the researchers will deter­mine the most effi­cient way to approx­i­mate the phys­ical aspects of the dis­or­dered speaker's vocal tract. They can then add this infor­ma­tion into the voice-​​synthesis soft­ware to create voices that will grow and change as the users mature.

The aca­d­emic com­mu­nity has long accepted the source-​​filter theory of speech, but more work needs to be done in order to under­stand it, according to Patel, espe­cially as researchers develop more advanced speech tech­nolo­gies for secu­rity and other applications.

Patel's work in par­tic­ular also aims to inform basic research ques­tions such as, "How much do both the source and filter con­tribute to the iden­tity of a speaker's output?"

Patel's soft­ware is com­pat­ible across assis­tive tech­nology plat­forms, including main­stream touch-​​pad devices, a fea­ture she hopes will increase its adop­tion within the com­mu­nity. Patel spec­u­lates that assis­tive com­mu­ni­ca­tion devices will even­tu­ally appeal to healthy people as a new way of learning, com­mu­ni­cating, and interacting.

"The iPad rev­o­lu­tion is helping to break down bar­riers and increasing the emphasis on user inter­face issues," said Patel, who has been working to improve assis­tive com­mu­ni­ca­tion tech­nolo­gies for more than 16 years. "Lots of kids, both healthy and impaired, are using screens to interact now."

Citation: Researcher gives subjects their voice (2013, February 21) retrieved 19 April 2024 from https://medicalxpress.com/news/2013-02-subjects-voice.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

3Qs: Facial recognition is the new fingerprint

 shares

Feedback to editors