Bilingual avatar speaks Mundie language

(PhysOrg.com) -- This week's Microsoft Big Idea event, TechFest 2012, showcased the latest work of Microsoft researchers, and a bilingual talking head received much of the attention. Called "Monolingual TTS," the Microsoft research effort involves software that can translate the user’s speech into another language in a voice that sounds like the original user’s. As Microsoft explains, the system's algorithm uses a speaker's monolingual recordings to render spoken sentences in other languages, for building "mixed coded bilingual text to speech (TTS) systems."

According to the team, “We have recordings of 26 languages which are used to build our TTS of corresponding languages. By using the new approach, we can synthesize any mixed language pair out of the 26 languages.”

The software does this by first “learning” what the user’s voice sounds like. It then applies speech recognition, followed by translation, and finally produces speech output in a different language. The demo this week used an avatar of Craig Mundie, Microsoft's chief research and strategy officer, to illustrate the system in action.
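The recognition, translation and synthesis sequence described above can be sketched as a simple composition of three stages. This is a hypothetical illustration, not Microsoft's implementation: the names `learn_voice`, `speak_translated` and `VoiceProfile`, the stage signatures, and the toy lambda stubs are all assumptions made for the example. Real speech recognition, machine translation and voice-adapted TTS models would take the place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical placeholder for what the system "learns" about a speaker's voice."""
    speaker: str


# Hypothetical stage signatures; the article does not describe Microsoft's real interfaces.
Recognizer = Callable[[bytes], str]                      # audio -> source-language text
Translator = Callable[[str, str, str], str]              # (text, src, tgt) -> translated text
Synthesizer = Callable[[str, str, VoiceProfile], bytes]  # (text, lang, voice) -> audio


def learn_voice(speaker: str, recordings: List[bytes]) -> VoiceProfile:
    """Stand-in for training a voice model on a speaker's monolingual recordings."""
    if not recordings:
        raise ValueError("need at least one recording of the speaker")
    return VoiceProfile(speaker=speaker)


def speak_translated(
    audio: bytes,
    src: str,
    tgt: str,
    voice: VoiceProfile,
    recognize: Recognizer,
    translate: Translator,
    synthesize: Synthesizer,
) -> bytes:
    text = recognize(audio)                    # 1. speech recognition
    translated = translate(text, src, tgt)     # 2. translation into the target language
    return synthesize(translated, tgt, voice)  # 3. synthesis in the learned voice


# Toy stubs standing in for real models, so the flow can be run end to end.
# "huanying" is a pinyin stand-in for the Mandarin greeting.
voice = learn_voice("Mundie", [b"english-sample"])
out = speak_translated(
    b"hello-audio", "en", "zh", voice,
    recognize=lambda audio: "welcome",
    translate=lambda text, src, tgt: {"welcome": "huanying"}.get(text, text) if tgt == "zh" else text,
    synthesize=lambda text, lang, v: f"[{v.speaker}/{lang}] {text}".encode("utf-8"),
)
print(out.decode("utf-8"))  # prints "[Mundie/zh] huanying"
```

Keeping each stage behind its own callable is one way to reflect the article's point that the voice is learned once and then reused for output in any target language; swapping the `tgt` argument changes the language without retraining the voice.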

A synthetic version of Mundie's voice, in English, welcomed the audience to Microsoft Research. Then the voice repeated the same phrase in Mandarin. The Mandarin words were reported to be recognizably in Mundie’s voice.

Craig Mundie's talking head speaks in English.
Craig Mundie's talking head speaks in Chinese.

Obvious applications include a wide range of service-related activities, from the hospitality and tourism sectors to government workers using the software with communities at home and during international travel.

"We will be able to do quite a few scenario applications," said Frank Soong, who is a principal researcher in Microsoft’s speech group. Soong helped create the system with his colleagues at Microsoft’s research lab in Beijing.

Microsoft, meanwhile, has for some time envisioned virtual avatars being used with this kind of technology. In that vision, avatars not only look like their users, with photo-realistic effects, but also successfully mimic their users’ voices and approximate their lip movements, putting speech translation into instant, personalized action.

Last year, Mundie was at the Microsoft Research Asia facility in Beijing, where he said that the convergence of touch, vision, synthesis and recognition would be an important advance.

“Another dream we have is that I should be able to sit in my office, send my avatar to meet somebody in Beijing, and I can speak in English and the avatar speaks in Mandarin in real-time,” he said. “We want the computer to be a simultaneous translator.”

More information: research.microsoft.com/en-us/p … o-real_talking_head/
via Technology Review

© 2011 PhysOrg.com

Citation: Bilingual avatar speaks Mundie language (2012, March 10) retrieved 22 August 2019 from https://phys.org/news/2012-03-bilingual-avatar-mundie-language.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

User comments

Mar 10, 2012
Really, Microsoft, you're behind the times. Valve has had facial morph tech that approximates visual emotion combined with lip syncing since HL2 was released. Not only that, it's free for anyone who bought the game to play with to their heart's content. The only advance I see there is applying Microsoft's horrible SAPI to synthesize the voice. The language choice is no coincidence either; it hides a lot (though not all) of the shortcomings of Microsoft SAPI.

Mar 10, 2012
Nothing new from MS, wait until someone invents something and re-invent a bad version of it.

Mar 10, 2012
wow. his mandarin is great. i wonder how accurate it is with greater amounts of language.

These virtual avatars could become lifelong friends and life coaches. Just think of Facebook and Netflix connected with iPhone apps and other smart devices, with programming to help you find what you want, whether it be multimedia content, study materials, news, etc. It could also identify unhealthy habits and counsel you in a way suited to your preferences (the computer sees the user is feeling upset, plays Beethoven as per the user's preferences and says: "Bob, today for lunch, I have a plan. Eat a banana, a smoked turkey sandwich with mustard and a slice of tomato, and a glass of milk."). Anyway, I am just saying it can help monitor people's behavior and provide them with things to make their life, career and love more effective, and do so in a way that feels natural to the person.

Mar 10, 2012
Avatars? Really? You think when I conduct a business meeting I'm gonna want to talk to someone's avatar? This stuff looks great in movies, but in the real world it's impractical. What we need is something along the lines of a Babel fish. The best way to market this would be through smartphone apps: record the speaker, translate, then play back over Bluetooth. If that's successful, then maybe you can consider avatars. Left foot, right foot, left foot.

Mar 11, 2012
This is a good invention as long as nothing gets lost in translation. Definitions and grammar must be precise in both (or all) languages; otherwise misunderstandings may happen that could prevent good partnerships or business.

Mar 11, 2012
Think of the bandwidth savings if all you needed to send was the text!

Mar 11, 2012
Wow. Now I'll have someone to talk to while my Google car drives itself to work. Oh, wait, that's what cell phones are for, I guess.

@Sanescience: Made me laugh. I was just thinking a day or two ago that the old 300 baud bandwidth admonishments had died along with the BBS world. If only we all had seen the future of streaming video....

"wow. his mandarin is great. i wonder how accurate it is with greater amounts of language."

@ bredmond: Greater numbers, bredmond, greater numbers of languages, not greater amounts. Since this is an article about languages, I'll take the opportunity to nit pick on usage where I wouldn't usually do so (just kidding, rest assured). Now, everybody, feel free to pick out the usage errors in *my* post. There are at least two, I think.

Mar 11, 2012
I don't get why you think this is not a good invention. Being able to talk in real time with clients who speak any language would be amazing.

Mar 11, 2012
"Nothing new from MS, wait until someone invents something and re-invent a bad version of it."
...but I still believe it's quite a nice idea to become a Microsoft owner ;-) It's a surprisingly stable and successful company, which actually sells software, not just Internet ads or overpriced toys.

Mar 11, 2012
Big benefit for business, and for tourists asking for directions when lost. Diplomatic relations and tribal meetings would also benefit if the translations are valid. I like this invention; it may prove very useful.

Mar 11, 2012
Eh. These are gimmicks. The core features Microsoft is *not* bragging about are translation accuracy and avatar AI.

When Microsoft is ready to show off either or both, we'll be eager to hear about it.

Mar 12, 2012
Now we can speak to the Wookiees without fear of pronouncing things incorrectly.

Mar 14, 2012
"@ bredmond: Greater numbers, bredmond, greater numbers of languages, not greater amounts."

What I mean is: if he were to talk for a long time, would the translation be correct throughout the whole duration?
