(PhysOrg.com) -- This week's Microsoft Big Idea event, TechFest 2012, showcased the latest work of Microsoft's researchers. A bilingual talking head received much of the attention. Called "Monolingual TTS," the Microsoft Research effort involves software that can translate a user's speech into another language in a voice that sounds like the original speaker's. As Microsoft explains, using a speaker's monolingual recording, the system's algorithm can render speech sentences in different languages for building "mixed coded bilingual text to speech (TTS) systems."
According to the team, "We have recordings of 26 languages which are used to build our TTS of corresponding languages. By using the new approach, we can synthesize any mixed language pair out of the 26 languages."
The software does this by first learning what the user's voice sounds like. It then chains three stages: speech recognition, machine translation, and text-to-speech output in the target language. The demo at Microsoft this week used an avatar of Craig Mundie, Microsoft's chief research and strategy officer, to illustrate the system in action.
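The recognize-translate-synthesize chain described above can be sketched as three functions composed in order. This is a toy Python illustration under stated assumptions, not Microsoft's code: `VoiceProfile`, `recognize`, `translate`, `synthesize`, `speech_to_speech`, and the one-entry phrase table are all hypothetical placeholders, and each stage passes strings where a real system would run an acoustic model, a translation model, and a voice-adapted synthesizer.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class VoiceProfile:
    """Hypothetical stand-in for a speaker model learned from the user's own recordings."""
    speaker: str


@dataclass
class SynthesizedSpeech:
    """Toy output; a real TTS stage would produce audio samples, not a tagged string."""
    speaker: str
    language: str
    text: str


def recognize(audio: str) -> str:
    # Placeholder speech-recognition stage: treats the "audio" as an
    # already-transcribed string and normalizes it for lookup.
    return audio.strip().lower()


# Hypothetical phrase table standing in for a machine translation system.
# Illustrative entry only; not the actual script used in the demo.
PHRASES: Dict[Tuple[str, str, str], str] = {
    ("en", "zh", "welcome to microsoft research"): "欢迎来到微软研究院",
}


def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder translation stage: exact-match lookup in the toy table.
    try:
        return PHRASES[(src, tgt, text)]
    except KeyError:
        raise ValueError(f"no translation for {text!r} ({src}->{tgt})") from None


def synthesize(text: str, language: str, voice: VoiceProfile) -> SynthesizedSpeech:
    # Placeholder TTS stage: attaches the learned speaker identity to the
    # output, the property the demo emphasized.
    return SynthesizedSpeech(speaker=voice.speaker, language=language, text=text)


def speech_to_speech(audio: str, src: str, tgt: str, voice: VoiceProfile) -> SynthesizedSpeech:
    transcript = recognize(audio)                  # 1. speech recognition
    translated = translate(transcript, src, tgt)   # 2. translation
    return synthesize(translated, tgt, voice)      # 3. TTS in the user's voice


if __name__ == "__main__":
    mundie = VoiceProfile(speaker="Craig Mundie")
    print(speech_to_speech("Welcome to Microsoft Research", "en", "zh", mundie))
```

In this sketch the voice profile enters only at the final stage, so the same learned voice can be reused for any target language the synthesizer supports; how Microsoft's system actually partitions that work is not described in the article.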
A synthetic version of Mundie's voice welcomed the audience to Microsoft Research in English, then repeated the same phrase in Mandarin. The Mandarin words were reportedly still recognizable as Mundie's voice.
"We will be able to do quite a few scenario applications," said Frank Soong, a principal researcher in Microsoft's speech group. Soong helped create the system with his colleagues at Microsoft's research lab in Beijing.
Microsoft has for some time envisioned virtual avatars working alongside this kind of technology: avatars that not only look like their users, with photo-realistic effects, but also mimic their users' voices and approximate their lip movements, putting speech translation into instant, personalized action.
Last year, at the Microsoft Research Asia facility in Beijing, Mundie said that the convergence of touch, vision, and speech synthesis and recognition would be an important advance.
"Another dream we have is that I should be able to sit in my office, send my avatar to meet somebody in Beijing, and I can speak in English and the avatar speaks in Mandarin in real-time," he said. "We want the computer to be a simultaneous translator."
More information:
research.microsoft.com/en-us/projects/photo-real_talking_head/
via Technology Review

bredmond, Mar 10, 2012
These virtual avatars could become lifelong friends and life coaches. Just think of Facebook and Netflix connected with iPhone apps and other smart devices, with programming to help you find what you want, whether it be multimedia content, study materials, news, etc. It could also identify unhealthy habits and counsel you in a way tailored to your preferences (the computer sees the user is feeling upset, plays Beethoven as per the user's preferences, and says: "Bob, today for lunch, I have a plan. Eat a banana, a smoked turkey sandwich with mustard and a slice of tomato, and a glass of milk."). Anyway, I am just saying it can help monitor people's behavior and provide them with things to make their life, career and love more effective, and do so in a way that feels natural to the person.
PhotonX, Mar 11, 2012
@Sanescience: Made me laugh. I was just thinking a day or two ago that the old 300 baud bandwidth admonishments had died along with the BBS world. If only we all had seen the future of streaming video....

@ bredmond: Greater numbers, bredmond, greater numbers of languages, not greater amounts. Since this is an article about languages, I'll take the opportunity to nit pick on usage where I wouldn't usually do so (just kidding, rest assured). Now, everybody, feel free to pick out the usage errors in *my* post. There are at least two, I think.
Sonhouse, Mar 11, 2012
The English version still sounded more like Stephen Hawking.
Urgelt, Mar 11, 2012
When Microsoft is ready to show off either or both, we'll be eager to hear about it.
Tausch, Mar 12, 2012
Aren't you glad our hearing and voice are limited in frequency range? It makes learning the sounds of one language, the human language, much easier.
Much harder for us to learn are the languages of life forms that use sounds across an unlimited frequency range; to them, we will sound like we are repeating ourselves endlessly.
Which combinations of sounds within our vocal range cannot be duplicated with our voice?
This reminds everyone of the binary nature of nature, where the 'amount' of 'zeros' and 'ones' harbors the potential to represent any 'sound' to arbitrary precision, conveying what we have acquired through evolution: the label and meaning of the word 'meaning'.
bredmond, Mar 14, 2012
What I mean is, if he were to talk for a long time, would the translation be correct throughout the whole duration?