Avatar Mimics You in Real Time

Mar 25, 2008 By Lisa Zyga
Screen shots of the avatar showing two different gestures, based on the human’s movements. Credit: Schreer, et al. ©2008 IEEE.

It’s a little bit like looking in the mirror at your cartoon double, except that the “reflection” is an avatar on your computer screen. Wave your hand, nod your head, speak a sentence, and your avatar does the same.

The technology is already on display at the headquarters of Deutsche Telekom in Bonn, Germany, and at the Deutsche Telekom Laboratories in Berlin. Visitors can experiment with making gestures and watching comical characters mimic them in real time.

The public feedback is very positive, according to researchers Oliver Schreer, Peter Eisert, and Ralf Tanger, of the Fraunhofer Heinrich-Hertz-Institut in Berlin, and Roman Englert with the Deutsche Telekom Laboratories and Ben Gurion University in Beer-Sheva, Israel. The team will publish the results of its study on its vision- and speech-driven avatar technology in an upcoming issue of IEEE Transactions on Multimedia.

“The presented approach allows intuitive, touchless user interaction,” Schreer told PhysOrg.com. “Due to the recognition capabilities, any novel interface can be designed for interactive human-computer interaction.”

The researchers’ prototype system runs on a normal PC; the only hardware required is a low-cost webcam and a pair of standard headphones. The complete audio-visual analysis is performed in real time, which allows immediate animation of the virtual character. The program doesn’t require any training or individual input of gestures. However, since the system relies on skin color recognition to follow the movements of the hands and head, users must wave their hands around at first so that the system can determine the individual’s skin color. Wearing skin-colored clothing should be avoided.
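
The calibration step described above can be illustrated with a minimal sketch. The article does not say how the researchers model skin color, so everything here is an assumption made for illustration: the normalized (r, g) chromaticity space, the per-channel mean and spread, the tolerance `k`, and all function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def to_chromaticity(pixel):
    """Convert an (R, G, B) pixel to normalized (r, g) chromaticity.

    Dividing by the channel sum removes most of the brightness variation.
    """
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)
    return (r / total, g / total)


def calibrate_skin_model(sample_pixels):
    """Estimate a per-user skin model from pixels sampled while the user waves.

    Hypothetical model: mean and spread of the two chromaticity channels.
    The spread is floored at 0.01 so a tiny sample does not give a zero-width model.
    """
    chroma = [to_chromaticity(p) for p in sample_pixels]
    rs = [c[0] for c in chroma]
    gs = [c[1] for c in chroma]
    return {
        "r_mean": mean(rs),
        "g_mean": mean(gs),
        "r_std": max(pstdev(rs), 0.01),
        "g_std": max(pstdev(gs), 0.01),
    }


def is_skin(pixel, model, k=2.5):
    """Return True if the pixel lies within k spreads of the calibrated skin color."""
    r, g = to_chromaticity(pixel)
    return (abs(r - model["r_mean"]) <= k * model["r_std"]
            and abs(g - model["g_mean"]) <= k * model["g_std"])


# Pixels assumed to have been sampled from the user's waving hands.
samples = [(200, 150, 120), (210, 160, 130), (190, 140, 110)]
model = calibrate_skin_model(samples)

hand_pixel = (205, 155, 125)
background_pixel = (30, 60, 200)
shirt_pixel = (200, 152, 122)  # a skin-colored shirt
```

In this sketch `is_skin(hand_pixel, model)` is true and `is_skin(background_pixel, model)` is false, but `is_skin(shirt_pixel, model)` is also true. A color-only classifier cannot tell such a shirt from a hand, which is consistent with the advice to avoid skin-colored clothing.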

The system can recognize a set of 66 parameters that define facial expression, and it also contains a set of high-level facial expressions (such as joy, sadness, surprise, and disgust). Users can also press buttons to manually activate these expressions. The system also uses “visemes,” mouth shapes that move the avatar’s lips in accordance with the phoneme being spoken, based on voice analysis. A set of 15 visemes can represent all phonemes. The system also recognizes a set of 186 body motion parameters that define joint rotation in the arms and upper body. Head rotation is detected as well, so that the avatar can reproduce head nods, head shakes, and head rolls.
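
The idea that a small set of visemes can cover all phonemes amounts to a many-to-one lookup: several sounds share one mouth shape. The sketch below shows only that idea. The table is hypothetical and partial, and its phoneme symbols and viseme names are invented for illustration; it is not the 15-viseme set the researchers use.

```python
# Hypothetical, partial phoneme-to-viseme table (NOT the paper's actual set).
# Several phonemes share one mouth shape, which is why few visemes suffice.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "o": "rounded", "u": "rounded", "w": "rounded",
    "a": "open_wide",
    "e": "spread", "i": "spread",
}
NEUTRAL = "neutral"  # fallback mouth shape for phonemes missing from the table


def visemes_for(phonemes):
    """Map a phoneme sequence to mouth shapes, merging consecutive repeats."""
    result = []
    for phoneme in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, NEUTRAL)
        if not result or result[-1] != viseme:
            result.append(viseme)
    return result


mouth_shapes = visemes_for(["m", "a", "m", "a"])
```

Here `mouth_shapes` alternates between `lips_closed` and `open_wide`. Merging consecutive repeats means that a sequence such as "b", "p" produces a single closed-lips shape instead of two identical keyframes.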

By detecting the positions of fingers, the system can recognize many basic gestures, including many from the American Sign Language alphabet. Sometimes the avatar’s hands don’t exactly mimic the user’s hands, as the main aim is to make the avatar’s movements as smooth and natural as possible.
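
One common way to trade exact mimicry for smooth motion is to low-pass filter the tracked joint angles before they drive the avatar. The article does not say which technique the researchers use, so the exponential smoothing below is a swapped-in illustration; the function name, the `alpha` parameter, and the sample angles are all assumptions.

```python
def smooth_angles(raw_frames, alpha=0.5):
    """Exponentially smooth per-frame joint angles.

    raw_frames: list of frames, each a list of joint angles in degrees.
    alpha: weight of the newest measurement (1.0 means no smoothing).
    """
    smoothed = []
    previous = None
    for frame in raw_frames:
        if previous is None:
            current = list(frame)  # first frame is passed through unchanged
        else:
            current = [alpha * new + (1 - alpha) * old
                       for new, old in zip(frame, previous)]
        smoothed.append(current)
        previous = current
    return smoothed


# One hypothetical elbow angle that jumps from 0 to 10 degrees.
elbow = smooth_angles([[0.0], [10.0], [10.0]], alpha=0.5)
```

With `alpha=0.5` the jump is spread over several frames (0.0, then 5.0, then 7.5), so the avatar lags slightly behind the user but moves without jitter. That is the kind of mismatch between the user's hands and the avatar's hands the paragraph above describes.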

In the future, the researchers plan to apply the system in virtual chat rooms and online call center applications, such as technical support. In both situations, users are represented by avatars. The avatars become animated based on the users’ movements and speech, while maintaining the privacy of the users. The researchers also hope to integrate the avatar system into mobile devices, where it could serve as a user-friendly interface in addition to touch screens, a stylus, or speech recognition systems.

“Some aspects like gestures based on hand recognition are already market mature,” Schreer said. “Finger analysis and interpretation are more complicated and may need another one or two years in order to achieve robust algorithms that operate under real-life conditions, i.e. the real environment. Initial applications are scrolling of menus in cellulars (e.g. SMS browsing) and in the medical area for the control of interfaces in operating rooms.”

More information: Schreer, Oliver; Englert, Roman; Eisert, Peter; and Tanger, Ralf. “Real-Time Vision and Speech Driven Avatars for Multimedia Applications.” IEEE Transactions on Multimedia. To be published in a future issue.

Copyright 2008 PhysOrg.com.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed in whole or part without the express written permission of PhysOrg.com.

User comments : 2

seb
3.8 / 5 (5) Mar 25, 2008
Sounds neat.. except that the cam on my acer laptop that I bought 2 years ago had an application that came with it that does just that.. although it wasn't 100%, it did things like let you add stuff to your video (like mustaches and hats and such) or just plain replace you with an animated avatar that mimics your movements. It wasn't as precise as this, with its fingers etc, but considering its a cheap cam and your typical moderately coded low end software, thats to be expected
pingpong
2.3 / 5 (3) Mar 25, 2008
Sounds neat.. except that the cam on my acer laptop that I bought 2 years ago had an application that came with it that does just that.. although it wasn't 100%, it did things like let you add stuff to your video (like mustaches and hats and such) or just plain replace you with an animated avatar that mimics your movements. It wasn't as precise as this, with its fingers etc, but considering its a cheap cam and your typical moderately coded low end software, thats to be expected

I have a cheap old car. It has 4 wheels. It can drive. What's so special about those Formula 1 and Indy cars?
I also can play basketball. I can hold a ball. I can throw it. What's so special about NBA players?
I have a cheap laser pointer in my pocket.....