Face of the future rears its head: 'Zoe' uses a basic set of six simulated emotions (w/ video)

March 19, 2013

(Phys.org) — Meet Zoe: a digital talking head which can express human emotions on demand with "unprecedented realism" and could herald a new era of human-computer interaction.

A virtual "talking head" which can express a full range of human emotions and could be used as a digital personal assistant, or to replace texting with "face messaging", has been developed by researchers.

The lifelike face can display emotions such as happiness, anger, and fear, and changes its voice to suit any feeling the user wants it to simulate. Users can type in any message, specifying the requisite emotion as well, and the face recites the text. According to its designers, it is the most expressive controllable avatar ever created, replicating human emotions with unprecedented realism.


The system, called "Zoe", is the result of a collaboration between researchers at Toshiba's Cambridge Research Lab and the University of Cambridge's Department of Engineering. Students have already spotted a striking resemblance between the disembodied head and Holly, the ship's computer in the British sci-fi comedy, Red Dwarf.

Appropriately enough, the face is actually that of Zoe Lister, an actress perhaps best known as Zoe Carpenter in the Channel 4 series, Hollyoaks. To recreate her face and voice, researchers spent several days recording Zoe's speech and facial expressions. The result is a system that is light enough to work on mobile devices, and could be used as a personal assistant in smartphones, or to "face message" friends.

The framework behind "Zoe" is also a template that, before long, could enable people to upload their own faces and voices - but in a matter of seconds, rather than days. That means that in the future, users will be able to customise and personalise their own emotionally realistic digital assistants.

If this can be developed, then a user could, for example, text the message "I'm going to be late" and ask it to set the emotion to "frustrated". Their friend would then receive a "face message" that looked like the sender, repeating the message in a frustrated way.

The team who created Zoe are currently looking for applications, and are also working with a school for autistic and deaf children, where the technology could be used to help pupils to "read" emotions and lip-read. Ultimately, the system could have multiple uses – including in gaming, in audio-visual books, as a means of delivering online lectures, and in other user interfaces.

"This technology could be the start of a whole new generation of interfaces which make interacting with a computer much more like talking to another human being," Professor Roberto Cipolla, from the Department of Engineering, University of Cambridge, said.

"It took us days to create Zoe, because we had to start from scratch and teach the system to understand language and expression. Now that it already understands those things, it shouldn't be too hard to transfer the same blueprint to a different voice and face."

As well as being more expressive than any previous system, Zoe is also remarkably data-light. The program used to run her is just tens of megabytes in size, which means that it can be easily incorporated into even the smallest computer devices, including tablets and smartphones.

It works by using a set of fundamental, "primary colour" emotions. Zoe's voice, for example, has six basic settings - Happy, Sad, Tender, Angry, Afraid and Neutral. The user can adjust these settings to different levels, as well as altering the pitch, speed and depth of the voice itself.

By combining these levels, it becomes possible to pre-set or create an almost infinite range of emotional combinations. For instance, combining happiness with tenderness and slightly increasing the speed and depth of the voice makes it sound friendly and welcoming. A combination of speed, anger and fear makes Zoe sound as if she is panicking. This allows for a level of emotional subtlety which, the designers say, has not been possible in avatars until now.
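As a rough illustration, the "primary colour" scheme described above can be modelled as a small set of per-emotion weights plus global voice controls. The setting names, numeric scales and blend values in this sketch are assumptions; the article specifies only the six emotions and the adjustable pitch, speed and depth.

```python
# Hypothetical sketch of Zoe's "primary colour" emotion controls.
# The 0-1 weight scale and multiplier defaults are assumptions.

EMOTIONS = ["happy", "sad", "tender", "angry", "afraid", "neutral"]

def make_voice_settings(pitch=1.0, speed=1.0, depth=1.0, **weights):
    """Build a parameter set: per-emotion weights in [0, 1] plus
    global pitch/speed/depth multipliers for the voice itself."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {unknown}")
    levels = {e: float(weights.get(e, 0.0)) for e in EMOTIONS}
    return {"levels": levels, "pitch": pitch, "speed": speed, "depth": depth}

# "Friendly and welcoming": happiness blended with tenderness,
# with slightly increased speed and depth.
friendly = make_voice_settings(happy=0.7, tender=0.5, speed=1.1, depth=1.1)

# "Panicking": increased speed combined with anger and fear.
panicked = make_voice_settings(angry=0.6, afraid=0.8, speed=1.3)
```

Because each emotion is an independent level rather than a single mutually exclusive mode, the number of possible blends grows combinatorially, which is what makes the "almost infinite emotional combinations" claim plausible.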

To make the system as realistic as possible, the research team collected a dataset of thousands of sentences, which they used to train the speech model with the help of the real-life actress, Zoe Lister. They also tracked Lister's face while she was speaking using computer vision software. This was converted into voice- and face-modelling mathematical algorithms, which gave them the voice and image data they needed to recreate expressions on a digital face, directly from text alone.

The effectiveness of the system was tested with volunteers via a crowd-sourcing website. The participants were each given either a video or an audio clip of a single sentence from the test set and asked to identify which of the six basic emotions it was replicating. Ten sentences were evaluated, each by 20 different people.

Volunteers who had video but no sound successfully recognised the emotion in only 52% of cases. When they had audio alone, the success rate was 68%. The two together, however, produced a recognition rate of 77% - slightly higher than the recognition rate for the real-life Zoe, which was 73%. This higher rate of success compared with real life is probably because the synthetic talking head is deliberately more stylised in its manner.
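For a sense of scale, the quoted figures imply 10 sentences × 20 raters = 200 judgments per condition. The correct-answer counts below are back-calculated from the reported percentages purely to illustrate the comparison; they are not the study's raw data.

```python
# Back-calculated recognition counts per condition (illustrative only).
JUDGMENTS = 10 * 20  # sentences x raters per condition

correct = {
    "video_only": 104,       # reported 52%
    "audio_only": 136,       # reported 68%
    "audio_and_video": 154,  # reported 77%
    "real_life_zoe": 146,    # reported 73%
}

rates = {cond: n / JUDGMENTS for cond, n in correct.items()}

# The synthetic head with both channels slightly outperforms the
# real-life recordings, consistent with the article's claim.
assert rates["audio_and_video"] > rates["real_life_zoe"]
```

Note also that combining the two channels (77%) beats either channel alone (52% and 68%), which is the usual pattern for audio-visual emotion recognition.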

As well as finding applications for their new creation, the research team will now work on creating a version of the system which can be personalised by users themselves.

"Present-day interaction with computers still revolves around typing at a keyboard or moving and pointing with a mouse," Cipolla added. "For a lot of people, that makes computers difficult and frustrating to use. In the future, we will be able to open up computing to far more people if they can speak and gesture to machines in a more natural way. That is why we created Zoe - a more expressive, emotionally responsive face that human beings can actually have a conversation with."



