FaceDirector software generates desired performances in post-production, avoiding reshoots

December 11, 2015, Disney Research

Some film directors are famous for demanding that scenes be shot and re-shot repeatedly until actors express just the right emotion at the right time, but directors will be able to fine-tune performances in post-production, rather than on the film set, with a new system developed by Disney Research and the University of Surrey.

Called FaceDirector, the system enables a director to seamlessly blend facial performances from a couple of video takes to achieve the desired effect.

"It's not unheard of for a director to re-shoot a crucial scene dozens of times, even 100 or more times, until satisfied," said Markus Gross, vice president of research at Disney Research. "That not only takes a lot of time - it also can be quite expensive. Now our research team has shown that a director can exert control over an actor's performance after the shoot with just a few takes, saving both time and money."

Jean-Charles Bazin, associate research scientist at Disney Research, and Charles Malleson, a Ph.D. student at the University of Surrey's Centre for Vision, Speech and Signal Processing, showed that FaceDirector is able to create a variety of novel, visually plausible versions of performances of actors in close-up and mid-range shots.

Moreover, the system works with normal 2D video input acquired by standard cameras, without the need for additional hardware or 3D face reconstruction.

The researchers will present their findings at ICCV 2015, the International Conference on Computer Vision, Dec. 11-18, in Santiago, Chile.

"The central challenge for combining an actor's performances from separate takes is video synchronization," Bazin said. "But differences in head pose, emotion, expression intensity, as well as pitch accentuation and even the wording of the speech, are just a few of many difficulties in syncing video takes."

Bazin, Malleson and the rest of the team solved this problem by developing an automatic means of analyzing both visual and audio cues. The system then identifies frames that correspond between the takes using a graph-based framework.

"To the best of our knowledge, our work is the first to combine audio and [visual cues] for achieving an optimal nonlinear, temporal alignment of facial performance videos," Malleson said.
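The article does not spell out the graph-based framework, so the following is only a minimal sketch of the general idea, assuming a classic dynamic time warping formulation rather than the researchers' actual method. Each frame is represented by a hypothetical pair of feature vectors (visual, audio), every pairing of a frame from one take with a frame from the other is a graph node, and the cheapest monotonic path through that grid gives a nonlinear temporal alignment. The names `frame_cost`, `align_takes` and `audio_weight` are illustrative assumptions.

```python
import math


def frame_cost(a, b, audio_weight=0.5):
    """Distance between two frames, each a (visual_features, audio_features) pair.

    The weighting scheme is an assumption made for this sketch.
    """
    visual = math.dist(a[0], b[0])
    audio = math.dist(a[1], b[1])
    return (1.0 - audio_weight) * visual + audio_weight * audio


def align_takes(take_a, take_b, audio_weight=0.5):
    """Return the minimum-cost monotonic path of (i, j) frame correspondences.

    Both takes must contain at least one frame.
    """
    n, m = len(take_a), len(take_b)
    inf = float("inf")
    # acc[i][j] is the cheapest cost of any path from node (0, 0) to node (i, j).
    acc = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = frame_cost(take_a[i], take_b[j], audio_weight)
            if i == 0 and j == 0:
                acc[i][j] = cost
                continue
            best, prev = inf, None
            # Allowed moves: advance both takes, or hold one take for a frame.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and acc[pi][pj] < best:
                    best, prev = acc[pi][pj], (pi, pj)
            acc[i][j] = cost + best
            back[i][j] = prev
    # Walk the back-pointers from the last frame pair to the first.
    path, node = [], (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return path


# Toy data: take_b lingers on its first frame for one extra step.
take_a = [((0.0,), (0.0,)), ((1.0,), (1.0,)), ((2.0,), (2.0,))]
take_b = [((0.0,), (0.0,)), ((0.0,), (0.0,)), ((1.0,), (1.0,)), ((2.0,), (2.0,))]
print(align_takes(take_a, take_b))
```

In this toy case the path holds the first frame of `take_a` while `take_b` lingers, then advances both takes together, which is the kind of nonlinear correspondence the quote describes.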

Once this synchronization has occurred, the system enables a director to control the performance by choosing the desired facial expressions and timing from either video, which are then blended together using facial landmarks, optical flow and compositing.
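As a rough illustration of the director's control described above, the sketch below cross-dissolves corresponding frames with a per-frame weight. It is a deliberate simplification and an assumption of this example: the system described in the article also uses facial landmarks and optical flow before compositing, and that step is omitted here. Frames are plain nested lists of grayscale values, and `blend_frames`, `blend_takes`, `path` and `weights` are hypothetical names.

```python
def blend_frames(frame_a, frame_b, weight):
    """Cross-dissolve two equally sized grayscale frames (lists of pixel rows).

    weight = 0.0 returns frame_a unchanged, weight = 1.0 returns frame_b.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [
        [(1.0 - weight) * pa + weight * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def blend_takes(take_a, take_b, path, weights):
    """Blend corresponding frames of two takes.

    path is a list of (i, j) index pairs saying which frame of take_a matches
    which frame of take_b; weights[k] is the blend weight for the k-th pair.
    """
    if len(weights) != len(path):
        raise ValueError("need exactly one weight per frame pair")
    return [
        blend_frames(take_a[i], take_b[j], w)
        for (i, j), w in zip(path, weights)
    ]


# Two one-row, two-pixel frames per take, already in correspondence.
take_a = [[[0.0, 100.0]], [[0.0, 100.0]]]
take_b = [[[100.0, 0.0]], [[100.0, 0.0]]]
path = [(0, 0), (1, 1)]
# Start fully on take A, then move a quarter of the way toward take B.
blended = blend_takes(take_a, take_b, path, [0.0, 0.25])
print(blended)
```

Varying the weight over time is what would let a user drift from one take's expression toward the other's; a production pipeline would warp the faces into alignment first to avoid ghosting.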

To test the system, actors performed several lines of dialog, repeating the performances to convey different emotions such as happiness, sadness, excitement, fear and anger. The line readings were captured in HD resolution using standard compact cameras. The researchers were able to synchronize the videos automatically and in real time on a standard desktop computer. Users could generate novel versions of the performances by interactively blending the video takes.

The researchers showed additional results of FaceDirector for different applications: generation of multiple performances from a sparse set of input video takes in the context of nonlinear storytelling, script correction and editing, and voice exchange between emotions (for example to create an entertaining performance with a sad voice over a happy face).

More information: "FaceDirector: Continuous Control of Facial Performance in Video" (paper, PDF)

1 comment

5 / 5 (2) Dec 11, 2015
We can write lies on the most subtle level..... and gradually, seeing has moved to not believing.

Now the transformation is complete.

The harbinger of the final nails in the coffin of 'seeing is believing'.

combine it with this:


And the media machines have the ability to transform anything anyone says, in any short interview..... into whatever they want the viewer to see.
