Puppeteering Faces

Putting words in someone's mouth: is this now possible?

A group of scientists from Erlangen, Stanford, and Saarbrücken can transfer the facial expressions and lip movements of one person to the video stream of another in real time, while the gestures and head movements in the target video remain unchanged. Thanks to the efficiency of the algorithms, this runs on standard computer hardware.

Both persons are filmed with cameras that have an additional depth sensor. Such devices (e.g. Microsoft Kinect™) also measure, for each pixel, the distance to the scene. In a preprocessing step, the parameters of a face model are estimated from this data for both persons, yielding a mathematical description of head geometry and reflectance. The next step analyzes the facial expressions and lip movements of both persons, transfers them from one person to the other, and produces a photo-realistic depiction in the target video stream. Thanks to years of successful research, the algorithms are now efficient enough to run in real time on well-equipped standard computers. This is achieved by skillfully offloading the sometimes costly reconstruction steps to modern graphics hardware. Justus Thies (University of Erlangen-Nürnberg) and Dr. Michael Zollhöfer (MPI for Informatics) see a possible application in the visual enhancement of dubbed movies, typically foreign films in which the actor speaks one language but the viewer hears another. This entails synchronizing the lip movements to the audio.
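The transfer step described above can be illustrated with a small sketch. It assumes a linear face model in which each person has their own neutral shape and both share one expression basis. This is an assumption made for illustration, since the text does not spell out the model, and every name and size below (`expr_basis`, `N_VERTS`, `N_EXPR`, the function names) is hypothetical. Head pose, reflectance, lighting, and photo-realistic rendering are left out entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 500   # hypothetical number of mesh vertices
N_EXPR = 8      # hypothetical number of expression parameters

# Hypothetical shared expression basis: each column is one deformation mode
# acting on the stacked (x, y, z) coordinates of all vertices.
expr_basis = rng.normal(size=(3 * N_VERTS, N_EXPR))


def synthesize(neutral, expr_coeffs):
    """Face geometry = person-specific neutral shape + expression offsets."""
    return neutral + expr_basis @ expr_coeffs


def fit_expression(neutral, observed):
    """Least-squares estimate of the expression coefficients for one frame."""
    coeffs, *_ = np.linalg.lstsq(expr_basis, observed - neutral, rcond=None)
    return coeffs


def transfer_expression(source_neutral, source_observed, target_neutral):
    """Estimate the source expression and apply it to the target identity."""
    coeffs = fit_expression(source_neutral, source_observed)
    return synthesize(target_neutral, coeffs), coeffs


# Synthetic stand-ins for the two persons' neutral head geometry
# (in the article, this is what the preprocessing step estimates).
source_neutral = rng.normal(size=3 * N_VERTS)
target_neutral = rng.normal(size=3 * N_VERTS)

# One synthetic "captured" source frame with a known expression.
true_coeffs = rng.uniform(-1.0, 1.0, size=N_EXPR)
source_frame = synthesize(source_neutral, true_coeffs)

reenacted, est = transfer_expression(source_neutral, source_frame, target_neutral)
```

In this toy setting the target keeps its own neutral geometry and only the expression offsets come from the source, which mirrors the idea that the target's identity stays unchanged while the expression is replaced.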

This method of transferring the facial expressions and lip movements of one person to the video stream of another in real time will be officially presented in November at the computer graphics conference "SIGGRAPH ASIA" in Kobe, Japan. It is the result of a close cooperation between two German computer science research groups, those of Prof. Marc Stamminger at the University of Erlangen and Prof. Christian Theobalt at the MPI for Informatics in Saarbrücken. A third contributor is Prof. Matthias Nießner at Stanford University. The corresponding video on YouTube (https://youtu.be/eXVspNUeiWw), which demonstrates the real-time transfer, has been viewed more than 250,000 times.

Researchers have long been investigating problems in the basic research field of image understanding, especially new methods for estimating dynamic scene models (geometry, reflectance of objects) from videos. Professor Theobalt explains: "It is particularly important to compute models from the video data of only a few cameras, or even just one, in order to estimate a mathematical description, as realistic as possible, of rigid, movable, and also deformable bodies within a scene. This is a very difficult and time-consuming computational problem, and the methods developed will find further applications. Basically, this work should be seen as a building block for techniques that allow computers to capture the moving world around them and also to interact with it, with many applications in robotics and augmented/virtual reality."

This work shows that deceptively realistic manipulation of live video streams is becoming increasingly feasible. It is well known that pictures and video clips are manipulated for marketing or propaganda purposes, so we will also have to watch out for possible manipulation of allegedly live videos.

Additional information:

Project page with contacts: http://people.mpi-inf.mpg.de/~mzollhoef/Papers/SGASIA2015_RR/page.html
Video: https://www.youtube.com/watch?v=eXVspNUeiWw