Max Planck Institute for Informatics
 

Current Spotlight

Exchanging Faces in Images

Volker Blanz, Department 4: Computer Graphics

In Computer Graphics, more and more applications require digital effects within existing images or video material rather than the creation of entirely virtual scenes. Currently, tasks such as exchanging or animating faces in images are usually done manually with digital photo editing software, which follows the same principle as conventional photographic print retouching: color values are changed locally at each point, and image regions are copied from one image into another. This is very tedious and requires the skill of an artist.

We have developed a method that gives users a high-level tool for exchanging or animating faces in images in an almost completely automated way. For the user, the new editing paradigm is no longer based on points (pixels), but on high-level descriptions such as "Person A", "Person B", and "Smile". Most importantly, our algorithm can insert a face seen from a given viewing direction into images with any other viewing direction and illumination, which was not possible with previous techniques.

Our algorithm reconstructs 3D face models from single input images, applies changes in 3D, and draws the results into target images. The new pose angles and illuminations for the target images are computed automatically. The only manual interaction required is to click on about 7 feature points, such as the corners of the eyes and mouth, in each image. The system, which will be described in detail below, is based on several fundamental approaches to Computer Graphics:

Learning-Based Graphics captures the statistical properties of a class of objects, such as faces, after learning these properties from examples.

The 3D Morphable Model represents faces as points in a high-dimensional face space.

Connecting 3D Graphics with Photo-realistic 2D Image Processing makes our system versatile with respect to head pose and illumination.

Analysis-by-Synthesis exploits the fact that it is easier to create an image of a face from a 3D model than vice versa. Our 3D reconstruction algorithm draws an image, compares it to the input image of the face that is to be reconstructed, and updates the model parameters. This process is repeated until the synthetic image is as similar as possible to the input.

Morphable Model

The Morphable Model of 3D Faces treats the shape of a face as a high-dimensional vector, formed by concatenating the 3D coordinates of a dense mesh of surface points into a single vector. In the same way, texture vectors are formed from the red, green and blue color components of each point. High-dimensional spaces are hard to visualize, but it is sufficient to think of these vectors simply as points in a plane. The center point between two faces would be a new face that looks a bit like a blend of both. In the vector space of faces, any linear combination of faces, such as 10% of face A plus 5% of face B plus 85% of face C, gives a realistic new face. This is due to the dense point-to-point correspondence between face vectors that we automatically established on all 200 example faces that span the vector space: each vector component describes the same point, such as the tip of the nose, in all face vectors.
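The linear combination described above can be illustrated with a minimal sketch. It assumes tiny made-up shape vectors with only two surface points each (the actual model uses a dense mesh and 200 example faces); the function name combine_faces and all numbers are hypothetical and chosen only for illustration.

```python
def combine_faces(faces, weights):
    """Weighted sum of face vectors that are in point-to-point correspondence."""
    if len(faces) != len(weights):
        raise ValueError("need exactly one weight per face")
    length = len(faces[0])
    if any(len(face) != length for face in faces):
        raise ValueError("all face vectors must have the same length")
    return [
        sum(w * face[i] for w, face in zip(weights, faces))
        for i in range(length)
    ]


# Hypothetical shape vectors: (x, y, z) of two surface points, flattened.
# Component i refers to the same surface point in every face.
face_a = [0.0, 0.0, 10.0, 4.0, 0.0, 6.0]
face_b = [0.0, 2.0, 12.0, 5.0, 1.0, 7.0]
face_c = [1.0, 1.0, 11.0, 6.0, 0.0, 8.0]

# 10% of face A plus 5% of face B plus 85% of face C, as in the text.
new_face = combine_faces([face_a, face_b, face_c], [0.10, 0.05, 0.85])

# The center point between two faces is the 50/50 blend.
midpoint = combine_faces([face_a, face_b], [0.5, 0.5])
```

Texture vectors could be combined in the same way. The blend is only meaningful because every vector component refers to the same surface point in all faces, which is why the function rejects vectors of different lengths.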

3D Surface Reconstruction from a single image is an ill-posed problem: dark regions in an image can be due either to shadows or to dark materials that absorb light. Even with uniform materials, such as plaster sculptures, the equations for shape-from-shading have no unique solution. Humans can easily estimate 3D shape from images because they know what is possible in the real world and what is not. For example, humans know the range of possible shapes of human faces, and select the most plausible face that fits the light signal received by their eyes. The very same strategy is used in the 3D face reconstruction algorithm.

Given an image of a face, our algorithm computes the linear combination of example faces and the scene parameters that fit the input image best in terms of point-by-point image difference. In an analysis-by-synthesis loop, the algorithm draws a synthetic image (rendering operation R, with scene parameters rho), compares the result with the input, and updates the face model and scene parameters. Mathematically, this is a non-linear optimization problem. To help the system find the face in the image, the user has to click on about 7 feature points, such as the eyes and the nose.
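The analysis-by-synthesis loop can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the "image" is a list of four numbers, the rendering operation R is reduced to a single brightness factor rho applied to a linear face model, and the optimizer is plain gradient descent, which is not necessarily the method used in the actual system. The names render, image_error and fit are hypothetical.

```python
def render(alpha, rho, mean, basis):
    """Toy stand-in for the rendering operation R: a brightness-scaled face."""
    return [
        rho * (mean[i] + sum(a * b[i] for a, b in zip(alpha, basis)))
        for i in range(len(mean))
    ]


def image_error(alpha, rho, mean, basis, target):
    """Point-by-point squared difference between synthetic and input image."""
    synthetic = render(alpha, rho, mean, basis)
    return sum((s - t) ** 2 for s, t in zip(synthetic, target))


def fit(target, mean, basis, steps=500, rate=0.05):
    """Draw, compare, update: gradient descent on the image error."""
    alpha = [0.0] * len(basis)  # start at the average face
    rho = 1.0                   # start at unit brightness
    for _ in range(steps):
        model = [
            mean[i] + sum(a * b[i] for a, b in zip(alpha, basis))
            for i in range(len(mean))
        ]
        residual = [rho * m - t for m, t in zip(model, target)]
        grad_alpha = [
            2.0 * rho * sum(r * b[i] for i, r in enumerate(residual))
            for b in basis
        ]
        grad_rho = 2.0 * sum(r * m for r, m in zip(residual, model))
        alpha = [a - rate * g for a, g in zip(alpha, grad_alpha)]
        rho -= rate * grad_rho
    return alpha, rho


# Hypothetical model: an average face and two directions of variation.
mean = [1.0, 1.0, 1.0, 1.0]
basis = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]

# Synthetic "input image" generated from known parameters.
target = render([0.3, -0.2], 0.8, mean, basis)
alpha, rho = fit(target, mean, basis)
```

Even in this toy version the problem is non-linear, because the brightness factor multiplies the model coefficients. The real task additionally involves pose, illumination and perspective projection, so the step size and iteration count used here should not be read as recommendations.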

 

Exchanging Faces

To draw a face from one image into another, the 3D reconstruction algorithm is applied to both images. Then, the 3D face from the source image is drawn with the pose and illumination parameters that were estimated from the target image. The background behind the 3D face is the original target image. In this background, the image structures around the face are extended into the face region across the silhouette of the original face, which is important if the new face is smaller than the original. Strands of hair that cover the face have to be processed manually, and are drawn as a front layer over any new face.
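The layering described above (background, rendered face, hair in front) can be sketched as two alpha blends. This is a simplified illustration on a single row of hypothetical grayscale pixels; it assumes the masks are already given and does not cover the step that extends the background into the face region. The names blend and composite are made up for this sketch.

```python
def blend(bottom, top, mask):
    """Per-pixel alpha blend: mask 1.0 shows the top layer, 0.0 the bottom."""
    return [b * (1.0 - m) + t * m for b, t, m in zip(bottom, top, mask)]


def composite(background, face, face_mask, hair, hair_mask):
    """Draw the rendered face over the background, then hair over the face."""
    with_face = blend(background, face, face_mask)
    return blend(with_face, hair, hair_mask)


# One row of five hypothetical grayscale pixels in [0, 1].
background = [0.2, 0.2, 0.2, 0.2, 0.2]   # target image with the old face filled in
face = [0.0, 0.8, 0.8, 0.8, 0.0]         # rendering of the new 3D face
face_mask = [0.0, 1.0, 1.0, 1.0, 0.0]    # where the new face covers the image
hair = [0.1, 0.1, 0.1, 0.1, 0.1]         # manually extracted front layer
hair_mask = [0.0, 0.0, 0.0, 1.0, 0.0]    # a strand of hair crossing the face

result = composite(background, face, face_mask, hair, hair_mask)
```

Because the hair layer is blended last, it stays in front of whichever face is inserted, which matches the role of the front layer in the text.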

Exchanging faces can be applied as a new tool in digital image editing. Much of the process is automated, so it can be used for consumer applications such as virtual try-on of hairstyles. The advantage of our approach is that any photo of a new hairstyle, together with a portrait of the consumer, can be used for a synthetic, photo-realistic preview. In contrast, 3D representations of virtual hairstyles, as they are used in animated movies, have not reached the degree of realism required for such an application, and it would be challenging to re-model existing hairstyles created by hairdressers. Another application of face exchange may be in movie productions. Extending the technique to video frames is straightforward.

An important field of application can be automated face recognition. Changes in pose and illumination are still not handled sufficiently by current face recognition software. For the Face Recognition Vendor Test 2002 by the National Institute of Standards and Technology (NIST) and DARPA, we created virtual front views from side views of faces and drew these into a standard target image. With our synthetic front views, the rate of correct recognition of 9 out of 10 commercial face recognition programs increased considerably: while the best systems achieved only about 40 percent correct recognition with the original (side) views, they achieved more than 80 percent with our synthetic (front) views.

Reanimating Faces in Images

Based on the 3D reconstructions, faces can be re-animated in photographs and paintings. After reconstruction, we animate the faces in 3D, and draw the result back into the original image. The 3D deformations for animation are learned from scans of a face with different facial expressions. In our vector space representation, the difference between a smiling face and a neutral face is a "smile vector" that can be added to any other neutral face to make it smile. This technique, which we have also applied to video input, may be used in a wide range of media applications, including virtual museums and lip-synching in movies.
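The "smile vector" amounts to a vector subtraction followed by an addition. The sketch below assumes short made-up shape vectors that are in point-to-point correspondence. The function name add_expression and the strength parameter are hypothetical additions for illustration and are not taken from the original system.

```python
def add_expression(neutral_target, neutral_source, smiling_source, strength=1.0):
    """Transfer an expression: add (smiling - neutral) of one face to another."""
    smile_vector = [s - n for s, n in zip(smiling_source, neutral_source)]
    return [t + strength * d for t, d in zip(neutral_target, smile_vector)]


# Hypothetical 3-component shape vectors in point-to-point correspondence.
scan_neutral = [0.0, 0.0, 0.0]    # scanned face, neutral expression
scan_smiling = [1.0, 2.0, 0.0]    # same scanned face, smiling
other_neutral = [5.0, 5.0, 5.0]   # reconstructed face from a photograph

full_smile = add_expression(other_neutral, scan_neutral, scan_smiling)
half_smile = add_expression(other_neutral, scan_neutral, scan_smiling, strength=0.5)
```

Scaling the difference vector, as the hypothetical strength parameter does, is one simple way to obtain intermediate expressions between the neutral and the smiling face.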

Volker Blanz, Department 4: Computer Graphics, in collaboration with T. Vetter, Basel.

For more information, please visit the homepage of Volker Blanz.

 

Examples

Original image of the person inserted into the two pictures above.