|Realistic Human Face Scanning and Rendering|
|Graphics Lab Facial Scanning Project - 2001|
|Graham Fyffe *Tim Hawkins **Chris Watts Wan-Chun Ma Paul Debevec|
|USC Institute for Creative Technologies |
One of the "holy grails" of computer graphics is being able to create human faces that look and act realistically. There are many levels of attaining this goal. A digital photograph, for example, is a computer image of a human face that looks real. For faces that act realistically, you can't beat a video camera. These acquisition methods, however, are lacking for two reasons: they only record data in two dimensions, and they are not controllable. If you make a video recording of someone standing still reciting the alphabet, you can't then play it back and make her sing "The Star Spangled Banner" while spinning in circles. Not as obvious, but as important for applications such as computer-based special effects in movies, there is no way to change the lighting after a photograph has been taken or a scene has been filmed. This means, for example, that it is difficult to take a photograph of a person standing in the desert on a sunny day and make it look like the are standing indoors in a Cathedral in front of a stained glass window. Doing so requires painstaking manual touchup by skilled artists, and it is still almost impossible to make it look real.
One of the focuses of our work in the ICT Graphics Lab is to be able to acquire 3D models of humans that can be animated, relit, edited, and otherwise modified. Our "killer app" would be a device in which a person sit, and after a brief "scan," yields a computer graphics model of their face that we could stick onto a video game character's head, or composite into a movie frame for special effects shot. These computer-generated images would be so realistic that they would be indistinguishable from photographs. This level of realism is called Photo-realism.
Our prototype device for digitizing a human face is a rig called a lightstage. Scanning a person in the lightstage is a two-phase process. In the first phase, which lasts a few seconds, the lightstage photographs a human subject's face lit from hundreds of different directions. This recovers the skin's reflectance properties (how skin reflects light, its color, shininess, etc.) The second phase uses a technique called active vision to recover a 3D geometric model of the face (the shape). These two components, reflectance properties and geometry, should be enough to reconstruct the appearance of a person with any light bouncing off of them from any point of view.
Active vision involves projecting patterns with a projector onto a subject. The face's appearance under each pattern is recorded by a separate video camera. By analyzing the images of the face with different patterns on it, active vision techniques obtain a 3D model of the subject. One of the problems with using active vision technology to recover high-resolution geometry is that the geometry tends to be very noisy. This noise is the geometric equivalent of getting bad reception on your TV. It introduces sharp random features like small ridges and valleys that are not present on the actual face. This is the case with the image shown below on the left (A). We would like techniques that allow us to recover higher quality geometric information, without all of this noise.
One approach is to use the reflectance properties we recover in order to improve the geometric model. Using the reflectance data recorded by the lightstage and a technique called photometric stereo, we are able to obtain an accurate estimate of the direction that the surface of the skin is oriented at each point on the face. This data is called a surface normal map or orientation map. We have developed a technique, called differential displacement map recovery, to incorporate this normal map with the noisy range scan to obtain very accurate geometric models of the face. It is efficient; typical processing time for a high-res scan is 1-2 minutes, and it gives good results. Examples are shown below in figures A -D. A forthcoming publication will fully describe this technique.
|A. The original unprocessed noisy range scan. There are holes in the scan because the scanning rig could not see all of the face. In a real application, we would take multiple scans from different angles to see the entire face.||B. The same model as in A after differential displacement map recovery.|
In addition to color information, it is crucial to be able to predict the color of the skin. Skin has interesting and subtle color variations depending on the direction from which it is viewed, the expression of the face, the oiliness of the skin surface, and how it is lit. For this reason, a single photograph will often not suffice to be able to produce the types of flexible digital facial models we are after. To fully understand and capture these color variations, many researchers have developed complicated skin reflectance models and techniques for recording skin color using different measurement devices. The techniques we use are explained in the paper Acquiring the Reflectance Field of the Human Face, SIGGRAPH 2000.
Once we analyze this reflectance data, we are developing algorithms and tools for being able to efficient re-render the face from any viewpoint in any environment. Eventually, we would like to be able to reanimate the face as well. Examples of preliminary results are shown below, Figures C-D. These images can be generated at over 30 frames per second on a high-end PC with graphics hardware acceleration. Another prototype application, the Face Demo, also allows for real-time viewing of a 2D image of the face in any environment.
|C. The same model in B with color added. This color is a lighting and direction-independent color called diffuse albedo. Real skin is shinier than this due to an oily layer on top of the dermis.||D. The same model in B from a closer viewpoint with diffuse albedo. Notice the smooth and subtle contour of the nose and lips recovered by our technique. Unlike blurring techniques for processing range scans to produce smooth models, our technique actually increases, rather than decreases, the resolution of the geometric model.|
Our main goal at the ICT Graphics Lab is to pursue basic research to enable more compelling and realistic virtual environments (of course, compelling and realistic are not necessarily the same thing.) A major component of this is realistic digital actors capable of expressing a wide range of emotions, expressions, performing actions, all while looking good. Applications of these techniques range from more exciting computer games, to better special effects in movies, to 3D video-phones, to medical visualization.
[Update 2007] We never formally published this work other than putting this page on the web back in 2001. However, an efficient version of this normals-upresing technique was recently been presented in:
- Diego Nehab, Szymon Rusinkiewicz, James Davis, Ravi Ramamoorthi. Efficiently combining positions and normals for precise 3D geometry. ACM Transactions on Graphics, 24(3), August 2005, pp. 536-543.
Our group's more recent work applys normals based on the specular reflection of the face rather than the general photometric normals:
- Wan-Chun Ma, Tim Hawkins, Pieter Peers, Charles-Felix Chabert, Malte Weiss, Paul Debevec. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination. Eurographics Symposium on Rendering, June 2007.