Computer facial animation is primarily an area of computer graphics that encapsulates methods and techniques for generating and animating images or models of a character face. The character can be a human, a humanoid, an animal, a legendary creature or character, etc. Due to its subject and output type, it is also related to many other scientific and artistic fields, from psychology to traditional animation. The importance of human faces in verbal and non-verbal communication and advances in computer graphics hardware and software have caused considerable scientific, technological, and artistic interest in computer facial animation.
Although development of computer graphics methods for facial animation started in the early 1970s, major achievements in this field are more recent and have occurred since the late 1980s.
The body of work around computer facial animation can be divided into two main areas: techniques to generate animation data, and methods to apply such data to a character. Techniques such as motion capture and keyframing belong to the first group, while morph target animation (more commonly known as blendshape animation) and skeletal animation belong to the second. Facial animation has become well known and popular through animated feature films and computer games, but its applications include many more areas such as communication, education, scientific simulation, and agent-based systems (for example, online customer service representatives). With recent advances in the computational power of personal and mobile devices, facial animation has transitioned from appearing in pre-rendered content to being created at runtime.
History
Human facial expression has been the subject of scientific investigation for more than one hundred years. Study of facial movements and expressions started from a biological point of view. After some older investigations, for example by John Bulwer in the late 1640s, Charles Darwin's book ''The Expression of the Emotions in Man and Animals'' can be considered a major departure for modern research in behavioural biology.
Computer-based facial expression modelling and animation is not a new endeavour. The earliest work with computer-based facial representation was done in the early 1970s. The first three-dimensional facial animation was created by Parke in 1972. In 1973, Gillenson developed an interactive system to assemble and edit line-drawn facial images. In 1974, Parke developed a parameterized three-dimensional facial model.
One of the most important attempts to describe facial movements was the Facial Action Coding System (FACS). Originally developed by Carl-Herman Hjortsjö in the 1960s and updated by Ekman and Friesen in 1978, FACS defines 46 basic facial Action Units (AUs). A major group of these Action Units represent primitive movements of facial muscles in actions such as raising brows, winking, and talking. Eight AUs are for rigid three-dimensional head movements (i.e. turning and tilting left and right and going up, down, forward and backward). FACS has been successfully used for describing desired movements of synthetic faces and also in tracking facial activities.
The early 1980s saw the development of the first physically based muscle-controlled face model by Platt and the development of techniques for facial caricatures by Brennan. In 1985, the animated short film ''Tony de Peltrie'' was a landmark for facial animation. This marked the first time computer facial expression and speech animation were a fundamental part of telling the story.
The late 1980s saw the development of a new muscle-based model by Waters, the development of an abstract muscle action model by Magnenat-Thalmann and colleagues, and approaches to automatic speech synchronization by Lewis and Hill. The 1990s saw increasing activity in the development of facial animation techniques and the use of computer facial animation as a key storytelling component, as illustrated in animated films such as ''Toy Story'' (1995), ''Antz'' (1998), ''Shrek'', and ''Monsters, Inc.'' (both 2001), and computer games such as ''The Sims''. ''Casper'' (1995), a milestone in this decade, was the first movie in which a lead actor was produced exclusively using digital facial animation.
The sophistication of the films increased after 2000. In ''The Matrix Reloaded'' and ''The Matrix Revolutions'', dense optical flow from several high-definition cameras was used to capture realistic facial movement at every point on the face. ''The Polar Express'' used a large Vicon system to capture upward of 150 points. Although these systems are automated, a large amount of manual clean-up effort is still needed to make the data usable. Another milestone in facial animation was reached by ''The Lord of the Rings'', where a character-specific shape base system was developed. Mark Sagar pioneered the use of FACS in entertainment facial animation, and FACS-based systems developed by Sagar were used on ''Monster House'', ''King Kong'', and other films.
Techniques
Generating facial animation data
The generation of facial animation data can be approached in different ways: 1.) marker-based motion capture on points or marks on the face of a performer, 2.) markerless motion capture techniques using different types of cameras, 3.) audio-driven techniques, and 4.) keyframe animation.
* Motion capture uses cameras placed around a subject. The subject is generally fitted either with reflectors (passive motion capture) or sources (active motion capture) that precisely determine the subject's position in space. The data recorded by the cameras is then digitized and converted into a three-dimensional computer model of the subject. Until recently, the size of the detectors/sources used by motion capture systems made the technology inappropriate for facial capture. However, miniaturization and other advancements have made motion capture a viable tool for computer facial animation. Facial motion capture was used extensively in ''The Polar Express'' by Imageworks, where hundreds of motion points were captured. This film was very accomplished and, while it attempted to recreate realism, it was criticized for having fallen into the 'uncanny valley', the realm where animation realism is sufficient for human recognition and to convey the emotional message but where the characters fail to be perceived as realistic. The main difficulties of motion capture are the quality of the data, which may include vibration, and the retargeting of the geometry of the points.
* Markerless motion capture aims at simplifying the motion capture process by avoiding encumbering the performer with markers. Several techniques have emerged recently that leverage different sensors, including standard video cameras, Kinect and other depth sensors, and other structured-light-based devices. Systems based on structured light may achieve real-time performance without the use of any markers by using a high-speed structured light scanner. Such a system is based on a robust offline face tracking stage which trains the system with different facial expressions. The matched sequences are used to build a person-specific linear face model that is subsequently used for online face tracking and expression transfer.
* Audio-driven techniques are particularly well suited to speech animation. Speech is usually treated differently from the animation of facial expressions, because simple keyframe-based approaches to animation typically provide a poor approximation to real speech dynamics. Often visemes are used to represent the key poses in observed speech (i.e. the position of the lips, jaw and tongue when producing a particular phoneme); however, there is a great deal of variation in the realisation of visemes during the production of natural speech. The source of this variation is termed coarticulation, which is the influence of surrounding visemes upon the current viseme (i.e. the effect of context). To account for coarticulation, current systems either explicitly take context into account when blending viseme keyframes or use longer units such as diphone, triphone, syllable or even word and sentence-length units. One of the most common approaches to speech animation is the use of dominance functions introduced by Cohen and Massaro. Each dominance function represents the influence over time that a viseme has on a speech utterance. Typically the influence will be greatest at the center of the viseme and will degrade with distance from the viseme center. Dominance functions are blended together to generate a speech trajectory in much the same way that spline basis functions are blended together to generate a curve. The shape of each dominance function will be different according to both which viseme it represents and what aspect of the face is being controlled (e.g. lip width, jaw rotation etc.). This approach to computer-generated speech animation can be seen in the Baldi talking head. Other models of speech use basis units which include context (e.g. diphones, triphones etc.) instead of visemes. As the basis units already incorporate the variation of each viseme according to context and to some degree the dynamics of each viseme, no model of coarticulation is required. Speech is simply generated by selecting appropriate units from a database and blending the units together. This is similar to concatenative techniques in audio speech synthesis. The disadvantage of these models is that a large amount of captured data is required to produce natural results, and whilst longer units produce more natural results, the size of the database required expands with the average length of each unit. Finally, some models directly generate speech animations from audio. These systems typically use hidden Markov models or neural nets to transform audio parameters into a stream of control parameters for a facial model. The advantage of this method is its ability to handle voice context, natural rhythm, tempo, emotion and dynamics without complex approximation algorithms. The training database does not need to be labeled since no phonemes or visemes are needed; the only data needed are the voice and the animation parameters.
* Keyframe animation is the least automated of the processes to create animation data, although it delivers the maximum amount of control over the animation. It is often used in combination with other techniques to deliver the final polish to the animation. The keyframe data can be made of scalar values defining the morph target coefficients or rotation and translation values of the bones in models with a bone-based rig. Often, to speed up the keyframe animation process, a control rig is used by the animator. The control rig represents a higher level of abstraction that can act on multiple morph target coefficients or bones at the same time. For example, a "smile" control can act simultaneously on the mouth shape curving up and the eyes squinting.
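The dominance-function blending described in the audio-driven item above can be illustrated with a minimal sketch. It assumes a negative-exponential dominance shape and a normalized weighted average of viseme targets, which is a common way of presenting the Cohen and Massaro model; the class name `VisemeSegment`, the parameter names, and the default values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VisemeSegment:
    """One viseme's contribution to a single facial control parameter.

    All field names and defaults are illustrative assumptions.
    """
    center: float           # time (seconds) at which the viseme's influence peaks
    target: float           # target value of the controlled parameter (e.g. lip width)
    magnitude: float = 1.0  # peak dominance
    rate: float = 10.0      # how quickly dominance decays away from the center
    power: float = 1.0      # exponent shaping the decay curve


def dominance(seg: VisemeSegment, t: float) -> float:
    """Influence of a viseme at time t: greatest at its center, decaying with distance."""
    return seg.magnitude * math.exp(-seg.rate * abs(t - seg.center) ** seg.power)


def blend(segments: list, t: float) -> float:
    """Blend viseme targets at time t as a dominance-weighted average."""
    weights = [dominance(s, t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        # Far from every viseme all weights underflow to zero;
        # fall back to the target of the nearest viseme center.
        return min(segments, key=lambda s: abs(t - s.center)).target
    return sum(w * s.target for w, s in zip(weights, segments)) / total


if __name__ == "__main__":
    # Two visemes controlling one parameter: closed lips (0.0), then open lips (1.0).
    utterance = [VisemeSegment(center=0.0, target=0.0),
                 VisemeSegment(center=0.2, target=1.0)]
    for step in range(5):
        t = step * 0.05
        print(f"t={t:.2f}  value={blend(utterance, t):.3f}")
```

Because each viseme's weight overlaps with its neighbours, the blended value moves smoothly between targets, which is how this kind of model approximates coarticulation. In practice a separate set of dominance parameters would be used for each facial control.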
Applying facial animation to a character
The main techniques used to apply facial animation to a character are: 1.) morph target animation, 2.) bone-driven animation, 3.) texture-based animation (2D or 3D), and 4.) physiological models.
* Morph target (also called "blendshape") based systems offer fast playback as well as a high degree of fidelity of expressions. The technique involves modeling portions of the face mesh to approximate expressions and visemes and then blending the different sub-meshes, known as morph targets or blendshapes. Perhaps the most accomplished character using this technique was Gollum, from ''The Lord of the Rings''. Drawbacks of this technique are that it involves intensive manual labor and is specific to each character. Recently, new concepts in 3D modeling have started to emerge, such as ''Curve Controlled Modeling'', which departs from traditional techniques by emphasizing the modeling of the movement of a 3D object instead of the modeling of the static shape.
* Bone-driven animation is very broadly used in games. The bone setup can vary from a few bones to close to a hundred to allow all subtle facial expressions. The main advantages of bone-driven animation are that the same animation can be used for different characters as long as the morphology of their faces is similar, and that it does not require loading all the morph target data into memory. Bone-driven animation is the technique most widely supported by 3D game engines. It can be used for both 2D and 3D animation. For example, it is possible to rig and animate a 2D character with bones using Adobe Flash.
* Texture-based animation uses pixel color to create the animation on the character face. 2D facial animation is commonly based upon the transformation of images, including both images from still photography and sequences of video. Image morphing is a technique which allows in-between transitional images to be generated between a pair of target still images or between frames from sequences of video. These morphing techniques usually consist of a combination of a geometric deformation technique, which aligns the target images, and a cross-fade, which creates the smooth transition in the image texture. An early example of image morphing can be seen in Michael Jackson's video for "Black or White". In 3D animation, texture-based animation can be achieved by animating the texture itself or the UV mapping. In the latter case, a texture map of all the facial expressions is created and the UV map animation is used to transition from one expression to the next.
* Physiological models, such as skeletal muscle systems and physically based head models, form another approach in modeling the head and face. Here, the physical and anatomical characteristics of bones, tissues, and skin are simulated to provide a realistic appearance (e.g. spring-like elasticity). Such methods can be very powerful for creating realism, but the complexity of facial structures makes them computationally expensive and difficult to create. Considering the effectiveness of parameterized models for communicative purposes (as explained in the next section), it may be argued that physically based models are not a very efficient choice in many applications. This does not deny the advantages of physically based models and the fact that they can even be used within the context of parameterized models to provide local details when needed.
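The morph target blending mentioned in the list above is often formulated as adding weighted per-vertex offsets to a neutral mesh. The following is a minimal sketch under that assumption; the function name `blend_shapes`, the tiny two-vertex "mesh", and the target name are hypothetical and stand in for a real face model.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Mesh = List[Vertex]


def blend_shapes(neutral: Mesh,
                 targets: Dict[str, Mesh],
                 weights: Dict[str, float]) -> Mesh:
    """Return neutral + sum_i w_i * (target_i - neutral), computed per vertex.

    Every target mesh is assumed to have the same vertex count and
    ordering as the neutral mesh.
    """
    result: Mesh = []
    for i, (bx, by, bz) in enumerate(neutral):
        x, y, z = bx, by, bz
        for name, w in weights.items():
            tx, ty, tz = targets[name][i]
            # Add this target's weighted offset from the neutral pose.
            x += w * (tx - bx)
            y += w * (ty - by)
            z += w * (tz - bz)
        result.append((x, y, z))
    return result


if __name__ == "__main__":
    # Two vertices standing in for the left and right mouth corners.
    neutral = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    targets = {"smile": [(-1.2, 0.5, 0.0), (1.2, 0.5, 0.0)]}
    # A half-strength smile moves each corner halfway toward the target pose.
    print(blend_shapes(neutral, targets, {"smile": 0.5}))
```

Keyframing the scalar weights over time then animates the face, and a control rig of the kind described earlier would simply drive several of these weights from one higher-level control.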
Face animation languages
Many face animation languages are used to describe the content of facial animation. They can be input to compatible "player" software which then creates the requested actions. Face animation languages are closely related to other multimedia presentation languages such as SMIL and VRML. Due to the popularity and effectiveness of XML as a data representation mechanism, most face animation languages are XML-based. For instance, this is a sample from Virtual Human Markup Language (VHML):
First I speak with an angry voice and look very angry,
but suddenly I change to look more surprised.
More advanced languages allow decision-making, event handling, and parallel and sequential actions. The ''Face Modeling Language'' (FML) is an XML-based language for describing face animation. FML supports MPEG-4 Face Animation Parameters (FAPs), decision-making and dynamic event handling, and typical programming constructs such as loops. It is part of the iFACE system.
The following is an example from FML:
See also
* Animation
* Caricature
* Computer animation
* Computer graphics
* Deepfake
* Facial expression
* Facial motion capture
* Interactive online characters
* Morphing
* Parametric surface
* Texture mapping
Further reading
* ''Computer Facial Animation'' by Frederic I. Parke and Keith Waters, 2008
* ''Data-driven 3D facial animation'' by Zhigang Deng and Ulrich Neumann, 2007
* ''Handbook of Virtual Humans'' by Nadia Magnenat-Thalmann and Daniel Thalmann, 2004
* ''Stop Staring: Facial Modeling and Animation Done Right'' by Jason Osipa, 2nd edition, John Wiley & Sons, 2005, ISBN 978-0-471-78920-8
External links
* Face/Off: Live Facial Puppetry - Realtime markerless facial animation technology developed at ETH Zurich
* The "Artificial Actors" Project - Institute of Animation
* iFACE
* Animated Baldi
* Carl-Herman Hjortsjö, ''Man's face and mimic language'' (the original Swedish title of the book is "Människans ansikte och mimiska språket"; a more accurate translation would be "Man's face and facial language")