
Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2011

Creating Micro-expressions and Nuanced Nonverbal Communication in Synthetic Cultural Characters and Environments

Marjorie Zielke, Ph.D., Frank Dufour, Ph.D., Gary Hardee
University of Texas at Dallas
Richardson, Texas
[email protected], [email protected], [email protected]

ABSTRACT

Understanding how to observe and analyze the nonverbal communication of virtual characters in synthetic environments can assist warfighters with determining source credibility, deception detection, behavior observation, process training and other training objectives. The First Person Cultural Trainer (FPCT) is a high-fidelity, game-based simulation that trains cross-cultural decision making and is also a platform for the creation of synthetic characters which display culturally accurate facial and micro-expressions. These facial and micro-expressions serve as a channel for factors such as invariants and affordances within the synthetic environment that also communicate critical nonverbal information. This paper will discuss technical challenges and solutions in creating facial and micro-expressions and environmental invariants and affordances such as those needed in our high-fidelity training projects. The paper will also provide an overview of the state-of-the-art for creating nuanced nonverbal communication from the gaming industry. We conclude with a description of our solution for developing facial and micro-expressions and nuanced nonverbal communication that we feel best suits our needs at this time.

ABOUT THE AUTHORS

Marjorie A. Zielke, Ph.D., is an assistant professor of Arts and Technology at the University of Texas at Dallas and is the principal investigator on First Person Cultural Trainer. Dr. Zielke has served as principal investigator and project manager on a series of culturally related military-funded simulation projects over the last several years. Her areas of research are cyberpsychology and hyper-realistic simulations, and she works in the cultural training, health and marketing sectors. Dr. Zielke received her Ph.D. from the University of Texas at Dallas, and also has an MBA and a master's in international business. Projects on which Dr. Zielke has been principal investigator, including the First Person Cultural Trainer, have won awards from the National Training and Simulation Society, the Department of Defense and the Society for Simulation and Healthcare.

Frank Dufour, Ph.D., teaches sound design for digital arts. He created one of the first all-digital recording studios in France, specializing in digital sound restoration for audio publications and sound design for animation movies and video productions. His work on the First Person Cultural Trainer focuses on cognitive game development and sound design.

Gary Hardee, MA, is the research project manager for the Institute for Interactive Arts and Engineering and a doctoral student at the University of Texas at Dallas. He has many years of professional media experience and manages several UT Dallas projects involving cultural training simulations and virtual patients.

The following colleagues at the University of Texas at Dallas also contributed as authors: Dr. Robert Taylor, Dr. Bruce Jacobs, Desmond Blair, Adam Buxkamper, Jumanne Donahue, Brandon Keown and Dan Trinh.



INTRODUCTION

Today's irregular warfare environment increasingly requires training for engaging and communicating with multi-cultural non-combatant populations. How well warfighters understand the verbal and nonverbal communications of non-combatants can influence their success in missions. Traditionally, non-combatant engagement training has been achieved through sessions with live actors. While such training can be effective, it is also limited and costly. Alternatively, training with virtual agents in game-based simulations can be an efficient, highly composable and effective solution.

To be effective, a game-based simulation's virtual agents, or non-player characters (NPCs), must demonstrate cognitive and emotional complexity through verbal expressions as well as high-fidelity, nuanced visual representations that include facial expressions and much briefer micro-expressions. Further, the concept of emotional flow suggests that nonverbal communication is a continuum of behavioral actions displayed within the environmental milieu, to which the NPCs respond with appropriate behaviors. In this paper we focus on the development of facial and micro-expressions in synthetic environments and on the creation of the overall affordances and invariants that set the context for interpreting behavior. Facial expressions, micro-expressions, affordances, invariants and emotional flow are explained further below and in Figure 2.

A facial expression is a configuration of muscle movements in the face and can come into existence in a number of ways. In the context of communication, facial expressions can be simulated, ritualized and completely voluntary (Russell and Fernández-Dols, 1997); genuinely connected to an emotional state and freely expressed; or used to mask an involuntary emotional state (Porter and Brinke, 2008). The final circumstance is the one in which micro-expressions emerge. A micro-expression occurs when a person cannot totally suppress or mask an involuntary emotional facial expression. The resulting leaked expression is brief, taking place in 1/5 to 1/25 of a second (Porter and Brinke, 2008). Figure 1 below illustrates the difference between facial and micro-expressions.

Figure 1. A facial expression (left) is a configuration of muscle movements in the face that expresses an emotional state. A micro-expression (right) is a quick burst – 1/5 to 1/25 of a second – of involuntary emotion.

The full understanding of facial and micro-expressions needs to be in the context of invariants and affordances. Invariants express the importance of the environment to understanding and synthetically portraying nonverbal communication. Both structural and transformational invariants are used in our model. Structural invariants are properties of objects that at some level of description can be considered the same (Michaels and Carello, 1981). Structural invariants give us an overall sense of the combined elements of the environment to put the nonverbal communication of NPCs in context. Structural invariants might be understood as the overall and common feel of the environment, as they are available to the entire population of the environment.

A transformational invariant is change occurring in or to an object or a collection of objects in the environment (Michaels and Carello, 1981). A micro-expression, for example, is a transformational invariant that describes a brief change in a facial expression.

Affordances, in their simplest form, are the behavioral implications directly gathered by the player from structural and transformational invariants attached to static objects and places in the environment. These elements can communicate information based on a player's perception and therefore might be different between users. For example, a bombed-out building represents more than charred rubble; it delivers information about a previous action and, depending on the perceiver's intention, location, culture and personal background, its structural invariants afford various actions such as hiding, exploring or avoiding.

Finally, facial and micro-expressions cannot be presented as discrete representations, unconnected from the synthetic living-world environment in which NPCs exist. Instead, we contend that they must act in accordance with the affordances and invariants of the environment as defined above. We call this emotional flow. Emotional flow transitions a character's emotional equilibrium and may be a combination of facial and micro-expressions as well as other types of nonverbal communication. Figure 2 below summarizes these concepts.

Figure 2. This image is an example of the types of nonverbal information available in our synthetic environment. The overall feel of the environment (A) is displayed through structural invariants – the combination of objects in the environment. Change occurring in or to an object or a collection of objects is a transformational invariant that communicates additional nonverbal information. Examples of transformational invariants are movement by the man on the roof (B) or, when he is seen close up, changes in facial expressions (C) or the brief display of a micro-expression (D). The NPC's transition from state C to state D would be described as emotional flow. The meanings the player attaches to the nonverbal communication of the entire environment (E) are its affordances. For example, to some observers, the scene depicted above might seem dangerous; to others it might seem like a typical day with nothing unusual.


Using the ideas presented above, this paper will discuss the technical challenges and solutions for creating visual representations of facial and micro-expressions. We begin by discussing how training in nonverbal communication can assist warfighters in distinguishing between threats and non-threats. We then provide an overview of the state-of-the-art for developing facial and micro-expressions from the gaming industry. For example, new commercial software allows game developers to stream video capture of an actor's performance onto the face of a 3D model, producing high-fidelity, life-like animations of human emotion in videogame characters. We conclude with a discussion of our approach to developing facial and micro-expressions in the First Person Cultural Trainer (FPCT), a 3D game-based simulation sponsored by TRADOC G2 Intelligence Support Activity that trains cross-cultural decision-making. Our strategy is based on five criteria that we consider essential to developing cultural, psychological and behavioral complexity within virtual humans and synthetic societies. The criteria include: the need for composable training scenarios that are not dependent on actor performance; the ability to develop our modules for use across multiple technology platforms; the ability to vary the appropriate level of visual fidelity depending upon training objectives; the ability to provide in-game and after-action training feedback and assessments; and the goal to develop end-user tools for subject matter experts (SMEs). Whereas new technology can provide hyper-realistic facial and micro-expressions, we have opted for a strategy that trades off some visual realism to support composability and our other objectives. The next sections provide a discussion of facial and micro-expressions and their connection to communicating emotion and intent in virtual characters and synthetic environments.

SIGNIFICANCE AND CHARACTERISTICS OF NONVERBAL COMMUNICATION TRAINING

Many researchers posit that the majority of communication between people is nonverbal. Nonverbal communication encompasses a variety of different modalities. Haptics, or haptic touch, is one means. Posture is another. Finally, facial expression and movement, and eye gaze, are critical to understanding nonverbal communication. Although a great deal of nonverbal communication is based on idiosyncratic and sometimes arbitrary symbols which can vary from region to region or culture to culture, the vast majority of such communication is remarkably universal. Charles Darwin was perhaps the first to recognize this quality in his path-breaking work, The Expression of the Emotions in Man and Animals (Darwin, 1872). In more recent times, the work of Paul Ekman is generally cited as contemporary evidence of the universality of facial expressions. Ekman links facial expressions – and specifically micro-expressions – to six universally emergent and well-established basic emotions: anger, fear, sadness, enjoyment, disgust and surprise (Sabini and Silver, 2005). There are other candidate basic emotions, like excitement, guilt and jealousy, but only six well-established ones (Sabini and Silver, 2005).

Nonverbal communication is particularly relevant in high-velocity, time-pressured situations imbued with threat, such as today's irregular warfare environment. Warfighters, much like police officers, must increasingly develop perceptual shorthand to reveal emergent threats from non-threats and to assist in determining direct, effective action. This perceptual shorthand is phenomenological and relies heavily, if not almost exclusively, on nonverbal appraisals. Phenomenology essentially is a study of how people typify behavior through behavior pattern analysis. The cues investigated may be interactional, but they can be measured and quantified and then assessed in dynamic environments. Although phenomenological interactionism is considered by some to be subjective, the ability to quantify things like body lean, arm angle, eye gaze duration, blinking, feet shuffling, arm swinging, degree of physical co-presence, and bodily stance makes it abundantly empirical. Our strategy in FPCT is to create a phenomenological training environment through the use of high-fidelity NPC facial and micro-expressions, as informed by the work of Ekman, and through environmental representations of invariants and affordances as defined above.

MICRO-EXPRESSIONS: WINDOW TO EMOTION

As shown in Figure 3, micro-expressions are particularly informative in interpreting nonverbal communication, but also challenging to model in synthetic environments. One reason for this is that micro-expressions are strongly linked to emotion. As stated above, Ekman links micro-expressions to six universally human and well-established basic emotions – anger, fear, sadness, enjoyment, disgust and surprise (Sabini and Silver, 2005).


Ekman argues that the six core emotions meet a number of criteria, including distinctive presentation, appearance in other primates, and being unbidden, or arising without conscious effort (Sabini and Silver, 2005). Ekman also asserts that each basic emotion is surrounded by a "family" of closely related ones (Ekman, 1999). For example, sadness might be related to states like feeling depressed or sullen. The six established emotions have unique micro-expressions associated with them. These expressions typically occur rapidly and are brief in duration. These expressions at times are blended to express multiple or conflicting emotions. Some work has been done exploring how simulated emotion might be translated into micro-expressions in a virtual human (Schaap, 2009), though we observe the research does not account for the influence of culture and context.

Any approach to understanding or simulating micro-expressions must represent two types of emotion: felt emotion, which is experienced by the individual, and expressed emotion, which can be affected by personality, culture or context (Ekman and Keltner, 2003). This two-dimensional definition is especially useful when one notes that emotional expressions can be masked or faked to some degree, such as during an attempt to deceive another, or blended when a person is in an emotionally conflicted state (Ochs, Niewiadomski, Pelachaud and Sadek, 2005). When the nonverbal indicators conflict with verbal communication, this might indicate deception. The literature on "deception clues" is rich and varied. The conventional wisdom that emerges from this research is that verbal cues may profess truth, but nonverbal cues can cause deception to "leak." Liars who attempt to compensate for the leakage may make their deception more glaring. Although there is debate in the literature as to which nonverbal movements indicate leakage of deception, researchers have concluded that distinctions can be made, they are empirical, and they can be quantified with appropriate data collection techniques (Vrij, 2008).

Figure 3. While at first glance the images above may all appear to be the same, if one looks closely, each face reveals a slightly different micro-expression related to emotion.

Our strategy with micro-expressions is to communicate a greater degree of nuanced emotion and deception, and our development takes into consideration three major physiological factors:

• Proxemics: how our NPCs manipulate, navigate and perceive the space around them. Space in nonverbal communication can be disaggregated into four distinct components: intimate space, personal space, social space and public space.


• Kinesics: the use of posture, gesture, stance and movement in communication. Posture, for example, can signal a multitude of possible interactions and can evince levels of hostility or receptiveness to communication. Factors such as the angle of body lean, the degree of bodily openness to a speaker, and the configuration of fingers, hands and arms are all potentially relevant in this regard. Because these signals are nonverbal, they are communicated instantaneously – which is precisely why nonverbal communication is so efficient in conveying real-time information.

• Oculesics: Eyes have long been considered windows to the soul, and much can be intuited about inner motives from the study of eyes. Variation in the extent, nature, frequency, intensity and duration of eye contact, for example, betrays everything from indifference and disinterest to fascination and hostility.

The facial and micro-expressions of our virtual agents reflect emotions that are elicited by three forms of stimuli: cultural rules, interactions with the player and other NPCs, and invariants and affordances in the synthetic environment. The next section discusses advantages and disadvantages of the state-of-the-art technology which focuses on enhancing the realism of both facial and micro-expressions. We follow with a discussion of our strategy.

NEW LEVELS OF REALISM

Overall, motion capture systems continue to improve in their capability to capture the subtle facial and body gestures that are critical to generating micro-expressions and the corresponding synthetic environments. However, in the past few years, several new facial animation approaches have been developed to generate accurate facial animation data for virtual characters. In particular, two new systems focus on using video capture of actor performances to produce realistic facial animations: MotionScan technology, developed by Depth Analysis for Rockstar's new videogame L.A. Noire, and Image Metrics' Faceware software and pipeline. This section discusses the basics of widely used traditional optical motion capture methodology and these two new state-of-the-art approaches.

Optical Motion Capture Systems – The Basics

Optical motion capture is currently the most widely used facial motion capture solution for games and movies. The ability of the system to create a complex facial marker set on an actor allows for accurate facial capture of a given performance. The cameras in optical motion capture systems continue to improve in their capability to capture subtle facial and body gestures. A leader in the motion-capture industry, Vicon, utilizes cameras that differentiate between high and low contrast on a gray scale. LED lights are strobed onto the capture area, where reflective markers placed on the motion capture actor reflect the light back. The camera captures the reflectivity of the markers in order to give positional data to the computer. This allows for the creation of a 3D scene containing the motion of the actor. At this time, the newest motion capture camera that Vicon has available is the T160 series. The camera has a 16-megapixel resolution, which allows the smaller facial markers to be seen from a traditional wall-mounted camera setup. Newer Vicon cameras have made improvements over older models by increasing the pixel resolution, allowing facial markers to be seen from a distance (Vicon T-Series, n.d., para. 1-12). One disadvantage of optical motion capture systems is the large amount of processing that must be done after capture in order to place the data onto a 3D character. This data cleaning invariably removes some of the original performance, replacing it with an animator's interpretation.

MotionScan

As mentioned above, MotionScan technology was developed by Depth Analysis for the new videogame L.A. Noire. MotionScan technology uses an array of cameras positioned around an actor's head to capture a performance from multiple angles simultaneously. With multiple angles and cameras, a full 3D mesh can be created from the live-action performance. This process allows MotionScan technology to turn a real-life performance into a virtual one with one-to-one accuracy. No markers, reflective paint or any other form of makeup, as found in traditional optical motion capture systems, need to be applied to the actor. The actor simply sits down in a chair and gives a performance, which is captured for use in the eventual game development (Richardson, 2011). Figure 4 below is indicative of the type of production possible with MotionScan technology.


Figure 4. MotionScan, which was developed for use in the development of the L.A. Noire videogame, streams video of an actor's performance onto the face of the virtual character. The image above illustrates an in-game scene developed with this technology.

The process is fairly simple. The data that is collected by the capture does not need to be cleaned by an animator and is therefore not altered as occurs in traditional motion capture. However, with current MotionScan technology, an actor's performance cannot be retargeted onto a different character. The performance can only be used on the mesh created by the capture session. This means that each capture can only be used for a single character. It also means that the character in game will look almost exactly like the actor giving the performance (Richardson, 2011).

MotionScan textures are streamed onto the virtual character's face as video, which requires considerable hardware processing speed. In an interview, developers explained that, to optimize rendering of the video, no more than three virtual characters in L.A. Noire can be talking at the same time. To accommodate this, other virtual characters face away from the camera and do not talk (Richardson, 2011).

Image Metrics

Image Metrics technology has been used in numerous commercial videogames, such as Grand Theft Auto IV, Assassin's Creed II, NBA 2K11 and Halo Reach. As illustrated in Figure 5, Image Metrics uses a simple capture system using video from any camera. This simplifies the capture process and allows it to be done almost anywhere. The results of Image Metrics animations are accurate to the performance given on the video (CGSociety Feature Article, n.d., para. 1-13). Image Metrics' process does not require the rigs to be similar. Any performance can be transferred/retargeted to any type of rig or character. The rigs can be full blend shapes, bones or a combination of both. Image Metrics' process is also faster than the traditional key-frame approach in terms of time spent both animating and smoothing out transitions between the extreme animation points in an actor's performance (FaceWare, n.d., para. 1-2).

Figure 5. Image Metrics software creates facial expressions from video of an actor's performance. The software does not require facial markers such as those used in traditional motion capture; rather, the performance is reproduced from video.

Unlike the MotionScan technology, the quality of Image Metrics' final animation depends largely on the skill of the animators. This is because the process for Image Metrics requires animators to create poses using facial rigs that match the most extreme gestures performed by the actor in the reference video. This process requires reference video to be sent to Image Metrics for in-house processing before a final animation is returned. Slight tweaks to the poses that the artist sets may be needed to influence the entire performance. But video or poses do not need to be sent back to Image Metrics for iterations (CGSociety Feature Article, n.d., para. 1-13).

Facial capture technology that incorporates video, such as that of MotionScan or Image Metrics, produces a great degree of visual realism, and Image Metrics' technology does allow artists to change the performance to their needs. We believe, however, that our blend shapes approach, discussed in the next section, supports our development goals, including composability and flexibility, while still producing realistic facial and micro-expressions.


Our Approach: Blend Shapes

Gameplay in FPCT focuses on the effects of player interactions on the emotions of NPCs. Conversation, gestures and events can influence NPC emotions positively or negatively, which in turn influence NPC mood and cooperation levels. The player's goal is to establish trust through culturally correct interactions, thereby enhancing the player's ability to discover mission-critical information, known as golden nuggets.

As mentioned earlier, we have approached our development with five major criteria in mind. First, our developer tools have been designed to allow a high degree of composability to dynamically generate variable cultures and subcultures, each with its own variable sets of visual representations of emotion. We want the capability to generate culturally accurate micro-expressions in multiple NPCs, as well as invariants and affordances, for any type of environment. Second, our development is designed to be platform independent, meaning that it is not tied to any one videogame system or engine. This requirement might restrict use of resource-intensive technology which might not work optimally on mobile platforms, for example. Third, our development is being designed to accommodate varying levels of 3D fidelity, depending upon the training objective. For example, if the training objective is to analyze facial and body gestures of a culturally accurate NPC for signs of deception, then NPC micro-expressions must achieve a level of realism that transfers the knowledge. Fourth, in-game feedback, or during-action review, is an important aspect of our training simulation. Therefore, the player must be able to receive feedback based on the information communicated through all the criteria set forth – facial and micro-expressions, invariants, affordances and emotional flow. Last, our development philosophy includes a goal for end-user tools for SMEs so they may design immersive virtual scenarios based on their real-world experiences.

Given these criteria, we are developing facial and micro-expressions using blend shapes. Blend shapes are deformations on 3D models that are created from the topology of an already existing mesh with the intention of creating a new representation. This is achieved by manipulating the vertices on a duplicate mesh into the desired shape. The new representation can be blended between the original shape and the newly created model linearly, allowing morphing between the two models to any degree.

In general, the advantages of blend shapes are their ability to be hand-crafted by artists. Their flexibility lies in the fact that they are created by vertex manipulation, meaning that the complexity of the blend shape is only limited by the number of vertices on the mesh and the ability of the artist creating the shape. They are also widely supported by most game engines. In FPCT, blend shapes are advantageous because they allow us to break expressions into multiple pieces. This gives us the flexibility to create expressions during run time instead of having to have a prerecorded performance for all interactions that take place in the game. This methodology also allows us to have high-fidelity expressions, although they may be slower to create than in the MotionScan or Image Metrics process. In general, blend shapes give us the flexibility we need to create a character that can react to the environment dynamically instead of through a scripted scenario.

The blend shapes approach also allows us to separate emotional representations into upper and lower facial expressions. As illustrated in Figure 6, this allows us to visually represent a conflict between felt, or elicited, emotions and expressed emotions, which are influenced by a virtual agent's goals, personality and culture. Such a conflict in emotions could represent, for example, an attempt to mask a felt versus an expressed emotion, as in deception. The design of our emotional model draws on Ekman's concepts that the upper face is harder to control. Thus, the upper face shows more felt emotion, while the lower face displays more expressed emotion.

Figure 6. Using blend shapes, we are able to represent blended micro-expressions, such as a combination of a felt emotion like sadness – expressed in the upper part of the face – with an expressed emotion such as happiness – expressed in the lower portion of the face.
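To make the mechanics concrete, the sketch below illustrates the linear vertex interpolation described above, together with the upper/lower split used to composite felt and expressed emotion. It is a minimal illustration in Python, not FPCT's implementation: the four-vertex "mesh," the shape names and all numeric values are invented for the example.

```python
import numpy as np

# Hypothetical four-vertex face "mesh" (rows: brow, eyelid, mouth
# corner, chin); a production face mesh carries thousands of vertices
# in the same (N, 3) layout.
NEUTRAL = np.array([
    [0.0, 1.0, 0.0],   # brow         (upper face)
    [0.0, 0.8, 0.1],   # eyelid       (upper face)
    [0.3, 0.2, 0.1],   # mouth corner (lower face)
    [0.0, 0.0, 0.0],   # chin         (lower face)
])

# A blend shape is the same topology with vertices sculpted into a new
# pose; these target meshes use invented per-vertex deltas.
SHAPES = {
    "sadness": NEUTRAL + np.array([[0.0, -0.10, 0.0], [0.0, -0.05, 0.0],
                                   [0.0, -0.08, 0.0], [0.0,  0.00, 0.0]]),
    "happiness": NEUTRAL + np.array([[0.0, 0.02, 0.0], [0.0, 0.00, 0.0],
                                     [0.1, 0.10, 0.0], [0.0, 0.02, 0.0]]),
}

def blend(neutral, shapes, weights):
    """Linear morph: a weight of 0.0 leaves the neutral face, 1.0
    applies the full sculpted shape, and intermediate weights morph
    between the two models to any degree."""
    out = neutral.copy()
    for name, w in weights.items():
        out += w * (shapes[name] - neutral)
    return out

# Per-vertex region masks split the face so a felt emotion can drive
# the harder-to-control upper face while an expressed emotion drives
# the lower face, e.g., masked sadness beneath a social smile.
UPPER = np.array([[1.0], [1.0], [0.0], [0.0]])
LOWER = 1.0 - UPPER

def masked_blend(neutral, shapes, felt, expressed):
    """Composite felt (upper-face) and expressed (lower-face) weights."""
    return (UPPER * blend(neutral, shapes, felt)
            + LOWER * blend(neutral, shapes, expressed))

# A deceptive blend: strong felt sadness above, a moderate smile below.
face = masked_blend(NEUTRAL, SHAPES, {"sadness": 0.8}, {"happiness": 0.5})
```

Because the weights are plain scalars evaluated every frame, a run-time system such as the emotional flow model described in the next section can drive them dynamically rather than replaying a prerecorded performance.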


EMOTIONAL FLOW

Blend shapes require no bones, instead using the changes in vertices on the actual mesh of the 3D character to create different expressions on the face. This allows for greater artist, and ultimately SME, control over what expressions look like. With this flexibility, FPCT's NPCs can display complex expressions that portray multiple emotions, as well as do full transitions from one emotion to another. This allows for the simulation of quick flashes of unintentional emotion as well as the ability for our NPCs to transition dynamically in-game from one emotional equilibrium to another in response to varying intensities of stimuli. We describe this as visual emotional flow, which, instead of being attached to a pre-recorded performance, is generated real-time during gameplay. We frame the explanation of our concept for emotional flow using the acoustical concepts of resonance, damping, and dynamic profile.

The emotional flow of NPCs is an oscillating system. The temporal manifestation of the emotional flow is based on variations around a state of equilibrium, which, instead of being fixed and attached to a pre-recorded set of snapshots, is generated real-time during gameplay. For example, the visual display of happiness does not consist of pasting a grin over the character's mouth, but in organizing the flow of muscular events materialized by blend shapes according to the character's actions and responses to the stimuli from the environment. This oscillating system responds to stimuli in two ways: by resonating to them or by damping them. The system is similar to an acoustic resonator: when the stimuli are in sympathy with the state of the system, the system integrates and transfers more of the energy from the stimuli and produces a greater response to the stimuli. Conversely, the system damps the energy of the stimuli by not transferring it.

As shown in Figure 7, to design our emotional flow model integrating the display of micro-expressions, we reference the acoustical model of a dynamic profile known as the ADSR envelope: attack, decay, sustain, and release. ADSR is a model used in sound synthesis to replicate the dynamic profile of the sounds of natural musical instruments. This model represents the four phases of the response of the acoustic oscillator, i.e., the musical instrument, to a mechanical stimulus. The "attack" represents the initial energetic input: depending on the type of instrument, it can be sharp, explosive, progressive, or continuous. The "decay" expresses a partial drop in intensity from the peak and represents the integration of the energy brought by the attack to the body of the instrument. The "sustain" phase corresponds to the main body of the sound after the transient "attack-decay" sequence and represents the resonant mode of the instrument. The "release" represents the dissipation of the energy and the end of the resonating mode. The release expresses the transition back to immobility, or in our emotional flow model, the transition to a new emotional equilibrium for the virtual character. We believe this model provides a theoretical foundation for programming variable blend shapes into micro-expressions that can be altered by time and intensity and reflect the relationship between the stimuli and the current state of emotion, or flow of emotion, and its visual display.

Figure 7. This is an example of an ADSR envelope – attack, decay, sustain and release. We use the ADSR envelope as a model for the display of micro-expressions. Time t = 0 to t = 1 represents a person's emotional equilibrium. Time t = 1 to t = 2 represents the attack, a rapid increase in the intensity of an emotion over a short period of time in response to stimuli, such as the death of a loved one. Time t = 2 to t = 3 represents the decay, as emotion wanes from its peak intensity. Time t = 3 to t = 4 represents sustain, a flattening of emotion that is higher in intensity than the beginning emotional equilibrium. This represents how long an individual sustains the higher emotional intensity. Finally, time t = 4 to t = 5 represents release, which is the natural decay of intensity over time.
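The sketch below shows one way the ADSR profile and the resonance/damping response could drive a blend shape weight over time. It is a hedged illustration rather than FPCT's code: the paper specifies the four phases but not their exact curves, so piecewise-linear attack and decay and an exponential release are assumed here, and every constant is invented. Compressing the whole envelope into roughly 1/25 to 1/5 of a second would produce the brief flash of a micro-expression rather than a full expression.

```python
import math

def adsr_intensity(t, equilibrium=0.2, peak=1.0, sustain=0.6,
                   attack=0.1, decay=0.4, hold=1.5, release=2.0):
    """Dynamic profile of one emotion's intensity after a stimulus at
    t = 0, following Figure 7: equilibrium, attack (rapid rise to the
    peak), decay (partial drop from the peak), sustain (an elevated
    plateau), then release (dissipation toward a new equilibrium).
    All times are in seconds."""
    if t <= 0.0:
        return equilibrium                                  # at rest
    if t < attack:                                          # attack
        return equilibrium + (peak - equilibrium) * t / attack
    t -= attack
    if t < decay:                                           # decay
        return peak - (peak - sustain) * t / decay
    t -= decay
    if t < hold:                                            # sustain
        return sustain
    t -= hold                                               # release
    return equilibrium + (sustain - equilibrium) * math.exp(-3.0 * t / release)

def transferred_energy(stimulus_intensity, sympathy):
    """Resonance vs. damping: sympathy in [0, 1] expresses how much a
    stimulus is 'in sympathy' with the NPC's current emotional state.
    High sympathy resonates and transfers most of the stimulus energy;
    low sympathy damps it by transferring little."""
    return stimulus_intensity * sympathy

# A frightening stimulus strikes an already anxious NPC, so the fear
# channel resonates; the resulting peak shapes the envelope that drives
# the "fear" blend shape weight frame by frame.
peak = 0.2 + transferred_energy(stimulus_intensity=1.0, sympathy=0.9)
for frame in range(300):                    # five seconds at 60 fps
    t = frame / 60.0
    fear_weight = adsr_intensity(t, peak=min(peak, 1.0))
    # fear_weight would feed the blend() sketch from the previous section.
```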


A FINAL COMMENT ON THE IMPORTANCE OF INVARIANTS AND AFFORDANCES FOR PORTRAYING FACIAL AND MICRO-EXPRESSIONS IN SYNTHETIC ENVIRONMENTS

Today's game engines are capable of handling hundreds of thousands of polygons, making them capable of producing virtual characters and environments that almost mirror real-world counterparts. However, we believe fidelity in virtual humans is not defined solely by the visual quality of a re-created human form. Fidelity is also an issue of understanding the foundational principles that have defined character animation for more than a century. The principles defined by Disney animation draw from traditional theatre and remain essential for creating virtual characters that grasp an audience's attention and evoke emotional responses (Thomas, 1981). The idea that emotion is expressed not only from characters themselves, but also from their environment, goes back to the very beginning of early animation.

Disney artists developed a system of guidelines for capturing motion, and although most of these guidelines apply to traditional 2D animation, there are two that are particularly valid when attempting to increase the fidelity of 3D animation as well. The first is anticipation. This principle involves creating counter motions that allow a viewer to follow or anticipate the action of a character on screen. Anticipation allows the viewer to remain engaged while alerting the viewer to actions that are about to take place. Anticipation is not always obvious and may not always reveal a character's intentions; rather, it is a device used to inform the viewer. The "illusion of life" concept identified by Disney animators states that "anticipatory moves may not show why [a character] is doing something, but there is no question about what he is doing" (Thomas, 1981). Anticipatory motions serve to counterbalance the primary action taken by an on-screen actor. For example, before a pitcher releases a ball, he winds up, alerting the audience to his intent to throw. If the same scenario were recreated in a game environment without the anticipatory motion, the viewer would lose the information required to remain engaged with the character's action.

Another Disney 2D principle that remains relevant in 3D animation is exaggeration. Disney animators found that no matter how accurately they reproduced an animation, it seemed to lack the essence of the action. This essence is a key element required for engaging the viewer – making them feel what is about to take place on screen (Thomas, 1981). This became more of a problem as animations became more subtle and nuanced, as we are finding with our work in FPCT. Nuanced animations are possible with current technologies, although there is still a need to push or exaggerate character animations for them to read accurately. This exaggeration is not a distortion of shape or visual elements but an emphasis on a particular expression. Even when used in virtual simulations, virtual characters must be thought of as actors. Increasing the visual fidelity of a character helps the player identify specific features of that character. However, creating motion that is readable is what allows the player to connect with that character. These two principles are particularly significant to the expression of emotion through facial gestures and to events occurring in the virtual environment as a means of transferring information.
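As a rough illustration of how these two principles might map onto the blend weight machinery sketched earlier, the snippet below treats exaggeration as pushing weights past their realistic values and anticipation as a brief counter-pose played before the main action. Both mappings and all constants are invented assumptions for the example, not a documented FPCT feature.

```python
def exaggerate(weights, factor=1.25, cap=1.0):
    """Exaggeration: push blend shape weights beyond their captured
    values so a subtle, nuanced expression still 'reads' at gameplay
    camera distances; an emphasis on the expression, not a distortion
    of the shape."""
    return {name: min(w * factor, cap) for name, w in weights.items()}

def with_anticipation(weights, counter=0.15):
    """Anticipation: prepend a slight counter-pose (a small move away
    from the main action, like a pitcher's wind-up) so the viewer can
    follow the expression change that is about to occur."""
    counter_pose = {name: -counter * w for name, w in weights.items()}
    return [counter_pose, weights]       # play wind-up, then the action

# Example: a subtle surprise pose, exaggerated and led by a wind-up.
frames = with_anticipation(exaggerate({"surprise": 0.4}))
```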

SUMMARY

Today's irregular warfare environment requires greater training for interaction with multi-cultural, non-combatant populations. We contend that game-based simulations designed to meet these training needs must have tools for composing a wide range of expressions that offer the appropriate degree of animation and visual fidelity necessary to display nuanced nonverbal information. We contend this nuanced nonverbal information should be displayed in virtual characters, through micro-expressions, and in the environment, through invariants and affordances. We also contend that micro-expressions can be combined and blended to create life-like transitions of emotional equilibrium. We call this emotional flow. Our approach to generating such high-fidelity visual representations is through the use of blend shapes. Although new commercial software allows game developers to stream video capture of an actor's performance onto the face of a 3D model, producing extremely realistic animations of human emotion, we believe our approach provides greater flexibility and composability. We base our development strategy on several criteria that we consider essential to developing cultural, psychological and behavioral complexity in synthetic worlds. Our criteria also include the capability to develop training scenarios for use across multiple technology platforms; the ability to apply the appropriate level of visual fidelity depending upon the training objectives; the ability to provide in-game and after-action training feedback and assessments; and ultimately, end-user functionality for SMEs.

ACKNOWLEDGEMENTS

This project is made possible by a contract from TRADOC G2 Intelligence Support Activity, and we appreciate their ongoing support. We also thank our colleagues at the University of Texas at Dallas for their contributions to this research, particularly Michael Kaiser, Jay Laughlin, Steven Michael Engel Craven, Souphia Ieng, Tim Lewis, Jeremy Johnston, Adam Chandler, Brad Hennigan, Brian Powyszynski, Daniel Ries and all the members of the First Person Cultural Trainer team. We also would like to acknowledge the support of Dr. Thomas Linehan, director of the Institute for Interactive Arts and Engineering at the University of Texas at Dallas.

REFERENCES

Aiken, Jim. (2004). Power Tools for Synthesizer Programming. San Francisco, CA: Backbeat Books. ISBN: 0879307730.

CGSociety: Society of Digital Artists Feature Article. (n.d.). "An Image is worth a thousand words, but the face says them all. CGSociety explores Image Metrics." Retrieved June 16, 2011, from http://features.cgsociety.org/story_custom.php?story_id=4480

Darwin, Charles. (1872). The Expressions of the Emotions in Man and Animals. London: J. Murray.

Davison, John. (2010). "The Most Impressive Thing I Saw at E3." Retrieved June 16, 2011, from http://www.gamepro.com/article/news/215667/the-most-impressive-thing-i-saw-at-e3/

Ekman, Paul. (1999). "Chapter 3: Basic Emotions." In T. Dalgleish and M. Power (Eds.), Handbook of Cognition and Emotion. Sussex, U.K.: John Wiley & Sons. Retrieved June 13, 2011, from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.1143&rep=rep1&type=pdf

Ekman, Paul, and Keltner, Dacher. (2003). "Introduction: Expression of Emotion." In R.J. Davidson, K.R. Scherer, and H.H. Goldsmith (Eds.), Handbook of Affective Sciences (pp. 411-14). New York: Oxford UP. Retrieved June 13, 2011, from http://www.paulekman.com/wp-content/uploads/2009/02/Intoduction-Expression-Of-Emotion.pdf

FaceWare: Facial Animation Software. (n.d.). Retrieved June 16, 2011, from http://www.image-metrics.com/pdf/FaceWare%20Overview.pdf

Kong, Jennie. (2010). Groundbreaking New MotionScan Technology set to Redefine 3D CGI Performances. Press announcement retrieved June 16, 2011, from http://depthanalysis.com/downloads/pr/depth_analysis_announcement_20100301.pdf

L.A. Noire Trailer – Behind the Scenes. (n.d.). Retrieved June 16, 2011, from http://www.youtube.com/watch?v=ZY7RYCsE9KQ

Michaels, Claire F., and Carello, Claudia. (1981). Direct Perception. Prentice-Hall, Inc. P. 42.

Ochs, Magalie, Niewiadomski, Radoslaw, Pelachaud, Catherine, and Sadek, David. (2005). "Intelligent Expressions of Emotions." Affective Computing and Intelligent Interaction (Vol. 3784, pp. 707-14). Springer. Lecture Notes in Computer Science. Retrieved May 9, 2011, from http://www.springerlink.com/content/7007280gtq412j0h/

Porter, Stephen, and Brinke, Leanne. (2008). "Reading between the Lies: Identifying Concealed and Falsified Emotions in Universal Facial Expressions." Psychological Science, 19(5), 508-14. Retrieved June 22, 2011, from EBSCO.

Richardson, Stuart. (2011). The art of lying: LA Noire. Retrieved June 16, 2011, from http://www.develop-online.net/features/1166/The-art-of-lying-LA-Noire

Russell, James A., and Fernández-Dols, José Miguel. (1997). The Psychology of Facial Expression. Cambridge: Cambridge UP.

Sabini, John, and Silver, Maury. (2005). "Ekman's Basic Emotions: Why Not Love and Jealousy?" Cognition and Emotion, 19(5), 693-712. Retrieved March 24, 2011, from http://ivizlab.sfu.ca/arya/Papers/Others/Emotions/Ekman%20-%20Why%20Not%20Love%20and%20Jealousy.pdf

Schaap, Robert. (2009). "Enhancing Video Game Characters with Emotion, Mood and Personality." Thesis, Delft University of Technology. Retrieved March 9, 2011, from http://graphics.tudelft.nl/Game_Technology/Schaap?action=AttachFile&do=get&target=Thesis_Schaap.pdf

Schutz, Alfred. (1967). The Phenomenology of the Social World. Evanston, IL: Northwestern University Press. Originally published in German as Der sinnhafte Aufbau der sozialen Welt, © 1932 by Julius Springer, Vienna; © 1960 by Springer-Verlag, Vienna. English translation © 1967 by Northwestern University Press.

Thomas, Frank. (1981). Disney Animation: The Illusion of Life. New York: Abbeville.

Vicon T-Series: The world's next generation motion capture camera. (n.d.). PDF retrieved from http://www.vicon.com/products/documents/TSeriesSedition.pdf

Vrij, Aldert. (2008). Detecting Lies and Deceit: Pitfalls and Opportunities. Chichester, United Kingdom: Wiley.
Disney Animation: The Illusion Of content/uploads/2009/02/Intoduction-Expression-Of- Life. New York: Abbeville, 1981. Print. Emotion.pdf. Vicon T-Series: The world’s next generation motion FaceWare. Facial Animation Software. (n.d.) Retrieved capture camera. (n.d.) PDF retrieved from website: June 16, 2011, from http://www.image-metrics.com/ http://www.vicon.com/products/documents/ pdf/FaceWare%20Overview.pdf TSeriesSedition.pdf Kong, Jennie. (2010) Groundbreaking New MotionScan Vrij, Aldert. (2008) Detecting lies and deceit: Pitfalls Technology set to Redefine 3D CGI Performances. and opportunities. Chichester, United Kingdom: Press announcement retrieved June 16, 2011, from Wiley. http://depthanalysis.com/downloads/pr/depth_analysis_ announcement_20100301.pdf L.A. Noire Trailer – Behind the Scenes. Retrieved June 16, 2011, from http://www.youtube.com/ watch?v=ZY7RYCsE9KQ Michaels, Claire F., and Carello, Claudia. (1981) Direct