Representing People in Virtual Environments

Will Steptoe
17th November 2009

INTRODUCTION
What’s in this lecture?

First Hour: Overview and Applications - State-of-Art, Social Agency, Human Behaviour, Realism, Applications, Agency (Agents and Avatars), CVEs, Avatar Control.

Second Hour: Technical Aspects and Demonstration - Graphics, Animation, Behaviour - Application in 3DSMax

State-of-Art

Real-Time: Heavy Rain (Quantic Dream, 2009)
Pre-Rendered: The Curious Case of Benjamin Button (David Fincher, 2008)

Virtual Humans

• Complex problem of technical and human factors. • Generating subtleties of human behaviour is a problem beyond raw computing power. The more real they look, the more real we expect them to behave. • To generate completely realistic characters we have to completely understand human perception in reality! • ... but why are we so sensitive to minor defects in virtual humans? SOCIAL AGENCY Social Agency and the ELIZA effect

• People generally require minimal encouragement to view computer systems and applications as social agents, reading far more understanding than is warranted from symbols and graphical displays. “Individuals mindlessly apply social rules and expectations to computers” – Nass and Moon, 2000.

• This was unexpectedly observed, and first documented, by Weizenbaum (1966) when performing user studies with ELIZA - a computer program for the study of natural language communication between man and machine. SOCIAL AGENCY Social Agency and the ELIZA effect

• During the purely text-based interactions between participants and the system, ELIZA simulated a Rogerian psychotherapist by rephrasing input statements from the user and returning them as questions (e.g. “I’m feeling depressed” -> “Why do you think you are feeling depressed?”).
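The rephrasing idea can be illustrated with a very small sketch. This is not Weizenbaum's script: the `REFLECTIONS` table, the single "I'm ..." pattern, and the fallback question are assumptions chosen only to reproduce the example above.

```python
import re

# Hypothetical first-person -> second-person swaps (not ELIZA's real rule set).
REFLECTIONS = {"i": "you", "i'm": "you are", "my": "your", "am": "are", "me": "you"}


def reflect(text):
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in text.lower().split())


def respond(statement):
    """Return the user's statement rephrased as a question."""
    cleaned = statement.strip().rstrip(".!?").replace("’", "'")
    match = re.match(r"i(?:'m| am) (.*)", cleaned, re.IGNORECASE)
    if match:
        return f"Why do you think you are {reflect(match.group(1))}?"
    # Fallback when the statement does not start with "I'm" / "I am".
    return f"Why do you say that {reflect(cleaned)}?"
```

Calling `respond("I’m feeling depressed")` returns the question quoted in the slide. No understanding is involved, which is the point of the ELIZA effect.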

• Weizenbaum observed many examples of people becoming emotionally engaged when ‘communicating’ with ELIZA, and some even asked to be left alone with the system. SOCIAL AGENCY Social Agency and the ELIZA effect

• This phenomenon has become known as the ‘ELIZA effect’, and may be considered a precursor to many observations found in the VE literature concerning presence (place illusion) and copresence.

• People are particularly responsive to depictions of humans. SOCIAL AGENCY The Fear of public speaking

• David • Not very comfortable with public speaking • Asked to speak about his favourite subject: cables • Behaviours triggered at appropriate intervals

Pertaub, D.-P., Slater, M., and Barker, C. (2002). An experiment on public speaking anxiety in response to three different types of virtual audience. Presence: Teleoperators and Virtual Environments, 11(1): 68-78 SOCIAL AGENCY The Fear of public speaking

• The user was asked to give a presentation three times – Positive, Negative and Mixed

• Positive - agents smiled, leaned forward, faced the user, maintained gaze, clapped hands, etc.
• Negative - agents yawned, slumped forward, put feet on the table, avoided eye contact, and finally walked out
• Mixed - agents started off with largely negative responses and gradually turned positive

Realistic responses in VE?

• Individuals' self-rated performance was positively correlated with the perceived good mood of the agents
• Evidence of a negative response especially strong with the negatively inclined audience
– Sweating and stammering
– Vocal protests at the agent behaviours

• Virtual humans with minimal behavioural-visual fidelity can elicit significant user responses • End Goal: Virtual humans with high visual fidelity that mimic real-life context-appropriate behaviours HUMAN BEHAVIOUR Categories of behavioural cues

Argyle, M. (1998). Bodily Communication. Methuen & Co Ltd, second edition.

• Vocal properties – Tone, Pitch, Loudness… • Facial expressions – The most studied behavioural cue due to its role in communication • Gaze behaviour – Probably the most intense social signaller • Kinesics: Posture and Motion – Numerous gestures depending on culture for instance • Proxemics – Culture and gender dependent HUMAN BEHAVIOUR

Facial expression

• In reality, 20000 facial expressions exist • Normally animated by blending “Morph Targets” • Different granularities of facial expression – Facial action parameters (most basic units) • Basic emotions – Phonemes (mouth shapes for lip-sync) – Principal component analysis HUMAN BEHAVIOUR

Gesture

• Normally animated by choosing from a library of gestures • Very closely associated with speech – Also back channel gestures by listeners (e.g. head nod) • Different types of gesture – E.g. beat, iconic • Again see Cassell’s work referenced earlier HUMAN BEHAVIOUR

Posture

Coulson, M. (2004). Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior, 28(2):117–139.

• Over 1000 stable postures have been observed • Normally animated by choosing from (or blending between) a library of postures • Associated with attitude and emotion • Associated also with interpersonal attitude HUMAN BEHAVIOUR

Measuring Success

• So the careful design of behaviour is important but there are caveats • Success of a VE is measured in terms of the extent to which sensory data projected within a virtual environment replaces the sensory data from the physical world – quantified by rating the individuals’ sense of presence during the experience • For Virtual Humans: Success is taken as the extent to which participants act and respond to the agents as if they were real – Subjective: Questionnaires, Interviews – Objective: Physiological, Behavioural HUMAN BEHAVIOUR

Subjective means

• Traditional methods: Questionnaires and interviews – Various questionnaires exist – http://www.presence-research.org • Criticised due to its various dependencies – the individual’s accurate post-hoc recall, – processing and rationalisations of their experience in the VE and – Varying interpretations of the word ‘presence’ HUMAN BEHAVIOUR Objective: Responses to stimuli

• Numerous possible objective measures
– Subconscious responses
  • Threat-related facial cues provoke individuals to use different viewing strategies
– Neural responses
  • Different areas of the brain are activated during +ve, -ve and neutral situations
– Psychological responses
  • Stress and Anxiety in response to threat
– Physiological responses
  • Galvanic Skin Responses, Heart Rate Variability, electrocardiograms, electromyography, Respiratory activity
– Behavioural responses
  • Flight or Fight (based on cognitive appraisal)
• Vary based on cognitive factors, personality, emotional state, gender etc.
– How do we interpret the data and results?

REALISM
Uncanny Valley

• As the behaviour and representation of robots (and other facsimiles) of humans approaches that of actual humans, it causes a response of revulsion among human observers.

• Theory from the 1970s by roboticist Masahiro Mori
– Controversial: it’s not very rigorous or scientific, and many people don’t believe it
– There are problems but maybe it captures something

The Uncanny Valley

Dreamworks reduced the realism of Princess Fiona (Shrek): “…she was beginning to look too real, and the effect was getting distinctly unpleasant.”

Final Fantasy movie: “…it begins to get grotesque. You start to feel like you're puppeteering a corpse”

Uncanny Valley

• At low levels of realism, the more realistic a character the more people like it. • But when you get almost real then characters start to get disturbing - corpses are used a lot as metaphors • Interestingly, there are two graphs: movement and appearance, movement is more important. REALISM Different Types of Realism

• Visual Realism – What it looks like (pictures, film, games, VE)

• Animation Realism – How it moves, animation (film, games, VE)

• Behavioural Realism – How it responds and interacts (games, VE) REALISM Mismatch in Realism • Maybe the problem is that levels of movement and behavioural realism do not match graphical realism. • This mismatch disturbs us, something that looks human but does not act like a human. • Consistency is important. REALISM Appearance vs. Behaviour

Vinayagamoorthy, V., Garau, M., Steed, A., and Slater, M. (2004b). An eye gaze model for dyadic interaction in an immersive virtual environment: Practice and experience. Computer Graphics Forum, 23(1):1–11.

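Experimental design (appearance × behaviour), pairs per condition:

                        App.: Cartoon-form        App.: Higher-fidelity
Beh.: Random gaze       3 ♂ pairs, 3 ♀ pairs      3 ♂ pairs, 3 ♀ pairs
Beh.: Inferred* gaze    3 ♂ pairs, 3 ♀ pairs      3 ♂ pairs, 3 ♀ pairs

Second panel (labels as given on the slide):

                        App.: Cartoon-form        App.: Higher-fidelity
Beh.: Random gaze       High                      Low
Beh.: Inferred* gaze    Low                       High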

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A., Steed, A., and Sasse, A. M. (2003). The impact of avatar realism and eye gaze control on the perceived quality of communication in a shared immersive virtual environment. In Proceedings of SIGCHI, pages 529–536. REALISM Appearance vs. Behaviour

• Realistic gaze behaviour had a positive impact on the perception of more visually-realistic avatars. • In the case of a lower visually realistic avatar, the more complex gaze model had a negative effect on participant response.

• Important to note that the differences between both the gaze models were very subtle – saccadic velocity and fixation durations. • Analysis demonstrated a very strong interaction effect between the type of avatar and the fidelity of the gaze model. REALISM Realism vs Believability

• The lesson is that we need to be careful with realism for virtual humans

• Often we prefer to use the term “believability” – Not how much a character is objectively like a human – How much we feel it is/respond to it as if it is – Bugs Bunny is very believable

• Photorealism is only one element of believability – But don’t turn into an anti-realism zealot! APPLICATIONS OF VIRTUAL CHARACTERS Characters in Virtual Environments • So far we have talked about how people respond to characters.

• Now we will talk about characters in virtual environments

• Characters are often key to an environment, the primary content

• We are interested in people so populated environments are interesting

Applications of Virtual Characters

Games: Non-player characters are generally there either to be shot or to have more complex interactions with. Player-characters represent the user.
Online Virtual Worlds: Users are represented by avatars – an iconic representation of a human. Interaction via text, voice and nonverbal (scripted animation) means.
Immersive VEs: Users are embodied by avatars – natural body movement is mapped to avatar animation.

Multi-user worlds

• Avatars become much more important in multi-user worlds (the most important feature?)
• They also represent you to other people
• They affect how people perceive you

Multi-user worlds

• Established norms of proxemic and gaze behaviour are preserved in VEs: male-male dyads maintain greater interpersonal distance than female-female dyads, male-male dyads maintain less eye contact than female-female dyads, and decreases in interpersonal distance are compensated with gaze avoidance (Yee 2007).

• Echoes Argyle et al.’s equilibrium theory specifying an inverse relationship between mutual gaze and interpersonal distance APPLICATIONS OF VIRTUAL CHARACTERS

Immersive VR

• In immersive VR systems you can interact with life size, real-time characters that may be agents (autonomous), avatars (other human users) or hybrids. AGENCY

Agency: Avatars and Agents

• Characters in virtual environments fulfill many roles but there are two primary types • Avatars – Representations of you, or other people – User controlled (tracked) • Agents – Others, that you interact with – Computer Controlled • Hybrid – Part tracked, part simulated AGENCY

Agency: Avatars and Agents

• In practice some elements of avatar behaviour are programmed, not tracked
• E.g., breathing and eye blinking at the least
• Ideally can use information about ‘mood’ to determine aspects of avatar behaviour.
• Impossible to track every aspect of the human’s behaviour so much must be inferred and programmed.
• Real avatars are mixed.

[Figure: spectrum from Avatar (tracking), through a mixture of both, to Agent (programming)]

Agency: Avatars and Agents

• For agents the behaviour is completely programmed. • For avatars the behaviour is ideally completely determined by the behaviour of the real tracked human. • In practice the human cannot be fully tracked – typically in VR only head and one hand movements are tracked! AGENCY

Interactive Behaviour

• Key to both roles is the interaction with a character • Composed of two elements, UI and AI • “User interface” – In what ways do we interact with a character? • “Artificial (Augmented) Intelligence” – How does the character respond? – How is it controlled? AGENTS

Agents

• Many different styles of interaction for agents

• Cannon fodder, non-player characters, crowds, complex conversational agents

• Many interactions, shooting, moving, conversation (from dialogue trees to spoken interaction) AGENTS Agents - Game NPC

• UI: – Moving, shooting – Simple conversation • AI: – Finite state machines – Scripts – Path Planning AGENTS
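As an illustration of the finite state machine approach listed above, here is a minimal sketch. The state names (`patrol`, `chase`, `attack`) and event names are hypothetical, chosen for a generic shooter-style NPC rather than taken from any particular game.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("patrol", "player_seen"): "chase",
    ("chase", "player_in_range"): "attack",
    ("chase", "player_lost"): "patrol",
    ("attack", "player_out_of_range"): "chase",
}


class NpcStateMachine:
    """A non-player character whose behaviour is selected by its current state."""

    def __init__(self, state="patrol"):
        self.state = state

    def handle(self, event):
        """Apply an event; events with no matching transition leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Each state would normally select an animation and a movement goal (for example a path-planning target), which is why finite state machines, scripts and path planning tend to appear together in game AI.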

Virtual Humans - Agents

Agents are entirely program controlled rather than representing an on-line human. These are examples from virtual fashion shows. http://www.miralab.unige.ch AGENTS

Agents - Embodied Conversational Agents

• UI: – Speech conversation – Gestures etc. – Tracking data • AI: – Complex conversational – AI methods AGENCY

Inferring Behaviour: Animation imitating life

Lasseter, J. (1987). Principles of traditional animation applied to 3D computer animation. ACM SIGGRAPH Computer Graphics, 21(4):35–44.

• Emotional models – Controllers of behaviour in accordance to internal states • Personality models – Creating unique identities • Conversation-feedback models – Controlling behaviour • Social models – Interpersonal relationships and attitudes • ??? AVATARS Avatars

• Your embodiment in the VE • A vital part of shared VEs • Generic or personalised

• User embodiment in shared VEs is the fundamental mediator of the visual interaction, functioning both to identify users and to communicate nonverbal behaviour including position, identification, focus of attention, and gesture and actions (Thalmann 1999).

• Avatars generally exhibit generic humanoid form, reflecting their status as a representation of a human user. Critically, this form enables a direct relationship between the user’s natural bodily movement and the corresponding animation of their avatar embodiment in the VE.

• Relates back to social agency and presence. AVATARS Avatars

• Useful and interesting applications are with other people
– Simulation of real events
– Training
– Entertainment
– Shared VEs
• The other users are entirely ‘real’ but represented entirely synthetically
– As in shared (networked) VEs

AVATARS IN CVEs
Collaborative Virtual Environments (CVEs)

• Can be immersive (e.g. CAVE) or non-immersive (e.g. desktop)
• Avatars are the visual mediator of communication

• Differing control metaphors: – In immersive system, avatars embody the real tracked person in terms of spatial representation (where they are, what they are looking at) and behavioural representation (what they are doing). – In non-immersive systems, avatar control is performed by standard input devices as no tracking is available. AVATARS IN CVEs Avatars and Identity

• Users of online virtual worlds use avatars as a means of identity creation • Customization is vital – Appearance, clothes, hair, sometimes animation

• The relationship to real identity is complex – Have a different appearance, personality, gender – Explore hidden sides of yourself – Some people feel their avatar is “More Me” than their physical self AVATARS IN CVEs Avatars as social tools

• Ideally avatars in social VEs should support social interaction • Display the bodily functions of communication (body language) • However, most avatars in most virtual worlds don’t • The body movements often exist, but most users use them unrealistically or often not at all • Primarily a problem of control AVATARS IN IMMERSIVE CVEs Avatars in Immersive CVEs

Allows spatial interaction more easily than other telecommunication systems such as video.

Spatiality is a natural feature of the real-world. AVATARS IN IMMERSIVE CVEs Avatar Mediated Communication Hardware and software work together to approximate reality: • Life-size representations • Body tracking coupled to avatar movement - head and hand at least as these are the prerequisite tracking devices used in immersive systems. • Stereo visualisation AVATARS IN IMMERSIVE CVEs Avatar Mediated Communication

CVE hardware and software are usually decoupled.

This means that the same software will operate on many hardware systems.

Asymmetric collaboration is possible (e.g. a user in the CAVE and a user on a mobile device).

Each user views and interacts via input devices appropriate to their hardware. AVATAR CONTROL Controlling avatars • Typed Text, Emoticons, Traditional GUI, Speech, Full body tracking AVATAR CONTROL Minimal Tracking for IK in VR

• Badler et al. showed a minimal configuration for IK representing the movements of a human in VR – www.cis.upenn.edu/~hollick/presence/presence.html • It was shown that 4 sensors are sufficient to reasonably reconstruct the approximate body configuration in real-time. AVATAR CONTROL Nonverbal Expression – tracking vs simulation

Key limiting factor of avatar-mediated communication is the lack of nonverbal communication (NVC). Avatars are primitive when compared to video of real people.

In AMC, NVC behaviours can be modelled (thus forming a hybrid avatar-agent), but models are unable to communicate the subtleties of human behaviour and the unpredictability of social interaction.

Also, models are not faithful to the controlling user’s behaviour, so may communicate incorrect signals. AVATAR CONTROL Nonverbal Expression – tracking vs simulation

Tracking is the solution, but is difficult.

Implement tracking behaviour according to priority: 1. Body motion 2. Eye movements 3. Facial expression 4...... ? AVATAR CONTROL

Problems with Controlling Avatars

• Two modes of control: at any moment the user must choose between either selecting a gesture from a menu or typing in a piece of text for the character to say. This means the subtle connections and synchronisations between speech and gestures are lost. • Explicit control of behaviour: the user must consciously choose which gesture to perform at a given moment. As much of our expressive behaviour is subconscious, the user will simply not know what the appropriate behaviour to perform at a given time is [BodyChat, Vilhjalmsson, H. and Cassell, J., 1998] AVATAR CONTROL

Problems with Controlling Avatars

• Emotional displays: current systems mostly concentrate on displays of emotion whereas Thórisson and Cassell (1998) have shown that envelope displays – subtle gestures and actions that regulate the flow of a dialog and establish mutual focus and attention – are more important in conversation. • In non-immersive CVEs only: direct tracking of a user’s face or body does not help as the user resides in a different space from that of the avatar and so features such as direction of gaze will not map over appropriately.

[BodyChat, Vilhjalmsson, H. and Cassell, J., 1998] AVATAR CONTROL

Solutions

• Always ensure that any control is done through a single interface (e.g. through text chat) • BUT…. • The body language of an avatar should be largely autonomous, and indirectly controlled by users • Minimize the level of control needed

[BodyChat, Vilhjalmsson, H. and Cassell, J., 1998] AVATAR CONTROL

Solutions: Spark

• Text chat based environment • Parse users’ text input for interactional information • Use this information to generate behaviour AVATAR CONTROL
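The general idea of deriving behaviour from typed text can be sketched as follows. This is purely illustrative and is not Spark's actual analysis: the cue rules, the `GREETINGS` set, and the behaviour names are all hypothetical.

```python
# Hypothetical cue words; a real system would use far richer linguistic analysis.
GREETINGS = {"hello", "hi", "hey"}


def infer_behaviours(message):
    """Map a chat message to a list of nonverbal behaviours for the avatar."""
    words = {word.strip(".,!?").lower() for word in message.split()}
    behaviours = []
    if words & GREETINGS:
        behaviours.append("wave")
    if message.rstrip().endswith("?"):
        behaviours.append("raise_eyebrows")   # marks a question
    if message.rstrip().endswith("!"):
        behaviours.append("beat_gesture")     # marks emphasis
    if not behaviours:
        behaviours.append("glance_at_listener")
    return behaviours
```

The user only types; the body language follows automatically, which matches the single-interface, indirect-control principle stated under Solutions.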


Solutions: PIAVCA

[Figure: PIAVCA architecture diagram. Labels: Operator, User Interaction, Speech, Speech Generation, Movements, Concurrent Script, Database, Multi-modal utterances, Behaviours, Motion Queue, Proxemics, Posture Shifts, Gaze, Final Animation]

GOAL OF VIRTUAL HUMANS

Designing virtual humans

• GOAL: Represent the Person in VE consistently – With perceived realism, believability … • Induce responses to the virtual human – Inducing realistic/lifelike responses • Enhancing collaborative experience • Facilitate social communication and interpersonal relationships GOAL OF VIRTUAL HUMANS

Designing behaviour • Creating apparent social intelligence is challenging • Have to present behavioural cues to depict a perceived (and plausible) psychological state – Or the near-truth internal state of the Person being represented • Human behaviour is a very intricate phenomenon – Dependent on many factors • Extremely difficult to replicate especially if the design process is approached in an ad-hoc manner – For instance: In social interactions within VE, the more visually realistic the virtual human, the more naturalistic users expect it to act GOAL OF VIRTUAL HUMANS Summary

• Social agency ensures that virtual human agents are both able and necessary to represent social situations
• The gamut of human behaviour ensures that this is a very complex problem, and the approach should be adapted to the application.
• Higher realism (behavioural and visual) is not necessarily a good thing.
• Avatars and Agents must capture social intelligence using tracking or simulation of behaviour. The design of these should consider many factors, again including application.
• Avatars typically need to be a mixture based on tracking data and inferred state.
• Current research focuses on quantifying the successful creation of Virtual Humans using objective measures.

End of Part 1

3DSMax Demo INTRODUCTION

Technical Aspects of Virtual Characters

• Graphics – Polygon meshes, rendering • Animation – Skeletal animation, mesh morphing, physical simulation • Behaviour INTRODUCTION

Graphics

• Techniques: Meshes, texture mapping, standard graphics stuff • Hand modelling: can be cartoony or highly realistic • Scanning/phototextures: can have very high realism • Rendering Opacity: Subsurface scattering INTRODUCTION Modelling

Scanned body results in huge mesh which can be rendered at different resolutions (numbers of polygons) INTRODUCTION Animation – bones and morphs Body Animation

• Can hand-animate the skeleton • Often use motion capture • Real data = Realism (?) MOTION CAPTURE

Marker-based Capture • Able to capture subtle facial expressions of actors • Not real-time (require intensive post-processing) • Less reusable (i.e. Skeletal motion capture can be applied to any model, but facial motion capture is more specific to a particular model). SKELETAL ANIMATION

Skeletal Animation

• The fundamental aspect of human body motion is the motion of the skeleton • The motion of rigid bones linked by rotational joints (first approximation) • I will discuss other elements of body motion such as muscle and fat briefly later SKELETAL ANIMATION

Typical Skeleton

• Circles are rotational joints; lines are rigid links (bones) • The red circle is the root (position and rotation offset from the origin) • The character is animated by rotating joints and moving and rotating the root SKELETAL ANIMATION

Forward Kinematics (FK) • The position of a link is calculated by concatenating rotations and offsets

[Figure: three-link chain with joint rotations R0 and R1, link offsets O0, O1 and O2, and end point P2]

Forward Kinematics (FK)

• First you choose a position on a link (the end point)
• This position is rotated by the rotation of the joint above the link
• Translate by the length (offset) of the parent link and then rotate by its joint. Go up to its parent and iterate until you get to the root
• Rotate and translate by the root position
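A minimal sketch of the walk up the chain described above, assuming a planar (2D) chain so that each joint rotation is a single angle in radians. The function names and argument layout are illustrative assumptions; here the first joint's angle doubles as the root rotation, and each link offset is expressed in the frame of the joint above it.

```python
import math


def rotate(point, angle):
    """Rotate a 2D point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def forward_kinematics(root_position, joint_angles, link_offsets):
    """World position of the end of the last link.

    joint_angles[i] is the rotation of joint i relative to its parent;
    link_offsets[i] is the vector along link i in joint i's local frame.
    """
    # Start with the end point on the final link, rotated by the joint above it.
    point = rotate(link_offsets[-1], joint_angles[-1])
    # Walk up the chain: translate by the parent link's offset, rotate by its joint.
    for angle, offset in zip(reversed(joint_angles[:-1]), reversed(link_offsets[:-1])):
        point = (point[0] + offset[0], point[1] + offset[1])
        point = rotate(point, angle)
    # Finally translate by the root position.
    return (point[0] + root_position[0], point[1] + root_position[1])
```

In a scene graph the same concatenation happens implicitly as transforms are accumulated from the root down to each node.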

Forward Kinematics (FK)

• Simple and efficient • Comes for free in a scene graph architecture • Difficult to animate with – often we want to specify the positions of a character’s hands, not the rotations of its joints • The problem: – Calculating the required rotations of joints needed to put a hand (or other body part) in a given position. SKELETAL ANIMATION

Inverse Kinematics

• A number of ways of doing it • Matrix methods (hard) • Cyclic Coordinate Descent (CCD) – A geometric method (secretly matrices underneath)

[Figure: chain with joint rotations R0 and R1, link offsets O1 and O2, and target position Pt]

Inverse Kinematics

• Start with the final link
• Rotate it towards the target
• Then go to the next link up
• Rotate it so that the end effector points towards the target
• And the next…
• And iterate until you reach the target
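The steps above can be sketched as a planar Cyclic Coordinate Descent loop. This is a simplified illustration under stated assumptions: a 2D chain rooted at the origin, no joint limits, and hypothetical names (`chain_positions`, `ccd_ik`) and default parameters.

```python
import math


def chain_positions(angles, lengths):
    """Forward kinematics: positions of every joint and the end effector."""
    x = y = heading = 0.0
    points = [(0.0, 0.0)]
    for angle, length in zip(angles, lengths):
        heading += angle                     # joint angles are relative to the parent
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_ik(angles, lengths, target, max_iterations=200, tolerance=1e-3):
    """Return joint angles that bring the end effector close to `target`."""
    angles = list(angles)
    for _ in range(max_iterations):
        # Start with the final link and work up towards the root.
        for i in reversed(range(len(angles))):
            points = chain_positions(angles, lengths)
            pivot, end = points[i], points[-1]
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotate this joint so the end effector points towards the target.
            angles[i] += to_target - to_end
        end = chain_positions(angles, lengths)[-1]
        if math.hypot(end[0] - target[0], end[1] - target[1]) < tolerance:
            break
    return angles
```

Every pass re-runs forward kinematics for each joint, which gives a feel for why IK costs more than FK.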

Inverse Kinematics

• IK is a very powerful tool • However, it’s computationally intensive • IK is generally used in animation tools and for applying specific constraints • FK is used for the majority of real time animation systems SKELETAL ANIMATION Representation and Format

• Layered representation – Skeleton structure forms a scene graph – Scene graph embodies a set of joints – A mesh overlays the scene graph – As the skeletal structure moves the mesh must deform appropriately (otherwise there are holes)

http://ligwww.epfl.ch/~maurel/Thesis98.html

MPEG4 example

Facial Animation

• Don’t have a common underlying structure like a skeleton • Faces are generally animated as meshes of vertices • Animate by moving individual vertices MORPH TARGET ANIMATION Morph Targets

• Have a number of facial expressions, each represented by a separate mesh • Each of these meshes must have the same number of vertices as the original mesh but with different positions • Build new facial expressions out of these base expressions (called Morph Targets) MORPH TARGET ANIMATION Morph Targets MORPH TARGET ANIMATION

Morph Targets

• Smoothly blend between targets
• Give each target a weight between 0 and 1
• Do a weighted sum of the vertices in all the targets to get the output mesh

$$v_i = \sum_{t \in \text{morph\_targets}} w_t \, v_{ti}, \qquad \sum_{t \in \text{morph\_targets}} w_t = 1$$

where $v_{ti}$ is the position of vertex $i$ in target $t$ and $w_t$ is the weight of that target.

Using Morph Targets

• Morph targets are a good low level animation technique
• Also need ways of choosing morph targets
• Could let the animator choose (nothing wrong with that)
• But there are also more principled ways.

END!
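The weighted sum of morph targets described above can be sketched in a few lines. The data layout (dictionaries keyed by target name, vertices as tuples) and the function name are assumptions made for illustration.

```python
def blend_morph_targets(targets, weights):
    """Blend meshes that share a vertex count into one output mesh.

    targets: dict mapping target name -> list of (x, y, z) vertex positions
    weights: dict mapping target name -> weight in [0, 1]; weights must sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("morph target weights must sum to 1")
    vertex_count = len(next(iter(targets.values())))
    blended = []
    for i in range(vertex_count):
        # Output vertex i is the weighted sum of vertex i in every target.
        blended.append(tuple(
            sum(weights[name] * targets[name][i][axis] for name in targets)
            for axis in range(3)
        ))
    return blended
```

Animating the weights over time (for example moving weight from a neutral face to a smile) gives the smooth blend between expressions.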

Summary

• Characters are represented typically as ‘skinned’ skeletal scene graphs, representing sets of joints that link to the geometry.

• Forward kinematics determines overall configuration given joint angles and Inverse kinematics determines joint angles from requirements for end-effectors

• Morph targets are a method of mesh deformation often used for facial animation