My Personalized Movies: Novel System for Automatically Animating a Movie based on Personal Data and Evaluation of its Impact on Affective and Cognitive Experience

by Fengjiao Peng

B.Sc. Physics The University of Hong Kong, 2016

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September, 2018

©2018 Fengjiao Peng. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: Signature redacted MIT Media Lab July 26, 2018

Certified by: Signature redacted
Rosalind W. Picard
Professor of Affective Computing Research, Thesis Supervisor

Accepted by: Signature redacted
Tod Machover
Academic Head, Program in Media Arts and Sciences

Signature redacted

Cynthia Breazeal

Associate Professor, Personal Robots Group Thesis Reader

Signature redacted

V. Michael Bove

Principal Research Scientist, Object-Based Media Group, Thesis Reader

Fengjiao Peng
My Personalized Movies: Novel System for Automatically Animating a Movie based on Personal Data and Evaluation of its Impact on Affective and Cognitive Experience
Documentation, July 26, 2018
Readers: Cynthia Breazeal and V. Michael Bove
Supervisor: Rosalind W. Picard

Massachusetts Institute of Technology
77 Massachusetts Ave, Cambridge, MA 02139

My Personalized Movies: Novel System for Automatically Animating a Movie based on Personal Data and Evaluation of its Impact on Affective and Cognitive Experience

by

Fengjiao Peng

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences

Abstract

Storytelling is a fundamental way in which human beings make sense of the world. Animated movies tell stories that engage audiences across cultures and age groups. I designed and built My Personalized Movies (MPM), a novel system in which animated stories are automatically created based on data provided by individuals. The data include self-tracked mood and behavior captured in quantitative measures and descriptive text. MPM is designed to engage viewers through an emotive narrative, induce self-reflection about their mood and behavior patterns, and improve self-compassion and self-esteem, which mediate behavior change.

I demonstrate, through several stages of studies involving 107 participants in total, that viewers show strong emotional engagement with MPM and can explicitly connect the animated characters' stories to their own past experiences. An analysis of 22 participants' facial expression data while watching MPM reveals that participants' change in implicit self-esteem is positively correlated with the happiness of their facial expressions. Participants with higher depression severity, as measured by PHQ-9, showed less positive facial expressions at the happy moments in the movies.

Thesis Supervisor: Rosalind W. Picard

Title: Professor of Affective Computing Research

Acknowledgement

I would like to express my sincere gratitude to all the mentors, colleagues, friends and family who helped me complete this degree. This work would not have been possible without your support.

I would like to thank Rosalind Picard for guiding and supporting my thesis work. The idea for this work dates back to 2015, when I was self-taught and applying to grad school. In my email correspondence with Roz, she prompted me to imagine making "intelligent" animation that "adapts to everyone", that "touches and inspires them". The scope of the idea seemed crazy to me, let alone working on it as a master's thesis. I would like to thank Roz for giving me the opportunity to spend two wonderful years at the Media Lab, and for her insight, patience, and support as a supervisor. I'm grateful to my readers, V. Michael Bove Jr. and Cynthia Breazeal, for guiding me through the thesis formulation and writing process. I'd like to thank Weixuan (Vincent) Chen and Asma Ghandeharioun for being great office mates! It was such a good time working with Sara Taylor, Craig Ferguson and Ognjen Rudovic, and I learned a lot from each of you. I'd also like to thank Javier Hernandez, Kristy Johnson, Natasha Jaques, Akane Sano, and everyone in the Affective Computing group for your generous help and advice. I am also grateful to Mary Heanue for her help with the difficult project finance and logistics.

I am grateful to Veronica LaBelle, Lucy Zhang, and Emily Yue for being wonderful UROPs who brought the animation world to life. Sneha Makini, Christian Vazquez, David Cruz, Erin Holder and Christopher Acree have been the most fantastic friends and family overseas. I'd like to thank Dianbo Liu and Andrew Shea for spending the challenging but fun three weeks co-authoring our paper together. Working in the Media Lab, I found everyone to be a constant source of creativity and inspiration.


Contents

1 Introduction  13
  1.1 Overview  15

2 Relevant Background Work  17
  2.1 The quantified self  17
  2.2 Reflective Media  17
  2.3 Animation as a language of emotion  18
  2.4 Emotional Agents  18
  2.5 Games for health  20

3 System Design  21
  3.1 Overview  21
    3.1.1 From Data to Animation  22
  3.2 Believable Agent  25
    3.2.1 Representation of emotions  25
    3.2.2 Navigation and Locomotion  27
    3.2.3 Motivation  29
    3.2.4 Attention  30
    3.2.5 Perception and reaction  31
  3.3 Affective cinematography  32
    3.3.1 Design considerations  33
    3.3.2 Designing with the Cinemachine system  34
  3.4 Rendering considerations  35
    3.4.1 Lighting and weather  36
    3.4.2 Stylized rendering  37

4 Experiments  39
  4.1 Graphical Affects Validation  39
    4.1.1 Results  39
    4.1.2 Discussion  40
  4.2 A Trip to the Moon: Self-reflection  41
    4.2.1 Group comparison  43
    4.2.2 Emotional engagement  45
    4.2.3 Human-agent connection  46
    4.2.4 Self-reflection  47
  4.3 Snowbound: Changing implicit self-esteem  51
    4.3.1 The Shape of Stories  51
    4.3.2 Implicit Association Test and Implicit Self-Esteem  53
    4.3.3 Hypotheses  55
    4.3.4 Study  56
    4.3.5 Data encoding  57
    4.3.6 Group findings  58
    4.3.7 Case Studies  63
    4.3.8 Discussions  66

5 Conclusion  69
  5.1 Summary  69
  5.2 Contributions and Recommendations  70

Bibliography  73

1 Introduction

People are narrative animals. As children, our caretakers immerse us in stories: fairy tales, made-up stories, favorite stories, "Read me a story!"

- M. Mateas and P. Sengers, Narrative Intelligence

Every day, we learn about the world through the understanding of stories - fictions, news stories, movies, and commercials. Our decisions and behaviors can be influenced by observing story characters' experiences and feeling empathy for them. Tens of thousands of years ago, when our civilization was young, humans already had folk stories and myths through which we learned the laws of nature, social conduct and life lessons. These folk stories and myths often educate, threaten, or encourage listeners by eliciting basic human emotions: fear, desire for love, happiness, and so on. Emotion plays a key role in the effectiveness of narratives [8]. Without humans' neurological foundations for mirroring others' emotions [57], stories lose their power to engage.

Today, technology has extended our abilities to tell and retell stories visually, musically, and virtually, where mixed forms of media enhance the sensory impact of the narrative. A large body of literature is dedicated to understanding how we process and see cause and effect in possibly arbitrary image sequences in movies and TV [46], and how these visual narratives find their way into our cognitive schema. Evidence often points to the manipulation of viewers' mood and emotions, as emotion is capable of facilitating or impairing decision making [51].

While a lot of creative energy has been invested in helping products sell and nudging customers to purchase, good stories can also be created to help people with their more pressing needs. Mental health among college students, as well as adults in other groups, is becoming a crucial concern. In a nation-wide survey conducted in the United States in 2015, 22% of respondents screened positive for depression on self-report surveys, and about a third of respondents had been diagnosed with a mental disorder [7]. Technologies that help users record and regulate their mental state have become increasingly popular, given the possibility of self-tracking and quantifying physiological and biological data. Wellbeing technology ranges from intervening in autonomic physiology [12, 20] to more holistic approaches such as assisting mindfulness practices, often in highly personalized manners [52].

If people are indeed so responsive to the emotional nuances in visual narratives, why not enable persuasive narratives to change dynamically to support individuals' emotional wellbeing? Why not tailor stories to direct the audience toward desired moods and behaviors? The problem remains how such narratives can be created and personalized without sophisticated human involvement, in the same way that wellbeing technology adapts to individual users by processing their data and runs without a psychologist or therapist watching.

Computers can "generate" stories. Emotion can be expressed, communicated and elicited through animated visual forms [30, 48]. Think about the timid blanket character in the Disney movie Aladdin. This language of animation is interpretable across culture and age groups [30]. Animation artists create expressive characters based on simple principles. These animation principles have been thoroughly studied and applied in intelligent interfaces [34] and robotic design [7] to embody emotions in visuals or objects. Computers can be used to generate animation, but generating animated stories is still hard, given the layers of human imagination, humor and drama writing behind each simple story.

Everyone comes with a story, and each of us is most touched when we see a movie or book character going through an experience similar to our own. If we can collect information about each person's life experience through quantified-self technologies, why not create for each person a story that mirrors their life? We propose My Personalized Movies (MPM), a system for producing personalized animated stories that engage viewers emotionally by reflecting their self-tracked mood and behavior. In MPM, stories are written automatically, not through deliberate screenwriting or the design of drama, but by mirroring the life story of the specific viewer.

We created an animation world with 3D game graphics. The components of the natural environment, including the ground, vegetation and weather, are procedurally generated with a customized Perlin noise algorithm to personalize to user data. After creating a library of the animated agents' facial expressions and body movements, we designed a novel emotion-behavioral model for the animated characters to act out the stories and behave with consistency and intelligence. The visual style is enhanced with customized shading algorithms. The animation is generated in real time, in one click, by downloading user data from an online server.

We then assessed the psychological and behavioral effect of the animation outputs through a series of studies.

1.1 Overview

This thesis is structured as follows: We start with advancements in technology that can be applied to collect self-tracking user data, such as the quantified self (QS).

In Chapter 2, we discuss related work in serious gaming and reflective design that takes users' physiological and affective data as input, to change users' physiology and psychology. We also refer to the design of autonomous intelligent agents, where we draw inspiration to create our emotional agents.

In Chapter 3, we create a virtual animation world that can procedurally generate animated stories. We dive into the design of the procedural environment, character mechanics and scene mechanics, and discuss a few rendering and aesthetic explorations that increase visual believability.

Chapter 4 follows with three studies, engaging 107 users, to assess our prototype's effects on perception and psychology. The first study validates that individual animated clips featuring the agent, designed to express happiness, stress, tiredness and social openness, are perceived by viewers as intended. The second study composed complete animated stories, titled A Trip to the Moon, featuring a corgi dog's trip chasing the Moon. We set out to assess whether viewers see themselves within arbitrary animation sequences, and whether personalized animation has a stronger emotional impact than non-personalized animation. In a third study, featuring personalized movies titled Snowbound, we examined whether MPM can influence viewers' self-perception, such as their implicit self-esteem. We also studied viewers' real-time emotional responses to the movies by analyzing their facial expression data.

Chapter 5 summarizes the experiment findings, evaluates the contributions of this project, and makes future work recommendations for researchers and practitioners.


2 Relevant Background Work

2.1 The quantified self

The progress toward personalized media feeds has enabled the stories we read to be sourced from our own lives. Based on our digital footprint, we receive recommendations and advertisements customized to each person's taste and needs. The same is happening in health care: the development of mobile phone technology and biological sensors is enabling individuals to self-track biological, physical and environmental information. Mobile self-monitoring applications have become increasingly popular, tracking less tangible health data such as mood and stress. A study investigating 107 users of mobile health and fitness applications showed that self-reflection plays an important role in behavior change, and that users prefer applications that give tangible behavior change recommendations [25].

From rich self-tracking data, individuals can interpret and infer the patterns, correlations and causal relations in their own behavior and wellbeing. One crucial challenge is the design of the reflection interface to help users build up a critical understanding of their data, and especially to motivate users to change their behaviors for the betterment of their own wellbeing or of their social and biological environment. Emotions can facilitate or impair decision making [51]. We often need to couple critical thinking with emotional engagement to motivate user decisions.

2.2 Reflective Media

With the availability of wearables and physiological sensors, users can self-track their affective and behavioral data streams. Reflective media design takes user data such as their physical action, autonomic physiology, or self-reported mood, and maps it to an output that is designed to induce certain feelings, critical thinking or behaviors of the user [1]. One can think of MPM as a form of reflective media, mapping users' mood and behavior data to emotionally-engaging animation language. For example, if the user has irregular sleeping schedules, the avatar representing the user would behave with physical fatigue.

Höök proposed the concept of the affective loop experience, where users are pulled into feedback loops with their own emotions [28]. The feedback can take the form of figures, graphs, visualizations or other virtual or bodily designs. EmotionCheck is an emotion-regulation technology featuring a wearable device designed to provide false, lower heart rate feedback to users to regulate their anxiety level [12]. The regulation is done without users being consciously involved in trying to reduce their heart rate.

Other approaches use visual or physical forms to embody the signals. AffectCam uses physiological measures of arousal to select the most memorable photos in an individual's ordinary life [61]. Affective Diary incorporates the bodily experience of movement and arousal into an individual's journaling process [39]. Daily enhanced the emotional experience of journaling by capturing users' physiological signals during writing, allowing users to look at their emotional responses later [13]. These designs support and sustain the reflection on and reliving of past experiences. Users are able to discover correlations between behaviors and the affective reflections, which may give new insight and inspire life changes [39]. Similar to previous research on responsive media, MPM aims at reinforcing or changing users' perception of themselves or their past experiences. We aim to tell a compelling story with the data, with characters that elicit a response and connect with the audience.

2.3 Animation as a language of emotion

The above reflective designs often teach users new bodily or visual forms to represent emotions, which take a while to learn. With MPM, we adopt the language of animated movies, which users across different cultural backgrounds and age groups can easily engage with [30]. The language includes a vocabulary of movements: squash and stretch (Fig. 2.1), timing, anticipation, etc. [31]. These simple principles of animation have long been incorporated in the design of emotional robots to bring life to them [4, 65]. Users don't have to learn or acquire new associations between the design form and their emotions. Our studies showed that the emotion language applied in MPM, in addition to the stories, can not only be accurately recognized but can also move adult users to laughter and tears.

2.4 Emotional Agents

Intelligent virtual agents that interact with users have seen wide application. It has been argued that the believability of emotional agents is based on their perception, reasoning and reaction to external events [42, 56, 4]. In addition, they should demonstrate consistency of their emotional reactions and motivational states [50].


Fig. 2.1: Squash and stretch in Luxo Jr's hop. Figure taken from [38].

For example, a dog agent is observed to wag its tail (reaction) when receiving a compliment (perception) from the user; such a dog agent is expected to wag its tail when receiving compliments in the future (consistency). During our experiments, we also discovered attention to be an important module. Studies of conversational agents show that humans are more engaged with agents who "indicate attention by turning towards the person that the robot is addressing" [9]. Based on these principles, we script our story agents to have reasonable and consistent emotional responses to the different events that happen to them in the stories. Inconsistent or surprising moments are occasionally allowed to elicit attention from the user.

In 2001, Tomlinson and Blumberg presented AlphaWolf at SIGGRAPH, where virtual wolf agents are controlled by computational models that imitate the social and learning behaviors of real wolves [62]. Our behavioral model is a similar system that blends animations based on agents' emotional states, controlling avatars' moving speed, body movements and facial expressions. Our model for dog agents is designed to be more personified, imitating human decision making rather than dog behaviors.

Popularity of 3D gaming has increased the interest in designing autonomous cinematography systems to enhance gameplay and storytelling. In addition to avoiding occlusions in gaming environments, some cinematography systems are designed to adapt to virtual agents' emotional and motivational states, imitating real cinematographers. Tomlinson designed a cinematography system to take into account virtual agents' emotions, and storytelling motivations that change based on timeline [63]. Laaksolahti implemented autonomous cameras to reveal emotions and social interaction of virtual characters [37]. Elson designed a system to generate cinematography based on story scripts to automate production [16].

2.5 Games for health

Health, education and training can be incorporated into gaming narratives where users are motivated by in-game rewards. Roth conducted studies to explore the factors that give interactive storytelling in video games motivational appeal for the user: curiosity, suspense, aesthetic pleasure and self-enhancement [59]. In the game for health designed by Gobel et al. [21], users' body movements, heart rate, pulse and personalized workout data are used to influence the storytelling of the video game. Gustafsson et al. [24] designed a mobile phone based game linked to the power consumption of the house, where teenagers are motivated to keep a virtual monster avatar healthy by lowering the power consumption.

These games let users control the agent in real time. However, interactive agents are susceptible to users' exploration. When the user discovers the boundary of the agent's scripting, where the agent fails to react to novel user input or starts to mechanically repeat reactions, the agent's believability quickly decreases. As a result, persuasive games with virtual agents are often constrained to children, who find virtual agents more believable, and are less effective with adults, who are quickly bored or overwhelmed [27]. Gamification usually relies on cognitive mechanisms, such as action-reward, managing cognitive load, and leveling up, for motivation and engagement [23]. In many instances of serious gaming, adult players are not expected to have an emotional response to the narrative of the game. In comparison, traditional film viewing and storytelling experiences are usually designed to elicit emotional, empathetic responses from the viewer. In this case, viewers' agency is extended by directors and storytellers, who decide on the screenplay, cinematography and editing, as well as the fate of the characters. Passively consuming the stories gives users a more "meaningful and dramatic experience" [27], for both children and adults. We set out to explore the automatic generation of passively-viewed media for emotional engagement, an area that has historically been under-explored.

3 System Design

The goal is believability, but pure physical realism does not ensure a character's or a scene's believability. Most important, we aim to tell a compelling story, with characters that elicit a response and connect with the audience.

- Tom Porter (Computer Scientist, Pixar)

3.1 Overview

Several empirical principles in animated film making provide an intuition for creating this work. In Disney and Pixar animated movies, the agent (main character) often has a simple goal and a limited skill set to achieve it. The exposition, rising action, climax and falling action in traditional storytelling [18] all spring from obstacles that prevent the agent from achieving the goal. The goal is common and broad enough to conform to a majority of viewers' life situations. The symbolic form of the goal also signifies the flow of time and the metamorphosis of the agent. While the obstacles bring depth to the story, the agent can have a simple personality, allowing both children and adults to relate.

We create a narrative where a corgi dog (the agent) travels in the wild. In the story used in the first experiment, A Trip to the Moon, the agent attempts to go to the moon (the goal), where he/she either succeeds or fails. In the second story, Snowbound, the agent attempts to survive a snow storm. We depict the dog's journey by means of it following an ever-changing path, based on the personal data, representative of traveling through life. Along this journey, the dog faces various obstacles, in the form of weather, rocks blocking his way to the moon, and more, whose nuances and variations are dictated by the user's self-reported mood and behavior data, recorded every day over a week. The dog also encounters social interactions (with other dog agents) that can elicit a variety of feelings.

We built a 3D environment in Unity and used a polygon dog model as the agent. We chose dogs as our avatars because they have intuitively understandable expressions of emotion, with which we can avoid the "uncanny valley" problems of using humanoid avatars. Interviews with study participants later revealed that whether it was a dog, a squirrel, or a deer didn't matter much, as long as it represented personalized experiences. The agent's animation was created by experienced animators to display distinct emotions through facial expression and body movement, each story event being a combination of several animations. When the animation starts, the agent navigates toward the goal, stopping whenever an event happens.

3.1.1 From Data to Animation

The design choices of animation to represent the data were made by three researchers, based on their prior experience in filming or animation and on computational constraints. For example, the representation of stress changed several times throughout the iterations: initially, we experimented with a mountain appearing, forming a visual obstacle in front of the agent. However, re-computing the agent's navigation path while the landscape was changing raised computational issues, causing the agent to sometimes walk underground. For the A Trip to the Moon experiments, we replaced the rising mountains with rocks falling from the sky. For Snowbound, stress came to life in the form of wild bears chasing the character, better simulating a "fight or flight" experience.

Six main groups of mood and behavior data are collected from the user through daily surveys: overall mood today, sleep and exercise, stress, social interaction, sense of purpose, and a line that reminds the user of that day. The questions on sleep, exercise and stress are the same as in Sano's study to measure stress, sleep and happiness, where one's mental wellbeing is systematically evaluated [60]. In Sano's study, sleep cycles, exercise and social interaction are correlated with mood [60]. The questions on sense of purpose are adapted from the purpose-in-life test, as the level of sense of purpose is highly correlated with depression and other mental health problems [26].
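For concreteness, one day's survey responses could be held in a small data structure like the following sketch. The field names and types are illustrative only, not the actual schema used by the MPM server.

// Illustrative container for one day's self-report data (field names are hypothetical).
[System.Serializable]
public class DailyEntry
{
    public string reminderLine;    // "one line that will remind you of today"
    public int moodValence;        // 1-7 Likert
    public int moodArousal;        // 1-7 Likert
    public float sleepHours;
    public int exerciseLevel;      // 1-7 Likert
    public int stressLevel;        // "How stressful is today?"
    public int stressHandling;     // 0 = anxious ... 7 = things were under control
    public int socialInteraction;  // none / individual / group, plus quality
    public int senseOfPurpose;     // adapted from the purpose-in-life test
}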

Below we explain how each type of data maps to animation:

One line that will remind you of today. After a preliminary study, we found that it is often difficult for users to recall the events of a day as far back as a week ago when simply asked to recall "last Tuesday". It doesn't mean that they don't remember what happened a week ago - if we had named the restaurant they went to on that Tuesday, or the friend they met, they would have recalled the event clearly.


Fig. 3.1: Three color and weather schemes that correspond to depression, anger and excitement.

Therefore we ask the users for one line as a memory retrieval cue for each event that happened. In the experiment, we received cues on a spectrum, varying from concrete ones, such as "Back to work after Labor Day...", "Rainy beautiful sunset KKC", "Pho Basil", to more abstract ones such as "Idiotic", "defused bombs", "tell me why, yang", "You are beautiful". This line is used as chapter titles in the animation, and each event is a chapter.

Overall mood. We map each user's overall feeling to a valence and arousal chart, using the 1-7 Likert scales that users select in the surveys. Different points on the valence-arousal graph are mapped to different states of the agent's body animation, facial expression and moving speed, the steepness of the agent's surroundings, the color and weather embedded in the environment, and the forest sound and music effects. For example, when the mood is calm and peaceful, the agent wanders leisurely with a content smile, the ground is smooth, the color scheme is a light purple with a thin layer of mist, and the forest sounds are of birds chirping and breezes. All these factors are designed to express the emotion experienced on that day (Fig. 3.1).
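The sketch below illustrates one way such a mapping could be coded; the rescaling, thresholds, and output parameters are illustrative rather than the exact values used in MPM.

using UnityEngine;

public static class MoodMapping
{
    // Rescale a 1-7 Likert rating to the range [-1, 1].
    static float Rescale(int likert)
    {
        return (likert - 4) / 3f;
    }

    // Map a day's mood ratings to a few scene parameters (illustrative thresholds).
    public static void Apply(int valenceRating, int arousalRating,
                             out float agentSpeed, out string weather)
    {
        float valence = Rescale(valenceRating);
        float arousal = Rescale(arousalRating);

        // Higher arousal -> faster movement; low arousal -> leisurely wandering.
        agentSpeed = Mathf.Lerp(0.5f, 2.0f, (arousal + 1f) / 2f);

        // Positive, calm moods get light colors and mist; negative, aroused moods get storms.
        if (valence > 0f && arousal < 0f) weather = "light purple mist, birdsong";
        else if (valence < 0f && arousal > 0f) weather = "storm";
        else if (valence < 0f) weather = "overcast";
        else weather = "sunny";
    }
}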

Mood-changing events. Each user's animated story is organized in units of mood-changing events, each event representing an experience that changed the user's mood during the period of the study. Mood changes can occur due to different causes: social interaction, sleep, exercise, stress, or mental activities such as rumination.

Social interaction. If a mood change is caused by social interaction, the animation will show an interaction between the agent and other animal agents, depending on how much social interaction there is (with an individual or a group), what kind of interaction it is, and who the user interacted with. General types of interaction include: no interaction, where the corgi acts with loneliness or content solitude, neutral interactions that don't affect the mood, a happy, playful interaction, an angry fight, or rejection. Depending on who the user interacts with, we introduce a power dynamic through the size of the other dog agents, where smaller would imply

interacting with someone more submissive, inferior or childlike, and larger would imply interacting with someone more dominant or authoritative.

If the user interacted with a group, the animation shows a group of deer accompanying the agent. The agent can respond differently to the deer, making the deer appear friendly or disturbing. In the study, one participant entered "I hate men" as the label of one of her events, and in her video, the agent is annoyed at a group of deer following her around as the phrase "I hate men" shows up. In her facial expression recording, the participant smiled at the scene and at what it possibly indicated.

Sleep and exercise. Similar to the social interactions, sleep and exercise events are scripted to depict users' data in that category. Too little sleep affects performance in the agent's movement, and eventually causes him or her to fall asleep. A good amount of exercise will make the agent appear more energetic and better at overcoming the steep landscape.

Stress. Stress is another factor that is closely related to mood. Stressors are visualized in the movie as a rain of rocks that blocks the agent from reaching the moon, or as wild bears chasing the agent. Facing the same stressor, some people view it as a challenge, a source of motivation (an excited response), while others view it as a threat (an anxious response) [43]. Therefore, we designed two questions, "How stressful is today?" and "How was your experience handling the stress?", to separate users' different responses to stress. The first question investigates the user's perceived level of stress, and the second investigates how resilient or vulnerable they are when facing the stress. To answer "How was your experience handling the stress," users select on a Likert scale of 0-7, where 0 corresponds to "I was anxious and stressed out" and 7 corresponds to "I felt things were under control".

If the perceived stress level is high, the agent will appear to be breathing fast and acting agitated. If the user has an excited response, the agent will run bravely towards the stressor and try to outrun it; otherwise the agent will run away from the stressor. With this design, even though the agent has the same stressed facial expressions and body animations, viewers are able to tell whether the agent has an excited or anxious response. This is based on the Kuleshov effect: identical shots of a face followed by either neutral or emotionally salient footage alter viewers' attribution of the facial expression (for example, viewers are led to interpret the same facial expression as "hungry" or "disgusted", depending on whether the following shot is a dish of food or a dead rat) [46]. The ambiguity in the agent's facial expression is reduced by the content that follows.
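A minimal sketch of this branching; the function name and thresholds are illustrative, not the actual decision rule in the system.

// Decide the agent's reaction to a stressor from the two stress questions (illustrative).
public static string StressReaction(int stressLevel, int handlingScore)
{
    // stressLevel: "How stressful is today?" (1-7)
    // handlingScore: 0 = "I was anxious and stressed out", 7 = "I felt things were under control"
    if (stressLevel <= 3)
        return "no stressor event";          // low stress: no rocks or bears in this chapter
    if (handlingScore >= 4)
        return "run toward the stressor";    // excited, challenge-oriented response
    return "run away from the stressor";     // anxious, threat-oriented response
}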

Internal thoughts. Sometimes a user's mood change is not caused by the present - it is a remembrance of the past or an anticipation of the future. If a user reports a mood change due to these "internal" causes, the animation shows the corgi agent wandering into a deep cave. The agent then walks out of the cave with its mood changed. The cave can be warm and welcoming, symbolizing good memories or hopes, or dark and frightening, depending on the direction of the mood change.

Fig. 3.2: Screenshots of rendered animation that demonstrate different camera angles and rendering effects.

Sense of purpose and achievement. We investigated the users' sense of purpose using three questions adapted from the purpose-in-life psychology test [26]. How much interest the user has in life, how purposeful he or she feels, and how much personal achievement he or she reports on the days investigated together decide the distance from the agent to the moon and whether the agent will have enough momentum to travel to it in the end.

3.2 Believable Agent

We built the system using the game engine Unity 2017.1.0f1. The system went through three iterations, which brought changes to the 3D art assets and agent behavior models, more robust camera behaviors, and lighting and rendering adjustments. The result is a smoother and more natural animated movie look that resembles human storyboarding and drawing, e.g. see Fig 3.2. Sections 3.2 to 3.4 describe in detail how each part of the animation is realized.

3.2.1 Representation of emotions

We used a polygon corgi dog as our main character. Working with animation artists, we created a library of basic animations in .fbx format, including facial expressions and body movements. Fig. 3.3 (left) shows the spectrum of emotions the agent can display in the face. One can compare the agent's facial expressions to Nobel laureate Konrad Lorenz's grimace scale for dogs (Fig. 3.3, right). The virtual agent has some personified facial expressions that dogs don't naturally possess.


In addition to the facial expressions, the agent can display a range of emotions through body movements. These movements are designed to be consistent with the natural movements of dogs, in addition to some movements that resemble humans. For example, when the agent wags its tail, it is assumed to be showing friendliness or anticipation of good things. This movement is commonly seen among dogs. The agent can also shake its head to show a sense of unwillingness or rejection, which is a gesture unknown to dogs but specific to humans.

Fig. 3.3: On the left is the spectrum of facial expressions of our virtual agent. On the right is Lorenz' grimace scale, demonstrating levels of pain for dogs.

The facial expressions and body movements are animated in two superimposed layers, $l_f$ and $l_b$, such that the agent can display different facial expressions while moving the body freely. The body movement layer $l_b$ concerns all the vertices on the model, including the face, while the facial expression layer $l_f$ concerns only the vertices on the face. We use $\{M_f\}$ and $\{M_b\}$ to represent the transformation matrices corresponding to the two layers. Each vertex $i$ on the model has an $M_{f,i}$ that transforms it. But what is the order of applying the two layers of transformations? Compared to the $\{M_b\}$, which change over time to describe the movement (for example, raising the paw and shaking the paw), the $\{M_f\}$ are constant over time, so we write each $M_{f,i}$ as a translation matrix:

$$M_{f,i} = T(t_{x_i}, t_{y_i}, t_{z_i}) = \begin{pmatrix} 1 & 0 & 0 & t_{x_i} \\ 0 & 1 & 0 & t_{y_i} \\ 0 & 0 & 1 & t_{z_i} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The $\{M_b\}$ are a mixture of rotation and translation matrices. Because the $\{M_b\}$ involve rotation around a certain point, for example an elbow rotating around the joint that connects the elbow to the body, they change the frame of reference for the translation matrices. Thus, the $\{M_f\}$ should be applied to the model first. We can then gain efficiency [2] by applying

$$M_b (M_{f,i}\, v_i) = (M_b M_{f,i})\, v_i$$

where $v_i$ is the position of vertex $i$.
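In Unity, this composition order can be expressed directly with Matrix4x4. The snippet below is an illustrative sketch of the math rather than the actual animation-layer implementation, which operates on the baked .fbx layers.

using UnityEngine;

public static class LayerCompose
{
    // Compose a body transform with a facial-expression translation and apply it to a vertex.
    public static Vector3 TransformVertex(Matrix4x4 bodyMatrix, Vector3 faceOffset, Vector3 vertex)
    {
        // M_f: a pure translation (no rotation, unit scale)
        Matrix4x4 faceMatrix = Matrix4x4.TRS(faceOffset, Quaternion.identity, Vector3.one);
        Matrix4x4 combined = bodyMatrix * faceMatrix;   // M_b * M_f, so M_f is applied first
        return combined.MultiplyPoint3x4(vertex);       // (M_b M_f) v_i
    }
}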


Fig. 3.4: Setting up layers of the Animator in Unity.

In Unity, this order of transformation is applied by setting $l_f$ as the base animation layer and $l_b$ as an overlaying layer in the Animator component attached to the corgi GameObject (Fig 3.4).

To realize a mixture of facial expressions, we start from the corgi's neutral-face position $v_i$ at vertex $i$. With each facial expression $k$, $k \in [1, \dots, K]$, represented by a translation $M_{k,i}$ at vertex $i$, where $K$ is the total number of facial expressions, the translation matrix $M_{f,i}$ at vertex $i$ is a weighted sum of all facial expressions:

$$M_{f,i} = \sum_{k=1}^{K} w_k\, M_{k,i}$$

where $w_k$ is the weight given to expression $k$. This setup helps us easily parametrize the agent's expressions to continuous mood data.
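As an illustration of this weighted blend, the sketch below applies the sum directly to mesh vertices; in the actual system the blend is carried by the .fbx animation layers rather than by editing the mesh at runtime, and the array names are hypothetical.

using UnityEngine;

public static class FaceBlend
{
    // Blend K facial-expression offsets into the neutral face (illustrative).
    // neutral[i] is the neutral-face vertex v_i; offsets[k][i] is expression k's
    // displacement at vertex i; weights[k] is the weight w_k of expression k.
    public static Vector3[] BlendFace(Vector3[] neutral, Vector3[][] offsets, float[] weights)
    {
        Vector3[] result = new Vector3[neutral.Length];
        for (int i = 0; i < neutral.Length; i++)
        {
            Vector3 v = neutral[i];
            for (int k = 0; k < weights.Length; k++)
            {
                v += weights[k] * offsets[k][i];   // v_i + sum_k w_k t_{k,i}
            }
            result[i] = v;
        }
        return result;
    }
}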

3.2.2 Navigation and Locomotion

Now that the agent has basic animations, we want to match the animations with its body displacement. For instance, a fast-moving agent needs to be paired with a running animation, while a slow-moving agent needs to be paired with a trotting or walking animation. In addition, an agent moving up or down a hill needs to have its body aligned with the slope of the hill, and the paws need to be grounded so the agent doesn't walk under the floor. Such an actuated system is often called a locomotion system. Locomotion refers to the various methods that a robot, often composed of muscles and motors, uses to move from one place to another. Advanced locomotion systems involve the design of physical models that control the joints [53]; recent developments simulate the mechanical properties of muscles [19] and extend the models from biped agents to unseen new animals [67].

In Unity, navigation can be done by adding a navigation mesh agent component, NavMeshAgent, to the agent GameObject. The NavMeshAgent AI module will compute the agent's shortest path and speed to reach a certain point along the surface of a 3D terrain. The agent's navigation speed and next position in the current frame are accessible through the NavMeshAgent. Below is the structure of the code to compute

the agent's animation speed with regard to its navigation speed in each rendered frame. The aim is to decide the agent's body animation based on its moving speed in the forward, left and right directions. If the agent is taking a left or right turn, the animation should play a turn of the body in the corresponding direction. Since the computational time of each real-time rendered frame is different, it should be taken into account when computing the agent's speed.

NavMeshAgent agent;                      // the NavMeshAgent component on the corgi
Vector2 velocity = Vector2.zero;
Vector2 deltaPosition = Vector2.zero;

void Update()   // called once per rendered frame
{
    if (agent.hasPath)   // the agent is moving
    {
        Vector3 worldDeltaPosition = agent.nextPosition - transform.position;

        // Get the moving speed in the forward (dy) and side (dx) directions
        float dy = Vector3.Dot(transform.forward, worldDeltaPosition);
        float dx = Vector3.Dot(transform.right, worldDeltaPosition);
        deltaPosition = new Vector2(dx, dy);

Now that we have the agent's moving speed in terms of the agent's current forward (dy) and side (dx) directions, we smooth it with regard to the speed in the previous frames to avoid sudden speed jumps. If the current frame took little time to compute, the resulting speed will be close to the previous frame's speed. In Unity, the time used to compute the current frame is stored in Time.deltaTime.

        // continued from the Update() loop above
        velocity = Vector2.Lerp(velocity, deltaPosition, Time.deltaTime / 0.15f);
        // Lerp(a, b, t) is the linear interpolation such that
        // when t == 0, result = a; when t == 1, result = b

        if (velocity.x > 0.1f)  { /* set animation to turning right */ }
        if (velocity.x < -0.1f) { /* set animation to turning left */ }
        if (velocity.y > 0.1f && velocity.y < 0.5f) { /* set animation to walking */ }
        if (velocity.y > 0.5f)  { /* set animation to running */ }
        // (etc.)
    }
}   // end of Update(), i.e. of each rendered frame

The agent's body should be aligned with the ground it's traveling on. To do this, we find the ground's normal direction by casting an (invisible) ray downward from the agent. The ray measures the distance from the agent to the ground and the tilting angle, both of which are then adjusted. As in the previous case, where we computed the speed of the agent, we smooth the measures using the computation time of the current frame.

Quaternion smoothTilt = Quaternion.identity;
RaycastHit rcHit;

void Update()   // once per rendered frame
{
    Vector3 theRay = -transform.up;   // the agent's down direction
    if (Physics.Raycast(transform.position, theRay, out rcHit))
    {
        float groundDis = rcHit.distance;
        // Rotation that tilts the body so that "up" matches the ground normal
        Quaternion grndTilt = Quaternion.FromToRotation(Vector3.up, rcHit.normal);
        smoothTilt = Quaternion.Slerp(smoothTilt, grndTilt, Time.deltaTime * 0.05f);
        transform.rotation = smoothTilt * transform.rotation;

        // Keep the paws on the ground rather than floating above or sinking below it
        Vector3 locPos = transform.localPosition;
        locPos.y -= groundDis;
        transform.localPosition = locPos;
    }
}

With the above we have implemented a simple locomotion system. It is not an advanced real-time 60fps locomotion setup, so there can be visible flaws. When 3D models have a photo-realistic look, human eyes are sensitive to the motion flaws. As the rendering becomes less and less realistic, human eyes become more tolerant towards imperfect character movements, as is often seen in low frame-rate animated movies. In section 3.4 we will talk about artistic and rendering choices to decrease the effect of the locomotion flaws.

3.2.3 Motivation

Emotional agents' believability is influenced significantly by the consistency of the agents' emotional reactions and motivational states [50]. In many animated movies, this consistency is needed to successfully build up viewer-agent empathy (Fig 3.5).

We set up the agent's motivational state as follows:

- Want friendship.

- Want to keep traveling.

- In A Trip to the Moon: Want to travel to the Moon.

- In Snowbound: Want to survive snow storms.

Fig. 3.5: Consistency between the agent's motivational states and emotional states.

Consistency means the agent should exhibit positive emotions when it approaches its motives, and negative emotions when it is blocked from them. At the beginning of the animation, the agent will display behaviors that inform the audience of its motivational states:

- Want friendship → the agent runs excitedly towards friends.

- Want to keep traveling → when each event is done, the agent moves forward.

- In A Trip to the Moon: Want to travel to the Moon → the agent keeps howling at the Moon.

- In Snowbound: Want to survive snow storms → the agent happily approaches warmth, such as that provided by a bonfire or a cave.

3.2.4 Attention

After some initial testing, we realized that one important piece needed for the agents to appear believable is an attention module. In 2002, Bruce demonstrated in a study that humans are more likely to engage in conversations with robots who "indicate attention by turning towards the person that the robot is addressing" [9]. In the 2002 book Emotions in Humans and Artifacts, it is said that an emotional robot's attention

should be diverted by "sensory input with high salience" (loud noises, bright light), or new motives (hunger), etc. [64].

Fig. 3.6: The above scene of two dogs walking side by side, gently looking at each other, is enabled by the attention module. Notice that the corgi's torso bends from the head to the shoulder, so it won't affect the walking.

In our system, the agent demonstrates attention by turning its head to show a change in visual focus. We assign the head joint to be the main joint that moves towards the point of interest. With large turns, the shoulders also turn to compose a natural look. When the agent approaches a new subject of interest, be it a flower, a dog friend, or some wild bears, it will turn in that direction even when its body is moving another way (Fig. 3.6). The attention module greatly improved the believability and demonstrated intelligence of the agent.
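A minimal sketch of such head turning, assuming a reference to the head joint's Transform and to the current point of interest; the component and field names are illustrative.

using UnityEngine;

// Smoothly turn the head joint toward the current point of interest each frame (illustrative).
public class AttentionModule : MonoBehaviour
{
    public Transform headJoint;        // the agent's head bone
    public Transform pointOfInterest;  // flower, dog friend, wild bear, ...
    public float turnSpeed = 2f;

    void LateUpdate()   // run after the body animation so the head override is not overwritten
    {
        if (pointOfInterest == null) return;
        Vector3 toTarget = pointOfInterest.position - headJoint.position;
        Quaternion look = Quaternion.LookRotation(toTarget);
        headJoint.rotation = Quaternion.Slerp(headJoint.rotation, look,
                                              Time.deltaTime * turnSpeed);
    }
}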

3.2.5 Perception and reaction

Now the agent knows how to navigate an uneven terrain, equipped with a basic library of animations, but it remains to be decided when to display which expression or movement. Since the animation is dictated by users' mood and behavior data, the agent should react to events the way the user reacted to life events. Hence the agents are only semi-autonomous emotional agents - they are more like actors that present a script.

In the experiments, users' affective experiences are encoded in units of mood-changing events, and a user's data is a series of mood-changing events. The moods are encoded with the valence-arousal chart (Fig 3.7). This encoding is supported by neuroscientist Kensinger's work, which states that events carrying emotional valence are more likely to be remembered, and that the valence and arousal of these events exert an important effect [35]. In the system, we set up 12 basic emotional states of the agent to represent users' emotions: calmSad, sad, content, tired, veryStressed, excited, tense, depressed, relaxed, upset, alert, normal. Each state has its unique set of animations and facial expressions to convey the emotion.

Fig. 3.7: The valence-arousal chart as presented in Kensinger's 2004 paper, Remembering emotional experiences: The contribution of valence and arousal. Affective expe- riences can be described in two dimensions: Valence refers to how positive or negative an event is, and arousal reflects whether an event is exciting/agitating or calming/soothing. Words have been placed at locations within this space, indicating their approximate valence and arousal ratings. [35]

If the user's input data falls onto a certain region of the valence-arousal chart, the output animation is a linear interpolation of the nearest-neighbor states, which decides the agent's reaction to events. Remixing the body animations and facial expressions produced distinct and convincing presentations of emotion.
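The sketch below illustrates such a nearest-neighbor lookup; the state coordinates are illustrative placements on the valence-arousal chart, not the exact ones used in the system.

using UnityEngine;
using System.Collections.Generic;
using System.Linq;

// Pick the two emotional states nearest to the user's (valence, arousal) point
// and compute a crossfade parameter between them (illustrative).
public static class EmotionLookup
{
    static readonly Dictionary<string, Vector2> states = new Dictionary<string, Vector2>
    {
        { "depressed", new Vector2(-0.8f, -0.5f) },
        { "sad",       new Vector2(-0.6f, -0.2f) },
        { "tense",     new Vector2(-0.5f,  0.7f) },
        { "content",   new Vector2( 0.5f, -0.3f) },
        { "excited",   new Vector2( 0.8f,  0.7f) },
        // ... remaining states
    };

    public static void NearestBlend(Vector2 mood,
                                    out string a, out string b, out float blend)
    {
        var ordered = states.OrderBy(s => Vector2.Distance(s.Value, mood)).ToList();
        a = ordered[0].Key;
        b = ordered[1].Key;
        float da = Vector2.Distance(ordered[0].Value, mood);
        float db = Vector2.Distance(ordered[1].Value, mood);
        blend = da / (da + db + 1e-6f);   // 0 = fully state a (the closer one), 1 = fully state b
    }
}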

Emotional granularity. Notice that there are more emotional states of the agent depicting negativity or low valence (calmSad, sad, tired, veryStressed, tense, depressed, upset) than positivity or high valence (content, excited, relaxed). This is designed so different negative emotions of the agent can be distinguished more clearly in the animation, enabling viewers to perceive one's negative emotions with high granularity. Emotional granularity refers to one's ability to "distinguish among a variety of negative and positive discrete emotions" [3]. By definition, an individual with low emotional granularity is more likely to treat a range of like-valence feelings as interchangeable. Studies show that negative emotion differentiation has a positive correlation with the frequency of negative emotion regulation [3]. That is, people who can better tell their negative emotions apart (upset from stressed from anxious) are likely to be better at regulating their negative emotions. We therefore designed a rich representation space for negative emotions, so viewers can reflect on these emotions and potentially benefit from knowing their differences.

3.3 Affective cinematography

3.3.1 Design considerations

To tell a story with a film, cinematography comes into play and is key to showing the coherence and integrity of the story, as well as to creating emotional connection. Unsuccessful cinematography design misses the key moments that demonstrate characters' personality, emotions and decisions. With this in mind, we'd like to design the shots algorithmically with knowledge of what's happening in the movie. However, this is hard to implement for two reasons. First, the randomization of locomotion makes it hard to be precise about what the agent is doing at any time. Second, there are numerous occluding objects in the scene, like trees and rocks, and it would be computationally expensive to include all of them in deciding the camera behavior for each frame.

With these challenges in mind, we experimented with setting a few default, "safe" shots, like an over-the-shoulder shot that closely follows the agent, and alternating these safe shots with riskier shots that are more likely to go wrong. A top shot looking down at the agent is a risky shot, because tree branches could block the camera, resulting in a black screen.

We soon found that the "safe" shots we chose are a little boring to watch, because they stiffly follow the agent without any indication of change, such as exiting old scenes or establishing new ones. They also feel unnatural because they perfectly ensure that the agent is always visible and the focus of the scene.

Eventually we chose to dynamically generate what we will call a tracking shot, a fixed-location camera with the ability to rotate to look at the agent, and a dampening effect - if the agent runs too fast and gets outside the frame, it takes the camera a second or two to catch up with it. Every 10 seconds or so, the tracking camera is re-located within a radius of the agent's current position. Because the camera rotates, no tree or rock can occlude the view forever! The outcome is a natural camera movement that allows for change of angle, imperfection and occasional surprises.

Since the camera is randomly placed, there is a chance that it will end up underground or inside a tree. By adding the height of the terrain at that spot, we make sure that the camera is at least above ground. To detect whether the camera has ended up inside a tree, we cast an invisible "ray" from the camera to the agent. If the ray intersects any surface, we re-locate the camera to another random spot until the view is clear:

// This is part of the script attached to the tracking camera.
GameObject corgi;
Vector3 pos;

void Start()
{
    corgi = GameObject.FindWithTag("Player");
    // repeat relocating the tracking camera every 10 s
    InvokeRepeating("MoveCam", 10f, 10f);
}

void MoveCam()
{
    pos = corgi.transform.position + new Vector3(Random.Range(1f, 8f), 0, Random.Range(1f, 8f));
    float y = Terrain.activeTerrain.SampleHeight(pos);
    y += Terrain.activeTerrain.GetPosition().y + Random.Range(1f, 5f);
    y = Mathf.Max(y, -15);
    pos = new Vector3(pos.x, y, pos.z);

    // if the camera is underground or inside other objects, resample a point
    int layerMask = 1 << 1;
    while (Physics.Raycast(pos, corgi.transform.position - pos,
                           (corgi.transform.position - pos).magnitude, layerMask))
    {
        pos = new Vector3(pos.x,
                          Terrain.activeTerrain.SampleHeight(pos)
                              + Terrain.activeTerrain.GetPosition().y + Random.Range(1f, 10f),
                          pos.z);
    }
    transform.position = pos;
}

In addition to the basic tracking camera, we designed a set of other camera shots triggered when certain conditions are satisfied, maximizing viewers' affective response.

3.3.2 Designing with the Cinemachine system

With the Cinemachine module in Unity, we crafted the following types of camera shots and set up conditions to trigger them (Table 3.1).

At runtime, the cameras are switched by tuning a dynamic CinemachineVirtualCamera.Priority score. Each camera shot is assigned a score, and the camera with the highest score gets deployed in each frame. This system avoids conflicts between camera shots when the triggering conditions of multiple cameras are satisfied at the same time. For example, when the agent is walking tiredly, the Priority score of the Dutch tilt camera is set to 30, which is higher than that of all the other cameras. If, at the same time, the agent displays a surprised look, the Priority score of the close-up shot is set to 40, and a close up of the agent's surprised face is rendered. The close-up shot can thus shadow the other, less common shots because it is more important in storytelling, revealing a change in the agent's mood.
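The sketch below shows how such priority switching could be wired up, assuming references to the relevant virtual cameras; the component name and the hold duration are illustrative, while the score values mirror the example above.

using UnityEngine;
using Cinemachine;

// Raise a shot's priority when its triggering condition is met; the Cinemachine
// brain then cuts to whichever virtual camera currently has the highest priority.
public class ShotSelector : MonoBehaviour
{
    public CinemachineVirtualCamera trackingShot;  // default shot, e.g. priority 10
    public CinemachineVirtualCamera dutchTilt;     // tired / anxious walking
    public CinemachineVirtualCamera closeUp;       // emotional state changes

    public void OnAgentTired(bool tired)
    {
        dutchTilt.Priority = tired ? 30 : 0;
    }

    public void OnEmotionChanged()
    {
        closeUp.Priority = 40;              // close up outranks the Dutch tilt
        Invoke("ResetCloseUp", 3f);         // hold the shot for a few seconds
    }

    void ResetCloseUp()
    {
        closeUp.Priority = 0;
    }
}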

Tab. 3.1: List of camera shots deployed in the system, ordered by frequency of occurrence.

- Tracking shot: the camera's position is still while it rotates to look at the agent. Trigger: default.
- Close up: the camera is close to the agent's face. Trigger: when the agent's emotional state changes.
- Over the shoulder: the camera is placed behind one agent's shoulder. Trigger: to establish agent-agent interactions when they run into each other.
- Arc shot: the subject is circled by the camera. Trigger: when the agent goes close to the Moon, the camera does an arc shot to show the Moon.
- Dutch tilt: the camera is tilted on its side. Trigger: when the agent is tired or anxious.
- Top shot: the camera looks at the scene directly from above, to show a contrast of the agent's smallness against the vast background. Trigger: when the agent faints, falls asleep, or discovers something good in a harsh environment.
- Bottom shot: the camera looks up at the character from below. Trigger: when the agent jumps in the air.
- Hand-held shot: the shot has a jerky, immediate feel created by adding a random displacement to the camera's position. Trigger: when the agent enters the scene.
- Earthquake shot: the camera shakes quickly and discretely to create the illusion that the ground is shaking. Trigger: when rocks are about to fall from the sky.

A few special camera effects are used to construct the cinematic atmosphere. We applied depth of field to the cameras, a blurring of objects not in focus, to help the audience concentrate on the agent. Vignette originated as an old-time film and TV defect in which the corners and edges of the screen are darkened. It also resembles people's tunnel vision, thus creating an impression of stress or nostalgia, depending on whether it's applied to a stressful or a sentimental scene.

3.4 Rendering considerations

Fig. 3.8: Design of different skyboxes. From left to right, top to bottom: sunny sky, sunset sky, morning sky, and stormy sky. Notice how the skybox and the lighting of the 3D assets together set the basic mood and atmosphere of the environment.

3.4.1 Lighting and weather

Lighting is an important tool we used to convey the joy and challenges in the environment. The environment is an externalization and extension of the agent's emotions. The fall of rain and shiny sunlight can both be used to convey the current mood of the agent. When the animation was shown to the audience in the experiments, they often "aw"-ed at the change of weather from sunny to snowy, as if empathizing with the agent's fate.

Lighting is composed of numerous parts in Unity, but a few mattered most in our project: the skybox, directional lights, the ambient light and fog. The skybox is a box with a 2D projection mapping that surrounds the whole terrain. In Fig 3.8, the sky as well as the green mountain are in the 2D skybox, while the foreground - the trees and the snowfield - consists of true 3D models. Since all scenes in this animation are outdoors, the skybox takes up a large proportion of the visual space and is key to setting the tone and atmosphere of the scene.

The five basic skyboxes (sunny, sunset, morning, stormy, night) are each matched with coordinating directional lights and ambient lights. Fog refers to overlaying the environment with a gradient as the camera depth increases (Fig. 3.9). The fog can blend the 3D environment naturally with the edge of the skybox, and also create a sense of depth.

Fig. 3.9: Dark vs. light-colored fog, with other lighting conditions the same.
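A sketch of switching these lighting elements at runtime through Unity's RenderSettings; the specific materials, colors and intensities are illustrative, not the values used in the system.

using UnityEngine;

// Swap the skybox and matching light/fog settings to set a scene's mood (illustrative values).
public class MoodLighting : MonoBehaviour
{
    public Material stormySkybox;       // one of the five hand-painted skyboxes
    public Light directionalLight;

    public void ApplyStormy()
    {
        RenderSettings.skybox = stormySkybox;
        RenderSettings.ambientLight = new Color(0.35f, 0.38f, 0.45f);
        RenderSettings.fog = true;
        RenderSettings.fogColor = new Color(0.5f, 0.55f, 0.6f);   // blends the terrain edge into the skybox
        RenderSettings.fogDensity = 0.02f;
        directionalLight.color = new Color(0.7f, 0.75f, 0.85f);
        directionalLight.intensity = 0.6f;
    }
}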

3.4.2 Stylized rendering

After some iterations, we used shading and hand-painted textures to compose a rendering style to mimic the look of traditional 2D animation (Fig. 3.10).

Cel shading refers to rendering with non-photorealistic illumination models: on an object's surface, the color is computed discretely rather than continuously according to the angle between the incoming light and the surface normal. Cel shading generates a cartoon effect. In addition to cel shading, we added hand-drawn outlines to the objects by computing the angle between the view line and the surface normal. When that angle is close to a right angle, meaning we are looking at the surface where it forms an edge, a black line is drawn.
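Stated as formulas, a sketch of the illumination model described above, with the number of color bands $q$ and the outline threshold $\epsilon$ as illustrative parameters:

$$I_{\text{cel}} = \frac{\big\lfloor\, q \cdot \max(0,\ \mathbf{N}\cdot\mathbf{L})\,\big\rfloor}{q}, \qquad \text{outline drawn where } |\mathbf{N}\cdot\mathbf{V}| < \epsilon,$$

where $\mathbf{N}$ is the surface normal, $\mathbf{L}$ the light direction, and $\mathbf{V}$ the view direction. The floor operation quantizes the diffuse term into $q$ discrete bands, and the outline test marks surface points seen nearly edge-on.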

Instead of using photos of surface texture, or computer-generated textures, we worked with artists to paint the textures for the skyboxes and the environment. The hand-painted sky and environment resemble the hand-painted backgrounds of traditional 2D animation.

Fig. 3.10: Comparison between the Unity standard rendering (left) and our stylized rendering (right).

We made this design choice to hide the imperfections in the graphics engine. With a realistic rendering approach, human eyes recognize the world as realistic and become sensitive in identifying flaws: deformed arms, the dog's paw going underground, etc. Considering that we can't eliminate all the flaws, as the animation is procedurally generated, we decided to render it in a hand-drawn style, where human eyes are much more tolerant of the physics of the world. When the dog's

3.4 Rendering considerations 37 foot steps underground, it looks like an artistic abstraction of the foot, rather than a physical conflict.

Experiments

4.1 Graphical Affects Validation

Before testing complete movies on users, we generated short clips of animation to see whether the emotions conveyed in the animation could be accurately perceived. Because each scene has so many degrees of freedom, we could not test every possible scene, so we chose four clips that represent four basic affective states. We call this first study "graphical affects validation".

We conducted an online survey of 30 subjects to evaluate the emotional effects of four key animation clips, each showing the agent being a) stressed and anxious, b) happy and energetic, c) frustrated and tired and d) sociable and friendly. Clips a) and b) show the agent in an environment with background music or ambient sound. Clips c) and d) show only the agent with no environment or sound. The environment and sound are designed to elicit the corresponding emotion. For example, in the video clip a), in order to indicate the high level of stress and anxiety, the lighting in the forest is dark, the landscape morphs into obstacles, the background music is eerie, and the agent is staring at the forest, panting heavily.

Users were asked to rate the corgi's happiness level, energy level, stress level, and calmness level on a 1-7 Likert scale, and to briefly describe whether they had an experience they could relate to the video.

4.1.1 Results

The results of the graphical affects validation survey are shown in Table 4.1. The results overall confirmed that the intended states were perceived from the animations: on average, video clips b) and d) have a high happiness score (6.23 and 6.77), while c) has a low happiness score (1.7) (Fig. 4.1). Users were generally able to perceive clip a) as a stressful situation (5.33), the agent in clip c) as sleepy or tired (1.86), and the agent in clip d) as sociable (6.36). T-tests comparing the mean scores of the perceived emotions show statistically significant differences: users rated the happiness level in clip d) higher than in c) by 3 points (p<0.001), the happiness level in clip b) higher than in a) by 1 point (p=0.0019), the stress level in clip a) higher than

Fig. 4.1: Distributions of ratings. The first row of four graphs shows how happy users perceived the dog to be in the four video clips (a: stress and anxiety; b: happy and energetic; c: frustrated and tired; d: social and friendly). Users have varied opinions about clip a) and fairly similar opinions about clips b), c) and d). The second row of graphs shows how users perceived the sleep quality of the dog in c) and the stress level of the situation in clip d).

in b) by 2 points (p<0.001), and both the energy level and the sleep quality of the agent in video clip b) 3 points higher than in c) (p<0.001, p=0.0029).

In users' descriptions of their responses to the videos, we see that users were able to relate personal experiences to the animation. When asked how the videos made them feel, 23 users responded "tense" or "anxious" to a), and 23 users responded "happy", "excited" or a synonym such as "exhilarated" to b). When watching clip b), users related the visuals to recalled experiences of "going on vacation", "opening Christmas presents as a child" and "when I see a steak".

Clip c) reminds users of experiences of lack of sleep, being rejected and times of being depressed. We recorded responses such as "how I feel after a long stressful day at work or at home", "getting stood up on a date" and "my own depression".

Clip d), indicating positive social interaction, made users think of good social experiences such as "seeing an old friend, feeling playful" and "seeing someone for the first time in a while".

The positive results of this graphical affects validation study suggest that users can connect to the agent in a well-designed animated scene at various emotional depths. We were able to proceed to create personalized animated movies based on the findings of this study.

4.1.2 Discussion

Clips a) and b) tested in the study included multiple components that might elicit viewers' emotions: the character's behavior, the environment, and also background sounds.

Animated affect                a) stressed, anxious   b) happy, energetic   c) frustrated, fatigued   d) social, friendly
Happiness                      3.86 ± 2.06            6.23 ± 1.35           1.70 ± 1.12               6.77 ± 0.50
Calmness                       5.00 ± 1.98            4.73 ± 2.02           2.43 ± 1.48               4.06 ± 1.43
Is the situation stressful?    5.33 ± 1.65            -                     -                         -
Energetic                      -                      6.23 ± 0.63           1.30 ± 0.53               -
Sleep quality                  -                      5.90 ± 1.58           1.86 ± 1.16               -
Sociable                       -                      6.23 ± 2.01           2.10 ± 1.35               6.36 ± 1.18

Tab. 4.1: Results of the graphical affects validation survey. For certain clips, certain questions are irrelevant to the affect of interest, so we did not ask those questions and left the cells blank. The scores indicate the mean ± standard deviation of the Likert-scale (1 to 7) rating for the corresponding affect. The higher the score, the happier / less calm / more energetic / better slept the agent appears to the user. The column headers describe the animated affect in the four clips.

As these components will always appear together in the story videos, we decided to test the effect of the composed scene rather than each element separately. As a result, we cannot claim how much the character and the environment each contributed to eliciting the desired emotion.

The study also revealed some specific instances of ambiguity in the animations. The responses of 8 out of 30 users indicated that it was unclear to them whether clip a) reflects "excitement" or "anxiety" in the dog, although they mostly agreed the clip depicts "a stressful situation". This suggests that these users perceived the stressful situation either as a challenge (excited response) or as a threat (anxious response). McCrae found that people tend to appraise a stressor as a loss, a threat, or a challenge, and deploy different coping mechanisms accordingly [43]. We therefore incorporated two questions, "How stressful was today?" and "How did you handle the stress?", in future versions of the survey and created different animations to separate these effects. In the movies, the ambiguity in the agent's facial expression is reduced by following the shot depicting the facial expression with attributional content (see the paragraph about stress in 3.1.1 for a detailed explanation).

4.2 A Trip to the Moon: Self-reflection

We conducted a one-week video study to evaluate the emotional effect of the personalized animated movies. The study protocol and recruitment process were approved in advance by the MIT Committee On the Use of Humans as Experimental Subjects (COUHES).

We sent emails to university labs and dorms, recruiting 27 participants aged 18-36. Random assignment resulted in 13 of them assigned to the control group and 14 to the personalized group. One participant in the control group dropped out of the study and asked to have their data deleted before watching their video, because school kept them too busy. All participants took pre-study surveys including tests on personality, stress level and mental health status. Each day during the week, participants reported their daily mood and behavioral data through an online Google Forms survey.

At the end of the week, participants received an animated movie through email and were instructed to watch it. Members of the personalized group each received a story that was personalized according to that individual's mood and behavioral data, while the control group received a non-personalized animated video of the same length, featuring a corgi dog rendered from the same models. Because the one-line cues are part of the personalization package, they were only used in the personalized group. The control video was generated from the real data of a preliminary study participant and shows the corgi dog running through a forest, facing a stressful rain of rocks, having one negative and one positive interaction with another dog, and in the end running uphill on a good note. The control video was picked such that it features some ups and downs that all participants might relate to their surveyed experience.

All participants were told the same story: that the video was customized to their mood and behavior data. After watching the videos, participants filled out an evaluation form. There were three questions that accepted free text-form responses, without prompting the participants with any of the researcher's preconceptions such as "stress" or "social interaction":

- Q1 Ignoring imperfect AI renderings, what do you think about the story in the video?

- Q2 What do you think about the main character (corgi dog)?

- Q3 This video is generated from your mood and behavioral data. What do you think the video reflected about you?

After they submitted the responses online, all participants received the gift card reward.

4.2.1 Group comparison

After one week, we received 10 responses to the video through Google Forms from the 12 people remaining active in the control group and 13 responses to the video from the 14 people in the personalized group. We first present a quantitative overview of the results by looking at behavioral measures such as the lengths of their responses, and then do a thematic analysis [6] to discuss participants' emotional engagement, connection to the agent and change in self-reflection.

Two researchers first read the responses with the research questions in mind, blinded to conditions, and came up with four quantitative cross-group comparisons. One researcher then coded the responses manually, blinded to conditions. First, we looked at how many participants in each group showed confusion about the story, by saying, for example, "I was confused" or "not clear what is happening". Second, we compared the average length of responses in the two groups. Third, we counted how many emotion-descriptive words (e.g. "emotional", "nostalgic") they used to describe themselves. Fourth, we compared whether they recalled a past experience from the animation. The comparison results are listed in Table 4.2. The results point toward the engaging effect of personalized animations (less confusion, more emotion words, lengthier writing and more recalling of past experiences), but the differences from the control group are not statistically significant.
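The simple quantitative measures above can be sketched in a few lines of code. The version below is illustrative only: the cue and keyword lists are hypothetical stand-ins, since in the study a researcher coded confusion, emotion words, and recall manually.

    # Illustrative re-implementation of the four response measures.
    # The keyword lists are hypothetical assumptions, not the coding scheme.
    CONFUSION_CUES = ["confused", "not clear", "don't know what"]
    EMOTION_WORDS = ["emotional", "nostalgic", "happy", "sad", "anxious", "calm"]
    RECALL_CUES = ["reminded me", "i remember", "last week", "when i"]

    def code_response(text: str) -> dict:
        t = text.lower()
        return {
            "confused": any(cue in t for cue in CONFUSION_CUES),
            "word_count": len(t.split()),
            "emotion_words": sum(t.count(w) for w in EMOTION_WORDS),
            "recalled_experience": any(cue in t for cue in RECALL_CUES),
        }

    def group_summary(responses):
        coded = [code_response(r) for r in responses]
        n = len(coded)
        return {
            "pct_confused": 100 * sum(c["confused"] for c in coded) / n,
            "avg_word_count": sum(c["word_count"] for c in coded) / n,
            "avg_emotion_words": sum(c["emotion_words"] for c in coded) / n,
            "pct_recall": 100 * sum(c["recalled_experience"] for c in coded) / n,
        }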

Considering that simple quantitative measures such as the above do not capture the full picture, we read the responses and found that they fell into three types, indifferent, intrigued, and invested, based on the length of each response and whether the participant connected the animation to themselves.

Indifferent. Participants who were indifferent showed confusion and negativity when asked what they think of the story. To Q1, they replied with confusion. To Q3, they denied that the video represents or reflects them and didn't relate the video to their personal experience. Their answers are all brief, adding up to fewer than around 80 words (M=42.8, SD=27.1), showing a low level of engagement.

Intrigued. Participants who were intrigued provided their interpretation of the story to Q1. They had some connection with the agent by describing the corgi dog with positive words such as "cute", "expressive". They wrote short but positive answers, the total word count was fewer than around 80 (M=40.1, SD= 26.6). They were able to discover one or two self-reflections.

Invested. Participants who were invested used the most emotion-related words to describe their feelings for the video. They wrote the longest answers, ranging from


around 80 to over 300 words (M=127, SD=93.9; the large standard deviation is due to a long tail of lengthy responses). They described a strong personal connection with the agent and could describe multiple reflections on their experiences and personalities. The defining factors of the three types of response are summarized in Table 4.3.

Group         Confused   Word count    Emotion     Recall
Control       50%        71.3 ± 45.7   2.2 ± 2.5   60%
Personalized  30.8%      84.5 ± 95.4   3.3 ± 3.0   84.6%
P-values      all > 0.1

Tab. 4.2: Comparison between the control group and the personalized group. Confused refers to the percentage of users showing confusion about the story (e.g. by saying "Don't know what's going on", "I am confused"). Word count is the group average word count (± standard deviation). Emotion refers to the average number of emotion-descriptive words (e.g. "emotional", "happy", "nostalgic") the user used in their response. Recall refers to the percentage of participants in the group recalling past experiences corresponding to the animation.

Type of response               Indifferent   Intrigued   Invested
Word count                     < 80          < 80        > 80
Confusion                      Yes           No          No
Described emotions they felt   No            Yes         Yes
Wrote self-reflection          No            Yes         Yes

Tab. 4.3: Three types of responses to the video.

Fig. 4.2: Distribution of the three types of responses among the two study groups.

The distribution of the three types of responses in the two study groups is shown in Fig. 4.2. While the overall study is limited to a relatively small number of people, we can see higher intrigued and invested frequencies in the personalized group than in the control group. Half (5 out of 10) of the participants in the control group showed indifference about the video, while only 1 out of 13 participants in the personalized group did. The difference in types of responses between the two groups is statistically significant (p=0.036 at confidence level 0.95) using an independent-samples unequal-variance t-test. The result suggests that while participants in both groups tried to reflect on their behavior when told the video was generated from their personal data, the personalized videos were more interpretable and more successful at eliciting personal reflection than a generic, non-personalized animation (even though the participants in both groups were told it was "personalized").
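As a sketch of this comparison, assuming the three response types are coded numerically (0 = indifferent, 1 = intrigued, 2 = invested; the coding and the example counts below are our illustrative assumptions, not the study's raw data), Welch's unequal-variance t-test can be run with scipy:

    from scipy import stats

    # Hypothetical numeric coding of response types per participant:
    # 0 = indifferent, 1 = intrigued, 2 = invested (illustrative data only).
    control      = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]              # 10 responses
    personalized = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]     # 13 responses

    # Welch's t-test: independent samples, unequal variances.
    t_stat, p_value = stats.ttest_ind(personalized, control, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")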

4.2.2 Emotional engagement

Below we demonstrate the effect of personalized animation with examples. We mostly list responses from the personalized group, omitting those of the control group, half of which describe confusion and indifference anyway. Interested readers can refer to this online database for a full collection of participant responses.

Among the intrigued and invested participants, emotion responses range from low arousal ("calm") to high arousal ("super excited"), low valence ("sad", "didn't like") to high valence ("glad"), and more complex emotions such as nostalgia and independence. Below are some answers to Q1, "what do you think about the story", from participants whose age, gender and group (P for personalized, C for control) are included in brackets.

Happy: "Ifeel glad that the corgi was able to walk on the moon after a few chapters featuring ups and downs." (25, male, P)

Sad: "It made me feel nostalgic and at some points a little sad." (18, male, P) "Itseemed kind of aimless and repetitive at times, but was also beautifully poignant at others." (19, male, P)

Calm: "Ifeel calm and relaxed." (30, male, P)

Unsettling: "It was scary and unsettling in the middle, but the nice music as I/the dog chilled on the moon made it slightly better" (19, female, P)

Dynamic and evolving: "The very beginning made me super excited, because corgis are great and I recognized the phrases I entered. I could tell the story began happily. The video certainly dipped into a spooky atmosphere, and I felt anxious watching the corgi become separated from its friend by a wall of rocks. The middle made me nervous that something was going to pop out on the screen. The end greatly confused me, and I felt neither happy nor sad. But it seemed the corgi wanted to reach the moon all along, and it finally achieved its goal." (20, female, P)

Denial of one's behavior. Some participants were engaged by the story, but showed a negative taste for certain behaviors of the agent. "I didn't like the story. The corgi didn't really interact with the other dog and was alone at the end." (19, male, P)

This attitude of denial indicates that the participant might have felt self-conscious or offended when seeing their undesired behaviors animated on a character. The same participant wrote, "I have a hard time connecting with others," when asked for his self-reflection in Q3. This response is consistent with the behavior he disliked of the agent. It suggests that seeing a truthful reflection of one's undesired behavior could be a relatively unpleasant emotional experience.

From this exploration, and the seriousness with which participants reflected on this automated animated portrayal of their data, we believe that personalized story videos should carefully balance unpleasantness with hope and positivity. We do not wish videos presenting bad moods or unwanted behavior data to leave a participant unable to overcome the initial discomfort; rather, we want to enable them to emotionally reflect, but then also to feel motivated to change their behavior. As another participant pointed out, "I was kind of shocked to see things suddenly get dark, but I think it also made sense... It was just a little jarring to come face to face with." (19, female, P)

Notice that certain participants in the control group (5/10) were also emotionally invested in the video, even though the personalization of their videos was a ruse. One participant wrote, "It left me with this feeling like I can be independent which is something I'm also working on after years of being dependent on significant others and because I'm still recovering from a hard break-up, seeing the corgi realize he didn't (rely on) the other dog to be happy made me happy." (22, female, C) This result was expected from the Barnum effect, but we also expected the effects to be greater in the personalized group. The high engagement in the control group supports our animation and cinematographic design for helping viewers connect emotionally with the character.

4.2.3 Human-agent connection

We analyzed the participants' empathy for the agent by looking at replies to Q2, "What do you think about the main character (corgi dog)?"

12 out of all 23 participants, and 8 out of the 13 participants in the personalized group directly described their connection to the agent. "The dog seemed kind of lonely for most of the video. For some reason (maybe its facial expression and tongue flapping) the dog appeared naive and kind of stupid to me, but also always hopeful. I

felt a deep emotional connection to the dog at times." (19, male, P) "I like it, it is cute and expressive." (18, male, P) "The corgi is adorable. I was sad when the dog looked sad/sleepy. While watching it, I thought of how I could make the corgi happier and then at the end, it was so happy :D" (22, female, P) One participant showed that they identified with the agent by referring to themselves and the agent as the same entity: "I/the dog chilled on the Moon".

Some participants' attitude towards the agent was more descriptive of what the agent looked like and less descriptive of their personal connection. "His mood changes very quickly." (19, male, P) "Temperamental." (21, female, P) An 18-year-old female participant (P) considered the dog "high", which was not one of the intended states, although it could be possible to also ask participants about their drug and alcohol behavior and reflect such use on their avatar. A 19-year-old male participant (C) said the agent "has communication and social intelligence problems". His reply to Q3, "That I was confused and in a mentally bad place," seems to indicate that he may have empathized with the agent's visually demonstrated challenges and problems.

4.2.4 Self-reflection

The one-line reminders provided daily by each participant appeared as chapter titles in their personalized animations, which served as memory retrieval cues. Two participants explicitly commented on the effect of the retrieval cues, "I was also vaguely amazed at how well I could recall each day based on the words in the chapter names <- which also kept me invested, because I knew it was me." (19, female, P) "The very beginning made me super excited, because corgis are great and I recognized the phrases I entered." (20, female, P)

Participants independently came to conclusions about their mood and behavior patterns when answering Q3, "what do you think the video reflected about you". Notice that they were not prompted to write about "friends", "exercise" or "stress", nor was there any text in the animation indicating that a particular scene was about a certain behavior. In other words, the reflections were based solely on their understanding of the story world and their memory of doing the surveys.

Overall mood. "It reflected that I have been fairly happy the past couple of days." (30, male, P)

"I've been through a lot recently, but try to keep my head up and can do so because of myfriends." (21, female, C)

4.2 A Trip to the Moon: Self-reflection 47 "Mostly happy with some occasional stress, maybe the interaction with the other dog represents an interaction with another person, also I sometimes think of my personality as a geometricallyrendered dog." (20, female, C)

Social life. The animated video portrayed the self-reported social interactions as interactions between the participant's avatar and another dog, prompting some to reflect on their social life and its interaction with their mood. A number of participants reflected in writing on aspects of their social life that need improvement, such as loneliness and isolation. "I have a hard time connecting with others." (19, male, P) "Needs more friends." (18, female, P) "When I isolate myself, I tend to be sad and unproductive. I sometimes need time away from people, but that can turn into a negative thing when there's too much of it." (19, female, C) "The presence of other dogs seems to reflect how my mood strongly depends on my interactions with other people. When the corgi got walled off by all those rocks, I was reminded how lonely I feel when I go without human interaction for too long." (20, female, P)

Stress. When there was a stress scene in the personalized videos, participants were usually able to identify the tense atmosphere and connect it with their stressful experience. "The wall of rocks also corresponded well with all the homework I received this past week." (20, female, P) "That I am sometimes calm and collected but other times I am overwhelmed by stress and wants to run away/not know how to handle the stress." (22, female, P) One participant (P) was, however, unable to identify the source of the stress: "(The video reflects) that I'm angry?? It made the world seem very scary and out to get me. Why was this dog on a planet in the middle of nowhere with hostile boulders and angry other dogs?"

Sleep and exercise. Fewer participants discovered the relationship between sleep, exercise, and mood. One 18-year-old male participant (P) thought he could have been more "active", which could refer to either a mental or a physical state. No participant mentioned "sleep", though some of them had very irregular sleeping schedules, causing the agent to appear tired and fall asleep often in the video. Because we did not give participants prior instructions on how to interpret the animation, participants could have perceived the lack of sleep either as a physical state or as mental tiredness. Whether the agent acted energetically or sluggishly can likewise be interpreted differently by different participants.

It is interesting how certain participants also came to a conclusion about mental states that are not included in the surveys, and thus not intentionally included in the animations. In other words, some thought that the animated video, or the system behind it, had the intelligence to speculate on their "hidden states:"

48 Chapter 4 Experiments "The video also implies I'm more of a follower, since I'm not the type to push people to do things. I'm not sure what the moon represents, but if the video is trying to imply I space out a lot, it'd be very correct." (20, female, P)

"I've mostly been around people this week, but sometimes, I do feel a little lonely off and on; I'm surprised the video caught that even though there's no question asking if Ifeel lonely, so I'm impressed that this "AI" somehow captured that vulnerability in me." (22, female, C)

4.2.5 Discussion

Impact on Mood

Viewing MPM can influence mood. In the study, participants reported feeling various, and often a mixture of, emotions while watching MPM: "happy", "nostalgic", "glad", "sad". It seems that, depending on the content of MPM as dictated by user data, various emotions can be triggered and mood change in all directions can occur. In the psychology literature, mood is often studied as a state that can change within a window of minutes, for example at the playing of music or the flickering of images [45]. Music can have a powerful influence on viewers' mood. It is difficult to determine whether music alone caused the users' mood to change, or whether it worked in combination with the story. We would hope that the story was the most important factor in altering mood, but further studies are needed to determine the contribution of these components.

Focus on Behaviors of Interest

In order to better represent each user's past week, and also in response to users' requests in the preliminary study, our video study involved mood and four types of behaviors (sleep, exercise, social interaction, sense of purpose). As a result, it was harder for participants to intuitively understand how their sleep and exercise affected the story. Future videos could focus on fewer dimensions of data and dedicate the story to emphasizing the largest change in the data, or a change that the user specifies as being of particular interest to them. Slight changes to the automation algorithm could enable the personal animation to function as a kind of amplifier of the behaviors on which users most want to reflect.

Balance hope and negativity

Emotionally engaging stories require the alternation of ups and downs in the agent's experiences, and can either have a happy ending or a sad ending. However, for some audiences, the negative part of the story is more difficult to come face to face with. Certain participants found the dark part of the story "jarring", "scary" and "unsettling". Our study findings suggest that while negativity might seem discouraging to some individuals, it can be carefully balanced with positivity and the hope to change. Future work needs to consider how to handle this balance in the case when all the data from the participant is negative. In our next study, Snowbound, we included a question at the end of each daily survey, asking the participants to think of one thing they are grateful for today, so we have some positive data in case their data profile is mostly negative.

As movie creators, we are not constrained by any set of rules to present the emotions of the movie. As a result, we have the ability to "manipulate" users' perception of their behaviors. We can zoom in on a positive experience or dwell on a negative one, depending on the desired cognitive and psychological response. Future studies can dig more into these effects on a personalized basis.

Judging good work. The above discussion naturally leads to the question: what do we define as "good" personalized animation? From our interactions with participants, we see personalized animation potentially serving different purposes: self-reflection, changing behavior, or even communication of emotions. Depending on the purpose, the standards for "good personalized animation" can vary. Generally, a good personalized video should be able to elicit curiosity and empathy - viewers keep anticipating what happens next out of compassion for the characters (themselves); subsequently, what designers want users to do with that feeling can be tailored from case to case.

Content

Based on participants' feedback, we gradually shortened MPM to a maximum of 8 to 10 minutes to keep the audience engaged, because participants reported that longer videos were "slow-paced" or "long-winded". This is partially due to the fact that we have a limited library of agent behaviors and events, and in long videos the repetition of behaviors and events might disengage the viewer. As with computer games and animated movies, we narrowed the "gameplay time" down to a range that our amount of content can support.

From our post-study survey of 27 participants, the average interest in watching more videos generated from future data is 5.25 on a Likert scale of 1 to 7, indicating that participants are overall curious to see how their personal animation story changes when their behaviors and experiences change.

4.3 Snowbound: Changing implicit self-esteem

The results from the previous study made us wonder whether any measurable change other than mood happened to participants when they viewed their MPM. The observation that viewers found the story "accurate in describing their week" even when the video was generic and ambiguous suggests that there is room to be creative about the content and to show viewers one side of the story more than the other.

One hypothesis, for example, is that MPM can subtly nudge people to think more positively or negatively about their past experiences if we deliberately show more positive or negative content. Studies show that people's experienced past events and remembered past events can differ [55]. The remembered experience is often determined by certain moments, such as the peak and the end of the experience [55], rather than by its integral [32]. Positive reminiscing about past experiences is positively correlated with perceived ability to enjoy life [10]. Given the above, we would like to design MPM as a tool to help balance positive and negative reminiscing. It can induce constructive reflection on negative experiences for those who are already emotionally comfortable, or give a positive boost to the emotionally afflicted.

4.3.1 The Shape of Stories

It's intuitive that showing more positive content might induce reminiscence of more positive things. In addition to the proportion of positive content, we think it also matters where to put the positive content, and the dynamics it produces in relation to the whole story video.

Kurt Vonnegut in a 1995 lecture [66] proposed that successful stories in literature have "shapes", defined on a scale of good fortune - ill fortune over the course of time (Fig. 4.3).

For example, "man in hole" is the body of stories that starts good, declines with bad luck and challenges, and the characters overcome back luck to reunite with fortune in the end.


OUT" 00

Fig. 4.3: Three types of story shapes by Kurt Vonnegut: man-in-hole, boy-meets-girl, and Cinderella.

Fig. 4.4: Six emotional story arcs overlaid with the emotional trajectories of the closest 20 books. On the top, from left to right: rise, man-in-hole, Cinderella. On the bottom, from left to right: tragedy, Icarus, Oedipus.

"Boy meets girl" describes normal characters running into something marvelous, lose it and eventually gaining it back.

"Cinderella" is another shape of stories that start with bad fortune, and hope builds up - until the dramatic moment everything is lost, but eventually the character obtains infinite happiness.

Using data mining, researchers have examined Vonnegut's shape-of-stories theory against online collections of fiction and their popularity [54]. With sentence-level sentiment analysis and a sliding-window approach, six clusters of popular emotional story arcs emerged from 1,327 stories. The results are shown in Fig. 4.4.

To apply the emotional story arcs to our animation stories, we assigned labels to 8 types of story arcs (Table 4.4).

We encoded the valence trajectory of MPM from the valence and arousal of each animation event (Fig. 4.5). Each event that the agent encountered is represented by a triangle dot. If the event had a positive, negative or neutral impact on mood, its valence was encoded as 1, -1 or 0. Because it is not intuitive to compare

Label   Story arc       Explanation
R       rise            goes from low all the way up
TRD     tragedy         goes from high all the way down
MIH     man in hole     -
RF      rise-fall       rises and falls
ODP     Oedipus         the opposite of Cinderella: starts by going down into a valley, then goes up with more hope, but eventually ends in tragedy
CDRL    Cinderella      -
FP      flat-positive   all events are positive
FN      flat-negative   all events are negative

Tab. 4.4: Labeling emotional story arcs.

the positivity of events in MPM, such as escaping a bear attack versus a happy walk with a dog friend, we simply encoded whether an event was positive or negative in order to see the overall story curve. We then fitted a polynomial curve to the discrete mood-changing events. To avoid sharp peaks and valleys in the polynomial curve, the discrete mood-changing events are first smoothed with a Savitzky-Golay filter, where the output value Y_j at data point (j, y_j), j = 1, 2, ..., n, is computed with a set of convolution coefficients C_i:

$$ Y_j = \sum_{i=-(m-1)/2}^{(m-1)/2} C_i \, y_{j+i}, \qquad \tfrac{m+1}{2} \le j \le n - \tfrac{m-1}{2} $$

Given that most participants in the study had from 3 to 15 mood-changing events over the week, we chose m = 5, with the zeroth-order convolution coefficients {-0.086, 0.343, 0.486, 0.343, -0.086}.
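A minimal sketch of this smoothing step is shown below. It convolves a made-up event sequence with the coefficients quoted above; scipy's savgol_filter with window length 5 and polynomial order 2 yields the same zeroth-order coefficients (the two outputs agree in the interior of the sequence, though edge handling differs):

    import numpy as np
    from scipy.signal import savgol_filter

    # Valence of discrete mood-changing events: +1 positive, -1 negative, 0 neutral.
    # The sequence below is a made-up example, not a participant's actual data.
    events = np.array([1, 1, -1, 0, 1, -1, -1, 1, 1], dtype=float)

    # Manual convolution with the m = 5 zeroth-order Savitzky-Golay coefficients
    # quoted in the text (the kernel is symmetric, so flipping does not matter).
    coeffs = np.array([-0.086, 0.343, 0.486, 0.343, -0.086])
    smoothed_manual = np.convolve(events, coeffs, mode="same")

    # Equivalent call via scipy (window_length=5, polyorder=2); differs only
    # in how the first and last two samples are handled.
    smoothed_scipy = savgol_filter(events, window_length=5, polyorder=2)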

4.3.2 Implicit Association Test and Implicit Self-Esteem

Because MPM lasts no more than 8-10 minutes, measuring its effect as an intervention raises the question of what test to give participants before and after MPM that would objectively reflect change over such a short time frame. Specifically, the


Fig. 4.5: Story arcs of certain participants' animation stories. The vertical axis signifies valence, with 1 being positive, 0 being neutral and -1 being negative. The individual triangle dots show mood-changing events and where they lie on the valence scale. The curve is a smoothed polynomial fit to the mood-changing events, which demonstrates the change in the valence trajectory. The color coding of the curve shows the change in arousal, with red showing high arousal and blue showing low arousal. The corresponding label for each graph is (from left to right, top to bottom): R, CDR, MIH, FP, RF, CDR, MIH, RF.

memory of taking the test a few minutes earlier should not significantly impact the participant's responses when taking the test a second time. The implicit association test (IAT) is such a test: it is hard for participants to fake or voluntarily manipulate, even when told to do so [36].

We use the IAT to measure the participants' implicit self-esteem before and after MPM. It has previously been shown that improved self-esteem can mediate positive behavior change among adolescents [49], and that the development of self-esteem can impact preventive health behaviors such as exercise and diet [29].

Implicit self-esteem is similar to explicit self-esteem, the latter being measured with self-reports. In an explicit self-esteem test, test takers are shown statements, such as "On the whole, I am satisfied with myself", and respond to them with levels of agreement [58]. Bosson [5] compared the test-retest reliability of the IAT for implicit self-esteem with that of the Rosenberg (explicit) self-esteem scale (RSE), and found that the IAT's test-retest reliability is 0.69, while that of the RSE is 0.80. While the test-retest reliability of the implicit self-esteem IAT is close to that of other IATs, it is "acceptable, (albeit low)" [5]. We chose not to use the RSE, however, because within the short time frame of MPM, participants could be affected by their responses to the first test.

4.3.3 Hypotheses

Hypothesis 1

Participants perceive positive content in MPM more positively than negative content.

Hypothesis 2

Viewing MPM with augmented positive content (as in the test group in the Snowbound study) results in a rise in implicit self-esteem, compared to no augmentation (as in the control group in the Snowbound study).

Hypothesis 3

Viewing MPM that shows mostly positive moments of the previous week results in a rise in implicit self-esteem.

Hypothesis 4

Viewing MPM that shows certain rising emotional trajectories of the previous week results in a rise in implicit self-esteem. The concept of emotional trajectories of stories is explained in Section 4.3.1.

Hypothesis 5

Participants with higher PHQ9 scores (more depressed) show lower-valence facial expressions upon seeing positive content in their animation.

4.3.4 Study

We recruited 51 participants, aged 18-62. Enrollment required that participants score 2 or above on the PHQ2 form sent with the recruitment email [41]. This condition helped us focus on a population experiencing mild to moderate depression. Enrolled participants did not disclose their PHQ2 scores to us. During the consent session, they took the PHQ9 test and the Perceived Stress Scale.

As in the previous study, participants recorded their mood and behavior data for a week. To address the problem of balancing hope and negativity, we added a question to the end of the daily survey: "Think of something you're grateful for today and write it down". We call this piece of text the "positive highlight" of their day. This treatment ensured that we always had data with which to animate the positive side of the participants' lives.

At the end of the week, all participants came back to the lab to watch their videos and provided feedback. 46 out of 51 participants completed the study. Participants were randomized into two groups, the control group and the test group.

The control group watched a personalized animated movie generated from all the data the participant reported, except their "positive highlights". The data, depending on the participant's input, include positive or negative mood-changing events.

The test group watched a personalized movie that included the "positive highlights". We combined a local keyword search with Google Cloud text analysis to identify whether a "positive highlight" describes a social activity, an activity related to sleep, or something else. For example, if the participant mentions "mom", it is animated as a social activity

with another dog; if the text is "fun with family", it is animated as hanging out with some deer; if the text is "sleep and rest", the animation shows the dog taking a nap. Otherwise, we simply show the dog finding a warm, burning bonfire in the forest and hanging out around the fire, a comforting little scene.
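A simplified sketch of the local keyword step is shown below; the Google Cloud text analysis stage is omitted, and the keyword lists and category names are illustrative assumptions rather than the exact production rules:

    # Illustrative keyword matching for mapping a "positive highlight" to an
    # animation event. The real pipeline also queried Google Cloud text
    # analysis; that step is omitted here, and the keyword lists are assumed.
    SOCIAL_WORDS = {"mom", "dad", "friend", "friends", "family", "sister", "brother"}
    SLEEP_WORDS = {"sleep", "nap", "rest"}

    def classify_highlight(text: str) -> str:
        words = set(text.lower().split())
        if words & SOCIAL_WORDS:
            return "social"      # animated as hanging out with another dog / deer
        if words & SLEEP_WORDS:
            return "sleep"       # animated as the dog taking a nap
        return "bonfire"         # default: the dog finds a warm bonfire

    print(classify_highlight("fun with family"))    # -> social
    print(classify_highlight("sleep and rest"))     # -> sleep
    print(classify_highlight("finished my essay"))  # -> bonfire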

We put a hidden camera in the room to record participants' facial expressions while they viewed their animation, without informing them of the recording, in order to obtain natural facial expressions. After the study was finished, participants were debriefed about the hidden camera and were given the choice to either a) have the recorded video permanently deleted, b) give us permission to use the de-identified emotion data derived from the recording, but not the recorded video, or c) give us permission to use both the de-identified emotion data and the recorded video. Out of the 33 participants whose facial expression data was recorded, 2 picked a), 9 picked c), and the rest picked b).

Before participants watched the video, they were instructed to take the IAT for implicit self-esteem. Then they watched their personalized video while their facial expressions were secretly recorded. After the video ended, they were prompted to write down responses to three reflective questions, the same as in the previous A Trip to the Moon study:

- Ignoring imperfect AI renderings, what do you think about the story? (Don't worry if you feel like you don't know how to read into it. It could be generally how you feel instead of solving a puzzle of the plot.)

- The video is generated from your mood and behavior data. What do you think the video reflected about you?

" Recall the most memorable moments in the video. Describe the events in the past week that they correspond to.

We put the reflective questions before the second IAT because participants often reported feeling a mixture of emotions immediately after watching the animation. The reflective questions act as a buffer period for them to identify and reflect on their emotions. Finally, they took the IAT again.

4.3.5 Data encoding

IAT results. The results of an IAT are represented by the IAT D score, where a positive D score indicates that participants were faster in the compatible (me-pleasant) block than in the incompatible (me-unpleasant) block [11].
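As a rough sketch, one common variant of the D-score computation divides the difference in mean latencies between the incompatible and compatible blocks by the pooled standard deviation of all latencies; the exact scoring procedure used in the study follows [11] and may differ in details such as error-trial handling:

    import numpy as np

    def iat_d_score(compatible_rt, incompatible_rt):
        """One common variant of the IAT D score: difference of block mean
        latencies divided by the pooled standard deviation of all latencies.
        Positive D -> faster in the compatible (me-pleasant) block."""
        compatible_rt = np.asarray(compatible_rt, dtype=float)
        incompatible_rt = np.asarray(incompatible_rt, dtype=float)
        pooled_sd = np.std(np.concatenate([compatible_rt, incompatible_rt]), ddof=1)
        return (incompatible_rt.mean() - compatible_rt.mean()) / pooled_sd

    # Made-up latencies (milliseconds), for illustration only.
    d = iat_d_score(compatible_rt=[620, 700, 655, 590],
                    incompatible_rt=[810, 900, 760, 870])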

4.3 Snowbound: Changing implicit self-esteem 57 Facial expression. We encoded the time when an emotionally salient (positive/nega- tive) moment occurred in each participant's animation video, blinded to participants' information and their facial expressions. A positive moment is when, for instance, the agent finds a bonfire or a shelter, or is approached by friendly animals. Partici- pants' facial expression data in combination with the content will help us understand how participants perceived the emotional valence of the stories.

We used the Affectiva SDK to detect facial expressions and emotions from the recorded videos [44]. The emotions of interest include smile, contempt, smirk, sadness, etc. Each emotion ranges from 0 (no emotion detected) to 100 (highest emotion detected). Valence ranges from -100 to 100, indicating extreme negativity to extreme positivity. For each participant i and the m-th time positive content occurs in their animation, m = 1, 2, ..., S_i, we look at the area under the curve (AUC) of an emotion, starting 1 second before the positive moment until 5 seconds after it. That is, we take a window of [-1, 5] seconds of their facial expressions, {e_k}, k = 1, 2, ..., N_m, where N_m is the total number of emotion frames we can detect within window m. N_m can vary from window to window because the participant's face can sometimes be occluded by their hands, or by the participant leaning too close to the monitor, etc. We compute the area under the curve for an emotion by averaging over all such windows:

$$ \mathrm{AUC}_i^{\pm}(\mathrm{Emotion}) = \frac{1}{S_i} \sum_{m=1}^{S_i} \frac{1}{N_m} \sum_{k=1}^{N_m} e_k $$

We use AUC+ to denote facial expressions following positive content, and AUC- for facial expressions following negative content. 19 participants have valid facial expression data for positive moments, 15 participants have data for negative moments, and only 11 participants have data for both positive and negative moments.

We can then compute facial expression AUCs, such as AUC_i(Smile) and AUC_i(Smirk), as well as likelihoods of emotions, AUC_i(Joy), AUC_i(Contempt), AUC_i(Sadness), for each participant i. The likelihood of an emotion is determined by several facial expressions, based on the EMFACS mappings [14]. For example, the likelihood of joy is increased by the presence of a smile, but reduced by the presence of a brow raise or brow furrow.
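A minimal sketch of this windowed averaging, assuming the facial-expression frames are available as (timestamp, intensity) pairs per emotion channel (the variable names are ours, not the Affectiva SDK's):

    import numpy as np

    def emotion_auc(timestamps, intensities, event_times, pre=1.0, post=5.0):
        """Average an emotion channel over [-pre, +post]-second windows around
        emotionally salient moments, then average across windows. Windows with
        no detected frames (e.g. the face was occluded) are skipped."""
        timestamps = np.asarray(timestamps, dtype=float)
        intensities = np.asarray(intensities, dtype=float)
        window_means = []
        for t_event in event_times:
            mask = (timestamps >= t_event - pre) & (timestamps <= t_event + post)
            if mask.any():                      # N_m > 0 frames in this window
                window_means.append(intensities[mask].mean())
        return float(np.mean(window_means)) if window_means else float("nan")

    # Illustrative usage with made-up data: smile intensity sampled over time,
    # and two positive moments in the participant's movie.
    t = np.arange(0, 60, 0.5)                   # seconds
    smile = np.random.default_rng(0).uniform(0, 100, t.size)
    auc_plus_smile = emotion_auc(t, smile, event_times=[12.0, 40.0])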

4.3.6 Group findings

This section presents an analysis of all participants' responses using statistical methods. In Section 4.3.7, we zoom in on a few participants with particularly interesting responses and provide case studies.

Facial expression analysis shows that participants were generally very attentive to and engaged with the animation. The average participant has a mean engagement of 12.6 ± 10.1 and a maximum engagement of 94.81 ± 13.1. Participants displayed a variety of facial expressions when viewing MPM: smile, frown, smirk, brow furrow, etc. One participant was moved to tears.

Hypothesis 1: Participants perceive positive content in MPM more positively than negative content. Finding: Facial expression does not show adequate supporting evidence.

We used facial expressions to study participants' emotional responses to MPM. We compared the distributions of AUC+ and AUC- for all participants on 5 measures: valence, joy, sadness, contempt, and engagement (Fig 4.6). Looking at engagement, negative content seems slightly more engaging than positive content, with a difference of 2.0 ± 4.6. While the average AUC+(Joy) is higher than AUC-(Joy), and the average AUC-(Sadness) is higher than AUC+(Sadness), these differences are not statistically significant and have large variances (Table 4.5). Participants show higher AUC+(Contempt) than AUC-(Contempt) by 7.3 ± 9.4, with p-value 0.090. The average engagement over all of MPM is higher than that at negative moments, as measured by a t-test with p-value 0.054.

6 out of 11 (54.5%) participants had AUC+(Valence) > AUC-(Valence). A comparison between the control and test groups did not show a significant difference in facial expressions. On one hand, this ambiguity might be attributed to the high variance of facial expressions: while some facial expressions of emotion are universal across cultures [15], many expressions are linked to multiple emotions [33]. Facial expressions can provide insights into emotional experiences that differ from those of self-reports [33]. When passively viewing media, participants might only partially show their emotional change through facial behaviors, or not show it at all.

In addition, participants might perceive the content as positive or negative, but feel differently because of the memories and emotions elicited. The researchers observed participants frowning at a happy family reunion, or smiling with amusement when a fight with a sibling happened. Hence the positivity in facial expression might not directly reflect how the content was perceived.

Hypotheses 2-4: Is the proportion of positive content, the emotional trajectory of MPM, or participants' facial expression correlated with the change in D scores? Finding: The average valence of facial expression while watching MPM is positively correlated with the change in D scores.


Fig. 4.6: Box plots of AUC+ and AUC- for all participants. In the top row, from left to right: joy, sadness, contempt. In the bottom row, from left to right: valence, engagement, attention. The "AUC-" and "AUC+" labels refer to the emotion at negative and positive moments in MPM, and "all" is an average over all of MPM. The box extends from the lower to the upper quartile values [Q1, Q3], and the whiskers show the closest data points within the range [Q1 - 1.5 IQR, Q3 + 1.5 IQR]. Flier points are those past the end of the whiskers. The thin orange line shows the median. The thick purple line shows the mean.

                                     Valence       Joy          Sadness       Contempt     Engagement
ave(AUC+ - AUC-)                     -0.3 ± 6.2    0.6 ± 3.9    -1.4 ± 4.2    7.3 ± 9.4    2.0 ± 4.6
% of participants with AUC+ > AUC-   55%           45%          45%           72%          64%

Tab. 4.5: Comparing the facial expressions at positive moments and negative moments in MPM.

The average of all IAT scores from all tests is D = 0.84 ± 0.35. This agrees with previous studies that self-esteem tests generally show self-positivity [22]. The average change in D scores, δD, for the test group is δD(positive) = -0.06 ± 0.29, while the average change in the control group is δD(control) = -0.04 ± 0.31. No significant difference in the distribution of δD was found between the two groups. There is also no clear correlation between δD and participants' PHQ9 scores.



Fig. 4.7: Histogram of the change in participants' D scores (δD) before and after the intervention.

Fig. 4.8: Plots of participants' AUC(Valence) against PHQ9 (first row) and δD (second row). Red (left column) scatter plots correspond to AUC+(Valence), blue (middle column) scatter plots correspond to AUC-(Valence), and green (right column) scatter plots correspond to AUC(Valence) over all of MPM. Attached under each graph are the correlation coefficient (co-coeff) and the two-sided p-value of a test against the null hypothesis that the slope is 0. By panel: AUC+ vs. PHQ9: co-coeff -0.39, p = 0.124; AUC- vs. PHQ9: co-coeff 0.13, p = 0.700; AUC vs. PHQ9: co-coeff 0.29, p = 0.187; AUC+ vs. δD: co-coeff -0.01, p = 0.973; AUC- vs. δD: co-coeff 0.15, p = 0.651; AUC vs. δD: co-coeff 0.39, p = 0.071.


At a glance at the distribution of δD (Fig 4.7), most participants had a shift in their implicit self-esteem after the intervention. Given that the test-retest reliability of the IAT for implicit self-esteem, that is, the correlation of scores between tests repeated on the same subject without intervention, is 0.69, the group change in IAT scores we observed is larger than test-retest variation alone would suggest. That is, MPM is possibly doing something interesting to the participants.

We plotted participants' facial expressions against δD, and found that valence is most strongly correlated with δD (Fig 4.8). AUC(Valence) over all of MPM is positively correlated with δD, with correlation coefficient 0.39 and p-value 0.0714. Taking a closer look at the valence-δD plot (right-most in the lower row), it seems that most participants with facial expression valence <= 0 had δD < 0, which indicates that participants who showed less facial happiness at their MPMs performed worse on the second IAT test than on the first.
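The correlation coefficients and two-sided p-values reported in Fig. 4.8 can be reproduced with a standard linear regression test; the sketch below uses made-up arrays in place of the per-participant values:

    import numpy as np
    from scipy import stats

    # Made-up example arrays standing in for per-participant values of
    # facial-expression valence (AUC over all of MPM) and the change in
    # IAT D score (delta D).
    valence_auc = np.array([-4.2, -1.0, 0.5, 2.3, 3.1, -2.8, 1.7, 0.2])
    delta_d     = np.array([-0.31, -0.10, 0.05, 0.22, 0.18, -0.25, 0.12, -0.02])

    # linregress returns the Pearson correlation coefficient (rvalue) and a
    # two-sided p-value against the null hypothesis of zero slope.
    result = stats.linregress(valence_auc, delta_d)
    print(f"correlation coefficient = {result.rvalue:.2f}, p = {result.pvalue:.4f}")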

We then looked into the content and whether it's correlated with the change in D scores. Considering that the mood and behavior data differ from participant to participant, someone in the control group might have had a good week, and thus their video can be naturally uplifting and positive. We first examined the proportion of positive content, that is, the proportion of valence-raising mood-change events among all mood-changing events in the participant's MPM. We also analyzed the type of emotional trajectories that MPM presented against the change in D scores. Neither turned out to be a strong indicator of the change in implicit self-esteem.

Hypothesis 5: PHQ9 is negatively correlated with facial valence upon seeing positive content.

We examined the correlation between participants' PHQ9 scores and their emotion AUCs. We were interested in whether the level of depressive symptomatology is related to how participants respond facially to positive and negative content in MPM. From Fig 4.8, AUC+(Valence) is negatively correlated with PHQ9 scores, with correlation coefficient -0.38 and p-value 0.1239. It seems that participants with higher PHQ9 scores (higher depression level) showed less happy faces at positive moments, although not throughout all of their MPM, i.e. they were not always unhappy.


4.3.7 Case Studies

Participant 04: In Search for Happiness

Participant 04 (P04) had a PHQ9 score of 8 (mild depression) and a negligible δD = -0.05. Her MPM consisted of 8 positive and 1 negative mood-changing events.

Fig. 4.9: Screenshots from P04's MPM, from left to right corresponding to the events "I got backwards crossovers :)", "seeing parents at airport" and, in the participant's written response, the "slow sunrise".

P04 seemed emotionally neutral when she came to the study and remained calm while taking the first IAT, as shown in her recording. About 1:04 into her MPM, when the two events "I got backwards crossovers :)" and "seeing parents at airport" began, she started sobbing (Fig 4.9). At "backwards crossovers" (left image), the agent found a fire in the darkness and fell asleep next to it. It was a "positive

highlight" moment. "Seeing parents at airport" featured the dog waking up to three deer coming to keep it company (middle image). After trekking together in the snow, the agent wagged its tail and expressed friendliness to the deer in a slow sunrise (right image).

She referred to MPM as "touching", "[...] Somehow, when I first saw it, I connected immediately to the story ... It made me think about being unsure and really searching for happiness. It also tried to show me the positive sides - for example, having other animals come to connect with me. I really enjoyed the outdoors setting.

"I don't know why, but I started crying when I saw the first memory flash up (for me it was backwards crossovers). I've been trying really hard every day, but I didn't really remember the small moments until I saw them in this video. Seeing the dog try so hard to be happy made me really empathize with it - I tried to think about myself as if someone wanted me to be happy as much as I wanted the dog to be happy. The slow sunrise was very beautiful. I wish it was longer."

Participant 28: An Upswing

P28 had a PHQ9 score of 9 (mild depression). She had the highest AUC(Smile) = 5.69 and AUC(Joy) = 5.00 among all participants, and was visibly smiling a lot in her facial expression recording. Her IAT result was a significant positive shift, δD = 0.32. Her MPM consisted of 6 positive and 2 negative mood-changing events.

In general, she commented that MPM "really made me realize just how much my mood varied over the course of a week, and it was nice to reflect back on not just events, but how they made me feel."

Her most memorable moment of the video was about friendship, "I really liked how the label "supportive friends" (or something similar) was paired with the corgi meeting up with the deer and running through the forest. I remember that when I was feeling pretty bad, I talked to my friends and they made me feel better. I thought this part of the video was especially meaningful and inspirational."

She also talked about the ambiguity in the animation, "I thought it was really cute and a little bit like modern art (some of the events shown clearly match up with the words that popped up, while others didn't, as much, and I tried to figure out what parts of my responses matched up with that specific part of the plot)."

P28 also showed strong interest in the project by staying after the study to talk to the researchers, and requesting a copy of published material in the case of publication.

Participant 08: The Stress Triggers

P08 had a PHQ9 score of 5 (mild or no depression). His first IAT score was a typical 0.91, and MPM brought it down by 0.67. His MPM contained 7 positive events out of 11, but the moments he recalled as memorable were all negative ones.

Fig. 4.10: Screenshots from P08's MPM. The left image corresponds to the interpersonal conflict with his Airbnb host. The middle and right images correspond to seeing a polar bear and running away from it.

In the written response, he recalled and described two scenes about interpersonal conflicts and one about stress (Fig 4.10). "There was a moment when the corgi was confronted by the bigger dog barking, which was similar to a conflict I had with my airbnb host over spring break. When the dog saw all the other animals and was annoyed corresponds to me being annoyed at people complaining during the trip for no good reason. The polar bear rushing at the dog and it not knowing how to escape was similar to how I felt when I checked my grades over break and was upset at how poorly I did and didn't know how to react."

Participant 20: Momentary Bright Spot

P20 had a PHQ9 score of 18 (moderate depression). His first IAT score was relatively low, 0.57. The intervention raised the score by 0.14. His MPM consisted of 6 positive and 2 negative mood-changing events.

When asked about what the video reflected about himself, he generalized a life lesson, "The video reflected that after bad, there always comes a good time and I keep looking for that good time and wait for this bad time to end."

His most memorable moment was a cave scene, which corresponds to a mood change driven by internal thoughts, "The most memorable moment in the video was dog going to the cave with a really sad face and it was related to a bad morning I had when I was feeling really sad in the morning. I didn't want to wake up and I was thinking I wish I could have stayed in my home country. I was amazed to see when dog was actually close to a cave which looked like a home to me. Even in the end, deers helping dog to reach home was quite fascinating."

4.3.8 Discussions

Facial expression vs. self-reports

In this study, we did not validate participants' perception of positive and negative moments in MPM separately from their facial expressions. Based on the results of the graphical affects validation, we assumed that when participants see animated positive affect, they will perceive positive affect. However, those results are based on self-reports. Considering that facial expressions can differ from self-reports and offer a different window into emotional experience [33], such disagreement between what participants perceive and what they show is not entirely unexpected.

In the graphical affects validation, the animation clips shown were also not associated with participants' past experiences, whereas MPM also triggers recall of past moods and behaviors. Going through life, we often hold a mixture of feelings toward a single life event. When participants see positive affect in their MPM, their perception might be biased. One participant wrote, "I was overwhelmed ... most of my happiness comes from dance". It seems that the uplifting dance scenes may not all have been perceived positively by her, because they gradually point to a lack of variety in her sources of happiness. There can be a large gap between what we consider positive content and the emotions it actually elicits, which might explain the unexpected results in the study.

The integrity of negative emotions

It is currently the mainstream view in the HCI field that intervention technologies should mediate positive change in behavior and mentality. Our culture tells us to "think positively", "smile more often", and "regulate our negative emotions". While positive emotions have inherent value for human wellbeing [17], negative emotions are an organic part of human psychology. They can help us become aware of threats, stay motivated [68], review our actions, and achieve personal growth. To evaluate the effect of MPM, which elicits not only positive but also negative emotions, we need a more comprehensive framework that takes both into account. Mindfulness, that is, awareness of one's emotions, can help treat depression and prevent relapse [47]. One suggestion for future investigation of MPM is to look at improvement in emotional awareness, as reflected, for example, by the TAS-20 score [40].


5 Conclusion

Each has his past shut in him like the leaves of a book known to him by heart, and his friends can only read the title.

-Virginia Woolf

5.1 Summary

Converting personal data into a format that can have emotional impact can potentially fuel motivation for self-reflection and positive behavior change; however, a fully automated system is not capable today of understanding human personal experiences at the kind of level that could make such a story both explicit and accurate. The mapping from user data to a cohesive plot, and the design of a cinematic language, are harder for a computer to learn than for humans, and would require a much larger data set of such user cases.

To begin to address this challenge, we built an automated system to construct an animation that is personalized to the data provided by an individual over a week, utilizing an animated avatar (a corgi) in a virtual world that implicitly reflects the person's behaviors, such as sleep and social interactions, and moods associated with each day of the week.

We conducted three studies: The first (graphical affects validation) found that the avatar's portrayal of a set of moods and behaviors could be perceived accurately in general, as reflected by self-reports. This study helped us understand that stress can elicit either an excited or anxious response. We further designed coherent mechanisms in the movie generation to help participants distinguish between the two responses.

The second study (A Trip to the Moon) tested the emotional engagement of using a personalized story against a challenging control: a generic story that the participants were told was personalized. Post-hoc analysis indicated that truly personalized animation tended to be more emotionally engaging, encouraging more and lengthier writing that indicated self-reflection about moods and behaviors. While human imagination plays an important and valuable part in both conditions - test and control - the impact we found suggests that true personalization may be more powerfully influential on moods and self-reflection than simply believing that one is receiving personalized feedback.

The third study (Snowbound) focused on a group of participants with mild and moderate depression, and tested multiple hypotheses: whether positively-augmented MPM can elicit a more positive change in implicit self-esteem, and whether the change in implicit self-esteem is correlated with participants' facial expressions. The results showed that MPM significantly shifted participants' implicit self-esteem either positively or negatively, and that this change was positively correlated with the valence of participants' facial expressions while viewing MPM: the higher the valence a participant displayed on the face, the larger the change in implicit self-esteem.

However, neither the facial expressions nor the change in implicit self-esteem could be predicted from the animation content. What we thought was positive content could elicit both joyful and contemptuous facial expressions. A possible explanation is that participants can perceive the animation differently, as influenced by their past experiences, current mood, or their depression symptomatology. We found that the higher a participant's PHQ9 score, the lower the valence of their facial expression when a positive moment appeared in MPM.
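As a minimal sketch of this kind of correlation analysis (not the study's actual code or data), the following shows how the reported relationships could be tested with Pearson correlations. The arrays are hypothetical placeholders, and scipy.stats.pearsonr returns the coefficient together with a two-sided p-value against the null hypothesis of no correlation, analogous to the slope test reported in Fig. 4.8.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant values; the real study data are not reproduced here.
auc_valence = np.array([0.8, -0.3, 1.2, 0.1, -0.5, 0.9])        # AUC(Valence) during MPM
delta_d     = np.array([0.32, -0.10, 0.45, 0.05, -0.20, 0.28])  # change in IAT D score
phq9        = np.array([9, 18, 5, 12, 16, 7])                   # depression severity

# Correlation between facial valence and change in implicit self-esteem.
r1, p1 = stats.pearsonr(auc_valence, delta_d)

# Correlation between depression severity and facial valence during MPM.
r2, p2 = stats.pearsonr(phq9, auc_valence)

print(f"valence vs. change in D: r = {r1:.2f}, p = {p1:.3f}")
print(f"PHQ9 vs. valence:        r = {r2:.2f}, p = {p2:.3f}")
```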

After iterations on plot scripting, modeling and rendering, the Snowbound movies substantially improved in graphics quality and demonstrated high emotional engagement, as shown by facial engagement and attention scores. Participants left pensive, emotional written responses; some insisted on talking to the researchers or requested a copy of their 8-10 minute MPM after viewing it. We saw a wide spectrum of emotions in the facial expression recordings: smiles and smirks, frowns and brow furrows, joy and sadness. One participant was moved to tears.

5.2 Contributions and Recommendations

A method to automatically generate an emotionally engaging movie from personalized content is described above. In particular, the unique emotional properties of animation have been exploited as a reflective design language. Based on advances in real-time GPU rendering, our method provides instant animated feedback from simple questionnaire input. Although limited in running time and pre-designed content, it can, according to study participants, "accurately capture and reflect a wide range of personal, professional and behavioral experiences."

MPM exhibits interesting emotional influences by eliciting users' recall of past experiences in a personalized manner. It takes a fictional concept, a personalized animated movie, and implements it as a real-time practical tool with potential applications including self-reflection, behavior change, and therapeutic practice. An emotion-adaptive camera system was designed to capture salient moments and increase human-agent connection. The movies are generated based on custom emotion-behavioral models for emotional agents, and a procedural virtual weather and environment system. The system demonstrates good engagement despite its minimal budget. This research shows that a meaningful, universal experience can be crafted by tracing back to each user's unique mentality and memories.
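To make the procedural weather-and-environment idea concrete, here is a minimal sketch of a mood-to-environment mapping in the spirit of the system described above. The thresholds, parameter names, and values are illustrative assumptions rather than the actual tables used in MPM; the skybox and fog vocabulary loosely follows the designs shown in Figs. 3.8-3.9.

```python
from dataclasses import dataclass

@dataclass
class EnvironmentState:
    skybox: str      # e.g., "sunny", "sunset", "morning", "stormy"
    fog_color: str   # "light" or "dark"
    wind: float      # 0.0 (calm) .. 1.0 (storm)

def environment_for_mood(valence: float, arousal: float) -> EnvironmentState:
    """Map a mood-changing event (valence, arousal in [-1, 1]) to weather and lighting.

    Illustrative thresholds only; the real system uses its own hand-designed schemes.
    """
    if valence < -0.3 and arousal >= 0.3:
        return EnvironmentState("stormy", "dark", 0.9)    # e.g., anger or acute stress
    if valence < -0.3:
        return EnvironmentState("sunset", "dark", 0.2)    # e.g., low-arousal sadness
    if valence > 0.3 and arousal >= 0.3:
        return EnvironmentState("sunny", "light", 0.4)    # e.g., excitement
    if valence > 0.3:
        return EnvironmentState("morning", "light", 0.1)  # e.g., calm contentment
    return EnvironmentState("morning", "light", 0.2)      # neutral mood

# Example: a high-arousal negative event (such as the polar-bear chase)
# print(environment_for_mood(valence=-0.8, arousal=0.9))
```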

Due to the limitations of the studies, a range of outstanding questions remain to be explored. It remains to be understood whether the influence of MPM can be better controlled through scripting, or whether it is inherently subject to the complexity of human experiences and emotions. The studies showed that the influence on individuals' self-esteem is salient, but did not reveal how researchers could alter the content to directly steer that influence. Perhaps a more human-aided approach should be taken, involving the decisions of therapists and psychologists, to deliver a more targeted MPM experience.

Uncertainties exist in users' understanding and quantification of their own emotions, in the computer's understanding of user emotions, and in the computerized feedback of those emotions. MPM used implicit representations of emotion and abstract visualizations of human interactions to keep these uncertainties from compounding. A conflict with an Airbnb host becomes a strange dog barking at the agent, and a sense of community is represented by the company of forest animals. The result is a natural user experience in which users do not constantly notice that the system isn't fully capable of understanding their reported, complex life experiences. We recommend that future designers of reflective systems consider a similarly "vague" manner of representing emotion and behavior.

Personally, I am most motivated by the intimate responses participants wrote during the studies. Some treated the study as a therapeutic practice, and shared intimate thoughts and emotions otherwise likely left unspoken to their friends and family. Some shared the joy of gaining new insights about themselves from viewing MPM. Some thanked the researchers for choosing this research topic, and encouraged us to do more good work for human wellbeing. It has been a treasured experience to use the perks of modern technology to take concrete steps, even animated corgi steps, to help those who are in need.

Bibliography

[1] Stefan Panayiotis Agamanolis. "Isis, Cabbage and Viper: new tools and strategies for designing responsive media". PhD thesis. Massachusetts Institute of Technology, 2001 (cit. on p. 17).
[2] Tomas Akenine-Möller, Eric Haines, and Naty Hoffman. Real-time rendering. AK Peters/CRC Press, 2008 (cit. on p. 26).
[3] Lisa Feldman Barrett, James Gross, Tamlin Conner Christensen, and Michael Benvenuto. "Knowing what you're feeling and knowing what to do about it: Mapping the relation between emotion differentiation and emotion regulation". In: Cognition & Emotion 15.6 (2001), pp. 713-724 (cit. on p. 32).
[4] Joseph Bates et al. "The role of emotion in believable agents". In: Communications of the ACM 37.7 (1994), pp. 122-125 (cit. on p. 18).
[5] Jennifer K Bosson, William B Swann Jr, and James W Pennebaker. "Stalking the perfect measure of implicit self-esteem: The blind men and the elephant revisited?" In: Journal of Personality and Social Psychology 79.4 (2000), p. 631 (cit. on p. 55).
[6] Virginia Braun and Victoria Clarke. "Using Thematic Analysis in Psychology". In: Qualitative Research in Psychology 3 (Jan. 2006), pp. 77-101 (cit. on p. 43).
[7] Cynthia L Breazeal. Designing sociable robots. MIT Press, 2004 (cit. on p. 14).
[8] Timothy C Brock, Jeffrey J Strange, and Melanie C Green. "Power beyond reckoning". In: Narrative impact: Social and cognitive foundations (2002), pp. 1-16 (cit. on p. 13).
[9] Allison Bruce, Illah Nourbakhsh, and Reid Simmons. "The role of expressiveness and attention in human-robot interaction". In: Robotics and Automation, 2002. Proceedings. ICRA'02. IEEE International Conference on. Vol. 4. IEEE. 2002, pp. 4138-4142 (cit. on pp. 19, 30).
[10] Fred B Bryant, Colette M Smart, and Scott P King. "Using the past to enhance the present: Boosting happiness through positive reminiscence". In: Journal of Happiness Studies 6.3 (2005), pp. 227-260 (cit. on p. 51).
[11] Tom Carpenter, Ruth Pogacar, Chris Pullig, et al. "Conducting IAT Research within Online Surveys: A Procedure, Validation, and Open Source Tool". In: (2017) (cit. on p. 57).
[12] Jean Costa, Alexander T Adams, Malte F Jung, François Guimbretière, and Tanzeem Choudhury. "EmotionCheck: leveraging bodily signals and false feedback to regulate our emotions". In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM. 2016, pp. 758-769 (cit. on pp. 14, 18).
[13] Shaundra Bryant Daily and Rosalind Picard. "INNER-active Journal". In: Proceedings of the 1st ACM Workshop on Story Representation, Mechanism and Context. ACM. 2004, pp. 51-54 (cit. on p. 18).
[14] Gianluca Donato, Marian Stewart Bartlett, Joseph C. Hager, Paul Ekman, and Terrence J. Sejnowski. "Classifying facial actions". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 21.10 (1999), pp. 974-989 (cit. on p. 58).
[15] Paul Ekman. "Universals and cultural differences in facial expressions of emotion." In: Nebraska Symposium on Motivation. University of Nebraska Press. 1971 (cit. on p. 59).
[16] David K Elson and Mark O Riedl. "A Lightweight Intelligent System for Machinima Production." In: AIIDE 2 (2007), p. 3 (cit. on p. 19).
[17] Barbara L Fredrickson. "The role of positive emotions in positive psychology: The broaden-and-build theory of positive emotions." In: American Psychologist 56.3 (2001), p. 218 (cit. on p. 67).
[18] Gustav Freytag. Freytag's technique of the drama: an exposition of dramatic composition and art. Scholarly Press, 1896 (cit. on p. 21).
[19] Thomas Geijtenbeek, Michiel Van De Panne, and A Frank Van Der Stappen. "Flexible muscle-based locomotion for bipedal creatures". In: ACM Transactions on Graphics (TOG) 32.6 (2013), p. 206 (cit. on p. 27).
[20] Asma Ghandeharioun and Rosalind Picard. "BrightBeat: Effortlessly Influencing Breathing for Cultivating Calmness and Focus". In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM. 2017, pp. 1624-1631 (cit. on p. 14).
[21] Stefan Göbel, Sandro Hardy, Viktor Wendel, Florian Mehm, and Ralf Steinmetz. "Serious games for health: personalized exergames". In: Proceedings of the 18th ACM International Conference on Multimedia. ACM. 2010, pp. 1663-1666 (cit. on p. 20).
[22] Anthony G Greenwald and Shelly D Farnham. "Using the implicit association test to measure self-esteem and self-concept." In: Journal of Personality and Social Psychology 79.6 (2000), p. 1022 (cit. on p. 60).
[23] Frank L Greitzer, Olga Anna Kuchar, and Kristy Huston. "Cognitive science implications for enhancing training effectiveness in a serious gaming context". In: Journal on Educational Resources in Computing (JERIC) 7.3 (2007), p. 2 (cit. on p. 20).
[24] Anton Gustafsson, Magnus Bang, and Mattias Svahn. "Power explorer: a casual game style for encouraging long term behavior change among teenagers". In: Proceedings of the International Conference on Advances in Computer Entertainment Technology. ACM. 2009, pp. 182-189 (cit. on p. 20).
[25] Kirsi Halttu and Harri Oinas-Kukkonen. "Persuading to reflect: Role of reflection and insight in persuasive systems design for physical health". In: Human-Computer Interaction 32.5-6 (2017), pp. 381-412 (cit. on p. 17).
[26] Lisa L Harlow, Michael D Newcomb, and Peter M Bentler. "Depression, self-derogation, substance use, and suicide ideation: Lack of purpose in life as a mediational factor". In: Journal of Clinical Psychology 42.1 (1986), pp. 5-21 (cit. on pp. 22, 25).
[27] D Fox Harrell and Jichen Zhu. "Agency Play: Dimensions of Agency for Interactive Narrative Design." In: AAAI Spring Symposium: Intelligent Narrative Technologies II. 2009, pp. 44-52 (cit. on p. 20).
[28] Kristina Höök. "Affective loop experiences: designing for interactional embodiment". In: Philosophical Transactions of the Royal Society of London B: Biological Sciences 364.1535 (2009), pp. 3585-3595 (cit. on p. 18).
[29] Erik T Huntsinger and Linda J Luecken. "Attachment relationships and health behavior: The mediational role of self-esteem". In: Psychology & Health 19.4 (2004), pp. 515-526 (cit. on p. 55).
[30] Katherine Isbister, Kia Höök, Jarmo Laaksolahti, and Michael Sharp. "The sensual evaluation instrument: Developing a trans-cultural self-report measure of affect". In: International Journal of Human-Computer Studies 65.4 (2007), pp. 315-328 (cit. on pp. 14, 18).
[31] Ollie Johnston and Frank Thomas. The Illusion of Life: Disney Animation. Disney Editions New York, 1981 (cit. on p. 18).
[32] Daniel Kahneman and Jason Riis. "Living, and thinking about it: Two perspectives on life". In: The Science of Well-Being 1 (2005) (cit. on p. 51).
[33] Karim Sadik Kassam. Assessment of emotional experience through facial expression. Harvard University, 2010 (cit. on pp. 59, 66).
[34] Rubaiat Habib Kazi, Tovi Grossman, Nobuyuki Umetani, and George Fitzmaurice. "Motion Amplifiers: Sketching Dynamic Illustrations Using the Principles of 2D Animation". In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM. 2016, pp. 4599-4609 (cit. on p. 14).
[35] Elizabeth A Kensinger. "Remembering emotional experiences: The contribution of valence and arousal". In: Reviews in the Neurosciences 15.4 (2004), pp. 241-252 (cit. on pp. 31, 32).
[36] Do-Yeong Kim. "Voluntary controllability of the implicit association test (IAT)". In: Social Psychology Quarterly (2003), pp. 83-96 (cit. on p. 55).
[37] Jarmo Laaksolahti, Niklas Bergmark, and Erik Hedlund. "Enhancing believability using affective cinematography". In: International Workshop on Intelligent Virtual Agents. Springer. 2003, pp. 264-268 (cit. on p. 19).
[38] John Lasseter. "Principles of traditional animation applied to 3D computer animation". In: ACM SIGGRAPH Computer Graphics. Vol. 21. 4. ACM. 1987, pp. 35-44 (cit. on p. 19).
[39] Madelene Lindström, Anna Ståhl, Kristina Höök, et al. "Affective diary: designing for bodily expressiveness and self-reflection". In: CHI'06 Extended Abstracts on Human Factors in Computing Systems. ACM. 2006, pp. 1037-1042 (cit. on p. 18).
[40] Gwenolé Loas, O Otmani, A Verrier, D Fremaux, and MP Marchand. "Factor analysis of the French version of the 20-Item Toronto alexithymia scale (TAS-20)". In: Psychopathology 29.2 (1996), pp. 139-144 (cit. on p. 67).
[41] Bernd Löwe, Kurt Kroenke, and Kerstin Gräfe. "Detecting and monitoring depression with a two-item questionnaire (PHQ-2)". In: Journal of Psychosomatic Research 58.2 (2005), pp. 163-171 (cit. on p. 56).
[42] Carlos Martinho and Ana Paiva. "Pathematic agents: rapid development of believable emotional agents in intelligent virtual environments". In: Proceedings of the Third Annual Conference on Autonomous Agents. ACM. 1999, pp. 1-8 (cit. on p. 18).
[43] Robert R McCrae. "Situational determinants of coping responses: Loss, threat, and challenge." In: Journal of Personality and Social Psychology 46.4 (1984), p. 919 (cit. on pp. 24, 41).
[44] Daniel McDuff, Abdelrahman Mahmoud, Mohammad Mavadati, et al. "AFFDEX SDK: a cross-platform real-time multi-face expression recognition toolkit". In: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM. 2016, pp. 3723-3726 (cit. on p. 58).
[45] Cathy H McKinney, Michael H Antoni, Mahendra Kumar, Frederick C Tims, and Philip M McCabe. "Effects of guided imagery and music (GIM) therapy on mood and cortisol in healthy adults." In: Health Psychology 16.4 (1997), p. 390 (cit. on p. 49).
[46] Dean Mobbs, Nikolaus Weiskopf, Hakwan C Lau, et al. "The Kuleshov Effect: the influence of contextual framing on emotional attributions". In: Social Cognitive and Affective Neuroscience 1.2 (2006), pp. 95-106 (cit. on pp. 13, 24).
[47] D Morgan. Mindfulness-based cognitive therapy for depression: A new approach to preventing relapse. 2003 (cit. on p. 67).
[48] Philippa Mothersill and V Michael Bove Jr. "The EmotiveModeler: An Emotive Form Design CAD Tool". In: Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. ACM. 2015, pp. 339-342 (cit. on p. 14).
[49] Jennifer A O'Dea and Suzanne Abraham. "Improving the body image, eating attitudes, and behaviors of young male and female adolescents: A new educational approach that focuses on self-esteem". In: International Journal of Eating Disorders 28.1 (2000), pp. 43-57 (cit. on p. 55).
[50] Andrew Ortony. "On making believable emotional agents believable". In: Trappl et al. (Eds.) (2002), pp. 189-211 (cit. on pp. 18, 29).
[51] Rosalind W Picard. Affective computing. Vol. 252. MIT Press, Cambridge, 1997 (cit. on pp. 13, 17).
[52] Inmaculada Plaza, Marcelo Marcos Piva Demarzo, Paola Herrera-Mercadal, and Javier Garcia-Campayo. "Mindfulness-based mobile applications: literature review and analysis of current features". In: JMIR mHealth and uHealth 1.2 (2013) (cit. on p. 14).
[53] Marc H Raibert and Jessica K Hodgins. "Animation of dynamic legged locomotion". In: ACM SIGGRAPH Computer Graphics. Vol. 25. 4. ACM. 1991, pp. 349-358 (cit. on p. 27).
[54] Andrew J Reagan, Lewis Mitchell, Dilan Kiley, Christopher M Danforth, and Peter Sheridan Dodds. "The emotional arcs of stories are dominated by six basic shapes". In: EPJ Data Science 5.1 (2016), p. 31 (cit. on p. 52).
[55] Donald A Redelmeier and Daniel Kahneman. "Patients' memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures". In: Pain 66.1 (1996), pp. 3-8 (cit. on p. 51).
[56] W Scott Reilly. Believable Social and Emotional Agents. Tech. rep. Carnegie Mellon University, Pittsburgh, PA, Dept. of Computer Science, 1996 (cit. on p. 18).
[57] Giacomo Rizzolatti and Laila Craighero. "The mirror-neuron system". In: Annu. Rev. Neurosci. 27 (2004), pp. 169-192 (cit. on p. 13).
[58] Morris Rosenberg, Carmi Schooler, Carrie Schoenbach, and Florence Rosenberg. "Global self-esteem and specific self-esteem: Different concepts, different outcomes". In: American Sociological Review (1995), pp. 141-156 (cit. on p. 55).
[59] Christian Roth, Peter Vorderer, and Christoph Klimmt. "The motivational appeal of interactive storytelling: Towards a dimensional model of the user experience". In: Joint International Conference on Interactive Digital Storytelling. Springer. 2009, pp. 38-43 (cit. on p. 20).
[60] Akane Sano. "Measuring college students' sleep, stress, mental health and wellbeing with wearable sensors and mobile phones". PhD thesis. Massachusetts Institute of Technology, 2016 (cit. on p. 22).
[61] Corina Sas, Tomasz Fratczak, Matthew Rees, et al. "AffectCam: arousal-augmented sensecam for richer recall of episodic memories". In: CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM. 2013, pp. 1041-1046 (cit. on p. 18).
[62] Bill Tomlinson and Bruce Blumberg. "Alphawolf: Social learning, emotion and development in autonomous virtual agents". In: Workshop on Radical Agent Concepts. Springer. 2002, pp. 35-45 (cit. on p. 19).
[63] Bill Tomlinson, Bruce Blumberg, and Delphine Nain. "Expressive autonomous cinematography for interactive virtual environments". In: Proceedings of the Fourth International Conference on Autonomous Agents. ACM. 2000, pp. 317-324 (cit. on p. 19).
[64] Robert Trappl, Paolo Petta, and Sabine Payr. Emotions in Humans and Artifacts. MIT Press, 2002 (cit. on p. 31).
[65] AJN Van Breemen. "Bringing robots to life: Applying principles of animation to robots". In: Proceedings of the Shaping Human-Robot Interaction Workshop held at CHI 2004. 2004, pp. 143-144 (cit. on p. 18).
[66] Kurt Vonnegut. "Shapes of stories". In: Vonnegut's Shapes of Stories (1995) (cit. on p. 51).
[67] Kevin Wampler, Zoran Popović, and Jovan Popović. "Generalizing locomotion style to new animals with inverse optimal regression". In: ACM Transactions on Graphics (TOG) 33.4 (2014), p. 49 (cit. on p. 27).
[68] Drew Westen, Pavel S Blagov, Keith Harenski, Clint Kilts, and Stephan Hamann. "Neural bases of motivated reasoning: An fMRI study of emotional constraints on partisan political judgment in the 2004 US presidential election". In: Journal of Cognitive Neuroscience 18.11 (2006), pp. 1947-1958 (cit. on p. 67).

List of Figures

2.1 Squash and stretch in Luxo Jr.'s hop. Figure taken from [38]. 19
3.1 Three color and weather schemes that correspond to depression, anger and excitement. 23
3.2 Screenshots of rendered animation that demonstrate different camera angles and rendering effects. 25
3.3 On the left is the spectrum of facial expressions of our virtual agent. On the right is Lorenz' grimace scale, demonstrating levels of pain for dogs. 26
3.4 Setting up layers of the Animator in Unity. 27
3.5 Consistency between the agent's motivational states and emotional states. 30
3.6 The above scene of two dogs walking side by side, gently looking at each other, is enabled by the attention module. Notice that the corgi's torso bends from the head to the shoulder, so it won't affect the walking. 31
3.7 The valence-arousal chart as presented in Kensinger's 2004 paper, Remembering emotional experiences: The contribution of valence and arousal. Affective experiences can be described in two dimensions: valence refers to how positive or negative an event is, and arousal reflects whether an event is exciting/agitating or calming/soothing. Words have been placed at locations within this space, indicating their approximate valence and arousal ratings. [35] 32
3.8 Design of different skyboxes. From left to right, top to bottom: sunny sky, sunset sky, morning sky, and stormy sky. Notice how the skybox and the lighting of the 3D assets together set the basic mood and atmosphere of the environment. 36
3.9 Dark vs. light-colored fog, with other lighting conditions the same. 37
3.10 Comparison between the Unity standard rendering (left) and our stylized rendering (right). 37
4.1 Distributions of ratings. The first row of four graphs shows how happy users perceived the dog to be in the four video clips. Users have varied opinions about clip a) and fairly similar opinions about clips b), c) and d). The second row of graphs shows how users perceived the sleep quality of the dog in c) and the stress level of the situation in clip d). 40
4.2 Distribution of three types of responses among the two study groups. 44
4.3 Three types of story shapes by Kurt Vonnegut: man-in-hole, boy-meets-girl, and Cinderella. 52
4.4 Six emotional story arcs overlaid with the emotional trajectories of the closest 20 books. On the top, from left to right: rise, man-in-hole, Cinderella. On the bottom, from left to right: tragedy, Icarus, Oedipus. 52
4.5 Story arcs of certain participants' animation stories. The vertical axis signifies valence, with 1 being positive, 0 being neutral and -1 being negative. The individual triangle dots show mood-changing events, and where they lie on the valence scale. The curve is a smoothed polynomial fit to the mood-changing events, which demonstrates the change in the valence trajectory. The color coding of the curve shows the change in arousal, with red showing high arousal, and blue showing low arousal. The corresponding label for each graph is (from left to right, top to bottom): R, CDR, MIH, FP, RF, CDR, MIH, RF. 54
4.6 Box plots of AUCf and AUCJ for all participants. In the top row, from left to right: joy, sadness, contempt. In the bottom row, from left to right: valence, engagement, attention. The "AUC-" and "AUC+" labels refer to the emotion at negative and positive moments in MPM, and "all" is an average over all of MPM. The box extends from the lower to upper quartile values [Q1, Q3], and the whiskers show the closest data points within the range [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Flier points are those past the end of the whiskers. The thin orange line shows the median. The thick purple line shows the mean. 60
4.7 Histogram of the change in participants' D scores before and after intervention. 61
4.8 Plots of participants' AUC(Valence) against PHQ9 (first row) and δD (second row). Red (left column), blue (middle column), and green (right column) scatter plots correspond to AUC(Valence) at negative moments, at positive moments, and over all of MPM, respectively. Attached under each graph are the correlation coefficients (co-coeff) and the two-sided p-value of a test against the null hypothesis that the slope is 0. 61
4.9 Screenshots taken from P04's MPM, from left to right corresponding to the events "I got backwards crossovers :)", "seeing parents at airport" and, in the participant's written response, "slow sunrise". 63
4.10 Screenshots taken from P08's MPM. The left image corresponds to the interpersonal conflict with his Airbnb host. The middle and right images correspond to seeing a polar bear and running away from it. 65

List of Tables

3.1 List of camera shots. Cameras are ordered by the frequency of occurrence. 35

4.1 Results of the graphical affects validation survey. For certain clips, certain questions are irrelevant to the affect of interest, so we didn't ask those questions and left the table blank. The scores indicate the mean ± standard deviation of the Likert-scale (1 to 7) rating for the corresponding affect. The higher the score, the happier / less calm / more energetic / better-slept the agent appears to the user. The first column describes the animated affect in the four clips. 41
4.2 Comparison between the control group and the personalized group. Confused refers to the percentage of users showing confusion about the story (e.g., by saying "Don't know what's going on", "I am confused"). Word count is the group's average word count. Emotion refers to the average number of emotion-descriptive words (e.g., "emotional", "happy", "nostalgic") the user used in their response. Recall refers to the percentage of participants in the group recalling past experiences corresponding to the animation. 44
4.3 Three types of responses to the video. 44
4.4 Labeling emotional story arcs. 53
4.5 Comparing the facial expressions at positive moments and negative moments in MPM. 60

Colophon

This thesis was typeset with LaTeX2e. It uses the Clean Thesis style developed by Ricardo Langner. The design of the Clean Thesis style is inspired by user guide documents from Apple Inc.

Download the Clean Thesis style at http://cleanthesis.der-ric.de/.