Pervasive Well-being Technology
Pablo Paredes
Electrical Engineering and Computer Sciences University of California at Berkeley
Technical Report No. UCB/EECS-2016-37 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-37.html
May 1, 2016 Copyright © 2016, by the author(s). All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
I want to thank my advisor, Professor John F. Canny. I am grateful for picking me up among hundreds. Prof. Eric Brewer - You guided my passion with such bold vision. Thanks for making a putative son of TIER. Dr. Mary Czerwinski – I think we chose the most exciting and rewarding topic. Let's keep working at it. Prof. Greg Niemeyer - You are one of the kindest, most creative and interesting people I know. Looking forward to doing more together. Prof. Bjoern Hartmann - You became my “ad hoc” advisor and my main funding partner support for the latter part of my career. You are without any doubt on of the brightest minds in HCI. Prof. John Chuang - What a pleasure it was to work with you. I always admired your ability to bring together so many stakeholders to the wearables class.
Pervasive Well-being Technology
By
Pablo Enrique Paredes Castro
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
In
Computer Science
in the
Graduate Division
of the
University of California, Berkeley
Committee in charge:
Professor John F. Canny, Chair Professor Björn Hartmann Professor Greg Niemeyer Professor Mary Czerwinski
Fall 2015
Pervasive Well-being Technology
Copyright 2015
by
Pablo Enrique Paredes Castro
1
Abstract
Pervasive Well-being Technology
by
Pablo Enrique Paredes Castro
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor John F. Canny, Chair
Well-being is the characteristic of humans to feel well with themselves and their environment. A key driver of wellbeing is to have a healthy mind. Stress management, positive emotions, and empathic social interaction get us closer to this goal. The only sustainable way towards wellbeing is by maintaining a healthy lifestyle. In summary, technologies supporting this objective should measure and intervene "life" itself!
The good news is that we are in a unique position to challenge the way technology can become a driver of wellbeing. Internet of things (IoT), Big Data, Affective Computing and Critical Engineering, are the key enablers. With pervasive data, interventions can evolve from being reactive to predictive. We could design long- term interventions that foster good habits, resilience, and personal growth. We encapsulate technologies in two parts: "Conceptualization and Sensor-less Sensing" and "Opportunistic Interventions." The former uses data streams to investigate and measure human traits and trends. The latter repurposes popular apps, devices, and environments into effective interventions.
In summary, we aim at designing well-being technology that could survive the chasm of adoption and attrition.
i
Dedicated with love to my wife Claudia (Clau), my babies Manuela (Manu) and Diego Pablo, my parents Enrique and Cecilia, my brother Juan Carlos, my sister Maria Cecilia. You made this possible. Thanks forever!
ii
Contents ii
List of Figures vi
List of Tables x
Introduction 1
Acknowledgements 3
Part 1: Conceptualization and Sensing: 5
1. Qualitative Insight Mining 7 1.1 INTRODUCTION ………………………………………………………... 8 1.2 RELATED WORK ………………………………………………………. 10 1.2.1 Keyword Search ………………………………………………… 10 1.2.2 Semantic Similarity ……………………………………………... 10 1.2.3 Qualitative Data Analysis Tools ………………………………. 11 1.3 METHODOLOGY ………………………………………………………. 11 1.3.1 LiveJournal Corpus …………………………………………….. 11 1.3.2 Data Preprocessing ……………………………………………. 11 1.3.3 word2vec ………………………………………………………... 12 1.3.4 Querying ………………………………………………………… 13 1.4 ITERATIVE CO-DESIGN PROCESS ………………………………… 15 1.4.1 Initial Concept Formation ……………………………………… 15 1.4.2 Early Idea Validation …………………………………………… 17 1.4.3 Low Fidelity (Lo-Fi) Prototype ………………………………… 17 1.4.4 Testing on more Data …………………………………………. 22 1.4.5 Mid Fidelity (Mi-Fi) Prototype …………………………………. 23 1.5 FUTURE WORK ……………………………………………………….. 26 1.6 CONCLUSION ………………………………………………………….. 27
2. Game & Narrative Design for Behavior Change 29 2.1 INTRODUCTION ………………………………………………………. 30 2.2 PREVIOUS WORK ……………………………………………………. 30 2.3 THE CHALLENGE: EFFICACY + EFFICIENCY ……………………. 31 2.4 DESIGN METHODOLOGY …………………………………………… 32 2.5 GAME PROTOTYPE EXAMPLES …………………………………… 33 2.6 NARRATIVE PROTOTYPE EXAMPLE ……………………………... 36 2.6.1 Mental Health Background ……………………………………. 36 2.6.2 Related Work …………………………………………………… 39 2.6.3 Prototype Movie: “A Rainy Day” ……………………………… 39 2.6.4 Uses of Therapeutic Machinima ……………………………… 42 2.6.5 Future Research ……………………………………………….. 42 iii
2.7 PRINCIPLES FOR CONCEPTUALIZATION ……………………….. 43 2.7.1 Understanding Disempowering Narratives ………………….. 43 2.7.2 Externalizing Problems ………………………………………... 43 2.7.3 Game Dynamics as Interventions ……………………………. 45 2.7.4 Narratives as Interventions ….………………………………… 45 2.8 CONCLUSION ..…..….………………………………………………… 45
3 Sensor-less Sensing 46 3.1 INTRODUCTION ……………………………………………………….. 47 3.2 BACKGROUND ………………………………………………………… 47 3.2.1 Arousal and Stress …………………………………………….. 47 3.2.2 Physiological Affect Sensing ………………………………….. 48 3.3 AROUSAL FROM COMPUTER PERIPHERALS …………………… 49 3.3.1 Mouse Movement ……………………………………………… 49 3.3.2 Mouse Pressure ………………………………………………... 52 3.3.3 Keyboard ………………………………………………………... 53 3.4 SENSING AFFECT WITH TEXT MINING …………………………… 57 3.5 MOVING FROM SENSING TO INTERVENTIONS ………………… 57 3.5.1 Usability Studies ……………………………………………….. 57 3.5.2 Stress and Emotion from Sensor and Contextual Data .……. 58 3.5.3 Integrated Model and Data Acquisition ……………………… 58 3.5.4 Causal Inference ………………………………………………. 58 3.6 ADAPTIVE INTERACTIONS AND INTERVENTIONS …………….. 59 3.6.1 Feedback and Interface Adaptation ………………………….. 59 3.6.2 Emotional Regulation and Psycho-education Interventions . 59 3.6.3 Foreground and Background Interventions …………………. 60 3.7 CONCLUSION …………………………………………………………. 60
Part 2: Opportunistic Interventions 61
4 Repurposing mobile and wearable systems 62 4.1 INTRODUCTION ………………………………………………………. 63 4.2 BACKGROUND AND RELATED WORK …………………………… 63 4.3 IMPLEMENTATION …………………………………………………… 63 4.3.1 Prototypes ……………………………………………………… 63 4.3.2 Common Use Scenario ……………………………………….. 66 4.4 STUDY DESIGN ………………………………………………………. 66 4.4.1 Objective Data …………………………………………………. 66 4.4.2 Subjective Data ………………………………………………… 67 4.5 EVALUATION ………………………………………………….. 67 4.5.1 Baseline ………………………………………………………… 67 4.5.2 Randomized Experiment ……………………………………… 68 4.6 RESULTS ………………………………………………………………. 68 iv
4.7 IMPLICATIONS FOR DESIGN ………………………………………. 71 4.7.1 Suite of Interventions and Context …………………………… 71 4.7.2 Design Improvements …………………………………………. 72 4.7.3 Long-term (Real-Life) Usability ………………………………. 73 4.8 CONCLUSION ...……………………………………………………….. 73
5 Harvesting web interventions 74 5.1 INTRODUCTION ……………………………………………………… 75 5.2 BACKGROUND WORK ………………………………………………. 76 5.2.1 Cognitive Behavioral Therapy (CBT) based Technology …. 76 5.2.2 Stress Technology Intervention R&D ……………………….. 76 5.3 STRESS MANAGEMENT SYSTEM DESIGN ……………………... 77 5.3.1 Micro-Intervention Authoring System ……………………..… 77 5.3.2 Intervention Recommender System ………………………… 80 5.3.3 Mobile App Interface …………………………………………. 83 5.4 STUDY DESIGN ……………………………………………………… 85 5.4.1 Phones and Screening Procedures ………………………… 85 5.4.2 Experiment Design …….……………………………………… 86 5.4.3 Gratuity and Incentive Policies ……………………………… 86 5.5 RESULTS ……………………………………………………………… 86 5.5.1 Descriptive Data ……………………………………………… 87 5.5.2 Qualitative Results …………………………………………… 91 5.5.3 Quantitative Results …………………………………………. 92 5.6 IMPLICATIONS FOR DESIGN ……………………………………… 94 5.6.1 Intervention Design …………………………………………… 94 5.6.2 Intervention Recommendation ………………………………. 95 5.6.3 Future Research ………………………………………………. 96 5.7 CONCLUSION ………………………………………………………… 96
6 Multi-modal Interventions 97 6.1 INTRODUCTION ……………………………………………………… 98 6.2 BACKGROUND ……………………………………………………….. 99 6.2.1 Circumplex Model of Affect ………………………………….. 99 6.2.2 Auditory Affective Expressivity ………………………………. 100 6.2.3 Haptic Affective Expressivity ………………………………… 100 6.2.4 Multisensory Affective Response ………………………... 100 6.3 PRIOR WORK ……………………...………………….……………… 101 6.3.1 Audio and Haptic Research …………………………………. 101 6.3.2 Vibrotactile Interfaces ………………………………………… 101 6.4 EXPERIMENT DESIGN ……………………………………………… 102 6.4.1 Vibrotactile Wearable Hardware …………………………….. 102 6.4.2 Stimuli Generation …………………………………………….. 102 6.4.3 Conditions ……………………………………………………… 104 v
6.5 RESULTS ……………………………………………………………… 104 6.5.1 Quantitative Analysis …………………………………………. 104 6.5.2 Qualitative Analysis …………………………………………… 109 6.5.3 Usability Assessment ………………………………………… 111 6.6 IMPLICATIONS FOR DESIGN ……………………………………… 113 6.7 FUTURE WORK ………………………………………………………. 114 6.8 CONCLUSION ………………………………………………………… 115
7 Urban Ambien Interventions – Fiat Lux 116 7.1 INTRODUCTION ……………………………………………………… 117 7.2 PREVIOUS WORK …………………………………………………… 118 7.2.1 Urban Interactive Light Design ………………………………. 118 7.2.2 Light Interaction and Expressivity …………………………… 118 7.2.3 Implicit Interaction and Approachability …………………….. 118 7.3 LIGHTING SYSTEM DESIGN ……………………………………….. 119 7.4 METHODOLOGY ……………………………………………………… 119 7.4.1 Location ………………………………………………………… 119 7.4.2 Method ……………………………………………………….…. 121 7.5 EXPERIMENT 1: INTERACTIVE LIGHT FACTORS ……………… 123 7.5.1 Experiment Design ……………………………………………. 123 7.5.2 Quantitative Analysis ………………………………………….. 124 7.5.3 Post-test Qualitative Analysis …………………………….….. 126 7.5.4 Card Sorting Analysis …………………………………………. 127 7.6 EXPERIMENT 2: INTERACTIVE LIGHTS VERSUS ALWAYS ON 132 7.6.1 Experiment Design …………………………………………….. 132 7.6.2 Quantitative Analysis ………………………………………….. 132 7.6.3 Post-test Qualitative Analysis ………………………………… 133 7.6.4 Card Sorting Analysis …………………………………………. 134 7.7 EXPERIMENT 3: BEFORE + SLOW PRIMITIVES ………………… 136 7.7.1 Experiment Design …………………………………………….. 136 7.7.2 Quantitative Analysis ………………………………………….. 137 7.7.3 Post-test Qualitative Analysis ………………………………… 139 7.7.4 Card Sorting Analysis …………………………………………. 139 7.8 IMPLICATIONS FOR DESIGN ………………………………………. 140 7.9 CONCLUSION …………………………………………………………. 144
8 Conclusion 145 8.1 SUMMARY…….……………………………………………………….. 145 8.1.1 Part 1 Ð Conceptualization and Sensing ……………………. 145 8.1.2 Part 2 Ð Opportunistic Interventions …………………………. 145 8.2 INTERVENTION RESEARCH DRIVERS …………………………… 146 8.3 VISION ………………………..………………………………………… 146
References 148 vi
List of Figures
1.1 Example based on a real query, on how the queries evolve in response to the returned answers. The flow is somewhat analogous to semi-structured interviews, where interviewers must modify their line of questioning on the fly to respond to answers received. The example is linear for simplicity of presentation, but the interviewer can easily follow a line of questioning and then return to an earlier query as many time as desired. ……………………. 16
1.2 Example of the effect of filters in the query results on the full data. Top right displays the top 8 results for the query “family holiday”. To see more in depth responses, we can filter out the sentences with less than 30 words, see results in the bottom left. Since we perform a semantic query, we may choose to exclude the words “family” or “holiday” from the results. In the top right panel, we exclude the word “family”. Finally, we can define a weighting scheme on the query words. In the bottom right panel, we increase the weight on the word “family” to obtain results that relate more to “family” than “holiday”. Note that each result belongs to a full post, which we retrieve to the interviewer, e.g., the original post of the highlighted result is shown in Figure 1.3. …………………………………………………………. 20
1.3 Public post on LiveJournal.com, corresponding to the highlighted result in Figure 1.2. By viewing the full text in its original context, the interviewer is able to explore a response in greater depth. …………………………………. 21
1.4 Example queries and results ran on 100% of the data, and with distinct filters. …………………………………………………………………………….. 25
2.1 Efficacy + Engagement game design dualities. ……………………………… 32
2.2 Monsters Game Screen. ………………………………………………………. 34
2.3 Scheherazade’s Word Game Screens. ……………………………………… 35
2.4 Semester adventure game screen. …………………………………………… 36
2.5 Original CBT cartoon versus machinima video screenshot. ………………. 38
2.6 Excerpts from original CBT manual and machinima movie script. …………. 40
3.1 Mouse displacement damping ratio for 20 different motion tasks for mental Non-stress and Stress conditions [22]. ……………………………………… 51
3.2 Affective and Stress Measurements using 7-point Likert scales. ………… 52
3.3 Individual differences between the averaged differential (stress Ð calm) signal per participant. *The difference was computed from significantly
vii
different distributions (W, p<0.05). …………………………………………… 55
4.1 An SMS message used to prompt the closest contacts in the user’s (inner) social network. ………………………………………………………………….. 64
4.2 One of the games we used involved maneuvering a ball in a maze. ………. 64
4.3 Vibrotactile bracelet with two eccentric rotating mass (ERM) motors for guided breathing and acupressure stimulation. ……………………………… 65
4.4 Accupressure points for relaxation. …………………………………………… 66
4.5 Experiment Stages. …………………………………………………………….. 68
4.6 Subjective stress levels for each intervention across the different stages on part 2. ….……………………………………………………………………... 69
4.7 Comparison of the four interventions. Social Networking and Guided Breathing had the most uniform distribution of usability traits. ……………… 69
4.8 Electro-Dermal Activity (EDA) oscillating around 540 mV. ………………….. 70
4.9 Biometric signals comparison between Calm and Stress stages. ………….. 71
5.1 Mobile app sample screens: a) subjective stress assessment; b) intervention title, icon, slogan and prompt. …………………………………… 84
5.2 Experience Sampling Method (ESM): a) Self-report message to be clicked by user; b) Circumplex quadrant selection; c) Self-rating completion with option to launch micro-intervention if desired. ………………………………. 85
5.3 Machine Learning selected interventions. ……………………………………. 87
5.4 Distribution of interventions per participant. …………………………………. 87
5.5 Fraction of interventions for which users selected the recommended intervention. …………………………………………………………………….. 88
5.6 Stress deltas (stress before Ð stress after). ………………………………… 89
5.7 Daily users per day per experiment condition. ……………………………… 89
5.8 “Ideal” users (stress self-rating before intervention > 0.5 and intervention duration > 60 sec) a) Stress after versus stress before interventions; b) stress delta versus intervention duration. 90
5.9 Subjective ratings (likeability and efficacy). ………………………………… 91
5.10 Depression (PHQ-9) quantitative results. ………………………………… 93 viii
5.11 Coping Strategies Questionnaire (CSQ). ………………………………… 94
6.1 Circumplex Model of Affect by Russell [16]. …………………………………. 99
6.2 Basic haptic effects with two vibrating motors as described in [12]. ………. 101
6.3 a) Prototype wristband made of Velcro straps covering the vibrating motors and attached to Arduino Uno. b) Motor placement (1.1 inches apart). …….. 102
6.4 Stimuli affect map: a) Haptic, b) Sound. (o) represents average points for all 40 users; (x) represents the centroids; colors represent the different stimuli: black = happy, red = anger, blue = sadness, green = relaxed. …….. 105
6.5 Multiple comparisons (with Bonferroni’s correction). Sound mode’s valence is significantly different than Haptic mode or Combined (Haptic + Sound) mode. …………………………………………………………………… 107
6.6 Multiple comparisons (with Bonferroni’s correction). Happy haptic (High Valence + High Arousal) combined with Anger sound (Low Valence + High Arousal) was significantly different. …………………………………………… 108
6.7 Multiple comparisons (with Bonferroni’s correction). Only Anger and Relaxed present significantly different valences. …………………………… 109
6.8 Usability metrics for Sound, Haptic and Combination (Sound + Haptic) conditions. ………………………………………………………………………. 112
7.1 Experimental setup a) metrics: light radius = 1.12m, distance between lights = 2.7m, hallway width = 4m, hallway length = 30ft, base luminance with lights off = 2lx. Note: low illumination not displayed in this image; b) User walking with all lights on. Max luminance per light = 40 lx. Note: Image captured with high sensitivity camera, appears to be brighter than reality…. 120
7.2 Hard Lighting Interactions. …………………………………………………….. 122
7.3 Circumplex Model of Affect (CMA) mapping of the Arousal and Valence metrics for each condition in study number 1. ……………………………….. 122
7.4 Energy consumption (Wh) in red, contrasted with Valence (baseline corrected) in blue for each light condition. ……………………………………. 125
7.5 Board used to sort the different cards with places into the best affective classification. ……………………………………………………………………. 128
7.6 Example of the light conditions board. Used to match places with light conditions. ………………………………………………………………………. 129
7.7 Interaction between Valence (feelings) about a place and (a) Timing and
ix
(b) Ramp factors. ……………………………………………………………… 131
7.8 Energy consumption (Wh) Ðred- contrasted with Valence (baseline corrected) Ðblue- for each light condition. …………………………………… 133
7.9 Interaction between Valence (feelings) about a place and (a) Timing and (b) Ramp factors. ……………………………………………………………….. 135
7.10 Interaction between Valence (feelings) about a place and (a) Timing and (b) Ramp factors. ………………………………………………………….. 137
7.11 Interaction between Valence (feelings) about a place and the number of Look Ahead Lights. ………………………………………………………….. 140
7.12 Inverse U curve about a place and (a) Timing and (b) Ramp factors. …. 141
x
List of Tables
2.1 Behavior Change and Narrative topics taught in addition to Game Development and Human Centered Design topics. …………………... 33
2.2 Movie script elements of “A Rainy Day”. ……………………………….. 41
5.1 Micro-interventions design matrix. Therapy groups are subdivided into individual and social groups. Each therapy group has a friendly icon and name. The micro-intervention format consists of a Prompt plus a URL. ………………………………………………………………… 79
5.2 User data and their parameters. ………………………………………… 81
5.3 Sensory features collected on the phone. ……………………………… 81
5.4 Distribution of participants and interventions for the different conditions of the study. …………………………………………………… 86
5.5 Reported learning. ………………………………………………………… 92
6.1 Top four haptic patterns used for the expression of emotions[9]. …… 100
6.2 Circumplex Model of Affect with haptic and sound mappings per quadrant. …………………………………………………………………… 103
6.3 Experiment conditions. …………………………………………………… 104
6.4 Three-way ANOVA for Arousal values. ………………………………… 106
6.5 Three-way ANOVA for Valence values. ………………………………… 106
6.6 Preferred communication modalities. …………………………………… 113
7.1 Counts of places with respect to the feelings and level of excitement. 130
7.2 Counts of Places with respect to the feelings and level of excitement. 134
7.3 Counts of places with respect to the feelings and level of excitement. 139
7.4 Design implications for optimizing affect versus energy consumption in urban interaction illumination systems. ………………………………. 142
1
Introduction
A vision shared by all the fields of computer science is to improve the living conditions for human beings. Computers have affected areas such as labor conditions, access to information, entertainment and science. Computers have touched almost every aspect of human existence. They have brought notable improvement to our living standards. There are frontiers that we need to conquer, and computers should play a significant role in that regard. Mental health is one of those. Computers are not yet able to know how we feel, cannot answer what we think, and do not explain our behaviors.
Despite their pervasiveness, computers have not driven large-scale adoption of well- being technology. In fact, the adoption of mental health continues to be so low nowadays that we should count this as an open problem. 30% of the population will need mental health treatment at some point in their lives. However, only, 10% will see a mental health specialist. Out of those only 30% will get treated. From those, only 10% will complete treatment and sadly, 70% of them will relapse. 0.07% is a staggering low efficiency of such a fundamental need. Our approach in this work is not to automate mental health theory. We want to use Human-Computer Interaction (HCI) theory and research to propose novel tools. We want to improve intervention adoption while maintaining or improving their efficacy. We can achieve this by repurposing popular technologies such as web apps, LEDs, and wearable devices (a.k.a. wearables).
Furthermore, the need for intervention research cannot be in tandem with sensing research. The drivers of adoption, engagement and compliance are fundamental problems of human-centered design. For example, novelty is a driver of adoption. However, it is a double edge sword. Target populations drawn to an intervention are quick to realize they are not ready to adopt it. The technology reminds them of their suffering, and the incremental effects are not valuable in the short term. For those engaged, once novelty dies, even an efficacious intervention can die. People would abandon even good designs over time. All this shows the need to not only design interventions, but also sensors. We need sensing technologies that are stealth and pervasive. This way we use long-term adoption timelines, or wait for the right time to launch an intervention.
Part 1 discusses the initial stages of intervention design. It proposes methodologies and tools that help conceptualize interventions that are engaging and efficacious. It also discusses a novels stealth approach to sensing, which we coin "Sensor-less Sensing." Part 2 presents different opportunistic well-being interventions. It discusses how popular tech design, popular devices, and popular places drive adoption. 2
In summary, we present a group of work that draws on HCI theory and creativity. We aim at breaking the mold of automation of clinical psychological treatments. We take advantage of well adopted (popular) interaction designs (apps, games, etc.) to make interventions desirable. We hope to inspire health care to think beyond traditional clinical paradigms. Ultimately, we want to influence the way we design technology to focus it on well-being.
3
Acknowledgements
First I want to thank my advisor, Professor John F. Canny. I am grateful for picking me up among hundreds, despite coming from a non-traditional career. Through him I learned to do rigorous science. It was not easy, it took effort to discover how hard it was, but how pleasing it was at the end. Special thanks to my parents-in-law, Regulo and Nancy. Your help to build our little house in San Leandro was invaluable! Thanks to my dear neighbors Tim and Jennifer, you made our stay in California a delight and an inspiration. I want to thank my academic and life mentors: Prof. Eric Brewer - You guided my passion with such smart vision. Your incredible knowledge of technology is only surpassed by your human kindness. Thanks for making me a putative son of TIER! Dr. Margie Morris - without your help during my application phase I would have not had a strong enough application. I hope we can continue to collaborate in years to come. Dr. Sunny Consolvo – I managed to understand the value of a strong will and keen intelligence. She is a natural leader. Thanks for showing me the power of usability design in a world full of technicians! Dr. Mary Czerwinski – Thanks for believing in me. I still think we chose the most interesting and rewarding topic. There is so much we need to discover, I just hope we will always be able to come with new projects together. Prof. Greg Niemeyer - You are one of the most kind, dynamic, creative and interesting person I met at Berkeley. Without your inspiration, patience and faith in me, making it through the last year would have been very hard. Prof. Bjoern Hartmann - You became my “ad hoc” advisor and my key funding partner support for the latter part of my career. You are without any doubt on of the brightest minds in HCI. I hope we will not stop collaborating Prof. John Chuang - What a pleasure it was to work with you. You always valued my somewhat eclectic set of abilities. I always admired your ability to bring together so many stake holders to the wearables class. I am so glad we became co- authors. Let’s keep collaborating! Prof. Ricardo Munoz – Thanks for seeing in me an agent for change and for welcoming into your research groups at UCSF. I learned so much through you and your colleagues! Prof. Adrian Aguilera – Thanks for your constant support and insights in the wonderfully complex world of clinical psychology. Prof. Stephen Schueller – Thanks for your guidance in the early stages of my thesis and for your inspiring passion for the use of technology in psychology. I want to thank all my colleagues at the Berkeley Institute of Design (BID), Google, Microsoft Research (MSR) and the Technology and Infrastructure for Emerging Regions (TIER) groups. All of you had been a strong inspiration for me to pursue novel and challenging ideas. I cannot be more lucky to have worked with such a 4
talented interdisciplinary team. Special thanks to Keng-hao, Reza, David and Kristin at BID and Rani and Asta at MSR. You all made me challenge my ideas to the limit. Thanks to my colleagues in the Social Apps lab, my house for the last year. Eduardo Calle, Marijo Recalde, Soham Shah, Mackenzie Alessi, you all made a very difficult project implementable and so enjoyable! The city of San Leandro will always be in debt with you all. Javier Rosa, thanks for becoming such an unlikely friend. I think destiny brought us together, and we managed to create a strong friendship based in pure creative tension. Thanks for your intelligence, technical prowess and true passion for technology. I will always count you as one of the highlights of my experience at Berkeley. Ryuka Ko, thanks so much for so many things. You were not only my co-author, but my colleague and councilor. I just hope to see you playing music in the very best outlets. You are so talented. If music is your destiny, science will be missing a great talent. If science becomes your home, music will be missing one. Dav Clark, thanks so much for your frank and open friendship. You were, no doubt a deep source of creative trouble for my soul! Your keen observations helped shape a much more critical mind. David Sun, we worked so hard together. There is so much to be said about how much we learned by making mistakes. Thanks for working with me and getting so interested in the topic that brought me to Berkeley. Thanks to my co-authors and Berkeley Institute of Design fellow collaborators. I call you all the fantastic five. Ana Rufino Ferreira, Cory Schillaci, Dennis Xing, Pierre Karashchuck and Gene Woo. What a magical experience to have so much talented interested in moving forward a crazy idea around a difficult but relevant problem, mental health. We came a long ways and I truly hope we can make our creation either a product or a fantastic research tool. Let’s keep walking together! To my co-authors in the Synestouch paper. Ryuka Ko, Arezu Aghaseyedjavadi, Linda Babler, Eduardo Calle, Thanks so much for your passion. We managed to do so much with so little. We managed to publish a paper that despite its complications, helped open the door to understand multi-modal expressions. I am so happy to have met you all. Matt Chan, my first co-author ever! Thanks for your collaboration. Our CalmMeNow paper opened me so many doors! Thanks for your interest in mental health! Eduardo Calle, thanks so much for sticking with science ever since we started that ambitious project in Ecuador. I hope to see you in a few years fulfilling your dream to get a PhD in robotics!
5
Part 1 Conceptualization and Sensing
Designing interventions is more an art than a science. Those involved in mental health, know how hard it is to make an effective theory adoptable. The first part is to understand the underlying phenomena and drawing insights. Traditional research methods are usually expensive. Qualitative approaches generate preliminary and descriptive valuable insights. Qualitative approaches are rarely scalable. We describe a novel system based on semantic searching that scales up qualitative exploration. We used a corpus of data with rich emotional content, livejournal.com. Our preliminary results show a huge potential that can even transcend mental health. Qualitative researchers, in general, could use this tool to gather intervention design insights. Furthermore, this tool could potentially even measure human traits captured in personal free writing.
Insights are not enough to design appropriate interventions. Design methods that leverage existing therapeutic knowledge are necessary. When designing behavior change apps, it is not sufficient to build new technology. Many times, an intervention tries to correct something broken. An understanding of the "negative" scenario and successful therapies will improve the design outcome. We studied projects from student groups focused on this topic. Their outcomes helped establish some design principles for effective and efficient interventions.
Finally, "Sensor-less Sensing" is our proposal to use personal data as sensors. We propose to repurpose signals from peripherals and social media to track mental states. Unobtrusive Sensing is a must do to enable pervasive wellbeing interventions. There are advantages in leveraging daily use computing devices and social media as “sensors”:
Accessibility. Peripherals, such as mice and keyboard, are indispensable to interaction with graphic user interfaces.
Unobtrusiveness. There is no need for sensors on the body, especially in semi- structured office spaces.
Long-term, in-situ monitoring. People spend considerable amounts of time using computers. We can unobtrusively monitor and provide feedback.
Application and content-neutral monitoring. Mice motion is neutral to the application and the content with which the user is interacting. This capability improves adoption and reduces privacy concerns. 6
Chapter 1 shows a novel way to research qualitatively social media using semantic searching. This approach can help study many daily life topics, including mental health ones. This approach transcends sensing, as it can also help discover potential interventions. Chapter 2 describes a method used for the design of creative behavior change interventions. Chapter 3 introduces the use of the PC peripherals such as mice, keyboards, and microphones as sensors.
7
Chapter 1 Qualitative Insight Mining
In this chapter we describe a novel tool to open up new windows of exploration for qualitative researchers. Through the use of semantic search algorithms applied to a large corpus of data, obtained from LiveJournal.com, we are able to extract valuable information that qualitative researchers can use in their methodology. We describe the design processes that lead us to the development of this tool. We analyze the results of an open user study with qualitative researchers, which helped us confirm the value of the tool and lead to the development of additional features. Finally, we present comparisons between results obtained by modifying certain input parameters such as: percentage of data used, number of results returned, filtering, and word weighting.
8
1. INTRODUCTION
Semi-structured interviews are a central tool in user research and the social sciences more generally. They allow for rich and open-ended exploration of user attitudes, preferences, desires, fears and values. From a design perspective, their value relies in the ability to generate insights. While they are of tremendous value, they also have difficulties:
- They are expensive in time and resources. They often involve travel by one or both parties, recording and transcription of content, subject reimbursement, and later effort by the interviewer to form a synthesis from individual inter- views. - They are real-time which affords continuity in the thoughts of the interviewee, but also challenges the interviewer to decide instantly what topics to pursue. - They typically involve a relatively small sample of subjects. While one obtains a breadth of responses it is difficult to know how widely-shared these are. - They are normally time-bounded. Parties make a prior commitment to the duration of the interview, which cannot be adjusted to follow promising threads that emerge late in the interview.
In this chapter, we explore the use of Big Data technologies to provide an complementary tool that serves some of the same goals of semi-structured interviews and other methodologies of discover. Specifically, we explore recently developed deep semantic embedding techniques to discover relevant posts from a large corpus of user-generated content. We use a corpus of public posts from the LiveJournal site. LiveJournal is a social media site where users maintain a personal online journal or diary. LiveJournal posts tend to be longer than other sites, and the journal format encourages externalization of user attitudes and values. Explicit sentiment is attached to some posts via an emoticon system. The site contains many themed areas around topics such as health and lifestyle. Even among the public posts there are many where users discuss significantly life challenges and seek and provide support for others. The content does not cover every possible topic, but the corpus is quite large and rich and coverage of rarer topics grows as the corpus grows. We describe a tool which allows an “interviewer” to explore this corpus via a series of “questions” similar to a semi-structured interview. Each question retrieves a set of relevant sentences from various parts of the corpus, and the containing posts can be retrieved to provide context around them.
Within the LiveJournal corpus are significant tracts of text where users discuss attitudes, values and personal experiences around various topics. We argue that these tracts are similar to the responses users might give in a semi- 9
structured interview. They even occur in the context of that user’s history of posts, which can be explored to better understand their background. The challenge then is to separate the relevant tracts from other parts of the corpus. This is where deep semantic technologies come in. Word2vec is a neural semantic embedding method that has been shown to perform well at a variety of semantic tasks. “Semantics” here includes not just propositional content, but sentiment and even style. We argue that use of these technologies provides a much richer form of matching than traditional keyword-based search and even first-generation semantic search techniques such as Latent Semantic Indexing (LSI). We bolster this argument with our experiments which showed that perceived quality of the retrieved results was improved by filtering out surface-matching (i.e. keyword- matching) posts from semantically-matching posts.
While this approach is not a replacement for semi-structured interviews, it can serve some of the same goals. This is especially true for topics that are more general and therefore well-covered in the corpus. For these topics, there are potential advantages of this approach over classical interviews, which include:
Economy. Our approach is very economical compared to live interviews. No travel is involved, no transcription, no subject payments.
Interviewer Reflection. Interviewers can take as much time as needed to compose questions. There is no cost to pursuing side themes, or with “dead- end” themes. Interviews can last as long as desired, be paused/restarted etc.
Representativeness/Diversity. The number of users who will post about a popular topic (e.g. health, transportation) is large compared to the number of users who would be involved in face-to-face interviews. By exploring up and down the set of relevant sentences, the interviewer gains a sense of typical attitudes.
Interviewer Control. Our system provides a variety of controls and filters over the retrieved results. With practice, this allows the interviewer to more quickly get the kinds of responses they are looking for, compared to reformulating questions in a live interview.
We present evidence that the value of this approach improves with the size of the corpus and the effectiveness of the semantic embedding method. With the recent rapid progress in scalability and accuracy of semantic embedding methods, this means that our approach will improve in usefulness with these developments.
10
1.2 RELATED WORK
1.2.1 Keyword search
Keyword searches were the first approach to mining a large corpus of data for relevant information. Exploration techniques using single-word queries date back to keyword-in-context indexing (KWIC) [78, 40] which provides snippets of text surrounding the search query and a key for reading the text in its original context. This model is still relevant, but the results are difficult to parse and recent work has focused on visualization methods such as Word Tree [149] which aggregates results with identical word sequences and displays them in a tree-like structure. Later visual text exploration tools based on the Word Tree visualization paradigm include WordSeer [104] which makes it easier to navigate a corpus by facilitating switching between different views, adding summary statistics and applying filters (such as date ranges or other metadata). Other refinements included imposing specific grammatical requirements on the results [103].
A standard goal of semi-structured interviews is to obtain non-obvious insights, responses pertinent to the original topic but not expected by the interviewer. Most such interview responses are not expected to explicitly contain words from the interviewer’s questions. For this purpose, keyword search techniques can be too restrictive. Some efforts have been made to search a corpus directly by document level themes. One such example provides a flexible visualization and exploration tool [35] based on topic modeling, a technique to identify latent themes across a collection of documents [13]. As an alternative to the latent concept approach, the relatedness of texts has also been computed using explicit semantic analysis on natural concepts as defined by humans via Wikipedia [45]. An- other attempt to leverage the Wikipedia corpus for semantic annotation of content is described in [89].
1.2.2 Semantic Similarity
Researchers have also made efforts to developing a semantic similarity metric applicable at the word or sentence level. Schemes for measuring similarity at the word level include LSI and Point-wise Mutual Information and Information Retrieval (PMI-IR) [141], both of which learn semantic relation- ships based on co-occurrence of words in a training corpus. More recently, distributed word embeddings learned with recurrent neural networks have become state of the art through algorithms like word2vec, which we use here and describe in the Methodology section. Building on these efforts, various models to compute short text similarity based on word-level semantic similarity were also developed [90, 68]. A recent pa- per also attempts to develop direct embeddings at the sentence level using long short term memory (LSTM) 11
neural networks, called skip-thought vectors [72].
1.2.3 Qualitative Data Analysis Tools
In recent years, Qualitative Data Analysis (QDA) tools have been quite helpful to streamline qualitative research. Tools such as MaxQDA [86], Atlas.ti [6], and N-Vivo [108] are among the most popular ones. These tools provide a wealth of functions to make coding content faster and more precise. They all help generate insights drawn from different sources of data such as videos, text, annotations and recordings. However, these tools have not shown the possibility to perform semantic searching yet. They also do not have direct knowledge connections to sources such as social media or libraries. We are aware that some of these products have not yet succeeded to provide semantic search. We are confident that our tool could complement these tools as a plug-in.
1.3 METHODOLOGY
1.3.1 LiveJournal Corpus
For this study, we obtained a data set consisting of all public English-language LiveJournal posts as of November 2012 in raw XML format. As previously mentioned, LiveJournal is a social networking service where users can keep a blog, journal, or diary. Users have their own journal pages, which show all of their most recent journal entries. Each journal entry can also be viewed on its own web page which includes comments left by other users.
As of 2012, LiveJournal in the United States received about 170 million page views each month from 10 million unique visitors. Our data contains about one million users with a total of 64,326,865 text posts. The median sentence length is 10 words and, as expected, the distribution of sentence lengths follows a power law. Of the users that provided their date of birth, the majority were in the 17-25 age group. Users who chose to identify their location were primarily in the United States (72%), with significant populations in Russia, Canada, the United Kingdom, and Australia. Additionally, users were able to indicate their binary gender; of those who did so 45% identified as male and 55% as female.
1.3.2 Data preprocessing
We started with a raw XML dataset which consists of more than 9 billion words in over 500 million sentences. In order to extract semantic meaning and run synthetic interviews through the use of natural language processing (NLP) algorithms, we implemented a data pipeline that allows us to extract and clean the post text in a computationally efficient form. 12
The first step is to tokenize the corpus using FLEX (Fast Lexical Analyzer), a scanner which maps each word in the corpus to a unique number and saves the mapping in a dictionary. For efficiency, we keep only the 994,949 most common words and discard the rest (all of which occur fewer than 41 times in the entire dataset). Care is taken to properly tokenize numbers and emoticons such as “:-)”, “=<” or “>:P”.
After removing non-textual content such as images, hyperlinks, and XML formatting data, a bag-of-words (BOW) representation is saved for each sentence from the posts as a column in a sparse feature matrix. The rows of this matrix correspond to the entries in the dictionary above, and the values in the matrix are the counts of each entry for each sentence. For each sentence, we also store the ID of the post it belongs to and the user who wrote it so that the original post can be found online. Finally, we keep the tokenized sentence data to display results when performing the query.
1.3.3 word2vec
We explore synthetic interviewing by employing word2vec [91, 92], a popular word embedding model which has been successful in a variety of NLP applications, including analogy tasks, sentence completion, machine translation [85], and topic modeling [32]. Word embedding models map each dictionary word into a lower dimensional continuous vector representation. With word2vec, this embedding is learned automatically from a sample corpus using a recurrent neural network.
The true power of the technique is that the resulting feature space demonstrates semantic structure, e.g. vec(“king”) − vec(“man”) + vec(“woman”) has a greater co- sine similarity to vec(“queen”) than to the vector of any other word in the dictionary. We can also generalize this comparison technique beyond the scope of single-words. By performing vector addition on the word vector embeddings of corresponding words and normalizing the resulting sum, we can compare the semantic proximity of complete sentences. This procedure is described in detail below.
We use two embedding models trained on two different corpus. First, we train our own embedding model on the corpus of LiveJournal posts using the Skip- gram with negative sampling (SGNS) implementation of word2vec as recommended in [92]. This model gives 300-dimensional embeddings for the 994,949 words in our dictionary. Conveniently, Mikolov et. al published a pre- trained SGNS word2vec model for open source access [93]. This model was trained on a 100 billion-word Google News corpus, and gives 300-dimensional embeddings for 3 million unique words. All of the words from our LiveJournal dictionary that are not included in this model are mapped to zero vectors. Common stop words (i.e. “the”, “an”, “who”) provide little semantic 13
information and are also mapped to zero vectors.
1.3.4 Querying
The goal is to perform semantic searches on the dataset from queries which can range from individual words to full sentences. To do this, we find the similarity of sentences based on the embeddings of their individual words. Specifically, we define a sentence embedding to be the mean of its constituent word vector embeddings. The interviewer’s question is also converted to a sentence embedding and tested on the data set using cosine similarity. Note that multi-sentence queries are also possible by treating the entire query in the same way. Word embeddings have been employed in more recent and sophisticated approaches, which outperform our similarity method when used in short text strings [68]. However, our approach has sufficient power for our goal while favoring rapid retrieval of the best matches from a large corpus.
In order to query the data, we multiply the word2vec embed- ding matrix and the sparse bag of words matrix described above to get a semantic matrix. Each column of this matrix is the word2vec semantic encoding of a sentence in our data set. We normalize each column to make sure that the magnitude of each column is the same, regardless of the number of words in each sentence. This allows us to perform queries simply by performing the dot product of each column with the query vector. The sentences with the largest inner product scores have the strongest semantic match to the query. We display a certain number of these sentences, which we call the responses to the interviewer’s question.
Semantic querying uses word2vec embeddings trained on two corpuses or data: GoogleNews and LiveJournal. Each embedding is trained with the same master dictionary, to guarantee that each index corresponds to the same word. Let w2vMat ∈ R300 × #words be the word2vec embedding, where #words is the number of words in the master dictionary, and 300 is the chosen vector dimension.
The featurized sentence data is the bag of words matrix described above. We call it dataMat ∈R#words × #sents, where #sents is the number of sentences considered from the LiveJournal dataset. The sentence data is converted into vectors:
magic = w2vMat ∗ dataMat, (1) where matrix magic is further normalized along each column, and represents the numerical matrix wherewe perform the querying.
Querying if performed by converting the original query into a bag of words 14
(queryWords ∈ R#words × 1), and then calculating its corresponding vector
queryVec = w2vMat ∗ queryWords, (2) which is then normalized.
The semantic relationship is determined by cosine distance, which is the dot product of queryVec with the columns magic, since they are both normalized. The following array, distMat ∈ R1 × #sents, represents the semantic score between the query vector and the sentence vectors from LiveJournal data:
distMat = queryVecT ∗ magic. (3)
While the system above gives us valuable information, we will see in the following section that some additional features are helpful to tune the responses and obtain more qualitatively interesting responses. First, we use a minimum threshold on the number of words in the response. No response is allowed which contains fewer words than the threshold. All words, including stop words and out-of-dictionary words, are included in this total. We found that a minimum threshold of 7 is a good starting point for the LiveJournal dataset, but it can be selected by the interviewer at any time.
The interviewer is given the power to filter out responses which contain a specific word (or a word from a specific set). The tool does not return sentences containing one or more words from the set of filter words. One simple case in which this is helpful is to suppress responses with exact matches to words from the query question, thereby avoiding responses similar to those of a traditional keyword search.
We also give the interviewer the ability to add importance weighting to words in the search query. In this case, when calculating the embedding of the query sentence, we perform a weighted mean of the individual word embeddings based on their importance. Note that these weights can be positive or negative. In the latter case, responses with semantic relation to the negatively weighted word are suppressed.
In order to understand the context of answer sentences returned by the query, we also provide the interviewer a link to the original post on the LiveJournal website.
15
1.4 ITERATIVE CO-DESIGN PROCESS
Our population was qualitative researchers with diverse social science backgrounds. We interviewed
1.4.1 Initial Concept Formation
The design of this system starts with the realization that there is a potential to gather semantic data from large corpus of data, which could be used to help researchers identify what people are actually saying about a topic. The use of LiveJournal text helps access colloquial terms. The size of the corpus also ensures that even less popular topics are often represented in many journal posts. We also felt that existing keyword-based tools to explore such large corpuses are limited in their ability to recover unexpected insights.
We also want the researchers to be able to pose queries as full sentences in addition to collections of words. In this way queries can carry additional information such as emotions. We therefore formulated a very simple format of asking “questions” or queries to the crowd data to obtain relevant answers from the dataset. Figure 1.1 shows an example of a series of synthetic interview questions analogous to the semi-structured interview process. We expect that an interviewer hones their line of questioning along with their hypothesis by exploring the responses at each step.
16
to explore idea to explore Interesting result, result, Interesting use as next query trigger such powerful memories .644 -- it’s funny how a song can .644 -- it’s ’ , where interviewers must Filter: ‘trigger’ For us seeing that somethingvalid. is There repeated is many a times concepttion”, is in and ethnography we called use “satura- it all the time Filter: ‘reminiscence’ Filter: ‘reminiscence’ the responses. Sincetive some or of redundant our answers, one queries idea generated we repeti- had wasour to surprise, design the a ethnographerslieved rejected that, this in their idea. field,receiving They “saturation” similar of be- responses) responses could (repeatedly also be informative. Of course they were interestedcould be in toggled, having and such they aor believed feature even that a if a simple it filter optioncomplex) for that expressions diversity allows would them already to help. see larger (more Low Fidelity (Lo-Fi) Prototype Once we gathered this initialour perspective search we tool. began to Duehad refine to already the developed complexities ansearched of initial over the low-fidelity system, just prototype we that 0.1% of the dataLater we in progressed one to computerapproximately a 1% with higher of fidelity the prototype datatowards querying and the we ability are to query currentlya 100% working cluster of the of data computers.in on using the a fly The using prototype reason thatsystem will we was simulate that were the from so speed ourqueries of experience interested were the the generated speed final was at asquality which important of the as the content the for amount creating and a good searchbetween experience. different queries and answers. to two researchers. First we hadand a how very people short maintain weight conversation generating after a dieting. couple of He very started simplethough by queries we about were diet, working and with even found the value in lo-fi some prototype of healso the already immediately responses. wanted However access this to researcher surrounding contextual sentences data or, such even better, asembedded access the in to the the actual sentence post. This speed allowed interviewers to make quick connections with a public health researcher interested in the topic of diet, With this prototype working we presented the lo-fi prototype 16GB of memory, where we could run queries on the fly. “diversity” filter to increase the entropy of our answers. To Query: trigger’ ‘reminiscence [5]’ trigger such powerful memories Query: ‘it’s funny how a song can Query: ‘it’s structured interviews -
Filter: ‘trigger’ such powerful memories Minimum Word Threshold: 20 Threshold: Minimum Word ‘reminiscence’ Many responses Many responses Results emphasize contain formal word interested in memory interested ‘funny’, but interviewer Query: ‘it’s funny how a song can trigger Query: ‘it’s many time as desired. as time many [5]’ shows an example of a series of 1 responses complicated Initial Query: memories Filter: ‘trigger’ Interested in more in more Interested Example, based on Example, a real o query, how the evolve in to queries response the returned trigger such powerful ‘reminiscence trigger’ ‘reminiscence
Query: ‘it’s funny how a song can Query: ‘it’s
Figure 1.1: answers. The flow is somewhat analogous to semi modify their ofline questioning on the tofly respond to answers received. The example is linear for simplicityof presentation,the but interviewer easilycan follow a line of questioning thenand return earlieras query an to If you can searchgood. in . all . and those if journals therecontrast. would are many be that really are similar it helps to Figure 1. Example, based onanalogous a real to synthetic semi-structured interview, of interviewing,example how where the is interview interviewers linear questions must for evolve simplicitytimes in modify of as response their to presentation, desired. line the but returned of the answers. interviewer questioning The can flow on easily is the follow somewhat fly a line to of respond questioning to andensures answers then that received. return even to less The an popular earliermany topics query journal as are posts. many often We represented also in tools felt to that explore existing such keyword-based largeto corpuses recover are unexpected limited insights. in their ability full sentences in addition toqueries collections can of carry words. additional information Intherefore such this formulated as way emotions. a We verytions” simple or format queries to of the askingfrom crowd “ques- the data dataset. to obtain Figure synthetic relevant answers interview questions analogous tointerview the process. semi-structured We expect thatline an of interviewer questioning hones along their withthe their responses hypothesis at by each exploring step. Early Idea Validation Once we had established whattwo we fronts, wanted the to method do to we obtain worked the in data andon the the way validation. semi-structured interviewingperspective work of and the the qualitative overall methodstalking and with tools. a We social started workergraphers, and and a simply couple asked of themcorpus technology what of ethno- they data believed a couldgiving large do them access for to them. millions of We diaries, used and the thatlieved they analogy could that of such a tool would bea fundamental very need beneficial was to tomodify have have the “context”, a unit i.e. of toback. be analysis The able worker from to also sentences mentionedface to that to having journals pose a and queries simplecan is inter- help very search important, not especiallyexpressions. only if technical the terms tool but also colloquial ulation is fundamental, and agroup tool of allowing people them with to commonhelp describe interests them a would when be formulating valuable initialing to hypotheses. conversation we One had interest- concerned an option to postprocess The ethnographers mentioned that their need to observe a pop- We also want the researchers to be able to pose queries as We approached researchers very soon to gather perspective “interview” their users through them. The social worker“universal” be- perspective of people’s thoughts. She believed that 17
1.4.2 Early Idea Validation
Once we had established what we wanted to do we worked in two fronts, the method to obtain the data and the validation. We approached researchers very soon to gather perspective on the way semi-structured interviewing work and the overall perspective of the qualitative methods and tools. We started talking with a social worker and a couple of technology ethnographers, and simply asked them what they believed a large corpus of data could do for them. We used the analogy of giving them access to millions of diaries, and that they could “interview” their users through them. The social worker believed that such a tool would be very beneficial to have a “universal” perspective of people’s thoughts. She believed that a fundamental need was to have “context”, i.e. to be able to modify the unit of analysis from sentences to journals and back. The worker also mentioned that having a simple inter- face to pose queries is very important, especially if the tool can help search not only technical terms but also colloquial expressions.
If you can search in all those journals would be really good. . . and if there are many that are similar it helps to contrast.
The ethnographers mentioned that their need to observe a population is fundamental, and a tool allowing them to describe a group of people with common interests would be valuable to help them when formulating initial hypotheses. One interesting conversation we had concerned an option to post- process the responses. Since some of our queries generated repetitive or redundant answers, one idea we had was to design a “diversity” filter to increase the entropy of our answers. Ethnographers believed that, in their field, “saturation” of responses (repeatedly receiving similar responses) could be informative.
For us seeing that something is repeated many times is valid. There is a concept in ethnography called “saturation”, and we use it all the time
They were interested in having such a feature if it could be toggled. They believed that a filter for diversity or even a simple option that allows them to see larger (more complex) expressions would help.
1.4.3 Low Fidelity (Lo-Fi) Prototype
Once we gathered this initial perspective we began to refine our search tool. Due to the complexities of the system, we had already developed an initial low-fidelity prototype that searched over just 0.1% of the data in one computer with 16GB of memory, where we could run queries on the fly. Later we progressed to a higher fidelity prototype querying approximately 1% of the data and we are currently working towards the ability to query 100% of the data on the fly using a cluster of computers. The reason we were so 18
interested in using a prototype that will simulate the speed of the final system was that from our experience the speed at which the queries were generated was as important as the amount and quality of the content for creating a good search experience. This speed allowed interviewers to make quick connections between different queries and answers.
With this prototype working we presented the lo-fi prototype to two researchers. First we had a very short conversation with a public health researcher interested in the topic of diet, and how people maintain weight after dieting. He started by generating a couple of very simple queries about diet, and even though we were working with the lo-fi prototype he already found value in some of the responses. However, this researcher also immediately wanted access to contextual data such as the surrounding sentences or, even better, access to the sentence embedded in the actual post.
. . . if you can show me some context would be good. . . like, maybe the sentence before, or even better before and after. . . yeah, the link to the page would be good also, . . .
During the process the interviewer asked if it was possible to weight the words in any way, as he noted that sometimes ancillary words dominated the answers. We took note of this as a potential improvement for the tool.
With this initial mildly positive reaction we carried on engaging other researchers and we already started to develop some way to access the contextual information surrounding the hit phrases.
A second researcher studies Airbnb and neighbor relationships with Airbnb landlords. The challenge in this case is the inherent difficulty in searching for a recent topic while our data set was obtained in May 2012 (we are in the process of obtaining a more recent version of the LJ corpus). Despite this limitation we explored queries concerning issues with neighbors. The frustration of the researcher was quite enlightening. At the same time, we observed a change in the query inputs, as the researcher moved away from the traditional keyword search approach and started searching more colloquial common expressions. Furthermore, she was interested in using the actual answers as queries themselves. Since we were having little success, this approach compensated our inability to frame a good query. This indeed brought us a bit closer to the type of answers the researcher valued. There was an improvement when searching on the mid-fi prototype (1% of the data) but, due to time constraints, we didn’t gather further insights.
Due to a time limitation we did a very quick search on the mid-fi prototype (1% of the data), and we did observe already an improvement. As closing 19
remarks, the Airbnb researcher brought a very relevant point, which is that in order to generate good answers she believes that it was important for the interviewer to become a “smart user”. This was considered good and bad. On the one hand, the overall process of creating queries and seeing the answers did add to the knowledge and helped the researcher form ideas around the research topic. On the other hand, the perils and frustration of generating good queries should be addressed not only by improving the quality of the searches but also by adding a good UI to help creating the queries. The researcher also pointed out some tools like MaxQDA [86] as examples of some of the interfaces that qualitative researchers are used to, with good examples of knobs and other smaller helpful tools.
For the smart system vs smart user, I was just thinking along the lines of how well the system is equipped to predict what the user means (think of early Altavista vs Google today) and how that affects how easy it is for the user to get what they want – of course, the other side of this is how clear it is how the system works since if the rules of the game are obvious for the user, then it is easier to game the system to get what is sought for.
Finally, we discussed the types of topics that LiveJournal can handle. The researcher believed that having some preliminary understanding of what topics are well represented in the corpus would help understand how adequate the corpus is.
. . . now, this is better, but you have to make some “crazy” assumptions to get to this level.
One more interaction with a researcher focused on social relationships online. He was well versed in search models and he wanted to have an initial description of the system, which he understood well. As a matter of fact, the researcher had an inside story about a failed attempt by a commercial analysis software maker on trying to do semantic search a few years ago. He was very open to this idea and interested to see what he could find. He started with a simple query about “family holidays” (Figure 1.2 shows sample questions and responses related to this topic).
20
ery words. In the In words. ery
3.
1. weight “family” more . . . and I think I will sendstudents this in my to department! my :) colleagues . in . . the CDC, or even better get many other dissertations to other last of a childhood christmas the frenzy of tissue paper , broken ball ornaments of tissue paper , broken true last of a childhood christmas the frenzy , and saying happy holidays ? , holiday cards what is up with holiday trees quiet , personal celebrations for holidays as in calendar . holiday : i prefer wrapped : in law , sis holiday presents , parents bought : parents holiday presents merry christmas happy holidays i hope everyone has a wonderful holiday with their families s something about holidays , holiday shopping and decorations that :-p still , there possibly a holiday for mum and me , long weekend that is . looking at xmas lights is holiday movies and lotion sets driving around there wrapped : in law , sis holiday presents , parents bought : parents holiday presents . the holidays continue with a trip to seekonk and house full of relatives merry christmas happy holidays i hope everyone has a wonderful holiday with their families true ornaments , ball broken christmas tissuepaper, frenzy the of achildhood lastof and friends . , relatives , grandchildren , siblings husband children parents love from my son has headed home to amarillo spend the holiday with his mom and siblings . out of the house , enjoying holiday with friends . and sister were my brother happiest of holidays to all my friends , and your families . entries. He felt that someabout journals his were topic excellent and case hethat studies even people thought in of public other policy potential could research make. some additional features like theget ability the to sentences subtract before words or andthat to after he would the like matches. to be Finallythe able he to ones said easily he see the wants journalsalone. and to select work He on also andthe reiterated socio-demographic search properties the on of those importance theconcerned journals dataset. of about He knowing understanding was the not topicsand embedded not in much the interested corpus, in a predetermined diversity filter. He Mid Fidelity (Mi-Fi) Prototype In response to the feedbackmented from the the additional interviewers we search imple- options (filtering, importance section above. We also expandedof the the interactive dataset. search We to call 1% this the mid-fidelity (mi-fi) prototype. The public health researcher said that he would love to see would not mind using it but only if he can control it. weighting, and minimum threshold) described Querying sub- We now wanted to contrast our former iteration with our filters holiday shopping with friends , and early dinners . , sister grandma husband thought about and planned : people parents holiday presents : marginally high however . my holiday stress and friends ! make me happy . and annoying holiday music shopping . , sister grandma husband thought about and planned : people parents holiday presents : marginally high however . my holiday stress and friends ! . dinners early and holiday , friends with shopping holiday query:familyholiday “family” excludesentences containing 1. 2. 3. 4. 5. 6. 7. 8. query:familyholiday family2timesholiday more than weight “family” excludesentences containing 1. 2. 3. 4. 5. 6. 7. 8.
“family” exclude word exclude word the highlighted result is shown Figure is in theresult highlighted 0 words,seethe0 resultsbottominleft. Since perform wesemantic a query, 30 word minimum 30 word Example Search for “Family Holiday” Example Search . 3 holiday presents bought : parents , parents in law , sis in law holiday presents wrapped : in law , sis holiday presents , parents bought : parents holiday presents christmas celebrations : eve mom s family , my family has different family , , more or a time of family , funerals shopping till you drop christmas break , then back home for our spend the holiday and following weekend at my parents i had also been thinking about the national push to have family time , dinners and decorated , first day of holiday acquired lights put up on the house , christmas tree because ... its cold thanksgiving food family weather rain christmas music after all that hubbub , rather than my family s usual christmas gathering of and holidays and family vacations , the is starting to look like ... a family . lots of time with family , after holiday shopping late christmas gift and so on . christmas vacation was spent by celebrating the holidays and visiting w friends family . it was a time of holidays , festive cheer and spending with close friends family . our own family dinner for the holiday . being family less for the holidays , we created holiday , not a fun winter occassion to exchange gifts and get christmas is a religious the holiday season ! that old traditional holiday filled with fun , food family and to kick off ball ornaments of tissue paper , broken true last of a childhood christmas the frenzy holiday presents thought about and planned : people parents , sister grandma husband thought about and planned : people parents holiday presents : marginally high however . my holiday stress day sister s in laws then immediate family and after christmas dad . . family , and maybe a small space to breath household family year s end christmas solstice generic winter holiday celebration , and get for a while . a room kids settled in to share family game night , fun vacations etc . baking , home made eggnog startto up with the christmas music , write holiday cards faraway people . grandparents getting to spend time with my family decorations christmas making buying xmas presents friends not being able to sleep like a five year old on xmas eve super fun holiday parties my st birthday ! family christmas . to a rojo friends cause of stupi auntie drama , my family headed off together with family and friends . holiday shopping with friends , and early dinners . query:familyholiday 30>= words sentences with 1. 2. 3. 4. 5. 6. 7. 8. query:familyholiday 1. 2. 3. 4. 5. 6. 7. 8.
the displays right Top data. full the on results query the in filters of effect the of Example
Figure 2. Example of themore effect in of depth filters responses, in wemay the can choose query filter to results. out exclude Top the righta the sentences displays weighting words with the “family” scheme less top or on 8 than “holiday” the results 30 from query for words, the words. the see results. query results In “familyis in In the shown holiday” the the bottom in on bottom top right Figure the left. right panel, full panel, data. we Since we increase we To see exclude the perform a the weight semantic word on query, “family”. the we word Finally, “family”, we to canDespite obtain define his results remarks that relate about more UIindeed to in tools, this his unique biggest ability interestfind that our was “genres” tool of brought journals to potentially or groups of users, which for him Testing on more data By this time wememory had to run enabled the a interactive queries larger on server 1% with of 64GB thehealth data. researcher, of who We had ahad mild interacted. experience The the researcher first beganquestions time by about we asking the data demographic set,user, such etc. as number He of considered users, thisHe posts important also per to wanted frame to his understandprocess research. and better asked the whether semantic itsentences. searching was The better researcher to mentioned search thatin keywords he or finding is users very that interested cessful are strategies to archetypes lose of weightprepared those and a keep who group it used of low. synonyms suc- Hesearch. about had We eating started also that searching in he theto wanted lo-fi see to prototype the just difference. as aposing The way the public question health “i researcher lost beganand weight by the and default kept it threshold, off” butfrom with found the no little lo-fi filter value dataset. in the We moved results then to theHe mid-fi was prototype, very impressed and wanted to see various full journal was very impressive. wanted to demonstrate this new iteration to the same public which return much more relevant results for the same query. “family” than “holiday”. Note that each result belongs to a full post, which we retrieve to the interviewer, e.g., the original post of the highlighted result Figure 1.2: top 8 results for the query “family holiday”. To see more in depth responses, 3 than less we withsentences can filter out the we may choose to exclude the words “family” or “holiday” weexclude from the theword “family”.results. Finally, Inwe can definethe a topweightingto more scheme rightrelate that results onobtainthe topanel, “family” qu theword on weight the increase we panel, right bottom “family” than “holiday”. Noteinterviewer, e.g., the that original post of each result belongs to a full post, which we retrieve to the 21
The social relationships researcher quickly found that the data was very promising. He was very happy with the large number of responses easily available, including expressions syntactically similar to the initial query as well as others semantically close, but quite different syntactically. We visited the actual journals as well (see figure 1.3 for an example), going back and forth between the response sentences and the full journal posts. He started finding meta relations across the sentences and we came across more than one answer from the same journal. The researcher felt that the synthetic interview tool was naturally helping him to find “genres” of users, and thought this was a great innovation. (see figure 4 for some examples). He said that the top results validated his past findings, from traditional research processes, that songs and smells are extremely triggers of reminiscence. He believed that synthetic interviewing has potential to support research at a formative phase as well as during refinement of the hypothesis. With more information about the population, the synthetic interview process could even become a useful formal tool. Finally, we closed our interviews with a law sociologist re- searching access to justice. The initial search results were quite peculiar because of the formal phrasing of her queries. When searching for “presence in court” many answers were about tennis courts. However when we added the filter to elim- inate the word “ball” from the results the answers were much better and she found many interesting quotes about the judi- ciary system. She reiterated the point of previous researchers that access to detailed demographic information is a key ele- ment to make the tool viable. Without that it would be very hard to use it in publications. I believe that I can use this tool in two moments. In the
beginning of my research when I want to decide which issue I will talk about say I will try to figure out what is FigureFigure 3. 1.3: Public Public post onpost LiveJournal.com, on LiveJournal.com, corresponding corresponding to the high- to the highlighted result in Figure 1.2. By viewing the full text in its original lighted result in Figure 2. By viewing the full text in its original context, going on, what people are talking about, what are the context, the interviewer is able to explore a response in greater depth. main issues that are going on. I can bring one issue that the interviewer is able to explore a response in greater depth. is interesting to help me make the right question. It’s nice to see what people are talking [about]. I can then build and again contacted the social networks researcher and run the question to be tested. And then I see another time to some of his former queries but this time with filters. use it when I want to make real research with people and He was delighted with the new results and made the remark I don’t have enough money or have enough time to go that he believes that even at this level of fidelity he would be to the field, it will make it easier and more applicable to much more interested in using the tool. make research with people. . . . it is a really good tool to be able to find the chunks However, the law sociologist believed that even without those of data that I want to use in my studies and I would feel key validation elements, the tool would be very helpful during pretty comfortable in being able to use that [word2vec]. the initial phase of the hypothesis formulation. She believed that the tool itself would be a great way to save money and Finally we showed the tool to two other researchers. The first time instead of going “door to door” at the early stages. She was a human computer interface researcher/designer studying mentioned that for some of her projects, e.g. in the favelas of online activism and multimedia communication tools for fam- Rio de Janeiro, she was not even able to go to the field as it ilies. He found very little merit in the data for the activism was too dangerous and she was forced to rely on secondary topic, likely because his particular research topics were more sources, like police or judiciary documents, to study the pro- modern than our dataset. Even with the mi-fi model, he did cess of pacification within the favelas. She believed that if the find some references that he found interesting once he adapted synthetic interview tool could be applied to a corpus of data his search strategy from keywords to more colloquial expres- containing posts from such areas, this tool would allow her sions (such as “please sign my petition”). At this point he direct contact with the people’s voice. Even in more accessible made a very important remark. He found that he had to com- areas, she believes the simple ability to search quickly through pletely change his search paradigm when using the synthetic what the people think about a topic is very compelling. interview tool, moving away from the example based queries that are successful with traditional keyword search engine I had experience in Brazil because my field was impos- queries and towards semantically meaningful questions. sible, so I decided to make research where I could have access available. So I decided to make a new social I think the problem might be when you think about a system. It was impossible to have people to classify the search or a query tool we are way to much biased by favelas, so I went to the judicial system database for the our daily search experience, “all about keywords” but it jurisprudence so I could make a research through the seems to me that this, for this to work, you have to give point of view of the institution. an example. The law sociologist also expressed that she was eager to use When he started searching for the information about a previous the tool as soon as possible for her current research topic. research topic, “reminiscence trigger,” the human computer interaction researcher was very happy with the results results 22
This is impressive...the actual responses for a query are very similar. . . Ah!, if you can search for all those “authors”, people that write about the same topic you could get something like “genres”. . . yeah, this would be very useful
He believed that an ability to changing the unit of research from sentences to journals to genres of journals would be very valuable. The researcher also suggested that it might be interesting to store a history of queries, making a micro corpus from previous responses on which to perform additional se- mantic searches. He mentioned that, as the synthetic interview tool incorporates these additional complexities, some user interface (UI) tools would be necessary to make the process more tractable.
. . . exploring text is hard... yeah, some tools like remembering my searches or some faceted search would be good, but I do not want the system to make many decisions for me, it will lead us to a “search bubble”
He was also very impressed that among the different results he was able to find phrases that were relevant to his query, but which did not include the actual query words at all. As you could see in Figure 2 the researcher searched for “family holidays” and found many results containing the word family, but also many that were about family without the actual word. He told us that he would like the ability to add a filter where the actual query words are excluded from the results.
. . . more than searching for the tail, it would be better to filter out “words” like this “family” word that comes many times and see only things like this one about the “brother and the sister”?
Despite his remarks about UI tools, his biggest interest was indeed in this unique ability that our tool brought to potentially find “genres” of journals or groups of users, which for him was very impressive.
1.4.4 Testing on more data
By this time we had enabled a larger server with 64GB of memory to run the interactive queries on 1% of the data. We wanted to demonstrate this new iteration to the same public health researcher, who had a mild experience the first time we had interacted. The researcher began by asking demographic questions about the data set, such as number of users, posts per user, etc. He considered this important to frame his research. He also wanted to understand better the semantic searching process and asked whether it was better to search keywords or sentences. The researcher mentioned that he is very interested in finding users that are archetypes of those who used successful strategies to lose weight and keep it low. He had also prepared a group of synonyms about eating that he wanted to search. We started 23
searching in the lo-fi prototype just as a way to see the difference. The public health researcher began by posing the question “i lost weight and kept it off” with no filter and the default threshold, but found little value in the results from the lo-fi dataset. We moved then to the mid-fi prototype, which return much more relevant results for the same query. He was very impressed and wanted to see various full journal entries. He felt that some journals were excellent case studies about his topic and he even thought of other potential research that people in public policy could make.
...and I think I will send this to my colleagues in the CDC, or even better get many other dissertations to other students in my department! :) . . .
The public health researcher said that he would love to see some additional features like the ability to subtract words or to get the sentences before and after the matches. Finally, he said that he would like to be able to easily see the journals and select the ones he wants to work on and search on those journals alone. He also reiterated the importance of understanding the socio- demographic properties of the dataset. He was not concerned about knowing the topics embedded in the corpus, and not much interested in a predetermined diversity filter. He would not mind using it but only if he can control it.
1.4.5 Mid Fidelity (Mi-Fi) Prototype
In response to the feedback from the interviewers we implemented the additional search options (filtering, importance weighting, and minimum threshold) described Querying subsection above. We also expanded the interactive search to 1% of the dataset. We call this the mid-fidelity (mi-fi) prototype. We now wanted to contrast our former iteration with our filters and again contacted the social networks researcher and run some of his former queries but this time with filters.
He was delighted with the new results and made the remark that he believes that even at this level of fidelity he would be much more interested in using the tool.
...it is a really good tool to be able to find the chunks of data that I want to use in my studies and I would feel pretty comfortable in being able to use that [word2vec].
Finally, we showed the tool to two other researchers. The first was a human computer interface researcher/designer studying online activism and multimedia communication tools for families. He found very little merit in the data for the activism topic, likely because his particular research topics were more modern than our dataset. Even with the mi-fi model, he did find some references that he found interesting once he adapted his search strategy from 24
keywords to more colloquial expressions (such as “please sign my petition”). At this point he made a very important remark. He found that he had to completely change his search paradigm when using the synthetic interview tool, moving away from the example based queries that are successful with traditional keyword search engine queries and towards semantically meaningful questions.
I think the problem might be when you think about a search or a query tool we are way too much biased by our daily search experience, “all about keywords” but it seems to me that this, for this to work, you have to give an example.
When he started searching for the information about a previous research topic, “reminiscence trigger,” the human computer interaction researcher was very happy with the results (see figure 1.4 for some examples). He said that the top results validated his past findings, from traditional research processes, that songs and smells are extremely triggers of reminiscence. He believed that synthetic interviewing has potential to support research at a formative phase as well as during refinement of the hypothesis. With more information about the population, the synthetic interview process could even become a useful formal tool.
Finally, we closed our interviews with a law sociologist re- searching access to justice. The initial search results were quite peculiar because of the formal phrasing of her queries. When searching for “presence in court” many answers were about tennis courts. However, when we added the filter to eliminate the word “ball” from the results the answers were much better and she found many interesting quotes about the judiciary system. She reiterated the point of previous researchers that access to detailed demographic information is a key element to make the tool viable. Without that it would be very hard to use it in publications.
I believe that I can use this tool in two moments. In the beginning of my research when I want to decide which issue I will talk about say I will try to figure out what is going on, what people are talking about, what are the main issues that are going on. I can bring one issue that is interesting to help me make the right question. It’s nice to see what people are talking [about]. I can then build the question to be tested. And then I see another time to use it when I want to make real research with people and I don’t have enough money or have enough time to go to the field, it will make it easier and more applicable to make research with people.
25
are from 4
and 2
containing “trigger” containing chris , the fat son was trying. to lose weight by dieting and exercising and trying after dieting , exercising to lose weight i ve gained . i have spent the past year , losing weight on a low carb diet . i m back on the diet train , tryingweight . to lose some more my lost five pounds by drastically cutting carbs from speaking of weight , i ve already that i was part some of you may remember the study comparing atkins weight loss to , i m drinking diet soda . and in my latest attempt to lose weight without exercising the last week i ve been on a diet , trying to lose weight . , memories of the news memories you wish could forget , to remember i keep having memories of her ... happy , sad that fall that makes me sad ... choir banquet brings back memories , bad good , memories of but next subject ... today was a day of memories , jim hoff , and sometimes memories of dreams thus , i have memories of remembering she is the one who keeps memories ; of tears running silent has been a very long year , bad memories sweet frustating moments memories , and those make however , good memories make you think of more diet . weight loss on a low fat , moderate carb diet . together . memories of the weather , us when we were in between . somewhere memories , and evil . friends , memories of the summer . . , but never memories of dreams of dreams issued harshly ; memories of pain too intense to explain blood threats washed away . moments , happy memories sadness emo shit . stressed memories . you think of more query: I lost weight by trying caloriecounting trying by weight lost query:I “calorie” excludesentences containing 1. 2. 3. 4. 5. 6. 7. 8. memories powerful such cantrigger asong how funny query:it's excludesentences 5times“memories” more weighted 30>= words sentences with 1. 2. 3. 4. 5. 6. 7. 8. success. One researcher voicedLiveJournal, that [most because authors] “the postlove, corpus about and is topics marriage”, such however there astopics “would food, like not physics”. be This [entries] shortcoming on is in the querymi-fi corpus, prototypes described above, weof were the able to dataset query offline 100% interactive interviews). (the The process results is insuch Figures currently full too corpus. slow Thismany for significantly queries. improves the We alsoother quality topics of expect which that are synthetic poorlyposts represented interviews can in on be the executed LiveJournal successfullycorpus by with choosing greater an relevance. alternate dings. We experimented with embeddingsNews trained as on Google well astraining LiveJournal corpus posts, provides and ansynthetic additional concluded interviews option that to for the specific tuning research the topics. FUTURE WORK ods as a alternative/complement toing semi-structured in interview- user studies. Atmatching high-level, we improved found over that traditional using keyword-based semantic and methods, that searching a larger dataset improves the quality and This paper was a first exploration of semantic matching meth- which we can target in two ways. In addition to our lo-fi and Another variable in the interviews is the choice of word embed- Example Results Figure 4. Example queries and results ran on 100% of the data, and with distinct filters. Example queries and results ran on 100% of the data, and with distinct filters. distinct with and data, the of 100% on ran results and queries Example
1.4: reminiscencetrigger Figure d of naisbitt s book , representative democracy to participatory d of naisbitt s book , representative democracy continues but , just incase someone out there s supper trigger sensitive , warning but , just incase someone out there . , talk about a memoryby particular today ; s prompt triggered song ? rage , like this . memories , that just sends me into pure things trigger off some smells not only trigger memories , they the intense feelings that went along for camera to trigger studio flashlight discreteness trigger is a control this wireless its funny how much a smell or song can trigger such vivid memories . my heartthat trigger a series of memories . still melts at the constant reminders a photograph or song , for example that triggers nostalgic memory is there of a certain political participation an educated , questioning and engaged citizenry is essential for , political fundamental building blocks popular sovereignty a democracy needs the three not a democracy ... who knew ? bahamamama : democracy nope its a republic tren democracy is an organization composed of democratic it goes thusly : grassroots a citizen s guide to democracy of democracy another section from the future harding democracy would oppress in his discussion of democracy , he cautioned that a pure : civil rights , economy and political . social democracy your nation s freedoms with them too . . synchronously period of your life ? successful democracy . equality and political liberty to be a democracy at all . the decentralization theme of past two chapters . activists and others determinedthe democratic party to revitalize . inaction . minorities . query: “reminiscence” excludesentences containing 1. 2. 3. 4. 5. 6. 7. 8. query:democracy participation 1. 2. 3. 4. 5. 6. 7. 8. CONCLUSIONS In this paper, weinterviews presented on a a new large, tool colloquialtage text to of corpus. perform advances We synthetic in take semantic advan- powerful word semantic embeddings search to provide toolsentences. a based Additional on features queries for modifyingthe phrased and as search augmenting results wereeral designed qualitative researchers. with the Theindicated feedback feedback that from of the researchers tool sev- semi-structured can interviews. serve The some responses of alsotool’s highlight potential the the advantages same in goals economy,representativeness, as interviewer and reflection, interview control. research questions and a large collection of informal anecdotes. others in thatresearcher it states is that “not he “cansimply all also “thinking about profit of from keywords”. examples [ourtraditional tool]” to keyword Instead, by searching, search the the on”. syntheticmore interview varied Compared gives results, to including manyoriginal whose query connection is not to obvious the toof the the interviewer at process. the beginning cannot ignore the remaining minority that did not echo similar The synthetic interview tool bridges the divide between formal As stated by the online activism researcher, this tool is unlike While a majority of our trials yielded positive results, we 26
However, the law sociologist believed that even without those key validation elements, the tool would be very helpful during the initial phase of the hypothesis formulation. She believed that the tool itself would be a great way to save money and time instead of going “door to door” at the early stages. She mentioned that for some of her projects, e.g. in the favelas of Rio de Janeiro, she was not even able to go to the field as it was too dangerous and she was forced to rely on secondary sources, like police or judiciary documents, to study the process of pacification within the favelas. She believed that if the synthetic interview tool could be applied to a corpus of data containing posts from such areas, this tool would allow her direct contact with the people’s voice. Even in more accessible areas, she believes the simple ability to search quickly through what the people think about a topic is very compelling.
I had experience in Brazil because my field was impossible, so I decided to make research where I could have access available. So I decided to make a new social system. It was impossible to have people to classify the favelas, so I went to the judicial system database for the jurisprudence so I could make a research through the point of view of the institution.
The law sociologist also expressed that she was eager to use the tool as soon as possible for her current research topic.
1.5 FUTURE WORK
This chapter was a first exploration of semantic matching methods as a alternative/complement to semi-structured interviewing in user studies. At high-level, we found that using semantic matching improved over traditional keyword-based methods, and that searching a larger dataset improves the quality and diversity of retrieved posts. These findings suggest a number of avenues for future work:
Improved Semantic Technologies. Word2vec is a fast and moderately effective semantic embedding scheme. But it ignores word order and phrase structure. State-of-the-art methods use recurrent neural nets (RNNs) such as LSTM (Long Short-Term Memory) [72]. These better model local and global sentence structure, and can be used to model paragraph and document-level structure. Using better embedding methods will likely lead to more useful results from semantic matching.
Larger-Scale Live Matching. Semantic matching is expensive at the full scale of Livejournal. With 500 million sentences and 300-dimensional embedding, one needs 600 GB of “hot” data in memory to do model matching. That re- quires either custom hardware or a cluster of machines. In the former case, one also needs a fast index to retrieve nearby sentences. Both solutions are practical but require integration of state-of-the-art techniques, and remain 27
work in progress for us.
Synthetic Text Generation. One powerful affordance of the LSTM design is the generation of text as well as matching. That is, one can produce fully synthetic user output in response to a query. This synthetic text has high quality on related tasks such as language translation. While one loses the affordances related to exploring a particular user’s post in context, one gains affordances related to the diversity and representativeness of the synthetic posts (via controls on the diversity of their synthesis). One also gains a level of privacy protection relative to retrieving true posts. The quality and utility of these posts is very much an open question.
1.6 CONCLUSION
In this chapter, we presented a new tool to perform synthetic interviews on a large, colloquial text corpus. We take advantage of advances in semantic word embeddings to provide a powerful semantic search tool based on queries phrased as sentences. Additional features for modifying and augmenting the search results were designed with the feedback of several qualitative researchers. The feedback from researchers indicated that the tool can serve some of the same goals as semi-structured interviews. The responses also highlight the tool’s potential advantages in economy, interviewer reflection, representativeness, and interview control.
The synthetic interview tool bridges the divide between formal research questions and a large collection of informal anecdotes. As stated by the online activism researcher, this tool is unlike others in that it is “not all about keywords”. Instead, the researcher states that he “can also profit from [our tool]” by simply “thinking of examples to search on”. Compared to traditional keyword searching, the synthetic interview gives more varied results, including many whose connection to the original query is not obvious to the interviewer at the beginning of the process.
While a majority of our trials yielded positive results, we cannot ignore the remaining minority that did not echo similar success. One researcher voiced that because “the corpus is LiveJournal, [most authors] post about topics such as food, love, and marriage”, however there “would not be [entries] on topics like physics”. This shortcoming is in the query corpus, which we can target in two ways. In addition to our lo-fi and mi-fi prototypes described above, we were able to query 100% of the dataset offline (the process is currently too slow for interactive interviews). The results in Figures 1.2 and 1.4 are from such full corpus. This significantly improves the quality of many queries. We also expect that synthetic interviews on other topics which are poorly represented in the LiveJournal posts can be executed successfully by choosing an alternate corpus with greater relevance. 28
Another variable in the interviews is the choice of word embed- dings. We experimented with embeddings trained on Google News as well as LiveJournal posts, and concluded that the training corpus provides an additional option for tuning the synthetic interviews to specific research topics.
29
Chapter 2 Game & Narrative Design for Behavior Change
This chapter presents a list of principles that could be used to conceptualize games and narratives for behavior change. These principles are derived from lessons learned after teaching two design-centered courses around Gaming and Narrative Technologies for Health Behavior Change. Course sessions were designed to create many rapid prototypes based on specific topics from behavior change theory coupled with iterative human-centered and games design techniques. The design task was composed of two broad goals: 1) designing efficacious technologies, with an emphasis on short-term behavior change and 2) using metaphors, dramatic arcs and game dynamics as vehicles for increased engagement and long-term sustained change. Some example prototypes resulting from this design approach are presented.
30
2.1 INTRODUCTION
Persuasive technologies such phone apps or serious games share common goals of creating an engaging and efficacious experience towards behavior change. Either by modifying or adapting current interventions or by designing new applications based on behavior change theory, these techniques have the potential to reach millions of people who can benefit from a pervasive medium. Most designers and HCI professionals deal on regular basis with apps focused mostly on usability and engagement. In parallel, many health and biomed researchers focus mostly on high efficacy. Appropriate usability design may not be sufficient to guarantee long-term engagement - many times needed to gain adequate efficacy levels - however it is a needed condition for initial engagement/adoption. In this chapter we present a preliminary list of principles for conceptualization of games for behavior change derived from key lessons learned after teaching two semesters of design-oriented classes, focused on games to improve wellbeing and health.
2.2 PREVIOUS WORK
Several studies have shown that CCBT (Computerized Cognitive Behavioral Therapy) such as MoodGym [98], Beating the Blues [11], among others, compare very well with face-to-face therapy. However, engagement and attrition levels are not acceptable. Indeed, even though 2/3 of depressed patients say they would prefer therapy over drug treatment, only 20% of patients referred for in-person psychotherapy actually start it, and 1/3 of those will drop out [95]. Web-based therapy also has very poor engagement, although apparently for different reasons [84]. Dropouts may be due to (i) lack of commitment by patients (ii) lack of a regular schedule for system use (iii) difficulty or tediousness of using the tools.
Gaming has important characteristics that enhance some cognitive elements such selective attention [48], which can play an important role to behavior change, as they could help people pay more attention to the main message. Another very important characteristic gaming offers is that it also makes the learning process fun [83], which in turn generate better engagement. Complementary, games also help increase motivation [128] and emotional engagement. [59]
Previous work form Baranoski, et. al. [10] has already shown success using games for health behavior change. Some other gaming examples focused on health are the Personal Investigator [29] leveraging CBT for mental health, and Superbetter.us [136] which leverages real-life social support embedded into a superhero story.
Complementary to the gaming literature, the use of persuasive technology to 31
improve usability and engagement for physical activity has been studied with the Ubifit system [27], which showed increased exercise levels by improving goal tracking, as well as using metaphors to improve people’s engagement. Many other examples around exercise, sleep and stress reduction, such as Nike Plus [107], HearthMath [53] and FitBit [41] seem to indicate that systems associated with a lifestyle change have also higher levels of engagement among their niche adopters. In any case, it is yet to be seen if these technologies are set to be adopted widely. the use of games to improve engagement we find the Lumosity [79] suite of games used for cognitive training. Cognitive techniques are wrapped around mini games, improving engagement, ensuring improved efficacy over time [129].
2.3 THE CHALLENGE: EFFICACY + EFFICIENCY
The challenge to merge efficacy and engagement can be dissected into the following design dualities (Figure 2.1):
Scientific vs. Iterative methods: A gap exists between current clinical intervention development methods based on the scientific method (hypotheses + statistical validation) and iterative gaming and app technology design. Usability design demands an approach that favors exploring ideas based on prompt user feedback through the construction of prototypes. However, it is necessary to keep in mind that the overall goal is to generate efficacious behavioral change that helps overcome health or wellbeing problems. Merging these two approaches is one of the constraints used to design our course sessions.
Short vs. Long-term focus: Short vs. long-term change is treated differently from a behavioral perspective. The former demands knowledge around decision-making, emotional elements and personal skills, while the latter demands a deeper understanding of identity and personality. A good way to mix behavior change goals with identity and personalization are narratives and games. These two elements incorporate concrete micro tasks associated with roles and missions that can be translated into smaller behavior change skills, while the metaphors, scenarios and stories support a deeper immersion into new identities.
Content vs. Dynamics: When designing interventional technology, efficacy is usually regarded as the main goal. Engagement usually plays a secondary role, which could have a major impact in the adoption of the technology. Commercial apps do look for a more complete user experience, which pays attention to execution as well as engagement and identity details. However, success is usually measured in terms of revenue generation, rather than
engagement for physical activity has been studied with Short vs. Long-term focus: Short vs. long-term change the Ubifit system [3], which showed increased exercise is treated differently from a behavioral perspective. The 32 levels by improving goal tracking, as well as using former demands knowledge around decision-making, metaphors to improve people’s engagement. Many emotional elements and personal skills, while the latter behavioral metrics. Merging both the content (i.e. narrative)other as well examples as the around exercise, sleep and stress demands a deeper understanding of identity and dynamics of the game into a coherent design that help develop real life skills reduction, such as Nike Plus [14], HearthMath [15] and personality. A good way to mix behavior change goals is yet another design challenge to be considered. FitBit [16] seem to indicate that systems associated with identity and personalization are narratives and Scientific with a lifestyle change have also higher levels of games. These two elements incorporate concrete micro engagement among their niche adopters. In any case, tasks associated with roles and missions that can be it is yet to be seen if these technologies are set to be translated into smaller behavior change skills, while the Short-term adopted widely. metaphors, scenarios and stories support a deeper immersion into new identities. the use of games to improve engagement we find the Content Lumosity [18] suite of games used for cognitive Content vs. Dynamics: When designing interventional training. Cognitive techniques are wrapped around mini technology, efficacy is usually regarded as the main Dynamics games, improving engagement, ensuring improved goal. Engagement usually plays a secondary role, which efficacy over time [11]. could have a major impact in the adoption of the technology. Commercial apps do look for a more Long-term The challenge: Efficacy + Engagement complete user experience, which pays attention to The challenge to merge efficacy and engagement can execution as well as engagement and identity details. be dissected into the following design dualities (Figure However, success is usually measured in terms of Iterative 1) : revenue generation, rather than behavioral metrics.
Figure 2.1: Efficacy + Engagement game design Merging both the content (i.e. narrative) as well as the dualities. Scientific vs. Iterative methods: A gap exists between dynamics of the game into a coherent design that help
2.4 DESIGN METHODOLOGY current clinical intervention development methods develop real life skills is yet another design challenge to based on the scientific method (hypotheses + statistical be considered. It is important to note that the methodology followedvalidation) during and the iterative gaming and app technology conceptualization process Figure in the course 1 – Efficacy is focused + on maximizing creativity provided the aforementioned constraints. To help students acquiredesign. sensibility Usability design demands an approach that Design methodology Engagement game design around behavioral efficacy, specific behavioral theories are usedfavors as exploringthe basis ideas based on prompt user feedback It is important to note that the methodology followed of design challenges to promote dualitiesrapid prototyping in very shortthrough sessions. the construction A of prototypes. However, it is during the conceptualization process in the course is specialist, who many times has little design experience,necessary presents to the keep in mind that the overall goal is to focused on maximizing creativity provided the theoretical component in a one-hour talk. Students cangenerate ask questions efficacious behavioral change that helps aforementioned constraints. To help students acquire associated with the topic being presented, and the specialist intervention overcome health or wellbeing problems. Merging these sensibility around behavioral efficacy, specific closes with a brief discussion around the way such behavior change theories can be used to design new technology. Table 2.1 shows a list oftwo the approaches theoretical is one of the constraints used to design behavioral theories are used as the basis of design topics presented. During the second hour a design challengeour is presentedcourse sessions. to challenges to promote rapid prototyping in very short the students. They need to go from problem assessment to a complete game sessions. A specialist, who many times has little design concept with rules, usability scenarios, title and introduction. In many cases we even ask them to create a suggestive billboard to position the idea. Students need to begin by expressing a behavioral problem through its disempowering narrative and find the counteracting empowering narrative, which leverages the behavior change 33
concept taught by the specialist. The students must storyboard both narratives (disempowering and empowering). They must also externalize relevant intangible elements such as the problems themselves and/or the feelings associated with it by converting them into enemies, scenes, obstacles or other gaming elements. Additionally, they need to externalize the skills needed to overcome such problems by portraying them as weapons or as specific game dynamics. At all times, students are encouraged to make sure game progression is elicited and not only end goals, to make sure change is embraced by the users. Finally, students are asked to test each other’s games, present their game as if it was being launched on TV or act their games out.
Behavior Change Topics Narrative Topics
Intro to Behavior Change Intro to Life Stories
Body-Mind Connection Narrative Psychology
Positive Psychology Drama Therapy
Sports Psychology Neuroscience Games
Anxiety, Depression and Cognitive Trauma Narratives Behavioral Therapy
Behavior Change in Society Improv-based Games
Communitarian Mental Health Digital Storytelling Interventions
Social Networking for Behavior Change
Table 2.1: Behavior Change and Narrative topics taught in addition to Game Development and Human Centered Design topics.
2.5 GAME PROTOTYPE EXAMPLES
Among many others we chose a few examples of the work done in class:
Monsters (Figure 2.2) – a simple two-player game based on monsters and weapons. Concepts around externalization and empowering metaphors drawn from drama therapy and narrative therapy are used to make “visible” enemies and the weapons to destroy them. These elements have clear links to the problems and skills needed to solve them. For example, a monster representing stress can be seen as a flaming monster, and player 2 can be
34
used to help you blow the torch by teaching you how to breath correctly to calm you down. “warrior” narrative helps people to be active and reflection could help increase people’s awareness of assertive, while a “victim” makes the person their own thoughts and further understand their passive and receptive of disgrace. problems. Furthermore, health behavior change games must be designed to be adaptive to changes Principles for Conceptualization in problem definition, as the game helps the user Understanding disempowering narratives discover the root cause of a superficial problem. a. Narratives are lived - not only used to tell stories d. Problems as fictional enemies – Externalizing about one self. People confront ideas and situations problems into concrete game elements (i.e. objects, based on the way they portray themselves. This is monsters, obstacles, etc.) help people understand observed in trauma patients who cannot overcome that a problem does not occupy every aspect of their the generalization of their disempowering lives. It also helps the user understand the narratives. Understanding the narratives new characteristics of the problem, which in turn will empowering narratives underneath unhealthy help understand the possible solutions around it. behaviors will help design new empowering Designers should provide users the possibility to
narratives that change unhealthy habits, eliminate externalize their problematic feelings into a concrete Figure 2.2: Monsters Game Screen Figure 2 – Monsters Game over-generalizations and organize thoughts around game or narrative element that can later be
Scheherazade’s World (Figure 2.3Screen) – a game that aids in the preventionthe of appropriate context. destroyed or controlled. The element representing suicide by creating a community for at-risk young women to share their stories. The One Thousand and One Nights tells of a king named Shahryar,b. Focus on strengths – Design around behavior the problem should have a clean metaphor, for who would marry a new wife each day and sentence yesterday’s wife to death.change can benefit from understanding people’s example, stress into an oppressive rock, or Unlike previous wives, Scheherazade had a secret weapon to keep her alive. Every night, she would tell the king a story, only to end with a cliffhanger eachcurrent strengths, rather than imposing an ideal depression as glue that impedes you to move, in night. Because the king wanted to know the rest of the story, he would spare her life for another day. Through stories, she was able to survive. This gamemodel for functioning under a specific situation. A order to help the user understand the characteristics is based in part on Narrative Therapy and Drama Therapy aspects, as wellkey as concept that describes the basis for behavior and affordances of the problem at hand. Digital Storytelling. change is what Bandura defines as self-efficacy [1]. e. Materializing Skills into weapons – As well as In a nutshell, self-efficacy explains using current tangible problems, weapons that represent the skills strengths. However, discovering strengths may required to overcome the problem should be Bedroom – Write stories here demand an exploration not only of thriving materialized. Such tools must carry a clean experiences, but also difficult experiences, where meaning that is memorable and supports the notion strengths are used to be resilient and survive that change is possible via the use of the metaphors emotional or physical pain or disgrace. associated with such weapons.
Externalizing problems Game Dynamics as Interventions c. Interpretation and introspection – Problems are f. Progress as a proxy for self-efficacy – Eliciting rarely completely understood by users. Designers progress should be a key element of game design Sultan’s bedroom – Listen to Scheherazade should strive to provide tools, time spaces and cues for behavior change. Many times users need to Figure 3 – Scheherazade’s to help people interpret problems and introspect. realize first that “change” is actually possible. If no World Game Screens Games with forced pauses and prompts for progression is clearly observed, the sensation of
“warrior” narrative helps people to be active and reflection could help increase people’s awareness of assertive, while a “victim” makes the person their own thoughts and further understand their passive and receptive of disgrace. problems. Furthermore, health behavior change games must be designed to be adaptive to changes Principles for Conceptualization in problem definition, as the game helps the user Understanding disempowering narratives discover the root cause of a superficial problem. a. Narratives are lived - not only used to tell stories d. Problems as fictional enemies – Externalizing about one self. People confront ideas and situations problems into concrete game elements (i.e. objects, based on the way they portray themselves. This is monsters, obstacles, etc.) help people understand observed in trauma patients who cannot overcome that a problem does not occupy every aspect of their the generalization of their disempowering lives. It also helps the user understand the narratives. Understanding the narratives new characteristics of the problem, which in turn will empowering narratives underneath unhealthy help understand the possible solutions around it. behaviors will help design new empowering Designers should provide users the possibility to
narratives that change unhealthy habits, eliminate externalize their problematic feelings into a concrete Figure 2 – Monsters Game over-generalizations and organize thoughts around game or narrative element that can later be Screen the appropriate context. destroyed or controlled. The element representing 35 b. Focus on strengths – Design around behavior the problem should have a clean metaphor, for change can benefit from understanding people’s example, stress into an oppressive rock, or current strengths, rather than imposing an ideal depression as glue that impedes you to move, in model for functioning under a specific situation. A order to help the user understand the characteristics key concept that describes the basis for behavior and affordances of the problem at hand. change is what Bandura defines as self-efficacy [1]. e. Materializing Skills into weapons – As well as In a nutshell, self-efficacy explains using current tangible problems, weapons that represent the skills strengths. However, discovering strengths may required to overcome the problem should be Bedroom – Write stories here demand an exploration not only of thriving materialized. Such tools must carry a clean experiences, but also difficult experiences, where meaning that is memorable and supports the notion strengths are used to be resilient and survive that change is possible via the use of the metaphors emotional or physical pain or disgrace. associated with such weapons.
Externalizing problems Game Dynamics as Interventions c. Interpretation and introspection – Problems are f. Progress as a proxy for self-efficacy – Eliciting rarely completely understood by users. Designers progress should be a key element of game design Sultan’s bedroom – Listen to Scheherazade should strive to provide tools, time spaces and cues for behavior change. Many times users need to Figure 2.3: Scheherazade’s Word Game ScreensFigure 3 – Scheherazade’s to help people interpret problems and introspect. realize first that “change” is actually possible. If no Semester Adventure World(Figure 2.4 Game) – a simple Screens game to reduce stress and improve time management around test exams, where the player follows anGames with forced pauses and prompts for progression is clearly observed, the sensation of adventure as a warrior that needs to reduce stress by gaining powerful tokens by improving his/her time management skills, i.e. fulfilling tasks on time, which are portrayed as enemies to be beaten. This game leverages personality theories based on life stories, which indicates that people assume new roles based on the way they define themselves. A “warrior” narrative helps people to be active and assertive, while a “victim” makes the person passive and receptive of disgrace.
36 with Personal, Mobile, Displays, International Conference on inefficacy is perpetuated and therefore, any Ubiquitous Computing, 2008 additional effort to develop skills or change [4] D. Coyle, D., M. Matthews, J. Sharry, A. Nisbet and G. motivations could be futile. The initial game levels Doherty, Personal Investigator: A Therapeutic 3D Game for must demonstrate to the user that change is Adolescent Psychotherapy, International Journal of Interactive possible. Technology and Smart Education, 2(2):73-88, 2005 g. Social Validation – Sharing and celebrating with [5] S. Green, D. Bavelier, Action video games modifies visual others helps assimilate the new changing reality. selective attention. Nature, 423, 534-537, 2003 Without social affirmation around change, progress [6] S. Hsu, F. Lee, M. Wu, Dessigning Action games for appealing to buyers. Cyberpshychology Behavior, 8:585-91, may seem part of our imagination. Designers 2005 should use social affirmation to promote self- [7] Malone, T.W., What Makes Things Fun to Learn?: efficacy. Using social influence could be used as a Heuristics for Designing Instructional Computer Games. In vehicle to get some concrete change, but it runs the Proc. SIGSMALL, ACM Press, NY, USA, 162-169, 1980 risk to leave the user believing that they were [8] I. M. Marks, K. Kavanaugh, and L. Gega. Hands-On Help: imposed a new reality by others and therefore Computer-Aided Psychotherapy. Psychology Press, 2007 reducing gains in self-efficacy, which ultimately [9] D. C. Mohr, L. Vella, S. Hart, T. Heckman, and G. Simon. drives change. The effect of telephone-administered psychotherapy on symptoms of depression and attrition: A meta-analysis. Clinical Psychology: Science and Practice, 15(3):243–253, 2008 Figure 2.4: Semester adventure game screen Acknowledgements [10] R. Ryan, C. Rigby, A. Przbylski. The motivational pull of Figure 4 – Semester 2.6 NARRATIVE PROTOTYPE EXAMPLE We thank all the guest lecturers, students and visitors videogames: a self determination theory approach. Motivation Adventure Game Screen to our class for helping us build a highly creative and and Emotion. 347-63, 2006 2.6.1 Mental Health Background fun environment around serious issues and their [11] M. Scanlon, D. Drescher, K. Sarkar, Improvement of The World Health Organization reported that mental disorders arecomplex a major solutions. Visual Attention and Working Memory through a Web-based financial burden for the world’s economy. They are currently the third most Cognitive Training Program, Lumos Labs, 2007 costly health problem in terms of disability-adjusted life-years [105] around [12] http://moodgym.anu.edu.au/welcome the world, and the largest in the US and Canada. They represent 10%References of the global disease burden. Fewer than 25% of depressed patients receive[1] A. the Bandura, Self-efficacy: Toward a unifying theory of [13] http://www.beatingtheblues.co.uk/ necessary treatment and an even smaller percentage of at-risk groups have behavioral change. Psychological Review, Vol 84(2), 191-215, [14] http://nikeplus.nike.com/plus/ access to adequate preventive care [105]. In many developing 1977 countries, mental illness is not even recognized as a medical problem [94]. Furthermore, [15] http://www.heartmathstore.com/ (untreated) depression rates are very high for chronic disease patients[2] T. in Baranowski, R. Buday, D. Thompson, J. Baranowski, those countries (e.g. AIDS patients) and impede patient’s treatment,Playing for Real: Video Games and Stories for Health-Related [16] http://www.fitbit.com/ compounding those health problems [94]. A multitude of barriersBehavior exist, Change. American Journal of Preventive Medicine, Vol [17] https://www.superbetter.com/ including a lack of trained professionals and understanding of mental34(1), illness 74 -82, 2008 by patients themselves. Even in developed countries, mental illness is [18] http://www.lumosity.com [3] S. Consolvo, P. Klasnja, D. McDonald, D. Avrahami, J. Froehlich, L. LeGrand, R. Libby, K. Mosher and J. Landay, Flower or a Robot Army? Encouraging Awareness and Activity
37
stigmatized and many people will not seek treatment.
In this section we introduce a machinima (machine + cinema) system that will empower therapists, social workers, public health professionals, to further increase the efficacy of therapy. Machinima is an animation recorded from a game engine where a human “directs” the actions of the characters. For our application we have focused on the SIMS engine because of its rich emotive content already available, in fact, most of the scenarios and characters required were readily available. Machinima allows the creation of engaging entertaining videos as an adjunct to, and as a potential alternative to, existing approaches. CBT is an effective approach for a variety of mental health concerns including depression, anxiety, insomnia, eating pathology, marital stress, and chronic pain [21]. CBT principles have been integrated into several computerized formats as both an adjunct to face-to-face therapy and as a stand alone treatment. Computerized CBT (CCBT) is a fully automated implementation of CBT where patients follow a complete treatment through interactive texts and figures. Several studies have shown that CCBT compares very well with face-to-face therapy [71, 95], and CCBT is approved for reimbursement under Britain’s National Health Service (NHS).
Indeed, although therapy is effective and patients prefer it to psychopharmacological treatment, only20% of patients referred for in-person psychotherapy actually start it, and 1/3 of those will drop out [84]. Thus even in developed countries, there are great barriers (i.e., stigma, lack of access, anxiety about live discussion, etc.) to conventional treatments for mental health problems. Web-based therapy also has poor uptake, although apparently for different reasons [100]. Dropouts may be due to (i) lack of commitment by patients (ii) lack of a regular schedule for system use (iii) difficulty or tediousness of using the tools.
Our approach uses scripted health interventions following the design principles of a number of successful systems, but adding two incentives: (i) use of an interactive entertainment scenario format, and (ii) engagement of family members in the therapy process. Although we focus our initial efforts in the treatment of depression, we believe the values of a cinematographic approach to engage patients who could benefit from CBT in general. To test our idea, we are currently producing the material that translates the CBT-based manuals for the treatment and prevention depression developed at San Francisco General Hospital [100]. Figure 2.5 shows both; the printed material that is distributed to patients, and some of the machinima screenshots of the micro video created to enhance/replace this material. “Deus Ex”Ð Machinima for Behavior Change
Pablo E. Paredes, MS, MBA1, Stephen Schueller, PhD2, John F. Canny, PhD1 1University of California, Berkeley, CA; 2University of California, San Francisco, CA
Abstract This paper presents “Deus Ex” a system which usesmachinima (machine + cinema) as a complement for Cognitive Behavioral Therapy (CBT). The expected gains are to improve engagement, likeability and adherence through cinematographic fun and increase efficacy by providing a balance of interactivity and narrative. Through highly pervasive media such as mobile phones and digital video discs (DVD), machinima-based CBT should further reach out to populations currently lacking access to this type of therapy. Furthermore, family engagement can be increased through series of videos based on family psycho-educational programs.
Introduction “Deus Ex”Ð MachinimaThe forWorld Behavior Health Organization Change reported that mental disorders are a major financial burden for the world’seconomy. They are currently the third most costly health problem in terms of disability-adjusted life-years1 around the world, Pablo E. Paredes, MS, MBA1, Stephenand Schueller, the largest PhD in the2, John US and F. Canny, Canada. TheyPhD 1 represent 10% of the global disease burden.Fewer than 25% of 1University of California, Berkeley, CA; 2depressedUniversity patients of California, receive the Sannecessary Francisco, treatment CA and an even smaller percentageof at-risk groups have access to adequate preventive care1. In many developing countries, mental illnessis not even recognized as a medical 2 Abstract problem . Furthermore, (untreated) depression rates are very highfor chronic disease patients in those countries (e.g. AIDS patients)and impede patient’s treatment, compoundingthose health problems2. A multitude of barriers exist, This paper presents “Deus Ex” a system which usesmachinimaincluding (machine a lack + of cinema) trained as professionalsand a complement for understandingCognitive of mental illness by patients themselves. Even in Behavioral Therapy (CBT). The expected gains are to improve engagement, likeability and adherence through developed countries, mental illness isstigmatized and many people will not seek treatment. cinematographic fun and increase efficacy by providing a balance of interactivity and narrative. Through highly In this poster we introduce a machinima(machine + cinema) system that will empower therapists, social workers, pervasive media such as mobile phones and digital video discs (DVD), machinima-based CBT should further reach out to populations currently lacking access to this typepublic of theraphealth yprofessionals,. Furthermore, to familyfurther engagementincrease the canefficacy be of therapy. Machinimais an animation recorded from a increased through series of videos based on family psychogame-educational engine whereprograms. a human “directs” the actions of the characters. For our application we have focused on the SIMS engine because of its rich emotive content already available, in fact, most of the scenarios and characters required were readily available. Machinima allows the creation of engaging entertaining videos as an adjunct to, Introduction and as a potential alternative to, existing approaches. CBT is an effective approach for a variety of mental health concerns including depression, anxiety, insomnia, eating pathology, marital stress, and chronic pain3. CBT The World Health Organization reported that mental disorders are a major financial burden for the world’seconomy. principles have been integrated into sever1 al computerized formats as both an adjunct to face-to-face therapy and as a They are currently the third most costly health problem standin terms alone of disability treatment.-adjusted Computerized life-years CBTaround (CCBT) the world, is a fully automated implementation of CBT where patients and the largest in the US and Canada. They representfollow 10% a ofcomplete the global treatment disease through burden.Fewer interactive than texts 25% and of figures. Several studies haveshown that CCBT compares depressed patients receive the necessary treatment and an even smaller percentageof at-risk 4,5groups have access to 1 very well with face-to-face therapy , and CCBT is approved for reimbursementunder Britain’s National Health adequate preventive care . In many developing countries,Service mental (NHS). illnessis not even recognized as a medical 2 38 problem . Furthermore, (untreated) depression rates are very highfor chronic disease patients in those countries (e.g. AIDS patients)and impede patient’s treatment, compoundingthose health problems2. A multitude of barriers exist, including a lack of trained professionalsand understanding of mental illness by patients themselves. Even in developed countries, mental illness isstigmatized and many people will not seek treatment. In this poster we introduce a machinima(machine + cinema) system that will empower therapists, social workers, public health professionals, to further increase the efficacy of therapy. Machinimais an animation recorded from a game engine where a human “directs” the actions of the characters. For our application we have focused on the SIMS engine because of its rich emotive content already available, in fact, most of the scenarios and characters required were readily available. Machinima allows the creation of engaging entertaining videos as an adjunct to, and as a potential alternative to, existing approaches. CBT is an effective approach for a variety of mental health concerns including depression, anxiety, insomnia, eating pathology, marital stress, and chronic pain3. CBT principles have been integrated into several computerized formats as both an adjunct to face-to-face therapy and as a stand alone treatment. Computerized CBT (CCBT) is a fully automated implementation of CBT where patients follow a complete treatment through interactive texts and figures. Several studies haveshown that CCBT compares very well with face-to-face therapy4,5, and CCBT is approved for reimbursementunder Britain’s National Health Service (NHS).
Figure 1. Original CBT cartoon vs. machinima video screenshot.
Figure 1. Original CBT cartoon vs. machinima video screenshot. Figure 2.5: Original CBT cartoon versus machinima video screenshot.
39
2.6.2 Related Work
Among the CCBT commercial systems available that contain interactive cartoon elements the most relevant are: MoodGYM [98], FearFighter [39], Beating the Blues [11], Living Life to the Full [76]. These systems are endorsed by several governments and provide effective treatment even through insurance coverage in some cases.
In contrast with current efforts focused on computer games for CBT enhancement (Coyle, et. al.) [30], machinima could drive adoption of self- help interventions because a game engine not required. Although machinima is derived from games, it does not require a game engine. Rather it is a series of modest-resolution videos with decision branching. It is modest in both memory and CPU usage. These modest requirements could support devices with limited interactivity such as DVDs, which in turn can drive higher adoption in low-income populations
2.6.3 Prototype Movie: “A Rainy Day“
We created a prototype based on the San Francisco General Hospital Depression Clinic manual. This manual is currently used for group CBT of major depression and maintained by the Latino Mental Health Research Program team from the University of California at San Francisco [100]. The manual consists of four modules (thoughts, activities, people and health) covering key CBT concepts and content relevant to primary care patients. For our prototype we drew from an activity included in the first module (Thoughts and Your Mood). Specifically, we adapted a cartoon that demonstrates the link between thoughts and mood. Figure 2.4 showed both the cartoon and some movie shots. In this simple sketch, a character walking reacts to the beginning of rain. In the first part the character chooses an unhealthy thought and reacts with sadness. In the second part, the same character has a different thought that triggers smiling and leads the character to run happily through the rain.
Figure 2.6 shows the way the information in the original manual was converted into a movie script. CBT training material was translated into a movie script that followed basic rules of cinematography, which described the scenarios, camera shots, actors, dialogue and transitions.
40
Catchy phrase to remember blobs that Visual thought thought explain and elicit process Emotive engagement to be through a conflict resolved catches our Realistic avatar and curious attention us in the immerses plot/subplot to be Beautiful surrounding by rain challenged that provides Calming voice hope and explanations Entertainment Value manual and machinimamoviescript. CBT value on reality Focus for CBT Building block therapy Thought trigger of reality Resemblance to a place We can relate like this the instructor Voice of Excerpts from original CBT CBT original from Excerpts
Detail A Rainy Day Thought Selection Rain Simple person Simple park Narrator igure 2.6: igure F Excerpts from original CBT manual and machinima movie script. Movie script elements of A Rainy Day.
Elements Title Plot (Action) Subplot (Theme) Protagonist Scenery Narrative/ Dialogue Machinima’s cinematographic and narrative elements (camera angles, sounds, music, plot and subplot, the characters),person’s sense of immersion in the scene and its verisimilitude to real social experience, should further increase not only the interest of the patient for the trigger video. and as For the example, element that thedefines our power rainthemeto control our own can thoughts, easily remembered. be making it understood an as element the that could emotional be In summary, CBT’s engagement and efficacy issues narrative and dialogical interactivity. can be improved by easily adding a mix of entertainment, Figure 2. Table 1. 41
We created the video using several cinematographic elements available in machinima: background music, foreground texts, a narrator, camera panning and scenery. We were able to further introduce subtle yet powerful elements that transmit strong emotional feelings, which should help the user remember this lesson by enhancing his/her memory associated with this emotional arousal [22]. We used camera panning to have close-up views of the character, which allows emphasizing the feeling of despair felt by the character on the first part of the video. On the contrary, an open camera shot provides a feeling of realization, freedom, and completeness. We used the rain as our anchor element to elicit a theme centered on thoughts. The rain was the trigger that remained invariant in both parts of the movie. By focusing on the rain drops when it starts to rain, and also after the character has expressed his thought and mood, we elicit the theme of rain as a neutral event that triggers a thought that is entirely dependent on our interpretation of the situation. Figure 2. Excerpts from original CBT manual and machinima movie script. Machinima’s cinematographic and narrative elements (camera angles, sounds, Machinima’smusic, plot cinematographic and subplot, and narrative characters), elements (camera the angles,person’s sounds, sense music, plotof andimmersion subplot, characters), in the thescene person’s and sense its of immersionverisimilitude in the scene to and real its verisimilitude social experience to real social ,experience, should shouldfurther further increase increase not only the interest of the patient for the video. For example, the raintheme can be understood as the emotional triggernot onlyand as thethe element interest that definesof the our patient power to controlfor the our video own thoughts, (Table making 2.2 it). an For element example, that could thebe easilyrain remembered. theme can be understood as the emotional trigger and as the element that defines our power to control our own thoughts, making it an element that Tablecould 1. Moviebe easily script elements remembered. of A Rainy Day.
Elements Detail CBT value Entertainment Value Title A Rainy Day Focus on reality Catchy phrase to remember Plot Thought Building block for CBT Visual thought blobs that (Action) Selection therapy explain and elicit thought process Subplot Rain Thought trigger Emotive engagement (Theme) through a conflict to be resolved Protagonist Simple person Resemblance of reality Realistic avatar catches our curious attention and immerses us in the plot/subplot Scenery Simple park We can relate to a place Beautiful surrounding to be like this challenged by rain Narrative/ Narrator Voice of the instructor Calming voice that provides Dialogue hope and explanations
Table 2.2: Movie script elements of “A Rainy Day” In summary, CBT’s engagement and efficacy issues can be improved by easily adding a mix of entertainment, narrativeIn summary, and dialogical CBT’s interactivity. engagement and efficacy issues can be improved by easily adding a mix of entertainment, narrative and dialogical interactivity. 42
2.6.4 Uses of Therapeutic Machinima
Machinima’s flexibility should allow for the creation of databases of videos with minor modifications that help tailor the best content to the user. Generic storylines with small changes in locations (park, street, city, etc.), characters (sex, age, race, etc.) will allow for health practitioners, coaches, educators and other members to benefit from a rich resource tailored to important places or situations in a person’s life.
It is clear to observe that most of the advantages that machinima adds to CBT can also be applied to other types of training material. Therefore, we believe that machinima has the potential to push for a dramatic increase in the creation of educational and training systems leveraging highly customizable video.
Complementary to the research and technical challenges related to machinima, the use of video should be proven effective to improve engagement of illiterate users, and therefore use this technology to help breach deficiencies in mental health treatment in marginalized regions.
2.6.5 Future Research
Machinima’s impact in the creation of tailored content for mental health must be evaluated in three aspects:
1) Is machinima more engaging (better persistence, better uptake) than non-game methods?,
2) Are there any improvements in efficacy from its higher level of interactivity and engagement?,
3) Can a reasonable degree of branching be added without creating an overwhelming number of videos to make?
Another research challenge would be to produce a series of videos where characters and scenarios can be designed for different family members to transmit different components of a family psycho-education program.
Research on the effects of perspective (first and third person views) should be performed to determine which perspectives are more appropriate to teach specific CBT concepts as well as the effect on the engagement of patients and family members. 43
2.7 PRINCIPLES FOR CONCEPTUALIZATION
2.7.1 Understanding Disempowering Narratives a. Narratives are lived - not only used to tell stories about one self. People confront ideas and situations based on the way they portray themselves. This is observed in trauma patients who cannot overcome the generalization of their disempowering narratives. Understanding the narratives new empowering narratives underneath unhealthy behaviors will help design new empowering narratives that change unhealthy habits, eliminate over- generalizations and organize thoughts around the appropriate context. b. Focus on strengths – Design around behavior change can benefit from understanding people’s current strengths, rather than imposing an ideal model for functioning under a specific situation. A key concept that describes the basis for behavior change is what Bandura defines as self-efficacy [8]. In a nutshell, self-efficacy explains using current strengths. However, discovering strengths may demand an exploration not only of thriving experiences, but also difficult experiences, where strengths are used to be resilient and survive emotional or physical pain or disgrace.
2.7.2 Externalizing Problems c. Interpretation and introspection – Problems are rarely completely understood by users. Designers should strive to provide tools, time spaces and cues to help people interpret problems and introspect. Games with forced pauses and prompts for reflection could help increase people’s awareness of their own thoughts and further understand their problems. Furthermore, health behavior change games must be designed to be adaptive to changes in problem definition, as the game helps the user discover the root cause of a superficial problem. d. Problems as fictional enemies – Externalizing problems into concrete game elements (i.e. objects, monsters, obstacles, etc.) help people understand that a problem does not occupy every aspect of their lives. It also helps the user understand the characteristics of the problem, which in turn will help understand the possible solutions around it. Designers should provide users the possibility to externalize their problematic feelings into a concrete game or narrative element that can later be destroyed or controlled. The element representing the problem should have a clean metaphor, for example, stress into an oppressive rock, or depression as glue that impedes you to move, in order to help the user understand the characteristics and affordances of the problem at hand.
44
e. Materializing skills into weapons – As well as tangible problems, weapons that represent the skills required to overcome the problem should be materialized. Such tools must carry a clean meaning that is memorable and supports the notion that change is possible via the use of the metaphors associated with such weapons.
2.7.3 Game Dynamics as Interventions f. Progress as a proxy for self-efficacy – Eliciting progress should be a key element of game design for behavior change. Many times users need to realize first that “change” is actually possible. If no progression is clearly observed, the sensation of inefficacy is perpetuated and therefore, any additional effort to develop skills or change motivations could be futile. The initial game levels must demonstrate to the user that change is possible. g. Social validation – Sharing and celebrating with others helps assimilate the new changing reality. Without social affirmation around change, progress may seem part of our imagination. Designers should use social affirmation to promote self- efficacy. Using social influence could be used as a vehicle to get some concrete change, but it runs the risk to leave the user believing that they were imposed a new reality by others and therefore reducing gains in self-efficacy, which ultimately drives change.
2.7.4 Narratives as Interventions h. Translate abstract concepts to visual stories: CBT’s lessons learned by patients are mostly abstract. Core concepts of CBT include the investigation of the link between thoughts, mood, and behavior and the challenging and reframing of thoughts to improve one’s mood and promote healthy behaviors. These concepts are complex and are made more concrete via examples and metaphors. These examples can be better presented with graphics or even better, with storyboards or cartoon elements. In the case current commercial CCBT systems [39, 10, 76, 98], these examples are presented as interactive cartoons that allow the user to make choices along with the storyline. i. Verisimilitude to real social experience: With machinima we add a much greater level of realism: facial animation, body gesture, movement and speech prosody, with a tinge of playfulness and irony inherited from the source game (In this case the SIMS). The game mechanics available in the SIMS, allow easy setup cameras, control and program goals for avatars that already contain high resemblance with human actors, including emotional content. These mechanics provide flexibility for authoring from the perspective of a producer or director, rather than having to develop complete scenarios and simulate human characters. 45
j. Simple game mechanics: CBT’s main driver of progress is through homework that reinforces skills taught in sessions and allows people to apply these skills in their daily lives to better promote healthy thought management and behavior change. Many of these tasks are guided through printed materials, or through notes taken by the patient during its therapy sessions. CCBT adds pictures and online questionnaires that adapt to the patient progress to further enhance engagement and retention. Narrative games (machinima) should provide an element of interactive fun that will engage the patient in a dramatic plot while learning lessons. Additionally, this element of cinematographic fun could be shared with family members to increase awareness or even support the learning of some interpersonal skills. k. Continuous and adaptive learning: Principles either learned at therapy session or self-taught can be reinforced through constant reviewing of the material (through DVD, mobile phones), and are made more salient when accessible to patients after an event evokes the relevant lesson. Although these concepts could also be replicated in paper-based materials, the interactive nature of machinima elements allow the user to follow a decision tree that can adapt to the learning and understanding progress of the user. l. Family engagement: Videos can be designed into a mini series that follows a family psycho-education[83] program aimed to: 1) generate understanding of the patient’s problem, 2) incorporate different perspectives to similar situations to generate empathy, 3) complement the patient’s education by explaining hard to grasp concepts that must be discussed in family, 4) social capital is restored by sharing entertainment and educational moments, 5) allowing the patient to re-experience techniques learned in session with the therapist.
2.8 CONCLUSION
In this chapter we present a framework to guide the way we conceptualize behavior change interventions. We propose a design space that takes advantage of game and narrative mechanics and content, that describes a short or long term strategy, and that leverages a scientific model for efficacy versus an iterative model for engagement. We present some prototype examples of game apps constructed based on theoretical basis. We present a narrative prototype that leverages cinematographic elements and machinima and we conclude with a series of principles for design that discuss issues of disempowering narratives, externalization, game dynamics and narrative dynamics.
46
Chapter 3 Sensor-less Sensing
This chapter describes our vision on what should be the research around sensing that enables adaptive interventions to make affective computing and stress management technology pervasive and unobtrusive. With the use of common computer peripherals and mobile computing devices as affect sensors, personalized and adaptive intervention technologies can be developed. Furthermore, physiological sensing can be performed without the introduction of extraneous factors such as wearable devices or focused software. Different methods for sensing and complementary adaptable interventions and interactions are described and proposed. We show some empirical evidence of the use of a computer mouse in the detection of stress.
47
3.1 INTRODUCTION
This chapter presents the body of knowledge around the adoption, usability and design of non-invasive sensing techniques for affective computing and stress management, due to its big impact in mental health and productivity. We define sensor-less sensing as the umbrella term covering the opportunistic use of existing sensors embedded in daily use computing devices and peripherals such as mice [55, 82, 135], keyboards [31, 37, 55], cameras [12, 118] and mobile devices to repurpose their signals to track different biometric states representative of mental or physiological states directly or indirectly. Sensor-less sensing offers a great opportunity to detect physiological metrics such as heart rate, heart rate variability (HRV), breathing rate, etc. Another exciting opportunity is to indirectly detect mental state changes. Preliminary research performed in our lab on this topic has been able to detect robust signals in voice [24], and by measuring the natural oscillation of the arm through a computer mouse [135]. In a first approach sensor-less sensing can be used to monitor affective states associated with the use of computer interfaces (such as frustration, anger, happiness, stress, etc.) There are a number of advantages in leveraging daily use computing devices as “sensors”: Accessibility - peripherals such as mice and keyboard are ubiquitous and are indispensable to interaction with desktops graphic user interfaces. Unobtrusiveness - There is no need for wiring sensors to the body or speak to a microphone (or cellphone) especially in semi-structured office spaces. Long-term, in-situ monitoring - Many people spend a substantial amount of time using computers, which affords the opportunity to monitor and provide feedback while people are engaged in stressful tasks. Application and content neutral monitoring - Mice motion and smartphone gestures is neutral to the application and the content with which the user is interacting. This implies better generalizability and alleviates privacy concerns compared with other more intrusive techniques, like monitoring keystrokes or camera usage.
3.2 BACKGROUND 3.2.1 Arousal and Stress Arousal and stress affect almost all the body functions, including cognitive function and memory. Stress/arousal is not a single system but several related systems working in harmony (allostatic body function) [17]. Stress has a concrete function to trigger a reaction to a specific threat (stressor) [20]. Arousal generally improves memory, speed of work, association and pattern recognition; it also increases errors on unfamiliar tasks and causes cognitive 48
and perceptual narrowing. The effects of these changes on performance depend on the task. This general form has been observed in studies on the effects of emotional stress on recall [25]. However, a large body of research has given conflicting results under different stress/performance combinations, often with linear relationships between arousal and performance, or with performance that saturates at high levels of arousal. It is best to assume that this relationship needs to be learned for each class of tasks. Similarly, it has been shown that the optimal level of arousal varies from one individual to the next. This all suggests that in order to make use of stress/arousal feedback, a system needs to gather a large amount of data about each individual, and furthermore this data needs to be identified with the task the subject is doing. Traditional psychophysiology focuses on the Autonomic Nervous System (ANS) due to its strong correlation with affect. Its quasi-independence from the Central Nervous System (CNS) helps to use it as a stronger signal associated with emotions. We wanted to take advantage of the less explored Somatic Nervous System (SoNS) . The use of movement from limbs as a proxy for arousal is intuitive. Changes in muscle tension and grip are associated with emotional change. However the challenge is to obtain a pure signal, rather than reading other mental processes. Attention, intention, workload, etc. are signals that affect SoNS. Methodology is important to isolate the proper signal.
3.2.2 Physilogical Affect Sensing Recent advancement in sensors and wireless technologies and related computational techniques have accelerated the push towards wearable health- care devices capable of providing ambulatory monitoring of a variety of vital signs, such as electrocardiogram (ECG), electromyogram (EMG), pulse oximetry, bio-impedance, electro-dermal activity (EDA) (formerly known as Galvanic Skin Response - GSR) [106]. Many of these vital signs are strongly linked with physiological changes induced by emotional arousal [116]. Healey and Picard used a combination of these physiological sensors to determine stress-levels for drivers in real-life driving tasks [52]. However, many have also reported that a variety of other conditions such as physical activity, attention or fatigue can introduce physiological responses very similar to stress/arousal [58]. Affect sensing using voice and facial features has also been explored. Specifically, voice analysis has demonstrated modest accuracy at estimating emotion and high accuracy of stress/arousal [24]. Computer vision has demonstrated good accuracy at emotion detection from facial images [74]. However, collecting sufficient data to train robust models for voice analysis is
49
challenging, and in both cases deployment is further complicated because of privacy concerns [77]. In spite of their growing availability, friction continues to exist in public adoption due to the intrusive nature of body periphery sensing. By enabling daily use computing devices capable of converting some body signals into mental health metrics, new affective technology adoption could be improved.
3.3 AROUSAL FROM COMPUTER PERIPHERALS 3.3.1 Mouse movement1 A promising exploration of the use of mouse movements to detect affect was Wolfgang Maehr’s diploma thesis [82]. Maehr used several metrics on mouse movement, and emotions induced in subjects by watching short videos. Specific “motion breaks” that were discontinuities in mouse movement were significantly related to both arousal and disgust and close to significant for anger. One common correlate of arousal/stress is muscle tension. Tension in arm and wrist muscles would change the dynamics of the movement, e.g. its resonant frequency and damping ratio. In our work (Sun, et. al. [135]) we have seen that it is possible to train simple controller models, such as Linear Predictive Coding (LPC), to capture the basic dynamic parameters of the movement. We approximated the parameters of a Mass-Spring-Damper (MSD) model. The mass are the bones and muscle mass, while the spring is the muscle tension. The exploration used “game grade” mice, which have update rates of 500 Hz and resolutions of 5700 dots per inch (DPI). We used three different tasks: Clicking, Drag-and-Drop and Steering. We randomized four different distances and four different object sizes. Users performed all with or without stress. We calmed down users at the beginning and after each stress task. We measured movement only after a stressor (math) or a calming intervention (positive visualization). Table 3.1 shows the results for the different tasks during the use of the mouse posterior to the calm (post_calm) and the stress (post_stress) tasks. We show only results for the x-axis. Under a t-test, we observe statistical significance - after Bonferroni correction - (p<0.0167) for all the tasks along the x-axis. For the Clicking task, the damped frequency was on average higher during the Stress phase than the Calm phase, along both x-axis (ωx) (t(48) = 4.54, p<0.001 ) and y-axis (ωy) (t(48) = 3.94, p<0.001 ); while
1 Some of the data presented in this section is taken from: Sun, D., Paredes, P., & Canny, J. (2014, April). MouStress: detecting stress from mouse motion. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 61-70). ACM.
50
damping ratio was lower along the x-axis (ζx) (t(48) = 4.54, p<0.001 ). No significant effect was observed for the y axis. For the Steering task, the analysis showed that the damped frequency was higher during Stress compared to Calm for ωx (t(48) = 2.40, p=.01). In the y-axis only ωy was significantly different (t(48) = 2.55, p=0.007). No significant effect was observed for damping ratio along x-axis or y-axis. None of the parameters were significant for the Dragging task. Time was significantly lower under stress only for the clicking task (t(48) = -2.65, p=0.005).
Task: post_calm post_stress t(48) p-value Clicking mean(stdev) mean(stdev) Damped frequency (ωx) 0.13(0.001) 0.14(0.001 4.54 <0.001* Damped ratio (ζx) 0.53(0.0003) 0.53(0.0002) 4.54 <0.001* Task: Drag-&-Drop Damped frequency (ωx) 0.13(0.002) 0.13(0.001) 1.62 0.56 Damped ratio (ζx) 0.53(0.0004) 0.53(0.0003) 1.68 0.05 Task: Steering Damped frequency (ωx) 0.12(0.002) 0.13(0.003) 2.4 0.01* Damped ratio (ζx) 0.53(0.001) 0.53(0.001) 2.55 0.007*
Table 3.1: t-tests for the x-axis damping ratio, damping frequency and completion time for each of the mouse tasks (Clicking, Drag-and-Drop, and Steering). *Statistically significant (p<=0.01) after Bonferroni correction.
To calibrate our findings, we contrasted them with HRV. We calculated HRV out of a raw Electrocardiogram (ECG) signal. We curated the signal as recommended by Task Force of The European Society of Cardiology and The North American Society of Pacing and Electrophysiology [137]. We found significant differences between the stressor and calm intervention phases for several metrics: mean peak-to-peak R-R ratio (t(48) = 10.95, p<0.001), Low Frequency (<0.15Hz) energy (LF) (t(48) = -2.03, p=0.02) and High Frequency (>0.15Hz) energy (HF) (t(48) = 2.05, p=0.02). We did not observe any differences between the post_stress and post_calm mouse tasks. However we did observe a significant difference for subjective ratings (0 = calm à 10 = stress): post_stress (M = 3.9, SE = .25) and post_calm (M = 2.67, SE = .26) (t(48) = 5.86, p<0.001).
To complement our work, we went on to try to find a way to automatically classify the observed differences between stress and calm. As an example, we show the damping ratio aggregated over N=52 (26 females and 27 males) 51
performing all different motion tasks with 5 randomized repetitions each for the Clicking condition (see Figure 3.1). As it can be observed, the difference between signals is not huge, but is completely discernable.
Figure 3.1: Mouse displacement damping ratio for 20 different motion tasks for mental Calm and Stress conditions [22]
For the accuracy of within-subjects stress classification, i.e. given some labeled samples for one subject as training data, we studied the accuracy of stress classification on some unseen samples. We did this by taking a random sample of k of the data points derived during the study, training a classifier on the remaining n−k points, and using this to classify the initial k samples. We used a simple model-based classifier, relying on the structure that is evident in Figure 3.1. The model has a staircase structure, i.e. we model canonical stress behavior as having a simple step-wise dependence on target distance, and a separate (step slope) dependence on target size. An advantage of this model is that it requires only knowledge of the distance of a mouse motion, not the target size. Thus an underlying logger which is not aware task or application could be used. Model accuracy peaked at about 71% 52
accuracy with 30 samples. As the number of measurement samples increases beyond 30, accuracy starts to fall because there are not enough remaining samples (100-k) to build an accurate model. Thus, about 10 mouse movements of a stable stress state should yield around 70% accuracy.
3.3.2 Mouse Pressure2
The capacitive mouse used in our work [55] is the Touch Mouse from Microsoft, based on the Cap Mouse described in [146]. This mouse has a grid of 13x15 capacitive pixels with values that range from 0 (no capacitance) to 15 (maximum capacitance). Higher capacitive readings while handling the mouse are usually associated with an increase of hand contact with the surface of the mouse. Taking a similar approach to the one described by Gao et al. [47], we estimated the pressure on the mouse from the capacitive readings.
The experiment was based on a simplified version of the Fitts’ law task [80], participants were challenged to click on horizontal bars that alternatively appeared on the either side of the display. In particular, there were three different distances (200, 350 and 500 pixels) between the bars and three different bar widths (50, 85 and 120 pixels) that were randomly combined. For each of the combinations, the participant had to perform 10 repetitions. Therefore, for each task the participant had to click 90 times on the bars (3 distances x 3 widths x 10 repetitions). In order to induce a relaxed or stressed emotional state, this task was performed right after both the relaxed and stressed conditions of the expressive writing or text transcription tasks. Stress and affect was measured using three Likert scales (Figure 3.2).
Figure 3.2: Affective and Stress Measurements using 7-point Likert scales.
To determine if people under stress handle the mouse differently, participants performed a simplified version of the Fitt’s law task [80], in which they needed to click on several vertical pairs of bars of varying widths and distances from each other. Unlike the keyboard tasks, the stressor took place before the task,
2 Data presented in this section is taken from: Hernandez, J., Paredes, P., Roseway, A., & Czerwinski, M. (2014, April). Under pressure: sensing stress of computer users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 51- 60). ACM.
53
with either the expressive writing (for participants 1 to 12) or the transcription task (for participants 13 to 24). In this work we estimate the amount of pressure with the mouse by analyzing capacitance readings. From the raw capacitance readings of the two conditions we computed the average of all the 13x15 capacitive pixels at any point in time and created a time series for each condition (bottom-left). Finally, we estimated the overall pressure by computing the average of each series (bottom-right). As it can be seen from the capacitive readings of this example, the location of each finger can be easily identified. While the participant used 4 fingers during the relaxed condition (blue rectangle), s/he showed more contact of the pinky finger during the stressed condition (dashed-red rectangle). 18 participants showed increased mouse contact during the stressed condition, and 6 participants showed reduced contact during the same condition. The differences between the two conditions were significant for all the participants (W, p<0.05). 75% of the participants showed increased contact with the mouse under stress. A summary of the results for pressure difference between stress and calm could be observed in Figure 3.3.
3.3.3 Keyboard3
Monitoring the dynamics of keyboard usage has been widely studied in different areas such as biometric authentication [9, 96] and personality characterization [69]. Some of the main keyboard dynamics are based on latencies of the keystrokes, such as time between keystrokes or the length of time that each keystroke is pressed. One of the interesting findings when analyzing keyboard dynamics such as these reveals that the typing patterns of the same individuals vary over time and are affected by other factors such as stress or gradual changes in cognitive or physical function [96]. Thus, keyboard dynamics can provide relevant behavioral information about the affective and cognitive state of the user. Motivated by this finding, Vizer et al. [147] created a system that measured keystroke and linguistic features of 24 computer users, and were able to recognize cognitive and physical stress with accuracies of 75% and 62.5%, respectively. In a separate study, Khanna and Sasikumar [70] also used keyboard dynamics to differentiate between neutral/positive and negative emotions of 21 participants in a laboratory study. One of their main findings was that the negative emotional state was associated with more typing mistakes and slower speeds in comparison with the more neutral affective condition.
3 Data presented in this section is taken from: Hernandez, J., Paredes, P., Roseway, A., & Czerwinski, M. (2014, April). Under pressure: sensing stress of computer users. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 51- 60). ACM.
54
Affective inference from keyboard activity was recently described by Epp, et. al. [37]. The developers used timings from individual key and short key- sequence (bi- and tri-gram) features. Data were collected naturalistically, i.e. the system monitored subjects’ everyday computer use, and ESM (Experience Sampling) was used to gather self-assessments of emotional state. The system showed promising accuracy (70%-88%) for most emotion labels. While these results are encouraging, as the authors acknowledged, these raw classification rates for skewed categories masked a very modest gain over baseline classification (e.g. for excitement). Still the approach shows the power of this feature set.
We experimented with a pressure-sensitive keyboard, as described by Dietz et al. [31]. For each keystroke, the keyboard provides readings from 0 (no contact) to 254 (maximum pressure). We implemented a custom-made keyboard logger in C++ that allowed us to gather the pressure readings at a sampling rate of 50 Hz. We performed two tasks: text transcription and expressive writing. In the former stress was induced by altering the size of the characters, doubling the speed of the blink of the cursor, adding a timer and finally adding ambient noise. In the latter people were asked to write either a calm or a stressful memory for about 5 minutes. Self reported stress was significantly higher for the stressful tasks. Contrasting Electro-Dermal Activity (EDA) measured through an Affective Q-sensor bracelet did not show any differences.
Figure 3.3 (top) shows the individual differences between the stressed and relaxed distribution averages of the transcription task. Session: Stress CHI 2014, One of a CHInd, Toronto, ON, Canada
memory at the beginning of the experiment may positively influence the rest of the session. Collecting data from more participants and increasing the duration of the calming clip …… could provide additional insight as to how to prevent this effect in future experiments. Despite this one ordering effect, the differences between the two conditions were 24 22
consistent across all of the other task orderings, indicating 20 increased typing pressure during the stressed condition. 18 16 55 14
Average 12
10
8 Text Transcription 00:00 00:07 00:15 00:22 00:30 00:37 00:45 00:52 01:00 01:07 8 Duration of Mouse Clicking Relaxed/Stressed 6 Figure 7. Example of mouse pressure estimation during 4 the mouse clicking task. Blue and red colors correspond
2 to the relaxed and stressed conditions, respectively. 0 How is mouse capacitance affected by stress? -2 In order to understand if people under stress handle the -4 mouse differently, participants performed a simplified -6 version of the Fitt’s law task [19], in which they needed to -8 click on several vertical pairs of bars of varying widths and 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 distances from each other. Unlike the keyboard tasks, the ** ******************* stressor took place before the task, with either the expressive writing (for participants 1 to 12) or the Expressive Writing 8 transcription task (for participants 13 to 24). 6 In this work we estimate the amount of pressure with the 4 mouse by analyzing capacitance readings. Figure 7 shows the estimation analysis process we used for participant 1. 2 From the raw capacitance readings of the two conditions 0 (top of the figure), we computed the average of all the -2 13x15 capacitive pixels at any point in time and created a -4 time series for each condition (bottom-left). Finally, we -6 estimated the overall pressure by computing the average of -8 each series (bottom-right). As it can be seen from the 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 capacitive readings of this example, the location of each ******************** finger can be easily identified. While the participant used 4 fingers during the relaxed condition (blue rectangle), s/he Mouse Clicking showed more contact of the pinky finger during the stressed 4 condition (dashed-red rectangle). Figure 6 (bottom) shows the differences between the two 2 conditions during the mouse clicking task. In this case, a positive value indicates more contact with the mouse 0 Average pressure during the stressed condition – average pressure during the relaxed condition surface during the stressed condition, and a negative value indicates more contact with the mouse surface during the -2 relaxed condition. As can be seen, 18 participants showed increased mouse contact during the stressed condition, and -4 6 participants showed reduced contact during the same 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 condition. The differences between the two conditions were ***** **************** ** Participants * significant for all the participants (W, p<0.05). Although
the majority of participants (75%) handled the mouse with significantly more contact during the stressed condition, Figure 3.3: FigureIndividual 6. Individual differences differences between between the the averaged averages differentialof there was larger response variability than the one observed (stress – calm)the signal stressed per andparticipant. relaxed conditions *The difference for the text was computed during the keyboard tasks. Visual inspection of the from significantlytranscription, different expressive distributions writing, (W,and mousep<0.05) clicking capacitive readings showed that participants had a tasks. *The difference was computed from significantly consistent mouse grip across conditions and that the different distributions (W, p<0.05). differences between conditions were due to
57 56
A positive value indicates higher average pressure during the stressed condition, and a negative value indicates higher average pressure during the relaxed condition. As can be seen, 22 out of the 24 participants (91.67%) showed higher average pressure metrics during the stressed conditions. When comparing the distributions of keystroke pressure across the two conditions, all participants except for three of them (participants 4, 7 and 19) showed significantly more pressure in the stressed conditions. These differences are similar to the ones observed during the expressive writing task shown in the middle graph of the figure. For this task, 23 out of the 24 participants (95.83%) showed increased average typing pressure during the stressed conditions, for which all participants except four (participants 1, 2, 4 and 19) showed a significant difference.
Note, however, that participant 3 showed significantly less pressure during the stressed condition. When describing a stressful memory, this participant described past episodes of depression which may have caused the decrease in keyboard pressure. Finally, the overall average typing pressure observed during the transcription task was higher than that during the expressive writing task, which is consistent with the self- reports of stress. No significant differences were found between the stressed and relaxed conditions in terms of amount of introduced characters, task duration or typing speed (amount of interactions). Furthermore, there were not significant differences in terms of pressure between the two genders. However, when comparing the average pressure values for the different task orderings in the expressive writing task, the groups were significantly different (K,H(3)=8.873, p = 0.031). In particular, participants that started the experiment writing about a relaxing memory (participants 1 to 6) showed smaller differences between the two conditions. This group of participants also showed lower average pressure values for the two conditions, although not significant (W, Z = -1.5, p = 0.134), than participants with other task orderings. The tendency towards decreased pressure values indicates that writing about a relaxing memory at the beginning of the experiment may positively influence the rest of the session. Collecting data from more participants and increasing the duration of the calming clip could provide additional insight as to how to prevent this effect in future experiments. Despite this one ordering effect, the differences between the two conditions were consistent across all of the other task orderings, indicating increased typing pressure during the stressed condition.
57
3.4 TEXT MINING As described in chapter 1, text mining has a strong potential to discover personal insights. Another use of our semantic searching tool is to discover labels of emotion. We can evaluate the use of the tool and its querying mechanisms as a sensor-less sensing technique. To do this accuracy studies must be performed. One initial approach could be to sense long-term processes, rather than immediate changes. We could discover life events that are associated with stress or strong emotions. Interventions that require long term change could benefit from these measurements. Furthermore, if we manage to generate a large body of data with proper markers, we could potentially explore causal inference methods, as described in the prior section. With enough available text data, we can perform causal inference in a way that allows us to determine the directionality of a mental health outcome. For example, determining if mood changes lead to poor interpersonal skills or vice- versa could help describe better interventions.
3.5 MOVING FROM SENSING TO INTERVENTIONS As observed, sensor-less sensing is an important enabling technology. It provides a tool to measure important mental health metrics without requiring any change in the user’s behavior. However, to ultimately rely into a wide adoption of these techniques, it is important to understand other adoption and engagement factors such as usability, context, field use and causal inference.
3.5.1 Usability Studies Given the new uses derived from extracting personal, and in many cases intimate, information, our research approach contemplates a fundamental focus on usability. In the case of sensing personal mental data, it is important to understand that many users may not have a clear conceptual model of this novel sensing technology. Additionally, their level of motivation and their level of engagement will be diverse and certainly should change over time. It is relevant to examine the interaction between the user and a device that “reads” your mind and how this can generate further confusion, anxiety, or behavioral changes, as well as desire or rejection of the system. Usability studies should help understand the differences between a) treating the device as reader of one’s (mental) self or b) as a companion (“pet”) that is affected or altered by our mental states. Furthermore, single mode and multi-modal lab and field studies should be performed to observe adoption and engagement patterns. 58
3.5.2 Stress and Emotion from Sensor and Contextual Data We will begin with lab studies with subject under induced stress and emotions. We have run some of these studies in our lab to date, using a variety of calibration methods [112] [135]. In reality it is difficult to train a stress or emotion sensor because there is no objective ground truth. While Heart Rater Variability (HRV) responds strongly to Autonomous Nervous System (ANS) tone, there are several confounds. Even chemical tests (e.g. cortisol samples from saliva) are confounded by body chemistry (esp. medications), and exhibit significant lag (minutes or tens of minutes). We make use also of induced stress, but different subjects respond very differently to particular stressors, so the best we can hope for is an increase in stress on average. This is still enough to train a model, and once trained we can study the accuracy of this model with or without the inclusion of particular sensors to measure their individual predictive value. Data for stress modeling will include continuous variables (estimates from the biometric sensors including voice, keyboard, mouse, HRV), a periodic time variable, and discrete variables (location, id of a nearby person). We will start with simple models, namely linear regression of sensor data on a consensus of “control” signals (cortisol, self-report and Tricorder HRV). Discrete changes will be modeled with additive linear coefficients. Going beyond pure supervised regression; we will experiment with latent factor models, which may expose useful patterns of thought and/or behavior, which predict stress. We have previous experience with rich factor models for activity classification from desktop activity data [121]. 3.5.3 Integrated Model and Data Acquisition Given the models as trained above the final boundary if to perform field modeling. Using the models above, it is possible to compute real-time estimates of emotional states and stress for users working at a desktop or using a mobile device. When significant changes occur in modeled emotional and/or stress level occur, the system will prompt the user to give a self-report. These self-reports will then be used for further training and model refinement. By issuing requests at times of change, we will be able to gather emotional and stress label data that should be as close as possible in time to triggers. Thus it should be of maximum value for building models of the complex dependencies on discrete emotional triggers and stressors. 3.5.4 Causal Inference With enough available text data, we can perform causal inference in a way that allows us to determine the directionality of a mental health outcome. For example, determining if mood changes lead to poor interpersonal skills or vice- versa could help describe better interventions. Based on our text labels we could mark different observational variables, such as life events, moods, sentiment, and other variables that we could later analyze. It is important to note that causal relationships are not sufficient to define good interventions. 59
Another key element is to know they types of interventions, when interventions should be applied and the appropriate engagement strategies. Some of these topics will be treated in Part 2 of this thesis.
3.6 ADAPTIVE INTERACTIONS AND INTERVENTIONS
3.6.1 Feedback and Interface Adaptation The first and most basic interaction/intervention will be to present direct feedback of different emotional states to users. Direct feedback should raise awareness and drive behavior change. Another option is the automatic alteration of the interface to better adapt to the emotional state of the user. The interface could be adapted either at the background or foreground level [62] to help the user either maintain attention in the task at hand during a stressful, but productive state, or change and relax during a frustrating or overwhelming episode. Interface adaptation to human affect has been used in airplanes, using previous knowledge, self-reports, diagnostic tasks and physiological sensing and changing the interface at the content and format levels [60]. Work has also been done to adapt computing interfaces to cognitive and affective changes (the latter captured through changes in facial expressions) [34]; however little work has been done to unobtrusively sense affect and actively adapt computer interfaces to affective changes. We plan to explore the use of sensor-less sensing to monitor different emotions commonly present during interactions with computing devices to help people use them more towards the accomplishment of their goals and to optimize the emotional engagement associated with it. In terms of mental health, the ability to maintain an “appropriate” level of stress, needed to fulfill the task will be beneficial especially in productive settings like the office or school.
3.6.2 Emotional Regulation and Psycho-education Interventions Emotional regulation interventions such as calming technologies present a good opportunity to help deliver users that present initial (moderate) symptoms of stress or other emotional changes with a brief and effective intervention that would help people avoid unnecessary emotional alteration. Paredes, et. al. [112] present an example of some mobile individual and social interventions that are brief and usable without the need of additional hardware and leveraging intrinsic and social aspects. Coyle, D. et. al. [30] has explored gaming interventions focused mental health based on Cognitive Behavioral Therapy (CBT). Chapter 2 describes some social and gaming concepts that could benefit from a sensor-less sensing technology. More specifically, section 2.6 describes the machinima (machine + cinema) 60
technology for the creation of short movies that can deliver some life skill or CBT-based stress reduction. Section 2.6.3 describes how machinima can be built from a CBT manual [101].
3.6.3 Foreground and Background Interventions Additionally, novel designs of foreground and background interventions delivered via the graphical unit interface (GUI) could be part of operating systems or applications. Some ideas in that could be developed are: a. the use of desktop themes, screen savers, menu bars, etc. could be used to deliver soothing messages or color combinations that can help reduce stress, b. the modification of fonts or illumination of the screen to help improve reading during moments of emotional arousal, c. help block or reduce the number of cues presented through automated messages, such as incoming email or chat, to help maintain focus during highly stressful situations.
3.7 CONCLUSION As described in this chapter, sensor-less sensing opens novel possibilities to work with a new suite of psychophysiology signals. It allows the use of movement and the Somatic Nervous System (SoNS) as a source of reliable data through movement and pressure. It is exciting to observe that we can take advantage of widely used devices, such as mice and keyboards. We also discuss the repurpose of our semantic search technology described in chapter 1 as a key enabler of using text as a sensing stream. By sensing without disrupting the user’s workflow, we considerable reduce barriers for adoption. However, we discuss also the importance to not only rely on sensing as an enabler of change. Additional research is fundamental to be able to move from sensing to an efficacious intervention deployment. We discuss factors such as usability, context and field studies as a key complement to any sensing approach. Finally, we discuss the novel types of interventions that would be much harder to implement without the ability to use unobtrusive sensing techniques. Part 2 presents a body of research that takes advantage of some of our conceptualization and sensing innovations to generate novel “opportunistic” interventions.
61
Part 2 Opportunistic Interventions
"Opportunistic Interventions" repurposes well-adopted devices and media into effective interventions. We aim at increasing efficiency (adoption and reduced attrition) beyond efficacy. We discuss three important types of interventions: Suites, Multi-modal, and Environmental. Key elements that transcend these types are:
Novelty. An efficacious intervention can become obsolete due to boredom. Humans adapt to its effects, and their interest decreases with time. Suites of interventions have the potential to deliver new content over time. Fresh interventions and authoring systems can reduce novelty effects.
Context matters. An intervention can become a stressor under the wrong context. A person receiving a personal message could interpret it as uplifting or as embarrassing. Context data is important and should be input to a successful intervention approach.
Introductory period. Users with no previous training can find an intervention taxing. For example, people with no experience in deep breathing may hyperventilate. A training period could improve efficacy and reduce attrition.
Negative outcomes. Interventions can have negative affective outcomes if poorly designed or implemented. Factorial experimental design can help flush out undesired outcomes. Multi-modal approaches should take into account potential interaction effects. Environmental interventions should consider prior perceptions.
Chapter 4 studies how to design a suite of mobile and wearable interventions. Our focus is in diversity while maintaining efficacy. Chapter 5 presents interventions harvested from the web. Popular apps become efficacious interventions. Machine learning policies use contextual cues to recommend the best intervention. Chapter 6 discusses the difficulties of multi-modal interactions in wearable systems. Haptic and sound interaction effects are analyzed. Finally, Chapter 7 showcases novel urban LED lights that elicit emotional outcomes in pedestrians.
62
Chapter 4 Repurposing Mobile and Wearable Systems
This chapter describes design explorations for stress mitigation on mobile devices based on three types of interventions: haptic feedback, games and social networks. This chapter offers a qualitative assessment of the usability of these three types of interventions together with an initial analysis of their potential efficacy. Social networking and games show great potential for stress relief. Lastly, this chapter introduces key findings and considerations for long- term studies of stress mitigation in HCI, as well as a list of aspects to be considered when designing calming interventions.
63
4.1 INTRODUCTION
Stress is an exacerbating factor for many physiological and psychological illnesses [26]. Three promising avenues can mitigate stress. First, the sense of touch has been regarded as an important communicator of empathy and calmness [126]. Second, social support is a valid tool to reduce stress [54]. Third, playing games is considered a distraction that could be used as a way to relieve stress [67]. We want to investigate novel HCI approaches to calm people down in the early stages of the accumulation of stress. We want to answer two questions: (1) Is it possible to reduce stress through interactive techniques? (2) Which modalities or interfaces are most promising to calm people down? Specifically, we investigate three different approaches suggested by prior work [12, 54]: haptic feedback, where vibrating motors stimulate acupressure points; interactive games, where game play can reduce stress; and social network interactions.
4.2 BACKGROUND AND RELATED WORK
Appropriate feedback is crucial to behavioral change. Positive psychology [131] is currently emerging as a new way of inducing behavioral changes including helping people calm down with appealing cues. Evidence Based Therapies administered via the Internet [102] have been showing promising results. As an example, Cognitive Behavioral Therapy (CBT) [12, 115], which uses several techniques for habit change teaches people how to recognize their sources of stress and block the negative associated reaction [8]. Narrative Therapy focuses on constructing conversations to help people be satisfied with their state of being [151].
Recent studies have demonstrated the value of haptics as a therapeutic tool [144], where different haptic techniques are used to support different mental health therapies. The game ―Relax to win implements bio- feedback (controlling the game through bio signals) as a mechanism to help people relax [122]. In addition, several game mobile games and web applications have recently appeared that are designed to calm players.
4.3 IMPLEMENTATION
4.3.1 Prototypes
We have built four prototypes to explore the usability and efficacy of mobile interventions to manage stress:
Social Networks
We built a text-based interface using SMS to deliver alert messages (see 64 CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
Figure 4.1). We worked with participants and a small circle of their friends to test responsiveness of thisCHI method. 2011 ¥ Work-in-Progress This setup was chosen because of the May 7Ð12, 2011 ¥ Vancouver, BC, Canada power of intimate communication between friends and family; and because of the pervasiveness of SMS messages.
Introduction techniques are used to support different mental health Stress is an exacerbating factor for many physiological therapies. The game ―Relax to win implements bio- and psychological illnesses [3]. Three promising feedback (controlling the game through bio signals) as avenues can mitigate stress. First, the sense of touch a mechanism to help people relax [9]. In addition, Scenario: CalmMeNow In Use users. Awareness of personal stress levels varies has been regarded as an important communicator of several game mobile games and web applications have Joe, a college student, has been preparing for a between people; this limits the validity of self-reports. empathy and calmness [6]. Second, social support is a recently appeared that are designed to calm players. presentationvalid tool to reduce for his stress class. [4]. He Third,is nervous playing about games public is Subjective Data speaking.considered H ae distractionstarts to panic that could and worriesbe used thatas a heway wi toll Implementation We applied three commonly used scales to measure makerelieve a stressfool of [5]. himself! We want Joe’s to investigatemobile dev novelice, connected HCI We have built four prototypes to explore the usability toapproaches a Berkeley to Tricordercalm people sensor, down detectsin the early that stages he is of and perceivedefficacy of mobilestress: interventions to manage stress: stressthe accumulationed and activates of stress. his Wehaptic-guided want to answer breathing two Social1. Networks:The State-Trait we built Anxietya text-based Inventory interface scale using (STAI):
bracelet.questions: It (1)also Is sendsit possible an SMS to reduce alert stressto his throughclose friend SMS to deliveranalyzes alert general messages and (see momentary Figure 1). feeling We of anxiety Figure 4.1: An SMS message used to prompt the closest Figure 1. An SMS message used to Ben.interactive After atechniques? few moments (2) Whichof deep, modalities guided orbreathing, worked2. withSubjective participants assessment and a small of stresscircle of on their a 0 – 10 Likert contacts in the user’s (inner) social network. prompt the closest contacts in the Joeinterfaces receives are an most SMS promising text from to Bencalm with people a funny down? joke friends toscale test responsiveness of this method. This Playing Games user’s social network. andSpecifically, a line of we encouragement. investigate three Joe different calms approachesdown and is setup3. was Life chosen Events because Questionnaire: of the power evaluates of intimate long-term suggested by prior work [2,4]: haptic feedback, where now ready for his presentation. communicationstress betweenaccumulation friends via and questions family; and about events vibrating motors stimulate acupressure points; because of the pervasiveness of SMS messages. We used commercially available mobile games with simple tasks, such as such as the death of a relative, divorce, job loss. mazes and basic interaction games (tilting, moving, rotating).Studyinteractive Design We games, allowed where game play can reduce stress; Playing Games: we used commercially available mobile Stress levels were induced via two mental stressors: participants to choose games at their own discretion. Thisand interventionsocial network interactions.was games with simple tasks, such as mazes and basic Figure 3. Acupressure points for The focus of this study is to evaluate promising calming chosen because of the distraction it provides. Figure 4.2 shows an off-the-self interaction1. The games Stroop (tilting, Color moving, Word Test, rotating which). We presents a relaxation technologiesBackground. We and first Related investigated Work the efficacy and allowed participantsseries of words to choose that gamesdescribe at colors,their own but using a maze game. potentialAppropriate use feedback of such istechnologies crucial to behavior in a labal change.experiment . Motor Placement discretion.different This intervention color of ink was at chosen increasing because speeds of the ToPositive gather psychology data on the [10 effectiveness] is currently emerging of our interventions as a new distraction it provides. weway collected of inducing both behavioral used objective changes (biometric), including helping and 2. A subtraction math test with penalties for mistakes subjectivepeople calm (psychometric down with appealing and self-report) cues. Evidence data .Based Guided Acupressure:and for long we response employed times two— vibro-tactileif users made a Therapies administered via the Internet [7] have been motors inmistake a bracelet or (Figuretook too 2) long, that stimulatesthey had to start over. Objectiveshowing promising Data results. As an example, Cognitive acupressure points in the wrists and the chest (when GalvanicBehavioral Skin Therapy Response (CBT) (GSR) [2] [8], and which Electrocardiogram uses several the wristEvaluation is held to the sternum); these points are known to reduce stress (Figure 3). We employed a datechniquesta (ECG) forare habit known change indicators teaches of people stress how, if the to Baseline Wizard of Oz technique to control the timely application subjectrecognize is nottheir engaged sources ofin stressstrenuous and block physical the negativeactivity. We recruited 20 participants (13 male and 7 female) of this stimulus to the participants. Weassociated gathered reaction this data [1]. usingNarrative an ambulatoryTherapy focuses sensing on among students in our institution. The evaluation constructing conversations to help people be satisfied Guided Breathing: Using the same bracelet, we coached device, the Berkeley Tricorder. We used four features: consisted of two phases, each one with 6 identified with their state of being [12]. participants to breathe according to well-known deep- stages: (1) Arrival, (2) Calming, (3) Stroop Test, (4) Figure 2. Vibro-tactile bracelet with 1. ECG Heart Rate (HR) (positively correlated with breathing techniques, where a haptic stimulus indicates Wait/Anticipation, (5) Math Test, (6) Calming (see two vibrating motors for Recentstress) studies have demonstrated the value of haptics the breathing rhythm. Correct breathing rhythm is one Figure 5). The first stage was used to gather acupressure stimulation. 2.as aECG therapeutic Heart Rate tool Variability[11], where (HRV) different (negatively haptic of the key elements to achieve a soothing effect. correlated) psychometric and subjective data at the moment of 3. GSR Electrodermal conductance (EDC) (negatively arrival. During the calming stages, participants completed positive thinking and visualization exercises correlated) 1700 and we incentivized them to speak descriptively about Figure 4.2Figure. One 4 .of One the of games the games we weused involved4. GSR Variability (negatively correlated) beautiful and soothing situations. No other maneuveringused involves a ball inmaneuvering a maze a ball in These biometric data are valuable because they can interventions took place. a maze. assist in verifying the subjective stress assessments of
1701 CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
Introduction techniques are used to support different mental health Stress is an exacerbating factor for many physiological therapies. The game ―Relax to win implements bio- and psychological illnesses [3]. Three promising feedback (controlling the game through bio signals) as avenues can mitigate stress. First, the sense of touch a mechanism to help people relax [9]. In addition, has been regarded as an important communicator of several game mobile games and web applications have empathy and calmness [6]. Second, social support is a recently appeared that are designed to calm players. valid tool to reduce stress [4]. Third, playing games is considered a distraction that could be used as a way to Implementation relieve stress [5]. We want to investigate novel HCI We have built four prototypes to explore the usability approaches to calm people down in the early stages of and efficacy of mobile interventions to manage stress: the accumulation of stress. We want to answer two Social Networks: we built a text-based interface using questions: (1) Is it possible to reduce stress through SMS to deliver alert messages (see Figure 1). We 65 Figure 1. An SMS message used to interactive techniques? (2) Which modalities or worked with participants and a small circle of their prompt the closest contacts in the interfaces are most promising to calm people down? friends to test responsiveness of this method. This Guided Breathing user’s social network. Specifically, we investigate three different approaches setup was chosen because of the power of intimate suggested by prior work [2,4]: haptic feedback, where communication between friends and family; and We employed vibrotactile feedback to guide breathing patterns. Figure 4.3 vibrating motors stimulate acupressure points; shows a bracelet with two eccentric rotating mass (ERM) vibration motors. We because of the pervasiveness of SMS messages. coached participants to breathe according to well-knowninteractive deep- breathing games, where game play can reduce stress; Playing Games: we used commercially available mobile techniques, where a haptic stimulus indicates the breathingand rhythm. social network Correct interactions. games with simple tasks, such as mazes and basic breathing rhythm is one of the key elements to achieve a soothing effect. interaction games (tilting, moving, rotating). We Background and Related Work allowed participants to choose games at their own Appropriate feedback is crucial to behavioral change. Motor Placement discretion. This intervention was chosen because of the Positive psychology [10] is currently emerging as a new distraction it provides. way of inducing behavioral changes including helping people calm down with appealing cues. Evidence Based Guided Acupressure: we employed two vibro-tactile Therapies administered via the Internet [7] have been motors in a bracelet (Figure 2) that stimulates showing promising results. As an example, Cognitive acupressure points in the wrists and the chest (when Behavioral Therapy (CBT) [2] [8], which uses several the wrist is held to the sternum); these points are techniques for habit change teaches people how to known to reduce stress (Figure 3). We employed a recognize their sources of stress and block the negative Wizard of Oz technique to control the timely application associated reaction [1]. Narrative Therapy focuses on of this stimulus to the participants. constructing conversations to help people be satisfied Guided Breathing: Using the same bracelet, we coached
with their state of being [12]. participants to breathe according to well-known deep- Figure 4.3: VibroFiguretactile 2 . bracelet Vibro-tactile with bracelet two eccentric with rotating mass breathing techniques, where a haptic stimulus indicates (ERM) motors for guided breathing and acupressure stimulation. two vibrating motors for Recent studies have demonstrated the value of haptics the breathing rhythm. Correct breathing rhythm is one acupressure stimulation. as a therapeutic tool [11], where different haptic of the key elements to achieve a soothing effect. Guided Acupressure
We employed the aforementioned bracelet to stimulate acupressure points in the wrists and the chest (when the wrist is held to the sternum); these points 1700 are known to reduce stress (Figure 4.4). We employed a Wizard of Oz technique to control the timely application of this stimulus to the participants. CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
66
Scenario: CalmMeNow In Use users. Awareness of personal stress levels varies Joe, a college student, has been preparing for a between people; this limits the validity of self-reports. presentation for his class. He is nervous about public speaking. He starts to panic and worries that he will Subjective Data make a fool of himself! Joe’s mobile device, connected We applied three commonly used scales to measure to a Berkeley Tricorder sensor, detects that he is perceived stress: stressed and activates his haptic-guided breathing 1. The State-Trait Anxiety Inventory scale (STAI): bracelet. It also sends an SMS alert to his close friend analyzes general and momentary feeling of anxiety Ben. After a few moments of deep, guided breathing, 2. Subjective assessment of stress on a 0 – 10 Likert Joe receives an SMS text from Ben with a funny joke scale and a line of encouragement. Joe calms down and is 3. Life Events Questionnaire: evaluates long-term now ready for his presentation. stress accumulation via questions about events such as the death of a relative, divorce, job loss. Study Design Figure 4.4: Accupressure points for relaxation. Stress levels were induced via two mental stressors: Figure 3. Acupressure points for The focus of this study is to evaluate promising calming 1. The Stroop Color Word Test, which presents a relaxation technologies. We first investigated the efficacy and 4.3.2 Common use scenario potential use of such technologies in a lab experiment. series of words that describe colors, but using a different color of ink at increasing speeds To gather data on the effectiveness of our interventions Joe, a college student, has been preparing for a presentationwe f collectedor his class. both He used objective (biometric), and 2. A subtraction math test with penalties for mistakes is nervous about public speaking. He starts to panic and worriessubjective that (psychometric he will and self-report) data. and for long response times—if users made a make a fool of himself! Joe’s mobile device, connected to a Berkeley Tricorder mistake or took too long, they had to start over. sensor [106], detects that he is stressed and activates Objective his haptic Data-guided breathing bracelet. It also sends an SMS alert to his close friendGalvanic Ben. Skin After Response a (GSR) and Electrocardiogram Evaluation few moments of deep, guided breathing, Joe receives an SMS text from Ben data (ECG) are known indicators of stress, if the Baseline with a funny joke and a line of encouragement. Joe calms down and is now subject is not engaged in strenuous physical activity. We recruited 20 participants (13 male and 7 female) ready for his presentation. We gathered this data using an ambulatory sensing among students in our institution. The evaluation 4.4 STUDY DESIGN device, the Berkeley Tricorder. We used four features: consisted of two phases, each one with 6 identified 1. ECG Heart Rate (HR) (positively correlated with stages: (1) Arrival, (2) Calming, (3) Stroop Test, (4) The focus of this study is to evaluate promising calming technologies. We first stress) Wait/Anticipation, (5) Math Test, (6) Calming (see investigated the efficacy and potential use of such technologies in a lab 2. ECG Heart Rate Variability (HRV) (negatively Figure 5). The first stage was used to gather experiment. To gather data on the effectiveness of our interventions we psychometric and subjective data at the moment of collected both used objective (biometric), and subjective (psychometriccorrelated) and 3. GSR Electrodermal conductance (EDC) (negatively arrival. During the calming stages, participants self-report) data. correlated) completed positive thinking and visualization exercises and we incentivized them to speak descriptively about 4.4.1 Objective DataFigure 4. One of the games we 4. GSR Variability (negatively correlated) beautiful and soothing situations. No other used involves maneuvering a ball in Electro-Dermal Activity (EDA) and Electrocardiogram dataThese (ECG) biometric are known data are valuable because they can interventions took place. a maze. indicators of stress, if the subject is not engaged in strenuousassist physical in verifying activity. the subjective stress assessments of We gathered this data using an ambulatory sensing device, the Berkeley
Tricorder. We used four features: 1701 1. ECG Heart Rate (HR) (positively correlated with stress) 67
2. ECG Heart Rate Variability (HRV) (negatively correlated) 3. Electro-dermal conductance (EDC) (negatively correlated) 4. EDA Variability (negatively correlated)
These biometric data are valuable because they can assist in verifying the subjective stress assessments of users. Awareness of personal stress levels varies between people; this limits the validity of self-reports.
4.4.2 Subjective Data
We applied three commonly used scales to measure perceived stress:
1. State-Trait Anxiety Inventory scale (STAI): analyzes general and momentary feeling of anxiety 2. Subjective assessment of stress: 0 – 10 Likert scale 3. Life Events Questionnaire: evaluates long-term stress accumulation via questions about events such as the death of a relative, divorce, job loss.
Stress levels were induced via two mental stressors:
a. Stroop Color Word Test: Presents a series of words that describe colors, but using a different color of ink at increasing speeds. b. Recurrent Subtraction Math Test: Which includes penalties for mistakes and for long response times—if users made a mistake or took too long, they had to start over.
4.5 EVALUATION
4.5.1 Baseline
We recruited 20 participants (13 male and 7 female) among students in our institution. The evaluation consisted of two phases, each one with 6 identified stages: (1) Arrival, (2) Calming, (3) Stroop Test, (4) Wait/Anticipation, (5) Math Test, (6) Calming (Figure 4.5). The first stage is used to gather psychometric and subjective data at the moment of arrival. During the calming stages, participants completed positive thinking and visualization exercises and we incentivized them to speak descriptively about beautiful and soothing situations. No other interventions took place. CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
68
Randomized Experiment In the second phase, administered two weeks later, participants completed the same stages - but this time we applied the four interventions (Social Network,
Playing Games, Guided Acupressure and Guided FigureFigure 4.5: 5. ExperimentExperiment Stages Stages
Breathing) during stages 2, 4 and 6, in randomized 4.5.2 Randomized Experiment order, in order to measure their efficacy to relief stress. paired t-test comparison between phases1 and 2, no In the second part of the experiment, administered two weeks later, Participants were assigned in random order to each of participantssignificant completed effect the was same observed stages - butbetween this time interventions we applied the .four the potential combination of 3 interventions to obtain a interventionsThis indicates (Social that Network, our interventions Playing Games, had Guided similar Acupressure effects and Guided Breathing) during stages 2, 4 and 6, in randomized order, in order to within-subjects comparison of the interventions. We measureto the their positive efficacy thinking to relief stress. and visualizationParticipants were techniques assigned in random of expected to obtain results that were either better or orderphase to each 1. of Figure the potential 7 presents combination the of normalized 3 interventions data to obtain a within- subjects comparison of the interventions. We expected to obtain results that similar to the visualization and positive thinking stages weregathered either better from or similarthe usability to the visualization questionnaire. and positive The thinking social stages from phase 1. Finally, at the end of phase 2 we fromnetwork part 1. Finally,support at and the end breathing of part 2 interventions we gathered closed have-form (Likert scales) qualitative information about likeability, potential benefit, and Figure 6. Subjective Stress Levels for gathered closed-form (Likert scales) qualitative perceivedstronger, efficacy more of uniform the interventions. support. We Acupressure also gathered and open -ended information about improvements as well as suggestions for new interventions. each intervention across the different information about likeability, potential benefit, and games have some strengths and noticeable stages on phase 2 perceived efficacy of the interventions. We also 4.6weaknesses. RESULTS gathered open-ended information about improvements As seen in Figure 4.6, the subjective scale expressed the expected pattern of as well as suggestions for new interventions. stressA preliminary and calmness Principalin different Componentstages. At an aggregate Analysis level, (PCA) stressors on did raisethe perceived objective stress data levels showed and the calmingthat all stages the didaforementioned manage to lower the stress level. Participants' subjective stress ratings were ranked to account for Results differencesbiometrics in individual used ratingto evaluate scales. Ratings stress for provide each stage 86% were ranked from As seen in Figure 6, the subjective scale expressed the 1 tocorrelation 6: the stage with the the lowest expected subjective results. stress rating This was result ranked as 1; the highest stress rating was ranked as 6. Using a multiple comparisons Friedman expected pattern of stress and calmness in different test,indicates the variation that between these biometrics stages was found are torelevant be statistically to the significant stages. At an aggregate level, stressors did raise (p<0.001).future problemAdditionally, of in inferring part 2, the stresssubjects from left the ECG test withand a GSR lower level of stress than the level they entered with (p<0.001). No statistical significance perceived stress levels and the calming stages did wassamples, found regarding in conjunction the effect with of each subjective intervention. data. In a Further paired t- test manage to lower the stress level. Participants' comparisonanalysis between is important parts 1 and to 2, reduce no significant the effectset of was components. observed between interventions. This indicates that our interventions had similar effects to the subjective stress ratings were ranked to account for positiveAdditionally, thinking and we visualization performed techniques a pairwise of part t-test 1. of the differences in individual rating scales. Ratings for each aggregated values for the stress stages (Stroop and stage were ranked from 1 to 6: the stage with the Math) and the calm stages (Calm 1 and Calm 2). We lowest subjective stress rating was ranked as 1; the observed that the difference was statistically significant highest stress rating was ranked as 6. Using a multiple (p<0.001), which means that indeed, GSR and EDC comparisons Friedman test, the variation between follow the subjective stress states at an aggregate Figure 7. Comparison of the four stages was found to be statistically significant level. However, at an individual level there are many interventions. Social Networking and (p<0.001). Additionally, in phase 2, the subjects left differences and gaps, which suggests that careful breathing had the most uniform the test with a lower level of stress than the level they analysis and further design may be needed to distribution entered with (p<0.001). No statistical significance was guarantee usability. Figure 8 shows an example of GSR found regarding the effect of each intervention. In a signal levels plotted against samples, where oscillations
1702 CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
69 Randomized Experiment In the second phase, administered two weeks later, participantsRandomized completed Experiment the same stages - but this time weIn theapplied second the phase,four interventions administered (Social two weeks Network, later, participants completed the same stages - but this time Playing Games, Guided Acupressure and Guided Figure 5. Experiment Stages we applied the four interventions (Social Network, Breathing) during stages 2, 4 and 6, in randomized Playing Games, Guided Acupressure and Guided Figure 5. Experiment Stages order,Breathing) in order during to measure stages 2, their 4 and efficacy 6, in randomized to relief stress. paired t-test comparison between phases1 and 2, no Participantsorder, in order were to assignedmeasure intheir random efficacy order to reliefto each stress. of pairedsignificant t-test effect comparison was observed between between phases1 interventions and 2, no . theParticipants potential werecombination assigned of in 3 random interventions order toto eachobtain of a significantThis indicates effect that was our observed interventions between had interventions similar effects. within-subjectsthe potential combination comparison of of 3 theinterventions interventions. to obtain We a Thisto the indicates positive that thinking our interventions and visualization had similar techniques effects of expectwithin-subjectsed to obtain comparison results that of thewere interventions. either better We or tophase the positive1. Figure thinking 7 presents and thevisualization normalized techniques data of similarexpect edto theto obtain visualization results thatand positivewere either thinking better stages or phasegathered 1. Figure from the7 presents usability the questionnaire. normalized data The social similar to the visualization and positive thinking stages gathered from the usability questionnaire. The social from phase 1. Finally, at the end of phase 2 we network support and breathing interventions have from phase 1. Finally, at the end of phase 2 we network support and breathing interventions have FigureFigure 4.6:6. SubjectiveSubjective stressStress levelsLevels for for each gathered closed-form (Likert scales) qualitative stronger, more uniform support. Acupressure and interventionFigure 6. acrossSubjective the different Stress stages Levels on forpart 2. gathered closed-form (Likert scales) qualitative stronger, more uniform support. Acupressure and each intervention across the different information about likeability, potential benefit, and games have some strengths and noticeable each intervention across the different information about likeability, potential benefit, and games have some strengths and noticeable Figure 4.7 stagespresents on the phase normalized 2 data gathered from theperceived usability efficacy of the interventions. We also weaknesses. stages on phase 2 perceived efficacy of the interventions. We also weaknesses. questionnaire. The social network support and breathing interventions have gathergathereded open-endedopen-ended informationinformation about improvements stronger, more uniform support. Acupressure and games have some strengths A preliminary Principal Component Analysis (PCA) on and noticeable weaknesses. asas wellwell asas suggestionssuggestions forfor newnew interventions.interventions. A preliminary Principal Component Analysis (PCA) on thethe objectiveobjective datadata showedshowed thatthat all all the the aforementioned aforementioned ResultsResults biometricsbiometrics usedused toto evaluateevaluate stress stress provide provide 86% 86% AsAs seenseen inin FigureFigure 6,6, thethe subjectivesubjective scale expressed the correlationcorrelation withwith thethe expectedexpected results. results. This This result result expectedexpected patternpattern ofof stressstress andand calmnesscalmness in different indicatesindicates thatthat thesethese biometricsbiometrics are are relevant relevant to to the the stages.stages. AtAt anan aggregateaggregate level,level, stressorsstressors did raise futurefuture problemproblem ofof inferringinferring stressstress from from ECG ECG and and GSR GSR perceived stress levels and the calming stages did samples, in conjunction with subjective data. Further perceived stress levels and the calming stages did samples, in conjunction with subjective data. Further manage to lower the stress level. Participants' analysis is important to reduce the set of components. manage to lower the stress level. Participants' analysis is important to reduce the set of components. subjective stress ratings were ranked to account for Additionally, we performed a pairwise t-test of the subjectivedifferences stress in individual ratings ratingwere rankedscales. toRatings account for foreach aggregatedAdditionally, values we performed for the stress a pairwise stages t-test (Stroop of theand differencesstage were inranked individual from rating1 to 6: scales. the stage Ratings with for the each Math)aggregated and the values calm forstages the stress(Calm stages1 and Calm (Stroop 2). andWe stagelowest were subjective ranked stress from 1rating to 6: was the ranked stage withas 1; the the observedMath) and that the the calm difference stages (Calm was statistically 1 and Calm significant 2). We lowesthighest subjective stress rating stress was rating ranked was as ranked 6. Using as a 1; multiple the (p<0.001),observed that which the means difference that wasindeed, statistically GSR and significant EDC
highestcomparisons stress Friedman rating was test, ranked the variation as 6. Using between a multiple follow(p<0.001), the subjective which means stress that states indeed, at an GSR aggregate and EDC
Figure 7. Comparison of the four comparisonsstages was found Friedman to be test, statistically the variation significant between levelfollow. However, the subjective at an stressindividual states level at therean aggregate are many (p<0.001) . Additionally, in phase 2, the subjects left differences and gaps, which suggests that careful FigureFigureinterventions. 4.7: Comparison7. Comparison Social of theNetworking of fourthe four interventions. and stages Social was found to be statistically significant level. However, at an individual level there are many Networkingbreathing and had Guided the Breathing most uniform had the most uniform(p<0.001)the test with. Additionally, a lower level in ofphase stress 2, than the subjectsthe level leftthey analysisdifferences and and further gaps, design which may suggests be needed that carefulto distributioninterventions. of usability Social traits. Networking and entered with (p<0.001). No statistical significance was guarantee usability. Figure 8 shows an example of GSR distribution the test with a lower level of stress than the level they analysis and further design may be needed to breathing had the most uniform found regarding the effect of each intervention. In a signal levels plotted against samples, where oscillations entered with (p<0.001). No statistical significance was guarantee usability. Figure 8 shows an example of GSR distribution found regarding the effect of each intervention. In a signal levels plotted against samples, where oscillations
1702
1702 70
A preliminary Principal Component Analysis (PCA) on the objective data showed that all the aforementioned biometrics used to evaluate stress provide 86% correlation with the expected results. This result indicates that these biometrics are relevant to the future problem of inferring stress from ECG and EDA samples, in conjunction with subjective data. Further analysis is important to reduce the set of components. Additionally, we performed a pairwise t-test of the aggrCHIegated 2011 values ¥ Work-in-Progress for the stress stages (Stroop and May 7Ð12, 2011 ¥ Vancouver, BC, Canada Math) and the calm stages (Calm 1 and Calm 2). We observed that the difference was statistically significant (p<0.001), which means that indeed,
EDA follow the subjective stress states at an aggregate level. However, at an individual level there are many differences and gaps, which suggests that careful analysis and further design may be needed to guarantee usability. Figure 4.8 shows an example of EDA signal levels plotted against samples, where oscillations show the changes between stages, inversely comparable with the peaks observed in Figure 4.6. Lower values are correlated with high stress stages, while higher peaks with calm ones. show the changes between stages, inversely 1. Social Networks: comparable with the peaks observed in Figure 6. Lower Timely response: by increasing the number of values are correlated with high stress stages, while contacts or by maintaining a message repository to higher peaks with calm ones. Figure 9 shows a be used when no contacts are available. summary of the levels of the different biometrics. On Help button: Allow the person to request help from an aggregate level many expected behaviors were their contacts. observed: lower EDC mean, positive EDC change and lower HR for the calm stages and their inverses for the 2. Breathing: stress stages. HRV did not show the expected behavior, Pressure feedback: may resemble the act of showing a higher value for stress. breathing better than vibration. Training and adaptation: Gradually increasing Recommendations speed and frequency of breathing could be useful The main recommendations from our research are: Figure 4.8. Electro-Dermal Activity (EDA) oscillating around 540 mV especially for people not familiar with deep- Figure 8. GSR measurement Suite of Interventions and Context: Many interventions breathing techniques. oscillating around 540 mV could coexist in a system, and its application should be Figure 4.9 shows a summary of the levels of the different biometrics. On an 3. Acupressure: aggregate level many expected behaviors were observed: lower EDCbased mean, on four factors, all in relationship with context: positive EDC change and lower HR for the calm stages and their inverses for Training and adaptation: Some participants did not 1. Volatility: assign an intervention to a situation that the stress stages. HRV did not show the expected behavior, showing a higher manage to wear the bracelet correctly, and others value for stress. carries the lowest risk of becoming a stressor. As found it to be too ―novel‖. Some users mentioned an example, breathing techniques may not be that as time passed they felt more at ease. appropriate in a humid or hot climate. 2. Interruption time: assign an intervention that has Other acupressure points: Some users mentioned the adequate amount of interruption time. As an their interest to apply the device in other points example, long interventions may not be such as neck, arms and legs. appropriate during a meeting break. 4. Playing games: 3. Media: assign an intervention appropriate to the Personalization: a personalized suite of games will time and place. As an example, sound-emitting be important to choose the best-suited games to interventions may not be appropriate in an office. calm people down. 4. Habituation: all the interventions run the risk of Passive games: some participants mentioned that growing old. Having a suite of interventions that active games gets them stressed and/or ―hooked‖. changes over time, even for similar contexts, will Games with very simple tasks could be more be necessary to maintain the users’ interest. beneficial. Games deserve further study to define
the right characteristics to calm people down. Design improvements: Some improvements to the Figure 9. Biometric Signals different interventions have been identified based on Long-term (real-life) usability study: As described by comparison between Calm and Stress the qualitative data: Muñoz [7] to achieve long-term usability, self-help
1703 CHI 2011 ¥ Work-in-Progress May 7Ð12, 2011 ¥ Vancouver, BC, Canada
show the changes between stages, inversely 1. Social Networks: comparable with the peaks observed in Figure 6. Lower Timely response: by increasing the number of values are correlated with high stress stages, while contacts or by maintaining a message repository to higher peaks with calm ones. Figure 9 shows a be used when no contacts are available. summary of the levels of the different biometrics. On Help button: Allow the person to request help from an aggregate level many expected behaviors were their contacts. observed: lower EDC mean, positive EDC change and lower HR for the calm stages and their inverses for the 2. Breathing: stress stages. HRV did not show the expected behavior, Pressure feedback: may resemble the act of showing a higher value for stress. breathing better than vibration. Training and adaptation: Gradually increasing Recommendations speed and frequency of breathing could be useful The main recommendations from our research are: especially for people not familiar with deep- Figure 8. GSR measurement Suite of Interventions and Context: Many interventions breathing techniques. oscillating around 540 mV could coexist in a system, and its application should be 3. Acupressure: based on four factors, all in relationship with context: Training and adaptation: Some participants did not 1. Volatility: assign an intervention to a situation that manage to wear the bracelet correctly, and others carries the lowest risk of becoming a stressor. As found it to be too ―novel‖. Some users mentioned an example, breathing techniques may not be that as time passed they felt more at ease. 71 appropriate in a humid or hot climate. 2. Interruption time: assign an intervention that has Other acupressure points: Some users mentioned the adequate amount of interruption time. As an their interest to apply the device in other points example, long interventions may not be such as neck, arms and legs. appropriate during a meeting break. 4. Playing games: 3. Media: assign an intervention appropriate to the Personalization: a personalized suite of games will time and place. As an example, sound-emitting be important to choose the best-suited games to interventions may not be appropriate in an office. calm people down. 4. Habituation: all the interventions run the risk of Passive games: some participants mentioned that growing old. Having a suite of interventions that active games gets them stressed and/or ―hooked‖. changes over time, even for similar contexts, will Games with very simple tasks could be more be necessary to maintain the users’ interest. beneficial. Games deserve further study to define
the right characteristics to calm people down. Figure 4.9: Biometric signals comparison between Calm and Stress Design improvements: Some improvements to the stages Figure 9. Biometric Signals different interventions have been identified based on Long-term (real-life) usability study: As described by 4.7 IMPLIcomparisonCATIONS FOR between DESIGN Calm and Stress the qualitative data: Muñoz [7] to achieve long-term usability, self-help 4.7.1 Suite of Interventions and Context
Many interventions could coexist in a system, and its application should be based on four factors, all in relationship with context:
Volatility: assign an intervention to a situation that carries the lowest risk of becoming a stressor. As an example, breathing techniques 1703 may not be appropriate in a humid or hot climate. Interruption time: assign an intervention that has the adequate amount of interruption time. As an example, long interventions may not be appropriate during a meeting break. Media: assign an intervention appropriate to the time and place. As an example, sound-emitting interventions may not be appropriate in an office. Habituation: all the interventions run the risk of growing old. Having a suite of interventions that changes over time, even for similar 72
contexts, will be necessary to maintain the users’ interest.
4.7.2 Design improvements
Some improvements to the different interventions have been identified based on the qualitative data:
Social Networks:
Timely response: by increasing the number of contacts or by maintaining a message repository to be used when no contacts are available.
Help button: Allow the person to request help from their contacts.
Breathing:
Pressure feedback: may resemble the act of breathing better than vibration.
Training and adaptation: Gradually increasing speed and frequency of breathing could be useful especially for people not familiar with deep- breathing techniques.
Acupressure:
Training and adaptation: Some participants did not manage to wear the bracelet correctly, and others found it to be too ―novel‖. Some users mentioned that as time passed they felt more at ease.
Other acupressure points: Some users mentioned their interest to apply the device in other points such as neck, arms and legs.
Playing games:
Personalization: a personalized suite of games will be important to choose the best-suited games to calm people down.
Passive games: some participants mentioned that active games gets them stressed and/or ―hooked‖. Games with very simple tasks could be more beneficial. Games deserve further study to define the right characteristics to calm people down.
73
4.7.3 Long-term (real-life) usability
As described by Muñoz [100] to achieve long-term usability, self-help interventions should have: a rational (mental model), education/training phase, guaranteed usage in the real world and attribution of benefits to the tool. In the case of the different tools mentioned in this study we can obtain improvements in all these areas, however longitudinal and potentially ethnographic studies may be necessary to reach appropriate conclusions. Additionally, a longitudinal study will help gather real- life data to improve the way biometrics are used to infer stress levels, specially in situations where stress is either necessary to function, or in situations where patterns generated from normal physical or mental activity could be confused with stress patterns.
4.8 CONCLUSION
We are currently analyzing the biometric data to further add value to the selection of the most promising techniques. These techniques will be used in a larger experiment where biometric, subjective and psychometric data will be used to infer ambulatory stress and analyze potential interventions. Our study provided some design guidelines, as well as a perspective on promising interventions. Some of the key findings are:
Potential efficacy of mobile interventions: There is promise that mobile interventions can potentially calm people down in the earlier stages of stress. No significant difference between interventions: Further study is needed to find differences and/or to optimize intervention designs. Social networks leverage humor and intimacy: Intragroup virtual interaction can be leveraged to reduce individual perceived stress levels. High volatility: All interventions showed a degree of volatility (risk to become stressors) with context. Gaps between subjective and objective stress data: Further study of the discrepancies between subjective and biometric data could provide important information to improve the way mental states are inferred.
74
Chapter 5 Harvesting Web Interventions
Stress is considered to be a modern day “global epidemic"; so given the widespread nature of this problem, it would be beneficial if solutions that help people to learn how to cope better with stress were scalable beyond what individual or group therapies can provide today. Therefore, in this work, we study the potential of smart-phones as a pervasive medium to provide therapy for the general population - "popular therapy". The work melds two novel contributions: first, a micro-intervention authoring process that focuses on repurposing popular web applications as stress management interventions; and second, a machine-learning based intervention recommender system that learns how to match interventions to individuals and their temporal circumstances over time. After four weeks, participants in our user study reported higher self-awareness of stress, lower depression-related symptoms and having learned new simple ways to deal with stress. Furthermore, participants receiving the machine-learning recommendations without option to select different ones showed a tendency towards using more constructive coping behaviors.
75
5.1 INTRODUCTION
Stress is considered the modern-day killer epidemic [66]. Many physiological and mental disorders are associated with stress [5, 88]. Sadly, although many people (69%) recognize that stress is a big problem, only a small number (32%) actually know how to deal effectively with it [5]. Recent discoveries have shown the importance of coping with stress in a constructive way in order to reduce its damaging effects [130]. When undergoing stress, our body experiences a series of physiological changes, colloquially named the “fight or flight” response [123]. In the past, our ancestors living in the prairies relied on this mechanism to survive threats, such as being chased by an animal, or attacked by another tribe, etc. Modern humans are exposed to many psychological stressors - such as an imminent paper deadline, an interview, an important presentation, etc. However, often there is no practical way to “fight” or “flight” from these stressors. Coping constructively with such modern stressors is in many ways a skill we have to learn.
Several theories have driven the creation of effective therapeutic interventions [23, 33]. However, despite their efficacy, these interventions are not always efficient. Among several of the challenges, the interventions often suffer from two delivery problems: low adherence and low engagement rates. Research into how to improve these efficiency metrics is actively being pursued in the psychological community. In practical terms, the challenge to deliver effective interventions in real life can be summarized with the following question: how can we design the “right” intervention(s) to be delivered at the “right” time(s)?
In this chapter we focus on the first half of this challenge, i.e., “what” should a mobile app recommend when the user needs an intervention in any real life setting; leaving the “when” as future research. We conducted a study to verify three main questions:
Can we repurpose popular applications and web-sites as stress management micro-interventions?
Can the efficiency of such interventions be greatly improved by personalizing them to each individual and their context?
Can we gently move people’s stress coping tendencies from destructive to constructive ones over time?
We designed a system based on an adaptive “learn-by-doing” model that allowed us to verify these claims.
In the following sections we explain the current attempts used in the HCI community to create computerized mental health interventions. We explain 76
our novel intervention authoring process and the resulting group of micro- interventions derived from it; we present the details of the machine-learning (ML) system, its sensor inputs and algorithms to successfullymatch a user’s intervention request; we describe the mobile app we designed to record context data, enable an Experience Sampling Method (ESM) and deliver interventions; and we finally present study results and their implications for design and future research for recommender systems that leverage popular media.
5.2 BACKGROUND WORK
Contemporary research of technology for mental health has been mostly focused on sensing symptoms. Movements such as the Quantified Self and Wearable Computing are driving research focused on the development of new and adequate sensors that will enable clinical research. Much less research is focused on the delivery or the enablement of new therapies. We cite below a few examples of these studies, most of them extending current therapies and others exploring novel technologies.
5.2.1 Cognitive Behavioral Therapy (CBT) based technology
The most notable and successful use of technology for mental health is Computerized-CBT (CCBT) systems. The most relevant are: MoodGYM [98] FearFighter [39] and Beating the Blues [11]. These systems provide effective treatment even covered by health insurance in some countries. Another interesting effort is the use of gaming platforms for CBT enhancement [30]. Online technology has also been utilized for CBT-based smoking cessation [100], and recently for personalized CBT treatment [33]. An example of a mobile system leveraging CBT concepts is the PTSD coach [120]. This app teaches patients with Post Traumatic Stress Disorder (PTSD) skills to manage their anxiety or depressive episodes.
5.2.2 Stress Technology Intervention R&D
Specifically focused on stress, Paredes and Chang’s CalmMeNow [112] presents four mobile interventions: social messages, breathing exercises, mobile gaming and acupressure. Among the relevant findings was a confirmation that there is a fine line between an intervention being effective and actually becoming a stressor, if applied in the wrong context. MoodWings [81] explores a wearable biofeedback design that helps people become aware of their stress to help regulate it. Maybe not surprisingly, during a driving task, stress actually went up, but driving performance improved significantly. This underlines the importance of stress as a normal reaction when performing demanding tasks, and that it is not always advisable to reduce it, but to perhaps simply keep it under control. More recently, wearable devices have 77
been developed to help regulate breathing patterns and increase breathing mindfulness [99]. It is worthwhile mentioning also the existence of various meditation and breathing commercial applications that help people learn relaxation and mindfulness.
Unlike previous studies, the goal of this work is not to find “the best” intervention. Instead, we argue that there is not a one-size- fits-all intervention. Therefore, we present methods that allow the authoring of many interventions and matching them to individuals based on their personalities and current needs.
5.3 STRESS MANAGEMENT SYSTEM DESIGN
We designed and implemented an application for Windows Phone 8.1 and cloud based services to support the delivery of micro- interventions, providing recommendations on the interventions, collecting user feedback and collecting contextual information. In the following sub-sections we provide more details about its main components.
5.3.1 Micro-Intervention Authoring System
Design Objectives
Our intervention authoring system design objectives were two: 1) maximize engagement and 2) discover online activities that could be used by the general population to reduce stress.
1. Maximize Engagement: Engagement is a key component of therapy adoption. Eysenbach’s work on attrition science [38] explains the importance of understanding the differences of intervention adoption between traditional drug trials and eHealth systems. He proposes metrics that capture not only the intrinsic efficacy of the interventions, but also its usability efficiency. Additionally, Schueller’s research on personalized behavioral technology interventions has shown the importance for patients to choose their own interventions as one way to improve engagement [131]. Finally, Doherty describes four strategies for increased engagement in online mental health: interactivity, personalization, support and social technology [33].
2. Stress Reduction Online Activities: Psychology research has shed some light on people’s natural abilities to deal with stressful situations during daily life. Bonanno has studied people’s innate characteristics to recover from stressful situations [14], Lazarus has described the strategies people use to cope [75] and the field of positive psychology studies the ways people use their strengths to reduce the impact of stress [132]. However, people deal on a daily basis with stress using simple physiological and psychological activities, 78
such as breathing before reacting negatively, giving meaning to hardship, laughing, etc.
It is reasonable then to assume that the recent trend towards mobile apps should reveal people using these apps for stress reduction activities. Some apps do share some characteristics similar to psychotherapy interventions, such as the following: keep us distracted away from our problems, record personal progress, organize thoughts, socialize, etc. Given this observation of the online world, we decided that it was worthwhile to explore the use of web apps as proxies to psychotherapy micro- interventions.
Mapping the Design Space
Our design space can be mapped as the intersection of stress management psychotherapy and popular web apps. We started by mapping the most commonly used stress management psychotherapy approaches. Then we grouped these approaches into four categories: Positive Psychology, Cognitive Behavioral, Meta-cognitive and Somatic (see Table 5.1). We chose this classification based on two premises: a) it corresponded to a theoretical framework as a good approximation to therapeutic approaches and was accepted by clinical psychology collaborators and b) it was simple enough to be presented to users using friendly nametags. As mentioned earlier, socialization is an element associated with improved engagement [33]. Therefore we further sub-divided our four intervention groups into interventions that could be performed alone (individual) or with or for others (social). In parallel, we mapped the top web apps [140] and top (Windows Phone) apps [152] and games [153]. We chose best rating metric as a proxy for popular/engaging apps, i.e., those with high levels of adoption and user satisfaction.
Micro-Intervention Structure We wanted to design micro-interventions that followed some of the effective usability characteristics described by Olsen [111], i.e. a micro-intervention that could be designed by diverse design populations (i.e., psychologists, caretakers or even users), be used in combination, and scaled up easily. We boiled down the micro- intervention format to a minimal expression using only two components: a text prompt that tells the user what to do and a URL that launches the appropriate tool to execute the micro- intervention (see Table 5.1 for examples). Furthermore, we constrained the micro-interventions to be representative of one of the psychotherapy categories, and performed in a short time (approximately less than 3 minutes) to maximize usage scenarios.
79
+
+
+
+
+ : Psychology research has
of a Prompt plus a URL. dual and social groups. groups. social and dual
+
Engagement is a key component of
+ Shall we play a short game?” short a play we Shall " “Challenge yourself! Replace an unpleasant an Replace yourself! “Challenge these of some Try stretch! quick a for "Time “ : “Everyone has something they do really well... really do they something has “Everyone : : : : http://www.facebook.com/me/
: “Write down a stressful memory of another of memory stressful a down “Write “"Try to think what new perspective these news these perspective new what think to “"Try eat to want they when except hilarious are "Cats
intervention intervention format consists of a : "Learn about active constructive responding and responding constructive active about "Learn : : : : Micro-intervention Samples Micro-intervention - URL Prompt Prompt Prompt Prompt
+ : : : :
Prompt Prompt Prompt Prompt
: : : : http://www.magicappstore.com http://privnote.com http://www.shrib.com/ http://www.huffingtonpost.com/good-news/ http://m.pinterest.com/search/pins/?q=office stretch http://m.pinterest.com/search/pins/?q=office http://youtube.com/results?search_query=active+constructive http://m.pinterest.com/search/pins/?q=funny cats http://m.pinterest.com/search/pins/?q=funny
: : : : : : : are are subdivided into individual and social
Individual Social Individual Social Individual Social Individual Social find an example on your FB timeline that showcases one of your of one showcases that timeline FB your on example an find strengths.” practice with one person" one with practice it. destroy then and disappears, and away flies it imagine person, thought with two pleasant ones. Write the pleasant ones down.” ones pleasant the Write ones. pleasant two with thought URL minutes of few a for URL URL others." with them share and life your to bring URL friends." your to it show and these of few a out Check me. URL URL URL
• • • • • • • • Stress Reduction Online Activities are subdivided into indivi
intervention formatintervention consists collecting user feedback and collecting contextual information. In the following sub-sections we provide more details about its main components. System Authoring Micro-Intervention Objectives Design Our intervention authoring system design objectives were two: 1) maximize engagement and 2) discover online activities that could stress. reduce to population general the by used be 1) Maximize Engagement: therapy adoption. Eysenbach’s work explains on the attrition importance science intervention [10] adoption of between traditional drug understanding trials and eHealth the systems. He differences proposes metrics that of capture not only efficacy the intrinsic of the interventions, Additionally, but also Schueller’s its research usability technology interventions has on shown the efficiency. importance for patients to personalized behavioral choose their own interventions as one way to improve engagement [24]. Finally, Doherty describes engagement in online mental health: interactivity, personalization, four strategies for technologysocial and support [9]. increased 2) shed some light on people’s natural abilities to deal with stressful situations during daily life. Bonanno has studied people’s innate characteristics to recover from stressful situations [3], Lazarus has described the strategies people use to cope [13] and positive psychology the studies the ways field people use their of strengths to reduce the impact of stress [25]. However, people deal on a daily basis with stress using simple activities, physiological such and as psychological breathing before hardship, to meaning etc. laughing, reacting negatively, giving