
LEARNING BASED VISUAL ENGAGEMENT AND SELF-EFFICACY

by

SVATI DHAMIJA

B.Tech., Electronics and Communications Engineering, Punjab Technical University, India, 2007

M.Tech., Electronics and Communications Engineering, Guru Gobind Singh Indraprastha University, India, 2011

A dissertation submitted to the Graduate Faculty of the

University of Colorado Colorado Springs

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

Department of Computer Science

2018

This dissertation for the Doctor of Philosophy degree by

Svati Dhamija

has been approved for the

Department of Computer Science

by

Terrance E. Boult, Chair 1

Charles C. Benight 2

Manuel Günther 1

Rory A. Lewis 1

Jonathan Ventura 3

Walter J. Scheirer 4

Date: 12-03-2018

1 T. Boult, R. Lewis and M. Günther are with the University of Colorado Colorado Springs
2 C. Benight is with the Psychology Department at the University of Colorado Colorado Springs
3 J. Ventura is with California Polytechnic State University
4 W. Scheirer is with the University of Notre Dame

Dhamija, Svati (Ph.D., Computer Science)

Learning Based Visual Engagement and Self-Efficacy

Dissertation directed by El Pomar Professor and Chair Terrance E. Boult

Abstract

With the advancements in the fields of Affective Computing and Human-Computer Interaction, along with the exponential development of intelligent machines, the study of the human mind and behavior is emerging as an unrivaled mystery. A plethora of experiments are carried out each day to make machines capable of sensing subtle verbal and non-verbal human behaviors and understanding human needs in every sphere of life. Numerous applications, ranging from virtual assistants for online learning to socially-assistive robots for health care, are being designed. A frequent challenge in such applications is creating scalable systems that are tailored to an individual’s capabilities, preferences and response to interaction.

Moreover, digital interventions that are tailored to induce cognitive and behavioral changes in individuals are common, but they often lack tools for automated analysis and monitoring of user behavior while interacting with the application. The lack of automated monitoring is especially detrimental for societal problems like mental illnesses and disorders, where it is important to determine the psychological wellbeing of populations, to detect people at risk, and to personalize and deliver interventions that work in the real world. Prior research indicates that web-interventions have the potential to advance patient-centered mental health care delivery and can estimate patient involvement by recognizing human emotions and affect. Personalized computing systems with capabilities of perception and rational, intelligent behavior can also be developed by observing bio-physiological responses and facial expressions. Though such systems are slowly improving the prediction and detection of stress levels, they remain far from adapting to an individual’s specific needs for treatment and advising coping methods. Given the stigma associated with mental disorders, people may not always seek assistance. Engagement or disengagement with applications can therefore be an important indicator of an individual’s desire to be treated and whether they are interested in coping with their situation.

A scalable, adaptive, person-centered approach is therefore essential to non-invasively monitor user engagement and response to the application. In this work, we present a novel vision- and learning-based approach for web-based interventions with the use case of trauma-recovery. This thesis comprises my recent research, which develops contextual engagement, mood-aware engagement and multi-modal algorithms in a non-intrusive setting with EASE (Engagement, Arousal & Self-Efficacy), by calibrating Engagement from face videos and physiological Arousal leading towards a hidden state of Self-Efficacy.

Dedicated to my family.

Thank you for always being there for me.

Acknowledgements

First and foremost, I would like to express my sincerest regards to my advisor Dr. Terrance E. Boult. Without his guidance and supervision, this thesis would not have been possible. His unique ways of mentoring and vision of the big picture inspired me to think objectively and taught me the fundamentals of research. I would also like to extend gratitude to my committee members: Dr. Benight, Dr. Günther, Dr. Ventura, Dr. Scheirer and Dr. Lewis, for setting high expectations and propelling me in fruitful directions. I gratefully acknowledge the financial support of NSF Research Grant SCH-INT 1418520, "Learning and Sensory-based Modeling for Adaptive Web-Empowerment Trauma Treatment", for this work.

I am grateful to numerous VAST lab members who made the workplace fun and engaging: Archana, Abhijit, Chloe, Steve, Manuel, Lalit, Gauri, Dubari, Khang, Ankita, Dan R., Caleb, James, Ethan, Marc, Bill, Yousef, Adria and Akshay. I would also like to acknowledge the support of my fellow researchers from psychology, Kotaro, Carrie, Austin, Pamela, Amanda and Shaun, who worked alongside me through the tardy and laborious process of data collection. Special thanks to Ginger for the numerous adventures we had together, including ski trips, hikes, city tours, fun-filled dinners, board games, etc. She made Colorado Springs a home away from home. My encouraging friends Pooja, Reetu, Komal, Cherry, Radhika, Silvee, Himani and Archana make life happier in many ways despite the long distances and different time-zones that separate us.

Last but not least, I thank my parents and in-laws for being the source of inspiration and instilling me with courage. My grandparents for their unconditional love and blessings. Mumma and Papa for their countless sacrifices and prayers over the years. Thank you for keeping me grounded, making me aware of glass ceilings and trusting my abilities to break them. My brother, Akshay, for being my decisive and reassuring agony-uncle. My husband, Abhijit, for being my pillar of strength, friend and mentor. Thank you for braving this journey with me. Finally, I thank the almighty for watching over me through the activities of mind, senses and emotions.

Contents

1 Introduction 1

1.1 Why Engagement? ...... 2

1.2 Why Self-Efficacy? ...... 3

1.3 Background of application space: Trauma-Recovery ...... 3

1.4 Limitations of prior work ...... 5

1.5 Scope of this research ...... 7

1.6 Contributions from this thesis ...... 8

1.6.1 Dataset Collection ...... 8

1.6.2 Contextual Engagement Prediction from video ...... 9

1.6.3 Mood-aware Engagement Prediction ...... 9

1.6.4 Automated Action Units vs. Expert Raters ...... 9

1.6.5 Self-Efficacy prediction: Problem formulation ...... 10

1.6.6 Predicting Change in Self-Efficacy using EA features ...... 10

1.7 Publications from this work ...... 11

1.7.1 First author publications ...... 11

1.7.2 Other work ...... 11

2 EASE Dataset Collection 13

2.1 Introduction ...... 13

2.2 Related Work ...... 13

2.3 EASE Data Collection Setup ...... 15

2.4 Participants and Demographics ...... 16

2.5 The EASE space variables ...... 17

2.5.1 Engagement measures ...... 17

2.5.2 Arousal data ...... 19

2.5.3 Trauma-recovery outcomes and dependent variables ...... 20

2.6 Challenges ...... 21

3 Contextual Engagement Prediction from video 23

3.1 Introduction ...... 23

3.2 Related work ...... 26

3.3 Algorithms for Sequence Learning ...... 29

3.3.1 Recurrent Neural Network ...... 30

3.3.2 Gated Recurrent Units ...... 31

3.3.3 Long Short-Term Memory ...... 32

3.4 Experiments ...... 33

3.4.1 Facial action units extraction ...... 33

3.4.2 SVC model ...... 35

3.4.3 Sequence learning models and tuning ...... 35

3.4.4 Context-specific, Cross-context and Combined models ...... 36

3.4.5 Varying data-segment length ...... 36

3.5 Evaluation results ...... 38

3.5.1 Contextual engagement results ...... 38

3.5.2 Comparison results for sequence learning algorithms ...... 38

3.6 Discussion ...... 42

3.6.1 Performance of Contextual Engagement Models ...... 42

3.6.2 Incorporating Broader Range of Data ...... 44

4 Mood-Aware Engagement Prediction 46

4.1 Introduction ...... 46

4.2 Related Work ...... 49

4.3 Experiments ...... 50

4.3.1 Profile of Mood States (POMS) ...... 50

4.3.2 Facial action units extraction ...... 51

4.3.3 LSTM Engagement and Mood Models ...... 52

4.3.4 Cross-Validation and Train-test pipeline ...... 54

4.4 Evaluation Results ...... 56

4.5 Broader Impact ...... 57

4.5.1 Robust Mood and Mood-Aware Engagement Models ...... 59

5 Automated Action Units Vs. Expert Raters 61

5.1 Introduction ...... 61

5.2 Related Work ...... 63

5.3 Sequential learning model using RNN: automated, expert raters ...... 65

5.4 EASE Subset & Annotations ...... 66

5.5 Machine Learning Models ...... 68

5.6 Model Training/Tuning ...... 71

5.7 Results ...... 73

5.8 Discussion ...... 75

6 Self-Efficacy: Problem formulation and Computation 77

6.1 Introduction ...... 77

6.2 EASE Coping Self-Efficacy Measures ...... 79

6.3 Effectiveness of CSE in estimating PTSD symptom severity ...... 80

6.4 Observations from related work ...... 84

6.4.1 Engagement and Self-Efficacy ...... 86

6.4.2 Heart Rate Variability and Self-Efficacy ...... 87

6.5 Defining CSE as a Machine Learning problem ...... 89

6.6 CSE Prediction Pipeline ...... 90

6.7 EASE subset and Data distribution ...... 91

6.8 Experiments ...... 92

6.9 Results & Discussion ...... 94

6.10 Broader Impact ...... 96

7 Conclusion and Insights 98

Appendices 101

A Psychometric Tests & Questionnaires 102

A.1 Profile of Mood States-Short Form : EASE version ...... 102

A.2 Coping Self-Efficacy for Trauma ...... 104

A.3 PTSD Checklist (PCL-5 Questionnaire) ...... 104

B Institutional Review Board Approval 107

B.1 Certificate: Conduct of Research for Engineers ...... 108

B.2 Certificate: Human, Social and Behavioral Research ...... 109

List of Figures

2.1 Setup for EASE data collection ...... 15

2.2 Self reported engagement ...... 18

2.3 EASE Data Distribution ...... 19

2.4 Engagement annotation from CARMA ...... 20

3.1 Contextual Engagement Prediction Pipeline ...... 24

3.2 Action Units from OpenFace ...... 34

3.3 AU signal inputs for engagement prediction ...... 37

3.4 Confusion matrices from engagement models ...... 39

3.5 Comparison of RNN, LSTM and GRU boxplots ...... 43

4.1 Automated Mood Prediction for Mood-Aware Context Sensitive Engagement Prediction . . 47

4.2 Changes in subject moods pre/post module in each session ...... 52

4.3 Video segment selection ...... 53

4.4 Mood-aware models Convergence Plot ...... 54

4.5 Automated Mood Predictors ...... 58

5.1 Machine learning models that estimate user engagement ...... 63

5.2 Engagement annotations from external observations ...... 67

5.3 Automated AUs, rater annotations and subject self-reports ...... 69

5.4 Machine Learning Models using RNNs ...... 71

5.5 Model tuning for FAVE ...... 72

6.1 Self-Reports collected in EASE: ...... 81

6.2 CSE as a predictor of self-reported engagement ...... 86

6.3 Changes in CSE with HRV ...... 88

6.4 CSE prediction system ...... 90

6.5 EASE subset details ...... 92

6.6 Data distribution of CSEt ...... 92

6.7 Preprocessing of signals ...... 94

6.8 Convergence plot for CSEt estimation ...... 95

7.1 System Diagram for Trauma Symptom Severity Prediction ...... 100

List of Tables

2.1 EASE dataset content summary ...... 17

2.2 PTSD specific variables and measurements ...... 21

3.1 Engagement prediction accuracy ...... 36

3.2 Contextual and Cross-contextual engagement prediction results ...... 38

3.3 Performance results RNN, LSTM and GRU ...... 40

3.4 Results of different time-windows ...... 40

4.1 Effect of POMS on contextual engagement prediction ...... 56

4.2 Mood-aware engagement results ...... 56

5.1 Data distribution from EASE subset ...... 68

5.2 Results from RA, RAV, AU and FAVE models ...... 75

6.1 MSEs comparison of trauma-severity models at a global-level ...... 82

6.2 MSEs comparison of trauma-severity models at a symptom-level ...... 83

6.3 Cross-validation results of CSEt estimation ...... 95

Chapter 1

Introduction

The popularity of smart devices has led many people to turn to the web for instant answers and advice. The internet is filled with numerous self-help resources and clinically approved applications on health-related topics, ranging from managing weight-loss 1 to coping with mental stress 2. Although such applications are easy to access and significantly reduce health-care costs [33, 183], they are built on generic models designed for specific treatment outcomes and do not take into account the mental state of users. Moreover, such applications or web-interventions measure treatment effects from standard psychometric questionnaires, answered by users themselves. Even though questionnaires are simple and inexpensive, they distract the user from the task at hand or increase cognitive load. Furthermore, answering a long list of questions, i.e., providing self-reports, is problematic and intrusive, especially for people suffering from mental disorders whose symptoms include, but are not limited to, reduced ability to concentrate, trouble understanding and frequent mood swings.

In this thesis, we create a machine-learning based pipeline to continuously monitor changes in user mental state without affecting user experience. We employ a series of signal processing, computer vision and machine learning techniques on video and sensory data collected for applications in Trauma-Recovery, elaborated in Section 1.3. Throughout the thesis, we focus on critical elements of web-interventions, namely "Engagement" and "Self-Efficacy", that are relevant to most self-help/self-regulated applications.

1 https://www.dashforhealth.com/
2 https://thiswayup.org.au/

The layout of this document is as follows: in Chapter 1 we provide a brief overview of the motivation for this research, background on trauma-recovery and the scope of the technical problem. Chapter 2 presents the dataset that is used for all experiments in this thesis. Chapters 3, 4 and 5 dig deeper into the problem of continuous engagement prediction along with the solutions developed to date. Then, in Chapter 6, we build on the well-founded Social Cognitive Theory (SCT) to define the concept of Self-Efficacy and formulate it as a machine-learning problem, along with implementing and evaluating a computational model of self-efficacy. Chapter 7 summarizes the research work of this thesis, its shortcomings and future directions. Appendix A comprises the psychometric questionnaires used in this work and, lastly, Appendix B includes the mandatory certifications for research on human subjects.

1.1 Why Engagement?

Engagement is an indispensable part of user experience and interaction with applications. It is abstract, complex and changes over time. Self-care websites often suffer from disengagement issues and lack the tools to measure user engagement in a reliable and automated manner [161, 76, 133]. A significant section of the mental-health community is trying to determine and enhance user-specific engagement [104, 84, 204].

Moreover, lack of engagement is often associated with increased dropout rates [64, 103, 152]. Web-based interventions, therefore, must not only satisfy the needs of a patient but also be adequately engaging to increase patient retention.

Interventions should also have reliable methods of forecasting user engagement and be capable of adaptation based on user engagement levels. Continuous prediction of engagement can help create adaptive websites, maximize outcomes and reduce dropouts. Continuous engagement prediction can have far reaching effects in not just web-based mental health intervention techniques, but also in online student learning [58, 141], intelligent student tutoring [73] and creating socially intelligent human-robots [126].

1.2 Why Self-Efficacy?

The term Self-Efficacy was coined by Bandura in 1977 [12], with his publication of “Self-efficacy: Toward a Unifying Theory of Behavioral Change”. Since then, self-efficacy has been a well-known construct in psychology, with a widespread impact on psychological states, behavior, motivation, etc.

Self-Efficacy has been shown to have strong predictive value for many types of activities, including health [163], stress [140], commitment [5], leadership [195] and education [213], because it emphasizes important values such as achievement and performance [147]. State-of-the-art treatment in trauma recovery is developed from SCT [26]. We seek to build machine learning techniques on this proven theory to automate efforts in trauma treatment.

Efficacy beliefs can determine whether people will invest effort and how long they will persist in that effort in the face of obstacles and aversive experiences. People who have low self-efficacy have low self-belief and weak commitment to the goals they choose to pursue. When faced with difficult tasks, they dwell on their personal deficiencies, on the obstacles they will encounter, and on all kinds of adverse outcomes rather than concentrating on how to perform successfully [18]. They slacken their efforts and give up quickly in the face of difficulties. On the contrary, people who believe they can exercise control over potential threats do not conjure up apprehensive thoughts and hence are not distressed by them [22]. The detailed construct of self-efficacy is elaborated in Chapter 6.

1.3 Background of application space: Trauma-Recovery

Mental trauma following disasters, military service, accidents, domestic violence and other traumatic events is often associated with adversarial symptoms like avoidance of treatment, mood disorders, and cognitive impairments. Each year, over 3 million people in the United States are affected by post-traumatic stress, a chronic mental disorder. Lack of treatment for serious mental health illnesses costs $193.2 billion annually in lost earnings [111]. Providing proactive, scalable and cost-effective web-based treatments to traumatized people is, therefore, a problem with significant societal impact [32]. Though self-help websites for trauma-recovery exist, they are usually generic, based on one-size-fits-all models, instead of being person-specific.

Interventions are required that can be used repeatedly without losing their therapeutic power and that can reach people even when local health care systems do not provide the needed care, recommend costly procedures, or are simply unapproachable. Such interventions require adaptive models that are tailored to individual needs and can be shared globally without taking resources away from the populations where the interventions were developed. Research suggests that personalization and automated adaptation in self-help websites can positively aid people with mental health issues and advance mental health computing [40].

In this section, before we proceed to explain the specifics of Engagement and Self-Efficacy for trauma-recovery, we first introduce Social Cognitive Theory (SCT), which is the base of the trauma-recovery website’s framework. SCT was introduced by renowned psychologist Albert Bandura [16] and developed further by a number of researchers such as Charles Benight, Aleksandra Luszczynska [26, 142] and others. SCT is an interpersonal-level theory that emphasizes the dynamic interaction between people, their behavior and their environments [92]. SCT prescribes mastery as the principal means of personality change [18]. Besides trauma-recovery, SCT has also been widely applied to various fields, like learning computer skills [61], media attendance [131], improving academic performance [132], gifted education [37] and health promotion [20].

One special application/web-intervention of SCT that is available to us for research was designed for the treatment of trauma (http://ease.vast.uccs.edu/). The determinants of this website were based on a study that applied multiple controls for diverse sets of potential contributors to posttraumatic recovery. In these different multivariate analyses, perceived Coping Self-Efficacy (CSE), explained in Chapter 6, emerged as a focal mediator of posttraumatic recovery [26]. This finding of CSE being critical to trauma-recovery makes it generalizable for trauma treatments irrespective of the kind of trauma, whether from natural disasters, technological catastrophes, terrorist attacks, military combat, or sexual or criminal assaults.

Social learning analysis of SCT also suggests that expectations of personal efficacy are based on four major sources of information: performance accomplishments, vicarious experience, verbal persuasion and physiological states [18]. Amongst these four sources, physiological states can be monitored through body-sensors and are found to be predictive of stress control/arousal. Arousal, although not the focus of this thesis, is a fundamental component of changes in coping self-efficacy for post-traumatic stress [30].

Arousal mechanisms are exciting and important to understand because at the deepest level they impact all human behaviors [173]. The pattern of arousal defines physiological toughness and, in interaction with psychological coping, corresponds with positive performance in even complex tasks, with emotional stability, and with immune system enhancement [70].

Unfortunately, measuring some form of arousal from a video feed proved to be ineffective with the captured videos of users during trauma-web sessions. Moreover, visual estimates of arousal are challenging and lack the precision of arousal measures from body-sensors [107]. Since quantifying self-efficacy through a video feed has not been explored before, the work in Chapter 6 will be the first machine-learning approach to fuse visual and physiological data to predict changes in self-efficacy. As we show that visual and physiological sensor data can predict changes in self-efficacy, a future possibility is to explore robust vision-based techniques for physiological measurements [214, 136].

Change in CSE is key for trauma-recovery, as it promotes engagement and positive adaptation. Since no two people interact with the same content in the same way, engaged users have distinctive approaches to trauma web-interventions that are based on internal self-regulation of emotional and physical distress. Building a smart system that reduces the interaction overhead of individuals by combining sensing and machine learning to predict user mental states of perceived self-efficacy and engagement offers the first steps towards creating an adaptive web-based trauma-recovery intervention.

1.4 Limitations of prior work

A recent review on web-based interventions in psychiatry outlines the challenges in replacing face-to-face interactions with web-based interventions, like the unavailability of non-verbal cues, infrequent responses, disinhibition, and patient emotion/role management [153, 164]. In face-to-face psychotherapy sessions, psychologists monitor a series of complex processes such as spontaneous facial expressions, emotions, mood swings, engagement and other behaviors when working with patients undergoing treatment.

With recent advances in the field of affective computing, it is now possible to recognize facial expressions, detect spontaneous emotions and infer affective states of an individual. Computer vision and deep learning have made it possible to assess the current emotional state of an individual with a webcam [71, 24, 121].

Contrary to self-reports, analyzing facial expressions is non-intrusive and does not interrupt user experience.

However, while much has been done in the space of the six basic expressions (happiness, sadness, anger, disgust, fear, surprise) and the facial action units associated with them [118], very little has been done to create face-based engagement models. Most affective studies have elicited emotional responses via an external stimulus [102, 201, 127] and focussed on the two basic dimensions of valence and arousal [174]; prototypical expressions of basic emotions occur relatively infrequently in natural interaction and even less so while people operate on a web-based intervention. This makes modeling of facial expressions for engagement prediction during web-based interventions a challenging task. Moreover, existing engagement models in affective computing are confined to dyadic human-robot interactions [82], social interactions [218] and student learning [159, 34], and cannot be assumed to be applicable directly to trauma-recovery. Apart from face-based methods, non-visual parameters have also been used to measure engagement in student learning, like analyzing website logs [57], time on task [129], body postures [75], etc. However, interactions with a web-application change over time and are composed of a series of concurrent and sequential dynamics. People with mental-health issues are not necessarily as compliant as students are in online learning. A continuous and dynamic representation of how the engagement level of a user changes over time, obtained by monitoring their facial expressions, has the potential of giving insights about user mental state that are impossible to derive from non-face-based solutions.

As mentioned earlier, self-efficacy has proven effective in diverse domains, and the current gold-standard to measure self-efficacy is via questionnaires, which is an infrequent and intrusive measurement. The most recent study that targeted user engagement and changes in self-efficacy for depression was done by Morris et al. using a crowd-sourced application of peer support networks on the Panoply platform [160]. The aim of the Panoply project was to create an evidence-based application to enhance user engagement, but it lacked the availability of any visual information of users. Morris et al. hypothesized that repeated use of the platform would reduce depression symptoms and that the social, interactive design would promote engagement. However, there were no depression status criteria for participant inclusion and the app was advertised as a stress reduction app, which raises concerns about the generalizability of treatment effects. Other methods for evidence-based interventions include creating virtual agents that empathize with subjects; however, to date, there is no evidence that these empathetic virtual coaches work better than a neutral virtual coach [192]. To summarize, most web-based applications assume changes in efficacy as an intermediate effect of treatment and do not take into account users' affective or cognitive state. In our research, we aim to predict self-efficacy changes while monitoring facial actions and physiological responses. Learning-based self-efficacy prediction models have never been developed to date, and the exploratory approach on an evidence-based website for trauma-recovery, with models built on real data from trauma subjects, will guarantee generalizability for mental-health interventions.

According to social psychology, human experience is an intertwined outcome of behavior (interactions), cognition (thoughts) and affect (feelings) [116]. Behavior and affect are generally detected through a series of facial expressions, gestures, body movements, speech, and other physiological signals, such as heart rate, respiration, sweat, etc. The purpose of this research is to explore a machine learning based approach towards analyzing and predicting cognitive human behavior through behavioral modeling from face videos. Based on

SCT, we target a closed space of EASE (Engagement, Arousal and Self Efficacy) for the proposed research.

1.5 Scope of this research

The aim of the research presented in this thesis is to develop and analyze machine learning techniques for building automated models of engagement and self-efficacy. We develop algorithms that use both voluntary and involuntary feedback encapsulating the interactions of brain and body in a non-intrusive setting, in order to monitor coping capabilities and engagement levels. We build on the well-established social-cognitive theory, where arousal and self-efficacy are critical elements of self-regulation. We measure engagement, which is critical in self-directed web-based treatment. However, the goal of this thesis is not to invent a new psychology theory that claims to solve the problem of web-interventions, nor to modify SCT. Rather, the goal of the thesis is to address the issues faced by self-help applications, identify the problems in building a learning based model, and propose and evaluate solutions. As mentioned earlier, self-reports are limited by the frequency at which the user can be asked for a response without significantly annoying them. Self-reports are impossible to collect on a per-second basis. Through this work we aim to formulate the first steps towards an adaptive, learning-based, self-regulated intervention applied to the domain of trauma-recovery. Listed below are the research questions that drive our work:

1. How do we reduce arduous psychometric measures and create machine learning models to automatically estimate user response?

2. Can self-efficacy be predicted from videos or physiological data or both?

3. Is engagement or change in self-efficacy dependent on the task at hand?

4. How do we integrate a priori information or meta-data about the user to predict mental state?

5. What features should we use to predict engagement from video?

6. How do we evaluate the created models for generalizability?

1.6 Contributions from this thesis

1.6.1 Dataset Collection

One of the contributions of this thesis is the collection, organization and sharing of data for behavioral research, discussed in Chapter 2. It is termed the Engagement, Arousal and Self-Efficacy (EASE) dataset and consists of data gathered from 110 subjects. The duration of recordings is about one hour per session. The video recordings in the EASE dataset for all subjects are taken at multiple visits with 1-2 weeks between each recording. Subjects provide self-reports for engagement, arousal, self-efficacy, mood, depression and other measurements (see Table 2.2). Continuous annotations cover 864K seconds of video, considering a minimum of 80 subjects for 3 sessions of 1 hour each. Affect/behavioral research datasets typically consist of approximately 25-40 subjects with recordings lasting 3-15 minutes each. Thus, the collected EASE dataset is a significantly larger effort compared to the current state-of-the-art in the field (see Chapter 2 for details). The EASE dataset will be publicly released to facilitate further development in this area 3.

1.6.2 Contextual Engagement Prediction from video

A wide range of research has used face data to estimate a person’s engagement, in applications from advertising to student learning. An interesting and important question not addressed in prior work is whether face-based models of engagement are generalizable and context-free, or whether engagement models depend on context and task. We show that engagement in a given situation can be predicted through subtle facial movements and fine-grained actions such as changes in facial expressions. The proposed prediction model incorporates context and temporal variations of input features. This study is the first of its kind to define engagement as context-sensitive. In Chapter 3, we describe the problem of engagement prediction from video in detail and conduct thorough experiments to verify our claims.

1.6.3 Mood-aware Engagement Prediction

We conjecture that a subject’s current mood might have an effect on their engagement level with the task at hand. We verify our claim empirically in Chapter 4. While a significant amount of work in recent years has focussed on advancing machine learning techniques for affect recognition and affect classification, the prediction of mood from facial analysis and the usage of mood data have received less attention. Questionnaires for psychometric measurement of mood-states are common, but using them during interventions that target the psychological well-being of people is arduous and may burden an already troubled population. In Chapter 4, we present mood prediction as a sequence learning problem and use the predicted mood-scores to aid our contextual engagement pipeline.

1.6.4 Automated Action Units vs. Expert Raters

Several data collection techniques have been explored in the past that capture different modalities for engagement, from obtaining self-reports to gathering external observations via crowd-sourcing or even trained expert raters.

3 We have IRB approval to release only unidentifiable data, including Action Units extracted from facial videos. We cannot display identifiable data, including video frames from the dataset.

Traditional machine learning approaches discard annotations from inconsistent raters, use rater averages or apply rater-specific weighting schemes. Such approaches often end up throwing away expensive annotations.

We introduce a novel approach in Chapter 5 that exploits the inherent confusion and disagreement in raters' annotations to build a scalable engagement estimation model that learns to appropriately weigh subjective behavioral cues. We show that actively modeling the uncertainty, either explicitly from expert raters or from automated estimation with facial action units, significantly improves prediction over prediction from just the average engagement ratings. Our approach performs significantly better than or on par with experts in predicting engagement for the trauma-recovery application.

1.6.5 Self-Efficacy prediction: Problem formulation

Benight et al. reviewed substantial empirical evidence on the role of perceived Coping Self-Efficacy (CSE) in recovery from traumatic experiences within the framework of SCT [26]. In psychology studies, CSE [32] is quantified from individual responses to questionnaires that assess users' perception of their own capabilities to cope with the situational demands of trauma-recovery. We lay the foundations of a machine learning model using the EASE dataset for self-efficacy prediction by evaluating two things: (1) the effect of CSE changes on symptom severity and (2) the relationship of engagement and arousal to self-efficacy. The findings of the aforementioned evaluation strengthen the basis for feature selection, or predictors, in our learning-based self-efficacy model.

1.6.6 Predicting Change in Self-Efficacy using EA features

Neither the absolute value of self-efficacy nor the changes in self-efficacy are directly observable from visual inferences. In Chapter 6 we implement and evaluate a learning technique that can predict self-efficacy using the pre-training self-efficacy value along with derived predictions of engagement from visual data and physiological arousal. To the author's knowledge, this is the first attempt to infer changes in self-efficacy, during treatment, from visual data, physiological signals and infrequent questionnaires.

1.7 Publications from this work

1.7.1 First author publications

1. S. Dhamija, T. Boult, "Exploring Contextual Engagement for Trauma Recovery", Workshop on Deep Affective Learning and Context Modelling (DALCOM), CVPR 2017.

2. S. Dhamija, T. Boult, "Automated Mood-aware Engagement Prediction", Affective Computing and Intelligent Interaction (ACII) 2017.

3. S. Dhamija, "Learning based Visual Engagement and Self-Efficacy", Affective Computing and Intelligent Interaction (ACII) 2017, Doctoral Consortium.

4. S. Dhamija, T. Boult, "Automated Action Units Vs. Expert Raters: Face off", IEEE Winter Conference on Applications of Computer Vision (WACV) 2018.

5. S. Dhamija, T. Boult, "Learning Visual Engagement for Trauma Recovery", Workshop on Computer Vision for Active and Assisted Living (CV-AAL), WACV 2018.

1.7.2 Other work

6. A. M. Ragolta, S. Dhamija, T. Boult, "A Multimodal approach for predicting changes in PTSD symptom severity", ACM International Conference on Multimodal Interaction (ICMI) 2018.

7. K. Shoji, A. Devane, S. Dhamija, T. Boult, C. C. Benight, "Recurrence of Heart Rate and Skin Conductance During a Web-Based Intervention for Trauma Survivors", International Society for Traumatic Stress Studies (ISTSS), 2017.

8. C. C. Benight, K. Shoji, C. Yeager, A. Mullings, S. Dhamija and T. Boult, "The importance of self-appraisals of coping capability in predicting engagement in a web intervention for trauma", International Society for Research on Internet Interventions (ISRII), 8th Scientific Meeting, 2016.

9. C. C. Benight, K. Shoji, C. Yeager, A. Mullings, S. Dhamija, T. Boult, "Changes Self-Appraisal and Mood Utilizing a Web-based Recovery System on Posttraumatic Stress Symptoms: A Laboratory Experiment", International Society for Traumatic Stress Studies (ISTSS), 2016.

10. K. Shoji, C. C. Benight, A. Mullings, C. Yeager, S. Dhamija, T. Boult, "Measuring Engagement into the Web Intervention by the Quality of Voice", International Society for Research on Internet Interventions (ISRII), 2016.

Chapter 2

EASE Dataset Collection

2.1 Introduction

In this chapter we present the EASE (Engagement, Arousal and Self-Efficacy) dataset that is used for all experiments in Chapters 3, 4, 5 and 6. Before diving into the data collection procedure, we first provide a brief overview of the existing engagement and arousal datasets in Section 2.2. In the sections following related work, we elaborate on the EASE space variable measures and the challenges faced during data collection.

2.2 Related Work

Behavioral and physiological arousal has been studied extensively in the affective computing domain [102, 85]. There are numerous datasets which elicit emotional responses and tag videos with arousal and valence information [201, 127, 128]. However, such datasets have no measures of engagement and self-efficacy. For example, the RECOLA multimodal database was designed with the objective of developing a computer-mediated communication tool and studying remote collaborative and affective interactions. The database consists of 9.5 hours of audio, visual, and physiological (electrocardiogram and electrodermal activity) recordings of online dyadic interactions between 46 French-speaking participants, who were solving a task in collaboration [180]. The data focussed on annotations for valence and arousal for each participant and did not include explicit labels to study engagement and self-efficacy of the subjects.

The largest available dataset for student engagement currently comes from Massive Open Online Courses (MOOCs), which is collected from online student-learning websites like Coursera, Khan Academy, etc. The drawback of MOOCs, however, is the unavailability of student face videos [101]. In order to study engagement prediction for student learning, Kamath et al. proposed a dataset inspired by MOOCs, consisting of videos of students viewing an online lecture. The dataset used by the authors for their experiments [118], however, did not include the entire videos but only very few frames, which were annotated for engagement via crowdsourcing. The raters were asked to rate student engagement level as one of three: Not engaged, Nominally engaged and Very Engaged. Since the data did not include video, but individual image frames, it is of limited use for modeling changes in interactions over time. Furthermore, the total number of annotated frames was only 4500, which is significantly less than needed for building deep-learning models [135, 52].

The MMDB dataset was proposed by Tsatsoulis et al. to study the interactions between a trained examiner and a child aged 15-30 months [212]. The dataset consists of audio/video recordings of individual sessions along with ratings of engagement and responsiveness at the substage level. The ratings consisted of frame-level, continuous annotation of relevant child behaviors. While the dataset is similar in spirit for studying face-based engagement predictions, it does not contain any data corresponding to self-efficacy. Furthermore, the children are not expected to perform a specific task, but merely interact with the trained examiner.

Due to the lack of standardized large-scale datasets [175] and engagement data confined to classroom experiments [227, 158], there is a need to collect datasets with efficacy and engagement measures. In this chapter, we explain the data collection setup for the EASE space and the various measures.



Figure 2.1: Setup for EASE data collection: Subjects interacted with the website while performing self-regulation exercises. Face data was captured using an external webcam; voice data was captured using a microphone. Additional data such as skin-conductance, respiration, and ECG signals were also recorded using wearable sensors. All the interactions were recorded using an external camera.

2.3 EASE Data Collection Setup

Based on SCT, “My Trauma Recovery” website developed by Dr.Benight and others [27] has divided trauma recovery into six modules i.e., relaxation, triggers, social-support, self-talk, unhelpful coping and professional help. The goal of the website is to reduce hospitalization rates and health care costs by providing free self-guided trauma-recovery treatment.

There are 6 modules on the My Trauma Recovery website. Each participant was assigned two out of the six modules in each visit. The first two visits were restricted to the Relaxation and Triggers modules only, and in the third visit, the participants were free to choose from the remaining four modules. In the first visit, subjects were randomly allocated Relaxation or Triggers as the first module, with the reverse order during the second visit. At the beginning of each session, the subjects watch a neutral introductory video for 2 minutes and 32 seconds, after which they are asked to close their eyes and relax for one minute. The physiological data measurements during this duration of quiet-time are considered as the baseline for the subject. The relaxation module presents the user with video demonstrations of various exercises like breathing, progressive muscle relaxation, etc. The triggers module educates the user about trauma symptoms and prevention. Each visit (from here on referred to as a session) lasted approximately 30 minutes to 1.5 hours.
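As an illustration of how such a quiet-time baseline can be used, the sketch below expresses a physiological signal relative to the subject's own baseline statistics. The normalization choice (z-scoring) and the synthetic data are assumptions made for this example and are not the exact preprocessing used in this thesis.

```python
# Illustrative sketch (assumed convention, not the thesis' exact preprocessing): express a
# physiological signal relative to the subject's one-minute quiet-time baseline, so that
# later arousal features are person-specific rather than absolute.
import numpy as np

def baseline_normalize(signal, baseline_segment):
    """Z-score a signal using the mean/std of the subject's quiet-time baseline."""
    mu, sigma = baseline_segment.mean(), baseline_segment.std()
    return (signal - mu) / (sigma + 1e-8)   # small epsilon guards against a flat baseline

# Example with synthetic skin-conductance values sampled at 256 Hz.
quiet_time = np.random.rand(60 * 256)         # one minute of baseline samples
module_signal = np.random.rand(30 * 60 * 256) # a 30-minute module
normalized = baseline_normalize(module_signal, quiet_time)
```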

During these sessions, a Logitech webcam with a resolution of 640x480 at 30 fps recorded the participant's face video along with audio. Physiological data was also recorded for the entire session, as shown in Figure 2.1. The participants could freely interact with the trauma recovery website, and their interactions were recorded in the form of a Picture-in-Picture video using a Camtasia recorder (with screen and webcam recording simultaneously). During the module, participants provided self-reports about their engagement level, coping capabilities, moods, etc.

2.4 Participants and Demographics

Subjects with potential Post-Traumatic Stress Disorder (PTSD) were recruited from three Colorado Springs health service providers: the TESSA domestic violence center; Peak Vista Community Health Partners, which has 21 centers for seniors, the homeless, a women's health center and six family health centers; and the Veterans Health and Trauma Clinic, a clinic that provides services to veterans, active duty service members, and their families who are struggling with combat trauma; as well as the student online research system (SONA). The subject inclusion criteria were as follows [28]:

1. Having experienced at least one traumatic event in the last 24 months based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-V A) [9] criteria.

2. Be over 18 years of age.

Currently, we have collected data for 110 subjects, of which 86 subjects completed all 3 sessions. We have 10 control subjects, and data collection is ongoing to add more control subjects to the set. The participants are in the age range of 18-79 years, with 80% of the population aged 46 or younger.

The demographic distribution of the participants is shown in Table 2.1.

Participants and Modalities
# Participants: Total 110 (88 Female, 17 Male, 5 did not specify)
Session 1: Group 1 (65 subjects): Module 1 Triggers, Module 2 Relaxation; Group 2 (45 subjects): Module 1 Relaxation, Module 2 Triggers
Session 2: Group 1 (65 subjects): Module 1 Relaxation, Module 2 Triggers; Group 2 (45 subjects): Module 1 Triggers, Module 2 Relaxation
Session 3: Module 1: 25 Unhelpful Coping, 24 Self Talk, 24 Social Support, 27 Professional Help; Module 2: 25 Unhelpful Coping, 22 Self Talk, 30 Social Support, 23 Professional Help
Recorded Signals: ElectroCardioGraph (ECG) at 256 Hz; Skin Conductance at 256 Hz; Respiration at 256 Hz; Face Video, Audio and Monitor at 30 fps
Ethnicity: 7 American Indian or Alaskan Native; 6 Asian or Pacific Islander; 15 Black or African American; 12 Hispanic or Latino; 77 White or Caucasian; 1 Caribbean Islander

Table 2.1: EASE dataset content summary: The table above shows the demographics of participants and the distribution of modules taken by them in each session.

2.5 The EASE space variables

Engagement information is collected in two ways: subjects' self-assessments and annotated data from psychologists. Arousal is gathered through physiological signals and behavioral data annotations. Standard questionnaires are used to evaluate treatment effects on the user.

2.5.1 Engagement measures

There were two types of engagement responses collected through this experiment: data from self-reports and external annotations.

Real-time self-reports were collected while the subject worked on the trauma recovery module. The original website was modified to include an engagement question at specific pages within the module. Subjects were asked to enter their response to a question about their level of engagement on a scale of 1-5, with 1 being the lowest and 5 the highest, as shown in Figure 2.2. The self-reports appear approximately three times within each module. The website's SQL database stores the user response with the current date, timestamp, module details and page information.

Figure 2.2: Self reported engagement: During the experiment the subjects are presented with a screen asking them about their engagement level on a scale of 1-5, where 1 is “Very Disengaged” and 5 “Very Engaged”. The user is not allowed to proceed to the next page on the website unless the question has been answered. The response is stored in a SQL database with timestamp.

Figure 2.3 shows the details about the data. As mentioned earlier, each participant came in for three visits. The first two (controlled) visits are used for all experiments in this work. In each visit, the subjects undergo the relaxation and triggers tasks. While there were more subjects, some dropped out during the study. Some video segments could not be used due to synchronization issues and missing markers (refer to Section 2.6 for more details). Hence, for the first session we have data from 95 subjects, and from the second session we use data from 80 subjects. Some of the collected data was unusable due to either system issues, data corruption or a lack of engagement self-reports.

After the sessions, trained psychologists annotate the videos using the CARMA (Continuous Affect Rating and Media Annotation) software [90] on a scale of -100 to +100. Each video is rated by 3 different raters based on visual cues, as shown in Figure 2.4. A rater pool of three is used to allow inter-rater and intra-class agreement measures for more reliable values. The per-subject ratings are stored in comma-separated (CSV) files with per-second annotations, along with the time of rating and the video filename. Per-second annotation is a costly process which includes the training of psychologists and the actual rating time. The majority of the relaxation videos for each session were around 30-45 minutes; therefore, to reduce rating costs, we segmented each original video longer than 12 minutes into 4-minute segments from the start, middle and end of the module for CARMA annotations. Segmenting relaxation videos into 3 segments ensured minimal loss of valuable information and reduced time and costs.
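To make the inter-rater variability mentioned here concrete, the sketch below shows one simple way such agreement could be checked. It is an illustrative example only, not the rating pipeline used in this work: the column names (rater1-rater3) and the synthetic data stand in for the per-second CARMA exports described above.

```python
# Illustrative sketch (hypothetical column names, not the actual CARMA export schema):
# check inter-rater agreement on per-second engagement ratings via pairwise Pearson r.
from itertools import combinations

import numpy as np
import pandas as pd

def pairwise_rater_agreement(ratings):
    """Mean pairwise Pearson correlation across the rater columns of a DataFrame."""
    rater_cols = [c for c in ratings.columns if c.startswith("rater")]
    corrs = {(a, b): ratings[a].corr(ratings[b]) for a, b in combinations(rater_cols, 2)}
    return corrs, float(np.mean(list(corrs.values())))

# In practice the per-second ratings would be loaded from the CSV files described above,
# e.g. pd.read_csv("subject01_relax_seg1.csv"); here we use 240 seconds of synthetic data.
rng = np.random.default_rng(0)
base = rng.normal(0, 40, 240)  # a shared underlying engagement trace on the -100..+100 scale
ratings = pd.DataFrame({f"rater{i}": np.clip(base + rng.normal(0, 25, 240), -100, 100)
                        for i in (1, 2, 3)})
pair_corrs, mean_r = pairwise_rater_agreement(ratings)
print(pair_corrs, mean_r)
```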

Figure 2.3: EASE Data Distribution: This figure displays information about participants and the distribution of modules taken by them in each session considered for engagement analysis in this work. Participants consisted of 110 subjects in total (88 Female, 17 Male, 5 did not specify), in the age group of 18-79 years, with 80% being under the age of 46. Engagement self-reports and the corresponding availability of videos reduced the data size from 110 subjects to 95 subjects for the first session and 80 subjects for the second session.

2.5.2 Arousal data

We used a BioGraph Infiniti TT-AV Sync Sensor (T7670) and a TT-USB interface unit with an encoder. An Infiniti encoder is a multiple-channel, battery-operated device for real-time psychophysiology, biofeedback and data acquisition. We use three sensors: a 3-lead EKG, a skin conductance sensor (SA9309M) and a respiration belt.

The physiological signals are sampled at 256 Hz and stored in a text file. 24 event markers (12 marker pairs) were entered by research assistants during the session in the BioGraph datafile. Each marker pair represented the start and end of: the introductory video, quiet-time, module 1, module 2, vocal responses (6) and surveys (2).
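As an illustration of how such marker pairs can be turned into usable segment boundaries, the sketch below pairs start/end markers and converts sample indices to seconds at 256 Hz. The list-of-tuples input format, the labels and the sample indices are assumptions made for this example, not the actual BioGraph export format.

```python
# A minimal sketch (hypothetical file format, not the actual BioGraph export schema):
# pair up start/end event markers to recover the duration of each recorded segment.
# Markers are assumed to be a chronological list of (sample_index, label) tuples,
# with each label appearing exactly twice (start, then end), sampled at 256 Hz.

SAMPLE_RATE_HZ = 256

def pair_markers(markers):
    """Group start/end markers into {label: (start_sec, end_sec)} in seconds."""
    opened, segments = {}, {}
    for sample_idx, label in markers:
        if label not in opened:
            opened[label] = sample_idx                  # first occurrence = start marker
        else:
            start = opened.pop(label)                   # second occurrence = end marker
            segments[label] = (start / SAMPLE_RATE_HZ, sample_idx / SAMPLE_RATE_HZ)
    return segments

# Example with made-up sample indices for one session:
markers = [(0, "intro_video"), (38912, "intro_video"),
           (40000, "quiet_time"), (55360, "quiet_time"),
           (60000, "module1"), (520000, "module1")]
print(pair_markers(markers))
# {'intro_video': (0.0, 152.0), 'quiet_time': (156.25, 216.25), 'module1': (234.375, 2031.25)}
```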

In the past, speech analysis has proven that voice carries enough information to determine the stress levels of a subject [105, 197, 60]. With the same intent in mind, after completion of each module, users were asked for vocal responses to the following questions:

1. How will the module help or not help in dealing with the emotional challenges related to your trauma?

2. Comment about the usability of the module?

3. How do you feel the module will be beneficial in helping you cope?

Continuous annotations by psychologists estimating arousal from videos, was explored but is still considered questionable (Please refer to the challenges Section 2.6).

19 Figure 2.4: Engagement annotation from CARMA: The above figure shows engagement ratings provided by multiple raters (psychologists) for a relaxation segment of 4 minute duration. The ratings are available on a per second basis throughout the video segment. The y-axis shows the range of the ratings, -100 (disengaged subject) to 100 (highly engaged subject). It can be observed from the ratings that there is significant amount of inter-rater variability in the collected annotations. The face images are shown for illustrative purpose, and are not directly related to the engagement ratings.

2.5.3 Trauma-recovery outcomes and dependent variables

The subjects were presented with multiple questionnaires targeted at measuring PTSD-related affect. The questionnaires broadly comprise general outcome variables, module affect response assessments and control variables. Various psychology studies measure these variables from standardized questionnaires in the form of surveys.

Participants for the EASE study were chosen based on a screening test using Trauma Exposure Scale (TES).

TES is used to measure the types of trauma a person has been exposed to, or the degree of severity of the traumatic event someone experienced. PTSD-Checklist (PCL) was used to measure trauma symptom severity.

Table 2.2 shows the list of variables and their corresponding point scales. PCL, the Center for Epidemiologic Studies Depression Scale (CES), Confidence in use of Technology (TCONF), Coping Self-Efficacy General (CSEg) and Chesney's CSE (CSEc) are general outcomes measured before and after the experiment, i.e., at the beginning of session 1 and the end of session 3. Profiles of Mood States (POMS), Perceived Distress Quotient (PDQ), Distress (DIST) and CSE-Trauma (CSEt) were taken before and after each module to capture module effects.

Self-reported data from standardized questionnaires
Name | Description | Point Scale
TES | Trauma Exposure Scale (TES), used to assess trauma exposure levels | 10
PCL | PTSD Checklist-Civilian Version (PCL-C) or the Military Version (PCL-M), used to assess post-traumatic stress symptoms | 5
CES | Center for Epidemiologic Studies Depression Scale (CES-D), a 20-item questionnaire assessing depressive symptoms in the past week | 4
CSEt | Trauma Coping Self-Efficacy Scale, a 9-item questionnaire evaluating belief in the capacity to manage demands related to trauma recovery | 7
POMS | Very Short Profiles of Mood States-Short Form (POMS-SF), used to measure reactive changes in mood from 24 questions | 5
PDQ | Responses to Script-Driven Imagery Scale-Dissociative Experience Subscale, measures severity of dissociative experience provoked by a task related to trauma | 7
DIST | Subjective Units of Distress Scale, used to assess changes in perceived distress | 5
CSEg | General Self-Efficacy Scale, a 10-item questionnaire to assess optimistic self-beliefs to cope with a variety of difficult demands in life | 4
CSEc | Chesney's Coping Self-Efficacy Scale, a 13-item questionnaire evaluating confidence in performing coping behaviors when faced with life challenges | 11
TCONF | Confidence in use of Technology | 5

Table 2.2: PTSD-specific variables and measurements: This table shows specific measures collected from questionnaires given to participants to assess post-traumatic stress symptoms, depressive symptoms, mood scores, general distress, self-efficacy for trauma, trauma exposure, and confidence in web and mobile support use. The measures were segmented into general outcome measures, module affect response assessments, and control variables. The general outcome measures were taken by the subject at baseline (before the start of the experiment) and again at the end of the experiment. The module affective response was assessed before and after each module for every session.

2.6 Challenges

It is important to point out several challenges with this dataset collection. First, gathering and analyzing behavioral data in naturalistic scenarios is a cumbersome task. Second, sensors are always prone to motion artifacts, and the accuracy of the sensor is limited by the accuracy of sensor placement, the amount of gel applied on the probe, etc. Also, due to limitations in facial feature tracking caused by face occlusions, our methods will not be able to extract features from some video segments. For example, some subjects were too close to the screen while working on the website, and their face (or most of it) was not in the range of the webcam. Third, synchronization of video recordings (30 Hz) with sensors (256 Hz) is a challenging task due to the varying data rates of these modalities. Moreover, event markers in the physiological datafile were entered manually by research assistants during sessions, by tapping on a spacebar. Since these markers were used for video synchronization in an automated process, the reliability of the manual markers determined the correctness of video synchronization. Fourth, in case a start/end event marker (from a marker pair) was missing in the BioGraph file, the duration for the event could not be determined, resulting in a missing video. Fifth, it has been observed that annotating arousal from video is challenging even for trained clinical experts. The behavioral arousal ratings had very low inter-rater reliability, and hence no validity. The psychologists finally ended up classifying their annotations into Activity/Inactivity. Another strategy to save rater time was to rate videos sped up 2x, so the engagement annotations were reduced from 1 annotation per 30 frames to 1 annotation per 60 frames.
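For concreteness, the following sketch shows one simple way the 30 fps video stream and the 256 Hz sensor stream could be brought onto a common per-frame timeline once a shared start marker is known. It is an assumed illustration of the alignment step, not the exact synchronization code used for EASE; in practice the shared start comes from the manually entered markers, which is why marker reliability matters.

```python
# Hypothetical sketch of the alignment step described above: map each 30 fps video frame
# to the corresponding window of 256 Hz sensor samples, assuming both streams start at the
# same event marker. Names and array shapes are illustrative.
import numpy as np

VIDEO_FPS = 30
SENSOR_HZ = 256

def sensor_value_per_frame(sensor_signal, n_frames):
    """Average the 256 Hz signal over each 1/30 s video-frame interval."""
    per_frame = np.empty(n_frames)
    for f in range(n_frames):
        start = int(round(f * SENSOR_HZ / VIDEO_FPS))      # first sensor sample of frame f
        end = int(round((f + 1) * SENSOR_HZ / VIDEO_FPS))  # first sample of frame f+1
        per_frame[f] = sensor_signal[start:end].mean()
    return per_frame

# Example: 10 seconds of synthetic skin-conductance data aligned to 300 video frames.
signal = np.random.randn(10 * SENSOR_HZ)
aligned = sensor_value_per_frame(signal, 10 * VIDEO_FPS)
print(aligned.shape)  # (300,)
```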

Chapter 3

Contextual Engagement Prediction from video

3.1 Introduction

Engagement is a critical component of student learning, web-based interventions, marketing, etc., and face-based analysis is the most successful non-invasive approach for engagement estimation [146, 144, 202, 157, 226]. Techniques that involve face-based estimation of the six basic emotions are considered to apply to any situation, i.e., to be nearly universal. An interesting and important question not addressed in prior work is whether face-based models of engagement are generalizable and context-free, or whether engagement models depend on context and task. This work examines the role of “context” in engagement, in particular, engagement in trauma treatment. As shown in Figure 3.1, depending on the task, the same facial expression can be interpreted as engaged or disengaged.

Emotions (anger, fear, surprise, disgust, sadness and happiness) are well studied, with multiple products/systems to estimate emotional response from face data in a context-free manner. While named emotions are important, the basic emotion categories fail to capture the complete richness of facial behavior. Furthermore,

23 Facial General Engaged Feature Engagement Extraction Prediction Disengaged Traditional Engagement Prediction Pipeline Context Facial switch Context 1 (LSTM) Engaged Feature Extraction Context 2 Disengaged (LSTM) Context Sensitive Engagement Prediction Pipeline

Figure 3.1: Consider the images on the left. Which subjects are engaged and which are disengaged? Would you change your answer if you knew one had a task of doing a relaxation exercise? What if it was reading web content, watching a video or taking a test? We contend that face-based engagement assessment is context sensitive. Traditional engagement prediction pipelines based on facial feature extraction and machine learning techniques learn a generic engagement model, and would consider the face in lower left disengaged. In trauma recovery, individuals are often advised to do particular exercises, e.g. self-regulation exercises where the task involves the subject to “close your eyes, relax and breathe”. The image on the lower left is a highly engaged subject. Hence, there is a need to re-visit existing facial-expression-based engagement prediction techniques and augment them with the context of the task at hand. As shown in bottom right, this work develops context-sensitive engagement prediction methods based on facial expressions and temporal deep learning. most studies have elicited emotional responses via a stimulus; prototypical expressions of basic emotions occur relatively infrequently in natural interaction and even less so while people operate on a web-based intervention. Facial analysis gives strong clues about the internal mental state, e.g., a slight lowering of the brows or tightening of the mouth may indicate annoyance.

The ground truth for the development of engagement prediction systems generally comes in one of two forms: sparse labels obtained from self-reports [72] or observational estimations from external human observers [226]. Generally, self-reports are collected from questionnaires in which subjects report their level of engagement, attention, distraction, excitement, or boredom; in observational estimates, human observers use facial and behavioral cues to rate the subject's apparent engagement.

D'Mello et al. [158, 168, 75, 10] and Whitehill et al. [226, 227] have a longstanding line of research into learning-centered student engagement and the facial expressions associated with it. Recently, correlations between specific fine-grained facial movements and self-reported aspects of engagement, frustration, and learning were identified [96]. Thus, to create an engagement prediction model for trauma recovery, we use subtle facial movements or action units rather than emotional responses as an intermediate representation.

In this chapter, we show that computer vision and deep-learning-based techniques can be used to predict user engagement from webcam feeds. We further show that context-specific modeling is more accurate than a generic model of user engagement. We evaluate our machine learning models on the EASE dataset, from sessions 1 and 2, described in detail in Chapter 2. Our research in visual engagement also addresses two key questions: how much time do we need to analyze facial behavior for accurate engagement prediction, and which deep learning approach provides the most accurate predictions.

The contributions of this work are as follows:

1. The first exploration of engagement in two contextually different tasks within the recovery regime: “Relaxation” and “Triggers”.

2. Developing automated engagement prediction methods based on automatically computed Action Units (AUs) using Long Short-Term Memory (LSTM). We train/test this on both subtasks within trauma recovery.

3. We build context-specific, cross-context and combined prediction models and show the importance of context in predicting engagement from facial expressions.

4. We use sequences of AU features for engagement prediction and compare different types of sequence-learning algorithms, namely, Recurrent Neural Networks (RNN), Gated Recurrent Units (GRU) and LSTMs.

5. We compare learning using different length segments of AUs and compare prediction accuracy when using anywhere between 15 and 90 seconds of data.

While this work addresses engagement in the context of treatment of mental trauma, we conjecture that our observations of contextual engagement hold not just for trauma recovery but also for other domains. For example, in student learning, a student looking up could be engaged if the current task is listening to a lecture or watching something, or could be disengaged if the task is reading or taking a test, as shown in Figure 3.1.

3.2 Related work

The work presented in this chapter is related to research from multiple communities such as context-models in vision, web-based intervention methods, trauma recovery, facial action units and expression analysis, automatic engagement recognition for student learning, deep-network-based sequence learning methods and others.

Context in vision research: Context-sensitive models have a wide range of uses in computer vision.

Context, in general, can be grouped into feature level context, semantic relationship level context, and prior/priming information level context [224]. The feature and semantic levels address context within the scene/image, i.e., “context” implies spatial or visual context, and often a goal is to estimate that local/global context from the data and to use it to help with the primary vision task, e.g. [178, 232, 240, 209, 25, 134, 162]. There are also non-visual semantic-relationship level contexts, such as camera/photo metadata, geographical and temporal information, event labels, and social relationships pulled from social media tags [134].

Contextual priors are different as they are inputs to the system, e.g., using label/attribute priors [83, 188], 3D geometry priors [231], and scene/event priors [229]. In almost all of these works, using context improved vision-based estimation, and we hypothesize that context will improve engagement estimation as well. The role of context for user engagement has been explored in the affective computing literature [143] for designing intelligent user interfaces for office scenarios. In human-robot interaction, [184, 44, 43] have shown that additional knowledge of user context can better predict the affective state of the user, demonstrating that engagement prediction is contextual and task dependent.

In this work, “context” is not a feature or semantic data in the image; contextual engagement is about the expected behaviors the system should be observing in engaged subjects. Because the system determines what is displayed and expects a particular user activity, the “context” is known a priori, i.e., at a high level, we have a contextual prior with probability 0 or 1. Thus, our goal is not to estimate context for engagement or to incorporate prior probabilities into a model, but rather to learn context-specific models and use them to predict user engagement. Inferring “context” for engagement may still be possible from the video, but it is beyond the scope of this work and likely unnecessary in web-based settings.

Web-based trauma recovery: Researchers such as Bunge et al. [36] and Schueller et al. [194] have explored Behavioral Intervention Technologies (BITs) to augment traditional methods of delivering psychological interventions, i.e., face-to-face (F2F) one-to-one psychotherapy sessions, in order to expand delivery models and/or increase the outcomes of therapy. Such work has led to the development of various web-based intervention platforms with an aim to provide cost-effective, large-scale services and quality healthcare. Notable among these are the works of Macea et al. [144], Mehta et al. [153] and Strecher et al. [205]. In the domain of web-based interventions for trauma recovery, the works of Benight et al. [28] and Shoji et al. [197] explored the role of engagement using voice analysis. The psychology study by Yeager et al. [235] explored in detail the role of engagement in web-based trauma recovery. Influenced by these works, we postulate the need for the development of computer vision and machine learning-based methods for automated engagement prediction in the domain of web-based trauma recovery.

Student learning: The majority of research on automated engagement prediction has been limited to the field of education, where learning algorithms are built to determine student engagement from behavioral cues like facial expressions, gaze, head-pose, etc. [169, 157, 226, 87]. These works primarily rely on extracting facial features and developing machine-learning-based approaches to identify the engagement activity of students in a classroom performing various tasks, e.g., reading/writing, etc. The subjects are often assumed to be co-operative, with control over their emotions, and monitored by an external actor, e.g. the teacher. One notable difference between these works and data collected from trauma subjects is that subject co-operativeness varies significantly, depending on the severity of mental illness and the task (self-regulation exercises) that they are assigned, leading to multiple challenges in applying methods from student learning directly [189].

Emotion recognition: The popularity of interactive social networking and the exhaustive use of webcams have facilitated the understanding of human affect or emotions by analyzing responses to internet videos [211, 150, 81] and video-based learning [148, 239]. However, these methods are primarily geared towards identifying spontaneous emotions (e.g. “happy”, “sad”, etc.). Due to their relevance to digital advertising and the availability of advanced learning techniques that are capable of exploring temporal dependencies, facial expression analysis to recognize emotional states is emerging as a new area of exploration [149].

Facial features: Extracting facial features for expression analysis, facial action unit coding, face recognition, and face tracking has a rich history with some notable works [63, 114, 62]. Automatic detection of facial action units has proved to be an important building block for multiple non-verbal and emotion analysis systems [130]. For this work, we rely on the leading facial landmark and action unit detection work, OpenFace, proposed by Baltrušaitis et al. [11]. OpenFace is the first open source tool capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. Some other representations, such as raw pixels, Gabor filters or LBP features, could also have been considered for this work [199]. However, the work of [199, 226] in affective computing suggests AUs are a better choice for an intermediate representation of facial data.

Deep Learning for Affect Detection: In recent years, significant advances in deep learning have led to the development of various affect detection methods based on deep learning. More specifically, researchers have applied deep learning techniques to problems such as continuous emotion detection [199], facial expression analysis [221], facial action unit detection [52] and others. Deep learning methods have used video data, sensor data or multi-modal data [228]. As noted earlier, engagement is a rational response unlike the spontaneity of emotions and, hence, we model context and temporal variations in the input feature representation with AUs. Such problems are often modeled as sequence learning problems.

Sequence learning has a rich history in signal processing, machine learning and video classification, with a number of notable techniques such as Hidden Markov Models (HMM), Dynamic Time Warping (DTW) and Conditional Random Field (CRF)-based approaches being commonplace [115]. Recurrent neural networks initially gained prominence in computer vision and deep learning since they take into account both the observation at the current time step and the representation from the previous one. More recently, in order to address the gradient vanishing problem of vanilla RNNs when dealing with longer sequences, LSTMs [109] and Gated Recurrent Units (GRU) were proposed [100].

Most existing face-based engagement prediction methods make little use of the temporal information in the video sequence. Unlike more ephemeral emotions, which can be elicited from short stimuli, engagement while doing web-based treatment/education is a long-term process, requiring long-term integration of information. Our proposed prediction model includes context as well as a temporal sequence of facial Action Units (AUs).

Using AUs for engagement prediction is a relatively nascent area of research. In this work, we advocate the use of deep-learning techniques adapted for sequential data [236, 52, 97]. In particular, we first explore using Long Short-Term Memory (LSTM), a specialized form of recurrent neural networks that is well suited for sequence learning problems and has emerged as a leading method for a number of similar tasks such as video classification [236], video-affect recognition [228], as well as non-visual tasks such as speech, handwriting and music analysis [100]. Relative insensitivity to gap length gives an advantage to LSTMs over alternatives such as RNNs, HMMs and other sequence learning methods. Secondly, we evaluate the effect of the length of video segments used for engagement prediction on the deep sequence learning model and, lastly, we analyze this effect across multiple leading deep sequence learning algorithms to understand the relationship between network complexity and prediction performance.

3.3 Algorithms for Sequence Learning

In recent years, a wide range of sequence learning problems have been modeled using recurrent neural networks (RNN). They learn a representation for each time step by incorporating the observation from the previous step and the current step. This process allows RNNs to preserve information over time using recurrent mechanisms.

RNNs were developed by Schmidhuber et al. [190] and have found applications in multiple sequence learning problems such as handwriting recognition, visual question answering, image generation, speech recognition and others. Hochreiter et al. [109] proposed long short-term memory (LSTM), a variant of RNNs consisting of a memory cell. LSTMs have been widely used in sequence learning problems such as language modeling, image captioning, translation and others. More recently, Gated Recurrent Units (GRU) were proposed by Cho et al. [50] to make each recurrent unit adaptively capture dependencies at different time scales. Chung et al. [55] performed a detailed comparison of LSTM and GRU in the domain of polyphonic music modeling and speech signal modeling and found GRU to be comparable to LSTM. Jozefowicz et al. [117] performed a similar but more extensive empirical analysis of RNN, LSTM and GRU variants for language modeling and music modeling. However, a detailed comparison of various sequence learning algorithms in the domain of facial video analysis, especially for affective computing, has not been actively explored in the past.

An RNN has connections between units that form a directed cycle, allowing it to exhibit temporal dynamic behavior. RNNs have internal memory to process sequences of inputs. LSTM and GRU share similarities in terms of having gating units that modulate the flow of information inside the unit and dynamically balance the information flow from the previous time step and the current time step. However, a GRU does not require a separate memory cell for this purpose, making its internal structure simpler than an LSTM.

3.3.1 Recurrent Neural Network

An RNN is a neural network that consists of a hidden state h and an optional output y, and operates on a variable-length sequence x = (x_1, \ldots, x_T). At each time step t, the hidden state h_{\langle t \rangle} of the RNN is updated by

h_{\langle t \rangle} = f\big(h_{\langle t-1 \rangle}, x_t\big), \qquad (3.1)

where f is a non-linear activation function. f may be as simple as an element-wise logistic sigmoid function and as complex as a long short-term memory (LSTM) unit [109].

Our implementation is based on TensorFlow, which in turn is based on [51], and we follow their notation.

We let subscripts denote timesteps and superscripts denote layers. An RNN can learn a probability distribution over a sequence by being trained to predict the next symbol in the sequence. In that case, the output at each timestep t is the conditional distribution p(x_t \mid x_{t-1}, \ldots, x_1). For example, a multinomial distribution (1-of-K coding) can be output using a softmax activation function

p(x_{t,j} = 1 \mid x_{t-1}, \ldots, x_1) = \frac{\exp\big(w_j h_{\langle t \rangle}\big)}{\sum_{j'=1}^{K} \exp\big(w_{j'} h_{\langle t \rangle}\big)}, \qquad (3.2)

for all possible symbols j = 1, \ldots, K, where w_j are the rows of a weight matrix W. By combining these probabilities, we can compute the probability of the sequence x using

p(x) = \prod_{t=1}^{T} p(x_t \mid x_{t-1}, \ldots, x_1). \qquad (3.3)

From this learned distribution, it is straightforward to sample a new sequence by iteratively sampling a symbol at each time step. The RNN reads each symbol of an input sequence x sequentially. As it reads each symbol, the hidden state of the RNN changes according to Eq. (3.1). After reading the end of the sequence

(marked by an end-of-sequence symbol), the hidden state of the RNN is a summary c of the whole input sequence.
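To make the formulation above concrete, the following is a minimal NumPy sketch of a vanilla RNN used autoregressively: the hidden state is updated as in Eq. (3.1), a softmax over the hidden state gives a per-step conditional as in Eq. (3.2), and the sequence log-probability is accumulated as in Eq. (3.3). The parameter names (W_h, W_x, W_y, b) and shapes are illustrative assumptions, not the TensorFlow implementation used later in this work.

import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    # Eq. (3.1): non-linear update of the hidden state from the previous state and current input.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sequence_log_prob(xs, symbols, W_h, W_x, b, W_y):
    # Eqs. (3.2)-(3.3): accumulate log p(x) as a sum of per-step conditionals over K symbols.
    h = np.zeros(W_h.shape[0])
    log_p = 0.0
    for x_t, sym in zip(xs, symbols):
        p = softmax(W_y @ h)          # distribution over the K possible symbols at this step
        log_p += np.log(p[sym])       # conditional log-probability of the observed symbol
        h = rnn_step(h, x_t, W_h, W_x, b)
    return log_p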

3.3.2 Gated Recurrent Units

GRUs were proposed by Cho et al. [51] to enable RNNs to adaptively capture time dependencies at different time scales. GRUs lie in between RNNs and LSTMs in terms of complexity. Similar to an RNN, the hidden state of a GRU is updated at each time step, as a linear interpolation between the previous hidden state and a candidate activation. GRUs share similarities with LSTM units as they have gating units that modulate the flow of information inside the unit, but they do not require separate memory cells.

The activation h_t^j of the j-th unit of the GRU at time t is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation \tilde{h}_t^j:

h_t^j = (1 - z_t^j)\, h_{t-1}^j + z_t^j\, \tilde{h}_t^j \qquad (3.4)

where z_t^j is the update gate, which is computed by

z_t^j = \sigma\big(W_z x_t + U_z h_{t-1}\big)^j \qquad (3.5)

The candidate activation is computed similarly to that of a traditional recurrent unit:

\tilde{h}_t^j = \tanh\big(W x_t + U (r_t \odot h_{t-1})\big)^j \qquad (3.6)

where ⊙ is element-wise multiplication and r_t is a set of reset gates, computed using

r_t^j = \sigma\big(W_r x_t + U_r h_{t-1}\big)^j \qquad (3.7)

In this formulation, when the reset gate is close to 0, the hidden state is forced to ignore the previous hidden state and reset with the current input only. This effectively allows the hidden state to drop any information that is later found to be irrelevant, thus allowing a more compact representation.
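For clarity, a minimal NumPy sketch of a single GRU step following Eqs. (3.4)-(3.7) is given below; the parameter names and shapes are illustrative assumptions rather than the TensorFlow cell used in our experiments.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)         # update gate, Eq. (3.5)
    r = sigmoid(W_r @ x_t + U_r @ h_prev)         # reset gates, Eq. (3.7)
    h_cand = np.tanh(W @ x_t + U @ (r * h_prev))  # candidate activation, Eq. (3.6)
    return (1.0 - z) * h_prev + z * h_cand        # linear interpolation, Eq. (3.4)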

3.3.3 Long Short-Term Memory

Due to the advantages of LSTMs in preserving information over time and their ability to handle longer sequences, for the first set of experiments we use LSTMs to model long-term dependencies of AUs for engagement prediction. We model the problem of engagement prediction as a sequence learning problem where the input consists of a sequence x_i of AUs computed from facial video data of a particular length. Each sequence is associated with a label y_i which relates to the engagement self-reports provided by trauma subjects. Our implementation is based on TensorFlow, which in turn is based on [98, 237], and we follow their notation.

All our states are in \mathbb{R}^n, with n equal to the number of AUs tracked. Let h_t^l \in \mathbb{R}^n be the hidden state in layer l at time-step t. Let T_{n,m} : \mathbb{R}^n \to \mathbb{R}^m be an affine transform from n to m dimensions, i.e. T_{n,m} x = Wx + b for some W and b. Let ⊙ be element-wise multiplication and let h_t^0 be the input data vector at time-step t. We use the activations h_t^L to predict y_t, where L is the number of layers in our deep LSTM.

The LSTM has complicated dynamics that allow it to easily “memorize” information for an extended number of time-steps using memory cells c_t^l \in \mathbb{R}^n. Although many LSTM architectures differ in their connectivity structure and activation functions, all LSTM architectures have explicit memory cells for storing information for long periods of time, with weights that control how to update the memory cell, retrieve it, or keep it for the next time step. The LSTM architecture used in our experiments is given by the following equations [99], as implemented in the TensorFlow basic LSTM cell:

i sigm          l−1 f sigm ht   =   T2n,4n       o   hl    sigm  t−1         g     tanh

l l ct = f ⊙ ct−1 + i ⊙ g

l l ht = o ⊙ tanh(ct)

where sigm is the sigmoid function; sigm and tanh are applied element-wise; and i, f, o, c, h are the input gate, forget gate, output gate, cell activation vector and hidden vector, respectively. In this work, we assume the length of the sequence is known a priori and hence use LSTMs with static RNN cells.
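As an illustration of the cell dynamics above, the following is a minimal NumPy sketch of one LSTM step; the single weight matrix W (of shape 4n x 2n) and bias b stand in for the affine map T_{2n,4n}, and the names are assumptions rather than the TensorFlow basic-cell internals.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b   # T_{2n,4n} applied to the stacked inputs
    i = sigmoid(z[0:n])                         # input gate
    f = sigmoid(z[n:2 * n])                     # forget gate
    o = sigmoid(z[2 * n:3 * n])                 # output gate
    g = np.tanh(z[3 * n:4 * n])                 # candidate cell update
    c = f * c_prev + i * g                      # memory cell update
    h = o * np.tanh(c)                          # hidden state for the next layer/time step
    return h, c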

3.4 Experiments

We now describe the AU computation procedure to extract the intermediate feature representation, followed by the methodology for tuning the linear SVC and sequence learning models. Section 3.4.4 elaborates on the contextual, cross-context and mixed LSTM models, assuming an a priori known length of video sequence (30 seconds). Section 3.4.5 contains the procedure for varying the segment length of the video sequence from 15 seconds to 90 seconds on three sequence learning algorithms, namely RNN, GRU and LSTM. Finally, we discuss in detail the results obtained for engagement prediction across a variety of tasks and its task specificity.

3.4.1 Facial action units extraction

As noted earlier in Section 2.5.1, the collected dataset consisted of a large number of facial videos and engagement self-reports. While there are a number of software tools available for extracting facial landmark points and facial action units (e.g. [63, 114]), we use the recent work on OpenFace proposed by Baltrušaitis et al. [11]. It is an open-source tool which has shown state-of-the-art performance on multiple tasks such as head-pose, eye-gaze, and AU detection. For our work, we primarily focus on facial action units. The AUs extracted consisted of both intensity-based and presence-based AUs. Presence-based AUs have a value of 1 if the AU is visible in the face and 0 otherwise; intensity-based AUs range from 0 through 5 (not present to present with maximum intensity). The list of AUs used in this work is as follows: Inner Brow Raiser, Outer Brow Raiser, Brow Lowerer (intensity), Upper Lid Raiser, Cheek Raiser, Nose Wrinkler, Upper Lip Raiser, Lip Corner Puller (intensity), Dimpler, Lip Corner Depressor (intensity), Chin Raiser, Lip Stretched, Lips Part, Jaw Drop, Brow Lowerer (presence), Lip Corner Puller (presence), Lip Corner Depressor (presence), Lip Tightener, Lip Suck, Blink.

[Figure 3.2 panels: Upper Face Action Units, Lower Face Action Units, Classification Action Units.]

Figure 3.2: Action Units from OpenFace: The figure above shows 17 AUs derived from OpenFace. AU4, AU12 and AU15 are both intensity and presence based, and are therefore counted twice, making a total of 20 AUs that are used for engagement prediction. Amongst these 20, 14 AUs are intensity based and 6 are presence based. The 6 presence-based AUs have a value of 1 if the AU is visible on the face and 0 otherwise, and are also called classification AUs. The 14 intensity-based AUs comprise 5 AUs from the upper face and 9 from the lower face. Movements of these individual facial actions are used in the form of intensity values varying from 0 to 5, i.e., not present to present with maximum intensity.
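As an illustration, the snippet below sketches how the 20-dimensional per-frame AU feature vector could be assembled from an OpenFace output CSV, assuming OpenFace's usual column naming (AUxx_r for intensity, AUxx_c for presence). The AU-number mapping of the names listed above and the exact column names are assumptions and may need adjustment for a particular OpenFace version.

import pandas as pd

# 14 intensity AUs (Inner/Outer Brow Raiser, ..., Jaw Drop) and 6 presence AUs
# (Brow Lowerer, Lip Corner Puller/Depressor, Lip Tightener, Lip Suck, Blink); assumed mapping.
INTENSITY_AUS = ["AU01_r", "AU02_r", "AU04_r", "AU05_r", "AU06_r", "AU09_r", "AU10_r",
                 "AU12_r", "AU14_r", "AU15_r", "AU17_r", "AU20_r", "AU25_r", "AU26_r"]
PRESENCE_AUS = ["AU04_c", "AU12_c", "AU15_c", "AU23_c", "AU28_c", "AU45_c"]

def load_au_features(csv_path):
    # Returns a (num_frames, 20) array: 14 intensity values (0-5) followed by 6 presence flags (0/1).
    df = pd.read_csv(csv_path)
    df.columns = [c.strip() for c in df.columns]   # OpenFace CSV headers are often space-padded
    return df[INTENSITY_AUS + PRESENCE_AUS].to_numpy()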

3.4.2 SVC model

As a baseline for engagement prediction, we use a Support Vector Classifier (SVC). Linear SVCs were trained using the 18000-dimensional feature vector (20 AUs for each of 900 time samples) for the hundreds of training segments for RX and TR. For training/testing, we used 10-fold cross-validation, and Table 3.1 reports average engagement prediction accuracy over the 10 folds of test data. The table also shows “chance”, but since the five engagement levels do not have uniform probability, we consider random chance the algorithm that guesses based on per-context, per-engagement-level priors. Hence chance is different for RX and TR. The SVC performance is not significantly different from chance, probably because the number of training samples (RX 334 / TR 437) is much smaller than the dimensionality (18000).
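A hedged sketch of this baseline using scikit-learn is shown below; the flattening of each 30-second segment into an 18000-dimensional vector follows the description above, while the variable names and library choice are assumptions.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def svc_baseline(segments, labels):
    # segments: (num_segments, 900, 20) AU array; labels: engagement levels 1-5 per segment.
    X = segments.reshape(len(segments), -1)                  # flatten to (num_segments, 18000)
    scores = cross_val_score(LinearSVC(), X, labels, cv=10)  # 10-fold accuracy, one score per fold
    return scores.mean(), scores.std()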

3.4.3 Sequence learning models and tuning

In our experiments, for all three classifiers (LSTM, GRU and RNN), we train the deep networks to minimize the post-softmax cross-entropy loss between the predicted and actual class. Each model is single layered with n units in sequence, where n is equal to the number of temporal samples (frames) t. For example, an a priori known sequence length of 30 seconds @ 30 fps means t = 1..900. During the training process, we use the Adam optimizer with a learning rate of 0.01 and fix training at 10 epochs. Since the Adam optimizer uses a larger effective step size, and our data has relatively sparse labels, we found that the Adam optimizer performs better compared to the gradient descent optimizer. The batch size used was 100 samples irrespective of context, i.e. for both Relaxation and Triggers. The forget-bias of the LSTM was set to 1.0 to remember all prior input weights. All gate activations were tanh for all three learning algorithms; the gradient computation for tanh is less expensive and converges faster than for a sigmoid. For the first test in Section 3.4.4, we use the same 10-fold modeling which was used for the SVC model. As seen in Table 3.1, the improvement in accuracy of the LSTM over the SVM supports the need for deep learning models for engagement prediction. Henceforth, for all further experiments in Section 3.4.4 and the analysis of contextual, cross-contextual and combined models, we use LSTMs only. In Section 3.4.5, we compare RNNs, GRUs and LSTMs to assess the effect of segment length on engagement prediction accuracy.

        RX              TR
Chance  32.8%           38.3%
SVC     31.4 ± 7%       35.2 ± 9%
LSTM    39.1 ± 8.8%     50.7 ± 11%

Table 3.1: The table above shows the enhanced engagement prediction accuracy obtained by using LSTMs over a Support Vector Classifier, highlighting the importance of a deep-learning model for this task.
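For concreteness, the following is a minimal sketch of the kind of training setup described above (a single recurrent layer, softmax cross-entropy, the Adam optimizer with a learning rate of 0.01, 10 epochs and a batch size of 100), written against the Keras API of TensorFlow. The hidden size and the exact graph construction are assumptions and differ from the original static-cell implementation.

import tensorflow as tf

def build_engagement_classifier(seq_len=900, num_aus=20, hidden_units=64, num_classes=5):
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden_units, input_shape=(seq_len, num_aus)),  # 30 s @ 30 fps of AU vectors; swap in GRU or SimpleRNN to compare cells
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage (engagement labels 1-5 shifted to 0-4):
# model = build_engagement_classifier()
# model.fit(X_train, y_train - 1, epochs=10, batch_size=100)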

3.4.4 Context-specific, Cross-context and Combined models

We use AU data from subjects performing the “Triggers” task to create the Trigger (TR) model, and similarly, AU data from subjects performing the “Relaxation” task is used to create the Relaxation (RX) model. Once a model is trained, to study the effect of context, we test it using the context-specific technique (testing with the same task) and the cross-context technique (testing with the opposite task). The total number of engagement self-reports available for each session and task is shown in Figure 2.3. Due to the large number of video frames and sparse labels, we consider 30-second segments before each self-report for learning. For RX, we have 372 engagement self-reports leading to 372 segments of 900 samples (30 seconds @ 30 fps), totaling 334,800 frames of data.¹

With 20 AUs per frame and 900 frames per engagement sample, we have an 18000-dimensional feature vector.

We use 334 segments for training and 38 segments for testing. Similarly, for TR we have 485 segments of 900 samples (30 seconds @ 30 fps), totaling 436,500 frames of data. We use 437 segments for training and 48 segments for testing. The training/testing labels are a discrete set of engagement levels 1 (very disengaged) through 5 (very engaged). By doing this, each engagement level is treated as a separate and mutually exclusive output.
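A hedged sketch of the segment construction described above is given below: for each engagement self-report, the 900 frames (30 seconds at 30 fps) of AU features immediately preceding the report are collected. The frame-index bookkeeping is an illustrative assumption.

import numpy as np

def extract_segments(au_frames, report_frames, labels, seg_len=900):
    # au_frames: (num_frames, num_aus) per-frame AU features for one synchronized video.
    # report_frames: frame index at which each engagement self-report was given.
    X, y = [], []
    for frame_idx, label in zip(report_frames, labels):
        start = frame_idx - seg_len
        if start < 0:                       # skip reports without a full 30-second history
            continue
        X.append(au_frames[start:frame_idx])
        y.append(label)
    return np.stack(X), np.array(y)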

3.4.5 Varying data-segment length

Because we want consistent data for different timescales, we consider only those sequences that had a minimum of 90 seconds before the response, which reduces the dataset slightly from that used in Section 3.4.4. The total number of segments extracted from the EASE dataset was 443 for TR and 347 for RX. For all experiments in this sub-section, we use 20-fold cross-validation. For all 20-fold experiments, the TR module uses 421 segments for training and 23 segments for testing. Similarly, in the RX module, our training set consists of 330 segments and our test set of 17 segments. Each segment was accompanied by an engagement self-report.

¹ Some segments shown in Figure 2.3 are not used due to synchronization issues.

[Figure 3.3 plots: 30-second traces of “Outer Brow Raiser” AU intensity (y-axis: AU intensity level, 0-3) versus frame number (0-900); left: Triggers (SubjectID 098, Eng = 4; SubjectID 008, Eng = 4), right: Relaxation (SubjectID 098, Eng = 4; SubjectID 008, Eng = 3).]

Figure 3.3: The above figure shows the inherently noisy signals from action unit transitions for “Outer Brow Raiser”. Each plot shows 30 seconds of AU data for two different subjects, with the engagement level shown in the legend. The left plot is from the Triggers module, where the subjects have exactly the same self-reported engagement, yet the AU intensity is quite different. The AU signal on the right is from the Relaxation module, which shows similar signals for subjects with different engagement. Note that facial expressions for the same subjects vary in the different contexts of Relaxation and Triggers. This is a challenging prediction problem.

Next, we extracted varying segment lengths of intensity-based AUs preceding the engagement self-report for 15, 30, 60 and 90 seconds each. For our experiments in this sub-section, we focus only on intensity-based

AUs to reduce the effect of combining categorical and numeric features. This reduces the input feature dimensions from 20 to 14 and therefore changes the baseline accuracy relative to the contextual LSTM models in Section 3.4.4. Figure 3.3 shows the changes in Outer Brow Raiser (AU02) intensities as a function of frame number. The figure also shows the challenges associated with temporal modeling of facial AUs. We train separate contextual LSTMs, GRUs and RNNs for each engagement prediction model (Relaxation – RX and

Trigger – TR).

                   RX - Test        TR - Test
RX - Train         39.1 ± 8.8%      38.1 ± 4.4%
TR - Train         36.7 ± 3.3%      50.7 ± 11%
(RX + TR) - Train  39.5 ± 6.1%      49.1 ± 7.6%

Table 3.2: The first two rows above show contextual and cross-contextual engagement prediction results on EASE data obtained using LSTMs, in terms of prediction accuracy. If engagement were not contextual, the models would perform equally well in both cross-contexts, e.g. when the model is trained on Triggers (TR) and tested with Relaxation (RX) data. In the case of TR, the context-specific and cross-context models show significant performance differences, with the best accuracy obtained when training and testing are both TR. Thus we can reject the hypothesis that context does not matter; it is formally rejected using a two-tailed paired t-test at the .01 level. Furthermore, the combined model (trained on RX+TR) had the largest amount of training data and was still outperformed by the contextual models, showing again the importance of contextual modeling.

3.5 Evaluation results

3.5.1 Contextual engagement results

Table 3.2 shows the contextual engagement prediction results. We note interesting trends in prediction accuracy for context-specific and cross-context models. Training on RX and testing on RX yields 39.1% prediction accuracy, and training on RX and testing on TR yields similar accuracy at 38.1%; these are not statistically different (p=.44). When trained on TR data and tested on TR data, the model obtained 50.7% prediction accuracy; however, when the same model was tested on RX data, the accuracy dropped to 36.7%. This difference is statistically significant (using a two-sided heteroscedastic test, p=.006), allowing us to reject the hypothesis that context does not matter. We also find that the two different contexts, RX and TR, have marginally significant differences (p=.02) in performance. This suggests that engagement models benefit when the context is known and modeled, and that tasks differ in difficulty. The corresponding confusion matrices for context-specific and cross-context models are shown in Figure 3.4.
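The significance tests reported here compare the per-fold accuracies of two models evaluated on the same folds; a minimal SciPy sketch of such a two-tailed paired t-test is shown below (variable names are illustrative, and the heteroscedastic comparison would instead use an unpaired test with unequal variances, e.g. ttest_ind with equal_var=False).

from scipy.stats import ttest_rel

def compare_folds(fold_acc_a, fold_acc_b, alpha=0.01):
    # fold_acc_a / fold_acc_b: per-fold accuracies of two models on the same cross-validation folds.
    t_stat, p_value = ttest_rel(fold_acc_a, fold_acc_b)   # paired, two-sided by default
    return p_value, p_value < alpha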

3.5.2 Comparison results for sequence learning algorithms

In this section, we discuss in detail the results obtained for engagement prediction across a variety of time windows and sequence learning algorithms, along with its task specificity. The results are summarized in Table 3.3 and Table 3.4.

Table 3.3 shows the results for LSTM, GRU and RNN with 30 seconds of input data, i.e. 900 frames.

[Figure 3.4 panels: Relaxation testing (top row) and Triggers testing (bottom row) confusion matrices.]

Figure 3.4: The figure above shows confusion matrices for the results presented in Table 3.2. The top row represents confusion matrices for RX testing and the bottom row represents TR testing. The first column is within-context, the second column is cross-context. Notice that the cross-context result of Train-TR : Test-RX, with respect to the contextual model of Train-TR : Test-TR, shows visible confusion of the “Engaged” class (level 4) with the “Neutral” class (level 3).

LSTMs trained with intensity-based AUs on 30 seconds of data from the engagement segments serve as the baseline for the GRU-30 and RNN-30 predictions. We obtain 52.30 ± 16.94% average prediction accuracy across 20 folds on the TR module and 39.44 ± 10.51% average prediction accuracy for the RX module (margins of error correspond to the standard deviation computed over 20 folds). We noticed significant improvements in performance for the RX module in a two-sided paired t-test (p=.03), with an average prediction accuracy of 43.48 ± 11.26% for GRU-30; whereas TR accuracy increased to 53.90 ± 17.37%, there is only statistically weak evidence (p=.14) of improvement over the baseline. RNN-30 and GRU-30 were not statistically different, at p=.7 for RX and p=.6 for TR. RNN-30, however, was statistically significantly better than LSTM-30 for TR (p=.05) and RX (p=.016), highlighting the importance of the choice of sequence learning algorithm. The significant increase in accuracy from LSTM-30 to RNN-30 is likely due to the reduced number of parameters from LSTM to RNN, hence reducing overfitting with the simpler RNN model.

           Triggers          Relaxation
LSTM - 30  52.30 ± 16.94%    39.44 ± 10.51%
GRU - 30   53.90 ± 17.37%    43.48 ± 11.26%
RNN - 30   54.32 ± 17.74%    44.24 ± 13.09%

Table 3.3: The baseline, i.e. LSTM-30, shown in the first row, used 14 AUs compared to the 20 AUs in Table 3.2. LSTM-30 also had fewer segments for each context than shown in Table 3.2. The second and third rows show our results for GRU and RNN over 20-fold validation, with all three using exactly the same data/folds. It can be seen that RNNs are the most accurate for both contexts of Relaxation and Triggers. GRU performance is significantly better than LSTM at 30 seconds, with p=.03 for Relaxation at a 97% confidence interval using a two-tailed paired test. RNN-30 is significantly better than LSTM-30 for both Relaxation and Triggers, at p=.05 and p=.016 respectively.

           Triggers          Relaxation
LSTM - 15  52.30 ± 16.94%    39.44 ± 10.51%
LSTM - 60  52.30 ± 16.94%    39.44 ± 10.51%
LSTM - 90  52.30 ± 16.94%    39.44 ± 10.87%
GRU - 15   53.90 ± 17.37%    43.48 ± 11.26%
GRU - 60   51.84 ± 18.28%    42.82 ± 11.50%
GRU - 90   51.84 ± 18.28%    42.82 ± 11.50%
RNN - 15   54.32 ± 17.74%    44.24 ± 13.09%
RNN - 60   54.32 ± 17.74%    44.24 ± 13.09%
RNN - 90   54.32 ± 17.74%    44.24 ± 13.09%

Table 3.4: The table above displays the results of different time windows for all three sequence learning algorithms: LSTM, GRU and RNN. The data is relatively stable over different time scales, with only the decrease in accuracy from GRU-15 to GRU-60 being even weakly statistically significant (p=.10). Algorithm performance continues to differ, e.g. RNN-90 outperforms LSTM-90 significantly for both TR and RX (p=.05 and p=.02 respectively).

The results for varying time segment lengths are presented in Table 3.4. Notice that the average 20-fold accuracy for RNN and LSTM doesn't vary with segment length. Interestingly, for GRUs the average accuracy drops slightly when the segment length is increased from 30 seconds to 60 seconds: for TR the accuracy changes from 53.90 ± 17.37% to 51.84 ± 18.28% and for RX from 43.48 ± 11.26% to 42.82 ± 11.50%. There is statistically weak evidence (p=.10) for TR that GRU-30 is better than GRU-60. This drop in accuracy may just be random variation, or it may be due to the limited scale ranges in a GRU, i.e., its cells cannot accumulate large values even if there is a strong presence of a certain feature [177]. LSTM and RNN, on the other hand, have consistent accuracy for both RX and TR irrespective of the segment lengths. While each is self-consistent, there is a difference between them; a paired test of the per-fold accuracy of LSTM-90 with RNN-90 shows a significant difference, with p=.02 for RX and p=.05 for TR. The 20-fold validation results from both Table 3.3 and Table 3.4 are collectively summarized in Figure 3.5.

We notice that algorithms with simpler internal structure, like GRUs and RNNs, perform consistently better than LSTMs. Why does this happen? In the LSTM unit, the amount of memory content that is seen, or used by other units in the network, is controlled by the output gate. The ability to balance forgetting old and keeping new information, and the squashing of the cell state, is important in the LSTM architecture. The GRU, on the other hand, exposes its full content without any control. Coupling the input and the forget gate avoids the problem of the block output growing in an unbounded manner, thereby making the output non-linearities encoded in the LSTM less important.

Given that GRUs overall have fewer parameters, it is likely that the amount of training data available might allow GRUs to perform better compared to LSTMs, but the data does not show a statistically significant difference.

With the motivation of reducing the potential for overfitting, we also performed another experiment where we compared different amounts of training data. We observed that GRU-15 performs significantly better than GRU-90. This suggests that merely increasing the temporal training data is detrimental, potentially because of mixing engagement levels. However, the best performance was with an RNN, the simplest model, and the RNN did not show any difference across different time windows.

In many machine learning applications, especially in the domain of medical/health applications, it is very time-consuming and expensive to acquire large amounts of annotated training data. While the sensory data may appear to have long sequences that could be used, and there is a prevailing intuition that “more data is better”, these results show that shorter sequences are often at least as good, if not better. When annotations for training data are relatively limited, simpler units like GRUs or RNNs should be considered.

3.6 Discussion

3.6.1 Performance of Contextual Engagement Models

The primary goal of this work was to explore contextual engagement, rather than to create the best deep-learning-based classifiers. We developed contextual engagement models using AUs extracted from video segments as the intermediate representation and engagement self-reports as the associated predictive outcomes for the machine learning system. Across multiple experiments, the performance achieved on contextual tasks was on the order of 45-55%, with standard deviation in the range of 5-12%. Further, we found this performance to be consistent across various deep learning algorithms and varying lengths of video segments considered for learning. The absolute performance of any machine learning system depends on multiple factors such as the quality, variability and scale of the data, the representation and the learning algorithms. AUs are a leading representation for a wide range of face modeling and affective computing tasks and have been shown to perform better than low-level representations like LBP and Gabor filters [122, 69, 241]. Recent advances in deep learning have led to the development of better AU detectors that are robust to variations in pose and lighting [46, 222, 208]. AU detectors in the affective computing and computer vision literature have focused primarily on the analysis of various spatial regions in the face images. Recent work by Li et al. [135] has shown that facial action always has a temporal component and hence knowing the previous state of a facial expression can improve AU detection. Such approaches are likely to boost the performance of temporal engagement models as presented in this work. Our experiments in the earlier sections showed that it was difficult to get high-performing models with shallow learning algorithms such as support vector machines. Hence, we modeled the problem as a sequence learning problem and employed multiple recurrent neural network based algorithms (RNN, GRU, LSTM). A continued direction of this work will be incorporating recent advances in facial action unit detection and deep learning algorithms to improve the overall performance of the machine learning models.

[Figure 3.5 plots: accuracy (%) box plots for LSTM, GRU and RNN at 15/30/60/90-second time windows, for Triggers (top) and Relaxation (bottom).]

Figure 3.5: The figure above shows the box and whisker plots for LSTM, GRU and RNN in the different contexts of Relaxation and Triggers. Each subplot shows the Accuracy (%) on the y-axis and the Time Window (seconds) on the x-axis. Each boxplot is generated from the 20-fold validation results, showing the mean (green line), median (orange line), 25% (solid box) and 75% (whisker lines) percentile bands and outliers (circles). Notice that RNNs and LSTMs perform consistently across different temporal depths, whereas GRU accuracy decreases from 30 to 60 seconds for Triggers.


The facial action unit detectors used in this work were based on open-source classifiers that were trained on other publicly available datasets such as DISFA, FERA2011, SEMAINE and others. More recently, Chu et al. [53] noted that generic classifiers trained from multiple datasets often have impaired generalization ability because of individual differences among subjects. They hypothesized that person-specific bias causes standard generic classifiers to perform worse on some subjects than others. In our work, it is likely that generic AU detectors fail to model inter-subject variability adequately, leading to high-variance engagement models. In order to incorporate person-specific bias within engagement models, there is a need to personalize engagement models using person-specific data at various stages in the engagement prediction pipeline. Personalized models can be built either by adapting pre-trained AU detectors with person-specific data (as proposed by Chu et al., Yang et al. and Zen et al. [53, 233, 238]) or by incorporating person-specific temporal variabilities (e.g. if a person has a habit of nodding their head or raising their eyebrows) into generalized engagement models.

3.6.2 Incorporating Broader Range of Data

In the current work, we primarily focused on developing machine learning models using engagement self-reports from trauma subjects. The self-reports provided extremely sparse labels per video (typically 2-3 self-reports for a video session of 15-40 minutes). Traditional approaches to engagement prediction in the domain of affective computing use static representations of individual frames [45, 157, 95]. We used deep-sequence-learning models for engagement prediction in order to incorporate changing interactions over time and concurrent sequential dynamics. The most natural way of building better classifiers is training with an even larger dataset and performing parameter optimization of the sequence-learning models. Sophisticated methods like feature (AU) pooling over space and time could be another possibility, which would also likely improve prediction accuracy. More recently, advances in activity recognition [48, 23] and video classification [154] techniques have shown that Convolutional Neural Networks in combination with sequence-learning algorithms are capable of learning long-term temporal representations [230]. Generating video-level vector representations on top of the frame-level Convolutional Neural Network descriptors could be explored in the future.

Due to the sparsity of labels (engagement self-reports), we resorted to considering approximately 15-90 second video segments before the self-report was captured. In our work, we saw that increasing the length of the video segment did not change the performance drastically. Deep learning models often require large amounts of training data to learn powerful representations. As part of the dataset collection, we collected behavioral engagement annotations from trained psychologists to obtain per-second engagement annotations. In Chapter 5, we incorporate continuous annotations in our machine learning framework and compare it to the method described in this chapter. CARMA ratings will provide multiple annotations per frame, which will provide dense labels for sequence learning, thereby enabling us to consider longer temporal sequences and build continuous engagement predictors. A fusion mechanism that considers continuous annotations from external observations to get dense labels and provides validity of engagement through self-reports from trauma subjects is demonstrated (refer to Chapter 5) to provide a “best of both worlds” approach to improving the absolute performance of contextual engagement models.

Chapter 4

Mood-Aware Engagement Prediction

4.1 Introduction

Facial analysis gives strong clues to the internal emotional state of a person [130, 59]. In some scenarios, it is possible to judge the emotion of a person through facial expressions, but the internal mood states of a person cannot be observed through such methods [79]. This effect is elevated in the case of subjects suffering from trauma, where they are often reluctant to express themselves openly. Negative mood has been linked to poorer cognitive and independent functioning in trauma recovery subjects, leading to lower quality of life, higher mortality and greater declines in physical and mental health status [49, 125].

Various psychology studies have shown the effect of mood states on the cognitive abilities of an individual [155, 171, 138]. A series of user studies examined the effects of moods on depression [182], anxiety disorders [88], performance [47], learning-based attention [138], etc. Mood states have also been shown to have significant effects on user behavior [110] and self-evaluation [172, 170]. Since our experiments in Chapter 3 demonstrate that engagement prediction models are contextual, we take this a step further and ask the question: “Does using current mood as context improve engagement prediction for a given task?”. In order to answer this question, we use Profile Of Mood States (POMS) data (see Section 4.3.1) that was collected before and after the module from each subject in the EASE dataset. We also create automated mood disturbance predictors and compare the performance of automated mood models with self-reported moods for engagement prediction.

Automated mood prediction methods from facial videos have huge potential to make a significant impact, not only in trauma recovery but across a wider range of domains. Recently, Katsimerou et al. [120] conducted one of the first studies on automated mood prediction from visual input and noted that “mood estimation from recognized emotions is still in its infancy and requires separate attention.” In the affective computing literature, emotion and mood are often used interchangeably. However, as noted by Ekkekakis et al. [78], these are distinct characteristics of one's affective state. Emotion is characterized by a set of inter-related sub-events concerned with a particular object, person or thing. Mood, however, typically lasts longer than emotions and is generic rather than about a specific object or thing. Mood is heavily influenced by external factors such as environment, physiology, and current emotions. Mood is dynamic and can last minutes, hours or even days.

Moods are emotions over sustained time [80, 125, 79].

[Figure 4.1 diagram: facial feature extraction feeds an LSTM mood predictor for the POMS sub-scales (Tension, Anger, Confusion, Depression, Fatigue, Vigor) and the TMD score, compared against POMS self-reports; the mood estimate then pre-conditions per-context LSTM engagement predictors (engaged/disengaged) via a context switch.]

Figure 4.1: Automated Mood Prediction for Mood-Aware Context Sensitive Engagement Prediction: Trauma patients are often reluctant to express themselves openly and suffer from mood changes that last an extended period, which in turn affects their cognitive abilities. We propose automated mood-aware contextual engagement prediction for trauma subjects. The top part of the figure shows the mood prediction pipeline aimed at predicting the mood of trauma patients from facial videos. The mood estimates are then used to pre-condition learning for context-sensitive engagement prediction models. Temporal deep learning methods are used to learn long-term dependencies to estimate mood and its interplay with contextual engagement.

In this work, we take advantage of recent advances in computer vision and deep learning and show that facial video data captured over sustained periods can reliably predict the mood of a person with sufficient accuracy to use it in engagement prediction. We build on top of our contextual engagement predictors from facial videos (refer to Chapter 3) and develop a framework for mood-aware contextual engagement prediction.

We show the interplay of subject mood and engagement using a very short version of POMS to extend our LSTM engagement model. An overview of the employed system is shown in Figure 4.1. The contributions of this work are as follows:

1. Exploring the relationship of a subject's mood as an initialization parameter for engagement estimation. The user mood state is retrieved from a very short form of the Profile of Mood States (POMS) questionnaire asked before the beginning of the task, explained in Section 4.3. Our experiments demonstrate that contextual engagement models show improvement in engagement prediction performance by combining AUs with POMS data.

2. We explore automated prediction of mood disturbance based on automatically computed AUs and LSTMs. We develop mood prediction models in the domain of trauma recovery. To the best of our knowledge, this is the first work of its kind.

3. We show that contextual engagement models can be enhanced by incorporating automated mood predictions for trauma recovery subjects.

4. We build automated models for mood prediction at sub-scale and total mood disturbance levels, demonstrating the importance of sub-scale modeling for mood prediction.

The assessment of mood is an important indicator for the evaluation of intervention effects. For example, the standard POMS-SF consists of a detailed questionnaire of 37 questions, which some subjects find intrusive and burdening. Subjects suffering from trauma are often distracted, lack focus, or at times are incapable of providing such detailed feedback [77]. Hence, there is a need to develop automated non-intrusive methods for mood prediction. As shown in Figure 4.2, some aspect of mood is significantly changed by each module of Relaxation and Triggers, so if mood is to be used during treatment it would require multiple measurements, which further increases the need for automated mood estimation. Our mood-aware engagement predictor uses the total mood disturbance score, and our analysis compares both mood sub-scale predictors and an overall mood disturbance predictor for engagement prediction. Our experiments show that the mood-aware engagement predictor using our novel visual analysis approach performs significantly better than, or on par with, using self-reports.
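One plausible way to feed a mood estimate into the contextual engagement models — an illustrative assumption, not necessarily the exact conditioning mechanism developed later in this chapter — is to append the (roughly normalized) pre-module TMD score, whether self-reported or predicted from video, to every time step's AU feature vector before training the per-context LSTM:

import numpy as np

def add_mood_channel(au_segments, tmd_scores, tmd_scale=100.0):
    # au_segments: (num_segments, seq_len, num_aus); tmd_scores: one TMD value per segment.
    tmd = np.asarray(tmd_scores, dtype=float) / tmd_scale               # crude normalization (assumed)
    mood = np.repeat(tmd[:, None, None], au_segments.shape[1], axis=1)  # broadcast TMD over time
    return np.concatenate([au_segments, mood], axis=-1)                 # (num_segments, seq_len, num_aus + 1)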

4.2 Related Work

Our work is related to methods and techniques explored in multiple communities such as psychology, affective computing, deep learning, web-based interventions, and trauma recovery. Further, our work draws inspiration from and builds on top of existing research from these areas, which is reviewed in this section.

Mood Assessment and Measures: Affect, expressions, emotions, and mood are related, yet conceptually distinct in terms of the phenomena they represent [80]. Kleinke et al. [125] and Adelmann et al. [3] have demonstrated the effects of facial expression on the mood states of subjects through extensive psychological studies. Studies in psychology (see Ekkekakis et al. [78] for a detailed survey) have also led to the development of various mood assessment measures such as the Profile of Mood States and the Positive and Negative Affect Schedule (PANAS). In the domain of sports physiology, Wang et al. [223] presented a measurement of mood states from physiological signals. Sano et al. [185] presented a system for automatic stress and mood assessment from daily behaviors and sleeping patterns. More recently, Katsimerou et al. [120] proposed a novel framework for predicting mood, as perceived by other humans, from the emotional expressions of a person. The key differences between the approach of Katsimerou et al. and ours are that we predict mood from user self-reports rather than external annotators and, further, we demonstrate the interplay of mood on engagement tasks for web-based trauma recovery.

Engagement Prediction: Engagement prediction from facial video data followed by user inputs (either self-reports or external annotation) has been studied by researchers in the domains of student learning [157, 226, 95] and human-robot interaction [184, 44]. In the domain of student learning, methods typically involve extracting facial features followed by machine learning algorithms to predict student engagement.

In the case of human-robot interaction, the works of Castellano et al. [43] and Salam et al. [184] have shown that engagement prediction is contextual and task dependent. They have demonstrated that additional knowledge of user context (e.g. who the user is, where they are, with whom they are, the task at hand, etc.) can better predict the affective state of the user during Human-Robot Interactions (HRI). In the case of both student learning and HRI, it is assumed that subjects (students) are co-operative and in control of their emotions, which is not typically the case for trauma subjects.

Deep Learning for Affect Detection: Significant advances in deep learning have led to the development of various affect detection methods based on deep learning. More specifically, researchers have applied deep learning techniques to problems such as continuous emotion detection [199], facial expression analysis [221], facial action unit detection [52] and others. Deep learning methods have used video data, sensor data or multi-modal data [228]. As noted earlier, moods are diffused over longer durations than emotions, and hence there is a need to employ methods that can integrate long-term temporal information into the learning framework. Such problems are often modeled as sequence learning problems [99, 97]. To address sequence learning, various methods such as Recurrent Neural Networks, Gated Recurrent Units, and Long Short-Term Memory were proposed and employed in a wide range of problems [100]. In this work, we explore LSTMs (refer to Section 3.3.3), a specialized form of recurrent neural networks, to model long-term mood prediction and mood-aware engagement prediction.

4.3 Experiments

We now describe the mood data from the EASE dataset and the AU computation procedure to extract the intermediate feature representation, followed by the methodology for sequence learning using LSTMs and the various LSTM mood/engagement models. Lastly, we elaborate on the data used for training and testing.

4.3.1 Profile of Mood States (POMS)

The Profile of Mood States-Short Form (POMS-SF) [196, 77] is used to measure reactive changes in the mood of a person. It is a list of 37 items related to depression, vigor, tension, anger, fatigue, and confusion. In the EASE data, the TMD score was calculated from a 24-item questionnaire instead of 37 to reduce the cognitive load on trauma subjects. Participants rate items for how they feel at the moment on a 5-point scale, ranging from 1 (not at all) to 5 (extremely). Each question corresponds to a specific sub-scale of mood, e.g., tension: negative sentiment (5 questions), depression: negative sentiment (6 questions), anger: negative sentiment (5 questions), fatigue: negative sentiment (2 questions), confusion: negative sentiment (2 questions) and vigor: positive sentiment (4 questions). The final TMD (Total Mood Disturbance) level is computed as the difference of the sums of negative n(x) and positive p(x) sentiments:

\mathrm{TMD} = \sum_{x \in \text{negative sentiments}} n(x) \;-\; \sum_{x \in \text{positive sentiments}} p(x) \qquad (4.1)

This self-report is given to the subjects at baseline and immediately before and after each module. Each video consisted of mood state reports before and after the module (pre-POMS report and post-POMS report). Participants also provided self-reports about their engagement level approximately three times (start, middle, and end of the segment) during each session. Figure 4.2 shows the changes in mood at the TMD and sub-scale level in the EASE dataset. Greater TMD scores are indicative of subjects with more unstable mood profiles, and lower scores indicate stable mood profiles.

4.3.2 Facial action units extraction

As noted earlier in Section 2.5.1, the collected dataset consists of a large number of facial videos and engagement self-reports. While there are a number of software packages available for extracting facial landmark points and facial action units (e.g., [63, 114]), we use the recent work on OpenFace proposed by Baltrušaitis et al. [11]. It is an open-source tool which has shown state-of-the-art performance on multiple tasks such as head-pose estimation, eye-gaze estimation, and AU detection. For our work, we primarily focus on the facial action units shown in Figure 3.2.

The AUs extracted consist of both intensity-based and presence-based AUs. We consider all 20 AUs for the experiments in this chapter. The list of AUs used in this work is as follows: Inner Brow Raiser, Outer Brow Raiser, Brow Lowerer (intensity), Upper Lid Raiser, Cheek Raiser, Nose Wrinkler, Upper Lip Raiser, Lip Corner Puller (intensity), Dimpler, Lip Corner Depressor (intensity), Chin Raiser, Lip Stretcher, Lips Part, Jaw Drop, Brow Lowerer (presence), Lip Corner Puller (presence), Lip Corner Depressor (presence), Lip Tightener, Lip Suck, and Blink.
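For illustration, OpenFace writes its per-frame estimates to a CSV file from which the AU features can be loaded in a few lines of Python. The column-name conventions assumed here (intensity columns ending in _r, presence columns ending in _c) and the file name are assumptions for the sketch, not details prescribed in this work.

```python
# Hypothetical sketch: load per-frame AU features from an OpenFace output CSV.
import pandas as pd

def load_au_features(csv_path):
    df = pd.read_csv(csv_path)
    df.columns = [c.strip() for c in df.columns]  # guard against padded column names
    # Assumed convention: intensity AUs end in "_r" (0-5), presence AUs end in "_c" (0/1).
    intensity = [c for c in df.columns if c.startswith("AU") and c.endswith("_r")]
    presence = [c for c in df.columns if c.startswith("AU") and c.endswith("_c")]
    return df[intensity + presence].to_numpy()

# A 30-second segment at 30 fps yields roughly a 900 x 20 feature matrix.
features = load_au_features("segment_0001.csv")  # file name is illustrative
print(features.shape)
```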

[Figure 4.2 appears here: a bar chart of pre/post changes in TMD and each mood sub-scale (Vigor, Anger, Tension, Fatigue, Depression, Confusion) for the Relaxation and Triggers modules in Sessions 1 and 2.]

Figure 4.2: Changes in subject moods pre/post each module in each session. We tested each change for significance with a two-sided paired t-test. In the legend, a * means that the scale/sub-scale showed a statistically significant decrease at the .01 level. Because of the significant changes during each module, one cannot compute TMD once and reuse it across the treatment – we need automated mood estimation.

4.3.3 LSTM Engagement and Mood Models

In this work, we model four variants of engagement and mood prediction using the same basic LSTM cell, as follows (each model is a single-layer LSTM):

1. Engagement multi-class classifier

We optimize the LSTM to predict a discrete set of engagement levels from 1 (very disengaged) through 5 (very engaged) by minimizing the cross-entropy loss between the predicted and actual class after applying a softmax function. Each engagement level is thus treated as a separate and mutually exclusive output. The baseline of contextual engagement in Table 3.2 was recomputed to include the subjects that had POMS data available and therefore has fewer segments than the contextual engagement models in Chapter 3.

[Figure 4.3 appears here: a schematic of video segment selection from a trauma patient's facial video session, showing the 30-second segments used for visual POMS regression around the pre-POMS and post-POMS measurements and the 30-second segments preceding each engagement response E1, E2, E3.]

Figure 4.3: Video segment selection: The process of selecting video segments from a particular video session for training the POMS models and engagement models. Each video session includes a POMS measurement at the start and end of the module. For training the POMS regression models POMS-TMD and POMS-TMDS, we consider two 30-second segments from the pre-POMS and post-POMS measurements. Engagement self-reports are collected at approximately three time-points during the module. Training data for engagement prediction consists of the 30-second segment prior to each engagement self-report. From each of these segments (either for POMS or engagement), AUs are extracted from the video frames and used as the intermediate feature representation for training LSTMs.

2. Mood-aware engagement prediction: POMS-SR

We precondition the basic engagement multi-class LSTM with TMD scores obtained from self-reports. We max-normalize the TMD scores during training and propagate the normalized score along with the intermediate AU representations by adding it to the AU representation (see the code sketch following this list).

3. Mood Disturbance predictor: POMS-TMD

POMS-TMD is a single-output regressor. This LSTM model is optimized to predict continuous POMS-TMD scores in the range 0-24 by back-propagating an L2 loss with L2 regularization: the squared-error term penalizes large errors (outliers) heavily, while the regularization term adds a penalty on the norm of the weights to the loss.

4. Mood Sub-scale predictor: POMS-TMDS

The POMS sub-scale LSTM (POMS-TMDS) is a multi-output regressor. It is optimized to predict the six sub-scales of tension, vigor, depression, anger, confusion, and fatigue, each in the range 1-5, by minimizing the L2 loss (with L2 regularization) using an Adam optimizer with a learning rate of 0.1, trained over 15 epochs; see Figure 4.4 for the choice of epoch count. The estimated sub-scale values are then used to compute a TMD score estimate.
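A minimal sketch of the first two variants is shown below, using Keras purely as a stand-in; the dissertation does not prescribe a specific framework, and the hidden size and optimizer settings here are illustrative rather than the values used in our experiments. The mood-aware variant simply adds the max-normalized TMD score to every AU vector before the LSTM, as described for POMS-SR.

```python
# Illustrative sketch (not the exact experimental configuration): a single-layer
# LSTM engagement classifier over AU sequences, plus POMS-SR-style preconditioning.
import numpy as np
import tensorflow as tf

SEQ_LEN, N_AUS, N_CLASSES = 900, 20, 5   # 30 s at 30 fps, 20 AUs, engagement levels 1-5

def build_engagement_classifier(hidden_units=64):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN, N_AUS)),
        tf.keras.layers.LSTM(hidden_units),                      # single-layer LSTM
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),  # softmax + cross-entropy
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

def precondition_with_tmd(au_sequences, tmd_scores):
    """POMS-SR preconditioning: add the max-normalized TMD score to each AU vector."""
    tmd_norm = np.asarray(tmd_scores, dtype=np.float32) / np.max(tmd_scores)
    return au_sequences + tmd_norm[:, None, None]

# Example with random stand-in data (8 segments, labels shifted to classes 0-4).
aus = np.random.rand(8, SEQ_LEN, N_AUS).astype(np.float32)
labels = np.random.randint(0, N_CLASSES, size=8)
tmd = np.random.randint(1, 25, size=8)
model = build_engagement_classifier()
model.fit(precondition_with_tmd(aus, tmd), labels, epochs=1, batch_size=4, verbose=0)
```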

[Figure 4.4 appears here: training and testing L2 loss curves for the RX and TR modules as a function of the number of epochs.]

Figure 4.4: Convergence Plot: The figure shows the L2 loss when training the LSTM model for mood prediction as a function of epochs. It can be observed that it takes approximately 50-60 epochs for the training and testing losses to converge for both the RX and TR modules. After the convergence point, the model is prone to over-fitting.

4.3.4 Cross-Validation and Train-test pipeline

Our multi-class engagement model uses 20-fold cross-validation with 420 segments of 30 seconds each in the training set and 46 segments of 30 seconds each in the testing set for the Trigger (TR) module. Similarly, for the Relaxation (RX) module, the training set consists of 313 segments of 30 seconds each and the testing set of 35 segments of 30 seconds each. Each segment has a self-reported engagement score as ground truth on a scale of 1-5. This yields AUs from 378K frames for training the engagement-prediction LSTM for the TR module and AUs from 282K frames for training in the RX module. Mood-aware contextual engagement LSTMs were trained per context.

In order to test the generalizability of POMS-aware modeling, we use the more traditional Leave-One-Subject-Out (LOSO) methodology. The baseline LOSO model was built on the Engagement multi-class classifier described in Section 4.3.3. The LOSO mood-aware models used the POMS-SR technique to predict categorical engagement. Both the baseline and mood-aware LOSO methods had data from 90 subjects for TR and 92 subjects for RX.
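For reference, a leave-one-subject-out split can be expressed with scikit-learn's LeaveOneGroupOut by passing the subject ID as the group label. The sketch below is a generic illustration with synthetic data; the array names and the use of scikit-learn are assumptions, not part of the original pipeline.

```python
# Hypothetical sketch of the LOSO protocol: each fold holds out every segment
# belonging to one subject and trains on the segments of all remaining subjects.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

n_segments, n_features = 120, 900 * 20                # stand-in sizes
X = np.random.rand(n_segments, n_features)            # flattened AU sequences (placeholder)
y = np.random.randint(1, 6, size=n_segments)          # self-reported engagement 1-5
subjects = np.random.randint(0, 30, size=n_segments)  # subject ID for each segment

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    held_out = np.unique(subjects[test_idx])
    assert len(held_out) == 1   # exactly one subject is held out per fold
    # ...train on X[train_idx], y[train_idx]; evaluate on the held-out subject...
```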

Our automated mood LSTMs, i.e., POMS-TMD and POMS-TMDS, used 10-fold cross-validation. Each fold had 600 segments in the training set and 67 segments in the test set. As shown in Figure 4.3, two 30-second segments were selected with the same POMS label to increase the effective number of training samples. Each segment was accompanied by a POMS self-report collected before/after the module, as shown in Figure 4.3. TMD scores from self-reports were used to condition the AUs and form the POMS-SR baseline. We formulate the problem of automated mood prediction as a sequence-based single-output (POMS-TMD) or multi-output (POMS-TMDS) regression and train an LSTM to predict a TMD score or the individual mood clusters.

Finally, we used the automated mood prediction pipeline to compute TMD scores from both the POMS-TMD and POMS-TMDS models. These automated TMD scores were used to pre-condition the AUs, and LSTMs were then trained for engagement prediction. The convergence plot for one fold of data for the POMS-TMD LSTM is shown in Figure 4.4. It took approximately 5-7 epochs for both the TR and RX contextual models to stabilize the test loss. After 60 epochs the model starts overfitting; therefore, we selected 15 epochs as a constant for all cross-validation model evaluations, irrespective of the context.

4.4 Evaluation Results

In this section, we discuss in detail the results obtained for engagement prediction across a variety of tasks and its task specificity. The results are summarized in Tables 4.1 and 4.2.

                   RX        TR
LSTM Baseline      0.9989    0.7653
POMS-aware LSTM    0.9493    0.6786

Table 4.1: The effect of POMS on contextual engagement prediction using Leave-One-Subject-Out (LOSO) validation. Augmenting AUs with POMS data shows a clear reduction in RMSE; the difference for TR is statistically significant with a two-sided paired t-test at p=.0007.

Since the engagement scores are ordinal, not categorical, for POMS-aware modeling with the LOSO methodology we report the root-mean-squared error (RMSE). The performance of the POMS-aware engagement predictions is summarized in Table 4.1. Even though the LSTM model was optimized for categorical correctness, we notice a significant improvement in performance by augmenting AUs with POMS data. The POMS-aware engagement model for TR (POMS-TR) showed a significant reduction in error (p=.0007) at the 99% confidence level. Due to the contextual nature of engagement prediction (as shown in Chapter 3), we do not create POMS-aware models for cross-contextual or combined (RX+TR) settings.

Model                                    Trigger             Relaxation
Engagement-baseline                      48.55 ± 17.7 %      42.04 ± 11.7 %
Engagement-Mood Aware (POMS-SR)          52.57 ± 17.2 %      43.88 ± 11.9 %
Engagement-Automated MA (POMS-TMD)       54.04 ± 18.14 %     46.14 ± 14.38 %
Engagement-Automated MA (POMS-TMDS)      55.54 ± 18.43 %     45.42 ± 12.99 %

Table 4.2: The first row shows engagement prediction results without mood conditioning. The second row represents mood pre-conditioning performed with POMS self-reports for engagement prediction. The third and fourth rows show engagement prediction accuracy with mood conditioning performed with POMS-TMD and POMS-TMDS estimates from temporal deep-learning-based mood prediction. We note that the performance of engagement prediction can be improved by pre-conditioning with POMS self-report data. Accuracy is further enhanced by pre-conditioning with visual POMS-TMD estimates, suggesting the effectiveness of automated mood prediction. The performance numbers are obtained using 20-fold cross-validation, with accuracy and standard deviation reported in the table.

Table 4.2 shows the results of the baseline and mood-aware engagement predictors in terms of prediction accuracy. We train separate contextual LSTMs for each contextual model (RX and TR), i.e., for engagement prediction, POMS-TMD, and POMS-TMDS. LSTMs trained with only the AUs from the engagement segments serve as the baseline for all predictions. We obtain 48.55 ± 17.7 % average prediction accuracy across 20 folds for the TR module and 42.04 ± 11.7 % average prediction accuracy for the RX module (margins of error correspond to standard deviations computed over the 20 folds). We noticed significant improvements in performance for the TR module in a two-sided paired t-test (p=.07), with an average prediction accuracy of 52.57 ± 17.21 % for POMS-SR, whereas RX accuracy increased to 43.88 ± 11.9 % but was not statistically different (p=.26), consistent with the findings in Table 4.1. Using the visual estimates from the POMS-TMD regressor increased engagement accuracy significantly for TR at the p=.05 level when compared to POMS-SR, highlighting the effect of using visual POMS rather than self-reports. On the other hand, for RX, although the visual predictors POMS-TMD and POMS-TMDS show improvements in accuracy from the baseline (46.14 ± 14.3 % and 45.42 ± 12.9 %), there is only statistically weak evidence of improvement (p=.19) over POMS-SR.
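For reference, the per-fold significance checks reported here can be run with SciPy's paired t-test; the snippet below is a generic illustration with made-up fold accuracies, not the values from Table 4.2.

```python
# Illustrative paired t-test over per-fold accuracies (numbers are made up).
from scipy import stats

baseline_acc   = [0.46, 0.51, 0.44, 0.49, 0.52]  # baseline accuracy per fold
mood_aware_acc = [0.50, 0.55, 0.47, 0.53, 0.55]  # mood-aware accuracy per fold

t_stat, p_two_sided = stats.ttest_rel(mood_aware_acc, baseline_acc)
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")
```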

In standalone evaluation, the POMS-TMD regressor obtained an average Mean Squared Error (MSE) of 0.03673 with a standard deviation of 0.0153 in mood prediction for the TR module, and an average MSE of 0.03633 with a standard deviation of 0.01227 for the RX module using the POMS-TMDS predictor. The average MSE and standard deviation for mood prediction were computed over 10-fold cross-validation. Evaluating the performance of the visual mood-aware regressors on a standalone basis is not strictly required, since the visual regressors surpass POMS-SR; nevertheless, for completeness, the POMS-TMD and POMS-TMDS results are shown in Figure 4.5 along with the actual scores from the self-reported POMS data. Moreover, the statistically insignificant results for RX could be attributed to the significant changes in mood states during the RX task vs. the TR task, analyzed in Figure 4.2. Such significant changes in mood states during a task strongly suggest the need to build continuous predictors of mood states from video in order to monitor users during an intervention.

4.5 Broader Impact

There are multiple implications of the proposed work. Mood and its relationship with other cognitive abilities have been widely studied in the psychology literature. Collecting questionnaires from subjects is a cumbersome process and, in many cases like trauma recovery, poses an additional burden on the subjects. Automated mood prediction using facial video data provides a scalable platform to study a wide range of behavioral tasks and opens up multiple opportunities in the domains of trauma recovery, elderly care, treatment of mood disorders, and other rehabilitation settings. In the domain of web-interventions, automated mood and engagement prediction provides a mechanism to build adaptive evidence-based treatment [235, 153].

[Figure 4.5 appears here: boxplots of actual (self-reported) and predicted POMS values for the Relaxation and Triggers modules, with TMD and each mood sub-scale on the x-axis.]

Figure 4.5: Automated Mood Predictors: This figure shows the POMS-TMD and POMS-TMDS values for the Relaxation as well as the Triggers modules. The subplots on the left show the mood-scores from subject self-reports and the subplots on the right show the predicted TMD and sub-scale values, with the TMD and sub-scale mood types on the x-axis. Each boxplot shows the mean (green line), the median (orange line), and the inter-quartile range, i.e., the upper (75th percentile) and lower (25th percentile) lines of the box. Notice that even though the predicted values do not cover the full dynamic range of mood-scores, the mean of the actual and predicted POMS is similar at the TMD and sub-scale levels.

Beyond psychological studies, automated mood prediction has applications in other areas as well. In human-robot interaction [184], mood assessment can lead to adapting robot behavior based on the subject's affective state. In the domain of multi-player games [234], one can start addressing questions such as: do a player's actions reveal a friendly or aggressive mood towards the other players, and can we use the player's actions to predict subsequent actions? In the domain of computer-aided instruction [226], understanding mood and engagement in an automated and scalable way can help devise better learning tools or personalized instruction mechanisms. In workplaces, understanding performance during job interviews, analyzing employee stress, and assessing the performance of call-center employees during long conversations can be significantly improved by automated mood and engagement prediction methods. In the domain of computational advertising [148], automated mood-aware engagement prediction methods can help create more sophisticated tools to better align the advertising needs of users and content creators. In recent years, a number of mobile phone applications such as Pacifica, Mindshift, Moods, Mymoodtracker, and others have paved the way for automated tracking and analysis of mood based on self-reported mood data. There is significant potential for methods based on facial analysis and temporal modeling to aid such automated mood-tracking mobile phone applications.

4.5.1 Robust Mood and Mood-Aware Engagement Models

In this work, we presented a method for automated detection of total mood disturbance and mood sub-scale prediction from facial action units and LSTMs. Mood is typically distinguished from emotion based on its duration (mood tends to last longer); hence, powerful sequence learning algorithms such as LSTMs were well suited for this task. Our experiments show that mood prediction using visual estimates of POMS-TMD and POMS-TMDS performs better than or at par with self-reported TMD. We used the detected mood to aid engagement prediction models. The mood-aware engagement models presented in this work (developed using only video data) outperformed the baseline engagement models. A natural extension of this work would be enhanced mood prediction models that incorporate contextual information. A detailed analysis of video data is warranted for understanding phenomena such as the relationship between mood and arousal, mood shifts, and person-specific temporal associations of facial expressions with mood. Mood prediction models should incorporate additional information such as situational context, scene semantics, and personal behaviors to gain an in-depth understanding of the relationship between facial expressions and the associated mood predictions.

The mood-aware engagement models presented in this work were developed by pre-conditioning the AUs with the POMS-TMD score to normalize the AU representation. Pre-conditioning with mood data was helpful in this work because mood self-reports were available at the start of the session. However, as we move towards developing automated mood prediction methods, there is a need to continuously condition and update AUs using information from mood classifiers running on the video data. Further, this conditioning should incorporate long-term mood shifts into the engagement prediction models for more reliable predictions. The pre-conditioning technique used in this work relied only on the pre-POMS for conditioning AUs. In actuality, subjects do not change their mood like an on/off switch; continuous mood prediction should be explored for web-based trauma recovery. The conditioning on pre-POMS could be improved by using the latest advances in reinforcement learning to add non-time-series meta-data for temporal sequential decision making [119, 219]. Post-POMS could be incorporated as a high-level reward for better vector representations. In deep reinforcement learning, state-of-the-art results have been achieved by directly maximizing cumulative reward using high-level spatio-temporal representations for end-to-end learning [113]. Finally, there is a need to develop a framework for conditioning a variety of affective/emotion features with mood data, not just AUs, to develop truly scalable and robust mood-aware affect prediction models.

Chapter 5

Automated Action Units vs. Expert Raters

5.1 Introduction

The exploration of new research avenues in the fields of computer vision and affective computing has leveraged tools and techniques from the field of crowdsourcing [151, 200]. The use of platforms like MTurk, CrowdFlower, etc., to obtain external annotations for large-scale data mining is now commonplace. Most affective computing applications rely on expert raters to obtain continuous labels of affective and cognitive states [41, 181, 38]. Simultaneously, recent research [108] has shown that it is feasible to deploy lightweight CNN-based architectures on real-time video streams for affect detection. The current gold standard for measuring user engagement is estimated from self-reports. Questionnaires are simple and inexpensive, but they distract the user from the task at hand or increase cognitive load, as noted in Chapter 1. Obtaining self-reports is problematic, especially for people suffering from mental disorders whose symptoms include, but are not limited to, a reduced ability to concentrate, trouble understanding, and frequent mood swings. Automated continuous prediction of engagement is therefore of utmost importance to numerous applications. In this chapter, we build on the prior machine-learning Automated AUs models developed in Chapter 3 and compare them to engagement models built on expert perceptions. We also build a novel model that combines the “best of both worlds” (see the reasoning in Section 3.6.2).

In recent years, psychologists have turned to web/mobile-based interventions as an effective way to provide treatment and therapy for a number of mental health challenges such as depression, mood management, anxiety disorders, trauma recovery, and others [8]. A number of commercial solutions, such as those offered by Pacifica and Ginger.io [2, 1], have been developed for stress/anxiety management and emotional well-being as an effective route to sustainable mental health. These solutions help patients monitor their progress towards recovery through interactive questionnaires about their well-being (mood, stress, sleep patterns, etc.) and by providing remote psychologist and peer-community support. Such self-support tools cannot be too generic for trauma recovery and need to autonomously adapt to the patient's needs based on their mental and physical state [235]. Although ample evidence exists for the clinical effectiveness of web/mobile interventions across a wide base of interventions, in the case of trauma subjects, who often experience emotional detachment and disengagement from mundane tasks, effective engagement with these interventions remains a significant concern [156].

As noted earlier in Section 3.6.2, incorporating dense labels from external observer annotations has the potential to outperform existing automated engagement estimation models (developed in prior work [66]). The ground truth for engagement prediction consists of either sparse labels obtained from self-reports or continuous annotations (observational estimations) from expert external human observers. Self-reports, questionnaires completed by subjects to report their level of engagement, provide accurate feedback about the subject's well-being. However, repeated questionnaires impose a cognitive burden on trauma subjects. An alternative approach is to obtain continuous annotations from trained psychology experts who use facial and behavioral cues to make judgments about a subject's engagement [91]. These judgments are standardized across observers through the use of measurement tools [89]. Training psychologists to make such judgments is an iterative, cumbersome, and labor-intensive process. Trauma subjects, depending on their condition, can either be very expressive or tend to suppress emotions, making the video annotation process tiring, difficult, and costly for raters. Furthermore, estimating inter-observer reliability is not always straightforward and can lead to significantly varying annotations (see Figure 5.1).

[Figure 5.1 appears here: two pipelines for estimating perceived engagement from an application user's face video, a human-rater annotation pipeline with a training phase and an automated pipeline that extracts upper- and lower-face action units.]

Figure 5.1: Designing effective applications involves understanding the engagement levels of the target population. Machine learning models that estimate user engagement have two choices: learn from external annotations of perceived engagement provided by expert raters or learn vision-based models from facial expression representations. The top pipeline in the figure represents the expert raters' annotation process. Raters undergo a training phase to formulate rules for the modality being measured, e.g., behaviors indicative of engagement/disengagement. Despite rigorous training efforts, the raters do not always agree on engagement levels. The bottom pipeline is an automated process using facial feature extraction in the form of Action Units (AUs). We can directly estimate engagement from the AUs or use the AUs to estimate the expert rater scores. This work compares expert raters to automated AUs for estimating user engagement and also proposes a new scalable approach to combine the best of both worlds.

The contributions of this chapter address these fundamental questions:

• How robust are annotations obtained from expert human observers?

• How do human observers fare compared to machine learned features for engagement prediction?

• How can we develop vision systems that can learn and use inter-observer variability in prediction?

5.2 Related Work

This work lies at the intersection of multiple emerging areas which we briefly review and compare in this section.

Visual Engagement Prediction: Predicting subject engagement from facial data has been an active area of research in the computer vision and affective computing literature in recent years. Monkaresi et al. [157] developed methods for detecting engagement when subjects were performing a structured writing activity. Hernandez et al. [106] developed a facial-feature and head-gesture-based method to measure the engagement of viewers while watching television. McDuff et al. [151] explored the role of emotions and engagement for media applications to understand the effectiveness of video advertising. Whitehill et al. [226] developed various student engagement methodologies by analyzing facial expressions with machine learning techniques. They developed binary engagement detectors to estimate high/low engagement levels in video segments (monitoring students) and found the performance of the system comparable to humans. In the recent past, Kamath et al. [118] proposed a crowdsourced discriminative learning approach/system for e-learning and estimated student engagement. Almost all of the aforementioned techniques have been confined to student learning or advertising. Most works also build a machine learning approach on a chosen set of features (either vision-based or from external annotations), and no comparison is made between perceived engagement and automated vision-based techniques. Unlike student learning, in trauma recovery applications subjects often do not have control over their emotions and suffer from varying degrees of disengagement, raising unique challenges.

Multi-Rater Annotations: Supervised learning methods typically use a label from a single annotator per training sample. When working with perceived emotions, it is common practice to get feedback from multiple annotators to avoid bias. However, this process often leads to noisy labels with varying degrees of inter-rater agreement. Raykar et al. [179] presented a probabilistic approach for learning with data from multiple annotators and proposed an algorithm that evaluates the labels obtained from different annotators and also gives an estimate of the hidden labels. Welinder et al. [225] presented an approach to model multiple non-expert annotators as a multidimensional entity with variables representing competence, expertise, and bias. Our work involved psychology research assistants who underwent rigorous training for this task (experts); hence, such a model would not be applicable. In bioinformatics, Valizadegan et al. [216] presented a consensus approach for learning with multiple annotators, wherein a separate model was trained from each annotator and the results were fused to obtain a consensus model. In such approaches, it is hard to estimate the source of variance [166]. In the domain of affective computing, continuous measurement of perceived emotions (annotations) was popularized by Gottman et al. [94] with the introduction of the “affect rating dial”. Raters were instructed to turn the knob clockwise or counter-clockwise to report their affective experience. More recently, a number of annotation tools have been developed to collect continuous ratings, as proposed by Brugman et al. [35] (ELAN), Kipp et al. [124] (ANVIL), Nagel et al. [165] (EmuJoy), and others. The dataset we used for this work consists of audio, video, and physiological signals (only the video signals are considered here). For the EASE dataset that we use in this work (details in Section 5.4), an annotation tool developed by Girard et al., called Continuous Affect Rating and Media Annotation (CARMA), was used. The tool is based on the work of Gottman et al. CARMA is a media annotation program that collects continuous ratings from observers with the flexibility of using data from multiple modalities, and hence was suitable for our data collection (see Figure 5.2 for details on the annotation process).

5.3 Sequential learning model using RNN: automated, expert raters

As demonstrated in Table 3.4, Recurrent Neural Networks (RNNs) perform best among the sequential algorithms evaluated for engagement prediction. RNNs are well suited for problems involving sequential information and in recent years have found relevance in a wide range of tasks, including machine translation, video classification, natural language processing, image captioning, and visual question answering. RNNs learn a representation from a sequence of input vectors [191, 93]. Engagement prediction from facial videos is inherently a sequence prediction task: there is a series of facial expressions (characterized by facial action units) that the system sees, and it has to make predictions about the level of engagement. Hence, in this section we modify, adapt, and build on the prior Automated AUs models (from Chapter 3) to develop algorithms for engagement prediction with RNNs.

An RNN is a neural network that maintains a hidden state h and operates on a variable-length sequence x = (x_1, \ldots, x_T). At each time step t, the hidden state h_t of the RNN is updated by:

h_t = \theta\,\phi(h_{t-1}) + \theta_x x_t \qquad (5.1a)

y_t = \theta_y\,\phi(h_t) \qquad (5.1b)

where φ is the non-linear activation function and y is the target output unit. The activation function φ may be as simple as an element-wise logistic sigmoid function or as complex as a long short-term memory (LSTM) unit [86]. In this work, the input sequence x is composed of either facial action units x_{AU} or expert ratings x_{HA}. The target output y for a given video segment consists of the self-reports obtained from trauma recovery subjects.

During back-propagation through the recurrent units, the derivative at each node depends on all of the nodes processed earlier. To compute \partial h_t / \partial h_k, a series of multiplications from k = 1 to k = t - 1 is required. Assume that \dot{\phi} is bounded by \alpha; then \left\| \partial h_t / \partial h_k \right\| < \alpha^{t-k}.

\frac{\partial E}{\partial \theta} = \sum_{t=1}^{S} \frac{\partial E_t}{\partial \theta} \qquad (5.2)

\frac{\partial E_t}{\partial \theta} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial y_t} \frac{\partial y_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial \theta} \qquad (5.3)

\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \theta^{T} \, \mathrm{diag}\!\left[\dot{\phi}(h_{i-1})\right] \qquad (5.4)

where E_t is the loss at the t-th time step. A solution to this vanishing/exploding gradient problem is to use gated structures; the gates can control the back-propagation flow between nodes [86, 215, 93].
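To make Equations 5.1a-5.1b concrete, a small NumPy forward pass of this plain recurrent cell is sketched below; the dimensions and the choice of tanh for φ are illustrative assumptions.

```python
# Toy forward pass of the recurrence in Eqs. 5.1a-5.1b (illustrative dimensions).
import numpy as np

d_in, d_hidden, d_out, T = 14, 32, 1, 30    # e.g., 14 AUs per step, 30 time steps
rng = np.random.default_rng(0)
theta   = rng.normal(scale=0.1, size=(d_hidden, d_hidden))  # recurrent weights
theta_x = rng.normal(scale=0.1, size=(d_hidden, d_in))      # input weights
theta_y = rng.normal(scale=0.1, size=(d_out, d_hidden))     # output weights
phi = np.tanh                                               # non-linear activation

x = rng.normal(size=(T, d_in))   # input sequence, e.g., AU vectors x_AU
h = np.zeros(d_hidden)
for t in range(T):
    h = theta @ phi(h) + theta_x @ x[t]   # Eq. 5.1a
    y = theta_y @ phi(h)                  # Eq. 5.1b
print(y)  # output after the final time step
```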

5.4 EASE Subset & Annotations

For this work, we use a subset of the original EASE dataset for which expert rater annotations are available (the Triggers module). Expert annotation by psychologists is a multi-step process [91]. In the first phase, graduate students/post-doctoral fellows with a background in clinical psychology undergo training to understand the labeling process. As the goal of the process is to annotate perceived engagement by external observers, annotators were asked to label "How engaged does the subject appear?". Annotators also formulated guidelines on how to perceive engagement. The guidelines stated that if the subject showed expressions like “brow furrow”, “squinting focus”, “fast reading”, “head scanning”, etc., these were signs that the subject was engaged. Similarly, signs like “gaze avoidance”, “fidgety movement”, “dropping eyes”, and “closing eyes when tasks do not require” were signs of the subject being disengaged. These guidelines were created by experienced clinical psychologists who work with trauma subjects. Following this familiarization process, annotators rated engagement levels between -100 (very disengaged) and 100 (very engaged). The annotations were obtained by post-analysis of video sequences (rather than live annotation) due to ease of collection.

Figure 5.2: CARMA tool: The figure illustrates the tool used for obtaining expert annotations from trained psychologists on the EASE Dataset [66]. Expert raters are presented with a video feed of a trauma subject (shown above) undertaking a particular task. Annotators provide a rating between -100 (very disengaged) and +100 (very engaged) to record the perceived engagement levels of the trauma subject. The ratings are recorded as continuous annotations sampled on a per-second basis. Note: the face of the subject is blurred due to IRB restrictions.

We ensured that not all annotators rated the same set of videos, to avoid bias in the annotators' perceived engagement. The annotations were obtained every second. When labeling videos, the audio was turned off, and raters were instructed to label engagement based on appearance alone. As can be inferred, this is a cumbersome, costly, and painstakingly slow process due to the challenges associated with recruiting expert annotators, training them, and labeling a large corpus of data. In this work, we use 6K+ annotations from a total of 183K+ frames (see Table 5.1).

Dataset Details                 No. of Instances
Subjects                        54
Self-Reports                    204
Videos                          65
Frames                          183.6K
Annotations                     6120

Distribution of Self-Reports
Very Engaged - 5                51
Engaged - 4                     94
Neutral - 3                     50
Disengaged - 2                  5
Very Disengaged - 1             4

Table 5.1: Data distribution of the EASE subset: User face videos and external annotations were available from 54 trauma subjects. There were a total of 64 videos from Sessions 1 and 2 combined. In all our experiments we use the 30 seconds of video prior to each engagement response, making a total of 183.6K video frames at 30 fps. Expert raters annotated the videos on a per-second basis, resulting in the 6120 annotations used in our work. Engagement responses were also obtained from 204 self-reports on a scale of 1-5, from 1 being “Very Disengaged” to 5 being “Very Engaged”. The bottom part of the table shows the distribution of responses based on self-reported user engagement.

5.5 Machine Learning Models

We now describe the methodology adopted to develop models trained on data derived from expert raters and from machines (facial action units). Training data from the expert raters was obtained from the engagement ratings described in the previous section; the data is obtained at 1 rating per second. The machine-derived data for comparison was obtained by extracting facial action units from the video frames of the EASE subset. We use the recent work on OpenFace [11] proposed by Baltrušaitis et al. to extract facial action units. In our prior work on contextual engagement [66] and mood-aware engagement [67], the AUs extracted from OpenFace consisted of both intensity-based and presence-based AUs, making a total of 20 feature dimensions. For all automated models in this work, we focus only on the intensity-based AUs to reduce the effect of combining categorical and numeric features. This reduces our input feature dimensions from 20 to 14. Intensity-based AUs are generated by OpenFace on a 0 to 5 point scale (not present to present with maximum intensity). The list of AUs used in this work is as follows: Inner Brow Raiser, Outer Brow Raiser, Brow Lowerer, Upper Lid Raiser, Cheek Raiser, Nose Wrinkler, Upper Lip Raiser, Lip Corner Puller, Dimpler, Lip Corner Depressor, Chin Raiser, Lip Stretcher, Lips Part, and Jaw Drop. AUs are extracted at 30 fps.

[Figure 5.3 appears here: three stacked plots for a single subject, the 14 intensity-based AUs (AU01-AU26) over time, the perceived engagement from three expert raters with their average and the self-report markers, and the inter-rater variance over time.]

Figure 5.3: The figure shows the automated AUs, rater annotations, and subject self-reports for a single subject from the EASE dataset. The top plot shows the 14 intensity-based action units extracted from OpenFace; the y-axis is the intensity of each AU. The middle plot shows the perceived engagement from 3 expert raters and their average; the vertical dotted lines mark the user self-reports. Since the self-reports are on a scale of 1-5, which differs from the expert rater annotation range of -100 to +100, we cannot plot them in the same manner. All three self-reports submitted by the subject in this plot were “5”, and the video segment is from a highly engaged subject. The variance of the three raters is shown in the bottom plot. The x-axis in all plots is time in seconds. These plots are obtained from a 7.3-minute video segment. Notice how the inter-rater variance changes radically.

We represent both the human annotations x_{HA} and the action units x_{AU} obtained from facial data as a series of inputs, and the associated subject engagement self-reports y as target outputs. Hence, we model it as a sequence learning problem and develop a recurrent neural network based learning framework. We conduct a number of experiments (described below) to gain deeper insight into the relationship between models learned from machine-extracted features and from expert human ratings. In order to have a fair comparison of all four models mentioned below, we use only the 30-second segments prior to each self-reported engagement for feature extraction.

Raters Average: As noted earlier, three ratings were obtained for each video. We compute the average over the three ratings and train a sequential RNN model as a regressor over the self-reported engagement levels to learn temporal dependencies. Following this process, we obtain a model learned purely from expert annotations.

Raters Average + Variance: We extend the Raters Average model by augmenting it with the variance computed at each instant, along with the average over the annotations. We then learn a multi-input (average & variance), single-output RNN model to predict the self-reported engagement. We term this approach Raters Average + Variance.

Automated AUs: We use the facial action units extracted from the video frames of trauma subjects as multi-dimensional inputs and train a sequential RNN model as a regressor over the self-reported engagement levels. The model is trained to integrate long-term temporal variations in facial expressions via the facial action units. We term this approach Automated AUs.

Facial Average and Variance Estimations (FAVE): We extend the Automated AUs approach into a two-step approach for engagement self-report prediction. In the first step, we use the multi-dimensional action units to predict the continuous ratings obtained from the expert annotations, more specifically the ratings' average and variance; we train RNN models to learn temporal dependencies and jointly predict the expert raters' average and variance. In the second step, we use these “machine-estimated” human annotations (the average and variance ratings obtained from the first step) and train a regressor over the engagement self-reports for each subject.
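A schematic sketch of this two-step pipeline is given below. It uses Keras with simplified shapes as a stand-in for the actual implementation; the layer sizes, the joint 2 x 30 output of the first step, and the use of SimpleRNN cells are assumptions made only to show the flow from AUs to estimated rater statistics to the self-report prediction.

```python
# Illustrative two-step FAVE-style pipeline (shapes and layer sizes are assumptions).
import numpy as np
import tensorflow as tf

SEQ_FRAMES, N_AUS, SEQ_SECONDS = 900, 14, 30  # 30 s of AUs at 30 fps; 30 per-second ratings

# Step 1: from the AU sequence, jointly regress the raters' per-second average and variance.
au_to_rater_stats = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_FRAMES, N_AUS)),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(2 * SEQ_SECONDS),   # 30 averages and 30 variances
])
au_to_rater_stats.compile(optimizer="adam", loss="mse")

# Step 2: from the estimated per-second (average, variance) pairs, regress the self-report.
stats_to_self_report = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_SECONDS, 2)),
    tf.keras.layers.SimpleRNN(16),
    tf.keras.layers.Dense(1),
])
stats_to_self_report.compile(optimizer="adam", loss="mse")

# Example wiring with random stand-in data (both steps would be trained in practice).
aus = np.random.rand(4, SEQ_FRAMES, N_AUS).astype(np.float32)
est = au_to_rater_stats.predict(aus, verbose=0).reshape(4, SEQ_SECONDS, 2)
pred_self_report = stats_to_self_report.predict(est, verbose=0)
print(pred_self_report.shape)   # (4, 1): one engagement estimate per segment
```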

[Figure 5.4 appears here: block diagrams of the four models, RA (raters' average, 30 x 1 input), RAV (raters' average & variance, 30 x 2 input), AUs (facial action units, 900 x 14 input), and the two-step FAVE model, each trained to predict the engagement self-reports.]

Figure 5.4: Machine Learning Models: The four machine learning models are illustrated along with the input and output dimensionality of each. The inputs to all models are obtained from 30-second video sequences at 30 fps for each self-report (204 in total). The two models at the top, RA and RAV, are trained with expert annotations using the raters' average and the raters' average & variance, respectively; the expert annotations are on a per-second basis. The AUs model was trained with automated intensity-based AUs extracted from OpenFace at the frame level. The FAVE model was a two-step model that learned to estimate the expert annotators' average and variance from the automated AUs and used the estimated average and variance for the final predictions. The final prediction target for all four models is the user self-report.

5.6 Model Training/Tuning

Deep learning models like RNNs have several hyper-parameters that can impact performance. We analyze three basic hyper-parameters, namely batch size, number of epochs, and learning rate. The selection of batch size is especially important because of the EASE subset engagement self-report distribution shown in Table 5.1: a small batch size may learn subject-specific bias during training, and a large batch size might cause overfitting on the entire dataset. Similarly, if the learning rate is too small the learning algorithm may get stuck in a local minimum, and if it is too high the algorithm may bypass the local minima and never converge. In order to find an effective batch size and learning rate, we first split the 204 self-reports randomly to create a training set comprising 90% of the total samples (184), with the remaining samples in the validation set. We perform a grid search on all four learning models over batch sizes of 20, 40, 60, and 80 and learning rates of 1, 0.1, 0.01, 0.001, and 0.0001, creating convergence plots of the training and validation L2 losses over 1500 epochs. For the three models Raters Average, Raters Average + Variance, and Automated AUs described in Section 5.5, we found that the algorithms converged for a batch size of 40 samples and a learning rate of 0.1, with the training and validation losses stabilizing after approximately 5-7 epochs. Since our hypothesis concerns comparing expert raters to machine-generated features, we keep the model parameters constant for all four models. We select 15 epochs for all models; at 15 epochs the L2 validation loss was at its minimum and stable. The constant model parameters enable us to compare the effects of the feature inputs directly rather than study algorithm effects.
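The grid search itself is straightforward; a hypothetical sketch of the loop over batch sizes and learning rates is shown below, with a toy model and synthetic data standing in for whichever of the four models is being tuned (the epoch count is truncated here, whereas the text above uses 1500 epochs).

```python
# Hypothetical grid search over batch size and learning rate (synthetic stand-in data).
import itertools
import numpy as np
import tensorflow as tf

def build_model(learning_rate):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(30, 2)),   # e.g., 30 per-second (avg, var) inputs
        tf.keras.layers.SimpleRNN(16),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

X_train, y_train = np.random.rand(64, 30, 2), np.random.rand(64, 1)
X_val, y_val = np.random.rand(16, 30, 2), np.random.rand(16, 1)

results = {}
for batch_size, lr in itertools.product([20, 40, 60, 80], [1, 0.1, 0.01, 0.001, 0.0001]):
    model = build_model(lr)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     batch_size=batch_size, epochs=5, verbose=0)   # 1500 epochs in the text
    results[(batch_size, lr)] = min(hist.history["val_loss"])

print("best (batch size, learning rate):", min(results, key=results.get))
```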

[Figure 5.5 appears here: training and validation L2 loss curves over 100 epochs for training batch sizes of 1000, 1500, and 2000.]

Figure 5.5: Model tuning for FAVE: The plot shows the training and validation loss for 100 epochs and training batch sizes of 1000, 1500, and 2000. The L2 loss stabilized at a learning rate of 0.001. Notice that the lowest training loss for all three batch sizes is approximately the same, while the lowest validation loss is obtained for a batch size of 1000. Therefore, for our FAVE model, we selected 100 epochs and reported results for a batch size of 1000.

As mentioned in Section 5.5, the FAVE model is a two-step process. The first step of this model had a total of 6120 samples, i.e., the number of expert-rated annotations listed in Table 5.1. The convergence plot for this model with batch sizes of 1000, 1500, and 2000 and a fixed learning rate of 0.001 is shown in Figure 5.5. Notice that even though the training and validation errors do not converge for batch sizes of 1500 and 2000, the L2 loss converges for a batch size of 1000 around 15 epochs. As an initial test, we tried to jointly estimate the raw average and variance of the expert raters using automated AU data extracted from 30 frames, i.e., 1 second. However, we noticed that the model learned the bias of the ground-truth labels. This finding was not surprising, as the engagement estimation for self-reports was an unbounded regression problem and the range of the ground truth was large (-100 to +100). As a next step, we normalized the raters' average by the maximum value of the expert rater annotations, such that the estimates lie in the range of -1 to +1 and the variance in the range of 0 to 1. Once all the model parameters are fine-tuned and fixed, we use the Leave-One-Subject-Out methodology to evaluate the results from all four models.
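The normalization described above amounts to dividing by the relevant maxima; a short illustration with hypothetical arrays:

```python
# Hypothetical per-second rater statistics, normalized as described above.
import numpy as np

MAX_RATING = 100.0                                        # raters annotate in [-100, +100]
rater_avg = np.random.uniform(-100, 100, size=(204, 30))  # per-second rater averages
rater_var = np.random.uniform(0, 600, size=(204, 30))     # per-second rater variances

avg_norm = rater_avg / MAX_RATING          # now in [-1, +1]
var_norm = rater_var / rater_var.max()     # now in [0, 1]
```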

5.7 Results

Because we have very limited training/testing data, we used the Leave-One-Subject-Out (LOSO) cross-validation methodology for computing results and report the mean squared error (MSE) along with the standard deviation (SD) between the predicted engagement levels and the ground-truth self-reported engagement levels. That is, for each of the four different machine learning models, we train 54 different regressors, using each to predict the results on the left-out subject's samples. The results are summarized in Table 5.2.

The Raters Average model used the average annotations obtained from 3 raters to predict self-reports. With an MSE of 1.1499, it was predicting, just not all that well: prediction using average expert rater data was the worst-performing model in terms of MSE and SD.

Perceived observer annotations vary significantly from each other depending on how annotators interpret the suggested guidelines for annotations and facial expressions. This variation was illustrated back in Figure 5.3, which shows the raw annotations, average, variance, and facial action units for a sample video from the EASE dataset. In the case of the Raters Average + Variance model, the system was trained using the average expert annotations and the corresponding annotation variance to predict self-reported engagement levels. The model trained using rater mean and variance performs significantly better than Raters Average, probably because it can use the inter-rater variance and the associated uncertainty to weight the prediction in its sequential representation. It can naturally learn to place greater emphasis on samples where the raters agree.

For the Automated AUs model, 14-dimensional inputs per frame over 30 seconds of data provide a high-dimensional input from which the RNN learns to directly predict the engagement self-reports. We note that prediction using the machine-extracted features performs statistically significantly better (p=0.02) at predicting engagement levels than the expert-annotator models; thus, the costly and labor-intensive annotation process may be unnecessary. We also note the reduction in the SD of the results, suggesting more stable predictions.

Finally, noting these improvements in engagement prediction performance from machine-extracted features, and also the improvement from using mean and variance, we explored combining the two ideas, expecting improved performance. We developed a two-step process: first estimate the mean and variance of the annotations from the action units, then use these annotation predictions (average and variance) as inputs to predict engagement self-reports with the FAVE model. We note the major reduction in MSE and SD over both the Raters Average + Variance and Automated AUs models. Moreover, statistical evidence (p=0.025) from a paired one-tailed t-test shows that the combined model is better than the basic Automated AUs model. Why does this happen, and what are the implications of these results? First, the total number of learned parameters is smaller for the two-stage model; hence there is a lower risk of overfitting. Second, there is the possibility that machines are more consistent at mapping the same set of facial expressions to a particular engagement level. This consistency propagates into the average & variance of the estimated annotations, which in turn leads to more stable engagement predictions. With each associated engagement level, the model is able to predict the uncertainty (variance) in the corresponding engagement level. Hence, a two-step process such as the one investigated for FAVE is ideal for scaling annotations to a large number of video segments without having to annotate each video with expert annotators. Furthermore, it lays a foundation to explore active learning techniques [220, 68] to aid in making the annotation process more consistent by developing annotator-specific adaptive active learning [137].

Machine Learning Model        Average MSE    Standard Deviation
Raters Average                1.1499         1.6610
Raters Average + Variance     0.8240         1.3583
Automated AUs                 0.8131         1.2994
FAVE                          0.7554         1.2976

Table 5.2: This table summarizes the results obtained from the experiments described in Section 5.5 using the Leave-One-Subject-Out (LOSO) methodology. The first column represents the inputs to the temporal sequence learning model for user engagement estimation. The second and third columns show the average mean squared error and standard deviation, respectively. Rows 1 and 2 are models created from expert rater annotations, and rows 3 and 4 from automated action units. Automated AUs shows a significant reduction in error from Raters Average and Raters Average + Variance using a paired two-tailed t-test (p=0.02).

5.8 Discussion

As noted earlier, collecting annotated data for trauma recovery applications is extremely expensive. The data collection process adds a significant cognitive burden on subjects, and the rater recruitment process is very cumbersome. The annotation process involves finding a large population of subject-matter expert annotators, training them, and ensuring the quality of the data. Furthermore, due to the need for a specialized setup, the costs, and the associated IRB processes, such data collection can be done only at approved labs. Hence, there is a need to develop systems that will learn, scale, and maximize usability from limited annotated data. While we show these ideas in the context of a trauma recovery application, they should apply to a wide range of vision-based affective computing.

In this work, we used only the AUs extracted from the 30 seconds prior to each self-report, and likewise expert ratings from the same 30 seconds, for a fair comparison. We noticed performance saturation beyond 30-second video segments with RNNs, and hence there is a need to explore more sophisticated temporal deep learning methods that can learn over longer time durations. Hierarchical and multi-scale temporal representations like the ones proposed by Chung et al. [54] are promising next steps to develop more accurate, flexible, and scalable engagement estimators. Beyond representations, there is a need to develop better annotation tools that reduce the cognitive burden on annotators by taking into account rater reaction time and rater variance correlated with action units [193].

We presented an engagement prediction system that can learn from expert-generated engagement annotations and machine-generated facial data. Our results show that automated systems built on extracted facial action units perform better than models built on expert-annotated data. We captured the non-agreement or confusion of raters and used it to build a more accurate engagement estimator.

Further, we enhanced the estimator with an associated prediction uncertainty by incorporating the predicted variance for the given video segment. For a fair comparison in training, all our models used the same temporal window of data. However, we have significantly more expert-labeled data; the power of the FAVE model could be advanced even further by using more training data than the 30-second segments selected in this work. The increase in training data has strong potential to create a better engagement estimator.

Surprisingly, our results also revealed that combining expert raters' annotations with the automated action units in a two-step process has the potential to estimate user engagement better. A possible reason for the enhanced performance of FAVE could be the reduction in input feature dimensions: the Automated AUs model had an input feature dimension of 900 for each input sample, while the FAVE model had 30 features for a single sample, a drastic reduction from Automated AUs. Fusion techniques like FAVE, which estimate expert annotations from automated AUs, can help scale to different datasets where only self-reports are available, decreasing the cost of training raters and annotation time [74]. Scheirer et al. [187] proposed a system for perceptual annotations to leverage the abilities of human subjects to build better machine learning systems. This work could be extended to instead incorporate rater confusion and non-agreement in the loss functions of the learning system to build more accurate engagement estimators.

Chapter 6

Self-Efficacy: Problem Formulation and Computation

“People who regard themselves as highly efficacious act, think, and feel differently from those who perceive

themselves as inefficacious. They produce their own future, rather than simply foretell it [14]” - Albert Bandura

6.1 Introduction

Psychologist Albert Bandura defined self-efficacy as the belief in one’s own capabilities to achieve intended

results [19]. Perceived self-efficacy can play a major role in how one approaches goals, tasks and challenges

[12]. People’s beliefs in their coping capabilities affect the amount of stress and depression they experience in

difficult situations, as well as their level of motivation. Perceived self-efficacy to exercise control over stressors

plays a central role in anxiety arousal and performance. The most effective way of creating a strong sense

of efficacy is through mastery experiences [18]. After people become convinced they have what it takes to

succeed, they persevere in the face of adversity and quickly rebound from setbacks [12].

The current gold standard of perceived self-efficacy assessment is via questionnaires [21]. Judgments of perceived self-efficacy are generally measured along three basic scales: magnitude, strength, and generality [19]. Self-efficacy magnitude measures the difficulty level an individual feels is required to perform a certain task (e.g., easy, moderate, and hard). Self-efficacy strength refers to the amount of conviction an individual has about performing successfully at diverse levels of difficulty. Generality of self-efficacy is the degree to which the expectation is generalized across situations. The work of Bandura further states that psychological procedures, in different forms, alter the level and strength of self-efficacy [12].

Based on the theory of SCT mentioned in Section 1.3, Benight et al. did seminal work in the domain of trauma recovery and defined psychometric measures of Coping Self-Efficacy for trauma (CSEt). The work of Benight et al. enabled the creation of a self-help website that targets mastery through the foundations of SCT [26]. CSE beliefs refer to an individual's beliefs about one's ability to cope with external stressors, challenges, and uncertainty in stressful situations [32]. Because such beliefs determine the power a person believes he or she has to affect situations, they strongly influence both the power a person actually has to face challenges competently and the choices a person is most likely to make. CSEt beliefs have been shown to be one of the strongest predictors of psychological and physical outcomes among trauma survivors. Effective self-regulation requires steadfast self-efficacy perceptions, continual goal engagement, and optimal arousal. To recover from a trauma, an individual must come to terms with the experience, including effective self-regulation of arousal in order to process the memories of the event. Engaging with a self-help web-intervention that requires confronting what has happened challenges the individual's capacity to manage the emotional and physical arousal that accompanies this process [56]. Several studies conducted via the “My Trauma Recovery” platform suggest that such self-help websites can improve the mental health of individuals following traumatic exposure [203, 27].

Bandura, in his triadic reciprocal causation model [15, 13], provided extensive evidence suggesting that management of physical and affective reactivity is an important source of self-efficacy perceptions. As noted earlier (in Section 2.3), the EASE study consisted of multiple modules, including relaxation and triggers. The relaxation module comprises several videos that teach breathing techniques, progressive muscle relaxation, and positive-imagery visualization to manage tension and anxiety related to trauma. The relaxation module provides users with skills for managing physiological arousal, which, according to SCT, should promote higher CSEt. Similarly, the triggers module serves as a brief exposure therapy where users think about their traumatic experiences and learn ways to manage upsetting experiences like strong negative reactions to certain memories, sounds, smells, and sights. Continued practice of the relaxation and triggers techniques reduces trauma symptoms. With changes in mastery through training and general experience, efficacy beliefs either strengthen or weaken [12].

In the following sections, we present measures for coping self-efficacy in the EASE dataset, review observations from related work, demonstrate the need for automated measurement of CSEt and formulate directions of research to develop automated measurement of self-efficacy for the application of trauma recovery.

6.2 EASE Coping Self-Efficacy Measures

Self-Efficacy, as explained in Section 1.3, is measured from infrequent questionnaires for different beliefs. In the EASE dataset, we collected three different CSE measures, namely general coping self-efficacy (CSEg), coping self-efficacy for trauma (CSEt), and Chesney's coping self-efficacy (CSEc). Of the three aforementioned CSE measures, CSEt is the most frequent measure; it is taken before and after each module for all 3 sessions, while CSEg and CSEc are both taken before the start of Session 1 and at the end of Session 3. For trauma recovery, CSEt emerged as a focal mediator [32]. The CSEt questionnaire used in EASE was derived from a 20-item list, among which 9 items demonstrated measurement equivalence across varied trauma samples, indicating the generality of post-traumatic CSE [32]. The questions in the CSEt psychometric measure are as follows: For each sentence described below, please rate how capable you are to deal with thoughts or feelings that occur (or may occur) as the result of the traumatic event(s) that currently bother(s) you the most. Please rate each situation as you currently believe. “How capable am I to...”:

1. Deal with my emotions (anger, sadness, depression, anxiety) since I experienced my trauma.

2. Get my life back to normal.

3. Not “lose it” emotionally.

4. Manage distressing dreams or images about the traumatic experience.

5. Not be critical of myself about what happened.

6. Be optimistic since the traumatic experience.

7. Be supportive to other people since the traumatic experience.

8. Control thoughts of the traumatic experience happening to me again.

9. Get help from others about what happened.

The absolute value of CSEt ranges from a minimum of 1 to a maximum of 7. The choices presented to subjects were: 1 = very incapable; 2 = incapable; 3 = somewhat incapable; 4 = neither incapable nor capable; 5 = somewhat capable; 6 = capable; and 7 = very capable. For designing our machine learning model, we use CSEt measures as the dependent attributes and independent features of engagement and arousal (refer Sections 6.4 and 6.5).
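For concreteness, the short sketch below (in Python) computes a CSEt score and the pre/post change used later as ∆CSEt, under the assumption that the score is the mean of the nine 1-7 item responses; the item values and function name are hypothetical and not taken from the EASE codebase.

import numpy as np

def cset_score(item_responses):
    """Average of the nine CSEt items, each rated 1 (very incapable) to 7 (very capable)."""
    responses = np.asarray(item_responses, dtype=float)
    assert responses.shape == (9,) and responses.min() >= 1 and responses.max() <= 7
    return responses.mean()

# Hypothetical pre- and post-module questionnaire responses for one subject.
pre_items  = [3, 4, 2, 3, 4, 3, 5, 2, 4]
post_items = [4, 5, 3, 4, 4, 4, 5, 3, 5]

pre_cset   = cset_score(pre_items)
post_cset  = cset_score(post_items)
delta_cset = post_cset - pre_cset   # positive values indicate improved coping self-efficacy
print(pre_cset, post_cset, delta_cset)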

6.3 Effectiveness of CSE in estimating PTSD symptom severity

Machine-learning models that estimate effects of mental health interventions currently rely on either user self-reports or measurements of user physiology. Recent advancements in the field of AI and mental health are targeting the personalization of digital interactions and emulating face-to-face psychotherapy sessions.

Digital biomarkers computed from behavioral patterns and physiology are also being explored to predict digital therapy outcomes. One such powerful psychological predictor is "Self-Efficacy", which is known to have a widespread impact on user cognitive states and behavior. Self-efficacy is an important measure of the effectiveness of treatment, with strong predictive power for many types of activities including health, stress, well-being, education, etc. Trauma psychologists have conducted multiple controlled tests to identify diverse sets of potential contributors to posttraumatic recovery. In these different multivariate analyses, Benight et al. [26] found that CSEt, explained in Section 1.3, emerged as a focal mediator of posttraumatic recovery. This finding of CSEt being critical to trauma recovery makes it generalizable to mental trauma treatments. In this section, we briefly discuss our findings on CSEt as an estimator of changes in trauma symptom severity, by evaluating a machine learning algorithm that estimates recovery outcomes.

Corresponding research in mental health revealed that Galvanic Skin Response (GSR), or skin conductance (SC), is an ideal way to monitor the autonomic nervous system [207, 217]. It is also considered a good biomarker of psychological stress [186] and/or cognitive load. Monitoring and modeling SC responses can thereby aid in non-intrusive measurement of trauma severity and reduce reliance on self-reports. Automatic prediction of trauma severity will allow web-based trauma recovery treatments to automatically monitor subjects' symptom severity as the treatment progresses and adapt interventions to their needs. Our aim is not to replace clinicians, but to estimate PTSD symptom-severity changes while patients are interacting with a trauma-recovery-focused website, for the potential adaptation of its content. Since the goal of adapting the trauma-recovery website is to quantify the change in PTSD symptom severity as a result of the treatment, we define our target as the difference between PTSD symptom severity scores computed from PCL-5 questionnaires (refer Appendix A.3), which will be referred to as ∆PCL scores. We focus our analysis on the first 2 sessions of the treatment, in which the triggers (TR) and relaxation (RX) modules were assigned, as shown in Figure 6.1.

[Figure 6.1 schematic: Session 1 and Session 2, each with two randomly assigned modules (Module 1, Module 2), CSE-T and PCL-5 questionnaires administered around the modules, and concurrent physiological measurement.]

Figure 6.1: Self-Reports collected in EASE: In EASE, subjects interacted with the website for multiple sessions while performing self-regulation exercises as shown in the figure above. Additional data such as skin-conductance, respiration, and ECG signals were also recorded using body sensors.

In our recent work [145], we explored the feasibility of replacing CSEt questionnaires with skin conductance responses, as well as the fusion of both modalities, in order to estimate changes in trauma severity.

We evaluated our models on the EASE multimodal dataset and created PTSD symptom severity change estimators at both the total and the cluster level. Due to the agreement among psychologists regarding the use of CSEt as a valid predictor of PTSD, PTSD symptom severity models trained with CSEt questionnaires are set as the baseline. We demonstrated that modeling the PTSD symptom severity change at the total level with self-reports can be statistically significantly improved by the combination of physiology and self-reports, or by skin conductance measurements alone (shown in Table 6.1). Our experiments showed that, when extracting skin conductance features from the triggers modules, our novel multimodal approach estimated PTSD symptom cluster severity changes significantly better than self-reports or skin conductance alone for the avoidance, negative alterations in cognition & mood, and alterations in arousal & reactivity symptoms, while it performed statistically similarly for the intrusion symptom (shown in Table 6.2).

Modules                    TR+RX                  TR                     RX
Input Data     CSEt       SC      CSEt+SC        SC      CSEt+SC        SC      CSEt+SC
Mean           294.9      135.1   133.0          179.9   138.7          116.2   149.4
Median         137.3      46.4    41.1           50.1    51.9           48.1    31.2
Std            466.1      231.7   221.7          294.4   206.9          199.4   260.6
BL p-value     --         <0.01   <0.01          0.05    <0.01          <0.01   <0.01
MOD p-value    --             0.79                   0.10                   0.10

Table 6.1: The table above shows the MSE comparison of trauma-severity models at a global level trained with monomodal and multimodal data. The BL p-value compares the baseline with the SC or CSEt+SC models, while the MOD p-value compares the SC with the CSEt+SC models.

The evaluation of trauma-severity models at the global level, shown in Table 6.1, supports the feasibility of replacing CSEt questionnaires with skin conductance, since the SC models significantly outperform the baseline. Similarly, the results also show that our novel multimodal model, CSEt+SC, significantly outperforms the baseline. Therefore, both SC and the fusion of CSEt & SC provide significant improvements over CSEt questionnaires alone for estimating global ∆PCL. However, the performances of the SC and CSEt+SC trauma-severity models are similar, with no statistically significant differences when predicting global ∆PCL scores. In other words, either CSEt+SC or SC alone can be used to model changes in mental trauma severity at a global level, irrespective of treatment modules.

Moreover, the evaluation of monomodal and multimodal trauma-severity models when predicting symptom cluster-wise ∆PCL scores, with results in Table 6.2, reveals two common trends among symptoms: the first involves modeling changes in the intrusion symptom, while the other involves modeling changes in severity for the avoidance, negative alterations in cognition & mood, and alterations in arousal & reactivity symptoms.

Symptom: Intrusion
Modules                    TR+RX                  TR                     RX
Input Data     CSEt       SC      CSEt+SC        SC      CSEt+SC        SC      CSEt+SC
Mean           35.4       17.8    19.6           30.1    21.4           19.3    21.3
Median         7.6        5.9     5.9            9.8     9.9            7.9     5.7
Std            72.4       33.2    33.1           44.7    31.0           30.3    36.9
BL p-value     --         0.07    0.09           0.61    0.12           0.10    0.13
MOD p-value    --             0.18                   0.11                   0.69

Symptom: Avoidance
Modules                    TR+RX                  TR                     RX
Input Data     CSEt       SC      CSEt+SC        SC      CSEt+SC        SC      CSEt+SC
Mean           15.1       6.7     5.5            16.1    4.6            10.3    8.5
Median         6.0        3.4     2.3            5.5     1.9            4.3     2.4
Std            25.4       9.9     9.1            32.5    7.8            15.1    15.1
BL p-value     --         <0.01   <0.01          0.80    <0.01          0.18    <0.01
MOD p-value    --             <0.01                  <0.01                  0.41

Symptom: Negative Alterations in Cognition & Mood
Modules                    TR+RX                  TR                     RX
Input Data     CSEt       SC      CSEt+SC        SC      CSEt+SC        SC      CSEt+SC
Mean           55.2       23.6    22.3           40.3    23.2           31.5    23.9
Median         12.8       11.6    7.3            14.9    9.7            14.5    10.2
Std            87.7       38.1    38.0           61.3    35.7           40.6    35.5
BL p-value     --         <0.01   <0.01          0.21    <0.01          0.03    <0.01
MOD p-value    --             0.30                   <0.01                  0.10

Symptom: Alterations in Arousal & Reactivity
Modules                    TR+RX                  TR                     RX
Input Data     CSEt       SC      CSEt+SC        SC      CSEt+SC        SC      CSEt+SC
Mean           30.6       13.5    12.1           23.3    12.5           14.7    13.6
Median         10.0       4.9     5.0            7.1     5.0            5.3     5.1
Std            45.3       25.9    20.6           34.9    21.1           27.6    23.2
BL p-value     --         <0.01   <0.01          0.25    <0.01          <0.01   <0.01
MOD p-value    --             0.24                   <0.01                  0.61

Table 6.2: The table above shows the MSE comparison of trauma-severity models at a symptom level trained with monomodal and multimodal data. The BL p-value compares the baseline with the SC or CSEt+SC models, while the MOD p-value compares the SC with the CSEt+SC models.

For estimating changes in severity of the intrusion symptom, we notice that, although there is a reduction in MSE from CSEt when using either CSEt+SC or SC alone, neither SC nor our multimodal model CSEt+SC statistically outperforms the baseline, i.e. the CSEt models. Furthermore, the results allow us to assert that the SC and multimodal models perform statistically similarly for the intrusion symptom. Hence, replacing CSEt for estimating changes in intrusion is a debatable choice between CSEt+SC and SC alone, and is thereby contingent on the treatment module being modeled. The estimation of changes in mental trauma severity for the avoidance, negative alterations in cognition & mood, and alterations in arousal & reactivity symptoms shows that both SC and CSEt+SC significantly outperform the baseline for the TR+RX treatment modules. Thus, for these 3 symptoms, we can state that it is feasible to replace CSEt questionnaires with skin conductance to predict changes in trauma severity.

The statistical analysis of p-values when comparing the performances of SC and our novel multimodal model CSEt+SC indicates that trauma-severity models trained with multimodal data significantly outperform the models trained with SC data only for the TR treatment modules. Thus, we can highlight that the multimodal approach significantly outperforms the use of either SC or CSEt individually when predicting trauma severity changes for the avoidance, negative alterations in cognition & mood, and alterations in arousal & reactivity symptoms. The results highlight the importance of using self-assessments, especially CSEt, in a multimodal machine-learning system to predict continuous changes in PTSD symptom severity. The results also emphasize the need for designing automated models to estimate psychological constructs like CSEt, which can aid in estimating outcomes directly. Moreover, in this analysis, SC comprised the only sensor-based measure. Other potential measures (e.g. heart rate, audio, facial expression, etc.) could also be used to complement automated CSEt models and therefore provide a stronger rationale for the choice of "Self-Efficacy" in adapting self-help websites and boost its performance.

6.4 Observations from related work

Prior to designing machine learning models, we review related research and activities undertaken by other EASE researchers in order to establish grounds for our machine learning framework. Shoji et al. [197, 28] demonstrated the relationship of CSEt with engagement and mood via psychological studies on subsets of the EASE data. They made the following key observations: a) changes in trauma symptom severity can be predicted from changes in CSEt; b) changes in CSEt can predict changes in Heart Rate Variability (HRV) and engagement. Research from multiple domains, ranging from student learning to affective computing, has suggested a relationship between HRV and engagement [167, 227]. These observations lead us to explore the relationship between changes in engagement and arousal and CSEt.

As mentioned in Section 1.3, the web-based intervention is based on an SCT model for PTSD recovery.

Benight et al. [28] tested the assumption about the effectiveness of the trauma recovery website by analyzing the aggravation or reduction in PTSD symptoms reported by the subjects before and after the experiment. They noticed that changes in self-belief of coping capabilities, i.e. changes in CSEt, lead to changes in trauma symptoms. They hypothesized that changes in self-efficacy before and after each module during the experimental phase of relaxation and triggers are correlated with changes in overall PTSD symptoms. They defined the null hypothesis as Ho: there is no statistically significant relationship between changes in PCL (a measure of PTSD symptom severity) and changes in CSEt after each session. They then measured the correlation between PCL and CSEt to test for a significant relationship. An early subset of the EASE data was used, i.e. 55 trauma survivors who had completed two training sessions and responded to the PCL questionnaire at the end of Session3. The correlation results are obtained by analyzing two values: p, the statistical significance, and β, which quantitatively measures the relation between the two variables in the null hypothesis.

Positive change, ∆CSEt, during the second session working through the triggers module was a significant predictor of overall reduction in the PCL (β = −.376, p = .02). This effect was found only when the participant completed the relaxation module first and triggers second during Session1 (refer Figure 6.1) and triggers first and relaxation second in Session2 [28]. It suggests that the web-intervention showed significant improvement in subjects that exercised relaxation prior to working on triggers. Hence, there is a possibility that disengaging from the mental state of trauma through a relaxation regime could be effective in increasing the CSE level of an individual before educating them about triggers, instead of starting with triggers in Session1.

These observations raised the question of the effect of ordering on overall CSE levels, i.e. relaxation-triggers versus triggers-relaxation for Session1. There is a possibility that the negative or null effect of the web-intervention, as observed by ∆CSEt upon starting with triggers in Session1, could be attributed to the discomfort of learning about the pressure points of trauma and external stressors, which might elicit arousal prior to engaging with the treatment. Hence, we proposed another hypothesis Ho: changes in distressed mood, measured using POMS (the Profile of Mood States on a 5-point scale) before and after each module, are not related to ∆CSEt. Results showed that when participants were assigned to the triggers module immediately in Session1, the increase in distressed mood (∆POMS) was a significant predictor of the change in CSE both during that session (β = −.522, p = .00) and from Session1 to Session3 (β = −.295, p = .04), i.e. before and after the experiment. Moreover, changes in distressed mood during Session2 were also found to predict changes in CSEt, where greater distress during triggers was strongly correlated with ∆CSEt regardless of whether the participant faced triggers first (β = −.323, p = .054) or second (β = −.274, p = .046) [28]. This implies that changes in participants' perceived distress are also indicative of changes in self-efficacy, and the triggers module caused more distress irrespective of the order of the modules presented to the subjects. Hence, it leads us to the conclusion that it may be possible to detect changes in mood from face video analysis and create a learning model that can predict ∆CSEt.
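As an illustration of this kind of analysis, the sketch below regresses one change score on another using scipy.stats.linregress; the arrays are made-up placeholders, and note that the β values reported above are standardized regression coefficients from the original study, whereas linregress returns a raw slope, so a z-scored variant is shown as well.

import numpy as np
from scipy import stats

# Hypothetical per-subject change scores; in the actual analysis these would
# come from the EASE questionnaires (post minus pre values).
delta_cset = np.array([0.8, -0.3, 1.2, 0.1, -0.6, 0.9, 0.4, -0.1, 1.5, 0.2])
delta_pcl  = np.array([-6.0, 2.0, -9.0, -1.0, 4.0, -7.0, -3.0, 1.0, -11.0, -2.0])

# Simple linear regression: does a change in coping self-efficacy predict
# a change in PTSD symptom severity?
result = stats.linregress(delta_cset, delta_pcl)
print(f"slope={result.slope:.3f}, r={result.rvalue:.3f}, p={result.pvalue:.4f}")

# A standardized coefficient (comparable in spirit to the reported beta)
# can be obtained by z-scoring both variables first.
z = lambda v: (v - v.mean()) / v.std(ddof=1)
beta = stats.linregress(z(delta_cset), z(delta_pcl)).slope
print(f"standardized beta={beta:.3f}")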

6.4.1 Engagement and Self-Efficacy

Perceived self-efficacy contributes to cognitive development and functioning not just in student learning [139, 17] but also in trauma recovery [39]. We hypothesize that the changes in self-efficacy after going through the trauma recovery modules of relaxation and triggers depend on the engagement levels of the participants.

Subjects who experience debilitating symptoms post-trauma and are reluctant to seek trauma treatment(s) are generally disengaged from the recovery process and drop out before the completion of the experiment, i.e. before Session3.

As shown in Figure 6.2, a repeated-measures ANOVA indicated support for the hypothesis that CSE would be a significant predictor of engagement with a web intervention for trauma for the Triggers module (F(1,23) = 5.24, p = .03). However, no such effect was evident for the Relaxation module (F(1,17) = .11, p = .75) [29]. Thus, greater trauma-related CSE at baseline was related to increased engagement for the Triggers module, but not for the Relaxation module.

Figure 6.2: CSE as a predictor of self-reported engagement: The figure above shows the self-reported engagement at 3 time points during the 15-minute Triggers session and CSE at baseline as an independent variable.

The aforementioned analysis on a subset of EASE data [29] confirms that changes in CSEt are predictive of user engagement during the triggers module. However, the study did not test whether changes in engagement during the web-intervention could predict changes in CSEt. Moreover, the subset of data used in this analysis is relatively small, i.e. 17 subjects for Relaxation and 23 for Triggers, compared to the one used for the contextual engagement models in Chapter 3. Since the subjects are free to work on any section of the module and continue with a module as long as they like, the responses to engagement self-reports vary in length. In order to deal with the variable feature size and fixed data size (ground truth labels), one could consider exploring different weights for each response based on context and participant involvement. Future work in building self-efficacy predictors can also take into account all available data from self-reported engagement as well as behavioral annotations from psychologists.

6.4.2 Heart Rate Variability and Self-Efficacy

We empirically examine the effect of arousal on changes in self-efficacy. Since we have three measures of arousal, namely ECG, GSR and RR (refer Section 2.5.2), we first consider the effect of HRV (Heart Rate Variability) on changes in CSEt. HRV is the most rapid predictor of changes in arousal levels and can be derived from face videos [206, 112, 6].

Shoji et al. [65] split the EASE subjects into two sets: Group1 - people who took relaxation first in Session1 and triggers second (reversed in Session2), and Group2 - triggers first and relaxation second in Session1 (reversed in Session2), as shown in Figure 6.1. The results for Group2 are shown on the left and Group1 on the right in Figure 6.3. This experiment was conducted by trained clinical psychologists on 36 subjects [65]. The psychologists removed motion artifacts in the ECG signal manually and then computed HRV using the Kubios software [210].

The results display the order effect and show that subjects who took relaxation and then triggers in Session1 had more physiological control during Session2 and displayed greater changes in CSE.

These findings of changes in CSEt with respect to changes in HRV were observed as a function of module order. The trauma-recovery website had 6 modules, 2 of which (RX and TR) were considered for this analysis. It is, however, impractical to test all such module-order combinations and build a self-efficacy predictor with the data we currently have. The analysis of changes in CSEt as a predictor of ∆HRV suggests that HRV changes for both Relaxation and Triggers irrespective of module order. Another related study on GSR and HRV suggests that heart-rate and skin-conductance responses were less frequently consistent as trauma symptom severity increased [198]. For creating a learning-based self-efficacy model, in our recent work [145] we used the baseline physiology collected during the 1-minute quiet time (refer Section 2.3 for details) and normalized the physiology of the subjects during Triggers and Relaxation. This normalized arousal signal was then used as a feature input along with CSE, depending on the context of RX or TR, to estimate symptom severity.

Figure 6.3: Changes in CSE with HRV: The study evaluated the effect of module order on changes in heart rate variability (∆HRV) moderated by changes in coping self-efficacy (∆CSEt), using a mixed ANCOVA with ∆HRV as a dependent variable, module order as a between-subjects variable, and ∆CSEt as a covariate [65]. Participants with low ∆CSEt and mean ∆CSEt had significantly different ∆HRV between Sessions 1 and 2 when they completed the triggers module first in Session1. No such effect was found when the relaxation module was completed first in Session1. This finding suggests that working on the triggers module first in Session1 may result in greater parasympathetic dominance in Session2, particularly for those with low ∆CSEt and mean ∆CSEt.

Multiple techniques have been employed in the past to create multi-modal learning models in the affective computing domain. Decision-level, feature-level and hybrid fusion techniques have been explored to quantify the dynamics underlying nonverbal behavioral cues, such as vocal changes, facial expressions, and body gestures, and to automatically analyze human behavior during real-world interactions [176, 7, 4]. As noted earlier, various features of arousal can be derived from the physiological signals of ECG, GSR and Respiration. We can use HRV, SDNN (i.e. the standard deviation of RR intervals), breaths per minute, and Galvanic Skin Response (GSR) features in the form of phasic/tonic components, along with estimated engagement, to predict post-CSE. Both SCT and the observations from our preliminary experiments indicate that engagement and HRV can be good predictors for changes in CSEt.
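As a rough sketch of how such arousal features could be derived, the snippet below computes SDNN from R-peak times and normalizes a module-time signal against the 1-minute quiet-time baseline; the function names and the z-score normalization are our own illustrative choices, not the exact procedure used in [145].

import numpy as np

def sdnn_ms(rpeak_times_s):
    """SDNN: standard deviation of RR intervals (in milliseconds),
    computed from R-peak times given in seconds."""
    rr = np.diff(np.asarray(rpeak_times_s)) * 1000.0
    return rr.std(ddof=1)

def baseline_normalize(module_signal, quiet_signal):
    """Z-score a per-second arousal signal recorded during a module against the
    subject's 1-minute quiet-time baseline (an assumption; other normalizations,
    e.g. subtracting the baseline mean only, are possible)."""
    mu, sigma = quiet_signal.mean(), quiet_signal.std(ddof=1) + 1e-8
    return (module_signal - mu) / sigma

# Toy example: R-peaks roughly every 0.8 s and a synthetic per-second GSR trace.
rpeaks = np.cumsum(0.8 + 0.05 * np.random.randn(120))
print("SDNN (ms):", sdnn_ms(rpeaks))

quiet_gsr  = 2.0 + 0.1 * np.random.randn(60)    # 60 s of baseline, 1 Hz
module_gsr = 2.5 + 0.3 * np.random.randn(400)   # one module, 1 Hz
print(baseline_normalize(module_gsr, quiet_gsr)[:5])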

6.5 Defining CSE as a Machine Learning problem

Observations from related research in coping self-efficacy by other EASE researchers suggest that those with lower CSE may have lower physiological control and reported disengagement with the critical aspects of the EASE web-intervention system. Further, they may need additional support in the form of either a coach or a therapist. Owing to the uncertain nature of trauma recovery, which includes frequent mood swings, there is a need to look for self-efficacy values over shorter time periods than self-reports allow. Neither the absolute value of self-efficacy nor changes in self-efficacy are directly observable from visual inferences. While the absolute value of self-efficacy is hard to measure, there is a possibility of estimating changes in self-efficacy perceptions from video and physiological data. We conjecture that changes in the CSE of an individual might be derived from their behavior towards the web-intervention while they work on the Relaxation and Triggers modules. Hence, we seek to monitor, learn and track facial expressions, engagement, mood-aware engagement and arousal during the course of interaction by trauma subjects during these tasks.

The objective of this work is to develop machine learning algorithms that use both self-reported data (voluntary response) and physiology (involuntary response) in a non-intrusive setting, by calibrating physiological arousal and visual engagement from face videos leading towards a hidden state of self-efficacy.

An important observation from the related work by other EASE researchers is the different slopes of the low, mean and high ∆CSEt lines for both HRV and engagement. The three different slopes indicating changes in CSEt suggest that the same modules have different effects on trauma subjects. For example, subjects with a high initial CSEt value reported higher engagement for the Triggers module. Similarly, subjects with lower initial CSEt had less physiological control in Group2 (Triggers followed by Relaxation) than those with higher initial CSEt. Moreover, the effectiveness of using CSEt in predicting changes in trauma symptom severity was evident for the triggers module only.

6.6 CSE Prediction Pipeline

Building on work done so far, we now present a framework for self-efficacy prediction based on video feeds, physiological signals, learned contextual engagement models and subject self-reports. The overall goal is to develop tools to automate self-efficacy measurement through robust use of computer vision, signal processing and deep learning techniques.

Figure 6.4: CSE prediction system: The figure above shows the prediction system for pre- and post-CSEt. The subset of the EASE dataset used in this work comprised 54 subjects across 65 module videos. This work analyzes a multi-output CSEt estimation pipeline based on user engagement and physiological response to the Triggers module. The underlying goal of this pipeline is to determine whether engagement and arousal data are sufficient for estimating CSEt and to understand the accuracy tradeoff of using only one of those modalities.

The key questions that will drive the creation of the self-efficacy predictors are as follows:

1. How can we build models to predict the absolute value of CSEt and/or its change, ∆CSEt?

2. Can self-efficacy be predicted from videos or physiological data or both?

3. How do we extend the learning-based engagement models developed in Chapters 3, 4 and 5 to predict CSEt?

In Section 6.4, changes in arousal and engagement were shown to vary with changes in CSEt; hence, it may be possible to use engagement and arousal as input predictors to build a machine learning model for estimating CSEt. Modeling a machine to predict the absolute value of self-efficacy is neither essential nor desirable. Changes in self-efficacy, on the other hand, are interesting to model and can aid the creation of adaptive websites. Measuring changes in self-efficacy gives insight into changes in user beliefs during an intervention. We hypothesize that changes in self-efficacy can be predicted by monitoring changes in visual engagement and physiological arousal during the intervention. Continuous prediction of CSE changes through machine learning can reduce the cognitive load on the trauma subjects.

We also noticed that changes in arousal and engagement are different for subjects with low, mean and high ∆CSEt. Subjects with high initial CSEt might exhibit a ceiling effect and also more physiological control. We plan to model a self-efficacy predictor that can use the pre-CSEt value, engagement reports and changes in arousal to estimate post-CSEt. In doing so, we incorporate the effects of the initial absolute CSEt value on user response. It is also possible to create a multi-target learning model for CSEt that learns and estimates pre-CSEt and post-CSEt simultaneously. The advantage of a multi-target model is the reduction in effects caused by the correlation of pre-CSEt and post-CSEt. It will also ensure that we learn both the magnitude of and the change in CSEt.

6.7 EASE subset and Data distribution

In this section, we briefly overview the subset of the EASE dataset used in the computational model of CSEt. Although the subset used in this work is the same set of subjects used in Chapter 5, the video segments utilized in the experiments presented in Section 6.8 are larger. Section 5.1 used facial video segments of 30 seconds prior to engagement self-reports; in this work, however, we consider video and physiological data from the entire module video. Figure 6.5 shows the details of the scale and distribution of video data for the CSEt prediction pipeline. Figure 6.6 shows the distribution of observed ∆CSEt. The CSEt change distribution in Figure 6.6 is for 54 subjects across 65 Trigger modules (it includes data from both Session1 and Session2). The ∆CSEt of 0, i.e. the "no change" in coping self-efficacy category, is small among the samples, i.e. 14 out of 65 samples. There is a nearly balanced distribution of negative-change CSEt samples (23 out of 65) and positive-change CSEt samples (28 out of 65).

Figure 6.5: EASE subset details (Triggers):

    Number of Subjects               54
    Number of Videos                 65
    Total Video Length               27106 seconds
    Range of Videos                  153 to 909 seconds
    Average Video Length             417.015 seconds
    Standard Deviation of Videos     122.688 seconds
    Self-Reports                     pre-CSEt, post-CSEt
    Max Observed ∆CSEt               1.550
    Min Observed ∆CSEt               -1.8899
    Engagement Scale                 -100 to +100
    Physiological Signals            ECG, GSR, Respiration

Figure 6.6: Data distribution of CSEt: histogram of observed ∆CSEt across the 65 Trigger-module samples (x-axis: ∆CSE from -2.0 to 2.0).

We use behavioral annotations of engagement from 6120 expert annotators, as in the experiments of Section 5.1. The engagement annotations in the input features of the CSEt prediction pipeline are in the form of the FAVE, similar to the model developed in Section 5.5. We choose the FAVE model because it is the best-performing model developed in Section 5.5. As mentioned earlier in Table 2.1, the physiological signals were recorded at 256 samples per second and the facial videos at 30 frames per second. The expert annotators' engagement ratings were 1 per second on a scale of -100 to +100.
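Because the modalities arrive at different rates, they have to be brought to a common 1 Hz timeline before they can be stacked as model inputs; the sketch below shows one straightforward way to do that by averaging each signal over one-second windows (our own simplification, not necessarily the exact resampling used in the pipeline).

import numpy as np

def to_one_hz(signal, samples_per_second):
    """Average a uniformly sampled signal over non-overlapping 1-second windows."""
    n_seconds = len(signal) // samples_per_second
    trimmed = np.asarray(signal[: n_seconds * samples_per_second], dtype=float)
    return trimmed.reshape(n_seconds, samples_per_second).mean(axis=1)

# Toy module recording of 420 seconds.
gsr_256hz   = np.random.randn(420 * 256)              # physiological channel at 256 Hz
video_feats = np.random.randn(420 * 30)                # a per-frame facial feature at 30 fps
ratings_1hz = np.random.uniform(-100, 100, size=420)   # annotator ratings, already 1 Hz

gsr_1hz   = to_one_hz(gsr_256hz, 256)
video_1hz = to_one_hz(video_feats, 30)

# Truncate all channels to the shortest length and stack as a (time, features) sequence.
T = min(len(gsr_1hz), len(video_1hz), len(ratings_1hz))
sequence = np.stack([gsr_1hz[:T], video_1hz[:T], ratings_1hz[:T]], axis=1)
print(sequence.shape)  # (420, 3)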

6.8 Experiments

In this section, we describe the self-efficacy estimation models trained with behavioral annotations of engagement and/or physiological response. As described in earlier sections of this chapter, we represent the user response to the coping self-efficacy trauma assessment prior to the treatment module as pre-CSEt and after the module as post-CSEt. The average of rater annotations xHAavg and the variance xHAvar, obtained from expert human ratings, are a series of inputs to the self-efficacy estimation model. Inputs from physiological arousal are obtained from three signals: the electrocardiograph xecg, galvanic skin response xgsr and respiration xresp. We model self-efficacy estimation as a sequence learning problem and develop a recurrent neural network based learning framework. The target outputs of the model are ypre-cse and ypost-cse, as shown in Figure 6.4. We conduct a number of experiments (described below) to gain deeper insight into the computational construct of self-efficacy and the relationship of engagement and physiological arousal to ∆CSEt.

Engagement as a CSE estimator (E): We extend the FAVE model by incorporating broader sequential data from the triggers module. Since the longest video segment in the EASE subset used for the experiments in this work is 909 seconds (refer Figure 6.5), the RNN model created here comprises 909 RNN units. We learn a multi-output, dual-feature-input RNN model to predict self-reported coping self-efficacy assessments. We represent this model in Section 6.9 as E.

Arousal as a CSE estimator (A): The extraction of features from the physiological signals is described below:

xecg – From the raw electrocardiograph signal collected at 256 Hz, we extract the instantaneous heart rate in beats per minute using the biosppy package [42]. This extraction of instantaneous heart rate ensures that the sequence length of the extracted features is the same as in the engagement model described above. Our prior work [198] utilizes the same extraction technique for xecg. The biosppy package extracts the R-peak location indices from the filtered ECG signal and averages the heart rate over one-second sliding windows, as shown in Figure 6.7.

xgsr – The skin conductance response is broken down into its phasic and tonic components [145] using a specialized python package for processing electrodermal activity. The preprocessing functions in biosppy involve low-pass filtering, down-sampling, cutting, smoothing, and artifact correction. We employ the continuous decomposition of the skin conductance response signal for the feature extraction [123]. The extracted response signal is then averaged over 256 samples to obtain one sample per second. The obtained signal retains the original signal shape and structure, as shown in Figure 6.7.

xresp – The respiration signal is filtered and the indices of respiration zero crossings are computed to obtain the instantaneous respiration rate using the biosppy package [42]. The instantaneous respiration signal generates a breaths-per-second value that is included as a sequential input to the self-efficacy estimation algorithm.
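A minimal sketch of this kind of per-second feature extraction with biosppy is shown below; the synthetic input traces are only stand-ins for the recorded signals, and the GSR step here is a simple per-second average rather than the phasic/tonic decomposition described above.

import numpy as np
from biosppy.signals import ecg, resp

FS = 256          # sampling rate of the raw physiological signals (Hz)
DURATION_S = 420  # length of one toy module recording in seconds

def per_second(values, timestamps, duration_s):
    """Average event-based values (e.g. instantaneous heart rate) into 1-second bins."""
    out = np.zeros(duration_s)
    for t in range(duration_s):
        mask = (timestamps >= t) & (timestamps < t + 1)
        if mask.any():
            out[t] = values[mask].mean()
        elif t > 0:
            out[t] = out[t - 1]  # carry the last value forward for empty bins
    return out

# Synthetic stand-ins: a spiky ~72 bpm ECG-like trace, a 0.25 Hz breathing-like
# sine, and a noisy conductance trace. Replace with the recorded arrays.
t = np.arange(DURATION_S * FS) / FS
raw_ecg  = np.sin(2 * np.pi * 0.6 * t) ** 64 + 0.01 * np.random.randn(t.size)
raw_resp = np.sin(2 * np.pi * 0.25 * t)
raw_gsr  = 2.0 + 0.2 * np.abs(np.random.randn(t.size))

ecg_out  = ecg.ecg(signal=raw_ecg, sampling_rate=FS, show=False)
x_ecg    = per_second(ecg_out["heart_rate"], ecg_out["heart_rate_ts"], DURATION_S)

resp_out = resp.resp(signal=raw_resp, sampling_rate=FS, show=False)
x_resp   = per_second(resp_out["resp_rate"], resp_out["resp_rate_ts"], DURATION_S)

# Simple per-second averaging of GSR; the dissertation instead applies a continuous
# phasic/tonic decomposition from a dedicated EDA package before averaging.
x_gsr = raw_gsr.reshape(DURATION_S, FS).mean(axis=1)

print(x_ecg.shape, x_gsr.shape, x_resp.shape)  # each (420,), one value per second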

Figure 6.7: Preprocessing of signals: The figure above shows the raw signals for Electrocardiogram, Respiration and Galvanic Skin Response for a single subject in the EASE dataset. The filtered signals are extracted by pre-processing the raw signals. The filtered signals are then averaged over a 256-sample window to obtain the averaged signal. The y-axis in all plots is the amplitude and the x-axis is time in seconds. Note that the ECG signal has been fine-scaled to show the signal character. The averaged signals for Respiration and GSR retain the shape and structure of the raw signal.

Engagement & Arousal as CSE estimators (E+A): We concatenate the engagement and arousal inputs into a combined model represented by E+A. For a fair comparison of the models E, A and E+A, the number of RNN units in each model and the associated hyper-parameters are kept constant. Any difference in accuracy and observed mean-squared error is therefore due to the variation in input modalities. The feature sequences from both Engagement and Arousal are post-padded with zeros to constitute a sequence length of 909 time steps for each of the 65 samples. Since the E+A model has 5 input features, xHAavg, xHAvar, xecg, xgsr and xresp, the input feature dimensions become 5×909 for each sample.
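A compact sketch of such a multi-output sequence model is given below using Keras; the GRU size and the use of a masking layer are illustrative assumptions (the batch size of 10 and learning rate of 0.1 follow the values reported in Section 6.9), and the random arrays merely stand in for the real padded features and CSEt targets.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN, N_FEATURES = 909, 5   # longest module (seconds); xHAavg, xHAvar, xecg, xgsr, xresp

# Per-sample sequences of shape (length_i, 5); lengths vary between 153 and 909 seconds.
sequences = [np.random.randn(np.random.randint(153, 910), N_FEATURES) for _ in range(65)]
X = pad_sequences(sequences, maxlen=MAX_LEN, dtype="float32", padding="post")  # (65, 909, 5)
y_pre  = np.random.uniform(1, 7, size=(65, 1))   # pre-CSEt targets (placeholders)
y_post = np.random.uniform(1, 7, size=(65, 1))   # post-CSEt targets (placeholders)

# Shared recurrent encoder with two regression heads (pre-CSEt and post-CSEt).
inputs   = keras.Input(shape=(MAX_LEN, N_FEATURES))
masked   = layers.Masking(mask_value=0.0)(inputs)   # ignore the zero padding
encoded  = layers.GRU(32)(masked)
out_pre  = layers.Dense(1, name="pre_cse")(encoded)
out_post = layers.Dense(1, name="post_cse")(encoded)

model = keras.Model(inputs, [out_pre, out_post])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(X, {"pre_cse": y_pre, "post_cse": y_post}, batch_size=10, epochs=2, verbose=0)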

6.9 Results & Discussion

In this section, we analyze all three models, namely E, A and E+A, mentioned in Section 6.8 for ypre-cse and ypost-cse estimation. We optimized the RNN over different hyper-parameters, such as learning rate and batch size, for model tuning. Figure 6.8 shows the convergence plot with the training and testing L2-norm losses for the E, A and E+A models over 1000 epochs. We found that the model loss decreases for a batch size of 10 and a learning rate of 0.1.

Figure 6.8: Convergence plot for CSEt estimation: The losses from all three models, Engagement, Arousal and Engagement+Arousal, are shown in the plot. The loss from all models is high and non-ideal. Notice that the lowest training loss for the Engagement and Arousal models is approximately the same. The Engagement+Arousal model had the largest number of parameters and its training loss did not decrease after approximately 650 epochs.

CSE Estimation Model     Average MSE     Standard Deviation
Engagement               23.21915817     5.079739737
Arousal                  23.69791718     8.338533503
Engagement+Arousal       22.41273623     6.304645855

Table 6.3: The table above shows the results from 10-fold cross-validation for Engagement, Arousal and Engagement+Arousal models for CSEt estimation. The Engagement+Arousal model has the lowest MSE amongst the three models. None of the model comparisons are statistically significant.

Since our hypothesis was to compare machine-generated engagement features to user physiology, we keep the model parameters constant for all three models. We select 600 epochs for all models, where the L2-norm test loss was minimum and stable. The results from 10-fold cross-validation of the self-efficacy estimation model are shown in Table 6.3. No statistical evidence was found for a preference among the input features for any of the models in these preliminary results. However, our visual model of engagement was comparable to the arousal model, which comprised the three physiological signals of ECG, Respiration and GSR. This finding supports our hypothesis that visual estimates of engagement can aid in non-intrusive self-efficacy estimation.
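The 10-fold evaluation can be reproduced in spirit with scikit-learn's KFold, as in the short sketch below; `build_model` stands in for a factory that constructs a fresh Keras-style regressor for a single target (e.g. post-CSEt), and the returned mean and standard deviation of per-fold MSE mirror the quantities reported in Table 6.3.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

def cross_validated_mse(X, y, build_model, n_splits=10, epochs=600, batch_size=10):
    """Mean and standard deviation of per-fold MSE over a k-fold split.

    X is the padded (N, 909, features) array and y the single regression target."""
    fold_mse = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = build_model()                       # fresh model per fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        preds = np.ravel(model.predict(X[test_idx], verbose=0))
        fold_mse.append(mean_squared_error(np.ravel(y[test_idx]), preds))
    return float(np.mean(fold_mse)), float(np.std(fold_mse))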

The non-significant results and the high losses indicate possible overfitting of the deep learning model, given the limited number of samples in our dataset. Simpler models, such as the Support Vector Regression employed in our recent work [145], could be explored for further analysis of the coping self-efficacy estimation models. Further evaluation could also include mood-aware models of visual engagement and their comparison to the engagement models for self-efficacy estimation.
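For reference, a simpler baseline along these lines could summarize each variable-length sequence with a few statistics and fit scikit-learn's SVR, as sketched below; the summary features and kernel settings are our own choices for illustration, not the configuration used in [145].

import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def summarize(sequence):
    """Collapse a (time, features) sequence into simple per-feature statistics."""
    return np.concatenate([sequence.mean(axis=0), sequence.std(axis=0),
                           sequence.min(axis=0), sequence.max(axis=0)])

# X_seq: list of (length_i, 5) arrays as before; y: post-CSEt values (placeholders here).
X_seq = [np.random.randn(np.random.randint(153, 910), 5) for _ in range(65)]
y     = np.random.uniform(1, 7, size=65)

X = np.stack([summarize(s) for s in X_seq])
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
svr.fit(X, y)
print(svr.predict(X[:3]))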

6.10 Broader Impact

Our research on self-efficacy predictors is targeted at Post Traumatic Stress Disorder (PTSD) recovery and a specific type of self-efficacy, i.e. Coping Self-Efficacy for Trauma (CSEt). However, objective measurement of self-efficacy is applicable to a broader list of tasks in various fields, such as academic performance of students in education, weight-loss interventions or diet control in health, recovery treatments from cancer, behavioral parent training, fitness control, developing socially assistive robots, employment training (e.g. public speaking, preparation for job interviews), etc. Similar vision-based models could be developed for other psychological measures of self-efficacy and integrated in existing self-help/self-regulated applications. Moreover, automated models that estimate coping self-efficacy magnitude and changes can aid in timely detection of self-regulation shifts and propose adequate coping mechanisms. The Self Regulation Shift Theory (SRST), which is an extension of SCT, demonstrated that CSE perceptions serve as an important bifurcation factor as the recovery process unfolds, bringing the system back to a sense of personal wellness or equilibrium [31]. Thus, continuous monitoring of CSE can advance the understanding of non-linear shifts in human behavior. Another possibility is to use more contextual information, such as who the user is, what they are doing, what their trauma levels are, their daily habits, demographic information, etc.; the more we know, the better we can predict the affective state of the user. The self-efficacy estimation model proposed and developed in this thesis can also be scaled and deployed on mobile devices. Recent pre-trained models built for affect detection that used variants of AlexNet, VGGnet, MobileNet, etc. [108] can also be explored and possibly fine-tuned to include vision-based self-efficacy estimation.

Chapter 7

Conclusion and Insights

Reaching trauma populations to provide evidence-based, proactive and cost-effective intervention continues to be a challenging problem and a major healthcare need of the day. Towards this goal, we proposed a scalable and person-centered approach to monitor changes in a user's mental state by employing a series of signal processing, facial analysis, machine learning and social cognitive theory based techniques on sensory and video data collected for a trauma recovery application.

To facilitate the development of scalable techniques, we collected the large-scale Engagement, Arousal and Self-Efficacy (EASE) dataset from 110 subjects, with 8M+ video frames, large amounts of sensory data, behavioral annotations and subject self-reports. This is a first-of-its-kind dataset that can be used to study a wide range of affective measures such as engagement, arousal, mood, self-efficacy and others. In Chapter 3, we showed that engagement in a given situation can be predicted through subtle facial movements and fine-grained actions such as changes in facial expressions. We formulated engagement prediction from video for trauma recovery as a sequential learning problem and proposed a prediction model that incorporates context and temporal variations of input features. We also demonstrated that engagement was context-sensitive. Further, we showed that a subject's current mood might have an effect on engagement levels with the task at hand and empirically verified it. We presented mood prediction as a learning problem and used the predicted mood scores to aid our contextual engagement pipeline. In Chapter 5, we introduced a novel approach that exploits the inherent confusion and disagreement in raters' annotations to build a scalable engagement estimation model that learns to appropriately weigh subjective behavioral cues. We showed that actively modeling the uncertainty, either explicitly from expert raters or from automated estimation with facial action units, significantly improves prediction over prediction from just the average engagement ratings.

Further, we laid the foundation for developing machine learning models for self-efficacy prediction by evaluating the effect of CSEt on symptom severity and exploring the relationship of engagement and arousal to self-efficacy.

Overall, the implemented system used ECG, galvanic skin conductance and respiration rate to measure physiological arousal. Although direct physiological measurements offer multiple advantages, for widespread and non-intrusive adoption we proposed methods using vision-based signals. These input modalities were used to estimate psychometric measures such as arousal, engagement and mood in order to predict changes in self-efficacy.

Finally, we demonstrated that changes in self-efficacy are directly related to changes in PTSD symptom severity, a relationship that trauma recovery applications can use directly and that can aid in building an autonomously adapting, task-dependent website for trauma recovery. The overall system architecture is shown in Figure 7.1.

Self-support websites for trauma recovery need to adapt to the patient's needs, empowering individuals by combining sensing and machine learning to improve treatment. Prior work in adapting such websites focused primarily on enhancing user engagement by adjusting static web elements. The framework for machine learning based psychometric estimation and affect recognition proposed in this work provides key elements for developing person-centric, adaptive and task-specific websites for trauma recovery. As immediate future work, we would like to integrate and deploy these models in the broader trauma recovery application suite, to test the models in the "wild" and evaluate them with regard to improvements in dropout rates, in order to promote engagement and positive adaptation. Finally, this work provides initial foundations to operationalize self-regulation within sessions of the web-intervention system for predicting successful outcomes.

Self-efficacy is an important predictor for a wide range of skills, but to date changes in it have been non-observable. In this work we showed that various psychometric measures can be estimated from raw signals, demonstrated their relationships, and showed that changes in self-efficacy can be inferred from measurements of facial expression and physiological data. This work provides a first step in understanding person-specific web-based interventions for traumatic stress. The ubiquity of mobile phones with wide-ranging health sensors and camera applications provides an appropriate platform for widespread deployment and adoption of such technology. Further, the methods employed in this work can be applied to a wide range of areas beyond trauma recovery, such as gaming, education, career development, self-learning and others.

Figure 7.1: System Diagram for Trauma Symptom Severity Prediction: This effort used real-time multi-sensory data to develop personalized adaptive treatment for mental trauma recovery. During the course of this work we made several important observations. Firstly, engagement is highly contextual and depends strongly on the task at hand. We found that it was possible to model long-term psychometric measures such as mood, which had a direct effect on engagement prediction. We further found that modeling uncertainty, either explicitly from expert raters or from automated estimation with AUs, significantly improves prediction over prediction from just the average engagement ratings. Finally, we developed a framework for estimating subject self-efficacy from videos. During treatment, the system will combine sensory measures and machine-learned models with questionnaire data to select, then adapt, a learned model and use that to affect the treatment. This research makes first steps towards an adaptive person-centered system that provides continuous estimations of trauma-related measures during web-intervention.

Appendices

Appendix A

Psychometric Tests & Questionnaires

A.1 Profile of Mood States-Short Form: EASE version

Please describe how you feel right now by indicating the most appropriate choice after each of the sentences below:

Each item is rated on the following scale: Not at all, Little Bit, Moderately, Quite a bit, Extremely. The items are:

Tense, Angry, Worn out, Unhappy, Lively, Confused, Peeved, Sad, Active, On Edge, Grouchy, Blue, Energetic, Hopeless, Uneasy, Restless, Unable to Concentrate, Fatigued, Annoyed, Discouraged, Resentful, Nervous, Miserable, Cheerful.

A.2 Coping Self-Efficacy for Trauma

For each sentence described below, please rate how capable you are to deal with thoughts or feelings that occur (or may occur) as the result of the traumatic event(s) that currently bother(s) you the most. Please rate each situation as you currently believe, according to the following scale: 1 = very incapable; 2 = incapable; 3 = somewhat incapable; 4 = neither incapable nor capable; 5 = somewhat capable; 6 = capable; 7 = very capable. "How capable am I to...":

• Deal with my emotions (anger, sadness, depression, anxiety) since I experienced my trauma.

• Get my life back to normal.

• Not “lose it” emotionally.

• Manage distressing dreams or images about the traumatic experience.

• Not be critical of myself about what happened.

• Be optimistic since the traumatic experience.

• Be supportive to other people since the traumatic experience.

• Control thoughts of the traumatic experience happening to me again.

• Get help from others about what happened.

A.3 PTSD Checklist (PCL-5 Questionnaire)

Below is a list of problems that people sometimes have in response to a very stressful experience. Please rate each problem carefully to indicate how much you have been bothered by that problem in the past month, according to the following scale: 0 = not at all, 1 = a little bit, 2 = moderately, 3 = quite a bit, 4 = extremely. "In the past month, how much were you bothered by...":

• Repeated, disturbing, and unwanted memories of the stressful experience?

• Repeated, disturbing dreams of the stressful experience?

• Suddenly feeling or acting as if the stressful experience were actually happening again (as if you were actually back there reliving it)?

• Feeling very upset when something reminded you of the stressful experience?

• Having strong physical reactions when something reminded you of the stressful experience (for example, heart pounding, trouble breathing, sweating)?

• Avoiding memories, thoughts, or feelings related to the stressful experience?

• Avoiding external reminders of the stressful experience (for example, people, places, conversations, activities, objects, or situations)?

• Trouble remembering important parts of the stressful experience?

• Having strong negative beliefs about yourself, other people, or the world (for example, having thoughts such as: I am bad, there is something seriously wrong with me, no one can be trusted, the world is completely dangerous)?

• Blaming yourself or someone else for the stressful experience or what happened after it?

• Having strong negative feelings such as fear, horror, anger, guilt, or shame?

• Loss of interest in activities that you used to enjoy?

• Feeling distant or cut off from other people?

• Trouble experiencing positive feelings (for example, being unable to feel happiness or have loving feelings for people close to you)?

• Irritable behavior, angry outbursts, or acting aggressively?

• Taking too many risks or doing things that could cause you harm?

• Being "superalert" or watchful or on guard?

• Feeling jumpy or easily startled?

• Having difficulty concentrating?

• Trouble falling or staying asleep?

Appendix B

Institutional Review Board Approval

The research conducted in this thesis involves human subjects. As part of the EASE project, the research on human subjects was approved by the University of Colorado Colorado Springs Institutional Review Board (IRB) under protocol number 15-007 for data collection and 18-113 for analysis of data. The associated certifications were completed on February 15th, 2014 for a basic course, with a refresher course on October 14th, 2016. Certifications of basic courses on Human, Social and Behavioral research from the Collaborative Institutional Training Initiative (CITI) program are attached in Sections B.1 and B.2.

B.1 Certificate: Conduct of Research for Engineers

Completion Date 15-Feb-2014 Expiration Date N/A Record ID 12375060

This is to certify that:

Svati Dhamija

Has completed the following CITI Program course:

Responsible Conduct of Research for Engineers (Curriculum Group) Responsible Conduct of Research for Engineers (Course Learner Group) 1 - Basic Course (Stage)

Under requirements set by:

University of Colorado Colorado Springs

Verify at www.citiprogram.org/verify/?w2570df20-3577-4e32-90bd-1c9615cb9097-12375060

B.2 Certificate: Human, Social and Behavioral Research

Completion Date 14-Oct-2016 Expiration Date 14-Oct-2019 Record ID 21103015

This is to certify that:

Svati Dhamija

Has completed the following CITI Program course:

Human Research (Curriculum Group) Social and Behavioral Research (Course Learner Group) 2 - Refresher Course (Stage)

Under requirements set by:

University of Colorado Colorado Springs

Verify at www.citiprogram.org/verify/?w375aeb1d-7847-4c16-ad17-cfd8fc2c7676-21103015

Bibliography

[1] Pacifica labs. https://www.thinkpacifica.com/. Last accessed 12/1/17.

[2] Ginger.io. http://ginger.io. Last accessed 12/1/17.

[3] Pamela K Adelmann and Robert B Zajonc. Facial efference and the experience of emotion. Annual

review of psychology, 40(1):249–280, 1989.

[4] Hussein Al Osman and Tiago H Falk. Multimodal affect recognition: Current approaches and challenges.

In Emotion and Attention Recognition Based on Biological Signals and Images. InTech, 2017.

[5] Simon L Albrecht and Andrew Marty. Personality, self-efficacy and job resources and their associations

with employee engagement, affective commitment and turnover intentions. The International Journal

of Human Resource Management, pages 1–25, 2017.

[6] Ahmed Alqaraawi, Ahmad Alwosheel, and Amr Alasaad. Heart rate variability estimation in photo-

plethysmography signals using bayesian learning approach. Healthcare Technology Letters, 2016.

[7] Mohamed R Amer, Timothy Shields, Behjat Siddiquie, Amir Tamrakar, Ajay Divakaran, and Sek Chai.

Deep multimodal fusion: A hybrid approach. International Journal of Computer Vision, pages 1–17,

2017.

[8] Emily Anthes. Pocket psychiatry: mobile mental-health apps have exploded onto the market, but few

have been thoroughly tested. Nature, 532(7597):20–24, 2016.

[9] American Psychiatric Association et al. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub, 2013.

[10] Ryan SJd Baker, Sidney K D’Mello, Ma Mercedes T Rodrigo, and Arthur C Graesser. Better to be

frustrated than bored: The incidence, persistence, and impact of learners' cognitive–affective states

during interactions with three different computer-based learning environments. International Journal of

Human-Computer Studies, 68(4):223–241, 2010.

[11] Tadas Baltrušaitis, Peter Robinson, and Louis-Philippe Morency. Openface: an open source facial

behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV),

pages 1–10. IEEE, 2016.

[12] Albert Bandura. Self-efficacy: toward a unifying theory of behavioral change. Psychological review,

84(2):191, 1977.

[13] Albert Bandura. The self system in reciprocal determinism. American psychologist, 33(4):344, 1978.

[14] Albert Bandura. Recycling misconceptions of perceived self-efficacy. Cognitive therapy and research,

8(3):231–255, 1984.

[15] Albert Bandura. Model of causality in social learning theory. In Cognition and psychotherapy, pages

81–99. Springer, 1985.

[16] Albert Bandura. Social foundations of thought and action: A social cognitive theory. Prentice-Hall, Inc,

1986.

[17] Albert Bandura. Perceived self-efficacy in cognitive development and functioning. Educational

psychologist, 28(2):117–148, 1993.

[18] Albert Bandura. Self-efficacy. Wiley Online Library, 1994.

[19] Albert Bandura. Self-efficacy: The exercise of control. Macmillan, 1997.

[20] Albert Bandura. Health promotion by social cognitive means. Health education & behavior, 31(2):143–

164, 2004.

[21] Albert Bandura. Guide for constructing self-efficacy scales. Self-efficacy beliefs of adolescents,

5(307-337), 2006.

[22] Albert Bandura. Self-efficacy, corsini encyclopedia of psychology, 2010.

[23] Biplab Banerjee and Vittorio Murino. Efficient pooling of image based cnn features for action recog-

nition in videos. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International

Conference on, pages 2637–2641. IEEE, 2017.

[24] Adam Beckford. Applications of emotion sensing machines. Exp Psychol, 25(1):49–59.

[25] Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. Material recognition in the wild with the

materials in context database. In Proceedings of the IEEE conference on computer vision and pattern

recognition, pages 3479–3487, 2015.

[26] C Benight and A Bandura. Social cognitive theory of posttraumatic recovery: the role of perceived

self-efficacy. Behaviour Research and Therapy,Elsevier, 2004.

[27] C. Benight, J Ruzek, and E Waldrep. Internet interventions for traumatic stress: A review and

theoretically based example. J. of Trauma Stress, 21(6):513–520, 2008.

[28] Charles Benight, Kotaro Shoji, CarolynYeager, Austin Mullings, Svati Dhamija, and Terry Boult.

Changes self-appraisal and mood utilizing a web-based recovery system on posttraumatic stress symp-

toms: A laboratory experiment. ISTSS, 2016.

[29] Charles Benight, Kotaro Shoji, CarolynYeager, Austin Mullings, Svati Dhamija, and Terry Boult. The

importance of self-appraisals of coping capability in predicting engagement in a web intervention for

trauma. ISRII, 2016.

[30] Charles C Benight, Michael H Antoni, Kristin Kilbourn, Gail Ironson, Mahendra A Kumar, Mary Ann

Fletcher, Laura Redwine, Andrew Baum, and Neil Schneiderman. Coping self-efficacy buffers psy-

chological and physiological disturbances in hiv-infected men following a natural disaster. Health

Psychology, 16(3):248, 1997.

[31] Charles C Benight, Aaron Harwell, and Kotaro Shoji. Self-regulation shift theory: A dynamic personal

agency approach to recovery capital and methodological suggestions. Frontiers in psychology, 9, 2018.

[32] Charles C Benight, Kotaro Shoji, Lori E James, Edward E Waldrep, Douglas L Delahanty, and Roman

Cieslak. Trauma coping self-efficacy: A context-specific self-efficacy measure for traumatic stress.

Psychological trauma: theory, research, practice, and policy, 7(6):591, 2015.

[33] Gary G Bennett and Russell E Glasgow. The delivery of public health interventions via the internet:

actualizing their potential. Annual review of public health, 30:273–292, 2009.

[34] Nigel Bosch, Yuxuan Chen, and Sidney D'Mello. It's written on your face: detecting affective

states from facial expressions while learning computer programming. In International Conference on

Intelligent Tutoring Systems, pages 39–44. Springer, 2014.

[35] Hennie Brugman, Albert Russel, and Xd Nijmegen. Annotating multi-media/multi-modal resources

with elan. In LREC, 2004.

[36] EL Bunge, B Dickter, MK Jones, G Alie, A Spear, and R Perales. Behavioral intervention technologies

and psychotherapy with youth: A review of the literature. Current Psychiatry Reviews, 12(1):14–28,

2016.

[37] Virginia H Burney. Applications of social cognitive theory to gifted education. Roeper Review,

30(2):130–139, 2008.

[38] Carlos Busso, Srinivas Parthasarathy, Alec Burmania, Mohammed AbdelWahab, Najmeh Sadoughi,

and Emily Mower Provost. Msp-improv: An acted corpus of dyadic interactions to study emotion

perception. IEEE Transactions on Affective Computing, 8(1):67–80, 2017.

[39] Victoria Bustamante, Thomas A Mellman, Daniella David, and Ana I Fins. Cognitive functioning and

the early development of ptsd. Journal of traumatic stress, 14(4):791–797, 2001.

[40] Rafael A Calvo, Karthik Dinakar, Rosalind Picard, and Pattie Maes. Computing in mental health. In

Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems,

pages 3438–3445. ACM, 2016.

[41] Elizabeth Camilleri, Georgios N Yannakakis, and Antonios Liapis. Towards general models of player

affect. In Affective Computing and Intelligent Interaction (ACII), 2017 International Conference on,

2017.

[42] Carlos Carreiras, Ana Priscila Alves, André Lourenço, Filipe Canento, Hugo Silva, Ana Fred, et al.

BioSPPy: Biosignal processing in Python, 2015–. [Online].

[43] Ginevra Castellano, Iolanda Leite, André Pereira, Carlos Martinho, Ana Paiva, and Peter W McOwan.

Detecting engagement in hri: An exploration of social and task-based context. In Privacy, Security, Risk

and Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social

Computing (SocialCom), pages 421–428. IEEE, 2012.

[44] Ginevra Castellano, André Pereira, Iolanda Leite, Ana Paiva, and Peter W McOwan. Detecting user

engagement with a robot companion using task and social interaction-based features. In ICMI, pages

119–126. ACM, 2009.

[45] Oya Celiktutan, Efstratios Skordos, and Hatice Gunes. Multimodal human-human-robot interactions

(mhhri) dataset for studying personality and engagement. IEEE Transactions on Affective Computing,

2017.

[46] Wei-Yi Chang, Shih-Huan Hsu, and Jen-Hsien Chien. Fatauva-net: An integrated deep learning

framework for facial attribute recognition, action unit detection, and valence-arousal estimation. In

Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pages

1963–1971. IEEE, 2017.

[47] Mounir Chennaoui, Clément Bougard, Catherine Drogou, Christophe Langrume, Christian Miller,

Danielle Gomez-Merino, and Frédéric Vergnoux. Stress biomarkers, mood states, and sleep during

a major competition: “success” and “failure” athlete's profile of high-level swimmers. Frontiers in

physiology, 7, 2016.

[48] Anoop Cherian, Piotr Koniusz, and Stephen Gould. Higher-order pooling of cnn features via kernel

linearization for action recognition. In Applications of Computer Vision (WACV), 2017 IEEE Winter

Conference on, pages 130–138. IEEE, 2017.

[49] Katie E Cherry, Loren D Marks, Rachel Adamek, and Bethany A Lyon. Traumatic stress and long-term

recovery. 2015.

[50] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of

neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.

[51] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger

Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical

machine translation. arXiv preprint arXiv:1406.1078, 2014.

[52] W Chu, F De la Torre, and J Cohn. Learning spatial and temporal cues for multi-label facial action unit

detection. Automatic Face and Gesture Conference, 2017.

[53] Wen-Sheng Chu, Fernando De la Torre, and Jeffrey F Cohn. Selective transfer machine for personalized

facial expression analysis. IEEE transactions on pattern analysis and machine intelligence, 39(3):529–

545, 2017.

[54] Junyoung Chung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks.

arXiv preprint arXiv:1609.01704, 2016.

[55] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of

gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

[56] Roman Cieslak, Charles C Benight, Anna Rogala, Ewelina Smoktunowicz, Martyna Kowalska,

Katarzyna Zukowska, Carolyn Yeager, and Aleksandra Luszczynska. Effects of internet-based self-

efficacy intervention on secondary traumatic stress and secondary posttraumatic growth among health

and human services professionals exposed to indirect trauma. Frontiers in Psychology, 7, 2016.

[57] Carleton Coffrin, Linda Corrin, Paula de Barba, and Gregor Kennedy. Visualizing patterns of student

engagement and performance in moocs. In Proceedings of the fourth international conference on

learning analytics and knowledge, pages 83–92. ACM, 2014.

[58] Anat Cohen. Analysis of student activity in web-supported courses as a tool for predicting dropout.

Educational Technology Research and Development, pages 1–20, 2017.

[59] J Cohn. Foundations of human computing: facial expression and emotion. ICMI, pages 233–238, 2006.

[60] J Cohn, T Kruez, I Matthews, Y Yang, M Nguyen, M Padilla, F Zhou, and F De la Torre. Detecting

depression from facial actions and vocal prosody. Affective Computing and Intelligent Interaction and

Workshops, IEEE, 2009.

[61] Deborah R Compeau and Christopher A Higgins. Application of social cognitive theory to training for

computer skills. Information systems research, 6(2):118–143, 1995.

[62] M Cox, J Nuevo-Chiquero, JM Saragih, and S Lucey. Csiro face analysis sdk. Brisbane, Australia,

2013.

[63] Fernando De la Torre, Wen-Sheng Chu, Xuehan Xiong, Francisco Vicente, Xiaoyu Ding, and Jeffrey

Cohn. Intraface. In Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International

Conference and Workshops on, volume 1, pages 1–8. IEEE, 2015.

[64] Lyne Desrosiers, Micheline Saint-Jean, and Lise Laporte. Model of engagement and dropout for

adolescents with borderline personality disorder. Santé mentale au Québec, 41(1):267–290, 2017.

[65] Amanda Devane, Kotaro Shoji, Terrance Boult, and Charles C. Benight. The role of coping self-efficacy

on parasympathetic response to an online intervention. ISTSS, 2016.

[66] Svati Dhamija and Terrance Boult. Exploring contextual engagement for trauma recovery. CVPR

Workshop on Deep Affective Learning and Context Modelling, 2017.

[67] Svati Dhamija and Terrance E Boult. Automated mood-aware engagement prediction. In Affective

Computing and Intelligent Interaction (ACII), 2017 Seventh International Conference on, pages 1–8.

IEEE, 2017.

[68] Roberto Di Salvo, Concetto Spampinato, and Daniela Giordano. Generating reliable video annotations

by exploiting the crowd. In Applications of Computer Vision (WACV), 2016 IEEE Winter Conference

on, pages 1–8. IEEE, 2016.

[69] Anja Dieckmann and Matthias Unfried. Writ large on your face: Observing emotions using automatic

facial analysis. GfK Marketing Intelligence Review, 6(1):52–58, 2014.

[70] Richard A Dienstbier. Arousal and physiological toughness: implications for mental and physical

health. Psychological review, 96(1):84, 1989.

[71] Alexiei Dingli and Andreas Giordimaina. Webcam-based detection of emotional states. The Visual

Computer, 33(4):459–469, 2017.

[72] S D’Mello, S Craig, J Sullins, and A Graesser. Predicting affective states expressed through an

emote-aloud procedure from autotutor's mixed-initiative dialogue. International Journal of Artificial

Intelligence in Education, 16(1):2–28, 2006.

[73] Sidney D’Mello, Ed Dieterle, and Angela Duckworth. Advanced, analytic, automated (aaa) measurement

of engagement during learning. Educational Psychologist, 52(2):104–123, 2017.

[74] Sidney K D’Mello. On the influence of an iterative affect annotation approach on inter-observer and

self-observer reliability. IEEE Transactions on Affective Computing, 7(2):136–149, 2016.

[75] Sidney S D’Mello, Patrick Chipman, and Art Graesser. Posture as a predictor of learner’s affective

engagement. In Proceedings of the Cognitive Science Society, volume 29, 2007.

[76] Tom Dunne, Lisa Bishop, Susan Avery, and Stephen Darcy. A review of effective youth engagement

strategies for mental health and substance use interventions. Journal of Adolescent Health, 2017.

[77] Robert R Edwards and Jennifer Haythornthwaite. Mood swings: variability in the use of the profile of

mood states. Journal of pain and symptom management, 28(6):534, 2004.

[78] Panteleimon Ekkekakis. The measurement of affect, mood, and emotion: A guide for health-behavioral

research. Cambridge University Press, 2013.

[79] P Ekman and AJ Fridlund. Assessment of facial behavior in affective disorders. Depression and

expressive behavior, pages 37–56, 1987.

[80] Paul Ekman and Richard J Davidson. The nature of emotion: Fundamental questions. Oxford

University Press, 1994.

[81] C. Fabian Benitez-Quiroz, Ramprakash Srinivasan, and Aleix M. Martinez. Emotionet: An accurate,

real-time algorithm for the automatic annotation of a million facial expressions in the wild. In The IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[82] Mary Ellen Foster, Andre Gaschler, and Manuel Giuliani. Automatically classifying user engagement

for dynamic multi-party human–robot interaction. International Journal of Social Robotics, pages 1–16,

2017.

[83] Andrew C Gallagher and Tsuhan Chen. Estimating age, gender, and identity using first name priors. In

Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE,

2008.

[84] Autumn M Gallegos, Nicholas A Streltzov, and Tracy Stecker. Improving treatment engagement for

returning operation enduring freedom and operation iraqi freedom veterans with posttraumatic stress

disorder, depression, and suicidal ideation. The Journal of nervous and mental disease, 204(5):339–343,

2016.

[85] Chuanji Gao, Douglas H Wedell, Jongwan Kim, Christine E Weber, and Svetlana V Shinkareva.

Modelling audiovisual integration of affect from videos and music. Cognition and Emotion, pages 1–14,

2017.

[86] Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with

lstm. In 9th International Conference on Artificial Neural Networks: ICANN ’99. IET, 1999.

[87] Michail N Giannakos, Letizia Jaccheri, and John Krogstie. How video usage styles affect student

engagement? implications for video-based learning environments. In State-of-the-Art and Future

Directions of Smart Learning, pages 157–163. Springer, 2016.

[88] Elise L Gibbs, Andrea E Kass, Dawn M Eichen, Ellen E Fitzsimmons-Craft, Mickey Trockel, Denise E

Wilfley, and C Barr Taylor. Attention-deficit/hyperactivity disorder–specific stimulant misuse, mood,

anxiety, and stress in college-age women at high risk for or with eating disorders. Journal of American

College Health, 64(4):300–308, 2016.

[89] Jeffrey Girard. Carma: Software for continuous affect rating and media annotation. Journal of Open

Research Software, 2(1), 2014.

[90] Jeffrey M Girard. Carma: Software for continuous affect rating and media annotation. Journal of Open

Research Software, 2(1):e5, 2014.

[91] Jeffrey M Girard and Jeffrey F Cohn. A primer on observational measurement. Assessment, 23(4):404–

413, 2016.

[92] Karen Glanz, Barbara K Rimer, and Kasisomayajula Viswanath. Health behavior and health education:

theory, research, and practice. John Wiley & Sons, 2008.

[93] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

[94] John M Gottman and Robert W Levenson. A valid procedure for obtaining self-report of affect in

marital interaction. Journal of consulting and clinical psychology, 53(2):151, 1985.

[95] J Grafsgaard, J Wiggins, K Boyer, E Wiebe, and J Lester. Automatically recognizing facial expression:

Predicting engagement and frustration. Educational Data Mining, 2013.

[96] Joseph Grafsgaard, Joseph B Wiggins, Kristy Elizabeth Boyer, Eric N Wiebe, and James Lester.

Automatically recognizing facial expression: Predicting engagement and frustration. In Educational

Data Mining 2013, 2013.

[97] A Graves, A Mohamed, and G Hinton. Speech recognition with deep recurrent neural networks.

ICASSP, 2013.

[98] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850,

2013.

[99] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent

neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference

on, pages 6645–6649. IEEE, 2013.

[100] Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. Lstm: A

search space odyssey. IEEE transactions on neural networks and learning systems, 2016.

[101] Philip J Guo, Juho Kim, and Rob Rubin. How video production affects student engagement: An

empirical study of mooc videos. In Proceedings of the first ACM conference on Learning@ scale

conference, pages 41–50. ACM, 2014.

[102] Daniel Hadar. Implicit media tagging and affect prediction from video of spontaneous facial expressions,

recorded with depth camera. arXiv preprint arXiv:1701.05248, 2017.

[103] Patricia A Hageman, Carol H Pullen, Melody Hertzog, Bunny Pozehl, Christine Eisenhauer, and Linda S

Boeckner. Web-based interventions alone or supplemented with peer-led support or professional email

counseling for weight loss and weight maintenance in women from rural communities: Results of a

clinical trial. Journal of obesity, 2017, 2017.

[104] Moira Haller, Ursula S Myers, Aaron McKnight, Abigail C Angkaw, and Sonya B Norman. Predicting

engagement in psychotherapy, pharmacotherapy, or both psychotherapy and pharmacotherapy among

returning veterans seeking ptsd treatment. 2016.

[105] Nik Wahidah Hashim, Mitch Wilkes, Ronald Salomon, Jared Meggs, and Daniel J France. Evaluation

of voice acoustics as predictors of clinical depression scores. Journal of Voice, 2016.

[106] Javier Hernandez, Zicheng Liu, Geoff Hulten, Dave DeBarr, Kyle Krum, and Zhengyou Zhang.

Measuring the engagement level of tv viewers. In Automatic Face and Gesture Recognition (FG), 2013

10th IEEE International Conference and Workshops on, pages 1–7. IEEE, 2013.

[107] Guillaume Heusch, André Anjos, and Sébastien Marcel. A reproducible study on remote heart rate

measurement. arXiv preprint arXiv:1709.00962, 2017.

[108] Charlie Hewitt and Hatice Gunes. Cnn-based facial affect analysis on mobile devices. arXiv preprint

arXiv:1807.08775, 2018.

[109] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–

1780, 1997.

[110] Nathan O Hodas, Ryan Butner, et al. How a user's personality influences content engagement in social

media. In International Conference on Social Informatics, pages 481–493. Springer, 2016.

[111] Thomas R Insel. Assessing the economic costs of serious mental illness, 2008.

[112] Luca Iozzia, Luca Cerina, and Luca Mainardi. Relationships between heart-rate variability and pulse-

rate variability obtained from video-ppg signal using zca. Physiological Measurement, 37(11):1934,

2016.

[113] Ashesh Jain, Amir R Zamir, Silvio Savarese, and Ashutosh Saxena. Structural-rnn: Deep learning

on spatio-temporal graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, pages 5308–5317, 2016.

[114] László A Jeni, Jeffrey F Cohn, and Takeo Kanade. Dense 3d face alignment from 2d videos in real-time.

In Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and

Workshops on, volume 1, pages 1–8. IEEE, 2015.

[115] László A Jeni, András Lőrincz, Zoltán Szabó, Jeffrey F Cohn, and Takeo Kanade. Spatio-temporal

event classification using time-series kernel based structured sparsity. In European Conference on

Computer Vision, pages 135–150. Springer, 2014.

[116] Rajiv Jhangiani, Hammond Tarry, and Charles Stangor. Principles of Social Psychology - 1st

International Edition. Creative Commons Attribution 4.0 International License, 2014.

[117] Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network

architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15),

pages 2342–2350, 2015.

[118] Aditya Kamath, Aradhya Biswas, and Vineeth Balasubramanian. A crowdsourced approach to student

engagement recognition in e-learning environments. In Applications of Computer Vision (WACV), 2016

IEEE Winter Conference on, pages 1–9. IEEE, 2016.

[119] Andrej Karpathy and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137,

2015.

[120] Christina Katsimerou, Ingrid Heynderickx, and Judith A Redi. Predicting mood from punctual emotion

annotations on videos. IEEE Transactions on Affective Computing, 6(2):179–192, 2015.

[121] Heysem Kaya, Furkan Gürpınar, and Albert Ali Salah. Video-based emotion recognition in the wild

using deep transfer learning and score fusion. Image and Vision Computing, 2017.

[122] Nidhi N Khatri, Zankhana H Shah, and Samip A Patel. Facial expression recognition: A survey.

International Journal of Computer Science and Information Technologies (IJCSIT), 5(1):149–152,

2014.

[123] Kyung Hwan Kim, Seok Won Bang, and Sang Ryong Kim. Emotion recognition system using short-term

monitoring of physiological signals. Medical and biological engineering and computing, 42(3):419–427,

2004.

[124] Michael Kipp. Anvil-a generic annotation tool for multimodal dialogue. In Seventh European Confer-

ence on Speech Communication and Technology, 2001.

[125] Chris L Kleinke, Thomas R Peterson, and Thomas R Rutledge. Effects of self-generated facial

expressions on mood. Journal of Personality and Social Psychology, 74(1):272, 1998.

[126] David Klotz, Johannes Wienke, Julia Peltason, Britta Wrede, Sebastian Wrede, Vasil Khalidov, and

Jean-Marc Odobez. Engagement-based multi-party dialog with a humanoid robot. In Proceedings of

the SIGDIAL 2011 Conference, pages 341–343. Association for Computational Linguistics, 2011.

[127] Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj

Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. Deap: A database for emotion analysis;

using physiological signals. IEEE Transactions on Affective Computing, 3(1):18–31, 2012.

[128] Jean Kossaifi, Georgios Tzimiropoulos, Sinisa Todorovic, and Maja Pantic. Afew-va database for

valence and arousal estimation in-the-wild. Image and Vision Computing, 2017.

[129] George D Kuh. The national survey of student engagement: Conceptual and empirical foundations.

New directions for institutional research, 2009(141):5–20, 2009.

[130] F De la Torre and J Cohn. Facial expression analysis. Visual Analysis of Humans - Springer, pages

377–409, 2011.

[131] Robert LaRose and Matthew S Eastin. A social cognitive theory of internet uses and gratifications:

Toward a new model of media attendance. Journal of Broadcasting & Electronic Media, 48(3):358–377,

2004.

[132] Robert W Lent, Steven D Brown, and Gail Hackett. Toward a unifying social cognitive theory of career

and academic interest, choice, and performance. Journal of vocational behavior, 45(1):79–122, 1994.

[133] Michael E Levin, Steven C Hayes, Jacqueline Pistorello, and John R Seeley. Web-based self-help for

preventing mental health problems in universities: Comparing acceptance and commitment training to

mental health education. Journal of clinical psychology, 72(3):207–225, 2016.

[134] Haoxiang Li, Jonathan Brandt, Zhe Lin, Xiaohui Shen, and Gang Hua. A multi-level contextual model

for person recognition in photo albums. In Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition, pages 1297–1305, 2016.

[135] Wei Li, Farnaz Abtahi, and Zhigang Zhu. Action unit detection with region adaptation, multi-labeling

learning and optimal temporal fusing. arXiv preprint arXiv:1704.03067, 2017.

[136] X Li, J Chen, G Zhao, and M Pietikäinen. Remote heart rate measurement from face videos under

realistic situations. Computer Vision and Pattern Recognition, 2014.

[137] Xin Li and Yuhong Guo. Adaptive active learning for image classification. In Proceedings of the IEEE

Conference on Computer Vision and Pattern Recognition, pages 859–866, 2013.

[138] Tze Wei Liew and Tan Su-Mae. The effects of positive and negative mood on cognition and motivation

in multimedia learning environment. Journal of Educational Technology & Society, 19(2):104, 2016.

[139] Elizabeth A Linnenbrink and Paul R Pintrich. The role of self-efficacy beliefs in student engagement

and learning in the classroom. Reading & Writing Quarterly, 19(2):119–137, 2003.

[140] Joda Lloyd, Frank W Bond, and Paul E Flaxman. Work-related self-efficacy as a moderator of the

impact of a worksite stress management training intervention: Intrinsic work motivation as a higher

order condition of effect. Journal of occupational health psychology, 22(1):115, 2017.

[141] Miguel L Bote-Lorenzo and Eduardo Gómez-Sánchez. Predicting the decrease of engagement

indicators in a mooc. 2017.

[142] Aleksandra Luszczynska and R Schwarzer. Social cognitive theory. Predicting health behaviour,

2:127–169, 2005.

[143] Ludo Maat and Maja Pantic. Gaze-x: Adaptive, affective, multimodal interface for single-user office

scenarios. In Artificial Intelligence for Human Computing, pages 251–271. Springer, 2007.

[144] D Macea, K Gajos, Y Daglia Calil, and F Fregni. The efficacy of web-based cognitive behavioral

interventions for chronic pain: a systematic review and meta-analysis. J. of Pain, 11(10):917–929, 2010.

[145] Adria Mallol-Ragolta, Svati Dhamija, and Terrance E Boult. A multimodal approach for predict-

ing changes in ptsd symptom severity. In Proceedings of the 2018 on International Conference on

Multimodal Interaction, pages 324–333. ACM, 2018.

[146] S. U. Marks and R. Gersten. Engagement and disengagement between special and general educators: An

application of miles and huberman’s cross-case analysis. Learning Disability Quarterly, 21(1):32–56,

1998.

[147] René Mauer, Helle Neergaard, and Anne Kirketerp Linstad. Self-efficacy: Conditioning the en-

trepreneurial mindset. In Revisiting the Entrepreneurial Mind, pages 293–317. Springer, 2017.

[148] Daniel McDuff. New methods for measuring advertising efficacy. Digital Advertising: Theory and

Research, 2017.

[149] Daniel McDuff, Rana El Kaliouby, Jeffrey F Cohn, and Rosalind W Picard. Predicting ad liking

and purchase intent: Large-scale analysis of facial responses to ads. IEEE Transactions on Affective

Computing, 6(3):223–235, 2015.

[150] Daniel McDuff, Rana El Kaliouby, and Rosalind W Picard. Crowdsourcing facial responses to online

videos. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on,

pages 512–518. IEEE, 2015.

[151] Daniel Jonathan McDuff. Crowdsourcing affective responses for predicting media effectiveness. PhD

thesis, Massachusetts Institute of Technology, 2014.

[152] Scott McIntosh, Tye Johnson, Andrew F Wall, Alexander V Prokhorov, Karen Sue Calabro, Duncan

Ververs, Vanessa Assibey-Mensah, and Deborah J Ossip. Recruitment of community college students

into a web-assisted tobacco intervention study. JMIR research protocols, 6(5), 2017.

[153] Varun S Mehta, Malvika Parakh, and Debasruti Ghosh. Web based interventions in psychiatry: An

overview. International Journal of Mental Health & Psychiatry, 2015, 2016.

[154] Antoine Miech, Ivan Laptev, and Josef Sivic. Learnable pooling with context gating for video classifica-

tion. arXiv preprint arXiv:1706.06905, 2017.

[155] Mark J Millan, Jean-Michel Rivet, and Alain Gobert. The frontal cortex as a network hub controlling

mood and cognition: Probing its neurochemical substrates for improved therapy of psychiatric and

neurological disorders. Journal of Psychopharmacology, 30(11):1099–1128, 2016.

[156] David C Mohr, Michelle Nicole Burns, Stephen M Schueller, Gregory Clarke, and Michael Klinkman.

Behavioral intervention technologies: evidence review and recommendations for future research in

mental health. General hospital psychiatry, 35(4):332–338, 2013.

[157] H Monkaresi, P Bosch, R Calvo, and S D’Mello. Automated detection of engagement using video-based

estimation of facial expressions and heart rate. IEEE Trans. on Affective Computing, 2017.

[158] Hamed Monkaresi, Nigel Bosch, Rafael Calvo, and Sidney D’Mello. Automated detection of engage-

ment using video-based estimation of facial expressions and heart rate.

[159] Hamed Monkaresi, Nigel Bosch, Rafael A Calvo, and Sidney K D’Mello. Automated detection of

engagement using video-based estimation of facial expressions and heart rate. IEEE Transactions on

Affective Computing, 8(1):15–28, 2017.

[160] Robert R Morris, Stephen M Schueller, and Rosalind W Picard. Efficacy of a web-based, crowdsourced

peer-to-peer cognitive reappraisal platform for depression: Randomized controlled trial. Journal of

medical Internet research, 17(3), 2015.

[161] Cecily Morrison and Gavin Doherty. Analyzing engagement in a web-based intervention platform

through visualizing log-data. Journal of medical Internet research, 16(11), 2014.

[162] Wenxuan Mou, Hatice Gunes, and Ioannis Patras. Automatic recognition of emotions and membership

in group videos. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Workshops, June 2016.

[163] Barbara M Murphy, Rosemary O Higgins, Lyndel Shand, Karen Page, Elizabeth Holloway, Michael R

Le Grande, and Alun C Jackson. Improving health professionals' self-efficacy to support cardiac

patients' emotional recovery: the 'cardiac blues project'. European Journal of Cardiovascular Nursing,

16(2):143–149, 2017.

[164] Elizabeth Murray. Web-based interventions for behavior change and self-management: potential,

pitfalls, and progress. Medicine 2.0, 1(2):e3, 2012.

[165] Frederik Nagel, Reinhard Kopiez, Oliver Grewe, and Eckart Altenmüller. Emujoy: Software for

continuous measurement of perceived emotions in music. Behavior Research Methods, 39(2):283–290,

2007.

[166] Quang Nguyen. Efficient learning with soft label information and multiple annotators. PhD thesis,

University of Pittsburgh, 2014.

[167] Mihalis A Nicolaou, Hatice Gunes, and Maja Pantic. Continuous prediction of spontaneous affect from

multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing,

2(2):92–105, 2011.

[168] Jaclyn Ocumpaugh, Nigel Bosch, Sidney K D'Mello, Ryan S Baker, and Valerie Shute. Using video to

automatically detect learner affect in computer-enabled classrooms. 2010.

[169] H O’Brien and E Toms. The development and evaluation of a survey to measure user engagement.

Journal of the American Society for Information Science and Technology, 61(1):50–69, 2010.

[170] Sarah Oetken, Katharina D Pauly, Ruben C Gur, Frank Schneider, Ute Habel, and Anna Pohl. Don't

worry, be happy-neural correlates of the influence of musically induced mood on self-evaluation.

Neuropsychologia, 100:26–34, 2017.

[171] Carol Opdebeeck, Fiona E Matthews, Yu-Tzu Wu, Robert T Woods, Carol Brayne, and Linda Clare.

Cognitive reserve as a moderator of the negative association between mood and cognition: evidence

from a population-representative cohort. Psychological Medicine, pages 1–11, 2017.

[172] Rachel Pemberton and Matthew D Fuller Tyszkiewicz. Factors contributing to depressive mood states

in everyday life: a systematic review. Journal of affective disorders, 200:103–110, 2016.

[173] Donald W Pfaff. Brain arousal and information theory. Harvard University Press, 2006.

[174] Rosalind W. Picard, Elias Vyzas, and Jennifer Healey. Toward machine emotional intelligence: Analysis

of affective physiological state. IEEE transactions on pattern analysis and machine intelligence,

23(10):1175–1191, 2001.

[175] Rosalind W. Picard, Elias Vyzas, and Jennifer Healey. Toward machine emotional intelligence: Analysis

of affective physiological state. IEEE transactions on pattern analysis and machine intelligence,

23(10):1175–1191, 2001.

[176] Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. A review of affective computing: From

unimodal analysis to multimodal fusion. Information Fusion, 37:98–125, 2017.

[177] Andrew Pulver and Siwei Lyu. Lstm with working memory. arXiv preprint arXiv:1605.01988, 2016.

[178] O.V. Ramana Murthy and Roland Goecke. Ordered trajectories for large scale human action recognition.

In The IEEE International Conference on Computer Vision (ICCV) Workshops, December 2013.

[179] Vikas C Raykar, Shipeng Yu, Linda H Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni,

and Linda Moy. Learning from crowds. Journal of Machine Learning Research, 11(Apr):1297–1322,

2010.

[180] Fabien Ringeval, Björn Schuller, Michel Valstar, Shashank Jaiswal, Erik Marchi, Denis Lalanne, Roddy

Cowie, and Maja Pantic. Av+ec 2015: The first affect recognition challenge bridging across audio,

video, and physiological data. In Proceedings of the 5th International Workshop on Audio/Visual

Emotion Challenge, pages 3–8. ACM, 2015.

[181] Fabien Ringeval, Andreas Sonderegger, Juergen Sauer, and Denis Lalanne. Introducing the recola

multimodal corpus of remote collaborative and affective interactions. In Automatic Face and Gesture

Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1–8. IEEE,

2013.

[182] Dayane Ferreira Rodrigues, Andressa Silva, João Paulo Pereira Rosa, Francieli Silva Ruiz, Amaury Wag-

ner Veríssimo, Ciro Winckler, Edilson Alves da Rocha, Andrew Parsons, Sergio Tufik, and Marco Túlio

de Mello. Profiles of mood states, depression, sleep quality, sleepiness, and anxiety of the paralympic

athletics team: A longitudinal study. Apunts. Medicina de l’Esport, 2017.

[183] Mary AM Rogers, Kelsey Lemmen, Rachel Kramer, Jason Mann, and Vineet Chopra. Internet-delivered

health interventions that work: systematic review of meta-analyses and evaluation of website availability.

Journal of medical Internet research, 19(3), 2017.

[184] Hanan Salam, Oya Celiktutan, Isabelle Hupont, Hatice Gunes, and Mohamed Chetouani. Fully

automatic analysis of engagement and its relationship to personality in human-robot interactions. IEEE

Access, 5:705–721, 2017.

[185] Akane Sano, Amy Z Yu, Andrew W McHill, Andrew JK Phillips, Sara Taylor, Natasha Jaques,

Elizabeth B Klerman, and Rosalind W Picard. Prediction of happy-sad mood from daily behaviors

and previous sleep history. In 2015 37th Annual International Conference of the IEEE Engineering in

Medicine and Biology Society (EMBC), pages 6796–6799. IEEE, 2015.

[186] Akane Sano, Sara Taylor, Andrew W McHill, Andrew JK Phillips, Laura K Barger, Elizabeth Klerman,

and Rosalind Picard. Identifying objective physiological markers and modifiable behaviors for self-

reported stress and mental health status using wearable sensors and mobile phones: Observational study.

Journal of medical Internet research, 20(6):e210, 2018.

[187] Walter J Scheirer, Samuel E Anthony, Ken Nakayama, and David D Cox. Perceptual annotation:

Measuring human vision to improve computer vision. IEEE transactions on pattern analysis and

machine intelligence, 36(8):1679–1686, 2014.

[188] Walter J Scheirer, Neeraj Kumar, Karl Ricanek, Peter N Belhumeur, and Terrance E Boult. Fusing

with context: a bayesian approach to combining descriptive attributes. In Biometrics (IJCB), 2011

International Joint Conference on, pages 1–8. IEEE, 2011.

[189] S Scherer, G Lucas, J Gratch, A Rizzo, and L P Morency. Self-reported symptoms of depression

and ptsd are associated with reduced vowel space in screening interview. IEEE Trans. on Affective

Computing, 2016.

[190] Jürgen Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks.

Connection Science, 1(4):403–412, 1989.

[191] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117,

2015.

[192] Mark Scholten, Saskia Kelders, and Lisette van Gemert-Pijnen. How persuasive is a virtual coach in

terms of promoting web-based intervention user engagement?

[193] Julius Schöning, Patrick Faion, Gunther Heidemann, and Ulf Krumnack. Providing video annotations

in multimedia containers for visualization and research. In Applications of Computer Vision (WACV),

2017 IEEE Winter Conference on, pages 650–659. IEEE, 2017.

[194] Stephen M Schueller, Ricardo F Muñoz, and David C Mohr. Realizing the potential of behavioral

intervention technologies. Current Directions in Psychological Science, 22(6):478–483, 2013.

[195] Scott E Seibert, Leisa D Sargent, Maria L Kraimer, and Kohyar Kiazad. Linking developmental

experiences to leader effectiveness and promotability: The mediating role of leadership self-efficacy

and mentor network. Personnel Psychology, 70(2):357–397, 2017.

[196] S Shacham. A shortened version of the profile of mood states. Journal of Personality Assessment,

47(3):305–306, 1983.

[197] Kotaro Shoji, Charles Benight, Austin Mullings, Carolyn Yeager, Svati Dhamija, and Terry Boult.

Measuring engagement into the web-intervention by the quality of voice. ISRII, 2016.

[198] Kotaro Shoji, Amanda Devane, Svati Dhamija, Terry Boult, and Charles Benight. Recurrence of heart

rate and skin conductance during a web-based intervention for trauma survivors. ISTSS, 2017.

[199] Mohammad Soleymani, Sadjad Asghari-Esfeden, Yun Fu, and Maja Pantic. Analysis of eeg signals

and facial expressions for continuous emotion detection. IEEE Transactions on Affective Computing,

7(1):17–28, 2016.

[200] Mohammad Soleymani and Martha Larson. Crowdsourcing for affective annotation of video: Devel-

opment of a viewer-reported boredom corpus. In Workshop on Crowdsourcing for Search Evaluation,

SIGIR 2010., 2010.

[201] Mohammad Soleymani, Jeroen Lichtenauer, Thierry Pun, and Maja Pantic. A multimodal database for

affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1):42–55, 2012.

[202] S Steinmetz, C Benight, S Bishop, and L James. My disaster recovery: a pilot randomized controlled

trial of an internet intervention. Anxiety Stress Coping, 25(5):593–600, 2012.

[203] Sarah E Steinmetz, Charles C Benight, Sheryl L Bishop, and Lori E James. My disaster recovery: a

pilot randomized controlled trial of an internet intervention. Anxiety, Stress & Coping, 25(5):593–600,

2012.

[204] Giota Stratou and Louis-Philippe Morency. Multisense-context-aware nonverbal behavior analysis

framework: A psychological distress use case. IEEE Transactions on Affective Computing.

[205] Victor Strecher, Jennifer McClure, Gwen Alexander, Bibhas Chakraborty, Vijay Nair, Janine Konkel,

Sarah Greene, Mick Couper, Carola Carlier, Cheryl Wiese, et al. The role of engagement in a tailored

web-based smoking cessation program: randomized controlled trial. Journal of medical Internet

research, 10(5):e36, 2008.

[206] Yu Sun, Sijung Hu, Vicente Azorin-Peris, Roy Kalawsky, and Stephen Greenwald. Noncontact

imaging photoplethysmography to effectively access pulse rate variability. Journal of biomedical optics,

18(6):061205–061205, 2013.

[207] Kaloyan S Tanev, Scott P Orr, Edward F Pace-Schott, Michael Griffin, Roger K Pitman, and Patricia A

Resick. Positive association between nightmares and heart rate response to loud tones: relationship to

parasympathetic dysfunction in ptsd nightmares. The Journal of nervous and mental disease, 205(4):308,

2017.

[208] Chuangao Tang, Wenming Zheng, Jingwei Yan, Qiang Li, Yang Li, Tong Zhang, and Zhen Cui. View-

independent facial action unit detection. In Automatic Face & Gesture Recognition (FG 2017), 2017

12th IEEE International Conference on, pages 878–882. IEEE, 2017.

[209] Kevin Tang, Manohar Paluri, Li Fei-Fei, Rob Fergus, and Lubomir Bourdev. Improving image

classification with location context. In Proceedings of the IEEE International Conference on Computer

Vision, pages 1008–1016, 2015.

[210] Mika P Tarvainen, Juha-Pekka Niskanen, Jukka A Lipponen, Perttu O Ranta-Aho, and Pasi A Kar-

jalainen. Kubios hrv–heart rate variability analysis software. Computer methods and programs in

biomedicine, 113(1):210–220, 2014.

[211] Thales Teixeira, Michel Wedel, and Rik Pieters. Emotion-induced engagement in internet video

advertisements. Journal of Marketing Research, 49(2):144–159, 2012.

[212] P Daphne Tsatsoulis, Paige Kordas, Michael Marshall, David Forsyth, and Agata Rozga. The static

multimodal dyadic behavior dataset for engagement prediction. In Computer Vision–ECCV 2016

Workshops, pages 386–399. Springer, 2016.

[213] Hungwei Tseng and Yu-Chun Kuo. Growth mindsets and flexible thinking in first-time online students'

self-efficacy in learning. In Society for Information Technology & Teacher Education International

Conference, pages 299–302. Association for the Advancement of Computing in Education (AACE),

2017.

[214] Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F Cohn, and Nicu Sebe.

Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2396–2404,

2016.

[215] Sepehr Valipour, Mennatullah Siam, Martin Jagersand, and Nilanjan Ray. Recurrent fully convolutional

networks for video segmentation. In Applications of Computer Vision (WACV), 2017 IEEE Winter

Conference on, pages 29–36. IEEE, 2017.

[216] Hamed Valizadegan, Quang Nguyen, and Milos Hauskrecht. Learning classification models from

multiple experts. Journal of biomedical informatics, 46(6):1125–1135, 2013.

[217] Geert JM van Boxtel, Pierre JM Cluitmans, Roy JEM Raymann, Martin Ouwerkerk, Ad JM Denissen,

Marian KJ Dekker, and Margriet M Sitskoorn. Heart rate variability, sleep, and the early detection of

post-traumatic stress disorder. In Sleep and Combat-Related Post Traumatic Stress Disorder, pages

253–263. Springer, 2018.

[218] Sidni Alanna Vaughn. Understanding developmental and risk-status effects on visual engagement:

Evaluation of infant play behavior. PhD thesis, Georgia Institute of Technology, 2017.

[219] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image

caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition,

pages 3156–3164, 2015.

[220] Carl Vondrick and Deva Ramanan. Video annotation and tracking with active learning. In Advances in

Neural Information Processing Systems, pages 28–36, 2011.

[221] R Walecki, O Rudovic, V Pavlovic, B Schuller, and M Pantic. Deep structured learning for facial

expression intensity estimation. CVPR, 2017.

[222] Robert Walecki, Vladimir Pavlovic, Björn Schuller, Maja Pantic, et al. Deep structured learning for

facial action unit intensity estimation. arXiv preprint arXiv:1704.04481, 2017.

[223] Jing Wang, Pei Lei, Kun Wang, Lijuan Mao, and Xinyu Chai. Mood states recognition of rowing

athletes based on multi-physiological signals using pso-svm. E-Health Telecommunication Systems and

Networks, 2014, 2014.

[224] Xiaoyang Wang and Qiang Ji. A hierarchical context model for event recognition in surveillance video.

In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.

[225] Peter Welinder, Steve Branson, Pietro Perona, and Serge J Belongie. The multidimensional wisdom of

crowds. In Advances in neural information processing systems, pages 2424–2432, 2010.

[226] J. Whitehill, Z. Serpell, Y.-C. Lin, A. Foster, and J. R. Movellan. Faces of engagement: Automatic

recognition of student engagement from facial expressions. IEEE Trans. on Affective Computing,

5(3):86–98, 2014.

[227] Jacob Whitehill, Zewelanji Serpell, Yi-Ching Lin, Aysha Foster, and Javier R Movellan. The faces of

engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions

on Affective Computing, 5(1):86–98, 2014.

[228] Martin Wöllmer, Moritz Kaiser, Florian Eyben, Björn Schuller, and Gerhard Rigoll. Lstm-modeling of

continuous emotions in an audiovisual affect recognition framework. Image and Vision Computing,

31(2):153–163, 2013.

[229] Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, and Leonid Sigal. Harnessing object and scene semantics

for large-scale video understanding. In Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, pages 3112–3121, 2016.

[230] Zhongwen Xu, Yi Yang, and Alex G Hauptmann. A discriminative cnn video representation for event

detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages

1798–1807, 2015.

[231] Hao Yang and Hui Zhang. Efficient 3d room shape recovery from a single panorama. In Proceedings of

the IEEE Conference on Computer Vision and Pattern Recognition, pages 5422–5430, 2016.

[232] Jimei Yang, Brian Price, Scott Cohen, and Ming-Hsuan Yang. Context driven scene parsing with

attention to rare classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, pages 3294–3301, 2014.

[233] Shuang Yang, Ognjen Rudovic, Vladimir Pavlovic, and Maja Pantic. Personalized modeling of facial

action unit intensity. In International Symposium on Visual Computing, pages 269–281. Springer, 2014.

[234] Georgios N Yannakakis and Ana Paiva. Emotion in games. Handbook on affective computing, pages

459–471, 2014.

[235] C Yeager. Understanding engagement with a trauma recovery web intervention using the health action

process approach framework. PhD Thesis, 2016.

[236] Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga,

and George Toderici. Beyond short snippets: Deep networks for video classification. In Proceedings of

the IEEE conference on computer vision and pattern recognition, pages 4694–4702, 2015.

[237] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv

preprint arXiv:1409.2329, 2014.

[238] Gloria Zen, Lorenzo Porzi, Enver Sangineto, Elisa Ricci, and Nicu Sebe. Learning personalized

models for facial expression analysis and gesture recognition. IEEE Transactions on Multimedia,

18(4):775–788, 2016.

[239] Zheng Zhang, Jeff M. Girard, Yue Wu, Xing Zhang, Peng Liu, Umur Ciftci, Shaun Canavan, Michael

Reale, Andy Horowitz, Huiyuan Yang, Jeffrey F. Cohn, Qiang Ji, and Lijun Yin. Multimodal sponta-

neous emotion corpus for human behavior analysis. In The IEEE Conference on Computer Vision and

Pattern Recognition (CVPR), June 2016.

[240] Rui Zhao, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. Saliency detection by multi-context

deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,

pages 1265–1274, 2015.

[241] Yuqian Zhou and Bertram E Shi. Action unit selective feature maps in deep networks for facial

expression recognition. In Neural Networks (IJCNN), 2017 International Joint Conference on, pages

2031–2038. IEEE, 2017.
