Socially-Aware Dialogue System
Total Page:16
File Type:pdf, Size:1020Kb
Socially-Aware Dialogue System Ran Zhao CMU-LTI-19-008 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: Alexander I.Rudnicky, CMU (Chair) Alan W.Black, CMU (Co-Chair) Louis-Philippe Morency, CMU Amanda Stent, Bloomberg Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy In Language and Information Technologies c 2019, Ran Zhao Keywords: socially-aware framework, multimodal machine learning, temporal sequence learning, cognitive architecture, neural dialogue model, rapport, conversational strategy, spoken dialogue system, discourse analysis, social reasoning, natural language generation, negotiation Abstract In the past two decades, spoken dialogue systems, such as those commonly found in cellphones and other interactive devices, have emerged as a key factor in human- computer interaction. For instance, Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa help human users complete tasks more efficiently. However, research in this area has yet to produce dialogue systems that build interpersonal closeness over the course of a conversation while also carrying out the task. This thesis attempts to address that shortcoming. Specifically, research in computational linguistics (Bick- more and Cassell, 1999) has shown that people pursue multiple conversational goals in dialogue, which include those that fulfill propositional functions to contribute information to the dialogue; those that fulfill interactional functions to manage con- versational turn-taking; and those that fulfill interpersonal functions to manage the relationships between interlocutors. Although spoken dialogue systems have greatly advanced in modeling the propositional and, to a lesser extent, interactional functions of human communication, these systems fall short in replicating the interpersonal functions of conversation. We propose that this interpersonal deficiency is due to insufficient models of interpersonal goals and strategies in human communication. As dialogue systems become more common and are used more frequently, propo- sitional content and interactional content will not suffice. In this thesis, therefore, we address these challenges by proposing a socially-aware intelligent framework that exploits a path to systematically generate dialogues that fulfill interpersonal functions. In Zhao et al., (2014), we clarify that a socially-aware intelligent framework can explain how humans in dyadic interactions build, maintain, and tear down social bonds through specific conversational strategies that fulfill specific social goals and that are instantiated in particular verbal and nonverbal behaviors. In order to operationalize this framework, we argue that four capabilities are needed to achieve a socially-aware intelligent system. The system must (1) automatically infer human users’ social intentions by recognizing their social conversational strategies, (2) accurately estimate social dynamics by observing dyadic interactions, (3) reason through appropriate conversational strategies while accounting for both the task goal and social goal, and (4) realize surface-level utterances that blend task and social conversation. Our socially-aware dialogue system focuses on blended conversations that mix a goal-oriented task with social chat. As a proof of concept, we exploit a knowledge-inspired socially-aware personal assistant to aid conference attendees by eliciting their preferences through building rapport, and then making informed personalized recommendations about sessions to attend and people to meet. Finally, we leverage the power of neural networks to model negotiation dialogue within our socially-aware intelligent framework. We present a novel learning method and a two-phase computational model to blend negotiation utterances and social conversation. Our method requires less human efforts than traditional knowledge- inspired approaches. We conduct comprehensive experiments to show that the system can facilitate negotiation while building a social bond with a human user. iv Acknowledgments I would like to sincerely express my gratitude, first and foremost, to my won- derfully supportive and helpful committee, Prof. Alexander I. Rudnicky, Prof. Alan Black, Prof. Louis-Philippe Morency and Dr. Amanda Stent. My PhD advisor, Prof. Rudnicky, allowed me freedom to explore the field of human-computer interaction, guided my future research directions, and appreciated and supported my ideas. Our relationship is deeper than the typical advisor-advisee relationship. Prof. Rudnicky generously shared with me his insights on the fundamental philosophy and theory of human communication, which sparked my curiosity about dialogue systems and inspired me to learn state-of-the-art models. Our meetings left me imaging new research directions that I had previously not thought possible, and to ensure rigor, he guided me to craft evaluation metrics for each study. Without his help, I could not have produced the quality work that follows. Prof. Black offered helpful and creative advice when I strayed off path. He taught me how to ground computational theories of speech and language with methods of practical implementation. He made me believe that one day a computer could speak like a human. I am privileged to have had the opportunity to learn about multimodal machine learning from Prof. Morency. His excellent teaching and thoughtful questions equipped me with both high-level methodology design and low-level model development on the research of human communication dynamics. Dr. Stent at Bloomberg provided me an invaluable internship experience which engaged my great passion for applying deep learning to finance. She directed me to inspiring resources that broadened my horizons, and she taught me how to shape research ideas and design experiments. Even more, she helped me present our work at an international conference. I am extremely grateful for having four world-renowned experts on my committee. Thank you all for supporting my thesis work. During my PhD journey, I was fortunate to work on different projects and collab- orate with several faculty and talented researchers, including Prof. Justine Cassell, Prof. Mark Dredze, Dr. Arun Verma, Dr. David Rosenberg, Tanmay Sinha, Os- car J. Romero, Tiancheng Zhao, Yuntian Deng, Yoichi Matasuyama, Alexandros Papangelis, Arjun Bhardwaj, Sushma Akoju, Zhao Meng and Orson Xu. Thank you all for offering me precious research experience and lifelong friendship. As a student at CMU, I have learned so much about machine learning and natural language processing, and honed skills to implement that knowledge. Here, I want to thank professors who taught me while at CMU: Tom Mitchell, Manuela Veloso, Ruslan Salakhutdinov, Graham Neubig, Eduard Hovy, Eric Nyberg, Reid Simmons, Chris Dyer, Noah Smith and Carolyn Penstein Rose. In the past six years, I have attended many conferences, which gave me opportu- nities to discuss and interact with many smart people, such as Prof. Jonathan Gratch, Prof. David Traum, Dr. Jason Weston, Prof. Stefan Scherer, Prof. Stefan Kopp, Prof. Timothy Bickmore, Prof. Yukiko Nakano, Prof. Hannes Vilhjalmssoon, Natasha Jaques, Dr. Bing Liu, Elena Corina Grigore, Dr. Arno Hartholt, and many more than I can list here. I am inspired by their passion and grateful for their contributions to our field. In addition, I want to thank my sponsors, Yahoo! and Amazon, for their generous funding; Dr. Doug Phillips for his professional English revision; and my undergradu- ate research assistants for their annotation work. I am grateful for your assistance, which sped up my research and made my life much easier. Finally, to my dearest family: Thank you. To my parents, for your unconditional love and support. Thank you, Jingjing, my wonderful wife and soulmate, for facing with me the challenges of the last six years and giving me the courage to overcome the difficulties. Thank you, Winston and Harry, my two cute sons for making my life meaningful and enjoyable. This dissertation is as much yours as it is mine. vi Contents 1 Introduction 1 1.1 Thesis Statement . .4 1.2 Thesis Structure . .4 2 Theoretical Framework 7 2.1 Introduction . .7 2.2 Related Work . .8 2.3 Theoretical Framework for Rapport Management . .9 2.4 Computational Model of Rapport Management . 13 2.5 Examples from Corpus Data . 15 2.6 Conclusions . 15 3 A Knowledge-inspired Socially-aware Personal Assistant (SAPA) 17 3.1 Overview of Socially-aware Personal Assistant . 17 3.1.1 Introduction . 17 3.1.2 Computational Architecture . 18 3.1.3 Sample Dialogues . 19 3.2 Predictive Model for Conversational Strategies Recognition . 20 3.2.1 Introduction . 20 3.2.2 Related Work . 20 3.2.3 Ground Truth . 21 3.2.4 Understanding Conversational Strategies . 21 3.2.5 Machine Learning Modeling . 25 3.2.6 Results and Discussion . 26 3.2.7 Post-experiment . 27 3.2.8 Conclusions . 29 3.3 Predictive Model for Rapport Assessment . 30 3.3.1 Introduction and Motivation . 30 3.3.2 Related Work . 31 3.3.3 Study Context . 33 3.3.4 Method . 34 3.3.5 Experimental Results . 35 3.3.6 Validation and Discussion . 36 3.3.7 Conclusions and Future Work . 40 vii 3.4 Conversational Strategy Planning for Social Dialogue . 40 3.4.1 Introduction and Motivation . 40 3.4.2 Related Work . 41 3.4.3 System Architecture . 42 3.4.4 Computational Model . 42 3.4.5 Design of the Decision-Making module . 45 3.4.6 Experimentation and Results . 46 3.4.7 Conclusions . 49 4 A Neural Social Intelligent Negotiation Dialogue System (SOGO) 50 4.1 Introduction and Motivation . 50 4.2 Study Context . 52 4.3 Related Work . 52 4.3.1 Negotiation Agent . 52 4.3.2 Social Intelligent Agent . 54 4.4 SOGO 1.0: A Semi-automated Socially-aware Negotiation System . 54 4.4.1 Two-phase Computational Model . 54 4.4.2 System Architecture . 61 4.4.3 Pilot Study with SOGO 1.0 System . 62 4.4.4 Evaluation . 63 4.4.5 Discussion . 66 4.5 SOGO 2.0: A Fully-automated Socially-aware Negotiation System . 67 4.5.1 Proposed Framework . 69 4.5.2 Rapport Estimator . 69 4.5.3 Socially-aware Dialogue Model .