Situated Intelligent Interactive Systems
Zhou Yu

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Artificial Intelligence

The Language Technologies Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
2016

Thesis Committee:
Alan W Black, Chair
Alexander I. Rudnicky, Co-Chair
Louis-Philippe Morency
Dan Bohus (Microsoft Research)
David Suendermann-Oeft (Educational Testing Service)

Copyright © 2016 Zhou Yu. All rights reserved.

Abstract

The recent wide usage of Interactive Systems (or Dialog Systems), such as Apple Siri, has attracted a lot of attention. The ultimate goal is to transform current systems into truly intelligent systems that can communicate with users effectively and naturally. There are three major challenges to this ultimate goal: first, how to make systems that cooperate with users in a natural manner; second, how to provide an adaptive and personalized experience to each user to achieve better communication efficiency; and last, how to make a multi-task system transition from one task to another fluidly to achieve overall conversation effectiveness and naturalness.

To address these challenges, I proposed a theoretical framework, Situated Intelligence (SI), and applied it to non-task-oriented, task-oriented and implicit-task-oriented conversations.

In the SI framework, we argue that three capabilities are needed to achieve natural and high-quality conversations: (1) systems need situation awareness; (2) systems need a rich repertoire of conversation strategies to regulate their situation contexts, to understand natural language and to provide a personalized user experience; (3) systems must have a global planning policy that optimally chooses among different conversation strategies at run-time to achieve an overall natural conversation flow.
By applying the SI framework, we make a number of contributions to different types of conversation systems, in both algorithm development and end-to-end system building. Finally, we introduce the concept of an implicit-task-oriented system, which interleaves the task conversation with everyday chatting. We implemented a film-promotion system and ran a user study with it. The results show that the system not only achieves the implicitly embedded goal but also keeps users engaged along the way.

Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Outline
2 Situated Intelligence Framework
  2.1 Related Work
  2.2 Situation Awareness
  2.3 Conversation Strategy
  2.4 Global Planning Policy
3 Application Overview
  3.1 Non-Task-Oriented Interactive Systems
    3.1.1 Challenges
    3.1.2 Related Work
    3.1.3 Approach
    3.1.4 Systems
    3.1.5 Impact
  3.2 Task-Oriented Systems
    3.2.1 Challenges
    3.2.2 Related Work
    3.2.3 Approach
    3.2.4 Systems
    3.2.5 Impact
  3.3 Implicit-Task-Oriented Systems
    3.3.1 Challenges
    3.3.2 Approach
    3.3.3 Systems
4 TickTock, A Non-Task-Oriented Dialog System Framework
  4.1 Database
  4.2 User Input Process
  4.3 Answer Retrieval
  4.4 Engagement Module
  4.5 Dialog Manager
  4.6 Text-to-Speech and Talking Head
  4.7 TickTock in Mandarin
  4.8 Conclusion
5 Crowd-source for Non-Task-Oriented Systems
  5.1 Introduction
  5.2 Methodology
    5.2.1 Mechanical Turk Study Design
    5.2.2 Results and Analysis
  5.3 Conclusion
6 Engagement Understanding
  6.1 Introduction and Related Work
  6.2 Experimental Setting
  6.3 Databases
    6.3.1 English TickTock Database (ETDb)
    6.3.2 Chinese TickTock Database (CTDb)
  6.4 Annotation Scheme
  6.5 Human Behavior Quantification
    6.5.1 Verbal Behavioural Cues
    6.5.2 Acoustic Behavioural Cues
    6.5.3 Visual Behavioural Cues
    6.5.4 Dialog Behavioural Cues
  6.6 Analysis and Results
  6.7 Conclusion
7 Engagement Prediction
  7.1 Introduction and Related Work
  7.2 Machine Learning Setting
  7.3 Feature Sets
  7.4 Results and Analysis
  7.5 Time and Accuracy Trade-Off
  7.6 Culture Adaptation
  7.7 Conclusion
8 Conversation Strategy for Non-Task-Oriented Systems
  8.1 Introduction
  8.2 Related Work
  8.3 Conversation Strategies
    8.3.1 Engagement Strategy
    8.3.2 Knowledge-base Strategy
  8.4 Dialog Policy
  8.5 User Study Design
  8.6 Data Annotation
  8.7 Results and Analysis of Conversation Strategy
    8.7.1 Knowledge-base Strategy
    8.7.2 Engagement Strategies for Engagement Maintenance
    8.7.3 Engagement Strategies for Engagement Improvement
  8.8 Results of Systems with Engagement Coordination
  8.9 Relationship Analysis of System Appropriateness and User Engagement
  8.10 Conclusion
9 Global Planning Policy for Non-Task-Oriented Systems
  9.1 Introduction
  9.2 Related Work
  9.3 Dialog Policy Design
  9.4 Reinforcement Learning
    9.4.1 Engagement Maintenance Policy
    9.4.2 Engagement Maintenance and Improvement Policy
  9.5 Evaluation Metrics
    9.5.1 Turn-Level Appropriateness
    9.5.2 Conversational Depth
    9.5.3 Information Gain
    9.5.4 Overall User Engagement
  9.6 Experiment
  9.7 Results and Analysis
    9.7.1 Reinforcement Learning Policy vs. Non-Reinforcement Learning Policy
    9.7.2 Engagement Maintenance and Improvement Policy vs. Engagement Maintenance Policy
  9.8 Conclusion
10 HALEF: A Task-Oriented Dialog System Framework
  10.1 Introduction
  10.2 Foundational Frameworks
    10.2.1 The HALEF dialog framework
    10.2.2 FreeSWITCH
    10.2.3 Engagement Module
  10.3 Framework Integration