Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges
Total Page:16
File Type:pdf, Size:1020Kb
Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges by Yelim Kim BSc., The University of Toronto, 2016 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) March 2020 c Yelim Kim, 2020 The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges submitted by Yelim Kim in partial fulfillment of the requirements for the degree of Master of Science in Computer Science Examining Committee: Dongwook Yoon, Computer Science Supervisor Joanna McGrenere, Computer Science Supervisor Karon MacLean, Computer Science Examining committee member ii Abstract With substantial industrial interests, conversational voice user interfaces (VUIs) are becoming ubiquitous through devices that feature voice assistants such as Apple's Siri and Amazon Alexa. Naturalness is often considered to be central to conversational VUI designs as it is associated with numerous benefits such as reducing cognitive load and increasing accessibility. The lit- erature offers several definitions for naturalness, and existing conversational VUI design guidelines provide different suggestions for delivering a natural experience to users. However, these suggestions are hardly comprehensive and often fragmented. A precise characterization of naturalness is necessary for identifying VUI designers' needs and supporting their design practices. To this end, we interviewed 20 VUI designers, asking what naturalness means to them, how they incorporate the concept in their design practice, and what challenges they face in doing so. Through inductive and deductive the- matic analysis, we identify 12 characteristics describing naturalness in VUIs and classify these characteristics into three groups, which are `Fundamental', `Transactional' and `Social' depending on the purpose each characteristic serves. Then we describe how designers pursue these characteristics under different categories in their practices depending on the contexts of their VUIs (e.g., target users, application purpose). We identify 10 challenges that de- signers are currently encountering in designing natural VUIs. Our designers reported experiencing the most challenges when creating naturally sounding dialogues, and they required better tools and guidelines. We conclude with iii implications for developing better tools and guidelines for designing natural VUIs. iv Lay Summary Providing natural conversation experience is often considered central to de- signing conversational Voice User Interfaces (VUIs), as it is expected to bring out numerous benefits such as lower cognitive load, lower learning curve, and higher accessibility. Despite its noted importance, naturalness is ill-defined. There are also no comprehensive standard resources for helping designers to pursue naturalness. In order to provide support for VUI designers in the future, it is critical to understand how they currently perceive and pursue naturalness in their design practices. Hence, we interviewed 20 VUI designers to understand their notion of a natural conversational VUI and their practices and challenges of pursuing it. In this thesis, we present 12 characteristics of naturalness and classify these characteristics into 3 groups. We also identify 10 challenges that our designers are currently encountering and conclude with implications for developing better tools and guidelines for designing natural conversational VUIs. v Preface This thesis was written based on the study approved by the UBC Behavioural Research Ethics Board (certificate number H18-01732). This thesis extends a conference paper that is currently under review for publication. As the first author of the submitted paper, I designed and conducted the semi-structured interviews and analyzed the data under the supervision of Dr. Dongwook Yoon and Dr. Joanna McGrenere. More specifically, my two supervisors helped me formulate research questions, design the study and analyze the collected data. The submitted paper was written with great help from the two supervisors as well as the help from Mohi Reza, another co-author of the paper. Mohi Reza, an MSc student, provided a great amount of help in writing the introduction and related work section of the submitted paper as well as providing great insight for shaping findings and contributions. Mohi Reza also provided English writing assistance for the submitted paper. vi Table of Contents Abstract .................................. iii Lay Summary ............................... v Preface ................................... vi Table of Contents ............................ vii List of Figures .............................. x Acknowledgements ........................... xi 1 Introduction .............................. 1 1.1 Problem Definition . 1 1.2 Contributions . 4 1.3 Overview . 4 2 Related Work ............................. 5 2.1 VUI Literature in HCI . 5 2.2 Existing Definitions of Naturalness . 6 2.3 Characterizing Conversations . 7 2.4 Difference Between Spoken Language and Written Language . 8 2.5 Human-likeness in Embodied Agents . 8 2.6 Tools and Guidelines for VUI Design . 9 vii 3 Methods ................................ 10 3.1 Participants . 10 3.2 Interviews . 12 3.3 User Tasks . 13 3.4 Procedure . 14 3.5 Data Analysis . 14 3.6 Apparatus . 15 4 Results ................................. 17 4.1 Designers Characterize Naturalness . 17 4.1.1 Fundamental Conversation Characteristics . 18 4.1.2 Social Conversation Characteristics . 23 4.1.3 Transactional Conversation Characteristics . 25 4.2 Designers Experience Challenges . 30 5 Discussion and Implications for Design ............. 47 5.1 Discussion . 47 5.1.1 Capturing Naturalness in Context . 47 5.1.2 Positioning Naturalness of VUI in the Literature . 48 5.1.3 Contrasting Naturalness of Transactional vs. Social Agents . 49 5.1.4 Limitations . 50 5.1.5 Implications for VUI Design Tools . 50 6 Conclusion and Future Work ................... 53 Bibliography ............................... 55 Appendix ................................. 69 A.1 The Recruitment Poster . 69 A.2 The Recruitment Message Posted Through SNS . 71 A.3 The Consent Form . 73 viii A.4 The Pre-interview Survey . 78 A.5 The Semi-structured Interview Script . 86 A.6 More Descriptions on the User Task . 93 A.7 The Post User-task Survey . 97 A.8 The Data Analysis Process . 104 ix List of Figures 3.1 The familiarity with SSML . 11 4.1 The twelve characteristics of naturalness that designers deem important . 19 4.2 The 10 challenges that designers are currently encountering in designing natural VUIs . 32 x Acknowledgements Firstly, I would like to express my sincere gratitude to my two supervisors, Dongwook Yoon and Joanna McGrenere. They were always generous with their time whenever I needed their help. Before I entered graduate school, I did not have much exposure to the Human-computer Interaction (HCI) field and research environment. My two great supervisors were always so patient and generous with my progress and encouraged me to explore and make my own decisions for my project. I really thank them for their extensive help throughout the whole project. My two supervisors always amazed me with their great professionalism and their endless passion for HCI research, and they set an example for me to follow. Secondly, I would like to thank Mohi Reza, an MSc student who helped me in writing a paper we submitted to a top tier computing conference. He dedicated 2 weeks for me to help my project. I really enjoyed working with him and learned a lot from him. Thirdly, I would like to express my appreciation to Karon MacLean for accepting to be the second reader of my thesis and being so generous with her time. I also want to thank her for her insightful guidance and help for the other project that I worked on with her student, Soheil Kianzad. Lastly, I would like to thank the students in Multimodal User eXperience lab for providing insightful feedback on my study and for offering much valuable advice on graduate life. xi Chapter 1 Introduction In this chapter, we first introduce an overview of the problem space. Then, we motivate and illustrate the contributions of our study, and outline the overall structure of the thesis. 1.1 Problem Definition With substantial industrial interest, conversational Voice User Interfaces (VUIs) are becoming ubiquitous, with the plethora of everyday gadgets, from smartphones to home control systems, that feature voice assistants (e.g., Ap- ple's Siri, Amazon Alexa and Google Home Assistant). Conversational VUI systems are one of the two general types of VUI systems [1]. In a conversa- tional VUI system, users perceive the voice agents as conversation partners and accomplish their goals by having conversations with the agents [1]. While in a command-based VUI system, which is the other general type of VUI sys- tem, users are expected to learn and use the appropriate voice commands to accomplish their goals [2]. Hereafter, we use the term `VUI' to refer to `con- versational VUI', and we use the term `voice agent' to refer to `conversational agent' [3]. At the heart of desired properties of VUIs is naturalness. According to 1 prior work and industrial design guidelines, enabling users to accomplish their goals by