Understanding User Satisfaction with Intelligent Assistants
Julia Kiseleva (Eindhoven University of Technology, [email protected]), Kyle Williams (Pennsylvania State University, [email protected]), Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos (Microsoft, {hassanam, aidan.crook, izitouni, tasos.anastasakos}@microsoft.com)

ABSTRACT
Voice-controlled intelligent personal assistants, such as Cortana, Google Now, Siri and Alexa, are increasingly becoming a part of users' daily lives, especially on mobile devices. They allow for a radical change in information access, not only in voice control and touch gestures but also in longer sessions and dialogs that preserve context, necessitating evaluation of their effectiveness at the task or session level. However, in order to understand which types of user interactions reflect different degrees of user satisfaction, we need explicit judgements. In this paper, we describe a user study that was designed to measure user satisfaction over a range of typical scenarios of use: controlling a device, web search, and structured search dialog. Using this data, we study how user satisfaction varies with different usage scenarios and what signals can be used for modeling satisfaction in different scenarios. We find that the notion of satisfaction varies across scenarios and show that in some scenarios (e.g. making a phone call) task completion is very important, while in others (e.g. planning a night out) the amount of effort spent is key. We also study how the nature and complexity of the task at hand affect user satisfaction, and find that preserving the conversation context is essential and that overall task-level satisfaction cannot be reduced to query-level satisfaction alone. Finally, we shed light on the relative effectiveness and usefulness of voice-controlled intelligent agents, explaining their increasing popularity and uptake relative to the traditional query-response interaction.

[Figure 1: Two real examples of users' dialogs with an intelligent assistant. In dialog (A), a user performs a 'complex' task of planning his weekend in Chicago; the underlying task is "Finding a hotel in Chicago": Q1 "how is the weather in Chicago", Q2 "how is it this weekend", Q3 "find me hotels in Chicago", Q4 "which one of these is the cheapest", Q5 "which one of these has at least 4 stars", Q6 "find me directions from the Chicago airport to number one". In dialog (B), a user searches for the closest pharmacy; the task is "Finding a pharmacy": Q1 "find me a pharmacy nearby", Q2 "which of these is highly rated", Q3 "show more information about number 2", Q4 "how long will it take me to get there", followed by "thanks".]

Categories and Subject Descriptors: H.5.2 [Information Interfaces and Presentation]: User Interfaces—evaluation/methodology, interaction styles, voice I/O

Keywords: intelligent assistant, user satisfaction, user study, user experience, mobile search, spoken dialog system
1. INTRODUCTION
Spoken dialogue systems [35] have been around for a while. However, it has only been in recent years that voice-controlled intelligent assistants, such as Microsoft's Cortana, Google Now, Apple's Siri, Amazon's Alexa, and Facebook's M, have become a daily-used feature on mobile devices. A recent study [12], executed by Northstar Research and commissioned by Google, found that 55% of U.S. teens use voice search every day and that 89% of teens and 85% of adults agree that voice search is going to be 'very common' in the future. One of the reasons for the increased adoption is the current quality of speech recognition due to massive online processing [36], but perhaps more important is the added value users perceive: the spoken dialog mode of interaction is a more natural way for people to communicate and is often faster than typing.

Intelligent assistants allow for radically new ways of information access, very different from traditional web search. Figure 1 shows two examples of dialogs with intelligent assistants sampled from the interaction logs. They are related to two tasks: (A) searching for things to do on a weekend in Chicago, and (B) searching for the closest pharmacy. Users express their information needs in spoken form to an intelligent assistant. The user behavior differs from standard web search because in this scenario an intelligent assistant is expected to maintain the context throughout the conversation. For instance, users expect intelligent assistants to understand that their interaction is about 'Chicago' in the transitions Q1 → Q2 and Q3 → Q4 in Figure 1(A). These structured search dialogs are more complicated than standard web search, resembling complex, context-rich, task-based search [43]. Users expect their intelligent assistants to understand their intent and to keep the context of the dialog; some users even thank their intelligent assistant for its service, as in the example in Figure 1(B).

[Figure 2: An example of mobile SERPs that might lead to 'good abandonment'. The three panels (A), (B), and (C) show result pages containing answer types such as a Location Answer, Image Answers, a Knowledge Pane, and organic results.]

[Figure 3: An example of a 'simple' task with a structured search dialog. User: "Do I need to have a jacket tomorrow?" Cortana: "You could probably go without one. The forecast shows …"]

Users communicate with intelligent assistants through voice commands for different scenarios of use, ranging from controlling their device (for example, to make a phone call or to manage their calendar) to complex dialogs as shown in Figure 1. These interactions between users and intelligent assistants are more complicated than web search because they involve:

• automatic speech recognition (ASR): users communicate mostly through voice commands, and it has been shown that errors in speech recognition negatively influence user satisfaction [23];
• understanding user intent: an intelligent assistant needs to understand user intent in order to select the right scenario of actions and provide exact answers whenever possible;
• dialog-based interaction: users expect an intelligent assistant to maintain the context of the dialog (a sketch of this expectation follows the list);
• complex information needs: users express more sophisticated information needs while interacting with intelligent assistants.
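To make the context-maintenance expectation concrete, the following is a minimal, illustrative Python sketch (our own construction, not the architecture of Cortana or any other production assistant): a per-conversation store of entity slots that a follow-up query inherits when it leaves them implicit, as in the Q1 → Q2 transition of Figure 1(A). The DialogContext class and the slot names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class DialogContext:
    """Keeps the entities the conversation is currently 'about'."""
    slots: dict = field(default_factory=dict)   # e.g. {"location": "Chicago"}

    def resolve(self, query: str, extracted: dict) -> dict:
        # Newly mentioned entities overwrite older ones of the same type;
        # slots the current query leaves implicit are carried over.
        self.slots.update(extracted)
        return {**self.slots, "query": query}

# Toy usage with hand-labeled entities; a real assistant would run an ASR
# front end and an entity extractor here.
ctx = DialogContext()
print(ctx.resolve("how is the weather in Chicago", {"location": "Chicago"}))
print(ctx.resolve("how is it this weekend", {}))   # still about Chicago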
This prompts the need to better understand success and failure of intelligent assistant usage. When are users (dis)satisfied? How can we evaluate intelligent assistants in ways that reflect perceived user satisfaction well? Can we resort to traditional methods of offline and online evaluation, or do we need to take other factors into account?

Evaluation is a central component of many web search applications because it helps to understand which direction to take in order to improve a system. The common practice is to create a 'gold' standard (a set of 'correct' answers) judged by editorial judges [21]. In the case of intelligent assistants, there may be no general 'correct' answer since the answers are highly personalized and contextualized (user location, previous user history) to fit user information needs. Another way to evaluate web search performance is implicit relevance feedback, such as clicks and dwell time [3, 11, 16, 26, 27]. However, we know that user satisfaction for mobile web search is already very different [33]. In the examples in Figure 2, answers of different types, such as a 'Location Answer', an 'Image Answer', or a 'Knowledge Pane', are shown directly on the result page, so users do not need to click. In the case of structured dialog search, the lack of standard implicit feedback signals is even more pronounced because users talk to their phones instead of clicking. One example of this is the transition Q2 → Q3 in Figure 1(B).

In light of the current work, this paper aims to answer the following main research question:

What determines user satisfaction with intelligent assistants?

We break down our general research problem into five specific research questions. Our first research question is:

RQ 1: What are characteristic types of scenarios of use?

Based on an analysis of the logs of a commercial intelligent assistant, on the way different types of requests are handled by the back-end systems, and on previous work [25], we propose three types of scenarios of intelligent assistant use: (1) controlling the device; (2) searching the web; and (3) performing a complex task (or 'mission') in dialog style. We characterize key aspects of user satisfaction for each of these scenarios.

Our second research question is:

RQ 2: How can we measure different aspects of user satisfaction?

We set up user studies with realistic tasks derived from the log analysis, following the three scenarios of use, and measuring a wide range of aspects of user satisfaction relevant to each specific scenario.

Our third research question is:

RQ 3: What are key factors determining user satisfaction for the different scenarios?

In order to understand what the key components of user satisfaction are, we analyze the output of our user studies for the different intelligent assistant scenarios. We aim to understand which factors influence user satisfaction the most: speech recognition quality, complexity of the task, or the amount of effort users spend to complete tasks.
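Because spoken dialogs produce no clicks, behavioral proxies must stand in for the click and dwell-time signals used in web search evaluation. As a purely illustrative example of the kind of signals RQ 3 considers, the Python sketch below derives turn counts, likely reformulations, and time on task from a logged dialog such as the one in Figure 1(B); the log format, the similarity threshold, and the signal names are our own assumptions, not the paper's feature set.

from difflib import SequenceMatcher

def task_signals(turns):
    """turns: list of (timestamp_seconds, recognized_query) for one task."""
    queries = [q for _, q in turns]
    # A follow-up that is near-identical to the previous query often means
    # the assistant (or the ASR) failed and the user is retrying.
    reformulations = sum(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.8
        for a, b in zip(queries, queries[1:])
    )
    return {
        "num_turns": len(turns),                     # effort proxy
        "num_reformulations": reformulations,        # frustration proxy
        "duration_sec": turns[-1][0] - turns[0][0],  # time-on-task proxy
    }

# The four queries of Figure 1(B), with invented timestamps:
dialog_b = [
    (0.0, "find me a pharmacy nearby"),
    (14.0, "which of these is highly rated"),
    (31.0, "show more information about number 2"),
    (52.0, "how long will it take me to get there"),
]
print(task_signals(dialog_b))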