Comparing Voice and Touch Interaction for Smartphone Radio and Podcast Application
DEGREE PROJECT IN THE FIELD OF TECHNOLOGY MEDIA TECHNOLOGY AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS, STOCKHOLM, SWEDEN 2017

Comparing voice and touch interaction for smartphone radio and podcast application
(Jämförelse av röst- och pekskärmsinteraktion för en radio- och podcastapplikation för smartphones)

Fredrik Wallén
KTH Royal Institute of Technology, School of Computer Science and Communication
Stockholm, Sweden
[email protected]

Subject: Media Technology
Degree: Master of Science in Engineering, Media Technology
Supervisor: Filip Kis
Examiner: Roberto Bresin
Principal: Isotop
29 June 2017

SAMMANFATTNING
Voice control is becoming more common, and nowadays it can also be used in individual smartphone apps. However, it has not previously been investigated for which tasks it is preferable to touchscreen interaction from a usability perspective. To investigate this, a voice interface was created for a radio and podcast application that already had a touchscreen interface. The voice interface was also tested with users in order to improve its usability. After that, a test was conducted in which the participants were asked to perform the same tasks with both the touchscreen and the voice interface. The time they took was measured, and the participants rated the experience of performing the tasks on a scale. Finally, they were asked which interaction method they preferred. For most of the tasks tested, voice interaction was faster and received higher ratings. It should be noted, however, that when users do not have specific tasks to perform, it can be harder for them to know what a voice-controlled app can and cannot do than when they use a touchscreen. Many users also expressed that they were reluctant to use voice commands in public spaces for fear of appearing strange. These results can be applied to radio/podcast apps and, to a lesser extent, to apps for watching TV series and playing music.

ABSTRACT
Today voice recognition is becoming mainstream, and it is now also possible to include it in individual smartphone apps. However, it has not previously been investigated for which tasks it is preferable, from a usability perspective, to use voice recognition rather than touch. In order to investigate this, a voice user interface was created for a smartphone radio application that already had a touch interface. The voice user interface was also tested with users in order to improve its usability. After that, a test was conducted where the participants were asked to perform the same tasks using both the touch and the voice interface. The time they took to complete the tasks was measured, and the participants rated the experience of completing each task on a scale. Finally, they were asked which interaction method they preferred. For most of the tasks tested, voice interaction was both faster and received a higher rating.
However, it should be noted that in cases where users don't have specific tasks to perform, it might be harder for them to know what a voice-controlled app can and cannot do than when they are using touch. Many users also expressed that they were reluctant to use voice commands in public spaces out of fear of appearing strange. These results can be applied to other radio/podcast apps and, to a lesser extent, to apps for watching TV series and playing music.

Keywords
voice user interface, voice command, natural language user interface, voice search, voice assistant

1. INTRODUCTION
1.1 Background
At the end of the 1980s, so-called natural language user interfaces (NLUIs) started to appear, according to Michos et al. (1996). They provided an alternative to the standard command line interfaces used, for example, in UNIX and DOS. Rather than having to remember that the command for creating a directory in Unix is "mkdir", a user could just write "Create a directory called Documents" and get the same result.
This would take longer to write, but it would make it possible for users not familiar with the command language to use the system.
According to Pearl (2016), voice user interfaces (VUIs) became more common in the early 2000s in the form of interactive voice response (IVR) systems. These were phone systems in which users could talk to a computer and perform the same tasks they would previously have performed by talking to a human operator over the phone. These systems were often designed to imitate human conversations, and the user commonly replied to questions asked by the computer.
Pearl claims that we are now in what could be known as the second era of NLUIs and VUIs, in which mobile apps like Siri, Google Now and Cortana, which combine visual and auditory information, and voice-only devices such as the Amazon Echo and Google Home are becoming mainstream.
Voice control is no longer exclusive to dedicated voice control apps. It is now possible for developers to use a voice control interface in any app. However, it has not previously been investigated in which cases it is beneficial to do so from a usability perspective.

1.2 Problem definition
The objective of the study is to investigate for which types of tasks, within a smartphone radio application, using voice commands provides a better user experience than using the touchscreen.

1.3 Expected scientific results
My hypothesis was that touch interaction would be faster and give a better user experience than voice interaction, except in cases where touch interaction would require several taps or keyboard input. In those cases, voice interaction would be faster and give a better user experience.

1.4 Principal
The thesis was carried out at Isotop, an IT consultancy company based in Stockholm, Sweden. They are hired by Swedish Radio (Sveriges Radio, SR) to create the Sveriges Radio Play app for iOS and Android.
2. THEORY AND CONCEPTS
2.1 Usability testing – voice user interfaces
In her book "Designing Voice User Interfaces: Principles of Conversational Experiences" (2016), Cathy Pearl gives advice on how to conduct usability testing for voice user interfaces. She mentions several principles that apply to usability testing in general, such as testing on users as similar to the target group as possible and studying what is good and bad in existing interfaces.
She also advises not to have all users perform the tasks in the same order, since the tasks performed earlier might influence how people perform later tasks. Instead, it is better to use a Latin square so that the task order is varied.
It is a good idea to ask the participants questions both after each task and at the end of the test. Quantitative questions commonly use the Likert scale ("strongly disagree" to "strongly agree"), and the questions alternate between being positive ("The system is easy to use") and negative ("The system is confusing"). Additionally, there can be open-ended questions such as "How do you think the system could be improved?".
Specifically for testing voice user interfaces, she recommends writing the tasks carefully and avoiding mentioning command words or strategies for completing the tasks. The tasks should also not give away too much, and should only provide the essential information the user needs to complete them. Additionally, it is important to test whether the users understand that they can talk to the system. Apart from this, the common usability testing methodology think-aloud, in which users say out loud what they are experiencing, does not work very well for testing VUIs, since the users speak in order to interact with the system.
In the early stages of design, Dybkjær and Bernsen (2001) propose using a "Wizard of Oz" test: a test of something that does not actually work, where a human behind the scenes gives the illusion of a working system. According to them, this can be used to gather valuable data. They emphasize the need to start the evaluation as early as possible and to continue evaluating throughout the development.
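The Latin-square counterbalancing recommended in Section 2.1 can be sketched in a few lines. The snippet below builds a cyclic Latin square, in which every task appears exactly once in each row (participant) and each column (position); the task names are invented placeholders, not the tasks used in this study.

```python
def latin_square(tasks):
    """Build a cyclic Latin square of task orders: each task occurs
    exactly once per row (participant) and once per column (position)."""
    n = len(tasks)
    return [[tasks[(row + col) % n] for col in range(n)] for row in range(n)]

# Hypothetical tasks, for illustration only.
orders = latin_square(["play channel", "search podcast", "set sleep timer"])
for participant, order in enumerate(orders, start=1):
    print(f"Participant {participant}: {order}")
```

With n tasks this yields orders for n participants (or participant groups); for larger studies the rows are simply reused in rotation.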
2.2 Designing voice user interfaces
Pearl (2016) explains that voice-controlled apps have many similarities with so-called IVR systems, which are bots spoken to over the phone. The main difference between voice-controlled apps and IVR systems is that voice-controlled apps also have a visual component, which can be used to convey additional

actions that with certain probabilities will lead to other states. States are also associated with a reward. You start at a specific state and can then calculate which actions to take in order to maximize the probability of getting a certain reward. With a partially observable Markov decision process, you do not know for certain which state you start in; instead, there are certain probabilities of starting in each state. This makes the calculation of how to get the highest reward more complicated.
The rule-based framework, according to Bellegarda, is rooted in artificial intelligence. It is based on rule sets where each rule consists of a condition and an action. Data and events from user input are also inserted into a facts store, which manages facts. For example, regarding meetings there might be a rule saying that they shall contain a date, one or more persons and a location, and that they might contain a topic.
Bellegarda also argues that the statistical approach is better for understanding free speech, while the rule-based framework is better for understanding input for specific tasks. He also states that Apple's Siri uses both of these frameworks and combines them to try to get the best outcome. According to Tur and De Mori, in 2011 the rule-based framework was usually used in commercial applications, while the statistical framework was more commonly used in research.
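The Markov decision process described above can be made concrete with a toy value-iteration sketch. The two states, the actions, and all the numbers below are invented for illustration and are not taken from the cited work; the point is only that each (state, action) pair leads to other states with certain probabilities, states carry rewards, and the best action can be computed from the expected future reward.

```python
# Toy MDP for a radio app: states carry a reward, and each
# (state, action) pair maps to (probability, next_state) outcomes.
# All states, actions and numbers are invented for illustration.
rewards = {"listening": 1.0, "menu": 0.0}
transitions = {
    ("menu", "say_play"):  [(0.8, "listening"), (0.2, "menu")],
    ("menu", "wait"):      [(1.0, "menu")],
    ("listening", "stop"): [(1.0, "menu")],
    ("listening", "keep"): [(1.0, "listening")],
}

def value_iteration(gamma=0.9, sweeps=50):
    """Compute each state's value: the expected discounted reward
    when the best action is always chosen."""
    values = {state: 0.0 for state in rewards}
    for _ in range(sweeps):
        new = {}
        for state in values:
            actions = [a for (s, a) in transitions if s == state]
            new[state] = rewards[state] + gamma * max(
                sum(p * values[nxt] for p, nxt in transitions[(state, a)])
                for a in actions
            )
        values = new
    return values

print(value_iteration())
```

In the partially observable variant, the single known start state would be replaced by a probability distribution over states, which is what makes the calculation harder.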
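The condition-action rules and facts store of the rule-based framework can likewise be sketched in miniature. The "meeting" rule below mirrors the example in the text (a date, one or more persons and a location are required; a topic is optional), but the field names and the tiny rule engine itself are assumptions for illustration, not Bellegarda's implementation.

```python
# A minimal condition-action rule engine over a facts store.
# Field names and rule structure are illustrative assumptions.
facts = {"type": "meeting", "date": "2017-06-29",
         "persons": ["Alice"], "location": "Stockholm"}

def meeting_is_valid(f):
    """Condition: a meeting must have a date, at least one
    person and a location; a topic may or may not be present."""
    return all(f.get(key) for key in ("date", "persons", "location"))

rules = [
    # (condition, action) pairs: the action fires when the
    # condition holds for the current facts store.
    (meeting_is_valid, lambda f: f.setdefault("status", "complete")),
    (lambda f: not meeting_is_valid(f),
     lambda f: f.setdefault("status", "incomplete")),
]

for condition, action in rules:
    if condition(facts):
        action(facts)

print(facts["status"])
```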