VERSE: Bridging Screen Readers and Voice Assistants for Enhanced

VERSE: Bridging Screen Readers and Voice Assistants for Enhanced Eyes-Free Web Search Alexandra Vtyurina∗ Adam Fourney Meredith Ringel Morris University of Waterloo Microsoft Research Microsoft Research Waterloo, Ontario, Canada Redmond, WA, USA Redmond, WA, USA [email protected] [email protected] [email protected] Leah Findlater Ryen W. White University of Washington Microsoft Research Seattle, WA, USA Redmond, WA, USA [email protected] [email protected] ABSTRACT INTRODUCTION People with visual impairments often rely on screen readers People with visual impairments are often early adopters of when interacting with computer systems. Increasingly, these audio-based interfaces, with screen readers being a prime individuals also make extensive use of voice-based virtual example. Screen readers work by transforming the visual assistants (VAs). We conducted a survey of 53 people who are content in a graphical user interface into audio by vocalizing legally blind to identify the strengths and weaknesses of both on-screen text. To this end, they are an important accessibility technologies, and the unmet opportunities at their intersection. tool for blind computer users – so much so that every major We learned that virtual assistants are convenient and accessible, operating system includes screen reader functionality (e.g., but lack the ability to deeply engage with content (e.g., read VoiceOver1, TalkBack2, Narrator3), and there is a strong mar- beyond the frst few sentences of an article), and the ability ket for third-party offerings (e.g., JAWS4, NVDA5). Despite to get a quick overview of the landscape (e.g., list alternative their importance, screen readers have many limitations. For search results & suggestions). In contrast, screen readers example, they are complex to master, and depend on the coop- allow for deep engagement with content (when content is eration of content creators to provide accessible markup (e.g., accessible), and provide fne-grained navigation & control, but alt text for images). at the cost of reduced walk-up-and-use convenience. Based on Voice-activated virtual assistants (VAs), such as Apple’s Siri, these fndings, we implemented VERSE (Voice Exploration, Amazon’s Alexa, and Microsoft’s Cortana, offer another audio- Retrieval, and SEarch), a prototype that extends a VA with based interaction paradigm, and are mostly used for everyday screen-reader-inspired capabilities, and allows other devices tasks such as controlling a music player, checking the weather, (e.g., smartwatches) to serve as optional input accelerators. and setting up reminders [47]. In addition to these house- In a usability study with 12 blind screen reader users we hold tasks, however, voice assistants are also used for general- found that VERSE meaningfully extended VA functionality. Participants especially valued having access to multiple search purpose web search and information access [31]. In contrast results and search verticals. to screen readers, VAs are marketed to a general audience and are limited to shallow investigations of web content. Be- ing profcient users of audio-based interfaces, people who ACM Classifcation Keywords are blind often use VAs, and would beneft from broader VA H.5.m. Information Interfaces and Presentation (e.g. HCI): capabilities [36, 2]. Miscellaneous In this work, we explore opportunities at the intersection of Author Keywords screen readers and VAs. Through an online survey with 53 Virtual assistants; voice search; screen readers; accessibility blind screen reader and VA users, we investigated the pros and cons of searching the web using a screen reader-equipped web *Work done while at Microsoft Research. browser, and when getting information from a voice assistant. Based on these fndings, we developed VERSE (Voice Explo- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed ration, Retrieval, and SEarch) – a prototype that augments the for proft or commercial advantage and that copies bear this notice and the full citation VA interaction model with functionality inspired by screen on the frst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or 1https://www.apple.com/accessibility/mac/vision/ republish, to post on servers or to redistribute to lists, requires prior specifc permission 2 and/or a fee. Request permissions from [email protected]. https://support.google.com/accessibility/android/answer/6283677 3 ASSETS ’19, October 28–30, 2019, Pittsburgh, PA, USA https://www.microsoft.com/en-us/accessibility/windows 4 © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. https://www.freedomscientifc.com/Products/Blindness/JAWS 5 ISBN 978-1-4503-6676-2/19/10. $15.00 https://www.nvaccess.org/ DOI: 10.1145/3308561.3353773 readers to better support free-form, voice-based web search. participants, and exhibited more probing behaviour (i.e., “a We then conducted a design probe study of VERSE, and iden- user leaves and then quickly returns to a page” [12]) showing tifed future directions for improving eyes-free information- greater diffculty in triaging search results. Assessing trust- seeking tools. worthiness and credibility of search sources can also pose a problem. Abdolrahmani et al. [3, 1] found that blind users use This work addresses the following research questions: signifcantly different web page features from sighted users to • RQ1: What challenges do blind people face when: (a) assess page credibility. seeking information using a search engine and a screen In this paper, our survey lends further support to these prior reader versus (b) when using a voice assistant? fndings on web accessibility, and extends them to include • RQ2: How might voice assistants and screen readers be challenges encountered when using voice-activated virtual merged to confer the unique advantages of each technology? assistants. • RQ3: How do blind web searchers feel about such hybrid systems, and how might our prototype, VERSE, be further improved? Novel Screen Reader Designs Traditional screen readers provide sequential access to web In the following sections we cover prior research, the online content. Stockman et al. [43] explored how this linear repre- survey, the functionality of VERSE, and the VERSE design sentation can mismatch the document’s spatial outline, con- probe study. We conclude by discussing the implications of tributing to high cognitive load for the user. To mitigate this our fndings for designing next-generation technologies that issue, prior research has explored a variety of alternative screen improve eyes-free web search for blind and sighted users by reader designs [39], which we briefy outline below. bridging voice assistants and screen readers paradigms. One approach is to use concurrent speech, where several RELATED WORK speech channels simultaneously vocalize information [21, 52]. This work builds on several distinct threads of prior research, For example, Zhu et al.’s [52] Sasayaki screen reader augments as detailed below. primary output by concurrently whispering meta information to the user. Web Exploration by Screen Reader Users A method for non-visual skimming presented by Ahmed et Accessing web content using a screen reader can be a daunt- al. [4] attempts to emulate visual “glances” that sighted people ing task. Though the Web Content Accessibility Guide- use to roughly understand the contents of a page. Their results lines (WCAG 6) codify how creators can improve the accessi- suggest that such non-visual skimming and summarization bility of their content, many websites fail to adhere to these techniques can be useful for providing screen reader users guidelines [13, 22]. For example, Guinness et al. report that, with an overview of a page. in 2017, alternative text was missing from 28% of the images sampled from the top 500 websites indexed by alexa.com [22]. Khurana et al. [26] created SPRITEs – a system that uses More generally, poor design and inaccessible content are the a keyboard to map a spatial outline of the web page in an leading reasons for frustration among screen reader users [27], attempt to overcome the linear nature of screen reader output. despite nearly two decades of web accessibility research. In All participants in a user evaluation completed tasks as fast as, fact, many of the challenges described by Jonathan Berry in or faster than, with their regular screen reader. 1999 [10] are still relevant to this day [25, 14, 42]. Conse- Another approach, employed by Gadde et al. [20], uses crowd- quently, screen reader users learn a variety of workarounds to sourcing methods to identify key semantic parts of a page. access inaccessible content: they infer the roles of unlabeled They developed DASX – a system that transported the users elements (e.g., buttons) by exploring the nearby elements, they to the desired section using a single shortcut based on these develop “recipes” for websites by memorizing their structure, semantic labels; as a result, they saw performance of screen and they use keyword search to skip to relevant parts of docu- reader users rise signifcantly. Islam et al. [24] used linguis- ments [15]. Even with these mitigation strategies, comparative tic and visual features to segment web content into semantic analysis has shown that blind users require more time per vis- parts. A pilot study

VERSE: Bridging Screen Readers and Voice Assistants for Enhanced

Accessible Cell Phones

ABC's of Ios: a Voiceover Manual for Toddlers and Beyond!

An Explorative Customer Experience Study on Voice Assistant Services of a Swiss Tourism Destination

The Nao Robot As a Personal Assistant

VPAT™ for Apple Ipad Pro (12.9-Inch)

International Journal of Information Movment

Google Search by Voice: a Case Study

Adriane-Manual – Wikibooks

Vizlens: a Robust and Interactive Screen Reader for Interfaces in The

Turning Voiceover on with Siri 1

Slide Rule: Making Mobile Touch Screens Accessible to Blind People Using Multi-Touch Interaction Techniques Shaun K

Darkreader: Bridging the Gap Between Perception and Reality Of