Dan Stuart SI 531 12/14/2010

Voogle: A Voice Interface to Google

Introduction

Visually-impaired Internet users who rely on screen readers are often obligated to listen to a large amount of navigational or extraneous information while browsing the web. This is unavoidable on new websites with which the user is not familiar, but for sites the user has already visited many times, this information is unnecessary. For pages that are accessed often, finding and conveying only the necessary information has the potential to save a large amount of time. The Google SERP is an example of such a page: along with the titles of returned results, the page can contain links to image results and other SERPs, summaries of retrieved pages, and full URLs, along with a large number of navigational links and ads (and potential annoyances such as Google Instant, for which there is actually an invisible link that allows screen reader users to turn it off with one click). I proposed to create a minimal interface to Google Search, combining the Google AJAX Search API with the pyttsx text-to-speech library and a speech recognition utility to build a speech-controlled interface that conveys only the necessary information: result titles and truncated URLs, with options to control retrieval of other information such as variable-length summaries and full URLs, and navigation to selected links. Due to the limited availability of speech recognition software for my operating system, I was not able to implement the speech recognition capabilities, but I was able to build and test a proof-of-concept application that functions otherwise as specified.

Figure 1: The Google SERP contains a large amount of extraneous information.
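As a rough sketch of the text-to-speech half of this design (not Voogle's actual source; the spoken string is illustrative), the pyttsx library can read a result aloud in a few lines:

    import pyttsx  # Python bindings to the platform's text-to-speech engine

    engine = pyttsx.init()                     # bind to the default TTS driver
    engine.setProperty('rate', 200)            # words per minute
    engine.setProperty('volume', 1.0)          # 0.0 to 1.0
    engine.say('Result 1: Dr. Chuck Online')   # queue an utterance
    engine.runAndWait()                        # block until speech completes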

I tested the usability of my application with the help of two of my peers, using for comparison the open-source Orca screen reader combined with the Mozilla Firefox web browser. Study participants completed several search tasks using the Orca/Firefox interface, then several using the Voogle interface, and compared the relative ease of each set of tasks. Participants were recruited from my peer group, and rated themselves as comfortable with executing advanced search tasks. Due to the difficulty of locating experienced screen reader users, neither of my study participants had any experience with assistive technology. This is potentially problematic, and would be corrected in a larger study, but both participants were equally unfamiliar with both interfaces, so the study was at least fair in that sense.

The interfaces were evaluated on ease of use, as determined by user questionnaires, average time required to complete search tasks, and number of queries executed. Because Voogle's speech recognition feature was never implemented, I filled that role myself, typing each participant's spoken commands verbatim. The Orca/Firefox interface fared somewhat worse than Voogle, partly due to some problems with the speech engine, but also due to the large amount of information that participants had to navigate through in order to find what they were looking for. Both participants pointed this out, although I had been careful not to mention it to them before or during the study.

Literature Review

Though the number of visually-impaired Internet users is relatively small, the problems they face trying to navigate poorly laid-out websites are of great concern to some HCI researchers. Lazar et al. (2007) conducted a study of 100 blind screen reader users in which the participants kept a log of how their time was spent while using the Internet. Of the many frustrations that participants encountered, the researchers found that the top problem (by time wasted) among users of screen readers is "page layout causing confusing screen reader feedback". Finding information on a poorly-designed page is the biggest problem that these users face, and while Lazar et al. recommend better page design to solve this issue, it may be that a complex web page constructed primarily to serve the needs of sighted users will always contain a large amount of extraneous content for blind users.

Leporini and Paternò (2004) identify 18 criteria for evaluating the usability of websites when accessed with screen readers. Among these are:

• number of links and frames,
• location of navigation bar,
• assignment of shortcuts,
• specific sections,
• indexing of contents, and
• identifying the main page content.

All of these criteria deal with reducing the amount of time users have to spend skipping through page content to get to the information they are looking for. These navigational issues account for one third of their criteria, the largest grouping, and all are problems I attempted to solve in the design of Voogle by stripping the Google SERP down to only the essential results information.

Leporini et al. (2008) looked at how to improve the code of the Google SERP; their main recommendation was to "place the most important elements of the interface at the top of the source file" (emphasis theirs). This is an interesting recommendation, and very much in the same vein as the previous paper. Because screen readers primarily parse the HTML of a page, designing the HTML to accommodate screen readers and using the CSS to control layout is another elegant solution to the problem of website accessibility.

Systems: Voogle

For this project I wrote a proof-of-concept command-line interface to Google Search, designed specifically for visually-impaired users. Because I had originally planned to make the system voice-controlled, I named it Voogle, for "Voice-controlled Google". (I found out partway through development that screen readers are traditionally named after aquatic creatures, e.g. Orca and JAWS. By the time I learned this I was already committed to the Voogle name, but I might otherwise have chosen a name that conforms to the tradition, such as Plecostomus.) I was unfortunately not able to implement the voice-control interface within the constraints of the operating systems I had to work with. However, I was able to get the other features working nicely, including the text-to-speech output.

The Voogle user interface takes commands using a simple grammar: users can (ideally) say or type "search" (or one of a set of synonyms) to initiate a search, "options" to enter the options menu, or "help" to see and hear the help file. Simply entering "search" starts the search routine, which prompts for a query; alternatively, users can enter the query after "search" in the same command to go directly to the query results. Everything pictured in Figure 2, except the user input, is read out to the user with the pyttsx library, which provides Python bindings to the standard text-to-speech utility included with the operating system.

Figure 2: The Voogle command-line interface. The user has inspected the options menu, then initiated a search for "Dr Chuck".
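A minimal sketch of this grammar's dispatch logic follows (Python 2, matching pyttsx of the era). This is not Voogle's actual source: the synonym set and the helpers run_search, show_options, and show_help are assumptions made for illustration.

    SEARCH_SYNONYMS = ('search', 'find', 'google')  # assumed synonym set

    def dispatch(command_line):
        """Route one typed (or, ideally, spoken) command."""
        words = command_line.strip().lower().split()
        if not words:
            return
        command, rest = words[0], ' '.join(words[1:])
        if command in SEARCH_SYNONYMS:
            # "search" alone prompts for a query; "search <query>"
            # goes straight to the results.
            run_search(rest or raw_input('Query: '))
        elif command == 'options':
            show_options()  # assumed helper: the options menu
        elif command == 'help':
            show_help()     # assumed helper: shows and reads the help file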

When a query is entered, Voogle retrieves the search results using the Google AJAX API and reads them out one at a time: first the title, then the summary snippet (the same one that Google displays on its SERP), then the URL, which can be set in the options menu to be either "full" (the entire result URL) or "short" (just the domain name). After each result is read, Voogle asks the user whether to open the result. If the user answers "yes", or an equivalent synonym, the result is remembered and will be opened in a web browser, which launches at the end of the search process. Voogle then moves on to the next result, unless the user answers with "quit", in which case the search is aborted and any selected results are opened. One minor problem with this interaction is that "quit" is not an obvious response to the question "Open result?", so this is one aspect of the user experience that needs work.
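A sketch of this loop against the Google AJAX Search API is below. The endpoint and JSON field names are the API's own; the say helper (see the Results section for why it constructs a fresh engine) and the answer-synonym handling are assumptions, and this is not Voogle's actual source.

    import json
    import urllib
    import urllib2
    import webbrowser

    import pyttsx

    def say(text):
        engine = pyttsx.init()
        engine.say(text)
        engine.runAndWait()

    def run_search(query):
        # Fetch one page of results from the Google AJAX Search API.
        params = urllib.urlencode({'v': '1.0', 'q': query, 'rsz': 'large'})
        url = 'http://ajax.googleapis.com/ajax/services/search/web?' + params
        results = json.load(urllib2.urlopen(url))['responseData']['results']
        selected = []
        for result in results:
            say(result['titleNoFormatting'])  # title, stripped of markup
            say(result['content'])            # summary snippet (may contain HTML)
            say(result['visibleUrl'])         # the "short" URL: domain only
            answer = raw_input('Open result? ').strip().lower()
            if answer in ('yes', 'y', 'sure'):   # assumed synonym set
                selected.append(result['unescapedUrl'])
            elif answer == 'quit':
                break                            # abort the search
        for link in selected:                    # open selections at the end
            webbrowser.open(link)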

Voogle includes several configurable options to customize its behavior, as shown in Figure 2. Several of these ("volume", "rate", "speech") control the text-to-speech behavior. "Rate" sets the speed, in words per minute, at which text is read out, and can be set between 1 and around 500, although the default, 200, is comfortable for most novice users. The "results" option controls the number of results that are read out before Voogle asks the user if they want more, and can be set from 1 to 32, the most results that the Google AJAX API will provide. In retrospect, this is not a very useful option: the user can already quit the search after every result, so this setting in effect adds another choice they must make. The "url" option controls whether Voogle reads the full URL of each result or just the domain name, as described above. The "lucky" option, when enabled, functions exactly like Google's "I'm Feeling Lucky" button: it causes Voogle to skip the results menu entirely and launch the first search result immediately. Finally, the "input" option was meant to control whether Voogle would look for input from the keyboard or microphone, but since speech input was never implemented, this option is non-functional.
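A plausible representation of the options menu is sketched below. The defaults marked "assumed" are illustrative only; the text above fixes only the rate default and the 1-to-32 range for results.

    DEFAULT_OPTIONS = {
        'volume':  1.0,         # pyttsx 'volume' property, 0.0 to 1.0 (assumed default)
        'rate':    200,         # words per minute; the comfortable default
        'speech':  True,        # master switch for spoken output
        'results': 8,           # results read before asking for more, 1 to 32 (assumed default)
        'url':     'short',     # 'short' = domain only, 'full' = entire URL
        'lucky':   False,       # open the first result immediately, skipping the menu
        'input':   'keyboard',  # 'microphone' was planned but never implemented
    }

    def apply_speech_options(engine, options):
        """Push the speech-related options onto a pyttsx engine."""
        engine.setProperty('rate', options['rate'])
        engine.setProperty('volume', options['volume'])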

Systems: Orca/Firefox

Orca is an open-source assistive technology suite for visually impaired users. Aside from screen-reading capabilities, Orca offers screen magnification, braille output, "key echo" (which pronounces the names of keys pressed), and customizable pronunciation of words. Orca is a very powerful, highly-customizable tool, suitable for general use with a wide variety of applications. Orca users navigate applications using the mouse or arrow keys, as abilities permit, and Orca reads any text adjacent to the cursor. Because it works with every application, Orca is not able to determine which text is irrelevant to the user's needs, and as a result reads everything. While Orca is an excellent program, I felt that this was an area in which I could improve on its performance for specific user tasks.

Figure 3: The Orca interface.

Figure 4: The Orca Preferences pane is more impressive.


Research Methods

To compare the usability of the Voogle interface to the Orca/Firefox combination, I recruited two users from my peer group to participate in a pilot study. My participants were very similar to each other: both assessed themselves as experienced with web searching, both were completely unfamiliar with screen readers and other accessibility software, and both primarily use Google for their web search needs. Both participants use the Internet every day, and they are close in age as well (21 and 26). These similarities are important for comparing their performance on the assigned search tasks. The study sessions took place separately, one in a classroom in North Quad, and the other in my home. For a larger study, a greater uniformity of environment would of course be required. In particular, the classroom was difficult for me and User 1 to find, and we expected at any moment to be interrupted by a class, so this may have added to the participant's stress level. On the other hand, User 2 was my roommate, and we did the study in the dining room we share, so she was most likely more comfortable than User 1.

To open the study I read the following statement:

“Thank you for agreeing to participate in today's study. You will be evaluating the usability of two text-to-speech interfaces to the Google search engine. One is Voogle, a voice-controlled interface to Google search, and the other is the open-source screen reader Orca, combined with the Mozilla Firefox web browser. You will use each system to complete a small number of search tasks, and then answer a few questions about your experience. The study should take 30 to 60 minutes total. Do you consent to having your anonymous questionnaire included with my final report? To simulate the experience of a blind Internet user, I will ask you to either close your eyes or to wear a blindfold. Do you have a preference?”

I was careful in this statement to not mention which system I had developed, although I assumed that at least one of the participants had heard me talking about Voogle on prior occasions. I thought that the other had as well, but near the end of the study she asked me, “So, did you write both of these?”, so hopefully her results are unbiased in that regard.

After receiving consent to use the participant's results for the study, I gave the participant a questionnaire and asked them to fill out the first section, in which the participant assesses their experience and familiarity with the Internet, web searches, and assistive technology. Both participants' questionnaires are included in the Appendix.


Next, I asked the participant to complete three simple search tasks using either Voogle or Orca/Firefox. I had prepared a list of six search tasks that could be completed using only the information on the Google SERP, mainly because Voogle only reads search results and summaries, and provides no functionality after opening the selected results. At that point, a visually impaired user would require a screen reader anyway, so including post-click tasks would not have resulted in an entirely fair comparison. The search tasks, in the order given, were:

1. What is the mass of the Sun?
2. Where is Google located?
3. Who is the CEO of the publisher O'Reilly Media?
4. Which composer or group produced the score for the newest Tron film?
5. What is the average age of people in China?
6. What is the name of the journalist who interviewed Richard Nixon in 1977?

In later tables, the search tasks will be referred to by number for brevity. Because the tasks were to be completed using only the Google search engine, I deliberately made the task phrasing somewhat ambiguous in order to prompt the participants to think about an effective query. Given the amount of time required to listen to an entire SERP with Orca, or to ten results in Voogle, participants were motivated to search efficiently, and I did not want to give away too many good search terms in the task question. For example, a good answer to the question "Where is Google located?" is "everywhere", but the answer I had in mind was Mountain View, CA, where Google is headquartered. Similarly, the name of the newest Tron film is Tron: Legacy, but I wanted to see how the participants would complete this task with as few queries as possible.

I asked each study participant to complete the search tasks in the same order, but using different systems. So, Participant 1 used Voogle for the first three tasks, and Orca for the last three, and Participant 2 did the reverse. I gave brief instructions at the beginning of each section, related to the relevant system, as follows. For Voogle, I gave a short introduction to the system's syntax:

“Voogle is controlled with a voice interface with a specific grammar. You may say the word “search”, followed by your query, to conduct a search, or the word “options” to access the options menu. I will be typing your instructions into Voogle verbatim, so you may open your eyes and face away from the screen if you wish.”

For the Voogle test, I acted as the speech-to-text interpreter, which freed the participants to have their eyes open. At the time, I made this accommodation for the participants' comfort, but this may have been a mistake, as it removed a distracting condition from the Voogle test that was present in the Orca/Firefox test, possibly slightly biasing the results. In a larger study, the speech recognition would hopefully be working, so the study participants would still face the screen with their eyes closed to ensure uniformity of test conditions.


For Orca/Firefox, the most important instructions were related to navigation on the SERP:

“Orca is an open-source screen reader with many configurable options. Among many other capabilities, it reads the currently-selected text of a web page, so users may navigate using the tab and arrow keys, and use the space bar to click on links. Orca uses keyboard input, and pronounces the name of each key pressed. If you are not comfortable with touch typing, I can cover the screen so that you can see the keyboard. You will start on the main Google search page, so you may enter your query as soon as you are ready.”

Neither participant requested to have the screen covered for the Orca/Firefox test.

I recorded the queries entered, and the total time taken to find the correct answer, on a sheet for each participant, which is in the Appendix. After each set of search tasks I asked the participant to fill out the section of the questionnaire related to the system they had just used. Each section had the same four questions, and asked the user to rate their experience, from 1 to 5, on ease of use, perception of the speed at which they completed their tasks, and confidence before and after the tasks. These results will be analyzed in the Results section.

Results

Quantitatively analyzing the results from such a small study is difficult, but qualitatively several observations can be made. Both users found Orca more difficult to use, and this is partly reflected in the data collected as well.

From the start, the participants had trouble with Orca because the speech synthesizer was clipping off the end of each utterance, making its speech very hard to make out. For example, when the user types an "N", it is supposed to say "en", but it ended up sounding much more like "e-", so an N was hard to distinguish from an M. This particular problem gave User 1 significant difficulties on her first Orca/Firefox search task, which took much longer because of the difficulty of detecting a single mistyped letter. The same problem came up in her third task as well.

I had a similar problem when writing Voogle, but I was able to solve it easily by starting up an entirely new synthesizer for each utterance; while this is computationally inefficient in principle, the difference was impossible to notice in practice. I did not anticipate having the same problem with Orca, because when I used it for the first and only time, several months earlier, it had no such issues. This alone made Orca very difficult to use, although the general sense of long utterances could still be made out accurately.
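The workaround is simple to sketch: instead of reusing one engine, construct a new synthesizer for every utterance (a sketch of the approach, not Voogle's exact code).

    import pyttsx

    # Reusing a single engine clipped the end of each utterance on my
    # setup, so "en" came out as "e-". Constructing a fresh synthesizer
    # per utterance avoided the clipping.
    def say(text):
        engine = pyttsx.init()  # new engine on every call: wasteful, but inaudibly so
        engine.say(text)
        engine.runAndWait()     # block until the utterance finishes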


Also, though I was careful not to discuss the results of User 1's test with User 2 until the end of that session, both users independently commented on how much text Orca had to get through on the Google SERP to get to the results. Due to their unfamiliarity with Orca's rendering of the SERP, and the aforementioned speech synthesizer issues, both participants ended up listening to every item between the search box and the results, which included the Instant and Safe Search pull-downs, the advanced search link, the number of results, and the links to other types of search. This was exactly the result I was looking for, but though I tried not to let my delight show, in both cases I ended up enthusiastically agreeing, which was not as detached and rigorous as I had hoped to be for this study.

The full questionnaires and search task results are included in the Appendix, but I will summarize and analyze the results here to see if any statistically significant effects can be detected. First, the results of the questionnaire for Voogle:

Voogle                  Ease of Use   Confidence     Confidence     Speed
                                      (Pre-study)    (Post-study)
User 1                      5             3              5            4
User 2                      3             2              4            3
Mean                        4             2.5            4.5          3.5
σ                           1.414         0.707          0.707        0.707
Error of the mean           1.000         0.500          0.500        0.500
95% CI Lower Bound          2.040         1.520          3.520        2.520
95% CI Upper Bound          5.960         3.480          5.480        4.480

Table 1: Voogle questionnaire results and statistics

Here, σ is the standard deviation, the error of the mean is the standard deviation divided by the square root of the number of data points, and the 95% confidence interval is defined by mean ± 1.96 × (error of the mean). As expected for only two trials, the standard deviations for these results are quite large. The results for the Orca/Firefox questionnaire:

Orca/Firefox            Ease of Use   Confidence     Confidence     Speed
                                      (Pre-study)    (Post-study)
User 1                      1             2              2            3
User 2                      1             4              2            3
Mean                        1             3              2            3
σ                           0.000         1.414          0.000        0.000
Error of the mean           0.000         1.000          0.000        0.000
95% CI Lower Bound          1.000         1.040          2.000        3.000
95% CI Upper Bound          1.000         4.960          2.000        3.000

Table 2: Orca/Firefox questionnaire results and statistics
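Both tables use the same definitions, which are simple to reproduce. The short script below (a check on the arithmetic, not part of Voogle) recomputes the "Ease of Use" column of Table 1:

    import math

    def summarize(scores, z=1.96):
        """Mean, sample standard deviation, error of the mean, and 95% CI."""
        n = len(scores)
        mean = sum(scores) / float(n)
        # Sample standard deviation (n - 1 in the denominator), matching the tables.
        sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
        se = sigma / math.sqrt(n)  # standard deviation over sqrt(n)
        return mean, sigma, se, (mean - z * se, mean + z * se)

    print(summarize([5, 3]))  # Voogle "Ease of Use" scores from Table 1
    # -> (4.0, 1.414..., 1.0, (2.04, 5.96)), matching the table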

The numbers for pre-study confidence by themselves do not say much, although they do show that both users were somewhat apprehensive before each test. However, both participants' post-study confidence was higher after the Voogle tasks, and unchanged or lower after the Orca/Firefox tasks. This agrees with what I observed during the studies: both users were somewhat disoriented by the unfamiliar tasks, and Orca/Firefox reinforced that feeling, while Voogle may have been easier to use than expected.

The Orca statistics for ease of use and speed are somewhat difficult to interpret, since the standard deviation is 0, but we can compare the mean of each of those measures with the confidence intervals for the Voogle statistics. While the speed results are similar, the Voogle results for ease of use are higher than those for Orca/Firefox at a statistically significant level. While it is still difficult to make meaningful inferences from only two trials, it appears that Voogle may be significantly easier to use, as expected.

I also measured the time required (in seconds) and the number of queries entered per search task. The time results are given below, in Table 3:

                        Voogle    Orca/Firefox
User 1   Task 1           131         280
         Task 2            69          45
         Task 3            15         102
User 2   Task 1            61          86
         Task 2           254          81
         Task 3            47         111
Mean                   96.167     117.500
σ                      86.145      82.788
Error of the mean      35.169      33.798
95% CI Lower Bound     27.236      51.256
95% CI Upper Bound    165.097     183.744

Table 3: Time required per task, in seconds

The differences between Voogle and Orca/Firefox are not statistically significant for these tests. Why could this be? The majority of search tasks required only one query, and very few search results, to find the correct answer. This probably indicates that I made the search tasks too easy, so one challenge in expanding this study would be to come up with more difficult searches that can still be completed within the constraints of the individual systems. Alternatively, another evaluation criterion could be the relevance precision of the set of sites selected for each search task on each system.


                        Voogle    Orca/Firefox
User 1   Task 1             1           2
         Task 2             1           1
         Task 3             1           1
User 2   Task 1             1           1
         Task 2             3           1
         Task 3             1           1
Mean                    1.333       1.167
σ                       0.816       0.408
Error of the mean       0.333       0.167
95% CI Lower Bound      0.680       0.840
95% CI Upper Bound      1.987       1.493

Table 4: Queries per task

The results for the number of queries per task are given in Table 4. The average time required per search was also similar between systems, and the difference was not statistically significant. One interesting feature is that, for most trials, the time required for the first search task on each system was longer than for the second and third. (The long time and larger number of queries for User 2's second Voogle task were related to some confusion and frustration over the results returned, and a large amount of time was spent on query reformulation.) This general trend probably reflects the users getting used to a new search interface.

Conclusion

By most measures, Voogle and Orca/Firefox performed equally well in this small study; while it is difficult to draw conclusions from such a small amount of data, Voogle did significantly outperform Orca/Firefox on the ease-of-use and user-satisfaction measurements. My study participants frequently seemed to be confused by the large amount of information that Orca forced them to listen to before the relevant search result information (not to mention the speech synthesizer problems), and based on my results and conversations with the participants, this is the biggest problem with Orca.

As I expected prior to the study, participants appreciated receiving only the essential information necessary to complete their queries. Sighted users have the advantage of being able to skip instantaneously to the information they need, because they know where on the SERP it will be located. I would expect that users of screen readers would eventually learn that they can get straight to the information they want by (say) pressing the tab key twice, then the down key seven times. While this would be a workable solution, it would be necessary to learn this strategy anew for every frequently-visited web page, and any new interface features would interfere as well. A consistent, minimal interface to popular websites would greatly enhance their usability for these users, and while it is not convenient to download and install such an application for every website, I could see myself using one for Google, Facebook, and Slashdot. Such applications could be developed by the operators of the website (in the case of Facebook) or by the community (for Twitter and Digg, perhaps), and would expand the potential user base for these services.

While it is unlikely that Facebook, for example, would be prepared to sell such an application to its visually-impaired users (since they are not otherwise set up for financial transactions with individuals), it would be possible to generate ad revenue from short audio ads played occasionally enough not to significantly interfere with the user experience. Thus it is potentially in the best interests of even a monetized website to provide such an application to its visually-impaired users; given that it took me approximately thirty to forty hours to write Voogle, it might take a professional development team about as long to put together a really excellent application and substantially expand the site's user base in an otherwise saturated market.


References

• Lazar, J., Allen, A., Kleinman, J., and Malarkey, C. (2007). What Frustrates Screen Reader Users on the Web: A Study of 100 Blind Users. International Journal of Human–Computer Interaction, 22(3), 247–269. http://triton.towson.edu/~jlazar/IJHCI_blind_user_frustration.pdf
• Leporini, B. and Paternò, F. (2004). Increasing usability when interacting through screen readers. Universal Access in the Information Society, 3(1), 57–70. http://www.springerlink.com/content/uvmc6xf7q4juqp9g/
• Leporini, B., Andronico, P., Buzzi, M., and Castillo, C. (2008). Evaluating a Modified Google User Interface Via Screen Reader. Universal Access in the Information Society, 7(3), 155–175.


Appendix
