Dan Stuart SI 531 12/14/2010

Voogle: A Voice Interface to Google

Introduction

Visually-impaired Internet users who rely on screen readers are often obligated to listen to a large amount of navigational or extraneous information while browsing the web. This is unavoidable on new websites with which the user is not familiar, but for sites the user has already visited many times, this information is unnecessary. For pages that are accessed often, finding and conveying only the necessary information has the potential to save a large amount of time. The Google SERP is an example of such a page: along with the titles of returned results, the page can contain links to image results and other SERPs, summaries of retrieved pages, and full URLs, along with a large number of navigational links and ads (and potential annoyances such as Google Instant, for which there is actually an invisible link that allows screen reader users to turn it off with one click). I proposed to create a minimal interface to Google Search, combining the Google AJAX Search API with the pyttsx text-to-speech library and a speech recognition utility to build a speech-controlled interface that conveys only the necessary information: result titles and truncated URLs, with options to control retrieval of other information such as variable-length summaries and full URLs, and navigation to selected links. Due to the limited availability of speech recognition software for my operating system, I was not able to implement the speech recognition capabilities, but I was able to build and test a proof-of-concept application that functions otherwise as specified.

Figure 1: The Google SERP contains a large amount of extraneous information.
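As a rough sketch of the text-to-speech half of this design (not Voogle's actual source; the spoken string is illustrative), the pyttsx library can read a result aloud in a few lines:

    import pyttsx  # Python bindings to the platform's text-to-speech engine

    engine = pyttsx.init()                     # bind to the default TTS driver
    engine.setProperty('rate', 200)            # words per minute
    engine.setProperty('volume', 1.0)          # 0.0 to 1.0
    engine.say('Result 1: Dr. Chuck Online')   # queue an utterance
    engine.runAndWait()                        # block until speech completes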

I tested the usability of my application with the help of two of my peers, using for comparison the open-source Orca screen reader combined with the Mozilla Firefox web browser. Study participants completed several search tasks using the Orca/Firefox interface, then several using the Voogle interface, and compared the relative ease of each set of tasks. Participants were recruited from my peer group, and rated themselves as comfortable with executing advanced search tasks. Due to the difficulty of locating experienced screen reader users, neither of my study participants had any experience with assistive technology. This is potentially problematic, and would be corrected in a larger study, but both participants were equally unfamiliar with both interfaces, so the study was at least fair in that sense.

The interfaces were evaluated on ease of use, as determined by user questionnaires, average time required to complete search tasks, and number of queries executed. Because Voogle's speech recognition feature was never implemented, I filled that role myself, typing each participant's spoken commands verbatim. The Orca/Firefox interface fared somewhat worse than Voogle, partly due to some problems with the speech engine, but also due to the large amount of information that participants had to navigate through in order to find what they were looking for. Both participants pointed this out, although I had been careful not to mention it to them before or during the study.

Literature Review

Though the number of visually-impaired Internet users is relatively small, the problems they face trying to navigate poorly laid-out websites are of great concern to some HCI researchers. Lazar et al. (2007) conducted a study of 100 blind screen reader users in which the participants kept a log of how their time was spent while using the Internet. Of the many frustrations that participants encountered, the researchers found that the top problem (by time wasted) among users of screen readers is "page layout causing confusing screen reader feedback". Finding information on a poorly-designed page is the biggest problem that these users face, and while Lazar et al. recommend better page design to solve this issue, it may be that a complex web page constructed primarily to serve the needs of sighted users will always contain a large amount of extraneous content for blind users.

Leporini and Paternò (2004) identify 18 criteria for evaluating the usability of websites when accessed with screen readers. Among these are:

• number of links and frames,
• location of navigation bar,
• assignment of shortcuts,
• specific sections,
• indexing of contents, and
• identifying the main page content.

All of these criteria deal with reducing the amount of time users have to spend skipping through page content to get to the information they are looking for. These navigational issues account for one third of their criteria, the largest grouping, and all are problems I attempted to solve in the design of Voogle by stripping the Google SERP down to only the essential results information.

Leporini et al. (2008) looked at how to improve the code of the Google SERP; their main recommendation was to "place the most important elements of the interface at the top of the source file" (emphasis theirs). This is an interesting recommendation, and very much in the same vein as the previous paper. Because screen readers primarily parse the HTML of a page, designing the HTML to accommodate screen readers and using the CSS to control layout is another elegant solution to the problem of website accessibility.

Systems: Voogle

For this project I wrote a proof-of-concept command-line interface to Google Search, designed specifically for visually-impaired users. Because I had originally planned to make the system voice-controlled, I named it Voogle, for "Voice-controlled Google". (I found out partway through development that screen readers are traditionally named after aquatic creatures, e.g. Orca and JAWS. By the time I learned this I was already committed to the Voogle name, but I might otherwise have chosen a name that conforms to the tradition, such as Plecostomus.) I was unfortunately not able to implement the voice-control interface within the constraints of the operating systems I had to work with. However, I was able to get the other features working nicely, including the text-to-speech output.

The Voogle user interface takes commands using a simple grammar: users can (ideally) say or type "search" (or one of a set of synonyms) to initiate a search, "options" to enter the options menu, or "help" to see and hear the help file. Simply entering "search" starts the search routine, which prompts for a query; alternatively, users can enter the query after "search" in the same command to go directly to the query results. Everything pictured in Figure 2, except the user input, is read out to the user with the pyttsx library, which provides Python bindings to the standard text-to-speech utility included with the operating system.

Figure 2: The Voogle command-line interface. The user has inspected the options menu, then initiated a search for "Dr Chuck".
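A minimal sketch of this grammar's dispatch logic follows (Python 2, matching pyttsx of the era). This is not Voogle's actual source: the synonym set and the helpers run_search, show_options, and show_help are assumptions made for illustration.

    SEARCH_SYNONYMS = ('search', 'find', 'google')  # assumed synonym set

    def dispatch(command_line):
        """Route one typed (or, ideally, spoken) command."""
        words = command_line.strip().lower().split()
        if not words:
            return
        command, rest = words[0], ' '.join(words[1:])
        if command in SEARCH_SYNONYMS:
            # "search" alone prompts for a query; "search <query>"
            # goes straight to the results.
            run_search(rest or raw_input('Query: '))
        elif command == 'options':
            show_options()  # assumed helper: the options menu
        elif command == 'help':
            show_help()     # assumed helper: shows and reads the help file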

When a query is entered, Voogle retrieves the search results using the Google AJAX API and reads them out one at a time: first the title, then the summary snippet (the same one that Google displays on its SERP), then the URL, which can be set in the options menu to be either "full" (the entire result URL) or "short" (just the domain name). After each result is read, Voogle asks the user whether to open the result. If the user answers "yes", or an equivalent synonym, the result is remembered and will be opened in a web browser, which launches at the end of the search process. Voogle then moves on to the next result, unless the user answers with "quit", in which case the search is aborted and any selected results are opened. One minor problem with this interaction is that "quit" is not an obvious response to the question "Open result?", so this is one aspect of the user experience that needs work.
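A sketch of this loop against the Google AJAX Search API is below. The endpoint and JSON field names are the API's own; the say helper (see the Results section for why it constructs a fresh engine) and the answer-synonym handling are assumptions, and this is not Voogle's actual source.

    import json
    import urllib
    import urllib2
    import webbrowser

    import pyttsx

    def say(text):
        engine = pyttsx.init()
        engine.say(text)
        engine.runAndWait()

    def run_search(query):
        # Fetch one page of results from the Google AJAX Search API.
        params = urllib.urlencode({'v': '1.0', 'q': query, 'rsz': 'large'})
        url = 'http://ajax.googleapis.com/ajax/services/search/web?' + params
        results = json.load(urllib2.urlopen(url))['responseData']['results']
        selected = []
        for result in results:
            say(result['titleNoFormatting'])  # title, stripped of markup
            say(result['content'])            # summary snippet (may contain HTML)
            say(result['visibleUrl'])         # the "short" URL: domain only
            answer = raw_input('Open result? ').strip().lower()
            if answer in ('yes', 'y', 'sure'):   # assumed synonym set
                selected.append(result['unescapedUrl'])
            elif answer == 'quit':
                break                            # abort the search
        for link in selected:                    # open selections at the end
            webbrowser.open(link)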

Voogle includes several configurable options to customize its behavior, as shown in Figure 2. Several of these ("volume", "rate", "speech") control the text-to-speech behavior. "Rate" sets the speed, in words per minute, at which text is read out, and can be set between 1 and around 500, although the default, 200, is comfortable for most novice users. The "results" option controls the number of results that are read out before Voogle asks the user if they want more, and can be set from 1 to 32, the most results that the Google AJAX API will provide. In retrospect, this is not a very useful option: the user can already quit the search after every result, so this setting in effect adds another choice they must make. The "url" option controls whether Voogle reads the full URL of each result or just the domain name, as described above. The "lucky" option, when enabled, functions exactly like Google's "I'm Feeling Lucky" button: it causes Voogle to skip the results menu entirely and launch the first search result immediately. Finally, the "input" option was meant to control whether Voogle would look for input from the keyboard or microphone, but since speech input was never implemented, this option is non-functional.
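A plausible representation of the options menu is sketched below. The defaults marked "assumed" are illustrative only; the text above fixes only the rate default and the 1-to-32 range for results.

    DEFAULT_OPTIONS = {
        'volume':  1.0,         # pyttsx 'volume' property, 0.0 to 1.0 (assumed default)
        'rate':    200,         # words per minute; the comfortable default
        'speech':  True,        # master switch for spoken output
        'results': 8,           # results read before asking for more, 1 to 32 (assumed default)
        'url':     'short',     # 'short' = domain only, 'full' = entire URL
        'lucky':   False,       # open the first result immediately, skipping the menu
        'input':   'keyboard',  # 'microphone' was planned but never implemented
    }

    def apply_speech_options(engine, options):
        """Push the speech-related options onto a pyttsx engine."""
        engine.setProperty('rate', options['rate'])
        engine.setProperty('volume', options['volume'])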

Systems: Orca/Firefox

Orca is an open-source assistive technology suite for visually impaired users. Aside from screen-reading capabilities, Orca offers screen magnification, braille output, "key echo" (which pronounces the names of keys pressed), and customizable pronunciation of words. Orca is a very powerful, highly-customizable tool, suitable for general use with a wide variety of applications. Orca users navigate applications using the mouse or arrow keys, as abilities permit, and Orca reads any text adjacent to the cursor. Because it works with every application, Orca is not able to determine which text is irrelevant to the user's needs, and as a result reads everything. While Orca is an excellent program, I felt that this was an area in which I could improve on its performance for specific user tasks.

Figure 3: The Orca interface.

Figure 4: The Orca Preferences pane is more impressive.


Research Methods

To compare the usability of the Voogle interface to the Orca/Firefox combination, I recruited two users from my peer group to participate in a pilot study. My participants were very similar to each other: both assessed themselves as experienced with web searching, both were completely unfamiliar with screen readers and other accessibility software, and both primarily use Google for their web search needs. Both participants use the Internet every day, and they are close in age as well (21 and 26). These similarities are important for comparing their performance on the assigned search tasks. The study sessions took place separately, one in a classroom in North Quad, and the other in my home. For a larger study, a greater uniformity of environment would of course be required. In particular, the classroom was difficult for me and User 1 to find, and we expected at any moment to be interrupted by a class, so this may have added to the participant's stress level. On the other hand, User 2 was my roommate, and we did the study in the dining room we share, so she was most likely more comfortable than User 1.

To open the study I read the following statement:

“Thank you for agreeing to participate in today's study. You will be evaluating the usability of two text-to-speech interfaces to the Google search engine. One is Voogle, a voice-controlled interface to Google search, and the other is the open-source screen reader Orca, combined with the Mozilla Firefox web browser. You will use each system to complete a small number of search tasks, and then answer a few questions about your experience. The study should take 30 to 60 minutes total. Do you consent to having your anonymous questionnaire included with my final report? To simulate the experience of a blind Internet user, I will ask you to either close your eyes or to wear a blindfold. Do you have a preference?”

I was careful in this statement to not mention which system I had developed, although I assumed that at least one of the participants had heard me talking about Voogle on prior occasions. I thought that the other had as well, but near the end of the study she asked me, “So, did you write both of these?”, so hopefully her results are unbiased in that regard.

After receiving consent to use the participant's results for the study, I gave the participant a questionnaire and asked them to fill out the first section, in which the participant assesses their experience and familiarity with the Internet, web searches, and assistive technology. Both participants' questionnaires are included in the Appendix.


Next, I asked the participant to complete three simple search tasks using either Voogle or Orca/Firefox. I had prepared a list of six search tasks that could be completed using only the information on the Google SERP, mainly because Voogle only reads search results and summaries, and provides no functionality after opening the selected results. At that point, a visually impaired user would require a screen reader anyway, so including post-click tasks would not have resulted in an entirely fair comparison. The search tasks, in the order given, were:

1. What is the mass of the Sun?
2. Where is Google located?
3. Who is the CEO of the publisher O'Reilly Media?
4. Which composer or group produced the score for the newest Tron film?
5. What is the average age of people in China?
6. What is the name of the journalist who interviewed Richard Nixon in 1977?

In later tables, the search tasks will be referred to by number for brevity. Because the tasks were to be completed using only the Google search engine, I deliberately made the task phrasing somewhat ambiguous in order to prompt the participants to think about an effective query. Given the amount of time required to listen to an entire SERP with Orca, or to ten results in Voogle, participants were motivated to search efficiently, and I did not want to give away too many good search terms in the task question. For example, a good answer to the question "Where is Google located?" is "everywhere", but the answer I had in mind was Mountain View, CA, where Google is headquartered. Similarly, the name of the newest Tron film is Tron: Legacy, but I wanted to see how the participants would complete this task with as few queries as possible.

I asked each study participant to complete the search tasks in the same order, but using different systems. So, Participant 1 used Voogle for the first three tasks, and Orca for the last three, and Participant 2 did the reverse. I gave brief instructions at the beginning of each section, related to the relevant system, as follows. For Voogle, I gave a short introduction to the system's syntax:

“Voogle is controlled with a voice interface with a specific grammar. You may say the word “search”, followed by your query, to conduct a search, or the word “options” to access the options menu. I will be typing your instructions into Voogle verbatim, so you may open your eyes and face away from the screen if you wish.”

For the Voogle test, I acted as the speech-to-text interpreter, which freed the participants to have their eyes open. At the time, I made this accommodation for the participants' comfort, but this may have been a mistake, as it removed a distracting condition from the Voogle test that was present in the Orca/Firefox test, possibly slightly biasing the results. In a larger study, the speech recognition would hopefully be working, so the study participants would still face the screen with their eyes closed to ensure uniformity of test conditions.


For Orca/Firefox, the most important instructions were related to navigation on the SERP:

“Orca is an open-source screen reader with many configurable options. Among many other capabilities, it reads the currently-selected text of a web page, so users may navigate using the tab and arrow keys, and use the space bar to click on links. Orca uses keyboard input, and pronounces the name of each key pressed. If you are not comfortable with touch typing, I can cover the screen so that you can see the keyboard. You will start on the main Google search page, so you may enter your query as soon as you are ready.”

Neither participant requested to have the screen covered for the Orca/Firefox test.

I recorded the queries entered, and the total time taken to find the correct answer, on a sheet for each participant, which is in the Appendix. After each set of search tasks I asked the participant to fill out the section of the questionnaire related to the system they had just used. Each section had the same four questions, and asked the user to rate their experience, from 1 to 5, on ease of use, perception of the speed at which they completed their tasks, and confidence before and after the tasks. These results will be analyzed in the Results section.

Results

Quantitatively analyzing the results from such a small study is difficult, but qualitatively several observations can be made. Both users found Orca more difficult to use, and this is partly reflected in the data collected as well.

From the start, the participants had trouble with Orca because the speech synthesizer was clipping off the end of each utterance, making its speech very hard to make out. For example, when the user types an "N", it is supposed to say "en", but it ended up sounding much more like "e-", so an N was hard to distinguish from an M. This particular problem gave User 1 significant difficulties on her first Orca/Firefox search task, which took much longer because of the difficulty of detecting a single mistyped letter. The same problem came up in her third task as well.

I had a similar problem when writing Voogle, but I was able to solve it easily by starting up an entirely new synthesizer for each utterance; while this is computationally inefficient in principle, the difference was impossible to notice in practice. I did not anticipate having the same problem with Orca, because when I used it for the first and only time, several months earlier, it had no such issues. This alone made Orca very difficult to use, although the general sense of long utterances could still be made out accurately.
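The workaround is simple to sketch: instead of reusing one engine, construct a new synthesizer for every utterance (a sketch of the approach, not Voogle's exact code).

    import pyttsx

    # Reusing a single engine clipped the end of each utterance on my
    # setup, so "en" came out as "e-". Constructing a fresh synthesizer
    # per utterance avoided the clipping.
    def say(text):
        engine = pyttsx.init()  # new engine on every call: wasteful, but inaudibly so
        engine.say(text)
        engine.runAndWait()     # block until the utterance finishes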


Also, though I was careful not to discuss the results of User 1's test with User 2 until the end of that session, both users independently commented on how much text Orca had to get through on the Google SERP to get to the results. Due to their unfamiliarity with Orca's rendering of the SERP, and the aforementioned speech synthesizer issues, both participants ended up listening to every item between the search box and the results, which included the Instant and Safe Search pull-downs, the advanced search link, the number of results, and the links to other types of search. This was exactly the result I was looking for, but though I tried not to let my delight show, in both cases I ended up enthusiastically agreeing, which was not as detached and rigorous as I had hoped to be for this study.

The full questionnaires and search task results are included in the Appendix, but I will summarize and analyze the results here to see if any statistically significant effects can be detected. First, the results of the questionnaire for Voogle:

Voogle                  Ease of Use   Confidence     Confidence     Speed
                                      (Pre-study)    (Post-study)
User 1                      5             3              5            4
User 2                      3             2              4            3
Mean                        4             2.5            4.5          3.5
σ                           1.414         0.707          0.707        0.707
Error of the mean           1.000         0.500          0.500        0.500
95% CI Lower Bound          2.040         1.520          3.520        2.520
95% CI Upper Bound          5.960         3.480          5.480        4.480

Table 1: Voogle questionnaire results and statistics

Here, σ is the standard deviation, the error of the mean is the standard deviation divided by the square root of the number of data points, and the 95% confidence interval is defined by mean ± 1.96 × (error of the mean). As expected for only two trials, the standard deviations for these results are quite large. The results for the Orca/Firefox questionnaire:

Orca/Firefox            Ease of Use   Confidence     Confidence     Speed
                                      (Pre-study)    (Post-study)
User 1                      1             2              2            3
User 2                      1             4              2            3
Mean                        1             3              2            3
σ                           0.000         1.414          0.000        0.000
Error of the mean           0.000         1.000          0.000        0.000
95% CI Lower Bound          1.000         1.040          2.000        3.000
95% CI Upper Bound          1.000         4.960          2.000        3.000

Table 2: Orca/Firefox questionnaire results and statistics
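Both tables use the same definitions, which are simple to reproduce. The short script below (a check on the arithmetic, not part of Voogle) recomputes the "Ease of Use" column of Table 1:

    import math

    def summarize(scores, z=1.96):
        """Mean, sample standard deviation, error of the mean, and 95% CI."""
        n = len(scores)
        mean = sum(scores) / float(n)
        # Sample standard deviation (n - 1 in the denominator), matching the tables.
        sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
        se = sigma / math.sqrt(n)  # standard deviation over sqrt(n)
        return mean, sigma, se, (mean - z * se, mean + z * se)

    print(summarize([5, 3]))  # Voogle "Ease of Use" scores from Table 1
    # -> (4.0, 1.414..., 1.0, (2.04, 5.96)), matching the table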

The numbers for pre-study confidence by themselves do not say much, although they do show that both users were somewhat apprehensive before each test. However, both participants' post-study confidence was higher after the Voogle tasks, and unchanged or lower after the Orca/Firefox tasks. This agrees with what I observed during the studies: both users were somewhat disoriented by the unfamiliar tasks, and Orca/Firefox reinforced that feeling, while Voogle may have been easier to use than expected.

The Orca statistics for ease of use and speed are somewhat difficult to interpret, since the standard deviation is 0, but we can compare the mean of each of those measures with the confidence intervals for the Voogle statistics. While the speed results are similar, the Voogle results for ease of use are higher than those for Orca/Firefox at a statistically significant level. While it is still difficult to make meaningful inferences from only two trials, it appears that Voogle may be significantly easier to use, as expected.

I also measured the time required (in seconds) and the number of queries entered per search task. The time results are given below, in Table 3:

                        Voogle    Orca/Firefox
User 1   Task 1           131         280
         Task 2            69          45
         Task 3            15         102
User 2   Task 1            61          86
         Task 2           254          81
         Task 3            47         111
Mean                   96.167     117.500
σ                      86.145      82.788
Error of the mean      35.169      33.798
95% CI Lower Bound     27.236      51.256
95% CI Upper Bound    165.097     183.744

Table 3: Time required per task, in seconds

The differences between Voogle and Orca/Firefox are not statistically significant for these tests. Why could this be? The majority of search tasks required only one query, and very few search results, to find the correct answer. This probably indicates that I made the search tasks too easy, so one challenge in expanding this study would be to come up with more difficult searches that can still be completed within the constraints of the individual systems. Alternatively, another evaluation criterion could be the relevance precision of the set of sites selected for each search task on each system.


                        Voogle    Orca/Firefox
User 1   Task 1             1           2
         Task 2             1           1
         Task 3             1           1
User 2   Task 1             1           1
         Task 2             3           1
         Task 3             1           1
Mean                    1.333       1.167
σ                       0.816       0.408
Error of the mean       0.333       0.167
95% CI Lower Bound      0.680       0.840
95% CI Upper Bound      1.987       1.493

Table 4: Queries per task

The results for the number of queries per task are given in Table 4. The average time required per search was also similar between systems, and the difference was not statistically significant. One interesting feature is that, for most trials, the time required for the first search task on each system was longer than for the second and third. (The long time and larger number of queries for User 2's second Voogle task were related to some confusion and frustration over the results returned, and a large amount of time was spent on query reformulation.) This general trend probably reflects the users getting used to a new search interface.

Conclusion

By most measures, Voogle and Orca/Firefox performed equally well in this small study; while it is difficult to draw conclusions from such a small amount of data, Voogle did significantly outperform Orca/Firefox on the ease-of-use and user-satisfaction measurements. My study participants frequently seemed to be confused by the large amount of information that Orca forced them to listen to before the relevant search result information (not to mention the speech synthesizer problems), and based on my results and conversations with the participants, this is the biggest problem with Orca.

As I expected prior to the study, participants appreciated receiving only the essential information necessary to complete their queries. Sighted users have the advantage of being able to skip instantaneously to the information they need, because they know where on the SERP it will be located. I would expect that users of screen readers would eventually learn that they can get straight to the information they want by (say) pressing the tab key twice, then the down key seven times. While this would be a workable solution, it would be necessary to learn this strategy anew for every frequently-visited web page, and any new interface features would interfere as well. A consistent, minimal interface to popular websites would greatly enhance their usability for these users, and while it is not convenient to download and install such an application for every website, I could see myself using one for Google, Facebook, and Slashdot. Such applications could be developed by the operators of the website (in the case of Facebook) or by the community (for Twitter and Digg, perhaps), and would expand the potential user base for these services.

While it is unlikely that Facebook, for example, would be prepared to sell such an application to its visually-impaired users (since they are not otherwise set up for financial transactions with individuals), it would be possible to generate ad revenue from short audio ads played occasionally enough not to significantly interfere with the user experience. Thus it is potentially in the best interests of even a monetized website to provide such an application to its visually-impaired users; given that it took me approximately thirty to forty hours to write Voogle, it might take a professional development team about as long to put together a really excellent application and substantially expand the site's user base in an otherwise saturated market.


References

• Lazar, J., Allen, A., Kleinman, J., and Malarkey, C. (2007). What Frustrates Screen Reader Users on the Web: A Study of 100 Blind Users. International Journal of Human–Computer Interaction, 22(3), 247–269. http://triton.towson.edu/~jlazar/IJHCI_blind_user_frustration.pdf
• Leporini, B. and Paternò, F. (2004). Increasing usability when interacting through screen readers. Universal Access in the Information Society, 3(1), 57–70. http://www.springerlink.com/content/uvmc6xf7q4juqp9g/
• Leporini, B., Andronico, P., Buzzi, M., and Castillo, C. (2008). Evaluating a Modified Google User Interface Via Screen Reader. Universal Access in the Information Society, 7(3), 155–175.


Appendix
