±Direct Speech Access

T. V. Raman

Adobe Systems i E-mail: [email protected] Voice-mail: 1 (415) 962-3945

Abstract Keywords

Emacspeak is a full-¯edged speech output inter- Direct Speech Access, Access to worksta- face to , and is being used to provide direct tions. speech access to a UNIX workstation. The kind of speech access provided by Emacspeak is qual- itatively different from what conventional screen- 1 Introduction readers provide Ðemacspeak makes applications speakÐ as opposed to speaking the screen. Emacspeak is an Emacs subsystem that allows Emacspeak is the ®rst full-¯edged speech output the user to get feedback using synthesized speech. system that will allow someone who cannot see to Traditionally, screen reading programs have al- work directly on a UNIX system (Until now, the lowed a visually impaired user to get feedback only option available to visually impaired users has using synthesized speech. Such programs have been to use a talking PC as a terminal.) Emacspeak been commercially available for well over a dec- is built on top of Emacs. Once Emacs is started, the ade. Most of them run on PC's under DOS, and user gets complete spoken feedback. there are now a few screen-readers for the Win- I currently use Emacspeak at work on my SUN dows platform. Early screen-reading programs re- SparcStation and have also used it on a DECAL- lied on the character representation of the contents PHA workstation under Digital UNIX while at Di- of the screen to produce the spoken feedback. A gital's CRL1 . I also use Emacspeak as the only signi®cant amount of research and development speech output system on my laptop running Linux. has been carried out to provide access to Graph- Emacspeak is available on the Internet: ical User Interfaces (GUI). These provide spoken access by ®rst constructing an off-screen model FTP ftp://crl.dec.com/pub/digital/emacspeak/ (OSM) Ða data structure that encapsulates the in- formation displayed visuallyÐand then using this WWW http://www.research.digital.com/CRL OSM to provide spoken feedback. The best and perhaps the most complete speech access system The work described in this paper was performed at Di- to the GUI is Screenreader/2 (ScreenReader For

gital Equipment Corporation's Cambridge Research Lab. OS/2) developed by Dr. Jim Thatcher at the IBM 1 Emacspeak was developed as a spare-time project while Watson Research Center [Tha94]. This package I worked at Digital's Cambridge Research Lab (CRL). provides robust spoken access to applications un- der the OS2 Presentation Manager and Windows 3.1. Commercial packages for Microsoft Windows 3.1 provide varying levels of spoken access to the GUI. The Mercator project [ME92, WKES94, MW94, Myn94] has focused on providing spoken Ða crufty if workable solution on the desktop. access to the X-Windows system. However, this was clearly impractical in the mo- As is clear from the above, screen-readers for the bile environment ÐI would have had to carry two

UNIX environment have been conspicuous in their laptops! 3 absence2 . The available options at the time were: Emacspeak is an emacs subsystem that provides complete speech access under UNIX. Emacspeak  Run DOS on the laptop and be limited to a will always have the shortcoming that it will only highly restricted environment. work under Emacs. This said, there is very little

 Run Windows 3.1 on the laptop with a com- that cannot be done inside Emacs, so it's not a real mercial screen-access package for Windows shortcoming. Ðmost of which were still ¯aky to say the Emacspeak does have a signi®cant advantage: least. since it runs inside Emacs, a structure-sensitive,

fully customizable environment, Emacspeak of-  Run Linux on the laptop and get the advant- ten has more context-speci® information about ages of a 32-bit OS with full multitasking cap- what it is speaking than its commercial counter- abilities. parts. In this sense, Emacspeak is not a ªscreen- readerº, it is a subsystem that produces speech out-  Run OS2 with IBM ScreenReader. put. A traditional screen-reader speaks the con- At the time I approached the problem, Linux was tent of the screen, leaving it to the user to inter- the most attractive solution Ðexcept that there was pret the visually laid-out information. Emacspeak, no speech access system available for Linux (or on the other hand, treats speech as a ®rst-class out- any other UNIX). put modality; it speaks the information in a man- ner that is easy to comprehend when listening. The traditional screen-reading paradigm suffers from a 3 Development Of Emacspeak severe shortcoming Ðthe user has to interpret the semantics encapsulated in the visual layout in or- Initially, I decided to write a speech-enabling ex- der to arrive at the meaning of the information dis- tension to Emacs as opposed to writing a conven- played by an application. In contrast, Emacspeak tional screen-reader at the TTY level in order to Ða direct speech access systemÐ speech-enables get a working system in the most expedient man- speci®c user applications to to speak the informa- ner editting editing possible. The ®rst working pro- tion that is being conveyed to the user. By doing totype of Emacspeak took under a week to design this, Emacspeak has much more contextual know- and implement. Once this prototype was working, ledge about the information being spoken than the advantages of the speech-enabling approach does a conventional screen-reading program. outlined earlier became apparent. I then decided to turn Emacspeak into more than a prototype Ð Emacspeak turned into my full-time speech access 2 Motivation interface. Using Emacs' power Emacspeak was motivated by my desire to run and ¯exibility, it has proven straightforward to add a multitasking OS on my laptop. Before Emac- modules that customize how different applications speak, the only way I could access a UNIX work- provide spoken feedback, e.g., depending on the station was via a PC emulating a talking terminal major/minor mode of a given buffer. Note that

2 This means that most visually impaired computer users 3 October 1994 face the additional handicap of being DOS-impaired ± a far more serious problem! the basic speech functionality provided by Emac- AUCTEX Editing (LA )TEX documents. speak is suf®cient to use most Emacs packages ef- fectively; adding package-speci®c customizations Emacs' ®le manager. makes the interaction much smoother. This is be- OOBR A powerful code viewer for browsing ob- cause package-speci®c extensions can take advant- ject oriented code. The interface provided is age of the application context to provide appropri- similar to the one provided in the Small talk ate feedback. world. Emacs-19's font-locking facilities are extended

to the speech output as well; for instance, a user GDB A powerful interface for debugging C and ++ can customize the system to have different types C code.

of text spoken using different kinds of voices ++ (speech fonts). Currently, this feature is used to CC-Mode CandC editing extensions. provide ªvoice lockingº for many popular editing INFO Browsing online documentation. modes like c-mode, tcl-mode, perl-mode, emacs- lisp-mode etc. MAN Browsing online UNIX MAN pages. Emacspeak currently comes with speech exten- sions for several popular Emacs subsystems and Folding Using Emacs as a structured folding ed- editing modes. I would like to thank their respect- itor. ive authors for their wonderful work which makes TEMPO A package that allows for editing tem- Emacs more than a text editor ÐEmacs is a fully plates. customizable user environment. Here is a partial list of the various editing modes A powerful interactive checker. and applications supported by Emacspeak: CALC A powerful symbolic algebra calculator. W3 A powerful Emacs-based WWW browser. ETERM Launching a terminal inside HTML-HELPER Publishing on the WWW. Emacs. This extension enables you to login to another system and get spoken feedback, VM A mail reader. as well as running programs that can only be A USENET news reader. run from the shell. With this extension, Emac- speak can do everything that a screen-reader BBDB The Insidious Big Brother Data Base. This written at the TTY level would achieve. For

is a powerful rolodex system that can be used instance, you can run a VI 4 inside the terminal to maintain and automatically update a rolo- emulator and get complete spoken feedback dex. from Emacspeak.

CALENDAR A tool for maintaining appoint- BUFFER-MENU Navigating through the list of ments etc. currently open buffers.

HYPERBOLE A powerful hypertext and hyper- Comint Interacting with command interpreters button system that allows the user to organize running in an inferior process. This allows the and ®nd information. user to run inferior UNIX shells, Lisp or TCL processes etc. ROLO Yet another rolodex system that comes with Hyperbole. EDIFF A powerful interface that allows the user to compare ®les, apply patches etc. KOUTL An outlining editor for maintaining

structured documents. 4 VI is the default editor on most UNIX systems TCL Supports editing and interactive debugging these applications. Clearly, the screen-reading ap- of TCL scripts. proach has the advantage that providing spoken ac- cess does not require direct cooperation from the PERL Supports editing of PERL scripts. underlying application Ðbut this is also its primary shortcoming. 4 Advantages The speech-enabling approach forces speci®c applications to treat speech as a ®rst-class I/O me- Emacspeak is a speech output system, and as such dium. This means that the speech output modules describing the output from Emacspeak in print is do not have to wait until the information is ®nally clearly impossible. Suf®ce it to say that the spoken displayed to the screen before speaking it Ðthe feedback provided by Emacspeak is qualitatively speech module can access the application context superior to that of the traditional screen-reading and generate the spoken output using all the in-

approach5 . In this section, we point out some of the formation that was available to the visual output features of the spoken feedback provided by Emac- routines. speak. The design used in Emacspeak can also be de- scribed as follows. Dr. Jim Thatcher describes his IBM ScreenReader as an interpreter that ex-

The Main Differentiator ecutes speci®c scripts to provide application spe- 6 The primary difference between Emacspeak and ci®c feedback. This makes IBM ScreenReader conventional screen-readers can be summarized by one of the most powerful screen access systems. saying that Emacspeak extends individual applica- In the case of IBM ScreenReader, the application tions to speak as opposed to speaking the ®nal dis- speci®c scripts (ScreenReader Pro®les) as well as play produced by individual applications. On the the ScreenReader interpreter that runs these scripts surface, the approach of extending each individual both run at global scope (top-level). Emacspeak application to speak might seem intractable. On goes one step further Ðthe application speci®c closer examination however, it becomes obvious scripts as well as the core speech output routines that speech-enabling applications is in fact the only all run within the application context. way to provide appropriate spoken feedback to the user. Further, the implementation strategy used, References providing appropriate hooks to key functions in an application, is a powerful technique for speech- [ME92] Elizabeth D. Mynatt and W. Keith Ed- enabling applications. We justify this assertion in wards. Mapping GUIs to auditory in- the following paragraphs. terfaces. Proceedings ACM UIST92, Every computing application can be viewed as pages 61±70, 1992. having three distinct components: [MW94] E.D. Mynatt and G. Weber. Nonvisual

 Obtain user input Ðget the data. presentation of graphical user inter- faces: Contrasting two approaches.

 Perform the computation. Proceedings of the 1994 ACM Confer-

 Display the result Ðgenerate output. ence on Human Factors in Computing Systems (CHI'94), April 1994. Conventional software assumes that the only mode of providing output is via a visual display. [Myn94] E.D. Mynatt. Auditory Presentation The screen-reading approach retro®ts spoken out- of Graphical User Interfaces.Santa put to this design in order to provide access to 6 I used this screenreader exclusively for 5 years until I

5 I have used screen-readers for the last 5 years. wrote Emacspeak. Fe. Addison-Wesley: Reading MA.., 1994.

[Tha94] James Thatcher. Screen reader/2: Ac- cess to os/2 and the graphical user in- terface. Proc. of The First Annual ACM Conference on Assistive Tech- nologies (ASSETS '94), pages 39±47, Nov 1994.

[WKES94] E. D. Mynatt W. K. Edwards and K. Stockton. Providing access to graphical user interfaces - not graph- ical screens. Proc. Of The First Annual ACM Conference on Assist- ive Technologies (ASSETS '94), pages 47±54, Nov 1994.