An Intelligent Interface for Keyboard and Mouse Control - Providing Full Access to PC Functionality via Speech

Bill Manaris, Computer Science Department, College of Charleston, Charleston, SC 29424, [email protected]
Renée McCauley, Computer Science Department, College of Charleston, Charleston, SC 29424, [email protected]
Valanne MacGyvers, Psychology Department, University of Louisiana, Lafayette, LA 70504, [email protected]

ABSTRACT

SUITEKeys is a speech user interface for motor-disabled computer users. This interface provides access to all available functionality of a computer by modeling interaction at the physical keyboard and mouse level. SUITEKeys is currently implemented for MS Windows platforms. Its architecture integrates a continuous, speaker-independent recognition engine with natural language processing components. It makes extensive use of dialog management to improve recognition accuracy. SUITEKeys extends the speech-keyboard metaphor through functionality visible on its graphical user interface, and accessible through speech. Experimental results indicate that speech input for alphanumeric data entry is a much more effective modality than existing alternatives for the target user group. Such alternatives include miniaturized keyboards, stylus "soft" keyboards, and handwriting recognition software.

Keywords

Speech recognition, natural language processing, intelligent user interfaces, accessibility.

INTRODUCTION

Studies indicate that speech interaction with a virtual keyboard and mouse is a very effective input modality for motor-control challenged users, in terms of data entry, task completion, and error rates [1, 2, 3]. These results apply to both permanent and temporary (task-induced) motor disabilities. Examples of the latter include users entering arbitrary alphanumeric data in mobile devices and hands-busy environments. Speech interaction with a virtual keyboard and mouse has been shown to be far better than alternative modalities such as mouthstick, handstick,¹ miniaturized keyboards, stylus "soft" keyboards, and handwriting recognition software [3].

This paper presents SUITEKeys 1.0, a speech user interface for providing access to a virtual keyboard and mouse. SUITEKeys integrates state-of-the-art components for speech recognition, natural language processing, and dialog management, to provide a speaker-independent, continuous-speech interface. It is implemented on MS Windows platforms (95, 98, NT, 2000, and potentially Windows CE, for palmtop PCs).

This speech-interface concept provides motor-disabled users with universal access to any device that requires/supports alphanumeric data entry, such as palmtop PCs, cellular phones, and camcorders. It may also be used by able-bodied users with temporary, task-induced motor disabilities, such as users entering arbitrary (non-word) alphanumeric data in mobile computing and/or hands-busy environments. Specifically, as the physical dimensions of traditional input devices shrink, the motor skills of the user become less effective, almost inadequate. For instance, studies show that users may experience severe reduction in data entry speed (words per minute) when switching from a regular QWERTY keyboard to a mobile-device keyboard alternative. Specifically, these studies report a 75% reduction on a stylus-based "soft" keyboard (PalmPilot) and a 60% reduction on a telephone keypad [4, 5].

SUITEKeys allows users to transcribe sequences of keystrokes and mouse actions using spoken language. It assumes that the user speaks English and has minimal or no speech impediments. It models an 84-key keyboard (no numeric pad) and two-button mouse functionality.² Other applications, including the operating system, treat the generated keyboard and mouse events as having originated from the corresponding physical devices.

Other speech applications, such as NaturallySpeaking and Voice, do not provide access to all the functionality available through the physical keyboard and mouse [6, 7]. This is because they are designed for higher-level tasks, such as Dictation and Command & Control. They allow input of keystrokes either by requiring a fixed keyword as prefix (e.g., "Press a"), or by entering a spell mode.

Copyright © 2001, AAAI. All rights reserved.

¹ Mouthstick and handstick are devices used by motor-disabled users to manipulate QWERTY keyboards.
² SUITEKeys models only the left keys for ALT, CONTROL, and SHIFT.

From: FLAIRS-01 Proceedings. Copyright © 2001, AAAI (www.aaai.org). All rights reserved.

Moreover, such applications do not necessarily model all 84 keys. Finally, they send recognized keystrokes directly to the active window, bypassing the operating system. This prohibits the operating system and other applications from attaching arbitrary semantics to sequences of keystrokes (e.g., ALT-CONTROL-DEL), and intercepting them.

OVERVIEW

SUITEKeys was originally developed as a testbed application for the SUITE speech-understanding-interface architecture at the University of Louisiana at Lafayette [8]. It has been finalized at the College of Charleston. It is distributed as freeware for non-profit use. An early prototype of the system, the underlying architecture, and theoretical foundations for this work were presented in ASSETS-98, FLAIRS-99, and elsewhere [1, 2, 9].

SUITEKeys has been implemented using MS Visual Studio, SAPI, and Lex and YACC. Minimum system requirements include: Pentium 120 processor (preferred Pentium 200 and up); 16MB RAM for Windows 95 & 98, 24MB RAM for Windows NT, 64MB RAM for Windows 2000; 18MB disk space; sound card and sound card device driver supported in Windows; microphone (preferred noise-reducing, close-talk headset); and 16 kHz/16 bit or 8 kHz/16 bit sampling rate for the input stream.

SUITEKeys is available at www.cs.cofc.edu/~manaris/SUITEKeys/.

Domain Limitations

Although the system incorporates a continuous, speaker-independent engine, this capability could not be exploited in all cases. This is due to speech ambiguities in the linguistic domain which normally confuse even human listeners. For instance, the linguistic domain includes several near-homophone letters, such as "b" and "p", or "d" and "t".³ For this reason, a few tasks have been modeled only through discrete speech. This requires that the user pause between words. Although this may make interaction less natural in some cases, it improves understanding accuracy⁴ considerably.

Specifically, we use discrete speech mainly when transcribing regular letters (e.g., "a", "b"). We also use it in a few other cases, where our usability studies showed that users used it anyway, such as entering function keys (e.g., "Function Eleven", "Function Twelve") and certain special keys (e.g., "Insert", "Home", "Print Screen"). We use continuous speech in all other cases, such as entering letters using the military alphabet, numbers, repeatable keys (e.g., "Up Arrow", "Tab", "Back Space"), mouse commands, and system commands.

³ Hence the invention of the military alphabet.
⁴ We use the term "understanding" as opposed to "recognition," since the system, in addition to translating an utterance to its ASCII representation (recognition), also carries out the intended meaning (understanding), such as type a letter, move the mouse, and switch speakers.

SYSTEM ARCHITECTURE

The architecture of SUITEKeys integrates speech recognition and natural language processing components (see Figure 1). It processes input that originates as speech events and converts it to operating system keyboard and mouse events. The complete system runs on top of the operating system like any other application. Other applications are unaware that keyboard/mouse events originated as speech input.

Fig. 1. SUITEKeys Architecture.

The architecture is subdivided into (a) the language model and (b) language processing components. Speech events are processed in a pipelined fashion to enable real-time response [2, 8]. Processing is performed by the following components:

Dialog Management

The SUITEKeys architecture incorporates a stack-based dialog manager. This subsystem utilizes a dialog grammar containing dialog states and actions. Each top-level state corresponds to a specific language model (e.g., lexicon, grammar). The overall linguistic domain has been divided into thirteen such states. The dialog manager loads, in real time, the appropriate dialog state based on the current state of the interaction. Given the ambiguity of the linguistic domain, this allows processing components to focus on specific subsets of the overall domain, and thus improve recognition accuracy and performance.

Speech Processing

SUITEKeys evolved through several architectures following the rapid developments in speech recognition within the last six years. Currently, it implements the Microsoft Speech API (SAPI). The latest version of SUITEKeys comes bundled with the MS Speech Recognition engine (freeware), but could also work with other SAPI-compliant engines, such as Dragon Systems NaturallySpeaking [6].
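The stack-based dialog management scheme described above can be sketched as follows. This is an illustrative reconstruction, not the actual SUITEKeys source: the names DialogState, DialogManager, and handle are invented, and the real system uses thirteen states with full lexicons and grammars.

```python
# Illustrative sketch of a stack-based dialog manager. Each top-level
# dialog state carries its own small language model, so the recognizer
# only ever considers one subset of the overall linguistic domain.

class DialogState:
    def __init__(self, name, lexicon, grammar):
        self.name = name
        self.lexicon = lexicon    # words the recognizer listens for in this state
        self.grammar = grammar    # maps a recognized phrase to an action

class DialogManager:
    def __init__(self, initial_state):
        self.stack = [initial_state]   # active state is the top of the stack

    @property
    def active(self):
        return self.stack[-1]

    def handle(self, phrase):
        """Dispatch a recognized phrase using the active state's grammar."""
        action = self.active.grammar.get(phrase)
        if action is None:
            return "???"              # unrecognized in this context
        new_state = action()          # an action may return a state to push
        if new_state is not None:
            self.stack.append(new_state)
        return phrase

# Example: a tiny "letters" sub-state entered by saying "spell".
letters = DialogState("letters", ["alpha", "bravo"],
                      {"alpha": lambda: None, "bravo": lambda: None})
top = DialogState("top", ["spell"], {"spell": lambda: letters})

dm = DialogManager(top)
dm.handle("spell")          # pushes the letters state
print(dm.active.name)       # -> letters
```

Restricting the grammar to the active state is what lets near-homophone letters be disambiguated: "bravo" only competes with the other military-alphabet words, not the whole command set.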

Natural Language Processing

The architecture incorporates a left-to-right, top-down, non-deterministic parser. This parser supports multiple parse trees, thus providing for ambiguity handling at the semantic level. It incorporates semantic actions for constructing semantic interpretations of user input.

Code Generator

This module converts semantic interpretations, the output of the parser, to low-level code understood by the operating system.

Other Components

Other components of the SUITEKeys architecture include a knowledge-base manager, speech generator, lexical analyzer, and pragmatic analyzer. The application domain (as implemented in the final release of SUITEKeys) does not have significant lexical, pragmatic, and speech generation requirements. A detailed discussion of these components appears in [2].

GRAPHICAL USER INTERFACE

Although SUITEKeys is primarily a speech user interface, it includes a graphical user interface for visibility and feedback purposes. Specifically, the user interface identifies available functionality and how to access it (visibility); it also provides information about the effects of user actions (feedback). The SUITEKeys graphical user interface has two views: maximized and minimized.

Maximized View

This view is designed to provide maximum feedback to the user. As shown in Figure 2, this view allows the user to focus on SUITEKeys and its operation. This view provides several forms of feedback (left-to-right, top-to-bottom):

• Command History: a list of recognized commands in reverse chronological order. Unrecognized commands are denoted by "???".
• Speech Processing: superimposed animations provide feedback for two independent yet related events: voice input level corresponds to raising and dropping rings in green, yellow, and red (Figure 3); speech engine processing corresponds to a revolving "SK" (Figure 4).
• Toggle Keys: on/off indicators for CAPS LOCK and SCROLL LOCK.
• Generated Keystrokes: a keyboard that visually "echoes" generated keystrokes.

Fig. 2: SUITEKeys maximized view. Fig. 3: Input level. Fig. 4: Input processing.

Additionally, this view identifies all system commands through its menu structure. Since some system commands disconnect the speech engine from the microphone (e.g., Microphone Calibration), such commands are not accessible through speech.⁵ This is to prevent users from entering system states from which they may not be able to exit.

Minimized View

Effective user interfaces should be invisible to the user. As Mark Weiser states, "[a] good tool is an invisible tool. By invisible, I mean that the tool does not intrude on your consciousness; you focus on the task, not the tool. Eyeglasses are a good tool -- you look at the world, not the eyeglasses." [10]

For this reason, SUITEKeys provides a minimized view. This view is intended for when the user wants to concentrate on the task at hand, i.e., controlling the PC, as opposed to SUITEKeys itself (see Figure 5). This view always stays on top and provides minimal feedback, namely the last-recognized input, which modifier/toggle keys are pressed, the volume indicator, and the speech engine processing status.

Fig. 5: SUITEKeys minimized view.

⁵ These commands are only accessible to users with some motor control via the physical keyboard/mouse interface.
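The pipeline described in the System Architecture section, from spoken token to semantic interpretation to low-level events, can be sketched as follows. This is a hypothetical illustration: the MILITARY table, interpret, and generate_events are invented names, and the real Code Generator emits operating-system keyboard events rather than returning tuples.

```python
# Illustrative sketch of the code-generation step. A semantic
# interpretation such as ("tap", "a") is turned into low-level
# press/release events that downstream applications cannot
# distinguish from physical keystrokes.

MILITARY = {"alpha": "a", "bravo": "b", "charlie": "c"}  # ... through "zulu"

def interpret(token):
    """Map a spoken token to the character it stands for."""
    return MILITARY.get(token, token)

def generate_events(interpretation):
    """Expand a (action, key) interpretation into low-level key events."""
    action, key = interpretation
    if action == "tap":
        return [("keydown", key), ("keyup", key)]
    if action == "press":                 # e.g., holding a modifier
        return [("keydown", key)]
    if action == "release":
        return [("keyup", key)]
    raise ValueError(f"unknown action: {action}")

events = generate_events(("tap", interpret("bravo")))
print(events)   # -> [('keydown', 'b'), ('keyup', 'b')]
```

Separating "press" and "release" from "tap" is what lets spoken modifier sequences (e.g., "Press Control" ... "t" ... "Release Control") reproduce chorded keystrokes.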

KEYBOARD SPEECH COMMANDS

SUITEKeys allows users to input sequences of keystrokes using spoken language, much as if they were transcribing to someone how to perform the same actions with a physical keyboard. Specifically, keys may be tapped, pressed, or released. Typing errors may be handled similarly to physical keyboard-entry errors, i.e., via the "Back Space" key.

In the following sections, unless otherwise specified, users may use continuous speech.

Typing Alphanumeric Keys

Given the difficulty of recognizing certain letters of the alphabet, the system supports both regular and military pronunciations.

Saying the name of a letter (e.g., "a", "b") sends the corresponding letter to the active window. To improve accuracy, only discrete speech may be used for sequences of alphanumeric keystrokes.

Saying the military equivalent of a letter (e.g., "alpha", "bravo") sends the corresponding letter (e.g., "a", "b") to the active window.

Saying a number (e.g., "one", "two") sends the corresponding number to the active window.

Word Auto-Completion

Studies show that word prediction techniques improve user performance considerably. This is accomplished by giving the user a choice of the most-likely words, based on previous input. This reduces the overall number of keystrokes a user has to enter by as much as 44% [11].

SUITEKeys incorporates a word auto-completion mechanism. A word prediction box is displayed when the user begins typing a new word. This box is attached to the bottom of the mouse pointer for easy positioning. Word auto-completion allows the user to reduce the number of "keystrokes" needed to enter a previously used word. Instead of typing the entire word, the user types enough letters to make it appear in the prediction box. Saying "Select" types the rest of the word. Words are maintained using a most-recently-used scheme (identical to the paging scheme in operating system literature).

Typing Modifier Keys

Modifier keys are held pressed while typing alphanumeric keys to provide semantic variations.

Saying "Press" and a modifier key (i.e., "Alt", "Control", "Shift") holds the key pressed. Saying "Release" and a modifier key releases the key. The graphical user interface provides feedback as to which modifier keys are pressed at any time.

Typing Toggle Keys

Saying "Caps Lock" or "Scroll Lock" toggles the corresponding key. The graphical user interface identifies toggle key status (on/off).

Typing Other Keys

Saying the name of other keys (e.g., "Function Twelve", "Escape", "Space", "Ampersand", "Tab", "Left Arrow") types the corresponding key.

MOUSE SPEECH COMMANDS

SUITEKeys allows users to manipulate a two-button mouse using spoken language, much as if they were transcribing to someone how to perform the same actions with a physical mouse. Specifically, the mouse may be moved in different directions; its buttons may be clicked, double-clicked, pressed, or released.

SUITEKeys provides three different techniques for setting the position of the mouse pointer.

Continuous Motion

Users may smoothly move the mouse pointer towards a specific direction. Movement continues until a command is spoken (e.g., "Stop"), or until the pointer reaches a screen border. This is intended for rough positioning of the pointer, when the target is far from the original position.

For example, saying "move mouse down" ... "stop" will move the pointer downward some distance. Saying "move mouse two o'clock" ... "stop" will move the pointer towards⁶ two o'clock.

Relative Positioning

Users may reposition the mouse pointer a number of units from its original position. This is intended for fine-tuning the mouse position, when the mouse pointer is near the target.

For example, saying "move mouse down three" will reposition the pointer 3 units below its original position. Saying "move mouse three o'clock forty" will reposition the pointer 40 units to the right.

SUITEKeys subdivides the screen into 100 units, "zero" to "ninety-nine", on both horizontal and vertical axes. This makes relative positioning independent of screen resolution.

Absolute Positioning

Users may reposition the mouse pointer to a specific screen location. This is intended for targets whose approximate screen location is known.

For example, saying "set mouse position three five" will place the pointer 3 units away from the left screen border and 5 units away from the top. In other words, "zero zero" corresponds to the top-left corner and "ninety-nine ninety-nine" to the bottom-right corner.

Controlling Mouse Buttons

Mouse buttons may be controlled similarly to keyboard keys. For example, saying "Click left button" or "Double-click right button" will perform the corresponding actions.

⁶ Twelve o'clock is the same as up, three o'clock is right, six o'clock is down, etc.
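The two coordinate conventions above, a resolution-independent 100-unit grid for absolute positioning and clock directions for motion (twelve o'clock is up, three o'clock is right), can be illustrated as follows. The function names are invented for this sketch and do not come from SUITEKeys itself.

```python
# Illustrative sketch of the mouse-positioning arithmetic.
import math

def units_to_pixels(ux, uy, screen_w, screen_h):
    """Absolute positioning: "set mouse position three five" -> units (3, 5).
    Units run "zero" to "ninety-nine" on each axis, so the mapping is
    independent of screen resolution."""
    return (ux * screen_w // 100, uy * screen_h // 100)

def clock_vector(hour):
    """Unit direction vector for a clock direction; y grows downward,
    as in screen coordinates, so twelve o'clock maps to (0, -1)."""
    angle = math.radians(hour * 30 - 90)   # 12 o'clock points up
    return (math.cos(angle), math.sin(angle))

print(units_to_pixels(3, 5, 1024, 768))    # -> (30, 38)
dx, dy = clock_vector(3)                   # three o'clock = right
print(round(dx), round(dy))                # -> 1 0
```

On a 1024x768 screen, "set mouse position three five" thus lands 30 pixels from the left border and 38 pixels from the top, and the same spoken command produces the same relative location at any resolution.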

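The most-recently-used word prediction scheme described under Word Auto-Completion can be sketched as follows. This is an illustrative sketch; WordPredictor is an invented name, and the real mechanism displays its candidate in a prediction box rather than returning it.

```python
# Illustrative sketch of MRU-based word auto-completion: candidates are
# kept most-recently-used first (like page replacement in the operating
# systems literature), and the first match for the typed prefix is the
# word that "Select" would complete.

class WordPredictor:
    def __init__(self):
        self.words = []                 # most recently used first

    def add(self, word):
        """Record a completed word, promoting it to the front."""
        if word in self.words:
            self.words.remove(word)
        self.words.insert(0, word)

    def predict(self, prefix):
        """Return the most recently used word starting with prefix, if any."""
        for word in self.words:
            if word.startswith(prefix):
                return word
        return None

wp = WordPredictor()
for w in ["keyboard", "keystroke", "mouse"]:
    wp.add(w)
print(wp.predict("key"))    # -> keystroke  (more recent than "keyboard")
wp.add("keyboard")          # typing a word again promotes it
print(wp.predict("key"))    # -> keyboard
```

The MRU ordering matters for speech input: recently dictated words are the most likely to recur, so promoting them shortens the prefix the user must spell out before saying "Select".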
Dragging is achieved by pressing the left mouse button (e.g., "Press left button") and moving the mouse. Dropping is achieved by releasing the left mouse button (e.g., "Release left button").

OTHER FUNCTIONALITY

SUITEKeys provides additional functionality through the menu structure in maximized view. This functionality is also accessible through speech, as long as it does not disconnect the microphone from the speech engine.

Context-Sensitive Help: "What Can I Say"

"Studies show that there are many names possible for any object, many ways to say the same thing about it, and many different things to say. Any one person thinks of only one or a few of the possibilities." [12]

For this reason, every dialog state includes the command "What Can I Say". This command brings up a box with available commands in this context. This box also appears after a number of unrecognized inputs (currently 4). For example, Figure 6 shows the corresponding box for the Preferences dialog state. Once a correct command is spoken, the box disappears.

Fig. 6: Context-sensitive help.

Progressive Disclosure: "More Help"

SUITEKeys incorporates a two-step, progressive-disclosure mechanism for help. Specifically, in most dialog states, one of the commands displayed as a result of "What Can I Say" is "More Help". When spoken, this command brings up the corresponding topic in the Help file.

Quick Training

SUITEKeys incorporates training "cards" to teach system essentials to the user. These cards are displayed automatically for every new speaker. Each card corresponds to a training exercise on a subset of SUITEKeys functionality. Each training exercise is associated with a very small linguistic model. Training cards are designed so that a user may proceed to the next lesson only by mastering the concept being presented (e.g., mouse control).

USABILITY STUDIES

We have conducted several usability studies related to SUITEKeys. Most of them were of formative nature; they provided feedback and direction during development [1, 2, 3]. These results indicate that interaction with a speech-accessible keyboard and mouse is a very effective input modality, in terms of user data entry, task completion, and error rates. Specifically, users performed much better using a Wizard-of-Oz model of SUITEKeys as opposed to handstick. Handstick is a good approximation of alternate input modalities such as mouthstick, miniaturized keyboards, stylus "soft" keyboards, and handwriting recognition software. Specifically, the average results on our particular benchmark task were as follows:

• Time to completion: 201.5 secs for speech; 321.1 secs for handstick.
• Amount completed: 99.8% for speech; 99.1% for handstick.
• Data entry speed: 1.336 characters/sec for speech; 0.766 characters/sec for handstick.
• Error rate: 0.033 for speech; 0.09 for handstick.

These results suggest that this speech modality is far better than alternative modalities used in mobile devices that require manual dexterity for alphanumeric data entry. Such modalities are characterized by one or more of the following: decreased physical input area, increased visual scan time, and increased character entry time (e.g., handwriting recognition).

A speech user interface similar to SUITEKeys appears to be relatively easy to learn and to use, particularly for motor challenged and/or computer illiterate users. Anecdotal evidence from the novice subjects of the study suggests that this system is far less intimidating than other interfaces.

In terms of summative evaluation, we have conducted one small-scale experiment using our latest implementation of SUITEKeys. We are also in the process of conducting a large-scale study at the University of Louisiana at Lafayette. The results of the small-scale experiment apply only to speech and are approximate. However, they provide an idea as to how closely the current implementation of SUITEKeys matches the ideal Wizard-of-Oz model. These results are as follows:

• Time to completion: 346.8 secs.
• Amount completed: 100%.
• Data entry speed: 0.695 characters/sec.
• Error rate: 0.057.

Discussion

These preliminary summative results suggest that, in its current implementation, SUITEKeys is a useful application for motor-impaired users. Although it does not appear to be as effective as handstick (at least in terms of data entry speed and task completion time), it is a viable alternative for users who are unable or unwilling to use means that require manual dexterity for alphanumeric data entry. Of course, this assumes a user with no significant speech impediments and a noise-free environment.

As mentioned above, although the system incorporates a continuous speech engine, this capability was not exploited in all cases, in order to improve recognition accuracy.

Obviously, discrete speech input slows down data entry. Research suggests that, in the near future, computer speech-recognition may surpass human speech-recognition [14]. Therefore, the above domain limitations may be specific to the current state-of-the-art in speech recognition. The modular design of SUITEKeys makes it straightforward to improve its speech recognition accuracy by attaching it to newer speech engines, as they become available.

CONCLUSION

SUITEKeys is a continuous-speech user interface for motor-disabled computer users. This interface provides access to all available functionality of a computer by modeling interaction at the level of the physical keyboard and mouse. By following this "metaphor as model" approach, SUITEKeys facilitates the formation of a conceptual model of the system and its linguistic domain [13]. Given that the linguistic primitives of speech user interfaces are invisible (as opposed to the primitives of well-designed graphical user interfaces), this significantly improves the system's usability.

Additional functionality, that is, functionality beyond the interface metaphor, is incorporated by making it visible through the SUITEKeys graphical user interface. The user may simply read aloud menu entries and dialog-box elements to select them. Further support for learning and retaining how to use the system is provided through context-sensitive help (i.e., "What Can I Say") and progressive disclosure (i.e., "More Help").

This relatively low-level approach has a potential drawback compared to speech user interfaces employing a higher-level approach, such as NaturallySpeaking. In many cases, a user may be required to "synthesize" higher-level actions (e.g., Drag & Drop) through a sequence of low-level actions. This is especially "painful" in the context of dictation. Although SUITEKeys does not "claim" to be a dictation package, it addresses this drawback through (a) its word auto-completion mechanism and (b) its capability to relinquish control to other speech-enabled applications.

Although speech is not the best modality for all human-computer interaction tasks, when delivered at the level of keyboard and mouse it allows for universal access to computing devices, functionally similar to the access enjoyed through a standard QWERTY keyboard and mouse. The presented solution, in a future incarnation, might complement or even replace existing mobile computing modalities in many application domains. Since it does not require much physical device area for alphanumeric data entry (only a microphone and perhaps a speaker, for feedback), the physical device may shrink as much as advances in microelectronics allow.

ACKNOWLEDGMENTS

This research has been partially supported by the Louisiana Board of Regents grant BoRSF-(1997-00)-RD-A-31. The authors acknowledge the contribution of Rao Adaviknlanu, Huong Do, Corey Gaudin, Garrick Hall, William Jacobs, Jonathan Laughery, Michail Lagoudakis, Eric Li, Adi Sakala, and the Spring 1999 Human-Computer Interaction class at the University of Louisiana at Lafayette.

REFERENCES

1. Manaris, B. and Harkreader, A. SUITEKeys: A speech understanding interface for the motor-control challenged, in Proceedings of ASSETS '98 (Marina del Rey, CA, April 1998), ACM, 108-115.
2. Manaris, B., MacGyvers, V., and Lagoudakis, M. Speech input for mobile computing devices, in Proceedings of the 12th International Florida AI Research Symposium (FLAIRS-99) (Orlando, FL, May 1999), 286-292.
3. Manaris, B. and MacGyvers, V. Speech for alphanumeric data entry: a usability study, submitted to the International Journal of Speech Technology.
4. Goldstein, M., Book, R., Alsio, G., and Tessa, S. Ubiquitous input for wearable computing: QWERTY keyboard without a board, in Proceedings of the First Workshop on Human Computer Interaction with Mobile Devices (Glasgow, Scotland, 1998), GIST Technical Report 98/1. Available at www.dcs.gla.ac.uk/~johnson/papers/mobile/HCIMD1.html.
5. MacKenzie, I.S., Zhang, S.X., and Soukoreff, R.W. Text entry using soft keyboards. Behaviour & Information Technology 18 (1999), 235-244. Available at www.yorku.ca/faculty/academic/mack/BIT3.html.
6. Dragon Systems, Inc., NaturallySpeaking. Available at www.dragonsys.com.
7. Microsoft, MS Speech Engine 4.0. Available at www.microsoft.com/IIT/download/.
8. Manaris, B. and Harkreader, A. SUITE: speech understanding interface tools and environments, in Proceedings of FLAIRS '97 (Daytona Beach, FL, May 1997), 247-252.
9. Manaris, B. and Dominick, W.D. NALIGE: A user interface management system for the development of natural language interfaces, International Journal of Man-Machine Studies 38, 6 (1993), 891-921.
10. Weiser, M. The world is not a desktop. Interactions, January 1994, p. 7.
11. Copestake, A. Applying natural language processing techniques to speech prostheses, in 1996 AAAI Fall Symposium on Developing Assistive Technology for People with Disabilities. Available at www-csli.stanford.edu/~aac/papers.html.
12. Furnas, G.W., Landauer, T.K., Gomez, L.M., and Dumais, S.T. Statistical semantics: analysis of the potential performance of key-word information systems, The Bell System Technical Journal 6 (1983), p. 1796.
13. Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., and Carey, T. Human-Computer Interaction. Addison Wesley, Reading, Massachusetts, 1994, p. 146.
14. USC News Service, Machine demonstrates superhuman speech recognition abilities, news release 0999025, Sep. 30, 1999. Available at http://uscnews.usc.edu/newsreleases/.
