Applying Natural Language Processing Techniques to Speech Prostheses

From: AAAI Technical Report FS-96-05. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Ann Copestake
Center for the Study of Language and Information (CSLI)
Ventura Hall, Stanford University, Stanford, CA 94305-4115
aac@csli.stanford.edu

Abstract

In this paper, we discuss the application of Natural Language Processing (NLP) techniques to improving speech prostheses for people with severe motor disabilities. Many people who are unable to speak because of physical disability utilize text-to-speech generators as prosthetic devices. However, users of speech prostheses very often have a more general loss of motor control and, despite aids such as word prediction, inputting the text is slow and difficult. For typical users, current speech prostheses have output rates which are less than a tenth of the speed of normal speech. We are exploring various techniques which could improve rates without sacrificing flexibility of content. Here we describe the statistical word prediction techniques used in a communicator developed at CSLI and some experiments on improving prediction performance. We discuss the limitations of prediction on free text, and outline work which is in progress on utilizing constrained NL generation to make more natural interactions possible.

Introduction

The Archimedes project at CSLI is concerned with developing computer-assisted communication for people with disabilities, considering both their interaction with computers and with other individuals. We are attempting to provide practical devices for immediate needs, and also to carry out basic research on communication, which will lead to future improvements in these techniques. The work described in this paper concerns the communication needs of people who have unintelligible speech or who lack speech altogether because of motor disabilities. It is possible to build prosthetic devices for such users by linking a suitable physical interface with a speech generator, such as DecTalk, so that text or other symbolic input can be converted to speech. However, while speech rates in normal conversation are around 150-200 words per minute (wpm), and reasonably skilled typists can achieve rates of 60 wpm, conditions which impair physical ability to speak usually cause a more general loss of motor function, and typically speech prosthesis users can only output about 10-15 wpm. This prevents natural conversation, not simply because of the time which is taken but because the delays completely disrupt the usual processes of turn-taking. Thus the other speaker finds it hard to avoid interrupting the prosthesis user.

This problem can be alleviated in two ways: by improving the design of the interface (keyboard, head stick, head pointer, eye tracker, etc.) or by minimizing the input that is required for a given output. We will concentrate on the latter aspect here, although there is some interdependence and we will briefly mention some aspects of this below.

Techniques which have been used for minimizing input include the following:

Abbreviations, icons and alternative languages: Interfaces to speech prostheses commonly allow text to be associated with particular abbreviations, function keys or on-screen icons. This is useful for text which is repeated frequently, but requires memorization and does not allow much flexibility. Some systems use Minspeak (Baker, 1982), which allows short sequences of keystrokes to be used to produce whole utterances. Minspeak is compact and quite flexible, but using it effectively requires considerable learning.

Fixed text dialogues and stored narratives: Alm et al. (1992) describe an approach where fixed text is stored and can be retrieved as appropriate for particular stages of conversation, in particular: greeting, acknowledgement of interest/understanding, parting. Other work by the same group allows the retrieval of preconstructed utterances and stories in appropriate contexts (Newell et al., 1995). A commercial implementation of this work is Talk:About, produced by Don Johnson Incorporated. This approach is important in that it emphasizes the social role of conversation, but it allows the user little flexibility at the time the conversation is taking place.

Word (or phrase) prediction: Many speech prostheses which take text rather than iconic input incorporate some kind of word prediction, where the user is given a choice of a number of words, which is successively refined as keystrokes are entered. We discuss prediction further below.

Compansion: Compansion is an approach which has been used in a prototype system (Demasco and McCoy, 1992) but is not yet available commercially. It allows the user to input only content words, to which morphological information and function words are added by the system to produce a well-formed sentence. A rather similar approach is described by Vaillant and Checler (1995), though they assume icons as input. Compansion is useful for individuals who have difficulty with syntax, but as a technique for improving speech rate for users with no cognitive disability it has limitations, since it still requires the user to input at least 60% of the number of keystrokes which would be necessary to input the text normally. Furthermore, it involves natural language generation and requires considerable hand-coded linguistic knowledge (grammar, lexicon, some information about word meaning).

In the Archimedes project, we are concentrating in particular on the needs of individuals who have degenerative muscular disorders, such as amyotrophic lateral sclerosis (ALS or Lou Gehrig's disease). Individuals with ALS have no cognitive or linguistic impairment and have previously had full language use, so solutions to the communication problem which restrict their range of expression are not acceptable. Such users would prefer to continue using their original language, rather than to learn an alternative symbol system. Thus, of the techniques described above which are currently available, the most suitable is text prediction combined with single-key encoding of very frequently used phrases. Text input using a conventional keyboard may be possible, but is usually slow and painful. ALS is a progressive disease and in its later stages only eye movement may be possible, so it is important that any prosthetic system allows a range of physical input devices, since interfaces which are most suitable in the earlier stages of the disease will become unusable later on.

We found that, in addition to the speed problems, existing commercial text-to-speech systems which incorporated word prediction had a variety of drawbacks. In particular, most are dedicated to speech output and cannot be used to aid writing text or email. There are also problems of limited compatibility with particular software or hardware, and restrictions in the physical interfaces. One of the fundamental engineering principles underlying the Archimedes project is that individuals should have Personal Accessors which take care of their personal needs with respect to physical input and which can be hooked up to any host computer, with a small inexpensive adapter, replacing the conventional keyboard and mouse. A Personal Accessor for one user with ALS has been developed at CSLI, and now forms his main means of communication.

[...] so that all the keys are mapped to the right-hand side, since the user has very restricted right-hand mobility and no left-hand use. An interface to an eye-tracker is currently under development at CSLI to replace the keyboard. The prediction techniques which the accessor currently utilizes are discussed below, followed by an outline of work in progress on a more complex system.

Statistical Prediction Techniques

The basic technique behind word prediction is to give the user a choice of the words (or words and phrases) which are calculated to be the most likely, based on the previous input. The choices are usually displayed as some sort of menu: if the user selects one of the items, that word is output, but if no appropriate choice is present, the user continues entering letters. For example, if 't' has been entered as the first letter of a word, the system might show a menu containing the, to, that, they, too, turn, telephone and thank you. If none of these are correct and the user enters 'a', the options might change to take, table and so on. If table is then selected, it will be output (possibly with a trailing space) and some effort has been saved. This approach is very flexible, since the user can input anything: the worst case is that the string is unknown, in which case the system will not make any useful prediction. Prediction ordering is based on the user's previous input, and so the system can automatically adapt to the individual user. Unknown strings are added to the database, thus allowing them to be predicted subsequently.

Prediction systems have been used for at least 20 years (see Newell et al., 1995, and references therein). The basic techniques are actually useful for any sequence of actions: for example, they can be used for computer commands (e.g., Darragh and Witten, 1992). However, we will concentrate on text input here, since it is possible to improve prediction rates by using knowledge of language. For text input, the simplest technique is to use the initial letters of a word as context, and to predict words on the basis of their frequencies. The prediction database is thus simply a wordlist with associated frequencies. This basic approach was implemented in the first version of the CSLI personal accessor, using a wordlist containing about 3000 words extracted from 26,000 words of collected data as starting data.

We also built a testbed system in order to simulate the effects of various algorithms on the collected data in advance of trying them out with a user. We used a testing methodology where the data was split into training and test sets, with the test set (10% of total data) treated as unseen. We used the following scoring method:

$$\text{score} = \frac{(\text{keystrokes} + \text{menu selections}) \times 100}{\text{keystrokes needed without prediction}}$$

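The testbed's scoring method can be simulated as below. Since the denominator is the number of keystrokes needed without prediction, a score of 100 means no saving and lower scores are better. This is a sketch under stated assumptions, none of which come from the paper: the simulated user checks the menu after every letter and selects the target word as soon as it appears; word separators are not counted; the model adapts to each test word after it is produced; and the test set is taken as the final tenth of the data. The helper names (predict, score, split_last_tenth) are hypothetical.

```python
from collections import Counter


def predict(freq, prefix, menu_size):
    """Most frequent known words starting with prefix (ties broken alphabetically)."""
    candidates = sorted((w for w in freq if w.startswith(prefix)),
                        key=lambda w: (-freq[w], w))
    return candidates[:menu_size]


def split_last_tenth(words):
    """Assumed split: the last 10% of the data is the unseen test set."""
    cut = len(words) - len(words) // 10
    return words[:cut], words[cut:]


def score(training_words, test_words, menu_size=8, adapt=True):
    """(keystrokes + menu selections) * 100 / keystrokes needed without prediction."""
    freq = Counter(training_words)
    keystrokes = selections = baseline = 0
    for word in test_words:
        baseline += len(word)  # unaided: every letter typed by hand
        # The simulated user types letters one at a time and checks the menu after each.
        for typed in range(1, len(word)):
            if word in predict(freq, word[:typed], menu_size):
                keystrokes += typed
                selections += 1
                break
        else:
            keystrokes += len(word)  # no useful prediction: word typed in full
        if adapt:
            freq[word] += 1  # unknown strings become predictable afterwards
    if baseline == 0:
        raise ValueError("test set contains no keystrokes")
    return (keystrokes + selections) * 100 / baseline


train = "the cat sat on the mat".split()
print(round(score(train, ["the", "cat"], menu_size=1), 2))  # 66.67
```

In the usage line, each test word appears in the one-item menu after its first letter, so 2 keystrokes plus 2 selections replace 6 unaided keystrokes. Passing adapt=False models a static wordlist, in which a repeated unknown word never gets cheaper to enter.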