From: AAAI Technical Report FS-96-05. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Applying Natural Language Processing Techniques to Speech Prostheses

Ann Copestake
Center for the Study of Language and Information (CSLI)
Ventura Hall, Stanford, CA 94305-4115
aac@csli.stanford.edu

Abstract

In this paper, we discuss the application of Natural Language Processing (NLP) techniques to improving speech prostheses for people with severe motor disabilities. Many people who are unable to speak because of physical disability utilize text-to-speech generators as prosthetic devices. However, users of speech prostheses very often have more general loss of motor control and, despite aids such as word prediction, inputting the text is slow and difficult. For typical users, current speech prostheses have output rates which are less than a tenth of the speed of normal speech. We are exploring various techniques which could improve rates, without sacrificing flexibility of content. Here we describe the statistical word prediction techniques used in a communicator developed at CSLI and some experiments on improving prediction performance. We discuss the limitations of prediction on free text, and outline work which is in progress on utilizing constrained NL generation to make more natural interactions possible.

Introduction

The Archimedes project at CSLI is concerned with developing computer-assisted communication for people with disabilities, considering both their interaction with computers and with other individuals. We are attempting to provide practical devices for immediate needs, and also to carry out basic research on communication, which will lead to future improvements in these techniques. The work described in this paper concerns the communication needs of people who have unintelligible speech or who lack speech altogether because of motor disabilities. It is possible to build prosthetic devices for such users by linking a suitable physical interface with a speech generator, such as DECtalk, so that text or other symbolic input can be converted to speech. However, while speech rates in normal conversation are around 150-200 words per minute (wpm), and reasonably skilled typists can achieve rates of 60 wpm, conditions which impair physical ability to speak usually cause more general loss of motor function, and typically speech prosthesis users can only output about 10-15 wpm. This prevents natural conversation, not simply because of the time which is taken but because the delays completely disrupt the usual processes of turn-taking. Thus the other speaker finds it hard to avoid interrupting the prosthesis user.

This problem can be alleviated in two ways: by improving the design of the interface (keyboard, head stick, head pointer, eye tracker etc.) or by minimizing the input that is required for a given output. We will concentrate on the latter aspect here, although there is some interdependence and we will briefly mention some aspects of this below.

Techniques which have been used for minimizing input include the following:

Abbreviations, icons and alternative languages: Interfaces to speech prostheses commonly allow text to be associated with particular abbreviations, function keys or on-screen icons. This is useful for text which is repeated frequently, but requires memorization and does not allow much flexibility. Some systems use Minspeak (Baker, 1982), which allows short sequences of keystrokes to be used to produce whole utterances. Minspeak is compact and quite flexible, but using it effectively requires considerable learning.

Fixed text dialogues and stored narratives: Alm et al (1992) describe an approach where fixed text is stored and can be retrieved as appropriate for particular stages of conversation, in particular: greeting, acknowledgement of interest/understanding, parting. Other work by the same group allows the retrieval of preconstructed utterances and stories in appropriate contexts (Newell et al, 1995). A commercial implementation of this work is Talk:About, produced by Don Johnston Incorporated. This approach is important in that it emphasizes the social role of conversation, but it allows the user little flexibility at the time the conversation is taking place.

Word (or phrase) prediction: Many speech prostheses which take text rather than iconic input incorporate some kind of word prediction, where the user is given a choice of a number of words, which
is successively refined as keystrokes are entered. We discuss prediction further below.

Compansion: Compansion is an approach which has been used in a prototype system (Demasco and McCoy, 1992) but is not yet available commercially. It allows the user to input only content words, to which morphological information and function words are added by the system to produce a well-formed sentence. A rather similar approach is described by Vaillant and Checler (1995), though they assume icons as input. Compansion is useful for individuals who have difficulty with syntax, but as a technique for improving speech rate for users with no cognitive disability it has limitations, since it still requires the user to input at least 60% of the number of keystrokes which would be necessary to input the text normally. Furthermore, it involves natural language generation and requires considerable hand-coded linguistic knowledge (grammar, lexicon, some information about word meaning).

In the Archimedes project, we are concentrating in particular on the needs of individuals who have degenerative muscular disorders, such as amyotrophic lateral sclerosis (ALS, or Lou Gehrig's disease). Individuals with ALS have no cognitive or linguistic impairment and have previously had full language use, so solutions to the communication problem which restrict their range of expression are not acceptable. Such users would prefer to continue using their original language, rather than to learn an alternative symbol system. Thus, of the techniques described above which are currently available, the most suitable is text prediction combined with single-key encoding of very frequently used phrases. Text input using a conventional keyboard may be possible, but is usually slow and painful. ALS is a progressive disease and in its later stages only eye movement may be possible, so it is important that any prosthetic system allows a range of physical input devices, since interfaces which are most suitable in the earlier stages of the disease will become unusable later on.

We found that, in addition to the speed problems, existing commercial text-to-speech systems which incorporated word prediction had a variety of drawbacks. In particular, most are dedicated to speech output and cannot be used to aid writing text or email. There are also problems of limitations in compatibility with particular software or hardware, and restrictions in the physical interfaces. One of the fundamental engineering principles underlying the Archimedes project is that individuals should have Personal Accessors which take care of their personal needs with respect to physical input and which can be hooked up to any host computer, with a small inexpensive adapter, replacing the conventional keyboard and mouse. A Personal Accessor for one user with ALS has been developed at CSLI, and now forms his main means of communication. Currently a keyboard is used for input, modified so that all the keys are mapped to the right hand side, since the user has very restricted right hand mobility and no left hand use. An interface to an eye-tracker is currently under development at CSLI to replace the keyboard. The prediction techniques which the accessor currently utilizes are discussed below, followed by an outline of work in progress on a more complex system.

Statistical Prediction Techniques

The basic technique behind word prediction is to give the user a choice of the words (or words and phrases) which are calculated to be the most likely, based on the previous input. The choices are usually displayed as some sort of menu: if the user selects one of the items, that word is output, but if no appropriate choice is present, the user continues entering letters. For example, if 't' has been entered as the first letter of a word, the system might show a menu containing the, to, that, they, too, turn, telephone and thank you. If none of these are correct and the user enters 'a', the options might change to take, table and so on. If table is then selected, it will be output (possibly with a trailing space) and some effort has been saved. This approach is very flexible, since the user can input anything: the worst case is that the string is unknown, in which case the system will not make any useful prediction. Prediction ordering is based on the user's previous input, and so the system can automatically adapt to the individual user. Unknown strings are added to the database, thus allowing them to be predicted subsequently.

Prediction systems have been used for at least 20 years (see Newell et al, 1995, and references therein). The basic techniques are actually useful for any sequence of actions: for example, they can be used for computer commands (e.g. Darragh and Witten, 1992). However, we will concentrate on text input here, since it is possible to improve prediction rates by using knowledge of language. For text input, the simplest technique is to use the initial letters of a word as context, and to predict words on the basis of their frequencies. The prediction database is thus simply a wordlist with associated frequencies. This basic approach was implemented in the first version of the CSLI personal accessor, using a wordlist containing about 3000 words extracted from 26,000 words of collected data as starting data.

We also built a testbed system in order to simulate the effects of various algorithms on the collected data in advance of trying them out with a user. We used a testing methodology where the data was split into training and test sets, with the test set (10% of total data) treated as unseen. We used the following scoring method:

  score = (keystrokes + menu selections) * 100 / (keystrokes needed without prediction)

(We use 'keystroke' to refer to any atomic user input, irrespective of whether the physical interface is actually a keyboard or not.)

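To make the basic scheme concrete, the following is a minimal sketch of a frequency-ranked prefix predictor together with the per-word form of the scoring method above. The class and method names are our own illustration, not the accessor's actual implementation.

```python
from collections import Counter

class PrefixPredictor:
    """Minimal frequency-ranked prefix prediction, in the style of the
    basic wordlist-plus-frequencies scheme described above."""

    def __init__(self, menu_size=8):
        self.freq = Counter()      # the prediction database: word -> count
        self.menu_size = menu_size

    def train(self, words):
        self.freq.update(w.lower() for w in words)

    def menu(self, prefix):
        """Up to menu_size known words matching the prefix, most frequent first."""
        candidates = [w for w in self.freq if w.startswith(prefix)]
        candidates.sort(key=lambda w: -self.freq[w])
        return candidates[:self.menu_size]

    def score_word(self, word):
        """Idealized cost ratio for one word: (keystrokes + menu selections)
        divided by the keystrokes needed without prediction (word + space)."""
        for i in range(len(word) + 1):
            if word in self.menu(word[:i]):
                # i letters typed plus one menu selection; the trailing
                # space is assumed to be output automatically
                return (i + 1) / (len(word) + 1)
        # never predicted: typed in full, no saving; unknown strings are
        # added to the database so they can be predicted subsequently
        self.freq[word] += 1
        return 1.0
```

Averaging `score_word` over a held-out test set corresponds to the scoring method above; for example, selecting table after 't', 'a' costs (2 + 1)/6 = 50%.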
For example, choosing table after inputting 't', 'a' would give a score of (2 + 1)/6 = 50%, assuming that a space was automatically output after the word. This scoring method is an idealization, since it ignores the difference between positions on the menu, and assumes that the user always chooses a menu item when one is available. These assumptions are reasonable for the current system, since the choice between menu items is made using a single key, and, for a user with severe motor impairment, the cognitive effort of looking at the menu choices is small compared to the physical effort of moving the hand to a different key. With a menu size of 8, the mean score for the basic method was 57.3%.

We describe some improvements to the basic approach in the remainder of this section: although these do not represent any major breakthroughs in prediction technology, we hope that the discussion will make the nature of the problem clearer. The techniques described were all ones that could be implemented quickly, utilizing public domain or readily available resources.

Recency

This refers to increasing the weights of recently seen words so that these are temporarily preferred. We tried a range of different factors and strategies for allowing the added weights to decay. For example, it seemed plausible that the weights of very frequent closed class words, such as the and in, shouldn't be modified, while open class nouns and verbs should be. However, we got the best results from the simplest strategy of adding a constant amount to the weight of a word, which was calculated to be sufficient to raise any word to the menu shown after an initial letter has been input. (Words which are already on this menu may be raised to the top level menu.) E.g. if table is usually predicted after seeing 't' 'a', the recency factor was calculated to be sufficient to promote it to the menu seen after 't' alone is input. The weights are removed when there is a gap of more than 10 minutes between inputs, which is a heuristic that suggests a particular conversation has ended. This simple technique results in an improvement in score from 57.3% to 56.0%.

Unknown words

In a representative test corpus of 2600 words we found that there were 193 word types which were unseen in the 23,400 word training corpus. Of these, about a third were typographical errors or proper names. A larger wordlist could thus potentially improve performance. We tried adding an extra 18,000 words extracted from a newspaper corpus to the lexicon, giving them a frequency of 0, so they would only be shown on the menus when there was no possible previously seen word. This covered 92 of the missing words. Adding it improved performance, but only by about 0.9%, and at the cost of greatly increasing the memory taken by the system and decreasing its speed. A more targeted approach is to add inflectional variants of words which have been seen. E.g. if communicate is seen, the system can also add communicates, communicated and communicating. This technique results in 41 of the unseen words being added, and has the advantage that fewer inappropriate words are shown to the user. (The freely-available morphology system described in Karp et al (1992) was used for this work.) This technique is most useful when used in conjunction with the syntactic filtering technique described below, since this also requires morphological analysis in order to determine syntactic category.

ngrams

Given the great improvements that have been made in speech recognition by using Hidden Markov Models (HMMs), it is natural to expect that these techniques would be beneficial for word prediction. However, existing text corpora do not make good models for the speech of our user, and the amount of training data which we can collect is insufficient to extract reliable word-based trigrams. 26,000 words of data represents around three months of input, and much more data would be necessary to collect useful trigrams. One solution to this problem is to back off to broader categories, so we investigated the use of part of speech (POS) bigrams extracted from existing tagged corpora available from the Linguistic Data Consortium (Penn Treebank). The idea is that we can use a corpus in which each word token has been tagged with its appropriate POS in that context and derive transition probabilities for POS-to-POS transitions, instead of word-to-word transitions. Then, if the predictor modifies the weights on words according to their possible parts-of-speech and the possible parts-of-speech of the previous word, we get some syntactic filtering.

For example, if word frequencies alone are used, the top eight possibilities predicted by one of our training sets when no initial letter has been input were I, is, the, you, it, to, a and ok. These are all almost certainly inappropriate following we at the start of the sentence. This is partially reflected in POS transition probabilities: e.g. the probability of PP (personal pronoun) being followed by another PP is very low. Using transition probabilities extracted from a subset of the tagged Treebank corpora to modify the frequencies resulted in the following list being predicted following we: is, did, are, was, ok, call, should and can. Here five of the eight menu items are reasonably probable continuations, thus we have achieved some syntactic filtering. The inappropriate words is and was are due to we being tagged simply as PP, since the Treebank tagset we used does not distinguish between singular and plural pronouns. OK is predicted because it was unseen in the corpus data, and so did not have an allocated tag: words without a tag were treated as though they could belong to any class for this initial experiment.

However, overall we got no improvement in performance using POS transition probabilities extracted from the Treebank corpora, apparently because they were a poor model for our data. The problem seems to be that our user makes much more frequent use of questions, imperatives and interjections than the sentences found in the corpora we used to extract the POS bigrams. We therefore decided to derive transition probabilities directly from the data collected from our user. As an initial step we tried to tag our data using the tagger developed by Elworthy (1994) and a lexicon derived from the Treebank corpora. This gave some unexpected results: the Treebank corpus turned out to be a good model for our data with respect to the relative frequencies of POS associated with particular words (e.g. if a word such as table is likely to be a noun rather than a verb in the Treebank corpus, then on the whole the same is true in our user's data). Because of this, we got about 92% tagging accuracy simply by choosing the most frequent tag for each word. Running the tagger did not improve these results, which is unsurprising since taggers generally perform rather poorly when the initial data is close to being correct, when the dataset is small, and when the sentences are short, all of which were true here. Using the POS transition probabilities derived from a text which was tagged by simply taking the most globally probable tags for the words in the training data gave an overall improvement in prediction rate of about 2.7%. We expect to be able to improve this somewhat, using a refined set of POS tags and considering POS trigrams instead of bigrams.

Our tentative conclusions from this work are that it is possible to construct POS bigrams and trigrams from data collected from a personal accessor, without using a tagger, by simply assuming each word in the input has its most likely POS. This will introduce some errors, but unless these are systematic, they should not affect the overall accuracy of the model too badly. Since no tagger is involved, the transitions can be straightforwardly learned by a running accessor. An external lexicon which gives the possible parts of speech for a given word and identifies the most frequent is needed, but this can be derived from existing tagged corpora. Manual augmentation of the lexicon is useful for cases where the user's vocabulary contains words which are not found in the external corpus, but this is not essential.

This approach to syntactic filtering has several advantages over using a conventional grammar. Grammars are relatively difficult to develop and maintain, and parsing is expensive computationally. If the grammar is narrow in coverage, it will not provide any predictions for sentences which are not covered. On the other hand, a broad coverage grammar would need to be augmented with probabilities in order to distinguish between likely and unlikely strings. For example, the can follow we at the beginning of a sentence, but only in rather unlikely contexts such as we, the undersigned. Of course the grammaticality of a string can depend on more than one or two words of previous context, but without some reliable methods for lexical and structural disambiguation, a conventional grammar will posit too much ambiguity to make very tight predictions possible. Many of our user's utterances are short phrases rather than complete sentences, so any grammar would have to be capable of dealing with fragments, but these are allowed for naturally with the POS ngram model.

Some conclusions on prediction

All the experiments reported above kept the size of the menu at eight items. Increasing the menu size improves performance; however, there are obviously limitations to the menu size that a user can cope with. Eight items is at the upper end of the range usually suggested in HCI studies, but the system currently in use presents a choice of 10 items, since we found that the limiting factor was not the cognitive load of scanning that number of menu choices, but the size of the screen. A smaller menu size might be better for a user with more mobility. We hope that when using the eye-tracker it will be possible to make use of multiple menus, arranged according to content, since a wider range of options can be made available with a single movement. For example, it would seem natural to have a separate menu of proper names.

Similarly, results would improve if it was possible to predict phrases rather than words. But mixing words and phrases in a single menu is confusing (except for cases such as compound nouns, like desk lamp, which are in many respects wordlike in their behavior). In any case, the current system has a number of hardwired phrases tied to particular keys or accessed through a hierarchical menu, which take care of the most frequent phrases.

One disadvantage of concentrating too much on modeling the collected data is that it is far from the output that our user would like to produce. We have been collecting data for about nine months, and over that time there have been significant changes in the output: the utterances now tend to be shorter, though more frequent, and there is an increasing tendency to produce ungrammatical output, by dropping the for example. Furthermore, although we have concentrated on speech output here, email and other written forms have rather different characteristics, so for optimum performance different modes are necessary. Obviously we would also like to check performance with more than one user, although, since we have not done any hand-modeling in the work reported here, the system could be made to adapt to a new user incrementally and automatically.

Our current user finds word prediction is a great benefit and he will not use any system that does not incorporate it.

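The POS-based syntactic filtering experiment above can be sketched as follows: candidate frequencies are rescaled by the best POS-to-POS transition probability from the previous word, and untagged words are left unfiltered. The tag inventory, probabilities and frequencies below are invented for illustration; they are not the Treebank-derived figures used in the experiments.

```python
def pos_filtered_order(words, lexicon, bigram, prev_word, freq):
    """Order candidate words by frequency rescaled with the POS-to-POS
    transition probability from the previous word (syntactic filtering).
    Words without a known tag are treated as though they could belong
    to any class, i.e. left unfiltered."""
    prev_tags = lexicon.get(prev_word)
    weights = {}
    for w in words:
        tags = lexicon.get(w)
        if not prev_tags or not tags:
            weights[w] = freq[w]   # no tag information: no filtering
        else:
            # back off from word-to-word to POS-to-POS transitions
            p = max(bigram.get((t0, t1), 0.0)
                    for t0 in prev_tags for t1 in tags)
            weights[w] = freq[w] * p
    return sorted(words, key=lambda w: -weights[w])

# Toy data (tags and numbers invented for the example):
lexicon = {"we": {"PP"}, "is": {"VBZ"}, "you": {"PP"},
           "can": {"MD"}, "the": {"DT"}}
bigram = {("PP", "VBZ"): 0.3, ("PP", "MD"): 0.2,
          ("PP", "PP"): 0.01, ("PP", "DT"): 0.02}
freq = {"you": 50, "the": 60, "is": 20, "can": 10}
print(pos_filtered_order(["you", "the", "is", "can"], lexicon, bigram, "we", freq))
# -> ['is', 'can', 'the', 'you']: verbs and modals are promoted after "we"
```

As in the experiments above, frequent pronouns and determiners are demoted after a personal pronoun, while plausible verb continuations rise to the top of the menu.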
Work on improving the prediction techniques is still in progress, and we expect to get some further performance gains. Savings in keystrokes of up to 60% have been reported in the literature, but these results appear to be from subjects who were using more restricted vocabularies. However, we feel that the experiments described here suggest that we are seeing diminishing returns from considerable increases in complexity to the system. We believe that word prediction alone cannot achieve the improvements in output rate which are needed for more natural speech. Because of this, we are in the process of developing a new technique which combines prediction with some elements of the other approaches listed in the introduction. This 'cogeneration' approach is described in the next section.

Cogeneration

Cogeneration is a novel approach to natural language generation which is under development at CSLI. Traditionally, generation has been seen as the inverse of parsing, and the input is some sort of meaning representation, such as predicate calculus expressions. This is inappropriate for assistive communication, since formal meaning representation languages are hard to learn and anyway tend to be more verbose than their natural language counterparts. Instead, in cogeneration, input is partially specified as a series of text units by the user, and the job of the generator is to combine these units into grammatical, coherent sentences which are idiomatic, appropriately polite and so on. To accomplish this, the generator has to be able to order the text units, add inflections and insert extra words (both function and content words). This concept of lexicalist generation is closely related to Shake'n Bake generation (Whitelock, 1992; Poznanski et al, 1995). The knowledge sources which are needed for cogeneration are:

- a grammar and lexicon expressed in a constraint-based framework, such as Head-driven Phrase Structure Grammar (HPSG: Pollard and Sag, 1987, 1994)
- statistical information about collocations and preferred syntactic structures
- application- and context-dependent templates which are used to guide both the user input and the process of generation, and to provide fixed text for conventional situations

Thus cogeneration involves a combination of grammatical, statistical, and template-based constraints. For application to speech prostheses the templates will be designed for particular dialogue situations. These will be organized in a hierarchy which will contain general classes, such as 'request', 'question', 'statement', with more specific templates such as 'refer to earlier discussion' inheriting from the general classes. The templates provide constraints on generation, which can be expressed in the same formalism as the grammar and lexicon. The use of partial text supplied by templates combined with free user input extends the work of Alm et al (1992), which concentrated on almost completely fixed text for sequences such as greetings, to the much wider range of situations where partially predefined strings are appropriate.

The choice of template is made by the user, and the interface provides slots which the user instantiates with text units. In many cases, slots will be optional or have default fillers, constructed according to context and previous inputs. Instantiation of the slots is aided by word and phrase prediction, conditioned on slot choice. Prediction should be much more effective than with free text, since the slots will provide fine-grained syntactic and semantic constraints. The cogenerator operates by combining the constraints specified by the template(s) with the general constraints of the HPSG grammar to produce an output sentence, guided by statistical information.

To give a concrete example, the 'refer to earlier discussion' template might have slots for 'topic of discussion', 'time of discussion' (optional) and 'participants in discussion' (defaulting to 'us'). The user might input buy desk lamp to the topic slot, and breakfast in the time slot. The template would have a number of partially fixed text units associated with it, which might include:

  You know ... talking about ...

The system could then generate:

  You know we were talking at breakfast about buying a desk lamp

The intention is that the system provides the inflectional information, and uses syntactic information to arrange the text so that the sentence sounds reasonably natural. Cogeneration thus also builds on Demasco and McCoy's (1992) work on compansion. The placement of the prepositional phrase at breakfast after talking, rather than at the end of the sentence, is motivated because this avoids ambiguity. The expansion of breakfast into at breakfast involves the choice of a particular preposition, and the choice of the indefinite article (a desk lamp rather than the desk lamp). This latter choice would be based on the (default) assumption that the particular objects bought will not have been previously mentioned, possibly reinforced by information from previous input. In general, the system will maintain a complete record of utterances, so that text can be appropriately indexed and retrieved.

The need for linguistic processing as an essential component of this approach is shown by the ungrammaticality and near-unintelligibility of the output which would have resulted from treating the user input and the template text as fixed strings:

  You know us talking about buy desk lamp breakfast

In general, fixed string substitution has severe limitations in anything other than the most restricted circumstances.

In principle there need be no restriction on user input to the system. The system would perform optimally if there were a full lexical entry for every word, but would degrade gracefully if little or no information were known about some of the words in the user input. However, full lexical entries will be needed for words in the template strings. We envisage a range of about 20 templates being sufficient to cover a wide range of conversational situations, so the cognitive load involved in learning and selecting templates will not be great. More specific templates could be customized for an individual speaker, and since the information associated with specific templates will be expressed as a text string, this could be done by the user. Similarly, the user could provide customized fixed text options for the built-in templates. The system will be inherently robust, in that if there were no appropriate specific template, the very general templates could be used as a fall-back: the user would have to type more, but the word prediction would still operate. Furthermore, the system will be able to adapt automatically to an individual user over time, both with respect to word prediction and template preferences.

Besides the hierarchical arrangement of templates, there will also be a linear ordering, which allows the user's choice of template to be predicted by the system under some circumstances. This is most apparent in highly conventional situations such as greetings. Naturally, since the system only has access to one side of the conversation, the prediction cannot be perfect. However, we would expect the user to be able to take advantage of sequences and use them to channel the conversation so that communication is most easy for them, possibly with the aid of templates which indicated 'return to previous topic' etc. Furthermore, we would expect there to be environments where it is possible to provide scripts for interactions (e.g. Schank and Abelson, 1977), such as restaurants and supermarkets. Constructing detailed scripts for real world situations is not a main focus of our research, since we would expect there to be wide variability in what a particular individual would find useful. It should be possible to make the template construction tools sufficiently easy to use to allow templates and scripts to be added by the user for his/her particular needs.

Conclusion

Prediction techniques are robust and flexible, but by themselves cannot offer the improvement in text input speed necessary to allow natural conversation using a text-to-speech system. We therefore propose to use the cogeneration technique for this application. Natural language generation as usually conceived involves many complex AI issues: real world knowledge representation and reasoning, planning, user-modeling, reasoning about user goals and intentions and so on. For this reason, it is currently feasible only for systems which operate with very restricted domains and which have rather limited and well-defined communication goals. The cogeneration approach avoids any explicit encoding of real-world knowledge by being user-driven and by the combination of templates, grammar (including lexical semantics) and statistical information. We therefore expect to build a usable prototype which is capable of operating in any domain, unlimited by subject matter. The other most important feature of cogeneration compared to conventional generation is that it does not require production of a complete meaning representation, but is essentially word-based, making it possible to freely mix (semi-)fixed phrases with user input text. The user does not have to learn a new language (logical or iconic) to drive the generator but is always in control of the content of the output: the generation process is user-driven via template and word choice, and the user can always choose to reject or modify the output text. This approach to generation is thus particularly suitable for an assistive device, especially for users who have lost their previous ability to speak.

Despite the avoidance of coding real-world knowledge, cogeneration is relatively knowledge-intensive compared with simple word prediction. We hope that this will pay off in terms of improvement in conversation naturalness, although this work is only in its very early stages, and we are a long way from being able to demonstrate this. However, one potential side-effect of utilizing more linguistic knowledge is that this can also be harnessed to make generated speech more natural with respect to intonation and the expression of emotion, for example. Another potential benefit is that cogeneration could be made to accept non-textual input and produce speech output. For example, the use of an eye-tracker could make it possible to input a representation of movement more directly, by scanning from a start point to an end point on a scene representing the user's room, for example. If the start-point and end-point are labeled (e.g. as representing the user and the telephone) this input can be treated as equivalent to text, and a natural language utterance such as Bring me the telephone could be generated. Work is also in progress at CSLI on utilizing language constructs borrowed from ASL to enhance such use of non-textual input.

Acknowledgements

I am grateful to David Elworthy for advice on tagging, to members of the Archimedes group, and especially to Greg Edwards, who implemented the Personal Accessor described here.

References

Alm, Norman, John L. Arnott and Alan F. Newell (1992) 'Prediction and conversational momentum in an augmentative communication system', Communications of the ACM, 35(5), 47-57.

Baker, B. R. (1982) 'Minspeak', Byte, 7(9), 186-202.

Darragh, J. J. and I. H. Witten (1992) The Reactive Keyboard, Cambridge University Press.

Demasco, Patrick W. and Kathleen F. McCoy (1992) 'Generating text from compressed input: an intelligent interface for people with severe motor impediments', Communications of the ACM, 35(5), 68-78.

Elworthy, David (1994) 'Part of speech tagging and phrasal tagging', ACQUILEX2 Working Paper no 10, Computer Laboratory, http://www.cl.cam.ac.uk/Research/NL/acquilex/acq2wps.html.

Karp, Daniel, Yves Schabes, Martin Zaidel and Dania Egedi (1992) 'A freely available wide coverage morphological analyzer for English', Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), Nantes, France.

Newell, Alan F., John L. Arnott, Alistair Y. Cairns, Ian W. Ricketts and Peter Gregor (1995) 'Intelligent systems for speech and language impaired people: a portfolio of research', in Edwards, Alistair D. N. (ed.), Extra-Ordinary Human-Computer Interaction, Cambridge University Press, pp. 83-101.

Pollard, Carl and Ivan Sag (1987) An Information-Based Approach to Syntax and Semantics: Volume 1 Fundamentals, CSLI Lecture Notes 13, Stanford, CA.

Pollard, Carl and Ivan Sag (1994) Head-Driven Phrase Structure Grammar, Chicago University Press, Chicago, and CSLI Publications, Stanford.

Poznanski, Victor, John L. Beaven and Pete Whitelock (1995) 'An efficient generation algorithm for lexicalist MT', Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95), Cambridge, Mass.

Schank, Roger C. and Robert P. Abelson (1977) Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge, Lawrence Erlbaum Associates, Hillsdale, N.J.

Vaillant, Pascal and Michael Checler (1995) 'Intelligent voice prosthesis: converting icons into natural language sentences', Computation and Language E-Print Archive, http://xxx.lanl.gov/abs/cmp-lg/9506018.

Whitelock, Pete (1992) 'Shake-and-bake translation', Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), Nantes, France.
