
Rapid Portability among Domains in an Interactive Spoken System

Mark Seligman and Mike Dillinger
Spoken Translation, Inc.
Berkeley, CA, USA 94705
mark.seligman@spokentranslation.com, mike.dillinger@spokentranslation.com

Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications, pages 40–47, Manchester, August 2008.

© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.

Abstract

Spoken Language Translation systems have usually been produced for such specific domains as health care or military use. Ideally, such systems would be easily portable to other domains in which translation is mission critical, such as emergency response or law enforcement. However, porting has in practice proven difficult. This paper will comment on the sources of this difficulty and briefly present an approach to rapid inter-domain portability. Three aspects will be discussed: (1) large general-purpose lexicons for automatic speech recognition and machine translation, made reliable and usable through interactive facilities for monitoring and correcting errors; (2) easily modifiable facilities for instant translation of frequent phrases; and (3) quickly modifiable custom glossaries. As support for our approach, we apply our current SLT system, now optimized for the health care domain, to sample utterances from the military, emergency service, and law enforcement domains, with discussion of numerous specific sentences.

1 Introduction

Recent years have seen increasing research and commercial activity in the area of Spoken Language Translation (SLT) for mission-critical applications. In the health care area, for instance, such products as Converser (Dillinger & Seligman, 2006), S-MINDS (www.fluentialinc.com), and Med-SLT (Bouillon et al., 2005) are coming into use. For military applications, products like Phraselator (www.phraselator.com) and S-MINDS (www.fluentialinc.com) have been deployed. However, the demand for real-time translation is by no means restricted to these areas: it is clear in numerous other areas not yet extensively addressed – emergency services, law enforcement, and others.

Ideally, a system produced for one such domain (e.g., health care) could be easily ported to other domains. However, porting has in practice proven difficult. This paper will comment on the sources of this difficulty and briefly present an approach to rapid inter-domain portability that we believe is promising. Three aspects of our approach will be discussed: (1) large general-purpose lexicons for automatic speech recognition (ASR) and machine translation (MT), made reliable and usable through interactive facilities for monitoring and correcting errors; (2) easily modifiable facilities for instant translation of frequent phrases; and (3) quickly modifiable custom glossaries.

As preliminary support for our approach, we apply our current SLT system, now optimized for the health care domain, to sample utterances from the military, emergency service, and law enforcement domains.

With respect to the principal source of the porting problems affecting most SLT systems to date: most systems have relied upon statistical approaches for both ASR and MT (Karat and Nahamoo, 2007; Koehn, 2008); so each new domain has required extensive and high-quality in-domain corpora for best results, and the difficulty of obtaining them has limited these systems’ portability. The need for in-domain corpora can be eliminated through the use of a quite general corpus (or collection of corpora) for statistical training; but because large corpora give rise to quickly increasing perplexity and error rates, most SLT systems have been designed for specialized domains.

By contrast, breadth of coverage has been a central design goal of our SLT systems. Before any optimization for a specific domain, we “give our systems a liberal arts education” by incorporating very broad-coverage ASR and MT technology. (We presently employ rule-based rather than statistical MT components, but this choice is not essential.) For example, our MT lexicons for English<>Spanish translation in the health care area contain roughly 350,000 words in each direction, of which only a small percentage are specifically health care terms. Our translation grammars (presently licensed from a commercial source, and further developed with our collaboration) are similarly designed to cover the structures of wide-ranging general texts and spoken discourse.

To deal with the errors that inevitably follow as coverage grows, we provide a set of facilities that enable users from both sides of the language barrier to interactively monitor and correct such errors. We have described these interactive techniques in (Dillinger and Seligman, 2004; Zong and Seligman, 2005; Dillinger and Seligman, 2006; and Seligman and Dillinger, 2006). With users thus integrated into the loop, automatically translated spoken conversations can range widely with acceptable accuracy (Seligman, 2000). Users can move among domains with relative freedom, even in advance of lexical or other domain specialization, because most domains are already covered to some degree. After a quick summary of our approach (in Section 2), we will demonstrate this flexibility (in Section 3).

While our system’s facilities for monitoring and correction of ASR and MT are vital for accuracy and confidence in wide-ranging conversations, they can be time consuming. Further, interactivity demands a minimum degree of computer and print literacy, which some patients may lack. To address these issues, we have developed a facility called Translation Shortcuts™, through which prepared translations of frequent or especially useful phrases in the current domain can be instantly executed by searching or browsing. The facility is described in (Seligman and Dillinger, 2006). After a quick description of the Translation Shortcuts facility (Section 4), this paper will emphasize the contribution of the Translation Shortcuts facility to domain portability, showing how a domain-specific set of Shortcuts can be composed and integrated into the system very quickly (Section 5).

Finally, while the extensive lexical resources already built into the system provide the most significant boost to domain portability in our system, it will always be desirable to add specialized lexical items or specialized meanings of existing ones. Section 6 will briefly present our system’s glossary import facility, through which lexical items can be added or updated very quickly. Our concluding remarks appear in Section 7.

2 Highly Interactive, Broad-coverage SLT

We now briefly summarize our group’s approach to highly interactive, broad-coverage SLT. Our systems stress interactive monitoring and correction of both ASR and MT.

First, users can monitor and correct the speaker-dependent speech recognition system to ensure that the text which will be passed to the machine translation component is as correct as necessary. Voice commands (e.g., Scratch That or Correct) can be used to repair speech recognition errors. Thus, users of our SLT systems in effect serve to enhance the interface between ASR and MT.

Next, during the MT stage, users can monitor, and if necessary correct, translation errors.

As an initial safeguard against translation errors, we supply a back-translation, or re-translation of the translation. Using this paraphrase of the initial input, even a monolingual user can make an initial judgment concerning the quality of the preliminary machine translation output. If errors are seen, the user can modify specific parts of the input and retranslate. (Other systems, e.g. IBM’s MASTOR (Gao et al., 2006), have also employed re-translation. Our implementations, however, exploit proprietary technologies to ensure that the lexical senses used during back-translation accurately reflect those used in forward translation. We also allow users to modify part or all of the input before regenerating the translation and back-translation.)

In addition, if uncertainty remains about the correctness of a given word sense, we supply a proprietary set of Meaning Cues™ – synonyms, definitions, examples, pictures, etc. – which have been drawn from various resources, collated in a database (called SELECT™), and aligned with the respective lexica of the relevant MT systems. (In the present English<>Spanish version of the system, this database contains some 140,000 entries, corresponding to more than 350,000 lexical entries. The cues are automatically grouped by meaning, and cue groups are automatically mapped to MT lexica using proprietary techniques – thus in effect retrofitting an MT system with the ability to explain to users the meanings of its pre-existing lexical items.) With these cues as guides, the user can monitor the current, proposed meaning and if necessary select a different, preferred meaning from among those available. Automatic updates of translation and back-translation then follow. (Our current MT vendor has modified its rule-based translation engine to allow specification of a desired sense when translating a word or expression; we provide guidelines for other vendors to do likewise. Comparable modifications for statistical MT engines will entail the setting of temporary weightings that will bias the selection of word or phrase translations for the current sentence only.) Future versions of the system will allow personal word-sense preferences thus specified in the current session to be optionally stored for reuse in future sessions, thus enabling a gradual tuning of word-sense preferences to individual needs. (However, such persistent personal preferences will still be applied sentence by sentence, rather than by permanently modifying lexica or phrase tables. Further, users will always be able to temporarily override, or permanently reset, their personal preferences.) Facilities will also be provided for sharing such preferences across a working group.

Given such interactive correction of both ASR and MT, wide-ranging, and even playful, exchanges become possible (Seligman, 2000). Such interactivity within a speech translation system enables increased accuracy and confidence, even for wide-ranging conversations.
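The layering of word-sense choices described in this section (an engine default, an optionally stored personal preference, and an override for the current sentence only) can be illustrated with a minimal sketch. The class name, method names, and sense labels below are hypothetical and are not taken from the system; the sketch only shows preferences being consulted sentence by sentence without ever modifying the lexicon itself.

```python
from dataclasses import dataclass, field


@dataclass
class SenseResolver:
    """Toy model of per-sentence word-sense selection (hypothetical names)."""

    # Default sense per word, standing in for the MT engine's lexicon.
    lexicon_defaults: dict
    # Senses the user chose in earlier sessions; kept apart from the lexicon.
    personal_prefs: dict = field(default_factory=dict)

    def resolve(self, word, sentence_overrides=None):
        """Return the sense to use for `word` in the current sentence.

        Precedence: override for this sentence, then the stored personal
        preference, then the lexicon default.
        """
        overrides = sentence_overrides or {}
        if word in overrides:
            return overrides[word]
        if word in self.personal_prefs:
            return self.personal_prefs[word]
        return self.lexicon_defaults[word]

    def reset_preferences(self):
        """Permanently discard the stored personal preferences."""
        self.personal_prefs.clear()


resolver = SenseResolver(lexicon_defaults={"arms": "limbs"})
resolver.personal_prefs["arms"] = "weapons"          # stored preference
sense_now = resolver.resolve("arms")                  # uses the preference
sense_once = resolver.resolve("arms", {"arms": "limbs"})  # one-sentence override
```

Because the lexicon dictionary is never written to, resetting the preferences restores the original default behaviour, which mirrors the statement that preferences do not permanently modify lexica or phrase tables.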

3 Advantages of Very Broad Coverage for Domain Switching

This section discusses the advantages of very broad lexical coverage for rapid domain porting. Using our interactive SLT system in its present configuration, optimized for the health care domain but with a general-purpose foundation of over 60,000 lexical items for ASR and 350,000 lexical items for rule-based MT, we will test several input sentences from each of three distinct domains in which translation is mission-critical – military, emergency response, and law enforcement. The test sentences were invented by the authors; readers can judge their plausibility. They were pronounced by Seligman using the built-in microphone of a Motion Computing LE1600 tablet PC equipped with a push-to-talk button.

For each input, we will show (1) the English input, (2) the original Spanish translation, and (3) the English back-translation. We also comment on several factors:

• If ASR errors occurred, we describe their interactive resolution. (All inputs were corrected before proceeding with translation. All corrections were made by voice.)
• If our Meaning Cues facility indicated questionable meanings for any of the expressions in the input, we note the problems and describe the resolutions.
• Some problems in translation result from bugs or gaps in the translation component. These are marked for repair. (Because our MT components are presently rule-based, we can address such problems individually and manually. If a statistical MT component were used instead, the recorded errors could guide the selection of texts for further training.)

As mentioned, in our system, the back-translation is designed to function as the first line of defense against inadequate translation. If an unsatisfactory back-translation is obtained, we advise users to re-phrase the input and translate again until satisfied. (False negatives sometimes occur, though we work to eliminate them; however, it is best to err on the side of caution.) If the back-translation is satisfactory, we advise checking the Meaning Cues as a defense against false positives. These may result if an ambiguous English input word is translated into Spanish in the wrong sense (for instance, bank may be translated as banco ("financial institution") when orilla del río ("riverbank") is wanted), but is nevertheless re-translated as the same ambiguous English word (bank). We are experimenting with mechanisms to eliminate such cases by substituting non-ambiguous synonyms in the back-translation for ambiguous input words. In the current tests, if back-translations are judged insufficient to convey the intended meaning, paraphrases are substituted and any lexical translation errors are corrected until acceptable results are achieved. All such paraphrases are displayed below, whether they involve simple word substitutions or more extensive changes.
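The false-positive case described above can be made concrete with a toy sketch. The two-sense lexicon and the function names are invented for illustration and say nothing about the system's proprietary sense handling. The sketch only shows why an identical back-translation does not by itself confirm the sense, so that a Meaning Cue check is still needed.

```python
# Toy English->Spanish lexicon keyed by sense label (illustrative data only).
EN_ES = {
    "bank": {
        "financial institution": "banco",
        "riverbank": "orilla del río",
    }
}
# Toy Spanish->English lexicon: both Spanish terms come back as "bank".
ES_EN = {"banco": "bank", "orilla del río": "bank"}


def round_trip(word, sense):
    """Translate `word` in the given sense, then back-translate the result."""
    spanish = EN_ES[word][sense]
    return spanish, ES_EN[spanish]


def check(word, intended_sense, chosen_sense):
    """Classify one word's translation the way a monitoring user might."""
    _, back = round_trip(word, chosen_sense)
    if back != word:
        return "rephrase"        # back-translation itself looks wrong
    if chosen_sense != intended_sense:
        return "false positive"  # back-translation looks right, sense is wrong
    return "ok"


# The engine picked the financial sense, but the speaker meant the riverbank.
verdict = check("bank", "riverbank", "financial institution")
```

With this toy data the "rephrase" branch is never reached, since both Spanish terms map back to bank; it is included only to show where an unsatisfactory back-translation would be caught first.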

Military Domain

| input | initial translation | back-translation | comments |
|---|---|---|---|
| Watch out for mines around here | tenga cuidado con minas por aquí | Be careful with mines around here | ASR: mines misrecognized as minds. MT: mines mistranslated as minas (“reserve, mine”); meaning interactively changed to minas (“booby trap, ground-emplaced mine, land mine”) |
| [] | tenga cuidado con minas por aquí | Be careful with mines around here | Spanish is unchanged, but correct meaning has been verified. |
| We confiscated their arms last week | confiscamos sus brazos la semana pasada | We confiscated their arms last week | MT: arms mistranslated as brazos (“brachia”); retranslated as armas (“arm, gun, weapon”) |
| [retranslation] | confiscamos sus armas la semana pasada | We confiscated their weapons last week | |
| The operation is scheduled for oh 600 | la operación es programada para oh 600 | The operation is programmed for oh 600. | ASR: The misrecognized as knee. MT: Translation of oh 600 is uncertain |
| The operation is scheduled for 6 a.m. | la operación es programada para 6 a.m. | The operation is programmed for 6 a.m. | MT: Translation of 6 a.m. is still unclear. |
| The operation is scheduled for six o’clock in the morning | la operación es programada para las seis de la mañana | The operation is programmed for six in the morning | MT: Translation is now verified, given slight rewording (six instead of six o'clock). |
| We're training them as guerrillas | Los entrenamos como guerrillas | We train them like guerrillas | ASR: Correct spelling (cf. gorillas) was produced. MT: Bug: tolerable back-translation error: like should be as. |

NOTE: For the military domain and more generally, improved translation of day times, especially when expressed as military time, is clearly needed.

Emergency Response Domain

| input | initial translation | back-translation | comments |
|---|---|---|---|
| Tell them to drop the food at headquarters | Dígales a ellos que dejen caer la comida en cuartel general | Tell them to them that they drop the food in headquarters | MT: Bug: tolerable Spanish>English mistranslation of pattern “digales a ellos que ” (“tell them to ”); drop mistranslated as “drop down, drop away, let fall, …”, but no suitable alternate meaning found; substituting drop off |
| … drop off … | dígales a ellos que dejen caer la comida en cuartel general | Tell them to them that they drop the food in headquarters | MT: translation and back-translation unchanged; still no suitable meaning; substituting leave |
| ... leave ... | Dígales a ellos que dejen la comida en cuartel general | Tell them to them that they leave the food at headquarters | MT: back-translation and Meaning Cues now okay |
| We need more shovels and crowbars right now | Necesitamos más palas y más palancas ahora mismo | we need more shovels and more levers right now | MT: back-translation levers is considered okay for crowbars |
| It's a matter of life and death | es cuestión de la vida y la muerte | it is issue of life and Death | MT: capitalization of death prompts uncertainty; rephrasing |
| It's absolutely critical. | Es absolutamente crítico. | it's absolutely critical | MT: meaning cues for critical are okay: “final, significant, key, crucial …” |
| These people are desperately short of water | Estas personas andan desesperadamente escasas de agua. | These people are desperately scarce of water | MT: Spanish is okay, but poor back-translation of escasas de (should be “short of/low on”) gives false negative, low confidence. Substituting low on. |
| .. low on ... | Estas personas andan desesperadamente de capa caída en agua. | These people incur in desperately on water. | MT: worse; rephrasing |
| These people are desperate for water | estas personas están desesperadas para agua. | These people are desperate for water. | MT: Preposition error in Spanish (para should be por) gives false positive, but meaning is clear |

Law Enforcement Domain

| input | initial translation | back-translation | comments |
|---|---|---|---|
| Step away from the car | Aléjese del coche | Get away from the car | MT: get away is acceptable for step away |
| May I see your license, please | Que pueda ver su licencia, por favor. | That I can see your license, please. | MT: Unacceptable mistranslation of pattern “que pueda , por favor” (“may I , please”); rephrasing |
| Show me your license, please | Muéstreme su licencia, por favor. | Show me your license, please | |
| Keep your hands where I can see them | Conserve sus manos donde las puedo ver. | Preserve your hands where I can see them. | MT: keep mistranslated as conserve (“take, hold, maintain, save, retain, preserve, …”); retranslated as mantenga (“keep”) |
| [retranslation] | Mantenga sus manos donde las puedo ver | Keep your hands where I can see them | |
| How long have you been living at this address? | Cuánto tiempo usted ha vivido en esta dirección? | How long have you been living in this address? | MT: minor but tolerable error with prepositions |
| Who's your insurer | Quién es su asegurador | Who is your insurer | |

NOTE: General-purpose Spanish>English pattern “que pueda , por favor” (“may I , please”) requires fix for all domains.

4 Translation Shortcuts

Having summarized our approach to highly interactive speech translation and discussed the advantages of very broad lexical and grammatical coverage for domain switching, we now turn to the use of Translation Shortcuts™ in domain ports. This section briefly describes the facility; and Section 5 explains the methods for quickly updating Shortcuts as an element of a rapid port.

A Translation Shortcut contains a short translation, typically of a sentence or two, which has been pre-verified, whether by a human translator or through the use of the system’s interactive tools. Thus re-verification of the translation is unnecessary. In this respect, Translation Shortcuts provide a kind of translation memory. However, it is a handmade sort of memory (since Shortcuts are composed by linguists or explicitly saved by users) and a highly interactive sort as well (since users can browse or search for Shortcuts, can make and categorize their own Shortcuts, and are advised when the input matches a Shortcut). It is in the ease of composition or customization, as well as in the quality of the interaction, that innovation can be claimed.

We can consider the quality of interaction first. Access to stored Shortcuts is very quick, with little or no need for text entry. Several facilities contribute to meeting this design criterion:

• A Shortcut Search facility can retrieve a set of relevant Shortcuts given only keywords or the first few characters or words of a string. The desired Shortcut can then be executed with a single gesture (mouse click or stylus tap) or voice command.
  NOTE: If no Shortcut is found, the system automatically allows users access to the full power of broad-coverage, interactive speech translation. Thus, a seamless transition is provided between the Shortcuts facility and full, broad-coverage translation.
• A Translation Shortcuts Browser is provided, so that users can find needed Shortcuts by traversing a tree of Shortcut categories. Using this interface, users can execute Shortcuts by tapping or clicking alone.

Figure 1 below shows the Shortcut Search and Shortcuts Browser facilities in use.

Figure 1: The Input Screen, showing the Translation Shortcuts Browser and Shortcut Search facilities. Note the new Nutrition category and the results of automatic Shortcut Search.

• On the left, the Translation Shortcuts Panel contains the Translation Shortcuts Browser, split into two main areas, Shortcuts Categories (above) and Shortcuts List (below).
• The Categories section of the Panel shows current selection of the Nutrition category, containing frequently used questions and answers for a nutrition interview. This new category was created overnight, as described in Section 5, below. Currently hidden is its Staff subcategory, containing expressions most likely to be used by health care staff members. There is also a Patients subcategory, used for patient responses. Categories for Background information, Directions, etc. are also visible.
• Below the Categories section is the Shortcuts List section, containing a scrollable list of alphabetized Shortcuts. Double clicking on any visible Shortcut in the List will execute it. Clicking once will select and highlight a Shortcut. Typing Enter will execute any currently highlighted Shortcut.

We turn our attention now to the Input Window, which does double duty for Shortcut Search and arbitrary text entry for full translation. The search facility is also shown in Figure 1.

• Shortcuts Search begins automatically as soon as text is entered by any means – voice, handwriting, touch screen, or standard keyboard – into the Input Window.
• The Shortcuts Drop-down Menu appears just below the Input Window, as soon as there are results to be shown. The user has entered “Do you have”. The drop-down menu shows the results of a search within the new Nutrition category based upon these initial characters.

If the user goes on to enter the exact text of any Shortcut in this category, e.g. “Do you have any food allergies?,” the interface will show that this is in fact a Shortcut, so that verification of translation accuracy will not be necessary. However, final text not matching a Shortcut, e.g. “Do you have any siblings?” will be passed to the routines for full translation with verification.

A Personal Translation Shortcuts™ facility is in progress for future versions of the system: once a user has verified a translation via the interactive facilities described above, he or she can save it for future reuse by pressing a Save as Shortcut button. The new custom Shortcut will then be stored in a personal profile. Facilities for sharing Shortcuts will also be provided.

5 Rapid Customization of Translation Shortcuts for New Domains

Translation Shortcuts are stored and distributed as text-format XML files. Each file contains information about the categories (e.g. Nutrition) and subcategories (Staff, Patient, etc.) to which each phrase belongs. Since Shortcuts are stored as external data files, integration of new Shortcuts into the system is straightforward and highly scalable. Once we have built a database of frequently used expressions and their translations for a given domain (in which there may be thousands of expressions or just a few), we can automatically generate the associated files in XML format in minutes. Once this new file is added to the appropriate directory, the Shortcuts become usable in the next session for text- or voice-driven searching and browsing. The entire sequence can be completed overnight. In one case, the Nutrition Department of a major hospital submitted several pages of frequently asked questions, which were entered, translated, re-generated as an XML file, and integrated into the system for demonstration the next day.
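The generation step described above, from a list of verified phrase pairs to an XML file, might look roughly like the following sketch. The actual file schema is not given in this paper, so the element and attribute names (shortcuts, shortcut, source, target, category, subcategory, lang) are assumptions made purely for illustration, as is the function name.

```python
import xml.etree.ElementTree as ET


def build_shortcuts_xml(category, subcategory, pairs):
    """Return an XML string for (source, target) phrase pairs.

    All element and attribute names are invented for this sketch; the
    real Translation Shortcuts schema is not documented here.
    """
    root = ET.Element("shortcuts", category=category, subcategory=subcategory)
    for source, target in pairs:
        item = ET.SubElement(root, "shortcut")
        ET.SubElement(item, "source", lang="en").text = source
        ET.SubElement(item, "target", lang="es").text = target
    return ET.tostring(root, encoding="unicode")


# Phrase pairs taken from the Nutrition>Staff sample in Figure 2.
pairs = [
    ("Do you have any food allergies?", "¿Tiene alguna alergia a alguna comida?"),
    ("Can you tolerate milk?", "¿Tolera la leche?"),
]
xml_text = build_shortcuts_xml("Nutrition", "Staff", pairs)
```

Writing `xml_text` to a file in the directory the system scans would complete the sequence; that path and the loading behaviour are not specified in the paper and are left out of the sketch.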

Do you have any food allergies?
¿Tiene alguna alergia a alguna comida?

Can you tolerate milk?
¿Tolera la leche?

Do you follow a special diet at home?
¿Sigue alguna dieta especial en casa?

Figure 2: Sample fragment of an automatically formatted Translation Shortcuts file for the Nutrition>Staff category and subcategory.
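Given phrase pairs like those in Figure 2, the Shortcut Search behaviour described in Section 4 (matching on initial characters or keywords, recognizing an exact match, and otherwise falling through to full translation) can be sketched as follows. This is a simplified illustration under stated assumptions: the matching rules, function names, and return values are hypothetical and do not describe the system's actual search implementation.

```python
# Verified phrase pairs from Figure 2, held in memory for the sketch.
SHORTCUTS = {
    "Do you have any food allergies?": "¿Tiene alguna alergia a alguna comida?",
    "Can you tolerate milk?": "¿Tolera la leche?",
    "Do you follow a special diet at home?": "¿Sigue alguna dieta especial en casa?",
}


def search_shortcuts(text):
    """Return Shortcuts that start with `text` or contain every query word.

    Matching is case-insensitive; query words are matched as substrings.
    """
    query = text.strip().lower()
    if not query:
        return []
    words = query.split()
    hits = []
    for source in SHORTCUTS:
        lowered = source.lower()
        if lowered.startswith(query) or all(w in lowered for w in words):
            hits.append(source)
    return sorted(hits)


def handle_input(text):
    """Use the stored translation on an exact match, else fall through."""
    if text in SHORTCUTS:
        return ("shortcut", SHORTCUTS[text])
    # No Shortcut matched: hand the text to full, verified translation.
    return ("full translation", None)
```

Here `search_shortcuts("Do you have")` would populate the drop-down menu, while `handle_input("Do you have any siblings?")` takes the fall-through path, matching the two cases discussed for the Input Window.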


6 Use of the Glossary Import for Quick Domain Switching

Similarly, our system includes a glossary import function which supports quick addition of domain-specific or other custom lexical information (e.g., site-specific or client-specific vocabulary), once again in text format. This glossary file may provide additional terms or may stipulate preferred (and thus overriding) translations for existing terms. The glossary file is automatically generated from a simple, two-column text-format file in which each line contains the source-language and target-language terms. A system utility will then generate the necessary linguistic markup (in curly brackets in Figure 3) for each of the terms. (Markup can be elaborated as appropriate for the machine translation engine in use, e.g. to specify verb sub-categorization, semantic class, etc.) Like the XML file used for Translation Shortcuts, the resulting custom glossary file can simply be placed in the appropriate directory.

hemolítico { A, 11, 6, 0, } = hemolytic
hemolitopoyético { A, 11, 6, 0, } = hemolytopoietic
hemolizable { A, 11, 6, 0, } = hemolyzable
hemolización { N, 2, 2, 1, } = hemolyzation
hemolizar { V, 7, 0, 1, } = hemolyze
derecho { A, 11, 6, 0, } = right

Figure 3. Sample glossary-import entries for the health care domain.

Here, the entry for right establishes the "right-hand" sense as the system-wide default, overriding the current global default sense ("correct"). (The new global default can, however, be overridden in turn by a personally preferred sense as specified by a user’s personal profile; and both kinds of preferences can be overridden interactively for any particular input sentence.) The other entries are domain-specific lexical additions for health care not in the general dictionary.

We make no claims for technical innovation in our Glossary Import facility, but simply point out its usefulness for rapid porting, in that new lexical items, or new preferred senses for old items, can be altered per user and from session to session.

7 Conclusion

The principal source of the porting problems affecting most SLT systems to date, we have observed, is that, given the general current reliance upon statistical approaches for both ASR and MT, each new domain has required an extensive and difficult-to-obtain new corpus for best results. One might consider the use of a single very large and quite general corpus (or collection of corpora) for statistical training; but large corpora engender quickly increasing perplexity and error rates, so this very-broad-coverage approach has generally been avoided. Our approach, however, has been to adopt a broad-coverage design nevertheless, and to compensate for the inevitable increase in ASR and MT errors by furnishing users with interactive tools for monitoring and correcting these mistakes. (We have to date used rule-based rather than statistical MT components, but comparable interactive facilities could be supplied for the latter as well. Operational prototypes for English<>Japanese and English<>German suggest that the techniques can also be adapted for languages other than English<>Spanish.) Because such interactive tools demand some time and attention, we have also put into place easily modifiable facilities for instant translation of frequent phrases (Translation Shortcuts). And finally, since even systems with very large lexicons will require specialized lexical items or specialized meanings of existing ones, we have implemented a quick glossary import facility, so that lexical items can be added or updated very easily.

Our current SLT system, optimized for health care, is now in use at a medium-sized hospital in New Jersey, with more than twenty machines installed. For this paper, we have applied the same system, without modifications, to sample utterances from the military, emergency service, and law enforcement domains. While this exercise has yielded no quantitative results, readers can judge whether it demonstrates that users can convey mission-critical information with acceptable reliability in multiple domains, even in advance of any porting efforts. Users do pay a price for this flexibility, since time and attention are required for monitoring and correcting to achieve reliable results. However, when users judge that accuracy is not crucial, or when they are unable to monitor and correct, they can simply accept the first translation attempt as is. (A bilingual transcript of each conversation, soon to optionally include the back-translation, is always available for later inspection.) They can also gain considerable time through the use of Translation Shortcuts.

References

Bouillon, P., Rayner, M., et al. 2005. A Generic Multi-Lingual Open Source Platform for Limited-Domain Medical Speech Translation. Presented at EAMT 2005, Budapest, Hungary.

Dillinger, M. and Seligman, M. 2004. System description: A highly interactive speech-to-speech translation system. In: Robert E. Frederking and Kathryn B. Taylor (Eds.), Machine translation: from real users to research: 6th conference of the Association for Machine Translation in the Americas -- AMTA 2004 (pp. 58-63). Berlin: Springer Verlag.

Dillinger, M. and Seligman, M. 2006. Converser™: highly interactive speech-to-speech translation for health care. HLT-NAACL 2006: Proceedings of the Workshop on Medical Speech Translation (pp. 40-43). New York, NY, USA.

Gao, Y., Liang, G., Zhou, B., Sarikaya, R., et al. 2006. IBM MASTOR system: multilingual automatic speech-to-speech translator. In: HLT-NAACL 2006: Proceedings of the Workshop on Medical Speech Translation (pp. 57-60). New York, NY, USA.

Karat, C-M. and Nahamoo, D. 2007. Conversational interface technologies. In A. Sears & J. Jacko (Eds.), The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, NJ: L. Erlbaum.

Koehn, P. 2008. Statistical Machine Translation. New York: Cambridge University Press.

Seligman, M. 2000. Nine Issues in Speech Translation. Machine Translation, 15, 149-185.

Seligman, M. and Dillinger, M. 2006. Usability issues in an interactive speech-to-speech translation system for health care. HLT-NAACL 2006: Proceedings of the Workshop on Medical Speech Translation (pp. 1-8). New York, NY, USA.

Zong, C. and Seligman, M. 2005. Toward Practical Spoken Language Translation. Machine Translation, 19, 113-137.
