Spoken Language Translator: Phase Two Report (Draft)
Total Page:16
File Type:pdf, Size:1020Kb
Spoken Language Translator: Phase Two Report (Draft) Ralph Becket , Pierrette Bouillon , Harry Bratt , Ivan Bretan , David Carter , Vassilios Digalakis , Robert Eklund , Horacio Franco , Jaan Kaja , Martin Keegan , Ian Lewin , Bertil Lyberg , David Milward , Leonardo Neumeyer , Patti Price , Manny Rayner , Per Sautermeister , Fuliang Weng and Mats Wirén February 1997 SRI Project 6393 This is a joint report by SRI International and Telia Research AB; published simultaneously by Telia and SRI Cambridge. SRI International, Cambridge SRI International, Menlo Park Telia Research ISSCO, Geneva Executive summary Spoken Language Translator (SLT) is a project whose long-term goal is the construc- tion of practically useful systems capable of translating human speech from one lan- guage into another. The current SLT prototype, described in detail in this report, is ca- pable of speech-to-speech translation between English and Swedish in either direction within the domain of airline flight inquiries, using a vocabulary of about 1500 words. Translation from English and Swedish into French is also possible, with slightly poorer performance. A good English-language speech recognizer existed before the start of the project, and has since been improved in several ways. During the project, we have constructed a Swedish-language recognizer, arguably the best system of its kind so far built. This has involved among other things collection of a large amount of Swedish training data. The recognizer is essentially domain-independent, but has been tuned to give high performance in the air travel inquiry domain. The main version of the Swedish recognizer is trained on the Stockholm dialect of Swedish, and achieves near-real-time performance with a word error rate of about 7%. Techniques developed partly under this project make it possible to port the recognizer to other Swedish dialects using only modest quantities of training data. On the language-processing side, we had at the start of the project a substantial domain-independent language-processing system for English, a preliminary Swedish version, and a sketchy set of rules to permit English to Swedish translation. We now have good versions of the language-processing system for English, Swedish and French, and fair to good support for translation in five of the six possible language- pairs. Translation is carried out using a novel robust architecture developed under the project. In essence, this translates as much of the input utterance as possible using a sophisticated grammar-based method, and then employs a much simpler set of word- to-word translation rules to fill in the gaps. The language-processing modules are all generic in nature, are based on large, linguistically motivated grammars, and can fairly easily be tuned to give good perfor- mance in new domains. Much of the work involved in the domain adaptation process can be carried out by non-experts using tools developed under the project. Formal comparisons are problematic, in view of the different domains and lan- guages used and the lack of accepted evaluation criteria. None the less, the evidence at our disposal suggests that the current SLT prototype is no worse than the German Verbmobil demonstrator, in spite of a difference in project budget of more than an order of magnitude. i Acknowledgements The Spoken Language Translation project was funded by Telia Nät Tjänster (Swedish Telecom Networks Division) and the work was carried out jointly by Telia Research and SRI International. We would like to thank Christer Samuelsson for making available to us the LR compiler described in Chapter 6, and Steve Pulman and Christer Samuelsson for helpful comments on the material in 6. The work described in Appendix A was partly funded by the Defence Research Agency, Malvern, UK, under Strategic Research Project AS04BP44. We are grateful to Małgorzata Stys´ for comments on the material in that appendix and also for providing her analysis of the Polish examples in it. The greater part of the work described in Chapter 11 was carried out under funding from SRI International (at SRI Cambridge) and Suissetra (at ISSCO, Geneva). Much of the material in this report is based on published papers. Permission to re-use parts of those papers is gratefully acknowledged. Chapter 2 uses material from [IVTTA96], c IEEE; Chapter 5 uses Rayner and Carter (1997), c IEEE; Chapters 5 and 6 draw on Rayner and Carter (1996), and Appendix A uses Carter (1995), which are both c Association for Computational Linguistics; Chapter 11 is an edited version of Rayner, Carter and Bouillon (1996); Chapter 12 uses Rayner and Bouillon (1995); Chapter 14 is based on Rayner et al (1996); part of Chapter 16 uses Carter et al (1996) and Chapter 17 uses Rayner et al (1994), both of which are c IEEE. ii Contents Executive summary . ............................ i Acknowledgements . ............................ ii Table of contents . ............................ iii List of tables . ................................ xi List of figures . ................................ xiii 1 Introduction 1 1.1 Overview of the project . ........................ 1 1.1.1 Why do spoken language translation? ............. 1 1.1.2 What are the basic problems? ................. 2 1.1.3 What is it realistic to attempt today? . ............. 3 1.1.4 What have we achieved? . ................. 4 1.2 Overall system architecture . ................. 5 1.3 An SLT session . ............................ 9 1.3.1 The Interface Elucidated . ................. 9 1.3.2 Synthesis ............................ 13 1.3.3 Final Comments ........................ 13 2 Language Data Collection 15 2.1 Rationale and Requirements . ................. 15 2.2 Methodology . ............................ 17 2.2.1 Wizard-of-Oz Simulations . ................. 17 2.2.2 American ATIS Simulations . ................. 17 2.2.3 Swedish ATIS Simulations . ................. 17 2.3 Translations of American WOZ Material . ............. 18 2.3.1 Translations... A First Step . ................. 19 2.3.2 Email Corpus . ........................ 19 2.3.3 Sundry Comments on Email Corpus Editing . ......... 21 2.4 A Comparison of the Corpora . ................. 21 2.5 Concluding Remarks . ........................ 23 3 Speech Data Collection 25 3.1 Rationale and Requirements . ................. 25 3.2 Text Material . ............................ 26 3.2.1 ATIS Material: LDC CD-ROMs . ............. 26 iii iv 3.2.2 Swedish Material: The Stockholm-Umeå Corpus . 26 3.3 Text Material Used in Data Collection ................. 26 3.3.1 Practice Sentences . ................. 27 3.3.2 Calibration Sentences . ................. 27 3.3.3 Calibration Sentences – ATIS ................. 28 3.3.4 Calibration Sentences – Newspaper Texts . ......... 28 3.3.5 ATIS Sentences . ........................ 28 3.3.6 Expanded Phone Set . ................. 28 3.4 Dialect Areas . ............................ 30 3.4.1 Subjects . ............................ 30 3.5 Recording Procedures . ........................ 30 3.5.1 Computer ............................ 30 3.5.2 Headset . ............................ 31 3.5.3 Telephones . ........................ 31 3.5.4 The SRI Generic Recording Tool . ............. 32 3.5.5 Settings . ............................ 32 3.6 Checking ................................ 32 3.7 Lexicon . ................................ 33 3.8 Concluding Remarks . ........................ 35 4 Speech Recognition 36 4.1 The Swedish Speech Corpus . ................. 37 4.2 The Swedish Lexicon . ........................ 37 4.2.1 Introduction . ........................ 37 4.2.2 Phone Set ............................ 37 4.2.3 Morphology . ........................ 38 4.2.4 Lexicon Statistics ........................ 39 4.3 The English/Swedish Speech Recognition System . ......... 39 4.3.1 Diagnostic Experiments . ................. 39 4.3.2 Speed Optimization . ................. 41 4.3.3 Swedish Recognition . ................. 43 4.3.4 Summary ............................ 44 4.4 Dialect Adaptation . ........................ 44 4.4.1 Dialect Adaptation Methods . ................. 45 4.4.2 Experimental Results . ................. 46 4.4.3 Summary ............................ 50 4.5 Language Modeling . ........................ 50 4.5.1 Interpolating In-domain LMs with Out-of-domain LMs . 50 4.5.2 Class-Based Language Modeling . ............. 51 4.5.3 Compound Splitting in the Swedish System . ......... 51 4.6 Development of a phone backtrace for the n-best decoding algorithm . 54 4.7 The Bilingual Speech Recognition System . ............. 56 4.7.1 Experimental Setup . ................. 57 4.7.2 Multilingual Recognition . ................. 57 4.7.3 Language Identification . ................. 58 4.7.4 Summary ............................ 59 v 5 Overview of Language Processing 60 5.1 Introduction . ............................ 60 5.2 Linguistically Motivated Robust Parsing . ............. 61 5.3 Semi-automatic domain adaptation of grammars . ......... 63 5.3.1 Rational Development of Rule Sets . ............. 63 5.3.2 Training by Interactive Disambiguation . ......... 64 5.4 Summary ................................ 65 6 Customization of Linguistic Knowledge 67 6.1 Linguistic Analysis in the Core Language Engine . ......... 67 6.2 Constituent Pruning . ........................ 69 6.2.1 Discriminants for Pruning . ................. 69 6.2.2 Deciding which Edges to Prune . ............. 72 6.2.3 Probability Estimates for Pruning . ............. 72 6.2.4 Relation to other pruning