Hybrid Machine Translation Applied to Media Monitoring

[8th AMTA conference, Hawaii, 21-25 October 2008]

Hassan Sawaf, Braddock Gaskill, Michael Veronis
AppTek Inc.
6867 Elm Street #300
McLean, VA 22101, USA
{hsawaf,bgaskill,mveronis}@apptek.com

Abstract

In this paper, a system is presented that recognizes spoken utterances in Arabic dialects and translates them into English text. The input is recorded from a broadcast channel and recognized using automatic speech recognition that recognizes Modern Standard Arabic and Iraqi Colloquial Arabic. The recognized utterances are normalized into Modern Standard Arabic, and the output of this Modern Standard Arabic interlingua is then translated by a hybrid machine translation system that combines statistical and rule-based features.

1 Introduction

There has long been a need and desire for better quality translation. Hearing the spoken word and translating it correctly are two separate processes. Recognizing speech and converting it to its written form is one. The other is taking that transcript and translating it into another language.

Early success in news monitoring applications was considered to be the ability to achieve a consistent 60% or better accuracy in recognition and translation. What happens if the speech recognition is flawed and does not detect everything needed for an accurate transcript, and the machine translation then tries to process a transcript that contains errors or missed words?

In some cases, automatic machine translation (MT) can close the gap by using additional information: rule-based machine translation systems (RBMT) aid in correcting these errors by using semantic information, while statistical machine translation systems (SMT) use graph-oriented decoding mechanisms on multiple hypotheses from the automatic speech recognition (ASR) to correct for the ASR mistakes (Ney et al., 2000).

These disparate systems each have their own strengths and weaknesses, so independently each contributes only part of a solution. Hybrid machine translation (HMT) improves the final output of an ASR or media monitoring system by combining the key qualities of RBMT and SMT to generate a more readable and reliable translated transcript.

In comparison with written language, speech, and especially spontaneous speech, poses additional difficulties for the task of MT. Typically, these difficulties are caused by errors of the recognition process, which is carried out before the translation process. As a result, the sentence to be translated is not necessarily well-formed from a syntactic point of view.

Even without ASR errors, speech translation has to cope with a lack of conventional syntactic structures, because the structures of spontaneous speech differ from those of written language. A prime motivation for creating a hybrid machine translation system is to take advantage of the strengths of both rule-based and statistical approaches while mitigating their weaknesses.

Thus, for example, a rule that covers a rare word combination or construction should take precedence over statistics that were derived from sparse data (and are therefore not very reliable). Additionally, rules covering long-distance dependencies and embedded structures should be weighted favorably, since these constructions are more difficult to process in statistical machine translation.

Figure 1: MediaSphere screenshot of a broadcast transmission. Left: the original Arabic text, with names marked in red. Right: the translation using AppTek’s Hybrid Machine Translation. Callouts in the figure: “Missed or omitted words in the source will multiply errors in the target translation.” and “Hybrid MT can correct certain errors and omissions to produce a complete transcript and better translation.”

Conversely, a statistical approach should take precedence in situations where large numbers of relevant dependencies are available, novel input is encountered, or high-frequency word combinations occur.

An aspect that is extremely important, especially with regard to processes that extract information from text (e.g. a “distillation engine” preparing audio and textual data for question answering), is the weakness that SMT sometimes has in “informativeness” (the accurate translation of information) and “adequacy” (how well the meaning of the test translation matches the meaning of the reference translation) due to the influence of the target language model. For example, single words that contribute disproportionately to informativeness and adequacy, such as terms indicating negation or important content words, may be missing.

On the other hand, rule-based systems may excel with respect to informativeness. For example, a Lexical Functional Grammar (LFG) rule-based system almost equaled the capability of novice translators, and was not far behind expert translators, with respect to informativeness in a previous evaluation (Doyon et al., 1999).

It can be expected that the use of rule-based machine translation in conjunction with statistical machine translation would greatly improve informativeness by imbuing statistical machine translation with all necessary features of a good rule-based machine translation system, ensuring high fluency (a definitive strength of the statistical approach) and increasing adequacy and informativeness (using embedded rule-based machine translation features).

A brief comparison of the two systems will help illustrate how the hybridized MT approach unites very valuable features to form a comprehensive system:

RBMT provides fidelity, meaning that informativeness and adequacy are greater than fluency. This means that RBMT output might not read as well, but it is usually more accurate.

SMT systems strive for accuracy as well, but are more noted for their fluency. SMT output reads well, and that gives a more immediate appearance of correctness. Both systems have attractive qualities that help generate useful output.

HMT brings those key features into one system to deliver output that reads well and is true to the context of the spoken word. This has improved the ability of our automated media monitoring system to capture live feeds that contain both broadcast-quality speech and unrehearsed interviews from the field, transcribe them despite dialect and other spoken nuances, and create an English translation that captures the true meaning of each word that was spoken.

1.1 Statistical MT Module

Statistical Machine Translation (SMT) systems have the advantage of being able to learn translations of phrases, not just individual words, which permits them to improve the functionality of both example-based approaches and translation memory. Another advantage of some SMT systems, namely those that use the Maximum Entropy approach, is that they combine many knowledge sources and therefore give a good basis for making use of multiple knowledge sources while analyzing a sentence for translation.

1.2 Rule-Based MT Module

For the presented rule-based module, an LFG system (Shihadah & Roochnik, 1998) is employed, which is used to feed the hybrid machine translation. The LFG system contains a richly annotated lexicon containing functional and semantic information.

1.3 Hybrid MT

In the hybrid machine translation (HMT) framework introduced in this paper, the statistical search process has full access to the information available in LFG lexical entries, grammatical rules, constituent structures and functional structures. This is accomplished by treating the pieces of information as feature functions in the Maximum Entropy framework.

Incorporation of these knowledge sources both expands and constrains the search possibilities. Areas where the search is expanded include those in which the two languages differ significantly, for example when a long-distance dependency exists in one language but not the other.

2 Description of HMT Approach

Statistical Machine Translation is traditionally represented in the literature as choosing the target (e.g., English) sentence with the highest probability given a source (e.g., French) sentence.

Originally, and most commonly, SMT uses the “noisy channel” or “source-channel” model adapted from speech recognition (Brown et al., 1990; Brown et al., 1993).

While most SMT systems used to be based on the traditional “noisy channel” approach, this is simply one method of composing a decision rule that determines the best translation. Other methods may be employed, and many of them can even be combined if a direct translation model using a Maximum Entropy approach is employed.

Such a method enables improved language modeling, for it allows postulating proper independence assumptions that reflect the knowledge of causality in the real world. In the field of statistical parsing, (Collins, 1999) and (Charniak, 2000) place a large emphasis in their work on effective parameterization of their models in order to model real-world causality.

2.1 Translation Models

... syntactic and semantic analytic value to the translation. Functional constraints are multiple, and some of these functions are language dependent (e.g. gender, polarity, mood, etc.).

These functions can be cross-language or within a certain language. A cross-language function could be the tense information, but also the function “human”, indicating that the concept is generally a human being (“man”, “woman”,
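
As an illustration of the direct Maximum Entropy decision rule described in Section 2, the following minimal Python sketch scores a fixed list of candidate translations with a weighted sum of feature functions and picks the highest-scoring one. It is not the system's implementation: the feature functions (`toy_fluency`, `rule_negation`), the weights, the `NEG` marker and the candidate sentences are all hypothetical toy values chosen only to show how a rule-based feature can sit next to a statistical one in the same model.

```python
import math
from typing import Callable, Dict, List

# A feature function h_m(e, f) maps (source, candidate) to a real value.
FeatureFn = Callable[[str, str], float]


def toy_fluency(source: str, candidate: str) -> float:
    """Hypothetical stand-in for a language model score: shorter output scores higher."""
    return -float(len(candidate.split()))


def rule_negation(source: str, candidate: str) -> float:
    """Hypothetical rule-based feature: 1.0 if negation is preserved, else 0.0."""
    source_negated = "NEG" in source.split()
    candidate_negated = "not" in candidate.split()
    return 1.0 if source_negated == candidate_negated else 0.0


def loglinear_posteriors(source: str, candidates: List[str],
                         features: Dict[str, FeatureFn],
                         weights: Dict[str, float]) -> Dict[str, float]:
    """p(e | f) proportional to exp(sum_m lambda_m * h_m(e, f)), normalized over candidates."""
    scores = [sum(weights[name] * fn(source, cand) for name, fn in features.items())
              for cand in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {cand: e / total for cand, e in zip(candidates, exps)}


def best_translation(source: str, candidates: List[str],
                     features: Dict[str, FeatureFn],
                     weights: Dict[str, float]) -> str:
    """Decision rule: return the candidate with the highest posterior."""
    posteriors = loglinear_posteriors(source, candidates, features, weights)
    return max(posteriors, key=posteriors.get)


# Toy data (assumed for illustration only): a glossed source with a negation marker.
SOURCE = "NEG he travel"
CANDIDATES = ["he travels", "he does not travel"]
FEATURES = {"fluency": toy_fluency, "negation": rule_negation}
WEIGHTS = {"fluency": 0.5, "negation": 3.0}

if __name__ == "__main__":
    print(best_translation(SOURCE, CANDIDATES, FEATURES, WEIGHTS))
```

With the negation feature weighted at zero, the toy fluency feature alone prefers the shorter candidate that drops the negation. Giving the rule-based feature a positive weight shifts the decision to the candidate that keeps it, which mirrors the informativeness argument made in the introduction.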
