[8th AMTA conference, Hawaii, 21-25 October 2008]

Hybrid Machine Translation Applied to Media Monitoring

Hassan Sawaf, Braddock Gaskill, Michael Veronis

AppTek Inc. 6867 Elm Street #300 McLean, VA 22101, USA {hsawaf,bgaskill,mveronis}@apptek.com

Abstract

In this paper, a system is presented that recognizes spoken utterances in Arabic dialects and translates them into English text. The input is recorded from a broadcast channel and recognized using automatic speech recognition that recognizes Modern Standard Arabic and Iraqi Colloquial Arabic. The recognized utterances are normalized into Modern Standard Arabic, and the output of this Modern Standard Arabic interlingua is then translated by a hybrid machine translation system that combines statistical and rule-based features.

1 Introduction

There has long been a need and desire for better-quality translation. Hearing the spoken word and translating it correctly are two separate processes. Recognizing speech and converting it to its written form is one. The other is taking that transcript and translating it into another language.

Early success in news monitoring applications was considered to be the ability to achieve a consistent 60% or better accuracy in recognition and translation. What happens if the speech recognition is flawed and does not detect everything needed for an accurate transcript, and the machine translation then tries to process a transcript that contains errors or missed words?

In some cases, automatic machine translation (MT) can close the gap by using additional information: rule-based machine translation (RBMT) systems aid in correcting these errors by using semantic information, while statistical machine translation (SMT) systems use graph-oriented decoding mechanisms on multiple hypotheses from the automatic speech recognition (ASR) to correct for the ASR mistakes (Ney et al., 2000).

These disparate systems each have their own strengths and weaknesses, so independently each could contribute only part of a solution. Hybrid machine translation (HMT) improves the final output of an ASR or media monitoring system by combining the key qualities of RBMT and SMT to generate a more readable and reliable translated transcript.

In comparison with written language, speech, and especially spontaneous speech, poses additional difficulties for the task of MT. Typically, these difficulties are caused by errors of the recognition process, which is carried out before the translation process. As a result, the sentence to be translated is not necessarily well-formed from a syntactic point of view.

Even without ASR errors, speech translation has to cope with a lack of conventional syntactic structures because the structures of spontaneous speech differ from those of written language. A prime motivation for creating a hybrid machine translation system is to take advantage of the strengths of both rule-based and statistical approaches while mitigating their weaknesses.

Thus, for example, a rule that covers a rare word combination or construction should take precedence over statistics that were derived from sparse data (and are therefore not very reliable). Additionally, rules covering long-distance dependencies and embedded structures should be weighted favorably, since these constructions are more difficult to process in statistical machine translation.

Conversely, a statistical approach should take precedence in situations where large numbers of relevant dependencies are available, novel input is encountered, or high-frequency word combinations occur.

Figure 1: MediaSphere screenshot of a broadcast transmission. Left: the original Arabic text, with names marked in red. Right: the translation using AppTek's Hybrid Machine Translation. Annotations in the figure: "Missed or omitted words in the source will multiply errors in the target translation." and "Hybrid MT can correct certain errors and omissions to produce a complete transcript and better translation."

An aspect that is extremely important, especially with regard to processes that extract information from text (e.g. a "distillation engine" preparing audio and textual data for question answering), is the weakness that SMT sometimes has in "informativeness" (the accurate translation of information) and "adequacy" (how well the meaning of the test translation matches the meaning of the reference translation) due to the influence of the target language model. For example, single words that may make a disproportionately heavy contribution to informativeness and adequacy, such as terms indicating negation or important content words, may be missing.

On the other hand, rule-based systems may excel with respect to informativeness. For example, a Lexical Functional Grammar (LFG) rule-based system almost equaled the capability of novice translators, and was not far behind expert translators, with respect to informativeness in a previous evaluation (Doyon et al., 1999).

It can be expected that the use of rule-based machine translation in conjunction with statistical machine translation would greatly improve informativeness by imbuing statistical machine translation with all necessary features of a good rule-based machine translation system, ensuring high fluency (a definitive strength of the statistical approach) and increasing adequacy and informativeness (using embedded rule-based machine translation features).

A brief comparison of the two systems helps illustrate how the hybridized MT approach unites very valuable features to form a comprehensive system:

RBMT provides fidelity, meaning that informativeness and adequacy are greater than fluency. This means that RBMT output might not read as well, but it is usually more accurate.

SMT systems strive for accuracy as well, but are more noted for their fluency. Their output reads well, and that gives a more immediate appearance of correctness. Both systems have attractive qualities that contribute to useful output. HMT brings those key features into one system to deliver output that reads well and is true to the context of the spoken word. This has improved the ability of our automated media monitoring system to capture live feeds that contain both broadcast-quality speech and unrehearsed interviews from the field, transcribe them despite dialect and other spoken nuances, and create an English translation that captures the true meaning of each word that was spoken.

1.1 Statistical MT Module

Statistical Machine Translation (SMT) systems have the advantage of being able to learn translations of phrases, not just individual words, which permits them to improve the functionality of both example-based approaches and translation memory.

Another advantage of some SMT systems, namely those that use the Maximum Entropy approach, is that they combine many knowledge sources and therefore give a good basis for making use of multiple knowledge sources while analyzing a sentence for translation.

1.2 Rule-Based MT Module

For the presented rule-based module, an LFG system (Shihadah & Roochnik, 1998) is employed, which is used to feed the hybrid machine translation. The LFG system contains a richly annotated lexicon containing functional and semantic information.

1.3 Hybrid MT

In the hybrid machine translation (HMT) framework introduced in this paper, the statistical search process has full access to the information available in LFG lexical entries, grammatical rules, constituent structures and functional structures. This is accomplished by treating the pieces of information as feature functions in the Maximum Entropy approach.

Incorporation of these knowledge sources both expands and constrains the search possibilities. Areas where the search is expanded include those in which the two languages differ significantly, for example when a long-distance dependency exists in one language but not the other.

2 Description of HMT Approach

Statistical Machine Translation is traditionally represented in the literature as choosing the target (e.g., English) sentence with the highest probability given a source (e.g., French) sentence.

Originally, and most commonly, SMT uses the "noisy channel" or "source-channel" model adapted from speech recognition (Brown et al., 1990; Brown et al., 1993).

While most SMT systems used to be based on the traditional "noisy channel" approach, this is simply one method of composing a decision rule that determines the best translation. Other methods may be employed, and many of them can even be combined if a direct translation model using a Maximum Entropy approach is employed.
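A direct maximum-entropy decision rule of the kind discussed in Section 2 is commonly written as a weighted sum of feature functions over a source sentence and a candidate translation, with the highest-scoring candidate chosen. The following minimal Python sketch illustrates that idea only. Everything in it is hypothetical: the feature functions `lexicon_feature` and `length_feature`, the toy lexicon with placeholder transliterated keys, the weights, and the candidate list are invented for the example and are not taken from the system described in this paper.

```python
import math
from typing import Callable, Dict, List

# A feature function h_m(source, candidate) returns a real-valued score.
FeatureFn = Callable[[str, str], float]

# Hypothetical toy lexicon standing in for a rule-based bilingual lexicon.
TOY_LEXICON: Dict[str, str] = {"kitab": "book", "kabir": "big"}


def lexicon_feature(source: str, candidate: str) -> float:
    """Count source words whose lexicon translation occurs in the candidate."""
    candidate_words = set(candidate.split())
    return float(sum(1 for w in source.split() if TOY_LEXICON.get(w) in candidate_words))


def length_feature(source: str, candidate: str) -> float:
    """Penalize a difference in word count between source and candidate."""
    return -float(abs(len(source.split()) - len(candidate.split())))


# Assumed feature set and weights (lambda_m); real weights would be trained.
FEATURES: Dict[str, FeatureFn] = {"lexicon": lexicon_feature, "length": length_feature}
WEIGHTS: Dict[str, float] = {"lexicon": 1.0, "length": 0.5}


def score(source: str, candidate: str) -> float:
    """Log-linear score: sum over m of lambda_m * h_m(source, candidate)."""
    return sum(WEIGHTS[name] * fn(source, candidate) for name, fn in FEATURES.items())


def posterior(source: str, candidates: List[str]) -> Dict[str, float]:
    """Normalize exponentiated scores into a distribution over the candidates."""
    scores = [score(source, c) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


def best_translation(source: str, candidates: List[str]) -> str:
    """Decision rule: choose the candidate with the highest score."""
    return max(candidates, key=lambda c: score(source, c))


CANDIDATES = ["big book", "book", "large house today"]
print(best_translation("kitab kabir", CANDIDATES))
```

Because the normalization term is the same for every candidate, the decision rule only needs the unnormalized score; `posterior` is included to show how the same scores yield a probability distribution. New knowledge sources, statistical or rule-based, enter this scheme simply as additional entries in `FEATURES` with their own weights.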


A direct translation model of this kind enables improved language modeling, for it allows postulating proper independence assumptions that reflect the knowledge of causality in the real world. In the field of statistical parsing, Collins (1999) and Charniak (2000) place a large emphasis on effective parameterization of their models in order to model real-world causality.

2.1 Translation Models

The translation models introduced for the system described herein are a combination of statistically learned lexicons interpolated with a bilingual lexicon used in the rule-based LFG system.

2.2 Language Models

In this paper the use of lexical and grammatical feature functions in a statistical framework is introduced. The incorporation of rich lexical and structural data into SMT helps accelerate an emerging trend in SMT, which is the use of linguistic analysis. Analogous to the work of (Sawaf et al., 2000; Charniak et al., 2003; Och & Ney, 2004), language model feature functions are used to improve MT quality: they cover a standard 5-gram, a POS-based 5-gram and a time-synchronous CYK-type parser, as described in (Sawaf et al., 2000). The m-gram language models (word-based and POS class-based) are trained on a corpus to which morphological analysis is applied.

Then a hybrid translation system is trained to translate the large training corpus in non-dialect language into the targeted dialect language. After that, the new "artificially" generated corpus is utilized to train the statistical language models. Words that do not have a translation into the target language are transliterated using a transliteration engine conceptually borrowed from grapheme-to-phoneme converters such as (Bisani & Ney, 2002; Wei, 2004). Besides this corpus, the original text corpus is used for the training of the final language models.

2.3 Functional Models

The use of functional constraints for lexical information in source and target gives a deeper syntactic and semantic analytic value to the translation. Functional constraints are multiple, and some of these functions are language dependent (e.g. gender, polarity, mood).

These functions can be cross-language or within a certain language. A cross-language function could be tense information, but also the function "human", which describes the concept as generally being a human being ("man", "woman", "president" are generally "human", but also potentially concepts like "manager", "caller", "driver", depending on the semantic and syntactic environment).

A "within-language" function could be gender, as objects can have different genders in different languages (e.g. for the translation equivalents of "table": in English it has no gender, in German it is masculine, in French and Arabic it is feminine).

2.4 Translation of MSA into English

The translation from Modern Standard Arabic into English is done using the system described above, with lexical, functional and syntactic features that were used in an LFG-based system.

The statistical models are trained on a bilingual sentence-aligned corpus for the translation model and the alignment template model. The language models (POS-based and word-based) are trained on a monolingual corpus.

2.5 Translation of Dialect Arabic into English

Translation of dialect Arabic is implemented using a hybrid MT system that translates the dialect into Modern Standard Arabic (MSA). For the presented translation system, a bilingual corpus is used which consists of sentences in Iraqi dialect (Iraqi Colloquial Arabic, ICA) and Modern Standard Arabic. Feature functions built out of rules that translate (or rather, convert) the dialect into non-dialect are also used.

As much of the Arabic input can be either MSA or ICA at the same time, the quality of translation can be increased by using dialect feature functions for both the MSA and ICA variants and allowing the Generalized Iterative Scaling (GIS) algorithm to change the weighting of these features during the training process.

3 Description of Media Monitoring System

MediaSphere is a software solution providing multilingual transcripts of various TV and radio stations for many domestic and international news bureaus, as well as transcripts for conversational telephony. MediaSphere supports video, audio and telephony for text processing with advanced linguistic capabilities such as machine translation of transcribed text, information retrieval with query translation (cross-language information retrieval; XL-IR), automated name and entity recognition (NER) and translation (NET), automatic summarization (SUM) and automatic topic detection (TD).

Figure 2: MediaSphere Screenshot of a Media Content Repository

MediaSphere is a state-of-the-art solution to facilitate the process of generating transcriptions from TV transmissions and telephony, translating transcribed text into and from English, and delivering rich media content online. The multilingual MediaSphere solution utilizes video and audio logging technologies and telephony platforms, as well as ASR integrated with MT, XL-IR, NER/NET, SUM and TD.

The solution automatically captures and indexes television, video and audio content in near real-time, making it fully searchable and accessible. Once audio is captured and indexed, Knowledge Management tools facilitate intelligent content search capabilities for users.

For the speech recognition of dialectal Arabic speech, the main problem is that it is very difficult to estimate a statistical language model from the very small corpus, especially in comparison to the size of the vocabulary.

In addition to the challenge of dialects, there are the nuances of various topical areas of speech, which present unique terms specific to those areas or commonly used terms with different meanings specific to those areas. For example, a news broadcast might report on the year's flu season and then a malicious attack spread by a hacker. Both topics might use the term "virus" with different meanings. The transcript created by the ASR and the subsequent translation could be inaccurate without the proper understanding of the context of the term "virus" in each instance. The use of domain-specific information resolves this potential problem.

Table 1: Machine Translation output of AppTek's Hybrid Machine Translation in comparison to a state-of-the-art 3rd party Statistical Machine Translation

Row 1
  Source (News & General): مشرف يدعو للمصالحة ويتجاهل مساعي إبعاده عن السلطة
  AppTek HMT: Musharraf calls for reconciliation and ignores the efforts his expulsion from the authority
  3rd Party SMT: Musharraf calls for reconciliation efforts and ignores the removal from power
  Remarks: For fluency the SMT removed the pronoun and created ambiguity about who is being removed

Row 2
  Source (News & General): عملية بالقلب للطالباني والمشهداني بعمان لإجراء فحوص
  AppTek HMT: Process in the heart to Jalal Talabani and Al Mshhdani in Amman for examinations
  3rd Party SMT: The heart of Amman, al-Mashhadani, Talabani and for tests
  Remarks: The correct translation is a heart operation in Amman for Talabani and tests for Mashadani

Row 3
  Source (News & General): ووقع التفجير حوالى الساعة الثامنة والنصف صباحا بالتوقيت المحلي, وهو موعد وصول المراجعين إلى دائرة الجوازات , وأسفر عن احتراق خمس سيارات مدنية وإلحاق أضرار مادية بالمباني والمحال التجارية المجاورة.
  AppTek HMT: The explosion occurred about eight thirty a.m. local time, an arrival date of auditors to the circle passports, and resulted in the burning five civil cars and causing material damages to the buildings and neighboring shops.
  3rd Party SMT: The explosion occurred about eight o'clock in the morning local time, when the arrival of auditors to the Passport Service, and resulted in the burning of five civilian cars and causing material damage to buildings and shops nearby.
  Remarks: The correct translation of the time is eight thirty (8:30 am) not 8

Row 4
  Source (News & General): وذكر المصدر أن شاحنة مفخخة من طراز (كيا) كانت مركونة بالقرب من مرأب للسيارات تابع لدائرة جوازات الأعظمية الواقعة في شارع المغرب انفجرت موقعة 12 قتيلا مدنيا وعشرين جريحا.
  AppTek HMT: The source said a truck bomb of the type (a) was parked near the garage for the cars belonging to the service passports-in the street of morocco exploded killing 12 civilians and twenty injured.
  3rd Party SMT: The source said that a truck bomb aircraft (Kia) was near Marconi garage for cars continued to circle passports Alaazemih located in the street exploded Morocco signed 12 civilians were killed and twenty injured.
  Remarks: All words in red are absolutely wrong translations. It is a truck not an aircraft. It is a parking garage not Marconi garage. It is Morocco street not Morocco

3.1 Domains and Micro-dictionaries

Special domain dictionaries can readily be introduced. In addition to the general dictionary, the domain-specific on-line dictionaries include the following micro-dictionaries:

• Military;
• Special Operations;
• Mechanical;
• Political & Diplomatic;
• Nuclear;
• Chemical;
• Aviation;
• Computer & Technology;
• Medical;
• Business & Economics;
• Law Enforcement;
• Drug Terms.

Figure 3: MediaSphere Dialogue for Selection/Stacking of Domains and/or Micro-dictionaries
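The idea of selecting and stacking domains can be pictured as a layered lookup in which the selected micro-dictionaries shadow the general dictionary, so that a term such as "virus" resolves differently depending on the active domain. The Python sketch below is only an illustration under stated assumptions: the dictionary contents, the placeholder transliterated keys, the English glosses and the `build_lexicon` helper are hypothetical and do not describe how MediaSphere actually implements stacking.

```python
from collections import ChainMap
from typing import Dict, List

# Hypothetical general dictionary; keys are placeholder transliterations.
GENERAL: Dict[str, str] = {"fayrus": "virus", "hujum": "attack"}

# Hypothetical micro-dictionaries, named after domains listed in Section 3.1.
MICRO: Dict[str, Dict[str, str]] = {
    "Medical": {"fayrus": "virus (pathogen)"},
    "Computer & Technology": {"fayrus": "virus (malware)", "hujum": "cyber attack"},
}


def build_lexicon(selected: List[str]) -> ChainMap:
    """Stack the selected micro-dictionaries on top of the general dictionary.

    Earlier entries in `selected` take priority; the general dictionary is
    always the last fallback.
    """
    layers = [MICRO[name] for name in selected]
    return ChainMap(*layers, GENERAL)


medical = build_lexicon(["Medical"])
technology = build_lexicon(["Computer & Technology", "Medical"])

print(medical["fayrus"])     # resolved by the Medical micro-dictionary
print(medical["hujum"])      # falls back to the general dictionary
print(technology["fayrus"])  # the first selected domain wins
```

A `ChainMap` is used here because it searches its layers in order without copying them, which mirrors the notion of a stack of dictionaries whose order the user can change. A production system would more plausibly weight competing entries than pick the first match, so the strict priority shown here is a simplifying assumption.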


4 Examples

The tests were performed on input from broadcast news using the open domain as well as input from the military domain. The experiments show that the hybrid MT performs better for this task than either the purely statistical or purely rule-based MT approaches.

The hybrid machine translation approach introduced here shows a very high accuracy in all categories: fluency, informativeness and adequacy. The approach shows that information units which need to be translated are processed correctly; moreover, the output of the translation reads fluently.

5 Conclusion

This paper shows that the proposed Hybrid Machine Translation approach achieves better results than either a pure rule-based or a pure corpus-based approach, for both written and especially spoken input. It also introduces an approach to increase language model quality for dialect language speech recognition by using non-dialect, non-spontaneous language resources.

Future work will incorporate further integration of other features into the translation process. The use of full morphosyntactic analysis will also be helpful, as Arabic is highly morphologically inflected. (Nießen, 2002) shows that the utilization of morpho-syntactic analysis promises to have an impact on languages, especially morphologically complex languages like Finnish, Arabic and Hungarian, but also languages like German. In the context of this hybrid approach, the utilization of deep morphology should have an even higher impact, as syntactic analysis is performed within the core translation procedure as opposed to just being a preprocessing step for a purely corpus-based translation (e.g. statistical translation).

Acknowledgments

The work and experiments in this paper are partly funded by DARPA under Contract No. HR0011-07-C-0023 and HR0011-08-C-0110. The authors would like to thank Hermann Ney and RWTH's ASR team and AppTek's MT team for fruitful discussions and support.

References

Bisani, M., and H. Ney. 2002. Investigations on Joint-Multigram Models for Grapheme-to-Phoneme Conversion. In International Conference on Spoken Language Processing (ICSLP), pp. 105–108. Denver, CO.

Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer, and P. Roossin. 1990. A Statistical Approach to Machine Translation. Computational Linguistics, 16, pp. 79–85. Cambridge, MA.

Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263–311. Cambridge, MA.

Charniak, E., K. Knight, and K. Yamada. 2003. Syntax-Based Language Models for Statistical Machine Translation. In Proceedings of MT Summit IX, 23–27. New Orleans, LA.

Charniak, E. 2000. A Maximum-Entropy-Inspired Parser. In Proceedings of the 2000 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Seattle, WA.

Collins, M. 1999. Head-Driven Statistical Models for Natural Language Parsing. Thesis, Department of Computer and Information Science, University of Pennsylvania. Philadelphia, PA.

Doyon, J., K. Taylor, and J. White. 1999. Task-Based Evaluation for Machine Translation. In Proceedings of the Machine Translation Summit VII. Singapore.

Ney, H., S. Nießen, F. J. Och, H. Sawaf, C. Tillmann, and S. Vogel. 2000. Algorithms for Statistical Translation of Spoken Language. IEEE Transactions on Speech and Audio Processing, 8(1), pp. 24–36. Piscataway, NJ.

Nießen, S. 2002. Improving Statistical Machine Translation Using Morpho-syntactic Information. Thesis, Aachen University of Technology. Aachen, Germany.

Och, F. J., and H. Ney. 2004. The Alignment Template Approach to Statistical Machine Translation. Computational Linguistics, 30(4), pp. 417–449. Cambridge, MA.

Sawaf, H., K. Schütz, and H. Ney. 2000. On the Use of Grammar-Based Language Models for Statistical Machine Translation. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT), pp. 231–241. Trento, Italy.

Shihadah, M., and P. Roochnik. 1998. Lexical-Functional Grammar as a Computational-Linguistic Underpinning to Arabic Machine Translation. In Proceedings of the 6th International Conference and Exhibition on Multi-lingual Computing. Cambridge, UK.

Wei, G. 2004. Phoneme-based Statistical Transliteration of Foreign Names for OOV Problem. Thesis. Hong Kong, China.
