<<

From Statistical Term Extraction to Hybrid Machine

Petra Wolf, Ulrike Bernardi Christian Federmann, Sabine Hunsicker Lucy Software and Services DFKI Neumarkter Str. 81 Stuhlsatzenhausweg 3 81673 Munich, Germany 66123 Saarbrucken,¨ Germany petra.wolf,[email protected] sabine.hunsicker,[email protected]

Abstract multi-layered Linguistically augmented Statistical ). We decided to impose This study presents a new hybrid approach strict knowledge-enhanced requirements for the for translation equivalent selection within statistical term extraction which consequently was a transfer-based machine translation sys- augmented by several layers of linguistic data re- tem using an intertwined net of traditional finement and automatic feature attribution. LiS- linguistic methods together with statisti- TEX is distinct from other term extractors as it cal techniques. Detailed evaluation reveals has layers that access already during the early term that the translation quality can be improved defining extraction phase the RBMT system com- substantially in this way. ponents for linguistic filtering and later production of knowledge-augmented output. In contrast to 1 Introduction the phrase-table approach, the F-measure is high A promising integration point for statistical which avoids evident deteriorations after integra- techniques into Rule-Based Machine Translation tion into the MT system. (RBMT) systems is the transfer phase. A key prob- The remainder of this paper is organized as fol- lem here is to deal with non-deterministic rules lows: Section 2 gives an overview of related work. and preferences in order to disambiguate and select Details on the terminology extraction are then pro- the most natural expressions in the target vided in Section 3 followed by a description of the (cf. (Eisele et al., 2008b; Thurmair, 2009)). We results of terminology and translation quality eval- performed studies on a phrase-table-driven trans- uations in section 4. Section 5 summarizes the key fer approach based on an RBMT system (for de- findings and outlines open issues for future work. tails on the RBMT system see (Alonso and Thur- mair, 2003)): A prototype was built accessing sta- 2 Related work tistically generated bilingual phrase tables at run- There have been several studies of hybrid machine time in addition to system and . translation approaches to overcome the drawbacks This hybrid prototype showed indeed better lexi- of rule-based and statistical MT alone by a com- cal selection, improved coverage and even some- bined approach, starting from RBMT as well times enhanced , but well-known issues in as starting from SMT. Evaluations of SMT vs. morpho-syntax and a very low hit rate of the phrase RBMT systems revealed that one of the weak table module remained as limiting factors. Thus al- points of RBMT systems is the lexical selection in though the data which were accessed and preferred transfer (cf. (Thurmair, 2009; Chen et al., 2009). are obviously the desired ones, the first challenge Since RBMT systems tend to suffer from insuf- was the lack of deeper linguistic knowledge within ficient and too deterministic lexical coverage and the data while the second challenge was the small choice (Eisele et al., 2008b), this study concen- F-measure of the data itself. trates on automatic enlargements of RBMT lexi- This led us to a deeper intertwined hybrid exten- cons and enhanced transfer-generation operations, sion: The LiSTEX approach (Hybrid Transfer by while taking into account the peculiarities of a spe- c 2011 European Association for Machine Translation. cific RBMT system and statistical techniques. Mikel L. Forcada, Heidi Depraetere, Vincent Vandeghinste (eds.) Proceedings of the 15th Conference of the European Association for Machine Translation, p. 225232 Leuven, Belgium, May 2011 The considerable potential of statistical term ex- traction combined with RBMT has been evalu- ated by other researchers, such as (Thurmair, 2003; Dugast et al., 2009). Here we go one step further by already integrating some RBMT techniques into the very early stages of the statistical term extraction process. This helps to avoid the well- known problems found in term guessing and iden- tification (Heid, 1999). Additional linguistic lay- ers assure that the extracted terms and phrases are tailored and augmented for the RBMT system in question.

3 Intelligent Terminology Extraction The underlying term extraction tool extracts term pairs by means of statistical from ex- isting translation memories or bilingual corpora (Eisele et al., 2008a). As a bilingual corpus, the (Koehn, 2005) was used for devel- opment. As a test set, we choose the ACL WMT 2008 test set which contains the Q4/2000 portion Figure 1: Term Extraction Workflow of the EuroParl data (2000-10 to 2000-12). The initial design study on peculiarities of a spe- tents of this terminology to be extracted and im- cific statistically based term extraction that satis- ported are determined by the fine-grained data fies the requirements of an RBMT system dealt needed by an RBMT system. The first major modi- with issues like term definition in a strict linguis- fication of the statistical extraction has to fulfill the tic and system-related sense, term identification requirement to extract only well defined linguis- as single vs. multiword expressions, base tic terms. For this reason, the extraction toolkit form reduction and application of linguistic cate- has been extended in order to access and use the gory patterns for identification or elimination of RBMT analysis, transfer and generation trees to noise at the end of the first multi-layered extrac- find interesting linguistic phrases, i.e. terms. In tion phase. Also the term recognition needed to be addition, it checks whether the translation of a spe- defined in relation with the linguistically based tar- cific term in the transfer tree, be it a single or a get system, i.e. comparison of the extraction result multiword, differs from the reference translation with the target system in order to identify in the corpus. In this way, the term extraction tool known vs. unknown terms. Thus this painstaking delivers only terms which receive a linguistic iden- specification research provided the basis for the ex- tification and which are translated differently with tended implementation of the LiSTEX term identi- the conventional RBMT system than in the bilin- fication, knowledge-enhanced acquisition and aug- gual memory. mentation modules. The extracted terminology has to contain the For LiSTEX, we concentrated on German- following minimum information: Canonical forms English and Spanish-English. The remaining sub- and categories of the source and target terms, do- sections explain details of the intelligent term ex- main area, frequency and, in case of multiwords, traction techniques and present the term extraction the internal structure of the expression. Finally, workflow as a whole (see Figure 1). the context has to be passed through: One exam- ple source language sentence in which the term 3.1 Term Identification with Initial Meta occurs in the corpus with the corresponding trans- Knowledge Acquisition and Organization lation equivalent sentence, useful for later manual The objective is to have the semantic translation inspection. equivalents decided by the statistical technique. For achieving this first layer of terminology ac- The requirements, however, on the linguistic con- quisition, the source and reference text are tok-

226 enized, tagged with part of speech information and as punctuation marks or integers, are filtered the tokens are lemmatized. Source and reference into separate lists. Since these entries tend to texts are aligned -by-word, the be wrongly aligned and extracted, the sublists is translated by the RBMT Engine and analysis, have to be inspected and only useful terms transfer and generation trees are created. These will be imported. trees are aligned to the words in the source and ref- All entries showing a category change • erence text and all phrases which the RBMT sys- (i.e. source and target categories are not tem translated differently than the reference trans- identical) are excluded from import since lation are selected as potential term candidates. they are likely to be wrong. Sometimes they Since these phrases still include the original sur- are caused by errors during part of speech face forms, the canonical forms are built now, e.g. tagging, sometimes by alignment errors. adjectives are inflected properly to match the head Quality Splitting procedures which allow • noun. Thus the terms receive the proper further predictions as to the quality of the format. The categories for the entire terms are de- extracted terms. A German multiword term rived from the part of speech sequences. The list for example which results in an English of terms is alphabetically sorted and non-frequent single target word is a likely non-valid term, terms are filtered out. mostly due to alignment errors: Up to this point in the processing chain, the ex- 1. Single words on source and target side traction process is done for one language direction 2. Single words on source, multiword ex- only. Now the other language direction is gener- pressions on target side ated and the intersection of both is created to avoid 3. Multiword expressions on source side, wrong term pairs caused by alignment problems. single words on the target side The next step is the collocation check to re- duce the amount of false entries. Frequencies for 4. Multiword expressions for source and all term pairs (single words as well as multiword target language Additional generation of part of speech sub- expressions) are extracted. We check the single • word entries and extract all the multiword expres- lists for nouns, verbs, adjectives and ad- sions containing this word and their corresponding verbs respectively will facilitate the import frequencies. If a word is part of a multiword, a preprocessing since the linguistic splitting of certain threshold calculation referring to the fre- the output files allows for a fast and pre- quencies of the single word and its corresponding cise quality assessment of intermediate mod- multiwords limits the acquisition of sin- ules. Moreover, the further automatic lin- gle words. The higher this threshold is set, the guistic feature augmentation of the extracted more single terms will be rejected. In this way, terms will be category-dependent. we can avoid that the proper name Tour de France 3.2 RBMT-Targeted Quality Precision Phase in the text will lead to two extracted terms: Tour de France and France, since France only appears After completion of the LiSTEX acquisition phase, within that multiword and not on its own and is semi-automatic inspection of the filtered and split therefore not a correct translation into German. terms revealed several problems mainly due to: Once these refined extraction layers have been Wrong derivation of verbalized nouns: ac- completed, there are final actions to be performed • cuse, belong instead of accused, belonging. in order to generate terminology which is linguisti- Missing noun heads: monetary instead of cally more precise to be handled accurately during • the subsequent cleanup and feature augmentation monetary authority, automobile instead of preprocessing. Thus the next step performs an ad- automobile industry. Weak adjectival nouns which would be ditional linguistically driven differentiation pro- • cess important for quality growth. wrongly analyzed as plural: Angeklagte im Strafverfahren instead of the correct Entries where the lemmatizer could not find masculine canonical form Angeklagter im • the correct lemma and just guesses it, are Strafverfahren. stored in separate lists. All kinds of wrong derivation of adjec- • Entries containing special characters, such tival endings: medizinisch Versorgung, • 227 gesundheitsbezogenere Angabe instead of December). In addition, most of the extracted → medizinische Versorgung (medical care), adjectives, adverbs or verbs after intersection with gesundheitsbezogene Angabe (health-related the RBMT lexicons are wrong and will not be im- claim). ported. Non-basic forms or non-existent forms like Approx. 2% of German-English entries and al- • Basler Ausschusses, Brusselrer¨ Flughafen most 4% of the English-Spanish data were cor- instead of Basler Ausschuss, Brusseler¨ Flug- rected by these automatic cleanup procedures. hafen. About 10% of the originally extracted entries are Wrong Plurals: Genitalverstummelungen¨ bei deleted for German-English and around 25% for • Frauen female genital mutilation. English-Spanish (cf. Table 2 for details on the → Varying capitalization, thus variants’ multi- number of extracted terms and their error rate • plication: Eu Presidency, Eu presidency, EU before and after semi-automatic inspection, auto- Presidency. matic cleanup and selection). Wrong category assignment: noun instead of • 3.3 Linguistic Feature-Value-Pair adverb for the expressions case by case. Augmentation US vs. UK spelling • After completion of the refined term extraction, the Wrong translation equivalents like: Kandi- • next step involves deep feature and value assign- datur der Turkei¨ Turkey ’s, Grundgesetz → ment. Targeted category- and content-specific al- von Hongkong of of. → gorithms for the respective linguistically classified The majority of these false terms were due to terms could be developed and applied: The im- lemmatizer and derivational errors in the linguis- port preprocessing parses and augments the data tic submodules of the term acquisition. To a great structures by generating the obligatory linguistic extent these wrong terms are corrected in the fol- feature-value pairs for the RBMT system. Fur- lowing precision phase: Wrong or missing end- thermore, it creates the monolingual and bilin- ings of adjective specifiers were rectified, wrong gual information for the three system lexicons (two superlative adjective specifier compounds and in- monolingual and one bilingual lexicon) and finally correct derivation of irregular adjective specifiers writes the entries. were corrected. Senseless terms, such as of of, This is performed by the subsequent module and wrongly categorized entries, such as belong, containing the Input Parser and the Defaulter: deepen, Foreign as nouns are deleted. Single word The lexicographic information needed for a com- nouns in plural form are evaluated to avoid getting plete RBMT entry can be automatically created duplicates to already existing singular entries. from individual parts of the entry so far generated. Also most of the wrong translation equivalents Even information from one entry can be used to which followed a certain pattern or syntax could complete other entries so that available informa- be deleted. Of course, a certain number of wrong tion in already existing RBMT lexicons can be ac- equivalents, especially the pure semantic ones, can cessed and used for calculation of new entries: only be detected and deleted manually. Since we Single Word Defaulting When the user imports did only automatic cleanup, these errors will ap- a new monolingual entry tagged with noun, verb, pear later as specific translation errors, but they adjective, adverb, all the obligatory feature values will only play a minor role (cf. Section 4). (e.g. for nouns: allomorph, declension class value, A major role in the quality refinement as a whole linguistic gender, kind of noun (e.g. proper nouns, is played by the selection process which benefits countable nouns), natural sex, semantic type of now from the former differentiation process. Term noun, e.g. abstract nouns, temporal units and the pairs with very improbable category changes (i.e. like) can automatically be inserted by LiSTEX. if the source term is a noun and the target term These defaults represent the best guess the system an adverb) will be completely deleted. Also for can make. example all acquired term data containing a Ger- man multiword source with an English single tar- Multiword Defaulting Multiwords consist of get word will be discarded and thus cannot un- heads and variable parts. The input parser can dergo the further transaction of automatic augmen- automatically parse and recognize their internal tation (example: europaischer¨ Rat im Dezember structure and then creates monolingual entries for

228 heads and variable parts and defaults the obligatory lines and for Spanish-English 1,253,026 lines. Ta- values. ble 1 shows a large number of monolingual en- tries compared to the relatively small number of Defaulting Monolingual Entries from Transfer bilingual entries. This can be explained by the fact Entries Transfer entries are not usable in the that the monolingual lists still contain many entries RBMT system if there are no corresponding mono- which only appear once in the corpus. Many of lingual entries for analysis and generation. There- these entries are faulty, so they are discarded early fore, after processing the transfer entries, the cor- on. For example, for German English 365,243 responding mono entries are automatically gener- → of the 441,425 terms have a frequency of 1. ated.

Deriving There are features that logically follow Translation Direction Monoling. List Biling. List from already defined features. For example, gen- German-English 441,425 / 508,592 45,857 der and number features are derived from the class Spanish-English 406,296 / 294,069 35,088 value. These features are not generated in the same way as the defaulted features are. Defaulting, af- Table 1: Extracted Terms ter all, is only a kind of intelligent guess. Derived features, on the other hand, are calculated and by The bilingual lists form the basis for removing definition are correct, provided they are deduced unwanted single terms resulting from multiwords. from correct feature-values. In the end, we got 30,803 terms for German- After completing this input and default- English with an error rate of about 14% (cf. Table ing step, the so far extracted translation equiva- 2). After the semi-automatic correction, the num- lent term Beitrittsverhandlung accession nego- ber of entries is reduced to 27,054 terms with an er- → tiation with the annotation noun is augmented in ror rate of about 8%. We evaluated the error rate by this phase and receives the following automatically randomly selecting 5% of the whole data set and generated structures and feature-value-pairs: De- interpolated the results to the complete data set. faulted monolingual and bilingual entries: For English-Spanish, the results are slightly better (”Beitrittsverhandlung” NST ALO ”Beitrittsverhand- with an error reduction from 18.25% to 3.8%. • lung” ARGS ((N1 (PREP ”mit” ”uber”)¨ (CA A)) (N0 The corrected and augmented terms are finally (PREP0 ”uber”)¨ (FCP TH) (INT T))) CL (P-EN S-0) imported into the RBMT lexicons. For German- GD (F) KN MS-CNT LINK S SX (N) TYN (PRO) !AUTHOR ”TermExtract” !OWNER ”sys” !DATE English, out of the 27,054 terms, around 26,000 1296734190 ) entries could be imported without any further ac- (”accession negotiation” NST ALO ”accession ne- tion. 931 term pairs for German English re- • gotiation” KN CNT TYN (ABS PRO) MW-TYPE → spectively 720 for English German result in STRING-NST MW-BODY ((STRING ”accession”) → (HEAD)) !AUTHOR ”TermExtract” !OWNER ”sys” conflicts during the import. Namely, conflicting !DATE 1296734198 MW-HEAD-CAN ”negotiation” transfer entries are term pairs which already exist MW-HEAD-CAT NST) in the RBMT lexicon, but with additional infor- (”Beitrittsverhandlung” NST ”accession negotiation” • mation, such as additional tests or transformations NST PRF 10000 TAG (POL) !DATE 1296734170 performed during transfer. Therefore, the new en- !OWNER ”sys” !AUTHOR ”TermExtract”) tries have been added to the lexicon in addition to As one can see from both monolingual en- the already existing entries in order not to loose tries, the defaulting mechanism before making its information. Furthermore, approx. 1,500 trans- guesses opens the already existing lexicons and ac- fer entries were modified during the import, since cesses - if available - already coded information, these are term pairs which are identical to existing here from the entries Verhandlung and negotiation RBMT lexicon entries, but just differ in the subject and propagates the identified feature-value pairs on area information. These entries are merged with the corresponding new entries. the existing lexicon entries during import. Only 4 Evaluations for about half of the terms new monolingual Eng- lish or German entries were created, since the other 4.1 Extraction and Import Quantification half is already available in the lexicon. On one For the evaluation, we extracted phrase lists from hand, this is due to the fact that the RBMT lexicons the EuroParl corpus (Koehn, 2005): For German- are already quite big and contain a great variety of English, the Europarl Corpus contains 1,259,571 terminology. On the other hand, the Europarl cor-

229 English Spanish English German ↔ ↔ Extracted Terms Terms after Correction Extracted Terms Terms after Correction Number of Terms 24,519 18,546 30,803 27,054 Error Rate 18.25% 3.8% 14.25% 7.71%

Table 2: Error Rates for Extracted and Corrected Term Pairs pus does not contain very specific terminology, but cabulary. more general terms which are already covered by In all language directions, the translation quality the RBMT lexicons. evaluation revealed clear improvements, ranging The results for the English-Spanish import are from approx. 38% better in Spanish → very similar to the ones for German-English: Out English to more than 50% improved translations of the 18,546 terms, around 18,000 entries could in German English. However, these improve- → be imported without any further action, whereas ments are offset by approx. 10% to 20% worsened approx. 500 term pairs resulted in conflicts during translations. In the following, we evaluate in detail the import and have been added as additional en- all the translation differences in order to find out tries. Nearly 2,000 entries have been modified and the reasons for the big variety between the various merged with existing entries just differing in the language directions on one hand and for the dete- subject area information. Here again for only half riorations caused by LiSTEX on the other hand to of the terms new monolingual English or Spanish develop mechanisms to compensate them. entries were created, since the other entries already exist in the RBMT lexicon. Improvements The translation evaluation re- vealed several improvements: More appropriate 4.2 Translation-Related Evaluation terminology is used. For example, the term prosti- The translation evaluation after the LiSTEX im- tucion´ infantil is now correctly translated as child port revealed a high number of differently trans- prostitution. Before the expression was composi- lated sentences: All four directions showed more tionally translated as infantile prostitution which is than 95% differently translated sentences (cf. Ta- wrong. ble 4). And even within these sentences we found Even the recognition of the whole sentence not only one, but several differences. structure can occur to be substantially improved The BLEU and NIST scores did not show any with the additional terminology, since the compo- improvements comparing LiSTEX to the baseline sitional analysis of complex multiword structures system (cf. Table 3). But since BLEU’s correla- is no longer necessary, if these multiwords are now tion with human judgments has already been ques- in the lexicon. tioned (Callison-Burch et al., 2006), we performed a manual evaluation which revealed clear improve- Multiword Parts Added and/or Lost Some- ments for all language directions. times during extraction a part of a multiword has For this evaluation, we adapted the Appraise been lost or in contrary was added. If not corrected evaluation tool (Federmann, 2010) so that the in- during the precision phase, the result is a wrong terface shows the source sentence, reference trans- entry which in consequence produces false transla- lation and the two anonymised translation candi- tions. For example, Zeitraum is wrongly translated dates by the RBMT system. Anonymising the or- as five-year period instead of period. der will eliminate potential bias by the human an- notators. Wrong Translation Equivalents There are From these different translations one third up to translations that have underlying inadequate term half of them are of equal quality as before. This equivalents in a given context, i.e. November is portion is mainly due to new alternatives which translated into October since the term imported may vary without changing the quality of the new was derived from the human translation mistake output. This outcome is due to the fact that the made in the original corpus. Sometimes it is a spe- RBMT lexicons are of quite broad coverage and cific idiomatic usage in the corpus which then after that the Europarl corpus covers mostly general vo- extraction does not reflect the default usage.

230 Baseline LiSTEX Translation Direction NIST BLEU NIST BLEU German English 5.5582 0.1632 5.2430 0.1491 → English German 4.3616 0.1078 4.2718 0.1060 → Spanish English 5.8776 0.1953 5.6414 0.1830 → English Spanish 6.1375 0.2085 5.9086 0.1978 → Table 3: BLEU and NIST Scores

English Spanish English German ↔ ↔ English Spanish Spanish English English German German English → → → → Translated TUs 2,000 2,000 2,000 2,000 Different Translations 95.05% 96.05% 95.45% 96.20% Evaluated Differences 287 398 970 1,261 Better 47.74% 38.44% 50.82% 57.49% Equal 31.71% 46.73% 41.86% 30.29% Worse 20.56% 14.82% 7.32% 12.21%

Table 4: Translation Quality Evaluation of English-Spanish and English-German

Multiword Expressions Multiword expressions Wrong Capitalization of English Terms The in contrast to the fixed expressions maintain a large capitalization of the extracted English terms is in- variability as to their morpho-syntactic behaviour. consistent and thus produces capitalized and non- As we now import a huge number of multiwords capitalized translation alternatives. and even collocations, the RBMT multiword treat- Since we ment is challenged to a considerable degree. Espe- Missing Subcategorization Frames added the extracted and augmented terminology, cially coordination and PP-attachment procedures already existing entries which only have transfor- are heavily affected. Multiwords are normally mations and no tests are no longer preferred in fixed in their structure so that it is not possible transfer and thus imported entries with no subcate- to add additional modifiers to the head or variable gorization information are selected: nouns like Be- parts. For collocations, this is slightly different as merkung zu jdn./etw. remark on sb./sth. have in they interact more freely. If you import a term pair, → the original RBMT lexicon a transformation which such as europaischer¨ Rat in Nizza - Nice Coun- had mapped the original preposition into the ap- cil and now translate the phrase des Europaischen¨ propriate target preposition. Rates in Nizza und in Biarritz, the analysis can no Quantifying the worse translations, the follow- longer attach und in Biarritz to the head Rat and ing picture can be drawn as to the given deteri- therefore the analysis fails. Here more intelligent oration types in descending order: The type mechanisms have to be developed to process collo- Al- is by far the most frequently disturb- cations and still allow for further modifying them. ternatives ing factor followed by the extraction-error of Mul- Alternatives from TermExtraction Alterna- tiword Part Lost/Added. The third item of the tives originating from the terminology extraction Multiword errors may cause syntactic deteriora- output have to be evaluated more specifically. At tion and more phrasal analyses although the entry the moment, we reduced the import to the 5 best itself has been correct. Capitalization as fourth er- alternatives for a given source term according to ror factor can also cause syntactic problems, since their frequencies. These terms are then alpha- these capitalized terms are expected to be proper betically ordered in the translation process. First names. Thus these entries get special default val- evaluations revealed that the number of alterna- ues. The error type Wrong Translation Equiva- tives should be restricted even more and that apart lents appears only on the final position. It could be from the scoring algorithm additional measures reduced to a quite insignificant amount, so for the like mappings, default scores, frequency ordering future work we will rather concentrate on the first are necessary. four items cited in this list.

231 5 Conclusion Acknowledgment Part of this work was carried out within the Eu- There is no simple press-a-button-method for fill- roMatrix Plus project and has been funded by the ing RBMT systems with noticeable quality gain European Union as FP7-ICT-2007-3-231720. in the end. Our research, however, could clearly attest the great potential of the multi-level hybrid approach of LiSTEX: On one side we have an ex- References tended intertwined acquisition and precision setup Alonso, J. and G. Thurmair. 2003. The Comprendium which benefits from statistical techniques in ac- Translator system. In Proceedings of the MT Summit cessing the most appropriate and frequent transla- IX (New Orleans). tions of terms and phrases in large corpora while Callison-Burch, C., M. Osborne, and P. Koehn. 2006. using at the same time deeper knowledge like Re-evaluating the role of in machine translation tree structures from the target RBMT system and research. In Proceedings of the 11th EACL (Trento), -based . On the other side, pages 249–256. LiSTEX features a complex system using pre- Chen, Y., M. Jellinghaus, A. Eisele, Yi Z., S. Hunsicker, cise linguistic deriving and defaulting techniques S. Theison, Ch. Federmann, and H. Uszkoreit. 2009. for automatic execution of the obligatory feature- Combining multi-engine translations with Moses. In Proceedings of the Fourth Workshop on Statistical value augmentation and linguistic entity genera- Machine Translation (Athens), pages 42–46. tion before system integration takes places. Dugast, L., J. Senellart, and P. Koehn. 2009. Sta- We could show that almost all intermediate steps tistical post editing and dictionary extraction: Sys- up to the final integration can be automized to a tran/Edinburgh submissions for ACL-WMT2009. In high degree which indeed is a convincing perfor- Proceedings of the Fourth Workshop on Statistical mance. As we could see in the evaluation de- Machine Translation (Athens), pages 110–114. scription there are specific technological hurdles Eisele, A., C. Federmann, H. Uszkoreit, H. Saint- to overcome. For further verification of LiSTEX Amand, M. Kay, M. Jellinghaus, S. Hunsicker, there is now need for other underlying extraction T. Herrmann, and Y. Chen. 2008a. Hybrid architec- tures for multi-engine machine translation. In Pro- corpora which are more domain-specific and have ceedings of Translating and the Computer 30. little intersection with the already available broad- coverage RBMT system. Lemmatizer improve- Eisele, A., C. Federmann, H. Uszkoreit, H. Saint- Amand, M. Kay, M. Jellinghaus, S. Hunsicker, ments need to be performed to ease the task of the T. Herrmann, and Y. Chen. 2008b. Hybrid machine subsequent precision module. translation architectures within and beyond the eu- Another issue is the large amount of imported romatrix project. In Proceedings of European As- sociation of Machine Translation (Hamburg), pages multiword and phrase-like entries which are rather 27–34. free collocations than fixed . By LiSTEX we could overcome the morpho-syntactic issues Federmann, C. 2010. Appraise: An open-source toolkit for manual phrase-based evaluation of trans- still showing up in SMT-bound systems, but now lations. In Proceedings of the Conference on the level of challenge has moved up to the is- International Language Resources and Evaluation sue of deeper syntactic inclusion of larger linguis- (Valetta), pages 1731–1734. tic phrase units which stand above fixed expres- Heid, U. 1999. Extracting terminologically relevant sions and idioms in the strict sense, but which collocations from german technical texts. In Pro- precisely play the major role when targeting for ceedings of the International Congress on Termino- non-deterministic natural expressions in the trans- logy and Knowledge Engineering, pages 241–255. lation output. This challenge needs to be addressed Koehn, P. 2005. Europarl: A parallel corpus for statis- and additional adaptations have to be implemented tical machine translation. In Proceedings of the MT since the RBMT grammars could perform such a Summit X (Phuket), pages 79–86. proper handling of syntactically free collocations. Thurmair, G. 2003. Making term extraction tools us- able. In Proceedings of the Conference of Controlled Further research is necessary to refine the selec- Language Translation (Dublin), pages 170–179. tion among the translation alternatives produced by LiSTEX concentrating on additional narrow- Thurmair, G. 2009. Comparing different architectures of hybrid machine translation systems. In Proceed- down-procedures selecting by frequency, default ings of the MT Summit XII (Ottawa), pages 340–347. and mapping measures.

232