Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options
Yulia Tsvetkov, Chris Dyer, Lori Levin, Archna Bhatia
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
{ytsvetko, cdyer, lsl, archna}@cs.cmu.edu

Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 271–280, Sofia, Bulgaria, August 8-9, 2013. © 2013 Association for Computational Linguistics

Abstract

We propose a technique for improving the quality of phrase-based translation systems by creating synthetic translation options (phrasal translations that are generated by auxiliary translation and post-editing processes) to augment the default phrase inventory learned from parallel data. We apply our technique to the problem of producing English determiners when translating from Russian and Czech, languages that lack definiteness morphemes. Our approach augments the English side of the phrase table using a classifier to predict where English articles might plausibly be added or removed, and then we decode as usual. Doing so, we obtain significant improvements in quality relative to a standard phrase-based baseline and to post-editing complete translations with the classifier.

1 Introduction

Phrase-based translation works as follows. A set of candidate translations for an input sentence is created by matching contiguous spans of the input against an inventory of phrasal translations, reordering them into a target-language appropriate order, and choosing the best one according to a discriminative model that combines features of the phrases used, reordering patterns, and a target language model (Koehn et al., 2003). This relatively simple approach to translation can be remarkably effective, and, since its introduction, it has been the basis for further innovations, including developing better models for distinguishing good translations from bad ones (Chiang, 2012; Gimpel and Smith, 2012; Cherry and Foster, 2012; Eidelman et al., 2013), improving the identification of phrase pairs in parallel data (DeNero et al., 2008; DeNero and Klein, 2010), and formal generalizations to gapped rules and rich nonterminal types (Chiang, 2007; Galley et al., 2006). This paper proposes a different mechanism for improving phrase-based translation: the use of synthetic translation options to supplement the standard phrasal inventory used in phrase-based translation systems.

In the following, we argue that phrase tables acquired in the usual way will be expected to have gaps in their coverage in certain language pairs and that supplementing these with synthetic translation options is a priori preferable to alternative techniques, such as post-processing, for generalizing beyond the translation pairs observable in training data (§2). As a case study, we consider the problem of producing English definite/indefinite articles (the, a, and an) when translating from Russian and Czech, two languages that lack overt definiteness morphemes (§3). We develop a classifier that predicts the presence and absence of English articles (§4). This classifier is used to generate synthetic translation options that are used to augment phrase tables used in the usual way (§5). We evaluate their performance relative to a post-processing approach and to a baseline phrase-based system, finding that synthetic translation options reliably outperform the other approaches (§6). We then discuss how our approach relates to previous work (§7) and conclude by discussing further applications of our technique (§8).

2 Why Synthetic Translation Options?

Before turning to the problem of generating English articles, we give arguments for why synthetic translation options are a useful extension of standard phrase-based translation approaches, and why this technique might be better than some alternative proposals that have been made for generalizing beyond translation examples directly observable in the training data.

In language pairs that are typologically similar (i.e., when both languages lexicalize the same kinds of semantic and syntactic information), words and phrases map relatively directly from source to target languages, and the standard approach to learning phrase pairs is quite effective.¹ However, in language pairs in which individual source language words have many different possible translations (e.g., when the target language word could have many different inflections or could be surrounded by different function words that have no direct correspondence in the source language), we can expect the standard phrasal inventory to be incomplete, except when very large quantities of parallel data are available or for very frequent words. There simply will not be enough examples from which to learn the ideal set of translation options. Therefore, since phrase-based translation can only generate input/output word pairs that were directly observed in the training corpus, the decoder's only hope for producing a good output is to find a fluent, meaning-preserving translation using incomplete translation lexicons. Synthetic translation option generation seeks to fill these gaps using secondary generation processes that produce possible phrase translation alternatives that are not directly extractable from the training data. We hypothesize that by filling in gaps in the translation options, discriminative translation models will be more effective (leading to better translation quality).

¹ When translating from a language with a richer lexical inventory to a simpler one, approximate matching or backing off to (e.g.) morphologically simpler forms likewise reliably produces good translations.

[Figure 1 graphic: English translation options (I saw the; I saw a; I saw; saw a cat; saw the cat; saw the; the cat; saw a; a cat; I; saw; cat) stacked above the Russian input Я увидел кошку, glossed 1SG+NOM saw+1SG+PST cat+ACC.]

Figure 1: Russian-English phrase-based translation example. Since Russian lacks a definiteness morpheme, the determiners a and the must be part of a translation option containing увидел or кошку in order to be present in the right place in the English output. Translation options that are in dashed boxes should exist but were not observed in the training data. This work seeks to produce such missing translation options synthetically.

The creation of synthetic translation options can be understood as a kind of translation or post-editing of phrasal units/translations. This raises a question: if we have the ability to post-edit a phrasal translation or retranslate a source phrase so as to fill in gaps in the phrasal inventory, we should be able to use the same technique to translate the sentence; why not do this? While the effectiveness of this approach will ultimately be assessed empirically, translation option generation is appealing because the translation option synthesizer need not produce only single-best guesses: if multiple possibilities appear to be equally good (say, multiple inflections of a translated lemma), then multiple translation options may be synthesized. Ultimately, of course, the global translation model must select one translation for every phrase it uses, but the decoder will have access to global information that it can use to pick better translation options.

3 Case Study: English Definite Articles

We now turn to a translation problem that we will use to assess the value of synthetic translation options: generating English in/definite articles when translating from Russian.

Definiteness is a semantic property of noun phrases that expresses information such as identifiability, specificity, familiarity and uniqueness (Lyons, 1999). In English, it is expressed through the use of article determiners and non-article determiners. Although languages may express definiteness through such morphemes, many languages use alternative mechanisms. For example, they may use noncanonical word orders (Mohanan, 1994)² or different constructions such as existentials, differential object marking (Aissen, 2003), and the ba (把) construction in Chinese (Chen, 2004). While these languages lack articles, they may use demonstratives and the quantifier one to emphasize definiteness and indefiniteness, respectively.

² See pp. 11–12 for an example in Hindi, a language without articles.

Russian and Czech are examples of languages that use non-lexical means to express definiteness. As such, in Russian to English translation systems, we expect that most Russian nouns should have at least three translation options: the bare noun, the noun preceded by the, and the noun preceded by a/an. Fig. 1 illustrates how the definiteness mismatch between Russian and English can result in "gaps" in the phrasal inventory learned from a relatively large parallel corpus. The Russian input should translate (depending on context) as either I saw a cat or I saw the cat; however, the phrase table we learned is only able to generate the former.³

logistic regression:

$$p(y \mid w, i) \propto \exp \sum_j \lambda_j h_j(y, w, i),$$

where $h_j(\cdot)$ are feature functions, $\lambda_j$ are the corresponding weights, and $y \in \{D, I, N\}$ refer, respectively, to the outputs: definite article, indefinite article, and no article.⁵

4.2 Features

The English article system is extremely complex (as non-native English speakers will surely know!): in addition to a general placement rule that articles must precede a noun or its modifiers in an NP, multiple other factors can also affect article selection, including countability of the head noun, syntactic properties of an adjective modifying a noun (superlative,