Training and Evaluating Error Minimization Rules for Statistical Machine Translation

Ashish Venugopal, Andreas Zollmann, Alex Waibel
School of Computer Science, Carnegie Mellon University
[email protected] [email protected] [email protected] Abstract As discussed in (Och, 2003), the direct translation model represents the probability of target sentence Decision rules that explicitly account for ’English’ e = e1 ... eI being the translation for a non-probabilistic evaluation metrics in source sentence ’French’ f = f1 ... fJ through an machine translation typically require spe- exponential, or log-linear model cial training, often to estimate parame- Pm ters in exponential models that govern the exp( k=1 λk ∗ hk(e, f)) pλ(e|f) = P Pm 0 (1) search space and the selection of candi- e0∈E exp( k=1 λk ∗ hk(e , f)) date translations. While the traditional Maximum A Posteriori (MAP) decision where e is a single candidate translation for f rule can be optimized as a piecewise lin- from the set of all English translations E, λ is the ear function in a greedy search of the pa- parameter vector for the model, and each hk is a rameter space, the Minimum Bayes Risk feature function of e and f. In practice, we restrict (MBR) decision rule is not well suited to E to the set Gen(f) which is a set of highly likely this technique, a condition that makes past translations discovered by a decoder (Vogel et al., results difficult to compare.