Statistical Machine Translation
Lecture 2: Theory and Praxis of Decoding

Philipp Koehn
[email protected]
School of Informatics, University of Edinburgh

Statistical Machine Translation

• Components: translation model, language model, decoder

  foreign/English parallel text → statistical analysis → translation model
  English text → statistical analysis → language model
  decoding algorithm

Phrase-Based Systems

• A number of research groups developed phrase-based systems
  (RWTH Aachen, Univ. of Southern California/ISI, CMU, IBM, Johns Hopkins Univ., Cambridge Univ., Univ. of Catalunya, ITC-irst, Univ. of Edinburgh, Univ. of Maryland, ...)
• Systems differ in
  – training methods
  – model for the phrase translation table
  – reordering models
  – additional feature functions
• Currently the best method for SMT (MT?)
  – top systems in the DARPA/NIST evaluation are phrase-based
  – the best commercial system for Arabic-English is phrase-based

Phrase-Based Translation

  Morgen fliege ich nach Kanada zur Konferenz
  Tomorrow I will fly to the conference in Canada

• Foreign input is segmented into phrases
  – any sequence of words, not necessarily linguistically motivated
• Each phrase is translated into English
• Phrases are reordered

Phrase Translation Table

• Phrase translations for "den Vorschlag":

  English e         p(e|f)    English e         p(e|f)
  the proposal      0.6227    the suggestions   0.0114
  's proposal       0.1068    the proposed      0.0114
  a proposal        0.0341    the motion        0.0091
  the idea          0.0250    the idea of       0.0091
  this proposal     0.0227    the proposal ,    0.0068
  proposal          0.0205    its proposal      0.0068
  of the proposal   0.0159    it                0.0068
  the proposals     0.0159    ...               ...

Decoding Process

  Maria no dio una bofetada a la bruja verde

• Build translation left to right
  – select foreign words to be translated

Decoding Process

  Maria no dio una bofetada a la bruja verde
  Mary

• Build translation left to right
  – select foreign words to be translated
  – find English phrase translation
  – add English phrase to end of partial translation
  – mark foreign words as translated

Decoding Process

  Maria no dio una bofetada a la bruja verde
  Mary did not

• One-to-many translation ("no" → "did not")

  Maria no dio una bofetada a la bruja verde
  Mary did not slap

• Many-to-one translation ("dio una bofetada" → "slap")

Decoding Process

  Maria no dio una bofetada a la bruja verde
  Mary did not slap the

• Many-to-one translation ("a la" → "the")

  Maria no dio una bofetada a la bruja verde
  Mary did not slap the green

• Reordering ("verde" translated before "bruja")

Decoding Process

  Maria no dio una bofetada a la bruja verde
  Mary did not slap the green witch

• Translation finished

Translation Options

  [figure: table of translation options for each span, e.g. Maria → Mary; no → not, did not, no, did not give; dio una bofetada → slap, a slap; a la → to the, to, the; bruja → witch; verde → green; bruja verde → green witch]

• Look up possible phrase translations
  – many different ways to segment words into phrases
  – many different ways to translate each phrase
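The lookup of translation options per span can be sketched in a few lines of Python; the `phrase_table` dictionary, the `max_len` limit, and the example entries are illustrative, not Pharaoh's actual data structures.

```python
def translation_options(src_words, phrase_table, max_len=3):
    """Collect all phrase translations for every span of the input.

    Returns a dict mapping (start, end) spans (end exclusive) to the
    list of (english_phrase, probability) options found in the table.
    """
    options = {}
    for start in range(len(src_words)):
        for end in range(start + 1, min(start + max_len, len(src_words)) + 1):
            src_phrase = " ".join(src_words[start:end])
            if src_phrase in phrase_table:
                options[(start, end)] = phrase_table[src_phrase]
    return options

# Toy table with a few of the entries shown on the slide.
table = {
    "Maria": [("Mary", 1.0)],
    "no": [("not", 0.4), ("did not", 0.6)],
    "dio una bofetada": [("slap", 0.5), ("a slap", 0.5)],
}
opts = translation_options("Maria no dio una bofetada".split(), table)
```

Each covered span then seeds hypothesis expansions during decoding.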

Hypothesis Expansion

  Maria no dio una bofetada a la bruja verde

  e:              e: Mary
  f: ---------    f: *--------
  p: 1            p: .534

• Start with an empty hypothesis
  – e: no English words
  – f: no foreign words covered
  – p: probability 1
• Pick a translation option
• Create a new hypothesis
  – e: add English phrase "Mary"
  – f: first foreign word covered
  – p: probability 0.534
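The hypothesis record drawn on these slides (e, f, p, plus a back-pointer for later backtracking) can be modeled as a small immutable data structure; this is a hypothetical sketch, not Pharaoh's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Hypothesis:
    english: Tuple[str, ...]             # e: English words produced so far
    coverage: Tuple[bool, ...]           # f: which foreign words are covered
    prob: float                          # p: probability of this partial translation
    back: Optional["Hypothesis"] = None  # pointer for backtracking

def expand(hyp, start, end, english_phrase, phrase_prob):
    """Apply one translation option covering foreign words [start, end)."""
    if any(hyp.coverage[start:end]):
        raise ValueError("span already translated")
    cov = list(hyp.coverage)
    for i in range(start, end):
        cov[i] = True
    return Hypothesis(hyp.english + tuple(english_phrase.split()),
                      tuple(cov), hyp.prob * phrase_prob, hyp)

# Empty hypothesis for the 9-word sentence, then cover "Maria" -> "Mary".
empty = Hypothesis((), (False,) * 9, 1.0)
h1 = expand(empty, 0, 1, "Mary", 0.534)
```

Each expansion multiplies in the phrase probability, exactly as p goes from 1 to 0.534 on the slide.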

A Quick Word on Probabilities

• Not going into detail here, but...
• Translation model
  – phrase translation probability p(Mary|Maria)
  – reordering costs
  – phrase/word count costs
  – ...
• Language model
  – uses trigrams:
  – p(Mary did not) = p(Mary|<s>) × p(did|<s>,Mary) × p(not|Mary,did)

Hypothesis Expansion

  Maria no dio una bofetada a la bruja verde

  e:              e: Mary       e: witch
  f: ---------    f: *--------  f: -------*-
  p: 1            p: .534       p: .182

• Add another hypothesis
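The trigram computation above might look as follows in code; the probabilities in `lm` are made up for illustration.

```python
def trigram_score(words, lm):
    """Score a sentence-initial word sequence with a trigram model.

    lm maps (context..., word) tuples to probabilities; '<s>' marks the
    sentence start, as in p(Mary|<s>) * p(did|<s>,Mary) * p(not|Mary,did).
    """
    history = ["<s>"]
    p = 1.0
    for w in words:
        context = tuple(history[-2:])
        p *= lm.get(context + (w,), 1e-6)  # tiny floor for unseen n-grams
        history.append(w)
    return p

# Illustrative probabilities, not taken from a real language model.
lm = {("<s>", "Mary"): 0.1,
      ("<s>", "Mary", "did"): 0.3,
      ("Mary", "did", "not"): 0.8}
score = trigram_score(["Mary", "did", "not"], lm)  # 0.1 * 0.3 * 0.8
```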

Hypothesis Expansion

  Maria no dio una bofetada a la bruja verde

  e:            e: Mary       e: did not    e: slap      e: the        e: green witch
  f: ---------  f: *--------  f: **-------  f: *****---- f: *******--  f: *********
  p: 1          p: .534       p: .154       p: .015      p: .004283    p: .000271

• Further hypothesis expansion (other hypotheses in the stacks include e: witch, f: -------*-, p: .182 and e: ... slap, f: *-***----, p: .043)
• ... until all foreign words are covered
  – find the best hypothesis that covers all foreign words
  – backtrack to read off the translation

Hypothesis Expansion

  Maria no dio una bofetada a la bruja verde

• Adding more and more hypotheses
⇒ Explosion of the search space

Explosion of Search Space

• The number of hypotheses is exponential in the sentence length
⇒ Decoding is NP-complete [Knight, 1999]
⇒ Need to reduce the search space
  – risk-free: hypothesis recombination
  – risky: histogram/threshold pruning

Hypothesis Recombination

  [figure: two derivation paths reach the same partial translation "Mary did not give", one with p=0.092 and one with p=0.044]

• Different paths to the same partial translation
⇒ Combine paths
  – drop the weaker hypothesis
  – keep a pointer from the worse path

Hypothesis Recombination

  [figure: hypothesis "Mary did not give" (p=0.092) and hypothesis "Joe did not give" (p=0.017); both end in "did not give" and cover the same foreign words]

• Recombined hypotheses do not have to match completely
• No matter what is added later, the weaker path can be dropped, if:
  – the last two English words match (matters for the language model)
  – the foreign word coverage vectors match (affects future paths)
⇒ Combine paths
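The two matching conditions can be captured in a recombination key; a sketch assuming a trigram language model, so two English words of history suffice. The function names are hypothetical.

```python
def recombination_key(english, coverage):
    """Two hypotheses may be recombined iff these keys are equal:
    the coverage vector (determines what can still be translated) and
    the last two English words (all a trigram LM will ever look at)."""
    return (coverage, tuple(english[-2:]))

def recombine(stack, english, coverage, prob):
    """Keep only the better-scoring hypothesis per key.

    stack maps key -> (prob, english); weaker arrivals are dropped."""
    key = recombination_key(english, coverage)
    if key not in stack or stack[key][0] < prob:
        stack[key] = (prob, english)
    return stack

stack = {}
cov = (True, True, True, False)
recombine(stack, ("Mary", "did", "not", "give"), cov, 0.092)
recombine(stack, ("Joe", "did", "not", "give"), cov, 0.017)  # weaker, dropped
```

In a full decoder the weaker hypothesis would not be discarded outright but kept on a pointer for n-best list extraction.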

Pruning

• Hypothesis recombination is not sufficient
⇒ Heuristically discard weak hypotheses
• Organize hypotheses in stacks, e.g. by
  – same foreign words covered
  – same number of foreign words covered (Pharaoh does this)
  – same number of English words produced
• Compare hypotheses within a stack, discard bad ones
  – histogram pruning: keep the top n hypotheses in each stack (e.g., n = 100)
  – threshold pruning: keep hypotheses whose cost is at most α times the cost of the best hypothesis in the stack (e.g., α = 0.001)

Hypothesis Stacks

  [figure: six hypothesis stacks (1–6), one per number of foreign words translated]

• Organization of hypotheses into stacks
  – here: based on the number of foreign words translated
  – during translation, all hypotheses from one stack are expanded
  – expanded hypotheses are placed into stacks
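Both pruning schemes can be sketched together; `histogram_n` and `alpha` mirror the example values n = 100 and α = 0.001, and scores are treated as probabilities (higher is better), which is an assumption of this sketch.

```python
def prune(stack, histogram_n=100, alpha=0.001):
    """Apply histogram and threshold pruning to one hypothesis stack.

    stack: list of (score, hypothesis) with higher score = better.
    Histogram pruning keeps the top n entries; threshold pruning then
    drops anything scoring worse than alpha times the best.
    """
    stack = sorted(stack, key=lambda x: x[0], reverse=True)[:histogram_n]
    if not stack:
        return stack
    best = stack[0][0]
    return [(s, h) for s, h in stack if s >= alpha * best]

hyps = [(0.154, "Mary did not"), (0.00001, "the the"), (0.043, "Mary slap")]
kept = prune(hyps, histogram_n=2, alpha=0.001)
```

Histogram pruning bounds memory hard; threshold pruning adapts to how peaked the stack's score distribution is.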

Comparing Hypotheses

• Comparing hypotheses with the same number of foreign words covered:

  e: Mary did not        e: the
  f: **-------           f: -----**--
  p: 0.154               p: 0.354

  better partial         covers easier part
  translation            → lower cost

• A hypothesis that covers an easy part of the sentence is preferred
⇒ Need to consider the future cost of uncovered parts

Future Cost Estimation

  a la → to the

• Estimate the cost to translate the remaining part of the input
• Step 1: estimate the future cost for each translation option
  – look up the translation model cost
  – estimate the language model cost (no prior context)
  – ignore the reordering model cost
  → LM × TM = p(to) × p(the|to) × p(to the|a la)

Future Cost Estimation: Step 2

  a la:  to the   cost = 0.0372
         to       cost = 0.0299
         the      cost = 0.0354

• Step 2: find the cheapest cost among the translation options

Future Cost Estimation: Step 3

  Maria no dio una bofetada a la bruja verde

• Step 3: find the cheapest future cost path for each span
  – can be done efficiently by dynamic programming
  – the future cost for every span can be precomputed
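The dynamic program over spans can be sketched as follows; `cheapest_option` and the toy numbers are illustrative. Costs are modeled as probabilities here, so combining sub-spans multiplies and "cheapest" means highest probability.

```python
def future_costs(n, cheapest_option):
    """Precompute the best future cost for every span [i, j).

    cheapest_option[(i, j)] is the best probability of any single
    translation option covering the span; spans are also covered by
    combining sub-spans. Dynamic programming over increasing span
    length, so sub-span results are always available.
    """
    fc = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = cheapest_option.get((i, j), 0.0)
            for k in range(i + 1, j):  # best split into two sub-spans
                best = max(best, fc[(i, k)] * fc[(k, j)])
            fc[(i, j)] = best
    return fc

# Toy 3-word input with illustrative option probabilities.
opts = {(0, 1): 0.5, (1, 2): 0.2, (2, 3): 0.4, (1, 3): 0.3}
fc = future_costs(3, opts)
```

For an n-word sentence this fills O(n²) spans, each in O(n) time, so the whole table costs O(n³), negligible next to the search itself.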

Future Cost Estimation: Application

  Maria no dio una bofetada a la bruja verde

  [figure: hypothesis "Mary ... slap" (f: *-***----, p: .043) with future cost estimate fc: .0006672 for the uncovered spans, giving p × fc = .000029]

• Use future cost estimates when pruning hypotheses
• For each hypothesis:
  – look up the future cost for each maximal contiguous uncovered span
  – factor it into the actually accumulated cost of the hypothesis for pruning

Pharaoh

• A beam search decoder for phrase-based models
  – works with various phrase-based models
  – beam search algorithm
  – time complexity roughly linear in input length
  – good quality takes about 1 second per sentence
• Very good performance in the DARPA/NIST evaluation
• Freely available for researchers
  http://www.isi.edu/licensed-sw/pharaoh/

Running the Decoder

• An example run of the decoder:

  % echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini > out
  Pharaoh v1.2.9, written by Philipp Koehn
  a beam search decoder for phrase-based statistical machine translation models
  (c) 2002-2003 University of Southern California
  (c) 2004 Massachusetts Institute of Technology
  (c) 2005 University of Edinburgh, Scotland
  loading language model from europarl.srilm
  loading phrase translation table from phrase-table, stored 21, pruned 0, kept 21
  loaded data structures in 2 seconds
  reading input sentences
  translating 1 sentences.translated 1 sentences in 0 seconds
  % cat out
  this is a small house

Phrase Translation Table

• Core model component is the phrase translation table:

  der ||| the ||| 0.3
  das ||| the ||| 0.4
  das ||| it ||| 0.1
  das ||| this ||| 0.1
  die ||| the ||| 0.3
  ist ||| is ||| 1.0
  ist ||| 's ||| 1.0
  das ist ||| it is ||| 0.2
  das ist ||| this is ||| 0.8
  es ist ||| it is ||| 0.8
  es ist ||| this is ||| 0.2
  ein ||| a ||| 1.0
  ein ||| an ||| 1.0
  klein ||| small ||| 0.8
  klein ||| little ||| 0.8
  kleines ||| small ||| 0.2
  kleines ||| little ||| 0.2
  haus ||| house ||| 1.0
  alt ||| old ||| 0.8
  altes ||| old ||| 0.2
  gibt ||| gives ||| 1.0
  es gibt ||| there is ||| 1.0
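The `src ||| tgt ||| prob` lines shown above are straightforward to parse; a minimal reader sketch (not Pharaoh's own loader):

```python
from collections import defaultdict

def load_phrase_table(lines):
    """Parse the 'src ||| tgt ||| prob' phrase table format into a
    dict mapping each source phrase to its (target, prob) options."""
    table = defaultdict(list)
    for line in lines:
        src, tgt, prob = (field.strip() for field in line.split("|||"))
        table[src].append((tgt, float(prob)))
    return table

sample = [
    "das ||| the ||| 0.4",
    "das ||| this ||| 0.1",
    "das ist ||| this is ||| 0.8",
]
table = load_phrase_table(sample)
```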

Trace

• Running the decoder with switch "-t":

  % echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -t
  [...]
  this is |0.014086|0|1| a |0.188447|2|2| small |0.000706353|3|3| house |1.46468e-07|4|4|

• Trace for each applied phrase translation:
  – output phrase (this is)
  – cost incurred by this phrase (0.014086)
  – coverage of foreign words (0-1)

Reordering Example

• Sometimes phrases have to be reordered:

  % echo 'ein kleines haus ist das' | pharaoh -f pharaoh.ini -t -d 0.5
  [...]
  this |0.000632805|4|4| is |0.13853|3|3| a |0.0255035|0|0| small |0.000706353|1|1| house |1.46468e-07|2|2|

• The first output phrase (this) is a translation of the 4th word
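The `-t` trace lines can be split back into structured tuples; a small sketch assuming the `phrase |cost|start|end|` pattern shown above.

```python
import re

def parse_trace(line):
    """Split a -t trace line into (phrase, cost, start, end) tuples."""
    pattern = re.compile(r"(.+?)\s*\|([^|]+)\|(\d+)\|(\d+)\|\s*")
    return [(m.group(1).strip(), float(m.group(2)),
             int(m.group(3)), int(m.group(4)))
            for m in pattern.finditer(line)]

trace = "this is |0.014086|0|1| a |0.188447|2|2| small |0.000706353|3|3|"
steps = parse_trace(trace)
```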

Hypothesis Accounting

• The switch "-v 2" provides detailed run-time information:

  % echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -v 2
  [...]
  HYP: 114 added, 284 discarded below threshold, 0 pruned, 58 merged.
  BEST: this is a small house -28.9234

• Statistics over how many hypotheses were generated
  – 114 hypotheses were added to hypothesis stacks
  – 284 hypotheses were discarded because they were too bad
  – 0 hypotheses were pruned, because a stack got too big
  – 58 hypotheses were merged due to recombination
• Probability of the best translation: exp(-28.9234)

Translation Options

• Even more run-time information is revealed with "-v 3":

  [das;2]
    the<1>, pC=-0.916291, c=-5.78855
    it<2>, pC=-2.30259, c=-8.0761
    this<3>, pC=-2.30259, c=-8.00205
  [ist;4]
    is<4>, pC=0, c=-4.92223
    's<5>, pC=0, c=-6.11591
  [ein;7]
    a<8>, pC=0, c=-5.5151
    an<9>, pC=0, c=-6.41298
  [kleines;9]
    small<10>, pC=-1.60944, c=-9.72116
    little<11>, pC=-1.60944, c=-10.0953
  [haus;10]
    house<12>, pC=0, c=-9.26607
  [das ist;5]
    it is<6>, pC=-1.60944, c=-10.207
    this is<7>, pC=-0.223144, c=-10.2906

• Translation model cost (pC) and future cost estimates (c)

Future Cost Estimation

• Pre-computation of the future cost estimates:

  future costs from 0 to 0 is -5.78855
  future costs from 0 to 1 is -10.207
  future costs from 0 to 2 is -15.7221
  future costs from 0 to 3 is -25.4433
  future costs from 0 to 4 is -34.7094
  future costs from 1 to 1 is -4.92223
  future costs from 1 to 2 is -10.4373
  future costs from 1 to 3 is -20.1585
  future costs from 1 to 4 is -29.4246
  future costs from 2 to 2 is -5.5151
  future costs from 2 to 3 is -15.2363
  future costs from 2 to 4 is -24.5023
  future costs from 3 to 3 is -9.72116
  future costs from 3 to 4 is -18.9872
  future costs from 4 to 4 is -9.26607

Hypothesis Expansion

• Start of the beam search: the first hypothesis (das → the)

  creating hypothesis 1 from 0 ( ... )
    base score 0
    covering 0-0: das
    translated as: the => translation cost -0.916291
    distance 0 => distortion cost 0
    language model cost for 'the' -2.03434
    word penalty -0
    score -2.95064 + futureCost -29.4246 = -32.3752
    new best estimate for this stack
    merged hypothesis on stack 1, now size 1

Hypothesis Expansion

• Another hypothesis (das ist → this is)

  creating hypothesis 12 from 0 ( ... )
    base score 0
    covering 0-1: das ist
    translated as: this is => translation cost -0.223144
    distance 0 => distortion cost 0
    language model cost for 'this' -3.06276
    language model cost for 'is' -0.976669
    word penalty -0
    score -4.26258 + futureCost -24.5023 = -28.7649
    new best estimate for this stack
    merged hypothesis on stack 2, now size 2

Hypothesis Expansion

• Hypothesis recombination

  creating hypothesis 27 from 3 ( ... this )
    base score -5.36535
    covering 1-1: ist
    translated as: is => translation cost 0
    distance 0 => distortion cost 0
    language model cost for 'is' -0.976669
    word penalty -0
    score -6.34202 + futureCost -24.5023 = -30.8443
    worse than existing path to 12, discarding

Hypothesis Expansion

• A bad hypothesis that falls out of the beam

  creating hypothesis 52 from 6 ( ... a )
    base score -6.65992
    covering 0-0: das
    translated as: this => translation cost -2.30259
    distance -3 => distortion cost -3
    language model cost for 'this' -8.69176
    word penalty -0
    score -20.6543 + futureCost -23.9095 = -44.5637
    estimate below threshold, discarding

Generating Best Translation

• Generating the best translation
  – find the best final hypothesis (442)
  – trace back the path to the initial hypothesis

  best hypothesis 442
  [ 442 => 343 ]
  [ 343 => 106 ]
  [ 106 => 12 ]
  [ 12 => 0 ]
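The backtracking step can be sketched as follows; the dictionary-based hypotheses and the phrases attached to each ID are hypothetical, mirroring the chain 442 => 343 => 106 => 12 => 0 from the trace.

```python
def read_off_translation(final_hyp):
    """Follow back-pointers to the initial hypothesis, collecting the
    English phrase attached to each step, then reverse to read the
    translation left to right."""
    phrases = []
    hyp = final_hyp
    while hyp is not None:
        if hyp["phrase"]:
            phrases.append(hyp["phrase"])
        hyp = hyp["back"]
    return " ".join(reversed(phrases))

# Hypothetical chain matching the trace 0 -> 12 -> 106 -> 343 -> 442.
h0 = {"phrase": "", "back": None}          # empty initial hypothesis
h12 = {"phrase": "this is", "back": h0}
h106 = {"phrase": "a", "back": h12}
h343 = {"phrase": "small", "back": h106}
h442 = {"phrase": "house", "back": h343}
best = read_off_translation(h442)
```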

Translation Table Pruning

• Limiting the translation table size speeds up search
• Histogram pruning: keep only the top n entries
• Threshold pruning: keep only entries that score at most α times worse than the best

Beam Size

• Trade-off between speed and quality via the beam size

  % echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini -s 10 -v 2
  [...]
  collected 12 translation options
  HYP: 78 added, 122 discarded below threshold, 33 pruned, 20 merged.
  BEST: this is a small house -28.9234

  Beam size  Threshold  Hyp. added  Hyp. discarded  Hyp. pruned  Hyp. merged
  1000       unlimited  634         0               0            1306
  100        unlimited  557         32              199          572
  100        0.00001    144         284             0            58
  10         0.00001    78          122             33           20
  1          0.00001    9           19              4            0

Limits on Reordering

• Reordering may be limited
  – monotone translation: no reordering at all
  – only phrase movements of at most n words
• Reordering limits speed up search
• Current reordering models are weak, so limits improve translation quality

Word Lattice Generation

  [figure: search graph over partial translations ("Mary" / "Joe" / "did not give" ...) with path probabilities, and the word lattice derived from it]

• The search graph can easily be converted into a word lattice
  – can be further mined for n-best lists
  → enables reranking approaches
  → enables discriminative training

Sample N-Best List

• N-best list from Pharaoh:

  Translation ||| Reordering LM TM WordPenalty ||| Score
  this is a small house ||| 0 -27.0908 -1.83258 -5 ||| -28.9234
  this is a little house ||| 0 -28.1791 -1.83258 -5 ||| -30.0117
  it is a small house ||| 0 -27.108 -3.21888 -5 ||| -30.3268
  it is a little house ||| 0 -28.1963 -3.21888 -5 ||| -31.4152
  this is an small house ||| 0 -31.7294 -1.83258 -5 ||| -33.562
  it is an small house ||| 0 -32.3094 -3.21888 -5 ||| -35.5283
  this is an little house ||| 0 -33.7639 -1.83258 -5 ||| -35.5965
  this is a house small ||| -3 -31.4851 -1.83258 -5 ||| -36.3176
  this is a house little ||| -3 -31.5689 -1.83258 -5 ||| -36.4015
  it is an little house ||| 0 -34.3439 -3.21888 -5 ||| -37.5628
  it is a house small ||| -3 -31.5022 -3.21888 -5 ||| -37.7211
  this is an house small ||| -3 -32.8999 -1.83258 -5 ||| -37.7325
  it is a house little ||| -3 -31.586 -3.21888 -5 ||| -37.8049
  this is an house little ||| -3 -32.9837 -1.83258 -5 ||| -37.8163
  the house is a little ||| -7 -28.5107 -2.52573 -5 ||| -38.0364
  the is a small house ||| 0 -35.6899 -2.52573 -5 ||| -38.2156
  is it a little house ||| -4 -30.3603 -3.91202 -5 ||| -38.2723
  the house is a small ||| -7 -28.7683 -2.52573 -5 ||| -38.294
  it 's a small house ||| 0 -34.8557 -3.91202 -5 ||| -38.7677
  this house is a little ||| -7 -28.0443 -3.91202 -5 ||| -38.9563
  it 's a little house ||| 0 -35.1446 -3.91202 -5 ||| -39.0566
  this house is a small ||| -7 -28.3018 -3.91202 -5 ||| -39.2139

XML Interface

  Er erzielte 17,55 Punkte .  (He scored 17.55 points.)

• Add additional translation options
  – number translation
  – noun phrase translation [Koehn, 2003]
  – name translation
• Additional options
  – provide multiple translations
  – provide a probability distribution along with the translations
  – allow bypassing of the provided translations

Thank You!

• Questions?