Enriching Phrase-Based Statistical Machine Translation with POS Information

Miriam Kaeshammer and Dominikus Wetzel
Department of Computational Linguistics, Saarland University, Saarbrücken, Germany
{miriamk,dwetzel}@coli.uni-sb.de

Abstract

This work presents an extension to phrase-based statistical machine translation models which incorporates linguistic knowledge, namely part-of-speech information. Scores are added to the standard phrase table which represent how the phrases correspond to their translations on the part-of-speech level. We suggest two different kinds of scores. They are learned from a POS-tagged version of the parallel training corpus. The decoding strategy does not have to be modified. Our experiments show that our extended models achieve similar BLEU and NIST scores compared to the standard model. Additional manual investigation reveals local improvements in the translation quality.

1 Introduction

Currently, the most prominent paradigm in statistical machine translation (SMT) is phrase-based models (Koehn et al., 2003), in which text chunks (phrases) of one language are mapped to corresponding text chunks in another language. This standard approach works only with the surface forms of words; no linguistic information is used for establishing the mapping between phrases or for generating the final translation. It has been shown, however, that integrating linguistic knowledge, e.g. part-of-speech (POS) or morphological information, in pre- or post-processing or directly into the translation model improves the translation quality (cf. Section 2).

Factored translation models (Koehn and Hoang, 2007) are one extension of the standard phrase-based approach which allows rich linguistic knowledge to be included in the translation model. Additional models for the specified factors are used, which makes decoding computationally more complex, as the mapping between the factors can result in an explosion of translation options.

With this work, we explore a different approach to integrating linguistic knowledge, in particular POS information, into the phrase-based model. The standard phrase (translation) table is enriched with new scores which encode the correspondence on the POS level between the two phrases of a phrase pair, for example the probability of "translating" the POS sequence of one phrase into the POS sequence of the other phrase. We propose two methods to obtain such POS scores. These extra scores are additional feature functions in the log-linear framework for computing the best translation (Och and Ney, 2002). They supply further information about the phrase pairs under consideration during decoding, but do not increase the number of translation options.

The presented extension makes use of neither hand-crafted rules nor manually identified patterns. It can therefore be performed fully automatically. Furthermore, our approach is language-independent and does not rely on a specific POS tagger or tag set. Adaptation to other language pairs is hence straightforward.

This paper first describes related work and then introduces our extended translation model. Evaluation results are reported for experiments with a German-English system. We finally discuss our work and suggest possible further extensions.

2 Related Work

There are several strategies for improving the quality of standard phrase-based SMT by incorporating linguistic knowledge, in particular POS information.

One such approach is to modify the data in a pre-processing step. For example, Collins et al. (2005) parse the sentences of the source language and restructure the word order such that it matches the target language word order more

Proceedings of the Student Research Workshop associated with RANLP 2011, pages 33–40, Hissar, Bulgaria, 13 September 2011.

closely. Language-specific, manually devised rules are employed. Popović and Ney (2006) follow the same idea, but make use of manually defined patterns based on POS information, e.g. local adjective-noun reordering for Spanish and long-range reorderings of German verbs. Essentially, this strategy aims at facilitating and improving the word alignment. Another example along those lines is (Carpuat, 2009): surface words in the training data are replaced with their lemma and POS tag. Once the improved alignment is obtained, the phrase extraction is based on the original training data, so a different decoding strategy is not necessary. Another data-driven approach is presented in (Rottmann and Vogel, 2007), where word reordering rules based on POS tags are learned. A word lattice with all reorderings (including probabilities for each) is constructed and used by the decoder to make more informed decisions.

Another strategy is concerned with enhancing the system's output in a post-processing step. Koehn and Knight (2003) propose a method for noun phrases where feature-rich reranking is applied to a list of n-best translations.

Instead of the above pre- or post-processing steps, Koehn and Hoang (2007) present factored models which allow for a direct integration of linguistic information into the phrase-based translation model. Each surface word is represented by a vector of linguistic factors. It is a general framework, exemplified with POS and morphological enrichment. In order to tackle the increased number of translation options introduced by additional factors, the decoding strategy needs to be adapted: translation options are precomputed and early pruning is applied. Factored models including POS information (amongst others) are employed, for example, by Holmqvist et al. (2007) for German-English translation and by Singh and Bandyopadhyay (2010) for the resource-poor language pair Manipuri-English.

3 Extended Translation Model

The general idea is to integrate POS information into the translation process by adding one or several POS scores to each phrase pair in the standard phrase table, which represents the translation model and usually contains phrase translation probabilities, lexical weightings and a phrase penalty. The additional scores reflect how well the POS sequence which underlies one phrase of the pair corresponds to the POS sequence of the other phrase of the pair. Two concrete methods to calculate this correspondence will be described in Section 3.2. The new scores can be integrated into the log-linear framework as additional feature functions.

Figure 1 shows two phrase pairs from a German-English phrase table. In this particular case, the POS scores should encode the correspondence between ART ADJA NN on the German side and DT JJ NNS VBN (a) or DT JJ NNS (b) on the English side. Intuitively, ART ADJA NN corresponds better to DT JJ NNS than to DT JJ NNS VBN. Phrase pair (b) should therefore have higher POS scores.

(a) die möglichen risiken ||| the possible risks posed ||| 1.0 [. . . ] 0.155567 0.000520715
(b) die möglichen risiken ||| the possible risks ||| 0.1 [. . . ] 0.178425 0.0249141

Figure 1: Two phrase pairs, each with the first standard translation score and two new POS scores.

The transition from the standard translation model to the extended one can be broken up into two major steps: (1) POS-Mapping, which is the task of mapping each phrase pair in the standard phrase table to its underlying pair of POS sequences (henceforth POS phrase pair), and (2) POS-Scoring, which refers to assigning POS scores to each phrase pair based on the previously determined POS phrase pair.

3.1 POS-Mapping

Obtaining the part-of-speech information for each phrase in the phrase table cannot be achieved by tagging the phrases with a regular POS tagger: such taggers are usually written for and trained on full sentences. Phrases would therefore be assigned incorrect POS tags, since a phrase without its context and the same phrase occurring in an actual sentence are likely to be tagged with different POS sequences. Since the phrase pairs in the phrase table originate from specific contexts in the parallel training corpus, we require a phrase to have the same POS sequence as it has in the context of its sentence.

Consequently, our approach takes the following steps: First, both sides of the training corpus are POS-tagged. Secondly, the untagged phrases in the phrase table and their tagged counterparts in the corpus are associated with each other to establish a mapping from phrase pairs to POS phrase pairs. This procedure is consequently not called POS-Tagging, but rather POS-Mapping.

Our approach is to apply the same phrase extraction algorithm again that was used to obtain the standard phrase table. Phrase pairs are extracted from the POS-tagged parallel training corpus, thereby taking over the word alignments that were established for the parallel sentences when extracting the standard phrase pairs. In the resulting word/POS phrase table, a token is a combination of a word with a POS tag. For this to work, words and POS tags must be delimited by some special character other than a space. Thanks to the reused word alignments, the word/POS phrase table contains each phrase pair of the standard phrase table at least once. If a phrase pair occurs with several different POS sequences in the training data, the word/POS phrase table contains an entry for each of them.

By matching the standard phrase table against the word/POS phrase table, the POS phrase pair(s) for each standard phrase pair are obtained. The word/POS phrase table is hence used as the mapping element between phrase pairs and their corresponding POS phrase pairs. The result of this POS-Mapping step is a 1 : k (with k ≥ 1) mapping from phrase pairs to POS phrase pairs. The POS phrase pairs are the basis for calculating the POS scores, as explained in the following subsection.

An alternative approach to POS-Mapping would be a search for the phrases in the tagged sentences. This, however, requires elaborate techniques such as indexing.

3.2 POS-Scoring

We propose two different kinds of POS scores to encode the correspondence on the POS level between the two phrases of a phrase pair: POS Phrase Translation (PPT) and POS Phrase Frequency (PPF) scores.

PPT scores: PPT scores encode how likely it is to "translate" one POS phrase into another POS phrase. The idea behind these scores, and also the way they are obtained, is very similar to the scores in a standard phrase table, namely translation probabilities and lexical weightings. The difference is that the tokens that constitute the phrases are POS tags. Consequently, phrase pair extraction and phrase pair scoring (maximum likelihood estimation for translation probability and lexical weighting in both translation directions) are performed on a version of the parallel training corpus in which each word is substituted by its POS tag. Again, as in Section 3.1 for the word/POS phrase table, the word alignments that were established to extract the standard phrase pairs are reused.

In this way, a POS phrase table is trained which has four scores attached to each POS phrase pair. Those are the desired PPT scores. Due to the reused word alignment, it contains all POS phrase pairs that also occur in the word/POS phrase table.

The standard phrase table is combined with the new PPT scores via the mapping from phrase pairs to POS phrase pairs introduced in Section 3.1. As this is a 1 : k mapping, it needs to be decided which of the k POS phrase pairs and corresponding scores to use. Currently, we choose the POS phrase pair for which the sum of the scores is maximal and use the corresponding PPT scores ŝ:

    ŝ = argmax_{s_k} sum_{i=1}^{|s_k|} s_k(i)        (1)

where k ranges over the POS phrase pairs which are mapped to the current phrase pair, s_k are the (four) PPT scores of the k-th POS phrase pair and i is an index into these scores. This decision rule is a crucial point in the extended model where additional experiments using other techniques should be conducted.

From the four PPT scores in ŝ, several extended translation models have been derived which differ in the number of scores that are added to the standard phrase table: i. all 4 PPT scores, ii. only the phrase translation probabilities (PPT scores 1 and 3), iii. only the lexical weightings (PPT scores 2 and 4) and iv. only the inverse phrase translation probability (PPT score 1).

As an example, the last two scores on each line in Figure 1 are PPT scores (phrase translation probabilities) that were obtained with the described method. Indeed, both are higher for (b), which coincides with our expectation.

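The selection rule in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; all names (select_ppt_scores, ppt_table) are invented for the example, the POS sequences are taken from Figure 1, and the first two scores of each tuple are made up.

```python
# Sketch of the decision rule in Equation 1: among the k POS phrase
# pairs mapped to one surface phrase pair, pick the pair whose four
# PPT scores have the largest sum, and use those scores.

def select_ppt_scores(pos_phrase_pairs, ppt_table):
    """pos_phrase_pairs: the k POS phrase pairs mapped to one surface
    phrase pair; ppt_table: POS phrase pair -> tuple of 4 PPT scores."""
    best = max(pos_phrase_pairs, key=lambda pp: sum(ppt_table[pp]))
    return ppt_table[best]

# Toy table with the POS sequences from Figure 1 (first two scores
# of each tuple are invented for illustration):
ppt_table = {
    ("ART ADJA NN", "DT JJ NNS VBN"): (0.1, 0.2, 0.155567, 0.000520715),
    ("ART ADJA NN", "DT JJ NNS"):     (0.3, 0.4, 0.178425, 0.0249141),
}
mapped = list(ppt_table)  # the 1:k mapping for one surface phrase pair
print(select_ppt_scores(mapped, ppt_table))
# -> (0.3, 0.4, 0.178425, 0.0249141), i.e. the DT JJ NNS pair wins
```

The same maximization over the k mapped POS phrase pairs is reused for the PPF score, only with a raw frequency instead of a score sum.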
PPF score: The PPF score encodes the raw frequency of POS phrase pairs, more specifically, how often a POS phrase pair occurs in the word/POS phrase table (see Section 3.1). The intuition behind it is that POS phrase pairs which correspond to more than one distinct surface phrase pair are more reliable than POS phrase pairs that produce only one type of phrase pair. The latter could, for example, originate from a wrong alignment. This score abstracts away from directly counting the phrase pair occurrences in the parallel training corpus, which is information that is already incorporated in the standard phrase table scores.

To combine the obtained counts with the standard phrase table, we again use the 1 : k mapping from phrase pairs to POS phrase pairs and select the maximum of the k PPF scores. As an example, phrase pair (a) in Figure 1 receives a PPF score of 289, while phrase pair (b) has PPF score 9735, according to the most frequent underlying POS phrase pair.

We anticipate the issue that shorter phrase pairs get higher counts, since their corresponding POS sequences are more likely to occur in the word/POS phrase table. This seems to result in a bias towards selecting shorter phrases during decoding, which stands in contrast to the phrase penalty commonly employed in phrase-based translation systems, which favors longer phrases. We assume that the tuning procedure will find weights for the feature functions such that those two complement each other.

4 Experiments

For our experiments we used the Moses phrase-based SMT toolkit (Koehn et al., 2007; Koehn, 2010) to train translation systems from German to English.

4.1 Data

As training data we used the German and English documents from the Europarl Corpus Release v5 (Koehn, 2005), excluding the standard portion (Q4/2000). The data was sentence-aligned, tokenized and lowercased with the provided scripts. Sentences longer than 40 tokens on either language side were removed together with their translations from the training corpus, resulting in about 1.1 million sentence pairs. From the held-out data, 3000 sentences for development and 2000 sentences for testing were randomly chosen.

To generate the POS-tagged version of the tokenized training data, we applied the OpenNLP 1.4 POS tagger (http://opennlp.sourceforge.net/) using the provided German and English models. Afterwards, the POS-tagged training corpus was lowercased.

For the language model, we used the English side of the complete training corpus containing the lowercased data (about 1.5 million sentences). The model was generated with the SRILM toolkit 1.5.8 (www.speech.sri.com/projects/srilm/) using 3-grams and Kneser-Ney discounting.

4.2 Setup

We used the Moses training script with the standard parameters, except for the alignment heuristic (grow-diag-final-and), together with GIZA++ (Och and Ney, 2003) to train the standard translation model. The obtained word alignment was used for phrase extraction and scoring in order to construct the word/POS phrase table and the POS phrase table.

We tuned our extended systems as well as the standard system with minimum error rate training (MERT) (Och, 2003). For the extended models, the tuning script that comes with Moses needs to be adapted slightly: additional triples specifying initialization and randomization ranges for the weights of our additional feature functions have to be inserted. Because the MERT algorithm can get trapped in a local maximum, several tuning runs for the same model with the same development data were performed.

We skipped recasing and detokenization, since we are only interested in the effect of our extended model with respect to the baseline.

4.3 Results

Table 1 shows the automatic evaluation of the outcome of the conducted experiments. Our extended model with all four PPT scores (t1) achieved the best results, followed by the baseline (t2) in terms of BLEU and by our PPF model according to NIST. However, the reported scores are similar, and the differences in performance between the extended models and the baselines are insignificant.

For two models, we report the performance of the systems that were obtained with two independent tuning instances (t1 and t2) in Table 1. The varying scores indicate the importance of the tuning step.
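Returning briefly to the PPF score of Section 3.2, the counting step can be sketched as follows. The sketch assumes word/POS tokens joined with a "|" delimiter (the paper only requires some non-space delimiter); the table contents and all names are invented for illustration and are not the authors' data or code.

```python
# Sketch of the PPF score: count how often each POS phrase pair
# underlies an entry of the word/POS phrase table, then give a surface
# phrase pair the maximum count over its k mapped POS phrase pairs.
from collections import Counter

def pos_sequence(tagged_phrase):
    """'die|ART möglichen|ADJA risiken|NN' -> 'ART ADJA NN'"""
    return " ".join(tok.rsplit("|", 1)[1] for tok in tagged_phrase.split())

def ppf_counts(word_pos_pairs):
    """word_pos_pairs: (tagged source, tagged target) entries of the
    word/POS phrase table; returns a Counter over POS phrase pairs."""
    return Counter((pos_sequence(src), pos_sequence(tgt))
                   for src, tgt in word_pos_pairs)

def ppf_score(mapped_pos_pairs, counts):
    """Maximum PPF count among the k POS phrase pairs mapped to one
    surface phrase pair."""
    return max(counts[pp] for pp in mapped_pos_pairs)

# Invented miniature word/POS phrase table:
table = [
    ("die|ART möglichen|ADJA risiken|NN", "the|DT possible|JJ risks|NNS"),
    ("die|ART neuen|ADJA regeln|NN",      "the|DT new|JJ rules|NNS"),
    ("die|ART möglichen|ADJA risiken|NN", "the|DT possible|JJ risks|NNS posed|VBN"),
]
counts = ppf_counts(table)
print(ppf_score([("ART ADJA NN", "DT JJ NNS")], counts))  # -> 2
```

Because (ART ADJA NN, DT JJ NNS) underlies two distinct table entries, it is considered more reliable than the pair seen only once, matching the intuition given for the PPF score.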

                                          BLEU     NIST
  Standard   Baseline (t1)                25.59    6.7329
  Model      Baseline (t2)                25.83    6.7817
  Extended   all 4 PPT scores (t1)        25.88    6.8091
  Model      all 4 PPT scores (t2)        25.60    6.7835
             PPT scores 1 and 3           25.58    6.7651
             PPT scores 2 and 4           25.66    6.7758
             PPT score 1                  25.61    6.7590
             PPF score                    25.73    6.7882

Table 1: Performance of the models on the test set.

To sum up, according to the automatic evaluation, none of our extended models clearly outperforms the baseline. On the one hand, this could suggest that the additional POS scores do not lead to better translation models. On the other hand, BLEU and NIST might simply not be able to reflect our improvements in the translation models and quality: they are automatic metrics, and we only provide one reference translation for each sentence in the development and test data. Consequently, further inspection of the translated data is necessary.

4.4 Manual Investigation

Out of the 2000 test sentences, our extended model with all four PPT scores (t1) provides the same translation as the baseline (t2) in 613 cases (470 for t2 of the extended model). To find out about the variations that occur in the translations which differ, we manually inspected some sample sentences. In the following, we present and describe the examples in Figure 2. In (2a) – (2f) the translation of our extended model is better than the one provided by the baseline system.

The baseline translation in (2a) is neither understandable nor grammatical. Our model manages to translate the two genitive constructions and provides a suitable translation for the verb, which is missing completely in the baseline. In (2b) the relative clause construction in the scope of the negation is missing in the baseline, which leads to a severe change in meaning. The sentence provided by the extended system, in contrast, is fully meaningful and understandable. Obviously, it is not perfect; for example, "philosophical sense" lacks a determiner.

The baseline system provides ungrammatical translations that are hardly understandable for the test sentences in (2c), (2d) and (2e). In (2c) "wie" is not translated as the interrogative pronoun, and in (2d) the infinitive verb is missing. Our extended system produces good translations for both sentences. The test sentence in (2e) is difficult for machine translation because the verb in the subordinate clause is omitted from the first part of the conjunction. In fact, neither system can handle it. However, the extended system at least manages to put the right content words into the two parts of the coordination; only the verb in the first part is missing.

Example (2f) shows that our extended model helps to convey the semantics of the source sentence. The translation given by the baseline is not completely wrong, but it fails to express the possibility of the conflict and also the process of getting into a conflict. The translation of our extended system ("which could come into conflict") conveys both.

There are also sentences within the test set on which the baseline system performs better than the extended model. In (2g), the translation given by our model lacks a conjugated verb. (2h) shows an instance of a wrongly translated pronoun by our extended system; the sentence is furthermore ungrammatical, whereas the translation by the baseline system is acceptable.

The given examples have revealed that the differences in the translations provided by the baseline system and our extended system are generally local. Often only a small number of words is affected. However, even local changes lead to better translations, as shown in examples (2a) – (2f). It is left to quantify these results to check whether the extended translation model overall introduces more improvements or deteriorations.

The examples in Figure 2 also illustrate why BLEU and NIST do not show a difference between the extended system and the baseline: even if a translation is acceptable, it is usually very different from the provided reference translation. The small improvements are consequently not reflected in the automatic scores.

5 Discussion & Future Work

With our extended model, we are able to incorporate linguistic information into the otherwise purely statistical MT approach. We have realized our approach within the framework provided by Moses and its tools, but other phrase-based SMT systems could be extended in the same way. Once the

scores encoding the additional information are calculated, almost no modification to existing code is necessary.

The presented method does not make use of any language-specific behavior or patterns, which leaves it open to any language combination, provided that POS taggers are available for the involved languages. Since no hand-crafted rules need to be designed for the extension, our approach can be applied to new language pairs with only a minimal amount of time and effort. Moreover, any POS tagger with any POS tag set can be used to annotate the training data. It is also noteworthy that POS tagging is only needed during training, not during decoding.

The automatic evaluation represented in the BLEU/NIST scores showed only an insignificant improvement for our extended system over the baseline. However, a manual investigation of the translated test data revealed qualitatively better translations. Some local phenomena seem to be handled better in the linguistically informed model. Certainly, in order to make reliable judgments, human evaluation of a representative set of translations is needed.

Tuning the weights of the feature functions is an essential step for obtaining a good translation system (cf. (Koehn, 2010)). The effect of different tuning instances on the translation output, and thus on BLEU/NIST, can be seen in our experimental results in Table 1. Accordingly, it needs to be determined whether the MERT algorithm is still capable of finding good weights when more than the standard weights need to be tuned. A review of the literature did not clarify the impact of the number of weights on MERT tuning. Other tuning algorithms could be considered. Furthermore, MERT relies on automatic evaluation metrics. Because of their shortcomings (cf. (Callison-Burch et al., 2006)), the tuning approach might not exploit the full potential of the additionally encoded linguistic information. An improvement would be to include a human-based evaluation component in MERT (cf. (Zaidan and Callison-Burch, 2009)).

A very important further step would be to fully compare our approach to factored models (using POS information on the source and target side) (Koehn and Hoang, 2007) under the same experimental conditions as reported in this work. From a theoretical point of view, the main difference between our approach and the factored models is that the linguistic information is explicitly encoded in several phrase tables in the latter, while in the former it is implicit in the additional score(s) in just one phrase table. As mentioned in Section 2, factored models have the shortcoming of a drastic rise in translation options during decoding. Our approach, in contrast, does not change the number of translation options. It rather provides more informed phrase pair selection criteria by means of the POS scores. The decoding strategy therefore does not need to be adapted.

Interestingly, Koehn and Hoang (2007) report only minor improvements in BLEU for their English-German system when using only the surface form and POS in the factored models. However, they report a greater improvement when also adding morphological information. This could suggest that POS information on its own is not informative enough to improve the BLEU score.

There are various ways to improve and extend the presented approach. One crucial point where we have made a rather ad hoc decision is the procedure in Equation 1. Ideally, one would want to use the POS scores that are optimal with respect to the translation result. Furthermore, this procedure should be improved such that it only considers the subset of POS scores that is actually used in the final phrase table.

Possible extensions of the models in our fashion are not tied to POS information only. One could, for example, incorporate more structured information such as dependency relations. This information would be assigned to a word just as the POS tag has been. More specifically, we suggest considering the following two approaches: (1) tokens are assigned the number of their dependents, e.g. Peter/0 likes/2 Mary/0; (2) dependent tokens are assigned a tuple specifying their dependency type and their head word, e.g. Peter/(subj,likes) likes/(root,nil) Mary/(obj,likes). As this approach might run into data sparsity problems, the dependency type could be omitted as a variant. Once one of the above syntax taggings is generated, the mapping and scores for the phrase table can be obtained just as before with the POS-Mapping/Scoring approach.

6 Conclusion

We have described a language-independent approach to incorporate linguistic information such

as POS tags into phrase-based SMT. We achieved this by enriching the phrase pairs in the standard phrase table with additional POS scores which reflect the correspondence between the underlying POS sequences of the phrases of each pair. Two kinds of POS scores have been proposed: POS Phrase Translation scores from a learned phrase table based on POS sequences, and POS Phrase Frequency scores, which are raw counts of POS sequence pairs. To assign the scores to the standard phrase pairs, they have been mapped to their underlying POS sequences (via a word/POS phrase table). In order to extract the same phrases across all phrase tables, the word alignment of the standard phrase table has been reused. In experiments for German-English, automatic evaluation showed minor differences in performance between the extended systems and the baseline. Additional manual inspection of the results revealed promising local improvements. Compared to the factored models, our extension uses linguistic information implicitly, does not provide additional translation options and therefore does not introduce further complexity for decoding.

Acknowledgments

This work was part of a project seminar at Saarland University. We would like to thank our supervisor Andreas Eisele from DFKI for the ideas that got us started and for his support. Part of this research was funded through the European Community's Seventh Framework Programme under grant agreement no. 231720, EuroMatrix Plus.

References

Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the Role of BLEU in Machine Translation Research. In Proceedings of EACL, pages 249–256.

Marine Carpuat. 2009. Toward Using Morphology in French-English Phrase-Based SMT. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 150–154, Athens, Greece. ACL.

Michael Collins, Philipp Koehn, and Ivona Kucerova. 2005. Clause Restructuring for Statistical Machine Translation. In Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, Michigan. ACL.

Maria Holmqvist, Sara Stymne, and Lars Ahrenberg. 2007. Getting to Know Moses: Initial Experiments on German-English Factored Translation. In Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic. ACL.

Philipp Koehn and Hieu Hoang. 2007. Factored Translation Models. In Proceedings of the 2007 Joint Conference on EMNLP and CoNLL, pages 868–876.

Philipp Koehn and Kevin Knight. 2003. Feature-Rich Statistical Translation of Noun Phrases. In Proceedings of the 41st Annual Meeting of the ACL, pages 311–318, Sapporo, Japan.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of HLT-NAACL 2003, pages 48–54.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Annual Meeting of the ACL.

Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of MT Summit.

Philipp Koehn. 2010. Moses. Statistical Machine Translation System. User Manual and Code Guide.

Franz Josef Och and Hermann Ney. 2002. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. In Proceedings of the 40th Annual Meeting of ACL, pages 295–302.

Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29:19–51, March.

Franz Josef Och. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting of ACL, pages 160–167.

Maja Popović and Hermann Ney. 2006. POS-based Word Reorderings for Statistical Machine Translation. In Proceedings of the International Conference on Language Resources and Evaluation, pages 1278–1283, Genoa, Italy.

Kay Rottmann and Stephan Vogel. 2007. Word Reordering in Statistical Machine Translation with a POS-Based Distortion Model. In Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI), Skövde, Sweden.

Thoudam Doren Singh and Sivaji Bandyopadhyay. 2010. Manipuri-English Bidirectional Statistical Machine Translation Systems using Morphology and Dependency Relations. In Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation, pages 83–91, Beijing, China. Coling 2010 Organizing Committee.

Omar F. Zaidan and Chris Callison-Burch. 2009. Feasibility of Human-in-the-loop Minimum Error Rate Training. In Proceedings of the 2009 Conference on EMNLP, pages 52–61, Singapore. ACL.

German sechsundsechzig prozent der gesamtbeschäftigung der gemeinschaft entfallen auf kleine und mittlere unternehmen [. . . ] Baseline the community sechsundsechzig per cent of total employment in small and medium-sized enterprises [. . . ] 4 PPT sechsundsechzig % of total employment of the community is generated by small and medium-sized enterprises [. . . ] Reference smes account for 66 % of total employment in the community [. . . ] (a) Handling genitive constructions and translating the main verb correctly

German es gibt kein volk in europa , das im philosophischen sinne neutral ist . Baseline there are no people in europe , in the philosophical sense is neutral . 4 PPT there is no people in europe , which is neutral in philosophical sense . Reference there is no nation in europe that is philosophically neutral . (b) Handling a relative clause correctly

German wie ist nun der konkrete stand der verhandlungen ? Baseline as is now the real state of negotiations ? 4 PPT so what exactly is the real state of negotiations ? Reference what stage has actually been reached in these negotiations ? (c) Translating interrogative pronoun properly

German sie haben natürlich recht , immer wieder auf diese frage zu verweisen . Baseline you are right , of course , to this question again and again . 4 PPT you are right , of course , always to refer to this question . Reference you are , of course , quite right to keep reverting to this question . (d) Missing verb in baseline translated properly in our model

German abschließend möchte ich noch sagen , dass die postdienstleistungen in schweden nicht schlechter und in gewisser weise sogar besser geworden sind . Baseline finally , i would like to say that the postal services in sweden and in some way not worse even improved . 4 PPT finally , i would like to say that the postal services not worse in sweden and in some way have become even better . Reference in conclusion , i would like to say that the postal service in sweden has not deteriorated , in some respects it has even improved . (e) Tricky coordinate construction with omitted verb

German auf diese weise [. . . ] könnten machtzentren geschaffen werden , die untereinander in konflikt geraten könnten . Baseline in this way [. . . ] machtzentren could be created , in conflict with each other . 4 PPT in this way [. . . ] machtzentren could be created , which could come into conflict with each other . Reference [. . . ] there is a real danger that this will result in conflicting centres of power . (f) Conveying correct semantics

German schließlich beruht jedes demokratische system auf dem vertrauen und dem zutrauen der menschen . Baseline finally , any democratic system is based on the confidence and the trust of the people . 4 PPT finally , any democratic system based on the confidence and the trust of the people . Reference after all , any democratic system is built upon the trust and confidence of the people . (g) Wrong translation due to missing verb

German der rat möchte daran erinnern , dass seine politik stets darauf abzielt , ein möglichst hohes niveau des verbraucherschutzes zu gewährleisten . Baseline the council would like to remind you that its policy has always been at the highest possible level of consumer protection . 4 PPT the council would like to remind you that his policy always aims , as a high level of consumer protection . Reference the council wishes to point out that its policy is always to afford consumers the highest possible level of protection . (h) Wrong pronoun chosen for translation

Figure 2: Example sentences for comparing our PPT extended model with the baseline. (2a) – (2f) reveal improvements, (2g) – (2h) show weaknesses.
