NRC: A Machine Translation Approach to Cross-Lingual Word Sense Disambiguation (SemEval-2013 Task 10)

Marine Carpuat
National Research Council
Ottawa, Canada

Abstract

This paper describes the NRC submission to the Spanish Cross-Lingual Word Sense Disambiguation task at SemEval-2013. Since this word sense disambiguation task uses Spanish translations of English words as gold annotation, it can be cast as a machine translation problem. We therefore submitted the output of a standard phrase-based system as a baseline, and investigated ways to improve its sense disambiguation performance. Using only local context information and no linguistic analysis beyond lemmatization, our machine translation system surprisingly yields the top precision score based on the best predictions. However, its top 5 predictions are weaker than those from other systems.

1 Introduction

This paper describes the systems submitted by the National Research Council Canada (NRC) for the Cross-Lingual Word Sense Disambiguation task at SemEval 2013 (Lefever and Hoste, 2013). As in the previous edition (Lefever and Hoste, 2010), this word sense disambiguation task asks systems to disambiguate English words by providing translations in other languages. It is therefore closely related to machine translation. Our work aims to explore this connection between machine translation and cross-lingual word sense disambiguation, by providing a machine translation baseline and investigating ways to improve the sense disambiguation performance of a standard machine translation system.

Machine Translation (MT) has often been used indirectly for SemEval Word Sense Disambiguation (WSD) tasks: as a tool to automatically create training data (Guo and Diab, 2010, for instance); as a source of parallel data that can be used to train WSD systems (Ng and Chan, 2007; van Gompel, 2010; Lefever et al., 2011); or as an application which can use the predictions of WSD systems developed for SemEval tasks (Carpuat and Wu, 2005; Chan et al., 2007; Carpuat and Wu, 2007). This SemEval shared task gives us the opportunity to compare the performance of machine translation systems with other submissions which use very different approaches. Our goal is to provide machine translation output which is representative of state-of-the-art approaches, and to provide a basis for comparing its strengths and weaknesses with those of other systems submitted to this task.

We submitted two systems to the Spanish Cross-Lingual WSD (CLWSD) task:

1. BASIC, a baseline machine translation system trained on the parallel corpus used to define the sense inventory;
2. ADAPT, a machine translation system that has been adapted to perform better on this task.

After describing these systems in Sections 2 and 3, we give an overview of the results in Section 4.

2 BASIC: A Baseline Phrase-Based Machine Translation System

We use a phrase-based SMT (PBSMT) architecture, and set up our system to perform English-to-Spanish translation. We use a standard SMT system set-up, as for any translation task. The fact that this PBSMT system is intended to be used for CLWSD only influences data selection and pre-processing.

2.1 Model and Implementation

In order to translate an English sentence e into Spanish, PBSMT first segments the English sentence into phrases, which are simply sequences of consecutive words. Each phrase is translated into Spanish according to the translations available in a translation lexicon called the phrase table. Spanish phrases can be reordered to account for structural divergence between the two languages. This simple process can be used to generate Spanish sentences, which are scored according to translation, reordering and language models learned from parallel corpora. The score of a Spanish translation s given an English input sentence e segmented into J phrases is defined as follows:

$$\mathrm{score}(s; e) = \sum_{i} \sum_{j=1}^{J} \lambda_i \log\big(\phi_i(s_j; e_j)\big) + \lambda_{LM}\, \phi_{LM}(s)$$

Detailed feature definitions for phrase-based SMT models can be found in Koehn (2010). In our system, we use the following standard feature functions φ to score English-Spanish phrase pairs:

• 4 phrase-table scores, which are conditional translation probabilities and HMM lexical probabilities in both translation directions (Chen et al., 2011)
• 6 hierarchical lexicalized reordering scores, which represent the orientation of the current phrase with respect to the previous block that could have been translated as a single phrase (Galley and Manning, 2008)
• a word penalty, which scores the length of the output sentence
• a word-displacement distortion penalty, which penalizes long-distance reorderings.

In addition, fluency of translation is ensured by a monolingual Spanish language model φLM, which is a 5-gram model with Kneser-Ney smoothing.

Phrase translations are extracted based on IBM-4 alignments obtained with GIZA++ (Och and Ney, 2003). The λ weights for these features are learned using the batch lattice-MIRA algorithm (Cherry and Foster, 2012) to optimize BLEU-4 (Papineni et al., 2002) on a tuning set. We use PORTAGE, our internal PBSMT decoder, for all experiments. PORTAGE uses a standard phrasal beam-search algorithm with cube pruning. The main differences between this set-up and the popular open-source Moses system (Koehn et al., 2007) are the use of hierarchical reordering (Moses only supports non-hierarchical lexicalized reordering by default) and smoothed translation probabilities (Chen et al., 2011).

As a result, disambiguation decisions for the CLWSD task are based on the following sources of information:

• local source context, represented by source phrases of length 1 to 7 from the translation and reordering tables
• local target context, represented by the 5-gram language model.

Each English sentence in the CLWSD task is translated into Spanish using our PBSMT system. We keep track of the phrasal segmentation used to produce the translation hypothesis and identify the Spanish translation of the English word of interest. When the English word is translated into a multi-word Spanish phrase, we output the Spanish word within the phrase that has the highest IBM1 translation probability given the English target word.

For the BEST evaluation, we use this process on the top PBSMT hypothesis to produce a single CLWSD translation candidate. For the Out-Of-Five evaluation, we produce up to five CLWSD translation candidates from the top 1000 PBSMT translation hypotheses.

2.2 Data and Preprocessing

Training the PBSMT system requires a two-step process with two distinct sets of parallel data.

First, the translation, reordering and language models are learned on a large parallel corpus, the training set. We use the sentence pairs extracted from Europarl by the organizers for the purpose of selecting translation candidates for the gold annotation. Training the SMT system on the exact same parallel corpus ensures that the system "knows" the same translations as the human annotators who built the gold standard. This corpus consists of about 900k sentence pairs.

Second, the feature weights λ in the PBSMT are learned on a smaller parallel corpus, the tuning set. This corpus should ideally be drawn from the test domain. Since the CLWSD task does not provide parallel data in the test domain, we construct the tuning set using corpora publicly released for the WMT2012 translation task. Since sentences provided in the trial data appeared to come from a wide variety of genres and domains, we decided to build our tuning set using data from the news-commentary domain, rather than the narrower Europarl domain used for training. We selected the top 3000 sentence pairs from the WMT 2012 development test sets, based on their distance to the CLWSD trial and test sentences as measured by cross-entropy (Moore and Lewis, 2010).

All Spanish and English corpora were processed using FreeLing (Padró and Stanilovsky, 2012). Since the CLWSD targets and gold translations are lemmatized, we lemmatize all corpora. While FreeLing can provide a much richer linguistic analysis of the input sentences, the PBSMT system only makes use of their lemmatized representation. Our systems therefore contrast with previous approaches to CLWSD (van Gompel, 2010; Lefever et al., 2011, for instance), which use richer sources of information such as part-of-speech tags.

3 ADAPT: Adapting the MT system to the CLWSD task

Our ADAPT system simply consists of two modifications to the BASIC PBSMT system.

First, it uses a shorter maximum English phrase length.

[...] reordering models for (1) the Europarl subset used by the CLWSD organizers (900k sentence pairs, as in the BASIC system), and (2) the news commentary corpus from WMT12 (which comprises 150k sentence pairs). For the language model, we use the Spanish side of these two corpora, as well as that of the full Europarl corpus from WMT12 (which comprises 1.9M sentences). Models learned on different data sets are combined using linear mixtures learned on the tuning set (Foster and Kuhn, 2007).

We also attempted other variations on the BASIC system which were not as successful. For instance, we tried to update the PBSMT tuning objective to be better suited to the CLWSD task. When producing translations of entire sentences, the PBSMT system is expected to produce hypotheses that are simultaneously fluent and adequate, as measured by BLEU score. In contrast, CLWSD measures the adequacy of the translation of a single word in a given sentence. We therefore attempted to tune for BLEU-1, which only uses unigram precision, and therefore focuses on adequacy rather than fluency. However, this did not improve CLWSD accuracy.

4 Results

Table 1 gives an overview of the results per target word for both systems, as measured by all official metrics (see Lefever and Hoste (2010) for a detailed description). According to the BEST Precision scores, the ADAPT system outperforms the [...]
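As an illustration of the log-linear scoring function defined in Section 2.1, the following minimal Python sketch sums the weighted log feature values over all phrase pairs of a hypothesis and adds the weighted language model term. It is not the PORTAGE implementation: the function name `pbsmt_score`, the feature names `p_fwd` and `p_bwd`, and all numeric values are hypothetical, and the language model value is assumed to be supplied already in the form used by φLM(s).

```python
import math
from typing import Dict, List


def pbsmt_score(
    phrase_features: List[Dict[str, float]],
    weights: Dict[str, float],
    lm_value: float,
    lm_weight: float,
) -> float:
    """Log-linear score of one segmented translation hypothesis.

    phrase_features[j][name] plays the role of phi_i(s_j; e_j),
    weights[name] the role of lambda_i, lm_value the role of
    phi_LM(s) and lm_weight the role of lambda_LM.
    """
    total = 0.0
    for features in phrase_features:          # sum over phrase pairs j
        for name, value in features.items():  # sum over feature functions i
            total += weights[name] * math.log(value)
    return total + lm_weight * lm_value


# Toy hypothesis segmented into two phrase pairs (illustrative values only).
hypothesis = [
    {"p_fwd": 0.5, "p_bwd": 0.25},
    {"p_fwd": 1.0, "p_bwd": 0.5},
]
weights = {"p_fwd": 1.0, "p_bwd": 2.0}
score = pbsmt_score(hypothesis, weights, lm_value=-3.0, lm_weight=0.5)
```

In a real decoder the weights would come from tuning (batch lattice-MIRA in this paper) rather than being set by hand, and the reordering and penalty features would be added to each phrase pair's feature dictionary in the same way.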

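The candidate extraction step described at the end of Section 2.1 can be sketched in the same spirit. The code below picks, from the Spanish phrase aligned to the English word of interest, the word with the highest IBM1 probability, and then collects up to five candidates from an n-best list. The paper does not specify how the five candidates are chosen among the top 1000 hypotheses; keeping the first distinct words in n-best order is an assumption made here for illustration. The function names, the `ibm1` table layout and the toy probabilities are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical IBM1 table: ibm1[(spanish_word, english_word)] = p(spanish | english)
Ibm1Table = Dict[Tuple[str, str], float]


def pick_target_word(spanish_phrase: List[str], english_word: str, ibm1: Ibm1Table) -> str:
    """Return the word of the Spanish phrase with the highest IBM1
    probability given the English word (unseen pairs count as 0)."""
    return max(spanish_phrase, key=lambda w: ibm1.get((w, english_word), 0.0))


def out_of_five(
    nbest_phrases: List[List[str]],
    english_word: str,
    ibm1: Ibm1Table,
    k: int = 5,
) -> List[str]:
    """Collect up to k distinct candidates, scanning hypotheses in n-best order.

    nbest_phrases[n] is the Spanish phrase that translates the English word
    of interest in the n-th best hypothesis (assumed to be already extracted
    from the phrasal segmentation).
    """
    candidates: List[str] = []
    for phrase in nbest_phrases:
        word = pick_target_word(phrase, english_word, ibm1)
        if word not in candidates:
            candidates.append(word)
        if len(candidates) == k:
            break
    return candidates


# Toy example with made-up probabilities.
ibm1 = {("banco", "bank"): 0.6, ("de", "bank"): 0.01, ("orilla", "bank"): 0.3}
best = pick_target_word(["banco", "de"], "bank", ibm1)
top5 = out_of_five([["banco"], ["banco", "de"], ["orilla"], ["banco"]], "bank", ibm1)
```

For the BEST evaluation only the first hypothesis would be used, which corresponds to calling `pick_target_word` on the phrase from the top hypothesis.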