Document-Level Machine Translation with Word Vector Models

Eva Martínez Garcia, Cristina España-Bonet
TALP Research Center
Universitat Politècnica de Catalunya
Jordi Girona, 1-3, 08034 Barcelona, Spain
{emartinez,cristinae}@cs.upc.edu

Lluís Màrquez
Arabic Language Technologies
Qatar Computing Research Institute
Tornado Tower, Floor 10
P.O. Box 5825, Doha, Qatar
[email protected]

© 2015 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.

Abstract

In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first in a cross-lingual lexical substitution task and then in the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to the context. As expected, the bilingual word vector models are more appropriate for the purpose of translation. The final document-level translator incorporating the semantic model outperforms the basic Docent (without semantics) and also performs slightly better than a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we also present a manual analysis of the translations of some concrete documents.

1 Introduction

Document-level information is usually lost during the translation process when using sentence-based Statistical Machine Translation (SMT) systems (Hardmeier, 2014; Webber, 2014). Cross-sentence dependencies are totally ignored, as these systems translate sentence by sentence without taking into account any document context when choosing the best translation. Some simple phenomena, like coreferent pronouns outside a sentence, cannot be properly translated in this way. This matters because the correct translation of pronouns in a document confers a high level of coherence to the final translation. Also, discourse connectives are valuable because they mark the flow of the discourse in a text; it is desirable to transfer them to the output translation in order to maintain the characteristics of the discourse. The evolution of the topic through a text is also an important feature to preserve.

All these aspects can be used to improve translation quality by ensuring coherence throughout a document. Several recent works go in that direction. Some of them present post-processing approaches that make changes to a first translation according to document-level information (Martínez-Garcia et al., 2014a; Xiao et al., 2011). Others introduce the information within the decoder, for instance by implementing a topic-based cache approach (Gong et al., 2011; Xiong et al., 2015). The decoding methodology itself can also be changed. This is the case of a document-oriented decoder, Docent (Hardmeier et al., 2013), which implements a search in the space of translations of a whole document. This framework allows us to consider features that apply at document level. One of the main goals of this paper is to take advantage of this capability to include semantic information at decoding time.

We present here the use of a semantic representation based on word embeddings as a language model within a document-oriented decoder. To do this, we trained a word vector model (WVM) using neural networks. As a first approach, a monolingual model is used, in analogy with the standard monolingual language models based on n-grams of words instead of vectors. However, to better approach translation, bilingual models are built. These models are evaluated in isolation, outside the decoder, by means of a cross-lingual evaluation task that resembles a translation environment. Finally, we use these models in a translation task and observe how the semantic information enclosed in them helps to improve translation quality.

The paper is organized as follows. A brief review of the related work is given in Section 2. In Section 3, we describe our approach of using a bilingual word vector model as a language model. The model is compared to monolingual models and evaluated. We show and discuss the results of our experiments on the full translation task in Section 5. Finally, we draw the conclusions and define several lines of future work in Section 6.

2 Related Work

In recent years, approaches to document-level translation have started to emerge. The earliest ones deal with pronominal anaphora within an SMT system (Hardmeier and Federico, 2010; Nagard and Koehn, 2010). These authors develop models that, with the help of coreference resolution methods, identify links among words in a text and use them for a better translation of pronouns.

More recent approaches focus on topic cohesion. (Gong et al., 2011) tackle the problem by making the previous translations available to the decoder at decoding time through a cache system. In this way, one can bias the system towards the lexicon already used. (Xiong et al., 2015) also present a topic-based coherence improvement for an SMT system that tries to preserve the continuity of sentence topics in the translation. To do that, they extract a coherence chain from the source document and, taking this coherence chain as a reference, they predict the target coherence chain by adapting a maximum entropy classifier.

Document-level translation can also be seen as the post-processing of an already translated document. In (Xiao et al., 2011; Martínez-Garcia et al., 2014a), the authors study the translation consistency of a document and re-translate source words that have been translated in different ways within the same document. The aim is to incorporate document context into an existing SMT system following three steps. First, they identify the ambiguous words; then, they obtain a set of consistent translations for each word according to the distribution of the word over the target document; and finally, they generate the new translation taking into account the results of the first two steps. These approaches report improvements in the final translations but, in most of them, the improvements can only be seen through a detailed manual evaluation. When using automatic evaluation metrics like BLEU (Papineni et al., 2002), differences are not significant.

A document-oriented SMT decoder is presented in (Hardmeier et al., 2012; Hardmeier et al., 2013). The decoder is built on top of an open-source phrase-based SMT decoder, Moses (Koehn et al., 2007). The authors present a stochastic local search decoding method for phrase-based SMT systems which allows decoding complete documents. Docent starts from an initial state (translation) given by Moses, which is then modified by applying a hill-climbing strategy to find a (local) maximum of the score function. The score function and a set of defined change operations are the components that encode the document-level information. One remarkable characteristic of this decoder, besides the change of perspective in the implementation from sentence level to document level, is that it allows the use of a WVM as a Semantic Space Language Model (SSLM). In this case, the decoder uses the information of the word vector model to evaluate the adequacy of a word inside a translation by calculating the distance between the current word and its context.

In recent years, several distributed word representation models have been introduced, and they have been successfully applied to many different NLP tasks. These models are able to capture and combine the semantic information of the text. An efficient implementation of the Continuous Bag-of-Words (CBOW) and Skipgram algorithms is presented in (Mikolov et al., 2013a; Mikolov et al., 2013c; Mikolov et al., 2013d). Within this implementation, WVMs are trained using a neural network. These models have proved robust and powerful at predicting semantic relations between words, even across languages. They are implemented in the word2vec software package. However, they are not able to handle lexical ambiguity, as they conflate the senses of polysemous words into one common representation. This limitation is already discussed in (Mikolov et al., 2013b) and in (Wolf et al., 2014), in which bilingual extensions of the word2vec architecture are also proposed. These bilingual extensions consist of a combination of two monolingual models: they combine the source vector model and the target vector model by training a new neural network, which learns the projection matrix that combines the information of both languages. A new bilingual approach is presented in (Martínez-Garcia et al., 2014b). The resulting models are evaluated in a cross-lingual lexical substitution task as well as by measuring their accuracy when capturing semantic relationships between words.

Recently, Neural Machine Translation (NMT) has appeared as a powerful alternative to other MT techniques. Its success lies in the excellent results that deep neural networks have achieved in natural language tasks as well as in other areas. In short, NMT systems are built on a trained neural network that is able to output a translation given a source text as input (Sutskever et al., 2014b; …

… or the composition of monolingual models to build bilingual ones. This section shows a methodology to build bilingual models directly.

3.1 Bilingual word vector models

For our experiments we use the two algorithms implemented in the word2vec package, Skipgram and CBOW. The Skipgram model trains a neural network (NN) to predict the context of a given word. The CBOW algorithm, on the other hand, uses an NN to predict a word given a set of its surrounding words, where the order of the words in the history does not influence the projection.

In order to introduce semantic information in a bilingual scenario, we use a parallel corpus and automatic word alignment to extract a new training corpus of word pairs: (w w ).
