Multi-Task Neural Model for Agglutinative Language Translation

Yirong Pan1,2,3, Xiao Li1,2,3, Yating Yang1,2,3, and Rui Dong1,2,3
1 Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, China
2 University of Chinese Academy of Sciences, China
3 Xinjiang Laboratory of Minority Speech and Language Information Processing, China
[email protected]  {xiaoli, yangyt, dongrui}@ms.xjb.ac.cn

Abstract

Neural machine translation (NMT) has achieved impressive performance recently by using large-scale parallel corpora. However, it struggles in the low-resource and morphologically-rich scenarios of agglutinative language translation tasks. Inspired by the finding that monolingual data can greatly improve the NMT performance, we propose a multi-task neural model that jointly learns to perform bi-directional translation and agglutinative language stemming. Our approach employs a shared encoder and decoder to train a single model without changing the standard NMT architecture, instead adding a token before each source-side sentence to specify the desired target outputs of the two different tasks. Experimental results on Turkish-English and Uyghur-Chinese show that our proposed approach can significantly improve the translation performance on agglutinative languages by using a small amount of monolingual data.

1 Introduction

Neural machine translation (NMT) has achieved impressive performance on many high-resource machine translation tasks (Bahdanau et al., 2015; Luong et al., 2015a; Vaswani et al., 2017). The standard NMT model uses the encoder to map the source sentence to a continuous representation vector, and then feeds the resulting vector to the decoder to produce the target sentence.

However, the NMT model still suffers in the low-resource and morphologically-rich scenarios of agglutinative language translation tasks, such as Turkish-English and Uyghur-Chinese. Both Turkish and Uyghur are agglutinative languages with complex morphology. The morpheme structure of a word can be denoted as prefix1 + … + prefixN + stem + suffix1 + … + suffixN (Ablimit et al., 2010). Since the suffixes have many inflected and morphological variants, the vocabulary size of an agglutinative language is considerable even in small-scale training data. Moreover, many words have different morphemes and meanings in different contexts, which leads to inaccurate translation results.

Recently, researchers have shown great interest in utilizing monolingual data to further improve NMT model performance (Cheng et al., 2016; Ramachandran et al., 2017; Currey et al., 2017). Sennrich et al. (2016) pair target-side monolingual data with automatic back-translations as additional training data for the NMT model. Zhang and Zong (2016) use source-side monolingual data and employ a multi-task learning framework for translation and source sentence reordering. Domhan and Hieber (2017) modify the decoder to enable multi-task learning for translation and language modeling. However, the above works mainly focus on boosting translation fluency and lack consideration of morphological and linguistic knowledge.

Stemming is a morphological analysis method that is widely used for information retrieval tasks (Kishida, 2005). By removing the suffixes of a word, stemming allows the variants of the same word to share representations and reduces data sparseness. We consider that stemming can lead to better generalization on agglutinative languages, which helps NMT capture in-depth semantic information. Thus we use stemming as an auxiliary task for agglutinative language translation.

In this paper, we investigate a method to exploit the monolingual data of the agglutinative language to enhance the representation ability of the encoder. This is achieved by training a multi-task neural model that jointly performs bi-directional translation and agglutinative language stemming, utilizing a shared encoder and decoder. We treat stemming as a sequence generation task.
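
To make the effect of stemming concrete, here is a minimal Python sketch that treats stemming as suffix stripping. The suffix list and example words are a toy illustration, not the morphological analyzer used in the paper:

```python
# A toy stemmer for an agglutinative language: greedily strip known
# suffixes from the end of the word. Real Turkish morphology needs a
# proper analyzer; this only illustrates the idea.

SUFFIXES = ["de", "imiz", "ler"]  # locative, possessive, plural (toy set)

def stem(word: str) -> str:
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Keep at least two characters so the stem is not emptied.
            if word.endswith(suffix) and len(word) > len(suffix) + 1:
                word = word[: -len(suffix)]
                changed = True
    return word

for w in ["evler", "evde", "evlerimizde"]:
    print(w, "->", stem(w))
# evler -> ev, evde -> ev, evlerimizde -> ev
```

All three surface forms ("houses", "in the house", "in our houses") collapse onto the single stem "ev" ("house"), so the stemmed data has one vocabulary entry where the word-level data has three.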
2 Related Work

Multi-task learning (MTL) aims to improve the generalization performance of a main task by using other related tasks, and has been successfully applied to various research fields, ranging from language (Liu et al., 2015; Luong et al., 2015a) and vision (Yim et al., 2015; Misra et al., 2016) to speech (Chen and Mak, 2015; Kim et al., 2016).

Many natural language processing (NLP) tasks have been chosen as auxiliary tasks to deal with increasingly complex main tasks. Luong et al. (2015b) employ a small amount of syntactic parsing and image captioning data for English-German translation. Hashimoto et al. (2017) present a joint MTL model to handle the tasks of part-of-speech (POS) tagging, dependency parsing, semantic relatedness, and textual entailment for English. Kiperwasser and Ballesteros (2018) utilize POS tagging and dependency parsing for English-German machine translation. To the best of our knowledge, we are the first to incorporate a stemming task into an MTL framework to further improve the translation performance on agglutinative languages.

Recently, several works have combined the MTL method with the sequence-to-sequence NMT model for machine translation tasks. Dong et al. (2015) follow a one-to-many setting that utilizes a shared encoder for all the source languages, with respective attention mechanisms and multiple decoders for the different target languages. Luong et al. (2015b) follow a many-to-many setting that uses multiple encoders and decoders with two separate unsupervised objective functions. Zoph and Knight (2016) follow a many-to-one setting that employs multiple encoders for all the source languages and one decoder for the desired target language. Johnson et al. (2017) propose a simpler method in a one-to-one setting, which trains a single NMT model with a shared encoder and decoder to enable multilingual translation. Their method requires no changes to the standard NMT architecture; instead, it adds a token at the beginning of each source sentence to specify the desired target sentence. Inspired by their work, we employ the standard NMT model with one encoder and one decoder for parameter sharing and model generalization. In addition, we build a joint vocabulary on the concatenation of the source-side and target-side words.

Several works on morphologically-rich NMT have focused on using morphological analysis to pre-process the training data (Luong et al., 2016; Huck et al., 2017; Tawfik et al., 2019). Gulcehre et al. (2015) segment each Turkish sentence into a sequence of morpheme units and remove any non-surface morphemes for Turkish-English translation. Ataman et al. (2017) propose a vocabulary reduction method based on unsupervised morphology learning that considers the morphological properties of the agglutinative language. This work takes inspiration from our previously proposed segmentation method (Pan et al., 2020), which segments each word into a sequence of sub-word units with morpheme structure and can effectively reduce language complexity.

3 Multi-Task Neural Model

3.1 Overview

We propose a multi-task neural model for machine translation from and into a low-resource and morphologically-rich agglutinative language. We train the model to jointly learn to perform both the bi-directional translation task and the stemming task on an agglutinative language by using the standard NMT framework. Moreover, we add an artificial token before each source sentence to specify the desired target outputs for the different tasks. The architecture of the proposed model is shown in Figure 1. We take the Turkish-English translation task as an example: the "<MT>" token denotes the bilingual translation task, and the "<ST>" token denotes the stemming task on a Turkish sentence.

Figure 1: The architecture of the multi-task neural model that jointly learns to perform bi-directional translation between Turkish and English, and stemming for Turkish sentences. The training data combine bilingual data ("<MT> + English sentence" → Turkish sentence; "<MT> + Turkish sentence" → English sentence) and monolingual data ("<ST> + Turkish sentence" → stem sequence), all fed to the shared encoder-decoder framework.
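
The sketch below shows how the three training streams in Figure 1 can be merged into one dataset by prepending the task token to each source sentence. The helper name, the sentence pair, and the stem sequence are illustrative assumptions, not the authors' released code:

```python
# Build a mixed multi-task training set: every example is an ordinary
# (source, target) pair, and the task is selected solely by the token
# prepended to the source sentence.

def make_example(task_token: str, source: str, target: str):
    # The task token becomes the first token of the source side.
    return (f"{task_token} {source}", target)

corpus = []

# Bilingual data is used in both translation directions.
for en, tr in [("thank you very much", "çok teşekkür ederim")]:
    corpus.append(make_example("<MT>", en, tr))  # English -> Turkish
    corpus.append(make_example("<MT>", tr, en))  # Turkish -> English

# Monolingual Turkish data is paired with its stem sequence (e.g. from
# a stemmer), so stemming is trained as a sequence generation task.
for tr, stems in [("çok teşekkür ederim", "çok teşekkür et")]:
    corpus.append(make_example("<ST>", tr, stems))

for source, target in corpus:
    print(source, "=>", target)
```

Because every stream flows through the same encoder and decoder, the stemming examples expose the encoder to agglutinative monolingual data without any change to the NMT architecture.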
3.2 Neural Machine Translation (NMT)

Our proposed multi-task neural model, which uses source-side monolingual data for the agglutinative language translation task, can be applied to any NMT structure with an encoder-decoder framework. In this work, we follow the NMT model proposed by Vaswani et al. (2017), which is implemented as the Transformer. We briefly summarize it here.
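
As a minimal sketch of this shared setup, the PyTorch code below uses one standard Transformer and a single embedding table over the joint vocabulary, so all tasks share every parameter. The hyperparameters are illustrative, and positional encoding and attention masks are omitted for brevity; this is not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MultiTaskNMT(nn.Module):
    """One shared encoder-decoder for translation and stemming."""

    def __init__(self, joint_vocab_size: int, d_model: int = 512):
        super().__init__()
        # A single embedding table over the joint source/target
        # vocabulary, which also contains the <MT> and <ST> tokens.
        self.embed = nn.Embedding(joint_vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, joint_vocab_size)

    def forward(self, src_ids, trg_ids):
        # src_ids starts with the task token; the same encoder and
        # decoder handle every task. Positional encodings and the
        # causal target mask are omitted in this sketch.
        hidden = self.transformer(self.embed(src_ids), self.embed(trg_ids))
        return self.out(hidden)  # logits over the joint vocabulary

model = MultiTaskNMT(joint_vocab_size=32000)
logits = model(torch.randint(0, 32000, (2, 10)),   # source token ids
               torch.randint(0, 32000, (2, 12)))   # shifted target ids
print(logits.shape)  # torch.Size([2, 12, 32000])
```

Sharing one embedding table and output projection over the joint vocabulary is what lets the task tokens steer the single decoder toward either translation or stemming.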
4 Experiment

4.1 Dataset

The statistics of the training, validation, and test datasets for the Turkish-English and Uyghur-Chinese machine translation tasks are shown in Table 1.

Task    Data    # Sent     # Src       # Trg
Tr-En   train   355,251    6,356,767   8,021,161
        valid   2,455      37,153      52,125
        test    4,962      69,006      96,291
Uy-Ch   train   333,097    6,026,953   5,748,298
        valid   700        17,821      17,085
        test    1,000      20,580      18,179

Table 1: The statistics of the training, validation, and test datasets on the Turkish-English and Uyghur-Chinese machine translation tasks. "# Src" denotes the number of source tokens, and "# Trg" denotes the number of target tokens.

For the Turkish-English machine translation, following Sennrich et al. (2015a), we use the WIT corpus (Cettolo et al., 2012) and the SETimes corpus (Tyers and Alperen, 2010) as the training dataset,