
Multilingual Multimodal Machine Translation for Dravidian Languages utilizing Phonetic Transcription

Bharathi Raja Chakravarthi, Bernardo Stearns, Mihael Arcan, Manel Zarrouk, and John P. McCrae
Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
Ruba Priyadharshini
Saraswathi Narayanan College, India
Arun Jayapal
Smart Insights from Conversations, India
S. Sridevy
Tamil Nadu Agricultural University, India
[email protected]

Abstract

Multimodal machine translation is the task of translating from a source text into the target language using information from other modalities. Existing multimodal datasets have been restricted to highly resourced languages. In addition, these datasets were collected by manual translation of the English descriptions from the Flickr30K dataset. In this work, we introduce MMDravi, a Multilingual Multimodal dataset for under-resourced Dravidian languages. It comprises 30,000 sentences which were created utilizing several machine translation outputs. Using data from MMDravi and a phonetic transcription of the corpus, we build a Multilingual Multimodal Neural Machine Translation (MMNMT) system for closely related Dravidian languages to take advantage of multilingual corpora and other modalities. We evaluate the translations generated by the proposed approach with a human-annotated evaluation dataset in terms of the BLEU, METEOR, and TER metrics. Relying on multilingual corpora, phonetic transcription, and image features, our approach improves the translation quality for the under-resourced languages.

© 2019 The authors. This article is licensed under a Creative Commons 4.0 licence, no derivative works, attribution, CC-BY-ND.

1 Introduction

The development of a Multilingual Multimodal Neural Machine Translation (MMNMT) system requires multilingual parallel corpora and images which are aligned with the parallel sentences for training. The largest existing dataset containing captions, images, and translations for English, German, French and Czech is the WMT shared task Multi30K dataset, which is derived from the Flickr30k dataset (Plummer et al., 2015; Plummer et al., 2017). Typically, such data is manually created with the help of bilingual annotators (Elliott et al., 2016); however, for many languages, such resources are not available. In those cases, machine translation can be a useful tool for the quick expansion to new languages by producing candidate translations (Dutta Chowdhury et al., 2018). In order to reduce the amount of annotation time, we pose translation as a post-editing task: we automatically translated the English sentences from the WMT corpus using pre-trained general-domain Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) systems.

Multilingual NMT models (Firat et al., 2016) have been shown to increase the translation quality for under-resourced languages. Closely related languages such as Tamil (ISO 639-1: ta), Kannada (ISO 639-1: kn), and Malayalam (ISO 639-1: ml) exhibit a large overlap in their vocabulary and strong syntactic and lexical similarities. The Dravidian languages are a family of languages spoken primarily in the southern part of India and are considered under-resourced languages. However, the scripts used to write these languages are different, and the languages differ in their morphology. Recently, Chakravarthi et al. (2019) have shown that phonetic transcription of a corpus into Latin script improves multilingual NMT performance for under-resourced Dravidian languages.

In this paper, we propose applying Multilingual Multimodal NMT for translating between closely

related Dravidian languages and English. We created multimodal data using SMT and NMT methods trained on a general-domain corpus for under-resourced languages. The results show that combining multilingual and multimodal data with a phonetic transcription of the corpus improves translation performance for closely related Dravidian languages.

LoResMT 2019 Dublin, Aug. 19-23, 2019 | p. 56

2 Related Work

In Ha et al. (2016) and Johnson et al. (2017), the authors demonstrated that multilingual NMT improves translation quality. For this, they created multilingual NMT without changing the architecture, by introducing special tokens at the beginning of the source sentence indicating the source and target languages, as shown in Figure 1. We follow this by introducing special tokens in the source sentence to indicate the target language. Phonetic transcription to Latin script and to the International Phonetic Alphabet (IPA) was studied by Chakravarthi et al. (2019), who showed that Latin script outperforms IPA for multilingual NMT of Dravidian languages. We propose to combine multilingual corpora, phonetic transcription, and multimodal content to improve the translation quality of under-resourced Dravidian languages. Our contribution is to use closely related languages of the Dravidian family, exploiting their similar syntactic and semantic structures through phonetic transcription of the corpora into Latin script, together with image features, to improve translation quality.

Capturing rich information from multimodal content available on the Web, especially images with descriptions in English, was explored in the context of NMT (Specia et al., 2016a) in the WMT shared task. The WMT shared task also provided resources for other popular languages: German, Czech and French (Elliott et al., 2017). Most of those data were expensive to produce; for example, the English-German corpus created by Elliott et al. (2016) cost €23,000 for data collection (€0.06 per word). Such resources are not available for under-resourced languages. Recent work by Dutta Chowdhury et al. (2018) carried out experiments utilizing synthetic data for the Hindi-English language pair. In contrast, we created MMDravi as a translation post-editing task, by utilizing translations of the English sentences associated with the images, produced by SMT and NMT systems trained on general-domain data.

The shared task on Multimodal NMT (MNMT) was introduced by Specia et al. (2016b) to generate image descriptions for a target language, given an image and/or a description in the source language. In previous works on MNMT, researchers utilized visual context by involving both NMT and Image Description Generation (IDG) features, explicitly using an encoder-decoder (Cho et al., 2014). However, the encoder-decoder architecture encodes the source sentence into a fixed-length vector. To overcome this drawback, Bahdanau et al. (2015) introduced an attention mechanism to focus on parts of the source sentence. The work by Calixto and Liu (2017) carried out different experiments to incorporate visual features into NMT: projecting an image feature vector as words into the source sentence, using the image to initialize the encoder hidden state, and using image features to initialize the decoder hidden state. In Calixto et al. (2017), the authors incorporated image features through a separate encoder, with a doubly-attentive decoder whose attention depends on the image feature. This allowed them to predict the next word and showed that the image feature improved the translation quality. Although all these approaches have demonstrated the possibility of MNMT, they rely on manually collected corpora, which under-resourced languages do not have. Our work follows the doubly-attentive model (Calixto et al., 2017), trained on the MMDravi data in a multilingual setting with phonetic transcription.

3 Background

3.1 Dravidian Languages

The Dravidian languages have individual writing scripts, each assigned a unique block in the Unicode computing industry standard. The similarity of these languages is that they are all written from left to right, consist of sequences of simple or complex characters, and follow an alpha-syllabic writing system in which the individual symbols represent syllables (Bhanuprasad and Svenson, 2008). The languages also have different sets of vowels and consonants. Vowels and consonants are atomic, but when they are combined with each other they form ligatures. Dravidian languages such as Tamil do not represent differences between aspirated and unaspirated stops, while other Dravidian languages such as Kannada

and Malayalam have a large number of loan words from Indo-Aryan languages and support a large number of characters resulting from the combination of two consonant symbols (Kumar et al., 2015).

3.2 Phonetic Transcription

Phonetic transcription is the use of phonetic symbols, such as IPA, or of a non-native script. As the Dravidian languages under study are written in different scripts, they must be converted to some common representation before training the MMNMT system, in order to take advantage of closely related language resources. Phonetic transcription to Latin script and to the International Phonetic Alphabet (IPA) was studied by Chakravarthi et al. (2019), who showed that Latin script outperforms IPA for multilingual NMT of Dravidian languages. The improvements in translation performance were shown in terms of the BLEU (Papineni et al., 2002) metric. We used the Indic-trans library by Bhat et al. (2015) for phonetic transcription of the corpora into Latin script, which brings all the languages into a single representation via a matching algorithm. The same library was used to back-transliterate from Latin script to the corresponding Dravidian language to evaluate the translation performance.

3.3 Neural Machine Translation

Neural Machine Translation is a sequence-to-sequence approach (Sutskever et al., 2014) using an encoder-decoder architecture with an attention mechanism (Bahdanau et al., 2015). Given a source sentence X = x_1, x_2, x_3, \ldots, x_n and a target sentence Y = y_1, y_2, y_3, \ldots, y_n, the bidirectional encoder transforms the source sentence into annotation vectors C = h_1, h_2, h_3, \ldots, h_n. At each time step t, the source context vector c_t is computed based on the annotation vectors and the decoder's previous hidden state s_{t-1}. The decoder then generates one target word at a time.

The attention model calculates c_t as the weighted sum of the source-side context vectors:

  c_t = \sum_{i=1}^{N} \alpha_{t,i}^{src} h_i    (2)

  \alpha_{t,i}^{src} = \frac{\exp(e_{t,i}^{src})}{\sum_{j=1}^{N} \exp(e_{t,j}^{src})}    (3)

where \alpha_{t,i}^{src} is the normalized alignment matrix between each source annotation vector h_i and the word y_t to be emitted at time step t. The expected alignment e_{t,i}^{src} between each source annotation vector h_i and the target word y_t is computed using the following formula:

  e_{t,i}^{src} = (V_a^{src})^T \tanh(U_a^{src} s'_t + W_a^{src} h_i)    (4)

V_a^{src}, U_a^{src} and W_a^{src} are model parameters.

3.4 Multimodal Neural Machine Translation

The Multimodal NMT (MNMT) model (Calixto et al., 2017) is an extension of the encoder-decoder framework that incorporates visual information. To incorporate the visual features extracted from a pre-trained model, the authors integrated another attention mechanism into the decoder. The doubly-attentive decoder Recurrent Neural Network is conditioned on the previous hidden state, the previously emitted word, and the source sentence and image via attention mechanisms (Calixto et al., 2017). In the original attention-based NMT model described in Section 3.3, a single encoder for the source sentence, a single decoder for the target sentence, and the attention mechanism are conditioned on the source sentence. MNMT instead integrates two separate attention mechanisms over the source language and the visual features associated with the source and target sentence. The decoder generates a target word by computing a new probability P(y_t = k \mid y_{<t}, C, A) given a hidden state:

  P(y_t = k \mid y_{<t}, C, A) = \mathrm{softmax}\big(L_o \tanh(L_s s_t + L_w E[y_{t-1}] + L_{cs} c_t^{src} + L_{ci} c_t^{img})\big)    (5)

L_o, L_s, L_w, E, L_{cs} and L_{ci} are projection matrices. The mechanism in MNMT is similar to NMT with an attention model, except that, in addition to the source sentence and previous hidden state, it also takes a context vector over the image features a, computed by a second attention layer, to calculate the current hidden state. The doubly-attentive model calculates the time-dependent image context vector i_t as follows:

  i_t = \beta_t \sum_{l=1}^{L} \alpha_{t,l}^{img} a_l    (6)

where

  \beta_t = \sigma(W_\beta s_{t-1} + b_\beta)    (7)

The expected alignment vector over the image is given by

  \alpha_{t,l}^{img} = \frac{\exp(e_{t,l}^{img})}{\sum_{j=1}^{L} \exp(e_{t,j}^{img})}    (8)

  e_{t,l}^{img} = (V_a^{img})^T \tanh(U_a^{img} s'_t + W_a^{img} a_l)    (9)

V_a^{img}, U_a^{img} and W_a^{img} are model parameters.

Figure 1: Example of sentences with special tokens to indicate the source and target languages.

4 Experimental Setup

4.1 Data

The images required for our work were collected from Flickr by Plummer et al. (2015). The Multi30K dataset contains parallel corpora for English and German. Two types of multilingual annotations were released with the Multi30K dataset (Elliott et al., 2016). The first is an English description for each image together with its German translation. The second is a corpus of five independently collected English and German description pairs for each image. Synthetic data and back-transliterated data have been widely used to improve the performance of NMT and MNMT. To produce target-side descriptions of the images, we created general-domain SMT and NMT systems for the English-Tamil, English-Kannada, and English-Malayalam pairs. We collected the general-domain parallel corpora for the Dravidian languages from the OPUS website (Tiedemann and Nygaard, 2004) and from Chakravarthi et al. (2018). The corpus statistics are shown in Table 1. The corpus is tokenized and standardized to lowercase.

Table 1: Statistics of the parallel corpora used to train the general-domain translation systems. sent: number of sentences, s-tokens: number of source tokens, t-tokens: number of target tokens.

  Lang pair   sent   s-tokens   t-tokens
  En-Ta       0.8M   6.4M       13.3M
  En-Kn       0.5M   2.6M       4.5M
  En-Ml       1.4M   16.7M      23.5M

Table 2: Results (BLEU) of the general-domain SMT and NMT translation systems on the general-domain evaluation set.

  Lang pair   SMT     NMT
  En-Ta       30.29   35.52
  En-Kn       28.81   26.86
  En-Ml       36.73   38.56
The general-domain SMT systems were created with Moses (Koehn et al., 2007), while the NMT systems were trained with OpenNMT (Klein et al., 2017). After tokenization, we fed the parallel corpora to Moses and OpenNMT; the preprocessed files were then used to train the models. We used the default OpenNMT parameters for training, i.e. 2-layer LSTMs with 500 hidden units for both the encoder and the decoder. The SMT and NMT system results on the general-

Table 3: Results in BLEU score. Baseline is the bilingual Multimodal NMT, MMNMT is trained on native scripts, and MMNMT-T is trained utilizing phonetic transcription.

  Lang pair   Baseline   MMNMT   MMNMT-T
  En-Ta       50.2       51.0    52.3
  En-Ml       35.6       36.0    36.5
  En-Kn       44.5       45.1    45.9
  Ta-En       45.2       47.4    48.9
  Ml-En       34.3       36.2    37.6
  Kn-En       50.0       50.2    50.8

domain evaluation set are shown in Table 2. The development and test sets of the multimodal corpus were collected with the help of volunteer annotators. To reduce the annotation time, we posed the translation of the development and test sets as a post-editing task: we provided candidate translations of each English sentence from SMT and NMT, with an option to choose the best translation or to provide an original translation. We designed an annotation tool to meet this objective, and decided to use Google Forms to collect the data from the voluntary annotators. An example is shown in Figure 2. Eighteen annotators with different backgrounds participated in this annotation process; all are native speakers of the language that they annotated. The data for Malayalam was collected from three different native speakers, ten Tamil native speakers participated in creating the data for Tamil, and five Kannada native speakers annotated for Kannada. Since voluntary annotators are scarce and can annotate only little data, each sentence was annotated by only one annotator. We then selected the system that performed better based on the choices of the annotators. For the training set of MMDravi, we chose NMT and used the general-domain NMT output as the basis for post-editing.

Figure 2: Example of a sentence and image with candidate translations to choose from.

For our tasks, all descriptions in English were converted to lowercase and tokenized, while we did not have to bother about case correction for the Dravidian languages (their scripts do not distinguish letter case). We tokenized the Dravidian languages using the OpenNMT tokenizer with the segment-alphabet options for Tamil, Kannada, and Malayalam. For the sub-word level representation, we chose the 10,000 most frequent units to train the BPE (Sennrich et al., 2016) model. We used this model for the sub-word level segmentation of the training, development, and evaluation sets. We trained the MMNMT model to translate from English into the Dravidian languages as well as from the Dravidian languages into English. Visual features were extracted from publicly available pre-trained CNNs. Specifically, we extract spatial image features using the VGG-19 network (Simonyan and Zisserman, 2014): we pass all the images in our dataset through the pre-trained 19-layer VGG network to extract global information and use it in a separate visual attention mechanism, as described in Calixto et al. (2017).

4.2 Multilingual Multimodal Neural Machine Translation

Since we translate between closely related languages and English, we set up the translation in two scenarios: 1) One-to-Many and 2) Many-to-One.

4.2.1 One-to-Many Approach

In this setting, we create a model to translate from English into Tamil, Malayalam, and Kannada. The source-language sentence was replicated three times, once for each of the three target languages, with a token indicating the target language. Figure 1 shows example sentences.
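The one-to-many replication described above can be sketched as follows. The `<2ta>`-style target tokens are illustrative stand-ins in the spirit of Johnson et al. (2017); the paper does not specify its exact token strings.

```python
# Sketch of one-to-many data preparation: replicate each English source
# sentence once per target language, prefixed with a target-language token.
# Token spellings like "<2ta>" are hypothetical, not the paper's vocabulary.
def replicate_for_targets(src_sentence, targets=("ta", "kn", "ml")):
    return [(f"<2{t}> {src_sentence}", t) for t in targets]

pairs = replicate_for_targets("a black dog runs on green grass .")
for tagged_src, target_lang in pairs:
    print(target_lang, "|", tagged_src)
# ta | <2ta> a black dog runs on green grass .
# kn | <2kn> a black dog runs on green grass .
# ml | <2ml> a black dog runs on green grass .
```

At training time each tagged copy is paired with the reference in the corresponding target language, so a single model learns all three directions.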

src      a black dog runs on green grass with a toy in his mouth .
ref / MMNMT / MMNMT-T   [Tamil-script reference and system outputs; the original script did not survive text extraction]

Figure 3: Example showing the improvement in translation quality and readability over the baseline model. Errors are shown in red.

src      a woman and two men , that are dressed professionally , are having a discussion .
ref / MMNMT / MMNMT-T   [Tamil-script reference and system outputs; the original script did not survive text extraction]

Figure 4: Example showing a translation with accurate transfer of the important information. Errors are shown in red.

4.2.2 Many-to-One Approach

In the many-to-one MMNMT system, we create a model to translate from Tamil, Malayalam, and Kannada (the Dravidian languages) into English. We replicated the English sentence three times, once for each of the three languages, on the target side of the corpus. We then train the MNMT system with visual features at the individual-language level on the MMDravi data, and compare the results of the one-to-many and many-to-one MMNMT models.

4.3 Results

We applied the baseline bilingual Multimodal NMT systems to the MMDravi data created from the Multi30K dataset. We then trained our MMNMT and MMNMT-T (phonetic transcription of the corpus) models for English into the Dravidian languages and vice versa. Results are reported in BLEU (BiLingual Evaluation Understudy; Papineni et al., 2002), which measures n-gram precision with respect to the evaluation set.

Table 3 provides the BLEU scores for the MMNMT models. We observed that the translation performance of MMNMT is higher than that of the bilingual Multimodal NMT model in terms of BLEU; translation from the Dravidian languages into English shows the highest improvement. Our experiments show that, compared with the bilingual system, the MMNMT system improves in several language directions; these gains are likely due to phonetic transcription, image features, and the transfer of parameters between different languages.

The results also show that MMNMT with phonetically transcribed corpora helps more for Dravidian-to-English than for English-to-Dravidian translation. An explanation for this is that in the dataset each source sentence has three targets, which encourages the language model to improve the translation results. In Table 3, we compare the BLEU scores of the baseline approach and our method. In order to evaluate the effectiveness of our proposed model, we explored MMNMT trained on the original scripts and MMNMT trained on a single script. Our empirical results show that the best result is achieved when we phonetically transcribe the corpus, bringing it into a single script, for both the English-to-Dravidian and Dravidian-to-English translation tasks.

Figure 3 shows an example where the MMNMT model improves the translation quality and readability over the baseline model. The results of the human evaluation confirm the results observed with the BLEU metric. A second example, of the MMNMT system outperforming the baseline for English-Tamil translation, is shown in Figure 4. The first example shows an almost perfect translation obtained with the MMNMT system for English to Tamil. In the second example, the translation obtained with the MMNMT system is acceptable, with accurate transfer of the important information (Coughlin, 2003). This suggests that synthetic data, together with our MMNMT model, can be used in an under-resourced language setting to improve translation quality.

5 Conclusion

We introduced a new dataset, named MMDravi, and proposed an MMNMT method for closely related Dravidian languages to overcome resource issues. Compared to the baseline approach, the results show that our approach can improve translation quality, especially for the Dravidian languages. Our evaluation, using phonetic transcription with multilingual and multimodal NMT, has shown that the proposed MMNMT-T outperforms the existing multimodal and multilingual approaches to low-resource neural machine translation across all the language pairs considered. We plan to release the multilingual translations as an addition to the Flickr30k set, and to explore the effect of the quality of this synthetic data in future work.

Acknowledgments

This work is supported by a research grant from Science Foundation Ireland, co-funded by the European Regional Development Fund, for the Insight Centre under Grant Number SFI/12/RC/2289, and by the European Union's Horizon 2020 research and innovation programme under grant agreement No 731015 (ELEXIS - European Lexical Infrastructure) and grant agreement No 825182 (Prêt-à-LLOD).

References

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Bhanuprasad, Kamadev and Mats Svenson. 2008. Errgrams – a way to improving ASR for highly inflected Dravidian languages. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II.

Bhat, Irshad Ahmad, Vandan Mujadia, Aniruddha Tammewar, Riyaz Ahmad Bhat, and Manish Shrivastava. 2015. IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search. In Proceedings of the Forum for Information Retrieval Evaluation, FIRE '14, pages 48–53, New York, NY, USA. ACM.

Calixto, Iacer and Qun Liu. 2017. Incorporating global visual features into attention-based neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 992–1003. Association for Computational Linguistics.

Calixto, Iacer, Qun Liu, and Nick Campbell. 2017. Doubly-attentive decoder for multi-modal neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1913–1924. Association for Computational Linguistics.

Chakravarthi, Bharathi Raja, Mihael Arcan, and John P. McCrae. 2018. Improving Wordnets for Under-Resourced Languages Using Machine Translation. In Proceedings of the 9th Global WordNet Conference. The Global WordNet Conference 2018 Committee.

Chakravarthi, Bharathi Raja, Mihael Arcan, and John P. McCrae. 2019. Comparison of Different Orthographies for Machine Translation of Under-resourced Dravidian Languages. In Proceedings of the 2nd Conference on Language, Data and Knowledge.

Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, October. Association for Computational Linguistics.

Coughlin, Deborah. 2003. Correlating automated and human assessments of machine translation quality. In Proceedings of MT Summit IX, pages 63–70.

Dutta Chowdhury, Koel, Mohammed Hasanuzzaman, and Qun Liu. 2018. Multimodal neural machine translation for low-resource language pairs using synthetic data. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 33–42, Melbourne, July. Association for Computational Linguistics.

Elliott, Desmond, Stella Frank, Khalil Sima'an, and Lucia Specia. 2016. Multi30K: Multilingual English-German Image Descriptions. In Proceedings of the 5th Workshop on Vision and Language, pages 70–74. Association for Computational Linguistics.

Elliott, Desmond, Stella Frank, Loïc Barrault, Fethi Bougares, and Lucia Specia. 2017. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description. In Proceedings of the Second Conference on Machine Translation, pages 215–233. Association for Computational Linguistics.

Firat, Orhan, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 866–875. Association for Computational Linguistics.

Ha, Thanh-Le, Jan Niehues, and Alexander H. Waibel. 2016. Toward multilingual neural machine translation with universal encoder and decoder. CoRR, abs/1611.04798.

Johnson, Melvin, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351, December.

Klein, Guillaume, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. OpenNMT: Open-Source Toolkit for Neural Machine Translation. In Proceedings of ACL 2017, System Demonstrations, pages 67–72. Association for Computational Linguistics.

Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180. Association for Computational Linguistics.

Kumar, Arun, Lluís Padró, and Antoni Oliver. 2015. Joint Bayesian morphology learning for Dravidian languages. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, pages 17–23, Hissar, Bulgaria, September. Association for Computational Linguistics.

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

Plummer, Bryan A., Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, pages 2641–2649, Washington, DC, USA. IEEE Computer Society.

Plummer, Bryan A., Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2017. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. Int. J. Comput. Vision, 123(1):74–93, May.

Sennrich, Rico, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725. Association for Computational Linguistics.

Simonyan, Karen and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.

Specia, Lucia, Stella Frank, Khalil Sima'an, and Desmond Elliott. 2016a. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 543–553. Association for Computational Linguistics.

Specia, Lucia, Stella Frank, Khalil Sima'an, and Desmond Elliott. 2016b. A shared task on multimodal machine translation and crosslingual image description. In Proceedings of the First Conference on Machine Translation, pages 543–553, Berlin, Germany, August. Association for Computational Linguistics.

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS'14, pages 3104–3112, Cambridge, MA, USA. MIT Press.

Tiedemann, Jörg and Lars Nygaard. 2004. The OPUS Corpus - Parallel and Free: http://logos.uio.no/opus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). European Language Resources Association (ELRA).
