A Baseline Neural Machine Translation System for Indian Languages

Jerin Philip (IIIT Hyderabad)    Vinay P. Namboodiri (IIT Kanpur)    C. V. Jawahar (IIIT Hyderabad)

Abstract

We present a simple, yet effective, Neural Machine Translation (NMT) system for Indian languages. We demonstrate its feasibility for multiple language pairs, and establish a strong baseline for further research.

1 Introduction

The ability to access information in non-native languages has become a necessity in the modern age of Information Technology. This is especially true in India, where there are many popular languages. While knowledge repositories in English are growing at a rapid pace, Indian languages remain very poor in digital resources. This leaves the average Indian handicapped in accessing knowledge resources. Automatic machine translation is a promising tool to bridge the language barrier, and thereby to democratize knowledge.

Machine Translation (MT) has been a topic of research for over two decades now, with schools of thought following rule-based approaches [Sinha et al., 1995, Chaudhury et al., 2010], example-based approaches [Somers, 1999, Sinhal and Gupta, 2014] and statistics-based approaches [Brown et al., 1993, Koehn, 2009, Kunchukuttan et al., 2014, Banerjee et al., 2018]. A class of statistics-based methods involving deep neural networks, popularly known as Neural Machine Translation (NMT), is making rapid progress in providing nearly usable translations across many language pairs [Edunov et al., 2018, Aharoni et al., 2019, Neubig and Hu, 2018]. Multiple competitive neural architectures have come up in the last couple of years, each one outperforming the previous best and establishing superior baselines in a systematic manner [Sutskever et al., 2014, Bahdanau et al., 2014, Luong et al., 2015, Gehring et al., 2017, Vaswani et al., 2017]. Unfortunately, most of these methods are highly resource intensive, which naturally made NMT less attractive for Indian languages due to concerns of feasibility. However, we believe that neural machine translation schemes are most appropriate for Indian languages in the present-day setting due to: (i) the simplicity of the solution in rapidly prototyping and establishing practically effective systems; (ii) the lower cost of annotating training data (the data required for building NMT systems at most demands sentence-level alignments, with no special tagging); (iii) the ease of transferring ideas and algorithms across languages (many practical tricks for developing effective solutions in one language provide insights for newer language pairs); (iv) the ease of transferring knowledge across languages and tasks, often implemented through pre-training, transfer learning or domain adaptation; and (v) the rich software infrastructure available for rapidly prototyping effective models (e.g., software stacks for training) and deployment (e.g., on embedded devices). Many of these are very important in the Indian setting, where the number of researchers and industries interested in Indian-language MT is very small compared to what is demanded. Often there may not even be more than one group per language equipped with state-of-the-art tools and techniques.

India has 22 official languages, and many unofficial languages in use. With the fast-growing numbers of mobile phone and Internet users, there is an immediate need for automatic machine translation from and to English, as well as across Indian languages. Though the digital content in Indian languages has increased a lot in the last few years, it is not yet comparable to that in English. For example, there are only 780K Wikipedia pages in Hindi and 214K in Telugu, in contrast to 36.5M in English [Wikipedia].

Usable NMT solutions already exist between several high-resource languages of the world. India, in contrast, has a spectrum of languages which could be labelled as low resource. Within the scope of the discussion in this work, a high-resource language is one where a large number (> 1M) of sentence-level aligned text pairs between two languages is available; note that this itself could be a very conservative definition. The ability to have effective translation models with acceptable performance takes solutions based on natural interfaces to the remote areas of a large world population. Some closed systems are available for public use on the web [Google, Microsoft] for Indian languages. However, they are limited in terms of details, reproducibility, and scope for further extension by the wider research community.

The objective of this work is multi-fold. We summarize it below:

• Though neural machine translation has received much attention for many western language pairs, not many effective systems have been reported yet for Indic languages in the literature, barring exceptions like [Banerjee et al., 2018, Sen et al., 2018, Philip et al., 2018]. We would like to explore this machine learning and machine translation front, and demonstrate the feasibility for multiple Indian languages.

• Indian languages have always suffered from the lack of annotated corpora for a number of language processing tasks, including machine translation using statistical or learning techniques. One of our objectives is also to investigate how to effectively use the available data, which could be monolingual, noisy and partially aligned.

• From a system point of view, we would like to build and demonstrate a web-based system that can provide satisfactory results, possibly superior to many of the previous open systems. In the process of developing the system, we also study the quirks that work for Indian languages, and comment on directions that we did not find promising.

• Quantitatively, we would like to establish a strong baseline for machine translation to and from English, as well as across Indian languages. We hope that this will enable further research in this area.

• Finally, we would like to release models and resources (such as noisy sentences) for enabling research in this area.

The rest of the content is organized as follows: Section 2 reviews some of the relevant background literature with a specific focus on the Indian language scenario. In Section 3, we discuss the components of the system and the adaptations carried out for Indian language situations. In Section 4, we elaborate the experimental details. In Section 5, we summarize the results and discuss our findings.

2 Machine Translation and the Indian Language Situation

Machine translation systems can be built for a specific language pair or for multiple pairs simultaneously. We take the latter route in this work. This choice is dominated by many practical advantages in managing the limited available resources. We also hypothesize that the linguistic similarities across many Indian languages could help each other in this manner, though a systematic study on this is outside the scope of this work. Our work is in a multilingual setting wherein one system translates in all directions. Below, we discuss challenges specific to the context of a few Indian languages.

Text Resources. Most Indian language pairs are resource scarce in terms of sentence-aligned parallel bi-text. More pairs and experts are likely to be available when pivoting through English to learn a multilingual translation model. The increasing reach of Internet access in the subcontinent is however changing this, as observable from Figure 1, which presents the systematic increase in Wikipedia pages in Indian languages.

[Figure 1: Growth of wikipedia pages in languages (bn, hi, ur, ml, ta, te), 2009–2019; y-axis: #(wikipedia-pages), 100K–800K.]

Hindi-English (hi-en) can be labelled a relatively high-resource pair, after continued efforts to compile and increase resources over the past decade. The IIT-Bombay Hindi-English Parallel Corpus (IITB-hi-en) [Kunchukuttan et al., 2018] is the largest general-domain dataset available for use in MT. In addition to 1.5 million parallel sentence pairs, IITB-hi-en also offers nearly 19.2M monolingual Hindi sentences, which can be used to further enhance performance. The Indian Languages Corpora Initiative (ILCI) Corpus [Jha, 2010] is another major initiative. The ILCI corpus we use here includes parallel data specific to the domains of tourism and health across seven languages, all mutually aligned. The Indic Languages Multilingual Parallel Corpus (WAT-ILMPC, http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/index.html) is a compilation of parallel subtitles from OPUS (http://opus.nlpl.eu/), containing subtitle pairs between languages of the subcontinent and English. We also make use of this.

Methods. Most of the past attempts treat machine translation in the Indian setting as a pair-wise translation problem; that is, the scope is restricted to one or two languages [Garje and Kharate, 2013, Khan et al., 2017]. The Sampark [TDIL-Sampark] system looks at translation between different pairs of languages in the country as multiple separate problems, solved separately, with some commonality of ideas across the pairs. This pair-wise view of the problem space enables greater incorporation of language expertise into the solution, and in the absence of huge resources to train statistical systems, these provided reasonable translations. Sata-Anuvadak [Kunchukuttan et al., 2014], a compilation of 110 independently trained Statistical Machine Translation (SMT) systems, analyzes a multilingual setup through SMT-based approaches. One may attempt multilingual translation through several hops through intermediate pivots; however, error compounds at each step. Modern NMT approaches are capable of sharing learning between language pairs for which no data is available, through zero-shot learning [Johnson et al., 2017], transfer learning [Zoph et al., 2016] and data-augmentation approaches [Sennrich et al., 2016, Edunov et al., 2018]. We are motivated by the success of these.

Evaluations and Benchmarks. The Workshop on Asian Translation (WAT 2018) [Nakazawa et al., 2018] tasks on Indian languages saw models that performed worse under automated evaluation procedures being preferred by human evaluation. This may lead to interesting explorations on the need for better evaluation protocols and datasets for many Indian language translation tasks.

Linguistic Aspects. Khan et al. [2017] briefly cover linguistic aspects of several languages of the country. Indian languages are generally morphologically rich. Free word ordering in many languages leads to multiple valid ways of representing on the target side the content of the source side. Dravidian languages face further challenges with heavily agglutinative and inflective characteristics.
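The agglutination and inflection challenges above are commonly tackled with subword (open) vocabularies. As a toy illustration of the idea, a greedy longest-match segmenter over a fixed subword inventory shows how unseen inflected forms can still be covered; this is only a sketch, and the actual system uses SentencePiece (Section 4):

```python
def greedy_segment(word, subwords):
    """Greedily split a word into known subword units, falling back to
    single characters. A toy stand-in for SentencePiece/BPE: even a word
    never seen whole is representable from smaller pieces, which is how
    open vocabularies sidestep rich morphology."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first; a single character always matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in subwords or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

# An unseen word decomposes into known pieces:
pieces = greedy_segment("unhappiness", {"un", "happi", "ness"})
```

Real subword learners pick the inventory by maximizing corpus likelihood under a vocabulary-size constraint rather than by greedy matching, but the coverage property is the same.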
Recent developments in the use of open vocabularies [Kudo, 2018, Sennrich et al., 2015] have found success for many languages without large deterioration in translation quality. The same principles also help us practically circumvent the challenges due to word morphology.

Our multilingual setting involves language pairs where translation is an "ill-posed" problem. For example, the single word "you" in English could translate to "tum", "me" or "aap" in Hindi, and valid sentences exist where the correct translation cannot be identified even with paragraph-level context. Addressing these is left out of the scope of this paper.

3 Neural Machine Translation

In the previous section, we discussed the design choices and the data we use in developing the model. We now discuss and formulate the basic building blocks of our NMT pipeline. We extend the formulation to our multilingual setup, and further discuss data augmentation techniques.

3.1 The Pipeline

We start with a joint vocabulary which can cover all source and target languages. The source and target samples can thus be represented as sequences over a finite vocabulary V. We follow Kudo [2018] to reduce the vocabulary to a size feasible for learning, maximizing the likelihood of a monolingual corpus while optimizing for a minimal vocabulary.

Sequence-to-sequence (seq2seq) models with an encoder-decoder architecture are widely prevalent among NMT approaches [Sutskever et al., 2014, Bahdanau et al., 2014, Luong et al., 2015, Vaswani et al., 2017]. We employ a similar architecture for translating text from one language to another, and now formally describe the parts relevant to this work.

Consider a source sequence x = (x_1, x_2, ..., x_m) and a target sequence y = (y_1, y_2, ..., y_n), where x_i, y_j ∈ V. Our problem is to translate the sequence x to y. Our translation model has an encoder f_θ(·) followed by a decoder. The encoder consumes the source sequence x to learn a representation c:

    c = f_θ(x)

The auto-regressive decoder learns to maximize the log-likelihood of the target sequence under the conditional probability density below: for a given time-step t in the decoding process, the generation of the token at t is conditioned on c and the tokens generated until t − 1. Our objective is to maximize

    ∑_{t=1}^{T} log p_θ(y_t | y_{<t}, c)

3.2 Multilingual Extensions

In a multilingual setting, we have several languages to translate to and from. Hereafter we use xx, yy and zz as placeholders for arbitrary languages. The languages and their respective notations (shown in brackets) are English (en), Hindi (hi), Telugu (te), Tamil (ta), Malayalam (ml), Urdu (ur) and Bangla (bn).

Our multilingual setup closely follows Johnson et al. [2017]. We share the encoder and decoder parameters across all languages: since the vocabulary is common on both sides, sharing the representations simplifies the computation. A control token (__t2xx__) is prepended to the input sequence to indicate which direction to translate to; the decoder learns to generate the target given this input.

One advantage of multilingual setups is that they enable zero-shot learning. Few-shot learning (such as zero-shot and one-shot) has emerged as an effective tool for addressing the lack of training data in many practical problems in computer vision [Socher et al., 2013, Zhang and Saligrama, 2015], language processing [Johnson et al., 2017] and speech processing [Stafylakis and Tzimiropoulos, 2018]. Pairs in language directions that did not exist in the training set can still be translated, which can also aid the low-resource settings of Indian languages. Rapid adaptation to new languages can also be done through multilingual setups [Neubig and Hu, 2018]. Aharoni et al. [2019] perform experiments at an even larger scale and find that massively-multilingual setups lead to large improvements in low-resource settings.
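The direction-control mechanism above amounts to a one-line preprocessing step; a minimal sketch, with the function name chosen here for illustration (only the __t2xx__ token convention is from the paper):

```python
def mark_direction(src_tokens, tgt_lang):
    """Prepend the target-language control token (the paper's __t2xx__
    convention, after Johnson et al. [2017]) so that one shared
    encoder-decoder knows which language to emit."""
    return [f"__t2{tgt_lang}__"] + src_tokens

# The same English source, routed to two different targets:
to_hindi = mark_direction(["how", "are", "you"], "hi")
to_tamil = mark_direction(["how", "are", "you"], "ta")
```

Because the token is just another vocabulary item, no architecture change is needed, and unseen (zero-shot) directions are requested the same way.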

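At inference time, the decoding objective of Section 3.1 is typically approximated greedily, emitting at each step the most probable next token given c and the prefix. A minimal sketch with a stand-in for the trained decoder (the `decoder_step` interface is hypothetical, not fairseq's API):

```python
def greedy_decode(decoder_step, context, bos="<s>", eos="</s>", max_len=50):
    """Greedy approximation of max p(y_t | y_<t, c): repeatedly ask the
    decoder for its highest-probability next token until it emits the
    end-of-sequence marker. `decoder_step(context, prefix)` stands in
    for the trained Transformer decoder."""
    ys = [bos]
    while len(ys) < max_len:
        nxt = decoder_step(context, ys)
        ys.append(nxt)
        if nxt == eos:
            break
    return ys[1:]  # drop the <s> seed
```

Beam search replaces the single running prefix with the k best-scoring prefixes, trading compute for output quality; the loop structure is otherwise the same.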
4.1 Datasets

In this work, we compile data available from several sources, each of which we describe below. The languages we use in our experiments are English (en), Hindi (hi), Bengali (bn), Malayalam (ml), Tamil (ta), Telugu (te) and Urdu (ur). A summary of the corpora used is provided in Table 1.

            bn     hi     ml     ta     te     ur
wat-train   337K   84K    359K   26K    22K    26K
iitb-train  -      1.5M   -      -      -      -
ilci-train  50K    50K    50K    50K    50K    50K
wat-dev     0.5K   0.5K   0.5K   0.5K   0.5K   0.5K
iitb-dev    -      0.5K   -      -      -      -
iitb-test   -      2.5K   -      -      -      -
wat-test    1K     1K     1K     1K     1K     1K
wat-mono    453K   105K   402K   30K    24K    29K
wiki-mono   371K   -      1M     1.6M   2.66M  -
iitb-mono   -      19.2M  -      -      -      -

Table 1: WAT Corpus Details

Parallel Data. We primarily rely on IITB-hi-en as parallel data for our training. We also use an extended version of our system from WAT-2018 [Philip et al., 2018] to backtranslate the available monolingual Hindi data to English, and to augment the training. To take full advantage of the multilingual setting, we use the ILCI corpus to include training pairs in all directions. In addition to the above, we use WAT-ILMPC, which gives the multilingual model the ability to implicitly pivot through English. We use the designated splits of IITB-hi-en and WAT-ILMPC for testing and validation (development).

4.2 Training Details

In this section, we describe in detail the hyperparameters of the models of the previous section, in order to enable reproducibility.

We use SentencePiece (https://github.com/google/sentencepiece) to reduce the vocabulary while maintaining the ability to cover the full text. While attempting to build one huge shared vocabulary similar to Johnson et al. [2017], we find that languages with fewer pairs are under-represented. To get around this, we build separate SentencePiece models, to deal with the class imbalance and to take advantage of the large monolingual data available. These models are trained to cover maximum text while constraining to a vocabulary size of 4000. Due to the vocabulary shared among these individual models, arising from the code-mixed content present in the large corpus, we end up with a vocabulary of size 26345 across the 7 languages. One advantage of freezing the vocabulary is that we can warm-start later models while varying the data.

Our experiments use an NMT model based on Transformer-Base, available in fairseq-py (https://github.com/pytorch/fairseq). The model learns to translate across 7 languages. A control token indicating which language to translate to is used to switch languages, as in Johnson et al. [2017]. We train the model on all the available training data for 120 epochs, giving us a baseline model which we denote hereafter as IL-MULTI. Training IL-MULTI took us approximately 3 days on 16 NVIDIA 1080Tis across 4 machines.

There are several issues with the data used to train IL-MULTI. One major concern is the imbalance in results across languages. Given an NMT system which produces noisy but comprehensible translations to English, we use backtranslation to augment the training data and bring in more authentic language samples for under-represented languages. Since monolingual corpora are abundant compared to bi-text, this process improves results further. We therefore augment the training dataset with back-translations from monolingual corpora. We only add {hi, en}->xx samples, to deal with the class imbalance and to prevent the decoder from learning noisy text. The system obtained after training IL-MULTI for a further 72 epochs with the backtranslated data added, we denote as IL-MULTI+bt.

We use Bleualign (https://github.com/rsennrich/Bleualign) [Sennrich and Volk, 2011] to obtain parallel pairs between each pair of languages. Following this, we use heuristic methods to align pairs across languages, obtaining a near-complete multilingual test-set. The approximate translations required by Bleualign are created using our IL-MULTI+bt model, but the multilingual pairs after alignment undergo one iteration of manual verification.

4.3 Evaluations

We report Bilingual Evaluation Understudy (BLEU) [Papineni et al., 2002] on all the test sets discussed previously. BLEU is a precision-based metric for the automatic evaluation of machine translation test sets. In Section 7, we describe and present other metrics used in the MT space to enable comparisons. Automated evaluation is perhaps not the best choice when it comes to determining the quality of translations.

5 Results and Discussions

In this section, we present our results on WAT-ILMPC, IITB-hi-en and our newly explored Mann Ki Baat test-sets (https://www.narendramodi.in/mann-ki-baat; we thank the Bahubhashak initiative for bringing attention to this as a language resource). In the process, we attempt to comprehensively study the performance.

5.1 Results on IITB-en-hi and WAT-ILMPC

We now have one model yielding all-round performance on two tasks (one domain-specific and one generic test set), trained on a combination of both datasets. We report the results of the automated evaluation procedures here. A summary of our numbers on the BLEU metric for each task is presented in Table 2, across two variants of the same model.

Our base model, IL-MULTI, obtains the best results to date on the English-to-Hindi direction of IITB-en-hi. The results for Hindi to English are also quite comparable. For the WAT-ILMPC tasks, our system gives comparable numbers while translating to English from the other available languages. In the other direction, however, we observe that the results are lower.

Attempts to fine-tune and adapt IL-MULTI to the domain-specific subtitles dataset give us IL-MULTI+ft, which demonstrates a boost in the evaluation scores on test sets specific to WAT-ILMPC. However, we observe that performance degrades on the IITB-hi-en test set, which is general domain.

The overall ordering in the evaluation metrics shows some patterns and consistency. While xx-en gives performance comparable to the best values in many languages, en-xx fails to show similar performance. This could be due to imbalances in training samples among language directions. The WAT-ILMPC dataset heavily pivots through English, the training data being strictly English on one side and other languages on the other. This leaves training in a situation where there is a lot of data for the decoder to learn to generate English targets, while proportionately very little for languages with fewer resources. We try to address this by augmenting training with heavy monolingual data on the decoder side, translating from noisy backtranslated sources in English or Hindi. The model thus warm-started from IL-MULTI, denoted IL-MULTI+bt, gives us an increase in BLEU for the lower-resource directions.

Our results on WAT-ILMPC and IITB-hi-en put us in a comparable setting for many language pairs. Moreover, our multiway training enables us to study how the xx-yy directions work, where xx or yy need not necessarily be English. For this, we use the Mann Ki Baat dataset to obtain a proper multilingual test set, on which we report our numbers in these directions below.

5.2 Multilingual Results

For our multilingual results, we present BLEU scores computed after using our SentencePiece models for tokenization.
Though non-standard, this compensates for agglutinative languages with free word orderings ending up with low BLEU scores, albeit with translations of reasonably good quality. Since the tokenization setup is uniform across languages, this lets us compare the several possible directions fairly. (We do, however, report the non-SentencePiece-tokenized BLEU scores in Section 7.)

direction  model         wat-bn  wat-hi  wat-ml  wat-ta  wat-te  wat-ur  iitb-hi
from-en    IL-MULTI      10.21   20.72   8.17    12.55   19.10   16.56   20.17
from-en    IL-MULTI+ft   12.26   26.25   9.22    16.06   23.63   19.51   18.31
to-en      IL-MULTI      17.35   29.74   12.45   17.63   25.56   21.03   22.62
to-en      IL-MULTI+ft   19.58   31.55   14.77   21.94   30.91   24.60   20.66

Table 2: BLEU scores on WAT and IITB test sets. Directions are indicated in the rows.

source  The Führer isn't recruiting you as soldiers, he's looking for a secretary.
target  यह आवयक नह है. यूरर तुम को सैनक के प म भत नह कर रहा है, वह एक सचव खोज रहा है.
mm      यूरर आप सैनक के प म भत नह कर रहा है, वह एक सचव क तलाश कर रहा है.
mm+ft   यूरर आप सैनक के प म भत नह कर रहा है, वह एक सचव क तलाश म है।

source  No man in the sky intervened when I was a boy to deliver me from Daddy's fist and abominations.
target  आकाश म कोई भी आदमी हतेप कया जब म एक लड़का था मेरे पताजी क मुी और घनौने काम से वतरत करने के लए।
mm      आसमान म कसी आदमी ने तब हतेप नह कया जब म पताजी क मुी और घनौनी से मुझे पंचाने वाला लड़का था।
mm+ft   आकाश म कोई भी आदमी हतेप कया जब म एक लड़का था पताजी क मुी और अपमान से मुझे देने के लए.

source  Tonight, we offer you the one illusion that we dreamed about as kids but never dared perform until now.
target  आज रात, हम तुम हम बच के प म के बारे म सपना देखा था क एक म पेशकश...... लेकन अब तक दशन करने क हमत कभी नह.
mm      आज रात हम आपको एक म क पेशकश करते ह क हम बच के प म सपने देखा लेकन अब तक कभी हमत नह क दशन।
mm+ft   आज रात, हम तुह एक म क पेशकश हम बच के प म के बारे म सपना देखा क लेकन अब तक दशन करने क हमत कभी नह क.

source  What if I tell you it's got a Foot Locker on one side and a Claire's Accessories on the other.
target  या होगा अगर म तुह बता मल गया है एक फुट एक तरफ लॉकर और सरे पर एक लेयर सहायक उपकरण.
mm      या अगर म तुह बताता क यह एक तरफ एक फुट लॉकर और एक लेयर एसेसरज मल गया है.
mm+ft   या म यह एक तरफ एक फुट लॉकर मल गया है और एक लेयर के एसेसरज पर.

source  I could take up Jon's duties while he's gone, My Lord.
target  जबक वह, चला गया मेरे भु म जॉन के कत का समय लग सकता है।
mm      म जॉन के कत को संभाल सकता था जब वह गया है, मेरे भु.
mm+ft   म जॉन के कत को संभाल सकता था जब वह चला गया है, मेरे भु।

Table 3: Five example sentences in English and their translations in Hindi. Readers may note the (i) length and (ii) complexity of the sentences, and (iii) the diversity among models.

In Table 4, we present results on Mann Ki Baat using our model IL-MULTI+bt, which gave the best numbers overall. The rows indicate the source language and the columns the target language. The results indicate improvements especially in generating translations into the resource-scarce languages from a rich source. Our English-Hindi models give the strongest performance, as in the setting above.

src\tgt  bn      en      hi      ml      ta      te      ur
bn       99.87   15.20   14.43   8.15    3.85    7.35    0.0
en       9.50    100.00  15.37   8.71    4.60    8.20    0.0
hi       11.97   21.79   99.20   9.09    5.83    9.49    0.0
ml       6.85    12.00   9.84    99.90   3.37    7.32    0.0
ta       3.51    6.92    5.86    4.06    99.91   3.63    0.0
te       6.99    11.06   9.54    8.00    3.59    99.74   0.0
ur       0.00    0.00    19.55   0.00    0.00    0.00    100.0

Table 4: BLEU scores over translation pairs in the Mann Ki Baat multilingual test-set. The complexity of the sentences is high, with long and rich sentences of the kind included in ministerial speeches.

From the grid, we see (near) zero-shot performance for language pairs which were not represented well enough in the dataset. The results on ml-te and bn-te show performance comparable to hi-te and en-te. The same phenomenon exists while translating to Tamil and Malayalam. Almost all the models have learned to copy well: when the source and target languages are the same, the model reproduces the target accurately ∼100% of the time.
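The BLEU numbers above reduce to clipped n-gram precision combined with a brevity penalty [Papineni et al., 2002]. A simplified single-sentence, single-reference sketch follows; real toolkits pool counts over the whole corpus and apply smoothing, so their numbers will differ from this toy version:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of the n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty that
    punishes hypotheses shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matched = sum((h & r).values())      # clipped matches
        total = max(sum(h.values()), 1)
        precisions.append(matched / total)
    if min(precisions) == 0.0:               # unsmoothed: any zero kills the score
        return 0.0
    brevity = min(1.0, math.exp(1.0 - len(ref) / len(hyp)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

The unsmoothed zero-precision cutoff is one reason short, heavily agglutinated outputs score poorly, which motivates the uniform SentencePiece tokenization used for Table 4.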
In our experiments we find that IL-MULTI+ft, which performed better on the WAT-ILMPC test sets, failed to show good results here, indicating that the model is clearly adapting to the WAT-ILMPC data.

5.3 Qualitative Examples and Discussions

In Tables 9 and 10, we illustrate success and failure cases among a few languages. Our model has managed to capture the meaning most of the time. When the ground truth is incorrect due to the subtitle corpus being noisy, our model still manages to give the correct translation (WG-3, for example). There are samples, like music tracks, which are not translated and stay the same; while acceptable in a subtitle setting, this is not so for pure translation attempts.

Sometimes the ground truth, being subtitles, contains sentences whose word meanings cannot be inferred from the given context. Take WB-4 in Table 10, for example. The ground truth carries a lot of context, assuming while translating that the person has a disease and is recovering. It can be noticed that our models confuse this with improvement or betterment, the context being absent in the source.

Our model is not able to keep up to the mark while translating poetic sentences. It also seems to miss colloquial sentences, translating almost literally and missing the intent, although capturing enough of the meaning, as in WB-5.

6 Discussions

It is important to isolate errors or noise and improve the evaluation standards over time to enable more authentic evaluations. In the previous section, we illustrated several qualitative examples where our model was working fine, but the ground truth itself was incorrect, incomplete, or had extra information. In this section, we take one iteration at improving the state of the test set, note down a few errors, and discuss our take on the evaluation metrics.

6.1 Shortcomings in the Evaluation Set

In our experiments, we found our multilingual models trained on large data to be robust and well-performing for translations, but the BLEU scores obtained during evaluation indicate otherwise. Our best results are obtained after fine-tuning a general-purpose model on the training set of the WAT-ILMPC. Further inspection indicates that the ground-truth samples are incomplete, on either side or both. In Table 5, we illustrate a few of these.

6.2 Limitations of the Evaluation Metrics/Protocol

In this section, we argue why several automated metrics or evaluation protocols may not be ideal for evaluating MT models in the Indian language setting. As in many other languages, there are diverse ways of conveying the same meaning in text in Indian languages. Normally, automated metrics like BLEU are evaluated across multiple references, but the WAT-ILMPC test set has only one reference.

7 Conclusion and Future Work

In this work, we report baselines for Indian language translation tasks. Our solution, implemented with state-of-the-art machine learning tricks, yields competitive results for many language pairs even when few resources are available. We study the multilingual training scenario for Indian languages, successfully applying several established methods from the literature to obtain competitive results.

A reasonably functioning MT system can be used in many other tasks, such as computing embeddings for downstream classification or sentiment analysis. The ability to transfer learning from well-experimented embeddings, as in English, will help Indian languages. Some future works could include: (i) ideas from machine learning to address resource constraints (e.g., few-shot learning, active learning, adversarial training) in Indian language situations; (ii) domain adaptation, transfer learning and style transfer in language/translation situations; (iii) visualization and interpretable translation models for Indian languages with sharable linguistic structures; (iv) human-in-the-loop solutions for training, inference and adaptation; and (v) multimodal (image/video, speech and text) translation models and evaluation schemes.

Acknowledgements

The authors would like to acknowledge (i) motivation from the consistent and focused efforts of Profs RS and VC over the last many decades in establishing research groups with clear focus, (ii)

Table 5: Qualitative sample from WAT test-dataset. Missing tokens are colored in red. Few samples which are treated as gold aren’t as good. the vision and thoughts of Prof. RR in empow- Tamali Banerjee, Anoop Kunchukuttan, and Pushpak Bhat- ering rural India, (iii) the efforts of Prof. PB in tacharyya. Multilingual indian language translation sys- tem at wat 2018: Many-to-one phrase-based smt. 12 2018. creating resources, test-beds and bringing interna- tional and systematic attention to the Indian lan- Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. guage problems, and (iv) a large number of re- Understanding Back-Translation at Scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural searchers who released their codes and resources Language Processing, pages 489–500, 2018. publicly to enable this work. Special thanks to Pra- jwal Renukanad, Kiran Devraj, Vishnu Dorbala, Roee Aharoni, Melvin Johnson, and Orhan Firat. Massively multilingual neural machine translation. arXiv preprint Rudrabha, Binu, and other volunteers for helping arXiv:1903.00089, 2019. with assessing the translation system for qualita- Graham Neubig and Junjie Hu. Rapid adaptation of neu- tive samples. ral machine translation to new languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 875–880, 2018. References RMK Sinha, K Sivaraman, Aditi Agrawal, Renu Jain, Rakesh Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to Srivastava, and Ajai Jain. Anglabharti: a multilingual ma- sequence learning with neural networks. In Advances in chine aided translation project on translation from english neural information processing systems, pages 3104–3112, to indian languages. In 1995 IEEE International Confer- 2014. ence on Systems, Man and Cybernetics. Intelligent Systems for the 21st Century, volume 2, pages 1609–1614. IEEE, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 1995. 
Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014. Sriram Chaudhury, Ankitha Rao, and Dipti M Sharma. Anusaaraka: An expert system based machine translation Minh-Thang Luong, Hieu Pham, and Christopher D Man- system. In Proceedings of the 6th International Confer- ning. Effective approaches to attention-based neural ma- ence on Natural Language Processing and Knowledge En- chine translation. arXiv preprint arXiv:1508.04025, 2015. gineering (NLPKE-2010), pages 1–6. IEEE, 2010. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Harold Somers. Example-based machine translation. Ma- and Yann N Dauphin. Convolutional sequence to sequence chine Translation, 14(2):113–157, 1999. learning. In Proceedings of the 34th International Confer- ence on Machine Learning-Volume 70, pages 1243–1252. Ruchika A Sinhal and Kapil O Gupta. A pure ebmt approach JMLR. org, 2017. for english to hindi sentence translation system. Interna- tional Journal of Modern Education and Computer Sci- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- ence, 6(7):1, 2014. reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Il- Peter F Brown, Vincent J Della Pietra, Stephen A Della Pietra, lia Polosukhin. Attention is all you need. In Advances and Robert L Mercer. The mathematics of statistical ma- in Neural Information Processing Systems, pages 5998– chine translation: Parameter estimation. Computational 6008, 2017. linguistics, 19(2):263–311, 1993. Wikipedia. Wikipedia Statistics V2. URL https://stats. Philipp Koehn. Statistical machine translation. Cambridge wikimedia.org/v2/#/all-projects. University Press, 2009. Google. Google Translate. URL http://translate. Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, google.com. Ritesh M. Shah, and Pushpak Bhattacharyya. Shata- anuvadak: Tackling multiway translation of indian lan- Microsoft. Bing Translator. URL https://www.bing. guages. In LREC, 2014. com/translator. 
Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Ziming Zhang and Venkatesh Saligrama. Zero-shot learning Bhattacharyya. Iitp-mt at wat2018: Transformer-based via semantic similarity embedding. In Proceedings of the multilingual indic-english neural machine translation sys- IEEE international conference on computer vision, pages tem. In Proceedings of the 5th Workshop on Asian Trans- 4166–4174, 2015. lation (WAT2018), 2018. Themos Stafylakis and Georgios Tzimiropoulos. Zero-shot Jerin Philip, Vinay P. Namboodiri, and C. V. Jawahar. CVIT- keyword spotting for visual speech recognition in-the- MT Systems for WAT-2018. In 5th Workshop on Asian wild. In Proceedings of the European Conference on Com- Translation (WAT2018), 2018. puter Vision (ECCV), pages 513–529, 2018.

Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhat- Rico Sennrich and Martin Volk. Iterative, mt-based sen- tacharyya. The IIT Bombay English-Hindi Parallel Cor- tence alignment of parallel texts. In Proceedings of pus. In Proceedings of the Eleventh International Con- the 18th Nordic Conference of Computational Linguistics ference on Language Resources and Evaluation (LREC- (NODALIDA 2011), pages 175–182, 2011. 2018), 2018. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Girish Nath Jha. The TDIL Program and the Indian Language Zhu. Bleu: a method for automatic evaluation of machine Corpora Intitiative (ILCI). In LREC, 2010. translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Nadeem Jadoon Khan, Waqas Anwar, and Nadir Durrani. Association for Computational Linguistics, 2002. Machine translation approaches and survey for indian lan- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and guages. arXiv preprint arXiv:1701.04290, 2017. Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision Taku Kudo. Subword regularization: Improving neural net- and pattern recognition, pages 248–255. Ieee, 2009. work translation models with multiple subword candi- dates. In Proceedings of the 56th Annual Meeting of Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, the Association for Computational Linguistics (Volume 1: and Hajime Tsukada. Automatic evaluation of translation Long Papers), volume 1, pages 66–75, 2018. quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Lan- Rico Sennrich, Barry Haddow, and Alexandra Birch. Neu- guage Processing, pages 944–952. Association for Com- ral machine translation of rare words with subword units. putational Linguistics, 2010. arXiv preprint arXiv:1508.07909, 2015. Rafael E Banchs, Luis F D’Haro, and Haizhou Li. Adequacy– GV Garje and GK Kharate. 
Survey of machine translation fluency metrics: Evaluating mt in the continuous space systems in india. 2013. model framework. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3):472–482, 2015. TDIL-Sampark. Sampark Translation System. URL http://tdil-dc.in/index.php?option=com_ vertical&parentid=74&lang=en.

Melvin Johnson, Mike Schuster, Quoc V Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351, 2017.

Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. Transfer learning for low-resource neural machine trans- lation. In Proceedings of the 2016 Conference on Em- pirical Methods in Natural Language Processing, pages 1568–1575, 2016.

Rico Sennrich, Barry Haddow, and Alexandra Birch. Im- proving neural machine translation models with monolin- gual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 86–96, 2016.

Toshiaki Nakazawa, Katsuhito Sudoh, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, and Sadao Kurohashi. Overview of the 5th workshop on asian translation. In Proceedings of the 5th Workshop on Asian Translation (WAT2018), 2018.

Richard Socher, Milind Ganjoo, Christopher D Manning, and Andrew Ng. Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems, pages 935–943, 2013.

src.hi: मुंबई म उहने भारतीय पार क अंतम गद पर आउट होने से पहले 15 गद पर 26 रन बनाए।
general.en: In , he made 26 runs on 15 balls before being out on the last ball of Indian mercury.
adapted.en: In Mumbai, he scored 26 runs in 15 balls before being dismissed for the last ball of the Indian innings.

src.hi: लेकन लयोन ने ये कैच हवा म उड़कर लया।
general.en: But Leon flew these coaches in the air.
adapted.en: But Leone took the catch in the air.

src.hi: 5- भारतीय कतान वराट कोहली ीलंका के खलाफ वनडे मैच म 2000 रन पूरे कर सकते ह।
general.en: 5- Indian captain Virat Kohli can complete 2000 runs in forest matches against Sri Lanka.
adapted.en: 5- Indian captain Virat Kohli can complete 2000 runs in ODIs against Sri Lanka.

Table 6: Samples translated by a model trained on a general dataset versus the same model adapted to a domain, here cricket. The adapted model translates the phrases correctly, while the general model makes errors on the domain-specific English terms mixed into the source.

A Fine tuning to a specific target domain

Transfer learning has been widely used by the computer vision community for some time now. These works typically achieve excellent performance on a specific narrow-domain task by fine-tuning a generic model trained on a larger dataset like ImageNet [Deng et al., 2009]. In the language space, past works attempt fine-tuning embeddings [??] for sentiment classification to a new language or domain where data is scarce, effectively transferring learning from the general case the embeddings are originally trained on. In this section, we experiment with similar fine-tuning methods and show that we can improve our translation system for a particular domain: cricket related content.

Newspaper reports and commentary archives on cricket are crawled and corrected, and a noisy parallel corpus is obtained using online services. 279K parallel pairs are thus obtained between Hindi and English. We set aside 100 samples as a test-set and collect human provided translations for the same.

We use three types of models while testing: (1) IL-MULTI, (2) IL-MULTI fine-tuned on cricket data, and (3) a model with the same parameters as IL-MULTI trained from scratch. Table 7 compares the performance of the three models on the test set. Among the three models, the warm-started model fine-tuned on the cricket-specific corpus achieves the best performance, outperforming the model trained from scratch on this narrow domain.

Model          en → hi    hi → en
cold-start      30.80      33.84
warm-start      25.57      29.27
+ fine-tuning   35.72      39.97

Table 7: BLEU scores of different takes on building a domain-specific translation system for cricket. Cold-start denotes a model trained on cricket data alone. Warm-start begins with IL-MULTI, trained on the general dataset; this model is further fine-tuned on cricket data.

The cricket domain includes many terms which resolve to meanings different from their normal ones when restricted to the domain. For instance, a "boundary" or "four" would mean the event of "four runs" being scored in a match. Terms like "sixers", "square drive", "form", etc. pose different meanings compared to the meanings associated with the same terms in a general context. Table 6 shows some examples corrected by the fine-tuning method, which would otherwise be mistranslations due to the lack of contextual knowledge.

B.1 More on WAT-ILMPC

In the main paper, the content was focused on building a general purpose machine translation system among 7 languages. We argued in favour of keeping IL-MULTI for future experiments against IL-MULTI+ft. In this section, we present supplementary content assessing IL-MULTI+ft, specific to its merits on the WAT-ILMPC translation tasks.

In Table 8, we additionally indicate the Rank-based Intuitive Bilingual Evaluation Score (RIBES) [Isozaki et al., 2010] and Adequacy-Fluency Metrics (AM-FM) [Banchs et al., 2015], along with the BLEU scores presented before. These numbers are evaluated by the WAT submission platform and are comparable to other submissions.

metric  direction   wat-bn     wat-hi     wat-ml     wat-ta     wat-te     wat-ur
BLEU    from-en      12.26      26.25       9.22      16.06      23.63      19.51
BLEU    to-en        19.58      31.55      14.77      21.94      30.91      24.60
RIBES   from-en   0.654716   0.740006   0.517447   0.700706   0.779194   0.657035
RIBES   to-en     0.759463   0.805115   0.727629   0.770894   0.816129   0.741251
AM-FM   from-en   0.576480   0.652810   0.645120   0.760910   0.731550   0.537590
AM-FM   to-en     0.529720   0.633010   0.521020   0.602280   0.667160   0.566190

Table 8: All metrics (BLEU, RIBES, AM-FM) on WAT-ILMPC leaderboard for our IL-MULTI+ft model. Our model ranks first in 11 cases, which are highlighted in bold.
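The warm-start recipe compared in Table 7 is, in essence, "initialize from the general checkpoint, then continue training on the small in-domain corpus". A stdlib-only caricature of the three setups follows; every name and number here is an illustrative stand-in (a dict of floats instead of a Transformer), not the actual training code:

```python
import copy

def train(params, corpus, epochs, lr=0.1):
    # Stand-in for NMT training: move each parameter toward the value
    # the corpus prefers. A real system would run gradient descent on a
    # Transformer; a dict of floats keeps the sketch runnable.
    for _ in range(epochs):
        for name, optimum in corpus.items():
            params[name] += lr * (optimum - params[name])
    return params

# Hypothetical "corpora", expressed as the parameter values they favour.
general_corpus = {"w": 1.0}   # large general-domain data
cricket_corpus = {"w": 1.2}   # small in-domain (cricket) data
init = {"w": 0.0}

# (1) cold-start: train on the small cricket corpus alone.
cold = train(copy.deepcopy(init), cricket_corpus, epochs=5)
# (2) warm-start: the general-purpose model (an IL-MULTI stand-in),
#     trained long on the large general corpus.
warm = train(copy.deepcopy(init), general_corpus, epochs=50)
# (3) warm-start + fine-tuning: continue from the general checkpoint
#     for the same small in-domain budget as (1).
fine = train(copy.deepcopy(warm), cricket_corpus, epochs=5)
```

With an equal in-domain budget, the fine-tuned model ends up far closer to the in-domain optimum than the cold-started one, mirroring the ordering in Table 7 where warm-start plus fine-tuning performs best.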

In Tables 9 and 10, we present success and failure cases of our models on the WAT-ILMPC test-sets, already discussed before.

B.2 Mann Ki Baat

We present our multilingual translation BLEU scores on the Mann Ki Baat test-set. We find what is expected, in agreement with the existing NMT literature. Our baseline model gives good results on the Mann Ki Baat test-set. However, contrary to the case of WAT-ILMPC, we find that IL-MULTI outperforms IL-MULTI+ft. This is expected, as we fine-tuned to a subtitles corpus and thereby moved the model away from this domain. As widely demonstrated in the literature, augmentation with backtranslated data gives improvements of a few BLEU points. Overall, IL-MULTI+bt performs the best, as visible from the grids given below (Tables 11, 12, 13).7

From a qualitative point of view, we found samples from languages like ml, te and ta to be on par with the other languages, except hi and en. However, the BLEU scores indicate a larger gap. We observe this is primarily due to the agglutinative and inflective nature of these languages. Free word ordering in many cases leads to low BLEU scores even when samples are of good quality.

The above numbers conform to the normal tokenization in evaluation scripts, which is primarily suited for English. But it is common practice to use more information to tokenize. This provides BLEU scores comparable across languages, especially for languages with heavy agglutination and inflection. We present BLEU scores for the references and hypotheses tokenized using our SentencePiece models for the respective languages in Tables 14, 15 and 16.
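The effect of tokenization on BLEU discussed above can be illustrated with a small stdlib-only sketch (for illustration only; the reported numbers come from the indicnlp/WAT evaluation scripts, not this code). Scoring over finer-grained units, as subword tokenization does, awards partial credit to inflected forms that whole-word matching scores as zero:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hyp, ref, max_n=4):
    # Sentence-level BLEU: clipped n-gram precisions combined by a
    # geometric mean, with a brevity penalty (illustrative, not sacrebleu).
    precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((Counter(ngrams(hyp, n)) & Counter(ngrams(ref, n))).values())
        precisions.append(overlap / max(len(ngrams(hyp, n)), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Toy English stand-in for an agglutinative inflection mismatch.
hyp, ref = "he defended the goalposts", "he defended the goalpost"

# Word-level tokens: the inflected "goalposts" never matches "goalpost".
word_bleu = bleu(hyp.split(), ref.split())
# Character-level tokens (a crude stand-in for subword units): the shared
# stem earns partial credit.
char_bleu = bleu(list(hyp), list(ref))
```

Here the word-level score collapses to 0 because no 4-gram matches, while the character-level score remains high; outputs in heavily agglutinative languages behave similarly, which motivates the tokenized scores above.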

7 We are following indicnlp's BLEU computation, which should match the WAT website's BLEU computation, but did not. We are in touch with the organizers to make the values consistent.

WG-1
wat.ml: അവർ ഒ േരാഗകാരിെയ പടർതായി ഞാൻ ക, അവർ് ആേരാഗകരമായ ഒ േഹാ് േവണം.
wat.en: I think they're spreading a pathogen, and they need a healthy host.
mm.en: I think they're spreading a sick person, they need a healthy host.
mm+ft.en: I think they're spreading a patient, they need a healthy host.

WG-2
wat.en: One, two, three, four, five, six, seven... eight, nine, ten, eleven.
wat.ml: ഒ് ,ര്, ്, നാല്, അ്, ആ, ഏഴ്... എ്, ഒത്, പ്, പതിെനാ് .
mm.ml: ഒ്, ര്, ്, നാല്, അ്, ആറ്, ഏഴ്... 8, 9, 10, പതിെനാ്.
mm+ft.ml: ഒ്, ര്, ്, നാല്, അ്, ആറ്, ഏഴ്... എ്, ഒത്, പ്, പതിെനാ്.

WG-3
wat.en: Come on listen to this corporal for a moment sir if he's not dead already
wat.ml: Come on. സർെജ്റ് , ഇയാൾ പറത് ഒ് േകൾ. സർെജ്റ്, ഇവിെട മാരകമായി പേ ഒ പാളാരെനം െകാ് വതാ.
mm.ml: ഒ നിമിഷം ഈ േകാറ പറത് േക സ, അവ മരിിിെി
mm+ft.ml: ഈ േകാറ ഒ നിമിഷം േക സ...... അവ മരിിിെി

WG-4
wat.hi: तुम मेरे घर य आये थे?
wat.en: Then why do I hurt so bad right now?
mm.en: Why did you come to my house
mm+ft.en: Why did you come to my house

WG-5
wat.hi: मुझे लगता है क यह वातव म कसी तरह क एक उड़ान बात थी.
wat.en: I think it was actually a flying thing of some kind.
mm.en: I think it was actually a kind of flying thing.
mm+ft.en: I think it was actually a kind of flying thing.

WG-6
wat.hi: 16 दसबर को हमारा काम कुछ-कुछ घुड़सवार सेना जैसा होगा, ओके?
wat.en: Now, our role on the 16th of December is to be a bit like the cavalry, OK?
mm.en: On December 16, our work will be like some horsemen army, OK
mm+ft.en: On December 16th, our job will be like some riding army, okay

WG-7
wat.en: I don't know how you find the time to raise kids and teach.
wat.ta: ைழைத கெகார எ உக ேநர ெகட?
mm.ta: க ழைதக வள ம ப ேநர எப ெதயா.
mm+ft.ta: க ழைதக வள ம பக ேநர கக எப என ெதயா.

Table 9: Success cases from the WAT test-set.

WB-1
wat.ml: അത് ഇനി നിെറ ദയിൽ എ െവം േദഷം ഉെ് പറാം ശരി.. കാി വലിാ സമയമാോൾ.. മി ആൾാർം അതിന് കഴിയി..
wat.en: No matter how much hate and anger you may have in your heart, when it comes time to pull the trigger most people can't do it.
google.en: No matter how much hatred and anger it is in your heart .. most people can not do it ..
mm+ft.en: If you say it's hate and angry in your heart, right... when you smoke the trigger, most people can't do it.
mm.en: If it's a hate and anger in your heart, then it's okay... when it's time to smoke the treasure, most people can't do it.

WB-2
wat.ml: വാളിേനാൾ ർ് അ ചലിിവെറ ചിയിലാണ് ഉാേകത് .
wat.en: Sometimes a sharp mind is enough.
google.en: It should be in the mind of the one who moves it over the sword and moves it.
mm+ft.en: It's more important than the sword that's in the mind of a moving man.
mm.en: It's supposed to be in the mind of the sword that drives.

WB-3
wat.en: "Most likely they have died of cold and hunger - far away there in the middle of the forest."
wat.ml: "Most likely they have died of cold and hunger - far away there in the middle of the forest."
mm.ml: "അവ് തം വിശം മരണം സംഭവിം...... അവിെട വനിെറ മി െര."
mm+ft.ml: "അവ് തം വിശം ലം മരണം സംഭവിാം...... അവിെട കാിെറ നവി െര."

WB-4
wat.en: He told me you would get better. And you did.
wat.ml: അവൻ പറ അ് േവഗം േഭദമാെമ് അ മാകം െചോ.
mm.ml: അവ എോട് പറ നിന് നത് കിെമ്.
mm+ft.ml: അവ എോട് പറ നിന് ഖമാെമ്.

WB-5
wat.en: Old age should burn and rave at close of day
wat.ta: Old age should burn and rave at close of day
mm.ta: வயதானவக நா வ எக ேவ...
mm+ft.ta: பைழய வய எக ேவ ம நா ெநகமாக ழா ேவ

Table 10: Failure cases from the WAT test-set.

src    bn      en      hi      ml      ta      te      ur
bn    99.85   11.56   12.42    2.01    1.20    2.30    0.00
en     4.98  100.00   13.75    1.87    1.28    1.81   18.52
hi     6.93   17.87   99.49    2.77    1.71    3.09   12.55
ml     3.19    8.93    8.79   99.77    0.95    2.43    0.00
ta     1.68    5.15    5.33    1.16   99.74    1.57    0.00
te     2.78    7.51    7.67    2.24    1.06   99.48    0.00
ur     0.00    0.00   32.64    0.00    0.00    0.00  100.00

Table 11: BLEU scores of IL-MULTI.

src    bn      en      hi      ml      ta      te      ur
bn    99.85   11.28   12.46    2.49    1.30    2.72    0.00
en     5.18  100.00   13.66    2.78    1.46    2.81    0.00
hi     7.01   17.95   99.50    3.03    2.07    3.78    0.00
ml     3.45    8.86    8.98   99.77    1.04    3.08    0.00
ta     1.78    5.28    5.46    1.22   99.85    1.71    0.00
te     3.01    7.64    7.84    2.61    1.21   99.68    0.00
ur     0.00    0.00   32.64    0.00    0.00    0.00  100.00

Table 12: BLEU scores of IL-MULTI+bt.

src    bn      en      hi      ml      ta      te      ur
bn    99.49    9.53    3.07    1.07    0.59    1.46    0.00
en     4.19   99.99   10.10    1.93    1.18    1.63    0.00
hi     4.54   13.28   99.42    1.50    0.91    1.36   18.44
ml     1.33    7.51    4.79   99.42    0.88    1.89    0.00
ta     0.98    4.24    3.21    0.92   99.53    0.93    0.00
te     2.06    5.79    3.15    1.41    0.47   99.10    0.00
ur     0.00    0.00    0.00    0.00    0.00    0.00  100.00

Table 13: BLEU scores of IL-MULTI+ft.

src    bn      en      hi      ml      ta      te      ur
bn    99.87   15.32   14.22    7.35    3.51    6.40    0.00
en     9.45  100.00   15.46    7.99    3.80    6.02   14.53
hi    11.96   21.77   99.22    8.50    4.55    7.34   10.50
ml     6.54   12.13    9.85   99.89    3.05    6.19    0.00
ta     3.40    6.82    5.85    3.69   99.86    3.20    0.00
te     6.51   11.00    9.34    7.42    3.26   99.65    0.00
ur     0.00    0.00   28.54    0.00    0.00    0.00  100.00

Table 14: BLEU scores of IL-MULTI with tokenization.

src    bn      en      hi      ml      ta      te      ur
bn    99.87   15.20   14.43    8.15    3.85    7.35    0.00
en     9.50  100.00   15.37    8.71    4.60    8.20    0.00
hi    11.97   21.79   99.20    9.09    5.83    9.49    0.00
ml     6.85   12.00    9.84   99.90    3.37    7.32    0.00
ta     3.51    6.92    5.86    4.06   99.91    3.63    0.00
te     6.99   11.06    9.54    8.00    3.59   99.74    0.00
ur     0.00    0.00   19.55    0.00    0.00    0.00  100.00

Table 15: BLEU scores of IL-MULTI+bt with tokenization.

src    bn      en      hi      ml      ta      te      ur
bn    99.73   12.85    3.87    4.29    1.05    4.31    0.00
en     8.39   99.98   12.34    7.70    2.95    4.94    0.00
hi     7.52   17.33   99.17    5.57    2.07    3.44   15.67
ml     3.11   10.33    6.08   99.75    2.50    4.61    0.00
ta     1.79    5.71    3.91    3.13   99.77    1.84    0.00
te     4.96    8.67    4.82    5.23    0.87   99.51    0.00
ur     0.00    0.00    0.00    0.00    0.00    0.00  100.00

Table 16: BLEU scores of IL-MULTI+ft with tokenization.

source (en): Mann Ki Baat, February 2019 My dear countrymen, Namaskar.
en prediction: Mann Ki Baat, February 2019 My dear countrymen, Namaskar.
en reference: Mann Ki Baat, February 2019 My dear countrymen, Namaskar.
hi prediction: मान क बात, फरवर 2019 मेरे य देशवासी, नमकार।
hi reference: मन क बात, फरवर 2019 मेरे यारे देशवासयो, नमकार
bn prediction: িক বাট, েফয়াির ২০১৯ আমার িয় েদশবাসী, নামসকার.
bn reference: মন িক বাত, েফয়াির ২০১৯ আমার িয় েদশবাসী, নমার!
ta prediction: ம பா, ரவ 2019 எ ய ேநச நா நமக.
ta reference: மன ர, ரவ 2019 எனதைம நாமகேள, வணக!

source (en): In order to ensure that their fellow countrymen could sleep peacefully, these brave sons toiled relentlessly, day or night.
en prediction: In order to ensure that their fellow countrymen could sleep peacefully, these brave sons toiled relentlessly, day or night.
en reference: In order to ensure that their fellow countrymen could sleep peacefully, these brave sons toiled relentlessly, day or night.
ml prediction: േദശവാസിക് സമാധാനോെട ഉറാ കഴിെമ് ഉറവാ, ഈ ധീരമാ് നിതേമാ, രാിേയാ ഉണി.
ml reference: ജന സമാധാനോെട ഉറാ, നെട ഈ വീരാ രാിെയ പകലാി കാവ നി.
hi prediction: ताक यह सुनत कया जा सके क उनके साथी देशवासी शांतपूवक सो सक, इन बहार बेट ने अथक, दन या रात बता दी।
hi reference: देशवासी चैन क नद सो सक, इसलए, इन हमारे वीर सपूत ने, रात-दन एक करके रखा था
te prediction: 顇శ鑍 పర్�ంతం尾 ꀿదర్ꡋ퐾లꀿ శర్ద鱍尾 ꀿశ桍밿ం桇ం顁呁 呁렾 పర్�ంతం尾, పగ 졇頾 쀾తిర్吿 運డꡍ萾葍.
te reference: ఈ 顇శపర్జ పర్�ంతం尾 ꀿదర్ ꡋవడం 呋సం ఈ푀రꡁతమ హస ꀿదార్쀾 렾ꁁ呁ꀿ మన젿ꁍ రకిష్ం栾.
ta prediction: நபக அைமயாக க த எபைத உ ெசய இத ர மகக அைமயாக, நா அல இர றதன.
ta reference: நாமக மயாக உறகேவ எபதகாக, நைடய இத ர ைமதக, இர பக எனபாராம தகைள அபதாக.
bn prediction: তােদর েদশবাসীরা শািেত ঘুমােত পারেত িনিত করার জ, এই সাহসী েছেলরা িনঃসেেহ, িদন বা রােত িবর হেয় যায়।
bn reference: েদশবাসী যােত িনিে ঘুেমােত পােরন েসিদেক েখয়াল েরেখ এইসব বীর সুসানরা িদন-রাত এক কের সজাগ দৃি েরেখ চলেতা।

source (te): ప頿 쁋灁ల కిర్తం భరత렾త తన 푀ర ꡁలꁁ 呋졋ꡍ밿ం頿.
en prediction: Ten days ago, the Great Mother lost his heroic sons.
en reference: 10 days ago, Mother India had to bear the loss of many of her valiant sons.
ta prediction: ப நாக னா பரமாத தைடய ரரகைள இழதா .
ta reference: 10 நாக பாக, பாரத அைன த ர ைமதகைள இழறா.
hi prediction: दस दन पहले भरत मा अपने वीर पुतल को खो चुक थी ।
hi reference: 10 दन पूव, भारत-माता ने अपने वीर सपूत को खो दया
bn prediction: দশ িদন আেগ ভৃত িনেজর বীেরর পুেক হািরেয় েফেলন ৷
bn reference: দশ িদন আেগ, ভারতমাতা তাঁর বীর সুসানেদর হািরেয়েছন।
te prediction: ప頿 쁋灁ల కిర్తం భరత렾త తన 푀ర ꡁలꁁ 呋졋ꡍ밿ం頿.
te reference: ప頿 쁋灁ల కిర్తం భరత렾త తన 푀రꡁలꁁ呋졋ꡍ밿ం頿.
ml prediction: പ ദിവസി ന്പ് ഭരത തെറ വീരാെര നെ .
ml reference: 10 ദിവസം ് ഭാരതമാതാവിന് ധീരാരായ ാെര നെ.

source (ml): ൈസനം ഭീകരവാദികെളം അവെര സഹായിവേരം േവേരാെട ഇാെതയാാ നിയം െചിരികയാണ്.
en prediction: The army is determined to seize the terrorists and those who help them.
en reference: The Army has resolved to wipe out terrorists and their harbourers.
hi prediction: सेना ने आतंकवादय और उनक मददगार को जड़ से वंचत करने का संकप लया है।
hi reference: सेना ने आतंकवादय और उनके मददगार के समूल नाश का संकप ले लया है
ta prediction: இராவ பயகரவாக ம அவக ெசதவகைள ேவ - க ய ெசள .
ta reference: ரவாகைள அவக ைண உத ெசபவகைள ேவேரா ெக எ உபாைட இராவன ேமெகாறாக.
ml prediction: ൈസനം ഭീകരവാദികെളം അവെര സഹായിവേരം േവേരാെട ഇാെതയാാ നിയം െചിരികയാണ്.
ml reference: ൈസനം ഭീകരവാദികെളം അവെര സഹായിവേരം േവേരാെട ഇാെതയാാ നിയം െചിരികയാണ്.
te prediction: సైꀿక 題త뱍푇త遍 మ쀿 퐾쀿 సయం 桇퐾쀿ꀿ 푇푇 ꁁం萿 త꠿ꡍం桁呋వ萾ꀿ吿 ꀿశ桍밿ంచబ萿ం頿 .
te reference: అ푇 렾ట 顇శ ధై쀾뱍ꀿ吿 బ젾ꀿꁍ ఇ栾桍밿.
bn prediction: ৈসরা সাসীেদর এবং তােদর সাহাযকারীেদরেক আলাদা কের রাখার জ ুত করা হেয়েছ ৷
bn reference: েসনারা সাসবাদীেদর এবং তােদর মদতকারীেদর কীভােব সমূেল ংস করা যায় তার স িনেয়েছ।

Table 17: Qualitative samples from the Mann Ki Baat dataset. The source is shown as one row; for each language we show our prediction followed by the respective aligned ground truth.

source (en): Jamuna Tudu, famous nicknamed 'Lady Tarzan' in Jharkhand, most valiantly took on the Timber Mafia and Naxalites, and not only saved the 50 hectares of forest but also inspired ten thousand women to unite and protect the trees and wildlife.
en prediction: Jamuna Tudu, famous nicknamed 'Lady Tarzan' in Jharkhand, most valiantly took on the Timber Mafia and Naxalites, and not only saved the 50 hectares of forest but also inspired ten thousand women to unite and protect the trees and wildlife.
en reference: Jamuna Tudu, famous nicknamed 'Lady Tarzan' in Jharkhand, most valiantly took on the Timber Mafia and Naxalites, and not only saved the 50 hectares of forest but also inspired ten thousand women to unite and protect the trees and wildlife.
te prediction: 瀾ర塍ండ 졋పర్頿鱍 框ం頿న "젾葀 簾쀾灍న" ꡇగల "జ롁ꀾ -", అత뱍鰿కం尾 簿ంబర 렾갿밾 మ쀿 కిష్ప谿頾లపై 遀呁ꀾꁍ, 呇వలం 50 吾籍ర졍 అడలꁁ 렾తర్롇 吾꠾డ졇顁, 吾ꁀ ప頿푇ల మం頿 సతరీ 框ట졍ꁁ మ쀿 వన뱍పార్谿ꀿ సంరకిష్ంచ萾ꀿ吿 萾 పేర్రణ 桇�.
te reference: ◌ార塍ండ 졋 졇葀 簾쀾灍న ꡇ運 పర్堾뱍逿 框ం頿న జ롁ꀾ -, 簿ంబర 렾갿밾 運, నకలైట졍 運 ꡋ쀾葇 హసవంతమైన పꀿ 桇. ఏభై 吾籍ర졍 అట푀 పార్ం逾ꀿꁍ నష籍ꡋ呁ం萾 吾꠾డడ롇 吾呁ం萾, క젿క籁籍尾 框籁졍, వన뱍灀퐾ల రక్షణ 呋సం ꡋ쀾葇젾 ప頿푇ల మళల呁 పేర్రణꁁ ఇ栾桍.
ta prediction: ஜாக கெபற "லா தஜ" ஆக ஆன ேபா ட உலமக ேயாககைலைய க ரபலமாக எதா, ேம 50 ெஹேட காகைள மேம காபாயட, ப ஆர ெபகைள மரகைள, வனலகைள ஒைமபத னா.
ta reference: ("Jamuna Tudu", "Timber Mafia" ம "Naki") க வ அவ பறா, இவைர 1500 ேபகைள ேயாக பநகளாக ஆறா.
ml prediction: ഛാഖിെല 'ലാഡി ടാസ' എ ശനായ ജന, ഏം വിശാലമായി ടിംബ മാഫിയം േദശാടനാം സരി് 50 െഹ വനം രി മാമ, പ് ആയിരം ീക മരം വനജീവികെളം ഒി രിാം േരിി.
ml reference: ◌ാഖി േലഡി ടാസ എ േപരി വിഖാതയായ യനാ തടി മാഫിയേയാം നക് സകേളാംേപാരാകെയ സാഹസം വി. 50 െഹ വനം നശിാെത കാെവ മാമ, പതിനായിരം ീകെള സംഘടിി് െടം വനജീവികെടം രായി വിാ അവെര േരിി.
bn prediction: ঝাখে িবখাত "লািড টাজন", সবেচেয় চম￿কারভােব িটার মািফয়া ও নবািহনী দখল কেরন। শুধুমা ৫০ েহর বন রা কেরন, এমনিক দশ হাজার মিহলােক গাছ ও ব াণীেক এক করার জও েরণ কেরেছন।
bn reference: ঝাখের িবখাত "Lady Tarzan" িহেসেব খাত যমুনা টুডু, গাছপাচারকারী মািফয়া ও নকশালেদর েমাকািবলা করার মত সাহসী কাজ কের িতিন শুধু ৫০ েহর জল উজা হেয় যাওয়ার হাত েথেক বাঁচানিন, উপরু দশ হাজার মিহলােক একেজাট কের গাছ ও বাণীেদর সুরার জ পািঠেয়েছন।
hi prediction: झारखंड म मशर उपनाम 'लेड तजन' जमुना तुडू ने सबसे यादा चमकार से टबर माफया और नसलय को न केवल 50 हेटेयर जंगल से बचाया बिक दस हजार महला को पेड़-पौध और वयजीव क सुरा के लए ेरत कया।
hi reference: झारखड म ￿￿￿ ￿￿￿￿￿ के नाम से वयात जमुना टुडू ने टबर माफया और नसलय से लोहा लेने का साहसक काम कया उहने न केवल 50 हेटेयर जंगल को उजड़ने से बचाया बिक दस हज़ार महला को एकजुट कर पेड़ और वयजीव क सुरा के लए ेरत कया

Table 18: Longer sentences from Mann Ki Baat set.