MUCS@ - Machine Translation for Dravidian Languages using Stacked Long Short Term Memory Asha Hegde Ibrahim Gashaw H. L. Shashirekha Dept of Computer Science Dept of Computer Science Dept of Computer Science Mangalore University Mangalore University Mangalore University
[email protected] [email protected] [email protected] Abstract are small literary languages. All these languages except Kodava have their own script. Further, these The Dravidian language family is one of 1 the largest language families in the world. languages consists of 80 different dialects namely In spite of its uniqueness, Dravidian lan- Brahui, Kurukh, Malto, Kui, Kuvi, etc. Dravid- guages have gained very less attention due ian Languages are mainly spoken in southern In- to scarcity of resources to conduct language dia, Sri Lanka, some parts of Pakistan and Nepal technology tasks such as translation, Parts- by over 222 million people (Hammarstrom¨ et al., of-Speech tagging, Word Sense Disambigua- 2017). It is thought that Dravidian languages are tion etc. In this paper, we, team MUCS, native to the Indian subcontinent and were origi- describe sequence-to-sequence stacked Long nally spread throughout India1. Tamil have been Short Term Memory (LSTM) based Neural Machine Translation (NMT) models submit- distributed to Burma, Indonesia, Malaysia, Fiji, ted to “Machine Translation in Dravidian lan- Madagascar, Mauritius, Guyana, Martinique and guages”, a shared task organized by EACL- Trinidad through trade and emigration. With over 2021. The NMT models are applied for trans- two million speakers, primarily in Pakistan and two lation using English-Tamil, English-Telugu, million speakers in Afghanistan, Brahui is the only English-Malayalam and Tamil-Telugu corpora Dravidian language spoken entirely outside India provided by the organizers.