Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent Anoop Kunchukuttan ∗ Ratish Puduppully ∗y Pushpak Bhattacharyya IIT Bombay IIIT Hyderabad IIT Bombay
[email protected] ratish.surendran
[email protected] @research.iiit.ac.in Abstract one of the oldest writing systems of the Indian sub- continent which can be dated to at least the 3rd cen- We present Brahmi-Net - an online system for tury B.C.E. In addition, Arabic-derived and Roman transliteration and script conversion for all ma- scripts are also used for some languages. Given the jor Indian language pairs (306 pairs). The sys- diversity of languages and scripts, transliteration and tem covers 13 Indo-Aryan languages, 4 Dra- script conversion are extremely important to enable vidian languages and English. For training effective communication. the transliteration systems, we mined paral- The goal of script conversion is to represent the lel transliteration corpora from parallel trans- lation corpora using an unsupervised method source script accurately in the target script, without and trained statistical transliteration systems loss of phonetic information. It is useful for exactly using the mined corpora. Languages which reading manuscripts, signboards, etc. It can serve do not have parallel corpora are supported as a useful tool for linguists, NLP researchers, etc. by transliteration through a bridge language. whose research is multilingual in nature. Script con- Our script conversion system supports con- version enables reading text written in foreign scripts version between all Brahmi-derived scripts as accurately in a user's native script. On the other well as ITRANS romanization scheme.