
WILDRE2 - 2nd Workshop on Indian Language Data: Resources and Evaluation

Workshop Programme 27th May 2014

14.00 – 15.15 hrs: Inaugural session

14.00 – 14.10 hrs – Welcome by Workshop Chairs
14.10 – 14.30 hrs – Inaugural Address by Mrs. Swarn Lata, Head, TDIL, Dept of IT, Govt of India
14.30 – 15.15 hrs – Keynote Lecture by Prof. Dr. Dafydd Gibbon, Universität Bielefeld, Germany

15.15 – 16.00 hrs – Paper Session I
Chairperson: Zygmunt Vetulani

- Sobha Lalitha Devi, Vijay Sundar Ram and Pattabhi RK Rao, Anaphora Resolution System for Indian Languages
- Sobha Lalitha Devi, Sindhuja Gopalan and Lakshmi S, Automatic Identification of Discourse Relations in Indian Languages
- Srishti Singh and Esha Banerjee, Annotating Bhojpuri Corpus using BIS Scheme

16.00 – 16.30 hrs – Coffee break + Poster Session
Chairperson: Kalika Bali

- Niladri Sekhar Dash, Developing Some Interactive Tools for Web-Based Access of the Digital Bengali Prose Text Corpus
- Krishna Maya Manger, Divergences in Machine Translation with reference to the and pair
- András Kornai and Pushpak Bhattacharyya, Indian Subcontinent Language Vitalization
- Niladri Sekhar Dash, Generation of a Digital Dialect Corpus (DDC): Some Empirical Observations and Theoretical Postulations
- S Rajendran and Arulmozi Selvaraj, Augmenting Dravidian WordNet with Context
- Menaka Sankarlingam, Malarkodi C S and Sobha Lalitha Devi, A Deep Study on Causal Relations and its Automatic Identification in Tamil
- Panchanan Mohanty, Ramesh C. Malik and Bhimasena Bhol, Issues in the Creation of Synsets in Odia: A Report
- Uwe Quasthoff, Ritwik Mitra, Sunny Mitra, Thomas Eckart, Dirk Goldhahn, Pawan Goyal and Animesh Mukherjee, Large Web Corpora of High Quality for Indian Languages

- Massimo Moneglia, Susan W. Brown, Aniruddha Kar, Anand Kumar, Atul Kumar Ojha, Heliana Mello, Niharika, Girish Nath Jha, Bhaskar Ray and Annu Sharma, Mapping Indian Languages onto the IMAGACT Visual Ontology of Action
- Pinkey Nainwani, Handling Conflational Divergence in a pair of languages: the case of English and Sindhi
- Jayendra Rakesh Yeka, Vishnu S G and Dipti Misra Sharma, Semi-automated annotated treebank construction for Hindi and Urdu
- Saikrishna Srirampur, Deepak Kumar Malladi and Radhika Mamidi, Improvised and Adaptable Statistical Morph Analyzer (SMA++)
- K Kabi Khanganba and Girish Nath Jha, Challenges in Indian Language Transliteration: a case of Devanagari, Bangla and Manipuri
- Girish Nath Jha, Lars Hellan, Dorothee Beermann, Srishti Singh, Pitambar Behera and Esha Banerjee, Indian languages on the TypeCraft platform – the case of Hindi and Odia

16.30 – 17.30 hrs – Paper Session II
Chairperson: Dr. S. S. Aggarwal

- Nripendra Pathak and Girish Nath Jha, Issues in Mapping of Sanskrit-Hindi Verb forms
- Atul Kr. Ojha, Akanksha Bansal, Sumedh Hadke and Girish Nath Jha, Evaluation of Hindi-English MT Systems
- Sreelekha S, Pushpak Bhattacharyya and Malathi D, Lexical Resources for Hindi Marathi MT
- Esha Banerjee, Akanksha Bansal and Girish Nath Jha, Issues in chunking parallel corpora: mapping Hindi-English verb group in ILCI

17:30 – 18.10 hrs – Panel discussion: India and Europe - making a common cause in LTRs
Coordinator: Hans Uszkoreit

Panelists: Joseph Mariani, Zygmunt Vetulani, Dafydd Gibbon, Panchanan Mohanty

18:10 – 18:25 hrs – Valedictory Address by Prof. Nicoletta Calzolari, CNR-ILC, Italy
18:25 – 18:30 hrs – Vote of Thanks

Editors

Girish Nath Jha, Jawaharlal Nehru University, New Delhi
Kalika Bali, Microsoft Research Lab India, Bangalore
Sobha L, AU-KBC Research Centre, Anna University, Chennai
Esha Banerjee, Jawaharlal Nehru University, New Delhi

Workshop Organizers/Organizing Committee

Girish Nath Jha, Jawaharlal Nehru University, New Delhi
Kalika Bali, Microsoft Research Lab India, Bangalore
Sobha L, AU-KBC Research Centre, Anna University, Chennai

Workshop Programme Committee

A. Kumaran, Microsoft Research, India
Amba Kulkarni, University of Hyderabad, India
Chris Cieri, LDC, University of Pennsylvania
Dafydd Gibbon, Universität Bielefeld, Germany
Dipti Misra Sharma, IIIT Hyderabad, India
Girish Nath Jha, Jawaharlal Nehru University, New Delhi, India
Hans Uszkoreit, Saarland University, Germany
Indranil Datta, EFLU, Hyderabad, India
Joseph Mariani, LIMSI-CNRS, France
Jyoti Pawar, Goa University, India
Kalika Bali, MSRI, Bangalore, India
Karunesh Arora, CDAC, India
Malhar Kulkarni, IIT Bombay, India
Monojit Choudhary, Microsoft Research, India
Nicoletta Calzolari, CNR-ILC, Pisa, Italy
Niladri Shekhar Dash, ISI Kolkata, India
Panchanan Mohanty, University of Hyderabad, India
Pushpak Bhattacharya, IIT Bombay, India
S. S. Aggarwal, KIIT, India
Sobha L, AU-KBC RC, Anna University, Chennai, India
Umamaheshwar Rao, University of Hyderabad, India
Zygmunt Vetulani, Adam Mickiewicz University, Poznan, Poland

Table of contents

Introduction x

1 Anaphora Resolution System for Indian Languages 1

Sobha Lalitha Devi, Vijay Sundar Ram and Pattabhi RK Rao

2 Automatic Identification of Discourse Relations in Indian Languages 9

Sobha Lalitha Devi, Sindhuja Gopalan and Lakshmi S

3 Annotating Bhojpuri Corpus using BIS Scheme 16

Srishti Singh and Esha Banerjee

4 Indian Subcontinent Language Vitalization 24

András Kornai and Pushpak Bhattacharyya

5 Augmenting Dravidian WordNet with Context 28

Rajendran S and Arulmozi Selvaraj

6 A Study on Causal Relations and its Automatic Identification in Tamil 34

Menaka Sankarlingam, Malarkodi C S and Sobha Lalitha Devi


7 Issues in the Creation of Synsets in Odia: A Report 41

Panchanan Mohanty, Ramesh C Malik and Bhimasena Bhol

8 Large Web Corpora of High Quality for Indian Languages 47

Uwe Quasthoff, Ritwik Mitra, Sunny Mitra, Thomas Eckart, Dirk Goldhahn, Pawan Goyal, Animesh Mukherjee

9 Mapping Indian Languages onto the IMAGACT Visual Ontology of Action 51

Massimo Moneglia, Susan Brown, Aniruddha Kar, Anand Kumar, Atul Kumar Ojha, Heliana Mello, Niharika, Girish Nath Jha, Bhaskar Ray and Annu Sharma

10 Handling Conflational Divergence in a pair of languages: the case of English and Sindhi 56

Pinkey Nainwani

11 Semi-automated annotated treebank construction for Hindi and Urdu 64

Vishnu S G, Jayendra Rakesh Yeka and Dipti Misra Sharma

12 Improvised and Adaptable Statistical Morph Analyzer (SMA++) 73

Saikrishna Srirampur, Deepak Kumar Malladi and Radhika Mamidi

13 Challenges in Indian Language Transliteration: a case of Devanagari, Bangla and Manipuri 77

K Kabi Khanganba and Girish Nath Jha

14 Indian languages on the TypeCraft platform – the case of Hindi and Odia 84

Girish Nath Jha, Lars Hellan, Dorothee Beermann, Srishti Singh, Pitambar Behera and Esha Banerjee

15 Issues in Mapping of Sanskrit-Hindi Verb forms 90

Nripendra Pathak and Girish Nath Jha

16 Evaluation of Hindi-English MT Systems 94

Atul Kr. Ojha, Akanksha Bansal, Sumedh Hadke and Girish Nath Jha

17 Lexical Resources for Hindi Marathi MT 102

Sreelekha S, Pushpak Bhattacharyya and Malathi D

18 Issues in chunking parallel corpora: mapping Hindi-English verb group in ILCI 111

Esha Banerjee, Akanksha Bansal and Girish Nath Jha


Author Index

Arulmozi, S. 29
Banerjee, Esha 16, 85, 112
Bansal, Akanksha 95, 112
Beermann, Dorothee 85
Behera, Pitambar 85
Bhattacharyya, Pushpak 25, 103
Bhol, Bhimasena 42
Brown, Susan 52
C.S., Malarkodi 35
D, Malathi 103
Eckart, Thomas 48
Goldhahn, Dirk 48
Gopalan, Sindhuja 9
Goyal, Pawan 48
Hadke, Sumedh 95
Hellan, Lars 85
Jha, Girish Nath 52, 78, 85, 91, 95, 112
Kar, Aniruddha 52
Khanganba, K Kabi 85
Kornai, András 25
Kumar, Anand 52
Lalitha Devi, Sobha 1, 9, 35
Malik, Ramesh C. 42
Malladi, Deepak Kumar 74
Mamidi, Radhika 74
Mello, Heliana 52
Mishra, Niharika 52
Mitra, Ritwik 48
Mitra, Sunny 48
Mohanty, Panchanan 42
Moneglia, Massimo 52
Mukherjee, Animesh 48
Nainwani, Pinkey 57
Ojha, Atul Kumar 52, 95
Pathak, Nripendra 91
Quasthoff, Uwe 48
R., Vijay Sundar Ram 1
Rajendran, S. 29
Rao, Pattabhi RK 1
Ray, Bhaskar 52
S G, Vishnu 65
S, Sreelekha 103
S., Lakshmi 9
S., Menaka 35
Sharma, Annu 52
Sharma, Dipti Misra 65
Singh, Srishti 16, 85
Srirampur, Saikrishna 74
Yeka, Jayendra Rakesh 65


Introduction

WILDRE – the 2nd workshop on Indian Language Data: Resources and Evaluation is being organized in Reykjavik, Iceland on 27th May, 2014 under the LREC platform. India has a huge linguistic diversity and has seen concerted efforts from the Indian government and industry towards developing language resources. The European Language Resources Association (ELRA) and its associate organizations have been very active and successful in addressing the challenges and opportunities related to resource creation and evaluation. It is therefore a great opportunity for resource creators of Indian languages to showcase their work on this platform, and also to interact with and learn from those involved in similar initiatives all over the world.

The broader objectives of the 2nd WILDRE will be

- to map the status of Indian Language Resources
- to investigate challenges related to creating and sharing various levels of language resources
- to promote a dialogue between language resource developers and users
- to provide opportunities for researchers from India to collaborate with researchers from other parts of the world

The call for papers received a good response from the Indian language technology community. Out of 29 full papers received for review, we selected 7 papers for oral, 13 for poster and 1 for demo presentation.

Anaphora Resolution System for Indian Languages

Sobha Lalitha Devi, Vijay Sundar Ram R. and Pattabhi RK Rao
AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India
[email protected]

Abstract
We describe our work on an anaphora resolution engine for Indian languages. Indian languages are morphologically rich, and this morphological richness is tapped to come up with a generic anaphora resolution engine. The system architecture is designed on a simple plug-n-play model. We have used a machine learning technique, Conditional Random Fields (CRFs), to build the core engine. We have tested the engine with Tamil, a Dravidian language, and Hindi, an Indo-Aryan language. The results are encouraging.

Keywords: Anaphora resolution system, Multilingual, Machine Learning, CRFs, Tamil, Hindi, Indian languages

1. Introduction

Anaphors are the small words which occur in the text referring to bigger noun phrases called antecedents. These anaphors add elegance to the text and help to overcome monotonicity in reading. Real time natural language applications such as information extraction, question answering, information retrieval, machine translation etc. require anaphora resolution to process the information completely. Any NLP application that requires assimilating the information in the text needs anaphors to be resolved. Though anaphora resolution has been studied at the syntactic, semantic and discourse levels for many families of languages, most of the anaphora resolution systems presented by researchers are language dependent. In this paper we present an anaphora resolution system for Indian languages, where we have tried to build a multilingual anaphora resolution system using the features common to Indian languages, such as morphological richness. Research in the field of anaphora resolution has been active for the past few decades. In the following section we describe various works published in the field of anaphora resolution.

2. Literature Survey

2.1 Earlier Approaches to Anaphora Resolution

The early approaches to the task of anaphora resolution were referred to as knowledge intensive approaches. The approaches of Hobbs (1978), Carter, Carbonell and Brown (1988) and Rich and LuperFoy (1988) were knowledge intensive, in that they used semantic information, world knowledge and case-frame semantics. Hobbs used a syntax based approach for anaphora resolution. These were followed by centering theory, a discourse theory based approach by Grosz, by Joshi and Kuhn (1979) and by Joshi and Weinstein (1981); the salience factor/indicator based approach of Lappin and Leass (1994); and Mitkov's (1997) MOA (Mitkov's Original Approach) and MARS (1998) (Mitkov's Anaphora Resolution System).

2.2 Machine Learning Approaches to Anaphora Resolution

Machine learning based approaches started with Dagan and Itai (1990). The machine learning techniques commonly used in the anaphora resolution task are presented in Table 1. With the advent of machine learning techniques, most researchers attempted pronominal resolution and coreference resolution simultaneously.

Machine Learning Technique         Presented Works
Decision Tree                      Aone and Bennett (1995), McCarty and Lehnert, Soon et al. (2001), Ng and Cardie (2002)
TiMBL (memory based learning)      Hendrickx, Hoste and Daelemans (2008), Marta Recasens (2009)
SVM (Support Vector Machine)       Yang, Su, and Tan
Linear CRFs                        Li Fei et al. (2008), Sobha et al. (2011)
Tree CRFs                          Sobha et al. (2011), Vijay Sundar Ram et al. (2012, 2013)
ILP                                Denis and Baldridge (2007)
Factorial Hidden Markov Models     Dingcheng Li (2011)
Discriminative Hierarchical Model  Michael Wick (2011), Sameer Singh (2011)

Table 1: Learning Techniques and published works

Hardmeier et al (2012) published a work on the cross-lingual pronoun prediction task, where they predicted the correct French translation of third-person subject pronouns in English discourse, as a prerequisite for machine translation. They used a neural network based approach to model anaphoric links as latent variables and showed that its performance was competitive with that of a system with separate anaphora resolution, while not requiring any coreference-annotated training data. Iida et al (2013) presented a cross-lingual solution for zero anaphors between Italian and Japanese.

2.3 Anaphora Resolution in Indian Languages

Research in the area of anaphora resolution in Indian languages started at the end of the last millennium. The first anaphora resolution system for Indian languages is Vasisth, presented by Sobha et al (2000, 2002) as a platform for Indian language anaphora resolution and demonstrated with the Hindi and Malayalam languages. Hindi belongs to the Indo-Aryan and Malayalam to the Dravidian language family. It is a rule based system, where the morphological richness of both languages was exploited to identify the subject, object, indirect object etc. Though the Indian constitution has recognized 22 official languages, anaphora resolution systems have been developed for only a few of them, namely Hindi, Tamil, Bengali, Malayalam and Urdu.

Prasad et al (2000) used centering theory, a discourse theory, for Hindi anaphora resolution. They applied discourse salience ranking to two pronoun resolution algorithms, the BFP and S-List algorithms. Uppalapu et al (2009) extended this work by using dependency tree relations. Dutta et al (2008) presented Hindi anaphora resolution using Hobbs' algorithm as a baseline. A probabilistic neural network approach to classify demonstrative pronouns for indirect anaphora in Hindi was presented by Dutta et al (2010). Dakwale et al (2011) demonstrated a hybrid approach for Hindi anaphora resolution in the ICON 2011 tool contest and applied a similar approach to Tamil and Bengali. Lakshmi et al (2013) presented the issues and challenges in Hindi anaphora resolution and the constraints in Hindi resources. Dakwale and Sharma (2013) came up with a hybrid approach to resolve entity-pronoun references in Hindi, which combined a rule based system using dependency structures and relations, improved using animacy information.

Narayana Murthy et al (2007) presented a work on Tamil anaphora resolution, where the authors described two different approaches, one using salience features and the other using multiple linear regression. Sobha (2007) came up with a salience weight based approach for Tamil anaphora resolution, where the morphological markings were used without any sophisticated notions of modern formal linguistic theories. Akilandeswari et al (2012, 2013) used conditional random fields, a machine learning technique, to resolve 3rd person pronouns in Tamil; the features for machine learning were collected from morphological information and word positional information. Akilandeswari et al (2012) also demonstrated a work on the resolution of the third person neuter pronoun 'atu' in Tamil using the CRFs technique. Vijay et al (2013) presented Tamil anaphora resolution using Tree CRFs, where information from the dependency tree was used along with the features obtained from morphological information. Balaji et al (2011, 2012) demonstrated a two-step approach to resolve Tamil anaphora using Universal Networking Language. A Sanskrit anaphora resolution system based on a salience measure was presented by Sobha et al (2008).

Anaphora Resolution in Indian Languages, a tool contest conducted as a part of ICON 2011, catalyzed the research in Indian language anaphora resolution. The contest was a boon as it had four participants, with four different approaches. Sanjay et al (2011) presented a two stage approach using CRFs and random trees. Anaphora resolution using global discourse knowledge was presented by Senapathi et al (2011). Gosh et al (2011) presented a rule based approach. Senapathi et al (2013) demonstrated an anaphora resolution system using the GuiTAR tool for Bengali. Sikdar et al (2013) presented a Bengali anaphora resolution system by customizing the features required for the BART anaphora resolution system. Ali et al (2008) discussed the various factors, and their optimal order, that play an important role in the resolution of personal anaphors in Urdu.

2.4 Multilingual Anaphora Resolution

There are a few published works on approaches to multilingual anaphora engines. Aone and McKee (1993) demonstrated a language independent anaphora resolution system with the English, Spanish and Japanese languages. They used a global discourse world, which held syntactic, semantic, rhetorical and other information about the input text derived by other parts of the system, namely knowledge sources. A knowledge source (ks) had three slots, namely ks-function, ks-data and ks-language; the ks-language slot specifies the language.

Mitkov (1998), in the search for inexpensive, rapid and reliable procedures for anaphora resolution, presented a multilingual platform where he used a set of indicators to identify the antecedents. The indicators were related to salience, structural match, referential distance or preference of terms. Indicators were given scores and the candidate with the maximum score was chosen. He evaluated the system with English, Polish and Arabic.

Vasisth, an anaphora resolution system for Indian languages, was presented by Sobha et al. Vasisth is a multilingual system, where the morphological richness of Indian languages was exploited. Hindi, an Indo-Aryan language, and Malayalam, a Dravidian language, were handled in the system to show that it can handle most of the Indian languages without major modification. It is a rule based system where limited parsing information is used. It used the grammatical rules and morphological marking to identify the subject, object, indirect object etc.

The research in the direction of multilingual anaphora resolution was energized by the tool contests conducted at SemEval-2010 and ICON-2011.

SemEval-2010: Coreference Resolution in Multiple Languages aimed to explore the portability of systems across languages and the need for different levels of linguistic information. The contest had six languages, namely Catalan, Dutch, English, German, Italian and Spanish (Recasens 2010). There were six submissions; only two of the participants submitted results for all six languages. Following is a short description of each participant. RelaxCor used a constraint-based graph partitioning approach, where coreference resolution was solved by relaxation labeling; here the rules were generated using the decision tree C4.5 algorithm. They participated in Catalan, English and Spanish. SUCRE used feature engineering based on a relational database and regular feature engineering. Four classifiers were integrated in SUCRE, namely decision tree, Naïve Bayes, SVM and maximum entropy. They participated in all six languages. UBIU presented a language independent system and participated in all six languages, using memory based learning and a feature model. Corry, which participated in English, used two classifiers for coreference and anaphoricity; this system used ILP (integer linear programming) methods. TANL-1 used a binary classifier based on maximum entropy to extract the pairs of mentions. They used lexical features, distance features, syntactic features, and count and type features. They participated in Catalan, English, German and Spanish. BART participated in English, German and Italian. They used a maximum entropy based classifier for German and a decision tree for Italian. The BART toolkit has a language plugin to handle language specific information and it supports pairwise modeling, ranking and semantic tree models for coreference. By customizing the engine for different languages, BART is presented as a multilingual coreference resolution engine. In this contest, although most of the participants tried to come up with multilingual systems, their systems performed well in only one or two languages.

The research in multilingual anaphora engines for Indian languages was motivated by Anaphora Resolution in Indian Languages, an NLP Tool Contest conducted as a part of ICON 2011 (Sobha et al. 2011). Hindi, Bengali and Tamil were the languages considered for the tool contest. There were four submissions; two participants, viz. IITKgp and IIITH, attempted all three languages, and the other two, viz. ISI Kolkata and JU, attempted only Bengali. IITKgp tried a two stage approach of identifying the markables using CRFs and identifying the referring entities using a random trees approach. They used morphological information based features for Bengali. ISI Kolkata presented a different approach, where they identified the characteristics of nouns whenever they were encountered and accordingly a set of pronouns was attached to them. IIITH presented a hybrid approach using a machine learning technique followed by rule-based post-processing. The JU group attempted a rule-based engine for Bengali. Even in this contest the participants came up with systems focused on one language and tried the same systems for the other languages.

The results of the above described two tool contests show the pertinent need for a multilingual anaphora resolution system built using language independent features and supported with the required language specific information through a language plugin.

3. Multilingual Anaphora Resolution System – Our Approach

In this work we have developed a generic anaphora resolution system for Indian languages. We have tested it using Hindi and Tamil. The core anaphora engine uses CRFs, a machine learning technique. Since we have used a machine learning technique, we have two phases, a training phase and a testing phase. In the training phase the system is provided with annotated data and the features for learning. After the system learns, a model file is generated as output. In the testing phase any unseen text is given for automatic anaphora resolution. Figures 1 and 2 show the system architecture diagrams during the training and testing phases respectively. There are mainly five components in this system, viz. i) Text Processing module, ii) Candidate Mentions Extraction, iii) PNG Agreement Handler, iv) Feature Extraction module and v) Anaphora Engine.

FIGURE 1: System Architecture – Training Phase
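The five-component design above can be sketched as a simple chained pipeline. This is our own illustration, not code from the paper: the class name and the trivial placeholder stages are invented, standing in for the real modules.

```python
# Sketch of the plug-n-play architecture: five stages chained in order.
# Language-dependent stages (Text Processing, PNG Agreement Handler) can be
# swapped per language without touching the rest. All names are illustrative.

class AnaphoraPipeline:
    def __init__(self, text_processing, candidate_extraction,
                 png_agreement, feature_extraction, anaphora_engine):
        self.stages = [text_processing, candidate_extraction,
                       png_agreement, feature_extraction, anaphora_engine]

    def run(self, text):
        data = text
        for stage in self.stages:
            data = stage(data)
        return data

# Placeholder stages: tokenize, then pass everything through unchanged.
pipeline = AnaphoraPipeline(
    text_processing=str.split,               # stand-in for shallow parsing
    candidate_extraction=lambda toks: toks,  # stand-in for NP extraction
    png_agreement=lambda cands: cands,       # stand-in for PNG filtering
    feature_extraction=lambda cands: cands,  # stand-in for feature vectors
    anaphora_engine=lambda feats: feats,     # stand-in for the CRF engine
)
print(pipeline.run("raamu paLLikkuc cenraan"))  # → ['raamu', 'paLLikkuc', 'cenraan']
```

Swapping in a different `text_processing` or `png_agreement` callable is all that is needed to move the sketch to another language, which is the point of the plug-n-play design.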

The Text Processing module and the PNG Agreement module are the two language dependent modules, which can be plugged in for the respective languages while processing. The Feature Extraction module is a language independent module. This module extracts the defined or required features from the processed data to facilitate learning and automatic resolution by the Anaphora Engine. The system is designed to keep the architecture simple and a plug-n-play model. Language specific linguistic analysis sub-components can be added to or removed from the Text Processing module according to necessity, and this addition or removal does not affect the rest of the modules. For example, if a specific language requires a word segmentation component, it can be plugged into the "Text Processing" module, and it should adhere to the data input-output (I/O) formats described in the coming sections. Further in this section we describe the data I/O format specification, the features used for learning and the PNG agreement handler.

FIGURE 2: System Architecture – Testing Phase

3.1.1 Data Format

In this work we find that it is very important to have well defined specifications of the data input and output formats. Here we have developed a generic system which takes data input in a specified format and gives output in a specified format, and this works for any morphologically rich language. We have defined the input and output formats for all the modules. We have used the column format of data representation, where each column represents a specific linguistic analysis output. Each module adds its analysis output as a new column to the existing columns. Column format representation is easy for computational purposes and allows for storing different kinds of linguistic analyses. Figure 3 shows the data format, which contains the different linguistic analysis outputs in each column. The column description is as follows: 1st column – Word, 2nd column – POS Tag, 3rd column – Chunk Tag, 4th column – Morph Analysis, 5th column – Chunk Head information. The columns are separated by tab-space. Though at the outset this seems rigid or fixed, it is flexible in the sense that the individual modules can add new columns as their interim output for processing purposes without disturbing the original input, and finally give the desired output in the fixed format. Another advantage of this representation is that in future, if a new linguistic analysis module is to be used, it will be easy to plug it in without affecting the operation of the other modules.

3.1.2 Features for Learning

The features for the task of anaphora resolution are identified from shallow parsed input sentences. The input sentences are preprocessed with a morph analyser, part-of-speech (POS) tagger and chunker, and a clause boundary identifier, which is optional. The identified features are based on the following preferences or characteristics:

a) Positional characteristics: the candidate antecedent's place of occurrence, whether within the same sentence, or the recency of the sentence of occurrence.

b) Grammatical role: the candidate antecedent's grammatical roles such as "subject marking" and "object marking". This is inferred from the case markers suffixed to the nouns: a noun with nominative case can be a subject, a noun with accusative case can be an object.

c) Linguistic characteristics: the candidate antecedent's POS tag and case markers/suffixes.

d) Repetition: occurrence of the candidate antecedent in the 5 sentence window prior to the occurrence of the anaphor.

e) Type of the NP: whether the candidate antecedent (probable antecedent) is possessive or existential or a prepositional complement.

f) Combinations of the above said features.

We have used Conditional Random Fields (CRFs), a machine learning technique. In our approach we have modeled this as a binary classification task.
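The tab-separated, five-column layout of Section 3.1.1 is plain text, so a reader for it fits in a few lines. The following sketch is our own; the sample token line, its morph notation and the field names are invented for illustration.

```python
# Minimal reader for the tab-separated column format of Section 3.1.1:
# Word, POS tag, chunk tag, morph analysis, chunk head. Extra columns added
# as interim output by later modules are preserved. The sample is invented.

COLUMNS = ["word", "pos", "chunk", "morph", "chunk_head"]

def read_tokens(lines):
    """Yield one dict per non-empty line; blank lines separate sentences."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        token = dict(zip(COLUMNS, fields[:5]))
        token["extra"] = fields[5:]   # interim output of later modules
        yield token

sample = ["avan\tPRP\tB-NP\t3sm\t1"]
token = next(read_tokens(sample))
print(token["word"], token["morph"])  # → avan 3sm
```

Because unknown trailing columns are carried along rather than rejected, a new linguistic analysis module can append its own column without breaking this reader, mirroring the flexibility claimed for the format.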

The machine has to classify whether the given candidate antecedent is the real antecedent or not, based on the features presented with the candidate antecedents. The features for learning are extracted from the shallow parsed data. The feature extraction module extracts these features for all possible candidate antecedents from the shallow parsed data. For an anaphor, the "Candidate Antecedent Extraction" component collects all the NPs/mentions occurring prior to the occurrence of the anaphor, in a window span of 5 sentences above the sentence in which the anaphor occurs. The "PNG Agreement Handler" filters these candidate antecedents based on the PNG features.

3.1.3 PNG Agreement Handler

In most of the Indian languages there are strong Person, Number and Gender (PNG) agreement constraints. Person agreement serves to distinguish between 1st person, 2nd person and 3rd person, and also to distinguish between subject, object and genitive positions. Number agreement is to distinguish between singular and plural references with respect to the anaphor's number. Gender agreement is to distinguish the male, female and non-personal genders of the reference with respect to the anaphor's gender. Each language has its own set of PNG agreement constraints. The PNG agreement handler takes care of the agreement rules for each specific language. This module takes a configuration file as input, where the constraints for the specific language are defined. This configuration file can be plugged into the system for the language data under consideration. The following examples explain the PNG features for Tamil and Hindi. Examples 1, 2 and 3 are Tamil examples; examples 4, 5 and 6 are Hindi examples.

(1) Ta: kamala_i kadaikkuc cenraal. avaL_i niraya
        Kamala shop+dat go+past+3sf she lot
        poorutkaL vaangkinal.
        things buy+past+3sf
    (Kamala went to the shop. She bought a lot of things.)

In the above example 'avaL' is a 3rd person pronoun and it refers to the noun phrase 'kamala', which is a feminine noun phrase.

(2) Ta: raamu_i paLLikkuc cenraan. avan_i
        Ramu school+dat go+past+3sm he
        naRRaaka padikkiraan.
        good study+present+3sm
    (Ramu went to school. He studies well.)

In example 2, the pronoun 'avan' (a 3rd person masculine pronoun) refers to the masculine noun phrase 'Ramu'.

(3) Ta: periyakovil thancaiyil ullawu. athu_i mika
        big_temple Thanjai+loc is It very
        pukalpeRRathu.
        famous
    (Periyakovil is there in Thanjai. It is very famous.)

In example 3, 'athu' (the 3rd person neuter singular pronoun) refers to the neuter noun phrase 'periyakovil' in the previous sentence.

Similarly in Hindi, the following examples describe the usage of pronouns.

(4) Hi: kamala_i dukaan gayii thi. usne_i
        Kamala shop go+fem+past. she
        bahut sii chiije kharidii.
        lot of items buy+past+fem.
    (Kamala went to the shop. She bought a lot of items.)

In this example "usne" is the 3rd person singular pronoun referring to Kamala. This pronoun does not have gender information. The gender marker for the first sentence is available in the verb "thi", which gives the gender of "kamala". In Hindi, when the subject noun occurs with the ergative suffix, the verb agrees with the object, as in the second sentence. The 3rd person pronoun "usne" can be used for both masculine and feminine forms. This is illustrated in the next example.

(5) Hi: raama_i paaTashaala gayaa tha. veh_i
        Ram school go+past copula. He
        aChChaa padataa hai.
        good study+present+mas present+copula.
    (Ram went to school. He studies well.)

In example (5), the 3rd person pronoun "veh" refers to "Ram", which is masculine.

(6) Hi: brihadesvar_i mandir tanjavur me hai.
        Brihadesvar temple Thanjavur at is.
        yeh_i bahut badaa mandir hai.
        It very big temple is.
    (The Brihadesvar temple is at Thanjavur. It is a very big temple.)

In example (6), "yeh" is the 3rd person neuter gender pronoun referring to the Brihadesvar temple.

In Tamil, 1st person pronouns have a singular-plural distinction: 'naan' "I" and 'naangkal' "we" are the singular and plural forms. Similarly, 'nii' and 'niingkal' are the singular and plural forms of the 2nd person pronoun respectively. Both 1st and 2nd person pronouns do not have a masculine-feminine distinction. There is a singular-plural and masculine-feminine distinction in all 3rd person pronouns. The 3rd person pronouns are presented in Table 2.

            Singular    Plural
Masculine   avan        avarkal
Feminine    aval        avarkal
Honorific   avar        avarkal
Neuter      athu        avai

Table 2: 3rd Person Pronouns in Tamil
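The filtering step of the PNG Agreement Handler can be sketched as follows. The real system reads its per-language constraints from a configuration file; the dictionary-based configuration, the feature names and the sample candidates below are our own simplification, with the pronoun entries taken from Table 2.

```python
# Sketch of PNG agreement filtering: drop candidate antecedents whose
# number/gender clash with the anaphor's. The dict-based "configuration"
# and feature names are illustrative, not the paper's actual file format.

TAMIL_CONFIG = {
    # 3rd person pronouns (cf. Table 2): (person, number, gender)
    "avan": ("3", "sg", "masc"),
    "aval": ("3", "sg", "fem"),
    "athu": ("3", "sg", "neut"),
    "avai": ("3", "pl", "neut"),
}

def png_filter(anaphor, candidates, config):
    png = config.get(anaphor)
    if png is None:              # no constraint known: keep all candidates
        return candidates
    _, number, gender = png
    return [c for c in candidates
            if c["number"] == number and c["gender"] == gender]

candidates = [
    {"np": "kamala",      "number": "sg", "gender": "fem"},
    {"np": "periyakovil", "number": "sg", "gender": "neut"},
]
kept = png_filter("athu", candidates, TAMIL_CONFIG)
print([c["np"] for c in kept])  # → ['periyakovil']
```

Swapping `TAMIL_CONFIG` for a Hindi configuration would, as the paper notes, have to leave gender underspecified for pronouns like "usne" and defer to verb agreement, which is why the handler is a pluggable, language specific module.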
Kamala shop+dat go+past+3sf she lot (Brihadeswar temple is at Thanjavur. It is a very big poorutkaL vaangkinal. temple.) things buy+past+3sf (Kamala went to shop. She bought lot of things.) In example (6), “yeh” is the 3rd person neuter gender pronoun referring to the Brihadeswar temple. rd In the above example „avaL‟ is a 3 person pronoun and it is referring to the noun phrase „kamala‟, which is the In Tamil, 1st person pronouns have singular-plural feminine noun phrase. distinction '' “I” and 'naangkal' “I honorophic” are (2) Ta: raamu paLLikkuc cenraan. avan singular and plural forms. Similarly 'nii' and „niingkal‟ are Ramu school+dat go+past+3sm he singular and plural forms of 2nd person pronoun naRRaaka padikkiraan. respectively. Both 1st and 2nd person pronouns do not have good Study+present+3sm masculine-feminine distinction. There is singular-plural (Ramu went to school. He studies well.) and masculine-feminine distinction in all 3rd person pronouns. 3rd person pronouns are presented in table 2. In example 2, the pronoun „avan‟ (3rd masculine pronoun), it refers to the masculine noun phrase „Ramu‟. Singular Plural Masculine avan avarkal (3) Ta: periyakovil thancaiyil ullawu. athu mika Feminine aval avarkal big_temple Thanjai+loc is It very avar avarkal pukalpeRRathu. Neuter athu avai famous (Periyakovil is there in Thanjai. It is very famous.) Table 2: 3rd Person Pronouns in Tamil

In Hindi, pronominal anaphors are ambiguous; in particular, the third person pronouns have no masculine-feminine distinction. The 3rd person pronouns such as "veh" (he/she), "ve" (he/she), "us" (he/she/it), "unhon-ne" (he/she honorific), "inhon-ne" (he/she honorific), "yeh" (this), "unko" (them) and "unse" do not have any gender information associated with them and are used for both masculine and feminine genders; the gender is known from the verb form. The pronoun "veh" is also used as a demonstrative pronoun meaning "that". This makes automatic resolution more challenging, and hence the PNG agreement handler plays an important role.

4. Experiments, Results and Discussion

Metric     Prec   Recall   F1
MUC        31.3   42.4     36.01
B-CUBED    76.5   68.3     72.16
CEAFE      58.2   49.5     53.39
Average    55.3   53.4     53.85

Table 4: Evaluation Results in Hindi

Metric     Prec   Recall   F1
MUC        33.2   48.6     39.45
B-CUBED    75.7   65.2     70.05
CEAFE      58.8   47.2     52.36
Average    55.9   53.6     53.95

Table 5: Evaluation Results in Tamil
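Each per-metric F1 in Tables 4 and 5 is the harmonic mean of that metric's precision and recall, and the "Average" row is the arithmetic mean over the three metrics. A minimal sketch of this arithmetic (function names are ours, not from the paper):

```python
# How the figures in Tables 4 and 5 combine: each metric's F1 is the
# harmonic mean of its precision and recall, and the "Average" row is
# the arithmetic mean over the three metrics.

def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average(scores):
    """Arithmetic mean over the per-metric scores."""
    return sum(scores) / len(scores)

# Tamil (Table 5): MUC precision/recall give the reported F1
muc_f1 = f1(33.2, 48.6)                    # -> 39.45 after rounding
overall = average([39.45, 70.05, 52.36])   # -> 53.95 after rounding
```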

We have performed experiments to test the anaphora resolution system using two Indian languages, Hindi and Tamil, chosen as representatives of two major language families: Hindi belongs to the Indo-Aryan family and Tamil to the Dravidian family. The corpus for these experiments is from the tourism domain and was collected from various tourism web pages. The corpus was divided into training and testing partitions; the training partition is used for developing the system and building the trained CRFs model. The distribution of pronouns in the training and testing partitions for each language is given in Table 3. The corpus was annotated and stored in column format as explained in the previous section.

Language   Training Partition    Testing Partition    Total
           (No. of Pronouns)     (No. of Pronouns)
Hindi      835                   507                  1342
Tamil      925                   609                  1534

Table 3: Distribution of Pronouns in the Corpus

The corpus is processed with POS tagging, chunking and morphological analysis for both languages; these linguistic processors were obtained from the IL-ILMT consortium. Using the features explained in the previous section, the system is trained on the training data and a trained CRFs model is obtained as output. The open source tool CRF++ is used for the CRFs algorithm implementation. The features for each candidate antecedent are extracted using the feature extraction module, and the PNG agreement handler filters out the candidate antecedents whose PNG does not match that of the respective anaphor. After this filtering, the pairs of pronouns and corresponding candidate antecedents are given as input to the anaphora engine, which uses the trained CRFs model to automatically resolve the correct antecedent.

We evaluated the system using three metrics popularly known in the community, viz. MUC, B-Cubed and CEAFE, and consider the average of their scores. Each metric tries to overcome the shortcomings of the others; hence an average of all these scores is taken into consideration.

The evaluation results are shown in Tables 4 and 5 for the two languages. We observe that the MUC scores are lower than the other scores. This is because MUC is more focused on chain or link formation, whereas our system does not aim to form chains of all entities but rather to resolve anaphors; the other metrics focus on the entities resolved. Analyzing the output of the system for Tamil, we observe that third person honorific pronouns are not resolved properly because they are more ambiguous: they can be used for both genders.

(7) Ta: aaciriyar kamala villavil kalanthukkoNtaar. avar athil uraiyaRRinaal.
    teacher Kamala function+loc join+past+3SH. he/she this gave_lecture
    (Teacher Kamala joined the function. She gave the lecture.)

In example (7), 'avar' (3rd person honorific singular pronoun) in the second sentence refers to the feminine noun phrase 'aaciriyar kamala' in the previous sentence. The honorific pronoun can equally refer to a masculine noun phrase, and this possibility of referring to either gender introduces more errors.

In Hindi, we observe the necessity of verb analysis and animacy identifier sub-modules in the text processing component. If this information were available to the resolution engine, it could reduce ambiguity and improve the anaphora resolution.

5. Conclusion

In this paper we have presented an anaphora resolution system for Indian languages. We have described the system architecture and its implementation. We have chosen two Indian languages representing two major language families: Hindi and Tamil, belonging to the Indo-Aryan and Dravidian families respectively. The system is specifically designed to be scalable and allows a plug-and-play architecture. The core anaphora resolution engine uses CRFs, a machine learning technique. This uses feature based learning, and we have

provided syntactic and positional features. We have obtained encouraging evaluation results. The evaluation metrics used are the standard measures MUC, B-CUBED and CEAFE. We have obtained an average F-measure of 53.85% for Hindi and 53.95% for Tamil. These results are comparable with other reported systems. We plan to test the system design by implementing it for other Indian languages. In this work we have attempted to build an anaphora resolution framework which can be used for all Indian languages.

Figure 3: Data I/O Format – an example Hindi sentence (the text is in WX notation)
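The column-format data layout referred to in the experiments (one token per line, feature columns, sentences separated by a blank line, as CRF++ expects) can be illustrated with a small sketch. The concrete columns and tag values here (token, POS, chunk, label) are our assumption for illustration, not the authors' exact layout:

```python
# Illustrative sketch of CRF++-style column-format data: one token per
# line, tab-separated feature columns, and a blank line between
# sentences. The column choice (token, POS, chunk, label) and the tags
# are assumptions for illustration, not the authors' exact format.

def to_column_format(sentences):
    """sentences: list of sentences; each sentence is a list of
    (token, pos, chunk, label) tuples. Returns the CRF++-style text."""
    blocks = []
    for sent in sentences:
        blocks.append("\n".join("\t".join(row) for row in sent))
    return "\n\n".join(blocks) + "\n"

# a toy Hindi fragment in rough WX-style transliteration
example = [[
    ("raama", "NNP", "B-NP", "B-ANTE"),
    ("ne", "PSP", "I-NP", "O"),
    ("kahaa", "VM", "B-VGF", "O"),
]]
text = to_column_format(example)
```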

6. References

Akilandeswari, A., Sobha Lalitha Devi. (2013). Conditional Random Fields Based Pronominal Resolution in Tamil. International Journal on Computer Science and Engineering, Vol. 5, Issue 6, pp. 601-610.
Akilandeswari, A., Sobha Lalitha Devi. (2012). Resolution for Pronouns in Tamil Using CRF. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages, COLING 2012, Mumbai, India.
Akilandeswari, A., Bakiyavathi, T. and Sobha Lalitha Devi. (2012). atu Difficult Pronominal in Tamil. In: Proceedings of LREC 2012, Istanbul.
Ali, M.N., Khan, M.A. (2008). An Optimal Order of Factors for the Computational Treatment of Personal Anaphoric Devices in Urdu Discourse. In: Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, Hyderabad, India, pp. 81-90.
Aone, C., and McKee, D. (1993). A Language-Independent Anaphora Resolution System for Understanding Multilingual Texts. In: Proceedings of ACL 1993, pp. 156-163.
Aone, C., and Bennett, S. (1995). Evaluating automated and manual acquisition of anaphora resolution strategies. In: 33rd Annual Meeting of the Association for Computational Linguistics, pp. 122-129.
Balaji, J., Geetha, T.V., Parthasarathi, R., Karky, M. (2011). Anaphora Resolution in Tamil using Universal Networking Language. In: Proceedings of IICAI 2011, pp. 1405-1415.
Balaji, J., Geetha, T.V., Parthasarathi, R., Karky, M. (2012). Two-Stage Bootstrapping for Anaphora Resolution. In: Proceedings of COLING 2012, pp. 507-516.
Carbonell, J.G., and Brown, R.D. (1988). Anaphora resolution: A multi-strategy approach. In: 12th International Conference on Computational Linguistics, pp. 96-101.
Carter, D. (1987). Interpreting anaphors in natural language texts. Chichester: Ellis Horwood Ltd.
Dagan, I., and Itai, A. (1990). Automatic processing of large corpora for the resolution of anaphora references. In: 13th Conference on Computational Linguistics, Vol. 3, Helsinki, Finland, pp. 330-332.
Dakwale, P., Mujadia, V., Sharma, D.M. (2013). A Hybrid Approach for Anaphora Resolution in Hindi. In: Proceedings of the International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 977-981.
Dakwale, P., and Sharma, H. (2011). Anaphora resolution in Indian languages using hybrid approaches. In: Proceedings of the ICON Tool Contest, 9th International Conference on Natural Language Processing, Chennai, India.
Dutta, K., Prakash, N. and Kaushik, S. (2010). Probabilistic Neural Network Approach to the Classification of Demonstrative Pronouns for Indirect Anaphora in Hindi. Expert Systems with Applications, Vol. 37, Issue 8, pp. 5607-5613, Elsevier.
Dutta, K., Prakash, N. and Kaushik, S. (2008). Resolving Pronominal Anaphora in Hindi using Hobbs' algorithm. Web Journal of Formal Computation and Cognitive Linguistics, Issue 10.
Fei Li, Shuicai Shi, Yuzhong Chen, and Xueqiang Lv. (2008). Chinese Pronominal Anaphora Resolution Based on Conditional Random Fields. In: International Conference on Computer Science and Software Engineering, Washington, DC, USA, pp. 731-734.
Ghosh, A., Neogi, S., Chakrabarty, S., and Bandyopadhyay, S. (2011). Anaphora resolution in Bengali. In: Proceedings of the ICON Tool Contest, 9th International Conference on Natural Language Processing, Chennai, India.
Hardmeier, C., Tiedemann, J., Nivre, J. (2013). Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 380-391.
Hendrickx, I., Hoste, V., and Daelemans, W. (2008). Semantic and syntactic features for Dutch coreference resolution. In: Gelbukh, A. (Ed.), CICLing-2008 Conference, Vol. 4919 LNCS, Berlin, Springer Verlag, pp. 731-734.
Hobbs, J. (1978). Resolving pronoun references. Lingua 44, pp. 339-352.
Iida, R., and Poesio, M. (2011). A Cross-Lingual ILP Solution to Zero Anaphora Resolution. In: Proceedings of ACL 2011.
Joshi, A.K., and Kuhn, S. (1979). Centered logic: The role of entity centered sentence representation in natural language inferencing. In: International Joint Conference on Artificial Intelligence.
Joshi, A.K., and Weinstein, S. (1981). Control of inference: Role of some aspects of discourse structure - centering. In: International Joint Conference on Artificial Intelligence, pp. 385-387.
Lakhmani, P., and Singh, S. (2013). Anaphora Resolution in Hindi Language. International Journal of Information and Computation Technology, Vol. 3, pp. 609-616.
Lappin, S., and Leass, H.J. (1994). An algorithm for pronominal anaphora resolution. Computational Linguistics 20(4), pp. 535-561.
Li, D., Miller, T., and Schuler, W. (2011). A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models. In: Proceedings of ACL 2011.
Mitkov, R. (1998). Robust pronoun resolution with limited knowledge. In: 17th International Conference on Computational Linguistics (COLING'98/ACL'98), Montreal, Canada, pp. 869-875.
Mitkov, R. (1997). Factors in anaphora resolution: they are not the only things that matter. A case study based on two different approaches. In: Proceedings of the ACL'97/EACL'97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution, Madrid, Spain, pp. 14-21.
Murthi, N.K.N., Sobha, L., Muthukumari, B. (2007). Pronominal Resolution in Tamil Using Machine Learning Approach. In: The First Workshop on Anaphora Resolution (WAR I), Ed. Christer Johansson, Cambridge Scholars Publishing, Newcastle, UK, pp. 39-50.
Ng, V., and Cardie, C. (2002). Improving machine learning approaches to coreference resolution. In: 40th Annual Meeting of the Association for Computational Linguistics, pp. 104-111.
Prasad, R., and Strube, M. (2000). Discourse Salience and Pronoun Resolution in Hindi. Penn Working Papers in Linguistics, Vol. 6.3, pp. 189-208.
Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai, Rajeev Sangal. (2008). Dependency Annotation Scheme for Indian Languages. In: Proceedings of IJCNLP 2008, pp. 721-726.
Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, Fei Xia. (2009). A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. In: Linguistic Annotation Workshop, pp. 186-189.
Recasens, M., Marquez, L., Sapena, E., Marti, M.A., Taule, M., Hoste, V., Poesio, M., Versley, Y. (2010). SemEval-2010 Task 1: Coreference Resolution in Multiple Languages. In: Proceedings of the 5th International Workshop on Semantic Evaluation, ACL 2010, Uppsala, Sweden, pp. 1-8.
Recasens, M., Hovy, E. (2009). A Deeper Look into Features for Coreference Resolution. In: Lalitha Devi, S., Branco, A. and Mitkov, R. (Eds.), Anaphora Processing and Applications (DAARC 2009), LNAI 5847, Springer-Verlag, Berlin Heidelberg, pp. 535-561.
Sameer Singh, Amarnag Subramanya, Fernando Pereira and Andrew McCallum. (2011). Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models. In: Proceedings of ACL 2011.
Senapati, A., Garain, U. (2013). GuiTAR-based Pronominal Anaphora Resolution in Bengali. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, pp. 126-130.
Senapati, A., and Garain, U. (2011). Anaphora resolution system for Bengali by pronoun emitting approach. In: Proceedings of the ICON Tool Contest, 9th International Conference on Natural Language Processing, Chennai, India.
Sikdar, U.K., Ekbal, A., Saha, S., Uryupina, O., Poesio, M. (2013). Adapting a State-of-the-art Anaphora Resolution System for Resource-poor Language. In: Proceedings of the International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 815-821.
Sobha, L. and Patnaik, B.N. (2000). Vasisth: An Anaphora Resolution System for Indian Languages. In: Proceedings of the International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications, Monastir, Tunisia.
Sobha, L. and Patnaik, B.N. (2002). Vasisth: An anaphora resolution system for Malayalam and Hindi. In: Proceedings of the Symposium on Translation Support Systems.
Sobha, L. (2007). Resolution of Pronominals in Tamil. In: Computing Theory and Application, The IEEE Computer Society Press, Los Alamitos, pp. 475-479.
Sobha, L., Pralayankar, P. (2008). Algorithm for Anaphor Resolution in Sanskrit. In: Proceedings of the 2nd Sanskrit Computational Linguistics Symposium, Brown University, USA.
Sobha Lalitha Devi, Vijay Sundar Ram and Pattabhi RK Rao. (2011). Resolution of Pronominal Anaphors using Linear and Tree CRFs. In: 8th DAARC, Faro, Portugal.
Sobha, L., Sivaji Bandyopadhyay, Vijay Sundar Ram, R., and Akilandeswari, A. (2011). NLP Tool Contest @ICON2011 on Anaphora Resolution in Indian Languages. In: Proceedings of ICON 2011.
Soon, W.M., Ng, H.T., and Lim, D. (2001). A machine learning approach to coreference resolution of noun phrases. Computational Linguistics 27(4), pp. 521-544.
Taku Kudo. (2005). CRF++, an open source toolkit for CRF. http://crfpp.sourceforge.net
Uppalapu, B., and Sharma, D.M. (2009). Pronoun Resolution For Hindi. In: Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 09), pp. 123-134.
Vijay Sundar Ram, R. and Sobha Lalitha Devi. (2013). Pronominal Resolution in Tamil Using Tree CRFs. In: Proceedings of the 6th Language and Technology Conference: Language Technologies as a Challenge for Computer Science and Linguistics, Poznan, Poland.
Vijay Sundar Ram and Sobha Lalitha Devi. (2012). Coreference Resolution using Tree-CRF. In: Gelbukh, A. (Ed.), Computational Linguistics and Intelligent Text Processing, Springer LNCS Vol. 7181/2012, pp. 285-296.
Wick, M., Singh, S., and McCallum, A. (2012). A Discriminative Hierarchical Model for Fast Coreference At Large Scale. In: Proceedings of ACL 2012.

Automatic Identification of Discourse Relations in Indian Languages

Sobha Lalitha Devi, Sindhuja Gopalan, Lakshmi S
AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India
[email protected]

Abstract

This paper describes the first effort on the automatic identification of connectives and their arguments for three Indian languages: Hindi, Malayalam and Tamil. We have adopted the machine learning technique Conditional Random Fields (CRFs) for our work, using a corpus of 3000 sentences belonging to the health domain. Domain independent features were extracted to improve the performance of the system. We mainly concentrated on the identification of explicit connectives and their arguments. Two sets of experiments were performed: the first for the identification of connectives, and the second for the identification of argument boundaries. Using this approach we obtained encouraging results for all three languages. Error analysis shows the presence of different structural patterns of discourse relations among the three languages.

Keywords: Discourse relation, CRFs, Connectives, Arguments

1. Introduction

Discourse relations link clauses and sentences in a text and compose the overall text structure. The study of discourse is concerned with analysing how clause or sentence level units of text are related to each other within a larger unit of text (Wellner & Pustejovsky, 2007). The two basic units of discourse relations are discourse markers and their arguments. Discourse markers are the words or phrases that connect two clauses or sentences and establish a relation between two discourse units. Consider Example 1.

(1) Raju likes sweets but his friend prefers chocolates.

In the above example the connective "but" establishes a relation between two clauses, thus making the text coherent. Discourse relations are extensively used in NLP applications including text summarization, natural language generation, generation of speech from text, dialogue generation and question answering systems. Hence the automatic detection of discourse relations is important. It is also a challenging task. Discourse connectives, despite their common function of linking the contents of two different clauses, also act as ordinary conjunctions (Versley, 2010); hence it is difficult to distinguish discourse from non-discourse markers. The automatic identification of argument boundaries is even more difficult. The argument of a discourse marker can be a clause or a sentence, and the discourse relation can be inferred intra-sententially or inter-sententially. This makes the automatic identification of arguments difficult, as the boundaries of the arguments may lie within a sentence or in adjacent sentences. Sometimes the relation inferred by the discourse marker can even hold between non-adjacent sentences.

Instead of identifying all possible discourse relations, we concentrated on identifying the explicit discourse connectives and their arguments for the Indian languages Hindi, Malayalam and Tamil. To the best of our knowledge this is the first ever approach to the identification of connectives and their arguments in these three languages. We adopted the machine learning technique CRFs to build models for the identification of discourse connectives and the boundaries of their arguments. The error patterns were analyzed, and the existence of different structural patterns of discourse relations in the three languages was observed.

Work on the annotation of discourse connectives and their arguments has been explored in various languages such as Turkish (Zeyrek et al., 2008), Czech (Mladova et al., 2008), Arabic (Al-Saif and Markert, 2010) and English (Prasad et al., 2008). The Penn Discourse TreeBank (PDTB) is a large scale corpus annotated with such linguistic phenomena in English (Miltsakaki et al., 2004). Roze et al. (2010) presented a French lexicon of 328 discourse connectives and the discourse relations they convey. Published work on discourse relation annotation in Indian languages is available for Hindi (Kolachina et al., 2012; Prasad et al., 2008; Oza et al., 2009) and Tamil (Rachakonda and Sharma, 2011).

Automatic identification of discourse connectives and arguments has been carried out for various languages such as English, Arabic, Chinese and German. Marcu and Echihabi (2002) focused on the recognition of discourse relations using cue phrases, but not on the extraction of arguments. Wellner and Pustejovsky (2007) used maximum entropy rankers combined with a re-ranking step to jointly select the two arguments of each connective in the PDTB. Rather than identifying the full extent of each argument, they re-cast the problem as identifying the argument heads, effectively sidestepping the problem of discourse segmentation. Elwell and Baldridge (2008) used maximum entropy rankers and showed that using models for specific connectives and types of connectives, and interpolating them with a general model, improved the system performance; their best model achieves a 3.6% absolute improvement over the state of the art in identifying both arguments of discourse connectives. Versley (2010) presented work on tagging German discourse connectives using English training data and a German-English parallel corpus. He presented an approach for transferring a tagger for English discourse connectives by annotation projection, using a freely accessible list of connectives as the only German resource, and achieved an F-score of 68.7% for the identification of discourse connectives. Ghosh (2012) used a data driven approach to identify arguments of explicit discourse connectives in the PDTB corpus. Lin et al. (2012) developed a PDTB-styled

end-to-end parser consisting of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, argument labeler, explicit classifier, non-explicit classifier, and attribution span labeler. The parser gave an overall system F1-score of 46.80% for partial matching utilizing gold standard parses, and 38.18% with full automation. Al-Saif (2012) used machine learning algorithms for automatically identifying explicit discourse connectives and their relations in Arabic. Wang et al. (2012) used sub-trees as features and achieved a significant improvement in identifying arguments and explicit and implicit discourse relations in one go. Zhou et al. (2012) presented the first effort towards cross-lingual identification of the ambiguities of discourse connectives: a language independent framework utilizing bilingual dictionaries, the PDTB, and parallel data between English and Chinese. The results showed that this method not only built a high quality connective lexicon for Chinese, but also achieved high performance in recognizing the ambiguities. Fan et al. (2012) presented a unified framework for discourse argument identification using shallow semantic parsing. Faiz et al. (2013) improved explicit discourse connective identification in the PDTB and the Biomedical Discourse Relation Bank (BDRB) by combining certain aspects of the surface level and syntactic feature sets. Ramesh et al. (2010) developed supervised machine-learning classifiers with CRFs for automatically identifying discourse connectives in full-text biomedical articles. They trained the classifier on the Penn Discourse TreeBank and obtained a 0.55 F1-score for identifying discourse connectives; cross validation on the biomedical text attained a 0.69 F1-score. In our work, we have identified both connectives and their arguments using CRF techniques by extracting various features.

The rest of the paper is organised as follows. The following section gives an overview of discourse relations. The third section explains the method used for the automatic identification of discourse relations. The results are presented in Section 4. In Section 5 the errors obtained in automatic identification of discourse relations are analyzed. The paper ends with the conclusion.

2. Overview of discourse relations

Our work concentrates mainly on automatic identification of explicit discourse connectives and their arguments. Hence, in this section we give an overview of discourse relations realized explicitly by connectives. The relation inferred can be explicit or implicit. While tagging the corpus, we also observed other types of relations, such as anaphoric relations and Noun-Noun relations.

2.1 Explicit Connectives

Explicit connectives connect two discourse units and trigger a discourse relation. They occur as free words or bound morphemes, either intra-sententially or inter-sententially, and can occur at the initial, medial or final position in an argument. The different types of explicit connectives are explained below with examples.

2.1.1. Subordinators

Subordinators connect the main clause with the subordinate or dependent clause. A subordinator provides a necessary transition between two ideas in a sentence; the transition indicates a time or cause-effect relationship, etc. It also reduces the importance of one clause, thereby giving more importance to the other clause.

(2) Hi: [dera raata meM jaagne se baceM]/arg1, kyoMki [Shariira ko niiMda kii jaruurata hotii hai]/arg2.
    late night in awake by avoid because body for sleep gen necessary is
    (Avoid being awake late at night, because sleep is necessary for the body.)

In Example 2, the connective "kyoMki" acts as a subordinator inferring a relation between two clauses.

2.1.2. Coordinators

Coordinators connect two or more items of equal syntactic importance; they connect two independent clauses. In Example 3 below, "aanaal" is the coordinator that connects two independent clauses.

(3) Ta: [paal vayirrai mantam aaki jiiraNa caktiyai kuraikkum]/arg1. aanaal [tayir paalai vita viraivaaka jiiraNam aakum]/arg2.
    milk stomach sloth make digestion power decrease but curd milk than fast will_digest
    (Milk makes the stomach sluggish and decreases the digestive power. But curd digests faster than milk.)

2.1.3. Conjunct adverbs

A conjunct adverb is an adverb that connects two independent clauses. Conjunct adverbs are said to modify the clauses or sentences in which they occur.

(4) Ml: [naam kazhikkunna aahaaramaaN nammute Sareeram]/arg1. athinaal [nalla aaharaSeelam namme rogangaLil ninnum akattum]/arg2.
    we eating food_is our body therefore good eating_habits us diseases from away
    (The food we eat is our body. Therefore good eating habits will keep us away from diseases.)

In Example 4, "athinaal" is the conjunct adverb that connects the two sentences.

2.1.4. Correlative conjunction

Correlative conjunctions are paired conjunctions. They link words or groups of words of equal weight in a sentence.

(5) Hi: [khuuba paanii piine se]/arg1 na kevala [khanee ke tukade saafa ho jaate hai]/arg1, balki [laara bhii banatii hai]/arg2.
    lot water drink by not_only food particles get cleans but saliva also produce
    (Drinking a lot of water not only cleans the food particles, but also produces saliva.)

In Example 5 the correlative conjunction "na kevala ... balki" occurs as a paired connective.

2.1.5. Complementizer and Relativizer

A complementizer is a conjunction which marks a complement clause. A complementizer is a lexical category, including those words which can be used to turn a clause into the subject or object of a sentence.

(6) Ml: [2050-ote lokathil 85 peril oraaLkk alsheemers rogam vyaapikkum]/arg1 enn [ron brookmeyarum kootarum choontikaaNikkunnu]/arg2.
    2050-by world 85 people one_person alzheimer's disease spread that Ron Brookmeyer colleagues points_out
    (Ron Brookmeyer and colleagues point out that, by 2050, at least one person among 85 people will get Alzheimer's disease.)

A relativizer pronoun does not refer to a preceding noun; rather, it comments on the whole preceding clause or sentence. It provides a link between the main clause and the relative clause.

(7) Hi: [kuch loga behatara aadiyo larnara hote haiN]/arg1, jo [sunakara siikhte haiN]/arg2.
    some people better audio learner will be who listen learn is
    (Some people will be better audio learners, who can learn by listening.)

3. Method

3.1 Corpus collection and Annotation

The corpus used for our work belongs to the health domain, as a large number of relations were seen to exist in health corpora. The corpus consisted of health related articles from blogs, etc. Inconsistencies like hyperlinks were removed, and we obtained a corpus of 3000 sentences for each of the three languages. The statistics of the corpora for Hindi (Hi), Malayalam (Ml) and Tamil (Ta) are given in Table 1.

Corpus      Total sentences   Total tokens   Total discourse relations observed
Hindi       3000              50335          841
Malayalam   3000              35865          1192
Tamil       3000              39714          936

Table 1: Corpora Statistics.

The corpus was annotated with various types of discourse relations: explicit discourse relations, implicit discourse relations and other relations. The annotation was purely syntactic. We used the following guidelines to annotate the discourse relations in the text. The initial argument was marked as arg1 and the second argument as arg2; arg2 always follows the connective. The argument spans and connectives are denoted by the following notation: the connectives are shown in bold, and square brackets subscripted with arg1 and arg2 denote the argument spans. The linear order of the arguments and markers is not always the basic order arg1-connective-arg2; it varies according to the marker. We observed that all relations hold between sentences or clauses, and never with phrases. One single sentence can serve the purpose of a connective and its arguments. Sometimes one of the preceding sentences acts as an argument; the argument can even be a non-adjacent sentence, but the text span follows the minimality principle. Multiple sentences can also act as an argument. The connectives in Hindi occur as free words, while connectives in Malayalam and Tamil occur as bound morphemes as well as free words. When connectives occur as free words, we tag them separately and mark the discourse units between which the relation is inferred as arg1 and arg2. When the connectives exist as bound morphemes, we keep them along with the word to which they are attached. The annotated corpus is used to train the system, and models are built for the identification of connectives and arguments.

3.2 Implementation

This section describes the implementation of discourse relation identification using the machine learning technique CRFs.

3.2.1. CRFs

CRFs is a framework for building probabilistic models to segment and label sequence data. Given an observation sequence, the conditional model specifies the probabilities of possible label sequences. It is a form of undirected graphical model. The advantage of CRFs is its conditional nature, which avoids the label bias problem. In our work we have used the CRF++ tool (Kudo, 2005), an open source implementation of CRFs. It is a simple and customizable tool that can be used for generic purposes and applied to a variety of NLP tasks.

3.2.2. Feature selection

The performance of a machine learning technique depends on the features used in learning. For the identification of connectives in Malayalam and Tamil, we have used Parts of Speech (POS) tagging information, morphological suffixes and clause information as features. Morphological suffixes such as conditional markers, causal markers, relative (RP) markers followed by postpositions (PSP), and coordination markers were used. For connective identification in Hindi, word and POS tagging information were used. For argument identification we have taken POS tagging information, morphological suffixes, clause information, the combination of POS and chunk information, and connectives as features. The features used for Hindi, Malayalam and Tamil are listed below.

Connectives          Arguments
1. Word              1. POS tagging
2. POS tagging       2. Chunk information
                     3. Combination of 1 & 2
                     4. Connectives

Table 2: List of Features used for Hindi.

Connectives                       Arguments
1. POS tagging                    1. POS tagging
2. Clause information             2. Chunk information
3. Morphological suffixes         3. Combination of 1 & 2
   a. Conditional markers         4. Connectives
   b. Causal markers              5. Clause information
   c. RP marker followed by PSP
   d. Coordination markers

Table 3: List of Features used for Malayalam and Tamil.

            Precision   Recall   F-score
Hindi       96.33       91.3     93.8
Malayalam   96.29       91.23    93.68
Tamil       95.35       93.18    94.25

Table 4: Results for connective identification.
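CRF++ consumes such column features through a template file. The authors' actual templates are not given in the paper; the sketch below only illustrates the mechanism CRF++ uses to turn feature columns (such as those in Tables 2 and 3) into model features, generating unigram templates over a +/-2 token window:

```python
# Illustrative sketch: generate a CRF++ template file with unigram
# features over a +/-2 token window for each feature column, plus the
# bigram (label transition) template. The authors' actual templates
# are not specified in the paper.

def crfpp_templates(n_columns, window=2):
    """Return CRF++ template lines: one unigram template per
    (column, offset) pair, plus the bigram template 'B'."""
    lines = []
    tid = 0
    for col in range(n_columns):              # feature columns (not the label)
        for off in range(-window, window + 1):
            lines.append("U%02d:%%x[%d,%d]" % (tid, off, col))
            tid += 1
    lines.append("B")                          # bigram feature over labels
    return lines

# e.g. 4 feature columns (word, POS, chunk, POS+chunk combination)
template_file = "\n".join(crfpp_templates(4))
```

The resulting file is what `crf_learn` takes alongside the column-formatted training data.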

3.2.3. Preprocessing Table 4: Results for connective identification. Preprocessing is an important task, where the corpus is preprocessed with tokenizer, Morph analyzer, POS tagger, The precision, recall and F-score for connective chunker, pruner and clause identifier. First, tokenizer identification are given in above table [4]. splits the sentences into words. Sentence boundary is denoted by an empty line. Then, morphological analyzer provides grammatical information of the word. POS Precision Recall F-score tagger assigns POS information for each word. Chunker Hindi 75.4 71.2 73.3 identifies the boundary of the chunks. Pruner prunes the Malayalam 78.57 70 74.04 morph analyzers multiple analysis. Clause identifier identifies the clauses in the sentence. This preprocessed Tamil 81.53 71.6 76.57 data are provided as input to train the CRFs. Table 5: Results for arg1 start identification. 3.2.4. Experiment The method used by Menaka et al., (2011) for identifying Precision Recall F-score cause-effect relations have been adopted by us for the 75.4 71.2 73.3 identification of discourse relations. In our work we Hindi developed discourse relation models for three languages Malayalam 78.57 70 74.04 like Hindi, Malayalam and Tamil. The models were built Tamil 81.53 71.6 76.57 using the health corpus with 3000 sentences for three languages. The features mentioned in table [2] and table [3] was extracted to train the system. We have performed Table 6: Results for arg1 end identification. two sets of experiments in our work for the identification of discourse markers and arguments. First, the system was Precision Recall F-score trained for the identification of discourse markers. In the Hindi 75.4 71.2 73.3 second set of experiments, the system was trained for the Malayalam 78.57 70 74.04 identification of arguments. The output from the first experiment is fed as input to the second experiment. 
In Tamil 81.53 71.6 76.57 this step four models were built for the identification of each boundary of the arguments. We identified the Table 7: Results for arg2 start identification. argument boundaries in the following order – arg1 end, arg2 start, arg1 start and arg2 end. The system was tested with the test corpus. We found that the features such as Precision Recall F-score POS and chunk play a significant role in the identification Hindi 75.4 71.2 73.3 of discourse markers. Features like POS, chunk, morph, Malayalam 78.57 70 74.04 clause and connectives have significant impact on the identification of their arguments. Tamil 81.53 71.6 76.57

Table 8: Results for arg2 end identification.

4. Results
The evaluation of the system was carried out in terms of precision, recall and F-measure. Precision is the ratio of the number of connectives or arguments correctly tagged by the system to the total number of connectives or arguments tagged by the system. Recall is the ratio of the number of connectives or arguments correctly tagged by the system to the total number of connectives or arguments present in the corpus. F-score is the harmonic mean of precision and recall. To evaluate the performance of the system we divided the corpus in the ratio 80:20 for all three languages: 80% of the corpus was used to train the system and 20% was used as test data. The results for connective and argument identification are tabulated in this section.

The connectives are identified by the system first, and the output of this model is given as input for the identification of argument boundaries. We built four models for boundary identification. Each boundary of the arguments is evaluated, and the measures for argument boundaries are given in Tables 5, 6, 7 and 8. The F-score for arg1 end and arg2 start is found to be high in all three languages because of their close proximity to the markers in most cases.
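The measures defined above reduce to simple ratios over counts. A minimal sketch (the function name, variable names and the example counts below are ours, for illustration only):

```python
def evaluate(correct, system_tagged, in_corpus):
    """Precision = correct / tagged by the system;
    recall = correct / present in the corpus;
    F-score = harmonic mean of the two."""
    precision = correct / system_tagged
    recall = correct / in_corpus
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts: 716 connectives tagged correctly, 950 tagged
# in total by the system, 1000 present in the gold corpus.
p, r, f = evaluate(716, 950, 1000)
print(round(p * 100, 2), round(r * 100, 2), round(f * 100, 2))
```

Since the F-score is a harmonic mean, it always lies between the precision and recall values, which is visible in the per-language rows of Tables 5-8.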

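The training setup of Section 3.2 feeds preprocessed tokens to a CRF; the paper cites CRF++ (Kudo, 2005), which reads one token per line with tab-separated feature columns, the gold label in the last column, and a blank line between sentences. A sketch of producing that layout — the column order and all feature/tag values below are our own assumptions, not the paper's template:

```python
# Sketch: emitting preprocessed tokens in a CRF++-style column file.
# Columns (token, POS, chunk, clause, connective label) follow the
# feature types named in the paper, but the order and values are ours.

def to_crf_rows(sentence):
    rows = ["\t".join([t["token"], t["pos"], t["chunk"],
                       t["clause"], t["label"]]) for t in sentence]
    rows.append("")  # blank line marks the sentence boundary
    return rows

# Tokens from Example 9 ("agar ... to ..."), with made-up feature values:
sent = [
    {"token": "agar",    "pos": "CC", "chunk": "B-CCP", "clause": "O",    "label": "B-CONN"},
    {"token": "fiivara", "pos": "NN", "chunk": "B-NP",  "clause": "B-CL", "label": "O"},
    {"token": "ho",      "pos": "VM", "chunk": "B-VGF", "clause": "I-CL", "label": "O"},
    {"token": "to",      "pos": "CC", "chunk": "B-CCP", "clause": "O",    "label": "B-CONN"},
]
print(to_crf_rows(sent)[0])
```

The same layout serves the second phase: the predicted connective labels become an input column for the four argument-boundary models.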
           Precision  Recall  F-score
Hindi      75.4       71.2    73.3
Malayalam  78.57      70      74.04
Tamil      81.53      71.6    76.57
Table 9: Results for arg1 start and end identification.

           Precision  Recall  F-score
Hindi      75.4       71.2    73.3
Malayalam  78.57      70      74.04
Tamil      81.53      71.6    76.57
Table 10: Results for arg2 start and end identification.

The evaluation measures for the complete identification of argument boundaries are given in Tables 9 and 10. We have analysed the errors made by the system in the identification of connectives and arguments and explain them in the following section.

5. Error Analysis
Error analysis shows that there are certain ambiguities in the identification of connectives, because not all connectives act as discourse markers. Consider Example 8 below.

(8) Ta: muTTai maRRum elumiccai caaRu ularnta
        egg    and    lemon     juice dry
    carumattiRku nallatu.
    skin         good
    (Egg and lemon juice are good for dry skin.)

Here "maRRum" acts as a conjunction that connects the two words egg and lemon juice. In this case "maRRum" is not a discourse marker.

The errors in the identification of argument boundaries are due to the presence of paired connectives, overlapping structures, and arguments shared between discourse relations. Paired connectives exist in Hindi; for example, connectives like "agar-to" and "na keval-balki" occur as pairs in a sentence.

(9) Hi: agar [100 ef se jyaadaa fiivara ho]/arg1 to
        if    100 F  than more   fever   is    then
    [pairaasiitaamola kii golii  deN]/arg2.
     paracetamol      gen tablet give
    (If the fever is more than 100 F, then give paracetamol tablets.)

In Example 9 above, the paired connectives "agar" and "to" share the same arguments. We found that the system generates errors in such cases.

While analysing the corpus, we found that multiple connectives can occur within the same sentence; in such cases the arguments are shared by these connectives, and errors are generated by the system while identifying these shared arguments. In Example 10 below, there are two connectives, "aur" and "isliye". The arg2 of the first discourse relation is shared as arg1 by the second discourse relation, realized by the connective "isliye". The linear order of this pattern is arg1i-Coni-arg2ij-conj-arg2j.

(10) Hi: [[boojana kaa paacana   Tiika se  hotaa hai]/arg1i
           food    gen digestion properly   is
     auri [shariira ko  urjaa  milatii hai]/arg2i]/arg1j,
     and   body     for energy get     is
     isliyej [reshedaar padaartha bhojana meM jaruurii  hote haiM]/arg2j.
     because  fibrous   material  food    in  necessary is
     (Food is properly digested and the body gets energy, so fibrous material is needed in food.)

While analysing the errors that occurred in the identification of arguments in all three languages, we observed various patterns of interaction between the arguments of discourse relations. Unique patterns of interaction also exist across the three languages. We discuss the observed variation in argument structures with examples.

Example 11 in Hindi shows that the connective "na keval balki" occurs as a paired connective. The connective "na keval" is found embedded in the medial position of arg1. "na keval balki" in Malayalam and Tamil is "maathramalla pakshe" and "maTTumalla aanaal" respectively. In Malayalam and Tamil, "maathramalla" and "maTTumalla" occur at the final position of arg1.

(11) Hi: [machalii khane se]/arg1 na keval [aapakii
          fish     eat   by       not only  your's
     tvacaa, baal, dimaag svastha rahataa hai]/arg1
     skin    hair  brain  healthy keeps   is
     balki [najar bhii teja  hotii hai]/arg2.
     but    sight also sharp is
     (Eating fish not only keeps your skin, hair and brain healthy, but also keeps the eyesight sharp.)

In this case we observed that the system could not identify arg1 properly, as the connective "na keval" is sandwiched in the middle of the arg1 clause.

The connective "and" in Tamil is "um" or "maTTrum", and in Malayalam "um" or "mattum". In Malayalam and Tamil, "um" acts both as a connective and as a clitic marker. When two similar constituents in a sentence are affixed with "um", it acts as the conjunction "and"; it can occur between two phrases, clauses or sentences. "um" is considered a discourse connective only when it occurs between two clauses or sentences, where it always occurs as a paired connective, as shown in Example 12. Hence there are ambiguities in the identification of "um" as a discourse marker by the system.

(12) Ta: [maaraTaippu  eRpaTTa anaivarukkum
          heart_attack develop everyone
     muuccu       vituvatil ciramam    eRpaTTiruppatum]/arg1,
     breathing_in           difficulty will develop_and
     [atuvum   4 vaara iTaiveLikkuL 80 cataviitam perukku
      that_too 4 week  gap_within   80 percentage people
     intha aRikuRi eRpaTtiruppathum]/arg2 aayvill
     this  symptom develop_and            study
     teriya vantuLLatu.
     got_to_know
     (A study reveals that everyone who got a heart attack had developed difficulty in breathing, and that too within a gap of 4 weeks this symptom had developed.)

The above example shows that the connective "um" occurs in the final position of arg1 and arg2. This is not the case in Hindi, where the connective "and" occurs as the free lexical item "aur".

In Example 13 below, there are two types of discourse relations in consecutive sentences. The first discourse relation is similar to the type explained in Example 11. In the second discourse relation, the relation between the arguments is realized by the connective "lekin". Here we observe that the whole of the first discourse relation, realized by the connective "yadi-to", acts as the argument of the second discourse relation. This pattern holds for Malayalam and Tamil as well. Hence it is difficult for the system to identify the arguments correctly.

(13) Hi: [[koyii suucanaa    yaa rahasya]/arg1i yadii [ek
           any   information or  secret         if    one
     klika kii doorii   para ho]/arg1i,
     click     distance on   is
     toi  [koyii kyoN apane dimaaga ko  kasTa  denaa caahegaa]/arg2i]/arg1j.
     then   any  why  their mind    for suffer give  want
     lekinj [aage    calakara ye    aadateN visheSha   ruup se
     but     further go       these habits  particular form
     mastiSka kii yaada    rakhane kii kshamataa
     brain    gen remember         gen capacity
     para ghaataka asar   Daal saktii haiN]/arg2j.
     on   fatal    impact can_put
     (If any information or secrets are a click away, then why would anyone want to give suffering to their mind. But later these habits can have a fatal impact, particularly on memory power.)

(14) Ml: [[maathapithaakkaL kootuthalum jolikkaaraayathinaal]/arg1
           parents          mostly      employed_hence
     [dhaaraaLam budhimuttukaL uNtaayekkaam]/arg2]/arg1
      lots_of    problems      happen
     enkilum [kuttiyute uthama bhaavikk vendi
     but      child's   better future   for
     rakshakarthaakkaL chila vittuveezhchakaLkk thayyaaraakaNam]/arg2
     parents           some  compromises        ready
     (Most of the parents are employed, hence lots of difficulties happen. But for their child's better future, parents must be ready for compromises.)

In Example 14 above, the first discourse relation as a whole acts as arg1 of the second discourse relation. The discourse markers are usually different for the two relations, which are structurally interdependent.

As in Example 15, relations are also realized between non-adjacent sentences. In such cases the identification of arguments becomes difficult for the system. Given below is an example in Malayalam where non-adjacent sentences act as arguments.

(15) Ml: [nilavil   rogamuLLavaril ninnu maathrame
          presently patients       from  only
     kshayarogam  pakarunnuLLu.]/arg1 athaayath
     tuberculosis spread              that_is
     rogaaNukkaL Sareerathil uNtegilum nilavil
     germs       body        is_there  presently
     rogam   illaathavaril ninn rogam   pakarunnilla.
     disease not_present   from disease not_spread
     oraaLil    ninn mattoraaLilekk rogam   pakarunnath
     one_person from another_person disease spread
     niravadhi GatakangaLe aaSrayichirikkunnu athinaal
     many      aspects     depends            therefore
     [rogam   vanna aaLe   mattuLlavaril ninn akatti niRthiyaal ee
      disease came  person other_people  from away   keep       this
     rogam thatayaam.]/arg2
     prevent disease
     (Transmission of tuberculosis occurs only from people with active TB. Those with latent infection are not thought to be contagious. The transmission of the disease from one person to another depends upon several factors. Therefore, segregating those with active TB from others can stop the spread of this disease.)

6. Conclusion
In this paper, we have presented our approach to identifying discourse connectives and their arguments in three Indian languages: Hindi, Malayalam and Tamil. We observed that dividing the identification task into two sub-phases (connective identification and argument identification) really improved the system performance. To the best of our knowledge, we have provided the first results on the automatic identification of discourse connectives and arguments in these three languages. The results obtained are encouraging. The existence of structural dependencies influenced the performance of the system. Our future work is to resolve embedded structures in relation identification to improve the performance of the system.

7. References
AlSaif, A. (2012). Human and automatic annotation of discourse relations for Arabic. Ph.D. thesis, University of Leeds.
Al-Saif, A., and Markert, K. (2010). The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic. In Proceedings of Language Resources and Evaluation Conference. Valletta, Malta.
Castor, A., and Pollux, L. E. (1992). The use of user modelling to guide inference and learning. Applied Intelligence, 2(1), pp. 37-53.
Elwell, R., and Baldridge, J. (2008). Discourse connective argument identification with connective specific rankers. In Proceedings of the International Conference on Semantic Computing. Santa Clara, CA.
Faiz, S. I., and Mercer, R. E. (2013). Identifying explicit discourse connectives in text. Advances in Artificial Intelligence, Springer Berlin Heidelberg, pp. 64-76.
Ghosh, S. (2012). End-to-End Discourse Parsing with Cascaded Structured Prediction. Doctoral dissertation, University of Trento.
Kolachina, S., Prasad, R., Sharma, D. M., and Joshi, A. K. (2012). Evaluation of Discourse Relation Annotation in the Hindi Discourse Relation Bank. In Proceedings of Language Resources and Evaluation Conference, pp. 823-828.
Kudo, T. (2005). CRF++, an open source toolkit for CRF. http://crfpp.sourceforge.net.
Lin, Z., Ng, H. T., and Kan, M. Y. (2012). A PDTB-styled end-to-end discourse parser. Natural Language Engineering, 1(1), pp. 1-34.
Marcu, D., and Echihabi, A. (2002). An unsupervised

approach to recognizing discourse relations. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 368-375.
Menaka, S., Pattabhi RK Rao, and Sobha Lalitha Devi. (2011). Automatic identification of cause-effect relations in Tamil using CRFs. In Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, 6608, pp. 316-327.
Ming, X. F. Z. Q., and Dong, Z. G. (2012). A Unified Framework for Discourse Argument Identification via Shallow Semantic Parsing. In Proceedings of International Conference on Computational Linguistics, Mumbai, India, pp. 1331-1340.
Miltsakaki, E., Prasad, R., Joshi, A. K., and Webber, B. L. (2004). The Penn Discourse Treebank. In Proceedings of Language Resources and Evaluation Conference. Lisbon, Portugal.
Mladová, L., Zikanova, S., and Hajicová, E. (2008). From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank. In Proceedings of Language Resources and Evaluation Conference. Marrakech, Morocco.
Oza, U., Prasad, R., Kolachina, S., Sharma, D. M., and Joshi, A. (2009). The Hindi discourse relation bank. In Proceedings of the Third Linguistic Annotation Workshop, pp. 158-161.
Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A. K., and Webber, B. L. (2008). The Penn Discourse TreeBank 2.0. In Proceedings of Language Resources and Evaluation Conference. Marrakech, Morocco.
Prasad, R., Husain, S., Sharma, D. M., and Joshi, A. K. (2008). Towards an Annotated Corpus of Discourse Relations in Hindi. In International Joint Conference on Natural Language Processing, Hyderabad, India, pp. 73-80.
Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., and Webber, B. L. (2007). The Penn Discourse Treebank 2.0 annotation manual. Technical report. http://www.seas.upenn.edu/pdtb/PDTBAPI/pdtb-annotation-manual.pdf.
Rachakonda, R. T., and Sharma, D. M. (2011). Creating an annotated Tamil corpus as a discourse resource. In Proceedings of the 5th Linguistic Annotation Workshop, pp. 119-123.
Ramesh, B. P., and Yu, H. (2010). Identifying discourse connectives in biomedical text. In AMIA Annual Symposium Proceedings, pp. 657-661.
Roze, C., Danlos, L., and Muller, P. (2010). LEXCONN: a French lexicon of discourse connectives. In Proceedings of Multidisciplinary Approaches to Discourse, Moissac, France.
Versley, Y. (2010). Discovery of ambiguous and unambiguous discourse connectives via annotation projection. In Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC), pp. 83-82.
Wang, X., Li, S., Li, J., and Li, W. (2012). Implicit Discourse Relation Recognition by Selecting Typical Training Examples. In Proceedings of International Conference on Computational Linguistics, Mumbai, India, pp. 2757-2772.
Webber, B., Knott, A., and Joshi, A. (2001). Multiple discourse connectives in a lexicalized grammar for discourse. Computing Meaning, 2, pp. 229-249.
Wellner, B., and Pustejovsky, J. (2007). Automatically Identifying the Arguments of Discourse Connectives. In Proceedings of EMNLP-CoNLL, pp. 92-101.
Zeyrek, D., and Webber, B. L. (2008). A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus. In International Joint Conference on Natural Language Processing, Hyderabad, India, pp. 65-72.
Zhou, L., Gao, W., Li, B., Wei, Z., and Wong, K. F. (2012). Cross-Lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language. In Proceedings of International Conference on Computational Linguistics, Mumbai, India, pp. 1409-1418.

Annotating Bhojpuri Corpus using BIS Scheme

Srishti Singh and Esha Banerjee Jawaharlal Nehru University New Delhi, India {singhsriss, esha.jnu}@.com

Abstract
The present paper discusses the application of the Bureau of Indian Standards (BIS) annotation scheme to one of the most widely spoken Indian languages, 'Bhojpuri'. Bhojpuri has laid claim to inclusion in the Eighth Schedule of the Indian Constitution, where 22 major Indian languages are currently enlisted. Through recent Government of India initiatives, these scheduled languages have received attention from the computational aspect, but unfortunately this non-scheduled language still lacks such attention for its development in the field of NLP; the present work is possibly the first of its kind. The BIS tagset is an Indian standard designed for tagging almost all the Indian languages. Annotated corpora in Bhojpuri and the simplified annotation guidelines to this tagset will serve as an important tool for such well-known NLP tasks as POS tagging, phrase chunking, parsing, structural transfer, Word Sense Disambiguation (WSD), etc.

Keywords: Bhojpuri annotation, BIS, Classifiers, ergativity, and word formations.

1. INTRODUCTION
Bhojpuri is one of the major Indo-Aryan languages and has been given the code bho in ISO 639-3 among world languages. It is spoken in the Uttar Pradesh province of India, mainly in the Ghazipur, Jaunpur, Deoria, Basti and Azamgarh districts, whereas in Bihar the Rohtas, Eastern Champaran, Saran, Siwan and Bhojpur districts form the major Bhojpuri-speaking belt. Besides this, it has official-language status outside India and is also spoken in several other countries and in some parts of Burma. According to the 2001 Census, the number of Bhojpuri speakers in India was 33,099,497. This large population speaks the language with some regional differences, as has been reported in Upadhyay (1988):

1. More prestigious variety: Bhojpur and Rohtas in Bihar; and Ballia and Ghazipur in U.P.
2. Western Bhojpuri: Varanasi and nearby districts in U.P.
3. Madheshi: spoken in Tihat and Gorakhpur.

Figure 1: Map of non-scheduled languages of Eastern India¹

The intelligibility ratio among the different varieties of the language is very high (their syntactic structure and basic lexicon are common), but other elements such as affixes, auxiliaries, address terms, kinship terms and domain-specific terms differ a lot. Bhojpuri has SOV word order, postpositions and final noun heads; word formation may involve one prefix and up to five suffixes; clause constituents are indicated by both case-marking and word order; verbal affixation marks the person, number and gender of subject and object. It is an ergative-less, non-tonal language with 34 consonants and 6 vowels, and about 4 diphthongs. Bhojpuri is written in the Devanagari and Kaithi scripts. Some magazines, radio programs and dictionaries are available in the language. Nowadays, local TV channels are the most popular resource for Bhojpuri. The domain-specific Bhojpuri

1 https://www.ethnologue.com/

corpus is also being developed for different fields, especially media and entertainment. A tagset for Bhojpuri will enable the Bhojpuri corpus and other tools such as parsers, chunkers, morphological analyzers etc. to work better.

2. BIS TAGSET
The BIS tagset is a national standard tagset for Indian languages that has recently been designed under the banner of the Bureau of Indian Standards by the Indian Languages Corpora Initiative (ILCI) group. It is a hierarchical tagset and allows annotation of major categories along with their types and subtypes. In this framework the granularity of the POS has been kept at a coarser level; thus, the hierarchy for most POS categories is only two levels deep, and the maximum depth for the POS tags is three levels so far. Most of the categories of this tagset seem to have been adapted either from the MSRI or the ILMT tagset. For morphological analysis it relies on a morphological analyzer, so morpho-syntactic features are not included in the tagset. The BIS scheme is comprehensive and extensible; it captures appropriate linguistic information, and also ensures the sharing, interchangeability and reusability of linguistic resources (Gopal, 2012).
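Because each BIS label spells out its path in the hierarchy with underscores (category, then type, then subtype — e.g. N_NN, N_NST, V_VAUX), the levels can be recovered simply by splitting the label. A minimal sketch (our helper, not part of the BIS document):

```python
def tag_levels(tag):
    """Split a hierarchical BIS tag into its levels, e.g.
    'N_NST' -> ('N', 'NST'); a three-level tag yields three parts."""
    return tuple(tag.split("_"))

# The top-level category of a tag is its first level:
assert tag_levels("N_NST") == ("N", "NST")
assert tag_levels("V_VAUX")[0] == "V"
```

This is what makes the scheme extensible: a coarse tool can work with only the first level, while a finer one can consume the full path.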

2.1 POS TAGGING
POS tagging (or morpho-syntactic tagging) is the process of assigning to each word in a running text a label which indicates the status of that word within some system of categorizing the words of that language according to their morphological and/or syntactic properties (Hardie, 2003). For natural language processing tasks, an annotated corpus of a language is of great importance. Annotated corpora serve as an important tool for such well-known NLP tasks as POS tagging, phrase chunking, parsing, structural transfer, Word Sense Disambiguation (WSD), etc.
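In the tagged examples later in this paper, annotation is rendered inline as `word\TAG` tokens (e.g. `ganesu\N_NNP`). A small sketch (our helper, not the authors' tooling) that recovers (word, tag) pairs from such a line:

```python
def parse_tagged(line):
    """Split a BIS-annotated line of 'word\\TAG' tokens into
    (word, tag) pairs, skipping stray tokens with no backslash."""
    pairs = []
    for tok in line.split():
        if "\\" in tok:
            word, tag = tok.split("\\", 1)
            pairs.append((word, tag))
    return pairs

line = r"ganesu\N_NNP kahales\V_VM ham\PR_PRP sab\QT_QTF kAm\N_NN karab\V_VM"
print(parse_tagged(line)[0])   # ('ganesu', 'N_NNP')
```

Reversing the split (joining word and tag with a backslash) regenerates the inline format, so the same convention serves both display and processing.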

The description of the tagset is given in table 1.

Table 1: Bhojpuri Tagset

2.2 Corpus and Data
The data for the present experiment is a collection of 9 folk stories, currently having approximately 5,300 tagged words (3 stories). The data was collected in spoken form and then transcribed. It includes two major dialects: 'Bhojpuri', spoken in Bhojpur, and 'Benarasi', spoken in Varanasi. The examples below contain interlinear glossing, a free translation and POS-tagged data (where required) for the sentences.

2.3 About the TAGSET
The present tagset includes 33 categories divided into eleven major parts of speech, which are further subdivided into some lower-level categories designed in accordance with the utility of the tool. A word belonging to a particular lexical category may function differently in a given context. Besides adjective, adverb and preposition, all other categories have some further distributions. For guidelines, see the appendix.

3. Characteristics of Bhojpuri
This section discusses the characteristics of Bhojpuri in detail, with examples.

3.1 Classifiers
Like Bangla and Maithili, Bhojpuri also makes heavy use of classifiers with numerals. The marker for classifiers differs across dialects of the language: 'Tho', 'go', 'The', 'Kho' etc.

1. There were two brothers.
   du  go bhAI    rahasan
   two CL brother live.PST.HON

This conforms to the shifting paradigm among Indian languages, where Awadhi on one hand (a classifier-less language) and Bengali on the other (a classifier-rich language) give Bhojpuri a place somewhere in between.

3.2 Ergative
In Bhojpuri, there is no overt ergative case marker available with the nouns, but the form of the verb represents the perfective aspect of the sentence, which makes Bhojpuri an ergative-less language like Awadhi. This construction is quite different from Hindi, which is an ergative language. In some dialects of Bhojpuri the perfective is marked with the -l- or -les- forms. The following are examples:

2. Ganesu said that he would do everything.
   ganesu       kahales
   Ganesu.3MSg  say.ERG.PST
   ham  sab kAm      karab
   1MSg all work.ACC do.FUT.1MSg
   Tagged:
   ganesu\N_NNP kahales\V_VM ham\PR_PRP sab\QT_QTF kAm\N_NN karab\V_VM

3. How much sugarcane did that old man eat?
   kitnA    U    buddhA  U.nkh     chuhales
   how_much that old-man sugarcane eat.ERG.PST
   Tagged:
   kitnA\RP_INJ U\DM_DMD buDDHA\N_NN U.nkh\N_NN chuhales\V_VM ।

Ergativity in Bhojpuri is absent, and the aspectual marker is there to make the construction sensible; therefore, we cannot assign it to any of the categories at this level of annotation.

3.3 Word-Formation Process
With respect to Hindi morphology, Bhojpuri morphology differs greatly in certain categories. For instance, in Hindi the emphatic particles 'hi' and 'bhi' are placed under Default Particle, whereas in Bhojpuri these elements sometimes occur independently as separate units and most of the time are merged with their host, giving rise to a new subcategory under some major categories of the tagset. For a sample, some data paired with their respective variants are presented here to make the point clearer:

a. biswAs hI   → biswAse
   belief EMPH   belief-EMPH
b. koI bhI       → kauno
   anybody EMPH   anybody-EMPH
c. tabhI to  → tabbe/tabbae
   then EMPH   then-EMPH
d. kabhI to      → kabbo
   sometime EMPH   sometime-EMPH

4. Nobody believed.
   kaunoM            biswAse      nahIM kareM
   anybody.3MSg.EMPH believe.EMPH not   do.PRF.PST
   Tagged:
   kaunoM\PR_PRI biswAse\N_NN nahIM\RP_NEG kareM\V_VM ।\RD_PUNC

5. They went and knocked at the door; still nobody woke up.
   jA ke kiwADI
   go do door

   khaT-khaTAwat hauan      tabbo
   knock.RDP     aux.PRS.Pl still
   Tagged:
   jA\V_VM ke\V_VAUX kiwADI\N_NN khaT-khaTAwat\V_VM hauan\V_VAUX tabbo\RB nA\RP_NEG koI\PR_PRI utHat\V_VM hau\V_VAUX ।\RD_PUNC

3.4 Determiners
Determiners are a unique feature of the language which is not there in Hindi. They are basically discourse particles, but the present work is restricted to their syntactic utility only.

Like Maithili, Awadhi and Bengali, Bhojpuri also gives a wide space for determiners to set in. Determiners in Bhojpuri can occur with almost all common and proper nouns. Although these determiners are not found with honorific nouns, most address terms are heavily suffixed with determiners. But, like the emphatic category in Bhojpuri, the determiners also get merged with their host nouns; they then need to be excluded from the main category of the tagset, and a new category under noun could be proposed for nouns containing a demonstrative. The synthetic nature of the language, however, makes it difficult to give a separate tag to a bound morpheme. Therefore, this issue is presently left to be sorted out with the help of a more robust morphological analyzer, and the determiners are not given a separate category for the time being.

Word-final sounds | -a/-A | -i/-I | -u
Determiners       | -vA   | -jA   | -A

From the table above, we get the notion that the word-final sound of a noun is responsible for the choice of determiner in Benarasi: a word ending in an -a or -A sound will take the -vA suffix, -i/-I will take the -jA suffix, and -u will take an -A suffix. Similar constructions are found in Maithili, Magahi, Awadhi and other related languages (Kachru, 1980).

Generally, such constructions determine which noun is being talked about, as inferred from the examples below:

6. If Bharbittan had been there, he would have offered us an abundance of guavas.
   bharbittanwA        bhAI             rahat  ta   ta
   Bharbittan.3MSg.DEM brother.3MSg/NOM be.PRF EMPH CONJ
   amrood kHoob toD-toD   ke khiyAwat
   guava  very  pluck.RDP PP eat.CAUS.PST
   Tagged:
   bharbittanwA\N_NN bhAI\N_NN rahat\V_VM ta\CC_CCS amrood\N_NN khoob\RP_INTF toD-toD\V_VM ke\V_VAUX khiyAwat\V_VAUX

7. He had no money to buy the garland.
   paisawe   nAhI rahal    ki   mAlA    kHarIde
   money.DEM not  live.PST that garland buy.3MSg.PRS
   Tagged:
   paisawe\N_NN nAhI\RP_NEG rahal\V_VM ki\CC_CCS mAlA\N_NN kHarIde\V_VM

8. The jackal said to the lamb to go along to the well to see who is fairer.
   siyarA          lahalas memanI se chalA-chalA
   jackal.3MSg.DEM say.PST lamb   PP lets_go.RDP
   Tagged:
   siyarA\N_NN lahalas\V_VM memanI\N_NN se\PSP chalA\V_VAUX -\RD_SYM CHALa\V_VAUX kua.n\N_NN meM\PSP dekHal\V_VM jAI\V_VAUX ke\PR_PRQ gor\JJ \V_VAUX ।

3.5 Homophonous Cases
In Bhojpuri homophonous cases are prevalent, and this makes the annotation task difficult. A human annotator needs to see the tokens in the given context and assign a proper tag according to their function. Some samples:

9. The lion started to roar loudly.
   ser  mAre   dahADe lagal
   lion loudly roar   started.aux.PST
   Tagged:
   ser\N_NN mAre\RP_INTF dahADe\V_VM lagal\V_VAUX

10. Then they all were about to beat him.
    ta   sab mAre jAt    rahalan
    CONJ all beat go.PST aux.PROG.PST
    Tagged:
    ta\RP_RPD sab\N_NN mAre\V_VM jAt\V_VM rahalan\V_VAUX
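The word-final-sound rule for Benarasi determiners described in Section 3.4 (-a/-A takes -vA, -i/-I takes -jA, -u takes -A) can be written as a simple lookup. A sketch — ours only — that assumes the final character of the transliterated noun identifies the sound and ignores any further morphophonemic adjustment:

```python
# Determiner suffix chosen by a noun's word-final sound (Benarasi),
# following the table in Section 3.4. Illustrative only: real surface
# forms may involve further sound changes not modelled here.
SUFFIX_BY_FINAL = {"a": "vA", "A": "vA", "i": "jA", "I": "jA", "u": "A"}

def determiner_suffix(noun):
    # Returns None when the rule in the table does not cover the ending.
    return SUFFIX_BY_FINAL.get(noun[-1])

print(determiner_suffix("rAnI"))   # jA
print(determiner_suffix("sAdhU"))  # final -U is not covered by the table
```

The `None` case is exactly where the paper defers to a more robust morphological analyzer.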
11. He put the bundle on the ground with a thud.

    U       le   jA ke gaTHariyA paTak delas
    he.3MSg take go do bundle    throw give.PRF.PST
    Tagged:
    U\PR_PRP le\V_VM ja\V_VM ke\V_VAUX gaTHariyA\N_NN patak\V_VM delas\V_VM

12. He used to purchase the garlands daily with that money.
    ta    U    paisa ke nA rojAnA mAlA         kHaride
    CONN. that money PP    daily  garland.3FSg buy.PRF.PST
    Tagged:
    ta\RP_RPD U\PR_PRP paisa\N_NN ke\PSP nA\RP_RPD ,\RD_PUNC rojAnA\N_NST mAlA\N_NN -\RD_PUNC phool\N_NN kharide\V_VM

3.6 Different Realizations in Spoken Bhojpuri
Since no natural language is free from ambiguities, Bhojpuri morphology also carries functional ambiguities, where one form can be interpreted in more than one function. The language shows great variation in its spoken form: a single entity is pronounced differently in different contexts and places of occurrence. Within a variety, these variations can easily be noticed at the ends of conjunctions, particles and prepositions. They make it difficult for a tool (a POS tagger or analyzer) to process the data efficiently and sometimes give bad results. As explained below:

13. And he too ate.
    aur U  bhI  Apan    kHayiles
    and he EMPH he.REFL eat.PRF.PST
    Tagged:
    aur\CC_CCD U\PR_PRP bhI\RP_RPD Apan\RP_PRF kHayiles\V_VM ।\RD_PUNC

14. And he climbed up.
    Ta  U       chaDH gael
    and he.3MSg climb go.PRF.PST
    Tagged:
    Ta\CC_CCD U\PR_PRP chaDH\V_VM gael\V_VM

4. Annotation Challenges
The very first challenge in digitizing any language is to have the available data in the desired domain and format. The corpus data here is an extraction from spoken data, which was transcribed into a written corpus for the task. Bhojpuri is an ergative-less and classifier-rich language, which places it somewhere in between the two languages 'Hindi' and 'Awadhi'. The different realizations of the same lexicon and their correct categorization are as challenging as finding the homophonous words in the data and differentiating their meanings contextually. Determiners inflected with the nouns, and the floating particles occurring between the proper nouns, though not so big a problem at this level, might need significant attention at other levels of annotation.

5. Conclusion
Bhojpuri, being a developing language, requires more attention, which can be attained by generating more NLP resources. However, the corpus annotated with this tagset would be more useful as it is tagged by a standard tagset/scheme. This will maximize the usage and sharing of tagged data. The initiative for tagging less-resourced Indian languages with the present standard tagset is a promising effort in this direction, with the hope that all Indian language corpora annotation programmes will follow these linguistic standards for enriching their linguistic resources.

6. References
Bhaskaran, S., Bali, K., Bhattacharya, T., Bhattacharya, P., Choudhury, M., Jha, G.N., Rajendran, S., Sravanan, K., Sobha, L. and Subbarao, K.V.S. (2008). A Common Parts-of-Speech Tagset Framework for Indian Languages. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC '08).
Bhaskaran, S., Bali, K., Bhattacharya, T., Bhattacharya, P., Choudhury, M., Jha, G.N., Rajendran, S., Sravanan, K., Sobha, L. and Subbarao, K.V.S. (2008). Designing a Common POS-Tagset Framework for Indian Languages. In Proceedings of the VI Workshop on Asian Language Resources, IIIT, Hyderabad.
Gopal, M. (2012). Annotating Bundeli Corpus Using the BIS POS Tagset. In Proceedings of the Workshop on Indian Language Data: Resources and Evaluation, under LREC '12, pp. 50-56.
Hardie, A. (2003). The Computational Analysis of Morphosyntactic Categories in Urdu. PhD thesis, Lancaster University.
Jha, G. N., Gopal, M. and Mishra, D. (2009). Annotating Sanskrit Corpus: Adapting IL-POSTS. Springer Heidelberg Dordrecht London New York.
Kachru, Yamuna. (1980). Aspects of Hindi Grammar. Manohar, New Delhi.
Mitkov, R. (2003). The Oxford Handbook of Computational Linguistics. Oxford University Press, New York.
Upadhyay, H. S. (1988). Bhojpuri folksongs from Ballia. India Enterprises Incorporated.
http://www.ethnologue.com/language/bho
http://shiva.iiit.ac.in/SPASAL2007/iiit_tagset_guidelines.pdf

Appendix
Guidelines to the Bhojpuri Tagset

1. NOUN (N)
The first major category in the tagset is 'nouns'. The categorization is made keeping in view that it can broadly cover all the dialects and sub-dialects of Bhojpuri; it has three sub-types.

1.1 Common noun (N_NN)
Nouns that simply function as nouns and are content words should be marked as common nouns. This includes the general variety of all nouns, e.g. darawAzA, samay, log, sAdhU etc. For example:
1) Ab rAnI\N_NN tiyAr hot hain
   Now the queen is getting ready.

1.2 Proper noun (N_NNP)
Proper nouns are generally names that stand for some particular person or place, for example Bharbittan, GanesU, Chunamun etc.
2) Cunmun\N_NNP DerAyal
   Cunmun got afraid.

1.3 Spatio-temporal noun (N_NST)
There is a specific set of words that function both as a preposition and as an argument of a verb. Such words are marked as spatio-temporal irrespective of their function in a given context. Some of them are agaweM, pacHaweM, upaM, nichaweM etc.
3) Phir Age\N_NST Gael
   Then he went forward.

2. PRONOUNS (PR)
The category of pronoun has been divided into six sub-categories: personal, reflexive, relative, reciprocal, wh-word and indefinite. These categories should be self-explanatory and follow the same definitions as posited in common linguistic literature.

2.1 Personal Pronouns (PR_PRP)
Personal pronouns cover all the pronouns that denote a person, place or thing. This includes all their case forms as well, for example okar, U, toke, tU, ham etc.
4) ser kahalA - ab maiM\PR_PRP tumko khAungA
   The lion said, now I will eat you.

2.2 Reflexive Pronouns (PR_PRF)
Reflexive pronouns denote ownership with respect to an antecedent, which can be either a noun or a pronoun. There are only a few words in this category, namely Apan, apan, khud etc.
5) tab tU Apan\PR_PRF dHolak le ke bhAg jAe

   Then you run away with your Dholak.

2.3 Relative Pronouns (PR_PRL)
The relative pronouns are those pronouns whose antecedent can be either a noun or a pronoun. However, these pronouns do not make any difference in number or gender, as personal pronouns do. The relative pronouns in Bhojpuri are jaun, jeke, jahAM etc.

2.4 Reciprocal Pronouns (PR_PRC)
Reciprocal pronouns denote some reciprocity. This is commonly denoted by ek dusar, Apas, apne meM etc.

2.5 Wh-pronouns (PR_PRQ)
The wh-word pronouns are typically the pronouns that are used to ask questions. These words are kaun/ke, kab, kehar etc.

2.6 Indefinite Pronouns (PR_PRI)
The indefinite pronouns refer to unspecified objects, places or things. These words are koi, kisi etc.

3. DEMONSTRATIVES (DM)
The category of demonstratives has been separated from the category of pronouns, as demonstratives mainly point to a noun and do not act as anaphora. The demonstratives have been sub-categorized into four divisions: deictic, relative, wh-words and indefinite.

3.1 Deictic (DM_DMD)
The deictic demonstratives are default demonstratives that point to the noun they modify. The deictic demonstratives are typically I, U, ehar, ohar, je etc. They generally occur before a noun.

3.2 Relative Demonstrative (DM_DMR)
The relative demonstratives occur in the same form as the relative pronouns. The difference is only that these relatives are always followed by the noun they modify, for example jaun, je etc.

3.3 Wh-Word Demonstrative (DM_DMQ)
The wh-demonstratives are the same question words as the wh-pronouns. The difference is that in their demonstrative function they do not ask a question but only demonstrate. The wh-word demonstratives in Bhojpuri are kA, kaun etc.

3.4 Indefinite Demonstratives (DM_DMI)
Like indefinite pronouns, the indefinite demonstratives refer to unspecified objects, places or things. These words are koi, kisi etc.

4. VERBS (V)
The verbs are sub-divided into two only: Main (V_VM) and Auxiliary (V_VAUX) verbs. While the auxiliary verbs are a closed set, the main verb can be anything from a root verb to any of its inflected forms. Each sentence or clause must have a main verb. A sentence can have one or more auxiliary verbs.
6) kaunoM biswAse nahiM kareM\V_VM
   Nobody believed.

5. ADJECTIVE (JJ)
Adjective is a single whole category. There is one definition for an adjective, which is self-explanatory. These are mostly attributive adjectives.
7) bahut garIb\JJ rahal
   He was very poor.

6. ADVERB (RB)
Adverb is also a mono-category part-of-speech. The standards document says that the category of adverb (RB) is only for manner adverbs. For example, words like chAhe jaise bhi, tabhiM, bArI, dhIre etc.
8) bAbA ke saNge, jaise\RB tU log tAs khelalA waise\RB U pAsA kheleM
   The way you play cards, he used to play 'pasa'.

7. POSTPOSITION (PSP)
Postpositions are all the parts-of-speech that work as case markers. Words like meM, se, ke, kar etc. are examples of postpositions.
9) hAr, bhagwAn jI ka\PSP, herA gael
   The god's necklace was lost.

8. CONJUNCTION (CC)
Conjunctions act as joiners of phrases or clauses within a sentence. The category of conjunction has been divided into the two sub-categories of coordinator and subordinator.

8.1 Co-ordinating conjunctions (CC_CCD)
Coordinators are typically the words that join two phrases (noun or verb) of the same category, or a clause. Some common conjunctions are aur, par, balkI etc.
10) Amrood khUb toD-toD ke khAe aur\CC_CCD Apan sab bhaiyan ke delas
    Plucking the guavas, he ate them and gave them to all his brothers.

8.2 Subordinating conjunctions (CC_CCS)
A subordinator typically conjoins two clauses, the second clause being subordinated. Some of the subordinating conjunctions are magar, to, ki etc.
11) ta kahe ki\CC_CCS tU kAm bahut kaile hauA hamAr
    The washerman said that you have worked a lot for me.

9. PARTICLES (RP)
Particles are words that do not decline and also do not fall into any of the other categories described above and elsewhere. Bhojpuri particles include the following five sub-categories.

9.1 Classifiers (RP_CL)
A classifier, sometimes called a measure word, is a word or morpheme used in some languages to classify the referent of a countable noun according to its meaning.
12) dU\QT_QTC go\RP_CL bhaI rahasan
    There were two brothers.

9.2 Default Particle (RP_RPD)
Default Particle is a category that includes all those elements of the language which, though they do not have any lexical import, are functionally significant. Some of the Bhojpuri default particles are to, hI, bhI, nA, jI etc.
13) to ganes jI\RP_RPD ek The bAlak kA rUp rakh ke ayilan
    Then Ganesa appeared in the form of a boy.

9.3 Interjection (RP_INJ)
Interjections are particles which denote exclamatory utterances. The common exclamations in Bhojpuri are अरे, हे, ए, हो etc.
14) are\RP_INJ ! sab tarapf se band hai
    It is closed from all sides.

9.4 Intensifier (RP_INTF)
Intensifiers are words that intensify adjectives or adverbs. The common intensifiers in Bhojpuri are khoob, itnA, bahut, mAre, ittI etc.
15) aum bahut\RP_INTF cHoTe\JJ ho kahAM jaoge
    You are too young to go with them.

9.5 Negation (RP_NEG)
The negation particles are the words that indicate negation. These include nAhi, nA, mat, binA, bagair etc.
16) ab mat\RP_NEG Aye, nAhiM\RP_NEG ta bahut mArab
    Now don't follow or I will beat you.

10. QUANTIFIERS (QT)
Quantifiers are the words that indicate quantity and modify nouns or adjectives. These have been sub-categorized into three parts: general, cardinals and ordinals.

10.1 General (QT_QTF)
The general quantifiers do not indicate any precise quantity, e.g. purA, sab, ek etc.
17) Rahat ta khUb\QT_QTF seb khiyAwat
    If he were there, he would have brought us a lot of apples.

10.2 Cardinals (QT_QTC)
The cardinal quantifiers are absolute numbers, either in digits or in words, such as 1, 2, 3, ek, do, tIn etc.
18) ek\QT_QTC The dhobi ke ghar rahal
    There was a washerman's house.

10.3 Ordinals (QT_QTO)
The ordinals denote order, such as pahilA, dusar, tIsar etc.
19) duno\QT_QTO log gayilan khet meM
    Both went to the fields.

11. RESIDUALS (RD)
The category of residuals has been demarcated for the words that are usually not an intrinsic part of the language/speech. Divided into five parts, these include foreign words, symbols, punctuations, unknown words and echo-words.

11.1 Foreign Words (RD_RDF)
The foreign words are all the words that are not written in the Devanagari script.

11.2 Symbols (RD_SYM)
The symbols are the characters that are not part of the regular Devanagari script, such as *, @, #, $, % etc.

11.3 Punctuations (RD_PUNC)
Punctuations include the characters that are considered regular punctuation marks in Hindi, e.g. , ! ? - etc.

11.4 Unknown (RD_UNK)
Unknown words are the words for which a category cannot be decided by the annotator. These may include words from phrases or sentences from a foreign language written in Devanagari.

11.5 Echo-Words (RD_ECH)
The echo-words are the words that are formed by the morphological process known as echo-formation, e.g. chup-chaap, sach-much, kAT-kUT etc.
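The word\TAG convention used in the tagged examples throughout these guidelines can be processed mechanically. Below is a minimal sketch in Python; the helper names (`parse_tagged`, `coarse`) are ours for illustration, not part of any tool described in the paper. It also shows how the coarse BIS category can be recovered from a fine tag, since fine tags follow the CATEGORY_SUBTYPE pattern (e.g. N_NN, RD_PUNC).

```python
# Minimal sketch: split BIS-tagged tokens of the form word\TAG into
# (word, tag) pairs and derive the coarse category from the fine tag.
# Helper names are hypothetical, not from the paper's toolchain.

def parse_tagged(line):
    """Turn a line like 'aur\\CC_CCD U\\PR_PRP' into [('aur', 'CC_CCD'), ...]."""
    pairs = []
    for token in line.split():
        word, _, tag = token.partition("\\")  # split at the first backslash
        pairs.append((word, tag))
    return pairs

def coarse(tag):
    """BIS fine tags are CATEGORY_SUBTYPE; the coarse label precedes '_'."""
    return tag.split("_")[0]

tagged = r"aur\CC_CCD U\PR_PRP bhI\RP_RPD Apan\RP_PRF kHayiles\V_VM"
pairs = parse_tagged(tagged)
print(pairs)
print([coarse(t) for _, t in pairs])  # ['CC', 'PR', 'RP', 'RP', 'V']
```

Running this on example 13 above yields the (word, fine tag) pairs and the coarse categories CC, PR, RP, RP, V.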

Indian Subcontinent Language Vitalization

András Kornai, Pushpak Bhattacharyya

Department of Algebra, Budapest Institute of Technology
Department of Computer Science and Engineering, Indian Institute of Technology
[email protected], [email protected]

Abstract We describe the planned Indian Subcontinent Language Vitalization (ISLV) project, which aims at turning as many languages and dialects of the subcontinent into digitally viable languages as feasible.

Keywords: digital vitality, language vitalization, Indian subcontinent

In this position paper we describe the planned Indian Subcontinent Language Vitalization (ISLV) project. In Section 1 we provide the rationale why such a project is called for and some background on the language situation on the subcontinent. Sections 2-5 describe the main phases of the planned project: Survey, Triage, Build, and Apply, offering some preliminary estimates of the difficulties at each phase.

1. Background

The linguistic diversity of the Indian Subcontinent is remarkable, and in what follows we include here not just the Indo-Aryan family, but all other families like Dravidian and individual languages spoken in the broad geographic area, ranging from languages like Telugu with tens of millions of speakers to the languages of scheduled tribes which may be spoken by only a few hundred people. We define the Subcontinent broadly, so as to include not just India, Pakistan, and Bangladesh, but also Nepal, Bhutan, Sri Lanka, Afghanistan, and the Maldives, because the languages spoken in this geographic area often form cross-border continua. The simple question of exactly how many languages/dialects we need to consider is already fraught with difficulty, with estimates ranging from over 1,600 in the 1961 Census, see http://www.languageinindia.com/aug2002/indianmothertongues1961aug2002.html, to less than 500 in the Ethnologue (Lewis et al. 2013).

Kornai (2013) divided languages into four major categories: digitally Thriving, Vital, Heritage, and Still. Without prejudging matters, it is clear that on the subcontinent all four possibilities obtain: English is thriving, Hindi is vital, Sanskrit is heritage, and Bagata (the language of a scheduled tribe, not even listed in the Ethnologue) is still. ISLV puts the emphasis on the borderline cases between digitally viable (T and V) languages on the one hand, and digitally dead (H and S) languages on the other. The goal is not just to enhance scholarly knowledge in this area, but also to inform decisionmakers where the limited resources available to language vitalization are best applied. This requires not just a detailed survey of the languages in question (see Section 2) but also an objective triage mechanism (see Section 3).

We will be paying considerably less attention to languages like English and Hindi that are thriving or nearly so, suggesting that efforts aimed at building language technology (see Section 4) are best concentrated on the less vital (but still vital or at the very least borderline) cases at the expense of the obviously moribund ones. To find this borderline we need to distinguish the heritage class of languages, typically understood only by priests and scholars, from the still class, which is understood by native speakers from all walks of life. For heritage languages like Sanskrit considerable digital resources already exist, both in terms of online available material (in translations as well as in the original) and in terms of lexicographical and grammatical resources, of which we single out the Köln Sanskrit Lexicon at http://www.sanskrit-lexicon.uni-koeln.de/monier and the INRIA Sanskrit Heritage site at http://sanskrit.inria.fr. For still languages, there is practically nothing, and conservation efforts are very justified.

We emphasize at the outset that we do not advocate the wholesale abandonment of still languages. Unlike in a field hospital, where triage really means the abandonment of the likely fatally wounded so that those who can still be saved get a better chance, here still languages can receive a different kind of treatment, heritage preservation. This is a very worthy goal, and there are already significant societal efforts in this direction such as the Endangered Languages Project at http://www.endangeredlanguages.com. This should be kept in mind as we apply our findings, especially as the preservation effort is in a substantively different direction, requiring very different resources, than vitalization proper. As we shall see, preservation is primarily the work of anthropologists and linguists trained in fieldwork, while digital vitalization requires machine learning techniques.

2. Survey

The purpose of the first stage of ISLV is to collect a broad range of facts and opinion that covers not just branches of Indic in the strict sense but also languages and cultures deeply influenced by Indic vocabulary and script on the subcontinent. We use the directed crawling technique described in Zséder et al. (2012) to collect as much data for each dialect as possible. An important intermediate result of this stage is the development of robust dialect-identification models along the lines of the well known TextCat (see http://odur.let.rug.nl/~vannoord/TextCat) and CLD2 (see https://code.google.com/p/cld2) models, taking into account various encodings ranging from legacy schemes such as ISCII to varieties of Unicode, and even latinized writing (still common in text messaging) and scholarly systems such as IPA.

At the current stage, our database covers 634 languages and dialects of the subcontinent, excluding English. Table 1 gives the breakdown per primary country.

Country        Languages
Afghanistan    30
Bangladesh     16
Bhutan         20
India          397
Maldives       1
Nepal          107
Pakistan       59
Sri Lanka      4

Table 1: Current coverage

It is, of course, quite debatable whether countries such as Afghanistan should be included at all, and we welcome any cogent argument in this regard, especially as it is unclear whether the same funding sources that support efforts in India would be equally available in other countries. That said, we lean toward inclusion, rather than exclusion.

3. Triage

Once the data is collected, we apply the methodology of Kornai (2013) to decide which varieties can be classed as vital, heritage, or still. There are no digitally thriving languages as defined originally, though we acknowledge that Hindi may be classified as such. Table 2 summarizes the breakdown:

Status       Languages
vital        36
borderline   21
heritage     1
still        576

Table 2: Main classes

Only one language is listed as heritage, since a considerable number of people (over 10,000) are listed as native (L1) speakers of Sanskrit. Be that as it may, the data is dominated by digitally still languages (over 90% of the languages considered, see the Appendix), and we are left with some 50-60 languages that have a chance to take root digitally. It should be noted that our classification of vital vs. borderline was already highly optimistic (see Kornai 2013 for a more detailed description of the conservative methodology chosen so as not to raise false alarms), with languages like Dogri (dgo) listed as digitally vital, which is quite debatable. In the full study, we are likely to retain the positive outlook that characterized the earlier work, so as not to hinder the digital ascent of any language that has a fighting chance. With five million speakers, Dogri may very well not be a lost cause. A good first step toward demonstrating the vitality of a language could be the collection of a BLARK (Krauwer 2003).

Since these lists are so small, we provide them in full below, but with the clear understanding that our results are preliminary, and more sophisticated data gathering in Phase 1 may still change them.

Vital: , Assamese, Bengali, Bishnupriya, Brahui, Chakma, Dogri, Maldivian, , Gujari, Goan Konkani, Gujarati, Hindi, Kannada, Kashmiri, Kachchi, Khasi, Khowar, Lushai, Maithili, Malayalam, Marathi, Nepali, Newari, Oriya, Western Panjabi, , Rangpuri, Sanskrit, , Seraiki, Sindhi, Tamil, Tulu, Telugu, Urdu.

Borderline: Awadhi, Baluchi, Southern Balochi, Badaga, Bhojpuri, Halbi, Chhattisgarhi, Kukna, Konkani, Manipuri, Naga , , Oriya, Punjabi, Southern , Pashto, Rajasthani, Santali, Saurashtra, Sylheti, Kok Borok.

We welcome criticisms both of the data (if a language of the subcontinent that you are working with appears neither on any of the above lists nor in the Appendix, this obviously points at an error in the data gathering process) and of the classification. We are particularly interested in cases where languages should evidently be moved from the still to the vital category, or the other way round. Again, we emphasize that these results are preliminary, and we welcome scholarly debate and discussion.

The ISLV plan is to make the final results, and the data it is based on, publicly accessible, either hosted directly at a dedicated website or by means of pointing to or replicating data already available elsewhere.

We should add here that a policy recommendation based on an assessment of digital death should go beyond a simple exhortation to concentrate all effort regarding this language on heritage conservation. As we already noted in Kornai (2013), such efforts, while obviously necessary for preserving the cultural heritage of humankind, contribute practically nothing to language vitality. To mitigate the human cost of digital language death it is therefore suggested that we expend effort on identifying, if at all feasible, for each digitally still language a vital 'champion' of similar vocabulary, script, and grammar, with the idea that this champion can become a medium of access to the digital realm that is easier to acquire than English. In certain cases, such as Andaman Creole Hindi (hca), bilingualism is so strong that the choice of the champion is obvious, but in many cases the task is far from trivial. This effort, which should be undertaken primarily by scholars intimately familiar with the language and the sociolinguistic/dialect situation, requires only user-level knowledge of language technology. This is very different from our recommendations for vital/borderline languages, to which we turn now.

4. Build

The build stage no longer considers all dialects and languages of the Indian Subcontinent, just those deemed vital/thriving in Stage 2. These we can hope to endow with a full computational toolchain composed of the following stages.

Tool                    Effort
script                  0.1
normalization           1
language detector       1
word list               2
bilingual dictionary    6
morphology              12
spellchecker            6

Table 3: Word-level tools

Table 3 lists the effort (in person-months) associated with building the tool or resource in question after the data-gathering phase is complete, but assuming significant online material is found, for if there is no material online the digital vitality of the language is in grave doubt. It is not assumed that the tools so obtained will be of quality comparable to those available for English (or for MT, say English-French). Nevertheless, such tools are already useful for a broad variety of purposes, and their incremental refinement and the higher level tools built on top of them are left to the last phase of the ISLV project (see Section 5 below).

It is expected that orphaned dialects without a near champion can only be documented in the sense of heritage preservation, while dialects close to champions will participate in (sub)koine formation. The beneficial effects may even extend to some of the languages and dialects outside the Subcontinent. It will require a great deal of care to select the champion dialects, a matter of particular importance in Hungary, where several Roma dialects, some obviously close to main Indian languages, some less visibly so, are spoken. It is not expected that such dialects would be vital in and of themselves, but finding a champion they could attach to would significantly enhance their chances of digital survival.

Special funds may be obtained from the NSF Documenting Endangered Languages program, the Endangered Languages Project, or other similar preservation efforts, but the only direct contribution of ISLV in this regard would be the recommendation that these dialects are indeed in need of such an effort. The main focus of building would initially be on the word-level technologies, including spellchecking (standardized orthography), stemming (prefix- and suffix-removal, but not necessarily deeper morphological analysis), glyph analysis, and building a common multilingual dictionary of basic vocabulary similar to Ács et al. (2013). Such efforts obviously require better, engineer-level understanding of language technology, and may serve as a training ground for a new generation of native computational linguists.

It is the task of this phase to determine to what extent standard (two-tape) finite-state transducer technology is usable for providing cross-transliteration among the vital varieties on the one hand, and between the vital champions and their satellites on the other. Only a limited amount of parallel (synchronized) grammar writing is envisioned at this stage.

5. Apply

The main applications we envision extend the text and image-based work to speech (and if resources permit, to sign languages). Of particular interest are cross-language speech translators which do not assume users to have great familiarity with standard Hindi, text-to-speech systems capable of synthesizing speech in any vital language, and perhaps captioning systems that would extend the reach of broadcast operations. As can be seen from Table 4, the effort of building these tools is considerably larger, and it is only with computational linguists who are both native speakers and already skilled in the design and application of the word-level tools that most of these can be attempted.

Tool            Effort
light parser    12
NER             6
OCR             12
ASR             12
MT              12

Table 4: Higher tools

For a significant subset (over half) of the thriving/vital languages, in particular Assamese, Bengali, Bodo, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Panjabi, Sanskrit, Tamil, Telugu, and Urdu, there is already a concentrated effort under way, see the IndoWordNet site at http://www.cfilt.iitb.ac.in/indowordnet, and a practical application with font transcoding and other critical parallelization technology is available in a 5-way parallel tourism site (Bengali, Hindi, Marathi, Tamil, Telugu), see http://www.tdil-dc.in/sandhan. The build phase can increment these systems for the languages not yet considered.

6. Acknowledgement

Work supported in part by the European Union and the European Social Fund through project FuturICT.hu grant #TÁMOP-4.2.2.C-11/1/KONV-2012-0013.

7. References

Judit Ács, Katalin Pajkossy, and András Kornai. 2013. Building basic vocabulary across 40 languages. In Proceedings of the Sixth Workshop on Building and Using Comparable Corpora, pages 52–58, Sofia, Bulgaria, August. Association for Computational Linguistics.

András Kornai. 2013. Digital language death. PLoS ONE, 8(10):e77056.

Steven Krauwer. 2003. The Basic Language Resource Kit (BLARK) as the first milestone for the language resources roadmap. In Proceedings of the 2003 International Workshop on Speech and Computers (SPECOM 2003), pages 8–15, Moscow, Russia.

Paul Lewis, Gary Simons, and Charles Fennig, editors. 2013. The Ethnologue. Summer Institute of Linguistics.

Attila Zséder, Gábor Recski, Dániel Varga, and András Kornai. 2012. Rapid creation of large-scale corpora and frequency dictionaries. In Proceedings of LREC 2012, pages 1462–1465.

Appendix: digitally still languages

A'tong, A-Pucikwar, Adap, Adi, Adiwasi Garasia, Aer, Afghan , Agariya, Ahirani, Ahom, Aimaq, Aimol, Aiton, Aka-Bea, Aka-Bo, Aka-Cari, Aka-Jeru, Aka-Kede, Aka-Kol, Aka-, Akar-Bale, Allar, Alu , Amri Karbi, Anal, Andaman Creole Hindi, Andh, , Ange, Apatani, Aranadan, Ashkun, Asuri, Athpariya, Attapady Kurumba, Badeshi, Bagheli, Bagri, , , Bantawa, Baraamu, Bateri, Bauria, Bawm Chin, , Belhariya, Bellari, Betta Kurumba, Bhadrawahi, Bhalay, Bharia, Bhatola, Bhatri, Bhattiyali, Bhaya, Bhilali, Bhili, Bhoti Kinnauri, Bhujel, , Biete, Bijori, , Birhor, Bodo Gadaba, Bodo Parja, Bodo, Bondo, Bote-Majhi, , Brokkat, Brokpake, , Bugun, Buksa, Bumthangkha, Bundeli, , Byangsi, Camling, Car Nicobarese, Central Nicobarese, Central Pashto, Chalikha, Chamari, , , Changthang, Chantyal, Chaudangsi, Chaura, Chenchu, Chepang, Chhintange, Chhulung, Chilisso, Chinali, Chiru, Chitkuli Kinnauri, Chittagonian, Chitwania Tharu, Chocangacakha, Chodri, Chokri Naga, , Chug, Chukwa, , Dakpakha, Dameli, Dandami Maria, Dangaura Tharu, Darai, Darlong, Darmiya, Deccan, Degaru, Dehwari, Deori, Desiya, Dhanki, Dhanwar, Dhatki, , Dhodia, Dhundari, Digaro-Mishmi, Dimasa, Dolpo, Domaaki, Dotyali, Dubli, Dumi, Dungmali, Dungra Bhil, Dura, Duruwa, Dzalakha, Eastern Balochi, Eastern Gorkha Tamang, Eastern Gurung, Eastern Magar, Eastern Meohang, Eastern Muria, Eastern Parbate Kham, Eastern Tamang, Eravallan, Far Western Muria, Gadaba, Gadaba, Gaddi, Gade Lohar, Gahri, Galo, Gamale Kham, Gamit, Gangte, Garhwali, Garo, Gata', Gawar-Bati, Ghandruk Sign Language, Ghera, Goaria, Godwari, Gondi, Gongduk, Gowlan, Gowli, Gowro, Grangali, Gurgula, Hajong, Harijan Kinnauri, Haroti, Haryanvi, Hazaragi, Helambu Sherpa, Hinduri, Hmar, Ho, Holiya, Hrangkhol, Hruso, Humla, Idu-Mishmi, Indian Sign Language, Indo-Portuguese, Indus Kohistani, , Irula, Ishkashimi, Jad, Jadgali, Jandavra, Jangshung, Jarawa, Jatapu, Jaunsari, Jennu Kurumba, Jerung, Jhankot Sign Language, Jirel, Juang, Jumla Sign Language, Jumli, Juray, Kabutra, Kachari, Kachi Koli, Kadar, Kagate, Kaikadi, Kaike, Kalaktang Monpa, Kalami, Kalanadi, Kalasha, Kalkoti, Kamar, Kamviri, Kanashi, Kanauji, Kangri, Kanikkaran, Kanjari, Kannada Kurumba, Karbi, Kathoriya Tharu, Kati, Katkari, Kayort, Khaling, Khamba, Khamyang, Khandesi, Kharam Naga, Kharia Thar, Kharia, Khengkha, Khetrani, Khezha Naga, Khiamniungan Naga, Khirwar, Khoibu Naga, Kinnauri, Koch, Kochila Tharu, Koda, Kodaku, Kodava, Kohistani Shina, Koi, Koireng, Kol, Kom, , Korku, Korlai Creole Portuguese, Koro, Korra Koraga, Korwa, Kota, Koya, Kudiya, Kudmali, Kui, Pahari, Kulung, Kumaoni, Kumarbhag Paharia, Kumbaran, Kumhali, , Kunduvadi, Kupia, Kurichiya, Kurmukar, Kurtokha, Kurukh, Kusunda, Kutang Ghale, Ladakhi, , Lahul Lohar, Lakha, , Lambichhong, Lamkang, Lasi, Layakha, Lepcha, Lhokpu, Lhomi, , Limbu, Lingkhim, Lish, Loarki, Lodhi, Lohorung, Loke, , Lui, Lumba-Yakkha, Lunanakha, Lyngngam, Magahi, Mahali, Mahasu Pahari, Majhi, Majhwar, Mal Paharia, Mala Malasar, Malankuravan, Malapandaram, Malaryan, Malavedan, Malvi, Manangba, Manda, , Manna-Dora, Mannan, , Chin, , Maria, , Marma, Marwari, Marwari, Marwari, Mawchi, Megam, Memoni, Merwari, Mewari, Mewati, Miju-Mishmi, Mina, Mirgan, Mirpur Panjabi, Mising, Mixed Great Andamanese, Mogholi, Monsang Naga, , Mru, Mudu Koraga, Muduga, Mugom, Mukha-Dora, Mullu Kurumba, Munda, Mundari, Munji, Musasa, Muthuvan, Mzieme Naga, Na, Naaba, Nachering, Nagarchal, Nahali, Nahari, Nar Phu, , Nepalese Sign Language, Nepali Kurux, Nepali, Nihali, Nimadi, Nocte Naga, Noiri, Norra, Northeast Pashayi, Northern Ghale, Northern Gondi, Northern , Northern Pashto, Northern , Northwest Pashayi, Northwestern Kolami, Northwestern Tamang, Nubri, Nupbikha, Nyenkha, Nyishi, Od, Oko-Juwoi, Olekha, Oraon Sadri, , Pahari-Potwari, Pahlavani, Paite Chin, Pakistan Sign Language, , Palpa, Palya Bareli, Panchpargania, , Paniya, Pankhu, Pao, Parachi, Pardhan, Pardhi, Parenga, Parkari Koli, Parsi, Pathiya, Pattani, Bareli, Pengo, Phake, Phalura, Phangduwali, Phom Naga, Phudagi, Pnar, Pochuri Naga, Porja, Poumei Naga, Powari, Prasuni, Puimei Naga, Puma, Purik, Puroik, Purum Naga, Purum, Rabha, Rajbanshi, Raji, Garasia, Ralte, Rana Tharu, Rangkas, , Rathawi, Rathwi Bareli, Raute, Ravula, Rawat, Reli, Riang, , Rongpo, Ruga, Saam, Sadri, Sajalong, Sakachep, Sambalpuri, Sampang, Samvedi, Sanglechi, , Sansi, Sartang, Sauria Paharia, Savara, Savi, Seke, Sentinel, Shekhawati, Shendu, Sherdukpen, Sherpa, Sheshi Kham, Shina, Sholaga, Shom Peng, Shumashti, Shumcho, Sikkimese, Simte, Sindhi Bhil, Singpho, Sirmauri, Sonha, Sora, Southeast Pashayi, Southeastern Kolami, Southern Ghale, Southern Gondi, Southern Hindko, Southern Nicobarese, Southern Rengma Naga, Southern Uzbek, Southern Yamphu, Southwest Pashayi, Southwestern Tamang, Spiti Bhoti, Sri Lankan Creole Malay, Sri Lankan Sign Language, Stod Bhoti, Sumi Naga, Sunam, Sunwar, Surgujia, Surjapuri, Tagin, Tangchangya, , Tarao Naga, Tawang Monpa, Teressa, Thachanadan, Thado Chin, Thakali, , Thangmi, Thudam, Thulung, Tichurong, Tilung, Tinani, Tippera, Tirahi, Tiwa, Toda, Torwali, Toto, Tregami, Tshangla, Tsum, Tukpa, Turi, Turung, , Ullatan, Urali, Ushojo, Usui, Vaagri Booli, Vaghri, Vaiphei, Varhadi-Nagpuri, Varli, Vasavi, Veddah, Vishavan, Waddar, Wadiyara Koli, , Waigali, Wakhi, Waling, Walungge, Wambule, , Waneci, War-Jaintia, Warduji, Wayanad Chetti, Wayu, Western Balochi, Western Gurung, Western Magar, Western Meohang, Western Muria, Western Parbate Kham, Western Tamang, Wotapuri-Katarqalai, Yakha, Yamphu, Yidgha, Zangskari, .
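The dialect-identification models mentioned in the Survey stage follow the TextCat tradition of comparing character n-gram profiles of a test document against per-language training profiles. The sketch below is our own toy illustration of that idea, under stated assumptions: it uses cosine similarity over character trigram counts rather than TextCat's rank-order statistic, and the training snippets and helper names are invented for the example.

```python
# Toy character-trigram identifier in the spirit of TextCat/CLD2:
# build a frequency profile per language and pick the closest profile
# by cosine similarity. Training data and helper names are illustrative.
from collections import Counter
from math import sqrt

def profile(text, n=3):
    """Count character n-grams, padding with spaces at the edges."""
    text = " " + text.lower() + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(p[k] * q[k] for k in p)  # Counter returns 0 for missing keys
    return dot / (sqrt(sum(v * v for v in p.values())) *
                  sqrt(sum(v * v for v in q.values())))

def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    return max(profiles, key=lambda lang: cosine(profile(text), profiles[lang]))

profiles = {
    "english": profile("the cat sat on the mat and the dog ran"),
    "latin":   profile("feles in tapete sedet et canis currit"),
}
print(identify("the dog sat on the cat", profiles))  # english
```

A production system would train on far larger crawled samples per dialect and handle the encoding variety (ISCII, Unicode, latinized writing) discussed in the Survey section; this sketch only shows the core profile-matching step.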

Augmenting Dravidian WordNet with Context

S. Rajendran and S. Arulmozi
Amrita Vishwa Vidyapeetham, Coimbatore and University of Hyderabad, Hyderabad
(rajushush, arulmozi)@gmail.com

Abstract

It is difficult to interpret the meaning of a lexical item without context. WordNet lists the different senses of a word and provides a definition and a usage example for each sense. But like any sense enumerative lexicon, it does not provide any mechanism for the novel usage of a word. The polysemy found in verbs and adjectives convincingly tells us that we have to augment WordNet with context. Such a mechanism will help us to condense the senses listed under a word and allow us to interpret the senses of a word creatively, or generatively.

Keywords: lexicon, synset, polysemy

1. Introduction

WordNet, as we understand it, is made up of synsets which are linked to each other by lexical and semantic relations against the background of an ontology. Each synset represents a concept or sense, and the sense is given as a description along with usage examples. For an end user, a wordnet serves both as a thesaurus and as a dictionary: a user, by typing a word in the interface slot, can obtain a list of all the senses of the word. The English WordNet, for example, lists 35 senses for the word go, which include 4 nominal senses, 30 verbal senses and one adjectival sense. The Hindi WordNet, on the other hand, lists 2 nominal senses, 16 verbal senses and 2 adjectival senses for the word chalnA 'go'. The Tamil WordNet lists 9 verbal senses for the word poo 'go'. There is no guarantee that these are the only possible senses of the word under consideration. As we know, language is dynamic, not static, so there is always a possibility of expansion of the meaning of a word (i.e. addition of new senses) as the word comes to be used in new contexts. A static list of senses cannot capture this meaning expansion or generative use of words. The senses are also not compartmentalized; they overlap with one another. A lexicon which lists the senses of words can be called a sense enumerative lexicon (SEL). An SEL may not be able to capture the dynamic use of a word. It is in this respect that this paper argues that WordNet needs to be complemented or augmented by a mechanism for condensing the senses listed under a word and a mechanism for interpreting novel senses in the new contexts in which the word is used. Piek Vossen (2001) points out the need for condensing meaning in WordNet. He states:

"The matching of meanings across the ... makes it necessary to account for polysemy in a generative way and to establish a notion of equivalence at a more global level."

A context-sensitive framework for a lexical ontology like WordNet has been proposed by Veale and Hao (2007). The present paper is a purely theoretical one based on certain assumptions, and it proposes a methodology to augment WordNet along these lines.

2. Limitations of the Sense Enumerative Lexicon

Pustejovsky (1995:5), who argues for a generative framework for the lexicon, points out that lexical semantics should address the following issues:

(a) explaining the polymorphic nature of language;
(b) characterizing the semanticality of natural language utterances;
(c) capturing the creative use of words in novel contexts;
(d) developing a richer, co-compositional semantic representation.

SELs are inadequate to account for the description of natural language semantics. Pustejovsky points out three basic arguments showing the inadequacies of SELs for the semantic description of language (Pustejovsky, 1995:39):

(1) THE CREATIVE USE OF WORDS: words assume new senses in novel contexts.
(2) THE PERMEABILITY OF WORD SENSES: word senses are not atomic definitions but overlap and make reference to other senses of the word.
(3) THE EXPRESSION OF MULTIPLE SYNTACTIC FORMS: a single word sense can have multiple syntactic realizations.

Each of these considerations points to the inability of sense enumerative models to adequately express the nature of lexical knowledge and polysemy. Taken together, they suggest that frameworks incorporating SELs are poor models of natural language semantics. A word may have contrastive or complementary senses. An SEL lists contrastive senses as belonging to different words (i.e. under separate entries) and complementary senses as belonging to the same word (i.e. under the same entry). Pustejovsky (1995:38) restates the SEL's account of contrastive and complementary senses as follows:

A lexicon L is a sense enumeration lexicon if and only if for every word w in L having multiple senses s1, ..., sn associated with that word:
(i) if s1, ..., sn are contrastive senses, the lexical entries expressing these senses are stored as w_s1, ..., w_sn;
(ii) if s1, ..., sn are complementary senses, the lexical entry expressing these senses is stored as w_{s1, ..., sn}.

Every ambiguity is represented either by (i) or by (ii) above. Though Pustejovsky points out the advantages of this model of lexical description, he also points out that the SEL model is inadequate for the purposes of linguistic theory.

3. Problem of Polysemy in Verbal Semantics

We are not going to adopt Pustejovsky's (1995) model of the generative lexicon for our purpose. We are planning a different strategy that will suit WordNet, whereby the contexts responsible for the different senses of a particular word can be represented. We take up verbal polysemy to start with. As the Tamil WordNet is only at its infant stage, we make use of a representative SEL for Tamil, kriyaavin taRkaalat tamiz akaraathi (KTTA, the Dictionary of Contemporary Tamil), to serve our purpose. We also make use of the Generative Lexicon for Tamil (in manuscript form) written by Rajendran under a UGC-sponsored project (Rajendran, 2010).

If we look at KTTA, we find that the number of senses enumerated under a verb varies from roughly three to thirty. Some verbs like aTi 'beat' and pooTu 'drop' acquire an enormous list of senses, as they can collocate with a number of nouns, forming different verbal senses. The different senses interpreted for them are based on the object noun with which they collocate (for example kaapi aTi 'copy', accu aTi 'print', etc.). Such verbs behave like light verbs, showing various senses based on the object noun with which they collocate. If we go through the dictionary and analyse the different senses listed under each verb, we come to know that the context represented by the arguments of the verb (such as the subject or agent argument, the object or patient argument, the indirect-object argument, the instrument argument, and so on) plays a vital role in the interpretation of the different senses of the verbs concerned.

For the sake of illustration, and to discuss the issue at hand, the less polysemous verb uTai 'break' has been taken as an example. The different senses denoted by the verb are presented in Table 1, along with their sense descriptions and usage examples. The usage examples are analyzed for the argument structure of the verb.

4. Problem of Meaning Interpretation of Verbs

There are compound verbs in Tamil which are formed from a base by the addition of a verb which functions as a verbalizer, i.e. whose function is to verbalize the base. The bases are generally nouns, though even a verb can be compounded with a verbalizing verb to form another verb. There are a number of verbs which are used to form verbs from nouns. Not all nouns can be added to a verbalizer and, conversely, not all verbalizers can be added to a noun; only a closed set of nouns can be collocated with a particular verbalizer. The compounds can overlap in their meaning, as the same nouns can be collocated with an overlapping group of verbs. This leads to synonymy among compound verbs. Though the formation of verbs from the N + V combination is a productive process, the nouns involved in the formation of compound verbs with a particular verbalizer appear to be a closed set rather than an open set. But it is possible to recruit new members to a closed set, which makes the process productive.
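Pustejovsky's definition can be made concrete with a small data sketch. The following Python fragment is purely illustrative (the entries and glosses are simplified from the Tamil examples discussed in this section, not actual KTTA or WordNet records): contrastive senses are stored as separate entries, complementary senses under a single entry.

```python
# Sketch of a sense enumerative lexicon (SEL), following Pustejovsky's
# definition: contrastive senses are stored as separate entries
# (w_s1, ..., w_sn), complementary senses as one entry w_{s1, ..., sn}.
# The Tamil entries below are simplified illustrations, not KTTA records.

sel = {
    # contrastive senses: homographs stored as separate "words"
    "aTi_1": ["beat"],
    "aTi_2": ["bottom; base"],
    # complementary senses: one entry listing overlapping senses
    "uTai": ["break; split", "break open (a bundle, an envelope)",
             "break up (an organization)", "disclose (a secret)",
             "split (logs)"],
}

def senses(word):
    """Enumerate every sense stored for a word, across all entries."""
    return [s for entry, ss in sel.items()
            if entry == word or entry.startswith(word + "_")
            for s in ss]
```

A static table of this kind can only ever return senses that were enumerated in advance, which is precisely the limitation of SELs that motivates a generative augmentation of WordNet.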

Table 1: Different senses of the verb uTai (Tamil-script usage examples omitted; sense descriptions, English translations and argument analysis retained)

S.No.  Sense                                                        Subject         Object
1      'break; split' ('The thief entered the house by              thief           lock
       breaking open the lock.')
2      'break open (a bundle by snapping the string tied            he / he / you   cloth bundle /
       around); open (an envelope, a bottle, etc.)'                                 letter / soda
       ('He opened the cloth bundle, took out the cloth and
       wrote the prices.' / 'He opened the envelope of the
       letter and read it.' / 'Open the soda bottle and give him.')
3      'split; break up (a party, an organization, etc.)'           he              society
       ('The efforts he had taken to break up the society failed.')
4      'make public (a secret, hidden facts, etc.); disclose'       you / I         matter / truth
       ('How can you disclose the secret he has kept to himself?' /
       'I have to disclose the truth.')
5      'split (logs)' ('The person to split the log has not         person          log
       come so far.')

Table 1 (above) reveals that the object noun of the verb determines the different senses assigned to the verb.

There are thirty-nine verbs which can be claimed to function as verbalizers forming compound verbs from nominal bases.

Table 2: Verbalizers

S.No  Verbalizer (core meaning)      Example compound verb
1     ati 'beat'                     kan 'eye' + ati > kannati 'wink'
2     atai 'get'                     mutivu 'end' + atai > mutivatai 'come to an end'
3     ali 'give'                     paricu 'prize' + ali > paricali 'award'
4     aku 'become'                   veli 'outside' + aku > veliyaku 'come out'
5     akku 'produce'                 coru 'cooked rice' + akku > corakku 'cook rice'
6     atu 'move'                     kuttu + atu > kuttatu 'act'
7     attu 'swing'                   cir 'orderliness' + attu > cirattu 'tend lovingly'
8     arru 'perform'                 pani 'work' + arru > paniyarru 'work'
9     itu 'put'                      parvai 'look' + itu > parvaiyitu 'inspect'
10    uru 'obtain'                   kelvi 'hearsay' + uru > kelviyuru 'get to know'
11    uruttu 'trouble'               tunpam 'suffering' + uruttu > tunpuruttu 'cause suffering'
12    uttu 'give'                    ninaivu 'remembrance' + uttu > ninaivuttu 'remind'
13    etu 'take'                     oyvu 'rest' + etu > oyvetu 'take rest'
14    eytu 'obtain'                  maranam 'death' + eytu > maranameytu 'die'
15    el 'accept'                    patavi 'position' + el > pataviyel 'take office'
16    eru 'rise'                     cutu 'heat' + eru > cuteru 'become hot'
17    erru 'raise'                   veli 'outside' + erru > veliyerru 'expel'
18    kattu 'tie'                    itu 'compensation' + kattu > itukattu 'make up'
19    kattu 'show'                   acai 'desire' + kattu > acaikattu 'lure; tempt'
20    kuru 'say'                     puram 'back' + kuru > purankuru 'backbite'
21    kotu 'give'                    peeccu 'conversation' + kotu > peccukkotu 'initiate a talk'
22    kol 'get'                      totarpu 'contact' + kol > totarpu kol 'contact'
23    cey 'do'                       vicaranai 'investigation' + cey > vicaranai cey 'investigate'
24    col 'say'                      kol 'lie' + col > kol col 'tell tales'
25    tattu 'pat'                    mattam 'substandard' + tattu > mattamtattu 'degrade'
26    patu 'experience'              vetkam 'shyness' + patu > vetkappatu 'feel shy'
27    patuttu 'cause to experience'  tunpam 'suffering' + patuttu > tunpappatuttu 'cause to suffer'
28    pannu 'do'                     yocanai 'thinking' + pannu > yocanai pannu 'think'
29    par 'see'                      vevu 'spying' + par > vevupar 'spy'
30    piti 'catch'                   atam 'obstinacy' + piti > atampiti 'become obstinate'
31    puri 'do'                      manam 'marriage' + puri > manampuri 'marry'
32    peru 'get'                     oyvu 'rest' + peru > oyvu peru 'retire (from service)'
33    po 'go'                        coram 'adultery' + po > corampo 'commit adultery'
34    potu 'drop'                    cattam 'sound' + potu > cattam potu 'shout'
35    muuttu 'make'                  kopam 'anger' + muuttu > kopamuttu 'cause anger'
36    va 'come'                      valam 'right' + va > valamva 'go round'
37    vanku 'get'                    velai 'work' + vanku > velaivanku 'extract work'
38    vitu 'leave'                   muccu 'breath' + vitu > muccuvitu 'breathe'
39    vai 'keep'                     ataku 'pledge' + vai > atakuvai

It has to be noted here that all the verbalizing verbs are native Tamil words. Not all the verbs listed above are actually used as verbalizers, and the number of compound verbs formed from each verbalizer also varies. As can be inferred from the table, the verbalizers, or light verbs, depend on the preceding noun for the interpretation of the compounded meaning. Some of the compounds formed in this way find a place in the Tamil dictionary, but most of them are not listed, since this formation process is productive. The question raised here is: how are we going to list these verbs in the WordNet? Here again we need a generative mechanism to capture the polysemy of the light verbs.
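Listing such compounds could be handled generatively rather than by enumeration. The sketch below is a minimal illustration, not the paper's implementation; the rule table is a small subset of Table 2, and `license_noun` models the recruitment of new members into a verbalizer's closed set, which is what keeps the process productive.

```python
# Minimal generative sketch for Tamil N + V light-verb compounds.
# Each verbalizer licenses a closed (but extensible) set of nouns.
# The seed entries are taken from Table 2; everything else is illustrative.

licensed = {
    "ati":  {"kan":  ("kannati", "wink")},       # kan 'eye' + ati 'beat'
    "etu":  {"oyvu": ("oyvetu", "take rest")},   # oyvu 'rest' + etu 'take'
    "patu": {"vetkam": ("vetkappatu", "feel shy")},
}

def compound(noun, verbalizer):
    """Return (form, gloss) if the combination is licensed, else None."""
    return licensed.get(verbalizer, {}).get(noun)

def license_noun(verbalizer, noun, form, gloss):
    """Recruit a new noun into a verbalizer's closed set."""
    licensed.setdefault(verbalizer, {})[noun] = (form, gloss)
```

Unlicensed combinations yield nothing, mirroring the closed-set restriction; a single `license_noun` call extends the set without re-enumerating every sense.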

5. Problem of Meaning Interpretation of Adjectives

A set of adjectives depends upon the nouns they modify for the interpretation of their meaning. The following examples in Tamil illustrate this issue:

paccai poy (green lie) 'extreme lie'
paccai irattam (green blood) 'raw blood'
paccai kaaykaRi (green vegetable) 'raw vegetable'
paccai arici (green rice) 'raw rice'
paccai miLakaay 'green chilly'
paccai taNNiir (green water) 'water (in general, as opposed to cold water and hot water)'

Though nalla 'good' can modify any noun, its interpretation depends on the noun which follows it:

nalla peenaa 'good pen'
nalla peN 'good woman'
nalla katti 'good knife / sharp knife'
nalla aaciriyar 'good teacher / efficient teacher'

Here again we need a generative mechanism for the interpretation of adjectives.

6. Pustejovsky's Approach

A formalized structure of contexts is not given in a dictionary or lexicon, although dictionaries do try, in a formal way, to tell their users the context of a particular usage. Context has been widely discussed in the field of semantics, but an explicit account of it for the interpretation of senses has so far been elusive. It is Pustejovsky who has given a formal representation to account for the different senses of a word. Pustejovsky (1995:61) characterizes a generative lexicon as a computational system involving at least the following levels of representation:

1. ARGUMENT STRUCTURE: specification of the number and type of logical arguments.
2. EVENT STRUCTURE: definition of the event type of an expression and its sub-eventual structure.
3. QUALIA STRUCTURE: a structural differentiation of the predicative force for a lexical item.
4. LEXICAL INHERITANCE STRUCTURE: identification of how a lexical structure is related to other structures in the type lattice.

Pustejovsky (1995:76, 2001:56) assumes that word meaning is structured on the basis of four generative factors, or qualia roles, that capture how we understand objects and relations in the world and provide the minimal explanation for the linguistic behaviour of lexical items:

CONSTITUTIVE: the relation between an object and its constituent parts.
FORMAL: the basic category that distinguishes the object within a larger domain.
TELIC: the object's purpose and function.
AGENTIVE: factors involved in the object's origin or "coming into being."

A set of generative devices connects these four levels, providing for the compositional interpretation of words in context. Included in these generative operations are the following semantic transformations, all involving well-formedness conditions on type combinations:

TYPE COERCION: a lexical item or phrase is coerced to a semantic interpretation by a governing item in the phrase, without change of its syntactic type.
SELECTIVE BINDING: a lexical item or phrase operates specifically on the substructure of a phrase, without changing the overall type in the composition.
CO-COMPOSITION: multiple elements within a phrase behave as functors, generating new non-lexicalized senses for the words in composition. This also includes cases of underspecified semantic forms becoming contextually enriched, such as manner co-composition, feature transcription, and light verb specification.

7. An Alternative Approach for Tackling Polysemy

Adopting the methodology developed by Pustejovsky to account for the polysemous structure found in WordNet is difficult. If we look again at the table discussed above, we may presume that the semantic features of the nominal object determine the senses to be enumerated. It may be inferred that one set of items belonging to a domain of objects gives one sense to the verb, another set of object items another sense, and so on. If it is possible to link these domains in an ontological tree, we may be able to infer the difference in the nominal object and thereby assign different senses to the verb concerned. The nearness in the ontological hierarchy
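The four qualia roles can be pictured as a simple record attached to a lexical entry, from which a generative device such as type coercion selects in context. The fragment below is an illustrative sketch using Pustejovsky's well-known example for the noun 'novel'; it is not anything specific to this paper.

```python
# Sketch of a qualia structure (Pustejovsky 1995): four roles that a
# generative device such as type coercion can select from in context.
from dataclasses import dataclass

@dataclass
class Qualia:
    constitutive: str  # relation to constituent parts
    formal: str        # category within a larger domain
    telic: str         # purpose and function
    agentive: str      # factors in the object's origin

# Pustejovsky's stock example for the noun 'novel':
novel = Qualia(
    constitutive="narrative",
    formal="book",
    telic="read",
    agentive="write",
)

def coerce(event_verb, q):
    """Toy type coercion: 'begin a novel' -> begin reading it; verbs
    with no coercion rule fall back to the formal role."""
    return {"begin": q.telic, "finish": q.telic}.get(event_verb, q.formal)
```

The point of the sketch is that a single entry plus a generative operation yields context-dependent readings that an enumerative lexicon would have to list separately.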

(which again is difficult to measure) may give rise to overlapping of senses. The distance in the ontological hierarchy may tell us how far apart the senses are. This will help us decide whether a particular group of senses is complementary to one another, demanding a common entry in the lexicon, or contrastive, demanding separate entries. We probably require a fine-grained ontological tree. This methodology may reduce the subjectivity involved in grouping senses under one or more entries in a lexicon. The context provided by the ontological tree can be exploited for the interpretation or generation of the various senses of a particular word.

8. Conclusion

Human categorization is neither a binary nor a context-free process. Rather, the criteria that govern the use and recognition of certain concepts may be satisfied to different degrees in different contexts. Much work remains to be done on the current framework with the aim of a more formal treatment of how our approach serves to augment WordNet (or similar resources) with concept descriptions that can be used both to categorize in context and to reason about those categorizations. WordNet itself is little more than a classification hierarchy, and the conceptual functions we assign to its lexical entries serve much the same purpose (i.e. categorization and introspective reasoning).

References

Bouquet, P. et al. (2003). C-OWL: Contextualizing Ontologies. In: Proceedings of the 2nd International Semantic Web Conference. LNCS, Vol. 2870, pp. 164-76. Springer Verlag.
Brijesh and Pushpak Bhattacharyya. (2012). Domain Specific Ontology Extractor for Indian Languages. In: 10th Workshop on Asian Language Resources, COLING 2012, pp. 75-84.
Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. Cambridge: The MIT Press.
Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
Pease, Adam. (2011). Ontology: A Practical Guide. Angwin, CA: Articulate Software Press.
Pustejovsky, J. (1995). The Generative Lexicon. Cambridge: The MIT Press.
Rajendran, S. (2000). taRkaalat tamiz coRkkaLanjciyam (Thesaurus for Contemporary Tamil). Thanjavur: Tamil University Press.
Rajendran, S. (2000). Strategies in the Formation of Compound Verbs in Tamil. International Journal of Dravidian Linguistics, Vol. 29:2, pp. 107-26.
Rajendran, S. (2010). Creating Generative Lexicon from MRDs in Tamil. University Grants Commission Project Report. Thanjavur: Tamil University.
Subramanian, P. R. (1992/1998). kriyavin taRkaalat tamiz akaraati. Chennai: Cre-A.
Veale, Tony & Hao, Yanfen. (2007). A Context-Sensitive Framework for Lexical Ontologies. The Knowledge Engineering Review, Vol. 23:1, pp. 101-15.
Vossen, Piek. (2001). Condensed Meaning in EuroWordNet. In: Pierrette Bouillon & Federica Busa (Eds.), The Language of Word Meaning. Cambridge: CUP.

A Study on Causal Relations and its Automatic Identification in Tamil

Menaka S., Malarkodi C.S., Sobha Lalitha Devi AU-KBC Research Centre, MIT Campus of Anna University Chennai, India [email protected]

Abstract The objective of the present work is to identify cause-effect expressions in Tamil text. We have classified the causal markers into different types for computational purposes. The linear order of cause-effect markers and arguments is explained with examples. Tamil corpora consisting of 31,741 sentences were annotated manually for this task. To overcome the structural interdependencies existing in cause-effect relations, we came up with a separate set of features for each type, and type-specific models were generated. We have introduced a sliding-window, type-specific testing approach. Post-processing using linguistic and heuristic rules improves the system performance. We have conducted performance distribution and ten-fold cross validation experiments, and the results are encouraging.

Keywords: cause-effect, causal marker, type specific, structural dependencies

1. Introduction

A cause is an event that is the reason for another event, called the effect, to happen. Cause-effect relations have been studied through two different approaches: cue-based and statistical. The cue-based approach starts with discourse markers or cue phrases and searches for a pattern, while the statistical approach uses machine learning techniques to identify relations.

Khoo et al. (1998) developed an automatic cue-based method, using linguistic cues and pattern matching, for identifying and extracting explicitly mentioned cause-effect relations. Girju and Moldovan (2002) provide a syntactic and semantic classification of cause-effect lexico-syntactic patterns for the automatic detection and extraction of causal relationships in English texts. Saito et al. (2006) proposed a method to identify discourse relations between two successive sentences. Pechsiri et al. (2008) attempted to automatically extract Know-Why relations from Thai documents on the web to support a question-answering system for disease treatment. Blanco et al. (2008) attempted a supervised method for the detection of explicit cause-effect relations in open-domain text. Sobha et al. (2010) identified and classified the causal markers in Tamil. They also described the semantic and syntactic properties of cause-effect relations in a way that would be useful for IR and QA systems. Menaka et al. (2012) developed a cause-effect relation identification system to automatically identify causal relations in Tamil text using CRFs. From their error analysis they conclude that the errors can be attributed to structural interdependencies between and within the cause-effect relations. To overcome the structural dependencies reported in Menaka et al. (2012), we worked further and came up with type-specific models.

2. Hierarchical Classification of Cause-effect Relations

The cause-effect relation, or causal relation, is a semantic relation between two events. The cause is the event E1 that causes the event E2; in other words, E1 is the reason for E2 to occur. The event E2 is the effect, which is the result or consequence of the cause. Following Menaka et al. (2012), we picture the hierarchical classification of cause-effect markers in Figure 1. The cause-effect relation can be broadly classified into explicit and implicit cause-effect relations, based on the presence or absence of a marker.

Figure 1: Hierarchical Classification of Cause-effect Relations in Tamil

2.1. Implicit Cause-Effect Relation

An implicit cause-effect relation is typically inferred between two adjacent sentences or clauses without any explicit grammatical or lexical signal to make the cause-effect connection. In Tamil, this can happen in two adjacent sentences (Ex. 1).

(1) mazai pey-t-atu. tuNikaL kaaya-villai.
    rain rain-Pst-3sn clothes dry-Neg
    'It rained. The clothes did not dry.'

2.2. Explicit Cause-Effect Relation

Explicit cause-effect relations can be marked by a grammatical marker or a lexical marker. Explicit cause-effect markers can be intra-sentential or inter-sentential. Intra-sentential markers are mostly grammatical markers. Grammatical markers get inflected with a noun or a verb; in Tamil, a grammatical marker can be a verbal suffix or a nominal suffix. The verbal suffixes are -ataal, -ati^naal, -ata^naal, -apaTiyaal, -amaiyaal and -aamaiyaal. The only nominal suffix is -aal (Ex. 2). Inter-sentential discourse connectives like ata^naal, ita^naal, aa^napaTiyaal, aakaiyaal, aakaiyi^naal, aatalaal, aakavee and e^navee are lexical markers denoting cause-effect. Certain inter-sentential connectives like ee^ne^nil and ee^ne^nRaal behave differently from the others by explaining the reason for an event rather than the consequence of an event. Other non-connective lexical markers are kaaraNam, kaaraNamaaka and kaaraNattaal, and they occur in complex patterns.

(2) [avar maaraTaipp-aal]C [kaalamaa^naar]E
    he heartattack-Cause expired
    'He died of a heart attack.'

3. Computational Classification of Cause-Effect Relations

Based on the hierarchical classification shown in Figure 1, we have classified the cause-effect markers into five types for computational purposes. The features used in developing the language models were extracted based on this computational classification.

3.1. Type 1 Markers

Inter-sentential connectives of consequence are classified as Type 1 markers. They are lexical markers with arguments that span across sentences. Type 1 markers occur in sentence-initial position, and the cause always precedes the effect.

3.2. Type 2 Markers

Connectives that occur intra-sententially to conjoin two sentences are Type 2 markers. These markers are a subset of the Type 1 markers, but they occur sentence-medially. They are always markers of consequence.

3.3. Type 3 Markers

Type 3 markers are verbal suffixes indicating cause-effect relations. They are subordinating suffixes: the subordinate clause is the cause and the matrix clause is the effect. kaaraNattaal, one of the inflected forms of kaaraNam, also falls under this type because it follows a verb (e.g. vanta kaaraNattaal) and behaves like a verbal suffix of cause-effect.

3.4. Type 4 Markers

Type 4 markers are non-connective lexical markers. Most of these markers occur in several syntactic patterns, so this type is very challenging for machine learning. They have arguments that span across sentences. Inter-sentential connectives of reason have been combined with the Type 4 markers.

3.5. Type 5 Markers

Type 5 markers are nominal suffixes. The nominal phrase containing the marker is the cause. The arguments can be overlapping or non-overlapping: overlapping arguments have the cause contained within the effect, and when the arguments are non-overlapping, the cause can precede or follow the effect. The arguments of these markers never span across sentences.

The computational classification of the causal markers is shown in Table 1.

Type    Cause-Effect Markers
Type 1  ata^naal, ita^naal, aakaiyaal, aakaiyi^naal, aatalaal/aatali^naal, aa^napaTiyaal, e^navee
Type 2  aakaiyaal, aakaiyi^naal, aatalaal, aa^napaTiyaal
Type 3  -ataal, -ati^naal, -ata^naal, -apaTiyaal, -amaiyaal, -aamaiyaal
Type 4  kaaraNam and its inflected forms, ee^ne^nil, ee^ne^nRaal
Type 5  -aal

Table 1: Computational Classification of Causal Markers

4. Corpus Annotation

Unlike English (the Penn Discourse Tree Bank (Prasad et al., 2008)) and Hindi (the Hindi Discourse Relation Bank (Oza et al., 2009)), Tamil has no existing discourse treebank, so we have developed a suitable annotated corpus. We have chosen three different corpora. The first corpus is akal viLakku, a novel written by M. Varatharajan (1961). The second corpus is a historical novel, chivakaamiyi^n chapatam, authored by Kalki (1944). kuRinchi malar, a social novel written by N. Parthasarathi (1960), was chosen as the third corpus. The corpus statistics are shown in Table 2.
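The classification in Table 1 can be read off as a lookup. The fragment below is a sketch only: the transliterations follow the paper, but suffix types (3 and 5) are matched by a simple string-ending test, ignoring the morphological analysis and the -aal sense disambiguation the paper actually relies on.

```python
# Sketch: classify a Tamil causal marker into Types 1-5 (Table 1).
# Suffix matching is a simplification of the paper's morph-based pipeline.

TYPE1 = {"ata^naal", "ita^naal", "aakaiyaal", "aakaiyi^naal",
         "aatalaal", "aatali^naal", "aa^napaTiyaal", "e^navee"}
TYPE2 = {"aakaiyaal", "aakaiyi^naal", "aatalaal", "aa^napaTiyaal"}
TYPE3 = ("ataal", "ati^naal", "ata^naal", "apaTiyaal",
         "amaiyaal", "aamaiyaal")
TYPE4 = {"kaaraNam", "kaaraNamaaka", "kaaraNattaal",
         "ee^ne^nil", "ee^ne^nRaal"}

def marker_type(token, sentence_medial=False):
    if token in TYPE4:
        return 4
    if token in TYPE2 and sentence_medial:
        return 2   # Type 2 is the sentence-medial subset of Type 1
    if token in TYPE1:
        return 1
    if token.endswith(TYPE3):
        return 3   # verbal suffix of cause-effect
    if token.endswith("aal"):
        return 5   # nominal suffix -aal (causal sense assumed here)
    return 0       # not a causal marker
```

Note that a word like maaraTaippaal falls through to Type 5 only because the causal reading is assumed; distinguishing it from the instrumental -aal requires the ontology-based disambiguation described later.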

                           Corpus 1  Corpus 2  Corpus 3   Total
Total Sentences               9094     13635      9012    31741
Total Tokens                 72968    166821     94851   334640
Total Single CE relations      376       702       269     1347
Total Multi CE relations         8         5        12       25

Table 2: Corpus Statistics

5. Linear Order of Markers and Arguments

The linear order of the arguments and markers is not always the basic order Arg1-Marker-Arg2; it varies depending on the marker and its properties.

Type 1 markers occur sentence-initially, though there may be other connectives preceding them (Ex. 3 has the contrast marker aa^naal before the cause-effect marker). They have their Arg1 preceding Arg2 in the canonical form (Ex. 3). The cause-effect marker is always tagged as part of Arg2.

(3) [ava^nai varavaittee^n]Arg1. aa^naal [ata^naal ava^n keTTaa^n]Arg2.
    he come-Cause but because he spoil
    'I made him come. But he got spoilt because of it.'

In Type 2, the Arg1 usually immediately precedes the marker. If there is agglutination, the marker is annotated as part of Arg1 (Ex. 4); in all other cases, the marker is part of Arg2 (Ex. 5).

(4) [maturai pazamaiyaa^na uur-aakaiyaal]Arg1 [pirachittiyaa^natu]Arg2
    Madurai old city-Cause famous
    'Because Madurai is an old city, it is famous.'

(5) [maturai pazamaiyaa^na uur]Arg1. [aakaiyaal atu pirachittiyaa^natu]Arg2.
    Madurai old city Cause it famous
    'Madurai is an old city. So it is famous.'

Type 3 markers are subordinating suffixes, so the Arg1 immediately precedes the marker. The arguments of subordinators are necessarily adjacent and show the Arg1-Marker-Arg2 linear order (Ex. 6).

(6) [appaa cho^n^nataal]Arg1 [naa^n poo^nee^n]Arg2.
    father say-Cause I went
    'I went because father said.'

Type 4 markers are lexical markers, and therefore there is no fixed pattern to the arguments. kaaraNam has a dative/genitive NP and a nominative NP as arguments. The dative/genitive NP is usually the Arg2, and it precedes the marker. The Arg1 may precede Arg2 or follow the marker. So the linear order is either Arg1-Arg2-Marker (Ex. 7) or Arg2-Marker-Arg1 (Ex. 8), with the marker attaching to Arg2.

(7) [nii-taa^n]Arg1 [ataRkuk kaaraNam]Arg2.
    you-Emph that-Dat reason
    'You are the reason for that.'

(8) [ataRkuk kaaraNam]Arg2 [nii-taa^n]Arg1.
    that-Dat reason you-Emph
    'The reason for that is you.'

The Type 5 marker is a nominal suffix, and hence the marker is part of Arg1, the nominal phrase it attaches to. Arg2 may follow Arg1 (Ex. 9), or Arg2 may have discontinuous constituents on either side of Arg1 (Ex. 10).

(9) [mutukuk kaTTiy-aal]Arg1 [appaa varuntikkoNTiruntaar]Arg2.
    back boil-Cause father suffer-Pst-Prg
    'Because of boils on his back, father was suffering.'

(10) [appaa [mutukuk kaTTiy-aal]Arg1 varuntikkoNTiruntaar]Arg2.
     father back boil-Cause suffer-Pst-Prg
     'Father was suffering because of boils on his back.'

6. Identification of Causal Relations

6.1. Our Approach

CRFs are an undirected graphical model for labeling and segmenting structured data, such as sequences, trees and lattices, in which the conditional probability of the output is maximized for a given input sequence (Lafferty et al., 2001). CRF++ (Kudo, 2005), a simple, customizable, open-source implementation of linear-chain Conditional Random Fields for segmenting and labeling sequential data, was used for the experiments performed.

6.2. Preprocessing of Text

The corpus was preprocessed for sentence splitting, tokenizing, morph analysis, part-of-speech (POS) tagging, chunking, pruning and clause tagging. The tokenizer splits the raw text into sentences based on cues like the period (.) or the question mark (?); these sentences are further split into tokens. The Tamil Morphological Analyzer (Viswanathan et al., 2003) was used to find the root words. The POS Tagger for Tamil (Arulmozhi & Sobha, 2006) was used to tag the part of speech of each token, and the Chunker for Tamil (Sobha & Vijay Sundar Ram, 2006) was used for identifying phrases. Connectives, vocatives and complements are marked. For Type 4, the chunks and clauses ending in the clitics taa^n or ee, and those ending in dative or genitive case markers, need to be marked. For Type 5, the verb roots with paradigms are needed to eliminate instrumental markers and the cases where verbs of ability are used.
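Before CRF training, each sentence must be rendered as a labeled token sequence. The encoding below is a hedged sketch over Ex. (6): the paper does not give its exact tagset, so BIO-style Arg1/Arg2 labels and a CRF++-like column layout are assumed purely for illustration.

```python
# Sketch: encode Ex. (6) for sequence labeling. Arg1 is the cause
# (its last token carries the Type 3 suffix -ataal) and Arg2 the effect.
# The actual tagset and columns used by the authors may differ.

tokens = ["appaa", "cho^n^nataal", "naa^n", "poo^nee^n"]
labels = ["B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2"]

def to_crf_rows(tokens, labels, marker_suffix="ataal"):
    """Emit (token, is_marker_token, label) rows, CRF++-column style."""
    return [(t, int(t.endswith(marker_suffix)), l)
            for t, l in zip(tokens, labels)]

rows = to_crf_rows(tokens, labels)
```

Each row corresponds to one line of a CRF++ training file, with the label in the final column and the marker flag as a middle feature column.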

Ontology-based disambiguation of the Type 5 marker -aal is necessary to distinguish the causal sense from the instrumental sense in which the marker also occurs.

6.3. Type-Specific Models

From our analysis, it was noted that features that apply to certain cause-effect markers do not make any significant impact for other markers. So type-specific models were created, and the features used for machine learning were identified separately for each type of marker. Four sets of the training corpus were made; in each set, the instances of cause-effect relations of one type of marker were retained and the rest of the relations were ignored. Thus the overall problem was split into four sub-problems. More features were identified specific to each type, and these were marked for each type separately. The order in which the boundaries are tested depends on the type of the marker and is shown in Table 3. Training models were created for each boundary for each type; so, considering the 4 boundaries and 4 types, 16 models were created.

TYPE        Order of Boundary Identification of Arguments
Type 1 & 2  Arg2-START, Arg1-END, Arg1-START, Arg2-END
Type 3      Arg2-START, Arg1-END, Arg1-START, Arg2-END
Type 4      Arg1-START, Arg1-END, Arg2-END, Arg2-START
Type 5      Arg1-END, Arg1-START, Arg2-START, Arg2-END

Table 3: Order of Testing Boundaries

6.3.1. Feature Selection

The features identified for each of the types are listed in Table 4.

- For the Type 1 and Type 2 models, the Potential Arg1 feature is the previous clause and Potential Arg2 is the current clause.
- For the Type 3 model, Potential Arg1 was marked on the tokens of the current sentence occurring before the marker and on the marker itself; Potential Arg2 was marked on the tokens of the current sentence occurring after the marker.
- For the Type 4 model, Potential Arg1 was marked on the chunks and clauses ending in the case markers; Potential Arg2 was marked on the chunks and clauses ending in the clitics taa^n or ee.
- For Type 5 markers, Potential Arg1 is first marked on the noun phrase containing the marker; the rest of the tokens in the sentence, before and after that chunk, are marked as Potential Arg2.

ID  Feature                             Type 1&2  Type 3  Type 4  Type 5
A   Token                                  1        1       1       1
B   POS                                    1        1       1       1
C   Chunk                                  1        1       1       1
D   Word Suffix                            1        1       1       1
E   Marker                                 1        1       1       1
F   Marker Type                            1        1       1       1
G   Quotes                                 1        1       1       1
H   Connectives/Vocatives                  1        1       1       1
I   Complementizer/Relative Participle     1        1       1       1
J   Potential Arg2                         1        1       1       1
K   Potential Arg1                         1        1       1       1
L   Arg2-START                             1        1       0       1
M   Arg1-END                               1        1       1       1
N   Arg1-START                             1        1       1       1
O   Arg2-END                               0        0       1       0
P   Combination of B, K                    1        1       1       0
Q   Combination of B, I and J              0        1       1       0
R   Combination of E, K                    0        0       0       1
S   Combination of K, J                    0        0       0       1
T   Combination of I, J                    0        0       0       1

Table 4: Features Used for Each Type

6.3.2. Rule-based Post-Processing

After the machine tags the boundaries, if one or more tags are missing, certain linguistic and heuristic rules are used in the post-processing phase to obtain the complete causal relation. The rules identified for post-processing are given below.

6.3.2.1. Linguistic Rules
Some of the linguistic rules identified to improve the identification of the argument boundaries are presented below.
(1) If Arg1-END is marked at a certain index and the token at the next index is a clitic like taa^n or maTTum, then move the Arg1-END to the next token. If Arg2-START is at the next token, move it by one position. If Arg2-START is not at the next token, no action needs to be taken.
Lrule 1: If i = Arg1-START
         and i+x = Arg1-END
         and i+x+1 = Arg2-START
         and tokenat(i+x+1) = taa^n/maTTum
         then i+x = φ
              i+x+1 = Arg1-END
              i+x+2 = Arg2-START
(2) If Arg1-START starts with the token e^nRu preceded by ending quotes, trace to the beginning of the quotes, then move to the beginning of the sentence and move Arg1-START to this position. Apply a similar rule to Arg2-START as well.
Lrule 2: If i = Begin of Sentence
         and tokenat(i+w) = “
         and tokenat(i+x) = ”
         and tokenat(i+x+1) = e^nRu
         and i+x+1 = Arg1-START
         then i+x+1 = φ
              i = Arg1-START
(3) If Arg2 is not tagged at all and the marker is of Type 1, Type 2 or Type 3, then position the Arg2-START at the token next to Arg1-END and position Arg2-END at the end of the sentence containing Arg2-START.
Lrule 3: If i = Arg1-START
         and i+w = Arg1-END
         and Arg2-START = ?
         and Arg2-END = ?
         and i+w+x = End of Sentence
         then i+w+1 = Arg2-START
              i+w+x = Arg2-END
(4) If Arg1-END or Arg2-END ends in e^nRu or e^npatu, when there is no agglutination with the verb of the inner clause (like vantaa^ne^nRu), move the tag to the previous token.
Lrule 4: If tokenat(i+w+1) = e^nRu
         and i+w+1 = Arg1-END
         then i+w+1 = φ
              i+w = Arg1-END
(5) If there are several Arg1-STARTs for a particular Arg1-END, remove all those Arg1-STARTs which are not at the same level of quotes as Arg1-END. This rule can be applied to the other three combinations: multiple Arg1-ENDs for a single Arg1-START, multiple Arg2-STARTs for a single Arg2-END, and multiple Arg2-ENDs for a single Arg2-START.

6.3.2.2. Heuristic Rules
A few heuristic rules were added to the above set of linguistic rules to improve the performance. These rules boosted the identification of the START of an argument if the END was already identified, and vice versa.
(1) If Arg1-START is tagged and Arg1-END is missing, find the end of the sentence containing Arg1-START and mark it as Arg1-END. This end of the sentence is the token preceding the quotes if Arg1-START is within quotes. Apply a similar rule for Arg2.
(2) If Arg1-END is tagged and Arg1-START is missing, go up to the beginning of the sentence containing Arg1-END and mark it as Arg1-START. This start of the sentence is the token after the quotes if Arg1-END is within quotes. Apply a similar rule for Arg2.
(3) If Arg1-END and Arg2-START coincide, move Arg2-START to the next token. Similarly, if Arg2-END and Arg1-START coincide, move Arg1-START to the next token.

7. Experiments Performed

A set of experiments was performed where an improved set of features was used to train the model, and type-specific, sliding-window testing was done.

7.1. Sliding Window Testing

To overcome the problems associated with the interdependencies of cause-effect relations, a sliding-window testing approach was adopted. In this approach, the whole test corpus is not tested at the same time. Instead, an outer system was developed which looks for a marker, classifies the marker into one of the types, takes a window of tokens for each of these markers, and then performs testing on it. In essence, at any point of testing, only one marker and its context are taken into account by the machine learning system. The features marked based on the other markers in the vicinity, or even the argument boundaries marked for other markers, do not interfere when the testing for a particular marker is done.

7.2. Individual Corpus Based Results

Type-specific sliding-window testing followed by post-processing was done on models trained with an improved set of type-specific features. These experiments were conducted on the three corpora. The results for Corpus 1, Corpus 2 and Corpus 3 are shown in Table 5, where MACH O/P denotes the results for the machine-tagged output and PP O/P denotes the final post-processed output. For all three corpora, the post-processing using linguistic and heuristic rules has given a very good increase in recall. Arg2-END gives the lowest results in all cases. It can be noted that post-processing has indeed increased the accuracy of identification of the individual boundaries.
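The heuristic post-processing rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: boundary tags are assumed to be stored as a token-index map, and the sentence-boundary lookups (`sent_end`, `sent_start`), which in the paper also respect quote boundaries, are passed in as callbacks.

```python
def apply_heuristic_rules(tags, sent_end, sent_start):
    """Fill in missing argument boundaries (heuristic rules 1-3).

    tags: dict mapping tag names like "Arg1-END" to token indices.
    sent_end(i) / sent_start(i): index of the end / start of the
    sentence containing token i (quote handling left to the caller).
    """
    fixed = dict(tags)
    for arg in ("Arg1", "Arg2"):
        start = fixed.get(arg + "-START")
        end = fixed.get(arg + "-END")
        if start is not None and end is None:
            # Rule (1): missing END -> end of the sentence containing START
            fixed[arg + "-END"] = sent_end(start)
        elif end is not None and start is None:
            # Rule (2): missing START -> beginning of the sentence containing END
            fixed[arg + "-START"] = sent_start(end)
    # Rule (3): if Arg1-END and Arg2-START coincide, shift Arg2-START right
    if fixed.get("Arg1-END") is not None and fixed.get("Arg1-END") == fixed.get("Arg2-START"):
        fixed["Arg2-START"] += 1
    return fixed
```

Because the rules only fire when exactly one of the START/END pair is present, applying them never overwrites a boundary the machine learner has already tagged.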

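The per-marker sliding-window testing described in Section 7.1 can be sketched as below. This is a simplified outline under stated assumptions: the marker lexicon, the window size, and the boundary classifier (in the paper, the CRF models) are placeholders of our own, not the paper's actual components.

```python
def sliding_window_test(tokens, marker_types, classify_boundaries, window=10):
    """Test one causal marker at a time.

    For each token that is a known marker: classify it into one of the
    4 types, cut a window of tokens around it, and tag the argument
    boundaries inside that window only.  Other markers' features and
    previously tagged boundaries therefore cannot interfere.
    """
    relations = []
    for i, tok in enumerate(tokens):
        mtype = marker_types.get(tok)  # None if this token is not a marker
        if mtype is None:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        tags = classify_boundaries(tokens[lo:hi], marker_index=i - lo, marker_type=mtype)
        # Map window-relative boundary indices back to corpus positions.
        relations.append((i, mtype, {k: v + lo for k, v in tags.items()}))
    return relations
```

The outer loop is the "outer system" of Section 7.1; the classifier only ever sees one marker's context per call.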
                Arg1-START          Arg1-END            Arg2-START          Arg2-END
CORPUS          MACH O/P  PP O/P    MACH O/P  PP O/P    MACH O/P  PP O/P    MACH O/P  PP O/P
CORPUS 1  PRE   79.36     79.74     91.47     92.27     87.50     89.39     70.89     79.52
          REC   75.51     78.32     87.50     88.27     82.03     87.76     68.49     77.86
CORPUS 2  PRE   80.52     82.55     94.35     94.37     85.79     86.96     79.72     80.00
          REC   79.49     82.91     92.74     93.16     73.80     87.34     75.55     80.35
CORPUS 3  PRE   81.88     84.41     93.06     93.73     82.86     85.66     78.52     79.00
          REC   80.20     84.98     91.47     91.81     72.24     85.05     75.44     79.00

Table 5: Results for Individual Corpus

TYPE        Arg1-START       Arg1-END         Arg2-START       Arg2-END
            PRE     REC      PRE     REC      PRE     REC      PRE     REC
Type 1 & 2  87.50   84.85    95.31   92.42    98.44   95.45    92.19   89.39
Type 3      58.76   91.94    72.62   98.39    69.41   95.16    61.18   83.87
Type 4      66.67   57.14    58.33   50.00    61.54   57.14    61.54   57.14
Type 5      60.50   78.26    72.95   96.74    58.82   80.46    54.62   74.71

Table 6: Results for Type Specific Models

7.3. Performance-Distribution Results

To understand the contribution of each type of marker to the overall system's performance, experiments were done to test markers of only a particular type. This was also done using the same sliding-window testing algorithm; the only difference is that for a particular run, only one type was tested. The test corpus of Corpus 2 was subjected to performance-distribution testing one type at a time. The results are given in Table 6.

It was observed that Types 1, 2 and 3 gave the best results, due to the fairly consistent patterns in which they occur and the total number of instances of such markers. Due to the high number of markers of these types, the training is quite effective. As against these types, Type 4 occurs in several different patterns and the number of instances is also very small. This leads to sparse data, which affects the performance. Nevertheless, due to the very low number of occurrences, the contribution of Type 4 markers to the overall performance is also small. Another noteworthy point is that in all types, the performance reduces according to the order of identification of the boundaries and the dependency of a boundary on a previously identified boundary. This is due to the propagation of errors from a previously identified boundary to the next boundary.

The case of Type 5 markers is different from the above cases in that the number of occurrences is higher, but the performance is not comparable to Types 1, 2 and 3. This is, firstly, owing to its heavy dependence on preprocessing: since the cause is a nominal chunk, it depends on the efficiency of chunk identification. In addition, embedding is very common in this type. Detection of embedding depends on chunk identification; if this detection goes wrong, the starting point of both the arguments is affected. It was found that the performance of both the START tags is comparable in this type. The third factor which affects the performance of this type is the disambiguation of polysemous markers.

7.4. Results of 10-fold Cross Validation

To statistically validate the performance, 10-fold cross-validation was performed on all three corpora put together. All three corpora were concatenated and divided into 10 approximately equal parts in terms of the number of sentences. Sometimes this had to be compromised because of quoted text and other factors, and the cause-effect relations were not equally distributed among these parts. 10-fold cross-validation was performed, the fold results were added up, and the average precision and recall for complete single cause-effect relations were computed. The fold-wise results are given in Table 7. It can be noted that the system performs with an overall precision of 73.89% and recall of 72.09%.

S.NO.   PRECISION   RECALL
1       81.38       73.75
2       81.13       78.18
3       82.35       79.03
4       63.35       66.23
5       68.67       70.55
6       65.53       68.18
7       74.69       73.78
8       81.69       76.32
9       77.01       69.07
10      76.85       70.33
Total   73.89       72.09

Table 7: 10-fold Cross-Validation Results

8. Conclusion

In this work, we classified the causal markers based on the computational classification. Machine learning features for each type were identified, and language models were developed specifically for each type. We have explained how the order of identification of the boundaries varies according to the type. A sliding-window method of testing was employed. Linguistic and heuristic post-processing rules yield good precision and recall. Performance-distribution testing for each type was carried out, and ten-fold cross-validation was performed to validate the system.


Issues in the Creation of Synsets in Odia: A Report

Panchanan Mohanty, Ramesh C. Malik & Bhimasena Bhol Centre for Applied Linguistics & Translation Studies, University of Hyderabad, India Email: [email protected]

Abstract

Since languages differ from each other, it is difficult to find equivalents for the words and expressions of one language in another. So creating an interlingual WordNet in Odia vis-à-vis Hindi has been a challenging task. While dealing with the equivalence problems in Odia, creation of new expressions dominates in the synsets involving various kinds of wage, nouns derived from nouns and adjectives, adjectives derived from nouns, single-word Hindi synsets expressing complex ideas, and kinship synsets. The other important procedure is borrowing, which has been used widely in the domains of historical events, geographical locations, socio-cultural practices, place names, personal names, flora and fauna, ecological entities, gods and goddesses, culture-specific items, etc. Apart from these, certain problematic issues of the Hindi WordNet, viz. wrong categorization of synsets, concepts with inadequate information and description, mismatch between concepts and synsets, and imprecise concepts, have also been discussed with a view to sensitizing other Indian language WordNet developers to these deficiencies. At the same time, we expect the quality of the Hindi WordNet to improve if these problems are taken care of.

Keywords: WordNet, Synset, Odia, New Expressions, Borrowing, Kinship Term, Conjunctive Participle

1. Introduction

“WordNet is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory” and also “it is a proposal for a more effective combination of traditional lexicographic information and modern high-speed computation” (Miller et al., 1993: 1). The main objective of WordNet is to describe the lexical and componential features of words and their use by the concerned linguistic community. WordNet contains linguistic data, such as syntactic, semantic and pragmatic features of words, which are important factors for designing a WordNet in a language. In order to create a WordNet for Odia on the basis of the Hindi WordNet, the major problems faced were those of equivalence. While looking for Odia equivalents of Hindi words, the problems were of various kinds. We would like to highlight some of these issues in this paper.

All the synsets of the Hindi WordNet have been divided into four lexical categories: nouns, adjectives, verbs, and adverbs. These cover different domains like historical events, geographical locations, socio-cultural practices, place names, personal names, flora and fauna, ecological entities, gods and goddesses, culture-specific items, etc. In fact, creating equivalent synsets in Odia has been quite a challenging task. But one thing which we have religiously left untouched is the category of the synsets. In other words, the categories of the Hindi synsets have always been retained as such in Odia. So this discussion will focus on the problems of synset creation in Odia and their solutions.

2. Problems and Procedures in Odia Synset Creation

Translation is the common method used for creating Odia synsets. While translating, the major problems were both linguistic and cultural. Since “no two languages are ever sufficiently similar to be considered as representing the same social reality” (Sapir 1959: 69), translating from one language to another definitely poses problems, as it is very hard to find one-to-one correspondence between two languages. Jakobson (1971: 235) states, “Languages differ essentially in what they must convey and not in what they can convey. Each verb of a given language imperatively raises a set of specific yes/no questions, as for instance: Is the narrated event presented prior to the speech event or not? Naturally the attention of native speakers and listeners will be constantly focused on such items as they are compulsory in their verbal code”. That is why translators often try to find out proper equivalents in the target language, and ultimately this helps to solve many translation problems. According to Nida (1964: 91), “Translation-problems, which are essentially problems of equivalence, may be conveniently treated under (1) ecology, (2) material culture, (3) social culture, (4) religious culture, and (5) linguistic culture”. In order to create synsets in Odia on the basis of the Hindi synsets, similar problems have been encountered and efforts have been made to bridge the gap between the two languages. Let us now discuss these problems and the ways in which we have tried to solve them.

3. New Expressions

Since languages and cultures differ from one another, it is difficult to find lexical equivalents of one language in another. That is why creation of new expressions becomes an important procedure to represent the source language (SL) concept in the target language (TL). Following this, new expressions have been created by combining the existing Odia words in order to express the meanings of the Hindi synsets which do not have equivalents in Odia.

Hindi uses single words to represent wages of various kinds, but these are not commonly used in Odia. So while providing equivalents for these, expressions involving more words have been used in Odia. Therefore, this procedure has been very useful especially in creating synsets for different kinds of wage. For example:

SL : Hindi
ID : 20708
CAT : NOUN
CONCEPT : छड़कने क मज़दूर
EXAMPLE : "उसने त एकड़ दवा छड़काई पचास पए लए"
SYNSET-HINDI : छड़काई
TL : Odia
ID : 20708
CAT : NOUN
CONCEPT : କିଛି ଛିିବାପାଇଁ ଦିଆଯାଉବା ମଜୁରି
EXAMPLE : "େସ ଏକର ତି ଔଷଧ ଛିିବା ମଜୁରି ପଶ ଟା େନ"
SYNSET-ODIA : ଛିିବା ମଜୁରି, ଛିିବା ମୂଲ, ଛିା ମଜୁରି, ଛିା ମୂଲ

In the above example¹, the SL word chiRka:i: (छड़काई) refers to chiRka:ne ki mazdu:ri: (छड़काने क मज़दूर) ‘the wage paid for sprinkling’. In Odia, there is no single word to express this concept. Therefore, it has been rendered as chinciba: majuri (ଛିିବା ମଜୁରି) or chinca: majuri (ଛିା ମଜୁରି) ‘sprinkling wage’ and chinciba: mu:la (ଛିିବା ମୂଲ) or chinca: mu:la (ଛିା ମୂଲ) ‘sprinkling wage’. The words majuri (ମଜୁରି) and mu:la (ମୂଲ) refer to ‘wage’, and chinciba: (ଛିିବା) and chinca: (ଛିା) mean ‘sprinkling’. When used together, they express the same meaning as chiRka:i: (छड़काई) in Hindi.

There are cultural and occupational differences between the Hindi and Odia speech communities, for which it is difficult to find equivalents for such Hindi words in Odia. The above terms used in Hindi for paying wages for different types of jobs are not found in Odia, and that is why this difficulty.

¹ In this paper, [T, Th, D, Dh, R, Rh, L] have been used for the voiceless unaspirated retroflex stop, voiceless aspirated retroflex stop, voiced unaspirated retroflex stop, voiced aspirated retroflex stop, voiced unaspirated retroflex flap, voiced aspirated retroflex flap, and retroflex lateral respectively.

3.1. Derivation of Nouns from Nouns and Adjectives

We find a number of Hindi nouns that have been derived by adding prefixes or suffixes. But the same is not possible in Odia, even though both these languages belong to the Indo-Aryan family. When we follow the Hindi way, the outcomes do not give appropriate meanings. So to create such equivalents in Odia, the Odia morphological practices have been followed. For example:

SL : Hindi
ID : 22283
CAT : NOUN
CONCEPT : उपशमन न होने या न दबाने क या
EXAMPLE : "इंय क दासता अतया के ह परणाम ह"
SYNSET-HINDI : अतया
TL : Odia
ID : 22283
CAT : NOUN
CONCEPT : ଶା ନ େହବା ବା ଶମିତ ନ େହବା ିୟା
EXAMPLE : "ଇିୟଗୁଡ଼ିକର ଦାସ େହ ତିିୟାହୀନତାର ପରିଣାମ"
SYNSET-ODIA : ତିିୟାହୀନତା

This example shows how nouns in Hindi have been created by adding prefixes to other nouns. Here, the Hindi word apratikriya: (अतया) as a noun refers to upashaman na hone ya: na daba:ne ki: kriya: (उपशमन न होने या न दबाने क या), which means ‘the state of remaining unabated’. It is a noun which has been formed by adding the negative prefix a- (अ-) to the noun pratikriya: (तया) ‘reaction’, whereas there is no word apratikriya: in Odia. In order to create an equivalent in Odia, the negative suffix -hi:nata: (-ହୀନତା) ‘absence of something’ has been added to pratikriya: (ତିିୟା). The product pratikriya:hi:nata: (ତିିୟାହୀନତା) is an equivalent of the Hindi apratikriya: (ଅତିିୟା).

We find that the above words are not used in Odia the way they are used in Hindi. Though the above Hindi nouns have been created by adding a prefix, only a suffix has been added in Odia.

Hindi adjectives are usually formed by adding prefixes or suffixes to nouns, whereas in Odia, if a noun does not have an adjective, the noun itself can also be used as an adjective. The word (ID No. 7323) cama:ri: (चमार) refers to cama:r ka: ya: cama:r se sambandhit (चमार का या चमार से संबंधत) in Hindi. As there is no adjective of cama:r (चमार) in Odia, the same cama: (ଚମାର) has been used as an adjective.

3.2. On Adjectives Derived from Nouns

There are some adjectives in Hindi that have been created by adding certain suffixes to nouns. Since such suffixes are not usually used in Odia, we have created Odia equivalents by adding other suffixes to the nouns. For example:

SL : Hindi
ID : 17292
CAT : ADJECTIVE
CONCEPT : वयालय का या वयालय से संबंधत
EXAMPLE : "वयालयी वातावरण को गुकुल के वातावरण के समक बनाने का यन कया जाना चाहए"
SYNSET-HINDI : वयालयी, वयालयीन, वयालयीय, कूल, पाठशालेय
TL : Odia
ID : 17292
CAT : ADJECTIVE
CONCEPT : ବିଦାଳୟ ବା ବିଦାଳୟ ସହ ସିତ
EXAMPLE : "ବିଦାଳୟର ବାତାବରଣକୁ ଗୁରୁକୁଳର ବାତାବରଣ ସହ ସମାନ କରିବାଗି ଆେମ େଚା କରିବା ଉଚିତ"
SYNSET-ODIA : ବିଦାଳୟର, ଲର, ବିଦାଳୟ ସୀୟ, ଲ ସୀୟ

In the above example, the Hindi synonyms vidya:layi: (वयालयी), vidya:layi:n (वयालयीन), vidya:layi:y (वयालयीय), sku:li: (कूल), and pa:Thsha:ley (पाठशालेय) mean ‘of a school or related to a school’. These adjectives have been created by adding the suffixes -i: (-ई), -i:n (-ईन) and -i:y (-ईय) to the noun vidya:lay (वयालय), -i: (-ई) to the noun sku:l (कूल), and -ey (-एय) to the noun pa:Thsha:la: (पाठशाला) ‘school’. As these adjectives are not used in Odia, the possessive case suffix -ra (-ର) and the word sambandhi:ya (ସୀୟ) ‘related’ have been added to bidya:Laya (ବିଦାଳୟ) and skul (ଲ), leading to the creation of the adjectives bidya:Layara (ବିଦାଳୟର) and skulra (ଲର), which are equivalents of the Hindi vidya:layi: (वयालयी), vidya:layi:n (वयालयीन), vidya:layi:y (वयालयीय), sku:li: (कूल), and pa:Thsha:ley (पाठशालेय). It is clear from the above example that the formation of adjectives from nouns by adding suffixes in Hindi and Odia is not the same, though both of them belong to the Indo-Aryan language family.

3.3. Single-Word Synsets Expressing Complex Ideas

There are certain Hindi synsets which consist of single words but express relatively complex ideas. Such synsets cannot be expressed in single words in Odia; more words have to be used together to express the appropriate meanings. For example:

SL : Hindi
ID : 9562
CAT : NOUN
CONCEPT : सफेद शरर और काल चच वाला कबूतर
EXAMPLE : "छत पर कलटोरे का एक जोड़ा बैठा हुआ है"
SYNSET-HINDI : कलटोरा
TL : Odia
ID : 9562
CAT : NOUN
CONCEPT : ଧଳା ଶରୀର ଓ କଳା ଥବା ପା
EXAMPLE : "ଛାତ ଉପେର େଯାେଡ଼ କଳା ଥବା ଧଳାପା ବସିଛି"
SYNSET-ODIA : କଳା ଥବା ଧଳାପା, କଳା ଚୁବା ଧଳାପା, କଳାଥିଆ ଧଳାପା

In the above example, the Hindi word kalTora: (कलटोरा) refers to ‘a pigeon whose body is white and beak is black’. This is an example of how a single word in Hindi expresses a complex concept. There is no word in Odia which can convey the same meaning. So while creating the expression in Odia, it has been rendered as kaLa: thaNTathiba: dhaLa: pa:ra: (କଳା ଥବା ଧଳାପା), kaLa: cancuthiba: dhaLa: pa:ra: (କଳା ଚୁବା ଧଳାପା), and kaLa: thaNTia: dhaLa: pa:ra: (କଳାଥିଆ ଧଳାପା), where kaLa: (କଳା) means ‘black’, thaNTathiba: (ଥବା), cancuthiba: (ଚୁବା) and thaNTia: (ଥିଆ) mean ‘having a beak’, dhaLa: (ଧଳା) means ‘white’, and pa:ra: (ପା) means ‘pigeon’.

4. Kinship Relations

Kinship terms denote some definite relations and are culture-specific. So these are very difficult to translate from one language to another. For example, in ID No. 6992, the SL kinship adjective phuphera: (फुफेरा) means ‘related to father’s sister’s husband’. Phuphi: in Hindi refers to the kin piusi: (ପିଉସୀ) ‘father’s sister’ in Odia. So the word phuphera: bha:i: (फुफेरा भाई) has been translated as piusi: pua bha:i (ପିଉସୀ ପୁଅ ଭାଇ) and pi:si pua bhai (ପିଇସୀ ପୁଅ ଭାଇ) ‘father’s sister’s son’.


Similarly, in ID No. 7011, the kinship adjective mamera: (ममेरा) is ‘related to mother’s brother’. While translating mamera: bha:i: (ममेरा भाई), i.e. ‘mother’s brother’s son’, to Odia, it has been rendered as ma:muM pua bha:i (ମାମଁୁ ପୁଅ ଭାଇ) or mauLa: pua bha:i (ମଉଳା ପୁଅ ଭାଇ) ‘a cousin who is the son of mother’s brother’. In ID No. 9876, the kinship adjective mausera (मौसेरा) means ‘related to mother’s sister’s husband’, which has been used to denote the relationship with ‘mother’s sister’s husband’. So mausere sasur (मौसेरे ससुर) has been translated as mausa: sasura (ମଉସା ଶଶୁର) ‘wife/husband’s mother’s sister’s husband’, as this is widely used in Odia.

5. Borrowing of Synsets

A number of concepts in the Hindi WordNet are language-specific, e.g. names of persons, places, geographical locations, food items, historical places and events, culture-specific items, occupational instruments, different kinds of dress and dress materials, domestic articles, gods and goddesses, rituals, festivals, and names of religion-related items, which do not have equivalents in Odia. So they have been borrowed into Odia, sometimes directly and sometimes with phonological adjustments. In fact, whenever we could not find an equivalent in Odia for a Hindi item in spite of all our efforts, we resorted to borrowing. Consider the following examples:

Example 1
SL : Hindi
ID : 20073
CAT : NOUN
CONCEPT : एक पेड़ से ात गहरे लाल रंग का गोल फल
EXAMPLE : "शीला आलूबुखारा खा रह है"
SYNSET-HINDI : आलूबुखारा
TL : Odia
ID : 20073
CAT : NOUN
CONCEPT : େଗାଟିଏ ଗଛରୁ ା ଗାଢ଼ ନାଲି ରର େଗାଲ ଫଳ
EXAMPLE : "ଶୀ ଆଳୁବଖ ଖାଉଛି"
SYNSET-ODIA : ଆଳୁବଖ, ଆଳୁବୁଖା, ଆଳୁେବାଖା

In this example, the SL word a:lu:bukha:ra: (आलूबुखारा) does not have an equivalent in Odia. Actually, a:Lubakhara: (ଆଳୁବଖ) is ‘a kind of plum that has an acid taste’. It is not grown in Odisha, so there is no equivalent for it in Odia. In this situation, the Hindi word has been borrowed and naturalized as a:Lubakhara: (ଆଳୁବଖ), a:Lubukha:ra: (ଆଳୁବୁଖା), and a:Lubokha:ra: (ଆଳୁେବାଖା) in Odia.

A number of synsets in Hindi refer to a variety of animals, birds, trees, and some mythological characters. Such words also do not have equivalents in Odia. Take an example:

Example 2
SL : Hindi
ID : 26503
CAT : NOUN
CONCEPT : इंडोनेशया म पाई जानेवाल एक कार क भस
EXAMPLE : "सूबा अछा दूध देती है"
SYNSET-HINDI : सूबा, सूबा भस
TL : Odia
ID : 26503
CAT : NOUN
CONCEPT : ଇୋେନସିଆେର ମିଳୁବା ଏକ କାର ମଇଁଷି
EXAMPLE : "ସୁବାମଇଁଷି ଭଲ ୀର ଦିଏ"
SYNSET-ODIA : ସୁବାମଇଁଷି

In this example, the SL word su:ba: (सूबा) refers to ‘a type of buffalo found in Indonesia’ (इंडोनेशया म पाई जानेवाल एक कार क भस). This buffalo is not found in India. Though su:ba: alone is also given as a synonym in Hindi, it is unlikely to be understood by Odia speakers. So maiMsi (ମଇଁଷି) ‘buffalo’ has been added to it, so that users can decode suba:maiMsi (ସୁବାମଇଁଷି) easily.

6. Wrong Categorisation of Synsets

All synsets in the Hindi WordNet have been categorized according to their lexical categories, but some of them are found to be in the wrong grammatical category. We have not changed the categories of these incorrectly categorized synsets. For example:

SL : Hindi
ID : 1105
CAT : ADJECTIVE
CONCEPT : जो करनेवाला हो
EXAMPLE : "भगवान ह सब काम के कता ह, हम तो नमत मा ह"
SYNSET-HINDI : कता, कता, कतार, करनहार, करणहार
TL : Odia
ID : 1105
CAT : ADJECTIVE
CONCEPT : କରିବାବା
EXAMPLE : "ଭଗବାନ ହ ସବୁ କାମର କା, ଆେମ ତ ନିମି ମା"
SYNSET-ODIA : କା, ଭୁ, ମାଲିକ

In this example, the Hindi word kartta: (कता) has been categorized as an adjective. But it refers to a ‘doer’, or ‘a person who does something’, and thus it is a noun. The example sentence also confirms that the word has been used as a noun, because after sab ka:moM ke (सब काम के) only a noun can occur in Hindi. However, the Odia synset retains kartta: (କା), and new synonyms like prabhu (ଭୁ) and ma:lika (ମାଲିକ) are added to it.

7. Concepts with Inadequate Description

Generally, the concepts are expected to give a detailed description of the synsets. But some Hindi synsets do not provide sufficient information in the concepts, so it has been very difficult to deal with those while creating equivalents in Odia. Let us take an example:

SL : Hindi
ID : 23390
CAT : NOUN
CONCEPT : एक कार का फूल
EXAMPLE : "गुलदते म रंग बरंगी ललयाँ सजी ह"
SYNSET-HINDI : लल
TL : Odia
ID : 23390
CAT : NOUN
CONCEPT : ଏକ କାର ଫୁଲ
EXAMPLE : "ଫୁଲେତାଡ଼ାଟି ରେବର ଲିଲିେର ସିତ େହାଇଛି"
SYNSET-ODIA : ଲିଲି

In the above example, the SL concept ek praka:r ka: phu:l (एक कार का फूल) refers to ‘a kind of flower’. There are many kinds of flowers found in the world, and the concept is not a proper description of the flower named lili (लल). It would have been much better if the concept had described its colour, shape and size, as well as the kind of plant that bears it. However, the concept in Odia has been given as eka praka:ra phula (ଏକ କାର ଫୁଲ), like the SL concept.

8. Mismatch between Concepts and Synsets

There are some IDs in Hindi which convey ambiguous meanings, where the relation among the concept, the example, and the synset is mismatched. For example:

SL : Hindi
ID : 34533
CAT : NOUN
CONCEPT : लेकर जाना
EXAMPLE : "इतना सामान उठाकर ले जाना मुझे भार पड़ रहा है" / "आप इह भी अपने साथ ले जाएँ"
SYNSET-HINDI : ले जाना
TL : Odia
ID : 34533
CAT : NOUN
CONCEPT : େନଇକରି ଯିବା
EXAMPLE : "ଏେତଗୁଡ଼ିଏ ଜିନିଷ େନଇକରି ଯିବା େମାପାଇଁ କକର"
SYNSET-ODIA : େନଇକରି ଯିବା, େନଇକି ଯିବା

In this example, the synset le ja:na: (ले जाना) refers to the concept lekar ja:na: (लेकर जाना) in Hindi. In fact, the concept lekar ja:na: (लेकर जाना) does not at all mean le ja:na: (ले जाना). For example, ho gaya: (हो गया), kha: gaya: (खा गया), and even le gaya: (ले गया) from a popular Hindi song, le gayi: dil guRiya: ja:pa:n ki: (ले गयी दल गुड़या जापान क), do not express the meaning of ho kar gaya: (हो कर गया), kha: kar gaya: (खा कर गया), and le kar gayi: dil guRiya: ja:pa:n ki: (ले कर गयी दल गुड़या जापान क). It is because le ja:na: (ले जाना), ho ja:na: (हो जाना), and kha: ja:na: (खा जाना) are compound verbs, whereas le kar ja:na: (लेकर जाना), ho kar ja:na: (हो कर जाना), and kha: kar ja:na: (खा कर जाना) are conjunctive participles. So the above-mentioned concept and the synset do not match each other. Considering these, the Odia synset has been made neikari jiba: (େନଇକରି ଯିବା) and nei ki jiba: (େନଇକି ଯିବା).

The other important point to be mentioned here is that the second example sentence makes it evident that le ja:eM (ले जाएँ) is a verb, though the category of the synset is stated to be a noun.

9. Imprecise Concepts

There are certain concepts which do not convey a clear and precise meaning. It was difficult to understand such concepts, and as a result, creating equivalents in Odia was difficult. For example:

SL : Hindi
ID : 195
CAT : ADJECTIVE
CONCEPT : सामनेवाले का
EXAMPLE : "तुहारा घर भी बहु त यारा है"
SYNSET-HINDI : तुहारा, तेरा
TL : Odia
ID : 195
CAT : ADJECTIVE
CONCEPT : ସାମନା େକର
EXAMPLE : "ତୁମ ଘର ବହୁତ ସୁର"
SYNSET-ODIA : ତୁମ, ତମ, ତୁମର, ତମର, େତା, େତାର, ଆପଣ, ଆପଣର


In the above example, the concept is sa:mneva:le ka: (सामनेवाले का) 'of the person present in the front', whereas the synset has the pronouns tumha:ra: (तुम्हारा) 'your' (semi-honorific) and tera: (तेरा) 'your' (non-honorific). The concept does not match with the synset because it may also refer to a person present in the front for whom a third person pronoun can be used, e.g., ta:ra kalama achi (ତାର କଲମ ଅଛି) 'he/she has a pen'. Notice that the third person possessive pronominal form ta:ra (ତାର) is used here. Again, the pronouns a:paNanka (ଆପଣଙ୍କ) and a:paNankara (ଆପଣଙ୍କର) 'your' (honorific) have the same possessive function. Considering all these, it would have been better if the concept had been sa:mna:re upasthita o sambodhita heuthiba: lokara (ସାମ୍ନାରେ ଉପସ୍ଥିତ ଓ ସମ୍ବୋଧିତ ହେଉଥିବା ଲୋକର) 'of the person present in the front and being addressed'. In Hindi, it will be sa:mne upasthit aur sambodhit honeva:le vyakti ka: (सामने उपस्थित और संबोधित होनेवाले व्यक्ति का). This concept will express the intended meaning more faithfully than the existing one.

10. Conclusion
The present paper has dealt with the problems encountered while creating synsets in Odia based on the Hindi WordNet. In order to create Odia synsets, the procedures of new expressions and borrowing have been used widely. The former has been used mostly to resolve the problems of synsets dealing with different kinds of wage and kinship relations. The other important issues discussed in this paper are wrong categorization of synsets, concepts with inadequate description, mismatch between concepts and synsets, and imprecise concepts in Hindi, with suggestions as to how to improve these.

Acknowledgement The authors are thankful to Dr. Smita Mohanty, Dr. H.K. Patra, Ms. Gouri Sahoo and Dr. R.R. Mohapatra for their help during the preparation of various drafts of this paper.

References
Jakobson, Roman. (1971). On Linguistic Aspects of Translation. Selected Writings, Vol. 2. The Hague: Mouton, pp. 232-239.
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. (1990). Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3, No. 4, pp. 235-244.
Nida, E.A. (1964). Linguistics and Ethnology in Translation Problems. In Dell Hymes (ed.), Language in Culture and Society. Bombay: Allied Publishers Private Ltd. (Indian reprint), pp. 90-100.
Sapir, Edward. (1956). Culture, Language and Personality: Selected Essays. Ed. David G. Mandelbaum. Berkeley: University of California Press.

Large Web Corpora of High Quality for Indian Languages

Uwe Quasthoff¹, Ritwik Mitra², Sunny Mitra², Thomas Eckart¹, Dirk Goldhahn¹, Pawan Goyal², Animesh Mukherjee²
¹ Natural Language Processing Group, University of Leipzig, Germany
² Indian Institute of Technology Kharagpur, Kharagpur, India
Email: ¹ {quasthoff, teckart, dgoldhahn}@informatik.uni-leipzig.de ² {ritwik.mitra, sunny.mitra, pawang, animeshm}@cse.iitkgp.ernet.in

Abstract
Large textual resources are the basis for a variety of applications in the field of corpus linguistics. For most languages spoken by large user groups, a comprehensive set of these corpora is constantly generated and exploited. Unfortunately, for modern Indian languages there are still shortcomings that interfere with systematic text analysis. This paper describes the Indian part of the Leipzig Corpora Collection, which is a provider of freely available resources for more than 200 languages. This project focuses on providing modern text corpora and wordlists via web-based interfaces for the academic community. As an example of the exploitation of these resources, it will be shown that they can be used for the visualization of semantic contexts of terms and for language comparison.

Keywords: corpus generation, text acquisition, comparative corpus analysis

1. Availability of Indian Text Resources
Many applications in the field of corpus linguistics require the availability of large corpora. Well-known examples for corpora based on Indian languages are the web corpora built by Kilgariff and Duvuru (2011) or the large Sanskrit corpus by GRETIL that is provided for free. The Technology Development for Indian Languages (TDIL) Programme also provides some corpora for Indian languages for download, and further resources are available from the Central Institute of Indian Languages (CIIL).
Unfortunately, many of the existing corpora or resources lack features that are strongly desirable for their use in the scientific context. These shortcomings include problems with availability (in some cases the use of very specific interfaces is required), lack of currentness (a problem especially when dealing with ongoing political developments), high costs, or strict licences that prevent reuse and data aggregation. As some of these problems can't be eliminated in general (like in the context of copyright and personality rights), it would be desirable to have more resources that can be used with as few restrictions as possible and that can be useful for further progress in the exploitation of Indian language corpora and other text-based resources.

2. Indian Resources
The Leipzig Corpora Collection (LCC)¹ has been collecting digital text material for more than 20 years. Starting with a focus on European languages, it became apparent that a lot of the developed strategies and tools could be reused for other languages as well. Over the last years the established tool chain for text acquisition and text processing was adapted to deal with non-Latin scripts and used to create and improve resources based on Indian text material.

2.1. Text Acquisition Strategies
Different strategies are combined for collecting textual data from the WWW. The main goal is to ensure that corpora of large extent and high diversity concerning topics or genres can be created for specific languages. Especially Indian languages that are spoken in many countries require a variety of approaches to achieve this objective.

2.1.1. Generic Web Crawling
A framework for massively parallel Web crawling is applied that utilizes Heritrix², the standard Web crawler and archiver of the Internet Archive. Among other enhancements, it was enriched with means for the automatic generation of crawling jobs.
Heritrix is used in several ways. On one hand, whole Top Level Domains are crawled. In this case a small list of domains of a country of interest is used as an input. Heritrix is then configured to follow links within this TLD. This has been conducted for several countries.
On the other hand, news sources are downloaded using the Heritrix-based Web crawler. The basis is a list of more than 32,000 news sources in about 120 languages provided by ABYZ News Links³. This service offers URLs and information regarding country and language. This way, news texts for several Indian languages were collected. This includes text data excluded in the TLD crawling because of non-country TLDs used such as ".com".

2.1.2. Distributed Web Crawling
FindLinks (Heyer and Quasthoff, 2004) is a distributed Web crawler using a client-server architecture.
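The TLD-restricted crawling described in Section 2.1.1 comes down to a scope check on every discovered link. The sketch below is our own illustration of that idea, not the LCC toolchain or Heritrix code (Heritrix configures scoping through its own job configuration):

```python
from urllib.parse import urlparse

def in_scope(url: str, tld: str = ".in") -> bool:
    """Keep only links whose host falls under the target top-level
    domain, mirroring the 'follow links within this TLD' rule."""
    host = urlparse(url).netloc.lower()
    return host.endswith(tld)
```

A ".com" mirror of an Indian news site fails this check, which is exactly why news sources are additionally collected from a curated URL list rather than by TLD scope alone.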

¹ http://corpora.uni-leipzig.de
² http://webarchive.jira.com/wiki/display/Heritrix
³ http://www.abyznewslinks.com

The Java-based client runs on standard PCs and processes a list of URLs, which it receives from the FindLinks server. FindLinks has been used with community support for several years and allowed us to crawl the WWW to a large extent.

2.1.3. Bootstrapping Corpora
In addition, an approach similar to Baroni (2004) and Sharoff (2006) was applied. Frequent terms of any specific language are combined to form Google search queries, and the resulting URLs are retrieved as a basis for the default crawling system.
A small set of frequent terms is needed for the languages in question. Therefore existing corpora of the LCC or other sources such as the Universal Declaration of Human Rights (UDHR)⁴ were utilized as a resource. Based on these lists, tuples of three to five high-frequency words are generated. These tuples are then used to query Google and to collect the retrieved URLs, which are then downloaded.

2.1.4. Crawling of Special Domains
Certain domains are beneficial sources for Web corpora since they contain a large amount of text in predefined languages. One example is the free Internet encyclopedia Wikipedia, which is available in more than 200 languages and of course also contains entries for Indian languages. Wikipedia dumps for these Indian languages were downloaded. Wikipedia Preprocessor⁵ was used for further processing and text extraction.

2.2. Corpus Creation Toolchain
Necessary steps for the creation of dictionaries are text extraction (mostly based on HTML as input material), language identification (Pollmächer, 2011), sentence segmentation, cleaning, sentence scrambling, conversion into a text database and statistical evaluation. An automatic and mainly language-independent tool chain has been implemented. It is easily configurable, and only a few language-dependent adjustments, concerning e.g. abbreviations or sentence boundaries, have to be made.
In a final step, statistics-based quality assurance is applied to achieve a satisfying quality of the resulting dictionaries (Quasthoff, 2006b; Eckart, 2012). Using features such as character statistics, typical length distributions, typical character or n-gram distributions, or tests for conformity to well-known empirical language laws, problems during corpora creation can be detected and corrected.
The processing of Indian language text required several changes to the existing toolchain. Most of the developed tools could be reused, but specific configurations had to be changed. This includes changes to components like sentence segmentation or quality assurance procedures. Besides some minor problems, the general system again proved to be stable enough.

2.3. Sentence Scrambling
For all corpora the sentences had to be "scrambled" to destroy the original structure of the documents due to copyright restrictions. This inhibits the reconstruction of the original documents. With respect to German copyright legislation this approach is considered safe.

2.4. Available Resources
Corpora of this collection are typically grouped regarding the dimensions language, country of origin, text type (newspaper text, governmental text, generic Web material, religious texts etc.) and time of acquisition. Table 1 gives an introduction to currently available resources. It contains the number of sentences for different languages and genres. For comparison, the size of the corresponding Emille corpora⁶ is given. As the crawling is an ongoing process, new corpora are added at least every year.
To do a sanity check over the corpora, 1000 high-frequency words from the Bengali as well as Hindi corpora were taken. These words were then manually checked by native speakers of Bengali and Hindi. For both Bengali and Hindi, more than 94% of the words turned out to be valid. The remaining 6% contained spelling errors, mathematical symbols etc.

2.5. Available Interfaces
The corpora are available via Web-based interfaces. There is a freely available web portal where a variety of information can be accessed based on a word level (like sample sentences, word co-occurrences, co-occurrence graphs etc.). Furthermore, many corpora can be downloaded for free in different formats. These include plain text versions of the textual material and also MySQL databases. For the latter, a platform-independent browsing tool is provided which allows examining the corpus locally.

3. Applications

3.1. POS-Tagging
For many Indian languages (like Hindi, Telugu, Tamil, Kannada, Punjabi, Urdu, Bengali and Marathi) POS taggers are available. They all use Latin transliteration, and most of them use the WX transliteration scheme. Therefore the tag sets used are comparable.
For testing purposes some of the Hindi sentences were tagged using the Hindi shallow parser⁷. In addition to morphological analysis and chunking, this tool also gives the POS tagging analysis of a sentence.
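The sentence scrambling required for all corpora (Section 2.3) amounts to a corpus-wide shuffle of the extracted sentences, so that sentences survive but document order does not. A minimal sketch of the idea (our own illustration, not the LCC toolchain code):

```python
import random

def scramble_sentences(sentences, seed=0):
    """Return the corpus sentences in random order.

    Shuffling across document boundaries destroys the original
    document structure (the property relied on for copyright-safe
    distribution) while keeping every individual sentence intact.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

The fixed seed here only makes the sketch reproducible; any random permutation of the whole sentence list achieves the goal.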

⁴ http://www.ohchr.org
⁵ http://sourceforge.net/projects/wikiprep/
⁶ http://www.emille.lancs.ac.uk/
⁷ http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php

As an example, the Hindi sentence (in Devanagari)

इसका उत्तर राजू ने अपने पत्र में दिया कि सत्यम को तंगी से उबारने के लिए उन्होंने ऐसा किया।

(English translation: Raju answered this in his letter that he did it to rescue Satyam from scarcity.)

is transliterated (using roman transliteration) to:

isakā uttara rājū ne apane patra memṃ diyā ki satyama ko tamṃgī se ubārane ke lie unhomṃne aisā kiyā .

The output from the POS tagger is:

isakā/PRP uttara/NNPC rājū/NNP ne/PSP apane/PRP patra/NN memṃ/PSP diyā/VM ki/CC satyama/NNP ko/PSP tamṃgī/NN se/PSP ubārane/VM ke/PSP lie/PSP unhomṃne/PRP aisā/PRP kiyā/VM ./SYM

3.2. Diachronic Comparisons
Newspaper corpora collected on a yearly basis can be used to investigate changes in the frequency of words. These changes may reflect different types of modern developments.

3.3. Lexicography
Frequency-ordered word lists help in identifying lemmas for dictionaries, especially for the enlargement of existing dictionaries. Especially neologisms (i.e. words found in the corpus of the current year with a certain frequency and not seen before) are interesting for the study of language change.

3.4. Semantic Relations given by Word Co-occurrences
Significant word co-occurrences often show semantic relations between those words. The investigation of word co-occurrence graphs also helps identifying the most prominent topics in the corpus. Figure 1 shows two clusters of words that significantly often occur in the context of "Mumbai": one cluster corresponding to sports events and the other corresponding to the terror attacks in Mumbai 2008.

3.5. Comparisons between Countries and Regions
For languages spoken in more than one country, the corpora will reflect their differences. For the languages spoken in India, the corpora can be used for:
• Comparison of Urdu, Sindhi and Punjabi in India and Pakistan,
• Comparison of Tamil in India and Sri Lanka,
• Comparison of Bengali in India and Bangladesh.

4. Outlook
The Leipzig Corpora Collection will continue aggregating Web-based text material to extend the amount and quality of available resources. Currently 220 GB of additional raw material crawled from the Indian TLD are processed and will be available in spring 2014. Furthermore, the result of these efforts will be provided to all interested users.
Until mid-2014 a new Web portal will be deployed that provides extended functionality and a more user-friendly interface. The underlying REST-based web services are also freely available and can be used for external applications as well. As a next step in exploiting word lists as a valuable resource in information extraction and language comparison, it is planned to publish a volume in the series of frequency dictionaries focusing on word frequency information in specific Indian languages.

5. References
Baroni, M.; Bernardini, S. (2004). BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004.
Eckart, T.; Quasthoff, U.; Goldhahn, D. (2012). Language Statistics-Based Quality Assurance for Large Corpora. Proceedings of Pacific Corpus Linguistics Conference 2012, Auckland, New Zealand.
Heyer, G.; Quasthoff, U. (2004). Calculating Communities by Link Analysis of URLs. Proceedings of IICS-04, Guadalajara, Mexico and Springer LNCS 3473.
Kilgariff, A.; Duvuru, G. (2011). Large Web Corpora for Indian Languages. Information Systems for Indian Languages. Communications in Computer and Information Science, Volume 139, pp. 312-313.
Pollmächer, J. (2011). Separierung mit FindLinks gecrawlter Texte nach Sprachen. Bachelor Thesis, University of Leipzig.
Quasthoff, U.; Biemann, C. (2006). Measuring Monolinguality. Proceedings of LREC-06 workshop on Quality assurance and quality measurement for language and speech resources.
Sharoff, S. (2006). Creating general-purpose corpora using automated search engine queries. In M. Baroni and S. Bernardini, editors, WaCky! Working papers on the Web as Corpus. Gedit, Bologna.

Language (ISO 639-3)   News        Wikipedia   For comparison: Emille
asm                    -           -           100,095
ben                    109,855     240,128     153,948
guj                    848,723     -           601,947
hin                    5,162,167   727,882     469,395
kan                    -           389,395     76,445
kas                    -           -           11,858
mal                    216,788     185,928     75,645
mar                    774,201     149,420     96,296
ori                    -           -           80,262
pan                    507,059     -           429,948
pnb                    -           39,606      8,587
sin                    -           -           287,554
tam                    1,341,954   -           1,298,802
tel                    326,233     430,723     198,669
urd                    1,733,995   144,312     60,903

Table 1: Amount of available text resources in number of sentences

Figure 1: Word co-occurrence graph

Mapping Indian Languages onto the IMAGACT Visual Ontology of Action
M. Moneglia¹, S. W. Brown¹, A. Kar², A. Kumar², A. K. Ojha², H. Mello³, Niharika², G. N. Jha², B. Ray², A. Sharma²
¹UNIFI, Florence, ²J.N.U., New Delhi, ³UMFG, Belo Horizonte
E-mail: [email protected], [email protected], [email protected]

Abstract
Action verbs have many meanings, covering actions in different ontological types. Moreover, each language categorizes action in its own way. The range of variations within and across languages is largely unknown, causing trouble for natural language processing tasks and language acquisition. IMAGACT is a corpus-based ontology of action concepts, derived from English and Italian spontaneous speech resources, which makes use of the universal language of images to identify action types. IMAGACT4ALL is an internet infrastructure for mapping languages onto the ontology. Because the action concepts are represented with videos, extension into new languages is done using competence-based judgments by mother-tongue informants, without intense lexicographic work involving underdetermined semantic description. It has already been proven on Spanish and Chinese, and it is now in the process of being extended to Hindi, Bengali, Sanskrit and Portuguese. The paper presents the infrastructure and the methodology for mapping languages onto the ontology, focussing on the features that make it a promising infrastructure for processing Indian languages.

Keywords: Action Ontology, Cross-linguistic mapping, Visual representation

1. Introduction
Action verbs, which are highly frequent in speech, are often highly polysemous; that is, they often have many meanings, covering actions in different ontological types. Moreover, each language categorizes action in its own way, so accurate machine translation is difficult. IMAGACT is a cross-linguistic ontology of action concepts that are represented with prototypic 3D animations or brief films. This format makes use of the universal language of images to identify action types, avoiding the under-determinacy of semantic definitions. This ontology has been induced from the references to physical actions found in English and Italian spoken corpora (Moneglia et al. 2012a) and gives a picture of the variety of activities that are prominent in our everyday life, specifying the language used to express each one in ordinary communication.
A dedicated interface, IMAGACT4ALL, makes extension of the resource to languages other than English and Italian easy. Starting from this universe of prototypic scenes it is possible to extend the linguistic correlation to any language through competence-based judgments instead of corpus induction. Using this interface, the action concepts in IMAGACT have already been extended to Chinese and Spanish using competence-based judgments from native speakers. Once mapped onto the ontology, each language can be compared to every other. The largely language-independent extension interface and the image-based format of the ontology make IMAGACT a valuable resource for under-resourced languages.
The complex situation of the languages of India, where 22 languages belonging to four different families have 'national status', is a challenging opportunity for IMAGACT4ALL. Once each of these languages is mapped to the action ontology images, a language instantly has accurate translations to every other language implemented in IMAGACT and a means of comparing the semantic categorization of action with any other language. These abilities can benefit both natural language processing and foreign-language learners, who need to acquire vehicular languages like Hindi, English and Sanskrit.
IMAGACT4ALL is currently being used for two main initiatives. The first one specifically concerns Indian languages. The first three Indian languages are in the processing phase: Hindi, Bengali, and Sanskrit. A second initiative concerns Romance languages, with Portuguese in both its Brazilian and European varieties as the current extension beyond the existing Italian and Spanish components. Polish and other Slavonic languages will soon follow.
In the rest of this paper, we will sketch the construction of the ontology and describe the extension interface.

2. IMAGACT
IMAGACT has been developed through corpus-based annotation by mother-tongue linguists. Working from English and Italian spoken corpora, we identified the variation of action-oriented lexicons across different action concepts. 521 Italian verbs and 550 English verbs (i.e., the high-frequency verbal lexicon most likely to be used when referring to action) have been processed.
The corpus-based strategy relied on an induction process that separated the metaphorical and phraseological usages from physical action occurrences and then classified the action occurrences into types, keeping granularity to its minimal level. The positive selection of occurrences in which verbs refer in their own meaning to physical actions preceded the annotation.
The annotation consisted of two stages, leading from occurrences of each verb in a language corpus to the identification of the action types in which the verb occurs, and then to the validation of the generated typology of actions productively referred to by the verb. The possible variation of each language verb found in the corpus has been made explicit by gathering occurrences under prototypes. Each verb can express one or more concepts,

while each concept can refer to one or more verbs (Moneglia et al. 2012a).
The key innovation of IMAGACT is to provide a methodology that exploits the language-independent capacity to appreciate similarities among scenes, distinguishing the identification of action types from their definition. Crucially, only the identification (and not the active writing of a definition) is required to set up the cross-linguistic relations.
In IMAGACT the ontology makes use of the universal language of images, which allows the reconciliation in an ontology of the types derived from the annotation of different language corpora. 1010 distinct action concepts have been identified and visually represented with prototypical scenes, either animated or filmed (Frontini et al. 2012; Moneglia et al. 2012b). The cross-linguistic correspondences of those actions with the verbs that can refer to them in English and Italian have been established in a MySQL database.

Figure 1: The variation of to attach in English

By comparing the variation of verbs within and across languages it is now possible to observe how verbs of different languages categorize the universe of action. For instance, Figure 1 shows the variation of the English verb to attach across different types of actions.
More specifically, after the corpus-based induction, competence-based extension to other languages is possible through the interface IMAGACT4ALL. Chinese and Spanish informants have mapped the action verbs of these languages onto the ontology through this interface, and the correspondences are now in the database, allowing comparison between the action-oriented lexicons of English, Italian, Spanish and Chinese. For instance, attach can be compared to its direct translation in Chinese (tiē) (Figure 2), and the different range of possible extensions becomes clear. Tiē extends to a larger set of 'sticking activities', but cannot be extended to the actions in which objects are 'connected'.

Figure 2: to attach vs tiē
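The cross-linguistic comparison just described boils down to set operations over the scene inventories of two verbs. The following toy sketch illustrates that idea; the verb keys and scene identifiers are invented for illustration and are not actual IMAGACT IDs:

```python
def compare_verbs(verb_a, verb_b, mapping):
    """Compare two verbs by the prototypic scenes they map onto.

    Shared scenes are categorizations common to both languages; the
    set differences show where one language's verb extends and the
    other's does not (e.g. 'sticking' vs 'connecting' scenes).
    """
    shared = mapping[verb_a] & mapping[verb_b]
    return shared, mapping[verb_a] - shared, mapping[verb_b] - shared

# Invented toy data in the spirit of the attach/tie comparison:
scenes = {
    "en:attach": {"stick-poster", "glue-stamp", "connect-hose"},
    "zh:tie": {"stick-poster", "glue-stamp", "stick-bandage"},
}
```

Calling `compare_verbs("en:attach", "zh:tie", scenes)` separates the shared 'sticking' scenes from the 'connecting' scene covered only by the English verb, mirroring the contrast visible in Figure 2.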

3. IMAGACT4ALL

We have created an infrastructure with a clear user interface, called IMAGACT4ALL, that has already been successfully used to incorporate Spanish and Chinese into IMAGACT. This section describes the implementation procedure for mapping any language onto the IMAGACT ontology using this interface.

3.1 Overview of the informant's work (CBE-Light)
The work of competence-based extension (CBE) of IMAGACT to a new language is performed by native-speaker informants. These informants receive a username and password that authorize them to process data for their language only. More than one informant can work simultaneously; however, the infrastructure in this release does not provide an explicit comparison of their work or an interface for conflict resolution. The work is the responsibility of one supervisor (superCBE) who must give his OK before delivery.
The competence-based extension interface presents the IDs and thumbnail images of all the scenes, each one identifying a separate action concept in the ontology (Figure 3). On the right side of each scene the informant sees the English and Italian verbs mapped onto that scene in the corpus-based annotation. Moreover, he may see the processing status of each scene in the languages that have been opened for processing (here, those opened now). In order to start processing, the informant clicks on the ID to access the full set of metadata for that concept (Figure 4). The language for each line is identified by a flag and a standard code. The informant can only modify information pertaining to his own language.
The first necessary operation is to properly appreciate the action represented in the video. The informant plays the video by clicking on it and observes the represented action. Once he understands the intended action, the informant provides the verb or verbs in his language that can be used to refer to that specific action. The lemma should be annotated in its infinitival form, as it is commonly reported in dictionaries. The informant annotates this lemma (or lemmas) in the box corresponding to his language at the bottom of the page. For each lemma he then writes in the caption box a simple sentence in the present tense, filling all the arguments of the verb that properly describe the action. This sentence will be used as the caption of the scene in the language of the informant. Both the verb and the caption should be written in the current writing system of the language of the informant. If this system does not use roman characters, after the annotation in the current writing system the informant also provides the verb and its caption in roman characters, as can be seen for the Chinese, Sanskrit and Hindi lemmas in Figure 4.

Figure 3: IMAGACT4ALL interface


Figure 4: IMAGACT4ALL interface for individual scenes

3.2 Ambiguity and supervision
Although the scenes in IMAGACT have been scripted and filmed in order to be as unambiguous as possible and to convey a single meaning, some ambiguity is still possible and can lead to misunderstanding or over-interpretation by the informant. To avoid this risk, the informant must reach a clear understanding of the specific action that we want to represent on each occasion.
For instance, for the scene in Figure 5, an English informant could say that the represented action is not only to lead, but also to help or to cross. In Figure 6, in addition to taking and bringing, we could also say that an act of giving occurred (dar in Spanish). However, these alternative categorizations fall outside the domain of IMAGACT. The informant should identify the specific physical action that is intended by the scene. (For example, to help is not a physical action.) Moreover, to identify the action, one needs to distinguish it from its consequences. For instance, crossing and giving are consequences of bringing, taking and leading, and are beyond what is intended by this video.

Figure 5: The English verbs mapped to a scene

Figure 6: The English and Spanish verbs mapped to a scene

The possible alternatives must always be verbs that identify the physical action in a simultaneous manner (giving is not simultaneous to bringing). For this reason the lemmas chosen by informants must be evaluated by a supervisor to ensure the appropriate interpretation for the goals of IMAGACT. Given the need for informants to understand this strict categorization of physical action, we have chosen not to use crowd-sourcing as a means of gathering data.

3.3 Cross-linguistic aids for informants
To provide reliable competence-based judgments, the informant must be a native speaker of the language she is implementing. It is possible for the annotation to be completed without any reference to other languages; however, it is very helpful for the informant to also know (at least at a very basic level) English or one of the languages already implemented in IMAGACT (Italian, Spanish, or Chinese). As the number of implemented languages grows, the options for the second language of course increase as well.
This is necessary because verbs with different meanings can identify the same action. For instance, in English the action in Figure 6 can be identified with either to take or to bring. This is also true in Spanish, which picks up this action with the verbs llevar and alcanzar. Therefore, the informant is asked to find multiple lemmas allowed by her language for each action. The informant is not required to provide the complete range of possible verbs in her language that can refer to that action, but only to consider this possibility.
However, simply viewing one film may not be sufficient to elicit all alternatives. The infrastructure provides one simple means to stimulate the thinking of the informant. Frequently, in the corpus-based annotation, more verbs have been found in English and Italian that can fit with the represented scene. Moreover, in the previous competence-based extensions, other alternatives in many languages have been provided. These alternatives should function as suggestions for figuring out alternatives in the language of the informant. This opportunity should be exploited in a systematic manner during the workflow after the first lemma has been determined.
For instance, the first verb for the action in Figure 7 that occurred to the Spanish annotator was the verb colgar (hang). However, looking at the English annotation, she could see that this event can also be identified by the verb to put in English. With this clue, the Spanish informant could figure out that in Spanish other verbs that are similar to put can be properly applied (meter, poner). The annotator is only requested to judge whether or not the alternatives suggested by other languages have translations in her language that can be used in referring to the event in question. If so, she will report a new verbal lemma and a new caption by adding a line to her language options.

4. Conclusion
IMAGACT offers a promising language-independent infrastructure for incorporating under-resourced languages into an extensive action ontology. Because the action concepts of the ontology are represented with videos rather than verbal definitions, extension into other languages is done without intense lexicographic work involving underdetermined semantic description. In addition, network effects promise that the more languages that are implemented, the greater the rewards. Once a language has mapped its verbs to the action images, it is instantly connected via those images to every other implemented language. Language pairings that normally would not have extensive and detailed verb correspondences in a resource (e.g., Bengali and Portuguese, or Polish and Mandarin) would suddenly have them.

5. Acknowledgements
The IMAGACT project is funded by the PAR/FAS program of the Tuscan region in Italy.

6. References
IMAGACT. http://www.imagact.it
Moneglia, M. (2011). Natural language ontology of action: A gap with huge consequences for natural language understanding and machine translation. In Z. Vetulani (ed.), Human Language Technologies as a Challenge for Computer Science and Linguistics. November 25-27, Poznań, Poland.
Moneglia, M.; Monachini, M.; Calabrese, O.; Panunzi, A.; Frontini, F.; Gagliardi, G.; and Russo, I. (2012). The IMAGACT cross-linguistic ontology of action: A new infrastructure for natural language disambiguation. In N. Calzolari, K. Choukri, T. Declerck, M.U. Doğan, B. Maegaard, J. Mariani, J. Odijk and S. Piperidis (eds), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association (ELRA): Paris, pp. 2606-2613.
Moneglia, M.; Gagliardi, G.; Panunzi, A.; Frontini, F.; Russo, I.; and Monachini, M. (2012). IMAGACT: Deriving an action ontology from spoken corpora. Paper presented at the Eighth Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-8). Pisa, 3-5, 2012.
Frontini, F.; De Felice, I.; Khan, F.; Russo, I.; Monachini, M.; Gagliardi, G.; and Panunzi, A. (2012). Verb interpretation for basic action types: Annotation, ontology induction and creation of prototypical scenes. Paper presented at the CogAlex-III Workshop as part of the COLING 2012 conference. Mumbai (India), December.

Handling Conflational Divergence in a pair of languages: the case of English and Sindhi

Pinkey Nainwani

Centre for Linguistics, Jawaharlal Nehru University, New Delhi, India

[email protected]

Abstract

This paper discusses the nature of conflational divergence with reference to English and Sindhi. Dorr (1993) explained that a translation divergence arises when the natural translation of one language into another produces a very different form than that of the original. She demonstrated seven types of lexical-semantic divergences. One of them is conflational divergence, which results when two or more words are required in one language to convey a sense that is expressed by a single word in another language. Further, the paper gives a theoretical description of conflational divergence with reference to compound verbs, complex predicates, causative verbs, infinitival structures and more. Due to the language complexities involved, I have adopted a (S)tatistical (M)achine (T)ranslation approach to train English-Sindhi and Sindhi-English (parallel) corpora. In addition, the paper illustrates to what extent SMT is able to capture the sub-categorization of conflational divergence automatically.

Keywords: divergence, conflational divergence, SMT, machine handling, evaluation
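To make the phenomenon concrete before the linguistic discussion: given word-aligned sentence pairs, a one-to-many link flags a candidate conflational divergence, i.e. one source word rendered by several target words. The sketch below is purely illustrative (the function, data, and alignment links are invented for this note, not part of the paper's system); the example pair echoes "He stabbed me" from the paper.

```python
# Illustrative sketch (not the paper's system): flag candidate
# conflational divergences from word alignments. An alignment is a
# list of (source_index, target_index) pairs; a source word linked
# to two or more target words is a one-to-many candidate.
from collections import defaultdict

def conflational_candidates(src_tokens, tgt_tokens, alignment):
    links = defaultdict(list)
    for s, t in alignment:
        links[s].append(t)
    # keep only source words aligned to >= 2 target words
    return [(src_tokens[s], [tgt_tokens[t] for t in sorted(ts)])
            for s, ts in links.items() if len(ts) >= 2]

# "stabbed" (one English word) aligned to "churI-sAM mArayo"
# (knife + kill) as in the paper's example (4); links are invented.
src = ["He", "stabbed", "me"]
tgt = ["huna", "mU-khe", "churI-sAM", "mArayo"]
alignment = [(0, 0), (2, 1), (1, 2), (1, 3)]
print(conflational_candidates(src, tgt, alignment))
# → [('stabbed', ['churI-sAM', 'mArayo'])]
```

In a real pipeline the alignment would come from a statistical word aligner rather than being written by hand.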

1. Introduction

Machine Translation (MT) is the process of automated translation of text from a source language (L1) to a target language (L2). The outputs of MT systems have been known for their limitations in terms of translation quality, accuracy, coverage etc., and have therefore been sparsely used for translating documents requiring quality. In order to achieve a desirable degree of accuracy, an MT system must capture language-independent information. Language understanding, language generation and mapping between language pairs are linguistic challenges in developing an MT system. Divergence is one such challenge, which occurs "when structurally similar sentences of the source language do not translate into sentences that are similar in structures in the target language" (Dorr, 1993). There are instances where the structure of the translated sentence deviates from the normal structure, and these exceptions are called translation divergences. To obtain correct translation, one needs to examine the different grammatical as well as some of the extra-grammatical characteristics to identify the types of divergences exhaustively.

Sindhi1 belongs to the Indo-Aryan language family. The linguistic features that differentiate Sindhi from English can be divided into two broad categories: structural differences and stylistic differences. The major structural differences are that English has an SVO sentence pattern, is known as a positional and therefore fixed-order language, and the modifier of the object can precede or follow the object. On the other hand, Sindhi has an SOV pattern, is a relatively free-order language, and the modifier of the object usually follows the object. There are minor stylistic differences too: many transitive verbs of English map to intransitive verbs of Sindhi, and modifiers like no, few, any, or little are followed by a noun in English, which is not always the case in Sindhi.

1 It is an official language of Pakistan (the Sindh province). In India, it is one of the scheduled languages officially recognized by the federal government.

Various MT systems have been built for Indian languages in specific domains, but a significant amount of work is still left as far as language divergences are concerned. The need of the hour is to build an MT system which can efficiently handle the divergences. Catering to divergences would eventually surmount the gap between machine-assisted translation and machine translation.

2. Theoretical Background

Dorr ('90, '93, and '94) has discussed two types of classifications of divergences: syntactic divergences, characterized by each language's syntactic properties independent of the actual lexical items that are used, and lexical-semantic divergences, characterized by properties that are entirely lexically determined. Moreover, she has also defined seven types of lexical-semantic divergences based on English-Spanish and English-German translations: thematic divergence (changes in argument structure), promotional divergence (head swapping), demotional divergence (head swapping: a lexical category becomes a functional category while translating from source language into target language), structural divergence (the verbal object is realized as a noun phrase in one language and as a prepositional phrase in the other language), conflational divergence (the sense conveyed by

a single word in one language requires at least two words in the other language), categorical divergence (change in category), and lexical divergence (the event is lexically realized as the main verb in one language but as a different verb in the other language).

Gupta (2005) and Sinha et al. (2005), however, earlier remarked that the classifications of translation divergences as proposed by Dorr ('90, '93, and '94) are not sufficient to capture translation divergences between English and other Indian languages (specifically Hindi). They have observed the following list (though not exhaustive) of translation divergences which needs to be taken care of as far as English-Indian languages and vice-versa MT is concerned: 1) Reduplicative Words (repetition of root words to emphasize the context), 2) Determiner System (some Indian languages lack an overt determiner system whereas English has (in)definite articles that mark the (in)definiteness of the noun phrase exactly), 3) Conjunctions and Particles (Indian languages have different types of particles like "wAlA, na"; there is no exact counterpart in English), 4) Gerunds and participle clauses (the adjunct verbal clauses and complement verbal clauses in most Indian languages are realized by infinitival clauses in English), and 5) Honorifics (Indian languages employ several linguistic markers such as plural pronouns and plural verbal inflections which are missing in English).

This paper is an attempt to expand the definition of conflational divergence to encompass other classifications of verbs such as the compound verb, complex predicate, causative verb, infinitival verb, exceptional case marking verb, passive verb, and phrasal verb. The verb is a key component in the sentence as it displays the action, tense, mood, and aspect. In most Indian languages, the verb inflects for agreement features with respect to gender, number, person, and honorificity. It is essential to understand the minutiae of each classification of verb and their syntactic and semantic implications in the English-Sindhi translation pair.

2.1 Compound verb

A compound verb is composed of two verbs, i.e. V1+V2. The first component (V1) carries the semantic information and determines its arguments. It is usually in either root or conjunctive participle form. The verbs playing as V2 have been variously referred to as auxiliary, operator, explicator, vector, or semantically weak verb, and are linguistically known as Light Verbs (LV). The light verb carries inflections, indicating tense, mood, and aspect. Hook (1974) has reported that the list of light verbs ranges from 8 to 61. These are some examples from Sindhi2:

2 Sindhi has four implosive sounds; the doubling of letters in the Sindhi examples reflects the same.

(1) E: John washed the clothes
    S: jon laTA dhoi chaddayA
       {John clothes wash leave}
    LV = chaddayA {to leave}

(2) E: He did the work
    S: huna kamu kare varto
       {He work did take}
    LV = varto {to take}

(3) S: hUa kitAbu khaNI AI
       {She book brought come}
    E: She brought the book.
    LV = AI {to come}

These light verbs can also behave as main verbs, but in compound verb constructions their use entirely depends on the semantics of the preceding verb. In Sindhi compound verb constructions, V1 never occurs in root form and ends with vowel endings such as i and e due to the morpho-phonemic nature of the verb. In constructions (1) and (2), the explicator verb agrees with the object, whereas in construction (3) it agrees with the subject. The random nature of Sindhi verbs is difficult to capture via rules. On the contrary, the translation of compound verbs from Sindhi to English does not pose major challenges for MT, provided that the English-Sindhi verb dictionary has elaborated meanings.

2.2 Complex Predicate

A complex predicate consists of a semantically empty verb which expresses the grammatical meaning in a sentence and a noun which carries the lexical meaning of the entire phrase. The occurrence of complex predicates (also known as complex verbs) is high in South Asian languages, usually in the form of noun+verb combinations. Conflational divergence subsumes the linguistic definition of the complex predicate, which says that two or more words are required in one language to convey a sense which is expressed by a single word in another language, as given below:

(4) E: He stabbed me
    S: huna mU-khe churI-sAM mArayo
       {he me knife-from kill}

(5) E: He blessed me
    S: huna mu-te AsIsa kaI
       {he me bless did}

(6) E: He cleaned the room
    S: huna kamiro sAf kayo
       {he room clean did}

Linguistically, it is an interesting question for computational linguistics and natural language processing whether, in a complex predicate construction, the noun is incorporated into the verb or is an overt argument of the verb. The behaviour of complex predicates is not challenging as they can be easily translated, given these words in bilingual dictionaries having compositional meaning. The only problem exists for constructions like (4)

where the postposition gets inserted between the noun and the verb.

2.3 Causative verb

A causative verb indicates an action where the subject causes some other agent to perform the action, or causes a change in the state of a non-volitional event. Languages like Hindi, Sanskrit, Sindhi, and Finnish are rich in causative verbal forms and employ morphological devices (affixation or inflection) that change verbs into their causative forms. In English, causatives are generally expressed by verbs like 'make', 'get' and 'have'. Semantically, causative verbs also increase the valency (the number of arguments required by the verb); the single causative should have three nouns and the double causative should have four nouns. For example:

(7) E: She made me do her homework
    S: huna mU-sAM pahiMjo kamu karAyo
       {she me her work made do}

(8) E: I got him to repair my car
    S: mU huna-sAM pahiMjI kAr thIk karAI
       {I him my car repair got}

In the above Sindhi examples, the -A affix is attached on the left side of the root verb to get the causal meaning, whereas -yo and -I are the markers for masculine and feminine gender respectively. Sindhi uses the same -A affix to express both the single and double causative forms of the verb. These causative constructions pose a big challenge for the English-Sindhi and Sindhi-English MT pair, as discussed in the subsequent section.

2.4 Infinitive verb

Infinitive verbs are mostly defined as non-finite verbs, with or without the particle to. These forms of verbs serve multiple functions:

(I) To-infinitive as a modifier of the main verb:

(9) E: I would love to go to Bombay
    S: mU bambai vanaNa pasand kaMdum
       {I Bombay to go love would}

(II) To-infinitive related with an adjective modified by "too":

(10) E: Bombay is too expensive to stay
     S: bambai rahaNa lAi ddadho mahANgo Ahe
        {Bombay staying for very expensive be}

It can be observed that the infinitive forms of verbs in English get translated as a gerundial form (-Na) of the verb in Sindhi. The following examples show the semantic differences between gerundial and infinitive constructions:

(III) The gerundial construction:

(11) E: I like singing
     S: mU-khe ggAyaNa pasand Ahe
        {I singing like be}

(IV) The infinitival construction:

(12) E: I love to sing
     S: mU-khe ggAyaNa sutho laggado Ahe
        {I singing like feel be}

The semantic difference between the above two sentences is that in (11) the person appreciates singing whether (s)he knows it or not, whereas in (12) the person knows singing and usually does it in spare time.

2.5 ECM verb

Exceptional Case Marking (ECM) denotes a phenomenon where the subject of the embedded infinitival verb gets its case from the main verb of the principal clause. The number of ECM verbs in English is small: want, prove, believe, judge, and appear. The mapping of ECM verbs from English to Sindhi differs; Sindhi takes two verbs [+tense], one for each of the clauses (principal and subordinate). Moreover, the translation of the above mentioned verbs in Sindhi doesn't retain the ECM construction anymore. Example:

(13) E: Tom believes him to be innocent
     S: Tom vesAhu kaMdo ta hU begunAha Ahe
        {Tom believes does that he innocent is}

"Tom" gets the case from its local verb "kaMdo" (does) and "hU" gets the case from "Ahe" (is). Henceforth, all ECM constructions of English, while translating into Sindhi, change into complementizer constructions.

The underlying phrase structure of ECM verbs3, shown here as a labelled bracketing:

[S [NP Tom] [VP [V believes] [NP him] [CP [C to] [VP [V be] [A innocent]]]]]

3 The object and subject NP are shown as dependent on the matrix verb, as the verb of the embedded clause is {-tense} and not able to provide case to its subject.

The interesting phenomenon is that Sindhi and various other Indian languages do not have these kinds of ECM verbs; therefore these exceptional case marking constructions do not arise, making translations from Indian languages to English comparatively easier.

2.6 Passive Verb

The definitional property of the passive verb has been assumed to be the inversion of the syntactic roles in

actives and passives, where the object gets promoted to subject and the subject gets demoted to an adjunct. Unlike English, Sindhi uses passive verbs less frequently. In English, the passive uses a specific form of the auxiliary 'to be' and the past participle form of the verb. The passive is important as it serves a variety of functions, including focusing on the object, demoting the subject, and suppressing the information about the doer.

Given are some examples:

(14) E: The elephants have been caught
     S: hAthI pakiDAIMjI vayA
        {elephant caught went}

(15) E: Blankets were taken out
     S: kambal nikArA vayA
        {blankets taken out went}

The verb vana 'to go' in its past perfective form vayA 'went' is consistently used and collectively marked for person, number, and gender. Sindhi also often adds the suffix -IMjI (also known as the passive suffix) to the first verb to denote that the action has been taken by someone who is not known to the speaker.

2.7 Phrasal Verb

A phrasal verb contains a head verb followed by one or more particles, and the meaning of the phrasal verb cannot be determined by combining the meanings of its constituents. The meanings of phrasal verbs are therefore semantically non-compositional; they are fixed expressions with figurative rather than literal meaning, and moreover they are morpho-syntactically irregular. Example:

(16) E: You should stand by your father
     S: tokhe pahiMje pIu jo sAth ddiyoNo khape
        {you your father support to give should}

Here, the expression 'stand by your father' got translated into Sindhi as 'should give support to your father'. These phrasal verbs are difficult to translate for any language pair and require a lot of computational resources (such as Hybrid MT or multi-engine MT) to process them.

3. System Training

SMT has gained tremendous momentum in recent years. Generally, languages are so rich and complex that it becomes difficult to distil knowledge and frame rules which can be encoded through programs. SMT was first pioneered by IBM in the late 1980s. The probability of SMT systems is calculated based on two components: (1) a language model that assigns a probability p(s) to any sentence in the source language and (2) a translation model that assigns a conditional probability p(t|s) to any source/target language pair of sentences. In order to develop an MT system, a representative parallel corpus (20k) from English to Sindhi (English being the source language) has been built. The corpus is collected from sources like: a) high school grammar books, b) short stories, and c) children's story books.

Table 1 contains the statistics about the size of the parallel corpus. A word in this corpus is a white-space separated token and the sentence length is measured in words.

          Sentences   Words    Avg. Sent. Length
English   20000       141797   7.08
Sindhi    20000       164797   8.24

Table 1: Parallel corpus size statistics

Table 2 displays the number of the above mentioned verb groups in the corpus.

Verb Groups         Counts
Compound verb       5359
Complex Predicate   6122
Causative           843
Infinitive          2962
ECM verb            2089
Passive             1013
Phrasal verb        1612

Table 2: Number of verb groups in the corpus

To train the English-Sindhi and Sindhi-English systems, I have used the Microsoft model (developed at Microsoft Research Lab, Redmond, under Microsoft Translator Hub)4. Their approach to MT is data-driven rather than writing explicit rules: they train their algorithms on human-translated parallel texts, which allows the algorithm to automatically learn how to translate. The language modelling is based on an n-gram model, which is a key component in high-quality SMT. The goal is to create the simplest, most intuitively integrated and useful translation services available to end users while making ongoing improvements to translation quality.

4 https://hub.microsofttranslator.com

Aligning parallel corpora (Moore, 2002) has proved very crucial for applying machine learning to MT. The Microsoft model does sentence and word alignment in a three-step process:

a) Find the sentence pairs that align with the highest probability without the use of anchor points,

b) Use the sentence pairs that have been assigned the highest probability of alignment to train an updated version of the IBM Model,

c) Finally, realign the corpus, augmenting the initial alignment model (IBM Model) to produce an alignment based both on sentence length and word correspondences.

This alignment method is similar to Wu (1994), which uses both sentence length and lexical correspondences to derive the final alignment; these lexical correspondences

are derived automatically and therefore no external lexicon is required.

English-Sindhi System Training

Total Manual Parallel Sentence count     20000
Extracted Sentences by Microsoft Model   20059
Aligned Sentences                        19419
Actual in Training                       18377
Testing                                  500
Tuning                                   500
Monolingual                              25090
BLEU                                     14.30

Table 3: English-Sindhi training statistics

Sindhi-English System Training

Total Manual Parallel Sentence count     20000
Extracted Sentences by Microsoft Model   20160
Aligned Sentences                        19426
Actual in Training                       18147
Testing                                  637
Tuning                                   637
Monolingual                              24614
BLEU                                     27.23

Table 4: Sindhi-English training statistics

The above training statistics show a discrepancy between the total manual parallel sentence count and the sentence count extracted by the Microsoft model. This is due to sentence terminators such as ., ;, ! which often break a single sentence into two when they occur in the middle of a sentence, for cases like "Mr. Ramesh" and "Very good! boys", and this in turn reduces the number of aligned sentences. The bilingual data-set finally used for training removes duplicates, if there are any.

3.1 Data Analysis

This subsection discusses the outputs given by the machine after deployment and compares the human translations (HT) and machine translations (MT) with respect to each verb group classification.

3.1.1 English to Sindhi

Compound verb
E: John washed the clothes
S: jon laTA dhoi chaddayA [HT]
   {John clothes wash leave}
   jon laTA dhotA [MT]
   {John clothes wash}

Complex Predicate
E: He stabbed me
S: huna mU-khe churI-sAM mArayo [HT]
   {he me knife-from killed}
   huna mu-khe churI mArI [MT]
   {he me knife kill}

Causative
E: They showed a Sindhi film after lunch
S: hunin maMjhidi ji mAnI poi sindhi film ddekhAi [HT]
   {they lunch after Sindhi film showed}
   uhe sindhi film jekA khA poi mAnI ddekhAyo [MT]
   {they Sindhi film who from after lunch showed}

Infinitive
E: I had no need to open the letter
S: mU-khe khatu kholaNa ji ko jarurata kona huI [HT]
   {I letter to open of any need not had}
   mU-khe khatu kholaNa ji jarurata kona huI [MT]
   {I letter to open of need not had}

ECM
E: I want you to shut all doors and windows
S: mU cAhiMdo AhiyA ta tU sajja dara aim darIyUM baMdi kara [HT]
   {I want am that you all doors and windows shut do}
   mU tU sajja dara aim darIyUM baMdi karaNa cAhiMdo AhiyA [MT]
   {I you all doors and windows shut do want am}

Passive verb
E: The elephants have been caught
S: hAthI pakiDAIMjI vayA [HT]
   {elephants caught have been}
   hAthI pakiDAIMjI vayA [MT]
   {elephants caught have been}

Phrasal verb
E: I gave him no cause to break with me
S: mU huna-khe paNa sAM ladaNa jo ko mOko kona ddino [HT]
   {I him self from to fight to any chance not gave}
   mU huna-khe acI jI ko vajaha kona ddini [MT]
   {I him come to any reason not gave}

It is interesting to note that in the case of the causative, the machine could not adequately translate the subject NP into its oblique form, making the translation somewhat ungrammatical, though a little meaning is conveyed to native speakers. On the other hand, the special determiner 'no' (a crucial aspect for MT) got a correct translation in the phrasal verb example.

3.1.2 Sindhi to English

Compound verb
S: hUa kitAbu khaNI AI
   {she book brought came}
E: She brought the book [HT]
   She brought the book [MT]
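The BLEU scores reported in the training tables follow the standard definition: a brevity penalty multiplied by the geometric mean of modified n-gram precisions. The sketch below is a minimal single-sentence implementation for illustration only; the paper's actual scores come from the Translator Hub evaluation, not from this code.

```python
# Standard BLEU sketch: geometric mean of modified n-gram precisions
# times a brevity penalty, for one hypothesis/reference pair.
from collections import Counter
import math

def bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(zip(*[hyp[i:] for i in range(n)]))
        ref_ngrams = Counter(zip(*[ref[i:] for i in range(n)]))
        # clipped counts: each hypothesis n-gram is credited at most
        # as many times as it occurs in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(100 * bleu("she brought the book", "she brought the book"), 2))  # → 100.0
print(round(100 * bleu("the elephant caught", "the elephants have been caught"), 2))
```

Production evaluations average clipped counts over a whole test set (corpus-level BLEU) and use proper smoothing instead of the small floor constant used here.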

Complex Predicate
S: rAM pahiMje paNa-khe dhaku haNo Ahe
   {Ram himself hurt has be}
E: Ram has hurt himself [HT]
   Ram has hurt himself haNo [MT]

Causative
S: mU pahiMjA laTA pres karAiaNa cahiMdo AhiyAM
   {I our clothes iron to get want be}
E: I want to get my clothes ironed [HT]
   I want to get my clothes pres them [MT]

Infinitive
S: huna mu-khe kamire sAM vanaNa lAi cayo
   {he me room from leave to said}
E: He told me to leave the room [HT]
   He told me to go to the room [MT]

Passive verb
S: hAthI pakidAIMjI vayA
   {elephant caught have been}
E: The elephants have been caught [HT]
   The elephant caught [MT]

Phrasal verb
S: hu paka I tAj poshIa lAi thIku kona Ahe
   {he certainly lead to fit not be}
E: He is certainly not fit to lead [HT]
   He certainly is not well for the ceremony [MT]

Sindhi-to-English MT does not have many translation mapping issues. In the above examples, SMT has done an approximately complete job; a small amount of human post-editing will give qualitative results. Lexical variation according to context is a challenging task that needs to be handled by machines. In the case of the infinitive, due to lexical variation, the English MT output (translated from Sindhi) cannot be considered perfectly adequate even though the sentence is grammatically correct. Here, passive and phrasal verbs also require major human post-editing. In a nutshell, Sindhi-English MT with respect to each verb group is comprehensible to a large extent.

4. Discussion

A translation is true if it is faithful to the source language and fluent in the target language, and often this is impossible. There is no doubt that MT outputs require some human post-editing to achieve human-level quality. MT outputs have always been evaluated by humans on two parameters: fluency and adequacy. I have also used these two parameters to analyze the MT outputs with regard to each verb group5.

Fluency of the given translation is:
(4) Perfect: Good grammar
(3) Fair: Easy-to-understand but flawed grammar
(2) Acceptable: Broken - understandable with effort
(1) Nonsense: Incomprehensible

Adequacy: how much of the meaning of the reference sentence is conveyed in the translation:
(4) All: No loss of meaning
(3) Most: Most of the meaning is conveyed
(2) Some: Some of the meaning is conveyed
(1) None: Hardly any meaning is conveyed

Table 5 displays the ranking of each verb group mapping for English to Sindhi MT on the parameters of fluency and adequacy (based on 500 test set sentences).

                    Fluency   Adequacy
Compound Verb       3         4
Complex Predicate   3         4
Causative           2         2
Infinitive          4         4
ECM                 3         3
Passive             3         2
Phrasal Verb        1         1

Table 5: English-Sindhi MT Evaluation results

As far as the adequacy parameter is concerned for English-Sindhi SMT, the mappings of compound verb, complex predicate, and infinitive verb got the highest score of 4 (no loss of meaning) on the scale (1-4). On the parameter of fluency, only the infinitive verb mapping gets the highest score of 4 (perfect: good grammar), while compound verb, complex predicate, ECM, and passive verb get a score of 3 (fair but flawed grammar). Causative and phrasal verbs demand a lot of attention from the MT perspective. Moreover, the causative is not a natural construction in English; it is forcefully generated in a sentence with the help of light verbs like make and get, and translating them into any of the Indian languages naturally produces two verbs, makes the structure grammatically unacceptable, and hence breaks the meaning. To deal with phrasal verbs, a different kind of database should be built with all morphological variations and inflections.

Table 6 exhibits, for Sindhi to English MT, each verb group mapping on the parameters of fluency and adequacy (based on 637 test set sentences):

5 Due to the scarcity of Sindhi native speakers around, I have evaluated all MT outputs myself.
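Evaluation tables of this kind can be produced mechanically from per-sentence judgements: average the 1-4 fluency and adequacy ratings per verb group. The sketch below uses invented ratings; `summarize` is a hypothetical helper, not part of the paper's tooling.

```python
# Aggregate per-sentence fluency/adequacy ratings (1-4 scales) into
# per-verb-group averages, as in the evaluation tables. The ratings
# below are invented for illustration.
from collections import defaultdict

def summarize(ratings):
    """ratings: list of (verb_group, fluency, adequacy) tuples."""
    acc = defaultdict(lambda: [0, 0, 0])  # [count, fluency_sum, adequacy_sum]
    for group, flu, adq in ratings:
        acc[group][0] += 1
        acc[group][1] += flu
        acc[group][2] += adq
    return {g: (round(f / n, 1), round(a / n, 1))
            for g, (n, f, a) in acc.items()}

sample = [("compound", 3, 4), ("compound", 3, 4),
          ("causative", 2, 2), ("infinitive", 4, 4)]
print(summarize(sample))
# → {'compound': (3.0, 4.0), 'causative': (2.0, 2.0), 'infinitive': (4.0, 4.0)}
```

With real data, one would also report the number of rated sentences per group, since group sizes in the test sets differ widely.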

                    Fluency   Adequacy
Compound Verb       4         4
Complex Predicate   3         4
Causative           3         3
Infinitive          4         3
Passive             3         3
Phrasal Verb        2         2

Table 6: Sindhi-English MT Evaluation

Interestingly, the causative verb in Sindhi to English has given better results on the parameters of fluency and adequacy, scoring 3 and 3 respectively. The infinitive verb has scored highest (4) on fluency, and on the parameter of adequacy, compound verb and complex predicate have given better results. The complex predicate verb group has shown similar results on both parameters, from English to Sindhi and from Sindhi to English. In Sindhi, the frequency of passive verb occurrences in the corpus is very low, but surprisingly the machine has produced better results for this verb group too. Doubtlessly, the phrasal verb group needs special care. To sum up, Sindhi to English MT outputs have retained meaning in translations but with some flawed grammar.

5. Conclusion and Future Work

The verb is a key aspect of any language as it is marked for tense, aspect, and modality features, and each verb has certain sub-categorization features which determine the number and nature of the other constituents that attach to the verb to form the clause. The aim behind the study of handling conflational divergence is to develop an efficient MT system with little or no human intervention. There is not much accessible literature on translation between English and Sindhi and vice versa, particularly in the context of statistical MT.

Due to the differences in style and structure between the English-Sindhi pair, these translation divergence mappings become difficult to capture solely by the SMT approach, and a Hybrid MT approach needs to be plugged in.

The promising avenue for further research is building more and more parallel data for English-Sindhi; the bigger the corpus, the better the estimate!

6. Acknowledgements

I would like to thank Dr. Girish Nath Jha and the Microsoft Translator Hub Team for providing me readymade MT models to conduct system training experiments for the English-Sindhi language pair.

7. References

Asamidinova, A. (2007). Knowledge Base For Russian-English Machine Translation Divergences. Doctoral dissertation. Jawaharlal Nehru University, New Delhi.

Choudhary, N. K. (2011). Automatic Identification and Analysis of Verb Groups in Hindi. Doctoral dissertation. Jawaharlal Nehru University, New Delhi.

Dorr, B. J. (1990). Solving Thematic Divergences in Machine Translation. In Proceedings of the 28th Annual Conference of the Association for Computational Linguistics. University of Pittsburgh, Pittsburgh, PA, pp. 127-134.

Dorr, B. J. (1993). Machine Translation: A View from the Lexicon. MIT Press, Cambridge, MA.

Dorr, B. J. (1994). Machine Translation Divergences: A Formal Description and Proposed Solution. Computational Linguistics 20(4), pp. 597-633.

Gupta, D. (2005). Contributions to English to Hindi Machine Translation using Example-Based Approach. Ph.D. Thesis. IIT Delhi.

Gupta, D. (2009). Will Sentences Have Divergence Upon Translation? A Corpus Evidence Based Solution for Example Based Approach. Language in India 9, pp. 316-363.

Haegeman, L. (1991). Introduction to Government and Binding Theory. Basil Blackwell, Oxford.

Hook, P. E. (1974). The Compound Verb in Hindi. The Michigan Series in South and South East Asian Languages and Linguistics. The University of Michigan.

Johannessen, J. B., Nordgård, T. and Nygaard, L. (2008). Evaluation of Linguistics-Based Translation. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), pp. 396-402.

Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press, Cambridge.

Moore, R. C. (2002). Fast and accurate sentence alignment of bilingual corpora. In Machine Translation: From Research to Real Users, pp. 135-144.

Sinha, R. M. K. and Thakur, A. (2005). Translation Divergence in English-Hindi MT. In Proceedings of EAMT, pp. 245-254.

Smith, J. R., Quirk, C. and Toutanova, K. (2010). Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment. In Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pp. 403-411.


Wu, D. (1994). Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 9-16.

Semi-automated annotated treebank construction for Hindi and Urdu

Jayendra Rakesh Yeka, Vishnu Ramagurumurthy, Dipti Misra Sharma
LTRC, IIIT-Hyderabad, Hyderabad, India
[email protected], [email protected], [email protected]

Abstract

In this paper, we describe the structure and paradigms chosen for the creation of annotated corpora for Hindi and Urdu. We briefly discuss the Shakti Standard Format, which was chosen to suit the needs of Indian language dependency annotation. This paper aims to present a framework for the creation of annotated corpora. We proceed to discuss the methods of automation chosen to overcome the laborious and time-consuming process of corpus annotation. We present the methods chosen to overcome the errors and multiple analyses that result from the task of annotation. We also present various methods used, both manual and automated, to ensure the quality of the treebank. We finally report the current status of the annotated corpora.

Keywords: Annotated corpora, Annotation evaluation, Treebank, Hindi, Urdu

1. Introduction
Annotated corpora find their use in many NLP related applications like parsing, machine translation etc. In the past, many annotated corpora such as the Penn Treebank (Marcus et al., 1993) and the Prague Treebank (Böhmová et al., 2003) have found prominence by aiding many such NLP applications. Indian languages inherently have the property of free word order. This prevents the analysis from being expressed effectively in a context free based annotation paradigm. To fully exploit the advantage of free word order in Indian languages, a dependency based annotation paradigm has been preferred (Bharati et al., 1995).
After establishing a format like SSF to effectively represent both dependency annotation and context free grammar analysis for Indian languages, and creating a fixed set of guidelines for annotation, focus has shifted to the creation of annotated corpora. As part of a multi-layered, multi-representational treebank (Bhatt et al., 2009), the task of dependency annotation for Hindi was taken up at IIIT Hyderabad. Once the initial steps for the creation of a stable annotation framework for Hindi were completed, the task of creating an annotated corpus for Urdu was also undertaken. In this paper, we talk about the pipeline framework chosen for creating the annotated corpora. We focus on its semi-automated nature, where the data interacts with the human side and the machine side during its flow.
Keeping the focus of the paper on the structure and framework of the pipeline and its working, we limit the scope of this paper to the system errors that are experienced during the functioning of the pipeline. (Agarwal et al., 2012) gives a detailed explanation of the different linguistic errors encountered because of the nature of the modules in the pipeline, and also discusses methods of correcting errors of a linguistic nature.
The rest of the paper is organized as follows. Section 2 gives an overview of the SSF data representation format. Section 3 talks about the sources and the format of the raw data. Section 4 gives a brief description of the tools used in the pipeline framework. Sections 5, 6 and 7 talk about the annotation procedure, the pipeline framework, and the different errors encountered and the steps taken to prevent them. In Section 8, we talk about several post-pipeline methods of quality assurance and correction. Finally, we conclude by presenting the current status of the treebanks.

2. SSF
Shakti Standard Format (SSF) is an annotated data representation format presented in (Bharati et al., 2007). It is based on XML syntax, but deviates from XML standards to facilitate better readability. SSF supports both phrase structured and dependency structured analyses, and also facilitates the representation of partial analyses of sentences.
At text level, SSF has two major segments - header and body. The header contains information regarding the origin, creation, and distribution of the text, enclosed in the Creative Markup Language (CML) scheme. The body contains uniquely identified blocks of text, each representing a sentence and its analysis, enclosed in a <Sentence> XML tag. Every sentence block contains nodes separated by newlines, and each node contains 4 system properties separated by tabs, namely:

• Address - a unique value indicating the position of the node relative to the chunk boundary
• Token - a value indicating one of the chunk boundaries ((, )) or a word/numeric/symbol of the sentence
• Category - category of the node, which is either a POS tag or a chunk label
• Others - feature set corresponding to the specific node or to the chunk. It is absent for the end of chunk boundary - )).

SSF representations can be classified into two different types, depending on the values the four properties can take. They are:

2.1. Inter-Chunk dependency format
The chunk boundary "((" indicates the start of a new chunk and "))" indicates the end of the chunk. For the start of chunk boundary "((", the Category property is the chunk label, and the Other field contains information regarding the type of dependency relation with its parent, as well as a unique name formulated from the chunk label. In certain intermediate formats, the Other property also contains information about the head of the chunk. The end of chunk boundary "))" does not carry the Category and Other properties. Additional information, such as the type of the sentence, is also present.
For the remaining nodes, the Category property indicates the Part of Speech tag. The Other property contains several fields of analysis for each corresponding node. The af, or abbreviated feature set, contains the morphological analysis of the node, namely root word/lemma, coarse POS tag, gender, number, person, TAM (tense-aspect-modality) and vibhakti values, in a comma separated format. The Other property also contains a unique node name formulated from the token value, and a relative position value - posn - of the node in the sentence.

2.2. Intra-Chunk dependency format - Expanded
In the expanded format, the chunk boundaries are removed, and information is added into the Other property of each node indicating the name of the chunk it is in. For each node, the Other property additionally contains information about the dependency relation with the node's parent, as well as the chunk label and the chunk type. Though the expanded format lacks the chunk boundaries, it contains all the information present in the inter-chunk format and more.

3. Raw Data
Conversation data is gathered from short stories written by Munshi Premchand, known as Premchand Kahani. The stories were extracted from the website http://nandlalbharati.blog.co.in¹, hosted by Nandalal Bharati. Data corresponding to other domains is gathered from various sources like newspaper articles and tourism websites. The data extracted from the sources then undergoes a cleaning phase where the text is cleaned and filtered to ensure that only paragraphs and sentences are present. Content corresponding to page formatting, like headings, sub-headings, page numbers, chapter names, author names etc., is filtered out and stored in the header section of the SSF files as meta-tags. Due to the varied sources and different writing formats, the gathered text also undergoes the necessary font conversions.
Data gathered from these sources is not only in varied fonts but also in different encodings and formats. UTF-8 and UTF-16 are the most preferred encodings for Hindi and Urdu in text sources. Besides UTF, many data sources also use the WX format. WX is an ASCII based transliteration scheme for Indian languages, where characters from Indian languages are mapped to Roman characters based on their phonetic similarities, thus providing a unique representation of Indian languages in Roman script. The only non-intuitive mapping of characters is done with the 'W' and 'X' characters, thus earning the format its name.

¹The link is currently down

4. Tools
The following tools are the major modules used in the pipeline:

• Tokenizer: The tokenizer was constructed as part of the SETU project sanctioned by IIIT Hyderabad. It takes a plain text file in either UTF or WX format as input and returns SSF sentences in the respective format. It also adds meta-tags to the header section of the SSF file based on the provided configuration file.
• Morph: The morphological analyzer was constructed as part of the ILMT Project. It takes an SSF file in WX format as input and returns sentences tagged with morphological analyses in WX format. It outputs multiple possible analyses for each node, as it works in a context-free manner.
• POS: The POS tagger was constructed as part of the ILMT Project. It takes an SSF file in UTF format as input and returns POS tagged sentences in UTF format. It takes context into account, hence gives a single result for each node.
• Chunker: The chunker tool was constructed as part of the ILMT Project. It takes an SSF file in WX format as input and returns chunked sentences in WX format. It gives a single prediction for every chunk.
• Pruner: The morphological pruner was constructed as part of the ILMT Project. It takes an SSF file in WX format as input and returns sentences tagged with pruned morphological analyses in WX format. It only prunes those multiple analyses for which its confidence is above a probability threshold.
• Converters: The converters were constructed as part of the ILMT Project. These tools are used as intermediate stages in the pipeline for data conversion from UTF format to WX format and vice versa, as required by the other modules.
• Expander: The expander was constructed at IIIT Hyderabad (Kosaraju et al., 2012), utilizing the Vibhakti Computation and Head Computation modules developed as part of the ILMT Project. This rule based tool converts SSF sentences in inter-chunk format to intra-chunk format. It works on both UTF and WX formats.
• Sanchay: Sanchay is an open source platform for working on languages. It provides multiple features like a syntactic annotation interface, a Java SSF API, a text editor, UTF to WX and WX to UTF conversions etc. This tool is used by the annotators for annotation and validation of data during the different stages of the pipeline framework. It works on both UTF and WX formats (Singh and Ambati, 2010). The

tool is available for download from the official Sanchay website².
All the ILMT tools are made accessible to the members of the ILMT consortium through internal services.

5. Annotation Process
The dependency annotation in the treebank is divided into 2 stages. In the initial stage, the relations among the chunks, namely the inter-chunk dependencies, are marked; in the later stage, the relations within the chunks, namely the intra-chunk dependencies, are marked.
Annotators are trained in the annotation schema (Begum et al., 2008) and in working with the Sanchay tool. They are taught to disambiguate between multiple annotations. The dependency guideline sentences (Bharati et al., 2009) are used by the annotators as reference sentences for correctly annotating the sentences.
While the inter-chunk dependencies are marked manually by the annotators, the intra-chunk dependencies are marked automatically by a rule based tool called Expander. This division makes the annotation task more efficient, as the human annotators are freed from the tedious work of annotating chunk internal dependencies, which can be automatically marked with a high degree of accuracy (Kosaraju et al., 2012).

6. Pipeline
Figure 1 shows the route map of how the raw data passes through the different stages to reach the error validation stage. Throughout the pipeline, some steps are automated and some are either validated or annotated manually, with constant interaction between both these sides.

6.1. Tokenization
Once the cleaning of the raw data is done, the data is passed to the machine side where the tokenizer tool is run. Each of the raw data files may have many paragraphs. The tool initially divides the file into paragraphs based on the space and new line markers. Within each paragraph, the sentences are divided by using the sentence end marker. The tokens/words in each sentence are divided by using spaces. This is the stage where plain text is converted into SSF format. Each sentence in a file is assigned a unique numeric ID. Also, each token in the sentence is assigned a unique integer as an address, which helps to access the token. The tags <Sentence> and </Sentence> indicate the start and end of the sentence.
Once the tokenizer tool is run, the output is passed to the human side, where annotators validate the output of the tool and make any necessary corrections. This validation stage is required because, for some tokens like Mr. P. V. Narasimha Rao, the tokenizer fails to identify a single token and instead identifies Mr, P, V and Narasimha Rao as 4 different sentences.
An SSF representation of the tokenization output can be seen in Table 1.

6.2. Morph, POS and Chunking
Once we are done with the token validation stage, we move to the machine side, where the Morph, POS and Chunker tools are run.
For each token in the sentence, the morph analyzer outputs all the possible morph analyses for that particular token, irrespective of the context. The morphological information has 8 different categories: root, category, gender, number, person, case, tense/aspect, and suffix. All these categories are represented together in the feature structure of the token/word using a special attribute called af, or abbreviated features. The field for each attribute is at a fixed position, and a comma is used as a separator. A field is left blank for undefined values.
The POS tagger assigns a Part of Speech (POS) tag to each token.
The Chunker tool marks the boundaries of the chunks in the sentence and also assigns a chunk tag to each chunk. A chunk boundary is represented using "((" and "))", which indicate the initiation and closure of the chunk respectively. At this stage, the address of a chunk is given as an integer and the address of a token is given as a decimal. The whole part of the decimal indicates the chunk to which the token belongs, and the fractional part indicates the relative position of the token in that particular chunk.
Once the Morph, POS and Chunker tools are run, the data is sent to the annotators for validation of the POS and chunk information. After the validation of POS and chunk information by one annotator, the same information is cross-validated by a different annotator to remove any erroneous cases, if present. This is done to ensure better quality of annotation. The morph information is not validated here; that is done after the pruning stage.
An SSF representation of the Morph, POS and Chunked output can be seen in Table 2.

6.3. Morph Pruning
After the POS and chunk validation, the data is sent to the machine side for pruning. In the pruning stage, the morph analyses for which the value of the category attribute does not map to the POS tag of the token are removed automatically by the tool. After this, the data is sent to the morph annotators for validation. The morph annotators look at the pruned data and finally choose the morph analysis that fits that context.
An SSF representation of the pruner validated output can be seen in Table 3.

6.4. Position Marking
The data is again passed to the machine side to add a new attribute, called posn, to the feature structure of every token, which indicates the position of the word more explicitly in the sentence. The values of the posn attribute are multiples of 10, in order to facilitate the insertion of new tokens between the existing tokens.
An SSF representation of the position marked output can be seen in Table 4.

²http://sanchay.co.in/
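The posn scheme described in Section 6.4 can be sketched in a few lines; this is an illustrative reconstruction, not the project's actual tool (the function name is ours):

```python
def assign_posn(tokens):
    """Assign each token a posn value in multiples of 10, leaving gaps
    so that a new token can later be inserted between two existing
    ones without renumbering the whole sentence."""
    return [(token, (index + 1) * 10) for index, token in enumerate(tokens)]
```

Inserting a token between posn 10 and 20 can then use, say, 15, while every other token keeps its value.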

[Figure 1 diagram: Raw Data → Tokenization → Token Validation → Morph / POS / Chunking → POS Validation / Chunk Validation → Pruning → Morph/Pruner Validation → Position Marking → Inter-Chunk Annotation → Intra-Chunk Annotation → Error Detection → Error Correction; the stages alternate between the Human and Machine sides]

Figure 1: Treebanking Annotation Pipeline

1    raama     unk
2    ne        unk
3    mohana    unk
4    ko        unk
5    niilii    unk
6    kitaaba   unk
7    dii       unk

Table 1: SSF representation of Tokenized data
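Each body line of an SSF sentence block carries the four tab-separated properties described in Section 2. A minimal parser sketch (a hypothetical helper, not part of the Sanchay or ILMT toolkits):

```python
def parse_ssf_node(line):
    """Split one SSF node line into its four tab-separated properties:
    address, token, category, and the 'others' feature set.
    The last two may be absent, e.g. on a '))' chunk-closing line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "address": fields[0],
        "token": fields[1],
        "category": fields[2] if len(fields) > 2 else None,
        "others": fields[3] if len(fields) > 3 else None,
    }
```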

6.5. Inter-Chunk Dependency Annotation
After running the position marking tool, the data is passed to the dependency annotators. Their task is to create the dependencies between the chunks.
An SSF representation of the inter-chunk marked output can be seen in Table 5. A graphical representation of the dependency tree constructed after the inter-chunk dependency annotation stage can be seen in Figure 2.

6.6. Intra-Chunk Dependency Annotation
Once the manual annotation of inter-chunk dependencies is done, the intra-chunk dependencies are marked automatically using the Expander tool. As part of the Expander tool (Kosaraju et al., 2012), the inter-chunk data is run through a head-computation module which identifies the head token of each chunk. The dependency relations that were initially present between the chunks at the inter-chunk level will now be transferred between the head tokens of those

1      ((        NP
1.1    raama     NNP
1.2    ne        PSP
       ))
2      ((        NP
2.1    mohana    NNP
2.2    ko        PSP
       ))
3      ((        NP
3.1    niilii    NNC
3.2    kitaaba   NN
       ))
4      ((        VGF
4.1    dii       VM
       ))

Table 2: SSF representation of Morph, POS and Chunked data
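The addresses in Table 2 follow the scheme of Section 6.2: a bare integer addresses a chunk, while a decimal such as 2.1 addresses token 1 inside chunk 2. A sketch of decomposing such an address (illustrative only):

```python
def split_address(address):
    """Decompose an SSF address: the whole part names the chunk, and
    the part after the dot, if any, is the token's relative position
    inside that chunk."""
    whole, _, fraction = address.partition(".")
    return (int(whole), int(fraction) if fraction else None)
```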

1      ((        NP
1.1    raama     NNP
1.2    ne        PSP
       ))
2      ((        NP
2.1    mohana    NNP
2.2    ko        PSP
       ))
3      ((        NP
3.1    niilii    NNC
3.2    kitaaba   NN
       ))
4      ((        VGF
4.1    dii       VM
       ))

Table 3: SSF representation of Pruned Data
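The pruning criterion of Section 6.3 - drop every morph analysis whose category does not map to the validated POS tag - can be sketched as follows; the mapping table and data shapes are assumptions for illustration:

```python
def prune_analyses(analyses, pos_tag, pos_to_categories):
    """Keep only the morph analyses whose 'category' value is consistent
    with the token's validated POS tag; pos_to_categories is an assumed
    mapping from POS tags to admissible morph categories."""
    allowed = pos_to_categories.get(pos_tag, set())
    return [analysis for analysis in analyses if analysis["category"] in allowed]
```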

1      ((        NP
1.1    raama     NNP
1.2    ne        PSP
       ))
2      ((        NP
2.1    mohana    NNP
2.2    ko        PSP
       ))
3      ((        NP
3.1    niilii    NNC
3.2    kitaaba   NN
       ))
4      ((        VGF
4.1    dii       VM
       ))

Table 4: SSF representation of Position Run data

1      ((        NP
1.1    raama     NNP
1.2    ne        PSP
       ))
2      ((        NP
2.1    mohana    NNP
2.2    ko        PSP
       ))
3      ((        NP
3.1    niilii    NNC
3.2    kitaaba   NN
       ))
4      ((        VGF
4.1    dii       VM
       ))

Table 5: SSF representation of Inter Chunk Dependency data

1    raama     NNP
2    ne        PSP
3    mohana    NNP
4    ko        PSP
5    niilii    JJ
6    kitaaba   NN
7    dii       VM

Table 6: SSF representation of Intra Chunk Dependency data

Chunk Name   Parent Constraints   Child Constraints   Contextual Constraints     Dependency Relation
NP           POS==NN              POS==JJ             posn(parent)>posn(child)   nmod adj

Table 7: A sample rule from the Expander tool

chunks. To get the intra-chunk relations, a rule template has been created manually. A sample rule is shown in Table 7.
After running the Expander, the POS and morph information of each token gets transferred from the inter-chunk level to the intra-chunk level without any modification. The bracketing which represents the chunks at the inter-chunk level is not present at the intra-chunk level. But the chunk information is still preserved in the intra-chunk format by using two additional attributes, "ChunkId" and "ChunkType", which are not present at the inter-chunk level. These attributes act as a substitute for the bracketing, and also show the chunk members in the roles of head and child. The head node has a "ChunkId" that gives it a unique chunk name, which has the same value as the "name" attribute of the original chunk. All the chunk members have a "ChunkType" that gives their membership type; "ChunkType" takes one of two values, either "head" or "child" of one of the chunks.
An SSF representation of the intra-chunk marked output can be seen in Table 6. A graphical representation of the dependency tree constructed after the intra-chunk dependency annotation stage can be seen in Figure 3.

7. Errors and sanity
Due to limited training data and the generic domain nature of the tools, most of the tools fail to predict a proper analysis in several cases, and this results in crashes. This brings forth the need to ensure that these errors do not trickle along the different stages of the pipeline, and to make sure that the crashes that have been detected are immediately localized.
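The sample rule in Table 7 can be read procedurally: within an NP chunk, a JJ child that precedes its NN parent receives the adjectival modifier relation. A hypothetical sketch (the function, the data shapes, and the exact label string are ours, not the Expander's internals):

```python
def apply_sample_rule(chunk_label, parent, child):
    """Apply the Table 7 rule: inside an NP chunk, a JJ child whose posn
    precedes that of its NN parent is attached with the adjectival
    modifier relation ('nmod adj', as printed in Table 7)."""
    if (chunk_label == "NP"
            and parent["pos"] == "NN"
            and child["pos"] == "JJ"
            and parent["posn"] > child["posn"]):
        return "nmod adj"
    return None
```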

Figure 2: Inter-Chunk Dependency Tree

Figure 3: Intra-Chunk Dependency Tree

The Morph tool is prone to crashes due to the limited entries in the morphology table, and in many cases the crashes result in partial sentences with missing nodes, or even missing sentences. The chunker tool is also prone to crashes, sometimes with similar errors.
In-pipeline sanity tools compare the input and output files after every run of a module and check for discrepancies and errors. Only changes that belong to the module in question are permitted; the rest are reported by the tools. After every module run, another script called the difference-generator checks and ensures that the word and sentence counts remain unchanged before and after the run. Every issue noticed is reported to the manual validators, who either modify the corresponding word or sentence or, in case of larger errors, remove the sentence. Thus, in-pipeline data sanity is ensured.
Another set of tools checks for the correctness of annotations and aids in manual cross-validation of the dependencies. As previously mentioned, during the validation stage, the data passes through validations which cover the Morph, POS, Chunking and Dependency stages individually. At these stages, the automatic error detection tool (Agarwal et al., 2012) is applied to reduce the effort of the manual validators. The error detection tool extracts the potential error cases from the data, and these are manually verified by the annotators, who make the necessary corrections wherever required.
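The difference-generator's count check described above amounts to comparing sentence and word totals before and after each module run; a minimal sketch (illustrative, not the project's actual script):

```python
def counts_unchanged(sentences_before, sentences_after):
    """Return True when both the sentence count and the total word count
    are identical before and after a module run, as the
    difference-generator requires."""
    def counts(sentences):
        return (len(sentences), sum(len(s.split()) for s in sentences))
    return counts(sentences_before) == counts(sentences_after)
```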

8. Quality Assurance
Once the data completely goes through the pipeline, it undergoes several stages of quality checking, where the complete data is scrutinized automatically for errors and the reported errors are corrected manually.
Of the several stages of quality assurance, the initial stages concentrate on ensuring the correctness of the SSF format of the sentences.

• Meta-data and CML correctness: These checks ensure that the header section maintains consistency as per the CML scheme. The tools check for the presence of specific meta-tags like the source and date of creation.
• Sentence repetition: Due to the good chance of crashes in the internal modules, sentences are prone to repeat, both inside a file and sometimes across files. These checks ensure that a proper sentence count is achieved.
• Feature structure error detection: These checks look for errors in the format of the Other property and check the validity of the values in the Category field.
• Morphological error detection: These checks ensure the validity of the values present in the af field of the Other property. Also, care is taken to ensure that the coarse POS tag and the Category POS tag agree with each other.

Another set of stages checks for the presence of errors in the dependency annotation of sentences.

• Dependency and forest checks: The sentences are checked for errors in the dependency annotations, like the validity of the type of dependency labels and the validity of both parent and child. Sentences are also checked to ensure that a sentence is not divided into multiple dependency trees, or forests, but that a single dependency tree exists.
• Cycle detection: The sentences are checked for the presence of looped dependencies, or cycles. The algorithm checks whether all the chunks are finally connected to the root, and reports if the root is not reached in a fixed number of iterations. Any cycles present are thus reported for correction.
• Pattern based dependency checks: The sentences are checked for the presence of specific patterns in the dependency relations of individual verbs. Cross-checking against lists of "should exist together" and "should not occur together" patterns is done, and violations are reported to the annotators for correction.

Thus the data is cleaned, and the quality of the data is ensured, before the corpora are locked as release-ready.

9. Status
The Hindi Treebank contains data from three domains - general news articles, tourism and heritage, and conversations as represented in short stories - while the Urdu Treebank has data only from newspaper articles. Table 8 shows the sizes of the two treebanks.

               HTB                    UTB
               Sentences  Words       Sentences  Words
News articles  17,882     395K        7,120      200K
Tourism        1,058      15K         -          -
Conversation   2,028      27K         -          -
Total          20,968     437K        7,120      200K

Table 8: Sizes of the two treebanks

10. Acknowledgements
The work done as part of the treebanking project has been supported by funds provided by the NSF. We also thank DEITY for providing the financial support for the construction of modules as part of the ILMT consortium. We heartily thank Bharat Ram Ambati for his initial contribution and dedication towards the establishment of the pipeline framework. We kindly thank Anil Kumar Singh for permitting us to use the Sanchay toolkit, without which the annotation of the treebank could not have taken place.

11. References
Rahul Agarwal, Bharat Ram Ambati, and Anil Kumar Singh. 2012. A GUI to detect and correct errors in Hindi dependency treebank. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 1907–1911, Istanbul, Turkey, May. European Language Resources Association (ELRA). ACL Anthology Identifier: L12-1439.
Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai, and Rajeev Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II.
Akshar Bharati, Vineet Chaitanya, Rajeev Sangal, and KV Ramakrishnamacharyulu. 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.
Akshara Bharati, Rajeev Sangal, and Dipti Misra Sharma. 2007. SSF: Shakti Standard Format guide.
Akshara Bharati, Dipti Misra Sharma, Samar Husain, Lakshmi Bai, Rafiya Begam, and Rajeev Sangal. 2009. AnnCorra: Treebanks for Indian languages, guidelines for annotating Hindi treebank.
Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Sharma, and Fei Xia. 2009. A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop, pages 186–189, Suntec, Singapore, August. Association for Computational Linguistics.
Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2003. The Prague dependency treebank. In Treebanks, pages 103–127. Springer.
Prudhvi Kosaraju, Bharat Ram Ambati, Samar Husain, Dipti Misra Sharma, and Rajeev Sangal. 2012. Intra-chunk dependency annotation: Expanding Hindi inter-chunk annotated treebank. In Proceedings of the Sixth Linguistic Annotation Workshop, pages 49–56, Jeju, Republic of Korea, July. Association for Computational Linguistics.
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Anil Kumar Singh and Bharat Ram Ambati. 2010. An integrated digital tool for accessing language resources. In LREC.

Improvised and Adaptable Statistical Morph Analyzer (SMA++)

Saikrishna Srirampur, Deepak Kumar Malladi, Radhika Mamidi
LTRC, IIIT-Hyderabad, Hyderabad, India
[email protected], [email protected], [email protected]

Abstract
Morph analyzers play an important role in almost all natural language applications. The morph analyzer we have developed, SMA++, is a data driven, statistical system. The system is a hybrid of two state of the art statistical morph analyzers (SMAs), viz. Morfette (Chrupała et al., 2008) and the SMA of Malladi and Mannem (2013). We chose a robust feature set, which is a hybrid of the features used in the above SMAs. Our system predicts the gender, number, person, case (GNPC) and the lemma (L). Training and testing were done using the lib-linear classifier. Some rich features, such as the morph tag of the previous token and the Part of Speech, were used. Our goal is to come up with the best SMA, one which beats both of the above SMAs. Our system is not language specific and can adapt to any language. Experimental results for the Indian language Hindi and sample results for Urdu are shown, while results for other languages like Telugu etc. are in progress. The results for Hindi reflect higher accuracies than both of the above mentioned state of the art SMAs.

Keywords: Morph Analyzer, Hybrid.

1. Introduction
Morphological analysis is the analysis of a word in terms of its lemma (L) and features such as gender (G), number (N), person (P), case (C), vibhakti¹, tense, aspect and modality. A tool used to predict the lemma and such features is called a Morph Analyzer (MA).
For Hindi, there are two major approaches to building morphological analyzers, viz. the paradigm based and the statistical approach. Paradigm based MAs (PBAs) are based on the concept of the paradigm class, which contains those words which decline or conjugate in exactly the same way. Given a word, first its paradigm class is identified; then the corresponding suffix/prefix additions and deletions are done using the paradigm table (which contains the inflected forms of the paradigm class) to get the required morphological features of the given word. This approach is language specific, and is restricted to only the words present in the paradigm-word dictionaries. The second approach, the statistical one, is data-driven. Using the training data, statistical models are formed. These models help to predict the morph-analysis of the test data. This approach is language independent and works for all words, including out of vocabulary (OOV) words.

2. Related Work
Traditionally, morphological analysis for Indian languages has been done using the paradigm based approach. There are two types of Paradigm Based MAs (PBAs), viz. the Oracle PBA (O-PBA), which uses an oracle to pick the best analysis from the list of all analyses given by the PBA, and the F-PBA, which picks the first analysis from the output as the correct analysis. The PBA by Bharati et al. (1995) is one of the most widely used Hindi MAs among the NLP researchers in the Indian community. The Goyal and Lehal (2008) and Kanuparthi et al. (2012) analyzers are two advanced forms of Bharati et al. (1995)'s analyzer. Kanuparthi et al. (2012) built a derivational morphological analyzer for Hindi by introducing a layer over the PBA. It identifies 22 derivational suffixes which help in providing derivational analysis for words whose suffix matches one of these 22 suffixes. There have not been major upgrades to PBAs in Hindi, and the problem of not predicting OOV words is still a significant one.
Data-driven approaches are not as diverse as the paradigm based ones. Very few attempts have been made at employing data-driven techniques on morphologically rich languages. Chrupała et al. (2008) and the SMA of Malladi and Mannem (2013) are notable state of the art SMAs which employed the data-driven approach.
In this work, we present a statistical MA which is a hybrid of the above mentioned SMAs. We trained our system on the Hindi Treebank (HTB) and have predicted the lemma, gender, number, person and case. The robust feature selection helped our system beat both of the above mentioned SMAs. The rest of the paper covers our approach, experiments, results, conclusions and future work related to the system.

3. Our Approach
Forming the class-labels for each of gender, number, person, case and lemma was the initial task. For gender, number, person and case, the class-labels were chosen from the training data itself.
For lemma, the class-labels were formed based on the edit-distance² operations required for the conversion of each token to its lemma. This idea was inspired by Chrupała

¹Vibhakti is a Sanskrit grammatical term that encompasses post-positionals and case endings for nouns, as well as inflection and auxiliaries for verbs. It is also referred to as case-marker.
²Edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

Feature              Description
Suffix Length        Length of the suffix.
morph previous       Predicted morph tag of the previous tokens.
word form            Lower cased word form.
character types      Character types such as numbers, lower and upper cased characters, symbols etc. in the current token.
next morph tags      Set of morph tags of the next token, if it is found in the training corpus.
word form previous   Lower cased word form of the previous token.
word form next       Lower cased word form of the next token.
length token         String length of the token.
POS                  The automatic Part Of Speech tag.

Table 1: Features.
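The Table 1 feature set can be assembled per token roughly as follows; the dictionary keys and boundary symbols are illustrative, not the authors' exact feature encoding:

```python
def extract_features(tokens, pos_tags, prev_morph_tags, i):
    """Build a feature dict for token i in the spirit of Table 1:
    lower-cased word forms of the token and its neighbours, token
    length, character-type flags, the automatic POS tag, and the
    predicted morph tag of the previous token."""
    word = tokens[i]
    return {
        "word_form": word.lower(),
        "word_form_previous": tokens[i - 1].lower() if i > 0 else "<s>",
        "word_form_next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        "length_token": len(word),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_upper": any(ch.isupper() for ch in word),
        "pos": pos_tags[i],
        "morph_previous": prev_morph_tags[i - 1] if i > 0 else "<s>",
    }
```

Such dictionaries would then be vectorized and fed to a lib-linear style classifier, as the paper describes.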

(2006), who introduced the concept of edit-operations³ for lemmatization. A token and its lemma are taken from the corpus and are reversed. The edit operations required to convert the reversed token to the reversed lemma represent a class-label. Doing this for each token forms the set of class-labels. The example shown below explains the process and the reason behind this approach.

Example:⁴ Take the token crying. The lemma for it is cry. Reverse the token and its lemma: crying becomes gniyrc and cry becomes yrc. To convert gniyrc to yrc we need to delete the characters at the 1st, 2nd and 3rd indices. Hence the edit operations would be [d 1, d 2, d 3], where 'd' represents the delete operation. [d 1, d 2, d 3] represents a class-label. Similarly, the class-label for the token playing and the lemma play would be [d 1, d 2, d 3].
If we did not reverse the token and lemma, the class-labels for the above two examples would not be the same. But we intend the class-labels for playing - play and crying - cry to be the same, because they share the common suffix -ing. Hence, we reverse the token and the lemma, and then construct the class-label. This approach suits suffix rich languages because, by reversing two tokens which have a common suffix, the relative position of the common suffix remains the same.
The feature set was formed by choosing the best mix of features from the two state of the art MAs mentioned above. Table 1 shows the feature set. The lib-linear classifier was used for predicting the outputs.

4. Experiments
The Hindi Treebank (HTB), released as part of the 2012 Hindi Parsing Shared Task (Sharma et al., 2012), was used to evaluate our models. All the models are tuned on development data and evaluated on test data. Table 2 presents the HTB statistics.

Data          #Sentences   #Words
Training      12,041       268,096
Development   1,233        26,416
Test          1,828        39,775

Table 2: HTB Statistics.

We also tested our models on the Urdu language. The Urdu Treebank (UTB), released as a part of the 2012 Proceedings of TLT (Bhat and Sharma (2012)), was used for evaluation. The experiments conducted for Urdu were on a small scale. Table 3 presents the UTB statistics.

Data       #Sentences   #Words
Training   5,700        159,743
Test       1,453        39,803

Table 3: UTB Statistics.

5. Results
The automatic POS was the largest contributor to the accuracy for all the attributes (lemma, gender, number, person and case). The morph previous feature played an important role in almost all the attributes. This is because Hindi has agreement⁵, which makes an attribute of a token percolate to the other related tokens. Word form, which is a lexical feature, also had a notable effect.
The results are presented for each of the 5 attributes (L, G, N, P, C) individually, as well as in combination. The outputs are compared with a baseline system. The baseline system takes the word form itself as the lemma and selects the most frequent value for the rest of the attributes. The outputs are compared with two versions of the PBA, because the PBA is a rule based analyzer which gives more than one analysis for each word; the O-PBA and F-PBA were used for the analysis of the results. The outputs are also compared with two state of the art SMAs, namely Chrupała et al. (2008)'s Morfette (M) and the Malladi and Mannem (2013) SMA (SMA-M). There are two representations of our system: SMA++* is the system without the automatic POS as a feature, and SMA++ is the one with the POS. We have two representations because Morfette (M) does not use POS as a feature, and we wanted to compare SMA++* with Morfette (M).
The results are placed in two sub-divisions: one for the overall data (Table 4), the other for Out Of Vocabulary

³The add, delete and replace operations required to convert one string to another.
⁴The example is in the English language, to give better clarity. The same methodology works for Hindi.
⁵Agreement, or concord, happens when a word changes form depending on the other words to which it relates.

Analysis      Test Data - Overall (%)
              Baseline   F-PBA   O-PBA   M       SMA++*   SMA-M   SMA++
L             71.12      83.10   86.69   94.14   98.08    95.84   98.43
G             37.43      72.98   79.59   95.05   95.05    96.19   96.21
N             52.87      72.22   80.50   94.09   94.24    95.37   95.47
P             45.59      74.33   84.13   94.88   95.00    96.38   96.28
C             29.31      58.24   81.20   93.91   94.20    95.32   95.43
L+C           16.46      48.84   72.06   88.56   92.63    91.39   94.01
G+N+P         23.05      61.10   73.81   88.36   88.85    91.11   90.36
G+N+P+C       9.72       45.73   70.87   84.43   86.95    87.78   88.51
L+G+N+P       20.27      53.29   66.28   83.44   87.71    87.51   89.26
L+G+N+P+C     8.57       38.25   63.41   79.73   84.21    84.25   85.87

Table 4: Hindi Results-Overall

Analysis      Test Data - Out Of Vocabulary (OOV) (%)
              Baseline   F-PBA   O-PBA   M       SMA++*   SMA-M   SMA++
L             78.10      82.08   82.48   90.30   90.11    89.51   93.07
G             60.22      43.07   44.06   72.03   73.68    82.65   83.11
N             69.60      44.53   47.56   84.89   84.89    90.44   92.81
P             78.30      52.51   53.89   84.76   86.21    94.85   96.17
C             43.50      31.40   47.36   80.21   82.06    88.52   89.45
L+C           32.52      28.50   44.66   72.89   74.72    79.09   82.92
G+N+P         47.49      35.75   38.58   62.33   65.76    76.52   77.24
G+N+P+C       21.04      20.91   35.95   55.74   62.07    69.99   72.36
L+G+N+P       44.72      34.63   38.46   57.85   61.28    69.13   72.82
L+G+N+P+C     19.33      19.92   38.49   51.52   55.61    63.06   65.96

Table 5: Hindi Results-OOV. M: Chrupała et al. (2008)'s Morfette; SMA-M: SMA of Malladi and Mannem (2013); SMA++*: our SMA without automatic POS as a feature; SMA++: our SMA with automatic POS as a feature.

Analysis   SMA-M   SMA++
G          89.14   93.79
N          91.62   95.66
P          93.37   97.07
C          85.49   90.92

Table 6: Urdu Results-Overall

The accuracies are the percentages of words in the test data with the correct morph analysis. Our system outperformed PBA, Morfette and SMA-M in almost all the considered combinations, and our improvements on the OOV section are even more pronounced. Tables 4 and 5 present the accuracies of the above-stated systems in comparison with our system. Table 6 shows the results for Urdu: our system outperforms SMA-M (Malladi and Mannem, 2013) in all the attributes G, N, P and C.

6. Conclusions and Future Work

SMA++ presents an improvised and hybrid version of the two state-of-the-art SMAs mentioned above. In Hindi, for L+G+N+P+C our system achieved an accuracy of 85.87% on the overall test data and 65.96% on the OOV test data, thereby outperforming those state-of-the-art SMAs. These results depict SMA++ as a rich tool for morph analysis of Indian languages. Work on other Indian languages such as Urdu, Telugu etc. is in progress. We plan to replace the automatic POS feature by clustering the tokens using Latent Dirichlet Allocation (LDA)6. In future, we would like to extend the statistical approach to syntactic parsers.

6 LDA is a generative probabilistic model for collections of discrete data such as text corpora. It can be used to model the tokens into various topics.

7. References

Bharati, Akshar, Chaitanya, Vineet, Sangal, Rajeev, and Ramakrishnamacharyulu, K.V. (1995). Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.

Bhat, Riyaz Ahmad and Sharma, Dipti Misra. (2012). A dependency treebank of Urdu and its evaluation. In Proceedings of the Sixth Linguistic Annotation Workshop, pages 157–165. Association for Computational Linguistics.

Chrupała, Grzegorz, Dinu, Georgiana, and Van Genabith, Josef. (2008). Learning morphology with Morfette.

Chrupała, Grzegorz. (2006). Simple data-driven context-sensitive lemmatization. Procesamiento del Lenguaje Natural, 37:121–127.

Goyal, Vishal and Lehal, Gurpreet Singh. (2008). Hindi morphological analyzer and generator. In Emerging Trends in Engineering and Technology, 2008 (ICETET '08), First International Conference on, pages 1156–1159. IEEE.

Kanuparthi, Nikhil, Inumella, Abhilash, and Sharma, Dipti Misra. (2012). Hindi derivational morphological analyzer. In Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, pages 10–16. Association for Computational Linguistics.

Malladi, Deepak Kumar and Mannem, Prashanth. (2013). Context based statistical morphological analyzer and its effect on Hindi dependency parsing. In Fourth Workshop on Statistical Parsing of Morphologically Rich Languages, volume 12, page 119.

Sharma, Dipti Misra, Mannem, Prashanth, van Genabith, Joseph, Devi, Sobha Lalitha, Mamidi, Radhika, and Parthasarathi, Ranjani, editors. (2012). Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages. The COLING 2012 Organizing Committee, Mumbai, India, December.

Challenges in Indian Language Transliteration: a case of Devanagari, Bangla and Manipuri

K. Kabi Khanganba, Girish Nath Jha Special Centre for Sanskrit Studies, JNU, New Delhi, 110067 [email protected], [email protected]

Abstract

The paper presents a Meitei-Bangla-Devanagari transliteration system and the challenges therein. Manipuri is a scheduled Indian language and was recently added to the Indian Language Corpora Initiative (ILCI) project being run in consortium mode at Jawaharlal Nehru University. The Manipuri group faced difficulty in keying in Manipuri data in the Meitei script, as there are very few people who know typing in this script. Therefore, the project needed a transliteration system which could convert text written in the Bengali script (which is known to most adult speakers) to Meitei. Automatic transliteration is a basic requirement for developing language technology in the diverse Indian language scenario. As most of the Indian scripts belong to the Brahmi family and have comparable sound systems, it is apparently not too difficult to create parallel arrays of UTF charset encodings for comparing and substituting corresponding values between a pair of scripts. However, in reality they pose considerable challenges. Meitei presents special substitution challenges due to a slightly different representation scheme followed for it in Unicode. Another complication is that, in the case of transliteration involving Meitei and another Indian language script (particularly from the north), we may be trying to substitute diverse sets, leading to one-to-many and many-to-one matches.

Keywords: Puya, Meitei Mayek, Vowel tones, Consonant Clusters, Interrupted and Uninterrupted Sounds, Transliteration Tool, Nuances

1. Introduction

Manipuri is a Tibeto-Burman language (a subdivision of the Sino-Tibetan language family) with a 1,466,705-strong speech community (census 2001) in the north-eastern Indian state of Manipur. It is also a lingua franca in the state, used as a language of convenience among the speakers of the other 33 dialects spoken in the state. It is also spoken in some neighboring Indian states as well as in Bangladesh and Myanmar. Manipuri was recognized as a scheduled language by the 71st amendment of the constitution in 1992. Now it is used up to the undergraduate level in the state and is taught up to the postgraduate level by some major universities of India.

During the reign of king Pamheiba (1709 A.D. to 1748 A.D.), with the adoption of Gaudiya Vaishnavism, the Meitei script was replaced by Bangla as the script of the darbar (the royal court).

Figure 1: A page from Chaitharol Kumbaba, N. Khelchandra Singh1

Manipuri was written in its own script, presently known as Meitei Mayek (also known as Meitei/Meetei or Manipuri script), till the early 18th century. Some scholars are of the opinion that the script has existed since the 1st century A.D., as per the Chaitharol Kumbaba (the Royal Chronicle; see Figure 1). Its earliest inscription is dated back to the 11th to 12th century (the King Kiyamba inscription, Tengnoupal).

Contacts with other literature and knowledge traditions, especially of Bengali and Sanskrit during the 18th and 19th centuries, brought major effects in the history of the Meitei script. New consonant signs representing foreign sounds and some consonant clusters in the very Bangla pattern (style) are found

1 http://tabish.freeshell.org/eeyek/history.html

in the manuscripts of these periods. Around 1000 Meitei Mayek written manuscripts have been discovered (Tomchou: 1991). Some important texts like Chaitharol Kumbaba have been published, and some are available in different libraries and museums in the state (Online catalogue: IGNCA).

The revival movement of the script started in the 1940s and 1950s. On the basis of the scripts found in the manuscripts of ancient texts, the Puyas, the Meitei Mayek Expert Committee reconstructed the present Meitei script, consisting of 27 alphabet letters (Iyek Ipi), 8 Lonsum Iyek, 8 Caitap Iyek, 3 Khudam Iyek and 10 Caishing Iyek (numerals). Most of the borrowed letters were not included, and some new letters which are not found in archaic Manipuri were added to represent all the sounds of present-day Manipuri. Later, on 16th April 1980, the Govt. of Manipur approved it (Order No. 1/2/78-SS/E) for introduction in school education. From the academic year 2006, the Bengali script has been replaced by Meitei Mayek in the school syllabus, with a view to upgrading it every year; it has now been upgraded up to the 10th grade.

2. Related Work

In the last decade, with the remarkable development in computational technology, many transliteration tools for modern Indian languages have been developed using different approaches. Though there are only a few works on the Meitei script, researchers at Manipur Institute of Technology, Imphal, Assam University, and CDAC, Mumbai, have made timely efforts on Meitei script transliteration and editors for multilingual data development.

In the present paper, we present a phoneme-based approach for a Meitei-Bangla-Devanagari transliteration tool which handles substitutions of diverse phoneme sets across these three Brahmi-based scripts.

3. The Meitei Script

We here refer to the Meitei scripts prescribed by the Govt. of Manipur (Order No. 1/2/78-SS/E), which have been used as a standard in our tool.
Meitei script fonts, namely RATHA, RATHA99, rathayek, Meetei Mayek and Eeyek Unicode, are available online for free. We used Eeyek Unicode2. Therefore, the Meitei characters in this paper will show correctly in the "Eeyek Unicode" font.

Like the Roman scripts, in the Meitei script also there are differences between the letter names and the sounds they represent. They are named Kok, Sham, Laay etc. 24 of the 27 Iyek Ipi letters are consonants. Until a vowel is substituted or succeeded, each consonant letter represents two sounds, i.e., the letter itself and the short vowel 'ɘ' after it, like Devanagari (in the context of Sanskrit). For example, the consonant letters ꯀ, ꯁ, ꯂ etc. represent the syllables ka, sa, la respectively, while the same letters represent the sounds k, s, l when they are concatenated with vowel modifiers, as in ꯀꯤ (ꯀ + ꯤ), ꯁꯨ (ꯁ + ꯨ), ꯂꯦ (ꯂ + ꯦ), where they represent the syllables ki, su, le. The remaining three letters of the 27 Iyek Ipi, i.e., ꯎ, ꯏ, ꯑ, are vowel letters. The Iyek Ipi are given in detail in Table 1 below.

As Bangla has been replaced, the demand for and the importance of the Meitei script has been growing in many other fields. However, since Bengali is still a dominating script, there is a need to convert existing texts into the native Meitei script for easy accessibility for the new generation.

Significantly, the Meitei script has been a part of the Unicode Standard from 2009 onwards, with the release of version 5.2. Unicode now has two blocks for the Meitei script, i.e. Meetei Mayek (range ABC0-ABFF) and Meetei Mayek Extensions (AAE0-AAFF). 23 letters were added in the Meetei Mayek Extensions for research on the historical orthographies of the Meitei script. However, these are not specified in the Manipur Govt. Order No. 1/2/78-SS/E.

Devanagari is one of the major scripts, used by ten other Indian languages including Sanskrit, which has 6000 years of intellectual history and millions of texts in fundamental disciplines. Therefore
converting Sanskrit-Devanagari texts to Meitei and Bangla will help extend their accessibility and research to newer language and script groups in the country.

2 http://tabish.freeshell.org/eeyek/

Name     Iyek Ipi   IPA          Bangla / Devanagari        ITRANS
Kok      ꯀ          /kɘ/         ক / क                      ka
Sham     ꯁ          /sɘ/, /ɕɘ/   শ, ষ, স, ছ / श, ष, स, छ      sha, Sa, sa, Ca
Laay     ꯂ          /lɘ/         ল / ल                      la
Meet     ꯃ          /mɘ/         ম / म                      ma
Paa      ꯄ          /pɘ/         প / प                      pa
Naa      ꯅ          /nɘ/         ণ, ন / ण, न                 Na, na
Cin      ꯆ          /cɘ/         চ / च                      ca
Til      ꯇ          /t̪ɘ/         ট, ত / ट, त                 Ta, ta
Khau     ꯈ          /kʰɘ/        খ / ख                      kha
Ngau     ꯉ          /ŋɘ/         ঙ / ङ                      ~Na
Thau     ꯊ          /t̪ʰɘ/        ঠ, থ / ठ, थ                 Tha, tha
Way      ꯋ          /ʋɘ/         ৱ / व                      va
Yaang    ꯌ          /jɘ/         য় / य                      ya
Huk      ꯍ          /ɦɘ/         হ / ह                      ha
Un       ꯎ          /u/          উ, ঊ / उ                   u
Ee       ꯏ          /i/          ই / इ                      I
Pham     ꯐ          /pʰɘ/        ফ / फ                      pha
Atiyaa   ꯑ          /ɘ/          অ / अ                      a
Gok      ꯒ          /gɘ/         গ / ग                      ga
Jham     ꯓ          /ɟʱɘ/        ঝ / झ                      Jha
Raai     ꯔ          /rɘ/         র / र                      ra
Baa      ꯕ          /bɘ/         ব / ब                      ba
Jil      ꯖ          /ɟɘ/         জ / ज                      ja
Dil      ꯗ          /d̪ɘ/         ড, দ / ड, द                 Da, da
Ghau     ꯘ          /ɡʱɘ/        ঘ / घ                      gha
Dhau     ꯙ          /d̪ʱɘ/        ঢ, ধ / ढ, ध                 Dha, dha
Bham     ꯚ          /bʱɘ/        ভ / भ                      bha

Table 1: Iyek Ipi Chart

Caitap Iyek   Independent Form   IPA    Bangla / Devanagari   ITRANS
ꯥ             ꯑꯥ                 /a:/   া / ा                  A
ꯤ             ꯏ                  /i/    ি / ि                  i
ꯨ             ꯎ                  /u/    ু / ु                  u
ꯦ             ꯑꯦ                 /e/    ে / े                  e
ꯩ             ꯑꯩ                 /əi/   ৈ / ै                  ai
ꯣ             ꯑꯣ                 /oː/   ো / ो                  o
ꯧ             ꯑꯧ                 /əu/   ৌ / ौ                  au
ꯪ (Nung)      DE3                       ং / ं                  ~M

Table 2: Caitap Iyek Chart

The Caitap Iyek consist of 8 letters (Table 2). Seven of these are vowel signs, except the letter Nung (ꯪ), which is a nasal sound. The first five (ꯥ, ꯦ, ꯩ, ꯣ and ꯧ) do not have independent letters, whereas ꯨ and ꯤ have ꯎ and ꯏ respectively. Their independent forms are formed by combining them with the letter ꯑ (Atiyaa = ɘ): 1. ꯑ + ꯥ = ꯑꯥ, 2. ꯑ + ꯦ = ꯑꯦ, 3. ꯑ + ꯩ = ꯑꯩ, 4. ꯑ + ꯣ = ꯑꯣ, 5. ꯑ + ꯧ = ꯑꯧ (A, e, ai, o, au).

The Khudam Iyek are given in Table 3:

Khudam Iyek   Name of Character   ITRANS
꯫             Caikhai             . (full stop)
꯬             Lum                 DE3
꯭             Apun                DE3

Table 3: Khudam Iyek Characters

3 Does not exist.

Beyond the letters discussed, there is a tonal feature of the vowel sounds. Each vowel has two tones: 'level' and 'falling' (Singh: 2000). Three examples each of the level and falling tones for each vowel are given below in italics (Table 4). Though the letter ꯬ (Lum) represents the falling tone, it is not presently used; it is not found in the modern Manipuri language corpora. Hence, a vowel letter can carry both the level and the falling tone. The level tone of /a:/ in kAba / ka:bə (to be over-fried) and the falling tone of /a:/ in kAba / ka:bə (to climb) are both represented by ꯥ (/a:/) (see Table 4 for examples with the other vowels). Therefore, the level and falling tones of a vowel letter must be decided by the context. There are many homographs in Manipuri which are written the same way (a common character representation in writing) but have different pronunciation and meaning.

Word      ITRANS     Level tone         Falling tone
ꯅꯡꯕ       naN^ba     to be poor         to be thickened
ꯂꯡꯕ       laN^ba     to be bright       to offend
ꯄꯡꯕ       paN^ba     to be dull         to be sharp
ꯁꯤꯡꯕ      ShiN^ba    to be wise         to revenge
ꯊꯤꯕ       thiba      to search          to be ugly
ꯄꯤꯕ       pibA       a son              to give
ꯃꯀꯨ       maku       an owl             a cover
ꯂꯨꯕ       luba       to be deep         to be clear
ꯇꯨꯕ       tuba       to slip            to stretch
ꯂꯦꯡꯕ      leN^ba     to thread          to move
ꯆꯦꯟꯕ      Cenba      to drag            to run
ꯇꯦꯝꯕ      Temba      to equalise        to name
ꯀꯥꯕ       kAba       to be over fried   to climb
ꯀꯥꯏꯕ      kAyba      to shrink          to be broken
ꯄꯥꯏꯕ      pAyba      to fly             to hold
ꯊꯣꯟꯕ      thonba     to dump            to be thick
ꯂꯣꯟꯕ      lonba      to boil            to knit
ꯀꯣꯟꯕ      konba      to hug             to bend
ꯉꯧꯕ       N^auba     to be white        to fry
ꯂꯧꯕ       lauba      to roast           to take
ꯇꯧꯕ       tauba      to do              to dig

Table 4: Examples of Level and Falling tones

The 8 Lonsum Iyek are independent consonant letters which are not associated with the short vowel 'ɘ'. For example, 'ꯀ' (Kok) represents 'k' plus the inherent vowel, while 'ꯛ' (Kok Lonsum) represents only 'k'; it is the isolated, pure consonant form of 'ꯀ' (Table 5). Lonsum letters often occur at the end of a syllable or a word and are not used in forming a conjunct consonant. Apart from these, consonants do not have isolated forms; in order to write other consonants in isolation, the Khudam Iyek ꯭ (Apun Khudam) is used. Amongst the Lonsum Iyek, ꯢ (I-Lonsum = y) is only a semi-vowel, as it is derived from the vowel ꯏ (I / i).

Consonant with vowel 'ɘ'   In isolation   IPA    Bangla / Devanagari   ITRANS
ꯀ                          ꯛ              /k/    ক্ / क्                 k
ꯂ                          ꯜ              /l/    ল্ / ल्                 l
ꯃ                          ꯝ              /m/    ম্ / म्                 m
ꯄ                          ꯞ              /p/    প্ / प्                 p
ꯅ                          ꯟ              /n/    ণ্, ন্ / ण्, न्            N, n
ꯇ                          ꯠ              /t̪/    ট্, ত্ / ट्, त्            T, t
ꯉ                          ꯡ              /ŋ/    ঙ্ / ङ्                 ~N
ꯏ                          ꯢ              /j/    য্ / य्                 y

Table 5: Lonsum Iyek Chart

A Lonsum Iyek comes at the end of a syllable/word, and only these can occur at the end of a word in Manipuri. Unlike Bangla or Devanagari, it is the distinctive feature of the Lonsum Iyek that they represent only those isolated consonants which can be interrupted or paused. If there can be no such interruption in the utterance, an isolated consonant is represented with the Apun Iyek (꯭).

Interrupted Utterance   Bangla     ITRANS      Uninterrupted Utterance          Bangla        ITRANS
ꯄꯥꯝꯕꯣꯝ                 পাম্বোম     pAmbom      ꯂꯩꯃꯔ꯭ꯥꯝ                          লৈম্রাম        laimrAm
ꯉꯛꯁꯝ                   ঙক্ষম       ~NakSham    ꯍꯩꯀꯔ꯭ꯨ                           হৈক্রূ          haikrU
ꯂꯥꯛꯐꯝ                  লাকফম      lAkpham     ꯂꯩꯀꯔ꯭ꯛ                           লৈক্রক         laikrak
ꯀꯞꯄ                    কপ্প        kappa       ꯄꯔ꯭ꯨꯞ-ꯄꯔ꯭ꯨꯞ                      প্রূপ-প্রূপ      prUp-prUp
ꯂꯦꯡꯖꯨꯝ                 লেঙজুম     le~Njum     ꯉꯔ꯭ꯡ-ꯉꯔ꯭ꯡ (also ꯉꯔꯡ-ꯉꯔꯡ)        ঙ্রং-ঙ্রং        ~Nra~N-~Nra~N

Table 6: Examples of Interrupted and Uninterrupted Isolated Consonants

In the examples ꯄꯥꯝꯕꯣꯝ and ꯂꯩꯃꯔ꯭ꯥꯝ, both ꯝ and ꯃ are the same consonant representing the same sound 'm'. For Devanagari or Bangla there is only one process, "adding the ম to the consonants ব and র" respectively, i.e., ম্ব and ম্র (mb and mr). In the case of Meitei Mayek, two different letters are used to show the interrupted and uninterrupted sounds of the consonant 'm'. The Apun Iyek is the indicator of an isolated consonant: wherever it comes, it means that the first letter is isolated; it isolates a dependent consonant from the vowel 'a'.
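The Lonsum-versus-Apun choice just described is mechanical enough to sketch in code. The following is our own simplified illustration, not the paper's tool: the class and method names are invented, the map covers only the eight letters of Table 5, and the decision is reduced to a single "interrupted" flag:

```java
import java.util.Map;

public class IsolatedConsonant {

    // Full letter -> Lonsum form for the eight letters that have one (Table 5).
    static final Map<Character, Character> LONSUM = Map.of(
            '\uABC0', '\uABDB',  // ꯀ (k)  -> ꯛ
            '\uABC2', '\uABDC',  // ꯂ (l)  -> ꯜ
            '\uABC3', '\uABDD',  // ꯃ (m)  -> ꯝ
            '\uABC4', '\uABDE',  // ꯄ (p)  -> ꯞ
            '\uABC5', '\uABDF',  // ꯅ (n)  -> ꯟ
            '\uABC7', '\uABE0',  // ꯇ (t)  -> ꯠ
            '\uABC9', '\uABE1',  // ꯉ (ng) -> ꯡ
            '\uABCF', '\uABE2'); // ꯏ (i)  -> ꯢ (semi-vowel y)

    static final char APUN = '\uABED'; // Apun Iyek, the isolation/cluster marker

    // Write a bare (vowel-less) consonant: use the Lonsum form for an
    // interrupted (pausable) sound, otherwise full letter + Apun.
    static String isolate(char full, boolean interrupted) {
        Character lonsum = LONSUM.get(full);
        if (interrupted && lonsum != null) {
            return String.valueOf(lonsum);
        }
        return "" + full + APUN;
    }

    public static void main(String[] args) {
        System.out.println(isolate('\uABC3', true));  // interrupted m   -> ꯝ
        System.out.println(isolate('\uABC3', false)); // uninterrupted m -> ꯃ꯭
    }
}
```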

Figure 2: The tool on the server

Meitei Mayek   Bangla     ITRANS
ꯈꯋ꯭ꯥꯡ          খ্বাং       Kwa~N
ꯆꯠꯀꯗꯔ꯭         চত্‍কদ্র     catkadra
ꯍꯥꯏꯕꯔ꯭         হাঈব্র      hAybra
ꯇꯧꯈꯔ꯭ꯦ         তৌখ্রে      tauKre
ꯁꯇ꯭ꯦꯇ          স্তেত       stet
ꯕꯂ꯭ꯣꯀꯦꯇ        ব্লোকেদ     bloked

With a thorough linguistic study of the Manipuri sound system, we tried to make relevant representations of Meitei in Bangla and Devanagari. Most of the Indian scripts have the same sound systems and parallel letters, as they have derived from Brahmi. Manipuri uses a smaller number of sounds in comparison to Devanagari and Bangla.

Table 7: Examples of Isolated Consonants Denoted with the Apun Iyek

4. Meitei-Bengali-Devanagari to and from transliteration

The tool is developed on the Java platform. It is a web application running on an Apache Tomcat web server. It is available online on our website http://sanskrit.jnu.ac.in/ile/index.jsp.

All the Meitei characters have been compared with Bangla and Devanagari, and parallel arrays have been created. In the case of unmatched letters, the closest letter in the target language has been substituted. Conditional statements are also used to control string processing for language-specific nuances.

if (multilingual_word.equals("Manipuri")) {
    new_word += MeiteiMayek[j];
    break;
}
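To make the parallel-array substitution and the cluster handling concrete, here is a simplified, self-contained sketch of our own. The abridged arrays and method names are illustrative assumptions, not the tool's actual code; the real tool holds the full charts of Tables 1-3 and many more conditional statements:

```java
import java.util.HashMap;
import java.util.Map;

public class MeiteiSketch {

    // Abridged parallel arrays: the Bangla string at index i corresponds
    // to the Meitei Mayek string at the same index i.
    static final String[] BANGLA = {"\u0995", "\u0996", "\u09B2", "\u09AE", "\u09BE", "\u09BF"};
    static final String[] MEITEI = {"\uABC0", "\uABC8", "\uABC2", "\uABC3", "\uABE5", "\uABE4"};

    static final char HALANTA = '\u09CD'; // Bangla cluster marker (middle position)
    static final char APUN = '\uABED';    // Meitei cluster marker (final position)

    // Reorder C1 + halanta + C2 into C1 + C2 + halanta, so the marker
    // ends up where Meitei expects it (cf. Table 8); the halanta itself
    // is then substituted by Apun in the character pass below.
    static String moveClusterMarker(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == HALANTA && i + 1 < s.length()) {
                out.append(s.charAt(++i)).append(HALANTA);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Character-by-character substitution over the parallel arrays;
    // unmatched characters pass through unchanged.
    static String transliterate(String bangla) {
        Map<Character, String> map = new HashMap<>();
        for (int i = 0; i < BANGLA.length; i++) map.put(BANGLA[i].charAt(0), MEITEI[i]);
        map.put(HALANTA, String.valueOf(APUN));
        StringBuilder out = new StringBuilder();
        for (char c : moveClusterMarker(bangla).toCharArray()) {
            out.append(map.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("\u0995\u09BF")); // কি -> ꯀꯤ
    }
}
```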

String[] MeiteiMayek = {
    "\uABC0", "\uABC8", "\uABD2", "\uABD6", "\uABD7", "\uABD9", "\uABD0",
    "\uABC0", "\uABC8", "\uABD2", "\uABD6", "\uABD7", "\uABD9", "\uABD0",
    "\n", "\uABD1", "\uABD1\uABE5", "\uABCF", "\uABCF", "\uABE8", "\uABE8",
    "\uABD1\uABE6", "\uABD1\uABE9", "\uABD1\uABE7", "\uABD4\uABE4",
    "\uABD4\uABE4", "\uABC2\uABD4\uABED\uABE4", "\uABC2\uABD4\uABED\uABE4",
    "\uABD1\uABE3", "\uABD1\uABE3", "\uABD1\uABE6", "\uABCF", "\uABC0",
    "\uABC8", "\uABD2", "\uABD8", "\uABC9", "\uABC6", "\uABC1", "\uABD6",
    "\uABD3", "\uABC5", "\uABC7", "\uABCA", "\uABD7", "\uABD9", "\uABC5",
    "\uABC7", "\uABCA", "\uABD7", "\uABD9", "\uABC5", "\uABC4", "\uABD0",
    "\uABD5", "\uABDA", "\uABC3", "\uABCC", "\uABD4", "\uABC2", "\uABCB",
    "\uABC1", "\uABC1", "\uABC1", "\uABCD", "\uABE5", "\uABE4", "\uABE4",
    "\uABE8", "\uABE8", "\uABED\uABD4\uABE4", "\uABED\uABD4\uABE4",
    "\uABE4", "\uABE4", "\uABE6", "\uABE9", "\uABE3", "\uABE3", "\uABE3",
    "\uABE7", "", "\uABEA", "\uABEA", "\uABCD", "\uABEB", "\uABF0",
    "\uABF1", "\uABF2", "\uABF3", "\uABF4", "\uABF5", "\uABF6", "\uABF7",
    "\uABF8", "\uABF9", "\uABD1\uABE5\uABEA", "\uABE5\uABEA",
    "\uABE9\uABE9", "|"};

6. Challenges in mapping

The consonant finals, or the consonants in isolation (the Lonsum Iyek), are significant in Manipuri. The appropriate letter for a Lonsum letter is a consonant with the ্ (halanta), as shown above. However, as in most modern Indian languages, the halanta is not used to indicate a consonant final. This creates problems in one-to-one mapping for consonants.

Another challenge comes from cluster conversion. The character Apun Iyek is used at the end of a cluster to identify that the letters are in a cluster.

B/D cluster   Values in Engine (Machine)   Corresponding consonants in Meitei script
গ্ন            গ + ্ + ন                     ꯒ + ꯅ + ꯭ (= ꯒꯅ꯭)
দ্র            দ + ্ + র                     ꯗ + ꯔ + ꯭ (= ꯗꯔ꯭)
ব্র            ব + ্ + র                     ꯕ + ꯔ + ꯭ (= ꯕꯔ꯭)

Table 8: Values for the Cluster Consonants

As shown here, for Bangla (and Devanagari) the cluster indicator sits between the two consonants, while for Manipuri it comes at the end. So it cannot be matched either by the phoneme substitution or by the conditional statements alone.

Indigenous Manipuri does not have retroflex sounds, and the current script also does not have any retroflex character, though some scholars have tried to argue in support of having such sounds; they give the example of the word ShaN (a cow/bull). Since the older Meetei Mayek standard in Unicode did not have the retroflex characters, the retroflex characters from the source text were replaced with dentals. However, in the newer Unicode extension, the retroflex characters have been included.

7. Conclusion

A thorough study of the Meitei, Bangla and Devanagari scripts helps us identify the corresponding or nearest letter for substitution in automatic transliteration. Meitei script has fewer characters compared to Bangla and Devanagari, thus creating a many-to-one mapping situation from Bangla/Devanagari to Manipuri and a one-to-many situation in the reverse case. This feature of Meitei, and the fact that it uses special conventions to identify clusters and lone consonants, creates challenges for mapping.

The Unicode Meetei Mayek Extensions mainly contain characters which correspond to diverse sounds in the Sanskrit/Bangla alphabet. This extension can be applied for developing an unambiguous transliteration system, using them as internal coding values. However, since a majority of Manipuri adult speakers use the Bangla script, we need to create a converter from it to the Manipuri script so that corpora creation in the language can continue.

8. References

Alan Wood (2012). Test for Unicode support in Web browsers: Meetei Mayek. Retrieved from http://www.alanwood.net/unicode/meetei-mayek.html.

Akanksha Bansal, Esha Banerjee, Girish Nath Jha (2013). Corpora creation for Indian Language Technologies – the ILCI project. In Zygmunt Vetulani & Hans Uszkoreit (Eds.), Human Language Technologies as a Challenge for Computer Science and Linguistics (Proceedings of the 6th LTC) (pp. 253-257). Poznań, Poland: Fundacja Uniwersytetu im. A. Mickiewicza.

Chelliah, S.L. (1990). Level-ordered morphology and phonology in Manipuri. Linguistics of the Tibeto-Burman Area, vol. 13, no. 2 (pp. 27-72).

Chungkham Yashwanta Singh (2000). Manipuri Grammar. New Delhi: Rajesh Publications.

Grierson, G.A. and Konow (1967). Tibeto-Burman Family. Linguistic Survey of India, Vol. III. Delhi: Motilal Banarsidass.

Inder Singh (1975). Manipuri Phonetic Reader. Mysore: Central Institute of Indian Languages.

Allen, Julie D. (2009). Unicode 5.2.0. Retrieved from http://www.unicode.org/versions/Unicode5.2.0/ch10.pdf.

Kishorjit Nongmeikapam, Ningombam Herojit Singh, Bishworjit Salam, Sivaji Bandyopadhyay (2011). Transliteration of CRF Based Multiword Expression in Manipuri: From Bengali Script Manipuri to Meitei Mayek (Script) Manipuri. International Journal of Computer Science and Information Technologies, Vol. 2 (4) (pp. 1441-1447).

Mutua Bahadur (2010). Illustrated Manuscripts of Manipur. Retrieved from E-Pao Picture Gallery: http://www.e-pao.net/epGallery.asp?id=1&src=Arts_Dances/Mutua_Bahadur/MBManuscript2010_3.

Naorem Sanajaoba (1991). Minu Leima Lol, Meitei Alphabet (Meitei Mayek), the Outlines of the Constitutional History of Manipur. Manipur: Past and Present, Vol. II. New Delhi, Mittal Publications.

N. Khelchandra Singh (1975). Manipuri Language: Status and Importance. Imphal: N. Tombi Singh Publication.

Pravabati Chingangbam and Tabish (2011). Meetei Mayek. Retrieved from http://tabish.freeshell.org/eeyek.

Radhabinod Aribam Sharma (2010). History of Meetei Mayak. Retrieved from http://manipurihistory.wordpress.com/2010/02/12/history-of-meetei-mayak.

Thoudam Doren Singh (2012). Bidirectional Bengali Script and Meetei Mayek Transliteration of Web Based Manipuri News Corpus. South and Southeast Asian Natural Language Processing (SANLP) (pp. 181–190).

Indian languages on the TypeCraft platform – the case of Hindi and Odia

Girish Nath Jha, Lars Hellan, Dorothee Beermann, Srishti Singh, Pitambar Behera, Esha Banerjee Jawaharlal Nehru University, New Delhi, India Norwegian University of Science and Technology, Trondheim, Norway (girishjha, singhsriss, pitambarbehera2, esha.jnu)@gmail.com (lars.hellan,dorothee.beermann)@ntnu.no

Abstract

This paper describes a plan for creating an aligned valence- and construction repository for Indian languages, starting with Hindi and Odia. The project will be a collaboration between the ILCI group at Jawaharlal Nehru University, India and the TypeCraft (TC) group at NTNU. In the first phase of the project, our task is to find a suitable theoretical framework for annotation at valence and construction level. In the second phase of the project (if the data download from the govt. of India data center site is opened to all), we will include a data portability and exchange module which will facilitate data import/export between the TC and the ILCI repositories.

Keywords: TypeCraft, valence, database, Indian languages, ILCI

1. The project

This paper describes a plan for creating an aligned valence- and construction repository for Indian languages, starting with Hindi and Odia. The project will be a cooperation between the ILCI group at Jawaharlal Nehru University, India, and the TypeCraft (TC) group at NTNU, Norway. In the first phase of the project, our task is to find a suitable theoretical framework for annotation at valence and construction level. In the second phase of the project (if the data download from the govt. of India data center site is opened to all), we will include a data portability and exchange module which will facilitate data import/export between the TC and the ILCI repositories.

In the following we first describe the two platforms involved, then we discuss the notion of valence and ways of representing this in a repository. We then exemplify the tasks involved through examples of annotations of the two languages.

2. The ILCI platform

The Indian Languages Corpora Initiative (ILCI) project was initiated by the Technology Development for Indian Languages (TDIL) program of the Department of Electronics and Information Technology (DeitY), Govt. of India, in 2009 to facilitate parallel corpus building for the scheduled languages of India, including English (Jha, 2010). A consortium of major universities and institutions was formed, with members from each state representing the state language. Urdu and English, being pan-Indian languages, were included with Hindi at Jawaharlal Nehru University, New Delhi, which also acted as the Consortium Leader. Phase 1 of the project, which ended in 2012, saw the development of 50,000 parallel, manually translated and Part Of Speech (POS) annotated sentences in 12 languages, including the source language Hindi, in the domains of Health and Tourism (Choudhary and Jha, 2011). The project is currently in Phase 2, which also saw the inclusion of 5 new languages, most notably from the Tibeto-Burman language family of north-east India. The aim of the current phase is to add 50,000 new sentences in each of the 17 languages in the domains of Agriculture and Entertainment (Bansal et al., 2013). The data is available for use for research purposes at tdil-dc.in.

The Bureau of Indian Standards (BIS) standardized a POS tagset scheme in 2011, based on the Indian Languages Machine Translation (ILMT) guidelines, which is used to annotate the ILCI corpus. This tagset employs a layered hierarchical schema, though it is built at a coarser level and uses categories of only two levels. It has been developed keeping in mind the various characteristics of languages spread across the different language families of India; a human tagger is free to choose from among them the POS tag which best fits his language. The tagset has 11 broad lexical categories, which contain sub-levels, e.g. Personal, Deictic, Indefinite, Reflexive etc. under the category Pronoun (Nainwani et al., 2011). The collection of corpus, translation and annotation tasks are carried out by the use of online tools available at sanskrit.jnu.ac.in. The annotation tool (ILCIANN) uses prior information to facilitate a semi-automated process of tagging (Kumar et al., 2012).


Figure 2: Valence-related annotation in TypeCraft

Figure 1: ILCI platform

4. Annotating for valence

The notion of valence as such, originally proposed by Tesnière (1988), represents a multiplicity of factors which could be summarized as follows

(1)
- syntactic argument structure (or valence): whether there is a subject, an object, a second/indirect object, etc., referred to as grammatical functions;
- semantic argument structure, that is, how many participants are present in the situation depicted, and which roles they play (such as 'agent', 'patient', etc.);
- linkage between syntactic and semantic argument structure, i.e., which grammatical functions express which roles;
- identity relations, part-whole relations, etc., between arguments;
- aspect and Aktionsart, that is, properties of the situation expressed by a sentence in terms of whether it is dynamic/stative, continuous/instantaneous, completed/ongoing, etc.;
- type of the situation expressed, in terms of some classificatory system of situation types.

3. The TypeCraft platform

TypeCraft (Beermann and Mihaylov 2013) is a linguistic service featuring a multi-lingual database and an online Interlinear Glosser which, in addition to morpheme- and word-level annotations, allows phrase-level tagging. Figure 2 shows a Ga IGT seen from inside the TypeCraft (TC) linguistic editor. The Editor uses the standard tier format for interlinear glossing. In addition, phrase-level annotation, here called Construction Labeling, can be added through the use of an additional annotation matrix, shown below the IGT. The rightmost part of the screenshot, furthermore, shows drop-down menus for 8 named phrasal parameters. In this way TC allows the harvesting of valence information in a linguistic environment designed for the manual annotation of data, especially from lesser described languages.
TC 2.0, which is at present under development, will allow the import of data from other linguistic platforms (Bouda & Beermann) directly into TypeCraft, which can then be used as a tool that allows the easy addition of valence annotation to already structured data.

The main state-of-the-art analytic mechanism covering phenomena like those in (1) is that of attribute-value matrices (AVMs), exemplified by (2) below for a sentence like The boy eats the cake, displaying grammatical functions (GF), Aktionsart (AKTRT), and semantic participants (as 'ACTNTS', with sub-attributes 'ACT1', 'ACT2', with semantic roles indicated), and with interlinking between the referential indices associated with the grammatical functions and those associated with the participants. The value of an attribute may open for other attributes recursively, by which the formalism is powerful enough to accommodate parameters like those

in (1).

(2)
[ HEAD    verb
  GF      [ SUBJ [ INDX 1 ]
            OBJ  [ INDX 2 ] ]
  ACTNTS  [ PRED 'eat'
            ACT1 1 [ ROLE agent ]
            ACT2 2 [ ROLE aff-increm ] ]
  AKTRT   accomplishment ]

(4)
v     - - -   [ HEAD verb ]
tr    - - -   [ GF     [ SUBJ [ INDX 1 ]
                         OBJ  [ INDX 2 ] ]
                ACTNTS [ ACT1 1
                         ACT2 2 ] ]
suAg  - - -   [ GF [ SUBJ [ INDX [ ROLE agent ] ] ] ]

There are essentially two linguistic approaches using obAffincrem - - - GF  OBJ  INDX ROLE aff-increm   this general formalism which have been able to define grammar analysis systems amenable to automatic parsing ACCOMPLISHMENT - - - AKTRT accomplishment  and readily applied to a multiplicity of languages, viz. Lexical Functional Grammar (LFG)1 and Head-driven Phrase Structure Grammar (HPSG)2. Both approaches As for the codes available from the drop-down menus are rooted in unificational constraint/information-based indicated in fig. 2, there are no such explicitly defined grammar design, and both have decent-coverage correspondences so far, but they can be provided in similar for around 10-12 languages in Europe and Asia fashion. The most fine-grained labels in this format are (with 2/3 of the languages in common). LFG grammars in those of ‘Syntactic argument structure’ (SAS), which is particular share a format of representation called exemplified in Table 1 with some English sentences. In this f-structure, where grammatical functions and other system, arguments of the verb are identified, by linear order, syntactic factors in (1) are exhibited, while HPSG in terms of POS-based syntactic category and some grammars share a semantic representation format called functional specifications such as ‘predicative’ and Minimal Recursion Semantics (MRS; Copestake et al. infinitival control properties: 2005), where every sentence is rendered on a form reminiscent of Predicate logic. 
The goal of a valence annotation project is obviously EXPL It rains not to build computational grammar fragments; EXPL+APpred+S It is nice that you came nevertheless an understanding of how any given valence EXPL+INF It starts to dawn EXPL+NP There sits a mouse annotation relates (or ‘maps’) to the kind of structures EXPL+NP+Sdecl It surprises me that he came defined in frameworks like those mentioned contributes EXPL+NP+SquestWH It eludes me who came to annotation consistency, both formally and in the EXPL+NPpred+Sdecl It is a success that he came application of valence-related notions. We illustrate these NP John sits points relative to the two formats of valence annotation NP+APpred He is cute exemplified in fig. 2. The one called ‘Construction NP+INF He tries to run Labeling’ has a special format of hyphenated strings, an NP+INF:equiSBJ He tries to run example being (3) below, which could be used for NP+INF:raisingSBJ He seems to run annotation of The boy eats the apple. Each hyphenated NP+NP He sees the danger NP+NP+APpred She makes the city happy expression can be read as a property of the construction NP+NP+INF:equiOBJ She asks John to run annotated, so that ‘v’ means verb-headed, ‘tr’ means NP+NP+INF:equiSBJ She promises John to run ‘transitive’, ‘suAg’ means that the subject is an agent, NP+NP+INF:raisingOBJ She expects John to come 3 etc. 
NP+NP+NP She gives the boy a house NP+NP+NPpred She calls him a hero (3) v-tr-suAg_obAffincrem-ACCOMPLISHMENT NP+NP+PP She tells a story about him NP+NP+PP+PP She tells a story to Peter about Pi This hyphenation notation can be formally construed as NP+NP+PRTPpred:as She regards him as a hero unification, and if each constituent label receives a ‘local’ NP+NP+Sdecl She tells him that he has lost AVM definition like in (4), then the operation of NP+NP+SquestWH She tells him who lost unification applied across the string will yield the feature NP+NP+SquestYN She tells him whether he has lost 4 NP+NPpred He is a fool structure shown in (2). NP+NPrefl He perjures himself NP+NPrefl+NPpred He considers himself a hero Sdecl+NP That he came surprises me

1 Cf. Bresnan 2001. Table 1. Examples of SAS specifications for English 2 Cf. Pollard and Sag 1994, Copestake 2002. 3 For extensive definitions of the code used, see (Hellan and An alternative view on constructions can be presented in Dakubu 2010). terms of notions like ‘transitive’, ‘intransitive’, and such, 4 See (Hellan 2010, Hellan and Dakubu 2009, Hellan and and through its 8 construction parameters with drop-down Beermann 2011).
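The unification reading of the hyphenated labels in (3)-(4) can be sketched in a few lines of code. This is a minimal illustration, not TypeCraft's or any grammar system's implementation: nested dicts stand in for AVMs, and a TAG key is an illustrative stand-in for the reentrancy indices written as 1 and 2 in (2) and (4).

```python
import re

def unify(a, b):
    """Recursively merge two nested-dict AVMs; conflicting atoms fail."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        if a == b:
            return a
        raise ValueError(f"unification failure: {a!r} vs {b!r}")
    out = dict(a)
    for key, val in b.items():
        out[key] = unify(out[key], val) if key in out else val
    return out

# Local AVM definitions paraphrasing (4); TAG approximates the indices 1, 2.
LEXICON = {
    "v": {"HEAD": "verb"},
    "tr": {"GF": {"SUBJ": {"INDX": {"TAG": 1}}, "OBJ": {"INDX": {"TAG": 2}}},
           "ACTNTS": {"ACT1": 1, "ACT2": 2}},
    "suAg": {"GF": {"SUBJ": {"INDX": {"ROLE": "agent"}}}},
    "obAffincrem": {"GF": {"OBJ": {"INDX": {"ROLE": "aff-increm"}}}},
    "ACCOMPLISHMENT": {"AKTRT": "accomplishment"},
}

def analyze(label_string):
    """Unify the local AVMs of each part of a hyphenated label like (3)."""
    result = {}
    for part in re.split(r"[-_]", label_string):
        if not part:          # skip empty slots from doubled hyphens
            continue
        result = unify(result, LEXICON[part])
    return result

avm = analyze("v-tr-suAg_obAffincrem-ACCOMPLISHMENT")
```

Running `analyze` on the string in (3) yields a structure with HEAD verb, a SUBJ index bearing the role agent, an OBJ index bearing aff-increm, and AKTRT accomplishment, i.e. the substance of (2) minus the lexeme-specific PRED value.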

menus, TC seeks to provide the mosaic of annotation specifications that can together give an adequate representation of the argument structure of a sentence. In this respect, the Construction Labeling system illustrated in (3) and (4) may be seen as presenting as many constructional specifications as possible in a single string; which format one prefers may depend on the purpose of the specification.

Orthogonal to the issue of the formal definition of an annotation system is the question of which valence/construction types are relevant for Indian languages. It is not a given that all the types exemplified in Table 1 are relevant for Indian languages, and it is well known that Indian languages contain types that go far beyond the inventory one would establish for English. In parallel to the task of actually annotating a chosen assembly of sentences from Hindi and Odia, the project thus undertakes the task of identifying what may be called a 'valence profile' of the two languages.[5]

The next sections describe some salient properties of these languages, and also exemplify the annotation of sentences using TC annotation code, not only for valence properties but also for morphological annotation.

5. The Odia language

Odia, recently declared a classical language of India and earlier a Scheduled Language, is descended from the Indo-Aryan language family. Since it belongs to this family, it bears almost all the typical characteristics of its proto-language, such as a basic S+O+V sentence structure, rich morphology with PN and TAM features, serial verb constructions, ECV and causative constructions, and conjunct verbs, and it is enriched with collocations and metaphoric usage of language. We here exemplify five features of the language, illustrated with our TC annotations.

A. SOV word order
This constituent order is exemplified in (5), showing in the first line the 'global' properties, then the string, then a free translation, and then the table with morphological analysis:

(5) imperfective-declarative----directedMotion-
ସେ ମାଠିଆ ଭିତରକୁ ଚାହିଁଲା
"he peeped into the pitcher"

ସେ                ମାଠିଆ         ଭିତର କୁ          ଚାହିଁଲା
SBJ.NOM.AGT.3SG   pitcher.ACC   inside.LOC to   peep.3SG.PAST
PN                Ncomm         PPOST           V
Generated in TypeCraft.

This word order is generally followed, but SVO and VSO constructions are possible and acceptable, as when the order of the above example is reversed accordingly:

(6)
a. Se cahiMlA mAthiA bhitaraku
   SUBJECT VERB OBJECT
b. cahiMlA Se mAthiA bhitara-ku
   VERB SUBJECT OBJECT

B. Agreement
Verbs agree with their subjects in person and number, so sentences can appear without subjects. In Odia, though lexical gender is present, there is no grammatical gender.

(7)
ସେ ପାଣି ପିଇବାକୁ ଚେଷ୍ଟା କଲା
"he tried to drink water"

ସେ           ପାଣି        ପିଇବା କୁ        ଚେଷ୍ଟା କଲା
he.3SG.NOM   water.3SG   drink.INF to   try do.3SG.NOM.PAST
PN           N           V COMP         AUX
Generated in TypeCraft.

C. Serial verb construction
This phenomenon is exemplified in (8), again with the global factors indicated first:

(8) imperfective-visualEvidence--serialVerbConstruction-adverbPhrase:manner-perception-NP+ADVPpred
ଧୀରେ ଧୀରେ ପାଣିର ସ୍ତର ଉପରକୁ ଉଠିବାକୁ ଲାଗିଲା
"the level of water started rising slowly"

ଧୀରେ ଧୀରେ             ପାଣି ର          ସ୍ତର             ଉପର କୁ       ଉଠିବାକୁ    ଲାଗିଲା
slow.REDP slow.REDP   water.SG POSS   level.SG.NOM   up.LOC to   rise.INF   start.3SG.NOM.PAST
ADVm ADVm             N               Nbare          ADVplc      V1         V2
Generated in TypeCraft.
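The tiered examples above can be captured in a simple record structure. The sketch below is illustrative only: the field names are ours and do not reflect TypeCraft's actual data model or export format.

```python
# A sketch of one IGT record in the tier format shown in (5):
# phrase, free translation, global tags, and per-word morph/gloss/POS tiers.
# Field names are illustrative, not TypeCraft's schema.

igt = {
    "language": "Odia",
    "global_tags": ["imperfective", "declarative", "directedMotion"],
    "phrase": "ସେ ମାଠିଆ ଭିତରକୁ ଚାହିଁଲା",
    "translation": "he peeped into the pitcher",
    "words": [
        {"form": "ସେ", "gloss": "SBJ.NOM.AGT.3SG", "pos": "PN"},
        {"form": "ମାଠିଆ", "gloss": "pitcher.ACC", "pos": "Ncomm"},
        {"form": "ଭିତରକୁ", "morphs": ["ଭିତର", "କୁ"],
         "gloss": "inside.LOC to", "pos": "PPOST"},
        {"form": "ଚାହିଁଲା", "gloss": "peep.3SG.PAST", "pos": "V"},
    ],
}

def tier(record, key):
    """Assemble one interlinear tier (e.g. gloss or POS) as a single line."""
    return "  ".join(w[key] for w in record["words"])
```

`tier(igt, "gloss")` reassembles the gloss line of (5), and further tiers (e.g. morphemic breaks) can be assembled the same way.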

[5] See (Hellan et al. 2013, Hellan et al. 2014) for discussions of this and related notions in the context of building a multilingual lexical valence database.

D. Causativization
In a causative construction the root form of the verb gets a derivational morpheme, with reduction of the vowels in the root

forms, and the derived verb can have up to four arguments. Two word formations are shown in (9):

(9)
/khA/ 'eat' -> /khuA/ 'feed'
/nAc/ 'dance' -> /nacA/ 'make someone dance'

E. Conjunct verbs ('converbs')
These are formed with the addition of markers like /ki/ and /kari/.

(10)
ସେ ପାଣି ପିଇ ଖେିସରୁ ଉଡ଼ିଗଲା
"he drank water and flew away happily"

ସେ        ପାଣି       ପିଇ         ଖେିସରୁ     ଉଡ଼ିଗଲା
3SG.NOM   water.SG   drink.PFV   happily   fly.3SG.NOM
PN        N          V1          ADVm      V2
Generated in TypeCraft.

6. The Hindi language

Hindi is an Indo-Aryan language with relatively free word order, which enables its components to scramble within a construction. It is an official language of 12 states in India. Hindi is a highly inflectional and morphologically rich language, with gender, number and person (GNP) and tense, aspect and mood (TAM) information embedded in the verbs, and rich use of idiomatic expressions like collocation and metaphor. Conjunctive participles (converbs), causativization, ECV, modifiers, passivization, scrambling and PRO drop are commonly found constructions in Hindi. Particles like 'hi', 'bhi' and 'to' are free-floating and can be placed anywhere in the sentence according to the focus of the discourse.

We show some examples of data entered in TC, in the same format as in the section above.

(11) state-declarative---S:time-
एक कौआ था
"There was a crow"

एक     कौआ                          था
       crow.SBJ.NOM.3SG.MASC.ANIM   be.PAST.1SG.MASC
CARD   Ncomm                        V
Generated in TypeCraft.

(12) imperfective-declarative-adverbPhrase:manner-directedMotion-NP+ADVPpred
उसने झाँक के मटके में देखा
"He peeped into the pitcher"

उस ने                          झाँक        के     मटके                 में       देखा
he.SBJ.3SG.MASC.NOM.ANIM ERG   peep.PRTV   PRTV   pitcher.ACC.INANIM   in.LOC   see.PAST.3SG.MASC
PN                             V1          V2     N                    PPOST    V
Generated in TypeCraft.

(13) imperfective-visualEvidence--serialVerbConstruction-adverbPhrase:manner-perception-NP+ADVPpred
धीरे धीरे पानी ऊपर आने लगा
"Gradually the water level rose"

धीरे धीरे     पानी        ऊपर      आने     लगा
slow slow   water.NOM   up.LOC   come    start
ADVm        Ncomm       ADVm     V1      V2
Generated in TypeCraft.

7. TC and ILCI platforms and data binding

The goal is to use the TC interface for further annotation, primarily for verb argument structure and also for morphological properties, both of which cannot be accommodated in the ILCI interface at this point. Both TC and ILCI are cloud facilities, and it would be easy to let annotators benefit from the additional linguistic information on the TC platform and search data by firing queries. The data ids from the ILCI corpora can point to additional information on the TC platform, and data can be assembled from either platform by firing suitable queries.

To summarize, the TC annotation interface covers the following aspects, described and illustrated above:
- Transcribing the text in the script of the language, for example Hindi in Devanagari

- Free translation of the text in English
- Words involved in the string
- Morphemic breaks in the words
- Possibility of transliteration for some languages into …
- Base form of words/verbs in the string
- Meaning of the morphemes in the string
- Interlinear glossing of the morphemes
- Construction description, which involves information about the type of construction entered; it does not carry any sub-categories, and it is up to the annotator to give a suitable title to the construction. The Construction Labeling system can be used here (cf. Section 4).
- Global tags, giving the syntactic and semantic description of the string with respect to its syntactic argument structure, sentence type, argument frame, modality, force and evidentiality, sentence aspect, etc. Each contains a list of sub-categories to choose the tags from, presented in drop-down menus (cf. Section 4).

Given the possibility that the ILCI data would be open to everybody (not only to Indian researchers at this point), we can work out a model where data exchange between the two platforms is automated. This will require a closer understanding of the data structures and backend processes in each platform so that data import/export can be worked out. The ILCI data (as of now) has translations from Hindi to 16 other Indian languages with POS annotation (BIS standard), which is hierarchical. The current phase of development is going to include chunk information as well. The TC format has richer linguistic information at various levels. The ILCI data can benefit by adding additional layers of information, while the TC format can get additional languages.

8. Conclusion

The joint effort between the NTNU and JNU research groups is in an initial development stage with significant variables. Some sample data for Hindi and Odia have been entered by research students. This work is expected to pick up in the next couple of months. The overall goal and philosophy of the project is to allow data integration from large data repositories across the continents, with obvious benefits on either side. The project can be further developed under a joint S&T collaboration initiative by India and Norway.

9. References

Bansal, A., Banerjee, E., & Jha, Girish Nath (2013). Corpora Creation for Indian Language Technologies - The ILCI Project. In Proceedings of the 6th Language Technology Conference (LTC '13).
Beermann, D. and Mihaylov, P. (2013). Collaborative databasing and resource sharing for linguists. In: Language Resources and Evaluation. Springer.
Choudhary, N., & Jha, Girish Nath (2011). Creating Multilingual Parallel Corpora in Indian Languages. In: Proceedings of the 5th Language Technology Conference, Fundacja Uniwersytetu im. A. Mickiewicza, Poznan, pp. 85-89.
Hellan, L. and M.E.K. Dakubu (2009). A methodology for enhancing argument structure specification. In Proceedings of the 4th Language Technology Conference (LTC 2009), Poznan.
Hellan, L. and M.E.K. Dakubu (2010). Identifying Verb Constructions Cross-Linguistically. Studies in the Languages of the Volta Basin 6.3. Legon: Linguistics Dept., University of Ghana. (http://www.typecraft.org/w/images/d/db/1_Introlabels_SLAVOB-final.pdf, http://www.typecraft.org/w/images/a/a0/2_Ga_appendix_SLAVOB-final.pdf, http://www.typecraft.org/w/images/b/bd/3_Norwegian_Appendix_plus_3_SLAVOB-final.pdf)
Hellan, L., D. Beermann and T. Bruland (2013). Towards a multilingual valence repository for less resourced languages. In Proceedings of the 4th Language Technology Conference (LTC 2009), Poznan.
Hellan, L., D. Beermann, T. Bruland, M.E.K. Dakubu, and M. Marimon (2014). MultiVal: Towards a multilingual valence lexicon. LREC 2014.
Jha, Girish Nath (2010). The TDIL program and the Indian language corpora initiative (ILCI). In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010). European Language Resources Association (ELRA).
Kumar, R., Kaushik, S., Nainwani, P., Banerjee, E., Hadke, S., & Jha, Girish Nath (2012). Using the ILCI Annotation Tool for POS Annotation: A Case of Hindi. IJCLA Vol. 3, No. 2, Jul-Dec 2012, pp. 93-104.
Nainwani, P., Banerjee, E., Kaushik, S., & Jha, Girish Nath (2011). Issues in annotating less resourced languages - the case of Hindi from Indian Languages Corpora Initiative (ILCI). In Proceedings of the 5th Language Technology Conference (LTC '11).

Issues in Mapping of Sanskrit-Hindi Verb forms

Kumar Nripendra Pathak and Girish Nath Jha
Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi, India - 110067
(nri.pathak, girishjha)@gmail.com

Abstract
Verb handling is among the most important tasks in Machine Translation. A thorough syntactico-semantic study of verbs has been done in the Indian grammatical tradition, which is highly appreciated by modern linguists worldwide. This paper examines the syntactic patterns of Sanskrit and Hindi verbs in order to formulate a possible algorithm for mapping verbs in a Sanskrit-Hindi Translator (SaHiT). This effort will help in producing linguistic rules for a tool that can handle the verb forms in SaHiT.

Keywords: Sanskrit, Hindi, Verb, SaHiT, Machine Translation, rule

1. Introduction

The famous NASA scientist Rick Briggs[1] (1985), in his research article "Sanskrit & Artificial Intelligence", drew great attention to Sanskrit because of its grammatical technicalities for Natural Language Processing. Since NLP started in India in 1991, many researchers have explored the possibilities of developing "intelligent machines" to process natural language. Panini, who had already processed the spoken language of his time, presents a model for a computational algorithm. Nicholas Ostler[2] (a British scholar and author who first studied Greek, Latin, philosophy and economics, and later studied under Noam Chomsky at the Massachusetts Institute of Technology, where he earned his Ph.D. in linguistics and Sanskrit) documents the spread of language throughout recorded human history in his work "Empires of the Word: A Language History of the World" (2005). Ostler (2001), in his research article Sanskrit Studies as a Foundation for Computational Linguistics, said: "There is also evident competition in India both from Hindi and from English, as de facto and de jure languages of pan-Indian communication in the modern world. Nevertheless, Hindi has not had the benefit of 2,500 years of linguistic analysis on which to found its computer development. And English, despite its feverish development over the past 250 years, can never offer the well-established cultural links with languages all over India that are inalienable from Sanskrit" … "we can consider the potential role of Sanskrit in the future electronic notation, analysis and transmission of languages world-wide".

At present, various efforts are being made towards the development of Indian Language to Indian Language Machine Translation systems. The Sanskrit-Hindi Machine Translation System is one of them. In this particular task, verb handling has always been a challenging topic because of the divergence and linguistic contrast between the language pair. Some efforts have been made to deal with the verbs, but much more is needed. "Sanskrit Verb Argument Valence: A Computational Analysis" by Subhash Chandra reports a verb argument valence analysis system and knowledge database as the outcome of that research and development[3]. Sanskrit WordNet discussed the divergence of verbs in Sanskrit and Hindi and its implications for constructing verbal synsets in SWN[4]. "Automatic Identification and Analysis of Verb Groups in Hindi" is the latest research work related to Hindi verbs[5]. A knowledge base for karak by Manji Bhadra, JNU, and an Ontological Knowledge Base for selected verbs of Sanskrit and Bangla by Subhash Chandra deal with Sanskrit verbs to

[1] http://www.vedicsciences.net/articles/sanskrit-nasa.html
[2] http://www.emille.lancs.ac.uk/lesal/ostler.pdf
[3] http://www.researchgate.net/publication/233341515_Sanskrit_Verb_Argument_Valence_A_Computational_Analysis
[4] http://www.cse.iitb.ac.in/~pb/papers/gwc12-swn-verb.pdf
[5] Narayan Kumar Chaudhary, Centre for Linguistics, JNU.

represent semantic processing. But the mapping of verbs for a Sanskrit-Hindi Translator, and the possible rules for acceptable output, have not been discussed. This paper discusses some differences and presents the issues and challenges in verb mapping for SaHiT. As Sanskrit verbs are of two types, tinganta and kridanta, this paper is restricted to the tinganta part, because a kridanta may also be used as a noun (ram+ghany = raam), adverb (pachamaanah), adjective (pachantam), indeclinable (paayam paayam), gerund (gatvaa), etc., which need to be discussed separately.

2. Sanskrit-Hindi Verbs

According to Paninian Grammar, a sentence generally has two main parts - subanta and tinganta. Panini says 'sup-tingantam padam'[6]. Here the subanta is the noun and the tinganta the verb in a given Sanskrit syntax. tinganta forms are made from two kinds of roots: first, the original ones, which are listed in the ten ganas of the Paninian dhatupatha, and second, those obtained by adding suffixes to a primary root to express some special meaning. These roots have different morphological forms in the ten lakaras, and their forms further differ in their atmane-pada and parasmai-pada divisions. The ten lakaras express tense and mood. The present tense takes the lat-lakaara form, the future takes the lut and lrut forms, and the past tense takes the lit, lang and lung lakaras. Mood takes the lot, vidhi-ling, aasirling and lrung lakaras. As the Sanskrit forms are inflectional while the Hindi forms are periphrastic, the mapping of verb forms between the two needs close observation.

2.1 Present Tense

In a Sanskrit sentence, the lat-lakara is used to convey the present. The lakaras are divided into three persons and three numbers, so the verb has nine forms in parasmai-pada and nine forms in atmane-pada. The atmane-pada type of morpho-syntactic difference has not been discussed for the Hindi forms. In Sanskrit, the meaning of an ubhayapadi verb (having both forms, atmanepada and parasmaipada) does not change in the active voice; thus we can take only the parasmai-pada forms (for analysis), which are widely used in classical Sanskrit writings. The lat-lakaara forms have roots followed by the suffixes tip, tas, jhi and so on[7]. For a clear understanding, we can take the root paTh (to read): paThati, paThatah, paThanti, and so on. Depending on the gender of the agent, the form paThati can be translated as paDhataa hai, paDhatI hai, paDhate hain, paDha rahaa hai, paDha rahI hai, paDha rahe hain, etc. A single Sanskrit verb form represents multiple Hindi verb forms. In this regard, it is important to note that the Hindi forms are governed by the gender of the agent as well as by honorific use. What happens here is basically that the root gives the meaning and the suffixes tip, tas & jhi give the forms {taa hai, tI hai, te hain, rahaa hai, rahi hai, rahe hain}, which are added to the root form of Hindi. Here we should clearly understand that honorifics (bhavaan/bhavati and any respected person) have third person forms (tip, tas, jhi) in Sanskrit and take root form + {te hain/tI hain/rahe hain/rahi hain} in Hindi.

Morpho-syntactically, something different happens with the verbs used with the second person 'you (tvam)' and the first person 'I/we'. Here we can see that the verb form with tvam, yuvaam, yooyam is the root + si, thah & tha, i.e. pathasi, pathathah & pathatha. Here we get the translation in Hindi: [tum/tum_dono/tum_sab {padha_te ho, padha_rahe ho}]. So the '_te ho' & '_rahe ho' of the Hindi form represent masculine forms; when the agent is feminine, we will see '_tI ho' & '_rahI ho' with the Hindi verb root. The first person, I & we (aham, aavaam, vayam), takes [root+{_mi/vas/mas}] in Sanskrit; in Hindi, the singular takes [root+{_taa hoon/_tI hoon}/{_rahaa hoon/_rahI hoon}] and the dual as well as the plural take [root+{_te hain/_rahe hain}].

In this way we find that the nine forms of Sanskrit verbs (in the lat lakaara) present many forms in Hindi. The Hindi forms depend on gender while the Sanskrit (tinganta) forms do not. Hindi takes different forms to represent the simple present and the present continuous, while Sanskrit shows both with one verb form.

2.2 Past Tense

In Sanskrit, the past tense takes three lakaras: the lit lakara (past perfect), the lan lakara (past imperfect) and the lun lakara (aorist). The lit or perfect is used in the sense that the past period was not seen by the speaker, which refers to the historical past. While analyzing the past we find that the aorist would refer only to eventualities that have happened earlier during the present day, the imperfect only to witnessed eventualities that have happened before the present day, and the perfect to non-witnessed eventualities that happened a long time ago. Paul Kiparsky explained this while noting that the perfect should block the imperfect and the imperfect in turn should block the aorist, and observed that the aorist is optionally used for the remote past[8]. Dealing with the past, V. S. Apte[9] says that earlier the past forms were used in their exact senses, but when Sanskrit became a less spoken language, writers began to use these three tenses promiscuously. He differentiates the past tenses by noting that the imperfect and the perfect are used in narrating events of remote past occurrence, while the aorist is used in dialogues and conversations which refer to recent past action; it is not used to denote past specified time, or to narrate events. Apte gives examples of the imperfect and perfect past from the purusha-sukta, for events narrated referring to the non-witnessed past, and presents an example from the Aitareya for the recent past, which is shown by the aorist. In post-Paninian Sanskrit, the aorist can refer to any past event, whereas the imperfect and perfect are restricted to the described events only.

If we compare the past forms of Sanskrit with the Hindi forms, we can find the same gender differences as in the present (hota tha, hoti thi, etc.). But the time reference of the non-witnessed past is shown as hua karte the, hue the, huyi thi (babhoova) etc. with reference to historical narratives. ho rahaa tha, huaa/hota tha, and ho chukaa etc. represent past usage in Hindi, for which abhavat/abhoot are both used. Here we can see that the use of the imperfect and the aorist is not clear in Hindi, while the perfect (past perfect) is very clear in both languages. We can find both the terms 'ho chukaa' and 'ho chukaa tha', where we need to identify the remote past and the recent past. For example, we can take the root han to see the different forms in both languages:

Perfect: jaghaana = maara dala thaa, maaraa thaa, maarataa thaa, [(choro ko) maaraa karataa thaa] etc. may be the possible translations of the form given in the (past) perfect tense.
Imperfect: avadheet = maaraa thaa or maarataa thaa or maara rahaa thaa…
Aorist: ahan = maaraa…

Eventually the difference between these three Sanskrit forms is lost in Hindi, so we find these past forms - maaraa, maarataa thaa, maara rahaa thaa - frequently in Modern Hindi usage. This particular tense needs thorough comparative research between the two languages so that a framework can be developed for Machine Translation.

2.3 Future Tense

The future takes two lakaaras, lut & lrit. Apte[10] says that the lut denotes remote future time, not of today, while the lrit denotes indefinite future time, today's future time, and recent and future continuous time. The periphrastic future (lut) is far less frequently used, and where it is used it generally denotes a remote future action, while the simple future is used to denote any indefinite future action, as in ramaH pathishyati = raama paDhegaa; raamaH paThitaa = raama paDhegaa. But the two tenses have a basic difference: the first verb, pathishyati, denotes an indefinite future action while the next, paThitaa, shows the remote future. If the time is not mentioned in Sanskrit, the sense of the sentence will not be clearly shown in Machine Translation. When the close proximity of a future action is intended, the present or the future may be used, as in kada gamishyasi - esha gachchaami/gamiShyaami. When hope is expressed in a conditional form, the aorist, the present, or the simple future may be used in both clauses to denote a future time, as in devashchedavarShId varShati varshishyati vaa dhaanyamavaapsma vapaamo vapsyaamo vaa[11]. Therefore we come across these challenges in Sanskrit-Hindi Machine Translation mainly in verb mapping.

3. Analysis and rules

On the basis of the facts discussed above, we find that Sanskrit is inflectional while Hindi is periphrastic. Sanskrit verbs (tinganta) do not agree in gender, but Hindi verbs do. Sanskrit verbs have a dual number while Hindi has dropped the dual; thus the Sanskrit dual becomes plural in Hindi. The regular and the progressive verbs are clearly distinguished in Hindi, but Sanskrit verbs have no such differentiation. Sanskrit has three different forms for the past and two different forms for the future, but Hindi does not show such morphological differences.

[6] P-1.4.14
[7] P-3.4.78

[8] On the Architecture of Panini's Grammar, p. 38.
[9] The Student's Guide to Sanskrit Composition, p. 142.
[10] The Student's Guide to Sanskrit Composition, p. 148.
[11] The Student's Guide to Sanskrit Composition, p. 147.
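The one-to-many present-tense mapping described in section 2.1 can be sketched as a small rule table. This is an illustrative sketch only, not the SaHiT system: the feature table is a simplification (honorific use is ignored, and only third-person-singular endings are filled in), and the helper names and transliterations are ours.

```python
# (person, number) features carried by a Sanskrit parasmai-pada suffix
SUFFIX_FEATURES = {
    "ti": ("3", "sg"), "tah": ("3", "du"), "anti": ("3", "pl"),
    "si": ("2", "sg"), "thah": ("2", "du"), "tha": ("2", "pl"),
    "mi": ("1", "sg"), "vas": ("1", "du"), "mas": ("1", "pl"),
}

# Hindi periphrastic endings by (person, number, gender, aspect);
# a third-person-singular excerpt of the pattern discussed in 2.1
ENDINGS = {
    ("3", "sg", "masc", "simple"): "taa hai",
    ("3", "sg", "fem", "simple"): "tii hai",
    ("3", "sg", "masc", "progressive"): " rahaa hai",
    ("3", "sg", "fem", "progressive"): " rahii hai",
}

def hindi_candidates(hindi_root, sanskrit_suffix):
    """All Hindi renderings compatible with one Sanskrit verb form:
    gender and aspect stay underspecified, hence several candidates."""
    person, number = SUFFIX_FEATURES[sanskrit_suffix]
    return [hindi_root + ending
            for (p, n, _gender, _aspect), ending in ENDINGS.items()
            if (p, n) == (person, number)]

# paThati -> paDhataa hai / paDhatii hai / paDha rahaa hai / paDha rahii hai
forms = hindi_candidates("paDha", "ti")
```

Resolving which candidate is correct requires the gender of the agent, which, as the paper later suggests, could be supplied by an anaphora resolution component.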

To handle the progressive forms in Hindi, we can add the scope of optional outputs. The gender agreement can be handled with the help of an anaphora resolution system, so that the machine can identify the gender of the agent and give correct output for the SaHiT system.

3.1 Rules for verb mapping

3.1.1 Present
1(a) root+ti => root+taa/ti + hai
1(b) root+ti => root+rahaa/rahi + hai
2(a) root+tah/anti => root+te/ti + hain
2(b) root+tah/anti => root+rahe/rahi + hain
3(a) root+si/thah/tha => root+te/ti + ho
3(b) root+si/thah/tha => root+rahe/rahi + ho
4(a) root+mi => root+ta/ti + hoon
4(b) root+mi => root+raha/rahi + hoon
4(c) root+vah/mah => root+te/rahe + hain

The rules for the past forms can be divided into remote and recent past so that tentative rules can be written.

3.1.2 Remote past
1(a) root+a => root+aa tha, root+i thi, root+e the [i.e. path+a (lit) - papaatha => padh+aa tha, padh+i thi, padh+e the]
1(b) root+atuh/uh => root+e the / root+i thi
As the remote past is generally used for the third person, these rules may be sufficient to handle it.

3.1.3 Simple past
The third person simple past takes the rules of the remote past as they are. Further, rule 1(b) is repeated to handle the simple-past verb forms for the second person. For the first person (singular), the rules may be:
2(a) root+a => root+aa thaa, root+i thi
2(b) root+a => root+rahaa thaa / root+rahi thi
2(c) root+iva/ima => root+e the / root+rahe the

3.1.4 Rules for the future
The Hindi forms for both lakaaras are the same, so the rule can be written as:
root + future suffix => root+gaa/gi, root+ge/gi, root+oge, root+ungaa, root+enge (depending on the agent of the verb).

4. Conclusion

In this paper we tried to reflect the multiple possibilities of verbal translation in SaHiT, which need to be handled carefully to produce acceptable output in Hindi. This will be more challenging when passive forms and kridanta forms are present in the input text. Compound and complex verbs also need to be discussed in detail to formulate the possible rules for verb mapping. The present work is ongoing research which will provide the rules for the rule-based SaHiT system. It can also be used to understand the issues in all IL-Hindi MT systems.

5. References

Apte, V. S. (1885). The Student's Guide to Sanskrit Composition. Lokasamgraha Press, Poona, India.
Bharati, A. (et al.) (1995). Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.
Kiparsky, P. (2002). On the architecture of Panini's grammar. In International Conference on the Architecture of Grammar, Hyderabad.
Kiparsky, P. (2008). On the Architecture of Panini's Grammar. Sanskrit Computational Linguistics, volume 5402 of Lecture Notes in Computer Science, pp. 33-94. Springer.
Jha, Girish Nath (Ed.) (2010). Sanskrit Computational Linguistics, volume 6465 of Lecture Notes in Computer Science. Springer.
Dixit (2010). Astadhyayisutrapath. Sanskrit Bharti, New Delhi.
http://www.researchgate.net/publication/233341515_Sanskrit_Verb_Argument_Valence_A_Computational_Analysis
http://www.cse.iitb.ac.in/~pb/papers/gwc12-swn-verb.pdf
http://www.emille.lancs.ac.uk/lesal/ostler.pdf
http://www.vedicsciences.net/articles/sanskrit-nasa.html

Evaluation of Hindi-English MT Systems

Atul Kr. Ojha, Akanksha Bansal, Sumedh Hadke and Girish Nath Jha
Jawaharlal Nehru University, New Delhi
(shashwatup9k, akanksha.bansal15, sumedhkhadke & girishjha)@gmail.com

Abstract
Evaluation of any Machine Translation (MT) system is an important step towards improving its accuracy. In this paper, we evaluate the Hindi-English module of two of the most widely used MT systems - Bing (Microsoft) and Google. These are statistics-based MT systems (SBMT) and are capable of providing translation for many language pairs across the globe other than Hindi-English. For the purpose of evaluation, we tested health and general cooking data and evaluated the English output text. A human evaluation strategy was used, on the basis of which problem areas in both MT systems were identified and compared to reach a conclusive analysis in terms of the output's fluency and comprehensibility. The comparative analysis helps in understanding not only which system is better, but also what works best for automatic translation and under what circumstances. The discrepancies found are discussed with some suggestions towards their solution.

Keywords: Human Evaluation, Automatic Evaluation, Machine Translation, Natural Language Processing.

1. Introduction
Low accuracy, fluency and acceptability of the output of any machine translation system adversely affect the reliability and usage of that system. An evaluation task can ascertain how and in what ways the results of these systems are lacking. It is an unavoidable part of the development of a Machine Translation (MT) system, because without evaluating the final generated text on the parameters of accuracy, fluency and acceptability we cannot make any claim about the success of the system. The need and demand for evaluating an MT system therefore remains a high priority. In this paper, we undertake the task of evaluating the output of the Hindi-English language pair through two MT systems, Bing and Google.

There are no universally accepted and reliable methods and measures for evaluation (Arnold et al., 1993; Falkedal, 1994; AMTA, 1992), but there is a common assumption and agreement on the basic features of Machine Translation (MT) evaluation (Hutchins & Somers, 1992; Arnold et al., 1994). We have therefore chosen this task to evaluate MT systems and point out the anomalies and their possible linguistic basis. The feedback provided by the evaluation tasks may provide an opportunity for improving the accuracy of the systems. For this purpose, we have examined the Hindi-English modules of Bing (Microsoft) and Google MT (Translator) and identified some major problems in translation and target language generation.

1.1 Google Translator MT System
Google MT/Translator is based on statistical and machine learning approaches built on parallel corpora. The Google MT system runs for 73 language pairs.1

1.2 Bing (Microsoft)
Bing (Microsoft) MT is likewise based on statistical and machine learning approaches built on parallel corpora. It also uses language-specific rule-based components to decode and encode sentences from one language to another; one can say that it is a "linguistically informed" statistical machine translation system. Bing MT runs for 44 parallel language pairs.2

Both MT systems are easily accessible to users. Earlier, Jha (2012) discussed some of the problems in both Google and Bing outputs and provided explanations; the present study extends that work into a more comprehensive comparative study and analysis.

1 http://translate.google.co.in/
2 http://www.bing.com/translator
3 http://sanskrit.jnu.ac.in/projects/ilci.jsp?proj=ilci

2. Error Analysis & Linguistic Descriptions
Simple sentences from the health and cuisine domains of the ILCI corpora3 were used for evaluating the MT systems. These sentences were entered into each of the systems in bulk, the output was crawled, and discrepancies were marked. In the resulting English output, several problems were noted, particularly with respect to gender agreement,

structural mapping, Named Entity Recognition (NER) and plural marker morphemes. While most theories of translation are skeptical about the possibility of exact translation, translation work is carried out nonetheless. The process of translation involves decoding a text in one language and encoding it into another. Theories of translation discuss divergence between languages at several levels. When we switch to machine translation, various unexpected problems surface: several problems that are not generally noticed by human translators are encountered by the machine at the decoding level, challenging the task of Machine Translation. During the evaluation process the following kinds of errors were encountered.

(A) Tokenization

(i) With punctuation:
(a) वह जाती है । (IS)4
vaha jAtI hai. (Trans.)
She go-IPFV is-3.PRS.M.SG (Gloss)
She goes by. (BO)
He is. (GO)

(b) वह जाती है (IS w/o)
vaha jAtI hai (Trans.)
She go-IPFV is-3.PRS.M.SG (Gloss)
He is (BO)
He is (GO)

Manual Translation: She goes.

Examples (a) and (b) above show how the use of a punctuation mark can significantly affect translation. This variation in results is seen only in Bing; Google is consistent. In Bing, the input sentence with punctuation gave a near-approximate result, but the input sentence without the punctuation mark gave a totally wrong translation. None of the results produced by Google in this case were accurate.

(ii) Error with input sentence or word:
(a) थोडा सा पानी डालें और आटा गूंद लें (IS)
Tho.DA sA pAnI DAleM aura ATA gUMda leM (Trans.)
ADJ some water put-IMP and CONJ flour grind take-IMP3.M/F.SG (Gloss)
Take a little bit of water and flour gund (BO)
Please add a little water and flour Gund (GO)

After tokenization of a word, the translation differs:
(b) थोडा सा पानी डालें और आटा गूंथ लें (IS with token)
Tho.DA sA pAnI DAleM aura ATA gUMtha leM (Trans.)
ADJ some water put-IMP and CONJ flour grind take-IMP3.M/F.SG (Gloss)
Take a little bit of water and flour interlock (BO)
Please add a little water and flour interlock (GO)

Manual Translation: Put some water and knead the dough.

In both scenarios, with and without a token, the structural mapping goes wrong with the second clause.

(iii) Transliteration issue:
(a) कढ़छी चलाते रहें और एक उबाल आने पर आँच धीमी करें दो मिनट पकाएँ (IS)
ka.DhachI calAte raheM aura eka ubAla Ane para A.nca dhImI kareM do miniTa pakAe.n (Trans.)
kadchi run-IPFV continue-IMP and CONJ one simmer come-INF on-PP heat slow do-IMP3PRS two minute fry-IMP3 (Gloss)
The heat to a simmer and continue running kadhchi slow, two minutes to Cook (BO)
Kdhci Stir and heat to a boil and cook about two minutes to slow (GO)

Manual Translation: Keep moving the spatula, lower the flame as it starts boiling. Cook for two minutes.

(b) एक नौन-स्टिक तवा गरम करें (IS)
eka nauna-sTika tavA garama kareiM (Trans.)
One non-stick pan heat do-IMP3PRS.M/F.SG (Gloss)
A naun-stick frying pan and heat (BO)
A Non-stick frying pan and heat (GO)

Manual Translation: Heat the non-stick frying pan.

4 IS = input sentence, Trans. = transliteration in Roman script, IS w/o = input sentence without punctuation, BO = Bing output, GO = Google output.
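The punctuation sensitivity seen in examples (a) and (b) can be probed systematically by deriving a without-punctuation variant for every test sentence and diffing the two system outputs. A minimal sketch of that variant generation (the helper name is ours, and the actual translation call is left abstract, since it depends on the MT system's interface):

```python
# Sketch: derive the "IS w/o" variant used above by stripping sentence-final
# punctuation (the Devanagari danda '।', a Latin full stop, or '?').

def strip_final_punct(sentence):
    """Return the sentence without its trailing danda / full stop / question mark."""
    return sentence.rstrip().rstrip("।.?").rstrip()

# Each (with, without) pair would then be sent to the MT system and the two
# outputs compared; a mismatch flags punctuation sensitivity, as in example (a)/(b).
pairs = [(s, strip_final_punct(s)) for s in ["वह जाती है ।"]]
```

In the Bing case above, the two members of such a pair produced "She goes by." and "He is" respectively, which is exactly the kind of divergence this check surfaces.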


(B) Morph Issue

(i) Unknown words:
छुआरे डालकर मिलाएँ और एक मिनट पकाएँ (IS)
chuAre DAlakara milAe.n aura eka miniTa pakAe.n (Trans.)
date-palm put mix-IMP.3PRS.M.SG and CONJ one minute cook-IMP3PRS.M.SG (Gloss)
One minute into the match and put chuare (BO)
Mix and cook one minute, add Cuare (GO)

Manual Translation: Put date palm, stir and cook for a minute.

(ii) Error with paradigm fixation:
कैंसर 1000 से अधिक बीमारियों का एक समूह है (IS)
kaiMsara 1000 se adhika bImAriyoM kA eka samUha hai (Trans.)
Cancer thousand than more diseases of-PSP one group is-PRS.M.SG (Gloss)
Cancer is a group of more than 1000 berryman (BO)
Cancer is a group of more than 1000 illnesses (GO)

कैंसर 1000 से अधिक बीमारी का एक समूह है (IS)
kaiMsara 1000 se adhika bImArI kA eka samUha hai (Trans.)
Cancer thousand than more disease of-PSP one group is-PRS.M.SG (Gloss)
Cancer is a group of more than 1,000 diseases (BO)
Cancer is a group of more than 1000 illnesses (GO)

Manual Translation: Cancer is a group of more than 1000 diseases.

(C) Structural/Grammatical Differences

एच.आई.वी. क्या है ? (IS)
H.I.V kyA hai? (Trans.)
H.I.V what is-PRS.M.SG (Gloss)
What is the HIV? (BO)
HIV what is it? (GO)

Manual Translation: What is the HIV?

(D) Errors with Gender Agreement

वह जाती है । (IS)
vaha jAtI hai. (Trans.)
She go-IPFV is-3.PRS.M.SG (Gloss)
She goes by. (BO)
He is. (GO)

Manual Translation: She goes.

(E) Parser Issues

आँख की मांसपेशियों की कमजोरी के कारण लेन्स अपना आकार नही बदल पाता पढ़ते या नजदीकी काम करते समय प्रकाश की किरणें रेटिना के पीछे पड़ती हैं यह 40 वर्ष और उससे ऊपर की उम्र में पाई जाती है (IS)
A.nkha kI mAMsapeshiyoM kI kamajorI ke kAraNa lensa apanA AkAra nahI badala pAtA pa.Dhate yA najadIkI kAma karate samaya prakAsha kI kiraNeM reTinA ke pIche pa.DatI haiM yaha 40 varSha aura usase Upara kI umra meM pAI jAtI hai. (Trans.)
Due to the weakness of the muscles of the eye lens cannot read or change their size does proximity to work while the light rays have it 40 years behind the retina and above in age (BO)
NO OUTPUT (GO)

3. Evaluation Strategies

Evaluation strategies are mainly divided into two kinds: (a) automatic evaluation and (b) manual or human evaluation. Automatic evaluation of any MT system is very difficult and is not as effective as human metrics. Several tested MT evaluation measures are frequently used, for example BLEU, mWER, mPER and NIST. Apart from these automatic evaluation strategies, human evaluation metrics also exist; they are considered time-consuming and costly, but they are the best strategies to improve any MT system's accuracy, especially for Indian languages.

It is a common scenario that more than one translation of a sentence exists. At this level a human translator cum evaluator can judge the output correctly. Generally he or she uses a quality scale of 1 to 5, i.e. 1 for incomprehensible, 2 for disfluent, 3 for non-native, 4 for good, and 5 for flawless. Here, however, an evaluation methodology and quality scale of 0-4 is used; this range has been found successful as an effective evaluation strategy before (Pathak, Ahmad & Ojha, 2012).

On the quality scale of human evaluation metrics, adequacy and fluency have higher priority during judgment. After the fluency judgment, adequacy is assessed: the evaluator has to be ready with a reference translation to assess how much information from the original is lost in the translation, marking it on the quality scale. A human evaluation strategy has been adopted to evaluate the Bing (Microsoft) and Google MT (Hindi-English) output.


Methodology of MT testing:
For testing, 1,000 sentences were given to the MT systems. Their outputs were then distributed to three human evaluators, who marked the MT outputs for comprehensibility and fluency.

Instructions for evaluators:
(a) Read the target language translated output first.
(b) Judge each sentence for its comprehensibility.
(c) Rate it on the scale 0 to 4.
(d) Read the original source sentence only to verify the faithfulness of the translation (only for reference).
(e) Do not read the source language sentence first.
(f) If the rating needs revision, change it to the new rating.

Guidelines of evaluation (on a 5-point scale over 0-4): the following score is to be given to each output sentence.

(a) For comprehensibility:
4 = all meaning
3 = most meaning
2 = much meaning
1 = little meaning
0 = none

(b) For fluency:
4 = Flawless or perfect (like someone who knows the language)
3 = Good or comprehensible but with quite a few errors (like someone speaking Hindi getting all its genders wrong)
2 = Non-native or comprehensible but with quite a few errors (like someone who can speak your language but makes lots of errors; however, you can make sense of what is being said)
1 = Disfluent: some parts make sense but it is not comprehensible overall (like listening to a language which has a lot of words borrowed from your language: you understand those words but nothing more)
0 = Incomprehensible or nonsense (the sentence does not make any sense at all, like someone speaking to you in a language you do not know)

Evaluation method:
If scoring is done for N sentences and each of the N sentences is given a score as above, the two parameters are as follows:

(a) Comprehensibility = (number of sentences with a score of 2, 3, or 4) / N
(b) Fluency = (S1 + S2 + ... + SN) / N, where Si is the score of the ith sentence.

For instance, if N = 10 and the scores obtained for the 10 sentences are S1=3, S2=3, S3=2, S4=1, S5=4, S6=0, S7=0, S8=1, S9=0, S10=0, this gives the following histogram:

Number of sentences with score 4 = 1
Number of sentences with score 3 = 2
Number of sentences with score 2 = 1
Number of sentences with score 1 = 2
Number of sentences with score 0 = 3

Weighted sum = 14. This produces:

Comprehensibility = 40% (because 4 out of 10 sentences obtained a score of 2, 3, or 4)
Fluency = 14/10 = 1.4 (on a scale of 0-4), i.e. 35% of the maximum possible score.

Note: if we use the 1-5 point scale instead, the comprehensibility figure is not affected; the fluency score would go down, however.

4. Results

On the basis of the methodology above, Table 1 below shows the comprehensibility scores, Table 2 the weighted fluency scores, Table 3 the comprehensibility and fluency percentages of the different human evaluators, and finally Table 4 the averages of all evaluators' comprehensibility and fluency percentages for the Bing and Google MT systems.

Score (0-4)   Bing MT (User1 / User2 / User3)   Google MT (User1 / User2 / User3)
0             338 / 350 / 10                    110 / 115 / 10
1             152 / 175 / 135                   167 / 155 / 55
2             244 / 220 / 510                   310 / 295 / 465
3             147 / 105 / 300                   245 / 225 / 365
4             119 / 150 / 45                    168 / 210 / 105

Table 1: Score table to compute comprehensibility
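The two formulas and the N=10 worked example above can be reproduced directly. A short sketch under the paper's definitions (the function names are ours):

```python
# Comprehensibility = |{i : S_i >= 2}| / N ; Fluency = sum(S_i) / N, scores on 0-4.

def comprehensibility(scores):
    """Fraction of sentences rated 2, 3 or 4."""
    return sum(1 for s in scores if s >= 2) / len(scores)

def fluency(scores):
    """Mean score on the 0-4 scale (the 'weighted sum' divided by N)."""
    return sum(scores) / len(scores)

scores = [3, 3, 2, 1, 4, 0, 0, 1, 0, 0]   # the S1..S10 example above
print(comprehensibility(scores))  # 0.4 -> 40%
print(fluency(scores))            # 1.4 on the 0-4 scale
```

This matches the worked example: four of the ten sentences score at least 2, and the weighted sum 14 over N=10 gives a fluency of 1.4.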


Score (0-4)   Bing MT (User1 / User2 / User3)   Google MT (User1 / User2 / User3)
0             190 / 420 / 30                    130 / 235 / 10
1             340 / 250 / 375                   220 / 270 / 230
2             240 / 160 / 305                   290 / 240 / 340
3             160 / 45 / 240                    220 / 140 / 335
4             70 / 125 / 50                     140 / 115 / 85

Table 2: Score table to compute fluency

MT System   Comprehensibility (%) (User1 / User2 / User3)   Fluency (%) (User1 / User2 / User3)
Bing        51 / 47.5 / 85.5                                39.5 / 30.1 / 47.6
Google      72.3 / 73 / 93.5                                50.5 / 40.7 / 56.3

Table 3: Comprehensibility and fluency percentages of the evaluators' scores

[Figure: bar chart of error rates per error category over the 1000-sentence corpus, comparing Bing MT and Google MT; y-axis 0-100%.]

Table 5: Error rate of the Bing & Google MT systems
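The system-level averages reported in Table 4 are, up to rounding, just the per-system means of the three evaluators' percentages in Table 3; this can be checked mechanically (the numbers below are taken from Table 3, the helper names are ours):

```python
# Average the per-evaluator percentages from Table 3 to recover Table 4.
table3 = {
    "Bing":   {"comprehensibility": [51, 47.5, 85.5], "fluency": [39.5, 30.1, 47.6]},
    "Google": {"comprehensibility": [72.3, 73, 93.5], "fluency": [50.5, 40.7, 56.3]},
}

def average(values):
    """Arithmetic mean, rounded to two decimals as in Table 4."""
    return round(sum(values) / len(values), 2)

for system, metrics in table3.items():
    for metric, vals in metrics.items():
        print(system, metric, average(vals))
```

This reproduces Table 4's comprehensibility figures of 61.33 (Bing) and 79.6 (Google) exactly; the fluency means come out 39.07 and 49.17, within rounding distance of the reported 39.08 and 49.20.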

5. Conclusion

In this paper, two major points have been presented: the first is the errors in MT output, and the second is the evaluation of the Bing MT (Microsoft Translator) and Google MT systems. When we examined and evaluated these systems, we found many errors; we have already described above the discrepancies in tokenization, morphology, structure, parsing, etc. When we evaluated the MT systems, fluency was found to be very low, but the output was nonetheless almost comprehensible. On comparison, Google was found to be better than Bing MT in comprehensibility. When we compute the ratio of the results achieved in Table 4, Bing is found to be better than Google. If the problems mentioned for the Bing MT system are resolved, it can outperform Google in future.

MT System   Comprehensibility (%)   Fluency (%)
Bing        61.33                   39.08
Google      79.6                    49.20

Table 4: Average of all evaluators' weighted scores for comprehensibility and fluency

Error rate:
The error-rate graph (Table 5) highlights all the error categories, comparing their error rates in the two MT systems; the category "others" is also shown. Of the total 1000 sentences used as data, 200 are short/simple sentences and 800 are long/complex sentences.

Finally, some suggestions are proposed for improving both MT systems on the basis of the evaluation done above. While giving input sentences, tokenize them and avoid using the full stop marker in final position. Both MT systems should improve their morph dictionaries through corpus data and create linguistic rules for paradigm fixation (how to analyze inflectional and derivational categories); and if the MT systems are trained with a larger number of words and sentences, the parsing issues might be resolved. Then these systems will improve and the errors will decrease to some extent. Following these steps, the Bing and Google MT systems can be improved in fluency as well as comprehensibility.

6. References

Ananthakrishnan, R., Bhattacharyya, Pushpak, Sasikumar, M. and Shah, Ritesh M. (2007). Some Issues in Automatic Evaluation of English-Hindi MT: More Blues for BLEU. In Proceedings of the 5th International Conference on Natural Language Processing (ICON 2007), 4-6 January, Hyderabad, India.

Arnold, D., Balkan, L., Humphreys, R. Lee, Meijer, S. and Sadler, L. (Eds.) (1994). Machine Translation: An Introductory Guide. Manchester/Oxford: NCC Blackwell.

Baskaran, Sankaran, Kumaran, A. and Bali, Kalika (2008). A Dependency Treelet-based Phrasal SMT: Evaluation and Issues in English-Hindi Language Pair. In Proceedings of the 6th International Conference on Natural Language Processing (ICON 2008), December 20-22, 2008, CDAC Pune, India.

Bhattacharyya, Pushpak (2013). SMT and Parallel Corpora. Workshop on Indian Language Corpora, Imphal, Manipur, August 22, 2013.

Bing Translator: http://www.bing.com/translator

Google Translator: http://translate.google.com

Hutchins, W. J. and Somers, H. L. (1992). An Introduction to Machine Translation. London: Academic Press.

ILCI: http://sanskrit.jnu.ac.in/projects/ilci.jsp?proj=ilci

Jha, Girish N. (2012). Emerging Trends in Language Technology Resources (LTR) in India. Delivered at Microsoft Research, Redmond, USA, April 9, 2012.

Jurafsky, Daniel and Martin, James H. (2000). Speech and Language Processing. Prentice Hall.

Kachru, Yamuna (1996). An Introduction to Hindi Syntax. Urbana: Dept. of Linguistics, University of Illinois.

Koehn, Philipp (2010). Statistical Machine Translation. New York: Cambridge University Press.

Mauser, Arne, Hasan, Sasa and Ney, Hermann (2008). Automatic Evaluation Measures for Statistical Machine Translation System Optimization. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco. European Language Resources Association (ELRA).

Mitkov, Ruslan (Ed.) (2003). The Oxford Handbook of Computational Linguistics. OUP.

Papineni, Kishore, Roukos, Salim, Ward, Todd and Zhu, Wei-Jing (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318.

Pathak, Sanket, Ahmad, Rashid and Ojha, Atul (2012). A Language Engineering Approach to Enhance the Accuracy of Machine Translation Systems. Shodh Prerak, Vol. 2, Issue 1, ISSN: 2231-413X, pp. 205-214.

Ramis, Adria de Gispert (2006). Introducing Linguistic Knowledge into Statistical Machine Translation. PhD thesis, Universitat Politecnica de Catalunya, Barcelona.

Singh, Surajbhan (2010). English-Hindi Translation Grammar. Delhi: Prabhat Publication.

Van Slype, G. (1996). Critical Methods for Evaluating the Quality of Machine Translation (Final Report). Prepared for the Commission of the European Communities, Brussels.

EAGLES (Expert Advisory Group on Language Engineering Standards). Evaluation of Natural Language Processing Systems (Final Report). Prepared for DG XIII of the European Commission.


Appendix

S. No. | Bing Translator: Comprehensibility (User1 / User2 / User3), Fluency (User1 / User2 / User3) | Google Translator: Comprehensibility (User1 / User2 / User3), Fluency (User1 / User2 / User3)

Each numbered row below gives these twelve scores in that order.

1 1 0 1 0 0 1 2 2 3 3 3 3

2 1 0 2 1 2 3 2 3 2 1 3 3

3 2 3 2 1 0 1 1 3 2 4 3 2

4 4 4 1 1 0 1 1 2 1 2 1 1

5 1 2 2 1 2 1 3 3 2 2 3 1

6 0 1 4 4 4 4 3 4 4 3 4 4

7 0 1 2 3 3 2 3 4 4 2 3 3

8 0 0 3 2 3 2 1 2 3 2 1 1

9 1 0 2 0 1 2 3 4 4 4 4 3

10 2 3 2 1 0 2 2 4 3 1 0 1

11 1 0 1 0 1 3 1 1 1 0 0 1

12 0 1 1 1 2 1 3 1 3 1 1 2

13 0 0 1 0 0 1 3 2 2 2 2 1

14 1 0 1 1 1 1 1 3 2 2 1 2

15 1 1 2 0 0 2 2 4 4 4 3 1

16 3 4 3 0 0 3 1 1 2 0 1 1

17 2 2 2 1 1 1 1 4 3 4 3 3

18 1 2 2 1 4 2 1 3 2 3 3 4

19 1 2 3 1 2 3 0 2 2 1 3 2

20 1 1 2 2 2 1 1 1 2 0 0 1

21 3 4 0 1 3 2 1 1 2 2 3 3

22 1 0 1 0 1 1 1 1 2 0 1 3

23 1 0 3 4 4 4 2 3 4 4 2 1

24 0 0 2 1 1 1 2 2 2 3 3 3


25 1 1 2 3 1 3 4 3 2 3 1 1

26 0 0 2 1 1 1 2 2 4 2 1 1

27 0 0 3 2 1 2 4 2 3 4 4 4

28 1 0 2 0 0 2 2 4 3 0 1 1

29 1 0 2 0 0 1 1 4 4 4 3 3

30 1 0 1 0 0 1 1 1 3 0 1 1

31 2 2 2 0 0 2 2 2 1 3 1 1

32 1 0 2 3 2 3 2 0 2 2 2 1

33 0 1 1 2 0 1 2 1 2 1 1 2

34 4 4 1 0 1 1 3 4 2 2 1 1

35 3 4 2 1 3 3 4 4 3 3 2 1

36 4 4 2 2 3 2 2 3 3 3 2 3

37 2 2 3 2 3 3 1 2 2 0 3 3

38 1 2 3 2 2 3 1 4 4 2 3 2

39 3 4 3 4 4 3 2 4 3 4 3 3

40 2 1 2 1 2 2 2 1 2 2 2 2

41 2 0 4 4 4 4 1 1 2 4 3 1

42 3 4 1 0 1 1 2 1 2 1 3 4

43 2 1 2 1 1 1 2 4 4 2 3 2

44 1 1 3 3 4 3 3 4 3 3 3 2

45 1 0 2 2 1 1 4 2 3 3 2 2

46 1 0 2 2 0 1 3 4 2 2 2 2

47 1 0 2 2 1 2 3 2 2 3 2 1

48 1 0 2 2 0 1 0 1 2 3 2 1

49 2 2 2 2 1 2 1 1 2 2 3 1

50 4 4 2 2 0 1 1 1 2 3 2 1

Lexical Resources for Hindi Marathi MT

Sreelekha S., Pushpak Bhattacharyya, Malathi D.
Department of Computer Science and Engineering
Indian Institute of Technology Bombay, SRM University
E-mail: {sreelekha, pb}@cse.iitb.ac.in, [email protected]

Abstract
In this paper we describe ways of utilizing lexical resources to improve the quality of statistical machine translation. We have augmented the training corpus with various lexical resources such as the IndoWordnet semantic relation set, function words, kridanta pairs and verb phrases. We augmented the parallel corpora in two ways: (a) with additional vocabulary and (b) with inflected word forms. We describe case studies and evaluations, and give a detailed error analysis for both Marathi-to-Hindi and Hindi-to-Marathi machine translation systems. From the evaluations we observed an order-of-magnitude improvement in translation quality. Lexical resources do help uplift performance when parallel corpora are scanty.

Keywords: Statistical Machine Translation, IndoWordnet, Lexical Resources.

1. Introduction
Machine Translation (MT) is the process of translating text or speech from one natural language to another with the help of machines. There are many ongoing attempts to develop MT systems for various Indian languages using rule-based as well as statistical approaches. Since India is rich in linguistic divergence, there are many morphologically rich languages, quite different from English as well as from each other, and there is a great need for machine translation between them (Nair et al., 2012). India has 18 constitutional languages, written in 10 different scripts (Antony, 2013). This paper discusses various approaches used in Indian-language-to-Indian-language MT, especially the Marathi-to-Hindi statistical MT system and vice versa, to improve the quality of machine translation.

In the case of Marathi, a single noun can have more than 200 forms that are either adjectives or adverbs. Similarly, a verb may exhibit over 450 forms. The language covers about 10,000 nouns and over 1,900 verbs, and 175 postpositions can be attached to nominal and verbal entities; some postpositions can occur in compound forms with most of the other postpositions. In addition, there are many kinds of derivable words, such as causative verbs like 'karavane', i.e. 'to make (someone) do (something)', derivable from the root 'karane', i.e. 'to do', and abstract nouns like 'gharpan', i.e. 'homeliness', derivable from 'ghar', i.e. 'home' (Veena et al., 2005).

Major difficulties in machine translation are handling the structural differences between the two languages and handling the ambiguities. For example, consider the translated Hindi output from the Marathi-Hindi SMT system for the Marathi sentence:

Marathi: {hya karnaastav to naraz hota}
Hindi: {yah phalswaroop vah naraz ho}

Here the Marathi word {karnaastav} is wrongly mapped to the Hindi word {phalswaroop}, and the verb {hota} is wrongly mapped to {ho gaya}.

In order to learn various inflected and verb forms, lexical resources can play a major role. A detailed analysis of various linguistic phenomena, and of how lexical resources can be used to improve translation quality, is given in the following sections.

1.1 Challenge of Ambiguity
There are two types of ambiguities: structural ambiguity and lexical ambiguity.

1.1.1 Lexical Ambiguity
Words and phrases in one language often have multiple meanings in another language. For example, in the sentence:

Marathi: {aale hote mehnaan kaam jchale}
Hindi: {aadark tha isliye kaam hua}
English: Ginger was there, so the work was done.
or
Hindi: {maenne photo nikala}
English: (They) came, that is why the work was done.

Here, the word {aale} is ambiguous: it is not clear whether {aale} corresponds in Hindi to the "ginger" ({aadark}) sense or the "came" ({aayen}) sense. This kind of ambiguity becomes clear only from the context.

1.1.2 Structural Ambiguity
In this case, multiple meanings arise from the structural order. For example:

Marathi: {tithe latth muli aani mulen hoti} {There were fat girls and boys there}

Here, from the words {lattha muli aani mulen} it is clear that the girls are fat, but it is not clear whether the boys are fat, since in Marathi the single word {lattha} {fat} serves for both girls and boys. The sentence can have two interpretations in Hindi and English according to its structure:

Hindi: {vahan moti ladkiyam aur ladkem the} {There were fat girls and boys there}
or
Hindi: {vahan moti ladkiyam aur mote ladkem the} {There were fat girls and fat boys there}

Generating appropriate machine translations by handling this kind of structural ambiguity is one of the big problems in Machine Translation.

1.2 Structural Differences
In Marathi-Hindi machine translation, both languages follow the same structural ordering in sentences, namely Subject-Object-Verb (SOV). Even though there is this ordering similarity, there are morphological and stylistic differences which have to be considered during translation. Marathi is morphologically more complex than Hindi, with many more post-modifiers in the former than in the latter (Dabre et al., 2012; Bhosale, 2011). For example, the word form {hustinapoorchya} {of/about Hastinapur} is derived by attaching {chyaa} as a suffix to the noun {hustinapur} {a place in India} through an inflectional process. Marathi exhibits agglutination of suffixes, which is not present in Hindi, and these suffixes therefore have Hindi equivalents in the form of postpositions. For the above example, the Hindi equivalent of the suffix {chyaa} is the postposition {ke}, which is separated from the noun {hustinapur}. Hence the translation of {hustinapoorchya} is {hustinapur ke}.

Consider an example of word ordering:

Marathi: {gadmukteshwar hindoonche pavitr teerth aahe} (S)(O)(V)
Hindi: {gadmukteshwar hinduom ka paavan teerth hai} (S)(O)(V)
English: Gadmuktheshwar is the holy place for Hindus. (S)(V)(O)

Since the word order is the same for both languages, it is an advantage for a statistical machine translation system during alignment, and it improves the quality of translation.

1.2.1 Participial Constructions
Constructions in Hindi having participials in Marathi. Example:

Hindi:
jo ladkaa gaa rahaa thaa wah chalaa gayaa
rel. boy sing stay+perf.+cont. be+past walk go+perf.
The boy who was singing has left.

Marathi (direct translation):
jo mulgaa gaat hotaa to nighoon gelaa
rel. boy sing+imperf. be+past leave+CP go+perf.
The boy who was singing has left.

Participial construction in Marathi (actual translation):
gaaNaaraa mulgaa nighoon gelaa
sing+part. boy leave+CP go+perf.
The boy who was singing left.

Note: deletion/dropping of the subordinate clause is common in Marathi as compared to Hindi. In the above sentence, {gaanara} {who was singing} is the contracted form of the subordinate clause, and the relative marker {jo} {who} is dropped.

1.3 Vocabulary Differences
Languages differ in the way they lexically divide the conceptual space, and sometimes no direct equivalent can be found for a particular word or phrase of one language in another. Consider the sentence:

Marathi: {kaal mangalgowreechi pooja hoti} {Yesterday the pooja which happens in the month of sravan for married women got completed}

Here {mangalagowrichi} {the pooja which happens in the month of sravan for married women} has no equivalent in Hindi, and this sentence has to be translated as:

Hindi: {Kal sahagan ki sravan maas mem sampannu honewali pooja thi.} {Yesterday the pooja which happens in the month of sravan for married women got completed}

Determining translations of such language-specific concepts poses additional challenges in machine translation.
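The suffix-to-postposition mapping discussed in Section 1.2 ({hustinapoorchya} -> {hustinapur ke}) can be pictured as a small detach-and-rewrite step. The sketch below is purely illustrative: the suffix table and helper are hypothetical toy versions over romanized forms, not the paper's actual resource (note also that the stem's romanization in the paper varies between 'hustinapoor' and 'hustinapur').

```python
# Toy sketch: map an agglutinated Marathi suffix to its Hindi postposition,
# as in 'hustinapoorchya' -> 'hustinapur ke'. Table and names are hypothetical.

SUFFIX_TO_POSTPOSITION = {
    "chya": "ke",   # genitive/oblique suffix -> separate Hindi postposition
}

def split_suffix(word):
    """Detach a known Marathi suffix; return (stem, Hindi postposition or None)."""
    for suffix, postposition in SUFFIX_TO_POSTPOSITION.items():
        if word.endswith(suffix):
            return word[: -len(suffix)], postposition
    return word, None

stem, pp = split_suffix("hustinapoorchya")
print(stem, pp)  # hustinapoor ke
```

A real system would need many such suffixes (the paper mentions 175 postpositions, plus compounds) and morphophonemic adjustments at the stem boundary.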

103 2. Difficulties in SMT approach Our experiments are listed below: As described in Section 1, there are many structural 1. SMT system with an uncleaned corpus differences between languages. The statistical approach 2. SMT system with a cleaned corpus tries to generate translations based on the knowledge and 3. Corpus with IndoWordnet extracted words statistical models extracted from parallel aligned bilingual 4. Corpus with Function words and Kridantha pairs text corpora. Statistical models take the assumption that 5. Corpus with verb phrases every word in the target language is a translation of the 3.1.1 SMT system with an uncleaned corpus source language words with some probability (Brown et We have used corpus of Tourism and Health provided by al., 1993). The words which have the highest probability ILCI consortium, DIT and EILMT. There were many will give the best translation. Consistent patterns of misalignments, wrong and missing translations in the divergence between the languages (Dorr et al., 1994, Dave corpus. It affected the translation and the quality was not et al., 2002, Ramananthan et al. , 2011) when translating good. from one language to another, handling reordering For example, consider a sentence from the corpus where divergence are one of the fundamental problems in MT the translation is wrong, (Kunchukuttan and Bhattacharyya 2012, Och and Ney, Hindi : 2001, Koehn, 2007). द - अ , In the case of Marathi and Hindi, even though both the , द । language follows same word order, there are structural {manipur ke doorsth uththa-poorv raajy mem polo khel ka difference between the language and in the generation of astivthv kaayam rahne keliye, samsar aadhunik polo ke word forms due to the morphological complexity as janmu ka shruni hae, kyonki aaj yah poori duniya mem described in Section 1. 
In order to overcome this difficulty and make the machine learn different morphological word forms, lexical resources can play a major role. Different word forms, such as verb phrases and morphological forms, can be used. The SMT system also lacks vocabulary due to the small amount of parallel corpus. Comparative performance studies conducted by Och and Ney (2003) have shown the significance of adding dictionary words to the corpus and the resulting improvement in translation quality. In order to increase the coverage of the vocabulary, we have used IndoWordnet. IndoWordnet (Bhattacharyya, 2010) is a lexical database for various Indian languages, in which Hindi is the root and all other Indian language wordnets are linked to it through the expansion approach. Words and their concepts are stored in a structure called the Lexical Matrix, where rows represent word meanings and columns represent the forms. The extraction of bilingually mapped words and their usage in machine translation is described in Section 3. The comparative performance analysis of the phrase-based model and the phrase-based model after augmenting various lexical resources is described in Sections 4 and 5.

3. Experimental Discussion

We now describe the development of our Marathi-Hindi and Hindi-Marathi SMT systems1 (Sreelekha, Dabre and Bhattacharyya 2013), the experiments performed and the comparison of results, in the form of an error analysis. For constructing the statistical models we use Moses and Giza++2. We have conducted various experiments to improve the quality of machine translation by utilizing various lexical resources.

khela ja raha hae |}
English: {To retain the existence of the game of Polo in the far north eastern state of Manipur, the world is indebted to the birth of modern polo as today it is being played all over the world.}

Equivalent Marathi translation (wrong):
{polo aaj jagbhar khelela jato tyache shrey to yieshany bharatateel manipoorchya durgm dongarangamadye jivant raahila yakade jaate}
English: Today Polo is being played all over the world and its credit goes to retaining the existence of the game of Polo in the remote hills of Manipur of north eastern India.

The comparative performance results of the cleaned corpus over the uncleaned corpus are shown in Tables 4, 5, 6 and 7. From the error analysis, we came to the conclusion that in order to improve the translation quality, we need to provide a properly cleaned parallel corpus for training.

3.1.2 SMT system with cleaned corpus

We have corrected the misalignments between parallel sentences, which improves the learning of word-to-word alignments. Sometimes we had to correct even the source sentences. For example, for the above sentence,
Hindi: {manipur ke doorsth uththa-poorv raajy maem polo khel ka astivthv kaayam rahne keliye, samsar aadhunik polo ke janmu ka shruni hae, kyomki aaj yah poori duniya mem khela ja raha hae}

1 http://www.cfilt.iitb.ac.in/SMTSystem
2 http://www.statmt.org/
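The cleaning described in Section 3.1.2 (dropping missing translations, stripping unwanted characters, discarding badly misaligned pairs) can be sketched as a simple filter. This is a minimal illustration, not the authors' actual procedure: the whitespace normalisation and the length-ratio test for misalignment are assumptions made for the sketch.

```python
import re

def clean_parallel_corpus(pairs, max_len_ratio=3.0):
    """Filter (source, target) sentence pairs from a parallel corpus.

    Drops empty or badly misaligned pairs and normalises stray
    whitespace, mimicking the manual cleaning described above.
    The length-ratio threshold is an illustrative assumption,
    not the authors' exact criterion.
    """
    cleaned = []
    for src, tgt in pairs:
        # normalise repeated whitespace and stray spacing
        src = re.sub(r"\s+", " ", src).strip()
        tgt = re.sub(r"\s+", " ", tgt).strip()
        if not src or not tgt:
            continue  # missing translation
        ratio = max(len(src), len(tgt)) / max(1, min(len(src), len(tgt)))
        if ratio > max_len_ratio:
            continue  # likely misaligned pair
        cleaned.append((src, tgt))
    return cleaned

pairs = [("ek vaakya", "eka  vaakya"), ("chhota", ""), ("a" * 100, "b")]
print(clean_parallel_corpus(pairs))
```

In practice the surviving pairs would be written back out as the Moses training files; only the first pair above survives the filter.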

Correct Marathi translation:
{manipoorchyaa durasth eeshanya rajaymadye polo khelache asthvithv tikoon thevnyasati, samsar aadhunik polochya jenmache shruni aahe, kaaran ki aaj ha sampoorn jagphar khetla jaat aahe.}
English: {To retain the existence of the game of Polo in the far north eastern state of Manipur, the world is indebted to the birth of modern polo as today it is being played all over the world.}

We have removed from the parallel corpus the stylistic constructions which prevent the learning of grammatical structures. We have also removed unwanted characters and wrong translations, and corrected the missing translations and phrases. The resultant machine translation system's quality improved by more than 40%. The statistics of the corpus used are shown in Table 1 and the results are shown in Tables 4, 5, 6 and 7. During error analysis we came to know that the machine lacks a sufficient amount of vocabulary, and hence we investigated the usage of IndoWordnet to improve the quality of machine translation.

3.1.3 Corpus with IndoWordnet extracted words

We have extracted a total of 437832 parallel Marathi-Hindi words using bilingual mapping according to their semantic and lexical relations, as described in Section 2. We have used an algorithm to extract the bilingual words from IndoWordnet. Bilingual mappings are generated using the concept-based approach across words and synsets (Mohanty et al., 2008). For a single word, all its synset word mappings are considered and that many entries of parallel words are generated. For example, the word {antaheen} has the following equivalent synset words in IndoWordnet:
{antaheen: anantu asamapya antaheen anant antaheen anavasaan}
{endless: endless not-ending endless infinite endless not-ending}

We have augmented the extracted parallel words into the training corpus. This helped in improving the translation quality to a great extent. The statistics of the Wordnet synsets used are shown in Table 2 and the results are shown in Tables 4, 5, 6 and 7. During the error analysis we observed that even though the machine translation system is able to give translations of considerably good quality, it lacks in handling case markers and inflected forms. One of the advantages is that it helped the machine to handle word

Sl. No   Corpus Source   Training Corpus [Manually cleaned and aligned]   Corpus Size [Sentences]
1        ILCI            Tourism                                          25000
2        ILCI            Health                                           25000
3        DIT             Tourism                                          20000
4        DIT             Health                                           20000
         Total                                                            90000

Table 1: Statistics of Training Corpus

Sl. No   Lexical Resource Source   Lexical Resources in Corpus     Lexical Resource Size [Words]
1        CFILT, IIT Bombay         IndoWordnet Synset words        437832
2        CFILT, IIT Bombay         Function Word, Kridanta Pairs   5000
3        CFILT, IIT Bombay         Verb Phrases                    4471
         Total                                                     447303

Table 2: Statistics of Lexical Resources Used
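The IndoWordnet synset-word counts in Table 2 come from the concept-based extraction described in Section 3.1.3: linked wordnets share a concept, so every Hindi member of a synset can be paired with every Marathi member of the corresponding synset. A minimal sketch follows; the tiny synset dictionaries and the function name are hypothetical stand-ins for IndoWordnet's actual data and API.

```python
# Hypothetical miniature stand-ins for IndoWordnet synsets,
# keyed by a shared concept id across the linked wordnets.
hindi_synsets = {101: ["antaheen", "anant", "anavasaan"]}
marathi_synsets = {101: ["anantu", "asamapya"]}

def extract_bilingual_pairs(src_synsets, tgt_synsets):
    """Generate one parallel-word entry per cross-lingual
    synset-member pairing (concept-based mapping)."""
    pairs = []
    for concept_id, src_words in src_synsets.items():
        for tgt_word in tgt_synsets.get(concept_id, []):
            for src_word in src_words:
                pairs.append((src_word, tgt_word))
    return pairs

print(len(extract_bilingual_pairs(hindi_synsets, marathi_synsets)))  # 6 entries
```

Each generated pair would then be appended to the parallel training corpus as an extra sentence pair, which is how the 437832 entries of Table 2 enter training.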

Sl. No   Corpus Source   Tuning Corpus [Manually cleaned and aligned]   Corpus Size [Sentences]
1        EILMT           Tourism                                        100
         Total                                                          100

Table 3: Statistics of Tuning Corpus

Sl. No   Corpus Source   Testing Corpus [Manually cleaned and aligned]   Corpus Size [Sentences]
1        EILMT           Tourism                                         100
         Total                                                           100

Table 4: Statistics of Testing Corpus

sense disambiguation well, since the synsets cover all common forms of a word.

3.1.4 Corpus with Function words and Kridanta pairs

Marathi and Hindi have 7 types of kridanta forms, with their post-position, pre-position and inflected forms. We have augmented kridanta, akhyat, function word and suffix pairs etc. into the training corpus. This helped the machine translation system to infer the grammatical structures, and hence the quality of translation improved.

A sample Marathi-Hindi kridanta form pair is:
{aayla pahije : na chahiye}
{need : need}

The statistics of the function words used are shown in Table 2 and the results are shown in Tables 4, 5, 6 and 7. From the error analysis we came to the conclusion that the translation system faces difficulties in handling verbal translations because of morphological phenomena.

3.1.5 Corpus with verb phrases

In order to overcome the verbal translation difficulty, we have extracted Marathi-Hindi parallel verbal forms and their translations, which cover various phenomena. We have augmented the 4471 entries of verbal translations into the training corpus.
A sample Marathi-Hindi verb form pair is:
{zaroor karva lem : avasyu karvoon ghya}
{should get it done : should get it done}

The statistics of the verb phrases used are shown in Table 2 and the results are shown in Tables 4, 5, 6 and 7. The error analysis study has shown that the quality of the translation improved drastically.

4. Evaluation

We have tested the translation system with a corpus of 100 sentences taken from the 'EILMT tourism health' corpus, as shown in Table 3. We have used various evaluation methods such as subjective evaluation, BLEU score (Papineni et al., 2002), METEOR and TER (Agarwal and Lavie 2008). We gave importance to subjective evaluation to determine the fluency (F) and adequacy (A) of the translation, since for morphologically rich languages subjective evaluation can give more accurate results compared to other measures. Fluency is an indicator of the correct grammatical constructions present in the translated sentence, whereas adequacy is an indicator of the amount of meaning carried over from the source to the target. Marathi-Hindi bilingual experts have assigned scores between 1 and 5 to each generated translation, on the basis of how much meaning is conveyed by the generated translation and its grammatical correctness. The basis of scoring is given below:
 5: The translations are perfect.
 4: One or two incorrect translations and mistakes.
 3: Translations are of average quality, barely making sense.
 2: The sentence is barely translated.
 1: The sentence is not translated or the translation is gibberish.

S1, S2, S3, S4 and S5 are the counts of the number of sentences with scores from 1 to 5 and N is the total number of sentences evaluated. The formula (Bhosale et al., 2011) used for computing the scores is:

Score = (S5 + 0.8 × S4 + 0.6 × S3) / N

We consider only the sentences with scores 3 and above. We penalize the sentences with scores 4 and 3 by multiplying their counts by 0.8 and 0.6 respectively, so that the estimate of the scores is better. The estimate may vary from person to person, as these scores are subjective, in which case an inter-annotator agreement is required. We do not give these scores, since we had only one evaluator. The results of our evaluations are given in Tables 4, 5, 6 and 7.

5. Error Analysis

We have evaluated the translated outputs of both the Marathi to Hindi and the Hindi to Marathi Statistical Machine Translation systems in all 5 categories as explained in Section 4. The detailed error analysis is shown in Tables 8 and 9 for a sentence exhibiting a variety of linguistic phenomena, showing how the quality of the Machine Translation system changes as various lexical resources are augmented. The results of the BLEU score, METEOR and TER evaluations are displayed in Tables 6 and 7, and the results of the subjective evaluations are displayed in Tables 4 and 5. We have observed that the quality of the translation improves as the corpus is cleaned and as more lexical resources are used. Hence there is incremental growth in adequacy, fluency, BLEU score, METEOR score and TER score. The fluency of the translation increased up to 83% in the case of Marathi to Hindi and up to 85% in the case of Hindi to Marathi.

We have also observed that the Hindi to Marathi translation quality scores are slightly higher than those of Marathi to Hindi translation. Marathi is morphologically richer than Hindi and has more agglutinative suffixes attached, which Hindi lacks, as explained in Section 1.2. These Marathi suffixes have Hindi equivalents in the form of post-positions. So during alignment from Hindi to Marathi, a Hindi word can align to the agglutinated word in Marathi, since it is a single word. On the other hand, while aligning from Marathi to Hindi, the agglutinative word can map only to the root word, and there is a chance of missing the post-position mapping, as it is a separate word. This improves the translation quality of Hindi-Marathi SMT as compared to Marathi-Hindi SMT.

Marathi-Hindi Statistical MT System            Adequacy   Fluency
With Uncleaned Corpus       With Tuning        20.6%      30.8%
                            Without Tuning     17.8%      24%
With Cleaned Corpus         With Tuning        58.6%      68.3%
                            Without Tuning     54%        64%
With Wordnet                With Tuning        72.4%      80%
                            Without Tuning     69.6%      78.2%
With Function Words,        With Tuning        78%        81%
kridanta pairs              Without Tuning     75%        80%
With verb Phrases           With Tuning        83%        88%
                            Without Tuning     80%        85%

Table 4: Results of Marathi-Hindi SMT Subjective Evaluation
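The adequacy and fluency percentages in Tables 4 and 5 follow the weighted count defined in the Evaluation section (full credit for score 5, 0.8 for score 4, 0.6 for score 3). As a sketch of that formula in code, not the authors' actual evaluation script:

```python
def subjective_score(s3, s4, s5, n):
    """Subjective-evaluation estimate (after Bhosale et al., 2011):
    s3, s4, s5 are counts of sentences scored 3, 4 and 5;
    n is the total number of evaluated sentences.
    Returns a percentage."""
    return 100.0 * (s5 + 0.8 * s4 + 0.6 * s3) / n

# e.g. 40 perfect, 30 near-perfect and 20 average sentences out of 100
# (hypothetical counts, for illustration only)
print(subjective_score(s3=20, s4=30, s5=40, n=100))
```

Sentences scored 1 or 2 contribute nothing, so they need not be passed in; the hypothetical counts above yield 76.0%.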

Hindi-Marathi Statistical MT System            Adequacy   Fluency
With Uncleaned Corpus       With Tuning        22.87%     31.3%
                            Without Tuning     20.56%     28%
With Cleaned Corpus         With Tuning        59%        72.21%
                            Without Tuning     53%        65%
Corpus with Wordnet         With Tuning        73%        83%
                            Without Tuning     70.01%     80.04%
Corpus with Function        With Tuning        79.36%     87.21%
words, kridanta pairs       Without Tuning     76%        85.68%
Corpus with verb Phrases    With Tuning        85.01%     89.32%
                            Without Tuning     82%        86.34%

Table 5: Results of Hindi-Marathi SMT Subjective Evaluation

Hindi-Marathi Statistical MT System            BLEU score   METEOR   TER
With Uncleaned Corpus       With Tuning        1.96         0.124    45.06
                            Without Tuning     1.26         0.127    44.55
With Cleaned Corpus         With Tuning        7.76         0.193    84.05
                            Without Tuning     3.97         0.190    83.92
Corpus with Wordnet         With Tuning        11.78        0.225    82.91
                            Without Tuning     9.31         0.217    84.30
With Function Words,        With Tuning        12.21        0.274    83.79
kridanta pairs              Without Tuning     9.25         0.214    85.06
With verb Phrases           With Tuning        18.15        0.281    77.94
                            Without Tuning     10.26        0.261    85.48

Table 6: Results of Hindi-Marathi SMT BLEU score, METEOR, TER Evaluations

Marathi-Hindi Statistical MT System              BLEU score   METEOR   TER
With Uncleaned Corpus        With Tuning         1.86         0.119    50.32
                             Without Tuning      1.10         0.105    49.52
Corpus with Cleaned Corpus   With Tuning         8.01         0.171    78.32
                             Without Tuning      4.22         0.160    Infinity
Corpus with Wordnet          With Tuning         11.54        0.278    80.75
                             Without Tuning      9.78         0.226    77.30
Corpus with Function Words,  With Tuning         12.20        0.283    81.19
kridanta pairs               Without Tuning      10.46        0.263    81.19
Corpus with verb Phrases     With Tuning         17.80        0.288    80.36
                             Without Tuning      13.70        0.265    81.48

Table 7: Results of Marathi-Hindi SMT BLEU score, METEOR, TER Evaluations
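The BLEU scores in Tables 6 and 7 follow Papineni et al. (2002): a geometric mean of modified n-gram precisions times a brevity penalty. As a minimal, single-reference, sentence-level sketch for illustration (the reported scores were presumably computed with standard corpus-level tooling, and real evaluations use smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference sentence BLEU, for illustration only:
    geometric mean of modified 1..max_n-gram precisions x brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(1, sum(cand_ngrams.values()))
        if overlap == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_prec_sum += math.log(overlap / total)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("polo is played all over the world",
           "polo is played all over the world"))  # identical -> 1.0
```

Note that the published tables report corpus-level scores scaled to 0-100, whereas this sketch returns a 0-1 value for one sentence pair.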

107 Sr. No. Sentence Explanation of phenomena 1 Source Sentence : १८७६ Hindi भ द औ 1886 । Meaning In 1986 the national central museum was established during the visit of the Prince of Wales and in 1886 was opened for the public. With १८७६ 1. Many words are wrongly translated. With Tuning भ द के शेड डाला 2.Function words not translated: Uncleane गया 1886 . 3. Verb translation has tense d problem: not translated. Without Corpus १८७६ 5.Insertion case: , Tuning भ द 1886 6. Conjunction not translated. वह . With With १८७६ 1. Function word: not translated Cleaned Tuning भ द 2. In the second verb part, there is an Corpus 1886 . inflection problem 3. Insertion verb case: Without १८७६ 4. Inflection problem: "द " Tuning भ द 1886 not translated correct as: द . 5. First verb not translated correctly

Corpus With १८७६ “द ” suffix addition is missing. With Tuning भ द Wordnet 1886 . Verb “ ” is wrongly translated as “ ”. “ ” is correctly Without १८७६ translated. Tuning भ द 1886 .

Corpus With १८७६ “द ” suffix addition is missing. with Tuning भ द “ ” got translated correctly. Function 1886 . Verb “ ” didn’t get translated. Words, kridanta Without १८७६ pairs Tuning भ द 1886

Corpus With कᴂ द्रीय सरकारी संग्रहालय १८७६म鵍ये ꥍरंस औफ “ ” is translated with verb Tuning वे쥍स楍या भारतभेटी楍या correctly. Verb “ ” didn’t get Phrases भ १८८६ translated. . Without १८७६ Tuning भ द 1886 .

Table 8: Hindi- Marathi SMT Error Analysis

For example, during Hindi-Marathi translation, in the Hindi-Marathi alignment the noun {hustinapur} can easily align to {hustinapoorchya}, since the latter is a single word. But in the Marathi-Hindi alignment, {hustinapoorchya} may map only to {hustinapur}, and there is a possibility of dropping the post-position {ke}, which is a separate word from the noun {hustinapur}. These features have an impact on the translation quality, and hence the inflected forms may not translate properly from Marathi to Hindi. On the other hand, the Hindi to Marathi translation system does not face this difficulty because of the alignments. Thus there is an improvement in the quality of Hindi-Marathi MT compared to Marathi-Hindi MT.

108 Sr. No. Sentence Explanation of phenomena 1 Source Sentence : द भ द Marathi , अ . Meaning In Bharatpur for going from one place to another can bring taxi, cycle rickshaw or auto rickshaw. With भ द Here the function words “ ”, “ ”, With Tuning , द verb “ ”, Conjunction “ ”, Insertion Uncleane । cases: , etc are not translated. d WithoutT भ द Also Modal Auxiliary verb “ ” is Corpus uning , द translated as “ ” instead of “ ।” Post-positions not translated for words , , , . Wrong translated word: द . Not translated words: , With With भ द द Here the is wrongly translated as Cleaned Tuning , औ Insertion case “ ”. The kridant Corpus । “ ” is missing in जा赍यासाठी translation, only post-position Without भ द साठी translated. Missing word: जा赍या. Tuning , औ Suffix चे not translated (वाहतकीचे). ु । Wrongly translated words-ठठकाण – सदर, अनेक पयााय. Verb wrongly translated: घेऊ– कर

Corpus With भ द औ , With Tuning _ Krudant form “जा赍यासाठी” not Wordnet transferred. Modal Auxiliary verb “ ” is translated as plural form “ ” instead of the kridanta “ ”. Function word “ ”, “ ”, “ ”, are missing. Suffix “चे” not translated from वाहतुकीचे.

Without भ द Here the translation is perfect but the Tuning , function word “ ” is wrongly instead । of “ ”. Suffix चे not translated from वाहतुकीचे. Corpus With भ _ औ The translation is good except the word with Tuning , _ “द ” is misplaced as “ ” due to Function । lexical choice. Words, kridanta Without भ द pairs Tuning , _ ।

Corpus With भ _ औ with verb Tuning , _ The translation is good except the word Phrases । “द ” is misplaced as “ ” due to lexical choice. Without भ द Tuning , _ ।

Table 9: Marathi – Hindi SMT Error Analysis

6. Conclusion

In this paper we have mainly focused on the usage of various lexical resources for improving the quality of Machine Translation. We have discussed the comparative performance of phrase-based Statistical Machine Translation with various lexical resources for both Marathi-Hindi and Hindi-Marathi. As discussed in the experimental section, although SMT lacks the ability to handle rich morphology, this can be overcome by using various lexical resources, which help the machine to improve the translation quality.

In our experiments we have used various evaluation measures such as BLEU score, METEOR, TER, and fluency and adequacy through subjective evaluation. We can see that there is incremental growth in both the Marathi-Hindi and Hindi-Marathi systems in terms of the BLEU score, METEOR and TER evaluations. Our subjective evaluation results also show promising scores in terms of fluency and adequacy. This points to the importance of utilizing various lexical resources for an efficient Machine Translation system. Thus, we can conclude that various lexical resources can play an important role in providing a good machine translation system for morphologically rich languages.

Our future work will focus on investigating more lexical resources for improving the quality of Statistical Machine Translation systems, and thereby developing an accurate MT system for both Marathi-Hindi and Hindi-Marathi Machine Translation.

7. Acknowledgements

We thank Almighty and truth for this work.

8. References

Abhay Agarwal and Alon Lavie. 2008. Meteor, M-BLEU, M-TER: Evaluation Metrics for High Correlation with Human Rankings of Machine Translation Output. Proceedings of the Third Workshop on Statistical Machine Translation, pages 115-118, Columbus, Ohio, USA, June 2008. Association for Computational Linguistics.
Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Karthik Visweswariah, Kushal Ladha and Ankur Gandhe. 2011. Clause-Based Reordering Constraints to Improve Statistical Machine Translation. IJCNLP 2011.
Anoop Kunchukuttan and Pushpak Bhattacharyya. 2012. Partially modelling word reordering as a sequence labeling problem. COLING 2012.
Antony P. J. 2013. Machine Translation Approaches and Survey for Indian Languages. The Association for Computational Linguistics and Chinese Language Processing, Vol. 18, No. 1, March 2013, pp. 47-78.
Arafat Ahsan, Prasanth Kolachina, Sudheer Kolachina, Dipti Misra Sharma and Rajeev Sangal. 2010. Coupling Statistical Machine Translation with Rule-based Transfer and Generation. amta2010.amtaweb.org
Bonnie J. Dorr. 1994. Machine Translation Divergences: A Formal Description and Proposed Solution. Computational Linguistics, 1994.
Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 2003.
Franz Josef Och and Hermann Ney. 2001. Statistical Multi-Source Translation. MT Summit 2001.
Ganesh Bhosale, Subodh Kembhavi, Archana Amberkar, Supriya Mhatre, Lata Popale and Pushpak Bhattacharyya. 2011. Processing of Participle (Krudanta) in Marathi. ICON 2011, Chennai, December 2011.
Kevin Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 1999.
Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318.
Latha R. Nair and David Peter S. 2012. Machine Translation Systems for Indian Languages. International Journal of Computer Applications (0975-8887), Volume 39, No. 1, February 2012.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June 2007.
Pushpak Bhattacharyya. 2010. IndoWordnet. LREC 2010.
Rajat Kumar Mohanty, Pushpak Bhattacharyya, Shraddha Kalele, Prabhakar Pandey, Aditya Sharma and Mitesh Kopra. 2008. Synset Based Multilingual Dictionary: Insights, Applications and Challenges. Global Wordnet Conference 2008.
Sreelekha, Raj Dabre and Pushpak Bhattacharyya. 2013. Comparison of SMT and RBMT: The Requirement of Hybridization for Marathi-Hindi MT. ICON, 10th International Conference on NLP, December 2013.
Veena Dixit, Satish Dethe and Rushikesh K. Joshi. 2005. Design and Implementation of a Morphology-based Spellchecker for Marathi, an Indian Language. Archives of Control Sciences, Volume 15(LI), 2005, No. 3, pages 251-258.

Issues in chunking parallel corpora: mapping Hindi-English verb group in ILCI

Esha Banerjee, Akanksha Bansal and Girish Nath Jha
Jawaharlal Nehru University, New Delhi, India
(esha.jnu, akanksha.bansal15, girishjha)@gmail.com

Abstract
A well annotated corpus is a treasure for Natural Language Processing (NLP) and can benefit NLP research activities like Machine Translation, Text Summarization and Information Retrieval. But since language is a dynamic and complex phenomenon, Part Of Speech (POS) annotation and Local Word Grouping, or chunking, prove to be challenging tasks, mainly for two reasons: first, the maximum possible information about the structure of a sentence needs to be captured, and second, the tags should be easy for the machine to map, facilitating the desired output in an effective application. The present paper deals with issues faced in chunking verb groups in Hindi with respect to their mapping with English verb groups for machine translation. There are some verbal constructions in Hindi which are not present in English, e.g. double causatives and serial constructions. Thus the task of mapping Hindi verb groups onto English for the purpose of translation can restrict the accuracy of the output attained. These divergences have been charted out with some relevant examples from both languages. The purpose of describing these divergence issues is to find the most appropriate way of creating Chunk Annotation Tag-set standards, which are currently under development for Indian languages.
Keywords: chunking, annotated, corpus, verb, machine translation, standards, Indian Languages Corpora Initiative (ILCI)

1. Introduction

The BIS, in 2011, standardized a POS tagset for Indian languages. The tagset is in a layered hierarchical schema and contains 11 coarse-grained parent tags such as Noun, Verb, Pronoun, Adjective and Particle, which are divided into second-level finer tags, with up to 5 sub-layers (Nainwani et al, 2011). Human annotators have been trained in labeling their native languages with the present schema under the ILCI project, and the BIS POS tagged corpus is presently one of the largest available annotated corpora for Indian languages. The project is currently in its second phase, and 50,000 new sentences in 17 languages, including English, are expected to be annotated by the end of this phase. The standardization of chunking guidelines, based on the BIS POS scheme, is the next step, as the second-level annotation activity.

The ultimate goal of the ILCI project is the creation of parallel annotated text so as to facilitate other NLP tasks such as Machine Translation. Chunking is therefore an extremely crucial step, by which rules of context-free grammar would help break a sentence into phrases, and subsequent mapping of the corresponding structures would take place. This would aid in handling translation divergences arising due to factors such as word order and the encoding of morphological information. In this context, verb groups present one of the most complex problems in mapping sentence structures across different language families. Verb groups in morphologically rich languages embed a lot of information, which has a direct relation to the morpho-syntactic structure of the rest of the sentence.

The focus of this paper is to chart out the challenges that arise in verb group chunking for the Hindi-English language pair. Hindi, belonging to the Indo-Aryan language family, is morphologically rich and its verbal behaviour varies vastly from English. The paper will discuss these variations in verb groups between the two languages and attempt to highlight some issues that need to be tackled in order to effectively chunk and map corresponding verb groups.

2. Chunking standards for Indian languages

A chunk has been defined as "a minimal (non-recursive) phrase (partial structure) consisting of correlated, inseparable words/entities, such that the intra-chunk dependencies are not distorted" (Bharati et al, 2006). Chunking helps determine the syntactic structure of a sentence by bringing out the relationship between words of varied syntactic categories. It is currently a popular and relatively successful shallow parsing technique, especially for statistical machine translation systems.

In Indian languages, little work has been done with regard to creating chunking standards. Indo-Aryan languages have a relatively free word order and therefore pose a challenge to NLP researchers. While Ray et al (2003) see a sentence from these languages as a 'bag' of chunks moving amongst each other, Bharati et al (1995) sought to solve the problem of dividing Hindi sentences into phrases based on Paninian grammar. Villian and Day (2000) used transformation rules to capture chunks from POS tags. Bhat and Sharma (2011) have proposed a hybrid approach for chunking in Kashmiri. There have been attempts to extract complex predicates based on computational and statistical methods with some degree of success (Sriram and Joshi, 2005). An attempt was made to extract verb groups from a parallel Hindi-English corpus using projected POS tags and word alignment techniques (Soni, Mukherjee and Raina, 2006), which reported a success of 83% precision and 46% recall.

Full-fledged chunk tagsets have been developed by Bharati et al (2006) under the Indian Languages Machine Translation (ILMT) consortium project sponsored by the Technology Development in Indian Languages (TDIL)

program of the Department of Electronics and Information Technology (DeitY) of GOI. The ILMT project included manual annotation of chunks in seven Indian languages. According to the guidelines, chunking falls under the fourth level of annotation within the tagset, which follows a layered approach including the following layers: Part Of Speech, Local Word Grouping, chunk, syntactic, thematic role/Predicate Argument Structure, semantic properties of lexical items, and Word Sense Disambiguation/Anaphora Resolution. The chunking tagset contains the following tags:

Figure 1: ILMT Chunk tagset

These guidelines serve as a foundation on which the present ILCI project corpus will be chunked.

3. ILCI project

The Indian Language Corpora Initiative (Jha, 2010) is a project initiated by the TDIL program of DeitY, GOI, which has the aim of collecting parallel translated annotated sentences in the 22 scheduled languages of the Indian Constitution plus English. The project is currently in its second phase, with 17 languages being translated from the source language Hindi and annotated with part-of-speech categories using the BIS POS scheme. The English translations in the ILCI project are tagged using the Penn Treebank tagset. The next level of annotation will involve manual chunking (shallow parsing), which will be based on the chunking standards created at IIIT-H.

4. Motivation

Parallel corpora can be used to build statistical parsers by mapping manually parsed text from source and target languages. Hindi is a language with SOV word order, which is relatively free. Verbs are multiple-word entities agreeing in gender and person with the noun. In the BIS tag scheme, the POS categories used to label verbs are main and auxiliary. The main verb is divided into finite, non-finite, infinitive and gerund. However, for Hindi, these distinctions were not applied at the POS level, as it was felt that the finiteness/non-finiteness of a verb would be a quality better expressed at the chunk level.

In South Asian languages, a common phenomenon is the occurrence of complex predicates, in which a noun, adjective, particle or adverb co-occurs with a light verb, required to complete the meaning, to create complex verbal structures (Abbi, 1992; Butt, 1995; Alsina, 1996). More commonly divided into compound and conjunct verb constructions, their behaviour is far more complex than that of English predicates. The IIIT-H tagset, on which the BIS tagset is based, chunks verb groups into four main categories: Finite (VGF), Non-finite (VGNF), Infinitive (VGINF) and Gerund (VGNN). It has been stated that constricting chunk categories to broad-based ones would aid in simplifying machine learning. However, we argue that, keeping in mind the syntactic dissimilarities between the two languages, it is imperative to map the information contained at this stage: not only does it reflect crucial information faithfully, but with some additional linguistic information, statistical methods would display an accentuated success rate.

5. Case study examples

The following examples serve to highlight the preliminary issues that were observed while examining the verb group patterns in Hindi and English by speakers fluent in both languages. Marking the verb groups with their inherent properties would serve to enhance automatic translation accuracy.

5.1 Double causative constructions

In double causatives, the verb ending -vaayaa/-vaanaa indicates a verb construction derived from ditransitive verbs. These are different from causative constructions in that, whereas in a causative construction the subject makes the object do something, in a double causative the subject acts on an intermediary to make the object do something.
Example:
(i) likh-vaanaa/likh-vaayaa
write-to make one/write-made one
to make someone write
(ii) khilwaanaa/khilwaayaa
eat-to make one/eat-made one
to make one eat
(iii) bhijwaanaa/bhijwaayaa
send-to make one/send-made one
to have a thing sent
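The -vaanaa/-vaayaa endings in examples (i)-(iii) are regular enough that an annotation tool could flag candidate double causatives mechanically before a human annotator confirms them. A minimal sketch; the transliteration-based suffix test and the function name are illustrative assumptions, not part of any existing chunking toolkit:

```python
def is_double_causative(verb):
    """Flag (transliterated) verbs carrying the -vaa- causative
    marker followed by an infinitive/perfective ending, e.g.
    likh-vaanaa, khilwaayaa. Purely illustrative heuristic:
    a real tool would work on Devanagari and a morph analyzer."""
    endings = ("vaanaa", "vaayaa", "waanaa", "waayaa")
    return verb.replace("-", "").endswith(endings)

for v in ["likh-vaanaa", "khilwaayaa", "bhijwaanaa", "bolnaa"]:
    print(v, is_double_causative(v))
```

Such a flag could feed the causative chunk marking discussed below, leaving ambiguous cases to the annotator.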

In such cases, the English equivalent is roughly 'to make one do / to have one do'.
(iv) maine khaanaa [bhijwaayaa thaa]
I food home [send-gotten had]
I had got the food sent home
This translation structure can be confused with the verb construction
(v) maine khaanaa ghara [bhej diya tha]
I food home [send had]
I had sent the food home
Marking such phrases as causative constructions could enable translation as 'made/had [NP] (do) verb root'.

5.2 Adverbial/Adjectival construction

The -vaalaa construction, when added to a noun or verb, lends an adjectival quality, as in doodhvaalaa (milkman), mehanata karne vaalaa (one who works hard) etc. (Sinha, 2009). Participial constructions such as jaane vaalii gaa.Dii (the train which is about to leave) are treated as part of the finite verb chunk (VGF), which leads to loss of the adjectival quality.
Example:
(vi) subaha [jaane vaalii] gaa.Dii aba shaama ko [jaane vaalii hai]
morning [scheduled to go] train now evening [go about to is]
The train [which was to leave] in the morning [will leave] in the evening now.
In the above case, jaane vaalii is used both as a participial and as a main verb. In the absence of a participial verb chunk, important information about the sentence structure is lost.

Another case of the adverbial/adjectival construct is the reduplicated verb, where examples like the following appear:
(vii) baiThe-baiThe so gayaa
sitting-sitting sleep went
fell asleep while sitting
(viii) chalte-chalte ruka gayaa
walking-walking stop went
stopped while walking
(ix) ruka-ruka ke chalnaa
stop-stop to-walk
walk while stopping intermittently

Another type of example would be participial constructions like
(x) behta huaa jharnaa
flow -ING waterfall
flowing waterfall
(xi) rotaa huaa bachchaa
cry -ING child
crying child
In the example
[rotaa huaa] bachchaa gira gayaa
cry -ING child fell went
The crying child fell down.
rotaa huaa (crying) is marked as a non-finite verb chunk, whereas 'crying' takes the Adjective tag and is included within the noun phrase.

According to the IIIT-H tagset, adverbial/adjectival constructions are accorded the chunk tag Verb Non-finite (VGNF). In all the above cases, marking the adjectival quality of the verb would disambiguate them from the general categories of finite and non-finite chunks.

5.3 Serial verb construction

In the serial verb construction common to north Indo-Aryan languages, as in
(xii) fona uthaa kara bolaa
phone pick do said
picked up the phone and said
(xiii) botala uthaa kara dekhii
bottle pick do saw
picked up the bottle and saw
there are two separate actions being performed and neither of the verbs is subordinate to the other. In such cases, chunking the verbs as a single phrase would pose problems when mapping to English translations.

It would additionally create ambiguity with compound verb phrases like
(xiv) maara giraayaa
hit made-fall
defeat
which also contain a Verb+Verb (V+V) construction. However, in compound verbs, the light verb (giraayaa, in this case) does not act as a separate action; rather, it provides a composite meaning together with the main verb (maara).

5.4 Conjunct verb

Conjunct verbs are traditionally defined as those verbal groups which have the structure Noun+Verb, such as

kShamaa karnaa (to forgive), jamhaaii lenaa (to yawn), maara khaanaa (to get hit), bhaaga jaanaa (to run). The guidelines state that, due to the syntactic relation that exists between the Noun and the Verb in such a construction, it would be marked as such at a later stage of annotation. Example:

(xv) [eka prashna] [karnaa]
a question to-ask
[to ask] [a question]

However, if Nouns and Verbs are chunked separately in such cases, the challenge is the identification of those Noun+Verb combinations which create conjunct verbs. For example, there would have to be a diagnostic to disambiguate choTa laganaa (to get hurt) from the conjunct verb construction mana laganaa (to be interested). As can be observed in English, the conjunct verb construction does not derive its semantics from the individual meanings of the verbs. Rather, the light verb in the conjunct verb construction loses semantic content and provides a distinct meaning to the verb phrase as a whole. Such Noun+Verb constructions are generally fixed in nature and, if marked as such, can create a database of complex predicates and accurately map the corresponding verb from English.

5.5 Null Subject
Apart from the direct mapping difficulties that were observed, there were certain instances where the verb group chunking would affect the chunking of the Noun Phrase. Null Subject is one such example.

Some languages like Hindi exhibit the null subject phenomenon, wherein the number, person and gender attributes are expressed by the verb alone. The verb, in those cases, acts as an independent clause without an explicit subject. Example:

ghara gayaa/gayii/gaye
home went (+mas)/went (+fem)/went (pl)
He/she/they went home.

In the above case, the morphology of the verb gives the gender and number information about the subject. In such cases, the empty subject has to be mapped onto a corresponding noun phrase. Marking the verb chunk with the relevant morphological cues helps this process.

5.6 Gender attribute
As mentioned above, verb groups in Hindi contain inflections for gender and number which determine the corresponding attributes of the subject or object. Without specifying these attributes, the chunk may conjoin to the noun phrase randomly while mapping, which gives rise to inaccurate results. Examples:

(xvi) raama ghara [jaataa hai]
Ram home goes
Ram goes home

(xvii) siitaa ghara [jaatii hai]
Sita home goes
Sita goes home

6. Conclusion
The issues described in this paper highlight the dissimilar behaviour of verbs in Hindi and English and its effect on creating chunking standards. The behaviour of complex predicates in South Asian languages, although linguistically complex, is a phenomenon well documented in the literature. Various attempts have been made, both rule-based and statistical, to map the linguistic structures of such languages onto other languages which lack such structures, like English. With a view to facilitating machine learning, annotation of parallel corpora is deemed one of the most effective techniques. This paper has dealt specifically with features of the Hindi language, most of which would be common to north Indo-Aryan languages like Punjabi. However, the research should be extended to the other Indian languages, which span five language families and include phenomena like agglutination. Due to the intricate nature of the Verb Phrase, this work has dealt with verb group mapping alone. However, due to the dependency of other POS features on the verb, and the Gender, Number and Person agreement that Indian languages exhibit, it would have significant effects on the chunking of the Noun Phrase as well. As shown by the examples, if the chunking process can train machines with a linguistically rich parallel corpus, the results would be expected to have far higher accuracy.

7. Acknowledgement
This research has received valuable input from the discussions between the Principal Investigators (PIs) of the language groups of the ILCI and ILMT projects. We would also like to acknowledge TDIL, DeitY, GOI for the use of the ILCI corpus for the study of translation divergences for this paper.

8. References
Abbi, A., & Gopalakrishnan, D. (1991). Semantics of explicator compound verbs in South Asian languages. Language Sciences, 13(2), 161-180.

Alsina, A., Bresnan, J., & Sells, P. (1997). Complex predicates: Structure and theory. In A. Alsina, J. Bresnan, & P. Sells (Eds.), Complex Predicates (pp. 1-12). Stanford, CA: CSLI Publications.

Bali, K., Choudhury, M., Chatterjee, D., Maheswari, A., & Prasad, S. (2009). Correlates between Performance, Prosodic and Phrase Structures in Bangla and Hindi: Insights from a Psycholinguistic Experiment. In Proceedings of ICON.

Begum, R., Jindal, K., Jain, A., Husain, S., & Sharma, D. M. (2011). Identification of conjunct verbs in Hindi and its effect on parsing accuracy. In Computational Linguistics and Intelligent Text Processing (pp. 29-40). Springer Berlin Heidelberg.

Bharati, A., Chaitanya, V., Sangal, R., & Ramakrishnamacharyulu, K. V. (1995). Natural language processing: A Paninian perspective (pp. 65-106). New Delhi: Prentice-Hall of India.

Bharati, A., Sangal, R., Sharma, D. M., & Bai, L. (2006). AnnCorra: Annotating corpora guidelines for POS and chunk annotation for Indian languages. LTRC-TR31.

Bhat, R. A., & Sharma, D. M. (2011). A Hybrid Approach to Kashmiri Shallow Parsing. In LTC-2011: The 5th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics.

Chakrabarti, D., Mandalia, H., Priya, R., Sarma, V. M., & Bhattacharyya, P. (2008, August). Hindi Compound Verbs and their Automatic Extraction. In COLING (Posters) (pp. 27-30).

Dave, S., Parikh, J., & Bhattacharyya, P. (2001). Interlingua-based English–Hindi Machine Translation and Language Divergence. Machine Translation, 16(4), 251-304.

Dorr, B. J. (1993). Machine translation: A view from the Lexicon. MIT Press.

Gune, H., Bapat, M., Khapra, M. M., & Bhattacharyya, P. (2010, August). Verbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (pp. 347-355). Association for Computational Linguistics.

Jha, G. N. (2010). The TDIL program and the Indian language corpora initiative (ILCI). In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010). European Language Resources Association (ELRA).

Kachru, Y. (1980). Aspects of Hindi grammar. New Delhi: Manohar.

Koul, O. N. (2008). Modern Hindi Grammar. Dunwoody Press.

Mannem, P., & Dara, A. (2011, June). Partial parsing from bitext projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (pp. 1597-1606). Association for Computational Linguistics.

Mukerjee, A., Soni, A., & Raina, A. M. (2006, July). Detecting complex predicates in Hindi using POS projection across parallel corpora. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (pp. 28-35). Association for Computational Linguistics.

Nainwani, P., Banerjee, E., Kaushik, S., & Jha, G. N. (2011). Issues in annotating less resourced languages: the case of Hindi from Indian Languages Corpora Initiative (ILCI). In Proceedings of the Fifth Language and Technology Conference (LTC '11).

Ray, P. R., Harish, V., Sarkar, S., & Basu, A. (2003). Part of speech tagging and local word grouping techniques for natural language parsing in Hindi. In Proceedings of the 1st International Conference on Natural Language Processing (ICON 2003).

Singh, S., Damani, O. P., & Sarma, V. M. (2012). Noun Group and Verb Group Identification for Hindi. In COLING (pp. 2491-2506).

Sinha, R. M. K. (2009, August). Mining complex predicates in Hindi using a parallel Hindi-English corpus. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (pp. 40-46). Association for Computational Linguistics.

Sinha, R. M. K. (2009, December). Learning Disambiguation of Hindi Morpheme 'vaalaa' with a Sparse Corpus. In Machine Learning and Applications, 2009. ICMLA '09. International Conference on (pp. 653-657). IEEE.

Venkatapathy, S., , P., & Joshi, A. K. (2005). Relative Compositionality of Noun+Verb Multi-word Expressions in Hindi. In Proceedings of the International Conference on Natural Language Processing (ICON '05).

Vilain, M., & Day, D. (2000). Phrase Parsing with Rule Sequence Processors: An Application to the Shared CoNLL Task. In Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal.
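The conjunct-verb diagnostic of Section 5.4 and the morphological marking of Sections 5.5-5.6 can be sketched programmatically. The sketch below is illustrative only and is not the paper's implementation: the conjunct-verb lexicon and the verb-inflection feature table are hypothetical toy stand-ins for the lexical resources the annotation guidelines assume.

```python
# Illustrative sketch: flagging fixed Noun+Verb conjunct candidates and
# marking Hindi verb chunks with gender/number cues carried by inflection.
# Both tables below are hypothetical toy data, not a real resource.

# Toy lexicon of fixed Noun+Verb conjunct constructions (cf. Section 5.4).
CONJUNCT_LEXICON = {
    ("kShamaa", "karnaa"),   # to forgive
    ("jamhaaii", "lenaa"),   # to yawn
    ("mana", "laganaa"),     # to be interested
}

# Toy table of gender/number features expressed by verb morphology
# (cf. Sections 5.5 and 5.6).
VERB_MORPH = {
    "gayaa":  {"gender": "m", "number": "sg"},
    "gayii":  {"gender": "f", "number": "sg"},
    "gaye":   {"gender": "m", "number": "pl"},
    "jaataa": {"gender": "m", "number": "sg"},
    "jaatii": {"gender": "f", "number": "sg"},
}


def is_conjunct_candidate(noun: str, verb: str) -> bool:
    """Diagnostic: is this Noun+Verb pair a fixed conjunct construction?"""
    return (noun, verb) in CONJUNCT_LEXICON


def mark_verb_chunk(tokens: list) -> dict:
    """Label a verb group and attach the morphological cues of its verbs,
    so a null or ambiguous subject can later be mapped to a noun phrase."""
    feats = {}
    for tok in tokens:
        feats.update(VERB_MORPH.get(tok, {}))
    return {"chunk": "VG", "tokens": tokens, "feats": feats}


# siitaa ghara [jaatii hai]: the chunk carries feminine singular agreement.
chunk = mark_verb_chunk(["jaatii", "hai"])
print(chunk["feats"])                              # {'gender': 'f', 'number': 'sg'}
print(is_conjunct_candidate("mana", "laganaa"))    # True
print(is_conjunct_candidate("choTa", "laganaa"))   # False (choTa laganaa, "to get hurt")
```

A real diagnostic would of course need a far larger lexicon and frequency or compositionality evidence from corpora, but the shape of the two checks (a fixed-expression lookup and feature percolation from verb morphology to the chunk label) follows the process the sections describe.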
