Workshop Abstracts

EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
LREC 2012 SATELLITE WORKSHOPS

Held under the Patronage of Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner

MAY 21-22 & MAY 26-27, 2012
ISTANBUL LÜTFI KIRDAR CONVENTION & EXHIBITION CENTRE
ISTANBUL, TURKEY

WORKSHOP ABSTRACTS

Editors: Please refer to each workshop's list of editors.
Editorial Assistance by: Sara Goggi, Hélène Mazo

© ELRA – European Language Resources Association. All rights reserved.

LREC 2012, EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Title: LREC 2012 Workshop Abstracts
Distributed by: ELRA – European Language Resources Association
55-57, rue Brillat Savarin
75013 Paris
France
Tel.: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
www.elra.info and www.elda.org
Email: [email protected] and [email protected]

Copyright by the European Language Resources Association
ISBN 978-2-9517408-7-7
EAN 9782951740877
All rights reserved. No part of this book may be reproduced in any form without the prior permission of the European Language Resources Association

TABLE OF CONTENTS

W112 - WILDRE Workshop on Indian Language Data: Resources and Evaluation
W114 - Workshop on Language Resources and Technologies for Turkic Languages
W18 - Best Practices for Speech Corpora in Linguistic Research
W110 - Multimodal Corpora: How should multimodal corpora deal with the situation?
W121 - Challenges in the management of large corpora
W129 - Merging of Language Resources
W51 - SALTMIL & AfLaT2012: Language technology for normalisation of less-resourced languages
W54 - ColabTKR 2012: Collaboration in Terminology and Knowledge Representation
W57 - LRE-Rel: Language Resources and Evaluation for Religious Texts
W83 - Semantic Relations-II
W87 - Describing Language Resources with Metadata
W89 - META-RESEARCH Workshop on Advanced Treebanking
W102 - Adaptation of Language Resources and Tools for Processing Cultural Heritage
W111 - Third Workshop on Building and Evaluating Resources for Biomedical Text Mining
W120 - Language Engineering for Online Reputation Management
W49 - Building and Using Comparable Corpora
W88 - EEOP2012: Exploring and Exploiting Official Publications
W93 - ES³ 2012: Corpora for Research on Emotion, Sentiment and Social Signals
W06 - Joint ISA-7, SRSL-3, and I2MRT Workshop on Semantic Annotation and the Integration and Interoperability of Multimodal Resources and Tools
W46 - Computational Models of Narrative
W103 - Ugc: @NLP can u tag #user_generated content?! via lrec-conf.org
W134 - Collaborative Resource Development and Delivery
W47 - Language Resources for Public Security Applications
W77 - Representation and Processing of Sign Languages
W84 - NLP4ITA: Natural Language Processing for Improving Textual Accessibility
W86 - CREDISLAS Creating Cross-Language Resources for Disconnected Languages and Styles
W97 - SPLeT-2012: Semantic Processing of Legal Texts

Workshop on Indian Language Data: Resources and Evaluation
21 May 2012

ABSTRACTS

Editors: Girish Nath Jha, Kalika Bali, Sobha L.

Workshop Programme

08:30-08:40 – Welcome by Workshop Chairs
08:40-08:55 – Inaugural Address by Mrs. Swaran Lata, Head, TDIL, Dept of IT, Govt of India
08:55-09:10 – Address by Dr. Khalid Choukri, ELDA CEO
09:10-09:45 – Keynote Lecture by Prof Pushpak Bhattacharyya, Dept of CSE, IIT Bombay

09:45-10:30 – Paper Session I
Chairperson: Sobha L
• Somnath Chandra, Swaran Lata and Swati Arora, Standardization of POS Tag Set for Indian Languages based on XML Internationalization best practices guidelines
• Ankush Gupta, A Generic and Robust Algorithm for Paragraph Alignment and its Impact on Sentence Alignment in Parallel Corpora
• Malarkodi C.S and Sobha Lalitha Devi, A Deeper Look into Features for NE Resolution in Indian Languages

10:30-11:00 – Coffee break + Poster Session
Chairperson: Monojit Choudhury
• Akilandeswari A, Bakiyavathi T and Sobha Lalitha Devi, ‘atu’ Difficult Pronominal in Tamil
• Subhash Chandra, Restructuring of Paninian Morphological Rules for Computer processing of Sanskrit Nominal Inflections
• Praveen Dakwale, Himanshu Sharma and Dipti Misra Sharma, Anaphora Annotation in Hindi Dependency TreeBank
• H. Mamata Devi, On the Development of Manipuri-Hindi Parallel Corpus
• Madhav Gopal, Annotating Bundeli Corpus Using the BIS POS Tagset
• Madhav Gopal and Girish Nath Jha, Developing Sanskrit Corpora Based on the National Standard: Issues and Challenges
• Ajit Kumar and Vishal Goyal, Practical Approach For Developing Hindi-Punjabi Parallel Corpus
• Sachin Kumar, Girish Nath Jha and Sobha Lalitha Devi, Challenges in Developing Named Entity Recognition System for Sanskrit
• Swaran Lata and Swati Arora, Exploratory Analysis of Punjabi Tones in relation to orthographic characters: A Case Study
• Diwakar Mishra, Kalika Bali and Girish Nath Jha, Grapheme-to-Phoneme converter for Sanskrit Speech Synthesis
• Aparna Mukherjee, Phonetic Dictionary for Indian English
• Sibansu Mukhapadyay, Tirthankar Dasgupta and Anupam Basu, Development of an Online Repository of Bangla Literary Texts and its Ontological Representation for Advance Search Options
• Kumar Nripendra Pathak, Challenges in Sanskrit-Hindi Adjective Mapping
• Nikhil Priyatam Pattisapu, Srikanth Reddy Vadepally and Vasudeva Varma, Hindi Web Page Collection tagged with Tourism Health and Miscellaneous
• Arulmozi S, Balasubramanian G and Rajendran S, Treatment of Tamil Deverbal Nouns in BIS Tagset
• Gurpreet Singh, Letter-to-Sound Rules for Gurmukhi Panjabi (Pa): First step towards Text-to-Speech for Gurmukhi
• Silvia Staurengo, TschwaneLex Suite (5.0.0.414) Software to Create Italian-Hindi and Hindi-Italian Terminological Database on Food, Nutrition, Biotechnologies and Safety on Nutrition: a Case Study

11:00-12:00 – Paper Session II
Chairperson: Kalika Bali
• Shahid Mushtaq Bhat and Richa Srishti, Building Large Scale POS Annotated Corpus for Hindi & Urdu
• Vijay Sundar Ram R, Bakiyavathi T, Sindhujagopalan R, Amudha K and Sobha Lalitha Devi, Tamil Clause Boundary Identification: Annotation and Evaluation
• Manjira Sinha, Tirthankar Dasgupta and Anupam Basu, A Complex Network Analysis of Syllables in Bangla through SyllableNet
• Pinkey Nainwani, Blurring the demarcation between Machine Assisted Translation (MAT) and Machine Translation (MT): the case of English and Sindhi

12:00-12:40 – Panel discussion on "India and Europe - making a common cause in LTRs"
Coordinator: Nicoletta Calzolari
Panelists: Khalid Choukri, Joseph Mariani, Pushpak Bhattacharyya, Swaran Lata, Monojit Choudhury, Zygmunt Vetulani, Dafydd Gibbon

12:40-12:55 – Valedictory Address by Prof Nicoletta Calzolari, Director ILC-CNR, Italy
12:55-13:00 – Vote of Thanks

Workshop Organizers

Girish Nath Jha          Jawaharlal Nehru University, New Delhi
Kalika Bali              Microsoft Research Lab India, Bangalore
Sobha L.                 AU-KBC Research Centre, Anna University, Chennai

Workshop Programme Committee

A. Kumaran               Microsoft Research Lab India, Bangalore
A. G. Ramakrishnan       IISc Bangalore
Amba Kulkarni            University of Hyderabad
Dafydd Gibbon            Universität Bielefeld, Germany
Dipti Misra Sharma       IIIT, Hyderabad
Girish Nath Jha          Jawaharlal Nehru University, New Delhi
Joseph Mariani           LIMSI-CNRS, France
Kalika Bali              Microsoft Research Lab India, Bangalore
Khalid Choukri           ELRA, France
Monojit Choudhury        Microsoft Research Lab India, Bangalore
Nicoletta Calzolari      ILC-CNR, Pisa, Italy
Niladri Shekhar Dash     ISI Kolkata
Shivaji Bandhopadhyah    Jadavpur University, Kolkata
Sobha L.                 AU-KBC Research Centre, Anna University
Soma Paul                IIIT, Hyderabad
Umamaheshwar Rao         University of Hyderabad

Introduction

WILDRE – the first ‘Workshop on Indian Language Data: Resources and Evaluation’ is being organized in Istanbul,
