Advances in Computational Linguistics

Advances in Computational Linguistics Research in Computing Science Series Editorial Board Comité Editorial de la Serie Editors-in-Chief: Associate Editors: Editores en Jefe Editores Asociados Juan Humberto Sossa Azuela (Mexico) Jesús Angulo (France) Gerhard Ritter (USA) Jihad El-Sana (Israel) Jean Serra (France) Jesús Figueroa (Mexico) Ulises Cortés (Spain) Alexander Gelbukh (Russia) Ioannis Kakadiaris (USA) Serguei Levachkine (Russia) Petros Maragos (Greece) Julian Padget (UK) Mateo Valero (Spain) Editorial Coordination: Formatting: Coordinación Editorial Formato Blanca Miranda Valencia Sulema Torres Ramos Research in Computing Science es una publicación trimestral, de circulación internacional, editada por el Centro de Investigación en Computación del IPN, para dar a conocer los avances de investigación científica y desarrollo tecnológico de la comunidad científica internacional. Volumen 41 Febrero, 2009. Tiraje: 500 ejemplares. Certificado de Reserva de Derechos al Uso Exclusivo del Título No. 04-2004-062613250000- 102, expedido por el Instituto Nacional de Derecho de Autor. Certificado de Licitud de Título No. 12897, Certificado de licitud de Contenido No. 10470, expedidos por la Comisión Calificadora de Publicaciones y Revistas Ilustradas. El contenido de los artículos es responsabilidad exclusiva de sus respectivos autores. Queda prohibida la reproducción total o parcial, por cualquier medio, sin el permiso expreso del editor, excepto para uso personal o de estudio haciendo cita explícita en la primera página de cada documento. Imagen de la portada: http://tierramayaimports.com/images/PL3terraA.jpg. Impreso en la Ciudad de México, en los Talleres Gráficos del IPN – Dirección de Publicaciones, Tres Guerras 27, Centro Histórico, México, D.F. Distribuida por el Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Esq. Av. Miguel Othón de Mendizábal, Col. Nueva Industrial Vallejo, C.P. 07738, México, D.F. Tel. 57 29 60 00, ext. 56571. Editor Responsable: Juan Humberto Sossa Azuela, RFC SOAJ560723 Research in Computing Science is published by the Center for Computing Research of IPN. Volume 41, February, 2009. Printing 500. The authors are responsible for the contents of their articles. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission of Centre for Computing Research. Printed in Mexico City, February, 2009, in the IPN Graphic Workshop – Publication Office. Volume 41 Volumen 41 Advances in Computational Linguistics Volume Editor: Editor de Volumen Alexander Gelbukh Instituto Politécnico Nacional Centro de Investigación en Computación México 2009 ISSN: 1870-4069 Copyright © Instituto Politécnico Nacional 2009 Copyright © by Instituto Politécnico Nacional Instituto Politécnico Nacional (IPN) Centro de Investigación en Computación (CIC) Av. Juan de Dios Bátiz s/n esq. M. Othón de Mendizábal Unidad Profesional “Adolfo López Mateos”, Zacatenco 07738, México D.F., México http://www.ipn.mx http://www.cic.ipn.mx The editors and the Publisher of this journal have made their best effort in preparing this special issue, but make no warranty of any kind, expressed or implied, with regard to the information contained in this volume. All rights reserved. No part of this publication may be reproduced, stored on a retrieval system or transmitted, in any form or by any means, including electronic, mechanical, photocopying, recording, or otherwise, without prior permission of the Instituto Politécnico Nacional, except for personal or classroom use provided that copies bear the full citation notice provided on the first page of each paper. Indexed in LATINDEX and Periodica / Indexada en LATINDEX y Periodica Printing: 500 / Tiraje: 500 Printed in Mexico / Impreso en México Preface Computational linguistics is an interdisciplinary research area that combines the ideas and methods of linguistics, computer science, and artificial intelligence and has two- fold goal: on the one hand, to study human language by means of modern computational methods, and on the other hand, to develop computer programs capable of human-like activities related to understanding or producing texts or speech in human language, such as English or Chinese. The most important technical applications of computational linguistics include information retrieval and information organization, machine translation, and natural language interfaces, among others. However, as in any science, the activities of the researchers are mostly concentrated on its internal art and craft, that is, on the solution of the problems arising in analysis or synthesis of natural language text or speech, such as syntactic and semantic analysis, disambiguation, or compilation of dictionar- ies and grammars necessary for such analysis. This volume presents 25 original research papers written by 64 authors represent- ing 23 different countries: Algeria, Brazil, Canada, China, Czech Republic, France, Germany, Greece, Hong Kong, India, Islamic Republic of Iran, Italy, Jordan, Lithua- nia, Macao, Mexico, Portugal, Russian Federation, Spain, Switzerland, Turkey, United Kingdom, and United States. The volume is structured in 8 thematic areas of both theory and applications of computational linguistics: – Computational lexicography and lexical resources – Morphology and syntax – Semantics – Anaphora and co-reference – Text classification – Text summarization – Speech generation – Applications The papers included in this volume were selected on the base of rigorous international reviewing process out of 65 submissions considered for evaluation; thus the acceptance rate of this volume was 38%. I would like to cordially thank all people involved in the preparation of this volume. In the first place I want to thank the authors of the published paper for their excellent research work that gives sense to the work of all other people involved, as well as the authors of rejected papers for their interest and effort. I also thank the members of the Editorial Board of the volume and additional reviewers for their hard work on reviewing and selecting the papers. I thank Sulema Torres, Ignacio Garcia Araoz, Oralia del Carmen Pérez Orozco, and Raquel López Alamilla for their valuable collaboration in preparation of this volume. The submission, reviewing, and selection process was supported for free by the EasyChair system, www.EasyChair.org. Alexander Gelbukh February 2009 Table of Contents Índice Page/Pág. Computational Lexicography and Lexical Resources Scaling to Billion-plus Word Corpora.........................................................................3 Jan Pomikálek, Pavel Rychlý and Adam Kilgarriff Detecting and Grounding Terms in Biomedical Literature........................................15 Kaarel Kaljurand, Fabio Rinaldi, Thomas Kappeler and Gerold Schneider Automatic Word Clustering in Studying Semantic Structure of Texts.....................27 Olga Mitrofanova Semantically-Driven Extraction of Relations between Named Entities ....................35 Caroline Brun and Caroline Hagège Evaluation of Named Entity Extraction Systems.......................................................47 Mónica Marrero, Sonia Sánchez-Cuadrado, Jorge Morato Lara and George Andreadakis Building a Brazilian Portuguese Parallel Corpus of Original and Simplified Texts ...............................................................................59 Helena M. Caseli, Tiago F. Pereira, Lucia Specia, Thiago A. S. Pardo, Caroline Gasperin and Sandra M. Aluisio Morphology and Syntax Synchronized Morphological and Syntactic Disambiguation for Arabic ..................73 Daoud Daoud Parsing with Polymorphic Categorial Grammars ......................................................87 Matteo Capelletti and Fabio Tamburini A CCG-based System for Valence Shifting for Sentiment Analysis.........................99 František Simančík and Mark Lee Classiﬁcation of “Inheritance” Relations: a Semi-automatic Approach ..................109 Ekaterina Lapshinova-Koltunski Semantics Discovering Discourse Motifs in Instructional Dialog ............................................123 Juan M. Huerta Ontology Oriented Computation of English Verbs Metaphorical Trait...................135 Zili Chen, Jonathan J. Webster, Ian I. Chow and Tianyong Hao Automatic Formal Verification of Conceptual Model Documentation by Means of Self-organizing Map ...........................................................................143 Algirdas Laukaitis and Olegas Vasilecas Anaphora and Co-reference Coreference Resolution using Markov Logic Network ...........................................157 Shujian Huang, Yabing Zhang, Junsheng Zhou and Jiajun Chen A Ranking Approach to Persian Pronoun Resolution..............................................169 Nafiseh Sadat Moosavi and Gholamreza Ghassem-Sani Text Classification Classification of Clinical Conditions: A Case Study on Prediction of Obesity and Its Co-morbidities .....................................................183 Archana Bhattarai, Vasile Rus and Dipankar Dasgupta Analysis of Stemming Alternatives and Dependency Pattern Support in Text Classification...............................................................................................195 Levent Özgür and Tunga Güngör Investigating Variations in Adjective Use across Different Text Categories ..........207 Jing Cao and Alex Chengyu Fang Multi-category

Advances in Computational Linguistics

International Computer Science Institute Activity Report 2007

A Resource of Corpus-Derived Typed Predicate Argument Structures for Croatian

3 Corpus Tools for Lexicographers

Better Web Corpora for Corpus Linguistics and NLP

Overview: Computational Lexical Semantics and the Week Ahead

Unit 3: Available Corpora and Software

Adam Kilgarriff's Legacy to Computational Linguistics And

Finding Definitions in Large Corpora with Sketch Engine

A Toolbox for Linguistic Discovery1

Semantic Word Sketches

Deliverable D3.3.-B Third Batch of Language Resources

John Benjamins Publishing Company