“Linguistic Resources and Tools for Processing the Romanian Language”, Mălini, 27-29 October 2016
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE
“LINGUISTIC RESOURCES AND TOOLS FOR PROCESSING THE ROMANIAN LANGUAGE”
MĂLINI, 27-29 OCTOBER 2016

Editors:
Maria Mitrofan
Daniela Gîfu
Dan Tufiș
Dan Cristea

Organisers
Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Institute for Computer Science, Romanian Academy, Iași

Under the auspices of the Academy of Technical Sciences

This volume was published with the support of the National Authority for Scientific Research and Innovation (ANCSI)

ISSN 1843-911X

PROGRAM COMMITTEE
Verginica Barbu Mititelu, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Costin Bădică, Faculty of Automation, Computers and Electronics, University of Craiova
Tiberiu Boroș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Mihaela Colhon, Faculty of Mathematics and Natural Science, University of Craiova
Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași and Institute for Computer Science, Romanian Academy, Iași
Ștefan Daniel Dumitrescu, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Corina Forăscu, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași and Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Andreea Macovei, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Cătălina Mărănduc, Institute of Linguistics “Iorgu Iordan - Al. Rosetti”, Romanian Academy, Bucharest and Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Mihai Alex Moruz, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Ionuț Cristian Pistol, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Octavian Popescu, IBM Research Center, USA
Elena Isabelle Tamba, “A. Philippide” Institute for Romanian Philology, Romanian Academy, Iași
Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași
Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Marius Zbancioc, Institute for Computer Science, Romanian Academy, Iași

ORGANISING COMMITTEE
Mihaela Colhon, Faculty of Mathematics and Natural Science, University of Craiova
Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași
Lucian Gâdioi, Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iași
Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iași
Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iași
Andreea Macovei, Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iași
Maria Mitrofan, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest
Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iași
Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest

TABLE OF CONTENTS

FOREWORD

CHAPTER 1 CORPORA ACQUISITION AND ANNOTATION

DIGITIZATION OF ROMANIAN PRINTED TEXTS OF THE 17TH CENTURY
    Alexandru Colesnicov, Ludmila Malahov, Tudor Bumbu
TEMPORAL ANNOTATION IN LINKED DATA SOURCES
    Ioana Ionașcu, Tiberiu Boroș
AN INTRODUCTION TO TIME TRACKS
    Andreea Macovei
BUILDING AND EVALUATING THE ROMANIAN MEDICAL CORPUS
    Maria Mitrofan, Dan Tufiș

CHAPTER 2 TREE-BANKS AND DEPENDENCY PARSING

FOLK POETRY FOR COMPUTERS: MOLDOVAN CODRI’S BALLADS PARSING
    Victoria Bobocev, Tudor Bumbu, Victoria Lazu, Victoria Maxim, Daniela Istrati
DEPENDENCY PARSING WITHIN NOUN PHRASES WITH PATTERN-BASED APPROACHES
    Mihaela Colhon, Dan Cristea
SYNTAXNET FOR ROMANIAN: RESULTS AND POTENTIAL
    Paula Gradu, Radu Ion
TWO RESOURCES DEVELOPED IN THE PROJECT SEMANTICS-DRIVEN SYNTACTIC PARSER FOR ROMANIAN
    Elena Irimia, Verginica Barbu Mititelu
A RESOURCE FOR THE WRITTEN ROMANIAN: THE UAIC DEPENDENCY TREEBANK
    Cătălina Mărănduc, Cenel-Augusto Perez

CHAPTER 3 SPEECH DATA PROCESSING AND STUDIES

THE TRANSCRIPTION OF ROMANIAN CORPORA BETWEEN WHAT IS SPOKEN AND THE GRAMMATICALY CORRECT WRITING
    Vasile Apopei, Otilia Păduraru
VOICE CONTROLLED HOME AUTOMATION SYSTEM
    Tiberiu Boroș, Ștefan Daniel Dumitrescu, Horia Cucu
A RECURRENT NEURAL NETWORKS APPROACH FOR KEYWORD SPOTTING APPLIED ON ROMANIAN LANGUAGE
    Sonia Pipa, Tiberiu Boroș
TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
    Alin-Florentin Vasile, Tiberiu Boroș

CHAPTER 4 LEXICAL AND SEMANTIC RESOURCES

ADJECTIVES IN WORDNET: SEMANTIC ISSUES
    Tsvetana Dimitrova, Valentina Stefanova
TERMINOLOGY APPROACH IN ROMANIAN LANGUAGE DICTIONARIES. THEORETIC AND PRACTICAL ASPECTS. STUDY OF DLR AND DEX
    Mihaela Marin
A BOOSTRAPPING SYSTEM FOR DICTIONARY MANAGEMENT AND PARSING
    Mihai Alex Moruz, Dan Cristea
AUTOMATIC MERGING OF MARKED UP TEXTS FOR DICTIONARY ENTRY PARSING
    Mihai Alex Moruz
A COLLOCATIONAL APPROACH TO ROMANIAN STRONG NEGATIVE POLARITY ITEMS
    Monica-Mihaela Rizea, Gianina N. Iordăchioaia, Frank Richter

CHAPTER 5 SHORT PAPERS

A DEEPER PERSPECTIVE OF ONLINE TOURISM REVIEWS ANALYSIS USING NATURAL LANGUAGE PROCESSING AND COMPLEX NETWORKS TECHNIQUES
    Alex Becheru, Costin Bădică
A ROMANIAN CORPUS ANNOTATED WITH VERBAL MULTIWORD EXPRESSIONS
    Verginica Barbu Mititelu, Monica-Mihaela Rizea, Mihaela Ionescu, Mihaela Onofrei, Elena Irimia
A PERSPECTIVE ON THE EVALUATION OF A SYSTEM OFFERING ENHANCED E-BOOK INTERACTION
    Ionuț Pistol, Daniela Gîfu
THE E-CULTFOOD PROJECT
    Diana Trandabăț, Petronela Savin, Daniela Gîfu, Andreea Macovei
ON THE PHONEMIC STATUS OF THE ROMANIAN VOWELS Ă [Ʌ] AND Â [Ɨ]: EVIDENCE FROM LARGE SCALE ACOUSTIC ANALYSIS AND AUTOMATIC SPEECH RECOGNITION
    Ioana Vasilescu, Margaret E.L. Renwick, Bianca Vieru, Lori Lamel

INDEX OF AUTHORS

FOREWORD

From its inception in 2001, the ConsILR Conference (the traditional acronym for Consorțiul pentru Informatizarea Limbii Române, an initiative born in the Section for Information Science and Technology of the Romanian Academy) was meant as a meeting place for linguists and computational linguists, but also for researchers in the humanities and for PhD and master's students in Computational Linguistics, all with a major interest in the study of the Romanian language from a computational perspective. The series of events has run, with few exceptions, once a year, first in the format of a workshop and, since 2010, as a conference. In order to reach wider visibility, the organisers decided to make the Conference itinerant and to publish its Proceedings in English. Thus, ConsILR is addressed not only to researchers working on the Romanian language but also to scientists from any part of the world, who could find sources of inspiration in the models and techniques developed for our language and apply them to their own languages.
Opening the Conference to researchers working on languages other than Romanian, and inviting them to publish their work in these Proceedings, also facilitates a reverse influence: their work can inspire scientists working on the Romanian language. This year the Conference