Template Consilr 2006
Total Page:16
File Type:pdf, Size:1020Kb
PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE “LINGUISTIC RESOURCES AND TOOLS FOR PROCESSING THE ROMANIAN LANGUAGE” IAȘI, 22-23 NOVEMBER 2018 Editors: Vasile Păiș Daniela Gîfu Diana Trandabăț Dan Cristea Dan Tufiș Organisers Faculty of Computer Science “Alexandru Ioan Cuza” University of Iași Research Institute for Artificial Intelligence “Mihai Drăgănescu” Romanian Academy, Bucharest Institute for Computer Science, Romanian Academy, Iași Romanian Association of Computational Linguistics Under the auspices of the Academy of Technical Sciences ISSN 1843-911X SCIENTIFIC COMMITTEE Mihaela Colhon, Faculty of Mathematics and Natural Science, University of Craiova Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Verginica Barbu Mititelu, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Mihai Alex Moruz, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Vasile Păiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Ionuț Cristian Pistol, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Elena Isabelle Tamba, “A. Philippide” Institute for Romanian Philology, Romanian Academy, Iași Horia-Nicolai Teodorescu, ARFI-IIT & Faculty of Electronics, Telecommunications and Information Technology, “Gheorghe Asachi” Technical University of Iași Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Ștefan Trăușan-Matu, Computer Science Department, “POLITEHNICA” University of Bucharest & Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest; Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Marius Zbancioc, Institute for Computer Science, Romanian Academy, Iași iii ORGANISING COMMITTEE Anca Diana Bibiri, Department of Interdisciplinary Research – Humanities and Social Sciences, “Alexandru Ioan Cuza” University of Iași Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Paul Diac, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Lucian Gâdioi, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Mihaela Onofrei, Institute for Computer Science, Romanian Academy, Iași Vasile Păiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Andrei Scutelnicu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest TABLE OF CONTENTS TABLE OF CONTENTS ................................................................................................................ V FOREWORD ................................................................................................................................ VI CHAPTER 1 SPEECH PROCESSING ......................................................................................... 1 A Psycholinguistic View in Romanian Intonation Interpretation ....................................................................... 3 Doina Jitcă Nonlinear System for Romanian Voice Processing ......................................................................................... 13 Carmen Grigoraș, Victor Grigoraș, Vasile Apopei Comparison of I-Vector And GMM-UBM Speaker Recognition on a Romanian Large Speech Corpus............ 25 Alexandru-Lucian Georgescu, Horia Cucu, Corneliu Burileanu A Comparison Between Traditional Machine Learning Approaches and Deep Neural Networks for Text Processing in Romanian ................................................................................................................................. 33 Adriana Stan, Mircea Giurgiu Prosodic Phrasing between Subjective Perception and Objectivity .................................................................. 43 Vasile Apopei, Otilia Paduraru CHAPTER 2 TEXT PROCESSING .............................................................................................51 On Text Processing after OCR ....................................................................................................................... 53 Alexandru Colesnicov, Svetlana Cojocaru, Ludmila Malahov, Lyudmila Burtseva Romanian Diacritics Restoration using Recurrent Neural Networks ................................................................ 61 Stefan Ruseti, Teodor-Mihai Cotet, Mihai Dascalu TEPROLIN: an Extensible, Online Text Preprocessing Platform for Romanian............................................... 69 Radu Ion How to Find out Whether the Romanian Language Was Influenced by the Two Historical Unions? ................ 77 Dan Cristea, Daniela Gîfu, Svetlana Cojocaru, Alexandru Colesnicov, Ludmila Malahov, Marius Popescu, Mihaela Onofrei, Cecilia Bolea CHAPTER 3 LINGUISTIC RESOURCES ..................................................................................89 More Romanian Word Embeddings from the ReTeRom Project ..................................................................... 91 Vasile Păiș, Dan Tufiș Valence Dictionary for Romanian Language in Printed Version and XML Format ........................................ 101 Ana-Maria Barbu Nonstandard vs. RRT: Comparison of Two Romanian Corpora in UD .......................................................... 113 Cătălina Mărănduc, Victoria Bobicev Interlinking and Extending Large Lexical Resources for Romanian .............................................................. 125 Mihai Alex Moruz, Andrei Scutelnicu, Dan Cristea CHAPTER 4 APPLICATIONS ................................................................................................... 133 README - Improving Writing Skills in Romanian Language ...................................................................... 135 Maria-Dorinela Sirbu, Mihai Dascălu, Daniela Gîfu, Teodor-Mihai Cotet, Adrian Tosca, Stefan Trausan- Matu New GIS Based Approaches for the Linguistic Atlases ................................................................................. 147 Silviu-Ioan Bejinariu, Vasile Apopei, Manuela Nevaci, Nicolae Saramandu Let’s Teach Computer Poetry: Automatic Rhyme Detection ......................................................................... 159 Victoria Bobicev, Cătălina Mărănduc Extracting Actions from Romanian Instructions for IoT Devices ................................................................... 169 Bianca I. Nenciu, Stefan Ruseti, Mihai Dascalu Identifying Fake News on Twitter Using Naïve Bayes, SVM and Random Forest Distributed Algorithms ..... 177 Ciprian-Gabriel Cușmaliuc, Lucia-Georgiana Coca, Adrian Iftene INDEX OF AUTHORS................................................................................................................ 190 FOREWORD At its inception, in 2001, the ConsILR Conference (traditional acronym for Consorțiul de Informatizare pentru Limba Română – Consortium of Informatization for Romanian Language) was an initiative born in the Section of Information Science and Technology of the Romanian Academy, meant to bring together at the same table linguists and computational linguists, researchers in different fields of humanities, PhD students and master students in computational linguistics, pure linguistics and philologists, all those that have a major interest in the study of the Romanian language from a computational perspective. The series of events have run, with few exceptions, once every year, first in the format of a workshop, and since 2010 – as a conference. In order to reach wider visibility, the organisers decided to make the Conference itinerant and to publish its Proceedings in English. Thus, ConsILR is not strictly addressed to researchers working on Romanian language but also to other scientists, from any part of the world, which could find sources of inspiration in the models and techniques developed for our language and apply them for their own languages. Opening the gate for researchers working on languages other than Romanian to participate in the Conference and publish their results in this series of Proceedings, a reverse influence is also expected, namely that their work inspire scientists dealing with Romanian language. As always, this year too, the organisers of the Conference Linguistic Resources and Technologies for Romanian Language are: the Faculty of Computer Science of the “Alexandru Ioan Cuza” University of Iași and two institutes of the Romanian Academy: the Research Institute for Artificial Intelligence “Mihai Drăgănescu”, in Bucharest, and the Institute for Computer Science, in Iași. We are honored that the Academy of Technical Sciences has offered again its high auspices to run the event under and, for the first time, the Romanian Association