TLT16 January 2018

Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

January 23–24, 2018, Prague, Czech Republic
ISBN: 978-80-88132-04-2

Edited by Jan Hajič.

These proceedings contain the papers presented at the 16th International Workshop on Treebanks and Linguistic Theories. They are included in the ACL Anthology, supported by the Association for Computational Linguistics (ACL), at http://aclweb.org/anthology; both the individual papers and the full proceedings book are available there.

© 2017. Copyright of each paper stays with the respective authors (or their employers). Distributed under a CC-BY 4.0 licence.

TLT16 takes place in Prague, Czech Republic on January 23–24, 2018.

Preface

The Sixteenth International Workshop on Treebanks and Linguistic Theories (TLT16) is being held at Charles University, Czech Republic, on 23–24 January 2018, for the second time in Prague, which already hosted TLT in 2006. This year, TLT16 is co-located with the Workshop on Data Provenance and Annotation in Computational Linguistics 2018 (January 22, 2018, at the same venue) and is immediately followed by the 2nd Workshop on Corpus-based Research in the Humanities, held in Vienna, Austria.

This year, TLT16 received 32 submissions, of which 15 were selected for oral presentation, and an additional seven were asked to present as posters. We were happy that our invitation to give a plenary talk was accepted both by Lilja Øvrelid of the University of Oslo, Norway (with a talk "Downstream use of syntactic analysis: does representation matter?") and by Marie Candito of the University Paris Diderot, France ("Annotating and parsing to semantic frames: feedback from the French FrameNet project"). We have also included a panel discussion on the very topic of Treebanks and Linguistic Theories, to discuss with the participants opinions on the future of the field and of the workshop.

We are grateful to the members of the program committee, who worked hard to review the submissions and provided authors with valuable feedback. We would also like to thank our sponsors, mainly the Faculty of Mathematics and Physics of Charles University, which provided local logistical, accounting, and financial support, and the Centre for Advanced Study (CAS) at the Norwegian Academy of Science and Letters for supporting one of the invited speakers.

Last but not least, we would like to thank all authors for submitting interesting and relevant papers, and we wish all participants a fruitful workshop.

December 2017

Jan Hajič (also in the name of local organizers), Charles University, Prague, Czech Republic
Sandra Kuebler, Indiana University, Bloomington, Indiana, USA
Stephan Oepen, University of Oslo, Norway, and CAS, Norwegian Academy of Science and Letters, Oslo
Markus Dickinson, Indiana University, Bloomington, Indiana, USA

Program Committee Co-chairs

Program Chairs

Jan Hajič, Charles University, Prague
Sandra Kübler, Indiana University Bloomington
Stephan Oepen, Universitetet i Oslo
Markus Dickinson, Indiana University Bloomington

Reviewers

Patricia Amaral, Indiana University Bloomington
Emily M. Bender, University of Washington
Eckhard Bick, University of Southern Denmark
Ann Bies, Linguistic Data Consortium, University of Pennsylvania
Gosse Bouma, Rijksuniversiteit Groningen
Miriam Butt, University of Konstanz
Silvie Cinková, Charles University, Prague
Marie-Catherine de Marneffe, The Ohio State University
Koenraad De Smedt, University of Bergen
Tomaž Erjavec, Dept. of Knowledge Technologies, Jožef Stefan Institute
Filip Ginter, University of Turku
Memduh Gokirmak, Istanbul Technical University
Eva Hajičová, Charles University, Prague
Dag Haug, University of Oslo
Barbora Hladká, Charles University, Prague
Lori Levin, Carnegie Mellon University
Teresa Lynn, Dublin City University
Adam Meyers, New York University
Jiří Mírovský, Charles University, Prague
Kaili Müürisep, University of Tartu
Joakim Nivre, Uppsala University
Petya Osenova, Sofia University and IICT-BAS
Agnieszka Patejuk, Institute of Computer Science, Polish Academy of Sciences
Tatjana Scheffler, Universität Potsdam
Olga Scrivner, Indiana University Bloomington
Djamé Seddah, Alpage/Université Paris la Sorbonne
Michael White, The Ohio State University
Nianwen Xue, Brandeis University
Daniel Zeman, Charles University, Prague
Heike Zinsmeister, University of Hamburg

Local Organizing Committee

Jan Hajič (chair), Charles University, Prague
Kateřina Bryanová, Charles University, Prague
Zdeňka Urešová, Charles University, Prague
Eduard Bejček (publication chair), Charles University, Prague

Table of Contents

Invited Speakers

Annotating and parsing to semantic frames: feedback from the French FrameNet project ...... v
    Marie Candito
Downstream use of syntactic analysis: does representation matter? ...... vi
    Lilja Øvrelid

Tuesday, 23rd January

Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications ...... 1
    Daniël de Kok, Patricia Fischer, Corina Dima and Erhard Hinrichs
UD Annotatrix: An annotation tool for Universal Dependencies ...... 10
    Francis M. Tyers, Maria Sheyanova and Jonathan Washington
The Treebanked Conspiracy. Actors and Actions in Bellum Catilinae ...... 18
    Marco Passarotti and Berta Gonzalez Saavedra
Universal Dependencies-based syntactic features in detecting human translation varieties ...... 27
    Maria Kunilovskaya and Andrey Kutuzov
Graph Convolutional Networks for Named Entity Recognition ...... 37
    Alberto Cetoli, Stefano Bragaglia, Andrew O'Harney and Marc Sloan
Extensions to the GrETEL Treebank Query Application ...... 46
    Jan Odijk, Martijn van der Klis and Sheean Spoel
The Relation of Form and Function in Linguistic Theory and in a Multilayer Treebank ...... 56
    Eduard Bejček, Eva Hajičová, Marie Mikulová and Jarmila Panevová
Literal readings of multiword expressions: as scarce as hen's teeth ...... 64
    Agata Savary and Silvio Ricardo Cordeiro
Querying Multi-word Expressions Annotation with CQL ...... 73
    Natalia Klyueva, Anna Vernerova and Behrang Qasemizadeh
REALEC learner treebank: annotation principles and evaluation of automatic parsing ...... 80
    Olga Lyashevskaya and Irina Panteleeva
A semiautomatic procedure for treebanks. Old English strong and weak verbs ...... 88
    Marta Tío Sáenz and Darío Metola Rodríguez
Data point selection for genre-aware parsing ...... 95
    Ines Rehbein and Felix Bildhauer
Error Analysis of Cross-lingual Tagging and Parsing ...... 106
    Rudolf Rosa and Zdeněk Žabokrtský

Wednesday, 24th January

A Telugu treebank based on a grammar book ...... 119
    Taraka Rama and Sowmya Vajjala
Recent Developments within BulTreeBank ...... 129
    Petya Osenova and Kiril Simov

Towards a dependency-annotated treebank for Bambara ...... 138
    Ekaterina Aplonova and Francis M. Tyers
Merging the Trees – Building a Morphological Treebank for German from Two Resources ...... 146
    Petra Steiner
What I think when I think about treebanks ...... 161
    Anders Søgaard
Syntactic Semantic Correspondence in Dependency Grammar ...... 167
    Catalina Maranduc, Catalin Mititelu and Victoria Bobicev
Multi-word annotation in syntactic treebanks – Propositions for Universal Dependencies ...... 181
    Sylvain Kahane, Marine Courtin and Kim Gerdes
A Universal Dependencies Treebank for Marathi ...... 190
    Vinit Ravishankar
Dangerous Relations in Dependency Treebanks ...... 201
    Chiara Alzetta, Felice Dell'Orletta, Simonetta Montemagni and Giulia Venturi

Annotating and parsing to semantic frames: feedback from the French FrameNet project
Invited talk

Marie Candito, Université Paris Diderot, France ([email protected])

Abstract

Building systems able to provide a semantic representation of texts has long been an objective, both in linguistics and in applied NLP. Although advances in machine learning sometimes seem to diminish the need to use sophisticated structured representations of sentences as input, the enthusiasm for interpreting trained neural networks somewhat seems to reaffirm that need. Because they represent schematic situations, semantic frames (Fillmore, 1982), as instantiated in FrameNet (Baker, Fillmore and Petruck, 1983), are an appealing level of generalization over the eventualities described in texts. In this talk, I will present some feedback from the development of a French FrameNet, including analysis of the main difficulties we faced during annotation. I will describe how linking generalizations can be extracted from the frame-annotated data, using deep syntactic annotations. I will then investigate what kind of input is most effective for FrameNet parsing, from no syntax at all to deep syntactic representations.

The work I'll present is joint work with:
• Marianne Djemaa, Philippe Muller, Laure Vieu, G. de Chalendar, B. Sagot, P. Amsili (French FrameNet),
• C. Ribeyre, D. Seddah, G. Perrier, B. Guillaume (deep syntax),
• Olivier Michalon, Alexis Nasr (semantic parsing).

Downstream use of syntactic analysis: does representation matter?
Invited talk

Lilja Øvrelid, University of Oslo, Norway ([email protected])

Abstract

Research in syntactic parsing is largely driven by progress in intrinsic evaluation, and there have been impressive developments in recent years in terms of evaluation measures, such as F-score or labeled accuracy. At the same time, a range of different syntactic representations have been put to use in treebank annotation projects, and there have been studies measuring various aspects of the "learnability" of these representations and their suitability for automatic parsing, mostly also evaluated in terms of intrinsic measures. In this talk I will provide a different perspective on these developments and give an overview of research that examines the usefulness of syntactic analysis in downstream applications. The talk will discuss both constituency-based and dependency-based representations, with a focus on various flavours of dependency-based representations, ranging from purely syntactic representations to more semantically oriented representations. The recently completed shared task on Extrinsic Parser Evaluation was aimed at assessing the utility of different types of dependency representations for downstream applications, and I will discuss some of our findings based on the results from this task as well as follow-up experiments and analysis.

Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications

Daniël de Kok, Patricia Fischer, Corina Dima and Erhard Hinrichs
Department of General and Computational Linguistics, University of Tübingen
{daniel.de-kok, patricia.fischer, corina.dima, erhard.hinrichs}@uni-tuebingen.de

Abstract

Word formation processes such as derivation and compounding yield realizations of lexical roots in different parts of speech and in different syntactic environments. Using verbal adjectives as a case study and treebanks of Dutch and German as data sources, similarities and divergences in syntactic distributions across different realizations of lexical roots are examined, and the implications for computational modeling and for treebank construction are discussed.

Keywords: treebank, annotations, verb, adjective, comparison, Dutch, German, PP, preposition

1 Introduction

Due to processes of word formation such as derivation and compounding, lexical roots can be realized in different parts of speech and in different syntactic environments. For example, the derivational suffix -able can turn the verbal root derive in English into the adjective derivable, and the derivational suffix -ity can turn derivable into the noun derivability. A direct corollary of this polycategorial property of lexical roots and their morphological derivatives is their participation in different syntactic constructions and contexts, each of which comes with its own construction-specific frequency distributions of collocations, syntactic arguments, modifiers, and specifiers.

In structuralist theories of language, the characterization of linguistic categories and structures in terms of their distributional behavior provides the key insight underlying distributional accounts of phonology, morphology, and syntax, most famously articulated by Harris (1951), and of semantics, as proposed by Firth (1957). The correct modeling of the interface of derivational morphology and syntactic derivations was also one of the central issues in the early days of generative grammar, with proponents of Generative Semantics (Lees, 1960) arguing for a transformational, syntactic account of word formation and Chomsky (1970) arguing for a non-transformational, interpretative account. In non-derivational, lexicalist theories of grammar such as Head-Driven Phrase Structure Grammar, the sharing of structure for lexical roots realized in different word classes is modeled by the non-transformational mechanism of lexical rules and the sharing of valence information (see Gerdemann (1994) for such an account of nominalizations in German). Most recently, distributional theories of natural language have also served as an inspiration for the distributional modeling of word meaning as word embeddings in computational linguistics (Mikolov et al., 2013; Pennington et al., 2014).

Linguistically annotated corpora, so-called treebanks, offer excellent empirical resources for the study of the realization of lexical roots in different morpho-syntactic categories and constructions, provided that their annotations are rich enough to capture relevant information about derivational morphology and lemmatization.

2 Case Study

The purpose of the present paper is to systematically study similarities and divergences in syntactic distributions across different realizations of lexical roots. In particular, we are interested in finding out if the syntactic distribution of a particular realization of a lexical root can serve as an additional information source in modeling the meaning of other, possibly less frequent realizations of the same lexical root.

The paper focuses on a case study of the morpho-syntactic category of adjectives, and within that category on verbal adjectives such as gegeten 'eaten' in Dutch and verloren 'lost' in German, which are derived from the verbal roots eten 'to-eat' and verlieren 'to-lose', respectively. Verbal adjectives are of primary interest here since their syntactic distribution is that of an adjective, yet at the same time resembles the syntactic distribution of the verbs from which they are derived. Like other adjectives, verbal adjectives occur in three syntactic environments: in attributive, pre-nominal position, in predicative position, and in adverbial position, as exemplified in (1a), (1b), and (1c) respectively.

(1) a. [ Die [ gewählten / wählenden ] / Weitere [ gewählte / wählende ] ] Mitglieder stimmten zu .
       [ the [ elected / voting ] / more [ elected / voting ] ] members agreed .
    b. Die Mitglieder sind gewählt .
       The members are elected .
    c. Sie gaben frustriert auf .
       They gave frustrated in .

Such adjectives are identical in form to the past participles of the verbs they are derived from. Their adjectival nature is underscored by the fact that they exhibit the same strong/weak inflectional alternation characteristic of adjectives in attributive position, as shown in (1a). Such inflectional variation does not occur in predicative and adverbial position, so that the distinction between past participle verbs and verbal adjectives cannot be established in terms of linguistic form, but only in terms of syntactic environment. Moreover, present participles occur as predicative adjectives only in lexicalized cases (Lenz, 1993).

At the same time, verbal adjectives share the same type of arguments and modifiers with the verbs that they derive from. This includes in particular prepositional arguments and modifiers. Since the correct attachment of prepositional phrases is notoriously difficult for rule-based and statistical parsers alike, the present study focuses on the distributions of prepositions that are governed by verbs and verbal adjectives. We focus on prepositions in PP modifiers, as well as prepositional complements (PC) of verbs, as illustrated in (2).

(2) Die [PP in Deutschland ] gekauften Fahrräder sind [PC gegen Diebstahl ] versichert .
    the [ in Germany ] bought bikes are [ against theft ] insured .

As discussed in more detail in Section 4, our goal is to predict the distribution of prepositions governed by verbal adjectives from the distributions of the corresponding verbs. When dealing with ambiguous PP attachments to verbal adjectives, the information gained from the distribution of the corresponding verbs can be instrumental in choosing the correct attachment, especially in the case of predicative adjectives. The current study uses data from two treebanks: the Lassy Large treebank (Van Noord et al., 2013) of written Dutch and the TüBa-D/DP treebank of written German (taz/Wikipedia sections).

3 Delineating the Domain of Verbal Adjectives

Since verbal adjectives combine properties of verbs and adjectives, it is to be expected that there are certain cases where the boundaries between verbal adjectives and verbs/adjectives are not as clear. In this section, we discuss these boundaries and their ramifications for our study.

3.1 Distinguishing Verbal Adjectives from Verbal Participles

An ongoing topic of debate is the word category of past participles that are governed by verbs which can be either auxiliary or copular. Consider (3), where the Dutch past participle form gewaarborgd 'guaranteed' can be analyzed as a verb participle that forms the verb cluster governed by the auxiliary verb zijn 'are', or as a verbal adjective that is the predicative complement to the copular verb zijn.

(3) De obligaties [ zijn / worden ] gewaarborgd door het Vlaams Gewest .
    The bonds [ are / are-being ] guaranteed by the Flemish region .

In Dutch, such ambiguities occur with several verbs that can have auxiliary and copular readings, most prominently zijn 'to-be', worden 'to-become', and blijven 'to-remain'.¹ In German, only past participles governed by the verb sein 'to-be' (the so-called Zustandspassiv) are considered ambiguous. For the present work, we simply treat such participles as ambiguous and evaluate them as a separate set, as described in Section 4.²

¹ The ambiguity does not occur in all word orders (Zwart, 2011).
² A more extensive discussion of this type of ambiguity in German can be found in Maienborn (2007). Zwart (2011) provides a more thorough discussion of the phenomenon in Dutch, and we refer to Bresnan (1980) and Levin and Rappaport (1986) for the analysis of adjectival passives in English.

3.2 Deverbal Adjectives

Although verbal adjectives can be derived productively, they can undergo various degrees of lexicalization, which can result in changes in argument structure or semantics as consequences. We will refer to such adjectives as deverbal adjectives, and we use the term verb-derived adjective throughout this paper as a cover term for verbal and deverbal adjectives. Deverbal adjectives pose two interesting challenges for the present study. First, they can give rise to new senses of a surface form, along with corresponding shifts in distributions of prepositions. For example, the German adjective geschlossen in geschlossene Gesellschaft 'closed society' has diverged in meaning from the participle of the verb schließen (geschlossen). However, it is also possible to use geschlossen in its verbal sense, as in geschlossene Tür 'closed door'. These two senses combine with different prepositions. For example, die durch Klaus geschlossene Tür 'the by Klaus closed door' is a plausible PP-modification, while die durch Klaus geschlossene Gesellschaft is not. Unfortunately, this problem cannot be solved without word sense disambiguation, which (paradoxically) relies on co-occurrence statistics. Consequently, in such cases we model the preposition distribution of all senses together. Secondly, some forms have transformed morphologically and syntactically into full adjectives, while retaining co-occurrence preferences. For example, the Dutch adjective onomkeerbare 'irreversible' in (4a) derives from the verb omkeren 'to reverse'. The adjective onomkeerbaar still accepts the same PP modifier wegens klimaatverandering 'because-of climate-change' as the past participle omgekeerd 'reversed' (4b). As discussed in Section 4, we include such adjectives in our German data set, tracing them back to their original verb where possible.

(4) a. . . . het wegens klimaatverandering onomkeerbare proces van zeespiegelstijging . . .
       . . . the because-of climate-change irreversible process of sea-level-rise . . .
    b. Het proces van zeespiegelstijging kan wegens klimaatverandering niet omgekeerd worden .
       The process of sea-level-rise can because-of climate-change not reversed become .

4 Empirical Basis

To study the distribution of prepositions governed by verbs and verbal adjectives, we extract co-occurrences between (i) prepositions and (ii) verbs and verbal adjectives from the treebanks for the two languages. As discussed in Section 2, we consider both prepositions in PP modifications as well as preposition complements of verbs. We investigate to what extent the preferences for particular prepositions are shared between a verb and a verbal adjective by using the preposition distribution of the verbal adjective as the reference distribution and the preposition distribution of the verb as a predictor. The particulars of this evaluation will be discussed in more detail in Section 5.

In order to obtain reliable probability distributions from co-occurrence counts, a large number of examples for each verb and verbal adjective is needed. Consequently, this study is conducted using large, machine-annotated treebanks. Such automatic annotations, of course, contain parsing errors, and PP attachment is one of the most frequent attachment errors (Kummerfeld et al., 2012; Mirroshandel et al., 2012; de Kok et al., 2017). However, it should be pointed out that there is far less ambiguity in the attachment of prepositions to verbal adjectives, since there is usually no ambiguity in the case of PP modification of prenominal verbal adjective modifiers (see the PP attachment in (2)). For example, the parser of de Kok and Hinrichs (2016) attaches 84.47% of the prepositions that have an attributive adjective as their head correctly. Since verbal adjectives form the reference distribution in our experiments, we are evaluating against a set with fewer attachment errors than the average number of preposition attachment errors.

In the remainder of this section, we describe in more detail the Dutch and German data that is used in our study.

Dutch For our study of PP-modification of verbal adjectives in Dutch we use the Lassy Large treebank of written Dutch (Van Noord et al., 2013). Lassy Large consists of approximately 700 million words across various text genres, including newspaper, medical, encyclopedic, and political texts. Each sentence in Lassy Large is syntactically annotated using the Alpino dependency parser (Van Noord, 2006). The Alpino lexicon encodes adjectives that are derived from past and present participles using lexical tags that indicate their verbal origin. This information percolates to the feature structures and is available in the final XML serialization of the dependency structure. Consequently, verbal adjectives can be extracted using simple attribute-based queries over the Lassy treebank. The extraction is further accommodated by the fact that the Lassy treebank uses the verb infinitive as the lemma for a verbal adjective, as specified by the D-COI annotation guidelines (Van Eynde, 2005) that Lassy uses for tagging and lemmatization. Consequently, there is a one-to-one mapping of verbal adjectives to their corresponding verbs. Since infinitive modifications are considered to be verbs in Alpino, we do not include them in the present study.

We extract verbs and verbal adjectives and the prepositions that they govern with one of the following three dependency relations: (i) prepositional phrase modification (pp/mod); (ii) preposition complements (pp/pc); and (iii) locative/directional complements (pp/ld). For prenominal modifiers, we include modifications using both the categories ap and ppart. In the extraction, we also consider prepositions that are multi-word units (such as ten aanzien van 'with regards to'), multi-headed prepositions, and reentrancies in the dependency structure.

German For our study of PP-modification in German, we extract the relevant data from two sections of the TüBa-D/DP treebank. The first section consists of articles from the German newspaper taz from the period 1986 to 2009 (393.7 million tokens and 28.9 million sentences). The second is based on the German Wikipedia dump of January 1, 2017 (747.7 million tokens and 40.2 million sentences). Both treebanks were annotated using the parser of de Kok and Hinrichs (2016) and then lemmatized using the SepVerb lemmatizer (de Kok, 2014).

In our study, we consider prepositions in (i) prepositional phrase modifications (PP) and (ii) prepositional complements (OBJP), along with their respective verb or verbal adjective governor. In contrast to the Dutch treebank, where lexical tags indicate an adjective's verbal origin, such information was not available for the German adjectives. In the German treebank, verbal adjectives are lemmatized to their adjective lemmas. For example, beschrifteter 'labeled' is lemmatized to beschriftet 'labeled'. Therefore, all adjectives are analyzed by the SMOR morphological analyzer (Schmid et al., 2004) in order to detect verbal components in the adjectives. When the SMOR analysis of an adjective reveals components that imply a verbal reading, the forms are labeled as verb-derived in the treebank. In addition, the corresponding base verb lemma is reconstructed from the analysis. In contrast to the Dutch data, the availability of a wide-coverage morphological analyzer has also made it possible to include many adjectives that have transitioned from verbal adjectives to full adjectives in the data set. For instance, the adjective unbegrenzbar 'illimitable' is recognized as a verb-derived adjective and lemmatized to the corresponding verb base form begrenzen 'to limit'.

Set partitioning As discussed in Section 3, there is an ambiguity between the verbal and adjectival analyses of participles when the participle is governed by a verb form that can be both auxiliary and copular. For this reason, we create three different co-occurrence sets for both Dutch and German: (i) the confusion set of verbs and verbal adjectives that are in such ambiguous positions; (ii) the set of verbs that are not in such ambiguous positions; and (iii) the set of verbal adjectives that are not in such ambiguous positions.
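To picture the extraction step, the following simplified sketch counts verb-preposition co-occurrences from CoNLL-U-style rows. It is illustrative only: it assumes prepositions attach directly to a verbal or adjectival governor under the Dutch relation labels listed above and UD-style part-of-speech values, and it ignores the multi-word prepositions, multi-headed prepositions, and reentrancies that the actual extraction handles.

```python
from collections import Counter, defaultdict

# Relation labels from the Dutch setup above; the German setup would use
# PP and OBJP instead. Column indices follow the CoNLL-U layout:
# 0=ID, 2=LEMMA, 3=UPOS, 6=HEAD, 7=DEPREL.
PREP_RELATIONS = {"pp/mod", "pp/pc", "pp/ld"}

def count_cooccurrences(sentences):
    """Map each governing verb/adjective lemma to a Counter of prepositions.

    `sentences` yields lists of 10-column CoNLL-U rows (lists of strings).
    """
    counts = defaultdict(Counter)
    for rows in sentences:
        # Index syntactic words by ID, skipping multi-word token ranges.
        by_id = {r[0]: r for r in rows if "-" not in r[0]}
        for r in rows:
            if r[3] == "ADP" and r[7] in PREP_RELATIONS:
                head = by_id.get(r[6])
                if head is not None and head[3] in ("VERB", "ADJ"):
                    counts[head[2]][r[2].lower()] += 1
    return counts
```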

5 Experiments

The goal of our experiments is to test our thesis that there are distributional regularities between verbal adjectives and their corresponding verbs. As motivated in Section 2, we will look at co-occurrences with prepositions in particular. In our experiments, we will use relative entropy (Kullback-Leibler divergence) to determine how much a distribution Q diverges from a reference distribution P (Equation 1).

    D(P \parallel Q) = \sum_i P(i) \lg \frac{P(i)}{Q(i)}    (1)

The relative entropy estimates the expected number of additional bits that is required when a sample of P is encoded using a code optimized for Q rather than P. The divergence is zero when the two distributions are identical.
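As a reading aid, Equation 1 can be transcribed directly into code. The sketch below (illustrative Python, not the authors' experimental code) estimates both distributions from raw counts; note that an event with nonzero mass under P but zero mass under Q makes the divergence infinite, one motivation for the smoothed mixture model introduced later.

```python
from math import log2

def relative_entropy(p_counts, q_counts):
    """Equation 1 in bits, with both distributions estimated from raw counts."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    d = 0.0
    for event, count in p_counts.items():
        p = count / p_total
        q = q_counts.get(event, 0) / q_total
        if q == 0.0:
            # P assigns mass to an event Q has never seen: D is infinite.
            return float("inf")
        d += p * log2(p / q)
    return d
```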

For each subset (Section 4) of our dataset, we estimate a probability distribution P*(p|v) using maximum likelihood estimation, where p is the preposition, v the verb lemma, and count(v, p) the number of times v governs p with a prepositional phrase or prepositional complement relation in the data set (Equation 2).³

    P^*(p \mid v) = \frac{\mathrm{count}(v, p)}{\sum_{p'} \mathrm{count}(v, p')}    (2)

The relative entropy for a conditional distribution is the (possibly weighted) average of the relative entropies of the individual verbs (Equation 3). However, the average relative entropy obscures the differences in relative entropy between frequent and infrequent lemmas. Instead, we sort verbal lemmas by their frequency in the set from which P derives. We then plot the moving average of maximally 500 lemmas in frequency order.⁴ The resulting graph shows the change in relative entropy as the lemmas become more rare.

    D(P \parallel Q) = \sum_v P(v) \sum_p P(p \mid v) \lg \frac{P(p \mid v)}{Q(p \mid v)}    (3)

We perform four experiments in total, computing the divergences in Table 1. In each experiment, the verbal adjective set is used as the reference distribution P. This is motivated by the fact that verbal adjectives have fewer PP attachment ambiguities and thus serve as a better reference distribution. Furthermore, since verbs are often far more frequent than verbal adjectives, one would typically want to predict the co-occurrences of a verbal adjective.

Set for P                                           Set for Q
Verbal adjectives (Dutch)                           Verbs (Dutch)
Verbal adjectives (German)                          Verbs (German)
Ambiguous verbal adjectives/participles (Dutch)     Verbs (Dutch)
Ambiguous verbal adjectives/participles (German)    Verbs (German)

Table 1: The four different pairs of distributions that are evaluated.
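To make the procedure concrete, the following sketch combines Equations 2 and 3 with the moving-average smoothing used for the figures; all names and the exact windowing are illustrative rather than taken from the paper's code. Lemmas are assumed to be sorted by their frequency in the reference (verbal adjective) set before the per-lemma divergences are computed.

```python
from math import log2

def mle_conditional(counts):
    """Equation 2: maximum-likelihood estimate of P(p|v) from counts[v][p]."""
    return {v: {p: c / sum(cs.values()) for p, c in cs.items()}
            for v, cs in counts.items()}

def divergence(p_dist, q_dist):
    """Equation 1 for a single verb; q_dist must assign probability mass to
    every preposition observed in p_dist (e.g. after mixing in a baseline)."""
    return sum(p * log2(p / q_dist[prep]) for prep, p in p_dist.items())

def moving_average(values, window=500):
    """Smooth per-lemma divergences with a sliding window of up to 500 points."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out
```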

We only consider lemmas which occur at least 50 times in each of the paired sets of Table 1. Work on word embeddings has shown that a reasonable number of occurrences is required to get a reliable sample of the contexts in which a word occurs. Consequently, low-frequency words are typically discarded (Collobert et al., 2011; Pennington et al., 2014).

As mentioned before in Section 4, the set of prepositions we consider includes, besides the simplex prepositions in each language, also multi-word units, multi-headed prepositions, etc. The resulting sets of prepositions over which the distributions are computed are relatively large: 1,060 prepositions for Dutch and 10,665 prepositions for German. This large proliferation of prepositions has two causes: (i) different spelling variations of prepositions (e.g. voor 'for' is sometimes emphasized as vóór); and (ii) errors caused by the automatic annotation. However, since the large majority of prepositions are in the long tail, they have virtually no bearing on the evaluation.⁵

³ Note that including verbs that do not govern a preposition in the denominator would result in an improper probability distribution, since then ∑_p P*(p|v) ≠ 1. However, the observation made by one reviewer, that they may need to be counted, leads to an interesting question: Do some verbs have a stronger tendency to be modified by prepositional phrases than others, and are these tendencies shared by verbs and their corresponding verb-derived adjectives?
⁴ The use of the raw data points results in very uneven graphs.

Unconditional model We compare the verb-based distributions with a baseline model that computes unconditional preposition probabilities over a verb set, Q_u(p) (Equation 4).

    Q_u(p) = \frac{\sum_{v'} \mathrm{count}(v', p)}{\sum_{v', p'} \mathrm{count}(v', p')}    (4)

Mixture model Since the adjective sets contain deverbal adjectives, we expect the verb models to overestimate the probabilities of prepositions that co-occur with the verbal reading of the adjective. For example, consider the adjective geschlossen 'closed' that is discussed in Section 3.2. Because the verb set only contains the verbal reading of geschlossen, it will underestimate the probabilities of prepositions that co-occur with the deverbal reading of geschlossen. To smooth the distribution of the verb model, we also introduce a mixture model Q_m(p|v) that combines the verb and unconditional models (Equation 5).

    Q_m(p \mid v) = \frac{Q(p \mid v) + Q_u(p)}{2}    (5)

In the following section, we report and discuss the results for the experiments described in this section.
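Before turning to the results, Equations 4 and 5 translate into a few lines of code; again a hedged sketch rather than the original implementation:

```python
from collections import Counter

def unconditional(counts):
    """Equation 4: preposition probabilities pooled over all verbs in a set."""
    totals = Counter()
    for preps in counts.values():
        totals.update(preps)
    n = sum(totals.values())
    return {p: c / n for p, c in totals.items()}

def mixture(q_cond, q_uncond):
    """Equation 5: even interpolation of the verb and unconditional models."""
    return {p: (q_cond.get(p, 0.0) + q_uncond.get(p, 0.0)) / 2
            for p in set(q_cond) | set(q_uncond)}
```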

6 Main Results and Implications for Computational Modeling

The main result of our experiments is that verbs and verbal adjectives share significant distributional regularities. This permits the distribution of prepositions for verbal adjectives to be reliably predicted using the preposition distributions of their corresponding verbal lemmas. Figure 1a shows, on the Y-axis, the relative entropy of the three different variants of verb-based distributions (introduced in Section 5) against the reference verbal adjective distribution for Dutch. Aside from a small subset of highly frequent verbal adjectives, the verb distribution (red) turns out to be the best predictor of the verbal adjective distribution. For the more infrequent lemmas, however, the performance of the verb model converges towards the performance of the more general mixture model (blue). Figure 1b presents the same analysis using the German data. The general trend is the same for both Dutch and German: the verb distribution is best for modeling frequent verbal adjectives (the first 800-900 lemmas).⁶ The mixture distribution provides a surprisingly stable approximation, even as the frequency of the verbal adjectives decreases. In both languages the verb and mixture models outperform the unconditional model baseline (black).

[Two line plots appear here: relative entropy (Y-axis) against the number of lemmas in frequency order (X-axis) for the verb, unconditional, and mixture models; (a) verbal adjectives (Dutch), (b) verbal adjectives (German).]

Figure 1: Prediction of prepositions attached to verbal adjectives.

⁵ Only 347 of the Dutch prepositions and 690 of the German prepositions occur at least 50 times in our datasets.
⁶ The large difference in the number of verbal adjectives in Dutch and German is caused by the fact that for German we also consider verb-derived adjectives like unbegrenzbar 'illimitable', see Section 4. These are not considered for Dutch.

[Two line plots appear here: relative entropy (Y-axis) against the number of lemmas in frequency order (X-axis) for the verb, unconditional, and mixture models; (a) ambiguous verbal adjectives/participles (Dutch), (b) ambiguous verbal adjectives/participles (German).]

Figure 2: Prediction of prepositions attached to ambiguous verbal adjectives/participles.

Figures 2a and 2b display the relative entropy values obtained by the verb, mixture and unconditional models with respect to the distribution of ambiguous verbal adjectives/participles. The graphs show trends similar to those for the unambiguous verbal adjectives.⁷

To conclude, the case study has shown that there is a significant overlap in the syntagmatic distribution of different morphosyntactic realizations of a verb lemma. To be able to exploit this overlap in distributional and computational modeling, it is crucial that different morphosyntactic realizations of a lexical root are linked to the same lemma. The utility of incorporating sub-word information in distributional modeling has already been recognized and has led to the development of character-based representations. However, these representations have largely been constructed on the basis of small supervised training sets. Such small training sets only contain a limited vocabulary, giving representation learners little opportunity to learn the similarities that exist between different morphological realizations of a verbal lemma. As shown by our study, reliable distributions require a reasonably large sample of co-occurrences, which is not provided by such small data sets. The performance of the verb model deteriorates as the number of available samples decreases. In preliminary work, we have seen that a fairly large sample is needed to faithfully model the underlying distribution.

7 Implications for Treebanking

Our study of distributional regularities of verbs and verbal adjectives has shown that treebanks have the potential to contribute to models with good generalization behavior. However, discovering such regularities is greatly helped by providing the necessary annotations in the treebank. In this section, we give a brief overview of which annotations are particularly relevant to the analysis of verbs and verbal adjectives.

To estimate co-occurrence distributions of verb lemmas and words that enter a dependency relation with them, the verbal and adjectival occurrences of each verb should be annotated with the verbal lemma in a treebank. Even though many treebanks annotate tokens with their lemmas, verbal adjectives are typically lemmatized to their adjectival lemma and not their verbal lemma (see Section 4). In addition, it would be useful if treebanks annotated forms that have fully transitioned into adjectives with their original verb lemma as well.

Another annotation that would have been useful to our study would be a lexical attribute that indicates whether a verb-derived adjective has a verbal or a deverbal reading. This is particularly useful in cases where verbal and deverbal readings have the same surface form, such as the adjective geschlossen that was discussed in Section 3.2. Separating the verbal readings from the deverbal readings would make it possible to rely only on the verb distribution for predicting the co-occurrences of verbal adjectives.

⁷ For the Dutch dataset, the ambiguous verbal adjectives/participles make up 17.05% of the dataset, compared to only 3.49% ambiguous cases for German. The reason is that Dutch has several verbs that have both auxiliary and copular readings, while in German only sein 'to-be' can be ambiguous (see Section 3.1).

Finally, the extraction of verb and preposition co-occurrences for German was hampered by the annotation of prepositional phrase conjunctions and verb conjunctions. The dependency annotation guidelines (Foth, 2006) use shallow analyses of conjunctions, including PP conjunctions, such as the one in (5a). A deeper structure needs to be constructed to infer that the second occurrence of the preposition über 'about' is also governed by ärgert 'agitates'. Conversely, a prepositional phrase can be governed by more than one verb or verbal adjective, as shown in (5b). However, such annotations are not possible in the German treebank that we used, since the annotation guidelines adhere to the single-headedness principle. Deeper annotations, such as those provided in the Lassy Large treebank, which was automatically annotated using the Alpino parser for Dutch (Van Noord, 2006), help tremendously in exhaustive co-occurrence extraction.

(5) a. Staffelt ärgert sich [über den Lärm] und auch [über Senator Haase]
       Staffelt agitates himself [about the noise] and also [about senator Haase]

    b. Vertaald , ingeleid en van toelichtingen voorzien [door H. Savenije]
       Translated , prefaced and of comments supplied [by H. Savenije]

Acknowledgments

Financial support for the research reported in this paper was provided by the German Research Foundation (DFG) as part of the Collaborative Research Center "The Construction of Meaning" (SFB 833), project A3.

References

Joan Wanda Bresnan. 1980. The passive in lexical theory. Massachusetts Institute of Technology, Center for Cognitive Science.

Noam Chomsky. 1970. Remarks on Nominalization. In Roderick A. Jacobs and Peter S. Rosenbaum, editors, Readings in English Transformational Grammar. Ginn, pages 184–221.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (almost) From Scratch. Journal of Machine Learning Research 12(Aug):2493–2537.

Daniël de Kok. 2014. TüBa-D/W: a large dependency treebank for German. In Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories. Tübingen, Germany, pages 271–278.

Daniël de Kok and Erhard Hinrichs. 2016. Transition-based dependency parsing with topological fields. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Berlin, Germany, pages 1–7. http://anthology.aclweb.org/P16-2001.

Daniël de Kok, Jianqiang Ma, Corina Dima, and Erhard Hinrichs. 2017. PP Attachment: Where do We Stand? In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL): Volume 2, Short Papers. Valencia, Spain, pages 311–317. http://www.aclweb.org/anthology/E17-2050.

John Rupert Firth. 1957. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis (special volume of the Philological Society). The Philological Society, Oxford, pages 1–32.

Kilian A. Foth. 2006. Eine umfassende Constraint-Dependenz-Grammatik des Deutschen.

Dale Gerdemann. 1994. Complement Inheritance as Subcategorization Inheritance. In John Nerbonne, Carl Pollard, and Klaus Netter, editors, German in Head-Driven Phrase Structure Grammar. Center for the Study of Language and Information, Stanford, CSLI Lecture Notes, pages 341–363.

Zellig Harris. 1951. Methods in Structural Linguistics. University of Chicago Press, Chicago.

Jonathan K. Kummerfeld, David Hall, James R. Curran, and Dan Klein. 2012. Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, Jeju Island, Korea, pages 1048–1059. http://www.aclweb.org/anthology/D12-1096.

Robert B. Lees. 1960. The Grammar of English Nominalizations. International Journal of American Linguistics 26(3), Part II.

Barbara Lenz. 1993. Probleme der Kategorisierung deutscher Partizipien. Zeitschrift für Sprachwissenschaft 12(1):39–76.

Beth Levin and Malka Rappaport. 1986. The Formation of Adjectival Passives. Linguistic Inquiry 17(4):623–661.

Claudia Maienborn. 2007. Das Zustandspassiv. Grammatische Einordnung – Bildungsbeschränkung – Interpretationsspielraum. Zeitschrift für germanistische Linguistik 35(1-2):83–114.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, Nevada, pages 3111–3119.

Seyed Abolghasem Mirroshandel, Alexis Nasr, and Joseph Le Roux. 2012. Semi-supervised Dependency Parsing using Lexical Affinities. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Jeju Island, Korea, pages 777–785. http://www.aclweb.org/anthology/P12-1082.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar, pages 1532–1543.

Helmut Schmid, Arne Fitschen, and Ulrich Heid. 2004. SMOR: A German computational morphology covering derivation, composition, and inflection. In Proceedings of the IVth International Conference on Language Resources and Evaluation (LREC 2004). Lisbon, Portugal, pages 1263–1266.

Frank Van Eynde. 2005. Tagging en lemmatisering van het D-Coi corpus. Intermediate, project-internal version.

Gertjan Van Noord. 2006. At last parsing is now operational. In TALN06. Verbum Ex Machina. Actes de la 13e conférence sur le traitement automatique des langues naturelles. Leuven, Belgium, pages 20–42.

Gertjan Van Noord, Gosse Bouma, Frank Van Eynde, Daniël De Kok, Jelmer Van der Linde, Ineke Schuurman, Erik Tjong Kim Sang, and Vincent Vandeghinste. 2013. Large scale syntactic annotation of written Dutch: Lassy. In Essential Speech and Language Technology for Dutch, Springer, pages 147–164.

Jan-Wouter Zwart. 2011. The syntax of Dutch. Cambridge Syntax Guides. Cambridge University Press.

UD Annotatrix: An annotation tool for Universal Dependencies

Francis M. Tyers, School of Linguistics, НИУ ВШЭ (Higher School of Economics), Moscow ([email protected])
Mariya Sheyanova, School of Linguistics, НИУ ВШЭ (Higher School of Economics), Moscow ([email protected])

Jonathan North Washington, Linguistics Department, Swarthmore College, Swarthmore, PA ([email protected])

Abstract

In this paper we introduce the UD Annotatrix annotation tool for the manual annotation of Universal Dependencies. This tool has been designed with the aim that it should be tailored to the needs of the Universal Dependencies (UD) community, including that it should operate in fully-offline mode, and it is freely available under the GNU GPL licence.1 In this paper, we provide some background to the tool, an overview of its development, and background on how it works. We compare it with some other widely-used tools for Universal Dependencies annotation, describe some features unique to UD Annotatrix, and finally outline some avenues for future work and provide a few concluding remarks.

Keywords: annotation, interface, Universal Dependencies

1 Introduction

Once available for only a handful of languages, treebanks are becoming much more widespread. In many respects this is thanks to the activities of the Universal Dependencies (or UD, Nivre et al., 2016) community, which maintains an inclusive, cross-linguistic, consistently-annotated collection of treebanks. The collection today includes over 100 treebanks for over 54 languages, making it among the most diverse collections of freely-available, openly-licensed language data.

A large proportion of the treebanks currently available through Universal Dependencies are conversions from previous annotation schemes. However, recently treebanks are being released which have been annotated from scratch, leading to the need for annotation interfaces. There are a number of existing interfaces in use for annotating UD treebanks from scratch, from web-based tools such as Brat (Stenetorp et al., 2012) and Arborator (Gerdes, 2013) to offline tools like TrEd2 and the TDT Editor of the Turku Dependency Treebank (Haverinen et al., 2014).3

One of the things that these tools have in common is that they are not designed specifically for Universal Dependencies, and so do not provide a convenient way of treating issues such as the two-level segmentation scheme (where a single surface token may be split into several syntactic words, e.g. Spanish dímelo 'say it to me' → dí|me|lo), and generally cannot take advantage of the annotation guidelines to provide validation feedback to the user (for example, punctuation nodes may not have any dependents). In addition, they are either based on web technologies that require some kind of server component (Brat, Arborator) or are offline tools that require a number of dependencies (TrEd and the TDT Editor). In this paper we describe UD Annotatrix, a tool which can be used both online and offline, based on web technologies, that is multiplatform and provides a simple interface to edit treebanks in the CoNLL-U format4 of Universal Dependencies.
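As a schematic illustration of that two-level scheme (not taken from the paper), the Spanish example would be encoded in CoNLL-U roughly as follows, with a multi-word token line spanning the three syntactic words; lemmas and tags are illustrative and the remaining columns are elided:

```
1-3	dímelo	_	_	...
1	dí	decir	VERB	...
2	me	yo	PRON	...
3	lo	él	PRON	...
```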

1 http://www.github.com/jonorthwash/ud-annotatrix
2 https://ufal.mff.cuni.cz/tred/
3 https://github.com/TurkuNLP/TDT_editor
4 http://universaldependencies.org/format.html

Figure 1: Main interface in horizontal alignment mode (see 2.4). The CoNLL-U code appears in an edit box and can be edited directly and can be hidden. The tree appears below the edit box. In addition a table view is supported which allows the user to view and edit the CoNLL-U data in a convenient HTML table format, with an option to toggle the visibility of individual columns.

The remainder of the paper is laid out as follows: Section 2 describes the layout and main features of the interface, Section 3 describes how it was implemented, Section 4 describes related work and how the software fits in with the general tool landscape, Section 5 describes several avenues for future work and finally Section 6 gives some concluding remarks.

2 Features

Annotatrix is a tool primarily aimed at the annotation in UD of files containing up to 10,000 dependency trees. The main design principles that we have taken into account when designing Annotatrix are: that the interface should display a single tree per page; the code should be stored in CoNLL-U format and able to be edited directly; the interface should as far as possible help the user by highlighting errors and proposing solutions; the tool should be zero-install and usable offline (for example for annotation sessions on flights without WiFi); and finally the features should be guided by the UD developer and user community. In taking these principles into account we have tried to prioritise the most useful functionality first, with the aim of making a usable tool that can be extended based on user feedback.

2.1 Graphical editing functionality

When opening Annotatrix, the user is presented with a textbox and a toolbar. The user can then either input code in a number of formats (see §2.2) into the textbox, or elect to upload a file. Once a file is uploaded or some text is inserted, a tree appears below. The user can then click on a node, and click on another node in order to create a dependency link from parent to child. The link appears in grey, and the user can click on a direction arrow to specify a dependency relation, which can be autocompleted. There are a number of heuristics to speed up this process; for example, if the dependent is punctuation, then the punct relation is specified by default. It is also possible to remove dependency links by selecting them with a right click and then pressing either the Delete or Backspace key. If different tokenisation is required, nodes can be split by right clicking and indicating where the split should be in a text box. Token indices are automatically renumbered. Nodes may also be merged, either as single tokens or as multi-word tokens. For example, given the input verlo 'to see it' in Spanish, the single token verlo would be split into ver lo and then joined into a single token span of verlo with two syntactic words. Every action made in graphical mode can be reverted and made again. A detailed description of graphical editing functionality can be found on the help-page of the application.

2.2 Input formats

At the moment, Annotatrix supports five input formats (see descriptions below). They can be pasted into the text box or uploaded as a file. Code pasted into the window goes through a format detection and cleaning process. For example, if CoNLL-U format is detected but there are no tabs, a sequence of more than one space is considered a column separator. Here is a list of supported formats:

CoNLL-U: This is the standard format used in Universal Dependencies. Multilevel tokenisation and comments are supported. Null nodes and the editing of enhanced dependencies are currently unsupported, but support for them is planned.

CG: The input/output format used by the VISL CG3 system (Bick and Didriksen, 2015) is the native format of the Kazakh (Tyers and Washington, 2015; Makazhanov et al., 2015) and Kurdish Kurmanji (Gökırmak and Tyers, 2017) treebanks.

Stanford Dependency: A common format for specifying dependency trees in Annodoc5 and the Universal Dependencies documentation. Trees can be visualised in SDParse, but for editing they should be converted to CoNLL-U.

Bracket notation: Traditional bracket notation can be used for labelled dependency trees. Used in the Russian constructicon.6 As with SDParse, visualisation is supported, but not editing.

Plain text: Plain text can be converted to CoNLL-U by a naïve spaces-and-punctuation tokenisation algorithm implemented as a regular expression (a sketch of this kind of conversion follows after this list).

Examples of each format can be found in Appendix A. The native format for editing is CoNLL-U; all other formats can be used for visualisation, but in order to edit the trees, they must be converted to CoNLL-U.
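To illustrate the plain-text conversion described above, here is a minimal sketch; it is written in Python for compactness, whereas Annotatrix itself implements this in JavaScript, so treat it as a mirror of the idea rather than the actual code:

```python
import re

def plain_text_to_conllu(text):
    """Naive spaces-and-punctuation tokenisation into a minimal CoNLL-U table."""
    lines = ["# text = " + text]
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text, re.UNICODE)
    for i, form in enumerate(tokens, start=1):
        # ID and FORM filled in; the remaining eight columns left unspecified.
        lines.append("\t".join([str(i), form] + ["_"] * 8))
    return "\n".join(lines)

print(plain_text_to_conllu("Colorless green ideas sleep furiously."))
```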

2.3 Text and table view

There are two ways of viewing the columns of the CoNLL-U file: CoNLL-U-formatted dependencies can be viewed in a simple HTML textarea, which allows the user to edit the file directly, as well as in a table view where the user is presented with a table with columns that can be shown and hidden. The table view allows better use to be made of the available space by hiding columns that might not be relevant (for example the XPOSTAG, DEPS and MISC columns) and also by aligning the contents of the columns. An example of the table view can be seen in Figure 3.

2.4 Types of visual alignment

For very long sentences, Annotatrix offers an experimental vertical alignment, where the tree can be viewed from top-to-bottom instead of from left-to-right. This can make better use of the available screen space by allowing more nodes to fit on the screen (the height is fixed, whereas the width depends on the length of the token). In addition, we offer rudimentary support for languages with right-to-left writing systems (e.g. Hebrew and Uyghur) to make annotating them more comfortable.7 The full range of bidirectional (BiDi) support is not available, but we plan to add it in the future. An example of right-to-left layout with Sorani Kurdish can be found in Figure 2. Annotatrix also has full Unicode support, including combining diacritics in abugida scripts.

2.5 Saving corpora

Rudimentary support for a server mode has been implemented. This mode provides support for saving user corpora on a server and then accessing the saved corpora via a unique URL. This option allows the user to share their corpora with other users and makes Annotatrix a simple collaboration tool.

5 http://spyysalo.github.io/annodoc/
6 https://spraakbanken.gu.se/eng/resource/konstruktikon-rus
7 To our knowledge Annotatrix is the only dependency-tree editing program to do this.

Figure 2: Support for right-to-left writing systems. Example in Sorani Kurdish reads "I take this daily medicine". This also demonstrates how the input box can be hidden to maximise space for the dependency tree.

Currently the versions of Annotatrix deployed online are not linked to the server backend, but one can clone the project code and deploy it on one's own web server to use its functionality. Also, server-side functionality is currently very limited, but more is planned for the future. For example, it can be improved by adding support for tracking the editing history or by enabling users to register and view their saved corpora on a personal page. For implementation details, see Section 3.1. In addition to server-side saving, the entire corpus being annotated using the interface is exportable in CoNLL-U format.

2.6 Validation

Annotatrix contains a number of features which help the user to annotate correctly. It offers feedback both on dependency relations and on part-of-speech tags. In the case of dependency relations, if a relation is entered which is not a universal relation (or a language-specific subrelation), then the arc in the graph turns red, and in the table view a warning icon is displayed next to the relation with a tooltip indicating that the relation is not valid. For part-of-speech tags, the feedback is only given in the table view. See Figure 3 for a demonstration of how this works. In addition, two rules are implemented for punctuation: if punctuation is added as the head of another node, or if punctuation is attached non-projectively, this is detected and the arc is turned red.

3 Implementation

3.1 Stand-alone vs. server versions

Annotatrix consists of two modules: stand-alone and server. The stand-alone part of the project supports all the functionality described in Section 2, apart from saving corpora on the server, whereas the server module provides additional functionality. The stand-alone module is written in JavaScript, using jQuery and a number of dependencies described below. All the dependencies are stored locally, allowing for offline usage of the interface. The stand-alone version stores the imported corpora in localStorage and allows for editing CoNLL-U files of up to 10,000 tokens. The server module is written in Python 3, using the Flask web framework, and passes data between client and server. As mentioned in 2.5, the server module currently has only a limited amount of functionality.

3.2 Visualisation and graphical editing

For visualising the dependency trees, we use the Cytoscape.js library (Franz et al., 2016). Cytoscape.js is an open-source JavaScript graph library primarily developed for biologists, but available to use for different purposes. It is easy to use and specifically designed for visualising graphs.

Figure 3: Screenshot showing validation features. In the table view an icon appears next to invalid values and provides a tooltip explaining the problem. In the graph view, arcs which are not labelled are first shown in grey, and then in black if they have a valid label. Arcs with an invalid label or those which are otherwise invalid (for example dependents of punctuation) are shown in red.

As dependency trees typically have far fewer nodes than biological networks and have specific layout requirements, we implemented custom functions for node and arc layout which modify the standard layouts provided by Cytoscape.js. The node layout is built on the standard grid layout: the custom layout saves horizontal space by making the cell width dependent on the token length. For representing dependency links, the unbundled-bezier edge form is used; to avoid intersections, the height of an edge is made dependent on the distance between the nodes.
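These two heuristics can be sketched as follows (illustrative Python with assumed constants; the actual implementation manipulates Cytoscape.js layout options in JavaScript):

def node_positions(tokens, char_width=8, padding=20):
    """Grid-style layout where each cell's width depends on the token length."""
    x, centres = 0, []
    for token in tokens:
        width = len(token) * char_width + padding
        centres.append(x + width / 2)  # centre of this token's cell
        x += width
    return centres

def edge_height(i, j, unit=40):
    """Control-point height of an unbundled-bezier edge grows with the
    distance between the nodes, so longer arcs are drawn above shorter
    ones and intersections are avoided."""
    return abs(i - j) * unit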

3.3 Format parsing and conversion

The main format which serves the visualisation is CoNLL-U, chosen as the most universal and widespread way of encoding dependency trees. For parsing the format, we use the conllu.js library8 written by Magdalena Parks. All of the other supported formats (i.e. CG, SDParse, bracket notation and plain text) are first converted to CoNLL-U and then visualised. For unambiguous sentences in the CG format, UD Annotatrix supports visualisation and graphical editing without converting the full corpus to CoNLL-U: each unambiguous sentence is automatically converted to CoNLL-U for visualisation and editing, and, after changes are made in the graphical mode, converted back to CG and synchronised with the graph. If a sentence is ambiguous, i.e. at least one token has several analyses, it cannot be converted to CoNLL-U without losing information; in this case, the tree is not visualised. The format converters are tested using the QUnit library.
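For illustration, the ambiguity test can be sketched as follows, assuming a simplified view of the CG format in which each surface cohort starts with a "<...>" line and each analysis line is indented (the real converter handles more of the format):

def is_unambiguous(cg_sentence: str) -> bool:
    """A CG sentence can only be round-tripped to CoNLL-U if every cohort
    carries exactly one reading."""
    readings = 0
    for line in cg_sentence.splitlines():
        if line.startswith('"<'):           # a new surface cohort begins
            if readings > 1:
                return False                # the previous cohort was ambiguous
            readings = 0
        elif line.startswith(("\t", " ")):  # an analysis line for the cohort
            readings += 1
    return readings <= 1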

3.4 Additional libraries

Large open-source libraries which Annotatrix relies on include (in the versions currently used) jQuery 3.2.1,9 Bootstrap 4.0,10 and Font Awesome 4.7.0.11 Additionally, preliminary support for localisation has recently been added using Mozilla’s l20n 5.0.0,12 and undo/redo history is implemented using a recent version of Javascript Undo Manager.13

8 https://github.com/FrancessFractal/conllu
9 http://jquery.com
10 http://getbootstrap.com
11 http://fontawesome.io
12 https://github.com/l20n/l20n.js
13 https://github.com/ArthurClemens/Javascript-Undo-Manager

4 Related work

Currently, the two tools providing the closest functionality to Annotatrix are brat (Stenetorp et al., 2012) and Arborator (Gerdes, 2013). Both are web-based tools (though they require server installation), and both are capable of processing CoNLL-U files (natively in the case of Arborator, and with format conversion in the case of brat). A major difference between these two tools and Annotatrix is that they both have more advanced project-management features, with users being able to curate different files, and, in the case of Arborator, many useful features for classroom use (it was originally designed for classroom annotation). The current design of Annotatrix has been optimised for single-file editing by a single user. Another difference is that both brat and Arborator are annotation-scheme neutral: they offer validation support, but do not offer out-of-the-box support for Universal Dependencies.

5 Future work

One of the main features that has been requested but not yet implemented is search functionality. We envisage two modes of operation: the first would provide simple search-by-label or search-by-token/relation/etc. functionality for offline use on small treebanks; the second would incorporate dep_search (Luotolahti et al., 2017), an extremely powerful query tool for searching dependency parse banks. An additional feature that we would like to integrate into Annotatrix is the work of de Marneffe et al. (2017) on error finding in UD treebanks. Their current tool allows errors to be flagged, but it should be possible for trees to be fixed as well, i.e. instead of just reporting errors, a patch which fixes the error could be generated. At the moment, the validation features of Annotatrix are quite limited. It should be possible to write much more intricate rules to validate the trees; code in UD’s validate.py and in Udapi (Popel et al., 2017) could be used as a basis for this, with priority going to format validation. There is also a wide range of interface and usability improvements being actively worked on; these are all documented as issues in the main GitHub repository. In addition, Annotatrix is being actively used in annotation projects such as the Marathi (Ravishankar, 2018) and Bambara (Aplonova and Tyers, 2018) treebanks, and we expect that this use will provide useful feedback in terms of bugs and feature requests.

6 Concluding remarks

This paper has presented UD Annotatrix, a free/open-source tool for annotating Universal Dependencies.14 UD Annotatrix is developed for and by the community of Universal Dependencies users. The current set of features has been described, along with details on implementation, related work, and our plans going forward. It is our hope that Annotatrix will streamline the workflows of many UD annotators, thereby enabling the creation of UD-annotated corpora that are larger and cover a wider range of languages, and that the tool will grow and improve as more users notice bugs and request new features.

Acknowledgements

We would like to thank the Google Summer of Code and the Apertium project for supporting the development of Annotatrix. In addition, we would like to thank Filip Ginter and Alexandre Rademaker for extremely helpful discussions and suggestions. Tai Warner, Sushain Cherivirala and Kevin Unhammer have also contributed code. Finally, we would like to thank our users, in particular Jack Rueter and Katya Aplonova, for their helpful bug reports and encouragement.

14 All of the source code is available online at https://github.com/jonorthwash/ud-annotatrix.

References

Aplonova, E. and Tyers, F. M. (2018). Towards a dependency-annotated treebank for Bambara. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, this volume.

Bick, E. and Didriksen, T. (2015). CG-3 – beyond classical constraint grammar. In Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA, pages 31–39. Linköping University Electronic Press, Linköpings universitet.

de Marneffe, M.-C., Grioni, M., Kanerva, J., and Ginter, F. (2017). Assessing the annotation consistency of the Universal Dependencies corpora. In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), pages 108–115.

Franz, M., Lopes, C. T., Huck, G., Dong, Y., Sumer, O., and Bader, G. D. (2016). Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics, 32(2):309–311.

Gerdes, K. (2013). Collaborative dependency annotation. In Proceedings of DepLing 2013, pages 88–97.

Gökırmak, M. and Tyers, F. M. (2017). A dependency treebank for Kurmanji Kurdish. In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), pages 64–73.

Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., and Ginter, F. (2014). Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation, 48(3):493–531.

Luotolahti, J., Kanerva, J., and Ginter, F. (2017). dep_search: Efficient search tool for large dependency parse- banks. In Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa).

Makazhanov, A., Sultangazina, A., Makhambetov, O., and Yessenbayev, Z. (2015). Syntactic annotation of Kazakh: Following the Universal Dependencies guidelines. A report. In 3rd International Conference on Turkic Languages Processing, (TurkLang 2015), pages 338–350.

Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., and Zeman, D. (2016). Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of Language Resources and Evaluation Conference (LREC’16).

Popel, M., Žabokrtský, Z., and Vojtek, M. (2017). Udapi: Universal API for Universal Dependencies. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 96–101.

Ravishankar, V. (2018). A Universal Dependencies Treebank for Marathi. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, this volume.

Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., and Tsujii, J. (2012). BRAT: a web-based tool for NLP-assisted text annotation. In EACL ’12 Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 102–107.

Tyers, F. M. and Washington, J. N. (2015). Towards a free/open-source Universal-dependency treebank for Kazakh. In 3rd International Conference on Turkic Languages Processing, (TurkLang 2015), pages 276–289.

A Demonstration of dependency annotation formats

The following sentence is rendered below in several dependency annotation formats.

The tree, with the relations root, nsubj, aux, xcomp, mark, obl, case, det and punct, is rendered over the following tokens:

Form:   I     ’m   gon-  -na   skate  to   the   beach  .
POS:    PRON  AUX  VERB  PART  VERB   ADP  DET   NOUN   PUNCT
Lemma:  I     be   go    to    skate  to   the   beach  .

A.1 CoNLL-U

1    I      I      PRON   _  _  3  nsubj  _  SpaceAfter=No
2    ’m     be     AUX    _  _  3  aux    _  _
3-4  gonna  _      _      _  _  _  _      _  _
3    gon    go     VERB   _  _  0  root   _  _
4    na     to     PART   _  _  5  mark   _  _
5    skate  skate  VERB   _  _  3  xcomp  _  _
6    to     to     ADP    _  _  8  case   _  _
7    the    the    DET    _  _  8  det    _  _
8    beach  beach  NOUN   _  _  5  obl    _  SpaceAfter=No
9    .      .      PUNCT  _  _  3  punct  _  _

A.2 CG3

"<I>"
	"I" PRON @nsubj #1->3
"<’m>"
	"be" AUX @aux #2->3
"<gonna>"
	"go" VERB @root #3->0
	"to" PART @mark #4->5
"<skate>"
	"skate" VERB @xcomp #5->3
"<to>"
	"to" ADP @case #6->8
"<the>"
	"the" DET @det #7->8
"<beach>"
	"beach" NOUN @obl #8->5
"<.>"
	"." PUNCT @punct #9->3

A.3 SDParse

I ’m gonna skate to the beach .
nsubj(gonna, I)
aux(gonna, ’m)
xcomp(gonna, skate)
obl(skate, beach)
det(beach, the)
case(beach, to)
punct(gonna, .)

A.4 Bracket notation

[root [nsubj I] [aux ’m] gonna [xcomp skate [obl [case to] [det the] beach]]]

The Treebanked Conspiracy. Actors and Actions in Bellum Catilinae

Marco Passarotti, CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Largo Gemelli, 1 – 20123 Milan, Italy, [email protected]
Berta González Saavedra, Dep. de Filología Griega y Lingüística Indoeuropea, Universidad Complutense de Madrid, Pl. M. Pelayo – 28040 Madrid, Spain, [email protected]

Abstract

In the context of the Index Thomisticus Treebank project, we have recently enhanced the entire text of Bellum Catilinae by Sallust with a layer of semantic annotation. By exploiting the results of semantic role labeling, ellipsis resolution and coreference analysis, this paper presents a study of the main Actors and Actions (and their relations) in Bellum Catilinae.

1 Introduction

The large majority of the currently available treebanks include data taken from contemporary books, magazines, journals and, mostly, newspapers. Such data are used for different purposes in both theoretical and computational linguistics, the most widespread being supporting and evaluating theoretical assumptions with empirical evidence and providing data for various tasks in stochastic NLP, like inducing grammars and training/testing tools.

Over the last decade, a small but ever-growing set of dependency treebanks for ancient languages has been built. The main treebanks now available are those for Latin and Ancient Greek: the Ancient Greek and Latin Dependency Treebank (AGLDT) (Bamman and Crane, 2011), the Index Thomisticus Treebank (IT-TB) (Passarotti, 2011) and the PROIEL corpus (Haug and Jøhndal, 2008).

Treebanks for ancient languages tend to include literary, historical, philosophical and/or documentary texts. This makes the very use of such resources different from that of treebanks for modern languages. Indeed, instead of exploiting data to draw linguistic generalizations, users of such treebanks are more interested in the linguistic features of the texts themselves available in the corpus. For instance, there is more interest and scientific motivation in exploiting the treebanked texts of Sophocles to study their specific syntactic characteristics than in using the evidence provided by such texts as sufficiently representative of Ancient Greek, which they are not. Not only is the use of the data different, but so are the users. Indeed, it is quite uncommon for scholars of literature, philosophy or history to use linguistic resources like treebanks for modern languages in their research. Instead, they represent some of the typical users of treebanks for ancient languages as well as of diachronic treebanks.

Such resources become even more useful for this kind of user from the Humanities when they are enhanced with a semantic layer of annotation on top of the syntactic one, given the large interest of such scholars in the semantic interpretation of texts through syntax. In this area, the Index Thomisticus Treebank project has recently enhanced a selection of texts taken from the IT-TB and the AGLDT with semantic annotation. This paper describes the dependency-based annotation style applied to these data and presents a use case exploiting them for literary analysis. In particular, the analysis focuses on the main Actors and Actions in Sallust’s Bellum Catilinae.1 The work is performed by using the results of semantic role labeling, coreference analysis and ellipsis resolution applied to the source data.

1 Written probably between 43 and 40 BCE, Bellum Catilinae tells the story of the so-called second Catilinarian conspiracy (63 BCE), a plot, devised by Catiline and a group of aristocrats and veterans, to overthrow the Roman Republic. The text of Bellum Catilinae available from the AGLDT is the one edited by Ahlberg (1919). It includes 10,936 words and 701 sentences. In this paper, English translations of Bellum Catilinae are taken from Ramsey (2014).

Keywords: semantic role labeling, ellipsis resolution, coreference, hierarchical clustering

2 Data

In the context of the Index Thomisticus Treebank project hosted at the CIRCSE research centre of the Università Cattolica del Sacro Cuore in Milan, Italy (http://itreebank.marginalia.it/), we have added a new layer of semantic annotation on top of a selection of syntactically annotated data taken from the IT-TB and the Latin portion of the AGLDT (González Saavedra and Passarotti, 2014). In particular, around 2,000 sentences (approx. 27,000 words) were annotated out of the Summa contra Gentiles of Thomas Aquinas (IT-TB). The entire Bellum Catilinae of Sallust (BC) and small excerpts of 100 sentences each from texts of Caesar and Cicero were annotated from the AGLDT.

2.1 Annotation Style

The style of the semantic layer of annotation used in the IT-TB project is based on Functional Generative Description (FGD) (Sgall et al., 1986), a dependency-based theoretical framework developed in Prague and intensively applied and tested while building the Prague Dependency Treebank of Czech (PDT) (Hajič et al., 2000). The PDT is a dependency-based treebank with a three-layer structure. The (so ordered) layers are a “morphological layer” (morphological tagging and lemmatization), an “analytical” layer (annotation of surface syntax) and a “tectogrammatical” layer (annotation of underlying syntax). Both the analytical and the tectogrammatical layers describe the sentence structure with dependency tree-graphs, respectively named analytical tree structures (ATSs) and tectogrammatical tree structures (TGTSs).

In ATSs every word and punctuation mark of the sentence is represented by a node of a rooted dependency tree. The edges of the tree correspond to dependency relations that are labelled with (surface) syntactic functions called “analytical functions” (like Subject, Object etc.).

TGTSs describe the underlying structure of the sentence, conceived as the semantically relevant counterpart of the grammatical means of expression (described by ATSs). The nodes of TGTSs include autosemantic words only (represented by “tectogrammatical lemmas”: “t-lemmas”), while function words and punctuation marks collapse into the nodes for autosemantic words. Semantic role labeling is performed by assigning to nodes semantic role tags called “functors”. These are divided into two classes according to valency: (a) arguments, called “inner participants”, i.e. obligatory complementations of verbs, nouns, adjectives and adverbs: Actor,2 Patient, Addressee, Effect and Origin; (b) adjuncts, called “free modifications”: different kinds of adverbials, like Place, Time, Manner etc.

Also coreference analysis and ellipsis resolution are performed at the tectogrammatical layer and are represented in TGTSs through arrows (coreference) and newly added nodes (ellipsis). In particular, there are two kinds of coreference: (a) “grammatical coreference”, in which it is possible to pinpoint the coreferred expression on the basis of grammatical rules (mostly with relative pronouns) and (b) “textual coreference”, realized not only by grammatical means, but also via context (mostly with personal pronouns).

2.2 From ATSs to TGTSs

The workflow for tectogrammatical annotation in the IT-TB is based on TGTSs automatically converted from ATSs.3 The TGTSs that result from the conversion are then checked and refined manually by two annotators. The conversion is performed by adapting to Latin a number of ATS-to-TGTS conversion modules provided by the NLP framework Treex (Žabokrtský, 2011).4

For instance, Figure 1 shows the ATS for the sentence “cum [with] eo [him] se [himself] consulem [consul] initium [beginning] agundi [of acting] facturum [would have made]” (BC 21.4) (“[Catiline promised that] as consul with him, he would launch his undertaking”), which presents a case of predicate ellipsis.

2 The definition of Actor in the PDT is semantically quite underspecified, as it refers to “the human or non-human originator of the event, the bearer of the event or a quality/property, the experiencer or possessor” (Mikulová et al., 2006, page 461).
3 The guidelines for analytical annotation of the IT-TB (as well as of the Latin portion of the AGLDT) are those of Bamman et al. (2007). The guidelines for tectogrammatical annotation are those of the PDT (Mikulová et al., 2006), with a few modifications for representing Latin-specific constructions.
4 See González Saavedra and Passarotti (2014) for details on ATS-to-TGTS conversion in the IT-TB and, especially, for an evaluation of the accuracy of the conversion process.

Figure 1: ATS of BC 21.4.

The sentence is an objective subordinate clause lacking the predicate of its governing clause (“[Catiline promised that]”). In ATSs, this is represented by assigning the analytical function ExD (External Dependency) to the main predicate of the sentence. In the ATS of Figure 1, the node for facturum is assigned ExD, because here facturum depends on a node that is missing and is thus “external” to the current tree.

Figure 2 shows the TGTS for this sentence, which resolves the ellipsis of the main clause. Three sentences before this one in the text, Sallust writes “Catiline polliceri” (“Catiline promised [to men]”). The sentence in BC 21.4 still depends on this clause. Once the ellipsis of polliceor has been resolved, the TGTS must represent its arguments. Among these, both the Actor and the Addressee result from ellipsis resolution: Catiline is the Actor and the men (homo) are the Addressee. The Patient of polliceor, instead, is represented by the entire objective subordinate clause of BC 21.4. In this clause, the Actor is again Catiline, as it is represented by the textual coreference of the node depending on facio which is assigned the t-lemma #PersPron:5 this node is not newly added because it is textually represented by the reflexive se. The Patient of facio is initium, which is specified by a restrictor (RSTR; the verb ago) governing a newly added node for a generic Actor (#Gen). Such an Actor is assigned when its denotation cannot be retrieved contextually, which mostly happens in impersonal clauses, as in this case (literally: “the beginning of acting”).

The prepositional phrase “cum eo” (“with him”) is represented in the TGTS of Figure 2 by the node for is (form eo), while that for the preposition cum collapses. The personal pronoun is is linked with a previous occurrence of the proper name Antonius via a textual coreference and is assigned the functor ACMP, which is used for adjuncts that express manner by specifying a circumstance (an object, person, event) that accompanies (or fails to accompany) the event or entity modified by the adjunct.

In TGTSs, predicative complements (functor: COMPL) are adjuncts with a dual semantic dependency relation: they simultaneously modify a noun and a verb. The dependency on the verb is represented by means of an edge; in Figure 2, this is the edge that connects facio with consul. The dependency on the noun is represented by means of a specific complement reference, which is graphically rendered by a green arrow (going from consul to #PersPron in Figure 2).

3 Results and Discussion

One of the added values of tectogrammatical annotation is that it provides information that, although accessible to readers, is missing in the text itself. Looking at the example sentence discussed in the previous section, we see that there is no explicit occurrence of Catiline playing the role of Actor of a verb. Instead, if we exploit the tectogrammatical annotation, we can retrieve the fact that the sentence actually says that Catiline performs two different Actions (namely, polliceor and facio).

5 #PersPron is the t-lemma assigned to nodes representing possessive and personal pronouns (including reflexives).

Figure 2: TGTS of BC 21.4.

Tectogrammatical annotation puts us in a position to answer the basic research question of the work described in this paper: “who does what in Bellum Catilinae?”. In other words, what we look for are all the Actor-Action couples in BC, regardless of whether they explicitly occur in the text.6

3.1 Querying the Data

All data can be freely downloaded from the website of the IT-TB project. The treebanks can be queried through an implementation of the PML-TQ search engine (Prague Markup Language Tree Query) (Štěpánek and Pajas, 2010). We ran a series of queries in order to retrieve all the Actor-Action couples in BC. The basic query just searches for all the Actors of a verb:

t-node $n0 := [ gram/sempos = ‘v’,
  echild t-node $n1 := [ functor = ‘ACT’ ] ];

This query searches for all the nodes of a TGTS (t-node, named $n0) that are assigned PoS verb (gram/sempos = ‘v’) and govern either directly or indirectly (echild) a node ($n1) with functor ACT (functor = ‘ACT’).7 The query does not limit the output to nodes with an explicit textual correspondence, but also includes those newly added in TGTSs as a result of ellipsis resolution.

The output of the query above needs further refinement, as it features several cases of both relative and personal pronouns whose denotation is resolved in TGTSs by coreference analysis. For instance, three Actor-Action couples result from the TGTS of Figure 2: #PersPron-polliceor, #PersPron-facio and #Gen-ago. While #Gen is a generic argument whose denotation cannot be retrieved contextually, both the #PersPron nodes are assigned a textual coreference in the TGTS, which makes it possible to replace them with the t-lemma they are coreferent with.
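For readers less familiar with PML-TQ, the same search can be approximated procedurally. The sketch below assumes hypothetical node objects with sempos, functor, t_lemma and children attributes (the real query runs over the PML encoding of the treebank):

# Approximate the basic PML-TQ query over TGTS-like node objects.
COORD_FUNCTORS = {"CONJ", "DISJ", "ADVS", "GRAD"}  # coordination heads (excerpt)

def effective_children(node):
    """'echild'-style government: direct dependents, descending through
    coordinating nodes so that conjuncts count as children of the shared head."""
    for child in node.children:
        if child.functor in COORD_FUNCTORS:
            yield from effective_children(child)
        else:
            yield child

def actor_action_pairs(all_nodes):
    """Collect (Actor t-lemma, verb t-lemma) couples, including nodes newly
    generated by ellipsis resolution."""
    return [(dep.t_lemma, node.t_lemma)
            for node in all_nodes if node.sempos == "v"
            for dep in effective_children(node) if dep.functor == "ACT"]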

6 In this work, we consider Actions as represented by verbs only. Deverbal nominalizations are thus excluded.
7 Direct or indirect government is set in order to retrieve Actors occurring in coordinated constructions (headed by the coordinating element).

Figure 3: TGTS of BC 1.4 (part).

We ran a number of queries to replace, in the output of the basic query, all coreferred #PersPron t-lemmas with those of the nodes they are linked with via textual coreference. Then we did the same for all coreferred t-lemmas of relative pronouns, which are linked to their antecedents via grammatical coreference. Not only must such queries consider both direct and indirect linking, as well as textual and grammatical coreference, but they also have to address mixed indirect coreferences. For instance, this is the case of the first noun phrase in the first sentence of BC: “Omnis [all] homines [men], qui [who] sese [themselves] student [be eager] praestare [to stand out] ceteris [others] animalibus [animals] [...]” (BC 1.1) (“All humans who are keen to surpass other animals [...]”). Figure 3 shows the portion of the TGTS for the first sentence of BC concerning this phrase. From Figure 3, one can see that the denotation (homo) of the #PersPron node playing the role of Actor of praesto is retrieved (a) indirectly, by passing through the node for qui, and (b) in a mixed fashion, i.e. via a textual coreference (from #PersPron to qui) plus a grammatical coreference (from qui to homo). A model of such complex queries is the following:

t-node $n0 := [ functor = ‘ACT’,
  eparent t-node $n2 := [ gram/sempos = ‘v’ ],
  coref text.rf t-node $n1 := [
    coref gram.rf t-node $n3 := [ ] ] ];

The t-node named $n0 is an Actor that depends either directly or indirectly (eparent) on t-node $n2, which is a verb. $n0 has a textual coreference with $n1, which in turn has a grammatical coreference with $n3.8

3.2 Actors and Actions

Tables 1 and 2 report respectively the main Actions and the main Actors in BC. These are defined as the Actions performed by the highest number of different Actors and, conversely, as the Actors that perform the highest number of different Actions.9

8 The longest coreference chain we found in BC includes 5 textual coreferences.
9 The absence of verbs like possum (“can”) and volo, velle (“to want”) in Table 1 is due to the treatment of modal predicates in TGTSs (see Mikulová et al., 2006, pages 318–320). Non-coreferred Actors are excluded from Table 2. These are the generic Actor (#Gen) and those pronouns that do not undergo coreference analysis in TGTSs, i.e. indefinite and interrogative pronouns (like alius and quis), as well as both explicit and generated personal pronouns of the first and second person.

Action     Actors  Occ.  Generated
sum        179     268   38
habeo      43      84    10
facio      39      87    4
convenio   20      8     2
dico       18      41    9
do         18      22    3
hortor     16      11    2
venio      14      11    0
coepio     13      18    7
puto       13      10    0
peto       13      12    0
cognosco   13      20    0

Table 1: Main Actions.

Actor       Actions  Occ.  Generated
catilina    133      61    6
cicero      33       18    0
homo        32       40    3
res         24       147   4
petreius    20       3     0
lentulus    20       27    6
consul      20       32    0
caesar      20       13    0
populus     19       18    0
curius      19       5     0
vulturcius  18       10    0
vir         18       16    0
animus      18       59    2

Table 2: Main Actors.

Besides Actions and the number of their different Actors, Table 1 also reports the total number of occurrences of each Action and, among these, the number of generated occurrences (resulting from ellipsis resolution). The case of convenio (“to come together”) is worth noting, as it turns out to have 20 different Actors for just 8 occurrences (2 of which are generated). This happens because in some of its occurrences convenio has more than one Actor, as for instance in the sentence “eo [there] convenere [to come together] senatorii [senatorial] ordinis [order] P. Lentulus Sura, P. Autronius, L. Cassius Longinus, C. Cethegus, P. et Ser. Sullae Ser. filii, L. Vargunteius, Q. Annius, M. Porcius Laeca, L. Bestia, Q. Curius” (BC 17.3) (“There were present from the senatorial order...”).

Not surprisingly, Catiline is the star of BC, being the Actor of 133 different Actions (i.e. verbs) in 61 occurrences (6 of which are generated). Traditionally, together with Catiline, the three other main characters of BC are considered to be Caesar, Cato and Cicero, who give the main speeches reported in the text. If we look at the Actions each of them performs and focus on those that only Catiline performs (i.e. those not shared with the others), we can see which Actions are peculiar to Catiline. These are represented by the verbs dimitto (“to send out”) and paro (“to prepare”). Interestingly enough, dimitto and paro are not only the Actions performed by Catiline alone (and not also by Caesar, Cato or Cicero), but also among the Actions that Catiline most frequently performs (6 times), just after facio (“to make”) (10) and habeo (“to have”) (7), and more often than sum (“to be”) (5) and video (“to see”) (5). If for dimitto this result is biased by a case of ellipsis resolution applied to a multiple coordination in one sentence (BC 27.1), paro offers a wider range of occurrences. By exploiting semantic role labeling, we can see what Catiline prepares in BC. The most frequent Patients of the occurrences of paro with Catiline as Actor are the following: arma (“implements of war”, “weapons”), incendium (“burning”), insidiae (“trap”) and interficio (“to destroy”). Indeed, Catiline is a bad guy in BC.

Given that Catiline plays the role of Actor in BC more than three times as often as Cicero, one can expect that most of the Actions performed by Cicero are shared with Catiline and that these Actions are more frequently performed by Catiline than by Cicero. Actually, there are some deviations from this trend. The clearest example is the verb refero (“to bear back”, “to report”), whose Actor is Cicero in two occurrences while Catiline never performs it. Moreover, there are three verbs that feature Cicero as Actor more than once and more often than Catiline. Two of them are cognosco (“to know”) and praecipio (“to take in advance”, “to warn”); both have Cicero as Actor twice and Catiline once. Finally, the Action most frequently performed by Cicero (3) is the third of these, the verb iubeo (“to give an order”, “to command”); Catiline is also an Actor of iubeo, but only in two occurrences.

In order to understand whether the Actors reported in Table 2 can be properly organized into homogeneous groups defined by their Actions, we performed a clustering analysis of the results.

3.3 Clustering the Actors

Clustering is the process of organizing objects (“observations”) into groups (“clusters”) whose members are similar in some way.
One of the trickiest issues in clustering is to define what ‘similarity’ means and to find a clustering algorithm that efficiently computes the degree of similarity between two objects being compared. Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. It can follow two main strategies: (a) agglomerative (bottom-up): each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy; (b) divisive (top-down): all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

In this work, we apply hierarchical agglomerative clustering to compute the degree of similarity/dissimilarity between the Actors reported in Table 2. This degree is obtained by comparing Actors by the Actions they perform. First, we compute the number of shared and non-shared Actions between the members of all possible couples of Actors. Then, we compare the distributions of shared and non-shared Actions by their relative frequency.10 As for the distance measure, the analysis is run on document-term matrices11 using the cosine distance

d(i, i') = 1 - cos{(x_i1, x_i2, ..., x_ik), (x_i'1, x_i'2, ..., x_i'k)}.

The arguments of the cosine function are two rows, i and i', of a document-term matrix; x_ij and x_i'j give the number of occurrences of verb j (j = 1, ..., k) in the two sets of Actions corresponding to rows i and i' (“profiles”). Zero distance between two sets (cosine = 1) holds when the two sets have the same profile (i.e. the same relative conditional distributions of terms). In the opposite case, if two sets do not share any word, the corresponding profiles have maximum distance (cosine = 0).

As for clustering, we use the “complete” linkage agglomeration method. While building clusters by agglomeration, at each stage the distance (similarity) between clusters is determined by the distance (similarity) between the two elements, one from each cluster, that are most distant. Thus, complete linkage ensures that all items in a cluster are within some maximum distance (or minimum similarity) of each other. Roughly speaking, according to our clustering method, Actors that share a high number of Actions with similar distributions are considered to have a high degree of similarity and thus fall into the same or related clusters.

Figure 4 plots the results and includes three main clusters. Moving from top to bottom, the first cluster includes the two most similar Actors according to the Actions they perform: cicero and consul (“consul”). This happens although BC includes several occurrences of consul that do not refer to Cicero. Actually, Marcus Tullius Cicero is the consul par excellence in Roman political history, and he was the only consul among the Actors considered here, as Caesar would become consul for the first time in 59 BCE, four years after the events told in BC. The second most similar couple of Actors is the one including catilina and lentulus (similar at height 0.76). Catiline was the one who devised the conspiracy narrated in BC. Publius Cornelius Lentulus was one of the main conspirators; in particular, he took Catiline’s place as chief of the conspirators in Rome when Catiline had to leave the city after Cicero’s famous second speech In Catilinam. The two characters are thus strictly related.
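Before turning to the remaining clusters, the pipeline just described (cosine distances over Actor profiles, then complete-linkage agglomeration) can be sketched compactly. The snippet uses Python with SciPy as a stand-in for the R code actually used, and the count matrix is purely illustrative, not the real BC data:

# Hierarchical agglomerative clustering of Actor profiles (illustrative data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

actors = ["catilina", "cicero", "consul", "lentulus"]
# Rows: Actors; columns: verbs (Actions); cells: occurrence counts.
X = np.array([[10, 7, 6, 0],
              [ 2, 1, 0, 3],
              [ 2, 1, 0, 2],
              [ 4, 2, 1, 0]], dtype=float)

# d(i, i') = 1 - cos(profile_i, profile_i'), then complete linkage.
Z = linkage(pdist(X, metric="cosine"), method="complete")
dendrogram(Z, labels=actors)   # a Figure-4-style tree
plt.show()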
In the same larger cluster are curius and populus (“people”). Quintus Curius was another conspirator, although his role was actually ambivalent. Being a friend of Catiline, he took part in the conspiracy, but at the same time it was because of him that it was foiled: according to Sallust, Curius, to boast to his mistress Fulvia, told her the details of the conspiracy, which she then informed Cicero about. Moreover, Curius accused Caesar of being a conspirator. Such an undefined role is also played by “the people”. In those passages where Sallust talks about “the Roman people” (“populus romanus”), these are mostly positively depicted; conversely, there are also places in BC where the people act badly.

10 All the experiments were performed with the R statistical software (R Development Core Team, 2012). More details on the clustering method used here are in Passarotti and Cantaluppi (2016).
11 A document-term matrix is a mathematical matrix that holds the frequencies of distinct terms for each document: rows correspond to documents in the collection and columns correspond to terms.

Figure 4: Clustering the Actors.

Finally, Titus Vulturcius, a conspirator playing a subordinate role in the plot, falls into the same cluster, standing quite apart from the others.

The second cluster includes just two lemmas: animus (“soul”) and res (“thing”). These are the only non-human Actors among the ones considered here.

The third cluster features two couples of Actors. The first includes the lemmas homo (“human being”, “man”) and vir (“adult male”, “man”), which are semantically strictly related, standing in a hypernym/hyponym relation. The second couple is formed by petreius and caesar. Marcus Petreius plays a positive role in BC, having led the senatorial forces in the victory over Catiline at Pistoia. It is worth noting that such a positive character in the plot gets clustered together with Caesar. The future dictator Gaius Iulius Caesar hoped for the success of the second conspiracy of Catiline, just as he did for the first. However, Sallust’s intent is to clear Caesar of any suspicion of a possible link with Catiline. He emphasizes Caesar’s concern for legality, depicting him (together with Cato) as the faithful guardian of the “mos maiorum”, the core, unwritten code of Roman traditionalism. Putting Caesar in such a positive light is strictly connected to the fact that, while BC was being written, Caesar was deified by decree of the Roman Senate (on 1 January 42 BCE), after his assassination on the Ides of March 44 BCE.

4 Conclusion

The work described in this paper is a case study showing how useful a treebank enhanced with semantic annotation can be for literary studies. In this respect, there is still much to do. On one side, too few literary texts provided with such an annotation layer are currently available. On the other, the use of linguistic resources like treebanks remains dramatically confined to the area of computational and theoretical linguistics, not impacting other communities which might largely benefit from such resources. To overcome the former, one desideratum is building NLP tools able to provide good accuracy rates for semantic annotation across different domains. As for the latter, developers of treebanks based on literary data and/or texts written in ancient languages must increasingly get in touch with different kinds of domain experts from the Humanities, like philologists, historical linguists, philosophers, historians and scholars in literature. Indeed, over the last few years this looks like a growing trend, with several events and special issues of scientific journals dedicated to different topics at the intersection of computational linguistics and the Humanities. We hope that this is just the beginning of a fruitful joint work.

References

Axel W. Ahlberg. 1919. C. Sallusti Crispi. Catilina, Iugurtha, Orationes Et Epistulae Excerptae De Historiis. Teubner, Leipzig.

David Bamman and Gregory Crane. 2011. The Ancient Greek and Latin dependency treebanks. In Language Technology for Cultural Heritage. Springer, pages 79–98.

David Bamman, Marco Passarotti, Gregory Crane, and Savina Raynaud. 2007. Guidelines for the Syntactic Annotation of Latin Treebanks. Tufts University Digital Library, Boston, MA.

Berta González Saavedra and Marco Passarotti. 2014. Challenges in enhancing the Index Thomisticus treebank with semantic and pragmatic annotation. In Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT-13). Department of Linguistics, University of Tübingen, pages 265–270.

Jan Hajič, Alena Böhmová, Eva Hajičová, and Barbora Vidová Hladká. 2000. The Prague dependency treebank: A three-level annotation scenario. In Treebanks: Building and Using Parsed Corpora. Kluwer, pages 103–127.

Dag Haug and Marius Jøhndal. 2008. Creating a parallel treebank of the old Indo-European Bible translations. In Proceedings of the Language Technology for Cultural Heritage Data Workshop (LaTeCH 2008). ELRA, pages 27–34.

Marie Mikulová et al. 2006. Annotation on the Tectogrammatical Layer in the Prague Dependency Treebank. Institute of Formal and Applied Linguistics, Prague, Czech Republic.

Marco Passarotti. 2011. Language resources. The state of the art of Latin and the Index Thomisticus treebank project. In Corpus anciens et Bases de données. Presses universitaires de Nancy, pages 301–320.

Marco Passarotti and Gabriele Cantaluppi. 2016. A statistical investigation into the corpus of Seneca. In Latinitatis Rationes. Descriptive and Historical Accounts for the Latin Language. De Gruyter, pages 684–706.

R Development Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

John T. Ramsey. 2014. Sallust. The war with Catiline. The war with Jugurtha. Harvard University Press, The Loeb Classical Library 116, Cambridge, MA.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence in its Semantic and Pragmatic Aspects. D. Reidel, Dordrecht, NL.

Jan Štěpánek and Petr Pajas. 2010. Querying diverse treebanks in a uniform way. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC 2010). ELRA, pages 1828–1835.

Zdeněk Žabokrtský. 2011. Treex – an open-source framework for natural language processing. In Information Technologies Applications and Theory. Univerzita Pavla Jozefa Šafárika v Košiciach, pages 7–14.

Universal Dependencies-based syntactic features in detecting human translation varieties

Maria Kunilovskaya, Institute for the Humanities and Social Sciences, University of Tyumen, Tyumen, Russia, [email protected]
Andrey Kutuzov, Department of Informatics, University of Oslo, Oslo, Norway, [email protected]

Abstract

In this paper, syntactic annotation is used to reveal linguistic properties of translations. We employ the Universal Dependencies framework to represent learner and professional translations of English mass-media texts into Russian (along with non-translated Russian texts of the same genre) with the aim of discovering and describing the syntactic specificity of translations produced at different levels of competence. The search for differences between the varieties of translation and the native texts is augmented with the results obtained from a series of machine learning classification experiments. We show that syntactic structures have considerable predictive power in translationese detection, on par with the known low-level lexical features.

Keywords: human translation varieties, translation classification, syntactic features, UD

1 Introduction

This research aims to detect distinctive syntactic properties of learner and professional translations from English into Russian when compared to originally authored texts in Russian. The contrasts between them can provide insights into translation quality and be informative in translator education as well as design. It is known from previous studies that translations differ from non-translations at all levels of the language hierarchy. These linguistic differences are usually referred to as translationese (Gellerstam, 1986), and the text production processes behind them are explained within the theory of translation universals (Baker, 1993). The quantitative specificity of translated texts is used in translationese detection and classification. It has been shown that learning systems can achieve high performance on shallow data representations (Baroni and Bernardini, 2006), and that character n-grams work best (Popescu, 2011). However, features useful for machine learning algorithms are often difficult to interpret linguistically. At the same time, it is important to know what gives translations their peculiar foreign sound. This knowledge will promote our ability to counteract it, if we want to produce more natural texts in the target language, as well as our awareness of typical linguistic behavior in situations of language contact.

The concept of translation quality is inherently connected to the idea of translationese. In the most common case of informational texts, we expect translations to blend well with the rest of the genre-comparable texts in that language. Fluency, the property of a translation to read as naturally as a non-translation, is one of the three major criteria of translation evaluation, along with adequacy and fidelity (Secară, 2005). It means that we can use proximity to the reference non-translations as a measure for this component of translation quality. The question remains whether all machine-detected differences between translations and non-translations reflect a reduction in fluency and readability. Therefore, it is useful to test the findings against some external labels or markers of quality. In our research setup we represent translation quality classes by professional and learner translations, assuming that translations produced at different levels of competence differ in terms of quality. In this work we explore the use of syntactic features as possible indicators of translationese for modeling a classifier able to distinguish between novice and professional translators (or between translations and non-translations).
Machine classification is used here as an exploratory technique: after we establish the appropriateness of syntactic representations for the purposes of text classification, we identify the most informative features and test their validity in a contrastive and comparative linguistic analysis of our data. This is one of the reasons why we do not use modern neural methods (like LSTMs, for example) working directly on sequences of words or characters: we need interpretable features in this setup. Thus, we rely on more old-fashioned classifiers like SVM.

We use the Universal Dependencies (UD) framework (Nivre et al., 2016) in our syntactic analysis. It is a linguistically motivated initiative aimed at facilitating multilingual research by offering a universal approach to representing and comparing sentence structures. Besides, UD provides a better account of free word order languages such as Russian (Jurafsky and Martin, 2014), and gives direct access to annotated treebanks. Unlike previous work in the field of translationese detection and classification, we make use of truly syntactic properties of sentences as defined in Dependency Grammar, not their PoS n-gram emulations or similar quasi-syntactic approaches.

2 Research questions

We are designing a learning system as a heuristic approach to establishing the syntactic specificity of translations, with a view to using their most distinctive syntactic properties as tools in translation quality assessment. Our research questions can be put as follows:

1. Can translated texts be distinguished from non-translations based on syntactic features, given their UD-based representation described below?

2. Are there machine-learnable syntactic differences between translations produced by learner translators and by professionals?

3. If yes to any of the above questions, which features are most correlated with the text class?

4. How can these features be explained by contrastive analysis and translation universals theory?

We resort to comparative and contrastive analyses to offer a linguistic explanation for the experimental findings. To this end, we analyze the distributions of features in the sentences, compare them with the respective source segments and typify the results. In this part we are guided by findings within corpus-based translation studies and by contrastive knowledge for the given language pair.

3 Related work

Previous work on translation quality assessment (TQA), translational expertise, translationese detection, translation universals and parsing is abundant, and there is research that establishes links between these areas of study. For example, Aharoni (2015) demonstrates that the accuracy of translationese detection depends on the quality of machine translation. One particularly relevant study on machine classification of translations produced at different levels of expertise is Rubino et al. (2016). To solve the task of distinguishing student and professional translations from each other and from originally authored texts, the authors use four distinct feature sets: traditional surface characteristics of sentences (words with mixed-case characters, sentence length, number of punctuation marks) and three sets inspired by information density theory and machine translation quality estimation. The research is focused on evaluating feature importance and returns mixed results as to what can be used to predict translation experience. In the binary classification (learners vs professionals) their approach achieves an average F1 score of 58.5%. The assumption that levels of competence (defined extra-linguistically) and the practices used in the process influence the quality of the product is corroborated by Carl and Buch-Kromann (2010), who also show that the differences between learners and professionals lie mostly in text fluency. Lapshinova-Koltunski (2017) finds that differences between translational varieties (represented in the author’s research as human and machine translations) with regard to the degree and types of cohesion are smaller than those between translations and originally authored texts.

Research in translationese detection increasingly relies on utilizing linguistically reasonable (interpretable) features of text, as opposed to ‘unreasonably effective’ character n-grams (Volansky et al., 2015). Research of this kind uses delexicalized syntactic features to solve tasks related to translationese detection (Laippala et al., 2015) and classification (Rubino et al., 2016; Rabinovich et al., 2017). One of the feature sets in Laippala et al. (2015) consists of PoS tags enriched not only with morphological features, but also with syntactic relations extracted from dependency-grammar-based syntactic trees. This feature set, however, performs slightly worse than PoS with morphological features.

4 Data, features and experimental setup

4.1 Corpus resources and parsing

Our experiments are based on two aligned parallel corpora that contain learner and professional English-to-Russian translations of mass-media texts in a variety of topical domains, and a genre-comparable collection of non-translated data.

1. The learner component was sourced from the Russian Learner Translator Corpus1 (Kutuzov and Kunilovskaya, 2014) via filtering by genre.

2. Professional translations were collected from a range of well-established digital mass media such as Nezavisimaya Gazeta and InoSMI.RU, or Russian editions of global mass media such as Forbes. All professional translations either carry the translator’s name or are endorsed by the editing board. Originals for both translational collections come from roughly the same pool of English and American editions (The Guardian, the New York Times, the Economist, Popular Mechanics, etc.) and were published between 2001 and 2016.

3. The reference corpus consists of texts from the Russian National Corpus2 (further RNC) belonging to the ‘article intended for large adult non-specialist readership’ type; all texts were written after 2003 and are marked as style-neutral.

1 https://www.rus-ltc.org/
2 http://www.ruscorpora.ru/en

The Russian texts were tagged and parsed with the UDPipe 1.2 model (Straka and Straková, 2017), which we trained on the SynTagRus treebank from the Universal Dependencies 2.1 release (Dyachenko P.V., 2015; Droganova and Zeman, 2016). The model achieves UAS 89.96 and LAS 87.42 on the corresponding UD 2.1 test set. Sentences shorter than 3 words, with disconnected dependency trees, or containing ‘root’ relations only were filtered out, as were punctuation and null nodes (in the case of ellipsis). Table 1 presents the statistics of the corpora used. With regard to the average sentence length, the translational corpora are significantly different from the RNC at the 0.05 level of confidence, while there is no such difference between learner and professional translations.

                       Learner translators     Professional translators   RNC
                       sources    targets      sources    targets
Size (tokens)          222 911    204 787      345 843    320 198        3 215 242
No. of sentences       10 345     9 899        14 595     14 427         153 691
Sentence length        23.56      22.41        24.15      22.67          21.29
(averaged over texts)
No. of texts                 200                     200                 1 562

Table 1: Basic corpora statistics (after preprocessing and parsing)
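The sentence filtering described above can be sketched over parsed CoNLL-U rows as follows (a hedged sketch with assumed data shapes; the authors' actual preprocessing scripts are not shown in the paper):

def keep_sentence(rows):
    """rows: one sentence as lists of the 10 CoNLL-U fields.
    Returns the filtered word rows, or None if the sentence is discarded."""
    # Drop multiword-token ranges, null (ellipsis) nodes and punctuation.
    words = [r for r in rows
             if "-" not in r[0] and "." not in r[0] and r[3] != "PUNCT"]
    if len(words) < 3:                          # shorter than 3 words
        return None
    if {r[7] for r in words} == {"root"}:       # 'root' relations only
        return None
    ids = {int(r[0]) for r in words}
    heads = {int(r[6]) for r in words}
    if not (heads - {0} <= ids):                # disconnected dependency tree
        return None
    return words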

4.2 Methodology

We represent texts as feature vectors, produced by averaging the feature vectors of the individual sentences in the text. The majority of our features are the UD syntactic relations. Values for syntactic relations are represented as their sentence-level probabilities, i.e. the ratio of the number of occurrences of a given relation in the sentence to the number of occurrences of all other relations in the same sentence, averaged over all sentences in each text in a corpus.
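This normalization can be sketched as follows (assumed data shapes, not the authors' released code):

# Sentence-level probability features for dependency relations.
from collections import Counter

def sentence_probs(deprels):
    """deprels: relation labels of one sentence (punctuation and null nodes
    already removed). Returns relation -> probability within the sentence."""
    counts = Counter(deprels)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}

def text_vector(sentences, feature_names):
    """Average the per-sentence probabilities over a text. 'root' is assumed
    to be excluded from feature_names, as it only encodes sentence length."""
    per_sent = [sentence_probs(s) for s in sentences]
    return [sum(p.get(rel, 0.0) for p in per_sent) / len(per_sent)
            for rel in feature_names]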

Given this approach to normalizing the data, the root relation actually contains only information about sentence length: there is exactly one root in each sentence, and its probability is contingent on the number of other relations in the sentence, which in turn equals the number of words in the sentence. As our aim is to detect purely syntactic relations useful for translation classification, we excluded root from the feature set. Additional features included basic graph statistics for dependency trees. Here is the full list of our 45 features:

• 34 UD dependencies – normalized to represent sentence-level probabilities of each particular relation;

• 7 features characterizing abstract structural properties of the dependency graph: – average out-degree, maximal out-degree, number of communities in the graph (by Newman’s leading eigenvector method), average community size (in nodes), average path length, density and diameter of the graph;

• 4 other tree complexity measures, calculated from the parsed data (MDD and the non-projectivity measures are sketched in code after this list): – mean hierarchical distance (MHD), suggested in Jing and Liu (2015); – mean dependency distance (MDD), defined as ‘distance between words and their parents, measured in terms of intervening words’ (Hudson, 1995); – probability of non-projective arcs; – average number of non-projective sentences.
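As flagged in the list above, two of these measures can be sketched directly from (dependent, head) arc pairs. This is an illustrative reading of the definitions, not the authors' code:

def mean_dependency_distance(arcs):
    """MDD: average |dependent - head| over all non-root arcs
    (1-based positions; head 0 is the artificial root)."""
    dists = [abs(dep - head) for dep, head in arcs if head != 0]
    return sum(dists) / len(dists) if dists else 0.0

def nonprojective_arc_probability(arcs):
    """Share of arcs crossed by some other arc of the same tree."""
    def crosses(a, b):
        (a1, a2), (b1, b2) = sorted(a), sorted(b)
        return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2
    real = [arc for arc in arcs if arc[1] != 0]   # ignore the root arc
    n_cross = sum(any(crosses(arc, other)
                      for other in real if other != arc)
                  for arc in real)
    return n_cross / len(arcs) if arcs else 0.0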

Machine learning classifiers were trained to separate non-translations from translated texts as a single class, and to distinguish different translation varieties from each other and from non-translations. We attempt classification into learner and professional translations to see whether we can predict professional expertise based on the suggested features. After a series of development experiments we chose the SVM multinomial classification algorithm with balanced class weights; it has been shown to score high in various NLP tasks, including translationese detection, in a number of publications, starting with the ground-breaking Baroni and Bernardini (2006). Before training, the feature values were standardized to have zero mean and unit variance.

For comparison, we also report the results of a simple baseline system similar to the syntactic component in Pastor et al. (2008). It uses bags of part-of-speech trigrams (‘SCONJ PROPN VERB’, ‘NOUN NOUN ADJ’, etc.) as feature vectors for each document, the feature values being the frequencies of particular trigrams in a given document. Pastor et al. (2008) refer to Nerbonne and Wiersma (2006) in motivating their choice of n-gram size. These values were standardized in the same way as the syntactic ones and then fed to the same SVM classifier. Note that this approach produces thousands of features and is thus considerably more computationally expensive than the one with the syntactic features.
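A minimal sketch of this setup, assuming a scikit-learn implementation (the paper does not name the library, and the data here are random placeholders):

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 45))                       # 45 syntactic features
y = np.repeat(["learner", "professional", "rnc"], 20)

clf = make_pipeline(
    StandardScaler(),                               # zero mean, unit variance
    SVC(class_weight="balanced"))                   # SVM, balanced classes

scores = cross_val_score(clf, X, y, scoring="f1_macro",
                         cv=StratifiedKFold(n_splits=10))
print(round(scores.mean(), 3))                      # macro-F1 over 10 folds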

5 Results

We calculated the macro-F1 score for each classification task using stratified 10-fold cross-validation. The results are presented in Table 2. The classifiers based on syntactic features perform better (and are trained about 70 times faster) than the PoS-trigrams baseline in all scenarios except when discriminating learner translations from professional ones. In the case of 3-class classification with the full set of features, the two approaches are on par, with the syntactic feature set still outperforming the baseline when only the 10 best features are used. Thus, English-to-Russian translations are indeed different from non-translated Russian in their syntactic structures. However, translations produced at different skill levels additionally demonstrate differences in the tier of surface word-type sequences. Note also that all our results are considerably higher than those reported in Rubino et al. (2016), obtained with a rich set of diverse features including complexity and perplexity in the language models (but with no ‘deep’ syntactic features).3

                                      Binary classification                  3-class
                       translations/RNC  learners/RNC  prof/RNC  learners/prof
10 best features
PoS trigrams baseline  0.735             0.738         0.658     0.791        0.603
Syntactic features     0.818             0.796         0.740     0.721        0.635
All features
PoS trigrams baseline  0.820             0.820         0.797     0.806        0.707
Syntactic features     0.866             0.841         0.871     0.703        0.707

Table 2: Macro-F1 scores for the classifiers on different feature sets

Figure 1: Non-translations (RNC) and translations, syntactic feature space. Figure 2: Non-translations (RNC), professional and learner translations, syntactic feature space.

Figures 1 and 2 visualize the documents in our training data projected from the initial 45-dimensional syntactic feature space into 2 dimensions with PCA (Tipping and Bishop, 1999). For comparison, Figures 3 and 4 present similar projections from the 2704-dimensional PoS-trigrams feature space. It can be seen that the texts represented with PoS trigrams are much less discernible with regard to our classes: all documents are densely grouped together, with little difference between instances of different types. At the same time, with the syntactic features the documents are distinct from each other and the instances are distributed across the feature space much more uniformly. One can observe a clear tendency for translations to be ‘shifted’ to a region where non-translations are very rare, and vice versa.

5.1 Best features

The three classifiers that compare translations with the Russian reference corpus rely on the same set of features. The most useful features (in terms of their ANOVA F-value against the class of the text) are listed in Table 3, along with the ratio of their probabilities for each pair of corpora, which indicates the direction and size of the discrepancies (all of them are statistically significant). The set of features identified as most useful reflects various aspects of the more complex syntax typical of translated sentences (for example, a higher probability of clauses). Only three of the features highly correlated with the translation/non-translation distinction also proved useful in the more difficult task of classifying translational varieties ('nsubj:pass', 'xcomp' and 'acl:relcl').
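A minimal sketch of this kind of ANOVA-based feature ranking with scikit-learn; the data and feature names are placeholders:

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Placeholder: X holds the 45 syntactic features, y the binary class
# (translation vs. non-translation); names stand in for UD relation labels.
X = np.random.rand(200, 45)
y = np.random.randint(2, size=200)
names = ["feat%d" % i for i in range(45)]

F, pval = f_classif(X, y)                    # ANOVA F-value of each feature
ranked = sorted(zip(names, F, pval), key=lambda t: -t[1])
for name, f, p in ranked[:10]:               # the 10 best features
    print("%-12s F=%6.2f p=%.3g" % (name, f, p))
```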

3Of course, their results are not directly comparable to ours, as they worked with English-to-German translations.

Figure 3: Non-translations (RNC) and translations, PoS feature space
Figure 4: Non-translations (RNC), professional and learner translations, PoS feature space

feature      learners/RNC  prof/RNC        feature        learners/prof
mark         1.86          1.85            nmod           1.11
ccomp        2.09          2.28            aux:pass       1.63
acl:relcl    1.92          1.68            nsubj:pass     1.35
advcl        1.73          1.68            iobj           0.82
nsubj        1.29          1.34            flat:foreign   0.60
parataxis    0.66          0.74            parataxis      0.89
aux          2.08          2.29            fixed          1.16
xcomp        1.44          1.60            acl:relcl      1.14
obj          1.29          1.33            cc             0.93
nsubj:pass   0.62          0.46            xcomp          0.90

Table 3: Most useful features and the ratio of their probabilities in the data

6 Case studies

6.1 Syntactic complication: more fully expressed subordinate clauses

The most useful features in the translations vs. originally authored texts classifications (including the 3-way classification) have to do with the higher probability of dependent clauses (mark, ccomp, advcl, acl:relcl). The strong correlation between mark, nsubj and relative and adverbial clauses suggests that translators tend to produce complex sentences more often than they occur in naturally occurring Russian texts. They reproduce explicit pronominal subjects in subordinate clauses, even though in Russian these can often be left out. The five lexical items that head the frequency list of nsubj dependents (который ('which/that'), это ('this/it'), он ('he/it'), они ('they'), вы ('you')) are 2 to 3 times more frequent in translations than in non-translations. Example 1 gives a typical student translation that transfers the English structural pattern.

(1) ...человек на улице не думает о ЕС когда он входит в торговый центр
    ...the man in street not think about EU when he enters in shopping center
    Source: '... the man on the street is not thinking about EU as he enters a shopping centre.'

Besides, the clauses are more often joined with explicit subordinating conjunctions, to the detriment of other options such as punctuation. This finding corroborates the explicitation hypothesis in translational behavior (Blum-Kulka, 1986) and aligns well with extensive research on cohesive explicitation in translation (Kamenická, 2007; Cartoni and Zufferey, 2011; Becher, 2011).

6.2 xcomp: transfer of verbal predicates, particularly modal ones

The probability of the open clausal complement (xcomp) in translations, relative to all other relations in the corpus, is on average 1.5 times higher than in the reference corpus. This dependency describes relations between a verb and its adjectival or non-verbal complement. For English it captures complex object constructions and strings with catenative verbs as heads (including verbs with modal semantics such as need to, have to, be going to, be able to, but excluding modal auxiliaries) (Huddleston and Pullum, 2002). In Russian, it includes relations with the modal verb мочь ('can') and combinations with aspectual and causative catenatives, among others (начать учить петь 'to start to teach to sing'). The parser also routinely assigns this relation to a verb and a deverbal noun (хоронить погибших 'to bury the deceased'). Despite these discrepancies in the parsing strategies, the cross-linguistic comparison shows that English uses this dependency 1.4 times as often as Russian (1.5% and 1.1% respectively, with the average probability of this relation in the translational corpora being 1.7%). To find out which constructions drive up the probability of xcomp in translations, we looked at the semantic types of the top 25 head verbs in this relation. These cases account for 83% of all occurrences of this dependency in the learner corpus, 77% in professional translations and 73% in the RNC. English head nodes in this dependency are much more varied and lexically unrestricted than in Russian: the same 25 head nodes make up only 59% of all occurrences of this relation. The structure of the frequency lists for the translational corpora is a clear indication of translational simplification in the form of higher lexical repetitiveness: translations manage to cover more text with a smaller and less varied set of items. We found that modal verbs dominate these top items, notably in learner translations: they make up 55% in the learner corpus, with мочь ('can') alone covering 45% of all xcomp occurrences, while in professional translations the share is 37% and in non-translated Russian texts 32%. Another explanation for the increase in xcomp relations is the tendency to reproduce English non-finite constructions, especially with causative and aspectual verbs, as in example 2 from the student translations:

(2) Многие десятилетия терроризм продолжал ассоциироваться...
    Many decades terrorism continued to-be-associated...
    Source: 'Terrorism continued for many decades to be associated primarily with the assassination of political leaders and heads of state.'

6.3 Passives: more analytical structures

In both translational corpora there are fewer dependencies marked nsubj:pass than in the non-translated reference corpus. Learner translations have 1.6 times fewer instances of this relation, while professional translations have 2.2 times fewer. This feature is among the 10 best correlated with the predicted class in two binary classifications (professionals/RNC and learners/professionals), as well as in the 3-way classification. It makes sense to consider the values for aux:pass together with the above feature. This relation is more probable in student translations than in the output of professionals. The two translational varieties lie on opposite sides of the reference corpus, with learners slightly overusing passive auxiliaries (1.4 times more of this dependency) and professionals underusing them (1.2 times fewer). This discrepancy between the two translation varieties makes it one of the most useful features for predicting expertise. In Russian, the choice of passive constructions depends on the morphological properties of verbs, particularly on their aspect. The relations between semantic subject and object are mostly realized either by verb forms with the special formant -ся/-сь (imperfective verbs) or by a passive participle in the short form, with or without the analytical verb быть ('to be') (perfective verbs). This gives a translator a variety of choices to render the single English grammatical meaning of the passive, if she decides that this meaning needs to be rendered at all. For example, 'The house was built' has the options 'Дом построен' ('the house is built'), 'Дом был построен' ('the house was built'), and 'Дом строился' ('the house was being built'). To untangle the reasons behind the discrepancies in the distribution of passive auxiliaries and subjects, we looked at the proportions of analytical and the two morphological passives in translated and originally authored Russian. Contrastive analysis showed that English mass-media texts have fewer passive verbs than comparable Russian discourse. In our data, a passive occurred in 15.9% of English sentences, while original Russian texts had 18.6% passive sentences on average. Analytical passives were used in only every fourth passive construction. In the translational data, however, the proportion of passives dropped to 15% and 11% of the sentences in the corpus for learner and professional translations respectively. Both groups of translators use more analytical passives, driving up their ratio to all passives from 25% in the RNC to 38% and 35% in the learners' and professional data. With that, professionals, when choosing between passive forms, tend to use more forms with -ся/-сь than short past participles: in this corpus, their ratio to other passives is 3% higher than in the learners' output.

7 Discussion

We showed that a syntactic representation of translational data is a useful way to approach automatic classification, and that the features useful for the classifiers lend themselves to linguistic interpretation. One major finding yielded by this research is the tendency to increase the number of clauses (particularly relative clauses) typical of translated Russian. Moreover, these clauses tend to express all structural components, particularly conjunctions and subjects, explicitly. Cross-linguistic comparisons confirm that strings of non-finite verbs joined by two consecutive xcomp arcs in the sentence tree ('Krugman added cartoons to try to make opponents look silly') are more common in English than in Russian. In English-to-Russian translations this type of syntactic relation tends to be overrepresented (the average sentence-level probabilities of this relation are 1.7% and 1.1% for translations and Russian non-translations respectively), indicating a possible translationese-prone area. This is particularly true for sentences with a compound modal or aspectual predicate. The specificity of verbal elements in translations also included a distinctive distribution of passive forms in translated Russian. We revealed a higher proportion of analytical passives in professional, and especially in learner, translations. Both translation varieties had fewer passives than the comparable Russian non-translations, with professional translations being further away from them and having 1.6 times fewer passive constructions. This trend may reflect the translational norm to avoid passives whenever possible or to rely more on synthetic rather than analytical forms. Educational and normative guidelines on English-to-Russian translation often warn against the overuse of passives (Moiseenko, 2012).

8 Conclusion

This research used syntactic annotation in the task of translation classification, with a view to revealing the syntactic specificity of the translation varieties represented by learner and professional translations. We have compared our results with the PoS-trigram baseline and have shown that syntactic representations are a fruitful way forward. We focused our attention on predicting translation expertise, which is a fairly new area of research, exemplified only by (Rubino et al., 2016). The few cases tackled in this study merely scratch the surface of the possibilities offered by the approach. We plan to continue research on the syntactic properties of translations in several ways. First, it seems reasonable to use more refined morphosyntactic features, as suggested in (Lapshinova-Koltunski, 2017), to provide the algorithms with better learning material. Second, the UD framework makes it possible to take into account the linear order of heads and dependents in a relation and the order of relations in the sentence, which looks promising. Another possible extension is studying the role of disconnected parse trees in telling translations from non-translations. Finally, we would like to employ the parallel nature of our corpora in a more meaningful way and describe translationese-prone areas in English-Russian translation based on cross-linguistic analysis of the aligned data.

Acknowledgements

This work was supported in part by a grant (Reference No. 17-06-00107) from the Russian Foundation for Basic Research (RFBR).

References

Roee Aharoni. 2015. Automatic Detection of Machine Translated Text and Translation Quality Estimation. Ph.D. thesis.

Mona Baker. 1993. Corpus Linguistics and Translation Studies: Implications and Applications. In Text and Technology: In honour of John Sinclair, J. Benjamins, Amsterdam, pages 232–250.

Marco Baroni and Silvia Bernardini. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing 21(3):259–274.

Viktor Becher. 2011. Explicitation and implicitation in translation. A corpus-based study of English-German and German-English translations of business texts. Ph.D. thesis.

Shoshana Blum-Kulka. 1986. Shifts of cohesion and coherence in translation. Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies, pages 17–35.

Michael Carl and Matthias Buch-Kromann. 2010. Correlating translation product and translation process data of professional and student translators. In 14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France (May).

Bruno Cartoni and S. Zufferey. 2011. How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web (June):78–86.

Kira Droganova and Daniel Zeman. 2016. Conversion of SynTagRus (the Russian dependency treebank) to Universal Dependencies. Technical report, Institute of Formal and Applied Linguistics (ÚFAL MFF UK), Faculty of Mathematics and Physics, Charles University.

L. L. Iomdin, A. V. Lazursky, and P. V. Dyachenko. 2015. A deeply annotated corpus of Russian texts (SynTagRus): contemporary state of affairs. Trudy Instituta Russkogo Yazyka im. V. V. Vinogradova, pages 272–299.

Martin Gellerstam. 1986. Translationese in Swedish novels translated from English. Translation Studies in Scandinavia.

Rodney Huddleston and Geoffrey Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge University Press.

Richard Hudson. 1995. Measuring syntactic difficulty. Manuscript, University College, London.

Yingqi Jing and Haitao Liu. 2015. Mean Hierarchical Distance Augmenting Mean Dependency Distance. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 161–170.

Dan Jurafsky and James H. Martin. 2014. Speech and Language Processing. Pearson, London.

Renata Kamenická. 2007. Defining explicitation in translation. Sborník prací Filozofické fakulty Brněnské univerzity, Řada anglistická: Brno Studies in English 33:45–57.

Andrey Kutuzov and Maria Kunilovskaya. 2014. Russian learner translator corpus: Design, research potential and applications. In Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, volume 8655, pages 315–323.

Veronika Laippala, Jenna Kanerva, Anna Missilä, Sampo Pyysalo, Tapio Salakoski, and Filip Ginter. 2015. Towards the classification of the Finnish Internet Parsebank: Detecting translations and informality. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), Linköping University Electronic Press, Sweden, pages 107–116.

Ekaterina Lapshinova-Koltunski. 2017. Cohesion and translation variation: Corpus-based analysis of translation varieties. New Perspectives on Cohesion and Coherence.
Georgiy Moiseenko. 2012. Translator and Editor Guide.

John Nerbonne and Wybo Wiersma. 2006. A measure of aggregate syntactic distance. In Proceedings of the Workshop on Linguistic Distances, pages 82–90.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, et al. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of LREC-2016.

Gloria Corpas Pastor, Ruslan Mitkov, Naveed Afzal, and Viktor Pekar. 2008. Translation universals: do they exist? A corpus-based NLP study of convergence and simplification. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas (AMTA'08), October, pages 21–25.

Marius Popescu. 2011. Studying translationese at the character level. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011 (September):634–639.

Ella Rabinovich, Noam Ordan, and Shuly Wintner. 2017. Found in translation: Reconstructing phylogenetic language trees from translations. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 530–540.

Raphael Rubino, Ekaterina Lapshinova-Koltunski, and Josef Van Genabith. 2016. Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification. In HLT-NAACL. pages 960–970.

Alina Secară. 2005. Translation evaluation – a state of the art survey. In Proceedings of the eCoLoRe/MeLLANGE Workshop, Leeds, pages 39–44.

Milan Straka and Jana Straková. 2017. Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 88–99.

Michael E. Tipping and Christopher M. Bishop. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61(3):611–622.

Vered Volansky, Noam Ordan, and Shuly Wintner. 2015. On the features of translationese. Digital Scholarship in the Humanities 30(1):98–118.

Graph Convolutional Networks for Named Entity Recognition

Alberto Cetoli, Stefano Bragaglia, Andrew Daniel O'Harney, Marc Sloan
Context Scout
{alberto, stefano, andy, marc}@contextscout.com

Abstract

In this paper we investigate the role of the dependency tree in a named entity recognizer that uses a set of Graph Convolutional Networks (GCNs). We perform a comparison among different Named Entity Recognition (NER) architectures and show that the grammar of a sentence positively influences the results. Experiments on the OntoNotes 5.0 dataset demonstrate consistent performance improvements, without requiring heavy feature engineering or additional language-specific knowledge.1

1 Introduction and Motivations

The recent article by Marcheggiani and Titov (2017) opened the way for a novel method in Natural Language Processing (NLP). In their work, they adopt a GCN (Kipf and Welling, 2016) approach to perform semantic role labeling, improving upon previous architectures. While their article is specific to recognizing the predicate-argument structure of a sentence, their method can be applied to other areas of NLP. One example is NER. High-performing statistical approaches have been used in the past for entity recognition, notably Markov models (McCallum et al., 2000), Conditional Random Fields (CRFs) (Lafferty et al., 2001), and Support Vector Machines (SVMs) (Takeuchi and Collier, 2002). More recently, the use of neural networks has become common in NER. The method proposed by Collobert et al. (2011) suggests that a simple feed-forward network can produce competitive results with respect to other approaches. Shortly thereafter, Chiu and Nichols (2015) employed Recurrent Neural Networks (RNNs) to address the problem of entity recognition, thus achieving state-of-the-art results. Their key improvements were twofold: using a bi-directional Long Short-Term Memory (LSTM) in place of a feed-forward network and concatenating morphological information to the input vectors. Subsequently, various improvements appeared: using a CRF as a last layer (Huang et al., 2015) in place of a softmax function, a gated approach to concatenating morphology (Cao and Rei, 2016) and predicting nearby words (Rei, 2017). All such methods, however, understand text as a one-dimensional collection of input vectors; any syntactic information – namely the parse tree of the sentence – is ignored. We believe that dependency trees and other linguistic features play a key role in the accuracy of NER and that GCNs can grant the flexibility and convenience of use that we desire. In this paper our contribution is twofold: on the one hand, we introduce a methodology for tackling entity recognition with GCNs; on the other hand, we measure the impact of using dependency trees for entity classification by comparing the results with prior solutions. At this stage our goal is not to beat the state-of-the-art but rather to quantify the effect of our novel architecture.

1A version of this system can be found at https://github.com/contextscout/gcn_ner.

Keywords: NER, dependency tree, graph, convolution

Figure 1: An example sentence along with its dependency graph. GCNs propagate the information of a node to its nearest neighbours.

As a final note, we notice that treebanks offer more information than a one-dimensional sequence of words. This information is not used in conventional RNN systems. Our paper opens the way for exploiting the syntax and dependency structures available in a treebank. The remainder of the article is organized as follows: in Section 2 we introduce the theoretical framework for our methodology, then the features considered in our model and eventually the training details. Section 3 describes the experiments and presents the results. We discuss relevant works in Section 4 and draw the conclusions in Section 5.

2 Methods and Materials

2.1 Theoretical Aspects
Graph Convolutional Networks (Kipf and Welling, 2016) operate on graphs by convolving the features of neighbouring nodes. A GCN layer propagates the information of a node onto its nearest neighbours. By stacking together N layers, the network can propagate the features of nodes that are at most N hops away. While the original formulation did not include directed graphs, GCNs were further extended in Marcheggiani and Titov (2017) to be used on directed syntactic dependency trees. In the following we rely on their work to assemble our network. Each GCN layer creates new node embeddings by using the neighbouring nodes, and these layers can be stacked upon each other. In the undirected graph case, the information at the k-th layer is propagated to the next one according to the equation

$$h_v^{k+1} = \mathrm{ReLU}\Big(\sum_{u \in N(v)} W^k h_u^k + b^k\Big), \qquad (1)$$

where $u$ and $v$ are nodes in the graph and $N(v)$ is the set of nearest neighbours of node $v$, plus the node $v$ itself. The vector $h_u^k$ represents node $u$'s embeddings at the k-th layer, while $W^k$ and $b^k$ are a weight matrix and a bias – learned during training – that map the embeddings of node $u$ onto the adjacent nodes in the graph; $h_u \in \mathbb{R}^m$, $W \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$. Following the example in Marcheggiani and Titov, we prefer to exploit the directedness of the graph in our system. Our inspiration comes from the bi-directional architecture of stacked RNNs, where two different neural networks operate forward and backward respectively. Eventually the output of the RNNs is concatenated and passed to further layers. In our architecture we employ two stacked GCNs: one that only considers the incoming edges of each node

$$\overleftarrow{h}_v^{\,k+1} = \mathrm{ReLU}\Big(\sum_{u \in \overleftarrow{N}(v)} \overleftarrow{W}^k h_u^k + \overleftarrow{b}^k\Big), \qquad (2)$$

and one that considers only the outgoing edges from each node

$$\overrightarrow{h}_v^{\,k+1} = \mathrm{ReLU}\Big(\sum_{u \in \overrightarrow{N}(v)} \overrightarrow{W}^k h_u^k + \overrightarrow{b}^k\Big). \qquad (3)$$

Figure 2: Bi-directional architectures: (a) LSTM; and (b) GCN layers.

After N layers the final output of the two GCNs is the concatenation of the two separate layers

$$h_v^N = \overrightarrow{h}_v^{\,N} \oplus \overleftarrow{h}_v^{\,N}. \qquad (4)$$

In the following, we refer to the architecture expressed by Equation 4 as a bi-directional GCN.
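To make the propagation rule concrete, here is a minimal NumPy sketch of one bi-directional GCN layer under our reading of Equations (2)–(4); the adjacency matrices and parameter shapes are illustrative assumptions, not the authors' code:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def bigcn_layer(H, A_in, A_out, W_in, b_in, W_out, b_out):
    """One bi-directional GCN layer in the spirit of Eqs. (2)-(4).

    H:     (n, m) node embeddings at layer k.
    A_in:  (n, n) adjacency over incoming edges, self-loops included,
           so each node also keeps its own features.
    A_out: (n, n) adjacency over outgoing edges, self-loops included.
    """
    h_bwd = relu(A_in @ H @ W_in + b_in)      # Eq. (2): sum over incoming neighbours
    h_fwd = relu(A_out @ H @ W_out + b_out)   # Eq. (3): sum over outgoing neighbours
    return np.concatenate([h_fwd, h_bwd], axis=1)  # Eq. (4): concatenation
```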

2.2 Implementation Details

2.2.1 Using the Dataset
We employ the OntoNotes 5.0 dataset (Weischedel, 2013) for training and testing. This dataset annotates various genres of text for the purpose of entity recognition and co-reference resolution. The annotated sentences are provided with Part-of-Speech (PoS) tags and syntactic information. While we include the PoS tags in our tests, the Phrase Structure Grammar (PSG) structures in OntoNotes 5.0 are not used. The dependency graphs that are fed to the graph convolutional network are instead computed by an external parser, Spacy v1.8.2 (Honnibal and Johnson, 2015). In principle we could have translated the syntactic trees in the dataset to dependency graphs using, for example, the CCGBank manual (Hockenmaier and Steedman, 2007). We will investigate this approach in future works, while this paper lays down the technique for boosting entity recognition using GCNs.

2.2.2 Models
Our architecture is inspired by the work of Chiu and Nichols (2015), Huang et al. (2015), and Marcheggiani and Titov (2017). We aim to combine a Bi-directional Long Short-Term Memory (Bi-LSTM) model with GCNs, using a CRF as the last layer in place of a softmax function. We employ seven different configurations by selecting from two sets of PoS tags and two sets of vectors. All the models share a bi-directional LSTM which acts as the foundation upon which we apply our GCN. The different combinations are built using the following elements:

Bi-LSTM  We use a bi-directional LSTM structured as in Figure 2(a). The output is mediated by two fully connected layers ending in a CRF (Huang et al., 2015), modelled as a Viterbi sequence. The best results on the dev set of OntoNotes 5.0 were obtained upon stacking two LSTM layers, both for the forward and backward configuration. This is the number of layers we keep in the rest of our work. This configuration – when used alone – is a consistency test with respect to the previous works. As seen in Table 1, our findings are compatible with the results in (Chiu and Nichols, 2015).

Bi-GCN  In this model, we use the architecture created in (Marcheggiani and Titov, 2017) where a GCN is applied on top of a Bi-LSTM. This system is shown in Figure 2(b) (right side). The best results on the dev set were obtained using only one GCN layer, and we use this configuration throughout our models. We employ two different embedding vectors for this configuration: one in which only word embeddings are fed as an input, the other where PoS tag embeddings are concatenated to the word vectors.

Input vectors  We use three sets of input vectors. First, we simply employ the word embeddings found in the Glove vectors (Pennington et al., 2014):

$$x_{\mathrm{input}} = x_{\mathrm{glove}}. \qquad (5)$$

In the following, we employ the 300-dimensional vectors from two different distributions: one with 1M words and another with 2.2M words. Whenever a word is not present in the Glove vocabulary, we use the vector corresponding to the word "entity" instead. The second type of vector embedding concatenates the Glove word vectors with PoS tag embeddings. We use randomly initialized Part-of-Speech embeddings that are allowed to fine-tune during training:

$$x_{\mathrm{input}} = x_{\mathrm{glove}} \oplus x_{\mathrm{PoS}}. \qquad (6)$$

The final quality of our results correlates with the quality of our Part-of-Speech tagging. In one batch we use the manually curated PoS tags included in the OntoNotes 5.0 dataset (Weischedel, 2013) (PoS (gold)). These tags have the highest quality. In another batch, we use the PoS tags inferred by the parser (PoS (inferred)) instead of the manually tagged ones. These PoS tags are of lower quality. An external tagger might provide a different number of tokens compared to the ones present in the training and evaluation datasets. This presents a challenge. We skip these sentences during training (1,602 sentences out of 112,300), while considering the entities in such sentences as incorrectly tagged during evaluation. Finally, we add the morphological information to the feature vector for the third type of word embeddings. The reason – explained in (Cao and Rei, 2016) – is that out-of-vocabulary words are handled badly whilst using only word embeddings:

$$x_{\mathrm{input}} = x_{\mathrm{glove}} \oplus x_{\mathrm{PoS}} \oplus x_{\mathrm{morphology}}. \qquad (7)$$

We employ a bi-directional RNN to encode character information. The end nodes of the RNN are concatenated and passed to a dense layer, which is integrated into the feature vector along with the word embeddings and PoS information. In order to speed up the computation, we truncate the words by keeping only the first 12 characters. This operation is only done when computing the morphology vector; the word embeddings still refer to the full word. Truncation is not commonly done, as it hinders the network's performance; we leave further analysis to following works.
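A schematic sketch of how such an input vector could be assembled under Equation (7); glove, pos_emb and morph_encoder are hypothetical stand-ins for the embedding tables and the character Bi-RNN described above:

```python
import numpy as np

def build_input_vector(word, pos_tag, glove, pos_emb, morph_encoder):
    """Assemble x_input = x_glove (+) x_PoS (+) x_morphology as in Eq. (7).

    glove: dict mapping words to 300-dim vectors; "entity" is the
    out-of-vocabulary fallback described in the text. pos_emb and
    morph_encoder are illustrative stand-ins for the trainable PoS
    embedding table and the character Bi-RNN.
    """
    x_glove = glove.get(word, glove["entity"])   # OOV words map to "entity"
    x_pos = pos_emb[pos_tag]                     # 15-dim, fine-tuned during training
    x_morph = morph_encoder(word[:12])           # words truncated to 12 characters
    return np.concatenate([x_glove, x_pos, x_morph])
```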

Dropout  In order to tackle over-fitting, we apply dropout to all the layers on top of the LSTM. The probability to drop a node is set at 20% for all configurations. The layers that are used as input to the LSTM do not use dropout.

Network output  At inference time, the output of the network is a 19-dimensional vector for each input word. This dimensionality comes from the 18 tags used in OntoNotes 5.0, with an additional dimension which expresses the absence of a named entity. No Begin, Inside, Outside, End, Single (BIOES) markings are applied; at evaluation time we simply consider a name chunk as a contiguous sequence of words belonging to the same category.

Figure 3: Feature vector components. Our input vectors have up to three components: the word embeddings, the PoS embeddings, and a morphological embedding obtained through feeding each word to a Bi-LSTM and then concatenating the first and last hidden state.

2.2.3 Training
We use TensorFlow (Abadi et al., 2015) to implement our neural network. Training and inference are done at the sentence level. The weights are initialized randomly from the uniform distribution and the initial states of the LSTMs are set to zero. The system uses the configuration in Appendix A. The training function is the CRF loss function as explained in (Huang et al., 2015). Following their notation, we define $[f]_{i,t}$ as the matrix that represents the score of the network for the t-th word to have the i-th tag. We also introduce $A_{ij}$ as the transition matrix which stores the probability of going from tag $i$ to tag $j$. The transition matrix is usually trained along with the other network weights. In our work we preferred instead to set it constant and equal to the transition frequencies found in the training dataset. The function $f$ depends on the network's parameters $\theta$ and the input sentence $[x]_1^T$ (the list of embeddings of length $T$). Let the list of $T$ training labels be written as $[i]_1^T$; then our loss function is written as

$$S\big([x]_1^T, [i]_1^T, \theta, A_{ij}\big) \,-\, \log \sum_{[j]_1^T} \exp S\big([x]_1^T, [j]_1^T, \theta, A_{ij}\big), \qquad (8)$$

where

$$S\big([x]_1^T, [i]_1^T, \theta, A_{ij}\big) = \sum_{t=1}^{T} \Big( A_{[i]_{t-1},[i]_t} + [f]_{[i]_t,\,t} \Big). \qquad (9)$$

At inference time, we rely on the Viterbi algorithm to find the sequence of tags that maximizes $S\big([x]_1^T, [i]_1^T, \theta, A_{ij}\big)$. We apply mini-batch stochastic gradient descent with the Adam optimiser (Kingma and Ba, 2014), using a learning rate fixed to $10^{-4}$.
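For concreteness, a minimal NumPy sketch of the Viterbi decoding step described here; the variable names and score layout are our assumptions, not the authors' implementation:

```python
import numpy as np

def viterbi_decode(f, A):
    """Find the tag sequence maximizing the sentence score S of Eq. (9).

    f: (T, K) matrix of network scores, f[t, i] = score of tag i at word t.
    A: (K, K) transition matrix, A[i, j] = score of moving from tag i to tag j.
    """
    T, K = f.shape
    score = f[0].copy()                 # best score of any path ending in each tag at t=0
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score[i] + A[i, j] + f[t, j] for previous tag i, current tag j
        cand = score[:, None] + A + f[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers from the best final tag
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]
```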

3 Experimental Results

In this section, we compare the different methods applied and discuss the results. The scores in Table 1 are presented as an average of 6 runs, with the error being the standard deviation; we keep only the first significant digit of the errors, approximating to the nearest number. The results show an improvement of 2.2 ± 0.5% upon using a GCN, compared to the baseline result of a bi-directional LSTM alone (1st row). When concatenating the gold PoS tag embeddings in the input vectors, this improvement rises to 4.6 ± 0.6%. However, the gold tags in OntoNotes 5.0 only refer to the sentences within the dataset. Therefore, the performance of the system on new sentences must rely on inferred PoS tags. The F1 score improvement for the system using inferred tags (from the parser) is lower: 3.2 ± 0.6%.

2The results from Ratinov and Roth and Finkel and Manning are taken from Chiu and Nichols.

                                                                     DEV                  TEST
Description                                                     prec  rec   F1        prec  rec   F1
Bi-LSTM + 1M Glove + CRF                                        80.9  78.2  79.5±0.3  79.1  75.9  77.5±0.4
Bi-LSTM + 1M Glove + CRF + GCN                                  82.2  79.5  80.8±0.3  82.0  77.5  79.7±0.3
Bi-LSTM + 1M Glove + CRF + GCN + PoS (gold)                     82.1  83.7  82.9±0.3  82.4  81.8  82.1±0.4
Bi-LSTM + 2.2M Glove + CRF + GCN + PoS (gold)                   83.3  84.1  83.7±0.4  83.6  82.1  82.8±0.3
Bi-LSTM + 2.2M Glove + CRF + GCN + PoS (inferred)               83.8  82.9  83.4±0.4  82.2  80.5  81.4±0.3
Bi-LSTM + 2.2M Glove + CRF + GCN + PoS (gold) + Morphology      86.6  82.7  84.6±0.4  86.7  80.7  83.6±0.4
Bi-LSTM + 2.2M Glove + CRF + GCN + PoS (inferred) + Morphology  85.3  82.3  83.8±0.4  84.3  80.1  82.0±0.4
Chiu and Nichols                                                            84.6±0.3  86.0  86.5  86.3±0.3
Ratinov and Roth                                                                      82.0  84.9  83.4±0.0
Finkel and Manning                                                                    84.0  80.9  82.4±0.0
Durrett and Klein                                                                     85.2  82.9  84.0±0.0

Table 1: Results of our architecture compared to previous findings.

For comparison, increasing the size of the Glove vocabulary from 1M to 2.2M words gave an improvement of 0.7 ± 0.5%. Adding the morphological information of the words, albeit truncated at 12 characters, improves the F1 score by 2.2 ± 0.5%. Our results strongly suggest that syntactic information is relevant for capturing the role of a word in a sentence, and that understanding sentences as one-dimensional lists of words is only a partial approach. Sentences embed meaning through internal graph structures: the graph convolutional approach – used in conjunction with a parser (or a treebank) – seems to provide a lightweight architecture that incorporates grammar while extracting named entities. Our results – while competitive – fall short of the state-of-the-art. We believe this to be the result of a few factors: we do not employ BIOES annotations for our tags, lexicon and capitalisation features are ignored, and we truncate words when encoding the morphological vectors. Another improvement could come from converting the manually parsed trees in the OntoNotes 5.0 dataset into dependency graphs. Using these graphs during training would eliminate any possible erroneous contributions coming from the external parser. Our main claim is nonetheless clear: grammatical information positively boosts the performance of recognizing entities, leaving further improvements to be explored.

4 Related Works

There is a large corpus of work on named entity recognition, with few studies using explicitly non-local information for the task. One early work by Finkel et al. (2005) uses Gibbs sampling to capture long-distance structures that are common in language use. Another article by the same authors uses a joint representation for constituency parsing and NER, improving both techniques. In addition, dependency structures have also been used to boost the recognition of bio-medical events (McClosky et al., 2011) and for automatic content extraction (Li et al., 2013). Recently, there has been a significant effort to improve the accuracy of classifiers by going beyond vector representations of sentences. Notably, the work of Peng et al. (2017) introduces graph LSTMs to encode the meaning of a sentence by using dependency graphs. Similarly, Dhingra et al. (2017) employ Gated Recurrent Units (GRUs) that encode the information of acyclic graphs to achieve state-of-the-art results in co-reference resolution.

5 Concluding Remarks

We showed that dependency trees play a positive role in entity recognition by using a GCN to boost the results of a bidirectional LSTM. In addition, we modified the standard convolutional network architecture and introduced a bidirectional mechanism for convolving directed graphs. This model is able to improve upon the LSTM baseline: our best result yielded an improvement of 4.6 ± 0.6% in the F1 score, using a combination of both GCN and PoS tag embeddings. Finally, we showed that GCNs can be used in conjunction with different techniques: morphological information in the input vectors does not conflict with graph convolutions. Additional techniques, such as the gating of the components of input vectors (Rei et al., 2016) or neighbouring word prediction (Rei, 2017), should be tested together with GCNs. We will investigate those results in future works.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. http://tensorflow.org/.

Kris Cao and Marek Rei. 2016. A joint model for word embedding and word morphology. CoRR abs/1606.02601. http://arxiv.org/abs/1606.02601.

Jason P. C. Chiu and Eric Nichols. 2015. Named entity recognition with bidirectional LSTM-CNNs. CoRR abs/1511.08308. http://arxiv.org/abs/1511.08308.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12(Aug):2493–2537.

Bhuwan Dhingra, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. 2017. Linguistic knowledge as memory for recurrent neural networks. CoRR abs/1703.02620. http://arxiv.org/abs/1703.02620.

Greg Durrett and Dan Klein. 2015. Neural CRF parsing. CoRR abs/1507.03641. http://arxiv.org/abs/1507.03641.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL '05, pages 363–370. https://doi.org/10.3115/1219840.1219885.

Jenny Rose Finkel and Christopher D. Manning. 2009. Joint parsing and named entity recognition. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL ’09, pages 326–334. http://dl.acm.org/citation.cfm?id=1620754.1620802.

Julia Hockenmaier and Mark Steedman. 2007. CCGBank: A corpus of CCG derivations and dependency structures. Comput. Linguist. 33(3):355–396. https://doi.org/10.1162/coli.2007.33.3.355.

Matthew Honnibal and Mark Johnson. 2015. An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 1373–1378. https://aclweb.org/anthology/D/D15/D15-1162.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. CoRR abs/1508.01991. http://arxiv.org/abs/1508.01991.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980. http://arxiv.org/abs/1412.6980.

Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with Graph Convolutional Networks. CoRR abs/1609.02907. http://arxiv.org/abs/1609.02907.

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML '01, pages 282–289. http://dl.acm.org/citation.cfm?id=645530.655813.

Qi Li, Heng Ji, and Liang Huang. 2013. Joint event extraction via structured prediction with global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 73–82. http://aclanthology.coli.uni-saarland.de/pdf/P/P13/P13-1008.pdf.

Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. CoRR abs/1703.04826. http://arxiv.org/abs/1703.04826.

Andrew McCallum, Dayne Freitag, and Fernando C. N. Pereira. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML '00, pages 591–598. http://dl.acm.org/citation.cfm?id=645529.658277.

David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event extraction as dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, HLT ’11, pages 1626–1635. http://dl.acm.org/citation.cfm?id=2002472.2002667.

Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence N-ary relation extraction with Graph LSTMs. Transactions of the Association for Computational Linguistics 5:101–115. http://aclanthology.coli.uni-saarland.de/pdf/Q/Q17/Q17-1008.pdf.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP). pages 1532–1543. http://www.aclweb.org/anthology/D14-1162.

L. Ratinov and D. Roth. 2009. Design challenges and misconceptions in named entity recognition. In CoNLL. http://cogcomp.org/papers/RatinovRo09.pdf.

Marek Rei. 2017. Semi-supervised multitask learning for sequence labeling. CoRR abs/1704.07156. http://arxiv.org/abs/1704.07156.

Marek Rei, Gamal Crichton, and Sampo Pyysalo. 2016. Attending to characters in neural sequence labeling models. CoRR abs/1611.04361. http://arxiv.org/abs/1611.04361.

Koichi Takeuchi and Nigel Collier. 2002. Use of support vector machines in extended named entity recognition. In Proceedings of the 6th Conference on Natural Language Learning - Volume 20. Association for Computational Linguistics, Stroudsburg, PA, USA, COLING-02, pages 1–7. https://doi.org/10.3115/1118853.1118882.

Ralph Weischedel et al. 2013. OntoNotes Release 5.0. Linguistic Data Consortium, Philadelphia, PA, LDC2013T19. https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf.

A Configuration

Parameter                  Value
Glove word embeddings      300 dim
PoS embedding              15 dim
Morphological embedding    20 dim
First dense layer          40 dim
LSTM memory (2×)           160 dim
Second dense layer         160 dim
GCN layer (2×)             160 dim
Final dense layer          160 dim
Output layer               16 dim
Dropout                    0.8 (keep probability)

Table 2: Summary of the configuration used for training the network.

Extensions to the GrETEL Treebank Query Application

Jan Odijk, Martijn van der Klis, Sheean Spoel
Utrecht University / Utrecht
[email protected], [email protected], [email protected]

Abstract

In this paper we describe the extensions we made to an existing treebank query application (GrETEL). These extensions address user needs expressed by multiple linguistic researchers and include (1) facilities for uploading one's own data and metadata in GrETEL; (2) conversion and cleaning modules for uploading data in the CHAT format; and (3) new facilities for analysing the results of treebank queries in terms of data, metadata and combinations of them. These extensions have been made available in a new version (Version 4) of GrETEL.

Keywords: treebank, querying, GrETEL, CHILDES

1 Introduction

In this paper we describe the extensions we made to an existing treebank query application (GrETEL) in the context of the AnnCor project, in which we are (inter alia) developing a treebank for the Dutch CHILDES corpora (MacWhinney, 2000).1 The AnnCor treebank and the treebank query application are being developed in the Utrecht University AnnCor project, which we describe in section 2.2 We briefly describe the treebank in section 3, and the extensions of the treebank query application GrETEL in section 4. We illustrate the extensions by means of an example in section 5. In section 6 we discuss related work, and we end with conclusions and plans and suggestions for future work in section 7. This paper (1) presents facilities for uploading one's own data and metadata in GrETEL; (2) describes the conversion and cleaning modules for uploading data in the CHAT format; and (3) presents new facilities for analysing the results of treebank queries in terms of data, metadata and combinations of them.

2 The AnnCor Project

The AnnCor project is an Utrecht University internal research infrastructure project that aims to create linguistically annotated corpora for the Dutch language and to enhance and extend an existing treebank query application in order to query the annotated corpora. Various types of corpora are being annotated, and various types of annotations are being added. The corpora include learner corpora, news corpora, narrative corpora, and language acquisition corpora. Annotations include annotations for learners' errors and their corrections, discourse annotations, and full syntactic structures. In this paper we focus on the application for querying the corpora, in particular treebanks (i.e. text corpora in which each utterance is assigned a syntactic structure), and for analysing the search results. Sagae et al. (2007) state for CHILDES corpora that 'linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage'. The research described in (Odijk, 2015, 2016a) illustrates this for the study of the acquisition of particular syntactic modification phenomena using the Dutch CHILDES corpora. It is clear from these papers that such research cannot be done properly and efficiently without treebanks for these corpora. The AnnCor project aims to create exactly such treebanks, which, together with a query application, will become an integrated part of the Dutch part of the CLARIN research infrastructure (Odijk, 2016b; Odijk and van Hessen, to appear 2017). These treebanks and the associated search and analysis applications can then contribute to an acceleration of language acquisition research and to a larger empirical basis for testing theories or hypotheses.

1The Dutch CHILDES Corpora are accessible via http://childes.talkbank.org/access/Dutch/.
2This paper contains many hyperlinks hidden under terms and acronyms. The presence of a hyperlink is visible in digital versions of the paper but may be badly visible or invisible in printed versions of the paper.

3 The AnnCor CHILDES Treebank

The AnnCor CHILDES Treebank is created with the help of the Alpino parser (Bouma et al., 2001), which automatically assigns a syntactic structure to each utterance in the corpus. Since Alpino has been developed for written adult language such as newspapers, it is not surprising that it produces many wrong parses when applied to the CHILDES corpora.3 The problem is twofold: CHILDES contains transcriptions of spoken utterances from dialogues, and many of them are uttered by children that are still in the process of acquiring the language. In the AnnCor project we create a manually verified subcorpus, sampled in a representative manner. In addition, we manually verified and, if needed, corrected parse trees that were very likely to contain errors, as determined on the basis of a variety of heuristics for identifying potential errors. For more details about this manually verified subcorpus, we refer the reader to (Odijk et al., 2017).

3.1 Cleaning

CHILDES corpora are represented in the CHAT format (MacWhinney, 2015). Utterances in a CHAT file are enriched with all kinds of annotations. Many of these annotations are in-line annotations. Some examples are given in (1):4

(1) Example in-line annotations in CHILDES CHAT files:
    a. < ik wi > [//] ik wil xxx bekertje doen.
       < I wan > [//] I want xxx cup-DIM do
       'I want to do the little cup'
    b. < doe maar even > [/] doe maar even op tafel.
       < put PRT PRT > [/] put PRT PRT on table
       'Just put on the table'
    c. knor knor [=! pig sound ] , ik heb honger.
       oink oink [=! pig sound ] , I have hunger
       'Oink oink, I am hungry'

These examples illustrate annotations for retracing ([//]) and repetition ([/]), both with scope over the preceding part between angled brackets, for unintelligible material (xxx), and for paralinguistic material ([=! ...]). The Alpino parser cannot deal with these annotations. A cleaning programme has been developed to remove the annotations and send a cleaned utterance to the Alpino parser. The cleaned variants of the utterances in (1) are:

(2) Example cleaned utterances:
    a. ik wil xxx bekertje doen.
    b. doe maar even op tafel.
    c. knor knor , ik heb honger.
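A minimal sketch of this kind of cleaning for the three annotation types shown in (1); the actual cleaning module (chamd, see below) handles many more CHAT annotation types:

```python
import re

def clean_chat_utterance(utterance: str) -> str:
    """Strip the in-line CHAT annotations illustrated in example (1)."""
    s = utterance
    # retracing <...> [//] and repetition <...> [/]: drop the marked-up part
    s = re.sub(r"<[^>]*>\s*\[/{1,2}\]", "", s)
    # paralinguistic material such as [=! pig sound]
    s = re.sub(r"\[=![^\]]*\]", "", s)
    return re.sub(r"\s+", " ", s).strip()    # normalize whitespace

print(clean_chat_utterance("< ik wi > [//] ik wil xxx bekertje doen."))
# -> "ik wil xxx bekertje doen."
```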

The cleaning program is available on GitHub5 and has been integrated in the GrETEL upgrade described in section 4.

3Though even a fully automatically parsed treebank can be fruitfully used in linguistic research, as illustrated by (Odijk, 2015).
4The sources are indicated by the session name (e.g. Sarah35) followed by the utterance number (e.g. 224), counting from 1. The examples here are the utterances Sarah35.015, Sarah35.023 and Sarah35.224 from the Van Kampen corpus.
5https://github.com/JanOdijk/chamd.

3.2 Annotation Conventions

The utterances used by the children contain many phenomena that are considered ill-formed in the adult language. In addition, as in any annotated corpus, many phenomena can be analysed in multiple ways, none of which can be considered better than any other on purely linguistic grounds. It is important to analyse each construction in a consistent and uniform manner, so that it can be easily and automatically identified and distinguished from other constructions in a treebank query application when the data are used in research. For this reason, it is important to develop and adhere to annotation conventions and guidelines. We illustrate this with some examples of phenomena that are not part of the adult language. The following examples appear to contain a finite verb form (lees and kocht, respectively) where a participle is expected:6

(3) a. ik heb niet lees
       I have not read-PRES
       'I have not read'
    b. Ik heb bolletjes kocht
       I have roll-DIM-PL buy-PAST
       'I have bought little rolls'

It is not a priori clear how such examples should be analysed: the child might produce forms that do not conform to the adult language due to syntactic reasons, morphological reasons or phonological reasons. One can decide among them only after an intensive investigation of the phenomena. In constructing the treebank we do not take a stand on how such examples should be analysed, but we do treat each of them in a uniform way, so that each can be easily and automatically identified by researchers using a treebank query application. The examples in (3) are analysed in the treebank as participial verbal complements (vc/ppart) that contain a finite verb. For more examples and how they are dealt with, we refer to (Odijk et al., 2017).

4 Treebank Querying

For querying the treebanks we started from the existing treebank query application GrETEL, which was developed in Leuven (Augustinus et al., 2012). This application comes in three versions,7 and we started from version 3.8 We extended this treebank search application with functionality that was requested by many linguists: they want to be able to upload their own data with metadata, in formats that they actually use (in the context of language acquisition and related fields the most frequently used format is CHAT), and not only get a list of sentences as a result of their queries but also facilities for analysing the query results in terms of the relevant parts of the structures in combination with metadata. Initial versions of these extensions have been incorporated in GrETEL Version 4, and are being further refined.9

The existing treebank search application GrETEL allows researchers to search in Dutch treebanks and to perform a limited analysis of the search results. GrETEL has a very user-friendly example-based interface, but also allows queries in the XML query language XPath. The example-based search interface enables one to query the treebank by providing an example sentence that illustrates the construction one is interested in, plus some information on which aspects of this sentence are crucial for the construction. The system parses the example sentence (using the same parser as the one used to create the treebank) and enables the user to select the substructure of this parse relevant for the construction.

In GrETEL 4, the corpus upload functionality was added as a separate application and allows users to upload an archived collection (zip file) of text files. The collection is subdivided into multiple components (on the basis of the folder structure). The software will tokenise and parse these files using the Alpino dependency parser (Bouma et al., 2001), and import them into the XML database BaseX (Grün, 2010) for querying with GrETEL. Users can specify their corpus as private (only searchable by them) or publicly available. Figure 1 shows a screenshot of the upload interface.

6Utterances Laura09.527 and Laura13.042 from the Van Kampen Corpus.
7See http://nederbooms.ccl.kuleuven.be/eng/gretel and references there for versions 1 and 2.
8GrETEL Version 3 can be found here: http://gretel.ccl.kuleuven.be/gretel3/index.php.
9GrETEL Version 4 is currently still under development but can already be used here: http://gretel.hum.uu.nl/gretel4/.

Figure 1: Screenshot of the GrETEL corpus upload page.

An interface is available to the researcher for managing the uploaded corpora. It offers buttons for viewing detailed information on the uploaded corpora, for viewing the upload logfile, for making the corpora public, for downloading the treebank, and for deleting it. A screenshot of this interface is given in Figure 2.

Figure 2: Screenshot of the GrETEL corpus managing page (‘My treebanks’).

The corpus details page (see Figure 3 for a screenshot) contains information about the components the treebank consists of and about the size of each component of the treebank (# sentences, # word occurrences). It also offers the user the option to select which metadata elements will occur in the analysis component and which user interface option is used for selecting values for a specific metadata element, e.g. to use a range filter instead of checkboxes for numeric metadata.

Figure 3: Screenshot of the details page of a particular treebank.

In GrETEL 4, one can upload a treebank parsed with Alpino (with XML files in accordance with the Alpino_ds DTD10), or a text corpus. The text files of a text corpus can be in plain text format, or in the CHAT format. In the latter case, the software uses, inter alia, the cleaning algorithm described in section 3.1. We are currently working on providing a wider range of input formats (in particular, FoLiA (van Gompel and Reynaert, 2013) and TEI).11 Files in plain text or CHAT format are automatically parsed by Alpino. If needed, one can download the automatically parsed corpus, manually correct it or a part of it, and then upload the improved treebank in GrETEL. Uploading a corpus requires authentication. Currently, this is restricted to users with an Utrecht University account, though a guest account is provided as well. When the extensions are complete, the application (and the treebanks) will be hosted by a certified type B CLARIN centre, most probably the Dutch Language Institute,12 which will provide CLARIN-compatible federated login. The maximum size of uploaded corpora will be determined by the CLARIN centre that will host the application. It is likely that a size restriction will be imposed allowing only corpora of maximally a few million words.13 For larger corpora, it makes more sense to make special arrangements with the CLARIN centre. Very large corpora may require dedicated indexing techniques, e.g. the ones proposed by (Vandeghinste and Augustinus, 2014) and (Vanroy et al., 2017) for dealing with the 510 million word occurrences (41 million utterances) of the SoNaR corpus. For representing metadata of corpora, we use a format defined during the development of PaQu that allows users to incorporate metadata in the running text (see http://www.let.rug.nl/alfa/paqu/info.html#cormeta for details). Metadata in CHAT files are converted to this format. The software reads in the metadata and creates faceted search in GrETEL to allow users to both analyse and filter their search results. GrETEL 4 offers new functionality (not present in earlier versions) to further analyse a result set of interest via an analysis interface. This interface enables the creation of pivot tables and graphs such as a heatmap and a table bar map, which allows rapid insight into the data. The result set can also be exported to a tab-separated value text format to allow further analysis in other tools. The user can not only select metadata elements and their values in this analysis interface but also select words that match with a node in the query tree, as illustrated in section 5.

10http://www.let.rug.nl/vannoord/alp/Alpino/versions/binary/latest.tar.gz.
11http://www.tei-c.org/.
12http://ivdnt.org/.
13The XML database BaseX has a theoretical limit of 500GB of XML, according to (Grün, 2010, section 2.4).

5 Example query and analysis

We will illustrate the query and analysis options with an example. We are interested in constructions with three bare14 verbs in the children's speech. An example sentence illustrating this construction is given in (4), which contains the three bare verb forms zal, willen and doen:

(4) Hij zal dat willen doen
    He will that want do
    'He will want to do that'

In this sentence, the words hij 'he' and dat 'that' are not essential for the construction that we are interested in, so we mark them as optional. As for the three verbs in this sentence, they are crucial for this construction, but we are not interested in these specific verbs but in any word of category verb that can occur in this construction. Therefore we indicate for these words that we want any word with the same part of speech here. The example sentence is a main clause, but we want to find examples of this construction in any type of clause. Therefore we mark the option 'ignore properties of the dominating node'. Specifying this results in the XPath query (6), visualised by the query tree (5):

(5)  ?
     ├─ hd: ww
     └─ vc: inf
         ├─ hd: ww
         └─ vc: inf
             └─ hd: ww

(6) //node[@cat and
        node[@rel="hd" and @pt="ww"] and
        node[@cat="inf" and @rel="vc" and
            node[@pt="ww" and @rel="hd"] and
            node[@rel="vc" and @cat="inf" and
                node[@rel="hd" and @pt="ww"]]]]

Executing the query on the corpus VKLaura ('Van Kampen Corpus LAURA') yields 325 matches in 325 utterances. A screenshot of the results is shown in Figure 4.
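Because GrETEL's example-based interface ultimately produces plain XPath like (6), the same query can be run outside GrETEL on any set of Alpino_ds XML files, e.g. with Python's lxml. The directory layout and the one-utterance-per-file assumption below are hypothetical, not part of the GrETEL distribution:

# Sketch: run the XPath query (6) directly on Alpino_ds XML files.
# Assumes a hypothetical directory VKLaura/ with one parsed utterance per
# XML file; an alpino_ds document stores the sentence in a <sentence> element.
from pathlib import Path
from lxml import etree

QUERY = ('//node[@cat and '
         'node[@rel="hd" and @pt="ww"] and '
         'node[@cat="inf" and @rel="vc" and '
         'node[@pt="ww" and @rel="hd"] and '
         'node[@rel="vc" and @cat="inf" and '
         'node[@rel="hd" and @pt="ww"]]]]')

hits = 0
for xml_file in Path('VKLaura').glob('*.xml'):
    root = etree.parse(str(xml_file)).getroot()
    if root.xpath(QUERY):
        hits += 1
        print(xml_file.name, root.findtext('sentence'))
print(hits, 'matching utterances')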

Figure 4: Screenshot of the results page of the query.

14i.e. verbs without te (cf. English 'to').

We can now filter by metadata and by components. If we filter by speaker and select only the (child) speaker LAU (Laura), we obtain 12 matches in 12 utterances (see Figure 5).15

Figure 5: Screenshot of the results page of the query after filtering for speaker=LAU.

The search application allows a more detailed analysis of the search results, in particular selecting parts of the result data and metadata, grouping, filtering and sorting them, and representing them in pivot tables or frequency lists, with various visualisation options. For example, we can make a table that shows at what age speaker LAU uttered such constructions (as of month 43); or we can create a frequency list of the verb combinations that occur in the results, grouped by speaker (see Figure 6).

Figure 6: Analysis: verb lemmas used, grouped by speaker.
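Analyses like the one in Figure 6 can also be reproduced outside GrETEL on the TSV export mentioned above. The file and column names below (gretel_results.tsv, speaker, verb_lemmas) are hypothetical placeholders for whatever attributes were selected at export time:

# Sketch: a Figure-6-style frequency table built from the exported TSV.
# Column names are hypothetical; they depend on the exported attributes.
import pandas as pd

df = pd.read_csv('gretel_results.tsv', sep='\t')
pivot = df.pivot_table(index='verb_lemmas', columns='speaker',
                       aggfunc='size', fill_value=0)
print(pivot.sort_values('LAU', ascending=False).head(12))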

These data show that the child uses combinations of three bare verbs, and that in only 4 of the 12 examples does the child use a verb combination that also occurs in the adult utterances in the corpus. In addition, the combinations that the child uses are not among the most frequent verb combinations used by the adult. All of this suggests that the child fully commands the use of such constructions and can creatively use them.16

15In the result set three speakers occur, with codes JAC (in the role of mother), LAU (in the role of target child), and FRI (another adult).

Possibly the child has made a generalisation on the basis of the use of constructions with two bare verbs, which occur much more frequently (6,645 in total, 1,363 uttered by Laura) and are used by the child much earlier (as of month 23).

6 Related Treebank Query Applications

We had two requirements for a treebank query application: (1) it must be compatible with the format generated by the Alpino parser and used in Dutch treebanks such as LASSY (van Noord et al., 2013) and the Spoken Dutch Treebank (Oostdijk et al., 2002); (2) it must provide a user-friendly interface that enables a researcher to query the treebank without having to write a query in a formal query language. There are several treebank query applications, e.g. PML-TQ (Pajas et al., 2009), the WebLicht application Tündra (Hinrichs et al., 2010), and INESS (Rosén et al., 2012). However, only two treebank query applications meet these requirements: PaQu17 (Odijk et al., to appear 2017) and GrETEL18 (Augustinus et al., 2012).

The PaQu (Parse and Query) application enables the upload of one's own corpus and provides a user-friendly interface for searching for syntactic dependency relations between words. It also offers facilities for the analysis of query results. PaQu was developed on the basis of the LASSY Word Relations application (Tjong Kim Sang et al., 2010) at the request of one of the authors of this paper. In addition, when we started our work on the AnnCor project, the PaQu developers made available in the PaQu application a treebank for the Dutch CHILDES corpora with fully automatically parsed utterances.

Nevertheless, we selected the GrETEL application because it makes it possible to search for arbitrary constructions using example-based querying (e.g. the construction with three verbs can only be queried in PaQu by writing an XPath query from scratch), and because we wanted to offer more sophisticated analysis options than PaQu provides, in particular more sophisticated ways of selecting parts of query results. Furthermore, the GrETEL analysis interface is more user-friendly in that it allows the creation of pivot tables by dragging attributes into a table.

7 Conclusions and Future Work

We have described the extensions to the GrETEL treebank query application that we made in the context of the AnnCor project, and we have illustrated them with a query on (a preliminary version of) the AnnCor CHILDES Treebank for Dutch. The extensions involve functionality for uploading one's own corpus with metadata, and functionality for analysing data, subparts of data, and metadata in combination. The treebank query application and the treebank are still under development, but the extensions to the GrETEL query application described here are already available.19 The source code is available on GitHub.20

There are a number of aspects of the treebank query application that we would like to work on in the future: (1) extend the input formats (FoLiA and TEI); (2) allow more complex metadata that specify properties of spans of text, such as retracings, repetitions, pronunciation, paralinguistic material, etc., as in example (1);21 (3) extend the analysis component with frequencies of constructions relative to the size of a subpart of the corpus (e.g. component, session), measured in terms of the number of tokens or the number of utterances; and (4) provide a graphical interface for selecting nodes from a query tree in the analysis component.

Acknowledgements

This work was financed by the Utrecht University internal AnnCor research infrastructure project and by the CLARIAH-CORE project funded by the Dutch National Science Foundation (NWO).

16Though of course this is not conclusive evidence, if only because the corpus is just a small sample of the child's full input and its own production.
17http://portal.clarin.nl/node/4182.
18http://portal.clarin.nl/node/1967.
19Via the URL http://gretel.hum.uu.nl/gretel4/.
20https://github.com/UUDigitalHumanitieslab/gretel, https://github.com/UUDigitalHumanitieslab/GrETEL-upload.
21See (MacWhinney, 2015) for many more examples.

References

Liesbeth Augustinus, Vincent Vandeghinste, and Frank Van Eynde. 2012. Example-based treebank querying. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). European Language Resources Association (ELRA), Istanbul, Turkey.

Gosse Bouma, Gertjan van Noord, and Robert Malouf. 2001. Alpino: Wide-coverage computational analysis of Dutch. Language and Computers 37(1):45–59.

C. Grün. 2010. Storing and querying large XML instances. Ph.D. thesis, University of Konstanz, Konstanz, Germany. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-127142.

Erhard W. Hinrichs, Marie Hinrichs, and Thomas Zastrow. 2010. WebLicht: Web-based LRT services for German. In Proceedings of the ACL 2010 System Demonstrations, pages 25–29. http://www.aclweb.org/anthology/P10-4005.

Brian MacWhinney. 2000. The CHILDES Project: Tools for Analyzing Talk. Lawrence Erlbaum Associates, Mahwah, NJ, 3rd edition.

Brian MacWhinney. 2015. Tools for analyzing talk, electronic edition, part 1: The CHAT transcription format. Technical report, Carnegie Mellon University, Pittsburgh, PA. http://childes.psy.cmu.edu/manuals/CHAT.pdf.

Jan Odijk. 2015. Linguistic research with PaQu. Computational Linguistics in the Netherlands Journal 5:3–14. http://www.clinjournal.org/sites/clinjournal.org/files/odijk2015.pdf.

Jan Odijk. 2016a. A use case for linguistic research on Dutch with CLARIN. In Koenraad De Smedt, editor, Selected Papers from the CLARIN Annual Conference 2015, October 14–16, 2015, Wroclaw, Poland, number 123 in Linköping Electronic Conference Proceedings, pages 45–61. CLARIN, Linköping University Electronic Press, Linköping, Sweden. http://www.ep.liu.se/ecp/article.asp?issue=123&article=004, http://dspace.library.uu.nl/handle/1874/339492.

Jan Odijk. 2016b. Linguistic research using CLARIN. Lingua 178:1–4. https://doi.org/10.1016/j.lingua.2016.04.003.

Jan Odijk, Alexis Dimitriadis, Martijn van der Klis, Marjo van Koppen, Meie Otten, and Remco van der Veen. 2017. The AnnCor CHILDES Treebank. Unpublished paper, AnnCor project, Utrecht University. Accepted for LREC 2018.

Jan Odijk and Arjan van Hessen, editors. To appear 2017. CLARIN in the Low Countries. Ubiquity Press, London, UK. To appear as Open Access.

Jan Odijk, Gertjan van Noord, Peter Kleiweg, and Erik Tjong Kim Sang. To appear 2017. The Parse and Query (PaQu) application. In Jan Odijk and Arjan van Hessen, editors, CLARIN in the Low Countries, chapter 23. Ubiquity, London, UK. DOI: http://dx.doi.org/10.5334/bbi.23. License: CC-BY 4.0.

N. Oostdijk, W. Goedertier, F. Van Eynde, L. Boves, J.P. Martens, M. Moortgat, and H. Baayen. 2002. Experiences from the Spoken Dutch Corpus project. In M. González Rodriguez and C. Paz Suárez Araujo, editors, Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2002), pages 340–347. ELRA, Las Palmas.

Petr Pajas, Jan Štěpánek, and Michal Sedlák. 2009. PML Tree Query. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University, Prague. http://hdl.handle.net/11858/00-097C-0000-0022-C7F6-3.

Victoria Rosén, Koenraad De Smedt, Paul Meurer, and Helge Dyvik. 2012. An open infrastructure for advanced treebanking. In Jan Hajič, Koenraad De Smedt, Marko Tadić, and António Branco, editors, META-RESEARCH Workshop on Advanced Treebanking at LREC 2012, pages 22–29. ELRA, Istanbul, Turkey.

Kenji Sagae, Eric Davis, Alon Lavie, Brian MacWhinney, and Shuly Wintner. 2007. High-accuracy annotation and parsing of CHILDES transcripts. In Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition (CACLA '07), pages 25–32. Association for Computational Linguistics, Stroudsburg, PA, USA. http://dl.acm.org/citation.cfm?id=1629795.1629799.

Erik Tjong Kim Sang, Gosse Bouma, and Gertjan van Noord. 2010. LASSY for beginners. Presentation at CLIN 2010, Utrecht. http://ifarm.nl/erikt/talks/clin2010.pdf.

Maarten van Gompel and Martin Reynaert. 2013. FoLiA: A practical XML format for linguistic annotation - a descriptive and comparative study. Computational Linguistics in the Netherlands Journal 3:63–81.

Gertjan van Noord, Gosse Bouma, Frank Van Eynde, Daniël de Kok, Jelmer van der Linde, Ineke Schuurman, Erik Tjong Kim Sang, and Vincent Vandeghinste. 2013. Large scale syntactic annotation of written Dutch: Lassy. In Peter Spyns and Jan Odijk, editors, Essential Speech and Language Technology for Dutch, Theory and Applications of Natural Language Processing, pages 147–164. Springer, Berlin/Heidelberg. https://doi.org/10.1007/978-3-642-30910-6_9.

Vincent Vandeghinste and Liesbeth Augustinus. 2014. Making large treebanks searchable. The SoNaR case. In Proceedings of the LREC 2014 2nd Workshop on Challenges in the Management of Large Corpora (CMLC-2), pages 15–20. Reykjavik. http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf.

Bram Vanroy, Vincent Vandeghinste, and Liesbeth Augustinus. 2017. Querying large treebanks: Benchmarking GrETEL indexing. Computational Linguistics in the Netherlands Journal 7:145–166.

The Relation of Form and Function in Linguistic Theory and in a Multi-layer Treebank

Eduard Bejček    Eva Hajičová    Marie Mikulová    Jarmila Panevová
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
Malostranské náměstí 25, 118 00 Prague 1, Czech Republic
{bejcek,hajicova,mikulova,panevova}@ufal.mff.cuni.cz

Abstract

The aim of our contribution is to introduce a database of linguistic forms and their functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help linguists study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. We will also demonstrate possibilities of the exploitation of the ForFun database.

Keywords: form-function relation, prepositions, functors

1 Introduction

The study of the relation of (linguistic) forms and their functions or meanings is one of the fundamental tasks of linguistics, with important implications for natural language understanding. As Katz (1966, p. 100) says, to understand the ability of natural languages to serve as an instrument to the communication of thoughts and ideas we must understand what it is that permits those who speak them consistently to connect the right sounds with the right meanings. This, however, is obviously not an easy task, as the relation between form and function is a many-to-many relation. At present, the availability of richly annotated corpora helps the linguist analyze the given relation in its variety, and it is a challenging task to provide linguists with useful tools for their study.

One of the most useful types of corpora for this task are treebanks based on a stratificational (multi-layer) approach, where the form-function relation may be understood as a relation between units of two layers of the system. The aim of our contribution is to introduce a database of language forms and their linguistic functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks (PDTs), with the purpose of helping linguists study the form-function relation. We offer a new tool, ForFun, which makes it possible to search in a user-friendly way for all forms (almost 1 500 items) used in the PDTs for particular functions and, vice versa, to look up all functions (66 items) expressed by particular forms.

The research question we pursue by constructing the database and the new tool can be illustrated, e.g., by the example of the Czech preposition po + Locative of a noun (translated to English as along, on, about, at, … + noun) in Figure 1. The blue colour indicates the forms, the pink colour the functions, identified in the PDTs by the functors attached to the nodes representing the given item (see Section 2 below).1 The prepositional case po + Locative (see the inner circle) may express the following eight functions (see the middle circle): TWHEN (when), THL (how long), ORIG (origin), MEANS, MANN (manner), EXT (extent), DIR2 (direction which way), DPHR (idiomatic meaning). Each of these functions, in turn, may be expressed by a number of forms (see the outer circle), one of which is po + Locative. Thus, for example, the function labelled THL (how long) may be expressed by an adverb, or the Accusative of a noun (prepositionless case), or the prepositional cases za + Genitive, za + Accusative, po + Accusative, and, of course, by the already mentioned po + Locative.

1Throughout the paper, we use the term functor for the label of the type of the dependency relation between the governor and its dependent; in the dependency tree structure representing the sentence on the deep (underlying, tectogrammatical; see Section 2) layer, this label is a part of the complex label attached to the dependent node. The term prepositional case is used for a combination of a preposition and a noun or a nominal group in a morphological case. In the figures and tables, morphological cases are indicated by numbers, i.e. 2 for Genitive, 3 for Dative, 4 for Accusative, 6 for Locative, 7 for Instrumental. When the noun or nominal group is not accompanied by a preposition, we use the term prepositionless case.

Figure 1: Many-to-many relation between forms and functions: prepositional case po + Locative.

2 Multi-layer Architecture of Prague Dependency Treebanks

PDTs (on which our ForFun database is based) are complex linguistically motivated treebanks based on the dependency syntactic theory of the Functional Generative Description (see Sgall et al. 1986). The original annotation scheme has the following multi-layer architecture:2

• morphological layer: all tokens of the sentence get a lemma and a (disambiguated) morphological tag,

• surface syntax layer (analytical): a dependency tree capturing surface syntactic relations such as subject, object, adverbial; a (structural) tag reflecting these relations is attached to the nodes as one component of their (complex) labels,

• deep syntax layer (tectogrammatical) capturing the semantico-syntactic relations: on this layer, the dependency structure of a sentence is a tree consisting of nodes only for autonomous meaningful units (function words such as prepositions, subordinating conjunctions, auxiliary verbs etc. are not represented as separate nodes in the structure, their contribution to the meaning of the sentence is captured within the complex labels of the autonomous units). The types of dependency relations are captured by means of the so-called functors.

2The PDTs annotation scenario is described in detail in Mikulová et al. (2006) and Hajič et al. (2017).

Functors (66 in total) are classified according to different criteria. The basic subdivision is based on the valency criterion, which divides functors into argument functors and adjunct functors. There are five arguments: Actor/Bearer (ACT), Patient (PAT), Addressee (ADDR), Origin (ORIG) and Effect (EFF). The repertory of adjuncts is much larger than that of arguments. Their set might be divided into several subclasses, such as temporal (TWHEN for "when?", TSIN for "since when?", TTILL for "till when?", THL for "how long?", THO for "how often?", etc.), local (LOC for "where?", DIR1 for "where from?", DIR2 for "which way?", DIR3 for "where to?"), causal (such as CAUS for "cause", AIM for "in order to", COND for "condition", etc.), and other adjuncts (MANN for general "manner", ACMP for "accompaniment", EXT for "extent", MEANS for "means or instrument", INTF for "intensifier", BEN for "benefactor", RSTR for "attribute", etc.). For a full list of all dependency relations and their labels see Mikulová et al. (2006).

For the ForFun database, we use the annotations of the nodes on the deep syntactic layer and their counterparts on the morphological layer, which has made it possible to retrieve the relations between functions (expressed on the deep level by functors) and forms, and vice versa.
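Conceptually, this retrieval boils down to aggregating (form, functor) pairs, read off the aligned deep and morphological layers, into two inverse many-to-many mappings. A schematic sketch follows; the extraction of the pairs from the PDT annotation files is abstracted away, and the pairs shown are toy data:

# Sketch: the core data structure behind a ForFun-style database, i.e. two
# inverse many-to-many mappings built from (form, functor) pairs. Reading the
# pairs out of the PDT annotation files is abstracted away here.
from collections import defaultdict

pairs = [('po+6', 'TWHEN'), ('po+6', 'THL'), ('za+2', 'THL'),
         ('po+6', 'MEANS'), ('adverb', 'THL')]          # toy data

form_to_functors = defaultdict(set)
functor_to_forms = defaultdict(set)
for form, functor in pairs:
    form_to_functors[form].add(functor)
    functor_to_forms[functor].add(form)

print(sorted(form_to_functors['po+6']))   # ['MEANS', 'THL', 'TWHEN']
print(sorted(functor_to_forms['THL']))    # ['adverb', 'po+6', 'za+2']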

3 List of available Prague Dependency Treebanks

For Czech, the following four treebanks are now available, each of them containing data from a different source: the Prague Dependency Treebank 3.0,3 the Prague Czech-English Dependency Treebank 2.0,4 the Prague Dependency Treebank of Spoken Czech 2.0,5 and the PDT-Faust corpus.6

            PDT 3.0   PCEDT 2.0   PDTSC 2.0   Faust    Total
Tokens      833 195   1 162 072   742 257     33 772   2 771 296
Sentences    49 431      49 208    73 835      3 000     175 474

Table 1: Volume of data in the Prague Dependency Treebanks

It is obvious (see Table 1) that the Prague Dependency Treebank family provides rich language data for our purpose, i.e. for the study of the relation of forms and their functions, since every content word there is assigned one of these 66 functors. Altogether, the treebanks contain around 180 000 sentences with their morphological, syntactic and semantic annotation.

4 Prague Database of Forms and Functions

ForFun 1.0, the Prague Database of Forms and Functions, is a rich database of syntactic functions and their formal realizations, with a large number of examples coming from both written and spoken Czech texts. Since the database is extracted from the PDTs (see Section 3), it takes over the list of syntactic functions as well as the terminology (they are called functors). ForFun is provided as an open digital resource accessible to all scholars via the LINDAT/CLARIN repository.7

4.1 Design

We have already mentioned that in general the relation between forms and functions is a many-to-many relation. As such, it has to be explored from both sides: a given form has several functions, and any of these functions may again be realized by several forms (the given one among them). When such relations

have to be explored, ForFun is a perfect choice, since it is designed exactly for this kind of traversing through the data. Although the annotated example sentences are the same, they can be retrieved by asking either for their forms or for their functions. The ForFun database provides two entry points (cf. Figures 2 and 3):

• The user can choose one of almost 1 500 formal realizations of sentence units (i.e. prepositionless and prepositional cases, subordinating and coordinating conjunctions, adverbs, infinitive and finite verb forms, etc.) and obtains all functions it can represent.

• The user can choose one of the 66 syntactic functions (i.e. LOC, TTILL, CAUS, etc.) and obtains all forms used to express it.

The view can always be switched from a list of forms to a list of functions of one of them, and vice versa. For each form-function relation there are plenty of examples in the form of a sentence with the highlighted expression representing the relation. All these examples are sorted by various criteria:

• the word class of the parent node,
• the particular forms for the function or the particular functions for the form, and
• the source of the text data (written, spoken, translated texts and texts from internet users).

The number of examples available in the database is displayed for each pair form+functor or functor+word class, for each combination functor+form+word class, and for each specified 4-combination (form+functor+word class+source); see Figures 2 and 3. Either the first ten examples or all of them are displayed on demand. On top of that, examples can also be first filtered by their source, which allows the user to hide, e.g., all forms used only in the spoken language.

Figure 2: A screenshot of the ForFun web interface: From Form to Function.

Figure 3: A screenshot of the ForFun web interface: From Function to Form.

An illustration of what the result of a user's search for the functions of the prepositional case do + Genitive looks like is given in Figure 2. In the upper part, there are 9 415 occurrences in all PDTs of the form do + Genitive representing the functor DIR3. The occurrences of do + Genitive are divided according to their heads (be it a v(erb) or a n(oun), see the first column); their distribution within a particular treebank is given in the second column, followed by real examples from the corresponding treebank. A few of them are displayed on demand whereas many (see the last column) stay hidden. In the lower part of Figure 2, the same form do + Genitive in the function TTILL is exemplified in the same style.8 For the opposite direction, "from function to form", see Figure 3, where (among others) the same sentences for do + Genitive as the functor DIR3 can be found by searching for all representations of the functor DIR3. Other forms include a finite verb (#vfin) or an adverbial.

3https://ufal.mff.cuni.cz/prague-dependency-treebank In the PDT 3.0 (see Hajič et al., 2006, Bejček et al., 2013), the data consist of articles from Czech daily newspapers.
4https://ufal.mff.cuni.cz/pcedt2.0/ In the parallel PCEDT 2.0 (see Hajič et al., 2012), the English part consists of the Wall Street Journal sections of the Penn Treebank (Marcus et al., 1993), and the Czech part, which is used in ForFun, was manually translated from the English original.
5https://ufal.mff.cuni.cz/pdtsc2.0 The PDTSC 2.0 (see Mikulová et al., 2017b) contains dialogs from the Malach project (https://ufal.mff.cuni.cz/cvhm/vha-info.html, slightly moderated testimonies of Holocaust survivors) and from the Companions project (http://cordis.europa.eu/project/rcn/96289_en.html, two participants chat over a collection of photographs).
6PDT-Faust is a small treebank containing short segments (very often with vulgar content) typed in by various users on the reverso.net webpage for translation.
7http://hdl.handle.net/11234/1-2542

8Figure 2 presents only a part of the full response obtained from the ForFun database for the given query. The other functions of do + Genitive (PAT, EXT, EFF and others) are also not included in this shortened sample.

4.2 Volume

The database contains 2.2 million examples altogether for all forms (and the same number from the function point of view), split approx. 3:1 between written and spoken text (see Table 2). Each example is one sentence long.9 The examples can be examined from the function side (66 functors) or the form side (1 469 forms). All examples are split into 13.5 thousand 4-combinations, with 163 examples per 4-combination on average.

examples from written text                     1 608 061
examples from spoken text                        593 400
examples altogether                            2 201 461
number of functions                                   66
number of forms                                    1 469
number of 4-combinations                          13 514
avg. examples for a function                      33 355
avg. examples for a form                           1 500
avg. examples for a 4-combination                    163
max. number of examples for a function           490 121
max. number of examples for a form               370 586
max. number of examples for a 4-combination       97 469

Table 2: Volume of the ForFun database
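The averages in Table 2 conceal a heavily skewed distribution, as discussed below; the effect is easy to reproduce on raw per-combination counts (the numbers here are invented for illustration, not taken from ForFun):

# Sketch: mean vs. median of examples per 4-combination. The counts below are
# toy values illustrating the long tail discussed in the text, not real data.
from statistics import mean, median

counts = [97469, 5000, 300, 40, 7, 2, 2, 1, 1, 1, 1, 1]  # hypothetical
print(f'mean = {mean(counts):.0f}, median = {median(counts)}')
# a few very frequent combinations pull the mean far above the median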

While the average number is high, the median is only two examples. The reason is that there is a long tail of 4-combinations used very rarely. These occurrences with very low frequencies in the data are one of the main benefits of the large volume of the database, but they have to be used carefully. Every result always has to be understood solely as an input for subsequent research, since, given its volume, the ForFun database may contain errors (caused by annotators as well as by speakers/writers).

5 What Can We Find Out about Form-Function Relations in the ForFun Database?

To display the richness of the material we work with, we present several examples, connected with studies of the form-function relation, of what the user can find out in the ForFun database.

prep.   number   list of functors
na+4    42       ACT ADDR AIM APP ATT BEN CAUS COMPL COND CPHR CPR CRIT DIFF DIR1 DIR3 DPHR EFF EXT ID INTF INTT LOC MANN MAT MEANS MOD ORIG PAT PREC REG RESL RESTR RHEM RSTR SUBS TFHL TFRWH THL TOWH TPAR TTILL TWHEN
v+6     36       ACMP ACT AIM APP ATT BEN CAUS COMPL COND CPR CRIT DENOM DIR2 DIR3 DPHR EFF EXT ID LOC MANN MAT MEANS MOD PAT PREC REG RESL RESTR RHEM RSTR SUBS TFHL THL THO TPAR TWHEN
k+3     34       ACMP ACT ADDR AIM APP ATT BEN CAUS COMPL CPHR CRIT DIR1 DIR2 DIR3 DPHR EFF EXT ID INTT LOC MANN PAR PAT PREC REG RESL RESTR RHEM RSTR TOWH TPAR TSIN TTILL TWHEN

Table 3: The prepositional cases with the highest number of functions.

5.1 Multi-functionality of Forms

A rather straightforward use of the ForFun database is to retrieve which functions can be expressed by a particular form. Table 3 contains the three prepositional cases with the highest number of functions they express: na + Accusative, v + Locative and k + Dative. The po + Locative case from Figure 1, with 32 functions, would be the seventh prepositional case in this table.

9One sentence typically contains many different functions and thus serves for many examples (once for each of its parts).

5.2 Absolute Frequency of Forms and Functions (in both written and spoken texts)

An observation of frequency has an important place in the description of language because it quantifies linguistic choices made by speakers and writers. Theoretical statements are often of little value for generalizations about language use unless they can be corroborated by observations of frequency. For each form and function, ForFun provides information about its absolute frequency in all the PDTs as well as in each corpus separately. The users can search quickly and in a user-friendly way for which formal means are the most frequent in Czech sentences and which ones are rarely used. (See Table 4 for the five most frequent prepositional cases in Czech in comparison with the class of adverbs and the clause with the conjunction že [that].) They can find out the distribution of a particular function (various arguments or adjuncts) in the sentences. For both forms and functions, they can compare absolute frequencies in written and spoken texts.

form            occurrences
v+6             51 682
na+4            22 444
s+7             19 747
z+2             19 502
na+6            17 870
adverb          93 824
že[that]+verb   26 831

Table 4: The most frequent prepositional cases

5.3 Material for Detailed Linguistic Studies

In addition to valuable statistical data, the ForFun database provides extremely rich material for detailed linguistic studies of individual language phenomena and for their description and classification. One of the first linguistic studies based on the database is the analysis and subclassification of the original functors denoting space (Mikulová et al., 2017a).

6 Conclusion

The ForFun database has been built as a rich and user-friendly resource for those researchers who (want to) use corpora in their everyday work and look for various occurrences of specific forms or patterns in relation to their syntactic functions, but who are not interested in, or simply do not need to deal with, various technical, formal and annotation issues. ForFun brings the rich and complex annotation of the PDTs, based on a sound linguistic theory, closer to ordinary researchers. It will be further developed, though it should be borne in mind that it is designed to provide only a limited number of the most useful features, rather than a full interface to everything the PDTs can offer. There are other complex tools for that,10 and ForFun does not aim to substitute for them. In its simplicity and clarity, it is a user-friendly source of examples for various explorations, especially in syntax.

Acknowledgments

The research reported in the paper has been supported by the Czech Science Foundation under the projects GA17-12624S and GA17-07313S and by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). This work has been using language resources developed, stored and distributed by the latter project (LM2015071).

10E.g. PML Tree Query https://lindat.mff.cuni.cz/services/pmltq/, INESS Search http://clarino.uib.no/iness, etc.

References

Eduard Bejček, Eva Hajičová, Jan Hajič, Pavlína Jínová, Václava Kettnerová, Veronika Kolářová, Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Magda Ševčíková, Jan Štěpánek, and Šárka Zikánová. 2013. Prague Dependency Treebank 3.0. Data, http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3.

Jan Hajič, Eva Hajičová, Marie Mikulová, and Jiří Mírovský. 2017. Prague Dependency Treebank. In Handbook on Linguistic Annotation, pages 555–594. Springer Verlag, Dordrecht, Netherlands.

Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 3153–3160. European Language Resources Association, Istanbul, Turkey. https://aclanthology.info/pdf/L/L12/L12-1280.pdf.

Jan Hajič, Jarmila Panevová, Eva Hajičová, Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, Magda Ševčíková-Razímová, and Zdeňka Urešová. 2006. Prague Dependency Treebank 2.0 (LDC2006T01).

Jerrold J. Katz. 1966. The Philosophy of Language. Studies in Languages. Harper & Row, New York.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330. https://aclanthology.info/pdf/J/J93/J93-2004.pdf.

Marie Mikulová, Eduard Bejček, Veronika Kolářová, and Jarmila Panevová. 2017a. Subcategorization of adverbial meanings based on corpus data. Journal of Linguistics / Jazykovedný časopis 68(2):268–277.

Marie Mikulová, Alevtina Bémová, Jan Hajič, Eva Hajičová, Jiří Havelka, Veronika Kolářová, Lucie Kučová, Markéta Lopatková, Petr Pajas, Jarmila Panevová, Magda Razímová, Petr Sgall, Jan Štěpánek, Zdeňka Urešová, Kateřina Veselá, and Zdeněk Žabokrtský. 2006. Annotation on the tectogrammatical level in the Prague Dependency Treebank. Annotation manual. Technical Report 30, ÚFAL MFF UK, Prague, Czech Republic.

Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Petr Pajas, Jan Štěpánek, and Jan Hajič. 2017b. PDTSC 2.0 – spoken corpus with rich multi-layer structural annotation. In Text, Speech, and Dialogue 20th International Conference, TSD 2017. Charles University, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, Lecture Notes in Computer Science, pages 129–137.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague/Dordrecht.

Literal readings of multiword expressions: as scarce as hen's teeth

Agata Savary
Université François Rabelais Tours, France
[email protected]

Silvio Ricardo Cordeiro
Aix-Marseille Université, France
University of Düsseldorf, Germany
[email protected]

Abstract

Multiword expressions can have both idiomatic and literal occurrences. Distinguishing these two cases is considered one of the major challenges in MWE processing. We suggest that literal readings should be considered in both semantic and syntactic terms, which motivates their study in a treebank. We propose heuristics to automatically pre-identify candidate sentences that might contain literal readings of verbal MWEs, and we apply them to an existing Polish treebank. We also perform a linguistic study of the literal readings extracted by the different heuristics. The results suggest that literal readings constitute a rare phenomenon. We also identify some properties that may distinguish them from their idiomatic counterparts.

Keywords: multiword expressions, literal reading, idiomaticity rate, treebank, Polish

1 Introduction

Multiword expressions (MWEs) are word combinations, such as all of a sudden, a hot dog, to pay a visit or to pull one's leg, which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies. They encompass closely related linguistic objects such as idioms, compounds, light verb constructions, rhetorical figures, institutionalised phrases or named entities. A prominent feature of many MWEs, especially of verbal idioms such as to pull one's leg, is their non-compositional semantics, i.e. the fact that their meaning cannot be deduced from the meanings of their components, and from their syntactic structure, in a way deemed regular for the given language. For this reason, MWEs pose special challenges both to linguistic modeling (e.g. as linguistic objects crossing boundaries between lexicon and grammar) and to Natural Language Processing (NLP) applications, especially those which rely on semantic interpretation of text (e.g. information retrieval, information extraction or machine translation). Another challenging property of many MWEs, as in example (1), is that we can encounter their literally understood counterparts, as in (2). However, it is not clear what should be considered an occurrence of a literal reading of an MWE. Should "coincidental" co-occurrences of its lexicalized components,1 like in (4) as opposed to (3), also be considered its literal occurrences? Should variants like (6), which considerably change the "canonical" syntactic dependencies between the components, compared to (5), still be considered idiomatic occurrences? Finally, what should be the status of word plays which deliberately refer to both the idiomatic and the literal reading of an MWE, as in (7)?

(1) The man was pulling my leg but I didn't believe him.

(2) The kid was pulling my leg to make me play with him.
(3) The preparations were not thoroughly planned after all.
(4) After all the preparations we finally left.
(5) The Samsung boss can still pull the strings from prison.
(6) The article addresses the political strings which the journalist claimed that the senator pulled.
(7) (Polish) Wyciągnięcie rąk uchroniło go od wyciągnięcia nóg 'Stretching hands prevented him from stretching legs' ⇒ Stretching his hands prevented him from dying

For a given MWE E with lexicalized components e1, . . . , en, we define its literal reading occurrence, or literal reading (LR) for short, as a co-occurrence of the lexemes e1, . . . , en in a context in which:

(i) it is not an MWE; and
(ii) one of the typical senses of each of e1, . . . , en is activated; and
(iii) the syntactic constraints among e1, . . . , en are preserved, i.e. either the same or equivalent dependencies hold between E's components as in its canonical (citation) form. Dependencies are equivalent if the syntactic variation can be neutralized while preserving the overall meaning. For instance, (6) can be reformulated into: The journalist claimed that the senator pulled political strings, and this article addresses them. Therefore, the syntactic constraints between e1 = pull and e2 = strings visible in (5) are preserved in (6).

According to this definition, only example (2) above is considered an LR.2 Example (4) does not fulfill condition (iii), while (1), (3), (5) and (6) do not fulfill (i)-(ii), since they are idiomatic readings (IRs). In example (7), the expression wyciągnięcie rąk 'stretching hands' points to a typical meaning of the verb wyciągnąć 'stretch'. By analogy, the reader is also induced to think of a literal meaning of the noun nogi 'legs'. However, the idiomatic meaning of wyciągnięcie nóg 'stretching legs' ⇒ dying is still intact, and thus the example fails condition (i). Note that, due to the presence of condition (iii), the study of literal readings of MWEs is best done in a treebank.

The motivation to study the phenomenon of LRs of MWEs, and of its frequency in particular, is both of a linguistic and of a computational nature. Firstly, psycholinguistic studies put special interest in the interplay between LRs and IRs, as well as in their distributional and statistical properties, when discovering how idioms are stored and processed in the human mind (Cacciari and Corradini, 2015). Secondly, the links between LRs and IRs can inform us which morpho-syntactic variation is allowed or prohibited by some MWEs, and why (Sheinfux et al., 2017; Pausé, 2017). Additionally, an opposition of the contexts in which LRs and IRs occur may yield better methods to automatically distinguish them (Peng et al., 2014; Peng and Feldman, 2016). This last task is considered one of the major challenges in the automatic processing of MWEs (Constant et al., 2017). Its quantitative importance can be estimated by measuring the idiomaticity rate, i.e. the ratio of occurrences of an MWE with idiomatic reading to both its idiomatic and literal occurrences in a corpus (El Maarouf and Oakes, 2015). If the overall (i.e. aggregated for all MWEs) idiomaticity rate is relatively low, distinguishing IRs and LRs becomes, indeed, a major challenge, as claimed by Fazly et al. (2009). If, conversely, it is high, or even close to 100%, the task can be neglected for many applications. Also, as shown by Waszczuk et al. (2016), a high idiomaticity rate can considerably speed up parsing, if appropriately taken into account by a parser's architecture.

In this paper we are interested in verbal MWEs (VMWEs), in which syntactic flexibility can be particularly rich. We exploit an existing multilingual corpus (Savary et al., 2017) in which VMWE annotations are accompanied by morphological and dependency annotations, but literal occurrences are not tagged (Sec. 2). We propose several heuristics to automatically detect possible literal occurrences of known, i.e. manually annotated, VMWEs (Sec. 3). Then we manually categorize the resulting occurrences using a typology which accounts for true and false positives, as well as for linguistic properties of LRs as opposed to IRs (Sec. 4). We report on results in a Slavic language: Polish (Sec. 5). Finally, we conclude and discuss perspectives for future work (Sec. 6).

1The lexicalized components of an MWE are those which are always realized by the same lexeme. For instance, in to pay a visit the head verb is always a form of pay, but the determiner a can be freely replaced, as in paid many visits. In this paper the lexicalized components of MWEs are highlighted in boldface.

2 Corpus

We use the openly available PARSEME corpus3 manually annotated for VMWEs in 18 languages (Savary et al., 2017). Among its 5 VMWE categories, three are relevant to this Polish-dedicated study:

• Idioms (IDs) are verbal phrases of various syntactic structures, mostly characterized by non-compositional meaning, as in (8). Due to the fact that many idioms were conceived as metaphors, they maintain a large potential for LRs, as exemplified in (9).

(8) dawno już powinien był wyciągnąć nogi 'long-ago already should-he have stretched legs' ⇒ he should have died long ago

(9) położyłem się na trawie i wyciągnąłem nogi 'I-lay-down on the-grass and stretched legs'

• Light-verb constructions (LVCs) are VERB (PREP) (DET) NOUN combinations in which the verb V is semantically void and the noun N is a predicate expressing an event or a state, as in (10). The idiomatic nature of LVCs lies in the fact that the verb may be lexically constrained and does not contribute any semantics to the whole expression. LVCs are mostly semantically compositional; therefore the notion of an LR is less intuitively motivated for them. An LR of an LVC should be understood as a co-occurrence of its lexemes which does not have all the required LVC properties. This occurs, for instance, when N is not predicative or does not express an event or a state, as in (11), where udziały 'shares' denotes an amount of financial assets. Figures 1a and 1b present another occurrence of this VMWE, and of its LR, respectively.

(10) mieć swój udział w debacie 'have one's share in debate' ⇒ to take part in the debate

2Henceforth, we use wavy and dashed underlining for true and false LRs, respectively. Straight underlining denotes focus.
3http://hdl.handle.net/11372/LRT-2282

(14) ::::::::ogl ˛adam :::si˛ew lustrze ‘I-am-watching myself in the mirror’ (15) spotykać si˛e z przyjaciółmi ‘meet RCLI with friends.inst’ ⇒ meet friends (16) spotykać przyjaciół ‘to meet friends.acc’

(17) nie :::::::spotyka:::si˛etakich ludzi ‘not meets RCLI such people’ ⇒ such people are never met The Polish part of the training corpus contains 11,578 sentences, for a total of 191,239 tokens and 3,149 annotated instances of VMWEs.4 For most languages, including Polish, the VMWE annotation layer is accompanied by morphological and syntactic layers (ML and SL, respectively), as shown in Fig.1a and1b. In ML, a lemma, a POS and morphological features are assigned to each token. SL represents syntactic dependencies between tokens. For Polish, both ML and SL use the Universal Dependencies (UDs) tagsets.5 ML was created partly manually and partly automatically, and SL automatically, using UDPipe6 with its pre-trained Polish model. While the PARSEME corpus is manually annotated and categorized for IRs of VMWEs, it is not annotated for their LRs. Therefore, we developed several heuristics which allow us to identify them automatically.

3 Identifying literal readings

We use no external resources, therefore we can only identify LRs for VMWEs which are annotated at least once in the corpus. In order to fully reliably perform this task, we would have to ensure that conditions (i), (ii) and (iii) from page1 hold. Condition (i) can be automatically fulfilled by discarding predictions that coincide with annotated VMWEs. Condition (ii) cannot be checked automatically, given that the available annotation layers do not account for semantics. It must, thus, be subject to manual verification. Condition (iii) is closely linked to the SL annotations but checking it fully reliably can be hindered by at least two factors. Firstly, some dependency annotations in SL can be incorrect, especially if SL was constructed automatically. Secondly, defining conditions under which two sets of dependency relations are equivalent seems challenging and highly language-dependent. Given the large number of possible syntactic structures of VMWEs, an exhaustive catalog of such equivalences would be huge, or

4The annotation was performed by a single native Polish annotator. The inter-annotator (IAA) in VMWE identification was measured in terms of the F-measure and κ, with the scores of 0.529 and 0.434, respectively. The IAA in VMWE categorisation (based on the VMWE identified jointly by two annotators) assessed in terms of the F-measure, and equal to 0.939. All IAA scores were based on a small sample of the corpus, anotated in parallel by another Polish speaker who only had few experience with the guidelines and did not annotate the final corpus. Therefore, these IAA scores are rather weak indicators of the annotation quality. 5http://universaldependencies.org/guidelines.html 6https://ufal.mff.cuni.cz/udpipe

66 dobj root nmod nsubj amod case aux amod Wi¦kszy udziaª w zatrudnieniu b¦d¡ miaªy te rmy (a) wi¦kszy udziaª w zatrudnienie by¢ mie¢ ten rma ADJ NOUN ADP NOUN AUX VERB ADJ NOUN inan.acc.cmp.masc.sing acc.masc.sing prep.loc inan.loc.neut.sing imp.ind.plur.3.fut.fin inan.imp.fem.ind.plur.past.fin nom.pos.fem.plur nom.fem.sing Bigger part in employment will have these companies acl root punct dobj dobj nmod advmod nsubj nmod Oddadz¡ cz¦±¢ udziaªów , któr¡ dzi± ma skarb pa«stwa (b) odda¢ cz¦±¢ ::::::::udziaª , który dzi± mie¢::: skarb pa«stwo VERB NOUN NOUN PUNCT ADJ ADV VERB NOUN NOUN perf.ind.plur.3.pres.fin acc.fem.sing inan.gen.masc.pl inan.acc.pos.fem.sing pos imp.ind.sing.3.pres.fin inan.nom.masc.sing gen.neut.sing Will-return part of-shares which today has treasury of-state

Figure 1: Morphosyntactic annotations for an occurrence context of the VMWE mieć udział ‘have share’ ⇒ take part (a) and its LR (b). Translations: (a) These companies will participate in employment more intensively. (b) They will return the part of the shares that the treasury has today. even potentially infinite, due to long-distance dependencies in recursively embedded relative clauses, as illustrated in example (6). In order to cope with these obstacles, we designed four heuristics which should cover a large majority of LRs in complementary ways, while maintaining the amount of false positives relatively low (i.e. the heuristics are skewed towards high recall). They rely on the following definitions. Each sequence of words is a function s : {1, 2,..., |s|} → W , where W are word forms. The sequence s can be noted as s := {s1, s2, . . . s|s|}, where si := (i, wi) is a single token. A sequence can thus be denoted as a set of pairs: s = {(1, w1), (2, w2),..., (|s|, w|s|)}. For example, the sentence in Fig.1a can be represented as a sequence s = {(1, Wi˛ekszy), (2, udział),..., (8, firmy)} . For a given token si = (i, wi), lemma(si) is its case-folded lemma form (or nil if unavailable in ML), and surface(si) is its case-folded surface form. For instance in Fig.1a, lemma(s6) = mieć, surface(s6) = miały, and surface(s1) = wi˛ekszy.As not every token may have lemma information, we define lemmasurface(si) as the lemma if available, and as the surface form otherwise. If s is a sentence, each token si is associated with its parent, denoted as parent(si), through a syntactic label, denoted as label(si). Some tokens may have parent nil (and label root). In Fig.1a, label(s2) = dobj, parent(s2) = s6, label(s6) = root, and parent(s6) = nil. For a given sequence s, its subsequence q is an injection defined as an order- preserving sequence over tokens of s, i.e. q : {1, 2,..., |q|} → s such that, if i < j, q(i) = sk and q(j) = sl, then k < l. The definitions of lemmas and surface forms extend straightforwardly to tokens of a subsequence: lemma((i, sk)) := lemma(sk) and surface((i, sk)) := surface(sk). For instance in Fig.1a, the subsequence corresponding to the tokens in bold can be formalized as q = {(1, s2), (2, s6)} = {(1, (2, udział)), (2, (6, miały))}, and lemma(q2) = lemma((2, s6)) = lemma(s6) = mieć, etc. In a subsequence q, the definition of a parent still relies on the dependencies in the underlying sequence s but is restricted to the tokens in q. Formally, for a given 1 6 i 6 |q|, if there exists 1 6 j 6 |q| such that parent(q(i)) = q(j), then parentsub(qi) := qj. Otherwise parentsub(qi) := nil. For instance in Fig.1a, q1 = (1, s2), q2 = (2, s6), parentsub(q1) = q2 and parentsub(q2) = nil. In Fig.1b, where the subsequence consisting of the underlined tokens forms a non-connected graph, the parents of both components are nil, i.e. q1 = (1, s3), q2 = (2, s7), and parentsub(q1) = parentsub(q2) = nil. In the pre-processing step we extract each occurrence of an annotated VMWE in a sentence s as a subsequence of s, noted m = {m1, m2, . . . , m|m|}. For each known VMWE m extracted in this way, 0 0 0 0 0 and for each sentence s = {s1, s2, . . . , s|S|}, we then look for literal matches of m in s . We define a literal match as an injection φ : m → s0, where for every t ∈ m, we have lemmasurface(t) ∈ {lemma(φ(t)), surface(φ(t))}, and the image of m is not annotated as a VMWE itself. 
For instance, for the VMWE m = {(1, s2), (2, s6)} from Fig.1a, we obtain the following literal match in the sentence 0 0 from Fig.1b: φ = {((1, s2), s3), ((2, s6), s7)}. The set of such bijections can be huge and include a large number of false positives, i.e. coincidental co-occurrences of m’s components in the same sentence. 67 xcomp dobj nsubj acl dobj udziaªy , które ma Google maj¡ jednakowe udziaªy ma wzi¡¢ udziaª udziaª ma znaczenie ::::::shares , which has::: Google ::::have same ::::::shares has to-take part share has importance (a) (b) (c) (d)

Figure 2: True and false LRs of mieć udział ‘have share’ ⇒ take part, with extracts of SL.

Therefore, we restrain the set of such injections with the following criteria.

• WindowGap: Under this criterion, all matched tokens must fit into a sliding window with no more than g external elements. Formally, let J be the set of all matched indexes in the sentence s0, i.e. 0 J = { j | mi ∈ m, sj = φ(mi) }. Then φ is only considered to match if max(J) − min(J) + 1 6 g + |m|. For m in Fig.1a and s0 in Fig.1b we have J = {3, 7} and |m| = 2. Thus, the tokens corresponding to udziałów ma are a literal match only if g > 3. In the case of Fig.2, every reading can be matched with g > 2. • BagOfDeps: Under this criterion, a literal match must be a connected graph, but the directions and the labels of the dependencies are ignored. Formally, there must be a token mroot ∈ m for which parent(mroot) = nil. Moreover, for every token mi ∈ m \{mroot}, there exists a token mk ∈ m such that parent(φ(mi)) = φ(mk). For instance, the readings in Fig.2a,2b and2d are matched under this criterion, but not those in Fig.2c and Fig.1b. • UnlabeledDeps: Under this criterion, a literal match must be a connected directed graph in which the dependency labels are ignored but the parent relations are preserved. Formally, this criterion adds a restriction to BagOfDeps: mk must be such that mk = parentsub(mi). For instance, the readings in Fig.2b and2d are matched under this criterion, but not those in Fig.2a,2c and Fig.1b. • LabeledDeps: Under this criterion, a literal match must be a connected directed graph in which both the parent relations and the dependency labels are preserved. Formally, this criterion adds a re- striction to UnlabeledDeps: For every mi ∈ m\{mroot}, we must have label(mi) = label(φ(mi)). Only the reading in Fig.2b is matched under this criterion.

4 Results

The above heuristics, which are language-independent, were used to automatically pre-select LR candidates of VMWEs occurring in the training part of the Polish PARSEME corpus. For each of the 3,149 annotated VMWE instances, each of the four heuristics (with g = 2)7 was used to extract literal matches, their POS sequences and the sentences in which they occur. We then performed a manual tagging of each LR candidate.8 Out of the resulting 416 literal matches, 72 (17.3%) were manually tagged as true LRs, i.e. conforming to the definition in Sec. 1. These 72 occurrences correspond to 32 distinct VMWEs. The remaining 344 matches were due to one of three reasons: (i) coincidental co-occurrences of VMWE components, as in example (4) and Fig. 2c–d; (ii) true VMWEs, wrongly omitted in the original annotation (29 such cases were detected); (iii) false VMWEs, which should never have been annotated (8 occurrences of 3 such expressions were detected).

Tab. 1 shows the per-category and the overall efficiency of the four heuristics from Sec. 3 in the task of finding LRs of VMWEs (the best results are highlighted in bold).9 The overall F-scores (even if more than twice as high for IDs as for the other categories) indicate that automatic identification of LRs is a hard task. Obviously, mixing all heuristics gives optimal recall (since only those occurrences which were extracted by at least one of them are examined here). In particular, WindowGap and BagOfDeps are largely complementary: only 41.7% of LRs are extracted by both of these methods. Also expectedly, the WindowGap method outperforms every other individual method as far as recall is concerned. It also has optimal overall scores, even if it remains behind BagOfDeps and UnlabeledDeps in precision for individual VMWE categories. Not surprisingly, the recall of BagOfDeps is systematically higher than the recall of UnlabeledDeps, which in turn is systematically higher than the recall of LabeledDeps, since these heuristics rely on increasing degrees of syntactic constraints. However, this does not result in higher precision scores. On the contrary: BagOfDeps has the best precision of the three methods. This phenomenon may be partially explained by the presence of errors in SL.

7 The average length of a gap in a VMWE in the Polish PARSEME corpus is equal to 0.53 and its mean absolute deviation (MAD) is equal to 0.77. Since the LRs had not been manually annotated, analogous data for the gaps contained in LRs were not available in advance. As far as the LRs identified in this study (see below) are concerned, the average length of a gap and its MAD are equal to 1.1 and 1.2 respectively.
8 One Polish native speaker, a co-author of this paper, participated in this task. She was also the main annotator of the VMWE layer in the Polish PARSEME corpus.
9 Matches due to errors in the VMWE annotations were kept in Tab. 1. Correcting these errors would require a re-execution of the heuristics, which could bias our evaluation towards the underlying tool.

Category   WindowGap         BagOfDeps         UnlabeledDeps     LabeledDeps       All
           P    R    F       P    R    F       P    R    F       P    R    F       P    R    F
ID         0.41 0.88 0.56    0.50 0.19 0.27    0.67 0.13 0.21    n/a  0.00 n/a     0.43 1.00 0.60
IReflV     0.15 1.00 0.26    0.14 0.63 0.23    0.15 0.63 0.25    0.14 0.37 0.20    0.13 1.00 0.23
LVC        0.17 0.73 0.28    0.20 0.65 0.30    0.15 0.38 0.21    0.14 0.19 0.16    0.17 1.00 0.29
ALL        0.18 0.88 0.30    0.17 0.54 0.26    0.16 0.43 0.23    0.14 0.22 0.17    0.17 1.00 0.30

Table 1: Precision, recall and F-measure of the four heuristics.

All the results shown here rely on a maximum-coverage hypothesis (MCH), i.e. the assumption that the four heuristics, with g = 2, allow us to extract all LRs of the previously annotated VMWEs. This hypothesis is strong. Potentially, there could be an LR whose components have a gap longer than 2, and which was not extracted, e.g. due to non-connectivity in the dependency graph as in Fig. 1b, or due to an error in the SL. Ideally, we should thus examine all co-occurrences of the lexicalized components of a given VMWE, whatever their distance in the sentence. However, we estimate that this would triple the number of exact matches and require a much higher manual annotation effort. We thus performed a less labor-intensive experiment to assess the reliability of the MCH. We applied the WindowGap heuristic with a gap length of 9,999 (which exceeds all sentence lengths in the corpus) to the first 1,000 sentences of the corpus, which yielded 41 literal matches. Then, the matches previously seen (i.e. extracted by any of the four previously used heuristics) were eliminated, and the resulting 30 occurrences were manually labeled according to the same scenario as above. All of them were false positives, which suggests that the four heuristics would hardly ever miss any LRs among their literal matches.

As seen in Sec. 3, our heuristics are skewed towards high recall, which makes them practical for pre-identifying and manually validating LR candidates, but not optimal for automatic classification of IRs and LRs. Previous methods proposed for the latter task include (Fazly et al., 2009), where unsupervised MWE identification is based on statistical measures of lexical and syntactic flexibility of MWEs. There, the notion of an LR has a much larger scope than in our approach: it notably includes variants resulting from the replacement of lexicalized components by automatically extracted similar words, e.g. spill corn vs. spill the beans. Their test data are restricted to the 28 most frequent verb-object pairs and their manually validated IRs and LRs, i.e. accidental co-occurrences of the MWE components are excluded from performance measures (unlike in our approach). Their precision and recall in LR identification range from 0.18 to 0.86 and from 0.11 to 0.61, respectively. These results are hard to compare to Tab. 1, due to the very different understanding of the task and its experimental settings.

5 Corpus study

Given the manually identified true LRs, we can estimate the idiomaticity rate (IdRate) as follows:

IdRate_CAT = |IR_CAT| / (|LR_CAT| + |IR_CAT|)        (18)

where IR_CAT is the set of (idiomatic) VMWE occurrences of category CAT, LR_CAT is the set of true LRs of VMWEs of category CAT, and CAT ∈ {ID, IReflV, LVC, ALL}. As shown in Tab. 2, LRs of VMWEs in Polish are rare: the overall IdRate amounts to 0.978.10 This score is consistent with (Waszczuk et al., 2016), where the IdRate of Polish verbal, nominal, adjectival and adverbial MWEs is estimated at 0.95. It is, however, in sharp contrast to (Fazly et al., 2009), where the proportion of LRs of the most frequent English verb-object MWEs was estimated at 40%. This is probably due to the different understanding of LRs by these authors, and their relatively restricted experimental scope (cf. Sec. 4). Important cross-language factors might also influence the IdRate, such as the pervasiveness of lexicalized determiners like the/a in Germanic and Romance languages vs. the lack of their equivalents in Slavic ones. Tab. 2 also shows the per-category IdRate. Many IDs originated as metaphors, and this is reflected in the fact that IDs have the lowest IdRate, even if only slightly lower than the other categories. IReflVs, conversely, have the highest IdRate, despite homonymy, shown in examples (14) and (17).

10 This number was updated by accounting for the VMWE annotation errors identified during the manual validation (cf. Sec. 4).

Category   # LRs (tokens / types)   # IRs (tokens / types)   IdRate
ID         16 / 5                   322 / 219                0.953
IReflV     30 / 19                  1547 / 368               0.981
LVC        26 / 8                   1301 / 662               0.980
ALL        72 / 32                  3170 / 1249              0.978

Table 2: Idiomaticity rate per VMWE category and overall.

Category   MORPH (tokens / types)   SYNT (tokens / types)   OTHER (tokens / types)
ID         7 / 3                    8 / 2                   1 / 1
IReflV     8 / 3                    1 / 1                   21 / 16
LVC        18 / 2                   2 / 1                   6 / 5
ALL        33 / 8 (46%)             11 / 4 (15%)            28 / 22

Table 3: LRs distinguishable from VMWEs by constraints of various types.
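The IdRate column of Table 2 follows directly from equation (18); as a quick sanity check, the per-category values can be recomputed from the LR and IR token counts shown there:

```python
# Recomputing the IdRate column of Table 2 via equation (18),
# using the LR and IR token counts from the table.
counts = {            # category: (LR tokens, IR tokens)
    "ID":     (16, 322),
    "IReflV": (30, 1547),
    "LVC":    (26, 1301),
    "ALL":    (72, 3170),
}
for cat, (lr, ir) in counts.items():
    print(f"{cat}: {ir / (lr + ir):.3f}")
# ID: 0.953, IReflV: 0.981, LVC: 0.980, ALL: 0.978
```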

A close-up study of the 32 distinct VMWEs corresponding to the 72 LR tokens reveals that their individual IdRate varies greatly: from 0.20 for daje się (zauważyć X) 'allows RCLI (notice X)' ⇒ it is possible (to notice X) to 0.94 for czuć się (dobrze) 'feel RCLI (well)' ⇒ to feel (well). With a view to automatically distinguishing LRs from IRs, we studied the morphological and syntactic constraints imposed by VMWEs. We manually tagged the 72 LRs with one of the following labels:

• MORPH: the LR does not respect the morphological constraints imposed by the corresponding VMWE on one of its lexicalized components. For instance, the VMWE in example (10) requires the nominal component udział 'share' to occur in the singular. If this constraint were known, the occurrence in (11) could be automatically classified as literal. Morphological constraints can also concern the head verb, e.g. the VMWE in (19) allows no overt subject and restricts the finite forms of its head verb dać 'allow' to the 3rd person singular. Knowing this constraint would allow us to automatically identify (20), where the verb is inflected in the 2nd person imperative, as an LR.

• SYNT: the LR violates the syntactic constraints – other than the dependencies between its lexicalized components – imposed by the VMWE. This typically concerns dependencies between lexicalized components and external arguments or adjuncts. E.g., while the VMWE in (19) admits no overt subject, the LR in (21) does take a subject, pięćdziesięciolatka '50-year-old woman'. Also, the VMWE from (22) requires an infinitive complement, and its noun stan 'state' allows no modifier. If this constraint were known, the dependent of this noun in (23) would automatically imply an LR.

• OTHER: in order to distinguish an LR from IRs, more advanced (e.g. semantic) constraints would have to be verifiable. E.g., an LVC with the light verb mieć 'to have' in the present tense and occurring under the scope of negation, as in (24), is homonymous with the existential być 'to be', whose negation in the present tense is realized in Polish precisely by mieć 'to have', as in (25). Since Polish is a pro-drop language, the subject in (24) can be dropped, which makes both occurrences look identical. Also, IReflVs like the one in (26) are polysemous with reflexive, reciprocal, impersonal or middle-alternation uses, as in (27), and divergences in syntactic constraints are nonexistent or unverifiable (e.g. due to dropped arguments). Only powerful pragmatic mechanisms would allow these cases to be distinguished.

(19) dokładnych kwot nie da się wyliczyć 'exact amounts not allows.3.sing.fin.pres RCLI calculate' ⇒ the exact amounts cannot be calculated

(20) nie daj się zbywać ogólnikami 'not allow.2.sing.imper RCLI dispose-of with-commonplaces' ⇒ don't be disposed of with commonplaces

(21) Pięćdziesięciolatka nie da się na to złapać '50-year-old-woman not allows.3.sing.fin.fut RCLI on this catch' ⇒ a 50-year-old woman will not fall into this trap

(22) więcej nie jestem w stanie dokonać 'more not am in state to-do' ⇒ I am not able to do more

(23) trzech żołnierzy było w stanie krytycznym 'three soldiers were in state critical' ⇒ three soldiers were in a critical state

(24) (klient) nie ma powodów do satysfakcji '(client) not has reasons for satisfaction' ⇒ (the client) has no reasons to be satisfied

(25) nie ma powodów do satysfakcji 'not has reasons for satisfaction' ⇒ there are no reasons to be satisfied

(26) kandydaci znaleźli się w trudnej sytuacji 'candidates found RCLI in hard situation' ⇒ the candidates found themselves in a difficult situation

(27) kandydaci znaleźli się dopiero po tygodniu 'candidates found RCLI only after week' ⇒ the candidates were found only a week later

As shown in Tab. 3, 61% of the LRs can be automatically distinguished in the treebank from IRs if the morphological and syntactic constraints imposed by VMWEs are known, e.g. encoded in a valency dictionary (Przepiórkowski et al., 2017) or learned from a corpus. The remaining 39% of LRs call for powerful mechanisms which go beyond sentence boundaries and most lexical encoding frameworks. Note also that the percentage of VMWE types which exhibit any literal readings is relatively low (32 types out of 1249, i.e. 2.6%). This suggests that methods for MWE identification might benefit from language-specific components explicitly targeting those few expressions.

6 Conclusions and future work

The main contribution of this paper is a close examination of several aspects of literal readings (LRs) of VMWEs. Firstly, we defined the notion of an LR in terms of both the semantics of their components and their syntactic dependencies, which motivates their study in a treebank. We proposed four language-independent heuristics, oriented towards high recall and reasonable precision, for the task of automatically identifying LRs, given manually performed VMWE annotations in a treebank. We applied these heuristics to Polish data stemming from a multilingual corpus annotated for VMWEs following universal guidelines, and we manually validated the extracted LR candidates. The resulting dataset, available under an open license11, allowed us to show that automatic identification of LRs is a hard task, especially when syntactic annotations are created automatically. We also discovered that up to 61% of the LRs can be automatically distinguished from their idiomatic counterparts if data on the morphological and syntactic constraints imposed by VMWEs are available (e.g. lexically encoded or learned from a corpus). Last but not least, we showed that LRs are relatively rare in Polish: the idiomaticity rate of VMWEs is equal to 0.978, and only 2.6% of all VMWE types exhibit literal readings in our corpus.

The proposed heuristics can also be used as part of MWE annotation methods. In the context of PARSEME, a similar tool was used to check the consistency of VMWE annotations in the corpus, and to detect VMWE occurrences that were possibly missed during the annotation phase. Future work could investigate the extent to which the results from the different heuristics are statistically significant. The heuristics could also be extended to handle long-distance dependencies such as the one in (6). We also plan to apply this study to other languages from various language families covered by the PARSEME corpus, so as to check whether the tendencies discovered here hold. Preliminary studies in Portuguese show that the definition of an LR needs enhancements: not only the syntactic dependencies between the lexicalized components are to be preserved, but also their POS. This condition is necessary to avoid ambiguities, notably between the reflexive pronoun se 'RCLI' in IReflVs and the conjunction se 'if'. A further enhancement, useful for Slavic languages, might consist in merging aspectual pairs (perfective/imperfective) of VMWEs such as da się 'let.perf RCLI' ⇒ it will be possible (to) vs. daje się 'let.imp RCLI' ⇒ it is possible (to). Finally, the findings on LRs may enhance MWE identification methods. They may for instance yield useful hints for feature engineering, or may be used in a post-processing step to eliminate LRs wrongly recognized as variants of VMWEs seen in the training corpus.

11 http://clip.ipipan.waw.pl/MweLitRead

References

Cristina Cacciari and Paola Corradini. 2015. Literal analysis and idiom retrieval in ambiguous idioms processing: A reading-time study. Journal of Cognitive Psychology 27(7):797–811. https://doi.org/10.1080/20445911.2015.1049178.

Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, and Amalia Todirascu. 2017. Multiword expression processing: A survey. Computational Linguistics, to appear.

Ismail El Maarouf and Michael Oakes. 2015. Statistical Measures for Characterising MWEs. In IC1207 COST PARSEME 5th general meeting. http://typo.uni-konstanz.de/parseme/index.php/2-general/138-admitted-posters-iasi-23-24-september-2015.

Afsaneh Fazly, Paul Cook, and Suzanne Stevenson. 2009. Unsupervised type and token identification of idiomatic expressions. Computational Linguistics 35(1):61–103. https://doi.org/10.1162/coli.08-010-R1-07-048.

Marie-Sophie Pausé. 2017. Structure lexico-syntaxique des locutions du français et incidence sur leur combinatoire. Ph.D. thesis, Université de Lorraine, Nancy, France.

Jing Peng and Anna Feldman. 2016. Automatic idiom recognition with word embeddings. In SIMBig (Revised Selected Papers). Springer, volume 656 of Communications in Computer and Information Science, pages 17–29.

Jing Peng, Anna Feldman, and Ekaterina Vylomova. 2014. Classifying idiomatic and literal expressions using topic models and intensity of emotions. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pages 2019–2027. http://www.aclweb.org/anthology/D14-1216.

Adam Przepiórkowski, Jan Hajič, Elżbieta Hajnicz, and Zdeňka Urešová. 2017. Phraseology in two Slavic valency dictionaries: Limitations and perspectives. International Journal of Lexicography 30(1):1–38.

Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, and Antoine Doucet. 2017. The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions. In Proceedings of the EACL'17 Workshop on Multiword Expressions.

Livnat Herzig Sheinfux, Tali Arad Greshler, Nurit Melnik, and Shuly Wintner. 2017. Verbal MWEs: Idiomaticity and flexibility. In Representation and Parsing of Multiword Expressions, Language Science Press, Berlin, pages 5–38.

Jakub Waszczuk, Agata Savary, and Yannick Parmentier. 2016. Promoting multiword expressions in A* TAG parsing. In COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka, Japan. pages 429–439. http://aclweb.org/anthology/C/C16/C16-1042.pdf.

Querying Multi-word Expressions Annotation with CQL

Natalia Klyueva, The Hong Kong Polytechnic University, Hong Kong, [email protected]
Anna Vernerová, Charles University, Prague, Czech Republic, [email protected]

Behrang QasemiZadeh, Heinrich-Heine University, Düsseldorf, Germany, [email protected]

Abstract

This paper demonstrates a solution for querying corpora with multi-word expression (MWE) annotation using a concordance system. Namely, the PARSEME multilingual corpora, which contain manually annotated verbal multi-word expressions (VMWEs) in 18 languages, are converted to a suitable vertical format so that they can be explored using the Corpus Query Language (CQL). VMWEs encompass a range of categories such as idioms, light verb constructions, verb-particle constructions, and so on. Although these corpora were mainly developed for the purpose of developing automatic methods for the identification of VMWEs, we believe they are a valuable source of information for corpus-based studies. The solution proposed in this paper is an attempt to provide a method for exploring these corpora that is friendly to linguists and non-tech-savvy users. We show how CQL-enabled systems such as NoSke or KonText can be exploited for this purpose. Despite several limitations, such as problems related to discontinuous and coordinated MWEs, CQL is still an enabling tool for basic analysis of MWE-annotated data in corpus-based studies.

1 Introduction

Multi-word expressions (MWEs) are structures that cross word boundaries and thus present challenges to a variety of NLP tasks such as syntactic analysis and machine translation. The PARSEME (PARSing and Multi-word Expressions) EU COST action (Savary et al., 2015) addressed these problems by organizing a shared task on automatic identification of verbal MWEs (Savary et al., 2017)1. Verbal MWEs (VMWEs) were annotated in corpora in 18 languages according to common guidelines (Candito et al., 2017). Several categories of VMWEs were defined: idioms (ID), light verb constructions (LVC), inherently reflexive verbs (IReflV), verb-particle constructions (VPC), and OTH (other). The data for each of the languages is distributed in two files: .conllu and .parsemetsv. The first one contains morphosyntactic annotation in the CoNLL-U format2 and the second one contains the MWE annotation in the parseme-tsv format (an example follows). These representations are mainly optimized for machine readability and particularly for training predictive models. The data in these two files is not presented in an intuitive form suitable for search and retrieval scenarios involving human users. We present one of the approaches that can be used for this purpose (i.e., as a query system for MWE-annotated corpora). For example, one may easily retrieve frequency lists of MWEs and study them as key words in context.

The problem we face is not well-studied, though we can find some works related to the topic. Klyueva and Straňák (2016) introduce a mechanism for basic queries over syntactic trees by copying selected attributes of a node's parent to attributes of the child node itself, e.g. p_form, p_lemma, p_tag. A corpus with terminology (above all, multi-word terminology) annotation was represented in a vertical format with structural attributes used for encoding MWEs (QasemiZadeh and Schumann, 2016)3. Another web service that allows querying for MWEs is based on CQPWeb4, but the texts as well as the manual are only

1 http://multiword.sourceforge.net/sharedtask2017
2 http://universaldependencies.org/format.html
3 Online search at http://lindat.mff.cuni.cz/services/kontext/first_form?corpname=aclrd20_en_a.
4 http://yeda.cs.technion.ac.il/HebrewCqpWeb/

in Hebrew, which did not allow us to make a full study of the functionality. A survey of MWEs in treebanks is presented by Rosén et al. (2015), but the paper does not contain any directions on how to access and query MWEs.

This paper is structured as follows. Section 2 is devoted to corpus query systems suitable for querying corpora with annotation of MWEs. Section 3 introduces the data format in which the data is distributed and the necessary conversion to the vertical format behind CQL. Example queries can be found in Section 4. Our conversion scripts5 are intended for a subgroup of 15 PARSEME corpora that also feature syntactic annotation, thus excluding Bulgarian, Hebrew and Lithuanian, which are distributed without a .conllu file.

2 Corpus Query Systems

Tools to search corpora—corpus query systems—present a powerful and popular concept for digital humanities. We can distinguish several types of engines depending on their functionalities. The first group treats text in a linear manner (as a string of annotated words), e.g. Sketch Engine6 (Rychlý, 2007) or IMS Corpus Workbench7 (Evert and Hardie, 2011); the second group sees text as a group of trees, e.g. PML-TQ8 (Štěpánek and Pajas, 2010), INESS9 (Rosén et al., 2012) or TüNDRA10 (Martens, 2013). While the first group of tools is easier to maintain, and from the user's point of view the query language (CQL/CQP) is simpler than that of the second group, the treebank query languages of the tools in the second group have much greater expressive power. Concerning data preparation and compilation, treebank query systems use much more complex data formats, e.g. the native format of PML-TQ is an XML-based format called PML (Štěpánek and Pajas, 2010); for the web search, the data is indexed using a relational database and high-level queries are internally translated to SQL. In contrast, the vertical format required for concordance systems such as Sketch Engine or IMS almost corresponds to the original format of our corpora and requires less effort to make them available for search and retrieval. Multi-word units pose problems for both categories of corpus querying tools, since in both paradigms the basic unit that carries annotation is a token.

In this paper, we work with the open source corpus management system Manatee, which applies the linear paradigm; the suggested representation of data can then be exploited through either of the two open-source front-ends for Manatee, i.e. either through NoSke11 (a free edition of the Sketch Engine) or through KonText12 (a front-end developed by the Institute of the Czech National Corpus based on NoSke); the PARSEME corpora in this paper are available via both platforms.

3 Vertical Encoding of the Data

As mentioned earlier, the PARSEME corpora come in two files in two different formats for each language, one with morphosyntactic annotations (.conllu) and another with MWE annotation (.parsemetsv). Both formats represent a challenge to 'linear'-based corpus management tools, as they contain hierarchical or graph annotations (e.g., syntactic dependencies in CoNLL-U and discontinuous structures in parseme-tsv). In order to provide unified querying over both morphosyntactic and MWE annotations, we combine these two resources into a single file. The following is a sentence fragment in the .parsemetsv format:13

5 https://github.com/natalink/mwe_noske
6 http://sketchengine.co.uk
7 http://cwb.sourceforge.net
8 http://hdl.handle.net/11858/00-097C-0000-0022-C7F6-3
9 http://iness.uib.no
10 http://weblicht.sfs.uni-tuebingen.de/Tundra
11 https://nlp.fi.muni.cz/trac/noske, Parseme data http://corpora.phil.hhu.de/parseme
12 https://github.com/czcorpus/kontext, Parseme data http://lindat.mff.cuni.cz/services/kontext
13 The four columns contain the word id, the word form, information whether the token is followed by a space, and the MWE annotation.

1   Delegates   _   _
2   are         _   1:LVC
3   in          _   1
4   little      _   _
5   doubt       _   1
6   that        _   _
7   the         _   _
8   shadow      _   2:ID
9   cast        _   2
10  over        _   _
11  the         _   _
12  city        _   _
...

The fourth column encodes the MWE annotation of a token as follows. Tokens belonging to the same MWE are labeled with the same numerical identifier so that they can be distinguished as independent MWE units in a sentence. The first token of a particular MWE is additionally labeled with the category of the MWE. In case a token belongs to several MWEs, the respective tags are separated by a semicolon (e.g. 1:VPC;2:VPC). In our previous work (QasemiZadeh and Schumann, 2016), we were able to encode MWEs by structural14 attributes. In doing so, we were relying on annotation that was based on the largest-span policy—the data did not annotate MWEs that are part of other MWEs, nor overlapping MWEs. However, in the case of VMWEs, modeling overlapping structures is inevitable, and the use of structural attributes leads to complexities which can be avoided by the use of positional attributes. In our proposed format, the CoNLL-U attributes and the MWE annotations are both encoded as positional attributes (columns themselves). We use the following attributes to encode MWEs:

• mwe specifies the type of the MWE, e.g. LVC for a light verb construction or IReflV for an inherently reflexive verb;

• mwe_order has two possible values, first for the first word in the MWE and cont for all remaining ("continuation") words;

• mwe_id gives the consecutive number of the MWE within the sentence; we shall show later how this attribute helps to distinguish overlapping MWEs;

• mwe_lemma is a concatenation of the lemmas of all words that are part of the MWE, in the order in which they appear in the sentence, e.g. be in doubt.

In case one token is annotated as part of multiple MWEs, the MWE annotations attached to it are treated as multivalued. For instance, the sentence above will be represented as

1   Delegates   Delegates   NOUN    NNS   Number=Plur                        5   nsubj   _   _   _     _      _   _
2   are         be          AUX     VBP   Mood=Ind|Tense=Pres|VerbForm=Fin   5   cop     _   _   LVC   first  1   be in doubt
3   in          in          ADP     IN    _                                  5   case    _   _   LVC   cont   1   be in doubt
4   little      little      ADJ     JJ    Degree=Pos                         5   amod    _   _   _     _      _   _
5   doubt       doubt       NOUN    NN    Number=Sing                        0   root    _   _   LVC   cont   1   be in doubt
6   that        that        SCONJ   IN    _                                  9   mark    _   _   _     _      _   _

The attributes to search for are named exactly as in the CoNLL-U scheme (e.g. upostag), with an exception for the word form, which is called word in our concordance system instead of form as in CoNLL-U; they can be queried using the standard CQL syntax (see the screenshot from the UI in Figure 1).

14 Definitions of structural and positional attributes can be found at https://www.sketchengine.co.uk/corpus-configuration-file-all-features/.

[Figure 1: Search interface, attribute menu]

4 Example Queries

In this section we provide basic examples showing how to query the PARSEME corpora using CQL queries. We concentrate on a few examples that we believe can be most helpful in several scenarios.

The aim is to provide a range of examples to demonstrate both pros and cons of using CQL queries for exploring MWE-annotated corpora, given our representation structure using positional attributes. CQL queries are composed of blocks of the form [attribute="value"], in which the value expresses a condition over the given attribute. In its simplest form, a CQL query consists of only one attribute–value pair, and the value is an exact string, e.g., [word="test"], which in turn returns all the occurrences of the word form test in the corpus under investigation. However, these building blocks can be concatenated to form a more complex query involving a sequence of one or more tokens. Additionally, the attribute values may be specified through regular expressions, and simple logical operators such as 'and' (&) and 'or' (|) are also available. All queries in the following examples are linked to the KonText search tool, and the results for them can be seen online by clicking on them. In our examples, we use the French, Spanish and German corpora; however, all the queries are valid for the other languages in the PARSEME collection. These queries (and more) are also available in the online tutorial at https://ufal.mff.cuni.cz/lindat-kontext/parseme-mwe.

4.1 Continuous MWE Fragments

We start with a simple example:

[mwe_order="first"]

which results in a KWIC15 view containing the first word of each MWE in the corpus. Note that if the same word happens to be the first word of several MWEs (such as the word letting in they were letting us in and out), it will appear in the KWIC output only once. The query below will display and highlight continuous MWEs:

[mwe_order="first"][mwe_order="cont"]{1,}

In the case of MWEs with more than two tokens, the resulting concordance contains the same location more than once. For example, for a 3-token MWE, the KWIC view contains two lines: one with two tokens highlighted and another – the same sentence – with three tokens, as displayed in Figure 2. One immediate solution to remove unwanted duplicates in the output is to use the so-called overlaps/sub-hits filter, which is supported in the NoSke system: only one of the matches is kept, whilst the other lines matching around the same position are omitted from the output.16

15 Key word in context.
16 Another solution is to use an additional positional attribute to specifically mark the last token of MWEs. In this case, the corresponding attribute–value pair can be added to the end of the proposed query.

[Figure 2: The same occurrence of an MWE retrieved twice.]

4.2 MWEs with Discontinuity

Intuitively, when creating a query we want to see only tokens belonging to MWEs in the concordance. This is not straightforward in a 'linear' corpus query system: for discontinuous constructions, we cannot reach a state in which all MWEs are highlighted as a whole through only one interaction with the underlying corpus management system. In current implementations of KWIC, the intermediate words (not belonging to the MWE) will be highlighted as well in this case. In order to show not only the first word of the MWE but also its continuation, a more complex query that matches also the nodes in between must be executed:

1:[mwe_order="first"] []* 2:[mwe_order="cont"] & 1.mwe_id=2.mwe_id within <s/>

This query will match the first token in an expression, anything in between, and the continuation of the MWE. To avoid greedy matching (the []* could overshoot and match other MWEs in the sentence), we add the condition that the MWE id tags of the first token and the continuation part must be the same (as stipulated by the condition & 1.mwe_id=2.mwe_id, which uses 1 and 2 as the names previously given to the two nodes in 1:[...] and 2:[...]); because the values of mwe_id are only unique within a sentence, we also make sure both tokens belong to the same sentence through the within condition.

The previous query will match only two tokens in each MWE. In case more tokens need to be highlighted, the following query has to be evaluated:

1:[mwe_order="first"] []* 2:[mwe_order="cont"] []* 3:[mwe_order="cont"] & 1.mwe_id=2.mwe_id & 1.mwe_id=3.mwe_id within <s/>

Another method for highlighting just the two tokens belonging to the same MWE, but not the intermediate words, is through the use of the meet operator:

(meet 1:[mwe_order="first"] 2:[mwe_order="cont"] 0 5) & 1.mwe_id=2.mwe_id within <s/>

The meet operator with parameters 0 5 then formulates the condition that node 2 must be at most 0 words to the left and at most 5 words to the right of node 1.

4.3 Overlapping and Embedded MWEs

A single token may be part of multiple MWEs in two cases. In the first case, one MWE is embedded in another one, as in the case of the Czech LVC dát se v let 'begin flying', which contains the inherently reflexive verb dát se 'enter into, begin'.

In the second case, two MWEs overlap without being embedded in each other. This happens particularly in cases of coordination mixed with ellipsis, as in this sentence fragment:

1   They      _   _
2   were      _   _
3   letting   _   1:VPC;2:VPC
4   us        _   _
5   in        _   1
6   and       _   _
7   out       _   2

Here a full linguistic analysis would first expand this fragment to they were letting us in and they were letting us out, in which case the two MWEs would not overlap. However, the Parseme annotation style does not attempt to restore elided tokens and instead annotates the token that is present in the sentence as belonging to both coordinated MWEs. The following query matches all nodes that belong to multiple MWEs simultaneously:

[mwe=".*;.*"]

In the case of coordinated MWEs, we further expect that both MWEs are of the same type, which translates into

[mwe="(.*);\1"]

On the other hand, two tokens that share the same pair of MWE ids typically (although not necessarily) belong to a pair of overlapping MWEs:

(meet 1:[mwe_id="(.*;.*)"] 2:[] 1 5) & 1.mwe_id=2.mwe_id within <s/>

4.4 Queries Involving Morphosyntactic Information

Evidently, CQL queries can be formulated to simultaneously make use of annotations that are specific to MWEs and those that express other linguistic information, such as morphosyntactic information about their building blocks. For example, the following query:

1:[mwe_order="first" & upostag="VERB" & mwe="LVC"] []* 2:[mwe_order="cont" & upostag="NOUN"] & 1.mwe_id=2.mwe_id within <s/>

finds all light verb constructions where the real syntactic head goes first. Similarly, a simple query such as

[mwe="LVC" & upostag="VERB"]

followed by a request for a frequency list (through the user interface) returns the frequency list of verbs used in LVCs. Above we listed basic queries which do not involve constraints on word forms, lemmas, or language-specific morphological tags—examples of this sort can be found at https://ufal.mff.cuni.cz/lindat-kontext/parseme-mwe.
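To complement these queries, the sketch below illustrates the kind of conversion described in Section 3, i.e. deriving the vertical-format positional attributes for one token from its CoNLL-U columns and its parseme-tsv MWE column. This is an illustrative reconstruction, not the project's actual converter (which lives at https://github.com/natalink/mwe_noske): the helper name merge_token is ours, the two input files are assumed to be token-aligned, and the mwe_lemma attribute (which requires a second pass over the whole sentence) is omitted.

```python
def merge_token(conllu_cols, mwe_col, seen_ids):
    """Append mwe, mwe_order and mwe_id positional attributes to one token.
    `seen_ids` maps an MWE id to its category and must be reset per sentence."""
    if mwe_col in ("_", ""):
        return conllu_cols + ["_", "_", "_"]
    mwes, orders, ids = [], [], []
    for part in mwe_col.split(";"):        # e.g. "1:VPC;2:VPC" or just "1"
        if ":" in part:                     # first token of this MWE
            mwe_id, mwe_type = part.split(":")
            seen_ids[mwe_id] = mwe_type
            orders.append("first")
        else:                               # continuation token
            mwe_id = part
            orders.append("cont")
        ids.append(mwe_id)
        mwes.append(seen_ids[mwe_id])
    return conllu_cols + [";".join(mwes), ";".join(orders), ";".join(ids)]

seen = {}
print(merge_token(["2", "are", "be", "AUX"], "1:LVC", seen))
# ['2', 'are', 'be', 'AUX', 'LVC', 'first', '1']
```

Because a token may belong to several MWEs, every derived attribute is written as a semicolon-joined multivalue, mirroring the source annotation described in Section 3.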

5 Conclusion and future work

Concordance systems (in our paper KonText and NoSke, but also their ancestors such as the Sketch Engine and the IMS Open Corpus Workbench) for exploring corpora using CQL queries are well-known tools among linguists for applications such as lexicography. We believe that these systems are also effective tools for exploring MWE-annotated corpora, particularly in the absence of sufficient resources for developing specialized tools for their manipulation. To this end, we show a method to encode and query an MWE-annotated corpus in a concordance system; this can facilitate the search and retrieval of MWEs in corpus-based studies.

One possible area of future work is to extend the current interfaces' capability to handle search and retrieval of discontinuous structures, e.g. by extending the concept of the "key word in context" to "key words in context" (KWsIC), with "context" denoting not just the left and right context, but also the intermediate context between the KWIC words. The meet operator goes some way towards this goal, but is not sufficient for more complex cases such as KWsIC consisting of three or more tokens; we propose to add a new operator of the form (all [attribute="value"] within <s/>).

Acknowledgments

The first author has been supported by the postdoctoral fellowship grant of the Hong Kong Polytechnic University, project code G-YW2P. The second author has been supported by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071); the data is shared through services developed and/or maintained within the same project.

References

Marie Candito, Fabienne Cap, Silvio Cordeiro, Vassiliki Foufi, Polona Gantar, Voula Giouli, Carlos Herrero, Mihaela Ionescu, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Carla Parra Escartín, Manfred Sailer, Carlos Ramisch, Monica-Mihaela Rizea, Agata Savary, Ivelina Stonayova, Sara Stymne, and Veronika Vincze. 2017. Parseme shared task on automatic identification of verbal MWEs - edition 1.0. Annotation guidelines.

Stefan Evert and Andrew Hardie. 2011. Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics 2011 conference. University of Birmingham, UK.

Natalia Klyueva and Pavel Straňák. 2016. Improving corpus search via parsing. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association, Paris, France, pages 2862–2866.

Scott Martens. 2013. TüNDRA: A Web Application for Treebank Search and Visualization. In Proceedings of The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12). pages 133–144.

Behrang QasemiZadeh and Anne-Kathrin Schumann. 2016. The ACL RD-TEC 2.0: A language resource for evaluating term extraction and entity recognition methods. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28, 2016. http://www.lrec-conf.org/proceedings/lrec2016/summaries/681.html.

Victoria Rosén, Gyri Smørdal Losnegaard, Koenraad De Smedt, Eduard Bejček, Agata Savary, Adam Przepiórkowski, Petya Osenova, and Verginica Barbu Mititelu. 2015. A survey of multiword expressions in treebanks. In Proceedings of the 14th International Workshop on Treebanks & Linguistic Theories conference. Warsaw, Poland.

Victoria Rosén, Koenraad De Smedt, Paul Meurer, and Helge Dyvik. 2012. An open infrastructure for advanced treebanking. In Jan Hajič, Koenraad De Smedt, Marko Tadić, and António Branco, editors, META-RESEARCH Workshop on Advanced Treebanking at LREC2012. pages 22–29.

Pavel Rychlý. 2007. Manatee/Bonito - a modular corpus manager. In 1st Workshop on Recent Advances in Slavonic Natural Language Processing. Masaryk University, pages 65–70.

Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova, and Antoine Doucet. 2017. The PARSEME Shared Task on Automatic Identification of Verbal Multi-word Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017). Valencia, Spain.

Agata Savary, Manfred Sailer, Yannick Parmentier, Michael Rosner, Victoria Rosén, Adam Przepiórkowski, Cvetana Krstev, Veronika Vincze, Beata Wójtowicz, Gyri Smørdal Losnegaard, Carla Parra Escartín, Jakub Waszczuk, Matthieu Constant, Petya Osenova, and Federico Sangati. 2015. PARSEME – PARSing and Multi-word Expressions within a European multilingual network. In 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2015). Poznań, Poland. https://hal.archives-ouvertes.fr/hal-01223349.

Jan Štěpánek and Petr Pajas. 2010. Querying diverse treebanks in a uniform way. In Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/summaries/381.html.

REALEC learner treebank: annotation principles and evaluation of automatic parsing

Olga Lyashevskaya, School of Linguistics, National Research University Higher School of Economics, and Vinogradov Institute of the Russian Language RAS, Moscow, [email protected]
Irina Panteleeva, School of Linguistics, National Research University Higher School of Economics, Moscow, [email protected]

Abstract

The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in a general English course. The original corpus is manually annotated for learners' errors and gives information on the error span, the error type, and the possible correction of the mistake provided by experts. Syntactic dependency annotation adds more value to learner corpora, since it makes it possible to explore the interaction of syntax and different types of errors. It also helps to assess the syntactic complexity of learners' texts. While adjusting existing dependency parsing tools, one has to take into account to what extent students' mistakes provoke errors in the parser output. Ungrammatical and stylistically inappropriate utterances may challenge parsers' algorithms trained on grammatically appropriate academic texts. In our experiments, we compared the output of the dependency parser UDPipe (trained on ud-english 2.0) with the results of manual parsing, placing a particular focus on parses of ungrammatical English clauses. We show how mistakes made by students influence the work of the parser. Overall, UDPipe performed reasonably well (UAS 92.9, LAS 91.7). We provide an analysis of several cases of erroneous parsing which are due to the incorrect detection of a head, on the one hand, and to the wrong choice of the relation type, on the other. We propose some solutions which could improve the automatic output and thus make syntax-based learner corpus research and the assessment of syntactic complexity more reliable. The REALEC treebank is freely available under the CC BY-SA 3.0 licence.1

1 Introduction

The diversity of research based on learner corpora is increasing in the fields of language acquisition and language teaching methodology. The manual and automatic analysis of texts written by learners leads to the creation of various tools used for pedagogical purposes, namely, for improvements in teaching techniques achieved by paying attention to frequent errors that have been made by generations of learners. Linguistic data obtained in the analysis of learner corpus texts serve as a basis not only for teaching but also for evaluating works written by people learning a language.

Using automatic tools on learner corpora to check the progress of language learning is a frequent line of work. For example, Cobb and Horst point out the importance of such analysis of learners' essays (Cobb and Horst, 2015). Berzak et al. (2016) introduce a publicly available syntactic treebank for English as a Second Language (ESL), which provides manually annotated POS tags and Universal Dependencies (UD), against which the data obtained from a parser can be checked. Moreover, ESL annotation allows for consistent syntactic treatment of ungrammatical English texts. Many applications based on syntactic parsing have been created in cooperation with Danielle McNamara, cf. (Graesser

1 https://github.com/olesar/REALECtreebank

et al. (2011), in which the results on the linguistic evaluation of complexity are presented. One more complexity analyzer is presented in (Lu and Haiyan, 2016); this work provides a set of simple criteria such as the length of each clause, the number of dependent clauses, and so on. In (Ragheb and Dickinson, 2017), the authors discuss how to improve syntactic annotation for learner language by clarifying the properties which the layers of annotation refer to. They also show the annotation mistakes that could be corrected with the help of some tools. The list of studies in learner data syntactic parsing also includes (Rosén and Smedt, 2010), who explore how dependency annotation complements the annotation of errors, and (Schneider and Gilquin, 2016), who focus on innovations in learners' grammar revealed by parsing, to name just a few. In (Rooy and Schäfer, 2002), Bertus van Rooy and Lande Schäfer present the idea that spelling errors cause errors in parsing. They also show how learners' errors influence the performance of taggers. Our research, as we hope to show, confirms this as well.

In (Vinogradova et al., 2017), syntactic complexity is discussed with examples from REALEC. The paper presents the results of a syntactic analysis made by parsing the sentences and taking into account the mean sentence depth and the average number of relative clauses, other adnominal clauses, and adverbial clauses. There we determined how much these criteria influence the syntactic complexity of an essay. The analysis showed that the mean sentence depth is insignificant for the evaluation of a text, while the average number of clauses, on the contrary, is the feature distinguishing better works (scored 75% and higher) from all others.

In the section 'Original data' we present the data on which this research is based. The next part of the text, 'Dependency annotation scheme', shows how we worked on the examples from the corpus. The section 'Choice among alternatives' explains how we chose among annotation options. The next section presents the sample of our research and also reports which tool we used. In the section 'Confusion matrix and causes of errors' we show the relations which are frequently confused in students' essays. In 'Constructions that require attention', examples from the corpus that cause errors in the parser's work are presented.

2 Original data

The treebank annotations reported in this article are based on the materials from the publicly available corpus REALEC (Russian Error-Annotated English Learner Corpus), see (Vinogradova, 2016; Vinogradova et al., 2017).2 It is an open-access collection of English texts written by Russian-speaking students of English. The resource consists of more than 3,500 pieces written by Bachelor students while preparing for the English examination. Students' errors are annotated manually by experts (EFL instructors and trained students). Error labels are divided into groups depending on the type of error (spelling, punctuation, grammar, vocabulary, and discourse, with the last three further subdivided according to a detailed categorisation scheme).
Experts mark the error span, assign to it one error tag or a few tags, and suggest a corrected version of the span. The original corpus is also equipped with tools for searching and downloading.

3 Dependency annotation scheme

We have chosen the Universal Dependencies framework (Nivre et al., 2016) since it allows one to present typologically diverse treebanks in a comparable format and provides a certain matching of different types of dependency relations across languages. There are 32 dependency relation types provided by parsers trained on the english ud 2.0 data, among them subject and object, relative, adverbial and adnominal clauses, conjunction, auxiliary and copula, and parataxis.

There exist two common approaches to the syntactic annotation of learner and other insufficiently edited data: 'literal' labeling describes the way two words are related given their formal properties (Lee et al., 2017), whereas an alternative design bears on the notion of 'intended' usage, and experts are asked to consider the functional rather than the formal side of the utterance and to try to reconstruct what the intended meaning of the author was. (1) and (1') below illustrate an original sentence and its 'intended'

2 http://realec.org

reading (a partly corrected version). In (1), the phrases On the other hand is Tokyo and Tokyo situated in Japan present two locally well-formed syntactic structures, but their combination within the whole tree is problematic for the 'literal' approach. As for the 'intended-usage' approach, it is prone to the word-order-related issues that reflect native patterns of Russian speakers. Conveniently, the corpus is already annotated for students' errors, so our experts can make use of the 'suggested corrections' provided in that layer. However, we do not ask the treebank annotators to rewrite sentences in the correct way, as the intended reading is only implied.

(1) On the other hand is Tokyo also situated in Japan but it is big megapolise with the 1927 millions of people.
(1') On the other hand, Tokyo is also situated in Japan, but it is a big megapolis...

[Dependency diagram for (1): automatic parse arcs labeled obl, acl, cop above the text; gold arcs labeled aux, nsubj, obl below.]

In the schemes that follow, we show the automatic output (edges above the text) and gold parses (edges below the text), respectively.

4 Choice among alternatives

There can be multiple alternatives for possible corrections, in which case the principle of minimal editing distance seems to be relevant. For example, for sentence (2), two readings arise.

(2) In the second part if the 20th century, there were founded another three major railway systems, which although had significantly worse harasteristics.

The first reading is the one chosen by the automatic parser, but grammar-wise it is not quite correct. We have chosen the option where we change if to of. In this case we also have to change the label of the primary relation from 'mark' to 'case'.

[Dependency diagram for (2), fragment 'In the second part if the 20th century, there were founded another three ... systems': automatic parse arcs labeled obl, advcl, mark, nsubj:pass, obj above the text; gold arcs labeled case, expl, nsubj:pass, nmod, obl below.]

5 Parsing and manual corrections

We needed an easy-to-use parser which would provide information about parts of speech, syntactic groups, and dependency relations between words, and which would represent the syntax trees for more convenient counting, so the choice fell on UDPipe (Straka et al., 2016; Straka and Straková, 2017)3, trained on the english ud 2.0 treebank. Like any parser, UDPipe makes mistakes, and it was important to evaluate the output for the purposes of our project and to assess to what extent these mistakes are imposed by students' errors in orthography, morphology, and syntax. For the research, 373 random sentences (7196 tokens, including 756 punctuation marks) from students' essays were processed with the UDPipe parser. The parser detected the heads correctly for 6688 out of 7196 nodes (UAS 92.9%), of which 6600 were labeled correctly (LAS 91.7%). Overall, 6894 nodes (95.8%) were labeled correctly, which suggests that it was the disfluencies that affected the tree structures, rather than the functions.
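As an illustration of this setup, the following sketch parses a sentence with the ufal.udpipe Python bindings and scores a prediction against a gold file. The model filename is a placeholder, and the scoring helper is our simplification; it assumes both CoNLL-U inputs share the same tokenization:

```python
from ufal.udpipe import Model, Pipeline, ProcessingError  # pip install ufal.udpipe

model = Model.load("english-ud-2.0.udpipe")   # placeholder model file name
pipe = Pipeline(model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu")
predicted = pipe.process("The graph below presents to us ...", ProcessingError())

def uas_las(gold_conllu, pred_conllu):
    """Compute UAS/LAS over two CoNLL-U strings with identical tokenization."""
    def rows(text):
        return [line.split("\t") for line in text.splitlines()
                if line and not line.startswith("#")
                and line.split("\t")[0].isdigit()]   # skip multiword ranges
    gold, pred = rows(gold_conllu), rows(pred_conllu)
    uas = sum(g[6] == p[6] for g, p in zip(gold, pred)) / len(gold)           # head
    las = sum(g[6] == p[6] and g[7] == p[7]
              for g, p in zip(gold, pred)) / len(gold)                        # head + deprel
    return uas, las
```

Columns 6 and 7 of a CoNLL-U row hold the head index and the dependency relation, which is all the attachment scores need.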

3 http://ufal.mff.cuni.cz/Parsing

6 Confusion matrix and causes of errors

Table 1 illustrates the confusion matrix for the most frequent mismatches in relation types. The totals are calculated for all relations.

Predicted labels (columns): acl, nsubj, nummod, amod, case, obj, obl, root, nmod, compound, conj, others. Each gold-label row below lists its non-empty counts in column order; empty cells of the original layout were not preserved in extraction.

acl:       36, 1, 4, 1
nsubj:     475, 1, 5, 1, 9, 1, 2, 7, 5
nummod:    227, 3, 2, 1, 1
amod:      3, 387, 1, 1, 4, 6, 1
case:      994, 7
obj:       2, 2, 246, 1, 1, 2, 4, 7, 1
obl:       1, 1, 1, 405, 1, 10, 1
root:      1, 1, 1, 348, 5, 3, 8, 9
nmod:      3, 1, 4, 1, 1, 15, 1, 465, 6, 6, 6
compound:  1, 5, 1, 5, 141, 3
conj:      2, 2, 4, 2, 2, 3, 1, 5, 270, 7
others:    2, 4, 2, 3, 7, 9, 2, 4, 15

Table 1: Confusion matrix of relation types.
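Such a matrix is straightforward to tally from aligned gold and predicted CoNLL-U rows; a minimal sketch (reusing the rows() format assumptions from the scoring helper above, which are ours):

```python
from collections import Counter

def deprel_confusions(gold_rows, pred_rows):
    """Count (gold deprel, predicted deprel) pairs over aligned tokens;
    pairs with equal members lie on the diagonal of the confusion matrix."""
    return Counter((g[7], p[7]) for g, p in zip(gold_rows, pred_rows))

# Most frequent label mismatches, largest first:
# [(pair, n) for pair, n in matrix.most_common() if pair[0] != pair[1]]
```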

The most frequent relation errors are mismatches between root and adjectival modifier, root and nominal subject, object and nominal modifier, root and nominal modifier, conjunction and root, and adnominal modifier and conjunction. There are different causes of incorrect detection of the relation type. Some of them depend on failures at another parsing stage, for example, incorrect detection of the head of the sentence (confusion between root and other relations), incorrect detection of the syntactic group, or incorrect detection of the part of speech, while still others are the result of learner errors.

7 Constructions that require attention

We have identified the cases in which the parser most often makes mistakes. The following examples present errors that arise because of the ungrammatical nature of the sentences, or because of the parser's own deficiencies.

7.1 Typical errors made by Russian students

In learner corpus essays, L1-interference mistakes often occur, and our sample also contains such cases. The errors can be connected with calques, with the possibility of omitting the auxiliary verb in Russian where in English this is not possible, or with the absence of a category in the L1: for example, articles, perfect forms of the verb, and several types of relative clauses are all absent in Russian, to name just a few. For example, sentence (3) has a calque mistake critical to building an appropriate syntactic structure: there is a conjunction (but) between the noun phrase and the clause, and there is a double coordinating conjunction but and between the two adjectives, oldest and longest.

(3) The oldest railway system in London, but it is not only the oldest, but and the longest – three hundred ninety four kilometres of route.

[Dependency diagram for (3): automatic parse (labels root, conj, nmod, nsubj) over 'The oldest railway system in London, but it is not only the oldest, but and the longest...'; gold parse (labels nsubj, nsubj, conj, cc, conj, root) over the intended reading 'The oldest railway system is in London, but it is not only the oldest, but {also} the longest...'.]

The phrase The oldest railway system in London can be considered as (a) an appositive linked to the pronoun it in the main clause; (b) a part of a concessive clause (with being omitted); or (c) a part of the main clause where the copula is omitted after the subject The oldest railway system.

The next example presents a frequent mistake made by Russian students: the use of a large number of specifying words. Because of them, the parser determines the head of the sentence incorrectly.

(4) Accordingly, the same situation as in the proportion of skilled vocational diploma is in postgraduate diploma.

The parser determines the noun situation as the head of the word accordingly, but the right choice here is the root of the whole sentence, diploma. As the head of the introductory phrase is too far away, the parser takes the closest possible word as the head. The head of the introductory word should always be the root of the whole sentence.

7.2 Errors influenced by word order

Sentence (5) demonstrates the wrong SV word order typical of students' writing. In the gold representation, this mistake is reflected in a non-projective tree.

(5) On the other hand is Tokyo also situated in Japan but it is big megapolise with the 1927 millions of people.

[Dependency diagram for (5), fragment 'On the other hand is Tokyo also situated in Japan': automatic parse labels root, obl, cop, nsubj, obl above the text; gold labels obl, obl, obl, nsubj, aux, root below.]

However, it can be seen that even in well-formed sentences, parsing errors can be explained by non-standard word order patterns. Sentence (6) has an ambiguity in reading presents as a noun or as a verb, the former being chosen by the parser. As a result, the adverbial modifier below comes after its nominal head (graph), thereby evoking the reading of the segment below presents as a PP.

(6) The graph below presents to us, that between 1983 and 2030 in Japan it rise from 3 procent to 10 procent, but in Sweden it is a little fall to 13 procent, but there was a high growth to 20 procent in 2010.

[Dependency diagram for the fragment 'The graph below presents to us, that ...': the automatic parse tags presents as NOUN (labels root, acl, nmod, nmod, case); the gold parse tags it as VERB (labels advmod, obl, nsubj, xcomp, root).]

7.3 Spelling and grammar mistakes made by students

We investigated to what extent misspelt words affect the parser's quality. Comparison of the automatic and gold parses of (7) with those of its 'improved' version (7') demonstrates that verb agreement is critical for parsing.

(7) The persent of old people in the USA stay constant (14 %) from 1980 to 2020 and rising quicly (23%) during next 20 years.
(7') The percentage of old people in the USA stays constant (14 %) from 1980 to 2020 and rises quickly (23%) during the next 20 years.

[Dependency diagram for the fragment 'The persent of old people in the USA stay constant': the automatic parse tags both persent and stay as NOUN (labels root, case, det, compound, compound); the gold parse tags stay as VERB and constant as ADJ (labels det, case, nmod, root).]

The schemes show that grammatically correct sentences are parsed better than those with spelling and grammatical mistakes. We suggest that this problem could for the most part be solved with the help of a common spellchecker. That would allow us to analyze the syntactic structure of sentences while ignoring those of the students' grammar and spelling errors that do not influence syntactic complexity. Generally, the modification of the grammar showed that grammatically correct statements are parsed more accurately than those that contain errors. The main mistake of the parser is the wrong detection of the part of speech. It causes the wrong detection of the sentence root, which is considered critical for parsing and entails other errors (in head detection and, consequently, in the type of relation). Accordingly, spelling correction performed before parsing would reduce the number of errors made by the parser.
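As an illustration of that suggestion, and not the tool used in the paper, a preprocessing pass with the off-the-shelf pyspellchecker package might look like this:

```python
from spellchecker import SpellChecker  # pip install pyspellchecker

spell = SpellChecker()

def correct_tokens(tokens):
    """Replace out-of-vocabulary tokens with the most likely correction
    before the text is handed to the parser; known words pass through."""
    unknown = spell.unknown(tokens)          # lowercased OOV tokens
    return [(spell.correction(t) or t) if t.lower() in unknown else t
            for t in tokens]

print(correct_tokens(["The", "persent", "of", "old", "people"]))
# the misspelt "persent" is replaced by a dictionary word, e.g. "present"
```

A dictionary-based corrector can of course only catch non-word errors such as persent or quicly; agreement errors like stay for stays, which example (7) shows to be equally damaging, would still reach the parser.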

7.4 Participial construction not recognized by the parser

(8) Tokyo railway, opened in 1927, was only 155 kilometres on route but, compare to previous system, helped to travel to almost 2000 millions passengers.

In (8), the participle opened is parsed as the root of the sentence. As the parser chooses the part of speech incorrectly, an error arises: opened is identified as a verb, and it thereby becomes more probable that this word will be the root of the sentence. The probability of choosing opened as a verb and the head of the sentence is higher than the probability of choosing kilometres as the head of the sentence.

7.5 Syntactic homonymy

(9) Meanwhile, in USA there was 9 procent of people aged 65 and over in 1940, then in 1960 it increased by 10 procent.

[Dependency diagram for (9), fragment 'MAIN CLAUSE, then in 1960 it increased by 10 procent': automatic parse labels conj, advmod, obl, obl above; gold labels obl, obl, advmod, conj below.]

Here we can see that the linking word then does not refer to the whole sentence: it is parsed as a clarification of the adverbial modifier of time in 1960. This is not a critical mistake, but the automatic parsing slightly changes the meaning of the statement.

8 Conclusion

This paper presents the REALEC learner treebank, automatically annotated by UDPipe and then manually corrected. We provide an evaluation of the automatic parsing output and explore what types of learners' errors are critical for the parser.

We confirmed the idea of van Rooy and Schäfer, who claimed that if we check the spelling in essays before applying a parser, errors that are not related to the syntax will not affect the evaluation of syntactic complexity. This conclusion leads to the idea that advanced annotated learner corpora should have a spellchecker which not only analyses the spelling but also improves the work of various automatic tools.

Studying the output of the UDPipe parser, we found that phrases like a chart below or 7 years old, which occur frequently in the academic register of English, are parsed incorrectly. In such cases, the parser fails to identify the head of the phrase, which in turn causes further parser errors and requires a large amount of manual correction.

The results obtained will help to improve the quality of the parser and of the annotation in learner corpora. Firstly, we have identified a list of typical error-provoking patterns based on the collection of reannotated sentences; in the future, the inventory of such patterns will be expanded. Secondly, as the amount of openly accessible annotated learner data grows, we will conduct a series of experiments on parser training and compare models trained on grammatically correct texts with those involving learner data.

For future work, we also plan to increase the size of our treebank by taking more samples from the learner corpus REALEC. We would also like to use dependency parsing to improve the quality of the corpus annotation.

Acknowledgements

The paper was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2016–17 (grant No 16-05-0057) and by the Russian Academic Excellence Project «5-100». We would like to thank the participants of the Research Team Project "Learner corpus REALEC: Lexicological observations" and, in particular, Olga Vinogradova for discussions and suggestions.

References

Berzak, Y., Kenney, J., Spadine, C., Wang, J. X., Lam, L., Mori, K. S., Garza, S., and Katz, B. (2016). Universal dependencies for learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 737–746.

Cobb, T. and Horst, M. (2015). Learner corpora and lexis. In The Cambridge Handbook of Learner Corpus Research, pages 185–206. Cambridge University Press.

Graesser, A., McNamara, D., and Kulikowich, J. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40(5):223–234.

Lee, J., Leung, H., and Li, K. (2017). Towards universal dependencies for learner Chinese. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May 2017, volume 135.

Lu, X. and Ai, H. (2015). Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing, 29:16–27.

Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., and Zeman, D. (2016). Universal dependencies v1: A multilingual treebank collection. In Proceedings of the Language Resources and Evaluation Conference (LREC'16).

Ragheb, M. and Dickinson, M. (2012). Defining syntax for learner language annotation. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Poster Session, pages 965–974.

van Rooy, B. and Schäfer, L. (2002). The effect of learner errors on POS tag errors during automatic POS tagging. Southern African Linguistics and Applied Language Studies, 20(4):325–335.

Rosén, V. and De Smedt, K. (2010). Syntactic Annotation of Learner Corpora, pages 120–132.

Schneider, G. and Gilquin, G. (2016). Detecting innovations in a parsed corpus of learner English. International Journal of Learner Corpus Research, 2(2):177–204.

Straka, M., Hajič, J., and Straková, J. (2016). UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 4290–4297.

Straka, M. and Straková, J. (2017). Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Vinogradova, O. (2016). The role and applications of expert error annotation in a corpus of English learner texts. In Computational Linguistics and Intellectual Technologies. Proceedings of Dialog 2016, volume 15, pages 740–751.

Vinogradova, O., Lyashevskaya, O., and Panteleeva, I. (2017). Multi-level student essay feedback in a learner corpus. In Computational Linguistics and Intellectual Technologies. Proceedings of Dialog 2017, volume 16, pages 382–396.

A semiautomatic lemmatisation procedure for treebanks. Old English strong and weak verbs.

Darío Metola Rodríguez
University of La Rioja
Edificio Filologías
26004, Logroño (La Rioja), Spain
[email protected]

Marta Tío Sáenz
University of La Rioja
Edificio Filologías
26004, Logroño (La Rioja), Spain
[email protected]

Abstract

The aim of this paper is to present a semiautomatic lemmatisation procedure, implemented on database software, that is compatible with the morphological tagging of treebanks. The language of analysis is Old English, for which parsed corpora are available but not lemmatised. The scope of the paper includes both strong and weak verbs. After describing the lemmatisation procedure, the paper discusses the results of automatic searches and compares them with the lists of inflectional forms by lemma provided by other lexicographical sources. The conclusions address the limits of automatic lemmatisation and its compatibility with treebank parsing.

1 Introduction

This paper deals with lemmatisation in a corpus of Old English and focuses on the seven classes of strong verbs and classes 1 and 2 of weak verbs. The analysis reported here is based on the lemmatiser Norna, a building block of the lexical database of Old English Nerthus (www.nerthusproject.com). Norna, in turn, draws on the information available from the Dictionary of Old English Corpus (hereafter DOEC; Healey et al., 2004), the Helsinki Corpus (Rissanen et al., 1991), the York-Toronto-Helsinki Parsed Corpus of Old English Prose (Taylor et al., 2003) and the York-Helsinki Parsed Corpus of Old English Poetry (Pintzuk and Plug, 2001). Of these, only the two York corpora are parsed. The parsing includes syntactic categories and functions as well as lexical and morphological tagging, which is why these corpora are commonly known as treebanks.

In their current state, these treebanks are unlemmatised. That is to say, the attestations of the inflectional forms of a given lemma are not related to the dictionary word, which results in a lower descriptive power, especially as regards paradigmatic analysis and the quantification of morphological and lexical aspects. Moreover, the standard dictionaries of Old English, including An Anglo-Saxon Dictionary, A Concise Anglo-Saxon Dictionary and The Student's Dictionary of Anglo-Saxon, constitute valuable sources of philological data, although they are not based on an extensive corpus of the language but on the partial list of texts given in their prefaces or introductions. For its part, The Dictionary of Old English (henceforth DOE) is based on the corpus mentioned above, but is still in progress (the letter H was published in 2016). All in all, neither the textual nor the lexicographical sources of Old English are fully lemmatised. At the same time, treebanks could improve their descriptive power and searchability by incorporating lemma tags. For these reasons, the aim of this paper is to present a semiautomatic lemmatisation procedure implemented on database software that is compatible with the morphological tagging of treebanks.

The first step was to compile a concordance of the texts that the Dictionary of Old English provides in its corpus. The concordance by word consists of three million lines, one per word in the corpus. The concordance by fragment, in contrast, contains around two hundred thousand fragments of text identified with the short title under which they appear in the DOEC, as in Eala ðu cleric ne wana ðu æfre wexbreda fram sidan [Abbo 000100 (103.1)]. The target of the analysis is the data retrieved from the word concordance to the DOEC, which yields an index of approximately one hundred and ninety thousand inflectional forms. Once the data has been identified and extracted from the concordance, the process of lemmatisation starts. The different types of verbs are lemmatised in turn, depending on their formal transparency. That is to say, strong verbs have been lemmatised in the first place, and then weak verbs from the second class have been processed.

The former can be identified by stem and inflectional ending, the latter by inflectional ending only, but their inflectional paradigm is more transparent than that of the weak verbs from the first and the third classes. Then, weak verbs from class 1 have been lemmatised. Weak class 3, anomalous, contracted and preterite-present verbs will be dealt with in further research.
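For illustration, the kind of word concordance described above can be sketched in a few lines. The input format (pairs of short title and text) is an assumption for this sketch; the real concordance obviously operates over the three-million-word DOEC rather than a toy fragment.

# Toy word-concordance builder: one index entry per running word,
# keyed by the short title of the source text (input format assumed).
from collections import defaultdict

def concordance(texts):
    """texts: iterable of (short_title, text) pairs.
    Returns a mapping word -> list of (short_title, position)."""
    index = defaultdict(list)
    for title, text in texts:
        for pos, word in enumerate(text.lower().split()):
            index[word].append((title, pos))
    return index

idx = concordance([("Abbo", "Eala ðu cleric ne wana ðu æfre wexbreda fram sidan")])
print(idx["ðu"])   # [('Abbo', 1), ('Abbo', 5)]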

2 Lemmatisation procedure for strong verbs

On the lemmatiser Norna, inflectional forms of strong verbs are assigned a lemma on the basis of a reference list of verbs from each strong class that has been retrieved from the lexical database Nerthus and supplemented with information from Krygier (1994) and Hogg and Fulk (2011). The second step in the process of lemmatisation in Norna has already been implemented for the seven classes of strong verbs of Old English: query strings are defined and launched on the database, the results are compared with the existing sources, and the conclusions of this comparison are used to refine the query strings. After the query strings have been input to the lemmatiser, the assignments of lemmas to attestations are filed and compared with the lexicographical and philological sources, so that the feedback of previous searches is used to improve subsequent query strings. With this method, the design of the search algorithm is stepwise.

The target of the first step is the simplex word. The underived verbs in the reference list of the seven strong classes have been inflected for the infinitive, inflected infinitive, present participle and past participle; present indicative singular and plural; present subjunctive singular and plural; preterite indicative singular and plural; preterite subjunctive singular and plural; and imperative singular and plural. The second step in the creation of the algorithm is focused on the complex word. It consists of the compilation of a list of elements that may be attached to simplex strong verbs to form derived or compound verbs. The inventory of preverbal elements, retrieved from the lexical database of Old English Nerthus, includes affixes with a very specific meaning, such as the negative prefix un-, the pejorative prefix mis- and the aspectual prefixes eft- and ed-; the Germanic pure prefixes ā-, be-, for-, ge-, of-, on-, tō- (de la Cruz 1975); the spatial and temporal adverbs and prepositions that are undergoing grammaticalisation into telic markers (Brinton and Traugott 2005; Martín Arista and Cortés Rodríguez 2014), including adūn-, æfter-, æt-, āweg-, beforan-, betwux-, ðurh-, forð-, fore-, fram-, geond-, in-, niðer-, oð-, ofer-, onweg-, under-, ūp-, ūt-, wið-, wiðer-, and ymb-; and fully free forms that appear in compound verbs, such as āgēn-, and-, ðri-, dyrn-, efen-, ful-, hearm-, mǣg-, mān-, nyd-, riht-, twi-, wyrg-.

Given the preverbal elements, the roots and the set of inflections presented above, the third step in the design of the search algorithm is the definition of query strings that can be applied in Filemaker. Four query strings (QS) have been defined. QS1 targets the stems and inflections of simplex verbs, using the Filemaker operator for exact matches (==). The part of QS1 that searches the corpus for the inflections of bēodan can be seen in (1).

(1)

==beodan, ==bead, ==budon, ==beode, ==bead, ==biedest, ==biedst, ==bietst, ==biest, ==bude, ==beodeð, ==beodeþ, ==biett, ==bietð, ==bietþ, ==bead, ==beodaþ, ==beodað, ==budon, ==beode, ==bude, ==beoden, ==buden, ==beod, ==beodað, ==beodaþ

The target of the second query string (QS2) is the prefix ge-, the most frequent one in Old English (Martín Arista 2012b), to such an extent that most strong verbs present both a simplex form and a complex form prefixed with ge-. QS2 for gecimban is shown in (2).

(2) ==gecimban, ==gecamb, ==gecumbon, ==gecumben, ==gecimbe, ==gecamb, ==gecimbst, ==gecimbest, ==gecumbe, ==gecimbð, ==gecimbþ, ==gecimbeð, ==gecimbeþ, ==gecamb, ==gecimbaþ, ==gecimbað, ==gecumbon, ==gecimbe, ==gecambe, ==gecimben, ==gecumben, ==gecimb, ==gecimbaþ, ==gecimbað, ==gecimbanne, ==gecimbenne, ==gecimbende

QS3 has been created to account for the existence of complex strong verbs with preverbs other than ge-. The wild card (*) in (3) represents any preverbal element attached to the base and its inflections.

(3) ==*beodan, ==*bead, ==*budon, ==*beode, ==*bead, ==*biedest, ==*biedst, ==*bietst, ==*biest, ==*bude, ==*beodeð, ==*beodeþ, ==*biett, ==*bietð, ==*bietþ, ==*bead, ==*beodaþ, ==*beodað, ==*budon, ==*beode, ==*bude, ==*beoden, ==*buden, ==*beod, ==*beodað, ==*beodaþ

QS4 is the least specific of all the query strings. It searches the corpus for the stems of strong verbs with any preverbal element and any inflectional ending, hence the addition of the wild card (*) on both sides of the stem, as in (4).

(4) ==*beod*, ==*bead*, ==*bud*, ==*bod*, ==*bied*, ==*biet*, ==*biest*
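Since the four query strings follow a regular pattern, they lend themselves to automatic generation. The following minimal sketch produces QS1-QS4 style strings from a list of attested forms and stems; it is written in Python purely for illustration (the actual queries run in Filemaker), and the lists are abbreviated.

# Sketch of how QS1-QS4 style query strings can be generated (illustration
# only; the real queries are launched in Filemaker). Lists are abbreviated.
FORMS = ["beodan", "bead", "budon", "beode", "beodeð", "beodaþ"]
STEMS = ["beod", "bead", "bud", "bod", "bied", "biet", "biest"]

def qs1(forms):                 # exact matches for the simplex forms
    return ", ".join("==" + f for f in forms)

def qs2(forms):                 # the same forms prefixed with ge-
    return ", ".join("==ge" + f for f in forms)

def qs3(forms):                 # any preverbal element before the form
    return ", ".join("==*" + f for f in forms)

def qs4(stems):                 # any preverb and any ending around the stem
    return ", ".join("==*" + s + "*" for s in stems)

print(qs4(STEMS))
# ==*beod*, ==*bead*, ==*bud*, ==*bod*, ==*bied*, ==*biet*, ==*biest*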

The query strings have been launched in a sequential manner: QS1, QS2, QS3, QS4. After the submission of each query, the resulting hits have been tagged on the lemmatiser Norna, so that the tags from previous queries could aid the tagging of the hits resulting from subsequent queries. This brings about a simplification of the overall task because, although some unexpected spellings are likely to turn up, QS4 is redundant with respect to QS1 (endings) and to QS2 and QS3 (preverbs). Moreover, due to its wide scope, this query string predictably returns a remarkably high number of results. For this reason, the final step in the design of the algorithm is the definition of filters that put aside at least part of the undesired results of QS4, so that manual revision can be reduced dramatically. Four filters have been designed for this purpose. Filter (F) 1 is intended to isolate verbal forms: it cuts down the hits of QS4 to inflectional forms that end in -on, -odon, -ast, -est, -ost, -ð, -þ, -iað and -iaþ, hence the operators == and *. The application of F1 to the 17,138 hits of QS4 reduces this figure to 1,939. F1 is presented in (5).

(5) ==*on, ==*odon, ==*ast, ==*est, ==*ost, ==*ð, ==*þ, ==*iað, ==*iaþ
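What F1 does to the QS4 hits can be pictured as an inclusion filter over endings, as in the sketch below (the hit list is invented for illustration). The same mechanism, with an exclusion list instead of an inclusion list, underlies the second step of F2 described next.

# F1 as an inclusion filter over QS4 hits (illustrative re-implementation;
# endings transcribed from (5), example hits invented).
F1_ENDINGS = ("on", "odon", "ast", "est", "ost", "ð", "þ", "iað", "iaþ")

def f1(hits):
    """Keep only hits whose ending is on the F1 list of verbal endings."""
    return [h for h in hits if h.endswith(F1_ENDINGS)]

print(f1(["beodeð", "lufast", "cyning", "wexbreda"]))
# -> ['beodeð', 'lufast']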

F2 is aimed at finding spelling variations in the consonantal endings of verbal forms. It is applied in two steps. The first selects the inflectional forms that end in a consonant, as can be seen in (6).

(6) ==*b, ==*c, ==*d, ==*f, ==*g, ==*h, ==*l, ==*m, ==*n, ==*p, ==*r, ==*s, ==*t, ==*w, ==*x, ==*y, ==*ð, ==*þ

The second step of F2 targets members of the non-verbal classes as well as weak verbs by deleting inflectional forms that end in -on, -en, -an, -es, -um, -end, -as, -est, -ost, -ed, -od, -ig, -ic, -ing, -ung, -un, -us, -nes, -er, -or, -ur, -iað, -iaþ. It must be noted that F2 also puts aside the endings -iað and -iaþ, which are selected by F1. When applied to the outcome of QS4, the first step of F2 reduces its hits from 17,138 to 10,305, which, after the application of the second step of F2, results in 3,533 hits. The second step of F2 is displayed in (7).

(7) ==*on, ==*en, ==*an, ==*es, ==*um, ==*end, ==*as, ==*est, ==*ost, ==*ed, ==*od, ==*ig, ==*ic, ==*ing, ==*ung, ==*un, ==*us, ==*nes, ==*er, ==*or, ==*ur, ==*iað, ==*iaþ

Turning to the lexicographical sources, the comparison with the inflectional forms provided by the DOE (A-H) has shown that the accuracy of the search algorithm is around 80%. As regards the textual sources, the lemmatiser Norna has been modified so that it gives access to the inflectional forms that appear both in the DOEC and in the YCOE, and a comparison of the YCOE and the lemmatiser Norna has been carried out. As an illustration, for the letter L the forms in (8a) can be retrieved from the YCOE (165 forms), while those in (8b) appear in Norna (148 forms).

(8) a. lacað, lacan, laceð, lacende, laðaþ, ladigan, laðod, laðode, lædað, lædan, Læddan, lædde, læddest, læddon, læde, læded, lædeð, lædeþ, Læf, læfan, læfde, læg, lægon, lægun, lær, læran, lærde, lærde, lærdes, lære, læreð, lærest, læs, læst, læstan, læste, læston, læt, læt, læt, Lætað, lætað, lætan, læte, læte, læteð, lætst, lagan, lagon, lah, lata, laþaþ, leag, leanað, leanast, leanige, leccað, leccaþ, lecgað, lecgan, lecge, legde, legdun, legeð, lengde, lengeð, leofa, leofað, leofaþ, leogan, leoge, leohte, leolc, leordan, leorde, Leort, leoþode, lepeþ, Let, Letan, lete, lete, Leton, letton, liban, libban, licað, licgað, licgan, licge, licgean, lician, licode, licodon, liðan, liðan, liden, liden, lifað, Lifde, lifdon, lifdon, lifgað, lifgan, lifgaþ, lifge, lifiaþ, lifige, ligeð, ligeð, lihteð, limpeð, linnan, linneð, linnið, lixað, lixan, lixeð, lixtan, lixte, lixton, liþ, liþan, locað, locade, locast, locen, lociað, Lofiað, lofian, log, logon, lomp, Longað, losað, losade, losaþ, losian, lucan, ludon, lufast, lufaþ, lufiað, lufian, lufiaþ, lufie, lufien, lufige, lufu, lunnon, lycð, Lyfað, lyfað, lyfde, lyhð, lyhte, lyhteð, lysan, lyst, lyste, lyste, lysteð, lysteð, lysteþ.

b. lac, lace, lece, lec, lacað, lacan, leolc, lecc, laceð, læceð, Læt, let, læte, lætan, Lætað, Leton, lete, læst, læten, læteð, letan, lætst, lett, lætt, læton, lætest, lætaþ, leten, læteþ, leto, leteð, lætenne, Leort, lettes, leteþ, lætæð, lætoð, lætæst, lætin, lætene, Læt.þ, leode, lead, leod, leodan, leoden, lude, ludon, liet, leogað, leoge, lugon, leogan, leag, leoh, luge, leogaþ, lugen, liehð, leogeð, leogendan, leogð, liegeð, lieht, leah, leore, leoreð, leoran, leor, leorað, lure, leoren, leord, leorest, lorene, leorad, lierest, liereð, loren, les, List, lese, lesan, lisð, lesað, lest, lisseð, lað, liðe, liþe, liðan, liþan, laþ, liðon, liþon, liþ, lið, liðað, lif, life, laf, lifað, lifeð, lifæs, lifæþ, lifen, lifaþ, lifð, lifast, lifeþ, lift, limpð, limpeð, lamp, limpe, limpað, limpes, limpa, limpeþ, lumpe, limpan, lin, lan, linnan, linnen, lunnon, linne, linnið, linneð, leac, Lucan, Luce, locen, lucon, lucað, luc, lycð, leat, luton, lutan, luteð, lute, lut, loten, lutaþ, lutað.

The discrepancies in the number of forms can be attributed to the compilations of the DOEC and the YCOE. An avenue of future research in this respect is the identification of forms in the YCOE that have been deleted in the different editions of the DOEC. Apart from this question, the search algorithm has to be modified to include, at least, three aspects: <g> followed by a front vowel in the same syllable, as is the case with ladigan, legde, legdun, legeð, lifige and lufige; <y> for <i> in accented syllables, as in lyfað and lyfde; and -an/-un for -on in unaccented (inflectional) syllables, as in lægun and legdun.

3 Lemmatisation procedure for weak verbs

Weak verbs correspond to the modern 'regular' verbs. The changes in their inflection take place in the suffixal part of the word rather than in the stem, as is the case with strong verbs, the counterparts of the modern 'irregular' verbs. In order to find the inflectional forms of weak verbs in the database, it is necessary to list a set of inflectional endings of class 1 and class 2 weak verbs that includes the endings of the finite forms (indicative, subjunctive and imperative) and the non-finite forms (infinitive, inflected infinitive, present participle, past participle and past participle forms inflected as adjectives). The choice of forms has been made by comparing the Old English verbal paradigms of the three subclasses of weak verbs in four Old English grammars (Campbell, 1987; Hogg and Fulk, 2011; Sievers, 1903; Stark, 1992). The compilation of these inflectional endings guides the automatic searches in subsequent steps of the analysis. The result of this task is an inventory of 24 different endings for the paradigm of class 1 and 29 different endings for the paradigm of class 2.

On the lemmatiser, the query strings for each of the selected endings are defined as follows: two equals signs followed by a wild card (an asterisk) and the spelling of the ending. In this way, we obtain all the inflectional forms in the database with the requested ending. For example, the query string ==*ianne, for the canonical inflectional ending of the inflected infinitive of class 2 weak verbs, returns 160 inflectional forms included in the database, most of which are likely to be lemmatised under a lemma from the second weak class. We launch 24 different queries for the canonical endings of class 1 weak verbs and another 29 for the counterparts of the second class. Then, all the hits are checked against the lists of weak verbs available in Nerthus. These reference lists contain more than 2,000 verbs each. To achieve the maximum degree of accuracy in the initial stages of the project, the criterion was to assign a lemma only to combinations of stems of the weak verbs, as they appear in the reference lists, and canonical endings.

The results of this lemmatisation process consist of more than 4,680 inflectional forms lemmatised under class 1 weak verbs and more than 6,600 inflectional forms under class 2 weak verbs. The lemmatisation of these forms provides paradigmatic information on weak verbs that can contribute to the development of treebanks, since lemmatisation relates the syntactic analyses of the inflectional forms that are lemmatised under the same headword, thus allowing for extensive descriptive analysis as well as quantification. For instance, the inflectional forms lemmatised in the database for the class 2 weak verb lufian are the following: lufað, lufiað, lufode, lufige, lufast, lufie, lufaþ, lufodon, lufodest, lufiað, lufien, lufigen, gelufiað, gelufode, gelufie, lufian, gelufod, lufianne, lufiende, lufienne, lufodes, lufiendum, lufodan, gelufoda, lufod, gelufian, gelufodes, gelufodne, lufoden. All these inflectional forms belong to different parts of the paradigm of the same lemma; therefore, lemmatisation proves to be an effective tool for linking together the syntactic analyses of a given lemma. In turn, we are able to view all the inflectional forms for a given lemma at the click of a button, as shown in Figure 1. The two leftmost columns list the number of occurrences and the inflectional forms, and the column Headword shows the lemma assigned to them.

Figure 1: Inflectional forms for the lemma langian(ge) (2) 'to grieve' in Norna.
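The stem-plus-canonical-ending criterion described above can be pictured as follows. The ending and reference lists here are heavily abbreviated stand-ins for the 29 class 2 endings and the 2,000-verb reference lists mentioned in the text.

# Sketch of lemma assignment for weak verbs: a hit is lemmatised only when it
# decomposes into a reference-list stem plus a canonical ending (lists abbreviated).
CLASS2_ENDINGS = ["ianne", "iende", "odon", "iað", "iaþ", "ode", "ast", "að", "ian"]
REFERENCE = {"luf": "lufian (2)", "loc": "locian (2)"}   # stem -> lemma

def assign_lemma(form):
    for ending in sorted(CLASS2_ENDINGS, key=len, reverse=True):
        if form.endswith(ending) and form[: -len(ending)] in REFERENCE:
            return REFERENCE[form[: -len(ending)]]
    return None

for f in ["lufiað", "lufode", "locianne", "sended"]:
    print(f, "->", assign_lemma(f))
# lufiað -> lufian (2); lufode -> lufian (2); locianne -> locian (2); sended -> None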

The lemmatisation of Old English weak verbs with this procedure needs checking. The DOE (A-H) is the lexicographical source used for this task. As stated above, this dictionary is still in progress, but the information listed for the verbs from A to H is detailed and central to this investigation. Through the online version of the dictionary, it is possible to search inflectional forms by lemma. For this reason, the lemmatiser Norna permits the comparison of the attested spellings of weak verbs from A to H with those that appear in the DOE. Many unpredictable spellings of weak verbs are available in the DOE that cannot be found by means of automatic lemmatisation, since most of them include spelling variations in prefixes, stems and endings. For instance, for the lemma bǣdan(ge) (1) 'to force', seven inflectional forms can be obtained by automatic lemmatisation: bædað, baedde, bædde, bæddon, bæde, bædeð, bæden. However, the DOE includes eight extra attested spellings in the entry for this verb: bæddan, baedendrae, baedendre, bædendre, bædendum, bædt, beadætþ and bedændræ.

The feedback of previous searches is input into the database and used to refine subsequent searches, so that the amount of manual revision can be reduced. This task of comparison results in the identification of 12,000 extra inflectional forms for weak verbs, on the grounds of the information found in the DOE verb entries for the letters A-H. The analysis of these extra forms will allow us to compile a list of the recurrent variants of the canonical forms of weak verbs and, eventually, normalisation patterns (by prefix, stem or ending). These correspondences will be applied to the search for the inflectional forms of the verbs beginning with the letters I-Y. As an illustration, the inflectional form aflemde is lemmatised under the headword āflyman (1) 'to put to flight' in the DOE. This change of vowel, <e> ≈ <y>, will be included in the normalisation patterns for the stems of verbal forms. Similarly, the DOE includes the inflectional form aredad in the paradigm of the lemma āredian (2) 'to arrange', which is a variant of aredod, the past participle form. This pattern is consequently added to the normalisation patterns that apply to the endings of inflectional forms. In the same vein, the prefix of the inflectional form gifered, from the lemma ferian(ge) (1) 'to carry', shows the recurrent pattern of variation <gi> ≈ <ge>, which is added to the list of normalisation patterns for the prefixal segment of the attested form. The application of these three types of patterns in the study of the inflectional forms I-Y, with the same queries as in the search for the canonical endings of weak verbs, will undoubtedly lead to the lemmatisation of a larger number of inflectional forms for class 1 and class 2 weak verbs and, more importantly, will result in a reduction of manual revision.

The lemmatisation procedure has limitations that need to be taken into account. There are many unexpected spellings within the paradigms of weak verbs, as well as unforeseeable abbreviations and many ambiguities that can only be resolved with the help of dictionaries. Gemination and simplification of consonants are also recurrent in verbal forms, thus excluding several inflectional forms from automatic lemmatisation. This is the case with cunedon, from cunnian(ge) (2) 'to try', which shows consonant simplification. In addition, and according to the DOE, some inflectional forms lemmatised with this method belong to different categories and to other verb classes.
The explanation for this phenomenon is the overlap of endings among verbs, nouns, adjectives and adverbs, such as the ending -e. The inflectional form dere was initially lemmatised under derian(ge) (1) 'to hurt', given that it includes the stem of a weak verb of the first class and one of the canonical endings of this class; the comparison with the DOE showed, however, that dere is in fact a nominal rather than a verbal form. Finally, the unpredictable forms auandod, from āfandian (2) 'to test', and hergendne, from erian (1) 'to plough', which are also available from the DOE, are hard to find by automatic means, at least at this stage of the project.
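A minimal sketch of how such normalisation patterns could expand subsequent searches follows. The pattern list is restricted to the three examples discussed above, and a real implementation would need positional restrictions (prefix, stem or ending) to avoid spurious variants.

# Sketch of pattern-based normalisation of attested spellings (illustration;
# patterns taken from the examples above, applied one at a time).
PATTERNS = [
    ("gi", "ge"),   # prefix variation, cf. gifered ~ geferod
    ("e", "y"),     # accented-vowel variation, cf. aflemde ~ aflyman
    ("ad", "od"),   # ending variation, cf. aredad ~ aredod
]

def variants(form):
    """Generate normalised spellings of an attested form."""
    out = {form}
    for old, new in PATTERNS:
        if old in form:
            out.add(form.replace(old, new, 1))
    return out

print(variants("aflemde"))   # includes 'aflymde'
print(variants("aredad"))    # includes 'aredod'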

4 Conclusion

The lemmatisation procedure for the strong verbs of Old English presented in this paper has an accuracy of around 80% before manual revision. After checking the results against lexicographical and textual sources, the search strings can be refined and manual revision reduced. Compatibility with treebanks has been addressed with respect to the YCOE. As for weak verbs, the lemmatisation procedure has allowed for the identification of more than 20,000 inflectional forms. The comparison with the DOE, as the main lexicographical source, has also proved a crucial step of the procedure, because it turns up more inflectional forms but, above all, because it identifies recurrent spelling variants with which normalisation patterns can be defined and subsequent searches refined. The inclusion of additional normalisation patterns and the gradual improvement of the searches are likely not only to find more lemmas and inflectional forms but also to reduce the necessary amount of manual revision. In spite of the limitations of semiautomatic lemmatisation, the procedure has allowed us to find a large number of inflectional forms of weak verbs in Old English. It will still be necessary to check the inflectional forms against the DOEC and the YCOE, but further guidelines for search strings have been obtained. To conclude, a step has been taken towards the inclusion of lemma tags in treebanks, which could reinforce the paradigmatic dimension of these parsed corpora and contribute to the retrievability of the information that they contain, including the important aspect of the quantification of the occurrences of a given lemma.

References

Bosworth, J. and T. N. Toller. 1973 (1898). An Anglo-Saxon Dictionary. Oxford: Oxford University Press.

Brinton, L. and E. Closs Traugott. 2005. Lexicalization and Language Change. Cambridge: Cambridge University Press.

Campbell, A. 1987. Old English Grammar. Oxford: Oxford University Press.

Clark Hall, J. R. 1996 (1896). A Concise Anglo-Saxon Dictionary. Toronto: University of Toronto Press.

de la Cruz, J. 1975. Old English Pure Prefixes: Structure and Function. Linguistics 13: 47-82.

Healey, A. diPaolo (ed.) 2016. The Dictionary of Old English in Electronic Form A-H. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.

Healey, A. diPaolo (ed.) with J. Price Wilkin and X. Xiang. 2004. The Dictionary of Old English Web Corpus. Toronto: Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto. Available online at http://www.doe.utoronto.ca/pages/pub/web-corpus.html.

Hogg, R. M. and R. D. Fulk. 2011. A Grammar of Old English. Oxford: Wiley-Blackwell.

Krygier, M. 1994. The Disintegration of the English Strong Verb System. Frankfurt am Main: Peter Lang.

Martín Arista, J. 2012b. The Old English Prefix Ge-: A Panchronic Reappraisal. Australian Journal of Linguistics 32(4): 411-433.

Martín Arista, J. and F. Cortés Rodríguez. 2014. From directionals to telics: meaning construction, word-formation and grammaticalization in Role and Reference Grammar. In M. A. Gómez González, F. Ruiz de Mendoza Ibáñez and F. Gonzálvez García (eds.), Theory and Practice in Functional-Cognitive Space. Amsterdam: John Benjamins. 229-250.

Martín Arista, Javier (ed.), Laura García Fernández, Miguel Lacalle Palacios, Ana Elvira Ojanguren López and Esaúl Ruiz Narbona. 2016. NerthusV3. Online Lexical Database of Old English. Nerthus Project. Universidad de La Rioja. [www.nerthusproject.com]

Pintzuk, S. and L. Plug (eds.) 2001. The York-Helsinki Parsed Corpus of Old English Poetry. Department of Language and Linguistic Science, University of York.

Rissanen, M., M. Kytö, L. Kahlas-Tarkka, M. Kilpiö, S. Nevanlinna, I. Taavitsainen, T. Nevalainen and H. Raumolin-Brunberg (eds.) 1991. The Helsinki Corpus of English Texts. Department of Modern Languages, University of Helsinki.

Sievers, E. 1903 (1885). An Old English Grammar. Boston: The Athenaeum Press.

Taylor, A., A. Warner, S. Pintzuk and F. Beths (eds.) 2003. The York-Toronto-Helsinki Parsed Corpus of Old English Prose. Department of Language and Linguistic Science, University of York.

Data point selection for genre-aware parsing

Ines Rehbein
Leibniz ScienceCampus
Heidelberg/Mannheim
[email protected]

Felix Bildhauer
Institut für Deutsche Sprache
Mannheim
[email protected]

Abstract

In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.

1 Introduction

The work of Biber (1988; 1995) and Biber & Conrad (2009) on language variation has brought valuable insights into the concepts of genre and register and the linguistic features that define them. It has also triggered many studies on genre classification (Kessler et al., 1997; Feldman et al., 2009; Passonneau et al., 2014), which try to automatically predict the genre or register of a particular text. However, despite the amount of work dedicated to genre prediction, the theoretical concept of genre remains vague, and no agreement has been reached within the NLP (and linguistics) community on how to define it.1

This is all the more surprising as concepts like genre and domain seem to be of crucial importance to our field: it is well known that the accuracy of NLP tools trained on one type of text decreases noticeably when the same tools are applied to another type of text with underlying properties that differ from the training data (Sekine, 1997; Gildea, 2001; McClosky et al., 2006). This might be due to either domain or genre differences; in NLP, however, we usually refer to both as out-of-domain effects. While many studies have successfully shown how we can adapt tools to new domains (or genres) (Blitzer et al., 2006; Titov, 2011; Mitchell and Steedman, 2015), less is known about the underlying properties that are responsible for the decrease in performance. Out-of-domain (including out-of-genre) effects might be due to a large number of unknown words introduced by topic shifts, but they might also be caused by a higher structural complexity in the data, by longer dependencies or by a larger amount of non-projectivity. Intuitively, we assume that domain differences can be captured by content-related features (e.g. from topic modelling), while we expect that functional differences between genres are reflected in structural features such as part-of-speech n-grams and other morpho-syntactic features. In this paper, we address these issues and investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they impact the parsing performance of a transition-based dependency parser.

1 This is especially true for distinguishing genre from closely related concepts such as register and text type, which is why we will use genre in a broad sense here, i. e., as a cover term for genre, register, text type and similar.

2 Related work

2.1 Register, Genre and Topic

It is hard to find a clear definition for concepts such as register, genre or domain in the literature.2 We follow Biber and Conrad (2009) and consider register and genre not as different concepts but rather as different perspectives on the same thing. On this view, genre focusses on the "linguistic characteristics that are used to structure complete texts" (Biber and Conrad, 2009, p. 15). Passonneau et al. (2014) follow the functional view of Biber and Conrad and describe genre as a "set of shared regularities among written or spoken documents that enables readers, writers, listeners and speakers to signal discourse function, and that conditions their expectations of linguistic form". The domain concept, on the other hand, is orthogonal to the concepts of register and genre. It reflects the main topic of a text (e. g., the sports domain) and can include texts from various genres with different communicative functions, such as soccer news, a report of a tennis match or an interview with a golf player.

While genre/register classification of documents can be a daunting task for humans, automatic genre/register classification of unrestricted text does not even reach 50% classification accuracy in recent state-of-the-art experiments (Biber and Egbert, 2015). One reason for this is, of course, the fact that there is no general consensus about the number and boundaries of the relevant categories to be included in a taxonomy of genres/registers. Moreover, as Petrenz and Webber (2011) point out, a text can not only have more than one topic but can also belong to multiple genres, which makes the genre concept even more complex and also casts some doubt on the validity of the task of genre classification on the document level. The authors discuss the correlation between genre and topic and show for a large newspaper corpus that there is a substantial correlation between the two, and that this correlation is not stable over time but undergoes significant changes. Petrenz and Webber (2011) also show that linguistic features that correlate with topic can decrease results in a genre prediction task. The authors argue that a meaningful evaluation of genre classification should thus control for topic, to avoid overly optimistic results that do not generalise to new texts with a topic distribution different from the one in the training data. These observations are also relevant for adapting a parser to text from a new genre or domain, as most studies do not distinguish between content-based and structural features when measuring domain and genre similarity but use both evenhandedly. To the best of our knowledge, there are no studies on parser adaptation that try to separate domain from genre effects.

2.2 Adapting parsers to new genres and domains

Many parsing studies have addressed the problem of parser adaptation to new genres or domains, often focussing on adapting a Penn Treebank-trained parser to biomedical text or to web data.3 Different techniques have been tested for parser adaptation, such as transformations applied to the target data (Foster, 2010), ensemble parsing (Dredze et al., 2007) or co-training (Baucom et al., 2013). Other studies have tried to distinguish between features specific to the source data and general features that also occur in the target data (Dredze et al., 2007), or to create domain- or genre-specific parsing models and select the model combination that will most probably maximise parsing scores on the target data (McClosky et al., 2010; Plank and Sima'an, 2008). Plank and van Noord (2011) and Mukherjee et al. (2017) create new training sets that reflect the distribution in the target data by identifying the source data most similar to the target, based on measures that assess structural or topic similarity between both. The features used in these experiments (McClosky et al., 2010; Plank and van Noord, 2011; Mukherjee et al., 2017) include known and unknown words, character n-grams and LDA topics but do not (or only implicitly) capture structural similarity. The authors show that content and surface features are successful in selecting appropriate training data for the new domain and also work better than using genre labels assigned by humans (Plank and van Noord, 2011). Søgaard (2011), however, has shown that data point selection based on structural similarity can improve parsing accuracy significantly in a cross-lingual parser adaptation setting, and Rehbein (2011) shows a similar effect for in-domain self-training.

2 A full survey of work on register, genre or domain variation is beyond the scope of this paper. We refer to Biber (1988) and especially Lee (2001) for a review of how these terms have been used in various theoretical frameworks.
3 See, e. g., the CoNLL 2007 Shared Task on Domain Adaptation (Nivre et al., 2007) and the SANCL 2012 Shared Task on Parsing the Web (Petrov and McDonald, 2012).

genre          # articles    # sent     # token    avg. sent length   # sent train pool   # sent testset
portrait            42        1,195       24,035         20.1                 695                500
letter             102        1,789       34,923         19.5               1,289                500
documentary         72        3,162       61,534         19.5               2,662                500
agency             617        5,278       84,944         16.1               4,750                528
interview          102        7,585      120,215         15.8               6,826                759
commentary         333        9,613      178,347         18.5               8,652                961
taz report       2,376       66,973    1,283,803         19.2              66,973                  –

Table 1: Distribution of different genres in the TüBa-D/Z and training/test sizes.

Based on these results, we are interested in comparing the adequacy of surface and content features for data point selection with features that capture structural similarity in the data. We evaluate the features in a setting where we try to improve the performance of a dependency parser on different genres in a newspaper corpus by training genre-aware parsing models. We would like to know whether the different feature types capture similar properties in the data. We consider content-related features to be characteristic of certain domains, while we expect that functional differences between genres are reflected in structural differences between texts and can be captured by features such as part-of-speech n-grams. In addition, we compare the potential of content and structural features to measure domain and genre similarity with linguistically defined features, inspired by the work of Biber (1988; 1995) on register variation.

3 Experiments

In our experiments, we use the TüBa-D/Z treebank (Telljohann et al., 2004), a corpus of German newspaper text from the taz, a German daily newspaper, which includes more than 95,000 sentences annotated with constituency trees and grammatical function labels. The data has been automatically converted to dependencies. Webber (2009) has shown for the Penn Treebank (Marcus et al., 1994) that even newspaper corpora should not be considered homogeneous objects but typically also comprise multiple genres. Similar to the Penn Treebank, the TüBa-D/Z (v10) includes articles from a variety of genres. The genre labels in the TüBa-D/Z have been assigned by the editors of the taz and are: reports, commentaries, documentaries, letters to the editor, interviews, portraits and messages from news agencies. It is, however, not clear to what extent these labels correspond to linguistically well-defined categories, i. e., whether documents within a specific genre category share "linguistic characteristics that are used to structure complete texts" as suggested by Biber and Conrad (2009). The vast majority of the articles in the treebank are labelled as taz reports (Table 1).

3.1 Genre differences

The first question we are interested in is whether we can cluster the data according to the labels assigned by the taz, to see whether these labels reflect systematic linguistic differences in the data. For this, we divide the data into genre-specific samples of 10,000 tokens each. First, we concatenate all articles from the same genre and split them into smaller samples of 10,000 tokens, so that sentences from the same article end up in the same sample most of the time. Then we run a Principal Components Analysis (PCA), a) based on the frequency of POS tags in the data (Figure 1, left), and b) based on topic distributions from an LDA (Figure 1, right).4

Figure 1 (left) shows marked differences between samples taken from articles that have been assigned different labels. Interviews and agency messages are separated from the other samples along the second principal component (Dim2), while the two can be separated from each other along the first component (Dim1). Reports and documentaries are positioned more centrally, with the reports a bit more to the left and the documentaries a bit more to the right. The commentaries cluster together in the lower part of the space, and the letters sit at the boundary between documentaries and commentaries.

4 For topic modelling we use the Mallet implementation from http://mallet.cs.umass.edu/ and learn 100 topics on the lemmatised version of the TüBa-D/Z. We compute the PCA using the PCA function from the R package FactoMineR. To increase readability, we only include the first 20 samples from the report genre.


Figure 1: PCA based on the frequency of POS tags (left) and on topic distributions from LDA topic modelling (right), in samples of 10,000 tokens from the TüBa-D/Z.

While we can observe strong tendencies, the distinction between these samples is not as pronounced as the one between the interviews and the agency messages. Most interestingly, we can see similar trends in the PCA based on the LDA topics (Figure 1, right). The most important difference, however, is that topic-wise we observe a similarity between the interviews and the letters, while in the PCA based on POS tags the letters are positioned close to the commentaries. The PCA shows the strong correlation between topics and genres that has already been pointed out by Petrenz and Webber (2011). It also shows that the labels assigned by the taz correspond to systematic differences between the texts and can be used at least as an approximation to linguistically defined genres.
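The clustering step can be reproduced along the following lines. The paper computes the PCA with R's FactoMineR; this sketch substitutes scikit-learn, and the tagset and sample data are made-up assumptions for illustration.

# Sketch of the clustering step: POS-frequency vectors per 10,000-token
# sample, reduced with PCA (scikit-learn stand-in for FactoMineR; toy data).
from collections import Counter
from sklearn.decomposition import PCA
import numpy as np

def pos_frequencies(samples, tagset):
    """samples: list of POS-tag sequences; returns an (n_samples, n_tags) matrix."""
    rows = []
    for tags in samples:
        counts = Counter(tags)
        total = sum(counts.values())
        rows.append([counts[t] / total for t in tagset])
    return np.array(rows)

tagset = ["NN", "VVFIN", "ADJA", "ART", "PPER"]
samples = [["NN", "ART", "NN", "VVFIN"], ["PPER", "VVFIN", "ADJA", "NN"]]
X = pos_frequencies(samples, tagset)
coords = PCA(n_components=2).fit_transform(X)   # 2-D coordinates per sample
print(coords.shape)                              # (2, 2)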

3.2 Impact of genre differences on parsing

Given that we are able to discriminate documents from different genres based on the distribution of POS tags in the data, we expect the genre differences to also impact parsing accuracy. To investigate this, we split the texts into a pool of training data and test data as follows. From each genre, we create test sets with 10% of the tokens for this genre or, for genres with less than 50,000 sentences, we select 500 sentences from the pool for the test set. For the test sets we also control for topic by selecting articles so that the similarity with regard to topic distribution is maximised.5 The rest of the data is used as a pool from which we create training sets of different sizes (see Table 1).

For the first experiment, we create training sets of size N = {10,000, 50,000, 380,000} tokens by randomly selecting articles from the report data, which constitute the largest part of the training pool.6 We would like to know whether we can observe systematic differences between the genres with regard to their "parsability", i. e., how hard it is to predict the correct parse. We train the IMSTrans parser (Björkelund and Nivre, 2015), a transition-based dependency parser, on the randomly extracted training sets and report LAS for the different genres. All results are based on automatic POS and morphological tags predicted by Marmot (Mueller et al., 2013)7 and include punctuation in the evaluation (Table 2). We report average LAS and standard deviation (σ) over 5 runs. As expected, we observe substantial differences in parsing scores between the genres. Over all sample sizes, agency messages achieve the highest parsing accuracy, followed by portraits and documentaries, while letters and commentaries seem to be more difficult to parse.

5 We compute the topic distribution for the articles in the TüBa-D/Z based on LDA and then compute the Manhattan distance between the topic distributions for each pair of articles from the same genre. Then we select articles for each genre in the test set so that the accumulated distance between the articles for each genre is minimised.
6 We count tokens instead of sentences, as the differences in sentence length between the genres would impact results.
7 We also use predicted POS/morphological information in the training data. We use the pre-trained SPMRL models kindly provided by the developers: http://cistern.cis.lmu.de/marmot/models/CURRENT.

size (token)         agency   commentary   documentary   interview   letter   portrait
10,000     avg.       85.65      78.47         79.20        78.64      77.61     80.06
           σ           0.31       0.47          0.31         0.23       0.36      0.33
50,000     avg.       89.87      83.31         84.61        83.95      83.08     85.66
           σ           0.24       0.26          0.24         0.17       0.21      0.28
380,000    avg.       93.15      87.71         89.40        88.41      87.96     89.58
           σ           0.22       0.13          0.17         0.14       0.28      0.12

Table 2: LAS for different genres: random training sets from reports (avg. LAS over 5 runs and std dev.)

3.3 Genre effects as an out-of-domain problem

In the next set of experiments, we investigate whether the performance changes when we train the parser on the same amount of data, but this time using sentences from the same genre (according to the taz labels) as in the test set. In other words, we would like to know whether the differences in parsing accuracy reflect an out-of-domain problem and will vanish when we train on "in-domain" (or rather, in-genre) data. As before, we randomly select articles from the pool of training data, but now we control for genre. This is possible only for the smaller training set sizes (N = 10,000 and 50,000), and we also have to exclude the genres for which we do not have enough data in the pool (letter, portrait).

Table 3 shows the improvements we obtain when training the parser on data from the same genre, as opposed to training it on a randomly selected dataset from the taz reports. An exception are the documentaries, which seem to be closer to the reports (see Figure 1), so that the effect of training on in-genre data is levelled out. As before, we note substantial differences between the parsing scores for the different genres. This shows that the gap in results is not due to missing in-domain (or in-genre) training data but that certain genres are in fact harder to parse than others. To find out what makes agency texts so much easier to parse than the commentaries and letters, we compare linguistic properties of the texts in the different genres that have been associated with syntactic complexity and parsing difficulty in the literature (Roark et al., 2007; McDonald and Nivre, 2007; Gulordava and Merlo, 2016).

size (token)            agency   commentary   documentary   interview   letter   portrait
10,000    random         85.65      78.47         79.20        78.64      77.61     80.06
10,000    in-domain      86.30      79.01         79.13        78.83      79.53     80.52
50,000    random         89.87      83.31         84.61        83.95      83.08     85.66
50,000    in-domain      90.68      83.80         84.82        84.47       n.a.      n.a.

Table 3: LAS for random training sets from reports and in-domain training sets (avg. LAS over 5 runs).

Table 4 shows the average sentence length, the number of finite verbs per sentence (as an approximation of the complexity of the sentence structure), the number of unknown words, the average dependency length, the average entropy in arc direction (Liu, 2010) (whether the head of a dependent is found to its left, to its right, or can be positioned either way), and the percentage of non-projective sentences in the test sets. When fitting a linear regression model to the data, arc direction entropy was the only significant predictor of parsing accuracy (β = -2.44, p < .01). However, given the small size of the test sets, we prefer to consider these results preliminary, pending confirmation on larger data sets.

genre          LAS     sent.len   Vfin/sent   # unk   dep.len   arc.ent   non-proj
agency         93.15     16.1        1.2       16.3     2.7       13.1       7.0
portrait       89.58     19.5        1.8       16.2     2.6       14.7       9.2
documentary    89.40     21.9        1.6       14.7     3.1       14.9      13.0
interview      88.41     19.2        1.5       19.0     2.7       15.1      10.0
letter         87.96     17.7        1.5       13.4     2.6       15.1      11.1
commentary     87.71     18.3        1.6       12.9     2.8       14.7      10.2

Table 4: Differences between the test sets (avg. sentence length, no. of finite verbs per sentence, % of unknown tokens, avg. dependency length, entropy of arc direction, % of non-projective trees) and LAS for each test set when training the parser on a randomly selected training set from reports (N = 380,000).

setting                   data description                               raw no. of features
Exp01   POS n-gram        LM perplexity based on POS n-grams                    n.a.
Exp02   KNS               text statistics and word frequencies                    33
Exp03   LDA topics        distance from topic distribution                       100
Exp04   COReX             text statistics, morpho-syntactic features              40

Table 5: Overview of the settings and features used for data point selection.

3.4 Data point selection for genre-aware parsing

We now explore whether we can train genre-aware parsing models on larger data sets by selecting out-of-genre data points that are similar to the target genre. To that end, we test the adequacy of different feature types for measuring similarity. Kessler et al. (1997) (KNS) obtained consistently good results for genre prediction across topics (Petrenz and Webber, 2011), based only on surface features.8 Plank and van Noord (2011) and Mukherjee et al. (2017) have trained domain-specific parsing models based on content features (LDA topics) and surface features (word frequencies and character n-gram frequencies). In our experiments, we would like to compare the adequacy of content and surface features with data selection based on structural similarity (where similarity is operationalised as the perplexity of a language model (LM) over POS n-grams) and with features that take into account the linguistic properties of a text, relying on Biber-style features. Table 5 gives an overview of the different settings and features used for data selection.
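As a rough sketch of the structural measure: train one POS-bigram language model per genre and score pool articles by perplexity. The paper uses the CMU SLM toolkit; the add-one-smoothed model below is a simplified stand-in, and the vocabulary size (here the roughly 54 STTS tags) is an assumption.

# Simplified stand-in for the POS n-gram LM perplexity measure (the actual
# experiments use the CMU SLM toolkit; smoothing and vocab size are assumed).
import math
from collections import Counter

def train_bigram_lm(tag_sequences):
    bigrams, unigrams = Counter(), Counter()
    for tags in tag_sequences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    return unigrams, bigrams

def perplexity(tags, unigrams, bigrams, vocab_size):
    logprob, n = 0.0, 0
    for prev, cur in zip(tags, tags[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)  # add-one
        logprob += math.log(p)
        n += 1
    return math.exp(-logprob / max(n, 1))

genre_lm = train_bigram_lm([["ART", "NN", "VVFIN"], ["ART", "ADJA", "NN"]])
print(perplexity(["ART", "NN", "VVFIN"], *genre_lm, vocab_size=54))  # ~28
# Each pool article is then assigned to the genre whose LM gives the lowest perplexity.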

3.5 Selecting the training sets

For the different settings, we select training data from the pool as follows. For the structural setting, we make use of additional unannotated data from the taz, with articles from 1989-1999 and with information on article boundaries and genre labels. We select all articles from 1989, 1992, 1995, 1997 and 1999 and remove from the raw text corpus those articles that are included in the TüBa-D/Z treebank. We automatically predict POS tags and lemma forms using the TreeTagger (Schmid, 1994) with the standard parameter file provided by the developer.

Exp01 We create a version of the raw newswire data in which we replace all words with their POS tags and divide the data so that we have one sample per genre. We use the CMU SLM toolkit (Clarkson and Rosenfeld, 1997) to train an LM for each genre, based on the POS n-grams in the samples. We then compute the perplexity for each article in the training pool of the TüBa-D/Z and assign each article to the genre for which it shows the lowest perplexity, i. e., to which it is most similar. Based on this, we extract one training set per genre, with N = 380,000 tokens from the most similar articles. Please note that the articles do not need to come from the same genre; they only need to be similar to the raw text files of that genre.

Exp02 For the next experiment, we extract the features described in Kessler et al. (1997) from the large POS-tagged newswire corpus. We aggregate the scores for each feature over all files from the same genre and normalise by the number of articles. Then we extract the same features from the articles in the TüBa-D/Z training pool and compute the similarity of each article in the training pool to the genres in the raw text corpus, based on the aggregated feature scores, using the Manhattan distance as similarity measure. We select the most similar articles for each genre and extract the first N = 380,000 tokens for the genre-specific training sets.

Exp03 For topic modelling, we use a lemmatised version of all articles in the TüBa-D/Z data. The test data for each genre has been merged into one document per genre, while the articles from the training pool are included as separate documents (each article is one document).9 We use the topic distributions for each document to compute the similarity of each article in the training pool to the different genre test sets, and create genre-specific training sets by selecting the most similar articles for each test set. As similarity measure we use the Manhattan distance. Please note that, in contrast to the other settings, here we directly maximise the similarity between training and test data, while in the other settings we use the unannotated newspaper data as a proxy for determining genre similarity. This means that a direct comparison of the results might not be fair. On the other hand, this approach allows us to investigate the interaction between topic and genre. We will get back to this issue in Section 3.6.

8We reimplemented the KNS features (text statistics and frequencies for particular word forms) described at http://homepages.inf.ed.ac.uk/s0895822/SCTG/features.html for German.
9We set the number of topics to N=100. We use standard settings and remove stopwords but do not lowercase the lemmas.

Setting                  agency  commentary  documentary  interview  letter  portrait
Baseline  random          93.15    87.71       89.40       88.41     87.96    89.58
Exp01  POS n-gram LM      93.38    87.68       89.40       88.61     88.57*   89.45
Exp02  KNS                93.83*   87.52       89.26       88.31     88.50*   89.32
Exp03  LDA topics         93.67*   88.19*      89.61       89.15**   88.60*   90.05
Exp04  COReX              93.70*   87.93       89.80       89.25**   88.14    89.50

Table 6: Results for different data selection methods for genre-aware parsing (LAS; * indicates a significant improvement over the baseline with p < 0.05; ** with p < 0.01).

Exp04 For the COReX setting, training sets are created based on the linguistic properties of each genre. We use a fine-grained set of 40 linguistic features obtained from COReX (Bildhauer and Schäfer, 2017), a framework for lexico-grammatical document annotation for large German corpora. The COReX features we use include frequencies for POS, morphosyntactic features, named entity-based features and stylistic markers in the text, inspired by Biber (1988). We extract these features for each article in the training pool10 and aggregate the scores over all articles in the same genre. We then compute the similarity (or, rather, the distance) of the feature vector for one article to the aggregated vectors for each genre, again using the Manhattan distance as similarity measure. For each genre, we select the documents that showed the highest genre similarity, based on the distance between the feature vectors, and create new training sets for each genre with N=380,000 tokens.

We create genre-aware parsing models by training the parser on the different datasets and evaluate the different models on the test sets from each genre (based on the human-assigned genre labels, see Table 1). Another possible approach would be to select the best model for each text that we want to parse, based on its similarity to the different training sets. We refrain from doing so as extracting the COReX features is costly and includes several preprocessing steps such as POS and morphological tagging, topological parsing and NER. In our setup, we only have to run the pipeline once to create the genre-aware parsing models; doing the same again for each text that we want to parse seems exorbitant. Our setup, however, assumes that we have genre information for the texts we want to parse. In a different scenario where genre labels are not given, we could either resort to the COReX pipeline or test how far we get with similarity measures based on simple POS and surface frequencies. We leave this to future work.
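The distance-based selection step shared by Exp02–Exp04 can be sketched as follows. This is an illustration under our own naming, and the per-genre aggregation is simplified to a mean over article-level feature vectors.

import numpy as np

def genre_profile(article_vectors):
    """Aggregate article-level feature vectors (KNS, LDA topic or COReX
    features) into one profile per genre."""
    return np.mean(np.stack(article_vectors), axis=0)

def select_training_set(pool, profile, max_tokens=380_000):
    """Rank the annotated pool by Manhattan distance to the genre profile
    and take the most similar articles up to the token budget.
    `pool` maps article id -> (feature_vector, token_count)."""
    ranked = sorted(pool.items(),
                    key=lambda item: np.abs(item[1][0] - profile).sum())
    selected, n_tokens = [], 0
    for art_id, (_, tokens) in ranked:
        if n_tokens >= max_tokens:
            break
        selected.append(art_id)
        n_tokens += tokens
    return selected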

3.6 Results for genre-aware parsing

Table 6 shows parsing results for the genre-aware models based on different similarity measures. The structural model (Exp01) fails to outperform the random baseline for all genres but the letters.11 The KNS model (Exp02), which is based on surface features, gives a significant improvement for agency messages and letters but also fails to improve LAS for the other genres. Most interestingly, the topic setting (Exp03) is the only model where we see an improvement over the baseline for all genres. The COReX features improve results for nearly all genres, however not always significantly.

As already pointed out above, the success of the topic setting might be due to the fact that we directly optimised the selection of training instances based on their similarity to the articles in the test set (and not, as done for the other settings, by approximating genres via unlabelled data (Exp01–02) or by computing similarity against an averaged score obtained from all articles in the treebank (Exp04)). We thus run another experiment where we create additional test data for each genre by selecting articles from this particular genre but with a topic distribution different from the one in the previous test data. We do this by selecting new articles so that we maximise the Manhattan distance to the articles in the old test sets. Then we use the same parsing models (Exp03) to parse the data and compare the results to the ones we get when parsing the new test sets with the baseline parsing models from Exp01.

10Feature counts are normalised per 1,000 words.
11For significance testing we use Bikel's Randomized Parsing Evaluation Comparator.

size (tokens)  setting  agency  commentary  documentary  interview  letter  portrait
380,000        random    93.80    90.62       90.44       92.54     87.13    90.26
               σ          0.09     0.17        0.27        0.09      0.13     0.29
380,000        topic     94.54*   90.37       90.15       92.55     87.71    90.46

Table 7: LAS for baseline (random training sets from reports; avg. LAS over 5 runs and standard deviation) and for the topic setting where data selection is not optimised on the test set.

Table 7 shows that the parsing models we trained based on topic similarity do not necessarily generalise well to other data from the same genre. Only for agency messages do results improve, while for all other genres results are in the same range or even decrease.

3.7 Discussion

Our results showed that our initial hypothesis, namely that structural similarity is more suitable for capturing genre similarity than surface and content features, is not borne out. We take this as evidence that the concept of genre cannot easily be defined in terms of (or reduced to) structural properties of the texts, at least not as operationalised in our experiments.

We also showed that data selection based on LDA topics in the data can improve parsing scores, as has been shown before by Plank and van Noord (2011) and Mukherjee et al. (2017). This approach, however, requires computing topics over the joint training and test data, which might not always be possible in practice. In addition, our experiments showed that while there is a correlation between topic and genre, the topics we learn are by no means representative of a particular genre. Our results are in line with the results of Petrenz and Webber (2011) for genre prediction. We thus argue that LDA topic modelling might be appropriate for domain adaptation between highly diverse sources, such as biomedical data and newswire data. For more homogeneous source texts such as the ones in our setup, however, relying on content similarity might not be the right approach.

This brings us back to our original research question: How can we model genre distinctions for parsing? So far, our experiments showed that distinguishing genre from domain is by no means an easy task. We argue that the human-labelled categories in the TüBa-D/Z reflect both genre and domain properties, and both seem to have an impact on parsing. We also showed that content similarity based on LDA topics might be useful for adapting a parser to new domains but not for adapting it to a new genre.

4 Conclusions

We presented an approach to genre-aware parsing for the setting where we only have access to small amounts of annotated training data for each genre. Our approach tests several ways to operationalise similarity and makes use of large unannotated data to learn genre-specific distributions of features. Based on this, we extract training sets for each genre by selecting sentences from the pool of annotated training data that are similar to the target genre. We computed similarity based on surface features, structural features, text topic and fine-grained linguistic features, and showed that different feature types work best for different datasets. We take that as evidence that for parser adaptation we have to deal with a mixture of genre and domain effects, and that to obtain optimal results we need to model both. However, using content features such as topics for modelling genre similarity might be dangerous, as those features do not generalise well. In future work we would like to test our approach in a setting where no human-assigned genre labels are available, and also apply self-training to extend the training data size for genre-aware parsing.

Acknowledgments

This research has been partially supported by the Leibniz Science Campus “Empirical Linguistics and Computational Modeling”, funded by the Leibniz Association under grant no. SAS-2015-IDS-LWC and by the Ministry of Science, Research, and Art (MWK) of the state of Baden-Württemberg.

References

Eric Baucom, Levi King, and Sandra Kübler. 2013. Domain adaptation for parsing. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013. Hissar, Bulgaria, pages 56–64.

Douglas Biber. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.

Douglas Biber. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.

Douglas Biber and Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.

Douglas Biber and Jesse Egbert. 2015. Using grammatical features for automatic register identification in an unrestricted corpus of documents from the open web. Journal of Research Design and Statistics in Linguistics and Communication Science 2(1):3–36.

Felix Bildhauer and Roland Schäfer. 2017. COReX und COReCO: A lexico-grammatical document annotation framework for large German corpora. Poster at the Computational linguistics poster session at the DGfS 2017.

Anders Björkelund and Joakim Nivre. 2015. Non-deterministic oracles for unrestricted non-projective transition-based dependency parsing. In Proceedings of the 14th International Conference on Parsing Technologies. Bilbao, Spain, pages 76–86.

John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Sydney, Australia, EMNLP ’06, pages 120–128.

Philip Clarkson and Ronald Rosenfeld. 1997. Statistical language modeling using the CMU-Cambridge toolkit. In ESCA Eurospeech. pages 2707–2710.

Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, João V. Graça, and Fernando Pereira. 2007. Frustratingly hard domain adaptation for dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. EMNLP-CoNLL.

S. Feldman, M. A. Marin, M. Ostendorf, and M. R. Gupta. 2009. Part-of-speech histograms for genre classification of text. In Proceedings of the 2009 International Conference on Acoustics, Speech and Signal Processing. Washington, DC, IEEE’09, pages 4781–4784.

Jennifer Foster. 2010. "cba to check the spelling" investigating parser performance on discussion forum posts. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles, California, HLT ’10, pages 381–384.

Daniel Gildea. 2001. Corpus variation and parser performance. In Empirical Methods in Natural Language Processing. EMNLP ’01, pages 167–202.

Kristina Gulordava and Paola Merlo. 2016. Multi-lingual dependency parsing evaluation: a large-scale analysis of word order properties using artificial data. TACL 4:343–356.

Brett Kessler, Geoffrey Nunberg, and Hinrich Schütze. 1997. Automatic detection of text genre. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics. Morristown, NJ, ACL’97, pages 32–38.

David YW Lee. 2001. Genres, registers, text types, domains and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology 5(3):37–72.

Haitao Liu. 2010. Dependency direction as a means of word-order typology: A method based on dependency treebanks. Lingua 120(6):1567–1578.

Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology. Plainsboro, NJ, HLT '94, pages 114–119.

David McClosky, Eugene Charniak, and Mark Johnson. 2006. Reranking and self-training for parser adaptation. In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics. Sydney, Australia, COLING-ACL, pages 337–344.

David McClosky, Eugene Charniak, and Mark Johnson. 2010. Automatic domain adaptation for parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles, California, HLT '10, pages 28–36.

Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. EMNLP-CoNLL '07, pages 122–131.

Jeff Mitchell and Mark Steedman. 2015. Parser adaptation to the biomedical domain without re-training. In Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis. Lisbon, Portugal, pages 79–89.

Thomas Mueller, Helmut Schmid, and Hinrich Schütze. 2013. Efficient higher-order CRFs for morphological tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, pages 322–332. http://www.aclweb.org/anthology/D13-1032.

Atreyee Mukherjee, Sandra Kübler, and Matthias Scheutz. 2017. Creating POS tagging and dependency parsing experts via topic modeling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, EACL '17, pages 347–355.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. Prague, Czech Republic, pages 915–932.

Rebecca J. Passonneau, Nancy Ide, Songqiao Su, and Jesse Stuart. 2014. Biber redux: Reconsidering dimensions of variation in American English. In Proceedings of the 25th International Conference on Computational Linguistics. COLING '14, pages 565–576.

Philipp Petrenz and Bonnie Webber. 2011. Stable classification of text genres. Computational Linguistics 37(2):385–393.

Slav Petrov and Ryan McDonald. 2012. Overview of the 2012 shared task on parsing the web. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).

Barbara Plank and Khalil Sima'an. 2008. Subdomain sensitive statistical parsing using raw corpora. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA), Marrakech, Morocco.

Barbara Plank and Gertjan van Noord. 2011. Effective measures of domain similarity for parsing. In Proceedings of the Second International Conference on Human Language Technology Research. San Francisco, CA, pages 82–86.

Ines Rehbein. 2011. Data point selection for self-training. In Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages. Dublin, Ireland, SPMRL '11, pages 62–67.

Brian Roark, Margaret Mitchell, and Kristy Hollingshead. 2007. Syntactic complexity measures for detecting mild cognitive impairment. In Proceedings of the ACL 2007 Workshop on Biomedical Natural Language Processing. BioNLP, pages 1–8.

Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing. Manchester, UK, pages 44–49.

Satoshi Sekine. 1997. The domain dependence of parsing. In Applied Natural Language Processing. ANLP '97, pages 96–102.

Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2. Portland, Oregon, HLT '11, pages 682–686.

Heike Telljohann, Erhard Hinrichs, and Sandra Kübler. 2004. The TüBa-D/Z treebank: Annotating German with a context-free backbone. In Proceedings of the Fourth International Conference on Language Resources and Evaluation. Lisbon, Portugal, LREC '04, pages 2229–2235.

Ivan Titov. 2011. Domain adaptation by constraining inter-domain variability of latent feature representation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Portland, Oregon, HLT '11, pages 62–71.

Bonnie Webber. 2009. Genre distinctions for discourse in the Penn TreeBank. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, Suntec, Singapore, ACL '09, pages 674–682.

Error Analysis of Cross-lingual Tagging and Parsing

Rudolf Rosa and Zdeněk Žabokrtský
Charles University, Faculty of Mathematics and Physics,
Institute of Formal and Applied Linguistics,
Malostranské náměstí 25, 118 00 Prague, Czech Republic
{rosa,zabokrtsky}@ufal.mff.cuni.cz

Abstract

We thoroughly analyse the performance of cross-lingual tagger and parser transfer from English into 32 languages. We suggest potential remedies for identified issues and evaluate some of them.

1 Introduction

In this case study, we try to answer several questions one might have about the performance of cross-lingual tagging and parsing. We do that by extensively evaluating a state-of-the-art cross-lingual setup, with a single source language (English) and 32 target languages.

A researcher in cross-lingual parsing might ask what the strengths and weaknesses of the system are: which information is transferred well from the input knowledge, which information is lost in the transfer, and which information is already missing or confusing on the input – and why that probably is and how this might potentially be addressed. Furthermore, a user of cross-lingual parsing, such as a computational linguist interested in utilising the outputs in subsequent automatic processing, or a formal linguist interested in the syntax of low-resource languages, may ask a somewhat different set of questions, such as how trustworthy the outputs of the system are, and which parts of the outputs are likely to be correct. We try to answer questions of both kinds, analysing errors in cross-lingual parsing along various dimensions.

We focus on a state-of-the-art cross-lingual parsing setup, based on translating training data with a 1:1 machine translation (MT) system – this is the approach used in SFNW (Rosa et al., 2017), the winning system of the VarDial cross-lingual parsing shared task (Zampieri et al., 2017).

We make sure our setup is realistic for the supposed low-resource scenario by only requiring a dependency treebank for a source language (we use English) and source–target parallel data to perform the cross-lingual parser transfer; in particular, we do not assume the availability of a target-language tagger (or data to train one), contrary to a lot of previous work in the field. In practice, significantly better results can be achieved by carefully selecting one or more appropriate source languages for each target language, but this would add too much complexity to our analysis, and we thus leave it for future work. Using a fixed source language makes it easier to generalise our observations over some or all of the target languages. Moreover, choosing English specifically, which we understand well both theoretically and practically, allows us to perform a more in-depth analysis than with a source language of which we only have limited knowledge.

Note that we do require supervised target-language treebanks to be able to perform the error analysis. However, we hope that our observations can provide a more general insight into the mechanisms of cross-lingual processing, driving intuitions and seeding expectations valid even for languages that we did not cover, and thus help a researcher make an informed choice of a particular setup for this scenario, knowing what to be careful about and what to expect. We hope this to be especially useful with truly under-resourced target languages, where performing an error analysis of the outputs is costly.

We review previous work in Section 2 and describe our setup in Section 3. We then proceed with error analysis of cross-lingual tagging (Section 4) and parsing (Section 5), evaluate some of our suggested remedies in Section 6, and conclude with Section 7.
Keywords: cross-lingual parsing, cross-lingual tagging, Universal Dependencies

2 Cross-lingual parsing

Cross-lingual parsing is the task of performing syntactic analysis of a target language with no treebank available for that language, by using annotated data for a different source language and a method for transferring the knowledge about syntactic structures from that source language into the target language. It has been studied for over a decade, starting with the works of Hwa et al. (2005) and Zeman and Resnik (2008), and then continued by many others, such as McDonald et al. (2011), Täckström et al. (2012), Georgi et al. (2013), Agić et al. (2015), Søgaard et al. (2015), and Duong et al. (2015).

A thorough overview, analysis and comparison of existing methods can be found in Tiedemann et al. (2016). The authors also include a detailed analysis of the performance of the systems based on various factors, such as part-of-speech (POS) labelling accuracy or size of training data. Another work dealing with error analysis of cross-lingual parsing systems is that of Ramasamy et al. (2014).

The system evaluated in this paper is a new version of the aforementioned SFNW (Rosa et al., 2017), improved and generalised according to our experiments and the findings of other researchers, such as Tiedemann (2014). The core of our approach is to translate the source treebank into the target language with a word-by-word statistical MT system (Moses in an adapted setup), resulting in a pseudo-target treebank, which is then used to train a standard tagger and parser. Limiting the MT system in this way leads to a lower quality of the translations, but allows us to use an extremely simple 1:1 cross-lingual transfer strategy. This approach has been shown to achieve results competitive with high-quality phrase-to-phrase translation followed by complex many-to-many transfer strategies, as usually done by other authors.

For simplicity, we use a setup with a fixed source language (English) in this work. This allows us to keep the experimental space at a manageable scale, as well as to provide a more in-depth analysis thanks to our knowledge of the shared English source. However, we admit that this also significantly reduces the achieved scores – in practice, one should always carefully select appropriate source language(s) for each target language, as shown e.g. by Rosa and Žabokrtský (2015), or more recently and comprehensively by Agić (2017). Admittedly, the value of our analysis is thus somewhat limited from that perspective.

3 Setup

3.1 Cross-lingual tagger and parser transfer

We use the following approach to obtain a tagger and a parser for the target language t, assuming the availability of a treebank for a source language s (English), and s–t sentence-aligned parallel data:

1. Train a word-based MT system on the parallel data
2. Obtain a synthetic t treebank by translating the words in the s treebank
3. Train a tagger on the t treebank
4. Re-tag the t treebank with the tagger
5. Train a parser on the re-tagged treebank

As the cross-lingual transfer happens already in the training phase, the prediction phase is then trivial:

1. Tag the t text with the t tagger
2. Parse the tagged text with the t parser

We only use the word forms and the POS tags predicted by the tagger, as the other features (lemma, morphological features) are usually too specific for each language and do not transfer well cross-lingually, typically bringing only very moderate improvements or even deteriorations. We also trained fully supervised monolingual taggers and parsers to provide reference scores; these were trained with the same settings, but using existing target treebanks instead of the synthetic ones.
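To make the 1:1 transfer concrete, here is a minimal Python sketch of the treebank translation in step 2; a plain source-to-target dictionary stands in for the constrained Moses system, and the function name and dictionary interface are our own assumptions.

def translate_treebank(conllu_text, lexicon):
    """Step 2 of the pipeline: 1:1 translation of the word forms of the
    source treebank. Because translation is word-to-word with no
    reordering, all tags and dependencies carry over unchanged. A plain
    dictionary stands in for the constrained Moses system; OOV forms are
    copied over, and multi-word token lines are ignored for brevity."""
    out_lines = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            cols[1] = lexicon.get(cols[1].lower(), cols[1])  # FORM column
            line = "\t".join(cols)
        out_lines.append(line)
    return "\n".join(out_lines)

Because the translation is strictly word-to-word, the annotation transfer is trivial, which is exactly what makes this strategy so simple compared to many-to-many projection.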

3.2 Languages and dataset

We used the Universal Dependencies v1.4 treebanks1 (Nivre et al., 2016) – train for training and dev for evaluation – and parallel OpenSubtitles2016 data from the OPUS collection2 (Tiedemann, 2012). We used all UD 1.4 languages except for those that had no or too little parallel data (cop, cu, ga, got, grc, kk, la, sa, swl, ta, ug) and those that do not use spaces to separate words (ja, zh), thus limiting ourselves to 32 target languages.3

1http://universaldependencies.org/docsv1/index.html
2http://opus.lingfil.uu.se/

For the analysis, we sorted and grouped the languages into three groups according to cross-lingual tagging accuracy. A detailed overview of the languages and datasets can be found in Table 4 in the Appendix; a brief overview of the emergent language groups follows:

High pt, no, it, fr, da, de, sv
European languages closely related to English, from the Germanic and Romance language families, with sufficient parallel data to provide high-quality machine translation, and thus high accuracy in cross-lingual tagging and parsing.

Med bg, ca, gl, nl, sk, cs, ru, id, el, hr, ro, pl, et, lv, sl
Mostly European languages from the Indo-European family (with the exception of id and et) which are more distant from English and/or lower on parallel data, but still achieving competitive translation quality and mediocre accuracy of cross-lingual methods.

Low fi, he, hu, uk, tr, ar, fa, vi, eu, hi
Distant non-European or non-Indo-European languages (with the exception of uk, which is extremely low on parallel data), achieving very low quality of both MT and cross-lingual methods.

3.3 Tools

We used the following tools in the cross-lingual analysis pipeline in the following ways:

• a rule-based Treex tokenizer4 (Popel and Žabokrtský, 2010) to tokenize the parallel data,
• the UDPipe tagger and parser bundle5 (Straka et al., 2016) to train the taggers and parsers,
• word2vec6 (Mikolov et al., 2013) to pre-compute target word embeddings for the parser,
• MGiza7 to compute intersection-symmetrized word alignment links (-alignment intersect),
• the Moses SMT system8 (Koehn et al., 2007) to translate the treebank data, constrained to perform word-to-word translation with no reordering (-max-phrase-length 1 -dl 0),
• the KenLM language model (Heafield et al., 2013) as a component of Moses.

Our source codes are freely available on GitHub,9 containing both the cross-lingual parsing pipeline and the evaluation scripts, which can produce detailed accuracy breakdowns along various dimensions for both tagging and parsing and which provided the data for the tables in this paper. To manually inspect the CoNLL-U files, we used the conll_view tool (Rosa, 2017).

4 Tagging error analysis

As parsing heavily depends on the UPOS tags, we will first analyse errors in tagging. Cross-lingual Universal POS (UPOS) tagging accuracies for the most frequent UPOS tags are shown in Table 1. For an interested reader, a larger table can be found in the Appendix (Table 5), showing UPOS tagging accuracies for all UPOS tags, as well as the most common errors in cross-lingual tagging together with their frequencies. However, the presented analysis is also based on other, more detailed numbers, which are not shown here for space reasons, as well as on direct inspection of the inputs and outputs in some cases.

Note that we are mainly interested in tagging as a pre-processing step for parsing – achieving high-quality tagging is expected to improve the parsing quality, but is not our primary goal in itself.
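The per-tag breakdowns reported in the following sections can be produced by an evaluation loop of this kind (a sketch of our own, not the released evaluation scripts; macro-averaging over languages is omitted):

from collections import Counter

def upos_breakdown(gold_sents, pred_sents):
    """Per-tag accuracy and confusion counts, the kind of breakdown
    behind Table 1 and the Appendix tables. Each sentence is a list of
    (form, upos) pairs with identical tokenisation on both sides."""
    correct, total, confusions = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_sents, pred_sents):
        for (_, g), (_, p) in zip(gold, pred):
            total[g] += 1
            if g == p:
                correct[g] += 1
            else:
                confusions[(g, p)] += 1
    accuracy = {tag: correct[tag] / total[tag] for tag in total}
    return accuracy, confusions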

4.1 Nouns

Cross-lingual tagging of both common nouns (NOUN) and proper nouns (PROPN) is very successful, with accuracies usually notably above the average across all language groups – a noun in one language seems to usually correspond 1:1 to a noun in the other language, making nouns highly suitable for the selected lexical transfer method.

3This was done mostly for simplicity – ja and zh tokenizers do exist and/or can be trained, and some parallel data could presumably be found even for the omitted languages; we leave re-including those languages for future work.
4https://github.com/ufal/treex/blob/master/lib/Treex/Block/W2A/Tokenize.pm
5http://ufal.mff.cuni.cz/udpipe
6https://code.google.com/archive/p/word2vec/
7https://github.com/moses-smt/mgiza
8http://www.statmt.org/moses/
9https://github.com/ptakopysk/crosssynt

Setup          Languages  all   NOUN  VERB  PRON  ADP  DET  PROPN  ADJ  ADV  AUX
Cross-lingual  Low        58%   63%   55%   57%   57%  59%  61%    34%  39%  38%
Cross-lingual  Med        73%   79%   74%   57%   75%  61%  78%    51%  56%  52%
Cross-lingual  High       82%   86%   81%   73%   87%  80%  75%    62%  60%  70%
Supervised     English    94%   93%   95%   98%   97%  99%  85%    89%  88%  97%

Table 1: Macro-averaged tagging accuracy of the cross-lingual tagging, factored along gold UPOS tags (only the most frequent shown) and language groups; also listing the fully supervised English tagger.

The most common error in tagging of nouns is mistaking one of the types for the other (NOUN for PROPN or PROPN for NOUN) – specifically, 30% of words predicted to be PROPNs are actually NOUNs, which is a rather high error rate. Many of these errors happen at the sentence-initial word, in parts of titles, and at nouns that are capitalised in English (months, days of the week, titles) – these could probably be at least partially avoided by truecasing the data. As the capitalisation of PROPNs is an important feature for the tagger, we saw a huge drop in PROPN tagging accuracy for German (capitalises all nouns) and Hindi (does not capitalise anything). For such languages, it might make sense to abandon the NOUN/PROPN distinction (as is common in other tagsets), leading to a less granular but more accurate tagging which the parser could better rely on; a new feature could be added to the parser input capturing information about the casing of the word (e.g. lowercase/uppercase/capitalised/mixed) so that this information is not lost.
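The casing feature proposed above could be implemented along these lines (a sketch; the class labels and the idea of storing the value in the CoNLL-U MISC column are our illustrative choices):

def casing_class(form):
    """Coarse casing feature, as proposed above, so that merging NOUN and
    PROPN (or truecasing) does not lose the capitalisation signal."""
    if form.islower():
        return "lowercase"
    if form.isupper():
        return "uppercase"
    if form[:1].isupper() and form[1:].islower():
        return "capitalised"
    return "mixed"

# Example: attach the feature to each token, e.g. via the CoNLL-U MISC column.
assert casing_class("Haus") == "capitalised"
assert casing_class("NASA") == "uppercase"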

4.2 Adjectives

The overall most frequent error is an adjective (ADJ) confused for a NOUN. This seems to be mostly caused by the fact that in English, NOUNs are often used as adjectives – as in e.g. “fruit salad”, where the noun “fruit” in this context would be expressed by an adjective in many languages. Because of that, the translation of the treebank often contains much noise in the form of adjectives labelled as nouns, hence the error. Other than choosing a different source language which does not have this property, one could try to alleviate this problem by e.g. identifying such cases in the source data and forcibly relabelling them with the UPOS of the expected translation; or, more straightforwardly, by simply removing all sentences containing such trap words. As suggested by Reviewer 2, an even more fine-grained approach could be used, deleting only the confusing adjective-like nouns but keeping the modified sentences in the training data. We note that although this problem seems to be rather specific to English, similar confusing situations with words of unclear POS exist in other languages.

Moreover, ADJs perform particularly badly in target languages with the NOUN ADJ word order, with all Romance languages (pt, it, fr, ca, gl, ro) constituting a prominent example – if the error distribution is computed only on Romance languages, only 40% of ADJ labels get actually assigned to ADJs, while 45% of words predicted to be ADJs are actually NOUNs or PROPNs. This shows the tremendous importance of word order for tagging. Primarily, one should try to use a source language with word order similar to the target language. Otherwise, it may be possible to handle these cases by employing a reordering model within the MT system (which we explicitly disallowed in our setup), or by pre-reordering the source sentences to resemble more closely the target word order, as done e.g. by Aufrant et al. (2016). A simpler but potentially interesting approach could also be to modify the word order randomly, by locally shuffling parts of the sentences, thus making the tagger more robust to differences in word order.
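A sketch of the sentence-removal variant, assuming UD v1-style annotation where an attributively used noun carries the deprel compound (this operationalisation of a "trap word" is our own):

def drop_trap_sentences(conllu_sentences):
    """Remove source sentences containing English nouns used attributively
    (as in 'fruit salad'), one of the options suggested above. A 'trap
    word' is operationalised as a token with UPOS NOUN and deprel
    'compound'; each sentence is a list of CoNLL-U token lines."""
    kept = []
    for sent in conllu_sentences:
        rows = [line.split("\t") for line in sent]
        trap = any(c[0].isdigit() and c[3] == "NOUN" and c[7] == "compound"
                   for c in rows if len(c) == 10)
        if not trap:
            kept.append(sent)
    return kept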

4.3 Verbs

Auxiliaries (AUX) are often confused with verbs (VERB), with the accuracy on AUX quite low even for many of the High group languages (with the exception of the Romance languages), and falling quickly for the other language groups. As different languages use different verbs as auxiliaries and in different ways, they get very easily mistranslated by the MT system.

Of course, as always, one should choose a source language that uses AUXes in the same way as the target language. However, if this is not possible, it may help to discard the VERB/AUX distinction and label everything as VERBs. This theoretically means losing some information, but, looking at the accuracies of AUX tagging, in many cases the information is already lost anyway. On the other hand, it could make the subsequent parser more robust and thus more successful than a parser that learns to trust the AUX labels. Furthermore, some languages do not seem to use auxiliaries much (or at all). In such cases (as in all cases where a source data label is not relevant for the target language), the cross-lingual parsing might be improved by deleting the AUX tokens from the source data altogether.

4.4 Pronouns, Determiners and Adpositions

Pronouns (PRON) seem to be rather difficult, with a very low accuracy even in the High languages, as even similar languages tend to use pronouns differently (this may still partially be due to unresolved inter-lingual annotation inconsistencies). A common error is confusing PRONs with determiners (DET) both ways, especially in languages where the same word form can be used both as a DET and as a PRON (e.g. fr, it). We believe that it may help to relabel all DETs as PRONs in such cases, thus postponing the decision to parsing.

Another frequent error is related to reflexive pronouns, which are very common in many languages but not very prominent in English, leading to misalignments, mistranslations, and then mistaggings – e.g. the reflexive pronoun in the target language often gets aligned to an AUX in English (which may or may not be appropriate). We have also noticed frequent mistranslations of English PRONs with pro-drop target languages; here the source PRON typically gets aligned to some other word, such as an AUX (which, again, might be the best thing to do in some cases, but not always). If a source language matching in the aforementioned characteristics cannot be used, it may be possible to modify the source to correspond better to the target. However, these cases clearly show the limitations of the selected word-by-word MT approach, in contrast to the classical phrase-based one, which inherently learns to add/remove words that do not have a proper counterpart in the other language by using variable-length phrases, and thus should suffer from such problems much less.

Tagging of adpositions (ADP) is relatively accurate, but they are sometimes confused for DETs; this happens more often in languages that are low on DETs (e.g. Russian), where the word aligner is likely to misalign one of the DETs that are abundant in English onto a target ADP. In such cases, it might be beneficial to remove some of the DETs from the source data – e.g. “a” and “the” if the target language does not use similar determiners – but keep the other DETs (“this”, “some”, etc.); a sketch of such filtering follows below. Still, in some target languages, DETs seem to be so rare (or possibly even non-existent) that the cross-lingual parsing might be improved by simply deleting all DET tokens from the source data.
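A sketch of the selective determiner removal; restricting it to leaf tokens so that the dependency tree stays well-formed is our own safety assumption rather than part of the described setup:

def remove_articles(sent):
    """Delete 'a'/'an'/'the' DET tokens from one sentence (a list of
    CoNLL-U column lists), renumbering ids and heads. Only leaf
    determiners are removed so that the tree stays intact."""
    toks = [c for c in sent if c[0].isdigit()]
    heads = {c[6] for c in toks}
    drop = {c[0] for c in toks
            if c[3] == "DET" and c[1].lower() in {"a", "an", "the"}
            and c[0] not in heads}          # only leaves
    kept = [c for c in toks if c[0] not in drop]
    remap = {"0": "0"}
    for i, c in enumerate(kept, start=1):
        remap[c[0]] = str(i)
    for c in kept:
        c[0], c[6] = remap[c[0]], remap[c[6]]
    return kept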

5 Parsing error analysis

Labelled Attachment Scores (LAS) for the most frequent dependency relation labels are shown in Table 2. For an interested reader, a larger table can be found in the Appendix (Table 6), showing accuracies for more labels, as well as the most common labelling errors together with their frequencies.

The least frequent dependency relations are not included in any of the tables, as the evaluation results have little meaning there – mostly the scores are computed over very small numbers of instances, and the measured accuracies are thus rather random numbers. A general remark regarding the low-frequency labels is that they mostly should not be trusted, as even the parser has very little training support for them. It is definitely worth considering removing them altogether from the training data in the cross-lingual scenario, replacing them by some more general relations (even dep), as with the accuracy of the cross-lingual parsing as low as it is, these come out mostly as random noise.
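The suggested relabelling of low-frequency relations amounts to a simple frequency cut-off over the synthetic training data; a minimal sketch (the threshold value is a hypothetical choice of ours):

from collections import Counter

def generalise_rare_deprels(sentences, min_count=500):
    """Replace low-frequency dependency relations in the (synthetic)
    training data by the generic 'dep', as suggested above. `sentences`
    is a list of sentences, each a list of CoNLL-U column lists."""
    counts = Counter(c[7] for sent in sentences
                     for c in sent if c[0].isdigit())
    for sent in sentences:
        for c in sent:
            if c[0].isdigit() and counts[c[7]] < min_count:
                c[7] = "dep"
    return sentences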

Setup          Langs    ALL  punct  nmod  case  nsubj  det  root  dobj  compound  advmod  amod
Cross-lingual  Low      20%  28%    8%    21%   21%    36%  35%   10%   16%       17%     21%
Cross-lingual  Med      34%  38%    22%   48%   32%    46%  55%   32%   15%       33%     41%
Cross-lingual  High     51%  49%    45%   75%   48%    66%  63%   49%   23%       41%     55%
Supervised     English  80%  75%    74%   92%   87%    95%  88%   84%   74%       74%     83%

Table 2: Macro-averaged LAS of the cross-lingual parsing, factored along gold-standard dependency relations (only the most frequent shown) and language groups; also including LAS for the fully supervised English parser.

5.1 Nouns

With nouns, the dependency relation (usually nmod, compound, nsubj, or dobj) is often incorrectly distinguished. It should be noted that for other parts of speech, it is usually easier to correctly identify the relation label than the head – the label is often determined by the POS already, sometimes together with some simple local context. For nouns, however, the situation is different, as there are four very common dependency labels, and they can be very hard to correctly identify, especially in a cross-lingual setting – different languages use different means of distinguishing them (e.g. word order, adpositions, determiners, or morphology), and so they are often mislabelled even when the head is identified correctly. For languages from the High group, the problem is not that severe, since they mostly use similar distinguishing features as English; however, we observe a huge drop in accuracy when moving to the Med group, and we even see low results for some of the High languages, such as German.

Detailed investigation showed that the most frequent mistake is mislabelling an nmod relation as a compound. Nmods in English are nearly always marked by adpositions (as in “the house of the lady”), while a sequence of nouns without a preposition is typically a compound (as in “investment firm”). However, many languages (e.g. German) use case marking for nmods, where the case may be expressed e.g. by a determiner (as in “das Haus der Frau”) – which, due to an adposition not being present, the parser usually mislabels as a compound. Most of the noun labelling errors are actually compounds mislabelled as other relations, or other relations mislabelled as compounds. What hugely adds to this is the fact that the compound relation is much more frequent in English than in most of the target languages, where it is usually rare or not present (again, this may partially be an inter-lingual inconsistency of the annotation). Due to this, it may be sensible to either relabel the compounds as other relations (presumably nmods, which they are on average most frequently confused with), or delete the compound tokens from the source data altogether. While this would inevitably cut the compound-labelling accuracy to zero, it may still increase the overall parsing accuracy thanks to the rareness of this label in most target languages.

Other labels get frequently confused as well, such as switching nsubj and dobj, especially in languages which mark the subject and object morphologically rather than with word order. Thus, when choosing a source language for a given target language, it seems highly important to observe the way they mark noun-based relations and the way they join together chains of nouns, as mismatches in this aspect led to the largest number of errors on our dataset. Moreover, amods also often get mislabelled as compounds, due to the difficulty in correctly identifying the NOUN or ADJ category when translating from English, as explained in Section 4.2.

Furthermore, the parsing of PROPNs also shows very low accuracies across all of the language groups. However, this seems to be at least partially caused by inter-treebank annotation inconsistencies, as v1 of the UD guidelines seems not to have been explicit enough about the correct way of annotating names (later noting e.g. that “The name label is another one that has led to confusion.”). Therefore, UD decided to redesign name annotation in UD v2, as explained online,10 which will hopefully suppress this problem significantly. However, a real problem with PROPNs in the source data remains that they are necessarily often unknown to the MT system and thus remain untranslated in the training treebank, which may confuse the subsequent tools.
It is therefore probably worth considering pre-processing the data in some way. One option would be to replace the specific names (which are bound to be unknown to all the tools) with some generic placeholders (which the tools can be trained to process), provided this can be done on the target side as well (e.g. using cross-lingual or language-independent named entity recognisers). A slightly different approach could be to replace uncommon names with more common ones (so e.g. we could rename “Pervaiz Musharraf” and “Velupillai Prabhakaran” to “John Smith” and “Martin Jones”).
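A minimal sketch of the placeholder option, simplified to replacing PROPN forms that the MT lexicon cannot translate; the placeholder form and the vocabulary interface are our assumptions:

def replace_rare_propns(sent, known_vocab, placeholder="John"):
    """Replace PROPN forms unknown to the MT system with a common
    placeholder name, one of the pre-processing options suggested above.
    `sent` is a list of CoNLL-U column lists; `known_vocab` is the set of
    source forms the MT system can translate."""
    for c in sent:
        if c[0].isdigit() and c[3] == "PROPN" and c[1].lower() not in known_vocab:
            c[1] = placeholder
    return sent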

10http://universaldependencies.org/v2/semantic-categories.html

Experiment      Low group     Med group      High group     All languages
Base                 19.6%         34.1%          51.2%          33.3%
NOUN+PROPN     4/10  -0.6%   6/15  -0.2%    2/7   -0.4%   12/32  -0.4%
VERB+AUX       7/10   0.0%  10/15   0.3%    2/7    0.0%   19/32   0.1%
PRON+DET       6/10  -0.3%   9/15   0.1%    3/7   -0.2%   18/32  -0.1%
nmod+compound  5/10   0.8%   9/15   0.8%    4/7   -0.1%   18/32   0.6%
Reordering     6/10   1.0%   2/15  -3.7%    0/7  -10.4%    8/32  -3.7%

Table 3: Number of target languages for which improvement was observed and absolute improvement in macro-averaged LAS when the various modifications are applied, as compared to Base (cf. Table 2).

5.2 Easy regular phenomena

Unsurprisingly, phenomena that behave quite regularly – case, nummod, punct, det, amod, advmod – are rather easy to parse correctly, as long as they bear the correct POS tag. As explained in Section 4, correctly tagging some of them is often tricky, especially with amod (ADJ tag), advmod (ADV tag), and det (DET tag); however, if their tagging succeeds, it is usually not difficult for the parser to identify the correct head for them, and identifying the correct dependency relation label is mostly trivial. In particular, the amod accuracies are quite low for the Romance languages, which prefer the NOUN ADJ order.

As could be expected, the head assignment accuracy for the case relation drops to near zero for target languages that strongly prefer postpositions while the source language strongly prefers prepositions. This is manifested by the relatively very low case accuracy for the Low language group, which contains several such languages. As already discussed in Section 4.2, the problems related to differences in word order may be solvable by employing a reordering component, either before or during the translation.

5.3 Verbs

In general, parsing of VERBs is quite successful over all language groups. However, the auxiliary verbs (aux, cop) are only parsed well in the High group, i.e. in languages with sufficiently similar grammar (the ideal source language should use auxiliary verbs similarly to the target language). Moreover, clausal relations (advcl, acl, xcomp, ccomp) are very hard to get right, even for the High languages (and often even for a fully supervised parser) – both in assigning the correct head, as they tend to form long-distance relations, and in assigning the correct label, as all of these are frequently confused for each other. Thus, these should not be trusted much on the output of cross-lingual parsing.

6 Preliminary experiments

Implementing, fine-tuning and evaluating all of the modifications of the base approach that we suggest would clearly be beyond the scope of this work. Nevertheless, we include at least a brief experimental part, evaluating the effects of several of the suggested modifications – merging a pair of UPOS labels (NOUN+PROPN, VERB+AUX, PRON+DET), merging a pair of dependency relation labels (nmod and compound),11 and allowing reordering in Moses.12 Note that these are rather preliminary results, without the usual several iterations of experimentation and evaluation.

Table 3 shows the number of languages for which LAS improved when the modifications were applied, and the average improvement/deterioration in LAS for each language group. We see that even the very noisy PROPN signal from the tagger is useful for the parser, probably because the main distinguishing feature (capitalisation) is not directly available to the parser, and it thus cannot easily make the distinction itself. We thus believe that other approaches are to be tried out, such as truecasing the data and/or explicitly including information about the casing in the parser input.

Merging the other label pairs usually behaved quite expectedly, slightly improving the results for the Low and Med groups, but not for the High group. The results for merging DET and PRON are rather mixed, as the language groups do not sufficiently differentiate the usage of determiners in the target language; one should be more careful when deciding whether to merge these labels or not. The very frequent compound label, on the other hand, is something very specific to English, while in most target languages it is rare or non-existent; thus, removing it helped even for many languages in the High group.

Surprisingly, enabling reordering in Moses led to deteriorations (often large) in LAS for all languages except for a few of the most dissimilar ones (8/32), even though the BLEU score actually improved in most cases (24/32). This clearly requires a thorough further investigation, as our previous experiments (unpublished) indicated a positive correlation between BLEU and LAS. Based on a quick inspection of the data, we currently hypothesise that disallowing reordering forces the MT system to produce more literal translations, which better preserve the sentence structure (POS and dependency relations).

11The labels were not merged in the test data – the parser is still “expected” by the evaluator to output the compound label.
12We used the setting recommended in the documentation (-reordering msd-bidirectional-fe). Moses decoding was set to output the word alignment (-alignment-output-file file.a), which was used to correctly transfer the annotations.
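For reference, the label merging evaluated in Table 3 is a simple relabelling of the synthetic training data; a sketch follows (note that the experiments apply one merge at a time, whereas the maps below list all tested pairs together, and that the test data is left unmerged, cf. footnote 11):

UPOS_MERGES = {"PROPN": "NOUN", "AUX": "VERB", "DET": "PRON"}  # tested UPOS pairs
DEPREL_MERGES = {"compound": "nmod"}                           # tested deprel pair

def merge_labels(conllu_line):
    """Apply the label merges to one CoNLL-U token line of the
    *training* data; the test data keeps its original labels."""
    cols = conllu_line.split("\t")
    if len(cols) == 10 and cols[0].isdigit():
        cols[3] = UPOS_MERGES.get(cols[3], cols[3])
        cols[7] = DEPREL_MERGES.get(cols[7], cols[7])
    return "\t".join(cols)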

7 Conclusion

We thoroughly analysed a particular cross-lingual tagging and parsing setup, investigating the behaviour of the tools factored along labels and language groups. We found that the properties of the source and target language have a huge impact on the way the tools work and the kinds of errors we encounter. It is not surprising that the best results are obtained when the source and target languages are close. However, we believe it is not straightforward to determine which aspects of language similarity will have what effect on the analysis of which language phenomena; here, we see the value of our work.

In particular, we saw a high importance of grammatical similarity, especially in terms of word order and the usage of auxiliary words, such as auxiliary verbs, determiners, pronouns, and adpositions. Except for adpositions, the interlingual variation in the usage of the auxiliaries often causes severe problems already in the translation step, with the auxiliaries being frequently misaligned, then necessarily mistranslated, and subsequently mishandled by the tagger and parser.

We spent much of our analyses on understanding the errors that revolve around nouns. However, it seems that the nouns themselves do not cause the problems; it is rather the words around them (especially the auxiliaries), which different languages use differently to mark the roles fulfilled by the nouns. The question of word order similarity is less subtle – we clearly saw well-known word order patterns, such as ADJ NOUN vs NOUN ADJ, or prepositions vs postpositions, cause severe drops in accuracy in case of a mismatch of the preferred word order between the source and target language.

We hope that this analysis can provide more insight into cross-lingual tagging and parsing, and help develop better-performing cross-lingual tools in the future.

Acknowledgments

This work has been supported by grant No. DG16P02B048 of the Ministry of Culture of the Czech Republic, grant No. CZ.02.1.01/0.0/0.0/16_013/0001781 of the Ministry of Education, Youth and Sports of the Czech Republic, the SVV 260 453 grant, and grant 15-10472S of the Czech Science Foundation. This work has been using language resources and tools developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). We would also like to thank the anonymous reviewers for many helpful observations and suggestions.

References

Željko Agić, Dirk Hovy, and Anders Søgaard. 2015. If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015). Hrvatska znanstvena bibliografija i MZOS-Svibor. http://aclweb.org/anthology/P15-2044.

Željko Agić. 2017. Cross-lingual parser selection for low-resource languages. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden. Linköping University Electronic Press, Linköpings universitet, 135, pages 1–10. http://aclweb.org/anthology/W17-0401.

Lauriane Aufrant, Guillaume Wisniewski, and François Yvon. 2016. Zero-resource dependency parsing: Boosting delexicalized cross-lingual transfer with linguistic knowledge. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, pages 119–130. http://aclweb.org/anthology/C16-1012.

Long Duong, Trevor Cohn, Steven Bird, and Paul Cook. 2015. Cross-lingual transfer for unsupervised dependency parsing without parallel data. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, Beijing, China, pages 113–122. http://www.aclweb.org/anthology/K15-1012.

Ryan Georgi, Fei Xia, and William D. Lewis. 2013. Enhanced and portable dependency projection algorithms using interlinear glossed text. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Sofia, Bulgaria, pages 306–311. http://www.aclweb.org/anthology/P13-2055.

Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. 2013. Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Sofia, Bulgaria, pages 690–696. http://www.aclweb.org/anthology/P13-2121.

Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering 11(3):311–325. https://doi.org/10.1017/S1351324905003840.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions. Association for Computational Linguistics, Prague, Czech Republic, pages 177–180. http://www.aclweb.org/anthology/P07-2045.

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP). https://www.aclweb.org/anthology/D11-1006.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. https://arxiv.org/abs/1301.3781.

Joakim Nivre et al. 2016. Universal Dependencies 1.4. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. http://hdl.handle.net/11234/1-1827.

Martin Popel and Zdeněk Žabokrtský. 2010. TectoMT: Modular NLP framework. In Proceedings of IceTAL, 7th International Conference on Natural Language Processing, Reykjavík, Iceland, August 17, 2010. Springer, pages 293–304. https://doi.org/10.1007/978-3-642-14770-8_33.

Loganathan Ramasamy, David Mareček, and Zdeněk Žabokrtský. 2014. Multilingual dependency parsing: Using machine translated texts instead of parallel corpora. The Prague Bulletin of Mathematical Linguistics 102:93–104. http://ufal.mff.cuni.cz/pbml/102/art-ramasamy-marecek-zabokrtsky.pdf.

Rudolf Rosa. 2017. Terminal-based CoNLL-file viewer, v2. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. http://hdl.handle.net/11234/1-2514.

Rudolf Rosa, Daniel Zeman, David Mareček, and Zdeněk Žabokrtský. 2017. Slavic Forest, Norwegian Wood. In Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, and Ahmed Ali, editors, Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial4). Association for Computational Linguistics, Stroudsburg, PA, USA, pages 210–219. http://www.aclweb.org/anthology/W17-1226.

Rudolf Rosa and Zdeněk Žabokrtský. 2015. KLcpos3 – a language similarity measure for delexicalized parser transfer. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, pages 243–249. http://www.aclweb.org/anthology/P15-2040.

Anders Søgaard, Željko Agić, Héctor Martínez Alonso, Barbara Plank, Bernd Bohnet, and Anders Johannsen. 2015. Inverted indexing for cross-lingual NLP. In The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015). http://www.aclweb.org/anthology/P15-1165.

Milan Straka, Jan Hajič, and Jana Straková. 2016. UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16). European Language Resources Association (ELRA), Paris, France. http://www.lrec-conf.org/proceedings/lrec2016/pdf/873_Paper.pdf.

Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL HLT '12, pages 477–487. http://www.aclweb.org/anthology/N12-1052.

Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In LREC. pages 2214–2218. http://lrec.elra.info/proceedings/lrec2012/pdf/463_Paper.pdf.

Jörg Tiedemann. 2014. Rediscovering annotation projection for cross-lingual parser induction. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. pages 1854–1864. http://www.aclweb.org/anthology/C14-1175.

Jörg Tiedemann, Željko Agić, et al. 2016. Synthetic treebanking for cross-lingual dependency parsing. Journal of Artificial Intelligence Research 55:209–248. http://dx.doi.org/10.1613/jair.4785.

Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, and Noëmi Aepli. 2017. Findings of the VarDial evaluation campaign 2017. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Valencia, Spain. http://aclweb.org/anthology/W17-1201.

Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Workshop on NLP for Less-Privileged Languages, IJCNLP. Hyderabad, India. http://www.aclweb.org/anthology/I08-3008.

A Detailed Evaluation Results

Target  Target language     Para data       MT      Treebank tokens        UPOS acc           LAS
group   iso  name           (en tokens)     BLEU    train       dtest      x-ling    sup      x-ling    sup

Low     hi   Hindi              321,339     7.3%      281,057    35,217    48.5%     96.4%    10.4%     86.7%
        eu   Basque           1,082,072     6.1%       72,974    24,095    49.5%     94.1%    10.2%     63.4%
        vi   Vietnamese      13,582,467     8.8%       31,799     6,093    51.3%     88.2%    21.5%     52.0%
        fa   Farsi           23,653,954     1.3%      121,064    15,832    55.0%     96.7%    14.1%     79.2%
        ar   Arabic         149,458,897     3.7%      225,853    28,263    56.5%     95.7%    14.0%     72.5%
        tr   Turkish        234,219,925     3.6%       40,617     8,852    59.8%     93.4%    13.8%     69.5%
        uk   Ukrainian        3,797,579     1.4%        1,281       241    62.7%     68.0%    35.7%     31.5%
        hu   Hungarian      215,222,322     8.1%       33,016     4,781    63.0%     94.2%    21.3%     72.4%
        he   Hebrew         156,340,612    22.0%      135,496    11,234    66.0%     95.4%    28.5%     76.4%
        fi   Finnish        133,830,769     4.0%      162,721     9,161    66.3%     94.5%    26.8%     73.1%
        Average              93,150,994     6.6%      110,588    14,377    57.9%     91.7%    19.6%     67.7%
        Std. dev.            94,197,934     6.0%       92,061    11,313     6.7%      8.6%     8.6%     15.7%
Med     sl   Slovenian      106,842,127    11.5%      112,334    14,021    68.8%     95.0%    33.5%     80.3%
        lv   Latvian          2,548,465     7.3%       13,083     3,640    70.7%     91.2%    24.1%     57.4%
        et   Estonian        64,034,502    10.3%      187,814    22,867    71.6%     94.6%    29.4%     72.8%
        pl   Polish         183,401,406     8.7%       69,499     6,887    71.9%     95.3%    37.9%     80.0%
        ro   Romanian       249,781,321    16.3%      163,262    27,965    72.0%     96.8%    32.3%     76.1%
        hr   Croatian       174,234,575    18.6%      127,894     4,823    72.8%     98.0%    34.4%     78.9%
        el   Greek          205,382,482    13.4%       47,449     6,039    73.1%     97.9%    46.4%     77.5%
        id   Indonesian      31,382,075    16.5%       97,531    12,612    73.7%     93.3%    24.3%     72.0%
        ru   Russian        117,951,946    10.2%       79,772    10,044    73.9%     95.7%    30.4%     74.3%
        cs   Czech          217,464,167    10.2%    1,173,282   159,284    74.1%     98.3%    32.6%     79.7%
        sk   Slovak          44,334,287    11.5%       80,575    12,440    74.1%     94.1%    39.4%     75.6%
        nl   Dutch          197,441,086    20.5%      197,134     6,434    74.8%     94.3%    41.5%     74.1%
        gl   Galician         1,106,922    12.1%       79,329    29,777    75.2%     97.2%    18.9%     77.6%
        ca   Catalan          2,513,413    11.9%      429,157    58,020    75.6%     98.0%    41.2%     80.1%
        bg   Bulgarian      214,756,441    11.2%      124,474    16,111    76.3%     97.7%    45.2%     82.8%
        Average             120,878,348    12.7%      198,839    26,064    73.2%     95.8%    34.1%     75.9%
        Std. dev.            90,308,798     3.7%      286,467    39,435     2.0%      2.1%     8.0%      6.0%
High    sv   Swedish         81,231,502    12.7%       66,645     9,797    79.1%     95.0%    47.5%     72.9%
        de   German          88,261,445    15.9%      269,626    12,348    80.6%     90.1%    47.4%     76.2%
        da   Danish          73,620,273    15.0%       88,979     5,870    81.2%     95.5%    50.7%     74.1%
        fr   French         221,712,167    18.3%      356,419    38,758    81.2%     97.1%    51.8%     83.8%
        it   Italian        172,151,250    13.0%      270,598    10,921    81.8%     97.3%    51.4%     83.7%
        no   Norwegian       37,362,647    22.0%      243,887    36,369    83.3%     97.0%    58.6%     82.3%
        pt   Portuguese     160,033,555    14.7%      216,001     5,124    83.4%     96.7%    51.0%     81.9%
        Average             119,196,120    15.9%      216,022    17,027    81.5%     95.5%    51.2%     79.3%
        Std. dev.            66,022,248     3.2%      103,918    14,283     1.5%      2.5%     3.7%      4.7%
All     Average             111,845,562    11.5%      175,019    20,435    70.2%     94.5%    33.3%     74.1%
        Std. dev.            85,249,035     5.6%      208,818    28,439     9.9%      5.3%    13.6%     10.6%
Source  en   English                  –        –      204,586    25,148        –     94.3%        –     79.6%

Table 4: List of all target languages divided into the three groups, reporting their source-target parallel data size (number of tokens in the English side of the parallel data), treebank size (number of tokens in training and development test set of the treebank), translation quality (BLEU measured on the last 10,000 sentences held out from the parallel data), UPOS accuracy and Labelled Attachment Score (for both cross-lingual and fully supervised monolingual tagging and parsing). Averages are also included, together with standard deviations to illustrate the variance in the data. The last line lists some of this information for the source language (English).

Gold tag and the four most common predicted tags:

Gold     Predicted tags (four most common)
NOUN     75.5% NOUN    8.0% PROPN    6.7% VERB    4.6% ADJ
VERB     69.6% VERB   12.0% NOUN     6.2% AUX     3.6% ADJ
PUNCT    94.7% PUNCT   2.2% CONJ     0.9% DET     0.6% SYM
PRON     60.3% PRON    9.9% DET      4.4% AUX     4.2% VERB
ADP      72.0% ADP     7.8% DET      3.6% PART    3.5% NOUN
DET      65.2% DET    16.3% PRON     6.3% ADJ     3.9% ADP
PROPN    72.2% PROPN  16.0% NOUN     2.8% PRON    2.6% ADJ
ADJ      48.4% ADJ    25.3% NOUN     8.7% VERB    6.8% PROPN
ADV      52.3% ADV     8.9% NOUN     8.8% ADJ     6.3% VERB
AUX      52.3% AUX    20.7% VERB     8.9% PRON    4.4% NOUN
CONJ     78.0% CONJ    4.5% ADV      3.8% SCONJ   2.6% ADP
PART     32.3% PART   17.7% ADV     11.9% PRON    9.2% DET
NUM      79.1% NUM     5.9% DET      5.5% NOUN    3.6% ADJ
SCONJ    39.3% SCONJ  14.7% PRON    10.5% ADP     8.8% DET
X        33.3% NOUN   27.1% PROPN    7.4% X       6.5% ADP
INTJ     29.9% INTJ   20.8% NOUN    16.9% ADV    11.0% PROPN
SYM      36.7% SYM    29.2% PUNCT   25.0% NOUN    3.0% PROPN

Predicted tag and the four most common gold tags:

Predicted  Gold tags (four most common)
NOUN     75.7% NOUN    7.8% ADJ      6.7% VERB    4.1% PROPN
VERB     66.8% VERB   13.4% NOUN     5.4% ADJ     4.7% AUX
PUNCT    96.7% PUNCT   0.6% AUX      0.5% ADP     0.5% VERB
PRON     56.2% PRON   11.8% DET      5.5% SCONJ   4.9% AUX
ADP      74.8% ADP     3.7% ADV      3.6% VERB    3.3% DET
DET      45.9% DET    16.1% ADP     10.0% PRON    3.9% VERB
PROPN    54.5% PROPN  29.6% NOUN     7.4% ADJ     2.3% VERB
ADJ      56.1% ADJ    18.3% NOUN     7.3% VERB    6.0% ADV
ADV      53.1% ADV     8.7% NOUN     7.7% ADJ     5.3% PART
AUX      34.1% AUX    33.9% VERB     8.9% PRON    7.0% PART
CONJ     88.0% CONJ    3.9% PUNCT    2.1% SCONJ   2.0% ADV
PART     31.2% ADP    23.9% PART    11.4% ADV     9.1% VERB
NUM      77.1% NUM     6.5% ADJ      6.3% NOUN    3.4% PROPN
SCONJ    38.4% SCONJ  21.7% ADP     10.3% CONJ    8.2% ADV
X        31.1% NOUN   16.3% NUM     12.8% PROPN   9.1% VERB
INTJ     19.7% ADV    15.0% NOUN    13.7% PROPN  13.6% VERB
SYM      35.1% PUNCT  22.1% NOUN    19.5% SYM     4.3% PRON

Table 5: Error distribution in cross-lingual UPOS tagging, each row listing a UPOS tag and the four most common tags found with it (i.e. usually showing the three most common errors), macro-averaged over all target languages. The rows are ordered by the frequency of the UPOS tags in the English treebank.

Gold label and the four most common predicted labels:

Gold       Predicted labels (four most common)
punct      94.6% punct     2.3% cc         0.9% case      0.8% det
nmod       43.3% nmod     19.0% compound   8.3% dobj      6.0% nsubj
case       72.3% case      7.7% det        5.7% mark      2.8% advmod
nsubj      45.8% nsubj    12.4% compound   9.8% dobj      5.6% nmod
det        61.7% det      10.4% nmod       7.6% amod      4.9% nsubj
root       50.4% root      5.7% nsubj      4.3% acl       4.2% nmod
dobj       36.4% dobj     12.9% nmod      12.5% compound 10.5% nsubj
compound   23.6% compound 17.5% nmod       9.4% nummod    9.2% case
advmod     48.9% advmod    6.0% amod       6.0% case      5.3% nmod
amod       48.1% amod     13.0% compound   9.7% nmod      4.3% dobj
conj       50.4% conj      9.0% compound   5.8% acl       5.5% amod
mark       46.2% mark     12.8% case       9.0% nsubj     6.8% det
cc         82.8% cc        3.4% advmod     2.2% case      1.9% det
aux        45.1% aux       8.1% nsubj      6.9% cop       6.4% mark
cop        52.3% cop       7.4% aux        5.8% root      5.5% auxpass
advcl      33.7% advcl     7.9% root       7.7% acl       6.3% amod
acl        34.7% acl      10.2% amod       7.6% advcl     6.7% root
xcomp      16.4% xcomp    13.3% root       9.4% ccomp     8.3% dobj
nummod     73.4% nummod    5.8% det        5.3% compound  4.5% nmod
ccomp      22.8% ccomp    10.3% acl        9.9% advcl     6.9% root
neg        69.4% neg      11.9% nsubj      3.8% aux       2.8% punct
appos      22.2% appos    17.5% compound  12.2% nmod     10.8% name

Predicted label and the four most common gold labels:

Predicted  Gold labels (four most common)
punct      96.0% punct     0.5% nmod       0.5% case      0.3% auxpass
nmod       56.9% nmod      7.0% dobj       4.8% amod      4.1% det
case       72.4% case      3.7% nmod       3.7% det       3.6% advmod
nsubj      42.1% nsubj    14.3% nmod       7.6% dobj      4.4% root
det        49.4% det      16.1% case       4.2% nmod      3.8% mark
root       54.7% root      6.6% nmod       5.5% nsubj     3.0% amod
dobj       34.2% dobj     22.6% nmod      11.6% nsubj     4.7% amod
compound   37.4% nmod     12.9% amod       9.5% nsubj     8.1% dobj
advmod     53.5% advmod    7.1% nmod       5.3% case      4.8% amod
amod       49.0% amod     11.8% nmod       5.0% det       4.8% advmod
conj       46.3% conj     11.3% nmod       4.7% amod      4.3% dobj
mark       43.4% mark     22.7% case       6.7% advmod    4.5% aux
cc         84.9% cc        4.3% punct      3.1% advmod    1.3% discourse
aux        36.5% aux      12.3% root       7.2% advmod    4.7% cop
cop        47.6% cop      11.6% aux        9.2% root      3.6% auxpass
advcl      21.6% advcl    11.5% nmod      10.2% root      8.6% acl
acl        30.5% acl      11.8% root      10.1% nmod      7.8% conj
xcomp      17.2% xcomp    10.0% nmod       9.7% root      8.4% amod
nummod     59.8% nummod   11.5% nmod       7.4% amod      2.2% conj
ccomp      24.9% ccomp    12.1% root       8.5% xcomp     7.5% acl
neg        64.6% neg       8.1% advmod     4.9% aux       3.9% cop
appos      29.3% nmod     12.6% appos     11.0% name      7.0% nsubj

Table 6: Error distribution in cross-lingual parsing, each row listing a relation label and the four most common labels found with it (i.e. usually showing the three most common errors), reporting the macro average of dependency relation label assignment over all target languages (disregarding the head assignment, i.e. this is not LAS). The rows are ordered by the frequency of the relations in the English treebank, and only the most frequent are included in this table.

A Dependency Treebank for Telugu

Taraka Rama, Department of Informatics, University of Oslo, Norway, [email protected]
Sowmya Vajjala, Applied Linguistics and Technology Program, Iowa State University, USA, [email protected]

Abstract

In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a textbook, and the treebank is freely available from Universal Dependencies version 2.1.1 We discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS tagging and dependency parsing. To the best of our knowledge, this is the first freely accessible and open dependency treebank for Telugu.

Keywords: Telugu, Universal dependencies, adverbial clauses, relative clauses

1 Introduction

An annotated treebank is a pre-requisite for developing computational tools that support deeper language processing for any language. Treebanks are typically created with texts collected from specific genres such as news, fiction, Wikipedia, blogs, and the Bible. There also exist treebanks for non-canonical text such as learner data (Berzak et al., 2016; Lee et al., 2017). While these treebanks have been used for the development of natural language parsers and other tools, they may not cover infrequent grammatical structures that do not occur in the specific domain of the training data. Grammar books provide an excellent set of example sentences for annotation, covering a wide range of syntactic structures, as these sentences are chosen to illustrate the interesting and unique features of a language. Additionally, such grammar-book-based treebanks can also be used to test the coverage of statistical parsers trained on a large amount of data from a specific domain or genre. Further, a typical grammar book features short sentences and allows rapid development of a treebank. Hence, they serve as a good starting point for developing a broad-coverage treebank (Çöltekin, 2015).

Telugu is a Dravidian language native to India, with 74 million native speakers and a long history of written and oral literature. Despite some published research on the development of part-of-speech taggers (PVS and Karthik, 2007) and a treebank (Vempaty et al., 2010), neither of these resources is publicly available. In this paper, we describe our efforts in developing a publicly available treebank for Telugu that covers a range of syntactic constructions and morphological phenomena. We manually annotated 1328 sentences from the Telugu grammar book by Krishnamurti and Gwynn (1985) with (universal) part-of-speech tags and dependency relations. We followed the Universal Dependencies (Nivre et al., 2016) framework for annotation, as it supports the development of treebanks for new languages through extensive documentation. We also report preliminary POS tagging and dependency parsing results using the treebank data and UDPipe (Straka et al., 2016).

The rest of the paper is organized as follows. We describe related work in section 2. We provide a brief description of the linguistic properties of Telugu in section 3. Then, we describe the corpus and annotation environment in section 4. We describe the annotation decisions made during the annotation of POS tags in section 5. Section 6 briefly introduces Telugu morphology and is followed by section 7, which discusses dependency relations that are specific to Telugu. We present the results of our POS tagging and parsing experiments in section 8. Finally, we conclude the paper and discuss some directions for future work.

1http://universaldependencies.org/treebanks/te/
2 Related Work

Treebanks for some South Asian languages were developed following the Paninian framework for dependency annotation (Begum et al., 2008), and some of them are publicly available.2 There were some early efforts towards Telugu dependency treebank development following the Paninian framework (Vempaty et al., 2010). A Telugu treebank was also a part of an Indian language dependency tools contest (Husain et al., 2010). However, none of these resources are publicly available to the best of our knowledge. There have been efforts to convert some of the Indian language treebanks into the Universal Dependencies (UD) framework (Tandon et al., 2016), and there is a reasonably large UD treebank for Hindi (Palmer et al., 2009; Bhat et al., 2017). However, except for Tamil (Ramasamy and Žabokrtský, 2012), converted from the Prague dependency style (Hajič et al., 2017) to UD, there is no UD treebank for any other language from the Dravidian language family. In this scenario, the availability of a free and open UD Telugu treebank would be a good starting point for the future of computational infrastructure support for Telugu and the Dravidian language family.

UD treebanks were developed from scratch for several low-resource languages such as Kazakh (Makazhanov et al.) and Buryat (Badmaeva and Tyers, 2017) in recent years. Several existing treebanks are also being converted into UD. While there is no Dravidian language in UD other than Tamil, there exist treebanks for other agglutinative languages such as Finnish (Pyysalo et al., 2015), Hungarian (Vincze et al., 2017), Turkish (Çöltekin, 2015; Sulubacak et al., 2016), and Estonian (Muischnek et al., 2016), which provided us with useful insights in dealing with language-specific morphological and syntactic phenomena for Telugu.

3 Telugu

Telugu is one of the 22 languages with official status in India. Telugu belongs to the South-Central subgroup of the Dravidian language family3 and is mainly spoken in Southern India. The Dravidian language family was the subject of both historical and comparative linguistic research in the latter half of the twentieth century (Krishnamurti, 2003). Telugu is an agglutinative language like other Dravidian languages such as Tamil or Malayalam. The dominant word order in Telugu is Subject-Object-Verb (SOV), with an inclination towards pro-drop. Telugu verbs inflect for gender, number, and person. The "be" (vun-, "existential") verb in Telugu shows agreement with the subject for gender, number, and person. The existential verb has a negative counterpart "not to be" (leː-), which can participate in both light and serial verb constructions and can also act as the main verb.4 Telugu does not have dominant overt coordination as in English or Hindi. Telugu forms subordinate clauses through verbal nouns, verbal adjectives, and converbs. Control constructions marked by the xcomp relation are rare (fewer than ten instances in our treebank) in Telugu.

4 Corpus and Annotation

The Telugu treebank currently consists of 1328 sentences and 6465 tokens. The sentences were manually typed in Telugu script (derived from the Brahmi script) from the examples in chapters 7–29 of (Krishnamurti and Gwynn, 1985). Many sentences in this book are collected from contemporary Telugu fiction of that time (1960s–80s). Both authors manually annotated all the sentences, and disagreements were adjudicated after discussion. Annotation was done using the Brat tool (Stenetorp et al., 2012), and the conversion to CoNLL format was done using a Python script. We annotated the sentences with UD POS tags and dependency relations (annotation guidelines are available on the UD website5). The whole annotation and correction process took 4 months. The following sections outline our annotation decisions with examples.
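The conversion script itself is not part of the released data, so the following is only a rough sketch (in Python) of how a Brat-to-CoNLL conversion of this kind could look. It assumes, purely for illustration, that POS tags were annotated as Brat text-bound annotations (T lines) and dependencies as Brat binary relations (R lines) with the head as Arg1; the function name and file layout are hypothetical.

    def brat_to_conllu(ann_path):
        # Read one Brat .ann file: T-lines carry POS-tagged tokens with
        # character offsets, R-lines carry labelled head->dependent relations.
        tags, heads = {}, {}
        with open(ann_path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if fields[0].startswith("T"):    # e.g. "T1 <tab> NOUN 0 5 <tab> amma"
                    label, start, _end = fields[1].split()
                    tags[fields[0]] = (int(start), fields[2], label)
                elif fields[0].startswith("R"):  # e.g. "R1 <tab> nsubj Arg1:T3 Arg2:T1"
                    rel, arg1, arg2 = fields[1].split()
                    heads[arg2.split(":")[1]] = (arg1.split(":")[1], rel)
        # Order tokens by character offset and assign 1-based CoNLL token ids.
        order = sorted(tags, key=lambda t: tags[t][0])
        tok_id = {t: i + 1 for i, t in enumerate(order)}
        rows = []
        for t in order:
            _, form, upos = tags[t]
            head_t, rel = heads.get(t, (None, "root"))
            head_id = tok_id[head_t] if head_t else 0
            rows.append("\t".join([str(tok_id[t]), form, "_", upos, "_", "_",
                                   str(head_id), rel, "_", "_"]))
        return "\n".join(rows) + "\n\n"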

2Available at http://kcis.iiit.ac.in/
3The Dravidian language family is one of the four language families spoken in India (the others being Indo-European, Austro-Asiatic, and Sino-Tibetan).
4The closest parallel of a negative verb is the negative auxiliary verb in Finnish (http://wals.info/chapter/112) and in Kurmanji (tune).
5http://universaldependencies.org/guidelines.html

5 Part-of-speech annotation

The UD specification defines 17 POS tags, of which we used 14 in Telugu. We did not use the tags X, SYM, and AUX. X was not used, as there were no instances of unanalyzable foreign words that are not loan words in the corpus. SYM was not used, as there were no symbols in the sentences. Words that resemble auxiliary verbs in Telugu also function as main verbs in the sentence; hence, we did not use the AUX tag in our annotations. While we largely followed the UD guidelines for POS tagging, we also made accommodations for some language-specific phenomena. Among the open-class words, while nouns, proper nouns, and verbs are relatively straightforward to tag, we made specific annotation decisions for adjectives and adverbs. Verbs functioning as nouns or adjectives were tagged VERB but annotated with the appropriate dependency relation (e.g., acl for verbal adjectives) to the head. We mark the morphological feature VerbForm with Vnoun and Part, respectively.

Adjectives (ADJ): Adjectives in Telugu are indeclinable. Oblique nouns functioning as adjectives are tagged as NOUN with the relation nmod:poss to the head noun. Adjectives with a pronominal suffix (e.g., manci=vaːɖu, good-3-SG-M suffix, meaning: good one) are tagged as PRON and not as adjectives, as they refer to entities. Figure 1a illustrates a sentence with a pronominalized adjective in predicate position, with words transcribed in IPA. Adjectives denoting dimensions such as tall (poɖugu) or short (poʈʈi) do not need a pronominal suffix to function as the root of a non-verbal sentence construction. In such cases, we treat such adjectives as abstract nouns and mark the POS tag as NOUN.

Adverbs (ADV): Krishnamurti and Gwynn (1985) and Krishnamurti (2003) note that adverbs of time and place behave as nouns (they can inflect with case markers) in Telugu. We adopt this judgment in our treebank and mark all adverbs of time and place as nouns. We annotate an inflected time or location noun as NOUN and annotate it as the dependent of the dependency relation obl or obl:tmod. This is shown in Figures 1b and 1c.

Figure 1: Part-of-speech tag examples (dependency trees omitted). (a) Pronominalized adjective: at̪anu mancivaːɖu (PRON PRON; "he good-3-SG-M") 'He is the good one.' (b) Oblique spatial noun, attached via obl: neːnu akkaɖiki veɭɭaːnu (PRON NOUN VERB; "I there-Dat went-PST-1-SG") 'I went there.' (c) Uninflected temporal noun, attached via obl:tmod: vaːɖu reːpu poːvaːli (PRON NOUN VERB; "he tomorrow go-Obl") 'He should go tomorrow.'

In the case of closed-class words (adpositions, determiners, numerals), we made the following language-specific decisions:

Adpositions (ADP): Telugu uses postpositions and suffixes to denote cases. Postpositions are tagged as ADP and are dependents of nominals through the case relation. Some adverbs indicating temporal or location information that appear as nominal modifiers are also tagged ADP.

Determiners (DET): The UD guidelines distinguish six kinds of determiners. Of those, Telugu does not have articles and possessive determiners. We mark distal/proximal demonstratives and interrogative determiners that precede a nominal as DET. Telugu does not have relative pronouns and forms relative clauses through nominalization or verbal adjectives.

Numerals (NUM): Following the UD guidelines, we tagged all numbers, fractions, and multi-word numeric expressions with the NUM tag. However, numbers can also function as adjectives, adverbs, or nouns in Telugu, and can be inflected. Inflected numbers which do not appear in a multi-word numeric expression are marked according to their syntactic function. The UD guidelines also describe the tagging of non-cardinal numbers according to their syntactic function.6

6 Morphology

Telugu verbs show agreement with the agent in number, gender, and person. Telugu has two genders, masculine and non-masculine, and we will annotate with the same categories. Telugu has two numbers: singular and plural. Telugu nominals show a highly inflected case system with nominative, dative, instrumental, genitive, comitative, ablative, and locative cases. Postpositions also function as adessive (miːda "on"), purposive, and comparative cases. Complex cases are formed through a combination of base case markers and postpositions.

Telugu verbs show tense, aspect, and mood. Verbs are typically in active voice, and passive constructions are not common. Causative constructions (Voice=Cau) are formed by adding -inc to the transitive verb. Telugu also has a reflexive suffix -kon that is added to causative and transitive verb stems to denote that the agent is also the patient. We mark such a reflexive verb with Reflex=Yes. There are two tenses: past and non-past. Telugu does not have a negative particle and shows negation through an -a- marker that occurs before the index markers. Verbs can show the aspects habitual (Hab), progressive (Prog), perfect (Perf), and prospective (Prosp), which are available in UD. The mood features are imperative (Imp), conditional (Cnd), potential (Pot), necessitative (Nec), inceptive (Inc), and hortative (Hor).7 Morphological annotation is not a part of the UD 2.1 release for Telugu and is a part of future work.

7 Universal Dependency Relations

Our treebank has 42 dependency relations, of which 11 are language-specific. They are listed below in Table 1. Relations that are not seen in other languages' UD treebanks are marked with ∗.

Relation          Description
acl:relcl         Relative clause
advcl:cond        Conditional adverbial clause
compound:lvc      Light verb construction
compound:redup    Reduplicative construction
compound:svc      Serial verb construction
nmod:cmp∗         Nominal comparative modifier
nmod:poss         Nominal possessive modifier
nmod:tmod         Nominal temporal modifier
nsubj:nc          Non-canonical subject (e.g., dative subject)
obl:tmod          Oblique temporal modifier
obl:cau∗          Oblique case, causative (Section 6)

Table 1: Language Specific Dependency Relations for Telugu

While some relations such as acl:relcl and nmod:poss exist in several other language treebanks, other relations are not very common. Some of them are discussed below.

Light verb constructions: Light verbs are noun-verb constructions where the semantic content is in the noun, even if the syntactic head is the verb. These constructions are widespread in Hindi-Urdu (Butt, 2010; Vaidya et al., 2016). However, the Hindi UD treebank (converted from the Paninian dependency treebank) does not seem to tag this construction specifically, though it is annotated with the pof (part-of) relation in the original Paninian treebank. We tag this construction explicitly using compound:lvc. In UD 2.1, this construction is explicitly marked in Farsi, Kazakh, Kurmanji, Marathi, Turkish, and Uyghur, along with Telugu. Recent work on Hungarian (Vincze et al., 2017) described these constructions using the label dobj:lvc. A light verb construction is illustrated in Figure 2, where a noun (start) followed by a verb is used as a light verb compound.

6http://universaldependencies.org/u/pos/NUM.html
7Inceptive and Hortative moods are not available in UD.

Figure 2: Light verb construction with a nominalized clausal complement. Kamala paːʈa paːɖaʈam mod̪alu peʈʈind̪i (PROPN NOUN VERB NOUN VERB; "Kamala song sing-Vnoun start put-3-SG-F") 'Kamala started singing a song.' The noun mod̪alu and the verb peʈʈind̪i are joined by compound:lvc; the nominalized clause is attached as ccomp.

Reduplication: Reduplication is the morphological process in which the whole word or parts of the word are repeated to denote a syntactic function. Reduplication (both partial and complete) is a common phenomenon in several languages, although it is explicitly marked in only five other UD 2.1 languages – Hindi, Kurmanji, Marathi, Turkish, and Uyghur – with the relation compound:redup. We mark all reduplicated words with this relation and treat the final word as the head. In Telugu, reduplication can occur across POS categories such as determiners, verbs, adjectives, nouns, and adverbs. We show examples of verb and adjective reduplication in Figure 3.

Figure 3: Reduplication in verbs and adjectives. (a) Verb reduplication: t̪ini t̪ini visugu puʈʈind̪i (VERB VERB NOUN VERB; "eat-NF eat-NF disgust generated") 'Eating and eating is disgusting.' (b) Adjective reduplication: vaːɭɭaku mugguru mugguru pillalu (PRON ADJ ADJ NOUN; "they-DAT three three children") 'They each have three children.' The reduplicated words are joined by compound:redup; the dative NP in (b) is annotated nsubj:nc.

Serial verbs: The Dravidian comparative literature defines serial verbs as a series of finite verbs, which are present in Old Telugu but absent in Modern Telugu. There is no limit to the number of participating verbs in such a construction. We employ the definition of serial verbs from Velupillai (2012, 332), whereby a series of verbs referring to a single event is labeled as a serial verb. These are different from other compound verbs such as V-V complex predicates in that they describe a sequence of actions. We mark such constructions with compound:svc (cf. Figure 4). The Dravidian comparative literature treats these constructions as adverbial clauses.

Non-verbal predication: In this paragraph, we present the non-overt copula (cf. Figure 5a) and the negative verb (cf. Figure 5b), which is specific to Telugu. Equative, attributive, possession, and benefaction constructions consist of NP+NP and lack an overt copula. The location construction (negation variant) shown in Figure 5b shows agreement and does not fall under the definition of non-verbal predication in UD.

Genitives: Genitive constructions can be formed through a preceding nominal dependent, the postposition yokka, or an oblique noun. We mark all these relations as nmod:poss (cf. Figure 6).

Comparatives: Comparative constructions are formed through a special postposition kanʈe, which is marked as a dependent of the second nominal through the case relation (cf. Figure 7).8

8At the time of submission of the paper, we marked the relation between the two nominals using the nmod:cmp relation. We now mark the relation between the second nominal and the root noun with the obl relation. We thank one of the reviewers for pointing out this mistake.

Figure 4: Serial verb construction. neːnu ninnaneː libraryki veɭɭi vaccaːnu (PRON ADV NOUN VERB VERB; "I yesterday library-Dat gone came") 'I went to the library yesterday.' The two verbs are joined by compound:svc.

Figure 5: Non-verbal predication in Telugu. (a) Non-overt copula: aːme naː t̪alli (PRON PRON NOUN; "she my mother") 'She is my mother.' (b) Negative existential verb: aːme gad̪iloː leːd̪u (PRON NOUN VERB; "she room-Loc NEG-PST-3-SG-F") 'She was(is) not in the room.'

Figure 6: Genitive formation strategies, all marked nmod:poss: aːyana fiːju (PRON NOUN; "his fees"), at̪ani fiːju (PRON NOUN; "his-Obl fees"), and aːyana yokka fiːju (PRON ADP NOUN; "he Case=Gen fees").

Figure 7: Comparative construction without an overt copula. raːmu kamala kanʈe poɖugu (PROPN PROPN ADP NOUN; "Raamu Kamala Case=Cmp tall") 'Ramu is taller than Kamala.'

Dative subjects: Typically, NPs that occur in sentence-initial position are in the nominative case (unmarked). However, stative verbs such as "to know" and intransitive verbs such as "to want" do not show any agreement with any of the NPs in the sentence. In such cases, the NP in initial position is marked with dative case (Sridhar, 1979; Nizar, 2010). We mark the syntactic relation between the verb and the dative NP with nsubj:nc. Although the dative NP occurs in sentence-initial position, the free word order allows the dative NP to be moved to a non-final position in the sentence. A dative NP (annotated as nsubj:nc) can also occur as the experiencer NP in non-verbal sentences (cf. Figure 3b).9
Adverbial clauses: Telugu forms adverbial clauses through converbs. The final verb in the sentence is a finite verb, which is treated as the root of the sentence. The subject of the embedded clause can be co-referential with that of the main clause (cf. Figure 8) when the non-finite verb is marked for perfective or progressive aspect. In such a case, we annotate the subject as a dependent of the main verb. Subjects of the main and subordinate clauses cannot be co-referential when the non-finite verb is marked for conditional or concessive mood.

Figure 8: Adverbial clause. neːnu annam t̪ini inʈiki veɭɭaːnu (PRON NOUN VERB NOUN VERB; "I rice eat-Perf house-Dat go-Pst-1-SG") 'I ate rice and went home.' The converb clause is attached to the main verb as advcl.

Relative clauses: There is no relative pronoun in Telugu, and relative clauses are formed through verbal adjectives. There are no expletive nominals in Telugu, and cleft constructions are formed through pronominalized verbal adjectives. We analyze cleft sentences as relative clauses (cf. Figure 9).

Figure 9: Cleft constructions derived from a simple sentence: raːmayya inʈiki mand̪u teccaːɖu ('Ramayya brought medicine home.'). (a) Topicalized agent: inʈiki mand̪u teccind̪i raːmayya (NOUN NOUN VERB PROPN; "house-Dat medicine brought-3-SG-F Ramayya"). (b) Topicalized object: raːmayya inʈiki teccind̪i mand̪u (PROPN NOUN VERB NOUN; "Ramayya house-Dat brought-3-SG-F medicine"). In both, the clefted clause is attached as acl:relcl.

Nominalized clauses: Non-finite verbs are nominalized by adding -atam. A nominalized verb can then be the head of a subordinate clause, which can be the subject or object of the main verb (cf. Figure 2). We annotate a nominalized verb clause as csubj when it functions as a subject and as ccomp when it functions as an object.

8 Tagging and Parsing Experiments

As a demonstration of the usefulness of our treebank in real-world settings, we evaluated POS tagging and parsing models trained using UDPipe (Straka et al., 2016). UDPipe is a free, open-source, and language-agnostic pipeline for training and evaluating NLP models for lemmatization, POS tagging, and dependency parsing. We split our treebank 80-10-10 into training, development, and test sets, and trained POS tagging and parsing models. Both training and evaluation were performed with UDPipe 1.2 on a Linux machine.
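For concreteness, one possible way to realize such a split and the subsequent UDPipe run is sketched below; the file names are hypothetical and the UDPipe options are abbreviated (see the UDPipe manual for the full set of training options), so this is an illustration rather than the exact setup used in the experiments.

    import random

    def read_sentences(path):
        # A CoNLL-U file is a sequence of sentence blocks separated by blank lines.
        with open(path, encoding="utf-8") as f:
            return [block + "\n\n" for block in f.read().strip().split("\n\n")]

    sentences = read_sentences("te-ud.conllu")
    random.seed(0)                      # fix the split for reproducibility
    random.shuffle(sentences)

    n = len(sentences)
    splits = {"train": sentences[:int(0.8 * n)],
              "dev":   sentences[int(0.8 * n):int(0.9 * n)],
              "test":  sentences[int(0.9 * n):]}
    for name, part in splits.items():
        with open("te-ud-%s.conllu" % name, "w", encoding="utf-8") as f:
            f.writelines(part)

    # Training and evaluation are then done with the udpipe binary, roughly:
    #   udpipe --train te.model --heldout=te-ud-dev.conllu te-ud-train.conllu
    #   udpipe --accuracy --tag --parse te.model te-ud-test.conllu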

9We follow the Persian UD annotation guidelines (Seraji et al., 2016) in this case and name the dependency relation nsubj:nc.

We report POS tagging accuracy, Labeled Attachment Score (LAS), and Unlabeled Attachment Score (UAS) on the test set (after parameter tuning on the development set) in Table 2. We also trained and evaluated a second parsing model on gold POS tags and found that its LAS and UAS scores are better than those of the joint model that predicts both POS tags and dependency relations. We expect the POS tagging results to be high, since nouns and verbs make up the bulk of the part-of-speech tags in Telugu.

Input features             POS Acc.   LAS      UAS
Tagging + Parsing          90.43%     74.76%   87.79%
Parsing (gold POS tags)    –          78.50%   89.74%

Table 2: Preliminary tagging and parsing results with UDPipe.

Previous work on Telugu dependency parsing – trained and evaluated with Paninian dependency labels – reports a highest LAS of 70.15% (Husain et al., 2010) and a best UAS of 90.5% (Kanneganti et al., 2016). While our LAS results are higher than both of the previous results, the UAS results are slightly lower; however, a direct comparison is not possible due to the unavailability of the training data for these results and also due to the different annotation schemes.

9 Conclusion

In this paper, we presented the first publicly available treebank for Telugu, annotated in the Universal Dependencies framework. We annotated POS tags and dependency relations from scratch for 1328 sentences. We trained and evaluated two parser models using UDPipe on the training split of the treebank and found that the parser performs within the range reported in previous experiments. As a part of future work, we intend to add morphological annotations to the treebank. It would also be interesting to compare different parsers on this treebank data. We are currently working towards expanding the treebank to include at least 100,000 tokens from the Telugu Wikipedia. We plan to achieve this in a semi-automated fashion by running a trained parser model on Wikipedia sentences and then manually checking and correcting the errors. We are also in the process of augmenting the treebank with fine-grained POS tags designed for Indian languages (Choudhary and Jha, 2011). The average sentence length in our treebank corpus is rather small (∼5 tokens per sentence), whereas Wikipedia sentences are typically much longer. We intend to analyze how accurate an automatic parser trained on grammar book examples would be when faced with longer sentences with possibly complicated syntactic structures.

Acknowledgments

The first author was supported by ERC Advanced Grant 324246 EVOLAEMP and an NRC LightHouse grant, which is gratefully acknowledged. We thank Çağrı Çöltekin, Lilja Øvrelid, and Viswanath Naidu for all the discussions and input which proved useful in the development of the treebank. Finally, we thank the anonymous reviewers for the comments that helped improve the paper.

References

Elena Badmaeva and Francis M Tyers. 2017. A Dependency Treebank for Buryat. In 15th International Workshop on Treebanks and Linguistic Theories (TLT15). pages 1–12.

Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai, and Rajeev Sangal. 2008. Dependency Annotation Scheme for Indian Languages. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I. pages 721–726.

Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, and Boris Katz. 2016. Universal Dependencies for Learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 737–746. http://www.aclweb.org/anthology/P16-1070.

Riyaz Ahmad Bhat, Rajesh Bhatt, Annahita Farudi, Prescott Klassen, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, Ashwini Vaidya, Sri Ramagurumurthy Vishnu, and Fei Xia. 2017. The Hindi/Urdu Treebank Project. In Handbook of Linguistic Annotation, Springer, pages 659–697.

Miriam Butt. 2010. The Light Verb Jungle: Still Hacking Away. In Complex Predicates in Cross-linguistic Perspective, pages 48–78.

Narayan Choudhary and Girish Nath Jha. 2011. Creating Multilingual Parallel Corpora in Indian languages. In Language and Technology Conference. Springer, pages 527–537.

Çağrı Çöltekin. 2015. A Grammar-book Treebank of Turkish. In Proceedings of the 14th Workshop on Treebanks and Linguistic Theories (TLT 14). pages 35–49.

Jan Hajič, Eva Hajičová, Marie Mikulová, and Jiří Mírovský. 2017. Prague Dependency Treebank. In Handbook of Linguistic Annotation, Springer, pages 555–594.

Samar Husain, Prashanth Mannem, Bharat Ambati, and Phani Gadde. 2010. The ICON-2010 tools contest on Indian language dependency parsing. Proceedings of ICON-2010 Tools Contest on Indian Language Dependency Parsing, ICON 10:1–8.

Silpa Kanneganti, Himani Chaudhry, and Dipti Misra Sharma. 2016. Comparative Error Analysis of Parser Outputs on Telugu Dependency Treebank. Proceedings of CICLING 2016.

Bhadriraju Krishnamurti. 2003. The Dravidian Languages. Cambridge University Press.

Bhadriraju Krishnamurti and John Peter Lucius Gwynn. 1985. A Grammar of Modern Telugu. Oxford University Press, USA.

John Lee, Herman Leung, and Keying Li. 2017. Towards Universal Dependencies for Learner Chinese. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017). Association for Computational Linguistics, Gothenburg, Sweden, pages 67–71. http://www.aclweb.org/anthology/W17-0408.

Aibek Makazhanov, Aitolkyn Sultangazina, Olzhas Makhambetov, and Zhandos Yessenbayev. n.d. Syntactic Annotation of Kazakh: Following the Universal Dependencies Guidelines. A Report.

Kadri Muischnek, Kaili Müürisep, and Tiina Puolakainen. 2016. Estonian Dependency Treebank: From Constraint Grammar tagset to Universal Dependencies. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC). pages 1558–1565.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D Manning, Ryan T McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). pages 1659–1666.

Milla Nizar. 2010. Dative Subject Constructions in South-Dravidian Languages. Unpublished master's thesis, University of California, Berkeley, Berkeley, CA.

Martha Palmer, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In The 7th International Conference on Natural Language Processing.

Avinesh PVS and Gali Karthik. 2007. Part-of-speech Tagging and Chunking Using Conditional Random Fields and Transformation Based Learning. Proceedings of the Workshop on Shallow Parsing for South Asian Languages.

Sampo Pyysalo, Jenna Kanerva, Anna Missilä, Veronika Laippala, and Filip Ginter. 2015. Universal Dependencies for Finnish. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015). Linköping University Electronic Press, 109, pages 163–172.

Loganathan Ramasamy and Zdeněk Žabokrtský. 2012. Prague Dependency Style Treebank for Tamil. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012). İstanbul, Turkey, pages 1888–1894. http://www.lrec-conf.org/proceedings/lrec2012/summaries/456.html.

Mojgan Seraji, Filip Ginter, and Joakim Nivre. 2016. Universal Dependencies for Persian. In LREC.

Shikaripur N Sridhar. 1979. Dative Subjects and the Notion of Subject. Lingua 49(2-3):99–125.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. BRAT: A Web-based Tool for NLP-assisted Text Annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 102–107.

Milan Straka, Jan Hajič, and Jana Straková. 2016. UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). pages 4290–4297.

Umut Sulubacak, Memduh Gokirmak, Francis Tyers, Çağrı Çöltekin, Joakim Nivre, and Gülşen Eryiğit. 2016. Universal Dependencies for Turkish. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. pages 3444–3454.

Juhi Tandon, Himani Chaudhry, Riyaz Ahmad Bhat, and Dipti Sharma. 2016. Conversion from Paninian Karakas to Universal Dependencies for Hindi Dependency Treebank. In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016). Association for Computational Linguistics, Berlin, Germany, pages 141–150. http://anthology.aclweb.org/W16-1716.

Ashwini Vaidya, Sumeet Agarwal, and Martha Palmer. 2016. Linguistic Features for Hindi Light Verb Construction Identification. In COLING. pages 1320–1329.

Viveka Velupillai. 2012. An Introduction to Linguistic Typology. John Benjamins Publishing.

Chaitanya Vempaty, Viswanatha Naidu, Samar Husain, Ravi Kiran, Lakshmi Bai, Dipti Sharma, and Rajeev Sangal. 2010. Issues in Analyzing Telugu Sentences towards Building a Telugu Treebank. International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), pages 50–59.

Veronika Vincze, Katalin Simkó, Zsolt Szántó, and Richárd Farkas. 2017. Universal Dependencies and Morphology for Hungarian - and on the Price of Universality. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. volume 1, pages 356–365.

Recent Developments within BulTreeBank

Petya Osenova, IICT-BAS, Sofia, Bulgaria, [email protected]
Kiril Simov, IICT-BAS, Sofia, Bulgaria, [email protected]

Abstract

The paper discusses recent developments in BulTreeBank (BTB). First of all, these developments include the preparatory steps for transferring richer linguistic knowledge from the original BTB into BTB-UD, so that the enhanced dependencies can be added in the next release in May 2018. The new line of research also handles the extension of the BTB valency lexicon with subatom-based embeddings for English. The aim is to check automatically how good they are at detecting the core participants in an event. Since there are not enough resources for Bulgarian, we rely on transferring the embeddings trained on English data, enhanced with mappings to the Bulgarian WordNet and evaluated over BTB as a gold standard.

Keywords: enhanced dependencies, grammatical role embeddings, transfer between languages

1 Introduction

The original BulTreeBank (BTB) is an HPSG-based treebank including constituent annotation that reflects the HPSG hierarchy of phrases, annotation of the head constituent in each phrase, coreference annotation, named entities, ellipsis, and discontinuous elements. Later on, the annotated sentences were transferred into two different dependency formats: (1) the CoNLL 2006 format, where we used our own list of dependency relations, and (2) the Universal Dependencies (UD) format, where we focused on universal mappings of our data rather than on language-specific relations. As a follow-up, all newly annotated sentences adhere directly to the UD format. In addition to the mainly syntactic information, in the last few years we annotated the treebank with senses from the BulTreeBank Bulgarian WordNet (BTB-WN), aligned to the Princeton WordNet (Osenova and Simov, 2017), and with valency frames (Osenova et al., 2012).

On the basis of the available rich linguistic information within the original HPSG-based treebank, as well as the semantic annotation and valency frame information, new extensions were performed in two directions: (1) transferring linguistic information from the HPSG-based annotation to the UD format with the goal of facilitating the addition of the so-called enhanced dependencies; and (2) assigning sense embeddings to valency slots in the valency lexicon for supporting better feature representations that are learned from huge corpora. In this paper we discuss these two developments as well as the preliminary results from them.

The paper is structured as follows: the next section presents related work. Section 3 describes the strategies behind the transfer of the linguistic information from the original treebank to the UD one. Section 4 focuses on the syntactic role transfer from English to Bulgarian with the help of word embeddings. Section 5 concludes the paper.

2 Related Work

Many treebanks in the Universal Dependencies (UD) initiative have been converted from already existing ones that were not necessarily dependency-based. This is also the case for BulTreeBank. Thus, initially the main focus was put on the mapping and proper transfer of parts-of-speech, grammatical, and syntactic information from the existing annotation scheme into the UD one. As described in (Osenova and Simov, 2015), this transfer was performed by rules of two kinds: (1) a lexical head identifier moving up the constituent tree; and (2) relation assignment for a constituent node of the dependent child when all children of the parent node have lexical identifiers. The example given in that paper was as follows. Consider the following constituent, whose lexicalized instance might be: tvarde visok zelen stol 'too tall green chair', i.e. [NPA [APA too tall] [NPA green chair]]:

NPA -> APAid1 NPAid2

where id1 is the lexical head identifier for the adjectival phrase APA and id2 is the lexical head identifier for the noun phrase NPA. Then we establish the relation amod from NPAid2 to APAid1, and the identifier for the child NPA is moved up, because the lexical head of the child NPA is the lexical head for the whole phrase. After the application of these two rules we have the constituent tree annotated with lexical identifiers and dependency relations in this way:

NPAid2 -> APAid1 amod NPAid2

(A minimal code sketch of these two rule types is given after the list of enhanced phenomena below.)

However, it became clear that richer annotation in treebanks is needed to capture the syntax-semantics-pragmatics interfaces. It should be noted that there already exist a number of semantically and discourse-annotated treebanks (for example, the Prague Dependency Treebank annotated on the discourse level (Zikánová et al., 2015) and the Italian Syntactic-Semantic Treebank (Montemagni et al., 2003), among others). However, such treebanks are not numerous considering the multilinguality dimension. At the same time, NLP applications have started to require the availability of richer cross-level linguistic knowledge. Hence, the idea of the enhanced dependencies reflects exactly the linguistic multilevel interfaces (syntax, semantics, discourse). More precisely, it aims "to make implicit relations between content words more explicit by adding relations and augmenting relation names" (Schuster and Manning, 2016). They build on the basic dependencies and include the following phenomena:1

• Null nodes for elided predicates. This dependency involves the addition of special null nodes in clauses with an elided predicate. An example is: 'I go to Varna, and you [NULL NODE] to Sofia'. With this ellipsis recovery, the grammatical relations are also maintained in the clause without an explicit predicate.

• Propagation of conjuncts. Apart from attaching the governor and dependents of a conjoined phrase to the first conjunct, dependencies are also established between the other conjuncts and the governor and dependents of the phrase. An example with conjoined subjects is: [The boy and the girl] are walking.

• Additional subject relations for control and raising constructions. In the enhanced dependencies there is a relation between the embedded verb and the subject of the matrix clause. An example is: She intends to go, where there is a relation between 'she' and 'go'.

• Arguments of passives (and other valency-changing constructions). Here the enhanced dependency assigns a type (passive or agent) to the subject or a complement in a passive sentence. An example is: The vase was broken by the child, where 'vase' is a nominal subject of type passive, and 'child' is an oblique of type agent.

• Coreference in relative clause constructions. The enhanced dependencies add a relation between the relative pronoun and its antecedent, as well as between this antecedent and the predicate in the relative clause. An example is: The man who came ran away quickly. 'Who' refers to 'man'; there is also a relation between 'man' and 'came'.

• Modifier labels that contain the preposition or other case-marking information. This means that some modifier relations, such as nominal and adverbial modification, also reflect the preposition involved, either as case information or as the preposition itself. An example is: He put the book on the table, where the relation between 'book' and 'table' is oblique and also copies 'on' into the relation label.

1http://universaldependencies.org/v2/enhanced.html and http://universaldependencies.org/u/overview/enhanced-syntax.html
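To make the two conversion rule types from the beginning of this section concrete, here is a minimal sketch (in Python) of head-identifier percolation and relation assignment over a toy constituent tree. The node representation and the single amod rule are our own simplification for illustration; the actual BTB conversion uses a full rule set over the HPSG annotation (Osenova and Simov, 2015).

    class Node:
        def __init__(self, label, children=None, token_id=None):
            self.label = label               # e.g. "NPA", "APA"
            self.children = children or []
            self.token_id = token_id         # lexical head identifier, if known

    # Rule table: (parent, child labels) -> (index of head child, relations),
    # where a relation is (label, dependent child index, governor child index).
    RULES = {("NPA", ("APA", "NPA")): (1, [("amod", 0, 1)])}
    dependencies = []

    def percolate(node):
        # Rule 2: once all children carry lexical identifiers, assign relations;
        # Rule 1: move the head child's identifier up to the parent.
        for child in node.children:
            percolate(child)
        if node.children and all(c.token_id is not None for c in node.children):
            key = (node.label, tuple(c.label for c in node.children))
            if key in RULES:
                head_idx, relations = RULES[key]
                for rel, dep_idx, gov_idx in relations:
                    dependencies.append((rel,
                                         node.children[dep_idx].token_id,
                                         node.children[gov_idx].token_id))
                node.token_id = node.children[head_idx].token_id

    # [NPA [APA too tall] [NPA green chair]] with identifiers id1 and id2:
    tree = Node("NPA", [Node("APA", token_id=1), Node("NPA", token_id=2)])
    percolate(tree)
    print(dependencies)   # [('amod', 1, 2)]: amod from NPA_id2 to APA_id1
    print(tree.token_id)  # 2: the identifier of the child NPA moved up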

Since BTB has originally been annotated with information additional to the grammatical functions on the syntactic level, its annotation can also be transferred into the UD enhanced dependencies. It should be noted, however, that some of these relations have been annotated explicitly in the original treebank, while others stayed implicit, but they might be derived when necessary from the present annotations. Such a case is the arguments of passive predicates. Subjects and obliques are not explicitly marked as passive/agent, but in some cases this information can be derived automatically on the basis of the predicate form. Needless to say, not all mappings are straightforward and trivial.

The assignment of sense embeddings to valency slots in the valency lexicon follows our previous work on grammatical role embeddings for English (Simov et al., 2018). In this work we used two corpora: real text corpora (RTC) and a pseudo corpus generated over WordNet (PCWN). The RTC was annotated with POS tags and parsed with the Stanford CoreNLP pipeline (Manning et al., 2014). Then, on the basis of the syntactic information, we substituted the subject, direct object, and indirect object lemmas with pseudo words representing the corresponding grammatical roles for the corresponding verb. Then we mixed the RTC with the PCWN in order to train sense embeddings for the senses represented in the joint corpus in the same vector space. This allowed us to compare the embeddings for the grammatical roles with the embeddings for noun senses. This approach proved to be successful for English, and we evaluated it via an extension of the Princeton WordNet with new syntagmatic relations between synsets, which improved the results for knowledge-based Word Sense Disambiguation (Simov et al., 2018).

A similar application of the same approach to Bulgarian is justified by the fact that the BTB Bulgarian WordNet (BTB-WN) does not have good coverage of Bulgarian texts (Osenova and Simov, 2017). Thus, we exploited the mapping from BTB-WN to the Princeton English WordNet (PWN) (Fellbaum, 1998) in order to transfer the grammatical role embeddings trained for English to Bulgarian and to assign them to valency slots in the Bulgarian valency lexicon. We consider these tasks as part of a bigger task of transferring lexical semantic relations from the English WordNet to the Bulgarian one, but we will not report on this issue here. We performed the training of sense embeddings and grammatical role embeddings in a similar way as for English, but first we extended the English WordNet with Bulgarian synsets that lack the same meanings among the English synsets. Then we generated a pseudo corpus using the UKB system2 for knowledge-based word sense disambiguation. The sense embeddings were trained again over a joint corpus of real texts and the pseudo corpus.

Our work seems similar to the work of (Vulić et al., 2017). In their paper they consider three research questions: (Q1) Given their fundamental dependence on the distributional hypothesis, to what extent can unsupervised methods for inducing vector spaces facilitate the automatic induction of VerbNet-style verb classes across different languages? (Q2) Can one boost verb classification for lower-resource languages by exploiting general-purpose cross-lingual resources to construct better word vector spaces for these languages?
(Q3) Based on the stipulated cross-linguistic validity of VerbNet-style classification, can one exploit rich sets of readily available annotations in one language (e.g., the full English VerbNet) to automatically bootstrap the creation of VerbNets for other languages? Our work differs from theirs in the fact that in our case the valency lexicon already existed before the experiment. In this respect, Q2 and Q3 are more relevant to us, with the modification that our goal is not to construct a VerbNet-like lexicon for Bulgarian, but to perform a sense embedding transfer from English to Bulgarian. In this way we use the larger availability of data in one language to address the contexts of sentence participants in a language with less data availability. However, in the future it will be interesting to apply the approach described in (Vulić et al., 2017) to our resources in order to transfer additional knowledge from VerbNet to our valency lexicon.

3 The BulTreeBank Annotation Scheme with regard to the enhanced dependencies

BulTreeBank in its original format is HPSG-based and it consists of 15 000 sentences (or 214 000 tokens). More information on the annotation strategies can be found in the BulTreeBank Stylebook.3

2http://ixa2.si.ehu.es/ukb/
3http://bultreebank.org/wp-content/uploads/2017/04/BTB-TR05.pdf

3.1 Glances at the BTB annotation scheme

The original BTB annotation scheme is constituency-based with an indication of head-dependency relations. Although XML was used as the main encoding format, additional graph-forming relations were also assigned. These include several semantic and discourse-oriented phenomena (named entities, intrasentential coreferences, ellipsis, etc.). To start with, subject and object control were annotated when one or both elements are pro-drop. This relation was introduced by a co-indexation mechanism that binds the overt subject and the pro-ss element (as in Fig. 1, left part) or two (or more) pro-drop elements. For example, there are 6953 instances of elided subjects (marked as pro-ss), which are always part of a co-referential chain within a sentence.

Figure 1: Left: Enyo [1] continued to [1] look aghast (Enyo kept looking aghast.) Right: Bring from kitchen-the glass water (Bring me from the kitchen a glass of water)

Another relation is the discontinuity one, introduced through three more specific relations: DiscA, DiscM, and DiscE. The first one reflects scrambling, the second one the so-called mixing of arguments,4 and the third one topicalization. The most frequent type is scrambling, DiscA (2447 instances), then comes topicalization, DiscE (932 instances), and the rarest one, as expected, is DiscM (8 instances). See Fig. 1, right part, for an example of scrambling.

Further, ellipsis was annotated as well. It was marked on two levels: syntactic (V-Elip) and discourse (VD-Elip). The syntactic one has 262 instances in the treebank. It marks a verb form that is recoverable from the nearest context. It has subtypes only for marking equality of the missing element, its opposite, or a different grammatical form. The discourse one has 255 instances. This type marks verb forms that are recoverable only in a bigger context, or even with the help of our common world knowledge. The subtypes represent existential verbs (to be, there is) with 120 instances, possessive verbs with 10 instances, and a discourse element with 35 instances. It also marks whole VPs with a head and a complement. It can be seen that both types are almost equally represented. See Figure 2 for an example of syntactic ellipsis.

4By mixing of arguments we mean a situation in which two constituents swap their elements. It can be found mainly in folklore and colloquial speech. For example: Malki go momi beryaha 'Little it-ACC girls picking-were' instead of Malki momi go beryaha 'Little girls were picking it'. The accusative clitic comes between the adjective and the head noun in the NP, while belonging to the VP, and thus causes an extraction-like process in this VP.

3.2 Towards enhanced dependencies

In this section the UD enhanced dependencies are considered with regard to the transfer procedure. As mentioned above, there are two ways of transferring the information: an implicit one, in which the needed information is derived from the linguistic annotation below the syntactic level, and an explicit one, in which the information can be directly processed through the represented syntactic nodes. Similarly to our strategy for transferring basic graph information from the original BTB to the UD one, the same two types of rules are used, as described above: assigning a syntactic label to all nodes and then assigning appropriate relations among them.

Null nodes for elided predicates. In BTB such predicates are introduced as V-Elip or VD-Elip. Thus, both labels can be mapped directly onto the so-called null nodes. V-Elip is the more straightforward one, while VD-Elip also covers cases of VP ellipsis and copula ellipsis. While the latter is more systematic, the former varies in the length of the involved recovered material, such as different parts of a verb form or a head verb with a complement, etc. Apart from that, VD-Elip provides discourse labels, meaning that it is difficult to identify the type (let alone the form) of the missing element(s). These difficult cases can be processed only manually.

Propagation of conjuncts. Here we have to rely on implicit but straightforward information, since in UD each dependent in a coordinated phrase has to be attached to its head (subject, object, modifier). In BTB the coordination phrases are considered head-less and thus flat. However, the overall approach with respect to the treatment of conjuncts is similar to the UD ideology. For example, two modifiers that modify the same head are coordinated as [NP [CoordP flat-the and lonely] voice]. Thus, amod relations can be established on the basis of the morphosyntactic and lexical information coming from the elements of the coordination phrase. The same holds for the core/non-core arguments. For example, coordinated subjects can each be assigned the nsubj relation with respect to the predicate.

Additional subject relations for control and raising constructions. In BTB the subject connects to the predicate in the main clause (i.e. the controller). Then the controller is connected with the unexpressed subject of the embedded verb. Thus, the nsubj relation between the subject of the main verb and the embedded verb can be established rather easily. Just the pro-ss element has to be substituted with nsubj and moved onto the verb itself (see Fig. 1, left part, for the original tree).

Arguments of passives (and other valency-changing constructions). In BTB there are no special markings for these arguments. Some of them can be derived automatically (such as the participle passive, due to its special morphological form), and some of them are not trivial, such as the se-passives (formed with the originally reflexive accusative clitic 'se' attached to the tensed verb form), since they are ambiguous across types of voice as well as markers of intransitive/detransitivized verbs. In the present UD version these labels are already available.

Coreference in relative clause constructions. The representation in BTB is similar to the representation in the basic UD graph, where the relative is connected to the predicate with a grammatical relation. Thus, the ref relation with its antecedent can be established automatically.
Modifier labels that contain the preposition or other case-marking information. Since Bulgarian is an analytic language, the non-core or nominal dependants (nmod, obl, acl and advcl) would have labels with propagated prepositions. This step can be done automatically.

Thus, it seems that the necessary information for the preliminary list of enhanced dependencies can be covered in BTB-UD in an almost straightforward way, since this kind of information has already been encoded in the original treebank. The main problems concern some types of ellipsis and some non-typical coordinations. In our case the only fully uncovered phenomenon was the arguments of passives, but they were (semi-)automatically added where needed. Some difficulties in the transfer are expected in the following directions: (a) lack of sufficient instruction documentation in UD on some very complex examples and interrelated phenomena, such as ellipsis and coordination, (b) attempts to expand the treebank automatically, and (c) some errors or problematic cases in the original treebank.
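To make the rule-based nature of the conjunct propagation concrete, the following is a minimal sketch of how such a rule could be implemented; it is our own illustration, not the authors' conversion code, and the edge format is a simplified stand-in.

    # Illustrative sketch (our own, not the authors' conversion code):
    # propagate a relation borne by the first conjunct to the later conjuncts,
    # as required by the UD enhanced representation.
    def propagate_conjuncts(edges):
        """edges: (head_id, dep_id, relation) triples of the basic tree."""
        enhanced = list(edges)
        conj_pairs = [(h, d) for h, d, r in edges if r == "conj"]
        for first, later in conj_pairs:
            for h, d, r in edges:
                # the first conjunct bears a core or modifier relation ...
                if d == first and r in {"nsubj", "obj", "amod"}:
                    # ... so the later conjunct receives the same relation
                    enhanced.append((h, later, r))
        return enhanced

    # "the flat and lonely voice": amod(voice, flat), conj(flat, lonely)
    basic = [(3, 1, "amod"), (1, 2, "conj")]
    print(propagate_conjuncts(basic))   # adds (3, 2, "amod")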

4 Syntactic Role Embeddings over the BTB Valency Lexicon

In our view, word embeddings have to reflect the relational structure of the corresponding word. Thus, for a verb with a subject, a direct object and an indirect object, we expect its word embedding to be formed by four parts: an embedding for the whole verb, reflecting the semantics of the event denoted by the verb, and then embeddings for the subject, direct and indirect objects. Such embeddings have to represent the selectional restrictions for the corresponding grammatical roles. There are many possible applications of such embeddings, such as the coreference resolution task, where embeddings for the pronouns used are provided, as well as word sense disambiguation, parsing, etc.

Here we report on the first experiments in learning such vector representations for the verb valency slots in the Bulgarian valency lexicon that correspond to subjects, direct objects and indirect objects of verbs. We perform this through knowledge transfer from English to Bulgarian with the help of the WordNet alignments. Our long-term goal is to train such embeddings for all lexical items with a relational structure, including adjectives, adverbs, nouns (plus relational nouns), etc. We call such embeddings subatom embeddings because they contain features covering only some aspects of a given event (or state). Training such embeddings for Bulgarian is not easy because of the lack of sufficient language resources, especially with respect to the coverage of BTB-WN. Thus we decided to exploit the available resources and their alignment to English in order to transfer these sense embeddings back to Bulgarian. Hence, we reused most of the work that has already been done for English (Simov et al., 2018). In that work we learned subatom semantic embeddings on the basis of dependency-parsed corpora, determining the arguments as word forms in the text. As an example, consider the following sentence: Every dog chases some white cat. The generalization over the various word forms (or lemmas) in the different examples in the corpus has been performed by substituting the word forms for the corresponding argument with a pseudoword form. For the above sentence, the following variations have been generated, with pseudoword forms for the different arguments of the different predicates: Every SUBJ_chase chases some white cat. Every dog chases some white DOBJ_chase. Having learned embeddings for these pseudowords, we assume that they represent the selectional features for the corresponding grammatical roles of the verbs.

The corpus for training the embeddings reported in the paper consists of two parts: (1) real text corpora (RTC); and (2) a pseudo corpus generated over WordNet (PCWN). RTC is used to provide relevant contexts for learning embeddings of pseudowords for subjects, direct objects and indirect objects. PCWN is used to ensure that the embeddings represent features extracted from the knowledge within WordNet and that the coverage is extended to all synsets in WordNet.

As RTC we have used the WaCkypedia_EN corpus (Baroni et al., 2009), reparsed with a more recent version of the Stanford CoreNLP dependency parser. The dependency representation of type “collapsed-cc” was selected, which collapses several dependency relations in order to obtain direct dependencies between content words and, in addition, propagates dependencies involving conjuncts. For instance, a parse of the sentence “the dog runs and barks” would result in the relations nsubj(dog, runs) and nsubj(dog, barks).
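The pseudoword substitution described above can be sketched as follows; this is an assumed re-creation (the function and the data layout are ours, not the original scripts), operating on lemmatised tokens and dependency triples.

    # Minimal sketch of the pseudoword substitution: for each subject/object
    # dependent of a verb, emit a copy of the sentence in which that dependent
    # is replaced by ROLE_verblemma.
    ROLE_OF = {"nsubj": "SUBJ", "nsubjpass": "SUBJ",
               "dobj": "DOBJ", "iobj": "IOBJ"}

    def pseudoword_variants(tokens, deps):
        """tokens: list of lemmas; deps: (head_idx, dep_idx, relation) triples."""
        for head, dep, rel in deps:
            if rel in ROLE_OF:
                variant = list(tokens)
                variant[dep] = f"{ROLE_OF[rel]}_{tokens[head]}"
                yield " ".join(variant)

    tokens = ["every", "dog", "chase", "some", "white", "cat"]
    deps = [(2, 1, "nsubj"), (2, 5, "dobj")]
    for sent in pseudoword_variants(tokens, deps):
        print(sent)
    # every SUBJ_chase chase some white cat
    # every dog chase some white DOBJ_chase

Note that the sketch operates on lemmas, in line with the statement below that all word forms were substituted with lemmas before training.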
This type of dependency allows a token to have multiple head words. The head word of each nominal subject, as well as each direct and indirect object, is then replaced by its predicate role and its governing verb’s lemma (SUBJ_run, SUBJ_bark — both for the noun ‘dog’). When a token has more than one head word suitable for substitution, copies of the sentence are created for each alternative replacement. For the relation has-subj we use the dependency relations ‘nsubj’ and ‘nsubjpass’; for the relation has-dobj we use the dependency relation ‘dobj’; and for the relation has-iobj we use the dependency relation ‘iobj’. In order to minimize errors we enforced the condition that the dependent word must be a noun. Here is a real example from RTC that was processed: few high-quality SUBJ_address address long-term DOBJ_address. In this example both the subject and the direct object are substituted with pseudowords. All word forms are substituted with lemmas, because our goal is to obtain sense embeddings.

The PCWN consists of pseudo texts that are the output of the Random Walk algorithm when it is set to the mode of selecting sequences of nodes from a knowledge graph (KG) — see (Goikoetxea et al., 2015) for the generation of pseudo corpora from a WordNet knowledge graph and (Ristoski and Paulheim, 2016) for the generation of pseudo corpora from RDF knowledge graphs such as DBpedia, GeoNames and FreeBase. Here we report results only for knowledge graphs based on WordNet and its extensions. The pseudo corpus is generated using the UKB system for knowledge-based word sense disambiguation (http://ixa2.si.ehu.es/ukb/). Here is an example of a pseudo sentence from PCWN: unfit function use undertake disposal. The pseudo sentences in PCWN represent sequences of related words on the basis of relations within WordNet. Such pseudo corpora provide a good basis for learning lemma embeddings — see (Goikoetxea et al., 2015) and (Simov et al., 2017).

The union of both corpora is used in the experiments. As said before, in RTC all words were substituted with their lemmas. Punctuation marks and numbers were deleted. We used the word2vec tool (https://code.google.com/archive/p/word2vec/) to train the embeddings. From the various models we selected the one with the best score on the similarity task. This model was trained with the following settings: context window of 5 words; 7 iterations; negative examples set to 5; and frequency cut sampling set to 7. This approach worked for English, and we used it to extend the Princeton WordNet, which improved the results for knowledge-based word sense disambiguation (Simov et al., 2018).

As described already, the resulting embeddings are related to lemmas, but not to senses, which is actually our goal. In order to obtain sense embeddings we performed some additional processing. For each synset, we obtained its vector by averaging the vectors of all lemmas it can be expressed by (this information is retrieved from WordNet). For grammatical roles, we averaged the corresponding grammatical role vectors over each lemma in the particular verb synset.

For the transfer from English to Bulgarian we extended the corpus of English senses with Bulgarian senses. In BTB-WN an alignment to the Princeton WordNet has been maintained. We maintain three main relations for mapping Bulgarian to English synsets: equality, subsumption, and generalization. Here are some examples: vertolet = helicopter; chicho (‘brother of the father’) is-subsumed-by uncle; mafia (‘organized crime group using the mechanisms of power’) generalized-over Cosa Nostra and Sicilian Mafia. A new PCWN was generated using this extended knowledge graph.
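For reproducibility, the reported training settings could be passed to gensim's Word2Vec roughly as below; the paper used the original word2vec tool, so this gensim re-creation — and in particular the mapping of the "frequency cut sampling" value to min_count — is our assumption.

    # Hedged re-creation with gensim; window/epochs/negative follow the
    # reported settings, min_count is our guess for the frequency cut.
    from gensim.models import Word2Vec

    # corpus: the union of RTC (lemmatised, pseudoword-substituted) and PCWN,
    # as an iterable of token lists -- shown here with a toy stand-in.
    corpus = [["every", "SUBJ_chase", "chase", "some", "white", "cat"],
              ["unfit", "function", "use", "undertake", "disposal"]]

    model = Word2Vec(sentences=corpus, window=5, epochs=7, negative=5,
                     min_count=1)  # min_count=7 on real data, 1 for the toy corpus
    vec = model.wv["SUBJ_chase"]   # embedding for a grammatical-role pseudoword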
The new corpus includes enough examples of Bulgarian synsets. The sense embeddings were trained over the new PCWN and the RTC from the English Wikipedia. We assume that the newly trained embeddings represent the Bulgarian senses well enough.

The evaluation of the approach was done over the sense annotation of BulTreeBank. From it we extracted 285 instances of the subject–verb relation (subj(NounSynset, VerbSynset)), 207 instances of the direct object–verb relation (dobj(NounSynset, VerbSynset)), and 98 instances of the indirect object–verb relation, where VerbSynset is present in the training corpus and there are embeddings for the related grammatical roles. Thus we were able to calculate the cosine similarity between the NounSynset embedding vector and the embedding vector for the corresponding grammatical role. For an instance of the relation subj(NounSynset, VerbSynset), for example, we calculated the cosine similarity between the embedding for NounSynset and the embedding for SUBJ_VerbSynset. The results of this evaluation are presented in Table 1. The threshold for a good relation is set to 0.40. This value was selected by empirical evaluation of the impact of adding new syntagmatic relations to WordNet. The results showed that the embeddings selected the correct relations in one third of the Subject–Verb cases, almost half of the Direct Object–Verb cases and one third of the Indirect Object–Verb cases.
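A minimal sketch of the sense-embedding construction and the threshold check, under the assumption that lemma and pseudoword vectors are available in a simple dictionary; it is our reconstruction, not the released code.

    import numpy as np

    def synset_vector(lemmas, emb):
        """Average the available lemma embeddings of a synset."""
        vecs = [emb[l] for l in lemmas if l in emb]
        return np.mean(vecs, axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # toy 2-d vectors; real ones come from the trained word2vec model
    emb = {"dog": np.array([1.0, 0.2]), "hound": np.array([0.9, 0.3]),
           "SUBJ_chase": np.array([0.8, 0.4])}
    noun_synset = synset_vector(["dog", "hound"], emb)
    # a relation instance counts as "good" above the 0.40 threshold
    print(cosine(noun_synset, emb["SUBJ_chase"]) > 0.40)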

Grammatical Relation    Minimum   Maximum   Mean     Number over 0.40
Subject–Verb            0.2304    0.7463    0.3798   91
Direct Object–Verb      0.2387    0.5924    0.3947   96
Indirect Object–Verb    0.2199    0.5202    0.3698   28

Table 1: Evaluation of the grammatical role embeddings.

Although the results are not very impressive, we believe that they show the utility of aligning the slots of the frames in a valency lexicon with embeddings that generalize over the concrete words in real texts. In future work we plan to extend BTB-WordNet in order to create such embeddings directly from Bulgarian resources. The proposed evaluation approach needs to be made more precise with respect to the quality of the embeddings. We also plan to incorporate these embeddings into mainstream NLP applications such as parsing, coreference resolution and word sense disambiguation.

5 Conclusions

The paper presented two recent developments around BTB. The first is the preparatory work for transferring knowledge from the original BTB in order to add enhanced dependencies to BTB-UD for the next release in May 2018. Our expectation is that the transfer will proceed relatively smoothly, since the linguistic information in the original treebank covers the list of proposed UD enhanced dependencies. The second is the extension of the BTB valency lexicon with subatom-based embeddings for English, with the aim of checking automatically how good they are at detecting the core participants in an event. Due to the scarcity of Bulgarian resources for this task, we relied on transferring embeddings trained on English data but enhanced with mappings to the Bulgarian WordNet. The evaluation was performed against BTB as a gold standard. Our preliminary results showed the feasibility of the approach. There are many directions for future work, such as a better transfer from English to Bulgarian, the exploitation of more Bulgarian resources, and the use of approaches like retrofitting against human-created resources for tuning the initially assigned embeddings.

Acknowledgements

This research has received partial support from grant 02/12 — Deep Models of Semantic Knowledge (DemoSem), funded by the Bulgarian National Science Fund in 2017–2019. We are grateful to the anonymous reviewers for their remarks, comments, and suggestions. All errors remain our own responsibility.

References

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3):209–226.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Josu Goikoetxea, Aitor Soroa, and Eneko Agirre. 2015. Random walks and neural network language models on knowledge bases. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Denver, Colorado, pages 1434–1439. http://www.aclweb.org/anthology/N15-1165.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60. http://www.aclweb.org/anthology/P/P14/P14-5010.

Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Antonio Zampolli, Francesca Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, and Rodolfo Delmonte. 2003. Building the Italian Syntactic-Semantic Treebank. Springer Netherlands, Dordrecht, pages 189–210.

Petya Osenova and Kiril Simov. 2015. Universalizing BulTreeBank: a linguistic tale about glocalization. In The 5th Workshop on Balto-Slavic Natural Language Processing. INCOMA Ltd., Hissar, Bulgaria, pages 81–89. http://www.aclweb.org/anthology/W15-5313.

Petya Osenova and Kiril Simov. 2017. Challenges behind the data-driven Bulgarian WordNet (BulTreeBank Bulgarian WordNet). In John P. McCrae, Francis Bond, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Jorge Gracia, Ilan Kernerman, Elena Montiel Ponsoda, Noam Ordan and Maciej Piasecki (eds.): Proceedings of the LDK 2017 Workshops, pages 152–163.

Petya Osenova, Kiril Simov, Laska Laskova, and Stanislava Kancheva. 2012. A treebank-driven creation of an ontovalence verb lexicon for Bulgarian. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey.

Petar Ristoski and Heiko Paulheim. 2016. RDF2Vec: RDF Graph Embeddings for Data Mining. Springer International Publishing, Cham, pages 498–514.

Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal Dependencies: An improved representation for natural language understanding tasks. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, France.

Kiril Simov, Petya Osenova, and Alexander Popov. 2017. Comparison of Word Embeddings from Different Knowledge Graphs. Springer International Publishing, Cham, pages 213–221.

Kiril Simov, Alexander Popov, Iliana Simova, and Petya Osenova. 2018. Grammatical Role Embeddings for Enhancements of Relation Density in the Princeton WordNet. In Proceedings of the 9th Global Wordnet Conference.

Ivan Vulić, Nikola Mrkšić, and Anna Korhonen. 2017. Cross-lingual induction and transfer of verb classes based on word vector space specialisation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, pages 2546–2558. https://www.aclweb.org/anthology/D17-1270.

Šárka Zikánová, Eva Hajičová, Barbora Hladká, Pavlína Jínová, Jiří Mírovský, Anna Nedoluzhko, Lucie Poláková, Kateřina Rysová, Magdaléna Rysová, and Jan Václ. 2015. Discourse and Coherence. From the Sentence Structure to Relations in Text. ÚFAL, Praha, Czechia.

Towards a dependency-annotated treebank for Bambara

Ekaterina Aplonova
School of Linguistics, ВШЭ, Moscow
[email protected]

Francis M. Tyers
School of Linguistics, ВШЭ, Moscow
[email protected]

Abstract

In this paper we describe a dependency annotation scheme for Bambara, a Mande language spoken in Mali, which has few computational linguistic resources. The scheme is based on Universal Dependencies. We describe part-of-speech tags, morphological features and dependencies, and how we performed a rule-based conversion of an existing part-of-speech annotated corpus of Bambara, which contains approximately 900,000 tokens. We also describe the annotation of a small treebank of 116 randomly picked sample sentences.

1 Introduction

One of the basic language resources has, for a long time, been a part-of-speech tagged (or morphologically disambiguated) corpus. In recent years, treebanks — collections of sentences annotated for syntactic structure — have become increasingly available and vital resources, both for natural language processing and corpus linguistics. Current end-to-end pipelines like UDPipe (Straka et al., 2016), which perform each stage of the classic NLP pipeline from tokenisation to dependency parsing, make it easy to go from a situation where a language has no effective language resources to one where the language has a functional pipeline in a few months as opposed to a few years of work.

A crucial prerequisite for building a treebank is a set of annotation guidelines which describe how particular syntactic structures are to be represented. In our work on creating a treebank for Bambara we have chosen version 2.0 of the Universal Dependencies scheme (Nivre et al., 2016), as it provides ready-made recommendations on which to base annotation guidelines for part-of-speech tags, morphological features and dependency relations. This reduces the amount of time needed to develop bespoke annotation guidelines for a given language: where the existing universal guidelines1 are adequate, they can be imported wholesale into the language-specific guidelines. In addition, the Universal Dependencies project provides a free/open pool (in the terminology of Streiter et al. (2006)) which collects dependency corpora in a single place, allowing for economies of scale in maintenance and ensuring that resources can persist after any initial development effort.

The remainder of the paper is laid out as follows: in Section 2, we give a short typological overview of Bambara; in Section 3, we describe an existing annotated resource for Bambara, the Corpus Bambara de Référence (CBR). Section 4 describes the conversion process we used, and Section 5 describes some constructions in Bambara which are not typologically common and how we intend to annotate them. Finally, Sections 6 and 7 describe future work and conclusions, respectively.

2 Bambara

In the description of Bambara presented in this paper, we used Vydrin (2013) and Выдрин (2017) as sources.2 Bambara is the most widely spoken language of the Manding language group (Western Mande < Mande < Niger-Congo). It is spoken mainly in Mali by 13–14 million people; of these, around four million are L1 speakers.

1http://universaldependencies.org/guidelines.html
2Abbreviations are as follows: = perfective predicative marker; = singular; = possessive postposition; = suffix which denotes an occasional actor; = postposition; = special predicative marker for qualitative verbs; = presentative copula; = locative copula; = equative copula; = negative copula; = imperfective predicative marker; = infinitive predicative marker; = quotative ‘copula’; = prohibitive predicative marker; = plural; NP = noun phrase; = relative pronoun/determinative; = negative perfective predicative marker; = suffix which derives a dynamic verb from a qualitative verb.
Keywords: Bambara, Mande, treebank

Section                 Sentences   Tokens
Unannotated             —           4,113,006
Disambiguated           —           903,585
Dependency annotated    116         1,307

Table 1: Composition of the Corpus Bambara de Référence as of December 2017

There are two variants of naming this language, Bambara and Bamana, both of which are in use. Bambara is one of 13 “national languages” of Mali. Besides French, it is the major language on Malian radio and television; there are periodicals in Bambara; it is broadly used in literacy programmes and in primary schools; and it is also taught at several universities in Europe and the US.

Bambara is a tonal language. It has two level tones and a down-drift. Tones can be lexical and grammatical, i.e. every lexeme has its lexical tone(s), which can change into grammatical one(s) depending on the context. For instance, in the noun phrase jíri fìn ‘black tree’, the rule of tonal compactness is demonstrated: the recessive syllables take the tone of the dominant ones. The lexical tone of fìn ‘black’ is low; however, in attributive position it takes the tone of its head jíri ‘tree’, whose tone is high. Moreover, Bambara has a tonal definite article (indicated by a low floating tone). It is not indicated in the CBR, and for this reason we do not indicate it in the present paper either. Tones are never marked in the Bambara press and books published in Mali; tonal notation is present in publications of texts by linguists, but even in the latter case it desperately lacks uniformity.

As described by Vydrin (2013), Bambara is an isolating language with certain elements of agglutination and incorporation. The basic word order is S AUX O V X. Therefore, in (1) Fúla is the subject, ye is an auxiliary, à ká mìsi is the direct object, dí is the verb, and í mà is an oblique.

(1) Fúla ye à ká mìsi dí í mà.
    NOUN AUX PRON ADP NOUN VERB PRON ADP
    Fula 3 cow give 2 to
    ‘The Fula man gave you his cow.’
    (Dependency relations shown: nsubj, aux, nmod:poss, case, obj, obl, case.)

The word order is fixed; however, it is possible to move a topicalised NP to the beginning of the clause (see §5.5).

3 Corpus Bambara de Référence

Development of the Bambara Reference Corpus (usually known by its name in French, Corpus Bambara de Référence) was started in April 2012. It is composed of texts of different kinds, e.g. periodicals, oral literature, manuals, religious publications, letters from newspaper readers, texts recorded and transcribed by researchers, etc. Since the Bambara orthographic standard is relatively undeveloped, the corpus assumes different levels of orthographic normalisation. The corpus includes a non-disambiguated sub-corpus and a disambiguated one (see Table 1 for statistics about its composition). The non-disambiguated sub-corpus contains only Bambara texts without any annotation. Annotation in the disambiguated sub-corpus consists of part-of-speech tags, glosses and the respective token in a normalised orthography (with tones). A user is able either to search the entire corpus or to limit their search to the disambiguated sub-corpus. Texts have been and continue to be disambiguated by volunteers using Daba (Maslinsky, 2014), a morphological analyser based on a language-independent framework that combines a dictionary and a rule-based analyser. Language-specific data used by the analyser consist of a dictionary and a list of rules for splitting words into morphemes. Its output consists of seven lines: the sentence in the original orthography, separate tokens in the original orthography, separate tokens in normalised orthography, part-of-speech tags, separate morphemes and glosses (cf. Figure 1).

Figure 1: Example of a disambiguated sentence. The output format is machine-readable HTML. A free translation of the sentence into English would be ‘He greeted the president and swore that he has good thoughts’. The first line is the original text, the second line is the tokenised text, the third line are the lemmas, and the fourth line has the part-of-speech tags. The fifth line has a gloss following the Leipzig glossing rules. Subsequent lines give a morphemic breakdown and gloss.

4 Data conversion

To convert the corpus we used a Python script which reads the HTML format of the CBR, performs the substitution of tags, and writes the output in CoNLL-U format. In order to generate the morphological features, it was necessary to look at both the glosses and the morphological breakdown of the words. We were able to maintain the original tokenisation scheme for the sentences, with the exception of three auxiliaries — the affirmative progressive marker bɛ́ kà, the negative progressive marker tɛ́ kà, and the emphatic perfective marker yɛ́ kà — which we treat as fixed units, as it is not possible to give part-of-speech tags to their individual parts.3

Regarding the lemmas, we left them as in the original corpus, where they appear as word forms with the addition of tone marking. We do not treat compounding and derivation productively, so the lemma of the compound jamanakuntigi ‘president’ is not split into its component parts jamana-kun-tigi ‘country-head-master’.

Part-of-speech tags were largely able to be converted deterministically using a simple translation table; however, one tag, conj ‘conjunction’, needed to be split into CCONJ ‘co-ordinating conjunction’ and SCONJ ‘subordinating conjunction’. For this we made a list of lemmas of both types and converted based on it (see the sketch below). In the original annotation scheme, some words were annotated with two part-of-speech tags. This was done in cases where a word could be annotated for part of speech differently according to syntactic context. For example, a word which could be a determiner or a pronoun would receive the tag dtm/prn (determiner or pronoun). The majority of determinatives perform different syntactic functions, e.g. the same word can act as an argument or as an attribute. Another example is the tag conj/prep (conjunction or preposition). Prepositions are closely connected to some subordinate conjunctions: there are only seven prepositions, and each of them can also act as a subordinate conjunction. These lexemes are treated as prepositions if they introduce an NP; if they introduce a whole clause, they are treated as conjunctions. We manually resolved these ambiguities, annotating them with the appropriate universal tag according to context.

We used the following language-specific features for Bambara: AdjType=Attr was used for adjectives with the suffix -/man/, Valency=1 was used for intransitive verbs, and Valency=2 was used for transitive verbs. The feature AdjType=Attr is also used in the Afrikaans treebank to mark attributive adjectives (in Afrikaans, adjectives have separate attributive and predicative forms). The feature Valency=1 has been proposed for use in the Ainu treebank (Senuma and Aizawa, 2017).
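A minimal sketch of the deterministic part of this conversion, based on the tag correspondences in Table 2 (Appendix A); the SCONJ lemma list is a stand-in for the hand-made list mentioned above.

    # Deterministic CBR -> UD tag conversion (cf. Table 2 in the appendix);
    # conj is resolved lexically, and double tags such as dtm/prn were
    # disambiguated manually.
    CBR_TO_UD = {
        "adj": "ADJ", "adv": "ADV", "adv.p": "ADV", "adv.ex": "ADV",
        "num": "NUM", "n": "NOUN", "n.prop": "PROPN", "v": "VERB",
        "vq": "VERB", "ptcp": "VERB", "pers": "PRON", "prn": "PRON",
        "pm": "AUX", "cop": "VERB", "pp": "ADP", "dtm": "DET", "prt": "PART",
    }
    SCONJ_LEMMAS = set()  # filled from the hand-made list of subordinating lemmas

    def convert_tag(cbr_tag, lemma):
        if cbr_tag == "conj":
            return "SCONJ" if lemma in SCONJ_LEMMAS else "CCONJ"
        return CBR_TO_UD[cbr_tag]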

3A reviewer suggests that we could have these as separate tokens with the part-of-speech tag AUX for both parts and the dependency relation fixed. As this would allow us to maintain the same tokenisation as the original, we are planning to implement this change.

In addition to converting the part-of-speech tags and morphological features, a number of sentences were annotated for dependencies using the UD Annotatrix annotation tool (Tyers et al., 2018) by a single annotator, in discussion with various linguists, while developing the guidelines.

5 Dependency scheme

In this section, we describe some of the features of Bambara which are not typologically common and how they are annotated. We use the original glosses (partly modified in order to make them clearer for readers who are not familiar with Bambara) along with dependency relations from Universal Dependencies.

5.1 Qualitative verbs and adjectives

In Bambara, verbs are divided into two classes: dynamic verbs and qualitative verbs. Qualitative verbs have a special predicative marker, glossed as qual. They cannot express tense, aspect or modality values (2a). Moreover, they cannot bear a direct object, but they can have adjuncts (2b). We annotated them as VERB.

(2) a. Sò ka téli.
       NOUN AUX VERB
       horse quick
       ‘The horse is quick.’
    b. Sò ka téli fàli yé.
       NOUN AUX VERB NOUN ADP
       horse quick donkey
       ‘The horse is quicker than a donkey.’
    (Relations shown: nsubj, aux; in (b) also obl and case.)

In predicative position, adjectives can be used only as secondary predicates; in the main predicative position, there are only qualitative verbs. A considerable number of adjectives are derived from qualitative verbs by adding the suffix -/man/: téli ‘quick’ → téliman. However, there are two other types of adjectives which do not have the suffix -/man/. The first type contains adjectives derived from qualitative verbs by conversion: mɔ́gɔ fìn ‘black (adj) man (lit. ‘black man’)’ → mɔ́gɔ ká fìn ‘the man is black (verb)’. The second type contains simple (non-derived) adjectives: kúra ‘new’, gánsan ‘simple’, sɛ̀bɛ ‘serious’, bèlebele ‘fat, bai’, bánga ‘without sauce’, etc.

5.2 Non-verbal predication

There are three main types of non-verbal predication: presentative (3a), locative (3b) and equative (3c).

(3) a. Mùsa dòn.
       NOUN VERB
       Musa .
       ‘This is Musa.’
    b. Mùsa bɛ́ dùgu kɔ́nɔ.
       NOUN VERB NOUN ADP
       Musa village
       ‘Musa is in the village.’
    c. Mùsa yé dònso yé.
       NOUN VERB NOUN ADP
       Musa hunter
       ‘Musa is a hunter.’
    (Relations shown: nsubj; in (b) and (c) also obl and case.)

We annotated all copulae as VERB. First of all, in the presentative construction, the copula dòn is always the last element of the clause. We cannot postulate an ellipsis of a predicate, so it is the copula that bears all predicative functions. Secondly, if we change the aspect in locative and equative constructions, the copula is replaced by the verb kɛ́ ‘do’ (4a, 4b).

(4) a. Mùsa kɛ́ra dùgu kɔ́nɔ.
       NOUN VERB NOUN ADP
       Musa do. village
       ‘Musa was in the village.’
    b. Mùsa kɛ́ra dònso yé.
       NOUN VERB NOUN ADP
       Musa do. hunter
       ‘Musa was a hunter.’
    (Relations shown: nsubj, obl, case.)

In negative clauses, in all three types of predication, the negative copula tɛ́ is used (5).

(5) Mùsa tɛ́ dònso yé.
    NOUN VERB NOUN ADP
    Musa hunter
    ‘Musa is not a hunter.’
    (Relations shown: nsubj, obl, case.)

5.3 Infinitive marker

A verbal infinitive form is unmarked morphologically; it is introduced by the predicative marker kà. Verbs introduced by kà cannot bear their own subjects, but they can bear objects and obliques. An infinitive construction can be an argument of the verb in the main clause (6a), its adjunct with a purpose meaning (6b), and it can express a sequential meaning (6c). We annotated kà as AUX.

(6) a. Ń bɛ sé kà móbili bòli.
       PRON AUX VERB AUX NOUN VERB
       1 arrive car run
       ‘I can drive.’
       (Relations shown: nsubj, aux, xcomp, aux, obj.)

    b. Ù ká ɲɔ́gɔn sɔ̀rɔ kà bɛ̀nkan sɔ̀rɔ.
       PRON AUX PART VERB AUX NOUN VERB
       3 together find agreement find
       ‘They met together in order to find an agreement.’
       (Relations shown: nsubj, aux, advmod, xcomp, aux, obj.)

    c. Dúnan ye jí mìn kà kúma.
       NOUN AUX NOUN VERB AUX VERB
       guest water drink speak
       ‘A stranger drank water, (then) he began to speak.’
       (Relations shown: nsubj, aux, obj, xcomp, aux.)

Note that the verbs of motion táa ‘go’ and nà ‘come’ take a verbal complement phrase without the infinitive marker (7).

(7) Dàa ká cí-den táa-ra Farabugu dùgu-tigi wéele.
    NOUN ADP NOUN VERB NOUN NOUN VERB
    Dah send-child go- Farabugu village-master call
    ‘Dah’s messenger went to call the chief of Farabugu.’
    (Relations shown: nmod:poss, case, nsubj, xcomp, nmod:poss, obj.)

The dependency relation is xcomp, because if the predicate of the main clause is negated, the subordinate clause is in the scope of negation (8).

(8) Dúnan ma jí mìn kà kúma.
    NOUN AUX NOUN VERB AUX VERB
    guest . water drink speak
    ‘A stranger drank water, (then) he did not speak.’
    (Relations shown: nsubj, aux, obj, xcomp, aux.)

5.4 Quotative ‘copula’

In the CBR, kó is always annotated as a quotative copula; however, Выдрин (2017) mentions that perhaps several homonymous lexemes could be postulated. In (9a), the word kó has its own subject and introduces direct speech, but in (9b) it only introduces a subordinate clause.

(9) a. Dénin kó né bɛ́ táa dùgu lá.
       NOUN VERB PRON AUX VERB NOUN ADP
       child. 1 go village
       ‘The girl said: I am going to the village.’
    b. Bɛ́ɛ yé à fɔ́ kó à kàná síran.
       PRON AUX PRON VERB SCONJ PRON AUX VERB
       all 3 tell that 3 fear.
       ‘All tell him that he should not be afraid.’
    (Relations shown across (a) and (b): nsubj, aux, parataxis:obj, ccomp, mark, obj, obl, case.)

If kó has its own subject, we annotate it as VERB; otherwise, it is annotated as SCONJ.

5.5 Topicalisation

Any NP can be placed at the beginning of the sentence and thus topicalised. A resumptive pronoun takes its place (10).

(10) Bàmakɔ sìgibagaw, òlu càman b’ à kɛ́nɛ kàn.
     NOUN NOUN PRON ADJ VERB PRON NOUN ADP
     Bamako residents that. numerous 3 surface
     ‘Residents of Bamako, many of them are there.’
     (Relations shown: dislocated, nsubj, amod, obl, nmod:poss, case.)

This strategy is commonly used for introducing the subject of a main clause. We annotate the topicalised NP as dislocated, and the resumptive pronoun receives the main function of the NP.

5.6 Adnominal clauses

Adnominal clauses include relative clauses and participle clauses. There are two main relativisation strategies: in the first, a dependent clause precedes the main clause (11), while in the second, a subordinate clause follows the main clause (12). In the first strategy, the two clauses are combined into what, from a functional point of view, is a relativising construction: one of the clauses narrows the potential reference of a referring expression from the other clause.

(11) { Mùso mìn ye dén sɔ̀rɔ }, ò ma sé k’ à dén tó.
     NOUN DET AUX NOUN VERB PRON AUX VERB AUX PRON NOUN VERB
     woman child find that . arrive 3 child leave
     ‘A woman who had a child cannot leave her child.’
     (Relations shown: nsubj, det:rel, aux, obj, acl, xcomp, nmod:poss.)

In (11), the relativised noun is followed by the determinative mìn in the subordinate clause, while in the main clause there is a resumptive pronoun ò, which refers to this noun. As mentioned by Nikitina (2012), such a strategy does not fit into any of the widely recognised relativisation strategies. The second type (12) is more typologically common. It is a simple correlative strategy similar to that of European languages: a relativised noun in the main clause has a pronominal referent in the dependent clause.

(12) { Í ka kán kà dúmuni dún }, mìn bɛ sín-ji cá-ya.
     PRON AUX VERB AUX NOUN VERB PRON AUX NOUN VERB
     2 equal food eat breast-water numerous-
     ‘You should eat a meal which will increase the quantity of breastmilk.’
     (Relations shown: nsubj, aux, xcomp, obj, acl.)

There are also adnominal clauses which are not relative clauses (they are not marked with a relativiser). This holds for the participle forms -/len/, -/ta/, -/bali/ and for the converb -/tɔ/ (13).

(13) U ye a tɛmɛtɔ ye
     PRON AUX PRON VERB VERB
     3 . 3 passing.by seen
     ‘They have seen him passing by.’
     (Relations shown: nsubj, aux, obj, acl.)

These are also annotated with the acl relation.

6 Future work

In terms of linguistic analysis, there are a number of avenues for future research. Bambara syntax is understudied, and we would like to refine our analysis of relativisation strategies, the quotative kó and the various predication/copula markers. In terms of the treebank, the immediate objective is to annotate 10,000 tokens in order to solidify the annotation scheme and produce a first version. After this, we aim to annotate up to 100,000 tokens. We are planning to compile an annotation guide and make it available for download. The work will be continued as part of the first author’s master’s thesis. Moreover, there are also corpora for other Mande languages which could be annotated under a similar scheme, and we would like to experiment with cross-lingual parsing for this language group.

7 Concluding remarks

We have presented a large part-of-speech annotated corpus converted to Universal Dependencies, along with a small proof-of-concept section annotated for dependency relations. We have described how a number of constructions in Bambara can be annotated and laid out future work for the corpus.

Acknowledgements

Many thanks to Valentin Vydrin for his help and encouragement and for permission to use the Corpus Bambara de Référence. We thank the anonymous reviewers for their extremely helpful advice and comments.

References Maslinsky, K. (2014). Daba: a model and tools for Manding corpora. In Proceedings of TALAf 2014 : Traitement Automatique des Langues Africaines, pages 114–122.

Nikitina, T. (2012). Clause-internal correlatives in Southeastern Mande: A case for the propagation of typological rara. Lingua, 122:319–334.

Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., and Zeman, D. (2016). Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Language Resources and Evaluation Conference (LREC’16).

Senuma, H. and Aizawa, A. (2017). Toward universal dependencies for Ainu. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 133–139.

Straka, M., Hajič, J., and Straková, J. (2016). UDPipe: trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Paris, France. European Language Resources Association (ELRA).

Streiter, O., Scannell, K., and Stuflesser, M. (2006). Implementing NLP projects for non-central languages: Instructions for funding bodies, strategies for developers. Machine Translation, 20(4):267–289.

Tyers, F. M., Sheyanova, M., and Washington, J. N. (2018). UD Annotatrix: An annotation tool for Universal Dependencies. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, page [this volume].

Vydrin, V. (2013). Bamana reference corpus (BRC). Procedia - Social and Behavioral Sciences, 95:75–80.

Выдрин, B. (2017). Бамана язык [The Bamana language]. In Vydrin, V., Mazurova, Y., Kibrik, A., and Markus, E., editors, Языки мира: Языки манде [Languages of the World: Mande Languages], pages 46–143. РАН, Институт языкознания [Russian Academy of Sciences, Institute of Linguistics].

A Supplemental material

Table 2 gives the conversion table for part-of-speech tags from the CBR to UD annotation schemes. The conversion table for morphological features is too long to include here but may be found online.4

Description             CBR      UD POS          UD Feats
Adjective               adj      ADJ
Adverb                  adv      ADV
Postpositional adverb   adv.p    ADV
Expressive adverb       adv.ex   ADV
Numeral                 num      NUM
Noun                    n        NOUN
Proper noun             n.prop   PROPN
Verb                    v        VERB
Qualitative verb        vq       VERB
Participle              ptcp     VERB            VerbForm=Part
Personal pronoun        pers     PRON            PronType=Prs
Pronoun                 prn      PRON
Modal word              pm       AUX
Copula                  cop      VERB
Conjunction             conj     CCONJ or SCONJ
Postposition            pp       ADP
Determiner              dtm      DET
Particle                prt      PART

Table 2: Conversion table for the parts of speech. Choice of conjunction type is determined lexically.

4https://github.com/KatyaAplonova/UD_Bambara

Merging the Trees
Building a Morphological Treebank for German from Two Resources

Petra Steiner
Institut für Deutsche Sprache
[email protected]

Abstract

This paper deals with the creation of the first morphological treebank for German by merging two pre-existing linguistic databases. The first of these is the linguistic database CELEX, which is a standard resource for German morphology; we build on its refurbished and modernized version. The second resource is GermaNet, a lexical-semantic net which also provides partial markup for compounds. We describe the state of the art and the essential characteristics of both databases and our latest revisions. As the merging involves two data sources with distinct annotation schemes, the derivation of the morphological trees for the unified resource is not trivial. We discuss how we overcome problems with the data and format, in particular how we deal with overlaps and complementary scopes. The resulting database comprises about 100,000 trees whose format can be chosen according to the requirements of the application at hand. In our discussion, we show some future directions for morphological treebanks. The Perl script for the generation of the data from the sources will be made publicly available on our website.

1 Introduction

Lexical productivity is a characteristic of German word formation. This leads to bottleneck problems in different fields such as terminology building or Information Retrieval. Concerning morphological analyses and structures, there are three main problems:

A. the wealth of ambiguous forms on the level of morph segmentation
B. the lack of deeper structural analyses in current approaches
C. for morphological analysis in general, the lack of frequency counts or a robust estimation for affixes.

A morphological treebank of the most common lemmas or word forms of German can serve as a starting point for addressing all of these issues. Although the demand for such a morphological treebank with hierarchical analyses was recognized some time ago (Zielinski and Simon, 2009, 230), to our knowledge, morphological treebanks for German do not exist so far, besides some mostly internally used gold standards. Deep morphological analyses can be used as

1. input for statistical approaches for full morphological parsing of German words
2. base of counts for testing of quantitative hypotheses about morphological tendencies and laws
3. gold standards and test suites for morphological analyzers
4. morphological resources for morphological analyzers
5. input for textual analyses

We derive a morphological treebank for German from two different databases: the first resource is the linguistic database CELEX, which is a standard resource for German morphology. The second resource is the GermaNet database, which contains partial markup for compounds.

Keywords: morphological analysis, morphological treebank, German, GermaNet, compounding

Section 2 describes the current state of research for German deep-level morphological data. The first part of Section 3 describes the German part of the refurbished CELEX database with an emphasis on the data which are relevant for the tree extraction process, as well as problems and errors in the data; it also gives a sketch of the preprocessing. The second part deals with the GermaNet (GN) database and the characteristics that are relevant for our project. Section 4 presents the procedures we use. It starts with the extraction of all relevant information from both databases, followed by the recursive construction of the morphological analyses. The derivation of the morphological trees for both sources is not trivial, and we show how we overcome problems with the data and format. In Section 5, we show how we merge the two sources, which have distinct annotation styles as well as overlaps and complementary scopes in their morphological classifications. We discuss the decisions used for the classification underlying our unified annotation. The results of the script are presented in Section 6. The resulting database comprises about 100,000 morphological trees whose format can be chosen according to the requirements of the applications. The conclusion in Section 7 provides some future directions for morphological treebanks. The Perl script for the generation of the data from the sources will be made publicly available on our website.

2 Related work

German is a language with complex processes of word formation, of which the most common are compounding and derivation. Segmentation and analysis of the resulting word forms are challenging, as spelling conventions do not permit spaces as indicators for boundaries of constituents. Therefore, so far the main concern of morphological analysers for German has been finding the correct splits on the level of the morphs. Morphological segmentation tools for German such as SMOR (Schmid et al., 2004), Gertwol (Haapalainen and Majorin, 1995), MORPH (Hanrieder, 1996) and TAGH (Geyken and Hanneforth, 2006) generate dozens of analyses for relatively simple words. For instance, Kellerassel ‘common rough woodlouse’ could be erroneously segmented as ]Kelle|Rassel (ladle|rattle) instead of Keller|Assel (basement|woodlouse) ‘common rough woodlouse’. Also, there are many sets of homonyms comprising both free and bound morphemes. For example, the form bar is a suffix in machbar mach|bar (make|able) ‘feasible’, a free morph in Hotelbar (hotel|bar) ‘hotel bar’, and a sequence without synchronically transparent meaning in Nachbardistrikt ‘neighboring district’, which can be wrongly analysed as ]nach|Bar|Distrikt (after|bar|district) (see Figure 1).

Figure 1: Ambiguous analysis of Nachbardistrikt. Left tree: N → [N Nachbar ‘neighboring’] + [N Distrikt ‘district’]; right tree (]Nachbardistrikt, the incorrect reading): N → [Pref nach ‘after’] + [N Bar ‘bar’] + [N Distrikt ‘district’].

This ambiguity problem has been tackled by using ranking scores for the different morphological analyses. For example, Cap (2014) and Koehn and Knight (2003) use the geometric mean as a weighting measure for each possible analysis from SMOR and then choose the one with the highest rank. Another possibility is to exploit the sequence of letters, e.g. by pattern matching with tokens (Henrich and Hinrichs, 2011, 422), lemmas (Weller-Di Marco, 2017), or normalization (Ziering and van der Plas, 2016), combined with ranking by the geometric mean. Ma et al. (2016) apply Conditional Random Field modeling to letter sequences. Daiber et al. (2015) extract candidates for compound splits by string comparisons with corpus data.

More recent approaches exploit semantic information for the ranking. Riedl and Biemann (2016) take sets of constituent candidates which they generate by combining a compound splitter and look-ups of similar terms inside a distributional thesaurus generated from a large corpus; their ranking score is a modification of the geometric mean. Ziering et al. (2016) use the cosine as a measure for semantic similarity between compounds and their hypothetical constituents and combine these similarity values by computing the geometric means and other scores for each produced split. The scores are then used as factors to be multiplied by the results of former splits, which were produced by morphological segmentation tools such as SMOR. The re-ranking shows a slight improvement over the initial values, while the pure distributional similarities were inferior to the initial results from the splitter. The reason for this is mainly the rate of word ambiguity, which for large corpora is mirrored within the distributional patterns.

Most tools for the analysis of German word forms provide flat sequences of morphs or morphemes but no hierarchical parses, which could give important information for word sense disambiguation. Only Würzner and Hanneforth (2013) tackle the problem of full morphological parsing, restricted to adjectives, by using a probabilistic context-free grammar for parsing. Steiner and Ruppenhofer (2015) developed a method for building parts of morphological structures by reducing the set of all possible low-level combinations, ranking SMOR splits with the gmean score. They derived the frequencies from different lexical and textual sources, showing some effects which hint at the importance of carefully choosing the source of frequency counts.

Ziering et al. (2016) discuss left-branching compounds consisting of three lexemes such as Arbeitsplatzmangel (Arbeit|Platz|Mangel) (work|place|lack) ‘job scarcity’. Their distributional semantic modelling fails to find the correct binary split if the head (here Mangel ‘lack’) is too ambiguous to correlate strongly with the first part (here Arbeitsplatz ‘employment’). Ziering and van der Plas (2016) develop a splitter which makes use of normalization methods and can be used recursively by re-analyzing the results of splits. Their evaluation, however, is based only on the binary compounds of GermaNet (Hamp and Feldweg, 1997).

All these approaches build strongly upon corpus data, but none of them uses lexical data. Only Henrich and Hinrichs (2011) enrich the output of morphological segmentation with information from GermaNet to disambiguate such structures. This can yield hierarchical structures but presupposes that the entries for the components exist inside the database.
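For illustration, a sketch of the geometric-mean ranking used in this line of work (Koehn and Knight, 2003; Cap, 2014); the frequency counts are toy values, not taken from any of the cited experiments.

    # Rank candidate compound splits by the geometric mean of their
    # constituents' corpus frequencies and keep the best-scoring analysis.
    from math import prod

    def gmean_score(parts, freq):
        """Geometric mean of constituent frequencies (0 if a part is unseen)."""
        counts = [freq.get(p, 0) for p in parts]
        return prod(counts) ** (1.0 / len(counts))

    freq = {"Keller": 120, "Assel": 15, "Kelle": 40, "Rassel": 30}  # toy counts
    splits = [["Keller", "Assel"], ["Kelle", "Rassel"]]
    best = max(splits, key=lambda s: gmean_score(s, freq))
    print(best)  # ['Keller', 'Assel']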
Databases of correct morphological splits and deep-level analyses could save a lot of effort, as there are almost no cases of forms with two different analyses which are both really used, even if structure and splits can be analysed ambiguously. The second analysis in Figure 1 will hardly ever occur in real text; at most, it could be understood as a pun. In most cases, German morphological data resources are restricted to lists of flat analyses, for instance the test set of the 2009 workshop on statistical machine translation,1 which was used by Cap (2014); it comprises 6,187 word tokens with binary top-level splits. Henrich and Hinrichs (2011) augmented the GermaNet database with information on top-level noun compound splits. DErivBase (Zeller et al., 2013) comprises derivational families (word nests), and derivational trees could be inferred from its sets and rules; however, it is based on heuristics and therefore contains some errors.

The only publicly available source which comprises German word tree information is the German part of the CELEX database (Baayen et al., 1995). The linguistic information is combined with frequency information based on corpora (Burnage, 1995), which makes it useful for the automated morphological analysis of unknown words. That CELEX is a standard resource for research in morphology is demonstrated by Shafaei et al. (2017), who use its German data for inferring derivational families (DErivCELEX) which are more precise than DErivBase. This data is obviously drawn from the original CELEX version with its old orthographical standard.2 Shafaei et al. (2017) claim that CELEX does not treat prefixation as a form of derivation. In general, this assertion is unjustified, though some first constituents of verbs are classified as free morphs which Shafaei et al. (2017) consider to be prefixes. While the CELEX classification is justifiable from a linguistic viewpoint of consistency and of the difference between prefixes and particles, it proves to be an error source for the algorithms for derivational families. For this reason, a second version of DErivCELEX is based on some "pragmatic changes" in categorization concerning compound verbs.

1http://www.statmt.org/wmt09/translation-task.html
2cf. http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/DErivBase/DErivCelex-v1.txt

Cotterell et al. (2016) reanalyse part of the deep-level morphological analyses for English and thus generate 7,454 morphological parses, which to our knowledge is the only morphological treebank for English besides the aforementioned. Dutch morphological analysis is covered by CELEX too. For other languages, the situation is even less fortunate. But if there are resources of derivational families with information on their generating rules, such as CroDeriV (Filko and Šojat, 2017) for Croatian, Démonette (Hathout and Namer, 2016) for French, DeriNet (Žabokrtský et al., 2016) for Czech or DerIvaTario (Talamo et al., 2016) for Italian, hierarchical trees could be derived, though compounds are not considered by these lists.

The original drawbacks of the German part of the CELEX database were an outdated format and the use of former orthographical conventions. However, these problems were tackled by Steiner (2016), and so the database yields a foundation for further exploitation. We decided to take it as the foundation for the morphological treebank and then augment it with other sources, the first of which is the GermaNet database.

3 Lexical resources for morphological trees

3.1 The Refurbished CELEX-German Database

The CELEX database comprises 51,728 entries, of which 38,650 are derivates or compounds and 2,402 are conversions. This seems to be a small set; however, the lemmas are similar to the small dictionary Der kleine Wahrig (Wahrig-Burfeind and Bertelsmann, 2007), which represents the core vocabulary of German. Developed in the early Nineties, the original CELEX database coding comprised a workaround for special characters; in German, these are mainly umlauts and characters such as ß. Furthermore, it uses an outdated spelling convention which makes the lexicon partially incompatible with text written after 1996. For instance, the modern spelling of the original CELEX entry Abschluß ‘conclusion’ is Abschluss. About 20 percent of the data is in an outdated format. Steiner (2016) refurbished the encoding and the spelling of the database completely; a version with modern encoding but old spelling was also created. Now, trees as in Figures 2 and 3 can be derived from the database.

Figure 2: Morphological analysis of Abschlussprüfung ‘final exam’: [N [N ab ‘away’ + schließ ‘close’] [N prüf ‘examine’ + -ung (suffix)]]

Figure 3: Morphological analysis of Abgangszeugnis ‘leaving certificate’: [N [N ab ‘away’ + geh ‘to go’] + s (interfix) + [N zeug ‘to witness’ + -nis (suffix)]]

However, these kinds of trees contain categorial information neither for affixes nor for the derivation process, e.g. for the noun Abschluss ‘finalization’ in the derivation of (2). Moreover, some derivations in the German CELEX database provide diachronic information which is correct but often unwanted for many applications, for example in Abdrift ‘leeway’ in example (1), which is diachronically derived from treiben ‘to float’. On the other hand, some derivations, such as the ablaut change between gehen ‘to go’ and Gang ‘gait, path, aisle’ in Abgangszeugnis ‘leaving certificate’ in example (2), could be of interest.3

(1) 97\Abdrift\ab+drift\xV\...\((ab)[N|.V],((treib)[V])[V])[N]

(2) 207\Abgangszeugnis\...\Abgang+s+Zeugnis\NxN\...\((((ab)[V|.V],(geh)[V])[V])[N],(s)[N|N.N],((zeug)[V],(nis)[N|V.])[N])[N]
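As an illustration of how such entries can be processed, the following sketch — our own code, not part of the paper — parses the bracketed structure field shown in (1) and (2) into a nested (category, children) tree.

    # Parse a CELEX structure string such as "((ab)[N|.V],((treib)[V])[V])[N]"
    # into nested (category, children) tuples; morphs are string leaves.
    import re

    TOKEN = re.compile(r"\(|\)|\[[^\]]*\]|,|[^(),\[\]]+")

    def parse(s):
        tokens = TOKEN.findall(s)
        pos = 0
        def node():
            nonlocal pos
            assert tokens[pos] == "("; pos += 1
            children = []
            while tokens[pos] != ")":
                if tokens[pos] == "(":
                    children.append(node())
                elif tokens[pos] == ",":
                    pos += 1
                else:
                    children.append(tokens[pos]); pos += 1
            pos += 1                      # skip ')'
            cat = tokens[pos]; pos += 1   # the following "[...]" label
            return (cat.strip("[]"), children)
        return node()

    print(parse("((ab)[N|.V],((treib)[V])[V])[N]"))
    # ('N', [('N|.V', ['ab']), ('V', [('V', ['treib'])])])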

Figure 3 shows that the filler letters (interfixes)4 can be inferred from the database entry, where they are represented within the categories of the immediate constituent structure. As every complex entry has this information, this enables one to collect them recursively from the entries.

Though most of its data is free of errors, the original CELEX database contains some mistakes which were not treated by the refurbishment of Steiner (2016), which involved only changes of coding and spelling. We found missing constituents and missing part-of-speech information within the morphological trees and within the field of immediate constituency information, as well as inconsistent morphological analyses. We augmented the script for the transformation to a modern standard with 18 additional rules, which covered 65 instances, before we could use the data for extracting the morphological trees. We are aware of the fact that we could not find all mistakes.

3Please note that these examples of CELEX entries only present the essential and abridged information of the structure information and the morphological trees.
4Depending upon the framework, these entities are also called Fugenmorpheme. However, their morphological and phonological status can be discussed, and we prefer the term filler letters, which refers to the form.

3.2 Compound Analyses from GermaNet

Henrich and Hinrichs (2011) augmented the GermaNet database with information on compound splits. This is restricted to nouns and does not provide filler letters or deep-level structures. The data has been revised since then; we are using version 11, which was most recently updated in February 2017.5 Example (3) presents a typical entry for Werkstück ‘work piece’. The parts of interest are the compound, its modifiers and its head. As two derivational processes are possible, two modifiers, werken ‘to work’ and Werk ‘work (noun)’, exist for the head, leading to two splits.

(3) Werkstück — modifiers: Werk, werken; head: Stück

In contrast to the CELEX data, the filler letters are missing in the analyses, as in (4a). Therefore, we insert them by a heuristic method to obtain analyses as in (4b); a possible string-matching sketch follows the examples below. Furthermore, we exclude compounds with proper names as constituents, such as (5), and foreign expressions, as in (6). We did not correct any mistakes in the database but automatically excluded a few deficient entries, for example those with missing part-of-speech classes, and compounds with affixoids or fossilized morphemes.

(4) a. Abfahrtszeit ‘departure time’: Abfahrt|Zeit (departure|time) b. Abfahrtszeit ‘departure time’: Abfahrt|s|Zeit (departure|filler letter|time)

(5) Bodenseeregion ‘Lake of Constance region’

(6) After-Show-Party
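The paper does not spell out the filler-letter heuristic, so the following Python sketch shows one plausible alignment-based approach: the parts of the GermaNet split are matched greedily against the surface form, and any residue between them is treated as a filler.

```python
def insert_fillers(word, parts):
    """Greedily align the GN compound parts with the surface form and
    return the parts with any residue inserted as a filler letter."""
    result, pos, lowered = [], 0, word.lower()
    for part in parts:
        hit = lowered.find(part.lower(), pos)
        if hit < 0:
            return parts                      # no clean alignment; give up
        if hit > pos:                         # residue between two parts
            result.append(word[pos:hit] + "_x")
        result.append(part)
        pos = hit + len(part)
    return result

print(insert_fillers("Abfahrtszeit", ["Abfahrt", "Zeit"]))
# ['Abfahrt', 's_x', 'Zeit']
```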

4 Procedures

4.1 Data Extraction

For extracting all relevant information from the refurbished CELEX data, we build an inverted index of all lemmas and extract all immediate constituents and their categories. Then we internally add the infinitive forms of the verbs included within these entries. This is necessary so that these forms can be found within the inverted index of the entries. We also refurbish the German syntactic database of CELEX to the modern standard and extract the parts of speech of the entries. As users can choose whether to generate not just compounds and derivatives but also conversions, we extract the relevant information for this word-formation type too, but exclude 724 cases of lexicalized inflection (see Gulikers et al., 1995, 54) such as (7).

(7) anhaltend (continuing, present participle) ‘persistent’

5 See http://www.sfs.uni-tuebingen.de/GermaNet/compounds.shtml#Download for a description.

The data finally comprises an inverted list of 40,081 entries with 38,650 different word splits of complex entries (compounds and derivations) and 1,678 conversions. From the GermaNet data, we extract all completely annotated compounds with their splits and filler letters, according to the restrictions. We also infer the category of the head from the entry. This leads to a list of 64,468 entries with 67,466 different word splits; all of them are nominal compounds. (8) shows the analyses for (3). The variation in the spelling of nomen is due to the original data.

(8) Werkstück Werk_Nomen|Stück_nomen
    Werkstück werken_Verb|Stück_nomen
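As an illustration of the extraction step of Section 4.1, here is a minimal sketch of the inverted index, using toy data; the real CELEX fields and categories are richer than shown:

```python
from collections import defaultdict

# Toy lexicon: lemma -> immediate constituents, standing in for the
# information extracted from the refurbished CELEX fields.
analyses = {
    "Abfahrtszeit": ["Abfahrt", "s", "Zeit"],
    "Abfahrt": ["ab", "fahren"],
}

# Inverted index: constituent -> entries it occurs in, so that the
# recursive tree building can look up every form it encounters.
index = defaultdict(set)
for lemma, constituents in analyses.items():
    for constituent in constituents:
        index[constituent].add(lemma)

print(sorted(index["Abfahrt"]))   # ['Abfahrtszeit']
```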

4.2 Building the Trees

For each entry of the extracted data, the procedure starts from the list of its immediate constituents and recursively collects all information. Algorithm 1 in Appendix A presents the recursive process for the CELEX data, Algorithm 2 for the GN data.

4.3 Diachronic Information

Diachronic information can be of interest; for many applications, however, it is considered unnecessary or even unhelpful. Therefore, the script permits users to choose a similarity threshold within the range [0:1], which is compared to a measure we devised based on the Levenshtein distance. For accepting or rejecting two parts of words, the procedure calculates the Levenshtein distance (LD) for the strings reduced to the smaller length of the two compared constituents (min(c1, c2)), and then compares their quotient dis to a threshold t as in (9):

    dis = LD / min(c1, c2) ≤ t    (9)

For calculating the dissimilarity quotient of example (1), in (10) the stem of the derived form (e.g. treib) and its component (e.g. driften) are reduced to the smaller size of these forms. In this case, the smaller length is 5. After this, the quotient of LD and the length is compared to the threshold. (10) shows that the analysis will stop for a threshold at 0.8 or below; a minimal code sketch of this check is given after the list of options in Section 4.4.

    dis = LD / min(c1, c2) = 4/5 = 0.8    (10)

In case singular variations need a different treatment, we also added a small list of exceptions.

4.4 Formats of output

The output can be configured in many ways. The following options are available:

• Depth of analysis for compounds
• Parts of speech for the constructs and/or the smallest constituents
• Choice of the output format (parentheses or a notation with | for the splits on the same level)
• Addition of filler letters for GN
• Transferring the GN annotation scheme to the CELEX scheme
• Removing compounds with proper names and/or foreign words as constituents for GN
• Analysis of conversions for CELEX
• Depth of analysis for conversions for CELEX
• Dissimilarity measure for CELEX diachronic analyses
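Returning to the dissimilarity check of Section 4.3, here is a minimal runnable sketch of equations (9) and (10); the function names are ours, not the paper's:

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def is_dissimilar(c1, c2, t):
    """Implements dis = LD / min(c1, c2): both strings are reduced to the
    smaller length, and the pair is rejected if the quotient exceeds t."""
    n = min(len(c1), len(c2))
    return levenshtein(c1[:n], c2[:n]) / n > t

# Example (10): treib vs. driften -> dis = 4/5 = 0.8, rejected for t = 0.5.
print(is_dissimilar("treib", "driften", 0.5))   # True
```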

The analyses in (11), for (3) and for a complex compound containing (3) as a constituent, come from the GN part, in the format without any linguistic information. Due to the combination of ambiguous entries, it comprises multiple trees for some forms. Example (12) shows an output for CELEX data of the same form in parenthesis notation. Here, only one analysis is assigned to the word form, with the verb as a result of conversion from the noun. More examples are given in the Appendix.

(11) Werkstück Werk|Stück ‘work(noun)|piece’
     Werkstück werken|Stück ‘to work|piece’
     Glaswerkstück Glas|(Werk|Stück) ‘glass|work(noun)|piece’
     Glaswerkstück Glas|(werken|Stück) ‘glass|to work|piece’

(12) Werkstück (*werken_V* (Werk_N)(en_x))(Stück_N) ‘(*to work_V* (work_N)(en(suffix)))(piece_N)’

5 Merging the trees

The CELEX trees comprise not only compounds but also deep-level analyses of derivatives and conversions, while the GN morphological data is restricted to compound nouns, which are partially very complex. For instance, the flat analysis of Währungsausgleichsfonds ‘currency adjustment fund’ (13) from the GN database can be recursively augmented to the tree in (14). Its constituent Ausgleich ‘adjustment’ is not further analyzed within the GN database, but has an entry as a complex conversion (15) in the CELEX database. Therefore, the combination of both sets and their parameters for building complex trees seems promising.

(13) Währungsausgleich_N|s_x|Fonds_N ‘currency adjustment|filler letters|fund’

(14) (*Währungsausgleich_N* Währung_N|s_x|Ausgleich_N)|s_x|Fonds_N ‘(*currency adjustment_N* currency_N|s_x|adjustment_N)|s_x|fund_N’

(15) Ausgleich (*ausgleichen_V* aus_x|(*gleichen_V* gleich_A|en_x)) ‘adjustment (*to adjust_V aus(prefix)_x|(*to equal_V equal_A|en(suffix)_x))’

Moreover, GN compounds which were formerly excluded during the procedure of data extraction because their part-of-speech categories are missing inside the database (see Section 3.2) can be assigned the category from CELEX if available. The main problem lies in the two annotation sets and their different classification schemes, especially for roots. Table 1 shows the mapping. While the main part-of-speech categories are almost perfectly mappable between the CELEX and the GN data, the classification of function words and bound morphemes is less consistent. There are cases of different interpretations, with a tendency of CELEX to prefer affix analyses, as in (16) and (17), with a. presenting the GN entry and b. the entry of CELEX. There are differing analyses of morphological constituency: in (18), GN's compound analysis is opposed to the conversion of CELEX. The classes of roots and word groups have the same or complementary scopes; e.g., (19) and (20) have the same analysis in both sources. We decided to unify the tagset but to leave different trees, such as in (11) and (12), to the choice of the users. Some more complex analyses as well as the algorithm are presented in Appendix A.

(16) a. Abwasser (ab_P)(Wasser_N) ‘(away_P)(water_N) waste water’
     b. (ab_x)(Wasser_N) ‘(away_x)(water_N) waste water’

(17) a. afroasiatisch (afro_R)(Asiatisch_N) ‘(afro_R)(Asian_N)’
     b. afroamerikanisch (afro_x)(amerikanisch_A) ‘(afro_x)(American_A)’

(18) a. Maßnahme (Maß_N)(Nahme_N) ‘(measure_n)(taking_N) measure’
     b. maßnehmen_V ‘(to measure_take_V) measure’

(19) Kondenswasser (kondens_R)(Wasser_N) ‘(condensed_R)(water_N)’

(20) Zwölftonmusik (zwölf Ton_n)(Musik_N) ‘(twelve tone_n)(music_N)’

Part of speech/morph type    GN              CELEX   GermanTreebank
noun                         nomen, Nomen    N       N
adjective                    Adjektiv        A       A
adverb                       Adverb          B       B
preposition                  Präposition     P       P
verb                         Verb, verben    V       V
article                      Artikel         D       D
interjection                 Interjektion    I       I
pronoun                      Pronomen        O       O
abbreviation                 Abkürzung       X       X
word group                   Wortgruppe      n       n
root/confix                  Konfix          R       R
filler letters, affixes      -               x       x

Table 1: Mapping of two morphological tagsets
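Table 1 translates directly into a lookup table; a minimal sketch (the surrounding conversion code is assumed, not shown in the paper):

```python
# Direct encoding of Table 1: GermaNet labels -> unified treebank tag.
GN_TO_TAG = {
    "nomen": "N", "Nomen": "N",
    "Adjektiv": "A", "Adverb": "B", "Präposition": "P",
    "Verb": "V", "verben": "V",
    "Artikel": "D", "Interjektion": "I", "Pronomen": "O",
    "Abkürzung": "X", "Wortgruppe": "n", "Konfix": "R",
}

def unify(gn_label):
    """Map a GN label to the shared tagset; filler letters and affixes,
    which GN does not label, fall back to 'x'."""
    return GN_TO_TAG.get(gn_label, "x")

print(unify("Nomen"), unify("s"))   # N x
```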

6 Results

Table 2 provides the numbers of trees for CELEX, GermaNet and their merge in the GermanTreebank. The parameters for the deep-level analyses are 6 for the levels of complex words and 2 for conversions. The Levenshtein dissimilarity threshold was set to 0.5. Double entries were removed. As the combinatorial power of GN's ambiguous trees grows with the depth of the trees, the numbers have to be taken with a grain of salt. The set of trees in the GermanTreebank consists of the unification of both sources. For examples, see (21)-(23) in Appendix A.

Structures           GN entries   CELEX entries   GermanTreebank
flat                 67,452       40,097          100,095
deep-level           68,163       40,097          104,424
merged with CELEX    68,171       n/a             100,986

Table 2: A German Treebank

7 Conclusions and future work

This paper describes our recent work on merging two types of morphological trees from GermaNet and CELEX. The resulting resource contains 95,506 lemmas connected with 100,986 merged trees and is currently the biggest available data resource of its kind. In principle, the treebank is extensible and combinable with other analyses, and we intend to enlarge it. The resource can be especially useful for all kinds of data-intensive morphological analyses. We plan to use it especially as a source for deep-level word analyses in combination with a word splitter.

Acknowledgments

The author was supported by the German Research Foundation (DFG) under grant RU 1873/2-1.

References

Harald Baayen, Richard Piepenbrock, and Léon Gulikers. 1995. The CELEX lexical database (CD-ROM).

Gavin Burnage. 1995. CELEX: A Guide for Users. In Harald Baayen, Richard Piepenbrock, and Léon Gulikers, editors, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, Philadelphia, PA.

Fabienne Cap. 2014. Morphological processing of compounds for statistical machine translation. Ph.D. thesis, Universität Stuttgart. http://elib.uni-stuttgart.de/opus/volltexte/2014/9768.

Ryan Cotterell, Arun Kumar, and Hinrich Schütze. 2016. Morphological Segmentation Inside-Out. In Jian Su, Xavier Carreras, and Kevin Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. The Association for Computational Linguistics, pages 2325–2330. http://aclweb.org/anthology/D/D16/D16-1256.pdf.

Joachim Daiber, Lautaro Quiroz, Roger Wechsler, and Stella Frank. 2015. Splitting compounds by semantic analogy. In Proceedings of the 1st Deep Machine Translation Workshop, ÚFAL MFF UK, pages 20–28. http://aclweb.org/anthology/W15-5703.

Matea Filko and Krešimir Šojat. 2017. Expansion of the derivational database for Croatian. In First Workshop on Resources and Tools for Derivational Morphology (DeriMo). http://derimo2017.marginalia.it/index.php/proceedings.

Alexander Geyken and Thomas Hanneforth. 2006. TAGH: A Complete Morphology for German based on Weighted Finite State Automata. In Finite State Methods and Natural Language Processing. 5th International Workshop, FSMNLP 2005, Helsinki, Finland, September 1-2, 2005. Revised Papers, Springer, volume 4002, pages 55–66. https://doi.org/10.1007/11780885_7.

Léon Gulikers, Gilbert Rattink, and Richard Piepenbrock. 1995. German Linguistic Guide. In Harald Baayen, Richard Piepenbrock, and Léon Gulikers, editors, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, Philadelphia, PA.

Mariikka Haapalainen and Ari Majorin. 1995. GERTWOL und morphologische Disambiguierung für das Deutsche. In Proceedings of the 10th Nordic Conference on Computational Linguistics, Helsinki, Finland.

Birgit Hamp and Helmut Feldweg. 1997. GermaNet - a Lexical-Semantic Net for German. In Proceedings of the ACL workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 9–15. http://www.aclweb.org/anthology/W97-0802.

Gerhard Hanrieder. 1996. MORPH - Ein modulares und robustes Morphologieprogramm für das Deutsche in Common Lisp. In Roland Hauser, editor, Linguistische Verifikation. Dokumentation zur Ersten Morpholympics 1994, Niemeyer, Tübingen, pages 53–66.

Nabil Hathout and Fiammetta Namer. 2016. Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, France. http://www.lrec-conf.org/proceedings/lrec2016/pdf/279_Paper.pdf.

Verena Henrich and Erhard Hinrichs. 2011. Determining Immediate Constituents of Compounds in GermaNet. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. Association for Computational Linguistics, pages 420–426. http://www.aclweb.org/anthology/R11-1058.

Philipp Koehn and Kevin Knight. 2003. Empirical methods for compound splitting. In Proceedings of the tenth conference of the European Chapter of the Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, pages 187–193. http://www.aclweb.org/anthology/E03-1076.

Jianqiang Ma, Verena Henrich, and Erhard Hinrichs. 2016. Letter Sequence Labeling for Compound Splitting. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology. Association for Computational Linguistics, Berlin, Germany, pages 76–81. http://anthology.aclweb.org/W16-2012.

Martin Riedl and Chris Biemann. 2016. Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2016). Association for Computational Linguistics, pages 617–622. https://doi.org/10.18653/v1/N16-1075.

Helmut Schmid, Arne Fitschen, and Ulrich Heid. 2004. SMOR: A German computational morphology covering derivation, composition and inflection. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). European Language Resources Association (ELRA). http://www.aclweb.org/anthology/L04-1275.

Elnaz Shafaei, Diego Frassinelli, Gabriella Lapesa, and Sebastian Padó. 2017. DErivCELEX: Development and Evaluation of a German Derivational Morphology Lexicon based on CELEX. In Proceedings of the DeriMo workshop. Milan, Italy. http://derimo2017.marginalia.it/index.php/proceedings.

Petra Steiner. 2016. Refurbishing a Morphological Database for German. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/summaries/761.html.

Petra Steiner and Josef Ruppenhofer. 2015. Growing trees from morphs: Towards data-driven morphological parsing. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology, GSCL 2015, University of Duisburg-Essen, Germany, 30th September - 2nd October 2015, pages 49–57. http://www.gscl.org/proceedings/2015/GSCL-201508.pdf.

Luigi Talamo, Chiara Celata, and Pier Marco Bertinetto. 2016. DerIvaTario: An annotated lexicon of Italian derivatives. Word Structure 9(1):72–102. https://doi.org/10.3366/word.2016.0087.

R. Wahrig-Burfeind and Gütersloh Lexikoninstitut Bertelsmann. 2007. Der kleine Wahrig: Wörterbuch der deutschen Sprache; [der deutsche Grundwortschatz in mehr als 25000 Stichwörtern und 120000 Anwendungsbeispielen; mit umfassenden Informationen zur Wortbedeutung und detaillierten Angaben zu grammatischen und orthografischen Aspekten der deutschen Gegenwartssprache]. Wissen Media Verlag.

Marion Weller-Di Marco. 2017. Simple Compound Splitting for German. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017). Association for Computational Linguistics, Valencia, Spain, pages 161–166. http://www.aclweb.org/anthology/W17-1722.

Kay-Michael Würzner and Thomas Hanneforth. 2013. Parsing morphologically complex words. In Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2013, St. Andrews, Scotland, UK, July 15-17, 2013, pages 39–43. http://aclweb.org/anthology/W/W13/W13-1807.pdf.

Zdeněk Žabokrtský, Magda Sevcikova, Milan Straka, Jonáš Vidra, and Adéla Limburská. 2016. Merging Data Resources for Inflectional and Derivational Morphology in Czech. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, France. http://www.lrec-conf.org/proceedings/lrec2016/pdf/994_Paper.pdf.

Britta Zeller, Jan Šnajder, and Sebastian Padó. 2013. DErivBase: Inducing and evaluating a derivational morphology resource for German. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Association for Computational Linguistics, volume 1, pages 1201–1211. http://www.aclweb.org/anthology/P13-1118.

Andrea Zielinski and Christian Simon. 2009. Morphisto – An Open Source Morphological Analyzer for German. In Proceedings of the 2009 Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008. IOS Press, Amsterdam, The Netherlands, pages 224–231. http://dl.acm.org/citation.cfm?id=1564035.1564061.

Patrick Ziering, Stefan Muller, and Lonneke van der Plas. 2016. Top a splitter: Using distributional semantics for improving compound splitting. In Proceedings of the 12th Workshop on Multiword Expressions. Association for Computational Linguistics, Berlin, Germany, pages 50–55. http://anthology.aclweb.org/W16-1807.

Patrick Ziering and Lonneke van der Plas. 2016. Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016. Association for Computational Linguistics, pages 644–653. http://aclweb.org/anthology/N/N16/N16-1078.pdf.

A Appendix

Formats

The following shows the entries of Abschlussprüfung ‘final exam’, see (2), Abdrift ‘leeway’, see (1), and Abgangszeugnis ‘leaving certificate’, see (3). For all linguistic information, | notation, and a Levenshtein threshold of 0.5, the results are presented in (21); for parenthesis notation and no restrictions on diachronic conversions, in (22); and for a flat representation of the immediate constituents, see (23).

(21) Abschlussprüfung (*Abschluss_N* (*abschließen_V* ab_x|schließen_V))|(*Prüfung_N* prüfen_V|ung_x)
     Abdrift ab_x|driften_V
     Abgangszeugnis (*Abgang_N* (*abgehen_V* ab_x|gehen_V))|s_x|(*Zeugnis_N* zeugen_V|nis_x)

(22) Abschlussprüfung (*Abschluss_N* (*abschließen_V* (ab_x)(schließen_V)))(*Prüfung_N* (prüfen_V)(ung_x))
     Abdrift (ab_x)(*driften_V* treiben_V)
     Abgangszeugnis (*Abgang_N* (*abgehen_V* (ab_x)(gehen_V)))(s_x)(*Zeugnis_N* (zeugen_V)(nis_x))

(23) Abschlussprüfung Abschluss_N|Prüfung_N
     Abdrift ab_x|driften_V
     Abgangszeugnis Abgang_N|s_x|Zeugnis_N

Example of the Treebank: Augmentation of a GermaNet tree

The following shows how an entry from GermaNet (GN), Währungsausgleichsfonds ‘currency adjustment fund’ (24), can be augmented recursively from GN (25) and from the CELEX database (26). The complete tree is presented in Figure 4.

(24) Währungsausgleich_N|s_x|Fonds_N ‘currency adjustment|filler letters|fund’

(25) (*Währungsausgleich_N* Währung_N|s_x|Ausgleich_N)|s_x|Fonds_N ‘(*currency adjustment_N* currency_N|s_x|adjustment_N)|s_x|fund_N’

(26) (*Währungsausgleich_N* Währung_N|s_x|(*Ausgleich_N* (*ausgleichen_V* aus_x|(*gleichen_V* gleich_A|en_x))))|s_x|Fonds_N ‘(*currency adjustment_N* currency_N|s_x|(*adjustment_N* (*to adjust_V* aus,Prefix_x|(*to equal_V* equal_A|en_x))))|s_x|fund_N’

Währungsausgleichsfonds (N)
├─ Währungsausgleich (N) ‘currency adjustment’
│  ├─ Währung (N) ‘currency’
│  ├─ s (x)
│  └─ Ausgleich (N) ‘adjustment’
│     └─ ausgleichen (V) ‘to adjust’
│        ├─ aus (x)
│        └─ gleichen (V) ‘to equal’
│           ├─ gleich (Adj) ‘equal’
│           └─ en (x)
├─ s (x)
└─ Fonds (N) ‘fund’

Figure 4: Merged morphological analysis of Währungsausgleichsfonds ‘currency adjustment fund’

Algorithms

Algorithm 1: Building a morphological treebank from CELEX German data
Input: CELEX-German revised
Output: A morphological treebank

initialization of parameters: depths of analysis, Levenshtein threshold, linguistic information, parts of speech, style of output
forall entries of CELEX do
    if entry is complex or a conversion then
        foreach constituent of entry do
            if constituent is simplex or depth of analysis reached then
                retrieve linguistic information/PoS as required
                return linguistic information and constituent
            else
                foreach part of constituent do
                    depth of analysis++
                    analysedeepercelex part with parameters and depth
                    return result of analysedeepercelex

sub analysedeepercelex part (parameters and level)
    if part is simplex or depth of analysis reached then
        retrieve linguistic information/PoS as required
        return linguistic information and part
    else
        foreach subpart of part do
            analysedeepercelex subpart
            if levenshtein threshold set and analysedeepercelex subpart is dissimilar then
                skip deeper analysis
                return subpart
            else
                return result of analysedeepercelex subpart
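As a minimal, runnable rendering of the recursion in Algorithm 1 (the data structures are assumptions, and the Levenshtein check of Section 4.3 is omitted for brevity):

```python
# Toy lexicon: lemma -> immediate constituents as (form, tag) pairs;
# simplex forms simply have no entry.
LEXICON = {
    "Abgangszeugnis": [("Abgang", "N"), ("s", "x"), ("Zeugnis", "N")],
    "Abgang": [("abgehen", "V")],
    "abgehen": [("ab", "x"), ("gehen", "V")],
    "Zeugnis": [("zeugen", "V"), ("nis", "x")],
}

def analyse(form, tag, depth=0, max_depth=6):
    """Expand a form into a nested tree until it is simplex or the
    chosen depth of analysis is reached."""
    if depth >= max_depth or form not in LEXICON:
        return (form, tag)
    return (form, tag,
            [analyse(f, t, depth + 1, max_depth) for f, t in LEXICON[form]])

print(analyse("Abgangszeugnis", "N"))
```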

Algorithm 2: Building a morphological treebank from GermaNet flat compounds
Input: GN flat compounds
Output: A morphological treebank

initialization of parameters: depth of analysis, linguistic information, parts of speech, style of output
forall entries of GN flat compounds do
    if entry is a compound then
        foreach constituent of entry do
            if constituent is simplex or depth of analysis reached then
                retrieve linguistic information/PoS as required
                return linguistic information and constituent
            else
                foreach part of constituent do
                    depth of analysis++
                    analysedeeper part with parameters and depth
                    return result of analysedeeper

sub analysedeeper part (parameters and level)
    if part is simplex or depth of analysis reached then
        retrieve linguistic information/PoS as required
        return linguistic information and part
    else
        depth of analysis++
        foreach subpart of part do
            analysedeeper subpart
            return result of analysedeeper subpart

Algorithm 3: Building a merged morphological treebank from GermaNet and CELEX
Input: CELEX-German revised, GN flat compounds
Output: A morphological treebank

initialization of parameters: depth of analysis, linguistic information, Levenshtein threshold, parts of speech, style of output
add CELEX data to the knowledge base
forall entries of GN flat compounds do
    if entry is a compound then
        foreach constituent of entry do
            if depth of analysis reached then
                retrieve linguistic information/PoS as required
                return linguistic information and constituent
            else if constituent not found in GN data then
                depth of analysis++
                analysedeepercelex (as in Algorithm 1) part with parameters and depth
                return result of analysedeepercelex
            else
                foreach part of constituent do
                    depth of analysis++
                    analysedeeper part with parameters and depth
                    return result of analysedeeper

sub analysedeeper part (parameters and level)
    if part is simplex or depth of analysis reached then
        retrieve linguistic information/PoS as required
        return linguistic information and part
    else if constituent not found in GN data then
        depth of analysis++
        analysedeepercelex (as in Algorithm 1) part with parameters and depth
        return result of analysedeepercelex
    else
        depth of analysis++
        foreach subpart of part do
            analysedeeper subpart
            return result of analysedeeper subpart

What I think when I think about treebanks

Anders Søgaard
Dept. of Computer Science
University of Copenhagen
[email protected]

Abstract

In this opinion piece, I present four somewhat controversial suggestions for the design of future treebanks: a) Treebanks should be based on adversarial samples, rather than pseudo-representative samples. b) Treebanks should include multiple splits of the data, rather than just a single split, as in most treebanks today. c) They should include multiple annotations of each sentence, whenever possible, instead of adjudicated annotations. d) There is no real motivation for adhering to a notion of well-formedness, since we now have parsers based on deep learning that generalize easily and perform well on any type of graph, and treebanks therefore do not have to limit themselves to trees or directed acyclic graphs.

1 Introduction

Treebanks are some of the most ambitious and expensive resources the NLP community has produced, and over the last two decades, they have enabled us to push research horizons and develop more advanced technologies. While treebanks remain invaluable, and we owe thanks to all the people who have contributed to existing ones, I nevertheless think future treebanks will have significantly more value if they are designed slightly differently. Treebanks are supposed to enable us to induce syntactic parsers and estimate their performance on held-out, unseen data; that is, their performance in the wild. We have treebanks for more than 50 languages, albeit some of them very small, but typically only one treebank for each language. Reported parsing results vary quite a bit from treebank to treebank, and such results are often taken as indicative of our ability to parse the relevant languages in the wild, given our current linguistic resources. Differences in parsing results, from treebank to treebank, are often explained on linguistic grounds. Here is an example from the shared task description paper from the CoNLL 2007 dependency parsing shared task (Nivre et al., 2007):1

. . . the languages involved in the multilingual track this year can be more easily separated into three classes with respect to top scores:

• Low (76.31–76.94): Arabic, Basque, Greek

• Medium (79.19–80.21): Czech, Hungarian, Turkish

• High (84.40–89.61): Catalan, Chinese, English, Italian

It is interesting to see that the classes are more easily definable via language characteristics than via characteristics of the data sets. [. . . ] The most difficult languages are those that combine a relatively free word order with a high degree of inflection.

1 This shared task is a predecessor to the CoNLL 2017 dependency parsing shared task, which included many more languages, but the survey paper produced by the organizers did not provide a similar explanation for differences in performance.

While explaining parsing performance by language characteristics such as freeness of word order and morphological complexity is quite intuitive, and pleasing for someone with a background in linguistic typology, like me, this explanation was later disputed by Søgaard and Haulrich (2010), who claimed that what they called derivational perplexity, i.e., (a particular way of measuring) the average complexity of the tree structures in the treebank, explained performance differences better. Linguistic properties may of course be confounds, but tree structures are influenced heavily by linguistic theory and annotation guidelines, which also determine the complexity of the structures we use to analyze sentences. Other factors that influence performance include how homogeneous the underlying corpus is. Is the text single-authored? If multi-authored, were the texts written with different intended audiences? Do the authors span multiple demographics? Do they speak different dialects? Etc. Nevertheless, explanations based on linguistic grounds are far more common, and we often hear claims based on treebank results that some languages are harder to parse than others. This was also the motivation behind a recent shared task in parsing morphologically rich languages.2 You may think that linguistic differences are much more important than other factors. This is not always true, however. Foster et al. (2010), for example, evaluate a dependency parser trained on newswire (the English Penn Treebank) on hand-annotated Twitter data. On held-out newswire data, the parser has an unlabeled attachment score of 90.6%, but on Twitter data, the score is 73.6%. This 19% relative drop (17% absolute) is bigger than a lot of the drops we observe transferring models across languages. Around the time Foster et al. (2010) ran the Twitter experiments, McDonald et al. (2011) revisited the idea of transferring the non-lexical part of parsing models across languages. Their impoverished English model scored 82.5% in unlabeled attachment score on English data. When evaluating their model on Portuguese, for example, scores dropped to 68.4%, which is a 17% relative drop (14% absolute). Changing the domain hurts model performance more than changing the language in this case. It is easy to find similar examples in the literature. These factors, by the same token, also influence how well we can generalize from validation and test set performance to performance in the wild or practical usefulness. This paper discusses different dimensions of this problem and proposes design principles for building treebanks in the future. My main observation is that our treebanks are being too nice on us, i.e., leading us to overestimate our performance in the wild.

2 Sampling for Treebanks

No one has, to the best of my knowledge, ever claimed that the sentences in the English Penn Treebank3 were representative of the English language. All the sentences were written by Wall Street Journal journalists in the 1980s, who were trained to write in a particular way and asked to write about particular topics, of interest to the readers of the newspaper. The first version of the Slovene Dependency Treebank4, used in the CoNLL 2006 Shared Task, was the annotation of a single novel, written by a single author. Clearly, a single piece of prose is not representative in any way of a language. The sentences in the Croatian Dependency Treebank5 come from the newspaper Croatia Weekly. Other treebanks claim to be based on more representative language samples. The Danish Dependency Treebank6 and the Turku Dependency Treebank,7 for example, contain sentences from newswire, magazines, blogs, and literature. In all the above treebanks (English, Slovene, Croatian, Danish, and Finnish), however, the training and test sentences come from the same sources, and while treebanks will never be i.i.d., there is a clear intention to sample training and test sentences in near-identical ways; I return to this in §3. Approximating i.i.d. makes sense if you can sample representative sentences at random. I argue that we can't, and that we therefore need to abandon the ideal of nearly identically sampled train and test portions in our treebanks. The argument relies on the observation that we cannot sample randomly from language.

2 http://www.spmrl.org/
3 LDC95T7
4 http://nl.ijs.si/sdt/
5 http://hobs.ffzg.hr/en/
6 https://github.com/UniversalDependencies/UD_Danish
7 http://bionlp.utu.fi/fintreebank.html

2.1 How can we sample from language?

Imagine you were to design an English treebank from scratch. You had a team of trained annotators, ready to annotate, just waiting for you to send them raw text files they can decorate with linguistic analyses. Now the question is what texts you send them. How would you sample your sentences? Where would you go? A knee-jerk reaction may be to say that you would sample them from a representative corpus, but that is putting your eggs in someone else's basket, relying on their ability to sample from English. What would you do? Would you get your sentences from the Wall Street Journal? Probably not exclusively. What else, then? Sentences from literary works? From Harry Potter? How about comics, then? How about 19th century literary works? From Facebook? Google Search queries? Speech logs? Learner data? Dialect? Expat English? Or how about the language of neurodiverse speakers of English? You probably feel you get my point, but stop for a minute and think about it. What would you do? You may arrive at the conclusion that treebanks should be domain-specific. Okay, so English comes in a lot of different flavors. Let us just build treebanks for each one of them. This approach has three problems: a) It is not possible to enumerate the domains. The concept of domains is usually ambiguous between topics and platforms/media/registers, but it should be clear that both the list of topics covered in human history, and the list of platforms available to us, are growing and unbounded. Also, b) authors have different linguistic traits, and c) language is constantly changing. The next conclusion – that it is simply not possible to sample from language – means sample bias is inevitable, inescapable, and something you have to embrace. This is, in my view, extremely important, and many things follow from this observation:

• A single test set is simply not gonna cut it.

• Your multiple test sets must be very different, with varying degrees of bias.

• Even so, you are likely to still overestimate performance on unseen data.

In other words, my first piece of advice to designers of future treebanks is to include multiple test datasets and to make them as different as possible. To the best of my knowledge, no treebanks are designed that way. A few treebanks, such as OntoNotes,8 contain meta-data that allow you to easily set up experiments with multiple test data sets, though.

3 Cutting the Cake Unfairly Again and Again

One way to achieve better data points for estimating performance in the wild with our current treebanks is by introducing multiple, more or less adversarial training-test splits. Here are a couple of ideas:

Splits based on meta-data. Some treebanks, including for example the English Penn Treebank and the Danish Dependency Treebank, contain meta-information about where each sentence is from, and when it was written or published. Such data enables us to estimate cross-domain robustness, or how performance drops over time.

Splits based on divergence. In the domain adaptation literature, divergence measures such as Jensen-Shannon divergence or A-distance are often used to quantify the similarity of domains (Ben-David et al., 2007). Such measures can be used to construct splits of varying difficulty. The more such data points, the better we can estimate performance on future samples.

Splits based on sentence length. In the recurrent neural network literature, as well as in unsupervised dependency parsing, it is customary to evaluate the ability of a model to generalize from short to long strings (Chalup and Blair, 2003; Spitkovsky et al., 2009). This is another interesting set of splits of a treebank. What is our performance on sentences of length > n when training on sentences of length ≤ n?
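Both kinds of splits are easy to realize; the sketch below (my own illustration, not taken from the systems cited) computes the Jensen-Shannon divergence between two word distributions and a length-based split:

```python
import math
from collections import Counter

def js_divergence(c1, c2):
    """Jensen-Shannon divergence between two word-count distributions;
    higher values indicate a more adversarial train-test split."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    jsd = 0.0
    for w in set(c1) | set(c2):
        p, q = c1[w] / n1, c2[w] / n2
        m = (p + q) / 2
        jsd += 0.5 * (p * math.log2(p / m) if p else 0.0)
        jsd += 0.5 * (q * math.log2(q / m) if q else 0.0)
    return jsd

def length_split(sentences, n):
    """Train on sentences of length <= n, test on length > n."""
    return ([s for s in sentences if len(s) <= n],
            [s for s in sentences if len(s) > n])

train, test = length_split([["a", "b"], ["a", "b", "c", "d"]], n=3)
print(js_divergence(Counter(w for s in train for w in s),
                    Counter(w for s in test for w in s)))
```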

8 LDC2013T19

My second piece of advice to designers of future treebanks is thus to devise alternative splits. Again, if a treebank contains rich meta-data, such as OntoNotes, we can easily set up such splits. However, a list of standard splits would ensure comparability across the work of different research groups.

4 Learning from Disagreements

One thing that makes parsing harder than necessary is our insisting on perfect agreement with the human gold-standard annotation. There are often multiple possible analyses of a sentence, even when wholeheartedly adopting a particular linguistic theory, but parsers are only rewarded for picking the analysis accepted by the treebank annotators after adjudication (Plank et al., 2014b).9 In unsupervised dependency parsing, some researchers have proposed alternative, less conservative metrics that would not penalize linguistically acceptable deviation from the gold standard (Schwartz et al., 2011; Tsarfaty et al., 2012). An alternative, however, is to use multiple reference annotations for each sentence in the test data; this is the approach taken in machine translation, for example. I am confident that performance across multiple reference annotations (multiply annotated test data) is more predictive of downstream performance than performance on adjudicated annotations. Moreover, we already know that dependency parsers benefit from observing disagreements between annotators at training time (Plank et al., 2014a); learning from such disagreements using cost-sensitive learning can lead to better performance even within datasets, and to big improvements across samples and annotation projects. My third piece of advice is therefore to rather spend the adjudication time on annotating more data. In other words, future treebank designers should not adjudicate, but include multiple, possibly inconsistent, annotations of each sentence.
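A multi-reference attachment score is straightforward to define; the following sketch (my own, analogous to multi-reference scoring in machine translation) takes the best match over the available annotations:

```python
def uas(pred_heads, gold_heads):
    """Unlabeled attachment score for one sentence (lists of head indices)."""
    return sum(p == g for p, g in zip(pred_heads, gold_heads)) / len(gold_heads)

def multi_reference_uas(pred_heads, references):
    """Score against several unadjudicated annotations and keep the best,
    so linguistically acceptable deviations are not penalized."""
    return max(uas(pred_heads, ref) for ref in references)

# Two annotators disagreed on the head of token 1; the parse matches one of them.
print(multi_reference_uas([2, 0, 2], [[2, 0, 2], [3, 0, 2]]))   # 1.0
```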

5 Crazy Trees

In addition to sampling data adversarially, introducing multiple splits, and collecting multiple, unadjudicated annotations, I also would like to question another straitjacket in treebanking projects, namely the need for our annotations to be well-formed trees (or directed acyclic graphs, for that matter). Some linguists have argued that some sentences are best described by cyclic structures, for example (Pollard and Sag, 1994). Such analyses never make it into treebanks,10 and I think the main motivation is the idea that modern parsers require well-formed input trees. Many parsers are designed to work only on trees, whether dependency trees or constituent trees, but recently several architectures have been introduced that do not hardwire this constraint into their models. Examples include sequence-to-sequence parsers (Luong et al., 2016) and so-called tensor-LSTMs (Schlichtkrull and Soegaard, 2017). Sequence-to-sequence parsers encode input sentences using recurrent neural networks, recurrently applying the transition parameters of the encoder. In their simplest version, they then generate a sequence of output symbols, one symbol at a time. After encoding the input sentence, the initial state is the vector sentence representation. From this vector, the parsers predict the most likely output symbol. The next state, from which the next output symbol is predicted, is obtained by applying the transition parameters of the decoder to the current state. The encoder, responsible for learning a representation of the input sentence, and the decoder, which generates the parse, only interact through the vector representation. The sequence-to-sequence parser does not guarantee well-formed tree output. On the other hand, this also means the parser is not restricted to generating trees. Sequence-to-sequence models can learn to generate sets of edges from strings and thus associate input sentences with general graphs.

9 One reviewer raised the fair concern, reading this, that in practice, most inconsistencies in annotation involve silly stuff like the proper annotation of named entities, foreign language, titles, annotating collocations as fixed expressions or compositionally, etc. All of these are not ’real’ ambiguities [...], but just a matter of detailed instructions. I agree, but for the same reason we should a) either not insist on there being a correct annotation in these cases, or b) simply not annotate these cases at all (see §5 for why it is not necessary to insist on fully connected trees).
10 One reviewer rightly points out that some treebanks actually contain cyclic structures, because of secondary edges, but these are ignored in parsing papers. The good news is that we do not need to ignore such edges anymore.

Tensor-LSTMs run recurrent neural networks over the rows of weight matrices of input sentences. They produce weights over the potential heads of each possible dependent. In Schlichtkrull and Soegaard (2017), minimum spanning tree search is used to find the best output tree, but this decoding step – which is only used at test time, not during training – is easily removed, and the output matrices can be scored directly against gold-standard general graphs.

6 Is the World Ready?

You may be wondering whether the parsing community is ready for this. Even if you agree that adversarial splits and multiple annotations are great for scientific reasons or engineering purposes, you may wonder whether researchers are not too conservative to adopt treebanks that depart radically from what has been standard methodology for ages. The answer to this question is that yes, it is unlikely that everyone in the parsing community will adopt this overnight, but designing adversarial, multiply annotated treebanks could pave the way for the researchers who are ready. One reason to think that more and more researchers are ready is the steadily growing interest in topics such as transfer learning, multi-task learning and robust generalization. This interest is evidenced by the growing number of papers at our main conferences on these topics, recent workshops fully dedicated to one or more of these topics, as well as conference tutorials giving young researchers the necessary background to engage in these topics. One example of this was the builders-and-breakers workshop at EMNLP 2017 in Copenhagen, Denmark.11 Here, attendants were encouraged to come up with hard examples that would fool state-of-the-art NLP models. This is exactly why I am proposing adversarial splits in treebanks with multiple, difficult test sets. In order to understand how our models generalize, we need to prevent our evaluation set-ups from rewarding overfitting.

7 Summary

This is clearly an opinion piece. While I feel I have provided some justifications for my opinions, the paper clearly does not live up to the standards of a technical track paper. I nevertheless hope the community will gradually, over time, adopt the following principles: a) Include several test sets in your treebanks that diverge more or less from the training data, but are generally as heterogeneous as possible. b) Devise multiple training-test splits, providing researchers with more data points for estimating the performance of their parsers in the wild. c) Choose several annotations per sentence over adjudication. d) Do not necessarily restrict the citizens of treebanks to be trees. Parsers can handle more complex structures, so include them if linguistically motivated.
Acknowledgments

Thanks to the anonymous reviewers for their comments that helped improve the paper. This research is funded by the ERC Starting Grant LOWLANDS No. 313695.

References

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2007. Analysis of representations for domain adaptation. In NIPS.

Stephan Chalup and Alan Blair. 2003. Incremental training of first order recurrent neural networks to predict a context-sensitive language. Neural Networks 16:955–972.

George Foster, Cyril Goutte, and Roland Kuhn. 2010. Discriminative instance weighting for domain adaptation in statistical machine translation. In EMNLP.

Minh-Thang Luong, Quoc V Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In ICLR.

11 https://generalizablenlp.weebly.com/

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In EMNLP.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In EMNLP-CoNLL.

Barbara Plank, Dirk Hovy, and Anders Søgaard. 2014a. Learning POS taggers with inter-annotator agreement loss. In EACL.

Barbara Plank, Dirk Hovy, and Anders Søgaard. 2014b. Linguistically debatable or just plain wrong? In ACL.

Carl Pollard and Ivan Sag. 1994. Head-driven phrase structure grammar. The University of Chicago Press, Chicago, Illinois.

Michael Schlichtkrull and Anders Soegaard. 2017. Cross-lingual parsing with late decoding for truly low-resource languages. In EACL.

Roy Schwartz, Omri Abend, Roi Reichart, and Ari Rappoport. 2011. Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation. In ACL.

Anders Søgaard and Martin Haulrich. 2010. On the derivation perplexity of treebanks. In TLT.

Valentin Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2009. Baby steps: how "less is more" in unsupervised dependency parsing. In NIPS Workshop on Grammar Induction, Representation of Language and Language Learning.

Reut Tsarfaty, Joakim Nivre, and Evelina Andersson. 2012. Cross-framework evaluation for statistical parsing. In EACL.

Syntactic Semantic Correspondence in Dependency Grammar

Cătălina Mărănduc
Faculty of Computer Science, ”Al. I. Cuza” University of Iasi, Romania
Academic Institute of Linguistics ”Iorgu Iordan - Al. Rosetti”, Bucharest, Romania
[email protected]

Cătălin Mititelu
Bucharest, Romania
[email protected]

Victoria Bobicev
Technical University of Moldova, Chişinău, Republic of Moldova
[email protected]

Abstract

This paper describes the semantic format of the UAIC Ro-Dia Dependency Treebank, based on the previous classical syntactic annotation. The discussed format exploits all the semantic information annotated at the morphological level. The transformation of the syntactic annotation into a semantic one is made semi-automatically, using a tool called Treeops, which is a converter from one XML format to another, in accordance with a set of rules. Non-ambiguous syntactic relations are transformed automatically, while ambiguous ones are manually corrected. The paper also contains some explanations of the generic relation between syntactic and semantic structures. We elaborated a set of types of judgement which govern the selection of semantic roles for the syntactic tags based on the morphological ones, which are ambiguous for the semantic annotation. After the creation of a large enough semantically annotated corpus, a statistical semantic parser will be trained for the further automatic annotation of ambiguous syntactic relations.

1 Introduction

Natural language theorists have diversified their studies from three perspectives: syntax is the study of relationships between signs, semantics is the study of the relationships between the signs and their denotation, and pragmatics is the study of the relations between the signs and the situation of communication. But as the linguistic sign is a relationship between a form (a signifier) and a meaning (a signified) (de Saussure, 1916), the syntax that studies the relationships between signs cannot ignore their meaning. So it is not possible to make a tangible separation of these linguistic layers; we will observe that morphology also contains semantic and even pragmatic data, because persons 1, 2, 3 refer to roles in the communication situation, as do deictics and some adverbs. Punctuation also has clearly defined pragmatic and semantic functions (Druguş, 2015). Transformational syntax also tried to separate a surface level of language and a deep level. Although computational linguists have not come to an agreement about this, transformation rules are still written, for example, in some programs that can deduce that "the novel is written by Orwell" means "Orwell wrote the novel". For Chomsky (1965), the deep structure is a simple, logical, general one, and the surface one is an evolution from the first; it generally truncates relationships and eliminates redundancy. Both are syntactic structures. Starting from the same data, however, Fillmore (1968) states that the deep structure is one of semantic roles. The deep structure would thus contain the relations between the signs and the real world in which their denotations are located. The surface structure, which remains the syntactic one, is in fact more abstract than the deep one. The syntax is obtained by abstracting and generalizing the semantic relations of signs with the real world. Fillmore only considers the verb as the center of the communication, and enumerates six cases that he considers the core relationships of the predicate: Agentive, Dative, Instrumental, Factitive (Result), Objective and Locative. The number of cases is too reduced in this theory, and later researchers added other cases to this list. It is not clear why only the core of the sentence should have a semantic structure, as if the optional
dependencies were meaningless. Being less abstract than the syntactic structure, the semantic structure should have more roles than the first one.

2 Related Work

There is no universal consensus about semantic annotation, or about the number of semantic categories. Bonial et al. (2014) made the remark that, previously, annotation focused on event relations expressed by verbs, but the meaning of words is not necessarily linked to their morphological value – nouns, adverbs, and interjections can also express an event. They propose to expand the PropBank annotations to nouns, adjectives, and complex predicates. This research is called Predicate Unification. In the UAIC-FII (Faculty of Computer Science, ”Al. I. Cuza” University, Iaşi, Romania) NLP (Natural Language Processing) group, Trandabăţ (2010) imported about 1,000 sentences from the English FrameNet. She translated the sentences into Romanian and imported their semantic annotation. In this way, she made a first set of semantic annotations in Romanian. Just as in the English FrameNet, these annotations only cover the core structure of the sentence, called the Semantic Frame, and the predicate arguments, called Semantic Roles; the semantic functions of the other members of the structure are neglected. Another group of semantic annotations is related to VerbNet (Kipper et al., 2006), a project based on PWN, or to a combination of the two semantic annotation systems, FrameNet and VerbNet (Shi and Mihalcea, 2005). The problem is the same; they emphasize the importance of the predicate and of the action scenarios (events), but the other words of the language make sense as well, and not all judgments describe events. The UAIC-RoDia DepTb is annotated in Dependency Grammar, a flexible formalism founded by Tesnière (1959) and actualized as Functional Dependency Grammar by Tapanainen and Jarvinen (1998) and Mel’čuk (1988). Nowadays, a large number of corpora in the world have adopted the same formalism. Looking at the older corpora that have been united during the last few years under the Universal Dependencies (UD) portal, we find that many of them share the same way of development, going from syntactic to semantic annotation. This group contains a treebank for Standard Contemporary Romanian, affiliated by RACAI (the Research Institute for Artificial Intelligence), which imported 4,000 sentences from the UAIC Treebank (of the Al. I. Cuza University). However, not all natural Romanian language is standard; only a small percentage of communication acts are in the standard language. Spoken language, poetry, regional and old language, and social media communication are not in the standard language, and in all these styles of communication innovation is permitted. We have decided to annotate all kinds of nonstandard language, and we have recently become affiliated with the UD as the UD-Romanian Nonstandard Treebank, which was created at the UAIC but also has contributors from the Republic of Moldova, a country where Romanian is also spoken. The other treebanks affiliated with the UD have the same problem; it is an enormous advantage that the annotation conventions are strictly the same, but the attention paid to morphology in the classification of the syntactic relations leads to the loss of semantic information previously annotated in the original formats of affiliated treebanks. In 2003, the PDT authors described the three-level structure of their treebank and the Tectogrammatic level (which includes semantic, logical and syntactic information) (Bohmová et al., 2003). They have long been interested in semantics and its links with syntax (Sgall et al., 1986).
In a previous paper (Mărănduc et al., 2017), we showed that our semantic annotation system has affinities with the PDT Tectogrammatic layer. The authors of the BulTreebank are also interested in semantics (Simov and Osenova, 2011), as is the PENN Treebank, for instance in the annotation of entities and events (Song et al., 2015).

3 Semantic Information in Annotated Data

Judgment        nsubj    dobj   npred   other
Process         ACT      RSLT   -       -
Performance     PERFR    PERF   QLF     -
Actantial       ACT      PAT    -       BEN
Experience      EXPR     EXP    -       BEN
Comunic.        EMT      CTNT   -       RCPT
Definition      DFND     -      DFNS    CNCOP
Chang.idnt      DFND     -      DFNS    CNCOP
Characteriz.    CTNT     -      QLF     CNCOP
Existence       QEXIST   -      -       LOC, TIME

Table 1: The semantic core dependencies in relation to the type of judgment.

The syntactic annotation in the classic UAIC format (originally created with the intention to serve pedagogical purposes) contains 14 types of circumstantial modifiers: c.c.conc. (concession), c.c.cond. (condition), c.c.cons. (consecutive), c.c.cumul. (cumulative), c.c.cz. (causal), c.c.exc. (exception), c.c.instr. (instrumental), c.c.l. (local), c.c.m. (modal), c.c.opoz. (opposition), c.c.rel. (relative, referential), c.c.scop. (purpose), c.c.soc. (associative), and c.c.t. (temporal). For these modifiers, we used the semantic tags: CNCS, COND, CSQ, CUMUL, CAUS, EXCP, INSTR, LOC, MOD, OPPOS, REFR, PURP, ASSOC, TEMP. The modal modifier is the only ambiguous one among these 14 circumstantial modifiers; it can have several values, such as: Comparative, Intensifier, Restrictive, Iterative, Privative, Qualifier, Quantitative, for which we used the tags: COMP, INTNS, ITER, PRV, RESTR, QLF, QNT, manually annotated for the moment. The other syntactic relations with semantic meaning are, in the UAIC convention: voc. (vocative, addressee), ap. (apposition, resumption), c.ag. (agent complement), incid. (incident), neg. (negative). For these, we used the tags: ADDR, RSMP, ACT (the same tag as for the active agent), INCID, QNEG (one of the quantifiers in our system). All these tags can be replaced automatically (a minimal sketch of such a rule-based replacement is given after the examples below). However, these values are not necessarily related to verb subordination. There may be nouns from the semantic sphere of these notions, or derived from verbs and having such subordinate semantic values. From a syntactic point of view, they are hidden under ambiguous tags, such as noun modifiers. These cases are also manually annotated. Examples:

• Bani pentru excursie ”Money for the trip” is a nominal modifier with a purpose meaning;

• Casa de acolo ”The house there” is an adverbial modifier of a noun, with a local meaning;

• Generaţia de mâine ”The generation of tomorrow” is an adverbial modifier of a noun having a temporal meaning.
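As a minimal illustration of the non-ambiguous replacements described above (Treeops' actual rule syntax and the UAIC XML schema are not shown here, so the toy format below is an assumption), consider this sketch:

```python
import xml.etree.ElementTree as ET

# Non-ambiguous syntactic tag -> semantic tag, following the lists above.
RULES = {"c.c.cz.": "CAUS", "c.c.t.": "TEMP", "c.c.l.": "LOC",
         "c.c.scop.": "PURP", "voc.": "ADDR", "c.ag.": "ACT"}

def convert(tree):
    """Rewrite every word whose syntactic relation has a unique semantic value."""
    for word in tree.iter("word"):
        deprel = word.get("deprel")
        if deprel in RULES:
            word.set("semrel", RULES[deprel])
    return tree

root = ET.fromstring('<s><word form="fiindcă" deprel="c.c.cz."/></s>')
convert(ET.ElementTree(root))
print(ET.tostring(root, encoding="unicode"))
# -> <s><word form="fiindcă" deprel="c.c.cz." semrel="CAUS" /></s>
```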

Therefore, we can have syntactic tags containing non-ambiguous semantic information and other tags that are not related to any particular semantic information, i.e., they are semantically ambiguous. Our intention is to use a statistical parser to annotate the words with such semantically ambiguous syntactic relations, after we obtain a sufficiently large training corpus by means of the manual annotations. The most ambiguous are the core elements of the clause, and for interpreting them, we propose a table of rules and the roles which each type presupposes or admits. The table can be completed with other types if necessary. Our rules are not similar to frames, because they also take into consideration the sentences that do not describe an event, but an affirmation of the existence of some things, an identification, a description of a state, acts of speech, etc. (see Table 1). Our treebank is annotated and supervised on a multilayer basis. Therefore, we can use the semantic information contained in the fine and correct morphological annotation of the treebank. The type of the pronouns and pronominal determiners is semantically established: for the possessives, the semantic value is appurtenance (APP); for the demonstratives, it is deictic (DX); for the interrogatives, it is INTROG; for the negatives, it is QNEG; and for the emphatic pronoun and pronominal determiner, it is IDENT. For a restricted number of indefinite pronouns, the value can be QUNIV; ”all” and ”whichever” are universal quantifiers, and the rest of the indefinites have the semantic value UNCTN (uncertain). Articles, which come from pronouns, have the same semantic values as these: DX (deictic) for the demonstrative article, APP (appurtenance) for the possessive article, and DEF, UNDEF (defined/undefined) for the determinative articles. The reflexive pronoun can have a restricted number of values, depending on the verb which bears this mark; they indicate its possible patterns: impersonal, passive, dynamic, reciprocal or continuant, with the semantic tags IMPRS, PASS, DYN, RCPR, CTNU. Interjections also have a restricted number of semantic values, in accordance with their word form: affect, alert, imitation, imperative, with the tags AFF, ALRT, IMIT, IMPER. As can be seen, we do not intend to annotate specific entities, such as the ones in information retrieval programs, but semantic categories of great generality and logical connectors or quantifiers. There are similarities with the role-based models, but we extend this to all the components of the sentence; in addition, judgments are not necessarily seen as events. Our purpose is to build a pattern dictionary of Romanian verbs (PDRoV), taking into consideration the syntactic relations required or allowed by each verb (an illustrative sketch of such a pattern entry is given after the examples below). The dictionary will be linked to RoWN (the WordNet for Romanian), and it will take from it the most particular semantic values for the dependencies. Verbal dependencies cannot be easily separated into optional and obligatory ones; for some languages, such as Romanian, the presence of the subject in the clause is optional. For some verbs, the presence of local, temporal or quantitative modifiers is mandatory. Examples:

• to go to Prague (we cannot say to go without showing the target of the movement).

• The session lasted three hours (or a long time, but not without a temporal determiner).

• The truck weighs 4 tons (or a lot, but not without a quantitative determiner).

Several types of information annotated in the semantic format are closer to the pragmatic layer, establishing relations between the participants in the communication act in a given situation. Interjections, together with deictics, first- and second-person pronouns, some adverbs, and punctuation, link the semantic and the pragmatic levels. Punctuation has different semantic values at the end of the sentence than inside it. In final position, the dot/full stop marks only the end of the communication, while the exclamation and question marks indicate both the end and the exclamatory or interrogative form; the semantic tags for these values are END, INTROG, and EXCL. Inside the sentence, the comma can be a mark of coordination, being a CNCONJ just like the coordinating conjunctions. The comma can also mark the introduction of an explanatory sequence, or a topic different from the natural one when constructions are dislocated; the tags are ELAB and DISL. Many punctuation elements are used to isolate incident constructions: these are non-appurtenance marks (NOAPP). Other punctuation marks, such as inverted commas, parentheses, and dashes, indicate the limits of a text introduced into another text, and we have used a single semantic tag for all of them: QUOT. We have also found semantic values in the tense and modality of verbs. Some conjunctions or prepositions are specialized for a semantic value: fiindcă ”because” (CAUS), pentru ”for” (PURP), etc.

4 Logical-Semantic System

In the UAIC treebank, the relations between clauses are marked with the same labels as those of the words that fulfill the same roles. For a subordinate clause, the tag is annotated on the relation between its predicate and the predicate of the head clause, but it stands for a relationship of the whole subtree. Example:

• Persoanele atente pot învăţa. ”Mindful people can learn.” Persoanele ”people” has the syntactic relation sbj. (and the semantic relation PERFR), subordinated to the root învăţa ”learn”.

• Cine are urechi de auzit, poate învăţa. ”Whoever has ears to hear can learn.” The same tags mark the relation of are ”has”, and of the whole subtree above the comma, also subordinated to the root învăţa ”learn”.

Our trees are not single clauses but long sentences, so their construction can be likened to a logical expression consisting of full-meaning elements and operators connecting the clauses, and the truth value of the whole sentence can be calculated from the truth values of the component clauses. Our system has six connectors: the conjunctive connector (which resembles the logical conjunction), the disjunctive connector (which shows that the clauses exclude each other), the adversative (opposing) connector (which shows that the clauses are opposed without being mutually exclusive), the conclusive connector (which resembles the logical relation of implication), the dependence (subordination) connector, and the copulative connector. The last usually marks a relationship of equivalence between the subject and the predicative name. The connectors have the following semantic tags: CNCONJ, CNDISJ, CNADVS, CNCNCL, CNSBRD, CNCOP.

In the UAIC syntactic system, the relational words are included among the connected elements, being subordinated to the first connected element and simultaneously heading the second one. In the UAIC semantic system, we subordinate them to the second element of the relationship, to emphasize the words with full meaning and, especially, to conform to most international annotation systems.

Connectors are operators that indicate a relation between two elements. Other operators apply to a single element, and we call them quantifiers. They form judgments of a general character, which apply to a whole set of elements (universal quantifiers), or judgments that apply to at least one element (the existential quantifier). Other quantifiers modulate the truth value, giving it a necessary, possible, or impossible character (the last with negative polarity). The semantic tags used for quantifiers are QUNIV, QEXIST, QNECES, QPOSIB, and QNEG. Examples:

• Logical computing with a dyadic operator (connector): El va trece testele sau va fi eliminat din competiţie. ”He will pass the tests or he will be eliminated from the competition.” Sau ”or” is a connector for disjunction (CNDISJ). The expression has the truth value 1 (true) if exactly one of the two clauses is affirmed and the other is denied; the expression obtained by the affirmation of both clauses, or by the negation of both, has the truth value 0 (false). ”And” is a connector for reunion (conjunction), and an expression formed with ”and” shows that both related clauses have the same truth value: He will pass the tests and he will be eliminated = 0; He will not pass the tests and he will not be eliminated = 0.

• Logical computing with a monadic operator (quantifier): Trebuie să trec acest test. ”I must pass this test.” Nu este posibil să nu trec acest test. ”It is not possible for me not to pass this test.” The quantifier of necessity (QNECES) is equivalent to the negation (QNEG) of the quantifier of possibility (QPOSIB) applied to the negation of the sentence modulated as necessary: QNECES(to pass this test) is equivalent to QNEG(QPOSIB(QNEG(to pass this test))), i.e. ”It is necessary that I pass the test” is equivalent to ”It is not possible that I do not pass the test.”
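Both computations can be written compactly in standard logical notation (our formalization sketch, not the authors' own; p and q stand for ”he passes the tests” and ”he is eliminated”, $\oplus$ renders the exclusive reading of sau, and $\Box$, $\Diamond$, $\neg$ render QNECES, QPOSIB, QNEG):

\[
\begin{array}{cc|c}
p & q & p \oplus q \\
\hline
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 0 \\
0 & 0 & 0
\end{array}
\qquad\qquad
\Box p \;\equiv\; \neg\,\Diamond\,\neg p
\]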

5 Treeops - A Tool for Converting the UAIC-Syntactic Format into the UAIC-Semantic Format

All non-ambiguous transformations are done automatically using a tool called Treeops, a rule-based XML transformer: given an XML document and a customized set of rules, it produces a new XML structure. The process is similar to an eXtensible Stylesheet Language Transformation (XSLT)1: the set of rules is a function that takes an XML structure as input and produces another XML structure. A non-ambiguous transformation rule can be formulated as an if-then statement:

if (condition) then action

During a transformation, the XML is traversed node by node and each Treeops rule is converted into an if-then statement:

if (selector matches node) then action

Treeops requires the selector to be an XML Path Language (XPath) expression2. The action must be internally defined and takes parameters; for example, changeAttrValue() changes the value of the current XML attribute. For this reason, Treeops currently works only on the XML format, from which it takes the names of the features to be changed. In the future, the program could be extended to take XML as input and to output the result in the CoNLL-U format. Treeops itself is language-independent, while the rules are formulated according to the language of the document, and the result will be in the language required by the rules (which may differ from that of the input). For example, the rule

//word[@deprel=’superl.’]/@deprel => changeAttrValue(’SUPER’)

becomes an XSLT template in which changeAttrValue is a pre-defined template. This is a rule with a single condition for transforming a UAIC syntactic tag into a semantic one: it changes the syntactic relation superl. into the semantic tag SUPER (superlative). This type of rule is used to change the syntactic tags of 13 types of circumstantial complement (all except c.c.m., which is semantically ambiguous), and also for the relations vocative, comparative, subordination, agent complement, negation, and apposition.

1 https://www.w3.org/TR/xslt
2 https://www.w3.org/TR/1999/REC-xpath-19991116/
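The generated XSLT itself is not reproduced here; purely as an illustration, a rule of this kind could compile to a stylesheet fragment along the following lines. This is a sketch under our own assumptions (an identity transform plus an attribute-rewriting template); it is not Treeops' actual generated code:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: copy every node and attribute unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- the compiled rule: rewrite deprel='superl.' to the semantic tag SUPER -->
  <xsl:template match="word/@deprel[. = 'superl.']">
    <xsl:attribute name="deprel">SUPER</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

Run with any XSLT 1.0 processor, such a stylesheet leaves the tree untouched except for the one attribute value, which is exactly the behavior the rule notation describes.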

There are other rules, with multiple conditions, for transforming non-ambiguous syntactic tags into semantic tags. Example:

//word[@deprel=’coord.’ and (@lemma=’sau’ or @lemma=’ori’ or @lemma=’ci’)]/@deprel => changeAttrValue(’CNDISJ’)

This rule changes the syntactic coordination into the logical-semantic tag for the relation of disjunction, taking into account the conjunctions sau, ori, ci ”or”. There are also more complex rules that transform the tree structure itself: the relational elements are heads in the UAIC syntactic input, while in the semantic output they are subordinated to the meaningful words. Example:

//word[@deprel=’narativ.’ and @head=../word/@id]/@head => (@head <- $n/../word[@id=$n/@head]/@head)
//word[@id=../word[@deprel=’narativ.’]/@head]/@head => (@head <- $n/../word[@deprel=’narativ.’]/@id)
//word[@deprel=’narativ.’ and @head=../word/@id]/@deprel => (@deprel <- $n/../word[@id=$n/@head]/@deprel)

This transformation rule changes the syntactic relation narativ. In the UAIC convention, narrative connectors are treated as textual ones: they are roots of the sentence, have no incoming relation, and subordinate the principal verb of the sentence, which carries the narrative relationship. The rule above reverses the direction of the relationship, i.e. the head becomes subordinated and vice versa: it creates a relation for the narrative conjunction and deletes the relation of its former head, which becomes the root.

A semantically ambiguous syntactic relation does not mean that the sentence has several interpretations, but that the same general or morphologically defined syntactic relation (e.g., nominal modifier) can be transposed into a large number of less general semantic relations. The table in the appendix contains the correspondences between the UD syntactic tags, the UAIC syntactic tags, and the semantic tags of the formats described here. Its 214 rows do not mean that the system has 214 semantic tags, but that there are 214 combinations of the 45 UAIC syntactic tags, the 53 tags of the Romanian-specific subclassifications in the UD syntactic system, and the 96 semantic tags. Empty cells in the table of judgment types are marked with a dash because no specific syntactic or semantic relation corresponds to that position in that type of judgment; for example, there is no direct object if the type of judgment requires a predicative name.

Examples from the table: on row 199, column 3, we have the sbj. tag, which annotates the subject in the UAIC syntactic convention. On row 199, column 2, four values correspond to it in the UD syntactic convention, those for a nominal or clausal subject, each either active or passive: nsubj, csubj, nsubj:pass, csubj:pass. The tags of row 199, repeated on rows 200–211 of the first three columns, correspond to 13 possible semantic tags in the fourth column, which demonstrates that the subject is a semantically ambiguous syntactic relation. Quotation marks (”) in the table mark the repetition of the previous row.

The Treeops program was used both to obtain an automatically semi-transformed variant of the semantic format (5,566 sentences have been completely manually transformed into the semantic format) and, with another set of rules, to transform the treebank from the XML-UAIC syntactic format into the XML-UD syntactic format. Another program performed the transformation from XML-UD into CoNLL-U UD v2, the format required to introduce the first 1,200 sentences into UD, under the name UD-Romanian Nonstandard. Currently, the UAIC treebank has 18,000 sentences; besides the 4,000 introduced earlier and the 1,200 introduced now, 12,800 remain to be added in upcoming releases.

6 Applications

These annotations have so far been applied to 5,200 sentences, most of them from the four New Testament Gospels of 1648, the first printed in Romanian, in Cyrillic letters, which were obtained with an Optical Character Recognizer (OCR) built at the Institute of Mathematics and Computer Science of Chisinau (Colesnicov et al., 2016). Various kinds of research can be carried out on this corpus, for example on incident texts nested in one another. Example: Iară Iisus zise ucenicilor Săi: Adevăr zic voao: anevoe va întra bogatul întru Împărăţiia Ceriurelor. ”And Jesus said unto his disciples, Verily I say unto you: the rich shall scarcely enter the Kingdom of the Heavens.” In this example, the first part, up to the first colon, has the evangelist Matthew as emitter (EMT) and the reader as receiver (RCPT); Jesus and the disciples are designated here by the third person. The second text, introduced by the verbum dicendi, has Jesus as emitter and the disciples as receivers; here Jesus is designated by the first person and the disciples by the second. The third text, introduced in the second one by another verbum dicendi and another colon, is the content (CTNT) of Jesus' teaching, a general judgment that refers neither to himself nor to the disciples, but to a generic character, the rich man, designated by the third person. The structure of roles in the three texts is different, as the selection of verbal persons shows.

Another study can analyze the parabolic levels. Here, fully meaningful words are replaced by completely different ones while the connectors are preserved; we obtain two isomorphic parallel stories, a surface one and another containing the meaning. Example: Omul samănă sămânţă bună în holda sa. Veni duşmanul şi sămănă între grâu neghini. La vreamea secerişului, stăpânul va porunci secerătorilor: Culeageţi întâi neghinele ca să arză, iară grâul strângeţi în şura mea. ”Man sows good seed in his field. The enemy comes and sows the tares among the wheat. At the time of the harvest, the owner will command the reapers: First reap the tares to burn, and gather the wheat in my barn.”

Key:
omul = Fiul omenesc ”man = Human Son”
holda = lumea ”the field = the world”
sămânţă bună = cei drepţi ”good seed = the righteous”
neghinele = cei nedrepţi ”the tares = the unrighteous”
duşmanul = diavolul ”enemy = devil”
vremea secerişului = sfârşenia veacului ”the time of the harvest = the end of the world”
secerătorii = îngerii ”reapers = angels”
să arză = să arză în focul veşnic ”to burn = to burn in the Eternal Fire”
şura mea = Împărăţia Ceriului ”my barn = the Kingdom of the Heaven”

Replacement: Fiul omenesc samănă pe cei drepţi în lume. Veni diavolul şi sămănă între cei drepţi pe acei nedrepţi. La sfârşenia veacului, Fiul omenesc va porunci îngerilor: Culegeţi întâi pe cei nedrepţi ca să arză în Focul veşnic, iară pe cei drepţi strângeţi în Împărăţia Cerului. ”The Human Son sows the righteous in the world. The devil comes and sows the unrighteous among the righteous. At the end of the world, the Human Son will command the angels: First reap the unrighteous to burn in the Eternal Fire, and gather the righteous in the Kingdom of Heaven.”

Figure 1: The parabolic original text.
Figure 2: The text obtained by replacing the words with their parabolic key.

In Figures 1 and 2 it can be seen that although the semantic contents of the nodes change, the basic structure remains the same.

7 Conclusion and Future Work

In this paper we discussed the transformation of the UAIC RoDia-Dep-Treebank syntactic annotation into the logical-semantic annotation. This transformation is done automatically for non-ambiguous syntactic relations and manually for ambiguous ones. We also described the applications created for the annotation format transformation, and we showed examples of linguistic and pragmatic research using corpora with semantic annotation. In the future, we plan to annotate the second part of the New Testament, the Acts of the Apostles, morphologically and syntactically, and to transform the whole syntactic treebank into the new format. We also plan to train a statistical parser on this corpus, in order to handle the semantically ambiguous syntactic relations.

References

Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2003. The Prague Dependency Treebank: A Three-Level Annotation Scenario. Text, Speech and Language Technology. Springer, Prague.

Claire Bonial, Julia Bonn, Kathryn Conger, Jena D. Hwang, and Martha Palmer. 2014. PropBank: Semantics of New Predicate Types. In Proceedings of LREC, pages 3013–3019.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.

Alexandru Colesnicov, Ludmila Malahov, and Tudor Bumbu. 2016. Digitization of Romanian Printed Texts of the 17th Century. In Proceedings of the 12th International Conference Linguistic Resources and Tools for Processing the Romanian Language, pages 1–11. Alexandru Ioan Cuza University Press.

Ferdinand de Saussure. 1916. Cours de linguistique générale. Payot, Paris.

Ioachim Druguş. 2015. Metalingua: A Metalanguage for the Semantic Annotation of Natural Languages. In Proceedings of the 11th International Conference Linguistic Resources and Tools for Processing the Romanian Language, pages 79–94. Alexandru Ioan Cuza University Press.

Charles J. Fillmore. 1968. The Case for Case. In Universals in Linguistic Theory, pages 1–88. Holt, Rinehart, and Winston.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extensive Classifications of English Verbs. In Proceedings of the 12th EURALEX International Congress.

Cătălina Mărănduc, Monica Mihaela Rizea, and Dan Cristea. 2017. Mapping Dependency Relations onto Semantic Categories. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing).

Igor Mel'čuk. 1988. Dependency Syntax: Theory and Practice. SUNY Press, Albany, NY.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Academia, Prague, and Reidel, Dordrecht.

Lei Shi and Rada Mihalcea. 2005. Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing. In Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico.

Kiril Simov and Petya Osenova. 2011. Towards Minimal Recursion Semantics over Bulgarian Dependency Parsing. In Proceedings of the RANLP 2011 Conference, Hissar, Bulgaria.

Zhiyi Song, Ann Bies, Stephanie Strassel, Tom Riese, Justin Mott, Joe Ellis, Jonathan Wright, Seth Kulick, Neville Ryant, and Xiaoyi Ma. 2015. From Light to Rich ERE: Annotation of Entities, Relations, and Events. In Proceedings of the 14th Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Pasi Tapanainen and Timo Järvinen. 1998. Towards an Implementable Dependency Grammar. In Proceedings of the CoLing-ACL'98 Workshop on Processing of Dependency-Based Grammars, Montreal.

Lucien Tesnière. 1959. Éléments de syntaxe structurale. Klincksieck, Paris.

Diana Trandabăţ. 2010. Natural Language Processing Using Semantic Frames. PhD thesis, Faculty of Computer Science, Al. I. Cuza University, Iaşi.

A Table of semantic tags, their explanation, and their correspondence with UAIC and UD syntactic tags

Nr.crt.  UD syntactic                           UAIC syntactic       UAIC semantic  Explanation
1        amod, det, nummod                      a.adj.               COMP           Comparative
2        ”                                      ”                    QEXIST         Quantifier:existential
3        ”                                      ”                    QUNIV          Quantifier:universal
4        ”                                      ”                    DX             Deictic
5        ”                                      ”                    IDENT          Identifier
6        ”                                      ”                    INTROG         Interrogative
7        ”                                      ”                    QNEG           Quantifier:negative
8        ”                                      ”                    APP            Appurtenance
9        ”                                      ”                    QLF            Qualifier
10       ”                                      ”                    QNT            Quantity
11       ”                                      ”                    UNCTN          Uncertain
12       advmod                                 a.adv.               LOC            Local
13       ”                                      ”                    MOD            Modal
14       ”                                      ”                    PRV            Privative
15       ”                                      ”                    RESTR          Restrictive
16       ”                                      ”                    ITER           Iterative
17       ”                                      ”                    TEMP           Temporal
18       appos                                  ap.                  RSMP           Resumption
19       nmod                                   a.pron.              QUNIV          Quantifier:universal
20       ”                                      ”                    QEXIST         Quantifier:existential
21       ”                                      ”                    DX             Deictic
22       ”                                      ”                    IDENT          Identifier
23       ”                                      ”                    INTROG         Interrogative
24       ”                                      ”                    QNEG           Quantifier:negative
25       ”                                      ”                    APP            Appurtenance
26       ”                                      ”                    UNCTN          Uncertain
27       nmod                                   a.subst.             ASSOC          Associative
28       ”                                      ”                    CAUS           Causative
29       ”                                      ”                    CNCS           Concessive
30       ”                                      ”                    COND           Conditional
31       ”                                      ”                    CSQ            Consequence
32       ”                                      ”                    CUMUL          Cumulative
33       ”                                      ”                    DFNS           Definiens
34       ”                                      ”                    EXCP           Exception
35       ”                                      ”                    INSTR          Instrumental
36       ”                                      ”                    LOC            Local
37       ”                                      ”                    MOD            Modal
38       ”                                      ”                    OPPOS          Opposite
39       ”                                      ”                    PARS           Pars
40       ”                                      ”                    APP            Appurtenance
41       ”                                      ”                    POLIT          Politeness
42       ”                                      ”                    PRV            Privative
43       ”                                      ”                    PURP           Purpose
44       ”                                      ”                    REFR           Reference
45       ”                                      ”                    TEMP           Temporal
46       aux                                    aux.                 ABIL           Ability
47       ”                                      ”                    FTR            Future
48       ”                                      ”                    OPTV           Optative
49       aux:pass                               ”                    PASS           Passive
50       aux                                    ”                    PAST           Past
51       ”                                      ”                    POTN           Potentiality
52       acl                                    a.vb.                ASSOC          Associative
53       ”                                      ”                    CAUS           Causative
54       ”                                      ”                    CNCS           Concession
55       acl                                    a.vb.                COND           Condition
56       ”                                      ”                    CSQ            Consequence
57       ”                                      ”                    CUMUL          Cumulative
58       ”                                      ”                    DFNS           Definiens
59       ”                                      ”                    EXCP           Exception
60       ”                                      ”                    INSTR          Instrumental
61       ”                                      ”                    LOC            Local
62       ”                                      ”                    MOD            Modal
63       ”                                      ”                    OPPOS          Opposite
64       ”                                      ”                    PARTV          Partitive
65       ”                                      ”                    RESTR          Restrictive
66       ”                                      ”                    PAT            Patient
67       ”                                      ”                    POLIT          Politeness
68       ”                                      ”                    APP            Appurtenance
69       ”                                      ”                    PRV            Privative
70       ”                                      ”                    PURP           Purpose
71       ”                                      ”                    QLF            Qualifier
72       ”                                      ”                    REFR           Reference
73       ”                                      ”                    TEMP           Temporal
74       ”                                      ”                    ASSOC          Associative
75       ”                                      ”                    CAUS           Causative
76       nmod:agent                             c.ag.                ACT            Actant, Agent
77       obl, advmod, advcl                     c.c.conc.            CNCS           Concession
78       ”                                      c.c.cond.            COND           Condition
79       ”                                      c.c.cons.            CSQ            Consequence
80       ”                                      c.c.cumul.           CUMUL          Cumulative
81       ”                                      c.c.cz.              CAUS           Causative
82       ”                                      c.c.exc.             EXCP           Exception
83       ”                                      c.c.instr.           INSTR          Instrumental
84       ”                                      c.c.l.               LOC            Local
85       ”                                      c.c.m.               MOD            Modal
86       ”                                      ”                    INTNS          Intensifier
87       ”                                      ”                    ITER           Iterative
88       ”                                      ”                    PRV            Privative
89       ”                                      ”                    RESTR          Restrictive
90       ”                                      ”                    MOD            Modal
91       ”                                      ”                    QLF            Qualifier
92       ”                                      ”                    QNT            Quantity
93       ”                                      c.c.opoz.            OPPOS          Opposite
94       ”                                      c.c.rel.             REFR           Reference
95       ”                                      c.c.scop.            PURP           Purpose
96       ”                                      c.c.soc.             ASSOC          Associative
97       nmod:tmod, advmod:tmod, advcl:tcl      c.c.t.               TEMP           Temporal
98       obj, expl, ccomp                       c.d.                 QPOSIB         Quantifier:possibility
99       ”                                      ”                    BEN            Beneficiary
100      ”                                      ”                    CTNT           Content
101      ”                                      ”                    EXP            Experience
102      ”                                      ”                    GREET          Greeting
103      ”                                      ”                    INSTR          Instrumental
104      ”                                      ”                    APP            Appurtenance
105      ”                                      ”                    PURP           Purpose
106      ”                                      ”                    OBJ            Object
107      ”                                      ”                    RSLT           Result
108      ”                                      ”                    PAT            Patient
109      iobj, expl, xcomp                      c.i.                 PERF           Performance
110      ”                                      ”                    RSLT           Result
111      ”                                      ”                    BEN            Beneficiary
112      ”                                      ”                    EXPR           Experiencer
113      ”                                      ”                    RCPT           Receiver, Recipient
114      ”                                      ”                    APP            Appurtenance
115      advmod                                 comp.                COMP           Comparative
116      cc, conj                               coord.               CNCNCL         Connect:conclusion
117      ”                                      ”                    CNDISJ         Connect:disjunction
118      orphan                                 -                    EQVH           Ellipse, equivalent with the head
119      ”                                      ”                    EQVHP          Equivalent with the head, but positive
120      ”                                      ”                    EQVHZ          Equivalent with the head, but negative
121      cc, conj                               ”                    CNADVS         Connect:adversative
122      ”                                      ”                    CNCONJ         Connect:reunion
123      cop                                    -                    CNCOP          Connect:copulative
124      nmod:pmod                              c.prep.              ASSOC          Associative
125      ”                                      ”                    BLAM           Blame
126      ”                                      ”                    BEN            Beneficiary
127      ”                                      ”                    CAUS           Causative
128      ”                                      ”                    CNCS           Concession
129      ”                                      ”                    COND           Condition
130      ”                                      ”                    CSQ            Consequence
131      ”                                      ”                    CTNT           Content
132      ”                                      ”                    CUMUL          Cumulative
133      ”                                      ”                    EQVL           Equivalent
134      ”                                      ”                    EXCP           Exception
135      ”                                      ”                    EXP            Experience
137      ”                                      ”                    OPPOS          Opposite
138      ”                                      ”                    PURP           Purpose
139      ”                                      ”                    RCPT           Recipient, Receiver
140      ”                                      ”                    REFR           Reference
141      det                                    det.                 UNDEF          Undefined
142      ”                                      ”                    DEF            Defined
143      ”                                      ”                    DX             Deictic
144      ”                                      ”                    APP            Appurtenance
145      xcomp                                  el.pred.             QNT            Quantity
146      ”                                      ”                    UNCTN          Uncertain
147      ”                                      ”                    DFND           Definiendum
148      ”                                      ”                    EQVL           Equivalent
149      ”                                      ”                    EXPR           Experiencer
150      ”                                      ”                    IDENT          Identifier
151      ”                                      ”                    PERF           Performance
152      ”                                      ”                    APP            Appurtenance
153      ”                                      ”                    RESTR          Restrictive
154      ”                                      ”                    RSLT           Result
155      ”                                      ”                    QLF            Qualifier
156      expl                                   -                    EXPL:APP       Expletive:appurtenance
157      ”                                      ”                    EXPL:BEN       Expletive:beneficiary
158      ”                                      ”                    EXPL:EXP       Expletive:experience
159      ”                                      ”                    EXPL:EXPR      Expletive:experiencer
160      ”                                      ”                    EXPL:OBJ       Expletive:object
161      ”                                      ”                    EXPL:DFND      Expletive:definiendum
162      ”                                      ”                    EXPL:PAT       Expletive:patient
163      ”                                      ”                    EXPL:RCPT      Expletive:receiver
164      parataxis                              incid.               INCID          Incident
165      discourse                              interj.              AFF            Affect
166      ”                                      ”                    ALRT           Alert
167      ”                                      ”                    IMIT           Imitation
168      ”                                      ”                    IMPER          Imperative
169      cc                                     narativ.             CNCNCL         Connect:conclusion
170      mark                                   ”                    CNSBRD         Connect:subordination
171      cc                                     ”                    CNDISJ         Connect:disjunction
172      ”                                      ”                    CNADVS         Connect:adversative
173      ”                                      ”                    CNCONJ         Connect:reunion
174      -                                      n.pred.              RSMP           Apposition
175      ”                                      ”                    EMT            Emitter
176      ”                                      ”                    DFNS           Definiens
177      ”                                      ”                    EXP            Experience
178      ”                                      ”                    IDENT          Identifier
179      ”                                      ”                    APP            Appurtenance
180      ”                                      ”                    PRV            Privative
181      ”                                      ”                    QLF            Qualifier
182      mark                                   part.                GNR            Generic
183      ”                                      ”                    GREET          Greeting
184      ”                                      ”                    POTN           Potentiality
185      ”                                      ”                    IMPER          Imperative
186      punct                                  punct. (non-final)   DISL           Dislocation
187      ”                                      ”                    QUOT           Quotation
188      ”                                      ”                    NOAPP          Non-appurtenance
189      ”                                      ”                    CNCONJ         Connect:reunion
190      ”                                      ”                    ELAB           Elaboration
191      ”                                      punct. (final)       END            End
192      ”                                      ”                    EXCL           Exclamation
193      ”                                      ”                    INTROG         Interrogative
194      expl:pv, expl:poss                     refl.                CTNU           Continuant
195      ”                                      ”                    DYN            Dynamic
196      ”                                      ”                    RCPR           Reciprocal
197      expl:pass                              ”                    PASS           Passive
198      expl:impers                            ”                    IMPRS          Impersonal
199      nsubj, csubj, nsubj:pass, csubj:pass   sbj.                 ACT            Actant, Agent
200      ”                                      ”                    PERFR          Performer
201      ”                                      ”                    PERF           Performance
202      ”                                      ”                    DFND           Definiendum
203      ”                                      ”                    EMT            Emitter
204      ”                                      ”                    QEXIST         Quantifier:existence
205      ”                                      ”                    QUNIV          Quantifier:universal
206      ”                                      ”                    QUPOSIB        Quantifier:possibility
207      ”                                      ”                    QUNECES        Quantifier:necessity
208      ”                                      ”                    EXPR           Experiencer
209      ”                                      ”                    EXP            Experience
210      ”                                      ”                    PAT            Patient
211      ”                                      ”                    RCPT           Receiver
212      mark                                   subord.              CNSBRD         Connect:subordination
213      advmod                                 superl.              SUPER          Superlative
214      vocative                               voc.                 ADDR           Addressee

Multi-word annotation in syntactic treebanks
Propositions for Universal Dependencies

Sylvain Kahane, Université Paris Nanterre, Modyco (CNRS)
Marine Courtin and Kim Gerdes, Sorbonne Nouvelle, ILPGA, LPP (CNRS)

Abstract

This paper discusses how to analyze syntactically irregular expressions in a syntactic treebank. We distinguish such Multi-Word Expressions (MWEs) from comparable non-compositional expressions, i.e. idioms. A solution is proposed in the framework of Universal Dependencies (UD). We further discuss the case of functional MWEs, which are particularly problematic in UD.

1 Introduction

In every linguistic annotation project, the delimitation of the lower and upper boundaries of the annotation units constitutes a basic challenge. In syntactic annotation, the lower boundaries are between morphology and syntax, the upper boundaries between syntax and discourse organization. This paper discusses the lower boundaries in syntactic treebank development. We place our analysis in the Universal Dependencies framework (UD), which constitutes a large community of more than 100 teams around the globe (Nivre et al. 2016).

In this paper, we want to discuss the problem caused by idioms in syntactic annotation. The literature on idioms and MWEs is immense (Fillmore et al. 1988, Mel'čuk 1998, Sag et al. 2002, etc.). Our goal is not to mark the extension of MWEs on top of the syntactic annotation (see Savary et al. 2017 for a recent proposition). Our purpose is to tackle the impact of idiomaticity on the syntactic annotation itself. Most idioms (such as kick the bucket or green card) do not cause any trouble for the syntactic annotation, because their internal syntactic structure is absolutely transparent (and it is precisely because they have an internal syntax that they are idioms and not words). Some expressions, however, such as not to mention, heaven knows who, by and large, Rio de la Plata (in English), are problematic for a syntactic annotation, because they do not perfectly respect the syntactic rules of free expressions. We propose two contributions:

• For a coherent annotation it is crucial to distinguish syntactically irregular structures from semantically non-compositional units. These notions are highly correlated but distinct, and we propose criteria to distinguish them.

• We explore different ways of annotating these two kinds of Multi-Word Expressions and their combinations in a syntactic treebank, with a special focus on functional MWEs.

Section 2 proposes a simple typology of MWEs opposing semantic compositionality and syntactic regularity. In section 3, we lay the basis of our analysis by discussing the syntactic units of a dependency annotation and point to problems in the current UD scheme (version 2.1). In section 4, we propose to analyze MWEs with an internal syntactic structure according to their level of syntactic regularity. We show how an MWE can be introduced into the current CoNLL-U format as a unit with its own POS. In section 5, we introduce two convertible dependency schemes for functional MWEs, before concluding in section 6 with an example combining the MWE as a separate unit with the new convertible scheme for functional MWEs.

Keywords: dependency, annotation, syntax, UD, MWE, idioms

2 Idioms and syntactic irregularity

We distinguish idiomatic expressions from syntactically irregular constructions. Idiomaticity is a semantic notion, and semantics has to be annotated apart from syntax.

Even if it is not our purpose to define idiomaticity here, let us give some thought to the matter. Following Fillmore 1988 (with his encoding and decoding idioms) or Mel'čuk 1998 (with his phraseme and collocation), we distinguish two levels of non-compositionality. We adopt the point of view of encoding: “Compositionality […] is to be distinguished from analysability, which pertains instead to the extent to which speakers are cognizant […] of the contribution that individual component structures make to the composite whole.” (cf. Langacker 1987:457). An MWE is an idiom (i.e. non-compositional) if its components cannot be chosen individually by the speaker (kick the bucket is chosen as a whole and there is no possible commutation on its components).1 An MWE is a collocation (i.e. semi-compositional) if one of its components is chosen freely (the basis) and the other one (the collocate) is chosen according to the basis (in wide awake, wide can be suppressed and awake keeps the same contribution: awake is the basis and wide is a collocate expressing intensification with awake).

We also consider three levels of syntactic irregularity. First, natural languages contain some syntactic subsystems which do not follow the general properties of syntactic relations. For instance, most languages have particular constructions for named entities such as dates or titles. English has a regular construction N N, where the second noun is the head (pizza boy, Victoria Lake), but it also has a subsystem where the first noun is the head, used for named entities (Lake Michigan, Mount Rushmore, Fort Alamo). These subsystems are in some sense “regular irregularities”, that is, productive unusual constructions. Similarly, English produces a high number of multi-word adverbs from a preposition and a bare noun, as in on top (of) or in case (of), thus forming another sub-system that does not conform to the typical syntactic system of English.

Second, languages have non-productive irregular constructions. Most of these irregular constructions are idioms, but some are compositional. This is the case of Fr. peser lourd ‘weigh a lot/be significant’, lit. weigh heavy, where lourd is an adjective that commutes only with NPs (peser une tonne ‘weigh one ton’).2 Even the commutation with its antonym léger ‘light’ is impossible. Another example is Fr. cucul la praline ‘very silly’, lit. silly the praline. It is a collocation: the adjective cucul can be used alone and the NP la praline is an intensifier. The POSs of the units are clear, and the dependency structure can be reconstructed, but it is unusual to have an NP modifying an adjective. We consider four cases of non-productive irregular constructions.

a. Structures with a clear POS and dependency structure, but that function as a whole differently than their syntactic head: the coordinating conjunction not to mention, headed by a verb (they gave us their knowledge, not to mention their helpfulness); the adjective top of the range, headed by a noun (as in a very top of the range restaurant); the French pronoun Dieu sait quoi ‘heaven knows what’, headed by a verb.

b. For some sequences, the POS are clear, but the dependency structure has to be reconstructed diachronically (the Fr. pronoun n’importe quoi ‘anything’, lit. no matter what)3; or, inversely, the dependency structure is clear but the POS have to be reconstructed (the adverb by and large – by being originally an adverb).

c. Other sequences have no clear internal dependency structure at all, while the POS remain clear: each other, Fr. à qui mieux mieux ‘each trying to do better than the other’, lit. to whom better better.

d. Some sequences have neither clear POS nor an internal structure in the language of the corpus: the adjective ad hoc, the proper noun Al Qaeda, and the Fr. SCONJ parce que ‘because’.4

1 An idiom can be semantically transparent (Svensson 2008). For example, it is quite clear that a washing machine is a machine that is used to wash something, but it is an idiom because it is arbitrary that this denotes a machine for washing clothes and not a dishwasher or a high-pressure water cleaner. An idiom can even be semantically analyzable, cf. Gibbs 1994:278: “Idioms like pop the question […], spill the beans, and lay down the law are ‘decomposable’, because each component obviously contributes to the overall figurative interpretation.”

2 How the relation between peser and lourd must be analyzed in UD is not quite clear. Lourd should probably be analyzed as an xcomp of peser, but if we do that we lose the fact that lourd is in the paradigm of NPs analyzed as obj.

3 Diachronically, quoi is the subject of importe, but now it is recognized as an object due to its position.

                         Compositional                      Semi-compositional                   Non-compositional
Regular construction     Typical syntax (the dog slept)     [wide] awake, [heavy] smoker,        kick the bucket, green card, cats and dogs,
                                                            rain [cats and dogs]                 in the light (of), Fr. pomme de terre ‘potato’
Sub-system               Dates: 5th of July, tomorrow       Ludwig van Beethoven in German       on top (of), in case (of), Fr. à côté (de) ‘next (to)’;
                         morning; Titles: Miss Smith        (van is a Dutch word similar to      meaningful dates: September 11th, 4th of July;
                                                            Ger. von)                            Mount Rushmore, Fort Alamo
Irregular construction   Fr. peser lourd ‘weigh a lot’,     Fr. cucul la praline ‘very silly’,   a) not to mention, a lot (ADJ-er), top of the range,
                         lit. weigh heavy                   lit. silly the praline               Fr. Dieu sait quoi ‘heaven knows what’
                                                                                                 b) Fr. n’importe quoi ‘anything’, by and large
                                                                                                 c) each other, Fr. à qui mieux mieux ‘each trying to do
                                                                                                 better than the other’, lit. to whom better better
                                                                                                 d) ad hoc, Al Qaeda, Fr. parce que ‘because’

Table 1. Different types of MWEs

Table 1 opposes degrees of syntactic regularity in the rows and semantic compositionality in the columns. In section 4, we will propose an annotation scheme for irregular constructions and for some non-compositional sub-systems.

3 MWE in UD

3.1 MWE and tokenisation

The tokenization of UD follows the underlying principle that tokens must be words or parts of words. A priori no token contains spaces (except well-delimited cases of polysyllabic words), and therefore multi-word expressions are described syntactically and not morphologically. This is a vital choice for practical and theoretical reasons: ambiguous sequences cannot be disambiguated on a morphological level without taking into account the whole sentence. Therefore, the alternative choice of multi-word tokens containing spaces is problematic. In the manual annotation process, creating the tokenization and the syntactic analysis at the same time is time-consuming; annotating a special link for MWEs is much more user-friendly. For automatic parsing, too, tokenization as a separate task that precedes the actual dependency annotation is redundant, because both tools need a global view of the sentence – and syntactic parsers are specialized tools to do just that. Moreover, two annotations of the same sentence are harder to compare if they are based on different tokenizations; a spelling-based tokenization makes the comparison possible because it does not depend on the possibly ambiguous syntactic annotation itself.

Inversely, grouping multi-word expressions together in a syntactic annotation scheme can, in its simplest form, always be achieved by introducing special ad hoc links for multi-words into the set of relations. UD makes use of this approach with the links fixed and flat5, where no internal structure is annotated. In UD terms, we could reformulate the purpose of this paper simply as: when must the fixed relation be used?

3.2 Problems with the MWE encoding in UD

This work springs from a recognition that the treatment of functional MWEs in UD is unsatisfactory, for at least four reasons:

4 Historically, parce is the preposition par ‘through’ and the pronoun ce ‘that’, but this is not visible in today’s orthography. The attribution of a POS to parce seems arbitrary, and the French UD treebanks are subsequently incoherent: Fr-Original calls parce an ADV, Fr-Sequoia an SCONJ, and Fr-ParTUT has both versions.

5 flat is a relation used for headless constructions (such as Bill Clinton, for which it is not easy to decide which word is the head). This relation concerns productive and regular sub-systems and will not be discussed here.

1) The relation fixed is commonly used for MWEs with a very clear internal syntactic structure (see Figure 1).6

[Figure omitted: dependency trees using fixed over c’ est n’ importe quoi (PRON AUX PART VERB PRON) and in the light of reports in the media (ADP DET NOUN ADP NOUN ADP DET NOUN)]

Figure 1. Analyses with fixed in En-ParTUT and Fr-Original

When analyzing them as fixed MWEs, we flatten the structure, losing precious information in the process, which gives us fewer instances of these syntactic relations on which to train our parser (cf. Gerdes & Kahane 2016's principles, as well as the principles given on the UD introduction page). Moreover, the analysis is somewhat contradictory: if we recognize the POSs of the components (such as the verbal nature of importe in Fr. n’importe quoi ‘anything’, lit. no matter what), then we could also recognize the dependency relations that the tokens entertain.

2) Currently, the criteria deciding which constructions enter the realm of MWEs are insufficient, and we observe many discrepancies between different treebanks and even inside a single treebank. For instance, along with appears with three analyses. In En-ParTUT, along is considered the case marker of the noun phrase and with is along’s fixed dependent. On the other hand, En-Original mainly favors a compositional analysis with both along and with as case markers, but there is also one occurrence where along is a cc dependent of the noun phrase and with is along’s fixed dependent. Tables 2 and 3 give an overview of the usage of the MWE relations in the English and French UD treebanks. When comparing the highlighted lines in the English and French tables, we observe that the usage annotators make of the three MWE relations compound, fixed, and flat goes beyond what can be expected as language and genre differences, and rather seems to indicate that the annotators understood the relations differently. This is corroborated by the high inter-corpus variation for French, too. The two French treebanks Fr-FTB and Fr-Sequoia, for example, do not use compound at all. The significant number of observed incoherences in these two languages suffices to show that the UD annotation guide for MWE relations clearly deserves an overhaul, in order to achieve higher inter-language, inter-corpus, and inter-annotator consistency.

3) The POS of an MWE as a whole does not appear explicitly. The assumption made is that the MWE will have the same POS as its syntactic head, but many examples show that this is not the case. For example, not to mention is a coordinating conjunction, useful information for a syntactic parser that cannot be retrieved from the POS of its units.

6 UD’s definition of fixed refers to Sag et al. (2002) who say: “Fixed expressions are fully lexicalized and undergo neither morphosyntactic variation (cf. *in shorter) nor internal modification (cf. *in very short). As such, a simple words-with-spaces representation is sufficient. If we were to adopt a compositional account of fixed expressions, we would have to introduce a lexical entry for “words” such as hoc, resulting in overgeneration and the idiomaticity problem (see above).” Let us remark that, first, limits on modification do not imply weird lexical entries, as the example in short shows itself – the two words being in the lexicon anyhow. Second, and most importantly, an MWE can have constraints on modification for a specific meaning while still remaining transparent for the speaker, not only diachronically: in short, for example, is identifiable as a prepositional phrase, even if short is originally an adjective. This leads to multiple but syntactically constrained internal modifications of MWEs, not only in puns and journalistic style, but more generally also in ordinary coordinations and elisions, as we will see below. Note also that the current 2.0 En-Original corpus consistently annotates in short (3 occurrences) and for short (1 occurrence) as a compositional prepositional phrase (case-nmod), contrarily to Sag’s paper referenced in the annotation guide.

English                      compound   fixed     flat
En-Original                  4.38 %     0.24 %    0.73 %
En-Lines                     2.63 %     0.49 %    0.72 %
En-ParTUT                    0.40 %     0.56 %    1.24 %
total number of MWE          9194       966       1882
max freq variation
  between corpora            1107 %     43 %      59 %
total nb links               11993      1091      2625
total frequency of links     3.58 %     0.33 %    0.66 %
total nb MWE types           7067       122       1215
average nb of occurrences
  per type of MWE            1.3        7.9       1.5
non-contiguous types         292        4         0

Table 2. Measures for MWE of the English UD v2

French                       compound   fixed     flat
Fr-Original                  0.21 %     1.04 %    1.79 %
Fr-FTB                       0.00 %     8.75 %    0.70 %
Fr-ParTUT                    0.23 %     1.04 %    0.44 %
Fr-Sequoia                   0.00 %     2.56 %    1.25 %
total number of MWE          786        33190     9444
max freq variation
  between corpora            N/A        843 %     411 %
total nb links               877        55975     11858
total frequency of links     0.08 %     5.36 %    1.14 %
total nb MWE types           660        8544      7329
average nb of occurrences
  per type of MWE            1.2        3.9       1.3
non-contiguous types         24         58        0

Table 3. Measures for MWE of the French UD v2

4) The span of MWEs in the current UD scheme is questionable in some cases, especially concerning governed prepositions, which are not separated from the MWE itself (cf. of in Figure 2, below).7

4 Propositions for the encoding of MWEs in UD

All regular constructions from Table 1, including idioms, should be analyzed internally, because:

1. Such a tree is syntactically more informative than any type of flattened structure where readily available syntactic relations have been removed.

2. We can expect a higher inter-annotator agreement on the syntactic relations if the annotation of MWEs is kept independent from syntax, because of the difficulty of defining and recognizing MWEs.

3. Equally, we can expect better parsing results, because we have more instances of every relation and unknown idioms can obtain a correct parse, too.

The same holds for all compositional and semi-compositional constructions. We even go as far as proposing to analyze non-productive irregular constructions in cases a) and b) by regular syntactic relations, but for some MWEs we need means of encoding the POS of the whole expression, because its POS is not identical to its head's POS. We propose to use fixed only for the parts of c) and d) where the regular syntax does not provide appropriate syntactic relations.

In some MWEs of c) and d), some relations remain transparent, and we could annotate partial structures whenever they are available. For example à qui mieux mieux contains a clear à

7 The preposition can be repeated (According to the President and to the Secretary of State – the repetition can disambiguate the scope of the shared element in the coordination), which seems incompatible with the fixed analysis favored in the English treebanks. In other languages, such as French, the repetition is quite systematic. In English, governed prepositions are particularly cohesive with their governor, giving us what is called preposition stranding in extraction (the girl I talk to). But even in this case, nobody denies that the verb talk subcategorizes a prepositional phrase and that the preposition to is not part of the verb form. The fact that the preposition is not a part of the idiom becomes even clearer with expressions such as in front of X, where the subcategorized phrase can be suppressed (she stopped in front) or pronominalized (in its front). Note that the alternative classical dependency analysis, where prepositional phrases are governed by prepositions, results in a more coherent analysis, because the governor (the verb or the expression) always forms a subtree with the subcategorized preposition, independently of the extension given to the MWE.

For those remaining fixed relations, dependency distance measures would give more reliable results if the standard bouquet annotation (all words depending on the first token) were replaced by a series of left-to-right relations connecting each word to its neighbor, because the absence of any recognizable syntactic relation rather implies a relation of simple juxtaposition than a structure headed by the first word.

The CoNLL-U format can easily be extended to allow for a fully expressive annotation of MWEs. One solution is to devote one specific column to the idiomatic information (or, equally, to put this information into a specific attribute in the feature column of CoNLL-U). This choice does not allow embedding MWEs in one another. A better choice is to extend the current multi-word token format by adding a line for each MWE. This additional line could also include the POS of the whole expression.8 It constitutes an additional unit that can serve as a node of a semantic graph. This could be combined with a specific MWE column, or simply with a specific feature in the additional line's FEATS column that distinguishes different types of non-compositionality, following the Parseme project: for instance idioms, light-verb constructions, and named entities. In the following example, the governor of the MWE top of the range is shoe, but the head/root of the MWE is top.

[Figure omitted: dependency tree of a very top of the range shoe, with the MWE unit top of the range bearing the POS ADJ]

1     a      a      DET   _  _  7    det
2     very   very   ADV   _  _  3-6  advmod
3-6   top of the range    _  ADJ  _  _  7    amod
3     top    top    NOUN  _  _  3-6  root
4     of     of     ADP   _  _  6    case
5     the    the    DET   _  _  6    det
6     range  range  NOUN  _  _  3    nmod
7     shoe   shoe   NOUN  _  _  0    root

Figure 2. UD analysis of the adjective top of the range (case a)

5 Functional MWEs in UD

UD presents a particular problem with functional MWEs, because UD favors dependencies between content words (determiners and prepositions are dependents of the noun following them). It appears that the choice made by UD to have prepositions as dependents of their complement is the source of some “catastrophes” (in the mathematical sense of the term) as soon as “prepositional” MWEs are involved (Gerdes & Kahane 2016). The goal of this section is to present the problem and to propose a solution to smooth it. Let us consider the following examples, illustrating what is often called a complex determiner (1a) and a complex preposition (1b):

1. (a) She asked me a lot of questions.
   (b) She lives in front of my house.

We can compare these sentences with (2a) and (2b):

2. (a) She asked me many questions.
   (b) She lives near my house.

According to the choices made by UD, we have dependencies between asked and questions in (2a) and between lives and house in (2b) (Figure 3).

[Figure omitted: dependency trees of she lives near my house and she asked me many questions]

Figure 3. UD analysis of 2a and 2b

8 Currently the format is only used for contiguous items. The format can be extended to non-contiguous expressions, e.g. we could have “3-5,7-8” as an index.

It is tempting to preserve these dependencies and to treat a lot of and in front of respectively as a complex determiner and a complex preposition. Let us first remark that of in these expressions is not part of the MWE, but is part of the sub-categorization of the MWE, by parallelism with verbal sub-categorization (cf. footnote 7, although the coherence of these expressions is higher and the preposition cannot always be repeated alone). In other words, the MWEs in question are a lot and in front. These MWEs are syntactically transparent and we do not want to analyze them with fixed.

Two analyses are possible. Analysis A respects the surface syntax: of N is treated as the complement (nmod) of the MWE. This is the most common analysis in the current English UD treebanks.9

[Figure omitted: dependency trees of a lot of questions and in front of the house under analysis A]

Figure 4. Analysis A for a lot (of) and in front (of)

Analysis B favors the relation between content words, as in the analyses of Figure 3. In this analysis, we propose to introduce special relations det:complex and case:complex when the dependents of det and case are MWEs.

[Figure omitted: dependency trees of a lot of questions and in front of the house under analysis B, with det:complex, case:complex, case:depdet, and case:depcase]

Figure 5. Analysis B for a lot (of) and in front (of)

The sub-categorized preposition of is governed by the complement noun. We introduce a feature on the case relation to indicate that this preposition is subcategorized by a dependent of the noun. We need to distinguish case:depdet and case:depcase, because both can be present: in front of a lot of houses, where front, lot and the two of will depend on houses.

[Figure omitted: dependency trees of in front of a lot of houses under analyses A and B]

Figure 6. Analyses A and B for in front of a lot of houses

Both analyses A and B are interesting. It is possible not to choose, and to allow the conversion from one analysis to the other. For that, we need to enrich analysis A by adding the subtypes :antidet and :anticase to the standard nmod relations that go the other way in the B analysis (where they are labeled det:complex and case:complex).

[Figure omitted: dependency trees of a lot of questions and in front of the house with nmod:antidet and nmod:anticase]

Figure 7. Enhanced analysis B for a lot (of) and in front (of)

9 Since quite a lot (of questions) is possible, a lot has actually become an adverb (just like in a lot better – or other comparative adjectives), and the relation between a lot and the noun complement of questions should be of type obl and not nmod, as it is in the current English UD treebanks. This irregular behavior of a lot can be captured by the introduction of an MWE unit as in Section 4.

Our rules of conversion are:

X –det:complex→ Y   ↔   Y –nmod:antidet→ X
X –case:complex→ Y  ↔   Y –nmod:anticase→ X

Similar rules could be used to get a surface syntax-based representation from UD:10

X –case→ Y  ↔  Y –anticase→ X
X –aux→ Y   ↔  Y –antiaux→ X
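To make the conversion concrete, here is a sketch of the B-to-enhanced-A direction for det:complex, written as XSLT over a minimal <word id/head/deprel> XML encoding of the trees. Both the encoding and the stylesheet are our own illustration, not the authors' tooling; UD data itself is distributed as CoNLL-U, and the sketch assumes at most one det:complex relation per sentence:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- X (the det:complex dependent) takes over its governor's attachment... -->
  <xsl:template match="word[@deprel='det:complex']/@head">
    <xsl:attribute name="head">
      <xsl:value-of select="../../word[@id = current()]/@head"/>
    </xsl:attribute>
  </xsl:template>
  <!-- ...and its governor's relation (e.g. obj or root) -->
  <xsl:template match="word[@deprel='det:complex']/@deprel">
    <xsl:attribute name="deprel">
      <xsl:value-of select="../../word[@id = current()/../@head]/@deprel"/>
    </xsl:attribute>
  </xsl:template>
  <!-- Y (the former governor) is re-attached under X with the mirror relation -->
  <xsl:template match="word[@id = ../word[@deprel='det:complex']/@head]/@head">
    <xsl:attribute name="head">
      <xsl:value-of select="../../word[@deprel='det:complex'][1]/@id"/>
    </xsl:attribute>
  </xsl:template>
  <xsl:template match="word[@id = ../word[@deprel='det:complex']/@head]/@deprel">
    <xsl:attribute name="deprel">nmod:antidet</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

The opposite direction is symmetric, and the case:complex / nmod:anticase pair follows the same pattern.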

6 Conclusion

We have shown that irregular structures need to be introduced as units, because we have to associate a POS with them. In cases a) and b), the internal structure is transparent but the POS of the complete unit is not predictable. In cases c) and d), where we use fixed relations, it is all the more necessary to indicate the POS of the MWE. For regular idioms, too, we can add the MWE as a unit. For regular functional MWEs, we propose to add sub-types to the relations, to capture the relations between content words as well as the syntactic dominance relations. A tree does not allow expressing both types of relations at the same time, but the proposed sub-typed relations can be converted from one to the other.11 The two proposals are orthogonal and can be combined. For example, if we want to treat a lot as an adverb, we can have the analysis of Figure 8:

[Figure omitted: dependency trees of She asks quite a lot of questions under analyses A and B, with a lot as an MWE unit tagged ADV]

Figure 8. Analyses A and B for quite a lot of questions

The proposed schemes and distinctions clarify some underspecifications in the current UD scheme that lead to incoherent analyses. The usage of subtypes fits unintrusively into the current scheme and could be used for upcoming versions. More generally, it allows back-and-forth conversions between UD and more classical subcategorization-based dependency annotation schemes.

Acknowledgments

We thank our three reviewers for valuable remarks and corrections. This work is supported by the French National Research Agency (ANR) through the project NaijaSynCor.

10 The conversion of chains of auxiliaries (would have been done) to a surface syntax-based representation (would –antiaux→ have –antiaux→ been –antiaux→ done) is presently problematic in UD 2, because all auxiliaries depend on the lexical verb. This suggests enriching the UD annotation either in the same way as proposed here in analysis A (with a casedep feature for a second case introduced by a first case) or by replacing the current bouquet-style annotation with a chain of auxiliaries, an auxiliary depending on the auxiliary it subcategorizes.

11 In this paper, we started from the UD annotation scheme and we have used UD's relation names. The names case and anticase could suggest that case has a sort of primacy over anticase. But anticase is simply the obj relation between a preposition and its direct complement.

References

Timothy Baldwin, C. Bannard, T. Tanaka, and D. Widdows. 2003. An Empirical Model of Multiword Expression Decomposability. In Proceedings of the ACL Workshop on Multiword Expressions: Analysis, Acquisition and Treatment. Association for Computational Linguistics.

Charles Fillmore, Paul Kay, and Michael O'Connor. 1988. Regularity and Idiomaticity in Grammatical Constructions: The Case of “Let Alone”. Language, 64:501–538.

Kim Gerdes and Sylvain Kahane. 2016. Dependency Annotation Choices: Assessing Theoretical and Practical Issues of Universal Dependencies. In Proceedings of LAW X.

Raymond W. Gibbs. 1994. The Poetics of Mind: Figurative Thought, Language, and Understanding. Cambridge University Press, New York.

Martin Jönsson. 2008. On Compositionality: Doubts about the Structural Path to Meaning. PhD thesis, Lund University.

Ronald W. Langacker. 1987. Foundations of Cognitive Grammar, Volume 1: Theoretical Prerequisites. Stanford University Press, Stanford.

Igor Mel'čuk. 1998. Collocations and Lexical Functions. In Anthony P. Cowie (ed.), Phraseology: Theory, Analysis, and Applications, 23–53.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of LREC.

Geoffrey Nunberg, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538.

V. Rosén, G. Losnegaard, K. De Smedt, E. Bejček, A. Savary, A. Przepiórkowski, M. Sailer, and V. Mititelu. 2015. A Survey of Multiword Expressions in Treebanks. In Proceedings of the Treebanks and Linguistic Theories Conference (TLT), Warsaw, Poland.

Ivan A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger. 2002. Multiword Expressions: A Pain in the Neck for NLP. Computational Linguistics and Intelligent Text Processing, 189–206.

Agata Savary, C. Ramisch, S. Ricardo Cordeiro, F. Sangati, V. Vincze, B. QasemiZadeh, M. Candito, F. Cap, V. Giouli, I. Stoyanova, and A. Doucet. 2017. The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017).

Maria Helena Svensson. 2008. A Very Complex Criterion of Fixedness: Non-compositionality. In Sylviane Granger and Magali Paquot (eds.), Phraseology: An Interdisciplinary Perspective, 81–93. John Benjamins, Amsterdam / Philadelphia.

A Universal Dependencies Treebank for Marathi

Vinit Ravishankar
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague

Abstract

This paper describes the creation of a free and open-source dependency treebank for Marathi, the first open-source treebank for Marathi following the Universal Dependencies (UD) syntactic annotation scheme. In the paper, we describe some of the syntactic and morphological phenomena in the language that required special analysis, and how they fit into the UD guidelines. We also evaluate the parsing results for three popular dependency parsers on our treebank.

1 Introduction
The Universal Dependencies (UD) project (Nivre et al., 2016) is a recent effort to arrive at 'universal' annotation standards for dependency treebanks. These annotation standards also cover POS tags and morphology, in addition to the expected dependency relations. In recent years, the UD project has been growing more popular; the CoNLL 2017 shared task on dependency parsing (Zeman et al., 2017) resulted in the development and release of a number of dependency parsing pipelines that parse raw text to UD-annotated trees. UD's treebanks cover a number of languages; however, there are, as with most language resources, several gaps in treebank availability for certain languages or families. In this paper, we describe the creation of a treebank for Marathi, an Indic language spoken primarily in the state of Maharashtra in western India. In Section 2 of our paper, we briefly describe the grammar and political status of Marathi. Section 3 describes some prior work on Marathi NLP, including work relevant to our treebank. Section 4 describes the creation and size of our corpus. Section 5 describes some of the more interesting linguistic phenomena in Marathi and how they fit into the UD guidelines. Section 6 describes our evaluation methodology and our results. We conclude with Section 7, where we discuss future avenues for expansion.

2 Marathi
Marathi is an Indic language spoken by approximately 71 million speakers, most of them in the western Indian state of Maharashtra. It is one of the 22 scheduled languages of the Indian government.1 Because Maharashtra is the state with the longest border with Dravidian-speaking states, Marathi has adopted several features typical of the Dravidian language family, beyond those present in the south Asian sprachbund: these include, among others, reduced relative clause constructions and a range of negative auxiliaries (Junghare, 2009). Marathi is written in the Devanagari script, with a few minor modifications and extra characters. Throughout this paper, we transliterate all examples using the International Alphabet of Sanskrit Transliteration (IAST).
Whilst Marathi is not the first Indic language with a Universal Dependencies treebank, the existing Hindi and Urdu treebanks are conversions from another annotation schema (Tandon et al., 2016), a process that can be lossy.

1 A 'scheduled' language in this context refers to a language in which Indian public service candidates are entitled to be examined, amongst other obligations on the part of the government.
Keywords: dependency, UD, universal, Indic, Indo-Aryan, Marathi, annotation

The treebank we describe is, therefore, the first (to our knowledge) manually annotated Universal Dependencies treebank released for an Indic language. Our motivation for choosing the UD formalism is twofold: first, we believe that the growing popularity of the framework and its related conferences and shared tasks could be beneficial to work on Marathi computational linguistics. Second, the 'universal' nature of the Universal Dependencies project can only be tested by the addition of more language treebanks: the creation of this treebank is therefore mutually advantageous to both the project and the state of Marathi computational linguistics.
Marathi is, compared to other Indic languages, fairly morphologically complex. Nouns tend to adopt the three-layer morphology described in Masica (1993): nouns first form an oblique case (often through non-transparent modifications), then take a case suffix, then, optionally, a postpositional suffix. Unlike in many other Indic languages, these layers are often orthographically joined in Marathi. Verbs show a wide variety of infinitive and participle forms, which are described in a later section. Syntactically, Marathi tends to follow SOV word order, although word order is relatively free. Marathi also shows split ergativity: the perfective aspect induces ergative-absolutive alignment.

3 Prior work
The AnnCorra project describes a dependency annotation schema for Indian languages, based on a 'Paninian grammatical model' (Bharati et al., 2002). A Marathi treebank annotated under this schema appears to be a work in progress; it was described by Tandon and Sharma (2017), who also describe parsing strategies, based on this schema, for Marathi and other under-resourced Indian languages.
Whilst Marathi grammars do exist, our primary resource was Masica's pan-Indic descriptive grammar (Masica, 1993). In addition to this, Dhongade and Wali (2009) provide a fairly comprehensive grammar of Marathi; however, there is some disagreement between their grammar and Masica's. Finally, we also used a grammar by Navalkar (1868); despite being considerably dated, the grammar is quite succinct and well-written.
Several tools for Marathi exist, ranging from POS taggers (Singh et al., 2013) to morphological analysers. These tools are sometimes released under non-free licenses, or are otherwise opaque; we used a free and open-source morphological analyser (Ravishankar and Tyers, 2017) written in the Apertium formalism (Forcada et al., 2011), deeming this to be sufficient for POS tagging. All morphological disambiguation was performed manually, and incorrect analyses were fixed by hand.

4 Corpus
Our corpus primarily consists of stories from Wikisource. The collection of stories available is fairly large; we chose those that most resembled modern spoken or written Marathi, as there is a significant difference between formal written Marathi, especially of the past, and the written forms in use today. This difference shows primarily in the use of certain morphological forms that have fallen out of use in modern spoken Marathi,2 something that we tried to avoid for an initial treebank release. The text in our corpus, therefore, would be considered fairly standard in Pune, if a bit old-fashioned in places.
Whilst we would have liked to include news in our corpus, this was complicated: our attempts to scrape a news corpus stopped rather abruptly on the discovery that the most widely distributed Marathi newspapers were all published online as images or GIFs. A future goal is to convert these newspapers, assuming licenses permit, to text using OCR utilities. Our final parsed corpus consisted of 3,506 tokens and 486 sentences.

4.1 Preprocessing
We ran our corpus through the Apertium morphological analyser cited above, forcing the output into the VISL format (Bick and Didriksen, 2015) rather than Apertium's default format. The main reason for this was that we judged it easier, ergonomically, to annotate in this format: morphological disambiguation simply involved deleting lines with inappropriate analyses, and dependency relations were added to the end of every line (each line representing a token). These were later converted to the required CoNLL-U format with

2 Our dialect of reference is urban Marathi spoken primarily in the city of Pune; Marathi is fairly diverse in terms of dialects, which vary by region, caste and social class.

a script; another script converted the POS tags to UD POS tags, and the morphology to UD morphology. This conversion required some minor additional manual editing in areas where UD morphology required more specificity than our Apertium analyser provided. Appendix A has an example of a sentence in the VISL format, and its CoNLL-U equivalent. Around the final quarter of our treebank, we switched to using UD-Annotatrix (Tyers et al., 2018) for annotation, with positive results.
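For concreteness, the following is a minimal sketch of such a conversion, not our actual script: it maps disambiguated VISL reading lines of the kind shown in Appendix A onto simplified CoNLL-U rows. The tag-to-UPOS table is a hypothetical fragment, and multiword tokens and UD morphological features are left out for brevity.

import re

# Hypothetical fragment of an Apertium-tag -> UPOS mapping; a real
# conversion would also map full tag bundles to UD features.
UPOS = {"n": "NOUN", "vblex": "VERB", "vaux": "AUX", "prn": "PRON",
        "gen": "ADP", "sent": "PUNCT", "qt": "PUNCT"}

def visl_to_conllu(lines):
    # Expects alternating form lines ('"<form>"') and disambiguated
    # reading lines ('"lemma" tags @deprel #id->head'), as in Appendix A.
    rows, form = [], None
    for line in lines:
        line = line.strip()
        if line.startswith('"<'):          # surface form line
            form = line[2:-2]
        else:                              # reading line
            m = re.match(r'"(.*)"\s+(.*?)\s+@(\S+)\s+#(\d+)->(\d+)', line)
            if m is None or form is None:
                continue
            lemma, tags, deprel, idx, head = m.groups()
            upos = UPOS.get(tags.split()[0], "X")
            rows.append("\t".join([idx, form, lemma, upos, "_", "_",
                                   head, deprel, "_", "_"]))
    return "\n".join(rows)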

4.2 Word segmentation
An important issue we had to address during the creation of the treebank was that of word segmentation, also referred to here as tokenisation. A major issue we faced was the very fuzzy line between cases and postpositions in Marathi. Whilst it was clear that we would not split nouns and their cases into two tokens (despite their agglutinative and clearly separable nature), we had problems deciding precisely which suffixes could be classed as case suffixes, and which would be classed as postpositions. There are several tests for distinguishing between the two: one is, for instance, the ability of the genitive oblique to intervene between nouns and true postpositions, whilst another is the relative morphological freedom of postpositions and their ability to form attributive adjectives. None of these tests, however, is perfect, though we eventually arrived at a closed set of cases, partially by relying on tradition and partially by consulting grammars of other Indic languages to attempt to arrive at some standardisation. Our final closed set of cases included the nominative, accusative, dative, ergative, instrumental, comitative/sociative, locative, ablative, vocative and oblique, with the oblique case being the case to which postpositions attach. We do not attach genitives to their heads: this is for consistency with Hindi, and also to avoid the verbose [psor] morphology that UD uses to mark possessives.

5 Annotation
Our annotation of the treebank followed the UD version 2.0 guidelines. Our justification for choosing the UD standards was the universal nature of the treebank collection. The inclusion of a UD Marathi treebank would benefit both UD, by adding yet another language that tests the validity of the universality of UD's annotation standards, and Marathi, by not requiring us to come up with our own annotation standards and documentation. In the following subsections, we describe some of the more interesting morphological and syntactic constructions in Marathi, and how we chose to annotate them.

5.1 Subject case
Like many other Indic languages, Marathi displays some variation in the possible cases the semantic agent of a construction can take. Part of this is due to split ergativity; ergative-absolutive alignment is triggered by the perfective aspect, whilst the imperfective follows nominative-accusative alignment. We decided to consider all semantic agents, irrespective of case, to be the syntactic subject of the construction. This results in three standard subject cases: the nominative, for unmarked subjects in the imperfective aspect; the ergative, for subjects marked with the ergative suffix -ne; and the dative, for experiencer predicates. Whilst justifying the existence of dative subjects in Marathi by UD standards is far from straightforward, our decision to do so stems from the ability of the dative subject to pass several subjecthood tests, such as adjunct subject control. It should be noted, however, that the dative subject in Marathi does fail other subjecthood tests, such as verbal agreement. An example of the dative subject is the simple sentence rātrabhar tilā jhop ālī nāhī 'she couldn't sleep at night', glossed in Figure 1a. Note the aux relation with the negation 'particle', which is actually a verb: it agrees with the subject.
We decided to use the language-specific relation nsubj:own to denote certain specific ownership constructions that had no clear parallel in other languages we examined; in these constructions, indicating ownership, a postposition (-kadẹ) combines with the oblique case of the owner. This is similar to the use of the locative (-DA) in Turkish, or the adessive (-llA) in Finnish. We do not subtype cop, as this is the standard existential use of the copula. Whilst this relation appears to be suitable for now, we are considering modifying it to nmod:own in a future release. Figure 1b is a simple (truncated) sentence from the treebank that demonstrates this construction well.

(a) rātra bhar tilā jhop ālī nāhī
    night long she-dat sleep-acc came not
    'She couldn't sleep all through the night'

(b) merī kadẹ paṇ kutrā āhe
    Mary on too dog is
    'Mary has a dog too'

Figure 1: Various non-nominative subject cases

5.2 Object case
Objects in Marathi also tend to adopt a number of cases. Our treebank has objects in four cases: the accusative, dative, genitive3 and sociative. Distinguishing between accusative and dative objects was interesting, as Marathi displays differential object marking: 'accusative' objects in Marathi can be glossed with an 'accusative' null suffix, or with the dative suffix -lā for the same verb argument structure, with the latter implying definiteness.

(1) a. mī paksị̄ baghto
       I-nom bird-pl.acc watch-impf.1msg
       'I watch birds'
    b. mī paksīṃnạ̄ baghto
       I-nom bird-pl.dat watch-impf.1msg
       'I watch (some specific) birds'

Example 1a glosses the object as an accusative, with a null morpheme, due to its non-definiteness, whilst Example 1b glosses it as a dative. UD's guidelines specify that a construction with only two verbal arguments should not use the indirect object (iobj) relation. Taking these things into account, we could do one of two things: either annotate every such noun according to the subcategorisation frame of the governing verb, treating the accusative and dative suffixes as alternative morphological realisations of the same case, or annotate every noun based on its morphology, thus allowing dative direct objects. We chose the latter.
The inclusion of the sociative (referred to as 'comitative' in UD) case as a direct object was another contentious issue: these objects occurred with verbs that were typically intransitive. The line between treating these arguments as core arguments of a transitive variant of the verb (warranting the obj relation) and treating them as non-core dependents of the intransitive (warranting an obl relation) was a thin one, and we preferred the former analysis in some instances, such as in the (slightly modified) sentence from the treebank in Figure 2: lok kutryāmśị̄ bolat hote 'people were talking to dogs'.

lok kutryāmśị̄ bolat hote
people dog-soc talking were
'People were talking with dogs'

Figure 2: Sociative/comitative objects

We did not encounter examples of indirect objects in any case other than the dative.

3 Technically the oblique, as we split genitives.

5.3 Light verbs
Similarly to many other Indic languages and several Indo-Iranian and Turkic languages, Marathi frequently makes use of light verb constructions (LVCs). These are a form of complex verbal predicate, typically noun + verb combinations that function as a semantic verb. Most of these constructions involve the verb karnẹ 'to do/make' as the verbal head of the construction; we used the language-specific relation compound:lvc to attach dependent nouns. A simple example of a light verb construction from our treebank is the (truncated) sentence in Figure 3: literally 'the frog was hitting a jump', with 'jump' being the nominal part and 'to hit' being the verbal part of the light verb construction. Despite the verb being non-finite, we chose it as the head of the construction for consistency with other treebanks, particularly the Persian treebank, where LVCs are frequent (Seraji et al., 2016).

bedūḳ udyạ̄ mārat hotā
frog jump hit was
'The frog was jumping.'

Figure 3: Light verb constructions

LVCs display varying degrees of lexicalisation. The LVC udyạ̄ mārnẹ 'to hit a jump' is fairly unlexicalised: it can be qualified with an adjective (mothyạ̄ udyạ̄ mārnẹ 'to hit a large jump') or modified with an adverb (jorāt udyạ̄ mārnẹ 'to forcefully hit a jump'). Other constructions, like kāljị̄ ghenẹ 'to worry', cannot be qualified; they function as fully lexicalised verbs. We do not take the degree of lexicalisation into account when assigning this relation.

5.4 Compound verbs
Perhaps one of the more interesting linguistic phenomena that we model in our treebank is the existence of what we refer to as 'compound verbs'. Deoskar (2006) provides an excellent description of compound verbs in Marathi; note, however, that they refer to the phenomenon as 'light verbs', as do other works on the subject (Butt, 2010; Seiss et al., 2009). The reason we use the term 'compound verb' is to prevent confusion with light verbs as described in Section 5.3, which are a very distinct syntactic construct. The term 'compound verb' is also used by Pardeshi (2001). Compound verbs are, essentially, a combination of two verbs: a main verb, very often a converb in Marathi (but a participle or an infinitive in some constructions), and a secondary verb, which has no real semantic value of its own, but acts solely to modify the Aktionsart or add some minor semantic nuance to the main verb (often, there is no semantic change at all). The set of secondary verbs is a closed set, and verbs from outside this set function as full, semantically valid verbs.

(2) a. mī gosṭạ vāchlī
       I-nom story.f.sg read-perf.3fsg
       'I read (the) story'
    b. mī gosṭạ vāchūn tāklị̄
       I-nom story.f.sg read-conv.perf put-perf.3fsg
       'I finished off reading (the) story'

Whilst Example 2b has the same fundamental meaning as the simpler Example 2a, the addition of the secondary ('vector') verb results in a minor semantic shift, indicating finality, or suddenness in the completion of the action denoted by the main verb. Whilst it appears that the aux relation would be appropriate here, Deoskar (2006) shows that the two classes (vector verbs and auxiliaries) are not the same. We, therefore, subtype another relation and use compound:svc to mark this relation, as in Figures 4a4 and 4b. Despite 'serial

4 Interestingly, dropping the compound construction would change absolutely nothing about this sentence.

verbs' being a distinct syntactic construct that has very little to do with these sorts of compound verbs, the absence of a dependency relation better suited to this phenomenon compelled us to use compound:svc for now.

(a) tyāce doḷẹ bharūn āle
    his eyes fill-conv came
    'His eyes filled (with tears)'

(b) to radaṭ baslā
    he cry-part.impf sit-past
    'He cried (a lot, without stopping)'

Figure 4: Compound verbs

5.5 Passives
Whilst the use of the passive voice is not extremely frequent in Marathi, we did come across several examples in our treebank, which led to the creation of two subtypes that are fairly common in UD: nsubj:pass and aux:pass. Marathi uses the verb jānẹ 'to go' as an auxiliary in the formation of certain passive constructions. The main verb is in the perfective aspect and agrees with the passive subject. An example sentence from our treebank is rājvādạ̄ śrngārlạ̄ gelā 'the palace was decorated', as in Figure 5a.
Another verbal construction common to written Marathi occurred quite frequently in our treebank. This is a form of 'formal' passivisation, and uses the auxiliary verb yenẹ 'to come' instead of 'to go'. The main verb, interestingly, is an infinitive in the locative case. The above sentence could be rewritten as rājvādạ̄ śrngārnyāṭ āle (Figure 5b) without any major change in meaning.

(a) rājvādạ̄ śrngārlạ̄ gelā
    palace decorated-perf go-perf
    'The palace was decorated'

(b) rājvādạ̄ śrngārnyāṭ āle
    palace decorated-inf.loc come-perf
    'The palace came to be decorated'

Figure 5: Two forms of passivisation

5.6 Dislocation
The use of dislocated pronouns to emphasise nominals or nominal clauses is fairly common in Marathi. These constructions use a demonstrative pronoun along with the clause, similar to dislocation in French. We use the dislocated relation to mark these, as in Figure 6.

he baghā ghodesvār
this look-imp horseman
'Look at this, the horseman'

Figure 6: Dislocation

It is important to note that 'this' in the example does not determine 'horseman', but is a standalone pronoun: fairly visibly, it does not even agree with 'horseman' in gender and number.

6 Evaluation

The pipeline that we primarily use for tokenisation and tagging is the popular UDPipe (Straka and Straková, 2017); it is a trainable pipeline consisting of a tokeniser, a tagger (MorphoDiTa; Straková et al., 2014) and a parser (Parsito; Straka et al., 2015). Having tagged and tokenised our text using UDPipe, we evaluate three parsers.
The first of these parsers is Parsito, included in UDPipe itself. Like many modern parsers, it uses a neural network to learn transitions for parsing dependencies. We evaluate UDPipe twice: once using the default settings, and again using external word embeddings trained on the Marathi Wikipedia.

              Precision   Recall   F1 score
Multiwords    99.09       45.31    61.88
Words         94.90       90.18    92.48
Sentences     92.24       92.72    92.44

Table 1: Tokeniser results on raw text.

                 UPOS    Feats   All tags   Lemma
Gold standard    78.82   65.99   62.67      74.40
Tokenised        74.11   64.73   61.87      75.37

Table 2: Tagger F1 scores evaluated with both gold-standard and automatic tokenisation.

We used pre-trained fastText embeddings of dimension 300 (Bojanowski et al., 2016); we believed that these would perform better than embeddings generated by other tools, as fastText also takes subword units into account when building word embeddings, which can give better results for more morphologically complex languages.
The second parser is the newer BIST parser (Kiperwasser and Goldberg, 2016). Similarly to UDPipe, it uses neural networks for parsing: sentences are processed using bidirectional LSTMs. Unlike UDPipe, however, it also offers an implementation that uses a graph-based parsing strategy. Whilst BIST also allows us to use custom word embeddings, we did not do so for infrastructural reasons: using custom embeddings results in an exponential blowup in model size. We intend to rectify these issues and evaluate BIST with embeddings in the future.
Finally, our third parser is the much older MaltParser (Nivre et al., 2007). Unlike the others, MaltParser does not use a neural network for learning transitions. Given that our treebank is still fairly small, we were interested in comparing the performance of the two approaches: neural networks famously require substantial amounts of data, and despite neural parsers showing clearly better results averaged across all treebanks in competitive evaluations, we wanted to compare their performance on our treebank.
Whilst our primary evaluation is on end-to-end parsing, we also perform a secondary evaluation given gold-standard tokenisation and POS tags. We evaluated both labelled (LAS) and unlabelled (UAS) attachment scores; we also evaluated the weighted LAS, which underweights the contribution of correctly labelling certain relations (like case and punct) to the final score. Evaluation was carried out using the same script that was officially used for the CoNLL 2017 shared task. Each evaluation involved training 10 models for use in 10-fold cross-validation. The BIST parser required some held-out data for use as a dev set; we used 45 (fixed) sentences for this, and ran 10-fold CV on the remainder. We ran all parsers with the default parameters, except for the BIST parser, where we raised the number of training epochs to 50.
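For reference, the core of the attachment-score computation can be sketched as follows, under the simplifying assumption that gold and system files share identical tokenisation (the official CoNLL 2017 script additionally aligns mismatching tokens); the function names and file handling here are illustrative, not the official implementation.

def read_arcs(path):
    # Collect (HEAD, DEPREL) pairs per word from a CoNLL-U file,
    # skipping comments, blank lines and multiword-token ranges.
    arcs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                arcs.append((cols[6], cols[7]))
    return arcs

def attachment_scores(gold_path, system_path):
    gold, system = read_arcs(gold_path), read_arcs(system_path)
    assert len(gold) == len(system), "tokenisation must match"
    uas = sum(g[0] == s[0] for g, s in zip(gold, system)) / len(gold)
    las = sum(g == s for g, s in zip(gold, system)) / len(gold)
    return uas, las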

6.1 Results

              Raw text                  Gold standard
              UAS     LAS     (w)LAS    UAS     LAS     (w)LAS
UDPipe        63.00   51.79   46.14     77.74   68.88   64.61
BIST          67.60   54.18   47.25     68.70   55.05   47.99
MaltParser    62.02   49.45   44.01     80.75   70.35   65.16
UDPipe[+emb]  59.77   48.20   42.63     79.48   71.94   68.47

Table 3: Unlabelled, labelled and weighted labelled attachment scores for our parsers, evaluated on a raw text pipeline and on gold-standard tokenisation and POS tags.

Table 1 reports our tokeniser's results. The poor performance of the tokeniser on multiword tokens stands out; the relatively high frequency of multiword tokens due to orthographically joined postpositions is likely one of the reasons. Table 2 shows the performance of the tagger under two conditions: on gold-standard tokenised data, and on data tokenised by UDPipe in the previous step. Finally, we present our dependency parsing results in Table 3.
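As an aside on the metrics, the multiword, word and sentence scores in Table 1 are span-based precision, recall and F1; the sketch below shows such a comparison, with tokens represented as (start, end) character-offset pairs, a representation of our choosing.

def span_f1(gold_spans, system_spans):
    # Precision/recall/F1 over tokens (or sentences) represented as
    # (start, end) character-offset spans.
    gold, system = set(gold_spans), set(system_spans)
    tp = len(gold & system)
    p = tp / len(system) if system else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1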

6.2 Discussion
As expected, our results with gold-standard tokenisation and POS tags are significantly better than our results for parsing raw text. What we expected far less were the drastic differences in the performance of different parsers, and of the same parsers in different situations. Whilst BIST has the best scores for parsing raw text, this advantage quickly vanishes: it barely improves on gold-standard text at all, and drops to being the worst parser of the lot. Interestingly, the results bore out our intuition that MaltParser would be competitive despite its age: whilst it is nowhere the best parser on the more important LAS, it does have the best UAS with gold-standard tokenisation and POS tags, and is fairly close to the best LAS scores.
Another interesting result worth noting is UDPipe's performance on raw text with word embeddings included; whilst these embeddings intuitively ought to improve (or at least not worsen) results, they cause a noticeable drop in parsing performance on raw text. Gold-standard text parses much better, giving us our best LAS scores. We propose that this might occur due to word embeddings trained on external corpora being unable to deal with poorly segmented multiwords: the small size of the treebank does not explain the significant difference between raw text and gold-standard POS-tagged text.

7 Future work
Our most important short-term goal is, obviously, to increase the size of our treebank, aiming for a release of 10,000 manually parsed tokens; this was the treebank size expected of a surprise language in the CoNLL 2017 shared task. Another short-term goal is to generate data sets for easier evaluation of Marathi word embeddings (Abdou et al., 2018).
Apart from this, we have several medium-term goals. UD has some rudimentary support for language-family-specific documentation. As Marathi is the only Indic treebank (that we know of) directly annotated according to UD specifications, we intend to use it as a starting point for writing documentation for Indic languages, contrasting with Marathi wherever possible, and expanding where not. A manual conversion of UD Hindi to fit these standards would be a place to start. Finally, we also intend to add enhanced dependency relations: this has been done for some languages already (Schuster and Manning, 2016), and would be an interesting addition.

Acknowledgements
Work on this paper was partially funded through the stipend provided by the Erasmus Mundus Language and Communication Technologies program. We thank Dan Zeman, Francis M. Tyers and Memduh Gökırmak for their valuable insight on annotation standards. We also thank our anonymous reviewers for their comments.

References
Mostafa Abdou, Artur Kulmizev, and Vinit Ravishankar. 2018. MGAD: Multilingual Generation of Analogy Datasets. In Proceedings of the Language Resources and Evaluation Conference (LREC'18) [to appear].

Akshar Bharati, Rajeev Sangal, Vineet Chaitanya, Amba Kulkarni, Dipti Misra Sharma, and KV Ramakrishnamacharyulu. 2002. AnnCorra: building tree-banks in Indian languages. In Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12. Association for Computational Linguistics, pages 1–8.

Eckhard Bick and Tino Didriksen. 2015. CG-3 – beyond classical constraint grammar. In Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA. Linköping University Electronic Press, Linköpings universitet, pages 31–39.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Miriam Butt. 2010. The light verb jungle: still hacking away.

Tejaswini Deoskar. 2006. Marathi light verbs. In Proceedings from the Annual Meeting of the Chicago Linguistic Society. Chicago Linguistic Society, volume 42, pages 183–198. http://www.ingentaconnect.com/content/cls/pcls/2006/00000042/00000002/art00012.

R. Dhongade and K. Wali. 2009. Marathi. London Oriental and African language library. John Benjamins Publishing Company. https://books.google.co.in/books?id=zVVOvi5C8uIC.

M. L. Forcada, M. Ginestí-Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J. A. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez, and F. M. Tyers. 2011. Apertium: a free/open-source platform for rule-based machine translation. Machine Translation 25(2):127–144.

Indira-Y Junghare. 2009. Syntactic convergence: Marathi and Dravidian. von Kopp, B.: Texts and the Art of Translation. The Contribution of Comparative 2:163.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and Accurate Dependency Parsing using Bidirectional LSTM Feature Representations. TACL 4:313–327. https://transacl.org/ojs/index.php/tacl/article/view/885.

C.P. Masica. 1993. The Indo-Aryan Languages. Cambridge Language Surveys. Cambridge University Press. https://books.google.cz/books?id=Itp2twGR6tsC.

G.R. Navalkar. 1868. The Student’s Manual of Marathi Grammar, Etc. [By Gan(a) Pat(i) Ráv(a) Raghunath(a).].. https://books.google.cz/books?id=VJYhMwEACAAJ.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Chris Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Dan Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Language Resources and Evaluation Conference (LREC'16).

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13(2):95–135.

Prashant Pardeshi. 2001. The Explicator Compound Verb in Marathi: Definitional. Linguistics 38:68–85. http://www.lib.kobe-u.ac.jp/repository/80010010.pdf.

Vinit Ravishankar and Francis M Tyers. 2017. Finite-State Morphological Analysis for Marathi. In Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017). pages 50–55.

Sebastian Schuster and Christopher D Manning. 2016. Enhanced English Universal Dependencies: An improved representation for natural language understanding tasks.

Melanie Seiss, Miriam Butt, and Tracy Holloway King. 2009. On the difference between auxiliaries, serial verbs and light verbs. In Proceedings of the LFG09 Conference. CSLI Publications, pages 501–519. http://web.stanford.edu/group/cslipublications/cslipublicationsLFG/14/papers/lfg09seiss.pdf.

Mojgan Seraji, Filip Ginter, and Joakim Nivre. 2016. Universal Dependencies for Persian. In Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Jyoti Singh, Nisheeth Joshi, and Iti Mathur. 2013. Development of Marathi Part of Speech Tagger using Statistical Approach. CoRR abs/1310.0575. http://arxiv.org/abs/1310.0575.

Milan Straka, Jan Hajič, Jana Straková, and Jan Hajič jr. 2015. Parsing Universal Dependency Treebanks using neural networks and search-based oracle. In Proceedings of Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT 14).

Milan Straka and Jana Straková. 2017. Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics, Vancouver, Canada, pages 88–99. http://www.aclweb.org/anthology/K/K17/K17-3009.pdf.

Jana Straková, Milan Straka, and Jan Hajič. 2014. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Baltimore, Maryland, pages 13–18. http://www.aclweb.org/anthology/P/P14/P14-5003.pdf.

Juhi Tandon, Himani Chaudhary, Riyaz Ahmad Bhat, and Dipti Misra Sharma. 2016. Conversion from Pāṇinian kārakas to Universal Dependencies for Hindi dependency treebank. LAW X, page 141.

Juhi Tandon and Dipti Misra Sharma. 2017. Unity in diversity: A unified parsing strategy for major Indian languages. In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), pages 255–265.

Francis M. Tyers, Mariya Shejanova, and Jonathan North Washington. 2018. UD Annotatrix: An annotation tool for Universal Dependencies. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, this volume.

Daniel Zeman, Martin Popel, Milan Straka, Jan Hajic, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinkova, Jan Hajic jr., Jaroslava Hlavacova, Václava Kettnerová, Zdenka Uresova, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonca, Tatiana Lando, Rattima Nitisaroj, and Josie Li. 2017. CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics, pages 1–19. https://doi.org/10.18653/v1/K17-3001.

A Formats

"<">" """ qt @punct #1->5 "<माझी>" "मी" prn p1 mf sg @nmod:poss #2->4 "चा" gen f sg @case #3->2 "<जमीन>" "जमीन" n f sg nom @obj #4->5 "<वकणार>" "वकणे" vblex pros mfn sp @root #5->0 "<नाही>" "नाही" vaux neg p1 sg @aux #6->5 "<.>" "." sent @punct #7->5 "<">" """ qt @punct #8->5

Figure 7: An example of the VISL format. The sentence is mājhī jamīn viknāṛ nāhī ‘I will not sell my land’.

# sent_id = 355
# text = "माझी जमीन विकणार नाही."
1	"	"	PUNCT	_	_	5	punct	_	SpaceAfter=No
2-3	माझी	_	_	_	_	_	_	_	_
2	_	मी	PRON	_	Number=Sing|Person=1	4	nmod:poss	_	SpaceAfter=No
3	_	चा	ADP	_	Gender=Fem|Number=Sing	2	case	_	_
4	जमीन	जमीन	NOUN	_	Case=Acc|Gender=Fem|Number=Sing	5	obj	_	_
5	विकणार	विकणे	VERB	_	Aspect=Prosp|VerbForm=Fin	0	root	_	_
6	नाही	नाही	AUX	_	Number=Sing|Person=1|Polarity=Neg|VerbForm=Fin	5	aux	_	SpaceAfter=No
7	.	.	PUNCT	_	_	5	punct	_	SpaceAfter=No
8	"	"	PUNCT	_	_	5	punct	_	_

Figure 8: The same sentence in the CoNLL-U format.

Dangerous Relations in Dependency Treebanks

Chiara Alzetta+, Felice Dell'Orletta*, Simonetta Montemagni*, Giulia Venturi*
+ Università degli Studi di Genova
* Istituto di Linguistica Computazionale "A. Zampolli" (ILC–CNR), ItaliaNLP Lab
[email protected], {felice.dellorletta,simonetta.montemagni,giulia.venturi}@ilc.cnr.it

Abstract

The paper illustrates an effective and innovative method for detecting erroneously annotated arcs in gold dependency treebanks, based on an algorithm originally developed to measure the reliability of automatically produced dependency relations. The method makes it possible to significantly restrict the error search space and, more importantly, to reliably identify patterns of systematic recurrent errors, which represent dangerous evidence for a parser, which will tend to replicate them. The achieved results demonstrate the effectiveness and reliability of the method.

1 Introduction
Dependency-based syntactic representations are playing an increasingly key role in applications such as machine translation and information extraction (Kübler et al., 2009). If, on the one hand, current state-of-the-art approaches to dependency parsing require large training corpora, on the other hand dependency treebanks are very expensive to build in terms of both time and human effort.
The process of developing a treebank can be carried out in different ways, i.e. through: fully manual annotation; semi-automatic annotation, obtained via human editing of the automatic output of relevant NLP tools (e.g. POS taggers, dependency parsers); or (semi-)automatic conversion from pre-existing resources. If fully manual annotation is time-consuming, costly and prone to inconsistencies even from a single annotator (Fort et al., 2012), semi-automatic annotation is faster and less prone to inconsistencies deriving from arbitrary decisions of a single annotator, but is subject to so-called "anchoring" effects, whereby human decisions are affected by pre-existing values, which include parser errors (Berzak et al., 2016). More recently, available resources are more and more often the result of a conversion process exploiting already existing annotated corpora: depending on whether conversion is carried out within the same syntactic representation paradigm, approaches can be constituency-to-dependency (Magerman, 1994; Yamada and Matsumoto, 2003; Nivre et al., 2006; Johansson and Nugues, 2007) or operate between dependency-based representations. Conversion can also be combined with merging and harmonization of different resources (Bosco et al., 2012): Nivre and Megyesi (2007) refer to this case as "cross-corpus harmonization". The conversion approach is particularly significant for less-resourced languages with limited annotated corpora, or in the case of multi-lingual resources. The latter case is exemplified by the Universal Dependencies (UD) initiative (Nivre, 2015),1 a recent community-driven effort to create cross-linguistically consistent dependency-annotated corpora, where 70% of the released treebanks originate from a conversion process and only 29% of them have been manually revised after automatic conversion.
Whatever strategy is adopted for treebank construction, the resulting annotated corpus unavoidably contains errors. For this reason, the treebank annotation phase is usually followed by another step aimed at detecting and correcting errors. But treebank validation is as time-consuming as the annotation process: from this follows the need for methods and techniques to support treebank validation by making the overall task fast and its result consistent and reliable. In principle, treebank validation is concerned

1 http://universaldependencies.org
Keywords: error detection, Universal Dependency Treebanks, quality control of treebanks

with different types of errors. Following Agrawal et al. (2013), we distinguish: random errors, which are inherently unpredictable, being typically due to annotators' distraction; and errors connected with the annotation guidelines, due either to misinterpretation of the guidelines by the annotator, or to constructions not explicitly or comprehensively covered in the annotation guidelines, and even errors in the provided guidelines, which keep evolving as long as annotation continues. To these, conversion errors should be added, i.e. errors due either to erroneous automatic mapping of an original annotation scheme onto a new scheme or to grey areas in the annotation of specific linguistic constructions. Whereas random errors are caused by unpredictable decisions of annotators, all other error types can be classified as systematic and recurrent errors, which are not just determined by chance but are introduced by inaccuracies inherent to the procedure which generated them (automatic pre-annotation or conversion) or by gaps in the annotation guidelines. In this paper, we will mainly focus on systematic and recurrent errors, which we qualify as "dangerous" because they provide potentially "misleading" evidence to a parser during training, i.e. evidence leading to the replication of errors in the parser output.
In the literature, both pattern-based and statistical approaches have been adopted for carrying out error detection and correction in a rapid and reliable way. Relying on the intuition that "variation in annotation can indicate annotation errors", Dickinson and Meurers (2003, 2005) and Boyd et al. (2008) proposed a variation n-gram detection method where the source of variation is the so-called variation nucleus, i.e. "a word which has different taggings despite occurring in the same context, in this case surrounded by identical words". This methodology has recently been reimplemented and extended by de Marneffe et al. (2017) to detect inconsistencies in the UD treebanks. The idea that the cases where two "parsers predict dependencies different from the gold standard" are "the most likely candidates when looking for errors" was tested by Volokh and Neumann (2011), who trained two parsers based on completely different parsing algorithms to reproduce the training data (i.e. the Penn Treebank). A similar pattern-based approach has also been proposed by Ambati et al. (2011), who complemented their method with a statistical module that, based on contextual features extracted from the Hindi treebank, was in charge of pruning previously identified candidate erroneous dependencies. Whereas all the aforementioned methods exploit corpus-internal evidence to detect inconsistencies within a given treebank, van Noord (2004) and de Kok et al. (2009) use external resources, i.e. they rely on the analysis of large automatically parsed corpora external to the treebank under validation. The underlying idea of these error mining techniques is that sentences with a low parsability score, i.e. sentences which have not received a successful analysis by the parser, very likely contain a parsing error.
This paper aims at testing the potential of algorithms developed to measure the reliability of automatically produced dependency relations for detecting erroneously annotated arcs in gold treebanks. In the literature, the output of this type of algorithm varies from a binary classification (correct vs. wrong), as in Che et al. (2014), to a ranking of dependencies on the basis of a quality score reflecting the reliability and plausibility of the automatic analysis (Dell'Orletta et al., 2013). Although these algorithms typically work on automatically annotated corpora (Dickinson, 2010), they have also been tested against corpora with manually revised (i.e. "gold") annotation: in this case, the typical aim is the identification of errors, or simply inconsistencies, in the annotation (Dickinson, 2015). In this work, we used an algorithm ranking dependencies by reliability, LISCA (Dell'Orletta et al., 2013), which was applied to a gold treebank to limit the search space for bootstrapping error patterns, i.e. systematic recurring errors (as opposed to random errors). Identified error patterns were then projected onto the whole corpus. Like Ambati et al. (2011), we drive error detection with statistical evidence which, in our approach, is acquired from an external, automatically annotated large reference corpus.

2 Error Detection Methodology

The methodology devised to detect candidate errors in dependency treebanks is based on the parse quality assessment algorithm named LISCA (LInguiStically-driven Selection of Correct Arcs) (Dell'Orletta et al., 2013). As illustrated in detail in Section 2.1, the algorithm exploits statistics about a wide range of linguistic features (covering different description levels, from raw text to morpho-syntax and dependency syntax) extracted from a large reference corpus of automatically parsed sentences, and uses them to assign a quality score to each dependency arc contained in a target corpus belonging to the same variety of use (e.g. textual genre) as the automatically parsed corpus, thus producing a decreasing ranking of arcs from correct to anomalous ones, potentially including incorrect ones. The underlying assumption is that syntactic structures that are more frequently generated by a parser are more likely to be correct than less frequently generated structures.

Figure 1: Features used by LISCA to measure arc(d, h, t) plausibility.

2.1 The LISCA Algorithm
LISCA takes as input a set of parsed sentences and assigns a plausibility score to each dependency, defined as a triple (d, h, t) where d is the dependent, h is the head, and t is the type of dependency connecting d to h. The algorithm operates in two steps: 1) it collects statistics about a set of linguistically motivated features extracted from a dependency-annotated corpus obtained through automatic dependency parsing, and 2) it combines the feature statistics extracted from the corpus used in the previous step. The final plausibility score associated with a given dependency arc results from the combination of the weights associated with these features: the score is computed as a simple product of the individual feature weights.2
Figure 1 summarizes the features taken into account by LISCA for measuring the plausibility of a given syntactic dependency (d, h, t). For the purposes of the present study, LISCA has been used in its de-lexicalized version in order to abstract away from variation resulting from lexical effects. In particular, two different types of features are considered: local features, corresponding to the characteristics of the syntactic arc considered (e.g. the distance in tokens between d and h, the associative strength linking the grammatical categories POSd and POSh involved in the relation, or the POS of the head's governor and the type of syntactic dependency connecting it to h); and global features, aimed at locating the arc within the overall syntactic structure of the sentence, with respect to both its hierarchical structure and the linear ordering of words (for example, the distance of d from the root of the tree, or from the closest or most distant leaf node, or the number of "sibling" and "child" nodes of d occurring respectively to its right or left in the linear order of the sentence).
LISCA has been successfully used both against the output of dependency parsers and against gold treebanks. While in the first case the plausibility score was meant to identify unreliable automatically produced dependency relations, in the second case it was used to detect shades of markedness of syntactic constructions in manually annotated corpora. The latter is the case of Tusa et al. (2016), where the LISCA ranking was used to investigate the linguistic notion of "markedness" (Haspelmath, 2016): a given linguistic construction is considered "marked" when it deviates from the "linguistic norm", i.e. it

2 For a detailed description of the features and the metrics used by LISCA, see Dell'Orletta et al. (2013).

is "abnormal". Accordingly, unmarked constructions are expected to be characterized by higher LISCA scores and, conversely, constructions characterized by increasing degrees of markedness are associated with lower scores. In the analysis of their linguistic results, Tusa et al. (2016) noticed that low-scored relations also included annotation errors. This observation prompted our research hypothesis, i.e. that problematic areas of human annotation can be identified by measuring the distance of the linguistic context characterizing the arcs in a gold treebank from the "linguistic norm" computed by LISCA with respect to a large reference corpus.
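To make the two steps concrete, the sketch below shows this style of scoring in a deliberately simplified form: the two toy de-lexicalized features and all names are ours, not the actual LISCA feature set or weighting scheme.

from collections import Counter

def arcs(sent):
    # Yield (dependent, head, deprel) triples from a sentence given as a
    # list of dicts with 'id', 'upos', 'head' and 'deprel' keys.
    by_id = {w["id"]: w for w in sent}
    for w in sent:
        if w["head"] in by_id:
            yield w, by_id[w["head"]], w["deprel"]

def feature_vector(dep, head, deprel):
    # Two toy features: capped linear distance between d and h, and the
    # association between (POSd, POSh) and the dependency type t.
    return [("len", min(abs(dep["id"] - head["id"]), 5)),
            ("pos", dep["upos"], head["upos"], deprel)]

def train(reference_sents):
    # Step 1: collect relative frequencies of feature values from a large
    # automatically parsed reference corpus.
    counts, totals = Counter(), Counter()
    for sent in reference_sents:
        for dep, head, rel in arcs(sent):
            for feat in feature_vector(dep, head, rel):
                counts[feat] += 1
                totals[feat[0]] += 1
    return {f: c / totals[f[0]] for f, c in counts.items()}

def plausibility(weights, dep, head, rel, floor=1e-6):
    # Step 2: the arc score is the product of its feature weights.
    score = 1.0
    for feat in feature_vector(dep, head, rel):
        score *= weights.get(feat, floor)
    return score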

2.2 Chasing errors with LISCA
According to these premises, errors in gold treebanks were searched for with LISCA, assuming that a higher number of variations of the linguistic context of an arc in the manual annotation with respect to the automatically generated arcs corresponds to a greater chance for the observed variation to be an error. In this respect, arc variation is observed whenever the linguistic context of an arc in the treebank differs from the corresponding one captured in the large reference corpus used to compute the LISCA score. Similarly to Ambati et al. (2011), we exploited the contextual features of an arc to identify erroneous annotations, but differently from them we looked for these features outside the treebank under analysis, thus overcoming the widely acknowledged data sparsity problem. By doing so, the error search space is restricted to relations with lower LISCA scores. The proposed error detection method is articulated into the following steps:

1. LISCA is run against the gold treebank and arcs are ordered by decreasing LISCA scores;

2. the resulting ranking of arcs is partitioned into 10 groups, henceforth "bins", each corresponding to 10% of the total (plus an 11th bin for the remaining arcs; a sketch of this binning follows the list);

3. the analysis is limited to the last three bins, containing the relations associated with the lowest LISCA scores: these bins are expected to gather a higher concentration of "abnormal" annotations, be they errors or less frequent constructions;

4. the selected bins are manually inspected to identify errors, both random and systematic (i.e. "dangerous relations");

5. recurring systematic errors which emerge from this manual inspection are formalized as error patterns, which are then projected onto the whole treebank;

6. the potentially erroneous arcs identified in all bins are manually validated and, whenever needed, corrected.
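As a concrete illustration of the ranking and binning in steps 1 and 2, consider the following sketch, where scored_arcs is a list of (score, arc) pairs and all names are ours:

def partition_into_bins(scored_arcs, n_bins=10):
    # Order arcs by decreasing plausibility score and split them into
    # n_bins equal bins, plus a final bin for the remaining arcs; manual
    # inspection then focuses on the last bins.
    ranked = sorted(scored_arcs, key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_bins
    bins = [ranked[i * size:(i + 1) * size] for i in range(n_bins)]
    bins.append(ranked[n_bins * size:])
    return bins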

Let us exemplify how the decreasing LISCA scores assigned to different instances of the same relation occurring within different linguistic contexts can be used to guide error detection.

                      B1    B2    B3    B4    B5    B6    B7    B8    B9    B10
Total occurrences     785   543   449   353   333   168   132   97    106   114
Errors (occurrences)  0     0     0     1     6     5     12    7     9     4
Errors (percentages)  0     0     0     0.28  1.80  2.97  9.09  7.21  8.49  3.51

Table 1: Occurrences of the mark relation in the IUDT newspaper section and erroneously annotated instances across the LISCA bins.

Table 1 reports the distribution of the UD mark relation (linking the function word introducing a subordinate clause to the verbal head of the clause) across the LISCA bins in the newspaper section of the Italian Universal Dependency Treebank (the gold treebank we used to test our methodology, described in Section 3). Although the relation occurs in all bins, its frequency of occurrence decreases proportionally with the decreasing scores assigned by LISCA. The higher frequency of the mark relation in the top LISCA bins can be explained by the generally fixed or only slightly variable structure underlying it: these occurrences correspond to canonical linguistic contexts which are closer to the "linguistic norm" as computed by LISCA with respect to the large reference corpus. By contrast, anomalous mark structures ended up in the last bins, in particular in the 7th–9th bins, for which a higher percentage of errors is reported (ranging between 7% and 9%).

3 Corpora

The proposed error detection methodology was tested against the Italian Universal Dependency Treebank (henceforth IUDT) (Bosco et al., 2013), which contains 13,815 sentences corresponding to 325,816 tokens. As de Marneffe et al. (2017) pointed out, UD treebanks represent a good testing bed for error detection techniques: most of them originate from a conversion process, often combined with merging and cross-corpus harmonization. In particular, IUDT results from the harmonization and merging of smaller dependency-based resources adopting incompatible annotation schemes into the Universal Dependencies annotation formalism, with the final aim of constructing a standard-compliant and larger resource for the Italian language: the Turin University Treebank (TUT; Bosco et al. (2000)) and ISST–TANL (originating from the ISST corpus; Montemagni et al. (2003)). For the specific concerns of this study, we focused on the section of IUDT containing newspaper articles, composed of 10,891 sentences, for a total of 154,784 tokens. This choice was aimed at avoiding possible interference in detecting anomalies due to textual genre variation: in such a case, "abnormal" relations would include not only possible errors but also constructions peculiar to a specific genre.
The corpus used to collect the statistics to build the LISCA model is the La Repubblica corpus, a collection of newspaper articles, part of the CLIC-ILC Corpus (Marinelli et al., 2003), for a total of 1,104,237 sentences (22,830,739 tokens). The corpus was morpho-syntactically annotated and parsed by the UDPipe pipeline (Straka et al., 2016) trained on IUDT, version 2.0 (Nivre et al., 2017).

4 Results

LISCA was used to rank the journalistic section of IUDT: the ranked relations were partitioned into 10 bins of about 14,600 arcs each, with an 11th bin containing the remaining 8,723 arcs. The manual revision focused on the last three bins (from the 9th to the 11th), covering 24.5% of the total number of arcs. At the end of the error detection and correction process, 789 arcs had been modified, corresponding to 0.51% of the arcs in IUDT news, distributed over 567 sentences (i.e. 5.21% of the sentences in IUDT news). Of those 789 arcs, 286 (36.01%) are random errors: interestingly, 185 of them (i.e. 65% of the random errors) are located in the 11th LISCA bin. The remaining 503 detected errors (63.99%) represent systematic errors, which were identified on the basis of error patterns manually identified in the last bins and then projected back onto the whole IUDT news section. These error patterns turned out to represent real errors in 85.63% of the cases, involving 483 sentences: this demonstrates the effectiveness of the identified potential error patterns.

4.1 Typology of Dangerous Relations
In what follows, we illustrate the main systematic errors, corresponding to the so-called "dangerous relations", which emerged from the analysis of the relations in the last three bins and which were formalized as the following six error patterns.3
Auxiliary verbs (aux head): this pattern refers to cases where an auxiliary verb (i.e. essere 'to be', avere 'to have', modals, periphrastic or copular verbs) was erroneously treated as the head of a dependency relation, as in the following example, where the personal pronoun noi 'us' was erroneously governed by the auxiliary verb è rather than by sufficiente, which represents the nonverbal predicate and root of the sentence:

3 In the following examples, the original wrong sentence is marked with O (Original), and the corrected one is marked with C (Correct).

O: Per noi è stato sufficiente che andassero via .
   Lit. 'For us it has been enough that they went away.'
   [noi attached to the auxiliary è as nmod]
C: Per noi è stato sufficiente che andassero via .
   [noi attached to sufficiente as obl]

Clausal modifier of a noun (acl4amod): this pattern refers to cases where bare past participles functioning as adjectival modifiers of nouns were erroneously annotated as clausal modifiers (i.e. acl). In these cases, the lemma, the part of speech and the type of dependency were modified, as in the following example, where the past participle gettonati 'selected' was erroneously i) associated with the lemma gettonare 'to select' instead of the lemma gettonato 'selected', ii) morpho-syntactically tagged as VERB rather than ADJ, and iii) linked to the head word nomi 'names' with the relation acl rather than amod:

O: ... Mussi e Torrente i nomi più gettonati ...
   Lit. '... Mussi and Torrente the names most selected ...'
   [gettonati tagged VERB, attached to nomi as acl]
C: ... Mussi e Torrente i nomi più gettonati ...
   [gettonati tagged ADJ, attached to nomi as amod]

Adjectival modifiers (amod4xcomp): this pattern refers to cases where adjectives functioning as secondary predicates of a verb were erroneously annotated as amod rather than xcomp, as in the following example, where the relation holding between the adjective vivo 'alive' and the head verb sepolto 'buried' was erroneously identified as adjectival modification (amod):

O: Sepolto vivo sotto gli occhi del figlio .
   Lit. 'Buried alive under the eyes of the son.'
   [vivo attached to Sepolto as amod]
C: Sepolto vivo sotto gli occhi del figlio .
   [vivo attached to Sepolto as xcomp]

Coordinating conjunctions (conj head): this pattern refers to cases where a coordinating conjunction was erroneously headed by the first conjunct (the coordination head), as in the following example, where the conjunction e 'and' was headed by notte 'night' rather than by giorno 'day':

PROPN VERB NOUN CC NOUN O Maricchia piangeva notte e giorno ... Lit. Maricchia was crying night and day .

root conj nsubj obl cc

PROPN VERB NOUN CC NOUN C Maricchia piangeva notte e giorno ... Lit. Maricchia was crying night and day .
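Because UD v2 attaches a coordinating conjunction to the conjunct that follows it, leftover v1-style attachments can be spotted positionally. A rough sketch (a leftward cc attachment is legitimate in a few marginal configurations, so matches are candidates to inspect, not certain errors):

def conj_head_candidates(path):
    """Flag cc tokens attached to a head that precedes them: under UD v2
    guidelines the conjunction should depend on the *following* conjunct."""
    for sent in read_sentences(path):
        for tid, form, upos, head, deprel in sent:
            if deprel == "cc" and 0 < head < tid:
                yield form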

Nominal modifiers (nmod4obl): this pattern refers to cases where an oblique argument was erroneously annotated as a nominal modifier (nmod) rather than as an oblique nominal (obl) when occurring in multiword expressions which were not correctly identified, as in the following example, where the noun tabella 'chart' was erroneously headed by the preposition di 'of' rather than by the verb andando 'going', and linked by the dependency relation nmod rather than obl:

[Dependency tree diagrams omitted.]
O/C: ... andando addirittura al di sotto della tabella di marcia ...
(VERB ADV ADP+DET ADP ADV ADP+DET NOUN)
Lit. '... going even below the roadmap ...'
In O, tabella is attached via nmod inside the unrecognized multiword expression al di sotto di; in C, it is attached to the verb andando via obl.
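A mirror-image check yields nmod4obl candidates: nmod arcs headed by a verb, since dependents of predicates should be obl. Again a hedged sketch on top of read_sentences; it intentionally overgenerates and would flag any nmod under a verbal head, not only the multiword cases discussed above:

def nmod4obl_candidates(path):
    """Flag nominals attached to a VERB via nmod: nmod is reserved for
    nominal heads in UD, so these arcs are candidates for relabeling as obl."""
    for sent in read_sentences(path):
        upos_of = {tid: upos for tid, _, upos, _, _ in sent}
        for tid, form, upos, head, deprel in sent:
            if deprel == "nmod" and upos_of.get(head) == "VERB":
                yield form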

Nonfinite verbs (obl4advcl|acl): this pattern refers to cases where nonfinite verbal constructions functioning as nominals were erroneously annotated as oblique nominals (obl) rather than as adverbial or adjectival clauses (advcl or acl), as in the following example, involving the verb pubblicare 'to publish':

[Dependency tree diagrams omitted.]
O/C: ... nel pubblicare gli atti di un convegno ... introducono ...
(ADP+DET VERB DET NOUN ADP DET NOUN VERB)
Lit. '... in publishing the proceedings of a conference ... (they) introduce ...'
In O, the nonfinite verb pubblicare is attached to the main verb introducono via obl; in C, via advcl.
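Finally, obl4advcl|acl candidates are simply verbal tokens carrying the nominal relation obl; one last sketch in the same style:

def obl4advcl_candidates(path):
    """Flag VERB tokens attached via obl: obl is defined for nominal
    dependents, so a nonfinite verb with this label is a candidate for
    advcl (under a predicate) or acl (under a nominal)."""
    for sent in read_sentences(path):
        for tid, form, upos, head, deprel in sent:
            if deprel == "obl" and upos == "VERB":
                yield form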

4.2 Discussion

The patterns illustrated above can be classified under three main categories: 1) head identification errors (aux head, conj head), 2) labeling errors (acl4amod, amod4xcomp, obl4advcl|acl), and 3) combined head identification and labeling errors (nmod4obl). Table 2 details the modified arcs for each pattern, while Figure 2 visualizes their distribution across the LISCA bins. The chart confirms our starting hypothesis, namely that most systematic errors are concentrated in the last bins, whereas the first LISCA bins tend to contain no or very few errors. Note that the 11th bin is not included in the chart, since it turned out to contain only random errors (as opposed to systematic ones). If we try to trace the origin of the identified and corrected recurrent errors, it is worth noting that the most frequent error type recorded in Table 2, acl4amod, corresponds to a notoriously problematic annotation area for all treebanks, i.e. the distinction between participial and adjectival usages. More interestingly, this is an area for which the original resources combined in IUDT (i.e. TUT and ISST–TANL) followed different guidelines: TUT preferred the verbal reading, which naturally led to a (reduced) relative clause interpretation, whereas ISST–TANL resorted in these cases to a general modifier relation. The second and third most frequent error types (conj head and aux head) are connected with substantial changes from version 1.4 to 2.0 of the Universal Dependencies annotation guidelines. Last but not least, the error types amod4xcomp and nmod4obl seem rather to stem from annotation inconsistencies internal to the treebank.

Error pattern                             %        Count
Auxiliary verbs (aux head)                13.32    67
Clausal modifiers of noun (acl4amod)      36.98    186
Adjectival modifiers (amod4xcomp)         12.52    63
Coordinating conjunctions (conj head)     24.65    124
Nominal modifiers (nmod4obl)               6.76    34
Nonfinite verbs (obl4advcl|acl)            5.77    29
Total number of errors                             503

Table 2: Distribution (percentage and absolute values) of error types in IUDT.

Figure 2: Distribution of modified arcs for each error pattern across the LISCA bins.

5 Conclusion and Current Directions of Research

We proposed an effective and innovative method for detecting erroneously annotated arcs in gold treebanks, based on LISCA, an algorithm originally developed to measure the reliability of automatically produced dependency relations. The method makes it possible to significantly restrict the error search space and, more importantly, to reliably identify patterns of systematic recurrent errors, which represent dangerous and misleading evidence for a parser. The achieved results demonstrate the effectiveness of the method. Of all corrected errors (both random and systematic), 64% are systematic errors, typically originating from semi-automatic annotation or conversion. The effectiveness of the identified patterns is demonstrated by the fact that, in the whole IUDT news section, 85.67% of the sentences instantiating at least one error pattern contain real errors. In principle, this method, operating within the dependency-based representation framework, is independent of language and annotation scheme. As a preliminary experiment in this direction, we checked the presence of some detected error patterns (namely, those due to problematic annotation areas and to guideline changes between treebank versions) in other UD treebanks.4 We looked for sentences instantiating the constructions corresponding to our error patterns in different UD treebanks: the patterns turned out to appear both in languages typologically close to Italian (e.g. French, Spanish and Portuguese) and in typologically distant ones (e.g. English, Arabic, Czech, Finnish, Turkish and Chinese). For most of these languages, the total number of sentences containing the patterns is in line with the number we found for Italian (between 3% and 7% of the total number of sentences in the treebank), with the exception of Turkish and Chinese, where it is much higher (around 15%). This also holds for the distribution of patterns: as for Italian, the most frequent pattern observed in all the UD treebanks taken into account is acl4amod. This very preliminary evidence extracted from different UD treebanks still needs to be validated through a collaboration between the different UD national teams, to assess whether the identified anomalous patterns represent real errors. Current developments also include assessing the impact of the detected and corrected errors on the performance of dependency parsers.
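The cross-treebank check can be approximated by running per-sentence versions of the same detectors over other UD treebanks and measuring the share of sentences instantiating at least one pattern. The predicates and the file name below are illustrative assumptions built on the earlier sketches, not the authors' tooling:

# Sketch: share of sentences per treebank instantiating at least one pattern.

def has_aux_head(sent):
    upos_of = {tid: upos for tid, _, upos, _, _ in sent}
    return any(h != 0 and upos_of.get(h) == "AUX" for _, _, _, h, _ in sent)

def has_v1_cc(sent):
    return any(d == "cc" and 0 < h < t for t, _, _, h, d in sent)

def pattern_rate(path, predicates):
    total = flagged = 0
    for sent in read_sentences(path):
        total += 1
        if any(pred(sent) for pred in predicates):
            flagged += 1
    return 100.0 * flagged / max(total, 1)

# e.g. pattern_rate("fr_gsd-ud-train.conllu", [has_aux_head, has_v1_cc])
# would give a per-language figure comparable to the 3-7% reported above.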

4 For this purpose we used the Dep search tool developed by the Turku NLP Group.

Acknowledgments

The work reported in this paper was partially supported by the two-year project (2016–2018) Smart News, Social sensing for breaking news, funded by Regione Toscana (BANDO FAR-FAS 2014). Thanks are also due to the University of Pisa, which funded a post-graduate fellowship with a Google gift.

References

B. Agrawal, R. Agarwal, S. Husain, and D. M. Sharma. 2013. An Automatic Approach to Treebank Error Detection Using a Dependency Parser. Springer Berlin Heidelberg, Berlin, Heidelberg, pages 294–303.

B. R. Ambati, R. Agarwal, M. Gupta, S. Husain, and D. M. Sharma. 2011. Error Detection for Treebank Validation. In Proceedings of the 9th International Workshop on Asian Language Resources (ALR).

Y. Berzak, Y. Huang, A. Barbu, A. Korhonen, and B. Katz. 2016. Anchoring and Agreement in Syntactic Annotations. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 2215–2224.

C. Bosco, V. Lombardo, L. Lesmo, and D. Vassallo. 2000. Building a Treebank for Italian: a Data-driven Annotation Schema. In Proceedings of the 2nd Language Resources and Evaluation Conference (LREC’00). Athens, Greece, pages 99–105.

C. Bosco, S. Montemagni, and M. Simi. 2012. Harmonization and Merging of two Italian Dependency Treebanks. In Proceedings of the LREC 2012 Workshop on Language Resource Merging. Istanbul, Turkey.

C. Bosco, S. Montemagni, and M. Simi. 2013. Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank. In Proceedings of the ACL Linguistic Annotation Workshop & Interoperability with Discourse. Sofia, Bulgaria.

A. Boyd, M. Dickinson, and W. D. Meurers. 2008. On Detecting Errors in Dependency Treebanks. Research on Language & Computation 6(2):113–137.

W. Che, J. Guo, and T. Liu. 2014. Reliable Dependency Arc Recognition. Expert Systems with Applications 41(4):1716–1722.

D. de Kok, J. Ma, and G. van Noord. 2009. A Generalized Method for Iterative Error Mining in Parsing Results. In Proceedings of the Workshop on Grammar Engineering Across Frameworks (GEAF 2009).

M. C. de Marneffe, M. Grioni, J. Kanerva, and F. Ginter. 2017. Assessing the Annotation Consistency of the Universal Dependencies Corpora. In Proceedings of the 4th International Conference on Dependency Linguistics (Depling 2017). Pisa, Italy, pages 108–115.

F. Dell’Orletta, G. Venturi, and S. Montemagni. 2013. Linguistically-driven Selection of Correct Arcs for Dependency Parsing. Computación y Sistemas 2:125–136.

M. Dickinson. 2010. Detecting Errors in Automatically-Parsed Dependency Relations. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, pages 729–738.

M. Dickinson. 2015. Detection of Annotation Errors in Corpora. Language and Linguistics Compass 9(3):119–138.

M. Dickinson and W. D. Meurers. 2003. Detecting Inconsistencies in Treebanks. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003).

M. Dickinson and W. D. Meurers. 2005. Detecting Errors in Discontinuous Structural Annotation. In Proceedings of the 43rd Annual Meeting of the ACL. pages 322–329.

K. Fort, A. Nazarenko, and S. Rosset. 2012. Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis. In Proceedings of COLING 2012. pages 895–910.

M. Haspelmath. 2006. Against Markedness (and What to Replace It With). Journal of Linguistics 42:25–70.

R. Johansson and P. Nugues. 2007. Extended Constituent-to-Dependency Conversion for English. In Proceedings of NODALIDA 2007.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

D. M. Magerman. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.

R. Marinelli, L. Biagini, R. Bindi, S. Goggi, M. Monachini, P. Orsolini, E. Picchi, S. Rossi, N. Calzolari, and A. Zampolli. 2003. The Italian PAROLE Corpus: an Overview. Linguistica Computazionale XVI–XVII:401–421.

S. Montemagni, F. Barsotti, M. Battista, N. Calzolari, A. Lenci, O. Corazzari, A. Zampolli, F. Fanciulli, M. Massetani, R. Basili, R. Raffaelli, M.T. Pazienza, D. Saracino, F. Zanzotto, F. Pianesi, N. Mana, and R. Delmonte. 2003. Building the Italian Syntactic-Semantic Treebank. In Anne Abeillé, editor, Treebanks. Building and Using Parsed Corpora, Springer Science+Business Media, LLC, pages 189–210.

J. Nivre. 2015. Towards a Universal Grammar for Natural Language Processing. In Computational Linguistics and Intelligent Text Processing – Proceedings of the 16th International Conference, CICLing 2015, Part I. Cairo, Egypt, pages 3–16.

J. Nivre, J. Hall, and J. Nilsson. 2006. MaltParser: A Data-driven Parser Generator for Dependency Parsing. In Proceedings of LREC 2006.

J. Nivre and B. Megyesi. 2007. Bootstrapping a Swedish Treebank Using Cross-Corpus Harmonization and Annotation Projection. In Proceedings of the 6th International Workshop on Treebanks and Linguistic Theories (TLT). pages 97–102.

J. Nivre, Ž. Agić, L. Ahrenberg, et al. 2017. Universal Dependencies 2.0. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University.

M. Straka, J. Hajič, and J. Straková. 2016. UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC).

E. Tusa, F. Dell’Orletta, S. Montemagni, and G. Venturi. 2016. Dieci sfumature di marcatezza sintattica: verso una nozione computazionale di complessità ['Ten shades of syntactic markedness: towards a computational notion of complexity']. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it). Napoli, Italy, pages 3–16.

G. van Noord. 2004. Error Mining for Wide-coverage Grammar Engineering. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.

A. Volokh and G. Neumann. 2011. Automatic Detection and Correction of Errors in Dependency Treebanks. In Proceedings of ACL-HLT 2011.

H. Yamada and Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT).

Author Index

Alzetta, Chiara 201
Aplonova, Ekaterina 138

Bejček, Eduard 56
Bildhauer, Felix 95
Bobicev, Victoria 167
Bragaglia, Stefano 37

Candito, Marie v
Cetoli, Alberto 37
Cordeiro, Silvio Ricardo 64
Courtin, Marine 181

de Kok, Daniël 1
Dell’Orletta, Felice 201
Dima, Corina 1

Fischer, Patricia 1

Gerdes, Kim 181
Gonzalez Saavedra, Berta 18

Hajičová, Eva 56
Hinrichs, Erhard 1

Kahane, Sylvain 181
Klyueva, Natalia 73
Kunilovskaya, Maria 27
Kutuzov, Andrey 27

Lyashevskaya, Olga 80

Maranduc, Catalina 167
Metola Rodríguez, Darío 88
Mikulová, Marie 56
Mititelu, Catalin 167
Montemagni, Simonetta 201

O’Harney, Andrew 37
Odijk, Jan 46
Osenova, Petya 129
Øvrelid, Lilja vi

Panevová, Jarmila 56
Panteleeva, Irina 80
Passarotti, Marco 18

Qasemizadeh, Behrang 73

Rama, Taraka 119
Ravishankar, Vinit 190
Rehbein, Ines 95
Rosa, Rudolf 106

Savary, Agata 64
Sheyanova, Maria 10
Simov, Kiril 129
Sloan, Marc 37
Spoel, Sheean 46
Steiner, Petra 146
Søgaard, Anders 161

Tyers, Francis M. 10, 138
Tío Sáenz, Marta 88

Vajjala, Sowmya 119
van der Klis, Martijn 46
Venturi, Giulia 201
Vernerova, Anna 73

Washington, Jonathan 10

Žabokrtský, Zdeněk 106

Keyword Index

adjective 1
adverbial clauses 119
annotation 1, 10, 181, 190

Bambara 138

CHILDES 46
comparison 1
compounding 146
convolution 37
coreference 18
corpus query 73
cross-lingual parsing 106
cross-lingual tagging 106

dependency 167, 181, 190
dependency annotation of learner treebank 80
dependency parsing 27, 95
dependency tree 37
derivation 146
design principles 161
distributions 1
Dutch 1

ellipsis resolution 18
English-Russian translation 27
enhanced dependencies 129
error detection 201
evaluation of parser quality 80

form-function relation 56
format converter 167
functors 56

generalization 161
genre and register variation 95
German 1, 146
GermaNet 146
grammatical role embeddings 129
graph 37
GrETEL 46

hierarchical clustering 18
human translation varieties 27

idiomaticity rate 64
idioms 181
Indic 190
Indo-Aryan 190

interface 10

L2 English 80
Latin 18
learner corpus 80
lemmatization 1
literal reading 64
literature 18

Mande 138
Marathi 190
morphological analysis 146
morphological treebank 146
multiword expressions 64, 73, 181

NER 37

Old English 88
Old Romanian 167

parser adaptation 95
Polish 64
PP 1
preposition 1, 56

quality control of treebanks 201
querying 46

relative clauses 119

Sallust 18
semantic relations 167
semantic role labeling 18
semi-automatic lemmatisation 88
Sketch Engine 73
syntactic features 27
syntactic relations 167
syntax 181

Telugu 119
transfer between languages 129
translation classification 27
treebank 1, 46, 64, 88, 138, 161, 167

Universal Dependencies 10, 27, 80, 106, 119, 181, 190, 201

verb 1
verb-derived 1
verbal multiword expressions 73

weak verbs 88
