Proceedings of CLARIN Annual Conference 2020

Total Page:16

File Type:pdf, Size:1020Kb

Proceedings of CLARIN Annual Conference 2020 CLARIN Annual Conference 2020 PROCEEDINGS Edited by Costanza Navarretta, Maria Eskevich 05 – 07 October 2020 Virtual Edition Please cite as: Proceedings of CLARIN Annual Conference 2020. Eds. C. Navarretta and M. Eskevich. Virtual Edition, 2020. Programme Committee Chair: Costanza Navarretta, University of Copenhagen, Denmark • Members: Lars Borin, University of Gothenburg (SE) • António Branco, Universidade de Lisboa (PT) • Tomaž Erjavec, Jožef Stefan Institute (SI) • Eva Hajicová,ˇ Charles University Prague (CZ) • Erhard Hinrichs, University of Tübingen (DE) • Nicolas Larrousse, Huma-Num (FR) • Krister Lindén, University of Helsinki (FI) • Monica Monachini, Institute of Computational Linguistics “A. Zampolli" (IT) • Karlheinz Mörth, Austrian Academy of Sciences (AT) • Jan Odijk, Utrecht University (NL) • Maciej Piasecki, Wrocław University of Science and Technology (PL) • Stelios Piperidis, Institute for Language and Speech Processing (ILSP), Athena Research Center • (GR) Eirikur Rögnvaldsson, University of Iceland (IS) • Kiril Simov, IICT, Bulgarian Academy of Sciences (BG) • Inguna Skadina, University of Latvia (LV) • , Koenraad De Smedt, University of Bergen (NO) • Marko Tadicˇ , University of Zagreb (HR) • Jurgita Vaicenonienˇ e,˙ Vytautas Magnus University (LT) • Tamás Váradi, Research Institute for Linguistics, Hungarian Academy of Sciences (HU) • Kadri Vider, University of Tartu (EE) • Martin Wynne, University of Oxford (UK) • ii Reviewers: Lars Borin, SE Jan Odijk, NL • • António Branco, PT Stelios Piperidis, GR • • Tomaž Erjavec, SI Eirikur Rögnvaldsson, IS • • Eva Hajicová,ˇ CZ • Kiril Simov, BG • Martin Hennelly, ZA • Inguna Skadina, LV • , Erhard Hinrichs, DE • Koenraad De Smedt, NO Marinos Ioannides, CY • • Marko Tadic,´ HR Nicolas Larrousse, FR • • Jurgita Vaicenonienˇ e,˙ LT Krister Lindén, FI • • Monica Monachini, IT Tamás Váradi, HU • • Karlheinz Mörth, AT Kadri Vider, EE • • Costanza Navarretta, DK Martin Wynne, UK • • Subreviewers: Federico Boschetti, IT Fahad Khan, IT • • Daan Broeder, NL Daniël de Kok, DE • • Francesca Frontini, FR Penny Labropoulou, GR • • Maria Gavriilidou, GR Christophe Parisse, FR • • Riccardo Del Gratta, IT Niccolò Pretto, IT • • Neeme Kahusk, EE Thorsten Trippel, DE • • Aleksei Kelli, EE Valeria Quochi, IT • • CLARIN 2020 submissions, review process and acceptance Call for abstracts: 9 January 2020, 24 February 2020 • Submission deadline: 28 April 2020 • In total 40 submissions were received and reviewed (three reviews per submission) • Virtual PC meeting: 15-16 June 2020 • Notifications to authors: 22 June 2020 • 36 accepted submissions • More details on the paper selection procedure and the conference can be found at https://www.clarin. eu/event/2020/clarin-annual-conference-2020-virtual-form. iv Table of Contents Resources and Knowledge Centres for Language and AI Research The CLARIN Resource and Tool Families Jakob Lenardicˇ and Darja Fišer . 1 An Internationally FAIR Mediated Digital Discourse Corpus: Towards Scientific and Pedagogical Reuse Rachel Panckhurst and Francesca Frontini . 6 The First Dictionares in Esperanto: Towards the Creation of a Parallel Corpus Denis Eckert and Francesca Frontini . 11 Digital Neuropsychological Tests and Biomarkers: Resources for NLP and AI Exploration in the Neu- ropsychological Domain Dimitrios Kokkinakis and Kristina Lundholm Fors . 15 CORLI: The French Knowledge-Centre Eva Soroli, Céline Poudat, Flora Badin, Antonio Balvet, Elisabeth Delais-Roussarie, Carole Etienne, Lydia-Mai Ho-Dac, Loïc Liégeois and Christophe Parisse . 19 The CLASSLA Knowledge Centre for South Slavic Languages Nikola Ljubešic,´ Petya Osenova, Tomaž Erjavec and Kiril Simov . 23 Annotation and Visualization Tools Sticker2: A Neural Syntax Annotator for Dutch and German Daniël de Kok, Neele Falk and Tobias Pütz . 27 Exploring and Visualizing Data with GermanNet Rover Marie Hinrichs, Richard Lawrence and Erhard Hinrichs . 32 Named Entity Recognition for Distant Reading in ELTeC Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos and Ranka Stankovic´ 37 Towards Semi-Automatic Analysis of Spontaneous Language for Dutch JanOdijk .............................................................................. 42 A Neural Parsing Pipeline for Icelandic Using the Berkeley Neural Parser Þórunn Arnardóttir and Anton Karl Ingason . 48 Research Cases Annotating Risk Factor Mentions in the COVID-19 Open Research Dataset Maria Skeppstedt, Magnus Ahltorp, Gunnar Eriksson and Rickard Domeij . 52 Contagious "Corona" Compounding in a CLARIN Newspaper Monitor Corpus Koenraad De Smedt . 56 Trawling the Gulf of Bothnia of News: A Big Data Analysis of the Emergence of Terrorism in Swedish and Finnish Newspapers, 1780–1926 v Mats Fridlund, Daniel Brodén, Leif-Jöran Olsson, Lars Borin . 61 Studying Emerging New Contexts for Museum Digitisations on Pinterest Bodil Axelsson, Daniel Holmer, Lars Ahrenberg and Arne Jönsson . 66 Evaluation of a Two-OCR Engine Method: First Results on Digitized Swedish Newspapers Spanning over nearly 200 Years Dana Dannélls, Lars Björk, Torsten Johansson and Ove Dirdal . 71 Stimulating Knowledge Exchange via Trans-National Access – the ELEXIS Travel Grants as a Lexico- graphical Use Case Sussi Olsen, Bolette S. Pedersen, Tanja Wissik, Anna Woldrich and Simon Krek . 77 Repositories and Workflows PoetryLab as Infrastructure for the Analysis of Spanish Poetry Javier De la Rosa, Álvaro Pérez, Laura Hernández, Aitor Díaz, Salvador Ros and Elena González- Blanco ................................................................................ 82 Reproducible Annotation Services for WebLicht Daniël de Kok and Neele Falk . 88 The CLARIN-DK Text Tonsorium Bart Jongejan . 93 Integrating TEITOK and Kontext at LINDAT Marten Janssen . 98 CLARINO+ Optimization of Wittgenstein Research Tools Alois Pichler . 102 Using the FLAT Repository: Two Years In Paul Trilsbeek . 107 Data Curation, Archives and Libraries Building a Home for Italian Audio Archives Silvia Calamai, Niccolò Pretto, Monica Monachini, Maria Francesca Stamuli, Silvia Bianchi and Pierangelo Bonazzoli . 112 Digitizing University Libraries – Evolving from Full Text Providers to CLARIN Contact Points on Cam- puses Manfred Nölte and Martin Mehlberg . 117 “Tea for Two”: The Archive of the Italian Latinity of the Middle Ages meets the CLARIN Infrastructure Federico Boschetti, Riccardo Del Gratta, Monica Monachini, Marina Buzzoni, Paolo Monella and Roberto Rosselli Del Turco . 121 Use Cases of the ISO Standard for Transcription of Spoken Language in the Project INEL Anne Ferger and Daniel Jettka . 126 Evaluating and Assuring Research Data Quality for Audiovisual Annotated Language Data Timofey Arkhangelskiy and Hanna Hedeland . 131 Towards Comprehensive Definitions of Data Quality for Audiovisual Annotated Language Resources Hanna Hedeland . 136 Towards an Interdisciplinary Annotation Framework: Combining NLP and Expertise in Humanities Laska Laskova, Petya Osenova and Kiril Simov . 141 Metadata and Legal Aspects Signposts for CLARIN Denis Arnold, Bernhard Fisseni and Thorsten Trippel . 146 Extending the CMDI Universe: Metadata for Bioinformatics Data Olaf Brandt, Holger Gauza, Steve Kaminski, Mario Trojan and Thorsten Trippel . 151 The CMDI Explorer Denis Arnold, Ben Campbell, Thomas Eckart, Bernhard Fisseni, Thorsten Trippel and Claus Zinn 157 Going to the ALPS: A Tool to Support Researchers and Help Legality Awareness Building Veronika Gründhammer, Vanessa Hannesschläger and Martina Trognitz . 162 When Size Matters. Legal Perspective(s) on N-grams Pawel Kamocki . 166 CLARIN Contractual Framework for Sharing Language Data: The Perspective of Personal Data Pro- tection Aleksei Kelli, Krister Lindén, Kadri Vider, Pawel Kamocki, Ramunas¯ Birštonas, Gaabriel Tavits, Penny Labropoulou, Mari Keskküla and Arvi Tavast . 170 Resources and Knowledge Centers for Language and AI Research 1 Extending the CLARIN Resource and Tool Families Jakob Lenardicˇ1 Darja Fiserˇ 1,2 1Dept. of Translation, Faculty of Arts 2Dept. of Knowledge Technologies University of Ljubljana, Slovenia Jozefˇ Stefan Institute, Slovenia [email protected] [email protected] Abstract This paper presents the current state of the CLARIN Resource and Tool families initiative, the aim of which is to provide user-friendly overviews of the available language resources and tools in the CLARIN infrastructure for researchers from digital humanities, social sciences and human language technologies. The initiative now consists of a total of 11 corpus families, 5 families of lexical resource resources, and 4 tool families, which together amount to 950 manually curated tools and resources as of 17 August 2020. We present the initiative from the perspective of miss- ing metadata as well as problems related to the general accessibility of the tools and resources and their findability in the Virtual Language Observatory (VLO). 1 Introduction This paper presents the current state of the CLARIN Resource and Tool families initiative,1 the aim of which is to provide user-friendly overviews of the available language resources and tools in the CLARIN infrastructure for researchers from digital humanities, social sciences and human language technologies. When the initiative first made its debut at LREC in 2018, it consisted of early versions of only 4 corpus families – parliamentary, computer-mediated communication (CMC), parallel, and newspaper corpora (Fiserˇ et al., 2018). Since then, the initiative has been expanded substantially and now consists
Recommended publications
  • CLIB 2016 Proceedings
    The Second International Conference Computational ​ Linguistics in Bulgaria (CLIB 2016) is organised within ​ the Operation for Support for International Scientific Conferences Held in Bulgaria of the National Science ​ Fund Grant № ДПМНФ 01/9 of 11 Aug 2016. National Science Fund ​ ​ CLIB 2016 is organised by: The Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences PUBLICATION AND CATALOGUING INFORMATION Title: Proceedings of the Second International Conference Computational Linguistics in Bulgaria (CLIB 2016) ISSN: 2367­5675 (online) Published and The Institute for Bulgarian Language Prof. Lyubomir ​ distributed by: Andreychin, Bulgarian Academy of Sciences ​ Editorial address: Institute for Bulgarian Language Prof. Lyubomir ​ Andreychin, Bulgarian Academy of Sciences ​ 52 Shipchenski prohod blvd., bldg. 17 Sofia 1113, Bulgaria +359 2/ 872 23 02 Copyright of each paper stays with the respective authors. The works in the Proceedings are licensed under a Creative Commons Attribution 4.0 International Licence (CC BY 4.0). Licence details: http://creativecommons.org/licenses/by/4.0 ​ Proceedings of the Second International Conference Computational Linguistics in Bulgaria 9 September 2016 Sofia, Bulgaria PREFACE We are excited to welcome you to the second edition of the International Conference Computational ​ Linguistics in Bulgaria (CLIB 2016) in Sofia, Bulgaria! ​ CLIB aspires to foster the NLP community in Bulgaria and further the cooperation among researchers working in NLP for Bulgarian around the world. The need for a conference dedicated to NLP research dealing with or applicable to Bulgarian has been felt for quite some time. We believe that building a strong community of researchers and teams who have chosen to work on Bulgarian is a key factor to meeting the challenges and requirements posed to computational linguistics and NLP in Bulgaria.
    [Show full text]
  • Conference Abstracts
    EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Patronage of Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner MAY 23-24-25, 2012 ISTANBUL LÜTFI KIRDAR CONVENTION & EXHIBITION CENTRE ISTANBUL, TURKEY CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis. Assistant Editors: Hélène Mazo, Sara Goggi, Olivier Hamon © ELRA – European Language Resources Association. All rights reserved. LREC 2012, EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2012 Conference Abstracts Distributed by: ELRA – European Language Resources Association 55-57, rue Brillat Savarin 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] Copyright by the European Language Resources Association ISBN 978-2-9517408-7-7 EAN 9782951740877 All rights reserved. No part of this book may be reproduced in any form without the prior permission of the European Language Resources Association ii Introduction of the Conference Chair Nicoletta Calzolari I wish first to express to Ms Neelie Kroes, Vice-President of the European Commission, Digital agenda Commissioner, the gratitude of the Program Committee and of all LREC participants for her Distinguished Patronage of LREC 2012. Even if every time I feel we have reached the top, this 8th LREC is continuing the tradition of breaking previous records: this edition we received 1013 submissions and have accepted 697 papers, after reviewing by the impressive number of 715 colleagues.
    [Show full text]
  • Morphological Doublets in Croatian: a Multi-Methodological Analysis
    Morphological Doublets in Croatian: A multi-methodological analysis By: Dario Lečić A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy The University of Sheffield Faculty of Arts and Humanities Department of Russian and Slavonic Studies 20 January, 2017 Acknowledgments Many a PhD student past and present will agree that doing a PhD is a time-consuming process with lots of ups and downs, motivational issues and even a number of nervous breakdowns. Having experienced all of these, I can only say that they were right. However, having reached the end of the tunnel, I have to admit that the feeling is great. I would like to use this opportunity to thank all the people who made this possible and who have helped me during these four years spent researching the intricate world of morphological doublets in Croatian. First of all, I would like to express my immense gratitude to my primary supervisor, Professor Neil Bermel from the Department or Russian and Slavonic Studies at the University of Sheffield for offering his guidance from day one. Our regular supervisory meetings as well as numerous e-mail exchanges have been eye-opening and I would not have been able to do this without you. I hope this dissertation will justify all the effort you have put into me as your PhD student. Even though the jurisdiction of the second supervisor as defined by the University of Sheffield officially stretches mostly to matters of the Doctoral Development Programme, my second supervisor, Dr Dagmar Divjak, nevertheless played a major role in this research as well, primarily in matters of statistics.
    [Show full text]
  • Languages in the European Information Society Croatian
    META-NET White Paper Series Languages in the European Information Society Croatian Early Release Edition META-FORUM 2011 27-28 June 2011 Budapest, Hungary The development of this white paper has been funded by the Seventh Framework Programme and the ICT Policy Support Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899). This white paper is for educators, journalists, politicians, language communities and others, who want to establish a truly multilingual Europe. This white paper is part of a series that promotes knowledge about language technology and its potential. The availability and use of language technology in Europe varies between languages. Conse- quently, the actions that are required to further support research and development of language technologies also differs for each language. The required actions depend on many factors, such as the complexity of a given language and the size of its community. META-NET, a European Commission Network of Excellence, has conducted an analysis of current language resources and technolo- gies. This analysis focused on the 23 official European languages as well as other important regional languages in Europe. The results of this analysis suggests that there are many significant research gaps for each language. A more detailed, expert analysis and as- sessment of the current situation will help maximise the impact of additional research and minimize any risks. META-NET consists of 44 research centres from 31 countries who are working with stakeholders from commercial businesses, gov- ernment agencies, industry, research organisations, software com- panies, technology providers and European universities.
    [Show full text]
  • Overabundance in Croatian Dual-Class Verbs FLUMINENSIA, God
    Tomislava Bošnjak Botica, Gordana Hržica, Overabundance in Croatian dual-class verbs FLUMINENSIA, god. 28 (2016), br. 1 Tomislava Bošnjak Botica, Gordana Hržica OVERABUNDANCE IN CROATIAN DUAL-CLASS VERBS dr. sc. Tomislava Bošnjak Botica, Institut za hrvatski jezik i jezikoslovlje, [email protected], Zagreb dr. sc. Gordana Hržica, Edukacijsko-rehabilitacijski fakultet, [email protected], Zagreb izvorni znanstveni članak UDK 811.163.42’367.625 rukopis primljen: 5. 4. 2016.; prihvaćen za tisak: 21. 6. 2016. Croatian verbal inflection morphology is typically described using verb class distinctions. The number of classes differs among approaches, but the basic criterion for class division is the presence or absence and the type of suppletion in verb stems. Generally, one verb belongs to one inflectional class or paradigm only. However, some verbs belong to two classes, i.e. they have two parallel sets of stems. In such dual-class verbs, one infinitive form is realizable in two present forms in all cells within a class, i.e. there is an overabundance (Thorton 2011). Inevitably, one of the stem forming paradigms is a class with categorial suppletion. The present stem of a categorial suppletion class has a greater phonological distance from the infinitive stem than the present stem of the other class. Using a different terminology one class can be described as more transparent, while the other is less transparent (more opaque) in forming the present stem. This study attempts to present overabundance in dual-class verbs and to determine whether competition in such forms can be explained by their tendency to conform to one default class or by other factors, specifically, by the phonological distance between the two paradigms of dual-class verbs.
    [Show full text]
  • The Bulgarian National Corpus: Theory and Practice in Corpus Design
    The Bulgarian National Corpus: Theory and Practice in Corpus Design Svetla Koeva, Ivelina Stoyanova, Svetlozara Leseva, Tsvetana Dimitrova, Rositsa Dekova, and Ekaterina Tarpomanova Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences, Sofia, Bulgaria abstract The paper discusses several key concepts related to the development of Keywords: corpora and reconsiders them in light of recent developments in NLP. corpus design, On the basis of an overview of present-day corpora, we conclude that Bulgarian National Corpus, the dominant practices of corpus design do not utilise the technologies computational adequately and, as a result, fail to meet the demands of corpus linguis- linguistics tics, computational lexicology and computational linguistics alike. We proceed to lay out a data-driven approach to corpus design, which integrates the best practices of traditional corpus linguistics with the potential of the latest technologies allowing fast collection, automatic metadata description and annotation of large amounts of data. Thus, the gist of the approach we propose is that corpus design should be centred on amassing large amounts of mono- and multilin- gual texts and on providing them with a detailed metadata description and high-quality multi-level annotation. We go on to illustrate this concept with a description of the com- pilation, structuring, documentation, and annotation of the Bulgar- ian National Corpus (BulNC). At present it consists of a Bulgarian part of 979.6 million words, constituting the corpus kernel, and 33 Bulgarian-X language corpora, totalling 972.3 million words, 1.95 bil- lion words (1.95×109) altogether. The BulNC is supplied with a com- prehensive metadata description, which allows us to organise the texts according to different principles.
    [Show full text]
  • Deliverable D3.3.-B Third Batch of Language Resources
    Contract no. 271022 CESAR Central and South-East European Resources Project no. 271022 Deliverable D3.3.-B Third batch of language resources: actions on resources Version No. 1.1 07/02/2013 D3.3-B V 1.1 Page 1 of 81 Contract no. 271022 Document Information Deliverable number: D3.3.-B Deliverable title: Third batch of language resources: actions on resources Due date of deliverable: 31/01/2013 Actual submission date of 08/02/2013 deliverable: Main Author(s): György Szaszák (BME-TMIT) Participants: Mátyás Bartalis, András Balog (BME-TMIT) Željko Agić, Božo Bekavac, Nikola Ljubešić, Sanda Martinčić- Ipšić, Ida Raffaelli, add Jan Šnajder, Krešo Šojat, Vanja Štefanec, Marko Tadić (FFZG) Cvetana Krstev, Duško Vitas, Miloš Utvić, Ranka Stanković, Ivan Obradović, Miljana Milanović, Anđelka Zečević, Mirko Spasić (UBG) Svetla Koeva, Tsvetana Dimitrova, Ivelina Stoyanova (IBL) Tibor Pintér (HASRIL) Mladen Stanojević, Mirko Spasić, Natalija Kovačević, Uroš Milošević, Nikola Dragović, Jelena Jovanović (IPUP) Anna Kibort, Łukasz Kobyliński, Mateusz Kopeć, Katarzyna Krasnowska, Michał Lenart, Marek Maziarz, Marcin Miłkowski, Bartłomiej Nitoń, Agnieszka Patejuk, Aleksander Pohl, Piotr Przybyła, Agata Savary, Filip Skwarski, Jan Szejko, Jakub Waszczuk, Marcin Woliński, Alina Wróblewska, Bartosz Zaborowski, Marcin Zając (IPIPAN) Piotr Pęzik, Łukasz Dróżdż, Maciej Buczek (ULodz) Radovan Garabík, Adriána Žáková (LSIL) Internal reviewer: HASRIL Workpackage: WP3 Workpackage title: Enhancing language resources D3.3-B V 1.1 Page 2 of 81 Contract no. 271022 Workpackage
    [Show full text]
  • Pdf Pravdová, M., & Svobodová, I
    MATEMATICKO-FYZIKÁLNÍ FAKULTA PRAHA Morphological Resources of Derivational Word-Formation Relations LUKÁŠ KYJÁNEK ÚFAL Technical Report TR-2018-61 ISSN 1214-5521 UNIVERSITAS CAROLINA PRAGENSIS Copies of ÚFAL Technical Reports can be ordered from: Institute of Formal and Applied Linguistics (ÚFAL MFF UK) Faculty of Mathematics and Physics, Charles University Malostranské nám. 25, CZ-11800 Prague 1 Czechia or can be obtained via the Web: http://ufal.mff.cuni.cz/techrep Morphological Resources of Derivational Word-Formation Relations Lukáš Kyjánek [email protected] Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics November 2018 Abstract The report focuses on existing morphological resources containing derivational word- formation relations. For each resource, the report describes history, licence, format, data structure and some other basic statistics to compare. Therefore, it means the first step to review and harmonize these resources in a similar way as it has already been done with resources of syntactic trees. Introduction Derivational morphology has gained increasing attention in recent years by more and more research groups across Europe. As a result, various computational linguistic resources dealing with the derivational morphology of several languages have been developed. These resources are potential for research of both the NLP and the linguistic typology. However, so far, there was no list or project that would bring and describe these resources together. The report focuses on resources (simply called as lexicons) which publish and distribute data as digital datasets and which connect lemmas and/or morphemes with derivational relations. The goal of these lexicons is to provide some kind of derivational word-formation networks.
    [Show full text]
  • Proceedings of Rely on Different Character Sets Such As MATMT2008 Workshop: Mixing Approaches to CJK Or Arabic
    9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages” LREC 2014, Reykjavík, Iceland, 27 May 2014 Workshop Programme 09:00 – 09:30 Welcoming address by Workshop co-chair Mikel L. Forcada 09:30 – 10:30 Oral papers Iñaki Alegria, Unai Cabezon, Unai Fernandez de Betoño, Gorka Labaka, Aingeru Mayor, Kepa Sarasola and Arkaitz Zubiaga Wikipedia and Machine Translation: killing two birds with one stone Gideon Kotzé and Friedel Wolff Experiments with syllable-based English-Zulu alignment 10:30 – 11:00 Coffee break 11:00 – 13:00 Oral papers Inari Listenmaa and Kaarel Kaljurand Computational Estonian Grammar in Grammatical Framework Matthew Marting and Kevin Unhammer FST Trimming: Ending Dictionary Redundancy in Apertium Hrvoje Peradin, Filip Petkovski and Francis Tyers Shallow-transfer rule-based machine translation for the Western group of South Slavic languages Alex Rudnick, Annette Rios Gonzales and Michael Gasser Enhancing a Rule-Based MT System with Cross-Lingual WSD 13:00 – 13:30 General discussion 13:30 Closing Editors Mikel L. Forcada Universitat d’Alacant, Spain Kepa Sarasola Euskal Herriko Unibertsitatea, Spain Francis M. Tyers UiT Norgga árktalaš universitehta, Norway Workshop Organizers/Organizing Committee Mikel L. Forcada Universitat d’Alacant, Spain Kepa Sarasola Euskal Herriko Unibertsitatea, Spain Francis M. Tyers UiT Norgga árktalaš universitehta, Norway Workshop Programme Committee Iñaki Alegria Euskal Herriko Unibertsitatea, Spain Lars Borin Göteborgs Universitet, Sweden Elaine Uí Dhonnchadha Trinity College Dublin, Ireland Mikel L. Forcada Universitat d’Alacant, Spain Michael Gasser Indiana University, USA Måns Huldén Helsingin Yliopisto, Finland Krister Lindén Helsingin Yliopisto, Finland Nikola Ljubešić Sveučilište u Zagrebu, Croatia Lluís Padró Universitat Politècnica de Catalunya, Spain Juan Antonio Pérez-Ortiz Universitat d’Alacant, Spain Felipe Sánchez-Martínez Universitat d’Alacant, Spain Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain Kevin P.
    [Show full text]
  • Bollettino SOCIETÀ DI LINGUISTICA
    BOLLETTINO della di SOCIETÀ LINGUISTICA ITALIANA XXXIII 1 / 2015, Bollettino on line SOCIETÀ DI LINGUISTICA ITALIANA XXXIII / 2015, 1 www.societadilinguisticaitaliana.net SOCIETÀ DI LINGUISTICA ITALIANA Presidente: Emanuele Banfi (fi no al 2015, non rieleggibile) e-mail: emanuele.banfi @unimib.it Vicepresidente: Carol Rosen (fi no al 2016, non rieleggibile) e-mail: cgr1@ cornell.edu Segretario: Nicola Grandi (fi no al 2017, rieleggibile) Dipartimento di Filologia classica e Italianistica Via Zamboni 32, 40126 Bologna Fax: +390512098555; e-mail: [email protected] Tesoriere: Isabella Chiari (fi no al 2015, rieleggibile) e-mail: [email protected] Comitato Esecutivo: Adriana Belletti (fi no al 2015) <[email protected]>, Gabriele Iannàccaro (fi no al 2015) <[email protected]>, Emilia Calaresu (fi no al 2016) <[email protected]>, Mara Frascarelli (fi no al 2016) <[email protected]>, Cristina Lavinio (fi no al 2017) <[email protected]>, Simona Vietri (fi no al 2017) [email protected], Segretario GISCEL: Alberto A. Sobrero <[email protected]>, Responsabile GSCP: Anna De Meo <[email protected]>, Responsabile GSPL: Federico Vicario <[email protected]>, Curatore del sito SLI: Giuliano Merz (con la collaborazione di Isabella Chiari) <[email protected]> oppure <[email protected]>, Curatore del sito e della newsletter SLI: Luigi Squillante, <[email protected]> Comitato per le Nomine: Silvana Ferreri (fi no al 2015) <[email protected]>, Annibale Elia (fi no al 2016) <[email protected]>, Daniele Gambarara (fi no al 2017) <[email protected]> Commissione per la selezione dei laboratori/workshops: (fi no al 2015) Federico Albano Leoni, Adriana Belletti, Maria Grossmann, Elisabetta Jezek, Alberto A.
    [Show full text]
  • GOVOR / SPEECH Zagreb, Godina 37, 2020, Broj 2, (2021) Mrežna Inačica: ISSN 1849-2126 UDK 81'34(05)"540.6" CODEN GOVOEB Tiskana Inačica: ISSN 0352-7565
    GOVOR / SPEECH Zagreb, godina 37, 2020, broj 2, (2021) Mrežna inačica: ISSN 1849-2126 UDK 81'34(05)"540.6" CODEN GOVOEB Tiskana inačica: ISSN 0352-7565 Izdavač ODJEL ZA FONETIKU HRVATSKOGA FILOLOŠKOG DRUŠTVA Uredništvo Gordana VAROŠANEC-ŠKARIĆ, glavna urednica Petra ACZÉL Sveučilište Corvinus, Budimpešta, Mađarska Dana BOATMAN Johns Hopkins Hospital, Baltimore, SAD Almasa DEFTERDAREVIĆ Filozofski fakultet, Sarajevo, Bosna i Hercegovina Mária GÓSY Mađarska akademija znanosti, Budimpešta, Mađarska William J. HARDCASTLE Queen Margaret University, Edinburgh, UK Damir HORGA Filozofski fakultet, Zagreb, Hrvatska Patricia KEATING University of California, Los Angeles, SAD Nikolaj LAZIĆ Filozofski fakultet, Zagreb, Hrvatska Marko LIKER Filozofski fakultet, Zagreb, Hrvatska Vesna MILDNER Filozofski fakultet, Zagreb, Hrvatska Elenmari PLETIKOS OLOF Filozofski fakultet, Zagreb, Hrvatska Vesna POŽGAJ HADŽI Filozofski fakultet, Ljubljana, Slovenija Ján SABOL Filozofski fakultet, Košice, Slovačka Irena SAWICKA Filološki fakultet, Torunj, Poljska Mirjana SOVILJ Institut za eksperimentalnu fonetiku i patologiju govora "Đorđe Kostić", Beograd, Srbija Jelena VLAŠIĆ DUIĆ Filozofski fakultet, Zagreb, Hrvatska Tajnica: Diana TOMIĆ Lektorica: Katarina VARENICA Izvršna tajnica: Ana VIDOVIĆ ZORIĆ Korektorica: Marica ŽIVKO Design ovitka: Zlatko ŠIMUNOVIĆ Grafičko uređenje i prijelom Jordan BIĆANIĆ, Odsjek za fonetiku, Filozofski fakultet, Zagreb Prilozi objavljeni u Govoru referiraju se u sljedećim sekundarnim izvorima: ERIH Plus, Scopus, Linguistics Bibliography
    [Show full text]
  • Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC
    Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick (editors) Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section Birmingham, 24 July 2017 Challenges in the Management of Large Corpora and Big Data and Natural Language Processing 2017 Workshop Programme 24 July 2017 WAC-XI guest session (11.00 - 12.30) Convenors: Adrien Barbaresi (ICLTT Vienna), Felix Bildhauer (IDS Mannheim), Roland Schäfer (FU Berlin) Chair: Stefan Evert (Friedrich-Alexander-Universität Erlangen-Nürnberg) Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen Web Corpora – the best possible solution for tracking rare phenomena in underresourced languages – Clitics in Bosnian, Croatian and Serbian Vladimir Benko – Are Web Corpora Inferior? The Case of Czech and Slovak Vit Suchomel – Removing Spam from Web Corpora Through Supervised Learning Using FastText Lunch (12:30-13:30) CMLC-5+BigNLP main section: (13:30 –17:00) Welcome and Introduction 13:30-13:40 National Corpora Talks 13:40 – 15:40 Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasic, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell, Creating CorCenCC (Corpws Cenedlaethol
    [Show full text]