Morphological Doublets in Croatian: a Multi-Methodological Analysis


By: Dario Lečić

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

The University of Sheffield
Faculty of Arts and Humanities
Department of Russian and Slavonic Studies

20 January, 2017

Acknowledgments

Many a PhD student past and present will agree that doing a PhD is a time-consuming process with lots of ups and downs, motivational issues and even a number of nervous breakdowns. Having experienced all of these, I can only say that they were right. However, having reached the end of the tunnel, I have to admit that the feeling is great. I would like to use this opportunity to thank all the people who made this possible and who have helped me during these four years spent researching the intricate world of morphological doublets in Croatian.

First of all, I would like to express my immense gratitude to my primary supervisor, Professor Neil Bermel from the Department of Russian and Slavonic Studies at the University of Sheffield, for offering his guidance from day one. Our regular supervisory meetings as well as numerous e-mail exchanges have been eye-opening and I would not have been able to do this without you. I hope this dissertation will justify all the effort you have put into me as your PhD student.

Even though the jurisdiction of the second supervisor as defined by the University of Sheffield officially stretches mostly to matters of the Doctoral Development Programme, my second supervisor, Dr Dagmar Divjak, nevertheless played a major role in this research as well, primarily in matters of statistics. I appreciate all your efforts in trying to make me understand R; it is my own fault that it did not work.
I would also like to thank Dr Mateusz-Milan Stanojević from the Faculty of Humanities and Social Sciences in Zagreb for spending hours discussing my work with me during my visits to Croatia and reading my scribbles, even though he was not involved with my research in any official function. Thanks for always believing in me and pushing me forward, Mat. I do not know what you wrote in that letter of recommendation four years ago, but it must have been gold.

The majority of this work was created in Office 1.85e of the Jessop West building. I would like to thank all the people that I shared the office with in the past four years, who have made that office a pleasant place to work in. My special thanks go to Nina and Jarek for actually making me talk to them and tolerating my office pranks.

Every once in a while, a PhD researcher needs to step out of his cubicle, forget about his work and socialise, otherwise he would go crazy at some point. I was fortunate enough to be a member of the Sheffield University Volleyball Club, whose members provided me with the necessary social life and a chance to let off some steam. Damien, Peter, Adam, Laura, Yvett, Melina, Dahan and all Club members past and present, thank you for being amazing friends. Go Black & Gold!

And finally, I would like to thank all the participants in my numerous questionnaire studies. Some of you I know, some of you I do not, but you all made this possible as well. I apologise for pestering you, but it was all for the sake of science. I was especially proud when several of my participants approached me later by e-mail enquiring about the results. I guess this shows the questionnaire wasn’t as boring as I was afraid it would be and that people did feel intrigued by the topic I was researching.

Summary

The term morphological doubletism refers to a situation in language when there are two (or more) morphemes available for a single cell in an inflectional paradigm of a lexeme.
Slavonic languages, with their rich inflectional systems, show particularly high levels of doubletism. In the present dissertation we analyse examples of doubletism in Croatian nominal paradigms. As shown by the dissertation’s subtitle, “a multi-methodological analysis”, we compare and contrast evidence obtained by various methods. First we conduct a corpus study to determine the frequency distributions of the doublet pairs in present-day Croatian. This analysis has shown that the distribution of the doublet pairs is not determined by any intra- or extra-linguistic factor, but that it is not completely random either. These distributions are later used in several additional studies, the purpose of which is to answer the question of how such forms are processed in speakers’ mental grammars. One of the analyses is a computational one, in which we try to reproduce a grammar of a Croatian speaker by using two memory-based models (AM and TiMBL). The models were highly successful in producing the desired output without resorting to any rules or generalizations. We also report the results of three questionnaire studies, all of which show that native speakers are extremely sensitive to the language input they receive, in line with usage-based theories of language, as well as that mental grammars are gradient. The speakers’ ratings and production rates closely matched the proportions of the doublet pairs in the corpus. Furthermore, speakers distinguish between several levels of domination of one ending over another. When the domination of one form is weak, speakers resort to a different decision criterion, namely they look at the dominant ending of phonologically similar words.

Table of contents

Chapter 1. Introduction - Can something be said ‘both ways’?
1.1. Definition of research topic and terminology
1.2. Research questions
1.3. Dissertation overview
Chapter 2. What can doublets tell us about mental grammars?
2.1. Introduction to linguistic theories
2.2. Previous treatments of doubletism in linguistic theory
2.2.1. Doubletism in the generative tradition
2.2.2. Doublets in the cognitive tradition
2.2.3. Doublets in Croatian philology
Chapter 3. Types of linguistic data - diverging or converging evidence?
3.1. Intuition judgments
3.1.1. Issues with intuition judgments
3.2. Corpora
3.2.1. Issues with corpora
3.3. Methodological pluralism – comparing intuitions and corpus data
Chapter 4. Morphological doublets in Croatian - a corpus approach
4.1. Development of Standard Croatian
4.2. Croatian corpora
4.2.1. Hrvatski nacionalni korpus (Croatian national corpus, HNK)
4.2.2. Hrvatska jezična riznica (Croatian Language Repository, HJR)
4.2.3. Croatian web corpus (hrWaC14)
4.3. Declension patterns of Croatian nouns
4.3.1. A-declension
4.3.1.1. Doubletism in the vocative singular (Vsg)
4.3.1.2. Doubletism in the instrumental singular (Isg)
4.3.1.3. Doubletism in the plural paradigm
4.3.1.4. Doubletism in the long plural
4.3.1.5. Doubletism in the genitive plural (Gpl)
4.3.1.6. Isolated examples of stem doubletism in A-declension
4.3.2. E-declension
4.3.2.1. Doubletism in the dative/locative singular (Dsg/Lsg)
4.3.2.2. Doubletism in the vocative singular (Vsg)
4.3.2.3. Tripletism in the genitive plural (Gpl)
4.3.3. I-declension
4.3.3.1. Doubletism in the instrumental singular (Isg)
4.3.3.2. Gender doubletism
4.3.3.3. Isolated examples of doubletism in I-declension
Recommended publications
  • CLIB 2016 Proceedings
    The Second International Conference Computational Linguistics in Bulgaria (CLIB 2016) is organised within the Operation for Support for International Scientific Conferences Held in Bulgaria of the National Science Fund, Grant № ДПМНФ 01/9 of 11 Aug 2016. CLIB 2016 is organised by: The Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. PUBLICATION AND CATALOGUING INFORMATION Title: Proceedings of the Second International Conference Computational Linguistics in Bulgaria (CLIB 2016). ISSN: 2367-5675 (online). Published and distributed by: The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences. Editorial address: Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences, 52 Shipchenski prohod blvd., bldg. 17, Sofia 1113, Bulgaria, +359 2/ 872 23 02. Copyright of each paper stays with the respective authors. The works in the Proceedings are licensed under a Creative Commons Attribution 4.0 International Licence (CC BY 4.0). Licence details: http://creativecommons.org/licenses/by/4.0 Proceedings of the Second International Conference Computational Linguistics in Bulgaria, 9 September 2016, Sofia, Bulgaria. PREFACE We are excited to welcome you to the second edition of the International Conference Computational Linguistics in Bulgaria (CLIB 2016) in Sofia, Bulgaria! CLIB aspires to foster the NLP community in Bulgaria and further the cooperation among researchers working in NLP for Bulgarian around the world. The need for a conference dedicated to NLP research dealing with or applicable to Bulgarian has been felt for quite some time. We believe that building a strong community of researchers and teams who have chosen to work on Bulgarian is a key factor to meeting the challenges and requirements posed to computational linguistics and NLP in Bulgaria.
  • Conference Abstracts
    EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Patronage of Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner MAY 23-24-25, 2012 ISTANBUL LÜTFI KIRDAR CONVENTION & EXHIBITION CENTRE ISTANBUL, TURKEY CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis. Assistant Editors: Hélène Mazo, Sara Goggi, Olivier Hamon © ELRA – European Language Resources Association. All rights reserved. LREC 2012, EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2012 Conference Abstracts Distributed by: ELRA – European Language Resources Association 55-57, rue Brillat Savarin 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] Copyright by the European Language Resources Association ISBN 978-2-9517408-7-7 EAN 9782951740877 All rights reserved. No part of this book may be reproduced in any form without the prior permission of the European Language Resources Association. Introduction of the Conference Chair Nicoletta Calzolari I wish first to express to Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner, the gratitude of the Program Committee and of all LREC participants for her Distinguished Patronage of LREC 2012. Even if every time I feel we have reached the top, this 8th LREC is continuing the tradition of breaking previous records: this edition we received 1013 submissions and have accepted 697 papers, after reviewing by the impressive number of 715 colleagues.
  • Languages in the European Information Society Croatian
    META-NET White Paper Series Languages in the European Information Society Croatian Early Release Edition META-FORUM 2011 27-28 June 2011 Budapest, Hungary The development of this white paper has been funded by the Seventh Framework Programme and the ICT Policy Support Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899). This white paper is for educators, journalists, politicians, language communities and others, who want to establish a truly multilingual Europe. This white paper is part of a series that promotes knowledge about language technology and its potential. The availability and use of language technology in Europe varies between languages. Consequently, the actions that are required to further support research and development of language technologies also differ for each language. The required actions depend on many factors, such as the complexity of a given language and the size of its community. META-NET, a European Commission Network of Excellence, has conducted an analysis of current language resources and technologies. This analysis focused on the 23 official European languages as well as other important regional languages in Europe. The results of this analysis suggest that there are many significant research gaps for each language. A more detailed, expert analysis and assessment of the current situation will help maximise the impact of additional research and minimise any risks. META-NET consists of 44 research centres from 31 countries who are working with stakeholders from commercial businesses, government agencies, industry, research organisations, software companies, technology providers and European universities.
  • Overabundance in Croatian Dual-Class Verbs
    Tomislava Bošnjak Botica, Gordana Hržica, Overabundance in Croatian dual-class verbs, FLUMINENSIA, vol. 28 (2016), no. 1. Dr Tomislava Bošnjak Botica, Institut za hrvatski jezik i jezikoslovlje, [email protected], Zagreb; Dr Gordana Hržica, Edukacijsko-rehabilitacijski fakultet, [email protected], Zagreb. Original scientific article. UDK 811.163.42’367.625. Manuscript received: 5 April 2016; accepted for publication: 21 June 2016. Croatian verbal inflection morphology is typically described using verb class distinctions. The number of classes differs among approaches, but the basic criterion for class division is the presence or absence and the type of suppletion in verb stems. Generally, one verb belongs to one inflectional class or paradigm only. However, some verbs belong to two classes, i.e. they have two parallel sets of stems. In such dual-class verbs, one infinitive form is realizable in two present forms in all cells within a class, i.e. there is an overabundance (Thornton 2011). Inevitably, one of the stem forming paradigms is a class with categorial suppletion. The present stem of a categorial suppletion class has a greater phonological distance from the infinitive stem than the present stem of the other class. Using a different terminology, one class can be described as more transparent, while the other is less transparent (more opaque) in forming the present stem. This study attempts to present overabundance in dual-class verbs and to determine whether competition in such forms can be explained by their tendency to conform to one default class or by other factors, specifically, by the phonological distance between the two paradigms of dual-class verbs.
  • The Bulgarian National Corpus: Theory and Practice in Corpus Design
    The Bulgarian National Corpus: Theory and Practice in Corpus Design. Svetla Koeva, Ivelina Stoyanova, Svetlozara Leseva, Tsvetana Dimitrova, Rositsa Dekova, and Ekaterina Tarpomanova, Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences, Sofia, Bulgaria. Keywords: corpus design, Bulgarian National Corpus, computational linguistics. Abstract: The paper discusses several key concepts related to the development of corpora and reconsiders them in light of recent developments in NLP. On the basis of an overview of present-day corpora, we conclude that the dominant practices of corpus design do not utilise the technologies adequately and, as a result, fail to meet the demands of corpus linguistics, computational lexicology and computational linguistics alike. We proceed to lay out a data-driven approach to corpus design, which integrates the best practices of traditional corpus linguistics with the potential of the latest technologies allowing fast collection, automatic metadata description and annotation of large amounts of data. Thus, the gist of the approach we propose is that corpus design should be centred on amassing large amounts of mono- and multilingual texts and on providing them with a detailed metadata description and high-quality multi-level annotation. We go on to illustrate this concept with a description of the compilation, structuring, documentation, and annotation of the Bulgarian National Corpus (BulNC). At present it consists of a Bulgarian part of 979.6 million words, constituting the corpus kernel, and 33 Bulgarian-X language corpora, totalling 972.3 million words, 1.95 billion words (1.95×10^9) altogether. The BulNC is supplied with a comprehensive metadata description, which allows us to organise the texts according to different principles.
  • Deliverable D3.3.-B Third Batch of Language Resources
    Contract no. 271022. CESAR: Central and South-East European Resources, Project no. 271022. Deliverable D3.3.-B, Third batch of language resources: actions on resources. Version No. 1.1, 07/02/2013. Document Information. Deliverable number: D3.3.-B. Deliverable title: Third batch of language resources: actions on resources. Due date of deliverable: 31/01/2013. Actual submission date of deliverable: 08/02/2013. Main Author(s): György Szaszák (BME-TMIT). Participants: Mátyás Bartalis, András Balog (BME-TMIT); Željko Agić, Božo Bekavac, Nikola Ljubešić, Sanda Martinčić-Ipšić, Ida Raffaelli, add Jan Šnajder, Krešo Šojat, Vanja Štefanec, Marko Tadić (FFZG); Cvetana Krstev, Duško Vitas, Miloš Utvić, Ranka Stanković, Ivan Obradović, Miljana Milanović, Anđelka Zečević, Mirko Spasić (UBG); Svetla Koeva, Tsvetana Dimitrova, Ivelina Stoyanova (IBL); Tibor Pintér (HASRIL); Mladen Stanojević, Mirko Spasić, Natalija Kovačević, Uroš Milošević, Nikola Dragović, Jelena Jovanović (IPUP); Anna Kibort, Łukasz Kobyliński, Mateusz Kopeć, Katarzyna Krasnowska, Michał Lenart, Marek Maziarz, Marcin Miłkowski, Bartłomiej Nitoń, Agnieszka Patejuk, Aleksander Pohl, Piotr Przybyła, Agata Savary, Filip Skwarski, Jan Szejko, Jakub Waszczuk, Marcin Woliński, Alina Wróblewska, Bartosz Zaborowski, Marcin Zając (IPIPAN); Piotr Pęzik, Łukasz Dróżdż, Maciej Buczek (ULodz); Radovan Garabík, Adriána Žáková (LSIL). Internal reviewer: HASRIL. Workpackage: WP3. Workpackage title: Enhancing language resources. Workpackage
  • Morphological Resources of Derivational Word-Formation Relations
    MATEMATICKO-FYZIKÁLNÍ FAKULTA PRAHA Morphological Resources of Derivational Word-Formation Relations LUKÁŠ KYJÁNEK ÚFAL Technical Report TR-2018-61 ISSN 1214-5521 UNIVERSITAS CAROLINA PRAGENSIS Copies of ÚFAL Technical Reports can be ordered from: Institute of Formal and Applied Linguistics (ÚFAL MFF UK) Faculty of Mathematics and Physics, Charles University Malostranské nám. 25, CZ-11800 Prague 1 Czechia or can be obtained via the Web: http://ufal.mff.cuni.cz/techrep Morphological Resources of Derivational Word-Formation Relations Lukáš Kyjánek [email protected] Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics November 2018 Abstract The report focuses on existing morphological resources containing derivational word-formation relations. For each resource, the report describes history, licence, format, data structure and some other basic statistics to compare. Therefore, it means the first step to review and harmonize these resources in a similar way as it has already been done with resources of syntactic trees. Introduction Derivational morphology has gained increasing attention in recent years by more and more research groups across Europe. As a result, various computational linguistic resources dealing with the derivational morphology of several languages have been developed. These resources are potential for research of both the NLP and the linguistic typology. However, so far, there was no list or project that would bring and describe these resources together. The report focuses on resources (simply called as lexicons) which publish and distribute data as digital datasets and which connect lemmas and/or morphemes with derivational relations. The goal of these lexicons is to provide some kind of derivational word-formation networks.
  • 9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages”
    9th SaLTMiL Workshop on “Free/open-Source Language Resources for the Machine Translation of Less-Resourced Languages”, LREC 2014, Reykjavík, Iceland, 27 May 2014. Workshop Programme. 09:00 – 09:30 Welcoming address by Workshop co-chair Mikel L. Forcada. 09:30 – 10:30 Oral papers: Iñaki Alegria, Unai Cabezon, Unai Fernandez de Betoño, Gorka Labaka, Aingeru Mayor, Kepa Sarasola and Arkaitz Zubiaga, Wikipedia and Machine Translation: killing two birds with one stone; Gideon Kotzé and Friedel Wolff, Experiments with syllable-based English-Zulu alignment. 10:30 – 11:00 Coffee break. 11:00 – 13:00 Oral papers: Inari Listenmaa and Kaarel Kaljurand, Computational Estonian Grammar in Grammatical Framework; Matthew Marting and Kevin Unhammer, FST Trimming: Ending Dictionary Redundancy in Apertium; Hrvoje Peradin, Filip Petkovski and Francis Tyers, Shallow-transfer rule-based machine translation for the Western group of South Slavic languages; Alex Rudnick, Annette Rios Gonzales and Michael Gasser, Enhancing a Rule-Based MT System with Cross-Lingual WSD. 13:00 – 13:30 General discussion. 13:30 Closing. Editors: Mikel L. Forcada, Universitat d’Alacant, Spain; Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain; Francis M. Tyers, UiT Norgga árktalaš universitehta, Norway. Workshop Organizers/Organizing Committee: Mikel L. Forcada, Universitat d’Alacant, Spain; Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain; Francis M. Tyers, UiT Norgga árktalaš universitehta, Norway. Workshop Programme Committee: Iñaki Alegria, Euskal Herriko Unibertsitatea, Spain; Lars Borin, Göteborgs Universitet, Sweden; Elaine Uí Dhonnchadha, Trinity College Dublin, Ireland; Mikel L. Forcada, Universitat d’Alacant, Spain; Michael Gasser, Indiana University, USA; Måns Huldén, Helsingin Yliopisto, Finland; Krister Lindén, Helsingin Yliopisto, Finland; Nikola Ljubešić, Sveučilište u Zagrebu, Croatia; Lluís Padró, Universitat Politècnica de Catalunya, Spain; Juan Antonio Pérez-Ortiz, Universitat d’Alacant, Spain; Felipe Sánchez-Martínez, Universitat d’Alacant, Spain; Kepa Sarasola, Euskal Herriko Unibertsitatea, Spain; Kevin P.
  • Bollettino SOCIETÀ DI LINGUISTICA
    BOLLETTINO della SOCIETÀ DI LINGUISTICA ITALIANA, XXXIII / 2015, 1. Online bulletin: www.societadilinguisticaitaliana.net. SOCIETÀ DI LINGUISTICA ITALIANA. President: Emanuele Banfi (until 2015, not eligible for re-election), e-mail: emanuele.banfi@unimib.it. Vice-President: Carol Rosen (until 2016, not eligible for re-election), e-mail: cgr1@cornell.edu. Secretary: Nicola Grandi (until 2017, eligible for re-election), Dipartimento di Filologia classica e Italianistica, Via Zamboni 32, 40126 Bologna, Fax: +390512098555; e-mail: [email protected]. Treasurer: Isabella Chiari (until 2015, eligible for re-election), e-mail: [email protected]. Executive Committee: Adriana Belletti (until 2015) <[email protected]>, Gabriele Iannàccaro (until 2015) <[email protected]>, Emilia Calaresu (until 2016) <[email protected]>, Mara Frascarelli (until 2016) <[email protected]>, Cristina Lavinio (until 2017) <[email protected]>, Simona Vietri (until 2017) [email protected]. GISCEL Secretary: Alberto A. Sobrero <[email protected]>. GSCP Coordinator: Anna De Meo <[email protected]>. GSPL Coordinator: Federico Vicario <[email protected]>. SLI website curator: Giuliano Merz (with the collaboration of Isabella Chiari) <[email protected]> or <[email protected]>. SLI website and newsletter curator: Luigi Squillante, <[email protected]>. Nominations Committee: Silvana Ferreri (until 2015) <[email protected]>, Annibale Elia (until 2016) <[email protected]>, Daniele Gambarara (until 2017) <[email protected]>. Workshop Selection Committee (until 2015): Federico Albano Leoni, Adriana Belletti, Maria Grossmann, Elisabetta Jezek, Alberto A.
  • GOVOR / SPEECH, Zagreb, volume 37, 2020, issue 2 (2021)
    GOVOR / SPEECH. Zagreb, volume 37, 2020, issue 2 (2021). Online edition: ISSN 1849-2126. UDK 81'34(05)"540.6". CODEN GOVOEB. Print edition: ISSN 0352-7565. Publisher: ODJEL ZA FONETIKU HRVATSKOGA FILOLOŠKOG DRUŠTVA. Editorial board: Gordana VAROŠANEC-ŠKARIĆ, editor-in-chief; Petra ACZÉL, Corvinus University, Budapest, Hungary; Dana BOATMAN, Johns Hopkins Hospital, Baltimore, USA; Almasa DEFTERDAREVIĆ, Filozofski fakultet, Sarajevo, Bosnia and Herzegovina; Mária GÓSY, Hungarian Academy of Sciences, Budapest, Hungary; William J. HARDCASTLE, Queen Margaret University, Edinburgh, UK; Damir HORGA, Filozofski fakultet, Zagreb, Croatia; Patricia KEATING, University of California, Los Angeles, USA; Nikolaj LAZIĆ, Filozofski fakultet, Zagreb, Croatia; Marko LIKER, Filozofski fakultet, Zagreb, Croatia; Vesna MILDNER, Filozofski fakultet, Zagreb, Croatia; Elenmari PLETIKOS OLOF, Filozofski fakultet, Zagreb, Croatia; Vesna POŽGAJ HADŽI, Filozofski fakultet, Ljubljana, Slovenia; Ján SABOL, Filozofski fakultet, Košice, Slovakia; Irena SAWICKA, Filološki fakultet, Toruń, Poland; Mirjana SOVILJ, Institut za eksperimentalnu fonetiku i patologiju govora "Đorđe Kostić", Belgrade, Serbia; Jelena VLAŠIĆ DUIĆ, Filozofski fakultet, Zagreb, Croatia. Secretary: Diana TOMIĆ. Language editor: Katarina VARENICA. Executive secretary: Ana VIDOVIĆ ZORIĆ. Proofreader: Marica ŽIVKO. Cover design: Zlatko ŠIMUNOVIĆ. Graphic design and layout: Jordan BIĆANIĆ, Odsjek za fonetiku, Filozofski fakultet, Zagreb. Contributions published in Govor are indexed in the following secondary sources: ERIH Plus, Scopus, Linguistics Bibliography
  • Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC
    Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick (editors) Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section Birmingham, 24 July 2017 Challenges in the Management of Large Corpora and Big Data and Natural Language Processing 2017 Workshop Programme 24 July 2017 WAC-XI guest session (11.00 - 12.30) Convenors: Adrien Barbaresi (ICLTT Vienna), Felix Bildhauer (IDS Mannheim), Roland Schäfer (FU Berlin) Chair: Stefan Evert (Friedrich-Alexander-Universität Erlangen-Nürnberg) Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen Web Corpora – the best possible solution for tracking rare phenomena in underresourced languages – Clitics in Bosnian, Croatian and Serbian Vladimir Benko – Are Web Corpora Inferior? The Case of Czech and Slovak Vit Suchomel – Removing Spam from Web Corpora Through Supervised Learning Using FastText Lunch (12:30-13:30) CMLC-5+BigNLP main section: (13:30 –17:00) Welcome and Introduction 13:30-13:40 National Corpora Talks 13:40 – 15:40 Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasic, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell, Creating CorCenCC (Corpws Cenedlaethol
  • Proceedings of CLARIN Annual Conference 2020
    CLARIN Annual Conference 2020 PROCEEDINGS. Edited by Costanza Navarretta, Maria Eskevich. 05 – 07 October 2020, Virtual Edition. Please cite as: Proceedings of CLARIN Annual Conference 2020. Eds. C. Navarretta and M. Eskevich. Virtual Edition, 2020. Programme Committee. Chair: Costanza Navarretta, University of Copenhagen, Denmark. Members: Lars Borin, University of Gothenburg (SE); António Branco, Universidade de Lisboa (PT); Tomaž Erjavec, Jožef Stefan Institute (SI); Eva Hajičová, Charles University Prague (CZ); Erhard Hinrichs, University of Tübingen (DE); Nicolas Larrousse, Huma-Num (FR); Krister Lindén, University of Helsinki (FI); Monica Monachini, Institute of Computational Linguistics “A. Zampolli" (IT); Karlheinz Mörth, Austrian Academy of Sciences (AT); Jan Odijk, Utrecht University (NL); Maciej Piasecki, Wrocław University of Science and Technology (PL); Stelios Piperidis, Institute for Language and Speech Processing (ILSP), Athena Research Center (GR); Eirikur Rögnvaldsson, University of Iceland (IS); Kiril Simov, IICT, Bulgarian Academy of Sciences (BG); Inguna Skadiņa, University of Latvia (LV); Koenraad De Smedt, University of Bergen (NO); Marko Tadić, University of Zagreb (HR); Jurgita Vaičenonienė, Vytautas Magnus University (LT); Tamás Váradi, Research Institute for Linguistics, Hungarian Academy of Sciences (HU); Kadri Vider, University of Tartu (EE); Martin Wynne, University of Oxford (UK). Reviewers: Lars Borin, SE; Jan Odijk, NL; António Branco, PT; Stelios Piperidis, GR; Tomaž Erjavec,