Morphological Doublets in Croatian: a Multi-Methodological Analysis

By: Dario Lečić

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

The University of Sheffield
Faculty of Arts and Humanities
Department of Russian and Slavonic Studies

20 January, 2017

Acknowledgments

Many a PhD student past and present will agree that doing a PhD is a time-consuming process with lots of ups and downs, motivational issues and even a number of nervous breakdowns. Having experienced all of these, I can only say that they were right. However, having reached the end of the tunnel, I have to admit that the feeling is great. I would like to use this opportunity to thank all the people who made this possible and who have helped me during these four years spent researching the intricate world of morphological doublets in Croatian.

First of all, I would like to express my immense gratitude to my primary supervisor, Professor Neil Bermel from the Department of Russian and Slavonic Studies at the University of Sheffield, for offering his guidance from day one. Our regular supervisory meetings as well as numerous e-mail exchanges have been eye-opening and I would not have been able to do this without you. I hope this dissertation will justify all the effort you have put into me as your PhD student.

Even though the jurisdiction of the second supervisor as defined by the University of Sheffield officially stretches mostly to matters of the Doctoral Development Programme, my second supervisor, Dr Dagmar Divjak, nevertheless played a major role in this research as well, primarily in matters of statistics. I appreciate all your efforts in trying to make me understand R; it is my own fault that it did not work.
I would also like to thank Dr Mateusz-Milan Stanojević from the Faculty of Humanities and Social Sciences in Zagreb for spending hours discussing my work with me during my visits to Croatia and reading my scribbles, even though he was not involved with my research in any official function. Thanks for always believing in me and pushing me forward, Mat. I do not know what you wrote in that letter of recommendation four years ago, but it must have been gold.

The majority of this work was created in Office 1.85e of the Jessop West building. I would like to thank all the people that I shared the office with in the past four years, who have made that office a pleasant place to work in. My special thanks go to Nina and Jarek for actually making me talk to them and tolerating my office pranks.

Every once in a while, a PhD researcher needs to step out of his cubicle, forget about his work and socialise; otherwise he would go crazy at some point. I was fortunate enough to be a member of the Sheffield University Volleyball Club, whose members provided me with the necessary social life and a chance to let off some steam. Damien, Peter, Adam, Laura, Yvett, Melina, Dahan and all Club members past and present, thank you for being amazing friends. Go Black & Gold!

And finally, I would like to thank all the participants in my numerous questionnaire studies. Some of you I know, some of you I do not, but you all made this possible as well. I apologise for pestering you, but it was all for the sake of science. I was especially proud when several of my participants approached me later by e-mail enquiring about the results. I guess this shows the questionnaire wasn’t as boring as I was afraid it would be and that people did feel intrigued by the topic I was researching.

Summary

The term morphological doubletism refers to a situation in language when there are two (or more) morphemes available for a single cell in an inflectional paradigm of a lexeme.
Slavonic languages, with their rich inflectional systems, show particularly high levels of doubletism. In the present dissertation we analyse examples of doubletism in Croatian nominal paradigms. As shown by the dissertation’s subtitle, “a multi-methodological analysis”, we compare and contrast evidence obtained by various methods. First we conduct a corpus study to determine the frequency distributions of the doublet pairs in present-day Croatian. This analysis has shown that the distribution of the doublet pairs is not determined by any intra- or extra-linguistic factor, but that it is not completely random either. These distributions are later used in several additional studies, the purpose of which is to answer the question of how such forms are processed in speakers’ mental grammars. One of the analyses is a computational one, in which we try to reproduce a grammar of a Croatian speaker by using two memory-based models (AM and TiMBL). The models were highly successful in producing the desired output without resorting to any rules or generalizations. We also report the results of three questionnaire studies, all of which show that native speakers are extremely sensitive to the language input they receive, in line with usage-based theories of language, and that mental grammars are gradient. The speakers’ ratings and production rates closely matched the proportions of the doublet pairs in the corpus. Furthermore, speakers distinguish between several levels of domination of one ending over another. When the domination of one form is weak, speakers resort to a different decision criterion, namely they look at the dominant ending of phonologically similar words.

Table of contents

Chapter 1. Introduction - Can something be said ‘both ways’?
  1.1. Definition of research topic and terminology
  1.2. Research questions
  1.3. Dissertation overview
Chapter 2. What can doublets tell us about mental grammars?
  2.1. Introduction to linguistic theories
  2.2. Previous treatments of doubletism in linguistic theory
    2.2.1. Doubletism in the generative tradition
    2.2.2. Doublets in the cognitive tradition
    2.2.3. Doublets in Croatian philology
Chapter 3. Types of linguistic data - diverging or converging evidence?
  3.1. Intuition judgments
    3.1.1. Issues with intuition judgments
  3.2. Corpora
    3.2.1. Issues with corpora
  3.3. Methodological pluralism – comparing intuitions and corpus data
Chapter 4. Morphological doublets in Croatian - a corpus approach
  4.1. Development of Standard Croatian
  4.2. Croatian corpora
    4.2.1. Hrvatski nacionalni korpus (Croatian national corpus, HNK)
    4.2.2. Hrvatska jezična riznica (Croatian Language Repository, HJR)
    4.2.3. Croatian web corpus (hrWaC14)
  4.3. Declension patterns of Croatian nouns
    4.3.1. A-declension
      4.3.1.1. Doubletism in the vocative singular (Vsg)
      4.3.1.2. Doubletism in the instrumental singular (Isg)
      4.3.1.3. Doubletism in the plural paradigm
      4.3.1.4. Doubletism in the long plural
      4.3.1.5. Doubletism in the genitive plural (Gpl)
      4.3.1.6. Isolated examples of stem doubletism in A-declension
    4.3.2. E-declension
      4.3.2.1. Doubletism in the dative/locative singular (Dsg/Lsg)
      4.3.2.2. Doubletism in the vocative singular (Vsg)
      4.3.2.3. Tripletism in the genitive plural (Gpl)
    4.3.3. I-declension
      4.3.3.1. Doubletism in the instrumental singular (Isg)
      4.3.3.2. Gender doubletism
      4.3.3.3. Isolated examples of doubletism in I-declension
