Processing verb clusters

Published by
LOT
Kloveniersburgwal 48
1012 CX Amsterdam
The Netherlands
phone: +31 20 525 2461
e-mail: [email protected]
http://www.lotschool.nl

Cover illustration: Jelke Bloem. The cover shows a color gradient from green to LOT-red, representing the green and red verb cluster orders and the shift in their order preference.

ISBN: 978-94-6093-371-4
DOI: https://dx.medra.org/10.48273/LOT0586
NUR: 616

Copyright © 2021 Jelke Bloem. All rights reserved.

Processing verb clusters

ACADEMIC DISSERTATION

to obtain the degree of Doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus, prof. dr. ir. K.I.J. Maex, before a committee appointed by the College voor Promoties (Doctorate Board), to be defended in public in the Agnietenkapel on Friday 12 March 2021 at 16:00

by

Jelke Bloem

born in Groningen

Doctoral committee

Supervisors:
prof. dr. A.P. Versloot, Universiteit van Amsterdam
prof. dr. F.P. Weerman, Universiteit van Amsterdam

Other members:
prof. dr. H.J. Bennis, Universiteit van Amsterdam
prof. dr. P.C. Hengeveld, Universiteit van Amsterdam
prof. dr. J.E. Rispens, Universiteit van Amsterdam
prof. dr. J.E.J.M. Odijk, Universiteit Utrecht
prof. dr. A.P.J. van den Bosch, Universiteit van Amsterdam
dr. F. Van de Velde, KU Leuven

Faculty of Humanities

The research described here was made possible in part by support from the Fryske Akademy.

Contents

Acknowledgements
Author contributions

1 Introduction
  1.1 Verb clusters
  1.2 New methods
    1.2.1 Automatically annotated corpora
    1.2.2 Agent-based models
    1.2.3 Language modeling
  1.3 Topics in variation and change
    1.3.1 Processing
    1.3.2 Variation affected by change
    1.3.3 Language contact
  1.4 Outline

2 Applying automatically parsed corpora to the study of language variation
  2.1 Introduction
  2.2 Verbal cluster variation
  2.3 Multifactorial modeling of language variation
    2.3.1 Using automatically annotated data
    2.3.2 Verbal cluster variation
  2.4 Method and data
  2.5 Results
    2.5.1 Europarl corpus
  2.6 Discussion
    2.6.1 Linguistic interpretation
    2.6.2 Future work

3 Evaluating Automatically Annotated Treebanks for Linguistic Research
  3.1 Introduction
    3.1.1 Automatically Annotated Treebanks
    3.1.2 Construction-specific Querying
  3.2 Linguistic Studies using Automatically Annotated Treebanks
  3.3 Current Approaches to Evaluation
  3.4 Linguistically Informed Evaluation
    3.4.1 Manual Evaluation of the Results
    3.4.2 Manual Evaluation of the Text
    3.4.3 Fall Back to Simpler Annotation
    3.4.4 Search for Particular Instances
  3.5 Conclusion

4 Verbal cluster order and processing complexity
  4.1 Introduction
  4.2 Related work
    4.2.1 Word order variation
    4.2.2 Processing complexity
    4.2.3 Verb clusters
    4.2.4 Verb cluster order variation
  4.3 The processing hypothesis
  4.4 Method and data
  4.5 Measures of processing complexity
  4.6 Results
  4.7 Discussion
  4.8 Conclusion

5 Testing the Processing Hypothesis of word order variation using a probabilistic language model
  5.1 Introduction
  5.2 The case of Dutch two-verb clusters
  5.3 Processing complexity
    5.3.1 Theoretical models
    5.3.2 Empirical measures
  5.4 Data
  5.5 Language model
  5.6 Results
  5.7 Discussion

6 Lexical effects on Dutch verbal cluster order
  6.1 Introduction
    6.1.1 Previous work
  6.2 Lexical and semantic associations
    6.2.1 Associations with related constructions: Lexical preferences
    6.2.2 Semantic differences between the orders
  6.3 Measuring lexical associations
    6.3.1 Collostructional analysis
    6.3.2 Gain Ratio
    6.3.3 Residuals of logistic regression
  6.4 Verb clusters
  6.5 Method and data
    6.5.1 Method of analysis
    6.5.2 Corpus data
  6.6 Results
    6.6.1 Initial distinctive collexeme analysis
    6.6.2 Adjectivity factor
    6.6.3 Particle verb factor
    6.6.4 Auxiliary verb factor
    6.6.5 Semantic analysis
    6.6.6 Gain Ratio
    6.6.7 Residuals of logistic regression
  6.7 Discussion
  6.8 Conclusions and future work

7 An agent-based model of a historical word order change
  7.1 Introduction
  7.2 Verbal clusters
  7.3 Methodology
  7.4 Results
    7.4.1 Dutch
  7.5 Discussion

8 Learned borrowing or contact-induced change: Verb cluster word order in Early-Modern Frisian
  8.1 Introduction
  8.2 Verb clusters in Frisian
    8.2.1 Modern Frisian
    8.2.2 Historical varieties of Frisian
    8.2.3 'Mixed Frisian' texts and language contact
    8.2.4 Language contact, acquisition and change
    8.2.5 Verb clusters in 'Mixed Frisian'
    8.2.6 Hypotheses
  8.3 Verb clusters in Dutch
    8.3.1 Factors of variation
    8.3.2 Processing verb cluster orders
  8.4 Data and methodology
    8.4.1 Texts
    8.4.2 Annotation
    8.4.3 Automatic extraction
    8.4.4 Factors potentially affecting order variation
  8.5 Results
    8.5.1 Evaluation
    8.5.2 Multifactorial model of order variation
  8.6 Discussion
  8.7 Conclusion

9 Synchronic variation and diachronic change in Dutch two-verb clusters
  9.1 Introduction
  9.2 Background: variation and change in the order of Dutch verb clusters
    9.2.1 Factors influencing the order variation in verb clusters in present-day Dutch
    9.2.2 Diachronic developments in the order variation in Dutch two-verb clusters
  9.3 The relation between diachronic change and synchronic variation
    9.3.1 Synchronic variation and diachronic change connected
    9.3.2 The apparent-time method
    9.3.3 Hypothesis and predictions
  9.4 Method and analysis
    9.4.1 Corpus Gesproken Nederlands (CGN)
    9.4.2 PaQu syntactic search
    9.4.3 COREX
    9.4.4 Analysis methods
    9.4.5 Summary of the selected data
  9.5 Results
    9.5.1 Intraspeaker variation
    9.5.2 Word order preferences across age groups
    9.5.3 Word order preferences across age groups per auxiliary verb
  9.6 Discussion and conclusion

10 Conclusion
  10.1 Answers to the research questions
    10.1.1 Automatically annotated corpora
    10.1.2 Processing