Probabilistic and Multivariate Modelling in Latin Grammar: the Participle-Auxiliary Alternation as a Case Study
A thesis submitted to The University of Manchester for the degree of Doctor of Philosophy in the Faculty of Humanities

2014

James W. R. Brookes

School of Arts, Languages and Cultures

Contents

List of Tables
List of Figures
List of Abbreviations
Abstract
Declaration
Copyright
Acknowledgments

I Preliminaries

1 General Introduction
  1.1 Background
  1.2 Research questions and scope
    1.2.1 Construction under consideration as a case study
    1.2.2 Latin data
    1.2.3 Information space
  1.3 Brief overview of statistical methods
  1.4 Organization of the thesis
2 Outlining the Case Study
  2.1 Introduction
  2.2 The morphosyntax of the Latin participle-auxiliary construction
  2.3 The construction's grammatical variability
  2.4 Overview of previous analyses of Latin PAC variability
  2.5 Previously claimed language-internal constraints
    2.5.1 A free choice?
    2.5.2 Information structure
    2.5.3 Semantics of the participle
    2.5.4 Participle type
    2.5.5 Impersonal verbs
    2.5.6 Number
    2.5.7 Verbal morphology of the auxiliary
    2.5.8 Grammatical properties of the pre-verbal word
  2.6 Problems with previous research
3 The Probabilistic and Multivariate Framework
  3.1 Introduction
  3.2 Probability in language
  3.3 Probabilistic and multivariate grammar: what is it?
  3.4 Corpus-based implementations of probabilistic models
    3.4.1 The dative alternation
    3.4.2 Dutch AVC alternation
    3.4.3 Interim summary
  3.5 Psycholinguistic evidence
    3.5.1 Offline evidence
    3.5.2 Online evidence
  3.6 Conclusions and issues

II Data and Materials

4 Defining and Delimiting the PAC for Operational Purposes
  4.1 Introduction
  4.2 Previous definitions and problems
  4.3 Definition in the present study
    4.3.1 Perfect participle + esse
    4.3.2 Future participle + esse
    4.3.3 Gerundive + esse
    4.3.4 Exclusion of other deverbal forms + esse
  4.4 Final remarks
5 The Choice of Data
  5.1 Introduction
  5.2 Excluded datasources
    5.2.1 Inscriptions
    5.2.2 Poetry
    5.2.3 Categorical authors
  5.3 Included datasources
    5.3.1 Cicero
    5.3.2 Caesar
    5.3.3 Caesar's continuators
    5.3.4 Nepos
    5.3.5 Varro
  5.4 Conclusion
6 Data Extraction
  6.1 Electronic text archive
  6.2 Identification of data points
  6.3 Textual issues
  6.4 Frequency overview

III Information Space

7 Methodological Preliminaries
  7.1 Introduction
  7.2 The information space
    7.2.1 Definition
    7.2.2 Specification of Ω
  7.3 Operationalizing the predictors
  7.4 Types of variables
  7.5 Statistical modelling
    7.5.1 Ordinary least squares regression
      7.5.1.1 Least squares
      7.5.1.2 Statistical inference
        7.5.1.2.1 Hypothesis testing
        7.5.1.2.2 Error variance
        7.5.1.2.3 Standard error
        7.5.1.2.4 Confidence intervals
      7.5.1.3 Regression diagnostics
        7.5.1.3.1 Normality of errors
        7.5.1.3.2 Linearity
      7.5.1.4 Dummy variable coding for categorical predictors
        7.5.1.4.1 Binary variables
        7.5.1.4.2 Multicategory variables
          7.5.1.4.2.1 Treatment coding
          7.5.1.4.2.2 Sum coding
    7.5.2 Generalized linear models
      7.5.2.1 Introduction to binary response models
      7.5.2.2 Modelling binary data in linear regression
      7.5.2.3 Logit link function
      7.5.2.4 Maximum Likelihood Estimation
      7.5.2.5 An example
    7.5.3 Multilevel generalized linear models
      7.5.3.1 Introduction
      7.5.3.2 Variance components models
      7.5.3.3 Random intercepts and random slope models with a categorical covariate
  7.6 Explanation of predictor activity
  7.7 Summing up
8 Multilevel Structure
  8.1 Introduction
  8.2 The GLM null model
  8.3 Text-based variability
  8.4 Lemma-based variability
    8.4.1 Background