


Understanding and Measuring Morphological Complexity

Edited by MATTHEW BAERMAN, DUNSTAN BROWN, AND GREVILLE G. CORBETT


Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© editorial matter and organization Matthew Baerman, Dunstan Brown, and Greville G. Corbett 2015
© the chapters their several authors 2015

The moral rights of the authors have been asserted

First Edition published in 2015
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2014947260

ISBN 978–0–19–872376–9

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Contents

List of Figures and Tables vii
List of Abbreviations xi
List of Contributors xiii

Part I. What is Morphological Complexity?

1 Understanding and measuring morphological complexity: An introduction
  Matthew Baerman, Dunstan Brown, and Greville G. Corbett

2 Dimensions of morphological complexity
  Stephen R. Anderson

Part II. Understanding Complexity

3 Rhizomorphomes, meromorphomes, and metamorphomes
  Erich R. Round

4 Morphological opacity: Rules of referral in Kanum verbs
  Mark Donohue

5 Morphological complexity à la Oneida
  Jean-Pierre Koenig and Karin Michelson

6 Gender–number marking in Archi: Small is complex
  Marina Chumakina and Greville G. Corbett

Part III. Measuring Complexity

7 Contrasting modes of representation for inflectional systems: Some implications for computing morphological complexity
  Gregory Stump and Raphael A. Finkel

8 Computational complexity of abstractive morphology
  Vito Pirrelli, Marcello Ferro, and Claudia Marzi

9 Patterns of syncretism and paradigm complexity: The case of Old and Middle Indic
  Paolo Milizia


10 Learning and the complexity of Ø-marking
   Sebastian Bank and Jochen Trommer

References
Languages Index
Names Index
Subject Index

List of Figures and Tables

Figures

Figure 3.1 Kayardild clause structure and attachment of features 31
Figure 3.2 Features which percolate onto word ω 34
Figure 5.1 A hierarchy of nom-index 80
Figure 5.2 A hierarchy of person values 80
Figure 5.3 A hierarchy of gender values 80
Figure 5.4 A hierarchy of number values 80
Figure 6.1 Types of simple dynamic verbs according to the phonological shape of their perfective stem (total 142) 112
Figure 8.1 One-to-one and one-to-many relations in one-node-per-variable and many-nodes-per-variable neural networks 146
Figure 8.2 A two-level finite state transducer for the Italian irregular form vengono ‘they come’ 150
Figure 8.3 Outline architecture of a TSOM and a two-dimensional 20×20 TSOM 152
Figure 8.4 Topological propagation of long-term potentiation and long-term depression of temporal re-entrant connections over two successive time steps; and word-graph representation of German past participles 154
Figure 8.5 Topological dispersion of symbols on temporal and spatio-temporal maps, plotted by their position in input words 157
Figure 8.6 Alignment plots of the finden paradigm on a temporal (left) and a spatio-temporal (right) map 158
Figure 8.7 (Topological) dispersion values across activation patterns triggered by selected inflectional endings on temporal and spatio-temporal maps for Italian and German known word forms, and on unknown word forms 159
Figure 8.8 BMU activation chains for vediamo-vedete-crediamo on a 20×20 map (left) and their word-graph representation (right) 160
Figure 8.9 Correlation coefficients between alignment scores and recall scores for novel words, plotted by letter positions relative to the stem-ending boundary 161


Figure 8.10 Stem dispersion over pairs in Italian and German verb forms. Dispersion values are given as a of 100 over the map’s diagonal 163
Figure 9.1 A graph representing the relative frequencies of morphosyntactic property combinations in Old Indic adjectival paradigms 172

Tables

Table 3.1 Morphome types and their mode of categorization 29
Table 3.2 Inflectional features and their potential values 32
Table 3.3 Features for which lexical stems can inflect 37
Table 3.4 Inflectional features and their morphomic exponents 51
Table 3.5 Phonological exponents of morphomes 51
Table 4.1 Inflection for ampl ‘laugh at’ with different subjects, objects, and tenses 56
Table 4.2 Free pronouns in Kanum, absolutive, and ergative forms 56
Table 4.3 Kanum verbal inflection 57
Table 4.4 Inflection for makr ‘roast’ with different subjects, objects, and tenses 61
Table 4.5 Opacity and transparency in pronominal systems 62
Table 4.6 Tense distinctions portmanteau with affixes 62
Table 4.7 Pronominal distinctions in the idealized agreement affixes 63
Table 4.8 Takeovers in the object prefixes for two verbs 63
Table 4.9 Referrals in the object prefixes 64
Table 4.10 Referrals in tense affixes 64
Table 4.11 Referrals in the subject suffixes 64
Table 5.1 Oneida pronominal prefixes (C-stem allomorphs) 73
Table 5.2 The nineteen possible categories of semantic indices 79
Table 5.3 Number of morphs in each stem class from which all other fifty-seven forms can be deduced 88
Table 5.4 Number and percentage of stems of each class in twelve texts 89
Table 6.1 Agreeing lexical items in the Archi dictionary 95
Table 6.2 Syncretism pattern A 98
Table 6.3 Syncretism pattern B 98
Table 6.4 Archi prefixes 98
Table 6.5 Archi infixes, Set I 98


Table 6.6 Archi infixes, Set II 99
Table 6.7 Archi suffixes 99
Table 6.8 Partial paradigm of first and second person pronouns 102
Table 6.9 Verbs of similar phonology, but different patterns of realizing agreement 111
Table 7.1 Orthographic forms of the verbs in (1) 121
Table 7.2 Hearer-oriented plat for the verbs in (1) 122
Table 7.3 Speaker-oriented plat for the verbs in (1) 123
Table 7.4 Morphological indices employed in the speaker-oriented plat 124
Table 7.5 Two hypothetical plats 125
Table 7.6 Plat representing a hypothetical system of ICs 127
Table 7.7 Inflection classes, exponences, and distillations in the two plats 128
Table 7.8 4-MPS entropy (× 100) of the four distillations in Table 7.7 129
Table 7.9 Distinct exponences in each of the four distillations 129
Table 7.10 Traditional principal parts of five Latin verbs 130
Table 7.11 Optimal dynamic principal-part sets of three verbs (H-O plat) 131
Table 7.12 Dynamic principal-part sets in the two plats 132
Table 7.13 Dynamic principal-part numbers in the two plats 132
Table 7.14 Candidate principal-part sets for cast in the H-O plat 133
Table 7.15 Candidate principal-part sets for cast (S-O plat) 133
Table 7.16 Number of viable optimal dynamic principal-part sets 134
Table 7.17 Candidate principal-part sets for pass (H-O plat) 134
Table 7.18 Candidate principal-part sets for pass (S-O plat) 135
Table 7.19 Density of viable dynamic principal-part sets among all candidate sets having the same number of members 135
Table 7.20 IC predictability of twelve verbs in the H-O and S-O plats 137
Table 7.21 Cell predictability measures for thirteen verbs in the H-O and S-O plats 138
Table 7.22 Predictiveness of a verb’s cells, averaged across verbs 139
Table 7.23 Predictiveness of a verb’s past-tense cell in the H-O and S-O plats 139
Table 9.1 Relative frequencies of inflectional values on the basis of Lanman (1880) 171
Table 9.2 Old Indic -a-/-ā- adjective declension and relative frequencies of the sets of inflectional value arrays associated with the different exponents 173


Table 9.3 R and D values for some hypothetical variants of the paradigm in Table 9.2 174
Table 9.4 Pali -a-/-ā- adjective declension 175
Table 9.5 Syncretism in the marked gender and in the marked number in Pali -a-/-ā- adjectives 176
Table 9.6 Old Church Slavic (a) and Russian (b) definite adjective 178
Table 9.7 Gender syncretism and gender semi-separate exponence in the of the Latin -o-/-ā- and of the Pali -a-/-ā- adjectives 179
Table 9.8 Vertical and horizontal syncretism in a hypothetical paradigm inflected for number, gender, and case 181
Table 9.9 Jaina-Māhārāṣṭrī -a-/-ā- adjective declension 183
Table 10.1 Typological pilot study: language sample 203

List of Abbreviations

A agent
ABL ablative
ABS absolutive
ACC accusative
ALLO allomorphy
ATTR attributive
BMU Best Matching Unit
BMU(t) Best Matching Unit at time t
CAUS causative
COMP complementization
DAT dative
DP dual or plural (Chapter 5)
DP determiner phrase
DU dual
ERG ergative
EX/EXCL exclusive
F feminine
FACT factual
FEM feminine
FI third person feminine singular or third person indefinite
FZ feminine zoic
GEN genitive
GEND gender
HAB habitual
H-O hearer-oriented
IC inflection class (Chapter 7)
IC information content (Chapter 9)
IE Indo-European
IMP imperfective
IN inclusive (Chapter 5)
INCL inclusive
INDEF indefinite
INSTR instrumental
JN joiner vowel
JUNC juncture
KS known subject
LOC locative
M masculine
MASC masculine


MPS morphosyntactic property set
N neuter
NEG negation
NON-OBL non-oblique
NP noun phrase
NTR neuter
NUM number
OBL oblique
OI Old Indic
OT Optimality Theory
P patient
pastPple past participle
PERS person
PFV perfective
PL plural
PNC punctual
PRS present
presInd present indicative
presPple present participle
PRIM primary morphome
PP prepositional phrase
REP repetitive
SEJ sejunct
SEQ sequential
SG singular
S-O speaker-oriented
SOM self-organizing map
STAT stative/perfective
SUB subjunctive
TAMA athematic tense/aspect/mood
TAMT thematic tense/aspect/mood
TSOM temporal self-organizing map
VOC vocative

List of Contributors

Stephen R. Anderson, Yale University
Matthew Baerman, University of Surrey
Sebastian Bank, University of Leipzig
Dunstan Brown, University of York
Marina Chumakina, University of Surrey
Greville G. Corbett, University of Surrey
Mark Donohue, Australian National University
Marcello Ferro, Institute for Computational Linguistics, CNR Pisa
Raphael A. Finkel, University of Kentucky
Jean-Pierre Koenig, State University of New York at Buffalo
Claudia Marzi, Institute for Computational Linguistics, CNR Pisa
Paolo Milizia, University of Cassino
Karin Michelson, State University of New York at Buffalo
Vito Pirrelli, Institute for Computational Linguistics, CNR Pisa
Erich R. Round, University of Queensland
Gregory Stump, University of Kentucky
Jochen Trommer, University of Leipzig

Part I

What is Morphological Complexity?

1 Understanding and measuring morphological complexity: An introduction

MATTHEW BAERMAN, DUNSTAN BROWN, AND GREVILLE G. CORBETT

Language is a complex thing, otherwise we as humans would not devote so much of our resources to learning it, either the first time around or on later attempts. And of the many ways languages have of being complex, perhaps none is so daunting as what can be achieved by inflectional morphology. A mere mention of the 1,000,000-form verb paradigm of Archi or the 100+ inflection classes of Chinantec is enough to send shivers down one’s spine. Even the milder flavours of Latin and Greek have caused no end of consternation for students for centuries on end.

No doubt much of the impression of complexity that inflectional morphology gives is due to its idiosyncrasy, both at the macro- and micro-level. At the macro-level it appears to be an entirely optional component of language. Other commonly identified components of language have a broader, even universal remit. There is not a single known language to which the notions of syntax, phonology, semantics, and pragmatics cannot be profitably applied. And probably all languages have something that can be described as derivational morphology, though individual traditions differ on how they draw word boundaries; in any case, it would be hard to imagine a language without productive means to create new words. But many languages do completely without inflectional morphology, so that the mere fact that it exists at all in some other languages is something remarkable. At the micro-level, inflectional morphology is idiosyncratic because each language tells its own self-contained story, much more so than with other linguistic components. That is, one can take two unrelated languages and discover that they share similar syntax or phonology, but one would be hard pressed to find two unrelated languages with the same inflectional morphology.


This idiosyncrasy has long stood in the way of describing the properties of inflectional systems across languages in any comprehensive fashion. Although proposals have been advanced over the years about universal constraints on inflectional structure (e.g. Carstairs 1983 on inflection classes, Bobaljik 2012 on patterns of suppletion), the actual task of verifying them is so onerous that they remain suggestive hypotheses. Otherwise, inflection is left to conduct its business in the twilight zone of asystematic lexical specification, free from the scrutiny afforded to syntax. But as in other fields, what looks a mess on the surface may, if subject to the appropriate analysis, reveal itself to be a complex system with its own internal logic. This logic may not map all that readily onto anything else, but for that reason it is all the more worth uncovering; this is because it does not necessarily follow from our prior conceptions of how things ought to work.

Related fields within linguistics have contributed a particular view of complexity that has shaped our vision of the components of language, including morphology. Ackerman et al. (2009) note that morphology can be considered either in syntagmatic or paradigmatic terms. They concentrate on the paradigmatic dimension, but the syntagmatic conceptualization, which dominates several sub-disciplines, has often channelled thinking about complexity in morphology and other components of grammar. That is, there is a view of complexity in terms of the relationships between concatenated elements, rather than in terms of paradigmatic oppositions. This is only natural if one applies the logic associated with the analysis of syntax, but if one wishes to understand distinctions that are unnecessary from the point of view of syntax, then concentrating on one of these dimensions to the exclusion of the other is insufficient.

Computational complexity is an important notion, but not the main focus of this volume. Jurafsky and Martin (2009: 563) note that grammars can be understood in terms of their generative power or the complexity of the phenomena which they are being used to describe. Well-known examples of complexity from this perspective related to morphology include Culy’s (1985) discussion of the whichever X construction in Bambara, a Mande language of Mali. There is, as ever, the important problem that we should not necessarily draw inferences about the overall computational complexity of a language from particular constructions, as pointed out by Mohri and Sproat (2006: 434), because it is possible for a regular language to contain a context-free or context-sensitive subset, for instance.

In this volume we concentrate on morphological complexity as the additional structure that cannot readily be reduced to syntax or phonology. As Anderson (this volume) notes, human languages already have a combinatorial system, the syntax, and they already have a system for the expression of linguistic signs in form, the phonology. Morphology is therefore a kind of complexity which is entirely unnecessary from this perspective. While morphotactics is concerned with the combinatorial or syntagmatic dimension, it is not the same thing as the syntax, because we can find different orderings obtaining within the different components. This need not result in differences of complexity in the sense in which we are used to it within formal language theory, but it represents an unnecessary additional combinatorial system. Layered on top of this issue is the fact that morphology also exhibits a very different, paradigmatic, complexity which cannot be considered in terms of combinatorics. Complexity in the sense in which we are employing it is about distinctions which are excessive from the point of view of the other essential components, either because there is an additional syntagmatic or combinatorial system over and above the one found in syntax, or because the paradigmatic dimension allows for cross-cutting distinctions which are not relevant for syntax. In some approaches it may be a straightforward matter to specify inflectional class information. Indeed, the existence of inflectional features as such is not an issue for finite-state approaches to morphology, where this information is encoded as part of the stem (see discussion in Hippisley 2010: 39–41, for instance). The larger issue here is that what is or is not ‘excessive’ depends on one’s particular model, and inflectional classes are just one example of this problem. From a purely formal perspective, then, the existence of morphological features does not say much about generative power or complexity in terms of strings. It is really about the need to distinguish different description languages to talk about morphology separately from syntax, for instance.

Indeed, if we are to measure and understand the complexity of morphology we need to adapt methods to capture the degree of predictability between different paradigm cells. Entropy-based measures have been employed for this purpose, as discussed in Ackerman et al. (2009); such measures relate complexity to predictability, in that there is a greater degree of entropy in the system if new instances are difficult to predict. Kolmogorov complexity is another measure, where the complexity of the data is seen in terms of the minimum size of the rule required to generate that data (Sagot 2013). Sagot discusses three different approaches to morphological complexity:

• Counting-based
• Entropy-based
• Description-based

The counting-based approach to morphological typology is what most typologists are familiar with. What is counted may differ between theorists, of course. Some will count the number of morphemes found in words, however these are defined; some may count the number of features or feature values available. This approach has a number of disadvantages. If we count features in different languages, we need to be able to establish that the features are comparable. Where one language has an abundance of featural distinctions in one area, and another has an abundance in a different area, how are we to judge which is more complex? More importantly, there is an assumption that the larger the number of values for a given feature, the more complex the system. And yet complexity can arise in languages where the inventory of relevant items is quite small.


It is possible that the description of the system is quite involved, or that there is a significant degree of unpredictability. Entropy- and description-based measures are therefore better able to capture such differences. Sagot and Walther (2011) develop methods for measuring the complexity of various descriptions of morphology. The degree of dependence on the formal description is an important issue. Adherents of entropy-based approaches argue that these are not so dependent on the formal description, but as Sagot (2013) notes, the conditional entropy measure may depend on formalization. Sagot provides an example with reduplication for a language where all stems have the structure C1VC2. Each paradigm has two forms, as in (1).

(1) Sagot’s (2013) example of the formalization-dependence of conditional entropy
    Form 1    Form 2
    C1VC2     C1VC2VC2

Sagot argues that if the formalism which models (1) only allows for concatenative affixation, then the paradigm in (1) will be fairly complex, whereas if reduplication is permitted as an operation of the morphology, then it will be straightforward. In fact, the conditional entropy will be zero, as we can predict with certainty that form 2 is the reduplicated version.

In Stump and Finkel’s (2013) monograph-length typology a number of detailed measures are developed for quantifying the morphological complexity of different inflectional class systems. This work starts out from the traditional notion of principal parts but takes this much further, moving on to develop novel measures, based on dynamic principal parts and average cell predictor numbers, among others. Stump and Finkel (2013: 109) make an important distinction between entropy-based measures when applied to syntax and morphology. For a given syntactic context certain combinations are more predictable than others, such as the occurrence of little after very, but entropy can never be reduced to zero for active syntactic structures, because they naturally allow for different combinations. Entropy measures have also been applied to the analysis of paradigms, such as Moscoso del Prado Martín et al. (2004), Milin et al. (2009), and Ackerman et al. (2009). However, as Stump and Finkel (2013: 111) persuasively argue, there is a major contrast when one compares what happens in morphology with what happens in syntax. In morphology, given the right implicative relations, the entropy can be reduced down to zero, whereas no matter how much syntactic context is provided, this is not possible in syntax. This shows that two radically different beasts are being described by these measures. Stump and Finkel argue further that while entropy measures can describe both the syntagmatic dimension (of syntax) and the paradigmatic dimension (of morphology), the set-theoretic measures of inflectional class complexity which Stump and Finkel develop cannot be used to describe syntax. The fact that these do not apply across domains entails that they are specialized for describing the implicative relations between cells. In addition to the fact that the syntagmatic properties of morphology are different from those of syntax, this dimension of complexity which is peculiar to morphology is also worthy of consideration.
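As a toy illustration of these points (not taken from any of the chapters), the sketch below estimates the entropy of the Form 2 cell of Sagot’s hypothetical C1VC2 paradigm in (1) under two descriptions: one in which the formalism can only record literal concatenative suffixes, and one in which reduplication is available as a single abstract operation. The stem list, the exponent labels, and the use of plain Shannon entropy over exponent types (a simple stand-in for the conditional entropy discussed above) are illustrative assumptions, not part of Sagot’s or Ackerman et al.’s own implementations.

```python
import math
from collections import Counter

def entropy(outcomes):
    """Shannon entropy (in bits) of the empirical distribution over outcomes."""
    counts = Counter(outcomes)
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A toy C1VC2 lexicon (invented stems, for illustration only).
stems = ["pak", "tun", "mis", "kol", "rep"]
paradigms = {s: (s, s + s[1:]) for s in stems}   # Form 2 = C1VC2 + VC2

# Description A: only concatenative affixation is available, so each lexeme's
# Form 2 must be recorded as 'add suffix -X' for some literal string X.
suffix_exponents = [form2[len(form1):] for form1, form2 in paradigms.values()]

# Description B: reduplication is a primitive operation of the morphology,
# so every lexeme's Form 2 is recorded with one and the same abstract exponent.
redup_exponents = ["RED(VC2)"] * len(paradigms)

print(entropy(suffix_exponents))   # log2(5) = ~2.32 bits: five distinct 'suffixes'
print(entropy(redup_exponents))    # 0.0 bits: Form 2 is fully predictable
```

Under the concatenative description the uncertainty grows with every new stem shape, while under the operation-based description it is zero, mirroring the point that such measures are only as formalization-independent as the description language to which they are applied.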

Concentrating on what is peculiar to the combinatorics of morphology and the additional considerations which arise with the paradigmatic dimension is our purpose with this volume. As such it complements a body of work from the last decade which deals with the notion of complexity in language more generally. For example, Dahl (2004) takes a novel view of complexity. He starts from basic notions of information and redundancy, and works his way through relevant ideas from outside linguistics. He then analyses a range of linguistic phenomena, including inflectional morphology and featurization, showing how complexity results from historical processes. The book’s strengths are the introduction of new ideas into the discussion of linguistic complexity, with critical discussion of their applicability, and the original examination of familiar linguistic material. Sampson et al. (2009) is a volume of papers that responds to a growing recent trend in linguistics to question the principle of invariance of language complexity, the assumption that languages are all in a certain sense equally complex, and that where one particular component of a language is more complex, another will balance this out by being less so. It deals with a wide range of issues including discussion of its relationship with social and cognitive structures. In the introduction to this volume Geoffrey Sampson takes issue with an assertion by Hockett (1958: 180–1) that ‘languages have about equally complex jobs to do’ by arguing that defining what grammar does is challenging if one wishes to make precise predictions about complexity (Sampson 2009: 2). Culicover (2013: 14) makes a similar point in relation to what he terms ‘relative global complexity’. We believe this to be a valuable insight and, instead of dealing with complexity in all its manifestations, we aim for a precise focus on the contrast between featural distinctions relevant for syntax on the one hand, and those systematic elements of morphology that appear to cross-cut syntax on the other. Miestamo et al. (2008) is concerned above all with typology, and with the simplifying effects of contact. Morphology is touched upon in several of the contributions, with complexity assessed according to two parameters: (i) the number of morphosyntactic features and values, and (ii) deviations from a one-to-one mapping between meaning and form. The first parameter treats inflectional morphology itself as an element of complexity, a question we choose to remain agnostic on for the present. But the second parameter coincides with the core concerns of the current volume, and is the particular focus of two chapters. In ‘Complexity in linguistic theory, language learning, and change’, Kusters traces developments within the Quechua verb paradigm, where various many-to-one mappings between morphosyntax and form were disentangled to something like a transparent one-to-one mapping in those varieties of the language more heavily affected by contact. In ‘Complexity in nominal plural allomorphy’, Dammel and Kürschner compare plural marking across the Germanic languages according to a rich set of criteria. Of particular note is the fact that they look not just at the forms, but at the assignment principles, so that phonologically or semantically predictable allomorphy is treated as less complex than that which is arbitrarily stipulated.

More broadly, complexity is a notion which has wide applicability in a variety of different fields. Gershenson and Fernández (2012: 32) note that it can be understood as a balance between order¹ (information growth) and variety. Where the variety involved is great we should expect the information growth not to be high. Morphological complexity fits into this general categorization at some level. Where the relationship between syntax and its expression in form is straightforward, for example, variety in morphology does not interfere to create complexity. So the study of morphological complexity fits within a wider cross-disciplinary research programme, but we need to understand its peculiarities when applying more general techniques.

¹ This is a measure of information transformation which depends on the relationship between input and output.

The present volume is the fruit of a three-day conference held at the British Academy in London in January 2012 on the theme of morphological complexity, and represents a selection of the most interesting and relevant contributions. We have divided them up into three sections. Chapter 2 (Anderson) continues the theme of Part I, giving a typological overview of morphological complexity both in its paradigmatic and its syntagmatic aspects. The rest of the volume concentrates on three aspects of morphological complexity:

(i) different expression of equivalent distinctions in a particular language. At the simplest level, English expresses the plural in different ways (cats, children, cacti), and this allomorphy of inflectional exponence divides the lexicon into different inflection classes according to the particular form taken;
(ii) units which exist within morphology and which cannot be readily defined in terms of syntactic or semantic feature values; for instance, some French verbs have a different form for the first and second persons plural (a dramatic case is the verb ‘go’, with je vais ‘I go’ but nous allons ‘we go’);
(iii) complexities in the realization of morphological form; thus the German Buch ‘book’ has the plural Bücher, which involves both an inflection and a change of the root vowel.

The chapters in Part II deal with questions of theoretical and formal analysis of complexity as a route towards better understanding of what is involved. Chapter 3 (Round) provides a novel typology of the morphome that integrates all three themes: rhizomorphomes divide the lexicon into morphologically specified units (inflection classes), metamorphomes divide the paradigm into morphologically specified units, while meromorphomes are abstract units that serve as the building blocks of morphological form. Chapter 4 (Donohue) describes a system in which rules of referral play a strikingly predominant role, with forms from one cell in the paradigm being co-opted for use in another. While there is no apparent semantic motivation for these patterns, they follow a strict morphological hierarchy, resulting in a network of implicative relationships. The Oneida system described in Chapter 5 (Koenig and Michelson) is also characterized by massive syncretism. While the individual conflations may have a plausible morphosyntactic motivation, the broader principles which determine when and how these conflations are effected are less transparent. In this case inflection classes also contribute to the considerable morphotactic complexity otherwise characteristic of Iroquoian languages. In contrast to the morphosyntactic and morphological richness displayed by Oneida, the system of gender and number agreement in Archi, described in Chapter 6 (Chumakina and Corbett), seems quite spartan, with just a handful of affixes realizing eight possible morphosyntactic distinctions. Nevertheless, the realization of this small paradigm is anything but straightforward, as there is a large inventory of agreement targets, yet lexemes belonging to the same part of speech do not necessarily behave alike. It proves difficult to predict which items will realize gender and number, and even more challenging to determine the position of the marking (notably whether it will be prefixal or infixal).

Part III concentrates on measuring and quantifying complexity using computational techniques. Inflection classes are the particular focus of the first two chapters. Chapter 7 (Stump and Finkel) describes a number of different computationally implemented metrics of the complexity of inflection class systems, pointing out an often-overlooked parameter, namely the mode of representation. They contrast a hearer-oriented (i.e. phonological) and a speaker-oriented (i.e. morphologically decomposed) representation, which may give quite different results, depending on the metric. Chapter 8 (Pirrelli, Ferro, and Marzi) offers a psycholinguistically plausible computational model of the word-and-paradigm approach to inflection. This approach is motivated in particular by the presence of inflection classes, which blur any obvious segmentation between lexical and inflectional material. The result is an explicit representation of the local analogical relationships between word forms that can be used to generate paradigms.

The final two chapters focus on the internal organization of the paradigm, in particular on the indirect mapping between morphosyntactic values and morphological form that characterizes syncretism. Chapter 9 (Milizia) returns to the long-standing question of markedness as a motivating factor behind syncretism. Given the uncertainties that plague the notion of markedness, he offers instead an information-theoretic approach, suggesting that the information load inherent in a system with inflection classes may be a motivation for the conflation of paradigmatic cells. Chapter 10 (Bank and Trommer) develops an automated method for morphological segmentation which allows a quantificational assessment of different segmentation strategies, balancing the complexities of morphosyntactic representation and morphological exponence.


Each chapter can be seen as a demonstration of morphological complexity. Taken together, with their different data and different methodologies, they reveal a striking picture: inflectional systems are indeed intricate and challenging, but they are also elegant, and when viewed abstractly enough they reveal comparable structures. But they leave the perplexing question of why this complexity and elegance is so pervasive in some languages, while other languages are devoid of inflection.

Acknowledgements

We would like to thank the contributors to this volume, since the work and ideas presented here are theirs. We also thank Penny Everson and Lisa Mack for their invaluable assistance in the preparation of the manuscript. Finally, none of this would have been possible without the support of the European Research Council (grant ERC-2008-AdG-230268 MORPHOLOGY), which is gratefully acknowledged.

2 Dimensions of morphological complexity

STEPHEN R. ANDERSON

The question of what aspects of a language’s word structure patterns contribute to its overall complexity has a long tradition behind it. With roots in the notions of linguistic typology associated with traditional grammar, Sapir (1921; see also Anderson 1992: §12.2) organizes these matters along three dimensions. One of these concerns the range of concepts represented by morphological markers, and refers to the extent of elaboration of the inflectional and derivational category structure of a language. A second refers to the range of marker types, and thereby differentiates transparent affixation of the sort associated with pure ‘agglutinating’ languages from a variety of other formal processes by which morphological information can be conveyed. The third dimension is that of the overall internal complexity of words, the sheer number of distinct pieces of information that are combined in a single word, ranging from the simplest case of ‘isolating’ languages that involve (little or) no morphological combination up to the ‘polysynthetic’ type¹ in which most or all of the components of a full sentence are expressed within a single word.

My goal in this chapter is to develop and elaborate a characterization of the morphological organization of languages along lines similar to Sapir’s, so as not only to serve similar typological goals but also to provide a framework for understanding the questions of linguistic typology that motivate other authors in this volume. In that spirit, I will feel free to propose an agenda of questions to be asked about languages without being obliged to offer a comprehensive set of answers. Before proceeding to that enterprise, however, I want to step back from the details and ask what it is about morphology that constitutes ‘complexity’ in the broader picture of human natural language.

¹ Not to be confused with the distinct technical sense of this term in Baker (1996) and related work.


2.1 What is ‘complex’ about morphology?

Different observers will see different things about a language as making it ‘complex’. In traditional terms, for example, the outer limit of morphological complexity in natural languages was often seen as represented by languages of the polysynthetic type, highlighting the sheer quantity of morphological elaboration as the main contributor to complexity. While teaching a course on the diversity of the world’s languages to a class of Chinese students in Beijing recently, I was exposed to what was, for me, a somewhat unusual perspective on this matter. The students had been reading some of Mark Baker’s work on the parametric description of grammars, in which he talks about ‘polysynthesis’ as a parameter of grammatical structure. They did not really know anything about the specific languages he discussed in this connection, but it was clear to them that this notion was associated with being very complex. In trying to figure out what it might mean for a language to be ‘polysynthetic’, one of my students offered a clarification in a written exercise:

A ‘Polysynthetic’ language is one in which words are very complex. That is, they have more than one meaning element combined into a single word: for instance, English cat-s.

From the perspective of a speaker of a Chinese language, apparently, any morphological structure seems to be complex. While initially merely amusing to speakers of languages of the ‘Standard Average European’ type, I will suggest that this is a rather more coherent and principled view than it may seem at first sight.

What aspects of a system contribute to its complexity? A standard rhetorical move would be to consult the dictionary for a starting point. The equivalent source of wisdom in the present age is Wikipedia, which provides the following:

A complex system is a system composed of interconnected parts that as a whole exhibit one or more properties (behavior among the possible properties) not obvious from the properties of the individual parts.

‘Complexity’, then, can be seen as the consequence of a system’s displaying characteristics that do not follow as theorems from its nature, as based on its irreducible components. How, then, does this apply to language? What aspects of language are essential, and what properties that languages display represent complications that are not logically necessary?

Languages are systems that provide mappings between meaning, or conceptual structure, on the one hand, and expression in sound (or signs) on the other. In order to fulfil this function, there are some kinds of organization that they have to display by virtue of their essential character.

Of necessity, a language has to have a syntax, because it is the syntactic organization of meaningful elements as we find it in human languages that gives them their expressive power, providing their open-ended ability to express and accommodate a full range of novel meanings.

It is also plausible to suggest that languages need to have phonologies. That is because individual meaningful elements—linguistic signs—must have characteristic expressive properties to serve their essential purpose, as stressed by linguists since de Saussure. When these are combined by means of the syntactic system, however, the result may be at odds with the properties of the system through which they are to be implemented: the properties of the vocal tract, or of the signing articulators. There is thus a conflict between the need to preserve the distinguishing characteristics of meaningful elements and the need to express them through a system with its own independent requirements. Optimality Theory articulates this explicitly as the conflict between considerations of Faithfulness and Markedness, but similar considerations are at the foundation of every theory of phonology. Some account of how this tension is to be resolved is inherently necessary, and thus the presence of phonology, like that of syntax, follows from the nature of language.

Both syntax and phonology are thus inherent in any system that is to fulfil the basic requirements of a human language, and their presence (as opposed to their specific stipulated properties) cannot be seen as constituting complexity in itself. The same cannot be said for morphological structure, however, as pointed out forcefully by Carstairs-McCarthy (2010), and that makes it hard to understand why humans should have evolved in such a way that their languages display this kind of organization at all.

To clarify this, let us note that the content we ascribe to morphology can be divided into two parts, ‘morphotactics’ and ‘allomorphy’, and in both cases it is difficult to see the existence of such structure as following inherently from the nature of language. Morphotactics provides a system by which morphological material (grossly, but inaccurately identified with the members of a set of ‘morphemes’) can be organized into larger wholes, the surface words of the language. But in fact language already involves another system for organizing meaningful units into larger structures, the syntax, and so to the extent the morphotactics of a language can be distinguished from its syntax, this would seem to be a superfluous complication.

Allomorphy is the description of the ways in which the ‘same’ element of content can be realized in a variety of distinct expressions. When that variation follows from the language’s particular resolution of the conflict between the requirements of Faithfulness and Markedness, as described previously, this is just phonology, and can be seen as necessary. But when we find allomorphic variation that does not have its roots in the properties of the expression system, it does not have this character of necessity, and so constitutes added complexity.

2.1.1 Morphotactics ≠ syntax

In fact, the two a priori unmotivated kinds of complication are essentially constitutive of morphology. Carstairs-McCarthy (2010) cites examples of morphotactic organization that are unrelated to (and in fact contradict) the syntax of the languages in which they occur; another such example was discussed in Anderson (1992: 22–37) in Kwakw’ala, a Wakashan language of coastal British Columbia.

To summarize, Kwakw’ala is a language with a rather rigid surface syntax. Sentences conform to a fairly strict template, with the verb (or possibly a sequence of verbs) coming in absolute initial position, followed by the subject DP, an object DP marked with a preceding particle beginning with x̣- (if the verb calls for this), possibly another object marked with s- (again, if the verb subcategorizes for an object of this type), optionally followed by a series of PPs. Adjectives strictly precede nouns they modify, and other word order relations are similarly quite narrowly constrained by the grammar.

However, Kwakw’ala also has a rich system of ‘lexical suffixes’ constituting its morphology, and these correspond functionally to independent words in other languages. The point to notice is that when meaningful elements are combined in the morphology, as in the abundant variety of complex words consisting of a stem and one or more lexical suffixes, the regularities of order found in the syntax are quite regularly and systematically violated. For instance, as just noted, verbs are initial in syntactic constructions, with any objects coming later. When elements corresponding to a verb and its object are combined within a single word, however, they typically appear in the order O-V rather than V-O. Thus, ƛ’ina-gila ‘fish.oil-make, make fish oil’ has the object of ‘make’ initially, and the element with verbal semantics following.

Similarly, although the subject invariably precedes any objects in a syntactic construction, if (and only if) the element expressing the object of the verb is a lexical suffix attached to the verb in the morphology, it can precede the subject DP. Thus, in na’w-əm’y-ida bəgwanəm ‘cover-cheek-the man, the man covered (his) cheek’ the object of ‘cover’ is the suffix -əm’y ‘cheek’, and as a result it precedes the subject -ida bəgwanəm ‘the man’—an ordering that would be quite impossible if the object were separately expressed in the syntax.

As another example, exactly when they are combined in a single word by the morphology rather than composed in the syntax, an adjectival modifier can follow the expression of the noun it modifies. The lexical suffix -dzi ‘large’ not only can but must follow an associated nominal stem, as in ƛ’aqʷa-dzi ‘copper-large, large copper (ceremonial object)’. Once again, the ordering imposed by the morphotactics of the language is directly contrary to that which would be given in the syntax.

There is actually something of an argument-generating algorithm here: find any systematic regularity of order in the syntax of Kwakw’ala, and it is quite likely that the principles of morphotactic organization will systematically violate it. Overall, the morphotactics of the language bear little or no resemblance to its syntax, which naturally raises the question of why a language should have two quite distinct systems that both serve the purpose of combining meaningful units in potentially novel ways to express potentially novel meanings.

Dimensions of morphological complexity  that both serve the purpose of combining meaningful units in potentially novel ways to express potentially novel meanings. Given the existence of two separate and distinct combinatory systems in Kwakw’ala, there is potential duplication of function: a new complex meaning might be con- structed either on the basis of the morphology or on that of the syntax. And this is indeed the case, as illustrated in (1). Kwakw’ala has a suffix -exsd that adds the notion ‘want’ to the semantics of a stem, and the meaning ‘want to (X)’ can be conveyed through the addition of this element to the stem representing a verb. But there is also a semantically empty stem ax. -,andthesuffixcanbeaddedtothistoyieldan independent verb ax. exsd ‘towant’,whichcaninturntakeanotherverbasitssyntactic complementtoyieldessentiallythesamesense. (1) a. kwakw’ala-exsd-әn speak.Kwakw’ala-want-1sg I want to speak Kwakw’ala w w b. ax.-exsd-әn q-әn k ak ’ala -want-1sg that-1sg speak.Kwakw’ala I want to speak Kwakw’ala Interestingly, there appears to have been a subtle but significant shift in the language: where traditional speakers relied heavily on the morphology for the composition of novel expressions, modern speakers are much more likely to combine meanings in the syntax. Importantly, neither the morphology nor the syntax has actually changed in any relevant way: what has happened is just that the expressive burden has shifted substantially from one combinatory system to the other. My understanding is that similar developments have occurred in other languages with complex morphology of this sort, as a function of declining active command of that system (though without loss of the ability to interpret morphologically complex words).

2.1.2 Allomorphy ≠ phonology

A similar case can be made concerning the relation between allomorphic alternation and patterns of variation dictated by the phonology of a language. Again, we can illustrate this from Kwakw’ala. As in the other Wakashan languages, Kwakw’ala lexical suffixes each belong to one of three categories, depending on their effect on the stems to which they are attached. These are illustrated in (2).

(2) Hardening (roughly, glottalizing) suffixes, e.g. /qap + alud/ → [qap’alud] ‘to upset on rock’
    Softening (roughly, voicing), e.g. /qap + is/ → [qabis] ‘to upset on beach’
    Neutral (no change), e.g. /qap + a/ → [qapa] ‘(hollow thing is) upside down’


Importantly, the category of a given suffix is not predictable from its phonological shape: it is an arbitrary property of each suffix that it is either ‘hardening’, ‘softening’, or neutral. As a result, the different shapes taken by stem-final consonants with different suffixes cannot be regarded as accommodations to the requirements of Markedness conditions.

This example is of course quite similar to that of the initial mutations in Celtic languages and other such systems. There is little doubt that, like these, there was a point in the history of the Wakashan languages at which the ancestors of these suffixes did differ in phonological form, and the changes we see today are the reflexes of what were originally purely phonological alternations. But the important thing is that in the modern languages, by which I mean everything for which we have documentary evidence since the late nineteenth century, this motivation is no longer present, but the allomorphic variation persists. Languages are perfectly content, that is, to employ principles of variation whose properties do not follow from the necessary resolution of the competing demands of Faithfulness and Markedness. Such variation is the content of the component of morphology we call allomorphy, and its presence in natural language must be regarded as not following from the nature of language, and thus as adding complexity.

So we must conclude that from the point of view of what language does and what it needs to fulfil that task, morphological structure is superfluous: neither morphotactic regularities nor non-phonologically conditioned allomorphic variation follow from the basic requirements of linguistic structure. Nonetheless, virtually all languages—even Chinese languages—have at least some morphology that is not reducible to syntax and/or phonology. Since it appears to be the case that in a very basic sense any morphology at all constitutes ‘complexity’, that fact stands in need of an explanation. A genuinely explanatory account of the basis for morphological organization would have to lie in the evolutionary history by which the human language faculty has emerged in the history of our species. Like many aspects of the structure of language, the search for such an account runs up against a general lack of firm evidence, since language in general leaves no direct trace in the physical record.

Of course, just how much morphological complexity there is and where it is located can vary enormously from language to language. It is the structure of this variability to which I turn in the remainder of this chapter.

2.2 The structure of ‘complexity space’ in morphology

While some languages display very little organization that is autonomously morphological, others provide us with rather more to explore. Three North American languages that are notably robust in their morphology are exemplified in (3).²

² My thanks to Marianne Mithun for the Central Alaskan Yupik and Mohawk examples here.


(3) Kwakw’ala:
    hux̣ʷ-sanola-gil-eł
    vomit-some-continuous-in.house
    ‘some of them vomit in the house’

    Central Alaskan Yupik:
    Piyugngayaaqellrianga-wa
    pi-yugnga-yaaqe-lria-nga=wa
    do-able-probably-intr.participial-1sg=suppose
    ‘I suppose I could probably do that’

    Mohawk:
    wa’koniatahron’kha’tshero’ktáhkwen
    wa’-koni-at-ahronkha-’tsher-o’kt-ahkw-en
    factual-1sg/2sg-middle-speech-nmzr-run.out.of-caus-stative
    ‘I stumped you (left you speechless)’

While obviously displaying more complex morphology than familiar European languages, these differ somewhat from one another. Kwakw’ala presents us with many word forms that incorporate a more diverse collection of information than we are used to in languages like English, but the individual components are relatively transparent and the degree of elaboration in words that occur in actual texts is moderate. ‘Eskimo’-Aleut languages like Central Alaskan Yupik are commonly cited as falling at the extreme end of morphological complexity, because they make use of essentially open-ended combinations of meaningful elements to construct expressions of arbitrary complexity, but the individual components of a word are still relatively easy to tease apart, and the actual degree of complexity in common use is only somewhat greater than in Kwakw’ala. Iroquoian languages like Mohawk or Oneida are somewhat more intricately organized, and the complexities of combination are much harder to disentangle (see Koenig and Michelson, this volume). While the results are often very elaborate, there is a sort of upper bound imposed by the fact that, unlike the other two but similar to the Athabaskan languages, their morphology is based on a word structure template with a limited (if large) number of slots, such that the degree of complexity of the material filling any particular slot is (with some exceptions) bounded.

Other chapters of this volume present a variety of examples of complex morphological systems. My goal here is to characterize the logical space within which such complexity falls, the major dimensions along which languages elaborate the structure of words in ways that do not derive transparently from the essential nature of these linguistic elements. These fall into two broad categories: properties of the overall system, and characteristics of the relation between individual morphological elements and their exponents, ways in which the realization of content in overt form does not follow from the nature of either.

2.2.1 Overall system complexity

Morphological systems taken as wholes can differ in several ways. Some languages simply have more robust inventories of morphological material, more non-root meaningful elements (typically, but not always, affixes) than others. This difference is at least logically distinct from the extent to which languages combine multiple morphological elements within a single word. And where such combined expression is found, the extent to which relations between elements (such as linear order within the word) can be predicted from their nature can also vary.

2.2.1.1 Number of elements in the system
A basic sort of complexity derives from the simple matter of how much morphological elaboration a language makes available. In this regard, the languages of the Eskimo-Aleut family may be more or less at the extreme end: these languages commonly have more than five hundred derivational suffixes, and an inflectional system that involves at least as many more. The Salish and Wakashan languages of the Pacific Northwest are also rich in derivational morphology, though perhaps not quite to the same extent: Kwakw’ala, for instance, has about two hundred and fifty suffixes of this type (along with a few reduplicative processes), as documented by Boas (1947). English plays in a slightly lower league, though it is still not trivial: Marchand (1969) identifies about one hundred and fifty prefixes and suffixes in the language. Even Chinese languages, which are sometimes claimed to have ‘no morphology’, do in fact display some. Packard (2000) describes seven prefixes and eight suffixes in standard Mandarin, and provides arguments for thinking of these as morphological elements. Perhaps there are languages that are absolutely uncomplicated in this respect—Vietnamese is sometimes suggested, although this language exhibits extensive compounding, which is surely morphological structure in the relevant sense—but this is certainly an extremely rare state of affairs, if indeed it exists at all.

Even a system with a comparatively small number of markers may exhibit complexity of a different sort when the factors determining the choice of a particular marker and the conditions of exponence (e.g. as a prefix, suffix, or infix) are themselves complex, as in the case of gender marking in Archi (Chumakina and Corbett, this volume).

2.2.1.2 Number of affixes in a word
The extent to which a language makes full use of its morphological capabilities can vary independently of the structure of the system itself. For example, the Eskimo languages all have more or less the same inventories of morphological possibilities, but some of them seem to put more weight on this aspect of their grammar than others. de Reuse (1994) observes that Central Siberian Yupik postbases are

    most often productive and semantically transparent, and can be added one after another in sequences of usually two or three, the maximum encountered being seven. These sequences are relatively short in comparison to other Eskimo languages, such as C[entral] A[laskan] Y[upik], where one can find more than six postbases in a word, and where it is possible to have more than a dozen.

What is at stake here is a difference related to the change mentioned previously in the extent to which modern Kwakw’ala speakers rely more on syntactic than on morphological elaboration to express complex meanings, although the potential expressive capacity of the morphology remains unreduced.

Dimensions of morphological complexity  tial expressive capacity of the morphology remains unreduced. For comparison, Kwakw’ala is roughly similar to Central Siberian Yupik in the degree of observed complexity of individual words. ... Principles of morphological combination Apart from sheer numbers of pos- sible morphological elaborations of a basic stem, either in the size of the language’s systemorinwhatcanbeobservedinindividualwords,anotherdimensionofa language’s morphological complexity is the principles that govern combinations of morphological markers. In many cases, the content (or ‘meaning’) of various parts of a word’s morphology corresponds to a structure in which some elements take semantic scope over others. The most straightforward way in which the formal correspondents of these elements can be related is for their combination to reflect such scope relations directly. Where all of the markers in question are identifiable affixes, this is achieved by having these added one after another (working out from the root), with each one taking all of the material inside it (i.e. preceding if a suffix or following if a prefix) as its scope. We can see this in Kwakw’ala, where the same affixes can combine in different orders depending on the meaning to be expressed as shown in (4). (4) a. ‘cause to want’: w w ne’nak ’-exsda-mas-ux. John gax-әn go.home-want-cause-3sg John to-1sg John made me want to go home b. ‘want to cause’: w w q’aq’oua-madz-exsd-ux. John gax-әn q-әn guk ile learn-cause-want-3sg John to-1sg that-1sg build.house John wants to teach me to build a house Here the order follows from the content properties of the elements involved, and so does not contribute complexity. Contrast that situation with one in which the order of elements within a word is specified as an autonomously morphological property, rather than following from their semantics (or something else). This situation is often referred to in terms of morphological templates, such as what we find in the Athabaskan languages. An example is the templatic order of markers within the Babine-WitsuWit’en verb, given in (5) and derived from Hargus (1997, apud Rice 2000). (5) Preverb + iterative + multiple + negative + incorporate + inceptive + dis- tributive # pronominal + qualifier + conjugation/negative + tense + subject + + stem This is actually one of the simpler and more straightforward template types found within the Athabaskan family (and is chosen here from among the many examples provided in Appendix I to Rice (2000) in part because all of the marker categories are OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi

comparatively self-explanatory). For each of these languages, we can give somewhat similar templates, specifying the order in which morphological elements appear within a rather complex whole. The principles governing such templates are not wholly arbitrary, but various factors are involved including at least some stipulation: the ordering of element classes is partly based on semantics, partly on phonology (with prosodically weaker elements located closer to the stem), and partly arbitrary.
While the partial arbitrariness of such templatic morphology obviously adds complexity to the language in the sense being developed here, it is notable that such templates appear to be highly stable, at least grossly, over very long periods. Similarity in template structure, for example, is a significant factor in the evidence that supports a kinship between Tlingit-Athabaskan-Eyak on the one hand, and the Yeniseian languages of Siberia on the other (Vajda 2010).
We may ask what factors are ‘natural’ predictors of element order within words. Like other ordering relations we find in grammar, the relative order of morphological operations—commonly, but not exclusively represented by ‘morpheme order’—is governed by more than one principle, and these do not always agree. Basic, of course, is the notion of semantic scope: a morphological operator is expected to take all of the content of the form to which it applies as its base. Another factor, though, is the typical relation between derivational and inflectional material, with the latter coming ‘outside’ the former in the general case. This relation has been asserted (in Anderson 1992 and elsewhere) to be a theorem of the architecture of grammar; a large and contentious literature testifies to the fact that this may need to be qualified in various ways, but in the present context it is only the general effect that matters. There may also be rather finer-grained ordering tendencies of a similar sort (e.g. mood inside of tense inside of agreement, etc.), as suggested in various work of Joan Bybee (e.g. Bybee et al. 1994), although those also tend to have rather a lot of apparent exceptions. Linguistic theory needs to clarify the issue of which of these effects, if any, follow from the architecture of grammar, and which are simply strong tendencies, grounded in some other aspect of language.
Somewhat surprisingly, phonological effects also show up, as argued by Rice for various Athabaskan systems. This is an effect known from clitic systems: for instance, Stanley Insler argues (in unpublished work) that second position clitics in Vedic show regularities such as high vowels before low, vowel-initial clitics before consonant-initial, etc. Perhaps the appearance of similar effects in morphology is another example of how at least special clitics are to be seen as the morphology of phrases (Anderson 2005).
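The difference between scope-driven and template-driven ordering can be made concrete with a small sketch. The Python fragment below is illustrative only: the slot labels follow the template in (5), but the markers, their forms, and their slot assignments are invented, and nothing here is intended as a description of Babine-Witsuwit’en itself.

```python
# A minimal sketch of template-driven affix ordering: surface order is fixed by
# a stipulated list of slots, not by the semantic scope of the markers.
# Slot labels follow the template in (5); the example markers are invented.

TEMPLATE = [
    "preverb", "iterative", "multiple", "negative", "incorporate", "inceptive",
    "distributive", "pronominal", "qualifier", "conjugation/negative",
    "tense", "subject", "stem",
]
SLOT_INDEX = {slot: i for i, slot in enumerate(TEMPLATE)}

def linearize(markers):
    """Order markers by their template slot, ignoring scope relations."""
    return sorted(markers, key=lambda m: SLOT_INDEX[m["slot"]])

# Hypothetical markers: however they compose semantically, the template alone
# determines that negation precedes the subject marker, which precedes the stem.
markers = [
    {"form": "STEM", "slot": "stem"},
    {"form": "s-",   "slot": "subject"},
    {"form": "ne-",  "slot": "negative"},
]
print([m["form"] for m in linearize(markers)])   # ['ne-', 's-', 'STEM']
```

The point of the sketch is simply that the sort key is an arbitrary index into a stipulated list, rather than anything derivable from the markers’ content.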

2.2.2 Complexity of exponence
The other fundamental ways in which morphological structure can contribute to linguistic complexity derive from the non-trivial ways in which individual

morphological elements and their surface realizations can be related. As argued by Stump and Finkel (this volume), the nature of this complexity depends in part on the perspective adopted, whether that of the speaker producing a complex form or that of the hearer attempting to recover the information it contains. The ‘ideal’ morphological element, with what might be called canonical realization, corresponds to the classical Structuralist morpheme, with a single discrete, indivisible unit of form linked to exactly one discrete unit of content. But as we know, real morphology in real languages is only occasionally like that, and commonly deviates from this ideal in a variety of ways.

2.2.2.1 Complexity in the realization of individual elements
The simplest cases involve discontinuous aspects of a form that correspond to a single aspect of its content, including circumfixes and infixes. These are really two sides of the same coin, since an infixed form can be regarded as coming to instantiate a ‘circumfixed’ root. In both cases, a single morphological element has a discontinuous realization. Both, in turn, are simply the limiting, simplest variety of multiple exponence. Some examples are rather more exuberant than this, with my personal favourite being the way negation is multiply marked in Muskogean languages such as Choctaw. All of these are exemplified in (6).
(6) Circumfixes: Slavey ya–ti ‘preach, bark, say’; cf. yahti ‘s/he preaches, barks, says’, xayadati ‘s/he prayed’, náya’ewíti ‘we will discuss’ (Rice 2012)
Infixes: Mẽbengokre [Jê] -g- ‘plural’, cf. fãgnãn ‘to spend almost all (pl)’, sg fãnãn (Salanova 2012)
Multiple Exponence: Choctaw akíiyokiittook ‘I didn’t go’; cf. iyalittook ‘I went’ (Broadwell 2006)
In the Choctaw form, negation is marked in five independent ways: (a) substitution of a- for -li as 1sg subject marker; (b) prefixed k-; (c) suffixed -o(k); (d) an accentual feature of length on the stem; and (e) suffixed -kii.
Just as real languages involve cases in which a single element of content corresponds to multiple components of the form of words, the opposite is also true: a single element of form can correspond to several distinct parts of a word’s content, each signalled separately in other circumstances. This is the case of cumulative morphs, typified by the -ō ending of Latin amō ‘I love’. Indeed, particular elements of form may correspond to no part of a word’s content, in the case of ‘empty’ or ‘superfluous’ morphs. Conversely, a significant part of a word’s content may correspond to no part of its form. The usual ‘solution’ to this difficulty for the classical morphemes is to posit morphological zeroes as the exponents of the content involved, but it is important to realize that this is simply a name for the problem, not a real resolution of it.
A variety of forms of morphological complexity introduced by these and other non-canonical types of exponence are abundantly documented in the literature, beginning

explicitly with Hockett (1947) and surveyed in Anderson (1992, to appear). These also include a variety of cases in which content is indicated not by some augmentation of the form, but rather by a systematic change of one of several sorts: subtractive morphology; Umlaut, Ablaut, and other kinds of apophony; consonant mutation; metathesis; exchange relations, and others. The bottom line is that languages abound in relations between form and content that are complex in the basic sense of violating the most natural way of expressing the one by the other.

2.2.2.2 Complexity of inter-word relations
Complexities of exponence are not, of course, limited to those presented by the relation between form and content in individual words referred to in section 2.2.2.1. Another class of complications to the canonical involves the range of forms built on the same base—the paradigm of a given lexeme. Several chapters in this volume are devoted to the complexity of paradigms, and the issue is thus well illustrated elsewhere, so it will suffice here simply to indicate this as a contribution to morphological complexity overall.
The paradigm of a lexeme can be regarded as a structured space of surface word forms. The independent dimensions of that space are provided by the set of morphosyntactic properties that are potentially relevant to a lexeme of its type (defined by its syntactic properties); each dimension has a number of distinct values corresponding to the range of variation in its defining morphosyntactic property. Since each combination of possible values for the morphosyntactic features relevant to a given lexeme represents a different inflectional ‘meaning’, it follows that we should expect a one-to-one correspondence between distinct morphosyntactic representations and distinct word forms. To the extent we do not find that, the system exhibits additional morphological complexity.
Several types of complexity of this sort can be distinguished, and each is the subject of a literature of its own. Syncretism (Baerman et al. 2005) describes the situation in which multiple morphosyntactic representations map to the same word form for a given lexeme (e.g. [hit] represents both the present and the past of the English lexeme {hit}). The opposite situation, variation, where multiple word forms correspond to the same morphosyntactic representation, is less discussed but still exists: e.g. for many speakers of American English both the forms [dowv] and [dajvd] can represent the past tense form of {dive}. In some cases, a paradigm may be defective (Baerman et al. 2010), in that one or more possible morphosyntactic representations correspond to no word form at all for the given lexeme. A fourth kind of anomaly arises in the case of deponency (Baerman et al. 2007), where the word forms corresponding to certain morphosyntactic representations appear to bear formal markers appropriate to some other, distinct morphosyntactic content (as when the Latin active verb sequitur ‘follows’ appears to bear a marker which is elsewhere distinctive of passive verbs). All of these types of deviation from the expected mapping between relations of content and relations of form contribute to morphological complexity.
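As a rough illustration of this way of thinking about paradigms, the sketch below treats a paradigm as a mapping from cells to sets of word forms and simply reads the deviations off that mapping. The toy English data are for illustration only, and a real implementation would index cells by full morphosyntactic property sets rather than by a single label.

```python
# A sketch of the paradigm as a structured space: each cell (a morphosyntactic
# property set, here abbreviated to a single label) maps to a set of forms.
# Deviations from a one-to-one mapping fall out as syncretism (one form, many
# cells), variation (many forms, one cell), and defectiveness (no form at all).
from collections import defaultdict

paradigms = {
    "hit":    {"present": {"hit"}, "past": {"hit"}},              # syncretic
    "dive":   {"present": {"dive"}, "past": {"dove", "dived"}},   # variation
    "stride": {"present": {"stride"}, "past": {"strode"},
               "past participle": set()},                         # defective cell
}

def describe(lexeme, paradigm):
    by_form = defaultdict(list)
    for cell, forms in paradigm.items():
        if not forms:
            print(lexeme, "is defective in the", cell)
        if len(forms) > 1:
            print(lexeme, "shows variation in the", cell, ":", sorted(forms))
        for form in forms:
            by_form[form].append(cell)
    for form, cells in by_form.items():
        if len(cells) > 1:
            print(lexeme, "syncretizes", ", ".join(cells), "as", repr(form))

for lexeme, paradigm in paradigms.items():
    describe(lexeme, paradigm)
```

Deponency would require one further ingredient, a record of which exponents canonically realize which cells across lexemes, so it is omitted from this sketch.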


2.2.2.3 Complexity of allomorphy
To conclude this typology of the complexity introduced into language by morphological structure, it is necessary to mention the factors that determine how a given morphological element is to be realized. The simplest type here, of course, would be for each morphological unit to have a single, distinct realization in the forms of words in which it appears, but of course morphology has always attended to the fact that a single morphological element can take multiple shapes, the very definition of ‘allomorphy’. Allomorphy can contribute to the complexity of the system to varying degrees, though, depending on the bases of the principles underlying its conditioning. Where the variation results from the independently motivated phonology of the language, this does not contribute additional complexity to the language in the sense under discussion here, as already noted in section 2.1.2.
In other cases, though, although the conditions for allomorphic variation can be stated in purely phonological terms, the actual variants that appear are not predictable from the phonology itself. Thus, in Warlpiri the marker of ergative case is -ŋku if the stem is exactly bisyllabic, but -ɭu when added to stems that are trisyllabic or longer. A rather more extensive instance of such phonologically conditioned allomorphy is found in Surmiran (a Rumantsch language of Switzerland), where essentially every stem in the language takes two unpredictably related forms, depending on whether the predictable stress conditioned by its attendant morphology falls on the stem itself or on an ending (Anderson 2011).
Since phonological conditioning factors are, at least in principle, transparent, they contribute less complexity (again, in principle) than cases in which unpredictable allomorphy is based on specific morphological categories or on semantically or grammatically coherent sets of categories. These, in turn, appear less complex than ones in which the allomorphy is conditioned by (synchronically) arbitrary subsets of the lexicon, such as the Celtic mutations alluded to in section 2.1.2. Perhaps the summit of complexity with regard to the conditioning of allomorphy is the case where specific, unpredictable variants appear in a set of semantically and grammatically unrelated categories whose only unity is the role it plays in determining allomorphy. Such collections of categories, called ‘morphomes’ by Aronoff (1994), may recur in a number of rules within a language without having any particular coherence beyond this fact. See also Round (this volume) for a further elaboration of this notion and illustrations of several distinct types of ‘morphomic’ structure.
Another type of allomorphic complexity is presented by formally parallel elements that behave in different ways, something that has to be specified idiosyncratically for the individual elements. This is the issue of distinct arbitrary inflectional classes into which phonologically and grammatically similar lexemes may be divided. The members of a single word class, while all projecting onto the same paradigm space, may nonetheless differ in the ways in which those paradigm cells are filled. Conversely, affixes that are formally similar may induce different sorts of modifications in the

stems to which they attach, as we saw already in section 2.1.2 with the three affixal types in Kwakw’ala.
A related kind of complexity is found in languages where morphological elements can display one of several distinct types of phonological behaviour. In earlier theories such as that of Chomsky and Halle (1968), this was typically represented as a difference in the type of boundary associated with the element. Lexical Phonology incorporated this into the architecture of the grammar as the difference between the morphology of ‘Level I’, ‘Level II’, etc., where specific elements are characterized by level, distinguished from clitics. Included in this category perhaps should be the case of clitics attached at various prosodic levels, as in Anderson (2005).
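The Warlpiri pattern described earlier in this section can be sketched very simply; the point of the sketch is that the conditioning is phonological, while the suffix shapes themselves are not derivable from the phonology. The syllable count below is approximated crudely by counting vowel letters, the suffix shapes are given in rough orthographic form, and the stems are invented; further refinements of the real pattern (such as vowel-harmony variants of the suffixes) are ignored.

```python
# A sketch of phonologically conditioned allomorphy of the Warlpiri type
# described above: one suffix shape on exactly bisyllabic stems, another on
# longer stems. The choice is stated over phonology, but the shapes themselves
# do not follow from the phonology. Stems are invented; syllables are
# approximated by vowel letters.

VOWELS = set("aiu")

def syllables(stem):
    return sum(1 for ch in stem if ch in VOWELS)

def ergative(stem):
    suffix = "ngku" if syllables(stem) == 2 else "rlu"
    return f"{stem}-{suffix}"

for stem in ["tara", "taralpa"]:
    print(stem, "->", ergative(stem))
# tara -> tara-ngku       (bisyllabic stem)
# taralpa -> taralpa-rlu  (trisyllabic stem)
```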

2.3 The source(s) of morphological complexity
This survey of ways in which word structure contributes complexity to the grammars of languages naturally raises the question of where these things come from. As argued in section 2.1, they do not follow from the intrinsic nature of the task of mapping between content and form, so where do they come from?
Empirically, it seems clear that most of the ways in which grammars are morphologically complex arise as the outcome of historical change, restructurings of various sorts. Many of these fall under the broad category of ‘Grammaticalization’. Canonically, this involves the development of phonologically and semantically reduced forms of originally independent words, leading eventually to grammatical structure. Originally full lexical items may generalize their meanings in such a way as to limit their specific content, leading to their use as markers of very general situation types. When this happens, they may also be accentually reduced, leading to further phonological simplification. This, in turn, may lead to their re-analysis as clitics, with an eventual development into grammatical affixes, and so new morphology is born.
But within linguistic systems, there are other possible paths that can lead to morphology where before there was only phonology and syntax. For example, phonological alternations, when they become opaque in some way, can also be reinterpreted as grounded in the morphology instead. The standard example of such a change is German Umlaut, and the overall pattern of development is quite familiar. A similar point can be made on the basis of the re-analysis of derived syntactic constructions, once they become opaque, being re-analysed as syntactically simple but morphologically complex (Anderson 1988).
Historical developments thus often yield systems that are more complex in morphological terms. But the opposite is true as well: when systems become more complex, that may trigger restructuring which reduces the complexity. Paradoxically, change produces complexity, but complexity can result in change.


It is sometimes assumed that morphology always has its origins in some other part of the grammar, particularly the syntax, as expressed in Givón’s (1971) aphorism that today’s morphology is yesterday’s syntax. But there are examples where that cannot be the case, showing that morphology seems to have some sort of status of its own. This is demonstrated particularly elegantly in phenomena found in a language that is of demonstrably recent origin, Al Sayyid Bedouin Sign Language as studied by Meir et al. (2010). In this language, more or less the entire history of the emergence of grammatical structure can be observed. The interesting point for our purposes here is that by the third generation of speakers of this language, morphological structure has begun to emerge in the form of regularities of compound formation. One of these is the generalization that in endocentric compounds, modifiers precede their heads: e.g. pray∧house ‘mosque’. This is hardly an exotic structure. But importantly, it is one that cannot have come from the syntax, because we also find that in syntactic formations, heads precede their modifiers. This demonstrates that this bit of morphology really is not entirely parasitic on other areas of grammar, however often the origin of specific morphology can be found elsewhere.

2.4 Conclusion
I started with what seems a logically plausible conception of what makes complexity: basically, some property of a system that cannot be derived from its essential character. When we look at the basic nature of human language, it seems to follow that any morphological property is of this sort, since syntax and phonology would seem to suffice unaided to fulfil the needs language serves. But given the pervasiveness of morphology in the world’s languages, and the tendency of morphological structure to be created in linguistic change rather than being uniformly eliminated, this would appear to be a sort of reductio ad absurdum of this notion of complexity, as applied to language.
In particular, what seems complex to us as scientists of language may or may not pose problems for users of language. Strikingly, we find that little children seem to have no remarkable difficulty in acquiring languages like Georgian, or Mohawk, or Icelandic along more or less the same time course as children learning English or Mandarin. Of course, it might be that little children are just remarkable geniuses at solving problems that seem impenetrable to scientists. But it seems more likely that morphology, despite the fact that a priori it seems like nothing but unmotivated and gratuitous complication, is actually deeply embedded in the nature of language. Although morphological structure of any sort would seem to be a serious challenge to the notion that human languages are ‘optimal’ solutions to the problem of mapping content to form, morphology seems to be a fact of life—and a part of the human language faculty. And that has to give us pause about our ability to say anything serious about what is or is not complex in morphological systems in any deep and basic sense.


Acknowledgements
I am grateful to audiences at a SSILA special session on morphological complexity in the languages of the Americas at the Linguistic Society of America Annual Meeting in Portland, Oregon in January 2012, and at the Morphological Complexity Conference in London, for discussion of this material. I am especially grateful to Mark Aronoff, Andrew Carstairs-McCarthy, and Marianne Mithun for ideas, examples, and comments incorporated here. My work on Kwakw’ala was supported by grants from the US National Science Foundation to UCLA.

Part II

Understanding Complexity

Rhizomorphomes, meromorphomes, and metamorphomes

ERICH R. ROUND

3.1 Three species of morphome
In this chapter, I wish to draw attention to a distinction between three species of morphome. A morphome (Aronoff 1994) is a category which figures prominently in the organization of a language’s morphological system, yet in its most intricate manifestations is anisomorphic with all syntactic, semantic and phonological categories that are active elsewhere in the grammar. Research into morphomes has intensified in recent years and it is possible now to formulate a more nuanced theory of this object of study. To that end, a distinction can be drawn between what I propose to term rhizomorphomes, meromorphomes, and metamorphomes. All three are equally morphomic categories, but of different kinds. A summary appears in Table 3.1.
Rhizomorphomes are categories that pertain to sets of morphological roots. They divide the lexicon into classes whose members share similar paradigms (of inflectional word forms, derived stems, or both). Classic examples of rhizomorphomic categories are declension and conjugation classes (Aronoff 1994). Meromorphomes are categories that pertain to sets of word formation operations, whose task is to derive

Table . Morphome types and their mode of categorization

Morphome types    Pertain to                           Divide up                  By similarity of

Rhizomorphome     sets of roots                        the lexicon                paradigms
Meromorphome      sets of word formation operations    morphological mappings     patterns of exponence
Metamorphome      sets of cells in a paradigm          paradigm types             incidence of (realizations of) meromorphomes

the pieces of individual word forms. Thus, meromorphomes inhere in the organisation of a language’s morphological exponence system, as I will detail in this chapter. Metamorphomes are categories that pertain to distributions across a paradigm, of cells which contain pieces of exponence that are realizations of meromorphomic categories. I will have more to say about metamorphomes towards the end of the chapter.
The bulk of the chapter will be devoted to clarifying how meromorphomes differ from similar concepts in current, formal morphological theory, and to presenting empirical phenomena which would appear to justify their use. The mode of argumentation is constructive, and so it will suffice to provide one set of examples, to be taken from the Kayardild language, whose inflectional system is both strongly organized around meromorphomic categories and problematic for existing alternatives. Kayardild is also convenient because it is free from the distraction of rhizomorphomic categories, that is, it has no conjugation or declension classes. Towards the end of the chapter I come to the topic of metamorphomes, such as the ‘L, U and N morphomes’ of Romance languages (Maiden 2005). I also highlight parallels in how the three morphome species can be complex.

3.2 Kayardild and its syntax–morphology interface
Kayardild is a member of the non-Pama-Nyungan, Tangkic language family of northern Australia. It was traditionally spoken primarily on Bentinck Island in the south of the Gulf of Carpentaria (Evans 1995a). Typologically, it can be characterized as an agglutinative, purely suffixing, dependent-marking language. Its argument alignment is nominative–accusative. It has a fixed word order in DPs, but otherwise word order is free, to the extent that any order appears to be possible under appropriate contextual conditions. DPs and certain verbs of movement and transfer are freely elided if the meaning is recoverable from context. The language is treated in a descriptive grammar (Evans 1995a), and in a formal analysis of phonology, morphology and syntax (Round 2009, 2013).
The description of Kayardild’s syntax–morphology interface which appears here is based on the formal analysis of Round (2013), which in turn is integrated with a formal analysis of the language’s phonology (Round 2009). This allows for reliable distinctions to be drawn between generalizations that are purely morphological versus those that are phonological. Empirically, the analyses below and in Round (2013) are based on a corpus of materials collected by Stephen Wurm in 1960, Nick Evans from 1982–2004, plus materials collected by the author in 2005–7. Other aspects of the analysis to follow are expanded upon in Round (2010, 2011, forthc., in prep).
Round (2013) demonstrates that even though word order provides no evidence for it, it is possible to infer the existence of a complex clause structure in Kayardild as shown in Figure 3.1, where DPs fit into the empty positions, and S can be embedded

Rhizomorphomes, meromorphomes, and metamorphomes 

[Figure 3.1: clause-structure tree S″ – S′β – S′α – S – VPε – VPδ – VPγ – VPβ – VPα – V′β – V′α – V, with comp attaching at S″, sej at S′β, neg and tamt at S′α, tama at VPε, VPδ, and VPγ, and tamt at VPβ]
Figure 3.1 Kayardild clause structure and attachment of features

as a sister of V. The structure in Figure 3.1 follows not from any a priori choice of syntactic theory, but empirically from the facts of inflectional morphology and from alternations in argument structure and discourse function, such as passivization, topicalization, and focalization. Figure 3.1 also displays certain inflectional features attaching to various nodes in the tree; we will return to this shortly.
Overall, the inflectional system of Kayardild can be analysed in terms of seven morphosyntactic features (i.e. inflectional features), listed in Table 3.2. As a consequence of how features are assigned to individual words, a word may be specified for a value of a given feature, or it may be left unspecified for that feature; in some instances, it may even be specified for multiple values of a feature. Thus, for example, some nouns will be specified for a certain value of case, while some will have no specification for case, and some are specified for multiple case values. Table 3.2 lists the values that each of the seven features can take, when they are specified.


Table . Inflectional features and their potential values

Feature                        Abbr.   Possible values

Complementization              comp    [+]
Sejunct                        sej     [+]
Negation                       neg     [+]
Athematic tense/aspect/mood    tama    athematic antecedent, athematic directed, athematic incipient, athematic precondition, continuous, emotive, functional, future, instantiated, negatory, present, prior
Thematic tense/aspect/mood     tamt    actual, apprehensive, desiderative, hortative, immediate, imperative, nonveridical, past, potential, progressive, resultative, thematic antecedent, thematic directed, thematic incipient, thematic precondition
Case                           case    ablative, allative, associative, collative, consequential, dative, denizen, donative, genitive, human allative, instrumental, locative, oblique, objective ablative, objective evitative, origin, privative, proprietive, purposive, subjective ablative, subjective evitative, translative, utilitive
Number                         num     dual, plural

The distribution of these features across the words in any sentence can be accounted for in terms of their relationship to syntactic structures as follows. Features attach initially to some node in the syntactic structure and then percolate down to all subordinate nodes, and thus onto all subordinate words. The one constraint on percolation is that S nodes are opaque to it, and thus an embedded S constituent will not inherit features from its matrix clause. This general model ensures that different points of features’ initial attachment in the syntactic tree lead to different distributions of those features across the clause, or conversely, that by studying the distributions of features across the clause, we can infer the syntax shown in Figure 3.1 and the accompanying attachment points of features.
According to this model, and as shown in Figure 3.1, the two features associated with so-called complementized clauses, comp and sej, will attach highest in the clause. Attaching further down are a negation feature neg and two tense/aspect/mood (tam) features: a ‘thematic’ feature tamt and an ‘athematic’ feature tama.1 Each individual

1 Both tamt and tama express tense/aspect/mood information. The terms ‘thematic’ and ‘athematic’ refer to the morphotactic behaviour of the features’ exponents. The thematic feature, tamt, is only ever realized on a stem whose morphomic representation ends with a morphomic ‘thematic’ element, whereas tama is only ever realized on a stem whose morphomic representation does not end with a thematic (Round : –, forthc.).

value of tama and tamt attaches to a specific one of the nodes indicated in Figure 3.1. The feature case attaches to DP nodes, and num to either DP or NP (which sits within the DP).
The near-unrestricted nature of percolation leads to words getting associated with many inflectional features simultaneously. Consider for example the word ngurruwarrawalathinabamaruthurrka, glossed in (1). It was recorded in spontaneous speech and has the inflectional properties of word ω in Figure 3.2. It inherits the features num:plural and case:ablative that attach to its nearest DP node above it, plus case:dative from the DP node above that, tama:present from VPγ, tamt:immediate from S′α, sej:[+] from S′β, and comp:[+] from S″.
(1) ngurruwarra -walath -inaba -maruth -urrka
    fishtrap num:plural case:ablative case:dative tamt:immediate&sej:[+]
    ‘for the ones from the many fishtraps’ (Evans 1995a: 66)
Two comments are in order at this point, regarding features which are present but without receiving overt realization, and features which are not present at all.
Not all of the features that percolate down onto a word will be overtly realized. As in many languages, some features’ realization is precluded by the realization of others, that is, there is blocking or disjunctive ordering of certain features’ realization. Specifically, of the two features comp and sej, a word will inflect overtly only for one. Accordingly, the word ngurruwarrawalathinabamaruthurrka in (1) inherits both comp:[+] and sej:[+], yet inflects overtly only for sej:[+]. Similarly, a word will inflect overtly only for one of the features tama or tamt in a given clause.2 The word in (1) inherits both tamt:immediate and tama:present, but inflects overtly only for tamt.
A separate matter is that not all syntactic nodes which potentially associate with a given feature always do associate with it. That is to say, individual clauses, DPs, and NPs differ from one another not only in terms of the specific value of the features that they associate with, but also in terms of whether they associate with a value of the feature at all. For example, the clause shown in Figure 3.2 is complementized, and hence the features comp and sej attach to their appropriate nodes, high in the tree. However, in uncomplementized clauses, the features comp and sej are simply absent; they do not attach in the syntax, and hence do not percolate onto any words, and thus no word in the clause will inflect for them.3 The feature neg only appears in negated,

2 Words may inflect for two such features if they originate in two different clauses (i.e. matrix and subordinate), on which, see Round (: –).
3 For example, the bracketed clause in (a) is complementized, and so associates with comp and sej (of which only sej is realized), while the bracketed clause in (b) is uncomplementized. A third possibility


[Figure 3.2: tree for the clause containing word ω, with comp:[+] attached at S″, sej:[+] at S′β, tamt:immediate at S′α, tama:present at VPδ, case:dative at the DP under VPβ, and num:plural and case:ablative at the DP immediately above ω]
Figure 3.2 Features which percolate onto word ω

appears in (c). This contains a ‘nonsejunct’ complementized clause (Round : –) which associates with comp but not sej. In the absence of sej, the comp feature is overtly realized.
(a) Jinaa bijarrb, [ ngumbaa kuruluth-arra-nth ] ?
    where dugong 2sg.sej kill-past-sej
    ‘Where is the dugong which you killed?’ (Evans : )
(b) Jinaa bijarrb? [ Nyingka kuruluth-arr ] ?
    where dugong 2sg kill-past
    ‘Where is the dugong? Did you kill it?’
(c) Jinaa bijarrb, [ nyingka kuruluth-arra-y ] ?
    where dugong 2sg.comp kill-past-comp
    ‘Where is the dugong which you killed?’ (Evans : )

main clauses, thus in most clauses neg is absent (this is true of the clause in Figure 3.2). Similarly, only some DPs and NPs are associated with case and/or number features. In Figure 3.2, the higher of the two DPs lacks any specification for number.
In all cases, the logic behind the analysis of an ‘absent’ feature f is the same. In some clauses, the distribution of inflection for feature f has allowed us to infer the existence of a given syntactic node n, to which f attaches. Words below node n inherit f, while words above it do not. Then, in a syntactically parallel clause, no such inflection for f is observed; all other evidence suggests that the syntactic structure is equivalent, yet words sitting below node n exhibit a lack of inflection for f equivalent to that exhibited by words that sit above node n. The analysis is that in such clauses, f is absent.
The system outlined here is one which leads to considerable morphological exuberance, as demonstrated in research by Evans (1995a, 1995b). Our focus, however, will not be exuberance, but the detail of how Kayardild’s inflectional features are realized.
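The attach-and-percolate model just outlined can be sketched in a few lines of Python. The tree below is a simplified, invented stand-in loosely reminiscent of Figure 3.2 (node labels are collapsed and only a handful of features are attached); it is meant only to show the mechanics of downward percolation and the opacity of embedded S nodes, not to reproduce Round’s analysis.

```python
# A sketch of the attach-and-percolate model: features attach to syntactic
# nodes and percolate down to all subordinate words, except that an embedded
# S does not inherit features from its matrix clause. The tree is a
# simplified stand-in, not the actual structure of Figure 3.2.

class Node:
    def __init__(self, label, children=(), features=(), word=None):
        self.label = label               # e.g. "S", "VP", "DP", "N"
        self.children = list(children)
        self.features = list(features)   # (feature, value) pairs attached here
        self.word = word                 # set on terminal (word) nodes only

def percolate(node, inherited=(), is_root=True):
    """Return a dict mapping each word to the features it inherits."""
    # An embedded S is opaque: it does not inherit matrix-clause features.
    inherited = [] if (node.label == "S" and not is_root) else list(inherited)
    carried = inherited + node.features
    if node.word is not None:
        return {node.word: carried}
    words = {}
    for child in node.children:
        words.update(percolate(child, carried, is_root=False))
    return words

noun = Node("N", word="omega")
dp   = Node("DP", [noun], [("num", "plural"), ("case", "ablative")])
vp   = Node("VP", [dp],   [("case", "dative"), ("tama", "present")])
s    = Node("S",  [vp],   [("tamt", "immediate"), ("sej", "+"), ("comp", "+")])

print(percolate(s)["omega"])
# [('tamt', 'immediate'), ('sej', '+'), ('comp', '+'),
#  ('case', 'dative'), ('tama', 'present'),
#  ('num', 'plural'), ('case', 'ablative')]
```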

3.3 Identity of exponence: Rules of referral
In sections 3.3–3.5 I analyse the realizational component of Kayardild’s inflectional morphology from the point of view of inferential–realizational morphology (Matthews 1974, Anderson 1992, Stump 2001). My aim is to introduce a range of empirical properties of inflectional exponence in Kayardild, and as I do so, to orient them towards certain prominent points in the landscape of formal morphological theory. Since in these few short pages I cannot do justice to the full range of potential analyses of the data, my strategy is to draw attention to aspects of the data which are well- or ill-accommodated by certain, prominent formalisms, with the understanding that similar issues will arise for at least some related approaches.
For explicitness, I will assume that there exists a component of the grammar, the realizational morphology, which takes as its input a lexeme represented by a lexical index L, plus a structure Σ of inflectional features, and whose output is the underlying phonological representation of an inflected word form. The realization of feature structures Σ is analysed into constituent parts, expressible by inflectional realizational rules, the first kind of which is an inflectional rule of exponence (Infl-RExp) of the type shown in (2), where a single rule realizes some sub-structure σ of the total feature structure Σ for the lexeme L. The rule takes as one of its inputs the word form as derived so far by prior rules, indicated as φi.
(2) Infl-RExpσ(⟨L, Σ, φi⟩) =def ⟨L, Σ, φi+/ki/⟩
The output of an inflectional rule of exponence is a tuple which preserves the lexical index L and, following theories such as A-morphous morphology (Anderson 1992) and Paradigm Function Morphology (Stump 2001), also preserves the inflectional feature structure Σ. Turning to the realization itself, the rule outputs a phonological

form which is typically a modification of the input. In (2) the rule suffixes the string /ki/ to its phonological input.4 To generalize this last expression, we can employ an operation variable p as in (3), where p may stand for any kind of phonological operation, such as the suffixing of a string /ki/ (4a), or a nonconcatenative operation such as the application of ablaut (4b).
(3) Infl-RExpσ(⟨L, Σ, φi⟩) =def ⟨L, Σ, p(φi)⟩
(4) a. p(φ) = φ+/ki/
    b. p(φ) = ablaut(φ)
When analysing the realization of inflectional feature structures in Kayardild, the central point of interest is shared exponence, and its many shades of variation. To begin, consider for example the set of inflectional feature-values in (5). Each feature-value is distinct from the others. They have different distributions in the clause (i.e. they are analysed as attaching initially to different syntactic nodes), they correlate with different semantics, and they enter into different paradigmatic relationships with other feature-values in the system. Nevertheless, all of them are realized phonologically as the suffix /+ki/.
(5) a. case:locative → /+ki/
    b. tama:present → /+ki/
    c. tama:instantiated → /+ki/
    d. comp:[+] → /+ki/
    e. tamt:immediate → /+ki/
Likewise, all of the feature-values in (6) are realized phonologically by the suffix /inta/. Nor is this unusual in Kayardild. Of the fifty-five feature-values in the Kayardild inflectional system, just under half of them share their phonological exponence exactly with at least one other feature-value (a full list of feature-values and their exponents appears in the appendix).
(6) a. case:oblique → /-inta/
    b. tama:emotive → /-inta/
    c. tama:continuous → /-inta/
    d. sej:[+] → /-inta/
    e. tamt:hortative → /-inta/
Shared exponence, or syncretism, is a core concern for morphological theory. One standard method of formalizing syncretism is via a rule of referral (Zwicky 1985, Stump 1993).5 According to this approach, a rule of exponence will exist for one

4 At this juncture, the choice of a suffix /ki/ is purely for illustrative purposes, though as we will see, /ki/ is an inflectional suffix in Kayardild.
5 Of course many other approaches exist, and for some, similar issues will arise. For example, this will be true of bi-directional rules of referral (Stump :) and other static declarations of identity of exponence between cells in the paradigm of a single lexeme.


Table . Features for which lexical stems can inflect Feature Nominal lexical stem Verbal lexical stem

case  — tama  — comp  — sej  — tamt —  neg —  feature structure σ, for example for σ={case:locative} in (7a), and then a rule of referral will re-route the realization of a second feature structure τ, for example τ={tama:present} in (7b), back to the first rule, (7a). (7) Felicitous inflectional rules of referral in Kayardild = a. Infl-RExpcase:locative L,Σ,φi def L,Σ, φi+ki = b. Infl-RReftama:present L,Σ,φi defInfl-RExpcase:locative L,Σ,φi By positing a rule of referral, the analyst explicitly expresses the fact that the shared exponence of these two feature structures σ and τ is non-accidental; it arises specif- ically because the realization of both sets is effected by the same rule of exponence, in this case, by (7a). Many cases of syncretism can be analysed elegantly using rules of referral. However, taken in their simplest form, rules of referral will fail under certain conditions. Relevant to this discussion is the scenario in which two feature structures σ and τ share their exponents, yet stems that inflect for σ don’t inflect for τ, and stems that inflect for τ don’t inflect for σ. This case arises in Kayardild, as follows. Table 3.3 indicates the features which nominal and verbal lexical stems can inflect for in Kayardild. Nominal stems inflect directly for values of case, tama, comp,andsej,butnot tamt or neg, while verbal stems inflect directly for values of tamt and neg,butnot case, tama, comp,orsej. Consequently, no lexical stem can inflect directly for both case and tamt, for example. Consider now a rule of referral which attempts to express the identity of the exponents of case:locative and tamt:immediate (cf. (5))such as rule (8a) or (8b). (8) Infelicitous inflectional rules of referral in Kayardild ∗ = a. Infl-RReftamt:immediate L,Σ,φi defInfl-RExpcase:locative L,Σ,φi ∗ = b. Infl-RExpcase:locative L,Σ,φi defInfl-RReftamt:immediate L,Σ,φi Both rules fail, since in effect they attempt to define ‘a stem’s inflection for tamt:immediate’ with reference to ‘its inflection for case:locative’, which does OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

not exist, or vice versa, ‘a stem’s inflection for case:locative’ with reference to ‘its inflection for tamt:immediate’, which does not exist. At this point then, it is apparent that if we wish to express identities of exponence such as those in (5) and (6) in a non-accidental manner, rules of referral alone will not suffice.
One solution to this impasse, following Corbett (2007: 29), is to introduce ‘virtual rules’ for at least one set of lexical stems such as (9a). Virtual rule (9a) is never used directly to realize case:locative on lexical verbal stems, but since it is present in the rule system, it is available to be routed to by a rule of referral such as (9b).6
(9) Felicitous rules of referral in Kayardild († = ‘virtual rule’)
    a. †Infl-RExpcase:locative(⟨L, Σ, φi⟩) =def ⟨L, Σ, φi+/ki/⟩
    b. Infl-RReftamt:immediate(⟨L, Σ, φi⟩) =def Infl-RExpcase:locative(⟨L, Σ, φi⟩)
Virtual rules (or in a paradigm, virtual cells) are discussed by Corbett (2007) in the context of deponency. Their function is to allow the inflectional system to make reference to the inflectional exponence, E, of some feature structure σ, in cases when E does not function as a realisation of σ. To take a classic example, for deponent verbs in Latin the ‘passive’ inflectional exponents, E, for a deponent verb do not function to realize morphosyntactically passive feature structures, σ, for the lexeme, because the verb is not used in the passive; nevertheless, the inflectional system does refer to the exponents E, deploying them as the verb’s active inflection. In the case of Kayardild, we would say that the suffixing of /ki/ to a lexical verbal stem (exponent E, generated by (9a)) does not function in Kayardild as the realization of case:locative (σ), yet the system still makes use of it, by referring to it in (9b).
Notwithstanding these parallels, there are some differences. In the Latin case, the virtual-rule analysis captures an intuition that deponent verbs have well-formed passive exponents that are nevertheless used for an unusual function. The notion of a ‘well-formed passive exponent’ relies on a comparison with non-deponent verbs, for which the same exponents do indeed realize the passive. In Kayardild however, no

6 A more radical repair to the infelicitous rules in (8) would be to propose that Kayardild lexical verbal stems participate in a narrow form of ‘interclass mismatch’ (Spencer ). That is, they do not inflect for tamt:immediate at all but instead literally inflect for case:locative as if they were nominal stems. There appears to be little to recommend such an analysis of these facts. Consider first, that the features tamt and tama associated with a given clause are disjunctively ordered: a word may never inflect for both, but a word may freely inflect for both tama and case. If we supposed that a verbal stem literally inflects for case instead of tamt, we would require a further stipulation that on verbal, but not on nominal, stems, case is disjunctively ordered with tama—that is, we would require the morphological distinction between verbal and nominal stems to be maintained, even while saying that the verbal stems are being morphologically mismatched with nominals. Second, not every tamt value shares an exponent with case, thus we would be forced to say either that verbal stems inflect sometimes for case and other times for tamt, or that verbal stems always inflect for case, but that some values of case appear only on verbal stems, and not on nominal.

lexical verbal stem ever has a true case:locative inflection, and thus there is little sense in which rule (9a) is reproducing a ‘well-formed exponent’ in the sense that is true for Latin deponent verbs.7 Consequently, there must be some concern that a virtual-rule analysis of the Kayardild facts is something of a sleight of hand: technically feasible, but with little explanation for why. In the next section I present an alternative which avoids any such connotation.
To summarize thus far, the appeal of rules of referral is that they express the non-accidental nature of syncretism. They do so by referring the realization of one set of feature-values τ to the exponence defined for another set σ. When this is taken in its simplest form, a limitation is that the referral must be to an exponent which actually exists. In Kayardild though, some identities of exponence occur in the absence of any single lexical stem class that will inflect for all of the features involved (in this sense, they contain a dimension of complexity which does not characterize the phenomena considered in the chapters by Donohue or Koenig and Michelson). A solution which suffices for the data considered in this section is to posit virtual rules of exponence such as (9a). On at least one interpretation however, a drawback is that we must define, for example, ‘well formed case inflections’ for lexical verbal stems, even though no such stem in the language actually inflects for case.
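For concreteness, the sketch below models the two-rule fragment in (7)–(9) as Python functions: a rule of exponence suffixes an exponent, a rule of referral simply calls the rule of exponence it re-routes to, and the ‘virtual’ status of (9a) for verbal stems amounts to nothing ever dispatching case:locative on a verb directly. This is an illustration of the formal idea only, not Round’s implementation, and it ignores the ‘termination’ and every other rule of the language.

```python
# A sketch of rules of exponence and referral over (lexeme, features, form)
# triples, following the shape of (2), (7a/b), and (9a/b). Only this tiny
# fragment is modelled.

def exp_case_locative(L, sigma, phi):
    """Rule of exponence: realize case:locative (or whatever is referred to it)
    by suffixing /ki/. For verbal stems this plays the role of (9a)'s 'virtual'
    rule: it is never dispatched directly for them, only reached via referral."""
    return L, sigma, phi + "ki"

def ref_tamt_immediate(L, sigma, phi):
    """Rule of referral (9b): re-route tamt:immediate to exp_case_locative."""
    return exp_case_locative(L, sigma, phi)

DISPATCH = {
    ("case", "locative"): exp_case_locative,    # used directly by nominal stems
    ("tamt", "immediate"): ref_tamt_immediate,  # used directly by verbal stems
}

def realize(L, sigma, stem):
    form = stem
    for feature_value in sigma:
        L, sigma, form = DISPATCH[feature_value](L, sigma, form)
    return form

# A nominal stem inflecting for case:locative and a verbal stem inflecting for
# tamt:immediate end up with the same exponent, non-accidentally.
print(realize("NOUN", [("case", "locative")], "stem"))   # stemki
print(realize("VERB", [("tamt", "immediate")], "stem"))  # stemki
```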

3.4 Identity of exponence: the Separation Hypothesis
In this section I introduce additional facts of Kayardild inflection for which the eventual analysis will be morphomic. I will not move directly to a morphomic analysis however, since there is another approach available which handles the data elegantly, even though it will ultimately fall short of accounting for the full range of facts. This approach follows from Beard’s Separation Hypothesis (1995). According to the Separation Hypothesis we may suppose that in any language, the phonological operations referred to by the morphology are represented independent of the rules in which they figure. The language therefore possesses a repertoire of operations, of the kind illustrated by the subset of Kayardild operations shown in (10).
(10) Inventory of phonological operations referred to by the morphology

a. Φ1(φ) = φ+/ki/

b. Φ2(φ) = φ-/inta/

c. Φ3(φ) = φ-/napa/

d. Φ4(φ) = φ-/˙iŋ/

e. Φ5(φ) = φ+/kurka/

7 One might argue that this is a case of making the wrong comparison: if we compared lexical verbal stems with nominal stems we would see that there are well-formed exponents of case:locative. However, that comparison is also imperfect, cf. fn. 6.


These operations can then figure in rules of exponence as constants, as illustrated in (11).
(11) All rules are felicitous:
    a. Infl-RExpcase:locative(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ1(φi)⟩
    b. Infl-RExptama:present(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ1(φi)⟩
    c. Infl-RExptamt:immediate(⟨L′, Σ′, φi⟩) =def ⟨L′, Σ′, Φ1(φi)⟩
The rules in (11) cover the same facts of exponence discussed in 3.3, but this time all three rules are rules of exponence; rules of referral are not necessary. In (11), the fact that shared exponence is non-accidental has been expressed through the use of the operation constant Φ1 in all three instances. This approach offers a straightforward account of identities of exponence, including those which lack any single lexical stem class that inflects for all of the feature structures involved (i.e. the cases which required virtual rules in 3.3). Moreover, as Beard (1995) emphasizes, these phonological operations need not be active solely in a language’s inflectional component. The same operations can also appear in derivational rules, as indicated in (12).
(12) Deriv-RExpδ(⟨L → L′, δ, φi⟩) =def ⟨L → L′, δ, Φ1(φi)⟩
The rule in (12) is a derivational rule of exponence, which figures in the derivation of one lexeme L′ from another, L. In its formulation, the same phonological operation Φ1, which in (11) features in the realization of inflectional features, now realizes the derivational feature structure, δ. This aspect of Beard’s approach is directly applicable to Kayardild, where most of the exponents employed by the inflectional system are also used derivationally. For example, the same /+ki/ which realizes the inflectional features already discussed also figures in the derivation of place names, such as those in (13).
(13)                      a.                  b.
    Orthographic form:    Jawar-i             Makarr-ki
    Underlying form:      /cawa˙+ki/          /makark+ki/
    Literal gloss:        ‘oyster sp. (+ki)’  ‘anthill (+ki)’
This ease of relating inflectional and derivational exponence is welcome in Kayardild, where of the fifty-five feature-values in the inflectional system, over two-thirds share their exponence with some or other derivational operation (a list appears in the appendix; see Round 2011 for further analysis).8
A third area in which Beard’s Separation Hypothesis offers an elegant account of Kayardild morphology is in cases such as (14)–(16). Here, the exponents of the feature-

8 Expressing the same relatedness using a rule of referral would presumably require some augmentation of the rule system, to accommodate the fact that the derivational use of the exponent involves a relationship between two different lexemes, whereas the inflectional use does not.

values in (15) and (16) have parts which are identical to each other and to the whole of the realization in (14).
(14) case:locative → /+ki/
     tama:present → /+ki/
     tama:instantiated → /+ki/
     comp:[+] → /+ki/
     tamt:immediate → /+ki/
(15) case:ablative → /+ki-napa/
     tama:prior → /+ki-napa/
     tama:precondition → /+ki-napa/
(16) case:allative → /+ki-˙iŋ/
     tama:directed → /+ki-˙iŋ/
     tama:directed → /+ki-˙iŋ/
This is an aspect of shared exponence in Kayardild which will present a challenge to formalisms that require paradigm cells to be the basis of shared exponence, since here it is not cells which are identical, but parts of the forms in those cells. That is, the sharing of forms in (14)–(16) involves relationships which are not merely paradigmatic, but syntagmatic also.
The sharing of parts of exponence, as in (14)–(16), is expressed easily if we accept the Separation Hypothesis. Since phonological operations have an existence independent of the rules which refer to them, and since they are referred to in rules by way of operation constants such as Φ1, it is possible to define rules which refer to multiple operations. Drawing on several of the operations from (10), the inflectional data in (14)–(16) can be analysed in terms of realization rules such as those in (17).
(17) a. Infl-RExptama:present(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ1(φi)⟩
     b. Infl-RExptama:prior(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ3(Φ1(φi))⟩
     c. Infl-RExptama:directed(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ4(Φ1(φi))⟩
This is useful in Kayardild, because patterns of partially shared exponence are common. Of the fifty-five feature-values in the inflectional system, more than half enter into partly shared exponence with some other feature-value.
We have now seen three ways in which the inflectional system of Kayardild systematically exploits identities of exponence in manners which go beyond the simplest cases at the beginning of 3.3. First, identity of exponence regularly transcends any single class of stems which could inflect for all of the feature structures involved; second, identity of exponence can cross the inflection–derivation divide; and third, it can exist between the parts, as well as the whole, of feature structures’ exponents. These identities of exponence are expressible in a straightforward manner using an approach based on Beard’s (1995) Separation Hypothesis, which posits an existence

for phonological operations independent of the rules in which they figure; rules of exponence then refer to phonological operations through the use of operation constants. Rules of referral are unnecessary.
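The organizational idea can be sketched as follows: the operation inventory exists on its own, rules refer to operations only by constant, and the same constant may be reused by several inflectional rules, by derivational rules, and inside compositions. The fragment below uses three of the operations from (10), treats the boundary symbols simply as part of the suffixed string, and is an illustration of the architecture rather than of Round’s actual rule set.

```python
# A sketch of the Separation Hypothesis as used here: phonological operations
# are listed independently of rules; rules name them via constants; one and
# the same constant can recur across inflectional rules, derivational rules,
# and compositions (cf. (10)-(13) and (17)). Only a fragment of (10) is used,
# and boundary symbols are treated as plain characters.

PHI = {
    1: lambda phi: phi + "+ki",      # cf. (10a)
    3: lambda phi: phi + "-napa",    # cf. (10c)
    5: lambda phi: phi + "+kurka",   # cf. (10e)
}

def apply_ops(constants, phi):
    """Apply the operations named by a sequence of constants, innermost first."""
    for c in constants:
        phi = PHI[c](phi)
    return phi

# Inflectional rules refer to operations by constant (cf. (11) and (17)):
INFLECTION = {
    ("case", "locative"): (1,),      # Phi1
    ("case", "ablative"): (1, 3),    # Phi3(Phi1(phi)): partial sharing with locative
}

# A derivational rule can reuse the very same constant (cf. (12)-(13)):
DERIVATION = {
    "place-name": (1,),              # Phi1 again
}

print(apply_ops(INFLECTION[("case", "locative")], "stem"))   # stem+ki
print(apply_ops(INFLECTION[("case", "ablative")], "stem"))   # stem+ki-napa
print(apply_ops(DERIVATION["place-name"], "makark"))         # makark+ki
```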

3.5 Identity of pattern and dimensions of exponence: meromorphomes
As flexible as it is, the approach to identity of exponence based on the Separation Hypothesis is not powerful enough to capture the full range of shared exponence in Kayardild inflection. Before proceeding further, one point warrants attention. I have been entertaining a formulation of rules that employs operators as constants in order to generate realizations. A consequence of this is that the full inflection of a lexeme, after a series of realization rules has applied, will be something like (18a), in which a series of operations applies in an appropriate order to an initial stem form, φ.9

(18) a. ⟨L, Σ, Φ17(Φ34(Φ5(Φ1(φ))))⟩

     b. ⟨L, Σ, Φ17, Φ34, Φ5, Φ1, φ⟩
This kind of representation could just as well be expressed as in (18b) with the operator constants arranged in a list. The list can then itself be regarded as a string over which certain wellformedness constraints might hold, and in Kayardild, such constraints do exist. To appreciate their nature, we must first return briefly to the syntax–inflection interface.
In the general case, the linear order of feature-values’ realization in Kayardild is reflective of the height of the syntactic node to which the feature-value originally attaches: the higher in the syntactic tree a feature originates, the further out from the stem it is realized. However, there are exceptions to this rule (Evans 1995a: 129–33, 1995b). Some realizations must be last, i.e. farthest from the stem.10 So, what class of realizations falls under such constraints? It is not a coherent morphosyntactic class (e.g. all tama values), nor is it a coherent phonological class (e.g. all unstressed affixes) or a position class (e.g. all inflectional exponents immediately to the right of the stem), rather it is a coherent class expressed in terms of the operator constants themselves. For example, all of the feature-values listed in (6) must be realized last (i.e. they must occur leftmost in a list like (18b)), irrespective of their origin in the syntax. What unites those feature-values is that their realization adds the operator Φ2 to the overall representation of a word form. Facts such as this (see Round, forthc. for a full account) indicate that a representation like (18b) is not merely a linguists’ convenience, a by-product of some stepwise transformation from inflectional feature structure Σ to a

9 In fact it can be argued that in the general case, the stem form φ will possess a similar internal structure (Round ).
10 In this discussion I abstract away from the role of a semantically empty suffix, called the ‘termination’, which in fact appears further out than all other suffixes. Regarding the status of the termination see Round (; : –, –; forthc.).

phonological form, but a representation which is linguistically significant. It possesses its own set of wellformedness constraints. Moreover, neither its constituent units nor the units in terms of which its constraints are expressed are isomorphic with inflectional features; nor are they isomorphic with phonological categories. That is to say, the representation in (18b) is morphomic. The constants which comprise that representation are a species of morphome, and the constraints on their linearization are inherently morphomic constraints.
In terms of our overarching topic, Kayardild morphomes are an instance of morphological complexity. They feature prominently in the regulation of the morphological system, yet they do not enhance its expressive capability (cf. Anderson, this volume). Indeed, it is a question for future research to ascertain what payoffs, if any, such complications provide.11
Returning to the specifics of our analysis, and to reflect its morphomic nature, I will change the symbols Φi to Mj, so for example, the rule in (19a) becomes (19b) and the representation in (20a) becomes (20b). This is more than just a notational variation, for as we will see shortly, the units Mj prove to be distinct from phonological operators.
(19) a. Infl-RExptama:directed(⟨L, Σ, φi⟩) =def ⟨L, Σ, Φ3, Φ1, φi⟩
     b. Infl-RExptama:directed(⟨L, Σ, φi⟩) =def ⟨L, Σ, M3, M1, φi⟩

(20) a. L, Σ, ⟨Φ17, Φ34, Φ5, Φ1, φ⟩
     b. L, Σ, ⟨M17, M34, M5, M1, φ⟩

As foreshadowed earlier, the Separation Hypothesis falls short of providing a full account of identity of exponence in Kayardild. Formally, I had used operator constants Φi within rules of exponence in order to refer to independently listed phonological operations. The limitation of that approach is that by definition, a phonological operator constant Φi always stands in for one and the same phonological operation, such as suffixing /+ki/ or applying ablaut. This enforces a one-to-one mapping between the constant as an element in a representation like (20a), and the operation itself. However, the units Mj in the morphomic representation of a Kayardild word, such as (20b), do not always map onto phonological operations in a one-to-one fashion. Consider once again the realizations in (5) and (6), repeated here in (21) and (22).

(21) a. case:locative → /+ki/
     b. tama:present → /+ki/
     c. tama:instantiated → /+ki/
     d. comp:[+] → /+ki/
     e. tamt:immediate → /+ki/

11 One observation is that at least some morphomes appear to have a reality for speakers, which leads to their being retained over time (Maiden ). Elsewhere (Round, forthc.) I note that the structures which are now morphomic in Kayardild appear to have endured steadfastly, even as the contentful side of the morphological system has undergone extensive upheaval.


(22) a. case:oblique → /-inta/
     b. tama:emotive → /-inta/
     c. tama:continuous → /-inta/
     d. sej:[+] → /-inta/
     e. tamt:hortative → /-inta/

According to the formalism using morphomic Mj units, the features in (21) and (22) are realized respectively by rules of exponence such as (23a), which makes reference to M1, and (23b), which makes reference to M2. In itself this differs little from rules employing Φi constants. However, consider what happens when a word inflects simultaneously for features in both (21) and (22).

(23) a. Infl-RExp case:locative (L, Σ, φi) =def L, Σ, ⟨M1, φi⟩
     b. Infl-RExp tama:emotive (L, Σ, φi) =def L, Σ, ⟨M2, φi⟩

If a word inflects simultaneously for any of the feature-values in (21) plus any of the feature-values in (22), then the exponent for the two is /+kurka/, a cumulative (or portmanteau) suffix. Examples of this are shown in (24)–(26) for the lexemes balung ‘western’, narra ‘knife’ and kurda ‘coolamon’. The four lines of glossing, from top to bottom, are: orthographic; underlying phonological;12 morphomic; and morphosyntactic. Each example illustrates a different pair of feature-values, one of which is realized as M1, and one as M2.

(24) a. balungkiya
        paluŋ+ki-a
        balung, Σ, ⟨M1, paluŋ⟩
        Σ = tama:present
     b. balunginja
        paluŋ-inta
        balung, Σ, ⟨M2, paluŋ⟩
        Σ = sej:[+]
     c. balungkurrka
        paluŋ+kurka
        balung, Σ, ⟨M2, M1, paluŋ⟩
        Σ = tama:present & sej:[+]

(25) a. narraya
        ≈ara+ki-a
        narra, Σ, ⟨M1, ≈ara⟩
        Σ = case:locative
     b. narrantha
        ≈ara-inta
        narra, Σ, ⟨M2, ≈ara⟩
        Σ = tama:emotive
     c. narrawurrka
        ≈ara+kurka
        narra, Σ, ⟨M2, M1, ŋara⟩
        Σ = case:locative & tama:emotive

(26) a. kurdaya
        ku3a+ki-a
        kurda, Σ, ⟨M1, ku3a⟩
        Σ = tama:instantiated
     b. kurdantha
        ku3a-inta
        kurda, Σ, ⟨M2, ku3a⟩
        Σ = tama:continuous
     c. kurdawurrka
        ku3a+kurka
        kurda, Σ, ⟨M2, M1, ku3a⟩
        Σ = tama:instantiated & tama:continuous

This situation can be accounted for by positing the set of mappings shown in (27), from morphomic units to phonological operations. The analysis then is that all of the

12 The /-a/ which appears after /+ki/ in some of the phonological forms is the ‘termination’, a meaningless morph which has word-final phonological content in certain contexts; see Round (: –).


features in (21) map to M1, and all of the features in (22) map to M2. The morphomic unit M1 usually maps to /+ki/ and M2 usually to /inta/, but the string M2, M1 maps to /+kurka/.13

(27) a. M1 → Φ1; Φ1(φ) = φ+/ki/
     b. M2 → Φ2; Φ2(φ) = φ-/inta/
     c. M2, M1 → Φ6; Φ6(φ) = φ+/kurka/
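To make the import of (27) concrete, the mapping can be sketched as a small program that spells out a morphomic list by always consuming the longest listed span of units first, so that the sequence M2, M1 is realized cumulatively rather than piecemeal. The sketch below is purely illustrative and rests on a longest-match strategy of my own devising; the data structures and function names are not part of Round’s formalism.

# Illustrative sketch (not Round's formalism): spell out a morphomic list
# such as [M2, M1] by consuming the longest run of units that has a listed
# exponent, here cumulative /+kurka/ for the sequence M2,M1 (cf. (27)).
EXPONENTS = {
    ("M1",): "+ki",          # (27a) default exponent of M1
    ("M2",): "-inta",        # (27b) default exponent of M2
    ("M2", "M1"): "+kurka",  # (27c) cumulative exponent of the string M2,M1
}

def realize(morphomes, stem):
    """Suffix exponents onto the stem, longest morphomic match first."""
    out = stem
    i = 0
    while i < len(morphomes):
        # try the longest remaining span first, then successively shorter ones
        for j in range(len(morphomes), i, -1):
            key = tuple(morphomes[i:j])
            if key in EXPONENTS:
                out += EXPONENTS[key]
                i = j
                break
        else:
            raise ValueError("no exponent for " + morphomes[i])
    return out

print(realize(["M1"], "paluŋ"))        # paluŋ+ki     (cf. (24a), ignoring the termination /-a/)
print(realize(["M2"], "paluŋ"))        # paluŋ-inta   (cf. (24b))
print(realize(["M2", "M1"], "paluŋ"))  # paluŋ+kurka  (cf. (24c))

On this way of putting it, the ‘non-one-to-one’ character of the mapping simply amounts to the fact that the lookup is sensitive to a unit’s neighbours in the morphomic list.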

The success and simplicity of this analysis hinge on the ability of units like M1 and M2 to map in a non-one-to-one fashion onto phonological operations. This permits their mappings to be sensitive to their context within the morphomic representation, and with that, their mappings are powerful enough to express the subtle relationships of identity present in the Kayardild data. More abstractly, rules and representations built on Φi constants are sufficient for expressing identities of exponence of feature structure (including identities of exponence that transcend stem classes, identities of exponence that cross the inflection–derivation divide, and partial identities of exponence), but only rules and representations built on Mj units are sufficient for expressing identities of patterns of exponence such as those shared by all of the feature-values in (21) or all of those in (22). All of the inflectional features in (21) share not just an exponent, but a common pattern of exponence: /+ki/ by default and cumulative /+kurka/ in the environment of M2. If a formal system is to have the capacity to express identities of patterns of exponence, it requires rules of exponence that refer to morphomic units (Mj), which are free to map onto phonological operations in a non-one-to-one manner.

Let us now take this further. The morphomic units Mj are abstract; they are isomorphic neither with inflectional features nor with phonological operations; and they figure in a linguistically significant representation which is subject to its own, internally defined wellformedness constraints. In these respects, the morphomic units Mj are rather similar to abstract units posited by linguists in other domains of grammar. One might ask, therefore, do they exhibit other properties characteristic of abstract units, such as decomposing into distinctive features? In Kayardild, they do.

In the general case, the realization in Kayardild of two morphomic units, Mj and Mk, can differ from one another in three, independently variable dimensions. In order to express, in an independently varying manner, whether Mj and Mk are identical to one another or not on each of these three dimensions, I will decompose our morphomic units, M, into three distinctive features, with one feature corresponding to each of the three dimensions of potential difference.

The first dimension, which I refer to in Round (2013) as the ‘primary’ morphome, is the set of phonological strings (i.e. suppletive allomorphs) from which a realization

13 As it happens, the order will always be M2, M1, not M1, M2, because M2 is a unit which must appear leftmost in the list (as mentioned previously).

must be drawn. For example, all morphomic units M whose primary morphome is ‘μcons’ will be realized by one of the two underlying phonological strings in the set {/ŋarpa/, /ŋara/}; all units whose primary morphome is ‘μprop’ will be realized by one of the two underlying phonological strings in {/ku˙u/, /kuu/}; and all units whose primary morphome is ‘μpriv’ will be realized by the same underlying phonological string from the singleton set {/wari/}. The second dimension is the kind of phonological juncture that precedes the phonological string—Kayardild has two major kinds of phonological juncture and in the general case a suffix can be preceded by either. Thus, some morphomes whose primary feature is μcons are realized with the first kind of juncture before the phonological string, while others are realized with the second; the same is true of morphomes whose primary feature is μpriv. The third dimension is whether a full set of allomorphs is made available for realization, or whether only a single, ‘strong’ allomorph is available—Kayardild allomorph sets such as {/ŋarpa/, /ŋara/} and {/ku˙u/, /kuu/}, which contain two members, always contain one ‘strong’ form (listed first) and one ‘weak’. Thus, some morphomes whose primary feature is μcons will have realizations in which both {/ŋarpa/, /ŋara/} are potential choices, while others will have realizations in which only the strong form, /ŋarpa/, is possible.

The various morphomic units Mj, Mk, Ml ... in Kayardild make good use of these subtle dimensions of variation. As an example we may take the three morphosyntactic feature-values listed in (28), which are realized by three distinct morphomic units.

(28) a. tamt:past → [prim: μcons, junc: [+], allo: [+]] → {/+ŋarpa/, /+ŋara/}
     b. tamt:precondition → [prim: μcons, junc: [+], allo: [−]] → /+ŋarpa/
     c. case:consequential → [prim: μcons, junc: [−], allo: [−]] → /-ŋarpa/

The morphomic units are displayed in the central column of (28), expressed as vectors of distinctive features. All three units in (28) share the same value, μcons, for the primary morphome feature (prim). This captures the fact that all three will be realizable as either one or both of the underlying phonological strings in the set {/ŋarpa/, /ŋara/}. Turning to individual forms, in (28a) the allomorphy feature (allo) is set to [+], meaning that the morphomic unit may be realized as either /ŋarpa/ or /ŋara/. When the grammar is given an option such as this, the choice for one allomorph or the other is decided by several factors, including whether the register is spoken Kayardild or song. So for example, in (29) the tamt:past inflection of the lexeme kurrij ‘see’ is realized using the weak allomorph /ŋara/ in speech, but the strong allomorph /ŋarpa/ in song.


(29) Spoken: kurrijarra
        kuric+ŋara
        kurrij, Σ, ⟨[prim: μcons, junc: [+], allo: [+]], kuric⟩
        Σ = tamt:past
     Song: kurrijarrba
        kuric+ŋarpa
        kurrij, Σ, ⟨[prim: μcons, junc: [+], allo: [+]], kuric⟩
        Σ = tamt:past

Now consider (28b), whose allomorphy feature is set to [–]. This means that the morphomic unit can only be realized by the strong allomorph /ŋarpa/. Thus, even in the spoken register, we get the strong allomorph /ŋarpa/ when the verb kurrij is inflected for tamt:precondition, as in (30).

(30) kurrijarrba
     kuric+ŋarpa
     kurrij, Σ, ⟨[prim: μcons, junc: [+], allo: [−]], kuric⟩
     Σ = tamt:precondition

(31) warnginyarrba
     waɻŋic-ŋarpa
     warngij, Σ, ⟨[prim: μcons, junc: [−], allo: [−]], waɻŋic⟩
     Σ = case:consequential
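A minimal computational sketch of (28)–(31) may help to fix ideas: a morphomic unit is modelled as a bundle of the three features prim, junc, and allo, and realization consists in choosing an allomorph from the set associated with the primary morphome. Treating juncture as a choice between ‘+’ and ‘-’ separators, and register as the factor deciding between strong and weak allomorphs, are simplifying assumptions of mine, not claims of the analysis.

# Illustrative sketch: a morphomic unit as a bundle of the three features
# prim, junc, allo (cf. (28)), realized by choosing an allomorph from the
# set associated with its primary morphome.
PRIMARY = {
    "cons": ("ŋarpa", "ŋara"),   # (strong, weak), cf. {/ŋarpa/, /ŋara/}
}

def realize(unit, stem, register="spoken"):
    strong, weak = PRIMARY[unit["prim"]]
    # allo:[+] makes both allomorphs available; here register decides between
    # them (cf. (29)); allo:[-] forces the strong allomorph (cf. (30)-(31)).
    form = (strong if register == "song" else weak) if unit["allo"] else strong
    # junc:[+] vs junc:[-] stands in for the two kinds of phonological juncture.
    sep = "+" if unit["junc"] else "-"
    return stem + sep + form

past = {"prim": "cons", "junc": True, "allo": True}       # (28a)
precond = {"prim": "cons", "junc": True, "allo": False}   # (28b)
conseq = {"prim": "cons", "junc": False, "allo": False}   # (28c)

print(realize(past, "kuric"))                   # kuric+ŋara   (spoken, cf. (29))
print(realize(past, "kuric", register="song"))  # kuric+ŋarpa  (song, cf. (29))
print(realize(precond, "kuric"))                # kuric+ŋarpa  (cf. (30))
print(realize(conseq, "waɻŋic"))                # waɻŋic-ŋarpa (cf. (31))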

At this point, we can also note that in (28a, b) and (29)–(30), the juncture feature (junc) is set to [+]. This ensures that at the juncture between the stem and suffix morphs, the underlying string /c+ŋ/ surfaces as the plosive [c], written orthographically as j. This is in contrast to (28c), which is illustrated in (31) using the lexeme warngij ‘one’, and whose juncture feature is [–]. The different juncture feature ensures that at the juncture between the underlying morphs, the underlying string /c-ŋ/ surfaces as the nasal [ɲ], written orthographically as ny. Thus, even though (30) and (31) share the same underlying suffix morph /ŋarpa/ following the same underlying, stem-final plosive /c/, the contrast in phonological juncture leads to different phonological outcomes.

To summarize: morphomic units Mi are realized as phonological operators, Φn. Moreover, the realizations in Kayardild of any two morphomic units, Mj and Mk, can differ from one another in up to three systematic yet independently varying dimensions. Within a morphomic representation, those three dimensions can be formalized in terms of three distinctive features. The first morphomic distinctive feature is the primary morphome feature prim. Morphomic units which share their value of prim will have exponents selected from the same set of underlying phonological-string allomorphs. The second and third distinctive features are the allomorphy and juncture features, which modulate whether one or two allomorphs are made available for use, and which phonological juncture appears to the left of the phonological suffix. These three features represent three fine-grained dimensions along which the exponents of morphomic units in Kayardild will be identical or not.

In 3.3–3.5, I have shown that identities of exponence within the Kayardild inflectional system are common and diverse. There are complete identities of exponence


(5)–(6), including those which transcend stem classes and those which span the inflection–derivation divide; there are identities between segmentable parts of exponents (14)–(16); and there are identities of pattern of exponence across different contexts (24)–(26). Finally, there are identities in orthogonal dimensions of exponence (28)–(31). This diversity of phenomena defies an account in terms of simple rules of referral or one based on the Separation Hypothesis alone. Evidently our theory requires a more powerful formalism if it is to express the full breadth of (partial) identities of (aspects of) exponence exhibited by natural languages. To that end, a formalism was adduced based on morphomic units Mj. All of the variations on identity of exponence in Kayardild submit to a simple and parsimonious account if it is assumed that realizational rules make reference to morphomic units Mj, which are decomposable into distinctive features, and which can map in a non-one-to-one manner onto phonological operations.

Having established that morphomic units Mj have a linguistic existence, and having shown that they cannot in the general case be reduced to simple rules of referral or to phonological operators, in the next section I consider how these morphomic units Mj relate to other morphomic categories.

Meromorphomes and metamorphomes

The categories/units Mj are morphological entities which are isomorphic neither with inflectional features nor with phonological operations, that is, they are morphomic in the classic sense. On the other hand, they are not categories that divide lexemes into classes, so they are not rhizomorphomes; rather, they are categories that organize the operations by which individual word forms are composed, piece by piece: they are meromorphomes. In this section I turn to a third species of morphome and discuss how it relates to meromorphomes and to rhizomorphomes.

In 3.3–3.5 I was concerned with matters of exponence, and with the mapping from inflectional features to phonological exponents. Dealing with that concern in Kayardild leads to the positing of meromorphomes, categories which mediate between morphosyntactic feature structures and the phonological operations by which individual pieces of individual word forms are composed. Shifting focus away from the mechanics of exponence, we may also consider its end result, that is, the sets of word forms in a paradigm and how those word forms relate (or do not relate) to a given meromorphome. For example, given the entire inflectional and/or derivational paradigm of a root, we might ask which cells contain words whose surface form consists in part of the realization of a certain meromorphome.14 In that case, we are concerned with the distribution of a certain meromorphome, or equally, with the portion of a paradigm characterized by the incidence of realizations of that

14 Put another way, these are the set of words in whose morphomic representations the meromorphomes appear.

meromorphome. For example, the celebrated third stem of Latin (Aronoff 1994), viewed as a piece of the words in which it appears, could be considered the realization of a single meromorphome, insofar as that meromorphome is anisomorphic with any natural class of morphosyntactic features, and has a distinctive pattern of identity of exponence as one surveys the inflectional and derivational paradigm of a root. The third stem meromorphome also has a certain distribution within the full paradigm of a root, or equally, there is a portion of the paradigm that is characterized by the incidence of its realizations. Paradigm-internal distributions such as these are morphomic categories in themselves (Maiden 2005), and are sometimes referred to as ‘morphomes’ in the literature. Yet they are not meromorphomes, nor are they rhizomorphomes. A suitable term for them is a metamorphome, since they inhere in a pattern to be found across multiple word forms, distributed within a paradigm.

Commonalities in morphomic complexity

To tie back more explicitly to our theme of complexity I would like to accentuate a common thread in the nature of complexity in all three species of morphome. In 3.5 we saw that Kayardild’s meromorphomes are decomposable into distinctive features. It is well established that rhizomorphomes such as inflection classes also fall into subclasses, expressible for instance in terms of inheritance relationships (Corbett and Fraser 1993). Hierarchical organization in metamorphomes, i.e. in the distributions of realizations of meromorphomes across a paradigm, has been discussed by Maiden and O’Neill (2010) and O’Neill (2011). To the extent that these observations reflect a general tendency in the structure of morphomic categories, two conclusions can be drawn regarding the nature of complexity at the morphomic level.

First, not only do morphomic categories exist and play a significant role in the morphological systems of some languages, but in the general case those categories are divisible into subcategories. In one sense, this is extreme complexity. If Anderson (this volume) is correct to claim that the very existence of morphology is a form of complexity, then the existence of purely morphological categories, anisomorphic with other categories in the grammar (i.e. morphomes), is a second layer of complexity which is parasitic on the first, and the existence of subcategories among them is a third layer parasitic upon the second.

Second however, to the extent that all three species of morphome are divisible into subcategories, their structure appears much like that of any other grammatical category once subjected to a formal analysis. Thus, while the existence of morphomes and their subcategories may be an instantiation of complexity, their structure, and indeed their organization into an autonomous level of representation, appears to be anything but complex, in the sense that it adds little if anything to the range of types of architectures (as opposed to the number of domains) attested among grammatical subsystems. Morphomic categories subdivide just as other grammatical categories do,

and they map onto foreign categories anisomorphically, just as other grammatical categories do. Morphomes’ existence may be a matter of complexity, but their formal structures are of a kind which is nothing if not familiar.

Conclusion

I have proposed that our theory recognize three species of morphome. Rhizomorphomes are morphomic categories pertaining to sets of roots. They divide the lexicon into classes such as conjugations and declensions. Meromorphomes are categories pertaining to sets of word formation operations, which derive the pieces of individual word forms. Meromorphomes can underlie both complete and partial identities of exponence, identities of pattern of exponence, and identity of dimensions of exponence. Metamorphomes are distributions across a paradigm, of cells which contain pieces of exponence that are realizations of meromorphomic categories. Morphomic patterns such as the L, U, and N morphomes of Romance are metamorphomes.15 All three species of morphome are divisible into subcategories, though I have suggested that this is unremarkable given that morphomes are categories in a grammatical subsystem. In the main part of the chapter, I argued that identity of exponence extends well beyond the expressive capability of simple rules of referral. A parsimonious analysis of sufficient power is obtained by permitting rules of exponence to make reference to meromorphomes, which can then map to phonological operators in a non-one-to-one fashion.

Appendix

Table 3.4 lists the seven inflectional features of Round’s (2013) analysis of Kayardild, and for each feature, its possible specified values, and the morphomic exponent(s) of those values, cited in terms of primary morphome features (allomorphy and juncture features are not shown). The symbol ‘∗’ indicates that the feature-value shares its morphomic exponent exactly with at least one other; ‘†’ indicates that it shares at least part of its morphomic exponence with at least one other; and ‘‡’ indicates that its morphomic exponence is also employed derivationally. Table 3.5 lists the phonological exponents for primary morphomes in Table 3.4.

15 The ‘form cells’ of the inflectional theory proposed by Stump (, , inter alia) are also metamorphomes. The analysis of Kayardild adduced here shares several properties with Stump’s theory, and further investigation into their similarities is warranted.


Table . Inflectional features and their morphomic exponents

Feature Values and morphomic exponents

∗ comp: [+]→ μobl
∗ sej: [+] → μloc †‡
∗ neg: [+] → μneg ‡
∗ ∗ ∗ tama: antecedent → μcons †‡, continuous → μobl ,directed→ μloc-μall †‡, ∗ ∗ ∗ emotive → μobl , functional → μutil ,future→ μprop †‡, incipient ∗ ∗ → μdat-μmid-j†, instantiated → μloc †‡, negatory → μpriv †‡, ∗ ∗ precondition → μloc-μabl†‡, present → μloc †‡, prior → μloc-μabl †‡
tamt: actual → μpriv†‡a, antecedent → μn-μcons†, apprehensive → μappr, ∗ ∗ desiderative → μdes, directed → μloc-μall †‡, hortative → μobl , ∗ ∗ immediate → μloc †‡, imperative → μneg ‡b, incipient → μn-μdat-μmid-j†, ∗ past → μcons†‡, potential → μprop †‡, precondition → μcons†‡, ∗ progressive → μn †‡, resultative → μres‡, nonveridical → μn-μpriv†
∗ ∗ case: ablative → μloc-μabl †‡, allative → μloc-μall †‡, associative → μassoc‡, ∗ collative → μlloc-μinch-th†, consequential → μcons †‡, dative → μdat-th†‡, denizen → μden-j-μn†, donative → μdon-j†‡, genitive → μgen‡, human allative → μallh-j†, instrumental → μinst, locative ∗ → μloc †‡, objective ablative → μablo-th†, objective evitative ∗ ∗ → μevito-th†, oblique → μobl †‡, origin → μorig, privative → μpriv †‡, ∗ proprietive → μprop †‡, purposive → μallh-μmid-j†‡, subjective ablative → μablo-μmid-j†, subjective evitative → μevito-μmid-j†, translative ∗ → μdat-μmid-j†, utilitive → μutil
num: dual → μdu‡, plural → μpl
a Cumulative exponence of tamt:actual & neg:[+]
b Cumulative exponence of tamt:imperative & neg:[+]

Table . Phonological exponents of morphomes

μabl → {napa, naa}
μgen → kara≈
μobl → inta
μall → {˙iŋ, ˙uŋ}
μallh → cani
μevito → wa‰lu
μappr → «ara
μinch → wa
μorig → wa‰«
μassoc → ≈uru
μinst → ŋuni
μpl → palatt
μcons → {ŋarpa, ŋara}
μlloc → ki‰
μpriv → wari
μcons → ŋarpa
μloc → ki
μprop → {ku˙u, kuu}
μdat → ma˙u
μmid → i
μres → iri«
μden → wi3i
μn → n
μutil → mara
μdes → ta
μneg → ≈aŋ
j → c
μdon → wu
μablo → wula
th → t
μdu → kiarŋ
μloc&μobl → kurka


Acknowledgements

For the opportunity to present and discuss this research in such a collegial and stimulating forum, I would like to express my gratitude to the organizers and the participants of the Conference on Morphological Complexity, held in London, January 2012. For invaluable comments and discussion during the development of these ideas, my thanks to Steve Anderson, Mark Aronoff, Matthew Baerman, Dunstan Brown, Grev Corbett, Nick Evans, Martin Maiden, Paul O’Neill, Andy Spencer, Greg Stump and two anonymous OUP reviewers. Institutionally, I wish to acknowledge the support of the NSF (grant BCS 844550), Australian Research Council (grant ‘Isolation, Insularity and Change in Island Populations’), the Endangered Languages Project (grants IGS0039 and FTG0025), and the School of Languages and Cultures at the University of Queensland. All shortcomings, errors or oversights in the chapter remain my own.

Morphological opacity: Rules of referral in Kanum verbs

MARK DONOHUE

Complexity

In recent years the term ‘complexity’ has been used in linguistics publications in various ways. At least four senses of ‘complexity’ can be discerned:

1. Size. A language (/subsystem of a language) is said to be complex if it has a lot of members. In this sense a complex pronominal system would include a large number of pronouns; a complex phoneme system would have a lot of phonemes; complex verbal inflection would have many inflectional possibilities (see also section 2.1.1 of Anderson’s chapter, this volume).
2. Dimensions. A language (/subsystem of a language) is said to be complex if the description of its component parts requires a lot of variables. In this sense a complex pronominal system would differentiate multiple numbers, genders, and persons; a complex phoneme system would require many features to describe; complex verbal inflection would have many inflectional possibilities for a large range of different grammatical categories.
3. Rarity. A language (/subsystem of a language) is said to be complex if it includes elements that are cross-linguistically infrequent. In this sense a complex pronominal system would include categories that are only occasionally found (such as pronouns that include reference to generational differences between speaker and addressee, as in various languages of Australia); a complex phoneme system would include unusual phonemes, such as velar laterals or linguolabial fricatives; complex verbal inflection would include categories rarely found on verbs, such as night-time status of the event, or heaviness of the object (both attested in Berik).
4. Transparency. A language (/subsystem of a language) is said to be complex if there is not a direct mapping between the features present and the expression of those features. In this sense a complex pronominal system might include irregular correspondences

of singular and plural pronouns (such as Meryam Mer, in which the first person plural inclusive is expressed with the morpheme for second person to which a plural suffix is added); a complex phoneme system might require the features [±voice], [±labial], [±coronal], and [±dorsal], but only include the phonemes /p t d k/ (such as Finnish, or Bobot and Emplawas from eastern Indonesia); a complex system of verbal inflection might include suppletive or portmanteau forms, or irregularly use the ‘wrong’ morphological forms to express features in some contexts (such as using plural forms to indicate a passive voice, as in Lingala). (See also section 2.2 of Anderson’s chapter.)

These different senses are, at least theoretically, not all independent variables; the more dimensions in a system, the greater the potential size, and the greater the chance that there is non-transparent mapping between potential categories and actual categories. Similarly, the nature of rarity means that it is more likely to emerge as a small part of an otherwise ‘well-behaved’ (from a cross-linguistic perspective) system, since rarity is not an absolute, but rather something embedded in commonality. Nonetheless, these different senses have all been used when discussing ‘complexity’. I shall concentrate on some aspects of the last sense, transparency, in the discussion that follows, while making reference to the other senses as appropriate. The main medium for the discussion is verbal inflection in Kanum, a language of southern New Guinea. In Kanum, which shows verbal inflection for subjects, objects, and tense, we see elaborate systems of oppositions created with a relatively small set of distinct morphemes. These morphemes have regular distributions to realize different inflectional categories, but we also find many instances in which elements of the paradigm are marked by referral from other inflectional cells (see discussion in Zwicky 1985, Stump 2001, Baerman 2004). I will exemplify some of these cases of referral, and show that there are patterns underlying this irregular behaviour.

Kanum verbal inflection

Kanum is a language of southern New Guinea just north-west of the Torres Strait (Boelaars 1950, Drabbe 1947, 1950); the variety described here is known to its speakers as Ngkaolmpw Ngkaontr Knwme. The language has extensive agreement for both subject and object on the verb, as well as tense, and has an extensive case-marking system, and shows elements of non-configurationality (Donohue 2011). Verbs agree with the number (and person, if plural) of their subject by suffix, and with person, number, and gender of their object (if bivalent) by prefix; tense information is distributed about the verb. The simplest version of this schema is shown in (1), and is illustrated in the sentences in (2) and (3).1 Comparing the two examples, it

1 Kanum examples are presented in an orthography that follows IPA conventions, with the exception that /ŋ/ is represented by <ng>, /j/ by <y>, /æ/ by <ä> and /'/ by <â>. Many clusters are broken up

is not hard to identify the prefix kn- and the suffix -y in (1) and (2) attached to the roots eyerk and ew. Examining (4), it is clear that the prefix kn- marks the second person (singular) object; comparing with (5) we see that the suffix -y marks the first person plural subject.

Basic verbal agreement
(1) OBJECT-verb.root-SUBJECT

(2) (Nynta mpw) kwneyerky.
    1pl.erg 2abs we:snuck.up.on:you:yesterday
    ‘We snuck up on you (yesterday).’

(3) (Nynta mpw) kwnamply.
    1pl.erg 2abs we:laughed.at:you:yesterday
    ‘We laughed at you (yesterday).’

(4) (Nynta py) swamply.
    1pl.erg 3abs we:laughed.at:them:yesterday
    ‘We laughed at them (yesterday).’

(5) (Pynta mpw) kwnample.
    3pl.erg 2abs they:laughed.at:you:yesterday
    ‘They laughed at you (yesterday).’

Neither the prefixes nor the suffixes in (2)–(4) contain only pronominal information; they also convey tense categories. The full inflectional paradigm for ampl ‘laugh at’, showing subject, object, and tense inflections by prefix and suffix, is shown in Table 4.1; forms that we have already seen in (2)–(4) are shown in bold. From the outset we note that there is no difference in inflection between 2pl and 3pl objects; in fact, these forms are the same as the 3sg.m forms; similarly, the 1pl forms are identical to the 2sg forms. Given that we know that these forms are distinguished in the pronominal system of Kanum, as evidenced in the free pronouns shown in Table 4.2 and by the contrasts made in the subject suffixes, we must assume that these categories are subject to rules of referral, shown in (6): one assigns the values given for the 2sg object to the 1pl cell (a pattern that is well attested in New Guinea), and another assigns the 3sg.m form to the 3pl and 2pl cells. This means that the s-/y- object prefixes are better thought of as unspecified for person, number, or gender. They are blocked from appearing with first person, feminine, or 2sg reference because of the existence of more highly specified morphemes that do not incur a class of features.

by epenthesis or the syllabification of a liquid or glide. As an example, sentence () can be pronounced as [n˘ınmta‰ mpF ˘wk˘Fmnejerk˘ıj].


Table . Inflection for ampl ‘laugh at’ with different subjects, objects, and tenses

subject        object: 1sg | 2sg | 3sg.m | 3sg.f | 1pl | 2pl, 3pl
sg     F   b-√-nt | nt-√-nt | sr-√-nt | ta-√-nt | nt-√-nt | sr-√-nt
       P   w-√ | n-√ | y-√ | a-√ | n-√ | y-√
       T   w-√-y | n-√-y | y-√-y | a-√-y | n-√-y | y-√-y
       Y   kww-√ | kwn-√ | sw-√ | kwa-√ | kwn-√ | sw-√
       R   w-√-w | n-√-w | y-√-w | a-√-w | n-√-w | y-√-w
1pl    F   br-√-ntey | nt-√-ntey | sr-√-ntey | ta-√-ntey | nt-√-ntey | sr-√-ntey
       P   w-√-y | n-√-y | y-√-y | a-√-y | n-√-y | y-√-y
       T   w-√-ns | n-√-ns | y-√-ns | a-√-ns | n-√-ns | y-√-ns
       Y   kww-√-y | kwn-√-y | sw-√-y | kwa-√-y | kwn-√-y | sw-√-y
       R   w-√-ay | n-√-ay | y-√-ay | a-√-ay | n-√-ay | y-√-ay
2pl    F   br-√-ntey | nt-√-ntey | sr-√-ntey | ta-√-ntey | nt-√-ntey | sr-√-ntey
       P   w-√-e | n-√-e | y-√-e | a-√-e | n-√-e | y-√-e
       T   w-√-ns | n-√-ns | y-√-ns | a-√-ns | n-√-ns | y-√-ns
       Y   kww-√-e | kwn-√-e | sw-√-e | kwa-√-e | kwn-√-e | sw-√-e
       R   w-√-ay | n-√-ay | y-√-ay | a-√-ay | n-√-ay | y-√-ay
3pl    F   br-√-nteme | nt-√-nteme | sr-√-nteme | ta-√-nteme | nt-√-nteme | sr-√-nteme
       P   w-√-e | n-√-e | y-√-e | a-√-e | n-√-e | y-√-e
       T   w-√-ns | n-√-ns | y-√-ns | a-√-ns | n-√-ns | y-√-ns
       Y   kww-√-e | kwn-√-e | sw-√-e | kwa-√-e | kwn-√-e | sw-√-e
       R   w-√-ay | n-√-ay | y-√-ay | a-√-ay | n-√-ay | y-√-ay
F: future; P: present; T: today’s past; Y: yesterday’s past; R: remote past; √: verb root.

Table . Free pronouns in Kanum, absolutive, and ergative forms

       Absolutive          Ergative
       sg        pl        sg        pl
1      ngkâ      ny        ngkay     nynta
2      mpw       mpw       mpay      mpwnta
3      py        py        pyengkw   pynta


(6) Object prefixes in the present (from Table 4.1)
    1sg object = w + stem
    2sg object = n + stem
    3sg.m object = y + stem
    3sg.f object = a + stem
    1pl object = n + stem
    2pl object = y + stem
    3pl object = y + stem
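The effect of (6) can be sketched as a two-step lookup in which a paradigm cell is first redirected by a rule of referral and only then spelled out. The sketch below is illustrative only; the redirection table simply restates the syncretisms just described, and none of its names belong to the chapter’s formalism.

# Illustrative sketch: present-tense object prefixes (cf. (6)), with the
# 1pl and 2pl/3pl cells filled by referral rather than by forms of their own.
PREFIX = {"1sg": "w", "2sg": "n", "3sg.m": "y", "3sg.f": "a"}
REFERRAL = {"1pl": "2sg", "2pl": "3sg.m", "3pl": "3sg.m"}

def object_prefix(cell):
    # follow at most one referral step, then look up the realized prefix
    cell = REFERRAL.get(cell, cell)
    return PREFIX[cell]

for cell in ["1sg", "2sg", "3sg.m", "3sg.f", "1pl", "2pl", "3pl"]:
    print(cell, "->", object_prefix(cell) + "-")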

Since the focus of this chapter is the syncretisms in the agreement morphology, we can unproblematically segment off the tense morphemes that are not also specified for features of the subject, and assign approximate features to the other affixes, in (8). Note that there are two suffixes of the form -y, and two of the form -e; Kanum morphology admits high levels of syncretism, as is also clear from the absolutive pronominal forms in Table 4.2 (see Baerman 2004, Baerman et al. 2005 for a more extended discussion of kinds of syncretisms and their modelling). The template for the verb is shown in Table 4.3, which shows which feature types are coded in which positions, and where we can see that the syncretic verbal suffixes occupy different positions in the verb template. Since some of the verb roots are suppletive for number of object, or for tense, the verb root too is marked as potentially showing features of the sort found in the inflectional affixes. Of the three inflectional categories, tense, subject, and object, very few affixes are constrained to mark features of only one category, a fact that will become relevant later in the exposition.

Tense affixes not portmanteau with subject features (from Table 4.1)
(7) k-   YESTERDAY’S PAST   Not generic objects
    w-   YESTERDAY’S PAST
    r-   FUTURE             Not 2sg/1pl, or feminine objects
    -nt  FUTURE

Table . Kanum verbal inflection √ −4 −3 −2 −1 +1 +2 +3

Object +++(+) Subject +++ Tense ++(+) +++ Number +++(+) ++ Person +++ +

k- w- b-/w- r--nt -e1 -me s-/y nt-/n- -y1 -a -y2 ta-/a- -ns -e2 OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi


Affixes primarily marking agreement (from Table 4.1)
(8) b-/w-   1sg object
    nt-/n-  2sg/1pl object
    ta-/a-  3sg.f object
    s-/y-   object (= 3sg.m, 2pl, 3pl)
    -y1     sg subject (Today’s past)
    -w      sg subject (Remote past)
    -e1     pl subject (Future)
    -y2     pl subject
    -e2     2pl/3pl subject (Present, Yesterday’s past)
    -me     3pl subject (Future)
    -ns     pl subject (Today’s past)
    -a      pl subject (Remote past)

Based on the preceding discussion, we can trivially predict the forms in (9), expanding from (2) and (3). When we examine forms with a 2pl object, however, we find a problem in the regular productivity of the paradigms. The non-future forms are as predicted for both verbs. For ‘laugh at’ the rest of the paradigm follows Table 4.1 (unsurprising, since Table 4.1 is based on the regular paradigm found with the verb ampl ‘laugh at’). For eyerk, however, the future member of the paradigm shows a takeover of the 2pl by the forms used for 2sg and 1pl object, and subsequent referral of these forms to the 3pl as well (shown in (13)). There are three points of note associated with this change in the paradigm. First, the takeover is restricted to the future; the syncretism between 3sg.m, 2pl, and 3pl still holds in the non-future tenses. Second, the takeover affects only part of the syncretic {3sg.m, 2pl, 3pl} set. In (11) we can see that a 3pl object does not have the same paradigm as a 2pl object (even excluding the eligibility for the r- tense prefix): a rule that affects the 2pl need not necessarily affect the 3pl forms. Although the new referral of the 2sg/1pl forms to the 2pl cell has disrupted the referral of the 3sg.m to the 2pl, it has not disrupted the 3sg.m → 3pl referral, and does not affect the forms seen in (12) for a 3sg.m object.

(9) ‘We ___ you.’        ‘sneak up on’      ‘laugh at’
    FUTURE               nt-eyerk-nt-e-y    s-r-ampl-nt-e-y
    PRESENT              n-eyerk-y          n-ampl-y
    TODAY’S PAST         n-eyerk-ns         n-ampl-ns
    YESTERDAY’S PAST     k-w-n-eyerk-y      k-w-n-ampl-y
    REMOTE PAST          n-eyerk-a-y        n-ampl-a-y

(10) ‘We ___ you.pl.’    ‘sneak up on’      ‘laugh at’
    FUTURE               nt-eyerk-nt-e-y    s-ampl-nt-e-y
    PRESENT              y-eyerk-y          y-ampl-y


    TODAY’S PAST         y-eyerk-ns         y-ampl-ns
    YESTERDAY’S PAST     s-w-eyerk-y        s-w-ampl-y
    REMOTE PAST          y-eyerk-a-y        y-ampl-a-y

(11) ‘We ___ them.’      ‘sneak up on’      ‘laugh at’
    FUTURE               s-r-eyerk-nt-e-y   s-r-ampl-nt-e-y
    PRESENT              y-eyerk-y          y-ampl-y
    TODAY’S PAST         y-eyerk-ns         y-ampl-ns
    YESTERDAY’S PAST     s-w-eyerk-y        s-w-ampl-y
    REMOTE PAST          y-eyerk-a-y        y-ampl-a-y

(12) ‘We ___ him.’       ‘sneak up on’      ‘laugh at’
    FUTURE               s-r-eyerk-nt-e-y   s-r-ampl-nt-e-y
    PRESENT              y-eyerk-y          y-ampl-y
    TODAY’S PAST         y-eyerk-ns         y-ampl-ns
    YESTERDAY’S PAST     s-w-eyerk-y        s-w-ampl-y
    REMOTE PAST          y-eyerk-a-y        y-ampl-a-y

Additional referral for object prefixes in the future seen in (10), for eyerk ‘sneak up on’
(13) 1sg object = w + stem
     2sg object = n + stem
     3sg.m object = y + stem
     3sg.f object = a + stem
     1pl object = n + stem
     2pl object = n + stem
     3pl object = y + stem
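The layering of (13) on top of the regular referrals can likewise be sketched as a verb- and tense-specific override. Treating the override as a separate table consulted before the general one is my own expository device, not a claim about the grammar.

# Illustrative sketch: eyerk's future paradigm overrides the default referral,
# sending 2pl to the 2sg/1pl form while leaving 3pl alone (cf. (13)).
DEFAULT_REFERRAL = {"1pl": "2sg", "2pl": "3sg.m", "3pl": "3sg.m"}
OVERRIDE = {("eyerk", "future"): {"2pl": "2sg"}}

def referred_cell(verb, tense, cell):
    special = OVERRIDE.get((verb, tense), {})
    return special.get(cell, DEFAULT_REFERRAL.get(cell, cell))

print(referred_cell("eyerk", "future", "2pl"))  # 2sg   (takeover seen in (10))
print(referred_cell("eyerk", "future", "3pl"))  # 3sg.m (unaffected, cf. (11))
print(referred_cell("ampl", "future", "2pl"))   # 3sg.m (the regular verb)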

Further kinds of rules of referral in Kanum verbs

The two verbs diverge in other parts of their paradigms as well. With a singular subject we see that the 2sg object prefix in the future is not nt-, as predicted, but s-n-, combining the generic object with the appropriate 2sg/1pl object prefix. With a third person feminine object we see the entire paradigm in all tenses has been taken over by the 1sg forms. These complications are not found with ampl.

(14) ‘I ___ you.’        ‘sneak up on’      ‘laugh at’
    FUTURE               s-n-eyerk-nt       nt-ampl-nt
    PRESENT              n-eyerk            n-ampl
    TODAY’S PAST         n-eyerk-y          n-ampl-y
    YESTERDAY’S PAST     k-w-n-eyerk        k-w-n-ampl
    REMOTE PAST          n-eyerk-w          n-ampl-w


(15) ‘I ___ her.’        ‘sneak up on’      ‘laugh at’
    FUTURE               b-eyerk-nt         t(a)-ampl-nt
    PRESENT              w-eyerk            (a)-ampl
    TODAY’S PAST         w-eyerk-y          (a)-ampl-y
    YESTERDAY’S PAST     k-w-eyerk          k-w-a-ampl
    REMOTE PAST          w-eyerk-w          (a)-ampl-w

Additional referrals seen in (14) and (15) for eyerk
(16) generic object → 2sg object / future
     1sg object → feminine object

Our analysis becomes more complex when we consider data from different verbs. With wr ‘bite’ we find that the 1sg object forms take over the 3sg.f objects, but only with yesterday’s past tense. We do not find the 2sg/1pl takeover of the 2pl that was seen in (10), but we do see a spread of the singular subject suffixes to cells with plural subjects, but only when the object is generic. The forms in (17) show the expected affixes for the first column, we>you. With the second (we>him) column we see that either the expected plural affixes are absent, or else they have been replaced with singular subject affixes. This is not the case for 2pl and 3pl objects, which are compatible with plural subject suffixes. In the third column (we>her) we find the expected plural subject suffixes, but with the 1sg object prefix in place of the feminine object prefix. The verb ‘bite’ shows takeovers for subjects as well as objects.

(17) ‘bite’              ‘We __ you.’       ‘We __ him.’       ‘We __ her.’
    FUTURE               nt-wr-nt-e-y       s-wr-nt-Ø-Ø        ta-wr-nt-e-y
    PRESENT              n-wr-y             y-wr-Ø             a-wr-y
    TODAY’S PAST         n-wr-ns            y-wr-y             a-wr-ns
    YESTERDAY’S PAST     k-w-n-wr-y         s-w-wr-Ø           k-w-w-wr-y
    REMOTE PAST          n-wr-a-y           y-wr-w             a-wr-ay

Referrals seen in (17) for wr
(18) sg subject → pl subject / 3sg.m object
     1sg object → feminine object / yesterday’s past

I conclude this section with data from one further, highly idiosyncratic verb, makr ‘roast’. Table 4.4 shows the expected inflection for this verb, and the attested inflections. We should first note that there is a suppletive form of the verb for the present, today’s past, and yesterday’s past tenses, ekr.2 The contrast between different subjects is lost entirely, plural subject suffixes invade the singular paradigm and take over other tenses, the present and today’s past object prefixes are taken over, and the yesterday’s past prefix is lost. Combined with the suppletive verb forms there is a high degree

2 The selection of makr, rather than ekr, as the root follows from the nominalized form, makr-ay.


Table . Inflection for makr ‘roast’ with different subjects, objects, and tenses

subject        expected          attested
sg      F      s-(r-)√-nt        s-r-makr-nt
        P      y-√               s-ekr-ay
        T      y-√-y             s-ekr-ay
        Y      sw-√              s-Ø-ekr-nt
        R      y-√-w             y-makr-w
1/2pl   F      s-√-nt-e-y        s-r-makr-nt-e-y
        P      y-√-y             s-ekr-ay
        T      y-√-ns            s-ekr-ay
        Y      sw-√-y            s-w-makr-y
        R      y-√-ay            y-makr-ay
3pl     F      s-r-√-nt-e-me     s-r-makr-nt-e-y
        P      y-√-e             s-ekr-ay
        T      y-√-ns            s-ekr-ay
        Y      sw-√-e            s-w-makr-y
        R      y-√-ay            y-makr-ay

of irregularity with this verb, but only one clear case of suppletion, the makr/ekr alternation.

Referrals seen in Table 4.4 for makr
(19) future/yesterday’s past object prefixes → present, today’s past
     future suffix → yesterday’s past / sg subject
     pl subject remote past suffix → present, today’s past
     future 1/2pl subject suffix → 3pl future
     yesterday’s past 1/2pl subject suffix → 3pl yesterday’s past

The forms and analysis presented in Tables 4.1 and 4.2, and in (7) and (8), might seem, in the light of our extended explication of the verb eyerk ‘sneak up on’, combined with data from wr ‘bite’ and makr ‘roast’, to be hopelessly optimistic. The following section will consolidate the well-attested patterns that we do find in Kanum verbs, and in section 4.5 we examine patterns of regularity in the rules of referral.

Opacity and Kanum verbal inflection

To this point we have an agreement system that requires reference to features distinguishing two degrees of number (singular, plural) and two degrees of person (local, nonlocal); this is fewer dimensions than are attested in other languages, though perhaps the features required for the person axis are somewhat unusual (invoking complexity-by-rarity). Since the size of the contrast set is smaller than predicted from


Table . Opacity and transparency in pronominal systems

(very) transparent Transparent (translucent) Opaque sg pl sg pl sg pl sg pl local αα− γ local αβlocal αβlocal α ‚ nonlocal ββ− γ nonlocal γ δ nonlocal α γ nonlocal · α

Table . Tense distinctions portmanteau with agreement affixes

                    object    subject
Future              α         α
Present             β         β
Today’s past        β         γ
Yesterday’s past    γ         β
Remote past         β         δ

the dimensions involved, however, there is a level of opacity involved. For example, Table 4.5 uses the dimensions that we have seen are relevant for a description of Kanum, and explores some of the possibilities along a cline between transparent and opaque, using the Greek letters α, β, γ, and δ to designate the contrasts that the system marks. In a transparent system all of the possible oppositions defined by the features distinguished along the different dimensions of variation are attested. In a maximally opaque system the two dimensions, with two degrees of variation each, are required to make a distinction between only two marked categories. The Kanum person and number system falls between these two extremes, marking three differences by not distinguishing person in the singular.

We have seen that tense is marked differently on the portmanteau affixes that index subjects and objects, as shown in Table 4.6 (based on the data in Table 4.1; again, the use of α, β, etc. is only intended to represent contrasts within each of the paradigms). Only for the future do the categories match; because of the mismatches, five tense categories are distinguished in a distributed fashion. Clearly we need to investigate the marking of other types of objects in order to understand the workings of the Kanum verb better.
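The sense in which the five tenses are ‘distinguished in a distributed fashion’ can be illustrated with a small calculation over Table 4.6: neither the object-side nor the subject-side categories suffice on their own, but the pairs do. The Greek letters below simply restate the contrasts in the table; the script itself is only an illustration.

# Illustrative sketch: the prefixal (object-side) and suffixal (subject-side)
# tense categories of Table 4.6, and the number of distinctions each makes.
TENSES = {
    "future":           ("α", "α"),
    "present":          ("β", "β"),
    "today's past":     ("β", "γ"),
    "yesterday's past": ("γ", "β"),
    "remote past":      ("β", "δ"),
}

object_side = {obj for obj, _ in TENSES.values()}
subject_side = {subj for _, subj in TENSES.values()}
pairs = set(TENSES.values())

print(len(object_side), "object-side categories")    # 3
print(len(subject_side), "subject-side categories")  # 4
print(len(pairs), "distinct pairs for", len(TENSES), "tenses")  # 5 for 5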

A wider survey of rules of referral in Kanum verbs

We can construct a set of regular agreement affixes for verbs, shown in Table 4.1; based on these forms, which show the syncretisms displayed in Table 4.7, we can plot the irregularities found with object agreement in eyerk ‘sneak up on’ and wr ‘bite’ in


Table . Pronominal distinctions in the idealized agreement affixes Object Subject sg pl sg pl 1αβ 1 α (β) 2βγ 2 α (γ) 3γ 3 α γ (δ) 3f δ 3f α

Table . Takeovers in the object prefixes for two verbs

wr        Idealised   eyerk
1sg       1sg         1sg
2sg       2sg         2sg
3sg       3sg         3sg
3sg.f     3sg.f       3sg.f
1pl       1pl         1pl
2pl       2pl         2pl
3pl       3pl         3pl

Table 4.8. While unpredictable, the takeover of paradigmatic cells always proceeds from singular forms to plurals. The complexity we see is irregular, but does appear to follow some rules: no new forms are found, and only a small subset of all possible takeovers is attested. (Maximally two distinctions are made in the plural forms for subjects as well as objects. Depending on the tense, β = γ = δ, β = γ = δ, or β = γ = δ. Across tenses, α = β is attested.) We do find most variation (from the idealized paradigm in Tables 4.13 and 4.14) in the object prefixes, but the other inflectional categories, subject and tense, also show referral. In the following section we shall examine the results of a survey of verb forms in Kanum, for the kinds of rules of referral they employ (see also section 5.5 in Koenig and Michelson’s chapter from this volume).

Patterns in the rules of referral

We can examine a wider selection of verbs, and examine not just the rules of referral found with object agreement, but also with subjects and tense. The results of this survey are shown in Tables 4.9–4.11, which show which verbs exemplify a particular


Table . Referrals in the object prefixes

1sg 2sg 3sg, 2/3pl 3sg.f 1pl 2/3pl

1sg → wr = eyerk, âwmp, wr = rmpatwr, wr 2sg, 1pl → = arwar, yntakn = âwmp, eyerk 3sg, 2/3pl → = e, eyerk, = wkâ = rsa, yntakn 3sg.f → = = = = = pl → = = = = = =

Table . Referrals in tense affixes Future Present, Yesterday Today’s Past, Remote Past

Future → âwmp, e, erm, mak = Present, → atwa, âw, âwâ, erm, = Yesterday’s Past, lmpa, nkw Remote Past Today’s past → = ayngkâ, ey, lmpa, wâw =

Table . Referrals in the subject suffixes

            sg         1pl                 2pl              3pl
sg →                   =                   ânta, wme, wr    ânta, wr
1pl →       =                              âwmp             eyr
2/3pl →     aprmngk    ayngkâ, eyr, wâw

extension of their idealized paradigm to another cell.3 Of this sample, fully twenty-two of the thirty verbs show some area in which the form in one paradigmatic cell extends to another. In terms of object takeovers, the plural forms, and the feminine form, never extend to mark other person/numbers. The 1sg object is immune from takeovers, and

3 The thirty verbs examined are: âlmyn ‘track’; ampl ‘laugh (at), have fun (with)’; ânta ‘feverish.pl.pst’; aprmngk ‘make’; arwar ‘call’; atwa ‘vomit’; âw ‘see’; âwâ ‘fetch’; âwmp ‘wash’; ayngkâ ‘fall’; e ‘tell’; erm ‘shoot.sg’; ew ‘see’; ey ‘die’; eyerk ‘sneak up on’; eyr ‘sleep’; lmpa ‘be angry with’; makr ‘roast’; nkw ‘hit linearly’; rmpatwr ‘jump (at)’; rmpwl ‘hit’; rsa ‘hit’; rwar ‘call’; wâ(w) ‘be’; wkâ ‘see’; wme ‘stay at’; wmpe ‘wash (tr.)’; wr ‘bite’; yntakn ‘trick’. Verbs that appear in more than one of Tables 4.9–4.11 are shown in bold; only one verb, âwmp ‘wash’, shows takeovers for all of subject, object, and tense inflection. A ‘=’ in a cell indicates that no extension occurs in that cell, and shading indicates the impossibility of extension.


1sg is the only form that can take over the 3sg.f role. In a similar vein, the 2sg/1pl forms are the only ones that can extend to the 3sg/2pl/3pl category, in one case breaking the uniformity of that group, and the converse is also true. In all, nine of the thirty verbs in the survey show paradigmatic takeovers for object agreement. In terms of tense takeovers, the Today’s Past and Remote Past are immune from other tense forms taking over their role. The future is never taken over by the Today’s Past, and the spread of a Present form to the Future is the most common form of tense takeovers. Eleven of the thirty verbs show paradigm takeovers for tense. With the subject suffixes eight of the thirty verbs show paradigm takeovers, though because of the extreme portmanteauing of subject and tense we should consider the number of cases of takeovers for subjects to be sixteen, almost twice as many as for objects and including nearly all of the verbs in the survey that do show referrals. Further, there are fewer constraints on the direction of takeovers for subject suffixes than those seen for object prefixes, or tense affixes, with the only restriction being on the non-swappability of 1pl and sg forms.

Given that there are constraints on the kinds of takeovers attested in Kanum, and that the forms that occupy the cells are not only not randomly varying, but are in all cases regular members of the paradigms, it seems inelegant to simply describe the differences in inflectional paradigms in the verbs as ‘irregular’. Similarly, we are not discussing some kind of defective or deponent verb behaviour, since all of the inflectional elements are present on the verb. What we have, rather, is a loosely and erratically constrained pattern of alternations in the realization of inflectional categories, and in the kinds and number of contrasts that are realized. Nonetheless, the patterns of alternations are constrained, and do not represent wholesale suppletion within the paradigm. While there are no new elements in the paradigm to increase the size of the system, and no extensions of the dimensions of the inflectional paradigm (even portmanteau forms), there is a clear loss of transparency between the features specified for an inflectional cell and the morphemes representing those features. The following section briefly offers some examples of a loss of transparency in terms of how (much) an inflectional category is realized.
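One crude way to put a number on this loss of transparency is to compare how many distinct forms a paradigm slice contains with how many cells it has. The index below is my own rough illustration, not a measure proposed in this chapter; a value of 1.0 would be fully transparent, and lower values indicate more syncretism or takeover.

# Illustrative sketch: a rough (non-)transparency index for one paradigm
# slice, the ratio of distinct forms to cells.
def transparency(paradigm):
    return len(set(paradigm.values())) / len(paradigm)

# Present-tense object prefixes, cf. (6)
present_objects = {"1sg": "w", "2sg": "n", "3sg.m": "y", "3sg.f": "a",
                   "1pl": "n", "2pl": "y", "3pl": "y"}

print(round(transparency(present_objects), 2))  # 0.57 (4 forms for 7 cells)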

Opacity in the paradigms realized

In addition to the (partial) collapse in transparency we have seen in Kanum, due to the prevalence of rules of referral, degrees of morphological opacity can also be described in other languages with perfectly regular inflectional paradigms, as a result of what is essentially lexically unpredictable inflectional ‘exuberance’. This can be illustrated with the Iha examples in (20) and (21). In (20) we can see that two verbal predicates compounded together take one agreement suffix that applies to the combined verbal compound; it is not possible for either of the verbal roots to take separate agreement inflection, even though ‘descend’ is eligible to take the same inflectional suffixes as ‘fall’ when predicating. In (21) we see that when the evidential morpheme is added, it is added to an already suffixed verb, and that the evidential morpheme itself takes suffixal agreement. It is not grammatical for one of the agreement morphemes to be omitted, apparently presenting the converse of the case in (20), and so establishing a degree of complexity in the predictability of verbal inflection. The functional reason behind the difference is evident in (21c): the evidential suffix is capable of taking ‘subject’ agreement suffixes that are referentially distinct from those used on the main verb, with a corresponding difference in the kind of evidentiality asserted. The fact that the inflected evidential morpheme alone is enough to form a (short) sentence, as in (22), indicates that the uses in (21b) and (21c) represent an only slightly modified grammaticalization from an independent verb, making the parallels with the compound in (20) more striking. Although there is a functional explanation for the differences in inflection, we are still faced with the fact that two different kinds of combinations of contentful verb-like morphemes show very different inflectional behaviour. Since suffixed auxiliaries in Iha, such as in the example shown in (23), pattern more like the serial verb construction seen in (20), we really cannot find a purely morphological explanation for the differential behaviour of the evidential morpheme with respect to inflection.

Iha
(20) a. Ih-mo hu-hoqpow-dya.
        fruit-that descend-fall-3tpst
        ‘That fruit fell down.’
     b. *Ihmohudya-hoqpowdya

(21) a. Ki-kne-dya.
        2-see-3tpst
        ‘They saw you yesterday.’
     b. Ki-kne-dy(a)-e-da.
        2-see-3tpst-evid-3
        ‘They saw you yesterday, they say.’
     c. *kikneedya, *kikneeda, *kiknedyae
     d. Ki-kne-dya-te-n.
        2-see-3tpst-evid-1sg
        ‘They saw you yesterday, I assert.’

(22) Te-n.
     evid-1
     ‘Really (I saw it with my own eyes)!’


(23) Ki-qpreg-myo-n-do-mb-on.
     2-follow-just-conj-aux-ypst-1
     ‘I was just following you.’

These different kinds of opacity, the unpredictability of takeovers in Kanum and the non-predictability of inflectional realization in Iha, show that there is a space between regularity and suppletion. Comparing the data on serial verbs from Palu’e (Austronesian, Southern Indonesia) and Skou (Skou family, north-central New Guinea; Donohue 2008) we can see that this kind of morphological unpredictability extends across word boundaries: in Palu’e serial verbs allow for only one agreement prefix, while in Skou each verb in a serial verb construction requires an agreement prefix. This, however, is consistent on a language level, but represents a level of opacity in the design of grammatical forms.

(24) Palu’e
     a. Kam-kha lama-pue.
        1pl-eat rice-mung.bean
        ‘We ate rice mixed with mung beans.’
     b. Kam-1oa phalu lae nua.
        1pl-descend go be.at house
        ‘We went down to the house, ...’
     c. *Kamboa kampalu ...

(25) Skou
     a. Mè ha mè=m-a.
        2sg bag 2sg=2sg-carry
        ‘You carried a bag.’
     b. Mè ha mè=b-é m-a me m-e pá=ing a.
        2sg bag 2sg=2sg-get 2sg-carry 2sg:go 2sg-ascend house=def
        ‘You carried a bag away up into the house.’
     c. *Mè ha mè=b-é ha re e pá=ing a.

The occurrence of multiple agreement markers is well attested; Aikhenvald (2003) presents an interesting case from Tariana, where identical morphological agreement is found on verbs which do not share the same arguments, and both Anderson (1992) and Ortmann (1999) discuss the complications found in Dargwa. Cases of historical accretion of inflectional morphology, corresponding to different periods in a language’s history, are also well documented (van Driem 1987, Donohue 1999). The difference between these examples of the unpredictable appearance of agreement morphology, whether multiply on the same verb, or in different places in the complex predicate, is that Kanum does not show any optionality in the appearance

of agreement. Rather, the optionality is in how the relevant paradigm is represented. While some verbs are attested in which the inflectional morphology directly represents the features of the arguments, or tense category, we have seen that it is quite common for verbs in Kanum to extend the range of one cell of a paradigm into another. While there are constraints on the kind of extensions found, the appearance of such an extension in a verb’s paradigm is not predictable.

The middle ground between productivity and suppletion

The preceding discussion has raised the questions of predictability, productivity, and complexity (see also Bauer 2004). The presence of rules of referral in any verb’s paradigm is lexically unpredictable, but is highly productive. It does not involve the use of novel or suppletive forms, but rather involves the extension (or, in some cases, transfer) of a form from one cell of a paradigm to another. In Kanum we have seen that takeovers operate most strongly for tense, but that object agreement is most strongly implicated in the more complicated (in the sense of unpredictable) verbal paradigms. Whether these patterns can be found in other languages or not remains to be seen.

Morphological complexity à la Oneida

JEAN-PIERRE KOENIG AND KARIN MICHELSON

This chapter is about one particularly rich part of the verbal inflection of Oneida, a polysynthetic Iroquoian language. Morphological referencing of event participants in Oneida is achieved via a system of fifty-eight pronominal prefixes that are an obligatory part of the inflection of verbs. The sheer number of prefixes, and the relations between them, afford us the opportunity to ask what we believe is a unique set of questions about morphological complexity.

Morphological complexity differs from syntactic complexity in significant ways. Issues of morphosyntactic complexity have been mostly approached from two perspectives: computational complexity, which results from the concatenation of components in a construction (a phrase, or a sentence), and algorithmic complexity, which is due to processing sentences in real time. Whether complexity is evaluated at the computational or algorithmic level, the kind of complexity that has interested syntacticians is syntagmatic, that is, it arises from the fact that the formatives of a sentence are linearly sequenced. In contrast, morphological complexity is paradigmatic in nature. Thus issues of morphological complexity stem, for example, from the need to select an affix (in the case of affixal morphology) among many possible choices and the need to segment words into their formatives (or at least analyse words for purposes of classifying inflected words into the right paradigm; see Blevins 2013 for discussion). Of course, that morphological complexity differs from syntactic complexity is not novel, and some of the issues that have been addressed specifically about inflectional systems are the limits on inflectional classes (e.g. Carstairs 1983, Carstairs-McCarthy 1994, and discussion in Blevins 2004), optimal principal part systems (e.g. Finkel and Stump 2007, Finkel and Stump 2013), and entropy (e.g. Ackerman et al. 2009). Our focus here is to show that the two aspects of language production and comprehension that were mentioned as specific to morphology—the paradigmatic notions of selection and

segmentation—can lead to enormous complexity even when one considers just a single block of inflectional realizations (or, in traditional terms, a single position class).1 Oneida (Iroquoian) verbs include bound, obligatory prefixes that reference participants in the situations described by verbs. Thus, the verb form in (1) includes the prefix luwa-, which indicates that the described situation includes a third person feminine singular, third person indefinite, or third person dual or plural agent acting on (represented by ">") a third person masculine singular patient, while the verb form in (2) includes the prefix shukwa-, which indicates that the described situation includes a third person masculine singular agent acting on a first person plural patient.2

(1) luwa-hlo.lí-he≈
    3>3m.sg-tell-hab
    'she, someone, or they tell him'

(2) shukwa-hlo.lí-he≈
    3m.sg>1pl-tell-hab
    'he tells us'

The prefixes illustrated in (1) and (2)—traditionally labelled pronominal prefixes—occur in a single position class (i.e. in incremental or realizational terms, are the output of a single rule block). There are fifty-eight of these prefixes and ever since Lounsbury's (1953) seminal work, the complexity of Oneida pronominal prefixes has been considered a hallmark of Iroquoian languages (and a challenge for second language learners; see Abrams 2006). Oneida pronominal prefixes, then, provide a rather unique case of paradigmatic complexity. The bulk of our chapter is devoted to identifying the formal dimensions along which the Oneida pronominal prefix system may qualify as paradigmatically complex and we come back to the nature of paradigmatic complexity in the conclusion. Section 5.1 briefly describes Oneida pronominal prefixes. Section 5.2 identifies parameters of

1 See Hankamer () on Turkish, Fortescue () on Greelandic Eskimo, as well as Anderson (this volume), for cases where morphology is syntagmatically relatively complex. 2 In the Oneida examples R is a mid, central, nasalized vowel, and u is a high or mid-to-high, back, nasal- ized vowel. A raised period represents vowel length. Voicing is not contrastive. Abbreviations used in the morpheme glosses and in Table . are: a(gent), caus(ative), dp (dual or plural), du(al), ex(clusive person), fact(ual mode), f(eminine), fi (third person feminine singular or third person indefinite), fz(feminine- zoic), hab(itual aspect), indef(third person indefinite), jn (joiner vowel), m(asculine), n(euter), p(atient), pl(ural), pnc (punctual aspect), rep(etitive), sg (singular). The symbol > indicates a proto-agent acting on a proto-patient; for example, m.sg > sg should be understood as third person masculine singular acting on first person singular. / indicates that proto-role is underspecified for the prefix (i.e. the prefix references semantic properties of two participants, but not which of them is a proto-agent and which is a proto-patient). A comma before du or pl indicates that the dual or plural number is specified either for the proto-agent or the proto-patient; for example, >,du means that there is a first person acting on a second person, when eitherfirstorsecondpersonisdual,orbotharedual.Thebarenumeral, unaccompanied by any number or gender, abbreviates third person indefinite, third person feminine singular, third person masculine dual or plural, and third person feminine-zoic dual or plural. OUP CORRECTED PROOF – FINAL, 6/3/2015, SPi

paradigmatic complexity. Sections 5.3, 5.4, and 5.5 focus on how Oneida pronominal prefixes stack up on three of these parameters (size of the space of morphological distinctions to be marked, semantic ambiguity, and directness). Section 5.6 discusses one aspect of paradigmatic complexity specific to Oneida pronominal prefixes that, to our knowledge, has not been discussed in the literature, and that is the possible misidentification of pronominal prefixes in verb forms. Section 5.7 concludes the chapter.

5.1 A brief description of Oneida pronominal prefixes

The Oneida verb has an elaborate internal structure. Stems can be complex, derived via prefixation, suffixation, and noun incorporation. In addition to the obligatory pronominal prefixes, verb forms must have an aspect suffix or an imperative ending (often 'zero'). The aspectual categories are habitual (basically imperfective), punctual (perfective), and stative (having stative, perfect, or progressive meaning depending on the verb). Verbs in the punctual aspect always occur with one of three modal prefixes, the factual, future, or optative (usually the factual in this chapter). In addition to a modal prefix, verb forms can have one or more of eight other prepronominal prefixes. An example of a typical verb in Oneida is given in (3).

(3) s-a-huwRn-ákt-a-ht-e≈
    rep-fact-3>3m.dp-go.to.a.point-jn-caus-pnc
    'they pushed them back, they made them retreat'

Pronominal prefixes are portmanteau-like. Although it is often possible to associate parts of the prefixes with some features (e.g. ti or wa with plurality), it is widely accepted that a single prefix references one or two participants as a whole. Thus, although one may associate the initial l with masculine in the prefix luwa- '3>3m.sg', one cannot segment the prefix into two subparts, each referencing a distinct participant in the described situation. More generally, even parts of prefixes most easily associated with particular attribute-value pairs do not allow a segmentation into proto-agent and proto-patient parts. Consider the prefixes li- (referencing a 1sg proto-agent acting on a 3m.sg proto-patient) and lak- (referencing a 3m.sg proto-agent acting on a 1sg proto-patient). One can recognize l in both as marking masculine gender, but it marks the masculine gender of the proto-patient in the first case and the masculine gender of the proto-agent in the second case. Semantic categories distinguished by the pronominal prefixes are person (first, second, third, plus an inclusive/exclusive distinction), number (singular, dual, plural), and gender (masculine, feminine, feminine-zoic, neuter). In the singular, feminine-zoic gender is used for some female persons (see Abbott 1984) and for animals that are not personified and marked with the masculine. It is also used for all female

persons in the dual and plural, as the feminine gender is restricted to the singular. Note that neuter gender is a semantic category only (as explained later). In addition, there is a third person indefinite ('indefinite' in Lounsbury 1953, or 'nonspecific' in Chafe 1977) translated as 'one, people, they, someone'. Because of the number of properties the prefixes can reference and because, as we will elaborate, prefixes mark up to two semantic arguments, the number of prefixes for this inflectional slot is quite large, fifty-eight in total. The fifty-eight prefixes are given in Table 5.1, based on Table 6 in Lounsbury 1953; the prefixes are numbered as in Lounsbury, and later on, when we refer to specific prefixes, we often identify the prefix with this number, preceded by 'L' (for Lounsbury). The prefixes fall into two categories. 'Transitive' prefixes mark two animate arguments, as in the example in (4), repeated from (2). In Table 5.1, the properties of the proto-agent that are marked by transitive prefixes are given in the leftmost column and properties of the proto-patient are given in the top row. The prefix in example (4) is used for a third person masculine singular proto-agent acting on a first person plural proto-patient.

(4) shukwa-hlo.lí-he≈
    3m.sg>1pl-tell-hab
    'he tells us'

'Intransitive' prefixes mark the single argument of monadic verbs. There are two categories of intransitive prefixes: an A(gent) set ('subjective' in Lounsbury 1953) and a P(atient) set ('objective' in Lounsbury 1953), exemplified in (5) and (7), respectively. In Table 5.1 agent prefixes are given in the column for forms with no patient and patient prefixes are given in the row for forms with no agent. Verbs lexically select for agent/patient, although the distribution is not without semantic generalizations. Intransitive agent and patient prefixes are used also with dyadic or triadic verbs when there is only a single animate semantic argument and the other argument(s) is inanimate, as shown in (6), which has the same agent prefix as the example in (5).

(5) wa-ha-ya.kR´-ne≈
    fact-3m.sg.a-go.out-pnc
    'he went out'

(6) wa-ha-yR´tho-≈
    fact-3m.sg.a-plant-pnc
    'he planted (it)'

(7) lo-nolú.se-he≈
    3m.sg.p-lazy-hab
    'he is lazy'

Table 5.1 Oneida pronominal prefixes (C-stem allomorphs)
[The table is not reproduced here: it gives the fifty-eight prefixes, numbered 1–58 as in Lounsbury (1953), with the categories of the proto-agent in the rows and the categories of the proto-patient in the columns; intransitive A(gent) and P(atient) prefixes occupy the no-patient column and the no-agent row, and the default feminine-zoic prefixes appear in cells labelled '(3n)'.]


The example in (6) shows that in dyadic verbs only animate arguments are marked. But since all verbs must have a pronominal prefix, verbs without any animate arguments default to an agent or patient feminine-zoic singular prefix, as shown in (8). In other words, neuter gender (for inanimates) is a semantic category only; it is not relevant morphologically (see Koenig and Michelson 2012).3 (In Table 5.1 the default feminine-zoic prefix is given in a cell labelled '(3N).')

(8) wa≈-ka-ná.nawR-≈
    fact-3fz.sg.a-get.wet-pnc
    'it got wet'

All fifty-eight prefixes have varying forms ('allomorphs'), at least two and up to five, depending on the initial sound of the following stem. There are five stem-classes: C-stems, i-stems, o-/u-stems, e-/R-stems, and a-stems. In addition, thirty-nine of the fifty-eight prefixes have two variants depending on what occurs to their left (e.g. whether they occur word-initially or not). Allomorphy is exemplified in (9) with L39 khe-/khey- and with L33 luwa-/luwR-/luway-/luw-/-huwa-/-huwR-/-huway-/-huw-. In total, there are 326 allomorphs, with an average of about five allomorphs per prefix. Note that the allomorph khe- of L39 occurs with stems that begin with a consonant or i, while the allomorph khey- occurs with o-/u-stems, e-/R-stems, and a-stems. This is one possible grouping, i.e. C- and i-stems, versus o-/u-stems, e-/R-stems, and a-stems. The allomorphs of prefix L33 luwa-/luwR-/luway-/luw-/-huwa-/-huwR-/-huway-/-huw- exhibit a different grouping, namely e-/R-stems and a-stems (-huw-) versus o-/u-stems (-huway-) versus i-stems (-huwR-) versus C-stems (-huwa-). There are eleven such groupings.

(9) a-stem
    wa≈-khey-atnúhtuht-e≈
    fact-1sg>3-wait.for-pnc
    'I waited for her or them'

    a-stem
    wa-huw-atnúhtuht-e≈
    fact-3>3m.sg-wait.for-pnc
    'she or they waited for him'

    e-/R-stem
    wa≈-khey-R´hahs-e≈
    fact-1sg>3-belittle-pnc
    'I belittled her or them'

    e-/R-stem
    wa-huw-R´hahs-e≈
    fact-3>3m.sg-belittle-pnc
    'she or they belittled him'

3 The morphological irrelevance of neuter raises the question of whether neuter is relevant at all for Oneida morphosyntax, as a reviewer points out. The answer is that it still is, as it is a fact of Oneida grammar that only animate arguments are referenced morphologically. The statement of this fact requires the semantic differentiation of animate and inanimate semantic indices (see rule () and Koenig and Michelson  for details and Koenig and Michelson, in press, for how depersonalization can be used for communicative purposes).


    o-/u-stem
    wa≈-khey-ótyak-e≈
    fact-1sg>3-raise-pnc
    'I raised her or them'

    o-/u-stem
    wa-huway-ótyak-e≈
    fact-3>3m.sg-raise-pnc
    'she or they raised him'

    i-stem
    wa≈-khe-(i)hnúks-a≈
    fact-1sg>3-fetch-pnc
    'I went after, fetched her or them'

    i-stem
    wa-huwR-(i)hnúks-a≈
    fact-3>3m.sg-fetch-pnc
    'she or they went after, fetched him'

    C-stem
    wa≈-khé-kwaht-e≈
    fact-1sg>3-invite-pnc
    'I invited her or them'

    C-stem
    wa-huwá-kwáht-e≈
    fact-3>3m.sg-invite-pnc
    'she or they invited him'

    C-stem (initial allomorph)
    luwa-kwát-ha≈
    3>3m.sg-invite-hab
    'she or they invite(s) him'
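The stem-class conditioning of these allomorphs can be pictured as a simple lookup keyed by stem class. The sketch below is our own illustration, not part of the authors' analysis: the dictionary entries merely restate the groupings just described for L39 khe-/khey- and for the non-initial allomorphs of L33, and the function name select_allomorph is invented for the example.

# Illustrative sketch (ours): prefix allomorphy as a lookup keyed by stem class.
# The two entries restate the groupings described in the text for L39 and for
# the non-initial allomorphs of L33; other prefixes would need their own maps.
ALLOMORPHS = {
    "L39": {"C": "khe-", "i": "khe-",                      # one grouping: C- and i-stems
            "a": "khey-", "e/R": "khey-", "o/u": "khey-"}, # vs. the other vowel stems
    "L33": {"C": "-huwa-", "i": "-huwR-",                  # a different grouping: each
            "o/u": "-huway-",                              # class (or pair of classes)
            "e/R": "-huw-", "a": "-huw-"},                 # selects its own allomorph
}

def select_allomorph(prefix, stem_class):
    # Pick the allomorph of `prefix` required by the class of the following stem.
    return ALLOMORPHS[prefix][stem_class]

# Example (9): khey- and -huw- both occur before the a-stem 'wait for'.
assert select_allomorph("L39", "a") == "khey-"
assert select_allomorph("L33", "a") == "-huw-"

Nothing hinges on this particular encoding; the point is simply that fifty-eight such maps, falling into eleven distinct groupings of stem classes, have to be learned.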

5.2 Evaluating paradigmatic complexity

As we mentioned in the introduction, our goal in this chapter is to delineate a form of complexity only morphology exhibits, what we call paradigmatic complexity. We will anchor our discussion of paradigmatic complexity to the number and kinds of rules needed for realizing the morphological distinctions expressed by Oneida pronominal prefixes. By discussing the issue in the context of rules for realizing morphological feature bundles (so-called rules of exponence, see Stump 2001), we can more easily provide possible measures of paradigmatic complexity and evaluate Oneida pronominal prefixes on these possible measures. We provide more speculative remarks on how and why these measures can serve as indices of paradigmatic complexity in the conclusion. Since pronominal prefixes encode properties of participants in the event described by a verb, one can think of the proper use of pronominal prefixes by speakers as the result of the correct application of two sets of rules (or constraints). The first set of rules maps semantic properties of participants in the described situation onto morphosyntactic feature sets; the second maps morphosyntactic feature sets onto phonological marks. The first set of rules relates the semantic categories of participants in the situation types described by verbs and the values of the morphological agr attribute; the second relates the values of the morphological agr attribute to the phonological reflexes of those values. We have nothing to say about the first set of rules, except to note that that set of rules is simple in the case of Oneida, as the morphosyntactic features are easily inferrable from observable properties of participants in situations. In other words, Oneida prefixes exhibit what one could

call semantic φ feature sets. Pronominal prefixes reference properties of one or two animate participants in situations, number marking corresponds to model-theoretic plurality, gender marking to model-theoretic gender, and so forth.4 So, what makes the Oneida pronominal prefix system complex? One way to approach this question is to think about what inflectional rules are. At a conceptual level, we conceive of inflectional rules as in (10a); a very informal example from Oneida is provided in (10b) (see below for more formal examples).

(10) a. Morphosyntactic Feature Set ⇒ Form
     b. 3m.sg>1pl ⇒ shukwa

With (10) in mind, we can distinguish at least four kinds of complexity:

1. Size: (a) What is the number of rules of the form (10)? Obviously morphological complexity increases with the number of inflectional rules. For Oneida pronominal prefixes, we need at most 326 rules (see Zwicky 1985 and later in this section for more details), since there are 326 forms (or allomorphs). (b) What is the size of the morphosyntactic feature space? In other words, what is the number of distinctions that can be marked (abstracting away from neutralization)? Assessing that number depends on the linguistic model of the morphosyntactic feature set, as explained later in this section.

2. Semantic ambiguity: On average, how many semantic distinctions are expressed by the same form? In the case of Oneida pronominal prefixes, on average, how many combinations of participant categories are expressed by a single prefix?

3. Directness: Can we account for all the forms with rules that have the form in (10)? The form of inflectional rules in (10) is simple: it assumes a direct link between morphosyntactic feature sets and phonological forms. The question is whether a particular inflectional block (in this case, Oneida pronominal prefixes) needs anything more than that. There is evidence, as we elaborate later, which suggests that the association between bundles of morphosyntactic features and forms can be indirect and in some cases the kind of rule that is required is one where the output (form) of one rule is extended to, or identified with, the output (form) of another rule (i.e. requires reference to a function, akin to a paradigm function in Stump 2001).

4. Segmentation and generalizability: How difficult is it to recognize a pronominal prefix that is present in a particular verb form? For example, given the form shukwahlo.líhe≈ 'he tells us', given earlier in (4), how easy is it to separate the prefix from the stem so that speakers can then identify the prefix in lakhlo.líhe≈ 'he tells me'?

4 As can be expected, this is an oversimplification, but the extent to which features do not correspond to model-theoretic properties can be attributed to ordinary grammar 'leakage'.


One way to think about this kind of potential difficulty is to ask how difficult it is to backward-chain from the consequent of the conditional to its antecedent (see Russell and Norvig 2009 on backward chaining). Another way of thinking about this question is to ask how difficult it is to infer the entire set of inflectional rules from a newly learned inflected form (or set of inflected forms). So, for example, how difficult is it to predict the form khehlo.líhe≈ 'I tell her or them' from either lakhlo.líhe≈ 'he tells me' or shukwahlo.líhe≈ 'he tells us' (or both)? This issue is sometimes raised in discussions of principal parts or discussions of conditional entropy (see Finkel and Stump 2007; Ackerman et al. 2009), but the specific kinds of problems that result from being led astray in segmenting a verb form and generalizing from it are rarely discussed in the literature, as far as we know, and they are a particularly interesting aspect of what makes Oneida pronominal prefixes complex.

5.3 Size

Oneida pronominal prefixes are rather complex if our measure is number of rules. The upper bound on the number of rules is 326. There are fifty-eight morphs, with on average a little over five allomorphs per morph. We use the term morph here rather than morpheme to stress that we are talking about a class of forms and not committing ourselves to a morpheme-based theory of inflection. The number of allomorphs is an upper bound on this complexity dimension because if inflectional morphology is a set of rules of the form Properties ⇒ Forms, there are at most as many rules as there are distinct forms. But if 326 is the maximum number of rules—something quite large for a single inflectional block—another possible measure of paradigmatic complexity is the space of possible morphosyntactic properties that the particular inflectional block serves to realize or mark. Oneida prefixes can mark up to two animate arguments, and if they mark one animate argument they can belong to the Agent or Patient series of (intransitive) prefixes. If the features were all orthogonal, a pronominal prefix could reference, in the worst-case scenario, (4 (persons) × 3 (numbers) × 4 (genders)) + 1 indefinite = 49 combinations of features. Since transitive prefixes reference two arguments, transitive pronominal prefixes could reference up to 49 × 49 = 2,401 feature combinations and Agent and Patient intransitive prefixes could reference 49 feature combinations each, resulting in a total of 2,499 feature combinations. The space of possible antecedents of rules of exponence for Oneida pronominal prefixes would be quite large indeed. Whether 2,499 is a useful measure of paradigmatic complexity is not entirely clear, though. This is because argument properties that can be referenced by prefixes are not orthogonal. Much of the work on category structure in the 1980s (see Gazdar et al. 1988 for a nice overview), following more traditional structuralist work, reduces the space of feature combinations we just outlined quite significantly. So, a more

realistic measure of how size may lead to the complexity of the Oneida pronominal prefixes is the minimum number of morphological distinctions that may be referenced phonologically if one adopts the most parsimonious and motivated analysis of the space of feature combinations. In Oneida, as in most, if not all, languages, there are restrictions on the combination of properties of φ feature bundles. For example, gender is a relevant morphosyntactic feature only for third person. Some of these restrictions are typologically, or logically, expected (the one we just mentioned or the fact that the person value inclusive requires dual or plural number). Some are idiosyncrasies of the morphology of Oneida, such as the fact that the feminine gender occurs only in the singular. (As mentioned earlier, when referring to two or more females, the feminine-zoic gender is used.) Formally, restrictions on the space of possible combinations of φ features result from two kinds of mechanisms or constraints: type (or sort) appropriateness conditions and feature co-occurrence restrictions. The first set of constraints encodes systematic restrictions on categories. For example, (11b) says that if the nominal index is of type 3-n-indef-index (denotes a discourse referent that is third person and not required to be unspecified/indefinite), then and only then the feature gend is appropriate. By restricting features to the appropriate categories of nominal indices, type appropriateness conditions are a way of encoding that, as far as Oneida's morphology is concerned, gender is an attribute that only makes sense for non-indefinite third person participants. The second kind of constraints model idiosyncratic restrictions on combinations of properties. Thus, (12b) says that if a nominal index is of feminine gender, the value of the num attribute is singular. It so happens that dual or plural number and feminine gender are incompatible in Oneida. Feature co-occurrence restrictions encode this unpredictable incompatibility of attribute values.

(11) a. Only third person nominal indices that are not indefinite bear gender information.
     b. 3-n-indef-index ⇒ [gend gender]

(12) a. Feminine nominal indices are singular.
     b. [gend fem] ⇒ [num sg]

The net effect of type appropriateness and feature co-occurrence constraints on combinations of φ features is to reduce the number of combinations from forty-nine (if features were truly orthogonal to each other) to nineteen, shown in Table 5.2. But the number of possible φ feature combinations is further reduced by one general constraint. Participants in described situations that are inanimate are never referenced morphologically. Thus, one can speak of four semantic genders in Oneida (and Iroquoian, in general), namely, masculine, feminine, feminine-zoic, and neuter,


Table . The nineteen possible categories of semantic indices 1st sg/du/pl 2nd sg/du/pl Incl du/pl 3rd-indef 3rd masc/feminine-zoic/neuter sg/du/pl 3rd feminine sg but only three morphological genders. We dub this constraint the Animate Argument Constraint. It is stated in (13). (Recall that verbs that have only inanimate (neuter) arguments default to the feminine-zoic gender.) The upshot of this constraint is that out of the nineteen semantic categories of nominal indices, only sixteen are morphologically relevant. (13) All and only indices for animate semantic arguments of verbs are members of the value of the agr attribute. All in all, linguistic analysis allows us to reduce the space of possible feature combinations from 2,499 to 16×16+2×16 = 288 possible combinations. In addition, for first person acting on first person, and for second person or inclusive acting on second person, a reflexive construction (prefix) is used, and this further reduces the number of possible combinations from 288 to 248. Quite large, but not inconceivable.5

5.4 Semantic ambiguity

Reducing the number of rules by reducing the number of possible φ combinations (from 2,499 to 288/248, in this case) will always lead to a reduction in complexity, as it simply reduces the space of morphosyntactic properties to reference morphologically. However, a reduction in the number of possible antecedents of rules of exponence (and consequently a reduction in the number of rules) does not result in a reduction of complexity unequivocally. This is because a reduction in rule antecedents is possible only because not all of the possible feature combinations are realized by distinct forms and this fact results in semantic ambiguity, as we explain in this section.

5 Another way of measuring the complexity that may arise from the sheer size of the Oneida pronominal prefix paradigm is to use Carstairs' () approach to complexity and compute the number of logically possible paradigms from the number of affixes and the number of allomorphs for each affix. The number of such 'possible paradigms' for the Latin nominal declension is a little over 2 × 10⁴; the number of such 'possible paradigms' for Oneida pronominal prefixes is a little under 4 × 10²⁵. Whether Carstairs' measure is useful or not is a matter of debate (see Blevins ), but the difference in size indicates that Oneida pronominal prefixes are in another league from the Latin nominal declension.


Figure 5.1 A hierarchy of nom-index:
  nominal-index [num number]
    3-indef-index [person 3]
    n-indef-index
      sp-part-index [pers sp-part]
      3-n-indef-index [pers 3, gend gender]

Figure 5.2 A hierarchy of person values:
  person
    3
    sp-part
      Incl
      Excl
        1-Excl
        2-Excl

Figure 5.3 A hierarchy of gender values:
  gender
    neuter
    anim
      fem
      other-anim
        fem-zoic
        masc

Figure 5.4 A hierarchy of number values:
  number
    sg
    dp
      pl
      dual


Although the total number of morphosyntactic distinctions which can be marked in Oneida is 288/248, only fifty-eight of those are marked. To model the reduction from the number of potential morphosyntactic distinctions to the number of actual morphosyntactic distinctions, we make use of underspecification. Technically, underspecification in our rules of exponence is achieved by letting rules of exponence make reference to more or less specific types of nominal indices or properties of nominal indices. So, a rule that applies to all animate participants will have the value of the gend attribute of the corresponding nominal index be animate, but a rule that only applies to masculine gender participants will have the value of the gend attribute of the corresponding nominal index be masc. The hierarchies of types of nominal indices as well as the hierarchies of feature-values relevant for Oneida morphology are presented in Figures 5.1–5.4, where each non-leaf node in a hierarchy represents a more general type referred to by at least one rule of exponence.6 An example of an underspecified rule of exponence is given in (14).

(14) a. If a stem belongs to the consonant class and references a first person exclusive dual or plural proto-agent acting on a third person feminine singular proto-patient, prefix yakhi- to its phonological form.7
     b. [pdgm stem 1, class c, morph 2, agr ⟨[pers 1-excl, num dp], [gend fem, num sg]⟩] ⇒ expo(yakhi ⊕ 1, 2)

The antecedent of this rule applies to all lexemes that are in the consonant stem class and that describe situations where a first person exclusive dual or plural set of entities acts on a third person feminine singular entity. The antecedent leaves underspecified the number of the proto-agent argument by having as value for the number attribute the non-leaf type dp. In other words, the type dp covers all non-singular numbers, i.e. both plural and dual participants. The consequent simply concatenates the pronominal prefix yakhi- to the consonant-initial stem. Now, the number of distinct values of the agr features that serve in antecedents of rules of exponence is fifty-eight, and each of the fifty-eight distinct agr values is associated with a set of allomorphs. So, in addition to the rule in (14), we also have the rule in (15), which targets the same agr value (that is, references the same participant categories), but applies to lexemes that belong to the a-, e-/R-, and o-/u-stem classes.

6 By Excl in Figure 5.2 we mean first person to the exclusion of second person or second person to the exclusion of first person.
7 The prefix yakhi- also applies when the proto-patient is third person indefinite/unspecified, or masculine or feminine-zoic dual or plural as the result of the application of three distinct rules of referral, as we mention later on.


(15) a. If a stem belongs to the a, e/R, or o/u class and references a first person exclusive dual or plural proto-agent acting on a third person feminine singular proto-patient, prefix yakhiy- to its phonological form.
     b. [pdgm stem 1, class a ∨ e/R ∨ o/u, morph 2, agr ⟨[pers 1-excl, num dp], [gend fem, num sg]⟩] ⇒ expo(yakhiy ⊕ 1, 2)
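To make the effect of underspecification in (14) and (15) concrete, here is a small computational sketch. It is our own illustration rather than the chapter's formalism: the encodings of the hierarchies, the attribute names, and the functions subsumes and matches are invented for the example. The point is simply that one rule whose proto-agent number is the non-leaf type dp covers fully specified feature bundles with either dual or plural number.

# Type hierarchies of Figures 5.2-5.4 encoded as child -> parent links (sketch).
PARENT = {
    "dual": "dp", "pl": "dp", "dp": "number", "sg": "number",            # number
    "fem-zoic": "other-anim", "masc": "other-anim", "other-anim": "anim",
    "fem": "anim", "anim": "gender", "neuter": "gender",                 # gender
    "1-excl": "excl", "2-excl": "excl", "excl": "sp-part",
    "incl": "sp-part", "sp-part": "person", "3": "person",               # person
}

def subsumes(general, specific):
    # True if `general` is `specific` itself or one of its ancestors.
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def matches(rule_agr, agr):
    # A rule antecedent matches a fully specified agr value if every attribute
    # it mentions subsumes the corresponding fully specified value.
    return len(rule_agr) == len(agr) and all(
        subsumes(v, index.get(attr))
        for rule_index, index in zip(rule_agr, agr)
        for attr, v in rule_index.items())

# The agr antecedent of rules (14)/(15): 1st exclusive dual-or-plural proto-agent
# acting on a 3rd person feminine singular proto-patient.
rule_agr = ({"pers": "1-excl", "num": "dp"}, {"gend": "fem", "num": "sg"})

dual_agent   = ({"pers": "1-excl", "num": "dual"}, {"gend": "fem", "num": "sg"})
plural_agent = ({"pers": "1-excl", "num": "pl"},   {"gend": "fem", "num": "sg"})

# One underspecified rule covers what would otherwise be two fully specified cells.
assert matches(rule_agr, dual_agent) and matches(rule_agr, plural_agent)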

As alluded to earlier, to avoid committing ourselves to a morpheme-based view of inflection, we will call morph a class of allomorphs that share agr values, that is, a class of consequents of rules whose antecedent agr values are identical. For purposes of presentation, we will refer to morphs by their consonant-stem allomorphs. So, the consequents of (14b) and (15b) belong to the same morph. The set of distinct cells in Table 5.1 (or Table 6 in Lounsbury 1953) is the set of morphs in Oneida. Allomorphs of twenty-three of the fifty-eight morphs realize underspecified agr values.8 Underspecification allows us to further reduce the set of agreement distinctions relevant to Oneida pronominal prefix rules of exponence from 248 to fifty-eight, i.e. by almost one-half order of magnitude. On the surface, the reduction in number of actual morphosyntactic distinctions through underspecification leads to a simplification of Oneida pronominal morphology. Just as the reduction of possible distinctions to be referenced morphologically is accomplished by structuring nominal indices, the underspecification of agr values reduces the number of rules of exponence. But the reduction in number of rules does not mean unequivocally a reduction in complexity of Oneida pronominal morphology. There is a distinction here between formal complexity and what one could call usage complexity. The morphological system of Oneida (measured in the number of rules, constraints, and the like) is certainly made simpler by the use of underspecified agr values in the antecedents of rules of exponence. But, underspecification comes at a cost for the user, particularly in comprehension. This is because upon hearing many verb forms (any one of the twenty-three that are described by rules that exploit underspecification), native speakers will not be able to determine important properties of the participants in the situation described by the verb form (all the more so, since less than a quarter

8 Ten morphs realize underspecified number (L23 yoti-, L24 loti-, L40 yakhi-, L41 yethi-, L44 yakoti-, L45 shakoti-, L50 kuwati-, L51 luwati-, L53 kni-, L54 kwa-). Four morphs realize underspecified incl/excl (L17 yukni-, L18 yukwa-, L35 shukni-, L36 shukwa-). Four morphs leave underspecified the proto-role of the two participants they reference (L7 sni-, L8 swa-, L31 etsni-, L32 etswa-). One morph realizes underspecified gender (L21 lo-). Three morphs realize underspecified number and inclusive/exclusive (L47 yukhi-, L56 skni-, L57 skwa-), and one morph realizes underspecified number and role (L43 yetshi-).

of semantic arguments are further specified by external phrases in Oneida). Consider the prefix etsni- that attaches to stems beginning in a consonant. This prefix is used whenever the described situation involves a third person masculine singular participant and a second person dual participant. But, the prefix leaves underspecified which proto-role these two participants play. So, it can be used either when a third person masculine singular proto-agent acts on a second person dual proto-patient, or a second person dual proto-agent acts on a third person masculine singular proto-patient. Underspecification of proto-roles associated with the prefix etsni- means that upon hearing a word form containing this prefix, hearers may be uncertain as to who was the agent and who was the patient. In other words, underspecification of proto-role, while reducing the number of rules of exponence, makes corresponding utterances more ambiguous and is thus a source of complexity for hearers. True, some underspecification may not be resolved by hearers; for example, hearers may not care whether a form like yakhi- references a first person exclusive dual or a first person exclusive plural proto-agent (acting on a third person feminine singular). They may leave that issue unresolved until context resolves it. But the fact that, most plausibly, hearers will, for at least some forms (or some situations), attempt to resolve the semantic ambiguity arising from the underspecification of agr values means that underspecification is not an unqualified blessing.

5.5 Directness

Rules such as (14) and (15) directly associate a morphosyntactic feature set with a form. The question is whether the realization of all morphosyntactically relevant agr values takes this direct form in Oneida. The answer is No. There are good reasons to posit rules of referral of the kind Zwicky (1985) discusses. The rules of exponence we have talked about until now have used underspecification of values of a portion of the agr attribute to model syncretism. Implicit in this use of underspecification is the assumption that the relevant underspecified (nominal) index makes semantic sense. For example, the exponence rule that references 3m.dp acting on 3f.sg (the rule for the prefix shakoti-) can underspecify the dual/plural number distinction of the proto-agent index because non-singularity is a semantically coherent category. In contrast, there is no semantically coherent category corresponding to participants that are either third person indefinite or nonspecific (irrespective of number and gender) or third person feminine singular. The absence of a semantic natural class that would group together these two semantic indices is modelled in our account through the absence of an underspecified index that would cover these two indices. To account for the formal similarity of exponents of third person indefinite/nonspecific and third person feminine singular arguments, we could, of course, use a disjunction in the agr value of rules of exponence, which amounts to having two

sets of rules of exponence (as (p ∨ q) → r is logically equivalent to (p → r) ∧ (q → r)), one for agr values that include third person feminine singular indices, and one for agr values that include third person indefinite/nonspecific. Such an analysis amounts to treating the similarity in exponence as a phonological accident. And certainly those kinds of accidental formal identity arise in inflectional systems. But, in the case of 3indef and 3f.sg this is unlikely since the identity of exponence is systematic: All prefixes that reference a 3indef participant are formally identical to the prefixes referencing a corresponding 3f.sg participant (see Chafe 1977 for a possible historical explanation for that systematic formal identity). To model what is not an accidental formal identity of exponents we use rules of referral that systematically relate the exponence of one feature bundle to that of another feature bundle. Of course, the need for rules of referral or how to model them is well known (see Stump 2001 for a detailed overview). What interests us here, though, is that rules of referral introduce another, more formal, source of complexity, something that has not been stressed before. In particular, representing rules of referral requires us to mediate the association between a morphosyntactic feature set and a form by making explicit reference to a function that takes the morphosyntactic feature set as input and outputs the form. In plain English, rather than having rules that say something like 'spell out agr value α as x', we now have rules that say 'the value of the function that spells out α is the same as the value of the function that spells out β'. This is because a rule of referral basically says that the form marking a feature set α is the same as the form marking a feature set β. In other words, it says that the values of an (exponence) function are the same. Hence, rules of referral cannot be stated without the introduction of such exponence functions. The need for this additional level of abstraction (i.e. reference to exponence functions) is an additional complexity of the Oneida pronominal prefix system. There is evidence for five such rules of referral in Oneida. One such rule is stated in (16). (16) says that intransitive prefixes of the A(gent) class that realize the agr value ⟨α⟩ are the same as transitive prefixes that realize the agr value ⟨α, feminine-zoic sg⟩, provided α is a speech participant or third masculine singular. An example prefix for this rule of referral is L5 twa-, which attaches to consonant-initial stems and marks either first person inclusive plural A(gent) participants, or a first person inclusive plural proto-agent acting on a third person feminine-zoic singular proto-patient. A similar rule says that intransitive prefixes of the P(atient) class that realize the agr value ⟨α⟩ are the same as transitive prefixes that realize the agr value ⟨feminine-zoic sg, α⟩, provided α is not third person masculine dual/plural or third person feminine-zoic dual/plural (in which case the prefixes loti- and yoti-, respectively, are used).

(16) a. The pronominal prefix for stems that reference a single A(gent) speech participant or third masculine singular nominal index α is the same as the


pronominal prefix for stems that reference a proto-agent α acting on a third person feminine-zoic singular proto-patient.
     b. expo(2, [pdgm affix-type a, morph, agr ⟨1 (sp-part-index ∨ [gend masc, num sg])⟩])
        = expo(2, [morph, agr ⟨1, [gend zoic, num sg]⟩])

Inherent in rule (16) is the assumption that this is a case of asymmetric neutralization, as defined in Stump (2001) and Baerman et al. (2005). Evidence for the asymmetry comes from two prefixes, L7 sni- and L8 swa- (we cite, as usual, the allomorphs for consonant-initial stems), which, as transitive prefixes, neutralize the distinction between proto-agent and proto-patient arguments. When morphologically referencing two participants, sni- references either a third person feminine-zoic singular proto-agent acting on a second person dual proto-patient or a second person dual proto-agent acting on a third person feminine-zoic singular proto-patient. When referencing a single participant for stems that select A(gent) prefixes, sni- references a second person dual participant and when referencing a single participant for stems that select P(atient) prefixes, sni- likewise references a second person dual participant. The prefix swa- has a similar distribution: when referencing two participants, swa- references either a third person feminine-zoic singular proto-agent acting on a second person plural proto-patient or a second person plural proto-agent acting on a third person feminine-zoic singular proto-patient. When referencing a single participant for stems that select A(gent) prefixes, swa- references a second person plural participant, and when referencing a single participant for stems that select P(atient) prefixes, swa- likewise references a second person plural participant. (Note that sni- and swa- are the only morphs that occur in both intransitive agent and intransitive patient paradigms.) Now, if we assume the rules of exponence that correspond to sni- and swa- (and the other rules for the allomorphs of the same morph) have transitive agr values as antecedents, the equality in (16) can derive the corresponding intransitive uses of these prefixes. What is significant is the fact that the underspecified proto-role in the transitive uses of sni- and swa- is paralleled by the underspecified proto-role in three other prefixes: L31 etsni-, L32 etswa-, and L43 yetshi-. Rules of referral that extend transitive uses of sni- and swa- to intransitive uses allow us to capture this parallel and thus lead to a conceptually simpler model (albeit, not by much). Three additional rules of referral are needed. One 'extends' the third person feminine singular to the third person indefinite/nonspecific; the other two extend the third person feminine singular to transitive prefixes with a third person masculine

dual/plural or third person feminine-zoic dual/plural proto-agent, and to transitive prefixes with a third person masculine dual/plural or third person feminine-zoic dual/plural proto-patient.
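A rule of referral like (16) can be pictured as an identity between outputs of an exponence function rather than as a direct feature-set-to-form statement. The following toy sketch is our own illustration (the dictionary contains only the L5 twa- case cited above, and the function names are invented); it defines the intransitive A(gent) exponence in terms of the transitive exponence for the same index acting on a feminine-zoic singular proto-patient, so no separate direct rule has to be stated for the intransitive use.

# Toy illustration (ours) of rule (16) as an identity between exponence values.
TRANSITIVE_C_STEM = {
    # (proto-agent, proto-patient) -> prefix; only the example cited in the text.
    ("1incl.pl", "3fz.sg"): "twa-",   # L5
}

def expo_transitive(agent, patient):
    return TRANSITIVE_C_STEM[(agent, patient)]

def expo_intransitive_agent(alpha):
    # Rule (16): the A-class realization of alpha is referred to the transitive
    # realization of <alpha, feminine-zoic singular>.
    return expo_transitive(alpha, "3fz.sg")

# L5 twa- serves both as 1incl.pl A(gent) prefix and as 1incl.pl > 3fz.sg prefix.
assert expo_intransitive_agent("1incl.pl") == "twa-"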

5.6 Segmentation difficulties

The question we set out to answer in this chapter is whether there are aspects of what we call paradigmatic complexity that are quite different from the more often studied syntagmatic complexity. The paradigmatic complexity we have focused on involves a single inflectional block, and one dimension that is specific to paradigmatic complexity is the complexity that arises from combining an affix with a stem. In Oneida, there are two ways that complexity can arise from concatenating a pronominal prefix with a stem. Both are due to phonological adjustments at the boundary between pronominal prefix and stem, and both have the effect of impeding the ability of speakers to generalize from an observed verb form to all the other verb forms based on the same stem.9 Consider first the forms in (17), all three of which include the pronominal prefix L38 shako-, which references a third person masculine singular proto-agent acting on a '3' proto-patient. This prefix induces the deletion of a stem-initial a or i, as (17a) and (17b) indicate. As a result, the three verb forms in (17) are ambiguous as to stem class. Each of the forms could be an a-stem, an i-stem, or a C-stem. This kind of ambiguity is similar to the better-known ambiguities found in Indo-European languages, e.g. the conjugation class ambiguity of Latin amo (is it a first or third conjugation class form?).

(17) a. wa-shako-anúhtuht-e≈ ⇒ washakonúhtuhte≈
        fact-3m.sg>3-wait.for-pnc
        'he waited for her or them'
     b. wa-shako-ihnúks-a≈ ⇒ washakohnúksa≈
        fact-3m.sg>3-fetch-pnc
        'he went after, fetched her or them'
     c. wa-shako-li≈wanu.tú.s-e≈ ⇒ washakoli≈wanu.tú.se≈
        fact-3m.sg>3-ask.someone-pnc
        'he asked her or them'

Consider next the forms in (18) which show different allomorphs of the prefix L34 lak-, which indicates that the described event involves a third person masculine singular proto-agent acting on a first person singular proto-patient. The allomorph hakw- occurs with e-/R-stems, while the allomorph hak- occurs with consonant stems.

9 See Bank and Trommer (this volume) for a discussion of automatic learning of morphological segmentation.


But, the forms in (18) can be parsed two different ways, depending on whether or not one assumes the w is part of the pronominal prefix or part of the stem.

(18) 3m.sg>1sg hak/hakw: C-stem or e-/R-stem?
     a. wa-hakw-R´hahs-e≈
        fact-3m.sg>1sg-belittle-pnc
        'he belittled me'
     b. wa-hak-wRnahnóthahs-e≈
        fact-3m.sg>1sg-read.to-pnc
        'he read to me'

Both the types of situations exemplified in (17) and (18) lead to complexity because, in both cases, the class to which the stem belongs cannot be unambiguously determined from the forms given. As a consequence, one cannot infer in either case all the other forms in the paradigm of the stem, as selection of pronominal prefix allomorphs depends on stem class. The reason for this latter fact is specific to Iroquoian: Each morph has several allomorphs conditioned by the class to which the stem belongs, and that stem class is determined by the identity of the initial sound of the stem. In fact, the mapping between allomorphs and stem classes is itself rather complex because different morphs associate different groups of stem classes with the different allomorphs. For example, the morph referencing a first person exclusive plural agent has four allomorphs depending on the class to which the stem belongs, yakwa- if the following stem starts with a consonant, yakwR- if it starts with an i, yakw- if it starts with a, e, or R, and, finally, yaky- if it starts with o or u. The morph referencing a first person singular proto-agent acting on a '3' proto-patient, on the other hand, has only two allomorphs conditioned by the stem class, khe- before consonant-stems and i-stems and khey- before a-, e-/R-, and o-/u-stems. Overall, there are eleven distinct groupings of stem classes for the fifty-eight pronominal prefix morphs. (Groupings were mentioned earlier at the end of section 5.1 in connection with the distribution of the allomorphs of prefix L39 khe- and prefix L33 luwa-.) Why do the segmentation difficulties we have just mentioned—i.e. determining the class to which the stem belongs either because the stem-initial segment is obscured due to phonological processes or because the boundary between prefix and stem can be located in more than one place—lead to complexities? First, speakers must learn, for each morph, which group of stem classes goes with each allomorph. There are some subregularities, of course. But, ultimately, nothing can save speakers from having to learn that a-, e-/R-, and o-/u-stems require khey- as the allomorph for the 1sg>3 prefix. Second, even after having learned which allomorph of which morph goes with which grouping of stem classes, speakers cannot necessarily generalize from a given form of a verb to all the other forms of that verb. We can quantify somewhat the complexity introduced by segmentation difficulties by asking what is the number of morphs in each stem class from which the other


Table . Number of morphs in each stem class from which all other fifty-seven forms can be deduced C-stem i-stem o-/u-stem e-/R-stem a-stem

1114524–40/20–3026 fifty-seven forms of the stem class can be deduced. In a perfect system, each of the fifty-eight morphs for all five stem classes would allow one to infer the fifty-seven other forms. Table 5.3 shows that reality is far from perfect. Two notes regarding Table 5.3 are: (1) Lounsbury (1953: 56) reports that some speakers use prefixes with i-stems that are similar to those found on C-stems, while other speakers use prefixes that otherwise occur with stems with an initial vowel. The speakers that Michelson hasworkedwithinOntariosince1979usethevowel-stemprefixeswithonlytwo verbs, -ihlu- ‘say’ and -ihey- ‘die’. The number of i-stem forms is based on the prefixes that overlap with C-stems since these represent the majority. (2) An innovation is that some of the transitive prefixes have either extended an allomorph ending in y from o-/u-stems to e/R-stems (L33 luway-,L48yesay-,L49kuway-)ordevelopednew allomorphs in y before both o-/u-stems and e-/R-stems (e.g. L27 shukway- or L38 shakoy-). The table gives two numbers each for e-stems and R-stems; the first number is the number of morphs from which the other morphs of the same stem class can be deduced assuming the older forms without y;thesecondnumberisthenumberof morphs needed assuming the innovative forms with y areusedinstead. Table 5.3 shows that, depending on the stem class, between 1.72 per cent and 78 per cent of verb forms allow one to infer the fifty-seven other verb forms (again, restricting ourselves for now to the combination of pronominal prefixes and stems). As Ackerman et al. (2009) have argued though, the complexity of an inflectional system partially depends on how frequent certain ambiguities are. In this particular case, the generalizability penalty is more or less severe depending on the relative frequency of the various stem classes. If C-stems, for example, are very rare, then the lack of generalizability for all but one morph is not as worrisome as if C-stems were very frequent. Table 5.4 shows the number of stems of each class in twelve pseudo-randomly chosen naturally occurring texts and the proportion of stems of each class across these twelve texts. It seems that C-stems are the most frequent ones, accounting for 45 per cent of the stems in these twelve texts.10 So, the fact that only one of the fifty-eight forms

10 Despite the much more frequent occurrence of C-stem tokens in our sample texts, there is no intuitive sense in which C-stem forms are the default any more than, hypothetically, finding out that first declension forms are significantly more frequent in some Latin texts would lead us to say that first declension is the default declension class of Latin nouns.


Table . Number and percentage of stems of each class in twelve texts Total 

C-stems 504 45 a-stems 373 34 e-/R-stems 109 10 i-stems 102 9 o-/u-stems 26 2 Total 1,114 100 for C-stems allow deduction of all other forms is worrisome. The average lack of generalizability from one form to all others is more severe than Table 5.3 would suggest.
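As a quick sanity check on these figures (a back-of-the-envelope computation of our own; the e-/R-stem row of Table 5.3 is left out because its cell gives paired values for the older and innovative forms), the raw counts in Table 5.4 reproduce the percentages, and they make the mismatch explicit: the class with only one fully informative morph out of fifty-eight, the C-stems, is also the most frequent one in running text.

# Recompute the percentages of Table 5.4 and contrast stem-class frequency with
# the number of fully informative morphs from Table 5.3 (e-/R-stems omitted here
# because their Table 5.3 cell gives paired old/innovative values).
stem_tokens = {"C": 504, "a": 373, "e/R": 109, "i": 102, "o/u": 26}
total = sum(stem_tokens.values())                    # 1,114 stems in twelve texts

for cls, n in sorted(stem_tokens.items(), key=lambda kv: -kv[1]):
    print(f"{cls}-stems: {n:4d}  ({100 * n / total:4.1f}%)")

informative = {"C": 1, "i": 11, "o/u": 45, "a": 26}  # Table 5.3, out of 58 morphs
for cls, k in informative.items():
    print(f"{cls}-stems: {k}/58 fully informative morphs, "
          f"{100 * stem_tokens[cls] / total:.0f}% of stem tokens")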

5.7 Conclusion

Much of the discussion on what is a human language over the last sixty years has focused on properties of the syntactic and semantic combinatorics, the 'discrete infinity' displayed by both syntax and semantics. Measures of complexities (including, for example, where natural languages fall on the Chomsky hierarchy) have, as a consequence, focused on syntagmatic complexity, i.e. the result of putting morphemes, words, and phrases together. In this chapter, we discussed another kind of possible complexity, namely the complexity that a single position class or rule block can exhibit, what we call paradigmatic complexity. By focusing on a single block we believe it is easier to ask questions about what makes inflection complex and explore to what extent the kind of complexity exhibited by inflection differs from syntagmatic complexities. The first kind of complexity a single inflectional block can exhibit is the sheer range of choices one can make. In the case of Oneida pronominal prefixes this means fifty-eight morphs and 326 allomorphs (roughly five allomorphs per morph). So, not surprisingly, paradigmatic complexity is first a matter of number of choices. But why are inflectional choices difficult? After all, one's vocabulary is about three orders of magnitude larger than the fifty-eight morphs in the Oneida pronominal paradigm, at least according to some estimates of the average vocabulary size of English speakers. So learning or choosing words does not seem to be all that difficult. However, at least according to subjective accounts from second-language learners of English versus Oneida, learning or choosing inflected forms of a verb in Oneida is significantly more difficult than learning or choosing content words (unfortunately data suggesting that learning or using pronominal prefixes is easy or hard for native speakers is not available, as no children are learning Oneida as a first language). So, what is it that makes the retrieval and selection of a pronominal prefix difficult?


We can only offer speculative remarks here. Nevertheless, we think the issue is of sufficient theoretical interest to make it worth speculating about. First, the use of a particular pronominal prefix is obligatory. In other words, given the situation being described there is a single appropriate prefix that speakers must choose. Second, in choosing or interpreting pronominal prefixes speakers and hearers must attend to several very general properties that are not necessarily the most salient participant properties in the situation (the properties are not basic level properties, so to speak). We can quantify this putative underlying cause of perceived complexity by language learners by the number of decisions speakers must make when choosing a pronominal prefix (we leave aside allomorphy for now). They must decide if the situation involves one or two participants. If it involves one (animate) participant, they must decide if the verb selects Agent or Patient intransitive prefixes. For each participant that is morphologically referenced, they must make decisions about person and number, and for third person participants, whether it is indefinite/nonspecific or not, and, if not, what the participant's gender is. Thus, to choose the pronominal prefix L10 la-, a speaker must have decided that the situation involved two animate participants (1), that the proto-agent was third person (2), and if not-indefinite (3) that it was masculine (4), and singular (5) and that the proto-patient was third person (6), and if not-indefinite (7) that it was feminine-zoic (8), and singular (9). In total, speakers must make, in the worst case, between four and nine decisions (each decision requires making a choice between two and four alternatives) before they can select the appropriate morph.11 When comparing this number with the typical number of decisions required to select nominal or verbal suffixes in Indo-European the number is quite high. To properly decline Latin nouns speakers must only decide on the noun's number and gender (we omit case, as it is syntagmatically determined, or declension class, as it is more similar to the effect of stem-classes in Oneida). To properly conjugate, say, an ancient Greek verb only five decisions must be made (voice; mood and tense; person and number). Moreover, these five decisions are not required to select one morph, but rather a sequence of several morphs. The perceived complexity of Oneida pronominal prefixes, we surmise, is partly due to the large number of decisions speakers must make when choosing a single morph. These first two possible factors that may lead to the complexity of the Oneida pronominal prefix slot can be grouped together under the rubric attentional complexity: To make one choice (which pronominal prefix to use), speakers must attend to many distinct and concurrent properties of participants in the described situation.

11 Underspecification often helps speakers, as it reduces the number of decisions to be made. Thus, to choose the prefix L46 yuk- (3>1sg), speakers need only make four choices, as the prefix neutralizes all distinctions among third person except masculine singular and feminine-zoic singular. The trade-off is an increase in semantic ambiguity for hearers, as we discussed in section 5.4.


Third, because of neutralization, some properties are relevant some of the time, but not all of the time. For example, the distinction between dual and plural is never relevant for proto-patients; it is also not relevant for some proto-agents, namely just when a first or second person proto-agent acts on a '3' proto-patient. Speakers and hearers must therefore attend to the contrast between two or more first or second person proto-agents . . . unless the situation involves a '3' proto-patient. Furthermore, because most neutralizations are not systematic (except for the distinction between dual and plural third person proto-patients), speakers and hearers cannot rely on blanket statements on when to ignore some morphologically relevant participant properties. Fourth, having chosen which of the fifty-eight morphs is the right one for the situation speakers must then choose an allomorph. And this is not easy. Each morph has on average five allomorphs and allomorphy, although often motivated, is not automatic; it is something that speakers must learn for each morph (and appropriately use in production). Moreover, as we have illustrated, allomorphy can lead to segmentation ambiguities for the hearer. If number of morphs (or allomorphs) within the block leads to complexities, it would seem that any reduction in size might lead to reductions in complexities. Interestingly, we tried to show that this is not necessarily the case. We distinguished between two kinds of reduction in number of morphs. The first kind reduces the space of possible morphosyntactic distinctions to mark or reference. We showed that a skilled linguistic analysis can reduce the number of such distinctions from 2,499 for an unstructured space to 288/248 when general and particular constraints on feature and feature-value combinations are in effect. Reductions in the number of morphosyntactic distinctions that can be marked always lead to a reduction in complexity. But, the second kind of reduction is not unmitigatedly a simplification. This is the reduction from the 288/248 possible morphosyntactic combinations to the 58 Oneida pronominal morphs. We model this reduction mostly through the use of underspecification in the statement of rules of exponence. Rules of exponence can leave underspecified the type of nominal indices or features of nominal indices. Underspecification results in a reduction in the number of rules, but at the cost of semantic ambiguity. Given a verb form with a particular pronominal prefix hearers cannot be sure of all the properties of participants in the situation that the pronominal prefix references. In certain cases, it is possible hearers do not resolve this ambiguity, but, as we showed, some of the time what is underspecified is so important to the speaker's communicative intent (what situation is being described), that hearers are most likely to have to disambiguate what is being referenced by the prefix, as when the proto-role is being underspecified. In the end, what Oneida pronominal prefixes tell us is that there is more to linguistic complexity than the combinatorial issues we are most familiar with from work in syntax and semantics. The need for speakers to make a set of inter-related choices

to retrieve and select a form can also lead to complexities. The importance and relevance of accessing forms when producing or understanding languages will not surprise psycholinguists. What may be more interesting is that sometimes a language’s grammatical system can make this retrieval quite a difficult matter.

Acknowledgements We thank Karin Michelson’s Oneida collaborators, and especially Norma Kennedy. We are also grateful to Matthew Baerman, Greville Corbett, and Hanni Woodbury for reading earlier drafts of this chapter, and to Clifford Abbott, Michael K. Foster, and Hanni Woodbury for discussion of Iroquoian pronominal prefixes. We thank our colleagues Rui Chaves, Matthew Dryer, David Fertig, and Jeff Good for their input. Finally, this chapter would not have been written without Farrell Ackerman and Jim Blevins spurring our interest in this topic during their 2011 LSA Institute class.

Gender–number marking in Archi: Small is complex

MARINA CHUMAKINA AND GREVILLE G. CORBETT

6.1 The problem

The Nakh–Daghestanian language Archi has a small paradigm of markers realizing gender and number. Though small, this paradigm proves complex. We see this complexity in the inventory of inflectional targets, since almost all parts of speech can mark gender and number but not all lexemes within the same part of speech behave alike. Predicting which will show gender and number is not straightforward. More difficult is specifying the position of the gender and number markers: many items have infixal marking, and these are found in some instances where prefixal marking would be felicitous on purely phonological grounds. That is, Archi exhibits the typologically rare phenomenon of ‘frivolous’ infixation. We propose a number of factors which bear on the presence and position of the gender–number markers, and also on their forms and syncretism pattern; these factors overlap in ways which make it hard to isolate the impact of individual factors. Our approach will be to give the defaults, and the more specific overrides to these defaults. It is ironic that this complexity is found in the relatively small gender and number paradigm, since Archi is famed rather for the sheer size of its other paradigms.

6.2 Description of the system

For describing and analysing the morphological marking of gender–number agreement in Archi we use two main sources of data. First, there is the extensive work of Aleksandr Kibrik and his colleagues: the Archi grammar published in Russian in 1977 (here we use volumes I, II, and III, referred to as Kibrik et al. 1977, Kibrik 1977a, and Kibrik 1977b respectively) and the online collection of Archi

texts (Kibrik et al. 2007). Our second source is the electronic dictionary of Archi (Chumakina et al. 2007) compiled in the Surrey Morphology Group in 2004–7. This work involved several field trips to the Archi village and the creation of a database with lexical, phonological, and morphological information on 4,464 Archi words. Examples given in the chapter come mainly from the database and from fieldwork. We start with an example which demonstrates that every part of speech in Archi (except nouns) can be an agreement target:

(1) b-is              o‹b›q»a-t‰-ib                χ‰»ele
    i/ii.pl-1sg.gen   ‹i/ii.pl›leave.pfv-attr-pl   guest(i/ii.pl)[abs]
    b-ez              dit‰a‹b›u        e‹b›χni
    i/ii.pl-1sg.dat   soon‹i/ii.pl›    forget‹i/ii.pl›pfv
    ‘I soon forgot my guests, who had left.’

Archi is a strongly ergative language, hence agreement is controlled by the absolutive argument; in (1) it is χ‰»ele ‘guests’. The verb eχmus ‘forget’ takes two arguments, following the pattern of verbs of perception: the dative (experiencer, the person who forgets) and the absolutive (perceived entity, here: forgotten entity). Agreement on this verb is realized by the infixal marker b. We follow the Leipzig glossing rules in indicating infixes within angle brackets; information inferred from the bare stem is given within square brackets. In (1) every word is an agreement target (except, naturally, χ‰»ele ‘guests’ which is the controller). We consider first the more familiar agreement targets:

• the verb e‹b›χni ‘forget’ agrees with the absolutive χ‰»ele ‘guests’ in gender and number;
• the participle o‹b›q»at‰-ib ‘having left’ agrees with its head noun χ‰»ele ‘guests’ in two places: the infix realizes agreement in gender and number, and the suffix in number only;
• the genitive pronoun b-is ‘my’ agrees with its head noun χ‰»ele ‘guests’ in gender and number.

Those instances of agreement are unsurprising: verbs often agree with their arguments, and attributives commonly agree with their heads. There are two less familiar examples of agreement in (1):

• the pronoun b-ez (the first person singular pronoun in the dative) agrees with the absolutive argument χ‰»ele ‘guests’ in gender and number; agreement is realized by a prefix.
• the adverb dit‰a‹b›u ‘soon’ agrees with the absolutive argument χ‰»ele ‘guests’ in gender and number; agreement is realized by an infix.


Neither of these targets has a direct syntactic connection to the absolutive argument χ‰»ele ‘guests’, yet both agree with it1. Example (1) illustrates most of the characteristics of the Archi agreement system which will be important for us: the features involved in agreement, namely gender and number, the morphological positions for agreement markers (prefixes, infixes, and suffixes), and the parts of speech serving as agreement targets. While (1) shows that verbs, participles, pronouns, and adverbs can have a morphological slot for agreement, other parts of speech such as adjectives, particles, and a postposition can also serve as agreement targets in Archi. Thus the range of agreement targets in Archi is particularly extensive. Conversely, when we consider the lexicon, the range is more restricted, since there is no part of speech where agreement is found in all lexical items. Table 6.1 presents data from the Archi dictionary showing the proportion of lexical items which show agreement.

Table . Agreeing lexical items in the Archi dictionary

total agreeing  agreeing

adjectives 446 313 70.2 verbs 1248 399 32.0 adverbs 383 13 3.4 postpositions 34 1 2.9 enclitic particles 4 1 (25.0)

Based on Chumakina and Corbett (2008: 188)

From Table 6.1 we see clearly that within each part of speech it is only some items which agree. For adjectives, the majority agree and it is possible to define formal properties of agreeing items (see 6.3 and 6.4). The three bottom lines of Table 6.1 represent parts of speech for which the need to agree must be fully specified in the lexicon, as there are no phonological, morphological, or semantic regularities. Verbs are specially interesting in this respect, and we discuss them in 6.5–6.6. Personal pronouns were not included in Table 6.1, because it would be difficult to give meaningful figures. As we saw in example (1), some forms of some pronouns have a slot for agreement. To produce the right form, one needs both lexical and morphological information: we cannot just list (or count) which pronouns agree, we must also specify which cells of their paradigm have agreement slots. We discuss this in section 6.3.

1 For more on the challenge of the agreeing adverbs in Archi, see the Wiki site ‘From competing theories to fieldwork’ (http://fahs-wiki.soh.surrey.ac.uk/wiki/projects/fromcompetingtheoriestofieldworkarchi/Archi.html), particularly the discussion in the seminar dedicated to topic  ‘The domain problem’.
2 The paper cited gives different numbers for the adverbs ( total,  agreeing). These numbers have been corrected recently, as we have reanalysed some of the lexical items.


For the items that agree, agreement is in gender and number.3 In this respect, Archi provides a picture fairly typical for Daghestanian languages: there are two numbers (singular and plural) and four genders. Genders I and II comprise male and female humans respectively. The rest of the noun lexicon is distributed between genders III and IV, where III comprises animates, all insects, and some inanimates, and gender IV includes some animates, some inanimates, and abstracts. We can now examine how all this comes together in paradigms. We will look at some representative paradigms, to illustrate the points made so far.

(2) Adjective ha˛du ‘real, reliable’
             sg            pl
    I    ha˛du-(w)4    ha˛d-ib
    II   ha˛du-r       ha˛d-ib
    III  ha˛du-b       ha˛d-ib
    IV   ha˛du-t       ha˛d-ib

The paradigm of the adjective in (2) shows a singular–plural distinction, with four gender values distinguished in the singular but no differentiation of gender in the plural. All the marking is suffixal. In contrast, (3) illustrates one of the possibilities for verbs:

(3) Verb aχas ‘lie down’, perfective stem
             sg          pl
    I    a‹w›χu      a‹b›χu
    II   a‹r›χu      a‹b›χu
    III  a‹b›χu      aχu
    IV   aχu         aχu

Again, four gender values are distinguished in the singular, and in the plural we see just a two-way distinction: genders I/II versus III/IV, which amounts to a distinction between human and non-human. Note the interesting syncretisms: genders I and II in the plural have the form of gender III singular, while genders III and IV plural have the form of gender IV singular. For more on gender syncretism in the nominal domain, see Milizia (this volume). In the paradigm in (3) the agreement markers are all infixal. The adverb in (4) provides a contrast with (3) in terms of agreement exponents:

3 We have argued elsewhere, most recently in Corbett (: –), that a person feature is required in the morphosyntax of Archi. The interesting complications need not concern us here, since the person forms are always syncretic with one of the forms we shall be analysing, and so person does not add to the data we need to account for in this analysis of morphological forms.
4 The suffix [w] is not always pronounced, but it surfaces if there is a vowel following it.


(4) Adverb k’ellej‹t’›u ‘entirely’
             sg               pl
    I    k’ellej‹w›u      k’ellej‹b›u
    II   k’ellej‹r›u      k’ellej‹b›u
    III  k’ellej‹b›u      k’ellej‹t’›u
    IV   k’ellej‹t’›u     k’ellej‹t’›u

This is one of the minority of adverbs which agree. The number of values distinguished by agreement and the pattern of syncretism are as we saw with the verb, but the marking is different. Though both the verb and the adverb use infixes, the verb marks genders III and IV plural and gender IV singular by the bare stem (zero marking), whereas the adverb has an overt marker (t’). Finally, we consider a pronoun:

(5) First person singular pronoun, dative case5
             sg        pl
    I    w-ez       b-ez
    II   d-ez       b-ez
    III  b-ez       ez
    IV   ez         ez

Here we see another variation on the theme. The pattern of syncretism is as we saw in (3) and (4); the markers are similar to the verbal ones in (3), but here they are prefixes (note the difference in the realization of gender II singular: [r] when infixed, [d] when prefixed). This pronoun agrees with the absolutive argument, as we saw in (1). For those items that mark agreement, we have seen two different patterns of syncretism in the paradigms in (2)–(5). If there were a specific exponent for each gender–number combination, we would have eight separate agreement markers. In reality, there are no more than five exponents for each part of speech. Abstracting away from the actual forms in our examples, we can summarize the patterns of syncretism, A and B, as shown in Tables 6.2 and 6.3. We have seen, too, that Archi employs prefixes, suffixes, and infixes to realize agreement. Their distribution is not straightforward. To some extent it is determined by the part of speech: adjectives realize agreement solely by suffixes, while adverbs, postpositions, particles, and some pronouns realize agreement by the infixes shown

5 Here, ‘gender’ and ‘number’ refer to the gender and number of the absolutive of the clause with which the dative pronoun agrees.


Table . Syncretism pattern A

sg pl

I1 3 II 2 III 3 4 IV 4

Table . Syncretism pattern B

sg pl

I1 II 2 5 III 3 IV 4

Table . Archi prefixes

sg pl

Iw- b- II d- III b-  IV 

Table . Archi infixes, Set I sg pl

I‹w› ‹b› II ‹r› III ‹b› ‹t’› IV ‹t’› in (4) and given as Set I6. Some pronouns employ prefixes, and verbs use prefixes and the infixes shown in (3) and given as Set II. Tables 6.4–6.7 show all the possible agreement affixes.

6 There are some exceptions such as the adverb b-allej‹b›u ‘for free’, where the agreement is realized twice: as a prefix and as an infix. This information is lexically specified. Double exponence can also be observed in the dative case of the first person plural exclusive pronoun (Table 6.8).


Table . Archi infixes, Set II sg pl

I‹w› ‹b› II ‹r› III ‹b› ‹› IV ‹›

Table . Archi suffixes sg pl

I-w II -r III -b -ib IV -t

To get to grips with the system of gender–number inflection in Archi, we can separate out three ‘decisions’ required to establish the right agreement form:

1. determine whether the lexical item is an agreeing one;
2. choose the right exponent, whether it is a prefix, suffix, or infix (the syncretism patterns follow from this choice: only suffixes take syncretism pattern B);
3. choose the actual shape of the affix (in those instances where there is a choice).

Here the inventory of the exponents does not show great phonological variability; it is not the richness of material that contributes to the complexity, but the number of choices required to produce the appropriate word form. This situation resembles that in Oneida (Koenig and Michelson, this volume), where the number of choices in a single position class or rule block produces considerable complexity of the language. This is the bare bones of the problem. The next sections present the factors which determine these choices.
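As a rough illustration (ours, not part of the chapter), the three decisions can be read as a simple lookup procedure. The sketch below, in Python, assumes hypothetical lexical entries that record whether an item agrees and which exponent position it selects; the affix inventories follow Tables 6.4–6.7, with the empty string standing for zero exponence.

    # Affix inventories, following Tables 6.4-6.7 ("" = zero exponent).
    PREFIXES   = {("i", "sg"): "w-",  ("ii", "sg"): "d-",  ("iii", "sg"): "b-",  ("iv", "sg"): "",
                  ("i", "pl"): "b-",  ("ii", "pl"): "b-",  ("iii", "pl"): "",    ("iv", "pl"): ""}
    INFIXES_I  = {("i", "sg"): "‹w›", ("ii", "sg"): "‹r›", ("iii", "sg"): "‹b›", ("iv", "sg"): "‹t’›",
                  ("i", "pl"): "‹b›", ("ii", "pl"): "‹b›", ("iii", "pl"): "‹t’›", ("iv", "pl"): "‹t’›"}
    INFIXES_II = {("i", "sg"): "‹w›", ("ii", "sg"): "‹r›", ("iii", "sg"): "‹b›", ("iv", "sg"): "",
                  ("i", "pl"): "‹b›", ("ii", "pl"): "‹b›", ("iii", "pl"): "",    ("iv", "pl"): ""}
    SUFFIXES   = {("i", "sg"): "-w",  ("ii", "sg"): "-r",  ("iii", "sg"): "-b",  ("iv", "sg"): "-t",
                  ("i", "pl"): "-ib", ("ii", "pl"): "-ib", ("iii", "pl"): "-ib", ("iv", "pl"): "-ib"}

    INVENTORIES = {"prefix": PREFIXES, "infix_I": INFIXES_I,
                   "infix_II": INFIXES_II, "suffix": SUFFIXES}

    def agreement_exponent(lexical_entry, gender, number):
        """Return the agreement exponent for a (gender, number) cell, or None."""
        # Decision 1: is this lexical item an agreeing one at all?
        if not lexical_entry["agrees"]:
            return None
        # Decision 2: which exponent position does the item select?
        inventory = INVENTORIES[lexical_entry["position"]]
        # Decision 3: read the actual shape off the chosen inventory.
        return inventory[(gender, number)]

    # Hypothetical entry for an agreeing adverb such as (4), for illustration only.
    adverb_entry = {"agrees": True, "position": "infix_I"}
    print(agreement_exponent(adverb_entry, "iii", "sg"))   # ‹b›

The tables themselves are tiny; as the chapter argues, the burden lies in decisions 1 and 2, which are only partly predictable.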

6.3 Factors regulating the system: an overview

There are several interconnected and overlapping factors which determine the shape and position of the agreement marker, if any. We sketch them here, and then analyse them in turn. Phonology plays a role: for instance, stems beginning in a vowel are likely to accept a prefixal agreement marker (as in the verb b-ak‰us ‘see.iii.sg’), while those with an initial consonant typically do not (ga‹b›χ‰»as ‘take off.iii.sg’). Then there are derivational suffixes which bring with them an agreement slot; for instance, the originally emphatic suffix -ej‹›u licenses an agreement marker within itself (as in

k’ellej‹b›u ‘entirely.iii.sg’). In addition, there are generalizations based on the item’s part of speech: most adjectives and most verbs of a certain morphological type (namely, simple dynamic verbs) agree, while most adverbs do not. Naturally we try to appeal to fewer factors. We might imagine, for instance, that generalizations based on phonological and morphological forms might allow us to account for the agreement markers, and that we would then not need to appeal to part of speech classification. It turns out that this move fails, as we see when we consider adjectives. Thus, we can suggest a phonological generalization, which states that vowel–initial stems are likely to accept an agreement prefix. However, there are adjectives like o»ro»s ‘Russian’ which are vowel–initial, and do not inflect for gender and number. This happens because agreeing adjectives are formed with the suffix -t‰u / -du / -nnu, which brings a suffixal agreement slot, so the adjectives not formed by suffixation do not take agreement markers. There are also adjectives like aburi-t‰u-b ‘tidy.iii.sg’, which are vowel–initial and are formed by suffixation; these do agree, but only by the suffix following the derivational suffix. Thus the phonological generalization works only for a subset of items, namely certain parts of speech (verbs) and not for adjectives. Consonant–initial adjectives, unsurprisingly, inflect for gender and number only if they are formed by suffixation. This shows that adjectives disregard the phonological generalization, and agree suffixally or not at all. Consider next the adverbs. The majority (ten out of thirteen) of the adverbs which agree have the suffix -ej‹›u (or its variant -ij‹›u for younger speakers). This is historically an emphatic marker, but in some current adverbs, such as mumat‰ij‹b›u ‘while I am asking you nicely’ there is no obvious emphatic semantics. In some instances -ej‹›u/-ij‹›u is synchronically a derivational marker in the sense that it is affixed to an independently attested base such as jella ‘this’ from which jellej‹t’›u ‘in this way’ is derived. In others, however, there is no independently attested base in modern Archi, as in allej‹t’›u ‘for free’. Thus, it is true that every agreeing adverb has a final -u and the agreement marker is prefixed to it. However, it is not the case that there is a synchronically justifiable derivational rule, which would form adverbs and predict their ability to show agreement. To add to the picture, we should note that there are also adverbs like da»šo‰»nu ‘anywhere’ which happen to end in -u but which do not agree. More generally, it becomes clear that we need reference both to phonological shape and to part of speech, as well as to morphological and lexical factors at work within parts of speech. To pull these factors apart, we first look at the (partial) generalizations based on the item’s part of speech (6.4).

6.4 Part of speech determining the position of the agreement marker

Establishing the part of speech of an item is relatively straightforward in Archi according to syntactic tests. We have seen that items within all parts of speech except

nouns can realize agreement in number and gender. In addition, nouns inflect for case, while verbs inflect for aspect and mood, among other things. Adverbs, particles, and postpositions do not inflect (except for gender and number).7 Given the part of speech of a particular lexical item, we can predict the pattern of syncretism, and we have some knowledge of the exponent shape, though some parts of speech still allow a choice. We can give default predictions for the agreement forms for the following:

• adjectives: these follow pattern B (suffixes)
• adverbs: pattern A (infixes, Set I); there is one exception, the adverb b-allej‹b›u ‘for free.III.sg’, which has multiple exponence (see footnote 7)
• postpositions: pattern A (infixes, Set I)
• enclitic particles: pattern A (infixes, Set I)8

Adjectives deserve a little more discussion. As already noted, inflected adjectives are easily recognized by their form, as they all contain the suffix -t‰u/-du/-nnu. Many of them are participles of stative verbs (more on the stative–dynamic division in 6.5). For example, na»μ-du-t ‘blue’ is a participle of the verb na»k≠’ ‘be blue’. However, there are also adjectives produced by the same suffix from adverbs: hinc-du-t ‘present, actual’ (now-attr-iv.sg), jak-du-t ‘deep’ (inside-attr-iv.sg) and loanwords: zor-t‰u-t ‘strong’, χas-du-t ‘special’, mašhur-t‰u-t ‘famous’. When used attributively, adjectives inflect for number and gender only, but when used independently they can also inflect for case. The non-agreeing adjectives do not make a formally or semantically coherent group. Pronouns employ prefixes and infixes (Set I) as exponents of agreement, but this is restricted in two ways: only the paradigm of the first person pronoun is involved, and within it, only a few cells of the paradigm are affected. Table 6.8 shows a partial paradigm of the first person pronoun and, for a comparison, the second person pronoun. All cases where agreement occurs are shown, but the paradigm is partial because there are several more cases in Archi which are not given here. Shaded cells show the agreeing forms. For those cells which agree, the genitive pronoun agrees with the head of its noun phrase or with the absolutive of the clause, if the genitive is governed by the verb; the dative and ergative agree with the absolutive of the clause. We give just the singular forms (following syncretism pattern A, the form for genders I/II plural is as gender III singular, and the form for genders III/IV plural is as gender IV singular). Note the instance of multiple exponence (prefix and infix) in the dative of the first person plural inclusive. Finally, verbs use prefixal and infixal exponents. The system is quite complex, with phonological and morphological factors determining the choice. The next three sections provide a more detailed description of verbal agreement and discussion of the factors involved.

7 Some adverbs have the possibility of adding different endings, but for these it is not easy to decide whether we are dealing with inflection or derivation.
8 Verbs are not included in this list since they do not allow ready predictions (see 6.5); for this reason, infix Set II, found only with verbs, is not in the list.


Table . Partial paradigm of first and second person pronouns

nen‹t’›u žwen nena‹w›(u) un nen nena‹r›u zari nena‹b›u nen‹t’›u etc. w-is ulu la‹w›u d-is d-olo la‹r›u b-is wit b-olo la‹b›u is etc. olo etc. la‹t’›u etc. wiš w-ez w-el w-ela‹w›(u) d-ez d-el d-ela‹r›u b-ez was b-el b-ela‹b›u ez etc. el etc. el‹t’›u etc. wež

≠‰uwa≠‰ula≠‰užwa≠‰u


6.5 Agreement marking in simple dynamic verbs

Archi verbs can be divided into dynamic verbs and stative verbs. To simplify the exposition, we postpone discussion of stative verbs until 6.6. Dynamic verbs are further divided into simple and complex. The distinguishing feature of a complex verb is that it consists of an uninflected part9 plus an inflected simple dynamic verb, so the discussion of agreement inflection involves the simple verbs only. Dynamic verbs have four aspectual stems (perfective, imperfective, finalis, and potential), and a form of the imperative which also serves as a stem and is often irregular. The verbal stems can be used independently or serve as the base for further inflectional forms such as participles, converbs, and various moods. The realization of the perfective, imperfective, finalis, and imperative is often irregular and it is not possible to postulate rules to produce one stem from another. Thus, these four forms are the principal parts of the Archi verbal

9 Some complex verbs have a first part derived from the simple verb class; these can take gender–number agreement.

paradigm. They all take agreement inflection.10 Example (6) shows the principal parts in all four genders and two numbers.

(6) Simple dynamic verb aχas ‘lie down’: aspectual stems and the imperative, infix Set II

              perfective   imperfective   finalis     imperative
  sg
    i         a‹w›χu       w-a‹r›χa-r     a‹w›χa-s    w-aχa
    ii        a‹r›χu       d-a‹r›χa-r     a‹r›χa-s    d-aχa
    iii       a‹b›χu       b-a‹r›χa-r     a‹b›χa-s    b-aχa
    iv        aχu          a‹r›χa-r       aχa-s       aχa
  pl
    i/ii      a‹b›χu       b-a‹r›χa-r     a‹b›χa-s    b-aχa
    iii/iv    aχu          a‹r›χa-r       aχa-s       aχa

For the verb aχas ‘lie down’ agreement is realized by prefixes and infixes (we discuss other possibilities later). It is important to note that the agreement exponents are not the only infixes employed in the paradigm: the imperfective here is realized by a combination of an infix ‹r› and a suffix -r (to be further discussed when we consider example (9)). The presence of the imperfective infix implies the presence of the suffix, but not the other way round: the imperfective can be realized by the suffix only. We can observe some regularities in the paradigm in (6): the perfective and finalis have infixes, the imperfective and imperative have prefixes. We will return to this regularity in 6.5.2. Until then, the discussion will focus on the perfective and imperfective stems. The four stems presented in (6) have an uneven frequency distribution: in terms of types, the number of forms produced from the perfective and the imperfective is higher than the number of forms produced from the finalis and the imperative (there are numerous converbs and moods based on the perfective and the imperfective). In terms of token frequency, the perfective outnumbers the other stems quite dramatically: for 1,525 tokens found in Kibrik’s texts (Kibrik et al. 2007), there are 1,037 forms of the perfective and its derivatives, and 138 forms of the imperfective and its derivatives (Chumakina 2011: 21–2). The perfective is also less complex morphologically: it is expressed by the stem only, whereas the imperfective, besides a distinct stem, can have a phonologically distinct exponent (a suffix), very often two (suffix and infix) and sometimes three (reduplication, suffix, and infix). The plural form of a verb is always syncretic with some form of the singular, so the plural forms will also be excluded from further discussion. When comparing different verbs, we will present the gender III and IV singular forms as they show the position of the agreement marker most clearly. In this way we avoid the complications of genders I and II: gender I singular can be realized as a labialization of the first consonant rather than prefix w-, and the gender II infix ‹r› is homonymous with the imperfective infix (see the discussion of (9)).

10 The potential stem is produced by adding the suffix -qi to the perfective stem. There are no irregularities in the formation, and the agreement marking is the same as it is in the perfective.


In the Archi grammar (Kibrik 1977b) there are 163 simple dynamic verbs. Of these, 144 have a morphological slot for agreement. Of the 144 inflecting verbs, there are two which deserve mentioning, namely bat‰eš‰as ‘fulfil’ and bije≠‰as ‘begin’. Morphologically, they are not simple, but complex verbs: in the imperfective, their second gender singular forms are batderš‰ar and bijder≠‰ar respectively, i.e. the prefixal form of the gender II marker d- is used, and not the infixal ‹r›. Synchronically, they are unanalysable, and were included in the list of simple verbs in Kibrik (1977b). We consider them complex verbs and do not include them in the count. Our count is therefore different from Kibrik’s: 161 simple dynamic verbs, of which 142 have a morphological slot for agreement. Therefore, we have a frequency-based expectation for the dynamic verbs to realize agreement overtly. In this section, we discuss the non-inflecting verbs, and then we establish the types of simple dynamic verbs according to the position of the agreement marker, before addressing the factors which determine the type to which a verb belongs. The first logical step in predicting the form of agreement is to establish whether the verb agrees at all. There are nineteen non-inflecting simple verbs in Archi, and the information on whether the verb agrees appears to be lexically specified. All the simple dynamic non-agreeing verbs are listed in (7), where we provide the form of the perfective stem.

(7) Non-inflecting simple dynamic verbs (perfective stem)
    abc’u ‘hew’        bo ‘speak’          dorq’u ‘moan’      o≠‰u ‘be silent’
    babχ‰»u ‘swell’    boq’»o ‘return’     dub≠‰u ‘sew’       sesu ‘roast’
    ba»k≠ni ‘press’    dab≠u ‘unlock’      dubqu ‘destroy’    χ»eχ»ni ‘ferment’
    barhu ‘babysit’    da»šni ‘cloud up’   gobχ‰u ‘scratch’   χwet‰u ‘swear’
    bec‰’u ‘be able’   dat‰i ‘clear up’    ja‰hu ‘winnow’

There is no apparent semantic homogeneity in this group. As for formal regularities, they can only be formulated as expectations which have partial coverage. First, there is an expectation for vowel–initial verbs to inflect: only one verb, o≠‰u ‘be silent’, out of nineteen non-inflecting verbs, is vowel–initial. Second, there is an expectation for verbs with initial b- to be non-inflecting: there are seven b-initial verbs in (7), and no b-initial simple inflecting verbs. In this list, there are six non-inflecting d-initial verbs, but for them no expectation can be formulated, as there are also four inflecting d-initial verbs. We might assume that the fact that b-initial verbs do not inflect is a consequence of b- being one of the gender–number markers. However, d- is also a gender–number marker, and we find both inflecting and non-inflecting d-initial verbs: there are six non-inflecting ones in (7) and four inflecting d-initial verbs.11

11 It might appear that the non-inflecting verbs end in a high or mid-high vowel. However, this is part of a more general pattern, irrespective of the inflecting/non-inflecting distinction. Inflecting verbs are distributed as follows: twenty-five end in -i, seventy-two in -u, twenty in -e, fourteen in -o, and eleven in -a.


The rest of the simple dynamic verbs have a morphological slot for agreement, which can be prefixal or infixal. Based on this, three types of morphological marking of verb agreement can be distinguished, and we present them in turn.

Prefixal: the agreement marker is a prefix in all forms:12

(8) acu ‘milk’
              perfective   imperfective   finalis    imperative
    iv.sg     acu          a‹r›ca-r       aca-s      aca
    iii.sg    b-acu        b-a‹r›ca-r     b-aca-s    b-aca

This is the simplest type for two reasons: first, all the principal parts have the same position for the agreement marker. Second, since this position is prefixal, as we observe in the forms for gender III, there is no conflict in the imperfective between the infixal imperfective marker ‹r› and the agreement marker.

Infixal: the agreement marker is an infix in all forms:

(9) caχu ‘throw’
              perfective   imperfective   finalis      imperative
    iv.sg     caχu         ca‹r›χa-r      caχa-s       caχa
    iii.sg    ca‹b›χu      ca‹b›χa-r      ca‹b›χa-s    ca‹b›χa

This type is slightly more complex than the prefixal one: although all forms have the same position for the agreement marker, this position is infixal, which creates competition with the imperfective. Compare the gender IV singular form carχar in the imperfective with the gender III singular: cabχar. The agreement marker of gender IV is zero; this leaves the infixal position ‘vacant’, and this infixal position is taken by the imperfective exponent ‹r›. In contrast, in gender III singular, all we see is the agreement marker ‹b›, which blocks the imperfective marker from appearing infixally. However, the imperfective still has an overt realization -r in the suffixal position.

Mixed: the agreement marker is an infix in the perfective and finalis, and a prefix in the imperfective and imperative:

(10) ak≠u ‘put through’
              perfective   imperfective   finalis      imperative
    iv.sg     ak≠u         a‹r›k≠a-r      ak≠a-s       ak≠a
    iii.sg    a‹b›k≠u      b-a‹r›k≠a-r    a‹b›k≠a-s    b-ak≠a

12 Here we show the forms of genders III and IV only to indicate the position of the agreement marker. There is no overt IV gender singular exponent. Recall that the potential stem is formed in a fully productive way and so it is not included here.


In one respect, this type is less complex than the infixal one in that there is no competition between markers in the imperfective. But it is more complex than both of these types in that there is no single position for the agreement marker in the paradigm. The mixed type demonstrates clearly that infixation in Archi is indeed ‘frivolous’, that is, it is not phonologically conditioned (Yu 2007: 41–2); see also Anderson (this volume) on the non-canonicity of infixation. The gender III form of the imperfective b-a‹r›k≠a-r shows that there is no phonological condition preventing gender–number being marked by a prefix. In this form, the gender–number marker takes prefixal position, the aspectual marker takes the infixal one. Similarly, the imperative b-ak≠a shows prefixal marking. However, in the perfective and finalis both positions are open for the gender–number marker (compare (8) where perfective and finalis have prefixal agreements). Yet all verbs which belong to the mixed type choose the infixal position in the perfective and finalis for the agreement exponent, in spite of the prefixal position being phonologically available. The distribution of verbs over these three types (prefixal, infixal, and mixed) is partly determined by the phonology. We discuss the phonological factors in the next section.
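To make the interaction concrete, the sketch below is our own illustration (not the authors’ analysis): it assembles schematic imperfectives from hypothetical stem halves. An overt agreement infix claims the infixal slot; otherwise (zero agreement, or a prefixal position) the imperfective infix ‹r› surfaces there, and the imperfective suffix -r appears in either case.

    # Position of the agreement marker by verb type and stem (mixed verbs split).
    AGR_POSITION = {
        "prefixal": {"perfective": "prefix", "imperfective": "prefix",
                     "finalis": "prefix", "imperative": "prefix"},
        "infixal":  {"perfective": "infix", "imperfective": "infix",
                     "finalis": "infix", "imperative": "infix"},
        "mixed":    {"perfective": "infix", "imperfective": "prefix",
                     "finalis": "infix", "imperative": "prefix"},
    }

    def imperfective_form(pre, post, verb_type, agr_prefix, agr_infix):
        """Assemble a schematic imperfective; `pre`/`post` are hypothetical stem halves."""
        position = AGR_POSITION[verb_type]["imperfective"]
        if position == "infix" and agr_infix:
            # An overt agreement infix occupies the slot; only the suffix -r remains.
            return pre + agr_infix + post + "-r"
        # Zero agreement (gender iv) or a prefixal slot: ‹r› surfaces infixally.
        return agr_prefix + pre + "‹r›" + post + "-r"

    # Gender iv sg (zero agreement) vs. gender iii sg (‹b›) for an infixal verb like (9):
    print(imperfective_form("ca", "χa", "infixal", "", ""))       # ca‹r›χa-r
    print(imperfective_form("ca", "χa", "infixal", "", "‹b›"))    # ca‹b›χa-r

The same assembly with a prefixal position reproduces the mixed-type gender III imperfective b-a‹r›k≠a-r in (10).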

6.5.1 Phonological factors determining the position of the agreement marker

Phonological shape can be a predictor for the verb’s type of agreement realization (prefixal, infixal, or mixed). We need to take the following into account:

• initial phoneme type (consonant or vowel)
• number of syllables
• place of stress
• individual phonemes

The major division is into consonant–initial and vowel–initial verbs, and we discuss the other factors within this broad division. Verbs which are vowel–initial are all bisyllabic with the stress on the first syllable. Consonant–initial verbs display greater variety in their phonological shape: they can be mono- or polysyllabic, with stress on either the initial or the second syllable. Vowel–initial verbs can belong to the prefixal or mixed type, consonant–initial verbs can be infixal or prefixal. In the following sections (6.5.1.1 through 6.5.1.5) we will discuss each shape in turn and define the type of agreement that it takes (prefixal, infixal or mixed). It is the consonant–initial verbs where the placement of the agreement marker is fully phonologically conditioned, so we will start with the discussion of these.


6.5.1.1 Consonant–initial polysyllabic verbs with the stress on the first syllable: infixes

If the verb is consonant–initial, polysyllabic, and the stress falls on the first syllable, then gender–number agreement is realized by infixes both in the imperfective and the perfective. In the imperfective the agreement marker replaces the imperfective infix everywhere except where agreement has no realization, as in gender IV singular, and there the imperfective infix ‹r› surfaces:

(11)            perfective                 imperfective
               iv.sg        iii.sg         iv.sg           iii.sg
    throw      cáχu         cá‹b›χu        cá‹r›χa-r       cá‹b›χa-r
    must       kwášu        kwá‹b›šu       kwá‹r›ša-r      kwá‹b›ša-r
    search     χ‰wá‰k’u     χ‰wá‹b›k’u     χ‰wá‹r›k’a-r    χ‰wá‹b›k’a-r

Out of 142 inflecting simple dynamic verbs there are twenty-two verbs that meet these phonological conditions, but only twenty of them realize agreement by an infix. There are two verbs which have prefixes:

(12)            perfective                       imperfective
               iv.sg       iii.sg                iv.sg                iii.sg
    frown      k≠’ók≠’u    bo-k≠’ók≠’u           k≠’ók≠’u-r           bo-k≠’ók≠’u-r
    get lost   q’»ák’a     ba-q‰’»á-ba-k’a       q’»é-k’e‹r›k’i-r     be-q‰’»é-be-k’e‹r›k’i-r

The verb k≠’ók≠’u ‘frown.pfv.iv.sg’ is formed by derivational reduplication (as opposed to the inflectional reduplication used in the production of the imperfective), and this is exceptional in the Archi verbal system. The verb q’»ák’a ‘get lost.pfv[iv.sg]’ looks like a complex verb consisting of two parts, q’»á and k’a. Double exponence of gender–number and the usage of a prefix in the second position (as in dá-q‰’»á-da-k’a ‘get lost.pfv.ii.sg’) suggest the independence of the parts, but other characteristics argue in favour of considering this to be one word. First, it has a single stress. Second, if it were a complex verb, then the fact that both parts take agreement inflection would indicate that both parts were verbal. If that were the case, they should both realize the imperfective, but while the second part k’e‹r›k’i-r does this by familiar means (reduplication, vowel change, a suffix, and an infix), the first part only changes the vowel, which would be a unique way of forming the imperfective in Archi. And finally, the two parts have no separate meanings. Thus the two verbs k≠’ók≠’u ‘frown’ and q’»ák’a ‘get lost’ do not follow the pattern of the main group.

6.5.1.2 Consonant–initial monosyllabic verbs: prefixes

If a verb consists of one syllable only, the agreement is realized by prefixes with an epenthetic vowel before the first consonant of the stem:


(13)             perfective               imperfective
                iv.sg      iii.sg         iv.sg      iii.sg
    melt        c’o        bo-c‰’ó13      c’a-r      ba-c‰’á-r
    lick        čo         bo-čó          ča-r       ba-čá-r
    die         k’a        ba-k’á         k’a-r      ba-k’á-r
    ignite      k’u        bu-k’ú         k’wa-r     bu-k’á-r
    slaughter   k≠’u       bu-k≠’ú        k≠’wa-r    bu-k≠’á-r
    divide      q’»o       bo-q‰’»ó       q’»a-r     ba-q‰’»á-r
    touch       s‰o        bo-s‰ó         s‰a-r      ba-s‰á-r
    get up      χ‰o        bo-χ‰ó         χ‰a-r      ba-χ‰á-r
    carry       χ»o        bo-χ»ó         χ»a-r      ba-χ»á-r

Note that except for the verb k’a ‘die’, all of these verbs have [o] or [u] in the perfective stem. The consonants show more variety, in terms of place and manner of articulation: there is no restriction on the initial consonant. The imperfective is realized by vowel change and the suffix -r. When the agreement prefix attaches, the stress remains on the stem, so the epenthetic vowel is pronounced as schwa, as there is no unstressed [a] or [o] in Archi. We follow the orthographic convention suggested by Kibrik et al. (1977: 351) and spell the pretonic vowel with the same character as the stressed vowel. The prefixal [u] in the third gender forms of ‘ignite’ and ‘slaughter’ is a reflex of a labialized consonant (seen in k’wa-r and k≠’wa-r respectively). An interesting variation of this phonological shape is the verbs which are monosyllabic only in the perfective, and reduplicated in the imperfective. They, too, realize agreement by prefixes:

(14)         perfective            imperfective
            iv.sg    iii.sg        iv.sg          iii.sg
    rot     ša»      ba»-šá»       šé»‹r›ši-r     be»-šé»‹r›ši-r14
    win     χa       ba-χá         χé‹r›χi-r      be-χé‹r›χi-r

We should ask why there is no agreement infix in the imperfective, i.e. why it is be»-šé»‹r›šir, not *šé»‹b›šir. If the rules were entirely phonological, then šé»‹r›ši-r in (14) should have had the infixal agreement marker, just as cá‹b›χa-r in (9). This is because it is a consonant–initial bisyllabic stem with stress on the first syllable, i.e. it has the phonological properties which favour infixation. Instead of this, it maintains

13 We show the position of the stress only where it is relevant to the discussion.
14 The vowel after b- in perfective and imperfective is the same vowel (some kind of schwa); the apparent difference here is an artefact of the spelling convention.

the morphological properties of the base lexeme. In fact, we do not find an instance where the agreement is infixal in the imperfective and prefixal in the perfective anywhere in the verbal system of Archi. The perfective is morphologically simpler than the imperfective and therefore it is not surprising that the more morphologically complex (infixal) form for gender–number occurs in just the perfective, or in both aspectual stems, but never in the imperfective alone. Thus, in order to predict the type (prefixal, infixal, or mixed), it is enough to describe the phonological shape of the perfective. Morphology is therefore playing a role here because these shapes describe the perfective stems, but the result is valid for the whole paradigm, indicating that this is a property associated with the lexeme.15 This regularity is complete: out of 144 inflected simple dynamic verbs, there are twenty-one consonant–initial verbs which have a monosyllabic perfective, and they are all of the prefixal type.

6.5.1.3 Consonant–initial polysyllabic verbs with the stress on the second syllable: prefixes

If the verb is consonant–initial and polysyllabic, and the stress falls on the second syllable, the agreement is realized by a prefix with an epenthetic vowel (15). The verbs of this shape are underlyingly monosyllabic (in the perfective) so the realization of agreement is unsurprisingly the same as that of the monosyllabic verbs discussed previously.

(15)             perfective              imperfective
                iv.sg       iii.sg       iv.sg          iii.sg
    sift        c’enné      be-c’né      c’émc‰’in      be-c‰’émc‰’in
    press       č’e»nné»    be»-č’né»    č’a»n          ba»-č‰’á»n
    reconcile   q’oc’ó      bo-q’c’ó     q’ac’á-r       ba-q’c’á-r
    eat         kunné       bu-kné       kwan           bu-kán
    pull        k≠enné      be-k≠né      k≠an           ba-k≠án
    tickle      ≠oró≠ni     bo-≠ró≠ni    ≠oró≠in        bo-≠ró≠in

The forms with overt agreement (iii.sg here) look very much like the forms we saw in (13): there is an epenthetic vowel after the agreement marker, and the form is bisyllabic with the stress on the second syllable. The first vowel in the gender IV singular form, as in c’enné ‘sift’, is an epenthetic vowel, and the gender III singular be-c’né has the same syllabic structure as the gender III singular of ‘lick’ (bo-čó).

15 Note, however, that the perfective only predicts the verb behaviour in terms of gender–number marking. The form of the imperfective itself cannot be predicted by the perfective.


It is the shape of the perfective which predicts that the verb belongs to the prefixal type: all the perfectives of this type are bisyllabic with the stress on the second syllable (underlyingly monosyllabic). The imperfective can be monosyllabic (as in č’a»n ‘press.ipfv[iv.sg]’), reduplicated (c’émc‰’in ‘sift.ipfv[iv.sg]’), or bisyllabic (q’ac’ár ‘reconcile.ipfv[iv.sg]’). But all these verbs realize agreement in the same way, by a prefix. There are twenty-six such verbs.

6.5.1.4 Vowel–initial verbs

Vowel–initial (inflecting) verbs show a different picture. They number seventy-three and, except for the single verb as ‘do’, they are all bisyllabic with the stress on the first syllable. The stress does not move, so we do not indicate it here. Thirty-four vowel–initial verbs belong to the prefixal type, as in (16):

(16)          perfective            imperfective
             iv.sg      iii.sg      iv.sg         iii.sg
    soften   a‰q’»u     b-a‰q’»u    a‹r›q’»u-r    b-a‹r›q’»u-r
    milk     acu        b-acu       a‹r›ca-r      b-a‹r›ca-r
    see      ak‰u       b-ak‰u      ak‰u-r        b-ak‰u-r
    leave    akdi       b-akdi      a‹r›k‰i-r     b-a‹r›k‰i-r
    bite     eq‰’u      b-eq‰’u     e‹r›q‰’u-r    b-e‹r›q’u-r
    pour     eχu        b-eχu       e‹r›χu-r      b-e‹r›χu-r

Thirty-nine vowel–initial verbs belong to the mixed type, that is, they have infixes in the perfective (and finalis), and prefixes in the imperfective (and the imperative) as illustrated in (17):

(17)          perfective            imperfective
             iv.sg      iii.sg      iv.sg        iii.sg
    break    aq»u       a‹b›q»u     a‹r›q»a-r    b-arq»a-r
    do       aču        a‹b›ču      a‹r›ča-r     b-arča-r
    break    ak’u       a‹b›k’u     a‹r›k’u-r    b-ark’u-r
    stay     eχ‰u       e‹b›χ‰u     e‹r›χ‰u-r    b-erχ‰u-r
    rain     eχdi       e‹b›χdi     e‹r›χi-r     b-erχi-r

There seems to be no apparent difference in the phonological shape of verbs which belong to the different types: the syllabic structure and the stress placement are the same, and both prefixal and mixed verbs have affricates, fortis consonants, pharyngealized and ejective consonants. Table 6.9 compares verbs of similar phonological shape which have different types of placement of the agreement marker.


Table . Verbs of similar phonology, but different patterns of realizing agree- ment perfective imperfective gloss consonant type iv.sg iii.sg iv.sg iii.sg

soil prefixal aχ‰ub-aχ‰u a‹r›χ‰u-r b-a‹r›χ‰u-r fortis stay mixed eχ‰u e‹b›χ‰u e‹r›χ‰u-r b-e‹r›χ‰u-r break prefixal aq»u a‹b›q»u a‹r›q»a-r b-a‹r›q»a-r pharyngealized » » » » switch off mixed aχ u a‹r›χ u a‹r›χ u-r b-a‹r›χ u-r break ejective mixed ak’u a‹b›k’u a‹r›k’u-r b-a‹r›k’u-r wake up affricate mixed≠ ek u e‹b›k≠u e‹r›k≠u-r b-e‹r›k≠u-r sweep prefixal ek≠’u b-ek≠’u e‹r›k≠’u-r b-e‹r›k≠’u-r ejective affricate » » » » sober up mixed o č’ni o ‹b›č’ni o ‹r›č’in b-o ‹r›č’in

A phonological regularity can be observed for vowel–initial verbs, but it concerns only a minority of them: verbs which have [r] in the perfective stem in combination with some other consonant are prefixal. Some examples:

(18)             perfective                imperfective
                iv.sg        iii.sg        iv.sg         iii.sg
    hide        erq’w»ni     b-erq’w»ni    erq’w»in      b-erq’w»in
    scoop       erχ‰»u       b-erχ‰»u      erχ‰»u-r      b-erχ‰»u-r
    cool down   o»rču        b-o»rču       o»rču-r       b-o»rču-r
    dunk        ors‰u        b-ors‰u       ors‰u-r       b-ors‰u-r

There are twelve such verbs. For the rest of the vowel–initial verbs, the placement of the agreement marker is lexically specified.

6.5.1.5 Summary

The analysis of the verbs according to their phonological shape and the number of instances is brought together in Figure 6.1. Recall that the phonological characteristics of the verb concern its perfective stem only, but the result (type membership) is for the whole paradigm, hence there is a morphological factor in play here as well as a phonological one. Figure 6.1 should be read as follows: there are sixty-nine consonant–initial simple dynamic verbs out of which there are twenty-one monosyllabic and forty-eight polysyllabic verbs.16 The polysyllabic verbs are divided according to the stress placement: twenty-two verbs have the stress on the first syllable and they are infixal. Twenty-six polysyllabic verbs have the stress on the second syllable and they are all prefixal. And so on.

16 Recall that the vowel–initial verbs are all bisyllabic.


[Figure 6.1 Types of simple dynamic verbs according to the phonological shape of their perfective stem (total 142): consonant-initial (69) divides into monosyllabic (21, prefixal) and polysyllabic (48); the polysyllabic verbs divide into first-syllable stress (22: 20 infixal, 2 prefixal) and second-syllable stress (26, prefixal); vowel-initial (73) divides into stems containing [r+C(C)] (12, prefixal) and other stems (61: 22 prefixal, 39 mixed).]

This accounts for 142 inflecting verbs. We can draw several conclusions. First, in absolute terms, there is a tendency for Archi simple dynamic verbs to choose prefixal marking: eighty-three in total out of 142. (To get to this number, one needs to add all numbers above the arrows pointing to the prefixal type: 2+26+21+12+22.) Second, for just over a half of the verbs (eighty-one out of 142) the placement of the agreement marker is regulated by phonological factors (all arrows except the lower two, shown in bold, define the phonological shape of the verb which predicts the realization of the agreement); these phonological factors predict prefixes more frequently than infixes (sixty-one against twenty). And third, there are sixty-one verbs remaining for which the placement of the marker is lexical information (two bold arrows). Of these sixty-one verbs, thirty-nine are maximally complex: besides requiring lexical specification of the realization of agreement markers, they show a split in the paradigm (some stems take infixes, some take prefixes), and this in turn indicates that they show frivolous infixation. In this respect, Archi is more complex than some other Daghestanian languages. For instance, in Tsez phonology is the decisive factor, and only vowel–initial verbs agree (Polinsky and Potsdam 2001: 586). Figure 6.1 shows clearly the status of the phonological factors: on the left we have these factors and the number of verbs to which they apply; on the right we have decision points with no factor attached, indicating that we have not been able to find an operative phonological factor here.

6.5.2 Morphological factors

So far, we have shown the formation and the agreement position of two stems only, perfective and imperfective. We should consider the rest of the paradigm. As we

noted earlier, the only other two forms which we need to consider are the finalis and imperative, given that the rest of the very large paradigm can be predicted from them. For the placement of the agreement marker, we need to discuss the mixed type only. For this, there are the following paradigm regularities: in the finalis, the gender–number markers are placed as they are in the perfective, while the imperative has the same gender–number marking as the imperfective.17 This is shown in (19):

(19)                        perfective               imperfective
    gloss                   iv.sg       iii.sg       iv.sg        iii.sg
    ‘die out’ (of flame)    aχ»u        a‹b›χ»u      a‹r›χ»u-r    b-a‹r›χ»u-r
    ‘wake up’               ek≠u        e‹b›k≠u      e‹r›k≠u-r    b-e‹r›k≠u-r

                            finalis                  imperative
                            iv.sg       iii.sg       iv.sg        iii.sg
    ‘die out’ (of flame)    aχ»a-s      a‹b›χ»a-s    aχ»a         b-aχ»a
    ‘wake up’               ek≠a-s      e‹b›k≠a-s    ek≠a         b-ek≠a

However, this paradigmatic regularity holds only if the phonological shape of the finalis stem allows it. For four verbs of the mixed type this is not the case: their finalis is irregular and has a phonological shape that requires prefixes (rather than the expected infixes):

(20)              perfective           imperfective              finalis               imperative
    gloss         iv.sg     iii.sg     iv.sg        iii.sg       iv.sg     iii.sg      iv.sg      iii.sg
    become        édi       é‹b›di     ke-r         be-ké-r      ke-s      be-ké-s     ká         ba-ká
    go            óq»a      ó‹b›q»a    ó‹r›q»i-r    b-ó‹r›q»i-r  q»e-s     be-q»é-s    óq»a       b-óq»a
    take away     óχ‰a      ó‹b›χ‰a    ó‹r›χ‰i-r    b-ó‹r›χ‰i-r  χ‰e-s     be-χ‰é-s    χ‰éχ‰a     be-χ‰éχ‰a
    take along    óka       ó‹b›ka     ó‹r›ki-r     b-ó‹r›ki-r   kará-s    ba-krá-s    karáka     ba-kráka

In the first three verbs, édi ‘become’, óq»a ‘go’, and óχ‰a ‘take away’, the finalis stem is consonant–initial and monosyllabic. As we saw in the discussion of the perfectives, forms shaped this way realize agreement by prefixes with an epenthetic vowel, and the finalis here follows this rule. The finalis stem of the fourth verb, óka ‘take along’, is bisyllabic with the stress on the second syllable, and such stems, as we saw in 6.5.1.3, also take prefixes with an epenthetic vowel. Thus the morphological factor regulating the shape of the paradigm holds only if the realization of gender–number does not violate the phonological rules.

17 This regularity applies trivially across the rest of the system, since prefixal verbs have prefixes throughout and infixal verbs have infixes throughout.


6.5.3 Conclusions for simple dynamic verbs

We have exemplified the phonological factors that have an impact on gender–number marking in simple dynamic verbs. We have also demonstrated that the phonological shape of the perfective determines the behaviour of the rest of the paradigm. There is one mixed type possible, and this rests on an implicational relation of stems in the paradigm, which shows that morphological factors have a role. A point of general interest is that the Archi system does indeed have frivolous infixing, that is, infixing which is not determined by phonological factors. This is clear from the instances where infixal agreement occurs even though a viable prefixal slot is available. We showed that there are verbs (sixty in total) where phonological factors allow both infixes and prefixes. These are vowel–initial verbs (there are seventy-five of them altogether, but fifteen of them have [r] in the stem, and infixes are not possible in this environment). Out of sixty verbs which, in principle, could belong to the prefixal or to the mixed type, forty belong to the mixed type, that is, they have infixes in the perfective and finalis where there is no apparent reason for them not to have prefixal marking. Such verbs also demonstrate that phonological and morphological factors are not the whole story. For a significant number of verbs, the placement of the agreement marker is lexically specified, and the majority of such verbs belong to the more complex mixed type, which involves a split in the paradigm with frivolous infixation in a part of it. Our discussion in this section has covered the dynamic verbs. We now turn to the contrasting situation found in the stative verbs.

6.6 Stative verbs

The most salient and clearest characteristic of stative verbs is that they lack the multiple stems of the dynamic verbs. In fact, they have only one stem, the imperfective, which can be used as an independent predicate:

(21) ja-r          laha-s‰-u                  han                 sini?
     this-ii.sg    girl(ii).sg.obl-dat-and    what(iv)[sg.abs]    know
     ‘What then does this girl know?’18

Stative verbs also have all the usual forms derived from the imperfective stem, such as participles, converbs, and adverbial nouns, for example: sin-ši ‘knowing’ (converb), sini-t‰u-t ‘known’ (participle), sini-kul ‘knowledge’ (adverbial noun). The dynamic–stative distinction corresponds largely to a semantic distinction. Stative verbs denote a state rather than an action, for example: do‰»z ‘be big’, aχ ‘be far’, hiba ‘be good’, ja»t’an

18 The example is sentence  from text  of the online collection of Archi texts (Kibrik et al. 2007). In context the force is: ‘How would she know she had been lied to?’ The experiencer stands in the dative and the perceived entity in the absolutive.


‘be red’. However, sometimes the semantic difference is less clear; compare the stative verbs k≠’an ‘love, want’, sini ‘know, know how’, and dynamic arhas ‘think, worry’. We might expect that stative verbs would take agreement morphology like dynamic verbs, to the extent that is possible given their single stem. However, unlike dynamic verbs, stative verbs are mostly non-inflecting: out of 190 simple stative verbs in the current version of the Archi dictionary (Chumakina et al. 2007) only seven inflect. This is the full list of inflecting stative verbs:

(22) gloss             iv.sg      iii.sg
     ache, hurt        ác‰’ar     b-ác‰’ar
     be enough         aχ»        b-aχ»
     be tasty          ic’        b-ic’
     be hungry         íq‰’»a     b-íq‰’»a
     be heavy          iq’w»      b-iq’w»
     be wide           q’wa       b-uq’a
     be better         χáli       b-aχáli

For agreement inflection, stative verbs use prefixes only. In all instances but one this is in accordance with the rules for dynamic verbs. That is, although we do not find as many stative verbs realizing agreement as we would expect if they followed the rules for dynamic verbs, when they do show agreement morphology, it is normally according to the same rules as for dynamic verbs. Five out of seven verbs are vowel–initial and therefore allow prefixes, though in principle they could have infixes as well. The verb q’wa ‘be wide’ is a consonant–initial monosyllabic and takes the agreement prefix with an epenthetic vowel (realized as [u] due to the root consonant labialization) just like dynamic verbs do. The verb χáli ‘be better’ behaves differently. It is a bisyllabic verb with initial stress. If it were a dynamic verb, we would expect an infix here, like cáχas ‘throw’ whose gender III singular is cá‹b›χas. However, χáli takes a prefix. The agreement behaviour of stative verbs, therefore, confirms their separate status. The fact that a verb belongs to the class of statives, which is morphologically determined by the fact of having only an imperfective stem, provides a default prediction that it will not agree. For seven verbs the fact that they do agree is lexically specified information (compare the dynamic verbs where 142 out of 161 agree, and for most of the non-agreeing dynamic verbs this is lexical information). When stative verbs do show agreement, it is generally in line with the factors operating for dynamic verbs. However, just one verb takes a prefix where an infix would be expected. The effect is that those stative verbs which agree (the minority) all take prefixes.


6.7 Conclusions

We have focused on a small part of the extensive paradigms of Archi, the realization of gender–number agreement. We have shown that the behaviour of a given lexical item can be predicted in part on the basis of information as to its part of speech: thus, for instance, most adverbs do not agree, but those that do take infixes of Set I and so their syncretisms follow Pattern A. We saw the important role of phonological factors, particularly in our discussion of verbs. These factors had to be complemented by morphological predictions: within the verbs there are opposing predictions for dynamic verbs (by default these agree) and stative verbs (by default these do not). Moreover, there is a hierarchical relation of stems, in that the perfective determines the type of the paradigm, and the mixed type of verb (with prefixes and infixes) involves just two pairings of stems. However, even allowing for all of these different and overlapping predictive factors, our analysis allows for a significant residue. It is important to recall that many verbs belong to a complex type, with frivolous infixation in a part of the paradigm. Thus within this small area of the system, the gender–number paradigm, we have found considerable complexity, since for all the different factors we demonstrated, we still had to appeal to lexical specification for a significant proportion of the lexical items.

Acknowledgements The support of the AHRC (grant AH/I027193/1 From competing theories to fieldwork: the challenge of an extreme agreement system) and of the ERC (grant ERC-2008-AdG-230268 MORPHOLOGY) is gratefully acknowledged. Versions of the chapter were presented at the Conference on Morphological Complexity in London (January 2012) and at the 15th International Morphology Meeting in Vienna (February 2012); we thank both audiences for their helpful comments. We are also grateful to John Harris and Erich Round for useful advice and comments, and to Lisa Mack for help in preparing the chapter.

Part III

Measuring Complexity

Contrasting modes of representation for inflectional systems: Some implications for computing morphological complexity

GREGORY STUMP AND RAPHAEL A. FINKEL

. Introduction
Recent research has drawn attention to the possibility of measuring an inflection-class system’s complexity by means of computations based on a formal representation of that system (see, for example, Moscoso del Prado Martín et al. 2004, Ackerman et al. 2009, Finkel and Stump 2009, Milin et al. 2009). Proposed computations include both information-theoretic measures (the conditional entropy of cells and of paradigms) and set-theoretic measures (number of principal parts, cell predictiveness, the predictability of a cell’s word form and of a lexeme’s inflection-class membership). Some of this research has neglected to take account of how critically these computations depend on the manner in which inflection-class systems are represented. The same system may be represented in different ways, each giving rise to a different set of results. In our work, we represent a language’s inflectional system as a plat1: a table with morphosyntactic property sets (MPSs) on the horizontal axis, inflection classes (ICs) on the vertical axis, and the exponence of a particular property set in a particular inflection class in the appropriate column and row (Stump and Finkel 2013). As we show, the construction of a plat for a given language entails a series of choices: What are the MPSs for which lexemes inflect in that language? What are its ICs? And, perhaps most importantly, how are its inflectional exponents to be represented?

1 In American English, a plat is ‘a plan, map, or chart of a piece of land with actual or proposed features (as lots)’ [Merriam-Webster online].


By way of illustration, consider the forty English verbs in (1). A plat representing the inflectional differences among these verbs minimally includes the MPSs2 in (2). We assume a variety of English in which the verbs in (1) have the forms represented orthographically in Table 7.1; but because the characteristics of English orthography introduce issues that are orthogonal to our present concerns, we focus our attention on phonological and morphophonological representations of a verb’s exponence. The inflectional differences among the verbs in (1) are always situated in or after the stem’s syllable nucleus; for this reason, the inflectional exponents listed in our plats consist of a verb stem’s rime and its inflectional suffix (if there is one). In this mode of representation, each of the verbs in (1) inflects differently; we accordingly use each verb as the name of the inflection class to which it belongs. Thus, the plats that we discuss here have five columns (one for each of the property sets in (2)) and forty rows (one for each verb in (1)).

(1) band   build   do      flee   hide   load   pass   ride   send   stand
    bite   buy     draw    fly    last   lose   pay    say    shop   sting
    bring  cast    feel    hang   lean   make   peel   see    sing   teach
    budge  choose  fight   have   light  mean   read   seek   slide  write

(2) {infinitive / subjunctive / default present}   e.g. to sing / that she sing / we sing
    {3sg present indicative}                       he sings
    {past / irrealis}                              she sang / if he sang tomorrow
    {present participle}                           singing
    {past participle}                              sung

We consider two possible plats for the verbs in (1). In the first of these, given in Table 7.2, we represent a verb form’s exponence (its stem’s rime and—if one is present—its inflectional suffix) in phonemic transcription. (Throughout, transcriptions depict a standard American English pronunciation.) In the second, given in Table 7.3, we use morphophonemic transcription augmented by morph boundaries and indices that identify aspects of a word’s morphological structure; these include the indices in Table 7.4. It is clear that these plats represent different things. The plat in Table 7.2 represents differences among the forms in Table 7.1 that are directly accessible to auditory perception (specifically, those differences that are phonemically contrastive); we therefore call the first plat a hearer-oriented (H-O) plat. In this plat, the past participles of cast and pass are alike (both have the exponence /æst/), as are those of mean and send (with exponence /ɛnt/). The plat in Table 7.3, by contrast, represents a fluent English speaker’s

2 If be were among these verbs, we would have to assume a somewhat more elaborate system of morphosyntactic contrasts, since be has a first-person singular present indicative form (am); its infinitive/subjunctive form be is distinct from its default present form are; it exhibits a special singular past indicative form (was); and it exhibits a number of negatively inflected forms (aren’t, ain’t, isn’t, wasn’t, weren’t).


Table . Orthographic forms of the verbs in (1)

Lexeme   {inf}    {3sgPresInd}   {past}    {presPple}   {pastPple}
band     band     bands          banded    banding      banded
bite     bite     bites          bit       biting       bitten
bring    bring    brings         brought   bringing     brought
budge    budge    budges         budged    budging      budged
build    build    builds         built     building     built
buy      buy      buys           bought    buying       bought
cast     cast     casts          cast      casting      cast
choose   choose   chooses        chose     choosing     chosen
do       do       does           did       doing        done
draw     draw     draws          drew      drawing      drawn
feel     feel     feels          felt      feeling      felt
fight    fight    fights         fought    fighting     fought
flee     flee     flees          fled      fleeing      fled
fly      fly      flies          flew      flying       flown
hang     hang     hangs          hung      hanging      hung
have     have     has            had       having       had
hide     hide     hides          hid       hiding       hidden
last     last     lasts          lasted    lasting      lasted
lean     lean     leans          leaned    leaning      leaned
light    light    lights         lit       lighting     lit
load     load     loads          loaded    loading      loaded
lose     lose     loses          lost      losing       lost
make     make     makes          made      making       made
mean     mean     means          meant     meaning      meant
pass     pass     passes         passed    passing      passed
pay      pay      pays           paid      paying       paid
peel     peel     peels          peeled    peeling      peeled
read     read     reads          read      reading      read
ride     ride     rides          rode      riding       ridden
say      say      says           said      saying       said
see      see      sees           saw       seeing       seen
seek     seek     seeks          sought    seeking      sought
send     send     sends          sent      sending      sent
shop     shop     shops          shopped   shopping     shopped
sing     sing     sings          sang      singing      sung
slide    slide    slides         slid      sliding      slid
stand    stand    stands         stood     standing     stood
sting    sting    stings         stung     stinging     stung
teach    teach    teaches        taught    teaching     taught
write    write    writes         wrote     writing      written


Table . Hearer-oriented plat for the verbs in (1)

Lexeme {inf} {3sgPresInd} {past} {presPple} {pastPple}

band     ænd    ændz    ændəd    ændɪŋ    ændəd
bite     aɪt    aɪts    ɪt       aɪtɪŋ    ɪtn
bring    ɪŋ     ɪŋz     ɔt       ɪŋɪŋ     ɔt
budge    ʌdʒ    ʌdʒəz   ʌdʒd     ʌdʒɪŋ    ʌdʒd
build    ɪld    ɪldz    ɪlt      ɪldɪŋ    ɪlt
buy      aɪ     aɪz     ɔt       aɪɪŋ     ɔt
cast     æst    æsts    æst      æstɪŋ    æst
choose   uz     uzəz    oʊz      uzɪŋ     oʊzn
do       u      ʌz      ɪd       uɪŋ      ʌn
draw     ɔ      ɔz      u        ɔɪŋ      ɔn
feel     il     ilz     ɛlt      ilɪŋ     ɛlt
fight    aɪt    aɪts    ɔt       aɪtɪŋ    ɔt
flee     i      iz      ɛd       iɪŋ      ɛd
fly      aɪ     aɪz     u        aɪɪŋ     oʊn
hang     æŋ     æŋz     ʌŋ       æŋɪŋ     ʌŋ
have     æv     æz      æd       ævɪŋ     æd
hide     aɪd    aɪdz    ɪd       aɪdɪŋ    ɪdn
last     æst    æsts    æstəd    æstɪŋ    æstəd
lean     in     inz     ind      inɪŋ     ind
light    aɪt    aɪts    ɪt       aɪtɪŋ    ɪt
load     oʊd    oʊdz    oʊdəd    oʊdɪŋ    oʊdəd
lose     uz     uzəz    ɔst      uzɪŋ     ɔst
make     eɪk    eɪks    eɪd      eɪkɪŋ    eɪd
mean     in     inz     ɛnt      inɪŋ     ɛnt
pass     æs     æsəz    æst      æsɪŋ     æst
pay      eɪ     eɪz     eɪd      eɪɪŋ     eɪd
peel     il     ilz     ild      ilɪŋ     ild
read     id     idz     ɛd       idɪŋ     ɛd
ride     aɪd    aɪdz    oʊd      aɪdɪŋ    ɪdn
say      eɪ     ɛz      ɛd       eɪɪŋ     ɛd
see      i      iz      ɔ        iɪŋ      in
seek     ik     iks     ɔt       ikɪŋ     ɔt
send     ɛnd    ɛndz    ɛnt      ɛndɪŋ    ɛnt
shop     ɑp     ɑps     ɑpt      ɑpɪŋ     ɑpt
sing     ɪŋ     ɪŋz     æŋ       ɪŋɪŋ     ʌŋ
slide    aɪd    aɪdz    ɪd       aɪdɪŋ    ɪd
stand    ænd    ændz    ʊd       ændɪŋ    ʊd
sting    ɪŋ     ɪŋz     ʌŋ       ɪŋɪŋ     ʌŋ
teach    itʃ    itʃəz   ɔt       itʃɪŋ    ɔt
write    aɪt    aɪts    oʊt      aɪtɪŋ    ɪtn


Table . Speaker-oriented plat for the verbs in (1)

Lexeme   {inf}   {3sgPresInd}          {past}                 {presPple}   {pastPple}
band     ænd     ænd-z                 ænd-d                  ænd-ɪŋ       ænd-d
bite     aɪt     aɪt-z                 lax(ɪt)                aɪt-ɪŋ       lax(ɪt)-n
bring    ɪŋ      ɪŋ-z                  rime(ɔt)               ɪŋ-ɪŋ        rime(ɔt)
budge    ʌdʒ     ʌdʒ-z                 ʌdʒ-d                  ʌdʒ-ɪŋ       ʌdʒ-d
build    ɪld     ɪld-z                 ɪld-t                  ɪld-ɪŋ       ɪld-t
buy      aɪ      aɪ-z                  rime(ɔt)               aɪ-ɪŋ        rime(ɔt)
cast     æst     æst-z                 æst                    æst-ɪŋ       æst
choose   uz      uz-z                  ablaut(oʊz)            uz-ɪŋ        ablaut(oʊz)-n
do       u       ablaut(ʌ)-z           ablaut(ɪ)-d            u-ɪŋ         ablaut(ʌ)-n
draw     ɔ       ɔ-z                   ablaut(u)              ɔ-ɪŋ         ɔ-n
feel     il      il-z                  lax(ɛl)-t              il-ɪŋ        lax(ɛl)-t
fight    aɪt     aɪt-z                 rime(ɔt)               aɪt-ɪŋ       rime(ɔt)
flee     i       i-z                   lax(ɛ)-d               i-ɪŋ         lax(ɛ)-d
fly      aɪ      aɪ-z                  ablaut(u)              aɪ-ɪŋ        ablaut(oʊ)-n
hang     æŋ      æŋ-z                  ablaut(ʌŋ)             æŋ-ɪŋ        ablaut(ʌŋ)
have     æv      subtractCoda(æ)-z     subtractCoda(æ)-d      æv-ɪŋ        subtractCoda(æ)-d
hide     aɪd     aɪd-z                 lax(ɪd)                aɪd-ɪŋ       lax(ɪd)-n
last     æst     æst-z                 æst-d                  æst-ɪŋ       æst-d
lean     in      in-z                  in-d                   in-ɪŋ        in-d
light    aɪt     aɪt-z                 lax(ɪt)                aɪt-ɪŋ       lax(ɪt)
load     oʊd     oʊd-z                 oʊd-d                  oʊd-ɪŋ       oʊd-d
lose     uz      uz-z                  lax(ɔz)-t              uz-ɪŋ        lax(ɛz)-t
make     eɪk     eɪk-z                 subtractCoda(eɪ)-d     eɪk-ɪŋ       subtractCoda(eɪ)-d
mean     in      in-z                  lax(ɛn)-t              in-ɪŋ        lax(ɛn)-t
pass     æs      æs-z                  æs-d                   æs-ɪŋ        æs-d
pay      eɪ      eɪ-z                  eɪ-d                   eɪ-ɪŋ        eɪ-d
peel     il      il-z                  il-d                   il-ɪŋ        il-d
read     id      id-z                  lax(ɛd)                id-ɪŋ        lax(ɪd)
ride     aɪd     aɪd-z                 ablaut(oʊd)            aɪd-ɪŋ       ablaut(ɪd)-n
say      eɪ      lax(ɛ)-z              lax(ɛ)-d               eɪ-ɪŋ        lax(ɛ)-d
see      i       i-z                   ablaut(ɔ)              i-ɪŋ         i-n
seek     ik      ik-z                  rime(ɔt)               ik-ɪŋ        rime(ɔt)
send     ɛnd     ɛnd-z                 ɛnd-t                  ɛnd-ɪŋ       ɛnd-t
shop     ɑp      ɑp-z                  ɑp-d                   ɑp-ɪŋ        ɑp-d
sing     ɪŋ      ɪŋ-z                  ablaut(æŋ)             ɪŋ-ɪŋ        ablaut(ʌŋ)
slide    aɪd     aɪd-z                 lax(ɪd)                aɪd-ɪŋ       lax(ɪd)
stand    ænd     ænd-z                 rime(ʊd)               ænd-ɪŋ       rime(ʊd)
sting    ɪŋ      ɪŋ-z                  ablaut(ʌŋ)             ɪŋ-ɪŋ        ablaut(ʌŋ)
teach    itʃ     itʃ-z                 rime(ɔt)               itʃ-ɪŋ       rime(ɔt)
write    aɪt     aɪt-z                 ablaut(oʊt)            aɪt-ɪŋ       ablaut(ɪt)-n


Table . Morphological indices employed in the speaker-oriented plat

Notation          Significance
ablaut(x)         the vowel in x is an ablaut alternant of the stem vowel
lax(x)            the vowel in x is the stem vowel’s lax morphophonological alternant
rime(x)           x stands in place of the stem’s rime
subtractCoda(x)   the stem’s coda is absent from x

awareness of each word form’s morphology; we therefore call it a speaker-oriented (S-O) plat. In this plat, neither the past participles of cast and pass nor those of mean and send are alike in their morphology: cast has the past participle exponence /æst/, pass has /æs-d/, mean has lax(ɛn)-t, and send has /ɛnd-t/.
A clarification is in order here. Our apparently dichotomous terminology (‘hearer-oriented’ vs ‘speaker-oriented’) might be taken to suggest that only two sorts of plats are possible for a given inflection-class system. Such is certainly not the case. The H-O plat in Table 7.2 and the S-O plat in Table 7.3 belong to a continuum of imaginable plats for the fragment of English verb morphology in Table 7.1. For instance, one can construct a plat in which morph boundaries are represented but in which no information about nonconcatenative morphology is represented; in such a plat, the exponence of ridden might be represented as /ɪd-n/ rather than as /ɪdn/ (Table 7.2), or as ‘ablaut(ɪd)-n’ (Table 7.3). Such a plat would be more speaker-oriented than the plat in Table 7.2, but more hearer-oriented than the plat in Table 7.3. Our S-O plat is rather deeply speaker-oriented, since it encodes both syntagmatic morphological information (morph boundaries) and paradigmatic morphological information (nonconcatenative deviations from a lexeme’s default stem form); exactly how much morphological information is included in a plat is naturally a matter of choice. The same is true of phonological information; in our H-O plat, we have not, for example, represented stress or syllable structure, but we could have, e.g. [trochee [σ æn] [σ dəd]]. Our two plats represent affixal alternations and stem alternations together, but one could also profitably construct a plat based purely on the affixal differences among the forms in Table 7.1 and compare its properties with those of a different plat based purely on the patterns of stem alternation exhibited by these forms. In short, we do not mean to imply that we regard the inflection-class system embodied by the verbs in (1) as being reducible to exactly the two plats in Tables 7.2 and 7.3; nor are we proffering either of these two plats as ‘the right analysis’ of the English inflection-class system. Rather, we are using the differences between these two plats to make a methodological point relevant to any analysis of this system. Thus, our H-O and S-O plats are only two of the possible plats that we might have brought forward for discussion. Here, we are less interested in motivating the choice of any particular approach to plat construction than in showing how dramatically such choices affect the measurement


Table . Two hypothetical plats

        Plat A                    Plat B
        ρ  σ  τ                   ρ  σ  τ
I       a  b  c           I       a  a  a
II      d  e  f           II      b  a  a
III     g  h  i           III     a  b  a
IV      j  k  l           IV      a  a  b
V       m  n  o           V       b  b  a
VI      p  q  r           VI      b  a  b
VII     s  t  u           VII     a  b  b
VIII    v  w  x           VIII    b  b  b

of morphological complexity. Next, we review a number of information-theoretic and set-theoretic measures that have been proposed for morphological complexity, and as we show, they yield different results when applied to the contrasting representations of the English conjugational system in Tables 7.2 and 7.3. We regard the complexity of an inflectional system as the extent to which it inhibits motivated inferences about the word forms in a lexeme’s realized paradigm.3 Thus, consider the two hypothetical plats in Table 7.5. Plat A is maximally simple: every one of the exponences a through x in this plat reveals the inflection-class membership of a word form bearing that exponence. By contrast, Plat B is maximally complex: in order to know a word form’s inflection-class membership, one must simply know its ρ-form, its σ-form, and its τ-form; none of these can be motivatedly inferred from the others.4 Motivated inferences about the word forms in a lexeme’s realized paradigm can be inhibited in various ways:

3 A lexeme L’s realized paradigm is a set of cells, each the pairing ⟨w, σ⟩ of a word form w realizing L with the morphosyntactic property set σ realized by w. The paradigm-based conception of complexity assumed here is not, of course, the only kind of complexity that a language might be claimed to exhibit. Anderson (this volume) argues that allomorphy adds to a morphological system’s complexity. In that sense, one might regard the system represented by Plat A in Table 7.5 as being more complex than the system represented by Plat B: in Plat A, each of the three MPSs ρ, σ, τ has eight allomorphs, while in Plat B, each has only two allomorphs. Thus, an inflectional system whose paradigms are complicated by a poverty of implicative relations among their cells may exhibit relatively little allomorphy, and a system complicated by extensive allomorphy may exhibit paradigms whose cells conform to an orderly and extensive network of implicative relations. The two kinds of complexity cannot be equated; one is complexity at the level of paradigms, the other, complexity at the level of individual paradigm cells.
4 To cite a non-hypothetical example, the inference of the present participial forms moving and proving from the infinitive forms move and prove is a motivated inference in English; by contrast, the inference of the past participle forms proven and *moven from these infinitive forms (alone) is not motivated. Our conception of an inflectional system’s complexity (as the extent to which the system inhibits motivated inferences about the realization of paradigm cells) recalls the typology of pronominal systems proposed by Donohue (this volume: .), in which a maximally transparent system affords many simple inferences of content from form and a maximally opaque system affords few such inferences.


• The word form in cell c of a lexeme’s realized paradigm P may be unpredictable or only predictable from a combination of two or more predictors. In such cases, c exhibits high conditional entropy (a property that we discuss in 7.3) and low cell predictability (7.6) and lowers P’s inflection-class predictability (7.5); in addition, P requires more than one dynamic principal part (7.4).
• Conversely, a cell c’s word form may be unpredictive or only predictive in combination with two or more other predictors. In such cases, c exhibits low cell predictiveness (7.7).
• In a given realized paradigm P, the set of cells predicting one cell’s word form may not suffice to predict other cells’ word forms. In such cases, P requires a large static principal-part set (7.4).
• Only certain subsets of the set of cells constituting a lexeme’s realized paradigm P may be viable as predictors for P.5 In such cases, the density of P’s optimal principal-part sets is low (7.4).
Because of these diverse sources of complexity, the full scope of an inflectional system’s complexity can only be revealed by employing a variety of different measurements; no single measurement gives a complete picture. In the following sections, we apply several measures to the H-O plat and the S-O plat. As we show, the H-O plat is more complex than the S-O plat by these measures. Because the S-O representation includes grammatical information that the H-O representation lacks (7.2), it yields slightly lower conditional entropy measures (7.3), fewer principal parts (7.4), and higher measures of inflection-class predictability (7.5), cell predictability (7.6), and cell predictiveness (7.7). This result does not imply that S-O representations are ‘better’. But it does show that if one gauges a language’s morphological complexity by the proposed information-theoretic and set-theoretic measures, then one must be precise about what one is measuring: the extent to which a lexeme’s inflection-class membership is deducible from its word forms depends on whether the differences among these word forms are seen as purely phonological or as additionally including differences in morphological structure and membership in particular grammatical classes (or other possible factors as well).
The statistics employed in the following discussion have been generated from the plats in Tables 7.2 and 7.3 by means of the Principal-Parts Analyser, a computer program that performs various calculations based on an input plat. This software is freely accessible at .

5 Where L is a lexeme with realized paradigm P and c is a cell in P, a viable set of predictors for c is a set of cells in P whose word forms uniquely determine that of c; a viable set of predictors for P is a set of cells in P whose word forms uniquely determine those of all of the remaining cells in P, i.e. a set of cells in P whose word forms uniquely determine L’s inflection-class membership.
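Although the Principal-Parts Analyser itself is only referred to above, the basic input object, a plat, is easy to picture as a data structure. The sketch below is our own illustration in Python, not the authors’ software: it encodes a hypothetical four-verb fragment of the H-O and S-O plats of Tables 7.2 and 7.3 as dictionaries keyed by inflection class and MPS, and counts distinct exponences, one of the summary figures reported in 7.2. The counting convention used here (distinct MPS–exponence pairs) is an assumption of ours.

```python
# A plat as a nested mapping: inflection class -> {MPS -> exponence}.
# Four-verb fragment of the H-O and S-O plats (Tables 7.2 and 7.3).
MPSS = ["inf", "3sgPresInd", "past", "pastPple"]

HO_PLAT = {
    "cast": {"inf": "æst", "3sgPresInd": "æsts", "past": "æst",   "pastPple": "æst"},
    "last": {"inf": "æst", "3sgPresInd": "æsts", "past": "æstəd", "pastPple": "æstəd"},
    "pass": {"inf": "æs",  "3sgPresInd": "æsəz", "past": "æst",   "pastPple": "æst"},
    "mean": {"inf": "in",  "3sgPresInd": "inz",  "past": "ɛnt",   "pastPple": "ɛnt"},
}

SO_PLAT = {
    "cast": {"inf": "æst", "3sgPresInd": "æst-z", "past": "æst",       "pastPple": "æst"},
    "last": {"inf": "æst", "3sgPresInd": "æst-z", "past": "æst-d",     "pastPple": "æst-d"},
    "pass": {"inf": "æs",  "3sgPresInd": "æs-z",  "past": "æs-d",      "pastPple": "æs-d"},
    "mean": {"inf": "in",  "3sgPresInd": "in-z",  "past": "lax(ɛn)-t", "pastPple": "lax(ɛn)-t"},
}

def distinct_exponences(plat):
    """One way of counting: distinct (MPS, exponence) pairs across the plat."""
    return len({(m, row[m]) for row in plat.values() for m in MPSS})

print(distinct_exponences(HO_PLAT), distinct_exponences(SO_PLAT))  # 12 14
```

Already on this fragment the S-O version has more distinct exponences than the H-O version (14 against 12), mirroring the 115-against-103 contrast reported in 7.2 for the full plats.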


. The contents of the two plats
The contents of the two plats are alike in two ways. First, they both represent the forty verbs in (1) as belonging to forty distinct ICs;6 that is, no two rows are alike in either plat. Second, they group MPSs in the same way. Given two MPSs σ, τ, we say that σ and τ belong to the same distillation if and only if the exponents of σ across all ICs are isomorphic to those of τ; otherwise, σ and τ belong to distinct distillations. Thus, consider the hypothetical system of ICs in Table 7.6, in which ρ, σ, τ, υ, and ϕ represent MPSs; I, II, III, IV, V, and VI represent ICs; and a through i represent exponences. In this plat, the exponences of property sets τ and υ are identical; moreover, the exponences of ϕ are isomorphic to those of sets τ and υ, in the sense that when ϕ has exponence h in a given IC, τ and υ both have f, and when ϕ has exponence i in a given IC, τ and υ both have g. In the same way, the property sets {inf} and {presPple} belong to the same distillation in both the H-O and the S-O plat; that is, each plat has four distinct distillations: that of {inf} (which also includes {presPple}) and those of {3sgPresInd}, {past}, and {pastPple}. In calculating a plat’s viable principal-part sets, its predictabilities, its predictiveness, or its entropy measures, it is sensible to base these calculations on distillations rather than on morphosyntactic property sets per se, since two morphosyntactic property sets belonging to the same distillation are always equally informative. Our practice is to refer to a distillation by one of its members, typically its first-mentioned member in the plat under scrutiny.
Notwithstanding the two similarities just noted, the S-O and H-O plats also present a first statistical difference: the S-O plat has 115 distinct exponences, while the H-O plat has only 103. This difference is reasonable: although the past-tense forms cast and passed are, from a purely phonemic perspective, alike in their exponence (both have

Table . Plat representing a hypothetical system of ICs ρστυϕ

I acf f h II acg g i III adf f h IV bdg g i V bef f h VI beg g i

6 In this respect, the plats are more fine-grained than the traditional classification of English verbs: traditionally, band and load are both simply weak verbs, but in these two plats, they belong to distinct conjugations because the rime of band differs from that of load.


Table . Inflection classes, exponences, and distillations in the two plats

                        H-O                                      S-O
Identical ICs           0                                        0
Distinct exponences     103                                      115
Distillations           4: {inf}, {3sgPresInd},                  4: {inf}, {3sgPresInd},
                           {past}, {pastPple}                       {past}, {pastPple}

the exponence /æst/), they are different from a morphological point of view: passed, unlike cast, has /æs-d/ as its exponence in the S-O plat. These similarities and differences between the two plats are summarized in Table 7.7.
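The grouping of MPSs into distillations can be made mechanical: on the definition just given, two columns fall into the same distillation exactly when they induce the same partition of the inflection classes. The sketch below is our own illustration of that test (not the Principal-Parts Analyser), applied to the hypothetical plat of Table 7.6 by canonicalizing each column’s pattern of exponences.

```python
from collections import defaultdict

# Hypothetical plat of Table 7.6: five MPSs, six ICs.
PLAT = {
    "I":   {"ρ": "a", "σ": "c", "τ": "f", "υ": "f", "ϕ": "h"},
    "II":  {"ρ": "a", "σ": "c", "τ": "g", "υ": "g", "ϕ": "i"},
    "III": {"ρ": "a", "σ": "d", "τ": "f", "υ": "f", "ϕ": "h"},
    "IV":  {"ρ": "b", "σ": "d", "τ": "g", "υ": "g", "ϕ": "i"},
    "V":   {"ρ": "b", "σ": "e", "τ": "f", "υ": "f", "ϕ": "h"},
    "VI":  {"ρ": "b", "σ": "e", "τ": "g", "υ": "g", "ϕ": "i"},
}
ICS = ["I", "II", "III", "IV", "V", "VI"]   # fixed row order
MPSS = ["ρ", "σ", "τ", "υ", "ϕ"]

def signature(mps):
    """Canonical shape of an MPS column: exponences renumbered by first occurrence,
    so two columns get the same signature iff their exponences are isomorphic."""
    seen, shape = {}, []
    for ic in ICS:
        e = PLAT[ic][mps]
        shape.append(seen.setdefault(e, len(seen)))
    return tuple(shape)

distillations = defaultdict(list)
for mps in MPSS:
    distillations[signature(mps)].append(mps)

print(list(distillations.values()))   # [['ρ'], ['σ'], ['τ', 'υ', 'ϕ']]
```

As expected, τ, υ, and ϕ are grouped into a single distillation, while ρ and σ each form their own.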

. Conditional entropy
In recent work on morphological complexity, the information-theoretic notion of conditional entropy (Shannon 1951) has been employed as an objective measure of an inflection-class system’s complexity (Moscoso del Prado Martín et al. 2004, Ackerman et al. 2009, Milin et al. 2009). Intuitively, the entropy of an MPS σ conditional on MPS τ is the uncertainty about the exponence e_σ of σ when one knows the exponence e_τ of τ. Given a set S of MPSs such that M ∈ S, Stump and Finkel (2013) define the n-MPS entropy of M as the average of the entropy of M conditional on members of C_M (= {c | c is a subset of S excluding M with up to n members}). The formula for n-MPS entropy is

H_n(M) = \frac{\sum_{c \in C_M} H(M \mid c)}{|C_M|},

where H(M|c) is the entropy of M conditional on c.7 If we calculate the 4-MPS entropy of the four distillations in Table 7.7, we get the results in Table 7.8. On average, the H-O and S-O plats are nearly alike in their 4-MPS entropy, but closer examination reveals considerable variation between the two plats with respect to the 4-MPS entropy of certain cells. In both the {inf} and {3sgPresInd} distillations, the two plats have the same number of contrasting exponences (see Table 7.9); yet, the 4-MPS entropy for both distillations is substantially lower in the

7 The entropy H(M|c) of M conditional on c is defined as

H(M \mid c) = -\sum_{x \in c} \sum_{y \in M} P(x)\, P(y \mid x)\, \log_2 P(y \mid x),

where P(x) is the probability of x and P(y|x) is the probability of y given x.


Table . 4-MPS entropy (× 100) of the four distillations in Table 7.7 Distillations Plat{inf}{3sgPresInd}{past}{pastPple}Average

H-O 81 82 90 88 85 S-O 72 73 89 95 82

Table . Distinct exponences in each of the four distillations

Distillations Plat{inf}{3sgPresInd}{past}{pastPple}

H-O 25 26 26 27 S-O 25 26 31 33

S-O plat than in the H-O plat. In the realization of these distillations, the S-O plat affords greater certainty than the H-O plat, and is in that sense less complex. In the {past} and {pastPple} distillations, by contrast, the S-O plat exhibits a larger number of distinct exponences than the H-O plat; accordingly, there is naturally a higher level of uncertainty associated with the S-O plat—indeed, the 4-MPS entropy of the S-O plat exceeds that of the H-O plat in the {pastPple} distillation.
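For concreteness, the two quantities just used, H(M|c) and n-MPS entropy, can be computed over a plat as in the sketch below. This is an illustration under explicit assumptions of ours: inflection classes are treated as equiprobable (no type-frequency weighting is mentioned above), the empty predictor set is counted in C_M, as the definition literally allows, and the three-verb plat fragment is hypothetical.

```python
import math
from itertools import combinations

MPSS = ["inf", "3sgPresInd", "past", "pastPple"]
PLAT = {  # three-row fragment in the style of the H-O plat
    "cast": dict(zip(MPSS, ["æst", "æsts", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æsts", "æstəd", "æstəd"])),
    "pass": dict(zip(MPSS, ["æs",  "æsəz", "æst",   "æst"])),
}

def h_conditional(target, predictors):
    """H(target | predictors): uncertainty about the target MPS's exponence once
    the joint exponence of the predictors is known (rows taken as equiprobable)."""
    groups = {}
    for row in PLAT.values():
        groups.setdefault(tuple(row[p] for p in predictors), []).append(row[target])
    h = 0.0
    for ys in groups.values():
        p_x = len(ys) / len(PLAT)
        for y in set(ys):
            p_yx = ys.count(y) / len(ys)
            h -= p_x * p_yx * math.log2(p_yx)
    return h

def n_mps_entropy(target, n=4):
    """Average of H(target | c) over predictor sets c of up to n other MPSs."""
    others = [m for m in MPSS if m != target]
    cs = [c for k in range(n + 1) for c in combinations(others, k)]
    return sum(h_conditional(target, c) for c in cs) / len(cs)

print(round(n_mps_entropy("past"), 3))   # 0.365 on this toy fragment
```

Setting n to 4 mirrors the 4-MPS entropy used in Table 7.8; nothing beyond that choice is taken from the chapter’s own computations.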

. Principal-part sets
The principal parts of a lexeme are a small set of cells in its realized paradigm from which all of the remaining cells are deducible. Because they make it possible for language learners to master entire realized paradigms with a minimum of memorization, principal parts have a long tradition of use in language pedagogy. They also have theoretical significance: they concisely embody implicative relations among the cells of a lexeme’s realized paradigm, and these relations define patterns that children clearly employ in learning their first language. As we now show, the precise configuration of an inflectional system’s implicative patterns depends on how that system is represented; thus, the viable principal-part sets afforded by a H-O plat may differ from those afforded by a S-O plat.
Before proceeding, we must draw an important distinction between the principal-part sets employed in instructional contexts and those that we use in analysing a realized paradigm’s implicative patterns. Consider, for example, the principal parts

employed in teaching Latin verbs: traditionally, these are a verb’s first-person singular present indicative active form, its present infinitive form, its first-person singular perfect indicative active form, and its perfect passive participle; the examples in Table 7.10 illustrate. By learning these forms of a given verb, students of Latin learn to deduce all of that verb’s remaining forms. Principal-part sets of this sort have three characteristics that are important for language pedagogy but which are not essential for determining an inflectional system’s implicative patterns. First, the principal-part sets employed in language pedagogy are unique: each lexeme has exactly one such set. But an inflectional system’s implicative patterns do not, in general, boil down to a single principal-part set; typically, a lexeme’s full paradigm of inflected forms may be deduced in more than one way, from any of a number of possible sets of forms. Second, principal-part sets in language pedagogy are uniform: from one realized paradigm to the next, it is always the same forms that serve as principal parts (as in Table 7.10). But nothing about an inflectional system’s implicative patterns necessitates this uniformity; on the contrary, lexemes belonging to distinct inflection classes may participate in very different implicative relations. Finally, principal-part sets in language pedagogy are ordinarily optimal: they are as small as can be (while still adhering to the requirement of uniformity). But there is no logical necessity that the set of cells embodying a realized paradigm’s implicative structure be optimal in this sense.
We assume a simpler conception of principal parts, one according to which a principal-part set may or may not be unique, uniform, or optimal. That is, a set of principal parts for a lexeme L is simply any set of cells in L’s realized paradigm P from whose realization one can reliably deduce the realization of the remaining cells in P. This definition allows us to choose whether or not to require that a given principal-part set possess any or all of the properties of uniqueness, uniformity, and optimality. We call principal-part sets that adhere to the requirement of uniformity static principal-part sets; sets that do not adhere to this requirement are dynamic principal-part sets.
The two plats afford similar principal-part sets under the static scheme: in both cases, there are two optimal static principal-part sets: {inf, past, pastPple} and

Table . Traditional principal parts of five Latin verbs

Conjugation   1sg present indicative active   Infinitive   1sg perfect indicative active   Perfect passive participle   Gloss
1st           laudō                           laudāre      laudāvī                         laudātum                     ‘praise’
2nd           moneō                           monēre       monuī                           monitum                      ‘warn’
3rd           dūcō                            dūcere       dūxī                            ductum                       ‘lead’
3rd (-iō)     capiō                           capere       cēpī                            captum                       ‘take’
4th           audiō                           audīre       audīvī                          audītum                      ‘hear’


Table . Optimal dynamic principal-part sets of three verbs (H-O plat) [ Italic 1–4 represent the four distillations: {inf}, {3sgPresInd}, {past}, {pastPple} ]

         Principal-part sets   Distillations: 1  2  3  4
band     3                     3  3  3  3      dynamic principal-part number: 1
         4                     4  4  4  4      cell-predictor number: 1.00
                                               number of optimal sets: 2
                                               density of optimal sets: 50.0 of 4
bite     3, 4                  3  3  3  4      dynamic principal-part number: 2
                                               cell-predictor number: 1.00
                                               number of optimal sets: 1
                                               density of optimal sets: 16.7 of 6
bring    1, 3                  1  1  3  3      dynamic principal-part number: 2
         1, 4                  1  1  4  4      cell-predictor number: 1.00
         2, 3                  2  2  3  3      number of optimal sets: 4
         2, 4                  2  2  4  4      density of optimal sets: 66.7 of 6

{3sgPresInd, past, pastPple}.8 Notwithstanding this similarity, the two plats differ under the dynamic principal-part scheme.
Dynamic principal-part analysis affords various informative measurements. Consider, for example, the band, bite, and bring conjugations. In the H-O plat, these have the optimal dynamic principal-part sets in Table 7.11. In this table, the italic numerals 1, 2, 3, and 4 represent the cells in a lexeme’s realized paradigm corresponding to the four distillations in the H-O plat: {inf}, {3sgPresInd}, {past}, and {pastPple}, respectively. Viable principal-part sets are given in successive rows. The principal parts that suffice to determine the exponence of distillation σ in set A are listed in σ’s column on A’s row. As the table shows, we can compare band, bite, and bring according to (i) their dynamic principal-part number, i.e. the number of optimal dynamic principal parts they require (band requires only one, while bite and bring both require two); (ii) their cell-predictor number, i.e. the average number of dynamic principal parts needed to deduce the exponence of a given distillation (in all three of these conjugations, the cell-predictor number is 1.0); (iii) the number of viable optimal dynamic principal-part sets they afford (band allows two, bite only one, and bring four); and (iv) the density of optimal dynamic principal-part sets among sets of n cells, given a dynamic principal-part number of n (two of four single-member principal-

8 Conventionally, the first of these is used in pedagogical contexts: sing, sang, sung.


part sets are viable for band, one of six two-member sets are viable for bite, and four of six two-member sets are viable for bring).
If we average these four measures across all forty ICs in both plats, we arrive at the figures in Table 7.12. Here, the difference between the two plats becomes more pronounced. By three of the four measures, the S-O plat allows a verbal lexeme’s cells to be deduced more easily than the H-O plat: on average, the forty conjugations require smaller principal-part sets in the S-O plat, which also affords more viable sets—that is, more ways of deducing unknown cells from known cells. By these criteria, the S-O plat is less complex than the H-O plat.
We can localize these differences between the S-O and H-O plats by examining these results more closely. Consider first the dynamic principal-part numbers of the forty verbs in the two plats. As Table 7.13 shows, the two plats yield different principal-part numbers for only four verbs: bite, cast, hide, and mean. In the S-O plat, each of these four verbs requires only a single principal part, but each requires two in the H-O plat. Consider, for example, the verb cast. In the H-O plat, cast has the exponences in Table 7.14. In this plat, a principal-part set with only a single member cannot work for cast, because each of its exponences is identical to the corresponding exponence of either last or pass. For this reason, cast requires two principal parts. Given four

Table . Dynamic principal-part sets in the two plats

                                                                             H-O     S-O
Dynamic principal-part number, averaged across conjugations                  1.23    1.12
Cell-predictor number, averaged across conjugations                          1.00    1.00
Number of viable dynamic principal-part sets, averaged across conjugations   2.35    2.58
Average ratio of actual to possible dynamic principal-part sets,
averaged across conjugations                                                 52.9    60.6

Table . Dynamic principal-part numbers in the two plats

                                                                      H-O    S-O
band, budge, build, choose, do, draw, feel, fly, hang, have,
last, lean, light, load, lose, make, pass, pay, peel, read,
ride, say, see, seek, send, shop, sing, slide, stand, teach, write    1      1
bring, buy, fight, flee, sting                                        2      2
bite, cast, hide, mean                                                2      1
Average                                                               1.23   1.12


Table . Candidate principal-part sets for cast in the H-O plat (Within a given row, a pair of distillations marked ‘...’ corresponds to a viable principal-part set, and a pair of distillations marked ‘× ...×’ corresponds to asetthatisnotviable.) Lexeme {inf} {3sgPresInd} {past} {pastPple} Candidate principal-part sets cast æst æsts æst æst

1 ×× 2  3  4  5  6 ×× last æst æsts æstәd æstәd pass æs æsәz æst æst distillations, there are six candidate sets containing two principal parts; four of these six sets are viable, namely those numbered ‘2’–‘5’ in Table 7.14. (The principal-part set numbered ‘1’ doesn’t distinguish cast from last; the set numbered ‘6’ doesn’t distinguish cast from pass.) The situation is different in the S-O plat, where cast, last, and pass havethe exponences in Table 7.15. In this plat, a single principal part suffices to distinguish cast from last and pass. Given four distillations, there are four candidate single-member principal-part sets; as Table 7.15 shows, two of the four sets—those numbered ‘3’ and ‘4’—are viable. Thus, by the criterion of verbs’ dynamic principal-part numbers, the H-O plat is more complex than the S-O plat. Consider now the number of viable optimal dynamic principal-part sets available to each verb in the two plats. As Table 7.16 shows, most verbs exhibit the same number of sets in both plats. Among the ten verbs that do not, eight exhibit more viable sets in the S-O plat than in the H-O plat. For instance, the H-O plat affords two optimal

Table . Candidate principal-part sets for cast (S-O plat)

Lexeme   {inf}   {3sgPresInd}   {past}    {pastPple}   Candidate principal-part sets
cast     æst     æst-z          æst       æst
         ×                                             1
                 ×                                     2
                                ✓                      3
                                          ✓            4
last     æst     æst-z          æst-d     æst-d
pass     æs      æs-z           æs-d      æs-d


Table . Number of viable optimal dynamic principal-part sets

H-O S-O

Same number in both plats:

bite, fly, hide, light, say, sing, slide                                1   1
band, choose, feel, hang, last, lean, lose, peel, see, seek, stand,
sting, teach                                                            2   2
draw                                                                    3   3
bring, budge, build, buy, fight, flee, have, load, shop                 4   4

More in S-O plat:

ride, write              1   2
pay                      1   3
make, pass, read, send   2   4
do                       3   4

More in H-O plat:

cast, mean 4 2

Average 2.35 2.58

Table . Candidate principal-part sets for pass (H-O plat)

Lexeme   {inf}   {3sgPresInd}   {past}   {pastPple}   Candidate principal-part sets
pass     æs      æsəz           æst      æst
         ✓                                            1
                 ✓                                    2
                                ×                     3
                                         ×            4
cast     æst     æsts           æst      æst

dynamic principal-part sets for the verb pass, one containing its {inf} cell and the other, its {3sgPresInd} cell; neither the {past} cell of pass nor its {pastPple} cell is an optimal principal part, since neither distinguishes the conjugation of pass from that of cast in the H-O plat (Table 7.17). By contrast, the additional morphological information in the S-O plat makes each of the four cells an optimal dynamic principal part: in this plat, the exponence of passed (morphophonologically |pæs-d|) distinguishes it from cast (morphophonologically |kæst|), as Table 7.18 shows.


Table . Candidate principal-part sets for pass (S-O plat)

Lexeme   {inf}   {3sgPresInd}   {past}   {pastPple}   Candidate principal-part sets
pass     æs      æs-z           æs-d     æs-d
         ✓                                            1
                 ✓                                    2
                                ✓                     3
                                         ✓            4
cast     æst     æst-z          æst      æst

Table . Density of viable dynamic principal-part sets among all can- didate sets having the same number of members

H-O S-O

budge, build, have, load, shop                              100 of 4      100 of 4
draw                                                        75.0 of 4     75.0 of 4
bring, buy, fight, flee                                     66.7 of 6     66.7 of 6
band, choose, feel, hang, last, lean, lose, peel, see,
seek, stand, teach                                          50.0 of 4     50.0 of 4
sting                                                       33.3 of 6     33.3 of 6
fly, light, say, sing, slide                                25.0 of 4     25.0 of 4
do                                                          75.0 of 4     100 of 4
make, pass, read, send                                      50.0 of 4     100 of 4
pay                                                         25.0 of 4     75.0 of 4
ride, write                                                 25.0 of 4     50.0 of 4
bite, hide                                                  16.7 of 6     25.0 of 4
cast, mean                                                  66.7 of 6     50.0 of 4
Average                                                     52.92         60.62

The two apparent exceptions to the generalization that the S-O plat affords more viable principal-part sets are cast and mean, each of which has four optimal dynamic principal-part sets in the H-O plat but only two in the S-O plat. This apparent exceptionality is, however, an artefact of a difference between the two plats that we have already noted: the H-O plat requires these two verbs to have two principal parts each, whereas the S-O plat only requires them to have one; see again Table 7.13. That is, cast and mean have more viable principal-part sets in the H-O plat because they have more candidate principal-part sets in that plat than in the S-O plat—six versus four.
Finally, consider the density of viable dynamic principal-part sets among candidate sets of the same size. As Table 7.19 shows, most verbs exhibit the same density in both plats. Of the remaining verbs, a solid majority exhibit more density in the S-O plat than in the H-O plat. Only two exhibit the opposite pattern, and again these are cast

and mean, whose apparent exceptionality again stems from the fact that they require two principal parts in the H-O plat but only one in the S-O plat.
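The dynamic principal-part calculations just discussed can be reproduced in miniature. The sketch below is our own illustration, not the Principal-Parts Analyser: for each row of a hypothetical three-verb H-O-style plat it enumerates the smallest cell sets whose exponences determine the rest of the row, together with the ‘x of y’ density figure. On this fragment it reproduces the verdicts of Tables 7.14 and 7.17 for cast and pass: cast needs two principal parts (four of six two-member sets viable), while pass gets by with one ({inf} or {3sgPresInd}).

```python
from itertools import combinations

MPSS = ["inf", "3sgPresInd", "past", "pastPple"]
PLAT = {  # H-O-style fragment (cf. Table 7.14)
    "cast": dict(zip(MPSS, ["æst", "æsts", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æsts", "æstəd", "æstəd"])),
    "pass": dict(zip(MPSS, ["æs",  "æsəz", "æst",   "æst"])),
}

def viable(lexeme, cells):
    """Viable dynamic principal-part set: any row matching this lexeme on these
    cells must match it on all remaining cells as well."""
    row = PLAT[lexeme]
    return all(
        all(other[m] == row[m] for m in MPSS)
        for other in PLAT.values()
        if all(other[c] == row[c] for c in cells)
    )

def optimal_dynamic_sets(lexeme):
    """Smallest viable sets plus their density among same-sized candidate sets."""
    for size in range(1, len(MPSS) + 1):
        candidates = list(combinations(MPSS, size))
        winners = [c for c in candidates if viable(lexeme, c)]
        if winners:
            return winners, f"{len(winners)} of {len(candidates)}"

for verb in PLAT:
    print(verb, *optimal_dynamic_sets(verb))
```

Swapping in the S-O exponences of Table 7.15 (æst-z, æs-d, and so on) makes a single cell suffice for cast as well, which is exactly the contrast drawn above between the two plats.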

. Inflection-class (IC) predictability
A given principal-part set reveals some of the implicative relations that hold among the cells of a lexeme’s realized paradigm, but it doesn’t reveal all such relations; a clearer picture can be arrived at by comparing a large number of alternative principal-part sets. Building on this idea, we have proposed a measure of IC predictability (Finkel and Stump 2009, 2013). Intuitively, the IC predictability of a lexeme L’s realized paradigm is the fraction of viable (though not necessarily optimal) dynamic principal-part sets among all nonempty subsets of cells in L’s realized paradigm.9 Since large subsets always tend to be viable principal-part sets, we restrict the size of the cell sets used in this computation to some arbitrary number m, usually setting m to 4 (as in the following calculations). We then define ICP_L, the IC predictability of realized paradigm P_L, as follows:

ICP_L = \frac{|m[D_L']|}{|m[P(D_L) \setminus \emptyset]|}

The following abbreviations and notation are employed in this definition:

P_L is the realized paradigm of lexeme L.
M_L is the set of cells in P_L.
D_L is any maximal subset of M_L none of whose members belong to the same distillation.
D_L' is the set {N : N ⊆ D_L and N is a viable dynamic principal-part set for P_L}.
For any collection C of sets, m[C] represents {s ∈ C : |s| ≤ m}.
For any set S, P(S) is the power set of S (= the set of subsets of S).
For any sets S1, S2, S1 \ S2 is {x : x ∈ S1 and x ∉ S2}.

Intuitively, a realized paradigm with high IC predictability has many inter-cell implicative relations, and one with lower IC predictability has fewer. A realized paradigm with an IC predictability of 1.0 allows the word form in every cell to be deduced from the word form in every other cell; a realized paradigm with an IC predictability of 0 allows no such inferences. Low IC predictability contributes to an inflectional system’s complexity.
Most of the verbs in (1) have the same IC predictability in both the H-O and S-O plats; for twelve verbs, however, the two plats produce distinct measures. As Table 7.20 shows, it is invariably the S-O plat that induces the higher level of IC predictability in these cases; moreover, five of these verbs exhibit full IC predictability (a value of 1.0) in the S-O plat but not in the H-O plat. Thus, the pattern that emerged in the preceding sections is again in evidence: the S-O plat is less complex than the H-O plat.

9 For a demonstration that IC predictability and n-MPS entropy are distinct measures (neither reducible to the other), see Stump and Finkel (: ).


Table . IC predictability of twelve verbs in the H-O and S-O plats

H-O S-O

bite    0.267  <  0.533
cast    0.600  <  0.800
do      0.933  <  1.000
hide    0.267  <  0.533
make    0.800  <  1.000
mean    0.600  <  0.800
pass    0.800  <  1.000
pay     0.733  <  0.933
read    0.800  <  1.000
ride    0.533  <  0.800
send    0.800  <  1.000
write   0.533  <  0.800

Avg 0.727 < 0.790
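The ICP_L ratio lends itself to the same brute-force treatment. The function below is a sketch of ours, using the same hypothetical three-verb H-O fragment as before and m = 4: it counts how many nonempty cell sets of size at most m are viable dynamic principal-part sets. On this fragment it returns 0.6 for cast, which happens to coincide with the H-O figure reported for cast in Table 7.20.

```python
from itertools import combinations

MPSS = ["inf", "3sgPresInd", "past", "pastPple"]
PLAT = {
    "cast": dict(zip(MPSS, ["æst", "æsts", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æsts", "æstəd", "æstəd"])),
    "pass": dict(zip(MPSS, ["æs",  "æsəz", "æst",   "æst"])),
}

def ic_predictability(lexeme, m=4):
    """Share of nonempty cell sets of size <= m that are viable dynamic
    principal-part sets for this lexeme (rows of the plat = the ICs)."""
    row = PLAT[lexeme]
    def viable(cells):
        return all(all(other[x] == row[x] for x in MPSS)
                   for other in PLAT.values()
                   if all(other[c] == row[c] for c in cells))
    subsets = [c for k in range(1, m + 1) for c in combinations(MPSS, k)]
    return sum(viable(c) for c in subsets) / len(subsets)

print(ic_predictability("cast"))   # 0.6 on this fragment
```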

. Cell predictability

Intuitively, the cell predictability of a cell ⟨w, σ⟩ in a lexeme L’s realized paradigm P_L is the ratio of (a) to (b), where (a) is the number of nonempty subsets of P_L’s cells whose realization uniquely determines ⟨w, σ⟩ and (b) is the number of all nonempty subsets of P_L’s cells. More formally, we define the cell predictability CellP_⟨w,σ⟩ of a cell ⟨w, σ⟩ in a realized paradigm P_L as follows:

CellP_{\langle w,\sigma\rangle} = \frac{|[m[D_{\langle w,\sigma\rangle}]]_{-\langle w,\sigma\rangle}|}{|[m[P(D_L) \setminus \emptyset]]_{-\langle w,\sigma\rangle}|}

The following additional abbreviations and notation are employed in this definition:
D_⟨w,σ⟩ is the set {N : N ⊆ D_L and N uniquely determines the realized cell ⟨w, σ⟩ in P_L}.
For any collection C of sets, [C]_−⟨w,σ⟩ represents the largest subset of C such that no member of [C]_−⟨w,σ⟩ contains ⟨w, σ⟩.

The word form in a cell with high cell predictability can be inferred in several ways; the word form in a cell with low cell predictability can be inferred in few ways, possibly not at all. Thus, cells with high cell predictability contribute to an inflection-class system’s simplicity, while low cell predictability contributes to its complexity.
When we apply the cell predictability measure to the cells represented in the H-O and S-O plats, we find that most verbs have cells whose cell predictability is the same in both plats. But for the thirteen verbs in Table 7.21, we find differences; these differences


Table . Cell predictability measures for thirteen verbs in the H-O and S-O plats

        {inf}              {3sgPresInd}       {past}             {pastPple}         Avg
        H-O     S-O        H-O     S-O        H-O     S-O        H-O     S-O        H-O     S-O

bite    0.875   0.875      0.875   0.875      0.000 < 0.500      0.000   0.000      0.438 < 0.562
cast    0.500 < 0.875      0.500 < 0.875      0.500   0.500      0.500   0.500      0.500 < 0.688
do      0.750 < 0.875      0.750 < 0.875      0.875   0.875      0.750 < 0.875      0.781 < 0.875
hide    0.750 < 0.875      0.750 < 0.875      0.000 < 0.500      0.000   0.000      0.375 < 0.562
make    0.500 < 0.875      0.500 < 0.875      0.875   0.875      0.875   0.875      0.688 < 0.875
mean    0.500 < 0.875      0.500 < 0.875      0.500   0.500      0.500   0.500      0.500 < 0.688
pass    0.500 < 0.875      0.500 < 0.875      0.875   0.875      0.875   0.875      0.688 < 0.875
pay     0.500 < 0.875      0.375 < 0.750      0.750   0.750      0.750   0.750      0.594 < 0.781
read    0.500 < 0.875      0.500 < 0.875      0.875   0.875      0.875   0.875      0.688 < 0.875
ride    0.875   0.875      0.875   0.875      0.000 < 0.500      0.500   0.500      0.562 < 0.688
send    0.500 < 0.875      0.500 < 0.875      0.875   0.875      0.875   0.875      0.688 < 0.875
slide   0.750 < 0.875      0.750 < 0.875      0.500   0.500      0.000   0.000      0.500 < 0.562
write   0.875   0.875      0.875   0.875      0.000 < 0.500      0.500   0.500      0.562 < 0.688
Avg     0.706 < 0.781      0.700 < 0.775      0.566 < 0.616      0.584 < 0.588      0.639 < 0.690

are shaded in Table 7.21. Without exception, if a cell has distinct cell predictabilities in the H-O and S-O plats, its cell predictability is lower in the H-O plat than in the S-O plat. Notice, in particular, the dark-outlined values in Table 7.21: these represent instances in which a cell’s word form is unpredictable in the H-O plat but predictable in the S-O plat. By the criterion of cell predictability, we again see that the H-O plat is more complex than the S-O plat.
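Cell predictability admits an analogous sketch. The version below (ours) follows the definition as stated: candidate predictor sets are the nonempty subsets of the other distillation cells, up to size m. This is a literal reading; the published figures may rest on a slightly different counting convention, so the function is offered as an illustration of the idea rather than as a re-implementation of the Analyser.

```python
from itertools import combinations

MPSS = ["inf", "3sgPresInd", "past", "pastPple"]
PLAT = {
    "cast": dict(zip(MPSS, ["æst", "æsts", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æsts", "æstəd", "æstəd"])),
    "pass": dict(zip(MPSS, ["æs",  "æsəz", "æst",   "æst"])),
}

def cell_predictability(lexeme, target, m=4):
    """Share of candidate predictor sets (nonempty, target excluded, size <= m)
    whose realization uniquely determines the target cell of this lexeme."""
    row = PLAT[lexeme]
    others = [x for x in MPSS if x != target]
    candidates = [c for k in range(1, m + 1) for c in combinations(others, k)]
    def determines(cells):
        return all(other[target] == row[target]
                   for other in PLAT.values()
                   if all(other[c] == row[c] for c in cells))
    return sum(determines(c) for c in candidates) / len(candidates)

# 4 of the 7 candidate sets determine cast's {past} cell on this fragment.
print(round(cell_predictability("cast", "past"), 3))
```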

. Cell predictiveness
We define the predictiveness of a cell K in a realized paradigm as the fraction of the other cells in the paradigm whose word forms are fully determined by that of K. The two plats differ in their overall predictiveness (averaging over both cells and verbs): verbs’ cells are more predictive in the S-O plat than in the H-O plat. This difference stems from the fact that there are two cells in a verb’s realized paradigm that are, on average, more predictive in the S-O plat than in the H-O plat; these are the {past} and {pastPple} cells (Table 7.22). In particular, the ten verbs in Table 7.23 account for this difference. In this table, a verb’s {past} word form (notated as 3) either predicts (1) or fails to predict (0) the word forms in its {inf} (= 1), {3sgPresIndic} (= 2), and {pastPple} (= 4) cells. The difference between the results for the H-O plat and those for the S-O plat is striking: in the S-O plat, the past-tense forms of these ten verbs are highly predictive; in the H-O plat, by contrast, they are at best mildly predictive,


Table . Predictiveness of a verb’s cells, averaged across verbs H-O S-O

Overall predictiveness of ({inf}): 0.550 0.550 of ({3sgPresInd}): 0.600 0.600 of ({past}): 0.592 0.767 of ({pastPple}): 0.658 0.808

Average predictiveness across all distillations: 0.600 0.681

Table . Predictiveness of a verb’s past-tense cell in the H-O and S-O plats

         H-O                                           S-O
         Distillations                                 Distillations
         1  2  3  4   Predictiveness                   1  2  3  4   Predictiveness
cast     0  0  X  1   0.333                   cast     1  1  X  1   1.000
do       0  0  X  0   0.000                   do       1  1  X  1   1.000
hide     0  0  X  0   0.000                   hide     1  1  X  0   0.667
make     0  0  X  1   0.333                   make     1  1  X  1   1.000
mean     0  0  X  1   0.333                   mean     1  1  X  1   1.000
pass     0  0  X  1   0.333                   pass     1  1  X  1   1.000
pay      0  0  X  1   0.333                   pay      1  1  X  1   1.000
read     0  0  X  1   0.333                   read     1  1  X  1   1.000
send     0  0  X  1   0.333                   send     1  1  X  1   1.000
slide    0  0  X  0   0.000                   slide    1  1  X  0   0.667

1 = {inf}   2 = {3sgPresInd}   3 = {past}   4 = {pastPple}   0 = unpredicted   1 = predicted   X = predictor

if they are predictive at all. Predictiveness is therefore a final criterion according to which the S-O representation of English verb inflection is less complex than the H-O representation.
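Cell predictiveness, finally, asks how much a single cell determines on its own. The sketch below (ours, again on hypothetical three-verb fragments with exponences copied from Tables 7.14 and 7.15) computes it for the {past} cell of cast: the H-O version yields 0.333 and the S-O version 1.0, the same contrast that Table 7.23 reports for the full plats.

```python
MPSS = ["inf", "3sgPresInd", "past", "pastPple"]
HO = {
    "cast": dict(zip(MPSS, ["æst", "æsts", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æsts", "æstəd", "æstəd"])),
    "pass": dict(zip(MPSS, ["æs",  "æsəz", "æst",   "æst"])),
}
SO = {
    "cast": dict(zip(MPSS, ["æst", "æst-z", "æst",   "æst"])),
    "last": dict(zip(MPSS, ["æst", "æst-z", "æst-d", "æst-d"])),
    "pass": dict(zip(MPSS, ["æs",  "æs-z",  "æs-d",  "æs-d"])),
}

def predictiveness(plat, lexeme, predictor):
    """Fraction of the other cells fully determined by the predictor cell alone."""
    row = plat[lexeme]
    matching = [r for r in plat.values() if r[predictor] == row[predictor]]
    others = [m for m in MPSS if m != predictor]
    return sum(all(r[m] == row[m] for r in matching) for m in others) / len(others)

print(round(predictiveness(HO, "cast", "past"), 3),   # 0.333
      round(predictiveness(SO, "cast", "past"), 3))   # 1.0
```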

. Conclusion
In investigating the inflectional morphology of the world’s languages, one is inevitably struck by their widely varying degrees of morphological complexity. Unless this impression is to remain nothing more than a vague and subjective observation, it is critical to find objectively measurable correlates of perceived differences in complexity, and indeed, to articulate in objective terms exactly what morphological complexity means. Here and elsewhere, we have proposed that the complexity of an inflectional

system is the extent to which it inhibits motivated inferences about the word forms realizing a paradigm’s cells. The factors that inhibit such inferences are of various kinds, making it desirable to employ a range of approaches to measuring their effects. A number of approaches have recently been proposed, drawing on both information-theoretic and set-theoretic methods.
In order to measure an inflectional system’s complexity, one must employ a precise and explicit representation of that system. We have argued here that the same system is inevitably representable in more than one way, and that the choice of representation has profound effects on the measurement of the system’s properties. We have elaborated by presenting two different representations of the fragment of the English verb system in (1), one ‘hearer-oriented’, the other ‘speaker-oriented’. Our goal has not been to argue that one sort of representation is more suitable than the other (and we don’t think that that is the case in any event). Nor do we wish to suggest that these are the only possible representations of the system of verb inflection in (1). Rather, our objective has been to demonstrate how deeply the choice of representation can affect the results of measuring a system’s complexity. Although we have focused on English here, we have seen this same effect with every inflectional system that we have investigated: the results of calculations based on a H-O plat differ from those based on a S-O plat. Once one begins to investigate the relative complexity of two or more systems, it is important that sames be compared with sames: the mode of representation chosen for one system must be assiduously employed in representing the systems with which it is compared.

Computational complexity of abstractive morphology

VITO PIRRELLI, MARCELLO FERRO, AND CLAUDIA MARZI

. Introduction
In a constructive perspective on morphological theorizing, roots and affixes are the basic building blocks of morphological competence, on the assumption that the lexicon is largely redundancy-free. The speaker, having identified the parts of a word form, proceeds to discard the full form from the lexicon. Fully inflected forms are parsed (in word recognition) or generated (in word production) on line from their building blocks. This is in sharp contrast with a second type of perspective, named abstractive, which treats word forms as basic units and their recurrent parts as abstractions over full forms. According to this perspective, full forms are the founding units of morphological processing, with sub-lexical units resulting from the application of morphological processes to full forms. Learning the morphology of a language thus amounts to learning relations between fully stored word forms, which are concurrently available in the speaker’s mental lexicon and jointly facilitate processing of morphologically related forms.
The essential distinction between constructive and abstractive approaches (due to Blevins 2006) is not coextensive with the opposition between morpheme-based and word-based views of morphology. There is an obvious sense in which constructive approaches are morph-based, as they rely on the combination of minimal sub-lexical units. On the other hand, abstractive approaches are typically word-based, since words are taken to be essential to morphological generalizations. However, one can endorse an abstractive view of morphologically complex words as being internally structured into recognizable constituent parts. According to this view, constituent parts are analysed as ‘emergent’ from independent principles of lexical organization, whereby full lexical forms are redundantly stored and mutually related through entailment

relations (Matthews 1991, Corbett and Fraser 1993, Pirrelli 2000, Burzio 2004, Booij 2010). Conversely, a constructive orientation can underlie a word-based view of morphological structure. This is the case of Stump’s notion of ‘paradigm function’ (Stump 2001), mapping the root of a lexeme and a set of morphosyntactic properties onto the paradigm cell occupied by an inflected form of the lexeme.
More importantly for our present purposes, the two views are also orthogonal to the distinction between symbolic (or rule-based) and sub-symbolic (or neurally inspired) word processing. First, we observe that, in spite of their being heralded as champions of the associative view on morphological structure, the way connectionist neural networks have been used to model morphology learning adheres, in fact, to a strictly derivational view of morphological relations, according to which a fully inflected form is always produced (or analysed) on the basis of a unique, underlying lexical form. By modelling inflection as a phonological mapping function from a lexical base to its range of inflected forms, connectionist architectures are closer to a rule-free variant of the classical constructive view, than to associative models of the mental lexicon. Conversely, according to Albright and Hayes’ Minimal Generalization algorithm (2003), speakers conservatively develop structure-based rules of mapping between fully inflected forms. These patterns are based on a cautious inductive generalization procedure: speakers are confident in extending a morphological pattern to other forms to the extent that (i) the pattern obtains for many existing word forms, (ii) there is a context-similarity between words that comply with the pattern, and (iii) there is a context-difference between those word forms and word forms that take other patterns. Accordingly, the speaker’s knowledge of word structure is more akin to one dynamic set of relations among fully inflected forms, in line with an overall abstractive approach.1
Finally, the separation line between constructive and abstractive views has little to do with the controversy over the psychological reality of morpheme-like units. Constructivists tend to emphasize the role of morphs in both morphology acquisition and processing, whereas in the abstractive view morphs emerge as abstractions over full forms. In fact, we take the observation that self-organization effects provide an explanatory basis for important aspects of language knowledge to be one of the most important contributions of emergentist approaches to language inquiry. We shall show that self-organization is a determinant of lexical competence and that emergent morphological patterns do play an important role in word processing and acquisition.

1 In fact, the sheer number of leading forms which need to be learned for a speaker to be able to master an entire paradigm is not a defining feature of abstractive approaches. In simple inflectional systems, a single form may well be sufficient to produce all remaining forms of the same paradigm. The computational question of how many different forms are needed to predict the entire range of formal variation expressed by a specific paradigm or by a class of morphologically related paradigms has lately been addressed on either information-theoretic (conditional entropy of cells and paradigms, Moscoso del Prado Martín et al. ) or set-theoretic grounds (see Stump and Finkel’s contribution to the present volume).


It could be tempting to suggest that the abstractive and constructive views just represent two complementary modes of word processing: (i) a production mode for the constructive perspective, assembling sub-lexical bits into lexical wholes, and (ii) a recognition mode for the abstractive perspective, splitting morphologically complex words into their most recurrent constituent morphs. Blevins (2006) shows that this is not true empirically, as the two approaches lead to very different implications (such as the ‘abstractive’ possibility for a word form to have multiple bases or ‘split-base effect’) and allow for a different range of representational statements (e.g. the sharp separation between actual data and abstract patterns arising from data in the constructive approach).
In this chapter, we intend to show that the two views are also computationally different. We contend that a number of problems arising in connection with a sub-symbolic implementation of the constructive view are tackled effectively, or disappear altogether, in a neurally inspired implementation of the associative framework, resting on key notions such as self-organization and emergence. In particular, we will first go over some algebraic prerequisites of morphological processing and acquisition that, according to Marcus (2001), perceptron-like neural networks are known to meet to a limited extent only (section 8.2). A particular variant of Kohonen’s Self-Organizing Maps (2001) is then introduced to explore and assess the implications of a radically abstractive approach in terms of its computational complexity (section 8.3). Experimental data are also shown to provide a quantitative assessment of these implications (section 8.4). Finally, some general conclusions are drawn (section 8.5).

. The algebraic basis of constructive morphology
Marcus (2001) introduces a number of criterial properties defining any descriptively adequate morphological system.
First, the system must have a way to distinguish variables from instances, to be able to make such a simple statement as ‘play is an English verb stem’, where play is a constant and stem is a variable which can be assigned any English verb stem. As we shall see, such a distinction is essential to capture the intuitive notion of repetition of the same symbol within a string, as when we say that the string pop contains two instances (tokens) of the same p (type).
Second, a proper morphological system must avail itself of the formal means to represent abstract relationships between variables. Even the most straightforward default rule-like statement such as the following

(1) PROGR(Y) = Y + ing,

meaning that a progressive verb form in English is built by affixing ing after any verb stem Y, requires specification of two relations: (i) an identity relation between the

 Vito Pirrelli, Marcello Ferro, and Claudia Marzi lexical root Y to the left of ‘=’andthestemY in the right-hand side of the equation,2 and (ii) the ordering relation between the stem and the progressive inflectional ending -ing. Third, there must be a way to bind a particular instance (a constant value) to a given variable. For example, Y in (1) can be assigned a specific value (e.g. the individual stem play to yield the form playing), as the output of the concatenation operation expressed by ‘+’. Fourth, the system must be able to apply operations to arbitrary instances of a variable. For example, a concatenation operation must be able to combine ing withanyinputvalueassignedtoY,unlessY’s domain is otherwise constrained. Finally, the system must have a way to extract relations between variables like (1), on the basis of training examples (walking, sleeping, playing,etc.).Marcusgoes on to argue that any morphological system must be able to meet these prerequisites, irrespective of whether it is implemented symbolically as a rule-based processor, or sub-symbolically, as a neurally inspired artificial network. In fact, traditional connectionist models of morphological competence are taken by Marcus to fail to meet at least some of these prerequisites. What is called by Rosenblatt (1962: 73) ‘a sort of exhaustive rote-learning procedure’, in which every single case of a general relation must be learned individually, appears to be an essential feature of perceptron- like artificial neural networks. Such a characteristic behaviour extends to the family of multi-layered perceptrons classically used to simulate morphology learning. The consequences of this limitation are far-reaching, as detailed in the ensuing sections.
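To make these prerequisites concrete, the following Python sketch (ours, not part of the original discussion; all names are illustrative) spells out what a symbolic implementation of rule (1) involves: a variable ranging over stems, a rule stated over that variable, and the binding of arbitrary constants to it.

```python
# A minimal sketch (not from the chapter) of the symbolic machinery Marcus
# requires: a variable over stems, a rule stated over that variable, and
# binding of arbitrary constants to it.

def progressive(stem: str) -> str:
    """Rule (1): PROGR(Y) = Y + ing, for any value bound to the variable Y."""
    return stem + "ing"      # identity relation on Y, then ordered concatenation

# Binding particular instances (constants) to the variable Y:
for y in ["play", "walk", "sleep"]:
    print(y, "->", progressive(y))   # play -> playing, walk -> walking, ...
```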

8.2.1 Binding

In a multi-layered perceptron, a variable (or type)—say the letter p—can be represented by a single node or, alternatively, by a distributed pattern of nodes, on the input layer. A single instantiation of a p in a string like put is encoded on the input layer by activating the corresponding node(s). We say that an instance of p is bound to its type node(s) through that node’s activation. However, for a string like pop to be properly encoded, two instances of p must be distinguished—say p_1 (or p in first position) and p_3 (or p in third position). Activating the same p node twice would not do, as the whole string is encoded as an integrated activation pattern, where all nodes representing input letters are activated concurrently. Logically, the problem amounts to binding the input layer node (the letter type) to a specific individual, a letter token. When it comes to representing words, a token can be identified uniquely by specifying its type together with the token’s position in the word string. In this context, the solution to the binding problem amounts to assigning a specific order relationship to a given type. A traditional strategy for doing this with neural networks is known as conjunctive coding.
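The binding problem can be illustrated with a minimal sketch of our own: if a word is encoded merely by which letter-type nodes it activates, with no binding of tokens to positions, formally distinct strings collapse onto the same activation pattern.

```python
# Sketch (ours) of the binding problem: if a word is encoded only by which
# letter-type nodes are active, strings that reuse the same letters in
# different arrangements become indistinguishable.

def active_type_nodes(word: str) -> frozenset:
    """Concurrent activation of one node per letter type, with no position."""
    return frozenset(word)

print(active_type_nodes("pop"))                              # frozenset({'p', 'o'})
print(active_type_nodes("pop") == active_type_nodes("opp"))  # True: the two tokens of
                                                             # 'p' are not bound apart
```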

2 Computationally, this can be implemented by introducing a copy operator, replicating the content of a given memory register into another register, irrespective of the particular nature of such content. As we shall see later on, this is not the only possible computational interpretation of equation (1).


8.2.2 Coding

In conjunctive coding (Coltheart et al. 2001, Harm and Seidenberg 1999, McClelland and Rumelhart 1981, Perry et al. 2007, Plaut et al. 1996), a word form like pop is represented through a set of context-sensitive nodes. Each such node ties a letter to a specific serial position (e.g. {P_1, O_2, P_3}), as in so-called positional coding or, alternatively, to a specific letter cluster (e.g. {_Po, pOp, oP_}), as is customary in so-called Wickelcoding. Positional coding makes it difficult to generalize knowledge about phonemes or letters across positions (Plaut et al. 1996, Whitney 2001) and to align positions across word forms of differing lengths (Davis and Bowers 2004), as with book and handbook. The use of Wickelcoding, on the other hand, while avoiding some problems of positional coding, causes an acquisitional deadlock. Speakers of different languages are known to exhibit differential sensitivity to symbol patterns. If such patterns are hard-wired in the input layer, the same processing architecture cannot be used to deal with languages exhibiting differential constraints on sounds or letters.
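The two coding schemes can be made concrete with a small illustrative sketch of ours (the helper functions are hypothetical, not taken from the models cited above):

```python
# Illustrative sketch (not the authors' code) of the two conjunctive coding
# schemes discussed above, applied to the form 'pop'.

def positional_code(word: str) -> set:
    """Tie each letter to its serial position, e.g. P_1, O_2, P_3."""
    return {f"{ch.upper()}_{i}" for i, ch in enumerate(word, start=1)}

def wickel_code(word: str) -> set:
    """Tie each letter to its immediate neighbours (Wickelcoding)."""
    padded = "_" + word + "_"
    return {padded[i - 1] + padded[i].upper() + padded[i + 1]
            for i in range(1, len(padded) - 1)}

print(positional_code("pop"))   # {'P_1', 'O_2', 'P_3'} (order may vary)
print(wickel_code("pop"))       # {'_Po', 'pOp', 'oP_'} (order may vary)

# The alignment problem of positional coding: 'book' and 'handbook' share no
# position-bound units, even though they share the substring 'book'.
print(positional_code("book") & positional_code("handbook"))   # set()
```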

8.2.3 Relations and the problem of variables

Consider the problem of providing a neurally inspired version of the relation expressed in (1). The equation conveys an identity mapping between the lexical base in PROGR(Y) and the verb stem Y to the right of ‘=’. A universally quantified relation of this kind is common to a variety of default mapping relations holding in the morphologies of virtually any language. Besides, this sort of free generalization seems to be a fundamental feature of human classificatory behaviour. Straightforward linguistic evidence that people freely generalize universally quantified one-to-one mappings comes from morphological reduplication (or immediate repetition), as found in Indonesian pluralization, whereby the plural of buku (‘book’) is buku-buku. Reduplication patterns like this have quite early roots in language development. Seven-month-old infants are demonstrably able to extract an underlying abstract structure from two minutes’ exposure to made-up syllabic sequences cast into an AAB pattern (ga-ga-na, li-li-ti, etc.) and to tell this habituation structure from other test patterns containing repetitions in different positions (e.g. ABB) (Saffran et al. 1996, Marcus et al. 1999, Gerken 2006).

A one-to-one mapping function can easily be implemented in a network that uses one node to represent each variable (Figure 8.1, left). The model can freely generalize an identity relation by setting the connection weight to 1. However, it is not at all clear how a one-node-per-variable solution can represent the highly non-linear types of morphological exponence abundantly documented in the literature (see Anderson, section 2.2, this volume for a thorough discussion), such as stem-internal vowel alternation in base/past-tense English pairs such as ring–rang, bring–brought, and hang–hung, where different mapping functions are required depending on both the vowel and its surrounding context. A descriptively more adequate solution is provided by a many-nodes-per-variable perceptron, where each node represents a specific value assigned to each variable, and different connections emanating from the same node can either enforce an identity relation (Figure 8.1, centre) or make room for other mapping relations, with hidden layer units accounting for inter-node influences (Figure 8.1, right). Although more flexible, a many-nodes-per-variable perceptron, when trained by back-propagation, can learn a universally quantified one-to-one mapping relation only if it sees this relation illustrated with respect to each possible input and output node (Marcus 2001). Back to Figure 8.1 (centre), a perceptron trained on ‘8’, ‘4’, and ‘2’ nodes only is not in a position to extend an identity relation to an untrained ‘1’ node. An implication of this state of affairs is that a multi-layered perceptron can learn the relation expressed by equation (1) only for strings (in the domain of Y) that fall entirely in the perceptron’s training space.

Figure 8.1 One-to-one and one-to-many relations in one-node-per-variable and many-nodes-per-variable neural networks.

8.2.4 Training independence

Technically, the problem of dealing with relations over variables is a consequence of the back-propagation algorithm classically used to train a multi-layered perceptron. Back-propagation consists in altering the weights of connections emanating from an activated input node, for the level of activation of output nodes to be attuned to the expected output. According to the delta rule in equation (2), connections between the jth input node and the ith output node are in fact changed in proportion to the difference between the target activation value ĥi of the ith output node and the actually observed output value hi:

(2) Δwi,j = γ (ĥi − hi) xj


where wi,j is the weight of the connection between the jth input node and the ith output node, γ is the learning rate, and xj is the activation of the jth input node. If xj is null, the resulting change Δwi,j is null. In other words, back-propagation never alters the weights of connections emanating from an input node in Figure 8.1 (centre) if the node is never activated in training. This is called by Marcus (2001) training independence. Interposition of a layer of hidden units mediating input–output mapping (Figure 8.1, right) does not remedy this. There is no direct way in which a change in the connections feeding the output node ‘1’ can affect the connections feeding a different output node.
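The following sketch, ours rather than Marcus’s, applies the delta rule in (2) to the identity-mapping scenario of Figure 8.1 (centre): a network trained to map the ‘8’, ‘4’, and ‘2’ nodes onto themselves never learns anything about the untrained ‘1’ node, because weights emanating from an inactive input node are never updated.

```python
# A small sketch (ours, not the authors') of the delta rule in (2) and of the
# 'training independence' it entails: weights from an input node that is never
# activated during training are never updated.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 4                         # nodes stand for the values 8, 4, 2, 1
W = rng.normal(0.0, 0.1, (n_out, n_in))    # W[i, j]: weight from input j to output i
gamma = 0.5                                # learning rate

def train_identity(W, pairs, epochs=200):
    for _ in range(epochs):
        for x, target in pairs:
            h = W @ x                              # observed output activations
            W += gamma * np.outer(target - h, x)   # delta rule: Δw_ij = γ(ĥ_i - h_i)x_j
    return W

# Train the identity mapping only on nodes '8', '4', '2' (indices 0-2);
# node '1' (index 3) is never activated in training.
eye = np.eye(n_in)
W = train_identity(W, [(eye[i], eye[i]) for i in range(3)])

print(np.round(W @ eye[0], 2))   # ~[1, 0, 0, 0]: trained node generalizes
print(np.round(W @ eye[3], 2))   # still small random values: the untrained node
                                 # inherits no identity mapping
```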

8.2.5 Preliminary discussion

In many respects, multilayer perceptrons are taken to provide a basic neurally inspired associationist mechanism for morphology processing and acquisition. Much of their success history is due to the methodological allure of their simplicity. It seems reasonable to investigate the inherent representational and procedural limitations of this class of models to understand more of the task of word learning and processing and its basic computational components. Among the main liabilities of multilayer perceptrons, the following ones strike us as particularly critical in the light of our previous discussion.

Training locality. Due to training independence, training one node does not transfer to another node. There is no way to propagate information about connections between two (input and output) nodes to another pair of such nodes. This is not only detrimental to the development of abstract open-ended relations holding over variables (as opposed to pair-wise memorized relationships between specific instances), but also to the modelling of word paradigms as complex morphological interfaces. Over the last fifteen years, considerable evidence has accrued on the critical role of paradigm-based relations as an order-principle imposing a non-local organizing structure on word forms memorized in the speaker’s mental lexicon, facilitating their retention, accessibility, and use, while permitting the spontaneous production and analysis of novel words. A number of theoretical models of the mental lexicon have been put forward to deal with the role of these global constraints in (i) setting an upper bound on the number of possible forms a speaker is ready to produce (Carstairs and Stemberger 1988), (ii) accounting for reaction times in lexical decision and related tasks (Baayen et al. 1997, Orsolini and Marslen-Wilson 1997, and others), (iii) explaining production errors by both adults and children (Bybee and Slobin 1982, Bybee and Moder 1983, Orsolini et al. 1998), and (iv) accounting for human acceptability judgements and generalizations over nonce verb stems (Say and Clahsen 2002, Albright 2002). We believe that computational models of morphology acquisition and processing have something to say about all these issues. The basic challenge is how it is possible for such global paradigm-based constraints to emerge on the basis of local learning steps (Pirrelli et al. 2004).


Coding. Another important related point has to do with the nature of input representations. It has often been suggested that distributed representations may circumvent the connectionist failure to encode universally quantified variables (Elman 1998). If the same pool of nodes is used to encode different input patterns, then at least some of the mapping relations holding for one pattern will also obtain for the other ones, thus enforcing a kind of generalization. For example, if each input node represents not a word but a part of a word, then training one word has an impact on nodes that are part of the representation of other words. However, distributed representations can be of help only if the items to which one must generalize share that particular contrast (pool of nodes) on which the model was trained. If the model was trained on a different contrast than the one encoded in the test item, generalization to the test item will fail. More seriously, distributed representations can raise unnecessary coding ambiguity if each individual is encoded as an activation pattern over the pool of nodes corresponding to its variable. If two individuals are activated simultaneously, the resulting joint activation pattern can make it difficult to disentangle one individual from the other. In fact, the representational capacity of the network to uniquely bind a particular individual decreases with the extent of distribution. One extreme is a purely localist representation, with each input node coding a distinct individual. The other extreme is the completely distributed case, where each of the 2^N binary activation patterns over N nodes represents a distinct individual. In this case, no two patterns can be superimposed without spuriously creating a new pattern.

Alignment. Lack of generalization due to localist encoding is an important issue in acquisition of morphological structure, as it correlates with human perception of word similarity. The problem arises whenever known symbol patterns are presented in novel arrangements, as when we are able to recognize the English word book in handbook, or the shared root mach in German machen and gemacht. Conjunctive coding of letters is closer to the localist extreme, anchoring a letter either to its position in the string, or to its surrounding context. Since languages wildly differ in the way morphological information is sequentially encoded, ranging from suffixation to prefixation, circumfixation, apophony, reduplication, interdigitation, and combinations thereof, alignment of lexical roots in three diverse pairs of paradigmatically related forms like English walk–walked, Arabic kataba–yaktubu (‘he wrote’–‘he writes’) and German machen–gemacht (‘make’–‘made’ past participle), requires substantially different encoding strategies. If we wired any such strategy into lexical representations (e.g. through a fixed templatic structure separating the lexical root from other morphological markers) we would in fact slip morphological structure into the input, making input representations dependent on languages. This is not wrong per se, but it should be the outcome, not a prerequisite of lexical acquisition. A cognitively plausible solution is to let the processing system home in on the right sort of encoding strategy through repeated exposure to a range of language-specific families of morphologically related words. This is what conjunctive coding in classical connectionist architectures cannot do.


For example, in German machen and gemacht the shared substring mach would be indexed to different time positions. In Arabic, the set of Wickelcodes {_Ka, kAt, aTa, tAb, aBa, bA_} encoding the perfective form kataba turns out to have an empty intersection with the set {_Ya, yAk, aKt, kTu, tUb, uBu, bU_} for the imperfective form yaktubu.

Relations between variables and the copy operator. The apparent need for algebraic variables and open-ended relationships between variables in morphological rules such as (1) makes the computational problem of implementing copying mechanisms centre-stage in word processing. This is illustrated by the two-level transducer of Figure 8.2, implementing a classical constructive approach to word parsing and generation. The input tape represents an input memory register containing an abstract representation of the Italian form vengono (‘they come’), with ven representing the verb root, ‘V3’ its conjugation class, and ‘3p_pres_ind’ the feature specification ‘third person plural of present indicative’ (‘+’ is a morpheme boundary). The string resulting from the application of the finite state transducer to the input string is shown on the output tape. The mapping relationship between characters on the two tapes is represented by the colon (‘:’) operator in the centre box, with input characters appearing to the left of ‘:’, and the corresponding output characters to its right. When no colon is specified, the input letter is intended to be left unchanged. Note that ‘ε’ stands for the empty symbol. Hence the mapping ‘[+ : ε]’ reads ‘delete a marker of morpheme boundary’. Most notably, transductive operations of this kind define declarative, parallel, and bi-directional relationships. We can easily swap the two tapes and use, for word parsing, the same transducer designed for word generation.3

Multi-layered perceptrons seem to have little to match such a rich weaponry of formal tools and computational operations. Nonetheless, it should be appreciated that multiple copying of unbounded strings is an operation that goes beyond the computational power of finite-state transducers such as the one in Figure 8.2. This is an immediate consequence of the fact that states represent the order of memory of a transductive automaton. For an automaton to be able to replicate an exact copy of any possible string, irrespective of its length, the number of available states needed to precompile all copies is potentially unbounded. As pointed out by Roark and Sproat (2007), if fully reduplicative morphological phenomena require potentially unbounded copying, they are probably among the very few morphological phenomena which are not in the purview of finite-state processing devices.

It seems natural to establish a parallelism between the two-level architecture in Figure 8.2 and the multi-layered perceptron of Figure 8.1 (right).

3 Due to the hybrid status of lexical representations in two-level automata, where both surface and abstract morphological information is combined on the same level, abstract features (e.g. conjugation class) must be linearly arranged and interspersed with phonological information. This may strike the reader as rather awkward, and other neater solutions have been entertained in the literature. However, as argued by Roark and Sproat (2007), they do not add up to the computational power of two-level finite state machines.


Figure 8.2 A two-level finite state transducer for the Italian irregular form vengono ‘they come’ (input tape: ven + V3 + 3p_pres_ind; output tape: vengono).

In both cases, we have an input and an output representation layer, which could be likened to digital registers. In both cases, mapping relations enforce the requirement that at least part of the representation in either layer is also found in the other layer. It is also tempting to conceptualize such a relation as a copy operation. However, although this is indeed the case for finite-state transducers, it is not literally true of multi-layered perceptrons, where data copying is implemented as weight adjustment over inter-layer connections. According to Marcus, this is a fundamental liability of multi-layered perceptrons. Since they cannot implement unbounded copying, he argues, they are not suitable computer models of morphology learning. In fact, we observed that universal copying is not a problem for perceptrons only. Finite-state transducers are in principle capable of dealing with multiple copying, as long as the number of strings in the domain of the copy operation is bounded. This is somewhat reminiscent of the local constraint on weight adjustments in multi-layered perceptrons, where the domain of identity mapping must be defined with respect to each possible input and output node. Admittedly, the two limitations are of a different nature and are due to different reasons. Nonetheless, it is important to appreciate that universal copying is a thorny computational issue, and its necessity for word processing and learning should be argued for with care.

8.2.6 Interim summary

In this section, we have been through a number of criticisms levelled at classical connectionist networks dealing with issues of word processing with no symbolic manipulation of typed variables. In the remainder of this chapter, we will make the suggestion that many of the liabilities pointed out by Marcus in connection with

multi-layered perceptrons in fact apply to a specifically constructive approach to morphological processing. In particular, they are due to (i) the idea that morphological inflection and derivation require a formal mapping from lexical bases to full forms, and (ii) the assumption that underlying bases and inflected forms are represented on two distinct layers of connectivity (or, equivalently, in two distinct registers). Such a theoretically loaded conceptualization makes computational models of inflection and derivation unnecessarily complex. In the following sections we propose an alternative neurally inspired view of word processing and learning that is strongly committed to an abstractive perspective on these and related issues. The approach amounts to a re-interpretation of equation (1) in terms of computational operations other than copying and concatenation. It seems likely that an adequate account of morphology acquisition will have a place both for memory mechanisms that are sensitive to frequency and similarity, and for processing operations that apply to all possible strings. A fundamental assumption of the computational framework proposed here is that both aspects, lexical memory and processing operations, are in fact part and parcel of the same underlying computational mechanism. This mechanism is considerably simpler than advocates of algebraic symbol manipulation are ready to acknowledge, and points in the direction of a more parsimonious use of variables and universally quantified one-to-one mappings. This is not to imply that these notions play no role in morphology learning or language learning in general, but that their role in word learning and processing should be somewhat qualified.

8.3 A computational framework for abstractive morphology

Kohonen’s Self-Organizing Maps (SOMs) (Kohonen 2001) define a class of unsupervised artificial neural networks that mimics the behaviour of small aggregations of neurons (pools) in the cortical areas involved in the classification of sensory data (brain maps). In such aggregations, processing consists in the activation of specific neurons upon presentation of a particular stimulus. A distinguishing feature of brain maps is their topological organization (Penfield and Roberts 1959): nearby neurons in the map are activated by similar stimuli. There is evidence that at least some aspects of their neural connectivity emerge through self-organization as a function of cumulated sensory experience (Kaas et al. 1983). Functionally, brain maps are thus dynamic memory stores, directly involved in input processing, exhibiting effects of dedicated long-term topological organization.

Topological Temporal Hebbian SOMs (hereafter TSOMs; Ferro et al. 2011, Marzi et al. 2012c) represent a variant of classical SOMs making room for memorizing time series of symbols as activation chains of map nodes. This is made possible by a level of temporal connectivity (or temporal connection layer, Figure 8.3, left), implemented as a pool of re-entrant connections providing the state of activation of the map at the immediately preceding time tick. Temporal connections encode the map’s probabilistic expectations of upcoming events on the basis of past experience. In the literature, there have been several proposals of how this idea of re-entrant temporal connections can be implemented (see Voegtlin 2002 and references therein). In this chapter, we will make use of one particular such proposal (Koutnik 2007, Pirrelli et al. 2011).

Figure 8.3 Left: Outline architecture of a TSOM. Each node in the map is connected with all nodes of the input layer. Input connections define a communication channel with no time delay, whose synaptic strength is modified through training. Connections on the temporal layer are updated with a fixed one-step time delay, based on activity synchronization between BMU(t-1) and BMU(t). Right: A two-dimensional 20 × 20 TSOM, trained on German verb forms, showing activation chains for gelacht and gemacht, represented as the input strings #,G,E,L,A,C,H,T,$ and #,G,E,M,A,C,H,T,$ respectively. See text for more details.

In language, acoustic symbols are concatenated through time to reach the hearer as a time-bound input signal. With some qualifications, written words can also be defined as temporal patterns, conceptualized as strings of letters that are produced and input one letter at a time, under the assumption that time ticks are sampled discretely upon the event of letter production/presentation. In our view, this abstract conceptualization highlights some peculiar (and often neglected) aspects of the linguistic input and provides an ecological setting for dealing with issues of language production, recognition, and storage at an appropriate level of abstraction. TSOMs are designed to simulate and test models of language processing targeted at this level of abstraction.

8.3.1 Input protocol: peripheral encoding

In our idealized setting, each word form is represented by a time-series of symbols (be they phonological segments or printed letters) which are administered to a TSOM by encoding one symbol at a time on the input layer (Figure 8.3, left).

The input layer is implemented as a binary vector whose dimension is large enough for the entire set of symbols to be encoded, with one pattern of bits uniquely associated with each symbol. Patterns can partially overlap (distributed representations) or can be mutually orthogonal (localist representations). In all experiments presented here, we adopted orthogonal codes. A whole word is presented to a map starting with a start-of-word symbol (‘#’) and ending with an end-of-word symbol (‘$’). The map’s re-entrant connections are initialized upon presentation of a new word. The implication of this move is that the map’s activation state upon seeing the currently input word form has no recollection of past word forms. Nonetheless, the map’s overall state is affected by previously shown words through long-term learning, as detailed in what follows.
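A minimal sketch (ours; the symbol inventory and vector sizes are arbitrary) of this peripheral encoding protocol:

```python
# A minimal sketch (ours) of the peripheral encoding protocol: each word is a
# time series of symbols delimited by '#' and '$', and each symbol is encoded
# as a mutually orthogonal (one-hot) binary vector on the input layer.

import numpy as np

ALPHABET = ['#', '$'] + [chr(c) for c in range(ord('a'), ord('z') + 1)]
INDEX = {sym: i for i, sym in enumerate(ALPHABET)}

def one_hot(symbol: str) -> np.ndarray:
    v = np.zeros(len(ALPHABET))
    v[INDEX[symbol]] = 1.0
    return v

def word_as_time_series(word: str):
    """Yield one input vector per time tick, start-of-word first, end-of-word last."""
    for symbol in ['#'] + list(word) + ['$']:
        yield symbol, one_hot(symbol)

for t, (sym, vec) in enumerate(word_as_time_series("pop")):
    print(t, sym, int(vec.sum()))   # exactly one active bit per time tick
```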

8.3.2 Parallel activation: recoding

Upon presentation of one symbol on the input layer all map nodes are activated simultaneously through their input (or spatial) connections. Their resulting level of activation at time t, hi(t), is also determined by the contribution of re-entrant connections, defined over the map’s temporal layer, which convey information about the state of the map’s activation at time t-1, according to the following equation:

(3) hi(t) = α · hS,i(t) + β · hT,i(t)

Equation (3) defines the level of activation of the ith node at time t as a weighted summation of the contribution hS,i of the spatial layer and the contribution hT,i flowing through the map’s temporal layer. Given (3), we define the map’s Best Matching Unit at time t (hereafter BMU(t)) as the node showing the highest activation level at time t. BMU(t) represents (a) the map’s response to input, and (b) the way the current input stimulus is internally recoded by the map. As a result, after a string of letters is presented to the map one character at a time, a temporal chain of BMUs is activated. Figure 8.3 (right) illustrates two such temporal chains, triggered by the German verb forms gelacht (‘laughed’, past participle) and gemacht (‘made’, past participle), which are input as the series #,G,E,L,A,C,H,T,$ and #,G,E,M,A,C,H,T,$ respectively. In the figure, each node is labelled with the letter the node gets most sensitive to after training. Pointed arrows represent temporal connections linking two consecutively activated BMUs. They thus depict the temporal sequence of symbol exposure (and node activation), starting from the symbol ‘#’ (anchored in the top left corner of the map) and ending with ‘$’. Temporal BMU chains of this kind represent how the map recodes input words.
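The following sketch of ours renders equation (3) and BMU selection in code. The weight matrices are random placeholders, and the spatial term is simplified to a dot product between the input vector and each node’s input connections; the actual model computes both contributions from trained connection weights.

```python
# Sketch (ours) of equation (3) and of BMU selection: the activation of node i
# at time t combines a spatial term (match with the current input vector) and
# a temporal term (expectation propagated from the previous BMU). Weight
# matrices are random placeholders, not trained values.

import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_input = 400, 28                  # e.g. a 20x20 map, 28 input symbols
alpha, beta = 1.0, 1.0                      # relative weights of the two terms

W_spatial = rng.random((n_nodes, n_input))  # input (spatial) connections
W_temporal = rng.random((n_nodes, n_nodes)) # re-entrant (temporal) connections

def activation(x, prev_bmu):
    h_spatial = W_spatial @ x                      # h_S,i(t): match with the input
    h_temporal = W_temporal[:, prev_bmu]           # h_T,i(t): expectation from BMU(t-1)
    return alpha * h_spatial + beta * h_temporal   # equation (3)

def bmu(x, prev_bmu):
    """Best Matching Unit: the node with the highest activation at time t."""
    return int(np.argmax(activation(x, prev_bmu)))

x = np.zeros(n_input); x[3] = 1.0           # a one-hot encoded input symbol
print(bmu(x, prev_bmu=0))
```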

8.3.3 Long-term dynamic: training

In the learning phase, at each time step t, BMU(t) adjusts its connection weights over the spatial and temporal connection layers (Figure 8.3, left) and propagates adjustment to neighbouring nodes as an inverse function of their distance from BMU(t).


Adjustment makes connection weights closer to values in the input vector. This amounts to a topological specialization of individual nodes, which get increasingly sensitive to specific symbol types. Notably, specialization does not apply across the board. Due to the propagation function, adjustment is stronger for nodes that are closer to BMU(t). Moreover, the topological range of propagation shrinks as learning progresses. This means that, over time, the propagation wave gets shorter and shorter, reaching fewer and fewer nodes. The temporal layer presents a similar dynamic. Adjustment of Hebbian connections consists of two steps: (i) potentiate the strength of association from BMU(t-1) to BMU(t) (and its neighbouring nodes), and (ii) depress the strength of association from all other nodes to BMU(t) (and neighbouring nodes) (Figure 8.4, left). The two steps enforce the logical entailment BMU(t) → BMU(t-1) and the emergence of a context-sensitive recoding of symbols. This means that the same symbol type will recruit different BMUs depending on its preceding context in the input string. This is shown in Figure 8.3 (right) where the letter a in gemacht and gelacht activates two different nodes, the map keeping memory of their different left contexts.

Figure 8.4 Left: Topological propagation of long-term potentiation (solid lines) and long-term depression (dotted lines) of temporal re-entrant connections over two successive time steps. B-nodes indicate BMUs. Right: Word-graph representation of German past participles.

Interestingly, the Markovian order of the map memory is not limited to one character behind but can rise as the result of dynamic recoding. If space on the map allows, the different BMUs associated with a in gemacht and gelacht will, in turn, select different BMUs for the ensuing c, as shown in Figure 8.3. Were nodes trained independently, the map would thus exhibit a tendency to dedicate distinct activation chains to distinct input strings (memory resources allowing). This happens if the topological range of the propagation function for temporal connections goes down to 0. In this case, the map can train temporal connections to each single node independently of connections to neighbouring nodes. However, if the topological range of the propagation function for temporal connections does not get to 0, adjustments over temporal connections will transfer from one node to its neighbouring nodes (Figure 8.4, left). In this condition, activation chains triggered by similar words will tend to share nodes, reflecting the extent to which the map perceives them as identical. The topological distance between chains of activated BMUs responding to similar input strings tells us how well the map is aligning the two strings.
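A simplified sketch of ours of one such training step, combining the distance-modulated adjustment of input connections with Hebbian potentiation and depression of temporal connections (the learning rate, the Gaussian kernel, and the convention that W_tmp[i, j] encodes the connection from node j to node i are our own assumptions, not the authors’ parameter settings):

```python
# Sketch (ours, simplified) of one training step as described above: BMU(t)
# moves its input weights towards the current input vector, the adjustment is
# propagated to topological neighbours with a distance-based kernel, and the
# temporal connection from BMU(t-1) is potentiated while connections from all
# other nodes are depressed.

import numpy as np

rng = np.random.default_rng(2)
side, n_input = 20, 28
n_nodes = side * side
W_in = rng.random((n_nodes, n_input))      # spatial (input) connections
W_tmp = rng.random((n_nodes, n_nodes))     # temporal connections: W_tmp[i, j] = j -> i
coords = np.array([(i // side, i % side) for i in range(n_nodes)])  # node positions

def neighbourhood(bmu, radius):
    """Gaussian kernel over the topological distance from the BMU."""
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * radius ** 2))

def train_step(W_in, W_tmp, x, bmu_t, bmu_prev, lr=0.1, radius=2.0):
    g = neighbourhood(bmu_t, radius)                  # propagation wave around BMU(t)
    # Spatial layer: pull weights of BMU(t) and its neighbours towards the input.
    W_in += lr * g[:, None] * (x - W_in)
    # Temporal layer: potentiate BMU(t-1) -> BMU(t) (and neighbours), depress
    # connections reaching BMU(t) and neighbours from every other node.
    W_tmp[:, bmu_prev] += lr * g * (1.0 - W_tmp[:, bmu_prev])
    mask = np.ones(n_nodes, dtype=bool); mask[bmu_prev] = False
    W_tmp[:, mask] -= lr * g[:, None] * W_tmp[:, mask]

x = np.zeros(n_input); x[5] = 1.0
train_step(W_in, W_tmp, x, bmu_t=210, bmu_prev=45)
```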

8.4 Implications of TSOMs for word processing and storage

TSOMs are dynamic models of string storage. They can be seen as mimicking the self-organizing behaviour of a mental lexicon growing in size as a function of cumulated lexical experience. The more word forms are stored in a TSOM through lexical exposure, the more complex its overall internal structure becomes, with recurrent patterns of morphological structure being internally recoded as shared chains of node activation. This is shown in Figure 8.4 (right) (adapted from Marzi et al. 2012a), where the activation chains for the German past participles gesprochen (‘spoken’), gesehen (‘seen’), gesagt (‘said’), and gefragt (‘asked’) are unfolded and vertically arranged in a word graph.

It is important to appreciate that this behaviour is the outcome of an internally generated dynamic, not a function of either direct or indirect supervision. The map is in no way taught what the expected morphological structure of input word forms is. Structure is an emergent property of self-organization and a function of exposure to input representations. In a strict sense, TSOMs offer no output representations. In a broader sense, input and output representations co-exist on the same layer of map circuitry. It is the level of node connectivity that changes through learning. Output morphological structure can thus be defined as an overall patterning of long-term connectivity. Such a radical change of perspective has important implications both theoretically and computationally.

8.4.1 Coding

A fundamental difference between TSOMs and traditional perceptrons is that in TSOMs we distinguish among levels of input coding. In particular, note that the input vector, peripherally encoded on the input layer in Figure 8.3 (left), is eventually recoded on the map proper. It is the latter level that provides the long-term representation upon which morphological structure is eventually perceived.4 Since recoding is the result of learning, different cumulated input patterns may lead to different recoded representations. By neglecting the important distinction between levels of input coding, perceptrons exhibit a principled difficulty in dealing with input representations. Conjunctive coding can naturally deal with issues of input specificity if two conditions are met: (i) there is a topological level of encoding where two nodes that are sensitive to the same symbol type share identical internal representations; and (ii) context-sensitive specialization on symbol tokens is acquired, not wired-in. In TSOMs, both conditions are met: (i) nodes that are selectively sensitive to the same symbol type have overlapping representations on the spatial layer, and (ii) nodes become gradually sensitive to time-bound representations through specialization of their temporal connections. Hence, different input instances of the same symbol receive differently recoded representations, thus effectively dealing with variable binding when the same symbol type occurs more than once in the same input string.

8.4.2 Time vs Space: Differences in recoding strategies

A TSOM recoding strategy may reflect either a bias towards the local context where a symbol is embedded (i.e. its immediately preceding symbols), or a tendency to conjunctively encode the symbol and its position in the string. Choice of either strategy is governed by the relative prominence of parameters α and β in equation (3), with α modulating the map’s sensitivity to symbol encoding on the input layer, and β weighing up the map’s expectation of upcoming symbols on the basis of past input. The two recoding variants (hereafter dubbed spatio-temporal and temporal respectively) have an impact on the overall topology of the map. In a spatio-temporal map, topologically neighbouring nodes tend to be sensitive to symbol identity, with sub-clusters of nodes being selectively sensitive to context-specific instances of the same symbol. Conversely, in a temporal map neighbouring nodes are sensitive to different symbols taking identical positions in the input string (Marzi et al. 2012a). This is shown in Figure 8.5, plotting the average topological dispersion of BMUs activated by symbols in the 1st, 2nd, ..., 9th position in the input string on a spatio-temporal (upper plot) and a temporal map (lower plot). Dispersion is calculated as the average distance between nodes discharging upon seeing any symbol appearing in a certain position in the input string, and is given as a fraction of 100 over the map’s diagonal.

4 This is in fact confirmed by what we know about levels of letter representation in the brain, ranging from very concrete location-specific patterns of geometric lines in the occipito-temporal area of the left hemisphere, to more frontal representations abstracting away from physical features of letters such as position in the visual space, case, and font (Dehaene 2009).


Figure 8.5 Topological dispersion of symbols on temporal and spatio-temporal maps, plotted by their position in input words (positions 1–9). Dispersion values are given as a fraction of 100 over the map’s diagonal.

Figure . Topological dispersion of symbols on temporal and spatio-temporal maps, plotted by their position in input words. Dispersion values are given as a fraction of 100 over the map’s diagonal. node, this is true of the start-of-word symbol ‘#’) to 100 per cent, when two nodes lie at the opposite ends of the map’s diagonal. Marzi et al. (2012b) show that the two recoding strategies exhibit a differential behaviour. Due to their greater sensitivity to temporal ordering, temporal maps are better at recalling an input string by reinstating the corresponding activation chain of BMUs in the appropriate order, when the input string is already familiar to the map. This is due to temporal maps being able to build up precise temporal expectations through learning, and avoid possible confusion arising from repetition of the same symbols at different positions in the input string. On the other hand, spatio-temporal maps are better at identifying substrings that are shared by two or more word forms, even when the substrings occur at different positions in the input string. This seems to be convenient for acquiring word structure in non-templatic morphologies. For a TSOM to be able to perceive the similarity between—say—finden and gefunden,the pieces of morphological structure shared by the two forms must activate identical or topologically neighbouring nodes. Clearly, due to the combinatorial properties of morphological constituents, this is done more easily when recoded symbols are abstracted away from their position in time. We can measure the sensitivity of a map to a specific morphological constituent by calculating the topological distance between the chains of BMUs that are activated by the morphological constituent appearing in morphologically related words. For example, we can calculate how close on the map are nodes that discharge upon pre- sentation of a verb stem shared by different forms of the same paradigm. The shorter OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi


Figure 8.6 Alignment plots of the finden paradigm on a temporal (left) and a spatio-temporal (right) map (stem alignment plotted against full-form alignment).

Figure . Alignment plots of the finden paradigm on a temporal (left) and a spatio-temporal (right) map. theaveragedistanceis,thegreaterthemap’ssensitivitytothesharedmorphological structure. If D is such an average distance normalized as a fraction of the map’s diagonal (or dispersion value), then 1-D measures the alignment between activation chains discharging on that morphological constituent. Plots in Figure 8.6 show how well a temporal map (left) and a spatio-temporal map (right) perceive the alignment between the stem in each inflected form of German finden (‘find’)andthestemsinallotherformsofthesameverbparadigm.Stem alignment scores (y axis) are plotted against the perceived alignment of each full form with all other forms in the same paradigm (full-form alignment, x axis). Intercept values of the linear interpolation of alignment scores in the two maps show that a spatio-temporal map can align intra-paradigmatic forms better than a temporal map can. In particular, spatio-temporal maps prove to be less sensitive to shifts in stem position, as shown by the different alignment values taken by the form gefunden,which is perceived as an outlier in the temporal map. This is consistently replicated in the inter-paradigmatic perception of affixes. Box plots in Figure 8.7 show how scattered are activation chains that discharge upon seeing a few inflectional endings of Italian and German verbs (Marzi et al. 2012b). Activation chains are more widely scattered on temporal maps than they are on spatio-temporal maps. This is true of both German and Italian inflectional endings, confirming a systematic difficulty of temporal maps in perceiving the formal identity of inflectional endings that occur at different positions in word forms. OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi


Figure 8.7 Top: (Topological) dispersion values across activation patterns triggered by selected inflectional endings (Italian: -ARE, -ERE, -IRE, -NO, -I; German: -EN, -EM, -ER, -T, -ST) on temporal and spatio-temporal maps for Italian (left) and German (right) known word forms. Bottom: (Topological) dispersion values for the same set of inflections calculated on unknown word forms. In all panels, dispersion values are given as a fraction of 100 over the map’s diagonal.

It is worth emphasizing, in passing, that a TSOM identifies inflectional endings only on the basis of the formal contrast between fully inflected word forms. In German preterite forms, it is easier for a map to perceive a possible segmentation te-st by aligning glaubte with glaubtest, than the segmentation te-s-t on the basis of the contrast between glaubtest and glaubtet. The latter segmentation requires discontinuous alignment between the final t’s in the two forms, which is something the map does more reluctantly, and only under certain distributional conditions. This behaviour appears to be in keeping with Bank and Trommer’s ‘subanalysis complexity hypothesis’ (this volume).
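The point can be illustrated with a small sketch of ours that uses a longest-contiguous-match alignment (Python’s difflib, standing in here for the map’s own alignment behaviour) to contrast the two segmentations:

```python
# Sketch (ours) of the kind of purely formal contrast the map exploits: aligning
# glaubte with glaubtest supports a contiguous segmentation (shared prefix plus
# residue '-st'), whereas glaubtest vs glaubtet would require a discontinuous
# alignment of the final t.

import difflib

def contrast(a: str, b: str):
    """Return the shared blocks found by a longest-contiguous-match alignment."""
    blocks = difflib.SequenceMatcher(a=a, b=b).get_matching_blocks()
    return [a[m.a:m.a + m.size] for m in blocks if m.size > 0]

print(contrast("glaubte", "glaubtest"))    # ['glaubte']: residue '-st' is contiguous
print(contrast("glaubtest", "glaubtet"))   # ['glaubte', 't']: the final t can only be
                                           # matched discontinuously
```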

8.4.3 Training independence

Training independence is taken by Marcus to be responsible for perceptrons’ inability to generalize connectivity patterns to untrained nodes. As nodes are trained independently, nothing learned in connection with one specific node can be transferred to another node. This is not true of TSOMs, where learning requires topological propagation of adjusted spatial and temporal connections, modulated by the neighbourhood distance between each map’s node and the current BMU(t).


Figure 8.8 BMU activation chains for vediamo-vedete-crediamo on a 20 × 20 map (left) and their word-graph representation (right). In the graph, nodes that are strongly activated by the novel form credete are highlighted in grey. Topological proximity of D-nodes allows the map to bridge the transitional gap between cred- and -ete.

Topological propagation plays an important role in allowing the map to generalize over unseen activation chains. Figure 8.8 (left) shows the activation chains triggered by the Italian verb forms vediamo ‘we see’, vedete ‘you see’ (second person plural), and crediamo ‘we believe’, on a 20 × 20 map trained on all these forms. The same map is eventually prompted with the unknown credete ‘you believe’ (second person plural), to test its capacity to anticipate a novel form. The crucial generalization step here involves the circled area including D-nodes that are activated upon seeing stem-final d in either ved- or cred-. When credete is shown to the map, spread of temporal activation over the circled area will raise an expectation for -ete to follow, based on the activation chain of vedete. The word graph in Figure 8.8 (right) illustrates this. Topological propagation of temporal connections over neighbouring map nodes enforces inter-node training dependence, allowing expectations of upcoming symbols to be transferred from one activation chain to another, based on topological contiguity.

An important consequence of this generalization strategy is that inferences are made cautiously, on the basis of a local, analogy-based, inter-paradigmatic extension. This seems to be a logically necessary step to take for a learner to be able to capture the wide range of stem variation and stem selection phenomena attested in inflectional systems of medium and high complexity. Another interesting implication of the generalization strategy enforced by a TSOM is that inflectional endings are better aligned if they are immediately preceded by the same left context.


Figure 8.9 Correlation coefficients between alignment scores and recall scores for novel words, plotted by letter positions relative to the stem-ending boundary. Position order on the x axis is right to left: 0 corresponds to the first letter of the inflectional ending; negative values correspond to letters coming after the first letter of the inflectional ending, and positive values index letters preceding the inflectional ending. The plot shows that correlation rapidly decreases as an inverse function of the letter distance from the stem-ending boundary on either side.

Hence, not any analogy between two stems will work, but only a shared sub-string in the immediate context of the inflectional ending. This is shown by the graph of Figure 8.9, plotting the correlation between how accurately the map is recalling a novel input word, and how well the novel word’s activation chain is aligned with the closest activation chain of a known word. As we move away from the stem-ending boundary of the novel word on either side, correlation of alignment scores with recall accuracy becomes increasingly weaker and statistically less significant. This is exactly what Albright and Hayes’ Minimal Generalization algorithm would expect: alignment between nodes activated by a novel word and nodes activated by known words matters less as nodes lie further away from the morphemic boundary. Unlike Minimal Generalization, however, which requires considerable a priori knowledge about word structure, mapping rules, and the specific structural environment for their application, the generalization bias of TSOMs follows from general principles of topological memory and self-organization.

8.4.4 Universally quantified variables

According to Marcus, a default morphological relation like (1) (repeated in (4) for the reader’s convenience) requires the notion of a copy operation over a universally quantified variable. An abstractive computational model of morphological competence such as the one embodied by the TSOM of Figure 8.8 does not require a copy operation.

(4) PROGR(Y) = Y + ing

In fact, equation (4) only imposes the requirement that the base form and progressive form of a regular English verb share the same stem. In TSOMs, sharing the same stem implies that the same morphological structure is used over again. There is no computational implication that this structure is copied from one level of circuitry to another such level. This is well illustrated by the activation chains plotted in Figure 8.8 for the Italian verb forms vediamo and vedete, which appear to include the same word-initial activation chain.

Another interesting related issue is whether a TSOM can entertain the notion of an abstract morphological process such as prefixation. If a map A, trained on prefixed word forms, is more prone to recognize a word form with an unfamiliar prefix than another map B never trained on prefixed words, we can say that map A has developed an expectation for a word to undergo prefixation irrespective of the specific prefix. To ascertain this, we trained two maps on two sets of partially instantiated paradigms: the first map was exposed to fifty Italian verb paradigms, and the second map to fifty German verb paradigms. Each paradigm contained all present indicative forms, all past tense forms, the infinitive, past participle, and gerund/present participle forms, for a total of fifteen paradigm cells. Whereas Italian past participles are mostly suffixed (with occasional stem alternation), German past participles are typically (but not always) prefixed with ge-. After training, we assessed to what extent the two maps were able to align stems in pairs consisting of a known infinitive X and a form PREF+X, where PREF is an unknown prefix. Results are shown in Figure 8.10, where stem dispersion is tested on fifty made-up Italian and German pairs, instantiating an Italian pattern (e.g. riprendere, prendere) and a German one (e.g. zumachen, machen) respectively. The German map is shown to consistently align prefixed stems with base stems better than the Italian map does, with over 50 per cent of German pairs activating exactly the same BMUs, thus proving to be more tolerant towards variability of stem position in the input string. This evidence cannot simply be interpreted as the result of a more flexible coding strategy (e.g. spatio-temporal recoding as opposed to temporal), since both maps were trained with the same parameter set. Rather, it is the map’s acquired familiarity with prefixed past participles in German that makes it easier for similarly (but not identically) prefixed patterns to be recognized.

. General discussion Human lexical competence is known to require the fundamental ability to retain sequences of linguistic items (e.g. letters, syllables, morphemes, or words) in the OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi

Computational complexity of abstractive morphology 

8 Italian German 7

6

5

4

3 stem dispersion (%) dispersion stem 2

1

0 5 10 15 20 25 30 35 40 45 50 pairs

Figure . Stem dispersion over pairs in Italian and German verb forms. Dispersion values are given as a fraction of 100 over the map’s diagonal. working memory (Gathercole and Baddeley 1989, Papagno et al. 1991). Speakers appear to be sensitive to frequency effects in the presentation of temporal sequences of verbal stimuli. Items that are frequently sequenced together are stored in the long-term memory as single chunks, and accessed and executed as though they had no internal structure. This increases fluency and eases comprehension. Moreover, it also explains the possibility to retain longer sequences in short-term memory when familiar chunks arepresented.Theshort-termspanisunderstoodtoconsistofonlyalimitednumber (ranging from three to five according to recent estimates, e.g. Cowan 2001) of available store units. A memory chunk takes one store unit of the short-term span irrespective of length, thus leaving more room for longer sequences to be temporarily retained. Furthermore, chunking produces levels of hierarchical organization of the input stream:whatisperceivedasatemporalsequenceofitemsatonelevelmaybeperceived as a single unit on a higher level, to become part of more complex sequences (Hay and Baayen 2003). Finally, parts belonging to high-frequency chunks tend to resist being perceived as autonomous elements in their own right and being used independently. As a further implication of this ‘wholeness’ effect, frequently used chunks do not participate in larger word families such as inflectional paradigms (Marzi et al. 2012b). We take this type of evidence to illustrate principles of lexical self-organization and to shed light on the intimate interplay between processing and storage in language acquisition. TSOMs provide a general framework for putting algorithmic hypotheses of the processing–storage interaction to the severe empirical test of a computer implementation. Furthermore, unlike classical perceptron-like neural architectures OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi

Furthermore, unlike classical perceptron-like neural architectures trained on back-propagation, they allow scholars to entertain a purely abstractive view of morphological competence, based on a number of realistic assumptions concerning acquisition of word structure. We recapitulate them in what follows.

Ecology of training conditions. In modelling word acquisition, it is important to be clear about what is provided in input, and what is expected to be acquired. Choice of input representation may in fact prejudge morphological generalizations to a considerable extent (see also Stump and Finkel’s consideration of representational issues, this volume). Machine learning algorithms tend to make fairly specific assumptions on word representations. Even memory-based approaches to morphology induction (Daelemans and van den Bosch 2005), which make ‘lazy’ use of abstraction-free exemplars, are no exception. To ensure that only features associated with aligned symbols (letters or phonological segments) are matched, Keuleers and Daelemans (2007) align exemplar representations to the right and cast them into a syllabic template. Indeed, for most European languages, we can construe a fixed-length vector representation by aligning input words to the right, since inflection in those languages typically involves suffixation and sensitivity to morpheme boundaries. However, this type of encoding presupposes considerable knowledge of the morphology of the target language and cannot possibly work with prefixation, circumfixation, and non-concatenative morphological processes in general. Likewise, most current unsupervised algorithms (see Hammarström and Borin 2011 for a recent survey) model morphology learning as a segmentation task, assuming a hard-wired linear correspondence between sub-lexical strings and morphological structure. However, both highly fusional and non-concatenative morphologies lend themselves grudgingly to being segmented into linearly concatenated morphemes. In assuming that word forms simply consist of a linear arrangement of time-bound symbols, TSOMs take a minimalist view on matters of input representation. Differences in self-organization and recoding are the by-product of acquired sensitivity of map nodes to recurrent morphological patterns in the training data. In our view, this represents a principled solution to the notorious difficulty of multi-layered perceptrons in dealing with issues of input binding and coding. Finally, it addresses the more general issue of avoiding input representations that surreptitiously convey structure or implicitly provide the morphological information to be acquired. Models of word acquisition should be more valued for adapting themselves to the morphological structure of a target language than for their inductive or representational bias.

Storage and processing. In models of memory, mnemonic representations are acquired, not given. In turn, they affect acquisition, due to their being strongly implicated in memory access, recognition, and recall. TSOMs process words by activating chains of memory nodes. Parts of these chains are not activated by individual words only, but by classes of morphologically related words: for example, all forms sharing the same stem in regular paradigms, or all paradigmatically homologous forms sharing the same suffix.

Computational complexity of abstractive morphology  lexical forms and sub-lexical constituents are concurrently represented on the same level of circuitry by Hebbian patterns of connectivity linking stored representations. This eliminates the need for a copy operator transferring information from one level to another, according to a mechanistic interpretation of the variable Y in a default generalization like PROGR(Y) = Y + ing. Moreover, robust recoding of time-bound information effectively minimizes the risk of creating spurious patterns due to the superposition of lexical and sub-lexical chains. Another consequence of this view is that morphological productivity is no longer conceptualized as a mapping function between a (unique) lexical base (stored in the lexicon)andaninflectedform(generatedon-line).Anyinflectedformcanactasa baseforanyotherinflectedform.Itisimportanttoappreciate,however,thatabaseis not defined in terms of structural factors such as economy of storage or simplicity of mapping operations, but rather in terms of usage-oriented factors such as time of acquisition, frequency-based entrenchment, probabilistic support of implicative relations, and gradient capability of discriminating inflectional class among alternative hypotheses. As long as we are ready to accept that lexical representations are not dichotomized from lexical entailments, the reconceptualization of word structure offered by TSOMs is in line with a genuinely abstractive view of morphology. Generalization and training. Marcus (2001) correctly emphasizes that no general- ization is possible with training independence. Generalization implies propagation, transferring of information from one node to another node, not just adaptation of connectivity between individual input and output nodes. Propagation is also inde- pendently needed to address another apparent paradox in incremental acquisition. Human speakers are very good at inferring generalizations on the basis of local analo- gies, involving the target word and its closest neighbours (Albright 2002). However, analogy is a pre-theoretical notion and any linguistically relevant analogy must take into account global information, such as the overall number of forms undergoing a particular change under structurally governed conditions. How can global analogies be inferred on the basis of local processing steps? Topological propagation offers a solution to this problem. Although a BMU isidentifiedonthebasisofapairwise similarity between each node and the current input stimulus only, the connectivity pattern of a BMU is propagated topologically on the spatial and temporal layers of a map. This prompts information transfer but also competition between conflicting patterns of local analogy. It is an important result of our simulations that the neat effect of competition between local analogies is the emergence of a global, paradigm- sensitive notion of formal analogy (Marzi et al. 2012a). All issues raised here appear to address the potential use of computational models of language acquisition for theoretical linguists and psycho-linguists focusing on the nature of grammar representations. We argue that there is wide room for cross- disciplinary inquiry and synergy in this area, and that the search for the most appropriate nature of mental representations for words and sub-word constituents, OUP CORRECTED PROOF – FINAL, 5/3/2015, SPi

 Vito Pirrelli, Marcello Ferro, and Claudia Marzi and a better understanding of their interaction in a dynamic system, can provide such a common ground. At our current level of understanding, it is very difficult to establish a direct correspondence (Clahsen 2006) between language-related categories and macro-functions (rules vs exceptions, grammar vs lexicon) on the one hand, and neuro-physiological correlates on the other hand. As an alternative approach to the problem, in the present chapter we have focused on a bottom-up investigation of the computational complexity and dynamic of psycho-cognitive micro-functions (e.g. perception, storage, alignment, and recall of time series) to assess their involvement in language processing, according to an indirect correspondence hypothesis. Albeit indirect, such a perspective strikes us as conducive to non-trivial theoretical and computational findings. To date, progress in cross-disciplinary approaches to language inquiry has been hindered by the enormity of the gap between our understanding of some low- level properties of the brain, on the one hand, and some very high-level properties of the language architecture on the other. The research perspective entertained in these pages highlights some virtues of modelling, on a computer, testable processing architectures as dynamic systems. Arguably, one of the main such virtues lies in the possibility that by implementing a few basic computational functions and their dynamic interrelations, we can eventually provide some intermediate-level building blocksthatwouldmakeiteasiertorelatelanguagetocognitiveneuroscience. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity: The case of Old and Middle Indic declension

PAOLO MILIZIA

. The principle of compensation According to the principle of compensation, formulated in 1940 by Viggo Brøndal (1940: 102), a marked category does not typically allow more subdistinctions than its unmarkedcounterparts(cf.alsoJakobson1939:146,Greenberg1966:27ff.,Baerman et al. 2005: 22 f.). The canonical instance of such a principle involves two categories that may be seen as hierarchically organized so that, in the presence of marked values of the superordinateone,wecanexpecttohavepartialortotalsyncretismaffectingthevalues of the other. Massive case syncretism in the dual number of the nominal declension of ancient Indo-European languages (assuming number to be superordinate to case in these systems) neatly exemplifies such a tendency (cf. Brøndal 1940: 103). On the other hand, as the complexity of the considered paradigm increases, and, in particular, when the interacting grammatical categories are more than two, syncretism patterns can emerge for which it may be difficult to assess whether, and to what extent, the principle of compensation is complied with. In addition to the theoretical problem related to the very notion of markedness (which we shall touch on later), it should be noticed, first, that Brøndal’s formulation makes reference to a category hierarchy that, in fact, is not always easy to define, since in a paradigm the same category may appear superordinate or subordinate to another according to the different syncretism instances.1 Moreover, one might wonder which prediction should be made for a possible three-member hierarchy A > B > C. In other words, given a marked value of A, is the principle of compensation complied with to a greater extent when both B

1 A definition of the concept of ‘domination réciproque’ is already in Hjelmslev (: ). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia and C are morphologically neutralized or when only the values of B are syncretized while those of C are distinguished? In the present contribution, we shall propose some observations concerning prob- lems of this type using as a starting point the adjectival declension of the -a/a¯- stems in Old and Middle Indic. Indeed this paradigm represents a case worthy of attention in that, as we shall see, the position of its syncretism patterns along the three categorial dimensions of number, gender, and case shows noticeable properties in both synchronic and diachronic perspective.2 Among the three categories for which the Old Indic adjective is inflected, number seems to be the least prone to syncretism, while the marked values of number, i.e. plural and dual, tend to represent the context for syncretism of gender or case values. Interestingly, in Pali the inflection of the adjectival stems in -a/a¯-shows that a slab of the paradigm, namely the feminine gender, if we take it in isolation, represents a seeming counterexample to the principle of compensation, since the plural distinguishes more oblique cases (instrumental/ablative -ahi¯ ,genitive-anam¯ . , locative -asu¯ ) than the singular (where -aya¯ indicates instrumental, ablative, genitive, and, in competition with -ayam¯ . ,locative). In order to explain such state of affairs a promising approach may be to reformu- late the principle of compensation in terms of frequency and information content. According to the formulation of some scholars (cf. Cairns 1986: 18) the principle of compensation can be seen as the effect of a general tendency to avoid an excessive number of marked values. Along this line of argument, for instance, it might be said that in Latin or in Old Indic the ablative plural is syncretic with the dative plural because both its case and number values are marked. This conception entails that the relevant property is the ‘number of marked values’. But in what sense may it be said that markedness values are addable items, and where in the morphological system does this operation of markedness-counting have its place? The position defended here is that marked values are not addable by themselves, because they are heterogeneous items, but their markedness is indeed addable if we interpret it in terms of frequency. Indeed, this approach (cf. Hawkins 2004: 64 ff., Haspelmath 2006), also with its corollary that a tendency to syncretism may be related to low-frequency values (s. Greenberg 1966: 68–9; cf. also Croft 2003: 113), is not new in the study of morphology. Moreover, in recent years, a line of investigation that describes paradigm relationships in terms of information theory has been proven to be scientifically fruitful (see, e.g., Baayen et al. 2011; Milin et al. 2009; Kostić and Božić 2007; Moscoso del Prado Martín et al. 2004).

2 In fact, the same properties are largely shared by the noun system if we consider inflectional class oppositions instead of gender oppositions. Indeed, in Old and Middle Indic nominal, inflectional class and genderarecloselyrelated(cf.alsofn.). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

Crucially, relative frequency and information content (henceforth IC)—i.e. the base-2 cologarithm of the relative frequency (Shannon 1948)—are purely numerical, non-categorical, notions. Thus, if markedness is interpreted as depending on fre- quency or IC, then the operation of summing poses no problem from a theoretical point of view. Moreover, if paradigms are described in information-theoretical terms, then the IC of an exponent will be due to the sum of the ICs associated with each of the morphosyntactic properties that are cumulatively expressed by that exponent. In other words, the operation of summing is automatically involved by cumulative exponence. Along this line of analysis, we may assume that a paradigm cell associated with ‘marked’—in the sense of being relatively infrequent—values, such as for instance a locative, feminine, dual, will be likely to share its exponent with other cells, because a non-syncretic exponent specific to that cell would have a too low frequency, i.e. a too high IC. Such a formulation, in fact, leaves undetermined what frequency value should be considered relevantly low. Yet, it would be misguided to look for a universal critical value under which syncretism would become likely or more likely: indeed, if we compare a paradigm with four cells with a paradigm with seventy-two cells (like, e.g., the Old Indic adjective), then a cell of the latter is expected to be less frequent than a cell of the former, each within its paradigm; and that, of course, independently of markedness relationships. Therefore, we propose, tentatively, that the relevant point of reference for each paradigm is the average frequency value of its cells. To make such an assumption is,infact,tohypothesizethatoneoftheaimsofsyncretismistocompensatefor the difference in frequencies shown by the cells of a paradigm by allowing two— or more—rare morphosyntactic property sets to share the same exponent.3 In other words, our working hypothesis will be that one of the forces operating within the morphological system strives to make all exponents have the same frequency or, at any rate, to reach an equilibrium point whereby the distribution of the exponents is closertotheequiprobabledistributionthantheonethatwouldbegeneratedbyaone- to-one mapping between exponents and morphosyntactic property sets. Given our premises, the question arises of which information-theoretic quantities aremostsuitabletobeusedasheuristicindicatorsofthedegreeofcompliancewiththe principle of compensation. In regard to this issue, we can look, in the first instance, at the Kullback–Leibler divergence between the observed and the uniform distribution (henceforth D) and at the Shannon redundancy (henceforth R), with the caveat that

3 Such a claim may be viewed as complementary to the assumption that systematic inflectional homonymy favours memorability in paradigms exhibiting cumulation (Carstairs ; : –). See also Brown et al. () for the observation that, in a rule-based framework distinguishing between syncretism due to exponent underspecification and syncretism due to explicit rules of referral, the latter might also increase memorability. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia we do not rule out that other measures might prove to be able to describe the linguistic data more exactly. D and R are related to entropy (H), which represents the average information contentofasymbol,i.e.inourcase,ofanexponentoftheinflectionalparadigm. It should be noted that such a use of entropy is different from that described by Stump and Finkel in Chapter 7. Importantly, in the present chapter the probability values relevant to what we call ‘entropy of the exponent set’ are always related to the text (token) frequency of the items. The entropy of the exponent set is defined according to the following expression, where X is the set of the exponents {x1, x2 ...xN}, N is the number of the exponents belonging to X,andP(xi) is the probability (i.e. the relative frequency) of an expo- nent xi: N ( ) =− ( ) ( ) (1) H x P xi log2P xi i=1 D and R represent, respectively, the difference (see expression 2; cf. Rissanen 2007: 18–19) and the fractional difference (see expression 3; cf. Shannon 1948) between the maximum possible value of entropy (which is equal to log2 N)anditsobservedvalue. = − ( ) (2) D log2N H X ( ) (3) R = 1 − H X log2N H is at its maximum when all probability values are equal. Hence, by definition, when all symbols have the same probability, both D and R are minimized and equal to zero.4 Therefore, it seems to be justified to interpret a decrease in these quantities5 as a shift towards an exponent distribution closer to the equiprobable case and, therefore, towards a paradigm organization more in keeping with the principle of compensation.

4 Unlike other information-theoretic measures proposed for the description of morphological paradigms (cf. Kostić et al. ; s. also Milin et al. :  ff.), the entropy of the exponent set as we have defined it does not involve a weighting of the probabilities of the exponents according to their functions and meanings. 5 While D can be considered as a measure of the distance from the uniform distribution, the use of R as an indicator is compatible with the assumption of a system organization leading to redundancy reduction. Both indexes predict that syncretism is disfavoured between high-frequency cells and favoured between low-frequency ones. However, the remark must be made that there is an intermediate range of frequency values for which appearance of syncretism causes a decrease in D and an increase in R.Moreprecisely,it can be shown that, in the case of a syncretism involving two cells a and b with probabilities pa and pb, D decreases if the following inequality is satisfied: − ( − ) log2N log2 N 1 pa + p < b pa pb H( + , + ) pa pb pa pb

On the other hand, the condition for R to decrease is the satisfaction of an inequality identical to the one above except for the fact that the right term is multiplied by (-R). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

Table . Relative frequencies of inflectional values on the basis of Lanman (1880)

number case singular 0.703 nominative 0.399 plural 0.250 vocative 0.084 non-oblique 0.726 dual 0.047 accusative 0.242 instrumental 0.082 gender dative 0.048 masculine 0.621 ablative 0.011 oblique 0.274 neuter 0.208 genitive 0.075 feminine 0.171 locative 0.059

. Some observations on the Old Indic adjectives in -a- ThedataoftheOldIndicadjectivaldeclension(basedonLanman1880)areeffectivein showing how different from each other the relative frequencies of the morphosyntactic property sets can be for a paradigm involving three morphological categories (cf. Table 9.1 6 andFigure9.17): compare, for instance, a property set combining three frequent values such as the nom. masc. sg. with another combining three infrequent valuessuchastheloc.fem.du.(topandbottomofthegraphinFigure9.1,respectively). If we look at the paradigm of the Old Indic adjectives in -a/a¯-(Table9.2,8 cf. Thumb and Hauschild 1959: 30–50), we can see that the negative correlation between frequency of morphosyntactic property sets and proneness to syncretism is manifest. In addition to the already noticed ‘dual effect’, it is particularly remarkable that the ablative, which is the of the lowest frequency, is, at the same time, one of the two oblique cases showing syncretism in the plural (with the dative), and one of the two cases exhibiting syncretism in the feminine singular (with the genitive).

6 As Lanman’s tables (: –) conflate nominative and vocative in the plural, nominative, accusative, and vocative in the dual and in the neuter plural, and nominative and accusative in the neuter singular, the values concerning the three non-oblique cases have been completed so that the proportion between the frequencies of two morphosyntactic properties corresponds to the one resulting from the sections of Lanman’s tables where those properties are distinguished. 7 The graph is based on the data in Table ., and represents an idealization since it treats gender, number, and case as if they were mutually completely independent variables. The volume of each parallelepipedal tile is proportional (with the caveat just noted) to the relative frequency of the corresponding morphosyntactic property set. 8 In Table . a probability value is given for each exponent of the paradigm on the basis of a calculation of the relative frequency of each morphosyntactic property set according to Lanman’s data (). For syncretic exponents, the table records the sum of the relative frequencies of the property sets associated with them. The endingse - of the locative masculine/neuter singular, the vocative feminine singular, and the non-oblique feminine/neuter dual have been considered as different homonymous exponents. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia

Masc

N GENDER Sing Ntr Fem

A NUMBER

CASE V Plur I D Abl Du G L

Figure . A graph representing the relative frequencies of morphosyntactic property com- binations in Old Indic adjectival paradigms

Now, in order to exemplify how the defined indicators may be used to assess syncretism patterns, we can consider hypothetical variants of the paradigm in Table 9.2. To pick a clear example of a ‘compensation-compliant’ pattern, reduction of syncretism in the dual by introducing, e.g. an opposition between a masculine-neuter oblique exponent **-aebhy¯ am¯ and a corresponding feminine **-abhy¯ am¯ ,wouldpro- duce an increase in D and R. This means that the presence of this syncretism instance contributes to keep D and R low (cf. Table 9.3, first variant).9

9 ThisisanexemplifyingcalculationbasedonthedatashowninTable.,whichleavesoutof consideration the information content related to the inflectional class and the fact that certain exponents (e.g. gen. -asya)arespecifictothe-a/a¯-class, while others are shared by more classes (e.g. voc. zero). Moreover, OI adjectives follow the same inflection shown by substantives with the specific property that certain adjectival inflectional classes form a feminine subparadigm by creating a peculiar stema in-¯- (for masc./ntr. stems in -a-), in -u¯-(formasc./ntr.stemsin-u-), or in -¯ı-. Therefore, a comprehensive OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

Table . Old Indic -a-/-¯a- adjective declension (with the omission of some ending variants) and relative frequencies of the sets of inflectional value arrays associated with the different exponents singular plural dual masc. neuter fem. neuter masc. fem. m. n. f. nom. non- -as -ā voc. obl. -a-am -e -āni -ās -au -e acc. -ām -ān instr. -ena -ayā -ais/-ebhis -ābhis dat. -āya -āyai -ābhyām -ebhyas -ābhyas abl. -āt obl. -āyās gen. -asya -ānām loc. -e -āyām -es.u -āsu -ayos singular plural dual m. n. f. n. m. f. m. n.| f. nom. .186 .029 voc. .055.210 .004 .039 .117 .033 .010 acc. .024 .019 instr. .029 .016 .025 .011 dat. .035 .009 .001 .004 .001 abl. .008 .007 gen. .051 .017 .003 loc. .036 .005 .010 .006

Importantly, this kind of comparison allows us to identify those syncretism instances that cannot be traced back to the principle of compensation and, in fact, are in conflict with it. Thus, the introduction for the feminine nom./acc./voc. plural of an ending different from the one of the nom. pl. (-as¯ )wouldmakeD and R decrease andnotincrease(cf.Table9.3,secondvariant).10 On the other hand, it is clear that in Old Indic nominal paradigms the principle of compensation coexists and interacts with a series of major structuring principles.11 Thus, the cases are organized into a non-oblique and an oblique subset; the masculine and the neuter behave as two subcategories (almost systematically neutralized in the oblique) of a masculine/neuter gender; the masculine is prone to syncretism with the feminine in the non-a/a¯- classes, while it generally rejects syncretism with the information-theoretic description of the whole noun system, which is beyond the purpose of the present chapter, should also incorporate this sort of data. 10 We can observe that, given the entropy (H) and the number of exponents (N) of the paradigm in = Table ., for a syncretism between two cells having equal probabilities pa pb to cause a decrease in D, + the sum pa pb must be less than about .. For R to decrease the same sum must be less than about . (cf. fn. ). 11 See Koenig and Michelson (this volume), for an example of how a series of structuring principles (that can be formalized by means of rules with underspecified antecedent and rules of referrals) may lead to a system of actual morphosyntactic distinctions that is dramatically different, and poorer, than the set of the potential morphosyntacic distinctions. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia

Table . R and D valuesforsomehypotheticalvariantsofthe paradigm in Table 9.2: (1) with distinction between m./ntr. and fem. in the instr./dat./abl. du.; (2) with distinction between m. nom. pl. and fem. non-obl. pl.; (3) with oblique-case syncretism in the fem. sg.; (4) with oblique-case syncretism in the m./ntr. sg.

NR D

Paradigm 29 0.2078 1.0097

N R R variation D D variation

Variant 1 30 0.2156 +0.0078 1.0582 +0.0485 Variant 2 30 0.1932 −0.0146 0.9478 −0.0619 Variant 3 26 0.1962 −0.0116 0.9222 −0.0875 Variant 4 25 0.2450 +0.0372 1.1378 +0.1281 neuter in the nominative and vocative cases; the neuter systematically syncretises non-oblique cases. These facts may be thought of as due, at least partially, to seman- tic/functional grounds.12 More generally, the here hypothesized tendency towards equiprobabilityistobeviewedasoneofseveralfactorsthatmayinfluenceinflectional structures.13

. Compensation and language change It is important to specify that, in our view, the principle of compensation, reformulated in information-theoretical terms, is to be intended as a factor capable of conditioning thediachronicdevelopmentofalanguageratherthanasarulethatoperatessynchron- ically at the morphological level.

12 Clearly, a crucial fact is that one of the major functions of non-oblique cases is to express syntactic relations, particularly the subject/non-subject opposition. Thus, to a certain degree, non- syncretism is balanced by the possibility of recovering syntactic structure from other elements such as word order or verbal agreement. Significantly, non-oblique syncretism seems to be cross-linguistically commoner than syncretism between oblique cases (cf. Baerman et al. : ). As for the OI tendency to structurate gender distinction in non-oblique cases along an opposition masculine–feminine vs neuter, this may depend on the fact that, as far as the core of nouns with semantic gender assignment is concerned, such an opposition coincides with the semantic feature [± human]. Indeed, in interaction with other categories such as topicality and agentivity, humanness is typically associated with the syntactic relation of subject. 13 For more details about principles and tendencies that are or may be in conflict with the preference for equiprobable exponents, see Milizia () (especially Chapters  and ). A point to be mentioned here concerns the possible assumption of a general tendency towards the phonological optimization of exponents, in accordance with which more frequent exponents tend to be phonologically shorter, or simpler, than less frequent ones. Indeed, as I attempted to show, such kind of optimization has the side-effect of favouring the appearance of anti-compensative syncretism. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

In other words, our claim is that, other things being equal, i.e. apart from pressures exerted by the phonological and the syntactic level, a diachronic change that is consistent with the principle of compensation (i.e. that makes the proposed indicators decrease) will be more likely to occur than a change that goes in the opposite direction. Now, for a paradigm exhibiting cumulative exponence, the diachronic changes that have the desired effect are of at least three types: (1) syncretization of low-frequency cells; (2) introduction of formal distinctions eliminating instances of syncretism that involve high-frequency cells; (3) shift from cumulative to separate exponence in correspondence with low-frequency cells. Types (1) and (2) directly follow from our previous reasoning. Type (3) is also evidently ‘compensation-compliant’ in that it dispenses the system with having inflec- tions limited to ultra-rare morphosyntactic property combinations.

. The Pali development Interesting observations can be made if we compare the Old Indic paradigm with its Middle Indic descendant in the Pali language (Table 9.4, cf. Oberlies 2001: 141). ThemorphosyntacticpropertiesarethesameasinOldIndicexceptforthelossof the dual number and the near-complete loss of the dative case (with its functions predominantly subsumed by the genitive).14 The ablative plural, left without its ancient syncretism companion (i.e. the dative plural), finds a new fellow in the instrumental plural.15 The oblique feminine singular is totally or almost totally syncretic for case (depending on which locative variant is

Table . Pali -a-/-¯a- adjective declension singular plural masc. neuter fem. neuter masc. fem. nom. -o -ā -ā non-obl. voc. -a-am. -e -āni -ā/-āyo acc. -am. -e/-āni instr. -ena -e-hi -ā-hi abl. -ā/-amhā/-asmā/-ato -āya gen. - obl. -assa ānam. loc. - - -e/-amhi /-asmi(-m. ) -āya(-m. ) e-su ā-su

14 As for the remnants of the -aya¯ - dative, see von Hinüber :  ff. These forms, having final func- tion and being predominantly associated with action nouns, seem to have moved towards the derivational end of the inflectional–derivational continuum. 15 In Pali, the ablative continues to be the least frequent case. A text sample containing  occurrences of nominals has given the following frequency values for the non-oblique cases: instr. .; abl. .; gen.: .; loc.: .. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia

Table . Syncretism in the marked gender and in the marked number in Pali -a-/-¯a- adjectives feminine plural singular plural masc. / fem. instr. neuter -ā-hi instr. abl. - - gen. -āya -ānam. abl. e-hi ā-hi loc. -ā-su gen -ānam. loc. -e-su -ā-su

selected) and this development is not phonologically expected.16 These changes clearly comply with the principle of compensation (type 1 according to the classification sketched earlier). Indeed, it may be noted that, if hypothetically applied to the Old Indic paradigm, syncretization of all the oblique cases in the feminine singular involves a decrease in D and R, while the same is not true if syncretization is applied, e.g., to the masculine/neuter slab (cf. Table 3, third and fourth variants). Moreover, in the non-oblique plural, Old Indic had an ending -as¯ for nom./voc. masc. and nom./voc./acc. fem, but this syncretism is lost in the Pali paradigm variant17 characterized by the ending -ayo¯ , which is specific to the non-oblique feminine plural. As we have already seen, the loss of the syncretism in question (a type 2 development) also involves a shift of the exponent distribution towards equiprobability. Thus, even if other systemic forces that might have led to this syncretism loss are easy to imagine (homonymy avoidance; analogical pressure of the endings of the -¯ı- class, cf. Oberlies 2001: 150), our starting hypothesis predicts that it might have been favoured—and certainly was not hindered—by the principle of compensation. Now, as we mentioned at the beginning of this chapter, the subparadigm of the femi- nine oblique cases exhibits an interesting phenomenon in that the plural distinguishes more case forms than the singular. Is this in contradiction with the principle of compensation? In fact, if we look at the broader picture, we can see that for the oblique cases of the marked number, i.e. the plural, the scarceness of case syncretism is somehow balanced by the properties characterizing the expression of gender in the same paradigm cells. We observe two

16 In the instr. sing. in -aya¯ the length of the vowel a¯ is unexpected (cf. OI -aya¯); also unexpected is the loss of the final consonant in the -aya¯ locative. 17 For a literary language with a complex tradition like Pali (cf. Oberlies : –, with further references), it is difficult to say whether a variation of this kind should be ascribed to the presence oftwo possibilities in the system or to a diasystematic variation coming to the surface in our texts. This issue, at any rate, does not affect the core of our argument. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity  relevant phenomena: the presence of gender syncretism18 inthegenitivein-an¯ am¯ and the tendency to disaggregate cumulative exponence in the other oblique cases. Indeed, the exponents -ehi;-ahi¯ ;-esu;-asu¯ can be analysed as sequences of two morphs if we posit 1) a morph -e- peculiar to the masculine/neuter (plural), 2) a morph -a¯-peculiar to the feminine (plural), 3) a morph -hi marking the instrumental/ablative plural, and 4) a morph -su marking the locative plural. It is true that this segmentability of the oblique plural endings was in part already present in Old Indic, where, moreover, -bhis and -su were synchronically operating also as exponents of the consonantal declensions. Nevertheless, it is significant that the phonological decomposability was increased in Pali by two specific changes: the first is the elimination of the non-segmentable variant of the instr. pl. -ais (which was the ending inherited from Proto-Indo-European) in favour of the segmentable -ebhis (> Middle Indic -ehi), which is a secondary analogical creation (cf. Thumb and Hauschild 1959:36);thesecondistheintroductionofnewformsofaccusativemasculineplural in -e.19 Indeed, the -e-accusatives may be analysed as containing the morph -e plus a zero case-inflection and may be thus considered to be parallel to the forms in -ehi and -esu. Now, as noted, these phenomena of partial change from cumulative to separate exponence—or, perhaps better said, to semi-separate exponence since in our forms numberandcasecontinuetobecanonicallycumulated—canbeseenasameansof complying with the principle of compensation (type 3 development).20 Namely, in the case of separate exponence we have inflections that appear in more paradigm cells, like in syncretism, and that, therefore, have a frequency higher than that of the fully cumulative non-segmented exponents (see Milizia 2014 for a more comprehensive treatment of this issue).

. Parallel developments The phenomena that we have just described have striking parallels in other Indo- European languages (Table 9.6). In the definite adjectival paradigm of Old Church Slavic (Table 6a, cf. Huntley 1993: 147), the feminine singular shows two instances of case syncretism, and, at the same time, the plural and dual oblique are totally syncretic for gender. In the

18 The appearance of gender syncretism in the context of non-singular number values is a cross- linguistically recurrent phenomenon that complies with the principle of compensation. An interesting instance of this kind, by which the distribution of different gender syncretism patterns is related to word class distinctions, is exhibited by Archi (Chumakina and Corbett, this volume). 19 The masculine accusative plural in-e is in competition with a variant in -ani¯ that is homonymous with the nom./acc./voc. ntr. and reproduces the gender/case syncretism pattern of the singular ending -am. . 20 A distinction such as, e.g., the one between masculine (endings -va, -ta) and non-masculine (endings -ve,-te) in the dual of the Slovene verb (s. Plank and Schellinger :  f.) should be analysed along this line (compare the masc. dvaˆ,non-masc.dveˆ ‘two’). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia

Table . Old Church Slavic (a) and Russian (b) definite adjective a. singular plural dual masc. neuter fem. neuter masc. fem. m. n. f. non- nom. -ŭjĭ -aja -iji -oje -aja -yję -aja -ěji obl. acc. =nom./gen. -o˛jo˛ instr. -yjimĭ -yjimi -yjima dat. -ujemu -yjimŭ obl. -ěji loc. -ějemĭ -yjixŭ -uju gen. -ajego -yję b. singular plural masc. neuter fem. masc./neuter/fem. non- nom. -yj -aja -ye -oe obl. acc. =nom./gen. -uju =nom/gen. instr. -ym -ymi dat. -omu -ym obl. -oj loc. -om gen. -ogo -yx

corresponding Russian declension (Table 6b; cf. Vaillant 1958: 497f.) these two lines of development have been taken to their extreme consequences: the feminine oblique is totally syncretic, as in Pali, and, in the plural, gender distinctions have been completely neutralized (in this case also at the syntactic level). Thus Indo-Aryan and Slavic, starting from similar paradigms (not identical because the antecedent of the Slavic definite adjective contains a univerbized pronoun), show two developments that, though mutually independent, are strikingly reminiscent of each other.21 A difference is that in the Slavic plural we find only gender syncretism, while the Pali plural exhibits a mix of gender syncretism and gender separate exponence. Also this type of development is not unparalleled among the IE languages. Thus, in the paradigm of the Latin adjectives of the type bonus, the etymological corre- spondent of the Indo-Aryan -a-/-a¯- class, the dative/ablative plural shows total gender syncretism, but the genitive plural exhibits the two different endingsorum -¯ and -arum¯ , associated with the masculine/neuter and with the feminine respectively (Table 9.7). Significantly, these endings can be analysed as sequences formed by the morph -o¯-,

21 Analogous developments are also found in the Germanic group. Thus, in the Old Icelandic weak adjectival declension, the singular has the ending -a forthegenitiveandthedative(andtheaccusative)of the masculine/neuter and the ending -o for all corresponding feminine cells, while the plural distinguishes the dative (in -om) from the genitive (in -u, like the nominative) but is totally syncretic for gender (cf. Noreen : ). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

Table . Gender syncretism and gender semi-separate exponence in the plural of the Latin -o-/-¯a and of the Pali -a-/-¯a- adjectives Latin masc./neuter fem. Pali masc./neuter fem. acc. -ō-s -ā-s dat. acc. -e-ā abl. -īs instr. -e-hi -ā-hi gen. -ō-rum -ā-rum abl. gen. - ānam. loc. -e-su -ā-su for the masculine/neuter, or -a¯-, for the feminine, plus a morph -rum specific to the genitive plural; in turn the accusative plural endings -os¯ and -as¯ are interpretable as formed by -o¯-or-a¯-plus-s. Noticeably, this situation is not the inherited one. The endingorum -¯ represents the product of morphological analogy (the inherited one was -um, cf. Leumann 1977: 415): the analogical creation of -rum in the Latin genitive plural parallels, somehow, that of -ehi (< -ebhis) in the Indo-Aryan instrumental and ablative plural. Indeed we might say that the Latin pattern is the ‘mirror image’ of the Pali one: in Latin the ablative is syncretic for gender and the genitive has separate gender exponence, while in Pali the opposite occurs. Again, we observe similar lines of development that are carried out in autonomous ways.

. Vertical and horizontal syncretism The developments mentioned earlier involve two alternative means to avoid the appearance of exponents with low frequency, i.e. either case syncretism (as in Pali in the feminine singular) or elimination of gender distinction from cumulative expo- nence; in turn, such an elimination is achieved either by gender syncretism or by the introduction of separate gender exponence. Now, if we think that the parallel development shown by Pali and Slavic is not purely due to chance, we have to explain why oblique case syncretism seems to be preferred in the feminine singular, while elimination of gender distinction from cumulative exponence seems to be preferred in the plural. The recurrence of such a configuration is indeed not banal in that it conflicts with the hypothesis of a fixed hierarchy among inflectional categories (cf. the Feature Rank- ing Principle in Stump 2001: 239 f.; s. also Baerman et al. 2005: 113 ff.) and, after all, with the principle of compensation as Brøndal conceived it. If we posit that the category of case may have a feature structure involving a node [oblique]22 (cf. Wiese 2004),

22 Cf. Wiese (). For further issues related to underspecification, see also Koenig and Michelson (this volume). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia then oblique case syncretism can be seen as an instance of case underspecification in the context of gender, which suggests that gender is superordinate to case; on the contrary, gender syncretism with no or limited loss of case distinctions seems to point to a different hierarchy. A possible solution might be to assume that the ranking of gender with respect to case is dependent on number because of some semantic/functional grounds. Nevertheless, there is a second, perhaps more interesting possibility, i.e. to assume that the whole syncretism pattern can be explained as a by-product of the interaction between competing structural principles internal to the morphological level.23 We start from the following two assumptions: (1) morphosyntactic property oppo- sitions may differ in strength; (2) the IC that can be carried by a single exponent is constrained so that a too high IC may result in a critical instability of the correspond- ing exponent. Table 9.8 shows a simulation describing the IC (in bits) of the exponent of an oblique case in the feminine singular (left part of the table) and plural (right part) in a hypo- thetical language having four equiprobable oblique cases, a feature-value structure analogous to that of the Pali adjective, and feminine/non-feminine, plural/singular, and oblique/non-oblique ratios (1:4, 3:7, 3:7, respectively) similar to the ones of Old and Middle Indic. The rows are arranged according to three types of exponents: non- syncretic exponents (row 1); exponents associated with all the oblique cases, i.e. show- ing ‘vertical syncretism’ (row 2); exponents specific to a particular oblique case and associated with all the three gender values, i.e. showing ‘horizontal syncretism’ (row 3). Now, let us posit that the opposition between the feminine and the mascu- line/neuter is stronger than the one between an oblique case and another, i.e. that, other things being equal, vertical syncretism is favoured as against horizontal syn- cretism. Given that premise, it is possible to imagine a situation in which the tendency to avoid exponents with an IC far higher than the average results not only in a dispreference for the non-syncretic patterns (i.e. A1 and B1) but also, in the plural and only in the plural, in a dispreference for the—elsewhere preferred—vertical syncretism (B2), and this is because of the low relative frequency of plural nominals, which raises the overall IC of the plural exponents. The easiest way to devise a model able to generate the observed patterns would be to posit the existence of an IC threshold and hypothesize that the threshold value

23 On the other hand, it is difficult to assess to what extent the phonological level may have played a role in these developments. One might think that in the singular the OI feminine endings of the preserved oblique cases (-aya¯,-ay¯ as¯ ,-ay¯ am¯ ) had in common a significant amount of phonological material with each other and that this fact might have favoured a phonological collapse (even if the regular sound changes would not have led to it). Nevertheless, the fact that the corresponding masculine/neuter endings (-ena, -asya,-at¯ ,-e) are less similar to each other with respect to what occurs in the feminine slab, is in part due to morphological grounds: the inherited instr. sg ending was indeed -a¯ (cf. Thumb and Hauschild : ), while -ena is an Indo-Aryan innovation. In other words, the morphological level is able to enhance formal distinctions when this is in keeping with its preferences. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity  sing. plur. sing. plur. sing. plur. mn f mn f mn f mn f mn f mn f case α case β case a case b case c case d case case α case β case a case b case c case d case case α case β case a case b case c case d case non- obl. obl. non- obl. obl. non- obl. obl. bits bits bits plural 2.32 feminine 1.74 case a 7.8 total 3.74 plural 2.32 feminine 1.74 obl. 5.80 total 1.74 plural–0 1.74 case a 5.48 total 3.74 B1) B2) B3) sing. plur. sing. plur. sing. plur. mn f mn f mn f mn f mn f mn f case α case β case a case b case c case d case case α case β case a case b case c case d case case α case β case a case b case c case d case non- non- obl. obl. non- obl. obl. non- obl. obl. bits bits bits singular–0 0.51 case a 4.25 total 3.74 singular 2.32 feminine 0.51 obl.total 1.74 4.57 singular 2.32 feminine 0.51 case a 6.57 total 3.74 A1) A3) A2) Table . Vertical and horizontal syncretism in a hypothetical paradigm inflected for number, gender, and case OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia lies between the IC of B2 and that of B3—let it be, e.g., 5.5 bit. Indeed, given such athreshold,itisexpectedthatB2willbeavoidedandB3willbepreferred.Inother words, we can posit a configuration of values on the basis of which oblique case syncretism is not sufficient in the plural, so that the most preferable paradigm will have horizontal syncretism in the plural (B3) due to the pressure of the IC-restraining tendency, but vertical syncretism in the singular (A2) in compliance with the system- specific syncretism preference. Yet it would be going too far to say that such a simple evolutionary algorithm can be considered as an exact description of the Pali or Slavic development.24 As regards the details of the algorithmic machinery, other hypotheses could be legitimately devised. In particular, it is clear that a different model of the way the IC-restraining tendency and the resistance to syncretism offered by the different categorial oppositions interact with each other might lead to partially different predictions. Our aim here is limited to defining a problem that seems to us of theoretical importance—i.e. the question of why different IE groups independently exhibit inflec- tions with vertical syncretism in the singular and horizontal syncretism in the plural— andtotryingtoshowthatsuchaphenomenon,whichappearsasanincreaseinthe organizational complexity of the paradigm, may be thought of as the result of an interaction of preferences by which restrictions related to the information capacity of the exponents must play a central role.

. Horizontal syncretism and the feminine gender In the previous paragraph we hypothesized that in the Slavic and Pali developments the feminine/non-feminine opposition was stronger than oblique case oppositions (cf. also Hjelmslev 1935: 108). That such a language-specific property may have been shared by both Indo-Aryan and Slavic poses no problem since the two are genetically related. On the other hand, cross-linguistic data seem to point to a hierarchy according to which gender syncretism is more widespread than (and therefore cross-linguistically preferred to) case syncretism (see Baerman et al. 2005: 113 f.). However, because of the particular status of the feminine in Proto-Indo-European, the assumption of a non- typical situation in Pali and Slavic is not particularly demanding. Indeed, a widespread hypothesis is that the category of the feminine was originally autonomous from the

24 As for Indo-Aryan, a model with a fixed-threshold fits the data of the instrumental, genitive, and locative (and predicts, therefore, the different distribution of the two patterns along the two number values), but assesses every non-syncretic ablative exponent as critically unstable. This is due to the fact that, because of the rareness of this case (cf. also fn. ), a non-syncretic ablative exponent will have an IC greater than the threshold imagined. On the other hand, in both Old Indic and Pali, adjectives and nouns do have a non-syncretic ablative exponent—i.e. the masc./ntr. OI -at¯ > Pali -a¯—only for the least marked values of the categories of gender (masculine/neuter), number (singular), and declension class (-a/a¯-). Significantly, among the Indo-European languages, the commonest development is the loss of the purely ablatival case by means of a category merger involving at least another oblique case. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Patterns of syncretism and paradigm complexity 

Table . Jaina-Mah¯ ar¯ as¯.t.r¯ı -a-/-¯a- adjective declension singular plural masc. neuter fem. neuter masc. fem. nom. -o -ā non-obl. voc. -ā -a -am. -e -ān. i/-āi(m. ) -ā/-āo acc. - -am. e abl. - - - - -āo e-hi(m. ) / e-him. to ā-hi(m. ) / ā-him. to instr. - - obl. -en. a(m. ) e-hi(m. ) ā-hi(m. ) gen. - -assa -āe ān. a(m. ) - - loc. -e/-ammi e-su(m. ) ā-su(m. )

opposition between masculine and neuter, and was only secondarily included into the category of gender (cf. Clackson 2007: 104 f.). Significantly, the absorption of the feminine into the gender-system seems to go further in the course of the Middle Indo-Aryan period. In the paradigm in Table 9.9 (cf. Jacobi 1886: xxxvi f.) we can see the inflection of the -a/a¯-adjectivesinJaina- Mah¯ ar¯ as¯.t.r¯ı, a Prakrit language that represents a more advanced stage of the Middle Indo-Aryan development. Here, the feminine singular distinguishes the ablative case from the other oblique cases,butagainthepresenceofthiscasedistinctionisrealizedatthecostoflosing gender distinctions, since the ablative singular ending -ao¯ ,istotallysyncreticfor gender.25 Thus, at this stage, gender distinctions seem to have been weakened and to have increased their proneness to syncretism independently of number. Moreover, the whole development corroborates the idea that a particularly rare paradigm cell tends to share its exponent with other cells independently of the syncretism pattern that is realized. From Proto-Indo-European to Middle Indo-Aryan, the ablative case is constantly involved in some pattern of syncretism, but these patterns are continuously reshaped.

. Conclusion The Indo-Aryan morphological structures previously considered confirm that an information-theoretic approach to paradigm description turns out to be a promising tool for grasping organizational principles underlying the emergence of complexity

25 The -ao¯ forms (cf. also the corresponding form -ado¯ in the Śaurasen¯ı prakrit variety, cf. Pischel : §), which are continuations of denominal ablatival adverbs in -tas, are phonologically expected in the feminine but not in the masculine/neuter, where the thematic vowel -a-isetymologicallyshort.Noticeably, ◦ the adverbial suffixtas - is also the etymological ancestor of the final to which is added to the instrumental plural to form ablatival ending variants non-syncretic for case (cf. Table .). The distribution oftas - >–to (> -o) is, therefore, the reflex of an instance of semi-separate exponence that has been clouded by the loss of intervocalic t. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Paolo Milizia in syncretism patterns. More generally, it may be thought that one of the sources of morphological complexity lies in the interplay between organizational principles at the level of the morphosyntactic properties and equilibrium tendencies related to the quantitative distribution of the inflectional items. As we have tried to show, the study of diachronic developments provides a favourable viewpoint for observing how this interplay exerts its moulding force.

Acknowledgements Most of the issues addressed in the present chapter are treated in more detail in Milizia (2013). The author wishes to express his gratitude to Matthew Baerman and Dunstan Brown for their precious observations and suggestions. He is also grateful to Marco Mancini, Luca Lorenzetti, and Giancarlo Schirru for the support received while carrying out this research. As usual, he remains entirely responsible for possible errors or omissions. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi



Learning and the complexity of Ø-marking

SEBASTIAN BANK AND JOCHEN TROMMER

. Introduction Research on the automatic learning of morphological segmentation is a history of mixedsuccess.Ontheonehand,therearebynowwell-understoodandefficient methods for stemming, i.e. the identification of lexical stems in inflectional word forms (Lovins 1968, Porter 1980, Goldsmith 2010). On the other hand, identifying affixes intheaffixalstringsisolatedbystemming,ataskwhichwewillcallinthefollowing subanalysis, is a research area still in its infancy.1 Consider, for example, the German verbal agreement paradigm in (1). Standard stemming assigns the segmentation in (1a), which separates the stem glaub ‘believe’ from the suffixes expressing tense and subject agreement. However, it is obvious tothe linguistically trained eye, and uncontroversial in German linguistics, that the resulting suffix strings have further structure: they contain the separable past tense suffix -te, whose segmentation (1b) reveals that most of the agreement markers appearing in present tense forms are also used in the corresponding past forms. In fact, recent research in theoretical morphology (cf. the papers in Trommer and Müller 2006, and references cited there) often assumes a much more radical subanalysis. Thus Müller (2005) argues for the segmentation in (1c) where the apparent 2sg affix -st is decomposed into a true 2sg marker -s and the non-first-person affix -t which also showsupinthe2plandthe3sg:

1 We do not address here two further problems for morphological segmentation: First, discontinuous morphology, as in Semitic root-and-pattern morphology (see Pirelli et al., this volume, for discussion) and infixation (cf. the case of Archi discussed by Chumakina and Corbett, this volume). Second, the possibility that affixes occur in different positions with respect to each other (cf. Rice  on variable orders among prefixes and suffixes, and Chumakina and Corbett on cases in Archi, where the same affix occurs asaprefix, suffix, or infix depending on phonological, morphological, and selectional properties of the base). OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

 Sebastian Bank and Jochen Trommer

(1) German present and preterite verbal agreement a. output of stemming (i) present (ii) preterite sg pl sg pl 1 glaub-e glaub-en 1 glaub-te glaub-ten 2 glaub-st glaub-t 2 glaub-test glaub-tet 3 glaub-t glaub-en 3 glaub-te glaub-ten b. minimal subanalysis (i) present (ii) preterite sg pl sg pl 1 glaub-e glaub-(e)n 1 glaub-te glaub-te-n 2 glaub-st glaub-t 2 glaub-te-st glaub-te-t 3 glaub-t glaub-(e)n 3 glaub-te glaub-te-n c. elaborate subanalysis (Müller 2005: 10) (i) present (ii) preterite sg pl sg pl 1 glaub-e glaub-(e)n 1 glaub-te glaub-te-n 2 glaub-s-t glaub-t 2 glaub-te-s-t glaub-te-t 3 glaub-t glaub-(e)n 3 glaub-te glaub-te-n

In this chapter, we advance the hypothesis that Ø-affixes are a crucial factor that enables subanalysis. Thus in German verb inflection, present tense is systematically zero (although one might argue that the 1sg suffix -e expresses also present tense), which allows for the identification of the person/number markers without further segmentation.2 Similarly 1sg is zero in past tense forms, which reveals the bare past suffix -te. Thus, every marker of the segmentation in (1b) occurs on its own inat least one paradigm cell. In analogy to the (in-)dependent occurrence of morphemes in syntax, we will call the occurrence of a morpheme as the only form preceding or following the stem free and the occurrence as substring of the string preceding or following the stem bound (hence -st is free in (1a-i) and bound in (1a-ii); see also Koenig and Michelson, this volume for a discussion on subparts of pronominal port- manteau affixes, that may be assigned a consistent meaning, but are ultimately non- separable because they are bound elements in the sense assumed here). We will argue that it is exactly this contrast that makes the segmentation in (1b) (which involves only free forms) uncontroversial and transparent, whereas the further subanalysis in (1c) (which adds the bound affix -s to the inventory) is more debated and opaque (althoughaswewillsee,stillplausible).Foralearnerguidedbyfreeformsitiseasier

2 Note that the disclosure of the agreement marker’s forms is independent from the analytical options of having a present tense marker ‘-Ø’ in the lexicon or just lacking a present tense exponent. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Learning and the complexity of Ø-marking  to subanalyse glaub-tet into -te and -t than glaub-st into -s and -t,as-s does not occur on its own. More generally, we show that subanalysing learning algorithms profit substantially from paradigms where most or all markers have a free occurrence since this reduces the search space for possible segmentations (established by possible segmentation points): if less potential segmentation points are considered, the search for the correct segmentation remains shallow, and the range of possible subanalyses the learner must consider is restricted. Hence learners with different reliance on the free occurrence of a potential marker can provide dramatically different results. In fact, each kind of search-space-reduction that we will define classifies the complexity of subanalyses to be (un-)feasible with a particular strategy. Zero marking of inflectional categories, which is often regarded as a major source of complexity in morphological systems (cf. e.g. Anderson 1992, Wunderlich and Fabri 1994, Segel 2008), can hence be seen as a central factor facilitating the learning of subsegmentation. The fact that zero- exponence is a typologically ubiquitous phenomenon (e.g. for third person, present tense, etc.) would then have the effect that the subanalysis complexity observed cross- linguistically is low in the majority of languages, helping the learner to identify affixes. Our major empirical hypothesis is thus that no language has a subanalysis complexity that demands the learner to explore the full range of possible segmentations. The chapter is structured as follows: In section 10.2, we define a hierarchy of subanalysis complexity classes in terms of their reliance on the occurrences of free markers,andspecifytherespectiveamountsofsearch-space-reductiontheyinvolve. In section 10.3, we demonstrate the impact on the learning of inflection by introducing an incremental learning algorithm for inflectional affix inventories. Finally, we present the results of a typological pilot study on the distribution of zero-exponence in tense and agreement across a cross-linguistic language sample, which confirms the correla- tion between complexity classes and cross-linguistic distribution of Ø-marking.

10.2 Subanalysis complexity

Let us assume that a learner of a language has already managed to isolate the lexical material of an inflected word form from affixal material by stemming. The learner is then left with everything that precedes and follows the stem, which we will call affix strings (prefix_string–stem–suffix_string). The German verbal agreement paradigm from (1) is strictly suffixing, so its representation for subanalysis can simply omit the empty prefix strings and also the slot-indicating hyphens for the suffixes:3

3 In the following, we largely abstract away from phonological interference by using underlying representations as input for the learner.


(2) German verbal agreement suffix strings after stem removal

    a. prs   sg   pl          b. pst   sg     pl
       1     e    n              1     te     ten
       2     st   t              2     test   tet
       3     t    n              3     te     ten

A learner faced with (2) might hypothesize about segmenting the affix strings into subaffixes or rather assume that they are undecomposable morphemes expressing tense and agreement in a portmanteau fashion. The crucial problem is that, first, the affix strings may be segmented in different ways (segmentation options) and, second, affix strings with identical forms can be mapped to the same or different markers (lexicon assignment options). The combination of these two interdependent kinds of analytic options leaves the learner with a vast range of possibilities. For the content of each paradigm cell there are length(string) − 1 possible segmentation points. Each segmentation point represents a binary decision of (non-)segmentation, so they add up to 2^(length(string)−1) possible segmentations. E.g. the 2sg past suffix string test has 2^(4−1) = 8 possible segmentations, listed exhaustively in (3f). The number of possible segmentations of the whole paradigm is the product of 2^((length(string)−1) · number of occurrences) for each affix string. For the rather small paradigm in (2) this yields 2^(0·1) (for e) · 2^(1·1) (for st) · 2^(0·2) (for t) · 2^(0·2) (for n) · 2^(1·2) (for te) · 2^(3·1) (for test) · 2^(2·2) (for ten) · 2^(2·1) (for tet) = 2^12 = 4,096 different segmentations.

(3) Suffix strings and their possible segmentations
    a. e    → { e }
    b. st   → { st, s-t }
    c. t    → { t }
    d. n    → { n }
    e. te   → { te, t-e }
    f. test → { test, t-est, te-st, tes-t, t-e-st, t-es-t, te-s-t, t-e-s-t }
    g. ten  → { ten, t-en, te-n, t-e-n }
    h. tet  → { tet, t-et, te-t, t-e-t }

Regardless of the segmentation, the learner needs to assign a single or multiple meaning(s) to forms that occur in more than one cell, i.e. analyse them as either syncretic instances of the same lexicon entry or unrelated occurrences of (accidentally) homonymous lexicon entries. Segmentation of course affects these choices because subanalysis determines what the forms of affixes are and how they are distributed (what their occurrences are): without subanalysis, the learner faces the eight distinct forms listed in (4a), of which four have two occurrences (indicated by superscripts). Each form with two occurrences can either be mapped to a single (syncretic) or to two different (homonymous) marker(s), hence there are 2 · 2 · 2 · 2 = 16 different

possible lexicon assignments for (4a), i.e. distinct ways to map the form occurrences to meanings.4

(4) Suffix string inventory from different subanalysis depths
    a. unsegmented  { e, st, t2, n2, te2, test, ten2, tet }
    b. intermediate { e, st2, t3, n4, te6 }
    c. maximal      { e7, s2, t11, n4 }

If leading -te is segmented from all suffix strings in the past paradigm, the number of forms drops to five (4b), while the number of lexicon assignments increases dramatically: just for the six occurrences of -te there are 203 different ways to assign them to one to six different markers (203 possible partitions of a six-item set). Maximal subanalysis is reached when every segment is segmented (4c), which demonstrates that the minimal possible number of different forms is four, and has an exploding number of lexicon assignments (678,570 possible partitions just for the eleven occurrences of -t).
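The counts above are easy to verify mechanically. The following short Python sketch is ours (not the authors' implementation); it assumes only the multiset of suffix strings in (2) and recomputes the number of segmentations and the Bell-number partition counts used for lexicon assignments:

from functools import lru_cache
from math import comb

# The twelve suffix strings of (2), one per paradigm cell (representation ours).
suffix_strings = ['e', 'st', 't', 't', 'n', 'n',            # present
                  'te', 'test', 'ten', 'tet', 'te', 'ten']  # past

def segmentations(s):
    """2^(length-1): one binary choice per internal boundary of the string."""
    return 2 ** (len(s) - 1)

@lru_cache(maxsize=None)
def bell(n):
    """Bell number B(n): the number of partitions of an n-element set."""
    return 1 if n == 0 else sum(comb(n - 1, k) * bell(k) for k in range(n))

total = 1
for s in suffix_strings:
    total *= segmentations(s)

print(segmentations('test'))   # 8, cf. the list in (3f)
print(total)                   # 4096 = 2^12 segmentations of the whole paradigm
print(bell(6))                 # 203 lexicon assignments for the six tokens of -te
print(bell(11))                # 678570 for the eleven tokens of -t under (4c)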

10.2.1 Complexity classes

If segmentation is guided by the free occurrences of affix strings, possible forms for markers differ in how accessible they are from the affix strings that constitute a paradigm. The full affix strings themselves are already implicitly segmented by the stem and the word boundary. Thus they constitute possible affix forms for the last resort of non-subanalysis. In this respect, a learner that never subsegments an affix string is maximally conservative (class 0 (5a)). The most restricted possibility for a subanalysing learner is one that only considers segmentation of an affix string if all of its parts are affix strings on their own in other parts of the paradigm (class 1 (5b)), i.e. all parts need to have a free occurrence. The minimal way to lift this restriction is to allow for the possibility that one part of a subanalysis is a non-affix string (class 2 (5c)), which allows a kind of second-order cranberry affix (bound affix): an affix which never occurs without at least one adjacent other affix. Finally, we establish a class for unrestricted subanalysis (class 3, (5d)):

(5) Subanalysis complexity classes as constraints on subaffixes
    a. Class 0  Affix strings are potential forms (no subaffixes)
    b. Class 1  Every subaffix S of an affix string AS also occurs as an affix string
    c. Class 2  For every binary subanalysis of an affix string AS into S1+S2, either S1 or S2 occurs as an affix string
    d. Class 3  No restriction on the occurrences of subaffixes

4 The number of possible lexicon assignments for a single segmentation computes as the product of the number of partitions of each form's set of occurrences (the nth Bell number Bn for a form with n occurrences).


Crucially, the classes defined in (5) establish an implicational complexity hierarchy:

(6) Hierarchy of subanalysis complexity classes
    Class 0 ⊆ Class 1 ⊂ Class 2 ⊂ Class 3

A learner capable of class 3 subanalysis complexity has to consider all possible segmentation points, a class 2 learner only those which result from removing an affix string from the beginning or end of another affix string. A class 1 learner is limited to subanalysing cells exhaustively composed of affix strings. Finally, class 0 learners are restricted to word and stem boundaries. In the next section, we will discuss the impact of subanalysis complexity on the set of possible forms for affix hypotheses.
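The effect of the hierarchy on the search space can be illustrated with a small Python sketch (ours; the function names and set representation are assumptions, not the authors' code). For each affix string it lists the boundaries that qualify as class 1 and as class 2 segmentation points relative to the German suffix strings of (2); it finds exactly one class 1 point in each of te, test, ten, and tet, and additional class 2 points elsewhere:

# Affix strings of the German paradigm in (2).
german = {'e', 'st', 't', 'n', 'te', 'test', 'ten', 'tet'}

def binary_splits(s):
    """All single cuts of a string, as (left part, right part) pairs."""
    return [(s[:i], s[i:]) for i in range(1, len(s))]

def class1_points(s, affix_strings):
    """Boundaries where BOTH parts also occur as affix strings (class 1, (5b))."""
    return [i for i, (a, b) in enumerate(binary_splits(s), 1)
            if a in affix_strings and b in affix_strings]

def class2_points(s, affix_strings):
    """Boundaries where AT LEAST ONE part is an affix string (class 2, (5c))."""
    return [i for i, (a, b) in enumerate(binary_splits(s), 1)
            if a in affix_strings or b in affix_strings]

for s in sorted(german, key=len):
    print(s, class1_points(s, german), class2_points(s, german))
# e.g. 'te'   -> [1] [1]         t-e is licensed already under class 1
#      'test' -> [2] [1, 2, 3]   only te-st is class 1; t-est and tes-t need class 2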

10.2.2 Evaluation

The German data in (2) involve a minimal amount of subanalysis complexity, as there is a reasonable subanalysis which only includes class 1 segmentation points. A class 0 learner yields the single non-subanalysing segmentation and needs to consider eight possible forms of a marker. With its four syncretic/homonymous markers covering two cells each, there are sixteen possible lexicon assignments for (7a).

(7) German present and preterite verbal agreement
    a. Class 0 subanalysis
       (i) prs   sg   pl          (ii) pst   sg     pl
           1     e    n                1     te     ten
           2     st   t                2     test   tet
           3     t    n                3     te     ten
    b. Class 1 subanalysis
       (i) prs   sg   pl          (ii) pst   sg     pl
           1     e    n                1     te     te-n
           2     st   t                2     te-st  te-t
           3     t    n                3     te     te-n
    c. Possible forms to consider
       { e, st, t, n, te, test, ten, tet }

Crucially, to find (7b), no additional possible forms need to be considered: every subanalysed form of a possible marker also occurs freely. With four possible segmentation points there are sixteen possible class 1 subanalyses (including (7a)), and for the maximal class 1 subanalysis shown here, there are 30,450 possibilities which map syncretic/homonymous markers to lexicon entries.5

A higher complexity is needed for the subanalysis of the Swahili paradigm in (8a): it does not contain any class 1 segmentation points and thus can only be subanalysed

5 B1 (for e) · B2 (for st) · B3 (for t) · B4 (for n) · B6 (for te) = 1 · 2 · 5 · 15 · 203 = 30,450 lexicon assignments for the segmentation in (7b).

by a learner of at least class 2. Note that the example shown in (8a) is not the maximal segmentation possible with this class. The segmentation of the trailing cranberry-like li- in the past (8a-ii) is made possible by the fact that every substring that precedes it also occurs in the zero tense–aspect–mood marked subjunctive (8a-i).

(8) Swahili subjunctive and imperfective verbal agreement (Seidel 1900: 10–18)
    a. Class 2 subanalysis
       (i) sub   sg   pl          (ii) imp   sg     pl
           1     ni   tu               1     ni-li  tu-li
           2     u    m                2     u-li   m-li
           3     a    wa               3     a-li   wa-li
    b. Possible forms to consider by complexity class
       Class 0/1  { ni, u, a, tu, m, wa, nili, uli, ali, tuli, mli, wali }
       Class 2    Class 0/1 ∪ { t, w, li }6
       Class 3    Class 0/1 ∪ Class 2 ∪ { n, i, l, il, nil, ili, ul, al, tul, ml, wal }

The extension of the search space from class 0/1 to class 2, which is in fact necessary to find (8a), is reflected by the additional possible forms that are considered, cf. (8b). Yet still the search space is restricted in comparison to class 3. While for a class 3 learner every segment transition is a possible segmentation point that can be combined with any other segmentation point,7 there are only eight class 2 segmentation points in (8). In sum, class 2 yields 2^8 = 256 possible segmentations of the 2^18 = 262,144 possible with class 3.

Finally, the full search space of class 3 is needed to subanalyse a paradigm that always has an overt marker for all categories. Recall that our hypothesis is that such a pattern should not exist. Yet, such complexity can occur if the learner does not 'see' all relevant data, i.e. misses the zero-exponent paradigm cells. So if a learner of Swahili has no access to the subjunctive paradigm but only the present and imperfective, which both consistently mark tense–aspect–mood, the subanalysis of this highly regular (sub-)paradigm would be of complexity 3:

(9) Swahili present and imperfective verbal agreement with class 3 subanalysis
    a. prs   sg     pl            b. imp   sg     pl
       1     ni-na  tu-na            1     ni-li  tu-li
       2     u-na   m-na             2     u-li   m-li
       3     a-na   wa-na            3     a-li   wa-li
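The possible-forms sets in (8b) can likewise be recomputed from nothing but the affix strings of (8a). The following self-contained Python sketch (again ours, with hypothetical names) derives the class 2 additions t, w, and li as bound residues of free-plus-bound splits, and the remaining class 3 forms as arbitrary substrings:

# Affix strings read off the Swahili paradigm (8a).
strings = {'ni', 'tu', 'u', 'm', 'a', 'wa',
           'nili', 'tuli', 'uli', 'mli', 'ali', 'wali'}

def substrings(s):
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

# Class 0/1: only the affix strings themselves are possible marker forms.
class01 = set(strings)

# Class 2: additionally, the bound residue of cutting a string into one free
# part (an affix string) and one bound part, cf. t-u-, w-a-, ni-li- in note 6.
class2_extra = set()
for s in strings:
    for i in range(1, len(s)):
        left, right = s[:i], s[i:]
        if left in strings and right not in strings:
            class2_extra.add(right)
        if right in strings and left not in strings:
            class2_extra.add(left)

# Class 3: any substring of any affix string is a potential marker form.
class3 = set().union(*(substrings(s) for s in strings))

print(sorted(class2_extra))                     # ['li', 't', 'w']
print(sorted(class3 - class2_extra - class01))  # the eleven extra class 3 forms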

6 Observe that the possibility of the bound forms t-, w-, and li- in class 2 arises from the possibility to segment one of the cells in which they occur into a free and a bound marker like t-u-, w-a-, and ni-li-.
7 Note that subanalysis complexity class 2 restricts not only the number of segmentation points but also their possible combinations. Given the free occurrences { t, e, st }, the affix string test can in class 2 be segmented into t-est, te-st, tes-t, and t-e-st, but not into t-es-t, te-s-t, and t-e-s-t.


In (9), all forms regularly consist of an agreement marker (ni-, u-, a-, ...) followed by one of the TAM markers (na- and li-). Yet none of these markers has a free occurrence as the only overt marker preceding the stem. Hence a subanalysing learner does not gain anything from comparing the free affix strings for substring search (no free string is contained within another free affix string). As there are no class 1 or class 2 segmentation points, the learner would need to scan through all possible segmentations to get to this subanalysis.

10.3 Learning algorithms

To demonstrate the impact of subanalysis complexity on the learning of inflection, we have integrated the different complexity restrictions of the developed hierarchy in an incremental algorithm that we have implemented. The algorithm performs segmentation and meaning assignment in an integrated way, based on a paradigm of unsegmented affix strings and their morphosyntactic specification. At every cycle of the algorithm, the learner builds a set of possible form-meaning pairs, chooses the best morpheme hypothesis by comparing the accuracy and generality of the candidates, and removes the learned morphemes' occurrences from the paradigm. What counts as a possible affix in this search process is crucially restricted by the complexity class the learner adopts. The selection of the optimal morpheme hypothesis is driven by a preference for maximally general and reliable (accurate) mappings between form and meaning. To evaluate the accuracy and generality of different form-meaning mappings, we employ standard classification measurements used in work on information retrieval and machine learning (Baeza-Yates and Ribeiro-Neto 1999), namely precision and recall, or rather their non-proportional counterparts, false positives and negatives, but it is important to keep in mind that these are simply more formal equivalents of the criteria morphologists use for morphological analysis. To exemplify the use of this terminology, we will use the abstract paradigm given in (10), where a and b are strings of segments, and [±x], [±y] morphological features.

(10) Informal and formal paradigm representation
     a.        [+y]  [-y]        b. { ⟨a, [+x +y]⟩, ⟨b, [+x -y]⟩,
         [+x]  a     b                 ⟨a, [-x +y]⟩, ⟨ab, [-x -y]⟩ }
         [-x]  a     ab

As formally stated in (10b), we represent a paradigm as the set of ⟨form, meaning⟩ pairs corresponding to its cells. Meanings are represented as (possibly empty) sets of features drawn from the morphosyntactic feature-values needed to establish the cell meanings, e.g. [+1 -pl -past] or [-3 +pl]. (11) lists some of the affix hypotheses corresponding to a and b in (10) and introduces the evaluation criteria we use. A false positive for an affix hypothesis H of the phonological form F is any

paradigm cell for which H is predicted to occur, but which does not contain F. Conversely, a false negative is a cell where F is not predicted to occur by H, but still shows up.

(11) Accuracy criteria for affix hypotheses
        ⟨form, meaning⟩  false positives  false negatives  implication relation  perfect accuracy ratio
     a. ⟨b, [-y]⟩        –                –                ↔                     perf. precision, perf. recall
     b. ⟨a, [+y]⟩        –                yes              ←                     perf. precision
     c. ⟨a, [-x]⟩        –                yes              ←                     perf. precision
     d. ⟨a, []⟩          yes              –                →                     perf. recall
     e. ⟨b, [+x]⟩        yes              yes              neither
     f. ⟨a, [-y]⟩        yes              yes (2)          neither

The meaning of (11a) is a completely accurate characterization of the distribution of b in (10): all cells with the meaning [-y] contain the string b and conversely all cells containing b have the meaning [-y] (there is no b not covered by [-y]). Hence, there are no false positives and no false negatives for this hypothesis. Correspondingly, there is an implication both from the meaning to the form (←) and from the form to the meaning (→): it is a one-to-one mapping (↔). With (11b), every cell that matches the meaning [+y] has the form (←, perfect precision), yet the occurrence of a in the [-x -y] cell is not covered by the marker, resulting in a false negative (false prediction of the non-occurrence of the form). Whereas (11b) doesn't incur any false positives, (11d) does not involve false negatives: it does not occur in any contexts for which it would not be predicted. This is for the trivial reason that, being wholly underspecified, it is compatible with the entire paradigm. On the other hand, it leads to a false positive for the [+x -y] cell.

Precision is the fraction of true positives of an affix hypothesis H of form F (the correctly predicted occurrences of F) out of all paradigm cells matching H, and recall the fraction of true positives out of all occurrences of F in the paradigm. Thus (11a) has both perfect precision and recall (amounting to 1). (11b) has perfect precision, but a recall of 2/3 ([+y] predicts only two of the three occurrences of a in the paradigm). Conversely, (11d) has perfect recall, but a precision of 3/4 (a occurs only in three of four cells for which it is predicted). Thus optimizing (and hence maximizing) precision correlates with minimizing false positives, whereas optimizing recall correlates with minimizing false negatives.

It is obvious that these evaluation metrics closely mirror the criteria linguists (and learners of a natural language) employ to determine the correct affix entries for morphological systems. Virtually every morphologist would conclude that (11a) (with perfect precision/recall and zero false positives/negatives) is the correct characterization

for b in the given paradigm, and that (11b) (with one false negative and imperfect recall, but perfect precision) is a better analysis for a than (11f) (which has two false negatives and also a false positive for [+x -y]). On the other hand, there is no inherent reason in the definition of these criteria to prefer either of the hypotheses with false positives or false negatives in cases where they can't be completely avoided (cf. a in (10)). Again this corresponds closely to informed linguistic judgements: which kind of imperfect distribution is preferable crucially depends on the details of the grammatical formalism assumed by the morphological framework at hand. The grammar may provide principles or additional machinery that either prevents a marker from occurring although it matches a cell (e.g. blocking, impoverishment) or that makes a marker occur in a cell although it does not match its meaning (e.g. empty cells taking the next best marker, rules of referral; see also Michelson and Koenig, this volume, on Directness). Thus whether (11b) (no false positives, but false negatives) or (11d) (no false negatives, but false positives) is the better characterization for a might be answered differently by morphologists of specific theoretical persuasions. (11b) would be a viable analysis for proponents of Paradigm Function Morphology (Stump 2001), which could capture the 'aberrant' occurrence of a in the [-x -y] cell by a rule of referral, whereas (11d) would be the option of choice in frameworks which favour underspecification (such as Distributed Morphology, cf. Halle and Marantz 1993, Halle 1997) and might assume blocking by an impoverishment rule.

Hence to end up with a complete analysis that exactly matches the data, the learner we sketch here would have to be adapted to the insertion restrictions the grammar employs and to the use of its additional mechanisms to cope with either false positives or false negatives. For our algorithm, we assume that every marker matching a cell's meaning is inserted (i.e. no blocking or other dependencies among inserted markers) and—while the learner tries to avoid homonymy whenever possible—there is no general ban on homonymy. These assumptions are best matched by a learner that optimizes for perfect precision (prefers markers without false positives) and assumes homonymy in case of false negatives. For (10) this would yield two markers for a, as the distribution of this form cannot be captured perfectly by a single marker, cf. (12a).

(12) a. Lexicon optimized for maximal precision > maximal recall
        { ⟨b, [-y]⟩, ⟨a1, [+y]⟩, ⟨a2, [-x -y]⟩ }
     b. Lexicon optimized for maximal recall > maximal precision
        { ⟨b, [-y]⟩, ⟨a, []⟩ }

Note that while (12b) of course is preferable in terms of a smaller marker inventory and the avoidance of homonymy, unlike (12a) it is not a complete analysis of the data, as

it does not determine why a is not present in the [+x -y] cell, contrary to its empty meaning, which is compatible with every cell.8
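The evaluation criteria of (11) translate directly into the standard definitions. The Python sketch below (ours; the frozenset encoding of cell meanings is an assumption for illustration) computes false positives, false negatives, precision, and recall for some of the affix hypotheses in (11) against the paradigm in (10):

# Paradigm (10): cell meaning -> affix string.
paradigm = {
    frozenset({'+x', '+y'}): 'a',
    frozenset({'+x', '-y'}): 'b',
    frozenset({'-x', '+y'}): 'a',
    frozenset({'-x', '-y'}): 'ab',
}

def evaluate(form, meaning, paradigm):
    """False positives/negatives, precision and recall of <form, meaning>."""
    matched = {c for c in paradigm if meaning <= c}             # cells H predicts
    occurring = {c for c, s in paradigm.items() if form in s}   # cells containing F
    tp = len(matched & occurring)
    fp = len(matched - occurring)    # predicted, but F is absent
    fn = len(occurring - matched)    # F present, but not predicted
    precision = tp / len(matched) if matched else 1.0
    recall = tp / len(occurring) if occurring else 1.0
    return fp, fn, precision, recall

print(evaluate('b', frozenset({'-y'}), paradigm))  # no FP/FN, precision 1, recall 1 = (11a)
print(evaluate('a', frozenset({'+y'}), paradigm))  # 1 FN, precision 1, recall 2/3   = (11b)
print(evaluate('a', frozenset(), paradigm))        # 1 FP, precision 3/4, recall 1   = (11d)
print(evaluate('a', frozenset({'-y'}), paradigm))  # 1 FP and 2 FN                   = (11f)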

10.3.1 Implementation

To avoid a brute-force search through the whole search space resulting from the combination of possible segmentations and lexicon assignments for all affixes, our learner uses a greedy algorithm which searches in every optimization step for only a single affix hypothesis, the one with the most regular paradigmatic distribution in minimizing false positives and negatives. At the end of every optimization step, this affix hypothesis is added as a new entry to the affix lexicon of the language, and the strings corresponding to the affix hypothesis are removed from the paradigm. Optimization steps are repeated until the paradigm is empty, i.e. all affix strings in its cells have been assigned to affix entries in the lexicon. The full algorithm is given in pseudocode in (13).9

Optimization in our algorithm is strictly local in being myopic for interdependencies between markers (blocking), possible partitions of form occurrences into different markers (homonymy), and even possible segmentations of paradigm cells. In fact, segmentation and homonymy are not even genuine notions in the algorithm: they emerge from the removal of learned affixal material.

(13) Greedy algorithm for incremental perfect precision learning
     Input: a paradigm P, i.e. a set of ⟨affix string, meaning⟩-pairs
            an empty lexicon L
     1  build the set M of all potential markers for P which do not incur false positives
     2  choose the optimal marker O ∈ M according to the metrics α > β > γ
        α  maximize the number of true positives (including subaffixes)
        β  minimize the number of false negatives (excluding subaffixes)
        γ  maximize the number of segments
     3  add O to L and remove the affix string of O from all ⟨affix string, meaning⟩-pairs ∈ P that match its meaning

8 Note that this can’t follow from blocking in this particular example because this would imply a blocking where b,[-y] blocks a,[] in the [+x -y] cell, but not in the [-x -y] cell. An impoverishment rule can also not serve to remove this false positive and complete the analysis: as the meaning of a,[] is the complete default subsuming every meaning, there is no way to prevent its insertion by deleting features from the cell meaning it is matched against. 9 This type of incremental optimization is closely akin to the Harmonic Serialism version of Optimality Theory (McCarthy ) where, in contrast to the standard version of OT, candidates for evaluation may only exhibit a single structural change to the input, but which allows iteration of evaluation cycles, where every cycle takes the output of the preceding cycle as input, up to the point that optimization stagnates, i.e. does not lead to further harmonic improvement. In these terms, the single structural change that defines the optimization cycle for our learner consists of learning a single marker and removing its occurrences from the paradigm. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


     4  if any ⟨affix string, meaning⟩-pair ∈ P has a non-empty affix string: go to step 1
        else output L
     Parametrization: restrict step 1 to class 0, class 1, or class 2 segmentations (checked with P)

In principle, we have the following details of the learner which might be fine-tuned: (i) the potential segmentation points assumed when generating possible forms, (ii) the possible meanings they are combined with to generate affix hypotheses, and (iii) the evaluation metrics for choosing the best among the marker hypotheses. Parametrization for possible segmentation points is explicitly included in the algorithm in (13), and we will discuss the consequences of the different parameters in the next section. With respect to possible meanings, an aspect not specified explicitly in (13), we remain rather agnostic. However, since we restrict possible markers to affix hypotheses which do not incur false positives, it is crucial that the set of meanings is complete in the sense that it allows one to refer individually to any single paradigm cell; in effect there is always a last-resort option to build a one-cell marker with perfect precision.

In its evaluation metrics determining the choice between different affix hypotheses with perfect precision (line 2 of (13)), the algorithm is biased towards markers occurring in more paradigm cells, so that the algorithm preferentially learns markers which cover more paradigmatic space (and learns them first). If this bias results in a tie, the algorithm prefers markers which are more accurate in terms of recall, covering more of the free occurrences of its form in the paradigm.10 If this still gives a tie, it will use the marker with more segments, again maximizing paradigmatic space for the current affix hypothesis. This also provides an inherent upper bound to the subanalysis depth of the results the learner produces: if a paradigm consists of the cells { ⟨en, [x y]⟩, ⟨en, [x]⟩, ⟨e, [z]⟩ }, optimization for length always prefers ⟨en, [x]⟩ over both ⟨e, [x]⟩ and ⟨n, [x]⟩—which have the very same distribution—and hence prevents the vacuous segmentation of en into two markers with identical meaning.
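To make the procedure concrete, here is a much-simplified Python sketch of the greedy loop in (13), run on the abstract paradigm (10). It is not the authors' implementation: candidate meanings are restricted to intersections of cell meanings, candidate forms to edge substrings of the remaining material, the free/bound preference of note 10 is not modelled, and ties are resolved arbitrarily. On (10) it learns one marker for b and two homonymous markers for a, in the spirit of (12a):

from itertools import combinations

# Paradigm (10): cell meaning -> affix string (representation ours).
paradigm10 = {
    frozenset({'+x', '+y'}): 'a',
    frozenset({'+x', '-y'}): 'b',
    frozenset({'-x', '+y'}): 'a',
    frozenset({'-x', '-y'}): 'ab',
}

def edge_substrings(s):
    """Non-empty prefixes and suffixes of the remaining affix material."""
    prefixes = {s[:i] for i in range(1, len(s) + 1)}
    suffixes = {s[i:] for i in range(len(s))}
    return (prefixes | suffixes) - {''}

def candidates(paradigm):
    """Step 1: <form, meaning> hypotheses that incur no false positives."""
    nonempty = [c for c, s in paradigm.items() if s]
    hyps = set()
    for r in range(1, len(nonempty) + 1):
        for subset in combinations(nonempty, r):
            meaning = frozenset.intersection(*subset)
            matched = [c for c in paradigm if meaning <= c]
            # the form must sit at an edge of every matched cell's residue,
            # so the hypothesis has no false positives by construction
            forms = set.intersection(*(edge_substrings(paradigm[c]) for c in matched))
            hyps.update((form, meaning) for form in forms)
    return hyps

def score(form, meaning, paradigm):
    """Step 2: prefer many true positives, then few false negatives, then length."""
    matched = [c for c in paradigm if meaning <= c]
    occurs = [c for c, s in paradigm.items() if form in s]
    tp = len([c for c in matched if c in occurs])
    fn = len([c for c in occurs if c not in matched])
    return (tp, -fn, len(form))

def learn(paradigm):
    paradigm = dict(paradigm)                 # work on a copy
    lexicon = []
    while any(paradigm.values()):             # step 4: affix material left?
        form, meaning = max(candidates(paradigm),
                            key=lambda h: score(h[0], h[1], paradigm))
        lexicon.append((form, meaning))
        for cell in paradigm:                 # step 3: strip the learned marker
            s = paradigm[cell]
            if s and meaning <= cell:
                paradigm[cell] = s[len(form):] if s.startswith(form) else s[:-len(form)]
    return lexicon

for form, meaning in learn(paradigm10):
    print(form, sorted(meaning))
# first <b, [-y]>, then two homonymous a-markers, cf. (12a)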

10.3.2 Results

Examples (14) and (15) show the affix lexica and the concomitant segmentations that the algorithm in (13) produces for the German verb paradigm if restricted to class 1 and class 2 segmentations respectively. Interestingly, the class 1 segmentation in (14) corresponds to the more traditional and conservative segmentation of German we

10 Note that if a learner is meant to always gain more confidence from the free than the bound occurrence of a form, it must prefer free true positives over bound ones and bound false negatives over free ones. Also note that there is no free vs bound distinction for false positives and true negatives, as they refer to the non-occurrence of a string (which is neither free nor bound).

have already seen in (1b), whereas class 2 segmentation gives us the more radical subanalysis proposed by Müller (2005) in (1c).11

(14) German verbal agreement with class 1 restricted learner
     a. Segmentation
        (i) prs   sg   pl          (ii) pst   sg     pl
            1     e    n                1     te     te-n
            2     st   t1               2     te-st  te-t1
            3     t2   n                3     te     te-n
     b. Lexicon

        (i)   ⟨te, [+past]⟩
        (ii)  ⟨n, [-2 +pl]⟩
        (iii) ⟨st, [+2 +sg]⟩
        (iv)  ⟨t1, [+2 +pl]⟩
        (v)   ⟨e, [+1 +sg -past]⟩
        (vi)  ⟨t2, [+3 +sg -past]⟩

The order in which the markers appear in the lexica in (14b) and (15b) reflects the sequence of optimization steps that has generated them. In both analyses, te and n are learned first since they involve the widest and most regular distributions of inflectional affix strings in the German paradigm. The corresponding affix hypotheses are completely accurate (their string specifications occur in all and only the cells subsumed by their meaning) and cover six and four cells respectively. However, after the removal of te and n from the paradigm, the two learners behave slightly differently. For the setting in (14), the set of possible forms is restricted to the freely occurring affix strings { e, st, t } (of which st and t have two occurrences in the paradigm), hence the algorithm first learns the marker corresponding to st, then t1, and, after e, finally homonymous t2.

(15) German verbal agreement with class 2 restricted learner
     a. Segmentation
        (i) prs   sg    pl          (ii) pst   sg       pl
            1     e     n                1     te       te-n
            2     s-t1  t1               2     te-s-t1  te-t1
            3     t2    n                3     te       te-n
     b. Lexicon
        (i)   ⟨te, [+past]⟩
        (ii)  ⟨n, [-2 +pl]⟩
        (iii) ⟨t1, [+2]⟩
        (iv)  ⟨s, [+2 +sg]⟩
        (v)   ⟨e, [+1 +sg -past]⟩
        (vi)  ⟨t2, [+3 +sg -past]⟩

11 An interesting difference is that Müller does assume only one marker of the phonological shape -t, where our algorithm learns two homonymous affixes of this form in both versions.


For the class 2 learner of (15), e, st, t, and crucially s are possible forms at this point, hence second person t with four occurrences is preferred; as a result st is subanalysed. From a linguistic point of view, we see it as an open question whether the subanalysis in (15) or the one in (14) is more adequate. See Müller (2005) for persuasive, but in our eyes not compelling, arguments in favour of (15). Still it is a suggestive result that the two intermediate degrees of subanalysis complexity we propose correspond closely to the two major segmentations of German which have been suggested in the literature.

In contrast, Swahili verb inflection (16) provides an example where an adequate subanalysis requires the algorithm to presuppose segmentation points of complexity class 2 (i.e. subanalysis of an affix string into two affix hypotheses only requires that one of the corresponding strings occurs freely in the paradigm). By parameterizing the algorithm to subanalysis complexity 2 we obtain the tense-agreement segmentation in (16a), with which virtually every linguist would concur: the subjunctive exhibits the bare person/number prefixes, and all other subparadigms combine these with tense/aspect affixes.12

(16) Swahili verbal agreement with class 2 restricted learner
     a. Segmentation
        (i) sub   sg   pl          (ii) prs   sg     pl
            1     ni   tu               1     ni-na  tu-na
            2     u    m                2     u-na   m-na
            3     a    w-a              3     a-na   w-a-na

        (iii) imp   sg     pl
              1     ni-li  tu-li
              2     u-li   m-li
              3     a-li   w-a-li
     b. Lexicon
        (i)    ⟨ni, [+1 +sg]⟩
        (ii)   ⟨na, [-past]⟩
        (iii)  ⟨tu, [+1 +pl]⟩
        (iv)   ⟨li, [+past]⟩
        (v)    ⟨a, [+3]⟩
        (vi)   ⟨u, [+2 +sg]⟩
        (vii)  ⟨m, [+2 +pl]⟩
        (viii) ⟨w, [+3 +pl]⟩

However, from the perspective of subanalysis complexity, the present/imperfect markers -na/-li are cranberry affixes. Neither of them occurs as a free affix string in any part of the paradigm. Consequently, if these data are analysed with our class 1 learner, it produces the counterintuitive result in (17b), which has one marker for every paradigm cell and is thus identical to the input paradigm, with affix entries sorted by cell length.

12 Swahili has plenty more verb forms than shown in (16); all of them are transparently structured like the present and the subjunctive. See Seidel (1900) for exhaustive discussion.


(17) Swahili verbal agreement with class 1 restricted learner
     a. Segmentation
        (i) sub   sg   pl    (ii) prs   sg    pl     (iii) imp   sg    pl
            1     ni   tu         1     nina  tuna         1     nili  tuli
            2     u    m          2     una   mna          2     uli   mli
            3     a    wa         3     ana   wana         3     ali   wali
     b. Lexicon
        (i)     ⟨nina, [+1 +sg -past]⟩
        (ii)    ⟨tuna, [+1 +pl -past]⟩
        (iii)   ⟨wana, [+3 +pl -past]⟩
        (iv)    ⟨nili, [+1 +sg +past]⟩
        (v)     ⟨tuli, [+1 +pl +past]⟩
        (vi)    ⟨wali, [+3 +pl +past]⟩
        (vii)   ⟨una, [+2 +sg -past]⟩
        (viii)  ⟨mna, [+2 +pl -past]⟩
        (ix)    ⟨ana, [+3 +sg -past]⟩
        (x)     ⟨uli, [+2 +sg +past]⟩
        (xi)    ⟨mli, [+2 +pl +past]⟩
        (xii)   ⟨ali, [+3 +sg +past]⟩
        (xiii)  ⟨ni, [+1 +sg +subj]⟩
        (xiv)   ⟨tu, [+1 +pl +subj]⟩
        (xv)    ⟨wa, [+3 +pl +subj]⟩
        (xvi)   ⟨u, [+2 +sg +subj]⟩
        (xvii)  ⟨m, [+2 +pl +subj]⟩
        (xviii) ⟨a, [+3 +sg +subj]⟩

Thus Swahili provides good evidence that class 1 subanalysis is too weak in general, and that at least some languages require subanalysis of complexity 2. In fact our impressionistic estimation is that this pattern is even more frequent in other areas of inflection such as adjectival comparison. For example, Persian forms its superlative on top of the comparative (parallel structures are found e.g. in Ubykh, Sanskrit, Gothic; cf. Bobaljik 2007: 12). Thus the superlative suffix -ín is again of the cranberry type, so that the hardly disputable segmentation given in (18) requires class 2 complexity:

(18) Adjectival comparison in Persian (Mace 2003: 53)
     Positive   Comparative   Superlative
     bozorg     bozorg-tár    bozorg-tar-ín    'big'
     mofid      mofid-tár     mofid-tar-ín     'useful'
     moškel     moškel-tár    moškel-tar-ín    'clear'

We turn now to a slightly more complex set of data to illustrate the consequences of analytic ambiguity for the application of our algorithm. Estonian verb inflection (19) provides another example, where an adequate subanalysis requires the algorithm to presuppose segmentation points of complexity class 2. Thus most linguists would concur that the past forms in (19a-ii) comprise an imperfect suffix and the agreement affixes also found in the corresponding present forms. Where the morphologist's intuition potentially breaks down (or, to put it more optimistically, requires more evidence) is the exact identity/segmentation of the imperfect marker: one plausible option is that the 3sg imperfective -s is an independent tense-agreement portmanteau, whereas all other imperfect forms comprise the imperfect suffix -si.


Alternatively, one might argue for a completely general imperfect suffix -s. This captures the obvious parallelism between the imperfect forms, which all start with s, but has the drawback that we get an 'orphanized' affix string i, which we would have to analyse as a second imperfect marker inducing multiple (extended) exponence. Crucially, both analyses result in cranberry morphs: neither -si nor i occurs as a free affix string in the paradigm and, as with Swahili, our algorithm produces undersegmentation (in fact non-segmentation) if restricted to class 1 complexity:

(19) Estonian verbal agreement (Ehala 2009: 42) with class 1 restricted learner
     a. Segmentation
        (i) prs   sg   pl          (ii) imp   sg    pl
            1     n    me               1     sin   sime
            2     d    te               2     sid1  site
            3     b    vad              3     s     sid2
     b. Lexicon
        (i)    ⟨sime, [+1 +pl +past]⟩
        (ii)   ⟨site, [+2 +pl +past]⟩
        (iii)  ⟨vad, [+3 +pl -past]⟩
        (iv)   ⟨sin, [+1 +sg +past]⟩
        (v)    ⟨me, [+1 +pl -past]⟩
        (vi)   ⟨te, [+2 +pl -past]⟩
        (vii)  ⟨n, [+1 +sg -past]⟩
        (viii) ⟨d, [+2 +sg -past]⟩
        (ix)   ⟨b, [+3 +sg -past]⟩
        (x)    ⟨s, [+3 +sg +past]⟩
        (xi)   ⟨sid1, [+2 +sg +past]⟩
        (xii)  ⟨sid2, [+3 +pl +past]⟩

Again, we achieve a plausible segmentation if we parametrize the learner to subanalysis complexity of class 2:

(20) Estonian verbal agreement with class 2 restricted learner
     a. Segmentation
        (i) prs   sg   pl          (ii) imp   sg     pl
            1     n    me               1     s-i-n  s-i-me
            2     d    te               2     s-i-d  s-i-te
            3     b    vad              3     s      s-id
     b. Lexicon
        (i)    ⟨s, [+past]⟩
        (ii)   ⟨me, [+1 +pl]⟩
        (iii)  ⟨i, [-3 +past]⟩
        (iv)   ⟨te, [+2 +pl]⟩
        (v)    ⟨n, [+1 +sg]⟩
        (vi)   ⟨d, [+2 +sg]⟩
        (vii)  ⟨vad, [+3 +pl -past]⟩
        (viii) ⟨id, [+3 +pl +past]⟩
        (ix)   ⟨b, [+3 +sg -past]⟩

As (20a) needs eight instead of sixteen markers and does not introduce more homonymy, we can see this as confirmation that the data are an instance of complexity class 2. Although the learner is not restricted to freely occurring affixes (class 1 complexity), the order of the affix entries in (20) still demonstrates that the algorithm

prefers affix hypotheses which occur freely in the paradigm: thus freely occurring 1pl me is learned before the cranberry submorph i.

The Estonian data also nicely illustrate the reasons why a learning algorithm such as ours, which seeks to emulate the intuitions guiding theoretical linguists, necessarily combines computation of affix meaning and subanalysis. An analysis positing imperfect -s and -i would be clearly superior for a morphologist giving maximal importance to the Syncretism Principle (Müller 2003), whereas the assumption that true multiple exponence is universally excluded (Halle and Marantz 1993, Ortmann 1999) would make an analysis assuming imperfect -si the only possible option. Since syncretism and multiple exponence are by definition notions based on the assignment of meaning, approaches to the learning of morphological segmentation which rely exclusively on phonological information (Harris 1955, Langer 1991, Goldsmith 2010, Saffran et al. 1996) cannot capture the fact that they influence the decision whether a specific affix string such as si should be subanalysed or not.13

A further analytic option we haven't considered so far is that -si and 3sg/imperfect -s are instances of the same morphological marker -s, obscured by a (morpho-)phonological process inserting i in consonant clusters. Under this analysis, the subanalysis complexity of the Estonian paradigm drops to 1, so it can be successfully subanalysed considering nothing but free forms as potential markers, as shown in (21).14 At this point, the potential impact of (morpho-)phonological alternations for morphological subanalysis is simply ignored by our algorithm, because the learning of phonological processes is a highly complex problem of its own.

(21) Estonian assuming i-insertion with class 1 restricted learner
     a. Segmentation
        (i) prs   sg   pl          (ii) imp   sg    pl
            1     n    me               1     s-n   s-me
            2     d1   te               2     s-d1  s-te
            3     b    vad              3     s     s-d2
     b. Lexicon

        (i)    ⟨s, [+past]⟩
        (ii)   ⟨me, [+1 +pl]⟩
        (iii)  ⟨te, [+2 +pl]⟩
        (iv)   ⟨n, [+1 +sg]⟩
        (v)    ⟨d1, [+2 +sg]⟩
        (vi)   ⟨vad, [+3 +pl -past]⟩
        (vii)  ⟨d2, [+3 +pl +past]⟩
        (viii) ⟨b, [+3 +sg -past]⟩

13 This does not necessarily mean that purely phonotactic approaches to morphological segmentation which ignore affix meaning are 'wrong' (i.e. do not model the competence of speakers/learners). Theoretical morphology might be wrong, or there might be different phases of learning employing different methods.
14 Applying a class 2 analysis to the data in (21) gives further subanalysis. Especially -d2 is segmented as a pl suffix.


The clearest case of data which might only be adequately analysed by parameterizing the learner to class 3 complexity is the verb paradigm of the Oceanic language Lenakel (Lynch 1978). Besides additional TAM and subject agreement prefixes (for number) in other positional slots, the core system of verbs according to Lynch (pp. 42, 45, 47) is the obligatory combination of an overt agreement prefix with a right-adjacent overt TAM prefix, as shown in (22):

(22) Lenakel verbal agreement (Lynch 1978)15
            pres    past    stat   seq    neg
     1ex    i-ak-   i-im-   i-n-   i-ep-  i-is-
     1i     k-ak-   k-im-   k-n-   k-ep-  k-is-
     2      n-ak-   n-im-   n-n-   n-ep-  n-is-
     3sg    i-ak-   i-im-   i-n-   i-ep-  i-is-
     3nsg   k-ak-   k-im-   k-n-   k-ep-  k-is-
     3ks    m-ak-   m-im-   m-n-   m-ep-  m-is-

As there is no zero-marking in either the agreement or the TAM marking, the subanalysis of these categories into adjacent markers cannot be done in terms of free affix strings, upon which the class 1 and class 2 learners rest. It demands a full search through all possible segmentations by a class 3 learner.

10.4 A typological pilot study

As we have shown in the last section, lower subanalysis complexity of inflectional paradigms leads to a processing advantage for a linguistically informed segmentation of affix strings by cutting down the search space for potential affixes. Under the plausible assumption that morphological systems are adapted for learnability, this leads to the prediction that inflectional paradigms should avoid the full complexity of class 3 and show a bias for paradigms of class 1 complexity. To test this prediction we have carried out a small typological pilot study on subject agreement and TAM affixes. Since evaluating complex inflectional paradigms for their subanalysis complexity status would require a complete morphological and phonological analysis taking into account all subparadigms, allomorphs, and morphophonological alternations, we have not tested complexity status, but the closely connected occurrence of Ø-affixes. Crucially, a paradigm which allows one to subanalyse TAM and subject agreement markers with respect to each other under the class 2 restriction must exhibit at least one TAM or one agreement marker which is Ø. Similarly, a paradigm giving rise to a class 1 analysis must involve at least one Ø agreement affix and one Ø TAM affix.

15 pres = present, habitual, and concurrent mood, stat = stative/perfective, seq = sequential, neg = negative, ks = known subject; (22) abstracts away from phonological alternations.


Table . Typological pilot study: language sample

Language Phylum Macroarea Ø-Agr Ø-TAM Source

Udmurt        Uralic           Eurasia         +  +  Csúcs (1998)
Armenian      Indo-European    Eurasia         +  +  Schmitt (1981)
Nahuatl       Uto-Aztecan      N.America       +  +  Andrews (1975)
Kobon         Trans-N.Gui.     Austr./N.Gui.   +  +  Davies (1989)
Mapudungun    Araucanian       S.America       +  +  Zúñiga (2000)
Azerbaijani   S.Turkic         Eurasia         +  +  Schönig (1998)
Turkana       Nilotic          Africa          +  +  Dimmendaal (1983)
Berber        Afroasiatic      Africa          +  +  Kossmann (2007)
Choctaw       Muskogean        N.America       +  +  Broadwell (2006)
Remo          Munda            Eurasia         +  +  Anderson et al. (2008)
Kalkatungu    PamaNyungan      Austr./N.Gui.   +  +  Blake (1979)
Moghol        Mongolian        Eurasia         +  -  Weiers (2011)
Belhare       Kiranti          Se.Asia/Oc.     +  -  Bickel (2003)
Kannada       S.Dravidian      Eurasia         +  -  Steever (1998)
Somali        Cushitic         Africa          +  -  El-Solami-Mewis (1987)
Inuktitut     Eskimo-Aleut     Eurasia         +  -  Mallon (1991)
Swahili       Bantu            Africa          -  +  Seidel (1900)
Pawnee        Caddoan          N.America       -  +  Parks (1976)
Manambu       Sepik            Austr./N.Gui.   -  +  Aikhenvald (2008)
Lenakel       CE.M-Polynes.    Se.Asia/Oc.     -  -  Lynch (1978)

Our sample contains inflectional verbal paradigms of twenty areally and genetically diverse languages on the basis of Ruhlen's (1987) phyla and macroareas. We have considered only languages which have (at least some) subject agreement and TAM inflection on the same side of the stem, disregarding portmanteau expression of subject agreement + TAM, non-finite verb forms, and non-segmental exponence. Table 10.1 shows the results of our survey, where a '+' for Ø-Agr (Ø-TAM) indicates that the language has at least one Ø-affix for subject agreement (TAM), whereas a '−' indicates that all relevant affixes of the language are non-zero.

Crucially, more than half of the languages (11 of 20) have some Ø-marking for both subject agreement and TAM, and virtually all languages (19 of 20) have some Ø-marking for either subject agreement or TAM. This result strikingly confirms our predictions. In fact the marginal character of class 3 languages suggests that language learning might generally rely on Ø-affixes and completely avoid class 3 complexity. Even for Lenakel, the only plausible candidate for being of complexity 3 we have encountered so far, Lynch reports that:

In each case, the categories of person, tense, and number are obligatory, except that ... tense may be omitted in certain ... circumstances ... Certain tense prefixes may be omitted under certain

conditions. The markers ak- and im- may be omitted in verbs with third person subjects when the context makes the time of action quite clear. (1978: 43, 52)

More generally, since we have only evaluated verbal paradigms, it is likely that the overall systems of many languages are actually of lower complexity, since the markers employed in verbal inflection may occur as free forms elsewhere, e.g. as independent pronouns or as affixes in nominal or adjectival inflection.

References

Abbott, C. (1984). ‘Two feminine genders in Oneida’. Anthropological Linguistics 26, 125–37. Abrams,P.(2006).Onondaga Pronominal Prefixes. PhD thesis, University of Buffalo. Ackerman, F., J. Blevins, and R. Malouf (2009). ‘Parts and wholes: Patterns of relatedness in complex morphological systems and why they matter’. In J. Blevins and J. Blevins (eds), AnalogyinGrammar:FormandAcquisition, pp. 54–82. Oxford: Oxford University Press. Aikhenvald, Y. (2003). A grammar of Tariana. Cambridge: Cambridge University Press. Aikhenvald, Alexandra Y. (2008). The Manambu language of East Sepik, Papua New Guinea. Oxford: Oxford University Press. Albright, Adam. (2002). ‘Islands of reliability for regular morphology: Evidence from Italian’. Language 78, 684–709. Albright, Adam, and Bruce Hayes (2003). ‘Rules vs. Analogy in English past tenses: A compu- tational/experimental study’. Cognition 90, 119–61. Anderson, Gregory D. S., and K. David Harrison (2008). Remo (Bonda). In G. D. S. Anderson, (ed.), The Munda languages. London: Routledge, pp. 557–632. Anderson, Stephen R. (1988). ‘Morphological change’. In Frederick J. Newmeyer (ed.), Linguis- tics: The Cambridge survey, vol. I, 324–62. Cambridge: Cambridge University Press. Anderson, Stephen R. (1992). A-Morphous morphology. Cambridge: Cambridge University Press. Anderson, Stephen R. (2005). Aspects of the theory of clitics. Oxford: Oxford University Press. Anderson, Stephen R. (2011). ‘Stress-conditioned allomorphy in Surmiran (Rumantsch)’. In Martin Maiden, John Charles Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Mor- phological autonomy: Perspectives from Romance inflectional morphology,pp.13–35.Oxford: Oxford University Press. Anderson, Stephen R. (to appear). ‘The morpheme: Its nature and use’. In Matthew Baerman (ed.), Oxford handbook of inflection. Oxford: Oxford University Press. Andrews, James Richard (1975). Introduction to Classical Nahuatl. Austin: University of Texas Press. Aronof, Mark (1994). Morphology by itself. Cambridge, MA: MIT Press. Baayen, R. Harald, Rochelle Lieber, and Robert Schreuder (1997). ‘The morphological complex- ity of simplex nouns’. Linguistics,35,861–77. Baayen, R. Harald, Petar Milin, Dusica Filipović ÐurHević, Peter Hendrix, and Marco Marelli (2011). ‘An amorphous model for morphological processing in visual comprehension based on naive discriminative learning’. Psychological Review 118, 438–81. Baerman, Matthew (2004). ‘Directionality and (Un)Natural Classes in Syncretism’. Language 80(4), 807–27. Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2005). The syntax–morphology interface: A study of syncretism. Cambridge: Cambridge University Press. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


Baerman, Matthew, Greville G. Corbett, and Dunstan Brown (eds) (2010). Defective paradigms: Missing forms and what they tell us. Oxford: Oxford University Press. Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds) (2007). Deponency and morphological mismatches. Oxford: Oxford University Press. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto (1999). Modern Information Retrieval.New York, NY: ACM Press, Addison-Wesley. Baker, Mark C. (1995). The polysynthesis parameter. New York: Oxford University Press. Bauer, Laurie (2004). Morphological productivity. Cambridge: Cambridge University Press. Beard, Robert (1995). Lexeme–morpheme base morphology.Albany,NY:SUNYPress. Bickel, Balthasar (2003).‘Belhare’. In G. Thurgood and R. J. LaPolla (eds), The Sino-Tibetan languages. London: Routledge, pp. 546–70. Blake, Barry J. (1979). A Kalkatungu grammar. Canberra: Dept. of Linguistics, Research School of Pacific Studies, Australian National University. Blevins, J. (2004). ‘Inflection classes and economy’. In Z. G. Muller Gereon and Lutz Gunkel (eds), Explorations in Nominal Inflection, pp. 375–96. Berlin: Mouton de Gruyter. Blevins, J. (2006). ‘Word-based morphology’. Journal of Linguistics 42, 531–73. Blevins, J. (2013). ‘Word-based morphology from Aristotle to modern wp (word and paradigm models)’.In K. Allan (ed.), The Oxford handbook of the history of linguistics,pp.41–85.Oxford: Oxford University Press. Boas, Franz (1947). ‘Kwakiutl grammar, with a glossary of the suffixes’. Transactions of the American Philosophical Society 37(3), 201–377. Bobaljik, Jonathan David (2007). On Comparative Suppletion. Ms., University of Connecticut. Bobaljik, Jonathan David (2012). Universals in comparative morphology: Suppletion, superlatives, and the structure of words. Cambridge: MIT Press. Bochner, H. (1993). Simplicity in Generative Morphology. Berlin: Mouton de Gruyter. Boelaars,J.H.M.C.(1950).The linguistic position of south-western New Guinea. Leiden: E. J. Brill. Booij, Geert (2010). Construction Morphology. Oxford: Oxford University Press. Broadwell, George Aaron (2006). A Choctaw reference grammar. Lincoln: University of Nebraska Press. Brøndal, Viggo (1940). ‘Compensation et variation, deux principes de linguistique générale’. Scientia 68, 101–9. Reprinted: Id. 1943. Essais de linguistique générale, pp. 105–16. København: Munskgaard. Brown, Dunstan, Carole Tiberius, and Greville G. Corbett (2007). ‘The alignment of form and function: Corpus-based evidence from Russian’. International Journal of Corpus Linguistics 12, 511–34. Burzio, Luigi (2004). ‘Paradigmatic and syntagmatic relations in Italian verbal inflection’. In J. Auger, J. C. Clements, and B. Vance (eds), Contemporary Approaches to Romance Linguistics. Amsterdam: John Benjamins. Bybee, Joan L., and Carol L. Moder (1983). ‘Morphological classes as natural categories’. Language 59, 251–70. Bybee, Joan L., and Dan Isaac Slobin (1982). ‘Rules and Schemas in the Development and Use of the English Past Tense’. Language 58, 265–89. Bybee,JoanL.,R.D.Perkins,andW.Pagliuca(1994). The evolution of grammar: Tense, aspect and modality in the languages of the world. Chicago: University of Chicago Press. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


Cairns, Charles E. (1986). Word structure, markedness, and applied linguistics. Markedness. Pro- ceedings of the twelfth annual linguistics symposium of the University of Wisconsin-Milwaukee, March 11–12, 1983, ed. by Fred R. Eckman, Edith A. Moravcsik, and Jessica R. Wirth, pp. 13–38. New York: Plenum. Carstairs, Andrew (1983). ‘Paradigm economy’. Journal of Linguistics 19, 115–28. Carstairs, Andrew (1984). ‘Outlines of a constraint on syncretism’. Folia linguistica 18, 73–85. Carstairs, Andrew (1987). Allomorphy in inflection. London: Croom Helm. Carstairs, Andrew, and Paul Stemberger (1988). ‘A processing constraint on inflectional homonymy’. Linguistics 26, 601–18. Carstairs-McCarthy, Andrew (1994). ‘Inflection classes, gender, and the principle of contrast’. Language 70, 737–88. Carstairs-McCarthy, Andrew. (2010). The evolution of morphology.Oxford:OxfordUniversity Press. Chafe, W.L. (1977). The evolution of third person verb agreement in the Iroquoian languages. In C. Li (ed.), Mechanisms of Syntactic Change, pp. 493–524. Austin, Texas: University of Texas Press. Clahsen, Harald (2006). ‘Linguistic perspectives on morphological processing’. In D. Wunder- lich, (ed.), Advances in the Theory of the Lexicon, pp. 355–88. Berlin: Mouton de Gruyter. Chomsky, Noam, and Morris Halle (1968). The sound pattern of English.NewYork:Harper &Row. Chumakina, Marina (2011). ‘Morphological complexity of Archi verbs’. In Gilles Authier and Timur Maisak (eds), Tense, aspect, modality and finiteness in East Caucasian languages (Diversitas Linguarum 30), pp. 1–24. Bochum: Brockmeyer. Chumakina, Marina, Dunstan Brown, Harley Quilliam, and Greville G. Corbett (2007). Archi: A Dictionary of the Archi Villages, Southern Daghestan, Caucasus. . Chumakina, Marina, and Greville G. Corbett (2008). ‘Archi: The challenge of an extreme agreement system’. In Aleksandr V. Arxipov (ed.), Fonetika i nefonetika. K 70-letiju Sandro V. Kodzasova [Festschrift for S. V. Kodzasov], pp. 184–94. Moskva: Jazyki slavjanskix kul’tur. Clackson, James (2007). Indo-Europeanlinguistics:AnIntroduction.Cambridge: Cambridge University Press. Coltheart, Max, Kathleen Rastle, Conrad Perry, Robyn Langdon, and Johannes Ziegler (2001). ‘DRC: A dual route cascaded model of visual word recognition and reading aloud’. Psycho- logical Review 108, 204–56. Corbett, Greville G. (2007). ‘Deponency, syncretism and what lies between’. Proceedings of the British Academy 145, 21–43. Corbett, Greville G. (2012). Features. Cambridge: Cambridge University Press. Corbett, Greville G., and Norman M. Fraser. (1993). ‘Network morphology: A DATR account of Russian nominal inflection’. Journal of Linguistics 29, 113–42. Cowan, Nelson (2001). ‘The magical number 4 in short-term memory: A Reconsideration of mental storage capacity’. Behavioral and Brain Sciences 24, 87–114. Croft, William (2003). Typology and universals. 2nd edition. Cambridge: Cambridge University Press. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


Csúcs, Sándor (1998).‘Udmurt’. In D. Abondolo (ed.), The Uralic languages. London and New York:Routledge,pp.276–304. Culicover, Peter W. (2013). Grammar and Complexity: Language at the Intersection of Compe- tence and Performance. Oxford: Oxford University Press. Culy, Christopher (1985). ‘The complexity of the vocabulary of Bambara’. Linguistics and Philosophy 8, 345–51. Daelemans, Walter, and Antal van den Bosch (2005). Memory-based language processing. Cambridge: Cambridge University Press. Dahl, Östen (2004). The growth and maintenance of linguistic complexity.Amsterdam:John Benjamins. Davies, John (1989). Kobon. London: Routledge. Davis, J. Colin, and S. Jeffrey Bowers (2004). ‘What do letter migration errors reveal about letter position coding in visual word recognition?’ Journal of Experimental Psychology: Human Perception and Performance 30, 923–41. Dehaene, Stanislas (2009). Reading the brain. New York: Penguin. Dimmendaal, Gerrit Jan (1983). The Turkana language.Dordrecht:Foris. Donohue, Mark (1999). Warembori. Muenchen: Lincom Europa. Donohue, Mark (2008). ‘Complex predicates and bipartite stems in Skou’. Studies in Language 32(2), 279–335. Donohue, Mark (2011). ‘Case and configurationality: Scrambling or mapping?’ Morphology 21, 499–513. Drabbe, P. (1947). Nota’s over de Jénggalntjoer-taal.Merauke:Archief. Drabbe, P. (1950). ‘Talen en Dialecten van Zuid-West Nieuw-Guinea’. Anthropos 45, 545–74. Driem, George van (1987). AgrammarofLimbu. Berlin: Mouton de Gruyter. Ehala, Martin (2009). ‘Linguistic strategies and markedness in Estonian morphology’. Sprachty- pologie und Universalienforschung 1/2, 29–48. Elman, Jeff (1998). ‘Generalization, simple recurrent networks, and the emergence of structure’. Proceedings of the 20th Annual Conference of the Cognitive Science Society. Mahwah. NJ: Lawrence Erlbaum Associates. El-Solami-Mewis, Catherine (1987). Lehrbuch des Somali. Leipzig: VEB Verlag Enzyklopädie. Evans, Nicholas (1995a). A grammar of Kayardild: With historical-comparative notes on Tangkic. Berlin: Mouton de Gruyter. Evans, Nicholas (1995b). ‘Multiple case in Kayardild: Anti-iconic suffix order and the diachronic filter’.InFransPlank(ed.),Double case: Agreement by suffixaufnahme, pp. 396–428. New York: Oxford University Press. Ferro, Marcello, Claudia Marzi, and Vito Pirrelli (2011). ‘A Self-Organizing Model of Word Storage and Processing: Implications for Morphology Learning’. Lingue e Linguaggio (10)2, 209–26. Bologna: il Mulino. Finkel, Raphael, and Gregory Stump (2007). ‘Principal parts and morphological typology’. Morphology 17, 39–75. Finkel, Raphael, and Gregory Stump (2009). ‘Principal parts and degrees of paradigmatic transparency’. In James P. Blevins and Juliette Blevins (eds), Analogyingrammar:Formand acquisition, pp. 13–53. Oxford: Oxford University Press. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


Fortescue, M. (1980). ‘Affix ordering in West Greenlandic derivational processes’, International Journal of American Linguistics 46(4), 259–78. Gathercole, E. Susan, and Alan D. Baddeley (1989). ‘Evaluation of the role of phonological STM in the development of vocabulary in children: A longitudinal study’. Journal of Memory and Language 28, 200–13. Gazdar, G., G. Pullum, B. Carpenter, E. Klein, T. Hukari, and R. Levine (1988). ‘Category structures’. Computational Linguistics 14, 1–19. Gerken, Lou Ann (2006). ‘Decisions, decisions: Infant language learning when multiple gener- alizations are possible’. Cognition 98, B67–B74. Gershenson, Carlos, and Nelson Fernández (2012). ‘Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales’. Complexity 18, 29–44. Givón, Talmy (1971). ‘Historical syntax and synchronic morphology: An archaeologist’s field- trip’. Proceedings of the Chicago Linguistic Society 7, 394–415. Goldsmith,JohnA.(2010).‘Segmentationandmorphology’.InA.Clark,C.Fox,andS.Lappin (eds), Handbook of Computational Linguistics and Natural Language Processing.Oxford: Blackwell. Greenberg, Joseph (1966). Language universals, with special reference to the feature hierarchies. The Hague: Mouton. Reprinted 2005. Berlin: Mouton de Gruyter. Halle, Morris (1997). ‘Distributed Morphology: Impoverishment and Fission’.In Y. K. Benjamin Bruening and M. McGinnis (eds), Papers at the Interface.Vol.30ofMIT Working Papers in Linguistics, pp. 425–49. Cambridge MA: MITWPL. Halle, Morris, and Alec Marantz (1993). ‘Distributed Morphology and the Pieces of Inflection’. InK.HaleandS.J.Keyser(eds),The View from Building 20, pp. 111–76. Cambridge MA: MIT Press. Hammarström, Harald, and Lars Borin (2011). ‘Unsupervised learning of morphology’. Compu- tational Linguistics 37(2), 309–50. Hankamer, J. (1989). ‘Morphological parsing and the lexicon’. In W. Marslen Wilson (ed.), Lexical Representation and Process, pp. 392–408. Cambridge, MA: MIT Press. Hargus, Sharon (1997). ‘The Athabaskan disjunct prefixes: Clitics or affixes?’ In Jane Hill, P.J. Mistry, and Lyle Campbell (eds), The life of language: Papers in linguistics in honor of William Bright. Berlin: Mouton de Gruyter. Harm, W. Michael, and Mark S. Seidenberg (1999). ‘Phonology, reading acquisition, and dyslexia: Insights from connectionist models’. Psychological Review 106, 491–528. Harris, Zellig Sabbatai (1955). ‘From phoneme to morpheme’. Language 31, 190–222. Haspelmath, Martin (2006). ‘Against markedness (and what to replace it with)’. Journal of Linguistics 42, 25–70. Hawkins, John A. (2004). Efficiency and complexity in grammars.Oxford:OxfordUniversity Press. Hay, Jennifer, and R. Harald Baayen (2003). ‘Phonotactics, parsing and productivity’. Italian Journal of Linguistics 15, 99–130. Hippisley, Andrew (2010). ‘Lexical analysis’. In Nitin Indurkhya and Fred J. Damerau (eds) Handbookofnaturallanguageprocessing,secondeditionpp.31–58.BocaRaton,FL:CRCPress, Taylor and Francis Group. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi


Hjelmslev, Louis (1935). La catégorie des cas. Copenhagen: Munskgaard. Hockett, Charles F. (1947). ‘Problems of morphemic analysis’. Language 23, 321–43. Hocket, Charles F. (1958). Acourseinmodernlinguistics.NewYork:MacmillanCompany. Huntley, David (1993). ‘Old ’. The Slavonic languages, ed. by Bernard Comrie and Greville G. Corbett, pp. 125–87. London: Routledge. Jacobi, Hermann (1886). Ausgewählte Erzählungen in Mâhârâshtrî. Leipzig: Hirzel. Jakobson,Roman(1939).‘Signezéro’.InMélanges de linguistique, offerts à Charles Bally, pp. 143–52. Genève: Georg. Jurafsky, Daniel, and James H. Martin (2009). Speech and language processing. London: Pearson Education Ltd. Kaas, Jon H., Michael M. Merzenich, and Herbert P. Killackey (1983). ‘The reorganization of somatosensory cortex following peripheral nerve damage in adults and developing mam- mals’. Annual Review of Neuroscience 6, 325–56. Keuleers, Emmanuel, and Walter Daelemans (2007). ‘Morphology-based learning models of inflectional morphology: A Methodological Case Study’. In V. Pirrelli (ed.), Psycho- computational issues in morphology and processing, Lingue e Linguaggio 2, 151–74. Kibrik, Aleksandr E. (1977a). Opyt strukturnogo opisanija arčinskogo jazyka. Volume 2: Tak- sonomičeskaja grammatika. Moskva: Izdatel’stvo Moskovskogo Universiteta. Kibrik, Aleksandr E. (1977b). Opyt strukturnogo opisanija arčinskogo jazyka. Volume 3: Dinamičeskaja grammatika. Moskva:Izdatel’stvoMoskovskogoUniversiteta. Kibrik, Aleksandr, Alexandre Arkhipov, Mikhail Daniel, and Sandro Kodzasov (2007). Archi text corpus. . Kibrik, Aleksandr E., Sandro Kodzasov, Irina Olovjannikova, and Džalil Samedov (1977). Opyt strukturnogo opisanija arčinskogo jazyka. Volume 1: Leksika, fonetika.Moskva:Izdatel’stvo Moskovskogo Universiteta. Koenig, Jean-Pierre, and Karin Michelson (2012). ‘The (non)universality of syntactic selection and functional application’. In C. Piñon (ed.), Empirical studies in syntax and semantics, Volume 9, pp. 185–205. Paris: Centre National de la Recherche Scientifique. Koenig, Jean-Pierre, and Karin Michelson (in press). ‘Invariance in argument realization: The case of Iroquoian’. Language, 90(4). Kohonen, Teuvo (2001). Self-organizing maps. Berlin Heidelberg: Springer-Verlag. Kossmann,MaartenG.(2007).‘Berbermorphology’.InA.Kaye(ed.),Morphologies of Asia and Africa, pp. 429–46. Winona Lake, Indiana: Eisenbrauns. Kostić, Aleksandar, and Milena Božić (2007). ‘Constraints on probability distributions of grammatical forms’. Psihologija 40, 5–35. Kostić, Aleksandar, Tania Markovic, and Aleksandar Baucal (2003). ‘Inflectional morphology and word meaning: Orthogonal or co-implicative domains’. Morphological Structure in Language Processing, ed. by R. H. Baayen and R. Schreuder, pp. 1–44. Berlin: Mouton de Gruyter. Koutnik, Jan (2007). ‘Inductive modelling of temporal sequences by means of self-organization’. Proceedings of International Workshop on Inductive Modelling (IWIM 2007), pp. 269–77. Prague: Czech Technical University. Langer, Hagen (1991). Ein automatisches Morphemsegmentierungsverfahren für das Deutsche. PhD thesis, Georg-August-Universität zu Göttingen. OUP CORRECTED PROOF – FINAL, 9/3/2015, SPi

Lanman, Charles R. (1880). ‘A statistical account of noun-inflection in the Veda’. Journal of the American Oriental Society 10, 325–601.
Leumann, Manu (1977). Lateinische Laut- und Formenlehre. München: Beck.
Lounsbury, F. (1953). Oneida verb morphology. Yale University Publications in Anthropology 48. New Haven, CT: Yale University Press.
Lovins, Julie Beth (1968). ‘Development of a stemming algorithm’. Mechanical Translation and Computational Linguistics 11(2), 22–31.
Lynch, John (1978). A grammar of Lenakel. Pacific Linguistics, Series B, Vol. 55.
Mace, John (2003). Persian grammar: For reference and revision. London: Routledge.
Maiden, Martin (2005). ‘Morphological autonomy and diachrony’. Yearbook of Morphology 2004, 137–75.
Maiden, Martin, and Paul O’Neill (2010). ‘On morphomic defectiveness’. Proceedings of the British Academy 163, 103–24.
Mallon, Mick (1991). Introductory Inuktitut: Reference grammar. Montreal: Arctic College–McGill University Inuktitut Text Project.
Marchand, Hans (1969). The categories and types of present-day English word formation. 2nd edition. München: Beck.
Marcus, Gary F. (2001). The algebraic mind: Integrating connectionism and cognitive science. Cambridge, MA: MIT Press.
Marcus, Gary F., S. Vijayan, Shoba Bandi Rao, and Peter M. Vishton (1999). ‘Rule learning in 7-month-old infants’. Science 283, 77–80.
Marzi, Claudia, Marcello Ferro, Claudia Caudai, and Vito Pirrelli (2012a). ‘Evaluating Hebbian self-organizing memories for lexical representation and access’. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 886–93, Istanbul.
Marzi, Claudia, Marcello Ferro, and Vito Pirrelli (2012b). ‘Prediction and generalisation in word processing and storage’. Proceedings of the 8th Mediterranean Morphology Meeting (8th MMM 2011), 114–31. University of Patras, Greece.
Marzi, Claudia, Marcello Ferro, and Vito Pirrelli (2012c). ‘Word alignment and paradigm induction’. Lingue e Linguaggio 11(2), 251–74.
Matthews, P. H. (1974). Morphology: An introduction to the theory of word-structure. Cambridge: Cambridge University Press.
Matthews, P. H. (1991). Morphology. 2nd edition. Cambridge: Cambridge University Press.
McCarthy, John (2010). An introduction to Harmonic Serialism. Ms., University of Massachusetts, Amherst.
McClelland, James L., and David E. Rumelhart (1981). ‘An interactive activation model of context effects in letter perception: Part 1. An account of basic findings’. Psychological Review 88, 375–407.
Meir, Irit, Wendy Sandler, Carol Padden, and Mark Aronoff (2010). ‘Emerging sign languages’. In Marc Marschark and Patricia Elizabeth Spencer (eds), Oxford handbook of deaf studies, language, and education, vol. 2, pp. 267–80. New York: Oxford University Press.
Miestamo, Matti, Kaius Sinnemäki, and Fred Karlsson (2008). Language complexity: Typology, contact, change. Amsterdam: John Benjamins.
Milin, Petar, Victor Kuperman, Aleksandar Kostić, and R. Harald Baayen (2009). ‘Words and paradigms bit by bit: An information-theoretic approach to the processing of paradigmatic structure in inflection and derivation’. In James P. Blevins and Juliette Blevins (eds), Analogy in grammar: Form and acquisition, pp. 214–52. Oxford: Oxford University Press.

Milizia, Paolo (2013). L’equilibrio nella codifica morfologica. Roma: Carocci.
Milizia, Paolo (2014). ‘Semi-separate exponence in cumulative paradigms: Information-theoretic properties exemplified by Ancient Greek verb endings’. Linguistic Issues in Language Technology 11(4), 95–123.
Mohri, Mehryar, and Richard Sproat (2006). ‘On a common fallacy in computational linguistics’. In M. Suominen, A. Arppe, A. Airola, O. Heinämäki, M. Miestamo, U. Määttä, J. Niemi, K. K. Pitkänen, and K. Sinnemäki (eds), A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. SKY Journal of Linguistics 19, 432–9.
Moscoso del Prado Martín, Fermín, Aleksandar Kostić, and R. Harald Baayen (2004). ‘Putting the bits together: An information-theoretical perspective on morphological processing’. Cognition 94(1), 1–18.
Müller, Gereon (2003). ‘On decomposing inflection class features: Syncretism in Russian noun inflection’. In L. Gunkel, G. Müller, and G. Zifonun (eds), Explorations in nominal inflection, pp. 189–228. Berlin: Mouton de Gruyter.
Müller, Gereon (2005). Subanalyse verbaler Flexionsmarker. Ms., Universität Leipzig.
Noreen, Adolf (1923). Altisländische und altnorwegische Grammatik. Halle: Niemeyer.
Oberlies, Thomas (2001). Pāli: A grammar of the language of the Theravāda Tipiṭaka. Berlin: Mouton de Gruyter.
O’Neill, Paul (2011). ‘The notion of the morphome’. In M. Goldbach, M. Maiden, and J.-C. Smith (eds), Morphological autonomy: Perspectives from Romance inflectional morphology, pp. 70–94. Oxford: Oxford University Press.
Orsolini, Margherita, Rachele Fanari, and Hugo Bowles (1998). ‘Acquiring regular and irregular inflection in a language with verb classes’. Language and Cognitive Processes 13, 425–64.
Orsolini, Margherita, and William Marslen-Wilson (1997). ‘Universals in morphological representation: Evidence from Italian’. Language and Cognitive Processes 12, 1–47.
Ortmann, Albert (1999). ‘Affix repetition and non-redundancy in inflectional morphology’. Zeitschrift für Sprachwissenschaft 1, 76–120.
Packard, Jerome L. (2000). The morphology of Chinese: A linguistic and cognitive approach. Cambridge: Cambridge University Press.
Papagno, Costanza, Tim Valentine, and Alan Baddeley (1991). ‘Phonological short-term memory and foreign-language vocabulary learning’. Journal of Memory and Language 30, 331–47.
Parks, Douglas Richard (1976). A grammar of Pawnee. New York: Garland.
Penfield, Wilder, and Lamar Roberts (1959). Speech and brain mechanisms. Princeton, NJ: Princeton University Press.
Perry, Conrad, Johannes C. Ziegler, and Marco Zorzi (2007). ‘Nested incremental modeling in the development of computational theories: The CDP+ model of reading aloud’. Psychological Review 114(2), 273–315.
Pirrelli, Vito (2000). Paradigmi in Morfologia. Un approccio interdisciplinare alla flessione verbale dell’italiano. Pisa-Roma: Istituti Editoriali e Poligrafici Internazionali.
Pirrelli, Vito, Basilio Calderone, Ivan Herreros, and Michele Virgilio (2004). ‘Non-locality all the way through: Emergent global constraints in the Italian morphological lexicon’. Proceedings of the 7th Meeting of the ACL SIGPHON.

Pirrelli, Vito, Marcello Ferro, and Basilio Calderone (2011). ‘Learning paradigms in time and space. Computational evidence from Romance languages’. In M. Maiden, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Morphological autonomy: Perspectives from Romance inflectional morphology, pp. 135–57. Oxford: Oxford University Press.
Pischel, Richard (1900). Grammatik der Prakrit-Sprachen. Strassburg: Trübner.
Plank, Frans, and Wolfgang Schellinger (1997). ‘The uneven distribution of genders over numbers: Greenberg Nos. 37 and 45’. Linguistic Typology 1, 53–101.
Plaut, David, James McClelland, Mark Seidenberg, and Karalyn Patterson (1996). ‘Understanding normal and impaired word reading: Computational principles in quasi-regular domains’. Psychological Review 103, 56–115.
Polinsky, Maria, and Eric Potsdam (2001). ‘Long-distance agreement and topic in Tsez’. Natural Language and Linguistic Theory 19, 583–646.
Porter, Martin (1980). ‘An algorithm for suffix stripping’. Program 14(3), 130–7.
de Reuse, Willem Joseph (1994). Siberian Yupik Eskimo: The language and its contacts with Chukchi. Salt Lake City: University of Utah Press.
Rice, Keren (2000). Morpheme order and semantic scope: Word formation in the Athabaskan verb. Cambridge: Cambridge University Press.
Rice, Keren (2011). ‘Principles of affix ordering: An overview’. Word Structure 4(2), 169–200.
Rice, Keren (2012). ‘Morphological complexity in Athabaskan languages: A focus on discontinuities’. Presented at the SSILA workshop on Morphological Complexity in the Languages of the Americas, Annual Meeting of the Linguistic Society of America, Portland, Oregon, January 2012.
Rissanen, Jorma (2007). Information and complexity in statistical modeling. New York: Springer.
Roark, Brian, and Richard Sproat (2007). Computational approaches to morphology and syntax. Oxford: Oxford University Press.
Rosenblatt, Frank (1962). Principles of neurodynamics. New York: Spartan.
Round, Erich R. (2009). Kayardild morphology, phonology and morphosyntax. PhD dissertation, Yale University.
Round, Erich R. (2010). ‘Autonomous morphological complexity in Kayardild’. Paper presented at the Workshop on Morphological Complexity, Harvard University, 12 January 2010.
Round, Erich R. (2011). ‘Morphomes as a level of representation capture unity of exponence across the inflection–derivation divide’. Linguistica 51, 217–30.
Round, Erich R. (2013). Kayardild morphology and syntax. Oxford: Oxford University Press.
Round, Erich R. (forthcoming). ‘Kayardild inflectional morphotactics is morphomic’. In A. Luis and R. Bermudez-Otero (eds), The morphome debate. Oxford: Oxford University Press.
Round, Erich R. (in prep). Paradigmatic evidence for morphomic organisation in Kayardild inflection.
Ruhlen, Merritt (1987). A guide to the world’s languages: Classification. Vol. 1. Stanford: Stanford University Press.
Russell, S., and P. Norvig (2009). Artificial intelligence: A modern approach. 3rd edition. Upper Saddle River, NJ: Prentice Hall.
Saffran, Jenny, Richard N. Aslin, and Elissa L. Newport (1996). ‘Statistical learning by 8-month-old infants’. Science 274, 1926–8.
Saffran, Jenny, Elissa L. Newport, and Richard N. Aslin (1996). ‘Word segmentation: The role of distributional cues’. Journal of Memory and Language 35, 606–21.
Sagot, Benoît (2013). ‘Comparing complexity measures’. Paper presented at the workshop ‘Computational Approaches to Morphological Complexity’, 22 February 2013, Paris.

Sagot, Benoît, and Géraldine Walther (2011). ‘Non-canonical inflection: Data, formalisation and complexity measures’. In Actes de SFCM 2011 (2nd Workshop on Systems and Frameworks for Computational Morphology), Zürich, Switzerland.
Salanova, Andres Pablo (2012). ‘Reduplication and verbal number in Mebengokre’. Presented at the SSILA workshop on Morphological Complexity in the Languages of the Americas, Annual Meeting of the Linguistic Society of America, Portland, Oregon, January 2012.
Sampson, Geoffrey, David Gil, and Peter Trudgill (2009). Language complexity as an evolving variable (Studies in the Evolution of Language 13). Oxford: Oxford University Press.
Sapir, Edward (1921). Language. New York: Harcourt, Brace & World.
Say, Tessa, and Harald Clahsen (2002). ‘Words, rules and stems in the Italian mental lexicon’. In S. Nooteboom, Fred Weerman, and Frank Wijnen (eds), Storage and computation in the language faculty, pp. 96–122. Dordrecht: Kluwer Academic Publishers.
Schmitt, Rüdiger (1981). Grammatik des Klassisch-Armenischen. Vol. 32 of Innsbrucker Beiträge zur Sprachwissenschaft. Innsbruck: Institut für Sprachwissenschaft der Universität Innsbruck.
Schönig, Claus (1998). ‘Azerbaijanian’. In L. Johanson and E. A. Csató (eds), The Turkic Languages, pp. 248–60. London: Routledge.
Segel, Esben (2008). ‘Re-evaluating zero: When nothing makes sense’. SKASE Journal of Theoretical Linguistics 5(2), 1–20.
Seidel, August (1900). Swahili Konversations-Grammatik. Heidelberg: Julius Groos.
Shannon, Claude E. (1948). ‘A mathematical theory of communication’. Bell System Technical Journal 27, 379–423.
Shannon, Claude E. (1951). ‘Prediction and entropy of printed English’. Bell System Technical Journal 30, 50–64.
Spencer, Andrew (2007). ‘Extending deponency: Implications for morphological mismatches’. Proceedings of the British Academy 145, 45–70.
Steever, Sanford B. (1998). ‘Kannada’. In S. B. Steever (ed.), The Dravidian languages, pp. 129–57. London: Routledge.
Stump, Gregory T. (1993). ‘On rules of referral’. Language 69, 449–79.
Stump, Gregory T. (2001). Inflectional morphology. Cambridge: Cambridge University Press.
Stump, Gregory T. (2002). ‘Morphological and syntactic paradigms: Arguments for a theory of paradigm linkage’. Yearbook of Morphology 2001, 147–80.
Stump, Gregory T. (2012). ‘The formal and functional architecture of inflectional morphology’. In A. Ralli, G. Booij, S. Scalise, and A. Karasimos (eds), Morphology and the Architecture of Grammar: Online Proceedings of the Eighth Mediterranean Morphology Meeting, pp. 245–70.
Stump, Gregory, and Raphael A. Finkel (2013). Morphological typology: From word to paradigm. Cambridge: Cambridge University Press.
Thumb, Albert, and Richard Hauschild (1959). Handbuch des Sanskrit: Eine Einführung in das sprachwissenschaftliche Studium des Altindischen. II. Teil. Heidelberg: Winter.
Trommer, Jochen, and Gereon Müller (eds) (2006). Subanalysis of argument encoding in distributed morphology. Vol. 84 of Linguistische Arbeitsberichte. Leipzig: Institut für Linguistik, Universität Leipzig.

Vaillant, André (1958). Grammaire comparée des langues slaves. Tome II: Morphologie. Deuxième partie: Flexion pronominale. Lyon-Paris: IAC.
Vajda, Edward J. (2010). ‘A Siberian link with Na-Dene languages’. In James Kari and Ben A. Potter (eds), The Dene–Yeniseian connection (Anthropological Papers of the University of Alaska, New Series, Vol. 5(1–2)), pp. 33–99. Fairbanks: Department of Anthropology, University of Alaska Fairbanks.
Voegtlin, Thomas (2002). ‘Recursive self-organizing maps’. Neural Networks 15, 979–91.
Von Hinüber, Oskar (1968). Studien zur Kasussyntax des Pāli, besonders des Vinaya-piṭaka. München: Kitzinger.
Weiers, Michael (2011). ‘Moghol’. In J. Janhunen (ed.), The Mongolic Languages, pp. 248–64. London: Routledge.
Whitney, Carol (2001). ‘Position-specific effects within the SERIOL framework of letter-position coding’. Connection Science 13, 235–55.
Wiese, Bernd (2004). ‘Categories and paradigms: On underspecification in Russian declension’. In Gereon Müller, Lutz Gunkel, and Gisela Zifonun (eds), Explorations in nominal inflection, pp. 321–72. Berlin: Mouton de Gruyter.
Wunderlich, Dieter, and Ray Fabri (1994). ‘Minimalist morphology: An approach to inflection’. Zeitschrift für Sprachwissenschaft 20, 236–94.
Yu, Alan C. L. (2007). A natural history of infixation. Oxford: Oxford University Press.
Zúñiga, Fernando (2000). Mapudungun. München: Lincom Europa.
Zwicky, A. (1985). ‘How to describe inflection’. In M. Niepokuj, M. van Clay, V. Nikiforidou, and D. Feder (eds), Proceedings of the Eleventh Annual Meeting of the Berkeley Linguistics Society, pp. 372–86. Berkeley, CA: Berkeley Linguistics Society.

Languages Index

Arabic 149
Archi 3, 9, 18, 93–116, 177, 185
Armenian 203
Athabaskan 17
Azerbaijani 203
Babine-Witsuwit’en 19
Bambara 4
Belhare 203
Berber 203
Berik 53
Bobot 54
Celtic 16, 23
Chinantec 3
Choctaw 21, 203
Church Slavic, Old 177
Dargwa 67
Emplawas 54
English 8, 18, 25, 120, 143, 148
English, American 22, 119
Eskimo, Greenlandic 70
Eskimo-Aleut 18
Estonian 199–201
Finnish 54
French 8
Georgian 25
German 8, 148, 149, 152, 154, 155, 158, 159, 162, 185, 186–90, 196–8
Gothic 199
Greek 3, 90
Icelandic 25
Icelandic, Old 178
Iha 67
Indic
  Middle 167–84
  Old 167–84
Inuktitut 203
Iroquoian 9
Italian 149, 158, 159, 160, 162
Jaina-Māhārāṣṭrī Prakrit 183
Kalkatungu 203
Kannada 203
Kanum 53–68
Kayardild 30–52
Kobon 203
Kwakw’ala 14–17, 18, 19, 24
Latin 3, 22, 38, 49, 79, 90, 130, 168, 178
Lenakel 201–4
Lingala 54
Manambu 203
Mandarin 18, 25
Mapudungun 203
Meryam Mer 54
Moghol 203
Mohawk 17, 25
Nahuatl 203
Nakh-Daghestanian 93
Oneida 9, 17, 69–92, 99
Pali 168, 175–82
Palu’e 67
Pawnee 203
Persian 199
Quechua 7
Remo 203
Russian 178
Salish 18
Sanskrit 199
Sanskrit, Vedic 20
Sign Language, Al Sayyid Bedouin 25
Skou 67
Slavic 179
Slovene 177
Somali 203
Surmiran 23
Swahili 190–1, 198–200, 203
Tariana 67
Tlingit-Athabaskan-Eyak 20
Tsez 112
Turkana 203
Turkish 70
Ubykh 199
Udmurt 203
Vietnamese 18
Wakashan 16, 18
Warlpiri 23
Yeniseian 20
Yupik, Central Alaskan 17
Yupik, Central Siberian 18, 19

Names Index

Abbott 71
Abrams 70
Ackerman 4, 5, 6, 77, 88, 119, 128
Aikhenvald 67, 203
Albright 142, 147, 161, 165
Anderson 4, 14, 35, 67, 69, 187, 203
Andrews 203
Aronoff 23, 49
Baayen 147, 163, 168
Baddeley 163
Baerman 22, 54, 57, 85, 167, 174, 179, 182
Baeza-Yates 192
Baker 11, 12
Bank 9
Bauer 68
Beard 39, 40, 41
Bickel 203
Blake 203
Blevins 69, 79, 141, 143
Boas 18
Bobaljik 4, 199
Boelaars 54
Booij 142
Borin 164
Bowers 145
Božić 168
Broadwell 203
Brøndal 167, 179
Brown 169
Burzio 142
Bybee 20, 147
Cairns 168
Carstairs see Carstairs-McCarthy
Carstairs-McCarthy 4, 13, 69, 79, 147, 169
Chafe 72, 84
Chomsky 24
Chumakina 9
Clackson 183
Clahsen 147, 166
Coltheart 145
Corbett 9, 38, 49, 142
Cowan 163
Croft 168
Csúcs 203
Culicover 7
Culy 4
Daelemans 164
Dammel 7
Davies 203
Davis 145
de Reuse 18
de Saussure 13
Dehaene 156
Dimmendaal 203
Drabbe 54
Ehala 200
El-Solami-Mewis 203
Evans 30, 33, 34, 35, 42
Fabri 187
Fernández 8
Ferro 9
Finkel 6, 9, 69, 77, 119
Fortescue 70
Fraser 49, 142
Gathercole 163
Gazdar 77
Gerken 145
Gershenson 8
Givón 25
Goldsmith 185, 201
Greenberg 167, 168
Halle 24, 194, 201
Hammarström 164
Hankamer 70
Hargus 19
Harm 145
Harris 201
Haspelmath 168
Hauschild 171, 177, 180
Hawkins 168
Hay 163
Hayes 142, 161
Hippisley 5
Hjelmslev 167, 182
Hockett 7, 22
Huntley 177
Jacobi 183
Jakobson 167
Jurafsky 4
Kaas 151
Keuleers 164
Kibrik 93, 94, 103, 114
Koenig 9
Kohonen 143, 151
Kossmann 203
Kostić 168, 170
Koutnik 152
Kürschner 7
Kuster 7
Langer 201
Lanman 171
Leumann 179
Lounsbury 70, 72, 82, 88
Lovins 185
Lynch 201, 202, 203
Mace 199
Maiden 30, 43, 49
Mallon 203
Marantz 194, 201
Marchand 18
Marcus 143, 144, 145, 146, 150, 159, 161, 165
Marslen-Wilson 147
Martin 4
Marzi 9
Matthews 35, 142
McCarthy 195
McClelland 145
Meir 25
Michelson 9
Miestamo 7
Milin 6, 119, 128, 168, 170
Milizia 9
Moder 147
Mohri 4
Moscoso del Prado Martín 6, 119, 128, 142, 168
Müller 185, 186, 197, 198, 201
Noreen 178
Norvig 77
Oberlies 175, 176
O’Neill 49
Orsolini 147
Ortmann 67, 201
Packard 18
Papagno 163
Parks 203
Penfield 151
Perry 145
Pirrelli 9
Pischel 183
Plank 177
Plaut 145
Polinsky 112
Porter 185
Potsdam 112
Ribeiro-Neto 192
Rice 19, 21, 185
Rissanen 170
Roark 149
Roberts 151
Rosenblatt 144
Ruhlen 202
Rumelhart 145
Russell 77
Saffran 145, 201
Sagot 5, 6
Sampson 7
Sapir 11
Saussure 13
Say 147
Schellinger 177
Schmitt 203
Schönig 203
Segel 187
Seidel 191, 198, 203
Seidenberg 145
Shannon 128, 169, 170
Slobin 147
Spencer 38
Sproat 4, 149
Steever 203
Stemberger 147
Stump 6, 9, 35, 36, 50, 54, 69, 75, 77, 84, 85, 119, 142, 179, 194
Thumb 171, 177, 180
Trommer 9
Vaillant 177
Vajda 20
Van den Bosch 164
Voegtlin 152
Von Hinüber 175
Walther 6
Weiers 203
Whitney 145
Wiese 179
Wunderlich 187
Yu 107
Zúñiga 203
Zwicky 36, 54, 76, 83

Subject Index

ablaut
  apophony 148
  vowel alternation 22, 36, 43, 123–4, 145
acquisition of morphology 25, 70, 86 n. 9, 87, 89, 129–30, 141–8, 150, 151, 153–7, 159, 163–5, 185–204
agreement 9, 20, 54–5, 58, 61–8, 93–116, 102, 174 n. 12, 185–92, 197–203
allomorphy 7–8, 13, 15–16, 23–4, 46–7, 50, 73–4, 76–7, 79 n. 5, 81–2, 86–8, 89, 91, 125 n. 3, 202
architecture of grammar 20, 24, 49, 142, 166
circumfix 21, 148, 164
clitics 20, 24
compounding 18, 25, 65–6
connectionism 142, 144, 148, 150, 169, 175, 177, 179
cumulative exponence 21, 44, 45, 51, 169
defectiveness 22, 65
deponency 22, 38–9, 65
derivation 18, 20, 40, 41, 45, 48–9, 50, 99, 100, 107, 142, 151, 175 n. 14
dynamic systems 142, 151, 155, 166
empty morph 15, 21
entropy 5–6, 77, 119, 126, 127, 128–9, 136 n. 9, 142 n. 1, 170, 173 n. 10
exponence, rule of 35, 36, 37, 40, 75, 77, 79, 81–5, 91, 195 n. 8
faithfulness 13, 16
frequency 88, 103–4, 151, 163, 165, 168–75, 177, 179, 180
grammaticalization 24, 66
homonymy (accidental identity) 84
incorporation 71
infixation 9, 18, 21, 93–116
inflection class 4–6, 8–9, 29, 30, 49, 50, 69, 86, 88 n. 10, 90, 119–20, 124–6, 128, 130, 132, 136–7, 149 n. 3, 168 n. 2, 172 n. 9
information theory 168
learning see acquisition of morphology
learning algorithms 143, 164, 187, 192–202
Lexical Phonology 24
lexical specification 4, 98 n. 6, 111–16
markedness 9, 13, 16, 167–9, 176, 182 n. 24, 191
morphological features 5, 75, 192
morphology
  abstractive 141–66
  constructive 141, 143, 151
  word-based 9, 35, 141
morphome 8, 29–30, 32 n. 1, 39, 43–51
multiple exponence 21, 67, 101, 200, 201
Optimality Theory 13, 195
paradigm function 35, 76, 142
phonology 3, 4, 9, 13, 15–16, 20, 23–4, 29, 30, 35–6, 39–48, 50, 51, 75, 84, 86, 87, 93, 94, 95, 99–100, 101–14, 116, 120, 124, 126, 142, 149 n. 3, 152, 164, 174 n. 13, 175, 176, 177, 180 n. 23, 183 n. 25, 185 n. 1, 187 n. 3, 192, 197 n. 11, 201, 202, 203 n. 15
portmanteau form 44, 54, 57, 62, 65, 71, 188, 199, 203
predictability 5–6, 8, 20, 23, 65–8, 119, 125–7, 136–8
processing 141–3, 145, 147, 149, 150–2, 155, 163–6, 202
rarity 18, 53, 54, 61, 93, 169, 175, 182 n. 24, 183
redundancy 7, 141, 169, 170 n. 4
referral, rule of 8, 35–40, 42, 48, 50, 53–5, 58–65, 68, 81, 84–6, 169 n. 3, 173 n. 11, 194
segmentation 9, 69–71, 76–7, 86 n. 9, 91, 159, 164, 177, 186–92, 195–9, 201–2
self-organisation 142, 143, 151, 155, 161, 164
semantics 3, 8, 9, 14–15, 19–20, 23, 29, 36, 70 n. 2, 71–2, 74, 75–6, 78–9, 83, 89, 91, 114–15, 174 n. 12, 180
structuralism 21, 77
subanalysis 159, 186–9, 191, 192, 196–202
subanalysis complexity classes 189–92, 199–203
suppletion 4, 54, 57, 60–1, 65, 67–8
syncretism 9, 22, 29, 30–7, 39, 42, 57–8, 62–3, 93, 96–9, 116, 167–84, 188, 201
syntax 3–8, 12–16, 24–5, 69, 89, 95, 100, 174 n. 12, 175, 178, 186
takeover 58, 60, 63–5, 67, 68
typology 5, 6, 7, 8, 11, 23, 125 n. 4
umlaut 22, 24
underspecification 70, 81–3, 85, 90 n. 11, 91, 169 n. 3, 173 n. 11, 179 n. 22, 180, 193, 194
zero exponence 21, 71, 97, 105, 172 n. 9, 185–204