Crowgey Washington 0250E 2

©Copyright 2019 Joshua Crowgey Braiding Language (by Computer): Lushootseed Grammar Engineering Joshua Crowgey A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2019 Reading Committee: Emily M. Bender, Chair David Beck Sharon Hargus Program Authorized to Offer Degree: Linguistics University of Washington Abstract Braiding Language (by Computer): Lushootseed Grammar Engineering Joshua Crowgey Chair of the Supervisory Committee: Professor Emily M. Bender Linguistics This dissertation describes the beginnings of the t̕əbšucid project. t̕əbšucid, literally the braiding of language, is a way to refer to “grammar” in Lushootseed (also known as Puget Salish, ISO-639-3:lut). The t̕əbšucid project has three overlapping goals: (1) to advance linguistic sci- ence via grammar engineering methods with a specific focus on Lushootseed; (2) to package and distribute linguistic results in a way which is useful for people involved in the development of language-related applications which may serve a role in the documentation and revitalization of endangered languages; (3) to highlight the inherent value of the traditional language of the Puget Sound. To those ends, I began to build a system which could process Lushootseed texts. I implemented an initial morphophonological analyzer which can map Lushootseed orthogra- phy to a regularized morphophonemic representation. This representation is one which can serve as input to a syntactico-semantic grammar which provides semantic analyses for input sen- tences. I then implemented an initial syntactico-semantic grammar which maps the morpheme- regularized form to sentence-level semantics, via an explicit syntactic representation. In doing so, I address the first and third motivations listed above by presenting a series of case studies which emerged from the initial implementation work. In connection to the second motivation, this dissertation describes both the overall architecture and the context of the system, in the hope that the work I’ve carried out to-date can be is something which can be built upon by others. TABLE OF CONTENTS Page List of Figures . iv List of Tables . v Chapter 1: Introduction . 1 1.1 Motivation . 2 1.2 About the author . 7 1.3 Introduction to the Dissertation . 9 Chapter 2: Lushootseed . 15 2.1 Language Name and Areal Distribution . 15 2.2 History . 19 2.3 Philogenetics . 36 2.4 Typology . 39 2.5 Conclusion . 47 Chapter 3: Linguistic Theory and Framework . 48 3.1 Bird’s Eye View . 48 3.2 Finite State Morphology and Phonology . 50 3.3 Head-driven Phrase Structure Grammar for Morphosyntax . 58 3.4 Introduction to DELPH-IN and Type Description Language . 69 3.5 Minimal Recursion Semantics . 71 3.6 Conclusion . 85 Chapter 4: Environment and Development Cycle . 86 4.1 Overall Architecture . 86 4.2 The LinGO Grammar Matrix . 88 i 4.3 Metagrammar Engineering: the CLIMB approach . 91 4.4 Meta-choices . 92 4.5 Transducer Generation . 95 4.6 Testsuites, Grammar profiles, Regression Testing . 98 4.7 Lushootseed data, testsuite construction . 100 4.8 Conclusion . 102 Chapter 5: Implemented Lushootseed Morphophonology . 104 5.1 Resources, Methodology, Initial Results . 105 5.2 Structures . 113 5.3 Relationship to Descriptive/Documentary Efforts . 123 5.4 Conclusion . 125 Chapter 6: Grammar Engineering Examples . 126 6.1 VSOv2 Word Order . 126 6.2 A morphological example: object markers . 149 6.3 Conclusion . 157 Chapter 7: Case-study I: Valence-increasing Morphology . 158 7.1 Data . 159 7.2 Theoretical and Practical Background . 175 7.3 Analysis . 201 7.4 Results . 228 7.5 Conclusion . 232 Chapter 8: Case Study II: Verbal Arguments and Non-verbal Predicates . 236 8.1 Nouns and Verbs in Salishan . 237 8.2 Data . 241 8.3 Analysis . 251 8.4 Discussion and Results on the Testsuite . 278 Chapter 9: Evaluation . 280 9.1 Evaluation Context: other notable implementations . 281 9.2 Results . 298 ii 9.3 Error Analysis on Unparsed Items . 316 9.4 Conclusion . 328 Chapter 10: Conclusion . 329 Bibliography . 331 Appendix A: Scripts . 352 Appendix B: Results for Valence Increasing Morphology Testsuite . 356 iii LIST OF FIGURES Figure Number Page 2.1 Lushootseed speaking area . 17 3.1 Automaton to model the non-finite whimsical language of bovines /moo∗/... 52 3.2 Transducer mapping the whimsical language of bovines /moo∗/ to a whimsical language of specters /boo∗/............................ 53 3.3 Transducers implementing Phonological Transformation Rules . 56 3.4 Automata to model the language of three temporal proclitics . 56 3.5 Automaton created by concatenating three regular languages . 56 3.6 Transducer created by composition of the automaton in Figure 3.5 with the transducer of Figure 3.3. 57 3.7 A hierarchy of animals . 60 3.8 An animals hierarchy with feature inheritance and typed feature values. 61 4.1 Text-processsing system architecture . 87 4.2 The text processing system as a pipeline. 88 4.3 The customization of a grammar . 90 5.1 Enumeration of the FST development methodology . 108 5.2 Initial Performance of Morphophonological Analyzer . 112 5.3 Network overview for the morphotactic system . 119 8.1 A set of tense, aspect and mood choices implementing Lushootseed general prefixes. 248 iv LIST OF TABLES Table Number Page 1.1 Abbreviations for linguistic glosses used in this document . 14 2.1 Selected Lushootseed place names and their English corollaries. Mostly from (Hilbert et al., 2001) . 18 2.2 Salishan Genealogical Grouping by Swadesh (1950) . 37 2.3 Salishan Genealogical Grouping by Thompson (1979) . 38 2.4 Lushootseed Phonological Inventory . 40 2.5 Lexical Suffixes used as numeral classifiers (Hess, 1995, 20) . 43 2.6 Referents of Lushootseed lexical suffixes: the same referent may require different suffixes in different contexts (Hess, 1998, 22–23). 46 2.7 Lushootseed Nouns with corresponding lexical suffixes . 47 3.1 The Chomsky-Schuützenberger hierarchy . 50 3.2 Structured Labels in DELPH-IN Elementary Predications . 73 4.1 [incr tsdb()] fields and the corresponding data types available for each testsuite item . 99 4.2 Testsuites used in HPSG Grammar Development . 103 5.1 The FST evaluation data to date . 112 5.2 Affixal template of Lushootseed verb stems . 114 6.1 The forms of Lushootseed main clause subject markers . 131 6.2 Testsuite items for subject markers and general word order constraints . 132 6.3 Results against the extended testsuite . 148 6.4 Lushootseed object markers . 149 6.5 Development testsuite for Lushootseed Object Markers . 151 7.1 Properties contributing to AGENT and PATIENT proto-roles . 186 7.2 The dimensions of Hopper and Thompson’s (1980) transitivity scale . 187 7.3 Summary of grammar coverage over valence-increasing morphology testsuite . 229 v 8.1 Lushootseed General Prefixes . 246 8.2 The Testsuite Results for System I . 279 8.3 The Testsuite Results for System II . 279 9.1 The evaluation results for Bear and Ant . 301 9.2 The evaluation results for Bear and Fishhawk . 307 9.3 The evaluation results for Coyote and Big Rock . 313 9.4 The evaluation results for Mink . 315 vi ACKNOWLEDGMENTS I would first like to humbly acknowledge and thank the Lushootseed-speaking elders who shared their language and recorded their stories. Although I never knew them personally, the voices of Vi taqʷšəblu Hilbert, Edward sʔadacut Sam, Martha səswix̌ab Lamont and others have changed me and influenced who I am and how I think about the world. I would like to thank Emily M. Bender, a language scientist par excellence—indefatigable in her search for generalizations and explanations while never taking her eyes far from the rich diversity of the data she examines. She carves mystifying phenomena into empirical questions, seemingly with ease. This thesis would not have been possible without massive amounts of help and guidance from David Beck. His deep knowledge of Lushootseed grammatical structures often steered me away from analytical dead ends and connundra. I would also like to.

Load more