Widespread but Not Universal: Improving the Typological Coverage of the Grammar Matrix
Total Page:16
File Type:pdf, Size:1020Kb
Widespread but Not Universal: Improving the Typological Coverage of the Grammar Matrix Scott Drellishak A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2009 Program Authorized to Offer Degree: Department of Linguistics University of Washington Graduate School This is to certify that I have examined this copy of a doctoral dissertation by Scott Drellishak and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. Chair of the Supervisory Committee: Emily M. Bender Reading Committee: Emily M. Bender Barbara Citko Sharon Hargus Date: In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.” Signature Date University of Washington Abstract Widespread but Not Universal: Improving the Typological Coverage of the Grammar Matrix Scott Drellishak Chair of the Supervisory Committee: Assistant Professor Emily M. Bender Department of Linguistics The LinGO Grammar Matrix provides a foundation for building grammars of natural languages in hpsg. It includes a web-based questionnaire that allows a linguist to describe a natural language, and then creates a starter grammar for that language based on the answers. In this dissertation, I describe improvements I have made to the typological coverage of this system, including support for core case marking, agreement in person, number, and gender, direct-inverse languages, and a lexicon containing an arbitrarily large number of lexical types, lexical items, and inflectional morphemes. TABLE OF CONTENTS Page ListofFigures ................................... vi ListofTables.................................... vii Chapter1: Introduction ............................ 1 1.1 Organization ............................... 3 1.2 Format................................... 3 1.3 Abbreviations ............................... 4 Chapter2: Background............................. 6 2.1 hpsg .................................... 6 2.1.1 Types ............................... 6 2.1.2 Syntax............................... 10 2.1.3 Semantics ............................. 14 2.1.4 Summary ............................. 18 2.2 Typology ................................. 18 2.2.1 Greenberg ............................. 18 2.2.2 Chomsky ............................. 20 2.2.3 SurveysandDatabases . 23 2.2.4 LinguisticQuestionnaires. 23 2.2.5 Summary ............................. 25 2.3 Multilingual Grammar Engineering . 26 2.3.1 ParGram ............................. 26 2.3.2 NaturalLanguageGeneration . 27 2.3.3 ReferentialPropertiesofNominals . 28 2.3.4 AutomaticAcquisitionofGrammars . 28 2.3.5 Porting a Grammar of a Closely-Related Language . 29 i 2.4 TheLinGOGrammarMatrix . 29 2.4.1 History............................... 30 2.4.2 CoreandLibraries ........................ 33 2.4.3 MatrixCustomizationSystem . 35 2.4.4 TheQuestionnaire ........................ 36 2.4.5 NewLibraries ........................... 45 2.5 Summary ................................. 46 Chapter3: Case................................. 47 3.1 Typology ................................. 47 3.1.1 Morphosyntactic Alignment . 48 3.1.2 SplitErgativity .......................... 50 3.1.2.1 NatureofMainVerb . 51 3.1.2.2 Nature of nps...................... 52 3.1.2.3 ClausalSplits.. .. 52 3.1.3 Focus-caseSystems . 53 3.1.4 Summary ............................. 54 3.2 Analysis .................................. 54 3.2.1 MarkingStrategies . .. .. 55 3.2.1.1 MixedMarking ..................... 56 3.2.1.2 Case-marked Determiners . 58 3.2.1.3 Summary ........................ 60 3.2.2 SimpleCase-Marking . 60 3.2.3 Split-S............................... 62 3.2.4 Fluid-S............................... 63 3.2.5 Split-N............................... 64 3.2.6 Split-V............................... 64 3.2.7 Focus-case............................. 65 3.2.8 Summary ............................. 66 3.3 Questionnaire ............................... 66 3.4 TestCases................................. 69 3.4.1 Pseudo-Languages . .. .. 70 3.4.2 NaturalLanguages ........................ 78 ii 3.4.2.1 German ......................... 78 3.4.2.2 Dyirbal ......................... 79 3.4.2.3 Hindi .......................... 82 3.4.2.4 Tagalog ......................... 86 3.5 Summary ................................. 87 Chapter4: Direct-inverseLanguages. ... 89 4.1 Typology ................................. 89 4.1.1 Algonquian ............................ 90 4.1.2 Fore ................................ 91 4.2 Analysis .................................. 92 4.2.1 sc-args .............................. 95 4.2.2 Algonquian ............................ 96 4.2.3 Fore ................................ 99 4.3 Questionnaire ............................... 102 4.4 TestCases................................. 103 4.4.1 Cree ................................ 103 4.4.2 Fore ................................ 107 4.5 Summary ................................. 109 Chapter5: Agreement ............................. 112 5.1 TypologyofAgreement. 114 5.2 AnalysisofAgreement . .. .. .. 121 5.3 Features .................................. 124 5.3.1 TypologyofGender. 125 5.3.2 AnalysisofGender . 128 5.3.3 TypologyofNumber . 129 5.3.4 AnalysisofNumber. 133 5.3.5 TypologyofPerson . 137 5.3.6 AnalysisofPerson . .. .. .. 140 5.3.6.1 Merging Person and Number . 141 5.3.6.2 Subdivided First Person . 144 5.3.7 Summary ............................. 146 iii 5.4 Questionnaire ............................... 146 5.4.1 Describing Hierarchies . 147 5.4.2 OtherFeatures .......................... 149 5.4.3 HierarchyAugmentation . 151 5.5 TestCases................................. 156 5.6 Summary ................................. 156 Chapter6: CaseStudy:Sahaptin . 157 6.1 ABriefSketchofSahaptin. 158 6.1.1 CaseMarking ........................... 159 6.1.2 Agreement and Argument Marking . 160 6.2 CustomizingaSahaptinGrammar. 162 6.2.1 WordOrder ............................ 164 6.2.2 Number .............................. 164 6.2.3 Person............................... 165 6.2.4 Case ................................ 167 6.2.5 Direct-Inverse . .. .. 167 6.2.6 OtherFeatures .......................... 168 6.2.7 Lexicon .............................. 169 6.3 TestingtheSahaptinGrammar . 176 6.4 Summary ................................. 181 Chapter7: Conclusion ............................. 183 7.1 TheMatrixandTypology . 183 7.1.1 RegressionTesting . 184 7.1.2 TestingTypology . 186 7.1.3 ADatabaseofChoicesFiles . 187 7.1.4 Computational Linguistic Typology . 188 7.2 FutureWork................................ 192 7.2.1 CoordinationandAgreement . 192 7.2.2 SyntacticErgativity . 194 7.2.3 LexicalTypeHierarchies . 194 7.2.4 Usability.............................. 195 iv 7.3 Conclusion................................. 196 Bibliography .................................... 198 AppendixA: GermanChoices . .. .. .. 208 AppendixB: DyirbalChoices. 212 AppendixC: HindiChoices............................ 216 AppendixD: TagalogChoices . .. .. .. 219 AppendixE: CreeChoices ............................ 222 AppendixF: ForeChoices ............................ 228 AppendixG: SahaptinChoices . 233 Appendix H: Sahaptin Test Sentences . 246 v LIST OF FIGURES Figure Number Page 2.1 The customization system questionnaire main page . ..... 37 2.2 Noun types in the customization system questionnaire . ....... 39 2.3 Latin tense slot in the questionnaire . ... 41 2.4 Latin person/number slot in the questionnaire . ..... 42 3.1 The beginning of the case section of the questionnaire . ....... 69 3.2 The section of the questionnaire for defining quirky cases....... 70 4.1 The direct-inverse section of the questionnaire . ....... 104 5.1 The gender section of the questionnaire . ... 149 5.2 The number section of the questionnaire . 150 5.3 The person section of the questionnaire . ... 151 5.4 The “other features” section of the questionnaire . ....... 152 5.5 Algorithmforhierarchyaugmentation. ... 155 6.1 A Sahaptin pronoun defined in the questionnaire . .... 166 7.1 The general information section of the questionnaire . ........ 189 vi LIST OF TABLES Table Number Page 2.1 Basicwordordersacrosslanguages . 19 2.2 HeadingsgeneratedbyAgile . 27 2.3 Paradigm of the Latin verb amare .................... 40 2.4 Portion of a choices file corresponding to Figure 2.2 . ...... 43 3.1 Testsentences:None ........................... 71 3.2 Test sentences: Nominative-accusative . .... 71 3.3 Test sentences: Ergative-absolutive . .... 72 3.4 Testsentences: Tripartite . 72 3.5 Testsentences:Split-S .......................... 72 3.6 Testsentences:Fluid-S. .. .. .. 73 3.7 Testsentences:Split-N .......................... 74 3.8 Testsentences:Split-V .......................... 75 3.9 Test sentences: Focus-case . 76 3.10 Test sentences: Adpositions . 76 3.11 Test sentences: