When Grammar Can't Be Trusted

Department of Language and Culture (ISK)

When grammar can’t be trusted - Valency and semantic categories in North Sámi syntactic analysis and error detection

Linda Wiechetek

A dissertation for the degree of Philosophiae Doctor, December 2017

Für meine Eltern & buot sámegiela oahpahalliide

Contents

I Beginning
1 Introduction
2 Background and methodology
  2.1 Theoretical background: Valency theory
    2.1.1 Syntactic valency
      2.1.1.1 Obligatoriness
      2.1.1.2 Syntactic tests
    2.1.2 Selection restrictions and semantic prototypes
    2.1.3 Semantic valency
      2.1.3.1 Semantic roles vs. syntactic functions
      2.1.3.2 Semantic roles vs. referential semantics
    2.1.4 Semantic verb classes
    2.1.5 Criteria for potential governors
  2.2 Valency theory in Sámi research
    2.2.1 Case and valency
    2.2.2 Rection and valency
    2.2.3 Transitivity and valency
    2.2.4 Syntactic valency
    2.2.5 Governors
    2.2.6 Selection restrictions
    2.2.7 Semantic valency
  2.3 Human-readable and machine-readable valency resources
    2.3.1 Human-readable valency resources
    2.3.2 Machine-readable valency resources
  2.4 Methodology and framework
    2.4.1 Methodology
    2.4.2 Framework
      2.4.2.1 The Constraint Grammar formalism
      2.4.2.2 North Sámi constraint grammars
  2.5 Definition of key concepts

II Middle
3 Valency annotation
  3.1 Background
  3.2 The valency annotation grammar valency.cg3
    3.2.1 Governors
      3.2.1.1 Coverage
      3.2.1.2 Lexicon and morphological processes
      3.2.1.3 Lexicon and syntactic issues: Multi-word verbs
      3.2.1.4 Linguistic considerations: Governing verb vs. auxiliary
    3.2.2 Valency tags
      3.2.2.1 Semantic role specifications in valency tags
      3.2.2.2 Morpho-syntactic specifications in valency tags
      3.2.2.3 Selection restrictions in valency tags
    3.2.3 Valency frames
      3.2.3.1 Synonymous valencies
      3.2.3.2 Polysemy
      3.2.3.3 Diathesis alternations
    3.2.4 Valency rules in valency.cg3
  3.3 Evaluation
  3.4 Conclusion
4 Semantic prototype annotation
  4.1 Background
    4.1.1 Theoretical background
    4.1.2 Semantic categories in natural language processing
  4.2 Annotation of the North Sámi lexicon
    4.2.1 Syntactic relevance of semantic categories
    4.2.2 Semantic primitives as structuring principles
    4.2.3 Semantic prototypes
    4.2.4 Syntactic tests for category membership
      4.2.4.1 Testing concrete categories
      4.2.4.2 Testing abstract categories
    4.2.5 Semantic prototypes and the lexicon
    4.2.6 Multiple categorization
  4.3 Evaluation
    4.3.1 Lexicon and corpus coverage
    4.3.2 Syntactic relevance of semantic prototypes
  4.4 Conclusion
5 Semantics and valency in grammar checking
  5.1 Background
    5.1.1 General grammar checking
    5.1.2 North Sámi grammar checking
  5.2 Valencies and semantic prototypes in GoDivvun
    5.2.1 Very local error detection
    5.2.2 Local error detection
      5.2.2.1 Real word errors
      5.2.2.2 Local case errors
      5.2.2.3 Summary: Local error detection
    5.2.3 Global error detection
      5.2.3.1 Valency errors
      5.2.3.2 Adapting disambiguation
      5.2.3.3 Governor argument dependency annotation
      5.2.3.4 Semantic role mapping
      5.2.3.5 Valency error detection and correction
      5.2.3.6 Summary: Global error detection
  5.3 Evaluation
    5.3.1 Evaluation of real word error detection
      5.3.1.1 Quantitative evaluation
      5.3.1.2 Qualitative evaluation
      5.3.1.3 Conclusion regarding real word error detection
    5.3.2 Evaluation of local case error detection
      5.3.2.1 Quantitative evaluation
      5.3.2.2 Qualitative evaluation
      5.3.2.3 Conclusion regarding local case error detection
    5.3.3 Evaluation of valency error detection
      5.3.3.1 Quantitative evaluation
      5.3.3.2 Qualitative evaluation
      5.3.3.3 Conclusion regarding valency error detection
  5.4 Conclusion

III End
6 Conclusion
A The 500 most frequent verbs in SIKOR
B Semantic prototype categories in Giella-sme
C Grammatical tags in grammarchecker.cg3
  C.1 Parts of speech and their subcategories
  C.2 Morpho-syntactic properties
  C.3 Derivational tags
  C.4 Syntactic tags
  C.5 Semantic role tags
  C.6 Valency tags

List of Figures

2.1 Lexical semantic features of German stative verbs
2.2 The use of introspection when analyzing grammatical errors
2.3 The dependency structure of Mun lean boahtán. in Giella-sme
2.4 The dependency structure of Son divttii Liná bargat skuvlabargguid. in Giella-sme
2.5 The dependency structure of Mikelek bazkaria prestatu du.
4.1 The 25 unique beginners for WordNet nouns
4.2 The hierarchy of semantic prototypes in PALAVRAS
4.3 The hierarchy of semantic prototypes in Giella-sme
4.4 Central and peripheral members of the human prototype category
4.5 Central and peripheral members of the vehicle prototype category
4.6 The lexicon entry for ealga ‘moose’ in nouns.lexc
4.7 Xfst and lookup2cg analysis of ealgasadji ‘moose place’
4.8 Xfst and lookup2cg analyses of jahkebealle ‘half a year’
4.10 The distribution of semantic categories in dependents of rastá ‘through’
4.11 The distribution of semantic categories in objects of bidjat johtui ‘put into action’
4.12 The distribution of semantic categories in objects of máksit ‘pay’
5.1 Error detection and correction by the GoDivvun online tool
5.2 Simplified vislcg3 error detection and correction rules
5.3 The system architecture of all local error detection in GoDivvun
5.4 Disambiguation rules for badjel ‘over’ in disambiguator.cg3
5.5 The system architecture of global error detection in GoDivvun
5.6 Ex. (50) syntactically analyzed by GoDivvun
5.7 Dependency rules for governors with accusative + infinitive valencies in grammarchecker.cg3
5.8 An error detection rule for verbal governors with a theme in illative case in grammarchecker.cg3
5.9 The system architecture of GoDivvun

List of Tables

2.1 English verbs with selection restrictions to their subjects/objects
2.2 Valency entries in different human-readable valency resources
2.3 A comparison of some machine-readable valency resources
3.1 The performance of Constraint Grammar tools that make use of valencies
3.2 Some of the 500 most frequent verbs in SIKOR
3.3 Part of speech-changing and non part of speech-changing derivations and inflections in North Sámi that affect valencies
3.4 North Sámi verbal derivations and their effects on valency
3.5 Four multi-word verbs and their theme realizations in SIKOR
3.6 The distribution of nouns/adverbs as part of multi-verb words in SIKOR
3.7 Different types of valency tags in valency.cg3
3.8 The North Sámi semantic role set used in valency.cg3
3.9 Semantic role annotation in Sammallahti (2005), Aldezabal (2004) and Bick (2007c) and valency.cg3
3.10 Different morphological specifications in the valency tags of valency.cg3
3.11 Valency tags for different types of finite subclauses in valency.cg3
3.12 Valency tags for accusative + infinitive constructions in valency.cg3
3.13 Selection restrictions in valency tags in valency.cg3
3.14 North Sámi verbs with synonymous valencies
3.15 The valency variation of bidjat ‘put’ in valency.cg3
3.16 Some polysemous verbs and their valencies in valency.cg3
3.17 Valency tags for derived verbs in valency.cg3
3.18 The distribution of valencies of passive, causative, reflexive, and reciprocal verbs in SIKOR
3.19 North Sámi verbs that participate in alternations without morphological derivations
3.20 Rule distribution in valency.cg3
3.21 Lexicon and corpus coverage of the valency tags in valency.cg3
3.22 The most valency-rich verbs in valency.cg3
4.1 A comparison of semantic categories in SALDO, WordNet 3.0, PALAVRAS and XUXENg
4.2 The distribution of members of the human prototype category as antecedents of relative subclauses
4.3 The distribution of members of the vehicle prototype category in sentences with motion verbs
