Inductive Learning of Phonotactic Patterns
University of California, Los Angeles

Inductive Learning of Phonotactic Patterns

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Linguistics

by

Jeffrey Nicholas Heinz

2007

© Copyright by Jeffrey Nicholas Heinz 2007

The dissertation of Jeffrey Nicholas Heinz is approved.

    Bruce Hayes
    D. Stott Parker
    Colin Wilson
    Kie Zuraw, Committee Co-chair
    Edward P. Stabler, Committee Co-chair

University of California, Los Angeles
2007

To Mika

Table of Contents

1 Introduction
  1 Thesis
    1.1 Locality and Learning
    1.2 Factoring the Learning Problem
  2 Other Approaches to Phonotactic Learning
    2.1 Learning with Principles and Parameters
    2.2 Learning with Optimality Theory
    2.3 Learning with Connectionist Models
    2.4 Learning with Statistical Models
    2.5 Local Summary
  3 Overview
  Appendices
  A–1 Mathematical Preliminaries
    A–1.1 Sets
    A–1.2 Relations and Partially Ordered Sets
    A–1.3 Equivalence Relations and Partitions
    A–1.4 Functions and Sequences
    A–1.5 Strings and Formal Languages

2 Establishing the Problem and Line of Inquiry
  1 Phonotactic Patterns and Phonotactic Knowledge
    1.1 Patterns over Contiguous Segments
    1.2 Patterns over Non-contiguous Segments
    1.3 Stress Patterns
    1.4 Nonarbitrary Character of Phonotactic Patterns
  2 Phonotactic Grammars
    2.1 The Chomsky Hierarchy
    2.2 Phonotactic Patterns as Regular Sets
    2.3 Examples
    2.4 Local Summary
  3 Addressing the Learning Problem
    3.1 The Gold Learning Framework
    3.2 The Probably-Approximately Correct (PAC) Framework
    3.3 Summary of Negative Results
    3.4 Positive Results
  4 A Research Strategy
  5 Summary
  Appendices
  B–1 A Formal Treatment of the Gold Framework
    B–1.1 Definitions
    B–1.2 Any Superfinite Class is not Gold-learnable
  B–2 A Formal Treatment of the PAC Framework
    B–2.1 Definitions
    B–2.2 The VC Dimension
    B–2.3 The Class of Finite Languages is not PAC-learnable
  B–3 Finite State Acceptors
    B–3.1 Definition
    B–3.2 Extending the Transition Function
    B–3.3 The Language of an Acceptor
    B–3.4 Binary Operations
    B–3.5 Reverse Acceptors
    B–3.6 Forward and Backward Deterministic Acceptors
    B–3.7 Relations Between Acceptors
    B–3.8 Stripped Acceptors
    B–3.9 Cyclic Acceptors
    B–3.10 Languages and the Machines which Accept Them
    B–3.11 Tail Canonical Acceptors
    B–3.12 The Myhill-Nerode Theorem
    B–3.13 Head Canonical Acceptors

3 Patterns over Contiguous Segments
  1 Overview
  2 N-gram Grammars and Languages
  3 Learning N-gram Languages as String Extension Learning
  4 Generalizing by State Merging
    4.1 The Basic Idea
    4.2 Prefix Trees
    4.3 State Merging
  5 Learning N-gram Languages with State Merging
  6 Are N-grams Appropriate for Phonotactics?
    6.1 N-gram Models Count to (n–1)
    6.2 Resolvable Consonant Clusters
    6.3 Discussion
  7 Summary
  Appendices
  C–1 String Extension Grammars
    C–1.1 Definitions and Properties of Lf
    C–1.2 Natural Gold-learning of Lf for Finite A
  C–2 A Formal Treatment of N-grams
    C–2.1 The N-Contiguous Set Function
    C–2.2 N-gram Grammars and N-gram Languages
    C–2.3 Properties of N-gram Languages
    C–2.4 A Simple Learner
  C–3 A Formal Treatment of State Merging
    C–3.1 Prefix Trees
    C–3.2 State Merging
    C–3.3 The State Merging Theorem
  C–4 Learning Ln−gram via State Merging
    C–4.1 Finite State Representations
    C–4.2 Towards a Language-theoretic Characterization
    C–4.3 Learning N-gram Languages by State Merging
    C–4.4 Obtaining the Canonical FSA for an N-gram Grammar

4 Patterns Over Non-contiguous Segments
  1 Overview
  2 Long Distance Agreement
    2.1 Kinds of LDA
    2.2 Inadequacy of N-gram Models
    2.3 Long Distance Agreement or Spreading?
  3 Precedence Grammars
    3.1 The Basic Idea
    3.2 Learning Precedence Languages
    3.3 Local Summary
  4 Learning Precedence Languages by State Merging
  5 Properties of LDA
    5.1 Phonetically Unnatural LDA Patterns
    5.2 Distance Effects
    5.3 Blocking Effects
  6 Learning LDA Patterns and Patterns over Contiguous Segments
  7 Summary
  Appendices
  D–1 A Formal Treatment of Precedence
    D–1.1 Precedence Relations and Sets
    D–1.2 Precedence Grammars and Precedence Languages
    D–1.3 Properties of Precedence Languages
    D–1.4 Learning Precedence Languages by String Extension
  D–2 Learning Lprec via State Merging
    D–2.1 Finite State Representation
    D–2.2 Towards a Language-theoretic Characterization
    D–2.3 Learning Precedence Languages by State Merging
    D–2.4 Obtaining the Canonical FSA for a Precedence Grammar
  D–3 Extending Precedence Grammars to Handle Local Blocking
    D–3.1 Definitions and Examples
    D–3.2 Properties of Relativized Bigram Precedence Languages
    D–3.3 Learning with String Extension

5 Stress Patterns
  1 Overview
  2 Stress Patterns in the World's Languages
    2.1 The Stress Typology
    2.2 Summary of the Typology
    2.3 Inadequacy of N-gram and Precedence Learners
    2.4 Unattested Stress Patterns
  3 Neighborhood-Distinctness
    3.1 Definition
    3.2 Universality
    3.3 Discussion
  4 The Learner
    4.1 The Forward Neighborhood Learner
    4.2 Suffix Trees
    4.3 The Forward Backward Neighborhood Learner
  5 Discussion
    5.1 Basic Reasons Why the Forward Backward Learner Works
    5.2 Input Samples
    5.3 Unlearnable Unattested Patterns
    5.4 Unlearnable Attested Patterns
    5.5 Addressing the Unlearnable Attested Patterns
    5.6 Other Predictions
    5.7 Comparison to Other Learning Models
  6 Summary
  Appendices
  E–1 A Formal Treatment of Suffix Trees
  E–2 Results of the Neighborhood Learning Study

6 Deconstructing Neighborhood-distinctness
  1 Overview
  2 Generalizing the Neighborhood
    2.1 Preliminaries
    2.2 Neighborhood-distinctness
  3 Subsets of Neighborhood-distinct Languages
    3.1 Precedence Languages
    3.2 N-gram Languages
    3.3 Local Summary
  4 Neighborhood-distinctness not Preserved Under Intersection
  5 Towards a Compositional Analysis of Neighborhood-distinctness
    5.1 Strategy
    5.2 Merging States in Prefix Trees with Same Incoming Paths of Length n
    5.3 Merging States in Suffix Trees with Same Outgoing Paths of Length n
    5.4 Merging States in Prefix Trees with Same Outgoing Paths of Length n
    5.5 Merging States in Suffix Trees with Same Incoming Paths of Length n
    5.6 Merging Final States in Prefix Trees
    5.7 Merging Start States in Suffix Trees
    5.8 Merging Start States in Prefix Trees
    5.9 Merging Final States in Suffix Trees
    5.10 Merging Nonfinal States in Prefix Trees
    5.11 Merging Nonstart States in Suffix Trees
    5.12 Merging Nonstart States in Prefix Trees
    5.13 Merging Nonfinal States in Suffix Trees
    5.14 Local Summary
  6 Summary

7 Conclusion
  1 Results
  2 Looking Ahead
  ⋆ Appendix: The Stress Typology

List of Figures
  2.1 The Chomsky Hierarchy
  2.2 ∗CCC in Yawelmani Yokuts
  2.3 Navajo Sibilant Harmony
  2.4 The Stress Pattern of Pintupi
  2.5 The Learning Process
  2.6 Locating Human Languages in the Chomsky Hierarchy
  2.7 Locating Phonotactic Patterns in the Regular Languages (I)
  2.8 Locating Phonotactic Patterns in the Regular Languages (II)
  2.9 A Tail-canonical Acceptor for L = {1, 10, 010, 0100, 01000, ...}
  2.10 A Head-canonical Acceptor for L = {1, 10, 010, 0100, 01000, ...}
  2.11 A Tail-canonical Acceptor for L = {λ, 00, 11, 0011, 1100,