
Improved Reconstruction of Protolanguage Word Forms
Alexandre Bouchard-Côté, Thomas L. Griffiths, Dan Klein

Oceanic languages
Hawaiian, Samoan, Tongan, Maori

Shared ancestry
‘fish’: Hawaiian iʔa, Samoan iʔa, Tongan ika, Maori ika
All four descend from Proto-Oceanic (POc). Two reconstructions fit these forms:
• POc *ika, with the sound change *k > ʔ in Hawaiian and Samoan, or
• POc *iʔa, with the sound change *ʔ > k in Tongan and Maori.

Can we harness more languages?
‘fish’: Hawaiian iʔa, Samoan iʔa, Tongan ika, Maori ika, Geser ikan, Rapanui ika, Nukuoro iga, Niue ika

Welcome to Oceanic Park

Outline
• Motivation
• Computational model
• Learning and inference
• Experiments on Proto-Oceanic

Why reconstruct?
• Can answer a large number of questions about our past
• Learn about ancient populations’ migrations
• Decipherment of ancient scripts

How linguists do reconstruction
• Direct diachronic evidence, sometimes
• Often not available (prehistorical cultures)
• The comparative method
• Unsupervised setup

           ‘fish’   ‘fear’
Hawaiian   iʔa      makaʔu
Samoan     iʔa      mataʔu
Tongan     ika      manavahē
Maori      ika      mataku
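The two competing reconstructions on the ‘fish’ slides can be checked mechanically. The minimal sketch below assumes each sound change is unconditioned and exceptionless and applies it per language; the helper names `apply_change` and `predict` and the hypothesis tuples are hypothetical. It shows that POc *ika and POc *iʔa both reproduce the four attested forms, each needing one change on two of the four lineages, which is why four languages alone cannot settle the question.

```python
# Attested reflexes of 'fish' (from the slides).
OBSERVED = {"Hawaiian": "iʔa", "Samoan": "iʔa", "Tongan": "ika", "Maori": "ika"}


def apply_change(form: str, old: str, new: str) -> str:
    """Apply an unconditioned, exceptionless sound change *old > new (simplifying assumption)."""
    return form.replace(old, new)


def predict(proto: str, changes: dict) -> dict:
    """Derive each language's reflex; `changes` maps a language to its (old, new) change."""
    reflexes = {}
    for lang in OBSERVED:
        form = proto
        if lang in changes:
            form = apply_change(form, *changes[lang])
        reflexes[lang] = form
    return reflexes


# Hypothesis A: POc *ika, with *k > ʔ in Hawaiian and Samoan.
HYP_IKA = ("ika", {"Hawaiian": ("k", "ʔ"), "Samoan": ("k", "ʔ")})
# Hypothesis B: POc *iʔa, with *ʔ > k in Tongan and Maori.
HYP_IQA = ("iʔa", {"Tongan": ("ʔ", "k"), "Maori": ("ʔ", "k")})

if __name__ == "__main__":
    for proto, changes in (HYP_IKA, HYP_IQA):
        print(proto, predict(proto, changes) == OBSERVED)
```

Both lines print `True`. Adding languages such as Rapanui, Niue or Nukuoro to `OBSERVED` is what would break the tie.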
Computational Model

Input
[Figure: tree of Oceanic languages (Nggela, Bugotu, Tape, Avava, …, Hawaiian, Samoan, Tongan, Maori, …, Rapanui, Pukapuka, Mwotlap, Mota, FijianBau, …) with cognate forms such as ‘fish’ iʔa / ika and ‘fear’ makaʔu / mataʔu / mataku]
• 512 languages × 6856 cognate sets
• IPA format
• Density: 60K entries
• Missing data
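The input is a language × cognate-set matrix with many empty cells. The sketch below stores the four-language toy example that way, using `None` for a missing cell. As an assumption, Tongan ‘fear’ is treated as missing because *manavahē* is not cognate with the other ‘fear’ forms (it is left blank on the graphical-model slide). It then computes what the slide's rounded figure of 60K entries implies for the fill rate of the full 512 × 6856 matrix. The names `COGNATES` and `density` are hypothetical.

```python
from typing import Optional

# meaning -> language -> IPA form; None marks a missing (non-cognate or unattested) cell.
COGNATES: dict[str, dict[str, Optional[str]]] = {
    "fish": {"Hawaiian": "iʔa", "Samoan": "iʔa", "Tongan": "ika", "Maori": "ika"},
    "fear": {"Hawaiian": "makaʔu", "Samoan": "mataʔu", "Tongan": None, "Maori": "mataku"},
}


def density(matrix: dict[str, dict[str, Optional[str]]]) -> float:
    """Fraction of language x cognate-set cells that are actually observed."""
    cells = [form for row in matrix.values() for form in row.values()]
    return sum(form is not None for form in cells) / len(cells)


TOY_DENSITY = density(COGNATES)               # 7 of 8 cells observed
FULL_DENSITY = 60_000 / (512 * 6856)           # slide's rounded 60K over 3,510,272 cells

if __name__ == "__main__":
    print(TOY_DENSITY)                # 0.875
    print(f"{FULL_DENSITY:.2%}")      # 1.71%
```

So roughly 98% of the full matrix is missing, which is why the model must treat unobserved leaves as latent rather than assume every language attests every cognate set.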
Graphical model
           ‘fish’   ‘fear’
Hawaiian   iʔa      makaʔu
Samoan     iʔa      mataʔu
Tongan     ika
Maori      ika      mataku
[Figure: one tree per cognate set, rooted at POc, with nodes H., S., T., M. above the observed Hawaiian, Samoan, Tongan and Maori words; the unobserved ancestral forms (e.g. iʔa and ika for ‘fish’, makaʔu and makaku for ‘fear’) are latent variables]
• Models diachronic word mutation

Modeling string mutation
• What kinds of string mutations need to be captured?
  • Substitution: *k > ʔ
  • Insertion (and deletion): *h > wh
  • Context: *h > wh / # _ (word-initially only)

           ‘break’   ‘aloha’
Hawaiian   haki      aloha
Samoan     fati      alofa
Tongan     fasi      ʔalofa
Maori      whati     aroha (NOT: arowha)

String transducer
‘to cry’: taŋis, taŋi, aŋi, angi
[Figure: a transducer rewrites the ancestral string # t a ŋ i s # into the descendant # a n g i #, one symbol at a time]
• θS: substitution/deletion parameters (e.g. ŋ is rewritten as n)
• θI: insertion parameters (e.g. g is inserted after it)

Parameters
• Global? Cannot explicitly represent sound changes! (A single θ = θS & θI would be shared by every branch from POc to H., S., T., M.)
• Branch-specific? Parameter proliferation!
• Solution: learning cross-linguistic trends
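A stochastic string transducer of the kind on the “String transducer” slides assigns P(descendant | ancestor) by summing over all ways of substituting, deleting and inserting symbols. The sketch below is a simplification under stated assumptions. θS maps each ancestral symbol to a distribution over output symbols or deletion (`""`). θI maps the last ancestral symbol read (`#` at the start) to a distribution over inserted symbols or `STOP`. Context is only the current symbol, whereas the model in the talk conditions on richer context and on the branch. The function name `transduce_prob` and all parameter values are hypothetical, chosen so that the ‘to cry’ example taŋis → angi is possible.

```python
STOP = None      # sentinel key in theta_i meaning "stop inserting"
BOUNDARY = "#"   # context used for insertions before the first ancestral symbol


def transduce_prob(x, y, theta_s, theta_i):
    """P(y | x) summed over all substitution/deletion/insertion alignments (forward DP)."""
    m = len(y)
    no_insert = {STOP: 1.0}

    def insertion_phase(a_row, ctx):
        # running = prob. of being mid-insertion after ctx with j output symbols emitted
        ins = theta_i.get(ctx, no_insert)
        stop = ins.get(STOP, 0.0)
        b_row, running = [0.0] * (m + 1), 0.0
        for j in range(m + 1):
            extend = running * ins.get(y[j - 1], 0.0) if j > 0 else 0.0
            running = a_row[j] + extend
            b_row[j] = running * stop
        return b_row

    b_row = insertion_phase([1.0] + [0.0] * m, BOUNDARY)
    for sym in x:
        sub = theta_s.get(sym, {sym: 1.0})  # unlisted symbols are copied faithfully
        a_row = [0.0] * (m + 1)
        for j in range(m + 1):
            a_row[j] = b_row[j] * sub.get("", 0.0)                  # delete sym
            if j > 0:
                a_row[j] += b_row[j - 1] * sub.get(y[j - 1], 0.0)   # sym -> y[j-1]
        b_row = insertion_phase(a_row, sym)
    return b_row[m]


# Hypothetical toy parameters (theta_S: substitution/deletion, theta_I: insertion).
THETA_S = {
    "t": {"t": 0.7, "": 0.3},
    "a": {"a": 0.9, "": 0.1},
    "ŋ": {"ŋ": 0.5, "n": 0.4, "": 0.1},
    "i": {"i": 0.9, "": 0.1},
    "s": {"s": 0.6, "": 0.4},
}
THETA_I = {"ŋ": {STOP: 0.7, "g": 0.3}}  # only after ŋ may a g be inserted

if __name__ == "__main__":
    print(transduce_prob("taŋis", "angi", THETA_S, THETA_I))
    print(transduce_prob("taŋis", "taŋis", THETA_S, THETA_I))
```

Under these toy values there is exactly one alignment for taŋis → angi: delete t, copy a, rewrite ŋ as n, insert g, copy i, delete s. Its probability is 0.3 · 0.9 · 0.4 · 0.3 · 0.7 · 0.9 · 0.4 ≈ 0.00816. The dynamic program is what makes summing over alignments tractable when several alignments exist.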
Cross-linguistic trends
• Some sound changes are unlikely cross-linguistically:
  • Velar stop to vowel: k > a
• Some sound changes are frequent cross-linguistically:
  • Consonant place change: k > ʔ
  • Debuccalization: f > h
  • Identity (faithfulness): x > x

Learning cross-linguistic trends
• How to learn these universals: express the transducer parameters as the output of a log-linear model. For the substitution ŋ > n in the context a _ i on the branch leading to Maori:
$$\theta_S(\text{ŋ} > \text{n} \mid \text{a \_ i},\ \text{to Maori}) \propto \exp\{\langle \lambda, f(\text{ŋ} > \text{n},\ \text{a \_ i},\ \text{to Maori}) \rangle\}$$
• Add the name of the current branch to the context.
• Universal features ignore the name of the current branch; branch-specific features conjoin the mutation with it:
$$\langle \lambda, f \rangle = \underbrace{\lambda(\text{ŋ} > \text{n})}_{\text{universal}} + \underbrace{\lambda(\text{ŋ} > \text{n} \,\&\, \text{to Maori})}_{\text{branch-specific}} + \cdots$$

Second improvement
• Response to a concrete problem: sound changes are not exceptionless in real data
• Example: tension between a sound change and a morphological paradigm
  Passive marker: whaka-maori-tia (‘translate into Maori’)
  Vowel sound change: ia > ie
  Which one wins?
  • If the sound change wins, we get a marked form ending in -tie
  • If the paradigm wins, the form becomes an exception to the sound change

Adding markedness features
• Add dependencies in the string transducer model
  [Figure: θS / θI transducer diagrams over the output contexts n g … and a n g …]
• Also add new features, e.g. "word has /a #/" and "word has /C V V/ & mutation to Maori"

Learning and Inference

Learning λ while reconstructing
• Monte Carlo EM
  • M step: not analytic, but convex
  • E step: challenging; use MCMC
• Hardness of inference (E step):
  • Horizontal links ⟹ inference is at least as hard as non-planar Ising inference
  • Insertions and deletions ⟹ non-standard setup

Our previous work: Single Sequence Resampling (SSR)
Gibbs sampler
[Figure: cognate set ‘grass’ with observed forms buburu, bubure, buruburu, buuburu, vuluvulu; the ancestral strings are resampled one at a time]
• Problems with the single-sequence Gibbs sampler:
  • Extremely slow in phylogenetic trees with high branching (most linguistic trees)
  • Slow mixing in large trees

Slow mixing
[Figure: partial ancestral states b u b …, b u r u b …, v u l u v … at the nodes of the tree]
How can the sampler jump to a state where the liquids /r/ and /l/ have a common ancestor?
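The log-linear parameterization from “Learning cross-linguistic trends” can be sketched as follows. This is a minimal illustration, not the talk's feature set. The feature templates in `features` are hypothetical (a universal mutation feature, its conjunction with the branch name, and a faithfulness feature), context features such as a _ i are omitted for brevity, and the weights are made-up values. θS for one ancestral symbol is the softmax of the summed feature weights over candidate outputs, with `""` standing for deletion.

```python
import math


def features(old: str, new: str, branch: str) -> list[str]:
    """Hypothetical feature templates for the mutation old > new on `branch`."""
    feats = [f"{old}>{new}",                 # universal: ignores the branch name
             f"{old}>{new} & to {branch}"]   # branch-specific conjunction
    if old == new:
        feats.append("identity")             # faithfulness
    return feats


def theta_s(old: str, candidates: list[str], branch: str, weights: dict) -> dict:
    """Normalized substitution/deletion distribution: softmax of <lambda, f>."""
    scores = {new: sum(weights.get(f, 0.0) for f in features(old, new, branch))
              for new in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {new: math.exp(s) / z for new, s in scores.items()}


WEIGHTS = {"identity": 2.0, "ŋ>n": 0.5, "ŋ>n & to Maori": 1.5}  # hypothetical lambda
CANDIDATES = ["ŋ", "n", ""]                                      # "" = deletion

if __name__ == "__main__":
    for branch in ("Maori", "Hawaiian"):
        print(branch, theta_s("ŋ", CANDIDATES, branch, WEIGHTS))
```

With these weights the Maori branch gives ŋ > n about 0.47, the same as keeping ŋ, while Hawaiian, which only receives the universal weight, gives it about 0.16. The universal weight is shared by every branch, so evidence for ŋ > n anywhere raises it everywhere, and only the branch-specific weight is estimated per branch. This is how the parameterization avoids both a single global θ and unconstrained per-branch parameters.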
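The single-sequence Gibbs sampler behind SSR, used here as the E step of Monte Carlo EM, updates one ancestral string at a time from its conditional given its parent and children. The sketch below substitutes simpler components for the talk's model and is only a toy instance of that update rule. The tree topology and node names (`root`, `n1`, `n2`, `leaf_a` … `leaf_e`) for the ‘grass’ set are hypothetical. Ancestral strings may only take attested forms. The transducer is replaced by an edge probability proportional to exp(−β · edit distance), normalized over that finite set, with a hypothetical β.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical tree for the 'grass' cognate set; leaves are observed, root/n1/n2 latent.
PARENT = {"n1": "root", "n2": "root",
          "leaf_a": "n1", "leaf_b": "n1",
          "leaf_c": "n2", "leaf_d": "n2", "leaf_e": "n2"}
OBSERVED = {"leaf_a": "buburu", "leaf_b": "bubure", "leaf_c": "buruburu",
            "leaf_d": "buuburu", "leaf_e": "vuluvulu"}
LATENT = ["root", "n1", "n2"]
CANDIDATES = sorted(set(OBSERVED.values()))  # assumption: ancestors range over attested forms
BETA = 1.0                                   # hypothetical sharpness of the edge model

CHILDREN: dict[str, list[str]] = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def edge_prob(parent_form: str, child_form: str) -> float:
    """Stand-in for the transducer: p(child | parent) ∝ exp(-BETA * edit distance)."""
    z = sum(math.exp(-BETA * edit_distance(parent_form, c)) for c in CANDIDATES)
    return math.exp(-BETA * edit_distance(parent_form, child_form)) / z


def conditional(node: str, state: dict) -> list[float]:
    """P(node = s | all other nodes) for each s in CANDIDATES (uniform prior at the root)."""
    weights = []
    for s in CANDIDATES:
        w = 1.0 if node == "root" else edge_prob(state[PARENT[node]], s)
        for c in CHILDREN.get(node, []):
            w *= edge_prob(s, state[c])
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def gibbs(num_sweeps: int, seed: int = 0) -> dict:
    """Resample one latent string at a time, as in single-sequence resampling."""
    rng = random.Random(seed)
    state = dict(OBSERVED)
    for node in LATENT:
        state[node] = rng.choice(CANDIDATES)
    for _ in range(num_sweeps):
        for node in LATENT:
            state[node] = rng.choices(CANDIDATES, weights=conditional(node, state))[0]
    return state


if __name__ == "__main__":
    final = gibbs(100, seed=1)
    print({node: final[node] for node in LATENT})
```

Because each update changes a single node while its neighbours stay fixed, moving from an r-ancestry to an l-ancestry at several nodes at once requires passing through low-probability intermediate states. That is the slow-mixing problem described on the last slides.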