Opportunities in quantitative historical linguistics
Eric Smith
work with Tanmoy, Jon, Bill, Ian, Dan Hruschka, Logan Sutton, Hyejin Youn
in collaboration with Murray Gell-Mann, Ilia Peiros, George Starostin

Outline
• Brief history of languages at SFI
• The steps toward quantitative phylogenetics
• Example I: lexical reconstruction
• Example II: inferring models of word-order change
• New opportunities for empirical discovery

The Evolution of Human Languages (EHL)
Goals: Deep reconstruction of language history and connection with genes and the archaeological record
Murray Gell-Mann, George Starostin, Ilia Peiros, Sergei Starostin

Deep reconstruction: the Nostratic hypothesis
http://starling.rinet.ru/maps/maps.php?lan=en
• Coined by Holger Pedersen (1903)
• Modern form of the hypothesis by Vladislav Ilitch-Svitych and Aharon Dolgopolsky (1960s -- present)
• Estimated 12,000-15,000 BCE
http://en.wikipedia.org/wiki/Nostratic_languages

Methods: lexicostatistics and glottochronology
• Assign any sound map without penalty, but require regularity
• Exclude borrowed items in either language from consideration
• Identify the fraction of preserved cognates; convert it to a separation time (penalty)
• Attempt to fit separation times to an ultrametric structure (tree)

[Figure: German-English word list illustrating a sound map -- Eins/one, Zwei/two, Drei/three, Vier/four, Fünf/five; Tier/animal, Reh/deer, Hund/dog; Vater/father, Mutter/mother; Milch/milk, Finger/finger, Arm/arm, Bein/leg]

P_{preserve}(word) = e^{-t/\tau}

\Delta t_{sep} = -\tau \log(\text{fraction preserved})
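A minimal glottochronology sketch in Python, following the slide's two formulas. The retention rate per millennium (0.86, the classic Swadesh figure) is an illustrative assumption, not a value from the talk.

    import math

    # Time constant tau from an assumed retention rate per millennium:
    # P_preserve = exp(-t/tau), so tau = -1 / log(retention rate).
    RETENTION_PER_MILLENNIUM = 0.86
    TAU = -1.0 / math.log(RETENTION_PER_MILLENNIUM)  # millennia

    def separation_time(frac_preserved: float) -> float:
        """Separation time in millennia: Delta t = -tau * log(frac. preserved)."""
        return -TAU * math.log(frac_preserved)

    # Example: two word lists sharing 70% cognates in core vocabulary.
    print(f"{separation_time(0.70):.2f} millennia")  # ~2.36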
EHL today
• The Dene-Caucasian hypothesis
• Borean? A relic of the last ice age?
• Reconstructions of Dravidian, Khoisan, ...
• The EHL database and Starling
http://starling.rinet.ru/cgi-bin/response.cgi?root=config&morpho=0&basename=\data\alt\turcet&first=1

Concepts and steps in quantitative phylogenetics
• Roles of likelihood and Bayesian methods
• Frequent-pattern versus rare-feature innovations
• Bayes's theorem and prior prejudice
• Typological constraints and Bayesian priors
• Information criteria and significance of parameters
• The likelihood part of a phylogenetic algorithm
• Overall structure of sound and meaning change
• Alignment, sound correspondence, and errors
• Context discovery, detection of borrowings
• Classification and reconstruction

Rare innovations versus clusters of common innovations
• Rare innovations: single features with ~0 probability of occurring by chance (go in Bayesian priors)
  • Imply common descent or borrowing, even without mathematics
  • Only seen once: hard to assign probabilities from frequencies
  • Common in morpho-syntactic features
  • Useless for dating; do not support induction
• Common variations (estimate with likelihood)
  • Examples: sound shift and meaning shift in the core lexicon
  • Individually uninformative, but probabilities can be assigned from data
  • Require math to handle, but do support induction, and can be informative about dates if change processes are regular

Bayes's theorem and model comparison
• Represent both data and models with a joint probability P(m, d), where m ranges over models and d over data
• Split the joint probability into conditionals either of two ways:

P(m, d) = P(m \mid d)\, P(d) = P(d \mid m)\, P(m)

• Bayes's theorem relates the posterior P(m \mid d) to the Bayesian prior P(m) and the likelihood P(d \mid m):

P(m \mid d) = \frac{P(d \mid m)\, P(m)}{P(d)} = \frac{P(d \mid m)\, P(m)}{\sum_{m'} P(d \mid m')\, P(m')}

Typological constraints are a natural domain for priors (Ian's lecture)
• Three roles for priors
  • Include frequency evidence from outside this sample
  • Include non-frequency evidence (rare innovations)
  • Represent out-of-field evidence (molecular phylogenies)
• On states
  • phoneme inventory, word order, ...
  • implicational relations (pronouns, time, color, aspect)
• On transitions
  • phoneme contexts, intermediate word-order states (NDO)
  • geometric models of phonology or semantics?
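A numerical sketch of the model-comparison recipe above. The two "models" (retention probabilities), the uniform prior, and the cognate counts are invented for illustration; only the use of Bayes's theorem itself follows the slide.

    from math import comb

    # Two toy models of cognate retention over a fixed time span;
    # each model m is just a per-item retention probability p_m.
    models = {"slow_change": 0.85, "fast_change": 0.60}
    prior = {"slow_change": 0.5, "fast_change": 0.5}   # prior P(m)
    n_items, n_preserved = 100, 78                     # data d

    def likelihood(p, n, k):
        """Binomial likelihood P(d|m): k retentions out of n items."""
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    # Bayes's theorem, normalized over the model set.
    joint = {m: likelihood(p, n_items, n_preserved) * prior[m]
             for m, p in models.items()}
    evidence = sum(joint.values())                     # P(d)
    posterior = {m: j / evidence for m, j in joint.items()}
    print(posterior)   # slow_change dominates for these data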
Akaike and Bayesian Information Criteria

\mathrm{AIC} \equiv -2 \log(L) + 2k \qquad \mathrm{BIC} \equiv -2 \log(L) + k \log(n)

Multinomial sample probabilities concentrate according to the Kullback-Leibler divergence, or relative entropy:

P(n_1, \ldots, n_K) = \frac{N!}{n_1! \cdots n_K!} \prod_{i=1}^{K} p_i^{n_i} \approx e^{-N D(\vec{n}/N \,\|\, \vec{p})}, \qquad D(\vec{n}/N \,\|\, \vec{p}) \equiv \sum_{i=1}^{K} \frac{n_i}{N} \log \frac{n_i/N}{p_i}

Likelihoods and their maxima:

\mathcal{L}(\tilde{n}_1, \ldots, \tilde{n}_k) = \prod_{j=1}^{k} \tilde{p}_j^{\tilde{n}_j}, \qquad \max_{\tilde{p}} \log \mathcal{L}(\tilde{n}_1, \ldots, \tilde{n}_k) = N \sum_{j=1}^{k} \frac{\tilde{n}_j}{N} \log \frac{\tilde{n}_j}{N} \quad \text{at } \tilde{p}_i \to \tilde{n}_i/N

Distribution of max-likelihoods:

\sum_{\vec{n}} e^{-N D(\vec{n}/N \| \vec{p})} \, N \sum_i \tilde{p}_i \log \tilde{p}_i \approx N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) + \frac{k}{2}

\sum_{\vec{n}} e^{-N D(\vec{n}/N \| \vec{p})} \sum_i \tilde{n}_i(\vec{p}) \log \tilde{p}_i \approx N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) - \sum_{\vec{n}} e^{-N D(\vec{n}/N \| \vec{p})} \frac{N}{2} \sum_{j=1}^{k} \frac{(\tilde{p}_j(\vec{n}/N) - \tilde{p}_j(\vec{p}))^2}{\tilde{p}_j(\vec{p})} = N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) - \frac{k}{2}

Maximize the average likelihood of real samples over average models:

N \sum_i \tilde{p}_i \log \tilde{p}_i - k \sim unbiased sample estimator!
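A sketch of both criteria for a multinomial model fit by maximum likelihood, as defined above. The category counts are invented for illustration.

    import numpy as np

    counts = np.array([40, 25, 20, 15])   # observed counts n_i
    N = counts.sum()
    k = len(counts) - 1                   # free parameters (probabilities sum to 1)

    p_hat = counts / N                    # maximum-likelihood estimates
    log_L = float((counts * np.log(p_hat)).sum())  # max log L = N sum (n_i/N) log(n_i/N)

    aic = -2 * log_L + 2 * k
    bic = -2 * log_L + k * np.log(N)
    print(f"log L = {log_L:.2f}, AIC = {aic:.2f}, BIC = {bic:.2f}")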
More detail on Akaike derivation

Likelihood of any data given a particular model:

\mathcal{L}(\tilde{n}_1, \ldots, \tilde{n}_k) = \prod_{j=1}^{k} \tilde{p}_j^{\tilde{n}_j}

Average log-likelihood of actual data (averaged over data), from a model produced with fixed estimated parameters \tilde{p}:

\sum_{\vec{n}} e^{-N D(\vec{n}/N \,\|\, \vec{p})} \log \mathcal{L}(\tilde{n}(\vec{n}) \mid \tilde{p}) = \sum_i \tilde{n}_i(\vec{p}) \log \tilde{p}_i

Now average this log-likelihood over estimated models, using a second-order Taylor expansion:

\sum_{\vec{n}} e^{-N D(\vec{n}/N \| \vec{p})} \sum_i \tilde{n}_i(\vec{p}) \log \tilde{p}_i(\vec{n}) \approx N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) - \sum_{\vec{n}} e^{-N D(\vec{n}/N \| \vec{p})} \frac{N}{2} \sum_{j=1}^{k} \frac{(\tilde{p}_j(\vec{n}) - \tilde{p}_j(\vec{p}))^2}{\tilde{p}_j(\vec{p})} = N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) - \frac{k}{2}

But we do not have the ideal \tilde{p}'s; the best we can do is obtain an unbiased estimator from any single sample. For this we need to identify the bias typical of samples:

\sum_{\vec{n}} e^{-N D(\vec{n}/N \,\|\, \vec{p})} N \sum_i \tilde{p}_i \log \tilde{p}_i \approx N \sum_i \tilde{p}_i(\vec{p}) \log \tilde{p}_i(\vec{p}) + \frac{k}{2}

(sample ML) \approx (ideal ML) + (variance correction)

Use this to replace the ideal ML with an unbiased estimator from samples, giving the previous slide.
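A Monte Carlo check of the bias identified above: the in-sample maximum log-likelihood exceeds the ideal value by about k/2 on average (exactly (K-1)/2 free parameters for a K-category multinomial). All numbers are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.2])     # "true" category probabilities
    N, trials = 1000, 20000

    ideal_ml = N * (p * np.log(p)).sum()   # N sum_i p_i log p_i
    gaps = []
    for _ in range(trials):
        counts = rng.multinomial(N, p)
        nz = counts > 0                    # skip empty bins to avoid log(0)
        # in-sample max log-likelihood: N sum_i p-hat_i log p-hat_i
        sample_ml = (counts[nz] * np.log(counts[nz] / N)).sum()
        gaps.append(sample_ml - ideal_ml)
    print(f"mean bias = {np.mean(gaps):.3f} (expected ~ (K-1)/2 = 1.0)")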
Word lists are the starting point for lexical (= phonological / semantic) reconstruction
http://starling.rinet.ru/cgi-bin/response.cgi?root=config&morpho=0&basename=\data\alt\turcet&first=1

Objects that must be modeled are joint histories of sound and meaning for a collection of words
• n.b., word forms are attested; meanings are indirectly inferred, and often ambiguous
• it is easy to trace a form, but inadequate to infer history from forms alone
[Figure: semantic-shift tree -- PIE *dhus- 'breath, blow' (cf. 'destruction, dissipation') > PGmc *deuza 'wild animal' > Ger Tier 'animal', Eng deer (n.b.: red deer); Lat bestia 'wild animal' > It bestia 'animal, beast', Fr bête 'animal, beast' (n.b.: bête noire)]

Representing sound and meaning "innovation" in the comparative method (Bill's lecture)
• Suppose that some stable meaning categories can be identified
• Identify primary words for each meaning
• Try to exclude "borrowed" terms; suppose that what is left has been transmitted through vertical descent
• Identify systematic sound relations and try to infer historical sound changes
• Associate semantic innovations with in-language substitutions within meaning categories

[Figure: the German-English word list and sound map again, as above]

Preserved meanings suggest sound maps
(Numbers give an example in which we can treat primary meanings as language-universal and historically relatively stable.)

Use preservation of meaning to infer regular relations of sounds: here, e.g., thr <> tr.

[Figure: tree of the numerals one through five -- PIE *Hoi-(wo/ko/no), *duoh, *trei, *kwetuor, ...; PGmc *ainaz, *twai, *þrijiz, *fidwor (innovation), *finfi; Lat unus, duo, tres, quattuor, quinque; Ger eins, zwei, drei, vier, fünf; Eng one, two, three, four, five; It uno, due, tre, quattro, cinque; Fr un, deux, trois, quatre, cinq -- descent links marked "(sound similarity)" and "(innovation)"]
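A sketch of the correspondence-counting idea behind inferences like "thr <> tr": tally the initial sounds of putative cognate pairs and look for pairings that recur. The toy English/German list and the crude "th counts as one segment" rule are illustrative assumptions, not the talk's method.

    from collections import Counter

    cognates = [
        ("three", "drei"), ("thou", "du"), ("thank", "dank"),
        ("thorn", "dorn"), ("two", "zwei"), ("ten", "zehn"),
    ]

    def initial(word: str) -> str:
        """First sound, treating 'th' as a single segment."""
        return "th" if word.startswith("th") else word[0]

    pairs = Counter((initial(e), initial(g)) for e, g in cognates)
    for (e, g), n in pairs.most_common():
        print(f"Eng {e}- <> Ger {g}-: {n} pairs")
    # th <> d recurs four times and t <> z twice: regular sound
    # correspondences, not chance resemblances.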
Sound maps help identify meaning shifts

[Figure (truncated): tree linking PIE roots *dhus-, *ane-, and *kwen- to descendants including PGmc *deuza and Lat animal, canis]