
Phylogenetic trees IV Maximum Likelihood

Gerhard Jäger

ESSLLI 2016

Gerhard Jäger Maximum Likelihood ESSLLI 2016 1 / 50 Theory

Theory

Recap: Continuous time Markov model

[Figure: example tree with branch lengths $l_1, \dots, l_8$]

$$P(t) = \begin{pmatrix} s + r e^{-t} & r - r e^{-t} \\ s - s e^{-t} & r + s e^{-t} \end{pmatrix}$$

$$\pi = (s, r)$$
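As a quick numerical check of the two-state transition matrix from this slide, the following sketch builds $P(t)$ in plain Python and confirms that each row sums to 1 and that $\pi = (s, r)$ is stationary. The function name `transition_matrix` and the example values for `s`, `r` and `t` are illustrative assumptions; the formula presupposes $s + r = 1$.

```python
import math

def transition_matrix(t, s, r):
    """P(t) of the two-state model; entry [i][j] is P(state j after time t | state i)."""
    e = math.exp(-t)
    return [[s + r * e, r - r * e],
            [s - s * e, r + s * e]]

s, r = 0.3, 0.7                      # assumed example values with s + r = 1
P = transition_matrix(0.5, s, r)

# Each row is a probability distribution over the state at the end of the branch.
row_sums = [sum(row) for row in P]
# Stationarity: pi P should reproduce pi = (s, r).
pi_P = [s * P[0][j] + r * P[1][j] for j in range(2)]
print(row_sums, pi_P)
```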

Likelihood of a tree

background reading: Ewens and Grant (2005), 15.7

[Figure: example tree with branch lengths $l_1, \dots, l_8$]

- simplifying assumption: evolution at different branches is independent
- suppose we know probability distributions $v_t$ and $v_b$ over states at top and bottom of branch $l_k$

$$L(l_k) = v_t^T P(l_k) v_b$$
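The branch likelihood $v_t^T P(l_k) v_b$ can be sketched in a few lines. The helper names and the example distributions are assumptions made for illustration; the transition matrix is the two-state one from the recap slide.

```python
import math

def transition_matrix(t, s, r):
    e = math.exp(-t)
    return [[s + r * e, r - r * e],
            [s - s * e, r + s * e]]

def branch_likelihood(v_top, v_bottom, length, s, r):
    """Compute v_top^T P(length) v_bottom for state distributions at both ends."""
    P = transition_matrix(length, s, r)
    return sum(v_top[i] * P[i][j] * v_bottom[j]
               for i in range(2) for j in range(2))

# Assumed example: top of the branch surely in state 0, bottom surely in state 1.
lik = branch_likelihood([1.0, 0.0], [0.0, 1.0], 0.5, s=0.5, r=0.5)
print(lik)
```

With point distributions at both ends, the result is just the matching entry of $P(l_k)$, here the probability of a change from state 0 to state 1 along the branch.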

Likelihood of a tree

[Figure: root node with daughters $v_1$ and $v_2$ on branches of length $l_1$ and $l_2$]

- likelihoods of states (0, 1) at root are (componentwise product)
  $$v_1^T P(l_1) \cdot v_2^T P(l_2)$$
- log-likelihoods
  $$\log(v_1^T P(l_1)) + \log(v_2^T P(l_2))$$
- log-likelihood of larger tree: recursively apply this method from tips to root

(Log-)Likelihood of a tree

$$\log L(\text{tips below} \mid \text{mother} = s) = \sum_{d \in \text{daughters}} \log \sum_{s' \in \text{states}} \exp\bigl(\log P(s \to s' \mid \text{branch length}) + \log L(\text{tips below } d \mid d = s')\bigr)$$
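The recursion from tips to root can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the nested-list tree encoding, the function names and the example tree are assumptions. In log space, the sum over daughter states of probabilities becomes a log-sum-exp of the log transition probability plus the daughter's log-likelihood.

```python
import math

NEG_INF = float("-inf")

def transition_matrix(t, s, r):
    e = math.exp(-t)
    return [[s + r * e, r - r * e],
            [s - s * e, r + s * e]]

def log_sum_exp(values):
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partials(node, s, r):
    """Return [log L(tips below | node = 0), log L(tips below | node = 1)].

    A tip is an int (its observed state); an internal node is a list of
    (daughter, branch_length) pairs with branch_length > 0.
    """
    if isinstance(node, int):
        return [0.0 if node == state else NEG_INF for state in (0, 1)]
    result = [0.0, 0.0]
    for daughter, length in node:
        P = transition_matrix(length, s, r)
        below = log_partials(daughter, s, r)
        for state in (0, 1):
            result[state] += log_sum_exp(
                [math.log(P[state][s2]) + below[s2] for s2 in (0, 1)])
    return result

# Assumed toy tree: one tip in state 0 and a clade of two tips in state 1.
tree = [(0, 0.1), ([(1, 0.2), (1, 0.3)], 0.4)]
print(log_partials(tree, s=0.5, r=0.5))
```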

(Log-)Likelihood of a tree

- this is essentially identical to the Sankoff algorithm for parsimony:
  $$\mathrm{weight}(i, j) = \log P(l_k)_{ij}$$
- weight matrix depends on branch length → needs to be recomputed for each branch
- overall likelihood for entire tree depends on probability distribution on root
- if we assume that root node is in equilibrium:
  $$L(\text{tree}) = (s, r)^T L(\text{root})$$
- does not depend on location of the root (→ time reversibility)
- this is for one character; likelihood for all data is product of likelihoods for each character
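The independence from the root location can be checked numerically on the smallest possible case, a tree with two tips. The sketch below is an illustration under assumed example values: it weights the root states by the equilibrium distribution and moves the root along the path between the tips while keeping the total path length fixed. It also shows the product over characters as a sum of logs.

```python
import math

def transition_matrix(t, s, r):
    e = math.exp(-t)
    return [[s + r * e, r - r * e],
            [s - s * e, r + s * e]]

def two_tip_likelihood(a, b, l1, l2, s, r):
    """Likelihood of tip states a, b with the root in equilibrium pi = (s, r)."""
    pi = (s, r)
    P1 = transition_matrix(l1, s, r)
    P2 = transition_matrix(l2, s, r)
    return sum(pi[k] * P1[k][a] * P2[k][b] for k in (0, 1))

# Same total path length 0.8, root placed at two different positions.
lik_a = two_tip_likelihood(0, 1, 0.2, 0.6, s=0.3, r=0.7)
lik_b = two_tip_likelihood(0, 1, 0.5, 0.3, s=0.3, r=0.7)
print(lik_a, lik_b)

# Several characters (assumed toy data): the likelihoods multiply, so logs add up.
characters = [(0, 1), (1, 1), (0, 0)]
log_lik = sum(math.log(two_tip_likelihood(a, b, 0.2, 0.6, 0.3, 0.7))
              for a, b in characters)
print(log_lik)
```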

(Log-)Likelihood of a tree

- likelihood of tree depends on
  - branch lengths
  - rates for each character
- likelihood for tree topology:
  $$L(\text{topology}) = \max_{l_k:\, k \text{ is a branch}} L(\text{tree} \mid \vec{l_k})$$
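Maximizing over branch lengths can be illustrated on the simplest case: two tips, $s = r = 0.5$, and a single path length $t$ between them. The sketch below is an assumption-laden toy (made-up character counts, a plain ternary search as optimizer), not how ML software optimizes real trees. For this case the optimum has the closed form $t = -\ln(1 - 2k/n)$ for $k$ differing characters out of $n$, which the search should reproduce.

```python
import math

def log_likelihood(t, n_same, n_diff):
    """Two tips, s = r = 0.5, total path length t between them."""
    e = math.exp(-t)
    p_same = 0.5 * (0.5 + 0.5 * e)   # pi_a * P(t)[a][a]
    p_diff = 0.5 * (0.5 - 0.5 * e)   # pi_a * P(t)[a][b] for a != b
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

def maximize(f, lo, hi, iterations=200):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Assumed toy data: 100 characters, 10 of which differ between the two tips.
t_hat = maximize(lambda t: log_likelihood(t, n_same=90, n_diff=10), 1e-6, 10.0)
print(t_hat, -math.log(1 - 2 * 10 / 100))
```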

(Log-)Likelihood of a tree

Where do we get the rates from?

different options, in increasing order of complexity:
1. $s = r = 0.5$ for all characters
2. $r$ = empirical relative frequency of state 1 in the data (identical for all characters)
3. a certain proportion $p_{inv}$ (value to be estimated) of characters are invariant
4. rates are gamma distributed

Gamma-distributed rates

- we want to allow rates to vary, but not too much
- common method (no real justification except for mathematical convenience):
  - equilibrium distribution is identical for all characters
  - rate matrix is multiplied with coefficient $\lambda_i$ for character $i$
  - $\lambda_i$ is a random variable drawn from a Gamma distribution

$$L(\lambda_i = x) = \frac{\beta^{\beta} x^{\beta - 1} e^{-\beta x}}{\Gamma(\beta)}$$

Gamma-distributed rates

- overall likelihood of tree topology: integrate over all $\lambda_i$, weighted by Gamma likelihood
- computationally impractical
- in practice: split Gamma distribution into $n$ discrete bins (usually $n = 4$) and approximate integration via Hidden Markov Model
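The discretization step can be sketched as follows. Two simplifications are assumed here and are not taken from the slides: the mean rate of each equally probable bin is estimated from sorted random draws rather than from exact quantiles, and a character's likelihood is simply averaged over the bins instead of using a Hidden Markov Model. All function names are illustrative.

```python
import math
import random

def discrete_gamma_rates(beta, n_categories=4, n_samples=20000, seed=0):
    """Approximate the mean rate in each of n equally probable Gamma bins."""
    rng = random.Random(seed)
    # Shape beta with scale 1/beta gives a Gamma distribution with mean 1.
    draws = sorted(rng.gammavariate(beta, 1.0 / beta) for _ in range(n_samples))
    size = n_samples // n_categories
    return [sum(draws[i * size:(i + 1) * size]) / size
            for i in range(n_categories)]

def averaged_likelihood(likelihood_at_rate, rates):
    """Average a character's likelihood over the equally probable categories."""
    return sum(likelihood_at_rate(lam) for lam in rates) / len(rates)

rates = discrete_gamma_rates(beta=1.0)
print(rates)

# Toy character: two tips in different states, path length 0.2, s = r = 0.5.
lik = averaged_likelihood(lambda lam: 0.25 * (1 - math.exp(-lam * 0.2)), rates)
print(lik)
```

Scaling the branch length by the category rate inside the likelihood is equivalent to multiplying the rate matrix by $\lambda_i$.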

Modeling decisions to make

| aspect of model | possible choices | number of parameters to estimate |
|---|---|---|
| branch lengths | unconstrained | 2n − 3 (n is number of taxa) |
| | ultrametric | n − 1 |
| equilibrium probabilities | uniform | 0 |
| | empirical | 1 |
| | ML estimate | 1 |
| rate variation | none | 0 |
| | Gamma distributed | 1 |
| invariant characters | none | 0 |
| | pinv | 1 |

This could be continued — you can build in rate variation across branches, you can fit the number of Gamma categories ...

Model selection

- tradeoff:
  - rich models are better at detecting patterns in the data, but are prone to overfitting
  - parsimonious models are less vulnerable to overfitting but may miss important information
- standard issue in statistical inference
- one possible heuristic: Akaike Information Criterion (AIC)

$$\mathrm{AIC} = -2 \times \text{log-likelihood} + 2 \times \text{number of free parameters}$$

- the model minimizing AIC is to be preferred
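A small sketch of the AIC computation, combined with the parameter counts from the "Modeling decisions to make" slide. The function names are illustrative assumptions, and the log-likelihood value in the example is a made-up placeholder, not a result from the lecture.

```python
def n_free_parameters(n_taxa, branch_lengths, eq_probs, rate_variation, invariant):
    """Count free parameters following the 'Modeling decisions to make' slide."""
    count = {"unconstrained": 2 * n_taxa - 3,
             "ultrametric": n_taxa - 1}[branch_lengths]
    count += {"uniform": 0, "empirical": 1, "ML": 1}[eq_probs]
    count += {"none": 0, "Gamma": 1}[rate_variation]
    count += {"none": 0, "pinv": 1}[invariant]
    return count

def aic(log_likelihood, n_parameters):
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Full model for 25 taxa: 47 branch lengths plus 3 further parameters.
k = n_free_parameters(25, "unconstrained", "ML", "Gamma", "pinv")
print(k, aic(-7950.0, k))   # -7950.0 is a made-up log-likelihood
```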

Example: Model selection for cognacy data / UPGMA tree

| model no. | branch lengths | eq. probs. | rate variation | inv. char. | AIC |
|---|---|---|---|---|---|
| 1 | ultrametric | uniform | none | none | 17515.95 |
| 2 | ultrametric | uniform | none | pinv | 17518.39 |
| 3 | ultrametric | uniform | Gamma | none | 17517.89 |
| 4 | ultrametric | uniform | Gamma | pinv | 17519.75 |
| 5 | ultrametric | empirical | none | none | 16114.66 |
| 6 | ultrametric | empirical | none | pinv | 16056.85 |
| 7 | ultrametric | empirical | Gamma | none | 15997.16 |
| 8 | ultrametric | empirical | Gamma | pinv | 16022.21 |
| 9 | ultrametric | ML | none | none | 16034.96 |
| 10 | ultrametric | ML | none | pinv | 16058.83 |
| 11 | ultrametric | ML | Gamma | none | 15981.94 |
| 12 | ultrametric | ML | Gamma | pinv | 16009.90 |
| 13 | unconstrained | uniform | none | none | 17492.73 |
| 14 | unconstrained | uniform | none | pinv | 17494.73 |
| 15 | unconstrained | uniform | Gamma | none | 17494.73 |
| 16 | unconstrained | uniform | Gamma | pinv | 17496.73 |
| 17 | unconstrained | empirical | none | none | 16106.52 |
| 18 | unconstrained | empirical | none | pinv | 16049.28 |
| 19 | unconstrained | empirical | Gamma | none | 16033.21 |
| 20 | unconstrained | empirical | Gamma | pinv | 16011.38 |
| 21 | unconstrained | ML | none | none | 16102.04 |
| 22 | unconstrained | ML | none | pinv | 16051.27 |
| 23 | unconstrained | ML | Gamma | none | 16025.99 |
| 24 | unconstrained | ML | Gamma | pinv | 16001.00 |
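Picking the preferred model from this table is a one-line minimization. The sketch below copies the 24 AIC values from the table in model order; the variable names are illustrative.

```python
# AIC values from the table, in the order of model numbers 1..24.
aic_values = [
    17515.95, 17518.39, 17517.89, 17519.75, 16114.66, 16056.85,
    15997.16, 16022.21, 16034.96, 16058.83, 15981.94, 16009.90,
    17492.73, 17494.73, 17494.73, 17496.73, 16106.52, 16049.28,
    16033.21, 16011.38, 16102.04, 16051.27, 16025.99, 16001.00,
]

# The model minimizing AIC is preferred; model numbers start at 1.
best_index = min(range(len(aic_values)), key=aic_values.__getitem__)
best_model = best_index + 1
print(best_model, aic_values[best_index])
```

On these numbers the minimum is model 11 (ultrametric, ML equilibrium probabilities, Gamma rate variation, no invariant characters).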

Tree search

- ML computation gives us likelihood of a tree topology, given data and a model
- ML tree:
  - heuristic search to find the topology maximizing likelihood
  - optimize branch lengths to maximize likelihood for that topology
- computationally very demanding!
- for the 25 taxa in our running example, ML tree search for the full model requires several hours on a single processor; parallelization helps
- ideally, one would want to do 24 heuristic tree searches, one for each model specification, and pick the tree+model with lowest AIC
- in practice one has to make compromises

Running example

Running example: cognacy data

- unconstrained branch lengths: AIC = 7929
- ultrametric: AIC = 7972

[Figure: the two corresponding trees over the languages of the running example]

Running example: WALS data

- unconstrained branch lengths: AIC = 2752
- ultrametric: AIC = 2828

[Figure: the two corresponding trees over the languages of the running example]

Running example: phonetic data

- unconstrained branch lengths: AIC = 89871
- ultrametric: AIC = 90575

[Figure: the two corresponding trees over the languages of the running example]

Wrapping up

- ML is conceptually superior to MP (let alone distance methods):
  - different mutation rates for different characters are inferred from the data
  - the possibility of multiple mutations is taken into account, depending on branch lengths
  - side effect of likelihood computation: probability distribution over character states at each internal node can be read off
- disadvantages:
  - computationally demanding
  - many parameter settings make model selection difficult (note that the ultrametric trees in our example are sometimes better even though they have higher AIC)
  - ultrametric constraint makes branch length optimization computationally more expensive ⇒ not feasible for larger data sets (more than 100–200 taxa)

Cleaning up from yesterday

Using all data and the most sophisticated model...

- using both cognacy characters and phonetic characters
- Bayesian phylogenetic inference (related to Maximum Likelihood, but quite a bit more complex)
- 10 Gamma categories
- relaxed molecular clock ⇒ rates are allowed to vary between branches, but only to a limited degree

Using all data and the most sophisticated model... (continued)

[Figure: tree over the languages of the running example, with support values between 0.28 and 1 at the internal nodes]

Using all data and the most sophisticated model... (continued)

[Figure: tree over the languages of the running example]

Application: Ancestral State Reconstruction

joint work with Johann-Mattis List

What is Ancestral State Reconstruction?

- While tree-building methods seek to find branching diagrams which explain how a language family has evolved, ASR methods use the branching diagrams in order to explain what has evolved concretely.
- Ancestral state reconstruction is very common in evolutionary biology but only sporadically practiced in computational historical linguistics (Bouchard-Côté et al., 2013).
- In classical historical linguistics, on the other hand, linguistic reconstruction of proto-forms and proto-meanings is very common and one of the main goals of the classical comparative method (Fox 1995).

ASR of Lexical Replacement Patterns

If we look for words corresponding to one meaning in a wordlist and know which of the words are cognate or not, we may ask which of the word forms was the most likely candidate to be used in the proto-language of all descendant languages. This question resembles the task of “semantic reconstruction”, but in contrast to classical semantic reconstruction, we are only operating within one concept slot here, disregarding all words with a different meaning which may also be cognate with the words in our sample. As a result of this restriction, it is quite likely that we cannot recover the original form from our data. It is, however, very interesting to see to which degree we can propose a good candidate word form (cognate set) for the proto-language.

Data

IELex
- 153 Indo-European doculects
- 207 concepts
- entries for Proto-Indo-European for 135 concepts → used as gold standard
- arbitrarily split into training set and test set:
  - training set: 67 concepts, 1127 cognate classes (83 occur in PIE)
  - test set: 68 concepts, 957 cognate classes (79 from PIE)

ABVD
- 743 Austronesian doculects → 100 were selected at random
- 210 concepts; for 154 of them entries for Proto-Austronesian
- split into training set and test set:
  - training set: 81 concepts, 1695 cognate classes (88 occur in PAn)
  - test set: 74 concepts, 1584 cognate classes (79 occur in PAn)

Prerequisites: Trees

[Figure: trees over the Austronesian (ABVD) and Indo-European (IELex) doculects]

- trees were inferred with full data set (training + test data) via Bayesian inference
- IELex outgroup: Anatolian
- ABVD outgroup: Malayo-Polynesian
- random samples of 1000 trees from posterior distributions
- maximum clade credibility trees

Phylogenetic uncertainty

[Figure: tree over the Indo-European doculects]

- proper way to deal with it: work with posterior sample rather than with a single tree
- poor man’s method:
  - remove all short branches (shorter than some threshold)
  - do ASR with resulting multifurcating tree
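The "poor man's method" of removing short branches can be sketched as a recursive tree rewrite. The nested-list tree encoding and the function name are assumptions for illustration, as is the choice to add the removed branch's length to the branches below it; only internal branches are collapsed, since tip branches cannot be removed.

```python
def collapse_short_branches(node, threshold):
    """Return a copy of the tree with short internal branches removed.

    A tip is a string; an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return node
    children = []
    for child, length in node:
        child = collapse_short_branches(child, threshold)
        if not isinstance(child, str) and length < threshold:
            # Drop the short internal branch and attach its children directly,
            # which turns the parent into a multifurcating node.
            children.extend((grandchild, length + sub_length)
                            for grandchild, sub_length in child)
        else:
            children.append((child, length))
    return children

# Assumed toy tree: the clade (C, D) hangs on a very short branch.
tree = [("A", 1.0),
        ([("B", 0.5), ([("C", 0.25), ("D", 0.25)], 0.03125)], 0.5)]
collapsed = collapse_short_branches(tree, threshold=0.05)
print(collapsed)
```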

Summary on Indo-European ASR

| Error Type | GS | ASR | Number |
|---|---|---|---|
| Missing forms | A | ∅ | 7 |
| Different forms | A | B | 9 |
| Additional forms in ASR | A | A, B | 5 |
| Missing root in ASR | A, B | A | 4 |
| Summary | | | 25 |

Evaluating the Differences

We evaluate the differences qualitatively by checking

- the reflection of the proposed root in the branches, especially with semantically shifted word forms,
- the likelihood of semantic shift of the given root with the help of the Database of Cross-Linguistic Colexifications (CLICS, List et al. 2013 and 2014),
- thoroughly whether the cognate sets in the data are really reflexes of the proposed PIE root.

Based on this check, we distinguish four grades of root quality: erroneous, problematic, possible, good.

Indo-European ASR: Missing forms

| Concept | Form | Meaning in Reflexes | Comment |
|---|---|---|---|
| SEE | *derḱ- | to see | Only reflected in Indo-Iranian, cognates also problematic. |
| SEE | *weid- | to see or to know | Safe root for Indo-European. |
| SING | *kan- | to sing or the rooster | Root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change. |
| SMELL | *h₃ed- | to smell | Potential root for PIE, but only reflected in Greek and Romance. |
| SMALL | *mei- | small | Wrong cognate judgments in the database, since neither Russian malenkij nor English small go back to this root. |
| THINK | *teng- | to think or to feel | Root only reflected in Germanic languages, with spurious reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind or to think”. |
| WASH | *leh₂w- | to wash or to pour | Wrong cognate assignment in the source since Romance and Albanian reflexes are not annotated. |
| WASH | *neigʷ- | to wash or water monster | Very unlikely PIE root, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages. |
| WET | *wed- | water or wet | Semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why this should have already happened in PIE times. |

Legend: erroneous, problematic, possible, good

Indo-European ASR: Different Forms

| Concept | GS | ASR | Comment |
|---|---|---|---|
| RIVER | *h₂ekʷeh₂ | *h₂ep- | Form in GS meant “water” in PIE. Although a shift from “water” to “river” is likely according to CLICS, this meaning is an innovation in Germanic. The ASR form is reflected across multiple branches and a much better candidate. |
| RUB | *melh₁- | *terh₁- | Form in GS is not reflected in the standard literature (LIV and LIN); form in ASR is reflected in the meaning “to rub, to bore”. |
| SCRATCH | *gerbʰ- | *kes- | Form in GS is only reflected in a few Germanic languages, probably with a wrong cognate assignment. Following Derksen (2008), the ASR form is a much better candidate for the PIE word for “scratch”. |
| SKIN | *pel | *(s)kewH- | Form in GS is a good PIE root, but not necessarily with the meaning “skin”, as the meaning of the reflexes differs greatly. The ASR form derives from a PIE verb meaning “to cover”, but the cognate set should not contain Slavic words (Derksen 2008). |
| WALK | *ǵʰeh₁ | *h₁ei- | The GS form is only reflected in Germanic. The ASR form is a clear PIE root, but the meaning may also have been “to go”. |
| WATER | *h₂ekʷeh₂ | *wódr̥ | The ASR form is a much better candidate for “water” in PIE, due to its high number of reflexes in all branches. |
| WHITE | *h₂elbʰós | *h₂erǵó- | The GS form is only reflected in Romance in this meaning and as meaning “cloud” in Hittite. The ASR form is a much better candidate, with a much more plausible connection between reflexes meaning “shine” and “white”, as also confirmed by CLICS. |
| WORM | *wr̥mi- | *kʷr̥mis | The ASR form is reflected in more different branches of PIE, while the GS form is only reflected in Germanic and Romance. |

Legend: erroneous, problematic, possible, good

Indo-European ASR: Additional Forms

| Concept | Form in ASR | Comment |
|---|---|---|
| MOON | *lewk-s-nh₂ | This form would go back to a PIE root meaning “to shine” and is often said to have independently turned to mean “moon” in Romance and Slavic and other branches. The shift from “shine” to “moon” is however not very likely (no evidence in CLICS), so it is also possible that the word meant already “moon” in PIE as an epithet (Vaan 2008). |
| SNOW | *ǵʰéi-mn̥- | The form has probably independently shifted from the original meaning “frost, cold”, which is a very likely shift according to CLICS. |
| SUCK | *suḱ- | The root is present in this meaning in many subbranches and a good candidate for PIE in this meaning. |
| THIS | *so / *to | The root is a clear PIE demonstrative (Meier-Brügger 2010), but the reflexes in the daughter languages vary greatly, due to analogical levelling. |
| WITH | *sm̥ | A very good candidate for the meaning with reflexes in Greek, Indo-Iranian and Slavic. |

Legend: erroneous, problematic, possible, good

Indo-European ASR: Missing Forms in ASR

| Concept | Form in GS | Comment |
|---|---|---|
| NOT | *meh₁ | This form is reflected in Old Greek as a prohibitive negation and also reconstructed as such. Whether it was the normal negation in PIE is less clear. |
| SLEEP | *drem | This form is mainly reflected in Latin and spuriously in Indian and Greek. It is much more likely that it meant something else in PIE and then shifted into this meaning. |
| VOMIT | *h₁rewg- | No need to reconstruct this form back to PIE, since it is only reflected in two languages of Romance. |
| YEAR | *ieHr- | This form has only reflexes in Germanic languages. Generally, the meaning “year” is difficult to reconstruct, due to the high potential for shift from “summer”, “winter”, “time”, etc. as shown in CLICS. |

Legend: erroneous, problematic, possible, good

Evaluation against our manually created gold standard

- precision: 0.986 (1 false positive)
- recall: 0.895 (8 false negatives)
- F-score: 0.938 [1]
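The three figures are mutually consistent, which can be checked with the usual definitions. The true-positive count of 68 used below is not stated in the slides; it is inferred here as the count that reproduces the reported precision and recall together with 1 false positive and 8 false negatives.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and their harmonic mean (F-score)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# tp = 68 is an inferred assumption; fp and fn are taken from the slide.
precision, recall, f_score = precision_recall_f(tp=68, fp=1, fn=8)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```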

[1] The IELex PIE entries have an F-score of 0.854.

False positive

[Figure: tree over the Indo-European doculects for the cognate class snow:D]

False negatives

[Figure: tree over the Indo-European doculects for the cognate class river:O]

False negatives

[Figure: Indo-European language tree plot for character smell:W (tip labels omitted)]

False negatives

[Figure: Indo-European language tree plot for character wet:I (tip labels omitted)]

False negatives

[Figure: Indo-European language tree plot for character skin:B (tip labels omitted)]

False negatives

[Figure: Indo-European language tree plot for character sleep:E (tip labels omitted)]

False negatives

[Figure: Indo-European language tree plot for character white:E (tip labels omitted)]

False negatives

[Figure: Indo-European language tree plot for character worm:B (tip labels omitted)]

Summary on Indo-European

The qualitative evaluation shows that the proto-forms which our best ASR method reconstructs for PIE are mostly as good as, if not better than, the candidates found in the gold standard. Given the well-known uncertainties of semantic reconstruction in classical historical linguistics, ASR methods could provide real help here: they supply objective scenarios for word evolution along a given tree, each following a specific evolutionary model.

Hands-on

How to run Maximum-Likelihood tree estimation in Paup*

Load your nexus file into Paup*:
    > paup4 soundConcept.bin.nex
Set the optimality criterion to likelihood:
    paup> set criterion=likelihood
Choose the model:
  optimized rate parameter:
    paup> lset basefreq=estimate
  ultrametric tree:
    paup> lset clock=yes
  gamma-distributed rates:
    paup> lset rates=gamma shape=estimate
  assume invariant sites:
    paup> lset pinvar=estimate
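The two rate-heterogeneity options (gamma-distributed rates, invariant sites) can be read as a mixture over site classes. The following is a minimal Python sketch of that idea, not Paup*'s implementation. It reuses the two-state matrix P(t) with equilibrium (s, r) from the recap slide and a tree with only two tips. The function names, the numeric values, and the list RATES are illustrative assumptions. In particular, RATES is a hand-picked stand-in with mean 1, whereas a real analysis would derive the category rates from the estimated gamma shape parameter.

```python
import math

def transition_matrix(t, s, r):
    """Two-state P(t) from the recap slide; equilibrium distribution is (s, r)."""
    e = math.exp(-t)
    return [[s + r * e, r - r * e],
            [s - s * e, r + s * e]]

def pair_likelihood(x, y, t, s, r, rate=1.0):
    """Likelihood of states x, y at two tips joined by a path of length t.

    The root is assumed to be in equilibrium; by time reversibility it can
    be placed at tip x, so the likelihood is pi[x] * P(rate * t)[x][y].
    """
    pi = (s, r)
    return pi[x] * transition_matrix(rate * t, s, r)[x][y]

def site_likelihood(x, y, t, s, r, rates, pinvar):
    """Mix equally weighted rate categories with a class of invariant sites."""
    variable = sum(pair_likelihood(x, y, t, s, r, rho) for rho in rates) / len(rates)
    # An invariant site can only produce identical states at both tips.
    invariant = (s, r)[x] if x == y else 0.0
    return pinvar * invariant + (1.0 - pinvar) * variable

# Hypothetical values for illustration only (not estimates from any data set).
RATES = [0.1, 0.5, 1.0, 2.4]   # stand-in category rates with mean 1
total = sum(site_likelihood(x, y, 0.7, 0.4, 0.6, RATES, 0.2)
            for x in (0, 1) for y in (0, 1))
print(round(total, 12))  # the four tip patterns form a probability distribution
```

With pinvar set to 0 and a single rate of 1, site_likelihood reduces to the plain two-tip likelihood, which is a quick way to check the mixture.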

Hands-on

How to run Maximum-Likelihood tree estimation in Paup* (cont.)

Perform heuristic search:
    paup> hsearch
Display tree:
    paup> describetree /plot=phylo
Show log-likelihood and AIC:
    paup> lscores /aic=yes
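To make the AIC value easier to interpret, here is a small Python sketch of the standard definition, AIC = 2k - 2 ln L, where k is the number of free parameters. It assumes the score is given as a negative log-likelihood (-ln L). The function name, the model labels, and all numbers are hypothetical and are not Paup* output.

```python
def aic(neg_log_likelihood, n_free_params):
    """Akaike information criterion: AIC = 2k - 2 ln L = 2k + 2 (-ln L)."""
    return 2 * n_free_params + 2 * neg_log_likelihood

# Hypothetical scores for two model settings (illustration only).
candidates = {
    "equal rates": aic(neg_log_likelihood=1000.0, n_free_params=3),
    "gamma + invariant sites": aic(neg_log_likelihood=990.0, n_free_params=5),
}

# The model with the lowest AIC is preferred.
best = min(candidates, key=candidates.get)
print(candidates, best)
```

In this made-up comparison the richer model pays a penalty of 4 for its two extra parameters but gains 20 from the better fit, so it has the lower AIC.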

Bouchard-Côté, A., D. Hall, T. L. Griffiths, and D. Klein (2013). Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences, 110(11):4224–4229.

Ewens, W. and G. Grant (2005). Statistical Methods in Bioinformatics: An Introduction. Springer, New York.