

Investigating the potential of ancestral state reconstruction algorithms in historical linguistics

Gerhard Jäger & Johann-Mattis List

Tübingen University & CRLAO / Team AIRE, Paris

Capturing Phylogenetic Algorithms for Linguistics, Leiden

October 28, 2015

Jäger & List (Tübingen/Paris) Ancestral state reconstruction Leiden 1 / 42 Introduction What is Ancestral State Reconstruction?

While tree-building methods seek to find branching diagrams which explain how a language family has evolved, ASR methods use these branching diagrams in order to explain what has evolved concretely. Ancestral state reconstruction is very common in evolutionary biology but only sporadically practiced in computational historical linguistics (Bouchard-Côté et al. 2013). In classical historical linguistics, on the other hand, linguistic reconstruction of proto-forms and proto-meanings is very common and one of the main goals of the classical comparative method (Fox 1995).

Introduction: ASR of Lexical Replacement Patterns

If we look for words corresponding to one meaning in a wordlist and know which of the words are cognate, we may ask which of the word forms was the most likely candidate to be used in the proto-language of all descendant languages. This question resembles the task of “semantic reconstruction”, but in contrast to classical semantic reconstruction, we are only operating within one concept slot here, disregarding all words with a different meaning which may also be cognate with the words in our sample. As a result of this restriction, it is quite likely that we cannot recover the original form from our data. It is, however, very interesting to see to what degree we can propose a good candidate word form (cognate set) for the proto-language.

Introduction: ASR of Lexical Replacement Patterns

[Figure: tree over six words for "head", built up step by step from the leaves to the root. All forms are glossed "head".]

leaves:               Kopf, kop, head, tête, testa, cap
lower internal nodes: *kop, testa
upper internal nodes: *haubud-, caput
root:                 *kaput-

Introduction: This talk

reconstruction of cognate class at the root

[Figure: tree with six leaves carrying the cognate classes A A B C C B. The state at the root is first shown as unknown (?) and then reconstructed as B.]

Materials and Methods: Materials: Data

IELex
153 Indo-European doculects
207 concepts
entries for Proto-Indo-European for 135 concepts → used as gold standard
arbitrarily split into training set and test set:
training set: 67 concepts, 1127 cognate classes (83 occur in PIE)
test set: 68 concepts, 957 cognate classes (79 from PIE)

ABVD
743 Austronesian doculects → 100 were selected at random
210 concepts; for 154 of them entries for Proto-Austronesian
split into training set and test set:
training set: 81 concepts, 1695 cognate classes (88 occur in PAn)
test set: 74 concepts, 1584 cognate classes (79 occur in PAn)

Materials and Methods: Methods: Prerequisites: Trees

trees were inferred with full data set (training + test data) via Bayesian inference
IELex outgroup: Anatolian
ABVD outgroup: Malayo-Polynesian
random samples of 1000 trees from posterior distributions
maximum clade credibility trees

[Figure: trees of the ABVD and the IELex doculects.]

Materials and Methods: Methods: Phylogenetic uncertainty

proper way to deal with it: work with posterior sample rather than with a single tree
poor man’s method:
remove all short branches (shorter than some threshold)
do ASR with resulting multifurcating tree

[Figure: multifurcating tree of the IELex doculects.]
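The "poor man's method" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the function name `collapse_short_branches` and the example threshold are hypothetical and are not taken from the talk. The sketch also adds the removed branch length to the daughters so that root-to-tip distances are preserved, which is one possible design choice among several.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 0.0          # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which internal branches shorter than
    `threshold` are removed, turning the affected nodes into polytomies."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse_short_branches(child, threshold)  # fresh copy
        if child.children and child.length < threshold:
            # Short internal branch: attach its daughters to the current node
            # and add the removed length (an assumption of this sketch).
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    return Node(node.name, node.length, new_children)


# ((A:1,B:1):0.01,(C:1,D:1):0.5) with a hypothetical threshold of 0.05
tree = Node(children=[
    Node(length=0.01, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node(length=0.5, children=[Node("C", 1.0), Node("D", 1.0)]),
])
multifurcating = collapse_short_branches(tree, 0.05)
print(len(multifurcating.children))  # the root now has three daughters
```

Tip branches are never removed here, since only nodes with daughters are candidates for collapsing.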

Materials and Methods: Methods: Coding

Multi-state (one character):

leaves: A  A  B  C  C  B
root:   B

Binarized (one character per cognate class):

character   leaves                                    root
A           A      A      non-A  non-A  non-A  non-A  non-A
B           non-B  non-B  B      non-B  non-B  B      B
C           non-C  non-C  non-C  C      C      non-C  non-C
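The step from multi-state to binarized coding can be illustrated with a short Python sketch. The function name `binarize` and the language labels `L1` to `L6` are hypothetical; the leaf states are those of the example tree.

```python
from typing import Dict


def binarize(states: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Turn one multi-state character (language -> cognate class) into
    one presence/absence character per cognate class."""
    classes = sorted(set(states.values()))
    return {
        cls: {lang: int(state == cls) for lang, state in states.items()}
        for cls in classes
    }


# Leaf states of the example tree: A A B C C B
leaves = {"L1": "A", "L2": "A", "L3": "B", "L4": "C", "L5": "C", "L6": "B"}
binary = binarize(leaves)
for cls, character in binary.items():
    print(cls, list(character.values()))
```

Each binary character can then be reconstructed on its own, which is why a binarized analysis may return no cognate class or several cognate classes at the root.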

Materials and Methods: Methods: Polymorphisms (a.k.a. synonyms)

problem for multistate coding
possible representations:
epistemic: both observations have 50% (subjective) probability
lifted model: states in the technical sense are sets of cognate classes

[Figure: the "head" example with leaves Kopf, kop, head, tête, testa, cap and the additional synonyms Haupt and hoofd, all glossed "head".]

Materials and Methods: Methods: Parsimony reconstruction

[Figure: example tree with leaf states A A B C C B, shown once for each candidate root state.]

root   upper internal nodes   lower internal nodes   parsimony
B      B, B                   C, A                   2
A      A, B                   C, A                   3
C      A, C                   C, A                   3

Materials and Methods: Methods: Weighted parsimony reconstruction

Weight matrix

     A  B  C
A    0  1  2
B    1  0  2
C    2  2  0

[Figure: example tree with leaf states A A B C C B, shown once for each candidate root state.]

root   upper internal nodes   lower internal nodes   weighted parsimony
B      B, B                   C, A                   3
A      A, B                   C, A                   4
C      A, C                   C, A                   5

Materials and Methods: Methods: Dynamic Programming (Sankoff Algorithm)

$$wp(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \left( w(s, s') + wp(d, s') \right)$$

[Figure: example tree with leaf states A A B C C B, filled in step by step from the leaves to the root.]
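The recursion can be sketched in Python as follows. The nested-tuple tree encoding and the function name `sankoff` are choices made for this illustration. The topology ((A,A),B),((C,C),B) is an assumption read off the example figures; it reproduces the parsimony scores shown on the preceding slides for both the unit weights and the weight matrix.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf is its state, an inner node a tuple
STATES = ("A", "B", "C")


def sankoff(tree: Tree, w: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Return wp(node, s) for every state s at the top node of `tree`."""
    if isinstance(tree, str):
        # Leaf: cost 0 for the observed state, infinite cost for all others.
        return {s: 0.0 if s == tree else float("inf") for s in STATES}
    scores = {s: 0.0 for s in STATES}
    for daughter in tree:                      # sum over daughters
        wp_d = sankoff(daughter, w)
        for s in STATES:                       # minimum over daughter states
            scores[s] += min(w[s, s2] + wp_d[s2] for s2 in STATES)
    return scores


unit = {(a, b): float(a != b) for a in STATES for b in STATES}
weighted = {
    ("A", "A"): 0, ("A", "B"): 1, ("A", "C"): 2,
    ("B", "A"): 1, ("B", "B"): 0, ("B", "C"): 2,
    ("C", "A"): 2, ("C", "B"): 2, ("C", "C"): 0,
}
tree = ((("A", "A"), "B"), (("C", "C"), "B"))  # assumed topology

print(sankoff(tree, unit))      # {'A': 3.0, 'B': 2.0, 'C': 3.0}
print(sankoff(tree, weighted))  # {'A': 4.0, 'B': 3.0, 'C': 5.0}
```

In both settings state B has the lowest score at the root, in line with the reconstruction shown in the example.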

Materials and Methods: Methods: Weighted Parsimony reconstruction

the state with the lowest parsimony score wins
in case of ties, frequency at the leaves is tie-breaker
binary characters: w(0 → 2) = 1; w(1 → 0) = 2
multi-state characters: all weights = 1
polymorphism only admitted at tips:

w(a → {a, b}) = 0
w(a → {b, c}) = 1
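The tie-breaking rule and the cost of polymorphic tips can be illustrated as follows. The function names `tip_cost` and `winner` are hypothetical. How ties are resolved when both the score and the leaf frequency are equal is not stated on the slide; this sketch simply falls back to dictionary order in that case.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable


def tip_cost(state: str, observed: FrozenSet[str]) -> int:
    """Cost of a transition into a (possibly polymorphic) tip:
    w(a -> {a, b}) = 0 and w(a -> {b, c}) = 1."""
    return 0 if state in observed else 1


def winner(scores: Dict[str, float], leaf_states: Iterable[str]) -> str:
    """Pick the state with the lowest parsimony score; ties go to the
    state that is most frequent at the leaves."""
    freq = Counter(leaf_states)
    return min(scores, key=lambda s: (scores[s], -freq[s]))


print(tip_cost("a", frozenset({"a", "b"})))  # 0
print(tip_cost("a", frozenset({"b", "c"})))  # 1
print(winner({"A": 3, "B": 3, "C": 4}, ["A", "A", "A", "B", "C"]))  # A
```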

Materials and Methods: Methods: The MLN Method for ASR

The MLN method (List et al. 2014a) uses parsimony for ancestral state reconstruction. In contrast to classical parsimony, MLN tests different weighting schemes for gains and losses and selects the optimal scheme with the help of the vocabulary size criterion. The vocabulary size criterion states that the number of synonyms per word should be similar in the ancestral and the descendant languages.

Materials and Methods: Methods: The MLN Method for ASR

The vocabulary size criterion states that the number of synonyms per word (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for individual datasets.

[Figures: three reconstructions under different weighting schemes, captioned “Too many synonyms in ancestral nodes!”, “Too few synonyms in ancestral nodes!” and “Optimal amount of synonyms in ancestral nodes!”.]
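One way to picture the selection step is the following Python sketch. It is not the MLN implementation: the function name `pick_scheme`, the scheme labels and all numbers are hypothetical, and comparing mean vocabulary sizes is an assumption of this sketch (the actual method may compare the distributions differently).

```python
from statistics import mean
from typing import Dict, List


def pick_scheme(ancestral_sizes: Dict[str, List[int]], tip_sizes: List[int]) -> str:
    """Return the weighting scheme whose mean ancestral vocabulary size is
    closest to the mean vocabulary size of the attested languages."""
    target = mean(tip_sizes)
    return min(ancestral_sizes,
               key=lambda scheme: abs(mean(ancestral_sizes[scheme]) - target))


# Hypothetical vocabulary sizes (number of cognate sets per language or node)
tips = [210, 205, 215, 208]
candidates = {
    "gain=1,loss=1": [320, 340, 310],  # too many synonyms in ancestral nodes
    "gain=3,loss=1": [120, 110, 130],  # too few synonyms in ancestral nodes
    "gain=2,loss=1": [212, 204, 209],  # similar to the attested languages
}
print(pick_scheme(candidates, tips))  # gain=2,loss=1
```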

Materials and Methods: Methods: Reconstruction on a posterior sample

if a sample of trees is used:
a state is reconstructed if it is reconstructed in more than a proportion θ of the trees in the sample
θ is estimated using the training set
values:

database   method               θ
IELex      Sankoff/binary       0.690
IELex      Sankoff/multistate   0.056
IELex      MLN                  0.464
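The voting rule over a tree sample can be sketched as follows. The function name `reconstruct_from_sample` and the toy sample of four trees are hypothetical; θ is treated as a proportion of the trees in the sample, and the two thresholds in the usage lines are the IELex values listed for Sankoff/binary and Sankoff/multistate.

```python
from collections import Counter
from typing import Iterable, Set


def reconstruct_from_sample(per_tree: Iterable[Set[str]], theta: float) -> Set[str]:
    """Keep every state that is reconstructed at the root in more than a
    proportion `theta` of the trees in the sample."""
    trees = list(per_tree)
    counts = Counter(state for states in trees for state in states)
    return {state for state, n in counts.items() if n / len(trees) > theta}


# Root reconstructions on four sampled trees (hypothetical)
sample = [{"A"}, {"A", "B"}, {"A"}, {"B"}]
print(reconstruct_from_sample(sample, 0.690))  # {'A'}
print(reconstruct_from_sample(sample, 0.056))  # both A and B pass
```

A low threshold favours recall, a high threshold favours precision, which is why θ is tuned on the training set.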

Materials and Methods: Methods: Likelihood-based reconstruction

$$\log L(\text{tips below} \mid \text{mother} = s) = \sum_{d \in \text{daughters}} \log \sum_{s' \in \text{states}} P(s \to s' \mid \text{branch length}) \, L(\text{tips below } d \mid d = s')$$

[Figure: example tree with leaf states A A B C C B.]
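The likelihood recursion can be sketched in Python in the same style as the parsimony example. Everything specific here is an assumption of the sketch: the tuple encoding of the tree, the helper names, the equal-rates transition probabilities with a fixed rate of 1.0, the topology ((A,A),B),((C,C),B) and all branch lengths. Branch lengths are assumed to be positive below the root.

```python
import math
from typing import Dict

STATES = ("A", "B", "C")


def p_transition(s: str, s2: str, t: float, rate: float = 1.0) -> float:
    """Equal-rates model: all changes between distinct states share one rate."""
    k = len(STATES)
    decay = math.exp(-k * rate * t)
    return 1 / k + (k - 1) / k * decay if s == s2 else 1 / k - decay / k


def tip(state: str, t: float):
    return (state, t)


def inner(daughters, t: float):
    return (tuple(daughters), t)


def log_lik(node, rate: float = 1.0) -> Dict[str, float]:
    """log L(tips below node | node = s) for every state s."""
    content, _ = node
    if isinstance(content, str):  # tip: only the observed state is possible
        return {s: 0.0 if s == content else -math.inf for s in STATES}
    result = {s: 0.0 for s in STATES}
    for daughter in content:
        ll_d = log_lik(daughter, rate)
        t = daughter[1]  # length of the branch leading to the daughter
        for s in STATES:
            result[s] += math.log(sum(
                p_transition(s, s2, t, rate) * math.exp(ll_d[s2])
                for s2 in STATES))
    return result


# Hypothetical branch lengths on the assumed example topology
cherry_a = inner([tip("A", 0.2), tip("A", 0.2)], 0.1)
cherry_c = inner([tip("C", 0.2), tip("C", 0.2)], 0.1)
left = inner([cherry_a, tip("B", 0.3)], 0.1)
right = inner([cherry_c, tip("B", 0.3)], 0.1)
root = inner([left, right], 0.0)
print(log_lik(root))
```

For long trees one would keep everything in log space (for example with a log-sum-exp) instead of exponentiating the daughter values; the direct form is used here to stay close to the formula.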

Materials and Methods: Methods: Likelihood-based reconstruction

note: likelihoods (unlike parsimony scores) depend on branch lengths!
likelihoods at the root give the likelihood of a reconstruction, given all observed data (for that character)
total likelihood is obtained by weighting the root state likelihoods with the equilibrium probabilities (given a rate matrix) and summing over the states
rate matrix is optimized to maximize likelihood
rates are optimized independently for each character
for multistate characters, all rates are constrained to be equal (otherwise BayesTraits crashes…)
using equilibrium probabilities, you can derive expected state probabilities for root states

a state is likelihood-reconstructed if its expected probability > θ₂

again, threshold θ₂ must be estimated from training set
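The last steps can be illustrated with a short Python sketch that turns root likelihoods into expected root state probabilities and applies the threshold. The function names, the root log-likelihoods and the threshold value 0.5 are hypothetical; the equilibrium distribution is uniform, as it would be under equal rates.

```python
import math
from typing import Dict, Set


def root_probabilities(log_lik: Dict[str, float],
                       equilibrium: Dict[str, float]) -> Dict[str, float]:
    """Weight root state likelihoods by equilibrium probabilities and normalize."""
    joint = {s: math.exp(log_lik[s]) * equilibrium[s] for s in log_lik}
    total = sum(joint.values())  # total likelihood of the character
    return {s: p / total for s, p in joint.items()}


def likelihood_reconstructed(probs: Dict[str, float], theta2: float) -> Set[str]:
    """States whose expected probability at the root exceeds the threshold."""
    return {s for s, p in probs.items() if p > theta2}


# Hypothetical root log-likelihoods for a three-state character
root_ll = {"A": math.log(0.2), "B": math.log(0.6), "C": math.log(0.2)}
uniform = {s: 1 / 3 for s in root_ll}
probs = root_probabilities(root_ll, uniform)
print(likelihood_reconstructed(probs, 0.5))  # {'B'}
```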

Results: General Results: Evaluation

[Figure: five panels showing precision, recall and F-score (y-axis from 0.0 to 0.8), grouped by algorithm (ML, MLN, Sankoff), database (ABVD, IELex), character type (binary valued, multi-valued), tree type (bifurcating, multifurcating) and tree sample (posterior sample, summary tree).]

Results: General Results: Evaluation

IELex

algorithm  characters  tree type      tree sample       precision  recall  F-score
ML         binary      bifurcating    summary tree      0.817      0.734   0.773
ML         binary      bifurcating    posterior sample  0.795      0.734   0.763
ML         binary      multifurcating summary tree      0.792      0.722   0.755
ML         binary      multifurcating posterior sample  0.756      0.747   0.752
Sankoff    binary      multifurcating summary tree      0.716      0.734   0.725
Sankoff    binary      bifurcating    summary tree      0.704      0.722   0.712
Sankoff    binary      multifurcating posterior sample  0.720      0.684   0.701
Sankoff    binary      bifurcating    posterior sample  0.720      0.684   0.701
ML         multi       bifurcating    posterior sample  0.642      0.772   0.701
MLN        multi       bifurcating    posterior sample  0.743      0.658   0.698
MLN        binary      multifurcating posterior sample  0.743      0.658   0.698
MLN        binary      bifurcating    posterior sample  0.743      0.658   0.698
Sankoff    multi       bifurcating    summary tree      0.671      0.722   0.695
Sankoff    multi       multifurcating posterior sample  0.671      0.722   0.695
Sankoff    multi       bifurcating    posterior sample  0.671      0.722   0.695
ML         multi       multifurcating posterior sample  0.629      0.772   0.693
MLN        multi       multifurcating posterior sample  0.758      0.633   0.690
Sankoff    multi       multifurcating summary tree      0.735      0.633   0.680
ML         multi       multifurcating summary tree      0.735      0.633   0.680
ML         multi       bifurcating    summary tree      0.721      0.620   0.667
MLN        multi       multifurcating summary tree      0.584      0.658   0.619
MLN        binary      multifurcating summary tree      0.584      0.658   0.619
MLN        multi       bifurcating    summary tree      0.742      0.291   0.418
MLN        binary      bifurcating    summary tree      0.742      0.291   0.418

Results: General Results: Evaluation

ABVD

algorithm  characters  tree type      tree sample       precision  recall  F-score
ML         multi       bifurcating    posterior sample  0.738      0.747   0.742
ML         binary      bifurcating    posterior sample  0.682      0.759   0.719
ML         multi       bifurcating    summary tree      0.740      0.684   0.711
ML         binary      bifurcating    summary tree      0.757      0.681   0.711
Sankoff    multi       bifurcating    summary tree      0.691      0.709   0.700
Sankoff    binary      multifurcating posterior sample  0.781      0.633   0.699
ML         binary      multifurcating posterior sample  0.761      0.646   0.699
ML         multi       multifurcating summary tree      0.726      0.671   0.697
Sankoff    binary      bifurcating    posterior sample  0.726      0.671   0.697
ML         binary      multifurcating summary tree      0.732      0.658   0.693
Sankoff    multi       multifurcating summary tree      0.679      0.696   0.688
MLN        multi       bifurcating    summary tree      0.655      0.722   0.687
MLN        binary      bifurcating    summary tree      0.655      0.722   0.687
Sankoff    binary      bifurcating    summary tree      0.629      0.557   0.591
Sankoff    multi       multifurcating posterior sample  0.542      0.570   0.556
Sankoff    multi       bifurcating    posterior sample  0.542      0.570   0.556
MLN        multi       multifurcating posterior sample  0.414      0.848   0.556
MLN        multi       bifurcating    posterior sample  0.414      0.848   0.556
MLN        binary      multifurcating posterior sample  0.414      0.848   0.556
MLN        binary      bifurcating    posterior sample  0.414      0.848   0.556
ML         multi       multifurcating posterior sample  0.421      0.709   0.528
Sankoff    binary      multifurcating summary tree      0.469      0.570   0.514
MLN        multi       multifurcating summary tree      0.667      0.405   0.504
MLN        binary      multifurcating summary tree      0.667      0.405   0.504

Results: Specific Results: Summary on Indo-European ASR

Error Type               GS     ASR    Number
Missing forms            A      Ø      7
Different forms          A      B      9
Additional forms in ASR  A      A, B   5
Missing root in ASR      A, B   A      4
Summary                                25

Results: Specific Results: Evaluating the Differences

We evaluate the differences qualitatively by checking

the reflection of the proposed root in the branches, especially with semantically shifted word forms which may not occur in the wordlist data, using standard sources like Meier-Brügger (2002), Wodtko et al. (2008), Rix et al. (2002), and Pokorny (1959) for Indo-European in general, and specific sources like Vaan (2008) for Latin, Derksen (2008) and Vasmer (1986/1987) for Slavic, and Kroonen (2013) for Germanic,
the likelihood of semantic shift of the given root with the help of the Database of Cross-Linguistic Colexifications (CLICS, List et al. 2013 and 2014b, http://clics.lingpy.org),
whether the cognate sets in the data are really reflexes of the proposed PIE root.

Based on this check, we distinguish four grades of root quality: erroneous, problematic, possible, good.

Results: Specific Results: Indo-European ASR: Missing forms

Concept, form (meaning in reflexes): comment

SEE, *derḱ- (“to see”): Only reflected in Indo-Iranian, cognates also problematic.
SEE, *weid- (“to see or to know”): Safe root for Indo-European.
SING, *kan- (“to sing or the rooster”): Root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change.
SMELL, *h₃ed- (“to smell”): Potential root for PIE, but only reflected in Greek and Romance.
SMALL, *mei- (“small”): Wrong cognate judgments in the database, since neither Russian malenkij nor English small go back to this root.
THINK, *teng- (“to think or to feel”): Root only reflected in Germanic languages, with spurious reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind or to think”.
WASH, *leh₂w- (“to wash or to pour”): Wrong cognate assignment in the source, since Romance and Albanian reflexes are not annotated.
WASH, *neigʷ- (“to wash or water monster”): Very unlikely PIE root, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages.
WET, *wed- (“water or wet”): Semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why this should have already happened in PIE times.

Legend (quality grades): erroneous, problematic, possible, good

Results: Specific Results: Indo-European ASR: Missing Forms in ASR

Concept, form in GS: comment

NOT, *meh₁: This form is reflected in Ancient Greek as a prohibitive negation and also reconstructed as such. Whether it was the normal negation in PIE is less clear.
SLEEP, *drem: This form is mainly reflected in Latin and sporadically in Indian and Greek. It is much more likely that it meant something else in PIE and then shifted into this meaning.
VOMIT, *h₁rewg-: No need to reconstruct this form back to PIE, since it is only reflected in two languages of Romance.
YEAR, *ieHr-: This form has only reflexes in Germanic languages. Generally, the meaning “year” is difficult to reconstruct, due to the high potential for shift from “summer”, “winter”, “time”, etc. as shown in CLICS.

Legend (quality grades): erroneous, problematic, possible, good

Results: Specific Results: Indo-European ASR: Different Forms

Concept, GS form, ASR form: comment

RIVER, GS *h₂ekʷeh₂, ASR *h₂ep-: Form in GS meant “water” in PIE. Although a shift from “water” to “river” is likely according to CLICS, this meaning is an innovation in Germanic. The ASR form is reflected across multiple branches and a much better candidate.
RUB, GS *melh₁-, ASR *terh₁-: Form in GS is not reflected in the standard literature (LIV and LIN); form in ASR is reflected in the meaning “to rub, to bore”.
SCRATCH, GS *gerbʰ-, ASR *kes-: Form in GS is only reflected in a few Germanic languages, probably with a wrong cognate assignment. Following Derksen (2008), the ASR form is a much better candidate for the PIE word for “scratch”.
SKIN, GS *pel, ASR *(s)kewH-: Form in GS is a good PIE root, but not necessarily with the meaning “skin”, as the meaning of the reflexes differs greatly. The ASR form derives from a PIE verb meaning “to cover”, but the cognate set should not contain Slavic words (Derksen 2008).
WALK, GS *ǵʰeh₁, ASR *h₁ei-: The GS form is only reflected in Germanic. The ASR form is a clear PIE root, but the meaning may also have been “to go”.
WATER, GS *h₂ekʷeh₂, ASR *wódr̥: The ASR form is a much better candidate for “water” in PIE, due to its high number of reflexes in all branches.
WHITE, GS *h₂elbʰós, ASR *h₂erǵó-: The GS form is only reflected in Romance in this meaning and as meaning “cloud” in Hittite. The ASR form is a much better candidate, with a much more plausible connection between reflexes meaning “shine” and “white”, as also confirmed by CLICS.
WORM, GS *wr̥mi-, ASR *kʷr̥mis: The ASR form is reflected in more branches of Indo-European, while the GS form is only reflected in Germanic and Romance.

Legend (quality grades): erroneous, problematic, possible, good

Results: Specific Results: Indo-European ASR: Additional Forms

Concept, form in ASR: comment

MOON, *lewk-s-nh₂: This form would go back to a PIE root meaning “to shine” and is often said to have independently turned to mean “moon” in Romance and Slavic and other branches. The shift from “shine” to “moon” is however not very likely (no evidence in CLICS), so it is also possible that the word already meant “moon” in PIE as an epithet (Vaan 2008).
SNOW, *ǵʰéi-mn̥-: The form has probably independently shifted from the original meaning “frost, cold”, which is a very likely shift according to CLICS.
SUCK, *suḱ-: The root is present in this meaning in many subbranches and a good candidate for PIE in this meaning.
THIS, *so / *to: The root is a clear PIE demonstrative (Meier-Brügger 2010), but the reflexes in the daughter languages vary greatly, due to analogical levelling.
WITH, *sm̥: A very good candidate for the meaning with reflexes in Greek, Indo-Iranian and Slavic.

Legend (quality grades): erroneous, problematic, possible, good

Results: Specific Results: Evaluation against our manually created gold standard

precision: 0.986 (1 false positive)
recall: 0.895 (8 false negatives)
F-score: 0.938 (see footnote 1)
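The three figures are mutually consistent, as the following Python sketch shows. The function name `prf` is hypothetical, and the number of true positives (68) is not given on the slide; it is an assumption inferred here from the reported precision and recall together with the stated error counts.

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score from true positives, false positives
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 68 true positives is an inferred value, not stated in the talk
precision, recall, f_score = prf(tp=68, fp=1, fn=8)
print(round(precision, 3), round(recall, 3), round(f_score, 3))  # 0.986 0.895 0.938
```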

Footnote 1: The IELex PIE entries have an F-score of 0.854.

Results: Specific Results: False positive

[Figure: tree of the IELex doculects with marked tips for the character snow:D.]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled river:O]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled smell:W]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled wet:I]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled skin:B]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled sleep:E]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled white:E]

Results Specific Results False negatives

[Figure: tree over the Indo-European languages in the sample with tip markers (●), labelled worm:B]

Results Specific Results Summary on Indo-European

As the qualitative evaluation shows, the forms that our best ASR method reconstructs for PIE are mostly as good as, if not better than, the candidates we found in the gold standard. Given the general and well-known uncertainties of semantic reconstruction in classical historical linguistics, ASR methods could provide real help in semantic reconstruction: they supply objective scenarios for word evolution along a given tree, each following a specific evolutionary model.

Discussion Benefits of ASR (?)

If the language family is well known, ASR is of limited use in semantic reconstruction, since independent reconstructions by the comparative method are available, but it is quite useful for checking data quality and the reference tree topology in lexicostatistical datasets.

If the language family is less well known, ASR is definitely useful as a preliminary analysis for semantic reconstruction, since it gives a more objective assessment of the consequences of a given theory of lexical replacement and external language change (a tree topology).

Discussion Benefits of ASR (!)

ASR may help
1. to identify loci of homoplasy, and thus give a first hint of parallel semantic change patterns and borrowing.
2. to quantify differential rates of lexical replacement for the concepts in a given wordlist.
3. to automatically identify sound change patterns and proto-form reconstructions.
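To make the first point concrete, here is a minimal sketch of how a reconstruction on a tree can flag homoplasy. It uses Fitch parsimony as a simple stand-in, which is not necessarily the method used for the results above. The tree, the language names A to E and the two binary characters are hypothetical toy data. A character counts as homoplastic here if it needs more changes on the tree than the minimum any tree would require, namely the number of distinct states minus one.

```python
# Hypothetical toy tree over five languages A-E, written as nested pairs.
TREE = (("A", "B"), ("C", ("D", "E")))

def fitch(tree, states):
    """Return (candidate state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):  # leaf: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def extra_steps(tree, states):
    """Changes beyond the minimum that any tree would need for this character."""
    _, changes = fitch(tree, states)
    return changes - (len(set(states.values())) - 1)

# 1 = the language has a reflex of the cognate class, 0 = it does not.
scattered = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}  # present in two distant tips
clade = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}      # present in one subgroup

print(extra_steps(TREE, scattered))  # 1: one extra change, a candidate for borrowing or parallel change
print(extra_steps(TREE, clade))      # 0: a single change explains the pattern
```

The per-character change counts also give a crude handle on the second point: summed per concept, they indicate how often a meaning slot was replaced along the tree.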

Discussion Caveats

Our current models are still very simplistic, insofar as they operate independently for each meaning slot and handle only binary (yes/no) cognate relations between words. Future research will show whether it is possible to model lexical change across meanings and to allow for more fine-grained relations between cognate classes.
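The two restrictions can be illustrated with a small sketch of the data encoding they imply: every cognate class within a meaning slot becomes its own yes/no character, and nothing links characters across slots. The wordlist, the language names L1 to L3 and the function name below are hypothetical; the "concept:class" labels imitate labels such as snow:D. Coding a language without a word for a concept as missing (None) is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical toy wordlist: (language, concept, cognate class ID).
WORDLIST = [
    ("L1", "snow", "A"),
    ("L2", "snow", "A"),
    ("L3", "snow", "D"),
    ("L1", "river", "B"),
    ("L3", "river", "B"),
]

def binarize(wordlist):
    """Turn a wordlist into binary presence/absence characters per meaning slot."""
    languages = sorted({lang for lang, _, _ in wordlist})
    classes = defaultdict(set)   # concept -> cognate classes attested for it
    attested = defaultdict(set)  # (language, concept) -> classes the language uses
    for lang, concept, cls in wordlist:
        classes[concept].add(cls)
        attested[(lang, concept)].add(cls)

    matrix = {}
    for concept, class_ids in classes.items():
        for cls in sorted(class_ids):
            row = {}
            for lang in languages:
                if (lang, concept) not in attested:
                    row[lang] = None  # no word recorded: missing data
                else:
                    row[lang] = int(cls in attested[(lang, concept)])
            matrix[f"{concept}:{cls}"] = row
    return matrix

for label, row in binarize(WORDLIST).items():
    print(label, row)
```

In this encoding, the fact that snow:A and snow:D compete for the same slot is invisible to the model, and a cognate that has shifted to another meaning is not tracked at all.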

References

A. Bouchard-Côté, D. Hall, T. L. Griffiths, and D. Klein. Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4224–4229, 2013.
R. Derksen. Etymological dictionary of the Slavic inherited lexicon. Brill, Leiden and Boston, 2008.
G. Kroonen. Etymological dictionary of Proto-Germanic. Number 11 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2013.
J.-M. List, A. Terhalle, and M. Urban. Using network approaches to enhance the analysis of cross-linguistic polysemies. In Proceedings of the 10th International Conference on Computational Semantics – Short Papers, pages 347–353, Stroudsburg, 2013. Association for Computational Linguistics.
J.-M. List, T. Mayer, A. Terhalle, and M. Urban. Clics: Database of Cross-Linguistic Colexifications. Online Resource, 2014a. URL http://clics.lingpy.org.
J.-M. List, S. Nelson-Sathi, H. Geisler, and W. Martin. Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays, 36(2):141–150, 2014b.
M. Meier-Brügger. Indogermanische Sprachwissenschaft. de Gruyter, Berlin and New York, 8th edition, 2002.
J. Pokorny. Indogermanisches etymologisches Wörterbuch, volume 1. Francke, Bern, 1959.
M. de Vaan. Etymological dictionary of Latin and the other Italic languages. Number 7 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2008.
M. Vasmer. Ėtimologičeskij slovar’ russkogo jazyka. Progress, Moscow, 1986/1987.
D. Wodtko, B. Irslinger, and C. Schneider. Nomina im Indogermanischen Lexikon. Winter, Heidelberg, 2008.