Reconstructing a Lexicon from Sister Languages Using Neural Machine Translation

Total Page:16

File Type:pdf, Size:1020Kb

Reconstructing a Lexicon from Sister Languages Using Neural Machine Translation Restoring the Sister: Reconstructing a Lexicon from Sister Languages using Neural Machine Translation Remo Nitschke The University of Arizona [email protected] Abstract the same language family that share some estab- lished common ancestor language and a significant The historical comparative method has a long amount of cognates with the target language. By history in historical linguists. It describes a reverse-engineering the historical phonological pro- process by which historical linguists aim to cesses that happened between the target language reverse-engineer the historical developments and the sister-languages, one can predict what the of language families in order to reconstruct proto-forms and familial relations between lan- lexical item in the target language should be. This guages. In recent years, there have been multi- is essentially a twist on the comparative method, us- ple attempts to replicate this process through ing the same principles, but to reconstruct a modern machine learning, especially in the realm of sister, as opposed to a proto-antecedent. cognate detection (List et al., 2016; Ciobanu While neural net systems have been used to em- and Dinu, 2014; Rama et al., 2018). So far, ulate the historical comparative method1 to recon- most of these experiments aimed at actual re- construction have attempted the prediction of struct proto-forms (Meloni et al., 2019; Ciobanu a proto-form from the forms of the daughter and Dinu, 2018) and for cognate detection (List languages (Ciobanu and Dinu, 2018; Meloni et al., 2016; Ciobanu and Dinu, 2014; Rama et al., et al., 2019). Here, we propose a reimple- 2018), there have not, to the best of our knowl- mentation that uses modern related languages, edge, been any attempts to use neural nets to pre- or sisters, instead, to reconstruct the vocabu- dict/reconstruct lexical items of a sister language lary of a target language. In particular, we for revitalization/reconstruction purposes. show that we can reconstruct vocabulary of a target language by using a fairly small data Meloni et al.(2019) report success for a similar set of parallel cognates from different sister task (reconstructing Latin proto-forms) by using languages, using a neural machine translation cognate pattern lists as a training input. Instead of (NMT) architecture with a standard encoder- reconstructing Latin proto-forms from only Italian decoder setup. This effort is directly in fur- roots, they use Italian, Spanish, Portuguese, Roma- therance of the goal to use machine learning nian and French cognates of Latin, i.e., mapping tools to help under-served language communi- from many languages to one. As our intended use- ties in their efforts at reclaiming, preserving, or reconstructing their own languages. case (see section 1.1) is one that suffers from data sparsity, we explicitly explore the degree to which 1 Introduction expanding the list of sister-languages in the many- to-one mapping can compensate for fewer available Historical linguistics has long employed the his- data-points. Since the long-term goal of this project torical comparative method to establish familial is to aid language revitalization efforts, the question connections between languages and to reconstruct of available data is of utmost importance. Machine proto-forms (cf. Klein et al., 2017b; Meillet, 1967). learning often requires vast amounts of data, and More recently, the comparative method has been languages which are undergoing revitalization usu- employed by revitalization projects for lexical re- ally have very sparse amounts of data available. construction of lost lexical items (cf. Delgado et al., Hence, the goal for a machine learning approach 2019). In the particular case of Delgado et al. (2019), lost lexical items of the target language are 1Due to the nature of neural nets we do not know whether these systems actually emulate the historical comparative reconstructed by using equivalent cognates of still- method or not. What is meant here is that they were used spoken modern sister languages, i.e., languages in for the same tasks. 122 Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 122–130 June 11, 2021. ©2021 Association for Computational Linguistics here is not necessarily the highest possible accu- from the Romance example. Regardless of this, racy, but rather the ability to operate with as little some insights gained here will still be applicable in data as possible, while still retaining a reasonable those cases, such as the question of compensating amount of accuracy. lack of data by using multiple languages. Our particular contributions are: Languages that are the focus of language revital- 1. We demonstrate an approach for reframing the ization projects are typically not targets for deep historical comparative method to reconstruct a learning projects. One of the reasons for this is the target language from its sisters using a neural fact that these languages usually do not have large machine translation framework. We show that amounts of data available for training state of the this can be done with easily accessible open art neural approaches. These systems need large source frameworks such as OpenNMT (Klein amounts of data, and Neural Machine Translation et al., 2017a). systems, as the one used in this project, are no ex- ception. For example, Cho et al.(2014) use data 2. We provide a detailed analysis of the degree to sets varying between 5.5million and 348million which inputs from additional sister languages words. However, the task of proto-form reconstruc- can overcome issues of data sparsity. We find tion, which is really a task of cognate prediction, that adding more related languages allows can be achieved with fairly small datasets, if par- for higher accuracy with fewer data points. allel language input is used. This was shown by However, we also find that blindly adding lan- Meloni et al.(2019), whose system predicted 84% guages to the input stream does not always within an edit distance of 1, meaning that 84% yield said higher accuracy. The results sug- of the predictions were so accurate that only one gest that there needs to be a significant amount or 0 edits were necessary to achieve the true tar- of cognates with the added input language and get. For example, if the target output is “grazie", the target language. the machine might predict “grazia" (one edit) or 1.1 Intended Use-Case and Considerations "grazie" (0 edits). Within a language revitalization context, this level of accuracy would actually be a This experiment was designed with a specific use- very good outcome. In this scenario, a linguist or case in mind: Lexical reconstruction for language speaker familiar with the language would vet the revitalization projects. Specifically, the situation output regardless, so small edit distances should where this type of model may be most appli- not pose a big problem. Further, all members of a cable would be a language reclamation project language revitalization project or language commu- in the definition of Leonhard(2007) or a lan- nity would ultimately vet the output, as they would guage revival process in the definition of Mc- make a decision on whether to accept or reject the Carty and Nicholas(2014). In essence, a lan- output as a lexical item of the language. guage where there is some need to recover or re- construct a lexicon. An example of such a case This begs the question of why a language revital- might be the Wampanoag language reclamation ization project would want to go through the trou- project (https://www.wlrp.org/), or com- ble of using such an algorithm in the first place, if parable projects using the methods outlined in Del- they have someone available to vet the output, then gado et al.(2019). that person may as well do the reconstructive work As this is a proof-of-concept, we use the themselves, as proposed in Delgado et al.(2019). Romance language family, specifically the non- This all depends on two factors: First, how high is endangered languages of French, Spanish, Italian, the volume of lexical items that need to be recon- Portuguese and Romanian, and operate under as- structed or predicted? The effort may not be worth sumption that these results can inform how one can it for 10 or even a 100 lexical items, but beyond use this approach with other languages of interest. this an neural machine translation model can poten- However, we are aware that the Romance language tially outperform the manual labor. Once trained, morphology may be radically different from some the model can make thousands of predictions in of the languages that may be in the scope of this minutes, as long as input data is available. use case, such as agglutinative and polysynthetic Second, and potentially more important, it will languages, and that we cannot fully predict the per- depend on how well the historical phonological re- formance of this type of system for such languages lationships between the languages are understood. 123 Spanish French Portuguese Romanian Italian (target) status 1 - -esque -e:scere - - removed, no target 2 mosto moût mosto must mosto 3 - - - - lugano removed, no input 4 párrafo - - - paragrafo 5 -edad - -idade -itate -ità Table 1: Examples of data patterns, including types of data removed during cleanup (e.g., rows 1 and 3). For a family like Romance, we have a very good Continental Romance understanding of the historical genesis of the lan- guages and the different phonological processes Italo Western Romance Eastern Romance they underwent, see for example Maiden et al. Italian Western Romance Balkan Romance (2013). However, there are many language fam- ilies in the world where these relationships and West-Ibero Romance Gallo-Rhaetian Romanian histories are less than clear.
Recommended publications
  • 《International Symposium》 Let's Talk About Trees National Museum
    《International Symposium》 Let’s Talk about Trees National Museum of Ethnology, Osaka, Japan. February 10, 2013 Abstract [Austronesian Language Case Studies 2] Limitations of the tree model: A case study from North Vanuatu Siva KALYAN1 and Alexandre FRANÇOIS2 1. Northumbria University 2. CNRS/LACITO Since the beginnings of historical linguistics, the family tree has been the most widely accepted model for representing historical relations between languages. It is even being reinvigorated by current research in computational phylogenetics (e.g. Gray et al. 2009). While this sort of representation is certainly easy to grasp, and allows for a simple, attractive account of the development of a language family, the assumptions made by the tree model are applicable in only a small number of cases. The tree model is appropriate when a speaker population undergoes successive splits, with subsequent loss of contact among subgroups. For all other scenarios, it fails to provide an accurate representation of language history (cf. Durie & Ross 1996; Pawley 1999; Heggarty et al. 2010). In particular, it is unable to deal with dialect continua, as well as language families that develop out of dialect continua (for which Ross 1988 has proposed the term "linkage"). In such cases, the scopes of innovations (in other words, their isoglosses) are not nested, but rather they persistently intersect, so that any proposed tree representation is met with abundant counterexamples. Though Ross's initial observations about linkages concerned the languages of Western Melanesia, it is clear that linkages are found in many other areas as well (such as Fiji: cf. Geraghty 1983). In this presentation, we focus on the 17 languages of the Torres and Banks islands in Vanuatu, which form a linkage (Tryon 1996, François 2011), and attempt to develop adequate representations for it.
    [Show full text]
  • Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations
    Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Hang Fan, M.S. Graduate Program in Statistics The Ohio State University 2013 Dissertation Committee: Professor Laura Kubatko, Advisor Professor Bryan Carstens Professor Radu Herbei 1 Copyright by Hang Fan 2013 2 Abstract Inferring species trees given genetic data has been a challenge in the field of phylogenetics because of the high intensity during computation. In the coalescent framework, this dissertation proposes an innovative method of estimating the likelihood of a species tree directly from Single Nucleotide Polymorphism (SNP) data with a certain nucleotide substitution model. This method uses the idea of Ancestral Configurations (Wu, 2011) to avoid the computation burden brought by the enumeration of coalescent histories. Importance sampling is used to in Monte Carlo integration to approximate the expectations in the computation, where the accuracy of the approximation is tested in different tree models. The SNP data is processed beforehand which vastly boosts the efficiency of the method. Gene tree sampling given the species tree under the coalescent model is employed to make the computation feasible for large trees. Further, the branch lengths on the species tree are optimized according to the computed species tree likelihood, which provides the likelihood of the species tree topology given the SNP data. For inference, this likelihood computation method is implemented in the stepwise addition algorithm to infer the maximum likelihood species tree in the tree space given the SNP data, and simulations are conduced to test the performance.
    [Show full text]
  • EVOLUTIONARY INFERENCE: Some Basics of Phylogenetic Analyses
    EVOLUTIONARY INFERENCE: Some basics of phylogenetic analyses. Ana Rojas Mendoza CNIO-Madrid-Spain. Alfonso Valencia’s lab. Aims of this talk: • 1.To introduce relevant concepts of evolution to practice phylogenetic inference from molecular data. • 2.To introduce some of the most useful methods and computer programmes to practice phylogenetic inference. • • 3.To show some examples I’ve worked in. SOME BASICS 11--ConceptsConcepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Owen’s definition of homology Richard Owen, 1843 • Homologue: the same organ under every variety of form and function (true or essential correspondence). •Analogy: superficial or misleading similarity. SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Similarity ≠ Homology • Similarity: mathematical concept . Homology: biological concept Common Ancestry!!! SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock.
    [Show full text]
  • The Probability of Monophyly of a Sample of Gene Lineages on a Species Tree
    PAPER The probability of monophyly of a sample of gene COLLOQUIUM lineages on a species tree Rohan S. Mehtaa,1, David Bryantb, and Noah A. Rosenberga aDepartment of Biology, Stanford University, Stanford, CA 94305; and bDepartment of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand Edited by John C. Avise, University of California, Irvine, CA, and approved April 18, 2016 (received for review February 5, 2016) Monophyletic groups—groups that consist of all of the descendants loci that are reciprocally monophyletic is informative about the of a most recent common ancestor—arise naturally as a conse- time since species divergence and can assist in representing the quence of descent processes that result in meaningful distinctions level of differentiation between groups (4, 18). between organisms. Aspects of monophyly are therefore central to Many empirical investigations of genealogical phenomena have fields that examine and use genealogical descent. In particular, stud- made use of conceptual and statistical properties of monophyly ies in conservation genetics, phylogeography, population genetics, (19). Comparisons of observed monophyly levels to model pre- species delimitation, and systematics can all make use of mathemat- dictions have been used to provide information about species di- ical predictions under evolutionary models about features of mono- vergence times (20, 21). Model-based monophyly computations phyly. One important calculation, the probability that a set of gene have been used alongside DNA sequence differences between and lineages is monophyletic under a two-species neutral coalescent within proposed clades to argue for the existence of the clades model, has been used in many studies. Here, we extend this calcu- (22), and tests involving reciprocal monophyly have been used to lation for a species tree model that contains arbitrarily many species.
    [Show full text]
  • Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R
    Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R. & Slater, J. Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Robert Halm 1 Bloomington, IN, USA Jay Slater Pittsburgh, PA, USA Following up on recent work, we consider morpheme-final nasals in the Nivkh language family of northeast Asia using the Standard Comparative Method, and attempt to reconstruct the inventory and morphophonemic behavior of morpheme-final nasal phonemes in Proto-Nivkh (PN). Previous work has pointed towards PN nasals at four loci, /*m/, /*n/, /*ɲ/, /*ŋ/, of which at least /*ŋ/ could be phonemically either “strong”, triggering fricatives to surface across morpheme juncture, or “weak”, triggering plosives to surface; with weak /*ŋ/ place-assimilating to following plosives across morpheme juncture, and weak /*n/ and weak /*ŋ/ elided in the Amur and West Sakhalin lects. However, with the benefit of more and better data than were available to previous authors, we find instead that elision must have been conditioned by a feature other than the strong-weak contrast (provisionally, length), but which interacted with the strong-weak contrast (“short” strong nasals were inextant), and that this “length” contrast also conditioned assimilation or non-assimilation of final /*ŋ/ (only “short” weak /*ŋ/ assimilated, not “long” weak /*ŋ/). We confirm that the strong-weak morphophonemic contrast existed for at least /*m/, /*ɲ/, and /*ŋ/ (rather than only for /*ŋ/), and the “length” contrast for /*n/ as well as /*ŋ/. Keywords: Nivkh, Gilyak, Proto-Nivkh, nasal, Comparative Method 1 Address correspondence to: [email protected] Kansas Working Papers in Linguistics 1 Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R.
    [Show full text]
  • Linguistic History and Language Diversity in India: Views and Counterviews
    J Biosci (2019) 44:62 Indian Academy of Sciences DOI: 10.1007/s12038-019-9879-1 (0123456789().,-volV)(0123456789().,-volV) Linguistic history and language diversity in India: Views and counterviews SONAL KULKARNI-JOSHI Deccan College, Pune, India (Email, [email protected]) This paper addresses the theme of the seminar from the perspective of historical linguistics. It introduces the construct of ‘language family’ and then proceeds to a discussion of contact and the dynamics of linguistic exchange among the main language families of India over several millennia. Some prevalent hypotheses to explain the creation of India as a linguistic area are presented. The ‘substratum view’ is critically assessed. Evidence from historical linguistics in support of two dominant hypotheses –‘the Aryan migration view’ and ‘the out-of-India hypothesis’–is presented and briefly assessed. In conclusion, it is observed that the current understanding in historical linguistics favours the Aryan migration view though the ‘substratum view’ is questionable. Keywords. Aryan migration; historical linguistics; language family; Out-of-India hypothesis; substratum 1. Introduction the basis of social, political and cultural criteria more than linguistic criteria. The aim of this paper is to lend a linguistic perspective on This vast number of languages is classified into four (or the issue of human diversity and ancestry in India to the non- six) language families or genealogical types: Austro-Asiatic linguists at this seminar. The paper is an overview of the (Munda), Dravidian, Indo-Aryan (IA) and Tibeto-Burman; major views and evidences gleaned from the available more recently, two other language families have been literature.
    [Show full text]
  • Indo-European Linguistics: an Introduction Indo-European Linguistics an Introduction
    This page intentionally left blank Indo-European Linguistics The Indo-European language family comprises several hun- dred languages and dialects, including most of those spoken in Europe, and south, south-west and central Asia. Spoken by an estimated 3 billion people, it has the largest number of native speakers in the world today. This textbook provides an accessible introduction to the study of the Indo-European proto-language. It clearly sets out the methods for relating the languages to one another, presents an engaging discussion of the current debates and controversies concerning their clas- sification, and offers sample problems and suggestions for how to solve them. Complete with a comprehensive glossary, almost 100 tables in which language data and examples are clearly laid out, suggestions for further reading, discussion points and a range of exercises, this text will be an essential toolkit for all those studying historical linguistics, language typology and the Indo-European proto-language for the first time. james clackson is Senior Lecturer in the Faculty of Classics, University of Cambridge, and is Fellow and Direc- tor of Studies, Jesus College, University of Cambridge. His previous books include The Linguistic Relationship between Armenian and Greek (1994) and Indo-European Word For- mation (co-edited with Birgit Anette Olson, 2004). CAMBRIDGE TEXTBOOKS IN LINGUISTICS General editors: p. austin, j. bresnan, b. comrie, s. crain, w. dressler, c. ewen, r. lass, d. lightfoot, k. rice, i. roberts, s. romaine, n. v. smith Indo-European Linguistics An Introduction In this series: j. allwood, l.-g. anderson and o.¨ dahl Logic in Linguistics d.
    [Show full text]
  • Proto-Indo-European Roots of the Vedic Aryans
    3 (2016) Miscellaneous 1: A-V Proto-Indo-European Roots of the Vedic Aryans TRAVIS D. WEBSTER Center for Traditional Vedanta, USA © 2016 Ruhr-Universität Bochum Entangled Religions 3 (2016) ISSN 2363-6696 http://dx.doi.org/10.13154/er.v3.2016.A–V Proto-Indo-European Roots of the Vedic Aryans Proto-Indo-European Roots of the Vedic Aryans TRAVIS D. WEBSTER Center for Traditional Vedanta ABSTRACT Recent archaeological evidence and the comparative method of Indo-European historical linguistics now make it possible to reconstruct the Aryan migrations into India, two separate diffusions of which merge with elements of Harappan religion in Asko Parpola’s The Roots of Hinduism: The Early Aryans and the Indus Civilization (NY: Oxford University Press, 2015). This review of Parpola’s work emphasizes the acculturation of Rigvedic and Atharvavedic traditions as represented in the depiction of Vedic rites and worship of Indra and the Aśvins (Nāsatya). After identifying archaeological cultures prior to the breakup of Proto-Indo-European linguistic unity and demarcating the two branches of the Proto-Aryan community, the role of the Vrātyas leads back to mutual encounters with the Iranian Dāsas. KEY WORDS Asko Parpola; Aryan migrations; Vedic religion; Hinduism Introduction Despite the triumph of the world-religions paradigm from the late nineteenth century onwards, the fact remains that Indologists require more precise taxonomic nomenclature to make sense of their data. Although the Vedas are widely portrayed as the ‘Hindu scriptures’ and are indeed upheld as the sole arbiter of scriptural authority among Brahmins, for instance, the Vedic hymns actually play a very minor role in contemporary Indian religion.
    [Show full text]
  • Outer and Inner Indo-Aryan, and Northern India As an Ancient Linguistic Area
    Acta Orientalia 2016: 77, 71–132. Copyright © 2016 Printed in India – all rights reserved ACTA ORIENTALIA ISSN 0001-6483 Outer and Inner Indo-Aryan, and northern India as an ancient linguistic area Claus Peter Zoller University of Oslo Abstract The article presents a new approach to the old controversy concerning the veracity of a distinction between Outer and Inner Languages in Indo-Aryan. A number of arguments and data are presented which substantiate the reality of this distinction. This new approach combines this issue with a new interpretation of the history of Indo- Iranian and with the linguistic prehistory of northern India. Data are presented to show that prehistorical northern India was dominated by Munda/Austro-Asiatic languages. Keywords: Indo-Aryan, Indo-Iranian, Nuristani, Munda/Austro- Asiatic history and prehistory. Introduction This article gives a summary of the most important arguments contained in my forthcoming book on Outer and Inner languages before and after the arrival of Indo-Aryan in South Asia. The 72 Claus Peter Zoller traditional version of the hypothesis of Outer and Inner Indo-Aryan purports the idea that the Indo-Aryan Language immigration1 was not a singular event. Yet, even though it is known that the actual historical movements and processes in connection with this immigration were remarkably complex, the concerns of the hypothesis are not to reconstruct the details of these events but merely to show that the original non-singular immigrations have left revealing linguistic traces in the modern Indo-Aryan languages. Actually, this task is challenging enough, as the long-lasting controversy shows.2 Previous and present proponents of the hypothesis have tried to fix the difference between Outer and Inner Languages in terms of language geography (one graphical attempt as an example is shown below p.
    [Show full text]
  • Linguistics 623 TOPICS in INDIC LINGUISTICS
    Linguistics 623 TOPICS IN INDIC LINGUISTICS Instructor: Brian D. Joseph; 206 Oxley Hall (292-4981); [email protected] Office Hours: M W 9:30 - 10:15 or (preferably) by appointment Focus of Course: History of Sanskrit / Sanskrit Historical Grammar Goals: To investigate and learn about: a. the prehistory of Sanskrit b. the development of the language within its historical attestation (Vedic into Classical Sanskrit) c. the external history of the language; effects of language contact and the sociolinguistic setting in ancient India d. those aspects of the synchronic grammar of Sanskrit that receive particular illumination when viewed in the context of their historical background and development and in so doing, to further understanding of methods and practices of historical linguistics. Specific Topics To Be Covered (more or less in this order): a. basics on comparative grammar, the comparative method, and language relatedness b. Sanskrit in its Indo-European context; connections with other IE languages c. Sanskrit within Indo-Iranian d. Sanskrit within Indic; the relationship of Sanskrit with Prakrit e. Sanskrit historical phonology (viewed against its IE background): • the relationship between IE ablaut and Sanskrit vowel gradation • the historical sources of nasal strengthening • Sanskrit sandhi peculiarities viewed historically • aspiration alternations viewed historically f. Sanskrit historical morphology, especially concerning the verb, and especially: • the origin and development of the present classes • the perfect system
    [Show full text]
  • The Shape of Phylogenetic Trees (Review Paper)
    REVIEW PAPER: THE SHAPE OF PHYLOGENETIC TREESPACE KATHERINE ST. JOHN Abstract: Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and running time, and the analysis and visualization of results. We review the past work on analyzing and comparing trees by their shape as well as recent work that incorporates trees with weighted branch lengths. Keywords: tree metrics, treespace, maximum parsimony, maximum likelihood. Tree structures have long been used to represent the evolutionary histories of sets of species. For example, the tips of the trees represent extant species and the internal nodes represent speciation events. Despite its simplicity, the tree model captures much of the complexity of the underlying phenomena. However, the sheer number of possibilities forces many simply presented operations on trees to be computationally hard. For example, the maximum parsimony criteria (Farris, 1970; Fitch, 1971) that seeks the tree with the minimal number of changes across the edges is compu- tationally hard to compute exactly (Foulds and Graham, 1982). The addition of weights on the branches, to denote quantities such as amount of evolutionary change, the time, or the confidence on the existence of the branch, adds complexity to the model (Felsenstein, 1973, 1978). A pop- ular corresponding optimality criteria for weighted trees, the maximum likelihood criteria, is also computationally hard (Roch, 2006).
    [Show full text]
  • Autochthonous Aryans? the Evidence from Old Indian and Iranian Texts
    Michael Witzel Harvard University Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts. INTRODUCTION §1. Terminology § 2. Texts § 3. Dates §4. Indo-Aryans in the RV §5. Irano-Aryans in the Avesta §6. The Indo-Iranians §7. An ''Aryan'' Race? §8. Immigration §9. Remembrance of immigration §10. Linguistic and cultural acculturation THE AUTOCHTHONOUS ARYAN THEORY § 11. The ''Aryan Invasion'' and the "Out of India" theories LANGUAGE §12. Vedic, Iranian and Indo-European §13. Absence of Indian influences in Indo-Iranian §14. Date of Indo-Aryan innovations §15. Absence of retroflexes in Iranian §16. Absence of 'Indian' words in Iranian §17. Indo-European words in Indo-Iranian; Indo-European archaisms vs. Indian innovations §18. Absence of Indian influence in Mitanni Indo-Aryan Summary: Linguistics CHRONOLOGY §19. Lack of agreement of the autochthonous theory with the historical evidence: dating of kings and teachers ARCHAEOLOGY __________________________________________ Electronic Journal of Vedic Studies 7-3 (EJVS) 2001(1-115) Autochthonous Aryans? 2 §20. Archaeology and texts §21. RV and the Indus civilization: horses and chariots §22. Absence of towns in the RV §23. Absence of wheat and rice in the RV §24. RV class society and the Indus civilization §25. The Sarasvatī and dating of the RV and the Bråhmaas §26. Harappan fire rituals? §27. Cultural continuity: pottery and the Indus script VEDIC TEXTS AND SCIENCE §28. The ''astronomical code of the RV'' §29. Astronomy: the equinoxes in ŚB §30. Astronomy: Jyotia Vedåga and the
    [Show full text]