Reconstructing a Lexicon from Sister Languages Using Neural Machine Translation
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
《International Symposium》 Let's Talk About Trees National Museum
《International Symposium》 Let’s Talk about Trees National Museum of Ethnology, Osaka, Japan. February 10, 2013 Abstract [Austronesian Language Case Studies 2] Limitations of the tree model: A case study from North Vanuatu Siva KALYAN1 and Alexandre FRANÇOIS2 1. Northumbria University 2. CNRS/LACITO Since the beginnings of historical linguistics, the family tree has been the most widely accepted model for representing historical relations between languages. It is even being reinvigorated by current research in computational phylogenetics (e.g. Gray et al. 2009). While this sort of representation is certainly easy to grasp, and allows for a simple, attractive account of the development of a language family, the assumptions made by the tree model are applicable in only a small number of cases. The tree model is appropriate when a speaker population undergoes successive splits, with subsequent loss of contact among subgroups. For all other scenarios, it fails to provide an accurate representation of language history (cf. Durie & Ross 1996; Pawley 1999; Heggarty et al. 2010). In particular, it is unable to deal with dialect continua, as well as language families that develop out of dialect continua (for which Ross 1988 has proposed the term "linkage"). In such cases, the scopes of innovations (in other words, their isoglosses) are not nested, but rather they persistently intersect, so that any proposed tree representation is met with abundant counterexamples. Though Ross's initial observations about linkages concerned the languages of Western Melanesia, it is clear that linkages are found in many other areas as well (such as Fiji: cf. Geraghty 1983). In this presentation, we focus on the 17 languages of the Torres and Banks islands in Vanuatu, which form a linkage (Tryon 1996, François 2011), and attempt to develop adequate representations for it. -
Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations
Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Hang Fan, M.S. Graduate Program in Statistics The Ohio State University 2013 Dissertation Committee: Professor Laura Kubatko, Advisor Professor Bryan Carstens Professor Radu Herbei 1 Copyright by Hang Fan 2013 2 Abstract Inferring species trees given genetic data has been a challenge in the field of phylogenetics because of the high intensity during computation. In the coalescent framework, this dissertation proposes an innovative method of estimating the likelihood of a species tree directly from Single Nucleotide Polymorphism (SNP) data with a certain nucleotide substitution model. This method uses the idea of Ancestral Configurations (Wu, 2011) to avoid the computation burden brought by the enumeration of coalescent histories. Importance sampling is used to in Monte Carlo integration to approximate the expectations in the computation, where the accuracy of the approximation is tested in different tree models. The SNP data is processed beforehand which vastly boosts the efficiency of the method. Gene tree sampling given the species tree under the coalescent model is employed to make the computation feasible for large trees. Further, the branch lengths on the species tree are optimized according to the computed species tree likelihood, which provides the likelihood of the species tree topology given the SNP data. For inference, this likelihood computation method is implemented in the stepwise addition algorithm to infer the maximum likelihood species tree in the tree space given the SNP data, and simulations are conduced to test the performance. -
EVOLUTIONARY INFERENCE: Some Basics of Phylogenetic Analyses
EVOLUTIONARY INFERENCE: Some basics of phylogenetic analyses. Ana Rojas Mendoza CNIO-Madrid-Spain. Alfonso Valencia’s lab. Aims of this talk: • 1.To introduce relevant concepts of evolution to practice phylogenetic inference from molecular data. • 2.To introduce some of the most useful methods and computer programmes to practice phylogenetic inference. • • 3.To show some examples I’ve worked in. SOME BASICS 11--ConceptsConcepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Owen’s definition of homology Richard Owen, 1843 • Homologue: the same organ under every variety of form and function (true or essential correspondence). •Analogy: superficial or misleading similarity. SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Similarity ≠ Homology • Similarity: mathematical concept . Homology: biological concept Common Ancestry!!! SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. -
The Probability of Monophyly of a Sample of Gene Lineages on a Species Tree
PAPER The probability of monophyly of a sample of gene COLLOQUIUM lineages on a species tree Rohan S. Mehtaa,1, David Bryantb, and Noah A. Rosenberga aDepartment of Biology, Stanford University, Stanford, CA 94305; and bDepartment of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand Edited by John C. Avise, University of California, Irvine, CA, and approved April 18, 2016 (received for review February 5, 2016) Monophyletic groups—groups that consist of all of the descendants loci that are reciprocally monophyletic is informative about the of a most recent common ancestor—arise naturally as a conse- time since species divergence and can assist in representing the quence of descent processes that result in meaningful distinctions level of differentiation between groups (4, 18). between organisms. Aspects of monophyly are therefore central to Many empirical investigations of genealogical phenomena have fields that examine and use genealogical descent. In particular, stud- made use of conceptual and statistical properties of monophyly ies in conservation genetics, phylogeography, population genetics, (19). Comparisons of observed monophyly levels to model pre- species delimitation, and systematics can all make use of mathemat- dictions have been used to provide information about species di- ical predictions under evolutionary models about features of mono- vergence times (20, 21). Model-based monophyly computations phyly. One important calculation, the probability that a set of gene have been used alongside DNA sequence differences between and lineages is monophyletic under a two-species neutral coalescent within proposed clades to argue for the existence of the clades model, has been used in many studies. Here, we extend this calcu- (22), and tests involving reciprocal monophyly have been used to lation for a species tree model that contains arbitrarily many species. -
Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R
Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R. & Slater, J. Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Robert Halm 1 Bloomington, IN, USA Jay Slater Pittsburgh, PA, USA Following up on recent work, we consider morpheme-final nasals in the Nivkh language family of northeast Asia using the Standard Comparative Method, and attempt to reconstruct the inventory and morphophonemic behavior of morpheme-final nasal phonemes in Proto-Nivkh (PN). Previous work has pointed towards PN nasals at four loci, /*m/, /*n/, /*ɲ/, /*ŋ/, of which at least /*ŋ/ could be phonemically either “strong”, triggering fricatives to surface across morpheme juncture, or “weak”, triggering plosives to surface; with weak /*ŋ/ place-assimilating to following plosives across morpheme juncture, and weak /*n/ and weak /*ŋ/ elided in the Amur and West Sakhalin lects. However, with the benefit of more and better data than were available to previous authors, we find instead that elision must have been conditioned by a feature other than the strong-weak contrast (provisionally, length), but which interacted with the strong-weak contrast (“short” strong nasals were inextant), and that this “length” contrast also conditioned assimilation or non-assimilation of final /*ŋ/ (only “short” weak /*ŋ/ assimilated, not “long” weak /*ŋ/). We confirm that the strong-weak morphophonemic contrast existed for at least /*m/, /*ɲ/, and /*ŋ/ (rather than only for /*ŋ/), and the “length” contrast for /*n/ as well as /*ŋ/. Keywords: Nivkh, Gilyak, Proto-Nivkh, nasal, Comparative Method 1 Address correspondence to: [email protected] Kansas Working Papers in Linguistics 1 Application of the Comparative Method to Morpheme-Final Nasals in Nivkh Halm, R. -
Linguistic History and Language Diversity in India: Views and Counterviews
J Biosci (2019) 44:62 Indian Academy of Sciences DOI: 10.1007/s12038-019-9879-1 (0123456789().,-volV)(0123456789().,-volV) Linguistic history and language diversity in India: Views and counterviews SONAL KULKARNI-JOSHI Deccan College, Pune, India (Email, [email protected]) This paper addresses the theme of the seminar from the perspective of historical linguistics. It introduces the construct of ‘language family’ and then proceeds to a discussion of contact and the dynamics of linguistic exchange among the main language families of India over several millennia. Some prevalent hypotheses to explain the creation of India as a linguistic area are presented. The ‘substratum view’ is critically assessed. Evidence from historical linguistics in support of two dominant hypotheses –‘the Aryan migration view’ and ‘the out-of-India hypothesis’–is presented and briefly assessed. In conclusion, it is observed that the current understanding in historical linguistics favours the Aryan migration view though the ‘substratum view’ is questionable. Keywords. Aryan migration; historical linguistics; language family; Out-of-India hypothesis; substratum 1. Introduction the basis of social, political and cultural criteria more than linguistic criteria. The aim of this paper is to lend a linguistic perspective on This vast number of languages is classified into four (or the issue of human diversity and ancestry in India to the non- six) language families or genealogical types: Austro-Asiatic linguists at this seminar. The paper is an overview of the (Munda), Dravidian, Indo-Aryan (IA) and Tibeto-Burman; major views and evidences gleaned from the available more recently, two other language families have been literature. -
Indo-European Linguistics: an Introduction Indo-European Linguistics an Introduction
This page intentionally left blank Indo-European Linguistics The Indo-European language family comprises several hun- dred languages and dialects, including most of those spoken in Europe, and south, south-west and central Asia. Spoken by an estimated 3 billion people, it has the largest number of native speakers in the world today. This textbook provides an accessible introduction to the study of the Indo-European proto-language. It clearly sets out the methods for relating the languages to one another, presents an engaging discussion of the current debates and controversies concerning their clas- sification, and offers sample problems and suggestions for how to solve them. Complete with a comprehensive glossary, almost 100 tables in which language data and examples are clearly laid out, suggestions for further reading, discussion points and a range of exercises, this text will be an essential toolkit for all those studying historical linguistics, language typology and the Indo-European proto-language for the first time. james clackson is Senior Lecturer in the Faculty of Classics, University of Cambridge, and is Fellow and Direc- tor of Studies, Jesus College, University of Cambridge. His previous books include The Linguistic Relationship between Armenian and Greek (1994) and Indo-European Word For- mation (co-edited with Birgit Anette Olson, 2004). CAMBRIDGE TEXTBOOKS IN LINGUISTICS General editors: p. austin, j. bresnan, b. comrie, s. crain, w. dressler, c. ewen, r. lass, d. lightfoot, k. rice, i. roberts, s. romaine, n. v. smith Indo-European Linguistics An Introduction In this series: j. allwood, l.-g. anderson and o.¨ dahl Logic in Linguistics d. -
Proto-Indo-European Roots of the Vedic Aryans
3 (2016) Miscellaneous 1: A-V Proto-Indo-European Roots of the Vedic Aryans TRAVIS D. WEBSTER Center for Traditional Vedanta, USA © 2016 Ruhr-Universität Bochum Entangled Religions 3 (2016) ISSN 2363-6696 http://dx.doi.org/10.13154/er.v3.2016.A–V Proto-Indo-European Roots of the Vedic Aryans Proto-Indo-European Roots of the Vedic Aryans TRAVIS D. WEBSTER Center for Traditional Vedanta ABSTRACT Recent archaeological evidence and the comparative method of Indo-European historical linguistics now make it possible to reconstruct the Aryan migrations into India, two separate diffusions of which merge with elements of Harappan religion in Asko Parpola’s The Roots of Hinduism: The Early Aryans and the Indus Civilization (NY: Oxford University Press, 2015). This review of Parpola’s work emphasizes the acculturation of Rigvedic and Atharvavedic traditions as represented in the depiction of Vedic rites and worship of Indra and the Aśvins (Nāsatya). After identifying archaeological cultures prior to the breakup of Proto-Indo-European linguistic unity and demarcating the two branches of the Proto-Aryan community, the role of the Vrātyas leads back to mutual encounters with the Iranian Dāsas. KEY WORDS Asko Parpola; Aryan migrations; Vedic religion; Hinduism Introduction Despite the triumph of the world-religions paradigm from the late nineteenth century onwards, the fact remains that Indologists require more precise taxonomic nomenclature to make sense of their data. Although the Vedas are widely portrayed as the ‘Hindu scriptures’ and are indeed upheld as the sole arbiter of scriptural authority among Brahmins, for instance, the Vedic hymns actually play a very minor role in contemporary Indian religion. -
Outer and Inner Indo-Aryan, and Northern India As an Ancient Linguistic Area
Acta Orientalia 2016: 77, 71–132. Copyright © 2016 Printed in India – all rights reserved ACTA ORIENTALIA ISSN 0001-6483 Outer and Inner Indo-Aryan, and northern India as an ancient linguistic area Claus Peter Zoller University of Oslo Abstract The article presents a new approach to the old controversy concerning the veracity of a distinction between Outer and Inner Languages in Indo-Aryan. A number of arguments and data are presented which substantiate the reality of this distinction. This new approach combines this issue with a new interpretation of the history of Indo- Iranian and with the linguistic prehistory of northern India. Data are presented to show that prehistorical northern India was dominated by Munda/Austro-Asiatic languages. Keywords: Indo-Aryan, Indo-Iranian, Nuristani, Munda/Austro- Asiatic history and prehistory. Introduction This article gives a summary of the most important arguments contained in my forthcoming book on Outer and Inner languages before and after the arrival of Indo-Aryan in South Asia. The 72 Claus Peter Zoller traditional version of the hypothesis of Outer and Inner Indo-Aryan purports the idea that the Indo-Aryan Language immigration1 was not a singular event. Yet, even though it is known that the actual historical movements and processes in connection with this immigration were remarkably complex, the concerns of the hypothesis are not to reconstruct the details of these events but merely to show that the original non-singular immigrations have left revealing linguistic traces in the modern Indo-Aryan languages. Actually, this task is challenging enough, as the long-lasting controversy shows.2 Previous and present proponents of the hypothesis have tried to fix the difference between Outer and Inner Languages in terms of language geography (one graphical attempt as an example is shown below p. -
Linguistics 623 TOPICS in INDIC LINGUISTICS
Linguistics 623 TOPICS IN INDIC LINGUISTICS Instructor: Brian D. Joseph; 206 Oxley Hall (292-4981); [email protected] Office Hours: M W 9:30 - 10:15 or (preferably) by appointment Focus of Course: History of Sanskrit / Sanskrit Historical Grammar Goals: To investigate and learn about: a. the prehistory of Sanskrit b. the development of the language within its historical attestation (Vedic into Classical Sanskrit) c. the external history of the language; effects of language contact and the sociolinguistic setting in ancient India d. those aspects of the synchronic grammar of Sanskrit that receive particular illumination when viewed in the context of their historical background and development and in so doing, to further understanding of methods and practices of historical linguistics. Specific Topics To Be Covered (more or less in this order): a. basics on comparative grammar, the comparative method, and language relatedness b. Sanskrit in its Indo-European context; connections with other IE languages c. Sanskrit within Indo-Iranian d. Sanskrit within Indic; the relationship of Sanskrit with Prakrit e. Sanskrit historical phonology (viewed against its IE background): • the relationship between IE ablaut and Sanskrit vowel gradation • the historical sources of nasal strengthening • Sanskrit sandhi peculiarities viewed historically • aspiration alternations viewed historically f. Sanskrit historical morphology, especially concerning the verb, and especially: • the origin and development of the present classes • the perfect system -
The Shape of Phylogenetic Trees (Review Paper)
REVIEW PAPER: THE SHAPE OF PHYLOGENETIC TREESPACE KATHERINE ST. JOHN Abstract: Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and running time, and the analysis and visualization of results. We review the past work on analyzing and comparing trees by their shape as well as recent work that incorporates trees with weighted branch lengths. Keywords: tree metrics, treespace, maximum parsimony, maximum likelihood. Tree structures have long been used to represent the evolutionary histories of sets of species. For example, the tips of the trees represent extant species and the internal nodes represent speciation events. Despite its simplicity, the tree model captures much of the complexity of the underlying phenomena. However, the sheer number of possibilities forces many simply presented operations on trees to be computationally hard. For example, the maximum parsimony criteria (Farris, 1970; Fitch, 1971) that seeks the tree with the minimal number of changes across the edges is compu- tationally hard to compute exactly (Foulds and Graham, 1982). The addition of weights on the branches, to denote quantities such as amount of evolutionary change, the time, or the confidence on the existence of the branch, adds complexity to the model (Felsenstein, 1973, 1978). A pop- ular corresponding optimality criteria for weighted trees, the maximum likelihood criteria, is also computationally hard (Roch, 2006). -
Autochthonous Aryans? the Evidence from Old Indian and Iranian Texts
Michael Witzel Harvard University Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts. INTRODUCTION §1. Terminology § 2. Texts § 3. Dates §4. Indo-Aryans in the RV §5. Irano-Aryans in the Avesta §6. The Indo-Iranians §7. An ''Aryan'' Race? §8. Immigration §9. Remembrance of immigration §10. Linguistic and cultural acculturation THE AUTOCHTHONOUS ARYAN THEORY § 11. The ''Aryan Invasion'' and the "Out of India" theories LANGUAGE §12. Vedic, Iranian and Indo-European §13. Absence of Indian influences in Indo-Iranian §14. Date of Indo-Aryan innovations §15. Absence of retroflexes in Iranian §16. Absence of 'Indian' words in Iranian §17. Indo-European words in Indo-Iranian; Indo-European archaisms vs. Indian innovations §18. Absence of Indian influence in Mitanni Indo-Aryan Summary: Linguistics CHRONOLOGY §19. Lack of agreement of the autochthonous theory with the historical evidence: dating of kings and teachers ARCHAEOLOGY __________________________________________ Electronic Journal of Vedic Studies 7-3 (EJVS) 2001(1-115) Autochthonous Aryans? 2 §20. Archaeology and texts §21. RV and the Indus civilization: horses and chariots §22. Absence of towns in the RV §23. Absence of wheat and rice in the RV §24. RV class society and the Indus civilization §25. The Sarasvatī and dating of the RV and the Bråhmaas §26. Harappan fire rituals? §27. Cultural continuity: pottery and the Indus script VEDIC TEXTS AND SCIENCE §28. The ''astronomical code of the RV'' §29. Astronomy: the equinoxes in ŚB §30. Astronomy: Jyotia Vedåga and the