Automated Dating of the World’s Language Families Based on Lexical Similarity

Zurich Open Repository and Archive
University of Zurich, Main Library
Strickhofstrasse 39, CH-8057 Zurich
www.zora.uzh.ch

Year: 2011

Automated Dating of the World’s Language Families Based on Lexical Similarity

Holman, Eric W; Brown, Cecil H; Wichmann, Søren; Müller, André; Velupillai, Viveka; Hammarström, Harald; Sauppe, Sebastian; Jung, Hagen; Bakker, Dik; Brown, Pamela; Belyaev, Oleg; Urban, Matthias; Mailhammer, Robert; List, Johann-Mattis; Egorov, Dmitry

Abstract: This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus is more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than on cognate percentages, and (4) it provides a formula for date calculation that mathematically recognizes the lexical heterogeneity of individual languages, including parent languages just before their breakup into daughter languages. Automated judgments of lexical similarity for groups of related languages are calibrated with historical, epigraphic, and archaeological divergence dates for 52 language groups. The discrepancies between estimated and calibration dates are found to be on average 29% as large as the estimated dates themselves, a figure that does not differ significantly among language families. As a resource for further research that may require dates of known level of accuracy, we offer a list of ASJP time depths for nearly all the world’s recognized language families and for many subfamilies.
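As a rough illustration of the Levenshtein (edit) distance that the abstract contrasts with cognate percentages, the following minimal sketch counts the single-character insertions, deletions, and substitutions needed to turn one word into another. The function names, the normalization by the longer word's length, and the example words in ordinary spelling are assumptions made for this illustration; this excerpt does not specify the transcription or normalization that ASJP itself uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption, not taken from the
    source): divide the edit distance by the longer word's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Illustrative word pair in ordinary orthography.
print(levenshtein("night", "nacht"))          # 2
print(normalized_distance("night", "nacht"))  # 0.4
```

Dividing by the longer word's length keeps the score between 0 (identical) and 1 (nothing shared), so pairs of short and long words can be compared on one scale.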
DOI: https://doi.org/10.1086/662127

Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-132915
Journal Article, Published Version

Originally published at: Holman, Eric W; Brown, Cecil H; Wichmann, Søren; Müller, André; Velupillai, Viveka; Hammarström, Harald; Sauppe, Sebastian; Jung, Hagen; Bakker, Dik; Brown, Pamela; Belyaev, Oleg; Urban, Matthias; Mailhammer, Robert; List, Johann-Mattis; Egorov, Dmitry (2011). Automated Dating of the World’s Language Families Based on Lexical Similarity. Current Anthropology, 52(6):841-875.

Source: Current Anthropology, Vol. 52, No. 6 (December 2011), pp. 841-875
Published by: The University of Chicago Press on behalf of Wenner-Gren Foundation for Anthropological Research
Stable URL: http://www.jstor.org/stable/10.1086/662127
CA+ Online-Only Material: Supplement A PDF
Eric W. Holman is Professor Emeritus in the Department of Psychology at the University of California, Los Angeles (405 Hilgard Avenue, Los Angeles, California 90095-1563, U.S.A.). Cecil H. Brown is Distinguished Research Professor Emeritus at Northern Illinois University and Faculty Associate at the University of West Florida (1700 Scenic Highway, Apartment 601, Pensacola, Florida 32503, U.S.A.). Søren Wichmann is Senior Scientist in the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology (Deutscher Platz 6, D-04103 Leipzig, Germany). André Müller is a student of linguistics and Chinese studies at the University of Leipzig (Fachschaftsrat Linguistik, Geisteswissenschaftliches Zentrum, Beethovenstraße 15, D-04107 Leipzig, Germany). Viveka Velupillai is Research Fellow in the Department of English at the Justus-Liebig-Universität Gießen (Otto-Behaghel-Straße 10 B, D-35394 Gießen, Germany). Harald Hammarström is Postdoctoral Fellow in the Centre for Language Studies at Radboud Universiteit (Postbus 9103, NL-6500 HD Nijmegen, The Netherlands) and in the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology (Deutscher Platz 6, D-04103 Leipzig, Germany). Sebastian Sauppe is Research Assistant in the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology (Deutscher Platz 6, D-04103 Leipzig, Germany). Hagen Jung is Computer Scientist at [...]

© 2011 by The Wenner-Gren Foundation for Anthropological Research. All rights reserved. 0011-3204/2011/5206-0003$10.00. DOI: 10.1086/662127

The greater the degree of linguistic differentiation within a stock, the greater is the period of time that must be assumed for the development of such differentiations. (Sapir 1916:76)

Glottochronology, as formulated by Morris Swadesh (1950, 1955), is a method for estimating the amount of elapsed time since phylogenetically related languages diverged from a common ancestral language. This approach involves determining the percentage of words that are cognate in a standard list of basic vocabulary. Working with the assumption that words for items on such a list are replaced in individual languages over time at a more or less constant rate, Swadesh devised a formula, using cognate percentage as input, for calculating the length of time since a language divergence occurred.

Glottochronology has had a checkered history since its formulation some 60 years ago. An early review by Hymes (1960) was generally favorable. Later, Embleton (1986) provided a judicious summary of both positive and negative views. More recently, the pros and cons of the method were discussed in numerous chapters of a collection edited by Renfrew, McMahon, and Trask (2000). We do not intend to continue the debate on the theoretical merits and demerits of glottochronology. Instead, we describe a new approach that infers language divergence from lexical similarity without the protracted linguistic analysis required for cognate identification.

Several distinct processes can cause lexical similarity among genetically related languages to diminish with the passage of time. One process is systematic change in sounds, whereby commonly inherited words in related languages become phonologically different. Other processes involve replacement of [...]

[...] historical, epigraphic, and archaeological dates of language divergence gathered from published sources. This calibration not only facilitates estimation of dates but also allows quantitative evaluation of the accuracy of the calculated dates. The observed level of accuracy can serve as the basis for informed decisions as to how to use dates calculated by the same method for other groups.

The Automated Similarity Judgment Program Project

The present approach is developed within the Automated Similarity Judgment Program (ASJP),1 first described by Brown et al. (2008). Brown et al. also review previous research
