Review of Pagel Et Al. 2013: "Ultraconserved Words Point to Deep Language Ancestry Across Eurasia" Jaakko Häkkinen, 14Th May 2013

Review of Pagel et al. 2013: "Ultraconserved words point to deep language ancestry across Eurasia" Jaakko Häkkinen, 14th May 2013 The aim of the authors is to reach beyond the conven- *kêri 'bark'. Such cognates are not seen in the tional language families, all the way to the Ice Age Swadesh lists, because in the modern languages a superfamily from which many of the present Eurasian more common word is used instead of the old language families would have evolved. However, cognate: in the respective order Finn. rinta 'breast' their statistical method can never reliably define any instead of mälvi, Hung. lát- 'to see' instead of néz-, single concrete word to be inherited from this sup- posed "Proto-Eurasiatic" parent language. Finn. rasva 'fat' instead of kuu, Finn. hammas 'tooth' instead of pii, Finn. kaarna 'bark' instead of keri. These examples suffice to show that the mean- Words change by form and meaning ing-based Swadesh list gives as a result clearly fewer cognates than the form-based etymological It is well known that words change by their form. list, because the latter requires only formal cog- Within the language families there are many regu- nates, while the former requires both formal and lar cognates, which are beyond recognition with- semantic cognates. out historical linguistic training: the Latin word Taxonomically this means, that the meaning- rex 'king' is a regular cognate of the Hindi based Swadesh list gives much more distant rela- 'prince', and the Hungarian word egér 'mouse' is a tionship between Hungarian and Finnish than the regular cognate of the Finnish hiiri 'mouse'. It is form-based etymological list. A possible source of rather difficult to find words which would be very error arises, if the two Uralic branches in question similar by form in the remote ends of a language happen to have above the average amount of se- family. mantic shifts (changes in the meaning of the Not only form, but also meaning (the semantic word). Such an occasion could result as an erro- property of a word) can change. Etymological neous taxonomic illustration (family tree) of the cognates give a very different picture from that of language family: some other language with less "normal" Swadesh list cognates based on mean- semantic shifts or rarefications caused by newer ings. A few observations between Finnish and words might seem to be closer than a language Hungarian: with more semantic shifts or rarefications. – For the meaning 'snake' the words are not cognates: Hungarian has kígyó, Finnish has käärme. The Finnish word is a Baltic loan- word (cf. Lithuanian kirmis 'snake'), but Finnish also has a cognate for the Hungarian word: kyy 'adder' < *küji(wa) 'snake' > kígyó 'snake'. The Finnish word has gone through a specializing semantic shift. – For the meaning 'bird' the words are not cognates: Hungarian has madár, Finnish has lintu. But Hungarian also has a cognate for the Finnish word: lúd 'goose' < *lônta 'bird' > lintu 'bird'. The Hungarian word has gone through a specializing semantic shift. Temporally the situation means, that the common There are also other words in the Swadesh lists, protolanguage behind Hungarian and Finnish is which have cognates both in Hungarian and Finn- erroneously assessed much older than it actually ish but which have undergone a semantic shift or is. The meaning-based Swadeshian lexicostatistics become rarefied by newer words in one of the simply ignores many cognates existing in the languages: for example Proto-Uralic *mälki modern languages, even though all the cognates 'breast', *näki- 'to see', *kuji 'fat', *piŋi 'tooth', should definitely be taken into account when as- 1 sessing the age of the split between the related The authors of the article do not follow this order languages. Therefore the best basis for any lexi- – they totally skip the second item and subcontract costatistic or glottochronological study is the the first item to their source database. But if they form-based etymological list. cannot prove the true cognateness of the words, There are also other problems in lexicostatis- their claims are baseless: there is absolutely no tics and glottochronology, concerning the many way how high frequency and thus high age of distorting processes affecting the lexical level. the words could testify the words to be of the Because of these processes the lexical level can same origin! never be reliable at its own, but the phonological Without the correct procedure there are three level should always be applied, as it is the most equally possible options: reliable level for finding the true taxonomic struc- 1. Words are not related (they resemble each ture of a language family. More about this subject other by chance). can be found in my recent article "After the proto- 2. Words reflect ancient contacts (borrowed language: Invisible convergence, false divergence from one language to another). and boundary shift" in Finnisch-Ugrische 3. Words are true cognates (inherited from the Forschungen 61 (Häkkinen 2012b). I also men- common protolanguage). tion the main points in my draft "Problems in the method and interpretations of the computational phylogenetics based on linguistic data – An example of wishful thinking: Bouckaert et al. 2012" http://www.elisanet.fi/alkupera/Problems_of_phyl ogenetics.pdf (Häkkinen 2012a). The etymological procedure – or lack of it All these kind of words can be old and in frequent Pagel et al. state: use, so the method of Pagel et al. (2013) cannot "We have recently extended this result to include distinguish between these options. How much speakers from the Uralic, Sino-Tibetan, Niger-Congo, they try, they cannot avoid the problem of the lack Altaic, and Austronesian families, in addition to Indo- of proper etymological study of the words con- European, plus the isolate Basque and the Creole Tok cerned. The authors rely on the etymological da- Pisin. Even in languages as widely divergent as these, we found that a measure of the average frequency of tabase of the Eurasiatic languages, but a glimpse use predicted rates of lexical replacement as estimated is enough to reveal that this database does not in the Indo-European languages." contain reliable etymological data, either: This is a very interesting and promising result. – It applies rather vague semantic criteria (like Yet it is important to understand what can and so many long-range comparisons), making it what cannot be deduced from this. The main prob- all the more probable that there are many lem with their method is the lack of normal ety- chance resemblances included. mological procedure, which is so elementary in – It does not systematically present regular historical linguistics. There is a certain procedure sound correspondences and does not exclude how they should proceed with their method, when words if they do not fit into the regular pat- trying to prove the words to be cognates: tern. Also the many perils of omnicompara- 1. Search for the formal cognates with similar tivism should be pointed out, but this is not enough meaning. the place for that. 2. Try to distinguish the regular inherited cog- – It contains many loanwords (mainly due to nates from the mutual borrowings by defin- the previous point), like Uralic *teki- 'to do' ing sound correspondences. from Indo-European *dheh-, presented erro- 3. See if the words fulfill the criterion for being neously as cognates. It is highly improbable, frequent in use. that the words which are as similar between 4. Propose that a word in the present language Proto-Uralic and Proto-Indo-European as the families is inherited from the common current "Indo-Uralic" words, could be inher- Eurasiatic superprotolanguage. ited from their common protolanguage. It is impossible to explain how the languages could have developed so different from each other and at the same time not have gone 2 through many major sound changes – espe- The authors seem to confuse stability (no formal cially when we see that all Indo-European shift) with constancy (no formal change). and Uralic daughter languages/branches have 1. Semantic shift (new meaning replaces the gone through many differentiating sound old meaning) changes. 2. Formal shift (new word replaces the old word) The database merely collects similar-looking 3. Formal change (the old word goes through words with similar-looking meanings together, but sound changes) this is not a scientific result yet – it is only a max- imal corpus of data, which still requires a lot of The authors can thus exclude the first two pro- etymological work with the help of historical cesses but not the third one. Concerning their se- phonology. Before this work is done, we cannot cond point, there is no way to find out whether the tell true cognates, loanwords and chance re- "ultraconserved" words (conserved only in the semblances apart from each other. sense of semantic shift and formal shift) represent So, neither the authors nor their source data- chance resemblance, old borrowings or true cog- base possess any reliable etymological corpus. nates (see afore). The authors write: Therefore their method even theoretically could "We use this framework to predict words likely to not prove any distant affinity between the Eura- be shared among the Altaic, Chukchi-Kamchatkan - -, sian protolanguages. Considering the three possi- Dravidian, Eskimo (hereafter referred to as Inuit- bilities above, their method has merely a chance Yupik), Indo-European, Kartvelian, and Uralic lan- of guessing to get it right: there is a probability of guage families. These seven language families are hypothesized to form an ancient Eurasiatic superfamily 33,3 % with every single word to be inherited, that may have arisen from a common ancestor over 15 borrowed or chance resemblance. kya, and whose languages are now spoken over all of the Eurasiatic words found with the method of Eurasia." Pagel et al.

Load more