Summer School: Düsseldorf, July – August 2002

Approaches to Language Change
April McMahon, University of Sheffield

Session 7 Reversing Language Change: Reconstruction and Classification

1. Historical linguists are interested in change, reconstruction, and classification.
- how can we demonstrate and represent genetic relationship among languages?

2. Let’s start with representation, and worry later about how we establish relatedness in the first place.

Most commonly: family tree diagrams showing relationships of descent and differentiation.
- August Schleicher, 1861.
- 'borrowing' from biology (although the biological tree began as a method of showing only static, synchronic relatedness, within a Creationist model, pre-Darwin).

3. Trees are convenient diagrammatic representations. They are based on the assumption that a restructuring of the system in some dialect of the ancestor language leads to a split into daughter languages:

Bengali   Hindi
jib       ji:bh    'tongue'
din       din      'day'
dur       du:r     'distant'
sona      sun-     'hear'

- the original vowel length contrast persists in Hindi but has been lost in Bengali.

4. Information of this kind can allow us to subgroup languages within a family:

Bengali   Hindi    Oriya    Assamese
jib       ji:bh    jibh     zibha
din       din      din      din
dur       du:r     dur      dur
sona      sun-     sun-     sun

- Hindi alone retains the length distinction in vowels. This guides us as to where to set up nodes within the family tree for the Indic languages.

5. Subgrouping is based on shared innovations: all languages in a subgroup share some change or feature NOT shared by other related languages descended from the same protolanguage.
• this can be problematic if common, very plausible changes are involved, since these could conceivably be innovated in all the relevant daughters independently.
• so, subgrouping hypotheses are stronger when based on shared aberrancies, which are less likely to reflect borrowing or chance.
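The grouping step can be sketched in a few lines of Python. This is only an illustration, using the forms from the Indic table in item 4 and assuming that loss of vowel length (written here with ":") is the candidate innovation; the function names are hypothetical. The sketch only finds which languages share the feature. It cannot tell a genuinely shared innovation from independent parallel change, which is exactly the caveat raised above.

```python
# Forms from the Indic table in item 4; ":" marks a long vowel.
FORMS = {
    "Bengali":  ["jib", "din", "dur", "sona"],
    "Hindi":    ["ji:bh", "din", "du:r", "sun-"],
    "Oriya":    ["jibh", "din", "dur", "sun-"],
    "Assamese": ["zibha", "din", "dur", "sun"],
}


def retains_length(forms):
    """True if at least one form still shows a long vowel (written with ':')."""
    return any(":" in form for form in forms)


def split_by_length(forms_by_language):
    """Partition languages into those that keep vowel length and those that lost it."""
    retained = sorted(
        lang for lang, forms in forms_by_language.items() if retains_length(forms)
    )
    lost = sorted(
        lang for lang, forms in forms_by_language.items() if not retains_length(forms)
    )
    return retained, lost


retained, lost = split_by_length(FORMS)
print("retain length:", retained)
print("lost length:  ", lost)
```

On this toy data Hindi stands alone, and Assamese, Bengali and Oriya fall together as candidates for a subgroup.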

6.
English   German   French
good      gut      bon
better    besser   meilleur
best      best-    le meilleur

- English and German don't just share suppletion, but very similar suppletive forms with repeated correspondences.

7. Problems with the family tree model:
• it is idealised. Trees rarely if ever show the 'lowest' linguistic variants, i.e. dialects, and therefore go against the trend in modern historical linguistics to study variation and change in progress.
• the splits shown in a family tree look immediate and carry the implication that no further contact takes place between speakers of the languages concerned. But we know that change is often gradual, and that contact between speakers of languages, related or not, is very common.
• how can the results of contact be shown? How do we show borrowing without the trees involved seeming to 'join up'? How can different degrees of borrowing be shown? And what about pidgins and creoles?
• the general outline of a family may be clear but the subgrouping unclear (e.g. Indo-European on the one hand, Balto-Slavic, Italo-Celtic, Indo-Hittite on the other); or vice versa.
• the number of possible trees per set of languages can be immense. For two languages, there are three possible trees (if the languages are related: A is the mother of B, B is the mother of A, or they are sisters). For 3 languages, there are 6 possible trees. For 4, there are 15 (assuming none of the languages is the ancestor of any of the others). And so on. So how do we know that our tree is the right one? (See below).
• the configuration of the tree might lead us to believe that the protolanguage at each stage is uniform, without dialect variation. This is obviously unrealistic.
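To get a feel for how fast the number of trees grows, here is a rough Python sketch. It assumes rooted trees in which every split is binary and no language is the ancestor of another; under those assumptions the count for n languages is the product of the odd numbers up to 2n - 3. The function name is hypothetical. The sketch reproduces the figure of 15 for four languages; figures for smaller sets depend on what else is allowed (for instance one language being the mother of another), so they need not match this formula.

```python
def rooted_binary_trees(n_languages: int) -> int:
    """Count rooted, strictly binary trees over n labelled languages.

    Assumes no language is ancestral to another and every split is binary,
    which gives 1 * 3 * 5 * ... * (2n - 3).
    """
    if n_languages < 2:
        raise ValueError("need at least two languages")
    count = 1
    for odd in range(3, 2 * n_languages - 2, 2):  # 3, 5, ..., 2n - 3
        count *= odd
    return count


for n in (2, 3, 4, 5, 10, 20):
    print(n, rooted_binary_trees(n))
```

Even ten languages already allow tens of millions of trees under these assumptions, which is why tree selection cannot be done by inspection.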

8. What happens if changes due to contact are not recognised as such?
- take the Tai languages (the best known are Thai and Laotian). These are a branch of the Austro-Tai subfamily of the Austric family (other branches are Miao-Yao, Austro-Asiatic...).
- for some time, Austro-Tai was thought to be related to Chinese and therefore Sino-Tibetan, because both have tone, are predominantly monosyllabic, and there are some apparent cognates.
- it's now recognised that the presence of such general phonological features is more useful for typological than genetic classification.
- the apparent cognates don't hold when other Sino-Tibetan languages are included. They result from early borrowing between Chinese and Tai.
- Austro-Tai is now seen as a subfamily of Austronesian, not Sino-Tibetan. Clearly, this affects the family trees of both families.

9. But such revisions can only be made if:
• we have enough evidence of the external history to know whether, and to what extent, contact was likely.
• we have a good idea of the usual pattern of change within families.
• we can rule out a history that would make a language 'non-genetic'.
• we have enough evidence to look thoroughly at both basic and non-basic vocabulary across the languages concerned.

10. The wave model: Schmidt 1872.
- models the spread of linguistic innovations, which may cross language and dialect boundaries.
- notation takes the form of an isogloss map.
- related to the idea that changes, particularly sound changes, diffuse: that is, they begin in particular areas, often those of relative sociopolitical importance (focal areas), and spread out into transition areas, which may be affected by innovations from more than one source. Certain socially or geographically isolated areas may remain unaffected (relic areas).

11. Problems with the wave model:
- the diagrams can be extremely hard to read.
- they represent only variation at a particular synchronic point, and do not have a straightforward diachronic dimension.
- they cover only geographically adjacent areas, but particularly in the present century, borrowings can take place between non-adjacent languages.

12. Dixon (1997): borrowing the concept of punctuated equilibrium from biology.

Language change may follow a more or less treelike pattern at different stages. For long periods, there will be equilibrium, during which most change will be contact-induced. But there will then periodically (for language-external reasons) be punctuations, during which languages diversify and split.

13. Demonstrating relatedness: the Comparative Method.

English   Latin    German   Kannada
mouse     mus      Maus     ili
father    pater    Vater    appa
three     tres     drei     muru
fish      piscis   Fisch    minu

14. The Comparative Method requires forms which are similar in sound and meaning in the different languages under comparison. If sets of ‘matching’ words show regular correspondences across the lexicon, including the basic vocabulary (that is, if all or at least most cases of English [f] in a particular environment correspond to German [f] and Latin [p]), we assume that the items are cognate and that they, and the languages, come from a single common ancestor. We can then reconstruct the most likely ancestral form in the common ancestor of these daughters – that is, the protolanguage.
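The idea of a regular correspondence can be made concrete with a small tally. The Python sketch below is illustrative only: it takes the English, Latin and German words from item 13, reduces each to its initial consonant in a broad transcription (a simplifying assumption, as are the function names), and counts how often each correspondence set recurs. Real comparative work looks at every segment in every environment, across far more vocabulary.

```python
from collections import Counter

# (gloss, initial consonant in English, Latin, German), broad transcription.
# Words are those of item 13; reducing them to initials is an assumption.
COGNATE_SETS = [
    ("mouse",  "m", "m", "m"),   # mouse : mus : Maus
    ("father", "f", "p", "f"),   # father : pater : Vater (German <v> is [f])
    ("three",  "θ", "t", "d"),   # three : tres : drei
    ("fish",   "f", "p", "f"),   # fish : piscis : Fisch
]


def tally_correspondences(cognate_sets):
    """Count how often each (English, Latin, German) segment set occurs."""
    return Counter(tuple(segments) for _gloss, *segments in cognate_sets)


def recurrent(cognate_sets, minimum=2):
    """Keep only correspondence sets attested at least `minimum` times."""
    tally = tally_correspondences(cognate_sets)
    return {corr: n for corr, n in tally.items() if n >= minimum}


print(recurrent(COGNATE_SETS))
```

On these four words only English f : Latin p : German f recurs; a single match such as m : m : m proves nothing by itself, which is why recurrence across the lexicon matters.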

Proto-Indo-European   *owis
Lithuanian            awis
Greek                 ois
Luwian                hawi
Sanskrit              avis
Latin                 ovis
English               ewe
Old Irish             oi

15. The Comparative Method seems to work! There have been some notable successes using the Method: take, for instance, the decipherment of Linear B. A series of labio-velar stops */kwh kw gw/ is generally reconstructed for PIE. Reflexes of /kw/ were Latin /kw/, Old English /hw/, Sanskrit /k/ or /s/ - but Greek /p/ before back vowels and /t/ before front vowels. There was no evidence from Greek for the labiovelars. But in Linear B there is a symbol, transliterated q, in the right environments:

          Greek      Linear B   Latin      Sanskrit
'four'    tettares   qetr-      quattuor   catur
'and'     -te        -qe        -que       ca
'horse'   hippos     i-qo       equus      asvas

16. Although this illustration is from Indo-European, the Comparative Method can be, and has been, applied to other families. It has also been used in ONE type of longer-range comparison. There is an increasing trend in language classification and reconstruction to attempt to connect larger groups of languages. Construction of superfamilies, megafamilies, macrofamilies, phyla... Phylum linguistics. Long-range comparison. Ultimately, assumptions of monogenesis, postulation of 'global etymologies', reconstruction of Proto-World.

17. Holger Pedersen, early C20th (1903). Proposed hypothetical superfamily Nostratic (< Latin nostras, 'our countrymen'). Nostratic = IE, Finno-Ugric, Altaic, Caucasian, Dravidian, Afro-Asiatic. Later, Pedersen argued that all extant languages are descended from a single common ancestor. Nostratic now: Dolgopolsky, Manaster-Ramer, Bomhard.

18. 'Nostratic approach' to superfamilies: involves Comparative Method. Reconstruct protolanguages for each family, then compare these to reconstruct the more distant common ancestor. Extra stage of comparison.

19. Problems:
- 'reaching down' - incomplete reconstruction for individual member families. For instance, Afro-Asiatic (formerly known as Hamito-Semitic) is a large family, with partly unclear classification; Proto-Afro-Asiatic has therefore not been fully reconstructed. Semitic is the main branch to have been investigated / reconstructed: therefore PIE is very frequently compared with Proto-Semitic. Is it reasonable to assume that conclusions drawn on this basis will hold for a comparison of IE with the whole of AA?
- does comparing general factors like grammatical gender, affixation patterns etc. risk confusing typology with genetics?
- what about problems with time-depth and the Comparative Method?

20. How good are the lexical correspondences?
a. Proto-Semitic *brʔ 'to bear, bring forth, create': PIE *ber- 'to bear, carry, bring forth'.
   Evidence: Hebrew bara' 'shape, create'; Aramaic bəra' 'create', bar 'son'; Egyptian bry 'young'; Sanskrit bharati 'bear'; Gothic bairan 'bear, carry, bring forth', barn 'child'.
b. Proto-Afroasiatic *mw- 'water': PIE *meu- 'to flow, be wet, damp'.
   Evidence: Arabic maʔ 'water'; Sanskrit mutram 'urine'; Lithuanian maudyti 'to bathe'.

21. Away from e.g. Indo-European, where not so much research has taken place and there is not so much early written evidence to supplement and verify reconstructions, other methods of comparison are being proposed. For instance, Joseph Greenberg's method of mass comparison allows him to propose 3 language families in the New World.
- there are perhaps 20-30 families in the world as a whole, plus various language isolates, but previous scholarship in the Americas had suggested a much larger number of language groups, perhaps as many as 200 (although such work, based rigidly on the painstaking Comparative Method, must be seen as work in progress, which will ultimately lead to larger, higher-level groupings).

22. Classification in the New World: lumpers vs. splitters. 'Mr. John P. Harrington announces that he has found genetic relationship between Washoe and Chumashan.' (American Anthropologist, 1917)

23.
Eskimo-Aleut - c.10 languages - Inuit, Yupik...
Na-Dene - c.34 languages - Athabaskan, incl. Navajo...; Haida?; Eyak; Tlingit
Amerind - over 500 languages
    - Macro-Ge
    - Macro-Panoan
    - Macro-Carib
    - Equatorial
    - Macro-Tucanoan
    - Andean
    - Chibchan-Paezan
    - Central Amerind
    - Hokan
    - Penutian
    - Almosan-Keresiouan

- Eskimo-Aleut and Na-Dene are small and uncontroversial families. Amerind is huge, with over 500 languages and 11 substocks; a macrofamily or phylum.

24. Greenberg's method of comparison does NOT involve finding recurrent correspondences of sound and meaning, or reconstructing protoforms. Instead, he collected (either first hand or from secondary sources) lists of words from any available languages, and identified those which are roughly similar in form and meaning as evidence for relatedness. He grouped these under semantic headings and refers to them as 'etymologies'.

25. To say that Greenberg's methodology is controversial would be a wild understatement.
- most linguists do not reject mass comparison, but see it as a 'quick and dirty' first test to see if it is worth applying the more rigorous, time-consuming, and perhaps even testable Comparative Method. Greenberg argues that it can stand alone, and that careful 'bottom-up' comparison and reconstruction is unnecessary.
- it is unclear to what extent Greenberg's work would be repeatable. His notebooks and word lists are available, but in some cases he has retranscribed forms, and it is not always clear what sources he has used. There are no clear constraints on which or how many forms he cites; or on the degree of phonetic or semantic similarity which can be permitted within a single 'etymology'.
- e.g. in etymology 11, 'ashes', Greenberg includes forms pram, bri-, mreje, e-timu, poša-ci, poggo, mukne, puxusu, among others. He claims it is easy to group languages according to such forms: 'given pan - fan - ezuk, who would hesitate?' But what about e.g. pan - fan - vin? Or er - erku - duo (the words for 'two' in Mandarin, Armenian, and Latin)?
- e.g. some etymologies contain a wide range of meanings; take black / green / grass / blue / night / excrement, or body / belly / heart / skin / meat / be greasy / fat / deer, or feather / hair / wing / leaf.
- there are numerous errors in Greenberg's data. Goddard claims that, of 142 etymologies containing Algonquian data, errors invalidate 93. These include wrong transcriptions; loan words included; forms listed that belong to other languages; language names being given which do not exist. Greenberg claims that these errors will simply cancel one another out, and that his method is so robust that mistakes are irrelevant. It is possible, however, that errors will compound.
- the chance of wrongly identifying an accidental resemblance as evidence of relatedness is very high, especially since there is no condition requiring Greenberg to include data from a particular number of systems for the results to be seen as valid. So, resemblant forms from any two languages will constitute proof of relationship for the two substocks of which they are part. And because relatedness is taken to be transitive, if languages A and B are related, and so are B and C, then so are A and C.
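The effect of treating relatedness as transitive can be shown with a short Python sketch. It is a toy model under stated assumptions: the language labels A to E and the list of 'resemblance links' are invented for illustration, and the grouping uses a standard union-find structure, not anything taken from Greenberg's own procedure.

```python
def find(parent, x):
    """Return the representative of x's group, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_by_transitive_links(languages, links):
    """Merge languages into groups, treating every link as transitive."""
    parent = {lang: lang for lang in languages}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for lang in languages:
        groups.setdefault(find(parent, lang), set()).add(lang)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical languages and resemblance links, for illustration only.
LANGUAGES = ["A", "B", "C", "D", "E"]
LINKS = [("A", "B"), ("B", "C"), ("D", "E")]

print(group_by_transitive_links(LANGUAGES, LINKS))
# One extra (possibly accidental) resemblance between C and D:
print(group_by_transitive_links(LANGUAGES, LINKS + [("C", "D")]))
```

With the original links there are two groups; a single additional link between C and D collapses everything into one. If that link is a chance resemblance, the whole merged grouping rests on it.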

26. On this basis, Campbell has demonstrated that Finnish is probably Amerind.

-n '1st person'.   Finnish -n 'I', mene-n 'go-I', -ni 'my'
-s '2nd person'.   Finnish -si 'your (sg.)', talo-si 'house-your'
'be able, strong, know, male'
    Am. tuman, temma, ndom, etamax
    F.  tomera 'strong, staunch'; tuima 'strong'
'anger, angry, be angry'
    Am. akaʔa, ʔika
    F.  aka- 'anger, be angry'
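Why is it so easy to find Finnish 'matches'? A back-of-envelope Python sketch may help. It assumes each comparison is independent and has the same small probability of producing a chance resemblance; both the independence assumption and every number used are purely illustrative, not estimates from the literature, and the function name is hypothetical. The point is only how quickly the chance of at least one spurious match grows when many languages and a broad range of meanings are admitted.

```python
def chance_of_spurious_match(p_single, n_languages, n_meanings):
    """Probability of at least one accidental match.

    Assumes n_languages * n_meanings independent comparisons, each with
    probability p_single of yielding a chance resemblance.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    trials = n_languages * n_meanings
    return 1 - (1 - p_single) ** trials


# Illustrative numbers only: a 1% chance per comparison.
for languages, meanings in [(1, 1), (10, 1), (10, 4), (500, 4)]:
    p = chance_of_spurious_match(0.01, languages, meanings)
    print(f"{languages:>3} languages x {meanings} meanings: {p:.3f}")
```

Under these assumptions a match against one language with one meaning is rare, but a match somewhere among 500 languages and several permitted meanings is close to certain.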

27. Eurasiatic: Indo-European, Uralic-Yukaghir, Altaic, Korean, Japanese, Ainu, Gilyak, Chukchi-Kamchatkan, Eskimo-Aleut.

28. Global etymologies?

29. If we believe this kind of comparison is too lax, how can we refute it? We may say that the Comparative Method is preferable, but how do we prove it? Greenberg invokes evidence from archaeology and genetics to support his three-way classification in the Americas: can we blame colleagues from other disciplines for being attracted to these big ideas and bold hypotheses? All this reduces to the fact that in some ways the Comparative Method seems to be out of step with current trends elsewhere in historical linguistics, where there is a tendency to prioritise quantitative methods. The Comparative Method tends to be presented as more of an art than a science: first know ‘your’ languages inside out, then you will get a feeling for how (or whether) they are related.

30. Increasingly there are moves to introduce quantification into language comparison.
- Ringe and co-workers: computational cladistics. Heavily based on the Comparative Method. Involves using a computer algorithm to search for the ‘perfect phylogeny’, the tree most consistent with the linguistic data used. Character-based: that is, we build in our knowledge of the most salient features for the language groupings concerned. These may be lexical (mostly), phonological, morphological.
- ‘Quantitative Methods in Language Classification’ – project funded by the Arts and Humanities Research Board and hosted by the University of Sheffield. Includes my own work with Robert McMahon: using tree-drawing and tree-selection programs from biology to calculate which is the best of all the trees which could possibly be drawn to show the relationships between a set of languages. We can apply these programs to very familiar data indicating simply whether two forms are cognate or not; or….
- Paul Heggarty is developing a method of measuring phonetic similarity. This is based on an objective assessment of phonetic distance, at the moment on an articulatory basis. This can be used directly for a segment-to-segment comparison; or it may be applied to whole words in two different dialects or languages, in which case the two are compared through a reconstruction of the common ancestor, in order to determine which segment should be compared with which.
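As a hedged illustration of what 'data indicating simply whether two forms are cognate or not' might look like as input, the Python sketch below codes the three meanings from item 6 (good, better, best) into cognate classes and turns them into pairwise distances. The coding, the distance measure (proportion of meanings with non-cognate forms) and the function names are all assumptions made for this example; the tree-drawing and tree-selection programs mentioned above are far more sophisticated and are not reproduced here.

```python
from itertools import combinations

# Cognate classes per meaning: same number = judged cognate.
# Coding is an assumption based on the table in item 6.
COGNATE_CLASSES = {
    "English": {"good": 1, "better": 1, "best": 1},
    "German":  {"good": 1, "better": 1, "best": 1},
    "French":  {"good": 2, "better": 2, "best": 2},
}


def distance(coding_a, coding_b):
    """Proportion of shared meanings whose forms are NOT cognate."""
    shared = coding_a.keys() & coding_b.keys()
    if not shared:
        raise ValueError("no meanings in common")
    differing = sum(1 for meaning in shared if coding_a[meaning] != coding_b[meaning])
    return differing / len(shared)


def distance_matrix(coding):
    """Pairwise distances for every pair of languages."""
    return {
        (x, y): distance(coding[x], coding[y])
        for x, y in combinations(sorted(coding), 2)
    }


matrix = distance_matrix(COGNATE_CLASSES)
closest = min(matrix, key=matrix.get)
print(matrix)
print("closest pair:", closest)
```

A matrix of this kind is the sort of input a distance-based tree program would start from; with three meanings it is only a toy, and real applications use standard meaning lists of a hundred items or more.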

Campbell, Lyle (1988) Review of Greenberg (1987). Language 64: 591-615.
Dixon, R.M.W. (1997) The Rise and Fall of Languages. CUP.
Durie, Mark and Malcolm Ross (eds.) (1996) The Comparative Method Reviewed. OUP.
Fox, Anthony (1995) Linguistic Reconstruction. OUP. Ch.9.
Greenberg, Joseph H. (1987) Language in the Americas. Stanford University Press.
*Greenberg, Joseph, Christy G. Turner II and Stephen L. Zegura (1986) 'The settlement of the Americas: a comparison of the linguistic, dental, and genetic evidence.' Current Anthropology 27: 477-97.
*Heggarty, Paul (2000) ‘Quantifying change over time in phonetics.’ In Colin Renfrew, April McMahon and Larry Trask (eds.) Time Depth in Historical Linguistics Vol.2: 531-62. Cambridge: McDonald Institute for Archaeological Research.
Lass, Roger (1997) Historical Linguistics and Language Change. CUP.
*Matisoff, James (1990) 'On megalocomparison.' Language 66: 106-20.
*McMahon, April and Robert McMahon (1995) 'Linguistics, genetics and archaeology: internal and external evidence in the Amerind controversy.' Transactions of the Philological Society 93.2: 125-225.
McMahon, April and Robert McMahon (in press) ‘Finding families: quantitative methods in language classification.’ To appear in Transactions of the Philological Society 101 (2003).
Nettle, Daniel (1999) Language Diversity. OUP.
*Ringe, Don, Tandy Warnow and Ann Taylor (2002) ‘Indo-European and computational cladistics.’ Transactions of the Philological Society 100: 59-129.