
Cladistic and reticulate processes in language change and diversification

Sarah Grey Thomason University of Michigan January 2002

1. Introduction. This paper explores some of the ways in which linguistic evidence can contribute to the effort to discover and understand the interweaving of linguistic, cultural, and genetic change. I will begin by discussing the historical linguist’s concept of language families, focusing on the problem of ‘synchronizing the clocks’ (John Moore, this volume) that represent different dating techniques in linguistics, archaeology, and genetics (§2). Section 3 surveys the most extreme outcomes of language contact and attempts to provide answers, from a linguist’s viewpoint, to two questions posed by William Durham (this volume). First, can one predict when ethnogenesis will come about through cladistic development and when it will come about via amalgamation, or reticulate development? And second, a question about tempo and mode: do linguistic change and diversification happen gradually, through ‘insensibly fine gradations’, or abruptly via punctuated equilibrium? Section 4 then addresses (very tentatively) the prospects for using linguistic evidence in conjunction with evidence from other branches of anthropology, especially genetics, to achieve a unified picture of population origins. Throughout the paper, most of my examples will be drawn from the New World.

My main conclusion will be that cladistic development is by far the most common route to language genesis, with reticulate origin a distant second: when different languages come into contact, amalgamation of the languages does sometimes occur, but it rarely results in a stable mixed language that becomes a community’s primary medium of communication. The reasons for the low probability of amalgamation are social, not linguistic. A related conclusion is that punctuated equilibrium, though it could be invoked frequently (if rather trivially) in describing specific linguistic changes, is probably not an important factor in the split of a parent language into two or more daughter languages to form a language family.
Language splits probably proceed much more often by insensibly fine gradations. (But here the ‘probably’ hedge is needed, for reasons I’ll explain later on.)

2. Estimating time depths for language families. In the elaborate biological metaphor used by historical linguistics since the nineteenth century, a language family consists of a single parent language and a set of one or more descendent languages, which are changed later forms of the earlier parent language. The parent language is called a proto-language if it is unattested (literally, prehistoric), and its descendants are its daughter languages. The members of a language family, parent and daughters, are said to be genetically related. Crucially, parent and daughter languages do not co-exist as contemporary living languages. That is, while it is certainly possible for a parent language to live on in written or even limited spoken form after its split into two or more daughter languages, it will not continue as the major language of any speech community, and it is unlikely to be learned as a first language by children. Latin is the most famous example of a parent language that survived its historical split into several daughter languages (the modern Romance languages): it continued to be used as a (or the) major language of European diplomacy long after its descendants replaced it as major languages of speech communities. Sanskrit is another example—a more typical one, because its continuing cultural importance was due to its status as the language of a major religion.

Language families vary greatly in size and complexity. At one extreme, a few of the world’s language families are enormous and complex, with hundreds of member languages.
The two most spectacular examples are Niger-Congo in sub-Saharan Africa and Austronesian in Oceania and parts of southeast Asia, but the Indo-European family, with far fewer languages but with worldwide distribution (thanks primarily to colonialism), is of even greater importance in the international ecology of languages. At the other extreme, a non-trivial number of languages, e.g. Basque in western Europe and Burushaski in Pakistan, constitute one-member language families, as far as we can tell (at least at present): they are certainly descended from earlier parent languages, but their respective parents apparently underwent no splits, each developing instead in a straight-line fashion into a single isolated modern language. In other words, language split is not inevitable: whether or not splits will occur is a function of non-linguistic factors such as the amount of territory occupied by a language’s speakers. Complexity in a language family is represented by branching in the metaphorical family tree, with subgroups clustering together on separate branches and often branching in their turn into sub-subgroups, and so on.

Historical linguists reconstruct cladistic (tree-forming) processes of language split and diversification by means of a powerful methodology known simply as the Comparative Method. This method enables us to establish the fact of linguistic relationship by showing systematic sound/meaning correspondences in a particular set of languages, correspondences that are too numerous and too interlocking to be plausibly assigned to the operation of mere coincidence. The method also permits the reconstruction of parts of the proto-language’s lexical, phonological, and morphological structure, and to a lesser extent its syntactic structure; in fact, most historical linguists would insist that establishing genetic relationship requires showing that reconstruction is possible in all the major components of a language’s structure.

Distinguishing the results of reticulate processes from the results of cladistic processes can be problematic in cases of languages that, if related at all, are only very distantly related. At shallower time depths (say, 8,000-10,000 years), the Comparative Method makes it possible to identify borrowed material that has been incorporated into one or more of the languages being compared. An occasional loanword or borrowed syntactic pattern may certainly escape detection. But if a language with two or more fairly closely related sister languages has borrowed extensively from another language, the borrowed items will be flagged as unanalyzable residue by the Comparative Method. The only exception to this generalization would be a case of extensive borrowing from a closely-related language; in such cases, the foreign material could in fact disrupt application of the Comparative Method—though that fact itself would be flagged by the methodology.
Several such cases have been proposed from different parts of the world, especially in the Pacific, but the jury is still out on whether those instances actually do prevent successful application of the Comparative Method.

The 8,000-10,000-year time depth is very, very rough and tentative, and this brings us to the problem of determining how long ago two or more related languages split off from their common parent. First the bad news: the only sure way of establishing the time depth for any language family is to have the initial split, and (if any) all subsequent splits as well, documented in datable texts. This would require dated documents written or inscribed in the parent language and in all the daughter languages. Unfortunately, such cases are vanishingly rare. There’s Latin, which (though not precisely in its most widely preserved written form) is the ancestor of the Romance languages; there’s literary Old English, which is close to the ancestor of most of the Modern English dialects (there are a few kinks in the line from Old English to Modern English); there’s Classical Arabic, which is sort of the ancestor of the Modern Arabic dialects; there’s Sanskrit, which is close to the ancestor of the Modern Indic languages; and there are a very few others in the Old World and perhaps one or two in the New World. Chinese has the oldest continuous literary tradition of all the world’s living languages—the earliest known Chinese writing dates from the Shang dynasty (1766-1123? BCE)—but even aside from the fact that the ancient writing system is very conservative and tends to conceal dialect divergence, the emergence of the several different Modern Chinese languages was fairly recent. Egyptian, now dead, also had a very long written tradition (3200 BCE-1700 CE), but it remained a unitary tradition (no splits and thus only one daughter language at each stage), so it’s useless for the investigation of documented language splits.

In practice, almost all of our precise information about time depths comes from a few branches of the large Indo-European (IE) family. Of all the hundreds of language families in the world, IE is by far the best understood historically, partly because it has received by far the most attention from historical linguists for the past century and a half and partly because it is so well documented, with attestations from several of its ten branches dating from before the turn of the common era.

Now the good news, or at least the semi-good news: historical linguists do have ways of compensating for the distressing paucity of real-time data on language splits. The first step, always, is to work intensively on a group of related languages (after demonstrating that they are related) to get a feel for how closely related they are.
Basic vocabulary can be compared and lexical closeness can be calculated. Grammatical closeness can only be judged impressionistically, but a historical linguist with solid training and experience in Indo-European can provide reasonable support for an estimate of grammatical closeness in another language family by comparison to one or more branches of Indo-European. In other words, Indo-European is usually used as the rough measuring stick for estimating closeness of relationship. Calculating the lexical closeness of related languages by means of the familiar method of glottochronology can help to support a time-depth estimate if and only if the mathematical dating fits well with the educated guess of a good historical linguist who knows the languages’ structures well and has carried out a significant amount of comparative research on the family. Historical linguists do not consider glottochronology to be a useful method, by itself, for estimating time depths in language families.

Estimates can also be based in part on observed amounts of internal linguistic change over datable time periods; here, as with the use of Indo-European as a measuring stick for estimating closeness, the assumption is that rates of internally-motivated change will not vary widely from one family to another. As far as we know, this assumption is valid: it seems to fit all the data we have knowledge of, not just for IE but also for a wide variety of other languages. (And note that the branches and sub-branches of IE have been developing independently since the break-up of Proto-Indo-European [PIE], so they count as independent witnesses to expected rates of divergence.) Or, to put it more cautiously, no one has presented any solid evidence either for strikingly small or for strikingly large amounts of internally-motivated change in any language over a period of five hundred years or more. In sharp contrast, no such assumption is valid for externally-motivated change, i.e. change caused by contact with other languages: in contact situations, all bets about rates of change are off. In principle, a language with no foreign contacts of any sort might be expected to change more slowly than languages with foreign contacts. But we do not have even a single promising example of a language that has developed without any foreign contacts whatsoever, either in regions where human populations are ancient or in more recently populated territories like the Americas. Finally, all available historical evidence about who was where when, and in contact with whom, is used in estimating time depths for language families.
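
The lexical calculation mentioned above can be made concrete with a minimal sketch. It assumes the classic glottochronological formula usually credited to Swadesh and Lees, t = ln c / (2 ln r), where c is the share of basic-vocabulary meanings still expressed by cognate words in two sister languages and r is the retention rate per millennium. The value r = 0.86 (the figure conventionally quoted for the 100-item list), the function names, and the sample judgments are illustrative assumptions, not figures from this paper.

```python
import math

# Assumed retention rate per millennium (conventionally quoted for the
# 100-item basic vocabulary list); not a figure taken from this paper.
RETENTION_PER_MILLENNIUM = 0.86


def cognate_share(judgments):
    """Fraction of compared meanings judged cognate (True) in both languages."""
    if not judgments:
        raise ValueError("need at least one compared meaning")
    return sum(1 for judged_cognate in judgments if judged_cognate) / len(judgments)


def time_depth_years(share, retention=RETENTION_PER_MILLENNIUM):
    """Estimated years since the split: t = ln(c) / (2 ln(r)) millennia."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return 1000 * math.log(share) / (2 * math.log(retention))


if __name__ == "__main__":
    # Hypothetical judgments for a 100-item list: 74 cognate pairs, 26 not.
    judgments = [True] * 74 + [False] * 26
    share = cognate_share(judgments)
    print(f"shared cognates: {share:.0%}")
    print(f"estimated split: about {time_depth_years(share):.0f} years ago")
```

Because the estimate grows as c shrinks, anything that removes cognates faster than the assumed rate yields a deeper date, and anything that preserves them yields a shallower one. The arithmetic itself cannot tell an ordinary case from an exceptional one.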
Making use of all these kinds of evidence to come up with time-depth estimates for different language families is tricky. For some (but not all) well-established language families, different experts may have wildly different estimates, which indicates at the least that the criteria are not precise. For many, and possibly most, language families, there’s only one real specialist, so no disagreements arise; but that of course doesn’t mean that the sole expert is necessarily right. The problems with estimates of grammatical closeness are obvious—impressionistic judgments on such points will often differ—so I won’t comment further on those. Some discussion of why the lexicostatistical method of glottochronology is unreliable by itself is necessary, however, since it has remained popular in anthropological circles even though most linguists gave up on it about four decades ago.

Glottochronology, which was inspired by the carbon-14 method of dating biological material, is based on the premises that the rate of replacement in the basic, or core, vocabulary is constant across languages, and that there is a set of basic lexical meanings that are found in all languages and relatively resistant both to internal change and to borrowing (see the works of Morris Swadesh, e.g. 1950). The formula was worked out and then tested on cases where the dates of divergence are known, and the results were generally good. But glottochronology lost almost all its adherents among linguists when various scholars identified several types of exceptions and pointed out that the method isn’t much use if you have no way of knowing whether a case is exceptional when you apply it. (Robert Lees, one of the early contributors to the development of the methodology, summed up the problem neatly in a 1969 lecture: glottochronology is mathematically elegant and unexceptionable, he said; the only trouble with it is that it doesn’t apply reliably to real-life languages.) A number of scholars have tried to modify glottochronological techniques to achieve greater reliability, but these modifications have not (or at least not yet) been widely accepted by historical linguists (see Embleton 1992 for discussion).

One major skewing factor is lexical borrowing. It’s true that this is likely to affect basic vocabulary less profoundly than culturally loaded vocabulary—speech communities are more likely to borrow a word for pasta or church than a word for water—but borrowings can certainly penetrate into the basic vocabulary too.
In English, for instance, about 7% of the basic vocabulary items in a standard 200-word list are loanwords, some from French and some from Old Norse. If one of two emergent daughter languages in a family borrows numerous words into the basic vocabulary, a glottochronological measure of its relatedness to its sister language will give too great a time depth for the parent language.

Any other cultural factor that causes unusually fast lexical replacement will also skew the results of glottochronology, increasing the estimated time depth for a family. An early example in the anti-glottochronology literature was West Greenlandic Eskimo, whose speakers had a taboo system that required finding new words to replace any words that resembled the name of a newly-deceased person. If two or more Eskimo languages had similar taboos, the calculated time depth for the Eskimoan (sub-)family would increase dramatically. But the skewed result would also conflict with the evidence from a grammatical comparison of the languages, so it would become clear that one of two things had happened: either the rate of lexical replacement was unusually fast, or there was grammatical convergence after the initial language split, resulting in grammatical correspondences that were not inherited from the common parent language. Grammatical convergence can sometimes be ruled out on the ground that the different groups live far from each other, and in addition a knowledgeable expert will know about the cultural name taboo. Name taboos and lexical borrowing are not the only possible sources of rapid vocabulary turnover; another is lexical change made deliberately to create more distance from a neighboring group, as has been reported for certain non-Austronesian languages of New Guinea. In all such cases, application of the historical linguist’s standard Comparative Method will reveal the discrepancy and also (except in the most unfavorable instances; see §3 below) enable the investigator to determine which parts of the languages are inherited from their parent language.

Finally, Modern Icelandic has been cited as a case of unusually slow lexical replacement, so slow that glottochronology gives a too-shallow time estimate for its split from the other North Germanic languages. For this language we know the exact starting point for the split: Iceland was first settled by Norse speakers in 874 CE, so that’s when Icelandic-to-be could begin to diverge from the rest of Old Norse. But a glottochronological calculation using Standard Icelandic data gives a shallower time depth.
What happened here was that, in standardizing the language, Icelanders deliberately archaized it, harking back to the glory days of the medieval Eddas; it’s as if someone in charge of Standard English were to fix on the language of Chaucer (1340?-1400) as the proper norm and reintroduce those words used by Chaucer that have long since been replaced by other words in Modern English. This skewing factor might be unique to standard literary languages, and if it is, it might always be easy to detect; but maybe not, because we might know less about the historical circumstances, both for the language itself and for its relatives, than we do in the Icelandic case. Again, impressionistic judgments of grammatical closeness will (with luck) conflict with the lexical evidence and thus warn us of the existence of a problem—provided that the archaizing efforts were confined to the vocabulary.

For all these reasons, glottochronology is not reliable by itself as a measure of closeness of linguistic relatedness. But since expert judgments of grammatical closeness and observations of amounts of change over particular time periods can and should also be used in arriving at time-depth estimates, glottochronology may sometimes be useful, especially for subgrouping, if one is alert for signs of skewing.

Below, to provide a glimpse of the kinds of estimates historical linguists make about time depths, is an annotated list of a few language families and their proposed ages. The details of this list are less important than the range of estimates and the information on which they are based; I’ll comment on these points for particular families.

Indo-European. The split of Proto-Indo-European into its nine or ten independent branches is estimated to have occurred about 5500 years ago. This estimate is based on extrapolation from the dates of the oldest attested languages—especially Hittite (ca. 1600-1200 BCE), Greek (from ca. 1400 BCE), and Sanskrit (a few words from 1300 BCE, hymns from ca. 800 BCE). The rather conservative estimated break-up date, ca. 3500 BCE, allows 2000+ years for the wide-ranging lexical and grammatical changes that differentiate these languages at their earliest appearance in the documentary record.

The proto-languages for several of the branches and sub-branches can be dated more precisely. The split of Vulgar Latin into the Romance languages, for instance, took place 1500-2000 years ago, on the evidence of written Latin (especially scribal errors) and the history of Roman settlement in what is now Romance-speaking territory. The earliest texts in a Romance language are the Strasbourg Oaths (842 CE), written in Old French, and specialists conclude that several hundred years of divergence must have preceded the first written Romance. Proto-Slavic was also last spoken about 1500 years ago; the earliest extant Slavic texts, 11th-century copies of ninth- or tenth-century documents in Old Church Slavic, are—though clearly South Slavic—linguistically so close to late Proto-Slavic that divergence probably hadn’t proceeded more than a few hundred years by the time of earliest attestation. This estimate fits with what we know of the movements of Slavic-speaking peoples. Proto-Germanic probably broke up about 2500 years ago. The earliest extensive texts are in Gothic, the only member of the now-extinct East Germanic sub-branch, dating from about 350 CE, and there are runic inscriptions in Early Norse, a North Germanic language, from the third century CE; these two sets of materials show significant differences, extensive enough that their development must have required some centuries of divergence.

Note that in all three of these estimates of time of divergence within IE branches there is a gap of 1000 years or less between the earliest attestations and the estimated break-up of the parent language into two or more daughter languages. It is calculations like these that lead linguists to give a general range of 500-1000 years for language split under appropriate external circumstances, i.e. when subgroups of an original unified speech community become partially or entirely separated, and when no dramatic borrowing events speed up the process of divergence into daughter languages.

Turning now to the New World, let’s look at some estimates for time depths in well-established families of Native American languages. By ‘well-established’ I mean that all specialists agree on the classification; I do not include any groupings that are still not universally accepted, since of course time-depth estimates are meaningless for groups comprising languages that can’t be shown to be changed later forms of a single parent language. So, for instance, Hokan (western North America) is not included here, but if it is a valid genetic grouping, its time depth is surely much greater than any of the ones reported here. The list below is not exhaustive; time depth estimates can be found in the literature for other New World families.
The estimates here, however, include most of the ones I’ve seen that I consider reasonably reliable: the people making the estimates are good historical linguists with enough background both in the particular New World families and in Indo-European to be able to exercise good comparative judgments against the known IE cases, and they are also knowledgeable about the languages’ structures and lexicons and about the speakers’ known external history.

I have not given the groups different labels according to the distance of the relationships. That is, I use the term ‘family’ for the most comprehensive genetic groupings, at whatever time depths; some historical linguists use terms like ‘phylum’ and ‘stock’ to label genetic groupings with greater time depths and reserve ‘family’ for genetic groupings with shallower time depths. My source for many of the families listed below is Terrence Kaufman, who has unparalleled expertise in most language groups of Meso-America, and who distinguishes stocks (older) from families (younger) according to time depth. The reason I use the term ‘language family’ for both categories is that the dividing line between the two proposed categories is so fuzzy that the distinction is (it seems to me) of dubious usefulness. Of the groups listed below, the first two (Chibchan and Otomanguean) are stocks in Kaufman’s classification; the others he has estimates for belong in his family category. The groups in the list are arranged according to time depth rather than location. Time depths are given in years.

Chibchan (s. Meso-America, n. South America): 6700 (Kaufman, p.c.).

Otomanguean (Meso-America): 6000 (Kaufman 1988; see also Campbell 1997:159).

Arawakan (South America): 5000-5500 (?) (Kaufman, p.c.; but he says that this is an impression only, not based on thorough comparison).

Uto-Aztecan (North America): 4800 (Kaufman, p.c.).

Mayan (Meso-America): 4100 (Kaufman, p.c.).

Siouan-Catawban (North America): 4000; Siouan: 3000 (Rankin 1993, cited in Campbell 1997:142).

Salishan (North America): 4000 (M. Dale Kinkade, p.c.; Kinkade has also commented that the Interior Salishan branch of the family shows ‘perhaps less structural diversity than is found among western Germanic languages’ [1991:148, quoted in Campbell 1997:118]).

Eskimo-Aleut (circumpolar): 4000; Eskimoan: 2000 (Woodbury 1984:61).

Chapakuran (South America): 3700 (Kaufman, p.c.).

Iroquoian (North America): 3500-3800 (Mithun 1979:157).

Mixe-Zoquean (Meso-America): 3500 (Kaufman, p.c.).

Athabaskan-Eyak (North America): at least 3000 (Krauss 1979:846).

Wakashan (North America): 2900 (William Jacobsen). Jacobsen’s comment on this time estimate is a good illustration of the common use of IE as a reference point: ‘Sapir...compared Nootkan and Kwakiutlan [the two branches of the family] to...Slavic and Latin, and Russian and German, which is certainly too great a difference. Swadesh...drew the analogy with English and Scandinavian, which may be too shallow. His later calculation...as 29 centuries seems plausible’ (1979:769). The first two IE pairs—Slavic and Latin, Russian and German—have no connection closer than Proto-Indo-European, ca. 5500 years ago; the third pair, English and Scandinavian, belong to different branches of Germanic, so they began diverging ca. 2500 years ago. Jacobsen’s (and Swadesh’s) estimate of 2900 years is thus based explicitly on the IE analogy.

Totonacan (Meso-America): 2600 (Kaufman, p.c.).

Algonquian (North America): 2500-3000 (Goddard 1978b:586); Eastern Algonquian: 2000 (Goddard 1978a:70).

Muskogean (North America): ca. 2500. (This estimate is based on Lyle Campbell’s comment [p.c. 1998] that the diversity in this group is about the same as Germanic and much less than in Siouan.)

Chumashan (North America): 1000 (Lyle Campbell, p.c. 1998).

Two points in particular should be noted about the implications of such a list of families and dates. First, two facts: all the dates in this list are relatively shallow (only two are greater than the estimated date for PIE, and most are considerably shallower); and we have no good estimates at all for a sizable number of New World families, especially in South America. Both of these facts result from a reluctance on the part of most historical linguists to speculate beyond the dates we can be reasonably sure of—that is, comparisons to which reasonably precise dates can be attached, by analogy to IE branches especially—and beyond the limits of our very inexact estimating methods. Greater confidence about shallower time depths is hardly surprising. The lack of good estimates for many South American groups in part reflects the absence of good comparative information. For many proposed groupings there is simply no solid evidence of relationship, and even when it is fairly clear that a relationship exists, often no linguist knows the languages well enough to carry out a serious comparison. Comparing wordlists alone will not provide solid evidence; if there is no grammatical comparison to test the lexical comparison against, there is no way to tell whether this is an ordinary case or an unusual one, or even whether we are dealing with a language descended by normal transmission or a language created by mixing two or more other languages (see §3). In any case, the relatively shallow time depths on this list should not be taken as evidence that there are no reliable dates available for more distant relationships. But I don’t believe there are many, if any, for New World languages. In fact, the dates that are typically given as an estimated upper bound on the limits of the Comparative Method—8,000 or 10,000 years—are generous extrapolations from the families whose dates we feel relatively confident about, primarily Indo-European.
If I’m right about this, one implication for systematic cross-disciplinary comparison stands out: there is quite a bit of linguistic evidence that can be compared with ethnological and fairly recent archaeological evidence of cultural change and ethnogenesis, but direct historical evidence from the standard methods of historical linguistics is likely to be of limited usefulness for comparisons with results from archaeology and genetics at time depths greater than 8,000 years. Of course other methods have been proposed to overcome these probable time limitations, but so far none of them has received widespread support from experts in language change.

This brings us to the question of long-range comparison, or macro-comparison. In the past few decades, proposals of very large and very old language families have received a great deal of publicity, in particular in the works of the late Joseph H. Greenberg (e.g. 1987), but also through a group of Russian and other linguists, known as Nostraticists because they originally focused mainly on proposed distant relations between Indo-European and other families (see especially Illič-Svityč 1971, 1976). In the following paragraphs I’ll concentrate on Greenberg’s better-known proposals; analogous problems also arise with Russian and other claims of very distant linguistic relationships, though details vary considerably from one approach to another.

Long-rangers, as they sometimes call themselves, reject the cautious historical linguists’ estimate of 8,000-10,000 years as a probable upper limit on the possibility of establishing genetic relationships among languages. Greenberg, in particular, argues that there is in principle no upper limit; in his 1987 book he referred to the possibility that his research would culminate in a proposal for Proto-Sapiens, the ancestor of all ancient and modern human languages (p. 000). But Greenberg’s results are almost universally rejected by historical linguists—not because all his proposed genetic groupings are implausible (many of them may well be historically correct), but because there is no way of finding out whether or not they are correct. That is, his hypotheses are untestable on the basis of currently available information, and the most distant ones are almost surely permanently untestable.

The most obvious reason for this state of affairs is that the evidence for genetic relationship of languages (unlike the evidence for non-metaphorical genetic links) decays over time, so that after some thousands of years a set of proposed sister languages will retain too little detectable inherited material to support a hypothesis of relationship.
There may well be superficial similarities between pairs of words in two sister languages; but there won’t be enough systematic phonetic correspondence in those word pairs to rule out chance as a reason for the similarities. There probably also won’t be enough information available to rule out borrowing as a source of the similarities.

To understand why this is so, we need to look at the cornerstone of the Comparative Method: the regularity hypothesis of sound change. According to this hypothesis, if a sound x changes to another sound y in one word, x will change to y in similar positions in every word in the language. It’s easy to find exceptions to this rule, for reasons that can often be specified in particular cases; but overall, in a great many languages belonging to a wide variety of language families all over the world, it has proved to be close enough to historical truth to provide solid evidence for genetic relationship of languages.

In applying the Comparative Method, historical linguists compare words with similar meanings (especially basic vocabulary) in a search for recurring correspondences—x in a number of Language A’s words corresponding to y in a number of Language B’s words. Comparing words is the first step, but a full-scale application requires comparing grammatical affixes as well (suffixes, prefixes, etc.), again looking for Language A’s x corresponding to Language B’s y. It isn’t useful to find a single example of x corresponding to y in a word or suffix with similar meanings in both languages, because that could be—and often is—due to pure chance. What’s needed is recurring correspondences, in at least several sets of words with closely matching meanings.

The crucial point is that a pattern of recurring correspondences in two languages under comparison cannot be due to chance, but must reflect some historical connection between the languages. That connection could be either a genetic link or borrowing—reticulation—but these two possibilities can be disentangled, given enough data, by considering such factors as whether the correspondences cluster heavily in the basic vocabulary (and thus signal genetic relationship) or in the more culturally loaded vocabulary (and thus signal borrowing). Any hypothesis of genetic relationship that is supported by numerous recurring sound correspondences in the basic vocabulary and in grammatical morphemes will certainly be considered very promising, though more work is needed to rule out the possibility of reticulation as the source of the correspondences (see §3).
Without a pattern of recurring correspondences, no historical linguist will be favorably disposed toward any hypothesis of genetic relationship, because this is the only test we have that has been shown to point reliably to a history of gradual divergence of languages over a long period of time. This is the heart of the massive resistance to Greenberg’s proposals of long-range genetic relationships. He makes a distinction between ‘classification of languages’ and ‘historical linguistics’, claiming that classification can be carried out without actually doing any historical linguistics. In practice, his method of mass comparison involves scanning large amounts of lexical data in a search for similarities; and the similarities turned up by the method were used in classifying the languages into proposed families—often very large families, such as his

huge ‘Amerind’ family encompassing all the languages of South America and MesoAmerica and the vast majority of the languages of North America. His proposals do occasionally include claims of systematic correspondence in grammatical morphemes, but these are not supported by systematic correspondences in the vocabulary; the claimed lexical correspondences are almost entirely isolated, occurring in one set of words only. He also takes great latitude with semantic similarity. Nowhere does he state criteria for judging similarity; this is especially problematic in his phonetic comparisons, where he uses implicit criteria that greatly increase the chances for a successful ‘match’. One difficulty here is that ‘similarity’ is a slippery concept: what counts as similar? It’s easy to decide that a sound [t] in one language is similar to a sound [d] in another language—they share all their phonetic features except voicing (vibration of the vocal cords)—but is [t] similar enough to [r] to count as a match? Maybe; maybe not. It depends on the language. A more fundamental problem is that sounds can and do undergo quite drastic changes over several thousand years. One result of this fact is that inherited correspondences may be quite dissimilar sounds—the most-cited example is the recurring correspondence, in words of very similar meanings, of Armenian [rk] to early Greek (and other IE) [dw]. Another result of the fact that sounds change is that close similarities between words in two related languages at a time depth of, say, 15,000 years would be suspect in any case. It’s true that, in any given language family, there are likely to be a few sounds that haven’t changed much in any of the sister languages over several thousand years. But there’s no way to predict which sounds these will be in any particular case, and many other sounds will certainly change a great deal, so that shared inherited vocabulary—called cognate vocabulary—is likely to look very different.
The publication of Greenberg’s classification of Native languages of the Americas in 1987 stimulated a considerable amount of research, from close examination of his evidence on various language families in the Americas to elucidation of historical linguists’ criteria for establishing genetic relationship to statistical techniques for testing hypotheses of relatedness. A detailed analysis of Greenberg’s proposals, and of his critics’ responses to them, is beyond the scope of this paper. It is worth noting, however, that the Native American groupings have made their way into the anthropological literature as authoritative classifications, with results

that are not conducive to fruitful interdisciplinary approaches to (for instance) questions of origins. In particular, the most extensive investigations of possible language/gene linkages have used Greenberg’s classifications of Native American and other languages as their linguistic baseline. Since the supporting evidence for Greenberg’s classifications is so very weak, investigations based on them are more likely to hinder than to help in the search for such linkages. Here’s one example of the problems that arise. The anthropologist Rik Ward, in an effort to test proposed correlations between languages and genes in the Pacific Northwest of North America, conducted genetic studies of several populations in the region (1996). He concluded (pp. 219-220) that mtDNA

‘within the four Beringian linguistic phyla is substantially less than that observed within the Amerind phylum. Even more dramatic, the mitochondrial divergence within a single Amerind-speaking tribe can exceed the divergence observed within the entire set of the four “Beringian” linguistic phyla. This is in total conflict with the predicted [linkage between genes and language]’.

In accepting Greenberg’s huge ‘Amerind’ family as a well-established linguistic grouping, together with his proposed subgroupings within ‘Amerind’, Ward relies on a position that obscures some of the actual lessons to be learned from his genetic investigation. One finding that contributes to his general conclusion is that speakers of Bella Coola—a language belonging to the Salishan language family, which according to Greenberg ‘is...linguistically very close to the Wakashan language family’ (Ward 1996:211)—are genetically quite distant from speakers of Wakashan languages. This result would be striking counterevidence to claims of language/gene linkages if it were true that Salishan and Wakashan languages are ‘linguistically very close’; but specialists in these two language families are united in rejecting Greenberg’s claim that they are related at all, much less closely related. Ward’s genetic evidence dividing Bella Coola speakers sharply from Wakashan speakers thus provides support for a hypothesis of language/gene correlations, not counterevidence. (A caveat is in order here: historical linguists never actually claim that any language is unrelated to any other language, except in rare cases of extreme reticulation; when we say that two languages are

unrelated, what we usually mean is that there is no evidence to support a hypothesis that they are related. We should, of course, be more precise in our phrasing.) The second implication of the above list of families and dates is probably the most important in the context of this volume, because it concerns the origins question: the lack of good time estimates for many families and the lack of demonstrable family relationships for a sizable number of languages provide no evidence in support of a hypothesis that most or all of the currently problematic languages arose in some way other than cladistic genesis. As in the Americas, systematic comparison of lexicon and grammar in the vast majority of languages in most regions of the world has enabled linguists to group languages into families and to draw branching trees that show the hierarchical levels of relationship of languages in each family. Our methods of establishing linguistic relationships do reveal discrepancies that signal unusual developments like language mixture, as indicated above. But we don’t encounter such cases very often. In the next section I’ll survey types of reticulate language genesis and try to explain why they are relatively rare.

3. Cladistic vs. reticulate ethnogenesis. Like most historical linguists, I believe that cladistic language genesis is the norm and, concomitantly, that reticulate language genesis is exceptional. Most of the blank areas on a world-wide map of language families reflect a lack of knowledge, not a determination of non-cladistic development. Two regions of the world in particular—South America and non-Austronesian-speaking parts of New Guinea—have numerous languages that have not (yet) been grouped into families by knowledgeable linguists. But even here some families have been identified, and it is almost certain that others will be identified when more data becomes available. There are (in my view) just three basic types of non-cladistic language genesis. First, the prototypical pidgin develops for use as a second language for limited purposes of trade or other regular but restricted interaction in a multilingual context. All the speakers of a prototypical pidgin continue to use their native languages for intragroup communication; but under certain social conditions—as in the case of many, most, or all Caribbean creoles—speakers of a well-established pidgin may eventually give up their native languages, at which point the pidgin turns into a creole, the main language of a speech community. (But the new creole is a descendant of the original pidgin, not an entirely new language, so a creole

descended from a pidgin is not itself the product of reticulate development.) Crucially, pidgins develop only when the people in the new contact situation do not know, and do not learn, each other’s languages. There are cases of pidgin genesis in which speakers of the most prominent group in the contact situation withhold their full language from other groups; there is also at least one case in which speakers of the other language(s) deliberately refuse to learn anything but the vocabulary of the most prominent group’s language. That is, lack of bi-/multilingualism does not imply inability to learn a new language. The result, in any case, is typically a lexicon drawn almost entirely from a single language, but a grammar that resembles a cross-language compromise among the grammars of all the languages whose speakers created the pidgin. Two examples of prototypical pidgins are Chinook Jargon, once a major language of intertribal and Native-White interaction in the Pacific Northwest of North America (see e.g. Thomason 1983a) and Chinese Pidgin English, first attested in the 18th century (see e.g. Shi 1991); but there are many other typical pidgins as well, probably including the ancestors of most or all modern Caribbean creoles. Second, the prototypical abrupt creole—of which there are probably very few examples—emerges over a short period of time (perhaps one or two generations) as the main language of a new speech community in a new contact situation, without going through a stable pidgin stage. In such a case the speakers in the new community have little or no further opportunity to use their native languages, because they must talk regularly to people who don’t know them; their children grow up speaking the new contact language. As with pidgin genesis, the people in the new contact situation neither know already nor learn afterwards the language(s) of the other groups in contact.
In both pidgin genesis and abrupt creole genesis the process is amalgamation, but the grammatical structure, even if it is divided into separate components such as and syntax, is not traceable directly to any one of the languages in contact. A probable example of an abrupt creole is Pitcairnese, which surely emerged within one or at most two generations on Pitcairn after the island was settled by the nine Bounty mutineers, together with the nineteen mostly Tahitian-speaking Polynesians they brought with them, after the famous 1790 mutiny (see e.g. Ross & Moverley 1964). Third, scattered about the world are a number of two-language mixtures that clearly arose among fluent bilinguals. Like pidgins and abrupt creoles, these are contact languages,

since they would certainly not exist if it weren’t for the contact situations in which they arose; but unlike pidgins and abrupt creoles, they did not emerge out of a need for a medium of intergroup communication, because their creators, being bilingual, were already able to talk to everyone else in the two-language contact situation. Instead, they arise—as far as we can tell from the very small number of well-documented cases—for one of two reasons. A bilingual mixed language is either created (probably deliberately and probably always abruptly) as a symbol of a new ethnic group, or it emerges through gradual replacement of structure, leaving only the original lexicon intact, as a symbol of cultural persistence. In the latter case, the reason appears to be a dogged refusal to assimilate completely to a pervasively dominant culture. Here are a few examples, starting with three new ethnic groups. Michif, which combines French noun phrases (vocabulary, phonology, morphology, syntax) with Cree verb phrases and sentence structure, was created in Canada by the offspring of Indian-White marriages, who had a legal status distinct from that of both Whites and Indians (see e.g. Thomason & Kaufman 1988:228-233, Bakker & Papen 1997). Mednyj Aleut, originally spoken on one of Russia’s Commander Islands, was created by Russian/Aleut bilinguals and consists of Aleut structure and lexicon (though with many Russian loanwords and a few borrowed structural features as well) except for the finite verb morphology, which is entirely Russian (see e.g. Thomason 1997). Here too the offspring—in this case of Russian fur traders and Aleut women—had a special social, economic, and legal status.
The Media Lengua of Ecuador, a mixture of Quechua grammar and Spanish lexicon, was created by young Quechuas who travel(ed) to work in Quito, the country’s capital, and whose social position is distinct from that of both the native speakers of Spanish in the city and the stay-at-home Quechuas (see Muysken 1997). The language presumably took on its distinctive form abruptly, but its emergence was preceded by much lexical borrowing into Quechua from Spanish. I know of only two fairly clear examples of gradual emergence of a bilingual mixture, though reports of others are beginning to come in. Ma’a, which is (or was) spoken in northeastern Tanzania, was originally (probably) a Cushitic language, but in the last two or three centuries its structure has been so heavily bantuized that only the lexical residue remains: the entire syntax and morphology are fully-elaborated Bantu structures, and about

half the vocabulary and most of the phonology are also Bantu (see e.g. Thomason 1983b, and see Mous 1994 for a partly differing view of the emergence of Ma’a). But most of the basic vocabulary is still Cushitic, and one characteristic Cushitic (and non-Bantu) sound—a voiceless lateral fricative—remains in the phonological system. The other example is Laha, an Austronesian language spoken on Ambon Island, Maluku, Indonesia. I’ve seen no data on this language, but it is reported to have undergone very extensive interference from another but rather distantly related Austronesian language, Ambonese Malay, the island’s dominant local language. According to Collins (1980:14), ‘Laha has maintained its indigenous language in the face of increasing pressure from Ambonese Malay [AM] but only at the expense of drastic revision of its grammar...Bit by bit the grammar of Laha has become nearly interchangeable with AM grammar’. None of these types of contact language will cause an investigator to mistake their reticulate origin for a cladistic one, provided that the source languages (or close relatives of the source languages) are available for comparison, and (probably) provided that they arose less than 8,000-10,000 years ago. As with the glottochronological discrepancies discussed in §2, application of the Comparative Method will reveal mismatches between the lexicon and the grammar—in pidgins, creoles, the Media Lengua, Ma’a, and Laha—and between the two components found in various combinations in languages like Michif and Mednyj Aleut. The main lesson here, however, is that such languages are rare, in comparison to the very large number of languages for which one can establish the usual sorts of cladistic relationships. It’s not that the linguistic processes through which contact languages arise are themselves exotic or rare.
On the contrary, all the specific processes can easily be observed in ordinary linguistic behavior: lexical borrowing; code-switching in bilinguals’ conversation (which often produces mixed sentences of the type found in Michif, with noun phrases from one language and everything else from the other); the efforts of (say) an Italian speaker and an English speaker to converse in French, when neither of them knows it at all well; and so forth. Even phenomena like the deliberate replacement of vocabulary (as in some New Guinea languages, noted in §2 above) have analogues in more ordinary contexts, such as Pig Latin and similar games, whose purpose is to make words unrecognizable. What is unusual, then, is not the process, but the product. For bilingual language

mixing (as in Michif or the Media Lengua) to become crystallized into a language that is used regularly by a speech community, there has to be some powerful social reason. Most offspring of community-wide mixed marriages don’t go off to form their own ethnic group and invent their own language; most groups of bilinguals who move uneasily between rural home and city job don’t solidify themselves into a new unified group with a new language. Similarly, most speech communities under intense cultural pressure from surrounding and dominant groups do not hang onto their original languages for centuries, borrowing freely but keeping the most salient part of the language, its vocabulary. Instead, in the vast majority of cases, the pressured speakers of the non-dominant language simply shift to the dominant language. And so it is with pidgins and creoles too. Pidgins are the most common type of contact language, and there have probably been many more of them in world history than we have (or will ever have) any knowledge of. But bilingualism and multilingualism are even more common in trade situations; since (probably) most people in the world are at least bilingual, and since people are most likely to trade regularly with neighbors whose language(s) they already know, most trade doesn’t require a new contact language. Abrupt creoles involve language shift, in that the creators of the creole shift away from their several native languages. But this is an unusual kind of shift, because the shifting speakers learn the vocabulary without learning much, if any, of the putative target language’s grammar; this contrasts sharply with ordinary shift situations, in which the shifting speakers learn most of the target language, both lexicon and grammar. In sum, contact languages are rare because social factors combine to produce much less extreme linguistic results in most contact situations.
This is not to say that language contact doesn’t often cause change in one or more of the languages; it does, and in a very minor sense these changes could perhaps be viewed as bits of reticulate development. But the scale is wrong for a hypothesis of language amalgamation. Borrowing and shift-induced interference have to be very extensive indeed before they interfere with the lines of descent with modification from a single parent language. There is, however, one class of possible exceptions to my generalization about amalgamation as a rarity. There are well-documented cases in which divergent dialects of a single

language can re-merge to form a new language (and a new ethnic group); very closely related languages—that is, languages that were until recently dialects of the same (parent) language—can also re-merge. There are no linguistic barriers to such a move, not just because (as Michif, Ma’a, and their ilk show) there are no linguistic barriers to amalgamation anyway, but because they will already share almost all their vocabulary and almost all their grammar. The Comparative Method will sometimes detect the process retrospectively, but I suspect that in many cases it will not do so. Any systematic comparison will reveal inexplicable irregularities, and with an amalgamation of this type there might not be many more irregularities than in ordinary cases of language split. We therefore have absolutely no idea how common this sort of reticulate development has been in language history. But the re-merging of two (or more) dialects is qualitatively very different from the amalgamation of two completely distinct languages with unlike vocabularies and structures; and in any case the re-merging can’t occur before the ordinary dialect or language split has run its course. So I don’t believe that such amalgamations, even if they are fairly common, affect my overall conclusion that reticulate origins are exceptional. Let’s turn now to the question of whether it is possible to predict when new languages will arise through reticulation. The short answer to this question is no: there is no way to tell, in advance, whether a contact situation will lead to an extreme result—a mixed language of one of the three types described above—instead of more ordinary contact phenomena such as borrowing of vocabulary and some structural features. We can, to be sure, outline some prerequisites for contact-language genesis. Some of these are trivial, of course: most obviously, there can be no contact-language genesis without language contact. Less trivially, the contact must be intense.
In the case of pidgins and abrupt creoles, this means that there must be different linguistic groups that share no common language and that come under great social pressure to communicate with each other; in the case of bilingual mixed languages, there must be a community in which many, most, or all members are fluent speakers of two languages. Other social factors can be sketched, but in our present and foreseeable future state of knowledge, they can’t be specified with any precision. A pidgin or abrupt creole will develop if, in a new multilingual or (more rarely) bilingual

contact situation, none of the groups in contact learns another of the languages—because an especially prominent language in the situation is being withheld by its own speakers, or because speakers of the other languages have too little access to it to learn it, or because they have too little motivation to learn it, or for some other less obvious reason. But we can’t predict, for any given new situation, whether an appropriate set of social factors will be present in such force as to lead the groups to develop a pidgin or abrupt creole. An abruptly created bilingual mixed language will arise when a bilingual subgroup of a larger community chooses to create a new language to serve as a symbol of its new (sub-)ethnic identity, and a gradually developed bilingual mixed language will arise when a persistent ethnic community stubbornly resists total cultural assimilation to a dominant group. But again, we can’t predict when new ethnic subgroups will coalesce in such a way as to motivate the development of a new language, and we certainly can’t predict when a pressured group will borrow most of a dominant language rather than simply give up its entire ethnic-heritage language by shifting to the dominant language. The long answer to the question about predicting linguistic reticulation—the genesis of mixed languages—therefore isn’t very optimistic. We know it is unusual, and we can state several non-trivial conditions for the emergence of the three types of mixed languages; but we can’t predict their emergence with any confidence at all. Further work may well lead to a more precise set of preconditions (and the ones mentioned above can already be elaborated on), but it is unlikely that we will ever be able to predict reticulation. Part of the problem is that the relevant factors are certainly cultural rather than linguistic, and human cultural behavior in complex contact situations cannot be reduced to a set of cookbook recipes.
Another part of the problem is that we have no solid evidence about the specific processes through which any mixed language has arisen. Finally, what about the tempo and mode of linguistic change and diversification? In particular, does cladistic language split come about gradually, through the accumulation of small changes over many generations, or does it occur abruptly, via punctuated equilibrium? The standard view in historical linguistics is that every living language is constantly changing; that the changes are small when any two generations of speakers are compared; but that changes accumulate gradually over many generations to bring about divergence and

ultimately language split in two separated dialects of a single language. This view is, as far as historical linguists can tell, an accurate reflection of ordinary language split, the kind of linguistic diversification that has produced hundreds of language families and subfamilies all over the world. There are only two ways in which abrupt language split could come about. The first would be a case in which very extensive contact affected one diverging dialect so strongly, with much borrowed lexicon and greatly restructured grammar, that the two recent dialects were no longer mutually intelligible and were thus separate languages. This could, in principle, happen within one or two generations. I don’t know of any examples where this has happened; the extreme outcomes discussed above were new mixed languages, not merely extensively changed old languages. The second way would be a case in which speakers of one dialect deliberately changed their language to make it more different from neighboring (ex-)dialects; again, a very rapid split could take place if the deliberate changes were drastic enough to eliminate mutual intelligibility. Examples of this type have been reported from New Guinea, Peru, and Baluchistan, in what is now Pakistan. This last example is called

Mōkkī, and it is, or was, the language of the Lōṛīs (Bray 1913:139-140). Mōkkī was apparently invented as a secret language so that the Lōṛīs could talk to each other without being understood by unfriendly neighbors; but by the time Bray investigated it, it had become the group’s home language. And since its invention involved massive distortion of the lexicon, it was surely unintelligible to speakers of closely-related Indic languages. But any case of abrupt language split would betray its origin in any attempt to apply the Comparative Method to the results. Mōkkī, for instance, would easily be identified as an anomaly, because its grammar would match that of other Indic languages, but its lexicon would not—if recognizable at all as Indic, its distorted word forms would not display either the quantity or the systematic quality of sound correspondences with the words of related languages. Proposals of abrupt splits have sometimes been made in response to the intractable nature of languages in certain parts of the world—intractable, that is, from the viewpoint of the historical linguist trying to find out how they arose. Japanese is probably the most notorious case. Some linguists claim that it belongs to the Altaic family (if this is in fact a language

family—the grouping is controversial); others claim that it is related to Austronesian, and still others suggest that it is a mixed language (see e.g. the discussion in Matisoff 1990). I have already mentioned languages of South America and non-Austronesian languages of New Guinea as examples of languages that have not (yet) been classified exhaustively. In both regions the main current problem is lack of good documentation for the languages, which makes it difficult or impossible to carry out systematic comparisons. The same problem exists with some languages in other parts of the world too, but not in such an extreme form or with so many languages in a single region. But in New Guinea, at least, it may never be possible to unravel the historical connections among the non-Austronesian languages, in part because there has apparently been a great deal of grammatical convergence as well as deliberate change (both lexical and structural). One other region may present that same problem, although the documentation is much better than for languages of New Guinea: the 260 or so Aboriginal languages of Australia (minus Tasmania), though often considered to constitute a single family, are viewed by R. M. W. Dixon as one huge convergence area—or, rather, as a single very ancient language family whose members are so similar today because of much grammatical convergence and much lexical borrowing (1997). (His presentation is a bit confusing, as he seems to appeal to diffusion rather than inheritance to account for the shared grammatical features, but he also seems to believe in a single huge language family encompassing all Aboriginal languages.) Because Dixon argues that reticulate development of languages via massive diffusion is common, his views require some comment here. Dixon’s main claim concerns the tempo and mode issue.
He argues for long periods of equilibrium characterized by minor internal changes and very widespread diffusion, followed by short punctuations, in which language families arise rapidly through splits (i.e. cladistically). His theory is designed to deal with the Australian problem—a continent settled at least 50,000 years ago, with languages that resemble each other closely structurally but have rather widely discrepant vocabularies. He sets up two alternative hypotheses. One posits a Proto-Australian that was spoken 50,000 years ago and then changed very slowly during the long period of equilibrium but with much diffusion among the emergent daughter languages; the other posits a Proto-Australian that was spoken 5,000-10,000 years ago. He then argues

vigorously for the first hypothesis. He does not consider the possibility that Australian languages might not all be related to each other, in which case there would have been no single Proto-Australian. He also doesn’t consider the possibility that there might have been two or more different Australian language families, but that only one eventually survived and spread at the expense of the others. He does account for the lexical mishmash: Australian cultures are apparently like West Greenlandic Eskimo culture, in that lexical turnover is very rapid because of name taboos. In Australia, the usual way of replacing a tabooed lexical item is to borrow a word from a neighboring language. I will not analyze Dixon’s arguments in detail here, but in general his proposals are rather too speculative to be convincing. The Australian situation is certainly puzzling, and the social factors, as noted, may make it impossible to sort out the route(s) by which the various modern languages arose. Diffusion is a major part of the Australian linguistic picture; in Arnhem Land, in particular, there seems to have been so much borrowing in all directions that the ultimate source of a borrowed word or structure is often impossible to determine (Heath 1978). Still, it is clear that too little systematic comparison has been done on Australian languages to decide the issue with any confidence. I would also disagree with Dixon on the inevitability of diffusion in languages in contact: there are too many examples of non-change under intense contact conditions for his claim to be plausible. Moreover, Dixon’s proposal is based on his interpretation of the Australian data; he has no examples, from Australia or elsewhere, that actually show a lack of major internally-motivated change over a few thousand years, much less over 50,000 years. (He says that no attestations of stasis in equilibrium are possible because the mere presence of a scientific observer brings on a punctuation.
This is not impossible, but in the absence of any concrete evidence, this argument doesn’t carry much weight.) In spite of Dixon’s ingenious proposals, then, I will maintain the standard position that, since all living languages are always changing (Dixon himself agrees with this premise), linguistic changes will accumulate in any language over time, so that after 500-1000 years two separated dialects of a single parent language will have diverged into separate languages. This doesn’t mean that language split is inevitable. The language of a unified group that stays unified for millennia will develop in a straight unbroken line for millennia. But if two or more subgroups do separate, language split is inevitable; and it will not take 5,000 years to happen, much less 50,000. Although there are relatively abrupt individual changes (in fact, most contact-induced change is relatively abrupt), gradual incremental change is constant in every language, even in a language that is simultaneously undergoing abrupt change under the influence of another language.

4. Conclusion. What is predictable about language change is the fact of change itself: internally-motivated language change is inevitable in every language. Everything else is unpredictable, starting with the particular changes—their specifics depend on a complex mix of linguistic and social factors. Language split is not inevitable, but is instead dependent on the existence of appropriate social circumstances. The same is true of contact-induced change (no contact, no externally-motivated change). It is impossible to predict when a new ethnic group will invent a new language as a symbol of their differentness; when a group of villagers will decide to change their language to exaggerate its distinctness from the next village’s language; or when a pressured ethnic group will decide to maintain their language against all odds rather than shift to the dominant group’s language. But a corollary of the inevitability of change, provided that at least two subgroups of a speech community become separated, is that language split is inevitable. So cladistic language genesis is inevitable under this common social condition; but reticulate language genesis is not, because the requisite social conditions for amalgamation of distinct languages don’t obtain very often. What, then, are the prospects for using linguistic evidence in conjunction with evidence from other branches of anthropology to achieve a unified picture of population origins? This depends (in my opinion) on the time depth: at shallow time depths, up to about 10,000 years, the prospects are very good. After about 10,000 years they are very poor, because of the decay of the kinds of linguistic patterns that provide the only solid evidence for hypotheses of linguistic relatedness. Useful comparisons of linguistic and other anthropological evidence must be based on secure linguistic classifications, all of which (at least in our current state of knowledge) are in the time range of 10,000 years or less. 
Here is one example of a fruitful investigation using linguistic and genetic evidence, from work being carried out at the Max Planck Institute for Evolutionary Anthropology in Leipzig. Bernard Comrie and his colleagues have found that, in the Caucasus, Armenians and Azerbaijanis are genetically very close, but their languages are (as far as we can tell) unrelated to each other: Armenian is an Indo-European language, and Azerbaijani is a Turkic language. Because Azerbaijanis are relative newcomers to the region, this discrepancy between genes and language suggests that ‘Azerbaijanis are the descendants of a population that shifted to a Turkic language without there being any significant corresponding geneflow’ (Comrie 2002). This comparison between genes and language makes it possible, Comrie notes, to offer solid evidence in favor of a hypothesis of language shift, which is often hard to demonstrate convincingly on the basis of linguistic evidence alone. Correlations have also been explored for non-genetic anthropological data, often with good results. Historical linguists have long made use of data from archaeology in their efforts to understand language history, for instance, although these efforts are greatly complicated by the fact that it’s impossible to be sure what language people were speaking unless they left documentary records. What is certain is that all of us are looking at the same picture. That is, some sequence of historical events took place, and in each instance we are all trying to find out just what those historical events were. So there is no doubt that the data from our several subfields must ultimately be compatible with a single historical picture. Unfortunately, this doesn’t mean that we can count on being able to discover how all the pieces fit together to reveal the pattern of history. But if we cooperate by pooling the results of our research on particular cases, pieces of the picture will fall into place; with luck, many pieces.

References

Bakker, Peter, and Robert A. Papen. 1997. Michif: a mixed language based on Cree and French. In Thomason, ed., 295-363.

Campbell, Lyle. 1997. American Indian languages: the historical linguistics of Native America. Oxford: Oxford University Press.

Campbell, Lyle, and Marianne Mithun, eds. 1979. The languages of Native America: historical and comparative assessment. Austin: University of Texas Press.

Bray, Denys de S. 1913. Census of India, 1911, vol. IV: Baluchistan. Calcutta: Superintendent Government Printing, India.

Collins, James T. 1980. Laha, a language of the Central Moluccas. Indonesia Circle 23:3-19.

Comrie, Bernard. 2001. Languages and genes: evidence from the Caucasus. Paper presented at the Annual Meeting of the Linguistic Society of America, San Francisco.

Dixon, R.M.W. 1997. The rise and fall of languages. Cambridge: Cambridge University Press.

Embleton, Sheila. 1992. Historical linguistics: mathematical concepts. In William Bright, ed., International encyclopedia of linguistics, vol. 2:131-135. Oxford: Oxford University Press.

Goddard, Ives. 1978a. Eastern Algonquian languages. In Bruce G. Trigger, ed., Handbook of North American Indians, vol. 15: Northeast, pp. 70-77. Washington, DC: GPO.

Goddard, Ives. 1978b. Central Algonquian languages. In Bruce G. Trigger, ed., Handbook of North American Indians, vol. 15: Northeast, pp. 583-587. Washington, DC: GPO.

Goddard, Ives. 1992?

Greenberg, Joseph H. 1987. Language in the Americas. Stanford, CA: Stanford University Press.

Heath, Jeffrey. 1978. Linguistic diffusion in Arnhem Land. Canberra: Australian Institute of Aboriginal Studies.

Illič-Svityč. 1971, 1976. Opyt sravnenija nostratičeskix jazykov. 2 vols. Moscow: Nauka.

Jacobsen, William H., Jr. 1979. Wakashan comparative studies. In Campbell & Mithun, eds., 766-791.

Kaufman, Terrence S. 1988. Otomanguean tense/aspect/mood, voice, and nominalization markers. Paper presented at the Second Spring Workshop on Theory and Method in Linguistic Reconstruction, University of Pittsburgh.

Kaufman, Terrence S. 1990. Language history in South America: what we know and how to know more. In Doris L. Payne, ed., Amazonian linguistics: studies in lowland South American languages, 13-67. Austin: University of Texas Press.

Kinkade, M. Dale. 1991. Prehistory of the native languages of the Northwest Coast, vol. 1: The North Pacific to 1600, 137-158. Portland: Oregon Historical Society Press.

Krauss, Michael E. 1979. Na-Dene and Eskimo-Aleut. In Campbell & Mithun, eds., 803-901.

Matisoff, James A. 1990. On megalocomparison. Language 66:106-120.

Mithun, Marianne. 1979. Iroquoian. In Campbell & Mithun, eds., 133-212.

Mous, Maarten. 1994. Ma’a or Mbugu. In Peter Bakker & Maarten Mous, eds., Mixed languages: 15 case studies in language intertwining, 175-200. Amsterdam: Uitgave IFOTT (Institute for Functional Research into Language and Language Use).

Muysken, Pieter. 1997. Media Lengua. In Thomason, ed., 365-426.

Rankin, Robert. 1993. On Siouan chronology. Paper presented at the Annual Meeting of the American Anthropological Association, Washington, DC.

Ross, Alan S.C., and A.W. Moverley. 1964. The Pitcairnese language. Oxford: Oxford University Press.

Shi, Dingxu. 1991. Chinese Pidgin English: its origin and linguistic features. Journal of Chinese Linguistics 19:1-40.

Swadesh, Morris. 1950. Salish internal relationships. International Journal of American Linguistics 16:157-167.

Thomason, Sarah Grey. 1983a. Chinook Jargon in areal and historical context. Language 59:820-870.

Thomason, Sarah Grey. 1983b. Genetic relationship and the case of Ma’a (Mbugu). Studies in African Linguistics 14:195-231.

Thomason, Sarah Grey. 1997. Mednyj Aleut. In Thomason, ed., 449-468.

Thomason, Sarah Grey, ed. 1997. Contact languages: a wider perspective. Amsterdam: John Benjamins.

Thomason, Sarah Grey, and Terrence Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley: University of California Press.

Ward, R.H. 1996. Linguistic divergence and genetic evolution: A molecular perspective from the New World. In A.J. Boyce and C.G.N. Mascie-Taylor, eds., Molecular biology and human diversity, 205-224. Cambridge: Cambridge University Press.

Woodbury, Anthony C. 1984. Eskimo and Aleut languages. In David Damas, ed., Arctic (Handbook of North American Indians, ed. by William C. Sturtevant, vol. 5), 49-63. Washington, DC: Smithsonian Institution.
