Applying Lexicostatistical Methods to Sign Languages: How Not to Delineate Sign Language Varieties

Palfreyman, Nick (2014) Applying lexicostatistical methods to sign languages: How not to delineate sign language varieties. Unpublished paper available from clok.uclan.ac.uk/37838

Nick Palfreyman
International Institute for Sign Languages and Deaf Studies, University of Central Lancashire

The reliability and general efficacy of lexicostatistical methods have been called into question by many spoken language linguists, some of whom are vociferous in expressing their concerns. Dixon (1997) and Campbell (2004) go as far as to reject the validity of lexicostatistics entirely, and Dixon cites several publications in support of his argument that lexicostatistics has been ‘decisively discredited’, including Hoijer (1956), Arndt (1959), Bergsland and Vogt (1962), Teeter (1963), Campbell (1977) and Embleton (1992). In the field of sign language research, however, some linguists continue to produce lexicostatistical studies that delineate sign language varieties along the language-dialect continuum. Papers referring to lexicostatistical methods continue to be submitted to conferences on sign language linguistics, suggesting that these methods are – in some corners of the field, at least – as popular as ever.

Given the ongoing popularity of lexicostatistical methods with some sign language linguists, it is necessary to be as clear as possible about how these methods may generate misleading results when applied to sign language varieties, which is the central aim of this paper. Furthermore, in several cases, lexicostatistical methods seem to have been deployed for a quite different purpose – that of establishing mutual intelligibility – but the suitability of lexicostatistical methods for such a purpose has not been openly addressed, and this is dealt with in Section 4.
This paper is based on a review of the literature and was written following my time living in Indonesia (2007-2009) and working on research with users of Indonesian sign language varieties (between 2010 and 2013). Section 1 outlines the history of lexicostatistical methods in linguistics, while Section 2 is concerned with lexicostatistical studies of sign language varieties. Problems in applying lexicostatistical methods to sign languages are set out in Section 3, and in Section 4 I discuss the stated and implicit aims of sign language sociolinguists in making recourse to lexicostatistical methods. The final section highlights the need for sign language sociolinguistics to move away from lexicostatistics as a false proxy for mutual intelligibility.

Note: This paper is based on pp. 17-20 and pp. 22-55 of my revised PhD thesis, Sign language varieties of Indonesia: Linguistic and sociolinguistic perspectives (University of Central Lancashire), which was accepted with minor amendments in 2014. I would like to thank Ulrike Zeshan, Connie de Vos, Sheila Embleton and David Gil for discussing lexicostatistics with me, and my PhD examiners for their feedback. Any errors remain my own.

1. What is lexicostatistics?

Lexicostatistics is a method of classification that entails comparing the vocabulary of different language varieties to find a measure of distance through the application of a statistical scale, and enables linguists to reconstruct family-trees for groups of languages that are known to be related (Embleton, 2000; Campbell, 2004). Perhaps the best-known proponent of lexicostatistics is Morris Swadesh (1950, 1954, 1955) – although several linguists proposed and developed lexicostatistical methods prior to this (Embleton, 2000). The application of Swadesh’s methods has been controversial from the outset: for an early example, see criticisms made by Gudschinsky (1956b) concerning Lees (1953).
Several linguists have responded to the perceived methodological weaknesses of lexicostatistics individually through a series of modifications (see Trask, 1996, and Embleton, 2000, for examples); an alternative ‘pick and mix’ approach has seen linguists adopt some elements of lexicostatistics and jettison others, in accordance with their own objectives. Given the plethora of individual methodological combinations used in the literature, there is no single uniform, commonly recognised ‘lexicostatistical method’, and some working definitions are essential here to avoid confusion.

The first useful distinction is between classical lexicostatistics and preliminary lexicostatistics. This distinction is proposed by Starostin (2010), who stipulates that classical lexicostatistics is only conducted once a historic relationship between the language varieties in question has already been demonstrated. Classical lexicostatistics is the last stage in a long process of determining the historical relationships between a series of language varieties. Preliminary lexicostatistics, on the other hand, entails the use of lexicostatistical methods before any relationship between the languages has been determined. This approach is open to accusations of circular reasoning: lexicostatistical methods are used to establish that there is a relationship, and on that basis, the same methods determine the nature of that relationship.

Central to both methods is the concept of the ‘cognate’, which can best be explained with recourse to an example. Historical linguists consider German and English to be related because they have descended from a single language variety – West Germanic – in use around 2000 years ago (Hawkins, 2009). As a result of this common origin, the German word Tanz and the English word dance came from the same proto-form, and so they are cognate.
As the West Germanic variety split – see Figure 1 – the proto-form changed in different ways as, through time, its sounds became subject to different but regular phonological processes in each ‘descendant’ language variety. Regular sound correspondence between German [t] and English [d] means that Tanz/dance is not the only cognate pair: others include Tag/day, tot/dead, Tür/door – and gut/good, where the affected consonant is in a final position (Ratcliffe, 1998:14).

Figure 1. A section of the West Germanic family tree (based on Hammarström et al., 2014).

Written documentation is especially useful in permitting historical linguists to use techniques such as regular sound correspondence in order to identify cognates; methods from classical lexicostatistics can then be applied.

One of the problems is that it can be difficult, if not impossible, to identify cognates simply by finding formal similarities. To give but one example, Spanish mucho and English much are formally and semantically similar, but not cognate: mucho comes from the Latin multum meaning ‘much’, while much comes from Old English micel meaning ‘big’ (Warnow, 1997:6586). Conversely, French chef and English head are cognate, even though this is no longer formally apparent (Embleton, 2000:149).2 It is primarily for this reason that preliminary lexicostatistics has received criticism: identifying cognates on the basis of formal similarity alone is a risky endeavour.

Another reason for caution is that formal similarities can arise through contact between language varieties in the period following a language ‘split’. There are formal similarities in pairs such as the French famille and the English family, for example, but these are not treated as cognate; rather, they occur as a result of language contact. In that case, the Norman invasion of England in 1066 led to many borrowings from language varieties across the English Channel.
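
The gap between formal similarity and cognacy described above can be illustrated with a minimal sketch. It is not a method used by any of the authors cited: the function name formal_similarity and the choice of Python's difflib.SequenceMatcher are assumptions made purely for illustration, and the score is based on spelling alone, which is itself a crude stand-in for phonological form. The statuses attached to each pair are those reported in the text.

```python
from difflib import SequenceMatcher

# Word pairs discussed in the text, with the status the text reports for each.
PAIRS = [
    ("Tanz", "dance", "cognate"),
    ("mucho", "much", "not cognate"),
    ("chef", "head", "cognate"),
    ("famille", "family", "borrowing"),
]


def formal_similarity(a: str, b: str) -> float:
    """Crude orthographic similarity in [0, 1].

    Hypothetical helper for illustration only: it compares spellings,
    not sounds, and says nothing about historical relatedness.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_table(pairs):
    """Return (word_a, word_b, status, score) rows for each pair."""
    return [(a, b, status, formal_similarity(a, b)) for a, b, status in pairs]


if __name__ == "__main__":
    for a, b, status, score in similarity_table(PAIRS):
        print(f"{a:8} {b:8} {status:12} {score:.2f}")
```

Under this measure the non-cognate pair mucho/much and the borrowed pair famille/family score higher than the cognate pairs Tanz/dance and chef/head, which is the risk run by any procedure that infers cognacy from surface resemblance.
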
Embleton (2000:149) points out that the ‘splits’ manifest in the Stammbaum or ‘family-tree’ model of language change, where language varieties separate and change in different ways, are often inaccurate. Varieties do not always separate quickly and ‘cleanly’, nor do they always subsequently develop independently of one another. On the contrary, it is very common for proximate language varieties to borrow from each other.

Having established a historical relationship between language varieties, the classical lexicostatistics method requires linguists to ascertain how many items on a fixed word list are ‘cognates’ for these varieties. Once a percentage is obtained, this is applied to a classificatory scale in order to group these varieties. The resulting percentage of cognates – between English and German, for example – depends on the items that are examined. Importantly, lexicostatisticians have argued that the core or basic vocabulary must be used, because it is (apparently) more resistant to phenomena such as borrowing than is the peripheral or general vocabulary (Gudschinsky, 1956a:613; Crowley, 1997:171). Lexicostatistics also rests on the assumption that the rate of lexical replacement is more or less stable (Crowley, 1997:172), and Swadesh (1950) developed a list of ‘core vocabulary’ comprising non-cultural lexical items that are supposedly less prone to borrowing.3

All of these assumptions have subsequently been questioned by linguists. The very notion that there is such a thing as a basic or core vocabulary of items that are independent of language or culture is criticised by Campbell (2004:201), who also doubts the validity of the assumption that the rate of lexical retention can be constant through time, and that the rate of loss is the same cross-linguistically (202). There are several documented examples where basic vocabulary changes rapidly and unevenly, through borrowing and other phenomena, and this distorts the results,
