Regional Variation in Lalo: Beyond East and West

Cathryn Yang
La Trobe University, Australia; SIL International

Abstract

Lalo, a Burmic, Central Ngwi language spoken in western Yunnan, China, has been classified as having two dialects: East Mountain and West Mountain (Chen et al., 1985). Wang (2003) mentions a few differences between the dialects, but, in general, previous research has left major patterns of variation and the relative degrees of difference unexplored. Fieldwork conducted in 2008, including the collection of wordlists and texts, reveals far more linguistic diversity than previously recognized. This paper presents evidence, from both a diachronic and a synchronic perspective, for three lower-level dialect clusters: Central, Northwestern, and Eastern. Diachronically, shared innovations in tone, initials and rhymes are used as criteria for subgrouping. Synchronically, the Levenshtein distance algorithm is applied to quantify aggregate pronunciation differences (Heeringa, 2004), and the results are presented using NeighborNet network mapping and multidimensional scaling. The synchronic, dialectometric results are highly congruent with the diachronic findings for lower-level clusters. East and West Mountain Lalo are found to group together within the Central Lalo cluster. Northwestern, Eastern, and Central Lalo are clear lower-level subgroups, with possible higher-level connections to each other that distinguish them from peripheral Lalo groups that migrated out of the Lalo homeland area. Besides presenting the major patterns of regional variation in Lalo, this research reveals that the degree of difference between clusters is considerable, a finding with implications for any future language development.

Keywords: Lalo, Ngwi, Levenshtein distance, dialectology, dialectometry

1. Introduction

Lalo is a Burmic, Central Ngwi language cluster spoken in Western Yunnan, China, and is closely related to Lolo, Lisu and Lahu. While Lisu and Lahu dialects have been well documented (Bradley, 1979a; Bradley, 1994; Matisoff, 1973 [1982]; Matisoff, 2006), important questions about Lalo dialect geography have not yet been answered. What are the patterns of regional variation? Within those patterns, what are the major groups? Previous research (Chen et al., 1985) identifies two dialects, East Mountain and West Mountain, but fieldwork conducted in 2008 reveals a much more complex picture. Based on shared innovations and on phonetic distance as measured by Levenshtein distance (see Section 2.2), several lower-level Lalo dialect clusters can be identified: Central (la!"lu#$$pa#!"), Northwestern (la!"lu$$pa%$), and Eastern (la!"lu$$pu&!). In addition, there are several peripheral Lalo groups who migrated out of the Lalo homeland at various times, and whose affiliation with other dialect groups is still uncertain: Xuzhang Lalu (la!"lu$$) and Yangliu Lalu (la!"lu%%pa%%) in Baoshan Prefecture, and Mangdi Lalo (lo!"lo$$p'#!") and Eka (o!"k(a!&) in Lincang Prefecture. This paper focuses on the three lower-level dialect clusters, Central, Northwestern, and Eastern, and leaves the question of higher-level relationships between the clusters and between peripheral groups for future research. The divisions between the dialect clusters are substantial enough to lead to low mutual intelligibility, a reality that must be considered in any future language development. The Lalo ethnic population is estimated at around 500,000 (Björverud, 1998); however, the number of Lalo speakers is likely smaller, as Lalo in urban centers shift to Chinese (Bradley, 2002). Lalo are found mainly in Southern Dali, Northern Baoshan, Northern Pu’er, and Northern Lincang Prefectures in Yunnan Province (Chen et al., 1985).
Lalo’s geographic distribution in Western Yunnan has led Chinese linguists to classify it as “Western Yi,” “Yi” being the official ethnic category in which Lalo has been placed (Zhu, 2005). Weishan and Nanjian Counties in Dali Prefecture are widely considered to be the traditional homeland of the Lalo. Central Lalo speakers who have migrated out of Weishan to Nanjian and Jingdong Counties even have a regional autonym linking them to the area. Members of this group call themselves “Misha-pa,” referring to the ancient administrative region known as Mengshe or later Menghua, which originally encompassed southern Weishan and northern Nanjian (Bai, 2002). The last syllable, -pa, in “Misha-pa” is Lalo for “person,” so “Misha-pa” means “person from Misha” (Mengshe in Chinese) (Bai, 2002). Mengshe is a historically important region that gave rise to the Nanzhao kingdom, a semi-independent state that challenged imperial China’s hold over Yunnan during the Tang dynasty (618-907 A.D.) (Backus, 1981; You, 1994). Chinese historians of the Tang era classified the ruling Mengshe clan as “Wuman,” an ethnic category that contrasted with the “Baiman.” Backus (1981: 50) and Bradley (1979b: 91-92) link the Wuman of the Dali region with ancestors of the Yi, and the Baiman with proto-Bai. But “Yi” is a contemporary ethno-political category that includes Northern and Central Ngwi languages, even as it excludes certain Central Ngwi languages such as Lisu and Lahu. The Central Ngwi speakers who have traditionally occupied the Mengshe region are Lalo, and their regional autonym “Misha-pa” specifically links them to the Mengshe region and clan. While a direct link between the Wuman of the Nanzhao period and the ancestors of the Lalo remains tenuous, the Lalo’s perceived link to a glorious “Mengshe” past underlines the importance of the Weishan and Nanjian area as their ancestral homeland.
Given the Lalo’s long history of inhabiting the Mengshe (Weishan/Nanjian) area, the level of diversity found there among Lalo varieties is not surprising. Both local Lalo speakers and outside researchers recognize two major dialect groups within Weishan, East Mountain and West Mountain. One speaker from Ma’anshan Township in the West Mountain region and another from Wajiacun in Yongjian Township in the East Mountain region distinguish their respective varieties with the terms “Xishan-pa” (West Mountain person) and “Dongshan-pa” (East Mountain person) (Blackburn, 2006). These names are based on the groups’ geographic distribution within Weishan: West Mountain speakers are located in the mountains to the west of Weishan County’s central valley, and East Mountain speakers in those to the east. East Mountain, with fewer than 10,000 speakers, has a much smaller population and a more limited geographic spread than West Mountain, being located mainly in the northeast corner of Weishan County. Interestingly, it is a valley that divides these two Lalo groups, not a mountain. Because valleys facilitate travel, they usually unify groups, while mountains divide them; in Weishan, however, the valley serves as a boundary. This can be partially explained by resettlement patterns that have developed since the Yuan dynasty (1279-1368 A.D.), after the Mongols invaded and destroyed the Dali kingdom in 1253 A.D. Continuing immigration by outsiders to the Weishan valley, especially during the Qing dynasty (1644-1912 A.D.) (Atwill, 2005), pushed Lalo out of the central valley and into the mountains on either side. Lalo social networks then grew along their respective mountain ranges (east and west) and seldom crossed Weishan’s central valley. Initial intelligibility of West Mountain Lalo to East Mountain listeners is in the mid to high range (68%) when tested using Recorded Text Testing methodology (Casad, 1974, as adapted in Kluge, 2007).
East Mountain listeners heard a recorded narrative from a West Mountain variety and were then asked to retell the narrative’s contents after hearing it again section by section. In contrast, West Mountain listeners understood only 53% of an East Mountain recorded text. Speakers from the two groups initially use Chinese to communicate with each other, but are able to acquire comprehension with further contact (Blackburn, 2006). Given that the labels “East Mountain” and “West Mountain” are tied to the topography of Weishan County, it is not surprising that they apply only to Lalo varieties within Weishan County. Eastern Lalo speakers in Dali municipality, just north of the East Mountain area, reject the label “East Mountain” and instead use a loconym linking them to Dali, not Weishan. Likewise, Central Lalo speakers living outside Weishan do not use the West Mountain loconym. Chen et al. (1985) incorrectly apply the East Mountain label to Lalo dialect groups other than those found in the northeast corner of Weishan, and Wang (2003) and Zhu (2005) repeat this error. These sources are sweeping surveys of Yi languages as a whole and, lacking adequate information, fail to distinguish among Lalo sub-groups. Until now, the dichotomy of East and West was deemed an adequate picture of Lalo diversity. The result of this false dichotomy is a confusing demographic distribution: East Mountain varieties are claimed to range from Midu in the east all the way west to Baoshan Prefecture, located to the far west of the West Mountain area (Zhu, 2005). The East Mountain label has thus conflated the Central Lalo of northeast Weishan (the true East Mountain) with the Eastern Lalo spoken in Dali municipality.
Since villages in this area are geographically close to each other, wear a similar ethnic costume, and are located to the east of the West Mountain variety, the label “East Mountain” seemed a good fit, but it does not correspond to linguistic reality. Wang (2003: 23) provides some specific phonological differences between East Mountain and West Mountain Lalo. However, the differences Wang notes are recent, surface-level changes. For example, Wang states that West Mountain has preglottalized continuants such as ʔm, ʔn, ʔl, and ʔv (from the Proto-Ngwi *ʔ-/*s- prefix), while East Mountain does not: East Mountain has merged preglottalized continuants with their non-preglottalized counterparts. But Björverud (1998) notes that this merger is also currently taking place in Longjie township, a West Mountain variety, so it may be a recent change in East Mountain as well. The second difference Wang notes is that West Mountain has the syllabic consonants m, ʔm, n, ʔn, l, ʔl, ʔv, *, but East Mountain has only m, n, *. However, since East Mountain has already merged preglottalized continuants with non-preglottalized ones, this second difference follows necessarily from the first. While Wang’s research fails to uncover significant differences between East and West Mountain varieties, other previous research (Björverud, 1998; Chen et al., 1985; Zhu, 2005) does not even attempt to provide evidence for the East-West dichotomy. Therefore, this research focuses on the following questions: 1) based on shared innovations, how do Lalo varieties group together diachronically? 2) based on aggregate phonetic distance, how do Lalo varieties cluster synchronically? Each of these questions is the focus of one of two complementary methodologies. The comparative method reveals significant shared innovations, which imply a common social history and form the criteria for historical subgrouping.
Levenshtein distance, a dialectometric measure of phonetic distance, feeds into NeighborNet network analysis, neighbor-joining tree building, and classic multidimensional scaling to map out the synchronic Lalo dialect clusters. The findings from both methodologies show that there are in fact at least seven main dialects or dialect clusters in Lalo: three dialect clusters located near the traditional homeland of the Lalo (Central (C), Eastern (E), and Northwestern (NW)), and four peripheral varieties that migrated out from this area at various times (Xuzhang (XZ), Yangliu (YL), Eka, and Mangdi (MD)). Thus, the main demarcating line does not fall along the traditional East-West divide at all. East and West Mountain varieties, which both belong to the Central Lalo dialect group, are much closer to each other than either is to any other Lalo dialect cluster. Table 1 summarizes the Lalo dialect groups and the village clusters whose lexical data form the basis of these groupings. East Mountain and West Mountain Lalo are each represented by one variety; other Central varieties are not labeled as West Mountain, since their speakers do not use that loconym. Varieties that are non-typical members of their assigned cluster are marked with a question mark (?), i.e., E?-TS and NW?-YL. Figure 1 shows the area of study in Yunnan Province, and Figure 2 shows the locations of the Lalo datapoints used in this paper.

Dialect group (cluster) | Prefecture | County | Township | Village | Autonym | Abbreviation
Central (East Mt) | Dali | Weishan | Yongjian | Yong'an | la#!"lu#$$pa#!" | CE-YA
Central (West Mt) | Dali | Weishan | Ma'anshan | Qingyun | la#!")lu#$$pa#!" | CW-QY
Central | Dali | Weishan | Wuyin | Longjie | la#!"lu#$$pa#!" | C-LJ
Central | Dali | Yangbi | Wachang | Wachang | la!"lu#$$pa#!" | C-WC
Central | Dali | Yongping | Shuixie | Leba | la#!"lu#$$ | C-LB
Central | Dali | Nanjian | Xiaowandong | Chajiang | la#!"lu#$$pa#!", mi%%sa!"pa#!" | C-CJ
Central | Pu'er | Jingdong | Anding | Qingsheng | mi%%sa!"pa#!" | C-QS
Eastern | Dali | Dali | Shijiao qu | Diaocao | la&!lu$$pu&! | E-DC
Eastern | Dali | Dali | Fengyi | Houshan | la!"l+$$p+#!" | E-HS
Eastern | Dali | Dali | Taiyi | Taoshu | la!"lu$$p+#!" | E?-TS
Northwestern | Dali | Yangbi | Taiping | Dutian | la!"lu$$po)&% | NW-DT
Northwestern | Dali | Yangbi | Longtan | Shuizhuping | la!"lu$$pa%$ | NW-SZP
Northwestern | Baoshan | Longyang | Wama | Shanglizhuo | la!"lu$$pa%$ | NW-SLZ
Northwestern | Dali | Yongping | Changjie | Yilu | la!"lo$$pa%$ | NW?-YL
Xuzhang | Baoshan | Longyang | Wafang | Xuzhang | la!"lu$$ | XZ
Yangliu | Baoshan | Longyang | Yangliu | Yangliu | la!"lu%$ | YL
Eka | Lincang | Shuangjiang | Heliu | Yijiacun | o!"k(a!& | Eka
Mangdi | Lincang | Gengma | Hepai | Mangdi | lo!"lo$$p'#!" | MD

Table 1. Summary of Lalo datapoints

Figure 1. Area of study, adapted from Yang and Chan (2008)

Figure 2. Lalo datapoints, adapted from Chan (2008)

2. Methodology

2.1. Comparative Method

The comparative method from historical linguistics uncovers shared innovations, which I then use as phonological isoglosses, revealing the geographic distribution of specific linguistic variables. For this research, I recorded and transcribed word lists of 1,000 lexical items from 18 Lalo villages, summarized in Table 1 above. In addition, both published and unpublished word lists were used as references (Björverud, 1998; Blackburn et al., 2007; Hu and Duan, 2000; Huang and Dai, 1992; Lam et al., 2007; Sun, 1991; YNYF, 1984). The lexical data from the 18 Lalo villages were then compared both to each other and to Bradley’s (1979b) reconstruction of Proto-Ngwi (PN) and Matisoff’s (2003) Proto-Tibeto-Burman and Proto-Lolo-Burmese (PLB) to reconstruct Proto-Lalo (PLa). Shared innovations after the PLa level were then used for subgrouping. In this paper, the Chao (1930) system of tone letters is used to represent relative pitch values. Chao’s system allows five levels of pitch, 1 being lowest and 5 being highest. Level tones are represented by two identical pitch values, e.g. a high level tone is marked 55, while contour tones are marked with initial and final pitch values, e.g. a low-rising tone is marked 24.
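As an illustration of the Chao convention just described, the short Python sketch below splits a two-digit tone value into its onset pitch and its contour direction. The function name describe_chao is hypothetical, and the sketch assumes tones are written as plain digits; superscript tone numbers would need to be normalized first.

```python
from typing import Tuple


def describe_chao(tone: str) -> Tuple[int, str]:
    """Return (onset pitch, contour) for a two-digit Chao tone value.

    Pitch levels run from 1 (lowest) to 5 (highest). Two identical digits
    mark a level tone; otherwise the second digit is the end point of a
    rising or falling contour.
    """
    if len(tone) != 2 or not tone.isdecimal():
        raise ValueError(f"expected two pitch digits, got {tone!r}")
    onset, offset = int(tone[0]), int(tone[1])
    if not (1 <= onset <= 5 and 1 <= offset <= 5):
        raise ValueError(f"pitch levels must lie between 1 and 5, got {tone!r}")
    if onset == offset:
        contour = "level"
    elif offset > onset:
        contour = "rising"
    else:
        contour = "falling"
    return onset, contour


# Tone values that occur in this paper
for value in ("55", "24", "21"):
    print(value, describe_chao(value))
```

A high level tone such as 55 comes out as level with onset 5, while the low rising 24 comes out as rising with onset 2.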

2.2. Levenshtein Distance

Levenshtein distance, also called string edit distance, is an algorithm that measures the least cost of transforming one string of information into another (Heeringa, 2004; Kruskal, 1999). In dialectometry, Levenshtein distance measures the least cost of transforming one cognate’s pronunciation into that of a cognate from another variety. As such, it works as a measure of the synchronic phonetic distance between varieties. I use phonetic distance as a complement to the comparative method. Instead of grouping varieties based on their phylogeny, phonetic distance presents a synchronic picture of Lalo varieties based on overall similarity. This perspective is helpful when considering speakers’ perceptions of difference (e.g., perceptual distance and intelligibility) and also for language planning. Although it has only recently been applied in dialectometry, Levenshtein distance has already been used for Irish (Kessler, 1995), Dutch and Norwegian (Heeringa, 2004), American English (Nerbonne, Forthcoming), Bulgarian (Osenova et al., Forthcoming), and Nisu (Yang, 2009), with results suggesting that it is a useful dialectometric tool. Levenshtein distance has been shown to correlate strongly with both speakers’ perceptual dialect distances and intelligibility. Gooskens and Heeringa (2004) found that Levenshtein distance and Norwegian speakers’ perceptions of dialect difference were highly correlated, with a Pearson’s correlation coefficient of r = .67 and a level of significance of p < .001. Gooskens (2006) also found a strong negative correlation with intelligibility test results for Scandinavian languages (r = -.82, p < .01), meaning that phonetic distance and intelligibility are inversely related: the greater the phonetic distance, the lower the intelligibility. Yang (2009) shows a similarly strong correlation between Levenshtein distance and intelligibility test results for Nisu, a Northern Ngwi language: r = -.62, p < .001.
Yang and Castro (2008) apply Levenshtein distance to a non-Ngwi Tibeto-Burman language (Bai) and a Tai language (Hongshuihe Zhuang), and also find a consistently strong correlation with intelligibility: r = -.75, p < .001 for Bai, and r = -.72, p < .001 for Zhuang. These consistent results suggest that Levenshtein distance effectively approximates intelligibility among dialects of East Asian tonal languages as well as Indo-European languages. Levenshtein distance aligns the phonetic segments of two cognates and computes the least cost of transformation in terms of substitutions, insertions and deletions. Figure 3 below illustrates Levenshtein distance between the pronunciations for the word ‘tiger’ in E Lalo (DC) (/l+!"pu&!/) and C Lalo (QY) (/la!"pa,!"/). Tone is represented by its onset and following contour (e.g. low-falling as LF, high-falling as HF), as this representation was found to best approximate cross-dialectal intelligibility (Yang and Castro, 2008). Insertions, deletions, and substitutions are all weighted the same, as in Heeringa (2004). Figure 3 demonstrates a binary comparison method: either the sounds are the same, with no transformation cost, or they are different, with a cost of one. There are also gradual measures that break each phone down into feature bundles and then compare the feature bundles, so that the difference between [i] and [-] is less than that between [i] and [']. However, Heeringa, Kleiweg, Gooskens and Nerbonne (2006) and Heeringa (2004) found that binary segment differences were as valid as gradual measures: correlations with the results of perceptual experiments were just as high. This is a surprising result from a historical linguist’s standpoint. One would expect transformations between more similar sounds to carry less weight, just as they do in historical linguistics.
McMahon and McMahon (2005) criticize phone string comparison for its bluntness. However, the method is validated externally by actual perception experiments, and it can also be justified by examining how native speakers process differences. Heeringa (2004) suggests that because native speakers are sensitive to very slight differences in pronunciation, giving small differences full weight makes sense in the context of the correlation between Levenshtein distance and speakers’ perceptions. As shown in Section 3.1.3, the effect of slight differences on intelligibility depends on how the two compared phonological systems correspond, not on whether the sounds are judged alike by criteria outside those systems. To prevent longer words from having undue weight in the calculation of average distance, a normalization function was used in which the total cost is divided by the length of the longest alignment. The distances were then expressed as proportions in decimal form. In Figure 3, the total cost of 5 is divided by 9, the alignment length of the two cognates, for a normalized cost of .56.

Location | Transcription | Operation | Cost
E-DC | l+LEpuMF | |
 | laLEpuMF | substitute a for + | 1
 | laLEpaMF | substitute a for u | 1
 | laLEpaLF | substitute L for M | 1
 | laLEpaLE | substitute E for F | 1
CW-QY | laLEpa#LE | insert # (harsh voice) | 1
 | | total cost | 5
 | | normalized cost | 0.56

Figure 3. Operations in calculating Levenshtein distance
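The calculation in Figure 3 can be reproduced with a short dynamic-programming sketch in Python. It is an illustration under stated assumptions, not the RuG/L04 implementation: transcriptions are assumed to be pre-tokenized as in Figure 3 (one token per segment, with tone split into an onset symbol and a contour symbol), every insertion, deletion and substitution costs 1, and when several alignments share the minimum cost the longest one is kept for normalization. The function names levenshtein and normalized_distance are hypothetical.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> Tuple[int, int]:
    """Unit-cost edit distance between two token sequences.

    Returns (cost, alignment_length). Identical tokens cost 0; substitutions,
    insertions and deletions cost 1. Ties between minimum-cost alignments go
    to the longest alignment, since normalization divides by that length.
    """
    m, n = len(a), len(b)
    # dp[i][j] = (cost, alignment length) for aligning a[:i] with b[:j]
    dp: List[List[Tuple[int, int]]] = [[(0, 0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = (i, i)  # delete the first i tokens of a
    for j in range(1, n + 1):
        dp[0][j] = (j, j)  # insert the first j tokens of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag_cost, diag_len = dp[i - 1][j - 1]
            up_cost, up_len = dp[i - 1][j]
            left_cost, left_len = dp[i][j - 1]
            same = a[i - 1] == b[j - 1]
            candidates = [
                (diag_cost + (0 if same else 1), diag_len + 1),  # match or substitution
                (up_cost + 1, up_len + 1),  # deletion of a token from a
                (left_cost + 1, left_len + 1),  # insertion of a token from b
            ]
            # Lowest cost wins; among equal costs, the longer alignment wins.
            dp[i][j] = min(candidates, key=lambda c: (c[0], -c[1]))
    return dp[m][n]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Total cost divided by alignment length (0.0 for two empty sequences)."""
    cost, length = levenshtein(a, b)
    return cost / length if length else 0.0


# 'tiger' tokenized as in Figure 3: tone as onset + contour symbols,
# '#' for harsh voice, '+' copied from the E-DC transcription.
e_dc = ["l", "+", "L", "E", "p", "u", "M", "F"]
cw_qy = ["l", "a", "L", "E", "p", "a", "#", "L", "E"]
print(levenshtein(e_dc, cw_qy), round(normalized_distance(e_dc, cw_qy), 2))
# prints (5, 9) 0.56
```

Because the CW-QY form is one token longer and only four tokens can match, every minimum-cost alignment here needs exactly one insertion, which is why the cost of 5 and the alignment length of 9 in Figure 3 come out directly.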

A subset of 400 lexical items was selected from the 1,000-item word list and then analyzed using the RuG/L04 dialectometry software package developed by Peter Kleiweg (2004). The L04 software requires wordlists in machine-readable code, not Unicode IPA, so the Unicode transcriptions were converted into X-SAMPA (Wells, 2005). The mean Levenshtein distance was calculated for each pair of varieties from the Levenshtein distances of all pairs of lexical items, resulting in a distance matrix. The Levenshtein distance matrix is then used as input into the tree- and network-building program NeighborNet, first developed by Bryant and Moulton (2004) for use in evolutionary biology. Quantitative historical linguists such as April McMahon’s team at the University of Edinburgh have used NeighborNet to produce implicit phylogenetic trees based on lexical comparison (McMahon, 2005; McMahon and McMahon, 2005). The use of NeighborNet here is primarily phenetic, not cladistic; that is, NeighborNet uses the Levenshtein distance matrix to present a network of Lalo varieties based on their overall phonetic similarity, not on their shared innovations. NeighborNet presents a snapshot of cross-varietal relations that includes all synchronic similarity at the phonetic level, whether the similarities result from retentions, shared innovations significant for subgrouping, contact-based change, or parallel developments. McMahon et al. (2007) use NeighborNet in a very similar way and defend this use by pointing out that the goal in this kind of dialectometry is to reveal synchronic relationships, not diachronic ones. Phenograms such as the Lalo clusters seen in Figure 5 (Section 3.2) are diagrams that depict ‘taxonomic relationships … based on overall similarity ... without regard to evolutionary history or assumed significance of specific characters’ (Dictionary.com).
As such, they do not illustrate genetic relatedness; rather, they show a synchronic snapshot of the relative phonetic distance between closely related varieties. The phenograms show varieties’ relative distance from each other in terms of their pronunciation of the cognates in the word list. They often resemble a phylogenetic tree, but not necessarily, especially if there has been significant contact. Frequent sound correspondences have a greater impact on the phonetic distance than infrequent ones, since frequent correspondences add to the differences in many pair-wise comparisons, while infrequent ones do so in only a few. For example, the /a/-/o/ correspondence between Central and Eastern varieties occurs in roughly ten percent of the lexical items; if it occurred in only five percent, it would have a smaller impact on the overall distance between the varieties. However, those frequent sound correspondences are the very ones that are likely to be salient to listeners perceiving differences. Whether the frequency of a sound correspondence assists cross-dialect intelligibility is discussed further in Section 3.2. As McMahon et al. (2007) note, one of the advantages of NeighborNet is its ability to represent multiple tree structures in a single diagram. If there are similarities that are incompatible with one tree, NeighborNet still shows them through reticulated lines that form a net-like structure. In this way, ambiguities or mixed signals in the data are made explicit, instead of being collapsed into a single line as they are with clustering methods such as UPGMA (Unweighted Pair Group Method with Arithmetic mean, also known as the average linkage method). When measuring phonetic distance, lexical variation is filtered out, and only cognates are accepted as data.
However, Lalo is an isolating language with many disyllabic words, in which one syllable may be cognate with other dialects while the other is not. Filtering out this “sub-lexical” variation would place the Levenshtein distance at the morpheme level, not the lexical level, removing it even further from the context in which communication occurs. Therefore, I chose not to remove the non-cognate syllables in disyllabic words. When two words had no overlapping syllables, I treated them as different lexemes.
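The step from item-level distances to the variety-by-variety matrix can be sketched as follows. The helper distance_matrix is hypothetical and not part of RuG/L04: it accepts any item-level distance function (a normalized Levenshtein distance, for example), averages it over the glosses that both word lists contain, and stores each result symmetrically. Following the procedure above, it assumes that pairs treated as different lexemes have already been left out, while non-cognate syllables inside partially cognate words stay in the transcriptions. The toy varieties and forms are invented for illustration and are not Lalo data.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Tokens = Sequence[str]
WordList = Dict[str, Tokens]  # gloss -> tokenized transcription
ItemDistance = Callable[[Tokens, Tokens], float]


def distance_matrix(
    wordlists: Dict[str, WordList], item_distance: ItemDistance
) -> Dict[Tuple[str, str], float]:
    """Mean item distance for every pair of varieties, stored symmetrically.

    Only glosses present in both word lists are compared; forms judged to be
    different lexemes are assumed to be absent from the word lists already.
    """
    matrix: Dict[Tuple[str, str], float] = {}
    for v1, v2 in combinations(sorted(wordlists), 2):
        shared = wordlists[v1].keys() & wordlists[v2].keys()
        if not shared:
            continue  # no comparable items for this pair
        d = mean(item_distance(wordlists[v1][g], wordlists[v2][g]) for g in shared)
        matrix[(v1, v2)] = matrix[(v2, v1)] = d
    return matrix


# Toy demonstration: invented forms for three hypothetical varieties and a
# crude 0/1 item distance standing in for normalized Levenshtein distance.
def same_or_not(a: Tokens, b: Tokens) -> float:
    return 0.0 if list(a) == list(b) else 1.0


toy = {
    "A": {"tiger": ["l", "a", "p", "a"], "star": ["k", "i"]},
    "B": {"tiger": ["l", "a", "p", "a"], "star": ["k", "e"]},
    "C": {"tiger": ["l", "o", "p", "u"]},
}
for pair, d in sorted(distance_matrix(toy, same_or_not).items()):
    print(pair, d)
```

A symmetric matrix of this kind is the sort of input that distance-based methods such as NeighborNet take.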

3. Findings

3.1. Isoglossic Patterns

3.1.1. Divergent Developments of Tones in Northwestern and Eastern Lalo

Table 2 below summarizes the development of tones in Central, Eastern and Northwestern Lalo from Proto-Lalo, based on Bradley’s (1977; 1979b) reconstruction of Proto-Ngwi. Proto-Lalo, like Proto-Ngwi, had a three-way pitch height contrast in syllables ending in vowels or nasals: *1, high; *2, low; and *3, mid. In Proto-Ngwi closed syllables, there was a two-way pitch height distinction, *High-stopped and *Low-stopped (Matisoff, 1972). Syllable-final stops eventually merged to a glottal stop and then to laryngealized vocal register on the vowel, as seen in most Ngwi languages. Proto-Lalo *High-stopped (*H) was a mid pitch with harsh phonation, and *Low-stopped (*L) a low pitch with harsh phonation, as is still seen today in Central Lalo varieties. In this section, I present the divergent developments of tones that provide evidence for distinct NW and E clusters. All NW varieties share a tonal innovation chain: Tone *1 becomes low rising except in syllables with *preglottalized sonorants, *L becomes high, and *1 in *preglottalized syllables merges with *H to mid-high. Harsh phonation is completely lost in NW, enabling the merger of *1 and *H. Eastern varieties share a distinct chain, though with some parallels to NW: Tone *1 becomes low rising in syllables with voiced initials and remains high elsewhere, and *3 and *H merge to mid. E-HS and E?-TS lose harsh phonation in *H, but not in *L, while E-DC loses harsh phonation completely.

Proto-Lalo | Central Lalo | Northwestern Lalo | Eastern Lalo
*1: High | High [55] | a) Mid-high [44] /[*ʔ-sonorants]_; b) Low rising [24] /elsewhere | a) High [55] /[*-voi]_; b) Low rising [24] /[*+voi]_
*2: Low, breathy | Low, breathy [21] | Low [21] | Low [21]
*3: Mid | Mid [33] | Mid [33] | Mid [33], modal
*H: Mid, harsh | Mid, harsh [33] | Mid-high, modal [44] | Mid [33], modal (merged with *3)
*L: Low, harsh | Low, harsh [21] | High, modal [53] | Mid falling, modal [31], in E-DC; Low, harsh [21], in E-HS, E?-TS

Table 2. Central, Eastern and Northwestern Lalo tonal development from Proto-Lalo

Table 3 below presents the innovations in tone shown by various Lalo varieties; a check mark denotes the locations that show a particular innovation. Innovations considered significant for these clusters are enclosed with lines. Eastern varieties all show the merger of *3 and *H as well as the Tone *1 split based on voicing of the initial. Core NW varieties (SLZ, SZP, and DT) all show the same unusual conditioning for a split to low rising: low rising after both voiced and voiceless initials, remaining high only in syllables with *preglottalized sonorants. Non-typical NW?-YL shows the Eastern-type Tone *1 split. All NW varieties, even the non-typical NW?-YL, share a tonal innovation chain in which *L > high [55], and *1 in syllables with *preglottalized sonorants merges with *H to mid-high [44]. NW Lalo tone change is discussed further in Yang (accepted). The possibility that NW and E varieties once shared an Eastern-type Tone *1 split based on voicing is discussed below. Peripheral varieties YL and XZ show divergent tonal innovations that do not qualify them for membership in any of the lower-level clusters. XZ shows the Eastern-type Tone *1 split, and also a three-way merger of *L, *H and *3 to mid. YL shows an unusual split in Tone *1, becoming mid after voiceless initials and low rising elsewhere, a split that must have occurred after the loss of preglottalization, as discussed further below. YL also shows the merger of *L and *H to high.

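Innovation | YL | XZ | E-DC | E-HS | E?-TS | NW-DT | NW-SLZ | NW-SZP | NW?-YL | CE-YA
*1 > low rising /[+voi]_ |  | ✓ | ✓ | ✓ | ✓ |  |  |  | ✓ | ✓
*H, *3 > mid [33] |  |  | ✓ | ✓ | ✓ |  |  |  |  | 
*1 > low rising /non-ʔ-_ |  |  |  |  |  | ✓ | ✓ | ✓ |  | 
*L > high [55] |  |  |  |  |  | ✓ | ✓ | ✓ | ✓ | 
*1 /+ʔ-_, *H > mid-high [44] |  |  |  |  |  | ✓ | ✓ | ✓ | ✓¹ | 
*L, *H > high [55] | ✓ |  |  |  |  |  |  |  |  | 
*1 > mid /[-voi]_; low rising /elsewhere | ✓ |  |  |  |  |  |  |  |  | 
*L, *H, *3 > mid [33] |  | ✓ |  |  |  |  |  |  |  | 
Loss of harsh phonation | ✓ | ✓ | ✓ | (p) | (p) | ✓ | ✓ | ✓ | ✓ | 

Table 3. Innovations in tone from Proto-Lalo ((p) = partial loss, in *H only)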

¹ NW?-YL merges *1 and *H, but with slightly different conditioning from other NW varieties: in NW?-YL, *1/[-voi]_, *H > mid-high [44].

Bradley (1979b; 2004) demonstrates that Central Ngwi languages typically show a conditioned tone split in Proto-Ngwi Tone *1, albeit with varying conditioning environments. In Lisu, syllables without a *ʔ- or *s- prefix show a mid-level pitch 33 (Lisu Tone 4), while syllables with a *ʔ- or *s- prefix have a mid-high level reflex 44 (Lisu Tone 3). Lahu also shows a conditioned split in Tone *1, but with a different conditioning environment: voiceless initials condition a mid-level pitch 33 (Lahu Tone 5), and voiced initials a low-falling pitch 21 (Lahu Tone 2) (Bradley, 1979b). Interestingly, Central Lalo varieties do not show any split in Proto-Ngwi Tone *1, unlike other Central Ngwi languages; neither do the Southern Ngwi language Hani (*1 > [55]) or the Northern Ngwi languages Nosu and Nasu (*1 > [33]). Central Lalo’s consistently high pitch reflex for Tone *1 supports the hypothesis that Proto-Ngwi Tone *1 was phonetically high, as Bradley (1977; 1979b) argues. Several other Lalo varieties, in contrast with C Lalo, do have a conditioned tone split in *1, resulting in the creation of a contrastive low rising tone. Table 4 below gives examples in E, NW and other varieties. In E Lalo, along with XZ, NW-YL and CE-YA, the conditioning is a combination of the two conditioning environments that caused the splits in Lisu and Lahu, respectively, i.e. voicing and *ʔ-/*s- prefixes. Crucially for this split, Proto-Lalo retained the Proto-Ngwi *ʔ- prefix before sonorants, resulting in a distinctive series of preglottalized sonorants *ʔm, *ʔn, *ʔl, *ʔv (*ʔv comes from Proto-Ngwi *ʔw). Both *preglottalized and *voiceless initials condition a high pitch reflex, but plain, non-prefixed *voiced initials condition a low rising pitch reflex. When preglottalization was lost in these varieties, the low rising pitch became a contrastive tone. The phonetic seeds of this tone split can be seen in Central Lalo’s subphonemic variation, where allotones for Tone *1 are conditioned by the manner of articulation of the initial: syllables with voiced initials have a rising [45] pitch, and syllables with voiceless initials and preglottalized sonorants (*ʔm, *ʔn, *ʔl, *ʔw) show a high-level [55] pitch. Most likely, this same allotonic variation was found in E Lalo and other Lalo varieties (NW-YL, CE-YA, XZ), setting the stage for the tone split. When these varieties merged *preglottalized sonorants with *plain sonorants, what was previously subphonemic variation became contrastive, and the phonemic low rising tone category was born. NW Lalo, on the other hand, shows a more unusual conditioning: both Proto-Lalo *voiced and *voiceless initials condition a low rising pitch. Only *preglottalized sonorants condition a mid-high pitch reflex, although there are a few exceptions to this rule, as shown in Table 4 below. In NW-DT and NW-SZP, *ʔv may have merged with the plain initial before the split, because words such as ‘to winnow,’ *)va", have a low rising pitch.
A few syllables with voiceless aspirated initials, such as *ts(a*" ‘person’ and *t.(i" ‘rice’, unexpectedly show a mid-high pitch. In general, syllables with Proto-Lalo voiceless initials, such as *xy" ‘iron’ and *ts(y" ‘hair’, show a low rising pitch in core NW varieties. NW?-YL, a non-prototypical NW variety, shows the Eastern-type Tone *1 split, with all voiceless initials showing a mid-high reflex. NW?-YL is still classified as a NW variety because it shares the NW tonal push chain in which *L becomes high, pushing the remnants of Tone *1 down into *H’s territory and resulting in a merger of *1 and *H. YL, a peripheral variety found in Baoshan in the far west of the Lalo distribution, shows yet another variant of the Tone *1 split, now seen only in the speech of older speakers. Speaker 5b (over 50 years old) shows a split in which syllables with *voiceless initials merge with *3 to mid, while all other initials, including PLa *preglottalized sonorants, show a low-rising pitch. That the low-rising pitch occurs even with PLa *preglottalized sonorants suggests that this split occurred after the loss of *preglottalization, so that *preglottalization had no effect on the development of the pitch. This ordering contrasts with all other Lalo varieties, in which *preglottalized sonorants must have been present at the time of the Tone *1 split to condition the development of the pitch. Speaker 5a (18 years old) has merged the tonal reflexes for voiced and voiceless initials to a mid level [33] pitch. This generational difference suggests that YL is undergoing a merger of the Tone *1 [voiced initial/no initial] category with the already merged Tone *1 [voiceless initial] and Tone *3 categories. Eka, a peripheral Lalo variety found at the southern extreme of the Lalo distribution, shows a consistent low rising tone for almost all reflexes of Proto-Lalo Tone *1; only a few syllables with voiceless aspirated initials show a high pitch.

Gloss | Proto-Lalo | XZ | E-DC | CE-YA | NW?-YL | NW-SLZ | NW-DT | NW-SZP | YL-old | YL-young | Eka
light (adj.) | *laŋ" | la!& | la!& | lu!& | lu!& | lu!& | la!& | lu!& | lu!& | lu$$ | la!&
sick | *na" | no!& | n+!& | na!& | n/!& | na!& | no!& | na!& | na!& | na$$ | no!&
bamboo | *ma" | mo!& | m+!& | mæ!& | m/!& | ma!& | mo!& | ma!& | ma!& | ma$$ | /
to winnow | *ʔva" | o%% | +%% | va%% | v/&& | aŋ!& | o!& | va&& | oaŋ!& | oaŋ$$ | o!&
tongue | *ʔla" | lo%% | l+%% | la%% | l/&& | la&& | lo&& | la&& | la!& | la$$ | lo!&
stick | *ʔm0" | m01%% | m0%% | / | / | m01&& | m01&& | m0&& | / | / | /
listen | *ʔna" | no%% | n+%% | n'%% | n/&& | na&& | no%% | na&& | / | na$$ | /
person | *tsʰaŋ" | tsʰa%% | tsʰa%% | tsʰu%% | tsʰu&& | tsʰu&& | tsʰa!& | tsʰu&& | / | tsʰu$$ | tsʰa!&
head hair | *tsʰy" | tsʰy%% | tsʰ2%% | t3ʰi%% | t3ʰy&& | t3ʰy!& | t3ʰy!& | t3ʰy!& | / | tsʰ2$$ | tsʰ2!&
iron | *xy" | xv4%% | x2%% | hu%% | 3y&& | xue!& | xy!& | xu!& | x5$$ | x5$$ | x2!&
star | *k0" | k0%% | k0%% | k6%% | k6&& | k0!& | k0!& | k0!& | k0$$ | k0$$ | ki!&
Table 4. Splits in Tone *1

The similarity between the Tone *1 splits in NW, E and other Lalo varieties raises the question of whether they reflect a shared innovation. Under a shared-innovation hypothesis, the Eastern-type version of the Tone *1 split (*1 > low rising/+voi_, high elsewhere) would have occurred first among most non-Central varieties, and only in NW varieties would the low rising tone then have spread to syllables with voiceless, non-preglottalized initials, resulting in the conditioning now seen (*1 > mid-high/preglottalized_, low rising elsewhere). However, for several reasons I do not posit a link between NW and E at this time, and leave this question for future exploration. First, voiced, non-preglottalized initials condition a split to low rising not only in Eastern, but also in the peripheral varieties YL and XZ, in one NW variety (NW-YL), and in one Central variety (CE-YA). Exactly the same conditioning also causes the same split in Tone *1 in non-Lalo languages of the Central Ngwi branch: Limi, spoken in Lincang Prefecture, and Gomotage, spoken in Eryuan County in Dali Prefecture, based on preliminary analysis of wordlists I gathered there during fieldwork in 2008. The non-contiguous distribution of the split within Lalo, where it occurs in each of the NW, E, and C clusters and in peripheral Lalo groups, and even in languages that are clearly not Lalo, makes this split a weak criterion for linking dialect clusters. Second, the phonetic naturalness of this change also lessens its value for subgrouping. Voiced initials conditioning a low pitch follows the voiced-low principle found in many other tonal languages, such as Lahu (Bradley, 1977) and Tai languages (Li, 1977). Eastern Lalo’s Tone *1 split presents a slight variation on the voiced-low principle: only the tonal onset is lowered, not the pitch of the entire syllable. Since the original pitch target of Tone *1 was high, depressing the onset results in a rising tone.
Moreover, the allotonic variation seen in Central Lalo (high with voiceless and preglottalized initials, rising with voiced initials) suggests that the contrastive low rising tone follows naturally from the merger of preglottalized and plain sonorants. Third, the split appears to have occurred in CE-YA as a result of contact with Eastern varieties. CE-YA is located in Yongjian Township in Weishan County, just south of Dali and right on the border between Central and Eastern Lalo. Ties between the two groups through marriage, family and friendship result in frequent, sustained contact. Most other innovations in initials and finals, however, place CE-YA as a Central Lalo dialect, not an Eastern one: CE-YA is at heart a Central dialect, but in close contact with Eastern Lalo. Other East Mountain varieties vary as to whether they show the split. Lam (2009, personal communication) reports that while villages in Yongjian (e.g. Yong’an) and Northwest Dacang (e.g. Xinsheng) townships show the split, villages just to the south of this area do not. Speakers in Caochang village in Miaojie Township, south of Dacang, say that their ancestors moved there from Northwest Dacang (Xinsheng) 200 years ago. Caochang speakers do not show the split, and instead have a high level tone for the entire Tone *1 category. This implies that the split has occurred within the past 200 years in those East Mountain varieties located closest to Eastern Lalo. The geographical distribution of the tone split within the East Mountain area thus suggests diffusion from Eastern Lalo rather than shared innovation. Finally, several innovations in the initials suggest a possible link between NW and C, to the exclusion of the E cluster, as shown in Section 3.1.2. Weighed against this series of innovations, the E and NW Tone *1 splits, which do not even share the same conditioning, are weak evidence for higher-level subgrouping.
The non-contiguous distribution of the Eastern-type split, including its occurrence in languages that are clearly not Lalo, the phonetic naturalness of its conditioning, and the apparent ease with which it spreads to neighboring varieties all weaken its value as a unique, subgroup-identifying criterion. The sets of tonal innovations described above distinguish the NW and E clusters from each other and from C Lalo, which retains the Proto-Lalo tonal system. Innovations in initials, particularly the divergent paths of Proto-Lalo bilabial stops before *o and *i, distinguish Central Lalo from NW and E Lalo, as described in Section 3.1.2 below.
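The two conditioning patterns compared above can be stated as small decision functions. The Python sketch below is illustrative only: the three initial classes (`Initial`) and the reflex labels are a simplification assumed for this example, and it ignores the lexical exceptions noted for Table 4 (such as the aspirated-initial exceptions in NW).

```python
from enum import Enum


class Initial(Enum):
    """Coarse classes of Proto-Lalo initials relevant to the Tone *1 split (simplified)."""
    VOICELESS = "voiceless"            # e.g. *x, *k
    PLAIN_VOICED = "plain voiced"      # e.g. *b, *m, *l without a glottal prefix
    PREGLOTTALIZED = "preglottalized"  # *ʔm, *ʔn, *ʔl, *ʔv


def eastern_type(initial: Initial) -> str:
    """Eastern-type split: low rising after plain voiced initials, high elsewhere."""
    return "low rising" if initial is Initial.PLAIN_VOICED else "high"


def core_nw_type(initial: Initial) -> str:
    """Core NW split: mid-high after preglottalized sonorants, low rising elsewhere."""
    return "mid-high" if initial is Initial.PREGLOTTALIZED else "low rising"


def classes_treated_differently() -> list:
    """Initial classes whose Tone *1 reflex label differs between the two patterns."""
    return [c for c in Initial if eastern_type(c) != core_nw_type(c)]


if __name__ == "__main__":
    for c in Initial:
        print(f"{c.value:15} E: {eastern_type(c):11} NW: {core_nw_type(c)}")
```

Only the plain voiced class receives the same (low rising) label under both patterns; voiceless and preglottalized initials pattern differently, which is the sense in which the two splits do not share the same conditioning.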

3.1.2. Divergent Developments of Initials in Central and Northwestern Lalo

Divergent innovations of the bilabial stops before PLa *o and *i provide partial evidence for lower-level divisions between Central, Northwestern and Eastern Lalo. Table 5 below summarizes these innovations. In both the table and the following discussion, *b is used as a cover symbol for all PLa bilabial stops (*pʰ, *p, *b), since the set develops in parallel. The first column in Table 5 shows the relative ordering of the changes shared by C and NW (unmarked) and of the separate innovations in C (marked C) and in NW (marked NW, which also covers XZ and E?-TS).

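# | Innovation | XZ | E?-TS | NW-DT | NW-SLZ | NW-SZP | NW?-YL | CE-YA | CW-QY | C-WC | C-CJ | C-QS | C-LB | C-LJ
1 | *bo > *[bvi] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
2.C | phonologization of *bvi | | | | | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
3.C | *bv > v | | | | | | | ✓ | ✓ | ✓ | ✓ | ✓ | |
3.C | *bv > dʒ | | | | | | | | | | | | ✓ | ✓
2.NW | *bi > *[bvi] | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | |
3.NW | *[bvi] > dzɿ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | |
Table 5. Innovations in bilabial stops before *o and *i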

In all Northwestern varieties (plus XZ and E?-TS), bilabial stops became alveolar affricates before *o (e.g. *bo > /dzɿ/), but in most Central varieties, bilabial stops became labiodental fricatives (e.g. *bo > /vi/). Most Eastern Lalo varieties (excluding E?-TS) do not share this (af)frication innovation; instead, *b remains a stop before both *o and *i. The same holds for peripheral Lalo varieties such as YL, MD, and Eka. The divergent innovations in *bo (> /dzɿ/ in NW, > /vi/ or /dʒi/ in C) point to an intermediate stage shared by NW and C, in which *b was realized as labiodental *[bv] before Proto-Ngwi *o (corresponding to Matisoff’s (2003) *2w). Some Lalo varieties still show remnants of this *labiodental affricate, as in C-QS pf!i"" ‘rooster’ (< *pʰo"). Although the exact phonetic quality of the Proto-Lalo vowel is uncertain, it may have been pronounced with lip compression, an articulation in which the lower lip pushes up against the upper lip. Lip compression of the vowel may have spread to the labial initial, inserting a labiodental transition between the initial stop and the vowel. The same spread of lip compression to the initial is seen in many Lalo dialects' reflex fv#"" ‘silver’ from *pʰu". Lahu also has labiodental affricates as reflexes of *bo (> /pu/ [pf5]) and *po (> /pʰu/ [pfʰ5]) as a result of synchronic processes: in Lahu, labiodental affricates are allophones of bilabial stops before /u/, which in this environment is always produced with lip compression. The *labiodental affricates remained allophones of the bilabial stops until *o became i after labial stops, another change shared by NW and C varieties. This change phonologized the labiodental affricate in all Central varieties and in NW?-YL, creating a contrast between *bvi and *bi.
This can be seen in the separate correspondence sets of *bo (> *bvi), which became either /vi/ or /dʒi/, and *bi, which in most C varieties remained /bi/. See Table 6 and Table 7 below for examples of the separate developments of *bo and *bi. The feature that distinguished these two correspondence sets was the manner of the initial (*bv versus *b), which prevented their merger even after *o > i after labial stops. The intermediate stage of *labiodental affricates, although now seen only in a few words in C-QS (e.g. pf!i"" ‘rooster’), makes sense given their development into labiodental fricatives in most C varieties and into palato-alveolar affricates in C-LB and C-LJ. In most NW varieties (plus peripheral XZ and non-typical E?-TS), no such phonologization of labiodental affricates took place. Instead, *bi and *bo merged completely to /dzi/ [dzɿ]. The apical vowel [ɿ] was at that stage an allophone of /i/ after alveolar affricates or fricatives, though subsequent changes made this vowel an allophone of close /6/ in several varieties. The merger of both *bo and *bi into an alveolar affricate suggests that *bi also had a non-contrastive labiodental fricative transition between the initial and the close vowel, e.g. *bi [bvi]. The labiodental affricates from both *bo and *bi then became alveolar affricates. In NW?-YL, a non-typical member of the NW group, *bo > /dzi/, but *bi remains /bi/. NW?-YL may share with Central the creation of contrastive labiodental affricates before /i/, i.e. *bvi versus *bi. However, NW?-YL shares with other NW varieties the innovation whereby *bv (a contrastive phoneme in NW?-YL, but a non-contrastive allophone of /b/ before /i/ in core NW) became an alveolar affricate. The evidence does not support a shared intermediate stage in which Central Lalo had *alveolar affricates *tsʰ and *dz that later simplified to f and v.
Proto-Lalo already had *alveolar affricates, and if Central Lalo had merged the *labiodental affricates with them, the series could not then have split again into today's distinct labiodental fricatives and alveolar affricates. Table 6 below gives examples of the developments of *bo, while Table 7 shows *bi. CE-YA shows variation in its reflexes of *bo, in keeping with its transitional nature and contact effects. While CE-YA’s reflexes of *pʰo!, *bo" and *bo! match the usual Central Lalo reflexes, *pʰo" ‘rooster’ and *bo$ ‘owe (v.)’ show the alveolar affricate. The majority of Central Lalo varieties (CW-QY, C-CJ, C-WC, C-QS) show a consistent labiodental fricative. C-LJ and C-LB show the palato-alveolar affricates /tʃʰ/ and /dʒ/ (realized as t$! and d%). This is not an innovation shared with NW Lalo, since C-LJ and C-LB show palato-alveolar rather than alveolar affricates, and they do not merge *bo and *bi as most NW varieties do. Comparing the two tables reveals that NW-DT, a representative of the core NW group, has completely merged *bi and *bo to /dzi/ [dzɿ]. In contrast, NW?-YL shows /dzi/ for *bo but /bi/ for *bi, making it a non-typical NW variety. CE-YA, while somewhat erratic in its reflex /vi/ for *bo, nonetheless consistently shows /bi/ for *bi. C-CJ shows /vi/ for *bo, but /dzi/ for *bi. CW-QY shows an almost complete merger of *bo and *bi to /vi/, except for the voiceless unaspirated stop in /pi!"/ ‘classifier for grandparent + grandchild.’ C-LJ consistently shows /dʒi/ for *bo, but irregularly shows both /dʒi/ and /bi/ for *bi. C-LJ’s reflex /f0#!"/ for *pʰi#: is unexpected (the expected reflex is t3ʰi#!") and may be a loan from surrounding C varieties.

Gloss | Proto-Lalo | NW-DT | NW?-YL | CE-YA | C-CJ | CW-QY | C-LJ
crow (v.) | *bo" | dzɿ!& | dzɿ!& | vi!& | vi%% | vi%% | dʒi%%
carry (v.) | *bo! | dzɿ!" | dzɿ!" | vi!" | vi

Gloss | Proto-Lalo | NW-DT | NW?-YL | CE-YA | C-CJ | CW-QY | C-LJ
pus | *bi" | dzɿ!& | bi!& | bi!& | dzɿ!& | vi%% | dʒi%%
full | *bi$ | dzɿ$$ | bi$$ | bi$$ | dzɿ$$ | vi$$ | bi$$
spit | *pʰi#: | tsʰɿ&% | / | pʰi#!" | tsʰɿ#!" | fi#!" | f0#!"
older woman | *pʰi! | tsʰɿ!" | pʰi!" | pʰi!" | tsʰɿ!" | fi!" | pʰi!"
grandparent + grandchildren (classifier) | *pi! la! | tsɿ!" | pi!" | pia!" | tsɿ!" | pi!" | pi!"
Table 7. *bi

There is a distinct but somewhat parallel correspondence set in which the aspirated bilabial stop *pʰ becomes f before the syllabic fricative v4, while unaspirated *p becomes the velar stop k before *v4. Table 8 below gives examples. This occurs in Central, Eastern and some NW varieties. The peripheral varieties YL, XZ, Eka, and MD do not take part in this set of innovations, which may have occurred after these groups left the Lalo homeland area; instead, they retain bilabial stops. E-DC/HS, NW-SLZ/SZP, and C-LJ/NJ/QS/CJ all show k for *p and f for *pʰ. CE-YA probably had f for *pʰ at an earlier stage, but has since weakened it to h. NW-YL and C-LB, which are neighboring varieties, show velar stops for both *p and *pʰ. CW-QY and C-WC show p for *p and f for *pʰ. These additional correspondence sets place the innovations of *bo and *bi in context: except for the peripheral varieties, all Lalo varieties show interaction between the bilabial stops and the following rhyme.

Gloss | Proto-Lalo | XZ | NW?-YL | E-DC | CE-YA | NW-SZP | C-LJ | C-CJ | CW-QY
porcupine | *pu" | / | kv4%% | k=>%% | k2%% | / | kv4%% | kv4%% | /
steam | *pu! | / | kv4!" | k=>&! | k2!" | kv4!" | kv4!" | kv4!" | p6!"
open | *pʰu$ | pʰv14$$ | kʰ?v4$$ | / | h2$$ | fv4$$ | fv4$$ | fv4$$ | fv4$$
silver | *pʰu" | pʰv14%% | kʰ?v4%% | f=>%% | h2%% | fv4!& | fv4%% | fv4>%% | fv4%%
Table 8. *pu and *pʰu

These innovations in the bilabial stops provide evidence for grouping all Central Lalo varieties together, and partial evidence for grouping the NW varieties. All Central Lalo varieties share the process whereby *bo > bvi, the creation of contrastive labiodental affricates, and the subsequent simplification of those affricates to either labiodental fricatives or palato-alveolar affricates. Proto-Central Lalo retained *bi as bi, as seen in most C varieties today, with further developments in C-CJ (*bi > dzi) and in CW-QY and C-WC (*bi > vi, except for the voiceless unaspirated stop). All Northwestern Lalo varieties share the change *bo > dzi, and all core NW varieties show a complete merger of *bo and *bi > dzi. Complicating this neat subgrouping is the fact that NW?-YL participates partially in both the Central and the Northwestern innovations, while XZ and E?-TS participate in all the NW innovations. For subgrouping NW, therefore, these changes provide less clear evidence than the tonal innovation chain described in Section 3.1.1. For Central Lalo, however, these unusual innovations group the Central varieties together in contrast to Northwestern and Eastern Lalo.
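The relative chronology argued for in this section can be checked mechanically by applying ordered rewrite rules to schematic syllables. In the Python sketch below, `b` stands for the cover symbol *b, tone and phonation are ignored, the rule lists are a simplified reading of Table 5 assumed for illustration, and the later C-CJ and CW-QY developments of *bi are left out.

```python
import re

# Each rule is a (regex, replacement) pair; rules apply once each, in the order listed.
SHARED = [
    (r"bo", "bvo"),        # labiodental transition before *o (allophonic at first)
    (r"(bv?)o", r"\1i"),   # *o > i after labial stops (phonologizes *bv in C and NW?-YL)
]
CENTRAL = SHARED + [(r"bv", "v")]          # most C varieties: *bv > v
CENTRAL_LB_LJ = SHARED + [(r"bv", "dʒ")]   # C-LB, C-LJ: *bv > dʒ
NW_CORE = SHARED + [
    (r"bi", "bvi"),        # core NW: *bi also acquires the [bv] transition
    (r"bv", "dz"),         # *bv > dz, merging *bo and *bi as /dzi/ [dzɿ]
]
NW_YL = SHARED + [(r"bv", "dz")]           # NW?-YL: *bo > dzi, but *bi stays bi


def derive(form: str, rules: list) -> str:
    """Apply each rewrite rule once, in order, to a schematic proto-form."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for name, rules in [("C", CENTRAL), ("C-LB/LJ", CENTRAL_LB_LJ),
                        ("core NW", NW_CORE), ("NW?-YL", NW_YL)]:
        print(f"{name:8} *bo > {derive('bo', rules):4} *bi > {derive('bi', rules)}")
```

Under these rules Central keeps *bo and *bi apart (vi versus bi), core NW merges them (dzi), and NW?-YL patterns with NW for *bo but with Central for *bi, which is the mixed profile described above.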

3.1.3. Divergent developments of rhymes in Central and Eastern Lalo

Eastern Lalo and Central Lalo’s divergent developments of rhymes give partial evidence for grouping them as distinct clusters. Central varieties share the merger of *e and *i to i in both modal and harsh phonation, and the merger of *0# and *a# to a. E varieties share the merger of *o, *y, and *6 to 6, an innovation unique to this cluster, although non-typical E?-TS shows only partial participation. These changes and others are presented in Table 9 below.

E- E- E? XZ NW NW NW? CE- CW C- C- C- C- C- DC HS TS DT SLZ YL YA QY WC CJ QS LB LJ *o > i/labial stops_ ! ! ! ! ! ! ! ! ! ! ! ! *o > v4/ velars, m_ ! ! ! ! ! ! ! ! ! ! ! *a* > u ! ! ! ! ! ! ! ! ! *e, e# > i, i# ! ! ! ! ! ! ! ! (p) *0# > a# ! (p) ! ! ! ! ! ! *o, *y, *6 > 6 ! ! (p) *a > o ! ! ! ! ! *a* > a ! ! ! ! ! Table 9. Divergent developments of rhymes in Lalo varieties

Eastern varieties share a chain of innovations: *o, *6 and *y merge to /6/, *a > /o/, and *aŋ > /a/. E?-TS participates fully in the *a > o, *aŋ > a chain, but shows only a partial merger of *o, *y, and *6. In E?-TS, both *y and *o undergo multiple conditioned splits: *y > 6 after labials and > u after velars, remaining y elsewhere, while *o > 6 after the nasals *m and *ŋ (there are no examples of *no) and > u after velars. The result is a partial merger of *o, *y, and *6, and this partial participation reinforces E?-TS's status as a non-typical member of the Eastern cluster. E?-TS nonetheless qualifies for membership in the Eastern cluster because it shares the merger of *H and *3 to mid, although its innovations in the initials suggest a possible link with NW varieties, as described in Section 3.1.2. The chain *a > o, *aŋ > a also occurs in NW-DT, XZ, MD, Eka, and in other Ngwi languages, including Nanhua (NH) Lolo (Sun, 1991). Bradley (1979b) finds this isogloss insufficient for Ngwi subgrouping because of its dispersed geographical distribution: *a > o happened independently in languages belonging to each of the Ngwi sub-branches (Central, Southern, and Northern). More evidence is therefore required before grouping together all varieties that show *a > o, *aŋ > a. Table 10 below gives examples of the different developments of *a, *aŋ, *o, *y, and *6 in Eastern Lalo and Central Lalo.
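The Eastern chain *a > o, *aŋ > a keeps the two rhymes distinct only if *a > o applies before *aŋ > a (a counterfeeding order). The Python sketch below treats rhymes as whole strings and is purely illustrative; it shows that the reverse, hypothetical ordering would have merged them.

```python
def apply_rules(rhyme: str, rules: list) -> str:
    """Apply whole-rhyme rewrite rules (source, target) once each, in order."""
    for source, target in rules:
        if rhyme == source:
            rhyme = target
    return rhyme


EASTERN_ORDER = [("a", "o"), ("aŋ", "a")]   # *a > o first, then *aŋ > a
REVERSED_ORDER = [("aŋ", "a"), ("a", "o")]  # hypothetical opposite ordering


if __name__ == "__main__":
    for label, rules in [("attested order", EASTERN_ORDER),
                         ("reversed order", REVERSED_ORDER)]:
        outputs = {r: apply_rules(r, rules) for r in ("a", "aŋ")}
        merged = outputs["a"] == outputs["aŋ"]
        print(f"{label}: *a > {outputs['a']}, *aŋ > {outputs['aŋ']}, merged={merged}")
```

In the attested order, the new a from *aŋ arises too late to undergo *a > o, so the contrast survives as o versus a.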

Gloss | Proto-Lalo | E-HS | E-DC | E?-TS | C-LJ | C-WC
to exchange | *pa" | p+%% | p+%% | pu%% | pa%% | pa%%
meat | *xa! | x+!" | x+!" | x+!" | xa!" | xa!"
ear | *paŋ" | pa%% | pa%% | pa%% | pu%% | pu%%
clean | *xaŋ" | xa%% | / | ha%% | xu%% | /
mushroom | *mo" | m6!& | m6!& | m6!& | my%% | mu%%
to cry | *ŋo" | ŋ6!& | ŋ6!& | ŋ6!& | ŋv4%% | ŋv4%%
to steal | *kʰo! | kʰ6!" | kʰ6!" | kʰv4!" | kʰv4!" | kʰv4!"
melon | *pʰy! | pʰ6!" | pʰ6!" | pʰ6!" | pʰy!" | pʰy!"
flour | *my$ | m6$$ | m6$$ | m6$$ | my$$ | mu$$
iron | *xy" | x6%% | x6%% | xu%% | 3y%% | 3y%%
to untie | *pʰ6" | pʰ6%% | pʰ6%% | pʰ6%% | pʰ6%% | pʰ6%%
water | *@6" | @6!& | @6!& | @6!& | @6%% | @6%%
Table 10: *a, *aŋ, *o, *y, and *6 in Eastern Lalo and Central Lalo

The divergent developments of rhymes *a and *aŋ contribute to a complex vowel correspondence pattern between Eastern Lalo and Central Lalo, shown in Figure 4 below. Eastern Lalo’s /a/ sometimes corresponds to Central Lalo’s /a/, but in words whose source had the rhyme *aŋ, Eastern Lalo’s /a/ corresponds to Central Lalo’s /u/. In Central Lalo, [o] and [u] are non-contrastive; [o#] occurs in syllables with laryngealized phonation, and [u] occurs elsewhere (Björverud, 1998). In Eastern Lalo, however, laryngealized phonation has been lost, so /o/ and /u/ are now contrastive phonemes. Eastern Lalo’s /u/ corresponds to Central Lalo’s /u/, but Eastern Lalo’s /o/ (from *a) corresponds to Central Lalo’s /a/. The result is an incongruent pattern in which Eastern Lalo’s /a/ and /o/ may both correspond to Central Lalo’s /a/, while Eastern Lalo’s /a/ may also correspond to Central Lalo’s /u/, which in turn also corresponds to Eastern Lalo’s /u/.

Eastern Lalo : Central Lalo
/a/ : /a/, /u/
/o/ : /a/
/u/ : /u/

Figure 4. Complex vowel correspondence pattern
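Figure 4 can also be read as a many-to-many mapping. The Python sketch below encodes only the four vowel correspondences stated in the text (no lexical counts) and lists, for each direction, the vowels that have more than one counterpart, i.e. the vowels a listener cannot map back to a single vowel of their own system. The function name `ambiguous_vowels` is hypothetical.

```python
from collections import defaultdict

# Eastern-to-Central vowel correspondences as described for Figure 4.
PAIRS = [
    ("a", "a"),  # Eastern /a/ : Central /a/ (some words)
    ("a", "u"),  # Eastern /a/ : Central /u/ (words from the rhyme *aŋ)
    ("o", "a"),  # Eastern /o/ (from *a) : Central /a/
    ("u", "u"),  # Eastern /u/ : Central /u/
]


def ambiguous_vowels(pairs):
    """Return (east_to_central, central_to_east), keeping only one-to-many entries."""
    east_to_central = defaultdict(set)
    central_to_east = defaultdict(set)
    for east, central in pairs:
        east_to_central[east].add(central)
        central_to_east[central].add(east)

    def keep_ambiguous(mapping):
        return {vowel: sorted(partners) for vowel, partners in mapping.items()
                if len(partners) > 1}

    return keep_ambiguous(east_to_central), keep_ambiguous(central_to_east)
```

`ambiguous_vowels(PAIRS)` returns `({"a": ["a", "u"]}, {"a": ["a", "o"], "u": ["a", "u"]})`: an Eastern listener hearing Central [u] cannot tell whether it corresponds to their own /a/ or /u/, which is exactly the situation in the ‘drink’ example that follows.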

This complex pattern creates uncertainty during communication, because listeners cannot reliably map the sounds they hear onto their own phonological system. Such uncertainty has been found to reduce cross-dialectal intelligibility (Milliken, 1988; Moberg et al., 2007), and may partially explain the low intelligibility between Eastern Lalo and Central Lalo. Listeners from E-DC, E-HS and E?-TS scored an average of 30% on a comprehension test when listening to a Central Lalo (CW-QY) story. CW-QY listeners scored somewhat better (58%), but still low, when listening to E-HS. Consider the difficulty listeners from E-DC face when trying to understand a speaker from CW-QY. In conversation, the CW-QY speaker might say something like “drink!” [du%%], but E-DC listeners will probably not connect this utterance to their word for ‘drink’, [da!&] (from *daŋ"). Instead, they may interpret it as “get out!”, pronounced [du$$] in E-DC (from *du#H ‘exit, leave’) and similar in pronunciation to CW-QY’s ‘drink.’ C varieties’ reflex for ‘get out’ is harsh du#$$ [do#$$], while most other varieties show modal /du$$/ or /do$$/. Even if the context helps E-DC listeners understand CW-QY’s meaning, the complexity of the correspondence patterns throughout the lexicon means that misunderstandings are bound to arise. C varieties all share the merger of *e and *i to /i/ in both harsh and modal phonation, and the merger of *0# and *a# to a#. Examples are given in Table 11 below. Harsh i# is realized as the diphthong [i0#] in several C varieties. Of these innovations, CE-YA and C-LJ share fully in one, but only partially in the other. CE-YA shows the mergers of *e, *e# to i, i# fully, but the merger of *0# and *a# to a# only after velars or /h/: as shown in the table below, CE-YA shows a# after velars or /h/, but 0# elsewhere.
C-LJ merges modal *e and *i, but merges harsh *e# and *i# only after the labial fricative *v, palatal *A, and the palato-alveolar initials *tʃ, *dʒ, *ʃ, *ʒ. Elsewhere, C-LJ reflects *e# as harsh /0#/, realized as [æ#], which contrasts with /i#/ [i0#] except after the initials that triggered the partial merger of *e# and *i#. While most Central varieties have /i#/ but no /e#/ or /0#/ because of the complete mergers of *e# with *i# and of *0# with *a#, CE-YA and C-LJ have contrastive /i#/ and /0#/. No variety outside the C cluster shows all three of these mergers, although a few participate in one or two of them. E?-TS and E-HS both show the merger of harsh and modal *e, *e# to i, i#. Eka shows splits of *e to 0 and i, and of *e# to i and 6, resulting in a partial merger of *e, *e# with *i, *i#. NW?-YL, Eka, and MD show the merger of *0# and *a# to a#. Eka and MD migrated out of the Weishan area hundreds of years ago, so they may share these changes with Central because the changes had already begun before their emigration. Since this paper focuses on the lower-level clusters, to none of which Eka and MD clearly belong, the question of higher-level groupings is left for future research.

Gloss | Proto-Lalo | C-WC | CW-QY | C-CJ | CE-YA | C-LJ | C-LB | NW?-YL | E?-TS | E-DC
cat | ni" | ni%% | ni%% | ni%% | ni%% | ni%% | ni%% | ni!& | ni!& | ni!&
heart | ʔni#H | ni0#$$ | ʔni#$$ | / | / | iB#$$ | i0B#$$ | ni&& | iB$$ | ni$$
to pound | te! | t0!" | ti!" | ti!" | ti!" | ti!" | ti!" | t0!" | ti!" | te!"
to carry | te#H | ti0#$$ | ti#$$ | ti#$$ | ti0#$$ | tæ#$$ | ti0#$$ | t0&& | ti$$ | te$$
to lack | kʰ0#: | kʰa#!" | kʰa#!" | kʰa#!" | kʰa#!" | kʰa#!" | kʰa#!" | / | kʰ0#!" | kʰ0#!"
rat | h0#H | ha#$$ | ha#$$ | ha#$$ | ha#$$ | ha#$$ | ha#$$ | ha&& | h0$$ | xa$$
to mend | ʔn0#: | na#!" | ʔna#!" | na#!" | n0#!" | na#!" | na#!" | na%$ | n0#!" | n0$"
Table 11. *i, *i#, *e, *e#, and *0# in Central Lalo and others

3.2. Levenshtein Distance

Section 3.1 presents shared innovations that provide evidence for subgrouping the E, C, and NW clusters from a diachronic, phylogenetic perspective. Phonetic distance as measured by Levenshtein distance provides evidence for the lower-level clusters from a synchronic perspective. Unlike the comparative method, Levenshtein distance does not differentiate between retention and innovation, nor does it distinguish contact-induced change from genetically shared change. Results based on Levenshtein distance should therefore never be equated with a phylogenetic tree based on shared innovations, though the two often show similarities. The usefulness of phonetic distance lies elsewhere, in its strong correlation with speaker perception and intelligibility. For the purposes of language planning, factors such as overall phonetic similarity (whatever its cause), speakers’ perceptions of difference, and intelligibility are often more important than shared innovations. Critically for endangered languages, historical linguistics and language development efforts must be considered in tandem. Figure 5 below shows the NeighborNet phenogram, drawn with the equal angle method, for the phonetic distance matrix of all 18 Lalo varieties. Figure 5 is not a historical tree, in that it does not illustrate genetic relatedness. Rather, it shows the Lalo varieties’ relative distance from each other in terms of differences in their pronunciation of the cognates in the 400-item wordlist. The reticulated, or net-like, lines show ambiguity or mixed signals in the data: some pronunciations are shared, while others differ. Reticulation in a linguistic phenogram results either from independent shared developments or from borrowing through contact (McMahon et al., 2007). Fewer reticulated lines mean a clearer signal and thus a more clearly defined cluster. The relative length of lines depicts relative distance: the longer the branch, the more different the variety.
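For readers unfamiliar with the measure, the following Python sketch computes a plain Levenshtein distance over lists of phonetic segments and averages it over a wordlist. It is a simplified illustration, not the procedure behind the figures: it assumes transcriptions are already segmented (so that tsʰ counts as one segment), counts every substitution as 1, ignores tone, and normalizes by the length of the longer form, whereas Heeringa (2004) normalizes by alignment length and also discusses graded segment distances. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of segment insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            substitution = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,                 # delete seg_a
                               current[j - 1] + 1,              # insert seg_b
                               previous[j - 1] + substitution))  # match or substitute
        previous = current
    return previous[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Levenshtein distance divided by the length of the longer form."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def aggregate_distance(list_a: Dict[str, List[str]],
                       list_b: Dict[str, List[str]]) -> float:
    """Mean normalized distance over glosses transcribed in both varieties."""
    shared = [gloss for gloss in list_a if gloss in list_b]
    if not shared:
        raise ValueError("the two wordlists share no glosses")
    return sum(normalized_distance(list_a[g], list_b[g]) for g in shared) / len(shared)
```

For example, ‘drink’ in CW-QY [du] and E-DC [da], with tones left out, differ by one substitution: `levenshtein(["d", "u"], ["d", "a"])` returns 1 and `normalized_distance` returns 0.5.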
Of course, differences in the pronunciation of cognates arise from historical divergence. However, a regular change that may be unimportant from a historical perspective but affects many words will have a major impact on the Levenshtein distance. For example, the *a rhyme occurs frequently, so the lexical items in the Central /a/ ~ Eastern /o/ correspondence set are numerous (over 10% of the 400-item wordlist). Levenshtein distance counts that difference every time it occurs, so this one regular change contributes substantially to the distances between Central and Eastern varieties. The weight given to this correspondence set is nevertheless justified by the impact of the complex correspondence patterns on intelligibility, explained in Section 3.1.3 above. Figure 5 shows the Central Lalo varieties clustered together on the left side of the phenogram, with fewer reticulations and shorter lines connecting them. East Mountain Lalo (CE-YA) is placed at the outer edge of the Central Lalo cluster and is connected through reticulated lines to Eastern Lalo. This connection is most likely a result of the sustained, close contact between East Mountain and Eastern Lalo speakers, and it is consistent with the mixed nature of East Mountain Lalo’s isoglosses: East Mountain shares the bulk of its innovations in initials and finals with other Central Lalo varieties, but shares the Tone *1 split to low rising with Eastern. Eastern Lalo varieties are found on the right side of the phenogram, but cluster less tightly than the Central Lalo varieties do. E-HS and E-DC group together, with E?-TS nearby. A rather long reticulated branch to the right connects all NW varieties, with core NW clustering more tightly and NW?-YL slightly less so. Northwestern Lalo varieties thus form a less clearly defined cluster than Central Lalo.
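The weighting effect described above is simple arithmetic: every item that contains the correspondence adds its share to the mean. The Python sketch below uses a hypothetical 400-item list of two-segment forms (not the actual Lalo data) in which 40 items, i.e. 10%, differ only by a ~ o, and uses the same simplified normalization by the longer form.

```python
def levenshtein(a, b):
    """Minimum number of segment insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (seg_a != seg_b)))
        previous = current
    return previous[-1]


def mean_normalized_distance(pairs):
    """Average of distance / length of the longer form over (form_a, form_b) pairs."""
    return sum(levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)


def hypothetical_wordlist(n_items=400, n_with_correspondence=40):
    """Two-segment forms; the first n_with_correspondence items show a ~ o."""
    pairs = []
    for k in range(n_items):
        if k < n_with_correspondence:
            pairs.append((["p", "a"], ["p", "o"]))  # regular a ~ o correspondence
        else:
            pairs.append((["p", "i"], ["p", "i"]))  # identical forms
    return pairs
```

Here each affected item contributes 0.5, so the mean is 40 × 0.5 / 400 = 0.05; doubling the number of affected items to 80 doubles the mean to 0.1. The aggregate grows linearly with how often the correspondence occurs, regardless of its historical importance.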
This branch also includes the peripheral varieties XZ and YL, both of which show longer lines and thus greater distance from the Northwestern varieties. These peripheral varieties are more similar to the E and NW clusters than to Central because of their various tonal splits, which may be innovations independent of those seen in the E and NW clusters. MD and Eka appear to form a cluster through a long, reticulated branch, but are then separated from each other by long individual lines. This connection does not reflect shared historical innovations; instead, it reflects their early emigration from the Lalo area, before many of the innovations that distinguish the lower-level clusters occurred. The apparent MD and Eka cluster is a result of shared retentions, not innovations, and the two are not grouped together on historical grounds.

Figure 5. NeighborNet phenogram based on Levenshtein distance


Figure 6. Neighbor-joining tree of Lalo phonetic distance

Figure 6 above shows an unrooted tree based on Lalo phonetic distance, produced by the neighbor-joining method in the SplitsTree software package. Neighbor-joining as a tree-drawing method is used by McMahon et al. (2007) and is reviewed positively by Nichols and Warnow (2008) in their evaluation of computational phylogenetic methods. The reticulated lines seen in the NeighborNet are collapsed into single lines, but the clusters are as clearly identifiable as in Figure 5. The main difference between the two diagrams is the placement of E?-TS in Figure 6: instead of clustering with the other Eastern varieties, it branches off from the base of the Eastern cluster. In this tree, East Mountain Lalo does not group with Eastern varieties, but instead appears as a divergent Central variety, which it is from a diachronic perspective.
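Neighbor-joining builds a tree by repeatedly joining the pair of nodes that minimizes a corrected distance, Q(i, j) = (n − 2)·d(i, j) − r(i) − r(j), where r is the sum of a node's distances to all other active nodes. The Python sketch below follows the standard textbook formulation; it is not the SplitsTree implementation, and its input format (a dictionary of pairwise distances) and the function name are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(labels, pair_distances):
    """Return (joins, final_edge) for at least two labels.

    joins: list of (node_i, node_j, branch_i, branch_j) in joining order.
    final_edge: (node_a, node_b, length) connecting the last two nodes.
    pair_distances maps (x, y) tuples to distances; each pair is given once.
    Branch lengths can come out negative when the data are not tree-like.
    """
    d = {x: {} for x in labels}
    for (x, y), dist in pair_distances.items():
        d[x][y] = d[y][x] = dist

    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        branch_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        new = f"({i},{j})"
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, branch_i, branch_j))
    a, b = nodes
    return joins, (a, b, d[a][b])
```

On an additive four-taxon example with true tree ((A,B),(C,D)), the sketch first joins A and B, then C and D, and recovers the original branch lengths; real Levenshtein distances are not perfectly additive, which is why the NeighborNet in Figure 5 shows reticulation where this tree shows single lines.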

Figure 7. Classical multidimensional scaling of Lalo varieties

Finally, Figure 7 above shows a classical multidimensional scaling (Gower, 1966) of the Lalo varieties’ phonetic distances, with results mostly consistent with the other figures. Central Lalo varieties again cluster tightly together, with CE-YA slightly set apart. Eastern varieties also form a tight cluster, somewhere between Central and Northwestern. Northwestern varieties are more loosely grouped in the top left corner of the diagram and cannot be said to form a clear cluster. XZ and YL are found between the E and NW varieties. Eka and MD are far apart from each other and from all other varieties, and do not form a cluster with any of them.
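Classical scaling places varieties in a low-dimensional space by double-centring the squared distance matrix and taking its leading eigenvectors, scaled by the square roots of their eigenvalues. The dependency-free Python sketch below uses power iteration with deflation instead of a library eigensolver, assumes that the leading eigenvalues are positive (as they are for Euclidean or near-Euclidean distances), and is only meant to show the computation, not to reproduce Figure 7. The function names are hypothetical.

```python
import math


def double_centre(d):
    """B = -1/2 * J D^2 J for a symmetric distance matrix d (list of lists)."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_means = [sum(row) / n for row in sq]  # equal to column means (symmetry)
    grand_mean = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def leading_eigenpair(m, seed, iterations=1000):
    """Power iteration; returns (eigenvalue, unit eigenvector)."""
    n = len(m)
    v = [math.sin(seed * (i + 1)) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                              # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return eigenvalue, v


def classical_mds(d, k=2):
    """Return an n x k list of coordinates whose distances approximate d."""
    b = double_centre(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        eigenvalue, v = leading_eigenpair(b, seed=axis + 1)
        if eigenvalue <= 0:
            break                                     # no further Euclidean axes
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so that the next power iteration finds the next eigenvector.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords
```

For points that really lie on a line or in a plane, the recovered coordinates reproduce the input distances exactly (up to rotation and reflection); for Levenshtein distances, the two plotted axes are only an approximation, which is one reason the MDS plot and the NeighborNet can differ in how tightly a group appears to cluster.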

4. Conclusion

The area just south of Erhai Lake in Dali Prefecture, the traditional homeland of the Lalo, shows a rich diversity in its regional variation. The central valley that bisects Weishan County into eastern and western halves plays a subsidiary role in this variation, not the main role previously assumed. CE-YA, located in the east, diverges slightly from the main Central Lalo body found in the western half of Weishan and other areas. However, the main divisions in Lalo are not between East and West Mountain, but between seven dialects or dialect clusters: the C, NW, and E dialect clusters, all located near the traditional Lalo homeland, and four peripheral varieties that migrated out of this area at various times in the past (XZ, YL, Eka, and MD). Northwestern, Eastern, and Central Lalo diverge from each other in important ways. NW Lalo shares a chain of tonal innovations, C Lalo shares changes in initials and rhymes, and E Lalo shares innovations in tones and rhymes. These patterns of change create complex synchronic correspondence sets that hinder cross-dialectal comprehension, to the extent that intelligibility between the three groups is low. The shared innovations described in this paper are summarized in Table 12 below. This list of innovations is by no means exhaustive, but it adequately demonstrates the divergence between the lower-level clusters. Important innovations that distinguish the groups are enclosed with bold lines. Table 12 and the innovations discussed in Section 3.1 provide sufficient evidence for the three lower-level clusters C, NW, and E from a diachronic perspective. NeighborNet networks, neighbor-joining cluster analysis, and classical multidimensional scaling based on phonetic distance also identify the three dialect clusters from a synchronic perspective. The question of higher-level groupings, especially possible links between the three dialect clusters, remains for future research.
Also, the relationship of the peripheral groups XZ and YL in Baoshan and MD and Eka in Lincang to the homeland clusters C, NW, and E is still uncertain. Further exploration of innovative patterns, including lexical and morphological innovations, may bring further clarity to these issues. Lalo is a minority language in China and as such is under pressure to assimilate to Chinese. Indeed, Lalo speakers in many areas have already begun to shift to Chinese. By the end of this century, Lalo may no longer be a vital language. Further exploration and documentation of regional variation in Lalo must happen now, if it is to happen at all. Beyond enriching our knowledge of Central Ngwi languages, a realistic picture of Lalo dialects is fundamental to sound language planning and language development, especially orthography design. Language development based on a thorough understanding of the subgroups can then help Lalo speakers maintain their language beyond this century.

Location MD Eka YL XZ E- E- E? NW NW NW NW? CE- CW- C- C- C- C- C- Innovation DC HS TS DT SLZ SZP YL YA QY WC CJ QS LB LJ *p! > f /_v " ! ! !! !! !! !! !! !! !! !! !! !! !! ! *p > k /_v " !! !! nd nd ! ! ! ! ! ! ! ! *b > [b#]/_o !! ! !! ! !! !! !! !! !! !! !! !! !! *o > i/ labial stops_! ! !! !! !! !! !! !! !! !! !! !! !! !! !! *a$ > u! ! ! ! ! ! ! !! !! ! !! !! !! !! !! !! ! *H, *3 > mid [33]! ! ! ! ! ! ! ! ! ! ! ! *o, *y, *% > % ! !! !! (p)! *b > b#/_i ! ! ! !! !! !! *b# > dz ! !! !! !! !! !! !! ! ! ! ! ! ! ! *1 > Low rising/non-&-_! ! ! !! !! !! ! ! ! ! ! ! *L > high [55] !! !! !! !! *1/+&-_ ,*H> mid-high ! [44] !! !! !! ! ! ! ! ! ! ! ! contrastive labiodentals ! ! !! !! !! !! !! !! !! !! *e> i (p) !! !! ! !! !! !! !! !! !! !! *e' > i ' (p) !! !! ! !! !! !! !! !! !! (p)! *(' > a ' !! !! ! ! ! (p) !! !! !! !! !! !! *b# > v ! ! ! ! ! !! !! ! *b# > d) ! ! ! !! !! *1 > Low rising /[+voi]_! ! ! ! !! !! ! ! ! ! !! !! ! ! ! ! ! ! Loss of *harsh in *H ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! Loss of *harsh in *L! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! *a > o !! !! ! !! !! !! ! !! ! ! ! ! ! ! ! ! ! ! *a$ > a! !! !! ! !! !! !! ! !! ! ! ! ! ! ! ! ! ! !

Table 12: Summary of shared innovations

Note: NW-YL merges *1 and *H, but with slightly different conditioning from that of the other NW varieties. In NW-YL, *1/[-voi]_, *H > high [44].

5. Acknowledgements

I would like to acknowledge Molly Cha, P.L. Blackburn, Laura Blackburn, Lena Pu, Edwin Lam, and Angel Lam for giving me access to their unpublished wordlists. I also want to thank David Bradley and two anonymous reviewers for their useful comments on earlier versions of this paper.

REFERENCES

Atwill, D. G. 2005. The Chinese Sultanate: Islam, Ethnicity, and the Panthay Rebellion in Southwest China, 1856-1873. Stanford: Stanford University Press.

Backus, C. 1981. The Nan-chao Kingdom and T'ang China's Southwestern Frontier. Cambridge; NY: Cambridge University Press.

Bai, X. 2002. Yizu Wenhua Shi [A Cultural History of the Yi Nationality]. Kunming: Yunnan Minzu Chubanshe.

Björverud, S. 1998. A Grammar of Lalo. Lund University: PhD dissertation.

Blackburn, P.L. 2006. Discussion of Lalo language use. Dali: Personal communication.

Blackburn, P.L., Blackburn, L. and Cha, S. 2007. Ma'anshan Lalo Wordlist. Unpublished manuscript.

Bradley, D. 1977. Proto-Loloish tones. In D. Bradley (ed.). Papers in Southeast Asian Linguistics No. 5. (pp. 1-22). Canberra: Pacific Linguistics.

Bradley, D. 1979a. Lahu Dialects. Canberra: Australian National University Press.

Bradley, D. 1979b. Proto-Loloish: Scandinavian Institute of Asian Studies Monograph No. 39. London: Curzon Press.

Bradley, D. 1994. A Dictionary of the Northern Dialect of Lisu (China and ): Pacific Linguistics Series C-126. Canberra: Pacific Linguistics.

Bradley, D. 2002. The Subgrouping of Tibeto-Burman. In C. Beckwith and H. Blezer (eds.). Medieval Tibeto-Burman languages. PIATS 2000: Tibetan Studies: Proceedings of the Ninth Seminar of the International Association for Tibetan Studies (pp. 73-112). Leiden: Brill.

Bradley, D. 2004. Endangered Central Ngwi languages of northwestern Yunnan. Paper presented at 37th ICSTLL, Lund.

Bryant, D. and Moulton, V. 2004. NeighborNet: An agglomerative algorithm for the construction of planar phylogenetic networks. Molecular Biology and Evolution 21, 255-265.

Casad, E. H. 1974. Dialect Intelligibility Testing. Dallas: SIL.

Chan, C. C. Y. 2008. Lalo Datapoints. Unpublished manuscript.

Chao, Y. 1930. A system of tone letters. Le Maître Phonétique 30, 24-27.

Chen, S., Bian, S. and Li, X. 1985. Yiyu Jianzhi [Outline of the Yi language]: Zhongguo shaoshu minzu yuyan jianzhi congshu. Beijing: Minzu Chubanshe.

Dictionary.com. phenogram. in Dictionary.com Unabridged (v 1.1). Retrieved on August 25, 2009 from http://dictionary.reference.com/browse/phenogram.

Gooskens, C. 2006. Linguistic and extra-linguistic predictors of inter-Scandinavian intelligibility. In J. van de Weijer and B. Los (eds.). Linguistics in the Netherlands 2006 (pp. 101-113). Amsterdam: John Benjamins.

Gooskens, C. and Heeringa, W. 2004. Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data. Language Variation and Change 16, 189-207.

Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–328.

Heeringa, W. 2004. Measuring pronunciation differences with Levenshtein distance. PhD dissertation. Humanities Computing, University of Groningen.

Heeringa, W., Kleiweg, P., Gooskens, C. and Nerbonne, J. 2006. Evaluation of string distance algorithms for dialectology. Paper presented at Linguistic Distances Workshop at the joint conference of International Committee on Computational Linguistics and the Association for Computational Linguistics, Sydney.

Hu, C. and Duan, L. 2000. Dali Baizu Zizhizhou Fangyan Zhi [A Survey of the Dialects of Dali Bai Autonomous Prefecture]. Dali Baizu Zizhizhou Zhi [Gazetteer of Dali Bai Autonomous Prefecture], ed. by Editorial committee. Dali: Dali Shifan Gaodeng Zhuanke Xuexiao.

Huang, B. and Dai, Q. (eds.) 1992. Zangmian yuzu yuyan cihui [A Tibeto-Burman lexicon]. Beijing: Zhongyang Minzu Daxue Chubanshe.

Kessler, B. 1995. Computational dialectology in Irish Gaelic. Paper presented at the European ACL, Dublin.

Kluge, A. 2007. RTT Retelling Method: An alternative approach to intelligibility testing. Retrieved on January 15, 2009 from http://www.sil.org/silewp/abstract.asp?ref=2007-006.

Kruskal, J. 1999. An overview of sequence comparison. In D. Sankoff and J. Kruskal (eds.). Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison (pp. 1-44). Stanford: CSLI.

Lam, E. 2009. East Mountain Lalo variation in Tone 1. Unpublished manuscript.

Lam, E., Lam, A. and Qu, H. 2007. Diaocao Lalupu Wordlist. Unpublished manuscript.

Li, F. 1977. A Handbook of Comparative Tai. Honolulu: University of Hawaii Press.

Matisoff, J. A. 1972. The Loloish Tonal Split Revisited: Center for South and Southeast Asia Studies, Research Monograph No. 7. Berkeley: University of California.

Matisoff, J. A. 1973 [1982]. The Grammar of Lahu. 2nd ed. Berkeley: University of California Press.

Matisoff, J. A. 2003. Handbook of Proto-Tibeto-Burman: System and Philosophy of Sino-Tibetan Reconstruction. Berkeley: University of California Press.

Matisoff, J. A. 2006. English-Lahu Lexicon. Berkeley: University of California Press.

McMahon, A. 2005. Introduction (to quantitative historical linguistics issue). Transactions of the Philological Society 103(2), 113-119.

McMahon, A. and McMahon, R. 2005. Language Classification by Numbers. New York: Oxford University Press.

McMahon, A., Heggarty, P., McMahon, R. and Maguire, W. 2007. The sound patterns of Englishes: representing phonetic similarity. English Language and Linguistics 11, 113-142.

Milliken, M. E. 1988. Phonological Divergence and Intelligibility: A Case Study of English and Scots. Cornell University: PhD dissertation.

Moberg, J., Gooskens, C. and Nerbonne, J. 2007. Conditional entropy measures intelligibility among related languages. In P. Dirix, I. Schuurman, V. Vandeghinste and F. Van Eynde (eds.). Proceedings of the 17th Meeting of Computational Linguistics in the Netherlands (pp. 51-66). Amsterdam: Rodopi.

Nerbonne, J. Forthcoming. Various variation aggregates in the LAMSAS south. In C. Davis and M. Picone (eds.). Language Variety in the South III. Tuscaloosa: University of Alabama Press.

Nichols, J. and Warnow, T. 2008. Tutorial on computational linguistic phylogeny. Language and Linguistics Compass 2, 760-820.

Osenova, P., Heeringa, W. and Nerbonne, J. Forthcoming. A quantitative analysis of Bulgarian dialect pronunciation. Zeitschrift für slavische Philologie.

Sun, H. (ed.) 1991. Zangmian Yuyin He Cihui [Tibeto-Burman Sound Systems and Lexicons]. Beijing: Zhongguo Shehui Kexue Chubanshe.

Wang, C. 2003. Yi Yu Fangyan Bijiao [Comparative study of Yi dialects]. Chengdu: Sichuan Minzu Chubanshe.

Wells, J. 2005. SAMPA computer readable phonetic alphabet. Retrieved on October 3, 2007 from http://www.phon.ucl.ac.uk/home/sampa/.

Yang, C. 2009. Nisu dialect geography. SIL Electronic Survey Reports.

Yang, C. accepted. Tone change in Northwestern Lalo. In Y. Treis (ed.). Selected Papers from the 2009 Annual Meeting of the Australian Linguistics Society. Melbourne: ALS.

Yang, C. and Chan, C. C. Y. 2008. The : A first look. Paper presented at 41st International Conference on Sino-Tibetan Languages and Linguistics, London.

Yang, C. and Castro, A. 2008. Representing tone in Levenshtein distance. International Journal of Humanities and Arts Computing 2, 205-219.

YNYF. 1984. Editorial committee, eds. Yunnan Yiyu fangyan ciyu huibian [A lexical compendium of Yi dialects]. Kunming: Yunnan Minzu Xueyuan.

You, Z. 1994. Yunnan Minzu Shi [History of Yunnan Nationalities]. Kunming: Yunnan University Press.

Zhu, W. 2005. Yiyu fangyan xue [Yi dialect studies]. Beijing: Zhongyang Minzu Daxue Chubanshe.
