Volume 3 Issue 1 INTERNATIONAL JOURNAL OF HUMANITIES AND June 2016 CULTURAL STUDIES ISSN 2356-5926

North Caucasian languages: comparison of three classification approaches

Valery Solovyev, Kazan State University, Kazan

Abstract

In this paper three approaches to the reconstruction of language evolution trees are compared on the material of North Caucasian languages: the expert approach (the comparative-historical method), lexicostatistics, and the application of phylogenetic algorithms to databases. It is shown that the degree of coherence of different computer solutions is approximately the same as the degree of coherence of expert solutions. A new classification of North Caucasian languages is proposed as a result of applying the consensus method to different known classifications.

Keywords: North Caucasian languages, phylogenetic algorithms, evolution trees, linguistic databases, consensus method.

http://www.ijhcs.com/index Page 1309

1. Introduction

Over the last years comparative linguists have developed language classification methods based on computer-aided calculations of linguistic similarities. Such methods have added substantially to the toolset of comparative linguistics. Methods utilizing computer programs to construct phylogenetic trees are conventionally called “automated”. The most complete overview of the state of affairs in this area is given in Nichols and Warnow (2008). That work is concerned with both the comparison of algorithms for constructing trees and the analysis of attempts to apply them to various language families. In order to determine the possibilities and usefulness of phylogenetic algorithms, it is proposed to test them on data from well-described families with unquestionable structure (a benchmark or Gold Standard) and to compare the trees generated by computational algorithms with those obtained in a traditional manner. Indo-European is the family which has been the subject of the largest number of works applying computational classification methods (Gray & Atkinson 2003; Atkinson & Gray 2006; Nakhleh et al. 2005; Nicholls & Gray 2008; Rexová et al. 2003; Ringe et al. 2002; Nakhleh et al. 2005a). A large number of works have also been devoted to Bantu languages (Holden 2002; Holden & Gray 2006; Rexová et al. 2006; Brown et al. 2008; Serva & Petroni 2008; Bastin 1983; Holden et al. 2005; Marten 2006). Several papers focus on Austronesian and Papuan languages (Gray & Jordan 2000; Dunn et al. 2005; Donohue & Musgrave 2007; Dunn et al. 2007; Saunders 2005). Finally, some papers look at Native American languages (Wichmann & Saunders 2007; Cysouw et al. 2006; Brown et al. 2008).
In the view of Nichols and Warnow (2008: 814), much of this work is somewhat disappointing: “One of the main observations in the studies reviewed here is that trees obtained for the same language family but using different datasets and/or different methods can differ in substantial ways … while the development of methods for phylogenetic estimation in linguistics is exciting, we still do not have evidence that any of these methods is capable of accurate estimation of linguistic phylogenies.” Thus, it is necessary to develop new sets of empirical data and to improve on models of language evolution and phylogenetic methods. Let us point out that while there are more than 300 language families in the world, computational phylogenetic methods have been applied to only a small minority of them. It makes sense to extend the set of families on which phylogenetic methods are tested. Different language families evolve under different conditions. For example, Indo-Europeans populated vast territories, migrated frequently and established many contacts with other peoples. In contrast, the peoples of the North Caucasus live in an extremely limited territory with specific conditions of communication (mountains and gorges complicate contacts), and have occupied this region for a long time without substantial resettlement. It is quite probable that such differences have correlates in the way that phylogenies evolve. It is in any case important to extend case studies of phylogenetic methods to new families.
In this paper we consider North Caucasian languages, a family for which we have high-quality sets of data and a well-studied phylogenetic structure. The situation with the classification of North Caucasian languages is approximately the same as that of Indo-European. There is a long tradition of studies of these languages, and researchers tend to concur in their opinions on basic issues. Subgroups of the upper level are generally accepted, although controversy does persist in some cases. Some clades at the lower classificatory levels are also not well established. There are some recent works on the application of computational methods to the classification of North Caucasian languages (Koryakov 2006; Kassian 2015). In Kassian (2015), the application of six phylogenetic algorithms to the classification of the Lezgian subgroup is compared. In all cases 110-item Swadesh lists constitute the input. Phylogenetic methods can be used to solve problems other than the reconstruction of trees. Thus, in Shijulal et al. (2011) these methods are applied to the investigation of borrowings; however, such applications are not widespread yet. An overview of such problems and approaches can be found in Dunn (2014).
The aim of the present paper is a discussion and analysis of previously published classifications of North Caucasian languages. Four of them were obtained by computational methods. Two of these four classifications were developed within the framework of the Automated Similarity Judgment Program (ASJP) (Müller et al. 2013; Jäger 2013). A third was developed within the framework of the Global Lexicostatistical Database (GLD) (Starostin 2015). The fourth tree is the result of the more traditional lexicostatistical method (Burlak 2005). Moreover, seven expert classifications, namely Lewis (2013), Haspelmath et al. (2005), Ruhlen (1987), Schulze (2014), Alekseev (2001), Burlak (2005) and Diakonov & Starostin (1988), as well as other works of relevance for the classification of North Caucasian languages (Nichols 2003; Nikolaev & Starostin 1994; Kassian 2015; Talibov 1980) are considered. Thus, all modern classifications of North Caucasian languages are taken into account. When comparing the classifications obtained automatically, attention is focused on comparing the datasets used rather than the algorithms for constructing trees.
We are interested only in the structure of the trees (pure topologies), not in the time of divergence of the languages (branch lengths). All the databases used are lexical ones. Rama & Kolachina (2012) give an overview of the application of phylogenetic algorithms to typological databases. However, there are no publications in which typological databases are applied to North Caucasian languages. Moreover, a comparison of the results of applying the NJ (neighbour joining) phylogenetic algorithm to the typological databases Jazyki Mira (Polyakov & Solovyev 2006) and WALS and to the lexical database ASJP revealed significant advantages of ASJP (Polyakov et al. 2009).
Let us first describe the main differences between the approaches used in the ASJP and GLD projects and in traditional lexicostatistics. In both the ASJP and GLD projects phonetic similarities among languages are determined automatically. The main shared principles of these approaches are the following: (1) a short list of basic vocabulary is chosen (some variant of the Swadesh list); (2) a phonetic similarity between words representing the same meanings in different languages is determined; (3) an algorithm is applied to construct a language family tree using the phonetic similarity measure. The above-mentioned procedures differ in the selection of basic vocabulary (in GLD it is larger), the methods of phonetic similarity calculation, and the algorithms for tree construction. However, both approaches are similar in spirit. In particular, they circumvent some of the ideas peculiar to the comparative method, including its reliance on the identification of shared innovations and the establishment of cognates. The differences between them mainly concern details of how phonetic similarities are computed. The only difference between the two ASJP versions lies in the approaches used to calculate distances between languages on the basis of words from the lists of basic vocabulary.
The lexicostatistical method (Burlak 2005) has features that link it both to the two approaches described above and to the comparative method. The etymological approach to establishing cognates links the lexicostatistical method to the comparative method, whereas the use of a restricted set of lexical data (which typically varies from 35 to 200 basic items in different approaches) and the use of computer programs for tree construction are similar to the ASJP and GLD procedures. All three approaches differ in the extent to which expert knowledge is drawn upon and with regard to the computer-aided calculations used. The lexicostatistical method requires cognates to be established and hence uses non-trivial expertise. Its advantages over the classical comparative method are that a computer algorithm can simultaneously access all the data from all the languages concerned (a person is simply unable to do this, due to the huge amount of data) and its objectivity. Traditional historical linguistics relies on the subjectivity of experts who may interpret data differently and evaluate different parameters as more or less relevant, applying criteria that are rarely stated explicitly. For instance, there is no consensus about how many shared innovations are sufficient for establishing a subgroup, how to clearly distinguish between truly shared innovations and accidentally similar innovations, and how to deal with conflicting evidence. To sum up, in this paper three approaches are considered: the classical comparative-historical approach (“expert approach”), the lexicostatistical method (which takes expert knowledge on cognates as input to phylogenetic algorithms), and the computer-only approach (where the distance between languages is determined algorithmically, without using expertise on cognates).
Let us emphasize the differences between the four automatic methods mentioned above:
A). Lexicostatistical method vs. the rest: using cognates vs. not using cognates (influence of labor-intensive expertise).
B). GLD vs. ASJP: 110-item wordlists vs. 40-item wordlists. It is believed that the GLD lists have been checked more carefully for the quality of their data than is the case for ASJP (http://starling.rinet.ru/new100/treesr.htm). There are slight differences between the processing algorithms of these two approaches, but they do not seem essential.
C). The versions of ASJP differ in the method (LDND vs. PMI) used to calculate the distance between languages.
Descriptions and comparisons of different algorithms for constructing trees are offered in Nichols & Warnow (2008) and Wichmann & Saunders (2007), and we shall not consider this issue here. However, the important point is that phylogenetic algorithms are clustering algorithms, for which instability is usual. Templ (2007) notes that “Applying cluster analysis on real data results in highly non-stable results for many reasons”. Mooi & Sarstedt (2011: 237-284) recommend applying various cluster algorithms to the same data and comparing the results obtained, identifying recurring clusters. Wilkinson (1996) describes a method for the automatic construction of consensus trees. This technique has not been applied earlier in linguistic phylogenetic research and will be tested for the first time in this article.
The purposes of this paper are the following:
• to bring to the attention of researchers a new benchmark, that of North Caucasian languages with its corresponding datasets and trees;
• to assess four different methods of computer classification for well-studied subgroups of North Caucasian;
• to compare by objective quantitative methods the results offered by experts and those obtained by computer methods, and thereby estimate the influence of various computer approaches, first of all of the datasets they draw upon, on the results of classification;
• to construct and analyze the consensus trees for large branches of North Caucasian languages considering all the available classifications.
In order to assess how well these new methods can be applied to determine language relationships, we use the consensus classification of North Caucasian languages from Lewis (2013), henceforth Ethnologue, as well as other expert classifications. The trees constructed using the different methods will be compared by means of strict quantitative approaches. We shall also juxtapose the hypotheses of different experts in order to assess the variability of results obtained by applying the comparative-historical approach. As it turns out, the aforementioned new methods and more traditional approaches inform and shed light on each other’s assumptions.
The structure of this article is the following. In section 2 I present traditional schemes of classification of North Caucasian languages and the research methods used. In section 3 computer methods are tested for the upper level groups. In section 4 the lower level structures for each branch of North Caucasian languages are considered. A discussion of the results obtained is provided in section 5.

2. Data and methods

As in several works before this one (Wichmann et al. 2010; Huff & Lonsdale 2011; Pompei et al. 2011), the Ethnologue classification will serve as a gold standard. This is due to the fact that this classification contains only uncontroversial clades. Fig. 1 reproduces the Ethnologue classification of North Caucasian languages, in which 34 languages are taken into account. We consider only the languages listed in Ethnologue and do not include dialects, which are also not generally considered in other classifications.

• West Caucasian
  • Abkhaz-Abazin: Abaza (abq), Abkhaz (abk)
  • Circassian: Adyghe (ady), Kabardian (kbd)
  • Ubyx: Ubykh (uby)
• East Caucasian
  • Nakh
    • Batsi: Bats (bbl)
    • Chechen-Ingush: Chechen (che), Ingush (inh)
  • Avar-Andic
    • Andic: Akhvakh (akv), Andi (ani), Bagvalal (kva), Botlikh (bph), Chamalal (cji), Ghodoberi (gdo), Karata (kpt), Tindi (tin)
    • Avar: Avar (ava)
  • Tsezic
    • East Tsezic: Bezhta (kap), Hunzib (huz)
    • West Tsezic: Dido (ddo), Hinukh (gin), Khvarshi (khv)
  • Dargi: Dargwa (dar)
  • Khinalugh: Khinalugh (kjj)
  • Lak: Lak (lbe)
  • Lezgic
    • Archi: Archi (aqc)
    • Udi: Udi (udi)
    • Nuclear Lezgic
      • East Lezgic: Aghul (agx), Lezgi (lez), Tabassaran (tab)
      • South Lezgic: Budukh (bdk), Kryts (kry)
      • West Lezgic: Rutul (rut), Tsakhur (tkr)

Fig.1. North Caucasian languages from Ethnologue
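Trees such as the one in Fig. 1 are commonly exchanged as bracketed Newick strings. The following minimal sketch shows how the West Caucasian part of Fig. 1 might be written and read back into its clades. The particular string, the use of ISO codes as leaf labels, and the helper names parse_newick, leaves and clades are illustrative assumptions, not the encoding actually used in the Applications; the parser handles only well-formed strings without branch lengths.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths.

    Returns nested (label, children) tuples; leaves have an empty child list.
    Assumes well-formed input (hypothetical helper for illustration only).
    """
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return (text[start:pos], children)

    return node()


def leaves(tree):
    """List the leaf labels below a node, left to right."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


def clades(tree):
    """Map each internal node label to the set of languages it dominates."""
    label, children = tree
    found = {}
    if children:
        found[label] = set(leaves(tree))
        for child in children:
            found.update(clades(child))
    return found


# Assumed encoding of the West Caucasian branch of Fig. 1 (ISO codes as leaves).
west = "((abq,abk)Abkhaz-Abazin,(ady,kbd)Circassian,(uby)Ubyx)West_Caucasian;"
west_clades = clades(parse_newick(west))
for name, members in west_clades.items():
    print(name, sorted(members))
```

Representing each clade as a set of languages makes it straightforward to check whether a subgroup of the benchmark also appears in another tree.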

This and all other trees are represented in the Newick format in the Applications. Other sources suggest more detailed classifications comprising additional language groups. The most commonly adopted groups are those given in Haspelmath et al. (2005), henceforth WALS, which correspond to those of Ruhlen (1987). WALS contains the same set of languages as Ethnologue, plus two Adyghe dialects. The WALS classification differs from Ethnologue in four respects: first, North East Caucasian languages are subdivided into two subgroups, Nakh and all the rest (Daghestanian); furthermore, Khinalug is included in the Lezgic branch, Tsezic is integrated with Avar-Andic, and Lak is coupled with Dargwa. These combinations have many supporters. We shall consider them as the main hypotheses and will be interested in seeing whether computational methods confirm them.
Further on, we shall examine more detailed and, correspondingly, more hypothetical expert classifications (Schulze 2014; Alekseev 2001). Figures S2a-S2c in the Applications illustrate hypothetical trees from Schulze (2014). Unfortunately, Schulze does not provide a detailed description of the data and methods used for the construction of these trees. His partially published data (Gippert 2008) are questioned in Kassian (2015). Fig. S3 in the Applications presents the classification used in the monograph of Alekseev (2001) from the Jazyki Mira series of typologically oriented sketch grammars of (mainly) Eurasian languages. The classification from Alekseev (2001) (comprising the same languages) is a compilation of results obtained by the traditional comparative-historical method and reflects the views of the Soviet and Russian schools of Caucasology.
Let us pass on to another classification, that of the ASJP. This project makes use of word lists comprising 40 items from the Swadesh list considered to be particularly stable. The word lists are transcribed using a special system, the so-called ASJP code (Holman et al. 2008).
As the project has developed, four successive versions of the database have become available, along with four world trees of lexical similarity based on the different versions. Here we use the fourth version of the world tree (Müller et al. 2013). This version uses the LDND measure, based on the Levenshtein edit distance. North Caucasian languages from ASJP are presented in the form of a tree in the Applications (Fig. S4). Recently Jäger (2013) has successfully applied the Pointwise Mutual Information (PMI) metric (Church and Hanks 1990) to calculate the distance between words. The metric is based on ideas of sequence alignment theory elaborated in bioinformatics. The ASJP tree with the PMI metric is presented in the Applications, Fig. S5. In what follows we shall consider the ASJP-LDND and ASJP-PMI trees separately. Details of the definition of the LDND distance metric can be found in Wichmann et al. (2010), and Jäger (2013) describes the PMI distance metric.
One more stipulation concerning the structure of the ASJP trees is in order. The latest versions of this database contain a variety of dialects from the North Caucasian region. Since Ethnologue, which is being used as a standard of comparison, does not contain any dialects, we remove all dialects of each language except one. That does not present difficulties or affect the tree structure, since in all cases but one the dialects form compact branching structures united under single nodes. The exception is that two dialects of the Karata language occupy non-adjacent positions in the tree in Fig. S3 in the Applications. In this case the Tokitin dialect is removed, being the more peripheral one (Magomedbekova 2001).
As mentioned above, the GLD project is based on 110-item Swadesh lists. For the purpose of constructing the “Objectively Generated” Trees of Phonetic Similarity in the GLD project, the following definition of word similarity is used: “the first and the second of the of the words compared should belong to the same consonantal class” (http://starling.rinet.ru/new100/trees.htm). The consonant classes used are displayed at http://starling.rinet.ru/new100/sound.pdf.
Next, the degree of linguistic affinity between the languages is calculated as the percentage of phonetically similar words, and on the basis of these data a phylogenetic tree is constructed by means of the Neighbor-Joining algorithm. Fig. S6 in the Applications shows the tree from the GLD project, in which, unfortunately, only a part (17) of the North Caucasian languages is presented: Lezgic, Tsezic, Nakh, and Khinalug. The comparison of the automatically constructed trees in Fig. S4, S5 and S6 (the Applications) is of interest because they are built on the basis of different metrics and use Swadesh lists of various lengths. Therefore, we have an opportunity to juxtapose the effects of these factors.
Finally, Fig. S7 in the Applications presents the tree from Koryakov (2006), which is constructed by means of lexicostatistics. The tree is based on cognate identification among the items of the 110-word Swadesh list for North Caucasian languages (http://starling.rinet.ru/cgi-bin/main.cgi?=new100&encoding=utf-rus), with application of the STARLING phylogenetic algorithm (http://starling.rinet.ru/new100/downloads.htm), an analog of the neighbour joining algorithm. The lists were created within the Tower of Babel project (later transformed into GLD). Among the various automatic classifications, the focus of attention in the present work will be on that of the ASJP project, since it comprises a larger number of languages in general and of North Caucasian languages in particular as compared with GLD, and because the procedures by which it is constructed are better documented than those of Koryakov (2006).
A direct and exact comparison of the trees illustrated in figures S2-S7 with the reference tree is not straightforward, since the trees are structured too differently, ranging from the completely binary branching ones produced by the neighbor-joining algorithm to trees with predominantly non-binary branching where the number of coordinate branches is sometimes as high as eight, as in the Ethnologue tree. Family trees constructed by means of the comparative-historical method are often highly unresolved, reflecting uncertainty on the part of the researchers about the evidence for subgrouping at higher taxonomic levels.
In Nichols & Warnow (2008) three criteria for comparing computer-constructed trees with a benchmark are proposed. We shall use only one of them, namely the second (“No missing subgroups”), since the first is a special weak case of the second, and the third relates to the issue of dating, which we do not deal with. The criterion “no missing subgroups” requires that all subgroups differentiated in the benchmark be differentiated in the computer-constructed tree. This is a very strict test having only two values, “satisfied” and “not satisfied”. However, in large families with many languages and subgroups, some languages may be wrongly placed in the tree although, in general, recognized subgroups are correctly differentiated. In such cases it is possible to use numerical estimates of the degree of similarity. Donohue (2010) proposed an approach similar to the one mentioned above. This method builds on logic from the field of information retrieval (Rijsbergen 1979). We now briefly outline this approach and then apply it to the case of North Caucasian languages. Following this approach, let us juxtapose an inferred tree, T, with the reference tree and determine how completely and exactly each branch of the benchmark is represented by the corresponding subtree in T. We will use the following designations: tp = true positives, the number of languages of the subtree in T that belong to the branch; fp = false positives, the number of languages of the subtree that do not belong to the branch; fn = false negatives, the number of languages of the branch that are not included in the subtree.
P (Precision) = tp/(tp+fp), R (Recall) = tp/(tp+fn). These two measures can be combined into a single index, the so-called F1-measure, calculated by the formula F1 = 2PR/(P+R). Precision, recall and the F1-measure are customary means of evaluating information extraction systems (Rijsbergen 1979).
Another metric often used is the Robinson-Foulds (RF) distance (Robinson & Foulds 1981). It works as follows. Each tree is treated as unrooted and is formalized as a set of partitions, and a count is then made of how many partitions occur in one tree but not in the other. For instance, the tree in Fig. 2a contains the partitions AB|CDE and ABC|DE, and the one in Fig. 2b the partitions AB|CDE and ABD|CE. Thus, one of the two partitions is different, so the raw RF distance is 1 and the RF distance normalized by the number of comparisons is ½ or 50%. The method requires that the trees contain the same taxa (languages in this case). It is known to be overly sensitive to differences ensuing from the different placement of just one taxon. If the placement is quite different in the two trees, it can involve many nodes and produce a large RF distance, in spite of being caused by just a single taxon. The method will also count a difference when one tree has an unresolved node where the other tree is fully binary. For instance, if one of the trees in Fig. 2 had C, D, and E connected to a single node, it would show a difference of 1 from the other tree, again implying a normalized distance of 50%. Since, as mentioned, language family trees based on the comparative method are often unresolved, Pompei et al. (2011) introduced the Generalized Robinson-Foulds distance, which does not “punish” one tree for being more resolved than the other. This metric would judge two trees to be identical in the situation just described, where one of them places C, D, and E under the same node.

Fig. 2a, b. Unrooted trees illustrating the Robinson-Foulds tree-distance metric.
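The partition count described for Fig. 2 can be sketched as follows. This is a minimal illustration that follows the counting used in the text (partitions of one tree that are absent from the other, normalized by the number of partitions compared); the more common formulation of the Robinson-Foulds distance counts differing partitions in both directions. The helper names bipartition and partition_difference are hypothetical, and each tree is entered directly as its set of non-trivial partitions rather than derived from a tree file.

```python
def bipartition(side, taxa):
    """Canonical form of a partition: an unordered pair of taxon sets."""
    side = frozenset(side)
    return frozenset({side, frozenset(taxa) - side})


def partition_difference(splits_a, splits_b):
    """Count the partitions of tree A that do not occur in tree B."""
    return len(set(splits_a) - set(splits_b))


taxa = "ABCDE"

# Fig. 2a: AB|CDE and ABC|DE; Fig. 2b: AB|CDE and ABD|CE.
tree_2a = {bipartition("AB", taxa), bipartition("ABC", taxa)}
tree_2b = {bipartition("AB", taxa), bipartition("ABD", taxa)}

raw = partition_difference(tree_2a, tree_2b)
normalized = raw / len(tree_2a)
print(raw, normalized)

# A variant in which C, D and E hang from a single unresolved node
# keeps only the partition AB|CDE.
tree_unresolved = {bipartition("AB", taxa)}
raw_unresolved = partition_difference(tree_2a, tree_unresolved)
print(raw_unresolved)
```

Storing each partition as an unordered pair of sets means that AB|CDE and CDE|AB are treated as the same partition, which is what the comparison of unrooted trees requires.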

Another popular measure of tree proximity, the quartet distance, was proposed in Estabrook et al. (1985). The quartet distance is defined as the number of subsets of four taxa that form different subtrees in the two trees.
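As an illustration of the quartet distance, the sketch below applies the definition to the two trees of Fig. 2. It is a brute-force illustration under stated assumptions, not an efficient algorithm: each tree is given by its non-trivial partitions, and the function names split, quartet_topology and quartet_distance are hypothetical.

```python
from itertools import combinations


def split(left, right):
    """A partition of the taxa, written as its two sides."""
    return (frozenset(left), frozenset(right))


def quartet_topology(quartet, splits):
    """Return the pairing a tree induces on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for side_a, side_b in splits:
        left, right = q & side_a, q & side_b
        if len(left) == 2 and len(right) == 2:
            return frozenset({left, right})
    return None


def quartet_distance(taxa, splits_a, splits_b):
    """Count the four-taxon subsets on which the two trees disagree."""
    return sum(
        1
        for q in combinations(sorted(taxa), 4)
        if quartet_topology(q, splits_a) != quartet_topology(q, splits_b)
    )


# Partitions of the trees in Fig. 2a and Fig. 2b.
tree_2a = [split("AB", "CDE"), split("ABC", "DE")]
tree_2b = [split("AB", "CDE"), split("ABD", "CE")]

distance = quartet_distance("ABCDE", tree_2a, tree_2b)
print(distance)
```

For the five taxa of Fig. 2 there are five quartets; the two trees agree on the three that contain both A and B and disagree on ACDE and BCDE.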

3. The upper level of North Caucasian languages classification

3.1. North East Caucasian and North West Caucasian

In all classifications, the North Caucasian languages are divided at the upper level into North East Caucasian and North West Caucasian, so we see no disagreement here. However, Ethnologue treats North Caucasian languages as a family (this view is supported by Nikolaev (1994)), while other classifications consider only North East Caucasian and North West Caucasian as families. Their relationship is questioned in other works (Nichols 2003) and can be regarded as rather hypothetical. Automated tree reconstruction is not helpful in resolving this particular issue, since it does not assign the status of a family to one unit or another; it can only differentiate evolutionary branches in a family already assumed to be a unit.

3.2. North West Caucasian languages

At the next level of specification, Ethnologue divides the group of West Caucasian languages into three branches: Abkhaz-Abazin, Circassian and Ubykh. WALS does not differentiate these branches, owing to its general strategy of not distinguishing smaller groups. In Alekseev (2001) Ubykh is interpreted as an intermediate link between Abkhaz-Abazin and Circassian. Moreover, Alekseev notes that Ubykh is strongly influenced by Adyghe. In some works (Schulze 2014; Koryakov 2006) Ubykh is coupled in a subtree with Circassian, but in the ASJP trees (in both variants) Ubykh is combined with Abkhaz-Abazin. Ubykh shares some innovations with Abkhaz-Abazin, and some with Circassian. Changes such as *p: > b, *bw > p, *t: > d unite Ubykh with Abkhaz-Abazin, while w the evolution *G > (voiced uvular ), *š > š, *ž > z and others link Ubykh to Circassian (Nikolaev 1994). Thus, no consensus on the classification of Ubykh has been reached. Computer methods also lead to contradictory results. These discrepancies can be explained by the following course of events.
It is quite possible that Ubykh initially belonged to the Abkhaz-Abazin branch, but at a later stage, under the influence of borrowings from Adyghe, changed so profoundly that it moved closer to the Circassian languages.

3.3. North East Caucasian languages. Upper level

Let us now turn to the more interesting East Caucasian languages. Ethnologue differentiates the following branches: Nakh, Tsezic, Avar-Andic, Lak, Dargwa, Lezgic and Khinalug, with no additional groups. The lexicostatistical method (Fig. S7 in the Applications; LS for short) distinguishes exactly the same branches, i.e. it satisfies the criterion “no missing subgroups” from Nichols & Warnow (2008). The GLD method in http://starling.rinet.ru/new100/trees.htm is applied only to Lezgic, Khinalug, Nakh and Tsezic. It correctly determines the Nakh and Tsezic branches, but includes Khinalug in Lezgic (Fig. S6) and, therefore, does not satisfy the criterion. The ASJP method also does not satisfy the criterion. Using the example of the ASJP trees, let us show how to identify the branches of the gold standard and to assess numerically the completeness and exactness of their identification. Above all, we see that each branch from Ethnologue has a clearly identifiable counterpart subtree in the ASJP-LDND tree. Tables 1 and 2 present the evaluation of completeness and exactness in differentiating the branches in the ASJP-LDND tree and the ASJP-PMI tree.

Table 1. Completeness and exactness in differentiating of branches in ASJP-LDND-tree

Group        no. lgs.  tp  fp  fn  Precision  Recall  F1
Nakh         3         3   0   0   1.00       1.00    1.00
Tsezic       5         5   0   0   1.00       1.00    1.00
Avar-Andic   9         8   0   1   1.00       0.89    0.94
Lak          1         1   0   0   1.00       1.00    1.00
Dargwa       1         1   0   0   1.00       1.00    1.00
Lezgic       9         9   1   0   0.90       1.00    0.94
Khinalug     1         1   0   0   1.00       1.00    1.00

As can be seen from the table, the ASJP-LDND indices are equal to 1 for all branches but two; for the Lezgian branch (one extra language is included in it) and the Avar-Andic branch (one of its languages is not included in it) they are close to 1.
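The precision, recall and F1 values can be recomputed directly from the tp, fp and fn counts. The sketch below does so for the counts of Table 1; the function name precision_recall_f1 and the dictionary layout are illustrative assumptions, while the counts themselves are taken from the table.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) per branch, as listed in Table 1 (ASJP-LDND tree).
table1_counts = {
    "Nakh": (3, 0, 0),
    "Tsezic": (5, 0, 0),
    "Avar-Andic": (8, 0, 1),
    "Lak": (1, 0, 0),
    "Dargwa": (1, 0, 0),
    "Lezgic": (9, 1, 0),
    "Khinalug": (1, 0, 0),
}

scores = {group: precision_recall_f1(*counts) for group, counts in table1_counts.items()}
for group, (p, r, f1) in scores.items():
    print(f"{group:<12}{p:.3f}  {r:.3f}  {f1:.3f}")
```

For Avar-Andic the unrounded F1 is 16/17 (about 0.941) and for Lezgic it is 18/19 (about 0.947), so the last digit of a two-decimal figure depends on whether values are rounded or truncated.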

Table 2. Completeness and exactness in differentiating of branches in ASJP-PMI-tree

Group        no. lgs.  tp  fp  fn  Precision  Recall  F1
Nakh         3         3   0   0   1.00       1.00    1.00
Tsezic       5         5   0   0   1.00       1.00    1.00
Avar-Andic   9         9   0   0   1.00       1.00    1.00
Lak          1         1   0   0   1.00       1.00    1.00
Dargwa       1         1   0   0   1.00       1.00    1.00
Lezgic       9         9   1   0   0.90       1.00    0.94
Khinalug     1         1   0   0   1.00       1.00    1.00

As we see, the ASJP-PMI method produces better results than ASJP-LDND. The only divergence between ASJP-PMI and Ethnologue lies in the inclusion of the Khinalug language in the Lezgian branch. Thus, if we compare the four methods with respect to the correct differentiation of the generally recognized upper-level groups, we obtain the following hierarchy: LS > ASJP-PMI, GLD > ASJP-LDND.

3.4. Main hypotheses on upper level grouping

Now let us consider possible groupings of branches in North East Caucasian languages. As was mentioned above, the main hypotheses are the following: A) the Khinalug language is included in the Lezgian branch, B) Lak and Dargwa are combined in a single branch, C) Avar-Andic and Tsezic are also combined in a single branch, D) all the languages except Nakh are combined into a Daghestanian subgroup. Let us consider them.
A). Khinalug is included in the Lezgian branch in a number of expert classifications (WALS, Ruhlen 1987, Schulze 2014, Nichols 2003, Talibov 1980). In the 110-word Swadesh list for Khinalug (http://starling.rinet.ru/cgi-bin/response.cgi?root=new100&morpho=0&basename=new100\ncc\kjj&limit=-1) most of the words go back to Proto-Lezgian forms. Khinalug shares with Lezgian a number of innovations, for instance, * > č: (Nikolaev 1994) ( - voiced hissing-hushing (=palatalized) ). Proto-East Caucasian *w (with subsequent n, m or l) has been preserved as w only in Lezgian and Khinalug, while in the rest of the East Caucasian languages it has been transformed into m or b (Nikolaev 1994). Though this appears to be not so much a shared innovation as a shared retention, phylogenetic algorithms will count this type of data in favour of linguistic affinity (and, possibly, in favour of a longer period of existence as one proto-language, i.e. of relationship). At the same time, a number of works express the opinion that Khinalug’s membership in the Lezgian branch is not proven.
(Nikolaev 1994) draws attention to the development * > ( - voiceless uvular fricative), found only in Khinalug and Nakh. This issue is discussed in (Nichols 2003), where a series of examples of such affinity is considered, in particular ‘five’: Nakh pxi and Khinalug px´u. J. Nichols attempts to explain this “mysterious” affinity by the parallel development of these forms. Hence, most works include Khinalug in the Lezgian branch. GLD and ASJP-PMI support this hypothesis, and if it is true, the GLD and ASJP-PMI trees are in full agreement with Ethnologue. The lexicostatistical method does not include Khinalug in the Lezgian branch, but it places them close to each other in the tree, grouping them in the 'Southern' branch and in this way also supporting their relationship.

B). The data on the affinity of Lak and Dargwa seem contradictory. (Nichols 2003) gives multiple examples both of innovations shared by these two languages and of innovations that each of them shares with other languages. Here are a few further examples.
1. Cluster has disappeared only in Lak and Proto-Dargwa (PD) but has remained intact in the form l in other branches.
2. The development *Hr- (Lak. t: Vr-, PD *dVr-), where H is some laryngeal and V is some vowel, is characteristic only of Lak and Dargwa.


On the other hand:
3. *b has developed into p in part of the vocabulary of Lak, Dargwa, Proto-Lezgian and Khinalug, but not in other branches.
4. * (glottalized labial stop) has developed into b in Avar and proto ; and in part of the vocabulary of Proto-Nakh, Proto-Dargwa, Proto-Tsezic and Khinalug, but not in Lak and Proto-Lezgian.
5. *w has developed into b in Avar, Proto-Andic, Proto-Nakh, Proto-Dargwa and Proto-Tsezic, but has been preserved in Lak, Proto-Lezgian and Khinalug.

It is quite difficult to reconcile such numerous and contradictory data. Therefore, although many expert works (WALS; Schulze 2014) unite Lak and Dargwa in a common branch, their relationship is considered unproven. Nakhleh et al. (2005) argue for applying computer algorithms in such cases on the grounds that the programs take all the data into account simultaneously and are able to find a solution consistent with the maximum amount of data. In the LS tree as well as in the ASJP-PMI tree, Lak and Dargwa are united in a single branch. In ASJP-LDND, Lak and Dargwa are close to each other in the tree but are separated by Avar, which is combined with Dargwa. This is an apparent error of the tree-construction algorithm, especially since in the previous version of the ASJP-LDND tree Avar was placed with the Andic languages while Lak and Dargwa formed a common branch.

C). Avar-Andic and Tsezic are coupled in LS, ASJP-PMI and ASJP-LDND (with the exception of Avar). The affinity of the Avar-Andic and Tsezic languages is recognized in practically all papers, so it will not be discussed further. The placement of Avar apart from the other languages in the ASJP-LDND tree seems to result not from an error in the underlying data (the ASJP database) but rather from an imperfection of the tree-constructing phylogenetic algorithm. Phylogenetic algorithms construct correct trees if certain conditions on language evolution (additivity, lexical clock, etc.) are satisfied (Semple & Steel 2003).
However, for real families these conditions are rarely satisfied, and algorithm errors are quite possible and even almost inevitable. The discussion of this question is beyond the scope of this article; see (Solovyev 2014). If we apply another method, multidimensional scaling with a 2-dimensional plot, we obtain the result shown in Fig. 3.


Fig. 3. Proximity diagram built for North East Caucasian languages by the method of multidimensional scaling

Here Avar is located in the Avar-Andic group, rather far from Dargwa, and the Tsezic languages are clearly close to Avar-Andic. Although multidimensional scaling does not build trees, it is often used in phylogenetic research (Serva 2011; Houtzagers 2010; Blanchard 2011).

D). The status of the Nakh languages remains unclear. In some expert classifications these languages are regarded as one of the two subgroups of the North East Caucasian languages, on a par with Daghestanian. In other classifications they are placed on the same level as the rest of the Daghestanian languages. However, the proposed Daghestanian innovations are open to discussion. For instance, *st > *c is claimed to be an all-Daghestanian innovation; however, (Nichols 2003) contests its significance, regarding this development as quite natural and its independent appearance in various branches of the East Caucasian languages as possible. Nevertheless, the primary division of the East Caucasian languages into Nakh and Daghestanian is recognized in most works, and it is this division that is shown in the ASJP-PMI tree. Unexpectedly, the Daghestanian clade is not identified by the LS, GLD and ASJP-LDND methods. The results are summarized in the following table.


Tsezic languages are united with Avar-Andic in all classifications. Khinalug is combined with Lezgic in all classifications except (Alekseev 2001, Burlak 2005). The picture for combining Dargwa with Lak and for uniting all Daghestanian languages in one branch is less clear.

Table 3. Groups of the upper level in compliance with the main sources and methods

                    Ruhlen, WALS  Diakonov  Burlak  Alekseev  Schulze  Nichols  GLD  ASJP-LDND        ASJP-PMI  LS
Daghestanian        +             -         -       +         -        +        -    -                +         -
Lezgic+Khinalug     +             +         -       -         +        +        +    +                +         +
Dargwa+Lak          +             -         -       -         +        -        ?    -                +         +
Avar-Andic+Tsezic   +             +         +       +         +        +        ?    + (except Avar)  +         +

Note. The sign ‘?’ denotes absence of data.

In general, Table 3 shows that expert classifications can diverge substantially (for instance, Schulze 2014 and Alekseev 2001). At the same time, the best computer-aided classifications (Koryakov 2001, ASJP-PMI) are close to each other and coincide with, or are close to, acknowledged expert classifications (WALS, Ruhlen (1987)). Overall, LS and ASJP-PMI give the best results (those most consistent with the traditional standpoint) in comparison with ASJP-LDND. The data of the GLD project are incomplete.

4. The structure of branches

4.1. Nakh languages. According to all classifications, Bats was the first to split off from Proto-Nakh; Chechen is close to Ingush, and together they form one sub-branch.

4.2. Tsezic languages. All expert classifications that aim to construct a more detailed tree of the Tsezic languages posit the following groupings: (Hunzib, Bezhta) and (Dido, Hinukh). As for Khvarshi, it is included in the first group in (Schulze 2014), while in (Alekseev 2001) and in Ethnologue it belongs to the second group. It also falls into the second group according to both ASJP trees and the LS tree. Khvarshi is not examined in the GLD project. Hence, the automatic classifications agree exactly with the reference one but do not support the version of (Schulze 2014). Compared to the LS tree of (Koryakov 2006), the ASJP trees provide the additional information that Dido and Hinukh are closer to each other than to Khvarshi.

4.3. Avar-Andic languages. This is the first branch under consideration with a rather large number of languages and a quite complicated structure that has been described in different ways. In all the trees the Avar language is differentiated from Andic, and both of them are separated as a common branch (except for the mistake in the ASJP-LDND tree discussed above). Standard classifications such as Ethnologue, WALS and the recently created resource Glottolog (Nordhoff 2013) do not give any internal classification of the Andic languages. Beyond this point, however, the classifications diverge. W. Schulze (fig. S2b) suggests the following order in which languages split off from Proto-Andic (in the Newick format): (Andi, (Akhvakh, (Karata, (Botlikh, Godoberi, Chamalal, (Tindi, Bagvalal))))). (Alekseev 2001) gives a quite different classification of the languages into three groups: ((Andi, Botlikh, Godoberi), (Akhvakh, Karata), (Bagvalal, Tindi, Chamalal)). According to the lexicostatistical tree of fig. S7, the order of splitting is the following: (Akhvakh, (Andi, (Karata, (Botlikh, Godoberi, Chamalal, (Tindi, Bagvalal))))), which coincides with the classification of (Schulze 2014) almost completely, except for the positions of the first two languages. The ASJP-LDND tree also shows Akhvakh splitting off first (thereby supporting the lexicostatistical approach): (Akhvakh, (Andi, (Chamalal, ((Botlikh, Godoberi), (Karata, (Tindi, Bagvalal)))))). Finally, the ASJP-PMI tree shows the order (Akhvakh, (Andi, (Karata, (((Botlikh, Godoberi), Chamalal), (Tindi, Bagvalal))))), almost exactly as in the lexicostatistical tree.

For the Andic languages Ethnologue does not recommend any nontrivial structure, so it is pointless to use the criterion offered in Nichols & Warnow (2008) for the upper-level branches. In this case two ideas can be applied. First, the trees can be compared pairwise by means of one of the metrics, followed by a comparison of the results shown by the different methods. The purpose is to assess the methods and to determine which of them are similar and which stand apart. Second, the different trees can be united (by means of special algorithms) into a consensus (summary) tree.
We can expect that the gaps and wrong decisions of the individual methods will be evened out and that the result will be the most probable tree of evolution. The more trees constructed by different (but reasonable) methods we have, the more probable it is that the summary tree is correct. This approach is applied in Kassian (2015) to the Lezgic languages and the LS group of algorithms; however, the summary tree there was constructed manually, i.e. some subjectivity remained. The following table gives the Robinson-Foulds distances between the trees under discussion, calculated with the help of the TREX program (http://www.trex.uqam.ca/index.php?action=rf&project=trex).

Table 4. Distances between the trees of Andic branch according to Robinson-Foulds

            Schulze  Alekseev  LS  ASJP-LDND  ASJP-PMI
Schulze     0        7         2   6          4
Alekseev    7        0         7   9          9
LS          2        7         0   4          2
ASJP-LDND   6        9         4   0          4
ASJP-PMI    4        9         2   4          0
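The Robinson-Foulds comparison can be illustrated with a minimal Python sketch. It assumes that each tree is read as rooted and that the distance is the number of non-trivial clades present in one tree but not in the other; the helper names `clades` and `rf` are hypothetical, and the sketch does not claim to reproduce the internals of TREX. Under this reading, the five Andic Newick strings quoted in the text give the same ten pairwise values as Table 4.

```python
from itertools import combinations

# Andic trees in Newick form, as quoted in the text.
ANDIC = {
    "Schulze":   "(Andi,(Akhvakh,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))",
    "Alekseev":  "((Andi,Botlikh,Godoberi),(Akhvakh,Karata),(Bagvalal,Tindi,Chamalal))",
    "LS":        "(Akhvakh,(Andi,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))",
    "ASJP-LDND": "(Akhvakh,(Andi,(Chamalal,((Botlikh,Godoberi),(Karata,(Tindi,Bagvalal))))))",
    "ASJP-PMI":  "(Akhvakh,(Andi,(Karata,(((Botlikh,Godoberi),Chamalal),(Tindi,Bagvalal)))))",
}


def clades(newick):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted Newick tree."""
    stack, found, name = [set()], set(), ""
    for ch in newick.rstrip(";") + ",":
        if ch in "(),":
            if name.strip():
                stack[-1].add(name.strip())   # finish the current leaf name
            name = ""
            if ch == "(":
                stack.append(set())           # open a new group
            elif ch == ")":
                group = stack.pop()           # close the group and record it
                stack[-1] |= group
                found.add(frozenset(group))
        else:
            name += ch
    taxa = stack[0]
    # Drop the clade containing every leaf; single leaves are never recorded.
    return {c for c in found if 1 < len(c) < len(taxa)}


def rf(newick_a, newick_b):
    """Assumed reading of Robinson-Foulds: size of the symmetric difference of clade sets."""
    return len(clades(newick_a) ^ clades(newick_b))


for a, b in combinations(ANDIC, 2):
    print(f"{a:10s} {b:10s} {rf(ANDIC[a], ANDIC[b])}")
```

Treating the trees as rooted matters here: the Schulze and LS trees differ only in whether Andi or Akhvakh splits off first, a difference that would disappear if the root were ignored.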

The classifications by Alekseev and ASJP-LDND are far from the other classifications and from each other. The classifications by Schulze, LS and ASJP-PMI form a cluster of the most similar classifications. It is curious that these three most congruent classifications include representatives of all three groups outlined at the beginning of the paper: expert, lexicostatistical, and the group based on phonetic affinity. Moreover, the lexicostatistical classification occupies a central position among the others. If we calculate the aggregate distance from each classification to all the others, it turns out to be the smallest for the lexicostatistical method (15, against 19 for Schulze and for ASJP-PMI, 23 for ASJP-LDND and 32 for Alekseev).

The quartet distance was computed by means of the QDIST algorithm (http://birc.au.dk/software/qdist/). The results of the computation are given in the table below.

Table 5. Distances between the trees of Andic branch according to Quartet distance

            Schulze  Alekseev  LS  ASJP-LDND  ASJP-PMI
Schulze     0        86        6   60         36
Alekseev    86       0         85  85         77
LS          6        85        0   54         30
ASJP-LDND   60       85        54  0          36
ASJP-PMI    36       77        30  36         0
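How a quartet distance can be obtained from the same Newick strings is sketched below in Python. The conventions are assumptions, since the text does not describe how QDIST was run: the root is treated as an extra leaf (here given the hypothetical name `ROOT`), a quartet left unresolved by a polytomy counts as its own topology, and the distance is the number of four-leaf subsets on which two trees disagree. With these assumptions the sketch gives 6 for Schulze against LS, 30 for LS against ASJP-PMI and 36 for Schulze against ASJP-PMI, in line with Table 5.

```python
from itertools import combinations

SCHULZE = "(Andi,(Akhvakh,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))"
LS = "(Akhvakh,(Andi,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))"
ASJP_PMI = "(Akhvakh,(Andi,(Karata,(((Botlikh,Godoberi),Chamalal),(Tindi,Bagvalal)))))"


def clades(newick):
    """Return (non-trivial clades, set of all leaves) of a rooted Newick tree."""
    stack, found, name = [set()], set(), ""
    for ch in newick.rstrip(";") + ",":
        if ch in "(),":
            if name.strip():
                stack[-1].add(name.strip())
            name = ""
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                group = stack.pop()
                stack[-1] |= group
                found.add(frozenset(group))
        else:
            name += ch
    taxa = frozenset(stack[0])
    return {c for c in found if 1 < len(c) < len(taxa)}, taxa


def quartet_topology(tree_clades, quartet):
    """Return the 2|2 split a tree induces on four leaves, or None if unresolved."""
    for clade in tree_clades:
        inside = clade & quartet
        if len(inside) == 2:                  # the clade separates two leaves from the other two
            return frozenset([inside, quartet - inside])
    return None


def quartet_distance(newick_a, newick_b):
    """Count four-leaf subsets (root included as a leaf) resolved differently by the two trees."""
    clades_a, taxa_a = clades(newick_a)
    clades_b, taxa_b = clades(newick_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must have the same leaves")
    leaves = sorted(taxa_a | {"ROOT"})        # assumption: the root acts as a ninth leaf
    return sum(
        quartet_topology(clades_a, frozenset(q)) != quartet_topology(clades_b, frozenset(q))
        for q in combinations(leaves, 4)
    )


print(quartet_distance(SCHULZE, LS))
print(quartet_distance(LS, ASJP_PMI))
print(quartet_distance(SCHULZE, ASJP_PMI))
```

The brute-force enumeration is adequate for trees of this size (126 quartets for nine leaves); dedicated tools exist because it scales poorly to large trees.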

When the distances are calculated according to the quartet metric, the classifications of Schulze, LS and ASJP-PMI again turn out to be the closest to each other; moreover, the trees of Schulze and LS are very similar. Although the absolute numerical values in the two metrics are very different, their qualitative patterns are close enough.

The most disputable question concerns two languages, Andi and Akhvakh: which of them was the first to split off. According to all computer procedures, Akhvakh was the first. In W. Schulze’s opinion, however, the first was Andi. The results of the comparative method (Nikolaev 1994) support the first version rather than the second. Akhvakh is the only Andic language that has kept the sound q, while in the rest of these languages it has changed into χ or χ/h. Akhvakh is also the sole language that has preserved the sound , in one of its dialects, whereas in the other Andic languages it has changed into λ. Thus, there exist innovations shared by all the Andic languages apart from Akhvakh. The Akhvakh language is characterized by a series of unique innovations: *c > č, *p > h(w) (in the Northern dialect), * > , *k: > x and others. Judging from these data, Akhvakh definitely stands on the periphery of the Andic languages. Some features that unite Akhvakh with Andi are also attested; for instance, both of them keep .

The other difference lies in the merging of Botlikh and Godoberi in both ASJP trees and in the classification by Alekseev (where the group also includes Andi), whereas the other two trees do not merge them.

4.4. Lezgic languages. Here we deal with the Lezgian languages proper, without Khinalug, whose inclusion was discussed above. The following representations of Lezgic have been considered:

LS: (Udi, (Archi, (Rutul, Tsakhur), ((Budukh, Kryts), (Lezgi, (Tabassaran, Aghul)))))

Alekseev: (Udi, (Archi, ((Lezgi, (Tabassaran, Aghul)), (Budukh, Kryts), (Rutul, Tsakhur))))

ASJP: ((Archi, Udi), ((Rutul, Tsakhur), ((Budukh, Kryts), (Lezgi, (Aghul, Tabassaran)))))

Schulze: (Archi, ((Tsakhur, Rutul), (Kryts, Budukh), (Udi, (Tabasaran, (Lezgi, Agul)))))

GLD: ((Archi, Udi), ((Lezgi, (Tabasaran, Aghul)), ((Budukh, Kryts), (Rutul, Tsakhur)))).

At the lowest level all classifications are practically congruent in distinguishing three groups of closely related languages: (Rutul, Tsakhur), (Kryz, Budukh), (Lezgi, Tabasaran, Agul). The main divergences concern only the order in which Archi and Udi split off from Proto-Lezgian. Archi is the first to split off in (Schulze 2014), and Udi is isolated first in (Koryakov 2006) and (Alekseev 2001). It is interesting to note that in Archi is taken out of the Lezgian languages and is treated as an independent branch of Daghestanian. In both ASJP trees and in GLD (fig. S6), Archi and Udi together form a separate branch kept apart from the other Lezgic languages. The structure of the Lezgian branch is identical in the two ASJP trees. Table 6 shows the Robinson-Foulds distances between the classifications mentioned above.

Table 6. Distances between the trees of Lezgian branch according to Robinson-Foulds

           Schulze  Alekseev  GLD  LS  ASJP
Schulze    0        6         5    6   7
Alekseev   6        0         3    2   3
GLD        5        3         0    5   2
LS         6        2         5    0   3
ASJP       7        3         2    3   0

For the Lezgic languages, Schulze’s classification and the LS one turned out to be on the periphery, i.e. the most distant from the other classifications. The trees of the three other classifications are close to each other: the distance between them does not exceed 3. This group comprises an expert classification and two classifications based on phonetic affinity, including ASJP.

Table 7. Distances between the trees of Lezgian branch according to Quartet distance

           Schulze  Alekseev  GLD  LS  ASJP
Schulze    0        58        70   88  82
Alekseev   58       0         43   56  43
GLD        70       43        0    63  36
LS         88       56        63   0   27
ASJP       82       43        36   27  0

According to this metric, the closest to each other are the computer classifications LS and ASJP, and GLD and ASJP. By aggregate distance to the other classifications, ASJP has the minimum value and Schulze’s classification the maximum one. On the whole, the Lezgic classifications are closer to each other than the Andic classifications.

The problem of whether Archi or Udi split off first is solved differently in various classifications. The same is true of combining any two of the three groups (Aghul, Lezgi, Tabassaran), (Budukh, Kryts) and (Rutul, Tsakhur) into one sub-branch. Inside the group (Lezgi, Aghul, Tabassaran), almost all the classifications offer the variant (Lezgi, (Aghul, Tabassaran)).

The comparative method gives conflicting data on the evolution of Archi and Udi. Archi is the sole language that has preserved tw in its dialects; *tw > t is the shared innovation for another of Lezgic languages. Among the innovations of Archi itself is the development *k: > -k, *k > -k:-. The development *t:w > d took place in all the languages apart from Archi and Udi. Development > t: occurred only in Udi. For more detailed information on shared innovations, refer to (Nikolaev 1994).
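The aggregate-distance comparison for the Lezgic classifications can be checked with a short Python sketch over the values of Table 7. The names `aggregate` and `closest_pairs` are hypothetical helpers introduced only for this illustration; the aggregate distance is taken to be the plain sum of a classification's distances to the other four.

```python
NAMES = ["Schulze", "Alekseev", "GLD", "LS", "ASJP"]

# Quartet distances copied from Table 7 (symmetric, zero diagonal).
TABLE7 = [
    [0, 58, 70, 88, 82],
    [58, 0, 43, 56, 43],
    [70, 43, 0, 63, 36],
    [88, 56, 63, 0, 27],
    [82, 43, 36, 27, 0],
]


def aggregate(matrix, names):
    """Sum of distances from each classification to all the others."""
    return {name: sum(row) for name, row in zip(names, matrix)}


def closest_pairs(matrix, names, k=2):
    """The k pairs of classifications with the smallest mutual distance."""
    pairs = [
        (matrix[i][j], names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    return sorted(pairs)[:k]


totals = aggregate(TABLE7, NAMES)
print(totals)
print("most central:", min(totals, key=totals.get))
print("most distant:", max(totals, key=totals.get))
print(closest_pairs(TABLE7, NAMES))
```

The sums are 188 for ASJP, 200 for Alekseev, 212 for GLD, 234 for LS and 298 for Schulze, so ASJP is the most central and Schulze the most distant classification, and the two closest pairs are LS with ASJP (27) and GLD with ASJP (36).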

5. Approaches comparison

To begin with, we take the two methods based on phonetic affinity: ASJP and GLD. Their comparison is interesting because of the difference in the length of the basic vocabulary lists: in GLD they contain up to 110 words, whereas in ASJP they contain only 40. The impact of list size on the final results has been discussed in (Brown 2010; Holman 2008), and the authors came to different conclusions. Holman (2008) suggests that increasing the number of words does not result in considerable improvements, while Brown (2010) obtained data (for Mayan languages) showing that longer lists give more exact trees.

Unfortunately, GLD presents data for only a part of the North East Caucasian languages. The group of Nakh languages is too small and has a simple structure; it is presented identically in both approaches, ASJP and GLD (as well as in other schemes). The Tsezic languages are also few in number, and the structure of their affinity coincides in the data of the two databases.

The Lezgic group is of the greatest interest. Both ASJP and GLD combine Archi and Udi in one branch. This result is quite unexpected, because their close relationship was not suggested by any expert classification. The native speakers of these languages live rather far from each other, so they have no close contacts and hence no large-scale borrowings, at least in recent times. We shall return to this question later. Irrespective of whether the hypothesis of a closer relationship between Archi and Udi within the Lezgic group is true, we can state that both methods point to a great degree of phonetic affinity between these languages, despite the greater length of the vocabulary lists in GLD. Both methods place Khinalug on the periphery of the remaining Lezgic languages. Both approaches adequately differentiate the East, West and South Lezgic groups and correctly determine the closer affinity of Aghul and Tabassaran (as compared to Lezgian).


The only difference between these two classifications is that GLD combines West and South Lezgic into one branch, whereas ASJP couples South and East Lezgic. But this difference is insignificant. The results obtained on this matter by each method are different, and we shall return to it below. Thus, both classifications based on phonetic affinity give practically identical results in spite of the different length of the basic vocabulary lists and other differences in technical details (the means of distance calculation and tree construction).

The comparison of the ASJP-LDND and ASJP-PMI trees makes it possible to find out how strongly the measure of word affinity affects the resulting measure of language affinity. As already mentioned above, the apparent errors in the ASJP-LDND tree emerged at the level of division into sub-branches. At a lower level the ASJP-PMI tree also demonstrates its advantage over ASJP-LDND. Though for the Lezgian languages the results of these two approaches coincide, they differ for Andic and, as stated above, the results of ASJP-PMI agree better with the other data currently available.

Next, let us compare ASJP-PMI with the lexicostatistical approach. For the Andic languages the two classifications are similar; for the Lezgian languages, however, ASJP-PMI shows a better result. In general, both methods produce similar results for the lower-level branches. At the middle level the two classifications differ in the positions of peripheral languages in large branches: Archi and Udi in the Lezgic group, Ubykh in the North West Caucasian family. In these cases the lexicostatistical approach is closer to expert classifications, which is not surprising, as it is based on the use of cognates. Accordingly, lexicostatistical classification requires a substantially larger amount of expert knowledge than the approaches based on phonetic affinity. Finally, at the uppermost level both classifications properly differentiate the main branches of the East Caucasian languages.
Moreover, ASJP-PMI additionally combines Lak-Dargwa with Lezgic, and the lexicostatistical approach couples Nakh with Avar-Andic-Tsezic. Combining the Lak-Dargwa languages into a common branch with Lezgic was suggested by (Schulze 2014), which implicitly supports the ASJP-PMI version, as does the model of (Diakonov 1988) in which the is included into Dargwa-Lezgian branch. At the same time, the convergence of Nakh with Avar-Andic-Tsezic looks doubtful.

Let us turn to the comparison of ASJP-PMI with expert classifications. In the cases where language groups are reliably determined by the comparative-historical approach, the ASJP-PMI procedure gives the same classification, in which at the upper level we find such groups as Avar-Andic, Tsezic, Lezgic, Lak, Dargwa, Khinalug and Nakh. The middle level involves the differentiation of the Avar language within the Avar-Andic group. The lower level comprises (Abkhaz, Abaza), (Adyghe, Kabardian), (Chechen, Ingush), (Bagvalal, Tindi), (Chamalal, Godoberi, Botlikh), (Bezhta, Hunzib), (Dido, Hinukh, Khwarshi), (Tabassaran, Aghul, Lezgi), (Kryz, Budukh), (Tsakhur, Rutul). In disputed cases the ASJP-PMI procedure backs up the main hypotheses put forward within the comparative-historical approach: the relationship of Lak and Dargwa and the others mentioned above.

The availability of several trees constructed on various principles presents one more interesting opportunity for comparison. For a set of trees over the same objects it is possible to construct a consensus tree summarizing the agreements between the trees. The required degree of agreement can vary. The strict consensus method requires full agreement, whereas the majority-rule consensus method requires the agreement of only a majority of the trees (>0.5). The precise definitions can be found in (Wilkinson 1996). An algorithm for constructing consensus trees is implemented in the R package (www.r-project.org/). Applying this package to the trees considered above, we get the following consensus trees.

Fig. 4. A consensus tree of Andic languages, Strict consensus method

As can be seen, the strict consensus tree coincides with the tree from Ethnologue. This is not surprising, since Ethnologue includes only indisputable classification decisions. And as we can see, no hypothesis about the classification of the Andic languages is unanimously supported by the five approaches under consideration.
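The two consensus rules can be sketched in Python on the five Andic trees quoted earlier. This is an illustration under stated assumptions, not the R routine used for the figures: trees are read as rooted, a grouping is identified with the set of languages under an internal node, and the helper names `clades` and `consensus` are hypothetical. The strict rule keeps groupings found in every tree; the majority rule keeps those found in more than half of them.

```python
from collections import Counter

# The five Andic trees in Newick form, as quoted in the text.
ANDIC = [
    "(Andi,(Akhvakh,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))",       # Schulze
    "((Andi,Botlikh,Godoberi),(Akhvakh,Karata),(Bagvalal,Tindi,Chamalal))",         # Alekseev
    "(Akhvakh,(Andi,(Karata,(Botlikh,Godoberi,Chamalal,(Tindi,Bagvalal)))))",       # LS
    "(Akhvakh,(Andi,(Chamalal,((Botlikh,Godoberi),(Karata,(Tindi,Bagvalal))))))",   # ASJP-LDND
    "(Akhvakh,(Andi,(Karata,(((Botlikh,Godoberi),Chamalal),(Tindi,Bagvalal)))))",   # ASJP-PMI
]


def clades(newick):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted Newick tree."""
    stack, found, name = [set()], set(), ""
    for ch in newick.rstrip(";") + ",":
        if ch in "(),":
            if name.strip():
                stack[-1].add(name.strip())
            name = ""
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                group = stack.pop()
                stack[-1] |= group
                found.add(frozenset(group))
        else:
            name += ch
    taxa = stack[0]
    return {c for c in found if 1 < len(c) < len(taxa)}


def consensus(newicks, majority=True):
    """Clades kept by the majority rule (> half of the trees) or by the strict rule (all trees)."""
    counts = Counter(c for nw in newicks for c in clades(nw))
    total = len(newicks)
    if majority:
        return {c for c, n in counts.items() if n > total / 2}
    return {c for c, n in counts.items() if n == total}


strict = consensus(ANDIC, majority=False)
major = consensus(ANDIC, majority=True)

print("strict:", [sorted(c) for c in strict])
for clade in sorted(major, key=len, reverse=True):
    print("majority:", sorted(clade))
```

On these inputs the strict rule keeps no grouping at all, which corresponds to an unstructured Andic branch, while the majority rule keeps four nested groupings that together describe the tree (Akhvakh, (Andi, (Karata, (Botlikh, Godoberi, Chamalal, (Tindi, Bagvalal))))).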

Fig. 5. A consensus tree of Andic languages, Majority-rule consensus method

In the tree constructed by means of the majority-rule consensus method (groupings present in at least three trees out of five), a more complicated structure is postulated. Akhvakh, Andi and Karata are separated from the common group, and, moreover, it is assumed that these languages split off from Proto-Andic in exactly this order. Bagvalal and Tindi are united in a common sub-branch. This hypothetical tree is quite plausible and can well serve as a basis for further discussion of the structure of the Andic group. Let us turn to the consensus trees of the Lezgian group.


Fig. 6. A consensus tree of Lezgian languages, Strict consensus method

Fig. 7. A consensus tree of Lezgian languages, Majority-rule consensus method

The strict consensus tree of the Lezgian languages does not support the division from Ethnologue into Nuclear Lezgic and, outside it, Udi and Archi. In particular, in Schulze's classification Udi is wedged into Nuclear Lezgic. For this group of languages, the majority-rule consensus tree is the closest to the tree from Ethnologue. The only distinction between them is that in the branch (Aghul, Lezgi, Tabassaran) of the majority-rule consensus tree Aghul and Tabassaran are additionally singled out as more closely related.

Most interestingly, our majority-rule consensus tree of the Lezgian languages coincides with the consensus tree constructed in Kassian (2015). In that paper, the summary tree is constructed manually by comparing 6 trees obtained by different phylogenetic algorithms. The construction of all the trees is based on Swadesh 110-item wordlists with cognate matches indicated. This coincidence has been obtained in independent research summarizing the results of many other independent studies, and this is a good indication that the majority-rule consensus tree may reflect the real process of evolution of the Lezgian languages.


Let us consider the fundamental structural difference between usual phylogenetic trees and consensus trees. In phylogenetic trees usually exactly two edges go out from each internal node, while in consensus trees several edges often depart from one node. This can reflect our lack of knowledge about the order in which branches split off (this is called a soft polytomy in Nichols 2008), as well as a really simultaneous, or nearly simultaneous, split of a branch into several sub-branches (a hard polytomy in the terminology used in (Nichols & Warnow 2008)). The latter situation may hold for Udi, Archi and the proto-language of the other Lezgic languages. The phylogenetic algorithms we applied produce strictly binary splits, and therefore they are compelled in this case to unite some two branches. Owing to some minor features of Udi and Archi or to details of the algorithms' operation, Udi and Archi are joined in one branch. This can mean that there really was an independent Proto-Udi-Archi which separated from Proto-Lezgian but very quickly split into Udi and Archi, so that it did not leave noticeable traces in the form of common innovations. In the absence of such obvious evidence for the existence of a Proto-Udi-Archi, the scenario presented in the majority-rule consensus tree seems more natural.

The situation with South, West and East Lezgic is similar. Possibly they separated almost at the same time, as shown by the majority-rule consensus tree. The phylogenetic algorithms that construct binary trees do not show a stable result in such situations. Depending on minor nuances, they unite one pair of branches or another.

6. Conclusion

The paper presents data obtained from the main currently known classifications of the North Caucasian languages. All the classifications, both expert and automatic, are substantially identical to each other at the upper and lower levels of branching. Automatic classifications, whether based on lexicostatistics or on phonetic affinity, differentiate the same groups as experts do, which shows that both methods are highly reliable. In the cases where the comparative-historical approach yields a generally accepted result, the computer-aided methods confirm it. The good agreement observed between phylogenetic methods and expert methods is perhaps due to particularities in the evolution of the North Caucasian languages: lack of migration and the presence of obstacles to communication between villages. The automatic (computer-aided) classifications can be used whenever a description of the basic vocabulary of the languages under survey is available. Their results can serve as a good approximation of the true relationships of the languages.

As for the disputable issues at the middle level, the divergences among automatic classifications and between automatic and expert classifications do not exceed the differences among the expert classifications themselves. Automatic classifications provide reliable support for some of the suggested hypotheses: the affinity of Lak and Dargwa, and of the Lezgic and Khinalug languages. The comparison of the ASJP and GLD procedures showed that increasing the number of words in the basic vocabulary lists used to determine phonetic affinity does not influence the structure of the trees at all, at least for the North Caucasian languages. A case study of the Andic and Lezgic languages shows how various classifications can be used to obtain a kind of average (consensus) classification that takes the results of all the others into account as far as possible.
The consensus classification can also be built automatically and, as shown by the example of the Lezgic languages, it coincides with the one constructed manually. The study can be compared to a metasearch engine, in which a search query is sent to several search services and the results received from them are then processed and combined in some way. The development of a considerable number of methods based on various definitions of language affinity makes it possible to use the metasearch idea in historical linguistics in order to obtain classifications that are more detailed and at the same time draw upon more expert opinions than the existing ones. Let us conclude the paper with a quote from M. Dunn's talk at the 10th International Conference on the Evolution of Language (Dunn 2014): “Phylogenetic comparative methods put historical linguistics to the forefront of the modern endeavour to understand language structure and the mechanisms linguistic and cultural change”.

Acknowledgment. This work was funded by the subsidy of the Russian Government to support the Program of Competitive Growth of Kazan Federal University.

References

Alekseev, Mikhail E. (ed.). 2001. Kavkazskie Jazyki. Moscow: Izd-vo Akademija.
Atkinson, Quentin & Russell Gray. 2006. How old is the Indo-European language family? Illumination or more moths to the flame? Phylogenetic methods and the prehistory of languages, ed. by Peter Forster & Colin Renfrew, 91–110. Cambridge, UK: MacDonald Institute Press, University of Cambridge.
Bastin, Yvonne. 1983. Classification lexicostatistique des langues bantoues (214 relevés). Bulletin des Séances de l’Académie royale des sciences d’outre-mer 27.173–99.
Blanchard, Philippe, F. Petroni, M. Serva & D. Volchenkov. 2011. Geometric representations of language taxonomies. Computer Speech & Language 25(3): 679.
Brown, Cecil H. & Eric W. Holman. 2010. Comparing ASJP approaches to automated classification: correspondence-based and lexical-based trees for Mayan. Formerly at http://email.eva.mpg.de/~wichmann/Mayan_2.pdf. Accessed May 1, 2013.
Brown, Cecil H., Eric W. Holman & Søren Wichmann. 2013. Sound correspondences in the world’s languages. Language 89: 4-29.
Brown, Cecil H., Eric W. Holman, Søren Wichmann & Viveka Velupillai. 2008. Automated classification of the world's languages: A description of the method and preliminary results. STUF – Language Typology and Universals 61.4: 285-308.
Burlak, Svetlana A. & Sergey A. Starostin. 2005. Sravnitel’no-istoricheskoe yazykoznanie. Moskva: Akademiya.
Church, Kenneth Ward & Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16.1: 22-29.
Cysouw, Michael, Søren Wichmann & David Kamholz. 2006. A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean. Journal of Quantitative Linguistics 13.225–64.
Diakonov, Igor M. & Sergey A. Starostin. 1988. Hurrito-Urartskie i Vostochnokavkazskie Jazyki. In: Drevnij Vostok: Etnokul’turnye Svjazi, 164-207. Moscow: Nauka.
Donohue, Mark. 2010. Skou. Formerly at http://email.eva.mpg.de/~wichmann/Skou.pdf. Accessed May 1, 2013.

Donohue, Mark & Simon Musgrave. 2007. Typology and the linguistic macrohistory of Island Melanesia. Oceanic Linguistics 46.348–87.
Dunn, Michael. 2014. Language phylogenies. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 190-211). London: Routledge.
Dunn, Michael. 2014. What were they thinking? Phylogenetic comparative methods and language history. 10th International Conference on the Evolution of Language 2014. Workshop booklet. P. 42. https://evolangx.univie.ac.at/
Dunn, Michael, Angela Terrill, Ger P. Reesink, Robert A. Foley & Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309.2072–5.
Dunn, Michael, Robert A. Foley, Stephen C. Levinson, Ger P. Reesink & Angela Terrill. 2007. Statistical reasoning in the evaluation of typological diversity in Island Melanesia. Oceanic Linguistics 46.388–403.
Estabrook, George F., F. R. McMorris & Christopher A. Meacham. 1985. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology 34(2): 193–200.
Gippert, Jost, Wolfgang Schulze, Zaza Aleksidze & Jean-Pierre Mahé. 2008. The Caucasian Albanian Palimpsests of Mt. Sinai. 2 vols. Turnhout: Brepols Publishers.
Gray, Russell D. & Fiona M. Jordan. 2000. Language trees support the express-train sequence of Austronesian expansion. Nature 405.1052–5.
Gray, Russell D. & Quentin D. Atkinson. 2003. Language-tree divergence times support the Anatolian theory of Indo-European origins. Nature 426.435–9.
Haspelmath, Martin, Matthew S. Dryer, David Gil & Bernard Comrie. 2005. The World Atlas of Language Structures. Oxford: Oxford University Press.
Holden, Clare J. 2002. Bantu language trees reflect the spread of farming across Sub-Saharan Africa: a maximum-parsimony analysis. Proceedings of the Royal Society of London Series B 269.793–9.
Holden, Clare J., A. Meade & M. Pagel. 2005. Comparison of maximum parsimony and Bayesian Bantu language trees. The evolution of cultural diversity: a phylogenetic approach, ed. by R. Mace, C. J. Holden & S. Shennan, 53–65. London, UK: University College London Press.
Holden, Clare J. & Russell Gray. 2006. Rapid radiation, borrowing, and dialect continua in the Bantu languages. Phylogenetic methods and the prehistory of languages, ed. by Peter Forster & Colin Renfrew, 19–31. Cambridge, UK: MacDonald Institute Press, University of Cambridge.
Holman, Eric W., Søren Wichmann, Cecil H. Brown, Viveka Velupillai, André Müller & Dik Bakker. 2008. Explorations in automated language classification. Folia Linguistica 42.2: 331-354.
Houtzagers, Peter, John Nerbonne & Jelena Prokić. 2010. Quantitative and traditional classifications of Bulgarian dialects compared. Scando-Slavica 56:2, 163-188.
Huff, Paul & Deryle Lonsdale. 2011. Positing language relations using ALINE. Language Dynamics and Change 1: 128-162.
Jäger, Gerhard. 2013. Phylogenetic inference from word lists using weighted alignment with empirically determined weights. Language Dynamics and Change 3: 245-291.

Kassian, Aleksei. 2015. Towards a formal genealogical classification of the Lezgian languages (): testing various phylogenetic methods on lexical data. PLoS ONE 10(2): e0116950. doi:10.1371/journal.pone.0116950
Koryakov, Yury. 2006. Atlas Kavkazskih Jazykov. Moscow: Institut Jazykoznanija RAN.
Lewis, M. Paul, Gary F. Simons & Charles D. Fennig (eds.). 2013. Ethnologue: Languages of the World, Seventeenth edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.
Magomedbekova, Z. 2001. Karatinskij jazyk. In: Jazyki mira. Kavkazskie Jazyki, 261-268. Moscow: Academia.
Marten, Lutz. 2006. Bantu classification, Bantu trees, and phylogenetic methods. Phylogenetic methods and the prehistory of languages, ed. by Peter Forster & Colin Renfrew, 43–56. Cambridge, UK: MacDonald Institute Press, University of Cambridge.
Mooi, E. & M. Sarstedt. 2011. A Concise Guide to Market Research. Berlin/Heidelberg: Springer-Verlag.
Müller, André, Søren Wichmann, Viveka Velupillai, Cecil H. Brown, Pamela Brown, Sebastian Sauppe, Eric W. Holman, Dik Bakker, Johann-Mattis List, Dmitri Egorov, Oleg Belyaev, Robert Mailhammer, Matthias Urban, Helen Geyer & Anthony Grant. 2010. ASJP World Language Tree of Lexical Similarity: Version 3 (July 2010). http://asjp.clld.org/download.
Müller, André, Viveka Velupillai, Søren Wichmann, Cecil H. Brown, Eric W. Holman, Sebastian Sauppe, Pamela Brown, Harald Hammarström, Oleg Belyaev, Johann-Mattis List, Dik Bakker, Dmitri Egorov, Matthias Urban, Robert Mailhammer, Matthew S. Dryer, Evgenia Korovina, David Beck, Helen Geyer, Pattie Epps, Anthony Grant & Pilar Valenzuela. 2013. ASJP World Language Trees of Lexical Similarity: Version 4 (October 2013). http://asjp.clld.org/download.
Nakhleh, Luay, Don Ringe & Tandy Warnow. 2005. Perfect phylogenetic networks: A new methodology for reconstructing the evolutionary history of natural languages. Language 81(2): 382-420.
Nakhleh, Luay, Tandy Warnow, Donald A. Ringe, Jr & Steven N. Evans. 2005a. A comparison of phylogenetic reconstruction methods on an IE dataset. Transactions of the Philological Society 103.171–92.
Nichols, Johanna. 2003. The Nakh-Daghestanian consonant correspondences. Current Trends in Caucasian, East European, and Inner Asian Linguistics: Papers in honor of Howard I. Aronson, ed. by Dee Ann Holisky & Kevin Tuite, 207-251. Amsterdam: John Benjamins.
Nichols, Johanna & Tandy Warnow. 2008. Tutorial on computational linguistic phylogeny. Language and Linguistics Compass 2: 760-820.
Nikolaev, Sergey & Sergey Starostin. 1994. A North Caucasian Etymological . Moscow: Asterisk.
Nordhoff, Sebastian, Harald Hammarström, Robert Forkel & Martin Haspelmath (eds.). 2013. Glottolog 2.2. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Polyakov, Vladimir & Valery Solovyev. 2006. Komp’juternye modeli i metody v tipologii i komparativistike (Computational Models and Methods in Typology and ). Kazan: Kazanskiy Gosudarstvennyy Universitet. (in Russian).

Polyakov, Vladimir, Valery Solovyev, Søren Wichmann & Oleg Belyaev. 2009. Using WALS and Jazyki mira. Linguistic Typology 13: 135–165.
Pompei, Simone, Vittorio Loreto & Francesca Tria. 2011. On the accuracy of language trees. PLoS One 6.6, e20109.
Rama, Taraka & Prasanth Kolachina. 2012. How good are typological distances for determining genealogical relationships among languages? Proceedings of COLING 2012: Posters, ed. by Martin Kay & Christian Boitet, 975-984. Indian Institute of Technology Bombay.
Rexová, Kateřina, Daniel Frynta & J. Zrzavý. 2003. Cladistic analysis of languages: Indo-European classification based on lexicostatistical data. Cladistics 19.120–7.
Rexová, Kateřina, Yvonne Bastin & Daniel Frynta. 2006. Cladistic analysis of Bantu languages: a new tree based on combined lexical and grammatical data. Naturwissenschaften 93(4).189–94.
Rijsbergen, Keith van. 1979. Information Retrieval. London: Butterworth.
Ringe, Donald A., Jr, Tandy Warnow & Ann Taylor. 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100.59–129.
Robinson, D. F. & L. R. Foulds. 1981. Comparison of phylogenetic trees. Mathematical Biosciences 53: 131-147.
Ruhlen, Merritt. 1987. A Guide to the World’s Languages, Vol. 1: Classification. Stanford: Stanford University Press.
Saunders, Arpiar. 2005. Linguistic phylogenetics of the Austronesian family: a performance review of methods adapted from biology. BA thesis, Swarthmore College.
Schulze, Wolfgang. 2014. Comparative Grammar of East Caucasian. http://schulzewolfgang.de/index.php/2-biblio-schulze
Semple, Charles & Mike Steel. 2003. Phylogenetics. Oxford: Oxford University Press.
Serva, Maurizio. 2011. Phylogeny and geometry of languages from normalized Levenshtein distance. http://arXiv.org/abs/1104.4426v3.
Serva, Maurizio & Filippo Petroni. 2008. Indo-European languages tree by Levenshtein distance. Europhysics Letters 81.68005–9.
Shijulal, Nelson-Sathi, Johann-Mattis List, Hans Geisler, Heiner Fangerau, Russell D. Gray, William Martin & Tal Dagan. 2011. Networks uncover hidden lexical borrowing in Indo-European language evolution. Proceedings of the Royal Society B: Biological Sciences 278 (1713): 1794–1803.
Solovyev, Valery, Venera Bayrasheva & Rustam Faskhutdinov. 2014. Testing phylogenetic algorithms in linguistic databases. Communications in Computer and Information Science 465: 373-383.
Starostin, Georgiy (ed.). 2011–2015. The Global Lexicostatistical Database. Moscow/Santa Fe: Center for Comparative Studies at the Russian State University for the Humanities; Santa Fe Institute. Available: http://starling.rinet.ru/new100. Accessed 16.02.2014.
Talibov, Bukar. 1980. Sravnitel’naya fonetica lezginskih jazykov. Moscow.
Templ, M., P. Filzmoser & C. Reimann. 2007. Problems and Possibilities of Cluster Analysis. http://www.meduniwien.ac.at/ROeS/ROeS_Seminar_Bern_2007/talks/ROeS2007_Templ.pdf.
Wichmann, Søren, Eric W. Holman, Dik Bakker & Cecil H. Brown. 2010. Evaluating linguistic distance measures. Physica A 389: 3632-3639 (doi:10.1016/j.physa.2010.05.011).
Wichmann, Søren & Arpiar Saunders. 2007. How to use typological databases in historical linguistic research. Diachronica 24.2: 373-404.
Wilkinson, Mark. 1996. Majority-rule reduced consensus trees and their use in bootstrapping. Molecular Biology and Evolution 13: 437-444.

Appendix. The trees in Newick format

(((Abaza, Abkhaz), (Adyghe, Kabardian), Ubykh), ((Bats, (Chechen, Ingush)), ((Avar, (Akhvakh, Andi, Bagvalal, Botlikh, Chamalal, Ghodoberi, Karata, Tindi)), ((Bezhta, Hunzib), (Dido, Hinukh, Khvarshi)), Dargwa, Khinalugh, Lak, (Archi, Udi, ((Aghul, Lezgi, Tabassaran), (Budukh, Kryts), (Rutul, Tsakhur))))))

Fig. S1. North Caucasian languages from Ethnologue

• Innovative Type
  • Avar-Andic
    • Andic
    • Avar
  • Tsezic
• Conservative Type
  • Nakh-Dargi-Lak
    • Nakh
    • Dargi-Lak
      • Dargi
      • Lak
  • Khinalugh-Lezgic
    • Lezgic
    • Khinalugh

(((Nakh, (Dargi, Lak)), (Lezgian, Khinalug)), (Tsezian, (Awar, Andian)))

Fig. S2a. Higher-level division of the North-East Caucasian languages (Schulze 2014)
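
The bracket strings in this appendix can be read mechanically. The following is a minimal Python sketch, not part of the original study, that parses such a string into nested lists and collects its leaf names. The function names `parse_newick` and `leaves` are hypothetical, and the parser assumes labels without spaces, no branch lengths, and an optional trailing semicolon or period.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested lists of leaf names.

    Assumptions (not taken from the paper): labels contain no spaces,
    there are no branch lengths, and a trailing ';' or '.' may be present.
    """
    s = "".join(text.split()).rstrip(";.")  # drop whitespace and terminator
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError(f"unbalanced parentheses near position {pos}")
            pos += 1  # consume ")"
            return children
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError(f"missing label near position {pos}")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text near position {pos}")
    return tree


def leaves(tree):
    """Return the leaf names of a parsed tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


# The tree of Fig. S2a, copied from this appendix.
fig_s2a = "(((Nakh, (Dargi, Lak)), (Lezgian, Khinalug)), (Tsezian, (Awar, Andian)))"
tree_s2a = parse_newick(fig_s2a)
print(leaves(tree_s2a))
```

Because the parser raises `ValueError` on unbalanced brackets, it can also serve as a quick consistency check on hand-typed trees.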

((Awar, (Andi, (Akhwakh, (Karata, (Botlikh, Godoberi, Chamalal, (Bagvalal, Tindi)))))), ((Tsez, Hinukh), (Bezhta, Hunzib, Khwarshi)))

Fig. S2b. Detailed classification of the Tsezic and Avar-Andic languages (Schulze 2014)

(((Dargi, Lak), (Bats, (Ingush, Chechen))), (Khinalug, (Archi, ((Tsakhur, Rutul), (Kryts, Budukh), (Udi, (Tabasaran, (Lezgi, Agul)))))))

Fig. S2c. Detailed classification of the remaining North Caucasian languages (Schulze 2014)

• West Caucasian
  • Abkhaz-Abazin: Abaza (abq), Abkhaz (abk)
  • Circassian: Adyghe (ady), Kabardian (kbd)
  • Ubyx: Ubykh (uby)
• East Caucasian
  • Nakh
    • Bats (bbl)
    • Chechen-Ingush: Chechen (che), Ingush (inh)
  • Daghestanian
    • Avar-Andic-Tsezic
      • Tsezic
        • East Tsezic: Bezhta (kap), Hunzib (huz)
        • West Tsezic: Dido (ddo), Hinukh (gin), Khvarshi (khv)
      • Andic
        • Andic (A): Andi (ani), Botlikh (bph), Ghodoberi (gdo)
        • Andic (B): Akhvakh (akv), Karata (kpt)
        • Andic (C): Chamalal (cji), Bagvalal (kva), Tindi (tin)
      • Avar (ava)
    • Dargwa (dar)
    • Khinalug (kjj)
    • Lak (lbe)
    • Lezgic
      • Udi (udi)
      • Nuclear Lezgic
        • Archi (aqc)
        • Proper Lezgic
          • East Lezgic
            • Lezgi (lez)
            • Tabasaran-Aghul: Tabassaran (tab), Aghul (agx)
          • South Lezgic: Budukh (bdk), Kryts (kry)
          • West Lezgic: Rutul (rut), Tsakhur (tkr)

(((Abaza, Abkhaz), (Adyghe, Kabardian), Ubykh), ((Bats, (Chechen, Ingush)), ((Avar, ((Bezhta, Hunzib), (Dido, Hinukh, Khvarshi)), ((Andi, Botlikh, Godoberi), (Akhvakh, Karata), (Chamalal, Bagvalal, Tindi))), Dargwa, Khinalug, Lak, (Udi, (Archi, ((Lezgi, (Tabassaran, Aghul)), (Budukh, Kryts), (Rutul, Tsakhur)))))))

Fig. S3. Classification from Alekseev (2001)

((((Abaza, Abkhaz), Ubykh), (Adyghe, Kabardian)), (((Akhvakh, (Andi, (Chamalal, ((Botlikh, Ghodoberi), (Karata, (Bagvalal, Tindi)))))), ((Bezhta, Hunzib), ((Dido, Hinukh), Khvarshi))), ((((Dargwa, Avar), Lak), (Bats, (Chechen, Ingush))), ((Archi, Udi), (Khinalugh, ((Rutul, Tsakhur), ((Budukh, Kryts), (Lezgi, (Aghul, Tabassaran)))))))))

Fig. S4. ASJP tree, version 4 (Müller et al. 2013)
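
The West Caucasian parts of the Ethnologue tree (Fig. S1) and the ASJP tree (Fig. S4) differ only in where Ubykh attaches. One simple way to make such a difference explicit is to compare the sets of clades (the groups of leaves under each internal node) of two rooted trees, in the spirit of Robinson & Foulds (1981). The sketch below is an illustrative assumption, not necessarily the coherence measure used in this paper; `parse_newick` and `clades` are hypothetical helper names, and labels are assumed to contain no spaces.

```python
def parse_newick(text):
    """Parse a simplified Newick string (no branch lengths) into nested lists."""
    s = "".join(text.split()).rstrip(";.")
    pos = 0

    def node():
        nonlocal pos
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"unbalanced parentheses near position {pos}")
            pos += 1  # consume ")"
            return children
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        if start == pos:
            raise ValueError(f"missing label near position {pos}")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected text near position {pos}")
    return tree


def clades(tree):
    """Return the set of leaf sets (frozensets) found under each internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


# West Caucasian subtrees copied from Fig. S1 (Ethnologue) and Fig. S4 (ASJP).
ethnologue_west = "((Abaza, Abkhaz), (Adyghe, Kabardian), Ubykh)"
asjp_west = "(((Abaza, Abkhaz), Ubykh), (Adyghe, Kabardian))"

clades_eth = clades(parse_newick(ethnologue_west))
clades_asjp = clades(parse_newick(asjp_west))

shared = clades_eth & clades_asjp      # groupings both trees agree on
differing = clades_eth ^ clades_asjp   # groupings found in only one tree
for group in sorted(differing, key=sorted):
    print("only in one tree:", sorted(group))
```

Counting the clades that occur in only one of the two trees gives a rooted, clade-based distance. It requires both trees to use the same language names, so spelling variants such as Khinalug and Khinalugh would have to be harmonized first.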

((((Abaza, Abkhaz), Ubykh), (Adyghe, Kabardian)), ((Bats, (Chechen, Ingush)), (((Lak, Dargi), ((Udi, Archi), (Khinalug, (Tsakhur, Rutul), ((Kryz, Budukh), (Lezgi, (Aghul, Tabassaran)))))), (((Hunzib, Bezhta), (Khwarshi, (Hinukh, Dido))), (Avar, (Akhvakh, (Andi, (Karata, ((Tindi, Bagvalal), (Chamalal, (Botlikh, Godoberi)))))))))))

Fig. S5. ASJP tree with PMI metric (Jäger 2013)

(((Batsbi, (Chechen, Ingush)), ((Dido, Hinukh), (Bezhta, Hunzib))), ((Archi, Udi), (Khinalug, ((Lezgi, (Tabasaran, Aghul)), ((Budukh, Kryts), (Rutul, Tsakhur))))))

Fig. S6. North Caucasian languages from GLD (http://starling.rinet.ru/new100/allsim.png)

(((Abaza, Abkhaz), (Adyghe, Kabardian), Ubykh), ((((Bats, (Chechen, Ingush)), (((Bezhta, Hunzib), (Dido, Hinukh, Khvarshi)), (Avar, (Akhvakh, (Andi, (Karata, ((Bagvalal, Tindi), Chamalal, Godoberi, Botlikh))))))), (Lak, Dargi), (Khinalug, (Udi, (Archi, (Rutul, Tsakhur), ((Budukh, Kryts), (Lezgi, (Tabassaran, Aghul)))))))))

Fig. S7. Tree constructed by lexicostatistical methods (Koryakov 2006)
