Alexei Kassian (Institute of Linguistics of the Russian Academy of Sciences) [email protected], 21 October, 2014

Linguistic homoplasy and phylogeny reconstruction. The cases of Lezgian and Tsezic languages (North Caucasus)

The paper deals with the problem of linguistic homoplasy (parallel or back developments): how it can be detected, what kinds of linguistic homoplasy can be distinguished, and which kinds are more deleterious for language phylogeny reconstruction. It is proposed that language phylogeny reconstruction should consist of two main stages. Firstly, a consensus tree is built from high-quality input data with the help of the main phylogenetic methods (such as NJ, Bayesian MCMC, MP), and ancestral character states are reconstructed; together these allow us to reveal a certain number of homoplastic characters. Secondly, after these homoplastic characters are eliminated from the input matrix, the consensus tree is compiled again. It is expected that, after homoplastic optimization, individual problem clades can be better resolved and that the homoplasy-optimized phylogeny should generally be more robust than the initially reconstructed tree. The proposed procedure is tested on the 110-item Swadesh wordlists of the Lezgian and Tsezic groups. The Lezgian and Tsezic results generally support the theoretical expectations. The Minimal lateral network method, currently implemented in the LingPy software, is a helpful tool for linguistic homoplasy detection.

1. How to reveal homoplasy
2. What kind of homoplasy is more deleterious?
3. Data
4. Phylogenetic methods
5. Lezgian case
6. Tsezic case
7. Conclusions
8. References

1. How to reveal homoplasy.¹

Homoplasy is parallel or back (reverse) development arising in the evolutionary process. This phenomenon perturbs the input data and makes it difficult to produce a robust phylogenetic tree. In some cases, intensive homoplasy makes it impossible to reveal the true phylogeny.

A good indicator of the potential presence of secondary, i.e., homoplastic, matches between two lects is a situation in which the lexicostatistical distances between the involved lects do not fulfil the condition of additivity. In Fig. 1a–b, the distances are more normal for natural language evolution (in Fig. 1a, the lects L2 & L3 form a distinct clade; in Fig. 1b, the lects L1, L2 & L3 form a ternary node) than in Fig. 1c–d.

As concerns lexicostatistics and the Swadesh wordlist, there are different views on the rate of cognate replacement. For example, the original idea of Morris Swadesh (Swadesh 1952; Swadesh 1955; Lees 1953) was that cognate replacement within the basic vocabulary can be described by the strict clock model (evolutionary rates across lineages are constant or nearly constant). The linguistic data collected by the Moscow school (the Tower of Babel and Global Lexicostatistical Database projects) generally conform to this approach, although with certain “relaxing” improvements proposed by Sergei Starostin (S. Starostin 1989/2007; S. Starostin 1999/2000; Novotná & Blažek 2007; Balanovsky et al. 2011). On the other hand, a number of scholars prefer to apply the relaxed molecular clock model to language evolution, implying that the mean rate of lexical replacement varies among branches (e.g., Gray & Atkinson 2003; Kitchen et al. 2009). In any case, it is unlikely that the range within which the mean rate of basic vocabulary replacement varies in practice can be very large (except for some rare special cases such as Icelandic). Thus, the pairs L1-L2 or L2-L3 in Fig. 1c and L1-L2 or L1-L3 in Fig. 1d are suspected to have secondary matches.

¹ This section partially overlaps with List et al. 2014b. List et al. focus on loanword detection, but borrowings of any kind can be formally treated as a particular case of homoplasy.

Fig. 1. Reverse distances between three lects (L1, L2, L3). A higher percentage of shared character states means greater closeness. (a) L2 & L3 are close to each other, both are equally remote from L1; (b) the three are equally distant from each other; (c) the three distances are not equal to each other; (d) L2 & L3 are remote from each other, both are equally close to L1.

[Figure: four triangle diagrams (a)–(d) connecting L1, L2 and L3, with edges labeled by shared percentages of 40%, 50% or 60%.]
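The reasoning behind Fig. 1 can be made explicit with a small check. The sketch below is only one possible operationalization and rests on assumptions that are not part of the original procedure: it assumes roughly clock-like replacement, so that in any triple of lects the two lowest shared percentages should be about equal, and it uses a hypothetical `tolerance` parameter (in percentage points). The percentages in the usage note are illustrative values chosen to match the caption of Fig. 1.

```python
def suspicious_pairs(shared, tolerance=5.0):
    """Flag pairs in a triple of lects whose shared percentage looks too high.

    `shared` maps frozenset({a, b}) -> percentage of shared character states
    for the three pairs of a triple. Assumption (not from the paper): under a
    roughly constant replacement rate the two lowest percentages are about
    equal. If they differ by more than `tolerance`, every pair scoring above
    the lowest value is returned as a candidate for secondary matches.
    """
    pairs = sorted(shared, key=shared.get)            # lowest percentage first
    lowest, middle = shared[pairs[0]], shared[pairs[1]]
    if middle - lowest <= tolerance:
        return []                                     # tree-like configuration
    return [tuple(sorted(pair)) for pair in pairs[1:]]


def triple(l1_l2, l1_l3, l2_l3):
    """Build the input dictionary for the lects L1, L2, L3."""
    return {
        frozenset(("L1", "L2")): l1_l2,
        frozenset(("L1", "L3")): l1_l3,
        frozenset(("L2", "L3")): l2_l3,
    }


fig_1a = triple(50, 50, 60)   # L2 & L3 form a clade
fig_1d = triple(60, 60, 40)   # L2 & L3 remote, both close to L1
print(suspicious_pairs(fig_1a))
print(sorted(suspicious_pairs(fig_1d)))
```

With these values the configuration of Fig. 1a is accepted, whereas for Fig. 1d the pairs L1-L2 and L1-L3 are flagged, in agreement with the discussion above.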

A more difficult task is to detect exactly which characters are homoplastic. The original linguistic dataset represents a multistate matrix (for matrix compilation, see, e.g., Atkinson & Gray 2006: 93–94). If we are dealing with lexical characters (lexicostatistics), synonyms, i.e., more than one word in one slot, are almost inevitable. To the best of my knowledge, Starling (S. Starostin 1993/2007; Burlak & Starostin 2005: 270 ff.) is the only phylogenetic software which is able to process input matrices with synonyms: when the same Swadesh slot is occupied by more than one word, i.e., by several synonyms, all possible pairs of involved words between two languages are compared within this slot, and if there is at least one matching pair, Starling treats the whole slot as a match. In order to make the dataset importable into the most popular phylogenetic packages, Gray & Atkinson 2003 and Atkinson & Gray 2006 proposed converting the original multistate matrix into binary format. Binarization is the coding of the presence (“1”) or absence (“0”) of the specific proto-root with the specific Swadesh meaning in the given language; Swadesh items superseded by loanwords or simply not documented are marked as “?” (the difference between this procedure, accepted in the Global Lexicostatistical Database project, and the conversion described in Atkinson & Gray 2006 is that Atkinson and Gray treat loanwords as full-fledged items with distinct cognate indices). It remains unclear how seriously such a conversion corrupts input data and causes model misspecification (cf. Barbançon et al. 2013: 164), but to date all available tests suggest that the phylogenetic results of a multistate matrix and its binary counterpart are quite similar if not identical.

Not all homoplastic developments can be revealed. Firstly, some cases of back evolution cannot be detected (at least without extra evidence such as ancient texts or old borrowings in neighboring languages): Fig. 2.
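The slot-matching rule and the binarization described above can be illustrated with a short sketch. It is not the Starling or GLD implementation; the data layout (a dictionary of sets of cognate indices, with `None` for loanwords and undocumented slots) and the function names `slot_matches` and `binarize` are assumptions made for this example, and the toy matrix is invented.

```python
def slot_matches(words_a, words_b):
    """Synonym rule: a slot counts as a match if any pair of synonyms is cognate."""
    return bool(set(words_a) & set(words_b))


def binarize(matrix):
    """Convert a multistate matrix into presence/absence characters.

    matrix[lect][slot] is a set of cognate indices (synonyms allowed) or None
    when the slot is filled by a loanword or is not documented.
    Returns (columns, rows): `columns` lists the (slot, cognate index) pairs,
    `rows` maps each lect to a string over "0", "1" and "?".
    """
    slots = sorted({slot for row in matrix.values() for slot in row})
    columns = sorted({(slot, cog)
                      for row in matrix.values()
                      for slot in slots
                      for cog in (row.get(slot) or ())})
    rows = {}
    for lect, row in matrix.items():
        cells = []
        for slot, cog in columns:
            entry = row.get(slot)
            if entry is None:
                cells.append("?")                 # loanword or lacuna
            else:
                cells.append("1" if cog in entry else "0")
        rows[lect] = "".join(cells)
    return columns, rows


# Invented toy data: three lects, two Swadesh slots.
toy = {
    "A": {"bark": {1}, "big": {1, 2}},   # two synonyms for 'big'
    "B": {"bark": {2}, "big": {2}},
    "C": {"bark": None, "big": {1}},     # 'bark' is a loanword or undocumented
}
columns, rows = binarize(toy)
print(columns)
print(rows)
print(slot_matches(toy["A"]["big"], toy["B"]["big"]))
```

Each multistate slot yields one binary column per attested proto-root, and a lacuna in a slot becomes “?” in all columns of that slot.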

Fig. 2. A character has two states: A, B.

[Tree diagram: state A at the root, B at the intermediate node, B and A at the two tips.]

Secondly, parallel evolution within the same clade can hardly be distinguished from evolution of the intermediate ancestral state: Fig. 3.

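Fig. 3. A character has two states: A, B.

[Two tree diagrams contrasted ("vs."): in the first, the root and the intermediate node have A and both tips have B; in the second, the root has A, the intermediate node has B and both tips have B.]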

From the formal point of view, if two characters in a multistate matrix each have at least two states with equal cost of change between the states (e.g., one has the states A & B, the other C & D), and they take all four possible pairs of states in the matrix (“AC”, “AD”, “BC”, “BD”), then these characters are incompatible and at least one of them is homoplastic (see, e.g., Semple & Steel 2003: 69 ff.). In such and some other cases, the reconstructed tree topology can suggest exactly which character is homoplastic: Fig. 4.

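Fig. 4. Two incompatible characters. The first character has the states A, B; the second one has the states C, D. The second character demonstrates homoplasy.

[Tree diagram: the root has A for the first character and C~D for the second; the four tips carry the state pairs AC, AD, BC, BD.]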

As one can see, a reconstructed tree also helps to detect homoplasy within one multistate character (the so-called “criss-crossed” configuration): Fig. 5.

Fig. 5. A character has two states: C, D.

[Tree diagram: C~D at the root; the four tips carry C, D, C, D.]
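The pairwise incompatibility condition stated above (two two-state characters exhibiting all four combinations of states) is easy to test mechanically. The following is a minimal sketch; the function names and the convention of skipping lects coded “?” are assumptions made for this example.

```python
from itertools import combinations


def incompatible(col_x, col_y):
    """True if two two-state characters show all four state combinations.

    col_x and col_y list the states of the same lects in the same order;
    lects with "?" in either character are ignored.
    """
    pairs = {(x, y) for x, y in zip(col_x, col_y) if "?" not in (x, y)}
    states_x = {x for x, _ in pairs}
    states_y = {y for _, y in pairs}
    return len(states_x) == 2 and len(states_y) == 2 and len(pairs) == 4


def incompatible_pairs(characters):
    """Return all pairs of character names that are mutually incompatible."""
    return [(a, b) for a, b in combinations(sorted(characters), 2)
            if incompatible(characters[a], characters[b])]


# The configuration of Fig. 4 plus a third, compatible character (invented).
characters = {
    "first":  "AABB",
    "second": "CDCD",
    "third":  "EEFF",
}
print(incompatible_pairs(characters))
```

The test only states that at least one character of a flagged pair is homoplastic; deciding which one requires the tree topology, as in Fig. 4.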

However, the maximum number of homoplastic developments in a multistate or binary matrix can be revealed if ancestral character states, i.e., character states for the proto-language, are reconstructed. Such a reconstruction is actually a non-trivial theoretical and practical task (Kassian, Zhivlov & Starostin forth.); in particular, the reconstruction is impossible without an established phylogenetic tree.

The picture is somewhat different when we are dealing with a binary lexicostatistical matrix, converted from an original multistate matrix, with “1” denoting a marked state of the character and “0” an unmarked one (i.e., “1” = presence, whereas “0” = absence of the specific proto-root with the specific Swadesh meaning in the given language; the so-called presence/absence matrix). Even if there are two incompatible characters in the input matrix which take all four possible pairs of states (“00”, “01”, “10”, “11”), the change 1 > 0 (loss of the root) is not a significant event: it can occur independently in different languages, and such a loss can hardly be regarded as homoplastic. Therefore the known tree topology alone is unhelpful for the detection of linguistic homoplasy in such a binary matrix. To detect the fact of homoplasy and reveal the exact homoplastic characters, it is necessary to reconstruct ancestral character states.

Once the phylogenetic tree of the analyzed language group is obtained and character states for the proto-language are reconstructed, it is reasonable to examine the input matrix, searching for homoplastic characters, and to eliminate all parasitic matches caused by these characters. It is expected that a new phylogenetic tree reconstructed on the basis of the revised matrix with eliminated homoplastic matches will turn out more robust than the initially reconstructed tree. Hence I call a phylogenetic tree produced from the standard dataset a non-homoplasy-optimized tree or simply a non-optimized tree, and a tree produced from the examined dataset with at least partially eliminated homoplastic developments a homoplasy-optimized tree.

Cf. recent attempts to detect lexical homoplasy and loanwords with the help of formal algorithms based on the minimal lateral network (MLN) approach, implemented in the LingPy software: Nelson-Sathi et al. 2011; List & Moran 2013; List et al. 2014a; List et al. 2014b; this approach is tested and discussed in more detail below. For the NeighborNet-network approach, see Bryant et al. 2005; Holden & Gray 2006.
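The elimination of parasitic matches amounts to a simple re-coding of the multistate matrix. The sketch below shows one way this step could look; it is an illustration under stated assumptions, not the procedure as actually carried out in Starling. The matrix layout, the function name `optimize` and the cognate indices in the toy data are hypothetical. An entry judged to be an independent innovation receives a fresh cognate index, and an entry judged to be contact-induced is re-coded as a lacuna, like a loanword.

```python
import copy


def optimize(matrix, split=(), loans=()):
    """Return a copy of `matrix` with homoplastic matches removed.

    matrix[lect][slot] is a cognate index (int) or None for a lacuna.
    split: (slot, lect) entries that receive a fresh, unique cognate index.
    loans: (slot, lect) entries that are re-coded as lacunae (None).
    """
    result = copy.deepcopy(matrix)
    used = [cog for row in matrix.values() for cog in row.values()
            if cog is not None]
    next_index = max(used, default=0) + 1
    for slot, lect in split:
        result[lect][slot] = next_index           # no longer matches anything
        next_index += 1
    for slot, lect in loans:
        result[lect][slot] = None                 # treated like a loanword
    return result


# Invented indices: three lects share index 7 in one slot before optimization.
before = {
    "Lect1": {"bark": 7},
    "Lect2": {"bark": 7},
    "Lect3": {"bark": 7},
    "Lect4": {"bark": 3},
}
after = optimize(before, split=[("bark", "Lect3")], loans=[("bark", "Lect2")])
print(after)
```

After the re-coding, the three lects no longer produce a match in this slot, and the tree can be recomputed from the revised matrix.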

2. What kind of homoplasy is more deleterious?

For practical tasks, several partially overlapping kinds of linguistic homoplasy can be distinguished.
1) Lexical borrowings.
2) Independent homoplasy.
3) Contact-driven homoplasy.
4) Synonymy and suppletion in a proto-language.

Lexical borrowings (loanwords) represent the most trivial and routine case of homoplasy. Normally, loanwords are revealed on the basis of phonetic and morphological evidence, although, in some cases, sociolinguistic or historical information can help to detect a borrowed item. It seems reasonable to exclude all the revealed loanwords from the phylogenetic analysis (S. Starostin 1989/2007: 416–417), i.e., to treat them as lexicographic lacunae, as if no expressions for the given semantic concepts were documented for the given language. Sometimes loanwords are technically analyzed as full-fledged, although etymologically isolated, items (singletons): cf. the Indo-European Swadesh database Dyen et al. 1997 and some phylogenetic studies based on Dyen et al.’s data, such as Gray & Atkinson 2003. Such an approach does not seem justified, since lexical replacement by borrowed items does not reflect natural language evolution. Such replacement is caused by extra-linguistic events, which are accidental and depend on the political and sociolinguistic situation.

An example. Modern English ⟨mountain⟩ goes back to the Middle English form which was borrowed from Old French ⟨montaigne⟩ ‘mountain’. The Swadesh slot ‘mountain’ should be left empty for the Modern English language, since we do not know what an inherited Modern English word for this meaning would look like.

Note that it is recommended to treat a loanword as a normal lexicostatistical item if it acquired the meaning in question only within the target language.

Examples. Modern Demotic Greek puli ⟨πουλί⟩ ‘bird’ originates from late Ancient Greek puːll-íon ⟨πουλλίον⟩, a diminutive of pûːll-o-s ⟨ποῦλλος⟩ ‘chicken’. The latter was indeed borrowed from Latin pullus ‘young (of animals) / chick, chicken’, but the meaning shift ‘chicken’ > ‘bird’ is an internal Greek development; therefore, it is natural to treat Modern Demotic puli ‘bird’ as a full-fledged Swadesh item. Similarly, Modern German ⟨Kopf⟩ ‘head’ goes back to Old High German ⟨kopf⟩ ‘mug, bowl’, borrowed from Latin ⟨cupa, cuppa⟩ ‘cask, bowl’, but the meaning shift ‘bowl’ > ‘head’ is an internal Germanic event, thus Modern German ⟨Kopf⟩ should be lexicostatistically regarded as a full-fledged form.

Independent homoplasy within the Swadesh wordlist arises relatively rarely at a reasonable time depth and cannot seriously affect the resulting phylogenetic tree. Its impact is especially insignificant if “step-by-step” reconstruction is applied, i.e., when a proto-language is reconstructed sequentially on the basis of the proto-languages of the previous taxonomic level.

An example. Examining the slot ‘moon’ in the Indo-European (IE) family, we find that two IE lexemes are used for this meaning in the Slavic group: *meːn-oː-t / *meːn-(e)s- (the bulk of lects) and *lowk-s-n-aː (Russian, Bulgarian, Slovene). The same two lexemes occur as ‘moon’ in the Italic group: the former in Umbrian, the latter in Latin. This is a criss-crossed configuration (Fig. 5), which implies a parallel development for these groups. Proceeding from various more or less formal criteria (such as tree topology and typology of semantic shifts), we can safely reconstruct *meːn-oː-t / *meːn-(e)s- as both the Proto-Slavic and the Proto-Italic term for ‘moon’. Thus, *lowk-s-n-aː in the meaning ‘moon’ is an independent, i.e., parallel, semantic development within Slavic and Italic. Note that after such intermediate reconstructions there is no homoplasy in the slot ‘moon’ when we are dealing with the reconstructed Proto-Slavic and Proto-Italic data.

A more critical factor is a phenomenon which could be called contact-driven homoplasy. Two contacting lects can acquire the same phonetic, morpho-syntactic or semantic innovations under each other's influence. This contact-driven effect can be especially strong when closely related and geographically neighboring lects are involved. For lexicostatistical purposes, two main kinds of contact-driven semantic shifts, i.e., contact-driven homoplasy, should be noted (both are termed loanshift by Haugen 1950: 214–215, 219–220; specified as loan translation and loan meaning extension by Haspelmath 2009: 39).

1) Cognate or simply phonetically similar words in two lects may synchronously acquire a new meaning. There is no lexical borrowing per se; only a semantic concept is borrowed, supported by the phonetic similarity of the words in question.

An example. Ukrainian rik underwent the shift ‘term, time period’ > ‘year’ (having superseded the more archaic ɦid ‘year’) under the influence of Polish rok ‘year’. Ukrainian rik and Polish rok are etymological cognates with regular sound correspondences, which is transparent for Ukrainian and Polish speakers. Two distinct cognate indices should be assigned to the Ukrainian and Polish forms in the homoplasy-optimized lexicostatistical matrix or, since we are sure that the direction of influence was Polish > Ukrainian, we can go further and simply mark Ukrainian rik as a loanword.

2) Loan translation or semantic loan, when a semantic concept is borrowed without phonetic similarity or etymological relationship between the expressions in question.

Examples. The Slovenian verb za=stop-i-ti ‘to understand’ is a morpheme-for-morpheme borrowing from German ⟨verstehen⟩ ‘id.’. The German word ⟨Kopf⟩ ‘head’ has acquired the additional meaning ‘main word in a syntactic phrase’ under the influence of the same polysemy of English ⟨head⟩.

Apparently the former kind of contact-driven semantic shift, supported by phonetic similarity and etymological relationship, occurs more frequently than the latter (cf. similar observations by Haugen 1950: 220). Often, however, when closely related lects are involved, homoplastic developments of the former kind successfully imitate natural etymological evolution, and the resulting forms get treated by linguists as true cognates.

Synonymy in a proto-language. In order to explain incompatible or criss-crossed lexical characters, historical linguists sometimes propose to reconstruct synonymous roots or stems. This means that one meaning was expressed by several equivalent words in a proto-language and that later, in the daughter languages, this synonymy was simplified in different ways, with only one word retaining the original meaning in each individual lect.

Such uncontrolled reconstruction of proto-synonymy makes especially little sense when we are dealing with the Swadesh wordlist. The available typological data of the GLD project suggest that the normal cases of technical synonymy in the 100- or 110-item wordlist are either morphological suppletion (for which see below) or inadequate lexicographic descriptions (when we are not able to choose between two identically glossed forms and are forced to fill the slot with both). When a language has two words with the same Swadesh meaning which are equal in respect of frequency and style, it should mean that a middle stage of lexical replacement is registered for the given Swadesh slot (perhaps Modern English ⟨many⟩ and ⟨a lot of⟩ can illustrate this phenomenon). It is expected, however, that a language should have zero, one or at most two slots with such “true” synonyms at any given moment. Thus, even if a proto-language indeed possessed Swadesh slots with “true” synonyms, we can neglect this, since the number of these slots is minimal.

A specific and important case of lexicostatistical synonymy is morphological suppletion, if we conventionally treat suppletive stems as lexicostatistical synonyms. For example, the GLD standard (Kassian et al. 2010; G. Starostin 2010) prescribes filling the slots of the personal pronouns ‘I’, ‘thou’, ‘we’ with both direct and oblique stems, if these are suppletive. Simplification of a suppletive paradigm can produce incompatible characters (Fig. 4) and the criss-crossed configuration (Fig. 5).

An example. The Bulgarian personal pronoun ni-ye ‘we’, nas ‘us’ etymologically corresponds to Latin noːs ‘we, us’, whereas Lithuanian mìːs ‘we’, mus ‘us’ etymologically corresponds to Tocharian B wes ‘we, us’. Since Bulgarian is the closest relative of Lithuanian, we are dealing with the criss-crossed configuration (Fig. 5). It is not a real homoplasy, however, since we can securely reconstruct the suppletive paradigm *wey-s [direct] / *n(V)s- [oblique] for the Proto-Indo-European language.

Note that only an expert can decide whether the reconstruction of synonymous character states for the proto-language is reasonable or not in the individual case. The information required for such a decision is missing from input matrices. Because of this, proto-synonymy can hardly be discriminated from real homoplasy by any formal algorithm.

3. Data.

Within the framework of the Global Lexicostatistical Database project,² 110-item high-quality wordlists of basic vocabulary for 20 Lezgian lects and 9 Tsezic lects have been compiled and annotated by the author (Kassian 2011–2012; Kassian 2013–2014).

The following languages are included in the current version of the GLD Lezgian database: Udi (Nidzh, Vartashen), Archi, Kryts (Kryts proper, Alyk), Budukh, Tsakhur (Mishlesh, Mikik, Gelmets), Rutul (Mukhad, Ixrek, Luchek), Aghul (Koshan, Keren, Gequn, Fite, Aghul proper), Tabasaran (Northern, Southern), Lezgi (Gyune) plus the reconstructed Proto-Lezgian list.

The current version of the GLD Tsezic database consists of: Hunzib, Bezhta (Bezhta proper, Khoshar-Khota, Tlyadal), Hinukh, Dido (Kidero, Sagada), Khwarshi (Khwarshi proper, Inkhokwari) plus the reconstructed Proto-Tsezic list.

Cognate indices within the multistate matrices were assigned with the help of the traditional comparative method. I use the Proto-Lezgian reconstruction by the late Sergei Starostin (Starostin & Nikolayev 1994: 122 ff.; S. Starostin 1994; S. Starostin n.d.) and the Proto-Tsezic reconstruction by Sergei Nikolaev (Nikolayev 1978; Starostin & Nikolayev 1994: 110 ff.) with certain corrections and improvements when necessary. See Kassian, Zhivlov & Starostin forth.; G. Starostin 2013; Kassian 2013 for the methodology and basic principles of proto-language wordlist reconstruction.

For tree rooting, the 110-item wordlist of the Chechen literary language (G. Starostin 2011) has been introduced into the comparison as an outgroup. Chechen was chosen as a language which, on the one hand, is genetically related to the investigated groups (Lezgian and Tsezic) within the North Caucasian linguistic family (or more narrowly within its Nakh-Dagestanian cluster) and, on the other hand, is definitely not a member of the Lezgian or Tsezic groups. Etymological comparison between Chechen and Lezgian/Tsezic is based on Starostin & Nikolayev 1994 with some corrections from G. Starostin 2011.

² http://starling.rinet.ru/new100/main.htm [Accessed 20.09.2014].

4. Phylogenetic methods.

Lexicostatistical trees were produced by several phylogenetic methods.

1. Modified neighbor joining method, designed by S. Starostin for lexicostatistical analysis and implemented in the Starling software (method Starling neighbor joining, hence StarlingNJ); see Burlak & Starostin 2005: 163 ff.; Kassian forthc. The StarlingNJ trees were produced in the Starling software v.2.5.3 (see S. Starostin 1993/2007; Burlak & Starostin 2005: 270 ff.) from the lexicostatistical database, which represents a multistate matrix with synonymy allowed. For node dating, the so-called “experimental method” was applied, according to which each Swadesh item possesses an individual relative index of stability (S. Starostin 2007a; G. Starostin 2010). The non-parametric bootstrap test was performed (10 000 pseudoreplicates). Hierarchical agglomerative clustering produces by its very definition a rooted tree. Dates of the nodes were established by a strict molecular clock; see S. Starostin 1989/2007; S. Starostin 1999/2000; Novotná & Blažek 2007; Balanovsky et al. 2011 on scale calibration and further details. For data processed by the StarlingNJ method, two kinds of trees are offered: a tree with binary nodes only (as produced by the NJ algorithm), and the same tree where neighboring nodes are joined into one node if the temporal distance between them is 300 years or less (300 years correspond to the mutation of ca. 1.5 words in a lect, a reasonable calculation error). The trees were visualized in Starling and then manually redrawn for best appearance.

2. Standard neighbor joining method (hence NJ), see Saitou & Nei 1987; Makarenkov et al. 2006: 65–66. The trees were produced in the SplitsTree4 software v.4.13.1 (Huson & Bryant 2006) from the binary lexicostatistical matrix (NEXUS format), which was generated from the original multistate matrix by coding the presence (“1”) or absence (“0”) of each proto-root in each language (Swadesh items superseded by loanwords or simply not documented are marked as “?”). The non-parametric bootstrap test was performed (10 000 pseudoreplicates). The trees were rooted by the outgroup (the Chechen wordlist). The trees are not dated. The trees were visualized in the FigTree software (v.1.4.0). Additional trees were also produced by the BioNJ method (Gascuel 1997); these are topologically identical to the NJ ones in all cases.

3. Unweighted pair group method with arithmetic mean (hence UPGMA), see Sneath & Sokal 1973: 230–234; Makarenkov et al. 2006: 65–66. The trees were produced in the SplitsTree4 software v.4.13.1 from the binary matrix described above. The non-parametric bootstrap test was performed (10 000 pseudoreplicates). The trees were rooted by the outgroup (the Chechen wordlist). The trees are not dated. The trees were visualized in the FigTree software (v.1.4.0).

4. Markov chain Monte Carlo method under the Bayesian framework (hence Bayesian MCMC), see Makarenkov et al. 2006: 68–69, as it was first applied to linguistic data in Gray & Atkinson 2003. The trees were produced in the MrBayes software v.3.2.1 (Huelsenbeck & Ronquist 2001) from the binary matrix described above. I used the F81 model with rates = gamma. The program was run 4 times using 4 concurrent Markov chains; the Chechen language was marked as an outgroup. Each run produced 5 000 000 tree generations with samples taken every 500 generations. For each run, the first 25% of tree generations were discarded as burn-in. The consensus trees were rooted by the outgroup (the Chechen wordlist). The trees are not dated. The trees were visualized in the FigTree software (v.1.4.0).

5. Unweighted maximum parsimony method (hence UMP), see Makarenkov et al. 2006: 66–67. The trees were produced in the TNT software (Willi Hennig Society edition of TNT, v.1.1, May 2014, see Goloboff et al. 2008) from the binary matrix described above by the branch-and-bound (“Implicit enumeration”) algorithm. Obligatory binarization of nodes was prohibited (“Collapse trees after the search”); the Chechen language was marked as an outgroup. When several optimal trees of equal cost are obtained, the strict consensus tree is produced, for which the non-parametric bootstrap test is performed (1000 pseudoreplicates). The trees were rooted by the outgroup (the Chechen wordlist). The trees are not dated. The trees were visualized in the FigTree software (v.1.4.0).
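The joining of neighboring nodes mentioned under method 1 (nodes separated by 300 years or less are merged) can be sketched as a simple tree transformation. The code below is not taken from Starling; the dictionary-based tree layout, the function name `collapse`, the convention that a merged node keeps the date of the older node, and the dates in the toy tree are all assumptions made for this illustration. Dates are given as years, with negative values for BC.

```python
def collapse(node, threshold=300):
    """Merge internal child nodes dated within `threshold` years of their parent.

    A node is a dict with "name", optional "date" (negative = BC) and optional
    "children". Leaves have no "children". The input tree is not modified.
    Assumption: the merged node keeps the date of the parent (older) node.
    """
    if not node.get("children"):
        return node                                   # leaf: nothing to do
    merged = []
    for child in node["children"]:
        child = collapse(child, threshold)            # process subtree first
        is_internal = bool(child.get("children"))
        if is_internal and abs(child["date"] - node["date"]) <= threshold:
            merged.extend(child["children"])          # lift grandchildren up
        else:
            merged.append(child)
    return {**node, "children": merged}


# Invented toy tree: X is only 200 years younger than the root, Y is much younger.
tree = {
    "name": "root", "date": -1700, "children": [
        {"name": "A"},
        {"name": "X", "date": -1500, "children": [
            {"name": "B"},
            {"name": "Y", "date": -500, "children": [
                {"name": "C"},
                {"name": "D"},
            ]},
        ]},
    ],
}
collapsed = collapse(tree)
print([child["name"] for child in collapsed["children"]])
```

In the toy tree the node X is absorbed into the root, which becomes a ternary node, while the well-separated node Y is retained.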

Lexicostatistical networks were produced by the following methods.

6. NeighborNet method (Bryant & Moulton 2004; Makarenkov et al. 2006: 89–90). The networks were produced in the SplitsTree4 software v.4.13.1 from the binary matrix described above. The non-parametric bootstrap test was performed (10 000 pseudoreplicates). The networks were visualized in the SplitsTree4 software.

7. Minimal lateral network method (List et al. 2014a; List et al. 2014b). The networks were produced and visualized in the LingPy software (List & Moran 2013), version 2.4.1.alpha (List et al. 2014c), from the specific matrix converted from the Starling multistate matrix. The analysis was based on the weighted parsimony approach, which assigns different weights to gain and loss events and searches for the most parsimonious evolutionary scenarios to explain how the characters evolved along the reference tree. The software tests five gain-loss models (3−1, 5−2, 2−1, 3−2, 1−1) and chooses the best fitting one for the given dataset.
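The idea of weighted gain-loss parsimony on a reference tree can be illustrated with a generic Sankoff-style dynamic programme. This is a sketch of the general technique only and should not be read as the LingPy implementation: the tree encoding (nested tuples of lect names), the function name `cheapest_scenario`, the reading of a model such as 2−1 as “gain weight 2, loss weight 1”, and the convention that presence at the root counts as one gain (the origin of the root word) are assumptions made for this example.

```python
def cheapest_scenario(tree, present, gain=2, loss=1):
    """Return (cost, gains) of the cheapest gain/loss history of one character.

    tree: nested tuples of lect names (rooted reference tree).
    present: set of lects in which the cognate set is attested.
    Ties in cost are resolved in favor of fewer gains.
    """
    inf = float("inf")

    def visit(node):
        # Returns {state: (cost, gains)} for the subtree rooted at `node`.
        if isinstance(node, str):
            state = 1 if node in present else 0
            return {state: (0, 0), 1 - state: (inf, 0)}
        table = {0: (0, 0), 1: (0, 0)}
        for child in node:
            sub = visit(child)
            for state in (0, 1):
                stay = sub[state]
                step = gain if state == 0 else loss   # 0 > 1 is a gain
                change = (sub[1 - state][0] + step,
                          sub[1 - state][1] + (1 if state == 0 else 0))
                best = min(stay, change)
                table[state] = (table[state][0] + best[0],
                                table[state][1] + best[1])
        return table

    root = visit(tree)
    origin_at_root = (root[1][0] + gain, root[1][1] + 1)
    return min(root[0], origin_at_root)


# Criss-crossed attestation on an invented four-lect tree.
reference = (("A", "B"), ("C", "D"))
print(cheapest_scenario(reference, {"A", "C"}, gain=2, loss=1))
print(cheapest_scenario(reference, {"A", "C"}, gain=1, loss=1))
```

With a gain weight of 2 the criss-crossed pattern is explained by a single origin followed by two losses, whereas with equal weights two independent gains are cheaper; characters with more than one inferred gain are the candidates for homoplasy or lateral transfer.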

5. Lezgian case.

Lezgian is a relatively deep linguistic group which consists of languages spoken in South-East Daghestan (Russian Federation) and the adjacent parts of Azerbaijan (Fig. 6). The Lezgian group is a member of the Nakh-Dagestanian clade of the North Caucasian linguistic family.

Fig. 6 (adapted from Koryakov 2006: map #13). Map of the modern Lezgian lects.

[Map: the Lezgian-speaking area in Daghestan and Azerbaijan on the Caspian Sea, showing the Lezgi, Tabasaran, Aghul, Rutul, Tsakhur, Kryts, Budukh and Udi languages with their dialects, together with neighboring languages.]

Lexicostatistical analysis of the Lezgian group, performed with the aid of the aforementioned phylogenetic methods (StarlingNJ, NJ, UPGMA, Bayesian MCMC, UMP), was published in Kassian forthc. See Fig. 7 for the consensus Lezgian non-homoplasy-optimized tree, which conforms very well to the traditional expert classification of the group; see Fig. 8 for the Lezgian non-homoplasy-optimized NeighborNet-network; see Fig. 9 for the Lezgian non-homoplasy-optimized minimal lateral network.

Fig. 7 (adapted from Kassian forthc.: Fig. 5XXX). Manually constructed consensus non-homoplasy-optimized phylogenetic tree of the Lezgian lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, UMP methods. The gray ellipses mark 4 joined nodes which cover binary branchings that differ depending on the method. Statistical support values are shown in the following sequence: NJ / MCMC / UMP (“x” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods). StarlingNJ dates are proposed.

[Dated tree diagram. Node dates: Lezgian (1730 BC), Nuclear Lezgian (1240 BC), East Lezgian (770 BC), West Lezgian (740 BC), South Lezgian (480 AD), Aghul (510 AD), Rutul (900 AD), Kryts (970 AD), Udi (990 AD), Tabasaran (990 AD), Tsakhur (1240 AD).]

Fig. 8. Non-homoplasy-optimized phylogenetic network of the Lezgian lects produced by the NeighborNet method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the branches (not shown for stable branches with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

[Network diagram: the 20 Lezgian lects plus the Chechen outgroup.]

Fig. 9. Non-homoplasy-optimized minimal lateral network of the Lezgian lects produced in the LingPy software. Based on the consensus non-homoplasy-optimized tree (Fig. 7). Node size reflects the inferred number of cognate sets present in each lect. The solid links illustrate lateral transfer events suggested by the method. Thickness and color of the links indicate the inferred number of homoplastic characters between two nodes, as specified by the right scale. The gain-loss model 5−2 is best fitting, p = 0.55 (the next nearest model is 2−1, p = 0.48).

[Network diagram: the 20 Lezgian lects plus the Chechen outgroup arranged on the reference tree, with inferred lateral links.]

At least the following cases of homoplastic developments within the Lezgian 110-item wordlists (Kassian 2011–2012) can be detected proceeding from the reconstructed phylogenetic tree (Fig. 7) and the reconstructed Proto-Lezgian wordlist. Initially I checked the Lezgian non-homoplasy-optimized dataset manually, then the MLN module of LingPy was applied to the same dataset. The latter detected the majority (although not all) of the homoplastic characters previously known from manual analysis and additionally revealed several homoplastic developments I had overlooked.

1. #3 bark
Rutul (common): ǯigal / ǯugal ‘bark / peel’.
Aghul (Keren dialect): gužal / žigal ‘bark’.
Lezgi (common?): čikːal / čkːal ‘bark / peel’.
It is likely that the Proto-Lezgian meaning of *čːukːa-la / *kːučːa-la was ‘noodles’ or ‘a k. of food rising to the surface after boiling’. The Proto-Aghul term for ‘bark’ is *qːärk a (retained in both Koshan and non-Koshan dialects), whereas Keren Aghul gužal / žigal ‘bark’ is a clear innovation, influenced by neighboring Lezgi or Rutul; thus the Keren Aghul term is to be treated as a loanword. The situation with the Common Rutul and Common Lezgi terms is less obvious. On general grounds, it is more natural to suppose that Rutul was influenced by Lezgi, not vice versa, but I nevertheless prefer to treat both the Rutul and Lezgi terms as inherited and mark them with two different numbers.

2. #4 belly Nidzh Udi: tapan ‘belly’. Budukh: t p n ‘belly / stomach’. The origin of tapan ~ t p n is unknown, but in all likelihood these forms represent either independent innovations in Nidzh Udi and Budukh or borrowings from an unknown source, since the Proto-Lezgian or at least Proto-Nuclear Lezgian term for ‘belly’ can be safely reconstructed as *uo=ɬ n ~ *ro=ɬ n. I mark the Nidzh Udi and Budukh words with two different numbers.

3. #5 big Tsakhur (common): χe- / χa- ‘big / many’. Aghul (non-Koshan dialects): aχa- / a- ‘big’. Tabasaran (common): aχːi / aχu ‘big’. It is possible that the Proto-Lezgian meaning of *ʔaχ - was ‘many’. The Aghul dialectal terms for ‘big’ were clearly influenced on the part of Tabasaran (Koshan Aghul retains the Proto-Lezgian term *pːVhV- ~ *hVpːV- ‘big’), thus I mark Aghul aχa- / a- as loanwords. On the contrary, the Tsakhur and Tabasaran words likely represent independent innovations *ʔaχ - ‘many’ > ‘big’ (there is no common border between the Tsakhur and Tabasaran areas). I mark them with two different numbers.

4. #7 to bite Rutul (common): s s aʔ- ‘incisor / canine tooth’ + ‘to do’ / g čʼ haʔ- ‘a k. of tooth’ + ‘to do’. Aghul (common): qʼacʼ-ikʼa- ‘a piece, a bite’ + ‘to put in, move into (trans.)’. Tabasaran (common): qʼacʼ ax- ‘a bite; a piece’ + ‘to put’ / qʼacʼ apʼ- ‘a bite; a piece’ + ‘to do’ / ancʼ apʼ- ‘a bite’ + ‘to do’. The Rutul-Aghul-Tabasaran analytic pattern ‘tooth’ / ‘a piece’ / ‘a bite’ + an auxiliary verb is a recent introduction of areal origin. The starting point of this innovation is unclear, however, and I mark the Rutul, Aghul, Northern Tabasaran and Southern Tabasaran entries with four different numbers.

5. #8 black Tsakhur (common): kʼar - ‘black’ Aghul (common): kʼare- ‘black’ Tabasaran (common): kʼari ‘black’ The Proto-Lezgian and Proto-Nuclear Lezgian term for ‘black’ apparently was *lVχːV-, whereas it is likely that the Proto-Lezgian meaning of *kʼar - was ‘charcoal’. The Aghul-Tabasaran forms may either point to the shift ‘charcoal’ > ‘black’ in Proto-Aghul-Tabasaran or represent late introductions (in this case, Aghul may have been influenced on the part of Tabasaran). On the contrary, the Tsakhur word likely represents an independent innovation (there is no common border between the Tsakhur and Aghul-Tabasaran areas). I mark the Tsakhur and Aghul-Tabasaran words with two different numbers.

6. #10 bone Kryts (common): kʼäräpʼ Budukh: kʼerepʼ Rutul (common): qʼ r b Tabasaran (Southern only): kʼurab Lezgi (common): kʼarab The Proto-Lezgian and Proto-Nuclear Lezgian term for ‘bone’ was *yirʼː. The exact Proto-Lezgian meaning of *ʼorapː is unclear, but ‘hand bone’ is a good candidate. The Southern Tabasaran word was clearly influenced on the part of Lezgi (Northern Tabasaran retains the Proto-Lezgian term *yirʼː ‘bone’), thus I mark Southern Tabasaran kʼurab as a loanword. In the pair Rutul-Lezgi, the Lezgi dialects apparently influenced neighboring Rutul, but since there is no formal evidence for such a direction, I treat the Rutul and Lezgi terms as inherited and mark them with two different numbers. The South Lezgian (Kryts, Budukh) words may either have been influenced on the part of the neighboring Lezgi dialects (the Quba group) or represent an independent semantic innovation. I mark the Kryts-Budukh term with a third number.

7. #13 nail Aghul (common): kirk ‘fingernail’ Lezgi (common?): kek ‘fingernail’ Because Tabasaran retains the Proto-Lezgian root for ‘fingernail’ (*mːäɬː), the Aghul-Lezgi match represents a late areal isogloss (the origin and original meaning of *kerk are unclear). Apparently the Lezgi dialects influenced neighboring Aghul, but since there is no formal evidence for such a direction, I treat the Aghul and Lezgi terms as inherited and mark them with two different numbers.

8. #14 cloud Archi: diɬː - Kryts (Alyk dialect): ǯif ‘cloud / fog’ Tabasaran (in the majority of dialects): difː / ǯif ‘cloud’ Lezgi (common): cːif ‘cloud / fog’ The general situation with Lezgian designations of ‘cloud’ and ‘fog’ is controversial, but I prefer to reconstruct Proto-Lezgian *tːiɬː with the meaning ‘fog’. Tabasaran dialects and Lezgi may represent an areal isogloss (although the direction of influence is not clear), but geographically remote Archi and Kryts demonstrate independent introductions. I mark the Archi, Alyk Kryts, Tabasaran and Lezgi forms with four different numbers.

9. #15 cold Archi: χe-tːu-CLASS ‘cold’ Tabasaran (Dyubek dialect): aq-li ‘cold’ Synchronic participle-like deverbative forms in Archi and Dyubek Tabasaran obviously represent independent innovations, although the Archi and Tabasaran verbal roots are etymologically related (*iqä- ‘to get cold’ > ‘cold (adj.)’). I mark the Archi and Dyubek Tabasaran forms with two different numbers.

Kryts (common): s=aa-y / qːa-y ‘cold’ Budukh: s=aa ‘cold’ Aghul (Proto-Aghul): ruu- ‘cold’ Tabasaran (Southern): =au ‘cold’ Lezgi (Gyune dialect): qːa-yi ‘cold’ These synchronic participle-like deverbative forms look like recent introductions, although the corresponding verbal roots in the aforementioned lects are etymologically related (*ʔirqːe(r)- ‘to get cold’ > ‘cold (adj.)’). The Aghul-Tabasaran-Lezgi match is apparently of areal origin, but the direction of influence is unclear. The Kryts-Budukh match can represent either late introductions or a Proto-Kryts-Budukh innovation (in any case, independent from the Aghul-Tabasaran-Lezgi one). I mark the Kryts-Budukh, Aghul, Southern Tabasaran and Lezgi forms with four different numbers.

Aghul (Koshan dialect): mikʼ-le- ‘cold’ Although this Aghul adjective contains the Proto-Lezgian root *meʼ-, which I reconstruct with the meaning ‘cold (subst.)’ > ‘cold (adj.)’, the Aghul dialectal stem represents a clear new formation, having been secondarily derived from the substantive ‘cold’. Because of this, I mark the Koshan Aghul form with a number different from the one used for the other Lezgian forms, which originate from *meʼ-ä- ‘cold (adj.)’.

10. #16 to come Archi: =ai- ‘to come’ [imperf.] Aghul (common): arg-i- / ad-i- / ar-i- ‘to come’ [perf.] Independent innovation *ʔarːe- ‘a k. of motion’ > ‘to come’ in two geographically remote languages. I mark these forms with two different numbers.

Kryts (common): =uxu- ‘to come’ [perf.] Budukh: =axi- ‘to come’ [perf.] Tabasaran (common): af- ‘to come’ [perf.] Independent innovation *ʔiɬ e ‘to go’ > ‘to come’ in two geographically remote areas: South Lezgian (Kryts-Budukh) and Tabasaran. I mark these forms with two different numbers.

11. #19 to drink Aghul (common): uχ-a- ‘to drink’ Tabasaran (Southern): uχ- ‘to drink’ It is likely that the Proto-Lezgian meaning of *ʔoχ a was ‘to gulp’ vel sim. The Aghul-Southern Tabasaran match is an areal isogloss, since Northern Tabasaran retains the Proto-Lezgian verb *HVqːVr- ‘to drink’. The Southern Tabasaran lects were influenced on the part of neighboring Aghul, thus I mark the Southern Tabasaran form as a loanword.

12. #22 earth Aghul (many dialects): rug ‘earth, soil / dust’ Tabasaran (common): rug ‘earth, soil’ Lezgi (Gyune dialect): rug ‘earth, soil / dust’ The Proto-Lezgian meaning of *rukː was ‘dust’. Because some Aghul and Lezgi dialects retain the root *näqʼ with its Proto-Lezgian meaning ‘earth, soil’, the meaning shift ‘dust’ > ‘soil’ is a late areal Aghul-Tabasaran-Lezgi isogloss, apparently Tabasaran-induced. I mark the Aghul and Lezgi forms as loanwords.

13. #26 fat Aghul (Koshan dialect): ħul ‘fat’ Tabasaran (common): χul ‘fat’ Since other Aghul dialects retain the Proto-Lezgian term *maʔ ‘fat’, the Tabasaran-Koshan Aghul match is a late areal isogloss *χul ‘a k. of fat’ > ‘fat in general’ (Koshan Aghul was influenced on the part of neighboring Tabasaran). I mark the Koshan Aghul form as a loanword.

14. #29 fish Aghul (common): čʼekʼ ‘fish’ Tabasaran (many dialects): čičʼ ‘fish’ Since Eteg Tabasaran, which is not adjacent to the Aghul area, retains the Proto-Lezgian root *χːanː ‘fish’, it is likely that widely distributed Tabasaran čičʼ represents a late areal Aghul-induced isogloss *čʼeʼ ‘a k. of reptile’ > ‘fish’. I mark Tabasaran čičʼ as a loanword.

15. #39 to hear Archi: =k΄o- ‘to hear’ Tabasaran (Northern): yik-΄ ‘to hear’ Since Southern Tabasaran retains the basic Proto-Lezgian root *ʔeɬ(ː) - ‘to hear’, the Archi and Northern Tabasaran verbs represent independent innovations *ʔi(r)k (r)- ‘a k. of perception’ > ‘to hear’ in two geographically remote areas. I mark the Archi and Northern Tabasaran forms with two different numbers.

Rutul (common): un yik- or un y χ- or un ečʼ - ‘to hear’ Aghul (common): daχ xi- or un xa- ‘to hear’ Lezgi (Gyune): wan že- ‘to hear’ Analytic constructions ‘sound happens to X’ with different words for ‘sound’ (un = wan ≠ daχ) and different auxiliary verbs. This is a late areal isogloss, although its origin is unknown. I mark the Rutul, Aghul and Lezgi forms with different numbers.

16. #44 knee Rutul (common): qʼ aqʼ ‘knee’ Aghul (common): qʼ aqʼ / qʼuqʼ ‘knee’ Tabasaran (common): qʼamqʼ ‘knee’ Together with Proto-Dargi *qʼ aqʼ a ‘knee’, this is a late areal isogloss, shared by several neighboring lects. The original meaning of the Proto-Lezgian anatomic term *qʼamqʼ is unknown, however. I mark Rutul and Aghul-Tabasaran words with two different numbers (the Aghul-Tabasaran forms for ‘knee’ can formally go back to the Proto-Aghul-Tabasaran language).

17. #46 leaf Kryts (Proper): beš ‘leaf’ Aghul (some dialects): pʼaž ‘tree leaf’ Lezgi (Gyune dialect): beš ‘leaf’ The original meaning of Proto-Lezgian *pːaša is unclear (‘bud’ vel sim.?), but the Kryts and Aghul-Lezgi words represent independent innovations in two non-adjacent areas. Since Tabasaran probably retains the Proto-Lezgian root *ʼačʼa ‘leaf’, the dialectal Aghul-Lezgi match should be treated as a late areal isogloss (whose origin is unclear). I mark the Kryts, Aghul and Lezgi forms with three different numbers.

18. #47 to lie Rutul (common): l=uk- ‘to fall, go sprawling / to lie’ Lezgi (Gyune dialect): qːat=xi- ‘to lie / to lie down’ The situation with Lezgian verbs for ‘to lie’ and ‘to sleep’ is complicated, but the Rutul-Lezgi match looks secondary, because the semantic development l=uk- ‘to fall, go sprawling’ > ‘to lie’ is probably a recent innovation in the modern Rutul dialects, independent from the Lezgi data. I mark the Rutul and Lezgi forms with two different numbers.

Tsakhur (common): =ex- ‘to lie / to lie down’ Aghul (some dialects): =ix- ‘to lie / to lie down’ Outgroup Chechen: =ill-a ‘to lie / to lie down’ The Proto-Lezgian meaning of *ʔeɬː - was apparently ‘to put; to lie (inanimate subj.)’. The Tsakhur and Aghul verbs should represent (independent) innovations. The same concerns the etymologically related Chechen stem. I mark the Tsakhur, Aghul and Chechen forms with two different numbers.

19. #49 long Udi: boχo ‘long’ Kryts (Proper): aχ-ti ‘long’ Budukh: a-p-χu ‘long’ The Proto-Lezgian meaning of *h[a]χV- was ‘to be high’ or ‘to rise, raise’. Udi and South Lezgian (Kryts, Budukh) represent independent innovations in non-adjacent areas. I mark the Udi and Kryts-Budukh forms with two different numbers.

20. #50 louse Kryts (common): liš ‘louse’ Budukh: liš ‘louse’ Tsakhur (common): wix ‘louse’ Rutul (common): lix ‘louse’ Since Udi, Archi and East Lezgian (Aghul, Tabasaran, Lezgi) retain the Proto-Lezgian root *näcʼː ‘louse’, it is likely that etymologically obscure *loɬ( ) is a relatively late innovation for ‘louse’ in South Lezgian (Kryts, Budukh) and West Lezgian (Tsakhur, Rutul). Details are quite unclear, but I prefer to mark the South Lezgian (Kryts, Budukh) and West Lezgian (Tsakhur, Rutul) forms with two different numbers.

21. #56 mouth Tsakhur (common): al ‘mouth’ Rutul (common): al ‘mouth’ Since the Proto-Lezgian root *sː w ‘mouth’ is sporadically retained in Tsakhur and Rutul with the original anatomic meaning, it is likely that we are dealing with the late areal isogloss *ːal ‘?’ > ‘mouth’ (the direction of influence between the closely related Tsakhur and Rutul languages is unclear). I mark the Tsakhur and Rutul forms with two different numbers.

22. #65 rain Budukh: m f Rutul (Ixrek dialect): maf ‘rain’ Aghul (Fite dialect): marf ‘rain’ Tabasaran (common): marx ‘rain’ Lezgi (Gyune, Quba dialects): marf ‘rain’ It is likely that the Proto-Lezgian meaning of *marɬ was ‘a k. of precipitation / foam / a k. of cloud’ vel sim. Probably Proto-Lezgian *ʔoqː a-l ‘rain’ was superseded by *marɬ in Proto-Tabasaran, whereas other forms listed above represent areal Tabasaran-induced innovations (note also *mark( )a ‘rain’ in the neighboring Dargi area); *ʔoqː a-l ‘rain’ is retained in many Rutul, Aghul and Lezgi dialects. The Budukh term can be either an independent introduction or the result of influence on the part of neighboring Quba Lezgi. I prefer to mark the Tabasaran and Budukh forms with two different numbers, whereas other forms are treated as loanwords.

23. #69 round Rutul (many dialects): ru-ud ‘round 3D/2D’ Lezgi (common): el=qː e-y ‘round 3D/2D’ These forms are synchronic participles from the verbs ‘to be round; to walk around, hang around’ and ‘to turn (intrans.)’. The Rutul and Lezgi verbs are cognates, but the derivative words for ‘round’ are obviously independent innovations. I mark the Rutul and Lezgi forms with two different numbers.

Aghul (Koshan): al=arc-ni-r ‘round 3D’ Aghul (Fite): al=urcu-t ‘round 2D’ These participles from the verb ‘to turn’ are transparent new formations (perhaps of contact origin). I mark the Koshan and Fite forms with two different numbers.

24. #72 to see Kryts (common): irqi- ‘to see’ Budukh: irqi- ‘to see’ Aghul (Koshan): raqː-a- ‘to see’ Tabasaran (common): aː- / raqː- ‘to see’ It is likely that the Proto-Lezgian meaning of *ʔarqʼːä- was ‘to look’. Thus the meaning shift ‘to look’ > ‘to see’ in South Lezgian (Kryts, Budukh) and Tabasaran should be treated as independent introductions in two non-adjacent areas. On the contrary, the Koshan Aghul term is clearly influenced on the part of the Tabasaran verbs (other Aghul dialects retain the Proto-Lezgian verb *ʔakː ä- ‘to see’). I mark the Kryts-Budukh and Tabasaran forms with two different numbers and the Koshan Aghul entry as a loanword.

25. #74 to sit Tsakhur (common): g=īʔar ‘to sit / to sit down’ Tabasaran (common): =ʔeʔ- / =eʔ- ‘to sit / to sit down’ Although the original meaning of the Proto-Lezgian root *ʔeʔ( )Vr- is unclear, it is obvious that *ʔeʔ( )Vr- superseded the old root *ʔiqʼ ä- ‘to sit’ independently in Proto-Tsakhur and Proto-Tabasaran. I mark the Tsakhur and Tabasaran forms with two different numbers.

26. #81 stone Kryts (Proper): χud ‘stone’ Rutul (Mukhad, Ixrek dialects): duχ-ul ‘stone’ The Kryts and Rutul roots can formally be treated as etymologically related (Proto-Lezgian *χ(ː)utː ‘a k. of stone’), but the meaning ‘stone (in general)’ obviously represents independent innovations. I mark the Kryts and Rutul forms with two different numbers.

27. #85 that Nidzh Udi: šo ‘that’ Tsakhur (common): še-n ‘that’ This match clearly represents independent innovations, since it is even unlikely that šo in the attributive function can be reconstructed for Proto-Udi. I mark the Nidzh Udi and Tsakhur forms with two different numbers.

28. #89 tooth Rutul (Luchek, Khnyukh, Shinaz dialects): s s ‘tooth’ Tabasaran (Khiv dialect): sars ‘tooth’ Lezgi (common?): sas ‘tooth’ Although the original meaning of the Proto-Lezgian root *sars is unclear (‘a k. of tooth, fang’), it is obvious that *sars superseded the old root *s lː ‘tooth’ in (Proto-)Lezgi, whereas the aforementioned Rutul and Tabasaran dialects were subsequently influenced on the part of neighboring Lezgi dialects (other Rutul and Tabasaran dialects retain Proto-Lezgian *s lː ‘tooth’). The Rutul and Tabasaran forms should be marked as loanwords.

29. #90 tree South Lezgian (Kryts Proper, Budukh): dar, d r ‘tree’ Fite Aghul: dar ‘tree / forest’ Lezgi (common?): tːar ‘tree’ Since the Proto-Aghul term for ‘tree’ is likely to be reconstructed as *kʼ ir (retained in all Aghul dialects including Fite), the second Fite word for this meaning, dar, looks like a recent and independent introduction. Note, however, that *tːar is to be reconstructed as the Proto-Nuclear Lezgian word for ‘tree’. Thus a back development is expected for the Fite Aghul lineage. I mark the Fite Aghul form with a distinct number.

30. #97 white Kryts (common): läz / luzu ‘white’ Budukh: luzu ‘white’ Tabasaran (common): liʒi / lizi ‘white’ Lezgi (common): lacːu ‘white’ These forms represent adjective formations from the Proto-Lezgian substantive *lacː ‘white of egg’ or ‘white color’. The Proto-Nuclear Lezgian term for ‘white’ was probably *čːakː arV-, retained in West Lezgian (Tsakhur, Rutul) and Aghul, whereas in the case of *lacː-V, it is very likely that we are dealing with independent or contact-driven innovations in the aforementioned lects according to the productive morphological pattern. I mark the South Lezgian (Kryts, Budukh), Tabasaran and Lezgi forms with three different numbers.

31. #99 woman Kryts (common): χ n b ‘woman’ Aghul (Koshan): χewe-r ‘woman’ Southern Tabasaran: χpːi-r ‘woman’ Proto-Lezgian *χon-pːV should be reconstructed with the plural meaning ‘women’ (an element of the suppletive paradigm). The Kryts, Aghul and Tabasaran forms in the singular meaning ‘woman’ represent late and apparently independent innovations. Note that these forms are not likely to be reconstructed with the singular meaning ‘woman’ even for the Proto-Aghul and Proto-Tabasaran levels. I mark the Kryts, Koshan Aghul and Southern Tabasaran forms with three different numbers.

32. #100 yellow Tsakhur (many dialects): qː b - /  b - ‘yellow’ Rutul (Mukhad dialect): qː b- ‘yellow’ Lezgi (common): qpːi ‘yellow’ These forms represent adjective formations from the Proto-Lezgian substantive *qː pː ‘yolk’. The Proto-Nuclear Lezgian term for ‘yellow’ was probably *qäqV-, retained in Archi and Aghul, whereas in the case of *qː pː-V, it is very likely that we are dealing with a late areal innovation in the aforementioned lects according to a productive morphological pattern. I mark the Tsakhur, Mukhad Rutul and Lezgi forms with three different numbers (although it is natural to suppose that Mukhad Rutul was influenced on the part of the neighboring Lezgi dialects).

33. #102 heavy Aghul (many dialects): ee- / qːeqːe- ‘heavy’ Tabasaran (many dialects): ai ‘heavy’ These forms represent adjective formations from the substantives for ‘burden, load’. This is a late areal innovation, because other Aghul and Tabasaran dialects retain the Proto-Lezgian term *hiqʼ ‘heavy’. For some phonetic reasons, it is likely that the Aghul dialects have been influenced on the part of Tabasaran rather than vice versa. Nevertheless I formally mark the Aghul and Tabasaran forms with two different numbers.

34. #103 near Kryts (common): müqʼo-v / miqʼe- ‘near’ Aghul (Keren, Fite dialects): muqʼu ‘near’ Lezgi (common): maqʼ a / muqʼa-l ‘near’ The general situation with the adverbs ‘near’ in Lezgian languages is unclear, but it is at least possible to suppose that the locative forms of the Proto-Lezgian substantive *w nqʼ (a) ‘place’, used with the adverbial meaning ‘near’ in the aforementioned lects, represent late innovations (either independent or areal). I mark the Kryts, Aghul and Lezgi forms with three different numbers.

35. #105 short Tsakhur (common): ǯitʼa- ‘short’ Rutul (common): ǯ k- ‘short’ Aghul (common): ǯee- / ǯeqːe- ‘short’ Tabasaran (common): ǯiqːi ‘short’ These forms represent innovations which have superseded the Proto-Lezgian (or at least Proto-Nuclear Lezgian) adjective *kː VtʼV ‘short’. For phonetic and topological reasons, I mark the Tsakhur, Rutul and Aghul-Tabasaran forms with three different numbers (it is also possible that the Aghul form is actually a Tabasaran loanword).

36. #109 worm Tabasaran (common): š(ː)ar ‘earthworm / helminth’ Lezgi (many dialects): šar ‘earthworm / helminth’ Since Aghul retains the Proto-Lezgian (or at least Proto-Nuclear Lezgian) opposition *mulaq  ‘earthworm’ / *šːar ‘helminth’, the semantic broadening *šːar ‘helminth’ > ‘earthworm / helminth’ in neighboring Tabasaran and Lezgi is a late areal innovation. The origin of such a semantic development is unclear (Tabasaran > Lezgi?), thus I mark the Tabasaran and Lezgi forms with two different numbers.

Additionally, the following instances of homoplasy between individual Lezgian lects and the outgroup (Chechen) must be mentioned and improved. In all cases, we are dealing with independent innovations either in Lezgian or in Chechen or in both.

37. #8 black Gyune Lezgi: čʼulaw ‘black’ Outgroup Chechen: ärž-a ‘black’ Since there is a good candidate for the Proto-Lezgian meaning ‘black’ (*laχːV-), whereas *čʼulV is practically isolated within the Lezgian group, it is hard to suppose that *čʼulV survived with its original meaning ‘black’ only in Lezgi; we should assume the meaning ‘a k. of dark color’ for Proto-Lezgian *čʼulV and the late development ‘a k. of dark color’ > ‘black’ in modern Lezgi. I mark the Lezgi and Chechen forms with two different numbers.

38. #12 to burn Luchek Rutul: l=ikʼ-u aʔ- ‘to burn (trans.)’ Outgroup Chechen: d=aːg-oː ‘to burn (trans.)’ The Luchek Rutul expression which literally means ‘to cause to burn (intrans.)’ is a clear new formation (the obvious Proto-Lezgian candidate for this meaning is another root, *ʔokː -). I mark the Luchek Rutul and Chechen forms with two different numbers.

39. #38 head Archi: kartʼi ‘head’ Outgroup Chechen: korta ‘head’ Since the Proto-Lezgian term for this meaning can be safely reconstructed as *woʼul (retained in Udi and Nuclear Lezgian), whereas *k ltʼ- is a good candidate for the status of the Proto-Lezgian term for ‘temple’, the Archi meaning ‘head’ can be considered secondary. I mark the Archi and Chechen forms with two different numbers.

40. #47 to lie See above.

41. #62 not Gyune Lezgi: -č ‘not’ Outgroup Chechen: ca ‘not’ Proto-Lezgian negative suffix -*čːV is sporadically attested outside Lezgi with specific negative functions (such as negated dubitative), but it certainly cannot be reconstructed as a basic Proto-Lezgian exponent of negation of assertion (the prefix *tːV- is an obvious candidate for this status). I mark the Lezgi and Chechen morphemes with two different numbers.

42. #76 to sleep Vartashen Udi: nepː-aχ-e-sun ‘to sleep’, literally ‘to be in sleep’ Outgroup Chechen: nab yan ‘to sleep’, literally ‘to do sleep’ Both Udi and Chechen analytic expressions are transparent recent introductions. I mark the Udi and Chechen slots with two different numbers.

43. #85 that Tabasaran (common): du-mu ‘that’ Outgroup Chechen: daː-ra ‘that’ Since the Tabasaran morpheme du- is clearly secondary in such a distal deictic function (its etymological comparanda among Lezgian lects are weak and its proto-status is unclear), the Tabasaran-Chechen match should be treated as homoplastic. I mark the Tabasaran and Chechen forms with two different numbers.

Now let us examine how the MLN module of LingPy has coped with the task. As noted above, LingPy correctly detected the majority of the aforementioned cases of homoplasy, which should be considered a very good result. Some details, however, require additional discussion.

Firstly, LingPy reconstructs false homoplasy if the involved taxa are too far from each other. For example, the Proto-Lezgian term for ‘sand’, *šːäm, is only reconstructed on the basis of the data from one outlier (Vartashen Udi) and one Nuclear Lezgian lect (Mukhad Rutul); in other lects, inherited forms are normally superseded by loanwords. Although there are no topological conflicts in the character ‘sand’, LingPy treats the Vartashen Udi-Mukhad Rutul match as homoplastic. The second and inevitable type of false response is synonymy and suppletion in the proto-language, for which see above. For the Lezgian dataset, LingPy infers homoplasy in the pronominal slot ‘thou’. This slot indeed contains two roots, which actually go back to the Proto-Lezgian suppletive paradigm.

Secondly, there are several cases where LingPy correctly detects formal homoplasy, although I, as a linguist, arbitrarily prefer to leave these characters without improvement. These are Swadesh items which are unstable in the Lezgian group: ‘skin’, ‘sleep’, ‘what’, ‘year’. Reflexes of the competing protoforms in these Swadesh slots are rather complicated, and the Proto-Lezgian reconstruction is either unreliable or simply impossible.

Thirdly, among the Lezgian homoplastic etymologies treated above, there are a number of cases missed by the LingPy MLN algorithm. These can be divided into two types.

1) The fact that we are dealing with homoplasy follows from linguistic data not included in the input dataset. Because of this, such homoplastic developments cannot be detected by formal algorithms. The following slots fall within this category: #7 ‘to bite’ (Rutul + Aghul + Tabasaran), #29 ‘fish’ (Aghul + Tabasaran), #56 ‘mouth’ (Tsakhur + Rutul), #105 ‘short’ (West Lezgian + East Lezgian).

2) Standard topological conflicts which could be revealed from the input dataset. The following slots fall within this category: #10 ‘bone’ (Kryts + Budukh + Rutul + Tabasaran + Lezgi), #13 ‘nail’ (Aghul + Lezgi), #19 ‘to drink’ (Aghul + Tabasaran), #26 ‘fat’ (Aghul + Tabasaran), #38 ‘head’ (Archi + Chechen), #39 ‘to hear’ (Rutul + Aghul + Lezgi), #50 ‘louse’ (Kryts + Budukh + Tsakhur + Rutul), #72 ‘to see’ (Kryts + Budukh + Aghul + Tabasaran), #102 ‘heavy’ (Aghul + Tabasaran), #109 ‘worm’ (Tabasaran + Lezgi).
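What a "standard topological conflict" means for a multistate slot can be sketched with Fitch parsimony: on a fully binary tree, a slot with k distinct cognate classes fits the tree without homoplasy exactly when the minimal number of state changes equals k − 1. The sketch below is only an illustration of that criterion and is not the MLN algorithm; the function names, the nested-tuple tree encoding and the toy labels A–E are hypothetical, and it assumes that every lect has exactly one entry in the slot.

```python
# Minimal sketch: detect a topological conflict in ONE multistate slot.
# Assumptions (not taken from the paper): the tree is strictly binary and
# encoded as nested 2-tuples with lect names as leaves; `states` maps each
# lect to a single cognate-class number for the slot.


def fitch_changes(tree, states):
    """Minimal number of state changes on the tree (Fitch parsimony)."""

    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node  # binary nodes only
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared state: one change is needed at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def excess_changes(tree, states):
    """Changes beyond the k - 1 that k cognate classes require anyway.

    A result of 0 means the slot fits the tree; a positive result signals
    a parallel or back development somewhere on the tree.
    """
    n_classes = len(set(states.values()))
    return fitch_changes(tree, states) - (n_classes - 1)


if __name__ == "__main__":
    toy_tree = ((("A", "B"), ("C", "D")), "E")  # toy labels, not real lects
    compatible = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}
    conflicting = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 3}
    print(excess_changes(toy_tree, compatible))
    print(excess_changes(toy_tree, conflicting))
```

Such a check can only flag conflicts that are visible in the input matrix, which corresponds to type 2 above; cases of type 1 depend on external linguistic evidence and remain invisible to it.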

Modifying the etymology-based non-optimized Lezgian dataset (Kassian 2011–2012; Kassian forthc.) in accordance with the above discussion, we obtain the homoplasy-optimized dataset. The following trees and networks were produced from the homoplasy-optimized dataset:
• Fig. 10, StarlingNJ method with binary nodes only.
• Fig. 11, StarlingNJ method with neighboring nodes joined.
• Fig. 12, NJ method.
• Fig. 13, UPGMA method.
• Fig. 14, Bayesian MCMC method.
• Fig. 15, UMP method.
• Fig. 16, manually constructed consensus tree.
• Fig. 17, NeighborNet network.
• Fig. 18, Minimal lateral network.
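The recoding step described in the entries above ("I mark ... with two different numbers", "I mark ... as a loanword") can be sketched as a simple operation on one Swadesh slot. The function name `recode_slot`, the cognate numbers and the convention of flagging a loanword with a negative number are assumptions made for this illustration; they are not taken from the dataset itself.

```python
# Minimal sketch of homoplasy optimization for ONE Swadesh slot.
# Assumptions (hypothetical, for illustration only): each lect carries one
# cognate-class number; a loanword is flagged by a negative number so that
# it can be excluded from pairwise comparison.


def recode_slot(slot, groups, loans=()):
    """Return a recoded copy of `slot` (mapping lect -> cognate number).

    groups: lists of lects whose shared number is judged homoplastic; the
            first group keeps the old number, every later group receives
            a fresh, previously unused number.
    loans:  lects whose entry is to be flagged as a loanword.
    """
    recoded = dict(slot)
    fresh = max(abs(number) for number in slot.values()) + 1
    for group in groups[1:]:
        for lect in group:
            recoded[lect] = fresh
        fresh += 1
    for lect in loans:
        recoded[lect] = -abs(recoded[lect])
    return recoded


if __name__ == "__main__":
    # Slot #5 'big' as discussed above, with invented cognate numbers:
    # Tsakhur and Tabasaran are independent innovations, the non-Koshan
    # Aghul form is treated as a Tabasaran loan, Koshan Aghul is unaffected.
    big = {"Tsakhur": 1, "Aghul": 1, "Tabasaran": 1, "Koshan_Aghul": 2}
    print(recode_slot(big, groups=[["Tabasaran"], ["Tsakhur"]], loans=["Aghul"]))
```

The function returns a new mapping and leaves the original untouched, so the non-optimized and the homoplasy-optimized versions of a slot can be kept side by side for comparison.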

Fig. 10. Homoplasy-optimized phylogenetic tree of the Lezgian lects produced by the StarlingNJ method from the multistate matrix (binary nodes only). The gray ellipses mark nodes which differ from the non-optimized StarlingNJ-tree (Kassian forthc.: Fig. 1aXXX). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized StarlingNJ-tree (Kassian forthc.: Fig. 1aXXX) are quoted. The tree is dated. Node dates shown in the figure: Lezgian (1750 BC), Nuclear Lezgian (1410 BC), East Lezgian (940 BC), West Lezgian (850 BC), South Lezgian (480 AD), Aghul (720 AD), Rutul (960 AD), Kryts (970 AD), Tabasaran (980 AD), Udi (990 AD), Tsakhur (1110 AD).

Fig. 11. Homoplasy-optimized phylogenetic tree of the Lezgian lects produced by the StarlingNJ method from the multistate matrix (neighboring nodes are joined if the distance between them is ≤ 300 years). The tree is dated. Node dates shown in the figure: Lezgian (1730 BC), Nuclear Lezgian (1400 BC), East Lezgian (940 BC), West Lezgian (850 BC), South Lezgian (480 AD), Aghul (720 AD), Rutul (960 AD), Kryts (970 AD), Tabasaran (980 AD), Udi (990 AD), Tsakhur (1240 AD).

Fig. 12. Homoplasy-optimized phylogenetic tree of the Lezgian lects produced by the NJ method from the binary matrix in the SplitsTree4 software. The gray ellipses mark nodes which differ from the non-optimized NJ-tree (Kassian forthc.: Fig. 2XXX). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized NJ-tree (Kassian forthc.: Fig. 2XXX) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4. The BioNJ method yields the same topology.

Fig. 13. Homoplasy-optimized phylogenetic tree of the Lezgian lects produced by the UPGMA method from the binary matrix in the SplitsTree4 software. The gray ellipses mark nodes which differ from the non-optimized UPGMA-tree (Kassian forthc.: Fig. 3XXX). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized UPGMA-tree (Kassian forthc.: Fig. 3XXX) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

Fig. 14. Homoplasy-optimized phylogenetic tree of the Lezgian lects produced by the Bayesian MCMC method from the binary matrix in the MrBayes software. Bayesian posterior probabilities are shown near the branches (not shown or abbreviated as “x” for stable branches with P ≥ 0.95); in parentheses, probabilities from the non-optimized MCMC-tree (Kassian fothc.: Fig. 4XXX) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by MrBayes.

Fig. 15. Homoplasy-optimized consensus phylogenetic tree of the Lezgian lects produced by the UMP method from the binary matrix in the TNT software. The gray ellipses mark nodes which differ from the non-optimized UMP-tree (Kassian forthc.: Fig. 5XXX). Bootstrap values are shown near the nodes (not shown or abbreviated as “x” for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized UMP-tree (Kassian forthc.: Fig. 5XXX) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by TNT. The three optimal trees (MP-tree #1, MP-tree #2, MP-tree #3) only differ in the Aghul node as shown in the above panel.

The obtained homoplasy-optimized phylogenetic trees of the Lezgian lects require some comments.

StarlingNJ, binary tree (Fig. 10). As compared with the non-optimized StarlingNJ-tree (Kassian forthc.: Fig. 1aXXX), the only discrepancy in topology concerns the Aghul dialects: Keren and Gequn now form a distinct clade (marked with a gray circle). Some internal nodes acquired different, normally somewhat deeper dates. Bootstrap values become somewhat weaker.

StarlingNJ, tree with joined nodes (Fig. 11). There are no discrepancies in topology as compared with the non-optimized StarlingNJ-tree (Kassian forthc.: Fig. 1bXXX). Some internal nodes acquired slightly different dates.

NJ (Fig. 12). There are two discrepancies in topology (marked with gray circles) as compared with the non-optimized NJ-tree (Kassian forthc.: Fig. 2XXX): (1) Proto-Nuclear Lezgian splits as (East (West, South)) as opposed to the split (West (East, South)) in the non-optimized tree; (2) in the Aghul dialects, Gequn splits off prior to Keren. In both cases, the relevant distances are short and the bootstrap values are rather low. Bootstrap values changed slightly.

UPGMA (Fig. 13). There are two discrepancies in topology (marked with gray circles) as compared with the non-optimized UPGMA-tree (Kassian forthc.: Fig. 3XXX): (1) in the Rutul node, Ixrek splits off first; (2) in the Aghul node, Keren and Gequn form a distinct clade (in both cases, the relevant distances are very short and the bootstrap values are rather low). Bootstrap values become somewhat weaker.

Bayesian MCMC (Fig. 14). There are no discrepancies in topology as compared with the non-optimized MCMC-tree (Kassian forthc.: Fig. 4XXX). Some posterior probability values changed (normally they become stronger).

UMP (Fig. 15). There are two discrepancies in topology (marked with gray circles) as compared with the non-optimized UMP-tree (Kassian forthc.: Fig. 5XXX): (1) in the Lezgian clade, Udi splits off first and then Archi splits off, as opposed to the sequence Archi then Udi in the non-optimized tree; (2) Proto-Nuclear Lezgian splits as (East (West, South)) as opposed to the split (West (East, South)) in the non-optimized tree. Some bootstrap values changed (in particular, the Aghul clade becomes significantly more stable).

Thus, the distance-based methods (StarlingNJ, NJ, BioNJ, UPGMA), applied to the homoplasy-optimized Lezgian dataset, produced trees which topologically differ from the corresponding non-optimized trees in three points: (1) the initial Proto-Nuclear Lezgian split, (2) the Aghul dialects, and (3) the Rutul dialects. These changes do not seem substantial, since in all three cases the tested phylogenetic methods disagree with each other, which implies ternary nodes in the consensus trees, be it for the non-optimized dataset (Fig. 7) or the homoplasy-optimized one (Fig. 16). The main and unexpected result of the distance-based methods concerns nodes with bootstrap value < 95%: their bootstrap values in the homoplasy-optimized trees are somewhat weaker as compared with the non-optimized trees (although opposite instances also occur).

The results of the character-based methods (Bayesian MCMC, UMP) are more significant. Firstly, bootstrap and probability values become stronger in most cases. Secondly, the topology of the homoplasy-optimized UMP-tree (Fig. 15) changed. This particularly concerns the Proto-Lezgian node (in the non-optimized UMP-tree, Archi splits off first, which contradicts both the traditional classification and the other phylogenetic methods). The topological improvement of the UMP-tree meets theoretical expectations, since the maximum parsimony method depends on homoplasy to a greater extent than other methods (note that for the non-optimized dataset, the UMP method yielded the least likely tree, as discussed in Kassian forthc.).

Before constructing the consensus tree (Fig. 16), the differences between the individual homoplasy-optimized trees (Fig. 10–15) should be discussed. The discrepancies do not seem substantial; they are even less substantial than in the case of the non-optimized dataset (Kassian forthc.), since the UMP-tree is now altered. Let us examine them.

1) All distance-based methods, i.e., StarlingNJ, NJ, BioNJ, UPGMA (Fig. 10, 12, 13), plus the character-based method UMP (Fig. 15) suggest consecutive Proto-Lezgian bifurcations, with the Udi branch splitting off first and the Archi branch splitting off second. The distance between the two nodes (the separation of Udi and the separation of Archi) is, however, short in all the distance-based trees, as follows from the tree visualization and the support values of the branches, and under the assumption of a temporal error of 300 years in StarlingNJ (Fig. 11) the first split of the Lezgian group turns out to be a three-way one: Udi, Archi, Nuclear Lezgian. The character-based Bayesian MCMC method (Fig. 14) immediately suggests the ternary split into Udi, Archi and Nuclear Lezgian.

2) All the methods suggest a three-part division of the Nuclear Lezgian clade: (1) Proto-West Lezgian [Tsakhur, Rutul], (2) Proto-South Lezgian [Kryts, Budukh], (3) Proto-East Lezgian [Aghul, Tabasaran, Lezgi]. The difference lies in the hierarchy of the splits. The StarlingNJ method (Fig. 10) suggests that West Lezgian splits off first; NJ (Fig. 12), Bayesian MCMC (Fig. 14) and UMP (Fig. 15) suggest that East Lezgian splits off first; UPGMA (Fig. 13) suggests that South Lezgian splits off first. The distance between the two nodes (i.e., the consecutive bifurcations between the West, South and East proto-languages) is, however, short in all the trees, as follows from the tree visualization and the support values of the branches, and under the assumption of a temporal error of 300 years in StarlingNJ (Fig. 11) the split of the Nuclear Lezgian sub-group turns out to be a three-way one: West, South and East.

3) The non-Koshan Aghul dialects. All the methods (except some UMP variants) reconstruct a distinct Proper Aghul/Fite clade, but contradict each other as concerns the Keren and Gequn dialects. However, under the assumption of a temporal error of 300 years in StarlingNJ (Fig. 11) the split of the Proto-Aghul language after the separation of Koshan turns out to be a three-way one: Keren, Gequn and Proper Aghul/Fite.

4) The Rutul dialects. StarlingNJ (Fig. 10) suggests that Luchek split off first; all other methods reconstruct the initial separation of Ixrek (note that formally this is a case where the results of multistate matrix analysis are opposed to those of binary matrix analysis). At the same time, in Fig. 10 (StarlingNJ), the two Rutul nodes are chronologically remote enough not to be joined under the assumption of a temporal error of 300 years (Fig. 11). The Rutul problem is discussed in Kassian forthc.: the lexicostatistical distances in the Rutul part of the tree do not fulfil the condition of additivity, which is abnormal and should imply undetectable loans and contact-driven homoplasy between the Rutul dialects.

Taking into account the aforementioned discrepancies, the following consensus phylogenetic tree of the Lezgian lects can be manually constructed, see Fig. 16. In this tree, the neighboring nodes are joined, (1) if the temporal distance between them is ≤ 300 years as calculated by the StarlingNJ method, see Fig. 10, 11; or (2) if their topology depends on the individual phylogenetic methods (the only exception is the Proper Aghul/Fite Aghul clade which is missing from some of the UMP-trees, Fig. 15). The gray ellipses mark 3 joined ternary nodes which cover binary branchings that differ depending on the method: two of them are automatically obtained under the assumption of the temporal error of 300 years, whereas the third one joins the Rutul dialects as discussed above. As one can see, the topology of the consensus tree (Fig. 16) is identical to the StarlingNJ-tree (Fig. 11) except for the additional joining of the Rutul dialects into one ternary node.
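Joining rule (1) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the StarlingNJ implementation: the tree encoding, the function name `join_close_nodes`, the lect names L1–L4 and all dates are hypothetical (years are signed, negative = BC), and the merged node simply keeps the older date, whereas the dated trees in the figures may re-date joined nodes.

```python
def join_close_nodes(node, max_gap=300):
    """Return a copy of the tree in which every internal child dated
    within max_gap years of its parent is merged into that parent."""
    if not node.get("children"):              # a leaf, i.e. an attested lect
        return dict(node)
    merged = []
    for child in node["children"]:
        child = join_close_nodes(child, max_gap)
        internal = bool(child.get("children"))
        if internal and child["date"] - node["date"] <= max_gap:
            merged.extend(child["children"])  # the child node disappears
        else:
            merged.append(child)
    return {"name": node["name"], "date": node["date"], "children": merged}


# Hypothetical toy tree: R (1500 BC) > N1 (1300 BC) > N2 (600 BC).
toy = {"name": "R", "date": -1500, "children": [
    {"name": "L1"},
    {"name": "N1", "date": -1300, "children": [
        {"name": "L2"},
        {"name": "N2", "date": -600,
         "children": [{"name": "L3"}, {"name": "L4"}]},
    ]},
]}

joined = join_close_nodes(toy)
print([c["name"] for c in joined["children"]])  # ['L1', 'L2', 'N2']
```

N1 lies 200 years below R and is dissolved into a ternary node, while N2 (700 years below N1) survives. Because the sketch works bottom-up, chains of closely dated nodes collapse together; whether that matches the intended behaviour for such chains is an open design choice here.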

[Fig. 16, dated tree. Node dates: Lezgian 1730 BC; Nuclear Lezgian 1400 BC; East Lezgian 940 BC; West Lezgian 850 BC; South Lezgian 480 AD; Aghul 720 AD; Rutul 960 AD; Kryts 970 AD; Tabasaran 980 AD; Udi 990 AD; Tsakhur 1240 AD.]

Fig. 16. Manually constructed homoplasy-optimized consensus phylogenetic tree of the Lezgian lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, UMP methods. The gray ellipses mark 3 joined nodes which cover binary branchings that differ depending on the method. Statistical support values are shown in the following sequence: NJ / MCMC / UMP (“x” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods); in parentheses, probability values from the non-optimized consensus tree (Fig. 7) are quoted. StarlingNJ dates are proposed.

Summing up. The topology of the homoplasy-optimized consensus Lezgian tree (Fig. 16) is the same as that of the non-optimized consensus tree (Fig. 7); some internal nodes acquired slightly different dates. Statistical support values (bootstrap values and Bayesian posterior probabilities) change in both directions, but generally the homoplasy-optimized consensus tree is more stable. This especially concerns the initial splits of Proto-Lezgian and the Aghul clade (first of all, due to the homoplasy-optimized UMP-tree, which is now in agreement with the other methods).

It is interesting that even after such an extensive homoplastic optimization as described above, the Lezgian NeighborNet network based on the homoplasy-optimized dataset (Fig. 17) does not differ seriously from the NeighborNet network based on the non-homoplasy-optimized dataset (Fig. 8). Such an absence of substantial difference is probably due to the fact that NeighborNet uses binary matrices with an equal cost of change between the states, although the input data actually represent presence/absence matrices (“1” = presence, “0” = absence of the specific proto-root with the specific Swadesh meaning in the given language). On the other hand, it is reported in Holden & Gray 2006: 25 that coding lexical data in multistate format makes “very little difference to the results” (Bantu languages were investigated). More tests with various linguistic data are needed.
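The relation between the multistate and the binary (presence/absence) coding can be made concrete with a small Python sketch. The function name, the lect labels A–C and the cognate numbers are hypothetical and serve only as an illustration; the sketch does not reproduce the export routine of any of the programs named above.

```python
def multistate_to_binary(multistate):
    """Convert {meaning: {lect: cognate_id}} into presence/absence rows.

    Each (meaning, cognate_id) pair becomes one binary character:
    1 = the lect has this root in this Swadesh meaning, 0 = it does not.
    """
    lects = sorted({lect for col in multistate.values() for lect in col})
    columns = sorted({(meaning, cid)
                      for meaning, col in multistate.items()
                      for cid in col.values()})
    rows = {lect: [1 if multistate[m].get(lect) == cid else 0
                   for m, cid in columns]
            for lect in lects}
    return columns, rows


# Hypothetical toy wordlists: three lects, two meanings.
toy = {
    "meat": {"A": 1, "B": 1, "C": 2},
    "rain": {"A": 1, "B": 2, "C": 2},
}
columns, rows = multistate_to_binary(toy)
print(columns)
for lect, row in rows.items():
    print(lect, row)
```

In the multistate matrix, a replacement of root 1 by root 2 is a single change; in the binary matrix the same event is a loss in one column plus a gain in another. A lect with no entry for a meaning receives 0 in every column of that meaning here, whereas a real matrix would normally mark such cells as missing data.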


Fig. 17. Homoplasy-optimized phylogenetic network of the Lezgian lects produced by the NeighborNet method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the branches (not shown for stable branches with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

The results of the MLN module of the LingPy software appear to be more important. The minimal lateral network based on the homoplasy-optimized dataset (Fig. 18) is significantly less conflicting than the minimal lateral network based on the non-homoplasy-optimized dataset (Fig. 9). As follows from the graphical representation (Fig. 18), the total number of inferred homoplastic characters is rather modest; the highest number of conflicting characters was obtained between Archi and Proto-Aghul: 2 characters (‘to lie’ and ‘yellow’; in both cases, however, we are dealing with false responses).

Fig. 18. Homoplasy-optimized minimal lateral network of the Lezgian lects produced in the LingPy software. Based on the consensus homoplasy-optimized tree (Fig. 16). Node size reflects the inferred number of cognate sets present in each lect. The solid links illustrate lateral transfer events suggested by the method. Thickness and color of the links indicate the inferred number of homoplastic characters between two nodes, as specified by the right scale (“Inferred Links”). The gain-loss model 3−1 is best fitting, p = 0.2 (the next nearest model is 5−2, p = 0.17).

6. Tsezic case. Tsezic is a linguistic group which consists of languages spoken in South-West Dagestan (Russian Federation), see Fig. 19. The Tsezic group is a member of the Nakh-Dagestanian clade of the North Caucasian linguistic family.

Fig. 19 (adapted from Koryakov 2006: map #11). Map of the modern Tsezic lects.

According to the traditional expert classification, the group is divided into two main branches: Eastern Tsezic (Hunzib & Bezhta) and Western Tsezic (Dido & Khwarshi), see Bokarev 1959: 227 (general lexical evidence); Imnaishvili 1963: 9 (various evidence); Testelets 1993 (lexicostatistics and phonetic evidence); Starostin & Nikolayev 1994: 110 (lexicostatistics and general evidence); Alekseev 1998: 299–300 (lexicostatistics and general evidence); Khalilova 2009: 1 (general evidence). The classification is slightly different in van den Berg 1995: 5, where Khwarshi constitutes a third distinct branch (Northern Tsezic).

The position of the fifth language, Hinukh, is slightly more controversial, since in Bokarev 1959: 227 Hinukh is classified as “an intermediate lect”: “Hinukh occupies the intermediate position: it shares one part of its [native] lexicon with the languages of the Western subgroup, and another part with the languages of the Eastern subgroup”. However, according to Lomtadze 1963 (various evidence), van den Berg 1995: 5 (general evidence) and Testelets 1993 (lexicostatistics and phonetic evidence), Hinukh is the closest relative of Dido or simply a dialect of Dido.

Previously published formal classifications of the Tsezic group suggest the same two-way division into Eastern Tsezic (Hunzib & Bezhta) and Western Tsezic (Hinukh, Dido & Khwarshi):
1) lexicostatistical calculations by Testelets 1993, which are based on the etymologized 100-item wordlists, elaborated by the StarlingNJ method; Hinukh and Dido form a distinct clade;
2) lexicostatistical calculations by Alekseev 1998: 300 and Koryakov 2006: 21, which are based on the etymologized 100-item wordlists, elaborated by the StarlingNJ method; Western Tsezic is a ternary node which splits into Hinukh, Dido and Khwarshi;
3) the Automated Similarity Judgment Program project phylogeny (Müller et al. 2013), based on the Levenshtein distances between non-etymologized 40-item wordlists; Hinukh and Dido form a distinct clade;
4) the phylogeny by Cysouw & Forker 2009: Fig. 1, 2, based on the Levenshtein distances between non-etymologized 1300-item wordlists; Hinukh and Dido form a distinct clade.

Lexicostatistical analysis, based on the 110-item non-homoplasy-optimized wordlists of the 9 Tsezic lects (Kassian 2013–2014), yielded the following phylogenetic trees and networks:
• Fig. 20, StarlingNJ method with binary nodes only;
• Fig. 21, StarlingNJ method with neighboring nodes joined;
• Fig. 22, NJ method;
• Fig. 23, UPGMA method;
• Fig. 24, Bayesian MCMC method;
• Fig. 25, UMP method;
• Fig. 26, manually constructed consensus tree;
• Fig. 27, NeighborNet network;
• Fig. 28, Minimal lateral network.

[Fig. 20, dated tree. Node dates: Tsezic 760 BC; West Tsezic 50 BC; East Tsezic 700 AD; Dido 820 AD; Bezhta 870 AD; Khwarshi 1060 AD.]

Fig. 20. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the StarlingNJ method from the multistate matrix (binary nodes only). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). The tree is dated.

[Fig. 21, dated tree. Node dates: Tsezic 760 BC; West Tsezic 70 AD; East Tsezic 780 AD; Dido 820 AD; Khwarshi 1060 AD.]

Fig. 21. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the StarlingNJ method from the multistate matrix (neighboring nodes are joined if the distance between them is ≤ 300 years). The tree is dated.


Fig. 22. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the NJ method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4. The BioNJ method yields the same topology.


Fig. 23. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the UPGMA method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.


Fig. 24. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the Bayesian MCMC method from the binary matrix in the MrBayes software. Bayesian posterior probabilities are shown near the branches (not shown for stable branches with P ≥ 0.95). Branch length reflects the relative rate of cognate replacement as suggested by MrBayes.


Fig. 25. Non-homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the UMP method from the binary matrix in the TNT software (1 optimal tree was obtained). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by TNT.

As illustrated by the above trees, all the phylogenetic methods reconstruct two main clades, Eastern Tsezic (Hunzib & Bezhta) and Western Tsezic (Hinukh, Dido & Khwarshi), which agrees with the traditional and previous formal classifications. The methods, however, contradict each other in the topology of the West Tsezic clade. Some of the methods suggest a Hinukh-Dido clade distinct from Khwarshi: StarlingNJ (Fig. 20), UPGMA (Fig. 23), UMP (Fig. 25). Others suggest a Dido-Khwarshi clade distinct from Hinukh: NJ (Fig. 22), Bayesian MCMC (Fig. 24). Thus, formally the following consensus tree can be constructed, see Fig. 26. In this tree, the neighboring nodes are joined (1) if the temporal distance between them is ≤ 300 years as calculated by the StarlingNJ method, see Fig. 20, 21; or (2) if their topology depends on the individual phylogenetic methods. The gray ellipse marks the only joined ternary node, which covers binary branchings that differ depending on the method; this node is, however, automatically obtained under the assumption of a temporal error of 300 years. As one can see, the topology of the consensus tree (Fig. 26) is identical to the StarlingNJ-tree (Fig. 21).

[Fig. 26, dated tree. Node dates: Tsezic 760 BC; West Tsezic 70 AD; East Tsezic 780 AD; Dido 820 AD; Khwarshi 1060 AD.]

Fig. 26. Manually constructed consensus non-homoplasy-optimized phylogenetic tree of the Tsezic lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, UMP methods. The gray ellipse marks the 1 joined node which covers binary branchings that differ depending on the method. Statistical support values are shown in the following sequence: NJ / MCMC / UMP (“x” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods). StarlingNJ dates are proposed.

Below I examine the reverse lexicostatistical distances for two East Tsezic lects (ETs), Hunzib proper and Bezhta proper, and three West Tsezic lects (WTs), Hinukh, Kidero Dido and Khwarshi proper, as obtained from the multistate matrix; a higher percentage of shared basic vocabulary means greater closeness (see Tab. 1).

Table 1. Reverse lexicostatistical distances for some Tsezic lects (non-homoplasy-optimized dataset).

                        Hunzib proper  Bezhta proper  Hinukh  Kidero Dido  Khwarshi proper
                        (ETs)          (ETs)          (WTs)   (WTs)        (WTs)
Hunzib proper (ETs)     -              0.87           0.63    0.55         0.55
Bezhta proper (ETs)                    -              0.63    0.53         0.55
Hinukh (WTs)                                          -       0.77         0.70
Kidero Dido (WTs)                                             -            0.76
Khwarshi proper (WTs)                                                      -

If we exclude Hinukh, the lexicostatistical distances between the four remaining lects fulfil the condition of additivity: the two East Tsezic lects are close to each other (87% shared basic vocabulary), the two West Tsezic lects are close to each other (76%), whereas any East Tsezic lect is equally remote from any West Tsezic lect (53–55%). The configuration becomes abnormal, however, when Hinukh is introduced.

First, the distances between the three West Tsezic lects do not fulfil the condition of additivity: Kidero Dido is equally close to Khwarshi and Hinukh (76–77%), whereas Khwarshi and Hinukh are more remote from each other (70%). It means that there should be a number of parasitic, i.e., secondary matches either between Kidero Dido & Khwarshi or between Kidero Dido & Hinukh. Geographical distribution (Fig. 19) and ethnographic evidence (“Many Hinuq men marry Tsez [Dido] women, who then move to the village of Hinuq. These women often do not fully acquire the Hinuq language and sometimes simply continue to speak Tsez [Dido], at least at home”, Forker 2013: 16) suggest that it is the latter pair, Kidero Dido & Hinukh, that is expected to have secondary contacts. Since Sagada Dido (which is not adjacent to the Hinukh territory) demonstrates the same lexicostatistical closeness to Hinukh as Kidero Dido does (76–77%), it is more likely that the normal direction of the influence is Kidero Dido > Hinukh rather than vice versa.

Second, the comparison of Hinukh with the East Tsezic lects also demonstrates irregular ratios. Four sets of three languages can be analyzed.
1) Hunzib proper (ETs) / Bezhta proper (ETs) / Hinukh (WTs). The configuration is normal: the two East Tsezic lects are close to each other (87%) and equally remote from the West Tsezic lect (63%).
2) Hunzib proper (ETs) / Bezhta proper (ETs) / Kidero Dido (WTs). The configuration is normal: the two East Tsezic lects are close to each other (87%) and equally remote from the West Tsezic lect (53–55%).

3) Hinukh (WTs) / Kidero Dido (WTs) / Hunzib proper (ETs). The configuration is not quite normal: the two West Tsezic lects are indeed close to each other (77%), but not equally remote from the East Tsezic lect: Hinukh / Hunzib = 63%, whereas Kidero Dido / Hunzib = only 55% (the difference is 8).
4) Hinukh (WTs) / Kidero Dido (WTs) / Bezhta proper (ETs). The configuration is even more abnormal: the two West Tsezic lects are indeed close to each other (77%), but not equally remote from the East Tsezic lect: Hinukh / Bezhta = 63%, whereas Kidero Dido / Bezhta = only 53% (the difference is 10).

As follows from the analysis of these four sets, the lexicostatistical distances in sets 3 and 4 (two West Tsezic lects and one East Tsezic lect) do not satisfy the condition of additivity. Hinukh demonstrates abnormal closeness to the East Tsezic lects, both to Bezhta and to Hunzib. Such closeness should be treated as secondary, i.e., a number of secondary lexical matches between Proto-Hinukh and Proto-East Tsezic is to be assumed. This can be explained as a result of intensive contact between Proto-Hinukh and Proto-East Tsezic, although the default direction of influence, Proto-East Tsezic > Proto-Hinukh or vice versa, cannot be established by means of such a formal analysis. The NeighborNet network (Fig. 27) also depicts a conflicting signal between Hinukh and Dido, on the one hand, and Hinukh and Bezhta, on the other.

Thus, two stages in the history of Hinukh can be reconstructed, if we accept that the rate of Swadesh cognate replacement is strictly constant within the Tsezic group. Initially, Hinukh entered into close contact with Proto-East Tsezic and subsequently Bezhta (the direction of influence is not entirely clear). Later, Hinukh was influenced by the neighboring Dido (especially Kidero Dido). Cf. a similar statement by Forker, unfortunately without any further detail: “there has been and there still is extensive contact between Hinuq speakers and speakers of two other Tsezic languages, Bezhta and Tsez [Dido]” (Forker 2013: 12). Forker (2013: 16) also attributes Hinukh-Dido contacts to the present time.

In other words, formal analysis of lexicostatistical distances suggests that Dido and Khwarshi form a clade which is distinct from Hinukh. There is indeed a certain similarity in phonology and morphosyntax between Dido and Hinukh that may even provoke some linguists to treat Hinukh as a dialect of Dido (Lomtadze 1963), but actually, according to Ya. Testelets (p.c.), Dido and Hinukh lack any reliable shared innovations of that kind.
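The triple-by-triple check applied above to Table 1 can be sketched in Python. The sketch follows the way the condition is used in the text (within any three lects, the two lower percentages should be about equal); the function names and the tolerance of 3 percentage points are assumptions made for illustration, not values taken from the paper.

```python
from itertools import combinations

# Shared basic vocabulary in percent, copied from Table 1.
SHARED = {
    ("Hunzib", "Bezhta"): 87, ("Hunzib", "Hinukh"): 63,
    ("Hunzib", "Kidero Dido"): 55, ("Hunzib", "Khwarshi"): 55,
    ("Bezhta", "Hinukh"): 63, ("Bezhta", "Kidero Dido"): 53,
    ("Bezhta", "Khwarshi"): 55, ("Hinukh", "Kidero Dido"): 77,
    ("Hinukh", "Khwarshi"): 70, ("Kidero Dido", "Khwarshi"): 76,
}
LECTS = ["Hunzib", "Bezhta", "Hinukh", "Kidero Dido", "Khwarshi"]


def shared(a, b):
    """Look up the percentage for an unordered pair of lects."""
    return SHARED[(a, b)] if (a, b) in SHARED else SHARED[(b, a)]


def irregular_triples(lects, tolerance=3):
    """Return (triple, gap) for every triple whose two lowest
    percentages differ by more than `tolerance` points."""
    result = []
    for triple in combinations(lects, 3):
        values = sorted(shared(a, b) for a, b in combinations(triple, 2))
        gap = values[1] - values[0]
        if gap > tolerance:
            result.append((triple, gap))
    return result


violations = irregular_triples(LECTS)
for triple, gap in violations:
    print(" / ".join(triple), "gap:", gap)
```

With this tolerance, every flagged triple contains Hinukh, and the gaps for sets 3 and 4 come out as 8 and 10, as in the text. A different tolerance would change which triples are flagged, so the threshold should be read as a tunable assumption.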


Fig. 27. Non-homoplasy-optimized phylogenetic network of the Tsezic lects produced by the NeighborNet method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the branches (not shown for stable branches with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

The Tsezic non-homoplasy-optimized minimal lateral network (Fig. 28), based on the consensus tree with ternary nodes (Fig. 26), also detects several conflicting characters, although their number is relatively modest: no more than 1 character between any two nodes.

Fig. 28. Non-homoplasy-optimized minimal lateral network of the Tsezic lects produced in the LingPy software. Based on the consensus non-homoplasy-optimized tree (Fig. 26). Node size reflects the inferred number of cognate sets present in each lect. The solid links illustrate lateral transfer events suggested by the method. Thickness and color of the links indicate the inferred number of homoplastic characters between two nodes, as specified by the right scale (“Inferred Links”). The gain-loss models 2−1 and 5−2 are equally best fitting, p = 0.92 in both cases.

At least the following cases of homoplastic developments within the Tsezic 110-item wordlists (Kassian 2013–2014) can be detected proceeding from the reconstructed phylogenetic tree (Fig. 26) and the reconstructed Proto-Tsezic wordlist. Initially I checked the Tsezic non-homoplasy-optimized dataset manually, then the MLN module of LingPy was applied to the same dataset. The latter detected about half of the homoplastic characters previously known from the manual analysis, which is a reasonably good result.
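The logic of the manual check, and of the recoding used in the cases of this section (“I mark the ... words with two different numbers”), can be approximated by the following Python sketch. It is a simplified reading, not the procedure actually used for the paper and not the MLN algorithm: a cognate class is flagged when it is not a retention of the reconstructed proto-form and its lects do not make up a whole clade of the reference tree. The clade list covers only the clades named in the text (Bezhta-internal structure is omitted), and the function names and cognate numbers are hypothetical.

```python
# Clades of a simplified reference tree, given as sets of leaves.
EAST = frozenset({"Hunzib proper", "Bezhta proper",
                  "Khoshar-Khota Bezhta", "Tlyadal Bezhta"})
DIDO = frozenset({"Kidero Dido", "Sagada Dido"})
KHWARSHI = frozenset({"Khwarshi proper", "Inkhokwari Khwarshi"})
WEST = DIDO | KHWARSHI | {"Hinukh"}
CLADES = {EAST, WEST, DIDO, KHWARSHI, EAST | WEST}


def is_suspect(lects, is_proto_retention):
    """Flag an innovation shared by 2+ lects that are not a whole clade."""
    lects = frozenset(lects)
    if is_proto_retention or len(lects) < 2:
        return False
    return lects not in CLADES


def split_class(column, groups, next_id):
    """Recode one meaning: each group of lects gets a fresh number."""
    recoded = dict(column)
    for offset, group in enumerate(groups):
        for lect in group:
            recoded[lect] = next_id + offset
    return recoded


# 'all': Hinukh and Dido share a root that is not the Proto-Tsezic term.
# The numbers 1 and 2 are hypothetical cognate identifiers.
column_all = {"Hinukh": 2, "Kidero Dido": 2, "Sagada Dido": 2,
              "Khwarshi proper": 1}
shared_by = [lect for lect, cid in column_all.items() if cid == 2]

if is_suspect(shared_by, is_proto_retention=False):
    column_all = split_class(column_all, [["Hinukh"], list(DIDO)], next_id=3)
print(column_all)
```

Under this criterion an innovation confined to a complete clade (for example, one shared by exactly the two Dido dialects) is kept as a genuine shared innovation, while a match such as Hinukh + Dido without Khwarshi is split. Retentions of the proto-form are never flagged, since their patchy distribution results from losses elsewhere.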

1. #1 all
Hinukh: čʼekʼː-u ‘all’.
Dido (common): cʼikʼ-y-u, cʼikʼ-a-w ‘all’.
Since Khwarshi retains the deverbative *g=ʫ-y- ‘all’, which likely represents a Proto-Tsezic term for ‘all’ (it is also present in East Tsezic), the match between Hinukh čʼekʼː-u and Dido cʼikʼ-y-u, cʼikʼ-a-w should be treated as secondary and contact-driven. I mark the Hinukh and Dido words with two different numbers.

2. #5 big
Hinukh: CLASS=ežiː ‘big’.
Dido (common): CLASS=eže ‘big’.
Since Khwarshi retains *CLASS=uqʼV ‘big’, which represents a Proto-Tsezic term for ‘big’, whereas the original meaning of *CLASS=ižV (Hinukh, Dido) was something like ‘many, numerous’ vel sim., the match between Hinukh and Dido should be treated as secondary and contact-driven. I mark the Hinukh and Dido words with two different numbers.

3. #11 breast
East Tsezic: *χeru ‘breast’.
Khwarshi proper: ħele-lokʼ a ‘breast’.
Since the Proto-Tsezic and Proto-West Tsezic term for this meaning was apparently *χ mV(- rV), the match between East Tsezic *χeru ‘breast’ and Khwarshi proper ħele-lokʼ a ‘breast’, literally ‘*χeru-heart’, is secondary. It should be treated as an independent semantic development of Tsezic *χeru (although the original meaning of *χeru is unclear). I mark the East Tsezic and Khwarshi proper words with two different numbers.

4. #36 hair
Hunzib: kera ‘hair’.
Bezhta proper: kẽyã ‘hair’.
Since all Bezhta dialects have *mučʼ as the basic expression for ‘hair’, which probably represents the Proto-Tsezic term for this meaning, Bezhta proper *kẽ (obl. *kera) in the meaning ‘head hair’ is secondary and due to the influence of neighboring Hunzib. I mark the Hunzib and Bezhta proper words with two different numbers.

Hinukh: mus-be ‘head hair’.
Outgroup Chechen: mas ‘head hair’.
The Proto-Tsezic meaning of *mosː was ‘a k. of hair’, probably ‘body hair, fur’, but not ‘head hair’, since there are more reliable Proto-Tsezic candidates for this meaning. I mark the Hinukh and Chechen forms with two different numbers.

5. #43 to kill
Khoshar-Khota Bezhta: CLASS=uo-l ‘to kill’.
Hinukh: CLASS=uhe-r ‘to kill’.
Dido (common): CLASS=eχu-r ‘to kill’.
Since all other East Tsezic lects retain Proto-Tsezic *CLASS=iʼV ‘to kill’, Khoshar-Khota Bezhta CLASS=uo-l ‘to kill’, a synchronic causative from ‘to die’, is a transparent new formation, independent from the same pattern in West Tsezic (Hinukh, Dido). I mark the Khoshar-Khota Bezhta and West Tsezic (Hinukh, Dido) words with two different numbers. An alternative solution is to suppose that *CLASS=iχ V-l ‘to die-CAUS’ existed already in Proto-Tsezic, where it functioned as a marginal expression for ‘to kill’.

6. #44 knee
Sagada Dido: qʼontu ‘knee’.
Khwarshi (proper): qʼontu ‘knee’.
Since other dialects of Dido retain the Proto-Tsezic term *bičnV ‘knee’, the match between Sagada Dido and Khwarshi proper is secondary. Probably it should be treated as an independent semantic development of Proto-Tsezic *qʼ(n)tV. I mark the Sagada Dido and Khwarshi proper words with two different numbers.

7. #52 many
Hinukh: ()aši ‘many’.
Sagada Dido: aši ‘many’.
Since *ašː- can be reconstructed with the Proto-Tsezic meaning ‘thick (2D)’, which is retained in Kidero Dido, its meaning ‘many’ seems to be an (independent?) introduction in Hinukh and Sagada Dido. I mark the Hinukh and Sagada Dido words with two different numbers.

Hunzib: laχːi ‘many’.
Outgroup Chechen: duqa ‘many’.
*laχːi, attested only in Hunzib, can hardly be reconstructed as the Proto-Tsezic term for ‘many’, since there exist more reliable candidates for this meaning. Apparently Hunzib and Chechen represent independent developments. I mark the Hunzib and Chechen words with two different numbers.

8. #53 meat East Tsezic: *χːo ‘meat’. Hinukh: χu ‘meat’. Since the Proto-Tsezic expression for ‘meat’ can be reconstructed as *ri, the stem *χːo acquired the meaning ‘meat’ in Hinukh under the influence of East Tsezic. I mark the East Tsezic and Hinukh words with two different numbers.

9. #65 rain Hinukh: qema ‘rain’. Dido (common): qema ‘rain’. Khwarshi (proper): qema ‘rain’. Since the Proto-Tsezic expression for ‘rain’ can be reconstructed as *ː d (which is retained, in particular, in Inkhokwari Khwarshi), the stem *q ma ‘cloudiness / cloudy’ with the substantivized meaning ‘rain’ looks like a parallel innovation in Hinukh, Dido and Khwarshi proper (at least for Hinukh we should suppose influence from Dido). I mark the Hinukh, Dido and Khwarshi proper words with three different numbers.

10. #70 sand Bezhta (common): miso ‘sand’. Hinukh: mese ‘sand’. Since in other lects ‘sand’ is expressed by *kebu, one of the two stems, *m sːV or *kebu, demonstrates parallel semantic development in daughter lects. Geographical distribution clearly suggests that it was *m sːV that apparently acquired the meaning ‘sand’ in (Proto-)Bezhta and then spread to neighboring Hinukh. I mark the Bezhta and Hinukh words with two different numbers.

11. #71 say East Tsezic: *n=isː V ‘to say’. Hinukh: ese ‘to say’. Since the basic Proto-Tsezic verb for ‘to say’ is to be reconstructed as *ʔiV, the *isː V with the generic meaning ‘to say’ is a parallel innovation in Proto-East Tsezic and Hinukh. Due to morphological difference (fossilized prefix in the East Tsezic stem), it should be treated as independent introductions rather than the result of East Tsezic influence on Hinukh. I mark the East Tsezic and Hinukh words with two different numbers.

12. #73 seed Bezhta proper: hakʼ ‘seeds’ (not basic ‘seed’). Hinukh: akʼ ‘seed’. Dido (majority of dialects): akʼ ‘seed’. The original semantics of Tsezic *ħakʼ can hardly be reconstructed, but its meaning ‘seed(s)’ is an innovation of the central Tsezic area. The exact source of the innovation is not entirely clear, however (Dido dialects > Hinukh > Bezhta?). I mark the Hinukh and Dido words with two different numbers.

13. #83 to swim Bezhta (proper): ẽχe y=ao ‘river’ + ‘to take out’. Hinukh: iχu y=i ‘river’ + ‘to take out’. Formal matches between some East Tsezic and West Tsezic lects suggest the analytic construction *ɬː! r=ːV 'to take out/off the water' as the Proto-Tsezic expression for ‘to swim’. Apparently the similar Bezhta-Hinukh expression with ‘river’ is a parallel contact-driven development. I mark the Bezhta proper and Hinukh words with two different numbers.

14. #89 tooth Hinukh: kʼeču ‘tooth’. Dido (common): kʼicu ‘tooth’. Since the Proto-Tsezic stem for ‘tooth’ is *s l (as it is retained, in particular, in Khwarshi), whereas *kʼ cu should be reconstructed with the Proto-Tsezic meaning ‘canine tooth, fang’, the match between Hinukh and Dido is apparently a contact-driven innovation. I mark the Hinukh and Dido words with two different numbers.

15. #92 to go Dido (Kidero): CLASS=ikʼi ‘to go’. Khwarshi (Inkhokwari): CLASS=õkʼ ‘to go’. Since other Dido and Khwarshi dialects retain the Proto-Tsezic verb *CLASS=ẽʼV ‘to go’, it was replaced by *CLASS=ʔẽkʼV independently in Kidero Dido and Inkhokwari Khwarshi. I mark the Kidero Dido and Inkhokwari Khwarshi words with two different numbers.

Additionally, the following instances of homoplasy between individual Tsezic lects and the outgroup (Chechen) must be mentioned and corrected. In all cases, we are dealing with independent innovations either in Tsezic or in Chechen or in both.

16. #7 to bite East Tsezic (Hunzib, Bezhta): sila + AUX ‘to bite’, lit. ‘to beat the tooth’ and ‘to put the tooth’. Outgroup Chechen: cerg-aš y=oxka ‘to bite’, lit. ‘to stick (in) the teeth’. Despite the fact that Proto-Tsezic *s l ‘tooth’ and Proto-Nakh *ca-r-ikʼ ‘tooth’ are etymologically cognate, the analytic constructions for ‘to bite’ are transparent recent introductions. I mark the East Tsezic and Chechen forms with two different numbers.

17. #36 hair See above.

18. #52 many See above.

19. #76 to sleep Khwarshi (common): es ‘to sleep’. Outgroup Chechen: nab yan ‘to sleep’, literally ‘to do sleep’. The Khwarshi verb can hardly be posited as the Proto-Tsezic verb for ‘to sleep’, the Chechen analytic expression is a transparent recent introduction, and the etymological connection between Khwarshi es and Chechen nab is highly dubious. I mark the Khwarshi and Chechen forms with two different numbers.

Now let us examine how the MLN module of LingPy has coped with the task. Unlike the Lezgian case, treated above, LingPy correctly detected only about half of the aforementioned cases of Tsezic homoplasy (#1 ‘all’, #11 ‘breast’, #43 ‘to kill’, #44 ‘knee’, #73 ‘seed’, #83 ‘to swim’, #92 ‘to go’). The following homoplastic developments with topological conflicts were ignored by the MLN algorithm: #5 ‘big’ (Hinukh, Dido), #36 ‘hair’ (Hunzib, Bezhta proper), #52 ‘many’ (Hinukh, Sagada Dido), #53 ‘meat’ (East Tsezic, Hinukh), #65 ‘rain’ (Hinukh, Dido, Khwarshi), #70 ‘sand’ (Bezhta, Hinukh), #71 ‘say’ (East Tsezic, Hinukh), #89 ‘tooth’ (Hinukh, Dido). The homoplastic character #7 ‘to bite’ (East Tsezic, Chechen) was naturally not detected by LingPy, since the fact of homoplasy follows from external data not included in the input dataset. In some rare cases LingPy reconstructs false homoplasy.
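The phrase "about half" can be checked with a simple tally. The sketch below is only an illustration: the item numbers are those listed in the paragraph above, while the variable names and the decision to leave #7 out of the denominator are assumptions made here.

```python
# Item numbers as listed in the text; the tallying itself is an illustration.
detected = {1, 11, 43, 44, 73, 83, 92}        # flagged by the MLN module
missed = {5, 36, 52, 53, 65, 70, 71, 89}      # topological conflicts it ignored
undetectable = {7}  # homoplasy that follows only from external data (assumed out of scope)

in_scope = detected | missed
recall = len(detected) / len(in_scope)
print(f"{len(detected)} of {len(in_scope)} detected ({recall:.0%})")
```

Counting by character number, 7 of the 15 in-scope characters are detected, which is consistent with "about half".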

Modifying the etymology-based non-optimized Tsezic dataset (Kassian 2013–2014) in accordance with the above discussion, we obtain the homoplasy-optimized dataset. The following trees and networks were produced from the homoplasy-optimized dataset:
• Fig. 29, StarlingNJ method with binary nodes only.
• Fig. 30, StarlingNJ method with neighboring nodes joined.
• Fig. 31, NJ method.
• Fig. 32, UPGMA method.
• Fig. 33, Bayesian MCMC method.
• Fig. 34, UMP method.
• Fig. 35, manually constructed consensus tree with binary nodes only.
• Fig. 36, manually constructed consensus tree with neighboring nodes joined.
• Fig. 37, NeighborNet network.
• Fig. 38, Minimal lateral network.
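The recoding step ("I mark ... with two different numbers") can be pictured as follows. This is a minimal sketch under stated assumptions: the dictionary layout, the cognate class numbers and the helper `split_homoplastic` are hypothetical and do not reproduce the actual database format; only the groupings for ‘tooth’ and ‘rain’ follow the discussion above.

```python
from copy import deepcopy

# Hypothetical multistate fragment: concept -> {lect: cognate class number}.
# The class numbers are invented; only the groupings follow the text.
matrix = {
    "tooth": {"Hinukh": 2, "Kidero_Dido": 2, "Sagada_Dido": 2, "Khwarshi_proper": 1},
    "rain": {"Hinukh": 2, "Kidero_Dido": 2, "Sagada_Dido": 2,
             "Khwarshi_proper": 2, "Inkhokwari_Khwarshi": 1},
}

# Secondary matches: concept -> groups of lects that must stop sharing a class.
splits = {
    "tooth": [["Hinukh"], ["Kidero_Dido", "Sagada_Dido"]],
    "rain": [["Hinukh"], ["Kidero_Dido", "Sagada_Dido"], ["Khwarshi_proper"]],
}

def split_homoplastic(matrix, splits):
    """Return a copy in which every listed group receives a fresh class number."""
    optimized = deepcopy(matrix)
    for concept, groups in splits.items():
        next_id = max(optimized[concept].values()) + 1
        for group in groups:
            for lect in group:
                optimized[concept][lect] = next_id
            next_id += 1
    return optimized

optimized = split_homoplastic(matrix, splits)
print(optimized["tooth"])
```

After the call, Hinukh and the Dido dialects no longer count as a match for ‘tooth’, while the two Dido dialects still match each other and the input matrix is left unchanged.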

[Fig. 29 graphic: dated tree. Terminal taxa: Hunzib proper, Bezhta proper, Khoshar-Khota Bezhta, Tlyadal Bezhta, Hinukh, Kidero Dido, Sagada Dido, Khwarshi proper, Inkhokwari Khwarshi. Node dates: Tsezic (710 BC), West Tsezic (130 BC), East Tsezic (640 AD), Dido (820 AD), Bezhta (870 AD), Khwarshi (1060 AD). Bootstrap labels: 89 (88), 67 (65).]

Fig. 29. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the StarlingNJ method from the multistate matrix (binary nodes only). The gray ellipses mark nodes which differ from the non-optimized StarlingNJ-tree (Fig. 20). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized StarlingNJ-tree (Fig. 20) are quoted. The tree is dated.

[Fig. 30 graphic: dated tree over the nine Tsezic lects. Node dates: Tsezic (710 BC), West Tsezic (40 BC), East Tsezic (760 AD), Dido (820 AD), Khwarshi (1060 AD).]

Fig. 30. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the StarlingNJ method from the multistate matrix (neighboring nodes are joined if the distance between them is ≤ 300 years). The tree is dated.

[Fig. 31 graphic: tree over the nine Tsezic lects and Outgroup_Chechen. Bootstrap labels: 77 (83), 85 (69).]

Fig. 31. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the NJ method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized NJ-tree (Fig. 22) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4. The BioNJ method yields the same topology.

[Fig. 32 graphic: tree over the nine Tsezic lects and Outgroup_Chechen. Bootstrap labels: 94 (94), 55.]

Fig. 32. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the UPGMA method from the binary matrix in the SplitsTree4 software. The gray ellipses mark nodes which differ from the non-optimized UPGMA-tree (Fig. 23). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized UPGMA-tree (Fig. 23) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

[Fig. 33 graphic: tree over the nine Tsezic lects and Outgroup_Chechen. Posterior probability labels (non-optimized values in parentheses): (0.93), (0.92).]

Fig. 33. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the Bayesian MCMC method from the binary matrix in the MrBayes software. Bayesian posterior probabilities are shown near the branches (not shown for stable branches with P ≥ 0.95); in parentheses, probabilities from the non-optimized MCMC-tree (Fig. 24) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by MrBayes.

[Fig. 34 graphic: tree over the nine Tsezic lects and Outgroup_Chechen. Bootstrap labels: 46 (45), 86 (77), 68.]

Fig. 34. Homoplasy-optimized phylogenetic tree of the Tsezic lects produced by the UMP method from the binary matrix in the TNT software (1 optimal tree was obtained). The gray ellipses mark nodes which differ from the non-optimized UMP-tree (Fig. 25). Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%); in parentheses, bootstrap values from the non-optimized UMP-tree (Fig. 25) are quoted. Branch length reflects the relative rate of cognate replacement as suggested by TNT.

The obtained homoplasy-optimized phylogenetic trees of the Tsezic lects require some comments.
StarlingNJ, binary tree (Fig. 29). As compared with the non-optimized StarlingNJ-tree (Fig. 20), the only discrepancy in topology concerns the West Tsezic cluster: Dido and Khwarshi now form a clade distinct from Hinukh. Bootstrap values changed slightly. Some nodes acquired slightly different dates.
StarlingNJ, tree with joined nodes (Fig. 30). There are no discrepancies in topology as compared with the non-optimized StarlingNJ-tree (Fig. 21). Some nodes acquired slightly different dates.
NJ (Fig. 31). There are no discrepancies in topology as compared with the non-optimized NJ-tree (Fig. 22). Bootstrap values changed slightly.
UPGMA (Fig. 32). As compared with the non-optimized UPGMA-tree (Fig. 23), the only discrepancy in topology concerns the West Tsezic cluster: Dido and Khwarshi form a clade distinct from Hinukh. Bootstrap values did not change.
Bayesian MCMC (Fig. 33). There are no discrepancies in topology as compared with the non-optimized MCMC-tree (Fig. 24). Posterior probabilities become somewhat stronger.
UMP (Fig. 34). As compared with the non-optimized UMP-tree (Fig. 25), the only discrepancy in topology concerns the West Tsezic cluster: Dido and Khwarshi form a clade distinct from Hinukh. Bootstrap values become somewhat stronger.

Thus, three methods (StarlingNJ, UPGMA, UMP), applied to the homoplasy-optimized Tsezic dataset, produced trees which topologically differ from the corresponding non-optimized trees in one point: Dido and Khwarshi form a distinct clade, as opposed to the distinct Hinukh-Dido clade in the non-optimized trees. All the methods, applied to the homoplasy-optimized Tsezic dataset, produced topologically identical trees which can be summarized as a consensus tree. See Fig. 35 for the consensus tree which simply repeats the obtained topology; see Fig. 36 for the consensus tree where the neighboring nodes are joined if the temporal distance between them is ≤ 300 years as calculated by the StarlingNJ method. As one can see, the topology of the second homoplasy-optimized consensus tree (Fig. 36) is identical to both StarlingNJ-trees with joined nodes: non-homoplasy-optimized (Fig. 21) and homoplasy-optimized (Fig. 30).
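A topological comparison of this kind can be expressed as a comparison of clade sets. The following sketch assumes trees written as nested tuples; the two topologies are read off the descriptions in the text (not exported from the actual software), and the function name `clades` is hypothetical.

```python
def clades(tree):
    """Collect the leaf set of every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

# Topologies as described in the text (an assumption of this sketch).
east = ("Hunzib", ("Bezhta_proper", ("Khoshar_Khota_Bezhta", "Tlyadal_Bezhta")))
dido = ("Kidero_Dido", "Sagada_Dido")
khwarshi = ("Khwarshi_proper", "Inkhokwari_Khwarshi")

optimized = (east, ("Hinukh", (dido, khwarshi)))
non_optimized = (east, (("Hinukh", dido), khwarshi))

only_opt = clades(optimized) - clades(non_optimized)
only_non = clades(non_optimized) - clades(optimized)
print(sorted(map(sorted, only_opt)))
print(sorted(map(sorted, only_non)))
```

The two trees differ in exactly one clade each: Dido + Khwarshi in the optimized tree versus Hinukh + Dido in the non-optimized one.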

[Fig. 35 graphic: dated consensus tree over the nine Tsezic lects. Node dates: Tsezic (710 BC), West Tsezic (130 BC), East Tsezic (640 AD), Dido (820 AD), Bezhta (870 AD), Khwarshi (1060 AD). Support labels are given in the order described in the caption.]

Fig. 35. Manually constructed homoplasy-optimized consensus phylogenetic tree of the Tsezic lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, UMP methods. Statistical support values are shown in the following sequence: NJ / MCMC / UMP (“x” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods); in parentheses, probability values from the non-optimized trees are quoted. StarlingNJ dates are proposed.

[Fig. 36 graphic: dated consensus tree over the nine Tsezic lects with neighboring nodes joined. Node dates: Tsezic (710 BC), West Tsezic (40 BC), East Tsezic (760 AD), Dido (820 AD), Khwarshi (1060 AD). Support labels are given in the order described in the caption.]

Fig. 36. Manually constructed homoplasy-optimized consensus phylogenetic tree of the Tsezic lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, UMP methods (neighboring nodes are joined if the distance between them is ≤ 300 years). Statistical support values are shown in the following sequence: NJ / MCMC / UMP (“x” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods); in parentheses, probability values from the non-optimized trees are quoted. StarlingNJ dates are proposed.

Summing up. Homoplastic optimization makes the obtained Tsezic topology uniform across all methods and strengthens the statistical support values (bootstrap and Bayesian posterior probabilities). Some nodes acquired slightly different dates. The resulting homoplasy-optimized consensus tree is more stable than its non-optimized counterpart.

Now we can reexamine the reverse lexicostatistical distances for two East Tsezic lects (Hunzib proper, Bezhta proper) and three West Tsezic lects (Hinukh, Kidero Dido, Khwarshi proper) as obtained from the multistate matrix (Tab. 2); a higher percentage of shared basic vocabulary means greater closeness.

Table 2. Reverse lexicostatistical distances for some Tsezic lects (homoplasy-optimized dataset); in parentheses, values for the non-optimized dataset (Tab. 1) are quoted where they differ.

                        Hunzib proper  Bezhta proper  Hinukh       Kidero Dido  Khwarshi proper
                        (ETs)          (ETs)          (WTs)        (WTs)        (WTs)
Hunzib proper (ETs)     –              0.86 (0.87)    0.62 (0.63)  0.55         0.54 (0.55)
Bezhta proper (ETs)                    –              0.60 (0.63)  0.53         0.54 (0.55)
Hinukh (WTs)                                          –            0.72 (0.77)  0.69 (0.70)
Kidero Dido (WTs)                                                  –            0.75 (0.76)
Khwarshi proper (WTs)                                                           –
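To make the measure in Table 2 concrete, the share of matching items between two lects can be computed from a multistate matrix roughly as follows. This is a simplified sketch: the wordlists and class numbers are invented, the function name `shared_share` is hypothetical, and the actual StarlingNJ procedure may treat synonyms, borrowings and gaps differently.

```python
def shared_share(wordlist_a, wordlist_b):
    """Share of concepts attested in both lects that carry the same cognate class."""
    common = [concept for concept in wordlist_a if concept in wordlist_b]
    matches = sum(1 for concept in common if wordlist_a[concept] == wordlist_b[concept])
    return matches / len(common)

# Invented five-item wordlists (concept -> cognate class number).
lect_a = {"hair": 1, "kill": 1, "knee": 1, "many": 2, "meat": 1}
lect_b = {"hair": 1, "kill": 2, "knee": 1, "many": 2, "meat": 3}

print(shared_share(lect_a, lect_b))
```

With three matching classes out of five shared concepts the value is 0.6; the figures in Table 2 are of the same kind, computed over the 110-item wordlist.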

First, distances between the three West Tsezic lects become more normal: Kidero Dido is close to Khwarshi (75%), whereas Hinukh is almost equally remote from Kidero Dido and Khwarshi (69–72%). The difference between the Hinukh / Kidero Dido distance (72%) and the Hinukh / Khwarshi one (69%) could suggest that not all secondary, i.e., homoplastic matches in the Hinukh / Kidero Dido pair have been revealed by the above linguistic analysis.

Second, the comparison of Hinukh with East Tsezic lects still demonstrates irregular ratios. Four sets of three languages can be analyzed.
1) Hunzib proper (ETs) / Bezhta proper (ETs) / Hinukh (WTs). The configuration is normal: two East Tsezic lects are close to each other (86%) and equally remote from the West Tsezic lect (60–62%).
2) Hunzib proper (ETs) / Bezhta proper (ETs) / Kidero Dido (WTs). The configuration is normal: two East Tsezic lects are close to each other (86%) and equally remote from the West Tsezic lect (53–55%).
3) Hinukh (WTs) / Kidero Dido (WTs) / Hunzib proper (ETs). The configuration is not normal: two West Tsezic lects are indeed close to each other (72%), but not equally remote from the East Tsezic lect: Hinukh / Hunzib = 62%, whereas Kidero Dido / Hunzib = only 55% (the difference is 7 percentage points).
4) Hinukh (WTs) / Kidero Dido (WTs) / Bezhta proper (ETs). The configuration is also abnormal: two West Tsezic lects are indeed close to each other (72%), but not equally remote from the East Tsezic lect: Hinukh / Bezhta = 60%, whereas Kidero Dido / Bezhta = only 53% (the difference is 7 percentage points).

As follows from the analysis of these four sets, the lexicostatistical distances in sets 3 and 4 (two West Tsezic lects and one East Tsezic lect) do not satisfy the condition of additivity: Hinukh still demonstrates abnormal closeness to East Tsezic lects, both to Bezhta and Hunzib, although the situation is slightly better than with the non-optimized dataset. This implies that there are several cases of contact-driven parallel developments within the 110-item wordlist between Hinukh and East Tsezic (Bezhta & Hunzib) which cannot be revealed by the current linguistic analysis. The homoplasy-optimized NeighborNet network (Fig. 37) demonstrates that the conflicting signal between Hinukh and Dido, on the one hand, and between Hinukh and Bezhta, on the other, becomes somewhat weaker as compared to the non-optimized network (Fig. 27). By contrast, as expected, the minimal lateral network produced by the LingPy software detects only a couple of (false) conflicts (Fig. 38).
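The four triplet checks can be reproduced mechanically from Table 2. The sketch below uses the homoplasy-optimized percentages from the table; the tolerance of 3 percentage points is an assumption chosen here to match the verdicts in the text, which does not state an explicit threshold, and the names `SIM`, `triplet_gap` and `TOLERANCE` are hypothetical.

```python
# Shared-vocabulary percentages from Table 2 (homoplasy-optimized dataset).
SIM = {
    frozenset(["Hunzib", "Bezhta"]): 86,
    frozenset(["Hunzib", "Hinukh"]): 62,
    frozenset(["Hunzib", "Kidero_Dido"]): 55,
    frozenset(["Bezhta", "Hinukh"]): 60,
    frozenset(["Bezhta", "Kidero_Dido"]): 53,
    frozenset(["Hinukh", "Kidero_Dido"]): 72,
}

def triplet_gap(close_a, close_b, outsider):
    """Difference between the outsider's similarity to each member of the close pair."""
    return abs(SIM[frozenset([close_a, outsider])] - SIM[frozenset([close_b, outsider])])

TOLERANCE = 3  # assumed cut-off in percentage points; not given in the text

triplets = [
    ("Hunzib", "Bezhta", "Hinukh"),
    ("Hunzib", "Bezhta", "Kidero_Dido"),
    ("Hinukh", "Kidero_Dido", "Hunzib"),
    ("Hinukh", "Kidero_Dido", "Bezhta"),
]
for a, b, c in triplets:
    gap = triplet_gap(a, b, c)
    verdict = "normal" if gap <= TOLERANCE else "irregular"
    print(f"{a} / {b} vs {c}: gap {gap} -> {verdict}")
```

The gaps come out as 2, 2, 7 and 7 percentage points, so only the two sets in which Hinukh is paired with Kidero Dido are flagged as irregular.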

[Fig. 37 graphic: NeighborNet network over Khwarshi_proper, Inkhokwari_Khwarshi, Sagada_Dido, Kidero_Dido, Hinukh, Bezhta_proper, Tlyadal_Bezhta, Khoshar_Khota_Bezhta, Hunzib_proper and Outgroup_Chechen, with bootstrap labels on the branches.]

Fig. 37. Homoplasy-optimized phylogenetic network of the Tsezic lects produced by the NeighborNet method from the binary matrix in the SplitsTree4 software. Bootstrap values are shown near the branches (not shown for stable branches with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.

[Fig. 38 graphic: minimal lateral network over Kidero_Dido, Sagada_Dido, Hinukh, Tlyadal_Bezhta, Khoshar_Khota_Bezhta, Bezhta_proper, Hunzib_proper, Khwarshi_proper, Inkhokwari_Khwarshi and Outgroup_Chechen, with a color scale labelled "Inferred Links".]

Fig. 38. Homoplasy-optimized minimal lateral network of the Tsezic lects produced in the LingPy software. Based on the consensus homoplasy-optimized tree (Fig. 36). Node size reflects the inferred number of cognate sets present in each lect. The solid links illustrate lateral transfer events suggested by the method. Thickness and color of the links indicate the inferred number of homoplastic characters between two nodes, as specified by the right scale. The gain-loss models 2−1 and 5−2 are equally best fitting, p = 0.81 in both cases.

Besides the inferred but unrevealable homoplasy between Hinukh and neighboring lects (as suggested by the above distance analysis), there is one additional problem with the obtained Tsezic phylogeny. All the methods, applied to both non-optimized and optimized datasets, reconstruct the Bezhta clade (which comprises three dialects: Bezhta proper, Khoshar-Khota, Tlyadal) as opposed to the Hunzib language. This conforms well to the traditional expert classification, since grammatically Hunzib and Bezhta are indeed two distinct languages (Yakov Georgievich, is it really so?? Could we regard Hunzib / Bezhta proper / the rest of Bezhta as three dialectal branches?). According to the StarlingNJ-trees, however, the time span between the Proto-East Tsezic and Proto-Bezhta nodes is relatively short: 170 years for the non-homoplasy-optimized dataset (Fig. 20) and 230 years for the homoplasy-optimized dataset (Fig. 29). Does such a time span make sense from the historical point of view? It is an open question. Under the assumption of a temporal error of 300 years, the Proto-East Tsezic split turns out to be a three-way one: Hunzib / Bezhta proper / Khoshar-Khota–Tlyadal (see Fig. 21 for the non-homoplasy-optimized dataset and Fig. 30 & 36 for the homoplasy-optimized dataset).
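The node-joining criterion can be illustrated with the dates labelled in Fig. 29. The sketch below is an assumption-laden illustration: BC years are written as negative numbers (ignoring the missing year zero), only parent-child pairs that can be read off the text are included, and the helper `joinable` is hypothetical rather than part of the StarlingNJ software.

```python
# Node dates as labelled in Fig. 29; BC years are written as negative numbers.
DATES = {"Tsezic": -710, "East Tsezic": 640, "West Tsezic": -130, "Bezhta": 870}

# Parent-child pairs that can be read off the text; other nodes are omitted.
EDGES = [
    ("Tsezic", "East Tsezic"),
    ("Tsezic", "West Tsezic"),
    ("East Tsezic", "Bezhta"),
]

MAX_GAP = 300  # years, the assumed temporal error

def joinable(edges, dates, max_gap):
    """Return (parent, child, gap) for every edge short enough to be collapsed."""
    return [
        (parent, child, dates[child] - dates[parent])
        for parent, child in edges
        if dates[child] - dates[parent] <= max_gap
    ]

print(joinable(EDGES, DATES, MAX_GAP))
```

Only the East Tsezic / Bezhta edge (230 years) falls under the 300-year limit, which is why collapsing it yields the three-way split Hunzib / Bezhta proper / Khoshar-Khota–Tlyadal.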

7. Conclusions. It seems that the reconstruction of language phylogeny should consist of several steps.
1. The high-quality input dataset is processed with the help of the main phylogenetic methods and the consensus phylogenetic tree is produced.
2. The ancestral, i.e., proto-language character states are reconstructed (unfortunately it is normal that for a number of characters several equally plausible reconstructed items compete with each other).
3. Proceeding from the consensus tree and ancestral character states, the dataset is examined for homoplasy (contact-driven parallel developments are especially deleterious). The minimal lateral network module of LingPy is a powerful tool for that purpose. The revealed items that constitute secondary matches should be marked as unrelated or, if we are sure of the direction of influence, the target item should be treated as a borrowing. Note that normally not all cases of linguistic homoplasy can be detected.
4. The homoplasy-optimized dataset is processed with the help of the main phylogenetic methods and the consensus phylogenetic tree is produced.

After homoplastic optimization, individual clades can be better resolved and generally the homoplasy-optimized phylogeny should be more robust than the initially reconstructed tree. The Lezgian and Tsezic datasets tested above confirm these expectations. It must be additionally noted that in both the Lezgian and Tsezic cases statistical support values (bootstrap and Bayesian posterior probabilities) have increased first of all for the character-based methods: Bayesian MCMC and Maximum parsimony; the distance-based methods, such as NJ and UPGMA, turned out to be less noise-sensitive.

Supporting materials (LingPy files) can be downloaded from:
• http://newstar.rinet.ru/~kass/Linguistic_homoplasy_Lezgian_Tsezic_DRAFT/04/Lezgian_non-homoplasy-optimized_MLN_lingpy_04.zip
• http://newstar.rinet.ru/~kass/Linguistic_homoplasy_Lezgian_Tsezic_DRAFT/04/Lezgian_homoplasy-optimized_MLN_lingpy_04.zip
• http://newstar.rinet.ru/~kass/Linguistic_homoplasy_Lezgian_Tsezic_DRAFT/04/Tsezic_non-homoplasy-optimized_MLN_lingpy_04.zip
• http://newstar.rinet.ru/~kass/Linguistic_homoplasy_Lezgian_Tsezic_DRAFT/04/Tsezic_homoplasy-optimized_MLN_lingpy_04.zip

8. References.
Alekseev, M. 1998. Tsezskie yazyki [Tsezic languages]. In: M. Alekseev et al. (eds.). Yazyki mira: Kavkazskie yazyki [Languages of the World: Caucasian Languages]. Moscow: Academia: 299–303.
Atkinson, Q. D. & R. D. Gray. 2006. How old is the Indo-European language family? Progress or more moths to the flame? In: P. Forster, C. Renfrew (eds.). Phylogenetic Methods and the Prehistory of Languages. Cambridge: The McDonald Institute for Archaeological Research, 91–109.
Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, Pocheshkhova E, Haber M, Platt D, Schurr T, Haak W, et al. (2011) Parallel evolution of genes and languages in the Caucasus region. Mol. Biol. Evol. 28: 2905–2920.
Barbançon F., Evans, S. N., Nakhleh, L., Ringe, D., Warnow, T. 2013. An experimental study comparing linguistic phylogenetic reconstruction methods. Diachronica 30/2: 143–170.
Berg, H. van den. 1995. A grammar of Hunzib. With texts and lexicon. Proefschrift ter verkrijging van de graad van Doctor aan de Rijksuniversiteit te Leiden, 25 januari 1995.
Bokarev E. A. 1959. Tsezskie (didojskie) yazyki Dagestana [Tsezic languages of Dagestan]. Moscow: Izd-vo AN SSSR.
Bryant D, Moulton V (2004). NeighborNet: an agglomerative algorithm for the construction of phylogenetic networks. Mol. Biol. Evol. 21: 255–265.
Bryant, D., F. Filimon & R. D. Gray. 2005. Untangling our past: languages, trees, splits and networks. In: R. Mace, C. Holden & S. Shennan (eds.). The Evolution of Cultural Diversity: a Phylogenetic Approach. London: UCL Press, 69–85.
Burlak, S. A., Starostin, S. A. 2005. Sravnitel’no-istoricheskoe yazykoznanie [Historical Linguistics]. 2nd ed. Moscow: Academia.
Cysouw, M., Forker, D. 2009. Reconstruction of morphosyntactic function: Nonspatial usage of spatial case marking in Tsezic. Language 85/3: 588–617.
Dyen, I., Kruskal, J., Black, P. 1997. Comparative Indo-European Database. This file was last modified on Feb 5, 1997. http://www.wordgumbo.com/ie/cmp/ [Accessed 07.07.2014].
Forker, D. 2013. A Grammar of Hinuq. Berlin/Boston: De Gruyter Mouton.
Gascuel O (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14: 685–695.
Goloboff PA, Farris JS, Nixon KC (2008) TNT, a free program for phylogenetic analysis. Cladistics 24/5: 774–786.
Gray, R. D., Atkinson, Q. D. 2003. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426: 435–439.
Haspelmath, M. 2009. Lexical borrowing: concepts and issues. In: Haspelmath, M. & U. Tadmor (eds.). Loanwords in the World’s Languages. A Comparative Handbook. Berlin: De Gruyter Mouton: 35–54.

Haugen, E. 1950. The Analysis of Linguistic Borrowing. Language 26 (2): 210–231.
Holden C. J., R. D. Gray. 2006. Rapid radiation, borrowing and dialect continua in the Bantu languages. In: P. Forster, C. Renfrew (eds.). Phylogenetic Methods and the Prehistory of Languages. Cambridge: The McDonald Institute for Archaeological Research, 19–31.
Huelsenbeck JP, Ronquist F (2001) MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17/8: 754–755.
Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23/2: 254–267.
Imnaishvili D. S. 1963. Didojskij yazyk v sravnenii s ginukhskim i khvarshijskim yazykami [Dido language in comparison with the Hinukh and Khwarshi languages]. Tbilisi: Mecniereba.
Kassian A (2011–2012) Annotated Swadesh wordlists for the Lezgian group (North Caucasian family). Database compiled and annotated by A. Kassian (November 2011 – October 2012). The Global Lexicostatistical Database project: http://starling.rinet.ru/cgi-bin/main.cgi?root=new100&encoding=utf-eng. Accessed 12.09.2014.
Kassian, A. 2013. The Lezgian linguistic group within the framework of the Global Lexicostatistical Database. Talk given at the conference “Comparative-Historical Linguistics of the 21st Century: Issues and Perspectives”, Moscow, March 20–22, 2013. https://www.academia.edu/3040336/The_Lezgian_linguistic_group_within_the_framework_of_the_Global_Lexicostatistical_Database
Kassian A (2013–2014) Annotated Swadesh wordlists for the Tsezic group (North Caucasian family). Database compiled and annotated by A. Kassian (October 2013 – July 2014). The Global Lexicostatistical Database project: http://starling.rinet.ru/cgi-bin/main.cgi?root=new100&encoding=utf-eng. Accessed 12.09.2014.
Kassian A. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data. Forthcoming. Russian version (2014): https://app.box.com/s/224y9ywdzcs1icf0mdpy
Kassian A, Starostin G, Dybo A, Chernov V (2010) The Swadesh wordlist. An attempt at semantic specification. Journal of Language Relationship 4: 46–89.
Kassian, A., Zhivlov, M., Starostin, G. Proto–Indo-European–Uralic comparison from the probabilistic point of view. Forthcoming. Russian version (2014): https://app.box.com/s/q8f7l17thcwyoaa3iu3l
Khalilova, Z. 2009. A Grammar of Khwarshi. Proefschrift ter verkrijging van de graad van Doctor aan de Universiteit Leiden, 17 december 2009.
Kitchen, A., Ehret, C., Assefa, Sh., Mulligan, C. J. 2009. Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proc. R. Soc. B 276: 2703–2710.
Koryakov Yu. B. 2006. Atlas kavkazskikh yazykov: s prilozheniem polnogo reestra yazykov [Atlas of the Caucasian languages with language guide]. Moscow: Institute of Linguistic.
Lees, R. 1953. The Basis of Glottochronology. Language 29.

List, J.-M., Moran, S. 2013. An open source toolkit for quantitative historical linguistics. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg, PA: Association for Computational Linguistics: 13–18.
List, J.-M., Nelson-Sathi, S., Geisler, H., Martin, W. 2014a. Networks of lexical borrowing and lateral gene transfer in language and genome evolution. BioEssays 36.2: 141–150.
List, J.-M., Nelson-Sathi, S., Martin, W., Geisler, H. 2014b. Using phylogenetic networks to model Chinese dialect history. Language Dynamics and Change 4: 222–252.
List, J.-M., Moran, S., Bouda, P., Dellert, J. 2014c. LingPy: Python library for quantitative tasks in historical linguistics. Version 2.4.1.alpha, DOI: 10.5281/zenodo.11886. Marburg: Forschungszentrum Deutscher Sprachatlas. Available at: http://lingpy.org/ [Accessed 28.09.2014].
Lomtadze E. 1963. Ginukhskij dialekt didojskogo yazyka [Hinukh dialect of the Dido language]. Tbilisi: Mecniereba.
Makarenkov V, Kevorkov D, Legendre P (2006) Phylogenetic Network Construction Approaches. In: Arora DK, Berka RM, Singh GB, eds. Applied Mycology and Biotechnology, Vol. 6: Bioinformatics. Amsterdam / Boston: Elsevier, pp. 61–98.
Müller A., Velupillai V., Wichmann S., Brown C. H., Holman E. W. et al. 2013. ASJP World Language Trees of Lexical Similarity: Version 4 (October 2013): http://email.eva.mpg.de/~wichmann/language_tree.htm [Accessed 07.07.2014].
Nelson-Sathi, S.; List, J.-M.; Geisler, H.; Fangerau, H.; Gray, R. D.; Martin, W. and Dagan, T. 2011. Networks uncover hidden lexical borrowing in Indo-European language evolution. Proc. R. Soc. B 278: 1794–1803 (published online before print on November 24, 2010).
Nikolayev S. L. 1978. Rekonstruktsiya foneticheskoj sistemy pratsezskogo yazyka [Reconstruction of the Proto-Tsezic phonological system]. In: Yartseva V. N. et al. (eds.). Konferentsiya: Problemy rekonstruktsii (tezisy dokladov). Moscow: Institut yazykoznaniya AN SSSR: 87–89.
Novotná P, Blažek V (2007) Glottochronology and its application to the Balto-Slavic languages. Baltistica 42/2: 185–210; Baltistica 42/3: 323–346.
Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406–425.
Semple Ch, Steel M (2003) Phylogenetics. Oxford: Oxford University Press.
Sneath PHA, Sokal RR (1973) Numerical Taxonomy. San Francisco: W.H. Freeman and Company.
Starostin GS (2010) Preliminary lexicostatistics as a basis for language classification: A new approach. Journal of Language Relationship 3: 79–116.
Starostin GS (2011) Annotated Swadesh wordlists for the Nakh group (North Caucasian family). Database compiled and annotated by G. Starostin (last revision: October 2011). The Global Lexicostatistical Database project: http://starling.rinet.ru/cgi-bin/main.cgi?root=new100&encoding=utf-eng. Accessed 12.09.2014.
Starostin GS (2013) Yazyki Afriki. Opyt postroeniya leksikostatisticheskoj klassifikatsii [Languages of Africa: A New Lexicostatistical Classification]. Vol. 1: Metod. Kojsanskie yazyki [Methodology. Khoisan Languages]. Moscow: LRC.
Starostin SA (1989/2007) Sravnitel’no-istoricheskoe yazykoznanie i leksikostatistika [Historical linguistics and lexicostatistics]. In: Starostin 2007: 407–447. [First publ. in: Lingvisticheskaya rekonstruktsiya i drevnejshaya istoriya Vostoka (Moscow, 1989): 3–39. Eng. version: S. Starostin 1999/2000.]
Starostin, S. A. 1993/2007. Rabochaya sreda dlya lingvista [Linguist’s workspace]. In: S. Starostin 2007, pp. 481–496. [First publ. in: Bazy dannykh po istorii Evrazii v srednie veka 2. Moscow: Institut vostokovedeniya RAN, 1993: 50–64. Republ.: Gumanitarnye nauki i novye informatsionnye tekhnologii. Moscow: RGGU, 1994: 7–23.]
Starostin SA (1994) Lezgian Etymological Database. Computerized version of the Proto-Lezgian corpus, available at http://starling.rinet.ru/cgi-bin/main.cgi?flags=eygtnnl [Accessed 10.09.2014]. Includes some Proto-Lezgian etymologies (mostly basic lexicon items) that have not been included in Starostin & Nikolayev 1994 due to their lack of external cognates in other branches of North Caucasian.
Starostin SA (1999/2000) Comparative-historical linguistics and lexicostatistics. In: Historical Linguistics and Lexicostatistics. Melbourne: Association for the History of Language, 1999, pp. 3–50 [Republ. in: Time Depth in Historical Linguistics. Oxford: McDonald Institute for Archaeological Research, 2000, pp. 223–259.]
Starostin, S. A. 2007. Trudy po yazykoznaniyu [Works in Linguistics]. Moscow: LRC Publishing House.
Starostin SA (2007a) Opredelenie ustojchivosti bazisnoj leksiki [Defining the stability of basic lexicon]. In: Starostin 2007: 827–839.
Starostin SA (n.d.) Istoricheskaya fonetika lezginskikh yazykov [Lezgian historical phonology]. Unpubl. ms, 1980s.
Starostin SA, Nikolayev SL (1994) A North Caucasian Etymological Dictionary. Moscow: Asterisk, 1994 [reprinted: 3 vols. Ann Arbor: Caravan Books, 2007]. Available online at the Tower of Babel project as Caucet.dbf: http://starling.rinet.ru/cgi-bin/main.cgi?flags=eygtnnl
Swadesh, M. 1952. Lexico-statistic dating of prehistoric ethnic contacts. Proceedings of the American Philosophical Society 96: 453–63.
Swadesh, M. 1955. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics 21: 121–137.
Testelets Ya. G. 1993. K sravnitelno-istoricheskoj fonetike tsezskix yazykov (rekonstrukciya vokalizma) [Towards a historical phonology of Tsezic languages: vowels]. In: Nikolaeva T. M. (ed.). Problemy fonetiki 1. Moscow: Prometej: 126–134.
