Linguistic complexity and language (contact) history: The case of Albanian dialects
Maria S. Morozova, Yu. Rusakov, (Maria A. Ovsjannikova)
Institute for Linguistic Studies of the Russian Academy of Sciences (ILS RAS)
Saint Petersburg State University (SPbSU)
Saint Petersburg, Russia

Balkan Languages and Dialects: Corpus-based and Quantitative Studies October 18–20, 2018 Institute for Linguistic Studies of the Russian Academy of Sciences (Saint Petersburg, Russia)

Roadmap
1. Introduction
• Goals of the paper
2. Phonetics and grammar: complexity of the Albanian varieties
• What is linguistic complexity?
• Data used in the study
• Complexity on the Albanian dialectal map
• Complexity and closeness of the Albanian varieties
3. Lexicon: comparing “grammatical” and “lexical” data on the Albanian varieties
• Lexicon and closeness of the Albanian varieties
• “Grammatical” and “lexical” closeness
• Complexity and lexical borrowings in the Albanian varieties
• Complexity and unique words in the Albanian varieties
4. Conclusions

Goals of the paper
The study has two interrelated goals.
1. To measure the level of complexity of Albanian varieties, and to examine how the complexity level correlates with real processes in the history of the , such as language and ethnic contact situations of different types, the isolation of some groups of varieties, population movements, etc. In this way we try to throw light on the “balkanization” processes in the Albanian-speaking area. Further, it would be interesting to address some issues relevant for the Balkan area as a whole, i.e. the degrees of balkanization of the other Balkan varieties.
2. To test, deepen, and refine our knowledge about the character of the Albanian dialect division and its history, using the parameter of linguistic complexity and quantitative data on the Albanian dialect lexicon to measure and examine dialect variation and the degree of closeness.

What is linguistic complexity?

• “grammatical complexity – complexity of the strictly linguistic domains of phonology, morphosyntax, lexicon, etc. and their components. <…> [C]omplexity can be measured as follows.
– For each subsystem of the grammar, the number of elements it contains. <…>
– The number of paradigmatic variants, or degrees of freedom, of each such element or set of elements: allophones, allomorphs, declension or conjugation classes. <…>
– Syntagmatic phenomena <…>
– Constraints on elements, alloforms, and syntagmatic dependencies, including constraints on their combination” (Nichols 2009: 111-112).

Data used in the study
• Dialectological Atlas of Albanian Language (DAAL 2007–2008): dialectal maps with 131 villages in the main area and 14 villages in the diaspora.
• Phonological and morphological data for the Atlas were collected in the 1970s–1980s using a questionnaire with 65 questions on phonology and 80 questions on grammar (DAAL 2007: 437-453).
• The lexical volume of the Atlas (DAAL 2008) maps the local terms for 260 lexical items.


• Albanian varieties for the current study are drawn from the main area:
– 93 located in the Republic of Albania and in the adjacent part of Greece (Çamëri)
– 25 located in Kosovo and in the Republic of Serbia (Preševo)
– 7 located in the Republic of Macedonia
– 6 located in the Republic of Montenegro
• Diaspora varieties spoken in Serbia (Pešter), Croatia (Zadar), Greece, and Italy were not taken into consideration in the study.

• Each of the 131 dialectal varieties was described in terms of 27 binary features (e.g. presence of /θ/ or presence of supercompound verb forms), based on the features represented in DAAL.
– 1 = presence of a feature
– 0 = absence of a feature
• All selected features fall into types 1 and 2 of Nichols 2009 (the number of elements in phonology and morphology, and the number of paradigmatic variants).
• The complexity of each variety was then calculated as the simple sum of its feature values.
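The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the variety names and feature values are invented toy data (the study itself codes 131 varieties for 27 features from DAAL), and the function name `complexity` is hypothetical.

```python
from typing import Dict, List

# Invented toy data: 3 varieties coded for 5 binary features.
# The study itself uses 131 varieties and 27 features taken from DAAL.
FEATURES: Dict[str, List[int]] = {
    "variety_A": [1, 1, 0, 1, 1],
    "variety_B": [1, 0, 0, 1, 0],
    "variety_C": [0, 0, 0, 1, 0],
}


def complexity(features: List[int]) -> int:
    """Return the complexity score: the number of features coded 1."""
    if any(value not in (0, 1) for value in features):
        raise ValueError("features must be coded as 0 (absent) or 1 (present)")
    return sum(features)


# One score per variety, analogous to the 9..23 range reported for the atlas data.
scores = {name: complexity(values) for name, values in FEATURES.items()}
```

Because every feature has the same weight, a score is directly comparable across varieties only as long as all varieties are coded for the same feature set.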

Complexity on the Albanian dialectal map
[Map: linguistic complexity of the varieties; maxCompl = 23, minCompl = 9]
• Color grading from black to white shows linguistic complexity from 23 to 9.

Observations:
1. A strong decrease in linguistic complexity from north to south, i.e. from the Gheg to the Tosk area.
2. A less pronounced decrease from west to east, visible especially in the northern part of the map.

Complexity on the Albanian dialectal map
Observation 1:
• The complexity of most Gheg varieties ranges from 16 to 23 (except for two locations in Dibra, with total complexity scores of 13 and 14).
• The complexity of all Tosk varieties ranges from 9 to 14.
• This is consistent with the idea of a stronger balkanization of the Tosk area (because of its adjacency to the “center of balkanization” located in the region of Lakes Ohrid and Prespa, see Lindstedt 2000, etc.).

Complexity on the Albanian dialectal map
Observation 2:
• The decrease of complexity from west to east, visible especially in the northern part of the map, correlates to some extent with the actual contact situation. E.g. the Northeastern Gheg varieties spoken in Kosovo and the Central Gheg varieties spoken in Macedonia and on the border with it, where Albanian-Slavic contact is ongoing, are less complex than the Northern Gheg and Central Gheg varieties in the territory of Albania.
• Some correlation between the relief and the level of linguistic complexity was observed.
– The Northwestern Gheg (NWG) varieties in mountainous Northern Albania (Albanian Alps) and in the isolated area around Lake Skadar are more likely to be complex than the Northeastern Gheg (NEG) varieties in the central part of Kosovo and the plateau of Dukagjin.
– In the central part of Albania, less complex varieties are spoken on the coast (Durrës, Kavajë), while more complex ones are found in the highlands.

Complexity and closeness of the Albanian varieties
• Multidimensional scaling (MDS) with R was applied to assess and visualize the closeness of the Albanian varieties. Cf. a similar study by Dombrowski (2014), a network analysis of the dialects of Macedonian.
– Formally, each variety is associated with a 27-dimensional vector.
– Comparisons are calculated from the comparable features shared by a pair of varieties.
– Shorter distances between points in the MDS plot correspond to a higher degree of closeness between varieties, longer distances to a lower one.

Closeness of the Albanian varieties
• The surprisingly large distance between (all) Gheg and (all) Tosk dialects may point to the secondary character of the old Gheg/Tosk border, which follows the Shkumbin river in the middle of Albania (i.e. the Gheg/Tosk separation did not arise in situ, see Русаков 2013), may reflect the subsequent ethnic and linguistic changes in the Tosk area (massive language shift of the Slavic and Aromanian population, see Десницкая 1976), or may result from a combination of the two.
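The scaling step described above can be sketched as follows. The slides only state that MDS was run in R; they do not name the MDS variant or the distance measure. The sketch therefore rests on two assumptions: the distance between two varieties is the share of mismatching values among the features recorded for both, and the scaling is classical (Torgerson) MDS, computed here in plain Python with power iteration. All function names and the four toy vectors are invented for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Features = Sequence[Optional[int]]  # None = feature not recorded for this variety
Matrix = List[List[float]]


def distance(a: Features, b: Features) -> float:
    """Share of mismatching values among features recorded for both varieties."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("the two varieties share no comparable features")
    return sum(x != y for x, y in pairs) / len(pairs)


def dominant_eigenpair(m: Matrix, steps: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: eigenpair of largest magnitude of a symmetric matrix."""
    n = len(m)
    v = [math.sin(i + 1.0) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(steps):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, mv)), v  # Rayleigh quotient


def classical_mds(d: Matrix, dims: int = 2) -> Matrix:
    """Classical MDS: coordinates from the double-centered squared distances."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = dominant_eigenpair(b)
        if value <= 1e-12:  # no positive eigenvalue left; keep zeros
            break
        for i in range(n):
            coords[i][k] = math.sqrt(value) * vec[i]
            for j in range(n):  # deflate so the next pass finds the next axis
                b[i][j] -= value * vec[i] * vec[j]
    return coords


def euclid(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two points of the MDS plot."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Invented toy vectors: two similar pairs of varieties with 6 features each.
varieties = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
matrix = [[distance(a, b) for b in varieties] for a in varieties]
coords = classical_mds(matrix)
```

In the resulting plot the first two toy varieties lie close together and far from the other two, which is the reading applied to the Gheg and Tosk clusters. Since `distance` only checks equality, the same function would also work for non-binary (categorical) features.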

Complexity and closeness of the Albanian varieties
[Figure: closeness plot of the varieties with a complexity scale from lower to higher]

Lexicon: comparing “grammatical” and “lexical” data on the Albanian varieties
• For this part of the study we analyzed 219 lexical maps from DAAL. The maps mainly cover the semantic fields of trees and plants, wild and domestic animals, household terms, and names for objects of material culture.
• All lexemes were classified as “inherited words” (including Ancient Greek and Latin loanwords) or “borrowings”.
• Each of the 131 dialectal varieties was described in terms of 219 non-binary features.
• Multidimensional scaling (MDS) with R was applied to measure the closeness of the varieties, as in the grammatical part.

Lexicon and closeness of the Albanian varieties

“Grammatical” and “lexical” closeness

• Both sets of results agree very well with the traditional dialect classification.

[Figure: two panels, “grammatical” closeness and “lexical” closeness]

• Lexical parameters support the previously expressed idea of a great distance between the Gheg and Tosk dialects. On the other hand, they show a degree of closeness between the Southern Gheg and Northern Tosk dialects, which may indicate relatively late contact between Gheg and Tosk in the area around the Shkumbin river.

[Figure: two panels, “grammatical” closeness and “lexical” closeness]

• In both graphs, the Central Gheg (CG) area has a number of unclustered varieties, which are either close to the other subgroups (Southern Gheg, SG; Northeastern Gheg, NEG) or do not join any of the compact groups.

[Figure: two panels, “grammatical” closeness and “lexical” closeness]

Complexity and lexical borrowings in the Albanian varieties

• We calculated the number of lexical borrowings in the 131 varieties of Albanian:
– min = 41 of 219 words (19%)
– max = 85 of 219 words (39%)

Lexical borrowings and closeness of the Albanian varieties

[Figure: two panels, LEXICON: ALL WORDS and LEXICON: LOANWORDS ONLY]

Complexity and lexical borrowings in the Albanian varieties

• The results of the study of linguistic complexity (based on DAAL 2007) were compared with the data on lexical borrowings and their distribution (based on DAAL 2008).
[Maps: lexical borrowings on the Albanian dialectal map; complexity on the Albanian dialectal map]
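The comparison above is presented on the slides as two maps side by side; no statistic is named. One possible way to put a number on such a comparison is a rank correlation between the complexity score and the borrowing count of each variety. The sketch below computes Spearman's rho in plain Python. The choice of statistic is an assumption, the function names are hypothetical, and the five data points are invented.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    if len(x) != len(y):
        raise ValueError("both sequences must describe the same varieties")
    return pearson(ranks(x), ranks(y))


# Invented values for five varieties (not taken from the atlas).
complexity_scores = [23, 18, 16, 12, 9]
borrowing_counts = [45, 70, 60, 55, 50]
rho = spearman(complexity_scores, borrowing_counts)
```

A single coefficient over all 131 points would hide regional differences, so it would complement the maps rather than replace them.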

Complexity and lexical borrowings in the Albanian varieties
• The correlation between linguistic complexity and the spread of lexical borrowings in the Albanian varieties is complicated and uneven.
• On the one hand, relatively high, though not extreme, complexity is combined with relatively many loanwords in the zones of Albanian-Slavic contact in the eastern part of the Albanian dialect continuum (compare with the northwestern part!).

• On the other hand, most of the Tosk area shows low complexity and a moderate number of lexical borrowings. The relatively recent contacts that brought the latest Slavic and Greek loanwords to a part of the Tosk dialects were obviously quite different from the very deep processes of the older Tosk “balkanization”. The latter may have involved a massive language shift.

Complexity and unique words in the Albanian varieties

• We calculated the number of unique words (= either a borrowing or an inherited word represented in one point only) in the 131 varieties of Albanian.
• Hypothesis: the number of unique words may correlate with the degree of past and/or present isolation of the given community.
• 19 points on the map have no unique words.
• In 121 points the number of unique words ranges from 1 to 10.
• 1 point (No. 89, Reka e Dibrës) has 21 unique words.
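The counting rule for unique words can be sketched as below. The sketch assumes that uniqueness is judged per lexical map, i.e. a word counts as unique when it is recorded at exactly one point for that item. The slides do not spell out this detail, and the data structure, names, and toy entries are invented.

```python
from collections import Counter
from typing import Dict

# Invented toy data: for each lexical item (map), the word recorded at each point.
LEXICAL_MAPS: Dict[str, Dict[str, str]] = {
    "item_1": {"p1": "w_a", "p2": "w_a", "p3": "w_b"},
    "item_2": {"p1": "w_c", "p2": "w_d", "p3": "w_d"},
    "item_3": {"p1": "w_f", "p2": "w_f", "p3": "w_f"},
    "item_4": {"p1": "w_g", "p2": "w_g", "p3": "w_h"},
}


def unique_word_counts(maps: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Count, for every point, the words attested at that point only."""
    counts: Dict[str, int] = {}
    for answers in maps.values():
        frequency = Counter(answers.values())  # how many points use each word
        for point, word in answers.items():
            counts.setdefault(point, 0)
            if frequency[word] == 1:  # no other point has this word for the item
                counts[point] += 1
    return counts


unique_counts = unique_word_counts(LEXICAL_MAPS)
```

Points that share all their words with at least one other point end up with a count of 0, as in the 19 points reported above.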

[Maps: unique words on the Albanian dialectal map; complexity on the Albanian dialectal map]

Complexity and unique words in the Albanian varieties
• There is almost no correlation between complexity and the number of unique words.
• The high number of unique words in the “unclustered” CG points in Macedonia is remarkable and may indicate a special path of development of these communities (from the point of view of contact vs. isolation).

Conclusions

• It seems that our attempt to study the Albanian dialect varieties by quantitative methods has allowed us to clarify some points in the language history of the area, first of all to diagnose some cases of “aberrant” development (language shift and so on). The combination of lexical and phonetic-grammatical data also helps to reflect the Albanian dialect history more fully.

• The methods used in macro-areal studies (linguistic complexity) also work well on the micro-areal level and may be applied together with the methods of dialectometry.
• A systematic study of the Balkan dialects using this approach would probably allow us to identify more exactly the areas of language contact of different intensity, and to throw light on the real processes that took place in the linguistic history of the Balkans.

Bibliography
• Beci, Bahri. 2007. Të folmet qendrore të shqipes së veriut. Gegërishtja qendrore dhe grupimi i të folmeve të saj. Tiranë: EDFA.
• Dombrowski, Andrew. 2014. A Network Analysis of Macedonian Dialects (a methodological experiment). A paper presented at the 19th Biennial Conference on Balkan and South Slavic Linguistics, Literature and Folklore. April 25–27, 2014, University of Chicago, Illinois.
• Gjinari, Jorgji, Bahri Beci, Gjovalin Shkurtaj, Xheladin Gosturani. 2007. Atlasi dialektologjik i gjuhës shqipe. Vëllimi 1. Tiranë; Napoli.
• Gjinari, Jorgji, Gjovalin Shkurtaj. 2000. Dialektologjia. Ribotim. Tiranë: ShBLU.
• Lindstedt, Jouko. 2000. Linguistic Balkanization: Contact-induced change by mutual reinforcement. In D.G. Gilbers, J. Nerbonne and J. Schaeken (eds.). Languages in Contact (= Studies in Slavic and General Linguistics 28). Amsterdam; Atlanta, GA: Rodopi, 231-246.
• Nichols, Johanna. 2009. Linguistic complexity: a comprehensive definition and survey. In Sampson, Geoffrey, David Gil, and Peter Trudgill (eds.). Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 110-125.
• Десницкая, Агния Васильевна. 1968. Албанский язык и его диалекты. Л.: Наука.
• Десницкая, Агния Васильевна. 1976. Эволюция диалектной системы в условиях этнического смешения (из истории славяно-албанских языковых контактов). In Вопросы этногенеза и этнической истории славян и восточных романцев. М.: Наука, 186-197.
• Десницкая, Агния Васильевна. 1987. Роль устных койне в истории образования албанского литературного языка. In А.В. Десницкая. Албанская литература и албанский язык. М.: Наука, 195-204.
• Русаков, Александр Юрьевич. 2013. Некоторые изоглоссы на албанской диалектной карте (к вопросу о возникновении и распространении балканизмов албанского языка). In Вяч. Вс. Иванов (отв. ред.). Исследования по типологии славянских, балтийских и балканских языков (преимущественно в свете языковых контактов). СПб.: Алетейя, 113-174.

Thank you for your attention!
Faleminderit për vëmendjen!
