The Albanian Linguistic Journey from Ancient Illyricum to EU: Lexical Borrowings

Linköping University
Department of Culture and Communication
Master’s Program Language and Culture in Europe

The Albanian Linguistic Journey from Ancient Illyricum to EU
"Lexical Borrowings"

Ariola Kulla
Language and Culture in Europe
Spring Term, 2010
Supervisor: Richard Hirsch

© Ariola Kulla
Department of Culture and Communication
Master’s Program in Language and Culture in Europe
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57208
ISRN: LIU-IKK/MPLCE-A--10/01
Linköping University Electronic Press

Upphovsrätt (copyright notice, translated from Swedish)
This document is kept available on the Internet, or its future replacement, from the date of publication, provided that no extraordinary circumstances arise. Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the author's consent. Technical and administrative solutions exist to guarantee authenticity, security and accessibility. The author's moral right includes the right to be named as author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

Copyright
The publishers will keep this document online on the Internet, or its possible replacement, from the date of publication barring exceptional circumstances.
The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Ariola Kulla.

Table of Contents
List of illustrations
Acknowledgments
Guide to the reader
  A. Abbreviations
  B. Symbols
1 Introduction
  1.1 Problem description
  1.2 Aim
2 Historical background
3 Albanian as an Indo-European language
  3.1 Albanian-PIE phonological correspondences
  3.2 The origin controversy
  3.3 Albanian Standard Language
  3.4 Dialects of Albanian
4 Lexical borrowing
  4.1 Definition of lexical borrowing
  4.2 Characteristics of Borrowings
  4.3 Motivation for lexical borrowing
  4.4 Effects of borrowing
  4.5 What is a Loanword?
    4.5.1 Direct Loans
    4.5.2 Indirect Loans
  4.6 Borrowings from Greek (8th BC to 146 BC)
  4.7 Borrowings from Latin
  4.8 Borrowings during the Byzantine period
  4.9 Borrowings from Turkish
  4.10 Borrowings from Italian
  4.11 Borrowings from English
5 Conclusion
  5.1 Summary and conclusion
  5.2 Future Research
Works Cited
Appendices
  Appendix A: Albanian-PIE phonological correspondences
  Appendix B: Indo-European family of languages
  Appendix C: International Phonetic Alphabet

List of illustrations
Table 1: Albanian words and their PIE-roots
Table 2: Unclear origin of the borrowing
Table 3: Loanwords
Table 4: Loan translations
Table 5: Foreign words
Table 6: Loan blends
Table 7: Sound changes
Table 8: Direct loans
Table 9: Indirect loans
Table 10: Words borrowed from Greek
Table 11: Words borrowed from Latin
Table 12: Further words borrowed from Latin
Table 13: Borrowings from Gothic
Table 14: Words borrowed from Slavic languages
Table 15: Words borrowed from New-Greek
Table 16: More words borrowed from New-Greek
Table 17: Words borrowed

Recommended publications
  • Instituti Albanologjik I Prishtinës
    Prof. asc. Dr. Begzad Baliu, Onomastikë dhe identitet (Onomastics and Identity). Reviewer: Prof. Dr. Bahtijar Kryeziu. Era publishing house, Prishtina, 2012. The publication of this book was supported by the Directorate for Culture of the Municipality of Prishtina.
    Dedication: To Bardha, Era and Eni, my children!
    Contents: Reviewer's foreword. Introduction. I. The onomastics of Kosovo: between myths and identities. I.1. Onomastics as destiny. I.2. Onomastics and the origin of the Albanians. I.3. Onomastics and politics. II.1. The ethnonym Kosovar. II.2. Preserving homogeneity. II.3. The toponym Kosovo and the ethnonym Kosovar. III.1. The Albanian-Slavic context of toponymy. III.2. The clash: the structure of toponymy. III.3. The multilayered structure of Kosovo's toponymy. III.4. The context. III.5. Standardization of toponymy and language. III.6. Standardization of toponymy - the blow
  • Failures and Achievements of Albanian Nationalism in the Era of Nationalism
    FAILURES AND ACHIEVEMENTS OF ALBANIAN NATIONALISM IN THE ERA OF NATIONALISM. Nuray BOZBORA.
    Abstract: The development of Albanian nationalism was not uniform from the beginning and it followed distinct patterns. First there were local protest movements; some were culturally based while others were created by the local elite to protest against local and specific problems. Later these different patterns in Albanian nationalism turned into a mass uprising during 1910 and 1911. The aim of this paper is to understand the crucial period of the mass uprising of Albanians and to analyse how these different patterns in the movement participated and expressed themselves, what the basic motivation for uniting around a common purpose was, and the ease and difficulties in this regard.
    Keywords: Nationalism, Nation, National Identity, Albanian Nationalism, Balkan Nationalism
    Özet (Turkish abstract, translated): From its beginnings, Albanian nationalism did not develop as a uniform movement but showed internal differences. These different patterns of development within the movement first emerged as local protests, culturally based movements, and protest movements of local elites against specific problems. Later, these different patterns within Albanian nationalism turned into a mass uprising in 1910 and 1911. The aim of the study is this important period in which the mass Albanian uprising emerged, how the differences within the movement positioned themselves, how they expressed
  • Albanian Language Identification in Text
    BSHN(UT) 23/2017. ALBANIAN LANGUAGE IDENTIFICATION IN TEXT DOCUMENTS. Klesti Hoxha (University of Tirana, Faculty of Natural Sciences, Department of Informatics) and Artur Baxhaku (University of Tirana, Faculty of Natural Sciences, Department of Mathematics).
    Abstract: In this work we investigate the accuracy of standard and state-of-the-art language identification methods in identifying Albanian in written text documents. A dataset consisting of news articles written in Albanian has been constructed for this purpose. We noticed a considerable decrease in accuracy when using test documents that lack the Albanian alphabet letters “Ë” and “Ç”, and created a custom training corpus that solved this problem, achieving an accuracy of more than 99%. Based on our experiments, the best-performing language identification methods for Albanian use a naïve Bayes classifier and n-gram based classification features.
    Keywords: Language identification, text classification, natural language processing, Albanian language.
    Përmbledhje (Albanian summary, translated): This paper examines the accuracy of several standard and contemporary methods in identifying the Albanian language in text documents. For this purpose, a test dataset containing news articles written in Albanian was built. For Albanian texts that do not contain the letters “Ë” and “Ç”, a considerable drop in language identification accuracy was observed. For this reason, a dedicated training corpus was created that solved this problem, achieving an accuracy of more than 99%. Based on the experiments carried out, the most accurate methods for identifying Albanian use a naive Bayes classifier and classification features based on n-grams.
  • Reflections on the Religionless Society: the Case of Albania
    Occasional Papers on Religion in Eastern Europe, Volume 16, Issue 4, Article 1, 8-1996. Reflections on the Religionless Society: The Case of Albania. Denis R. Janz, Loyola University, New Orleans, Louisiana.
    Recommended Citation: Janz, Denis R. (1996) "Reflections on the Religionless Society: The Case of Albania," Occasional Papers on Religion in Eastern Europe: Vol. 16: Iss. 4, Article 1. Available at: https://digitalcommons.georgefox.edu/ree/vol16/iss4/1
    REFLECTIONS ON THE RELIGIONLESS SOCIETY: THE CASE OF ALBANIA. By Denis R. Janz. Denis R. Janz is professor of religious studies at Loyola University, New Orleans, Louisiana.
    From the time of its inception as a discipline, the scientific study of religion has raised the question of the universality of religion. Are human beings somehow naturally religious? Has there ever been a truly religionless society? Is modernity itself inimical to religion, leading slowly but nevertheless inexorably to its extinction? Or does a fundamental human religiosity survive and mutate into ever new forms, as it adapts itself to the exigencies of the age? There are as of yet no clear answers to these questions. And religiologists continue to search for the irreligious society, or at least for the society in which religion is utterly devoid of any social significance, where the religious sector is a tiny minority made up largely of elderly people and assorted marginal figures.
  • Albanian Families' History and Heritage Making at the Crossroads of New
    Voicing the stories of the excluded: Albanian families’ history and heritage making at the crossroads of new and old homes. Eleni Vomvyla, UCL Institute of Archaeology. Thesis submitted for the award of Doctor in Philosophy in Cultural Heritage, 2013.
    Declaration of originality: I, Eleni Vomvyla, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.
    Dedication: To the five Albanian families for opening their homes and sharing their stories with me.
    Abstract: My research explores the dialectical relationship between identity and the conceptualisation/creation of history and heritage in migration by studying a socially excluded group in Greece, that of Albanian families. Even though the Albanian community has more than twenty years of presence in the country, its stories, often invested with otherness, remain hidden in the Greek ‘mono-cultural’ landscape. In opposition to these stigmatising discourses, my study draws on movements democratising the past and calling for engagements from below by endorsing the socially constructed nature of identity and the denationalisation of memory. A nine-month fieldwork with five Albanian families took place in their domestic and neighbourhood settings in the areas of Athens and Piraeus. Based on critical ethnography, data collection was derived from participant observation, conversational interviews and participatory techniques. From an individual and family group point of view the notion of habitus led to diverse conceptions of ethnic identity, taking transnational dimensions in families’ literal and metaphorical back-and-forth movements between Greece and Albania.
  • Form, Function and History of the Present Suffix -I/-Ën in Albanian and Its Dialects
    M.A. Lopuhaä. Form, function and history of the present suffix -i/-ën in Albanian and its dialects. Master Thesis, July 1, 2014. Supervisor: Dr. M.A.C. de Vaan.
    Contents
    1 Introduction
    2 Conventions and notation
    3 Background and statement of the problem
      3.1 The Albanian verbal system
      3.2 The Proto-Albanian verbal system
      3.3 Main research questions
      3.4 Previous work on the subject
    4 Morphological changes from Old Albanian to Modern Albanian
      4.1 Verbal endings in Old and Modern Albanian
      4.2 Present singular
      4.3 Present plural
      4.4 Imperfect and subjunctive
    5 Proto-Albanian reconstruction
    6 Proto-Indo-European reconstruction
      6.1 Vocalic nasals in Albanian
      6.2 The reality of a PIE suffix *-n-ie/o-
    7 Dialectal information
      7.1 Buzuku
      7.2 Northwestern Geg
      7.3 Northern Geg
      7.4 Northeastern Geg
      7.5 Central Geg
      7.6 Southern Geg
      7.7 Transitory dialects
  • Alexander's Empire
    4 Alexander’s Empire
    MAIN IDEA (EMPIRE BUILDING): Alexander the Great conquered Persia and Egypt and extended his empire to the Indus River in northwest India.
    WHY IT MATTERS NOW: Alexander’s empire extended across an area that today consists of many nations and diverse cultures.
    TERMS & NAMES: Philip II, Macedonia, Alexander the Great, Darius III.
    SETTING THE STAGE: The Peloponnesian War severely weakened several Greek city-states. This caused a rapid decline in their military and economic power. In the nearby kingdom of Macedonia, King Philip II took note. Philip dreamed of taking control of Greece and then moving against Persia to seize its vast wealth. Philip also hoped to avenge the Persian invasion of Greece in 480 B.C.
    TAKING NOTES (Outlining): Use an outline to organize main ideas about the growth of Alexander's empire. Outline headings: Alexander's Empire. I. Philip Builds Macedonian Power (A., B.). II. Alexander Conquers Persia.
    Philip Builds Macedonian Power: The kingdom of Macedonia, located just north of Greece, had rough terrain and a cold climate. The Macedonians were a hardy people who lived in mountain villages rather than city-states. Most Macedonian nobles thought of themselves as Greeks. The Greeks, however, looked down on the Macedonians as uncivilized foreigners who had no great philosophers, sculptors, or writers. The Macedonians did have one very important resource: their shrewd and fearless kings.
    Philip’s Army: In 359 B.C., Philip II became king of Macedonia. Though only 23 years old, he quickly proved to be a brilliant general and a ruthless politician. Philip transformed the rugged peasants under his command into a well-trained professional army.
  • The Albanian Case in Italy
    Palaver 9 (2020), n. 1, 221-250. e-ISSN 2280-4250. DOI 10.1285/i22804250v9i1p221. http://siba-ese.unisalento.it, © 2020 Università del Salento. Majlinda Bregasi, Università “Hasan Prishtina”, Pristina. The socioeconomic role in linguistic and cultural identity preservation: the Albanian case in Italy.
    Abstract: In this article, the author explores the impact of an ever-changing social and economic environment on the preservation of cultural and linguistic identity, with a focus on the Albanian community in Italy. Comparisons are drawn between the first major migration of Albanians to Italy in the XV century and the most recent ones in the XX, with a detailed study of the use and preservation of the native language as the main identity trait. This comparison presented a unique case study, as the descendants of the Arbëresh (the first major Albanian migration) came in close contact, in a very specific set of circumstances, with modern Albanians. The conclusions in this article are substantiated by a survey of 85 immigrant families throughout Italy. The Albanian language is considered one of the fundamental elements of Albanian identity. It was the foundation for the rise of the national awareness process during the Renaissance. But the situation of the Albanian language nowadays in Italy among second-generation immigrants shows us a fragile identity.
    Keywords: Language identity; national identity; immigrants; Albanian language; assimilation.
    1. An historical glance: There are two basic dialect forms of Albanian: Gheg (which is spoken in most of Albania north of the Shkumbin river, as well as in Montenegro, Kosovo, Serbia, and Macedonia) and Tosk (which is spoken south of the Shkumbin river and into Greece, as well as in traditional Albanian diaspora settlements in Italy, Bulgaria, Greece and Ukraine).
  • The Resurrection of Alexander Push Kin John Oliver Killens
    New Directions Volume 5 | Issue 2 Article 9 1-1-1978 The Resurrection Of Alexander Push kin John Oliver Killens Follow this and additional works at: http://dh.howard.edu/newdirections Recommended Citation Killens, John Oliver (1978) "The Resurrection Of Alexander Push kin," New Directions: Vol. 5: Iss. 2, Article 9. Available at: http://dh.howard.edu/newdirections/vol5/iss2/9 This Article is brought to you for free and open access by Digital Howard @ Howard University. It has been accepted for inclusion in New Directions by an authorized administrator of Digital Howard @ Howard University. For more information, please contact [email protected]. TH[ ARTS Essay every one of the courts of Europe, then 28 The Resurrection of of the 19th century? Here is how it came to pass. Peter felt that he had to have at least Alexander Pushkin one for his imperial court. Therefore, In the early part of the 18th century, in By [ohn Oliver Killens he sent the word out to all of his that sprawling subcontinent that took Ambassadors in Europe: To the majority of literate Americans, up one-sixth of the earth's surface, the giants of Russian literature are extending from the edge of Europe "Find me a Negro!" Tolstoy, Gogol, Dostoevsky and thousands of miles eastward across Meanwhile, Turkey and Ethiopia had Turgenev. Nevertheless, 97 years ago, grassy steppes (plains), mountain ranges been at war, and in one of the skirmishes at a Pushkin Memorial in Moscow, and vast frozen stretches of forest, lakes a young African prince had been cap- Dostoevsky said: "No Russian writer and unexplored terrain, was a land tured and brought back to Turkey and was so intimately at one with the known as the Holy Russian Empire, fore- placed in a harem.
    [Show full text]
  • ISO/IEC JTC1/SC2/WG2 N4131R L2/11-296R
    ISO/IEC JTC1/SC2/WG2 N4131R, L2/11-296R, 2011-10-28. Universal Multiple-Octet Coded Character Set. International Organization for Standardization.
    Doc Type: Working Group Document
    Title: Proposal for encoding the Caucasian Albanian script in the SMP of the UCS
    Source: UC Berkeley Script Encoding Initiative (Universal Scripts Project)
    Authors: Michael Everson and Jost Gippert
    Status: Liaison Contribution
    Action: For consideration by JTC1/SC2/WG2 and UTC
    Date: 2011-10-28
    1. Introduction. Tradition has it that the Armenian bishop Mesrop Mashtocʿ devised a script in the early fifth century CE not only for the Armenians, but also for the Caucasian Albanians, who lived in an area northeast of Armenia. (Caucasian Albania is not the same as European Albania.) The Caucasian Albanian script was recognized in 1937 on the basis of an alphabet list in an Armenian manuscript in the Matenadaran collection in Yerevan and confirmed by a few inscriptions on artifacts excavated in northwest Azerbaijan around 1950 (Abuladze 1938:69–71, Šanidze 1938:47, Gippert et al 2009–10 §4.3ff.). In the 1990s two palimpsest manuscripts containing the Caucasian Albanian script were discovered by Zaza Aleksidze in St Catherine’s Monastery on Mount Sinai. These undated manuscripts appear to have been written during the seventh century CE, because the Caucasian Albanian state was conquered by the Arabs and their autonomous church was absorbed into the Armenian patriarchate in the 8th century CE (Aleksidzé and Mahé 1997:517, 2001; Aleksidze 2003). Between 1999 and 2008, the palimpsests were deciphered and the structure of the Caucasian Albanian language and script established by Gippert, Schulze, Aleksidzé, and Mahé, who also demonstrated that Caucasian Albanian was closely related to, if not an ancestor of, the present-day Udi language.
  • Një Udhëzues Për Festat Pagane Shqiptare (A Guide to Albanian Pagan Festivities), © Pagan Shqiptar, © ATP, 2021
    (Mali i Tomorrit, ‘‘Olimpi’’ Shqiptar - Tomorri Mountain, Albanian "Olympus")
    Parathënie (Preface), translated from Albanian: The articles collected in this work constitute the first work devoted entirely to Albanian pagan festivals. Published throughout 2020, each of them focuses on a specific festival in a brief manner that allows a better understanding of them, even for those who have no knowledge of our Albanian pagan festivals, but they also contain many interesting details that bring out new perspectives and meanings on our pagan festivals. Indeed, these are not the already established presentations of our ancient pagan festivals; rather, they are filled with interpretations that enable the reader to understand and appreciate them at the highest levels. On the symbolic level, every festival should be understood as the earthly reflection of a higher cosmic reality. By revolving around these festivals, our ancestors were attuned to the rhythm of Nature and in harmony with the Cosmos, the divine discipline of the universe. On the historical level, the fact that some of these practices were still observed among our people in the middle of the twentieth century shows that they are not an insignificant part of our identity. Moreover, their observance, regardless of the divisions of faith among our people, testifies to our common identity and truly makes us one people. Indeed, beneath our ancient pagan festivals, we find values that our ancestors respected and held most dear.
  • Morphological Tagging and Lemmatization of Albanian: a Manually Annotated Corpus and Neural Models
    Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models. Nelda Kote¹, Marenglen Biba², Jenna Kanerva³, Samuel Rönnqvist³ and Filip Ginter³. ¹Faculty of Information Technology, Polytechnic University of Tirana, Albania; ²Faculty of Information Technology, New York University of Tirana, Albania; ³TurkuNLP Group, University of Turku, Finland. {jmnybl, saanro, figint}@utu.fi
    Abstract: In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it. There is currently a lack of available NLP resources for Albanian, and its complex grammar and morphology present challenges to their development. We have created an Albanian part-of-speech corpus based on the Universal Dependencies schema for morphological annotation, containing about 118,000 tokens of naturally occurring text collected from different text sources, with an addition of 67,000 tokens of artificially created simple sentences used only in training. On this corpus, we subsequently train and evaluate segmentation, morphological tagging and lemmatization models, using the Turku Neural Parser Pipeline. On the held-out evaluation set, the model achieves 92.74% accuracy on part-of-speech tagging, 85.31% on morphological tagging, and 89.95% on lemmatization. The manually annotated corpus, as well as the trained models, are available under an open license.
    Keywords: part-of-speech tagging, morphological tagging, lemmatization, Albanian language
    1. Introduction: The Albanian language is an Indo-European language spoken by around 7 million native speakers mostly in the [...]
    2. Related Work: There are several previous attempts to develop NLP tools for Albanian.