Neural Machine Translation Between Similar South-Slavic Languages

Total Page:16

File Type:pdf, Size:1020Kb

Neural Machine Translation Between Similar South-Slavic Languages Neural Machine Translation between similar South-Slavic languages Maja Popovic,´ Alberto Poncelas ADAPT Centre, School of Computing Dublin City University, Ireland [email protected] Abstract (NMT) has not been investigated yet for these lan- guages. This paper describes the ADAPT-DCU ma- In this work, we first compare bilingual and mul- chine translation systems built for the WMT tilingual systems in order to determine whether 2020 shared task on Similar Language Trans- joining Serbian and Croatian data is useful. After- lation. We explored several set-ups for wards, we investigate additional cleaning of remain- NMT for Croatian–Slovenian and Serbian– ing misaligned segments by using character n-gram Slovenian language pairs in both translation directions. Our experiments focus on differ- matching scores (Popovic´, 2015). The beauty of ent amounts and types of training data: we the method for similar languages is that it can be ap- first apply basic filtering on the OpenSubti- plied directly to the given training corpus providing tles training corpora, then we perform addi- matching scores for each pair of the source-target tional cleaning of remaining misaligned seg- segments. For distant languages, translation of one ments based on character n-gram matching. side of the training corpus would be required. Fi- Finally, we make use of additional monolin- nally, we make use of monolingual data in each of gual data by creating synthetic parallel data through back-translation. Automatic evalua- the three languages by creating additional synthetic tion shows that multilingual systems with joint parallel training sets via back-translation (Sennrich Serbian and Croatian data are better than bilin- et al., 2016a; Poncelas et al., 2018; Burlot and gual, as well as that character-based cleaning Yvon, 2018). leads to improved scores while using less data. The results also confirm once more that adding 2 Language properties back-translated data further improves the per- formance, especially when the synthetic data Common properties All three languages, Croa- is similar to the desired domain of the devel- tian, Serbian and Slovenian, belong to the South- opment and test set. This, however, might Western Slavic branch. As Slavic languages, they come at a price of prolonged training time, es- have a very rich inflectional morphology for all pecially for multitarget systems. word classes: six cases and three genders for all nouns, pronouns, adjectives and determiners. For 1 Introduction verbs, person and many tenses are expressed by the suffix so that the subject pronoun is often omitted. Machine translation (MT) between closely related There are two verb aspects, so that many verbs have languages is, in principle, less challenging than perfective and imperfective form(s) depending on translation between distantly related languages, but the duration of the described action. As for syntax, it is still far from being solved. While MT be- all three languages have quite a free word order, tween closely related South-Western Slavic lan- and neither language uses articles, either definite guages, Croatian, Slovenian and Serbian based on or indefinite. In addition to this, multiple negation the rule-based (RBMT) and the phrase-based (PB- is always used. SMT) approaches has been investigated in the last years (Etchegoyhen et al., 2014; Petkovski et al., Croatian and Serbian Croatian and Serbian ex- 2014; Klubickaˇ et al., 2016; Arcanˇ et al., 2016; hibit a large overlap in vocabulary and a strong Popovic´ et al., 2016a), to the best of our knowledge, morpho-syntactic similarity so that the speakers the new state-of-the-art neural machine translation can understand each other without difficulties. Nev- 430 Proceedings of the 5th Conference on Machine Translation (WMT), pages 430–436 Online, November 19–20, 2020. c 2020 Association for Computational Linguistics ertheless, there is a number of small but notable lang. set domain # sentences and also frequently occurring differences between sl-hr train Subtitles 11 213 386 them. The largest differences between the two dev PR publications 2457 languages are in vocabulary: some words are com- test PR publications 2582 pletely different, some however differ only by one sl-sr train Subtitles 11 780 062 or two letters. Apart from lexical differences, there dev PR publications 1259 are also structural differences mainly concerning test PR publications 1260 verbs: modal verb constructions, future tense, as well as conditional. Table 1: Corpus statistics. Slovenian Even though Slovenian is very closely related to Croatian and Serbian, and the languages Croatian and vice versa was shown to be helpful share a large degree of mutual intelligibility, a num- for the PBSMT approach (Popovic´ and Ljubesiˇ c´, ber of Croatian/Serbian speakers may have difficul- 2014; Popovic´ et al., 2016b). However, our prelim- ties with Slovenian and the other way round. The inary experiments in this direction indicated that nature of the lexical differences is similar to the one this technique is not helpful for the NMT approach. between Croatian and Serbian, namely a number of words is completely different and a number only The original parallel data were filtered in order differs by one or two letters. However, the amount to eliminate noisy parts: too long segments (more of different words is much larger. In addition to than 100 words), segment pairs with dispropor- that, the set of overlapping words includes a num- tional sentence lengths, segments with more than ber of false friends (e.g. brati means to pluck in 1/3 of non-alphanumeric characters, as well as du- Croatian and Serbian but to read in Slovenian). plicate segment pairs were removed. The statistics The amount of grammatical differences is also of the remaining subtitles together with the devel- larger and includes local word order, verb mood opment and test sets is shown in Table1. The and/or tense formation, question structure, usage development and test sets were provided by the of cases, structural properties for certain conjunc- organisers and originate from Public Relations pub- tions, as well as some other structural differences. lications of a business intelligence company. Another important difference is the Slovenian dual grammatical number which refers to two entities (apart from singular for one and plural for more 3.1 Additional cleaning of OpenSubtitles than two). It requires additional set of pronouns, as well as additional sets for noun, adjective and verb While a large number of noisy parts and misaligned inflexion rules not existing either in Croatian or in segments was removed from OpenSubtitles by the Serbian. basic filtering procedure, a number of misaligned segments still remained. In order to remove these, 3 Data we applied additional cleaning based on the char- For training, we used publicly available OPUS1 acter n-gram F-score chrF usually used for MT parallel corpora (Tiedemann, 2012) indicated by evaluation (Popovic´, 2015). For the purpose of the workshop organisers. OpenSubtitles is indi- cleaning, the chrF score is calculated for each pair cated for all translation directions. For Croatian– of segments in the training data. Due to simi- Slovenian, other corpora are indicated too, but they larity between the languages, the scores between are either not sentence-aligned (JW300) or are ex- the properly aligned segments are higher than the tremely noisy (DGT, MultiParaCrawl). Therefore, scores of misaligned segments. Nevertheless, the we decided to use only OpenSubtitles for all trans- languages are sufficiently different so that some lation directions. properly aligned short segments (or single words) can have low scores, too. Still, if those words It is worth noting that the organisers also in- also appear in longer sentences, they will not be dicated the SETIMES News parallel Croatian– removed. Preliminary experiments with different Serbian corpus. Developing an additional Croatian– thresholds showed that keeping the segments with Serbian MT system for converting Serbian data into the chrF score equal or greater than 20 is the best 1http://opus.nlpl.eu/ option. 431 3.2 Using monolingual data lion best ranked Croatian sentences were translated In addition to the parallel OpenSubtitles corpora, into Slovenian. we also used the monolingual data in each of the Translation into Slovenian: Slovenian is the target three languages which were indicated by the or- language for two translation directions, and we ganisers, namely the mixed-domain data collected wanted to have equally relevant Slovenian sen- from Web, hrWac, slWac and hrWac (Ljubesiˇ c´ and tences for both directions. Therefore, we did not Erjavec, 2011; Ljubesiˇ c´ and Klubickaˇ , 2014). As take the first two million sentences for one source a first step, we removed too long and too short language and the second two million for the other, sentences, keeping those between 5 and 60 words. because the Slovenian sentences for the first source Then, we removed sentences with more than 1/3 language would be more relevant than those for the of non-alphanumeric characters, sentences with second source language. Instead, we took the first URLs, as well as duplicate sentences. four million best ranked Slovenian sentences, and Then, we wanted to rank these sentences accord- then translated every odd sentence into Serbian and ing to the relevance for our experiments, namely every even sentence into Croatian. according to their similarity to the development corpus. For this purpose, we used Feature Decay 4 MT systems Algorithm (FDA) (Bic¸ici and Yuret, 2011). This All our systems are built using the Sockeye imple- method iteratively selects sentences from an ini- mentation (Hieber et al., 2018) of the Transformer tial set S based on the number of n-grams which architecture (Vaswani et al., 2017). The systems overlap with an in-domain text Seed and adds these operate on sub-word units generated by byte-pair sentences to a selected set Sel. In addition, in order encoding (BPE) (Sennrich et al., 2016b).
Recommended publications
  • The Relative Order of Foci and Polarity Complementizers
    1 2 3 4 The Relative Order of Foci and Polarity Complementizers: 5 1 6 A Slavic Perspective 7 8 9 Abstract 10 11 According to Rizzi & Bocci’s (2017) suggested hierarchy of the left 12 periphery, fronted foci (FOC) may never precede polarity 13 14 complementizers (POL); yet languages like Bulgarian and Macedonian, 15 where POL may also follow FOC, seem to provide a counterargument to 16 17 such generalization. On the basis of a cross-linguistic comparison of ten 18 Slavic languages, I argue that in the Slavic subgroup the possibility of 19 20 having a focus precede POL is dependent on the morphological properties 21 of the complementizer itself: in languages where the order FOC < POL 22 23 is acceptable, POL is a complex morpheme derived through the 24 incorporation of a lower functional head with a higher one. The order 25 26 FOC < POL is then derived by giving overt spell-out to the intermediate 27 copy of the polarity complementizer rather than to the highest one. 28 29 30 Keywords : Left Periphery, Fronted Foci, Polarity Questions, 31 Complementizers, Bulgarian, Macedonian, Word Order. 32 33 34 I. Introduction 35 36 37 In this article, I will be concerned with accounting for cross-linguistic variation in the 38 relative distribution of two types of left-peripheral elements. The first are polarity 39 40 complementizers (POL), namely complementizers whose function is that of introducing 41 embedded polarity questions; the second are fronted types of constituents in narrow focus. 42 43 I provide an example of a configuration containing both elements in (1), from Italian; in 44 (1), the polarity complementizer “se” (=if) is marked in bold, whereas the fronted focus -in 45 this case, a PP- is in capitals.
    [Show full text]
  • The Production of Lexical Tone in Croatian
    The production of lexical tone in Croatian Inauguraldissertation zur Erlangung des Grades eines Doktors der Philosophie im Fachbereich Sprach- und Kulturwissenschaften der Johann Wolfgang Goethe-Universität zu Frankfurt am Main vorgelegt von Jevgenij Zintchenko Jurlina aus Kiew 2018 (Einreichungsjahr) 2019 (Erscheinungsjahr) 1. Gutacher: Prof. Dr. Henning Reetz 2. Gutachter: Prof. Dr. Sven Grawunder Tag der mündlichen Prüfung: 01.11.2018 ABSTRACT Jevgenij Zintchenko Jurlina: The production of lexical tone in Croatian (Under the direction of Prof. Dr. Henning Reetz and Prof. Dr. Sven Grawunder) This dissertation is an investigation of pitch accent, or lexical tone, in standard Croatian. The first chapter presents an in-depth overview of the history of the Croatian language, its relationship to Serbo-Croatian, its dialect groups and pronunciation variants, and general phonology. The second chapter explains the difference between various types of prosodic prominence and describes systems of pitch accent in various languages from different parts of the world: Yucatec Maya, Lithuanian and Limburgian. Following is a detailed account of the history of tone in Serbo-Croatian and Croatian, the specifics of its tonal system, intonational phonology and finally, a review of the most prominent phonetic investigations of tone in that language. The focal point of this dissertation is a production experiment, in which ten native speakers of Croatian from the region of Slavonia were recorded. The material recorded included a diverse selection of monosyllabic, bisyllabic, trisyllabic and quadrisyllabic words, containing all four accents of standard Croatian: short falling, long falling, short rising and long rising. Each target word was spoken in initial, medial and final positions of natural Croatian sentences.
    [Show full text]
  • The Potentiality of Pluricentrism Albanian Case Studies and Beyond
    AlbF Albanische Forschungen 41 41 The Potentiality of Pluricentrism of The Potentiality The Potentiality of Pluricentrism Albanian Case Studies and Beyond Edited by Lumnije Jusufi Harrassowitz www.harrassowitz-verlag.de Harrassowitz Verlag Albanische Forschungen Begründet von Georg Stadtmüller Für das Albanien-Institut herausgegeben von Peter Bartl unter Mitwirkung von Bardhyl Demiraj, Titos Jochalas und Oliver Jens Schmitt Band 41 2018 Harrassowitz Verlag . Wiesbaden The Potentiality of Pluricentrism Albanian Case Studies and Beyond Edited by Lumnije Jusufi 2018 Harrassowitz Verlag . Wiesbaden Bibliografi sche Information der Deutschen Nationalbibliothek Die Deutsche Nationalbibliothek verzeichnet diese Publikation in der Deutschen Nationalbibliografi e; detaillierte bibliografi sche Daten sind im Internet über http://dnb.dnb.de abrufbar. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografi e; detailed bibliographic data are available in the internet at http://dnb.dnb.de For further information about our publishing program consult our website http://www.harrassowitz-verlag.de © Otto Harrassowitz GmbH & Co. KG, Wiesbaden 2018 This work, including all of its parts, is protected by copyright. Any use beyond the limits of copyright law without the permission of the publisher is forbidden and subject to penalty. This applies particularly to reproductions, translations, microfilms and storage and processing in electronic systems. Printed
    [Show full text]
  • Mutual Intelligibility Between West and South Slavic Languages Golubovic, Jelena; Gooskens, Charlotte
    University of Groningen Mutual intelligibility between West and South Slavic languages Golubovic, Jelena; Gooskens, Charlotte Published in: Russian Linguistics DOI: 10.1007/s11185-015-9150-9 IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2015 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Golubovic, J., & Gooskens, C. (2015). Mutual intelligibility between West and South Slavic languages. Russian Linguistics, 39, 351–373. https://doi.org/10.1007/s11185-015-9150-9 Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 24-09-2021 Russ Linguist (2015) 39:351–373 DOI 10.1007/s11185-015-9150-9 Mutual intelligibility between West and South Slavic languages Взаимопонимание между западнославянскими и южнославянскими языками Jelena Golubovic´1 · Charlotte Gooskens1 Published online: 18 September 2015 © The Author(s) 2015.
    [Show full text]
  • Language in Croatia: Influenced by Nationalism
    Language in Croatia: Influenced by Nationalism Senior Essay Department of Linguistics, Yale University CatherineM. Dolan Primary Advisor: Prof. Robert D. Greenberg Secondary Advisor: Prof. Dianne Jonas May 1, 2006 Abstract Language and nationalism are closely linked, and this paper examines the relationship between the two. Nationalism is seen to be a powerful force which is capable ofusing language for political purposes, and the field oflinguistics has developed terminology with which the interface oflanguage and nationalism maybe studied. Using this background, the language situation in Croatia may be examined and seen to be complex. Even after thorough evaluation it is difficult to determine how languages and dialects should be delineated in Croatia, but it is certain that nationalism and politics play key roles in promoting the nation's linguistic ideals. 2 , Acknowledgements I suppose I could say that this essay was birthed almost two years ago, when I spent the summer traveling with a group ofstudents throughout Croatia, Bosnia and Serbia in order to study issues ofjustice and reconciliation. Had I never traveled in the region I may have never gained an interest in the people, their history and, yes, their language(s). Even after conducting a rigorous academic study ofthe issues plaguing former Socialist Federal Republic ofYugoslavia, I carry with me the impression that this topic can never be taken entirely into the intellectual realm; I am reminded by my memories that the Balkan conflicts involve people just as real as myself. For this, I thank all those who shared those six weeks oftraveling. That summer gave me new perspectives on many areas oflife.
    [Show full text]
  • Mutual Intelligibility and Slavic Constructed Interlanguages : a Comparative Study of Ruski Jezik and ​ ​ Interslavic
    Mutual intelligibility and Slavic constructed interlanguages : a comparative study of Ruski Jezik and ​ ​ Interslavic ** Jade JOANNOT M.A Thesis in Linguistics Leiden university 2020-2021 Supervisor : Tijmen Pronk TABLE OF CONTENTS Jade Joannot M.A Thesis Linguistics 24131 words 1.1. Abstract 1.2. Definitions 1.2.1. Constructed languages 1.2.2. Interlanguage 1.2.3. Mutual intelligibility 1.3. Object of study 1.3.1. History of Slavic constructed languages Pan-Slavic languages (19th century) Esperanto-inspired projects Contemporary projects 1.3.2. Ruski Jezik & Interslavic Ruski Jezik (17th century) Interslavic (21th century) 1.3.3. Shared aspects of Ruski Jezik and Interslavic 1.4. Relevance of the study 1.4.1. Constructed languages and mutual intelligibility 1.4.2. Comparative study of Ruski Jezik and Interslavic 1.4.3. Historical linguistics 1.5. Structure of the thesis 1.5.1. Research question 1.8. Description of the method 1.8.1. Part 1 : Approaches to Slavic mutual intelligibility and their conclusions 1.8.2. Part 2 : Study of Ruski Jezik and Interslavic I.1. Factors of mutual intelligibility I.1.1. Extra-linguistic factors I.1.2. Linguistic predictors of mutual intelligibility I.1.2.1. Lexical distance I.1.2.2. Phonological distance I.1.2.3. Morphosyntactic distance I.1.2.3.1. Methods of measurements I.1.2.3.2. The importance of morphosyntax I.1.3. Conclusions I.2. Mutual intelligibility in the Slavic area I.2.1. Degree of mutual intelligibility of Slavic languages I.2.2. The case of Bulgarian 2 Jade Joannot M.A Thesis Linguistics 24131 words I.2.3.
    [Show full text]
  • Macedonian: Genealogy, Typology and Sociolinguistics
    JEZIKOSLOVNI ZAPISKI 26 2020 2 43 MATEJ šEKLI MACEDONIAN: GENEALOGY, TYPOLOGY AND SOCIOLINGUISTICS COBISS: 1.01 HTTPS://DOI.ORG/10.3986/JZ.26.2.04 Makedonščina: geneo-, tipo- in sociolingvistična opredelitev V prispevku je makedonščina opredeljena geneo-, tipo- in sociolingvistično. S stališča genealoškega jezikoslovja je s pomočjo relativne kronologije in zemljepisne razširjeno- sti relevantnih jezikovnih sprememb prikazano oblikovanje makedonščine in bolgarščine kot geolektov znotraj vzhodne južne slovanščine. Z gledišča tipološkega jezikoslovja je umeščena v kontekst balkanske jezikovne zveze. Sociolingvistični pogled pa prikaže pro- ces standardizacije in pravni položaj sodobnega makedonskega knjižnega/standardnega jezika kot sociolekta. Ključne besede: makedonščina, bolgarščina, genealosko jezikoslovje, tipološko jeziko- slovje, sociolingvistika The article attempts to define Macedonian from the view-point of linguistic genealogy and typology as well as sociolinguistics. The genesis of Macedonian and Bulgarian as geolects within Eastern South Slavic is discussed from the vantage point of genealogical linguistics, using the relative chronology and the geographical distribution of the individ- ual linguistic changes. The typological part of the discussion then attempts to establish the position of Macedonian in the context of the so-called Balkan Sprachbund. Finally, the process of standardisation and the legal status of modern Macedonian literary/standard language as a sociolect are presented, thus shedding additional light on the
    [Show full text]
  • The Loss of Case Inflection in Bulgarian and Macedonian
    SLAVICA HELSINGIENSIA 47 THE LOSS OF CASE INFLECTION IN BULGARIAN AND MACEDONIAN Max Wahlström HELSINKI 2015 SLAVICA HELSINGIENSIA 47 Series editors Tomi Huttunen, Jouko Lindstedt, Ahti Nikunlassi Published by: Department of Modern Languages P.O. Box 24 (Unioninkatu 40 B) 00014 University of Helsinki Finland Copyright © by Max Wahlström ISBN 978-951-51-1185-2 (paperback) ISBN 978-951-51-1186-9 (PDF) ISSN-L 0780-3281, ISSN 0780-3281 (Print), ISSN 1799-5779 (Online) Summary Case inflection, characteristic of Slavic languages, was lost in Bulgarian and Macedonian approximately between the 11th and 16th centuries. My doctoral dissertation examines the process of this language change and sets out to find its causes and evaluate its consequences. In the earlier research literature, the case loss has been attributed either to language contacts or language internal sound changes, yet none of the theories based on a single explaining factor has proven satisfactory. In this study, I argue that the previous researchers of the Late Medieval manuscripts have often tried to date changes in the language earlier than what is plausible in light of the textual evidence. Also, I propose that the high number of second language speakers is among the key factors that reduced the number of morphological categories in the language, but, at the same time, several minor developments related to the case loss—for instance, in the marking of possession—are likely to result from a specific contact mechanism known as the Balkan linguistic area. My main methodological argument is that the study of language contacts must take into account a general typological perspective to determine the uniqueness of the suspected contact-induced changes.
    [Show full text]
  • Mutual Intelligibility Between West and South Slavic Languages Взаимопонимание Между Западнославянскими И Южнославянскими Языками
    Russ Linguist (2015) 39:351–373 DOI 10.1007/s11185-015-9150-9 Mutual intelligibility between West and South Slavic languages Взаимопонимание между западнославянскими и южнославянскими языками Jelena Golubovic´1 · Charlotte Gooskens1 Published online: 18 September 2015 © The Author(s) 2015. This article is published with open access at Springerlink.com Abstract In the present study we tested the level of mutual intelligibility between three West Slavic (Czech, Slovak and Polish) and three South Slavic languages (Croatian, Slovene and Bulgarian). Three different methods were used: a word translation task, a cloze test and a picture task. The results show that in most cases, a division between West and South Slavic languages does exist and that West Slavic languages are more intelligible among speakers of West Slavic languages than among those of South Slavic languages. We found an asymmetry in Croatian-Slovene intelligibility, whereby Slovene speakers can understand written and spoken Croatian better than vice versa. Finally, we compared the three methods and found that the word translation task and the cloze test give very similar results, while the results of the picture task are somewhat unreliable. Аннотация В настоящей статье рассматривается вопрос взаимопонимания между тремя западнославянскими (чешским, словацким и польским) и тремя южнославян- скими (хорватским, словенским и болгарским) языками. В исследовании использова- лись три различных метода: задание перевода слов, тестовое задание с заполнением пробелов и изобразительное задание. Результатытеста показывают,что в большинстве случаев разделение между западнославянскими и южнославянскими языками сущест- вует. При этом западнославянские языки—более близки между собой в смысле пони- мания для их носителей, в то время как у носителей южнославянских языков такое взаимопонимание затруднено.
    [Show full text]
  • Musical Practices in the Balkans: Ethnomusicological Perspectives
    MUSICAL PRACTICES IN THE BALKANS: ETHNOMUSICOLOGICAL PERSPECTIVES МУЗИЧКЕ ПРАКСЕ БАЛКАНА: ЕТНОМУЗИКОЛОШКЕ ПЕРСПЕКТИВЕ СРПСКА АКАДЕМИЈА НАУКА И УМЕТНОСТИ НАУЧНИ СКУПОВИ Књига CXLII ОДЕЉЕЊЕ ЛИКОВНЕ И МУЗИЧКЕ УМЕТНОСТИ Књига 8 МУЗИЧКЕ ПРАКСЕ БАЛКАНА: ЕТНОМУЗИКОЛОШКЕ ПЕРСПЕКТИВЕ ЗБОРНИК РАДОВА СА НАУЧНОГ СКУПА ОДРЖАНОГ ОД 23. ДО 25. НОВЕМБРА 2011. Примљено на X скупу Одељења ликовне и музичке уметности од 14. 12. 2012, на основу реферата академикâ Дејана Деспића и Александра Ломе У р е д н и ц и Академик ДЕЈАН ДЕСПИЋ др ЈЕЛЕНА ЈОВАНОВИЋ др ДАНКА ЛАЈИЋ-МИХАЈЛОВИЋ БЕОГРАД 2012 МУЗИКОЛОШКИ ИНСТИТУТ САНУ SERBIAN ACADEMY OF SCIENCES AND ARTS ACADEMIC CONFERENCES Volume CXLII DEPARTMENT OF FINE ARTS AND MUSIC Book 8 MUSICAL PRACTICES IN THE BALKANS: ETHNOMUSICOLOGICAL PERSPECTIVES PROCEEDINGS OF THE INTERNATIONAL CONFERENCE HELD FROM NOVEMBER 23 TO 25, 2011 Accepted at the X meeting of the Department of Fine Arts and Music of 14.12.2012., on the basis of the review presented by Academicians Dejan Despić and Aleksandar Loma E d i t o r s Academician DEJAN DESPIĆ JELENA JOVANOVIĆ, PhD DANKA LAJIĆ-MIHAJLOVIĆ, PhD BELGRADE 2012 INSTITUTE OF MUSICOLOGY Издају Published by Српска академија наука и уметности Serbian Academy of Sciences and Arts и and Музиколошки институт САНУ Institute of Musicology SASA Лектор за енглески језик Proof-reader for English Јелена Симоновић-Шиф Jelena Simonović-Schiff Припрема аудио прилога Audio examples prepared by Зоран Јерковић Zoran Jerković Припрема видео прилога Video examples prepared by Милош Рашић Милош Рашић Технички
    [Show full text]
  • From Pie to Serbian
    [A SHORTCUT] FROM PIE TO SERBIAN ALEKSANDRA TOMIC HISTORICAL LINGUISTICS UNIVERSITY OF FLORIDA, 2018 4/19/2018 "2 PRESENTATION FOCUS • What makes Serbian – Serbian? What makes Polish – Polish? • Differences between Slavic languages and other Indo-European (IE) languages • Differences among South, West and East Slavic languages • Differences within the South Slavic language group 4/19/2018 "3 CONTEMPORARY SLAVIC LANGUAGES 4/19/2018 "4 SERBIAN IN RELATION TO OTHER SLAVIC LANGUAGES 4/19/2018 "5 SERBIAN LANGUAGE • 30 phonemes, 25 consonants, 5 vowels (no diphthongs) • Interesting features: • Plenty of palatal affricates, with softness and hardness (laminality) distinctions • /r/ trill • pitch accent: short-falling, short-rising, long-falling, long-rising • 7-case system (nouns, pronouns, adjectives, determiners) • 4 verb conjugation classes • synthetic language (prefixation, suffixation, infixation) • free word order • agreement: • Determiner-adjective-noun agreement in number, gender, case • Subject-verb agreement in case, number, gender 4/19/2018 "6 SERBIAN PHONOLOGY • Vowels, short and long Front Central Back Close i u Mid e o Open a 4/19/2018 "7 SERBIAN PHONOLOGY • Pitch accent Slavicist! IPA! Description symbol symbol ȅ ê short vowel with falling tone ȇ ê# long vowel with falling tone è $ short vowel with rising tone é $# long vowel with rising tone e e non-tonic short vowel ē e# non-tonic long vowel 4/19/2018 "8 SERBIAN PHONOLOGY • Consonants Many palatal sounds Many affricates Today’s presentation might help you figure out why! 4/19/2018 "9 CONTEMPORARY DIFFERENCES AMONG SLAVIC LANGUAGES • Proto-Slavic: *golvà, ‘head’: • Serbian (South Slavic) – Lat. gláva; Cyr. глав( а • Russian (East Slavic) – Cyr.
    [Show full text]
  • Downloaded from Brill.Com10/05/2021 02:14:13AM Via Free Access
    journal of language contact 12 (2019) 305-343 brill.com/jlc Language Contact in Social Context: Kinship Terms and Kinship Relations of the Mrkovići in Southern Montenegro Maria S. Morozova Russian Academy of Sciences, Saint Petersburg, Russia [email protected] Abstract The purpose of this article is to study the linguistic evidence of Slavic-Albanian lan- guage contact in the kinship terminology of the Mrkovići, a Muslim Slavic-speaking group in southern Montenegro, and to demonstrate how it refers to the social con- text and the kind of contact situation. The material for this study was collected during fieldwork conducted from 2012 to 2015 in the villages of the Mrkovići area. Kinship terminology of the Mrkovići dialect is compared with that of bcms, Albanian, and the other Balkan languages and dialects. Particular attention is given to the items borrowed from Albanian and Ottoman Turkish, and to the structural borrowing from Albanian. Information presented in the article will be of interest to linguists and anthropologists who investigate kinship terminologies in the world’s languages or do their research in the field of Balkan studies with particular attention to Slavic-Albanian contact and bilingualism. Keywords Mrkovići dialect – kinship terminology – language contact – bilingualism – borrowing – imposition – BCMS – Albanian – Ottoman Turkish © maria s. morozova, 2019 | doi:10.1163/19552629-01202003 This is an open access article distributed under the terms of the prevailing cc-by-nc License at the time of publication. Downloaded from Brill.com10/05/2021 02:14:13AM via free access <UN> 306 Morozova 1 Introduction Interaction between Slavs and Albanians in the Balkans has resulted in numer- ous linguistic changes, particularly for those dialects that were in immediate contact with one another.
    [Show full text]