English Applied Linguistics PhD Program


University of Szeged
Graduate School in Linguistics
English Applied Linguistics PhD Program

Language policies and language ideologies related to multilingualism: A case study of the Hungarian minority population in Szeklerland

PhD dissertation
Zsuzsanna Éva Kiss
Supervisor: Prof. Miklós Kontra
Szeged 2012

Table of contents

Chapter 1. Introduction: The aim of the research
Chapter 2. Sociolinguistic background
  2.1. Historical background
  2.2. Language policy: Language as a right, as a resource or as a problem
    2.2.1. Romanian
    2.2.2. Hungarian
    2.2.3. Foreign language learning
  2.3. Language ideologies
Chapter 3. Theoretical framework
  3.1. Language policy and planning
  3.2. Language ideologies
  3.3. Combining language policies and language ideologies
Chapter 4. Methodology
  4.1. Research questions
  4.2. Type of research: Case study
  4.3. Place of data collection
    4.3.1. Demographic, educational, linguistic and geographic background
    4.3.2. School profile
  4.4. Data collection
  4.5. Participants
  4.6. Data analysis: Thematic content analysis
  4.7. Language policy and language ideologies evaluation
  4.8. My position as a researcher
Chapter 5. Results
  5.1. Multilingualism
  5.2. The Romanian language
    5.2.1. Language policy
    5.2.2. The ideology of territoriality: Is Szeklerland different from the rest of Romania?
    5.2.3. Learning Romanian: A territory-imposed obligation?
    5.2.4. Further ideologies related to learning Romanian
    5.2.5. Terminological disagreement: the Romanian as a “foreign” language ideology
  5.3. The English language: the “English at every step” policy is “not enough”
  5.4. Hungarian
Chapter 6. Discussion
  6.1. Language ideologies
  6.2. Language policy
  6.3. Language policy and language ideologies
Chapter 7. Conclusion
Appendix
References
Notes

Acknowledgements

The manuscript has benefited greatly from the advice and suggestions of a number of people, in particular my supervisor, Miklós Kontra. Without his expertise in both the linguistic and the historico-political situation of Hungarians in Romania, his reading instructions and his factual guidance, this dissertation would never have been completed. I would also like to express my gratitude to Anna Fenyvesi, whose patience, encouragement and innumerable linguistic corrections have significantly helped this dissertation come into existence.

I am indebted to the LINEE Project for the four-year scholarship during and after my PhD studies at the University of Szeged (Hungary) and for the financial support I benefited from while participating in a number of workshops, fieldwork trips, summer schools and conferences all around Europe.

For the most part, the writing of this dissertation took place in the staff room of my workplace, the János Karácsonyi Catholic Grammar School in Gyula, Hungary. I would like to thank the school's principal, Zoltán Petrócki, and my colleagues for their support and the pleasant working environment they offered throughout the long afternoon hours I spent at school. I am also grateful to all the teachers, school principals, students, educational representatives and parents in Szeklerland who agreed to participate as subjects for the LINEE Project and actively helped me in carrying out my fieldwork. Last but not least, thank you to the pre-examiners, Attila Benő and Anna Borbély, for their proposed corrections.

Keywords: language policy, language ideologies, multilingualism, Szeklerland.
Abstract

This dissertation investigates the educational language policies of minority Hungarian educational settings in Szeklerland and traces aspects of the language ideologies connected to these policies in terms of multilingualism, as conceived of by students, parents, teachers, school presidents and educational officials (Throop, 2007). The characteristics of multilingualism are explored with explicit focus on the Hungarian, Romanian and English languages. The dissertation combines the theoretical frameworks of language policy and language ideologies as proposed by Spolsky (2004, 2007) and Shohamy (2006a), who conceive of these two fields as interconnected.

Chapter 1. Introduction: The aim of the research

In the present dissertation I aim to look at the language policies and language ideologies related to multilingualism in the educational context of the Hungarian minority population in Szeklerland. The motivation for the choice of topic can be summarized briefly as follows. In recent decades it has become commonplace in European bilingual contexts to have more than two languages in the school curriculum, and studies exploring the characteristics of multilingual education are becoming more and more widespread. However, the number of studies exploring multilingualism within the combined theoretical framework of language policies and language ideologies is still very small. The number of studies exploring multilingualism in Szeklerland, Romania, in the educational context of the Hungarian minority is even smaller. For instance, Kontra (1995: 20) argues that because it was taboo under communism to speak about multilingualism, ethnic identity and linguistic human rights in East-Central Europe, scholarly research on these problems only began in the middle of the 1990s. In addition, Kontra (2009: 93)
Recommended publications
  • Roma of Romania
    Center for Documentation and Information on Minorities in Europe - Southeast Europe (CEDIME-SE) MINORITIES IN SOUTHEAST EUROPE Roma of Romania Acknowledgements This report was prepared in cooperation with the Ethnocultural Diversity Resource Center (EDRC). It was researched and written by Cathy O’Grady and Daniela Tarnovschi, and updated by Tibor Szasz, Researchers of CEDIME-SE and EDRC. It was edited by Panayote Dimitras, Director of CEDIME-SE; Nafsika Papanikolatos, Coordinator of CEDIME-SE; and Caroline Law and Ioana Bianca Rusu, English Language Editors of CEDIME-SE and EDRC. CEDIME-SE and EDRC would like to express their deep appreciation to the external reviewers of this report, Gabriel Andreescu, program director of “National Minorities and Religious Freedom,” member of Romanian Helsinki Committee, Istvan Haller, program coordinator of Liga ProEuropa, Florin Moisa, Executive President of the Resource Center for Roma Communities, and Julius Rostaş governmental expert at the Department for Protection of National Minorities -National Office for Roma. CEDIME-SE and EDRC would also like to thank all persons who generously provided information and/or documents, and/or gave interviews to their researchers. The responsibility for the report’s content, though, lies only with CEDIME-SE. We welcome all comments sent to: [email protected]. 1 MAJOR CHARACTERISTICS Updated: November 2001 State Romania Name (in English, in the dominant language and, if different, in the minority language) Roma (English), Ţigani, or sometimes Romi (Romanian), Rom (the language of the minority). Is there any form of recognition of the minority? Yes. The government Department for the Protection of National Minorities has a National Office for Roma.
  • Aleksey A. ROMANCHUK ROMANIAN a CINSTI in the LIGHT of SOME
    REVISTA DE ETNOLOGIE ȘI CULTUROLOGIE, 2021, Volume XXIX, E-ISSN: 2537-6152
    Aleksey A. ROMANCHUK
    ROMANIAN A CINSTI IN THE LIGHT OF SOME ROMANIAN-SLAVIC CONTACTS
    https://doi.org/10.52603/rec.2021.29.14
    Rezumat (Romanian abstract, translated): The Romanian word a cinsti in the light of some Romanian-Slavic contacts. Starting from a comparison between Ukrainian частувати 'to treat' and Romanian a cinsti 'to treat (to wine), to drink wine', the article considers a group of Slavic loanwords with epenthetic /n/ in Romanian, to which a cinsti belongs. Interpreting the available body of facts, we may suppose that the semantic convergence between the words честь and угощение arose in the early Slavic period. Romanian a cinsti is an important argument for the early dating of this semantic convergence. Thus Ukrainian частувати and Polish częstowac arose independently of each other, as well as of Romanian a cinsti. Romanian a cinsti, like the mentioned group of Slavic loanwords with epenthetic /n/ in Romanian in general, is a result of early contacts between Romanian and an old Slavic dialect (dialects), which was characterized by the tendency
    Russian abstract (fragment, translated): ... a certain reflection, traces of the aforementioned Late Proto-Slavic dialect (dialects). In particular, such traces may include both the Ukrainian dialectal forms чандрий, шандрий, чендрий and the dialectal form (recorded in the Ukrainian subdialect of the village of Bulăești) /мон|золетеи/ 'to slobber over; to fiddle with pointlessly'. Keywords: Slavs, Romanians, lexical borrowings, ethnic history, Ukrainian dialects, Moldova, Bukovina.
    Summary: Romanian a cinsti in the light of some Romanian-Slavic contacts. A group of Slavic loanwords with epenthetic /n/ in the Romanian language, to which a cinsti belongs, is considered. Interpreting the existing set of facts, the author supposes that the semantic convergence between the Slavic честь 'honour' and угощение 'treat' appeared as far back as the Late Slavic period.
  • LD5655.V855 1989.K434.Pdf
    Power in Stalinist states: The Personality cult of Nicolae Ceausescu by John Oliver Kinder Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Arts in Political Science APPROVED: Timothy W. Luke Lynette G. Rummel Ronald G. Shaiko April 26, 1989 Blacksburg, Virginia Power in Stalinist states: The Personality cult of Nicolae Ceausescu by John Oliver Kinder Timothy W. Luke Political Science (ABSTRACT) This study examines the Socialist Republic of Romania as a Stalinist state which employs a personality cult. The leader of a state is the focus of a personality cult, but he does not enjoy the status it gives without consent from elsewhere within the government. In order to determine where this power comes from, three possible sources are discussed. These are: Nicolae Ceausescu, president of Romania; the state bureaucracy; and the people. The Soviet Union, during the time of Stalin, is used as a comparative element. When Nicolae Ceausescu came to power he did so with the consent of the elite. As the Romanian elite are less inclined to support his policies, Ceausescu has had to continually take steps to stay ahead of the opposition. The Romanian people also lent their support to Ceausescu earlier, and have since become discontented with the regime. This study concludes that a leader with a personality cult must have some form of consent to come into power, but his personal characteristics will determine how he leads and whether or not he will be able to remain in power if that consent is withdrawn.
  • Romania, Dobruja, Crimean Tatars and People Around Them
    Iulian Boldea (Editor) – Literature, Discourses and the Power of Multicultural Dialogue, Arhipelag XXI Press, Tîrgu Mureș, 2017, eISBN: 978-606-8624-12-9
    ROMANIA, DOBRUJA, CRIMEAN TATARS AND PEOPLE AROUND THEM
    Ismail Nilghiun, Lecturer, PhD, Giresun University, Turkey
    Abstract: This paper attempts to highlight some aspects of the social and cultural history of the Crimean Tatar ethnic minority in Romania, as part of the western hinterland of the Black Sea, the south-eastern corner of Europe. This research is based on both quantitative and qualitative analysis, for which I used documents that are part of the holdings of the Başbakanlık Osmanlı Arşivi (The Ottoman Archives of the Prime Minister's Office). It covers the issues faced by the refugees during their flight from the Crimean peninsula to the Ottoman lands, the challenges following their settlement in the newly created Romanian state, and the assimilation process influenced by the nationalist discourse of the Romanian political elites, as reflected in newspapers of the time held by the Constanta County Library “Ioan Roman”. The body of the paper highlights some aspects of the historical evolution of the Crimean Tatars living in today's Dobruja, Romania, and provides details about their religious affiliation and demographic evolution based on Romanian official data. The concluding lines of the paper present my own views on the cultural bridges built by the Crimean Tatar ethnic minority of Dobruja and emphasize the minority's struggle to protect its cultural identity.
    Keywords: Crimea, Dobruja, Crimean Tatars, historical memory, the challenge of diversity.
    1. Introduction
    1.1. Argument and methods related to this research
    It is a great pleasure for me to write this study dedicated to the Crimean Tatars of Dobruja, their origins, homeland and history, as I am a native Crimean Tatar born in Dobruja, Romania.
  • The Moldavian and Romanian Dialectal Corpus
    MOROCO: The Moldavian and Romanian Dialectal Corpus
    Andrei M. Butnaru and Radu Tudor Ionescu
    Department of Computer Science, University of Bucharest, 14 Academiei, Bucharest, Romania
    [email protected] [email protected]
    Abstract
    In this work, we introduce the Moldavian and Romanian Dialectal Corpus (MOROCO), which is freely available for download at https://github.com/butnaruandrei/MOROCO. The corpus contains 33564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21719 samples for training, 5921 samples for validation and another 5924 samples for testing. For each sample, we provide corresponding dialectal and category labels. This allows us to perform empirical studies on several classification tasks such as (i) binary discrimination of Moldavian versus Romanian text samples, (ii) intra-dialect multi-class categorization by topic and (iii) cross-dialect multi-class categorization by topic. We perform experiments using a shallow approach based on string kernels, as well as a novel deep approach based on character-level convolutional neural networks containing Squeeze-and-Excitation blocks.
    Figure 1: Map of Romania and the Republic of Moldova.
    Romanian is part of the Balkan-Romance group that evolved from several dialects of Vulgar Latin, which separated from the Western Romance branch of languages from the fifth century (Coteanu et al., 1969). In order to distinguish Romanian within the Balkan-Romance group in comparative linguistics, it is referred to as Daco-Romanian. Along with Daco-Romanian,
  • The National Councils of National Minorities in Serbia
    The national councils of national minorities in Serbia
    Katinka Beretka* and István Gergő Székely**
    January 2016
    Recommended citation: Beretka Katinka and Székely István Gergő, “The national councils of national minorities in the Republic of Serbia”, Online Compendium Autonomy Arrangements in the World, January 2016, at www.world-autonomies.info.
    © 2016 Autonomy Arrangements in the World
    Content
      1. Essential Facts and Figures
      2. Autonomy in the Context of the State Structure
      3. Establishment and Implementation of Autonomy
      4. Legal Basis of Autonomy
      5. Autonomous Institutions
      6. Autonomous Powers
      7. Financial Arrangements
      8. Intergovernmental Relations
      9. Inter-group Relations within the Autonomous Entity (not applicable)
      10. Membership, “Quasi-citizenship” and Special Rights
      11. General Assessment and Outlook
      Bibliography
    1. Essential Facts and Figures
    Serbia is located in the center of the Balkans and has been an everyday subject of world news since the beginning of the 1990s, often due to ethnicity-related issues, ranging from civil war and secession to autonomy arrangements meant to accommodate ethnocultural diversity. Although according to the 2011 census almost 20% of the total population of the state (without Kosovo) belong to a minority group (see Table 1), in Serbia there are no officially recognized or unrecognized minorities. There is neither an exact enumeration of minority groups, nor clear principles to be followed about how a minority should be recognized. While the absence of precise regulations may be regarded as problematic, the approach of Serbia to the minority question can also be interpreted as being rather liberal, which may have resulted from the intention to protect ethnic Serb refugees who have become minorities abroad, including in the former Yugoslav member states.
  • From the Unitary to the Pluralistic: Fine-Tuning Minority Policy in Romania
    FROM THE UNITARY TO THE PLURALISTIC: FINE-TUNING MINORITY POLICY IN ROMANIA
    István Horváth and Alexandra Scacco
    Contents
      Abstract
      1. Introduction
        1.1. Demographic characteristics
          Table 1. Ethnic Structure of Romania’s Population Censuses of 1930, 1956, 1966, 1977 and 1992
          Table 2. Ethnic Structure of Romania’s Population Censuses of 1930, 1956, 1977 and 1992 (in per cent)
          Table 3. Ethnic and Religious Structure of Romania’s Population Census from 1992 (Percentages)
          Table 4. Ethnic Structure of Romania’s Population by Districts (Percentages)
          Figure 1. The Historical Regions of Romania
        1.2. Historical background
          Table 5. Romania’s Population According to Mother Tongue
        1.3. Political mobilisation of minority groups
      2. The Legal Framework for the Protection of National Minorities
        2.1. Constitutional provisions
          Table 6. Romanian Constitutional Provisions Relevant to Minorities
        2.2. Draft laws on minorities
        2.3. International legislation
          Table 7. International Documents (Multi-lateral and Bilateral Treaties) Signed by Romania
      3. The Institutional Framework of Minority Protection
        3.1. Representation in the legislature
        3.2. The Council for National Minorities
        3.3. The Department for the Protection of National Minorities
        3.4. Specific issues
      4. Local Public Administration in Romania
        4.1. Minority language use in local public administration
      5. Minority-language Education
        5.1. Special educational measures for the Roma population
      6. Conclusion
      7. Recommendations
      Further Reading
      Bibliography
    Abstract
    This chapter constructs a typology of the principal minority groups in Romania, incorporating three types: the Hungarian minority, the Roma minority and the ‘smaller’ minority groups (comprised of fewer than 100,000 members).
  • The Tension Between Self-Reliance
    Looking to Themselves: The Tension between Self-Reliance, Regionalism, and Support of Greater Romania within the Saxon Community in Transylvania 1918-1935
    By Rachel Renz Mattair
    Submitted to Central European University, History Department
    In partial fulfillment of the requirements for the degree of Master of Arts
    Supervisor: Balázs Trencsényi
    Second reader: Viktor Karády
    CEU eTD Collection
    Budapest, Hungary 2012
    Copyright in the text of this thesis rests with the author. Copies by any process, either in full or part, may be made only in accordance with the instructions given by the author and lodged in the Central European Library. Details may be obtained from the librarian. This page must form a part of any such copies made. Further copies made in accordance with such instructions may not be made without the written permission of the author.
    Abstract
    This thesis traces the changes in self-preservation policies of the Transylvanian Saxons from 1918 to 1935 as they transitioned from being a semi-autonomous group to an ethnic minority in the newly established Romanian state following the First World War. It examines the domestic and international alliances of both conservative Saxon elites and social dissidents on the basis of interwar cultural journals and press material. Particular emphasis is placed on the tension between rising National Socialist rhetoric from the German Reich and Transylvanian regionalism in these publications. Unlike many existing studies on this topic, the work offers a balanced approach between internal and external Saxon relations, and distinguishes between Saxon elite narratives and average outlooks. The various movements traced lead to the question of whether historians can even speak of a cohesive Saxon identity during the interwar period, or merely of fragmentation among community members.
  • Politiche E Pianificazioni Linguistiche in Bessarabia: Romenità
    UNIVERSITY OF UDINE (Università degli Studi di Udine)
    Doctoral Program in Linguistic and Literary Sciences, Cycle XXV
    DOCTORAL THESIS
    Politiche e pianificazioni linguistiche in Bessarabia: romenità, russificazione, moldovenismo
    Doctoral candidate: Alessandro Zuliani
    Supervisors: Prof. Fabiana Fusco, Prof. Celestina Zenobia Fanella
    Academic Year 2012-2013
    Doctoral thesis of Alessandro Zuliani, defended at the University of Udine
    INTRODUCTION
    This research concerns the language policies and language planning that have affected Bessarabia, a European region that forms the eastern extremity of the Romània continua, formerly part of the Principality of Moldavia, and that today coincides roughly with the borders of the Republic of Moldova, a state born of the dissolution of the Soviet Union in 1991. Our study starts from 1812, the date of Bessarabia's annexation to the Russian Empire, and seeks to retrace some of the important stages that led to the present sociolinguistic reality of the Republic of Moldova. Dwelling on the language policies of Tsarist Russia and on Soviet language planning, we noted the initial process of deep Russification undergone by the autochthonous population of Bessarabia, which was followed by the attempt, partly successful, to create a new people and a new language. The so-called Moldovan language is nothing other than the expression of the will to separate, linguistically as well, the Romanians of Bessarabia from the rest of Romania, in effect sanctioning the existence of two nationalities and two quite distinct languages. The Romanian ethnolinguistic unity of the area between the rivers Tisza, Danube and Dniester and the Black Sea, already affirmed by historians and chroniclers from the 16th century onwards, is thus called into question and explicitly opposed by the theses of Moldovenism, a linguistic and cultural phenomenon centred on the ethnic and linguistic differentiation between Moldovans and Romanians.
  • Post-Communist Romania
    Political Science • Eastern Europe
    Romania since 1989: Politics, Economics, and Society
    Edited by Henry F. Carey
    Foreword by Norman Manea
    “Henry Carey’s collection captures with great precision the complex, contradictory reality of contemporary Romania. Bringing together Romanian, West European, and American authors from fields as diverse as anthropology, political science, economics, law, print and broadcast journalism, social work, and literature, the volume covers vast ground, but with striking detail and scholarship and a common core approach. Romania since 1989 provides perhaps the most comprehensive view of the continuing, murky, contested reality that is Romania today and is a must read for any scholar of modern Romania, of East-Central Europe, and of the uncertain, troubled, post-socialist era.” (David A. Kideckel, Central Connecticut State University)
    “The wealth of detail and quality of insights will make this an excellent sourcebook for students of political change after the Cold War. It should be taken seriously by policy practitioners increasingly involved with Romania’s problems.” (Tom Gallagher, Professor of Peace Studies, Bradford University, U.K.)
    Those who study Romania must confront the theoretical challenges posed by a country that is undergoing a profound transformation from a repressive totalitarian regime to a hazy and as yet unrealized democratic government. The most comprehensive survey of Romanian politics and society ever published abroad, this volume represents an effort to collect and analyze data on the complex problems of Romania’s past and its transition into an uncertain future.
    Contributors: Sorin Antohi, Wally Bacon, Gabriel Bădescu, Zoltan Barany, Jóhanna Kristín Birnir, Larry S. Bush, Pavel Câmpeanu, Henry F. Carey, Daniel Dăianu, Dennis Deletant, Christopher Eisterhold
  • Is the EU Accession a Critical Juncture for Romania's Language Policy?
    MARÁCZ, LÁSZLÓ PHD [email protected] assistant professor (Department of European Studies, University of Amsterdam, The Netherlands)
    Is the EU Accession a Critical Juncture for Romania’s Language Policy?
    ABSTRACT
    In the course of history, Romania’s Transylvania was the home of a number of different ethno-linguistic groups, including Hungarians, Romanians, Germans, Jews and Roma among others. After the First World War, and particularly during the last decades of the communist regime, however, pressures to create a highly centralized, uniform, monolingual state with Romanian as its only official language increased markedly. This uniformizing, French-style Jacobin language policy became a key element of the Romanian state tradition, supported by the make-up of its institutions and legal provisions. As a consequence, the languages of Romania have been ordered hierarchically, with the official Romanian language outranking the different minority languages, including Hungarian, German, Roma, Ukrainian, Slovakian, Serbian, Bulgarian, Ruthenian, Russian and so on. In this framework, the minority languages could be used at a local and regional level only, and even there minority language use was restricted by language laws, thresholds and other hampering measures. The country’s accession to the European Union (EU) in 2007 has been celebrated as a critical juncture challenging the canonical top-down Jacobin state tradition and its exclusive language policy with respect to the minority languages. The analysis presented in this paper will weigh the pros and cons of this claim. It will be concluded that although minority languages have received more recognition under the new EU order than under former Romanian nationalizing regimes, like the preceding post-communist and communist rules, the implementation of a permissive minority language policy still shows serious deficiencies.