The Moldavian and Romanian Dialectal Corpus

Total Page:16

File Type:pdf, Size:1020Kb

The Moldavian and Romanian Dialectal Corpus MOROCO: The Moldavian and Romanian Dialectal Corpus Andrei M. Butnaru and Radu Tudor Ionescu Department of Computer Science, University of Bucharest 14 Academiei, Bucharest, Romania [email protected] [email protected] Abstract In this work, we introduce the Moldavian and Romanian Dialectal Corpus (MOROCO), which is freely available for download at https://github.com/butnaruandrei/MOROCO. The corpus contains 33564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21719 samples for training, 5921 samples for validation and another 5924 Figure 1: Map of Romania and the Republic of samples for testing. For each sample, we Moldova. provide corresponding dialectal and category labels. This allows us to perform empirical studies on several classification tasks such as Romanian is part of the Balkan-Romance group (i) binary discrimination of Moldavian versus that evolved from several dialects of Vulgar Romanian text samples, (ii) intra-dialect Latin, which separated from the Western Ro- multi-class categorization by topic and (iii) mance branch of languages from the fifth cen- cross-dialect multi-class categorization by tury (Coteanu et al., 1969). In order to dis- topic. We perform experiments using a shallow approach based on string kernels, tinguish Romanian within the Balkan-Romance as well as a novel deep approach based on group in comparative linguistics, it is referred to character-level convolutional neural networks as Daco-Romanian. Along with Daco-Romanian, containing Squeeze-and-Excitation blocks. which is currently spoken in Romania, there We also present and analyze the most discrim- are three other dialects in the Balkan-Romance inative features of our best performing model, branch, namely Aromanian, Istro-Romanian, and before and after named entity removal. Megleno-Romanian. Moldavian is a subdialect of Daco-Romanian, that is spoken in the Republic of 1 Introduction Moldova and in northeastern Romania. The delim- The high number of evaluation campaigns on spo- itation of the Moldavian dialect, as with all other ken or written dialect identification conducted in Romanian dialects, is made primarily by analyz- recent years (Ali et al., 2017; Malmasi et al., ing its phonetic features and only marginally by 2016; Rangel et al., 2017; Zampieri et al., 2017, morphological, syntactical, and lexical character- 2018) prove that dialect identification is an inter- istics. Although the spoken dialects in Romania esting and challenging natural language process- and the Republic of Moldova are different, the two ing (NLP) task, actively studied by researchers in countries share the same literary standard (Mina- nowadays. Due to the recent interest in dialect han, 2013). Some linguists (Pavel, 2008) consider identification, we introduce the Moldavian and that the border between Romania and the Repub- Romanian Dialectal Corpus (MOROCO), which lic of Moldova (see Figure1) does not correspond is composed of 33564 samples of text collected to any significant isoglosses to justify a dialectal from the news domain. division. One question that arises in this context 688 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 688–698 Florence, Italy, July 28 - August 2, 2019. c 2019 Association for Computational Linguistics is whether we can train a machine to accurately nels, in (i) distinguishing the Moldavian and distinguish literary text samples written by peo- the Romanian dialects and in (ii) categoriz- ple in Romania from literary text samples writ- ing the text samples by topic. ten by people in the Republic of Moldova. If We organize the remainder of this paper as fol- we can construct such a machine, then what are lows. We discuss related work in Section2. We the discriminative features employed by this ma- describe the MOROCO data set in Section3. We chine? Our corpus formed of text samples col- present the chosen classification methods in Sec- lected from Romanian and Moldavian news web- tion4. We show empirical results in Section5, sites, enables us to answer these questions. Fur- and we provide a discussion on the discriminative thermore, MOROCO provides a benchmark for features in Section6. Finally, we draw our conclu- the evaluation of dialect identification methods. sion in Section7. To this end, we consider two state-of-the-art meth- ods, string kernels (Butnaru and Ionescu, 2018; 2 Related Work Ionescu and Butnaru, 2017; Ionescu et al., 2014) There are several corpora available for dialect and character-level convolutional neural networks identification (Ali et al., 2016; Alsarsour et al., (CNNs) (Ali, 2018; Belinkov and Glass, 2016; 2018; Bouamor et al., 2018; Francom et al., 2014; Zhang et al., 2015), which obtained the first two Johannessen et al., 2009; Kumar et al., 2018; places (Ali, 2018; Butnaru and Ionescu, 2018) in Samardziˇ c´ et al., 2016; Tan et al., 2014; Zaidan the Arabic Dialect Identification Shared Task of and Callison-Burch, 2011). Most of these cor- the 2018 VarDial Evaluation Campaign (Zampieri pora have been proposed for languages that are et al., 2018). We also experiment with a novel widely spread across the globe, e.g. Arabic (Ali CNN architecture inspired the recently introduced et al., 2016; Alsarsour et al., 2018; Bouamor et al., Squeeze-and-Excitation (SE) networks (Hu et al., 2018), Spanish (Francom et al., 2014), Indian (Ku- 2018), which exhibit state-of-the-art performance mar et al., 2018) or German (Samardziˇ c´ et al., in object recognition from images. To our knowl- 2016). Among these, Arabic is the most popu- edge, we are the first to introduce Squeeze-and- lar, with a number of four data sets (Ali et al., Excitation networks in the text domain. 2016; Alsarsour et al., 2018; Bouamor et al., 2018; As we provide category labels for the collected Zaidan and Callison-Burch, 2011), if not even text samples, we can perform additional exper- more. iments on various text categorization by topic Arabic. The Arabic Online news Commentary tasks. One type of task is intra-dialect multi-class (AOC) (Zaidan and Callison-Burch, 2011) is the categorization by topic, i.e. the task is to clas- first available dialectal Arabic data set. Although sify the samples written either in the Moldavian AOC contains 3.1 million comments gathered dialect or in the Romanian dialect into one of the from Egyptian, Gulf and Levantine news websites, following six topics: culture, finance, politics, sci- the authors labeled only around 0:05% of the data ence, sports and tech. Another type of task is set through the Amazon Mechanical Turk crowd- cross-dialect multi-class categorization by topic, sourcing platform. Ali et al.(2016) constructed i.e. the task is to classify the samples written in a data set of audio recordings, Automatic Speech one dialect, e.g. Romanian, into six topics, using Recognition transcripts and phonetic transcripts of a model trained on samples written in the other Arabic speech collected from the Broadcast News dialect, e.g. Moldavian. These experiments are domain. The data set was used in the 2016, 2017 aimed at showing if the considered text categoriza- and 2018 VarDial Evaluation Campaigns (Mal- tion methods are robust to the dialect shift between masi et al., 2016; Zampieri et al., 2017, 2018). training and testing. Alsarsour et al.(2018) collected the Dialectal In summary, our contribution is threefold: ARabic Tweets (DART) data set, which contains • We introduce a novel large corpus containing around 25K manually-annotated tweets. The data 33564 text samples written in the Moldavian set is well-balanced over five main groups of Ara- and the Romanian dialects. bic dialects: Egyptian, Maghrebi, Levantine, Gulf • We introduce Squeeze-and-Excitation net- and Iraqi. Bouamor et al.(2018) presented a large works to the text domain. parallel corpus of 25 Arabic city dialects, which • We analyze the discriminative features that was created by translating selected sentences from help the best performing method, string ker- the travel domain. 689 Other languages. The Nordic Dialect Corpus (Johannessen et al., 2009) contains about 466K spoken words from Denmark, Faroe Islands, Ice- land, Norway and Sweden. The authors tran- scribed each dialect by the standard official or- thography of the corresponding country. Francom et al.(2014) introduced the ACTIV-ES corpus, which represents a cross-dialectal record of the informal language use of Spanish speakers from Argentina, Mexico and Spain. The data set is Figure 2: The distribution of text samples per topic for the Moldavian and the Romanian dialects, respectively. composed of 430 TV or movie subtitle files. The Best viewed in color. DSL corpus collection (Tan et al., 2014) comprises news data from various corpora to emulate the in Figure2. In both countries, we notice that the diverse news content across different languages. most popular topics are finance and politics, while The collection is comprised of six language vari- the least popular topics are culture and science. ety groups. For each language, the collection con- The distribution of topics for the two dialects is tains 18K training sentences, 2K validation sen- mostly similar, but not very well-balanced. For in- tences and 1K test sentences. The ArchiMob cor- stance, the number of Moldavian politics samples pus (Samardziˇ c´ et al., 2016) contains manually- (5154) is about six times higher than the number annotated transcripts of Swiss German speech col- of Moldavian science samples (877). However, lected from four different regions: Basel, Bern, MOROCO is well-balanced when it comes to the Lucerne and Zurich. The data set was used in distribution of samples per dialect, since we were the 2017 and 2018 VarDial Evaluation Campaigns able to collect 15403 Moldavian text samples and (Zampieri et al., 2017, 2018). Kumar et al.(2018) 18161 Romanian text samples.
Recommended publications
  • Romanian Language and Its Dialects
    Social Sciences ROMANIAN LANGUAGE AND ITS DIALECTS Ana-Maria DUDĂU1 ABSTRACT: THE ROMANIAN LANGUAGE, THE CONTINUANCE OF THE LATIN LANGUAGE SPOKEN IN THE EASTERN PARTS OF THE FORMER ROMAN EMPIRE, COMES WITH ITS FOUR DIALECTS: DACO- ROMANIAN, AROMANIAN, MEGLENO-ROMANIAN AND ISTRO-ROMANIAN TO COMPLETE THE EUROPEAN LINGUISTIC PALETTE. THE ROMANIAN LINGUISTS HAVE ALWAYS SHOWN A PERMANENT CONCERN FOR BOTH THE IDENTITY AND THE STATUS OF THE ROMANIAN LANGUAGE AND ITS DIALECTS, THUS SUPPORTING THE EXISTENCE OF THE ETHNIC, LINGUISTIC AND CULTURAL PARTICULARITIES OF THE MINORITIES AND REJECTING, FIRMLY, ANY ATTEMPT TO ASSIMILATE THEM BY FORCE KEYWORDS: MULTILINGUALISM, DIALECT, ASSIMILATION, OFFICIAL LANGUAGE, SPOKEN LANGUAGE. The Romanian language - the only Romance language in Eastern Europe - is an "island" of Latinity in a mainly "Slavic sea" - including its dialects from the south of the Danube – Aromanian, Megleno-Romanian and Istro-Romanian. Multilingualism is defined narrowly as the alternative use of several languages; widely, it is use of several alternative language systems, regardless of their status: different languages, dialects of the same language or even varieties of the same idiom, being a natural consequence of linguistic contact. Multilingualism is an Europe value and a shared commitment, with particular importance for initial education, lifelong learning, employment, justice, freedom and security. Romanian language, with its four dialects - Daco-Romanian, Aromanian, Megleno- Romanian and Istro-Romanian – is the continuance of the Latin language spoken in the eastern parts of the former Roman Empire. Together with the Dalmatian language (now extinct) and central and southern Italian dialects, is part of the Apenino-Balkan group of Romance languages, different from theAlpine–Pyrenean group2.
    [Show full text]
  • Peace Corps Romania Survival Romanian Language Lessons Pre-Departure On-Line Training
    US Peace Corps in Romania Survival Romanian Peace Corps Romania Survival Romanian Language Lessons Pre-Departure On-Line Training Table of Contents………………………………………………………………………. 1 Introduction……………………………………………………………………………… 2 Lesson 1: The Romanian Alphabet………………………………………………… 3 Lesson 2: Greetings…………………………………………………………………… 4 Lesson 3: Introducing self…………………………………………………………… 5 Lesson 4: Days of the Week…………………………………………………………. 6 Lesson 5: Small numbers……………………………………………………………. 7 Lesson 6: Big numbers………………………………………………………………. 8 Lesson 7: Shopping………………………………………………………………….. 9 Lesson 8: At the restaurant………………………………………………………..... 10 Lesson 9: Orientation………………………………………………………………… 11 Lesson 10: Useful phrases ……………………………………………………. 12 1 Survival Romanian, Peace Corps/Romania – December 2006 US Peace Corps in Romania Survival Romanian Introduction Romanian (limba română 'limba ro'mɨnə/) is one of the Romance languages that belong to the Indo-European family of languages that descend from Latin along with French, Italian, Spanish and Portuguese. It is the fifth of the Romance languages in terms of number of speakers. It is spoken as a first language by somewhere around 24 to 26 million people, and enjoys official status in Romania, Moldova and the Autonomous Province of Vojvodina (Serbia). The official form of the Moldovan language in the Republic of Moldova is identical to the official form of Romanian save for a minor rule in spelling. Romanian is also an official or administrative language in various communities and organisations (such as the Latin Union and the European Union – the latter as of 2007). It is a melodious language that has basically the same sounds as English with a few exceptions. These entered the language because of the slavic influence and of many borrowing made from the neighboring languages. It uses the Latin alphabet which makes it easy to spell and read.
    [Show full text]
  • (Daco-Romanian and Aromanian) in Verbs of Latin Origin
    CONJUGATION CHANGES IN THE EVOLUTION OF ROMANIAN (DACO-ROMANIAN AND AROMANIAN) IN VERBS OF LATIN ORIGIN AIDA TODI1, MANUELA NEVACI2 Abstract. Having as starting point for research on the change of conjugation of Latin to the Romance languages, the paper aims to present the situation of these changes in Romanian: Daco-Romanian (that of the old Romanian texts) and Aromanian dialect (which does not have a literary standard). Keywords: conjugation changes; Romance languages; Romanian language (Daco-Romanian and Aromanian). Conjugation changes are a characteristic feature of the Romance verb system. In some Romance languages (Spanish, Catalan, Portuguese, Sardinian) verbs going from one conjugation to another has caused the reduction of the four conjugations inherited from Latin to three inflection classes: in Spanish and Portuguese the 2nd conjugation extended (verbs with stressed theme vowel): véndere > Sp., Pg. vender; cúrere > Sp., Pg. correr; in Catalan 3rd conjugation verbs assimilated the 2nd conjugation ones, a phenomenon occurring in Sardinian as well: Catal. ventre, Srd. biere; Additionally, in Spanish and Portuguese the 4th conjugation also becomes strong, assimilating 3rd conjugation verbs: petĕre > Sp., Pg. petir, ungĕre> Sp., Pg. ungir, iungĕre> Sp., Pg. ungir (Lausberg 1988: 259). Lausberg includes Aromanian together with Spanish, Catalan and Sardinian, where the four conjugations were reduced to three, mentioning that 3rd conjugation verbs switched to the 2nd conjugation3. The process of switching from one conjugation to another is frequent from as early as vulgar Latin. Grammars experience changes such as: augĕre > augēre; ardĕre> ardēre; fervĕre > fervēre; mulgĕre> mulgēre; respondĕre> respondēre; sorbĕre> sorbēre; torquĕre > torquēre; tondĕre >tondēre (Densusianu 1961: 103, 1 “Ovidius” University, Constanţa, [email protected].
    [Show full text]
  • Theoretical and Methodological Approaches to the Sociolinguistic Integration of to Approaches Migration Andthe Sociolinguistic Methodological  Theoretical
    Theoretical Methodological and the Sociolinguistic Migration Approaches to of Integration Theoretical and Methodological Approaches to the Sociolinguistic • Florentino Paredes García and María Sancho Pascual Integration of Migration Edited by Florentino Paredes García and María Sancho Pascual Printed Edition of the Special Issue Published in Languages www.mdpi.com/journal/languages Theoretical and Methodological Approaches to the Sociolinguistic Integration of Migration Theoretical and Methodological Approaches to the Sociolinguistic Integration of Migration Special Issue Editors Florentino Paredes Garc´ıa Mar´ıa Sancho Pascual MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin Special Issue Editors Florentino Paredes Garc´ıa Mar´ıa Sancho Pascual University of Alcala´ Complutense University of Madrid Spain Spain Editorial Office MDPI St. Alban-Anlage 66 4052 Basel, Switzerland This is a reprint of articles from the Special Issue published online in the open access journal Languages (ISSN 2226-471X) (available at: https://www.mdpi.com/journal/languages/special issues/sociolinguistic migration). For citation purposes, cite each article independently as indicated on the article page online and as indicated below: LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number, Page Range. ISBN 978-3-03936-192-2 (Hbk) ISBN 978-3-03936-193-9 (PDF) Cover image courtesy of Florentino Paredes Garc´ıa and Mar´ıa Sancho Pascual. c 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.
    [Show full text]
  • East European Languages
    Latest Trends on Cultural Heritage and Tourism SEMANTIC CONCORDANCE IN THE SOUTH- EAST EUROPEAN LANGUAGES ANCUłA NEGREA, AGNES ERICH, ZLATE STEFANIA Faculty of Humanities Valahia University of Târgoviste Address: Street Lt. Stancu Ion, No. 35, Targoviste COUNTRY ROMANIA [email protected]; [email protected] Abstract: - The common terms of the southeast European languages that present semantic convergences [1], belong to a less explored domain, that of comparative linguistics. “The reason why the words have certain significations and not others is not found in the linguistic system or in the human biological or mental structures … it is the societies that, through their diverse history, determine the significations, and the semantics is found at the crossroads between the historical complexity of the reality it studies and the historical complexity of the culture reflecting on this reality… its object is the result and the condition of the historical character, thanks to the relations of mutual conditioning that relate the signification to the historic events of each human community.”[2] Key-Words: Balkan linguistics, terminology of the word “house”: Bg. kăšta, Srb., Cr. kuča, Maced. kuk’a, slov. koča < old sl. kotia meaning “what is hidden, what is sheltered”, Rom. casa < Lat. casa; Ngr. σπητη < Lat. hospitium; Alb. shtepi. 1 Introduction NeoGreek, and Albanese, present unanimous The “mentality” common to the Balkan world, concrdances for the terms having the meaning: having manifestations in the language, is the - “house”, “dwelling” and also historical product of the cohabitation in similar - ”room”, “place to live”, though these forms of civilization and culture. terms have different origins: Kristian SANDFELD, in his fundamental work - Bg.
    [Show full text]
  • Aleksey A. ROMANCHUK ROMANIAN a CINSTI in the LIGHT of SOME
    2021, Volumul XXIX REVISTA DE ETNOLOGIE ȘI CULTUROLOGIE E-ISSN: 2537-6152 101 Aleksey A. ROMANCHUK ROMANIAN A CINSTI IN THE LIGHT OF SOME ROMANIAN-SLAVIC CONTACTS1 https://doi.org/10.52603/rec.2021.29.14 Rezumat определенное отражение следы обозначенного поздне- Cuvântul românesc a cinsti în lumina unor contacte праславянского диалекта (диалектов). В частности, к româno-slave таким следам, возможно, стоит отнести как украинские Pornind de la comparația dintre cuvântul ucrainean диалектные чандрий, шандрий, чендрий, так и диалект- частувати ‚a trata’ și românescul a cinsti‚ a trata (cu vin), ное (зафиксировано в украинском говоре с. Булэешть) / a bea vin, se consideră un grup de împrumuturi slave cu мон|золетеи/ ‘мусолить; впустую теребить’. /n/ epentetic în limba română, căreia îi aparține și a cin- Ключевые слова: славяне, румыны, лексические sti. Interpretând corpul de fapte disponibil, putem presu- заимствования, этническая история, украинские диа- pune că convergență semantică dintre cuvintele честь и лекты, Молдова, Буковина. угощение a apărut în perioada slavică timpurie. Cuvântul românesc a cinsti este un argument important pentru da- Summary tarea timpurie a apariției acestei convergențe semantice. Romanian a cinsti in the light of some Așadar, cuvântul ucrainean частувати, ca și cuvântul po- Romanian-Slavic contacts lonez częstowac, apar independent unul de celălalt, la fel A group of Slavic loanwords with epenthetic /n/ in the ca și de cuvântul românesc a cinsti. Cuvântul românesc a Romanian language, to which a cinsti belongs, is consid- cinsti, ca și, în general, grupul menționat de împrumuturi ered. Interpreting the existing set of facts, the author sup- slave cu /n/ epentetic în limba română, reprezintă un rezul- poses that the semantic convergence between the Slav- tat al contactelor timpurii ale limbii române cu un dialect ic честь ‘honour’ and угощение ‘treat’ appeared as far back (dialecte) slav vechi, pentru care era caracteristică tendința as the Late Slavic period.
    [Show full text]
  • The Balkan Vlachs/Aromanians Awakening, National Policies, Assimilation Miroslav Ruzica Preface the Collapse of Communism, and E
    The Balkan Vlachs/Aromanians Awakening, National Policies, Assimilation Miroslav Ruzica Preface The collapse of communism, and especially the EU human rights and minority policy programs, have recently re-opened the ‘Vlach/Aromanian question’ in the Balkans. The EC’s Report on Aromanians (ADOC 7728) and its separate Recommendation 1333 have become the framework for the Vlachs/Aromanians throughout the region and in the diaspora to start creating programs and networks, and to advocate and shape their ethnic, cultural, and linguistic identity and rights.1 The Vlach/Aromanian revival has brought a lot of new and reopened some old controversies. A increasing number of their leaders in Serbia, Macedonia, Bulgaria and Albania advocate that the Vlachs/Aromanians are actually Romanians and that Romania is their mother country, Romanian language and orthography their standard. Such a claim has been officially supported by the Romanian establishment and scholars. The opposite claim comes from Greece and argues that Vlachs/Aromanians are Greek and of the Greek culture. Both countries have their interpretations of the Vlach origin and history and directly apply pressure to the Balkan Vlachs to accept these identities on offer, and also seek their support and political patronage. Only a minority of the Vlachs, both in the Balkans and especially in the diaspora, believes in their own identity or that their specific vernaculars should be standardized, and that their culture has its own specific elements in which even their religious practice is somehow distinct. The recent wars for Yugoslav succession have renewed some old disputes. Parts of Croatian historiography claim that the Serbs in Croatia (and Bosnia) are mainly of Vlach origin, i.e.
    [Show full text]
  • The Maskilim of Romania and the Question of Identity: "The Romanian Israelites"
    www.ssoar.info The Maskilim of Romania and the Question of Identity: "The Romanian Israelites" Herșcovici, Lucian-Zeev Veröffentlichungsversion / Published Version Zeitschriftenartikel / journal article Empfohlene Zitierung / Suggested Citation: Herșcovici, L.-Z. (2018). The Maskilim of Romania and the Question of Identity: "The Romanian Israelites". Annals of the University of Bucharest / Political science series, 2018(1), 5-26. https://nbn-resolving.org/urn:nbn:de:0168- ssoar-73989-6 Nutzungsbedingungen: Terms of use: Dieser Text wird unter einer CC BY-NC-ND Lizenz This document is made available under a CC BY-NC-ND Licence (Namensnennung-Nicht-kommerziell-Keine Bearbeitung) zur (Attribution-Non Comercial-NoDerivatives). For more Information Verfügung gestellt. Nähere Auskünfte zu den CC-Lizenzen finden see: Sie hier: https://creativecommons.org/licenses/by-nc-nd/2.0 https://creativecommons.org/licenses/by-nc-nd/2.0/deed.de CONSTRUCTING IDENTITY THE MASKILIM OF ROMANIA AND THE QUESTION OF IDENTITY: “THE ROMANIAN ISRAELITES” LUCIAN-ZEEV HERȘCOVICI Abstract . The aim of this paper is to answer some questions concerning the identity of the maskilim of Romania, mainly those of the second generation, called "the generation of 1878" or "the generation of the Congress of Berlin". They called themselves "Romanian Israelites," similarly to the maskilim of other countries, just like the "French Israelites," "German Israelites," "Russian Israelites," and so on. What was it that defined their Jewish identity and what their Romanian one? When did
    [Show full text]
  • The Jabnean Academy: Rabbi Dr. Jacob Isaac Niemirower's Proposal
    www.ssoar.info The Jabnean Academy: Rabbi Dr. Jacob Isaac Niemirower’s Proposal to Found a Modern Academy of the Jewish People Herșcovici, Lucian-Zeev Veröffentlichungsversion / Published Version Zeitschriftenartikel / journal article Empfohlene Zitierung / Suggested Citation: Herșcovici, L.-Z. (2014). The Jabnean Academy: Rabbi Dr. Jacob Isaac Niemirower’s Proposal to Found a Modern Academy of the Jewish People. Annals of the University of Bucharest / Political science series, 16(1), 39-52. https:// nbn-resolving.org/urn:nbn:de:0168-ssoar-399012 Nutzungsbedingungen: Terms of use: Dieser Text wird unter einer CC BY-NC-ND Lizenz This document is made available under a CC BY-NC-ND Licence (Namensnennung-Nicht-kommerziell-Keine Bearbeitung) zur (Attribution-Non Comercial-NoDerivatives). For more Information Verfügung gestellt. Nähere Auskünfte zu den CC-Lizenzen finden see: Sie hier: https://creativecommons.org/licenses/by-nc-nd/4.0 https://creativecommons.org/licenses/by-nc-nd/4.0/deed.de UNIVERSITY INSTITUTIONAL HISTORY: POLITICAL AND IDENTITY STAKES HISTOIRE INSTITUTIONNELLE UNIVERSITAIRE: ENJEUX POLITIQUES ET IDENTITAIRES ISTORIE institutional ! UNIVERSITARÀ: INTERESE POLITICE §I IDENTITARE THE JABNEAN ACADEMY: RABBI DR. JACOB ISAAC NIEMIROWER’S PROPOSAL TO FOUND A MODERN ACADEMY OF THE JEWISH PEOPLE* LUCIAN-ZEEV HER^COVICI Abstract (The Jabnean Academy: Rabbi Dr. Jacob Isaac Niemirower’s Proposal to Found a Modern Academy of the Jewish People): The aim of this article is to present and to analyze Rabbi Dr. Jacob Isaac Niemirower’s idea of the necessity of founding a modern Academy of the Jewish people in Jerusalem and his proposal and project for its foundation. When and why did Rabbi Dr.
    [Show full text]
  • Linguistic Resources and Technologies for Romanian Language
    Computer Science Journal of Moldova, vol.14, no.1(40), 2006 Linguistic Resources and Technologies for Romanian Language Dan Cristea Corina For˘ascu Abstract This paper revises notions related to Language Resources and Technologies (LRT), including a brief overview of some resources developed worldwide and with a special focus on Romanian lan- guage. It then describes a joined Romanian, Moldavian, English initiative aimed at developing electronically coded resources for Romanian language, tools for their maintenance and usage, as well as for the creation of applications based on these resources. 1 Introduction As we begin the 21st century, unhampered access to information tech- nology is one of the foremost requirements for social development. Hu- man Language Technology (HLT), sometimes called Language Engi- neering, concerned with providing information to the global commu- nity in natural, “human”, language by supporting unrestricted access to text and speech media documents, is one of the most active research areas nowadays. To ensure that humans and machines have adequate access to resources expressed in natural language, including those on the Internet, technologies such as information retrieval and extraction, speech recognition and text-to-speech capabilities, machine translation, etc. must be developed for the widest possible variety of human lan- guages and language families. Such a technological level is, at present, critical for languages less electronically visible, in order to ensure their appropriate exposure on the Web, especially on the Semantic Web, in a format and with processing capabilities able to give them quick and °c 2006 by D. Cristea, C. For¸ascu 34 Linguistic Resources and Technologies for Romanian Language complete integration with the new technological developments recently put on scene or which are being prepared worldwide.
    [Show full text]
  • (Romanian) on L2/L3 Learning (Catalan/Spanish) Simona Popa
    Nom/Logotip de la Universitat on s’ha llegit la tesi Language transfer in second language acquisition. Some effects of L1 instruction (Romanian) on L2/L3 learning (Catalan/Spanish) Simona Popa http://hdl.handle.net/10803/380548 ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs. ADVERTENCIA. El acceso a los contenidos de esta tesis doctoral y su utilización debe respetar los derechos de la persona autora. Puede ser utilizada para consulta o estudio personal, así como en actividades o materiales de investigación y docencia en los términos establecidos en el art. 32 del Texto Refundido de la Ley de Propiedad Intelectual (RDL 1/1996).
    [Show full text]
  • Hungarian Influence Upon Romanian People and Language from the Beginnings Until the Sixteenth Century
    SECTION: HISTORY LDMD I HUNGARIAN INFLUENCE UPON ROMANIAN PEOPLE AND LANGUAGE FROM THE BEGINNINGS UNTIL THE SIXTEENTH CENTURY PÁL Enikő, Assistant Professor, PhD, „Sapientia” University of Târgu-Mureş Abstract: In case of linguistic contacts the meeting of different cultures and mentalities which are actualized within and through language are inevitable. There are some common features that characterize all kinds of language contacts, but in describing specific situations we will take into account different causes, geographical conditions and particular socio-historical settings. The present research focuses on the peculiarities of Romanian – Hungarian contacts from the beginnings until the sixteenth century insisting on the consequences of Hungarian influence upon Romanian people and language. Hungarians induced, directly or indirectly, many social and cultural transformations in the Romanian society and, as expected, linguistic elements (e.g. specific terminologies) were borrowed altogether with these. In addition, Hungarian language influence may be found on each level of Romanian language (phonetic, lexical etc.), its contribution consists of enriching the Romanian language system with new elements and also of triggering and accelerating some internal tendencies. In the same time, bilingualism has been a social reality for many communities, mainly in Transylvania, which contributed to various linguistic changes of Romanian language not only in this region, but in other Romanian linguistic areas as well. Hungarian elements once taken over by bilinguals were passed on to monolinguals, determining the development of Romanian language. Keywords: language contact, (Hungarian) language influence, linguistic change, bilingualism, borrowings Preliminary considerations Language contacts are part of everyday social life for millions of people all around the world (G.
    [Show full text]