Template Consilr 2006

Total Page:16

File Type:pdf, Size:1020Kb

Template Consilr 2006 PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE “LINGUISTIC RESOURCES AND TOOLS FOR PROCESSING THE ROMANIAN LANGUAGE” IAȘI, 22-23 NOVEMBER 2018 Editors: Vasile Păiș Daniela Gîfu Diana Trandabăț Dan Cristea Dan Tufiș Organisers Faculty of Computer Science “Alexandru Ioan Cuza” University of Iași Research Institute for Artificial Intelligence “Mihai Drăgănescu” Romanian Academy, Bucharest Institute for Computer Science, Romanian Academy, Iași Romanian Association of Computational Linguistics Under the auspices of the Academy of Technical Sciences ISSN 1843-911X SCIENTIFIC COMMITTEE Mihaela Colhon, Faculty of Mathematics and Natural Science, University of Craiova Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Verginica Barbu Mititelu, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Mihai Alex Moruz, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Vasile Păiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Ionuț Cristian Pistol, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Elena Isabelle Tamba, “A. Philippide” Institute for Romanian Philology, Romanian Academy, Iași Horia-Nicolai Teodorescu, ARFI-IIT & Faculty of Electronics, Telecommunications and Information Technology, “Gheorghe Asachi” Technical University of Iași Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Ștefan Trăușan-Matu, Computer Science Department, “POLITEHNICA” University of Bucharest & Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest; Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Marius Zbancioc, Institute for Computer Science, Romanian Academy, Iași iii ORGANISING COMMITTEE Anca Diana Bibiri, Department of Interdisciplinary Research – Humanities and Social Sciences, “Alexandru Ioan Cuza” University of Iași Dan Cristea, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Paul Diac, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Lucian Gâdioi, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Daniela Gîfu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Adrian Iftene, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Mihaela Onofrei, Institute for Computer Science, Romanian Academy, Iași Vasile Păiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest Andrei Scutelnicu, Faculty of Computer Science, “Alexandru Ioan Cuza” University and Institute for Computer Science, Romanian Academy, Iași Diana Trandabăț, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași Dan Tufiș, Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy, Bucharest TABLE OF CONTENTS TABLE OF CONTENTS ................................................................................................................ V FOREWORD ................................................................................................................................ VI CHAPTER 1 SPEECH PROCESSING ......................................................................................... 1 A Psycholinguistic View in Romanian Intonation Interpretation ....................................................................... 3 Doina Jitcă Nonlinear System for Romanian Voice Processing ......................................................................................... 13 Carmen Grigoraș, Victor Grigoraș, Vasile Apopei Comparison of I-Vector And GMM-UBM Speaker Recognition on a Romanian Large Speech Corpus............ 25 Alexandru-Lucian Georgescu, Horia Cucu, Corneliu Burileanu A Comparison Between Traditional Machine Learning Approaches and Deep Neural Networks for Text Processing in Romanian ................................................................................................................................. 33 Adriana Stan, Mircea Giurgiu Prosodic Phrasing between Subjective Perception and Objectivity .................................................................. 43 Vasile Apopei, Otilia Paduraru CHAPTER 2 TEXT PROCESSING .............................................................................................51 On Text Processing after OCR ....................................................................................................................... 53 Alexandru Colesnicov, Svetlana Cojocaru, Ludmila Malahov, Lyudmila Burtseva Romanian Diacritics Restoration using Recurrent Neural Networks ................................................................ 61 Stefan Ruseti, Teodor-Mihai Cotet, Mihai Dascalu TEPROLIN: an Extensible, Online Text Preprocessing Platform for Romanian............................................... 69 Radu Ion How to Find out Whether the Romanian Language Was Influenced by the Two Historical Unions? ................ 77 Dan Cristea, Daniela Gîfu, Svetlana Cojocaru, Alexandru Colesnicov, Ludmila Malahov, Marius Popescu, Mihaela Onofrei, Cecilia Bolea CHAPTER 3 LINGUISTIC RESOURCES ..................................................................................89 More Romanian Word Embeddings from the ReTeRom Project ..................................................................... 91 Vasile Păiș, Dan Tufiș Valence Dictionary for Romanian Language in Printed Version and XML Format ........................................ 101 Ana-Maria Barbu Nonstandard vs. RRT: Comparison of Two Romanian Corpora in UD .......................................................... 113 Cătălina Mărănduc, Victoria Bobicev Interlinking and Extending Large Lexical Resources for Romanian .............................................................. 125 Mihai Alex Moruz, Andrei Scutelnicu, Dan Cristea CHAPTER 4 APPLICATIONS ................................................................................................... 133 README - Improving Writing Skills in Romanian Language ...................................................................... 135 Maria-Dorinela Sirbu, Mihai Dascălu, Daniela Gîfu, Teodor-Mihai Cotet, Adrian Tosca, Stefan Trausan- Matu New GIS Based Approaches for the Linguistic Atlases ................................................................................. 147 Silviu-Ioan Bejinariu, Vasile Apopei, Manuela Nevaci, Nicolae Saramandu Let’s Teach Computer Poetry: Automatic Rhyme Detection ......................................................................... 159 Victoria Bobicev, Cătălina Mărănduc Extracting Actions from Romanian Instructions for IoT Devices ................................................................... 169 Bianca I. Nenciu, Stefan Ruseti, Mihai Dascalu Identifying Fake News on Twitter Using Naïve Bayes, SVM and Random Forest Distributed Algorithms ..... 177 Ciprian-Gabriel Cușmaliuc, Lucia-Georgiana Coca, Adrian Iftene INDEX OF AUTHORS................................................................................................................ 190 FOREWORD At its inception, in 2001, the ConsILR Conference (traditional acronym for Consorțiul de Informatizare pentru Limba Română – Consortium of Informatization for Romanian Language) was an initiative born in the Section of Information Science and Technology of the Romanian Academy, meant to bring together at the same table linguists and computational linguists, researchers in different fields of humanities, PhD students and master students in computational linguistics, pure linguistics and philologists, all those that have a major interest in the study of the Romanian language from a computational perspective. The series of events have run, with few exceptions, once every year, first in the format of a workshop, and since 2010 – as a conference. In order to reach wider visibility, the organisers decided to make the Conference itinerant and to publish its Proceedings in English. Thus, ConsILR is not strictly addressed to researchers working on Romanian language but also to other scientists, from any part of the world, which could find sources of inspiration in the models and techniques developed for our language and apply them for their own languages. Opening the gate for researchers working on languages other than Romanian to participate in the Conference and publish their results in this series of Proceedings, a reverse influence is also expected, namely that their work inspire scientists dealing with Romanian language. As always, this year too, the organisers of the Conference Linguistic Resources and Technologies for Romanian Language are: the Faculty of Computer Science of the “Alexandru Ioan Cuza” University of Iași and two institutes of the Romanian Academy: the Research Institute for Artificial Intelligence “Mihai Drăgănescu”, in Bucharest, and the Institute for Computer Science, in Iași. We are honored that the Academy of Technical Sciences has offered again its high auspices to run the event under and, for the first time, the Romanian Association
Recommended publications
  • Romanian Language and Its Dialects
    Social Sciences ROMANIAN LANGUAGE AND ITS DIALECTS Ana-Maria DUDĂU1 ABSTRACT: THE ROMANIAN LANGUAGE, THE CONTINUANCE OF THE LATIN LANGUAGE SPOKEN IN THE EASTERN PARTS OF THE FORMER ROMAN EMPIRE, COMES WITH ITS FOUR DIALECTS: DACO- ROMANIAN, AROMANIAN, MEGLENO-ROMANIAN AND ISTRO-ROMANIAN TO COMPLETE THE EUROPEAN LINGUISTIC PALETTE. THE ROMANIAN LINGUISTS HAVE ALWAYS SHOWN A PERMANENT CONCERN FOR BOTH THE IDENTITY AND THE STATUS OF THE ROMANIAN LANGUAGE AND ITS DIALECTS, THUS SUPPORTING THE EXISTENCE OF THE ETHNIC, LINGUISTIC AND CULTURAL PARTICULARITIES OF THE MINORITIES AND REJECTING, FIRMLY, ANY ATTEMPT TO ASSIMILATE THEM BY FORCE KEYWORDS: MULTILINGUALISM, DIALECT, ASSIMILATION, OFFICIAL LANGUAGE, SPOKEN LANGUAGE. The Romanian language - the only Romance language in Eastern Europe - is an "island" of Latinity in a mainly "Slavic sea" - including its dialects from the south of the Danube – Aromanian, Megleno-Romanian and Istro-Romanian. Multilingualism is defined narrowly as the alternative use of several languages; widely, it is use of several alternative language systems, regardless of their status: different languages, dialects of the same language or even varieties of the same idiom, being a natural consequence of linguistic contact. Multilingualism is an Europe value and a shared commitment, with particular importance for initial education, lifelong learning, employment, justice, freedom and security. Romanian language, with its four dialects - Daco-Romanian, Aromanian, Megleno- Romanian and Istro-Romanian – is the continuance of the Latin language spoken in the eastern parts of the former Roman Empire. Together with the Dalmatian language (now extinct) and central and southern Italian dialects, is part of the Apenino-Balkan group of Romance languages, different from theAlpine–Pyrenean group2.
    [Show full text]
  • The Unicode Cookbook for Linguists: Managing Writing Systems Using Orthography Profiles
    Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2017 The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles Moran, Steven ; Cysouw, Michael DOI: https://doi.org/10.5281/zenodo.290662 Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-135400 Monograph The following work is licensed under a Creative Commons: Attribution 4.0 International (CC BY 4.0) License. Originally published at: Moran, Steven; Cysouw, Michael (2017). The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles. CERN Data Centre: Zenodo. DOI: https://doi.org/10.5281/zenodo.290662 The Unicode Cookbook for Linguists Managing writing systems using orthography profiles Steven Moran & Michael Cysouw Change dedication in localmetadata.tex Preface This text is meant as a practical guide for linguists, and programmers, whowork with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together. The intersection of the Unicode Standard and the International Phonetic Al- phabet is often not met without frustration by users. Nevertheless, thetwo standards have provided language researchers with a consistent computational architecture needed to process, publish and analyze data from many different languages. We bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Our research uses quantitative methods to compare languages and uncover and clarify their phylogenetic relations. However, the majority of lexical data available from the world’s languages is in author- or document-specific orthogra- phies.
    [Show full text]
  • (Daco-Romanian and Aromanian) in Verbs of Latin Origin
    CONJUGATION CHANGES IN THE EVOLUTION OF ROMANIAN (DACO-ROMANIAN AND AROMANIAN) IN VERBS OF LATIN ORIGIN AIDA TODI1, MANUELA NEVACI2 Abstract. Having as starting point for research on the change of conjugation of Latin to the Romance languages, the paper aims to present the situation of these changes in Romanian: Daco-Romanian (that of the old Romanian texts) and Aromanian dialect (which does not have a literary standard). Keywords: conjugation changes; Romance languages; Romanian language (Daco-Romanian and Aromanian). Conjugation changes are a characteristic feature of the Romance verb system. In some Romance languages (Spanish, Catalan, Portuguese, Sardinian) verbs going from one conjugation to another has caused the reduction of the four conjugations inherited from Latin to three inflection classes: in Spanish and Portuguese the 2nd conjugation extended (verbs with stressed theme vowel): véndere > Sp., Pg. vender; cúrere > Sp., Pg. correr; in Catalan 3rd conjugation verbs assimilated the 2nd conjugation ones, a phenomenon occurring in Sardinian as well: Catal. ventre, Srd. biere; Additionally, in Spanish and Portuguese the 4th conjugation also becomes strong, assimilating 3rd conjugation verbs: petĕre > Sp., Pg. petir, ungĕre> Sp., Pg. ungir, iungĕre> Sp., Pg. ungir (Lausberg 1988: 259). Lausberg includes Aromanian together with Spanish, Catalan and Sardinian, where the four conjugations were reduced to three, mentioning that 3rd conjugation verbs switched to the 2nd conjugation3. The process of switching from one conjugation to another is frequent from as early as vulgar Latin. Grammars experience changes such as: augĕre > augēre; ardĕre> ardēre; fervĕre > fervēre; mulgĕre> mulgēre; respondĕre> respondēre; sorbĕre> sorbēre; torquĕre > torquēre; tondĕre >tondēre (Densusianu 1961: 103, 1 “Ovidius” University, Constanţa, [email protected].
    [Show full text]
  • East European Languages
    Latest Trends on Cultural Heritage and Tourism SEMANTIC CONCORDANCE IN THE SOUTH- EAST EUROPEAN LANGUAGES ANCUłA NEGREA, AGNES ERICH, ZLATE STEFANIA Faculty of Humanities Valahia University of Târgoviste Address: Street Lt. Stancu Ion, No. 35, Targoviste COUNTRY ROMANIA [email protected]; [email protected] Abstract: - The common terms of the southeast European languages that present semantic convergences [1], belong to a less explored domain, that of comparative linguistics. “The reason why the words have certain significations and not others is not found in the linguistic system or in the human biological or mental structures … it is the societies that, through their diverse history, determine the significations, and the semantics is found at the crossroads between the historical complexity of the reality it studies and the historical complexity of the culture reflecting on this reality… its object is the result and the condition of the historical character, thanks to the relations of mutual conditioning that relate the signification to the historic events of each human community.”[2] Key-Words: Balkan linguistics, terminology of the word “house”: Bg. kăšta, Srb., Cr. kuča, Maced. kuk’a, slov. koča < old sl. kotia meaning “what is hidden, what is sheltered”, Rom. casa < Lat. casa; Ngr. σπητη < Lat. hospitium; Alb. shtepi. 1 Introduction NeoGreek, and Albanese, present unanimous The “mentality” common to the Balkan world, concrdances for the terms having the meaning: having manifestations in the language, is the - “house”, “dwelling” and also historical product of the cohabitation in similar - ”room”, “place to live”, though these forms of civilization and culture. terms have different origins: Kristian SANDFELD, in his fundamental work - Bg.
    [Show full text]
  • CSS 2.1) Specification
    Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification W3C Editors Draft 26 February 2014 This version: http://www.w3.org/TR/2014/ED-CSS2-20140226 Latest version: http://www.w3.org/TR/CSS2 Previous versions: http://www.w3.org/TR/2011/REC-CSS2-20110607 http://www.w3.org/TR/2008/REC-CSS2-20080411/ Editors: Bert Bos <bert @w3.org> Tantek Çelik <tantek @cs.stanford.edu> Ian Hickson <ian @hixie.ch> Håkon Wium Lie <howcome @opera.com> Please refer to the errata for this document. This document is also available in these non-normative formats: plain text, gzip'ed tar file, zip file, gzip'ed PostScript, PDF. See also translations. Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract This specification defines Cascading Style Sheets, level 2 revision 1 (CSS 2.1). CSS 2.1 is a style sheet language that allows authors and users to attach style (e.g., fonts and spacing) to structured documents (e.g., HTML documents and XML applications). By separating the presentation style of documents from the content of documents, CSS 2.1 simplifies Web authoring and site maintenance. CSS 2.1 builds on CSS2 [CSS2] which builds on CSS1 [CSS1]. It supports media- specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. It also supports content positioning, table layout, features for internationalization and some properties related to user interface. CSS 2.1 corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's "style" attribute and a new calculation of the 'clip' property), and adds a few highly requested features which have already been widely implemented.
    [Show full text]
  • Aleksey A. ROMANCHUK ROMANIAN a CINSTI in the LIGHT of SOME
    2021, Volumul XXIX REVISTA DE ETNOLOGIE ȘI CULTUROLOGIE E-ISSN: 2537-6152 101 Aleksey A. ROMANCHUK ROMANIAN A CINSTI IN THE LIGHT OF SOME ROMANIAN-SLAVIC CONTACTS1 https://doi.org/10.52603/rec.2021.29.14 Rezumat определенное отражение следы обозначенного поздне- Cuvântul românesc a cinsti în lumina unor contacte праславянского диалекта (диалектов). В частности, к româno-slave таким следам, возможно, стоит отнести как украинские Pornind de la comparația dintre cuvântul ucrainean диалектные чандрий, шандрий, чендрий, так и диалект- частувати ‚a trata’ și românescul a cinsti‚ a trata (cu vin), ное (зафиксировано в украинском говоре с. Булэешть) / a bea vin, se consideră un grup de împrumuturi slave cu мон|золетеи/ ‘мусолить; впустую теребить’. /n/ epentetic în limba română, căreia îi aparține și a cin- Ключевые слова: славяне, румыны, лексические sti. Interpretând corpul de fapte disponibil, putem presu- заимствования, этническая история, украинские диа- pune că convergență semantică dintre cuvintele честь и лекты, Молдова, Буковина. угощение a apărut în perioada slavică timpurie. Cuvântul românesc a cinsti este un argument important pentru da- Summary tarea timpurie a apariției acestei convergențe semantice. Romanian a cinsti in the light of some Așadar, cuvântul ucrainean частувати, ca și cuvântul po- Romanian-Slavic contacts lonez częstowac, apar independent unul de celălalt, la fel A group of Slavic loanwords with epenthetic /n/ in the ca și de cuvântul românesc a cinsti. Cuvântul românesc a Romanian language, to which a cinsti belongs, is consid- cinsti, ca și, în general, grupul menționat de împrumuturi ered. Interpreting the existing set of facts, the author sup- slave cu /n/ epentetic în limba română, reprezintă un rezul- poses that the semantic convergence between the Slav- tat al contactelor timpurii ale limbii române cu un dialect ic честь ‘honour’ and угощение ‘treat’ appeared as far back (dialecte) slav vechi, pentru care era caracteristică tendința as the Late Slavic period.
    [Show full text]
  • The Balkan Vlachs/Aromanians Awakening, National Policies, Assimilation Miroslav Ruzica Preface the Collapse of Communism, and E
    The Balkan Vlachs/Aromanians Awakening, National Policies, Assimilation Miroslav Ruzica Preface The collapse of communism, and especially the EU human rights and minority policy programs, have recently re-opened the ‘Vlach/Aromanian question’ in the Balkans. The EC’s Report on Aromanians (ADOC 7728) and its separate Recommendation 1333 have become the framework for the Vlachs/Aromanians throughout the region and in the diaspora to start creating programs and networks, and to advocate and shape their ethnic, cultural, and linguistic identity and rights.1 The Vlach/Aromanian revival has brought a lot of new and reopened some old controversies. A increasing number of their leaders in Serbia, Macedonia, Bulgaria and Albania advocate that the Vlachs/Aromanians are actually Romanians and that Romania is their mother country, Romanian language and orthography their standard. Such a claim has been officially supported by the Romanian establishment and scholars. The opposite claim comes from Greece and argues that Vlachs/Aromanians are Greek and of the Greek culture. Both countries have their interpretations of the Vlach origin and history and directly apply pressure to the Balkan Vlachs to accept these identities on offer, and also seek their support and political patronage. Only a minority of the Vlachs, both in the Balkans and especially in the diaspora, believes in their own identity or that their specific vernaculars should be standardized, and that their culture has its own specific elements in which even their religious practice is somehow distinct. The recent wars for Yugoslav succession have renewed some old disputes. Parts of Croatian historiography claim that the Serbs in Croatia (and Bosnia) are mainly of Vlach origin, i.e.
    [Show full text]
  • Iso/Iec Jtc1 Sc2 Wg2 N3984
    ISO/IEC JTC1 SC2 WG2 N3984 Notes on the naming of some characters proposed in the FCD of ISO/IEC 10646:2012 (3rd ed.) (JTC1/SC2/WG2 N3945, JTC1/SC2 N4168) Karl Pentzlin — 2011-02-02 This paper addresses the naming of the following characters proposed for inclusion: U+2E33 RAISED DOT U+2E34 RAISED COMMA (both proposed in JTC1/SC2/WG2 N3912) U+A78F LATIN LETTER GLOTTAL DOT (proposed in JTC1/SC2/WG2 N3567 as LATIN LETTER MIDDLE DOT) shown by the following excerpts from N3945: 1. Dots, Points, and "middle" and "raised" characters encoded in Unicode 6.0 (without script-specific ones and fullwidth/small forms; of similar characters within a block, a single character is selected as representative): Dots and Points: U+002E FULL STOP U+00B7 MIDDLE DOT U+02D9 DOT ABOVE U+0387 GREEK ANO TELEIA U+2024 ONE DOT LEADER U+2027 HYPHENATION POINT U+22C5 DOT OPERATOR U+2E31 WORD SEPARATOR MIDDLE DOT U+02D9 has the property Sk (Symbol, modifier), U+22C5 has Sm (Symbol, math), while all others have Po (Punctuation, other). U+A78F is proposed to have Lo (Letter, other). "Middle" and "raised" characters: U+02F4 MODIFIER LETTER MIDDLE GRAVE ACCENT U+2E0C LEFT RAISED OMISSION BRACKET (representing several similar characters in the same block) U+02F8 MODIFIER LETTER RAISED COLON U+A71B MODIFIER LETTER RAISED UP ARROW The following table will show these characters using some different widespread fonts. Font 002E 00B7 02D9 0387 2024 2027 22C5 2E31 02F4 2E0C 02F8 A71B Andron Mega Corpus x.E x·E x˙E α·Ε x․E x‧E x⋅E --- x˴E x⸌E x˸E --- Arial x.E x·E x˙E α·Ε x․E x‧E
    [Show full text]
  • Applications and Innovations in Typeface Design for North American Indigenous Languages Julia Schillo, Mark Turin
    Applications and innovations in typeface design for North American Indigenous languages Julia Schillo, Mark Turin To cite this version: Julia Schillo, Mark Turin. Applications and innovations in typeface design for North American Indige- nous languages. Book 2.0, Intellect Ltd, 2020, 10 (1), pp.71-98. 10.1386/btwo_00021_1. halshs- 03083476 HAL Id: halshs-03083476 https://halshs.archives-ouvertes.fr/halshs-03083476 Submitted on 22 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. BTWO 10 (1) pp. 71–98 Intellect Limited 2020 Book 2.0 Volume 10 Number 1 btwo © 2020 Intellect Ltd Article. English language. https://doi.org/10.1386/btwo_00021_1 Received 15 September 2019; Accepted 7 February 2020 Book 2.0 Intellect https://doi.org/10.1386/btwo_00021_1 10 JULIA SCHILLO AND MARK TURIN University of British Columbia 1 71 Applications and 98 innovations in typeface © 2020 Intellect Ltd design for North American 2020 Indigenous languages ARTICLES ABSTRACT KEYWORDS In this contribution, we draw attention to prevailing issues that many speakers orthography of Indigenous North American languages face when typing their languages, and typeface design identify examples of typefaces that have been developed and harnessed by histor- Indigenous ically marginalized language communities.
    [Show full text]
  • The Moldavian and Romanian Dialectal Corpus
    MOROCO: The Moldavian and Romanian Dialectal Corpus Andrei M. Butnaru and Radu Tudor Ionescu Department of Computer Science, University of Bucharest 14 Academiei, Bucharest, Romania [email protected] [email protected] Abstract In this work, we introduce the Moldavian and Romanian Dialectal Corpus (MOROCO), which is freely available for download at https://github.com/butnaruandrei/MOROCO. The corpus contains 33564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21719 samples for training, 5921 samples for validation and another 5924 Figure 1: Map of Romania and the Republic of samples for testing. For each sample, we Moldova. provide corresponding dialectal and category labels. This allows us to perform empirical studies on several classification tasks such as Romanian is part of the Balkan-Romance group (i) binary discrimination of Moldavian versus that evolved from several dialects of Vulgar Romanian text samples, (ii) intra-dialect Latin, which separated from the Western Ro- multi-class categorization by topic and (iii) mance branch of languages from the fifth cen- cross-dialect multi-class categorization by tury (Coteanu et al., 1969). In order to dis- topic. We perform experiments using a shallow approach based on string kernels, tinguish Romanian within the Balkan-Romance as well as a novel deep approach based on group in comparative linguistics, it is referred to character-level convolutional neural networks as Daco-Romanian. Along with Daco-Romanian, containing Squeeze-and-Excitation blocks.
    [Show full text]
  • Hungarian Influence Upon Romanian People and Language from the Beginnings Until the Sixteenth Century
    SECTION: HISTORY LDMD I HUNGARIAN INFLUENCE UPON ROMANIAN PEOPLE AND LANGUAGE FROM THE BEGINNINGS UNTIL THE SIXTEENTH CENTURY PÁL Enikő, Assistant Professor, PhD, „Sapientia” University of Târgu-Mureş Abstract: In case of linguistic contacts the meeting of different cultures and mentalities which are actualized within and through language are inevitable. There are some common features that characterize all kinds of language contacts, but in describing specific situations we will take into account different causes, geographical conditions and particular socio-historical settings. The present research focuses on the peculiarities of Romanian – Hungarian contacts from the beginnings until the sixteenth century insisting on the consequences of Hungarian influence upon Romanian people and language. Hungarians induced, directly or indirectly, many social and cultural transformations in the Romanian society and, as expected, linguistic elements (e.g. specific terminologies) were borrowed altogether with these. In addition, Hungarian language influence may be found on each level of Romanian language (phonetic, lexical etc.), its contribution consists of enriching the Romanian language system with new elements and also of triggering and accelerating some internal tendencies. In the same time, bilingualism has been a social reality for many communities, mainly in Transylvania, which contributed to various linguistic changes of Romanian language not only in this region, but in other Romanian linguistic areas as well. Hungarian elements once taken over by bilinguals were passed on to monolinguals, determining the development of Romanian language. Keywords: language contact, (Hungarian) language influence, linguistic change, bilingualism, borrowings Preliminary considerations Language contacts are part of everyday social life for millions of people all around the world (G.
    [Show full text]
  • Table of Contents
    CentralNET Business User Guide Table of Contents Federal Reserve Holiday Schedules.............................................................................. 3 About CentralNET Business ......................................................................................... 4 First Time Sign-on to CentralNET Business ................................................................. 4 Navigation ..................................................................................................................... 5 Home ............................................................................................................................. 5 Balances ........................................................................................................................ 5 Balance Inquiry Terms and Features ........................................................................ 5 Account & Transaction Inquiries .................................................................................. 6 Performing an Inquiry from the Home Screen ......................................................... 6 Initiating Transfers & Loan Payments .......................................................................... 7 Transfer Verification ................................................................................................. 8 Reporting....................................................................................................................... 8 Setup (User Setup) .......................................................................................................
    [Show full text]