Open-Source Morphology for Endangered Mordvinic Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Финно-Угроведение, 2020 (№ 61) Научный Журнал Основан В Феврале 1994 Г
ФИННО-УГРОВЕДЕНИЕ, 2020 (№ 61) Научный журнал Основан в феврале 1994 г. ISSN 2312-0312 (Print), ISSN 2713-2250 (Online) Учредитель – Государственное бюджетное научное учреждение при Правительстве Республики Марий Эл «Марийский научно-исследовательский институт языка, литературы и истории им. В.М. Васильева» Редакционный совет: Кузьмин Евгений Петрович, председатель ред. совета, канд. ист. н., директор МарНИИЯЛИ (г. Йошкар-Ола) Пенькова Мария Викентьевна, секретарь ред. совета, канд. филол. н., зам. директора-ученый секретарь МарНИИЯЛИ (г. Йошкар-Ола) Агранат Татьяна Борисовна, д-р филол. н., ведущий научный сотрудник Института языкознания РАН (г. Москва) Воронцов Владимир Степанович, канд. ист. н., старший научный сотрудник Удмуртского института истории, языка и литературы УрО РАН (г. Ижевск) Зайцева Нина Григорьевна, д-р филол. н., зав. сектором языкознания Института языка, литературы и истории КарНЦ РАН (г. Петрозаводск) Косинцева Елена Викторовна, д-р филол. н., доцент, главный научный сотрудник Обско-угорского института прикладных исследований и разработок (г. Ханты-Мансийск) Крыласова Наталья Борисовна, д-р ист. н., главный научный сотрудник Пермского федерального исследовательского центра УрО РАН (г. Пермь) Куршева Галина Александровна, д-р ист. н., директор Научно-исследовательского института гуманитарных наук при Правительстве Республики Мордовия (г. Саранск) Лехтинен Илдико, д-р философии, доцент Хельсинкского университета (г. Хельсинки, Финляндия) Ошаев Алексей Григорьевич, канд. ист. н., декан историко-филологического факультета Марийского государственного университета (г. Йошкар-Ола) Пустаи Янош, профессор, д-р философии, председатель Ассоциации финно-угорских литератур, директор Института Collegium Fenno-Ugricum (г. Будапешт, Венгрия) Ракин Анатолий Николаевич, д-р филол. н., главный научный сотрудник Института языка, литературы и истории Коми НЦ УрО РАН (г. Сыктывкар) Сергеев Олег Арсентьевич, канд. филол. н., старший научный сотрудник МарНИИЯЛИ (г. -
Multilingual Facilitation
Multilingual Facilitation Honoring the career of Jack Rueter Mika Hämäläinen, Niko Partanen and Khalid Alnajjar (eds.) Multilingual Facilitation This book has been authored for Jack Rueter in honor of his 60th birthday. Mika Hämäläinen, Niko Partanen and Khalid Alnajjar (eds.) All papers accepted to appear in this book have undergone a rigorous peer review to ensure high scientific quality. The call for papers has been open to anyone interested. We have accepted submissions in any language that Jack Rueter speaks. Hämäläinen, M., Partanen N., & Alnajjar K. (eds.) (2021) Multilingual Facilitation. University of Helsinki Library. ISBN (print) 979-871-33-6227-0 (Independently published) ISBN (electronic) 978-951-51-5025-7 (University of Helsinki Library) DOI: https://doi.org/10.31885/9789515150257 The contents of this book have been published under the CC BY 4.0 license1. 1 https://creativecommons.org/licenses/by/4.0/ Tabula Gratulatoria Jack Rueter has been in an important figure in our academic lives and we would like to congratulate him on his 60th birthday. Mika Hämäläinen, University of Helsinki Niko Partanen, University of Helsinki Khalid Alnajjar, University of Helsinki Alexandra Kellner, Valtioneuvoston kanslia Anssi Yli-Jyrä, University of Helsinki Cornelius Hasselblatt Elena Skribnik, LMU München Eric & Joel Rueter Heidi Jauhiainen, University of Helsinki Helene Sterr Henry Ivan Rueter Irma Reijonen, Kansalliskirjasto Janne Saarikivi, Helsingin yliopisto Jeremy Bradley, University of Vienna Jörg Tiedemann, University of Helsinki Joshua Wilbur, Tartu Ülikool Juha Kuokkala, Helsingin yliopisto Jukka Mettovaara, Oulun yliopisto Jussi-Pekka Hakkarainen, Kansalliskirjasto Jussi Ylikoski, University of Oulu Kaisla Kaheinen, Helsingin yliopisto Karina Lukin, University of Helsinki Larry Rueter LI Līvõd institūt Lotta Jalava, Kotimaisten kielten keskus Mans Hulden, University of Colorado Marcus & Jackie James Mari Siiroinen, Helsingin yliopisto Marja Lappalainen, M. -
FENNO SUECANA FENNO-UGRICA SUECANA Nova Series 15 UGRICA
FENNO-UGRICA SUECANA Nova series Journal of Fenno -Ugric R esearch in Scandinavia 15 Institutionen för slaviska och baltiska språk, finska, nederländska och tyska Stockholm 2016 FENNO-UGRICA SUECANA Nova series Journal of Fenno-Ugric Research in Scandinavia 15 Editor-in-chief: Jarmo Lainio Issue editors: Peter S. Piispanen & Merlijn de Smit Editorial board: Jarmo Lainio, Stockholm Peter S. Piispanen, Stockholm Merlijn de Smit, Stockholm/Turku Stockholm 2016 © 2016 The authors Institutionen för slaviska och baltiska språk, finska, nederländska och tyska ISSN 0348-3045 ISBN 978-91-981559-0-7 FENNO-UGRICA SUECANA – Nova series 15 ARTICLES • Peter Piispanen : Statistical Dating of Finno-Mordvinic Languages through Comparative Linguistics and Sound Laws, p. 1 – 58 • Ante Aikio & Jussi Ylikoski: The origin of the Finnic l-cases, p. 59 – 158 • Håkan Rydving: Sydsamisk eller umesamisk? ”Södra Tärna” i det samiska språklandskapet, p. 159 – 174 • Riitta-Liisa Valijärvi : Ruotsinsuomalaisten opiskelijoiden kirjallisen tuotoksen morfosyntaksin ja sanaston virheanalyysia, p. 175 – 200 REVIEW No reviews in this issue. REPORTS • Lasse Vuorsola : Atmosfärförändring inom klimatdebatten, p. 201 – 207 REVIEW ARTICLE No review articles in this issue. FUS 15, 2016, ISSN 0348-3045, ISBN 978-91-981559-0-7 (online) Fenno-Ugrica Suecana – Nova Series This is the second issue of the revitalized Fenno-Ugrica Suecana series (now additionally termed ‘Nova Series‘), which continues to focus on issues concerning Fenno-Ugristics and Fennistics, in a wide sense. We invite researchers, teachers and other scientifically interested persons to send us contributions for publication. The planned annual deadlines are October 15th in the fall and February 15th in the spring. -
Περì O̓ρθότητος Ἐ Τύμων Uusiutuva Uralilainen Etymologia
ΠΕΡI OΡΘΌΤΗΤΌΣ EΤΎΜΩΝ UUSIUTUVA URALILAINEN ETYMOLOGIA Uralica Helsingiensia11 Περì ̓ ρθότητοςo ἐ τύμων Uusiutuva uralilainen etymologia TOIMITTANEET / EDITED BY SAMPSA HOLOPAINEN & JANNE SAARIKIVI HELSINKI 2018 Sampsa Holopainen & Janne Saarikivi (eds): Περı` o̓ ρθότητος ἐ τύμων. Uusiutuva uralilainen etymologia. Uralica Helsingiensia 11. Cover Rigina Ajanki Layout Mari Saraheimo English proofreading Alexandra Kellner ISBN 978-952-7262-04-7 (print) ISBN 978-952-7262-05-4 (online) ISSN 1797-3945 Orders • Tilaukset Printon Tiedekirja <www.tiedekirja.fi> Tallinn 2018 Snellmaninkatu 13 <[email protected]> FI-00170 Helsinki Uralica Helsingiensia Uralica Helsingiensia is a series published by the Finno-Ugrian Society. It features mono- graphs and thematic collections of articles with a research focus on Uralic languages, and it also covers the linguistic and cultural aspects of Estonian, Hungarian and Saami studies at the University of Helsinki. The series has a peer review system, i.e. the manuscripts of all articles and monographs submitted for publication will be refereed by two anonymous reviewers before binding decisions are made concerning the publication of the material. Uralica Helsingiensia on Suomalais-Ugrilaisen Seuran julkaisema sarja, jossa ilmestyy mono- grafioita ja temaattisia artikkelikokoelmia. Niiden aihepiiri liittyy uralilaisten kielten tutkimuk- seen ja kattaa myös Helsingin yliopistossa tehtävän Viron kielen ja kulttuurin, Unkarin kielen ja kulttuurin sekä saamentutkimuksen. Sarjassa noudatetaan refereekäytäntöä, eli -
Book of Abstracts
Congressus Duodecimus Internationalis Fenno-Ugristarum, Oulu 2015 Book of Abstracts Edited by Harri Mantila Jari Sivonen Sisko Brunni Kaisa Leinonen Santeri Palviainen University of Oulu, 2015 Oulun yliopisto, 2015 Photographs: © Oulun kaupunki ja Oulun yliopisto ISBN: 978-952-62-0851-0 Juvenes Print This book of abstracts contains all the abstracts of CIFU XII presentations that were accepted. Chapter 1 includes the abstracts of the plenary presentations, chapter 2 the abstracts of the general session papers and chapter 3 the abstracts of the papers submitted to the symposia. The abstracts are presented in alphabetical order by authors' last names except the plenary abstracts, which are in the order of their presentation in the Congress. The abstracts are in English. Titles in the language of presentation are given in brackets. We have retained the transliteration of the names from Cyrillic to Latin script as it was in the original papers. Table of Contents 1 Plenary presentations 7 2 Section presentations 19 3 Symposia 197 Symp. 1. Change of Finnic languages in a multilinguistic environment .......................................................................... 199 Symp. 2. Multilingual practices and code-switching in Finno-Ugric communities .......................................................................... 213 Symp. 3. From spoken Baltic-Finnic vernaculars to their national standardizations and new literary languages – cancelled ...... 231 Symp. 4. The syntax of Samoyedic and Ob-Ugric languages ...... 231 Symp. 5. The development -
Kone Foundation Te Language Programme 2012–2016
Kone Foundation !e Language Programme – Kone Foundation !e Language Programme .. Patrik Söderlund Mynäprint, Mynämäki "e Kone Foundation Language Programme – !"#$%#$& Background Kone Foundation Places Emphasis on the Minority Languages of the Baltic Region and on Finnish Supporting Multilingualism with the Help of the Language Programme "e Current Support for Finno-Ugrian Minorities and Languages Mission Vision "e Mission: Documentation of Small Finno-Ugrian Languages, Finnish, and Minority Languages in Finland Enhancing the Availability, Accessibility and Usability of Corpora Taking into Consideration All Kinds of Language Users Advancing a Culture of Openness and Interaction Advancing Multidisciplinarity and Crossing New Boundaries How Are the Objectives Achieved? Concrete Measures – Communication and Cooperation between Projects Projects Funded "rough Annual Grant Application Rounds, Specially Allocated Grant Calls, and Projects Initiated by the Foundation An Example of the Use and Development of Language Technology Applications Monitoring the Programme Koneen Säätiön ovat perustaneet vuorineuvos Heikki H. Herlin ja eko- nomi Pekka Herlin vuonna . Perustajat toimivat samanaikaisesti sekä säätiön että Kone Osakeyhtiön johdossa, joten kahden organisaation yh- teys oli läheinen. Nykyisin Koneen Säätiö on itsenäinen ja riippumaton yleishyödyllinen organisaatio, joka edistää tieteellistä tutkimustyötä, eri- tyisesti humanistista, yhteiskuntatieteellistä ja ympäristöntutkimusta, sekä taidetta ja kulttuuria Suomessa. -
The Grown-Up Siblings: History and Functions of Western Uralic *Kse
Rigina Ajanki University of Helsinki The grown-up siblings: history and functions of Western Uralic *kse In this paper, it is claimed that the case suffix *kse, known as translative, dates back to the Finnic-Mordvin proto language, where it functioned as a functive� It is illustrated using synchronic data from Finnic-Mordvin languages that the functions of *kse do not display an inherent feature of directionality ‘into’, or in other terms, lative� It is even possible that the suffix was neutral with respect to time stability, as it is in contemporary Erzya� Further, it is assumed that since the Northern Finnic languages have acquired a new stative case, the functive labelled essive *nA, formerly applied as an intralocal case, the functions of *kse have changed in these languages: *kse has become mainly the marker of a transformative, with an inherent feature of dynamicity� 1� Introduction 5�1� Translative with 2� Typological background: stative copula in Finnish: *kse as a functive similatives and functives 3� Translatives in the case systems 5�2� Finnish ditransitive of Finnic-Mordvin languages constructions displaying 3�1� The Finnish case translative-essive variation system revisited 5�3� Erzya and Finnish expressions 3�2� Functives as of order in translative non-verbal predicates 6� The developmental path 4� The emergence of *kse of translative *kse as a case suffix 7� The emergence of essive 5� The functions of *kse and its consequences for in contemporary the functions of *kse Finnic-Mordvin languages 1. Introduction The translative -
Baltic Loanwords in Mordvin
Riho Grünthal Department of Finnish, Finno-Ugrian and Scandinavian Studies University of Helsinki Baltic loanwords in Mordvin Linguists have been aware of the existence of Baltic loanwords in the Mord- vinic languages Erzya (E) and Moksha (M) since the 19th century. However, the analysis and interpretation of individual etymologies and the contacts between these two language groups have been ambiguous, as the assumptions on the place and age of the contacts have changed. The assertions on the prehistoric development and early language contacts between the Finno-Ugric (Uralic) and Indo-European languages have changed as well. The main evidence concerning early Baltic loanwords in the Finno-Ugric languages is drawn from the Finnic languages, which are located geographically further west relative to Mordvinic. The high number of early Baltic loanwords in the Finnic languages suggests that the most intensive contacts took place between the early varieties of the Finnic and Baltic languages and did not infl uence other Finno-Ugric languages to the same extent. In principle, the continuity of these contacts extends until the modern era and very recent contacts between Estonian, Livonian, and Latvian that are geographical neighbors and documented languages with a concrete geo- graphical distribution, historical and cultural context. The Baltic infl uence on the Saamic and Mordvinic languages was much less intensive, as evidenced by the considerably lesser number of loanwords. Moreover, the majority of Baltic loanwords in Saamic are attested in the Finnic languages as well, whereas the early Baltic infl uence on the Mordvinic lan- guages diverges from that on the Finnic languages. -
Book of Abstracts
Congressus Duodecimus Internationalis Fenno-Ugristarum, Oulu 2015 Book of Abstracts Edited by Harri Mantila Jari Sivonen Sisko Brunni Kaisa Leinonen Santeri Palviainen University of Oulu, 2015 Oulun yliopisto, 2015 Photographs: © Oulun kaupunki ja Oulun yliopisto ISBN: 978-952-62-0851-0 Juvenes Print This book of abstracts contains all the abstracts of CIFU XII presentations that were accepted. Chapter 1 includes the abstracts of the plenary presentations, chapter 2 the abstracts of the general session papers and chapter 3 the abstracts of the papers submitted to the symposia. The abstracts are presented in alphabetical order by authors' last names except the plenary abstracts, which are in the order of their presentation in the Congress. The abstracts are in English. Titles in the language of presentation are given in brackets. We have retained the transliteration of the names from Cyrillic to Latin script as it was in the original papers. Table of Contents 1 Plenary presentations 7 2 Section presentations 19 3 Symposia 199 Symp. 1. Change of Finnic languages in a multilinguistic environment .......................................................................... 201 Symp. 2. Multilingual practices and code-switching in Finno-Ugric communities .......................................................................... 215 Symp. 3. From spoken Baltic-Finnic vernaculars to their national standardizations and new literary languages – cancelled ...... 233 Symp. 4. The syntax of Samoyedic and Ob-Ugric languages ...... 233 Symp. 5. The development -
Information-Theoretic Causal Inference of Lexical Flow
Information-theoretic causal inference of lexical flow Johannes Dellert language Language Variation 4 science press Language Variation Editors: John Nerbonne, Martijn Wieling In this series: 1. Côté, Marie-Hélène, Remco Knooihuizen and John Nerbonne (eds.). The future of dialects. 2. Schäfer, Lea. Sprachliche Imitation: Jiddisch in der deutschsprachigen Literatur (18.–20. Jahrhundert). 3. Juskan, Martin. Sound change, priming, salience: Producing and perceiving variation in Liverpool English. 4. Dellert, Johannes. Information-theoretic causal inference of lexical flow. ISSN: 2366-7818 Information-theoretic causal inference of lexical flow Johannes Dellert language science press Dellert, Johannes. 2019. Information-theoretic causal inference of lexical flow (Language Variation 4). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/233 © 2019, Johannes Dellert Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/ ISBN: 978-3-96110-143-6 (Digital) 978-3-96110-144-3 (Hardcover) ISSN: 2366-7818 DOI:10.5281/zenodo.3247415 Source code available from www.github.com/langsci/233 Collaborative reading: paperhive.org/documents/remote?type=langsci&id=233 Cover and concept of design: Ulrike Harbort Typesetting: Johannes Dellert Proofreading: Amir Ghorbanpour, Aniefon Daniel, Barend Beekhuizen, David Lukeš, Gereon Kaiping, Jeroen van de Weijer, Fonts: Linux Libertine, Libertinus Math, Arimo, DejaVu Sans Mono Typesetting software:Ǝ X LATEX Language Science Press Unter den Linden 6 10099 Berlin, Germany langsci-press.org Storage and cataloguing done by FU Berlin Contents Preface vii Acknowledgments xi 1 Introduction 1 2 Foundations: Historical linguistics 7 2.1 Language relationship and family trees ............. -
Finno-Ugric Languages in Time and Space Tartu, Estonia, February 1-3, 2015
Ariste Conference Finno-Ugric Languages in Time and Space Tartu, Estonia, February 1-3, 2015 On the Morphological Alignment of Erzya and Moksha Verbal Paradigms Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki Finno-Ugric Languages in Time and Space => • Финнэнь-угрань кельтне: • Шкастонзо • Тарказост аравтозь => • Finno-Ugric Languages • In time • In their place Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki Not “Why” but “What for” • Open morphological analyzers • Giellatekno.uit.no • Robust • Requires more analyses to disambiguate itself • Open syntactical disambiguation • Constraint Grammar course in the works for Autumn 2015 • Open text to speech, Open translation Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki • Language affinity statistics • Morphology and Function • Moods and tenses • First Preterit • Non-ambiguous forms Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki Statistics we have seen or heard on Erzya and Moksha language affinity • Vocabulary statistics for Erzya and Moksha: • 80% affinity (Keresztes 2011: 118). • Actual figure 80.80% (Zaicz 1990: 7-13). • 116 words: 92.42% • 902 Erzya words: 685 Moksha cognates • 75.94% • 1039 Moksha words: 890 Erzya cognates • 85.66% Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki Statistics few have counted on Mordvin affinity • Paasonen’s “Mordwinisches Wörterbuch” 1990-1996, 2703 pages • 6952 root entries containing cognate information • Erzya 5302 = 76% • Moksha 4808 = 69% • Intersection 3158 = 45% • Stem entries: 21,754 • Erzya 14,524 = 67% • Moksha 12,945 = 59% • Intersection 5,785 = 27% Feb. 3, 2015 Jack Rueter, PhD. University of Helsinki Mutual Mordvin verb inventory based on Paasonen’s dictionary (1990-1996) Macro root articles (6,952) Erzya Moksha Intersection Union 1210 1198 734 1674 72.2% 71.5% 44% 100% Stem articles (21,754) Erzya Moksha Intersection Union 5,127 5,371 2,183 8,315 61.6% 64.6% 26% 100% Feb. -
Arxiv:2103.13275V1 [Cs.CL] 24 Mar 2021
When Word Embeddings Become Endangered Khalid Alnajjar[0000−0002−7986−2994] Department of Digital Humanities, Faculty of Arts, University of Helsinki, Finland [email protected] Abstract. Big languages such as English and Finnish have many nat- ural language processing (NLP) resources and models, but this is not the case for low-resourced and endangered languages as such resources are so scarce despite the great advantages they would provide for the language communities. The most common types of resources available for low-resourced and endangered languages are translation dictionar- ies and universal dependencies. In this paper, we present a method for constructing word embeddings for endangered languages using existing word embeddings of different resource-rich languages and the translation dictionaries of resource-poor languages. Thereafter, the embeddings are fine-tuned using the sentences in the universal dependencies and aligned to match the semantic spaces of the big languages; resulting in cross- lingual embeddings. The endangered languages we work with here are Erzya, Moksha, Komi-Zyrian and Skolt Sami. Furthermore, we build a universal sentiment analysis model for all the languages that are part of this study, whether endangered or not, by utilizing cross-lingual word em- beddings. The evaluation conducted shows that our word embeddings for endangered languages are well-aligned with the resource-rich languages, and they are suitable for training task-specific models as demonstrated by our sentiment analysis model which achieved a high accuracy. All our cross-lingual word embeddings and the sentiment analysis model have been released openly via an easy-to-use Python library. Keywords: Cross-lingual Word Embeddings · Endangered Languages · Sentiment Analysis.