Speech Recognition for Endangered and Extinct Samoyedic Languages

Total Page:16

File Type:pdf, Size:1020Kb

Speech Recognition for Endangered and Extinct Samoyedic Languages Speech Recognition for Endangered and Extinct Samoyedic languages Niko Partanen Mika Hämäläinen Tiina Klooster University of Helsinki University of Helsinki Luua Forestry School Finland and Rootroo Ltd Estonia [email protected] Finland [email protected] [email protected] Abstract are extinct. Even extinct Samoyedic languages have, however, been documented to various degrees. Our study presents a series of experiments on Documentation of Samoyedic languages has speech recognition with endangered and ex- reached a mature stage in last decades. Three lan- tinct Samoyedic languages, spoken in North- guages in this group have a recent monograph long ern and Southern Siberia. To best of our grammars in English. These are Tundra Nenets knowledge, this is the first time a functional ASR system is built for an extinct language. (Nikolaeva, 2014), Forest Enets (Siegl, 2013) and We achieve with Kamas language a Label Er- Nganasan (Wagner-Nagy, 2018). Similarly re- ror Rate of 15%, and conclude through care- sources on these languages have been steadily be- ful error analysis that this quality is already coming available for researchers, often in connec- very useful as a starting point for refined hu- tion with major language documentation projects, man transcriptions. Our results with related such as ‘The word stock of the Selkup language as Nganasan language are more modest, with best the main source of cultural and historical informa- model having the error rate of 33%. We show, however, through experiments where Kamas tion about a moribund endangered native ethnicum 1 2 training data is enlarged incrementally, that of Western Siberia’ , ‘Enets and Forest Nenets’ , 3 Nganasan results are in line with what is ex- ‘Corpus of Nganasan Folklore Texts’ , ‘Documen- pected under low-resource circumstances of tation of Enets: digitization and analysis of legacy the language. Based on this, we provide rec- field materials and fieldwork with last speakers’4, ommendations for scenarios in which further ‘Tundra Nenets Texts’5, ‘Comprehensive Documen- language documentation or archive process- tation and Analysis of Two Endangered Siberian ing activities could benefit from modern ASR Languages: Eastern Khanty and Southern Selkup’6, technology. All training data and processing scripts haven been published on Zenodo with ‘Selkup Language Corpus (SLC)’ (Budzisch et al., clear licences to ensure further work in this im- 2019), ‘Nganasan Spoken Language Corpus’ (Bryk- portant topic. ina et al., 2018), ‘INEL Selkup Corpus’ (Brykina et al., 2020), ‘INEL Kamas corpus’ (Gusev et al., 2019). Also Online Dictionary for Uralic Languages 1 Introduction (Rueter and Hämäläinen, 2017) includes Tundra Samoyedic languages are spoken in the Western Nenets lexical material. Siberia, and Tundra Nenets also extends far to the 1https://cordis.europa.eu/project/id/ European part of Northern Russia. These languages INTAS2005-1000006-8411 2 belong to the Uralic language family. All currently https://dobes.mpi.nl/projects/nenets/ 3https://iling-ran.ru/gusev/Nganasan/texts/ spoken Samoyedic languages are endangered (see 4https://elar.soas.ac.uk/Collection/MPI950079 Moseley (2010). Tundra Nenets is the largest lan- 5https://elar.soas.ac.uk/Collection/MPI120925 guage in the family, while some, such as Kamas, 6https://elar.soas.ac.uk/Collection/MPI43298 Our study here focuses on two of these languages: age (Partanen and Rießler, 2018). This responds Nganasan and Kamas. Our topic is ASR, but we an- well to OCR challenges identified earlier for these chor this work into a wider context of computational languages by Partanen (2017). The vast majority resources and language documentation. The work of these languages have virtually no language tech- has two goals: to examine feasibility of developing nology at the moment, but as there are increasingly an ASR system for an extinct language, in the case larger and larger corpora, the possibilities for future of Kamas, and to investigate the usability of such a work are many. system in a real on-going endangered language doc- One challenge in working with these endangered umentation scenario that is presented by Nganasan. languages is that very few researchers are able to These scenarios are not wide apart from one another. transcribe them confidently and accurately. In the Worldwide extinction of linguistic diversity has been past few years, however, speech recognition in en- recognized for the last 30 years (Krauss, 1992), and dangered language context has seen significant im- many languages are in a very endangered situation. provements, especially in scenarios where there is The models trained in this paper have been re- only one single speaker. Adams et. al. (2018) report leased and made openly accessible on Zenodo with a a reasonable accuracy under these circumstances al- permanent DOI7. As both corpora used in our study ready with just a few hours of transcribed data, with are distributed with restrictive non-commercial li- rapid increase in accuracy when there is more train- cense (CC-BY-NC-SA), we have also published our ing data. They also present a comparison of models training and testing materials with the same license. trained on different amounts of training data using We think open practices such as these will be gaining Na and Chatino data, which also inspired our own importance in the future, and we want to contribute comparative experiments. to this development as well by our example (Garellek We have also seen very recently large improve- et al., 2020). ments in such systems on related Uralic languages, for example Zyrian Komi (Hjortnaes et al., 2020b; 2 Context and Related Work Hjortnaes et al., 2020a). We have also seen experi- Despite the wide documentation activities, we have ments where ASR is being integrated to the language not witnessed large improvements in language tech- documentation workflows, for example, in Papuan nology and computational linguistics on these lan- context (Zahrer et al., 2020). Most widely applied guages. Thereby one of the goals in this paper speech recognition systems have been Persephone is to encourage further work on these languages, (Adams et al., 2018), Elpis (Foley et al., 2018) and also through our published new datasets. There is DeepSpeech (Hannun et al., 2014). In this paper, a rule based morphological analyser for Nganasan we present and discuss several experiments we have (Endrédy et al., 2010), which, however, appears to done using Persephone system. be available only through a web interface, and is not open access. 3 Languages and Data What it comes to other Samoyedic languages, a rule based Tundra Nenets morphological anal- Nganasan is an endangered Samoyedic language yser exists in the GiellaLT infrastructure (Moshagen spoken by Nganasans, a small ethnic group in et al., 2014)8 with availability through UralicNLP Taimyr Peninsula, Northern Siberia (Janhunen and (Hämäläinen, 2019). There are also early Selkup9 Gruzdeva, 2020). According to official statis- and Nganasan10 analysers in the same infrastruc- tics there are 470 Nganasans, from who approxi- ture. Also OCR models have been developed to tar- mately 125 speak the Nganasan language (Wagner- get early writing systems on these languages (Parta- Nagy, 2018, 3,17). Despite languages endanger- nen and Rießler, 2019b), with associated data pack- ment, plenty of documentation work has been con- ducted (Leisio, 2006; Wagner-Nagy, 2014; Ka- 7https://zenodo.org/record/4029494 8https://github.com/giellalt/lang-yrk heinen, 2020). Largest available Nganasan corpus 9https://github.com/giellalt/lang-sel was published in 2018 (Brykina et al., 2018), and it 10https://github.com/giellalt/lang-nio was used in our study. Kamas is another Samoyedic language, represent- ful and realistic scenario. ing the southern group of this branch of Uralic lan- As Nganasan corpus was significantly larger than guages. Kamas was spoken in the slopes of Sayan Kamas, also more preprocessing was needed, proba- mountains in the Central Siberia. It is believed that bly reflecting the longer time frame where it has been by the 19th century Kamas tribe consisted of only created. We excluded all segments that were shorted 130 people (Matveev, 1964). Kamas were forced than 400 milliseconds, and removed all empty an- to abandon their nomadic lifestyle in the beginning notations or annotations that contained only punctu- of 20th century, which, in connection with large so- ation characters. There were several invisible Uni- cietal changes, increased the contact with Russian code space characters that were removed. Also all speakers and led to a cultural assimilation (Klooster, annotations that contained number written as number 2015, 9). The last Kamas speaker was Klavdiya Plot- characters were excluded. We also removed from nikova, who was born in 1895 in a small village of Nganasan corpus all instances of utterances that con- Abalakovo in Central Siberia. She worked with sev- tained Cyrillic characters. eral linguists since 1960s, and this results in a sizable For both corpora the annotations that contained collection of Kamas recordings that are available in unclear words marked with HIAT conventions various archives. (Ehlich and Rehbein, 1976) were removed. When In 2019, a corpus containing transcribed versions the transcriptions contained annotations for non- of these materials was published (Gusev et al., 2019). verbal expressions, including coughing or laughter, We used Klavdiya Plotnikova’s part of the corpus we chose to remove
Recommended publications
  • Introducing Phonology
    Introducing Phonology Designed for students with only a basic knowledge of linguistics, this leading textbook provides a clear and practical introduction to phonology, the study of sound patterns in language. It teaches in a step-by-step fashion the logical techniques of phonological analysis and the fundamental theories that underpin it. This thoroughly revised and updated edition teaches students how to analyze phonological data, how to think critically about data, how to formulate rules and hypotheses, and how to test them. New to this edition: • Improved examples, over 60 exercises, and 14 new problem sets from a wide variety of languages encourage students to practice their own analysis of phonological processes and patterns • A new and updated reference list of phonetic symbols and an updated transcription system, making data more accessible to students • Additional online material includes pedagogical suggestions and password-protected answer keys for instructors david odden is Professor Emeritus in Linguistics at Ohio State University. Cambridge Introductions to Language and Linguistics This new textbook series provides students and their teachers with accessible introductions to the major subjects encountered within the study of language and linguistics. Assuming no prior knowledge of the subject, each book is written and designed for ease of use in the classroom or seminar, and is ideal for adoption on a modular course as the core recommended textbook. Each book offers the ideal introductory material for each subject, presenting students with an overview of the main topics encountered in their course, and features a glossary of useful terms, chapter previews and summaries, suggestions for further reading, and helpful exercises.
    [Show full text]
  • Migracijske Teme 4/1988
    Migracijske teme 15 (1999), 1-2: 63-153 UDK: 809.45-0 Izvorni znanstveni rad Primljeno: 17. 11. 1998. Paolo Agostini University of Padova [email protected] LANGUAGE RECONSTRUCTION – APPLIED TO THE URALIC LANGUAGES* SUMMARY After pointing out the shortcomings and methodological weakness of the general theory of linguistic reconstruction, the author disputes the alleged antiquity of Uralic. Proto-Uralic as recon- structed by the scholars seems to be the sum of a set of features belonging to several distinct language families. The paper examines a number of lexical concordances with historically attested languages and comes to the conclusion that the Proto-Uralic word-stock is the result of a sum of borrowings that took place from the most disparate languages: Balto-Slavic, Old Swedish, several Turkic dialects, Mongolic, Tunguz, Aramaic, Hebrew, Arabic, late Middle Persian dialects, Byzantine Greek and Latin. Yet, other languages may also come into account: Chinese, Caucasian languages as well as lan- guages unknown in present day are possible candidates. A large number of bases of the Uralic word- stock can be easily identified by following a few phonological constraints. The linguistic features of the Uralic daughter-languages seem to show that they originated from a pidgin language spoken along the merchant routes that connected the Silk Road to North- and East-European trade. It is a well-known phenomenon that sometimes, when groups of people speaking different languages come into contact for the first time, a new restricted language system (lingua franca or pidgin) comes into being in order to cater to essential common needs.
    [Show full text]
  • On the Article-Like Use of the Px2sg in Dolgan, Nganasan and Some Other Languages in an Areal Siberian Context1
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Jagiellonian Univeristy Repository On the Article-like Use of the Px2Sg in Dolgan, Nganasan and Some Other Languages in an Areal Siberian Context1 Marek Stachowski (Kraków) In Stachowski 1998 wurde gezeigt, dass das Possessivsuffix der 2. Person Sg. im Norddialekt des Dolganischen unter nganasanischem Einfluss die Funktion eines bestimmten Artikels erfüllen kann. Im vorliegenden Artikel wird B. Pakendorfs (2007) These, dass dieser Gebrauch des Possessivsuffixes auf ewenkischen Ein- fluss beruhe, sowie dass die Erscheinung im Dolganischen wie im Jakutischen eine gemeinsame Quelle habe, diskutiert. Zum Schluss wird nahegelegt, dass dieses Phänomen, das einerseits das Jakutische mit dem Selkupischen und an- dererseits das Dolganische mit dem Nganasanischen verbindet, möglicherweise zur Festlegung von zwei Spracharealen beitragen kann: „Tajmyr-Areal“ und „(Ur)Selkupisch-(Ur)Jakutisches“ Areal. Ten years ago I published a short article (Stachowski 1998) showing that the Px2Sg (= possessive suffix of the second person singular) can be used with the function of the definite article in the Northern dialect of Dolgan. Since the construction is completely untypical of a Turkic language, I suggested that it was developed under the influence of Nganasan. Some months ago, B. Pakendorf (2007) published a study addressing the same problem, and it is to her merit that she was able to present four Yakut sentences in which the Px2Sg was also used as a definite article.2 This material was unknown to me earlier, and it makes the phenomenon even more interesting. Pakendorf is of course quite right when she says that the Nganasan adstratum cannot be used to explain the origins of the phenomenon in Yakut because of the geographical distance between the two languages.
    [Show full text]
  • The Cassidy Code
    2019 THE CASSIDY CODE ZAHMOUL, Noureddine ABD/USA/США ABSTRACT Awareness about Basic Guttural Consonants, BGC, perdurable presence, since illo-tempore, in Hamito-Semitic languages, and conspicuous absence among Indo-European and Uralic languages, raises a case of interest. Tunsi Long Range Comparison, LRC, with the English and the Suomi languages entails discovery of regular differences, alternations, and reversal patterns hidden in the data. A brand new approach emerges facilitating languages LRC, and easing Language Origins Research, LOR. My first claim is about an unvoiced consonant gamut available to ofset eache missing BGC. My second claim covers the useful, notrivial, unobtrusive orijinal consonantal reversal phenomenon. The Cassidy Code is Sumerian. Grimm and Verner Laws sequel, alternating BGS with mostly unvoiced consonants or apocope entailing a forward shift of articulation basis, due finer pronunciation, and adding the tran mogrifying reversals. The idea is to put forward a paralel code, in LRC of language and LOR quest, to the focus on separate wide swaths of straight cognations. Key Words: Cassidy code, Sumerian, Language Origins Research (LOR), Basic Guttural Consonants (BGC), Long Range Comparison (LRC), Indo- European languages, Uralic languages. 2020 2021 2022 2023 Annex I: The Precession and the Forgotten Ice Age Jane B. Sellers sought during her sixty years of research to assess and demonstrate that: “Archeologists, by and large, lack an understanding of the precession and this affects their conclusions concerning ancient myths, ancient gods and ancient temple alignments. Philologists, too, ignore the accusation that certain problems are not going to be solved as long as they imagine that familiarity with grammar replaces scientific knowledge of astronomy.
    [Show full text]
  • A Grammar of Tundra Nenets Mouton Grammar Library
    Irina Nikolaeva A Grammar of Tundra Nenets Mouton Grammar Library Edited by Georg Bossong Bernard Comrie Matthew Dryer Patience L. Epps Volume 65 Irina Nikolaeva A Grammar of Tundra Nenets ISBN 978-3-11-032047-3 e-ISBN 978-3-11-032064-0 ISSN 0933-7636 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.dnb.de. 6 2014 Walter de Gruyter GmbH, Berlin/Boston Typesetting: RoyalStandard, Hong Kong Printing and binding: CPI buch bücher.de GmbH, Birkach ♾ Printed on acid-free paper Printed in Germany www.degruyter.com Acknowledgment This grammar is the result of many years of cooperation with members of the Tundra Nenets community, whose linguistic intuitions, passion for language, and, last but not least, extraordinary patience in dealing with me made it all possible. I am greatly indebted to all of you. Ңули” сава! I owe a great debt of gratitude to the colleagues with whom I have had the opportunity to work and discuss various intriguing aspects of Tundra Nenets grammar, especially to Farrell Ackerman, Larisa Leisiö and Tapani Salminen. I really miss our joint elicitation sessions; it was a lot of fun! Tapani Salminen was the first to intro- duce me to the language, and his own work on Tundra Nenets has always been a source of inspiration for me. I also thank Tapani and Larisa for their assistance in the practical aspects of my fieldwork.
    [Show full text]
  • Collaboration: Interoperability Between People in the Creation of Language Resources for Less-Resourced Languages a SALTMIL Workshop May 27, 2008 Workshop Programme
    Collaboration: interoperability between people in the creation of language resources for less-resourced languages A SALTMIL workshop May 27, 2008 Workshop Programme 14:30-14:35 Welcome 14:35-15:05 Icelandic Language Technology Ten Years Later Eiríkur Rögnvaldsson 15:05-15:35 Human Language Technology Resources for Less Commonly Taught Languages: Lessons Learned Toward Creation of Basic Language Resources Heather Simpson, Christopher Cieri, Kazuaki Maeda, Kathryn Baker, and Boyan Onyshkevych 15:35-16:05 Building resources for African languages Karel Pala, Sonja Bosch, and Christiane Fellbaum 16:05-16:45 COFFEE BREAK 16:45-17:15 Extracting bilingual word pairs from Wikipedia Francis M. Tyers and Jacques A. Pienaar 17:15-17:45 Building a Basque/Spanish bilingual database for speaker verification Iker Luengo, Eva Navas, Iñaki Sainz, Ibon Saratxaga, Jon Sanchez, Igor Odriozola, Juan J. Igarza and Inma Hernaez 17:45-18:15 Language resources for Uralic minority languages Attila Novák 18:15-18:45 Eslema. Towards a Corpus for Asturian Xulio Viejo, Roser Saurí and Angel Neira 18:45-19:00 Annual General Meeting of the SALTMIL SIG i Workshop Organisers Briony Williams Language Technologies Unit Bangor University, Wales, UK Mikel L. Forcada Departament de Llenguatges i Sistemes Informàtics Universitat d'Alacant, Spain Kepa Sarasola Lengoaia eta Sistema Informatikoak Saila Euskal Herriko Unibertsitatea / University of the Basque Country SALTMIL Speech and Language Technologies for Minority Languages A SIG of the International Speech Communication Association
    [Show full text]
  • Reconstructing Proto-Ugric and Proto-Uralic Object Marking Katalin É
    Reconstructing Proto-Ugric and Proto-Uralic Object Marking Katalin É. Kiss ([email protected]) Research Institute for Linguistics of the Hungarian Academy and Pázmány P. University Abstract This paper demonstrates that syntactic changes in the feature specifications of functional heads can be traced back to undocumented stages of languages. It reconstructs the object–verb relation in Proto-Uralic – by means of the comparative method adapted to syntax. Present-day Uralic languages display differential object–verb agreement and/or differential accusative marking. In double-marking languages, the head licensing object–verb agreement may be different from that licensing accusative-marking. The licensing conditions of object marking are also different across languages. It is argued that the Uralic parent language had both object-verb agreement and accusative assignment licensed by a TP-external functional head with a [topic] feature. The [topic] feature of this head has been reanalyzed as [specific] in Udmurt, and as [definite] in Hungarian – via a natural extention of the content of the notion of topicality. In languages with generalized accusative assignment, i.e., in Hungarian and Tundra Nenets, the licensing of object agreement and accusative marking have been divorced; the latter has come to be associated with v. Keywords: differential object marking (DOM), object–verb agreement, accusative, syntactic reconstruction, comparative method 1. Introduction According to the Borer–Chomsky Conjecture (Borer 1984), the parametric values of grammars are expressed in the functional lexicon. Under this assumption, syntactic changes involve changes in the feature specifications of functional heads. It is an open question whether changes of this type, affecting features of morphologically real or abstract syntactic heads, can be traced back to undocumented stages of languages (cf.
    [Show full text]
  • Drastic Demographic Events Triggered the Uralic Spread to Appear in Diachronica
    Drastic demographic events triggered the Uralic spread To appear in Diachronica Riho Grünthal (University of Helsinki) Volker Heyd (University of Helsinki) Sampsa Holopainen (University of Helsinki) Juha Janhunen (University of Helsinki) Olesya Khanina (Institute of Linguistics, Russian Academy of Sciences) Matti Miestamo (University of Helsinki) Johanna Nichols (University of California, Berkeley; University of Helsinki; Higher School of Economics, Moscow) Janne Saarikivi (University of Helsinki) Kaius Sinnemäki (University of Helsinki) Abstract: The widespread Uralic family offers several advantages for tracing prehistory: a firm absolute chronological anchor point in an ancient contact episode with well-dated Indo-Iranian; other points of intersection or diagnostic non-intersection with early Indo-European (the Late PIE-speaking Yamnaya culture of the western steppe, the Afanasievo culture of the upper Yenisei, and the Fatyanovo culture of the middle Volga); lexical and morphological reconstruction sufficient to establish critical absences of sharings and contacts. We add information on climate, linguistic geography, typology, and cognate frequency distributions to reconstruct the Uralic origin and spread. We argue that the Uralic homeland was east of the Urals and initially out of contact with Indo-European. The spread was rapid and without widespread shared substratal effects. We reconstruct its cause as the interconnected reactions of early Uralic and Indo-European populations to a catastrophic climate change episode and interregionalization opportunities which advantaged riverine hunter-fishers over herders. Keywords Uralic; Finno-Ugric; Indo-European; Yamnaya; Indo-Iranian; Siberia; Eurasia; Seima-Turbino, 4.2 ka event; linguistic homeland Note: In-text references have not yet been fully deanonymized. 2 Drastic demographic events triggered the Uralic spread (Contents, for convenience) Main text (pp.
    [Show full text]
  • Materials on Forest Enets, an Indigenous Language of Northern Siberia
    Materials on Forest Enets, an Indigenous Language of Northern Siberia SUOMALAIS-UGRILAISEN SEURAN TOIMITUKSIA MÉMOIRES DE LA SOCIÉTÉ FINNO-OUGRIENNE ❋ 267 ❋ Florian Siegl Materials on Forest Enets, an Indigenous Language of Northern Siberia SOCIÉTÉ FINNO-OUGRIENNE HELSINKI 2013 Florian Siegl: Materials on Forest Enets, an Indigenous Language of Northern Siberia Suomalais-Ugrilaisen Seuran Toimituksia Mémoires de la Société Finno-Ougrienne 267 Copyright © 2013 Suomalais-Ugrilainen Seura — Société Finno-Ougrienne — Finno-Ugrian Society & Florian Siegl Layout Anna Kurvinen, Niko Partanen Language supervision Alexandra Kellner This study has been supported by Volkswagen Foundation. ISBN 978-952-5667-45-5 (print) MÉMOIRES DE LA SOCIÉTÉ FINNO-OUGRIENNE ISBN 978-952-5667-46-2 (online) SUOMALAIS-UGRILAISEN SEURAN TOIMITUKSIA ISSN 0355-0230 Editor-in-chief Riho Grünthal (Helsinki) Vammalan Kirjapaino Oy Editorial board Sastamala 2013 Marianne Bakró-Nagy (Szeged), Márta Csepregi (Budapest), Ulla-Maija Forsberg (Helsinki), Kaisa Häkkinen (Turku), Tilaukset — Orders Gerson Klumpp (Tartu), Johanna Laakso (Wien), Tiedekirja Lars-Gunnar Larsson (Uppsala), Kirkkokatu 14 Matti Miestamo (Stockholm), FI-00170 Helsinki Sirkka Saarinen (Turku), www.tiedekirja.fi Elena Skribnik (München), Trond Trosterud (Tromsø), [email protected] Berhard Wälchli (Stockholm), FAX +358 9 635 017 Jussi Ylikoski (Kautokeino) He used often to say there was only one Road; that it was like a great river: its springs were at every doorstep, and every path was its tributary. “It’s a dangerous business, Frodo, going out of your door,” he used to say. “You step into the Road, and if you don’t keep your feet, there is no knowing where you might be swept off to […]” (The Fellowship of the Ring, New York: Ballantine Books, 1982, 102).
    [Show full text]
  • The Ethno-Linguistic Situation in the Krasnoyarsk Territory at the Beginning of the Third Millennium
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Siberian Federal University Digital Repository Journal of Siberian Federal University. Humanities & Social Sciences 7 (2011 4) 919-929 ~ ~ ~ УДК 81-114.2 The Ethno-Linguistic Situation in the Krasnoyarsk Territory at the Beginning of the Third Millennium Olga V. Felde* Siberian Federal University 79 Svobodny, Krasnoyarsk, 660041 Russia 1 Received 4.07.2011, received in revised form 11.07.2011, accepted 18.07.2011 This article presents the up-to-date view of ethno-linguistic situation in polylanguage and polycultural the Krasnoyarsk Territory. The functional typology of languages of this Siberian region has been given; historical and proper linguistic causes of disequilibrum of linguistic situation have been developed; the objects for further study of this problem have been specified. Keywords: majority language, minority languages, native languages, languages of ethnic groups, diaspora languages, communicative power of the languages. Point Krasnoyarsk Territory which area (2339,7 thousand The study of ethno-linguistic situation in square kilometres) could cover the third part of different parts of the world, including Russian Australian continent. Sociolinguistic examination Federation holds a prominent place in the range of of the Krasnoyarsk Territory is important for the problems of present sociolinguistics. This field of solution of a number of the following theoretical scientific knowledge is represented by the works and practical objectives: for revelation of the of such famous scholars as V.M. Alpatov (1999), characteristics of communicative space of the A.A. Burikin (2004), T.G. Borgoyakova (2002), country and its separate regions, for monitoring V.V.
    [Show full text]
  • A Reappraisal of M. Alexander Castrén's Forest Nenets Records
    Rentota Relata Studia Orient¡¡lia 97, Helsinki 2003, pp. 263-271 A Reappraisal of M. Alexander Castrén's Forest Nenets Records Tapani Salminen Among the first of the many highlights of M. Alexander Castrén's great Siberian expedition (1845-1849) was his scientific discovery of the Forest Nenets people and language in the surnmer of 1845. It turned out to have significant consequences for Castrén's thinking of comparative linguistics and Siberian ethnohistory, and his records of the language continue to be of utmost importance to Samoyedological studies. More exactly, Castrén's material shows the existence of two distinct varieties of Forest Nenets, and the bulk of this article is devoted to determining the nature of their differences and their background. The progress of Castrén's expedition is accounted in detail in Castrén (1856), and his encounters with the Forest Nenets are dcscribed in the section titled <<Reise von Samarowa nach Surgub>, which includes a travel account (pp. 62-88) as well as two letters, one to Castrén's supervisor, Councillor of State A. J. Sjögren (pp. 88-90), and the other to his friend, Assessor F. J. Rabbe (pp. 90-92). In the summer of 1845, Castrén was mainly engaged in Khanty studies, but he was even more intrigued about the possibility of contacting Forest Nenets people, known at the time as Kondinsk or Kazym Samoyeds, who he had heard about on his first expedition. He finally succeeded in meeting and, albeit briefly, working with a few Forest Nenets after he had left Samarova at the beginning of July and, for a month, travelled along the waterways of the Ob to various directions between Samarova and Silyarskoy.
    [Show full text]
  • [.35 **Natural Language Processing Class Here Computational Linguistics See Manual at 006.35 Vs
    006 006 006 DeweyiDecimaliClassification006 006 [.35 **Natural language processing Class here computational linguistics See Manual at 006.35 vs. 410.285 *Use notation 019 from Table 1 as modified at 004.019 400 DeweyiDecimaliClassification 400 400 DeweyiDecimali400Classification Language 400 [400 [400 *‡Language Class here interdisciplinary works on language and literature For literature, see 800; for rhetoric, see 808. For the language of a specific discipline or subject, see the discipline or subject, plus notation 014 from Table 1, e.g., language of science 501.4 (Option A: To give local emphasis or a shorter number to a specific language, class in 410, where full instructions appear (Option B: To give local emphasis or a shorter number to a specific language, place before 420 through use of a letter or other symbol. Full instructions appear under 420–490) 400 DeweyiDecimali400Classification Language 400 SUMMARY [401–409 Standard subdivisions and bilingualism [410 Linguistics [420 English and Old English (Anglo-Saxon) [430 German and related languages [440 French and related Romance languages [450 Italian, Dalmatian, Romanian, Rhaetian, Sardinian, Corsican [460 Spanish, Portuguese, Galician [470 Latin and related Italic languages [480 Classical Greek and related Hellenic languages [490 Other languages 401 DeweyiDecimali401Classification Language 401 [401 *‡Philosophy and theory See Manual at 401 vs. 121.68, 149.94, 410.1 401 DeweyiDecimali401Classification Language 401 [.3 *‡International languages Class here universal languages; general
    [Show full text]