Russian National Corpus

ruscorpora.ru
Ekaterina Rakhilina, Vladimir Plungian, Olga Lyashevskaya, Dmitry Sichinava
RNC Workshop, SCLC 2014, 16 Feb 2014, Harvard University

Preliminary plan
  • Russian National Corpus, season 2014: hints and tricks; new features and plans
  • Corpus data for offline research
  • Discussion
Your input is much appreciated!

Main participants
  • V.V. Vinogradov Russian Language Institute, Russian Academy of Sciences, Moscow
  • Yandex, Internet and technology company

Ilya Segalovich (1964-2013), chief technical officer of Yandex

RNC non-commercial partnership
  • universities (Moscow, Saint-Petersburg, Saratov, etc.)
  • research institutes (IPPI RAN, ILI RAN)
  • IT companies
  • personal membership
You are welcome to share your corpus data through RNC!
New goals: licensing issues and data distribution.

Corpora, RusGram, Statistics & offline data, Dictionaries

RUSCORPORA.RU family (MW = million words)
  • The main corpus of written Modern Russian (1700-present, 230 MW)
  • Newspapers & news (2000-present, 174 MW)
  • Corpus of Russian poetry (10 MW)
  • Spoken corpus (11 MW)
  • Multimedia corpus (4 MW)
  • Accentuated corpus (14 MW)
  • Parallel corpora (54 MW)
  • Syntactic treebank (0.7 MW)
  • Corpus of Russian dialects
  • Russian-for-Schools corpus

RUSCORPORA.RU: new corpora
  • Diachronic corpora: Old Russian, Church-Slavonic, Middle Russian
  • Blogger corpus
  • Learner corpora

The full body of search results is freely available online
... also in KWIC format

Sorting results

Saving results in Excel format

Customizing subcorpus
The main corpus:
  • Modern fiction of various genres
  • Modern drama
  • Memoirs and biographies
  • Journalism and literary criticism
  • Scientific, popular scientific and teaching texts
  • Religious and philosophical texts
  • Technical texts
  • Business and jurisprudence texts
  • Day-to-day life texts, including texts not intended for publication (letters, diaries, etc.)

Hints & tricks
  • sorting: надо же было ...
  • раз...ся (рас...ся)
  • Мама мыла раму.
  • hypocoristic personal names not ending with *чка, *нька
  • use word-formation: вс- prefix, also with possible alternations, also on the 2nd place

Recent news from the RNC
  • Poetry: up to 1990-2002
  • MURCO: Multi-media corpus (movies, talks, etc.)
      • types of speech situations (welcome, questioning, interview, dispute, quarrel, etc.)
      • gestures
      + gestures provided by speech
      + academic talks & discussions
      + Parallel Spoken Russian: Gogol's Revizor on many stages (MultiParC)
  • Diachronic evidence (Russian in the XII-XVII centuries)
  • Parallel corpora

Corpus of Russian poetry

RUSCORPORA.RU: new corpora
  • Diachronic corpora: Old Russian & Birch letters, Church-Slavonic, Middle Russian
  • Slavic parallel corpora
  • Blogger corpus
  • Learner corpora

Old-Russian

RNC annotation: the main corpus
Four major annotation layers:
  • meta-textual annotation: register/genre, author, creation date, size, etc.
  • word-level morphosyntactic annotation: lemma, POS, inflectional categories, distorted or anomalous forms, etc.
  • accentual annotation: normative place of accent, accentual shifts in fixed expressions
  • lexico-semantic annotation: lexical classes of verbs, nouns, pronouns, adjectives and adverbs
  + new! word-formation annotation: prefixes, suffixes, roots

N-gram viewer: http://ruscorpora.ru/ngram.html
  • word forms: Графики ("Charts"); cf. Google Books Ngram Viewer
      + wildcards: *сторонился
      • year span by date of creation, not date of publishing (cf. Google Books)
      • smoothing (3 to 20 is recommended)
  • lemmas, not words: Распределение по годам ("Distribution by year", output page), Статистика по метаатрибутам ("Statistics by meta-attributes")

Графики ("Charts"): сторонился, посторонился, *сторонился
Year: 1800-2010. Smoothing: 10

Annotation mistakes and how to fix them
Please tag mistakes if you come across them in the output data.

Even more Russian corpora in cooperation with the RNC
  • "Simple" Russian (HSE in Nizhny Novgorod)
      • "we cannot ask 5-year-old children to read examples from the corpus" (NB students!)
      • a subcorpus of short simple sentences, frequent words from the "lexical minimum"
  • "Non-perfect" Russian
      • Heritage language in Finland and USA (study of language interference)
      • Russian as L2 in Daghestan and other parts of Russia
      • Learner corpus of academic writing

Корпус Академического Письма (Corpus of Academic Writing)
http://web-corpora.net/RussianAcademCorpus/search/
  • Essays, drafts of term papers, other academic texts written by students
  • sociology, economics, politics, law, psychology, linguistics, management, etc.
  • 1 MW available so far

Corpus of academic writing
Three levels of mistake annotation:
  1) linguistic type (orthography, punctuation, lexical choice, grammatical choice & form, discourse-oriented)
  2) weight (minor mistake, medium level, major/critical mistake)
  3) interpretation: what is the cause? (misprint, wrong synonym, mixture of constructions, etc.)

Russian learner corpus: Heritage language
http://web-corpora.net/RussianLearnerCorpus/search
  • National Heritage Language Resource Center (UCLA)
  • Polinsky Lab at Harvard
  • O. Kisselev, A. Alsufieva, I. Dubinina et al.
  • E. Rakhilina and her research lab at HSE

Some examples
  • Эти ноутбуки потребляли меньше энергии, но были менее компактнее по объему.
  • И прибыль от разрушения гораздо более заметна и быстра, нежели чем от строительства.
  • В русском языке семантический диапазон данного слова чрезвычайно широк, нежели в английском (Academic Writing Corpus)
  • В России человек больше (! чаще) считается расистом из за действий (Heritage Corpus)

RusGram
Corpus-based Russian reference grammar
  • traditional академическая грамматика ("academic grammar"): morphology (inflection), syntax
  + RNC-based statistics
  + lexical anchors in focus
  • substandard Russian: negative evidence or "points of future development"?
rusgram.ru

Corpus-based dictionaries
http://dict.ruslang.ru/
  • Frequency dictionary of Modern Russian (offline version available from my homepage)
  • New grammatical dictionary
  • Russian idiomaticity in real usage (with frequencies): Which adjectival intensifier can we use with nouns? Which verb can we use with abstract nouns?
  • Framebank (the dictionary of argument-predicate constructions attested in the RNC), offline release summer 2014

Corpus-based dictionaries
In progress: Grammatical forms of Russian lexemes
  • Paradigms of verbs, nouns, adjectives
  • Distribution by time & text registers
  • Lexical classes: comparative study

Statistics & offline use
Overall idea: to show patterns in your output (statistics, visualization)
But: the RNC corpus workbench is not adapted to work with customized sets of data
Step 1: N-grams

N-grams search (Beta!)
  • 2-, 3-, 4-, 5-word chains: не до *; потрясающе (*о | *е)
  • Most frequent N-grams: ЧАСТОТЫ ("Frequencies")
  • In progress: Search by lemma, morphology, semantics, word formation
  • In progress: Explore time & text registers + in any subcorpus of your choice
  • In progress: Search with distance between words (incl. repetitions)

Offline data for advanced users & computational resources
NB! We are linguists, not lawyers: we cannot distribute texts
But: we can share annotations & statistics on this data
So far:
  • ЧАСТОТЫ ("Frequencies"): 2-, 3-, 4-, 5-grams, http://ruscorpora.ru/corpora-freq.html
  • 1 MW Morphological standard (manually disambiguated, shuffled sentences)
Plans:
  • N-grams for other corpora
  + annotated data: POS-annotations etc., e.g. V-S-S-CONJ-ADJ-S.
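
The 2- to 5-word frequency lists mentioned above can be approximated offline from any tokenized text. What follows is a minimal sketch, not the RNC's own pipeline or file format: the function name `count_ngrams` and the toy sentences are assumptions made for illustration, and tokenization is taken as already done.

```python
from collections import Counter

def count_ngrams(sentences, n):
    """Count n-word chains inside each sentence, never across sentence boundaries."""
    counts = Counter()
    for tokens in sentences:
        # A sentence shorter than n contributes nothing (empty range).
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy data (hypothetical): tokenized, lowercased sentences.
sentences = [
    ["мама", "мыла", "раму"],
    ["мама", "мыла", "окно"],
]

bigrams = count_ngrams(sentences, 2)
print(bigrams.most_common(1))
```

In the toy data the most frequent bigram is мама мыла, seen twice. Because counting stays inside sentence boundaries, the same approach should work on shuffled sentences such as those in the morphological standard.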
studiorum.ruscorpora.ru
A companion web site to the RNC
  • Corpus methods in linguistic research
  • Corpus in teaching Russian as a second language
  • Corpus in teaching linguistics, Russian stylistics, philology and social sciences
  • Corpus in teaching Russian in school
  • References (incl. PhD manuscripts and term papers)
  • Corpus resources
  • F.A.Q.

Discussion
Any questions? Comments? Complaints?
What would you like to see in the corpus?

Known issues
  1. A bag of words. Solution: TBA soon. Lemma: дуло 'muzzle', Gram: V. Annotated n-grams
  2. *базар* database search (разбазарить, разбазаривать, пробазарить, базарчик, Базаров). NB word-formation: just words in the dictionary
  3. Search across sentence boundaries
  4. Unbalanced portions of data across time: который и, в, на, они не

Thank you! Спасибо!
http://ruscorpora.ru

Appendix: RNC annotation layers
  • meta-text info
  • morphology
  • lexico-semantic classes

Subcorpora and meta-textual parameters

Morphological parsing
  • Zaliznjak's (1967, 1977) formal model of Russian inflection
  • A set of parsers based on the Grammatical dictionary
  • MYSTEM (Segalovich 2003) and DIALING (Sokirko 2004) morphological parsers in use
  • Lemma, POS and grammatical features
Examples: взял 'take.PAST' <ana lex="взять"
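
Known issue 2 in the list above can be illustrated with a short sketch: a `*базар*` wildcard matches every word that contains the string, whatever its word-formation, so the surname Базаров comes back together with the real derivatives. The helper `wildcard_to_regex` is hypothetical and only treats `*` as "any sequence of characters"; it is not the RNC search engine.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a pattern with '*' wildcards into a compiled, case-insensitive regex."""
    # Escape the literal pieces and let each '*' stand for any run of characters.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)

# Word list taken from the slide, plus one non-matching form for contrast.
words = ["разбазарить", "разбазаривать", "пробазарить",
         "базарчик", "Базаров", "сторонился"]

rx = wildcard_to_regex("*базар*")
matches = [w for w in words if rx.fullmatch(w)]
print(matches)
```

A purely string-based match cannot tell a derivative of базар from an unrelated proper name, which is presumably why the slide points to word-formation annotation as the remedy.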