Parsing a Polysynthetic Language

Total Page:16

File Type:pdf, Size:1020Kb

Parsing a Polysynthetic Language Parsing a Polysynthetic Language Petr Homola Codesign, s.r.o. [email protected] Abstract (see Section 2.3 for a brief description). A rare We present the results of a project of build- property of words in Aymara is the so-called vowel ing a lexical-functional grammar of Ay- elision (sometimes called ‘subtractive morphol- mara, an Amerindian language. There was ogy’) which is quite hard to describe formally. We almost no research on Aymara in compu- show how vowel elision can be dealt with in the tational linguistics to date. The goal of lexicon. the project is two-fold: First, we want to The paper is organized as follows: Section 2 provide a formal description of the lan- presents selected properties of Aymara, many of guage. Second, NLP resources (lexicon them absent from well-researched languages such and grammar) are being developed that as English, and their formal analysis in LFG. could be used in machine translation and Section 3 introduces a dependency-based abstrac- other NLP tasks. The paper presents for- tion of f-structures which brings formal grammars mal description of selected properties of closer cross-linguistically. Section 4 describes Aymara which are uncommon in well- our experiment with MT from Aymara into Span- researched Western languages. Further- ish and English. Finally, we conclude in Section 5 more, we present an experimental machine and give an outlook for further research. translation system into Spanish and En- 2 Some Properties of Aymara glish. 1 Introduction In this section, we focus on some properties of Aymara at the level of morphology and syntax Aymara is an Amerindian language spoken in Bo- which are mostly absent from Western languages livia, Chile and Peru by approx. two million peo- such as English, and sketch their analysis in LFG. ple. It is a polysynthetic language that has many A detailed description of the language can be lexical and structural similarities with Quechua found in (Hardman et al., 2001; Adelaar and but the often suggested genetic relationship be- Muysken, 2007; Cerrón-Palomino and Carvajal, tween these languages is still disputed. 2009; Briggs, 1976). The only research on Aymara in the field of computational linguistics we know about is 2.1 Agglutinative Morphology the project described in (Beesley, 2006). The Aymara has a very rich inflection. Suffixes presented project uses Lexical-Functional Gram- of various categories can be chained to build mar (LFG) (Kaplan and Bresnan, 1982; Bresnan, up long words that would be expressed by a 2001) to formally describe the lexicon, morphol- sentence in languages like English. For ex- ogy and syntax of Aymara in a manner suitable ample, alanxarusksmawa (ala-ni-xaru-si-ka-sma- for natural language processing (NLP). The gram- wa) means “I am preparing myself to go and buy mar we have implemented is capable of parsing it for you”. complex sentences with embedded clauses. We In concordance with the principle of lexical in- have also done experiments with machine transla- tegrity (Bresnan, 2001), we deal with morphol- tion (MT) into Spanish and English; the results are ogy in the lexicon. Ishikawa (1985) has suggested presented in Section 4. to use word-internal (sublexical) rules to analyze Aymara is a polysynthetic language with a very structurally complex words in agglutinative lan- complicated system of polypersonal agreement guages. We have adopted this analysis. 562 Proceedings of Recent Advances in Natural Language Processing, pages 562–567, Hissar, Bulgaria, 12-14 September 2011. 2.2 Vowel elision the final vowel of the word nucleus is elided and Aymara uses vowel elision as morphosyntacic (↑ COMPEL) = − if it is not. Nouns with two marking, as illustrated in (1) and (2).1 vowels do not define this attribute, i.e., it can be unified with both values. The corresponding rule (1) aycha manq’ani for compound nouns is given in (7). meat eater N0 → (N0)N “who eats much meat” (7) (↑ MOD) = ↓ ↑=↓ (↓ COMPEL) = + (2) aych manq’ani 2.3 Polypersonal agreement meat-ELI eat-FUT 3→3 Being a polysynthetic language, Aymara has “(s)he will eat meat” polypersonal conjugation, i.e., the finite verb There are three types of vowel elision that inter- agrees with the subject and with another argument act with each other. Object elision marks a noun which may be the object (direct or indirect) or an or pronoun as direct object, such as in (3) (as op- oblique argument. An example is given in (8). posed to (4)). (8) Uñjsma see-NFUT1→2 (3) khits uñji “I see/saw you.” whom-ELI see-NFUT3→3 “Whom does he/she see?” The morpholexical entry for uñjsma is given in (9).3 Note that the PRED value for both subject (4) khitis uñji and object is optional.4 who see-NFUT 3→3 (9) “Who does see him/her?” uñjsma V(↑PRED) =‘uñjañah(↑SUBJ)(↑OBJ)i’ (↑TAM TENSE) = NON-FUT Noun compound elision occurrs in NPs. The (↑TAM MOOD) = INDIC ((↑SUBJ PRED) = ‘PRO’) final vowel of noun attributes gets elided if they (↑SUBJ PERS) = 1 have three or more syllables, as illustrated in (5) ((↑OBJ PRED) = ‘PRO’) and (6). (↑OBJ PERS) = 2 (5) aymar aru The verb agrees with the subject and with the Aymara-ELI language most animate argument which may be a patient, “the Aymara language” addressee or source, e.g., um churäma-FUT1→2 “I will give you water” (addresse), aych aläma- (6) qala uta FUT1→2 “I will buy meat from you” (source) etc. stone house However, there are verbal suffixes which can make “stone house” the verb agree with other arguments, such as the beneficiary, e.g., aych churarapitäta-BEN,FUT2→1 Complement elision is applied to all words that “You will give him/her bread for me” (the verb are arguments or adjuncts of a verb except for the agrees with the beneficiary instead of the ad- final word of a clause.2 dressee). All these agreement rules are encoded Whereas object elision concerns the nucleus in the lexicon. of a word (the stem with an optional possessive and/or plural suffix), noun compound and comple- 2.4 Free Word Order ment elisions concern the final vowel of a word At the clause level, the word order in Aymara is (the vowel of the last suffix or the stem if there are not restricted although SOV is preferred. There is no suffixes). Vowel elision is dealt with in the lexi- also no evidence for a VP, thus we assume a flat con. As for noun compound elision, all nouns with phrase structure. The rules for matrix clauses are more than two syllables get (↑ COMPEL) = + if given in (10).5 1 3 In the glosses, FUT3→3 means future tense. The numbers TAM means Tense-Aspect-Mood. express the person of the subject and an additional argument, 4Both arguments can be dropped. mostly object. 5In the functional annotation, κ is either ‘−’ (no case) 2Object and noun compound elision has the gloss ELI in or a semantic case and GF is the corresponding grammatical our examples. function. 563 S → X + jumax PRON (↑PRED) = ‘PRO’ (10) (14) (↑PERS) = 2 (↑PRED FN) ∈ (↑iTOP) where X is V or NP/CP ↑=↓ (↓ CASE) = κ ⇒ 2 n o3 (↑ GF) =↓ TOP ‘jumax’ 6 7 4 n o5 FOC ‘qillqiri’ CP → (C) , S ↑=↓ ↑=↓ jumaw PRON (↑PRED) = ‘PRO’ As can be seen, word order in a clause is free (15) (↑PERS) = 2 with the exception of an optional complementizer (↑PRED FN) ∈ (↑iFOC) (see (11) and (12)) which can be placed at the be- 2 n o3 ginning of the clause or at its end. TOP ‘qillqiri’ 6 7 4 n o5 FOC ‘jumax’ (11) Ukat juti then come-NFUT 3→3 The i-structure is very important for correct “Then (s)he came.” translation. For example, the sentence Chachax li- wrw liyi would be translated as “The man read(s) a (12) Jutät ukaxa. book” whereas Chachaw liwrx liyi would be better come-FUT if 2→3 translated as “The book is/was read by a man”.7 “If you will come. ” 3 Lexical Mapping Theory and There are no discontinuous constituents and D-Structures complement clauses can be embedded in the ma- trix sentence. Since Aymara is not discourse- Although f-structures abstract to some extent from configurational (see the next subsection), the word language specific features (such as differential ob- order, despite of being free, is usually unmarked ject marking, see (16) where the Spanish dative (SOV) and if it is different then mostly for stylis- phrase and the Polish genitive phrase would be in tic reasons. accusative in German), there are still many differ- ences even between relatively closely related lan- 2.5 Topic-Focus Articulation guages.8 We have adopted the approach proposed by King (1997). Thus we use an i(nformation)-structure to (16) Ayer visité a Juan approximate topic-focus articulation (TFA).6 yesterday visit-PAST,1SG to Juan A simple example of two sentences which differ “I visited Juan yesterday.” only in TFA is given in (13) (the word qullqirï is a verbalized noun). Nie mam samochodu NEG have-PRES,1SG car-SG,GEN (13) Jumax qillqirïtawa “I don’t have a car.” you-SG,TOP be-a-writer-NFUT2→3,FOC “You are a writer.” Wong and Hancox (1998) examine the use Jumaw qillqirïtaxa of a(rgument)-structures in machine translation you-SG,FOC be-a-writer-NFUT2→3,TOP 7Unlike some other languages with morphological topic “It is you who is the writer.” and/or focus markers, such as Japanese (cf. examples from (Kroeger, 2004): Taroo-wa-TOP sono hon-o-ACC yon- deiru “Taroo is reading that book.” vs.
Recommended publications
  • Computational Challenges for Polysynthetic Languages
    Computational Modeling of Polysynthetic Languages Judith L. Klavans, Ph.D. US Army Research Laboratory 2800 Powder Mill Road Adelphi, Maryland 20783 [email protected] [email protected] Abstract Given advances in computational linguistic analysis of complex languages using Machine Learning as well as standard Finite State Transducers, coupled with recent efforts in language revitalization, the time was right to organize a first workshop to bring together experts in language technology and linguists on the one hand with language practitioners and revitalization experts on the other. This one-day meeting provides a promising forum to discuss new research on polysynthetic languages in combination with the needs of linguistic communities where such languages are written and spoken. Finally, this overview article summarizes the papers to be presented, along with goals and purpose. Motivation Polysynthetic languages are characterized by words that are composed of multiple morphemes, often to the extent that one long word can express the meaning contained in a multi-word sentence in lan- guage like English. To illustrate, consider the following example from Inuktitut, one of the official languages of the Territory of Nunavut in Canada. The morpheme -tusaa- (shown in boldface below) is the root, and all the other morphemes are synthetically combined with it in one unit.1 (1) tusaa-tsia-runna-nngit-tu-alu-u-junga hear-well-be.able-NEG-DOER-very-BE-PART.1.S ‘I can't hear very well.’ Kabardian (Circassian), from the Northwest Caucasus, also shows this phenomenon, with the root -še- shown in boldface below: (2) wə-q’ə-d-ej-z-γe-še-ž’e-f-a-te-q’əm 2SG.OBJ-DIR-LOC-3SG.OBJ-1SG.SUBJ-CAUS-lead-COMPL-POTENTIAL-PAST-PRF-NEG ‘I would not let you bring him right back here.’ Polysynthetic languages are spoken all over the globe and are richly represented among Native North and South American families.
    [Show full text]
  • Linguistics 203 - Languages of the World Language Study Guide
    Linguistics 203 - Languages of the World Language Study Guide This exam may consist of anything discussed on or after October 20. Be sure to check out the slides on the class webpage for this date onward, as well as the syntax/morphology language notes. This guide discusses the aspects of specific languages we looked at, and you will want to look at the notes of specific languages in the ‘language notes: syntax/morphology’ section of the webpage for examples and basic explanations. This guide does not discuss the linguistic topics we discussed in a more general sense (i.e. where we didn’t focus on a specific language), but you are expected to understand these area as well. On the webpage, these areas are found in the ‘lecture notes/slides’ section. As you study, keep in mind that there are a number of ways I might pose a question to test your knowledge. If you understand the concept, you should be able to answer any of them. These include: Definitional If a language is considered an ergative-absolutive language, what does this mean? Contrastive (definitional) Explain how an ergative-absolutive language differs from a nominative- accusative language. Identificational (which may include a definitional question as well...) Looking at the example below, identify whether language A is an ergative- absolutive language or a nominative accusative language. How do you know? (example) Contrastive (identificational) Below are examples from two different languages. In terms of case marking, how is language A different from language B? (ex. of ergative-absolutive language) (ex. of nominative-accusative language) True / False question T F Basque is an ergative-absolutive language.
    [Show full text]
  • What Are the Limits of Polysynthesis?
    International Symposium on Polysynthesis in the World's Languages February 20-21, 2014 NINJAL, Tokyo What are the limits of Polysynthesis? Michael Fortescue, Marianne Mithun, and Nicholas Evans Of the various labels for morphological types currently in use by typologists ‘polysynthesis’ has proved to be the most difficult to pin down. For some it just represents an extreme on the dimension of synthesis (one of Sapir’s two major typological axes) while for others it is an independent category or parameter with far- reaching morphosyntactic ramifications. A recent characterization (Evans & Sasse 2002: 3f.) is the following: ‘Essentially, then, a prototypical polysynthetic language is one in which it is possible, in a single word, to use processes of morphological composition to encode information about both the predicate and all its arguments, for all major clause types [....] to a level of specificity, allowing this word to serve alone as a free-standing utterance without reliance on context.’ If the nub of polysynthesis is the packing of a lot of material into single verb forms that would be expressed as independent words in less synthetic languages, what exactly is the nature of and limitations on this ‘material’? The present paper investigates the limits – both upwards and downwards – of what the term is generally understood to cover. Reference Evans, N. & H.-J. Sasse (eds.). (2002). Problems of Polysynthesis. Berlin: Akademie Verlag. 1 2014-01-08 International Symposium on Polysynthesis in the World's Languages February 20-21, 2014 NINJAL, Tokyo Polysynthesis in Ainu Anna Bugaeva (National Institute for Japanese Language and Linguistics) Ainu is a typical polysynthetic language in the sense that a single complex verb can express what takes a whole sentence in most other languages.
    [Show full text]
  • Lara Mantovan Nominal Modification in Italian Sign Language Sign Languages and Deaf Communities
    Lara Mantovan Nominal Modification in Italian Sign Language Sign Languages and Deaf Communities Editors Annika Herrmann, Markus Steinbach, Ulrike Zeshan Editorial board Carlo Geraci, Rachel McKee, Victoria Nyst, Sibaji Panda, Marianne Rossi Stumpf, Felix Sze, Sandra Wood Volume 8 Lara Mantovan Nominal Modification in Italian Sign Language ISHARA PRESS ISBN 978-1-5015-1343-5 e-ISBN (PDF) 978-1-5015-0485-3 e-ISBN (EPUB) 978-1-5015-0481-5 ISSN 2192-516X e-ISSN 2192-5178 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2017 Walter de Gruyter Inc., Boston/Berlin and Ishara Press, Preston, UK Printing and binding: CPI books GmbH, Leck ♾ Printed on acid-free paper Printed in Germany www.degruyter.com E Pluribus Unum (uncertain origin, attributed to Virgilio, Moretum, v. 103) Acknowledgements This book is a revised version of my 2015 dissertation which was approved for the PhD degree in Linguistics at Ca’ Foscari University of Venice. When I first plunged into the world of academic research, almost five years ago, I would never have imagined it was possible to achieve such an important milestone. Being so close to finalizing this book, I would like to look back briefly and remember and thank all the people who showed me the way, supported me, and encouraged me to grow both academically and personally.
    [Show full text]
  • Proceedings of the LFG13 Conference
    Proceedings of LFG13 Miriam Butt and Tracy Holloway King (Editors) 2013 CSLI Publications http://csli-publications.stanford.edu/ Contents 1 Dedication 4 2 Editors’ Note 5 3 Yasir Alotaibi, Muhammad Alzaidi, Maris Camilleri, Shaimaa ElSadek and Louisa Sadler: Psychological Predicates and Verbal Complemen- tation in Arabic 6 4 I Wayan Arka: Nominal Aspect in Marori 27 5 Doug Arnold and Louisa Sadler: Displaced Dependent Constructions 48 6 Daniele Artoni and Marco Magnani: LFG Contributions in Second Language Acquisition Research: The Development of Case in Russian L2 69 7 Oleg Belyaev: Optimal Agreement at M-structure: Person in Dargwa 90 8 Ansu Berg, Rigardt Pretorius and Laurette Pretorius: The Represen- tation of Setswana Double Objects in LFG 111 9 Tina Bögel: A Prosodic Resolution of German Case Ambiguities 131 10 Kersti Börjars and John Payne: Dimensions of Variation in the Ex- pression of Functional Features: Modelling Definiteness in LFG 152 11 George Aaron Broadwell: An Emphatic Auxiliary Construction for Emotions in Copala Triqui 171 12 Özlem Çetinoglu,˘ Sina Zarrieß and Jonas Kuhn: Dependency-based Sentence Simplification for Increasing Deep LFG Parsing Coverage 191 13 Elizabeth Christie: Result XPs and the Argument-Adjunct Distinction212 14 Cheikh Bamba Dione: Valency Change and Complex Predicates in Wolof: An LFG Account 232 15 Lachlan Duncan: Non-verbal Predicates in K’ichee’ Mayan: An LFG Approach 253 16 Dag Haug: Partial Control and the Semantics of Anaphoric Control in LFG 274 17 Annette Hautli-Janisz: Moving Right Along:
    [Show full text]
  • Stress Chapter
    Word stress in the languages of the Caucasus1 Lena Borise 1. Introduction Languages of the Caucasus exhibit impressive diversity when it comes to word stress. This chapter provides a comprehensive overview of the stress systems in North-West Caucasian (henceforth NWC), Nakh-Dagestanian (ND), and Kartvelian languages, as well as the larger Indo-European (IE) languages of the area, Ossetic and (Eastern) Armenian. For most of these languages, stress facts have only been partially described and analyzed, which raises the question about whether the available data can be used in more theoretically-oriented studies; cf. de Lacy (2014). Instrumental studies are not numerous either. Therefore, the current chapter relies mainly on impressionistic observations, and reflects the state of the art in the study of stress in these languages: there are still more questions than answers. The hope is that the present summary of the existing research can serve as a starting point for future investigations. This chapter is structured as follows. Section 2 describes languages that have free stress placement – i.e., languages in which stress placement is not predicted by phonological or morphological factors. Section 3 describes languages with fixed stress. These categories are not mutually exclusive, however. The classification of stress systems is best thought of as a continuum, with fixed stress and free stress languages as the two extremes, and most languages falling in the space between them. Many languages with fixed stress allow for exceptions based on certain phonological and/or morphological factors, so that often no firm line can be drawn between, e.g., languages with fixed stress that contain numerous morphologically conditioned exceptions (cf.
    [Show full text]
  • The Dative-Ergative Connection Miriam Butt
    Empirical Issues in Syntax and Semantics 6 O. Bonami & P.Cabredo Hofherr (eds.) 2006, pp. 69–92 The Dative-Ergative Connection Miriam Butt 1 Introduction The classic division between structural vs. inherent/ lexical case proposed within Gov- ernment-Binding (Chomsky 1981) remains a very popular one, despite evidence to the contrary that the inner workings of case systems are far more complex than this simple division would suggest and that individual case markers generally make a systematic structural and semantic contribution that interacts in a generalizable manner with the lexical semantics of a predicate (see Butt 2006 for a survey of theories and data, Butt 2006:125 and Woolford 2006 for a proposed disctinction between inherent (general- izable) and lexical (idiosyncratic) case.).1 That is, the semantic contribution of case cannot (and should not) be relegated to the realm of lexical stipulation because there are systematic semantic generalizations to be captured. This fact has been recognized in more and more recent work. One prominent ex- ample is the work engaged in understanding the semantic generalizations underlying so-called object alternations, perhaps the most famous of which is the Finnish parti- tive alternation shown in (1)–(2). In Finnish, the accusative alternates with the parti- tive on objects. This alternation gives rise to readings of partitivity (1) and aspectual (un)boundedness (2).2 (1) a. Ostin leivän bought.1.Sg bread.Acc ‘I bought the bread.’ Finnish 1I would like to thank the organizers of the CSSP 2005 for inviting me to participate in the conference. I enjoyed the conference tremendously and the comments I received at the conference were extremely constructive, particularly those by Manfred Krifka.
    [Show full text]
  • Nez Perce Verb Morphology Phillip Cash Cash University of Arizona, 2004
    Nez Perce Verb Morphology Phillip Cash Cash University of Arizona, 2004 1.0 Introduction In this paper, I present an introduction to Nez Perce verb morphology. The goal of such a study is to describe the internal structure of Nez Perce verb form and meaning. It takes as its task identifying the constituent elements of words and examining the rules that govern their co-occurrence. The Nez Perce language is a polysynthetic language and, as such, it displays an enriched morphological system whereby complex propositions can be expressed at the level of a single word. Typologically, utterances of the polysynthetic type suggest that speakers of these languages employ a structural principle of dependent-head synthesis that treats the minimal units of meaning, that is, its morphemes, in ways different from other world languages. This is simply to say that the morphology plays a more prominent role at the clausal level than in synthetic languages like English. Consider a concrete example as in /hiwlé·ke•yke/ ‘He/she/it ran.’ When we examine the structure of a morphosyntactic word in Nez Perce, we are interested in i) identifying the pairing of each morpheme’s phonological form, often called its surface structure, with the content specified in its lexical entry, and ii) identifying how morphemes are organized and combined with respect to grammatical principles. First, we begin by examining a morphosyntactic word through its component parts. Four main representations of words are used in this analysis, these are i) the surface form, ii) the morphological form, iii) the morphological gloss, and iv) the free translation.
    [Show full text]
  • Mighty Morpheme Tagging Rangers
    in Proceedings of NAACL-HLT 2019 Background ❖ It's hard to make crosslinguistic comparisons of RNN syntactic performance (e.g., on subject-verb agreement prediction) ➢ Languages differ in multiple typological properties ➢ Cannot hold training data constant across languages Proposal: generate synthetic data to devise a controlled experimental paradigm for studying the interaction of the inductive bias of a neural architecture with particular typological properties. Setup ❖ Data: English Penn Treebank sentences converted to Universal Dependencies scheme Example of a dependency parse tree ONE in Proceedings of NAACL-HLT 2019 Setup ❖ Identify all verb arguments with nsubj, nsubjpass, dobj and record plurality (HOW? manually?) Example of a dependency parse tree Setup ❖ Generate synthetic data by appending novel morphemes to the verb arguments identified to inflect them for argument role and number Setup ❖ Generate synthetic data by appending novel morphemes to the verb arguments identified to inflect them for argument role and number No explanation or motivation given for how the novel morphemes were developed, nor an explicit mention that they're novel! Might length matter? Typological properties ❖ Does jointly predicting object and subject plurality improve overall performance? ➢ Generate data with polypersonal agreement ❖ Do RNNs have inductive biases favoring certain word orders over others? ➢ Generate data with different word orders ❖ Does overt case marking influence agreement prediction? ➢ Generate data with different case marking systems ■ unambiguous, syncretic, argument marking Examples of synthetic data Task ❖ Predict a verb's subject and object plurality features. Input: synthetically-inflected sentence Output: one category prediction each for subject & object subject: [singular, plural] object: [singular, plural, none] (if no object) (It's NOT CLEAR in the paper WHAT the actual prediction task is / what the actual output space is.
    [Show full text]
  • Multi-Lingual Dependency Parsing: Word Representation and Joint
    Multi-Lingual Dependency Parsing : Word Representation and Joint Training for Syntactic Analysis Mathieu Dehouck To cite this version: Mathieu Dehouck. Multi-Lingual Dependency Parsing : Word Representation and Joint Training for Syntactic Analysis. Computer Science [cs]. Université de lille, 2019. English. tel-02197615 HAL Id: tel-02197615 https://tel.archives-ouvertes.fr/tel-02197615 Submitted on 30 Jul 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Centre de Recherche en Informatique, Signal et Automatique de Lille École Doctorale Sciences Pour L’Ingénieur Thèse de Doctorat Spécialité : Informatique et Applications préparée au sein de l’équipe Magnet, du laboratoire Cristal et du centre de recherche Inria Lille - Nord Europe financée par l’Université de Lille Mathieu Dehouck Multi-Lingual Dependency Parsing : Word Representation and Joint Training for Syntactic Analysis Parsing en Dépendances Multilingue : Représentation de Mots et Apprentissage Joint pour l’Analyse Syntaxique sous la direction de Dr. Marc TOMMASI et l’encadrement de Dr. Pascal DENIS Soutenue publiquement à Villeneuve d’Ascq, le 20 mai 2019 devant le jury composé de: Mme Sandra KÜBLER Indiana University Bloomington Rapportrice M. Alexis NASR Université d’Aix Marseille Rapporteur Mme Hélène TOUZET CNRS Présidente du jury M.
    [Show full text]
  • Journal of South Asian Languages and Linguistics 2(2)
    JSALL 2021; aop Netra P. Paudyal and John Peterson* How one language became four: the impact of different contact-scenarios between “Sadani” and the tribal languages of Jharkhand https://doi.org/10.1515/jsall-2021-2028 Published online May 4, 2021 Abstract: Four Indo-Aryan linguistic varieties are spoken in the state of Jharkhand in eastern central India, Sadri/Nagpuri, Khortha, Kurmali and Panchparganiya, which are considered by most linguists to be dialects of other, larger languages of the region, such as Bhojpuri, Magahi and Maithili, although their speakers consider them to be four distinct but closely related languages, collectively referred to as “Sadani”. In the present paper, we first make use of the program COG by the Summer Institute of Linguistics (SIL) to show that these four varieties do indeed form a distinct, compact genealogical group within the Magadhan language group of Indo- Aryan. We then go on to argue that the traditional classification of these languages as dialects of other languages appears to be based on morphosyntactic differences between these four languages and similarities with their larger neighbors such as Bhojpuri and Magahi, differences which have arisen due to the different contact situations in which they are found. Keywords: Khortha; Kudmali; language contact; Sadani; Sadri 1 Introduction While the first official language of the state of Jharkhand in eastern central India is Hindi, over 96% of the state population speaks a local tribal or regional language as their first (L1) or second language (L2) on a daily basis, and only 3.7% of the people speak Hindi as their first language (JTWRI 2013:4–5).
    [Show full text]
  • Peter Arkadiev & Yury Lander October 2018 Version. Submitted to Maria
    THE NORTHWEST CAUCASIAN LANGUAGES Peter Arkadiev & Yury Lander October 2018 version. Submitted to Maria Polinsky (ed.), The Oxford Handbook of the Languages of the Caucasus. 1. The languages and their speakers The Northwest Caucasian (NWC) family, also known as West Caucasian or Abkhaz- Adyghe, comprises five languages, which are grouped into three branches: • Abkhaz-Abaza: o Abkhaz (including the Sadz, Ahchypsy, Bzyp, Tsabal, and Abzhywa dialects), o Abaza (nominally including the Tapanta and Ashkharywa dialects), • Ubykh, • Circassian: o West Circassian (also known as Adyghe, including the Bzhedugh, Shapsugh, Abzakh/Abadzekh, and Temirgoy dialects), o Kabardian (also known as East Circassian and including the Besleney, Baksan, Mozdok, Malka, Terek, and Kuban dialects). This conventional division is based on heterogeneous linguistic and sociolinguistic considerations and hence is a kind of simplification. For example, Abaza consists of two dialects, Tapanta and Ashkharywa, of which the latter is closer to Abkhaz than to Tapanta, West Circassian and Kabardian are often considered by their speakers to constitute a single Adyghe (or Circassian) language (despite the absence of mutual intelligibility), and some dialects of Abkhaz (e.g., Sadz) and West Circassian (e.g., Shapsugh) may be treated as separate languages. Nonetheless, below we use the language list as given above with a proviso that whenever it is possible we will try to overtly mark the variety referred to – with the exception of Abaza, whose examples always belong to the Tapanta dialect. During the centuries, the speakers of the languages of the family inhabited areas to the North and partly to the South of the Western part of the Caucasian Ridge including the Northeast coast of the Black Sea.
    [Show full text]