arXiv:2008.01946v2 [cs.CL] 3 Nov 2020


An exploration of the encoding of grammatical gender in word embeddings

Hartger Veeman
Dep. of Linguistics and Philology, Uppsala University
[email protected]

Ali Basirat
Dep. of Linguistics and Philology, Uppsala University
ali.basirat@lingfil.uu.se

Abstract

The vector representation of words, known as word embeddings, has opened a new research approach in linguistic studies. These representations can capture different types of information about words. The grammatical gender of nouns is a typical classification of nouns based on their formal and semantic properties. The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined. In this study, we compare different sets of word embeddings according to the accuracy of a neural classifier determining the grammatical gender of nouns. It is found that there is an overlap in how grammatical gender is encoded in Swedish, Danish, and Dutch embeddings. Our experimental results on the contextualized embeddings indicate that adding more contextual information to embeddings is detrimental to the classifier's performance. We also observed that removing morpho-syntactic features such as articles from the training corpora of embeddings decreases the classification performance dramatically, indicating that a large portion of the information is encoded in the relationship between nouns and articles.

1 Introduction

The creation of embedded word representations has been a key breakthrough in natural language processing (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2017; Peters et al., 2018a; Devlin et al., 2019). Word embeddings have proven to capture information relevant to many tasks within the field. Basirat and Tang (2019) have shown that word embeddings contain information about the grammatical gender of nouns. In their study, a classifier's ability to classify embeddings on grammatical gender is interpreted as an indication of the presence of information about gender in word embeddings. The way this information is encoded in word embeddings is of interest because it can contribute to the discussion of how nouns of a language are classified into different gender categories.

This paper explores the presence of information about grammatical gender in word embeddings from three perspectives:

1. Examine how such information is encoded across languages using a model transfer between languages.

2. Determine the classifier's reliance on semantic information by using contextualized word embeddings.

3. Test the effect of gender agreements between nouns and other word categories on the information encoded into word embeddings.

2 Grammatical gender

Grammatical gender is a nominal classification system found in many languages. In languages with grammatical gender, the noun forms an agreement with another language aspect based on the noun class. The most common grammatical gender divisions are masculine/feminine, masculine/feminine/neuter, and uter/neuter, but many other divisions exist.

Gender is assigned based on a noun's meaning and form (Basirat et al., in press). Gender assignment systems vary between languages and can be based solely on meaning (Corbett, 2013). The experiments in this paper concern grammatical gender in Swedish, Danish, and Dutch.

Nouns in Swedish and Danish can have one of two grammatical genders, uter and neuter, indicated through the article for the indefinite form and the suffix for the definite and plural forms. Dutch nouns can technically be classified as masculine, feminine, or neuter. However, in practice, the masculine and feminine genders are only used for nouns referring to people, with the distinction between masculine and feminine appearing in the subject and object pronouns, while in other cases, all non-neuter nouns can be categorized as uter. Grammatical gender in Dutch is indicated in the definite article, adjective agreement, pronouns, and demonstratives.

3 Multilingual embeddings and model transferability

The first experiment explores whether grammatical gender is encoded in a similar way between different languages. For this experiment, we leverage multilingual word embeddings. Multilingual word embeddings are word embeddings that are mapped to the same latent space so that the vectors of words that have a similar meaning have a high similarity regardless of the source language. The aligned word embeddings allow for the model transfer of the neural classifier. This is done by applying the neural classifier model to a different language than the one it is trained on. If the model effectively classifies embeddings from a different language, it is likely that grammatical gender is encoded similarly in the embeddings of the two languages.

3.1 Data & Experiment

The lemma form of all unique nouns and their genders were extracted from Universal Dependencies treebanks (Nivre et al., 2016). This resulted in about 4.5k, 5.5k, and 7.2k nouns in Danish (da), Swedish (sv), and Dutch (nl), respectively. Ten percent of each data set was sampled randomly to be used as test data. For each noun, the embedding was extracted from pre-trained aligned word embeddings published in the MUSE project (Conneau et al., 2017). The uter/neuter class distributions are 74/26 for Swedish, 68/32 for Danish, and 75/25 for Dutch.

The classification is performed using a feed-forward neural network. The network has an input size of 300 and a hidden layer size of 600. The loss function used is binary cross-entropy loss. The training is run until no further improvement is observed.

For every language, a network was trained using the data described above. Every model was then applied to its own test data and the other languages' test data to test the model's transferability.

3.2 Results

To isolate how much of the accuracy of a model is caused by successfully applying learned patterns from the source language to the target language rather than successful random guessing, the following baseline is defined:

$$\sum_{g \in \{u, n\}} p(g_s)\, p(g_t)$$

Here g ranges over the two classes uter (u) and neuter (n), and p(g_s) and p(g_t) are the shares of class g in the source-language training data and the target-language test data. This baseline is the chance the model would make a correct guess purely based on the class distributions of the training data and test data. Subtracting this chance from the results should give an indication of how much information in the model was transferable to the task of classifying test data in another language. The absolute results are displayed in Table 1, while the results corrected with this baseline are displayed in Table 2.

Table 1: Accuracy for model (vertical) applied to test set (horizontal)

        SV      DA      NL
SV    93.55   73.89   73.37
DA    81.18   91.81   78.50
NL    71.32   78.54   93.34

Table 2: Corrected accuracy for model (vertical) applied to test set (horizontal)

        SV      DA      NL
SV    32.03   15.25   11.37
DA    22.54   35.33   19.50
NL     9.32   19.54   31.15

All transfers yielded a result above the random guessing baseline. This means the classifier is learning patterns in the source language data that are applicable to the target language data. From this, it can be concluded that grammatical gender is encoded similarly between these languages to a certain degree.

The performance of the Swedish model for classifying Dutch grammatical gender and vice versa is the lowest of all language pairs. This language pair is also the pair with the largest phylogenetic language distance. The language pair with the smallest distance is Danish/Swedish.
The Danish model applied to the Swedish test data produces the best result of all model transfers, achieving an accuracy of 22.54% over the baseline. This observation on the inverse relationship between the language distance and the gender transferability agrees with the broader experiments performed by Veeman et al. (2020).

Interestingly, the success of the model transfer is not symmetrical. This is most clear in the Swedish-Danish pair, which achieves an accuracy of 22.54% over baseline from Danish to Swedish, but only 15.25% over baseline from Swedish to Danish. The patterns the Danish model learns are more applicable to the Swedish data than vice versa.

4 Contextual word embeddings

Adding contextual information to a word embedding model has proven to be effective for semantic tasks like named entity recognition or semantic role labeling (Peters et al., 2018b). This addition of semantic information can be leveraged to measure its influence on the gender classification task. A contextualized word embedding model was used to quantify the role of semantic information in the noun classification.

4.1 Embeddings from Language Models (ELMo)

The ELMo model (Peters et al., 2018a) is a multi-layer biRNN that leverages pre-trained deep bidirectional language models to create contextual [...]

[...] layers' output was made to discover the influence of this difference in semantic information.

4.2 Data & Experiment

The comparison was performed using a Swedish pre-trained ELMo model (Che et al., 2018). The nouns and their gender labels were extracted from UD treebanks. The noun embeddings were generated using their treebank sentence as context. The output is collected at the word representation layer, the first contextualized layer, and the second contextualized (output) layer. This resulted in a set of a little over 10k embeddings for every layer, of which 10% was randomly sampled and held out as test data. The embeddings have a size of 1024, and the hidden layer of the classifier is double the input size, 2048. The results for this comparison are shown in Table 3.

Table 3: Accuracy and loss for different layers of the ELMo model

                              Loss    Accuracy
Word representation layer    0.168     93.82
First layer                  0.260     92.13
Second layer                 0.274     91.24

4.3 Results

An apparent decrease in accuracy and an increase in loss is observed when classifying the gender of the contextualized word embeddings.
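The classifier used in Sections 3.1 and 4.2 is described only as a feed-forward network with one hidden layer twice the input size and a binary cross-entropy loss. The sketch below illustrates that shape in plain Python. The ReLU hidden activation, the sigmoid output, the weight initialization, the choice of neuter as the positive class, and all function names are assumptions made for illustration; the text does not specify them, and no training loop is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One dense layer: small random weights and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_classifier(input_size, seed=0):
    """Hidden layer is double the input size: 300 -> 600 or 1024 -> 2048 in the text."""
    rng = random.Random(seed)
    hidden_size = 2 * input_size
    return [init_layer(input_size, hidden_size, rng), init_layer(hidden_size, 1, rng)]


def dense(weights, biases, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def predict_neuter_probability(params, embedding):
    """Forward pass; the output is read as P(neuter), an assumed label convention."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in dense(w1, b1, embedding)]  # ReLU (assumed)
    logit = dense(w2, b2, hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid (assumed)


def binary_cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


if __name__ == "__main__":
    params = build_classifier(input_size=300)
    rng = random.Random(1)
    fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(300)]  # placeholder vector, not real data
    p = predict_neuter_probability(params, fake_embedding)
    print(p, binary_cross_entropy(p, label=1))
```

Tying the hidden size to the input size in `build_classifier` keeps the same construction usable for both the 300-dimensional aligned embeddings and the 1024-dimensional ELMo embeddings.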