Morphology Matters: A Multilingual Language Modeling Analysis

Hyunji Hayley Park (University of Illinois)
Katherine J. Zhang (Carnegie Mellon University)
Coleman Haley (Johns Hopkins University)
Kenneth Steimel (Indiana University)
Han Liu (University of Chicago)∗
Lane Schwartz (University of Illinois)

∗ Work done while at University of Colorado Boulder

arXiv:2012.06262v1 [cs.CL] 11 Dec 2020

Abstract

Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features.[1] We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically-motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.

[1] https://github.com/hayleypark/MorphologyMatters

1 Introduction

With most research in Natural Language Processing (NLP) directed at a small subset of the world’s languages, whether the techniques developed are truly language-agnostic is often not known. Because the vast majority of research focuses on English with Chinese a distant second (Mielke, 2016), neither of which is morphologically rich, the impact of morphology on NLP tasks for various languages is not entirely understood.

Several studies have investigated this issue in the context of language modeling by comparing a number of languages, but found conflicting results. Gerz et al. (2018) and Cotterell et al. (2018) find that morphological complexity is predictive of language modeling difficulty, while Mielke et al. (2019) conclude that simple statistics of a text like the number of types explain differences in modeling difficulty, rather than morphological measures.

This paper revisits this issue by increasing the number of languages considered and augmenting the kind and number of morphological features used. We train language models for 92 languages from a corpus of Bibles fully aligned at the verse level and measure language modeling performance using surprisal (the negative log-likelihood) per verse (see §4.5). We investigate how this measure is correlated with 12 linguist-generated morphological features and four corpus-based measures of morphological complexity.

Additionally, we contend that the relation between segmentation method, morphology, and language modeling performance needs further investigation. Byte-Pair Encoding (BPE; Shibata et al., 1999) is widely used in NLP tasks including machine translation (Sennrich et al., 2016) as an unsupervised information-theoretic method for segmenting text data into subword units. Variants of BPE or closely related methods such as WordPiece (Kudo, 2018) are frequently employed by state-of-the-art pretrained language models (Liu et al., 2019; Radford et al., 2019; Devlin et al., 2019; Yang et al., 2019). However, BPE and other segmentation methods may vary in how closely they capture morphological segments for a given language, which may affect language modeling performance.

Therefore, this paper focuses on the following two research questions:

1. Does a language’s morphology influence language modeling difficulty?
2. If so, how do different segmentation methods interact with morphology?

In order to answer the first question, we train models using data sets segmented by characters and BPE units. Our results show that BPE language modeling surprisal is significantly correlated with measures of morphological typology and complexity. This suggests that BPE segments are ineffective in mitigating the effect of morphology in language modeling.

As for the second question, we consider more linguistically-motivated segmentation methods to compare with BPE: Morfessor (Creutz and Lagus, 2007) and Finite-State Transducers (FSTs) (see §4.3). Our comparison of the models using the different segmentation methods shows that Morfessor reduces the impact of morphology for more languages than BPE. FST-based segmentation methods outperform the other segmentation methods when available. These results suggest that morphologically motivated segmentations improve cross-linguistic language modeling.

2 Modeling difficulty across languages

Studies have demonstrated that different languages may be unequally difficult to model and have tested the relations between such modeling difficulty and morphological properties of languages, using different segmentation methods.

Vania and Lopez (2017) compared the effectiveness of word representations based on different segmentation methods in modeling 10 languages with various morphological typologies. They trained word-level language models, but utilize segmentation methods to create word embeddings that include segment-level information. Comparing character, BPE, and Morfessor segmentations, they concluded that character-based representations were most effective across languages, with BPE always outperforming Morfessor. However, models based on hand-crafted morphological analyses outperformed all other segmentation methods by a wide margin.

Gerz et al. (2018) trained n-gram and neural language models over 50 languages and argued that the type of morphological system is predictive of model performance. Their results show that languages differ with regard to modeling difficulty. They attributed the differences among languages to four types of morphological systems: isolating, fusional, introflexive, and agglutinative. While they found a significant association between the morphological type and modeling difficulty, Type-Token Ratio (TTR) was the most predictive of language modeling performance.

Cotterell et al. (2018) arrived at a similar conclusion modeling 21 languages using the Europarl corpus (Koehn, 2005). When trained with n-gram and character-based Long Short-Term Memory (LSTM) models, the languages showed different modeling difficulties, which were correlated with a measure of morphology, Morphological Counting Complexity (MCC) or the number of inflectional categories (Sagot, 2013).

However, Mielke et al. (2019) failed to reproduce the correlation with MCC when they increased the scope to 69 languages, utilizing a Bible corpus (Mayer and Cysouw, 2014). They also reported no correlation with measures of morphosyntactic complexity such as head-POS entropy (Dehouck and Denis, 2018) and other linguist-generated features (Dryer and Haspelmath, 2013). Rather, they found that simpler statistics, namely the number of types and number of characters per word, correlate with language model surprisal using BPE and character segmentation, respectively.

3 Morphological measures

Different measures of morphology are used to represent a language’s morphology.

3.1 Linguist-generated measures

The most linguistically-informed measures of morphology involve expert descriptions of languages. The World Atlas of Language Structures (WALS; Dryer and Haspelmath, 2013) has been used frequently in the literature to provide typological information. WALS is a large database of linguistic features gathered from descriptive materials, such as reference grammars. It contains 144 chapters in 11 areas including phonology, morphology, and word order. Each chapter describes a feature with categorical values and lists languages that have each value. However, not all languages in the database have data for all the features, and for some languages there is no data at all.

The studies reviewed in §2 all relied on this expert-description approach to quantify morphological properties. Gerz et al. (2018) focused on WALS descriptions of inflectional synthesis of verbs, fusion, exponence, and flexivity, while Mielke et al. (2019) looked at two WALS features, 26A “Prefixing vs. Suffixing in Inflectional Morphology” and 81A “Order of Subject, Object and Verb.” Cotterell et al. (2018) used UniMorph (Kirov et al., 2018), instead of WALS, to calculate MCC. Vania and Lopez (2017) did not cite any databases but provided descriptions of four morphological types (fusional, agglutinative, root-and-pattern, and reduplication) and categorized 10 languages into these types.

A major issue with this approach to representing morphology is that there is not enough expert data available to enable comparisons across many different languages. In fact, Mielke et al. (2019) chose their two WALS features because data for these features existed for most of their languages. Moreover, Bentz et al. (2016) showed that their WALS-based measure had lower

Measure   Definition
Types     Number of unique word tokens
TTR       Number of unique word tokens divided by total number of word tokens
MATTR     Average TTR calculated over a moving window of 500 word tokens
MLW       Average number of characters per word token

Table 1: Corpus-based measures of morphology defined for this study. These measures are calculated on tokenized data sets before applying any segmentation method.
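The four corpus-based measures defined in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the input is assumed to be a non-empty list of word tokens that has already been tokenized, and the fallback to plain TTR for texts shorter than one window is an assumption made here so that the function is defined for short inputs.

```python
from typing import List


def type_count(tokens: List[str]) -> int:
    """Types: number of unique word tokens."""
    return len(set(tokens))


def ttr(tokens: List[str]) -> float:
    """TTR: unique word tokens divided by the total number of word tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens: List[str], window: int = 500) -> float:
    """MATTR: mean TTR over every moving window of `window` word tokens."""
    if len(tokens) <= window:
        # Assumption: a text no longer than one window gets its plain TTR.
        return ttr(tokens)
    # One TTR score per window start position; simple but O(n * window).
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)


def mlw(tokens: List[str]) -> float:
    """MLW: average number of characters per word token."""
    return sum(len(token) for token in tokens) / len(tokens)


if __name__ == "__main__":
    # Whitespace splitting stands in for real tokenization in this toy example.
    verse = "in the beginning was the word and the word was with god".split()
    print(type_count(verse), ttr(verse), mattr(verse, window=4), mlw(verse))
```

The moving window is what separates MATTR from TTR: plain TTR falls as a text grows longer, whereas averaging over fixed-size windows keeps the value comparable across texts of different lengths.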