Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

Ehsaneddin Asgari¹,² and Hinrich Schütze¹
¹Center for Information and Language Processing, LMU Munich, Germany
²Applied Science and Technology, University of California, Berkeley, CA, USA

Abstract

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting, to the best of our knowledge, the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work. We only require that a linguistic feature be overtly marked in a few of the thousands of languages, as opposed to requiring that it be marked in all languages under investigation.

1 Introduction

Significant linguistic resources such as machine-readable lexicons and part-of-speech (POS) taggers are available for at most a few hundred languages. This means that the majority of the languages of the world are low-resource. Low-resource languages like Fulani are spoken by tens of millions of people and are politically and economically important. For example, NLP tools would be of great benefit in managing a sudden refugee crisis. Even "small" languages are important for the preservation of the common heritage of humankind, which includes natural remedies and linguistic and cultural diversity that can potentially enrich everybody. Thus, developing analysis methods for low-resource languages is one of the most important challenges of NLP today.

We address this challenge by proposing a new method for analyzing what we call superparallel corpora. These are corpora that are an order of magnitude more parallel (i.e., they contain an order of magnitude more languages) than the corpora that have been available in NLP to date. The corpus we work with in this paper is the Parallel Bible Corpus (PBC), which consists of translations of the New Testament in 1169 languages. No NLP analysis tools are available for most of these 1169 languages. How, then, can we extract the rich information that is potentially hidden in such superparallel corpora?

The method we propose is based on two hypotheses.

H1 Existence of overt encoding. For any important linguistic distinction f that is frequently encoded across the languages of the world, there are a few languages that encode f overtly on the surface.

H2 Overt-to-overt and overt-to-non-overt projection. Consider a language l that encodes f. A projection of f from the "overt languages" to l in the superparallel corpus will identify the encoding that l uses for f. This holds both when that encoding is overt and when it is non-overt.

Based on these two hypotheses, our method proceeds in 5 steps.

1. Selection of a linguistic feature. We select a linguistic feature f of interest. Running example: We select past tense as feature f.

2. Heuristic search for head pivot. Through a heuristic search, we find a language l_h that contains a head pivot p_h that is highly correlated with the linguistic feature of interest. Running example: "ti" in Seychelles Creole (CRS). CRS "ti" meets our requirements for a head pivot well, as will be verified empirically in §3. First, "ti" is a surface marker. It is easily identifiable through whitespace tokenization, and it is not ambiguous; for example, it does not have a second meaning apart from being a grammatical marker.
Second, "ti" is a good marker for past tense in terms of both "precision" and "recall". CRS has mandatory past tense marking (as opposed to languages in which tense marking is facultative), and "ti" is highly correlated with the general notion of past tense.

This does not mean that every clause that a linguist would regard as past tense is marked with "ti" in CRS. For example, some tense-aspect configurations that are similar to the English present perfect are marked with "in" in CRS, not with "ti" (e.g., ENG "has commanded" is translated as "in ordonn").

Our goal is not to find a head language and a head pivot that is a perfect marker of f. Such a head pivot probably does not exist. More precisely, linguistic features are not completely rigorously defined. In a sense, one of the contributions of this work is that we provide more rigorous definitions of past tense across languages. For example, "ti" in CRS is one such rigorous definition of past tense, and it automatically extends (through projection) to 1000 languages in the superparallel corpus.

3. Projection of head pivot to larger pivot set. We align the head language to the other languages in the superparallel corpus. Based on this alignment, we project the head pivot to all other languages and search for highly correlated surface markers, i.e., for additional pivots in other languages. This projection to more pivots achieves three goals.

First, it makes the method more robust. Relying on a single pivot would result in many errors. Linguistic data are inherently noisy, and several components we use (e.g., the alignment of the languages in the superparallel corpus) are imperfect.

Second, as we discussed above, the head pivot does not necessarily have high "recall". Our example was that CRS "ti" is not applied to certain clauses that would be translated using the present perfect in English. Moving to a larger pivot set therefore increases recall.

Third, as we will see below, the pivot set can be leveraged to create a fine-grained map of the linguistic feature. Consider clauses referring to eventualities in the past that English speakers would render in past progressive, present perfect or simple past tense. Our hope is that the pivot set will cover these distinctions. One pivot might mark past progressive but not present perfect or simple past, another might mark present perfect but not the other two, and so on. An example of this type of map, including distinctions like progressive and perfective aspect, is given in §4.

Running example: We compute the correlation of "ti" with words in other languages and select the 100 most highly correlated words as pivots. Examples of pivots we find this way are Torres Strait Creole "bin" (from English "been") and Tzotzil "laj". "laj" is a perfective marker; for example, "Laj meltzaj -uk" 'LAJ be-made subj' means "It's done being built" (Aissen, 1987).

4. Projection of pivot set to all languages. Now that we have a large pivot set, we project the pivots to all other languages to search for linguistic devices that express the linguistic feature f. Up to this point, we have assumed that text in all languages is easy to segment into pieces of a suitable size. The pieces must not be too small (individual characters of the Latin alphabet would be too small) and not too large (entire sentences as tokens would be too large). Segmentation on standard delimiters is a good approximation for the majority of languages, but not for all. It undersegments some languages (e.g., the polysynthetic language Inuit) and oversegments others (e.g., languages that use punctuation marks as regular characters).

For this reason, we do not employ tokenization in this step. Instead, we search for character n-grams (2 ≤ n ≤ 6) to find linguistic devices that express f. This implementation of the search procedure is a limitation. There are many linguistic devices that it cannot find, for example templates in templatic morphology. We leave addressing this for future work (§7).

Running example: We find "-ed" for English and "-te" for German as surface features that are highly correlated with the 100 past tense pivots.

5. Linguistic analysis. The result of the previous steps is a superparallel corpus that is richly annotated with information about linguistic feature f. This structure can be exploited to analyze a single language l_i that is the focus of a linguistic investigation. We can start with the character n-grams found in the step "projection of pivot set to all languages" and explore their use and function. An example is the mined n-gram "-ed" in English (assuming English is the language l_i and it is unfamiliar to us). Many of the other 1000 languages provide annotations of linguistic feature f for l_i. These include both the languages that are part of the pivot set (e.g., Tzotzil "laj") and the mined n-grams in other languages that we may have some knowledge about (e.g., "-te" in German).

We can also use the structure we have generated for typological analysis across languages, following the work of Michael Cysouw (Cysouw, 2014; see §5). Our method is a computational advance over Cysouw's work because our NLP […]

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 113–124, Copenhagen, Denmark, September 7–11, 2017. © 2017 Association for Computational Linguistics

2 SuperPivot: Description of method

1. Selection of a linguistic feature. The linguistic feature of interest f is selected by the person who performs a SuperPivot analysis, i.e., by a linguist, NLP researcher or data scientist.