Learning Language Representations for Typology Prediction

Total Page:16

File Type:pdf, Size:1020Kb

Learning Language Representations for Typology Prediction Learning Language Representations for Typology Prediction Chaitanya Malaviya and Graham Neubig and Patrick Littell Language Technologies Institute Carnegie Mellon University cmalaviy,gneubig,pwl @cs.cmu.edu { } Abstract One central mystery of neural NLP is what neural models “know” about their subject matter. When a neural machine transla- tion system learns to translate from one language to another, does it learn the syn- tax or semantics of the languages? Can this knowledge be extracted from the sys- Figure 1: Learning representations from mul- tem to fill holes in human scientific knowl- tilingual neural MT for typology classification. edge? Existing typological databases con- (Model MTBOTH) tain relatively full feature specifications for only a few hundred languages. Ex- ploiting the existence of parallel texts in Typological information from sources like the more than a thousand languages, we build World Atlas of Language Structures (WALS) a massive many-to-one neural machine (Dryer and Haspelmath, 2013), has proven use- translation (NMT) system from 1017 lan- ful in many NLP tasks (O’Horan et al., 2016), guages into English, and use this to pre- such as multilingual dependency parsing (Ammar dict information missing from typological et al., 2016), generative parsing in low-resource databases. Experiments show that the pro- settings (Naseem et al., 2012; Tackstr¨ om¨ et al., posed method is able to infer not only syn- 2013), phonological language modeling and loan- tactic, but also phonological and phonetic word prediction (Tsvetkov et al., 2016), POS- inventory features, and improves over a tagging (Zhang et al., 2012), and machine trans- baseline that has access to information lation (Daiber et al., 2016). about the languages’ geographic and phy- However, the needs of NLP tasks differ in many logenetic neighbors.1 ways from the needs of scientific typology, and ty- pological databases are often only sparsely pop- 1 Introduction ulated, by necessity or by design.2 In NLP, on the other hand, what is important is having a rela- Linguistic typology is the classification of human tively full set of features for the particular group languages according to syntactic, phonological, of languages you are working on. This mis- and other classes of features, and the investiga- match of needs has motivated various proposals tion of the relationships and correlations between to reconstruct missing entries, in WALS and other these classes/features. This study has been a sci- databases, from known entries (Daume´ III and entific pursuit in its own right since the 19th cen- Campbell, 2007; Daume´ III, 2009; Coke et al., tury (Greenberg, 1963; Comrie, 1989; Nichols, 2016; Littell et al., 2017). 1992), but recently typology has borne practical In this study, we examine whether we can fruit within various subfields of NLP, particularly on problems involving lower-resource languages. 2For example, each chapter of WALS aims to provide a statistically balanced set of languages over language families 1Code and learned vectors are available at http:// and geographical areas, and so many languages are left out in github.com/chaitanyamalaviya/lang-reps order to maintain balance. 2529 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2529–2535 Copenhagen, Denmark, September 7–11, 2017. c 2017 Association for Computational Linguistics tackle the problem of inferring linguistic typol- Glottolog (Hammarstrom¨ et al., 2015). These fea- ogy from parallel corpora, specifically by training tures are divided into separate classes regarding a massively multi-lingual neural machine trans- syntax (e.g. whether a language has prepositions lation (NMT) system and using the learned rep- or postpositions), phonology (e.g. whether a lan- resentations to infer typological features for each guage has complex syllabic onset clusters), and language. This is motivated both by prior work in phonetic inventory (e.g. whether a language has linguistics (Bugarski, 1991; Garc´ıa, 2002) demon- interdental fricatives). There are 103 syntactical strating strong links between translation studies features, 28 phonology features and 158 phonetic and tools for contrastive linguistic analysis, work inventory features in the database. in inferring typology from bilingual data (Ostling,¨ 2015) and English as Second Language texts Baseline Feature Vectors: Several previous (Berzak et al., 2014), as well as work in NLP (Shi methods take advantage of typological implica- et al., 2016; Kuncoro et al., 2017; Belinkov et al., ture, the fact that some typological traits corre- 2017) showing that syntactic knowledge can be late strongly with others, to use known features extracted from neural nets on the word-by-word of a language to help infer other unknown fea- or sentence-by-sentence level. This work presents tures of the language (Daume´ III and Campbell, a more holistic analysis of whether we can dis- 2007; Takamura et al., 2016; Coke et al., 2016). cover what neural networks learn about the lin- As an alternative that does not necessarily require guistic concepts of an entire language by aggre- pre-existing knowledge of the typological features gating their representations over a large number of in the language at hand, Littell et al. (2017) have the sentences in the language. proposed a method for inferring typological fea- tures directly from the language’s k nearest neigh- We examine several methods for discovering bors (k-NN) according to geodesic distance (dis- feature vectors for typology prediction, including tance on the Earth’s surface) and genetic distance those learning a language vector specifying the (distance according to a phylogenetic family tree). language while training multilingual neural lan- In our experiments, our baseline uses this method guage models (Ostling¨ and Tiedemann, 2017) or by taking the 3-NN for each language according neural machine translation (Johnson et al., 2016) to normalized geodesic+genetic distance, and cal- systems. We further propose a novel method for culating an average feature vector of these three aggregating the values of the latent state of the en- neighbors. coder neural network to a single vector represent- ing the entire language. We calculate these feature Typology Prediction: To perform prediction, vectors using an NMT model trained on 1017 lan- we trained a logistic regression classifier3 with guages, and use them for typlogy prediction both the baseline k-NN feature vectors described above on their own and in composite with feature vectors and the proposed NMT feature vectors described from previous work based on the genetic and geo- in the next section. We train individual classi- graphic distance between languages (Littell et al., fiers for predicting each typological feature in a 2017). Results show that the extracted representa- class (syntax etc). We performed 10-fold cross- tions do in fact allow us to learn about the typol- validation over the URIEL database, where we ogy of languages, with particular gains for syn- train on 9/10 of the languages to predict 1/10 of tactic features like word order and the presence of the languages for 10 folds over the data. case markers. 3 Learning Representations for Typology 2 Dataset and Experimental Setup Prediction Typology Database: To perform our analysis, In this section we describe three methods for we use the URIEL language typology database learning representations for typology prediction (Littell et al., 2017), which is a collection of bi- with multilingual neural models. nary features extracted from multiple typological, phylogenetic, and geographical databases such LM Language Vector Several methods have as WALS (World Atlas of Language Structures) been proposed to learn multilingual language (Collins and Kayne, 2011), PHOIBLE (Moran 3We experimented with a non-linear classifier as well, but et al., 2014), Ethnologue (Lewis et al., 2015), and the logistic regression classifier performed better. 2530 models (LMs) that utilize vector representations Syntax Phonology Inventory of languages (Tsvetkov et al., 2016; Ostling¨ and -Aux +Aux -Aux +Aux -Aux +Aux Tiedemann, 2017). Specifically, these models NONE 69.91 83.07 77.92 86.59 85.17 90.68 LMVEC 71.32 82.94 80.80 86.74 87.51 89.94 train a recurrent neural network LM (RNNLM; MTVEC 74.90 83.31 82.41 87.64 89.62 90.94 Mikolov et al. (2010)) using long short-term mem- MTCELL 75.91 85.14 84.33 88.80 90.01 90.85 ory (LSTM; Hochreiter and Schmidhuber (1997)) MTBOTH 77.11 86.33 85.77 89.04 90.06 91.03 with an additional vector representing the current Table 1: Accuracy of syntactic, phonological, language as an input. The expectation is that and inventory features using LM language vec- this vector will be able to capture the features of tors (LMVEC), MT language vectors (MTVEC), the language and improve LM accuracy. Ostling¨ MT encoder cell averages (MTCELL) or both and Tiedemann (2017) noted that, intriguingly, ag- MT feature vectors (MTBOTH). Aux indicates glomerative clustering of these language vectors auxiliary information of geodesic/genetic nearest results in something that looks roughly like a phy- neighbors; “NONE -Aux” is the majority class logenetic tree, but stopped short of performing ty- chance rate, while “NONE +Aux” is a 3-NN clas- pological inference. We train this vector by ap- sification. pending a special token representing the source language (e.g. “ fra ” for French) to the begin- h i ning of the source sentence, as shown in Fig. 1, sentiment (Radford et al., 2017), we expect that then using the word representation learned for this the cell states will represent features that may be token as a representation of the language. We will linked to the typology of the language. To cre- call this first set of feature vectors LMVEC, and ate vectors for each language using LSTM hidden examine their utility for typology prediction. states, we obtain the mean of cell states (c in the standard LSTM equations) for all time steps of all 4 NMT Language Vector In our second set of sentences in each language.
Recommended publications
  • Why Is Language Typology Possible?
    Why is language typology possible? Martin Haspelmath 1 Languages are incomparable Each language has its own system. Each language has its own categories. Each language is a world of its own. 2 Or are all languages like Latin? nominative the book genitive of the book dative to the book accusative the book ablative from the book 3 Or are all languages like English? 4 How could languages be compared? If languages are so different: What could be possible tertia comparationis (= entities that are identical across comparanda and thus permit comparison)? 5 Three approaches • Indeed, language typology is impossible (non- aprioristic structuralism) • Typology is possible based on cross-linguistic categories (aprioristic generativism) • Typology is possible without cross-linguistic categories (non-aprioristic typology) 6 Non-aprioristic structuralism: Franz Boas (1858-1942) The categories chosen for description in the Handbook “depend entirely on the inner form of each language...” Boas, Franz. 1911. Introduction to The Handbook of American Indian Languages. 7 Non-aprioristic structuralism: Ferdinand de Saussure (1857-1913) “dans la langue il n’y a que des différences...” (In a language there are only differences) i.e. all categories are determined by the ways in which they differ from other categories, and each language has different ways of cutting up the sound space and the meaning space de Saussure, Ferdinand. 1915. Cours de linguistique générale. 8 Example: Datives across languages cf. Haspelmath, Martin. 2003. The geometry of grammatical meaning: semantic maps and cross-linguistic comparison 9 Example: Datives across languages 10 Example: Datives across languages 11 Non-aprioristic structuralism: Peter H. Matthews (University of Cambridge) Matthews 1997:199: "To ask whether a language 'has' some category is...to ask a fairly sophisticated question..
    [Show full text]
  • Modeling Language Variation and Universals: a Survey on Typological Linguistics for Natural Language Processing
    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing Edoardo Ponti, Helen O ’Horan, Yevgeni Berzak, Ivan Vulic, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen To cite this version: Edoardo Ponti, Helen O ’Horan, Yevgeni Berzak, Ivan Vulic, Roi Reichart, et al.. Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. 2018. hal-01856176 HAL Id: hal-01856176 https://hal.archives-ouvertes.fr/hal-01856176 Preprint submitted on 9 Aug 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing Edoardo Maria Ponti∗ Helen O’Horan∗∗ LTL, University of Cambridge LTL, University of Cambridge Yevgeni Berzaky Ivan Vuli´cz Department of Brain and Cognitive LTL, University of Cambridge Sciences, MIT Roi Reichart§ Thierry Poibeau# Faculty of Industrial Engineering and LATTICE Lab, CNRS and ENS/PSL and Management, Technion - IIT Univ. Sorbonne nouvelle/USPC Ekaterina Shutova** Anna Korhonenyy ILLC, University of Amsterdam LTL, University of Cambridge Understanding cross-lingual variation is essential for the development of effective multilingual natural language processing (NLP) applications.
    [Show full text]
  • Typology, Documentation, Description, and Typology
    Typology, Documentation, Description, and Typology Marianne Mithun University of California, Santa Barbara Abstract If the goals of linguistic typology, are, as described by Plank (2016): (a) to chart linguistic diversity (b) to seek out order or even unity in diversity knowledge of the current state of the art is an invaluable tool for almost any linguistic endeavor. For language documentation and description, knowing what distinctions, categories, and patterns have been observed in other languages makes it possible to identify them more quickly and thoroughly in an unfamiliar language. Knowing how they differ in detail can prompt us to tune into those details. Knowing what is rare cross-linguistically can ensure that unusual features are richly documented and prominent in descriptions. But if documentation and description are limited to filling in typological checklists, not only will much of the essence of each language be missed, but the field of typology will also suffer, as new variables and correlations will fail to surface, and our understanding of deeper factors behind cross-linguistic similarities and differences will not progress. 1. Typological awareness as a tool Looking at the work of early scholars such as Franz Boas and Edward Sapir, it is impossible not to be amazed at the richness of their documentation and the insight of their descriptions of languages so unlike the more familiar languages of Europe. It is unlikely that Boas first arrived on Baffin Island forewarned to watch for velar/uvular distinctions and ergativity. Now more than a century later, an awareness of what distinctions can be significant in languages and what kinds of systems recur can provide tremendous advantages, allowing us to spot potentially important features sooner and identify patterns on the basis of fewer examples.
    [Show full text]
  • Linguistic Typology in Language Teaching and Learning in Multilingual Environments, Migrants’ Integration, and Preservation of Minority Languages
    International Journal of Arts & Sciences, CD-ROM. ISSN: 1944-6934 :: 09(04):481–494 (2017) LINGUISTIC TYPOLOGY IN LANGUAGE TEACHING AND LEARNING IN MULTILINGUAL ENVIRONMENTS, MIGRANTS’ INTEGRATION, AND PRESERVATION OF MINORITY LANGUAGES Kristian Pérez Zurutuza EHU-UPV, UNED, Spain Linguistic Typology has been a useful tool for the establishment of a unitary linguistic structure of language as a cognitive construct of the mind with the usage of various typological patterns to build a methodology that enables multilingual language teaching in infant, primary, and secondary education environments. This has occurred by making use of unified typological items, such as word order pattern, verb inflection, comparative morphology, or syntax, among others; permitting young learners to acquire second language(s) parallel to their own mother tongue in a direct manner in multilingual contexts through a unique methodology common to all multiple languages used as target languages, but considering pupils’ native tongue as a solid reference. Such methodology breaks down language into a skeleton young learners approach to regardless of previous knowledge, cultural background, or age. It becomes sustainable at all levels, for all second languages are addressed through their features related to the native tongue of the learner, leading to a more efficient comprehension and quick, effortless, natural assimilation and acquisition through visual and memory methods, along a plethora of exercises that only act as tools for such process, not as the main source for language learning nor teaching. The methodology does not depend on economic basis, for it may be used traditionally, or with ICTs, in a fluid manner to help natives and migrants banish linguistic barriers when integrating within foreign communities and their educational institutions, as well as helping preserve and ensure the growth of minority languages through the increase of speakers, which may lead to the creation of cultural production of whatever type.
    [Show full text]
  • An Introduction to Linguistic Typology
    An Introduction to Linguistic Typology An Introduction to Linguistic Typology Viveka Velupillai University of Giessen John Benjamins Publishing Company Amsterdam / Philadelphia TM The paper used in this publication meets the minimum requirements of 8 the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984. Library of Congress Cataloging-in-Publication Data An introduction to linguistic typology / Viveka Velupillai. â. p cm. â Includes bibliographical references and index. 1. Typology (Linguistics) 2. Linguistic universals. I. Title. P204.V45 â 2012 415--dc23 2012020909 isbn 978 90 272 1198 9 (Hb; alk. paper) isbn 978 90 272 1199 6 (Pb; alk. paper) isbn 978 90 272 7350 5 (Eb) © 2012 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company • P.O. Box 36224 • 1020 me Amsterdam • The Netherlands John Benjamins North America • P.O. Box 27519 • Philadelphia PA 19118-0519 • USA V. Velupillai: Introduction to Typology NON-PUBLIC VERSION: PLEASE DO NOT CITE OR DISSEMINATE!! ForFor AlTô VelaVela anchoranchor and and inspiration inspiration 2 Table of contents Acknowledgements xv Abbreviations xvii Abbreviations for sign language names xx Database acronyms xxi Languages cited in chapter 1 xxii 1. Introduction 1 1.1 Fast forward from the past to the present 1 1.2 The purpose of this book 3 1.3 Conventions 5 1.3.1 Some remarks on the languages cited in this book 5 1.3.2 Some remarks on the examples in this book 8 1.4 The structure of this book 10 1.5 Keywords 12 1.6 Exercises 12 Languages cited in chapter 2 14 2.
    [Show full text]
  • Conventionalization of Grammatical Anomalies Through Linearization*
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Illinois Digital Environment for Access to Learning and Scholarship Repository Conventionalization of grammatical anomalies through linearization* Daniel Ross University of Illinois at Urbana-Champaign [email protected] This article explores the diachronic influence of linear form in the grammaticalization of new constructions, especially those with distinctive functional and structural properties. For example, it is shown that the coordination of prenominal adjectives without an overt conjunction in English (relevant, interesting research) is due to reanalysis of an original historical structure expressing a modification relationship (tall young man). Examples of this type are discussed in the context of grammaticality illusions, where speakers parse structures and meanings other than intended by the speaker or when the blending of two grammatical constructions produces a syntactic amalgam. Diachronic changes based on linear form thus enter the grammar as exceptional but conventionalized constructions. The paper emphasizes the need to consider surface form as well as structure and meaning in both diachronic and synchronic analysis. 1. Introduction Conventional but structurally or functionally exceptional expressions may require special treatment in grammatical description and analysis, but nevertheless often resemble in form other constructions in the language. In this article, grammaticalization via reanalysis of linear form is explored and shown to be a possible source for such grammatical anomalies. The remainder of the introduction motivates the perspective for this research. Specific examples of new constructions reanalyzed based on surface form are discussed in Section 3, but first Section 2 provides background on grammaticality and grammaticality illusions, which are argued here to sometimes have lasting effects on a grammar.
    [Show full text]
  • Should We Use Characteristics of Conversation to Measure Grammatical Complexity in L2 Writing Development?
    Should We Use Characteristics of Conversation to Measure Grammatical Complexity in L2 Writing Development? DOUGLAS BIBER AND BETHANY GRAY Northern Arizona University Flagstaff, Arizona, United States KORNWIPA POONPON Khon Kaen University Khon Kaen, Thailand Studies of L2 writing development usually measure T-units and clausal subordination to assess grammatical complexity, assuming that increased subordination is typical of advanced writing. In this article we challenge this practice by showing that these measures are much more characteristic of conversation than academic writing. The article begins with a critical evaluation of T-units and clausal subordination as measures of writing development, arguing that they have not proven to be effective discriminators of language proficiency differences. These shortcomings lead to the question of whether these measures actually capture the complexities of professional academic writing, and if not, what alternative measures are better suited? Corpus-based analyses are undertaken to answer these questions, investigating 28 grammatical features in research articles contrasted with conversation. The results are surprising, showing that most clausal subordination measures are actually more common in conversation than academic writing. In contrast, fundamentally different kinds of grammatical complexity are common in academic writing: complex noun phrase constituents (rather than clause constituents) and complex phrases (rather than clauses). Based on these findings, we hypothesize a sequence of developmental stages for student writing, proposing a radically new approach for the study of complexity in student writing development. doi: 10.5054/tq.2011.244483 s a reader, your initial reaction to the question posed in the title of A this article might have been ‘‘No, of course not.
    [Show full text]
  • Language Description and Linguistic Typology Fernando Zúñiga
    Language description and linguistic typology Fernando Zúñiga 1. Introduction The past decade has seen not only a renewed interest in field linguistics and the description of lesser-known and endangered languages, but also the appearance of the more comprehensive undertaking of language documen- tation as a research field in its own right. Parallel to this, the study of lin- guistic diversity has noticeably evolved, turning into a complex and sophis- ticated field. The development of these two intellectual endeavors is mainly due to an increasing awareness of both the severity of language endanger- ment and the theoretical significance of linguistic diversity, and it has bene- fited from a remarkable improvement of computing hardware and software, as well as from several simultaneous developments in the worldwide avail- ability and use of information technologies. The important recent development of these two subfields of linguistics has certainly not gone unnoticed in the literature. When addressing the relationship between them, however, most scholars have concentrated on how and how much typology depends on the data provided by descriptive work, as well as on the usefulness and importance of typologically in- formed descriptions (cf. e.g. Croft 2003, Epps 2011, and the references therein). Rather than replicating articles that deal with historical issues and questions raised by the results of descriptive and typological enterprises, the present paper focuses on methodological issues raised by their respec- tive objects of study and emphasizes the relevance of some challenges they face. The different sections address the descriptivist’s activity (§2), the typologist’s job (§3), and some selected challenges on the road ahead for the two subfields and their cooperation (§4).
    [Show full text]
  • Explanation in Typology
    Explanation in typology Diachronic sources, functional motivations and the nature of the evidence Edited by Karsten Schmidtke-Bode Natalia Levshina Susanne Maria Michaelis Ilja A. Seržant language Conceptual Foundations of science press Language Science 3 Conceptual Foundations of Language Science Series editors Mark Dingemanse, Max Planck Institute for Psycholinguistics N. J. Enfield, University of Sydney Editorial board Balthasar Bickel, University of Zürich, Claire Bowern, Yale University, Elizabeth Couper-Kuhlen, University of Helsinki, William Croft, University of New Mexico, Rose-Marie Déchaine, University of British Columbia, William A. Foley, University of Sydney , William F. Hanks, University of California at Berkeley, Paul Kockelman, Yale University, Keren Rice, University of Toronto, Sharon Rose, University of California at San Diego, Frederick J. Newmeyer, University of Washington, Wendy Sandler, University of Haifa, Dan Sperber Central European University No scientific work proceeds without conceptual foundations. In language science, our concepts about language determine our assumptions, direct our attention, and guide our hypotheses and our reason- ing. Only with clarity about conceptual foundations can we pose coherent research questions, design critical experiments, and collect crucial data. This series publishes short and accessible books that explore well-defined topics in the conceptual foundations of language science. The series provides a venue for conceptual arguments and explorations that do not require the traditional book-length treatment, yet that demand more space than a typical journal article allows. In this series: 1. Enfield, N. J. Natural causes of language. 2. Müller, Stefan. A lexicalist account of argument structure: Template-based phrasal LFG approaches and a lexical HPSG alternative 3. Schmidtke-Bode, Karsten, Natalia Levshina, Susanne Maria Michaelis & Ilja A.
    [Show full text]
  • SKY Journal of Linguistics Is Published by the Linguistic Association of Finland (One Issue Per Year)
    SKY Journal of Linguistics is published by the Linguistic Association of Finland (one issue per year). Notes for Contributors Policy: SKY Journal of Linguistics welcomes unpublished original works from authors of all nationalities and theoretical persuasions. Every manuscript is reviewed by at least two anonymous referees. In addition to full-length articles, the journal also accepts short (3-5 pages) ‘squibs’ as well as book reviews. Language of Publication: Contributions should be written in either English, French, or German. If the article is not written in the native language of the author, the language should be checked by a qualified native speaker. Style Sheet: A detailed style sheet is available from the editors, as well as via WWW at http://www.ling.helsinki.fi/sky/skystyle.shtml. Abstracts: Abstracts of the published papers are included in Linguistics Abstracts and Cambridge Scientific Abstracts. SKY JoL is also indexed in the MLA Bibliography. Editors’ Addresses (2005): Pentti Haddington, Department Finnish, Information Studies and Logopedics, P.O. Box 1000, FIN-90014 University of Oulu, Finland (e-mail [email protected]) Jouni Rostila, German Language and Culture Studies, FIN-33014 University of Tampere, Finland (e-mail [email protected]) Ulla Tuomarla, Department of Romance Languages, P.O. Box 24 (Unioninkatu 40B), FIN-00014 University of Helsinki, Finland (e-mail [email protected]) Publisher: The Linguistic Association of Finland c/o Department of Linguistics P.O. Box 4 FIN-00014 University of Helsinki Finland http://www.ling.helsinki.fi/sky The Linguistic Association of Finland was founded in 1977 to promote linguistic research in Finland by offering a forum for the discussion and dissemination of research in linguistics, both in Finland and abroad.
    [Show full text]
  • Linguistic Typology 2019; 23(2): 263–302
    Linguistic Typology 2019; 23(2): 263–302 Joan Bybee and Shelece Easterday Consonant strengthening: A crosslinguistic survey and articulatory proposal https://doi.org/10.1515/lingty-2019-0015 Received April 16, 2018; accepted December 07, 2018 Abstract: Given the common intuition that consonant lenition occurs more often than fortition, we formulate this as a hypothesis, defining these sound change types in terms of decrease or increase in oral constriction. We then test the hypothesis on allophonic processes in a diverse sample of 81 languages. With the hypothesis confirmed, we examine the input and output of such sound changes in terms of manner and place of articulation and find that while decrease in oral constriction (weakening) affects most consonant types, increase in oral constriction (strengthening) is largely restricted to palatal and labial glides. We conclude that strengthening does not appear to be the simple inverse of weakening. In conclusion we suggest some possible avenues for explaining how glide strengthening may result from articulatory production pressures and speculate that strengthening and weakening can be encompassed under a single theory of sound change resulting from the automatization of production. Keywords: phonology, consonant strengthening, lenition and fortition, allopho- nic variation, sound change, automatization 1 Introduction It is generally agreed that lenition is much more common than fortition in sound changes as well as phonological processes across the languages of the world.1 This asymmetry suggests the need for special attention to fortition in order to determine if it is the simple inverse of lenition or a different phenomenon. The Author Contribution Statement: Joan Bybee and Shelece Easterday contributed equally to the data collection, analysis, and writing of this paper 1 See Section 3 for references and discussion.
    [Show full text]
  • Linguistic Typology and Language Acquisition: the Accessibility Hierarchy and Relative Clauses
    Linguistic Typology and Language Acquisition: The Accessibility Hierarchy and Relative Clauses Jae Jung Song (University of Otago) Song, lae lung. (2002). Linguistic typology and language acquisition: The accessibility hierarchy and relative clauses. Language Research 38(2), 729-756. Linguistic typology, together with language universals research, is not very widely invoked as a theoretical framework in which to raise questions about the language acquisition process and also to address some of the theoretical issues or problems emerging from language acquisition research. The objective of the present review article is to raise the profile of linguistic typology and language universals in the context of language acquisition research. By way of illustration, a critical review of the validity or role of the Accessibility Hierarchy (AH) in the Ll/L2 acquisition of relative clauses is provided. Moreover, factors which may interfere with the predictions made by the AH or which may have a bearing upon the acquisition of relative clauses are identified and discussed. Key words: linguistic typology, the Accessibility Hierarchy, LlIL2 acquisition of relative clauses 1. Introduction The relationship between language universals and language acquisition was clearly identified very early on in the development of modern linguistic typology as was first enunciated by Jakobson in his 1941 monograph, Kindersprache, Aphasie und Allgemeine Lautgesetze (published again in 1968 in English under the title of Child language, aphasia and phonological universals). He assumed that the implicational universal of p=:Jq (if p, then q) can be dynamically interpreted with the effect that acquisition of phonological property q will precede acquisition of phono­ logical property p, for instance.I) Otherwise, the implicational universal of 1) For example, the presence of voiced aspirated stops implies the presence of voiceless aspirated stops, e.g.
    [Show full text]