On the Six-Way Word Order Typology

Studies in Language 21:1. 69-103 (1997). All rights reserved

MATTHEW S. DRYER
State University of New York at Buffalo

ABSTRACT

A number of arguments are given against the traditional word order typology based on the six types SOV, SVO, VSO, VOS, OVS, and OSV, and in favour of an alternative typology based on two binary parameters OV vs. VO and SV vs. VS. The arguments given include ones based on various advantages of collapsing VSO and VOS into a single type, the infrequency of clauses containing a noun subject and noun object, the value of isolating the more predictive parameter of OV vs. VO, and the fact that the traditional typology ignores the position of intransitive subjects.

1. Introduction

There is a long tradition in linguistics that treats the basic word order of the subject, object, and verb as a fundamental typological parameter, that treats the question of whether a language is SOV, SVO, VSO, VOS, OVS, or OSV as one of the most important things to know about a language. The purpose of this paper is to present arguments in favour of an alternative to this six-way typology, one based on two separate 2-way typological parameters: OV vs. VO, and SV vs. VS. Together these two parameters define four types: VS&VO, SV&VO, SV&OV, and VS&OV. The first of these, VS&VO, which I will refer to as verb-initial, corresponds roughly to the two traditional types VSO and VOS; SV&VO corresponds roughly to SVO; SV&OV, which I will refer to as verb-final, corresponds to the two types SOV and OSV; and VS&OV corresponds to the rare type OVS.¹

I will present eight arguments for the proposed typology. First, I will argue that it allows easy classification of languages which are indeterminately VSO/VOS.
Second, I will argue that there is no evidence that the difference between VSO and VOS languages is predictive of anything: the properties that are typical of VSO languages are apparently also typical of VOS languages, and hence VSO and VOS are best treated as belonging to the same type. Third, I will argue that the difference between VSO and VOS order is a relatively unstable one, both orders being commonly found as basic orders within the same language family and within the same linguistic area. Fourth, I will argue that the traditional six-way typology is based on a clause type that occurs relatively infrequently, while the proposed typology is based on clause types that occur much more frequently. Fifth, I will argue that there are many languages which have word order sufficiently flexible that they are impossible to classify by the traditional typology but which are still classifiable by the proposed typology. Sixth, I will argue that there are other languages with word order even more flexible but which are still classifiable either for the order of subject and verb or for the order of object and verb. Seventh, I will argue that the proposed typology is superior because it isolates the order of the object and verb, the more fundamental typological parameter in terms of word order correlations. And eighth, I will argue that the traditional typology suffers because it overlooks the order of subject and verb in intransitive clauses, even though the order in such clauses is occasionally different from the order of subject and verb in transitive clauses, and even though intransitive clauses containing a noun subject are much more common than transitive clauses containing a noun subject.

2. The notion of basic word order

The term basic word order is used in various ways by different linguists, often without an apparent awareness that it is being applied to different notions.² The characterization of basic word order by Hawkins (1983: 13) is representative of criteria assumed by many linguists: he uses a set of different criteria, all of which tend to correlate with each other, though none of them are necessary properties. These include the most frequent order, the order that occurs in the broadest set of syntactic environments, and the order that is unmarked by a variety of other markedness criteria. But apart from the description of particular languages, the notion of basic word order has played its most significant role in identifying the orders of various pairs or sets of elements that provide the empirical basis for the crosslinguistic generalizations originally discussed by Greenberg (1963) and pursued in works by various people (e.g. Lehmann 1973, Vennemann 1976, Hawkins 1983, Dryer 1992). Whatever the merits of the various criteria discussed by Hawkins, the fact remains that the empirical basis for these generalizations has largely been statements in grammatical descriptions which describe one order of some pair or set of elements as the normal order or the most common order. In other words, while criteria other than frequency may typically correlate with the most frequent word order, a large body of descriptive literature provides statements regarding the apparent relative frequency of different orders, but very little evidence regarding other criteria, excepting that for many languages, like English, one order is so clearly basic by all criteria that no questions arise. In short, we know that there are significant crosslinguistic generalizations based on a notion of basic order associated with most frequent word order, and the criteria for identifying an order as most frequent are relatively clearcut (though see below).
For the purposes of this paper, therefore, I will simply define the basic word order of two or more elements as the most frequent order of those elements in the language. In defining the term basic word order in this way, I am not intending to deny that there may be other useful notions that one might apply the expression to, but the different notions should not be confused.³ While some linguists assume a notion of basic word order that is not always the most frequent word order, this difference is simply a terminological one, with the term basic word order being used by different linguists as a label for related but distinct notions. Given the fact that most notions of basic word order at least correlate very strongly with the most frequent order, it seems likely that the conclusions of this paper are equally applicable to other notions of basic word order. But whether or not this is the case is difficult to determine, given the difficulties applying other criteria for identifying basic word order.

Despite the fact that I appeal to evidence based on frequency, I must concede from the start two fundamental problems with the notion of most frequent order in a language, one methodological, the other substantive. The methodological problem is that, despite the fact that it is often easy to identify a most frequent order in a given body of texts, questions can arise whether the differences in frequency between alternative orders are necessarily general properties of texts in the language rather than accidental properties of a particular text or set of texts. What this means is that if we really want to determine whether a particular order is really most frequent in a language, we need to examine as wide a variety of texts representing different genres as possible. If a particular order is more common in most or all texts, then we can justifiably describe that order as most frequent.
If no order is most frequent over most texts, however, or if the order varies from genre to genre or text to text, we should probably not describe any particular order as the basic order (in the sense of most frequent order) and we should say that the language is one that lacks a basic word order, as Mithun (1987) argues for a number of languages. In short, while it may be relatively easy to identify a most frequent order in a single text or in a small body of texts, it is necessary to examine a wide variety of texts before one can decide with confidence that a particular order is most frequent in the language as a whole. Much of the frequency data cited below falls short of this ideal, and thus the text counts cited should probably be considered in many cases as no more than pilot studies that at best suggest that a particular order is most frequent.

The substantive problem with frequency is that frequency is epiphenomenal relative to grammars of languages; typologies of languages are often assumed to be typologies of grammars of languages. Where one order is more frequent than another in a language, I assume it to be the case that the higher frequency ultimately reflects the discourse conditions under which the different orders are used and the order that is more frequent is more frequent only because the discourse conditions in which it is used tend to occur more frequently in normal discourse. Hence frequency is due to two factors, the linguistic factor of the rules or principles governing when particular orders are used, and the nonlinguistic factor of how often particular discourse conditions arise. The following example of word order in Papago should make this point clear. Payne (1987: 793-794) cites the text count given in Table 1.

Table 1: Word Order in Papago

SV    48 (23%)      OV    44 (29%)
VS   158 (77%)      VO   108 (71%)

Table 1 shows that both subjects and objects more commonly follow the verb in Papago.
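
As a rough illustration of the definition adopted in section 2, the following Python sketch applies the most-frequent-order criterion to the Papago counts in Table 1 and combines the two results into one of the four types introduced in section 1. The function names (share, basic_order, classify) are hypothetical and do not come from the paper, and treating an exact tie as "no basic order" is an assumption made only for the sketch. A single text count of this kind is at best a pilot study in the sense discussed above, so the output should be read as suggestive rather than as a settled classification.

```python
from typing import Dict, Optional

# Text counts from Table 1 (Payne 1987: 793-794), as cited in the paper.
PAPAGO: Dict[str, int] = {"SV": 48, "VS": 158, "OV": 44, "VO": 108}

# Labels the paper gives to two of the four types.
LABELS = {"VS&VO": "verb-initial", "SV&OV": "verb-final"}


def share(counts: Dict[str, int], order: str, alternative: str) -> int:
    """Rounded percentage of clauses showing `order` out of both orders."""
    total = counts[order] + counts[alternative]
    return round(100 * counts[order] / total)


def basic_order(counts: Dict[str, int], first: str, second: str) -> Optional[str]:
    """Return the more frequent of two orders.

    Assumption for this sketch: an exact tie means no basic order (None).
    """
    if counts[first] == counts[second]:
        return None
    return first if counts[first] > counts[second] else second


def classify(counts: Dict[str, int]) -> Optional[str]:
    """Combine the SV/VS and OV/VO parameters into one of the four types."""
    subject_order = basic_order(counts, "SV", "VS")
    object_order = basic_order(counts, "OV", "VO")
    if subject_order is None or object_order is None:
        return None
    return f"{subject_order}&{object_order}"


if __name__ == "__main__":
    for order, alt in [("SV", "VS"), ("VS", "SV"), ("OV", "VO"), ("VO", "OV")]:
        print(f"{order} {PAPAGO[order]} ({share(PAPAGO, order, alt)}%)")
    language_type = classify(PAPAGO)
    print(language_type, LABELS.get(language_type, ""))  # VS&VO verb-initial
```

On these counts the sketch reproduces the percentages in Table 1 and returns VS&VO, the type the paper calls verb-initial. Note that each parameter is decided from its own set of clauses (206 clauses with a subject, 152 with an object), so neither decision depends on clauses that contain both a noun subject and a noun object.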