Morphological Analysis of the Dravidian Language Family

Arun Kumar (Universitat Oberta de Catalunya), Ryan Cotterell (Johns Hopkins University), Lluís Padró (Universitat Politècnica de Catalunya), Antoni Oliver (Universitat Oberta de Catalunya)

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 217–222, Valencia, Spain, April 3-7, 2017. © 2017 Association for Computational Linguistics

Abstract

The Dravidian family is one of the most widely spoken sets of languages in the world, yet very few annotated resources are available to NLP researchers. To remedy this, we create DravMorph, a corpus annotated for morphological segmentation and part-of-speech. We also exploit novel features and higher-order models to achieve promising results on these corpora on both tasks, beating techniques proposed in the literature by as much as 4 points in segmentation F1.

1 Introduction

The Dravidian languages comprise one of the world's major language families and are spoken by over 300 million people in southern India (see Figure 1). Despite their prevalence, they remain low resource with respect to language technology. We annotate new data and develop new models for the four most commonly spoken Dravidian languages: Kannada, Malayalam, Tamil and Telugu. We present a brief overview of the linguistic features that characterize the family as a whole and then describe the development of statistical models that utilize these specific features.

Figure 1 (map of India labeling the Indo-Aryan, Telugu, Kannada, Tamil and Malayalam regions): The Dravidian languages are spoken natively in southern India, whereas languages belonging to the Indo-Aryan family, a subbranch of the larger Indo-European family, are spoken in the north.

We focus on the computational processing of Dravidian morphology, a critical issue since the family exhibits rich agglutinative inflectional morphology as well as highly productive compounding. For example, Dravidian nouns are typically inflected for gender, number and case, in addition to taking various postpositions. Consider the Malayalam word agniparvvatattinṟeyeāppam (അഗ്നിപർവ്വതത്തിന്റെയോപ്പം), which is composed of the compound noun stem agni+parvvatam (fire+mountain) and the following suffixes: tta (inflectional increment), inṟe (genitive case marker), ye (inflectional increment) and oppam (postposition). These combine to give the meaning of the English phrase "with a volcano." This intra-word complexity makes morphological analysis obligatory for the Dravidian languages.

We make three primary contributions: (i) We release DravMorph, a corpus annotated for morphological segmentation and part-of-speech (POS), as an open-source resource, encouraging future work on Dravidian languages; (ii) We show that a combination of higher-order models and linguistically motivated features yields state-of-the-art accuracy on the task of morphological segmentation on the corpus; (iii) We show that training POS taggers that use the output of our segmenters as features significantly improves a state-of-the-art tagger. The last result indicates that, for languages with rich morphology, a more structured approach to character-level features than simple prefix and suffix features is necessary.

2 DravMorph

A primary contribution of this work is the release of DravMorph [1], a corrected corpus for both morphological segmentation and POS in the four most widely spoken Dravidian languages: Kannada, Malayalam, Tamil and Telugu. The corpus contains between 4034 and 8600 annotated sentences and between 3593 and 4730 segmented types per language. The full statistics are listed in Table 2. To the best of our knowledge, this is the most comprehensive annotated corpus of the Dravidian languages.

[1] The morphological analyzers and the code for correcting the corpus are available at https://github.com/Malkitti/Corpusandcodes

Lang   POS           Segmentation   Wiki Dump
Ka     ILMT/IIIT-H   ILMT/IIIT-H    2015-02-09
Ma     ILMT/AM       ILMT/AM        2015-05-08
Ta     ILMT/AM       ILMT/AM        2015-05-09
Te     ILMT/AM       ILMT/UoH       2015-02-03

Table 1: The origin of the rule-based analyzers and taggers. ILMT stands for the Indian Language Machine Translation Project, AM for Amrita University, IIIT-H for the International Institute of Information Technology, Hyderabad, and UoH for the University of Hyderabad.

       POS Tagging                Segmentation
Lang   # Sentences   # Tokens     # Types
Ka     8600          31364        3593
Ma     4034          34300        4730
Ta     4550          32400        4445
Te     5679          30625        4183

Table 2: Per-language breakdown of the size of the POS portion and the morphological segmentation portion of DravMorph. All train / dev / test splits used in the experiments will be released with the corpus.

All the newly annotated corpora are based on Wikipedia text in the respective languages (see Table 1). To speed up annotation, we first ran closed-source rule-based morphological analyzers and POS taggers produced by the government of India and Indian universities. The existence of such rule-based tools does not diminish the utility of the annotated corpus: our ultimate goal is the adoption of modern statistical methods for Dravidian NLP, which requires annotated data. To ensure a gold-standard corpus, we then hand-corrected the resulting output. Additionally, we standardized the POS tagging schemes across languages, using the IIIT-H POS tagset (Bharati et al., 2006), which has 23 tags. Furthermore, we calculated the inter-annotator agreement of two annotators for morphological labels; all datasets have Cohen's κ (Cohen, 1968) > 0.80.

3 Morphological Segmentation

We first examine the task of morphological segmentation in the Dravidian languages. The task entails breaking a word up into its constituent morphs. For example, the English word joblessness can be segmented as job+less+ness, uncovering how the word was built and hinting at the semantics of the resulting derived form. When processing morphologically rich languages, this helps reduce the sparsity created by the higher OOV rate due to productive morphology and, empirically, has been shown to be beneficial in a wide variety of downstream tasks, e.g., machine translation (Clifton and Sarkar, 2011), speech […] and unsupervised approaches have been successful, but, when annotated data is available, supervised approaches typically greatly outperform unsupervised approaches (Ruokolainen et al., 2013). In light of this, we adopt a fully supervised model here.

We apply semi-Markov Conditional Random Fields (S-CRFs) to the problem of morphological segmentation (Sarawagi and Cohen, 2004; Cotterell et al., 2015). S-CRFs have the ability to jointly model both a segmentation and a labeling. For example, consider the Malayalam word kūṭṭukāranmāruṭeyēāppam (കൂട്ടുകാരന്മാരുടെയോപ്പം) (with (male) friends) and its labeled segmentation:

kūṭṭukāranmāruṭeyēāppam ⇒ [stem kūṭṭukāran] [suf mār] [suf uṭe] [suf yēāppam]

Each bracketed unit is a segment-label pair (s_i, ℓ_i), for i = 1, ..., 4. An S-CRF models this transformation as

$$p_\theta(s, \ell \mid w) = \frac{1}{Z_\theta(w)} \exp\left( \sum_{i=1}^{|s|} \theta^\top f(s_i, \ell_i, \ell_{i-1}) \right),$$

where s is a segmentation, ℓ a labeling, θ ∈ R^d is the parameter vector, f is a feature function and the partition function Z_θ(w) ensures the distribution is normalized. Note that each ℓ_i is taken from a set of labels L. In this work, we take L = {prefix, stem, suffix}.

As an extension to the standard S-CRF model, we allow for higher-order segment interactions (Nguyen et al., 2011).
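The S-CRF distribution p_θ(s, ℓ | w) described above can be illustrated with a minimal Python sketch. Everything in it beyond the formula itself is an assumption made for illustration: the function names (`segmentations`, `features`, `score`, `distribution`), the `START` symbol standing in for ℓ_0, the two toy feature templates and the hand-set weights in `THETA` are not taken from the paper. The sketch is first-order only, so it does not cover the higher-order extension, and it normalizes by brute-force enumeration, which is feasible only for very short words; practical implementations would compute Z_θ(w) with dynamic programming.

```python
import math
from itertools import product

LABELS = ("prefix", "stem", "suffix")  # the label set L from the text
START = "<s>"                          # hypothetical boundary label standing in for l_0


def segmentations(word):
    """Yield every way to split `word` into contiguous non-empty segments."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        for rest in segmentations(word[end:]):
            yield [word[:end]] + rest


def features(segment, label, prev_label):
    """Toy sparse feature function f(s_i, l_i, l_{i-1}); not the paper's features."""
    return {
        ("seg", segment, label): 1.0,
        ("trans", prev_label, label): 1.0,
    }


def score(segs, labels, theta):
    """Unnormalized log-score: sum over segments of theta . f(s_i, l_i, l_{i-1})."""
    total = 0.0
    prev = START
    for seg, lab in zip(segs, labels):
        for name, value in features(seg, lab, prev).items():
            total += theta.get(name, 0.0) * value
        prev = lab
    return total


def distribution(word, theta):
    """Return {(segments, labels): p(s, l | w)} by enumerating all labeled segmentations."""
    scores = {}
    for segs in segmentations(word):
        for labels in product(LABELS, repeat=len(segs)):
            scores[(tuple(segs), labels)] = score(segs, labels, theta)
    # log Z_theta(w): the normalizer over every labeled segmentation of the word
    log_z = math.log(sum(math.exp(v) for v in scores.values()))
    return {key: math.exp(v - log_z) for key, v in scores.items()}


# Illustrative weights only; a trained model would learn theta from annotated data.
THETA = {
    ("seg", "job", "stem"): 3.0,
    ("seg", "less", "suffix"): 3.0,
    ("trans", START, "stem"): 1.0,
    ("trans", "stem", "suffix"): 1.0,
}

dist = distribution("jobless", THETA)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

With these toy weights the highest-probability analysis is job+less labeled stem, suffix. The transition feature on (ℓ_{i-1}, ℓ_i) is what makes the model score the labeling jointly with the segmentation rather than labeling each segment independently.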