An Empirical Test of the Agglutination Hypothesis1
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Reconsidering the “Isolating Protolanguage Hypothesis” in the Evolution of Morphology Author(S): Jaïmé Dubé Proceedings
Reconsidering the “isolating protolanguage hypothesis” in the evolution of morphology Author(s): Jaïmé Dubé Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society (2013), pp. 76-90 Editors: Chundra Cathcart, I-Hsuan Chen, Greg Finley, Shinae Kang, Clare S. Sandy, and Elise Stickles Please contact BLS regarding any further use of this work. BLS retains copyright for both print and screen forms of the publication. BLS may be contacted via http://linguistics.berkeley.edu/bls/. The Annual Proceedings of the Berkeley Linguistics Society is published online via eLanguage, the Linguistic Society of America's digital publishing platform. Reconsidering the Isolating Protolanguage Hypothesis in the Evolution of Morphology1 JAÏMÉ DUBÉ Université de Montréal 1 Introduction Much recent work on the evolution of language assumes explicitly or implicitly that the original language was without morphology. Under this assumption, morphology is merely a consequence of language use: affixal morphology is the result of the agglutination of free words, and morphophonemic (MP) alternations arise through the morphologization of once regular phonological processes. This hypothesis is based on at least two questionable assumptions: first, that the methods and results of historical linguistics can provide a window on the evolution of language, and second, based on the claim that some languages have no morphology (the so-called isolating languages), that morphology is not a necessary part of language. The aim of this paper is to suggest that there is in fact no basis for what I will call the Isolating Proto-Language Hypothesis (henceforth IPH), either on historical or typological grounds, and that the evolution of morphology remains an interesting question. -
Using Morphemes from Agglutinative Languages Like Quechua and Finnish to Aid in Low-Resource Translation
Using Morphemes from Agglutinative Languages like Quechua and Finnish to Aid in Low-Resource Translation John E. Ortega [email protected] Dept. de Llenguatges i Sistemes Informatics, Universitat d’Alacant, E-03071, Alacant, Spain Krishnan Pillaipakkamnatt [email protected] Department of Computer Science, Hofstra University, Hempstead, NY 11549, USA Abstract Quechua is a low-resource language spoken by nearly 9 million persons in South America (Hintz and Hintz, 2017). Yet, in recent times there are few published accounts of successful adaptations of machine translation systems for low-resource languages like Quechua. In some cases, machine translations from Quechua to Spanish are inadequate due to error in alignment. We attempt to improve previous alignment techniques by aligning two languages that are simi- lar due to agglutination: Quechua and Finnish. Our novel technique allows us to add rules that improve alignment for the prediction algorithm used in common machine translation systems. 1 Introduction The NP-complete problem of translating natural languages as they are spoken by humans to machine readable text is a complex problem; yet, is partially solvable due to the accuracy of machine language translations when compared to human translations (Kleinberg and Tardos, 2005). Statistical machine translation (SMT) systems such as Moses 1, require that an algo- rithm be combined with enough parallel corpora, text from distinct languages that can be com- pared sentence by sentence, to build phrase translation tables from language models. For many European languages, the translation task of bringing words together in a sequential sentence- by-sentence format for modeling, known as word alignment, is not hard due to the abundance of parallel corpora in large data sets such as Europarl 2. -
Complex Adpositions and Complex Nominal Relators Benjamin Fagard, José Pinto De Lima, Elena Smirnova, Dejan Stosic
Introduction: Complex Adpositions and Complex Nominal Relators Benjamin Fagard, José Pinto de Lima, Elena Smirnova, Dejan Stosic To cite this version: Benjamin Fagard, José Pinto de Lima, Elena Smirnova, Dejan Stosic. Introduction: Complex Adpo- sitions and Complex Nominal Relators. Benjamin Fagard, José Pinto de Lima, Dejan Stosic, Elena Smirnova. Complex Adpositions in European Languages : A Micro-Typological Approach to Com- plex Nominal Relators, 65, De Gruyter Mouton, pp.1-30, 2020, Empirical Approaches to Language Typology, 978-3-11-068664-7. 10.1515/9783110686647-001. halshs-03087872 HAL Id: halshs-03087872 https://halshs.archives-ouvertes.fr/halshs-03087872 Submitted on 24 Dec 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Public Domain Benjamin Fagard, José Pinto de Lima, Elena Smirnova & Dejan Stosic Introduction: Complex Adpositions and Complex Nominal Relators Benjamin Fagard CNRS, ENS & Paris Sorbonne Nouvelle; PSL Lattice laboratory, Ecole Normale Supérieure, 1 rue Maurice Arnoux, 92120 Montrouge, France [email protected] -
Diminutive and Augmentative Functions of Some Luganda Noun Class Markers Samuel Namugala MA Thesis in Linguistics Norwegian Un
Diminutive and Augmentative Functions of some Luganda Noun Class Markers Samuel Namugala MA Thesis in Linguistics Norwegian University of Science and Technology (NTNU) Faculty of Humanities Department of Language and Literature Trondheim, April, 2014 To my parents, Mr. and Mrs. Wampamba, and my siblings, Polycarp, Lydia, Christine, Violet, and Joyce ii Acknowledgements I wish to express my gratitude to The Norwegian Government for offering me a grant to pursue the master’s program at NTNU. Without this support, I would perhaps not have achieved my dream of pursuing the master’s degree in Norway. Special words of thanks go to my supervisors, Professor Kaja Borthen and Professor Assibi Amidu for guiding me in writing this thesis. Your scholarly guidance, constructive comments and critical revision of the drafts has made it possible for me to complete this thesis. I appreciate the support and the knowledge that you have shared with me. I look forward to learn more from you. My appreciation also goes to my lecturers and the entire staff at the Department of Language and Literature. I am grateful to Professor Lars Hellan, Assoc. Professor Dorothee Beermann, Professor Wim Van Dommelen, and Assoc. Professor Jardar Abrahamsen for the knowledge you have shared with me since I joined NTNU. You have made me the linguist that I desired to be. I also wish to thank the authors that didn’t mind to help me when contacted for possible relevant literature for my thesis. My appreciation goes to Prof. Nana Aba Appiah Amfo (University of Ghana), Assistant Prof. George J. Xydopoulos (Linguistics School of Philology, University of Patras, Greece), Prof. -
Fusion, Exponence, and Flexivity in Hindukush Languages
Fusion, exponence, and flexivity in Hindukush languages An areal-typological study Hanna Rönnqvist Department of Linguistics Independent Project for the Degree of Master 30 HE credits General Linguistics Spring term 2015 Supervisor: Henrik Liljegren Examiner: Henrik Liljegren Fusion, exponence, and flexivity in Hindukush languages An areal-typological study Hanna Rönnqvist Abstract Surrounding the Hindukush mountain chain is a stretch of land where as many as 50 distinct languages varieties of several language meet, in the present study referred to as “The Greater Hindukush” (GHK). In this area a large number of languages of at least six genera are spoken in a multi-linguistic setting. As the region is in part characterised by both contact between languages as well as isolation, it constitutes an interesting field of study of similarities and diversity, contact phenomena and possible genealogical connections. The present study takes in the region as a whole and attempts to characterise the morphology of the many languages spoken in it, by studying three parameters: phonological fusion, exponence, and flexivity in view of grammatical markers for Tense-Mood-Aspect, person marking, case marking, and plural marking on verbs and nouns. The study was performed with the perspective of areal typology, employed grammatical descriptions, and was in part inspired by three studies presented in the World Atlas of Language Structures (WALS). It was found that the region is one of high linguistic diversity, even if there are common traits, especially between languages of closer contact, such as the Iranian and the Indo-Aryan languages along the Pakistani-Afghan border where purely concatenative formatives are more common. -
Structuring Variation in Romance Linguistics and Beyond in Honour of Leonardo M
Linguistik Aktuell Linguistics Today 252 Structuring Variation in Romance Linguistics and Beyond In honour of Leonardo M. Savoia Edited by Mirko Grimaldi Rosangela Lai Ludovico Franco Benedetta Baldi John Benjamins Publishing Company Structuring Variation in Romance Linguistics and Beyond Linguistik Aktuell/Linguistics Today (LA) issn 0166-0829 Linguistik Aktuell/Linguistics Today (LA) provides a platform for original monograph studies into synchronic and diachronic linguistics. Studies in LA confront empirical and theoretical problems as these are currently discussed in syntax, semantics, morphology, phonology, and systematic pragmatics with the aim to establish robust empirical generalizations within a universalistic perspective. For an overview of all books published in this series, please see http://benjamins.com/catalog/la Founding Editor Werner Abraham Universität Wien / Ludwig Maximilian Universität München General Editors Werner Abraham Elly van Gelderen Universität Wien / Arizona State University Ludwig Maximilian Universität München Advisory Editorial Board Josef Bayer Hubert Haider Ian Roberts University of Konstanz University of Salzburg Cambridge University Cedric Boeckx Terje Lohndal Lisa deMena Travis ICREA/UB Norwegian University of Science McGill University and Technology Guglielmo Cinque Sten Vikner University of Venice Christer Platzack University of Aarhus University of Lund Liliane Haegeman C. Jan-Wouter Zwart University of Ghent University of Groningen Volume 252 Structuring Variation in Romance Linguistics and Beyond -
A Morphological Lexicon of Esperanto with Morpheme Frequencies
A Morphological Lexicon of Esperanto with Morpheme Frequencies Eckhard Bick University of Southern Denmark Campusvej 55, DK-5230 Odense M Email: [email protected] Abstract This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding is checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error percentage of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%, and wrong/spurious boundaries being more common than missing ones. For pedagogical purposes a morpheme frequency dictionary was constructed for a 16 million word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means to reduce the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes. Keywords: Morphological Analysis, Esperanto, Affixation, Compounding, Morpheme Frequencies 1. Introduction 2. Morphological analysis As an artificial language with a focus on regularity and From a language technology perspective, inflexional facilitation of language acquisition, Esperanto was regularity, morphological transparency and surface-based designed with a morphology that allows (almost) free, access to semantic features turn POS tagging of Esperanto productive combination of roots, affixes and inflexion into a non-task, and facilitate the parsing of syntactic and endings. Thus, the root 'san' (healthy) not only accepts its semantic structures (Bick 2007). -
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido a Thesis Submitted in Fulfilment O
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy 2001 University of Sydney I declare that this thesis is all my own work. I have acknowledged in formal citation the sources of any reference I have made to the work of others. ____________________________ Shinji Ido ____________________________ Date Title: Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages Abstract: This thesis analyses ‘incomplete sentences’ in languages which utilise distinctively agglutinative components in their morphology. In the grammars of the languages dealt with in this thesis, there are certain types of sentences which are variously referred to as ‘elliptical sentences’ (Turkish eksiltili cümleler), ‘incomplete sentences’ (Uzbek to‘liqsiz gaplar), ‘cut-off sentences’ (Turkish kesik cümleler), etc., for which the grammarians provide elaborated semantic and syntactic analyses. The current work attempts to present an alternative approach for the analysis of such sentences. The distribution of morphemes in incomplete sentences is examined closely, based on which a system of analysis that can handle a variety of incomplete sentences in an integrated manner is proposed from a morphological point of view. It aims to aid grammarians as well as researchers in area studies by providing a simple description of incomplete sentences in agglutinative languages. The linguistic data are taken from Turkish, Uzbek, -
Proceedings of the VII Nereus International Workshop : "Clitic Doubling and Other Issues of the Syntax/Semantic Interface I
Arbeitspapier Nr. 128 Proceedings of the VII Nereus International Workshop: “Clitic Doubling and other issues of the syntax/semantic interface in Romance DPs” Susann Fischer & Mario Navarro (eds.) Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-372560 Fachbereich Sprachwissenschaft der Universität Konstanz Arbeitspapier Nr. 128 Proceedings of the VII NEREUS INTERNATIONAL WORKSHOP: CLITIC DOUBLING AND OTHER ISSUES OF THE SYNTAX/SEMANTIC INTERFACE IN ROMANCE DPS SUSANN FISCHER & MARIO NAVARRO (EDS.) Fachbereich Sprachwissenschaft Universität Konstanz FACH 185 D-78457 Konstanz Germany Konstanz November 2016 Schutzgebühr € 3,50 Fachbereich Sprachwissenschaft der Universität Konstanz Sekretariat des Fachbereichs Sprachwissenschaft, Frau Tania Simeoni, Fach 185, D – 78457 Konstanz, Tel. 07531/88-2465 Table of Contents Susann Fischer & Mario Navarro Preface Artemis Alexiadou DP internal (clitic) Doubling .............................................................................................. 1 Elena Anagnostopoulou Clitic Doubling and Object Agreement ............................................................................... 11 Klaus von Heusinger, Diego Romero & Georg A. Kaiser Differential Object Marking in Spanish ditransitive constructions. An empirical approach ........................................................................................................ 43 Mihaela Marchis Moreno & Carolina Petersen On locality effects in Romance: the role of clitic doubling ................................................ -
Automatic Discovery of Adposition Typology
Automatic Discovery of Adposition Typology Rishiraj Saha Roy Rahul Katare∗ IIT Kharagpur IIT Kharagpur Kharagpur, India – 721302. Kharagpur, India – 721302. [email protected] [email protected] Niloy Ganguly Monojit Choudhury IIT Kharagpur Microsoft Research India Kharagpur, India – 721302. Bangalore, India – 560001. [email protected] [email protected] Abstract Natural languages (NL) can be classified as prepositional or postpositional based on the order of the noun phrase and the adposition. Categorizing a language by its adposition typology helps in addressing several challenges in linguistics and natural language processing (NLP). Understand- ing the adposition typologies for less-studied languages by manual analysis of large text corpora can be quite expensive, yet automatic discovery of the same has received very little attention till date. This research presents a simple unsupervised technique to automatically predict the adpo- sition typology for a language. Most of the function words of a language are adpositions, and we show that function words can be effectively separated from content words by leveraging differ- ences in their distributional properties in a corpus. Using this principle, we show that languages can be classified as prepositional or postpositional based on the rank correlations derived from entropies of word co-occurrence distributions. Our claims are substantiated through experiments on 23 languages from ten diverse families, 19 of which are correctly classified by our technique. 1 Introduction Adpositions form a subcategory of function words that combine with noun phrases to denote their se- mantic or grammatical relationships with verbs, and sometimes other noun phrases. NLs can be neatly divided into a few basic typologies based on the order of the noun phrase and its adposition. -
ONLINE APPENDIX: Not for Print Publication
ONLINE APPENDIX: not for print publication A Additional Tables and Figures Figure A1: Permutation Tests Panel A: Female Labor Force Participation Panel B: Gender Difference in Labor Force Participation A1 Table A1: Cross-Country Regressions of LFP Ratio Dependent variable: LFPratio Specification: OLS OLS OLS (1) (2) (3) Proportion speaking gender language -0.16 -0.25 -0.18 (0.03) (0.04) (0.04) [p < 0:001] [p < 0:001] [p < 0:001] Continent Fixed Effects No Yes Yes Country-Level Geography Controls No No Yes Observations 178 178 178 R2 0.13 0.37 0.44 Robust standard errors are clustered by the most widely spoken language in all specifications; they are reported in parentheses. P-values are reported in square brackets. LFPratio is the ratio of the percentage of women in the labor force, mea- sured in 2011, to the percentage of men in the labor force. Geography controls are the percentage of land area in the tropics or subtropics, average yearly precipitation, average temperature, an indicator for being landlocked, and the Alesina et al. (2013) measure of suitability for the plough. A2 Table A2: Cross-Country Regressions of LFP | Including \Bad" Controls Dependent variable: LFPf LFPf - LFPm Specification: OLS OLS (1) (2) Proportion speaking gender language -6.66 -10.42 (2.80) (2.84) [p < 0:001] [p < 0:001] Continent Fixed Effects Yes Yes Country-Level Geography Controls Yes Yes Observations 176 176 R2 0.57 0.68 Robust standard errors are clustered by the most widely spoken language in all specifications; they are reported in parentheses. -
Bachelor Thesis
2007:085 BACHELOR THESIS A comparison of the morphological typology of Swedish and English noun phrases Emma Andersson Palola Luleå University of Technology Bachelor thesis English Department of Language and Culture 2007:085 - ISSN: 1402-1773 - ISRN: LTU-CUPP--07/085--SE A comparison of the morphological typology of Swedish and English noun phrases Emma Andersson Palola English 3 for Teachers Supervisor: Marie Nordlund Abstract The purpose of this study has been to examine the similarities and differences of the morphological typology of English and Swedish noun phrases. The principal aim has been to analyse the synthetic and analytic properties that English and Swedish possess and compare them to each other. The secondary aim has been to analyse and compare their agglutinative and fusional properties. According to their predominant characteristics, the languages have been positioned on the Index of Synthesis and the Index of Fusion respectively. The results show that English is mainly an analytic language, while Swedish is mainly a synthetic language. English also shows predominantly agglutinating characteristics, while Swedish shows mainly fusional characteristics. Table of contents 1. Introduction ............................................................................................................................ 1 1.1. Aim.................................................................................................................................. 2 1.2. Method and material........................................................................................................2