Identifying Cognate Sets Across Dictionaries of Related Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
An Automatic Part-Of-Speech Tagger for Middle Low German
An automatic part-of-speech tagger for Middle Low German Mariya Koleva, Melissa Farasyn, Bart Desmet, Anne Breitbarth and Véronique Hoste Ghent University Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them. Keywords: historical linguistics, part-of-speech tagging, conditional random fields, feature selection, normalization 1. Introduction Corpora of historical texts annotated with different levels of grammatical information, such as parts of speech, (inflectional) morphology, syntactic chunks, clausal syntax, provide an important resource for studies of diachronic syntactic variation and change (e.g. Kroch et al. 2000, Rögnvaldsson & Helgadóttir 2011). They enable the automatic extraction of syntactic information from historical texts (more than is manually possible), and allow making statistically valid observations. Apart from reducing the amount of time required for data retrieval, an important advantage is that they make research testable and replicable. The Corpus of Historical Low German (CHLG) (Breitbarth et al. -
Interdisciplinary Approaches to Stratifying the Peopling of Madagascar
INTERDISCIPLINARY APPROACHES TO STRATIFYING THE PEOPLING OF MADAGASCAR Paper submitted for the proceedings of the Indian Ocean Conference, Madison, Wisconsin 23-24th October, 2015 Roger Blench McDonald Institute for Archaeological Research University of Cambridge Correspondence to: 8, Guest Road Cambridge CB1 2AL United Kingdom Voice/ Ans (00-44)-(0)1223-560687 Mobile worldwide (00-44)-(0)7847-495590 E-mail [email protected] http://www.rogerblench.info/RBOP.htm This version: Makurdi, 1 April, 2016 1 Malagasy - Sulawesi lexical connections Roger Blench Submission version TABLE OF CONTENTS TABLE OF CONTENTS................................................................................................................................. i ACRONYMS ...................................................................................................................................................ii 1. Introduction................................................................................................................................................. 1 2. Models for the settlement of Madagascar ................................................................................................. 2 3. Linguistic evidence...................................................................................................................................... 2 3.1 Overview 2 3.2 Connections with Sulawesi languages 3 3.2.1 Nouns.............................................................................................................................................. -
CHAPTER SEVENTEEN History of the German Language 1 Indo
CHAPTER SEVENTEEN History of the German Language 1 Indo-European and Germanic Background Indo-European Background It has already been mentioned in this course that German and English are related languages. Two languages can be related to each other in much the same way that two people can be related to each other. If two people share a common ancestor, say their mother or their great-grandfather, then they are genetically related. Similarly, German and English are genetically related because they share a common ancestor, a language which was spoken in what is now northern Germany sometime before the Angles and the Saxons migrated to England. We do not have written records of this language, unfortunately, but we have a good idea of what it must have looked and sounded like. We have arrived at our conclusions as to what it looked and sounded like by comparing the sounds of words and morphemes in earlier written stages of English and German (and Dutch) and in modern-day English and German dialects. As a result of the comparisons we are able to reconstruct what the original language, called a proto-language, must have been like. This particular proto-language is usually referred to as Proto-West Germanic. The method of reconstruction based on comparison is called the comparative method. If faced with two languages the comparative method can tell us one of three things: 1) the two languages are related in that both are descended from a common ancestor, e.g. German and English, 2) the two are related in that one is the ancestor of the other, e.g. -
The Germanic Third Weak Class
THE GERMANIC THIRD WEAK CLASS JAY H. JASANOFF Harvard University Germanic verbs of the 3rd weak class form presents characterized by an alternation between predesinential *ai (e.g. Go. 3sg. habai1» and *a (e.g. Ipl. habam). These verbs are usually compared with the 'e-verbs' of Italic and Balto Slavic, but no IE present built on the stative suffix *-e- will account phonologi cally for the form of the suffix in Germanic. Instead, it can be shown that the characteristic Germanic paradigm results from the 'activization' of an older middle paradigm in which a 3sg. in *-ai « IE *-oi; cf. Skt. duh~ 'milks') was further suffixed by the productive active ending *-Pi « IE *-ti). 1. The inflection of the third class of weak verbs (exemplified by the verb 'to have': Gothic haban, Old Norse hafa,! Old High German hab~, Old Saxon hebbian, Old English habban) presents one of the classic problems in the historical morphology of Germanic. Not only do the verbs of this class show peculiarities in all the older Germanic languages, but they differ remarkably in their conjugation from one language to another, so that it is not at all obvious how the Common Germanic paradigm should be reconstructed. Given this diversity of forms, we will do well to begin with a short review of the morphological facts themselves. The situation is simplest in Old High German. The entire conjugation of hab~n is athematic (to the extent that this term still has any meaning), and is based on the single stem hab~-: 1sg. hab~, 3sg. hab~t, 3pl. -
Teaching English-Spanish Cognates Using the Texas 2X2 Picture Book Reading Lists
TEACHING ENGLISH-SPANISH COGNATES USING THE TEXAS 2X2 PICTURE BOOK READING LISTS JOSÉ A. MONTELONGO, ANITA C. HERNÁNDEZ, & ROBERTA J. HERTER ABSTRACT English-Spanish cognates are words that possess identical or nearly identical spellings and meanings in both English and Spanish as a result of being derived mainly from Latin and Greek. Of major importance is the fact that many of the more than 20,000 cognates in English are academic vocabulary words, terms essential for comprehending school texts. The Texas 2x2 Reading List is a list of recommended reading books for children ranging in ages from pre-school to the early primary grades. The list is published yearly by the Children’s Round Table, a division of the Texas Library Association. The books that comprise the Texas 2x2 Reading List are a rich source of vocabulary and contain many English-Spanish cognates. Teachers can use the Texas 2x2 picture books to create a cognate vocabulary lesson that can be taught as a companion to a picture book read- aloud. The purpose of this paper is to present some of the different types of cognate vocabulary lessons that may be created to accompany a picture book read-aloud. The lessons are based on the morphological and spelling regularities between English and Spanish cognates and can be used to teach students how to convert words from one language to another. Examples of the different types of regularities and the Texas 2x2 books that contain them are included, as is an example cognate vocabulary lesson plan to accompany the picture book, Oddrey (Whamond, 2012). -
Cognate Words in Mehri and Hadhrami Arabic
Cognate Words in Mehri and Hadhrami Arabic Hassan Obeid Alfadly* Khaled Awadh Bin Mukhashin** Received: 18/3/2019 Accepted: 2/5/2019 Abstract The lexicon is one important source of information to establish genealogical relations between languages. This paper is an attempt to describe the lexical similarities between Mehri and Hadhrami Arabic and to show the extent of relatedness between them, a very little explored and described topic. The researchers are native speakers of Hadhrami Arabic and they paid many field visits to the area where Mehri is spoken. They used the Swadesh list to elicit their data from more than 20 Mehri informants and from Johnston's (1987) dictionary "The Mehri Lexicon and English- Mehri Word-list". The researchers employed lexicostatistical techniques to analyse their data and they found out that Mehri and Hadhrmi Arabic have so many cognate words. This finding confirms Watson (2011) claims that Arabic may not have replaced all the ancient languages in the South-Western Arabian Peninsula and that dialects of Arabic in this area including Hadhrami Arabic are tinged, to a greater or lesser degree, with substrate features of the Pre- Islamic Ancient and Modern South Arabian languages. Introduction: three branches including Central Semitic, Historically speaking, the Semitic language Ethiopian and Modern south Arabian languages family from which both of Arabic and Mehri (henceforth MSAL). Though Arabic and Mehri descend belong to a larger family of languages belong to the West Semitic, Arabic descends called Afro-Asiatic or Hamito-Semitic that from the Central Semitic and Mehri from includes Semitic, Egyptian, Cushitic, Omotic, (MSAL) which consists of two branches; the Berber and Chadic (Rubin, 2010). -
A Penn-Style Treebank of Middle Low German
A Penn-style Treebank of Middle Low German Hannah Booth Joint work with Anne Breitbarth, Aaron Ecay & Melissa Farasyn Ghent University 12th December, 2019 1 / 47 Context I Diachronic parsed corpora now exist for a range of languages: I English (Taylor et al., 2003; Kroch & Taylor, 2000) I Icelandic (Wallenberg et al., 2011) I French (Martineau et al., 2010) I Portuguese (Galves et al., 2017) I Irish (Lash, 2014) I Have greatly enhanced our understanding of syntactic change: I Quantitative studies of syntactic phenomena over time I Findings which have a strong empirical basis and are (somewhat) reproducible 2 / 47 Context I Corpus of Historical Low German (‘CHLG’) I Anne Breitbarth (Gent) I Sheila Watts (Cambridge) I George Walkden (Konstanz) I Parsed corpus spanning: I Old Low German/Old Saxon (c.800-1050) I Middle Low German (c.1250-1600) I OLG component already available: HeliPaD (Walkden, 2016) I 46,067 words I Heliand text I MLG component currently under development 3 / 47 What is Middle Low German? I MLG = West Germanic scribal dialects in Northern Germany and North-Eastern Netherlands 4 / 47 What is Middle Low German? I The rise and fall of (written) Low German I Pre-800: pre-historical I c.800-1050: Old Low German/Old Saxon I c.1050-1250 Attestation gap (Latin) I c.1250-1370: Early MLG I c.1370-1520: ‘Classical MLG’ (Golden Age) I c.1520-1850: transition to HG as in written domain I c.1850-today: transition to HG in spoken domain 5 / 47 What is Middle Low German? I Hanseatic League: alliance between North German towns and trade outposts abroad to promote economic and diplomatic interests (13th-15th centuries) 6 / 47 What is Middle Low German? I LG served as lingua franca for supraregional communication I High prestige across North Sea and Baltic regions I Associated with trade and economic prosperity I Linguistic legacy I Huge amounts of linguistic borrowings in e.g. -
Old Frisian, an Introduction To
An Introduction to Old Frisian An Introduction to Old Frisian History, Grammar, Reader, Glossary Rolf H. Bremmer, Jr. University of Leiden John Benjamins Publishing Company Amsterdam / Philadelphia TM The paper used in this publication meets the minimum requirements of 8 American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984. Library of Congress Cataloging-in-Publication Data Bremmer, Rolf H. (Rolf Hendrik), 1950- An introduction to Old Frisian : history, grammar, reader, glossary / Rolf H. Bremmer, Jr. p. cm. Includes bibliographical references and index. 1. Frisian language--To 1500--Grammar. 2. Frisian language--To 1500--History. 3. Frisian language--To 1550--Texts. I. Title. PF1421.B74 2009 439’.2--dc22 2008045390 isbn 978 90 272 3255 7 (Hb; alk. paper) isbn 978 90 272 3256 4 (Pb; alk. paper) © 2009 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. · P.O. Box 36224 · 1020 me Amsterdam · The Netherlands John Benjamins North America · P.O. Box 27519 · Philadelphia pa 19118-0519 · usa Table of contents Preface ix chapter i History: The when, where and what of Old Frisian 1 The Frisians. A short history (§§1–8); Texts and manuscripts (§§9–14); Language (§§15–18); The scope of Old Frisian studies (§§19–21) chapter ii Phonology: The sounds of Old Frisian 21 A. Introductory remarks (§§22–27): Spelling and pronunciation (§§22–23); Axioms and method (§§24–25); West Germanic vowel inventory (§26); A common West Germanic sound-change: gemination (§27) B. -
Comparing the Cognate Effect in Spoken and Written Second Language Word Production
1 Short title: Cognate effect in spoken and written word production Comparing the cognate effect in spoken and written second language word production 1 2 1 Merel Muylle , Eva Van Assche & Robert J. Hartsuiker 1Department of Experimental Psychology, Ghent University, Ghent, Belgium 2 Thomas More University of Applied Sciences, Antwerp, Belgium Address for correspondence: Merel Muylle Department of Experimental Psychology Ghent University Henri Dunantlaan 2 B-9000 Gent (Belgium) E-mail: [email protected] 2 Abstract Cognates – words that share form and meaning between languages – are processed faster than control words. However, it is unclear whether this effect is merely lexical (i.e., central) in nature, or whether it cascades to phonological/orthographic (i.e., peripheral) processes. This study compared the cognate effect in spoken and typewritten production, which share central, but not peripheral processes. We inquired whether this effect is present in typewriting, and if so, whether its magnitude is similar to spoken production. Dutch-English bilinguals performed either a spoken or written picture naming task in English; picture names were either Dutch-English cognates or control words. Cognates were named faster than controls and there was no cognate-by-modality interaction. Additionally, there was a similar error pattern in both modalities. These results suggest that common underlying processes are responsible for the cognate effect in spoken and written language production, and thus a central locus of the cognate effect. Keywords: bilingualism, word production, cognate effect, writing 3 Converging evidence suggests that bilinguals activate both their mother tongue (L1) and their second language (L2) simultaneously when processing linguistic information (e.g., Dijkstra & Van Heuven, 2002; Van Hell & Dijkstra, 2002). -
1 Roger Schwarzschild Rutgers University 18 Seminary Place New
Roger Schwarzschild Rutgers University 18 Seminary Place New Brunswick, NJ 08904 [email protected] to appear in: Recherches Linguistiques de Vincennes November 30, 2004 ABSTRACT In some languages, measure phrases can appear with non-compared adjectives: 5 feet tall. I address three questions about this construction: (a) Is the measure phrase an argument of the adjective or an adjunct? (b) What are we to make of the markedness of this construction *142lbs heavy? (c) Why is it that the markedness disappears once the adjective is put in the comparative (2 inches taller alongside 2lbs heavier)? I claim that because degree arguments are ‘functional’, the measure phrase has to be an adjunct and not a syntactic argument of the adjective. Like event modifiers in extended NP’s and in VPs, the measure phrase predicates of a degree argument of the adjective. But given the kind of meaning a measure phrase must have to do its job in comparatives and elsewhere, it is not of the right type to directly predicate of a degree argument. I propose a lexically governed type-shift which applies to some adjectives allowing them to combine with a measure phrase. KEY WORDS adjective, measure phrase, degree, functional category, lexical, adjunct, argument, antonymy. Measure Phrases as Modifiers of Adjectives1 1. Introduction There is a widely accepted account of expressions like five feet tall according to which the adjective tall denotes a relation between individuals and degrees of height and the measure phrase, five feet, serves as an argument of the adjective, saturating the degree-place in the relation2. -
Palatalization/Velar Softening: What It Is and What It Tells Us About the Nature of Language Morris Halle
Palatalization/Velar Softening: What It Is and What It Tells Us about the Nature of Language Morris Halle This study proposes an account both of the consonantal changes in- volved in palatalization/velar softening and of the fact that this change is encountered before front vowels. The change is a straightforward case of feature assimilation provided that segments/phonemes are viewed as complexes of features organized into the ‘‘bottle brush’’ model illustrated in example (4) and elsewhere in the text, and that the universal set of features includes, in addition to the familiar binary features, six unary features, which specify the designated articulator(s) for every segment (not only for consonants). Keywords: phonetics, palatalization, velar softening, features, articula- tors For Ken Stevens, friend and colleague, on his 80th birthday My purpose in this study is to present an account of the very common alternation between dorsal and coronal consonants often referred to as palatalization or velar softening. This alternation, exemplified by English electri[k] ϳ electri[s]ity, occurs most often before front vowels. In spite of its extremely common occurrence in the languages of the world, to this time there has been no proper account of palatalization that would relate it to the other properties of language, in particular, to the fact that it is found most commonly before front vowels. In the course of working on these remarks, it became clear to me that palatalization raises numerous theoretical questions about which there is at present no agreement among phonologists. Since these include matters of the most fundamental importance for phonology, I start by exten- sively discussing the main issues and the views on them that seem to me most persuasive at this time (section 1). -
Jicaque As a Hokan Language Author(S): Joseph H
Jicaque as a Hokan Language Author(s): Joseph H. Greenberg and Morris Swadesh Source: International Journal of American Linguistics, Vol. 19, No. 3 (Jul., 1953), pp. 216- 222 Published by: The University of Chicago Press Stable URL: http://www.jstor.org/stable/1263010 Accessed: 11-07-2017 15:04 UTC REFERENCES Linked references are available on JSTOR for this article: http://www.jstor.org/stable/1263010?seq=1&cid=pdf-reference#references_tab_contents You may need to log in to JSTOR to access the linked references. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://about.jstor.org/terms The University of Chicago Press is collaborating with JSTOR to digitize, preserve and extend access to International Journal of American Linguistics This content downloaded from 12.14.13.130 on Tue, 11 Jul 2017 15:04:26 UTC All use subject to http://about.jstor.org/terms JICAQUE AS A HOKAN LANGUAGE JOSEPH H. GREENBERG AND MORRIS SWADESH COLUMBIA UNIVERSITY 1. The problem 2. The phonological equivalences in Hokan 2. Phonological note have been largely established by Edward 3. Cognate list Sapir's work.3 The Jicaque agreements are 4. Use of lexical statistics generally obvious. A special point is that 5.