Linguistic Diversity: Empirical Perspectives
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Chapter One Phonetic Change
CHAPTERONE PHONETICCHANGE The investigation of the nature and the types of changes that affect the sounds of a language is the most highly developed area of the study of language change. The term sound change is used to refer, in the broadest sense, to alterations in the phonetic shape of segments and suprasegmental features that result from the operation of phonological process es. The pho- netic makeup of given morphemes or words or sets of morphemes or words also may undergo change as a by-product of alterations in the grammatical patterns of a language. Sound change is used generally to refer only to those phonetic changes that affect all occurrences of a given sound or class of sounds (like the class of voiceless stops) under specifiable phonetic conditions . It is important to distinguish between the use of the term sound change as it refers tophonetic process es in a historical context , on the one hand, and as it refers to phonetic corre- spondences on the other. By phonetic process es we refer to the replacement of a sound or a sequenceof sounds presenting some articulatory difficulty by another sound or sequence lacking that difficulty . A phonetic correspondence can be said to exist between a sound at one point in the history of a language and the sound that is its direct descendent at any subsequent point in the history of that language. A phonetic correspondence often reflects the results of several phonetic process es that have affected a segment serially . Although phonetic process es are synchronic phenomena, they often have diachronic consequences. -
(1'9'6'8'""159-83 and in Later Publica- (1972 539
FINAL WEAKENING AND RELATED PHENOMENA1 Hans Henrich Hock University of Illinois at Urbana-Champaign 1: Final devoicing (FD) 1.1. In generative phonology, it is a generally accepted doctrine that, since word-final devoicing (WFD) is a very common and natural phenomenon, the ob- verse phenomenon, namely word-final voicin~ should not be found in natural language. Compare for instance Postal 1968 184 ('in the context----'~ the rules always devoice rather than voice'), Stampe 1969 443-5 (final devoicing comes about as the result of a failure t~ suppress the (innate) process of final devoicing), Vennemann 1972 240-1 (final voicing, defined as a process increasing the complexity of affected segment~ 'does not occur.')o 1.2 One of the standard examples for WFD is that of German, cf. Bund Bunde [bUnt] [bUndeJ. However Vennemann (1'9'6'8'""159-83 and in later publica- tions) and, following him, Hooper (1972 539) and Hyman (1975 142) have convincingly demonstrated that in Ger- man, this process applies not only word-finally, but also syllable-finally, as in radle [ra·t$le]3 'go by bike' (in some varieties of German). The standard view thus must be modified so as to recognize at least one other process, namely syllable-final devoicing (SFD). (For a different eA""Planation of this phenomenon compare section 2.3 below.) 2· Final voicing (or tenseness neutralization) 2 1 A more important argument against the stan- dard view, however, is that, as anyone with any train- ing in Indo-European linguistics can readily tell, there is at least one 5roup, namely Italic, where there is evidence for the allegedly impossible final voicing, cf PIE *siyet > OLat. -
Arxiv:1908.07448V1
Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing Milan Straka and Jana Strakova´ and Jan Hajicˇ Charles University Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics {strakova,straka,hajic}@ufal.mff.cuni.cz Abstract Shared Task (Zeman et al., 2018). • We report our best results on UD 2.3. The We present an extensive evaluation of three addition of contextualized embeddings im- recently proposed methods for contextualized 25% embeddings on 89 corpora in 54 languages provements range from relative error re- of the Universal Dependencies 2.3 in three duction for English treebanks, through 20% tasks: POS tagging, lemmatization, and de- relative error reduction for high resource lan- pendency parsing. Employing the BERT, guages, to 10% relative error reduction for all Flair and ELMo as pretrained embedding in- UD 2.3 languages which have a training set. puts in a strong baseline of UDPipe 2.0, one of the best-performing systems of the 2 Related Work CoNLL 2018 Shared Task and an overall win- ner of the EPE 2018, we present a one-to- A new type of deep contextualized word repre- one comparison of the three contextualized sentation was introduced by Peters et al. (2018). word embedding methods, as well as a com- The proposed embeddings, called ELMo, were ob- parison with word2vec-like pretrained em- tained from internal states of deep bidirectional beddings and with end-to-end character-level word embeddings. We report state-of-the-art language model, pretrained on a large text corpus. results in all three tasks as compared to results Akbik et al. -
Contextualizing Historical Lexicology the State of the Art of Etymological Research Within Linguistics
Contextualizing historical lexicology The state of the art of etymological research within linguistics University of Helsinki, May 15–17, 2017 Abstracts Organized by the project “Inherited and borrowed in the history of the Uralic languages” (funded by Kone Foundation) Contents I. Keynote lectures ................................................................................. 5 Martin Kümmel Etymological problems between Indo-Iranian and Uralic ................ 6 Johanna Nichols The interaction of word structure and lexical semantics .................. 9 Martine Vanhove Lexical typology and polysemy patterns in African languages ...... 11 II. Section papers ................................................................................. 12 Mari Aigro A diachronic study of the homophony between polar question particles and coordinators ............................................................. 13 Tommi Alho & Aleksi Mäkilähde Dating Latin loanwords in Old English: Some methodological problems ...................................................................................... 14 Gergely Antal Remarks on the shared vocabulary of Hungarian, Udmurt and Komi .................................................................................................... 15 Sofia Björklöf Areal distribution as a criterion for new internal borrowing .......... 16 Stefan Engelberg Etymology and Pidgin languages: Words of German origin in Tok Pisin ............................................................................................ 17 László -
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format Alina Wróblewska Institute of Computer Science Polish Academy of Sciences ul. Jana Kazimierza 5 01-248 Warsaw, Poland [email protected] Abstract even for languages with rich morphology and rel- atively free word order, such as Polish. The paper presents the largest Polish Depen- The supervised learning methods require gold- dency Bank in Universal Dependencies for- mat – PDBUD – with 22K trees and 352K standard training data, whose creation is a time- tokens. PDBUD builds on its previous ver- consuming and expensive process. Nevertheless, sion, i.e. the Polish UD treebank (PL-SZ), dependency treebanks have been created for many and contains all 8K PL-SZ trees. The PL- languages, in particular within the Universal De- SZ trees are checked and possibly corrected pendencies initiative (UD, Nivre et al., 2016). in the current edition of PDBUD. Further The UD leaders aim at developing a cross- 14K trees are automatically converted from linguistically consistent tree annotation schema a new version of Polish Dependency Bank. and at building a large multilingual collection of The PDBUD trees are expanded with the en- hanced edges encoding the shared dependents dependency treebanks annotated according to this and the shared governors of the coordinated schema. conjuncts and with the semantic roles of some Polish is also represented in the Universal dependents. The conducted evaluation exper- Dependencies collection. There are two Polish iments show that PDBUD is large enough treebanks in UD: the Polish UD treebank (PL- for training a high-quality graph-based depen- SZ) converted from Składnica zalezno˙ sciowa´ 1 and dency parser for Polish. -
Universal Dependencies According to BERT: Both More Specific and More General
Universal Dependencies According to BERT: Both More Specific and More General Tomasz Limisiewicz and David Marecekˇ and Rudolf Rosa Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics Charles University, Prague, Czech Republic {limisiewicz,rosa,marecek}@ufal.mff.cuni.cz Abstract former based systems particular heads tend to cap- ture specific dependency relation types (e.g. in one This work focuses on analyzing the form and extent of syntactic abstraction captured by head the attention at the predicate is usually focused BERT by extracting labeled dependency trees on the nominal subject). from self-attentions. We extend understanding of syntax in BERT by Previous work showed that individual BERT examining the ways in which it systematically di- heads tend to encode particular dependency verges from standard annotation (UD). We attempt relation types. We extend these findings by to bridge the gap between them in three ways: explicitly comparing BERT relations to Uni- • We modify the UD annotation of three lin- versal Dependencies (UD) annotations, show- ing that they often do not match one-to-one. guistic phenomena to better match the BERT We suggest a method for relation identification syntax (x3) and syntactic tree construction. Our approach • We introduce a head ensemble method, com- produces significantly more consistent depen- dency trees than previous work, showing that bining multiple heads which capture the same it better explains the syntactic abstractions in dependency relation label (x4) BERT. • We observe and analyze multipurpose heads, At the same time, it can be successfully ap- containing multiple syntactic functions (x7) plied with only a minimal amount of supervi- sion and generalizes well across languages. -
The Phonemic − Syllabic Comparisons of Standard Malay and Palembang Malay Using a Historical Linguistic Perspective
Passage2013, 1(2), 35-44 The Phonemic − Syllabic Comparisons of Standard Malay and Palembang Malay Using a Historical Linguistic Perspective By: Novita Arsillah English Language and Literature Program (E-mail: [email protected] / mobile: +6281312203723) ABSTRACT This study is a historical linguistic investigation entitled The Phonemic − Syllabic Comparisons of Standard Malay and Palembang Malay Using a Historical Linguistic Perspective which aims to explore the types of sound changes found in Palembang Malay. The investigation uses a historical linguistic comparative method to compare the phonemic and syllabic changes between an ancestral language Standard Malay and its decent language Palembang Malay. Standard Malay refers to the Wilkinson dictionary in 1904. The participants of this study are seven native speakers of Palembang Malay whose ages range from 20 to 40 years old. The data were collected from the voices of the participants that were recorded along group conversations and interviews. This study applies the theoretical framework of sound changes which proposed by Terry Crowley in 1997 and Lily Campbell in 1999. The findings show that there are nine types of sound changes that were found as the results, namely assimilation (42.35%), lenition (20%), sound addition (3.53%), metathesis (1.18%), dissimilation (1.76%), abnormal sound changes (3.53%), split (13.53%), vowel rising (10.59%), and monophthongisation (3.53%). Keywords: Historical linguistics, standard Malay, Palembang Malay, comparative method, sound change, phoneme, syllable. 35 Novita Arsillah The Phonemic − Syllabic Comparisons of Standard Malay and Palembang Malay Using a Historical Linguistic Perspective INTRODUCTION spelling system in this study since it is considered to be the first Malay spelling This study classifies into the system that is used widely in Malaya, field of historical linguistics that Singapore, and Brunei (Omar, 1989). -
Lecture 5 Sound Change
An articulatory theory of sound change An articulatory theory of sound change Hypothesis: Most common initial motivation for sound change is the automation of production. Tokens reduced online, are perceived as reduced and represented in the exemplar cluster as reduced. Therefore we expect sound changes to reflect a decrease in gestural magnitude and an increase in gestural overlap. What are some ways to test the articulatory model? The theory makes predictions about what is a possible sound change. These predictions could be tested on a cross-linguistic database. Sound changes that take place in the languages of the world are very similar (Blevins 2004, Bateman 2000, Hajek 1997, Greenberg et al. 1978). We should consider both common and rare changes and try to explain both. Common and rare changes might have different characteristics. Among the properties we could look for are types of phonetic motivation, types of lexical diffusion, gradualness, conditioning environment and resulting segments. Common vs. rare sound change? We need a database that allows us to test hypotheses concerning what types of changes are common and what types are not. A database of sound changes? Most sound changes have occurred in undocumented periods so that we have no record of them. Even in cases with written records, the phonetic interpretation may be unclear. Only a small number of languages have historic records. So any sample of known sound changes would be biased towards those languages. A database of sound changes? Sound changes are known only for some languages of the world: Languages with written histories. Sound changes can be reconstructed by comparing related languages. -
Using Coarticulation Rules in Automatic Phonetic Transcription
Using Coarticulation Rules in Automatic Phonetic Transcription Arlindo Veiga 1, Sara Candeias 1, Luís Sá 1,2 , Fernando Perdigão 1,2 1 Instituto de Telecomunicações, Pólo de Coimbra 2 Universidade de Coimbra – DEEC, Pólo II, P-3030-290 Coimbra, Portugal {aveiga, saracandeias, luis, fp}@co.it.pt Abstract. Phonetic transcription of continuous speech databases is an important and necessary task for speech synthesis and recognition. Manual transcription requires well-trained experts and is time-consuming. So, it becomes more and more usual to implement automatic or semi-automatic transcription procedures. However, it is well known that pronunciation variations and coarticulation phenomena involve serious problems for automatic transcribers. This paper describes an approach to automatically label a continuous speech database at phonetic level using orthographic transcription (word level) and a set of pre- built acoustic models to deal with pronunciation variations. These pronunciation variations arise from coarticulation phenomena. Common intra- word and trans-word coarticulation phenomena will be described as well. Keywords: Automatic phonetic transcription, Coarticulation, Pronunciation variation 1 Introduction Careful manual transcription of a speech database produces best results comparing to automatic transcriptions. However, it requires a lot of resources and is usually a forbidden option to transcribe a large corpus. In [1] the authors refer more than one hundred times real-time to do manual transcriptions. Automatic or semi-automatic methods are often used to perform this task with accuracies that tend to manual transcriptions after several iterations. However, particularly in continuous speech, multi-pronunciations and coarticulation must be considered in the automatic transcriptions. Most of the work in this area, use forced alignment to deal with multi- pronunciation. -
ELEGY 397 ( ); K. Strecker, “Leoninische Hexameter Und
ELEGY 397 (); K. Strecker, “Leoninische Hexameter und speare, Two Gentlemen of Verona) can be recommended Pentameter in . Jahrhundert,” Neues Archiv für ältere to a would-be seducer. T is composite understanding deutsche Geschichtskunde (); C. M. Bowra, Early of the genre, however, is never fully worked out and Greek Elegists, d ed. (); P. F r i e d l ä n d e r , Epigram- gradually fades. mata: Greek Inscriptions in Verse from the Beginnings to T e most important cl. models for the later devel. of the Persian Wars ( ); M. Platnauer, Latin Elegiac Verse elegy are *pastoral: the lament for Daphnis (who died (); L. P. Wilkinson, Golden Latin Artistry ( ); of love) by T eocritus, the elegy for Adonis attributed T. G. Rosenmeyer, “Elegiac and Elegos,” California to Bion, the elegy on Bion attributed to Moschus, and Studies in Classical Antiquity (); D. Ross, Style another lament for Daphnis in the fi fth *eclogue of and Tradition in Catullus ( ); M. L. West, Studies in Virgil. All are stylized and mythic, with hints of ritual; Greek Elegy and Iambus ( ); A.W.H. Adkins, Poetic the fi rst three are punctuated by incantatory *refrains. Craft in the Early Greek Elegists ( ); R. M. Marina T e elegies on Daphnis are staged performances within Sáez, La métrica de los epigramas de Marcial (). an otherwise casual setting. Nonhuman elements of the T.V.F. BROGAN; A. T. COLE pastoral world are enlisted in the mourning: nymphs, satyrs, the landscape itself. In Virgil, the song of grief is E L E G I A C S T A N Z A , elegiac quatrain, heroic qua- paired with one celebrating the dead man’s apotheosis; train. -
Morpho-Phonology and the Greek Glides
Morpho-phonology and the Greek glides In the Greek phonological literature there have been many attempts to uncover the factors regulating hiatus (i.e. vocalic sequences) and synizesis (i.e. glide formation) in Modern Greek (Kazazis 1968, Warburton 1976, Setatos 1974, Nyman 1981, Deligiorgis 1987, Malikouti-Drachman & Drachman 1990). There is a cross-linguistic tendency to avoid hiatus, as shown for example in Casali (1997), but in Modern Greek, hiatus is avoided only in some cases. These two processes, hiatus and synizesis, are seen as two opposing forces in Greek phonology yielding either words with vocalic sequences such as [io] ~ [iu], [ia] in (2) or words where a vowel turns into a glide to avoid hiatus as in [i] ~ [ju], [ja] in (1). Synizesis is also compounded by strengthening or hardening in some cases (1a-d) resulting in the fricativization of the glides. While both classes of words in (1) and (2) involve neuter nouns, it is only the former class that involves [i]~[j] alternations between the nominative singular form and the rest. The data below thus give us two types of contrast: i) an [i]~[j] alternation between the nominative singular and the rest in (1) and ii) a [j]~[i] contrast between the genitive and plural forms of (1) versus those of (2). In the latter class hiatus is tolerated. NOM. SING GEN. SING NOM. PLURAL GLOSS (1) a. po. ði po. ðʝú po. ðʝa foot b. ðo.ka.ri ðo.ka.rʝú ðo.ka.rʝa girder c. ko.lo.ci.θi ko.lo.ci.θç ú ko.lo.ci.θça pumpkin d. -
Trees, Waves and Linkages: Models of Language Diversification Alexandre François
Trees, Waves and Linkages: Models of Language Diversification Alexandre François To cite this version: Alexandre François. Trees, Waves and Linkages: Models of Language Diversification. Bowern, Claire; Evans, Bethwyn. The Routledge Handbook of Historical Linguistics, Routledge, pp.161-189, 2014, 978-0-41552-789-7. 10.4324/9781315794013.ch6. halshs-00874726v2 HAL Id: halshs-00874726 https://halshs.archives-ouvertes.fr/halshs-00874726v2 Submitted on 19 Mar 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Routledge Handbook of Historical Linguistics Edited by Claire Bowern and Bethwyn Evans RH of Historical Linguistics BOOK.indb iii 3/26/2014 1:20:21 PM 6 Trees, waves and linkages Models of language diversifi cation Alexandre François 1 On the diversifi cation of languages 1.1 Language extinction, language emergence The number of languages spoken on the planet has oscillated up and down throughout the history of mankind.1 Different social factors operate in opposite ways, some resulting in the decrease of language diversity, others favouring the emergence of new languages. Thus, languages fade away and disappear when their speakers undergo some pressure towards abandoning their heritage language and replacing it in all contexts with a new language that is in some way more socially prominent (Simpson, this volume).