Complete Paper
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Syntactic Transformations for Swiss German Dialects
Syntactic transformations for Swiss German dialects Yves Scherrer LATL Universite´ de Geneve` Geneva, Switzerland [email protected] Abstract These transformations are accomplished by a set of hand-crafted rules, developed and evaluated on While most dialectological research so far fo- the basis of the dependency version of the Standard cuses on phonetic and lexical phenomena, we German TIGER treebank. Ultimately, the rule set use recent fieldwork in the domain of dia- lect syntax to guide the development of mul- can be used either as a tool for treebank transduction tidialectal natural language processing tools. (i.e. deriving Swiss German treebanks from Stan- In particular, we develop a set of rules that dard German ones), or as the syntactic transfer mod- transform Standard German sentence struc- ule of a transfer-based machine translation system. tures into syntactically valid Swiss German After the discussion of related work (Section 2), sentence structures. These rules are sensitive we present the major syntactic differences between to the dialect area, so that the dialects of more Standard German and Swiss German dialects (Sec- than 300 towns are covered. We evaluate the tion 3). We then show how these differences can transformation rules on a Standard German treebank and obtain accuracy figures of 85% be covered by a set of transformation rules that ap- and above for most rules. We analyze the most ply to syntactically annotated Standard German text, frequent errors and discuss the benefit of these such as found in treebanks (Section 4). In Section transformations for various natural language 5, we give some coverage figures and discuss the processing tasks. -
Searching for the Origins of Uruguayan Fronterizo Dialects: Radical Code-Mixing As “Fluent Dysfluency”
Searching for the origins of Uruguayan Fronterizo dialects: radical code-mixing as “fluent dysfluency” JOHN M. LIPSKI Abstract Spoken in northern Uruguay along the border with Brazil are intertwined Spanish-Portuguese dialects known to linguists as Fronterizo `border’ dialects, and to the speakers themselves as portuñol. Since until the second half of the 19th century northern Uruguay was populated principally by monolingual Portuguese speakers, it is usually assumed that Fronterizo arose when Spanish-speaking settlers arrived in large numbers. Left unexplained, however, is the genesis of morphosyntactically intertwined language, rather than, e.g. Spanish with many Portuguese borrowings or vice versa. The present study analyzes data from several communities along the Brazilian border (in Argentina, Bolivia, and Paraguay), where Portuguese is spoken frequently but dysfluently (with much involuntary mixing of Spanish) by Spanish speakers in their dealings with Brazilians. A componential analysis of mixed language from these communities is compared with Uruguayan Fronterizo data, and a high degree of quantitative structural similarity is demonstrated. The inclusion of sociohistorical data from late 19th century northern Uruguay complements the contemporary Spanish-Portuguese mixing examples, in support of the claim that Uruguayan Fronterizo was formed not in a situation of balanced bilingualism but rather as the result of the sort of fluid but dysfluent approximations to a second language found in contemporary border communities. 1. Introduction Among the languages of the world, mixed or intertwined languages are quite rare, and have provoked considerable debate among linguists. The most well- known cases, Michif, combining French and Cree (Bakker and Papen 1997; Bakker 1996), and Media Lengua, combining Spanish and Quechua (Muysken 1981, 1989, 1997), rather systematically juxtapose lexical items from a European language and functional items from a Native American language, in Journal of Portuguese Linguistics, 8-1 (2009), 3-44 ISSN 1645-4537 4 John M. -
The Germanic Heldenlied and the Poetic Edda: Speculations on Preliterary History
Oral Tradition, 19/1 (2004): 43-62 The Germanic Heldenlied and the Poetic Edda: Speculations on Preliterary History Edward R. Haymes One of the proudest inventions of German scholarship in the nineteenth century was the Heldenlied, the heroic song, which was seen by scholars as the main conduit of Germanic heroic legend from the Period of Migrations to the time of their being written down in the Middle Ages. The concept stems indirectly from the suggestions of several eighteenth-century Homeric scholars that since the Homeric poems were much too long to have been memorized and performed in oral tradition, they must have existed as shorter, episodic songs. Friedrich August Wolf’s well-known Prolegomena ad Homerum (1795) collected evidence for the idea that writing was not used for poetry until long after Homer’s time. He argued for a thorough recension of the poem under (or perhaps by) Pisistratus in the sixth century BCE as the first comprehensive written Homer. These ideas were almost immediately applied to the Middle High German Nibelungenlied by Karl Lachmann (1816), who was trained as a classical philologist and indeed continued to contribute in that area at the same time that he was one of the most influential members of the generation that founded the new discipline of Germanistik. On the basis of rough spots and contradictions (not only Homer nods!) Lachmann thought he could recognize twenty separate Lieder in the Middle High German epic. At the same time that Lachmann was deconstructing the German medieval epic, Elias Lönnrot was assembling the Finnish epic he called Kalevala from shorter songs in conscious imitation of the Homer (or Pisistratus) described by Wolf. -
Morphological Analysis and Lemmatization for Swiss German Using Weighted Transducers
Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016) Morphological analysis and lemmatization for Swiss German using weighted transducers Reto Baumgartner University of Zurich [email protected] Abstract haar and Wyler, 1997, p. 37). Conversely it pos- sesses infinitive particles that are not known to StG. With written Swiss German becoming SwG consists of different local dialects that more popular in everyday use, it has be- mainly differ in phonology and to a lesser extent in come a target for text processing. The ab- vocabulary. There is no standard orthography, but sence of a standard orthography and the there are proposals for sound-character assignment variety of dialects, however, lead to a vast like Dieth-Schreibung (Dieth, 1986) or Bärndüt- variation in different spellings which makes schi Schrybwys (Marti, 1985a) that are, however, this task difficult. We built a system based not known to everyone. This results in a high vari- on weighted transducers that recognizes ability of spellings influenced by both dialects and over 90% of the tokens in certain texts. personal writing preferences. As an example for Weights ensure preferring the best analysis the StG word Jahr “year”, we found in our corpus for most words while at the same time al- Jahr and Jaar, Johr and Joor, even Joh for different lowing for very broad range of spelling pronunciations and spelling preferences. variations. Our morphological tagset that The lack of a standard orthography and the vast- we defined for this purpose and lemmas in ness of variants motivate the choices that have to Standard German open the possibility for be made to process these dialects. -
Prioritizing African Languages: Challenges to Macro-Level Planning for Resourcing and Capacity Building
Prioritizing African Languages: Challenges to macro-level planning for resourcing and capacity building Tristan M. Purvis Christopher R. Green Gregory K. Iverson University of Maryland Center for Advanced Study of Language Abstract This paper addresses key considerations and challenges involved in the process of prioritizing languages in an area of high linguistic di- versity like Africa alongside other world regions. The paper identifies general considerations that must be taken into account in this process and reviews the placement of African languages on priority lists over the years and across different agencies and organizations. An outline of factors is presented that is used when organizing resources and planning research on African languages that categorizes major or crit- ical languages within a framework that allows for broad coverage of the full linguistic diversity of the continent. Keywords: language prioritization, African languages, capacity building, language diversity, language documentation When building language capacity on an individual or localized level, the question of which languages matter most is relatively less complicated than it is for those planning and providing for language capabilities at the macro level. An American anthropology student working with Sierra Leonean refugees in Forecariah, Guinea, for ex- ample, will likely know how to address and balance needs for lan- guage skills in French, Susu, Krio, and a set of other languages such as Temne and Mandinka. An education official or activist in Mwanza, Tanzania, will be concerned primarily with English, Swahili, and Su- kuma. An administrator of a grant program for Less Commonly Taught Languages, or LCTLs, or a newly appointed language authori- ty for the United States Department of Education, Department of Commerce, or U.S. -
Translating Brazil: from Transnational Periodicals to Hemispheric Fictions, 1808-2010
Translating Brazil: From Transnational Periodicals to Hemispheric Fictions, 1808-2010 By Krista Marie Brune A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Hispanic Languages and Literatures in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Natalia Brizuela, Co-chair Professor Candace Slater, Co-chair Professor Scott Saul Spring 2016 Abstract Translating Brazil: From Transnational Periodicals to Hemispheric Fictions, 1808-2010 by Krista Marie Brune Doctor of Philosophy in Hispanic Languages and Literatures University of California, Berkeley Professor Natalia Brizuela, Co-chair Professor Candace Slater, Co-chair This dissertation analyzes how travel and translation informed the construction of Brazil as modern in the 19th century, and how similar processes of transnational translation continue to shape the cultural visibility of the nation abroad in the contemporary moment. By reading journals, literary works, and cultural criticism, this study inserts Brazilian literature and culture into recent debates about translatability, world literature, and cosmopolitanism, while also underscoring the often-overlooked presence of Brazilians in the United States. The first half of the dissertation contends that Portuguese-language periodicals Correio Braziliense (London, 1808-1822), Revista Nitheroy (Paris, 1836), and O Novo Mundo (New York, 1870-1879) translated European and North American ideas of technology and education to a readership primarily in Brazil. The transnational circulation of these periodicals contributed to the self- fashioning of intellectuals who came to define the nation. To suggest parallels between Brazil and the United States in the late 19th century, the analysis of O Novo Mundo focuses on discourses of nation, modernity, and technological progress emerging in the hemispheric travels of scientists, intellectuals, and the Brazilian empire Dom Pedro II, and in the national displays at the 1876 Centennial Exhibition in Philadelphia. -
Texas Alsatian
2017 Texas Alsatian Karen A. Roesch, Ph.D. Indiana University-Purdue University Indianapolis Indianapolis, Indiana, USA IUPUI ScholarWorks This is the author’s manuscript: This is a draft of a chapter that has been accepted for publication by Oxford University Press in the forthcoming book Varieties of German Worldwide edited by Hans Boas, Anna Deumert, Mark L. Louden, & Péter Maitz (with Hyoun-A Joo, B. Richard Page, Lara Schwarz, & Nora Hellmold Vosburg) due for publication in 2016. https://scholarworks.iupui.edu Texas Alsatian, Medina County, Texas 1 Introduction: Historical background The Alsatian dialect was transported to Texas in the early 1800s, when entrepreneur Henri Castro recruited colonists from the French Alsace to comply with the Republic of Texas’ stipulations for populating one of his land grants located just west of San Antonio. Castro’s colonization efforts succeeded in bringing 2,134 German-speaking colonists from 1843 – 1847 (Jordan 2004: 45-7; Weaver 1985:109) to his land grants in Texas, which resulted in the establishment of four colonies: Castroville (1844); Quihi (1845); Vandenburg (1846); D’Hanis (1847). Castroville was the first and most successful settlement and serves as the focus of this chapter, as it constitutes the largest concentration of Alsatian speakers. This chapter provides both a descriptive account of the ancestral language, Alsatian, and more specifically as spoken today, as well as a discussion of sociolinguistic and linguistic processes (e.g., use, shift, variation, regularization, etc.) observed and documented since 2007. The casual observer might conclude that the colonists Castro brought to Texas were not German-speaking at all, but French. -
The Use of Third Person Accusative Pronouns in Spoken Brazilian Portuguese: an Analysis of Different Tv Genres
THE USE OF THIRD PERSON ACCUSATIVE PRONOUNS IN SPOKEN BRAZILIAN PORTUGUESE: AN ANALYSIS OF DIFFERENT TV GENRES by Flávia Stocco Garcia A Thesis submitted to the Faculty of Graduate Studies of The University of Manitoba in partial fulfillment of the requirements of the degree of MASTER OF ARTS Department of Linguistics University of Manitoba Winnipeg Copyright © 2015 by Flávia Stocco Garcia ABSTRACT This thesis presents an analysis of third person accusative pronouns in Brazilian Portuguese. With the aim to analyze the variation between the use of standard (prescribed by normative grammar) and non-standard pronouns found in oral language, I gathered data from three kinds of TV show (news, non-scripted and soap-opera) in order to determine which form of pronoun is more common and if there is any linguistic and/or sociolinguistic factors that will influence on their usage. Based on data collected, I demonstrate that non-standard forms are favored in general and that the rules prescribed by normative grammar involving standard forms are only followed in specific contexts. Among all the variables considered for the analysis, the ones that showed to be significant were the kind of show, the context of the utterance, the socio-economic status of the speaker and verbs in the infinitive. Considering my results, I provide a discussion regarding to which extent the distribution of the 3rd-person pronouns on TV reflect their use by Brazilians and a brief discussion of other issues related to my findings conclude this work. 2 ACKNOWLEDGEMENTS First and foremost, I would like to express my gratitude to my advisor, Verónica Loureiro-Rodríguez, for all her help during the completion of this work. -
German Dialects in Kansas and Missouri Scholarworks User Guide August, 2020
German Dialects in Kansas and Missouri ScholarWorks User Guide August, 2020 Table of Contents INTERVIEW METHODOLOGY ................................................................................................................................. 1 THE RECORDINGS ................................................................................................................................................. 2 THE SPEAKERS ...................................................................................................................................................... 2 KANSAS ....................................................................................................................................................................... 3 MISSOURI .................................................................................................................................................................... 4 THE QUESTIONNAIRES ......................................................................................................................................... 5 WENKER SENTENCES ..................................................................................................................................................... 5 KU QUESTIONNAIRE ..................................................................................................................................................... 6 REFERENCE MAPS FOR LOCATING POTENTIAL SPEAKERS ..................................................................................... -
Verb-Second Intricacies: an Investigation Into Verb Positions in English
UNIVERSITY OF PRIŠTINA FACULTY OF PHILOSOPHY Nenad Pejić Verb-Second Intricacies: An Investigation into Verb Positions in English Doctoral Dissertation Supervisor: dr Dragana Spasić Kosovska Mitrovica, 2011. УНИВЕРЗИТЕТ У ПРИШТИНИ ФИЛОЗОФСКИ ФАКУЛТЕТ Ненад Пејић Проблематика глагола као другог конституента: испитивање позиције глагола у енглеском језику Докторска дисертација Ментор: др Драгана Спасић Косовска Митровица, 2011. Table of Contents 1. Introduction ………………………………………………………………………..…..1 1.1. The Problem ……………………………………………………………………..…..1 1.2. Generative Grammar ……………………………………………………….………..3 1.3. Germanic Languages ……………………………………………………….………..9 2. Syntactic Change ……………………………………………………………….……16 2.1. Introduction …………………………………………………………………...……16 2.2. Actuation and Diffusion of Change …………………………………………..……17 2.3. The Locus of Change ………………………………………………………………21 2.4. Language Change versus Grammar Change ………………………………….……23 2.4.1. The Logical Problem of Language Change …………………………………25 2.5. The Principles and Parameters Model of Change ……………………………….…27 2.5.1. Parametric Change ……………………………………………………..……31 2.6. Mechanisms of Syntactic Change …………………………………………….……33 2.6.1 Internal Mechanisms …………………………………………………………34 2.6.1.1 Reanalysis ……………………………………………………………35 2.6.1.1 1. Grammaticalization ……………………………………..…39 2.6.1.2 Extension …………………………………………………………..…40 2.6.2 External Mechanism …………………………………………………………42 2.6.2.1 Language Contact ……………………………………………………43 2.7. Theories of Syntactic Change ………………………………………………...……44 2.7.1. Structuralist Approach ………………………………………………………45 -
Corpus Collection for Under-Resourced Languages with More Than One Million Speakers
Corpus collection for under-resourced languages with more than one million speakers Dirk Goldhahn1, Maciej Sumalvico1, Uwe Quasthoff1,2 Natural Language Processing Group, University of Leipzig, Germany Department of African Languages, University of South Africa, South Africa Email: { dgoldhahn, janicki, quasthoff, }@informatik.uni-leipzig.de Abstract For only 40 of about 350 languages with more than one million speakers, the situation concerning text resources is comfortable. For the remaining languages, the number of speakers indicates a need for both corpora and tools. This paper describes a corpus collection initiative for these languages. While random Web crawling has serious limitations, native speakers with knowledge of web pages in their language are of invaluable help. The aim is to give interested scholars, language enthusiasts the possibility to contribute to corpus creation or extension by simply entering a URL into the Web Interface. Using a Web portal URLs of interest are collected with the help of the respective communities. A standardized corpus processing chain for daily newspaper corpora creation is adapted to append newly added web pages to an increasing corpus. As a result we will be able to collect larger corpora for under-resourced languages by a community effort. These corpora will be made publicly available. Keywords: corpora, under-resourced languages, Web portal, community 1. Introduction links require the execution of JavaScript code by the There are about 350 languages with more than one million crawler which often produces errors. So, if a website speakers1. For about 40 of them, the situation concerning heavily uses JavaScript links and there are no other links text resources is comfortable: there are corpora of pointing to special pages (coming from another website, reasonable size and also tools like POS taggers adapted to for instance), then all but the main page might be these languages. -
Šiauliai University Faculty of Humanities Department of English Philology
ŠIAULIAI UNIVERSITY FACULTY OF HUMANITIES DEPARTMENT OF ENGLISH PHILOLOGY RENDERING OF GERMANIC PROPER NAMES IN THE LITHUANIAN PRESS BACHELOR THESIS Research Adviser: Assist. L.Petrulion ė Student: Aist ė Andži ūtė Šiauliai, 2010 CONTENTS INTRODUCTION......................................................................................................................3 1. THE CONCEPTION OF PROPER NAMES.........................................................................5 1.2. The development of surnames.............................................................................................6 1.3. Proper names in Germanic languages .................................................................................8 1.3.1. Danish, Norwegian, Swedish and Icelandic surnames.................................................9 1.3.2. Dutch surnames ..........................................................................................................12 1.3.3. English surnames........................................................................................................13 1.3.4. German surnames .......................................................................................................14 2. NON-LITHUANIAN SURNAMES ORTHOGRAPHY .....................................................16 2.1. The historical development of the problem.......................................................................16 2.2. The rules of transcriptions of non-Lithuanian proper names ............................................22 3. THE USAGE