The Nordic Dialect Corpus – an Advanced Research Tool
Recommended publications
-
The Influence of Old Norse on the English Language
Antonius Gerardus Maria Poppelaars HUSBANDS, OUTLAWS AND KIDS: THE INFLUENCE OF OLD NORSE ON THE ENGLISH LANGUAGE HUSBANDS, OUTLAWS E KIDS: A INFLUÊNCIA DO NÓRDICO ANTIGO NA LÍNGUA INGLESA Abstract: What have common English words such as husbands, outlaws and kids and the sentence they are weak to do with Old Norse? Yet, all these examples are from Old Norse, the Norsemen’s language. However, the Norse influence on English is underestimated as the Norsemen are viewed as barbaric, violent pirates. Also, the Norman occupation of England and the Great Vowel Shift have obscured the Old Norse influence. These topics, plus the Viking Age, the Scandinavian presence in England, as well as the Old Norse linguistic influence on English and the supposed French influence of the Norman invasion will be described. The research for this etymological article was executed through a descriptive-qualitative approach. Concluded is that the Norsemen have intensively influenced English due to their military supremacy and their abilities to adaptation. Even the French-Norman French language has left marks on English. Nowadays, English is a lingua franca, leading to borrowings from English to many languages, which is often considered as invasive. But, English itself has borrowed from other languages, maintaining its proper character. Hence, it is hoped that this article may contribute to a greater acknowledgement of the Norse influence on English and undermine the scepticism towards the English language as every language has its importance. Keywords: Old Norse Loanwords, English Language, Viking Age, Etymology. Resumo: O que têm palavras inglesas comuns como husbands, outlaws e kids e a frase they are weak a ver com os Nórdicos? Todos esses exemplos são do nórdico antigo, a língua dos escandinavos. -
Using Constraint Grammar for Treebank Retokenization
Using Constraint Grammar for Treebank Retokenization Eckhard Bick University of Southern Denmark [email protected] Abstract This paper presents a Constraint Grammar-based method for changing the tokenization of existing annotated data, establishing standard space-based ("atomic") tokenization for corpora otherwise using MWE fusion and contraction splitting for the sake of syntactic transparency or for semantic reasons. Our method preserves ingoing and outgoing dependency arcs and allows the addition of internal tags and structure for MWEs. We discuss rule examples and evaluate the method against both a Portuguese treebank and live news text annotation. 1 Introduction In an NLP framework, tokenization can be defined as the identification of the smallest meaningful lexical units in running text. […] multiple tokens, in this case allowing the second part (the article) to become part of a separate np. Tokenization is often regarded as a necessary evil best treated by a preprocessor with an abbreviation list, but has also been subject to methodological research, e.g. related to finite-state transducers (Kaplan 2005). However, there is little research into changing the tokenization of a corpus once it has been annotated, limiting the comparability and alignment of corpora, or the evaluation of parsers. The simplest solution to this problem is making conflicting systems compatible by changing them into "atomic tokenization", where all spaces are treated as token boundaries, independently of syntactic or semantic concerns. This approach is widely used in the machine-learning (ML) community, e.g. for the Universal Dependencies initiative (McDonald et al. 2013). The method described in this paper can achieve such atomic tokenization of annotated treebank data without information -
Using Danish As a CG Interlingua: a Wide-Coverage Norwegian-English Machine Translation System
Using Danish as a CG Interlingua: A Wide-Coverage Norwegian-English Machine Translation System Eckhard Bick, Institute of Language and Communication, University of Southern Denmark, Odense, Denmark [email protected] Lars Nygaard, The Text Laboratory, University of Oslo, Oslo, Norway [email protected] Abstract This paper presents a rule-based Norwegian-English MT system. Exploiting the closeness of Norwegian and Danish, and the existence of a well-performing Danish-English system, Danish is used as an «interlingua». Structural analysis and polysemy resolution are based on Constraint Grammar (CG) function tags and dependency structures. We describe the semiautomatic construction of the necessary Norwegian-Danish dictionary and evaluate the method used as well as the coverage of the lexicon. 1 Introduction Machine translation (MT) is no longer an unpractical science. Especially the advent of corpora with hundreds of millions of words and advanced machine learning techniques, bilingual electronic data and advanced machine […] running, mixed domain text. Also, some languages, like English, German and Japanese, are more equal than others, not least in a funding-heavy environment like MT. The focus of this paper will be threefold: Firstly, the system presented here is targeting one of the small, «unequal» languages, Norwegian. Secondly, the method used to create a Norwegian-English translator is resource-economical in that it uses another, very similar language, Danish, as an «interlingua» in the sense of translation knowledge recycling (Paul 2001), but with the recycling step at the SL side rather than the TL side. Thirdly, we will discuss an unusual analysis and transfer methodology based on Constraint Grammar dependency parsing. In short, we set out to construct a Norwegian-English MT system by building a smaller, Norwegian-Danish one and piping its output into an existing Danish deep parser (DanGram, Bick 2003) and an existing, robust Danish-English MT system (Dan2Eng, Bick 2006 and 2007). -
Administration of Donald J. Trump, 2020 Proclamation 10097—Leif
Administration of Donald J. Trump, 2020 Proclamation 10097—Leif Erikson Day, 2020 October 8, 2020 By the President of the United States of America A Proclamation More than 1,000 years ago, the Norse explorer and Viking Leif Erikson made landfall in modern-day Newfoundland, likely becoming the first European to discover the New World. Today, Leif Erikson represents over a millennium of shared history between the Nordic countries and the Americas and symbolizes the many contributions of Nordic Americans to our great Nation. Accomplished in the face of daunting danger and carried out in service of Judeo-Christian values, Leif Erikson's story reflects the fundamental truths about the American character. On a mission to evangelize Greenland, Leif Erikson and his crew were blown off course. They had to brave the cold waters of the northern Atlantic to find safe harbor on the North American coastline. In surviving this ordeal, these hardened Vikings tested the limits of human exploration in a way that continues to inspire us today. In 1825, six Norwegian families repeated this voyage, landing their sloop in New York Harbor in the first organized migration to the United States from Scandinavia. Like the Puritans and pilgrims before them, these people came to our Nation seeking religious freedom and safety from persecution. Now, more than 11 million Americans can trace their roots to Denmark, Finland, Iceland, Norway, and Sweden, and among them stand Nobel Laureates, Academy Award winners, and Legion of Merit recipients. Across our Nation, from the Danish villages of western Iowa to the Norwegian Ridge in Minnesota and the Finns of Michigan's Upper Peninsula, Nordic Americans have left their mark on our culture, economy, and society. -
Floresta Sinti(C)Tica : a Treebank for Portuguese
Floresta Sintá(c)tica: A treebank for Portuguese Susana Afonso, Eckhard Bick, Renato Haber, Diana Santos VISL project, University of Southern Denmark Institute of Language and Communication, Campusvej, 55, 5230 Odense M, Denmark [email protected], [email protected] SINTEF Telecom & Informatics, Pb 124, Blindern, NO-0314 Oslo, Norway [email protected],[email protected] Abstract This paper reviews the first year of the creation of a publicly available treebank for Portuguese, Floresta Sintá(c)tica, a collaboration project between the VISL and the Computational Processing of Portuguese projects. After briefly describing the main goals and the organization of the project, the creation of the annotated objects is presented in detail: preparing the text to be annotated, applying the Constraint Grammar based PALAVRAS parser, revising its output manually in a two-stage process, and carefully documenting the linguistic options. Some examples of the kind of interesting problems dealt with are presented, and the paper ends with a brief description of the tools developed, the project results so far, and a mention to a preliminary inter-annotator test and what was learned from it. Introduction: Motivation and objectives There are various good motives for creating a Portuguese treebank, one of them simply being the desire to make a new research tool available to the Portuguese language community, another the wish to establish some […] supporting 16 different languages. VISL's Portuguese system is based on the PALAVRAS parser (Bick, 2000), and has been functioning as a role model for other languages. More recently, VISL has moved to incorporate semantic research, machine translation, and corpus annotation proper. -
Norsk Ordbok - the Crown of Nynorsk Lexicography?
Lars S. Vikør, Section for Norwegian Lexicography, University of Oslo Norsk Ordbok - the Crown of Nynorsk Lexicography? Abstract Norsk Ordbok 'Norwegian Dictionary' is a multi-volume dictionary of the Norwegian standard variety Nynorsk and the Norwegian dialects. It is one of the very few dictionaries which cover both a written standard language and the oral dialects on which this standard is based. It was initiated around 1930, based on dialect material collected by volunteers and stored in a vast card archive, and on a variety of written sources. At present, three of twelve planned volumes have appeared, reaching into g. The paper gives a historical outline of the project, followed by a brief description of its structure and the types of information it gives. This is exemplified by the treatment of one particular word, bunad. Finally, some fundamental problems are briefly discussed: 1) the selection of lemmas, 2) the character of the sources, 3) the treatment of dialect forms, 4) the sequence of definitions. The full title of Norsk Ordbok is Norsk Ordbok. Ordbok over det norske folkemålet og det nynorske skriftmålet 'Norwegian Dictionary. A dictionary of the Norwegian popular language [i.e. the Norwegian dialects], and the Nynorsk written language'. This title at once indicates the dual aspect of the dictionary: It gives integrated coverage of both oral dialects and a written standard language. This dual aspect is the most special distinguishing feature of Norsk Ordbok as a lexicographic work. Normally, dictionaries cover written standard languages or some aspect of them (or, in the case of pronouncing dictionaries, oral standard language). -
Translating the Swedish Wikipedia into Danish
Translating the Swedish Wikipedia into Danish Eckhard Bick University of Southern Denmark Rugbjergvej 98, DK 8260 Viby J [email protected] Abstract. This paper presents a Swedish-Danish automatic translation system for Wikipedia articles (WikiTrans). Translated articles are indexed for both title and content, and integrated with original Danish articles where they exist. Changed or added articles in the Swedish Wikipedia are monitored and added on a daily basis. The translation approach uses a grammar-based machine translation system with a deep source-language structural analysis. Disambiguation and lexical transfer rules exploit Constraint Grammar tags and dependency links to access contextual information, such as syntactic argument function, semantic type and quantifiers. Out-of-vocabulary words are handled by derivational and compound analysis with a combined coverage of 99.3%, as well as systematic morpho-phonemic transliterations for the remaining cases. The system achieved BLEU scores of 0.65-0.8 depending on references and outperformed both STMT and RBMT competitors by a large margin. 1. Introduction The amount of information available in Wikipedia differs greatly between languages, and many topics are badly covered in small languages, with short, missing or stub-style articles. This asymmetry can be found between Scandinavian languages, too. Thus, the Swedish Wikipedia has 6 times more text than its Danish equivalent. Robot-created articles have helped […] syntactic function tags, dependency trees and a semantic classification of both nouns and named entities. 2. The Translation System (Swe2Dan) In spite of the relatedness of Swedish and Danish, a one-on-one translation is possible in less than 50% of all tokens. -
Social-Ecological Resilience in the Viking-Age to Early-Medieval Faroe Islands
City University of New York (CUNY) CUNY Academic Works All Dissertations, Theses, and Capstone Projects 9-2015 Social-Ecological Resilience in the Viking-Age to Early-Medieval Faroe Islands Seth Brewington Graduate Center, City University of New York More information about this work at: https://academicworks.cuny.edu/gc_etds/870 Discover additional works at: https://academicworks.cuny.edu This work is made publicly available by the City University of New York (CUNY). Contact: [email protected] SOCIAL-ECOLOGICAL RESILIENCE IN THE VIKING-AGE TO EARLY-MEDIEVAL FAROE ISLANDS by SETH D. BREWINGTON A dissertation submitted to the Graduate Faculty in Anthropology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York 2015 © 2015 SETH D. BREWINGTON All Rights Reserved This manuscript has been read and accepted for the Graduate Faculty in Anthropology to satisfy the dissertation requirement for the degree of Doctor of Philosophy. Thomas H. McGovern, Chair of Examining Committee. Gerald Creed, Executive Officer. Andrew J. Dugmore, Sophia Perdikaris, George Hambrecht. -
17Th Nordic Conference of Computational Linguistics (NODALIDA
17th Nordic Conference of Computational Linguistics (NODALIDA 2009) NEALT Proceedings Series Volume 4 Odense, Denmark 14 – 16 May 2009 Editors: Kristiina Jokinen Eckhard Bick ISBN: 978-1-5108-3465-1 Printed from e-media with permission by: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 Some format issues inherent in the e-media version may also appear in this print version. Copyright © (2009) by the Association for Computational Linguistics All rights reserved. Printed by Curran Associates, Inc. (2017) For permission requests, please contact the Association for Computational Linguistics at the address below. Association for Computational Linguistics 209 N. Eighth Street Stroudsburg, Pennsylvania 18360 Phone: 1-570-476-8006 Fax: 1-570-476-0860 [email protected] Additional copies of this publication are available from: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 USA Phone: 845-758-0400 Fax: 845-758-2633 Email: [email protected] Web: www.proceedings.com Contents: Preface; Committees; Conference Program. I Invited Papers: JEAN CARLETTA, Developing Meeting Support Technologies: From Data to Demonstration (and Beyond); RALF STEINBERGER, Linking News Content Across Languages. II Tutorial: GRAHAM WILCOCK, Text Annotation with OpenNLP and UIMA. III Regular papers: LENE ANTONSEN, SAARA HUHMARNIEMI AND TROND TROSTERUD, Interactive pedagogical programs based on constraint grammar; JARI BJÖRNE, FILIP GINTER, JUHO HEIMONEN, SAMPO PYYSALO AND TAPIO SALAKOSKI, Learning to Extract Biological Event and -
Valuing Immigrant Memories As Common Heritage
Valuing Immigrant Memories as Common Heritage: The Leif Erikson Monument in Boston TORGRIM SNEVE GUTTORMSEN History & Memory, Vol. 30, No. 2 (Fall/Winter 2018), DOI: 10.2979/histmemo.30.2.04 This article examines the history of the monument to the Viking and transatlantic seafarer Leif Erikson (ca. AD 970–1020) that was erected in 1887 on Commonwealth Avenue in Boston, Massachusetts. It analyzes how a Scandinavian-American immigrant culture has influenced America through continued celebration and commemoration of Leif Erikson and considers Leif Erikson monuments as a heritage value for the public good and as a societal resource. Discussing the link between discovery myths, narratives about refugees at sea and immigrant memories, the article suggests how the Leif Erikson monument can be made relevant to present-day society. Keywords: immigrant memories; historical monuments; Leif Erikson; national and urban heritage; Boston INTRODUCTION At the unveiling ceremony of the Leif Erikson monument in Boston on October 29, 1887, the Governor of Massachusetts, Oliver Ames, is reported to have opened his address with the following words: “We are gathered here to do honor to the memory of a man of whom indeed but little is known, but whose fame is that of having being one of those pioneers in the world’s history, whose deeds have been the source of the most important results.” Governor Ames was paying tribute to Leif Erikson (ca. AD 970–1020) from Iceland, who, according to the Norse Sagas, was a Viking Age transatlantic seafarer and explorer. At the turn of the nineteenth century, the story about Leif Erikson’s being the first European to land in America achieved popularity in the United States. -
Dialect Acquisition and Migration in Norway – Questions of Authenticity, Belonging and Legitimacy
Journal of Multilingual and Multicultural Development ISSN: 0143-4632 (Print) 1747-7557 (Online) Journal homepage: https://www.tandfonline.com/loi/rmmm20 Dialect acquisition and migration in Norway – questions of authenticity, belonging and legitimacy Unn Røyneland & Bård Uri Jensen To cite this article: Unn Røyneland & Bård Uri Jensen (2020): Dialect acquisition and migration in Norway – questions of authenticity, belonging and legitimacy, Journal of Multilingual and Multicultural Development, DOI: 10.1080/01434632.2020.1722679 To link to this article: https://doi.org/10.1080/01434632.2020.1722679 © 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group Published online: 31 Jan 2020. Unn Røyneland (a) and Bård Uri Jensen (a, b); (a) Department of Linguistics and Scandinavian Studies, Center for Multilingualism in Society Across the Lifespan - MultiLing (CoE), University of Oslo, Oslo, Norway; (b) Department of Education, Inland Norway University of Applied Sciences, Hamar, Norway ARTICLE HISTORY Received 13 January 2020; Accepted 23 January 2020 KEYWORDS Dialect use; visual-verbal […] ABSTRACT Norway is known for its dialect diversity and also for the fact that dialects, on the whole, are cherished and used within all social domains and by people in all social strata. Previous studies indicate that also immigrants to Norway tend to acquire and use local speech, and that this generally is positively perceived. -
The Diachrony of Definiteness in North Germanic
The Diachrony of Definiteness in North Germanic Brill’s Studies in Historical Linguistics Series Editor Jóhanna Barðdal (Ghent University) Consulting Editor Spike Gildea (University of Oregon) Editorial Board Joan Bybee (University of New Mexico) – Lyle Campbell (University of Hawai’i Manoa) – Nicholas Evans (The Australian National University) – Bjarke Frellesvig (University of Oxford) – Mirjam Fried (Czech Academy of Sciences) – Russel Gray (University of Auckland) – Tom Guldemann (Humboldt-Universität zu Berlin) – Alice Harris (University of Massachusetts) – Brian D. Joseph (The Ohio State University) – Ritsuko Kikusawa (National Museum of Ethnology) – Silvia Luraghi (Università di Pavia) – Joseph Salmons (University of Wisconsin) – Søren Wichmann (MPI/EVA) Volume 14 The titles published in this series are listed at brill.com/bshl By Dominika Skrzypek, Alicja Piotrowska, Rafał Jaworski Leiden | Boston This is an open access title distributed under the terms of the CC BY-NC-ND 4.0 license, which permits any non-commercial use, distribution, and reproduction in any medium, provided no alterations are made and the original author(s) and source are credited. Further information and the complete license text can be found at https://creativecommons.org/licenses/by-nc-nd/4.0/ The terms of the CC license apply only to the original material. The use of material from other sources (indicated by a reference) such as diagrams, illustrations, photos and text samples may require further permission from the respective copyright holder. The research presented in this monograph was financed by a research grant from the Polish National Science Centre (NCN) entitled Diachrony of definiteness in Scandinavian languages, number 2015/19/B/HS2/00143.