The Nordic Research Infrastructure
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Beyond Borealism
Beyond Borealism: New Perspectives on the North eds. Ian Giles, Laura Chapot, Christian Cooijmans, Ryan Foster and Barbara Tesio Norvik Press 2016 © 2016 Charlotte Berry, Laura Chapot, Marc Chivers, Pei-Sze Chow, Christian Cooijmans, Jan D. Cox, Stefan Drechsler, Ryan Foster, Ian Giles, Karianne Hansen, Pavel Iosad, Elyse Jamieson, Ellen Kythor, Shane McLeod, Hafor Medbøe, Kitty Corbet Milward, Eleanor Parker, Silke Reeploeg, Cristina Sandu, Barbara Tesio. Norvik Press Series C: Student Writing no. 3 A catalogue record for this book is available from the British Library. ISBN: 978-1-909408-33-3 Norvik Press Department of Scandinavian Studies UCL Gower Street London WC1E 6BT United kingdom Website: www.norvikpress.com E-mail address: [email protected] Managing Editors: Elettra Carbone, Sarah Death, Janet Garton, C.Claire Tomson. Cover design: Sarah Diver Lang and Elettra Carbone Inside cover image: Sarah Diver Lang Layout: Elettra Carbone 2 Contents Introduction 7 Acknowledgements 12 Biographies 13 Editor Biographies 13 Contributor Biographies 15 ECHOES OF HISTORY 21 The Medieval Seal of Reynistaður 22 Stefan Drechsler ‘So Very Memorable a Matter’: Anglo-Danish History and the Encomium Emmae Reginae 41 Eleanor Parker A Similar but Different Boat Tradition: The Import of Boats from Norway to Shetland 1700 to 1872 54 Marc Chivers LINGUISTIC LIAISONS 77 Tonal Stability and Tonogenesis in (North) Germanic 78 Pavel Iosad Imperative Commands in Shetland Dialect: Nordic Origins? 96 Elyse Jamieson 3 ART AND SOCIETY 115 The Battle of Kringen -
Using Constraint Grammar for Treebank Retokenization
Using Constraint Grammar for Treebank Retokenization Eckhard Bick University of Southern Denmark [email protected] Abstract multiple tokens, in this case allowing the second part (the article) to become part of a separate np. This paper presents a Constraint Grammar-based method for changing the Tokenization is often regarded as a necessary tokenization of existing annotated data, evil best treated by a preprocessor with an establishing standard space-based abbreviation list, but has also been subject to ("atomic") tokenization for corpora methodological research, e.g. related to finite- otherwise using MWE fusion and state transducers (Kaplan 2005). However, there contraction splitting for the sake of is little research into changing the tokenization of syntactic transparency or for semantic a corpus once it has been annotated, limiting the reasons. Our method preserves ingoing comparability and alignment of corpora, or the and outgoing dependency arcs and allows evaluation of parsers. The simplest solution to the addition of internal tags and structure this problem is making conflicting systems for MWEs. We discuss rule examples and compatible by changing them into "atomic evaluate the method against both a tokenization", where all spaces are treated as Portuguese treebank and live news text token boundaries, independently of syntactic or annotation. semantic concerns. This approach is widely used in the machine-learning (ML) community, e.g. 1 Introduction for the Universal Dependencies initiative (McDonald et al. 2013). The method described in In an NLP framework, tokenization can be this paper can achieve such atomic tokenization defined as the identification of the smallest of annotated treebank data without information meaningful lexical units in running text. -
Modeling Regional Variation in Voice Onset Time of Jutlandic Varieties of Danish
Chapter 4 Modeling regional variation in voice onset time of Jutlandic varieties of Danish Rasmus Puggaard Universiteit Leiden It is a well-known overt feature of the Northern Jutlandic variety of Danish that /t/ is pronounced with short voice onset time and no affrication. This is not lim- ited to Northern Jutland, but shows up across the peninsula. This paper expands on this research, using a large corpus to show that complex geographical pat- terns of variation in voice onset time is found in all fortis stops, but not in lenis stops. Modeling the data using generalized additive mixed modeling both allows us to explore these geographical patterns in detail, as well as test a number of hypotheses about how a number of environmental and social factors affect voice onset time. Keywords: Danish, Jutlandic, phonetics, microvariation, regional variation, stop realization, voice onset time, aspiration, generalized additive mixed modeling 1. Introduction A well-known feature of northern Jutlandic varieties of Danish is the use of a variant of /t/ known colloquially as the ‘dry t’. While the Standard Danish variant of /t/ has a highly affricated release, the ‘dry t’ does not. Puggaard (2018) showed that variation in this respect goes beyond just that particular phonetic feature and dialect area: the ‘dry t’ also has shorter voice onset time (VOT) than affricated variants, and a less affricated, shorter variant of /t/ is also found in the center of Jutland. This paper expands on Puggaard (2018) with the primary goals of providing a sounder basis for investigating the geographic spread of the variation, and to test whether the observed variation is limited to /t/ or reflects general patterns in plosive realization. -
Using Danish As a CG Interlingua: a Wide-Coverage Norwegian-English Machine Translation System
Using Danish as a CG Interlingua: A Wide-Coverage Norwegian-English Machine Translation System Eckhard Bick Lars Nygaard Institute of Language and The Text Laboratory Communication University of Southern Denmark University of Oslo Odense, Denmark Oslo, Norway [email protected] [email protected] Abstract running, mixed domain text. Also, some languages, like English, German and This paper presents a rule-based Japanese, are more equal than others, Norwegian-English MT system. not least in a funding-heavy Exploiting the closeness of environment like MT. Norwegian and Danish, and the The focus of this paper will be existence of a well-performing threefold: Firstly, the system presented Danish-English system, Danish is here is targeting one of the small, used as an «interlingua». «unequal» languages, Norwegian. Structural analysis and polysemy Secondly, the method used to create a resolution are based on Constraint Norwegian-English translator, is Grammar (CG) function tags and ressource-economical in that it uses dependency structures. We another, very similar language, Danish, describe the semiautomatic as an «interlingua» in the sense of construction of the necessary translation knowledge recycling (Paul Norwegian-Danish dictionary and 2001), but with the recycling step at the evaluate the method used as well SL side rather than the TL side. Thirdly, as the coverage of the lexicon. we will discuss an unusual analysis and transfer methodology based on Constraint Grammar dependency 1 Introduction parsing. In short, we set out to construct a Norwegian-English MT Machine translation (MT) is no longer system by building a smaller, an unpractical science. Especially the Norwegian-Danish one and piping its advent of corpora with hundreds of output into an existing Danish deep millions of words and advanced parser (DanGram, Bick 2003) and an machine learning techniques, bilingual existing, robust Danish-English MT electronic data and advanced machine system (Dan2Eng, Bick 2006 and 2007). -
The Northwest European Phonological Area New Approaches to an Old Problem
The northwest European phonological area New approaches to an old problem Pavel Iosad The University of Edinburgh [email protected] Linguistic Circle 29th September 2016 Outline • A Northern European Sprachbund? • Three case studies: – Preaspiration – Tonogenesis out of syllable counts – Sonorant pre-occlusion • Prosodic structure as the common denominator • Revisiting contact: what does it take? 1 Nordeuropäische Lautgeographie 1.1 Phonological connections Trubetzkoy: Proposition 16 • Trubetzkoy (1928): phonology isn’t very important for defining a Sprachbund Gruppen, bestehend aus Sprachen, die eine grosse Ähnlichkeit in syntakti- scher Hinsicht, eine Ähnlichkeit in den Grundsätzen des morphologischen Baus aufweisen, und eine grosse Anzahl gemeinsamer Kulturwörter bieten, manchmal auch äussere Ähnlichkeit im Bestande der Lautsysteme,—dabei aber keine gemeinsame Elementarwörter besitzen—solche Sprachgrupper nennen wir Sprachbünde (emphasis mine) ¹ ¹‘We call language areas (Sprachbünde) groups that consist of languages showing a large similarity in syn- tactic terms, a similarity in the basics of morphological structure and a large number of common cultural vocabulary — sometimes also a superficial similarity in their sound inventories — without, however, sharing core vocabulary.’ 1 Jakobson: Über die phonologischen Sprachbünde • Jakobson (1931): a Baltic Sprachbund exists, defined by ‘tonality’ Ebenso bilden die Sprachen des Baltikums einen Sprachbund, den die Poly- tonie kennzeichnet; hierher gehören: das Schwedische, das Norwegische mit Ausnahme der nordwestlichen Mundarten, die meisten dänischen Dialekte, einige norddeutsche Mundarten, das Nordkaschubische, das Litauische und Lettische, das Livische und Estnische. In den meisten Sprachen und Mund- arten dieses Bundes ist die Tonverlaufkorrelation und in den übrigen ihre Ab- änderung, die Tonbruchkorrelation, vorhanden. In allen Sprachen des balti- schen Bundes, mit Ausnahme der litauisch-lettischen Familie, ist die Polytonie eine Neubildung. -
Floresta Sinti(C)Tica : a Treebank for Portuguese
)ORUHVWD6LQWi F WLFD$WUHHEDQNIRU3RUWXJXHVH 6XVDQD$IRQVR (FNKDUG%LFN 5HQDWR+DEHU 'LDQD6DQWRV *VISL project, University of Southern Denmark Institute of Language and Communication, Campusvej, 55, 5230 Odense M, Denmark [email protected], [email protected] ¡ SINTEF Telecom & Informatics, Pb 124, Blindern, NO-0314 Oslo, Norway [email protected],[email protected] $EVWUDFW This paper reviews the first year of the creation of a publicly available treebank for Portuguese, Floresta Sintá(c)tica, a collaboration project between the VISL and the Computational Processing of Portuguese projects. After briefly describing the main goals and the organization of the project, the creation of the annotated objects is presented in detail: preparing the text to be annotated, applying the Constraint Grammar based PALAVRAS parser, revising its output manually in a two-stage process, and carefully documenting the linguistic options. Some examples of the kind of interesting problems dealt with are presented, and the paper ends with a brief description of the tools developed, the project results so far, and a mention to a preliminary inter-annotator test and what was learned from it. supporting 16 different languages. VISL's Portuguese ,QWURGXFWLRQ0RWLYDWLRQDQGREMHFWLYHV system is based on the PALAVRAS parser (Bick, 2000), There are various good motives for creating a and has been functioning as a role model for other Portuguese treebank, one of them simply being the desire languages. More recently, VISL has moved to incorporate to make a new research tool available to the Portuguese semantic research, machine translation, and corpus language community, another the wish to establish some annotation proper. -
Instructions for Preparing LREC 2006 Proceedings
Translating the Swedish Wikipedia into Danish Eckhard Bick University of Southern Denmark Rugbjergvej 98, DK 8260 Viby J [email protected] Abstract Abstract. This paper presents a Swedish-Danish automatic translation system for Wikipedia articles (WikiTrans). Translated articles are indexed for both title and content, and integrated with original Danish articles where they exist. Changed or added articles in the Swedish Wikipedia are monitored and added on a daily basis. The translation approach uses a grammar-based machine translation system with a deep source-language structural analysis. Disambiguation and lexical transfer rules exploit Constraint Grammar tags and dependency links to access contextual information, such as syntactic argument function, semantic type and quantifiers. Out-of-vocabulary words are handled by derivational and compound analysis with a combined coverage of 99.3%, as well as systematic morpho-phonemic transliterations for the remaining cases. The system achieved BLEU scores of 0.65-0.8 depending on references and outperformed both STMT and RBMT competitors by a large margin. 1. Introduction syntactic function tags, dependency trees and a The amount of information available in Wikipedia semantic classification of both nouns and named differs greatly between languages, and many topics are entities. badly covered in small languages, with short, missing or stub-style articles. This asymmetry can be found 2. The Translation System (Swe2Dan) between Scandinavian languages, too. Thus, the In spite of the relatedness of Swedish and Danish, a Swedish Wikipedia has 6 times more text than its one-on-one translation is possible in less than 50% of Danish equivalent. Robot-created articles have helped all tokens. -
Sverrestauslandjohnsen
Sverre Stausland Johnsen (Updated September 20, 2021) Department of Linguistics and Scandinavian Studies [email protected] University of Oslo https://sverrestausland.github.io/ po Box 1102 Blindern Google Scholar profile 0317 Oslo, Norway Employment 2016– Professor of Scandinavian Linguistics Department of Linguistics and Scandinavian Studies University of Oslo 2016 Associate Professor of Scandinavian Linguistics Department of Linguistics and Scandinavian Studies University of Oslo 2015 Associate Professor of Norwegian and Teaching Norwegian as a Native Language Department of Languages Buskerud and Vestfold University College 2012–2015 Postdoctoral Research Fellow in Scandinavian Linguistics Department of Linguistics and Scandinavian Studies University of Oslo 2013–2015 Adjunct Assistant Professor Department of Foreign Languages & Literatures National Chiao Tung University 2008–2010 Teaching Fellow Department of Germanic Languages and Literatures Harvard University 2007–2010 Teaching Fellow Department of Linguistics Harvard University Education 2011 Doctor of Philosophy (Ph.D.) in Linguistics, Harvard University. Dissertation: The origin of variation in Norwegian retroflexion. Committee: Adam Albright, Jay Jasanoff, & Maria Polinsky. Major field: Phonology Minor field: Historical linguistics 1 of 17 Sverre Stausland Johnsen University of Oslo 2008 Master of Arts (M.A.), Harvard University. 2005 Master of Philosophy (M.Phil.) in Language – Comparative Germanic Linguistics, University of Oslo. Dissertation: The Germanic (i)jō-stem declension: Origin and development. Committee: Jón Axel Harðarson & Trygve Skomedal. 2003 Bachelor of Arts (B.A.), University of Oslo. Publications Articles and book chapters 2021 1901-rettskrivingi var den fyrsta samnorsk-umboti. Norsk Årbok, 111–118. 2021 Grammaticalization in Somali and the development of morphological tone. Pro- ceedings of the Linguistic Society of America 6(1): 587–599. -
17Th Nordic Conference of Computational Linguistics (NODALIDA
17th Nordic Conference of Computational Linguistics (NODALIDA 2009) NEALT Proceedings Series Volume 4 Odense, Denmark 14 – 16 May 2009 Editors: Kristiina Jokinen Eckhard Bick ISBN: 978-1-5108-3465-1 Printed from e-media with permission by: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 Some format issues inherent in the e-media version may also appear in this print version. Copyright© (2009) by the Association for Computational Linguistics All rights reserved. Printed by Curran Associates, Inc. (2017) For permission requests, please contact the Association for Computational Linguistics at the address below. Association for Computational Linguistics 209 N. Eighth Street Stroudsburg, Pennsylvania 18360 Phone: 1-570-476-8006 Fax: 1-570-476-0860 [email protected] Additional copies of this publication are available from: Curran Associates, Inc. 57 Morehouse Lane Red Hook, NY 12571 USA Phone: 845-758-0400 Fax: 845-758-2633 Email: [email protected] Web: www.proceedings.com Contents Contents iii Preface vii Commitees ix Conference Program xi I Invited Papers 1 JEAN CARLETTA Developing Meeting Support Technologies: From Data to Demonstration (and Beyond) 2 RALF STEINBERGER Linking News Content Across Languages 4 II Tutorial 6 GRAHAM WILCOCK Text Annotation with OpenNLP and UIMA 7 III Regular papers 9 LENE ANTONSEN,SAARA HUHMARNIEMI AND TROND TROSTERUD Interactive pedagogical programs based on constraint grammar 10 JARI BJÖRNE,FILIP GINTER,JUHO HEIMONEN,SAMPO PYYSALO AND TAPIO SALAKOSKI Learning to Extract Biological Event and -
Building Runic Networks
Háskóli Íslands Hugvísindasvið Viking and Medieval Norse Studies Building Runic Networks A Digital Investigation of Dialects in the East Norse Runic Corpus Ritgerð til MA-prófs í Viking and Medieval Norse Studies Benjamin Holt Kt.: 220589-4719 Leiðbeinendur: Haraldur Bernharðsson Michael Lerche Nielsen September 2017 ABSTRACT The aim of this thesis is to create a complex and three-dimensional overview of East Norse dialects in the age of runic inscription (approximately 700 AD through 1200 AD). It does so through the use of two innovations – namely, variable co-occurrence and network analysis – that allow for greater depth and complexity than previous studies offer. Prior scholarship has focused primarily on only one set of linguistic variables. By examining and analyzing the occurrences of two sets of variables simultaneously, this thesis exponentially increases the complexity – and thus credibility – of the resultant dialectal analysis. Creating networks of runic inscriptions based on these co-occurrences makes it possible to free dialectal data from abstract tables and visualize linguistic connections and patterns in a previously unexplored manner. By so doing, this thesis presents new and innovative insight into the dialects of Runic Swedish, Runic Danish, and Runic Gutnish and paves the way for future digital research into the same. ÚTDRÁTTUR Markmið þessarar ritgerðar er að skapa margbrotið þrívíddaryfirlit yfir austnorrænar mállýskur í rúnaáletrunum (u.þ.b. 700–1200). Þetta er gert með notkun tveggja nýjunga – greiningar á sameiginlegum málbreytum og netgreiningu (e. network analysis) – sem gera það kleift að ná dýpri innsýn og margslungnari niðurstöðum en fyrri rannsóknir þar sem sjónum hefur aðeins verið beint að einni samstæðu af málbreytum. -
Frag, a Hybrid Constraint Grammar Parser for French
FrAG, a Hybrid Constraint Grammar Parser for French Eckhard Bick University of Southern Denmark Odense, Denmark eckhard.bick @ mail.dk Abstract This paper describes a hybrid system (FrAG) for tagging / parsing French text, and presents results from ongoing development work, corpus annotation and evaluation. The core of the system is a sentence scope Constraint Grammar (CG), with linguist-written rules. However, unlike traditional CG, the system uses hybrid techniques on both its morphological input side and its syntactic output side. Thus, FrAG draws on a pre-existing probabilistic Decision Tree Tagger (DTT) before and in parallel with its own lexical stage, and feeds its output into a Phrase Structure Grammar (PSG) that uses CG syntactic function tags rather than ordinary terminals in its rewriting rules. As an alternative architecture, dependency tree structures are also supported. In the newest version, dependencies are assigned within the CG-framework itself, and can interact with other rules. To provide semantic context, a semantic prototype ontology for nouns is used, covering a large part of the lexicon. In a recent test run on Parliamentary debate transcripts, FrAG achieved F-scores of 98.7 % for part of speech (PoS) and between 93.1 % and 96.2 % for syntactic function tags. Dependency links were correct in 95.9 %. 1 CG with probabilistic input lexemes with information on This paper describes a hybrid tagger/parser for French, the PoS, paradigmatical 65.470 French Annotation Grammar (FrAG), and presents verbal valency 6.218 preliminary results from ongoing development work, nominal valency 230 corpus annotation and evaluation. The core of the system semantic class (nouns) 17.860 is a sentence scope Constraint Grammar (CG), with linguist-written rules modelled on similar systems for Table 1: Lexical information types Portuguese and Danish (Bick 2000). -
Prescriptive Infinitives in the Modern North Germanic Languages: An
Nor Jnl Ling 39.3, 231–276 C Nordic Association of Linguists 2016 doi:10.1017/S0332586516000196 Johannessen, Janne Bondi. 2016. Prescriptive infinitives in the modern North Germanic languages: An ancient phenomenon in child-directed speech. Nordic Journal of Linguistics 39(3), 231–276. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Prescriptive infinitives in the modern North Germanic languages: An ancient phenomenon in child-directed speech Janne Bondi Johannessen The prescriptive infinitive can be found in the North Germanic languages, is very old, and yet is largely unnoticed and undescribed. It is used in a very limited pragmatic context of a pleasant atmosphere by adults towards very young children, or towards pets or (more rarely) adults. It has a set of syntactic properties that distinguishes it from the imperative: Negation is pre-verbal, subjects are pre-verbal, subjects are third person and are only expressed by lexical DPs, not personal pronouns. It can be found in modern child language corpora, but probably originated before AD 500. The paper is largely descriptive, but some theoretical solutions to the puzzles of this construction are proposed. Keywords: child-directed speech, context roles, finiteness, imperatives, negation, North Germanic languages, prescriptive infinitives, subjects, word order Janne Bondi Johannessen, University of Oslo, MultiLing & Text Lab, Department of Linguistics and Scandinavian Studies, P.O. Box 1102 Blindern, N–0317 Oslo, Norway.