Character-Level Models

Hinrich Schütze
Center for Information and Language Processing, LMU Munich
2019-08-29

Overview
  1 Motivation
  2 fastText
  3 CNNs
  4 FLAIR
  5 Summary

Typical NLP pipeline: Tokenization
  Mr. O’Neill thinks that Brazil’s capital is Rio.
  Mr.|O’Neill|thinks|that|Brazil|’s|capital|is|Rio|.

Typical NLP pipeline: Morphological analysis
  For example: lemmatization
  Mr. O’Neill knows that the US has fifty states
  Mr. O’Neill know that the US have fifty state

Preprocessing in the typical NLP pipeline
  Tokenization
  Morphological analysis
  Later today: BPEs
  What is the problem with this?

Problems with typical preprocessing in NLP
  Rules do not capture structure within tokens.
    Regular morphology, e.g., compounding: “Staubecken” can mean “Staub-Ecken” (dusty corners) or “Stau-Becken” (dam reservoir)
    Non-morphological, semi-regular productivity: cooooooooooool, fancy-shmancy, Watergate/Irangate/Dieselgate
    Blends: Obamacare, mockumentary, brunch
    Onomatopoeia, e.g., “oink”, “sizzle”, “tick tock”
    Certain named entity classes: What is “lisinopril”?
    Noise due to spelling errors: “signficant”
    Noise that affects token boundaries, e.g., in OCR: “run fast” → “runfast”

Problems with typical preprocessing in NLP
  Rules do not capture structure across tokens.
    Noise that affects token boundaries, e.g., in OCR: “guacamole” → “guaca” “mole”
    Recognition of names / multiphrase expressions: “San Francisco-Los Angeles flights”
    “Nonsegmented” languages: Chinese, Thai, Burmese

Pipelines in deep learning (and StatNLP in general)
  We have a pipeline consisting of two different subsystems:
    A preprocessing component: tokenization, morphology, BPEs
    The deep learning model that is optimized for a particular objective
  The preprocessing component is not optimal for the objective, and there are many cases where it is outright harmful.
  If we replace the preprocessing component with a character-level layer, we can train the architecture end2end and get rid of the pipeline.

Advantages of end2end vs. pipeline
  End2end optimizes all parameters of a deep learning model directly for the learning objective, including “first-layer” parameters that connect the raw input representation to the first layer of internal representations of the network.
  Pipelines generally don’t allow “backtracking” if an error has been made in the first element of the pipeline.
  In character-level models, there is no such thing as an out-of-vocabulary word (OOV analysis).
  Character-level models can generate words / units that did not occur in the training set (OOV generation).
  End2end can deal better with human productivity (e.g., “brunch”), misspellings, etc.

Three character-level models
  fastText: bag of character ngrams
  Character-aware CNN (Kim, Jernite, Sontag, Rush, 2015): CNN
  FLAIR: character-level BiLSTM

fastText
  FastText is an extension of word2vec.
  It computes embeddings for character ngrams.
  A word’s embedding is the sum of its character ngram embeddings.
  Parameters: minimum ngram length: 3, maximum ngram length: 6
  The embedding of “dendrite” will be the sum of the embeddings of the following ngrams:
    @dendrite@
    @de den end ndr dri rit ite te@
    @den dend endr ndri drit rite ite@
    @dend dendr endri ndrit drite rite@
    @dendr dendri endrit ndrite drite@

fastText: Example for benefits
  Embedding for character ngram “dendri” → “dendrite” and “dendritic” are similar
  word2vec: no guarantee, especially for rare words

fastText paper

fastText objective
  $$\sum_{t=1}^{T} \sum_{c \in C_t} -\log p(w_c \mid w_t)$$
  $T$: length of the training corpus in tokens
  $C_t$: words surrounding word $w_t$

Probability of a context word: softmax?
  $$p(w_c \mid w_t) = \frac{\exp(s(w_t, w_c))}{\sum_{j=1}^{W} \exp(s(w_t, w_j))}$$
  $s(w_t, w_c)$: scoring function that maps a word pair to $\mathbb{R}$
  $W$: size of the vocabulary
  Problems: too expensive

Instead of softmax: Negative sampling and binary logistic loss
  $$\log\left(1 + \exp(-s(w_t, w_c))\right) + \sum_{n \in N_{t,c}} \log\left(1 + \exp(s(w_t, w_n))\right)$$
  $$= \ell(s(w_t, w_c)) + \sum_{n \in N_{t,c}} \ell(-s(w_t, w_n))$$
  $N_{t,c}$: set of negative examples sampled from the vocabulary
  $\ell(x) = \log(1 + \exp(-x))$ (logistic loss)

Binary logistic loss for corpus
  $$\sum_{t=1}^{T} \sum_{c \in C_t} \left[ \ell(s(w_t, w_c)) + \sum_{n \in N_{t,c}} \ell(-s(w_t, w_n)) \right]$$
  $\ell(x) = \log(1 + \exp(-x))$

Scoring function
  $$s(w_t, w_c) = u_{w_t}^{\top} v_{w_c}$$
  $u_{w_t}$: the input vector of $w_t$
  $v_{w_c}$: the output vector (or context vector) of $w_c$

Subword model
  $$s(w_t, w_c) = \frac{1}{|G_{w_t}|} \sum_{g \in G_{w_t}} z_g^{\top} v_{w_c}$$
  $G_{w_t}$: set of ngrams of $w_t$ and $w_t$ itself
  $z_g$: the vector of ngram $g$

fastText: Summary
  Basis: word2vec skipgram
  Objective: includes character ngrams as well as the word itself
  Result: word embeddings that combine word-level and character-level information
  We can compute an embedding for any unseen word (OOV).
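The ngram extraction, the subword scoring function and the negative-sampling loss from the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the fastText implementation: the names `char_ngrams`, `ToySubwordModel`, `score` and `pair_loss` are hypothetical, the vectors are random rather than trained, and ngrams are kept in a plain dictionary, whereas fastText hashes them into a fixed number of buckets.

```python
import math
import random


def char_ngrams(word, minn=3, maxn=6, boundary="@"):
    """Return G_w: the boundary-marked word plus its character ngrams."""
    marked = f"{boundary}{word}{boundary}"
    grams = [marked]  # the full word is always a member of G_w
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    # G_w is a set: drop repeats but keep first-seen order.
    return list(dict.fromkeys(grams))


def logistic_loss(x):
    """Numerically stable l(x) = log(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToySubwordModel:
    """Hypothetical container: a vector z_g per ngram, a vector v_w per context word."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.z = {}  # ngram vectors (assumption: dict instead of hashed buckets)
        self.v = {}  # output (context) vectors

    def _vec(self, table, key):
        # Lazily create a small random vector the first time a key is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return table[key]

    def score(self, target, context):
        """s(w_t, w_c) = 1/|G_wt| * sum over g in G_wt of z_g . v_wc"""
        grams = char_ngrams(target)
        v_c = self._vec(self.v, context)
        return sum(dot(self._vec(self.z, g), v_c) for g in grams) / len(grams)

    def pair_loss(self, target, context, negatives):
        """l(s(w_t, w_c)) + sum over negatives n of l(-s(w_t, w_n))."""
        loss = logistic_loss(self.score(target, context))
        loss += sum(logistic_loss(-self.score(target, n)) for n in negatives)
        return loss


if __name__ == "__main__":
    print(char_ngrams("dendrite"))
    model = ToySubwordModel()
    print(model.pair_loss("dendrite", "neuron", ["bus", "jet"]))
```

Because the score is built only from ngram vectors, `score` also works for a target word that was never seen as a whole, which is the OOV property stated in the summary slide.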

Letter n-gram generalization can be good
  word2vec                   fastText
  1.000  automobile          1.000  automobile
  0.779  mid-size            0.976  automobiles
  0.770  armored             0.929  Automobile
  0.763  seaplane            0.858  manufacturing
  0.754  bus                 0.853  motorcycles
  0.754  jet                 0.849  Manufacturing
  0.751  submarine           0.848  motorcycle
  0.750  aerial              0.841  automotive
  0.744  improvised          0.814  manufacturer
  0.741  anti-aircraft       0.811  manufacture

Letter n-gram generalization can be bad
  word2vec                   fastText
  1.000  Steelers            1.000  Steelers
  0.884  Expos               0.893  49ers
  0.865  Cubs                0.883  Steele
  0.848  Broncos             0.876  Rodgers
  0.831  Dinneen             0.857  Colts
  0.831  Dolphins            0.852  Oilers
  0.827  Pirates             0.851  Dodgers
  0.826  Copley              0.849  Chalmers
  0.818  Dodgers             0.849  Raiders
  0.814  Raiders             0.844  Coach

Letter n-gram generalization: no-brainer for unknowns (OOVs)
  word2vec: “video-conferences” did not occur in corpus
  fastText
  1.000  video-conferences
  0.942  conferences
  0.872  conference
  0.870  Conferences
  0.823  inferences
  0.806  Questions
  0.805  sponsorship
  0.800  References
  0.797  participates
  0.796  affiliations

fastText extensions (Mikolov et al, 2018)
  Position-dependent features
  Phrases (like word2vec)
  cbow
  Pretrained word vectors for 157 languages

fastText evaluation

Code
  fastText: https://fasttext.cc
  gensim: https://radimrehurek.com/gensim/

Pretrained fastText embeddings
  Afrikaans, Albanian, Alemannic, Amharic, Arabic, Aragonese, Armenian, Assamese, Asturian, Azerbaijani, Bashkir, Basque, Bavarian, Belarusian, Bengali, Bihari, Bishnupriya Manipuri, Bosnian, Breton, Bulgarian, Burmese, Catalan, Cebuano, Central Bicolano, Chechen, Chinese, Chuvash, Corsican, Croatian, Czech, Danish, Divehi, Dutch, Eastern Punjabi, Egyptian Arabic, Emilian-Romagnol, English, Erzya, Esperanto, Estonian, Fiji Hindi, Finnish, French, Galician, Georgian, German, Goan Konkani, Greek, Gujarati, Haitian, Hebrew, Hill Mari, Hindi, Hungarian, Icelandic, Ido, Ilokano, Indonesian, Interlingua, Irish, Italian, Japanese, Javanese, Kannada, Kapampangan, Kazakh, Khmer, Kirghiz, Korean,
  Kurdish (Kurmanji), Kurdish (Sorani), Latin, Latvian, Limburgish, Lithuanian, Lombard, Low Saxon, Luxembourgish, Macedonian, Maithili, Malagasy, Malay, Malayalam, Maltese, Manx, Marathi, Mazandarani, Meadow Mari, Minangkabau, Mingrelian, Mirandese, Mongolian, Nahuatl, Neapolitan, Nepali, Newar, North Frisian, Northern Sotho, Norwegian (Bokmål), Norwegian (Nynorsk), Occitan, Oriya, Ossetian, Palatinate German, Pashto, Persian, Piedmontese, Polish, Portuguese, Quechua, Romanian, Romansh, Russian, Sakha, Sanskrit, Sardinian, Scots, Scottish Gaelic, Serbian, Serbo-Croatian, Sicilian, Sindhi, Sinhalese, Slovak, Slovenian, Somali, Southern Azerbaijani, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Upper Sorbian, Urdu, Uyghur, Uzbek, Venetian, Vietnamese, Volapük, Walloon, Waray, Welsh, West Flemish, West Frisian, Western Punjabi, Yiddish, Yoruba, Zazaki, Zeelandic

fastText skipgram parameters (default values in parentheses)
  -input <path>              training file path
  -output <path>             output file path
  -lr (0.05)                 learning rate
  -lrUpdateRate (100)        rate of updates for the learning rate
  -dim (100)                 dimensionality of word embeddings
  -ws (5)                    size of the context window
  -epoch (5)                 number of epochs
  -minCount (5)              minimal number of word occurrences
  -neg (5)                   number of negatives sampled
  -wordNgrams (1)            max length of word ngram
  -loss (ns)                 loss function ∈ { ns, hs, softmax }
  -bucket (2,000,000)        number of buckets
  -minn (3)                  min length of char ngram
  -maxn (6)                  max length of char ngram
  -thread (12)               number of threads
  -t (0.0001)                sampling threshold
  -label <string>            labels prefix

Convolutional Neural Networks (CNNs): Basic idea
  We learn feature detectors.
  Each feature detector has a fixed size, e.g., a width of three characters.
  We slide the feature detector over the input (e.g., an input word).
  The feature detector indicates for each point in the input the activation of the feature at that point.
  Then we pass to the next layer
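The sliding of a fixed-width feature detector over a word can be sketched as follows. This is a toy under stated assumptions: the function names are hypothetical, characters are one-hot encoded over a lowercase alphabet, and the detector weights are set by hand to respond to the trigram "ing", whereas a real CNN would learn them. Taking the maximum over positions is one common way to pass the result to the next layer and is assumed here; the slide does not specify it.

```python
def one_hot(ch, alphabet):
    """Encode one character as a one-hot vector over the alphabet."""
    return [1.0 if ch == a else 0.0 for a in alphabet]


def slide_detector(word, kernel, bias, alphabet):
    """Slide a fixed-width feature detector over a word.

    kernel holds one weight vector per character position in the window.
    Returns one ReLU activation per window position.
    """
    chars = [one_hot(ch, alphabet) for ch in word]
    width = len(kernel)
    activations = []
    for start in range(len(chars) - width + 1):
        z = bias
        for offset in range(width):
            z += sum(k * x for k, x in zip(kernel[offset], chars[start + offset]))
        activations.append(max(0.0, z))  # ReLU keeps only positive evidence
    return activations


ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Hand-set weights (not learned): width three, fires only on the exact trigram "ing".
ING_KERNEL = [one_hot(ch, ALPHABET) for ch in "ing"]
ING_BIAS = -2.0

if __name__ == "__main__":
    acts = slide_detector("singing", ING_KERNEL, ING_BIAS, ALPHABET)
    print(acts)       # one activation per three-character window
    print(max(acts))  # max over positions, an assumed choice for the next layer
```

For "singing" the detector is active at the two windows that spell "ing" and silent elsewhere, which is the per-position activation the slide describes.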