arXiv:2006.03564v1 [cs.CL] 5 Jun 2020

Spoken dialect identification in Twitter using a multi-filter architecture

Mohammadreza Banaei, Rémi Lebret, Karl Aberer
EPFL, Switzerland

Abstract

This paper presents our approach for SwissText & KONVENS 2020 shared task 2: a multi-stage neural model for Swiss German (GSW) identification on Twitter. Our model outputs either GSW or non-GSW and is not meant to be used as a generic language identifier. Our architecture consists of two independent filters, where the first one favors recall and the second one favors precision (both towards GSW). Moreover, we do not use binary models (GSW vs. non-GSW) in our filters but rather multi-class classifiers with GSW being one of the possible labels. Our model reaches an F1-score of 0.982 on the test set of the shared task.

1 Introduction

Out of over 8000 languages in the world (Hammarström et al., 2020), Twitter's language identifier (LID) supports only around 30 of the most used languages¹, which is not enough for the needs of the NLP community. Furthermore, it has been shown that even for these frequently used languages, Twitter LID is not highly accurate, especially when the tweet is relatively short (Zubiaga et al., 2016).

However, Twitter data is linguistically diverse and, in particular, includes tweets in many low-resource languages and dialects. A better-performing Twitter LID can help us gather large amounts of (unlabeled) text in these low-resource languages, which can be used to enrich models for many downstream NLP tasks, such as sentiment analysis (Volkova et al., 2013) and named entity recognition (Ritter et al., 2011).

However, generalizing state-of-the-art NLP models to low-resource languages is generally hard due to the lack of corpora with good coverage in these languages. The extreme case is spoken dialects, where there might be no standard spelling at all. In this paper, we focus on Swiss German as our low-resource dialect. As Swiss German is a spoken dialect, people might spell a certain word differently, and even a single author might spell the same word differently in two sentences. There is also a dialect continuum across the German-speaking part of Switzerland, which makes NLP for Swiss German even more challenging. Swiss German has its own pronunciation and grammar, and many of its words differ from German.

Some previous efforts discriminate between similar languages with the help of tweet metadata such as geo-location (Williams and Dagli, 2017). In this paper, however, we do not use tweet metadata and restrict our model to the tweet content alone. Therefore, this model can also be used for language identification in sources other than Twitter.

LIDs that support GSW, like the fastText (Joulin et al., 2016) LID model, are often trained on Alemannic Wikipedia, which also contains other German dialects such as Swabian, Walser German, and Alsatian German; hence, these models are not able to discriminate between GSW and the dialects close to it. Moreover, the fastText LID also has a rather low recall (0.362) for Swiss German tweets, as it identified many of them as German.

In this paper, we use two independently trained filters to remove non-GSW tweets. The first filter is a classifier that favors recall (towards GSW), and the second one favors precision. The same idea can be extended to N consecutive filters (with N ≥ 2), with the first N−1 favoring recall and the last one favoring precision. In this way, we make sure that GSW samples are (with high probability) not filtered out by the first N−1 filters, while the GSW precision of the whole pipeline is improved by a precision-favoring filter at the end (the N-th filter). We use only two filters because adding more filters improved the performance (measured by GSW F1-score) only negligibly on our validation set.

We demonstrate that with this architecture we can achieve an F1-score of 0.982 on the test set, even with a small amount of available data in the target domain (Twitter data). Section 2 presents the architecture of each of our filters and the rationale behind the training data chosen for each of them. In Section 3, we discuss our LID implementation details and describe the datasets used. Section 4 presents the performance of our filters on the held-out test dataset. Moreover, we measure the contribution of each filter to removing non-GSW tweets, to show its individual importance in the whole pipeline (for this specific test dataset).

¹ https://dev.twitter.com/docs/developer-utilities/supported-languages/api-reference

2 Multi-filter language identification

In this paper, we follow the combination of N−1 filters favoring recall, followed by a final filter that favors precision. We choose N = 2 to demonstrate the effectiveness of the approach. As discussed before, adding more filters improved the performance of the pipeline only negligibly for this specific dataset. However, for more challenging datasets, N > 2 might be needed to improve the LID precision.

Both of our filters are multi-class classifiers with GSW being one of the possible labels. We found it empirically better to train the multi-class classifier on roughly balanced classes than to turn the same training data into a highly imbalanced GSW vs. non-GSW dataset for a binary classifier, especially for the first filter (Section 2.1), which has many more parameters than the second filter (Section 2.2).

2.1 First filter: fine-tuned BERT model

The first filter should be designed to favor GSW recall, either by tuning inference thresholds or by using training data that implicitly enforces this bias towards GSW. Here we follow the second approach, using different domains for training different labels, as discussed below. Moreover, we use a more complex model (in terms of the number of parameters) for the first filter, so that it does the main job of removing non-GSW inputs while keeping reasonable GSW precision (further details in Section 4). The second filter is then used to improve the pipeline precision by removing a relatively small number of remaining non-GSW tweets.

Our first filter is a BERT model (Devlin et al., 2018) fine-tuned for the LID downstream task. As we do not have a large amount of unsupervised GSW data, it would be hard to train the BERT language model (LM) from scratch on GSW itself. Hence, we use the German pre-trained LM (BERT-base-cased model²), since German is the closest high-resource language to GSW. However, this LM has been trained on sentences (e.g., German Wikipedia) that are quite different from the Twitter domain. Moreover, the lack of standard spelling in GSW introduces many new words (unseen in the German LM training data) whose subword embeddings should be updated to improve downstream task performance. In addition, there are syntactic differences between German and GSW (and even among different variants of GSW in different regions (Honnet et al., 2017)). For these three reasons, we conclude that freezing the BERT body (and training only the classifier layer) might not be optimal for transfer learning from German to our target language. Hence, we also let the whole BERT body be trained during the downstream task, which of course needs a large amount of supervised data to avoid quick overfitting in the fine-tuning phase.

For this filter, we choose the same eight classes for training the LID as Linder et al. (2019) (the dataset classes and their respective sizes can be found in Section 3.1). These languages are structurally similar to GSW (e.g., German, Dutch), and we try to train a model that can distinguish GSW from similar languages in order to decrease GSW false positives. For all classes except GSW, we use sentences (mostly from Wikipedia and NewsCrawl) from the Leipzig text corpora (Goldhahn et al., 2012). For GSW sentences, we use the SwissCrawl dataset (Linder et al., 2019).

Most GSW training samples (SwissCrawl data) come from forums and social media, which are less formal (in structure and phrasing) than the samples of the other (non-GSW) classes (mostly from Wikipedia and NewsCrawl). As our target dataset consists of tweets (mostly informal sentences), this could give this filter a high GSW recall during inference. Additionally, our main reason for using a cased tokenizer for this filter is to let the model also exploit irregularities in writing, such as improper capitalization. As these irregularities mostly occur in informal writing, they again bias the model towards GSW (improving GSW recall) when tweets are passed to it, since most of the GSW training samples are informal.

2.2 Second filter: fastText classifier

For this filter, we also train a multi-class classifier with GSW being one of the labels. The other classes are again languages structurally close to GSW, such as German, Dutch and Spanish (further details in Section 3.1). Additionally, as mentioned before, our second filter should have reasonably high precision to enhance the precision of the full pipeline. Hence, unlike for the first filter, we sample all of the training data from a domain similar to the target test set. Non-GSW samples are tweets from the SEPLN 2014 dataset (Zubiaga et al., 2014) and the Carter et al. (2013) dataset. GSW samples consist of the GSW tweets provided by this shared task and also part of GSW samples

² Training details available at https://huggingface.
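The N-filter cascade described in Section 2 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: each filter is assumed to be any callable that maps a text to a predicted label, and `recall_filter`, `precision_filter` and the `GSW_HINTS` word list are hypothetical toy stand-ins for the fine-tuned BERT model and the fastText classifier.

```python
from typing import Callable, Iterable, List, Sequence

# A filter is any callable that maps a text to a predicted language label.
Filter = Callable[[str], str]


def cascade_filter(texts: Iterable[str], filters: Sequence[Filter],
                   target: str = "gsw") -> List[str]:
    """Keep only the texts that every filter in the cascade labels `target`.

    The first N-1 filters are meant to favor recall (they rarely reject true
    GSW); the last one favors precision. A text is discarded as soon as any
    filter predicts a non-target label.
    """
    kept = list(texts)
    for predict in filters:
        kept = [t for t in kept if predict(t) == target]
    return kept


# Hypothetical toy filters (NOT the paper's models): a tiny GSW word list.
GSW_HINTS = {"isch", "nöd", "chli", "gsi"}


def recall_filter(text: str) -> str:
    # Lenient: accept a GSW hint word OR an informal lower-case start.
    words = set(text.lower().split())
    return "gsw" if words & GSW_HINTS or text[:1].islower() else "de"


def precision_filter(text: str) -> str:
    # Strict: require at least one GSW hint word.
    words = set(text.lower().split())
    return "gsw" if words & GSW_HINTS else "de"


tweets = [
    "das isch nöd so schlimm",   # GSW
    "Das ist nicht so schlimm",  # German
    "hüt isch es chli chalt",    # GSW
    "lol heute ist es kalt",     # informal German
]
print(cascade_filter(tweets, [recall_filter]))
print(cascade_filter(tweets, [recall_filter, precision_filter]))
```

With only the lenient filter, the informal German tweet survives as a false positive; appending the stricter filter removes it while both GSW tweets still pass. Because a text is kept only if every filter accepts it, the pipeline's GSW recall can never exceed that of its weakest filter, which is why the earlier filters are the ones trained to favor recall.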
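The reported scores (an F1-score of 0.982 for the pipeline, a recall of 0.362 for the fastText LID) are GSW-vs-rest figures, while both filters are multi-class classifiers, so their outputs have to be collapsed to GSW vs. non-GSW before scoring. A minimal sketch, assuming that every non-GSW label counts as the negative class, so confusions among non-GSW languages (e.g. German predicted as Dutch) do not affect the score; the helper name `gsw_prf` is hypothetical.

```python
from typing import Sequence, Tuple


def gsw_prf(gold: Sequence[str], pred: Sequence[str],
            target: str = "gsw") -> Tuple[float, float, float]:
    """Precision, recall and F1 of `target` against all other labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    # Collapse multi-class labels: only "target" vs. "not target" matters.
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = ["gsw", "gsw", "gsw", "gsw", "de", "de"]
pred = ["gsw", "gsw", "de", "gsw", "gsw", "nl"]  # last error: German vs. Dutch
print(gsw_prf(gold, pred))
```

Here three GSW tweets are found, one is missed and one German tweet is accepted, giving 0.75 for all three scores; the final German-as-Dutch error is a true negative under this collapse.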
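Section 2 states that roughly balanced classes worked better than a highly imbalanced GSW vs. non-GSW binary set, but this section does not say how the balance was obtained. One simple option, assumed here purely for illustration, is random downsampling of every label to a common size; `balance_by_downsampling` and its `per_class` cap are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def balance_by_downsampling(samples: Sequence[Tuple[str, str]],
                            per_class: Optional[int] = None,
                            seed: int = 0) -> List[Tuple[str, str]]:
    """Return a roughly class-balanced subset of (text, label) pairs.

    Each label is randomly downsampled to at most `per_class` examples;
    by default `per_class` is the size of the smallest class.
    """
    if not samples:
        return []
    by_label: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    if per_class is None:
        per_class = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    balanced: List[Tuple[str, str]] = []
    for label in sorted(by_label):
        group = by_label[label]
        balanced.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(balanced)
    return balanced


corpus = ([(f"de {i}", "de") for i in range(100)]
          + [(f"gsw {i}", "gsw") for i in range(10)]
          + [(f"nl {i}", "nl") for i in range(30)])
balanced = balance_by_downsampling(corpus)
print(len(balanced))  # 30: three labels with 10 examples each
```

Downsampling discards data from the large classes; oversampling GSW, or a larger `per_class` cap that leaves small classes untouched, are alternatives, and which strategy the authors used is not stated here.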
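For the second filter (Section 2.2), a fastText supervised classifier is trained from a plain-text file in which every line holds one example, with its label marked by the `__label__` prefix (fastText's default). The sketch below only prepares such lines; the helper name `to_fasttext_lines` is hypothetical, and the whitespace normalization is an assumption rather than a preprocessing step reported in the paper.

```python
from typing import Iterable, List, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_lines(samples: Iterable[Tuple[str, str]]) -> List[str]:
    """Format (text, label) pairs as fastText supervised-training lines."""
    lines = []
    for text, label in samples:
        # One example per line: collapse newlines, tabs and runs of spaces.
        clean = " ".join(text.split())
        lines.append(f"{LABEL_PREFIX}{label} {clean}")
    return lines


examples = [("hüt isch es\nchli chalt", "gsw"), ("Das ist   gut", "de")]
for line in to_fasttext_lines(examples):
    print(line)
```

Written to a file such as `train.txt`, these lines could then be passed to the official Python bindings, e.g. `fasttext.train_supervised(input="train.txt")`; the hyperparameters the authors used are not given in this section.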