Spoken Dialect Identification in Twitter Using a Multi-Filter Architecture
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
ALEMANA GERMAN, ALEMÁN, ALLEMAND Language
ALEMANA GERMAN, ALEMÁN, ALLEMAND Language family: Indo-European, Germanic, West, High German, German, Middle German, East Middle German. Language codes: ISO 639-1 de ISO 639-2 ger (ISO 639-2/B) deu (ISO 639-2/T) ISO 639-3 Variously: deu – Standard German gmh – Middle High german goh – Old High German gct – Aleman Coloniero bar – Austro-Bavarian cim – Cimbrian geh – Hutterite German kksh – Kölsch nds – Low German sli – Lower Silesian ltz – Luxembourgish vmf – Main-Franconian mhn – Mócheno pfl – Palatinate German pdc – Pennsylvania German pdt – Plautdietsch swg – Swabian German gsw – Swiss German uln – Unserdeutssch sxu – Upper Saxon wae – Walser German wep – Westphalian Glotolog: high1287. Linguasphere: [show] Beste izen batzuk (autoglotonimoa: Deutsch). deutsch alt german, standard [GER]. german, standard [GER] hizk. Alemania; baita AEB, Arabiar Emirerri Batuak, Argentina, Australia, Austria, Belgika, Bolivia, Bosnia-Herzegovina, Brasil, Danimarka, Ekuador, Errumania, Errusia (Europa), Eslovakia, Eslovenia, Estonia, Filipinak, Finlandia, Frantzia, Hegoafrika, Hungaria, Italia, Kanada, Kazakhstan, Kirgizistan, Liechtenstein, Luxenburgo, Moldavia, Namibia, Paraguai, Polonia, Puerto Rico, Suitza, Tajikistan, Uzbekistan, Txekiar Errepublika, Txile, Ukraina eta Uruguain ere. Dialektoa: erzgebirgisch. Hizkuntza eskualde erlazionatuenak dira Bavarian, Schwäbisch, Allemannisch, Mainfränkisch, Hessisch, Palatinian, Rheinfränkisch, Westfälisch, Saxonian, Thuringian, Brandenburgisch eta Low saxon. Aldaera asko ez dira ulerkorrak beren artean. high -
Syntactic Transformations for Swiss German Dialects
Syntactic transformations for Swiss German dialects Yves Scherrer LATL Universite´ de Geneve` Geneva, Switzerland [email protected] Abstract These transformations are accomplished by a set of hand-crafted rules, developed and evaluated on While most dialectological research so far fo- the basis of the dependency version of the Standard cuses on phonetic and lexical phenomena, we German TIGER treebank. Ultimately, the rule set use recent fieldwork in the domain of dia- lect syntax to guide the development of mul- can be used either as a tool for treebank transduction tidialectal natural language processing tools. (i.e. deriving Swiss German treebanks from Stan- In particular, we develop a set of rules that dard German ones), or as the syntactic transfer mod- transform Standard German sentence struc- ule of a transfer-based machine translation system. tures into syntactically valid Swiss German After the discussion of related work (Section 2), sentence structures. These rules are sensitive we present the major syntactic differences between to the dialect area, so that the dialects of more Standard German and Swiss German dialects (Sec- than 300 towns are covered. We evaluate the tion 3). We then show how these differences can transformation rules on a Standard German treebank and obtain accuracy figures of 85% be covered by a set of transformation rules that ap- and above for most rules. We analyze the most ply to syntactically annotated Standard German text, frequent errors and discuss the benefit of these such as found in treebanks (Section 4). In Section transformations for various natural language 5, we give some coverage figures and discuss the processing tasks. -
Morphological Analysis and Lemmatization for Swiss German Using Weighted Transducers
Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016) Morphological analysis and lemmatization for Swiss German using weighted transducers Reto Baumgartner University of Zurich [email protected] Abstract haar and Wyler, 1997, p. 37). Conversely it pos- sesses infinitive particles that are not known to StG. With written Swiss German becoming SwG consists of different local dialects that more popular in everyday use, it has be- mainly differ in phonology and to a lesser extent in come a target for text processing. The ab- vocabulary. There is no standard orthography, but sence of a standard orthography and the there are proposals for sound-character assignment variety of dialects, however, lead to a vast like Dieth-Schreibung (Dieth, 1986) or Bärndüt- variation in different spellings which makes schi Schrybwys (Marti, 1985a) that are, however, this task difficult. We built a system based not known to everyone. This results in a high vari- on weighted transducers that recognizes ability of spellings influenced by both dialects and over 90% of the tokens in certain texts. personal writing preferences. As an example for Weights ensure preferring the best analysis the StG word Jahr “year”, we found in our corpus for most words while at the same time al- Jahr and Jaar, Johr and Joor, even Joh for different lowing for very broad range of spelling pronunciations and spelling preferences. variations. Our morphological tagset that The lack of a standard orthography and the vast- we defined for this purpose and lemmas in ness of variants motivate the choices that have to Standard German open the possibility for be made to process these dialects. -
Partitive Article
Book Disentangling bare nouns and nominals introduced by a partitive article IHSANE, Tabea (Ed.) Abstract The volume Disentangling Bare Nouns and Nominals Introduced by a Partitive Article, edited by Tabea Ihsane, focuses on different aspects of the distribution, semantics, and internal structure of nominal constituents with a “partitive article” in its indefinite interpretation and of potentially corresponding bare nouns. It further deals with diachronic issues, such as grammaticalization and evolution in the use of “partitive articles”. The outcome is a snapshot of current research into “partitive articles” and the way they relate to bare nouns, in a cross-linguistic perspective and on new data: the research covers noteworthy data (fieldwork data and corpora) from Standard languages - like French and Italian, but also German - to dialectal and regional varieties, including endangered ones like Francoprovençal. Reference IHSANE, Tabea (Ed.). Disentangling bare nouns and nominals introduced by a partitive article. Leiden ; Boston : Brill, 2020 DOI : 10.1163/9789004437500 Available at: http://archive-ouverte.unige.ch/unige:145202 Disclaimer: layout of this document may differ from the published version. 1 / 1 Disentangling Bare Nouns and Nominals Introduced by a Partitive Article - 978-90-04-43750-0 Downloaded from PubFactory at 10/29/2020 05:18:23PM via Bibliotheque de Geneve, Bibliotheque de Geneve, University of Geneva and Universite de Geneve Syntax & Semantics Series Editor Keir Moulton (University of Toronto, Canada) Editorial Board Judith Aissen (University of California, Santa Cruz) – Peter Culicover (The Ohio State University) – Elisabet Engdahl (University of Gothenburg) – Janet Fodor (City University of New York) – Erhard Hinrichs (University of Tubingen) – Paul M. -
THE SWISS GERMAN LANGUAGE and IDENTITY: STEREOTYPING BETWEEN the AARGAU and the ZÜRICH DIALECTS by Jessica Rohr
Purdue University Purdue e-Pubs Open Access Theses Theses and Dissertations 12-2016 The wS iss German language and identity: Stereotyping between the Aargau and the Zurich dialects Jessica Rohr Purdue University Follow this and additional works at: https://docs.lib.purdue.edu/open_access_theses Part of the Anthropological Linguistics and Sociolinguistics Commons, and the Language Interpretation and Translation Commons Recommended Citation Rohr, Jessica, "The wS iss German language and identity: Stereotyping between the Aargau and the Zurich dialects" (2016). Open Access Theses. 892. https://docs.lib.purdue.edu/open_access_theses/892 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. THE SWISS GERMAN LANGUAGE AND IDENTITY: STEREOTYPING BETWEEN THE AARGAU AND THE ZÜRICH DIALECTS by Jessica Rohr A Thesis Submitted to the Faculty of Purdue University In Partial Fulfillment of the Requirements for the degree of Master of Arts Department of Languages and Cultures West Lafayette, Indiana December 2016 ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF THESIS APPROVAL Dr. John Sundquist, Chair Department of German and Russian Dr. Daniel J. Olson Department of Spanish Dr. Myrdene Anderson Department of Anthropology Approved by: Dr. Madeleine M Henry Head of the Departmental Graduate Program iii To my Friends and Family iv ACKNOWLEDGMENTS I would like to thank my major professor, Dr. Sundquist, and my committee members, Dr. Olson, and Dr. Anderson, for their support and guidance during this process. Your guidance kept me motivated and helped me put the entire project together, and that is greatly appreciated. -
Polarity-Reversing Affirmative Particles
Polarity-reversing Affirmative Particles A Feature of Standard Average European (SAE) Elena Vera Moser Department of Linguistics Independent Project for the Degree of Master 30 HEC Typology and Linguistic Diversity Spring 2019 Examiner: Mats Wirén Supervisor: Ljuba Veselinova Expert reviewer: Henrik Liljegren Polarity-reversing Affirmative Particles A Feature of Standard Average European (SAE) Abstract Polarity-reversing affirmative particles are a phenomenon that has largely been overlooked in previous research. A polarity-reversing affirmative particle is used to express disagree- ment with the polarity of a preceding negative statement. It is a typical answer strategy in Swedish, German, Dutch as well as in French. In fact, findings from previous cross-linguistic studies suggest, though without further detail, that polarity-reversing affirmative particles are a phenomenon predominantly found in European and more specifically in Germanic languages (Da Milano 2004; Roelofsen & Farkas 2015; Moser 2018). The aim of this study is to examine the hypotheses presented in Moser (2018). The goals are to investigate the distribution of polarity- reversing affirmative particles in Europe on the one hand, and to examine the phenomenon in Swedish, German, Dutch and French from a diachronic perspective on the other hand. On the basis of these endeavors, this study is embedded in the framework of areal typology. This study brings forth highly interesting findings in view of the discussion of Standard Average European and the Charlemagne Sprachbund. Keywords: polarity-reversing affirmative particle, linguistic area, European languages Polaritetsomvändande Affirmativa Partiklar Ett Kännetecken av Standard Average European (SAE) Sammanfattning Polarity-reversing affirmative particles (sv. polaritetsomvändande affirmativa partiklar) är ett fenomen som har örbisetts i tidigare forskning. -
Indo-European, Germanic, West, High German, German, Middle German, East Middle German
1 ALEMANA GERMAN, ALEMÁN, ALLEMAND Language family: Indo-European, Germanic, West, High German, German, Middle German, East Middle German. Language codes: ISO 639-1 de ISO 639-2 ger (ISO 639-2/B) deu (ISO 639-2/T) ISO 639-3 Variously: deu – Standard German gmh – Middle High german goh – Old High German gct – Aleman Coloniero bar – Austro-Bavarian cim – Cimbrian geh – Hutterite German kksh – Kölsch nds – Low German sli – Lower Silesian ltz – Luxembourgish vmf – Main-Franconian mhn – Mócheno pfl – Palatinate German pdc – Pennsylvania German pdt – Plautdietsch swg – Swabian German gsw – Swiss German uln – Unserdeutssch sxu – Upper Saxon wae – Walser German wep – Westphalian Glotolog: high1287. Linguasphere: [show] 2 Beste izen batzuk (autoglotonimoa: Deutsch). deutsch alt german, standard [GER]. german, standard [GER] hizk. Alemania; baita AEB, Arabiar Emirerri Batuak, Argentina, Australia, Austria, Belgika, Bolivia, Bosnia-Herzegovina, Brasil, Danimarka, Ekuador, Errumania, Errusia (Europa), Eslovakia, Eslovenia, Estonia, Filipinak, Finlandia, Frantzia, Hegoafrika, Hungaria, Italia, Kanada, Kazakhstan, Kirgizistan, Liechtenstein, Luxenburgo, Moldavia, Namibia, Paraguai, Polonia, Puerto Rico, Suitza, Tajikistan, Uzbekistan, Txekiar Errepublika, Txile, Ukraina eta Uruguain ere. Dialektoa: erzgebirgisch. Hizkuntza eskualde erlazionatuenak dira Bavarian, Schwäbisch, Allemannisch, Mainfränkisch, Hessisch, Palatinian, Rheinfränkisch, Westfälisch, Saxonian, Thuringian, Brandenburgisch eta Low saxon. Aldaera asko ez dira ulerkorrak beren artean. -
Searching for the Bilingual Advantage in Executive Functions in Speakers of Hunsrückisch and Other Minority Languages: a Literature Review
Searching for the bilingual advantage in executive functions in speakers of Hunsrückisch and other minority languages: a literature review Elisabeth Abreu and Bernardo Limberger (Rio Grande do Sul) Abstract The issue of a bilingual advantage on executive functions has been a hotbed for research and debate. Brazilian studies with the minority language Hunsrückisch have failed to replicate the finding of a bilingual advantage found in international studies with majority languages. This raises the question of whether the reasons behind the discrepant results are related to the lan- guage’s minority status. The goal of this paper was to investigate the bilingual advantage on executive functions in studies with minority languages. This is a literature review focusing on state of the art literature on bilingualism and executive functions; as well as a qualitative anal- ysis of the selected corpus in order to tackle the following questions: (1) is there evidence of a bilingual advantage in empirical studies involving minority languages? (2) are there common underlying causes for the presence or absence of the bilingual advantage? (3) can factors per- taining to the language’s minority status be linked to the presence or absence of a bilingual advantage? The analysis revealed that studies form a highly variable group, with mixed results regarding the bilingual advantage, as well as inconsistent controlling of social, cognitive and linguistic factors, as well as different sample sizes. As such, it was not possible to isolate which factors are responsible for the inconsistent results across studies. It is hoped this study will provide an overview that can serve as common ground for future studies involving the issue of a bilingual advantage with speakers of minority languages. -
Comparative Constructions Across the German Minorities of Italy: a Semasiological Approach
Linguistic Typology at the Crossroads 1-1 (2021): 288-332 Comparative constructions across the German minorities of Italy: a semasiological approach LIVIO GAETA1 1DEPARTMENT OF HUMANITIES, UNIVERSITY OF TURIN Submitted: 25/11/2020 Revised version: 16/06/2021 Accepted: 22/06/2021 Published: 31/08/2021 Abstract Comparative constructions of inequality display a recurrent pattern throughout all Germanic languages, which is partially inherited from the Indo-European mother tongue. This common semasiological format consists in a copulative construction in which the adjective expressing the quality carries a comparative suffix and is accompanied by a particle introducing the standard. For the latter, a morpheme coming from various onomasiological domains is generally recruited. After a general overview of the construction within the Germanic family, the paper will focus on its consistency in the German linguistic islands of Northern Italy, where a remarkable variety is found, which is only partially due to the long-standing contact with Romance languages. Besides an overview of the Bavarian islands of the North-East, particular attention is devoted to the Walser German islands of the North-West, where a number of peculiar patterns are found, which partially reflect structural possibilities attested in earlier stages of the German-speaking territory, but also display unique developments such as for instance the comparative particle ŝchu ‘so’ found in Rimella. Keywords: comparative construction; semasiology; onomasiology; language minority; linguistic island; language contact. 1. Introduction Comparative Constructions of Inequality (= CCI) display a recurrent pattern throughout all Germanic languages, which is partially inherited from the Indo- European mother tongue and corresponds to the other cognates of the family. -
Kashubian INDO-IRANIAN IRANIAN INDO-ARYAN WESTERN
2/27/2018 https://upload.wikimedia.org/wikipedia/commons/4/4f/IndoEuropeanTree.svg INDO-EUROPEAN ANATOLIAN Luwian Hittite Carian Lydian Lycian Palaic Pisidian HELLENIC INDO-IRANIAN DORIAN Mycenaean AEOLIC INDO-ARYAN Doric Attic ACHAEAN Aegean Northwest Greek Ionic Beotian Vedic Sanskrit Classical Greek Arcado Thessalian Tsakonian Koine Greek Epic Greek Cypriot Sanskrit Prakrit Greek Maharashtri Gandhari Shauraseni Magadhi Niya ITALIC INSULAR INDIC Konkani Paisaci Oriya Assamese BIHARI CELTIC Pali Bengali LATINO-FALISCAN SABELLIC Dhivehi Marathi Halbi Chittagonian Bhojpuri CONTINENTAL Sinhalese CENTRAL INDIC Magahi Faliscan Oscan Vedda Maithili Latin Umbrian Celtiberian WESTERN INDIC HINDUSTANI PAHARI INSULAR Galatian Classical Latin Aequian Gaulish NORTH Bhil DARDIC Hindi Urdu CENTRAL EASTERN Vulgar Latin Marsian GOIDELIC BRYTHONIC Lepontic Domari Ecclesiastical Latin Volscian Noric Dogri Gujarati Kashmiri Haryanvi Dakhini Garhwali Nepali Irish Common Brittonic Lahnda Rajasthani Nuristani Rekhta Kumaoni Palpa Manx Ivernic Potwari Romani Pashayi Scottish Gaelic Pictish Breton Punjabi Shina Cornish Sindhi IRANIAN ROMANCE Cumbric ITALO-WESTERN Welsh EASTERN Avestan WESTERN Sardinian EASTERN ITALO-DALMATIAN Corsican NORTH SOUTH NORTH Logudorese Aromanian Dalmatian Scythian Sogdian Campidanese Istro-Romanian Istriot Bactrian CASPIAN Megleno-Romanian Italian Khotanese Romanian GALLO-IBERIAN Neapolitan Ossetian Khwarezmian Yaghnobi Deilami Sassarese Saka Gilaki IBERIAN Sicilian Sarmatian Old Persian Mazanderani GALLIC SOUTH Shahmirzadi Alanic -
Toward an Orthography: the Textualization of Swiss German
Toward an Orthography: The Textualization of Swiss German Yiwei Luo This paper discusses the future of Swiss German. Currently Swiss German is considered a variety of the Standard German dialect because it is not a written language. The author argues, however, that Swiss German has a much larger presence currently than being a variety. In a number of examples shown, Swiss German is growing in numbers of speakers and in ways it is used. Because of this, there are more instances of written Swiss German being produced. There are currently a few online sources dedicated to standardizing Swiss German spelling, yet the future of Swiss Ger- man is uncertain. An Introduction A motif of human civilization has been to accord authority com mensurate to age. Writing, however, stands out as an exception: de spite being predated by spoken language by eons, it is writing that commands greater respect. For many centuries, people considered a written form to be the key determiner for language hood, and that forms restricted to the domain of speech were merely “vernac ulars.”1 This particular belief prevailed in the West until about a thousand years ago when vernaculars began to slowly gain prestige and find their way onto paper.2 However, traces of this former bias persist where Alemannic dialects are concerned. Swiss German— though mutually unintelligible with Standard German—is not considered a real language due to its lack of presence in writing. Native speakers of Standard German recognize the Swiss variety as distinct, but in contrasting it with “real German,” “actual German,” and “proper German” (just to name a few monikers for Standard German given by a Berliner), their views on the subject become clear: Swiss German is not a language. -
The Intergenerational Transmission of Catalan in Alghero Chessa, Enrico
Another case of language death? The intergenerational transmission of Catalan in Alghero Chessa, Enrico For additional information about this publication click this link. http://qmro.qmul.ac.uk/jspui/handle/123456789/2502 Information about this research object was correct at the time of download; we occasionally make corrections to records, please therefore check the published record when citing. For more information contact [email protected] Another case of language death? The intergenerational transmission of Catalan in Alghero Enrico Chessa Thesis submitted for the qualification of Doctor of Philosophy (PhD) Queen Mary, University of London 2011 1 The work presented in this thesis is the candidate’s own. 2 for Fregenet 3 Table of Contents Abstract .................................................................................................................................... 8 Acknowledgements .................................................................................................................. 9 Abbreviations ......................................................................................................................... 11 List of Figures ........................................................................................................................ 12 List of Tables ......................................................................................................................... 15 Chapter 1: Introduction .........................................................................................................