Measuring Boundaries in the Dialect Continuum
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Adapting MARBERT for Improved Arabic Dialect Identification
Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task Badr AlKhamissi∗ Mohamed Gabr∗ Independent Microsoft EGDC badr [at] khamissi.com mohamed.gabr [at] microsoft.com Muhammed ElNokrashy Khaled Essam Microsoft EGDC Mendel.ai muelnokr [at] microsoft.com khaled.essam [at] mendel.ai Abstract syntax, morphology, vocabulary, and even orthog- raphy. Dialects may be heavily influenced by pre- In this paper, we tackle the Nuanced Ara- viously dominant local languages. For example, bic Dialect Identification (NADI) shared task Egyptian variants are influenced by the Coptic lan- (Abdul-Mageed et al., 2021) and demonstrate guage, while Sudanese variants are influenced by state-of-the-art results on all of its four sub- the Nubian language. tasks. Tasks are to identify the geographic ori- gin of short Dialectal (DA) and Modern Stan- In this paper, we study the classification of such dard Arabic (MSA) utterances at the levels of variants and describe our model that achieves state- both country and province. Our final model is of-the-art results on all of the four Nuanced Ara- an ensemble of variants built on top of MAR- bic Dialect Identification (NADI) subtasks (Abdul- BERT that achieves an F1-score of 34:03% for Mageed et al., 2021). The task focuses on distin- DA at the country-level development set—an guishing both MSA and DA by their geographi- improvement of 7:63% from previous work. cal origin at both the country and province levels. The data is a collection of tweets covering 100 1 Introduction provinces from 21 Arab countries. -
Linguistics Development Team
Development Team Principal Investigator: Prof. Pramod Pandey Centre for Linguistics / SLL&CS Jawaharlal Nehru University, New Delhi Email: [email protected] Paper Coordinator: Prof. K. S. Nagaraja Department of Linguistics, Deccan College Post-Graduate Research Institute, Pune- 411006, [email protected] Content Writer: Prof. K. S. Nagaraja Prof H. S. Ananthanarayana Content Reviewer: Retd Prof, Department of Linguistics Osmania University, Hyderabad 500007 Paper : Historical and Comparative Linguistics Linguistics Module : Indo-Aryan Language Family Description of Module Subject Name Linguistics Paper Name Historical and Comparative Linguistics Module Title Indo-Aryan Language Family Module ID Lings_P7_M1 Quadrant 1 E-Text Paper : Historical and Comparative Linguistics Linguistics Module : Indo-Aryan Language Family INDO-ARYAN LANGUAGE FAMILY The Indo-Aryan migration theory proposes that the Indo-Aryans migrated from the Central Asian steppes into South Asia during the early part of the 2nd millennium BCE, bringing with them the Indo-Aryan languages. Migration by an Indo-European people was first hypothesized in the late 18th century, following the discovery of the Indo-European language family, when similarities between Western and Indian languages had been noted. Given these similarities, a single source or origin was proposed, which was diffused by migrations from some original homeland. This linguistic argument is supported by archaeological and anthropological research. Genetic research reveals that those migrations form part of a complex genetical puzzle on the origin and spread of the various components of the Indian population. Literary research reveals similarities between various, geographically distinct, Indo-Aryan historical cultures. The Indo-Aryan migrations started in approximately 1800 BCE, after the invention of the war chariot, and also brought Indo-Aryan languages into the Levant and possibly Inner Asia. -
On Burgenland Croatian Isoglosses Peter
Dutch Contributions to the Fourteenth International Congress of Slavists, Ohrid: Linguistics (SSGL 34), Amsterdam – New York, Rodopi, 293-331. ON BURGENLAND CROATIAN ISOGLOSSES PETER HOUTZAGERS 1. Introduction Among the Croatian dialects spoken in the Austrian province of Burgenland and the adjoining areas1 all three main dialect groups of central South Slavic2 are represented. However, the dialects have a considerable number of characteris- tics in common.3 The usual explanation for this is (1) the fact that they have been neighbours from the 16th century, when the Ot- toman invasions caused mass migrations from Croatia, Slavonia and Bos- nia; (2) the assumption that at least most of them were already neighbours before that. Ad (1) Map 14 shows the present-day and past situation in the Burgenland. The different varieties of Burgenland Croatian (henceforth “BC groups”) that are spoken nowadays and from which linguistic material is available each have their own icon. 5 1 For the sake of brevity the term “Burgenland” in this paper will include the adjoining areas inside and outside Austria where speakers of Croatian dialects can or could be found: the prov- ince of Niederösterreich, the region around Bratislava in Slovakia, a small area in the south of Moravia (Czech Republic), the Hungarian side of the Austrian-Hungarian border and an area somewhat deeper into Hungary east of Sopron and between Bratislava and Gyǡr. As can be seen from Map 1, many locations are very far from the Burgenland in the administrative sense. 2 With this term I refer to the dialect continuum formerly known as “Serbo-Croatian”. -
Dialects of Spanish and Portuguese
30 Dialects of Spanish and Portuguese JOHN M. LIPSKI 30.1 Basic Facts 30.1.1 Historical Development Spanish and Portuguese are closely related Ibero‐Romance languages whose origins can be traced to the expansion of the Latin‐speaking Roman Empire to the Iberian Peninsula; the divergence of Spanish and Portuguese began around the ninth century. Starting around 1500, both languages entered a period of global colonial expansion, giving rise to new vari- eties in the Americas and elsewhere. Sources for the development of Spanish and Portuguese include Lloyd (1987), Penny (2000, 2002), and Pharies (2007). Specific to Portuguese are fea- tures such as the retention of the seven‐vowel system of Vulgar Latin, elision of intervocalic /l/ and /n/ and the creation of nasal vowels and diphthongs, the creation of a “personal” infinitive (inflected for person and number), and retention of future subjunctive and pluper- fect indicative tenses. Spanish, essentially evolved from early Castilian and other western Ibero‐Romance dialects, is characterized by loss of Latin word‐initial /f‐/, the diphthongiza- tion of Latin tonic /ɛ/ and /ɔ/, palatalization of initial C + L clusters to /ʎ/, a complex series of changes to the sibilant consonants including devoicing and the shift of /ʃ/ to /x/, and many innovations in the pronominal system. 30.1.2 The Spanish Language Worldwide Reference grammars of Spanish include Bosque (1999a), Butt and Benjamin (2011), and Real Academia Española (2009–2011). The number of native or near‐native Spanish speakers in the world is estimated to be around 500 million. In Europe, Spanish is the official language of Spain, a quasi‐official language of Andorra and the main vernacular language of Gibraltar; it is also spoken in adjacent parts of Morocco and in Western Sahara, a former Spanish colony. -
Voicing Distinctions in the Dutch-German Dialect Continuum
Voicing distinctions in the Dutch-German dialect continuum Nina Ouddeken Meertens Instituut This study investigates the phonetics and phonology of voicing distinctions in the Dutch-German dialect continuum, which forms a transition zone between voicing and aspiration systems. Two phonological approaches to represent this contrast exist in the literature: a [±voice] approach and Laryngeal Realism. The implementation of the change between the two language types in the transition zone will provide new insights in the nature of the phonological representa- tion of the contrast. In this paper I will locate the transition zone by looking at phonetic overlap between VOT values of fortis and lenis plosives, and I will compare the two phonological approaches, showing that both face analytical problems as they cannot explain the variation observed in word-initial plosives and plosive clusters. Keywords: dialectology, phonology, phonetics, Laryngeal Realism, transition zones 1. Introduction Voicing distinctions in plosive systems have been studied extensively in phonolo- gy. Lisker & Abramson (1964) discovered that the phonetic realisations of ‘voiced’ and ‘voiceless’ plosives are not identical across languages: in word-initial posi- tion fortis and lenis plosives can be distinguished in different ways with respect to Voice Onset Time (VOT; describing the onset of vocal fold vibration relative to the moment of plosive release). Languages contrasting two plosive series typically contrast prevoiced plosives with plain voiceless plosives (voicing languages), or plain voiceless plosives with voiceless aspirated plosives (aspiration languages). The boundary between plain voiceless and aspirated plosives is placed around 20– 35 msec (depending on Place of Articulation (PoA)) by Keating (1984): Linguistics in the Netherlands 2016, 106–120. -
Exploring Occitan and Francoprovençal in Rhône-Alpes, France Michel Bert, Costa James
What counts as a linguistic border, for whom, and with what implications? Exploring Occitan and Francoprovençal in Rhône-Alpes, France Michel Bert, Costa James To cite this version: Michel Bert, Costa James. What counts as a linguistic border, for whom, and with what implications? Exploring Occitan and Francoprovençal in Rhône-Alpes, France. Dominic Watt; Carmen Llamas. Language, Borders and Identity, Edinburgh University Press, 2014, Language, Borders and Identity, 0748669779. halshs-01413325 HAL Id: halshs-01413325 https://halshs.archives-ouvertes.fr/halshs-01413325 Submitted on 9 Dec 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. What counts as a linguistic border, for whom, and with what implications? Exploring Occitan and Francoprovençal in Rhône-Alpes, France Michel Bert (DDL, Université Lumière/Lyon2) [email protected] James Costa (ICAR, Institut français de l’éducation/ENS de Lyon) [email protected] 1. Introduction Debates on the limits of the numerous Romance varieties spoken in what was once the western part of the Roman Empire have been rife for over a century (e.g. Bergounioux, 1989), and generally arose in the context of heated discussions over the constitution and legitimation of Nation-states. -
87 LANGUAGE, PEOPLE, SALIENCE, SPACE: PERCEPTUAL DIALECTOLOGY and LANGUAGE REGARD Dennis R. Preston Oklahoma State University, U
Dialectologia 5 (2010), 87-131. LANGUAGE, PEOPLE, SALIENCE, SPACE: PERCEPTUAL DIALECTOLOGY AND LANGUAGE REGARD Dennis R. Preston Oklahoma State University, USA [email protected] Abstract This paper explores the not novel idea that popular notions of the geographical distribution and status of linguistic facts are related to beliefs about the speakers of regional varieties but goes on to develop an approach to the underlying cognitive mechanisms that are employed when such connections are made. A detailed procedural account of the awakening of a response, called one of language regard , is given, as well as a structural account of an underlying attitudinal cognitorium with regard to popular beliefs about United States’ Southerners. A number of studies illustrating a variety of research tools in determining the connections, the response mechanisms, and the underlying structures of belief are provided. Keywords language regard, language attitudes, attitudinal cognitorium, perceptual dialectology, implicit and explicit attitudes 1. Introduction It is important to people whose major interest is in language variation to know where the land lies and how it is shaped and filled up — the facts of physical geography — because such matters have an influence on language. Sarah Thomason notes, for example, that Swahili will probably borrow no linguistic features from Pirahã (2001: 78) because speakers of the two languages are widely separated geographically and unlikely to run into one another. Introductory linguistics commonplaces about blocking (mountains, rivers, etc.) and facilitating (passes, rivers, etc.) geographical elements with regard to language and variety contact are also well known. There has even been suspicion that physical facts about geography are directly, even causally, related to 87 ©Universitat de Barcelona Dennis R. -
Why Major in Linguistics (And What Does a Linguist Do)? by Monica Macaulay and Kristen Syrett
1 Why Major in Linguistics (and what does a linguist do)? by Monica Macaulay and Kristen Syrett What is linguistics? Speakers of all languages know a lot about their languages, usually without knowing that they know If you are considering becoming a linguistics it. For example, as a speaker of English, you major, you probably know something about the possess knowledge about English word order. field of linguistics already. However, you may find Perhaps without even knowing it, you understand it hard to answer people who ask you, "What that Sarah admires the teacher is grammatical, exactly is linguistics, and what does a linguist do?" while Admires Sarah teacher the is not, and also They might assume that it means you speak a lot of that The teacher admires Sarah means something languages. And they may be right: you may, in entirely different. You know that when you ask a fact, be a polyglot! But while many linguists do yes-no question, you may reverse the order of speak multiple languages—or at least know a fair words at the beginning of the sentence and that the bit about multiple languages—the study of pitch of your voice goes up at the end of the linguistics means much more than this. sentence (for example, in Are you going?). Linguistics is the scientific study of language, and However, if you speak French, you might add est- many topics are studied under this umbrella. At the ce que at the beginning, and if you know American heart of linguistics is the search for the unconscious knowledge that humans have about language and how it is that children acquire it, an understanding of the structure of language in general and of particular languages, knowledge about how languages vary, and how language influences the way in which we interact with each other and think about the world. -
INTELLIGIBILITY of STANDARD GERMAN and LOW GERMAN to SPEAKERS of DUTCH Charlotte Gooskens1, Sebastian Kürschner2, Renée Van Be
INTELLIGIBILITY OF STANDARD GERMAN AND LOW GERMAN TO SPEAKERS OF DUTCH Charlotte Gooskens 1, Sebastian Kürschner 2, Renée van Bezooijen 1 1University of Groningen, The Netherlands 2 University of Erlangen-Nürnberg, Germany [email protected], [email protected], [email protected] Abstract This paper reports on the intelligibility of spoken Low German and Standard German for speakers of Dutch. Two aspects are considered. First, the relative potential for intelligibility of the Low German variety of Bremen and the High German variety of Modern Standard German for speakers of Dutch is tested. Second, the question is raised whether Low German is understood more easily by subjects from the Dutch-German border area than subjects from other areas of the Netherlands. This is investigated empirically. The results show that in general Dutch people are better at understanding Standard German than the Low German variety, but that subjects from the border area are better at understanding Low German than subjects from other parts of the country. A larger amount of previous experience with the German standard variety than with Low German dialects could explain the first result, while proximity on the sound level could explain the second result. Key words Intelligibility, German, Low German, Dutch, Levenshtein distance, language contact 1. Introduction Dutch and German originate from the same branch of West Germanic. In the Middle Ages these neighbouring languages constituted a common dialect continuum. Only when linguistic standardisation came about in connection with nation building did the two languages evolve into separate social units. A High German variety spread out over the German language area and constitutes what is regarded as Modern Standard German today. -
A New Model of Indo–European Subgrouping and Dispersal
A New Model of Indo—European Subgrouping and Dispersal For Prof. Murray Emeneau on his 95th birthday, 28 February 1999 Andrew Garrett University of California, Berkeley 1. Introduction In this century two great discoveries have shaken our view of the Indo—European family tree and protolanguage. The first was the discovery of Hittite, which in turn revealed the existence of an Anatolian branch of Indo—European; the second was the discovery in Central Asia of languages belonging to the previously unknown Tocharian branch of the family. Yet as important as these are, they are not the only twentieth century archaeological finds with Indo—European ramifications. In this paper I will explore the implications of a less dramatic set of discoveries for Indo—European subgrouping. I begin with a question posed in recent work by Johanna Nichols. Like many profound questions, this one is both shockingly obvious and disturbingly obscure: Why does Indo—European have so many branches? Ten are fully documented, and the count rises if you add the so—called ‘minor’ languages – Phrygian, Macedonian, Thracian, Venetic, and others known only through inscriptional remains. This is shown graphically in (1): 1 Celtic Italic Germanic Albanian Greek Indo—European Armenian Anatolian Balto—Slavic Indo—Iranian Tocharian ‘Minor’ languages: Phrygian, Macedonian, etc. Typical subgrouping situations are of two distinct types. One is the family tree with binary or ternary branching. This corresponds to situations where a speech community is separated for some reason, such as population movement into or out of the area it occupies, and the newly separated communities evolve in relative linguistic isolation. -
LNGT0101 Introduction to Linguistics
LNGT0101 Announcements Introduction to Linguistics If you don’t hear from me about your LAP proposal, then you’re good to go. Presentations on Monday: Myth 2: Some languages are just not good enough. Myth 4: French is a logical language. Myth 11: Italian is beautiful; German is ugly. Myth 6: Women talk too much. Lecture #17 Reactions to The Linguists? Nov 9th, 2011 Any questions? Transition from last class Sociolinguistics Sociolinguistics is the study of language in social contexts. It focuses on the language of the speech community and variation within Short clips from ‘Do you speak American?’ with Dennis Preston and Bill Labov. that speech community. There are several sociological variables that affect our usage of language, and sociolinguists are interested in studying linguistic variation correlated with these variables. So, … Variables affecting language use Region. What are some of the sociological variables Ethnicity. that may correlate with linguistic variation? Socio-economic background. Education. Age. Gender. Register/Style Whether or not you know another language. 1 Do you speak American? Northern Cities Vowel Shift http://www.pbs.org/speak/ahead/change/vowelpower/vowel.html The Northern Cities Vowel Shift: Another excerpt from ‘Do you speak American?’ From O’Grady et al 2005, p. 511. The language-dialect distinction The language-dialect distinction So, if one of you grew up in New England and Sociolinguists focus on linguistic diversity another one was born and raised in Georgia, internal to speech communities. One such case you’re still able to understand one another, of linguistic diversity is dialectal variation. despite differences in the language variety So, what’s the difference between a language each of you speaks. -
Geography and Language
Language Contact Linguistics 203 Languages of the World Major Language Families Source: Comrie et al. (2003) Lack of Language Contact • leads to greater differences in related languages/dialects • Reasons for lack of contact • migration • physical boundaries (mountains, rivers, etc.) • political • religious / cultural Language Contact • causes languages to become more similar • shared vocabulary • shared grammatical structures • shared sounds / sound systems • etc English vocabulary • English • Norman Conquest in 1066 • French became language of aristocracy for next 300- 400 years • Large influx of French vocabulary; often more erudite or specialized French origin Anglo-saxon French origin Anglo-saxon combat fight finish end conceal hide gain win cordial hearty mount go up economy thrift perish die egotism selfishness tolerate put up with Sprachbund • A group of unrelated languages which come to share similarities due their proximity and language contact (and not due to common inheritance). Balkan Sprachbund Source: Comrie et al. (2003) Balkan Sprachbund • Languages from four Indo-European families 1. Slavic Serbo-Croatian, Bulgarian, Macedonian 2. Romance Romanian 3. Hellenic Greek 4. Albanian Albanian Balkan Sprachbund • definite article is a suffix (except in Greek) Romanian lup-ul (cf. Spanish) el lobo wolf-the the wolf Bulgarian voda-ta (cf. Russian) ta voda water-the that water Albanian mik-u friend-the Data in ‘Balkan Sprachbund’ section mainly from Comrie et al. (2003) and lecture notes from Matthew Gordon. Balkan Sprachbund • Have case system with Nom (subject), Acc (direct object), Dat (‘to’ + NP), Gen (‘of’ + NP) • Nom+Acc may be merged, and Dat+Gen may be merged Romanian lup-ul-ui (cf. Spanish*) al lobo ‘to the wolf’ wolf-the-to/of del lobo ‘of the wolf’ Bulgarian na starikut (cf.