Geography and Language
Recommended publications
-
Adapting MARBERT for Improved Arabic Dialect Identification
Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task. Badr AlKhamissi* (Independent, badr [at] khamissi.com), Mohamed Gabr* (Microsoft EGDC, mohamed.gabr [at] microsoft.com), Muhammed ElNokrashy (Microsoft EGDC, muelnokr [at] microsoft.com), Khaled Essam (Mendel.ai, khaled.essam [at] mendel.ai). Abstract: In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short Dialectal (DA) and Modern Standard Arabic (MSA) utterances at the levels of both country and province. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA at the country-level development set, an improvement of 7.63% from previous work. 1 Introduction: [...] syntax, morphology, vocabulary, and even orthography. Dialects may be heavily influenced by previously dominant local languages. For example, Egyptian variants are influenced by the Coptic language, while Sudanese variants are influenced by the Nubian language. In this paper, we study the classification of such variants and describe our model that achieves state-of-the-art results on all of the four Nuanced Arabic Dialect Identification (NADI) subtasks (Abdul-Mageed et al., 2021). The task focuses on distinguishing both MSA and DA by their geographical origin at both the country and province levels. The data is a collection of tweets covering 100 provinces from 21 Arab countries. -
Linguistics Development Team
Development Team. Principal Investigator: Prof. Pramod Pandey, Centre for Linguistics / SLL&CS, Jawaharlal Nehru University, New Delhi, Email: [email protected]. Paper Coordinator: Prof. K. S. Nagaraja, Department of Linguistics, Deccan College Post-Graduate Research Institute, Pune 411006, [email protected]. Content Writer: Prof. K. S. Nagaraja. Content Reviewer: Prof. H. S. Ananthanarayana, Retd. Prof., Department of Linguistics, Osmania University, Hyderabad 500007. Description of Module. Subject Name: Linguistics. Paper Name: Historical and Comparative Linguistics. Module Title: Indo-Aryan Language Family. Module ID: Lings_P7_M1. Quadrant 1: E-Text. INDO-ARYAN LANGUAGE FAMILY. The Indo-Aryan migration theory proposes that the Indo-Aryans migrated from the Central Asian steppes into South Asia during the early part of the 2nd millennium BCE, bringing with them the Indo-Aryan languages. Migration by an Indo-European people was first hypothesized in the late 18th century, following the discovery of the Indo-European language family, when similarities between Western and Indian languages had been noted. Given these similarities, a single source or origin was proposed, which was diffused by migrations from some original homeland. This linguistic argument is supported by archaeological and anthropological research. Genetic research reveals that those migrations form part of a complex genetic puzzle on the origin and spread of the various components of the Indian population. Literary research reveals similarities between various geographically distinct Indo-Aryan historical cultures. The Indo-Aryan migrations started in approximately 1800 BCE, after the invention of the war chariot, and also brought Indo-Aryan languages into the Levant and possibly Inner Asia. -
On Burgenland Croatian Isoglosses Peter
Dutch Contributions to the Fourteenth International Congress of Slavists, Ohrid: Linguistics (SSGL 34), Amsterdam – New York, Rodopi, 293-331. ON BURGENLAND CROATIAN ISOGLOSSES. PETER HOUTZAGERS. 1. Introduction. Among the Croatian dialects spoken in the Austrian province of Burgenland and the adjoining areas¹ all three main dialect groups of central South Slavic² are represented. However, the dialects have a considerable number of characteristics in common.³ The usual explanation for this is (1) the fact that they have been neighbours from the 16th century, when the Ottoman invasions caused mass migrations from Croatia, Slavonia and Bosnia; (2) the assumption that at least most of them were already neighbours before that. Ad (1): Map 1⁴ shows the present-day and past situation in the Burgenland. The different varieties of Burgenland Croatian (henceforth “BC groups”) that are spoken nowadays and from which linguistic material is available each have their own icon.⁵ Footnotes: ¹ For the sake of brevity the term “Burgenland” in this paper will include the adjoining areas inside and outside Austria where speakers of Croatian dialects can or could be found: the province of Niederösterreich, the region around Bratislava in Slovakia, a small area in the south of Moravia (Czech Republic), the Hungarian side of the Austrian-Hungarian border and an area somewhat deeper into Hungary east of Sopron and between Bratislava and Győr. As can be seen from Map 1, many locations are very far from the Burgenland in the administrative sense. ² With this term I refer to the dialect continuum formerly known as “Serbo-Croatian”. -
Iberian Imperialism and Language Evolution in Latin America
11. The Ecology of Language Evolution in Latin America: A Haitian Postscript toward a Postcolonial Sequel. Michel DeGraff. While reading the preceding chapters in this volume, on Iberian Imperialism and Language Evolution in Latin America, I kept trading two distinct hats on my bald head: one for the theoretical linguist interested in the cognitive aspects of language contact and language evolution, the other for the MIT professor challenged by social injustice in language policy and education in my native Haiti and other Creole-speaking communities. These communities, like many others in the world, including the United States, still suffer from insidious colonial and neocolonial imperialist prejudices and practices. By the time I finished those chapters, I realized that the two hats are fundamentally made of the same material. As a theoretical linguist, I was fascinated by the contributors’ insightful illustrations of the complexity of language contact in Latin America: complexity in sociohistorical, ecological, and linguistic-structural dimensions. As a Haitian and a Haitian Creole–speaking linguist, I was curious as to how language shift, language change, language endangerment, and (meta-)linguistic correlates of social hierarchies in Iberian America may help us better understand related phenomena in the Caribbean, and vice versa. I’ve used the phrases Latin America and Iberian America with some trepidation, as I realize that the chapters to which I am responding have focused exclusively on areas of Latin America that were colonized by the Spanish or the Portuguese, leaving aside Latin American territories that were or are still under the control of France. -
Voicing Distinctions in the Dutch-German Dialect Continuum
Voicing distinctions in the Dutch-German dialect continuum. Nina Ouddeken, Meertens Instituut. This study investigates the phonetics and phonology of voicing distinctions in the Dutch-German dialect continuum, which forms a transition zone between voicing and aspiration systems. Two phonological approaches to represent this contrast exist in the literature: a [±voice] approach and Laryngeal Realism. The implementation of the change between the two language types in the transition zone will provide new insights in the nature of the phonological representation of the contrast. In this paper I will locate the transition zone by looking at phonetic overlap between VOT values of fortis and lenis plosives, and I will compare the two phonological approaches, showing that both face analytical problems as they cannot explain the variation observed in word-initial plosives and plosive clusters. Keywords: dialectology, phonology, phonetics, Laryngeal Realism, transition zones. 1. Introduction. Voicing distinctions in plosive systems have been studied extensively in phonology. Lisker & Abramson (1964) discovered that the phonetic realisations of ‘voiced’ and ‘voiceless’ plosives are not identical across languages: in word-initial position fortis and lenis plosives can be distinguished in different ways with respect to Voice Onset Time (VOT; describing the onset of vocal fold vibration relative to the moment of plosive release). Languages contrasting two plosive series typically contrast prevoiced plosives with plain voiceless plosives (voicing languages), or plain voiceless plosives with voiceless aspirated plosives (aspiration languages). The boundary between plain voiceless and aspirated plosives is placed around 20–35 msec (depending on Place of Articulation (PoA)) by Keating (1984): Linguistics in the Netherlands 2016, 106–120. -
Language Typology and Sprachliche Universalien La Typologie Des
Language Typology and Language Universals / Sprachtypologie und sprachliche Universalien / La typologie des langues et les universaux linguistiques. An International Handbook / Ein internationales Handbuch / Manuel international. Edited by / Herausgegeben von / Édité par Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, Wolfgang Raible. Volume 2 / 2. Halbband / Tome 2. Walter de Gruyter, Berlin, New York, 2001. 1492 XIV. Typological characterization of language families and linguistic areas. 107. The European linguistic area: Standard Average European. 1. Introduction; 2. The major SAE features; 3. Some further likely SAE features; 4. Degrees of membership in SAE; 5. How did SAE come into being?; 6. Abbreviations of language names; 7. References. 1. Introduction. This article summarizes some of the main pieces of evidence for a linguistic area (or Sprachbund) in Europe that comprises the Romance, Germanic and Balto-Slavic languages, the [...] languages share structural features which cannot be due to retention from a common proto-language and which give these languages a profile that makes them stand out among the surrounding languages. There is thus no minimum number of languages that a linguistic area comprises (pace Stolz 2001a). In principle, there could be a linguistic area consisting of just two languages (though this would be rather uninteresting), and there are also very large (continent-sized) linguistic areas (Dryer 1989a). Likewise, there is no minimum number of structural features that the languages must share in order to qualify -
Language in Croatia: Influenced by Nationalism
Language in Croatia: Influenced by Nationalism. Senior Essay, Department of Linguistics, Yale University. Catherine M. Dolan. Primary Advisor: Prof. Robert D. Greenberg. Secondary Advisor: Prof. Dianne Jonas. May 1, 2006. Abstract: Language and nationalism are closely linked, and this paper examines the relationship between the two. Nationalism is seen to be a powerful force which is capable of using language for political purposes, and the field of linguistics has developed terminology with which the interface of language and nationalism may be studied. Using this background, the language situation in Croatia may be examined and seen to be complex. Even after thorough evaluation it is difficult to determine how languages and dialects should be delineated in Croatia, but it is certain that nationalism and politics play key roles in promoting the nation's linguistic ideals. Acknowledgements: I suppose I could say that this essay was birthed almost two years ago, when I spent the summer traveling with a group of students throughout Croatia, Bosnia and Serbia in order to study issues of justice and reconciliation. Had I never traveled in the region I may have never gained an interest in the people, their history and, yes, their language(s). Even after conducting a rigorous academic study of the issues plaguing the former Socialist Federal Republic of Yugoslavia, I carry with me the impression that this topic can never be taken entirely into the intellectual realm; I am reminded by my memories that the Balkan conflicts involve people just as real as myself. For this, I thank all those who shared those six weeks of traveling. That summer gave me new perspectives on many areas of life. -
Roman Jakobson's Conception of «Sprachbund»
Cahiers de l’ILSL, N° 9, 1997, pp. 199-204. Roman Jakobson’s Conception of «Sprachbund». Helmut W. SCHALLER, University of Marburg. In the linguistic literature one frequently comes across the term «Sprachbund», which is generally accepted these days as a linguistic term. One has only to think of the Balkan Sprachbund as an example. This term, however, as will be shown later, has not been given an absolute definition, but has nevertheless been applied since 1930 to languages of different families which show linguistic similarities. In addition to «Balkan Sprachbund», the terms «Europäischer Sprachbund», «Donausprachbund», «Eurasischer Sprachbund», «Evrazijskij sojuz» were also in existence. To illustrate the vagueness of the notion «Sprachbund» since Trubetzkoy and Jakobson, I should like to make a survey of its usage and then attempt to come to some definition with special reference to the «Balkansprachbund». The notion «Sprachbund» was first mooted by N. Trubetzkoy (first of all known as the founder of the phonological method) in 1923 in «Vavilonskaja bašnja i smešenie jazykov», then at the First International Congress of Linguists in The Hague in 1928, in order to add to language families and groups another term, which takes into account the linguistic peculiarities which have arisen from mutual influences between languages. Trubetzkoy writes: Viele Missverständnisse und Fehler entstehen dadurch, dass die Sprachforscher die Ausdrücke Sprachgruppe und Sprachfamilie ohne genügende Vorsicht und in zu wenig bestimmter Bedeutung gebrauchen. [Many misunderstandings and errors arise because linguists use the terms language group and language family without sufficient caution and in too imprecise a sense.] -
INTELLIGIBILITY of STANDARD GERMAN and LOW GERMAN to SPEAKERS of DUTCH Charlotte Gooskens1, Sebastian Kürschner2, Renée Van Be
INTELLIGIBILITY OF STANDARD GERMAN AND LOW GERMAN TO SPEAKERS OF DUTCH. Charlotte Gooskens¹, Sebastian Kürschner², Renée van Bezooijen¹. ¹University of Groningen, The Netherlands; ²University of Erlangen-Nürnberg, Germany. [email protected], [email protected], [email protected]. Abstract: This paper reports on the intelligibility of spoken Low German and Standard German for speakers of Dutch. Two aspects are considered. First, the relative potential for intelligibility of the Low German variety of Bremen and the High German variety of Modern Standard German for speakers of Dutch is tested. Second, the question is raised whether Low German is understood more easily by subjects from the Dutch-German border area than subjects from other areas of the Netherlands. This is investigated empirically. The results show that in general Dutch people are better at understanding Standard German than the Low German variety, but that subjects from the border area are better at understanding Low German than subjects from other parts of the country. A larger amount of previous experience with the German standard variety than with Low German dialects could explain the first result, while proximity on the sound level could explain the second result. Key words: Intelligibility, German, Low German, Dutch, Levenshtein distance, language contact. 1. Introduction. Dutch and German originate from the same branch of West Germanic. In the Middle Ages these neighbouring languages constituted a common dialect continuum. Only when linguistic standardisation came about in connection with nation building did the two languages evolve into separate social units. A High German variety spread out over the German language area and constitutes what is regarded as Modern Standard German today. -
A New Model of Indo-European Subgrouping and Dispersal
A New Model of Indo-European Subgrouping and Dispersal. For Prof. Murray Emeneau on his 95th birthday, 28 February 1999. Andrew Garrett, University of California, Berkeley. 1. Introduction. In this century two great discoveries have shaken our view of the Indo-European family tree and protolanguage. The first was the discovery of Hittite, which in turn revealed the existence of an Anatolian branch of Indo-European; the second was the discovery in Central Asia of languages belonging to the previously unknown Tocharian branch of the family. Yet as important as these are, they are not the only twentieth century archaeological finds with Indo-European ramifications. In this paper I will explore the implications of a less dramatic set of discoveries for Indo-European subgrouping. I begin with a question posed in recent work by Johanna Nichols. Like many profound questions, this one is both shockingly obvious and disturbingly obscure: Why does Indo-European have so many branches? Ten are fully documented, and the count rises if you add the so-called ‘minor’ languages – Phrygian, Macedonian, Thracian, Venetic, and others known only through inscriptional remains. This is shown graphically in (1): (1) [Tree diagram: Indo-European branching into Celtic, Italic, Germanic, Albanian, Greek, Armenian, Anatolian, Balto-Slavic, Indo-Iranian, Tocharian, and ‘minor’ languages: Phrygian, Macedonian, etc.] Typical subgrouping situations are of two distinct types. One is the family tree with binary or ternary branching. This corresponds to situations where a speech community is separated for some reason, such as population movement into or out of the area it occupies, and the newly separated communities evolve in relative linguistic isolation. -
LNGT0101 Introduction to Linguistics
LNGT0101 Introduction to Linguistics. Lecture #17, Nov 9th, 2011. Announcements: If you don’t hear from me about your LAP proposal, then you’re good to go. Presentations on Monday: Myth 2: Some languages are just not good enough. Myth 4: French is a logical language. Myth 11: Italian is beautiful; German is ugly. Myth 6: Women talk too much. Reactions to The Linguists? Any questions? Transition from last class: Short clips from ‘Do you speak American?’ with Dennis Preston and Bill Labov. Sociolinguistics: Sociolinguistics is the study of language in social contexts. It focuses on the language of the speech community and variation within that speech community. There are several sociological variables that affect our usage of language, and sociolinguists are interested in studying linguistic variation correlated with these variables. So, … What are some of the sociological variables that may correlate with linguistic variation? Variables affecting language use: Region. Ethnicity. Socio-economic background. Education. Age. Gender. Register/Style. Whether or not you know another language. Do you speak American? Another excerpt from ‘Do you speak American?’: http://www.pbs.org/speak/ahead/change/vowelpower/vowel.html Northern Cities Vowel Shift: The Northern Cities Vowel Shift, from O’Grady et al 2005, p. 511. The language-dialect distinction: Sociolinguists focus on linguistic diversity internal to speech communities. One such case of linguistic diversity is dialectal variation. So, what’s the difference between a language [...] The language-dialect distinction: So, if one of you grew up in New England and another one was born and raised in Georgia, you’re still able to understand one another, despite differences in the language variety each of you speaks. -
SLS Handout MD in Balkan Sprachbund Context
Multiple Determination in a Balkan Sprachbund Context: Albanian and Balkan Slavic. SLS 14, Potsdam, September 11-13 2019. Catherine Rudin (Wayne State College, [email protected]) and Victor A. Friedman (University of Chicago & La Trobe University, [email protected]). MULTIPLE DETERMINATION (MD) (aka Double Determination): More than one indicator of definiteness within DP. Many types; see e.g. Alexiadou (2014). We concentrate on one: DEM + DEF (demonstrative + definite article/inflection) ... IN A BALKAN SPRACHBUND CONTEXT: Intensive language contact area in the Balkan peninsula, known for numerous shared grammatical features (as well as lexical borrowing), affecting: Balkan Slavic (Bulgarian, Macedonian & Torlak [=southeastern] BCMS); Balkan Romance (Romanian, Aromanian, Meglenoromanian); Romani in the Balkans (Vlax & Balkan Romani); Albanian; Balkan Judezmo; Greek; West Rumelian Turkish & Gagauz. SYNTACTIC “BALKANISMS” INCLUDE (among many others): • replacement of infinitive (finite verb with modal/subjunctive particle) • postposed definite article (Albanian, Romance, Slavic) • Multiple Determination constructions (e.g. Friedman 2006, Joseph 2019). The convergence of the Balkan languages is striking, but the view of the Balkan Sprachbund as “one grammar with several lexicons” (Kopitar 1829:86) requires nuancing. As demonstrated in Joseph (1983), even core Balkanisms like infinitive replacement show varying manifestations in the different languages. True for MD as well: All Balkan languages have some form of MD, but realization and place in the grammar