Online Language Ecology: Twitter in Europe
Steven COATS
University of Oulu, Finland

In: Wigham, C. R., & Stemle, E. W. (eds.) (2019). Building Computer-Mediated Communication Corpora for Sociolinguistic Analysis. Presses universitaires Blaise Pascal.

Social media platforms are increasingly used, but knowledge of the online linguistic behavior of populations remains limited. In Europe, the extent to which official and local languages, immigrant languages, and English are used online (i.e. the online language ecology) has not yet been documented in depth. In this study, the online language ecology of Europe on the Twitter social media platform is characterized from a geographical perspective and considered in relation to the language situations of European countries. A large corpus of Twitter messages with geographical metadata was created by accessing the Twitter APIs. After language detection and filtering, linguistic profiles for European countries/territories and language groups were created and quantified by using an entropy measure. Language use distributions were compared to results from survey data, and a European language network was created based on the likelihood of bi- or multilingual users sharing a language. The results show that high rates of English use can be attested for most countries and that a positive relationship exists between the size of a linguistic community [...] and the signal of that language on Twitter.

1. Introduction and Background

Recent years have seen an increase in the relative global prominence of computer-mediated communication (CMC) modalities such as texting, instant messaging, or posting on social media, and platforms such as Twitter have become multilingual sites with global representation (Leetaru et al. 2013; Mocanu et al. 2013). At the same time, population movements and changes in education and media consumption have contributed towards an increasing bi- and multilingualization of local environments, particularly with English, trends that are particularly evident in online communicative domains.

In Europe, languages with official statuses continue to receive reinforcement in education and media and are not in risk of immediate displacement. Bilingualism with English, however, has become the norm for many Europeans, particularly among young people, and overall linguistic diversity has been increased by migration within Europe and to Europe from other regions.

Despite this, large-scale studies of online language in Europe have often been based on survey data, not documented usage (e.g. European Commission 2012), and [...] aggregate CMC language choice has typically offered only an overview [...] empirical work documenting relative frequency of actual use of particular languages. It was noted in 2007 that in the absence of empirical research into online language choice in populations, the most frequently cited studies are often the products of marketing companies whose methods are not transparent (Paolillo 2007). While the situation has changed in the 11 years since, with several language use profiles at country level having been produced for social media platforms such as Twitter (e.g., Leetaru et al. 2013; Mocanu et al. 2013; Magdy et al. 2014; Graham et al. 2014), many studies of bilingualism or linguistic diversity have focused on individual rather than population-level use, or have not focused specifically on Europe. In some cases, methodological issues pertaining to data collection and filtering have weakened the proposed interpretations of observed patterns. Studies in which online language choice is documented remain few: Lee, reviewing existing research into online language diversity and multilingualism, notes that in many contexts, a "reliable, quantitative measurement of online linguistic diversity" is lacking (2016: 129).

In light of these factors, this study provides a characterization of the linguistic diversity of European users on the Twitter platform, in the hope that a macro-level perspective can tell us something about where languages stand and where they are going in comparison with other languages of the world (Haugen 1972: 337). In Section 2, related work on CMC and Twitter language diversity is reviewed. In Section 3, the methods used for the collection of data from Twitter's APIs, data filtering, geographical localization of Twitter users, and identification of tweet language are described, and the entropy measure used to quantify online linguistic diversity is introduced. Section 4 presents the results of the analysis in terms of mean linguistic diversity from the perspective of country entities and languages. The Twitter data is compared to responses to the Eurobarometer language survey (European Commission 2012). Section 5 presents the creation of a language network in which the tendency of bi- and multilinguals to share a language is reported and quantified. In Section 6, the results are discussed and interpreted in relation to research and survey data. The conclusion notes some methodological issues and future prospects for work on online language diversity.

2. Related Work

2.1. Multilingualism, CMC and Twitter

Extensive research has investigated knowledge of languages, their status in various media or communicative contexts, or the attitudes of speakers towards their use. In the broader European context, for example, the Eurobarometer surveys conducted by the European Commission (2006; 2011; 2012) provide data on self-assessed knowledge of languages, among other topics. Due to its status as a lingua franca, particular attention has been paid to English: increasing knowledge of the language has cemented its "hypercentral" position in the language ecology of Europe (De Swaan 2001; Soler-Carbonell 2016), and it has been suggested that in some European societies, English may be replacing local languages in certain functional domains (Görlach 2002; see the contributions in Linn 2016 for a discussion).

Language-related CMC research has also been heavily focused on English, although not necessarily on English in Europe. Since the introduction of the Twitter platform in 2006, research has investigated aspects of the use of English on Twitter including the discourse functions of hashtags (Wikström 2014; Squires 2015), lexical innovation in American English (Eisenstein et al. 2014), African-American Vernacular English on Twitter (Jørgensen et al. 2015), grammatical variation in English-language Twitter from Finland and the Nordic countries (Coats 2016a, 2016b), or the interaction between demographic parameters and language use in American English (Bamman et al. 2014).

Several research projects have investigated language choice and multilingualism on CMC platforms. Hong et al. (2010) found different posting behavior by different language groups on Twitter. Maps and language use profiles for individual languages or for individual countries using Twitter data have been produced (e.g. Leetaru et al. 2013; Mocanu et al. 2013; Magdy et al. 2014), and other projects have investigated linguistic diversity on social media at city level. It has been shown that geo-located tweets can shed light on international mobility (Moise et al. 2016), and that Twitter data can be used to investigate [...] (Nguyen et al. 2015). Studies with a specific geographical focus have [...]. For example, Frey et al. (2015) collected data from Facebook users in order to create a 650,000-token multilingual corpus from South Tyrol ([...] this book).