<<

Accommodation of Conversational Code-Choice

Anshul Bawa Monojit Choudhury Kalika Bali Microsoft Research Bangalore, India {t-anbaw,monojitc,kalikab}@microsoft.com

Abstract code-choice may be unexpected and noticed by other speakers, and is likely to affect other speak- Bilingual speakers often freely mix lan- ers’ subsequent code-choice. In other words, guages. However, in such bilingual con- speakers may accommodate to each other’s code- versations, are the choices of choice, positively or negatively (Genesee, 1982). the speakers coordinated? How much In this work, we propose a set of metrics to does one speaker’s choice of language study the social accommodation of code-choice affect other speakers? In this paper, as a sociolinguistic style marker. We build upon we formulate code-choice as a linguis- the existing framework on accommodation by tic style, and show that speakers are in- Danescu-Niculescu-Mizil et al.(2011) and adapt deed sensitive to and accommodating of that for code-choice by introducing relevant fea- each other’s code-choice. We find that tures for code-choice. We then motivate and illus- the or markedness of a language trate the effect of code markedness on the degree in context directly affects the degree of of accommodation - the more salient code is more accommodation observed. More impor- strongly accommodated for. We further generalize tantly, we discover that accommodation of the framework to also account for delayed accom- code-choices persists over several conver- modation, instead of only next-turn or immediate sational turns. We also propose an alter- accommodation. native interpretation of conversational ac- In addition, we introduce an alternative view of commodation as a retrieval problem, and accommodation as a query-response task, and em- show that the differences in accommo- ploy mean reciprocal rank, a well-understood met- dation characteristics of code-choices are ric from the domain of Information Retrieval, as a based on their markedness in context. metric for latency of accommodation. We mea- sure how quickly a style marker (code-choice in 1 Introduction our case) introduced by a speaker is retrieved by Code-switching (CS) refers to the fluid alter- the other speaker during the conversation. Our ation between two or more within a approach is developed for analyzing code-choice conversation, and is a common feature of all but is applicable to other dimensions of linguis- multilingual societies. (Auer, 2013). Multilin- tic style as well (Tausczik and Pennebaker, 2010). gual speakers are known to code-switch in spo- This presents an alternative view of conversational ken conversations for a variety of reasons, moti- style accommodation and offers a simple but ef- vated by information-theoretic and cognitive prin- fective way of measuring, characterizing and even ciples, and also as a result of numerous social, predicting elements of conversational style. communicative and pragmatic functions (Scotton We test this formulation on two CS conversa- and Ury, 1977;S oderberg¨ Arnfast and Jørgensen, tional datasets - dialog scripts of bilingual Indian 2003; Gumperz, 1982). movies (in English and Hindi) and a transcrip- Code-choice refers to a speaker’s decision of tion of real-world conversations between Spanish- which code to use in a given utterance, and in English bilinguals in Florida, US. In both the case of a CS utterance, to what extent the differ- corpora, we observe strong signals of interper- ent codes are to be used. Depending on the soci- sonal code-choice accommodation for the salient olinguistic and conversational context, a speaker’s or marked code. We also observe that on average,

82 Proceedings of The Third Workshop on Computational Approaches to Code-Switching, pages 82–91 Melbourne, Australia, July 19, 2018. c 2018 Association for Computational Linguistics the marked code is accommodated within the first their linguistic styles towards (or away from) each three to four conversational turns, beyond which other in a conversation for social effect. In the the effect of accommodation on code-choice de- CAT framework, the interlocutors’ desire for ‘so- cays gradually. Contextually-unmarked code is cial approval’ results in an attempt to match each less strongly accommodated for, even when it oc- other’s linguistic style. Accommodation has been curs relatively infrequently within a conversation. studied for many markers of linguistic style like As far as we know, this is the first computa- tense, negations, articles, prepositions, pronouns tional study of code-choice accommodation, and and sentiment (Taylor and Thomas, 2008; Nieder- first work that introduces and formalizes the con- hoffer and Pennebaker, 2002). cept of delayed accommodation, that can be ap- Since it is possible to convey the same semantic plied to other style dimensions as well. content while widely varying the extent of CS, we The rest of the paper is organized as follows. also consider code-choice as a linguistic style di- We describe the background and related work in mension. Therefore, we expect to observe accom- Section2, which motivates the first formulation of modation in terms of code-choice in similar man- code-choice accommodation in Section3. We im- ner to that of variables for other linguistic styles. prove this by formulation by modifying the fea- While there have been linguistic and small-scale tures in Section4. We generalize the formulation studies (Sachdev and Giles, 2004; Bourhis, 2008; to multiple turns and introduce the analogy to re- Bissoonauth and Offord, 2001; y Bourhis et al., trieval in Section5, along with the results. We 2007) that argue for prevalence of code-choice ac- wrap up with a discussion in Section6 that we commodation, there are no large-scale quantitative conclude in Section7. or computational studies that corroborate this and shed light on the various patterns of code-choice 2 Related Work accommodation. Further, these studies rely on simple correlation-based measures. CS is employed by speakers to signal a common multilingual identity(Auer, 2005), and can be ef- The first computational study of linguistic style fectively used to reduce (or increase) the perceived accommodation (Danescu-Niculescu-Mizil et al., social distance between the speakers (Camilleri, 2011) shows that it is highly prevalent in Twit- 1996). As a marker of informality, it has been ter conversations. They use binary features for shown to lower interpersonal distance (Myers- the presence of various psychologically - Scotton, 1995; Genesee, 1982). ful word categories as described by the LIWC Common structural patterns in CS as well as method(Tausczik and Pennebaker, 2010) to iden- the choice to switch between languages have tify stylistic variations in tweets. They then de- been the focus of many linguistic studies(Poplack, fine a probabilistic framework that mathematically 1988)(Auer, 1995). As CS is typically used as a models style accommodation in terms of the like- conversation strategy by bilinguals who are profi- lihood of an addressee to respond in the same style cient in both languages (Auer, 2013), it is not sur- as the speaker. prising that certain pragmatic and socio-linguistic Though CS is similar enough to other kinds of factors, such as formality of context (Fishman, linguistic style to allow analysis using the same 1970), age (Ervin-Tripp and Reyes, 2005), expres- framework, it also differs from them in being sion of emotion (Dewaele, 2010) and sentiment a strong sociological indicator of identity (Auer, (Rudra et al., 2016), are found to signal language 2005) and in not being processed nonconsciously preference in CS conversations. A Twitter study (Levelt and Kelter, 1982). We demonstrate that a of CS patterns across several geographies (Rijh- model that does not account for these crucial dif- wani et al., 2017), also suggests that there might be ferences fails to capture the accommodative pat- complex sociolinguistic reasons for code-choice. terns of code-choice. Because of being processed Thus, CS, and the choice of language or code consciously, code-choice also exhibits accommo- in which one communicates during a multilingual dation over several conversational turns, an ef- conversation, could be considered a marker of lin- fect which is not observed as strongly for other guistic style. style dimensions (Danescu-Niculescu-Mizil et al., Communication accommodation theory (Giles 2011). Long-term effects in accommodation have et al., 1973; Giles, 2007) states that speakers shift received very little attention, and have mostly

83 studied based on crude conversation-level correla- (Es denotes an expectation over all speakers s) tion values (Niederhoffer and Pennebaker, 2002). ∗  Acm (F ) = Es Acms(F )   3 Accommodation of Code-Choice as = Es P δ F δS(d)=s δ F di di−1 (2) Linguistic Style  − P (δ F δS(d)=s) di As a first step, we adapt an existing framework (Danescu-Niculescu-Mizil et al., 2011) that quan- 3.2 Measuring Code-Choice tifies accommodation of a given linguistic style. Our general hypothesis is that code-choice is re- Any linguistic feature is said to exhibit accommo- ciprocated in a bilingual conversation. To mea- dation if it is more likely to be expressed in re- sure this, we introduce simple binary features for sponse to a dialog that also expresses it, than oth- presence of each code, along the lines of the bi- erwise. In other words, an accommodative feature nary features in (Danescu-Niculescu-Mizil et al., in a dialog begets the same feature in the next dia- 2011), with individual language expression substi- log. We use the term ‘dialog’ or ‘turn’ to refer to a tuting for the style dimensions. For each language single spoken utterance or dialog within a conver- L, we define a feature FL indicating, for a dialog sation, and the term ‘speaker’ to refer to conversa- d, if the dialog contains words in the language L. tion participants. This framework thus restricts the The event that dialog d is at least partially in L, definition of accommodation to only single-turn FL is denoted by δdFL . In other words, δd is true if effects. the language L is expressed in dialog d, and false otherwise. 3.1 Measuring Accommodation 3.3 Data Mathematically, let F denote some binary feature We employ two datasets of bilingual conversa- over a dialog (we describe the features themselves tions, each in a different conversational context in Section 3.2 below). F is said to exhibit accom- and a different pair of languages, to test the oc- modation if the likelihood of a user expressing F currence of code-choice accommodation. Table1 increases when F has been expressed in the previ- reports the number of dialogs and words for the ous dialog. We define the degree of accommoda- two datasets, and the fraction of words that are in tion as follows English.

Dataset Dialogs Words %En  Movies (En-Hi) 20.1K 240K 24.1 Acm(F ) = P δdF δdF − P (δdF ) (1) i i−1 i Bangor (Es-En) 18.5K 216K 62.9

Here, dialog di−1 immediately precedes dialog Table 1: Conversational dataset overview di, and δdF is the event that the dialog d exhibits F . The first term can be thought of as the reciprocity over F . The second term is the fraction of dialogs Hindi Movies in the corpus for which F = 1, which is also the The data comprises of scripts of 32 Hindi movies empirical probability of observing F in a dialog d. released between 2012 and 2017. 17 of these Instead of computing these likelihoods over the scripts were collected by Pratapa and Choudhury 1 entire corpus, we could also compute them in- (2017) from scripts posted online . We collected 2 dividually for each speaker, and doing so yields 15 scripts of our own from a similar online source a fairer condition for accommodation. Different and parsed them replicating the methodology of speakers can have widely different base likeli- Pratapa and Choudhury(2017). hoods. This metric requires an average speaker to All the scripts have word-level language tags reciprocate more than their own (individual) base- as created by the language identification system line likelihood of expressing F , rather than sim- from (Gella et al., 2013). The language labels on ply more than the population baseline. Denoting manual inspection were found to have significant the event that a dialog d is spoken by a speaker s 1https: //moifightclub.com/category/scripts 2 as δS(d)=s, we redefine accommodation as follows http://www.filmcompanion.in/category/fc-pro/scripts/ 84 amount of noise, we corrected frequently observed errors with manual supervision. Each dialog is assumed to be in response to the immediately preceding dialog within a scene. We restrict our analysis to dialogs that are between no more than two speakers, to avoid confounding ef- fects of multi-party conversations on accommoda- tion. This also filters out most dialogs in the scripts which are not conversational in nature. Movie conversations, even though imagined, Figure 1: Fraction of Spanish over time in a con- are designed to sound natural, and therefore, are versation. The x-axis denotes consecutive dia- suitable for studying style accommodation, as log pairs, with dialog i above aligned with dialog is argued in Danescu-Niculescu-Mizil and Lee i + 1 below, so two aligned bars denote two con- (2011), and also multilingualism (Bleichenbacher, secutive dialogs. 2008) and code-choice (Vaish, 2011). It is true that movie dialogs promote stereotypes that may ∗ Dataset Code (L) Acm(FL) Acm (FL) affect characters’ expression of code-choice, how- En 0.06† 0.04 ever accommodative effects can still be expected to Bangor Es 0.12 0.09 play out largely independent of such stereotypes. En 0.10 0.06 There have been several linguistic and quantitative Movies Hi 0.02† -0.02† studies on Hindi-English CS in Hindi movies (Par- shad et al., 2016;L osch¨ , 2007; Pratapa and Choud- Table 2: Accommodation values for different hury, 2017). codes. Values with a (†) are not significant. Sig- Bangor Corpus nificance for Acm(F ) is computed using Fisher’s exact test, and significance for Acm∗(F ) is com- We use the Bangor Miami corpus3 of word- puted using one-tailed paired t-test. level language labeled transcripts of spoken con- versations between Spanish-English bilinguals in Florida, US. The original dataset contains 56 con- plot in Figure2, the rate of accommodation by s, F versations, from which we selected 40 conversa- Acms(F ), against the respective base rate P (ds ), tions that have non-trivial amount of English and for F ∈ {FEn,FHi}. Spanish, and sufficient dialogs from each speaker. Clearly, we see that a high base rate of expres- Figure1 shows the fraction of Spanish used by sion corresponds to far less accommodation. In a dyad of speakers in a sample conversation from other words, the instances of code-choice that are this dataset (the complement fraction being En- uncommon and therefore unexpected within the glish). Intuitively, we expect our metrics to cap- conversational context are likely to be accommo- ture how coordinated two speakers are. dated for. In a conversation that is predominantly in Hindi, a dialog uttered in Hindi carries little 3.4 Results salience and doesn’t stand out. This code-choice is Table2 shows the metrics from Section 3.1 com- unlikely to be registered as a communicative sig- puted over the features in Section 3.2 on the two nal or a marked expression of any linguistic style, datasets. and therefore wouldn’t elicit accommodation. En- While these numbers do suggest that accom- glish and Spanish are respectively less common in modative effects are present, they seem to be fairly Movies and Bangor, and indeed their rates of ac- weak. The rate of reciprocation is only slightly commodation are higher than the rates for the cor- higher than the base rate, and in some cases the responding dominant languages. difference isn’t statistically significant. Since the metrics in Section 3.1 compute like- However, looking at individual differences in lihoods over all instances of code-choice irrespec- these values reveals an interesting observation. tive of salience, the observed rates of accommoda- For each speaker s in the Movies dataset, we tion are low. We borrow the notion of markedness of code-choice, as described in Myers-Scotton 3http://bangortalk.org.uk/speakers.php?c=miami (2005), and incorporate it into our framework, as

85 dation. The Bangor corpus came with named- entity tags, and in the Movies corpus we removed all character names from the dialogs, but we were not aware of any NER system for Hindi-English CS data that we could have used to remove other named entities. Ideally, we would like to exclude all such words from the triggers expected to elicit accommodation, as their usage isn’t stylistically marked (Auer, 1999). The word-level language tags also have some amount of noise, and it is de- sirable to use features that are resilient to it. Besides, it is possible that a relatively high in- cidence of marked code in a dialog is perceived Figure 2: Variation of accommodation rate as a stronger style marker, and is perhaps accom- against base rate. Observed rate (x + y) can modated for more strongly than a lower incidence. vary between 0 and 1. The highlighted region We introduce a simple fraction-based thresholding denotes positive accommodation and a low base that allows us to test the same. rate (x < 0.5 and y > 0). In contrast, all other re- For every dialog d, we define feature FL,τ such gions, as demarcated by dashed lines, are sparser. that dFL,τ = 1 if and only if (a) d is sufficiently long and (b) fraction of words of d in the marked language L is more than τ. We consider an ut- described in the next section. terance to be sufficiently long if it contains more 4 Marked Code-Choice Features than 4 words, as this is expected to filter out most frozen expressions and named entities that may 4.1 Code Salience be borrowed from one language to another. We As shown earlier, measuring accommodation show results for accommodation of Fτ for τ ∈ makes sense only over marked instances of code- {0, 0.2, 0.5}. While F0 would capture presence of choice. Thus, for every conversation in our even one word in a marked code, F0.2 represents a dataset, we identify the marked language, and non-trivial occurrence and F0.5 represents major- measure accommodation only over that language. ity occurrence of the marked code in context. We choose a conversation as the unit for decid- ing if a code is marked because the set of speak- 5 Beyond Immediate Accommodation ers and the conversation context typically dictates code-choice in multilingual societies. A language is considered marked if it is the The metrics in Section3 and those in Danescu- non-dominant language - we keep the threshold of Niculescu-Mizil et al.(2011) only consider the im- markedness at no more than 40% of total words in mediate next turn as a candidate for reciprocation. the entire conversation. We discard highly mixed However, it is possible for accommodative effects conversations where none of the languages meets to span a few conversation turns. Consider the fol- the threshold. This consideration also makes the lowing snippet from one of the conversations in calculation of accommodation more robust, as for Bangor (Spanish code is in bold and its translation a high fraction of incidence of a code, the effect of is in italics). the previous turn would be harder to isolate. In cases like this, the content of the conversa- tion prevents a possibility of accommodating im- 4.2 Threshold of Occurrence mediately, but the speaker Sarah still reciprocates Another limitation of the formulation in Section Paige’s code-choice at the first instance possible. 3 is that it doesn’t incorporate the extent of pres- We can test if such cases of delayed accommoda- ence of each code in a dialog. Consequently, tion are indeed common in the data, by extending even named entities, frequently borrowed words our formulation to an arbitrary number of turns. and frozen expressions from the marked language, We extend Equation (2) below, and Equation (1) would be considered as candidates for accommo- can be extended analogously.

86 precisely equal to the first term in Equation3, the Paige i wanna see them. F Sarah pick. pick like (name) flowers or ... probability using F in responding to a dialog c . Paige ¡ay que´ lindo esta´ ese! The second term in Equation3 is the expected oh, how pretty that is! of recall under the independence assump- ok, enter the date. it will be ... tion, i.e., if s randomly introduces marked code Sarah may. at every turn with probability ps. Therefore, a Paige may. ninth? speaker is accommodative if their recall is higher Sarah ninth. Paige two thousand and eight. than that of this random baseline. and then you put what you want. A popular metric to evaluate retrieval systems (name) trip? is the mean reciprocal rank (MRR). The recipro- Sarah no te cabe. it doesn’t fit you. cal rank of a query response is the multiplicative just (name). inverse of the rank of the first relevant response. The MRR of a system is simply the mean of the re- ciprocal ranks of all its responses. Since we expect 5.1 Generalization of Immediate the accommodative speaker to have a higher recall Accommodation than the random baseline, we also expect the ac- commodative speaker to have a higher MRR, with The baseline rate of a speaker s using a feature F the difference from baseline MRR being propor- across n (consecutive) turns is the likelihood that tional to its accommodativeness. at least one the n turns expresses F , and is given n F Not only does this present an alternative view by 1 − (1 − ps) , where ps is simply P (d ). For a s of accommodation and exposes well-studied for- speaker s, the rate of n-turn accommodation is the malisms and concepts from information retrieval, increase in likelihood of occurrence of F in either but the ability to capture speakers’ styles as re- of the n dialogs ds,1 to ds,n, conditioned on the sponse characteristics also facilitates predictive event that the preceding dialog d0 expresses F . conversational modelling. n  _  Mean reciprocal ranks for the random baselines Acm (F ) = P dF  dF n,s s,i 0 can be computed analytically as follows. We first i=1 (3) n compute the expected reciprocal rank r for any − 1 − (1 − ps) given query as a function of the correctness prob- ability ps. For the first relevant response to be at Acm∗ (F ) = E Acm (F ) (4) n s n,s rank i, all previous responses must be irrelevant. When n = 1, this resolves to Equation (2). Note Since each response is relevant with a probability that d1 to dn are the first n dialogs spoken by s ps, the probability of the i-th response being the immediately after the dialog d0. As before, Es first relevant response is given by : denotes expected value over all speakers. 1 P r =  = (1 − p )i−1 ∗ p (5) 5.2 Accommodation as Retrieval ps i s s Responding to marked code-choice with marked The baseline MRR of a speaker s, denoted by code-choice can be thought of or reformulated as Bases, is then the expected value of r, also as a retrieval task. For a speaker s, each instance of function of ps :  a dialog addressed to s with a feature F would Bases = E rps be a query posed to s. The next n dialogs spo- ∞ X i−1 ken by s would be the top-n retrieved responses = (1 − ps) ∗ ps/i (6) to the query. We are interested in the retrieval of i=1 responses that also have feature F , so we call a re- ps = − ln ps sponse with feature F to be relevant response and 1 − ps irrelevant otherwise, in keeping with the standard The overall baseline MRR, Base is then simply terminology in information retrieval. We consider Es(Bases). We compare the observed MRR on s to have retrieved a relevant response in n-turns the data (denoted by Obs) with the expected MRR if at least one of the first n responses is relevant. of the random baselines (Base), with their differ- When formulated this way, the recall of s, the ence being indicative of the degree and immediacy probability of retrieving a relevant response, is of accommodation.

87 Bangor Movies MRR τ Es En En Hi Obs 0.48 0.52 0.67 0.62 0 Base 0.18 0.34 0.32 0.40 Obs 0.46 0.54 0.57 0.45 0.2 Base 0.15 0.26 0.25 0.35 Obs 0.46 0.49 0.54 0.4 0.5 Base 0.12 0.22 0.15 0.25

Table 3: Mean Reciprocal Ranks of the observed (a) Bangor; L = Es for solid lines (significant for n < 6) responses (Obs) and the random baseline (Base) and L = En for dashed (significant for n < 4). for different features Fτ and different corpora.

a stronger tendency to accommodate (since the in- crease over respective base rate is identical). The difference between the retrieval characteristics for the different thresholds is more salient in Table3 - higher thresholds correspond to a smaller aver- age likelihood, and lower baseline MRRs. The dif- ference between observed and baseline MRR does (b) Movies; L = En. Significant for n < 4. slightly increase with τ, making higher fraction of marked code somewhat more accommodated for. In contrast to English, the accommodation for ∗ Figure 3: Accommodation rates (Acmn(FL,τ )) Hindi code-choice in conversations dominated by versus n. Red, green and blue lines indicate τ = English is not significant. This suggests that Hindi 0, 0.2 and 0.5 respectively. Accommodation of code isn’t marked even when it is the minority Hindi is not significant. code in a scene, an inference that aligns with the claim from Myers-Scotton(2005) that Hindi is not marked in Hindi movies, even when it is the non- 5.3 Results and Observations dominant language in context. ∗ Figure3 shows the trends in Acmn(FL,τ ) for dif- Hindi in Movies and English in Bangor have ferent values of n, L and τ. Significance scores a lower strength of accommodation than their re- are computed in the same way as for Table2. spective counterparts, even when measured over Table3 shows the real and baseline MRR values conversations where they are uncommon. Not for each corpus over different values of τ. only is accommodation stronger for Spanish, it It is evident that accommodation of code-choice also persists for more number of turns as com- is a prevalent and robust phenomenon. The values pared to English. This suggests that the context of accommodation are consistently positive for all of markedness is larger than the immediate con- the different marked-code features, languages and versation, and the being the dominant language of datasets, and for low values of n. the corpus as a whole reduces markedness. In Table3, the less common codes in each In most cases, accommodation is salient and dataset, Es and En respectively, have a lower significant even after a few turns. Delayed accom- baseline while having comparable or even higher modation is as prevalent as immediate accommo- observed MRRs as their more common counter- dation. And the likelihood of a given speaker re- part. This reiterates that accommodation is more ciprocating code-choice in kind, remains signifi- pronounced for more marked codes. cant for several turns in a conversation. From Figure3, a higher fraction of marked code 6 Discussion (τ = 0.5) does not seem to elicit stronger accom- modation than τ = 0. However, it is important to Accommodation is prevalent and robust, but not note that the base rate for F0 is much higher than universal. While it is observed across conver- that of F0.5, so in relative terms, the latter exhibits sations spanning different media and language

88 pairs, there is significant variation among speakers tion to study conversation-wide accommodation within a dataset. As many as 18% of the speakers effects and convergence of code-choice at scale. exhibit what may be considered negative accom- modation, or non-accommodation. Half of these 7 Conclusion ∗ do so with a value of Acm1(F0) less than −0.10. We demonstrate that code-choice is a marker of It is in fact known that accommodation or con- linguistic style, and when it is marked in context, vergence is neither a universal nor a positive in- it is interpersonally accommodated for. We extend terpersonal strategy (Genesee and Bourhis, 1988; the probabilistic formulation to multiple conversa- Giles et al., 1991; Burt, 1994). In-group/out-group tion turns, and show equivalence with a retrieval identity as well as attitudes towards CS and the task, both facilitating better conversational analy- languages involved can cause negative accommo- sis of code-choice in particular and style interac- dation as well as a negative perception of accom- tions in general. modation. Burt(1994) show that while conver- In the future, we would like to use richer and gence is largely viewed positively, some multilin- linguistically motivated features for code-choice, gual speakers may oppose it as either misplaced including parts-of-speech, and indicators of bor- solidarity with an in-group, or a slur on the lan- rowing across languages. Another generalization guage capability of an interlocutor. would be to also study LIWC words and markers While we work under the assumption that code- of sociolinguistic style in this framework. Finally, choice is a style dimension, largely independent of longer-term accommodation effects, like conver- content, it is in fact influenced by factors like topic gence being succeeded by divergence, or topical (Sert, 2005) and sentiment (Rudra et al., 2016). effects on convergence, remain to be explored us- These influences could either align or compete ing a quantitative method like ours. with the socially accommodative code-choice, and this explains several-turn accommodation - it is References not always possible to accommodate immediately. The difference between code-choice and other lin- Peter Auer. 1995. The pragmatics of code-switching: guistic style markers is also indicated by the poor a sequential approach. In Lesley Milroy and Pieter Muysken, editors, One speaker, two languages, results of Section3, which naively applies the pages 115–135. Cambridge University Press. style accommodation framework to code-choice. Peter Auer. 1999. From codeswitching via language It is worth noting that the baselines throughout mixing to fused lects: Toward a dynamic typology the paper assume that speakers do not adjust their of bilingual speech. International journal of bilin- overall rate of employing a particular code, in or- gualism, 3(4):309–332. der to accommodate. This is in fact a fairly strict Peter Auer. 2005. A postscript: Code-switching and assumption. In fact, the same speaker typically has social identity. Journal of pragmatics, 37(3):403– widely varying base rates in conversations with 410. multiple other speakers. The extent of marked Peter Auer. 2013. Code-switching in conversation: code to be used is itself often negotiated within Language, interaction and identity. Routledge. a conversation, and adjusting one’s base rate can be construed as accommodation, and harder to an- Anu Bissoonauth and Malcolm Offord. 2001. Lan- guage use of mauritian adolescents in education. alyze. Nevertheless, this assumption gives us a Journal of Multilingual and Multicultural Develop- strong and realistic baseline to judge the observa- ment, 22(5):381–400. tions against. Lukas Bleichenbacher. 2008. Multilingualism in the One limitation of our formulation is that we movies: Hollywood characters and their language do not look at individual words. Word or code choices, volume 135. BoD–Books on Demand. saliency in context is actually more complex that Richard y Bourhis, Shaha El-Geledi, and Itesh just language saliency in current conversation. Sachdev. 2007. Language, ethnicity and intergroup Some words are more marked than others, with relations. In Language, discourse and social psy- borrowed words carrying very little salience. It chology, pages 15–50. Springer. would be nice to have more complex features, Richard Y Bourhis. 2008. The english-speaking com- aware of the syntactic structure of dialogs. It munities of quebec: Vitality, multiple identities and would also be worthwhile to apply this formula- linguicism. The Vitality of the English-Speaking

89 Communities of Quebec: From Community Decline John J Gumperz. 1982. Discourse strategies, volume 1. to Revival, Montreal,´ Centre detudes´ ethniques des Cambridge University Press. universites´ montrealaises,´ Universite´ de Montreal´ . Willem JM Levelt and Stephanie Kelter. 1982. Surface Susan Meredith Burt. 1994. Code choice in intercul- form and memory in question answering. Cognitive tural conversation. Pragmatics. Quarterly Publi- psychology, 14(1):78–106. cation of the International Pragmatics Association (IPrA), 4(4):535–559. Eva Losch.¨ 2007. The construction of social distance through code-switching: an exemplary analysis for Antoinette Camilleri. 1996. Language values and iden- popular indian cinema. Department of Linguistics, tities: Code switching in secondary classrooms in Technical University of Chemnitz. malta. Linguistics and education, 8(1):85–103. Carol Myers-Scotton. 1995. Social motivations for Cristian Danescu-Niculescu-Mizil, Michael Gamon, codeswitching: Evidence from Africa. Oxford Uni- and Susan Dumais. 2011. Mark my words!: lin- versity Press. guistic style accommodation in social media. In Proceedings of the 20th international conference on Carol Myers-Scotton. 2005. Multiple voices: An intro- World wide web, pages 745–754. ACM. duction to bilingualism. Wiley-Blackwell. Kate G Niederhoffer and James W Pennebaker. 2002. Cristian Danescu-Niculescu-Mizil and Lillian Lee. Linguistic style matching in social interaction. Jour- 2011. Chameleons in imagined conversations: A nal of Language and Social Psychology, 21(4):337– new approach to understanding coordination of lin- 360. guistic style in dialogs. In Proceedings of the 2nd Workshop on Cognitive Modeling and Compu- Rana D. Parshad, Suman Bhowmick, Vineeta Chand, tational Linguistics, pages 76–87. Association for Nitu Kumari, and Neha Sinha. 2016. What is India Computational Linguistics. speaking? Exploring the “Hinglish” invasion. Phys- ica A, 449:375–389. Jean-Marc Dewaele. 2010. Emotions in multiple lan- guages. Palgrave Macmillan, Basingstoke, UK. Shana Poplack. 1988. Contrasting patterns of code- switching in two communities. Codeswitching: Susan Ervin-Tripp and Iliana Reyes. 2005. Child Anthropological and sociolinguistic perspectives, codeswitching and adult content contrasts. Interna- 215:44. tional Journal of Bilingualism, 9(1):85–102. Adithya Pratapa and Monojit Choudhury. 2017. Quan- J.A. Fishman. 1970. Sociolinguistics: a brief intro- titative characterization of code switching patterns duction. Newbury House language series. Newbury in complex multi-party conversations: A case study House. on hindi movie scripts. In Proceedings of the 14th International Conference on Natural Language Pro- Spandana Gella, Jatin Sharma, and Kalika Bali. 2013. cessing, pages 75–84. Query word labeling and back transliteration for indian languages: Shared task system description. Shruti Rijhwani, Royal Sequiera, Monojit Choudhury, FIRE Working Notes, 3. Kalika Bali, and Chandra Sekhar Maddila. 2017. Estimating code-switching on Twitter with a novel Fred Genesee. 1982. The social psychological signifi- generalized word-level language identification tech- cance of code switching in cross-cultural communi- nique. In Proceedings of the Annual Meeting of the cation. Journal of language and social psychology, ACL. 1(1):1–27. Koustav Rudra, Shruti Rijhwani, Rafiya Begum, Ka- Fred Genesee and Richard Y Bourhis. 1988. Evalua- lika Bali, Monojit Choudhury, and Niloy Ganguly. tive reactions to language choice strategies: The role 2016. Understanding language preference for ex- of sociostructural factors. Language & Communica- pression of opinion and sentiment: What do Hindi- tion, 8(3-4):229–250. English speakers do on Twitter? In EMNLP, pages 1131–1141. Howard Giles. 2007. Communication accommodation theory. Wiley Online Library. Itesh Sachdev and Howard Giles. 2004. Bilingual ac- commodation. The handbook of bilingualism, pages Howard Giles, Nikolas Coupland, and IUSTINE Cou- 353–378. pland. 1991. 1. accommodation theory: Communi- cation, context, and. Contexts of accommodation: Carol Myers Scotton and William Ury. 1977. Bilingual Developments in applied sociolinguistics, 1. strategies: The social functions of code-switching. International Journal of the sociology of language, Howard Giles, Donald M Taylor, and Richard Bourhis. 1977(13):5–20. 1973. Towards a theory of interpersonal accommo- dation through language: Some canadian data. Lan- Olcay Sert. 2005. The functions of code-switching in guage in society, 2(2):177–192. elt classrooms. Online Submission, 11(8).

90 Juni Soderberg¨ Arnfast and J Normann Jørgensen. 2003. Code-switching as a communication, learn- ing, and social negotiation strategy in first-year learners of danish. International journal of applied linguistics, 13(1):23–53. Yla R Tausczik and James W Pennebaker. 2010. The psychological meaning of words: Liwc and comput- erized text analysis methods. Journal of language and social psychology, 29(1):24–54. Paul J Taylor and Sally Thomas. 2008. Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research, 1(3):263–281. Viniti Vaish. 2011. Terrorism, nationalism and west- ernization: Code switching and identity in bolly- wood. FM Hult, & KA King, K. A (Eds.). Edu- cational linguistics in practice: Applying the local globally and the global locally, pages 27–40.

91