Quantitative Characterization of Code Switching Patterns in Complex Multi-Party Conversations: A Case Study on Hindi Movie Scripts

Adithya Pratapa, Microsoft Research, India
Monojit Choudhury, Microsoft Research, India
[email protected] [email protected]

Abstract

In this paper, we present a framework for quantitative characterization of code-switching patterns in multi-party conversations, which allows us to compare and contrast the socio-cultural and functional aspects of code-switching within a set of cultural contexts. Our method applies some of the proposed metrics for quantification of code-switching (Gamback and Das, 2016; Guzman et al., 2017) at the level of entire conversations, dyads and participants. We apply this technique to analyze the conversations from 18 recent Hindi movies. In the process, we are able to tease apart the use of code-switching as a device for establishing identity, socio-

man et al., 2017). Nevertheless, there are no large-scale quantitative studies of code-switched conversations, primarily because currently the only available large-scale datasets come from social media. These are either micro-blogs without any conversational context or data from Facebook or WhatsApp with very short conversations. On the other hand, functions of CS are most relevant and discernible in relatively long multi-party conversations embedded in a social context. For instance, it is well documented (Auer, 2013) that CS is motivated by complex social functions, such as identity, social power and style accommodation, which are difficult to elicit and establish from short social media texts.

In this work, we propose a set of techniques for analyzing CS styles and functions in conversations grounded over social networks.
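To make concrete the idea of applying a code-switching metric at the level of entire conversations, dyads and participants, the following is a minimal sketch and not the implementation used in this work. It assumes that every token already carries a language tag and uses the M-index (multilingual index) discussed by Guzman et al. (2017) as a stand-in metric, defined as M = (1 - Σ p_j²) / ((k - 1) Σ p_j²), where p_j is the fraction of tokens in language j and k is the number of languages. The utterance format, the function names `m_index` and `m_index_by`, and the treatment of a dyad as an unordered speaker-addressee pair are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def m_index(tags, k=2):
    """M-index over a list of language tags.

    0.0 means monolingual; 1.0 means all k languages are used equally.
    k is the number of languages assumed to be in play (hypothetical default: 2).
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0 or k < 2:
        return 0.0
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)


def m_index_by(utterances, key):
    """Pool the language tags of all utterances sharing a key, then score each pool."""
    pools = defaultdict(list)
    for utt in utterances:
        pools[key(utt)].extend(utt["tags"])
    return {group: m_index(tags) for group, tags in pools.items()}


# Toy, made-up conversation: "hi" = Hindi token, "en" = English token.
utterances = [
    {"speaker": "A", "addressee": "B", "tags": ["hi", "hi", "en", "en"]},
    {"speaker": "B", "addressee": "A", "tags": ["hi", "hi", "hi"]},
    {"speaker": "A", "addressee": "C", "tags": ["en", "en", "en", "hi"]},
]

# Level 1: the entire conversation.
conversation_score = m_index([t for u in utterances for t in u["tags"]])

# Level 2: each participant, pooling everything they say.
participant_scores = m_index_by(utterances, lambda u: u["speaker"])

# Level 3: each dyad, here treated as an unordered speaker-addressee pair.
dyad_scores = m_index_by(
    utterances, lambda u: frozenset((u["speaker"], u["addressee"]))
)
```

Pooling tokens before scoring, rather than averaging per-utterance scores, keeps long and short utterances weighted by their token counts; whether that choice suits a given analysis, and whether dyads should instead be ordered pairs, is left open here.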