A Case Study of Code Mixing and Code Switching in Political Speeches
Total Page:16
File Type:pdf, Size:1020Kb
Political Discourse Analysis: A Case Study of Code Mixing and Code Switching in Political Speeches Dama Sravani, Lalitha Kameswari, Radhika Mamidi Language Technologies Research Centre International Institute of Information Technology Hyderabad, Telangana, India {dama.sravani, v.a.lalitha}@research.iiit.ac.in [email protected] Abstract when interacting with members of other communi- ties and with tourists, the residents would use the Political discourse is one of the most interest- standard dialect. ing data to study power relations in the frame- work of Critical Discourse Analysis. With Foster et al.(1981) states that language is not the increase in the modes of textual and spo- neutral or universal in a political context. Language ken forms of communication, politicians use is used to reflect many historical, cultural and so- language and linguistic mechanisms that con- cial identities associated with the politician. In a tribute significantly in building their relation- multilingual country like India, CM and CS are a ship with people, especially in a multilingual norm. They not only reflect a person’s association country like India with many political parties with more than one language or a dialect, but also with different ideologies. This paper analy- conveys their social identity in a given context. In ses code-mixing and code-switching in Telugu political speeches to determine the factors re- this paper, our aim is to look at CM and CS as two sponsible for their usage levels in various so- distinctive techniques used for political gain. cial settings and communicative contexts. We The matrix language we have chosen to study also compile a detailed set of rules captur- these phenomena is Telugu, a South-Central Dra- ing dialectal variations between Standard and vidian language predominantly spoken in India’s Telangana dialects of Telugu. Southern parts, especially in Andhra Pradesh and Telangana. There are many regional dialects and 1 Introduction sub-dialects in Telugu, but the three major di- Gumperz(1982) defines Code Switching (CS) as alects are the Coastal Andhra dialect, the Telan- the juxtaposition within the same speech exchange gana dialect which has a significant influence of of passages of speech belonging to two different Urdu/Dakhni, and the Rayalaseema dialect. The grammatical systems or sub-systems. On the other variety spoken by the educated class from the inte- hand, Code Mixing(CM) refers to the embedding rior districts of Andhra area was modernised and of linguistic units such as phrases, words and mor- elevated to ’Standard Telugu’ status in 1969 and phemes of one language into an utterance of an- since then has been widely used in textbooks, news- other language (Myers-Scotton, 1997). So broadly papers and other formal communication. It is also speaking, CS occurs across sentences/phrases and referred to as Modern Standard Telugu (MST) in CM within a sentence/phrases (though some re- Krishnamurti et al.(1968). searchers do not distinguish the two). Even though Telugu is one of India’s largest Gumperz(1982) researched specific speech spoken languages with more than 80 million speak- events to examine the relationship between speak- ers, there is a severe dearth of resources in Telugu, ers’ linguistic choices. They also looked for CS which makes it hard for NLP research. For our instances, either between languages or between va- purpose, we did not find any corpus for Code Mix- rieties of the same language, to find out in what ing and Switching in Telugu. Hence, we created a situation and with what interlocutors, CS occurs corpus of political speeches in Telugu consisting and CS may signal various group memberships of 1134 sentences and about 10000 words. Since and identities. Gumperz(1977) found that local di- our work is closely associated with a set of rules alect carried great prestige, and as a person’s native and statistical observations corresponding to those speech is regarded as an integral part of his family rules, the corpus size was sufficient to give us good background, a sign of his local identity. However, results. We will further examine the factors re- 1 Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 1–5 June 11, 2021. ©2021 Association for Computational Linguistics https://doi.org/10.26615/978-954-452-056-4_001 sponsible for varying levels of CM and CS in these 3 Dataset and Annotation speeches. 3.1 Dataset collection Even with the advent of social and print media, 2 Related Work in-person modes of communication such as cam- paigns and political speeches remain the most pre- Kameswari and Mamidi(2018) conducted a study ferred ways of communicating with the general about various interpersonal speech choices in elec- public for politicians. They try to ensure that the tion campaign speeches, including the usage pat- audience feels connected to them, thereby increas- terns of nouns, pronouns, kinship terms, rhetor- ing their potential votes. This is done strategically ical questions, etc. There are a few more stud- and persuasively. ies (Martinez Guillem(2009), Ilic and Radulovic We chose our speakers as Mr K Chandrasekhar (2015), Kampf and Katriel(2016)) which analyse Rao (KCR), the Chief Minister of Telangana and the deeper intention behind the choice of words Mr Chandra Babu Naidu(CBN), former Chief Min- and phrases using the famous Speech Act theory by ister of Andhra Pradesh. KCR is the founder of Searle et al.(1980) and the Sociocognitive model the Telangana Rashtra Samiti (TRS) party and is by Van Dijk(2014). widely regarded as the face of the Telangana move- ment for a separate state in 2014. CBN is the leader There has been some work recently on CM and of the Telugu Desam Party. They use a variety of CS involving Indian languages. But most of the dialects and languages such as Telangana Telugu, work is done in the social media domain and in- Modern Standard Telugu, Urdu, English and Hindi volves Hindi-English pair because of the easier in their speeches. availability of data. Bali et al.(2014) worked on We chose a total of 6 speeches of both the speak- code mixed tweets in English-Hindi. They tried to ers in three different social settings and communica- differentiate between borrowing and code-mixing tive contexts to analyse the levels of code-mixing based on the frequency of co-occurrence of words and code-switching as follows: in tweets. In Dravidian languages, there is very little work 1. Public Meetings in Telangana: KCR’s done in this area so far. Srirangam et al.(2019) speech was during the Telangana movement, created a corpus for Named Entity recognition meant for the creation of a new state. He in English-Telugu code mixed tweets,Jitta et al. addressed the pathetic situation of Telangana (2017) created a English-Telugu code mixed con- residents and also discussed the plan and poli- versational data for Dialog Act recognition. cies for the new state.CBN’s speech is during There has been very less work on analysing the the Telangana elections in 2018. The audience Telangana dialect. Bhaskar wrote a book named were residents of Telangana. We will refer to Telanagana Padakosam with Telangana words and this as communicative event 1. their corresponding words in Standard dialect. 2. Felicitating Dr. Venkaiah Naidu when hon- Also, he has drawn few observations that are com- oured as Vice President: KCR and CBN’s mon in Telangana dialect.Chakravarthy(2016), An speeches were with MLAs and other parlia- Annotated Translation of Kalarekhalu A Historical ment members of their respective states,viz. Novel by Ampasayya Naveen, describes the impor- They spoke about Dr Venkaiah Naidu’s great tant phases that lead to the Telangana state. Few qualities and praised him for his service to the cultural words have been retained without translat- nation and attaining one of its highest posi- ing into English. Sastry(1987) provided a prosodic tions. We will refer to this as communicative analysis of Telangana dialect. event 2. To our knowledge, this work is the first of its kind, which analyses CM and CS together in po- 3. Capital Development: In these speeches, litical discourse along with dialectal level code- both the speakers were talking about devel- mixing analysis. We aim to understand these phe- opments of capitals. In the KCR speech, the nomena as a speech choice and its effect on the audience were Government officials and local audience in politics. politicians of Telangana. CBN addressed the 2 collectors of Andhra Pradesh. We will refer 1. Vowel rules to this one as communicative event 3. • Deletion: In Telangana dialect, vowels Though the speeches were available on YouTube, are dropped at the end of some words. none of the existing off-the-shelf speech to text sys- For example: tems could serve to capture the speech effectively nenu - nen along with the dialectal variations in the language. • Replacement: Long vowels are replaced Therefore, we manually transcribed the speeches in with short vowels. the WX format (Diwakar et al., 2010) and verified the transcription with the help of native speakers. vastAru - vastaru The duration of the speeches is 100 minutes for 2. Consonant rules each speaker, and after transcription, it consists of 1134 sentences. The total word count is around • Addition: In Telangana dialect g is 10000. added at the start for few words. ippuDu - gippuDu 3.2 Annotation • Deletion: In some words v is dropped at All speeches are annotated for the usage of CM and the beginning of the word. This occurs CS at the word level. Each speech is annotated for in nouns, pronouns and verbs Dialectal level code-mixing(DCM), Language level vAna - Ana code-mixing (LCM) and Code-Switching(CS).