Thousands of Small, Constant Rallies:A Large-Scale Analysis of Partisan Whatsapp Groups
Total Page:16
File Type:pdf, Size:1020Kb
1 54 2 Thousands of Small, Constant Rallies: 55 3 56 4 A Large-Scale Analysis of Partisan WhatsApp Groups 57 5 58 6 Victor S. Bursztyn Larry Birnbaum 59 7 [email protected] [email protected] 60 8 Department of Computer Science Department of Computer Science 61 9 Northwestern University Northwestern University 62 10 Evanston, Illinois, USA Evanston, Illinois, USA 63 11 64 ABSTRACT 1 INTRODUCTION 12 65 13 There is growing concern about the use of social platforms On October 28th 2018, amid strong polarization and con- 66 14 to push political narratives during elections. One very recent spiracy theories that flooded social media, Brazilians elected 67 15 case is Brazil’s, where WhatsApp is now widely perceived far-right candidate Jair Bolsonaro their next president. With 68 16 as a key enabler of the far-right’s rise to power. In this paper, over 120M users in Brazil – the second largest market in 69 17 we perform a large-scale analysis of partisan WhatsApp the world – the role that WhatsApp played in this electoral 70 18 groups to shed light on how both right- and left-wing users process has emerged as a major focus of attention and con- 71 19 leveraged the platform in the 2018 Brazilian presidential troversy due to its alleged importance in a large number of 72 20 election. Across its two rounds, we collected more than 2.8M successful political campaigns. 73 21 messages from over 45k users in 232 public groups (175 right- Indeed, WhatsApp in Brazil connects an audience com- 74 22 wing vs. 57 left-wing). After describing how we obtained a parable in scale to television’s to content created and dis- 75 23 sample that is many times larger than previous works, we tributed with almost no barriers or filters other than user 76 24 compare right- and left-wing users on their social network curation. Although the promise of inexpensive one-to-one 77 25 metrics, regional distribution, content-sharing habits, and mobile communication may have sparked this popularity – 78 26 most characteristic news sources. WhatsApp’s 1.5B users are largely in developing countries 79 27 – group chats arguably made it catch fire. The app allows 80 28 CCS CONCEPTS users to create groups where up to 256 users can share text 81 29 • Networks → Online social networks. and multimedia messages, transforming these groups into 82 30 highly active social spaces. Users can also create public in- 83 31 KEYWORDS vites to their groups and share them as URLs across the web, 84 transforming these groups into small public forums. 32 chat applications, WhatsApp, elections, partisanship, data 85 However, the fact that all messages circulate with end-to- 33 collection, social network analysis 86 34 end encryption hinders transparency and even law-enforcement, 87 making WhatsApp a fertile ground for bad actors. A recent 35 ACM Reference Format: 88 36 Victor S. Bursztyn and Larry Birnbaum. 2018. Thousands of Small, study on the Brazilian electorate found that false stories 89 37 Constant Rallies: A Large-Scale Analysis of Partisan WhatsApp that circulated massively through WhatsApp’s network were 90 38 Groups. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, more far-reaching than initially assumed, as it revealed that 91 39 June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 5 pages. 90% of Bolsonaro’s supporters think they are truthful [2]. But 92 40 https://doi.org/10.1145/1122445.1122456 measuring effects on the surface only to speculate about such 93 41 a complex network isn’t enough to protect future elections 94 42 from the use of WhatsApp as a political weapon. Instead, it’s 95 43 Permission to make digital or hard copies of all or part of this work for necessary to learn how to measure partisan activity at scale. 96 personal or classroom use is granted without fee provided that copies are not 44 In practice, this is challenging because it requires finding 97 made or distributed for profit or commercial advantage and that copies bear a large number of partisan WhatsApp groups and following 45 this notice and the full citation on the first page. Copyrights for components 98 46 of this work owned by others than ACM must be honored. Abstracting with the digital rallies that happen in them. Next, we need to find 99 47 credit is permitted. To copy otherwise, or republish, to post on servers or to meaningful ways to analyze such rallies and characterize 100 48 redistribute to lists, requires prior specific permission and/or a fee. Request each partisan group. There are many analyses that can be 101 permissions from [email protected]. 49 made to compare right- and left-wing users in WhatsApp: 102 Woodstock ’18, June 03–05, 2018, Woodstock, NY 50 How their social networks are structured, how much do they 103 © 2018 Association for Computing Machinery. represent the larger population of voters, and what types of 51 ACM ISBN 978-1-4503-9999-9/18/06...$15.00 104 52 https://doi.org/10.1145/1122445.1122456 content are consumed and shared by them. 105 53 1 106 Woodstock ’18, June 03–05, 2018, Woodstock, NY Bursztyn and Birnbaum 107 Addressing the challenges involved in analyzing partisan 3 METHODOLOGY 160 108 activity in WhatsApp, we make the following contributions: Our data collection started on September 1, 2018, and was 161 109 162 • concluded for the purpose of this analysis on November 1, 110 We introduce a new data collection method capable of 163 growing a sample as much as necessary and towards 2018, thus spanning exactly two months. It included mean- 111 ingful events that preceded the electoral race (e.g., Brazil’s 164 112 specific directions in the political spectrum; 165 • Supreme Court barring former President Lula from the race 113 Our code, released to the research community upon 166 publication, can mine invites to other public WhatsApp on September 11) as well as the first and second rounds of 114 the election (on October 8 and 28, respectively). 167 115 groups from the groups already joined; 168 • All data was collected using a dedicated smartphone with 116 We analyze the first large-scale dataset of partisan 169 activity in WhatsApp using a variety of methods and 64GB of storage, allowing for a maximum of 1GB of data per 117 average day in our study. WhatsApp’s daily backups were 170 118 standpoints, contributing with real measurements about 171 right- vs. left-wing users in the platform. observed so that our portfolio of groups would never exceed 119 this safe margin. For our setting, in practice, it allowed the 172 120 tracking of 700-800 public groups – the total fluctuated as 173 2 RELATED WORK 121 groups were closed or added. 174 122 Several works indicate a recent rise of interest in analyzing In this section we describe our data collection method- 175 123 public WhatsApp groups: ology in two parts. First, we discuss the basic steps also 176 124 Rosenfeld et al. [8] provided an initial study of WhatsApp implemented by previous works [1, 3, 6, 7]. Second, we de- 177 125 messages with a particular focus on predictive analysis. Al- scribe our improvements, which substantially enhance data 178 126 though it comprised 4M messages, the dataset spanned only collection, now made available at a public repository. As 179 127 100 users and didn’t include the actual contents of those with previous works, we emphasize that the resulting data 180 128 messages – only a handful of meta-data. They noted that a collection complies to WhatsApp’s privacy policy. 181 129 small majority of messages originated from group messaging 182 130 and used it to distinguish WhatsApp from plain texting. Data Collection - Base Method 183 131 Garimella and Tyson [3] collected the first large-scale Public WhatsApp groups are characterized by the ability 184 132 WhatsApp dataset. To do so, they looked for invites to public to join them through a public URL created by their own- 185 133 WhatsApp groups in websites that aggregated and organized ers, called “an URL invite”. At the same time, all WhatsApp 186 134 information about such groups as well as by searching for the groups are limited to 256 users. Considering this, Caetano et 187 135 string “chat.whatsapp.com” in Google. After automating the al. [1, Figure 2] summarizes the base method in three steps: 188 136 process of joining groups using Selenium over WhatsApp’s 1) Searching the web for invites to public WhatsApp groups; 189 137 web client, their work was able to collect 454,000 messages 2) Trying to join them; and 3) Extracting data from them. 190 138 from 45,794 users in a six month period, encompassing a For step #1, a typical solution is to look for invites to pub- 191 139 total of 178 groups about a wide variety of themes. lic WhatsApp groups in a set of publicly accessible sources. 192 140 Caetano et al. [1] provided an initial study of political dis- These sources include: (i) websites aimed at organizing in- 193 141 cussions in Brazilian WhatsApp groups and, at the same time, formation about public WhatsApp groups; (ii) the web, in 194 142 offered valuable measurements for a non-electoral setting. general, by searching for the string that is the prefix of URL 195 143 Their dataset comprised 273,468 messages from 6,967 users in invites (“chat.whatsapp.com”) in Google; and (iii) the social 196 144 a one month period, encompassing a total of 81 groups.