Bot Electioneering Volume: Visualizing Social Bot Activity During Elections

Kai-Cheng Yang, Pik-Mai Hui, Filippo Menczer (cnets.indiana.edu/fil)
Center for Complex Networks and Systems Research, Indiana University, Bloomington, Indiana

ABSTRACT
It is widely recognized that automated bots may have a significant impact on the outcomes of national events. It is therefore important to raise public awareness of the threat posed by bots on social media during such events, for example the 2018 US midterm elections. To this end, we deployed a web application that helps the public explore the daily activity of likely bots on Twitter. The application, called Bot Electioneering Volume (BEV), reports the level of likely bot activity and visualizes the topics these bots target. With this paper we release the code base for the BEV framework, with the goal of facilitating future efforts to combat malicious bots on social media.

KEYWORDS
bot activity, Twitter, elections

ACM Reference Format:
Kai-Cheng Yang, Pik-Mai Hui, and Filippo Menczer. 2019. Bot Electioneering Volume: Visualizing Social Bot Activity During Elections. In The Web Conference 2019, May 13–17, 2019, San Francisco, CA. ACM, New York, NY, USA, 3 pages.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
The Web Conference ’19, May 13–17, 2019, San Francisco, CA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.

arXiv:1902.02339v1 [cs.CY] 6 Feb 2019

1 INTRODUCTION
Social bots have recently drawn great public attention. They are accounts on social media platforms controlled at least in part by algorithms that generate, share, or retweet content and interact with human users [5]. Because social bots are automated, they scale easily: a single person can control thousands of accounts on one or more social media platforms. When needed, these bots can act collectively to manipulate the public by promoting certain accounts or opinions.

Being social animals, human users are inevitably vulnerable to the efforts of social bots. Studies have shown that ubiquitous social bots [12] distort online discussions, particularly those about politics. During the 2010 US midterm elections, primitive social bots were used to attack some candidates [6] and to spread tweets with links to fake news websites [7]. A similar pattern emerged in the 2016 US presidential election, only with more sophisticated bots that aimed to effectively push their messages to the target audience [1]. In particular, bots were most active in the core of the misinformation-sharing network [10] and effectively amplified the spread of low-credibility content by posting it within seconds and by targeting influential accounts [9]. Analogous automated campaigns have been reported in countries around the globe [4, 11].

Here we present Bot Electioneering Volume (BEV), a platform that visualizes the volume of content generated by bots and the topics they target. BEV tracks election-related traffic on Twitter by feeding the streaming API a list of selected hashtags. By incorporating the bot-detection ability of Botometer [3, 12, 13], BEV distinguishes content generated by likely bots from content generated by humans. The average bot activity measured on election tweets is then compared with that of random samples of tweets to produce a number that quantifies electioneering activity by bots. BEV also collects the content topics shared by likely bots, including hashtags, mentions, and URLs, and reports their relative volumes.

BEV (botometer.iuni.iu.edu/bev) monitored public tweets about the 2018 US midterm elections between October 22 and December 30, 2018, and drew over 3,000 visits during the collection period. An archive of the data from October to December 2018 remains publicly available for retrospective inspection. We plan to activate BEV again for future elections. Our goal is to raise public awareness of bot activity and its impact during past elections and, more importantly, future ones.

2 SYSTEM DESIGN
The BEV system contains four major parts: a crawler, a database, an analyzer, and a front-end interface, as illustrated in Figure 1.

Figure 1: Illustration of BEV’s (a) crawler, (b) database, (c) analyzer, and a screenshot of the (d) front-end interface. The upper panel of the front end shows the Bot Electioneering Volume for the past 8 days. Users can select a day of interest to explore the top topics of that day. The bottom panels show a tag cloud and entity lists for the selected day. The tag cloud presents all entities together, with the size of each entity proportional to how often it is tweeted by bots. The entity lists display hashtags, mentions, and links ranked by how often each is tweeted by bots.
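The entity lists and tag cloud described in the Figure 1 caption amount to per-day frequency counts over tweets posted by likely bots. The Python sketch below shows one minimal way to compute them; it is not BEV's released code. The `Tweet` record, the function names, the case-folding of hashtags and mentions, and the 40-point maximum font size are hypothetical choices, and the default cutoff of 4 simply mirrors the bot-score threshold stated in Section 2.2.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (not BEV's actual schema)."""
    author_bot_score: float  # bot score of the posting account, 5-point scale
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


def rank_bot_entities(tweets, threshold=4.0, top_k=10):
    """Return the top_k hashtags, mentions, and URLs tweeted by likely bots.

    An account counts as a likely bot when its score is above `threshold`.
    Case-folding hashtags and mentions is an illustrative assumption.
    """
    counters = {"hashtags": Counter(), "mentions": Counter(), "urls": Counter()}
    for t in tweets:
        if t.author_bot_score <= threshold:
            continue  # not classified as a likely bot; ignore this tweet
        counters["hashtags"].update(h.lower() for h in t.hashtags)
        counters["mentions"].update(m.lower() for m in t.mentions)
        counters["urls"].update(t.urls)
    # most_common orders entities by descending count, like the entity lists
    return {kind: c.most_common(top_k) for kind, c in counters.items()}


def tag_cloud_sizes(counts, max_pt=40.0):
    """Font sizes proportional to counts; the most frequent entity gets max_pt."""
    if not counts:
        return {}
    top = max(counts.values())
    return {entity: max_pt * n / top for entity, n in counts.items()}
```

For example, if two likely-bot tweets carry #maga (once written as #MAGA) and one carries #bluewave, while a tweet from an account scored 1.3 is ignored, `rank_bot_entities` returns `[('#maga', 2), ('#bluewave', 1)]` for hashtags, and `tag_cloud_sizes` assigns them 40.0 and 20.0 points. Keeping the sizes strictly proportional to counts matches the caption's description of the tag cloud.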
The crawler is in charge of tracking Twitter’s filtering API for public election-related tweets, querying Twitter’s Spritzer API for random samples of public tweets, and fetching bot scores. Crawled data is stored in the database. The analyzer then extracts the required information and generates the statistics for the visualization in the application front end. The front end has three major parts: the Bot Electioneering Volume timeline, a tag cloud, and entity lists. The Bot Electioneering Volume measures the activity of likely bots, while the tag cloud and entity lists display the topics most tweeted by likely bots. Clicking a link in an entity list directs users to Hoaxy (hoaxy.iuni.iu.edu) [8], where they can explore more in-depth visualizations of the influence of bots around that entity on Twitter over the recent minutes, hours, or days.

Data collection runs in a streaming fashion, but fetching bot scores and analyzing the data take time, so the front-end interface is updated every 4 hours to reflect newly incoming data.

2.1 Data collection
The collection of election-related tweets is crucial to our application. Our collection process starts with a set of election-related hashtags that are tracked using the Twitter filtering API. We seeded this set with several widely used political hashtags, including #2018midterms, #maga, and #bluewave. We then repeatedly expanded the set with co-occurring hashtags [2], resulting in a set of 110 hashtags. From this set we manually removed 6 hashtags that are general and irrelevant to the election. We also added the hashtags for each US state’s Senate race: #casen, #nysen, and so on. The full list of hashtags can be found on the FAQ page of the BEV website. This methodology allows our system to collect most tweets with newly emerging election hashtags, because such tweets are likely to contain some of the hashtags in our list as well.

Twitter’s free filtering API offers at most 1% of Twitter’s traffic. We estimated that the traffic captured by our method is about 0.3–0.5% of Twitter’s complete traffic (see Figure 2(a)).

Figure 2: (a) Number of election tweets and unique users for each day. (b) Average bot scores for election tweets and the random sample. (c) BEV timeline. The 2018 US midterm election day, Nov 6, is highlighted by a vertical dashed line.
[Figure 2 plot: daily values from 23 Oct 2018 to 01 Jan 2019.]

2.2 Bot identification
BEV uses Botometer (botometer.iuni.iu.edu) [3, 12, 13] to obtain bot scores for Twitter accounts involved in election-related discourse. Botometer is a supervised machine learning algorithm that considers more than a thousand features of an account and its activity to estimate the likelihood that the account is automated. We consider accounts with a bot score above 4 (on a 5-point scale) to be social bots. This is a fairly conservative threshold, corresponding to a posterior probability of automation near 50% [13] given a 15% prior probability [12].

2.3 Bot activity measurement
To measure the bot electioneering volume, we first take daily averages of the bot scores of accounts generating political tweets, weighted by their tweet frequencies, so that each account’s score counts once per tweet it posts. Because spamming the same message is a common bot strategy, the weighted average better highlights the amount of bot-generated content and its potential influence. To obtain a baseline of bot activity, we compute the same weighted average for random tweets from Twitter’s Spritzer API, sampled at a rate of 1,000 tweets per hour. The daily average bot scores for election and random tweets are shown in Figure 2(b).

The Bot Electioneering Volume is defined as the relative difference between the two averages.
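The computation in this section can be illustrated with a short Python sketch, offered under stated assumptions rather than as BEV's implementation. Each day's tweets are reduced to a list of author IDs, and bot scores are looked up per account; the function names are hypothetical. The text does not spell out the exact form of the relative difference, so the sketch assumes the random-sample average is the denominator and reports the result as a percentage, in line with the percentage axis of Figure 2(c).

```python
from collections import Counter


def weighted_daily_average(tweet_authors, bot_scores):
    """Tweet-frequency-weighted average of account bot scores for one day.

    tweet_authors: one author ID per collected tweet (repeats allowed).
    bot_scores: dict mapping author ID -> bot score on the 5-point scale.
    Authors without a score are skipped; this handling is an assumption.
    """
    counts = Counter(a for a in tweet_authors if a in bot_scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored tweets for this day")
    # Each account's score is counted once per tweet it posted.
    return sum(n * bot_scores[a] for a, n in counts.items()) / total


def bot_electioneering_volume(election_avg, baseline_avg):
    """Relative difference of the election average over the random-sample
    baseline, in percent (using the baseline as denominator is assumed)."""
    if baseline_avg <= 0:
        raise ValueError("baseline average must be positive")
    return 100.0 * (election_avg - baseline_avg) / baseline_avg
```

With three election tweets from an account scored 4.5 and one from an account scored 1.0, the weighted election average is 3.625; against a random-sample average of 1.25, `bot_electioneering_volume` returns 190.0. Counting tweets per author makes the weighting explicit: an account that posts the same message many times pulls the daily average toward its own score, which is the effect the weighted average is meant to capture.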