Of Bots and Humans (On Twitter)

Zafar Gilani∗, Reza Farahbakhsh†, Gareth Tyson‡, Liang Wang∗, Jon Crowcroft∗
∗Computer Laboratory, University of Cambridge, Cambridge, UK, szuhg2, lw525, [email protected]
†CNRS Lab UMR5157, Institut Mines Telecom, Paris, France, [email protected]
‡Queen Mary University of London, London, UK, [email protected]

Abstract—Recent research has shown a substantial presence of bots in online social networks (OSNs). In this paper we utilise our previous work (Stweeler) to comparatively analyse the usage and impact of bots and humans on Twitter, one of the largest OSNs in the world. We collect a large-scale Twitter dataset and define various metrics based on tweet metadata. Using a human annotation task we assign ‘bot’ and ‘human’ ground-truth labels to the dataset, and compare the annotations against an online bot detection tool for evaluation. We then ask a series of questions to discern important behavioural characteristics of bots and humans using metrics within and among four popularity groups. From the comparative analysis we draw differences and interesting similarities between the two entities, thus paving the way for reliable classification of bots, and studying automated political infiltration and advertisement campaigns.

Index Terms—content propagation; social network analysis; bot characterisation; behavioural analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ASONAM ’17, July 31 - August 03, 2017, Sydney, Australia
© 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-4993-2/17/07. $15.00
http://dx.doi.org/10.1145/3110025.3110090

I. INTRODUCTION

Bots (automated agents) exist in vast quantities in online social networks. They are created for a number of purposes, e.g. news, marketing, link farming,¹ political infiltration,² spamming and spreading malicious content.

The rise of bots on Twitter is evidenced by a number of studies [11], [15], [7], [5], and articles discussing bots.³ This constitutes a radical shift in the nature of content production, which has traditionally been the realm of human creativity (or at least intervention). Although there have been past studies on bots (§II), we are particularly interested in exploring their role in the wider social ecosystem, and how their behavioural characteristics differ from humans. This is driven by many factors. The limited cognitive ability of bots clearly plays a major role; however, it is also driven by their diverse range of purposes, ranging from curating news to answering customer queries. This raises a number of interesting questions regarding how these bots operate, interact and affect online content production: What are the typical behaviours of humans and active bots, in terms of their own activities as well as the reactions of others to them? What interactions between humans and bots occur? How do bots affect the overall social activities? The understanding of these questions can have deep implications in many fields such as social media analysis and systems engineering.

To answer these questions, we have performed a large-scale measurement and analysis campaign on Twitter (§III). We focus on bots in Twitter because it largely exposes public content, and past studies indicate a substantial presence of bots [4]. We offer a new and fundamental understanding of the characteristics of bots vs. humans, observing a number of clear differences (§IV). For example, we find that humans generate far more novel content, while bots rely more on retweeting. We also observe less intuitive trends, such as the propensity of bots to tweet more URLs, and upload bulkier media (e.g. images). We further analyse the social interconnectedness of bots and humans to characterise how they influence the wider Twittersphere. We observe that, although human contributions are generally considered more important via typical metrics (e.g. number of likes, retweets), bots still sustain significant influence over content production and propagation. As well as providing a powerful underpinning for future bot detection methods, our work makes contributions to the wider field of social content automation. Such understanding is critical for future studies of social media, which are often skewed by the presence of bots.

¹ Link farming – http://bit.ly/2cXhfBv
² Bots distort U.S. presidential election – http://bit.ly/2l3VzGf
³ Bots in press and blogs – http://bit.ly/2dBAIbB

II. RELATED WORK

Two main streams of research are relevant: (i) social, demographical and behavioural analyses of either bots or humans; and (ii) the impact of bots in social environments. Bot detection is not the focus of this paper; rather, we characterise and compare the types of users for a broader understanding.

Social analysis of bots or humans. Most related to our work are behavioural studies of bots or humans. For example, [8] studied the infiltration strategies of social bots on Twitter using a manual approach. They use three metrics to quantify the infiltration of social bots: followers, popularity score, and message-based interaction (other users favouriting, retweeting, replying or mentioning the bot). They found that bots can successfully evade Twitter defences (only 38 out of their 120 bots got suspended over the course of 30 days).

Researchers have also inspected bot or human behaviour, though in isolation. For example, [3] examined the retweet behaviour of people, focussing on how people tweet, as well as why and what people retweet. The authors found that participants retweet using different styles, and for diverse reasons (e.g. for others or for social action). This is relevant to our own work, as we also study retweets. In contrast, our work provides further insights on important differences and striking similarities between bots and humans in terms of retweet patterns, account lifetime, content creation, content popularity, entity interaction, content consumption, account reciprocity, and content propagation. To the best of our knowledge, we are the first to perform this methodical comparison of representative metrics across these types of Twitter accounts.

Social influence of bots. In [2], the authors use a bot on aNobii, a social networking site aimed at readers, to explore the trust, popularity and influence of bots. They show that gaining popularity does not require individualistic user features or actions, but rather simple social probing (i.e. bots following and sending messages to users randomly). The authors also found that an account can circumvent trust if it is popular (since popularity translates into influence). Closely related is [14], which develops models to identify users who are susceptible to social bots, i.e. likely to follow and interact with bots. The authors use a dataset from the Social Bot Challenge 2011, and make a number of interesting findings, e.g. that users who employ more negation words have a higher susceptibility level. Recent work [9] has also shown the impact of bots on Twitter activity using a non-infiltrating honeypot experiment.

In our work, we study the characteristics of existing bots in detail and argue that this provides a far broader vantage into real bot activities. Hence, unlike studies that focus on the influence of individual bots (e.g. the Syrian Civil War [1]), we gain perspective on the wider spectrum of how bots and humans operate, and interact.

III. METHODOLOGY

We use and build upon our previous work Stweeler⁴ [10] for data collection, pre-processing, human annotation, and analysis. We define a ‘bot’ as any account that consistently involves automation over the observed period, e.g. ...

... addition to known metrics (age, tweets, retweets, favourites, replies and mentions, URL count, follower-friend ratio, etc), we also analyse a set of six novel metrics not explored in past bot research. These are: likes per tweet, retweets per tweet, user replies and mentions, activity source count, type of activity sources, and size of content uploaded. The selection of features is driven by [6] and, to our knowledge, this is the most comprehensive study to date.

B. Data Pre-Processing

Our data contains a range of accounts in terms of their popularity (i.e. number of followers). Hence, we partition profiles into four popularity groups to enable a deeper understanding. These are as follows:

G10M+ – celebrity status: This is the subset of Twitter users with the highest number of followers, i.e. >9M followers. These are the most popular users, who hold celebrity status and are globally renowned. Popular and credible organisations (e.g. CNN, NetGeo) use these accounts for various purposes, which makes them free of spam, thus having high credibility and trustworthiness.

G1M – very popular: This subset of Twitter users is amongst the most popular on the platform, i.e. 900K to 1.1M followers. These users are close to celebrity status and global recognition (e.g. nytfood, pcgamer).

G100k – mid-level recognition: This subset represents popular accounts with mid-level recognition (e.g. CBSPhilly, DomusWeb), i.e. 90k to 110k followers.

G1k – lower popularity: This subset represents more ordinary users, i.e. 0.9k to 1.1k followers. These users (e.g. hope bot, Taiwan Agent) form a large base and, though they show lower individual and accumulated activity, they do form the all-important tail of the distribution.

Our dataset⁵ is a representative sample of Twitter users, where each metric follows a Gaussian distribution. G10M+ and G1M are similar in their characteristics (cf. §IV) and constitute 0.65% of the total 105k accounts we partitioned
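The partitioning into the four popularity groups described under Data Pre-Processing can be illustrated with a minimal sketch. This is not the Stweeler implementation: the function name `popularity_group`, the table `POPULARITY_GROUPS`, and the choice of inclusive range boundaries are assumptions made for illustration; only the follower ranges themselves are taken from the text.

```python
from typing import Optional

# Follower ranges as stated in the text. G10M+ is given only as a lower
# threshold (">9M followers"), so it has no upper bound here.
# Treating the other ranges as inclusive on both ends is an assumption.
POPULARITY_GROUPS = [
    ("G10M+", 9_000_000, None),
    ("G1M", 900_000, 1_100_000),
    ("G100k", 90_000, 110_000),
    ("G1k", 900, 1_100),
]


def popularity_group(followers: int) -> Optional[str]:
    """Return the popularity group for a follower count, or None if the
    account falls outside all four ranges and would not be sampled."""
    for name, low, high in POPULARITY_GROUPS:
        if high is None:
            # Open-ended top group: strictly more than the threshold.
            if followers > low:
                return name
        elif low <= followers <= high:
            return name
    return None
```

Accounts whose follower counts fall between the ranges (for example 5,000 followers) map to `None` in this sketch, reflecting that the four groups are narrow bands around each popularity level rather than a full partition of all Twitter users.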
