Still Out There: Modeling and Identifying Russian Troll Accounts on Twitter

Jane Im†, Eshwar Chandrasekharan‡, Jackson Sargent†, Paige Lighthammer†, Taylor Denby†, Ankit Bhargava†, Libby Hemphill†, David Jurgens†, Eric Gilbert†
†University of Michigan, ‡Georgia Institute of Technology

arXiv:1901.11162v1 [cs.SI] 31 Jan 2019

Abstract

There is evidence that Russia's Internet Research Agency attempted to interfere with the 2016 U.S. election by running fake accounts on Twitter, often referred to as "Russian trolls". In this work, we: 1) develop machine learning models that predict whether a Twitter account is a Russian troll within a set of 170K control accounts; and 2) demonstrate that it is possible to use this model to find active accounts on Twitter still likely acting on behalf of the Russian state. Using both behavioral and linguistic features, we show that it is possible to distinguish between a troll and a non-troll with a precision of 78.5% and an AUC of 98.9%, under cross-validation. Applying the model to out-of-sample accounts still active today, we find that up to 2.6% of top journalists' mentions are occupied by Russian trolls. These findings imply that the Russian trolls are very likely still active today. Additional analysis shows that they are not merely software-controlled bots, and manage their online identities in various complex ways. Finally, we argue that if it is possible to discover these accounts using externally-accessible data, then the platforms, with access to a variety of private internal signals, should succeed at similar or better rates.

1 Introduction

It is widely believed that Russia's Internet Research Agency (IRA) tried to interfere with the 2016 U.S. election, as well as other elections, by running fake accounts on Twitter, often called the "Russian troll" accounts (Gorodnichenko, Pham, and Talavera 2018; Ferrara 2017; Stella, Ferrara, and De Domenico 2018). This interference could have immense consequences considering the viral nature of some tweets (Mustafaraj and Metaxas 2010; Metaxas and Mustafaraj 2012), the number of users exposed to Russian trolls' content (Isaac and Wakabayashi 2017; Spangher et al. 2018), and the critical role social media have played in past political campaigns (Cogburn and Espinoza-Vasquez 2011). In this paper, we develop models on a dataset of Russian trolls active on Twitter during the 2016 U.S. elections to predict currently active Russian trolls. We construct machine learning classifiers using profile elements, behavioral features, language distribution, function word usage, and linguistic features, on a highly unbalanced dataset of Russian troll accounts (2.2K accounts, or 1.4% of our sample) released by Twitter¹ and "normal" control accounts (170K accounts, or 98.6% of our sample) collected by the authors. (See Figure 1 for a visual overview of the process used in this work.) Our goals are to determine whether "new" trolls can be identified by models built on "old" trolls and to demonstrate that troll detection is both possible and efficient, even with "old" data.

[Figure 1: Flowchart illustrating the steps of our research pipeline.]

We find that it is possible to disambiguate between a Russian troll account and a large number of these randomly selected control accounts among users. One model, a simple logistic regression, achieves a precision of 78.5% and an AUC of 98.9%. Next, we asked whether it was possible to use the model trained on past data to unmask Russian trolls currently active on Twitter (see Figure 2 for an example). The logistic regression is attractive in this context because its simplicity seems most likely to generalize to out-of-sample data. Toward that end, we apply our classifier to Twitter accounts that mentioned high-profile journalists in late 2018. We find the computational model flags 3.7% of them as statistically likely Russian trolls, and we find reasonable agreement between our classifier and human labelers.
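To make this classification setup concrete, the sketch below shows one way a model of the kind described above could be trained and evaluated. It is a minimal illustration under our own assumptions, not the authors' released code: the file account_features.csv, the is_troll label column, the class weighting, and the fold count are placeholders, and it assumes the behavioral and linguistic features have already been extracted into one row per account.

    # Minimal sketch (not the authors' code): cross-validate a logistic
    # regression troll classifier on a heavily imbalanced dataset and report
    # precision and ROC AUC, the two metrics quoted in the paper.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Hypothetical feature table: one row per account, with behavioral and
    # linguistic features plus a binary label (1 = Russian troll, 0 = control).
    accounts = pd.read_csv("account_features.csv")  # placeholder path
    feature_cols = [c for c in accounts.columns if c != "is_troll"]
    X, y = accounts[feature_cols].to_numpy(), accounts["is_troll"].to_numpy()

    # A simple, interpretable model; class_weight="balanced" is one way to
    # compensate for the ~1.4% positive rate (2.2K trolls vs. 170K controls).
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")

    # Stratified folds keep the rare troll class represented in every split;
    # the number of folds here is illustrative.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_validate(clf, X, y, cv=cv, scoring=["precision", "roc_auc"])

    print("mean precision: %.3f" % scores["test_precision"].mean())
    print("mean ROC AUC:   %.3f" % scores["test_roc_auc"].mean())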
Our model allows us to estimate the activity of trolls. As a case study, we estimate the activity of suspected Russian troll accounts engaging in one type of adversarial campaign: engaging with prominent journalists. Since we have no way of truly knowing which of these model-identified accounts are truly Russian trolls (perhaps only the IRA knows this), we perform a secondary human evaluation in order to establish consensus on whether the model is identifying validly suspicious accounts. Our human evaluation process suggests that roughly 70% of these model-flagged accounts, all of them still currently active on Twitter, are highly likely to be Russian trolls. As a result, we estimate that Russian trolls occupy 2.6% of high-profile journalists' mentions.
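As a sanity check, the two numbers reported earlier combine to roughly the same figure: about 3.7% of the accounts mentioning these journalists are flagged, and human evaluation judges roughly 70% of flagged accounts to be likely trolls. The arithmetic below is our reading of how such an estimate can be assembled, not necessarily the authors' exact procedure, and it assumes flagged accounts contribute mentions at about the same rate as other accounts.

    # Back-of-the-envelope check (our interpretation, not the authors' exact method).
    flagged_rate = 0.037    # share of mentioning accounts flagged by the model
    validated_rate = 0.70   # share of flagged accounts judged likely trolls by humans
    estimated_troll_share = flagged_rate * validated_rate
    print(f"{estimated_troll_share:.1%}")  # -> 2.6%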
Moreover, we find that, in contrast with some prevailing narratives surrounding the Russian troll program, the model-flagged accounts do not score highly on the well-known Botometer scale (Davis et al. 2016), indicating that they are not simply automated software agents.

Finally, we perform an exploratory open coding of the identity deception strategies used by the currently active accounts discovered by our model. For instance, some pretend to be an American mother or a middle-aged white man via profile pictures and descriptions, but their tweet rates are abnormally high and their tweets revolve solely around political topics.

This paper makes the following three contributions, building on an emerging line of scholarship around the Russian troll accounts (Stewart, Arif, and Starbird 2018; Spangher et al. 2018; Griffin and Bickel 2018; Zannettou et al. 2018b; Boatwright, Linvill, and Warren 2018; Boyd et al. 2018). First, we show that it is possible to separate Russian trolls from other accounts in the data prior to 2019, and that this computational model is still accurate on 2019 data. As a corollary, we believe this work establishes that a large number of Russian troll accounts are likely to be currently active on Twitter. Second, we provide our model to the research community.² This will enable other researchers to study their own questions about the trolls, such as "What are their objectives?" and "How are they changing over time?" Third, we find that accounts flagged by our model as Russian trolls are not merely bots but use diverse ways to build and manage their online identities. Finally, we argue that if it is possible to discover these accounts using externally-accessible data, then the social platforms, with access to a variety of private, internal signals, should succeed at similar or better rates at finding and deactivating Russian troll accounts.

2 Related Work

First, we review what is known about Russia's interference in Western democracies via online campaigns, and then move on to the emerging work on the 2016 election-related Russian trolls themselves. We conclude by discussing work on social bots, and by reviewing theories of online deception that inform the quantitative approaches in this paper.

2.1 Russia's Interference in Political Campaigns

While state-level online interference in democratic processes is an emerging phenomenon, new research documents Russia's online political manipulation campaigns in countries other than the United States. For instance, previous work has shown that a high volume of Russian tweets was generated a few days before the voting day in the case of the 2016 E.U. Referendum (Brexit Referendum), and then dropped afterwards (Gorodnichenko, Pham, and Talavera 2018). Furthermore, it is suspected that Russia is behind the MacronLeaks campaign that occurred during the 2017 French presidential election period (Ferrara 2017), as well as the Catalonian referendum (Stella, Ferrara, and De Domenico 2018).

2.2 Emerging Work on the 2016 Russian Trolls

While a brand-new area of scholarship, emerging work has examined the datasets of Russian trolls released by Twitter. Researchers from Clemson University identified five categories of trolls and argued that the behavior of these categories was radically different (Boatwright, Linvill, and Warren 2018). This was especially marked for left- and right-leaning accounts (the dataset contains both). For instance, the IRA promoted more left-leaning content than right-leaning content on Facebook, while right-leaning Twitter handles received more engagement (Spangher et al. 2018).

New work has looked at how the Russian troll accounts were retweeted in the context of the #BlackLivesMatter movement (Stewart, Arif, and Starbird 2018), a movement targeted by the trolls. The retweets were divided among different political perspectives, and the trolls took advantage of this division. There is some disagreement about how predictable the Russian trolls are. Griffin and Bickel (2018) argue that the Russian trolls are composed of accounts with common but customized behavioral characteristics that can be used for future identification, while other work has shown that the trolls' tactics and targets change over time, implying that the task of automatic detection is not simple (Zannettou et al. 2018b). Finally, the Russian trolls show unique linguistic behavior as compared to a baseline cohort (Boyd et al. 2018).

Users Who Interact with the Trolls. Recent work has also examined the users who interact with the Russian trolls on Twitter. For example, misinformation produced by the Russian trolls was shared more often by conservatives than liberals on Twitter (Badawy, Ferrara, and Lerman 2018). Models can predict which users will spread the trolls' content by making use of political ideology, bot likelihood scores, and activity-related account metadata (Badawy, Ler-

¹ https://about.twitter.com/en_us/values/elections-integrity.html#data
² URL available after blind review.
