arXiv:1910.02001v1 [cs.CL] 4 Oct 2019

Predicting the Role of Political Trolls in Social Media

Atanas Atanasov, Sofia University, Bulgaria ([email protected])
Gianmarco De Francisci Morales, ISI Foundation, Italy ([email protected])
Preslav Nakov, Qatar Computing Research Institute, HBKU, Qatar ([email protected])

Abstract

We investigate the political roles of "Internet trolls" in social media. Political trolls, such as the ones linked to the Russian Internet Research Agency (IRA), have recently gained enormous attention for their ability to sway public opinion and even influence elections. Analysis of the online traces of trolls has shown different behavioral patterns, which target different slices of the population. However, this analysis is manual and labor-intensive, thus making it impractical as a first-response tool for newly-discovered troll farms. In this paper, we show how to automate this analysis by using machine learning in a realistic setting. In particular, we show how to classify trolls according to their political role (left, news feed, right) by using features extracted from social media, i.e., Twitter, in two scenarios: (i) in a traditional supervised learning scenario, where labels for trolls are available, and (ii) in a distant supervision scenario, where labels for trolls are not available, and we rely on more-commonly-available labels for news outlets mentioned by the trolls. Technically, we leverage the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extract several types of learned representations, i.e., embeddings, for the trolls. Experiments on the "IRA Russian Troll" dataset show that our methodology improves over the state-of-the-art in the first scenario, while providing a compelling case for the second scenario, which has not been explored in the literature thus far.

1 Introduction

"Trolls" are users of an online community who quarrel and upset people, seeking to sow discord by posting inflammatory content. More recently, organized "troll farms" of political opinion manipulation trolls have also emerged. Such farms usually consist of state-sponsored agents who control a set of pseudonymous user accounts and personas, the so-called "sockpuppets", which disseminate misinformation and propaganda in order to sway opinions, destabilize the society, and even influence elections (Linvill and Warren, 2018).

The behavior of political trolls has been analyzed in different recent circumstances, such as the 2016 US Presidential Elections and the Brexit referendum in the UK (Linvill and Warren, 2018; Llewellyn et al., 2018). However, this kind of analysis requires painstaking and time-consuming manual labor to sift through the data and to categorize the trolls according to their actions. Our goal in the current paper is to automate this process with the help of machine learning (ML). In particular, we focus on the case of the 2016 US Presidential Elections, for which a public Twitter dataset is available. For this case, we consider only accounts that post content in English, and we wish to divide the trolls into some of the functional categories identified by Linvill and Warren (2018): left troll, right troll, and news feed.

We consider two possible scenarios. The first, prototypical ML scenario is supervised learning, where we want to learn a function from users to the categories {left, right, news feed}, and the ground truth labels for the troll users are available. This scenario has been considered previously in the literature by Kim et al. (2019). Unfortunately, a solution for such a scenario is not directly applicable to a real-world use case. Suppose a new troll farm trying to sway the upcoming European or US elections has just been discovered. While the identities of the troll accounts might be available, the labels to learn from would not be present. Thus, any supervised machine learning approach would fall short of being a fully automated solution to our initial problem.
A more realistic scenario assumes that labels for troll accounts are not available. In this case, we need to use some external information in order to learn a labeling function. Indeed, we leverage more persistent entities and their labels: news media. We assume a learning scenario with distant supervision, where labels for news media are available. By combining these labels with a citation graph from the troll accounts to news media, we can infer the final labeling on the accounts themselves without any need for manual labeling.

One advantage of using distant supervision is that we can get insights about the behavior of a newly-discovered troll farm quickly and effortlessly. Differently from troll accounts in social media, which usually have a high churn rate, news media accounts in social media are quite stable. Therefore, the latter can be used as an anchor point to understand the behavior of trolls, for which data may not be available.

We rely on embeddings extracted from social media. In particular, we use a combination of embeddings built on the user-to-user mention graph, the user-to-hashtag mention graph, and the text of the tweets of the troll accounts. We further explore several possible approaches using label propagation for the distant supervision scenario.
As a result of our approach, we improve the classification accuracy by more than 5 percentage points for the supervised learning scenario. The distant supervision scenario has not previously been considered in the literature, and is one of the main contributions of the paper. We show that even by hiding the labels from the ML algorithm, we can recover 78.5% of the correct labels.

The contributions of this paper can be summarized as follows:

• We predict the political role of Internet trolls (left, news feed, right) in a realistic, unsupervised scenario, where labels for the trolls are not available, and which has not been explored in the literature before.

• We propose a novel distant supervision approach for this scenario, based on graph embeddings, BERT, and label propagation, which projects the more-commonly-available labels for news media onto the trolls who cited these media.

• We improve over the state of the art in the traditional, fully supervised setting, where training labels are available.

2 Related Work

2.1 Trolls and Opinion Manipulation

The promise of social media to democratize content creation (Kaplan and Haenlein, 2010) has been accompanied by many malicious attempts to spread misleading information over this new medium, which quickly got populated by sockpuppets (Kumar et al., 2017), Internet water army (Chen et al., 2013), astroturfers (Ratkiewicz et al., 2011), and seminar users (Darwish et al., 2017). Several studies have shown that trust is an important factor in online relationships (Ho et al., 2012; Ku, 2012; Hsu et al., 2014; Elbeltagi and Agag, 2016; Ha et al., 2016), but building trust is a long-term process and our understanding of it is still in its infancy (Salo and Karjaluoto, 2007). This makes it easy for politicians and companies to manipulate user opinions in forums (Dellarocas, 2006; Li et al., 2016; Zhuang et al., 2018).

Trolls. Social media have seen the proliferation of fake news and clickbait (Hardalov et al., 2016; Karadzhov et al., 2017a), aggressiveness (Moore et al., 2012), and trolling (Cole, 2015). The latter is often understood to concern malicious online behavior that is intended to disrupt interactions, to aggravate interacting partners, and to lure them into fruitless argumentation in order to disrupt online interactions and communication (Chen et al., 2013). Here we are interested in studying not just any trolls, but those that engage in opinion manipulation (Mihaylov et al., 2015a,b, 2018). This latter definition of troll has also become prominent in the general public discourse recently. Del Vicario et al. (2016) have also suggested that the spreading of misinformation online is fostered by the presence of polarization and echo chambers in social media (Garimella et al., 2016, 2017, 2018).

Trolling behavior is present and has been studied in all kinds of online media: online magazines (Binns, 2012), social networking sites (Cole, 2015), online computer games (Thacker and Griffiths, 2012), online encyclopedia (Shachaf and Hara, 2010), and online newspapers (Ruiz et al., 2011), among others.

Troll detection was addressed using domain-adapted sentiment analysis (Seah et al., 2015), lexico-syntactic features about writing style and structure (Chen et al., 2012; Mihaylov and Nakov, 2016), and graph-based approaches over signed social networks (Kumar et al., 2014).

Sockpuppet is a related notion, and refers to a person who assumes a false identity in an Internet community and then speaks to or about themselves while pretending to be another person. The term has also been used to refer to opinion manipulation, e.g., in Wikipedia (Solorio et al., 2014). Sockpuppets have been identified by using authorship-identification techniques and link analysis (Bu et al., 2013). It has been also shown that sockpuppets differ from ordinary users in their posting behavior, linguistic traits, and social network structure (Kumar et al., 2017).
Internet Water Army is a literal translation of the Chinese term wangluo shuijun, which is a metaphor for a large number of people who are well organized to flood the Internet with purposeful comments and articles. The Internet water army has been allegedly used in China by the government (also known as the 50 Cent Party) as well as by a number of private organizations.

Astroturfing is an effort to simulate a political grass-roots movement. It has attracted strong interest from political science, and research on it has focused on massive streams of microblogging data (Ratkiewicz et al., 2011).

Identification of malicious accounts in social media includes detecting spam accounts (Almaatouq et al., 2016; McCord and Chuah, 2011), fake accounts (Fire et al., 2014; Cresci et al., 2015), and compromised and phishing accounts (Adewole et al., 2017). Fake profile detection has also been studied in the context of cyber-bullying (Galán-García et al., 2016). A related problem is that of Web spam detection, which has been addressed as a text classification problem (Sebastiani, 2002), e.g., using spam keyword spotting (Dave et al., 2003), lexical affinity of arbitrary words to spam content (Hu and Liu, 2004), and frequency of punctuation and word co-occurrence (Li et al., 2006).

Trustworthiness of online statements is an emerging topic, given the interest in fake news (Lazer et al., 2018). It is related to trolls, as they often engage in opinion manipulation and spread rumors (Vosoughi et al., 2018). Research topics include predicting the credibility of information in social media (Ma et al., 2016; Mitra et al., 2017; Karadzhov et al., 2017b; Popat et al., 2017) and in political debates (Hassan et al., 2015; Gencheva et al., 2017; Jaradat et al., 2018), as well as stance classification (Mohtarami et al., 2018). For example, Castillo et al. (2011) leverage user reputation, author writing style, and various time-based features, Canini et al. (2011) analyze the interaction of content and social network structure, and Morris et al. (2012) studied how Twitter users judge truthfulness. Zubiaga et al. (2016) study how people handle rumors in social media, and found that users with higher reputation are more trusted, and thus can spread rumors easily. Lukasik et al. (2015) use temporal patterns to detect rumors and to predict their frequency, and Zubiaga et al. (2016) focus on conversational threads. More recent work has focused on the credibility and the factuality in community forums (Nakov et al., 2017; Mihaylova et al., 2018, 2019; Mihaylov et al., 2018).

2.2 Understanding the Role of Political Trolls

None of the above work has focused on understanding the role of political trolls. The only closely relevant work is that of Kim et al. (2019), who predict the roles of the Russian trolls on Twitter by leveraging social theory and Actor-Network Theory approaches. They characterize trolls using the digital traces they leave behind, which is modeled using a time-sensitive semantic edit distance. For this purpose, they use the "IRA Russian Troll" dataset (Linvill and Warren, 2018), which we also use in our experiments. However, we have a very different approach based on graph embeddings, which we show to be superior to their method in the supervised setup. We further experiment with a new, and arguably more realistic, setup based on distant supervision, where labels are not available. To the best of our knowledge, this setup has not been explored in previous work.

2.3 Graph Embeddings

Graph embeddings are machine learning techniques to model and capture key features from a graph automatically. They can be trained either in a supervised or in an unsupervised manner (Cai et al., 2018). The produced embeddings are latent vector representations that map each vertex in a graph G to a d-dimensional vector. The vectors capture the underlying structure of the graph by putting "similar" vertices close together in the vector space. By expressing our data as a graph structure, we can leverage and extract critical insights about the topology and the contextual relationships between the vertices in the graph.
In mathematical terms, graph embeddings can be expressed as a function f : V → R^d from the set of vertices V to a set of embeddings, where d is the dimensionality of the embeddings. The function f can be represented as a matrix of dimensions |V| × d. In our experiments, we train graph embeddings in an unsupervised manner by using node2vec (Grover and Leskovec, 2016), which is based on random walks over the graph. Essentially, this is an application of the well-known skip-gram model (Mikolov et al., 2013) from word2vec to random walks on graphs.

Besides node2vec, there have been a number of competing proposals for building graph embeddings; see (Cai et al., 2018) for an extensive overview of the topic. For example, SNE (Liao et al., 2018) models both the graph structure and some node attributes. Similarly, LINE (Tang et al., 2015) represents each node as the concatenation of two embedded vectors that model first- and second-order proximity. TriDNR (Pan et al., 2016) represents nodes by coupling several neural network models. For our experiments, we use node2vec, as we do not have access to user attributes: the users have been banned from Twitter, their accounts were suspended, and we only have access to their tweets thanks to the "IRA Russian Trolls" dataset.
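To make the random-walks-plus-skip-gram recipe concrete, here is a minimal sketch in Python using networkx and gensim. It is a simplification we introduce for exposition only: it uses uniform random walks, whereas node2vec proper biases its walks with the return and in-out parameters p and q, and the hyper-parameter values here are illustrative rather than the paper's configuration.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def node2vec_style_embeddings(graph, dims=128, walks_per_node=10, walk_len=40):
    """Skip-gram over random walks, in the spirit of node2vec
    (uniform walks here; node2vec biases them via p and q)."""
    walks = []
    for node in graph.nodes():
        for _ in range(walks_per_node):
            walk = [node]
            for _ in range(walk_len - 1):
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    # Each walk is a "sentence" and each node a "word"; sg=1 selects skip-gram.
    model = Word2Vec(walks, vector_size=dims, window=10, min_count=1, sg=1)
    return {node: model.wv[str(node)] for node in graph.nodes()}

# Toy usage on a built-in graph; real experiments run on the Twitter graphs.
emb = node2vec_style_embeddings(nx.karate_club_graph(), dims=32)
```

The point of the sketch is why the approach works at all: vertices that co-occur in the same walk neighborhoods get similar vectors, exactly as words sharing contexts do in word2vec. The reference node2vec implementation of Grover and Leskovec (2016) can be used in place of this helper.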
3 Method

Given a set of known political troll users (each user being represented as a collection of their tweets), we aim to detect their role: left, right, or news feed. Linvill and Warren (2018) describe these roles as follows:

Right Trolls spread nativist and right-leaning populist messages. Such trolls support the candidacy and Presidency of Donald Trump and denigrate the Democratic Party; moreover, they often send divisive messages about mainstream and moderate Republicans.

Left Trolls send socially liberal messages and discuss gender, sexual, religious, and, especially, racial identity. Many of their tweets seem intentionally divisive, attacking mainstream Democratic politicians, particularly Hillary Clinton, while supporting Bernie Sanders prior to the elections.

News Feed Trolls overwhelmingly present themselves as US local news aggregators, linking to legitimate regional news sources and tweeting about issues of local interest.

Technically, we leverage the community structure and the text of the messages in the social network of political trolls represented as a graph, from which we learn and extract several types of vector representations, i.e., troll user embeddings. Then, armed with these representations, we tackle the following tasks:

T1 A fully supervised learning task, where we have labeled training data with example trolls and their roles;

T2 A distant supervision learning task, in which labels for the troll roles are not available at training time, and thus we use labels for news media as a proxy, from which we infer labels for the troll users.

3.1 Embeddings

We use two graph-based (user-to-hashtag and user-to-mentioned-user) and one text-based (BERT) embedding representations.

3.1.1 U2H

We build a bipartite, undirected User-to-Hashtag (U2H) graph, where nodes are users and hashtags, and there is an edge (u, h) between a user node u and a hashtag node h if user u uses hashtag h in their tweets. This graph is bipartite as there are no edges connecting two user nodes or two hashtag nodes. We run node2vec (Grover and Leskovec, 2016) on this graph, and we extract the embeddings for the users (we ignore the hashtag embeddings). We use 128 dimensions for the output embeddings. These embeddings capture how similar troll users are based on their usage of hashtags.
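A minimal sketch of this construction follows. The (user, hashtags) input format and the node-name prefixes are our own assumptions for illustration; the prefixes keep the two node namespaces disjoint, so the graph is bipartite by construction. The embedding call reuses the node2vec-style helper sketched in Section 2.3, whereas the actual experiments use node2vec itself.

```python
import networkx as nx

def build_u2h_graph(user_hashtags):
    """user_hashtags: iterable of (user, hashtags) pairs -- an assumed format.
    Prefixing node names keeps users and hashtags disjoint (bipartite)."""
    g = nx.Graph()
    for user, hashtags in user_hashtags:
        for tag in hashtags:
            g.add_edge("u:" + user, "h:" + tag.lower())
    return g

u2h = build_u2h_graph([("troll_1", ["MAGA"]), ("troll_2", ["MAGA", "news"])])
emb = node2vec_style_embeddings(u2h, dims=128)  # helper sketched in Sec. 2.3
# Keep only the user vectors; hashtag embeddings are discarded, as in the text.
user_vecs = {n[2:]: v for n, v in emb.items() if n.startswith("u:")}
```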
3.1.2 U2M

We build an undirected User-to-Mentioned-User (U2M) graph, where the nodes are users, and there is an edge (u, v) between two nodes if user u mentions user v in their tweets (i.e., u has authored a tweet that contains "@v"). We run node2vec on this graph and we extract the embeddings for the users. As we are interested only in the troll users, we ignore the embeddings of users who are only mentioned by other trolls. We use 128 dimensions for the output embeddings. The embeddings extracted from this graph capture how similar troll users are according to the targets of their discussions on the social network.

3.1.3 BERT

BERT offers state-of-the-art text embeddings based on the Transformer (Devlin et al., 2019). We use the pre-trained BERT-large, uncased model, which has 24 layers, 1024 hidden units, 16 heads, and 340M parameters, and which yields output embeddings with 768 dimensions. Given a tweet, we generate an embedding for it by averaging the representations of the BERT tokens from the penultimate layer of the neural network. To obtain a representation for a user, we average the embeddings of all their tweets. The embeddings extracted from the text capture how similar users are according to their use of language.
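One way to compute such representations is sketched below with the HuggingFace transformers library; the library choice, truncation setting, and function names are our assumptions rather than details from the paper. With output_hidden_states=True, the model returns the activations of every layer, so hidden_states[-2] is the penultimate encoder layer described above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModel.from_pretrained("bert-large-uncased",
                                  output_hidden_states=True)
model.eval()

def tweet_embedding(text):
    """Average the token vectors from the penultimate encoder layer."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # embeddings + 24 layers
    return hidden_states[-2].mean(dim=1).squeeze(0)    # mean over tokens

def user_embedding(tweets):
    """A user is represented as the average of their tweet embeddings."""
    return torch.stack([tweet_embedding(t) for t in tweets]).mean(dim=0)
```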

3.2 Fully Supervised Learning (T1)

Given a set of troll users for which we have labels, we use the above embeddings as a representation to train a classifier. We use an L2-regularized logistic regression (LR) classifier. Each troll user is an example, and the label for the user is available for training thanks to manual labeling. We can therefore use cross-validation to evaluate the predictive performance of the model, and thus the predictive power of the features.

We experiment with two ways of combining features: embedding concatenation and model ensembling. Embedding concatenation concatenates the feature vectors from different embeddings into a longer feature vector, which we then use to train the LR model. Model ensembling instead trains a separate model with each kind of embedding, and then merges the predictions of the different models by averaging the posterior probabilities for the different classes. Henceforth, we denote embedding concatenation with the symbol ∥ and model ensembling with ⊕. For example, U2H ∥ U2M is a model trained on the concatenation of the U2H and U2M embeddings, while U2H ⊕ BERT represents the average predictions of two models, one trained on U2H embeddings and one on BERT.
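A minimal sketch of the two combination strategies, assuming row-aligned per-user feature matrices X_u2h, X_u2m, X_bert and a label vector y (hypothetical variable names); scikit-learn's LogisticRegression is L2-regularized by default, matching the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def concat_predictions(X_parts, y):
    """Embedding concatenation: one LR model over the stacked features."""
    X = np.hstack(X_parts)
    return cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)

def ensemble_predictions(X_parts, y):
    """Model ensembling: one LR model per embedding; average the posteriors."""
    posteriors = [cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                                    cv=5, method="predict_proba")
                  for X in X_parts]
    avg = np.mean(posteriors, axis=0)
    classes = np.unique(y)  # predict_proba columns follow sorted class labels
    return classes[avg.argmax(axis=1)]
```

Under this sketch, concat_predictions([X_u2h, X_u2m], y) would correspond to U2H ∥ U2M, while ensemble_predictions([X_u2h, X_bert], y) would correspond to U2H ⊕ BERT.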
3.3 Distant Supervision (T2)

In the distant supervision scenario, we assume not to have access to user labels. Given a set of troll users without labels, we use the embeddings described in Section 3.1 together with mentions of news media by the troll users to create proxy models. We assume that labels for news media are readily available, as they are stable sources of information that have a low churn rate.

We propagate labels from the given media to the troll users that mention them according to the following media-to-user mapping:

LEFT → left
RIGHT → right        (1)
CENTER → news feed

This propagation can be done in different ways: (a) by training a proxy model for media and then applying it to users, or (b) by additionally using label propagation (LP) for semi-supervised learning.

Let us describe the proxy model propagation (a) first. Let M be the set of media, and U be the set of users. We say a user u ∈ U mentions a medium m ∈ M if u posts a tweet that contains a link to the website of m. We denote the set of users that mention the medium m as Cm ⊆ U. We can therefore create a representation for a medium by aggregating the embeddings of the users that mention the target medium. Such a representation is convenient as it lies in the same space as the user representation. In particular, given a medium m ∈ M, we compute its representation R(m) as

R(m) = (1 / |Cm|) Σ_{u ∈ Cm} R(u),   (2)

where R(u) is the representation of user u, i.e., one (or a concatenation) of the embeddings described in Section 3.1. Finally, we can train a LR model that uses R(m) as features and the label for the medium l(m). This model can be applied to predict the label of a user u by using the same type of representation R(u), and the label mapping in Equation 1.

Label Propagation (b) is a transductive, graph-based, semi-supervised machine learning algorithm that, given a small set of labeled examples, assigns labels to previously unlabeled examples. The labels of each example change in relationship to the labels of neighboring ones in a properly-defined graph. More formally, given a partially-labeled dataset of examples X = Xu ∪ Xl, of which Xl are labeled examples with labels Yl, and Xu are unlabeled examples, and a similarity graph G(X, E), the label propagation algorithm finds the set of unknown labels Yu such that the number of discordant pairs (u, v) ∈ E : y_u ≠ y_v is minimized, where y_z is the label assigned to example z.

The algorithm works as follows: at every iteration of propagation, each unlabeled node updates its label to the most frequent one among its neighbors. LP reaches convergence when each node has the same label as the majority of its neighbors. We define two different versions of LP by creating two different versions of the similarity graph G.

LP1: Label Propagation using direct mention. In the first case, the set of edges among users U in the similarity graph G consists of the logical OR between the 2-hop closure of the U2H and the U2M graphs. That is, for each two users u, v ∈ U, there is an edge (u, v) ∈ E in the similarity graph if u and v share a common hashtag or a common user mention:

((u, h) ∈ U2H ∧ (v, h) ∈ U2H) ∨ ((u, w) ∈ U2M ∧ (v, w) ∈ U2M).

The graph therefore uses the same information that is available to the embeddings. To this graph, which currently encompasses only the set of users U, we add connections to the set of media M: we add an edge between each pair (u, m) if u ∈ Cm. Then, we run the label propagation algorithm, which propagates the labels from the labeled nodes M to the unlabeled nodes U, thanks to the mapping from Equation 1.

LP2: Label Propagation based on a similarity graph. In this case, we use the same representation for the media as in the proxy model case above, as described by Equation 2. Then, we build a similarity graph among media and users based on their embeddings. For each pair x, y ∈ U ∪ M, there is an edge (x, y) ∈ E in the similarity graph iff sim(R(x), R(y)) > τ, where sim is a similarity function between vectors, e.g., cosine similarity, and τ is a user-specified parameter that regulates the sparseness of the similarity graph. Finally, we perform label propagation on the similarity graph defined by the embedding similarity, with the set of nodes corresponding to M starting with labels, and with the set of nodes corresponding to U starting without labels.
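A compact sketch of this machinery follows: the medium representation of Equation 2, the τ-thresholded similarity graph used by LP2, and an iterative majority-vote propagation as described above. The data structures (dicts keyed by user and medium identifiers) and the arbitrary tie-breaking in the majority vote are our assumptions.

```python
import numpy as np
import networkx as nx

def medium_repr(cited_by, user_vecs):
    """Eq. (2): R(m) is the mean embedding of the users C_m who cite medium m."""
    return {m: np.mean([user_vecs[u] for u in users], axis=0)
            for m, users in cited_by.items()}

def similarity_graph(vecs, tau=0.55):
    """LP2: connect any pair whose cosine similarity exceeds the threshold tau."""
    keys = list(vecs)
    X = np.stack([vecs[k] / np.linalg.norm(vecs[k]) for k in keys])
    sims = X @ X.T
    g = nx.Graph()
    g.add_nodes_from(keys)
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if sims[i, j] > tau:
                g.add_edge(keys[i], keys[j])
    return g

def propagate(graph, seed_labels, max_iter=100):
    """Majority-vote label propagation; media seed labels stay fixed."""
    labels = dict(seed_labels)
    for _ in range(max_iter):
        changed = False
        for node in graph.nodes():
            if node in seed_labels:
                continue
            votes = [labels[n] for n in graph.neighbors(node) if n in labels]
            if votes:
                majority = max(set(votes), key=votes.count)
                if labels.get(node) != majority:
                    labels[node] = majority
                    changed = True
        if not changed:  # every node agrees with its neighborhood majority
            break
    return labels
```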
4 Data

4.1 IRA Russian Troll Tweets

Our main dataset contains 2 973 371 tweets by 2848 Twitter users, which the US House Intelligence Committee has linked to the Russian Internet Research Agency (IRA). The data was collected and published by Linvill and Warren (2018), and then made available online.¹ The time span covers the period from February 2012 to May 2018.

The trolls belong to the following manually assigned roles: Left Troll, Right Troll, News Feed, Commercial, Fearmonger, Hashtag Gamer, Non-English, Unknown. Kim et al. (2019) have argued that the first three categories are not only the most frequent, but also the most interesting ones. Moreover, focusing on these troll types allows us to establish a connection between troll types and the political bias of the news media they mention. Table 1 shows a summary of the troll role distribution, the total number of tweets per role, as well as examples of troll usernames and tweets.

Role        Users   Tweets    User Example     Tweet Example
Left        233     427 141   @samirgooden     @MichaelSkolnik @KatrinaPierson @samesfandiari Trump folks need to stop going on CNN.
Right       630     711 668   @chirrmorre      BREAKING: Trump ERASES Obama's Islamic Refugee Policy! https://t.co/uPTneTMNM5
News Feed   54      598 226   @dailysandiego   Exit poll: Wisconsin GOP voters excited, scared about Trump #politics

Table 1: Statistics and examples from the IRA Russian Trolls Tweets dataset.

4.2 Media Bias/Fact Check

We use data from Media Bias/Fact Check (MBFC)² to label news media sites. MBFC divides news media into the following bias categories: Extreme-Left, Left, Center-Left, Center, Center-Right, Right, and Extreme-Right. We reduce the granularity to three categories by grouping Extreme-Left and Left as LEFT, Extreme-Right and Right as RIGHT, and Center-Left, Center-Right, and Center as CENTER. Table 2 shows some basic statistics about the resulting media dataset. Similarly to the IRA dataset, the distribution is right-heavy.

Bias     Count   Example
LEFT     341     www.cnn.com
RIGHT    619     www.foxnews.com
CENTER   372     www.apnews.com

Table 2: Summary statistics about the Media Bias/Fact Check (MBFC) dataset.

¹http://github.com/fivethirtyeight/russian-troll-tweets
²http://mediabiasfactcheck.com
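The coarsening of the MBFC scale, and the matching of expanded tweet links to MBFC domains, amounts to something like the following sketch; the mbfc_bias lookup table and its key format are hypothetical.

```python
from urllib.parse import urlparse

# Collapse MBFC's seven-way bias scale into the three labels used in the paper.
MBFC_TO_COARSE = {
    "extreme-left": "LEFT", "left": "LEFT",
    "center-left": "CENTER", "center": "CENTER", "center-right": "CENTER",
    "right": "RIGHT", "extreme-right": "RIGHT",
}

def media_label(expanded_url, mbfc_bias):
    """mbfc_bias: a (hypothetical) dict from domain to MBFC bias category."""
    domain = urlparse(expanded_url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return MBFC_TO_COARSE.get(mbfc_bias.get(domain))

print(media_label("http://www.cnn.com/article", {"cnn.com": "left"}))  # LEFT
```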

5 Experiments and Evaluation

5.1 Experimental Setup

For each user in the IRA dataset, we extracted all the links in their tweets, we expanded them recursively if they were shortened, we extracted the domain of the link, and we checked whether it could be found in the MBFC dataset. By grouping these relationships by media, we constructed the sets of users Cm that mention a given medium m ∈ M.

The U2H graph consists of 108 410 nodes and 443 121 edges, while the U2M graph has 591 793 nodes and 832 844 edges. We ran node2vec on each graph to extract 128-dimensional vectors for each node. We used these vectors as features for the fully supervised and for the distant-supervision scenarios. For Label Propagation, we used an empirical threshold for edge materialization τ = 0.55, to obtain a reasonably sparse similarity graph.

We used two evaluation measures: accuracy, and macro-averaged F1 (the harmonic average of precision and recall). In the supervised scenario, we performed 5-fold cross-validation. In the distant-supervision scenario, we propagated labels from the media to the users. Therefore, in the latter case, the user labels were only used for evaluation.
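For completeness, a small sketch of the two measures as we would compute them with scikit-learn, scaled to 0–100 to match the tables that follow; this bookkeeping is our assumption, not code from the paper.

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """The two measures used throughout: accuracy and macro-averaged F1."""
    return {"accuracy": 100 * accuracy_score(y_true, y_pred),
            "macro_f1": 100 * f1_score(y_true, y_pred, average="macro")}
```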
5.2 Evaluation Results

Table 3 shows the evaluation results. Each line of the table represents a different combination of features, models, or techniques. As mentioned in Section 3, the symbol '∥' denotes a single model trained on the concatenation of the features, while the symbol '⊕' denotes an averaging of individual models trained on each feature separately. The tags 'LP1' and 'LP2' denote the two label propagation versions, by mention and by similarity, respectively.

                              Full Supervision (T1)    Distant Supervision (T2)
Method                        Accuracy    Macro F1     Accuracy    Macro F1
Baseline (majority class)     68.7        27.1         68.7        27.1
Kim et al. (2019)             84.0        75.0         N/A         N/A
BERT                          86.9        83.1         75.1        60.5
U2H                           87.1        83.2         76.3        60.9
U2M                           88.1        83.9         77.3        62.4
U2H ⊕ U2M                     88.3        84.1         77.9        64.1
U2H ∥ U2M                     88.7        84.4         78.0        64.6
U2H ⊕ U2M ⊕ BERT              89.0        84.4         78.0        65.0
U2H ∥ U2M ∥ BERT              89.2        84.7         78.2        65.1
U2H ∥ U2M ∥ BERT + LP1        89.3        84.7         78.3        65.1
U2H ∥ U2M ∥ BERT + LP2        89.6        84.9         78.5        65.7

Table 3: Predicting the role of the troll users using full vs. distant supervision.

We can see that accuracy and macro-averaged F1 are strongly correlated and yield very consistent rankings for the different models. Thus, henceforth we will focus our discussion on accuracy.

We can see in Table 3 that it is possible to predict the roles of the troll users by using distant supervision with relatively high accuracy. Indeed, the results for T2 are lower compared to their T1 counterparts by only 10 and 20 points absolute in terms of accuracy and F1, respectively. This is impressive considering that the models for T2 have no access to labels for troll users.

Looking at individual features, for both T1 and T2, the embeddings from U2M outperform those from U2H and from BERT. One possible reason is that the U2M graph is larger, and thus contains more information. It is also possible that the social circle of a troll user is more indicative than the hashtags they used. Finally, the textual content on Twitter is quite noisy, and thus the BERT embeddings perform slightly worse when used alone.

All our models with a single type of embedding easily outperform the model of Kim et al. (2019). The difference is even larger when combining the embeddings, be it by concatenating the embedding vectors or by training separate models and then combining the posteriors of their predictions.

By concatenating the U2M and the U2H embeddings (U2H ∥ U2M), we fully leverage the hashtag and the mention representations in the latent space, thus achieving accuracy of 88.7 for T1 and 78.0 for T2, which is slightly better than when training separate models and then averaging their posteriors (U2H ⊕ U2M): 88.3 for T1 and 77.9 for T2. Adding BERT embeddings to the combination yields further improvements, and follows a similar trend, where feature concatenation works better, yielding 89.2 accuracy for T1 and 78.2 for T2 (compared to 89.0 accuracy for T1 and 78.0 for T2 for U2H ⊕ U2M ⊕ BERT).

Adding label propagation yields further improvements, both for LP1 and for LP2, with the latter being slightly superior: 89.6 vs. 89.3 accuracy for T1, and 78.5 vs. 78.3 for T2.

Overall, our methodology achieves sizable improvements over previous work, reaching an accuracy of 89.6 vs. 84.0 for Kim et al. (2019) in the fully supervised case. Moreover, it achieves 78.5 accuracy in the distant supervision case, which is only 11 points behind the result for T1, and is about 10 points above the majority class baseline.

6 Discussion

6.1 Ablation Study

We performed different experiments with the hyper-parameters of the graph embeddings. With smaller dimensionality (i.e., using 16 dimensions instead of 128), we noticed 2–3 points of absolute decrease in accuracy across the board.

Moreover, we found that using all of the data for learning the embeddings was better than focusing only on the users that we target in this study, namely left, right, and news feed: using the rest of the data adds additional context to the embedding space, and makes the target labels more contextually distinguishable. Accordingly, we observe a 5–6 point absolute drop in accuracy when training our embeddings only on tweets by trolls labeled as left, right, and news feed.

6.2 Comparison to Full Supervision

Next, we compared to the work of Kim et al. (2019), who had a fully supervised learning scenario, based on Tarde's Actor-Network Theory. They paid more attention to the content of the tweet by applying a text-distance metric in order to capture the semantic distance between two sequences. In contrast, we focus on critical elements of information that are salient in Twitter: hashtags and user mentions. By building a connection between users, hashtags, and user mentions, we effectively filtered out the noise and focused only on the most sensitive type of context, thus automatically capturing features from this network via graph embeddings.

6.3 Reverse Classification: Media from Trolls

Table 4 shows an experiment in distant supervision for reverse classification, where we trained a model on the IRA dataset with the troll labels, and then we applied that model to the representation of the media in the MBFC dataset, where each medium is represented as the average of the embeddings of the users who cited that medium. We can see that we could improve over the baseline by 20 points absolute in terms of accuracy and by 41 points absolute in terms of macro-averaged F1.

Method                  Accuracy   Macro F1
Baseline (majority)     46.5       21.1
BERT                    61.8       60.4
U2H                     61.6       60.0
U2M                     62.7       61.4
U2H ⊕ U2M               63.5       61.8
U2H ∥ U2M               63.8       61.9
U2H ⊕ U2M ⊕ BERT        63.7       61.8
U2H ∥ U2M ∥ BERT        64.0       62.2

Table 4: Leveraging user embeddings to predict the bias of the media cited by troll users.

We can see in Table 4 that the relative ordering in terms of performance for the different models is consistent with that for the experiments in the previous section. This suggests that the relationship between trolls and media goes both ways, and thus we can use labels for media as a way to label users, and we can also use labels for troll users as a way to label media.

7 Conclusion and Future Work

We have proposed a novel approach to analyze the behavior patterns of political trolls according to their political leaning (left vs. news feed vs. right) using features from social media, i.e., from Twitter. We experimented with two scenarios: (i) supervised learning, where labels for trolls are provided, and (ii) distant supervision, where such labels are not available, and we rely on more commonly available labels for news outlets cited by the trolls. Technically, we leveraged the community structure and the text of the messages in the online social network of trolls represented as a graph, from which we extracted several types of representations, i.e., embeddings, for the trolls. Our experiments on the "IRA Russian Troll" dataset have shown improvements over the state-of-the-art in the supervised scenario, while providing a compelling case for the distant-supervision scenario, which has not been explored before.³

In future work, we plan to apply our methodology to other political events such as Brexit, as well as to other election campaigns around the world, in connection to which large-scale troll campaigns have been revealed. We further plan experiments with other graph embedding methods, and with other social media. Finally, the relationship between media bias and a troll's political role that we have highlighted in this paper is extremely interesting. We have shown how to use it to go from the media-space to the user-space and vice-versa, but so far we have just scratched the surface in terms of understanding the process that generated these data and its possible applications.

Acknowledgments

This research is part of the Tanbih project,⁴ which aims to limit the effect of "fake news", propaganda and media bias by making users aware of what they are reading. The project is developed in collaboration between the Qatar Computing Research Institute, HBKU and the MIT Computer Science and Artificial Intelligence Laboratory. Gianmarco De Francisci Morales acknowledges support from Intesa Sanpaolo Innovation Center. The funder had no role in the study design, in the data collection and analysis, in the decision to publish, or in the preparation of the manuscript.

³Our data and code are available at http://github.com/amatanasov/conll_political_trolls
⁴http://tanbih.qcri.org/

References

Kayode Sakariyah Adewole, Nor Badrul Anuar, Amirrudin Kamsin, Kasturi Dewi Varathan, and Syed Abdul Razak. 2017. Malicious accounts: Dark of the social networks. Journal of Network and Computer Applications, 79:41–67.

Abdullah Almaatouq, Erez Shmueli, Mariam Nouh, Ahmad Alabdulkareem, Vivek K Singh, Mansour Alsaleh, Abdulrahman Alarifi, Anas Alfaris, et al. 2016. If it looks like a spammer and behaves like a spammer, it must be a spammer: analysis and detection of microblogging spam accounts. International Journal of Information Security, 15(5):475–491.

Amy Binns. 2012. DON'T FEED THE TROLLS! Managing troublemakers in magazines' online communities. Journalism Practice, 6(4):547–562.

Zhan Bu, Zhengyou Xia, and Jiandong Wang. 2013. A sock puppet detection algorithm on virtual spaces. Knowledge-Based Systems, 37:366–377.

Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering, 30(9):1616–1637.

Kevin R. Canini, Bongwon Suh, and Peter L. Pirolli. 2011. Finding credible information sources in social networks based on content and social structure. In Proceedings of the IEEE Conference on Privacy, Security, Risk, and Trust, and the IEEE Conference on Social Computing, SocialCom/PASSAT '11, pages 1–8, Boston, MA, USA.

Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 675–684, Hyderabad, India.

Cheng Chen, Kui Wu, Venkatesh Srinivasan, and Xudong Zhang. 2013. Battling the internet water army: Detection of hidden paid posters. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining, ASONAM '13, pages 116–120, Niagara Falls, ON, Canada.

Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the IEEE Conference on Privacy, Security, Risk and Trust and of the IEEE Conference on Social Computing, PASSAT/SocialCom '12, pages 71–80, Amsterdam, Netherlands.

Kirsti K Cole. 2015. "It's like she's eager to be verbally abused": Twitter, trolls, and (en)gendering disciplinary rhetoric. Feminist Media Studies, 15(2):356–358.

Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, and Maurizio Tesconi. 2015. Fame for sale: efficient detection of fake Twitter followers. Decision Support Systems, 80:56–71.
Kareem Darwish, Dimitar Alexandrov, Preslav Nakov, and Yelena Mejova. 2017. Seminar users in the Arabic Twitter sphere. In Proceedings of the 9th International Conference on Social Informatics, SocInfo '17, pages 91–108, Oxford, UK.

Kushal Dave, Steve Lawrence, and David M Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International World Wide Web Conference, WWW '03, pages 519–528, Budapest, Hungary.

Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences, 113(3):554–559.

Chrysanthos Dellarocas. 2006. Strategic manipulation of internet opinion forums: Implications for consumers and firms. Management Science, 52(10):1577–1593.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT '19, pages 4171–4186, Minneapolis, MN, USA.

Ibrahim Elbeltagi and Gomaa Agag. 2016. E-retailing ethics and its impact on customer satisfaction and repurchase intention: A cultural and commitment-trust theory perspective. Internet Research, 26(1):288–310.

Michael Fire, Dima Kagan, Aviad Elyashar, and Yuval Elovici. 2014. Friend or foe? Fake profile identification in online social networks. Social Network Analysis and Mining, 4(1):1–23.

Patxi Galán-García, José Gaviria de la Puerta, Carlos Laorden Gómez, Igor Santos, and Pablo García Bringas. 2016. Supervised machine learning for the detection of troll profiles in Twitter social network: Application to a real case of cyberbullying. Logic Journal of the IGPL, 24(1):42–53.

Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2016. Quantifying controversy in social media. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining, WSDM '16, pages 33–42, San Francisco, CA, USA.

Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. The effect of collective attention on controversial debates on social media. In Proceedings of the 9th International ACM Web Science Conference, WebSci '17, pages 43–52, Troy, NY, USA.

Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the International World Wide Web Conference, WWW '18, pages 913–922, Lyon, France.

Pepa Gencheva, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. 2017. A context-aware approach for detecting worth-checking claims in political debates. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP '17, pages 267–276, Varna, Bulgaria.

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 855–864, San Francisco, CA, USA.

Hong-Youl Ha, Joby John, J. Denise John, and Yongkyun Chung. 2016. Temporal effects of information from social networks on online behavior: The role of cognitive and affective trust. Internet Research, 26(1):213–235.

Momchil Hardalov, Ivan Koychev, and Preslav Nakov. 2016. In search of credible news. In Proceedings of the 17th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, AIMSA '16, pages 172–180, Varna, Bulgaria.

Naeemul Hassan, Chengkai Li, and Mark Tremayne. 2015. Detecting check-worthy factual claims in presidential debates. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM '15, pages 1835–1838, Melbourne, Australia.
Li-An Ho, Tsung-Hsien Kuo, and Binshan Lin. 2012. How social identification and trust influence organizational online knowledge sharing. Internet Research, 22(1):4–28.

Meng-Hsiang Hsu, Li-Wen Chuang, and Cheng-Se Hsu. 2014. Understanding online shopping intention: the roles of four types of trust and their antecedents. Internet Research, 24(3):332–352.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 168–177, Seattle, WA, USA.

Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT '18, New Orleans, LA, USA.

Andreas M Kaplan and Michael Haenlein. 2010. Users of the world, unite! The challenges and opportunities of social media. Business Horizons, 53(1):59–68.

Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, and Ivan Koychev. 2017a. We built a fake news & click-bait filter: What happened next will blow your mind! In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP '17, pages 334–343, Varna, Bulgaria.

Georgi Karadzhov, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. 2017b. Fully automated fact checking using external sources. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP '17, pages 344–353, Varna, Bulgaria.
Dongwoo Kim, Timothy Graham, Zimin Wan, and Marian-Andrei Rizoiu. 2019. Tracking the digital traces of Russian trolls: Distinguishing the roles and strategy of trolls on Twitter. CoRR, abs/1901.05228.

Edward C. S. Ku. 2012. Beyond price, how does trust encourage online group's buying intention? Internet Research, 22(5):569–590.

Srijan Kumar, Justin Cheng, Jure Leskovec, and V.S. Subrahmanian. 2017. An army of me: Sockpuppets in online discussion communities. In Proceedings of the 26th International Conference on World Wide Web, WWW '17, pages 857–866, Perth, Australia.

Srijan Kumar, Francesca Spezzano, and V.S. Subrahmanian. 2014. Accurately detecting trolls in Slashdot Zoo via decluttering. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, ASONAM '14, pages 188–195, Beijing, China.

David M. J. Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily A. Thorson, Duncan J. Watts, and Jonathan L. Zittrain. 2018. The science of fake news. Science, 359(6380):1094–1096.

Wenbin Li, Ning Zhong, and Chunnian Liu. 2006. Combining multiple email filters based on multivariate statistical analysis. In Foundations of Intelligent Systems, pages 729–738. Springer.

Xiaodong Li, Xinshuai Guo, Chuang Wang, and Shengliang Zhang. 2016. Do buyers express their true assessment? Antecedents and consequences of customer praise feedback behaviour on Taobao. Internet Research, 26(5):1112–1133.

Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. 2018. Attributed social network embedding. IEEE Transactions on Knowledge and Data Engineering, 30(12):2257–2270.

Darren L Linvill and Patrick L Warren. 2018. Troll factories: The Internet Research Agency and state-sponsored agenda building. Resource Centre on Media Freedom in Europe.

Clare Llewellyn, Laura Cram, Robin L. Hill, and Adrian Favero. 2018. For whom the bell trolls: Shifting troll behaviour in the Twitter Brexit debate. JCMS: Journal of Common Market Studies, 57(5):1148–1164.

Michal Lukasik, Trevor Cohn, and Kalina Bontcheva. 2015. Point process modelling of rumour dynamics in social media. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP '15, pages 518–523, Beijing, China.

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI '16, pages 3818–3824, New York, NY, USA.

Michael McCord and M. Chuah. 2011. Spam detection on Twitter using traditional classifiers. In Proceedings of the 8th International Conference on Autonomic and Trusted Computing, ATC '11, pages 175–186, Banff, Canada.

Todor Mihaylov, Georgi Georgiev, and Preslav Nakov. 2015a. Finding opinion manipulation trolls in news community forums. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, CoNLL '15, pages 310–314, Beijing, China.

Todor Mihaylov, Ivan Koychev, Georgi Georgiev, and Preslav Nakov. 2015b. Exposing paid opinion manipulation trolls. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP '15, pages 443–450, Hissar, Bulgaria.

Todor Mihaylov, Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Georgi Georgiev, and Ivan Koychev. 2018. The dark side of news community forums: Opinion manipulation trolls. Internet Research, 28(5):1292–1312.

Todor Mihaylov and Preslav Nakov. 2016. Hunting for troll comments in news community forums. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL '16, pages 399–405, Berlin, Germany.

Tsvetomila Mihaylova, Georgi Karadzhov, Pepa Atanasova, Ramy Baly, Mitra Mohtarami, and Preslav Nakov. 2019. SemEval-2019 task 8: Fact checking in community question answering forums. In Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval '19, pages 860–869, Minneapolis, MN, USA.
Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Mitra Mohtarami, Georgi Karadjov, and James Glass. 2018. Fact checking in community forums. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI '18, pages 879–886, New Orleans, LA, USA.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the International Conference on Neural Information Processing Systems, NIPS '13, pages 3111–3119, Lake Tahoe, NV, USA.

Tanushree Mitra, Graham P. Wright, and Eric Gilbert. 2017. A parsimonious language model of social media credibility across disparate events. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW '17, pages 126–145, Portland, OR, USA.

Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluís Màrquez, and Alessandro Moschitti. 2018. Automatic stance detection using end-to-end memory networks. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT '18, pages 767–776, New Orleans, LA, USA.

Michael J Moore, Tadashi Nakano, Akihiro Enomoto, and Tatsuya Suda. 2012. Anonymity and roles associated with aggressive posts in an online forum. Computers in Human Behavior, 28(3):861–867.

Meredith Ringel Morris, Scott Counts, Asta Roseway, Aaron Hoff, and Julia Schwarz. 2012. Tweeting is believing?: Understanding microblog credibility perceptions. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW '12, pages 441–450, Seattle, WA, USA.

Preslav Nakov, Tsvetomila Mihaylova, Lluís Màrquez, Yashkumar Shiroya, and Ivan Koychev. 2017. Do not trust the trolls: Predicting credibility in community question answering forums. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP '17, pages 551–560, Varna, Bulgaria.

Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. 2016. Tri-party deep network representation. Network, 11(9):12.

Kashyap Popat, Subhabrata Mukherjee, Jannik Strötgen, and Gerhard Weikum. 2017. Where the truth lies: Explaining the credibility of emerging claims on the web and social media. In Proceedings of the 26th International Conference on World Wide Web Companion, WWW '17, pages 1003–1012, Perth, Australia.

Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini, and Filippo Menczer. 2011. Truthy: Mapping the spread of astroturf in microblog streams. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW '11, pages 249–252, Hyderabad, India.

Carlos Ruiz, David Domingo, Josep Lluís Micó, Javier Díaz-Noci, Koldo Meso, and Pere Masip. 2011. Public sphere 2.0? The democratic qualities of citizen debates in online newspapers. The International Journal of Press/Politics, 16(4):463–487.

Jari Salo and Heikki Karjaluoto. 2007. A conceptual model of trust in the online environment. Online Information Review, 31(5):604–621.

Chun-Wei Seah, Hai Leong Chieu, Kian Ming Adam Chai, Loo-Nin Teow, and Lee Wei Yeong. 2015. Troll detection by domain-adapting sentiment analysis. In Proceedings of the 18th International Conference on Information Fusion, FUSION '15, pages 792–799, Washington, DC, USA.

Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47.

Pnina Shachaf and Noriko Hara. 2010. Beyond vandalism: Wikipedia trolls. Journal of Information Science, 36(3):357–370.

Thamar Solorio, Ragib Hasan, and Mainul Mizan. 2014. Sockpuppet detection in Wikipedia: A corpus of real-world deceptive writing for linking identities. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC '14, pages 1355–1358, Reykjavik, Iceland.

Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW '15, pages 1067–1077, Florence, Italy.
Scott Thacker and Mark D Griffiths. 2012. An exploratory study of trolling in online video gaming. International Journal of Cyber Behavior, Psychology and Learning (IJCBPL), 2(4):17–33.

Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science, 359(6380):1146–1151.

Mengzhou Zhuang, Geng Cui, and Ling Peng. 2018. Manufactured opinions: The effect of manipulating online product reviews. Journal of Business Research, 87:24–35.

Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter Tolmie. 2016. Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS ONE, 11(3):1–29.
