Íntegro: Leveraging Victim Prediction for Robust Fake Account Detection in OSNs

Yazan Boshmaf∗, Dionysios Logothetis†, Georgos Siganos‡, Jorge Lería§, Jose Lorenzo§, Matei Ripeanu∗, and Konstantin Beznosov∗ ∗University of British Columbia †Telefonica Research §Tuenti, Telefonica Digital ‡Qatar Computing Research Institute

Abstract—Detecting fake accounts in online social networks I.INTRODUCTION (OSNs) protects OSN operators and their users from various ma- licious activities. Most detection mechanisms attempt to predict The rapid growth of online social networks (OSNs), such and classify user accounts as real (i.e., benign, honest) or fake (i.e., as , , , LinkedIn, Google+, and Tuenti, malicious, Sybil) by analyzing user-level activities or graph-level has been followed by an increased interest in abusing them. structures. These mechanisms, however, are not robust against Due to their open nature, OSNs are particularly vulnerable to adversarial attacks in which fake accounts cloak their operation the Sybil attack [1], where an attacker creates multiple fake with patterns resembling real user behavior. accounts called Sybils for various adversarial objectives. We herein demonstrate that victims, benign users who control The problem. In its 2014 earnings report, Facebook estimated real accounts and have befriended fakes, form a distinct classifi- that up to 15 millions (1.2%) of its monthly active users are in cation category that is useful for designing robust detection mech- fact “undesirable,” representing fake accounts that are used in anisms. First, as attackers have no control over victim accounts and cannot alter their activities, a victim account classifier which violation of the site’s terms of service [2]. For such OSNs, the relies on user-level activities is relatively harder to circumvent. existence of fakes leads advertisers, developers, and investors Second, as fakes are directly connected to victims, a fake account to distrust their reported user metrics, which negatively impacts detection mechanism that integrates victim prediction into graph- their revenues [3]. Attackers create and automate fake accounts level structures is more robust against manipulations of the graph. for various malicious activities, including social spamming [4], To validate this new approach, we designed Íntegro, a scalable malware distribution [5], political astroturfing [6], and private defense system that helps OSNs detect fake accounts using a data collection [7]. It is therefore important for OSNs to detect meaningful a user ranking scheme. Íntegro starts by predicting fake accounts as fast and accurately as possible. victim accounts from user-level activities. After that, it integrates The challenge. Most OSNs employ detection mechanisms that these predictions into the graph as weights, so that edges incident to predicted victims have much lower weights than others. Finally, attempt to identify fake accounts through analyzing either user- Íntegro ranks user accounts based on a modified random walk level activities or graph-level structures. In the first approach, that starts from a known real account. Íntegro guarantees that unique features are extracted from recent user activities (e.g., most real accounts rank higher than fakes so that OSN operators frequency of friend requests, fraction of accepted requests), can take actions against low-ranking fake accounts. after which they are applied to a classifier that has been trained offline using machine learning techniques [8]. In the second We implemented Íntegro using widely-used, open-source dis- graph tributed computing platforms in which it scaled nearly linearly. approach, an OSN is formally modeled as a , with nodes We evaluated Íntegro against SybilRank, the state-of-the-art in representing user accounts and edges representing social rela- fake account detection, using real-world datasets and a large- tionships (e.g., friendships). Given the assumption that fakes scale deployment at Tuenti, the largest OSN in . We show can befriend only few real accounts, the graph is partitioned that Íntegro significantly outperforms SybilRank in user ranking into two regions separating real accounts from fakes, with a quality, where the only requirement is to employ a victim classifier narrow passage between them [9]. While these techniques are is better than random. Moreover, the deployment of Íntegro at effective against naïve attacks, various studies showed they Tuenti resulted in up to an order of magnitude higher precision are inaccurate in practice and can be easily evaded [7], [10], in fake accounts detection, as compared to SybilRank. [11]. For example, attackers can cheaply create fakes that resemble real users, circumventing feature-based detection, or use simple social engineering tactics to befriend a large number of real users, invalidating the assumption behind graph-based detection. In this work, we aim to tackle the question: “How Permission to freely reproduce all or part of this paper for noncommercial can we design a robust defense mechanism that allows an OSN purposes is granted provided that copies bear this notice and the full citation to detect accounts which are highly likely to be fake?” on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Society, the first-named author Implications. If an OSN operator can detect fakes efficiently (for reproduction of an entire paper only), and the author’s employer if the and effectively, it can improve the experience of its users by paper was prepared within the scope of employment. thwarting annoying spam messages and other abusive content. NDSS ’15, 8-11 February 2015, San Diego, CA, USA Copyright 2015 Internet Society, ISBN 1-891562-38-X The OSN operator can also increase the credibility of its user http://dx.doi.org/10.14722/ndss.2015.23260 metrics and enable third parties to consider its user accounts as authentic digital identities [12]. Moreover, the operator can Main results. We evaluated Íntegro against SybilRank using better utilize the time of its analysts who manually inspect and real-world datasets and a large-scale deployment at Tuenti. We validate accounts based on user reports. For example, Tuenti, chose SybilRank because it was shown to outperform known the largest OSN in Spain with 15M active users, estimates that contenders [13], including EigenTrust [15], SybilGuard [16], only 5% of the accounts inspected based on user reports are in SybilLimit [17], SybilInfer [18], Mislove’s method [19], and fact fake, which signifies the inefficiency of this manual pro- GateKeeper [20]. In addition, as SybilRank relies on a ranking cess [13]. The OSN operator can also selectively enforce abuse scheme that is similar to ours, albeit on an unweighted graph, mitigation techniques, such as CAPTCHA challenges [8] and evaluating against SybilRank allowed us to show the impact photo-based social authentication [14], to suspicious accounts of leveraging victim prediction on ranking quality. Our results while running at a lower risk of annoying benign users. show that Íntegro consistently outperforms SybilRank in user ranking quality, especially as E grows large. In particular, Our solution. We present Íntegro, a robust defense system that a Íntegro resulted in up to 30% improvement over SybilRank in helps OSNs identify fake accounts, which can befriend many the ranking’s area under ROC curve (AUC), which represents real accounts, through a user ranking scheme.1 We designed the probability that a random real account is ranked higher Íntegro for OSNs whose social relationships are bidirectional than a random fake account. (e.g., Facebook, Tuenti, LinkedIn), with the ranking process being completely transparent to users. While Íntegro’s ranking In practice, the deployment of Íntegro at Tuenti resulted in scheme is graph-based, the social graph is preprocessed first up to an order of magnitude higher precision in fake account and annotated with information derived from feature-based detection, where ideally fakes should be located at the bottom detection techniques. This approach of integrating user-level of the ranked user list. In particular, for the bottom 20K low- activities into graph-level structures positions Íntegro as the ranking users, Íntegro achieved 95% precision, as compared first feature-and-graph-based detection mechanism. to 43% by SybilRank and 5% by Tuenti’s user-based abuse Our design is based on the observation that victim accounts, reporting system. More importantly, the precision dramatically real accounts whose users have accepted friend requests sent decreased when moving up in the ranked list, which means by fakes, are useful for designing robust fake account detection Íntegro consistently placed most of the fakes at the bottom of mechanisms. In particular, Íntegro uses basic account features the list, unlike SybilRank. The only requirement with Íntegro is (e.g., gender, number of friends, time since last update), which to use a victim classifier that is better than random. This can be are cheap to extract from user-level activities, in order to train a easily achieved during the cross-validation phase by deploying classifier to predict unknown victims in the OSN. As attackers a victim classifier with an AUC greater than 0.5. do not have control over victims nor their activities, a victim From system scalability standpoint, Íntegro scales to OSNs classifier is inherently more resilient to adversarial attacks with many million users and runs on commodity machines. We than a similarly-trained fake account classifier. Moreover, as implemented Íntegro on top of open-source implementations of victims are directly connected to fakes, they form a “border- MapReduce [21] and Pregel [22]. Using a synthetic benchmark line” separating real accounts from fakes in the social graph. of an OSN consisting of 160M users, Íntegro takes less than 30 Íntegro makes use of this observation by incorporating victim minutes to finish its computation on 33 commodity machines. predictions into the graph as weights, such that edges incident to predicted victims have lower weights than others. Finally, Contributions. This work makes the following contributions: Íntegro ranks user accounts based on the landing probability of • Integrating user-level activities into graph-level structures. a modified random walk that starts from a known real account. We presented the design and analysis of Íntegro, a fake account The walk is “short” by terminating its traversal early before detection system that relies on a novel technique for integrating it converges. The walk is “supervised” by biasing its traversal user-level activities into graph-level structures. Íntegro uses towards nodes that are reachable through higher-weight paths. feature-based detection with user-level activities to predict how As this short, supervised random walk is likely to stay within likely each user is to be a victim. By weighting the graph the subgraph consisting of real accounts, most real accounts re- such that edges incident to predicted victims have much lower ceive higher ranks than fakes. Unlike SybilRank [13], the state- weights than others, Íntegro guarantees that most real accounts of-the-art in graph-based fake account detection, we do not are ranked higher than fakes. These ranks are derived from the assume sparse connectivity between real and fake accounts, landing probability of a modified random walk that starts from which makes Íntegro the first fake account detection system a known real account. To our knowledge, Íntegro is the first that is robust against adverse manipulation of the graph. detection system that is robust against adverse manipulation of For an OSN consisting of n users, Íntegro takes O(n log n) the social graph, where fakes follow an adversarial strategy to time to complete its computation. For attackers who randomly befriend a large number of accounts, real or fake, in an attempt establish a set Ea of edges between victim and fake accounts, to evade detection (Sections III and IV). Íntegro guarantees that no more than O(vol(Ea) log n) fakes • Implementation and evaluation. We implemented Íntegro on are assigned ranks similar to or higher than real accounts in the top of widely-used, open-source distributed machine learning vol(E ) worst case, where a is the sum of weights on edges in and graph processing platforms. We evaluated Íntegro against E a. Even with a random victim classifier that labels accounts SybilRank using real-world datasets and a large-scale deploy- as victims with 0.5 probability, Íntegro ensures that vol(E ) a ment at Tuenti. In practice, Íntegro has allowed Tuenti to detect is at most equals to |E |, resulting in an improvement factor a at least 10 times more fakes than their current user-based abuse of O (|E |/vol(E )) over SybilRank. a a reporting system, where reported users are not ranked. With 1In Spanish, the word “íntegro” means integrated, which suites our approach an average of 16K reports per day [13], this improvement has of integrating user-level activities into graph-level structures. been useful to both Tuenti and its users (Sections V and VI).

2 II.BACKGROUNDAND RELATED WORK real and fake accounts, with features similar to those outlined above. The authors trained two random forests (RF) classifiers We first outline the threat model we assume in this work. to detect fakes in Facebook and Twitter, ending up with 2% We then present required background and related work on fake FPR and 1% false-negative rate (FNR) for Facebook, and 2.5% account detection, abuse mitigation and the ground-truth, social FPR and 3% FNR for Twitter. infiltration, and analyzing victims of fakes in OSNs. Wang et al. used a click-stream dataset provided by Ren- A. Threat model Ren to cluster user accounts into “similar” behavioral groups, corresponding to real or fake accounts [29]. Using the METIS We focus on OSNs such as Facebook, RenRen, and Tuenti, clustering algorithm [30] with both session and clicks features, which are open to everyone and allow users to declare bilateral such as average clicks per session, average session length, the relationships (i.e., friendships). percentage of clicks used to send friend requests, visit photos, Capabilities. We consider attackers who are capable of cre- and share content, the authors were able to calibrate a cluster- ating and automating fake accounts on a large scale [23]. based classifier with 3% FPR and 1% FNR. Each fake account, also called a socialbot [24], can perform social activities similar to those of real users. This includes Even though feature-based detection scales to large OSNs, sending friend requests and posting social content. We do not it is still relatively easy to circumvent. This is the case because consider attackers who are capable of hijacking real accounts, it depends on features describing activities of known fakes as there are existing detection systems that tackle this threat in order to identify unknown ones. In other words, attackers (e.g., COMPA [25]). We focus on detecting fake accounts that can evade detection by adversely modifying the content and can befriend a large number of benign users in order to mount activity patterns of their fakes, leading to an arms race [31]– subsequent attacks, as we describe next. [33]. Also, feature-based detection does not provide any formal security guarantees and often results in a high FPR in practice. Objectives. The objective of an attacker is to distribute spam This is partly attributed to the large variety and unpredictability and malware, misinform, or collect private user data on a large of behaviors of users in adversarial settings [13]. scale. To achieve this objective, the attacker has to infiltrate the target OSN by using the fakes to befriend many real accounts. With Íntegro, we employ feature-based detection to identify Such an infiltration is required because isolated fake accounts unknown victims in a non-adversarial setting. The dataset used cannot directly interact with or promote content to most users to train a victim classifier includes features of only known real in the OSN [23]. This is also evident by a thriving underground accounts that have either accepted or rejected friend requests market for social infiltration. For example, attackers can now send by known fakes. As real accounts are controlled by benign connect their fake accounts with 1K users for $26 or less [26]. users who are not adversarial, a feature-based victim account classifier is harder to circumvent than a similarly-trained fake Victims. We refer to benign users who have accepted friend account classifier. As we discuss in Section IV, we only require requests from fake accounts as victims. We refer to friendships victim classification to be better than random guessing in order between victims and fakes as attack edges. Victims control real to outperform the state-of-the-art in fake account detection. accounts and engage with others in non-adversarial activities. Graph-based detection. As a response to the lack of formal B. Fake account detection security guarantees in feature-based detection, the state-of-the- art in fake account detection utilizes a graph-based approach From a systems design perspective, most of today’s fake ac- instead. In this approach, an OSN is modeled as a graph, with count detection mechanisms are either feature-based or graph- nodes representing user accounts and edges between nodes rep- based, depending on whether they utilize machine learning or resenting social relationship. Given the assumption that fakes graph analysis techniques in order to identify fakes. Next, we can establish only a small number of attack edges, the subgraph discuss each of these approaches in detail. induced by the set of real accounts is sparsely connected to 2 Feature-based detection. This approach relies on user-level fakes, that is, the cut which crosses over attack edges is sparse. activities and its account details (i.e., user logs, profiles). By Graph-based detection mechanisms make this assumption, and identifying unique features of an account, one can classify each attempt to find such a sparse cut with formal guarantees [34]– account as fake or real using various machine learning tech- [36]. For example, Tuenti employs SybilRank to rank accounts niques. For example, Facebook employs an “immune system” according to their perceived likelihood of being fake, based on that performs real-time checks and classification for each read structural properties of its social graph [13]. and write action on its database, which are based on features Yu et al. were among the first to analyze the social graph extracted from user accounts and their activities [8]. for the purpose of identifying fake accounts in OSNs [16], [17]. Yang et al. used ground-truth provided by RenRen to train The authors developed a technique that labels each account as an SVM classifier in order to detect fake accounts [27]. Using either fake or real based on multiple, modified random walks. simple features, such as frequency of friend requests, fraction This binary classification is used to partition the graph into two of accepted requests, and per-account clustering coefficient, the smaller subgraphs that are sparsely interconnected via attack authors were able to train a classifier with 99% true-positive edges, separating real accounts from fakes. They also proved rate (TPR) and 0.7% false-positive rate (FPR). that in the worst case O(|Ea| log n) fakes can be misclassified, where |Ea| is the number of attack edges and n is the number Stringhini et al. utilized honeypot accounts to collect data describing various user activities in OSNs [28]. By analyzing 2A cut is a partition of nodes into two disjoint subsets. Visually, it is a line the collected data, they were able to build a ground-truth for that cuts through or crosses over a set of edges in the graph (see Fig. 2).

3

Gender #Friends #Posts! Yang el al. studied the cyber criminal ecosystem on Twit- … Male! 3! ! 10! ter [44]. They found that victims fall into one of three cate- Feature vector of B! gories. The first are social butterflies who have large numbers of followers and followings, and establish social relationships B( with other accounts without careful examination. The second Real are social promoters who have large following-follower ratios, ! Trusted! larger following numbers, and a relatively high URL ratios in Attack! Victim edge their tweets. These victims use Twitter to promote themselves ! ! Fake or their business by actively following other accounts without ! consideration. The last are dummies who post few tweets but have many followers. These victims are actually dormant fake Real region! Fake region! accounts at an early stage of their abuse. Fig. 2: System model. In this figure, the OSN is represented as a graph consisting of 14 users. There are 8 real accounts, 6 fake accounts, III.INTUITION,GOALS, AND MODEL and 5 attack edges. The cut, represented by a dashed-line, partitions the graph into two regions, real and fake. Victim accounts are real We now introduce Íntegro, a fake account detection system accounts that are directly connected to fakes. Trusted accounts are that is robust against social infiltration. We first present the accounts that are known to be real and not victims. Each account has intuition behind our design, followed by its goals and model. a feature vector representing basic account information. Initially, all edges have a unit weight, so user B for example has a degree of 3. A. Intuition Some users are more likely to become victims than others. If we can train a classifier to accurately predict whether a user C. System model is a victim with some probability, we can then identify the As illustrated in Fig. 2, we model an OSN as an undirected cut which separates fakes from real accounts in the graph. As graph G = (V,E), where each node v ∈ V represents a victims are benign users who are not adversarial, the output i user account and each edge {v , v } ∈ E represents a bilateral of this classifier represents a reliable information which we i j social relationship among v and v . In the graph G, there are can integrate in the graph. To find the cut which crosses over i j n = |V | nodes and m = |E| edges. mostly attack edges, we can define a graph weighting scheme that assigns edges incident to predicted victims lower weights Attributes. Each node vi ∈ V has a degree deg(vi) that is than others, where weight values are calculated from prediction equal to the sum of weights on edges incident to vi. Moreover, probabilities. In a weighted graph, the sparsest cut is the cut vi has a feature vector A(vi), where each entry aj ∈ A(vi) with the smallest volume, which is the sum of weights on edges describes a feature or an attribute of the account vi. Each edge across the cut. Given an accurate victim classifier, such a cut {vi, vj} ∈ E has a weight w(vi, vj) ∈ (0, 1], which is initially is expected to cross over some or all attack edges, effectively set to w(vi, vj) = 1. separating real accounts from fakes, even if the number of attack edges is large. We find this cut using a ranking scheme Regions. The node set V is divided into two disjoint sets, Vr that ideally assigns higher ranks to nodes in one partition of and Vf , representing real and fake accounts, respectively. We the cut than others. This ranking scheme is inspired by similar refer to the subgraph induced by Vr as the real region Gr, graph partitioning algorithms proposed by Spielman et al. [45], which includes all real accounts and the friendships between Yu [34], and Cao et al. [13]. them. Likewise, we refer to the subgraph induced by Vf as the fake region Gf . The regions are connected by a set of attack edges Ea between victim and fake accounts. We assume the B. Design goals OSN operator is aware of a small set of trusted accounts Vt, Íntegro aims to help OSN operators in detecting fake ac- which are known to be real accounts and are not victims. counts using a meaningful user ranking scheme. In particular, Íntegro has the following design goals: IV. SYSTEM DESIGN • High-quality user ranking (effectiveness). The system should We now describe the design behind Íntegro. We start with consistently assign higher ranks to real accounts than fakes. It a short overview of our approach, after which we proceed with should limit the number of fakes that might rank similar to or a detailed description of each system component. higher than real accounts. The system should be robust against social infiltration under real-world attack strategies. Given a A. Overview ranked list of users, a high percentage of the users at the bottom of the list should be fake. This percentage should decrease as Íntegro extracts low-cost features from user-level activities we go up in the list. in order to train a classifier to identify unknown victims in the social graph. We refer to these accounts as potential victims, • Scalability (efficiency). The system should have a practical as there are probabilities attached to their labels. Íntegro then computational cost which allows it to scale to large OSNs. It calculates new edge weights from prediction probabilities such should deliver ranking results in only few minutes. The system that edges incident to identified victims have lower weights should be able to extract useful, low-cost features and process than others. Finally, Íntegro ranks user accounts based on the large graphs on commodity machines, in order to allow OSNs landing probability of a modified random walk that starts from to deploy it on their existing computer clusters. a trusted account picked at random. The walk is “short,” as it is

5 terminated early before it converges. The walk is “supervised,” land on a node as its trust value, so the probability distribution as it is biased towards traversing nodes which are reachable via of the walk at each step can be modeled as a trust propagation higher-weight paths. This short, supervised random walk has process [48]. In this process, a weight w(vi, vj) represents the a higher probability to stay in the real region of the graph, as rate at which trust may propagate from either side of the edge it is highly unlikely to escape into the fake region in few steps {vi, vj} ∈ E. We next describe this process in detail. through low-weight attack edges. Accordingly, Íntegro assigns most of the real accounts a higher rank than fakes. Trust propagation. Íntegro utilizes the power iteration method to efficiently compute trust values [49]. This method involves successive matrix multiplications where each element of the B. Identifying potential victims matrix is the transition probability of the random walk from For each user vi, Íntegro extracts a feature vector A(vi) one node to another. Each iteration computes the trust distri- from its recent user-level activities. A subset of feature vectors bution over nodes as the random walk proceeds by one step. is selected to train a binary classifier to predict whether each Let Tk(vi) denote the trust collected by each node vi ∈ V user is a victim and with what probability. As attackers have after k iterations. Initially, the total trust, denoted by τ ≥ 1, no control over victims, such a victim classifier is inherently is evenly distributed among the trusted nodes in Vt: more resilient to adversarial attacks than similarly-trained fake  account classifier. Let us consider one concrete example. In the τ/|Vt| if vi ∈ Vt, T0(vi) = (1) “boiling-frog” attack [31], fake accounts can force a classifier 0 otherwise. to tolerate abusive activities by slowly introducing similar activities to the OSN. Because the OSN operator has to retrain The process then proceeds as follows: deployed classifiers in order to capture new behaviors, a fake X w(vi, vj) account classifier will learn to tolerate more and more abusive Tk(vi) = Tk−1(vj) · , (2) deg(vj) activities, until the attacker can launch a full-scale attack {vi,vj }∈E without detection [7]. For victim prediction, on the other hand, this is possible only if the accounts used for training have been where in iteration k, each node vi propagates its trust Tk−1(vi) hijacked. This situation can be avoided by manually verifying from iteration k−1 to each neighbour vj, proportionally to the the accounts, as described in Section II-C. ratio w(vi, vj)/ deg(vi). This is required so that the sum of the propagated trust equals Tk−1(vi). The node vi then collects the Feature engineering. Extracting and selecting useful features trust propagated similarly from each neighbour vj and updates from user activities can be both challenging and time consum- its trust Tk(vi). Throughout this process, τ is preserved such ing. For efficiency, we seek features that can be extracted in that for each iteration k ≥ 1 we have: O(1) time per user. One candidate location for low-cost feature X X extraction is the profile page of user accounts, where features Tk−1(vi) = Tk(vi) = τ. (3) are readily available (e.g., a Facebook profile page). However, vi∈V vi∈V these features are expected to be statistically “weak,” which means they may not strongly correlate with whether a user is a victim or not (i.e., the label). As we explain later, we require Our goal is to ensure that most real accounts collect higher the victim classifier to be better than random in order to deliver trust than fake accounts. That is, we seek to limit the portion of τ G G robust fake account detection. This requirement, fortunately, is that escapes the real region r and enters the fake region f . easy to satisfy. In particular, we show in Section V that an OSN To achieve this property, we make the following modifications. operator can train and cross-validate a victim classifier that is Adjusted propagation rates. In each iteration k, the aggregate up to 52% better than random, using strictly low-cost features. rate at which τ may enter Gf is strictly limited by the sum of weights on the attack edges, which we denote by the volume Supervised learning. For each user vi, Íntegro computes a vul- vol(Ea). Therefore, we aim to adjust the weights in the graph nerability score p(vi) ∈ (0, 1) that represents the probability such that vol(Ea) ∈ (0, |Ea|], without severely restricting trust of vi to be a victim. For a fixed operating threshold α ∈ (0, 1) with a default value of α = 0.5, we say v is a potential victim propagation in Gr. We accomplish this by assigning smaller i weights to edges incident to potential victims than other edges. if p(vi) ≥ α. To compute vulnerability scores, Íntegro uses random forests (RF) learning algorithm [46] to train a victim In particular, each edge {vi, vj} ∈ E keeps the default weight classifier, which given A(v ) and α, decides whether the user w(vi, vj) = 1 if vi and vj are not potential victims. Otherwise, i we modify the weight as follows: vi is a victim with a score p(vi). We picked this learning algorithm because it is both efficient and robust against model w(v , v ) = min {1, β · (1 − max{p(v ), p(v )})} , (4) over-fitting [47]. It takes O(n log n) time to extract n low-cost i j i j feature vectors, each consisting of O(1) features, and train a where β is a scaling parameter with a default value of β = 2. O(n) victim classifier. It also takes to evaluate node scores, Now, as vol(Ea) → 0 the portion of τ that enters Gf reaches given the trained classifier and users’ feature vectors. zero as desired. For proper degree normalization, we introduce a self-loop {vi, vi} with weight w(vi, vi) = (1 − deg(vi)) /2 C. Integrating victim predictions and ranking users whenever deg(vi) < 1. Notice that self-loops are considered twice in degree calculation. To rank users, Íntegro computes the probability of a modi- fied random walk to land on each user vi after k steps, where Early-terminated propagation. In each iteration k, the trust the walk starts from a trusted user account picked at random. vector Tk(V ) = hTk(v1),...,Tk(vn)i describes the distribu- For simplicity, we refer to the probability of a random walk to tion of τ throughout the graph. As k → ∞ the vector converges

6 237" 13" 0" 0" (113)" (4)" 107" 159" 0" A" B" G 231" A" B" G A" B" G Real" High"="1.0" 103" (115)" 10" Trusted" Complementary"="0.25" E" F" 0" I" 0" E" F" (5)" I" 5" E" F" 108" I" 103" " 0" 46" (3)" VicOm 51" " Low"="0.1" 500" C" D H 316" C" D (46)" H 154" C" D H 0" (105)" 13" Fake" 500" 129" 56" 159" (117)" (4)"

(a) Initialization! (b) After 4 iterations! (c) Stationary distribution! Fig. 3: Trust propagation in a toy graph. Each value is rounded to its nearest natural number. Values in parentheses represent degree-normalized trust (i.e., rank values). In this example, we set α = 0.5, β = 2, τ = 1, 000, p(·) = 0.05 except for p(E) = 0.95, and ω = dlog2(9)e = 4.

to a stationary distribution T∞(V ), as follows [50]: D. Trusted accounts and community structures   deg(v1) deg(vn) Íntegro is robust against social infiltration, as it limits the T∞(V ) = τ · , . . . , τ · , (5) vol(V ) vol(V ) portion of τ that enters Gf by the rate vol(Ea), regardless of the number of attack edges, |Ea|. For the case when there are where the volume vol(V ) in this case is the sum of degrees of few attack edges so that Gr and Gf are sparsely connected, 3 nodes in V . In particular, Tk(V ) converges after k reaches the vol(Ea) is already small, even if one keeps w(vi, vj) = 1 mixing time of the graph, which is much larger than O(log n) for each attack edge {vi, vj} ∈ Ea. However, Gr is likely to iterations for various kinds of social networks [37], [51], [52]. contain communities [37], [53], where each represents a dense Accordingly, we terminate the propagation process early before subgraph that is sparsely connected to the rest of the graph. it converges after ω = O(log n) iterations. In this case, the propagation of τ in Gr becomes restricted Degree-normalization. As described in Equation 5, trust prop- by the sparse inter-community connectivity, especially if Vt is agation is influenced by individual node degrees. As k grows contained exclusively in a single community. We therefore seek large, the propagation starts to bias towards high degree nodes. a selection strategy for trusted accounts, or seeds, that takes This implies that high degree fake accounts may collect more into account the existing community structure in the graph. trust than low degree real accounts, which is undesirable for Selection strategy. We pick trusted accounts as follows. First, effective user ranking. To eliminate this node degree bias, we before rate adjustment, we estimate the community structure normalize the trust collected by each node by its degree. That in the graph using a community detection algorithm called is, we assign each node vi ∈ V after ω = O(log n) iterations a the Louvain method [54]. Second, after rate adjustment, we 0 rank value Tω(vi) that is equal to its degree-normalized trust: exclude potential victims and pick small samples of nodes from each detected community at random. Third and last, we T 0 (v ) = T (v )/ deg(v ). (6) ω i ω i i inspect the sampled nodes in order to verify they correspond to Finally, we sort the nodes by their ranks in a descending order. real accounts that are not victims. We initialize the trust only between the accounts that pass manual verification by experts. Example. Fig. 3 depicts trust propagation on a toy graph. In this example, we assume each account has a vulnerability score In addition to coping with the existing community structure of 0.05 except the victim E, which has a score of p(E) = 0.95. in the graph, this selection strategy is designed to also reduce The graph is weighted using α = 0.5 and β = 2, and a total the negative impact of seed-targeting attacks. In such attacks, trust τ = 1000 in initialized over the trusted nodes {C,D}. fakes befriend trusted accounts in order to adversely improve their ranking, as the total trust τ is initially distributed among After ω = 4 iterations, all real accounts {A, B, C, D, E} trusted accounts. By choosing the seeds at random, however, collect more trust than fake accounts {F, G, H, I}. The nodes the attacker is forced to guess the seeds among a large number also receive the correct ranking of (D, A, B, C, E, F, G, H, I), of nodes. Moreover, by choosing multiple seeds, the chance as sorted by their degree-normalized trust. In particular, all of correctly guessing the seeds is further reduced, while the real accounts have higher rank values than fakes, where the amount of trust assigned to each seed in lowered. In practice, 0 0 smallest difference is T4(E) − T4(F ) > 40. Moreover, notice the number of seeds depends on available resources for manual that real accounts that are not victims have similar rank values, account verification, with a minimum of one seed per detected 0 0 where the largest difference is T4(D) − T4(C) < 12. These community. sorted rank values, in fact, could be visualized as a stretched- out step function that has a significant drop near the victim’s Community detection. We picked the Louvain method as it is rank value. However, if we allow the process to converge after both efficient and produces high-quality partitions. The method k > 50 iterations, the fakes collect similar or higher trust iteratively groups closely connected communities together to than real accounts, following Equation 5. Also, notice that the greedily improve the modularity of the partition [55], which is attack edges Ea = {{E,G}, {E,F }, {E,H}} have a volume a measure for partition quality. In each iteration, every node of vol(Ea) = 0.3, which is 10 times lower than its value if the represents one community, and well-connected neighbors are graph had unit weights, with vol(Ea) = 3. As we soon show greedily combined into the same community. At the end of the in Section V, adjusting the propagation rates is essential for iteration, the graph is reconstructed by converting the resulting robustness against social infiltration. communities into nodes and adding edges that are weighted by inter-community connectivity. Each iteration takes O(m) time, 3The definition of vol(U) depends on whether U contains edges or nodes. and only a small number of iterations is required to find the

7 community structure which greedily maximizes the modularity. As Gr is fast mixing, each real account vi ∈ Vr receives 0 approximately identical rank value of Tω(vi) = c · τ/vol(V ), While one can apply community detection to identify fake where τ/vol(V ) is the degree-normalized trust value in T (V ) accounts [19], doing so hinges on the assumption that fakes ∞ (Equations 5 and 6). Knowing that Gf is controlled by the always form tightly-knit communities, which is not necessarily 0 attacker, each fake vj ∈ Vf receives a rank value Tω(vj) that true [27]. This also means fakes can easily evade detection if depends on how the fakes inter-connect to each other. However, they establish sparse connectivity among themselves [9]. With since the aggregate trust in Gf is bounded, each fake receives Íntegro, we do not make such an assumption. In particular, we 0 on average a rank value of Tω(vj) = f · τ/vol(V ), which is consider an attacker who can befriend a large number of real less than that of a real account. In the worst case, an attacker or fake accounts, without any formal restrictions. can arrange a set Vm ⊂ Vf of fake accounts in Gf such that 0 each ∈ Vm receives a rank value of Tω(vk) = c · τ/vol(V ), E. Computational cost while the remaining fakes receive a rank value of zero. Such a set cannot have more than (f/c)·vol(Vs) = O (vol(Ea) log n) n m For an OSN with users and friendships, Íntegro takes accounts, as otherwise, f would not be less than 1 and Gf O(n log n) time to complete its computation, end-to-end. We would receive more than it should in Tω(V ). next analyze the running time in detail. Improvement over SybilRank’s bound. Íntegro shares many Runtime analysis. Recall that users have a limit on how many design traits with SybilRank, which is the state-of-the-art in friends they can have (e.g., 5K in Facebook, 1K in Tuenti), graph-based detection [13]. In particular, modifying Íntegro by so we have O(m) = O(n). Identifying potential victims takes setting w(vi, vj) = 1 for each (vi, vj) ∈ E will in fact result in O(n log n) time, where it takes O(n log n) time to train an RF an identical ranking. It is indeed the prediction and incorpora- classifier and O(n) time to compute vulnerability scores. Also, tion of potential victims that differentiates Íntegro from other weighting the graph takes O(m) time. Detecting communities proposals, giving it the unique advantages outlined earlier. takes O(n) time, where each iteration of the Louvain method takes O(m) time, and the graph rapidly shrinks in O(1) time. As stated by Theorem 4.1, the bound on ranking quality Propagating trust takes O(n log n) time, as each iteration takes relies on vol(Ea), regardless of how large the set Ea grows. As O(m) time and the propagation process iterates for O(log n) we weight the graph based on the output of the victim classi- times. Ranking and sorting users by their degree-normalized fier, our bound is sensitive to its classification performance. We trust takes O(n log n) time. So, the running time is O(n log n). next prove that if an OSN operator uses a victim classifier that is uniformly random, which means each user account vi ∈ V is equally vulnerable with p(vi) = 0.5, then Íntegro is as good F. Security guarantees as SybilRank in terms of ranking quality [13]: For the upcoming security analysis, we consider attackers Corollary 4.2: For a uniformly random victims classifier, who establish attack edges with victims uniformly at random. the number of fake accounts that rank similar to or higher than Even though our design does not depend on the actual mixing real accounts after O(log n) iterations is O(|Ea| log n). time of the graph, we assume the real region is fast mixing Proof: This classifier assigns each user account v ∈ V for analytical tractability. This means that it takes O(log |Vr|) i iterations for trust propagation to converge in the real region. a score p(vi) = 0.5. By Equation 4, each edge {vi, vj} ∈ E In other words, we assume there is a gap between the mixing is assigned a unit weight w(vi, vj) = 1, where α = 0.5 and time of the whole graph and that of the real region such that, β = 2. By Theorem 4.1, the number of fake accounts that after O(log n) iterations, the propagation reaches its stationary rank similar to or higher than real accounts after ω = O(log n) distribution in the real region but not in the whole graph. iterations is O (vol(Ea) log n) = O(|Ea| log n). Main theoretical result. The main security guarantee provided By Corollary 4.2, Íntegro can outperform SybilRank in its by Íntegro is captured by the following theoretical result. For a ranking quality by a factor of O (|Ea|/vol(Ea)), given the used complete proof, we refer the reader to our technical report [56]: victim classifier is better than random. This can be achieved during the cross-validation phase of the victim classifier, which Theorem 4.1: Given a social graph with a fast mixing real we thoroughly describe in what follows. region and an attacker who randomly establishes attack edges, the number of fake accounts that rank similar to or higher than V. SYSTEM EVALUATION real accounts after O(log n) iterations is O (vol(Ea) log n). We analyzed and evaluated Íntegro against SybilRank using Proof sketch: Let us consider a graph G = (V,E) with a two real-world datasets recently collected from Facebook and fast mixing real region Gr. As weighting a graph changes its Tuenti. We also compared both systems through a large-scale mixing time by a constant factor [57], Gr remains fast mixing deployment at Tuenti in collaboration with its “Site Integrity” after rate adjustment. team, which has 14 full-time account analysts and 10 full-time engineers who fight spam and other forms of abuse. After O(log n) iterations, the trust vector Tω(V ) does not reach its stationary distribution T∞(V ). Since trust propagation Compared system. We chose SybilRank for two main reasons. starts from Gr, the fake region Gf gets only a fraction f < 1 First, as discussed in Section IV-F, SybilRank utilizes a similar of the aggregate trust it should receive in T∞(V ). On the other ranking scheme based on the power iteration method, albeit on hand, as the trust τ is conserved during the propagation process an unweighted version of the graph. This similarity allowed us (Equation 3), Gr gets c > 1 times higher aggregate trust than to clearly show the impact of leveraging victim prediction on it should receive in T∞(V ). fake account detection. Second, SybilRank outperforms other

8 RI Score (%) Feature Brief description Type Facebook Tuenti User activity: Friends Number of friends the user had Numeric 100.0 84.5 Photos Number of photos the user shared Numeric 93.7 57.4 Feed Number of news feed items the user had Numeric 70.6 60.8 Groups Number of groups the user was member of Numeric 41.8 N/A Likes Number of likes the users made Numeric 30.6 N/A Games Number of games the user played Numeric 20.1 N/A Movies Number of movies the user watched Numeric 16.2 N/A Music Number of albums or songs the user listened to Numeric 15.5 N/A TV Number of TV shows the user watched Numeric 14.2 N/A Books Number of books the user read Numeric 7.5 N/A Personal messaging: Sent Number of messages sent by the user Numeric N/A 53.3 Inbox Number of messages in the user’s inbox Numeric N/A 52.9 Privacy Privacy level for receiving messages 5-Categorical N/A 9.6 Blocking actions: Users Number of users blocked by the user Numeric N/A 23.9 Graphics Number of graphics (photos) blocked by the user Numeric N/A 19.7 Account information: Last updated Number of days since the user updated the profile Numeric 90.77 32.5 Highlights Number of years highlighted in the user’s time-line Numeric 36.3 N/A Membership Number of days since the user joined the OSN Numeric 31.7 100 Gender User is male or female 2-Categorical 13.8 7.9 Cover picture User has a cover picture 2-Categorical 10.5 < 0.1 Profile picture User has a profile picture 2-Categorical 4.3 < 0.1 Pre-highlights Number of years highlighted before 2004 Numeric 3.9 N/A Platform User disabled third-party API integration 2-Categorical 1.6 < 0.1

TABLE I: Low-cost features extracted from Facebook and Tuenti datasets. The RI score is the relative importance of the feature. A value of “N/A” means the feature was not available for this dataset. A k-Categorical feature means this feature can have one value out of k categories (e.g., boolean features are 2-Categorical).

contenders [13], including EigenTrust [15], SybilGuard [16], O(|Ea|/vol(Ea)) improvement on its security bound, requires SybilLimit [17], SybilInfer [18], Mislove’s method [19], and the same O(n log n) time, and is robust against social infiltra- GateKeeper [20]. We next contrast these systems to both tion, unlike SybilRank and all other systems. SybilRank and Íntegro. SybilGuard [16] and SybilLimit [17] identify fake accounts A. Datasets based on a large number of√ modified random walks, where We used two datasets from two different OSNs. The first the computational cost is O( mn log n) in centralized setting dataset was collected in the study described in Section II-D, like OSNs. SybilInfer [18], on the other hand, uses Bayesian and contained public user profiles and two graph samples. The inference techniques to assign each user account a probability second dataset was collected from Tuenti’s production servers, 2 of being fake in O(n(log n) ) time per trusted account. The and contained a day’s worth of server-cached user profiles. system, however, does not provide analytical bounds on how many fakes can outrank real accounts in the worst case. Research ethics. For collecting the first dataset, we followed known practices and obtained the approval of our university’s GateKeeper [20], which is a flow-based detection approach, research ethics board [7]. As for the second dataset, we signed improves over SumUp [58]. It relies on strong assumptions that a non-disclosure agreement with Tuenti in order to access an require balanced graphs and costs O(n log n) time per trusted anonymized, aggregated version of its user data, with the whole account, referred to as a “ticket source.” process being mediated by Tuenti’s Site Integrity team. Viswanath et al. used Mislove’s algorithm [39] to greedily The ground-truth. For the Tuenti dataset, the accounts were expand a local community around known real accounts in oder inspected and labeled by its accounts’ analysts. The inspection to partition the graph into two communities representing real included matching user profile photos to its declared age or and fake regions [19]. This algorithm, however, costs O(n2) address, understanding natural language in user posts, exam- time and its detection can be easily evaded if the fakes establish ining the friends of a user, and analyzing the user’s IP address sparse connectivity among themselves [9]. and HTTP-related information. For the Facebook dataset, we used the ground-truth of the original study [7], which we also Compared to these systems, SybilRank provides an equiv- re-validated for the purpose of this work, as we describe next. alent or tighter security bound and is more computationally ef- ficient, as it requires O(n log n) time regardless of the number Facebook. The dataset contained public profile pages of 9,646 of trusted accounts. Compared to SybilRank, Íntegro provides real users who received friend requests from fake accounts. As

9 1 0.761 AUC = 0.76 the dataset was collected in early 2011, we wanted to verify 0.759 whether these users are still active on Facebook. Accordingly, 0.8 AUC = 0.7 AUC = 0.5 0.757 we revisited their public profiles in June 2013. We found that 0.6 0.755

7.9% of these accounts were either disabled by Facebook or 0.753 0.4 TuenS( deactivated by the users themselves. Accordingly, we excluded True(posiSve(rate( 0.751 0.2 Facebook( these accounts, ending up with 8,888 accounts, out of which Mean(area(under(ROC(curve( 0.749 Random( 32.4% were victims who accepted a single friend request sent 0 0.747 0 0.2 0.4 0.6 0.8 1 10 20 30 40 50 60 by a fake posing as a stranger. As fakes initially targeted users False(posiSve(rate( Dataset(size((thousands)( at random, the dataset included a diverse sample of Facebook (a) ROC Analysis (b) Sensitivity to dataset size users. In particular, these users were 51.3% males and 48.7% females, lived in 1,983 cities across 127 countries, practiced 43 languages, and have used Facebook for 5.4 years on average. Fig. 4: Victim prediction using the RF algorithm. In (a), the ROC curves show the tradeoff between FPR and TPR for both datasets. The dataset also included two graph samples of Facebook, In ROC analysis, the closer the curve is to the upper-left corner the which were collected using a stochastic version of the Breadth- more accurate it is. The area under the ROC curve (AUC) summarizes First Search method called “forest fire” [59]. The first graph the classifier’s performance. Therefore, an AUC of 1 means a perfect consisted of 2,926 real accounts with 9,124 friendships (the classifier, while an AUC of 0.5 means a random classifier. We require real region), 65 fakes with 2,080 friendships (the fake region), the victim classifier to be better than random. In (b), during cross and 748 timestamped attack edges. The second graph consisted validation on Tuenti dataset, we observed that increasing the dataset size to more than 40K vectors did not significantly increase the AUC. of 6,136 real accounts with 38,144 friendships, which repre- sented the real region only.

Tuenti. The dataset contained profiles of 60K real users who operating characteristics (ROC) analysis. In ROC analysis, the received friend requests from fake accounts, out of which 50% closer the curve is to the top-left corner at point (0, 1) the better were victims. The dataset was collected in Feb 10, 2014 from the classification performance is. The quality of the classifier live production servers, where data resided in memory and no can be quantified with a single value by calculating the area expensive, back-end queries were made. For Tuenti, collecting under its ROC curve (AUC) [47]. this dataset was a low-cost and easy process, as it only involved reading cached user profiles of a subset of its daily active users, We also recorded the relative importance (RI) of features users who logged in to Tuenti on that particular day. used for the classification. The RI score is computed by the RF algorithm, and it describes the relative contribution of each feature to the predictability of the label (i.e., a victim or a non- B. Victim prediction victim), when compared to all other features [46]. We sought to validate the following claim: An OSN oper- Results. For both datasets, the RF classifier ended up with an ator can identify unknown victim accounts with a probability AUC greater than 0.5, as shown in Fig. 4a. In particular, for that is better than random, using strictly low-cost features the Facebook dataset, the classifier delivered an AUC of 0.7, extracted from readily-available user profiles. which is 40% better than random. For the Tuenti dataset, on Features. As described in Table I, we extracted features from the other hand, the classifier delivered an AUC of 0.76, which both datasets to generate feature vectors. The only requirement is 52% better than random. Also, increasing the dataset size to we had for feature selection was to have the feature value more than 40K feature vectors did not significantly improve the available for all users in the dataset, so that the resulting feature AUC during cross-validation, as show in Fig. 4b. This means vectors are complete. For the Facebook dataset, we were able an OSN operator can train a victim classifier using a relatively to extract 18 features from public user profiles. For Tuenti, small dataset, so fewer accounts need to be manually verified. however, the dataset was limited to 14 features, but contained user features that are not publicly accessible. C. Ranking quality Validation method. To evaluate the accuracy of the classifiers, We compared Íntegro against SybilRank in terms of their we performed a 10-fold, stratified cross-validation method [47] ranking quality under various attack scenarios, where ideally using the RF learning algorithm. First, we randomly partitioned real accounts should be ranked higher than fake accounts. Our the dataset into 10 equally-sized sets, with each set having the results are based on the average of at least 10 runs, with error same percentage of victims as the complete dataset. We next bars reporting 95% confidence intervals (CI), when applicable. trained an RF classifier using 9 sets and tested it using the We picked the Facebook dataset for this comparison because remaining set. We repeated this procedure 10 times (i.e., folds), it included both feature vectors and graph samples. with each of the sets being used once for testing. Finally, we Infiltration scenarios. We considered two attack scenarios. In combined the results of the folds by computing the mean of the first scenario, attackers establish attack edges by targeting their true-positive rate (TPR) and false-positive rate (FPR). users with whom their fakes have mutual friends. Accordingly, we used the first Facebook graph which contained timestamped Performance metrics. The output of the classifier depends on attack edges, allowing us to replay the infiltration by 65 its operating threshold, which is a cutoff value in the prediction socialbots (n=2,991 and m=11,952). We refer to this scenario probability after which the classifier identifies a given user as a as the targeted-victim attack. victim. In order to capture the trade-off between TPR and FPR in single curve, we repeated the cross-validation method under In the second scenario, we attackers establish attack edges different threshold values using a procedure known as receiver by targeting users at random [13]. We designated the second

10 1.00 1.00 0.95 0.95 dlog2(n)e iterations for both Íntegro and SybilRank. 0.90 0.90 0.85 0.85 Results. Íntegro consistently outperformed SybilRank in rank- 0.80 0.80 ing quality, especially as the number of attack edges increased. 0.75 0.75 0.70 0.70 Using the RF classifier, Íntegro resulted in an AUC which is 0.65 IntegroYBest( 0.65 IntegroYBest( 0.60 IntegroYRF( IntegroYRF( always greater than 0.92, and is up to 30% improvement over IntegroYRandom( 0.60

Mean(area(under(ROC(curve( IntegroYRandom( 0.55 Mean(area(under(ROC(curve( SybilRank( 0.55 SybilRank( SybilRank in each attack scenario, as shown in Fig 5. 0.50 0.50 In each infiltration scenario, both systems performed well Number(of(a9ack(edges( Number(of(a9ack(edges((thousands)( when the number of attack edges was relatively small. In other (a) Targeted-victim attack (b) Random-victim attack words, the fakes were sparsely connected to real accounts and Fig. 5: The ranking quality of both systems in terms of its AUC under so the regions were easily separated. As SybilRank limits the each infiltration scenario (CI=95%). SybilRank and Íntegro resulted in number of fakes that can outrank real accounts by the number a similar performance when a random victim classifier is used, which of attack edges, its AUC degraded significantly as more attack represents a practical baseline for Íntegro. As the number of attack edges were added to each graph. Íntegro, however, maintained edges increased, SybilRank’s AUC decreased significantly close to its performance, with at most 0.07 decrease in AUC, even 0.7, while Íntegro sustained its high performance with AUC > 0.9. when the number of attack edges was relatively large. Notice that Íntegro performed nearly as good as SybilRank when a random victim classifier was used, but performed much better when the RF classifier was used instead. This shows the impact Facebook graph as the real region. We then generated a of leveraging victim prediction on fake account detection. synthetic fake region consisting of 3,068 fakes with 36,816 friendships using the small-world graph model [60]. We then D. Sensitivity to seed-targeting attacks added 35,306 random attack edges between the two regions (n=9,204 and m=110,266). As suggested in related work [34], Sophisticated attackers might obtain a full or partial knowl- we used a relatively large number of fakes and attack edges edge of which accounts are trusted by the OSN operator. As in order to stress-test both systems under evaluation. We refer the total trust is initially distributed among these accounts, an to the this scenario as the random-victim attack. attacker can adversely improve the ranking of the fakes by establishing attack edges directly with them. We next evaluate Propagation rates. For each infiltration scenario, we deployed both systems under two variants of this seed-targeting attack. the previously trained victim classifier in order to assign new edge weights. As we injected fakes in the second scenario, Attack scenarios. We focus on two main attack scenarios. In we generated their feature vectors by sampling each feature the first scenario, the attacker targets accounts that are k nodes distribution of fakes from the first scenario.4 We also assigned away from all trusted accounts. This means that the length of edge weights using another victim classifier that simulates two the shortest from any fake account to any trusted account operational modes. In the first mode, the classifier outputs the is exactly k+1, representing the distance between the seeds best possible victim predictions with an AUC≈1 and proba- and the fake region. For k=0, each trusted account is a victim bilities greater than 0.95. In the second mode, the classifier and located at a distance of 1. We refer to this scenario, which outputs uniformly random predictions with an AUC≈0.5. We assumes a resourceful attacker, as the distant-seed attack. used this classifier to evaluate the theoretical best and practical In the second scenario, attackers have only a partial knowl- worst case performance of Íntegro. edge and target k trusted accounts picked at random. We refer Evaluation method. To evaluate each system’s ranking qual- to this scenario as the random-seed attack. ity, we ran the system using both infiltration scenarios starting Evaluation method. To evaluate the sensitivity of each system with a single attack edge. We then added another attack to a seed-targeting attack, we used the first Facebook graph edge, according to its timestamp if available, and repeated the to simulate each attack scenario. We implemented this by experiment. We kept performing this process until there were replacing the endpoint of each attack edge in the real region no more edges to add. At the end of each run, we measured with a real account picked at random from a set of candidates. the resulting AUC of each system, as explained next. For the first scenario, a candidate account is one that is k nodes away from all trusted accounts. For the second scenario, Performance metric. For the resulting ranked list of accounts, a candidate account is simply any trusted account. We ran we performed ROC analysis by moving a pivot point along the experiments for both systems using different values of k and list, starting from the bottom. If an account is behind the pivot, measured the corresponding AUC at the end of each run. we marked it as fake; otherwise, we marked it as real. Given the ground-truth, we measured the TPR and the FPR across Results. In the first attack scenario, both systems had a poor the whole list. Finally, we computed the corresponding AUC, ranking quality when the distance was small, as illustrated in which in this case quantifies the probability that a random real Fig. 6a. Because Íntegro assigns low weights to edges incident account is ranked higher than a random fake account. to victim accounts, the trust that escapes to the fake region is less likely to come back into the real region. This explains why Seeds and iterations. In order to make the chance of guessing SybilRank had a slightly better AUC for distances less than 3. seeds very small, we picked 100 trusted accounts that are non- However, once the distance was larger, Íntegro outperformed victim, real accounts. We used a total trust that is equal to n, SybilRank ,as expected from earlier results. the number of nodes in the given graph. We also performed In the second attack scenario, the ranking quality of both 4We excluded the “friends” feature, as it can be computed from the graph. systems degraded, as the number of victimized trusted accounts

11 1 1 1000( 20 18

0.8 0.8 800( 16 14

0.6 0.6 600( 12 10

0.4 0.4 400( 8 6 IntegroYBest( Number(of(friends( 0.2 0.2 IntegroYRF( 200( 4 Mean(area(under(ROC(curve( IntegroYRF( Mean(area(under(ROC(curve( IntegroYRandom( 2

SybilRank( SybilRank( PorSon(of(expected(friendships((%)( 0 0 0 0( 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 1 10 20 30 40 50 60 70 80 90 100 0 500 1000 1500 2000 2500 Distance(from(the(fake(region( Number(of(vicSmized(trusted(accounts( Days(since(joining(TuenS( Months(since(joining(TuenS( (a) Distant-seed attack (b) Random-seed attack (a) Users connectivity (b) Friendship growth over time

Fig. 6: The sensitivity of both systems to each seed-targeting attack Fig. 7: Preprocessing. In (a), there is a positive correlation between (CI=95%). In distant-seed attack, an attacker befriends users that are number of days since a user joined Tuenti and how well-connected at a particular distance from all trusted accounts, which represents the user is in terms of number of friends (Pearson’s r = 0.36). In a practical worst case scenario for both system. In the random-seed fact, 93% of all new users who joined Tuenti in the last 30 days had attack, the attacker directly befriends a subset of the trusted accounts. weak connectivity of 46 friends or less, much smaller than the average Overall, both systems are sensitive to seed-targeting attacks. of 254 friends. In (b), we found that most of the friendship growth happens in the first month since joining the network, where users on average establish 18.6% of their friendships. We accordingly defer the consideration of users who joined Tuenti in the last 30 days, as increased, where Íntegro consistently outperformed SybilRank, they will likely be assigned low ranks. as shown in Fig. 6b. Notice that by selecting a larger number of trusted accounts, it becomes much harder for an attacker to guess which account is trusted, while the gained benefit per victimized trusted account is further reduced. of the nodes in each detected community, and designated them as trusted accounts for both systems. E. Deployment at Tuenti Performance metric. As the number of users in the processed GCC is large, it was infeasible to manually inspect and label We deployed both systems on a snapshot of Tuenti’s daily each account. This means that we were unable to evaluate the active users graph in February 6, 2014. The graph consisted system using ROC analysis. Instead, we attempted to determine of several million nodes and tens of millions of edges. We the percentage of fake accounts at equally-sized intervals in had to mask out the exact numbers due to a non-disclosure the ranked list. We accomplished this in collaboration with agreement with Tuenti. After initial analysis of the graph, we Tuenti’s analysts by manually inspecting a user sample in each found that 96.6% of nodes and 94.2% of edges belonged to interval in the list. This percentage is directly related to the one giant connected component (GCC). Therefore, we focused precision of fake account detection, which is a performance our evaluation on this GCC. metric typically used to measure the ratio of relevant items k Preprocessing. Using a uniform random sample of 10K users, over the top- highest ranked items in terms of relevance [61]. we found that new users have weak connectivity to others due Evaluation method. We utilized the previously trained victim to the short time they have been on Tuenti, as shown in Fig. 7a. classifier in order to weight a copy of the graph. We then ran If these users were included in our evaluation, they would end both systems on two versions of the graph (i.e., weighted and up receiving low ranks, which would lead to false positives. unweighted) for dlog2(n)e iterations, where n is number of nodes in the graph. After that, we examined the ranked list of To overcome this issue, we estimated the period after which each system by inspecting the first lowest-ranked one million users accumulate at least 10% of the average number of friends users. We randomly selected 100 users out of each 20K user in Tuenti. To achieve this, we used a uniformly random sample interval for inspection in order to measure the percentage of of 10K real users who joined Tuenti over the last 77 months. fakes in the interval, that is, the precision. We do not include We divided the users in the sample into buckets representing the complete range due to confidentiality reasons. how long they have been active members. We then calculated the average number of new friendships they made after every Results. As shown in Fig. 8a, Íntegro resulted in 95% precision other month. As illustrated in Fig. 7b, users accumulated 53% in the lowest 20K ranking user accounts, as opposed to 43% of their friendships during the first 12 months. In addition, by SybilRank and 5% by Tuenti’s user-based abuse reporting 18.6% of friendships were made after one month since joining system. This percentage dropped dramatically as we went up the network. To this end, we decided to defer the consideration in the list, which means our ranking scheme placed most of of users who have joined in the last 30 days since Feb 6, 2014, the fakes at the bottom of the ranked list, as shown in Fig. 8b. which represented only 1.3% of users in the GCC. Let us consider SybilRank’s ranking shown in Fig. 8a and Community detection. We applied the Louvain method on the Fig. 8c. The precision, starting with 43% for the first interval, preprocessed GCC. The method finished quickly after just 5 gradually decreased until rising again at the 10th interval. This iterations with a high modularity score of 0.83, where a value pattern repeated at the 32nd interval as well. We inspected the of 1 corresponds to a perfect partitioning. In total, we found 42 fake accounts at these intervals and found that they belonged communities and the largest one consisted of 220,846 nodes. to three different, large communities. In addition, these fakes In addition, 15 communities were relatively large containing had a large number of friends, much larger than the average more than 50K nodes. Tuenti’s account analysts verified 0.05% of 254 friends. In particular, the fakes from the 32nd interval

12 1 100 100 14 IntegroYRF( 0.9 90 IntegroYRF( IntegroYRF( SybilRank( 90 SybilRank( 12 SybilRank( 0.8 80 80 0.7 70 70 10 0.6 60 60 8 0.5

50 50 CDF( 6 0.4 40 40 0.3 30 30 4 Percentage(of(fakes((%)( Percentage(of(fakes((%)(

Percentage(of(fakes((%)( 0.2 20 20 2 0.1 Fake( 10 10 Real( 0 0 0 0 1 2 3 4 5 6 7 8 9 0 10 20 30 40 50 10 20 30 40 50 0 100 200 300 400 500 600 20K(node(interval(in(ranked(list( 20K(node(interval(in(ranked(list( 20K(node(interval(in(ranked(list( Number(of(friends((degree)( (a) Precision at lower intervals (b) Precision over the whole list (c) Precision at higher intervals (d) Node degree distribution

Fig. 8: Deployment results at Tuenti. The overall ranking quality of both systems is summarized in (b). Ideally, all fake accounts should be in the bottom of the ranked list. In (a) and (c), we observed that Íntegro consistently outperforms SybilRank in term of fake account detection precision (i.e., the percentage of fakes in each sample). In particular, most of the fake accounts identified by Íntegro were located at significantly lower locations in the ranked list, unlike SybilRank. Upon further inspection of fakes at higher intervals, we found that they established a large number of attack edges, as suggested by the degree distribution in (d).

onwards had more than 300 friends, with a maximum of up distribution for each feature. We then ran Íntegro on each of to 539. Fig. 8d shows the degree distribution for both verified the generated graphs and measured its execution time. fake and real accounts. This figure suggests that fakes tend to create many attack edges with real accounts, which confirms Results. Íntegro achieved a nearly linear scalability with the earlier findings on other OSNs such as Facebook [7]. Also, number of nodes in a graph, as illustrated in Fig. 9. Excluding this behavior explains why Íntegro outperformed SybilRank in the time required to load the 160M node graph into memory, user ranking quality; these high degree fakes received lower 20 minutes for a non-optimized data format, it takes less than ranks as most of their victims were identified by the classifier. 2 minutes to train an RF classifier and compute vulnerability scores for nodes, and less than 25 minutes to weight the graph, SybilRank in retrospect. SybilRank was initially evaluated on rank nodes, and finally sort them. This makes Íntegro compu- Tuenti, where it effectively detected a significant percentage tationally practical even for large OSNs such as Facebook. of the fakes [13]. The original evaluation, however, pruned excessive edges of nodes that had a degree greater than 800, VII.DISCUSSION which include a non-disclosed number of fakes that highly infiltrated Tuenti. Also, the original evaluation was performed As mentioned in Section IV-F, Íntegro’s security guarantee on the whole graph, which included many dormant accounts. is sensitive to the performance of the deployed victim classifier, However, our evaluation was based on the daily active users which is formally captured by the volume vol(Ea) in the bound graph in order to focus on active fake accounts that could be O(vol(Ea) log n), and can be practically measured by its AUC. harmful. While this change limited the number of fakes that existed in the graph, it has evidently revealed the ineffective- Sensitivity to victim classification. As illustrated in Fig. 5, ness of SybilRank under social infiltration. Additionally, the improving the AUC of the victim classifier from random with original evaluation showed that 10–20% of fakes received high AUC ≈ 0.5, to actual with AUC= 0.7, and finally to best with ranks, a result we also attest, due to the fact that these fake AUC ≈ 1 consistently improved the resulting ranking in terms accounts had established many attack edges. On the other hand, of its AUC. Therefore, a higher AUC in victim prediction leads Íntegro has 0–2% of fakes at these high intervals, and so it to a higher AUC in user ranking. This is the case because the delivers an order of magnitude better precision than SybilRank. ROC curve of a victim classifier monotonically increases, so a higher AUC implies a higher true positive rate (TPR). In turn, a higher TPR means more victims are correctly identified, and so VI.IMPLEMENTATION AND SCALABILITY more attack edges are assigned lower weights, which evidently We implemented Íntegro in Mahout5 and Giraph6, which leads to a higher AUC in user ranking. are widely used, open-source distributed machine learning and Sensitivity to social infiltration. Regardless of the used victim graph processing platforms, respectively. We next describe the classifier, the ranking quality decreases as the number of attack scalability of Íntegro using a synthetic benchmark. edges increases, as illustrated in Fig. 5. This is the case because Benchmark. We deployed Íntegro an Amazon Elastic MapRe- even a small false negative rate (FNR) in victim classification duce7 cluster. The cluster consisted of one m1.small instance means more attack edges indecent to misclassified victims are serving as a master node and 32 m2.4xlarge instances serving assigned high weights, leading to a lower AUC in user ranking. as slave nodes. We employed the small-world graph model [60] to generate 5 graphs with an exponentially increasing number Maintenance. While an attacker does not control real accounts of nodes. For each one of these graphs, we used the Facebook nor their activities, it can still trick users into befriending fakes. dataset to randomly generate all feature vectors with the same In order to achieve a high-quality ranking, the victim classifier should be regularly retrained to capture new and changing user 5http://mahout.apache.org behavior in terms of susceptibility to social infiltration. This is, 6http://giraph.apache.org/ in fact, the case for supervised machine learning when applied 7http://aws.amazon.com/elasticmapreduce to computer security problems [8]. Also, as the ranking scheme

13 4.0 25 and is designed to detect automated fake accounts that be- 3.5 20 friend many victims for subsequent attacks. Íntegro has been 3.0 15 deployed at Tuenti along side a feature-based detection system 2.5 and a user-based abuse reporting system. 10 2.0

ExecuSon(Sme((minutes)( 5 ExecuSon(Sme((minutes)( 1.5 IX.CONCLUSION

1.0 0 10 60 110 160 10 60 110 160 OSNs today are faced with the problem of detecting fake Number(of(nodes((millions)( Number(of(nodes((millions)( accounts in a highly adversarial environment. The problem (a) Mahout (b) Giraph is becoming more challenging as such accounts have become Fig. 9: System scalability on both platforms. In (a), the execution time sophisticated in cloaking their operation with patterns resem- includes the time to train an RF classifier and compute a vulnerability bling real user behavior. In this paper, we presented Íntegro, score for each node in the graph. In (b), the execution time includes a scalable defense system that helps OSN operators to detect the time to weight the graph, rank nodes, and finally sort them. fake accounts using a meaningful user ranking scheme. Our evaluation results show that SybilRank, the state-of- the-art in fake account detection, is ineffective when the fakes is sensitive to seed-targeting attacks, the set of trusted accounts infiltrate the target OSN by befriending a large number of real should be regularly updated and validated in order to reduce users. Íntegro, however, has proven more resilient to this effect the negative impact of these attacks, even if they are unlikely by leveraging in a novel way the knowledge of benign victim to occur or succeed in practice, as discussed in Section IV-D. accounts that befriend fakes . We have implemented Íntegro on top of standard data processing platforms, Mahout and Giraph, Impact. By using Íntegro, Tuenti requires nearly 67 man hours which are scalable and easy to deploy in modern data centers. to manually validate the 20K lowest ranking user accounts, and In fact, Tuenti, the largest OSN in Spain with more than 15M discover about 19K fake accounts instead of 8.6K fakes with active users, has deployed our system in production to thwart SybilRank. With its user-based abuse reporting system that has fakes in the wild with at least 10 time more precision. 5% hit rate, and assuming all fakes get reported, Tuenti would need 1,267 man hours instead to discover 19K fake accounts. This improvement has been useful to both Tuenti and its users. X.ACKNOWLEDGMENT We would like to thank our shepherd, Gianluca Stringhini, VIII.LIMITATIONS and our colleagues for their help and feedback on an earlier We next outline two design limitations which are inherited version of this paper. The first author is thankful to the Univer- from SybilRank [13] and similar ranking schemes [34]: sity of British Columbia for a generous doctoral fellowship.

• Íntegro’s design is limited to only undirected social graphs. REFERENCES In other words, OSNs whose users declare lateral relationships are not expected to benefit from our proposal. This is the [1] J. R. Douceur, “The sybil attack,” in 1st International Workshop on case because directed graphs, in general, have a significantly Peer-to-Peer Systems. Springer-Verlag, 2002, pp. 251–260. smaller mixing time than their undirected counterparts [62], [2] Facebook, “Quarterly earning reports,” Jan 2014. [Online]. Available: http://goo.gl/YujtO which means a random walk on such graphs will converge in [3] CBC, “Facebook shares drop on news of fake accounts,” Aug 2012. a much small number of steps, rendering short random walks [Online]. Available: http://goo.gl/6s5FKL unsuitable for robust user ranking. [4] K. Thomas and et al., “Suspended accounts in retrospect: an analysis • Íntegro delays the consideration of new user accounts. This of Twitter spam,” in Proc. IMC’11. ACM, 2011, pp. 243–258. means that an OSN operator might miss the chance to detect [5] G. Yan and et al., “Malware propagation in online social networks: nature, dynamics, and defense implications,” in Proc. ASIACCS’11. fakes at their early life-cycle. However, as shown in Figure 7a, ACM, 2011, pp. 196–206. only 7% of new users who joined Tuenti in the last month [6] J. Ratkiewicz and et al., “Truthy: mapping the spread of astroturf in had more than 46 friends. To estimate the number of fakes in microblog streams,” in Proc. WWW’11. ACM, 2011, pp. 249–252. new accounts, we picked 100 accounts at random for manual [7] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “The verification. We found that only 6% of these accounts were socialbot network: when bots socialize for fame and money,” in Proc. fake, and the most successful fake account had 103 victims. ACSAC’11. ACM, 2011, pp. 93–102. In practice, the decision of whether to exclude these account [8] T. Stein, E. Chen, and K. Mangla, “Facebook immune system,” in is operational, and it depends on the actions taken on low- Proceedings of the 4th Workshop on Systems. ACM, ranking users. For example, an operator can enforce abuse 2011, pp. 8–14. mitigation technique, as discussed in Section II-C, against low- [9] L. Alvisi, A. Clement, A. Epasto, U. Sapienza, S. Lattanzi, and A. Pan- conesi, “SoK: The evolution of sybil defense via social networks,” In ranking users, where false positives can negatively affect user Proceedings of the IEEE Symposium on Security and Privacy, 2013. experience but slow down fake accounts that just joined the [10] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda, “All your contacts are network. This is a security/usability trade-off which we leave to belong to us: automated identity theft attacks on social networks,” in the operator to manage. Alternatively, the operator can use fake Proc. WWW’09. ACM, 2009, pp. 551–560. account detection systems that are designed to admit legitimate [11] C. Wagner, S. Mitter, C. Körner, and M. Strohmaier, “When social bots new users using, for example, a vouching process [63]. attack: Modeling susceptibility of users in online social networks,” in WWW Workshop on Making Sense of Microposts, vol. 12, 2012. Íntegro is not a stand-alone fake account detection system. [12] M. N. Ko, G. P. Cheek, M. Shehab, and R. Sandhu, “Social-networks It is intended to complement existing abuse detection systems connect services,” Computer, vol. 43, no. 8, pp. 37–43, 2010.

14 [13] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro, “Aiding the detection [38] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. of fake accounts in large scale social online services,” in Proc. NSDI’12. 486, no. 3, pp. 75–174, 2010. USENIX Association, 2012, pp. 15–15. [39] A. Mislove and et al., “You are who you know: inferring user profiles in [14] S. Yardi, N. Feamster, and A. Bruckman, “Photo-based authentication online social networks,” in Proc. WSDM’10. ACM, 2010, pp. 251–260. using social networks,” in Proceedings of the first workshop on Online [40] G. Wang, M. Mohanlal, C. Wilson, X. Wang, M. Metzger, H. Zheng, social networks. ACM, 2008, pp. 55–60. and B. Y. Zhao, “Social turing tests: Crowdsourcing sybil detection,” [15] S. D. Kamvar and et al., “The EigenTrust algorithm for reputation in Proc. NDSS’13. ACM, 2013. management in P2P networks,” in Proceedings of 12th international [41] S. Ghosh and et al., “Understanding and combating link farming in the conference on World Wide Web. ACM, 2003, pp. 640–651. twitter social network,” in Proceedings of 21st international conference [16] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman, “Sybilguard: on World Wide Web. ACM, 2012, pp. 61–70. defending against sybil attacks via social networks,” ACM SIGCOMM [42] A. Elyashar, M. Fire, D. Kagan, and Y. Elovici, “Homing socialbots: Computer Communication Review, vol. 36, no. 4, pp. 267–278, 2006. intrusion on a specific organization’s employee using socialbots,” in [17] H. Yu and et al., “Sybillimit: A near-optimal social network defense Proc. ASONAM’13. ACM, 2013, pp. 1358–1365. against sybil attacks,” in Proc. S&P’08. IEEE, 2008, pp. 3–17. [43] G. Stringhini and et al., “Follow the green: growth and dynamics in [18] G. Danezis and P. Mittal, “Sybilinfer: Detecting sybil nodes using social twitter follower markets,” in Proc. IMC’13. ACM, 2013, pp. 163–176. networks.” in Proceedings of the 9th Annual Network & Distributed [44] C. Yang and et al., “Analyzing spammers’ social networks for fun and System Security Symposium. ACM, 2009. profit: a case study of cyber criminal ecosystem on twitter,” in Proc. of [19] B. Viswanath, A. Post, K. P. Gummadi, and A. Mislove, “An analysis WWW’12. ACM, 2012, pp. 71–80. of social network-based sybil defenses,” in Proceedings of ACM SIG- [45] D. A. Spielman and S.-H. Teng, “Nearly-linear time algorithms for COMM Computer Communication Review. ACM, 2010, pp. 363–374. graph partitioning, graph sparsification, and solving linear systems,” in [20] N. Tran, J. Li, L. Subramanian, and S. S. Chow, “Optimal sybil- Proc. TC’04. ACM, 2004, pp. 81–90. resilient node admission control,” in INFOCOM, 2011 Proceedings [46] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. IEEE. IEEE, 2011, pp. 3218–3226. 5–32, 2001. [21] J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on [47] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statisti- large clusters,” Comm. of ACM, vol. 51, no. 1, pp. 107–113, 2008. cal learning: Data mining, inference, and prediction, second edition. [22] G. Malewicz and et al., “Pregel: a system for large-scale graph Springer, 2009. processing,” in Proceedings of the 2010 ACM SIGMOD International [48] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen, “Combating web spam Conference on Management of data. ACM, 2010, pp. 135–146. with trustrank,” in Proceedings of VLDB, 2004, pp. 576–587. [23] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “Design and [49] G. H. Golub and H. A. Van der Vorst, “Eigenvalue computation in the analysis of a social botnet,” Computer Networks, vol. 57, no. 2, pp. 20th century,” Journal of Computational and Applied Mathematics, vol. 556–578, 2013. 123, no. 1, pp. 35–65, 2000. [24] T. Hwang, I. Pearce, and M. Nanis, “Socialbots: Voices from the fronts,” [50] E. Behrends, Introduction to Markov chains with special emphasis on interactions, vol. 19, no. 2, pp. 38–45, 2012. rapid mixing. Vieweg, 2000, vol. 228. [25] M. Egele and et al., “COMPA: Detecting compromised accounts on [51] M. Dellamico and Y. Roudier, “A measurement of mixing time in social social networks.” in Proc. NDSS’13, 2013. networks,” in Proceedings of the 5th International Workshop on Security and Trust Management, Saint Malo, France, 2009. [26] M. Motoyama and et al., “Dirty jobs: The role of freelance labor in web service abuse,” in Proceedings of the 20th USENIX Security Symposium. [52] A. Mohaisen, A. Yun, and Y. Kim, “Measuring the mixing time of USENIX Association, 2011, pp. 14–14. social graphs,” in Proceedings of the 10th annual conference on Internet measurement. ACM, 2010, pp. 383–389. [27] Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai, “Uncovering social network sybils in the wild,” in Proceedings of 2011 [53] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Statistical ACM Internet Measurement Csonference. ACM, 2011, pp. 259–268. properties of community structure in large social and information networks,” in Proc. WWW’08. ACM, 2008, pp. 695–704. [28] G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proceedings of the 26th Annual Computer Security [54] V. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfold- Applications Conference. ACM, 2010, pp. 1–9. ing of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, 2008. [29] G. Wang and et al., “You are how you click: Clickstream analysis for sybil detection,” in Proceedings of the 22nd USENIX Security [55] M. E. Newman, “Modularity and community structure in networks,” Symposium. USENIX Association, 2013, pp. 1–8. Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006. [30] G. Karypis and V. Kumar, “Multilevel k-way partitioning scheme for irregular graphs,” Journal of Parallel and Distributed computing, [56] Y. Boshmaf, D. Logothetis, G. Siganos, J. Lería, J. Lorenzo, M. Ri- vol. 48, no. 1, pp. 96–129, 1998. peanu, and K. Beznosov, “Íntegro: Leveraging victim prediction for robust fake account detection in OSNs,” LERSSE technical report, 2014. [31] J. Tygar, “Adversarial machine learning.” IEEE Internet Computing, vol. 15, no. 5, 2011. [57] A. Sinclair, “Improved bounds for mixing rates of Markov chains and multicommodity flow,” in Proceedings of Latin American Symposium [32] D. Lowd and C. Meek, “Adversarial learning,” in Proceedings of the on Theoretical Informatics. Springer-Verlag, 1992, pp. 474–487. 11th ACM SIGKDD. ACM, 2005, pp. 641–647. [58] D. N. Tran, B. Min, J. Li, and L. Subramanian, “Sybil-resilient online [33] Y. Boshmaf and et al., “Key challenges in defending against malicious content voting.” in NSDI, vol. 9, 2009, pp. 15–28. socialbots,” in Proc. LEET’12, vol. 12, 2012. [59] J. Leskovec and C. Faloutsos, “Sampling from large graphs,” in Pro- [34] H. Yu, “Sybil defenses via social networks: a tutorial and survey,” ACM ceedings of the ACM SIGKDD Conference. ACM, 2006, pp. 631–636. SIGACT News, vol. 42, no. 3, pp. 80–101, 2011. [60] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world [35] B. Viswanath and et al., “Exploring the design space of social network- networks,” nature, vol. 393, no. 6684, pp. 440–442, 1998. based sybil defenses,” in In Proceedings of the 4th International [61] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, “Evalu- Conference on Communication Systems and Networks. IEEE, 2012, ating collaborative filtering recommender systems,” ACM Transactions pp. 1–8. on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004. [36] Y. Boshmaf and et al., “Graph-based sybil detection in social and [62] A. Mohaisen, H. Tran, N. Hopper, and Y. Kim, “On the mixing time Proc. ASONAM’13 information systems,” in . IEEE, 2013. of directed social graphs and security implications,” in Proceedings of [37] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney, “Community the ASIACCS Conference. ACM, 2012, pp. 36–37. structure in large networks: Natural cluster sizes and the absence of [63] Y. Xie and et al., “Innocent by association: early recognition of large well-defined clusters,” Internet Mathematics, vol. 6, no. 1, pp. legitimate users,” in Proc. CCS’12. ACM, 2012, pp. 353–364. 29–123, 2009.

15