
Identifying and Characterizing Sybils in the Tor Network
Philipp Winter, Princeton University and Karlstad University; Roya Ensafi, Princeton University; Karsten Loesing, The Tor Project; Nick Feamster, Princeton University
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/winter

This paper is included in the Proceedings of the 25th USENIX Security Symposium, August 10–12, 2016, Austin, TX. ISBN 978-1-931971-32-4

Open access to the Proceedings of the 25th USENIX Security Symposium is sponsored by USENIX.

Identifying and characterizing Sybils in the Tor network

Philipp Winter (Princeton University and Karlstad University), Roya Ensafi (Princeton University), Karsten Loesing (The Tor Project), and Nick Feamster (Princeton University)

Abstract

Being a volunteer-run, distributed anonymity network, Tor is vulnerable to Sybil attacks. Little is known about real-world Sybils in the Tor network, and we lack practical tools and methods to expose Sybil attacks. In this work, we develop sybilhunter, a system for detecting Sybil relays based on their appearance, such as configuration, and their behavior, such as uptime sequences. We used sybilhunter's diverse analysis techniques to analyze nine years of archived Tor network data, providing us with new insights into the operation of real-world attackers. Our findings include diverse Sybils, ranging from botnets, to academic research, to relays that hijacked Bitcoin transactions. Our work shows that existing Sybil defenses do not apply to Tor, delivers insights into real-world attacks, and provides practical tools to uncover and characterize Sybils, making the network safer for its users.

1 Introduction

In a Sybil attack, an attacker controls many virtual identities to obtain disproportionately large influence in a network. These attacks take many shapes, such as sockpuppets hijacking online discourse [34]; the manipulation of BitTorrent's distributed hash table [35]; and, most relevant to our work, relays in the Tor network that seek to deanonymize users [8]. In addition to coining the term "Sybil,"¹ Douceur showed that practical Sybil defenses are challenging, arguing that Sybil attacks are always possible without a central authority [11]. In this work, we focus on Sybils in Tor—relays that are controlled by a single operator. But what harm can Sybils do?

The effectiveness of many attacks on Tor depends on how large a fraction of the network's traffic—called the consensus weight—an attacker can observe. As the attacker's consensus weight grows, the following attacks become easier.

Exit traffic tampering: When leaving the Tor network, a Tor user's traffic traverses exit relays, the last hop in a Tor circuit. Controlling exit relays, an attacker can eavesdrop on traffic to collect unencrypted credentials, break into TLS-protected connections, or inject malicious content [37, § 5.2].

Website fingerprinting: Tor's encryption prevents guard relays (the first hop in a Tor circuit) from learning their user's online activity. Ignoring the encrypted payload, an attacker can still take advantage of flow information such as packet lengths and timings to infer what websites Tor users are visiting [16].

Bridge address harvesting: Users behind censorship systems use private Tor relays—typically called bridges—as hidden stepping stones into the Tor network. It is important that censors cannot obtain all bridge addresses, which is why The Tor Project rate-limits bridge distribution. However, an attacker can harvest bridge addresses by running a middle relay and looking for incoming connections that do not originate from any of the publicly known guard relays [22, § 3.4].

End-to-end correlation: By running both entry guards and exit relays, an attacker can use timing analysis to link a Tor user's identity to her activity, e.g., learn that Alice is visiting Facebook. For this attack to work, an attacker must run at least two Tor relays, or be able to eavesdrop on at least two networks [14].

Configuring a relay to forward more traffic allows an attacker to increase her consensus weight.
However, the capacity of a single relay is limited by its link bandwidth and, because of the computational cost of cryptography, by its CPU. Ultimately, increasing consensus weight requires an adversary to add relays to the network; we call these additional relays Sybils.

¹ The term is a reference to a book in which the female protagonist, Sybil, suffers from dissociative identity disorder [29].

In addition to the above attacks, an adversary needs Sybil relays to manipulate onion services, which are TCP servers whose IP address is hidden by Tor. In the current onion service protocol, six Sybil relays are sufficient to take an onion service offline because of a weakness in the design of the distributed hash table (DHT) that powers onion services [4, § V]. Finally, instead of being a direct means to an end, Sybil relays can be a side effect of another issue. In Section 5.1, we provide evidence for what appears to be botnets whose zombies are running Tor relays, perhaps because of a misguided attempt to help the Tor network grow.

Motivated by the lack of practical Sybil detection tools, we design and implement heuristics, leveraging our observations that Sybils (i) frequently go online and offline simultaneously, (ii) share similarities in their configuration, and (iii) may change their identity fingerprint—a relay's fingerprint is the hash over its public key—frequently, to manipulate Tor's DHT. Three of our four heuristics are automated and designed to run autonomously, while one assists in manual analysis by ranking which relays in the network are most similar to a given reference relay. Our evaluation suggests that our heuristics differ in their effectiveness: one method detected only a small number of incidents, but some of those no other method could detect. Other heuristics produced a large number of results, and seem well-suited to spot the "low hanging fruit." We implemented these heuristics in a tool, sybilhunter, which we subsequently used to analyze 100 GiB worth of archived network data, consisting of millions of files, and dating back to 2007.
Finally, we characterize the Sybil groups we discovered. To sum up, we make the following key contributions:

• We design and implement sybilhunter, a tool to analyze past and future Tor network data. While we designed it specifically for use in Tor, our techniques are general in nature and can easily be applied to other distributed systems such as I2P [31].

• We characterize Sybil groups and publish our findings as datasets to stimulate future research.² We find that Sybils run MitM attacks and DoS attacks, and are used for research projects.

The rest of this paper is structured as follows. We begin by discussing related work in Section 2 and give some background on Tor in Section 3. Section 4 presents the design of our analysis tools, which is then followed by experimental results in Section 5. We discuss our results in Section 6 and conclude the paper in Section 7.

2 Related work

In his seminal 2002 paper, Douceur showed that only a central authority that verifies new nodes as they join the distributed system is guaranteed to prevent Sybils [11]. This approach conflicts with Tor's design philosophy, which seeks to distribute trust and eliminate central points of control. In addition, a major factor contributing to Tor's network growth is the low barrier of entry, allowing operators to set up relays both quickly and anonymously. An identity-verifying authority would raise that barrier, alienate privacy-conscious relay operators, and impede Tor's growth. Barring a central authority, researchers have proposed techniques that leverage a resource that is difficult for an attacker to scale. Two categories of Sybil-resistant schemes have turned out to be particularly popular: schemes that build on social constraints and schemes that build on computational constraints. For a broad overview of alternative Sybil defenses, refer to Levine et al. [19].

Social constraints rely on the assumption that it is difficult for an attacker to form trust relationships with honest users, e.g., befriend many strangers on online social networks. Past work leveraged this assumption in systems such as SybilGuard [39], SybilLimit [38], and SybilInfer [6]. Unfortunately, social graph-based defenses do not work in our setting because there is no existing trust relationship between relay operators.³ Note that we could create such a relationship by, e.g., linking relays to their operator's social networking account, or by creating a "relay operator web of trust," but again, we believe that such an effort would alienate relay operators and see limited adoption.

Orthogonal to social constraints, computational resource constraints guarantee that an attacker seeking to operate 100 Sybils needs 100 times the computational resources she would have needed for a single virtual identity. Both Borisov [5] and Li et al. [21] used computational puzzles for that purpose. Computational constraints work well in distributed systems where the cost of joining the network is low. For example, a lightweight client is sufficient to use BitTorrent, allowing even low-end consumer devices to participate. However, this is not the case in Tor because relay operation requires constant use of bandwidth and CPU. Unlike in many other distributed systems, it is impossible to run 100 Tor relays while not spending the resources for 100 relays. Computational constraints are inherently tied to running a relay.

In summary, we believe that existing Sybil defenses are ill-suited for application in the Tor network; its distinctive features call for customized solutions that consider the nature of Tor relays.

² The datasets are available online at https://nymity.ch/sybilhunting/.
³ Relay operators can express in their configuration that their relays are run by the same operator, but this denotes an intra-person and not an inter-person trust relationship.

There has already been some progress towards that direction; namely, The Tor Project has incorporated a number of both implicit and explicit Sybil defenses that are in place as of June 2016. First, directory authorities—the "gatekeepers" of the Tor network—accept at most two relays per IP address to prevent low-resource Sybil attacks [3, 2]. Similarly, Tor's path selection algorithm ensures that Tor clients never select two relays in the same /16 network [9]. Second, directory authorities automatically assign flags to relays, indicating their status and quality of service. The Tor Project has recently increased the minimal time until relays obtain the Stable flag (seven days) and the HSDir flag (96 hours). This change increases the cost of Sybil attacks and gives Tor developers more time to discover and block suspicious relays before they get in a position to run an attack. Finally, the operation of a Tor relay causes recurring costs—most notably bandwidth and electricity—which can further restrain an adversary.

Figure 1: Sybilhunter's architecture. Two datasets serve as input to sybilhunter: consensuses and server descriptors, and malicious relays gathered with exitmap [37, § 3.1].

3 Background

We now provide necessary background on the Tor network [10]. Tor consists of several thousand volunteer-run relays that are summarized in the network consensus that is voted on and published each hour by nine distributed directory authorities. The authorities assign a variety of flags to relays:

Valid: The relay is valid, i.e., not known to be broken.

HSDir: The relay is an onion service directory, i.e., it participates in the DHT that powers Tor onion services.

Exit: The relay is an exit relay.

BadExit: The relay is an exit relay, but is either misconfigured or malicious, and should therefore not be used by Tor clients.

Stable: Relays are stable if their mean time between failures is at least the median of all relays, or at least seven days.

Guard: Guard relays are the rarely-changing first hop for Tor clients.

Running: A relay is running if the directory authorities could connect to it in the last 45 minutes.

Tor relays are uniquely identified by their fingerprint, a Base32-encoded and truncated SHA-1 hash over their public key. Operators can further assign a nickname to their Tor relays—a string that identifies a relay (albeit not uniquely) and is easier to remember than its pseudo-random fingerprint. Exit relays have an exit policy—a list of IP addresses and ports that the relay allows connections to. Finally, operators that run more than one relay are encouraged to configure their relays to be part of a relay family. Families are used to express that a set of relays is controlled by a single operator. Tor clients never use more than one family member in their path to prevent correlation attacks. In February 2016, there were approximately 400 relay families among all 7,000 relays.

4 Data and design

We define Sybils in the Tor network as two or more relays that are controlled by a single person or group of people. Sybils per se do not have to be malicious; a relay operator could simply have forgotten to configure her relays as a relay family. Such Sybils are no threat to the Tor network, which is why we refer to them as benign Sybils. What we are interested in is malicious Sybils, whose purpose is to deanonymize or otherwise harm Tor users.

To uncover malicious Sybils, we draw on two datasets—one publicly available and one created by us. Our detection methods are implemented in a tool, sybilhunter, which takes as input our two datasets and then attempts to expose Sybil groups, as illustrated in Figure 1. Sybilhunter is implemented in Go and consists of 2,300 lines of code.

4.1 Datasets

Figure 1 shows how we use our two datasets. Archived consensuses and router descriptors (in short: descriptors) allow us to (i) restore past states of the Tor network, which sybilhunter mines for Sybil groups, and to (ii) find "partners in crime" of malicious exit relays that we discovered by running exitmap, a scanner for Tor exit relays that we discuss below.

Dataset        # of files    Size     Time span
Consensuses    72,061        51 GiB   10/2007–01/2016
Descriptors    34,789,777    52 GiB   12/2005–01/2016

Table 1: An overview of our primary dataset: consensuses and server descriptors since 2007 and 2005, respectively.

Figure 2: Our primary dataset contains nine years' worth of consensuses and router descriptors. A router status contains a descriptor pointer, nickname, fingerprint, publication time, address and ports, flags, version, bandwidth, and exit policy; a router descriptor contains address and ports, platform, protocols, publication time, fingerprint, uptime, bandwidth, and signature.
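To make the two record types concrete, the following Go sketch mirrors the fields listed in Figure 2. It is our simplified rendering for illustration only; the field names and types are assumptions, not sybilhunter's actual data structures:

```go
package main

import "time"

// RouterStatus mirrors a per-relay entry in a consensus (see Figure 2).
type RouterStatus struct {
	Nickname    string
	Fingerprint string
	Publication time.Time
	Address     string
	ORPort      uint16
	DirPort     uint16
	Flags       []string
	Version     string
	Bandwidth   uint64
	ExitPolicy  string
	Descriptor  string // pointer (digest) to the relay's descriptor
}

// RouterDescriptor mirrors a self-published relay descriptor (see Figure 2).
type RouterDescriptor struct {
	Address     string
	ORPort      uint16
	Platform    string // self-reported and not verified by the authorities
	Protocols   string
	Published   time.Time
	Fingerprint string
	Uptime      uint64 // self-reported, in seconds
	Bandwidth   uint64
	Signature   []byte
}

func main() {}
```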

4.1.1 Consensuses and router descriptors

The consensus and descriptor dataset is publicly available on CollecTor [32], an archiving service that is run by The Tor Project. Some of the archived data dates back to 2004, allowing us to restore arbitrary Tor network configurations from the last decade. Not all of CollecTor's archived data is relevant to our hunt for Sybils, though, which is why we only analyze the following two:

Descriptors: Tor relays and bridges periodically upload router descriptors, which capture their configuration, to directory authorities. Figure 2 shows an example. Relays upload their descriptors no later than every 18 hours, or sooner, depending on certain conditions. Note that some information in router descriptors is not verified by directory authorities. Therefore, relays can spoof information such as their operating system, Tor version, and uptime.

Consensuses: Each hour, the nine directory authorities vote on their view of all Tor relays that are currently online. The vote produces the consensus, an authoritative list that comprises all running Tor relays, represented as a set of router statuses. Each router status in the consensus contains basic information about Tor relays such as their bandwidth, flags, and exit policy. It also contains a pointer to the relay's descriptor, as shown in Figure 2. As of June 2016, consensuses contain approximately 7,000 router statuses, i.e., each hour, 7,000 router statuses are published, and archived, by CollecTor.

Table 1 gives an overview of the size of our consensus and descriptor archives. We found it challenging to repeatedly process these millions of files, amounting to more than 100 GiB of uncompressed data, so we implemented a custom parser in Go [36].

4.1.2 Malicious exit relays

In addition to our publicly available and primary dataset, we collected malicious exit relays over 18 months. We call exit relays malicious if they modify forwarded traffic in bad faith, e.g., to run man-in-the-middle attacks. We add these relays to our dataset because they frequently surface in groups, as malicious Sybils, because an attacker runs the same attack on several, physically distinct exit relays. Winter et al.'s work [37, § 5.2] further showed that attackers make an effort to stay under the radar, which is why we cannot rely only on active probing to find such relays. We also seek to find potential "partners in crime" of each newly discovered malicious relay, which we discuss in Section 4.3.4.

We exposed malicious exit relays using Winter et al.'s exitmap tool [37, § 3.1]. Exitmap is a Python-based scanning framework for Tor exit relays. Exitmap modules perform a network task that can then be run over all exit relays. One use case is HTTPS man-in-the-middle detection: a module can fetch the certificate of a web server over all exit relays and then compare its fingerprint with the expected, valid fingerprint. Exposed attacks are sometimes difficult to attribute because an attack can take place upstream of the exit relay, e.g., at a malicious autonomous system. However, attribution is only a secondary concern. Our primary concern is protecting Tor users from harm, and we do not need to identify the culprit to do so.

In addition to using the original exitmap modules [37, § 3.1], we implemented modules that detect HTML and HTTP tampering by connecting to a decoy server under our control, and flagging an exit relay as malicious if the returned HTML or HTTP was modified, e.g., to inject data or redirect a user over a transparent HTTP proxy. Since we controlled the decoy server, we knew what our Tor client should get in response. Our modules ran periodically from August 2014 to January 2016, and discovered 251 malicious exit relays, whose attacks are discussed in Appendix A. We reported all relays to The Tor Project, which subsequently blocked these relays.
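To illustrate the certificate-pinning idea behind such a module: exitmap itself is Python-based, so the following Go sketch is our own illustration, not exitmap's API. It omits the Tor routing that a real scan would use, and both example.com and the pinned hash are placeholders:

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"log"
)

// expectedFingerprint is the SHA-256 hash of the decoy server's known,
// valid certificate (placeholder value; a real scanner pins it ahead of time).
const expectedFingerprint = "0000000000000000000000000000000000000000000000000000000000000000"

func main() {
	// In an actual scan, this connection would be routed through the
	// exit relay under test; here we dial directly for brevity.
	conn, err := tls.Dial("tcp", "example.com:443", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Hash the raw leaf certificate and compare it to the pinned value.
	cert := conn.ConnectionState().PeerCertificates[0]
	sum := sha256.Sum256(cert.Raw)
	if observed := hex.EncodeToString(sum[:]); observed != expectedFingerprint {
		fmt.Println("certificate mismatch; possible man-in-the-middle:", observed)
	}
}
```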

Figure 3: Sybilhunter's internal architecture. After an optional filtering step, data is then passed on to one of three analysis modules—churn, uptime, or fingerprint analysis—that produce as output either CSV files or an image.

4.2 Threat model

Most of this paper is about applying sybilhunter to archived network data, but we can also apply it to newly incoming data. This puts us in an adversarial setting, as attackers can tune their Sybils to evade our system. This is reflected in our adversarial assumptions. We assume that an adversary runs more than one Tor relay and exhibits redundancy in relay configuration or uptime sequence. An adversary can further know how sybilhunter's modules work, run active or passive attacks, and make a limited effort to stay under the radar by diversifying parts of their configuration. To detect Sybils, however, our heuristics require some redundancy.

4.3 Analysis techniques

Having discussed our datasets and threat model, we now turn to presenting techniques that can expose Sybils. Our techniques are based on the insight that Sybil relays frequently behave or appear similarly. Shared configuration parameters such as port numbers and nicknames cause similar appearance, whereas Sybils behave similarly when they reboot simultaneously or exhibit identical quirks when relaying traffic.

Sybilhunter can analyze (i) historical network data, dating back to 2007; (ii) online data, to detect new Sybils as they join the network; and (iii) relays that might be associated with previously discovered, malicious relays. Figure 3 shows sybilhunter's internal architecture. Tor network data first passes a filtering component that can be used to inspect a subset of the data, e.g., only relays with a given IP address or nickname. The data is then forwarded to one or more modules that implement an analysis technique. These modules work independently, but share a data structure to find suspicious relays that show up in more than one module. Depending on the analysis technique, sybilhunter's output is either CSV files or images.

While developing sybilhunter, we had to make many design decisions, which we tackled by drawing on the experience we gained by manually analyzing numerous Sybil groups. We iteratively improved our code and augmented it with new features when we experienced operational shortcomings.

4.3.1 Network churn

The churn rate of a distributed system captures the rate of joining and leaving network participants. In the Tor network, these participants are relays. An unexpectedly high churn rate between two subsequent consensuses means that many relays joined or left, which can reveal Sybils and other network issues, because many Sybil operators start and stop their Sybils at the same time to ease administration—they behave similarly.

The Tor Project maintains a Python script [15] that determines the number of previously unobserved relay fingerprints in new consensuses. If that number is greater than or equal to the static threshold 50, the script sends an e-mail alert. We reimplemented the script in sybilhunter and ran it over all archived consensus documents, dating back to 2007. The script raised 47 alerts in nine years, all of which seemed to be true positives, i.e., they should be of interest to The Tor Project. The script did not raise false positives, presumably because the median number of previously unseen fingerprints in a consensus is only six—significantly below the conservative threshold of 50. Yet the threshold likely causes false negatives, and we cannot determine the false negative rate because we lack ground truth. In addition, The Tor Project's script does not consider relays that left the network, does not distinguish between relays with different flags, and does not adapt its threshold as the network grows. We now present an alternative approach that is more flexible and robust.

We found that churn anomalies worthy of our attention range from flat hills (Figure 4) to sudden spikes (Figure 5). Flat hills can be a sign of an event that affected a large number of relays, over many hours or days. Such an event happened shortly after the Heartbleed bug, when The Tor Project asked relay operators to generate new keys. Relay operators acted gradually, most within two days. Sudden spikes can happen if an attacker adds many relays, all at once. These are mere examples, however; the shape of a time series cannot tell us anything about the nature of the underlying incident.

To quantify the churn rate α between two subsequent consensus documents, we adapt Godfrey et al.'s formula, which yields a churn value that captures both systems that joined and systems that left the network [13, § 2.1]. However, an unusually low number of systems that left could cancel out an unusually high number of new systems and vice versa—an undesired property for a technique that should spot abnormal changes. To address this issue, we split the formula in two parts, creating a time series for new relays (αn) and one for relays that left (αl).

Ct is the network consensus at time t, and \ denotes the complement between two consensuses, i.e., the relays that are in the left operand, but not the right operand. We define αn and αl as

\[
\alpha_n = \frac{|C_t \setminus C_{t-1}|}{|C_t|}
\quad\text{and}\quad
\alpha_l = \frac{|C_{t-1} \setminus C_t|}{|C_{t-1}|}. \tag{1}
\]

Both αn and αl are bounded to the interval [0,1]. A churn value of 0 indicates no change between two subsequent consensuses, whereas a churn value of 1 indicates a complete turnover. Determining αn and αl for the sequence Ct, Ct−1, ..., Ct−n yields a time series of churn values that can readily be inspected for abnormal spikes. Figure 6 illustrates the maximum number of Sybils an attacker can add to the network given a threshold for α. The figure shows both the theoretical maximum and a more realistic estimate that accounts for noise, i.e., the median number of new relays in each consensus, which is 73.⁴ We found that many churn anomalies are caused by relays that share a flag, or a flag combination, e.g., HSDir (onion service directories) and Exit (exit relays). Therefore, sybilhunter can also generate per-flag churn time series that can uncover patterns that would be lost in a flag-agnostic time series.

Figure 4: A flat hill of new relays in 2009. The time series was smoothed using a moving average with a window size of 12 hours.

Figure 5: A sudden spike of new relays in 2010. The time series was smoothed using a moving average with a window size of 12 hours.

Figure 6: The number of new Sybils (y axis) that can remain undetected given a threshold for the churn value α (x axis). The diagram shows both the maximum and a more realistic estimate that accounts for the median number of new relays in consensuses.

Finally, to detect changes in the underlying time series trend—flat hills—we can smooth αn and αl using a simple moving average λ defined as

\[
\lambda = \frac{1}{w} \sum_{i=0}^{w} \alpha_i. \tag{2}
\]

As we increase the window size w, we can detect more subtle changes in the underlying churn trend. If λ or the churn values exceed a manually defined threshold, an alert is raised. Section 5.3 elaborates on how we can select a threshold in practice.

⁴ Note that this analysis is "memoryless" and includes relays that have been online before, unlike the analysis above, which considered only previously unobserved relays, for which the median number was six.
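To make equations 1 and 2 concrete, the following Go sketch computes αn and αl from two consecutive consensuses, represented as sets of relay fingerprints, along with the moving average λ. It is our illustration of the formulas, not sybilhunter's actual source, and the fingerprints in main are made up:

```go
package main

import "fmt"

// churn computes α_n and α_l between two consecutive consensuses,
// each represented as a set of relay fingerprints (equation 1).
func churn(prev, cur map[string]bool) (alphaN, alphaL float64) {
	var joined, left int
	for fpr := range cur {
		if !prev[fpr] { // in C_t but not in C_{t-1}
			joined++
		}
	}
	for fpr := range prev {
		if !cur[fpr] { // in C_{t-1} but not in C_t
			left++
		}
	}
	return float64(joined) / float64(len(cur)), float64(left) / float64(len(prev))
}

// movingAvg smooths a churn time series with window size w (equation 2).
func movingAvg(series []float64, w int) []float64 {
	var out []float64
	for i := w; i <= len(series); i++ {
		var sum float64
		for _, v := range series[i-w : i] {
			sum += v
		}
		out = append(out, sum/float64(w))
	}
	return out
}

func main() {
	prev := map[string]bool{"A": true, "B": true, "C": true, "D": true}
	cur := map[string]bool{"A": true, "B": true, "E": true}
	aN, aL := churn(prev, cur)
	fmt.Printf("alpha_n = %.2f, alpha_l = %.2f\n", aN, aL) // 0.33 and 0.50
}
```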

4.3.2 Uptime matrix

For convenience, Sybil operators are likely to administer their relays simultaneously, i.e., update, configure, and reboot them all at the same time. This is reflected in their relays' uptime. An operating system upgrade that requires a reboot of Sybil relays will induce a set of relays to go offline and return online in a synchronized manner. To isolate such events, we visualize the uptime patterns of Tor relays by grouping together relays whose uptime is highly correlated. The churn technique presented above is similar, but it only provides an aggregate, high-level view of how Tor relays join and leave the network. Because it is aggregate, it is poorly suited for visualizing the uptime of specific relays; an abnormally high churn value attracts our attention but does not tell us what caused the anomaly. To fill this gap, we complement the churn analysis with an uptime matrix, which we now present.

This uptime matrix consists of the uptime patterns of all Tor relays, which we represent as binary sequences. Each hour, when a new consensus is published, we add a new data point—"online" or "offline"—to each Tor relay's sequence. We visualize all sequences in a bitmap whose rows represent consensuses and whose columns represent relays.

Each pixel denotes the uptime status of a particular relay at a particular hour. Black pixels mean that the relay was online and white pixels mean that the relay was offline. This type of visualization was first proposed by Ensafi and subsequently implemented by Fifield [12].

Of particular importance is how the uptime sequences are sorted. If highly correlated sequences are not adjacent in the visualization, we might miss them. We sort sequences using single-linkage clustering, a type of hierarchical clustering algorithm that forms groups bottom-up, based on the minimum distance between group members. For our distance function, similar to Andersen et al. [1, § II.B], we use Pearson's correlation coefficient because it tells us whether two uptime sequences change together. The sample correlation coefficient r yields a value in the interval [−1,1]. A coefficient of −1 denotes perfect anti-correlation (relay R1 is only online when relay R2 is offline) and 1 denotes perfect correlation (relay R1 is only online when relay R2 is online). We define our distance function as d(r) = 1 − r, so two perfectly correlated sequences have a distance of zero while two perfectly anti-correlated sequences have a distance of two. Once all sequences are sorted, we color five or more adjacent sequences in red if their uptime sequences are identical. Figure 7 shows an example of our visualization algorithm: the uptime matrix for a subset of all Tor relays in November 2012.

Figure 7: The uptime matrix for 3,000 Tor relays for all of November 2012. Rows represent consensuses and columns represent relays. Black pixels mean that a relay was online, and white means offline. Red blocks denote relays with identical uptime.
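A minimal Go sketch of this distance computation—our illustration, not sybilhunter's actual code; it assumes equal-length, non-constant sequences:

```go
package main

import (
	"fmt"
	"math"
)

// pearson computes the sample correlation coefficient of two equal-length
// uptime sequences (1 = online, 0 = offline).
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/n, sumY/n
	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	return cov / math.Sqrt(varX*varY)
}

// distance turns the correlation coefficient into the clustering distance
// d(r) = 1 - r, so identical sequences have distance 0.
func distance(x, y []float64) float64 {
	return 1 - pearson(x, y)
}

func main() {
	r1 := []float64{1, 1, 0, 0, 1, 1}
	r2 := []float64{1, 1, 0, 0, 1, 1} // identical uptime
	r3 := []float64{0, 0, 1, 1, 0, 0} // perfectly anti-correlated
	fmt.Printf("d(r1, r2) = %.1f\n", distance(r1, r2)) // 0.0
	fmt.Printf("d(r1, r3) = %.1f\n", distance(r1, r3)) // 2.0
}
```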
4.3.3 Fingerprint analysis

The information a Tor client needs to connect to an onion service is stored in a DHT that consists of a subset of all Tor relays, the onion service directories (HSDirs). As of June 2016, 47% of all Tor relays serve as HSDirs. A daily-changing set of six HSDirs hosts the contact information of any given onion service. Tor clients contact one of these six HSDirs to request information about the onion service they intend to connect to. An HSDir becomes responsible for an onion service if the difference between its relay fingerprint and the service's descriptor ID is smaller than that of any other relay. The descriptor ID is derived from the onion service's public key, a time stamp, and additional information. All HSDirs are public, making it possible to determine at which position in the DHT an onion service will end up at any point in the future. Attackers can exploit the ability to predict the DHT position by repeatedly generating identity keys until their fingerprint is sufficiently close to the targeted onion service's index, thus becoming its HSDir [4, § V.A].

We detect relays that change their fingerprint frequently by maintaining a lookup table that maps a relay's IP address to a list of all fingerprints we have seen it use. We sort the lookup table by the relays that changed their fingerprints the most, and output the results. Note that reboots or newly assigned IP addresses are not an issue for this technique—as long as relays do not lose their long-term keys, which are stored on their hard drive, their fingerprint stays the same.
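The lookup table can be sketched in a few lines of Go (our illustration; the addresses and fingerprints below are example values):

```go
package main

import (
	"fmt"
	"sort"
)

// fprs maps a relay's IP address to the set of fingerprints we have
// observed it use across all archived consensuses.
var fprs = map[string]map[string]bool{}

// observe records one (address, fingerprint) pair from a consensus.
func observe(addr, fingerprint string) {
	if fprs[addr] == nil {
		fprs[addr] = map[string]bool{}
	}
	fprs[addr][fingerprint] = true
}

func main() {
	// Toy input; real input would be every relay in every consensus.
	observe("198.51.100.1", "AAAA")
	observe("198.51.100.1", "BBBB")
	observe("198.51.100.1", "CCCC")
	observe("203.0.113.7", "DDDD")

	// Rank addresses by the number of distinct fingerprints seen.
	addrs := make([]string, 0, len(fprs))
	for a := range fprs {
		addrs = append(addrs, a)
	}
	sort.Slice(addrs, func(i, j int) bool {
		return len(fprs[addrs[i]]) > len(fprs[addrs[j]])
	})
	for _, a := range addrs {
		fmt.Printf("%s: %d fingerprints\n", a, len(fprs[a]))
	}
}
```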
4.3.4 Nearest-neighbor ranking

We frequently found ourselves in a situation where exitmap discovered a malicious exit relay and we were left wondering if there were similar, potentially associated relays. Looking for such relays involved tedious manual work, which we soon started to automate. We needed an algorithm for nearest-neighbor ranking that takes as input a "seed" relay and produces as output a list of all relays, ranked by their similarity to the seed relay. We define similarity as shared configuration parameters such as port numbers, IP addresses, exit policies, or bandwidth values. Our algorithm ranks relays by comparing these configuration parameters.

To quantify the similarity between two relays, we use the Levenshtein distance [18], a distance metric that takes as input two strings and determines the minimum number of modifications—insert, delete, and modify—that are necessary to turn string s2 into s1. Our algorithm turns the router statuses and descriptors of two relays into strings and determines their Levenshtein distance. As an example, consider a simple representation consisting of the concatenation of nickname, IP address, and port. To turn string s2 into s1, six operations are necessary: four modifications and two deletions:

s1: Foo10.0.0.19001
s2: Bar10.0.0.2549001

Our algorithm determines the Levenshtein distance between a "seed" relay and all other relays in a consensus. It then ranks the calculated distances in ascending order. For a consensus consisting of 6,525 relays, our algorithm takes approximately 1.5 seconds to finish.⁵ Note that we designed our ranking algorithm to assist in manual analysis. Unlike the other analysis techniques, it does not require a threshold.

⁵ We measured on an Intel Core i7-3520M CPU at 2.9 GHz, a consumer-grade CPU.
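For reference, a straightforward Go implementation of the Levenshtein distance—a textbook dynamic-programming version, not necessarily what sybilhunter uses—reproduces the six operations from the example above:

```go
package main

import "fmt"

// levenshtein returns the minimum number of insertions, deletions, and
// substitutions needed to turn s2 into s1.
func levenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

func main() {
	// The example from the text: four modifications and two deletions.
	fmt.Println(levenshtein("Foo10.0.0.19001", "Bar10.0.0.2549001")) // 6
}
```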
5 Evaluation and results

Equipped with sybilhunter, we applied our techniques to nine years of archived Tor network data. We did not set any thresholds, so as to capture every single churn value, fingerprint, and uptime sequence, resulting in an unfiltered dataset of several megabytes of CSV files and uptime images. We then sorted this dataset in descending order by severity and began manually analyzing the most significant incidents, e.g., the largest churn values. In Section 5.1, we begin by characterizing the Sybil groups we discovered that way. Instead of providing an exhaustive list of all potential Sybils, we focus on our most salient findings—relay groups that were either clearly malicious or distinguished themselves otherwise.⁶ Afterwards, we explore the impact of sybilhunter's thresholds in Sections 5.2 to 5.6.

Once we discovered a seemingly harmful Sybil group, we reported it to The Tor Project. To defend against Sybil attacks, directory authorities can either remove a relay from the consensus or take away its Valid flag, which means that the relay is still in the consensus, but Tor clients will not consider it for the first or last hop in a circuit. The majority of directory authorities, i.e., five out of nine, must agree on either strategy. This mechanism is meant to distribute the power of removing relays into the hands of a diverse set of people in different jurisdictions.

⁶ Our datasets and visualizations are available online, and can be inspected for an exhaustive set of potential Sybils. The URL is https://nymity.ch/sybilhunting/.

5.1 Sybil characterization

Table 2 shows the most interesting Sybil groups we identified. The columns show (i) what we believe to be the purpose of the Sybils, (ii) when the Sybil group was at its peak size, (iii) the ID we gave the Sybils, (iv) the number of Sybil fingerprints at its peak, (v) the analysis techniques that could discover the Sybils, and (vi) a short description. The analysis techniques are abbreviated as "N" (Neighbor ranking), "F" (Fingerprint), "C" (Churn), "U" (Uptime), and "E" (exitmap). We now discuss the most insightful incidents in greater detail.

The "rewrite" Sybils: These recurring Sybils hijacked Bitcoin transactions by rewriting Bitcoin addresses in relayed HTML. All relays had the Exit flag and replaced onion domains found in a website's HTTP response with an impersonation domain, presumably hosted by the attacker. Interestingly, the impersonation domains shared a prefix with the original. For example, the domain sigaintevyh2rzvw.onion was replaced with the impersonation domain sigaintz7qjj3val.onion, whose first seven digits are identical to the original's. The attacker could create shared prefixes by repeatedly generating key pairs until the hash over the public key resembled the desired prefix. Onion domains are generated by determining the SHA-1 hash over the public key, truncating it to its 80 most significant bits, and encoding it in Base32. Each Base32 digit of the 16-digit domain represents five bits. Therefore, to get an n-digit prefix in the onion domain, $2^{5n-1}$ operations are required on average. For the seven-digit prefix above, this results in $2^{5 \cdot 7 - 1} = 2^{34}$ operations. The author of scallion [30], a tool for generating vanity onion domains, determined that an nVidia Quadro K2000M, a mid-range laptop GPU, is able to generate 90 million hashes per second. On this GPU, a partial collision for a seven-digit prefix can be found in $2^{34} \cdot \frac{1}{90{,}000{,}000} \approx 190$ seconds, i.e., just over three minutes.

We inspected some of the phishing domains and found that the attackers further replaced the original Bitcoin addresses with addresses that are presumably controlled by the attackers, enabling them to hijack Bitcoin transactions. As a result, we believe that the attack was financially motivated.
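As a back-of-the-envelope check of these numbers, the following Go snippet—our own illustration, using the 90 million hashes per second figure cited above—prints the expected search time for several prefix lengths:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const hashesPerSecond = 90e6 // mid-range laptop GPU rate cited above
	for n := 5; n <= 8; n++ {
		// Each Base32 digit encodes 5 bits, so an n-digit prefix
		// requires 2^(5n-1) key generations on average.
		ops := math.Exp2(float64(5*n - 1))
		fmt.Printf("%d-digit prefix: ~%.0f seconds\n", n, ops/hashesPerSecond)
	}
}
```

For n = 7 this yields roughly 191 seconds, matching the three-minute estimate in the text.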
The "redirect" Sybils: These relays all had the Exit flag and tampered with HTTP redirects of exit traffic. To protect their users' login credentials, some Bitcoin sites would redirect users from their HTTP site to the encrypted HTTPS version. This Sybil group tampered with the redirect and directed users to an impersonation site resembling the original Bitcoin site, probably to steal credentials. We only observed this attack against Bitcoin sites, but cannot rule out that other sites were attacked as well.

Interestingly, the Sybils' descriptors and consensus entries had less in common than those of other Sybil groups. They used a small set of different ports, Tor versions, and bandwidth values, and their nicknames did not exhibit an easily recognizable pattern. In fact, the only reason why we know that these Sybils belong together is that their attack was identical.

We discovered three Sybil groups that implemented the redirect attack, each of them beginning to surface when the previous one got blocked. The initial group first showed up in May 2014, with only two relays, but slowly grew over time, until it was finally discovered in January 2015. We believe that these Sybils were run by the same attacker because their attack was identical. It is possible that this Sybil group was run by the same attackers that controlled the "rewrite" group, but we have no evidence to support that hypothesis. Interestingly, only our exitmap module was able to spot these Sybils. The relays joined the network gradually over time and had little in common in their configuration, which is why our heuristics failed. In fact, we cannot rule out that the adversary was upstream of the exit relay, or gained control over these relays.

Purpose    Peak activity   Group ID       Number   Techniques   Description

MitM       Jan 2016        rewrite*       42       E            Replaced onion domains with impersonation site.
           Nov 2015        rewrite*       8        E            Replaced onion domains with impersonation site.
           Jun 2015        rewrite*       55       E            Replaced onion domains with impersonation site.
           Apr 2015        rewrite*       71       U,E          Replaced onion domains with impersonation site.
           Mar 2015        redirect†      24       E            Redirected users to impersonated site.
           Feb 2015        redirect†      17       E            Redirected users to impersonated site.
           Jan 2015        redirect†      26       E            Redirected users to impersonated site.

Botnet     Mar 2014        default        —        N            Likely a Windows-powered botnet. The group features wide geographical distribution, which is uncommon for typical Tor relays.
           Oct 2010        trotsky        649      N            The relays were likely part of a botnet. They appeared gradually, and were all running Windows.

Unknown    Jan 2016        cloudvps       61       C,U          Hosted by Dutch hoster XL Services.
           Nov 2015        11BX1371       150      C,U          All relays were in two /24 networks and a single relay had the Exit flag.
           Jul 2015        DenkoNet       58       U            Hosted on Amazon AWS and only present in a single consensus. No relay had the Exit flag.
           Jul 2015        cloudvps       55       C,U          All relays only had the Running and Valid flags. As their name suggests, the relays were hosted by the Dutch hoster "CloudVPS."
           Dec 2014        Anonpoke       284      C,U          The relays did not have the Exit flag and were removed from the network before they could get the HSDir flag.
           Dec 2014        FuslVZTOR      246      C,U          The relays showed up only hours after the LizardNSA incident.

DoS        Dec 2014        LizardNSA      4,615    C,U          A group publicly claimed to be responsible for the attack [24]. All relays were hosted in the Google cloud and The Tor Project removed them within hours.

Research   May 2015        fingerprints   168      F            All twelve IP addresses, located in the same /24, changed their fingerprint regularly, presumably in an attempt to manipulate the distributed hash table.
           Mar 2014        FDCservers     264      C,U          Relays that were involved in an experimental onion service deanonymization attack [8].
           Feb 2013        AmazonEC2      1,424    F,C,U        We observed 1,424 relay fingerprints on 88 IP addresses. These Sybils were likely part of a research project [4, § V].
           Jun 2010        planetlab      595      C,U          According to a report from The Tor Project [20], a researcher started these relays to learn more about scalability effects.

Table 2: The most salient Sybil groups that sybilhunter and our exitmap modules discovered. We believe that groups marked with the symbols * and † were run by the same operator, respectively. Note that sybilhunter was unable to detect six Sybil groups in the category "MitM."

The "FDCservers" Sybils: Attackers used these Sybils to deanonymize onion service users, as discussed by The Tor Project in a July 2014 blog post [8]. Supposedly, CMU/SEI-affiliated researchers were executing a traffic confirmation attack by sending sequences of RELAY_EARLY and RELAY cells as a signal down the circuit to the client, which the reference implementation never does [8, 7]. The attacking relays were both onion service directories and guards, allowing them to control both ends of the circuit for some Tor clients that were fetching onion service descriptors. Therefore, the relays could tell for a fraction of Tor users what onion service they were intending to visit. Most relays were running FreeBSD, used Tor in version 0.2.4.18-rc, had identical flags, mostly identical bandwidth values, and were located in 50.7.0.0/16 and 204.45.0.0/16. All of these shared configuration options made the relays easy to identify.

The relays were added to the network in batches, presumably starting in October 2013. On January 30, 2014, the attackers added 58 relays to the 63 existing ones, giving them control over 121 relays. On July 8, 2014, The Tor Project blocked all 123 IP addresses that were running at the time.

The "default" Sybils: This group, named after the Sybils' shared nickname "default," has been around since September 2011 and consists of Windows-powered relays only. We extracted relays by filtering consensuses for the nickname "default," onion routing port 443, and directory port 9030. The group features high IP address churn. For October 2015, we found "default" relays in 73 countries, with the top three countries being Germany (50%), Russia (8%), and Austria (7%). The majority of these relays had little uptime and exhibited a diurnal pattern, suggesting that they were powered off regularly—as is often the case for desktop computers and laptops.

To get a better understanding of the number of "default" relays over time, we analyzed all consensuses, extracting the number of relays whose nickname was "default," whose onion routing port was 443, and whose directory port was 9030. We did this for the first consensus of every day and plot the result in Figure 8. Note that we might overestimate the numbers, as our filter could capture unrelated relays.

The above suggests that some of the "default" relays are running without their owners' knowledge. While the relays do not fit the pattern of Sefnit (a.k.a. Mevade) [26] and Skynet [27]—two pieces of malware that use an onion service as command and control server—we believe that the "default" relays constitute a botnet.
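A sketch of such a filter in Go—our own illustration, assuming a simplified relay record; sybilhunter's actual filtering component may differ:

```go
package main

import "fmt"

// relay is a minimal stand-in for a consensus router status.
type relay struct {
	Nickname        string
	ORPort, DirPort uint16
}

// matchesDefault reports whether a relay fits the "default" group's
// pattern: nickname "default", onion routing port 443, dir port 9030.
func matchesDefault(r relay) bool {
	return r.Nickname == "default" && r.ORPort == 443 && r.DirPort == 9030
}

func main() {
	relays := []relay{
		{"default", 443, 9030},
		{"default", 9001, 9030},
		{"foo", 443, 9030},
	}
	count := 0
	for _, r := range relays {
		if matchesDefault(r) {
			count++
		}
	}
	fmt.Println("matching relays:", count) // 1
}
```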

Figure 8: The number of "default" and "trotsky" Sybil members over time.

The "trotsky" Sybils: Similar to the "default" group, the "trotsky" relays appear to be part of a botnet. Most of the relays' IP addresses were located in Eastern Europe, in particular in Slovenia, Croatia, and Bosnia and Herzegovina. The relays were all running Windows, used Tor in version 0.2.1.26, and listened on port 443. Most of the relays were configured as exits, and The Tor Project assigned some of them the BadExit flag. The first "trotsky" members appeared in September 2010. Over time, there were two relay peaks, reaching 139 (September 23) and 219 (October 3) relays, as illustrated in Figure 8. After that, only 1–3 relays remained in the consensus.

The "Amazon EC2" Sybils: The relays all used randomly generated nicknames, consisting of sixteen or seventeen letters and numbers; Tor in version 0.2.2.37; GNU/Linux; and IP addresses in Amazon's EC2 netblock. Each of the 88 IP addresses changed its fingerprint 24 times, but not randomly: the fingerprints were chosen systematically, in a small range. For example, relay 54.242.248.129 had fingerprints with the prefixes 8D, 8E, 8F, and 90. The relays were online for 48 hours. After 24 hours, most of the relays obtained the HSDir flag. This behavior appears to be a clear attempt to manipulate Tor's DHT. We believe that this Sybil group was run by Biryukov, Pustogarov, and Weinmann as part of their Security and Privacy 2013 paper "Trawling for Tor Hidden Services" [4]—one of the few Sybil groups that were likely run by academic researchers.
tion of a single relay that was hosted in the UK, and run- To get a better understanding of the number of “de- ning a different Tor version. The relays advertized the fault” relays over time, we analyzed all consensuses, ex- default bandwidth of 1 GiB/s on port 9001 and 9030. All tracting the number of relays whose nickname was “de- relays were middle relays and running as directory mir- fault,” whose onion routing port was 443, and whose di- ror. All Sybils were configured to be an onion service rectory port was 9030. We did this for the first consensus directory, but did not manage to get the flag in time.

The "PlanetLab" Sybils: A set of relays that used a variation of the strings "planet", "plab", "pl", and "planetlab" as their nickname. The relays' exit policy allowed ports 6660–6667, but they did not get the Exit flag. The Sybils were online for three days and were then removed by The Tor Project, as mentioned in a blog post [20]. The blog post further says that the relays were run by a researcher to learn more about "cloud computing and scaling effects."

The "LizardNSA" Sybils: All relays were hosted in the Google Cloud and were only online for ten hours, until the directory authorities started to reject them. The majority of machines were middle relays (96%), but the attackers also started some exit relays (4%). The Sybils were set up to be onion service directories, but the relays were taken offline before they could earn the HSDir flag. If all relays had obtained the HSDir flag, they would have constituted almost 50% of all onion service directories; the median number of onion service directories on December 26 was 3,551. Shortly after the attack began, somebody claimed responsibility on the tor-talk mailing list [24]. Judging by the supposed attacker's demeanor, the attack was mere mischief.

The "FuslVZTOR" Sybils: All machines were middle relays hosted in the netblock 212.38.181.0/24, owned by a UK VPS provider. The directory authorities started rejecting the relays five hours after they joined the network. The relays advertised the default bandwidth of 1 GiB/s and used randomly determined ports. The Sybils were active in parallel to the "LizardNSA" attack, but there is no reason to believe that the two incidents were related.

5.2 Alerts per method

Having investigated the different types of alerts our methods raised, we now provide intuition on how many of these alerts we would face in practice. To this end, we first determined conservative thresholds, chosen to yield a manageable number of alerts per week. For network churn, we set the threshold for αn for relays with the Valid flag to 0.017. For the fingerprint method, we raised an alert if a relay changed its fingerprint at least ten times per month, and for uptime visualizations we raised an alert if at least five relays exhibited an identical uptime sequence. We used a variety of analysis windows to achieve representative results. For example, the Tor network's churn rate slowly decreased over the years, which is why we only analyzed 2015 and 2016. Table 3 shows the results. For comparison, the table also shows our exitmap modules, which did not require any thresholds.

Figure 9: The churn distribution for seven relay flags (Exit, V2Dir, Fast, Valid, Guard, HSDir, and Stable). We removed values greater than the plot whiskers.

5.3 Churn rate analysis

We determined the churn rate between two subsequent consensuses for all 72,061 consensuses that were published between October 2007 and January 2016. Considering that (i) there are 162 gaps in the archived data, that (ii) we created time series for joining and leaving relays, and that (iii) we determined churn values for all twelve relay flags, we ended up with (72,061 − 162) · 2 · 12 = 1,725,576 churn values. Figure 9 shows a box plot of the churn distribution (joining and leaving churn values concatenated) for the seven most relevant relay flags. We removed values greater than the plot whiskers (which extend to values 1.5 times the interquartile range from the box) to better visualize the width of the distributions. Unsurprisingly, relays with the Guard, HSDir, and Stable flags experience the least churn, probably because relays are only awarded these flags if they are particularly stable. Exit relays have the most churn, which is surprising given that exit relays are particularly sensitive to operate. Interestingly, the median churn rate of the network has steadily decreased over the years, from 0.04 in 2008 to 0.02 in 2015.

Figure 10 illustrates churn rates for five days in August 2008, featuring the most significant anomaly in our data. On August 19, 822 relays left the network, resulting in a sudden spike and a baseline shift. The spike was caused by the Tor network's switch from consensus format version three to four. The changelog says that in version four, routers that do not have the Running flag are no longer listed in the consensus.

To alleviate the choice of a detection threshold, we plot the number of alerts (in log scale) in 2015 as the threshold increases. We calculate these numbers for three simple moving average window sizes. The result is shown in Figure 11. Depending on the window size, thresholds greater than 0.012 seem practical, considering that 181 alerts per year average to approximately one alert in two days—a tolerable number of incidents to investigate. Unfortunately, we are unable to determine the false positive rate because we do not have ground truth.

Method        Analysis window    Threshold   Total alerts   Alerts per week
Fingerprint   10/2007–01/2016    10          551            1.3
Churn         01/2015–01/2016    0.017       110            1.9
Uptimes       01/2009–01/2016    5           3,052          8.3
Exitmap       08/2014–01/2016    —           251            3.2

Table 3: The number of alerts our methods raised. We used different analysis windows for representative results, and chose conservative thresholds to keep the number of alerts per week manageable.

Figure 10: In August 2008, an upgrade in Tor's consensus format caused the biggest anomaly in our dataset. The positive time series represents relays that joined and the negative one represents relays that left.

Figure 12: In June 2010, a researcher started several hundred Tor relays on PlanetLab [20]. The image shows the uptime of 2,000 relays for all of June.

Figure 11: The number of alerts (in log scale) in 2015 as the detection threshold increases, for three smoothing window sizes (1, 12, and 24 hours).

Figure 13: August 2012 featured a curious "step pattern," caused by approximately 100 Sybils. The image shows the uptime of 2,000 relays for all of August.

5.4 Uptime analysis

We generated relay uptime visualizations for each month since 2007, resulting in 100 images. We now discuss a subset of these images, those containing particularly interesting patterns.

Figure 12 shows June 2010, featuring a clear "Sybil block" in the center. The Sybils belonged to a researcher who, as documented by The Tor Project [20], started several hundred Tor relays on PlanetLab for research on scalability (the "PlanetLab" Sybils discussed above). Our manual analysis could verify this. The relays were easy to identify because their nicknames suggested that they were hosted on PlanetLab, containing strings such as "planetlab," "planet," and "plab." Note the small height of the Sybil block, indicating that the relays were only online for a short time.

Figure 13 features a curious "step pattern" for approximately 100 relays, all of which were located in Russia and Germany. The relays appeared in December 2011, and started exhibiting the diurnal step pattern (nine hours uptime followed by fifteen hours downtime) in March 2012. All relays had similar nicknames, consisting of eight seemingly randomly generated characters. In April 2013, the relays finally disappeared.

Figure 14 illustrates the largest Sybil group to date, comprising 4,615 Tor relays (the "LizardNSA" Sybils discussed above). An attacker set up these relays in the Google cloud in December 2014. Because of its magnitude, the attack was spotted almost instantly, and The Tor Project removed the offending relays only ten hours after they appeared.

Figure 14: In December 2014, an attacker started several thousand Tor relays in the Google cloud. The image shows the uptime of 4,000 relays for all of December.

5.5 Fingerprint anomalies

We determined how often all Tor relays changed their fingerprint from 2007 to 2015. Figure 15 illustrates the number of fingerprints (y axis) we observed for the 1,000 Tor relays (x axis) that changed their fingerprint the most. All these relays changed their fingerprint at least ten times. Twenty-one relays changed their fingerprint more than 100 times, and the relay at the very right end of the distribution changed its fingerprint 936 times. This relay's nickname was "openwrt," suggesting that it was a home router that was rebooted regularly, presumably losing its long-term keys in the process. The relay was running from August 2010 to December 2010.

Figure 15 further contains a peculiar plateau, shown in the shaded area between index 707 and 803. This plateau was caused by a group of Sybils, hosted in Amazon EC2, that changed their fingerprint exactly 24 times (the "Amazon EC2" Sybils discussed above). Upon inspection, we noticed that this was likely an experiment for a Security and Privacy 2013 paper on deanonymizing Tor onion services [4, § V].

We also found that many IP addresses in the netblock 199.254.238.0/24 frequently changed their fingerprint. We contacted the owner of the address block and were told that the block used to host VPN services. Apparently, several people started Tor relays, and since the VPN service would not assign permanent IP addresses, the Tor relays would periodically change their address, causing the churn we observe.

Figure 15: The number of observed fingerprints for the 1,000 relays that changed their fingerprints the most.

5.6 Accuracy of nearest-neighbor ranking

Given a Sybil relay, how good is our nearest-neighbor ranking at finding the remaining Sybils? To answer this question, we now evaluate our algorithm's accuracy, which we define as the fraction of neighbors it correctly labels as Sybils. For example, if eight out of ten Sybils are correctly labeled as neighbors, the accuracy is 0.8. A sound evaluation requires ground truth, i.e., relays that are known to be Sybils. All we have, however, are relays that we believe to be Sybils. In addition, the number of Sybils we found is only a lower bound—we are unlikely to have detected all Sybil groups. Therefore, our evaluation is doomed to overestimate our algorithm's accuracy because we are unable to test it on the Sybils we did not discover.

We evaluate our ranking algorithm on two datasets: the "bad exit" Sybil groups from Table 5, and relay families. We chose the bad exit Sybils because we observed them running identical, active attacks, which makes us confident that they are in fact Sybils. Recall that a relay family is a set of Tor relays that is controlled by a single operator, but configured to express this mutual relationship in the family members' configuration files. Therefore, relay families are benign Sybils. As of January 2016, approximately 400 families populate the Tor network, ranging in size from only two to 25 relays.

We evaluate our algorithm by finding the nearest neighbors of a family member. Ideally, all neighbors are family members, but the use of relay families as ground truth is very likely to overestimate results because family operators frequently configure their relays identically on purpose. At the time of this writing, a popular relay family has the nicknames "AccessNow000" to "AccessNow009," adjacent IP addresses, and identical contact information—perfect prerequisites for our algorithm. We expect the operators of malicious Sybils, however, to go out of their way to obscure the relationship between their relays.

To determine our algorithm's accuracy, we used all relay families that were present in the first consensus that was published in October 2015. For each relay that had at least one mutual family relationship, we determined its n − 1 nearest neighbors, where n is the family size. Basically, we evaluated how good our algorithm is at finding the relatives of a family member. We determined the accuracy—a value in [0,1]—for each family member. The result is shown in Figure 16(b), a distribution of accuracy values.
Again, we determined the service would not assign permanent IP addresses, the Tor n 1 nearest neighbors of all bad exit relays, where n is − relays would periodically change their address, causing the size of the Sybil group. The accuracy is the fraction the churn we observe. of relays that our algorithm correctly classified as neigh-
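The accuracy computation just described is simple enough to state in a few lines. The sketch below is ours, not sybilhunter's code; it assumes a stand-in similarity function rank_neighbors(relay, candidates, k) and otherwise follows the definition above: for each member of a known group of size n, retrieve its n − 1 nearest neighbors and record the fraction that belong to the same group.

    def ranking_accuracy(group, all_relays, rank_neighbors):
        """Accuracy of a neighbor ranking on one known Sybil group.

        `group` is a set of relays known (or believed) to belong
        together; `rank_neighbors(relay, candidates, k)` returns the
        k candidates most similar to `relay`.  One accuracy value in
        [0, 1] is returned per group member.
        """
        n = len(group)
        assert n >= 2, "a group of one relay has no neighbors to find"
        accuracies = []
        for relay in group:
            candidates = [r for r in all_relays if r != relay]
            neighbors = rank_neighbors(relay, candidates, n - 1)
            hits = sum(1 for r in neighbors if r in group)
            accuracies.append(hits / (n - 1))
        return accuracies

Plotting the ECDF of these per-member accuracy values yields distributions like those in Figure 16.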

5.7 Computational cost

Fast techniques lend themselves to being run hourly, for every new consensus, while slower ones must be run less frequently. Table 4 gives an overview of the runtime of our methods.7 We stored our datasets on a solid state drive to eliminate I/O as a performance bottleneck. The table columns contain, from left to right, our analysis technique, the technique's analysis window, and how long it takes to compute its output. Network churn calculation is very fast; it takes as input only two consensus files and can easily be run for every new network consensus. Nearest-neighbor ranking takes approximately 1.6 seconds for a single consensus counting 6,942 relays. Fingerprint and uptime analysis for one month's worth of consensuses takes approximately one and two minutes, respectively—easy to invoke daily, or even several times a day.

Method              Analysis window     Run time
Churn               Two consensuses     ∼0.2 s
Neighbor ranking    One consensus       ∼1.6 s
Fingerprint         One month           ∼58.0 s
Uptimes             One month           ∼145.0 s

Table 4: The computational cost of our analysis techniques.

7 We determined all performance numbers on an Intel Core i7-3520M CPU at 2.9 GHz, a consumer-grade CPU.
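As a rough illustration of why churn is so cheap to compute, the sketch below derives a churn rate from nothing but the relay fingerprints in two consecutive consensuses. The formula sybilhunter actually uses is defined earlier in this paper; here we merely assume a simple symmetric variant that averages the fraction of relays that appeared and the fraction that vanished.

    def churn_rate(previous, current):
        """A churn rate in [0, 1] between two consensuses, each given
        as a set of relay fingerprints.  0 means no change; 1 means
        complete turnover.  (Assumed definition, for illustration.)
        """
        appeared = len(current - previous) / len(current)
        vanished = len(previous - current) / len(previous)
        return (appeared + vanished) / 2

Two set differences over roughly 7,000 fingerprints go a long way towards explaining the ∼0.2 s runtime in Table 4.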
6 Discussion

Having used sybilhunter in practice for several months, we now elaborate on both our operational experience and the shortcomings we encountered.

6.1 Operational experience

Our practical work with sybilhunter taught us that analyzing Sybils frequently requires manual verification, e.g., (i) comparing an emerging Sybil group with a previously disclosed one, (ii) using exitmap to send decoy traffic over Sybils, or (iii) sorting and comparing information in relay descriptors. We found that the amount of manual work greatly depends on the Sybils under investigation. The MitM groups in Table 2 were straightforward to spot—in a matter of minutes—while the botnets required a few hours of effort. It is difficult to predict all analysis scenarios that might arise in the future, so we designed sybilhunter to be interoperable with Unix command line tools [28]. Sybilhunter's CSV-formatted output can easily be piped into tools such as sed, awk, and grep. We found that compact text output was significantly easier to process, both for plotting and for manual analysis. Aside from Sybil detection, sybilhunter can serve as a valuable tool to better understand the Tor network and monitor its reliability. Our techniques have disclosed network consensus issues and can illustrate the diversity of Tor relays, providing empirical data that can support future network design decisions.

A key issue in the arms race of eliminating harmful relays lies in information asymmetry. Our detection techniques and code are freely available while our adversaries operate behind closed doors, creating an uphill battle that is difficult to sustain given our limited resources. In practice, we can reduce this asymmetry and limit our adversaries' knowledge by keeping secret sybilhunter's thresholds and exitmap's detection modules, so our adversary is left guessing what our tools seek to detect. This differentiation between an open analysis framework, such as the one we discuss in this paper, and secret configuration parameters seems to be a sustainable trade-off. Note that we are not arguing in favor of the flawed practice of security by obscurity. Instead, we are proposing to add a layer of obscurity on top of existing defense layers. We are working with The Tor Project on incorporating our techniques in Tor Metrics [33], a website containing network visualizations that are frequented by numerous volunteers. Many of these volunteers discover anomalies and report them to The Tor Project. By incorporating our techniques, we hope to benefit from "crowd-sourced" Sybil detection.

6.2 Limitations

In Section 4.2, we argued that we are unable to expose all Sybil attacks, so our results represent a lower bound. An adversary unconstrained by time and money can add an unlimited number of Sybils to the network. Indeed, Table 2 contains six Sybil groups that sybilhunter was unable to detect. Fortunately, exitmap was able to expose these Sybils, which emphasizes the importance of diverse and complementary analysis techniques. Needless to say, sybilhunter works best when analyzing attacks that took place before it was built. Adversaries that know of our methods can evade them, at the cost of having to spend time and resources. To evade our churn and uptime heuristics, Sybils must be added and modified independently over time. Evasion of our fingerprint heuristic, e.g., to manipulate Tor's DHT, requires more physical machines. Finally, manipulation of our neighbor ranking requires changes in configuration. This arms race is unlikely to end, barring fundamental changes in how Tor relays are operated.

Sybilhunter is unable to ascertain the purpose of a Sybil attack. While the purpose is frequently obvious, Table 2 contains several Sybil groups that we could not classify. In such cases, it is difficult for The Tor Project to make a call and decide if Sybils should be removed from the network. Keeping them runs the risk of exposing users to an unknown attack, but removing them deprives the network of bandwidth. Often, additional context is helpful in making a call. For example, Sybils that (i) are operated in "bulletproof" autonomous systems [17, § 2], (ii) show signs of not running the Tor reference implementation, or (iii) spoof information in their router descriptor all suggest malicious intent. In the end, Sybil groups have to be evaluated case by case, and the advantages and disadvantages of blocking them have to be considered.

Finally, there is significant room for improving our nearest-neighbor ranking. For simplicity, our algorithm represents relays as strings, ignoring a wealth of nuances such as topological proximity of IP addresses, or predictable patterns in port numbers.
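To illustrate this limitation, the sketch below reimplements the core idea of the ranking: relays are flattened into strings (nickname, address, ports, version, and so on, concatenated) and compared with the Levenshtein edit distance [18]. The flattening and the helper nearest_neighbors are ours, not sybilhunter's code; the point is that, as plain strings, "198.51.100.7" and "198.51.100.80" are merely one edit apart, regardless of any topological relationship.

    def levenshtein(a, b):
        """Classic dynamic-programming edit distance [18]."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                current.append(min(previous[j] + 1,      # deletion
                                   current[j - 1] + 1,   # insertion
                                   previous[j - 1] + (ca != cb)))
            previous = current
        return previous[-1]

    def nearest_neighbors(seed, relays, k):
        """Return the k relays whose string form is closest to `seed`.
        Each relay is assumed to be a flat string of descriptor
        fields, which is where the nuances mentioned above are lost."""
        return sorted(relays, key=lambda r: levenshtein(seed, r))[:k]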
7 Conclusion

We presented sybilhunter, a novel system that uses diverse analysis techniques to expose Sybils in the Tor network. Equipped with this tool, we set out to analyze nine years of The Tor Project's archived network data. We discovered numerous Sybil groups, twenty of which we present in this work. By analyzing the Sybil groups sybilhunter discovered, we found that (i) Sybil relays are frequently configured very similarly, and join and leave the network simultaneously; (ii) attackers differ greatly in their technical sophistication; and (iii) our techniques are not only useful for spotting Sybils, but turn out to be a handy analytics tool to monitor and better understand the Tor network. Given the lack of a central identity-verifying authority, it is always possible for well-executed Sybil attacks to stay under our radar, but we found that a complementary set of techniques can go a long way towards finding malicious Sybils, making the Tor network more secure and trustworthy for its users.

All our code, data, visualizations, and an open access bibliography of our references are available online at https://nymity.ch/sybilhunting/.

Acknowledgments

We want to thank our shepherd, Tudor Dumitraș, for his guidance on improving our work. We also want to thank Georg Koppen, Prateek Mittal, Stefan Lindskog, the Tor developers, and the wider Tor community for helpful feedback. This research was supported in part by the Center for Information Technology Policy at Princeton University and by the National Science Foundation Awards CNS-1540055 and CNS-1602399.

References

[1] David G. Andersen et al. "Topology Inference from BGP Routing Dynamics". In: Internet Measurement Workshop. ACM, 2002. URL: https://nymity.ch/sybilhunting/pdf/Andersen2002a.pdf (cit. on p. 7).

[2] Kevin Bauer and Damon McCoy. No more than one server per IP address. Mar. 2007. URL: https://gitweb.torproject.org/torspec.git/tree/proposals/109-no-sharing-ips.txt (cit. on p. 3).

[3] Kevin Bauer et al. "Low-Resource Routing Attacks Against Tor". In: WPES. ACM, 2007. URL: https://nymity.ch/sybilhunting/pdf/Bauer2007a.pdf (cit. on p. 3).

[4] Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Weinmann. "Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization". In: Security & Privacy. IEEE, 2013. URL: https://nymity.ch/sybilhunting/pdf/Biryukov2013a.pdf (cit. on pp. 2, 7, 9, 10, 13).

[5] Nikita Borisov. "Computational Puzzles as Sybil Defenses". In: Peer-to-Peer Computing. IEEE, 2005. URL: https://nymity.ch/sybilhunting/pdf/Borisov2006a.pdf (cit. on p. 2).

[6] George Danezis and Prateek Mittal. "SybilInfer: Detecting Sybil Nodes using Social Networks". In: NDSS. The Internet Society, 2009. URL: https://nymity.ch/sybilhunting/pdf/Danezis2009a.pdf (cit. on p. 2).

[7] Roger Dingledine. Did the FBI Pay a University to Attack Tor Users? Nov. 2015. URL: https://blog.torproject.org/blog/did-fbi-pay-university-attack-tor-users (cit. on p. 10).

[8] Roger Dingledine. Tor security advisory: "relay early" traffic confirmation attack. July 2014. URL: https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack (cit. on pp. 1, 9, 10).

[9] Roger Dingledine and Nick Mathewson. Tor Path Specification. URL: https://gitweb.torproject.org/torspec.git/tree/path-spec.txt (cit. on p. 3).

[10] Roger Dingledine, Nick Mathewson, and Paul Syverson. "Tor: The Second-Generation Onion Router". In: USENIX Security. USENIX, 2004. URL: https://nymity.ch/sybilhunting/pdf/Dingledine2004a.pdf (cit. on p. 3).

[11] John R. Douceur. "The Sybil Attack". In: Peer-to-Peer Systems. 2002. URL: https://nymity.ch/sybilhunting/pdf/Douceur2002a.pdf (cit. on pp. 1, 2).

[12] David Fifield. #12813—Look at a bitmap visualization of relay consensus. 2014. URL: https://bugs.torproject.org/12813 (cit. on p. 7).

[13] P. Brighten Godfrey, Scott Shenker, and Ion Stoica. "Minimizing Churn in Distributed Systems". In: SIGCOMM. ACM, 2006. URL: https://nymity.ch/sybilhunting/pdf/Godfrey2006a.pdf (cit. on p. 5).

[14] Aaron Johnson et al. "Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries". In: CCS. ACM, 2013. URL: https://nymity.ch/sybilhunting/pdf/Johnson2013a.pdf (cit. on p. 1).

[15] Damian Johnson. doctor – service that periodically checks the Tor network for consensus conflicts and other hiccups. URL: https://gitweb.torproject.org/doctor.git/tree/ (cit. on p. 5).

[16] Marc Juarez et al. "A Critical Evaluation of Website Fingerprinting Attacks". In: CCS. ACM, 2014. URL: https://nymity.ch/sybilhunting/pdf/Juarez2014a.pdf (cit. on p. 1).

[17] Maria Konte, Roberto Perdisci, and Nick Feamster. "ASwatch: An AS Reputation System to Expose Bulletproof Hosting ASes". In: SIGCOMM. ACM, 2015. URL: https://nymity.ch/sybilhunting/pdf/Konte2015a.pdf (cit. on p. 15).

[18] Vladimir Iosifovich Levenshtein. "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals". In: Soviet Physics-Doklady 10.8 (1966). URL: https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf (cit. on p. 7).

[19] Brian Neil Levine, Clay Shields, and N. Boris Margolin. A Survey of Solutions to the Sybil Attack. Tech. rep. University of Massachusetts Amherst, 2006. URL: https://nymity.ch/sybilhunting/pdf/Levine2006a.pdf (cit. on p. 2).

[20] Andrew Lewman. June 2010 Progress Report. June 2010. URL: https://blog.torproject.org/blog/june-2010-progress-report (cit. on pp. 9, 11, 12).

[21] Frank Li et al. "SybilControl: Practical Sybil Defense with Computational Puzzles". In: Scalable Trusted Computing. ACM, 2012. URL: https://nymity.ch/sybilhunting/pdf/Li2012a.pdf (cit. on p. 2).

[22] Zhen Ling et al. "Tor Bridge Discovery: Extensive Analysis and Large-scale Empirical Evaluation". In: IEEE Transactions on Parallel and Distributed Systems 26.7 (2015). URL: https://nymity.ch/sybilhunting/pdf/Ling2015b.pdf (cit. on p. 1).

[23] Zhen Ling et al. "TorWard: Discovery, Blocking, and Traceback of Malicious Traffic Over Tor". In: IEEE Transactions on Information Forensics and Security 10.12 (2015). URL: https://nymity.ch/sybilhunting/pdf/Ling2015a.pdf (cit. on p. 17).

[24] Lizards. Dec. 2014. URL: https://lists.torproject.org/pipermail/tor-talk/2014-December/036197. (cit. on pp. 9, 11).

[25] Moxie Marlinspike. sslstrip. URL: https://moxie.org/software/sslstrip/ (cit. on p. 17).

[26] msft-mmpc. Tackling the Sefnit botnet Tor hazard. Jan. 2014. URL: https://blogs.technet.microsoft.com/mmpc/2014/01/09/tackling-the-sefnit-botnet-tor-hazard/ (cit. on p. 10).

[27] nex. Skynet, a Tor-powered botnet straight from Reddit. Dec. 2012. URL: https://community.rapid7.com/community/infosec/blog/2012/12/06/skynet-a-tor-powered-botnet-straight-from-reddit (cit. on p. 10).

[28] Rob Pike and Brian W. Kernighan. "Program Design in the UNIX System Environment". In: Bell Labs Technical Journal 63.8 (1983). URL: https://nymity.ch/sybilhunting/pdf/Pike1983a.pdf (cit. on p. 14).

[29] Flora Rheta Schreiber. Sybil: The true story of a woman possessed by 16 separate personalities. Henry Regnery, 1973 (cit. on p. 1).

[30] Eric Swanson. GPU-based Onion Hash generator. URL: https://github.com/lachesis/scallion (cit. on p. 8).

[31] The Invisible Internet Project. URL: https://geti2p.net (cit. on p. 2).

[32] The Tor Project. CollecTor – Your friendly data-collecting service in the Tor network. URL: https://collector.torproject.org/ (cit. on p. 4).

[33] The Tor Project. Tor Metrics. URL: https://metrics.torproject.org (cit. on p. 14).

[34] Kurt Thomas, Chris Grier, and Vern Paxson. "Adapting Social Spam Infrastructure for Political Censorship". In: LEET. USENIX, 2012. URL: https://nymity.ch/sybilhunting/pdf/Thomas2012a.pdf (cit. on p. 1).

[35] Liang Wang and Jussi Kangasharju. "Real-World Sybil Attacks in BitTorrent Mainline DHT". In: Globecom. IEEE, 2012. URL: https://nymity.ch/sybilhunting/pdf/Wang2012a.pdf (cit. on p. 1).

[36] Philipp Winter. zoossh – Parsing library for Tor-specific data formats. URL: https://gitweb.torproject.org/user/phw/zoossh.git/ (cit. on p. 4).

[37] Philipp Winter et al. "Spoiled Onions: Exposing Malicious Tor Exit Relays". In: PETS. Springer, 2014. URL: https://nymity.ch/sybilhunting/pdf/Winter2014a.pdf (cit. on pp. 1, 3, 4).

[38] Haifeng Yu, Phillip B. Gibbons, Michael Kaminsky, and Feng Xiao. "SybilLimit: A Near-Optimal Social Network Defense against Sybil Attacks". In: Security & Privacy. IEEE, 2008. URL: https://nymity.ch/sybilhunting/pdf/Yu2008a.pdf (cit. on p. 2).

[39] Haifeng Yu et al. "SybilGuard: Defending Against Sybil Attacks via Social Networks". In: SIGCOMM. ACM, 2006. URL: https://nymity.ch/sybilhunting/pdf/Yu2006a.pdf (cit. on p. 2).

A Exposed malicious exit relays

Table 5 provides an overview of our second dataset, 251 bad exit relays that we discovered between August 2014 and January 2016. We believe that all single relays in the dataset were isolated incidents, while sets of relays constituted Sybil groups. Sybil groups marked with the symbols ∗, †, and ‡, respectively, were run by the same attacker.

Discovery   # of relays   Attack description

Aug 2014    1     The relay injected JavaScript into returned HTML. The script embedded another script from the domain fluxx.crazytall.com—not clearly malicious, but suspicious.
            1     The relay injected JavaScript into returned HTML. The script embedded two other scripts, jquery.js from the official jQuery domain, and clr.js from adobe.flashdst.com. Again, this was not necessarily malicious, but suspicious.

Sep 2014    1     The exit relay routed traffic back into the Tor network, i.e., we observed traffic that was supposed to exit from relay A, but came from relay B. The system presented by Ling et al. behaves the same way [23]; the authors proposed to run intrusion detection systems on Tor traffic by setting up an exit relay that runs an NIDS and routes the traffic back into the Tor network after having inspected it.

Oct 2014    1     The relay injected JavaScript into returned HTML.
            1     The relay ran the MitM tool sslstrip [25], rewriting HTTPS links in returned HTML to unencrypted HTTP links.
            1     Same as above.

Jan 2015    23∗   blockchain.info's web server redirects its users from HTTP to HTTPS. These relays tampered with blockchain.info's redirect and returned unprotected HTTP instead—presumably to sniff login credentials.
            1     The relay used OpenDNS as its DNS resolver and had the website category "proxy/anonymizer" blocked, resulting in several inaccessible websites, including torproject.org.

Feb 2015    1     The relay injected a script that attempted to load a resource from the now inaccessible torclick.net. Curiously, torclick.net's front page said "We place your advertising materials on all websites online. Your ads will be seen only for anonymous network TOR [sic] users. Now it is about 3 million users. The number of users is always growing."
            17∗   Again, these relays tampered with HTTP redirects of Bitcoin websites. Interestingly, the attack became more sophisticated; these relays began to target only connections whose HTTP headers resembled Tor Browser's.

Mar 2015    18∗   Same as above.
            1     The relay injected JavaScript and an iframe into the returned HTML. The injected content was not clearly malicious, but suspicious.

Apr 2015    70†   These exit relays transparently rewrote onion domains in returned HTML to an impersonation domain. The impersonation domain looked identical to the original, but had different Bitcoin addresses. We believe that this was an attempt to trick Tor users into sending Bitcoin transactions to phishing addresses.

Jun 2015    55†   Same as above.

Aug 2015    4†    Same as above.

Sep 2015    1     The relay injected an iframe into returned HTML that would load content that made the user's browser participate in some kind of mining activity.

Nov 2015    1     The relay ran the MitM tool sslstrip.
            8†    Same as the relays marked with a †.

Dec 2015    1‡    The relay ran the MitM tool sslstrip.
            1‡    Same as above.

Jan 2016    43†   Same as the relays marked with a †.

Table 5: An overview of our second dataset, 251 malicious exit relays that we discovered using exitmap. We believe that Sybil groups marked with ∗, †, and ‡, respectively, were run by the same adversary.
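For readers who want to reproduce checks like the blockchain.info one above, the sketch below shows the basic idea: fetch an HTTP URL through Tor without following redirects and verify that the upstream HTTP-to-HTTPS redirect survived. It is deliberately simplified; exitmap pins circuits through each exit in turn via Tor's control port, whereas this sketch lets Tor pick an arbitrary circuit and assumes a local SOCKS port of 9050 (and the requests[socks] extra).

    import requests

    TOR_PROXY = {"http": "socks5h://127.0.0.1:9050"}  # assumed local Tor

    def https_redirect_intact(url="http://blockchain.info/"):
        """Fetch `url` through Tor without following redirects and
        report whether the site's HTTP-to-HTTPS redirect survived.
        A False result suggests the exit stripped the redirect."""
        resp = requests.get(url, proxies=TOR_PROXY,
                            allow_redirects=False, timeout=60)
        location = resp.headers.get("Location", "")
        return resp.is_redirect and location.startswith("https://")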
