Identifying and Characterizing Sybils in the Tor Network
Philipp Winter, Princeton University and Karlstad University; Roya Ensafi, Princeton University; Karsten Loesing, The Tor Project; Nick Feamster, Princeton University

https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/winter

This paper is included in the Proceedings of the 25th USENIX Security Symposium, August 10–12, 2016, Austin, TX. ISBN 978-1-931971-32-4. Open access to the Proceedings of the 25th USENIX Security Symposium is sponsored by USENIX.

Abstract

Being a volunteer-run, distributed anonymity network, Tor is vulnerable to Sybil attacks. Little is known about real-world Sybils in the Tor network, and we lack practical tools and methods to expose Sybil attacks. In this work, we develop sybilhunter, a system for detecting Sybil relays based on their appearance, such as configuration; and behavior, such as uptime sequences. We used sybilhunter's diverse analysis techniques to analyze nine years of archived Tor network data, providing us with new insights into the operation of real-world attackers. Our findings include diverse Sybils, ranging from botnets, to academic research, and relays that hijacked Bitcoin transactions. Our work shows that existing Sybil defenses do not apply to Tor, it delivers insights into real-world attacks, and provides practical tools to uncover and characterize Sybils, making the network safer for its users.

1 Introduction

In a Sybil attack, an attacker controls many virtual identities to obtain disproportionately large influence in a network. These attacks take many shapes, such as sockpuppets hijacking online discourse [34]; the manipulation of BitTorrent's distributed hash table [35]; and, most relevant to our work, relays in the Tor network that seek to deanonymize users [8]. In addition to coining the term "Sybil,"¹ Douceur showed that practical Sybil defenses are challenging, arguing that Sybil attacks are always possible without a central authority [11]. In this work, we focus on Sybils in Tor: relays that are controlled by a single operator. But what harm can Sybils do?

The effectiveness of many attacks on Tor depends on how large a fraction of the network's traffic (called the consensus weight) an attacker can observe. As the attacker's consensus weight grows, the following attacks become easier.

Exit traffic tampering: When leaving the Tor network, a Tor user's traffic traverses exit relays, the last hop in a Tor circuit. Controlling exit relays, an attacker can eavesdrop on traffic to collect unencrypted credentials, break into TLS-protected connections, or inject malicious content [37, § 5.2].

Website fingerprinting: Tor's encryption prevents guard relays (the first hop in a Tor circuit) from learning their user's online activity. Ignoring the encrypted payload, an attacker can still take advantage of flow information such as packet lengths and timings to infer what websites Tor users are visiting [16].

Bridge address harvesting: Users behind censorship systems use private Tor relays (typically called bridges) as hidden stepping stones into the Tor network. It is important that censors cannot obtain all bridge addresses, which is why The Tor Project rate-limits bridge distribution. However, an attacker can harvest bridge addresses by running a middle relay and looking for incoming connections that do not originate from any of the publicly known guard relays [22, § 3.4].

End-to-end correlation: By running both entry guards and exit relays, an attacker can use timing analysis to link a Tor user's identity to her activity, e.g., learn that Alice is visiting Facebook. For this attack to work, an attacker must run at least two Tor relays, or be able to eavesdrop on at least two networks [14].

Configuring a relay to forward more traffic allows an attacker to increase her consensus weight. However, the capacity of a single relay is limited by its link bandwidth and, because of the computational cost of cryptography, by CPU. Ultimately, increasing consensus weight requires an adversary to add relays to the network; we call these additional relays Sybils.

In addition to the above attacks, an adversary needs Sybil relays to manipulate onion services, which are TCP servers whose IP address is hidden by Tor. In the current onion service protocol, six Sybil relays are sufficient to take offline an onion service because of a weakness in the design of the distributed hash table (DHT) that powers onion services [4, § V]. Finally, instead of being a direct means to an end, Sybil relays can be a side effect of another issue. In Section 5.1, we provide evidence for what appears to be botnets whose zombies are running Tor relays, perhaps because of a misguided attempt to help the Tor network grow.

Motivated by the lack of practical Sybil detection tools, we design and implement heuristics, leveraging our observations that Sybils (i) frequently go online and offline simultaneously, (ii) share similarities in their configuration, and (iii) may change their identity fingerprint (a relay's fingerprint is the hash over its public key) frequently, to manipulate Tor's DHT. Three of our four heuristics are automated and designed to run autonomously while one assists in manual analysis by ranking what relays in the network are the most similar to a given reference relay. Our evaluation suggests that our heuristics differ in their effectiveness; one method detected only a small number of incidents, but some of them no other method could detect. Other heuristics produced a large number of results, and seem well-suited to spot the "low hanging fruit." We implemented these heuristics in a tool, sybilhunter, which we subsequently used to analyze 100 GiB worth of archived network data, consisting of millions of files, and dating back to 2007. Finally, we characterize the Sybil groups we discovered. To sum up, we make the following key contributions:

• We design and implement sybilhunter, a tool to analyze past and future Tor network data. While we designed it specifically for the use in Tor, our techniques are general in nature and can easily be applied to other distributed systems such as I2P [31].

• We characterize Sybil groups and publish our findings as datasets to stimulate future research.² We find that Sybils run MitM attacks, DoS attacks, and are used for research projects.

The rest of this paper is structured as follows. We begin by discussing related work in Section 2 and give some background on Tor in Section 3. Section 4 presents the design of our analysis tools, which is then followed by experimental results in Section 5. We discuss our results in Section 6 and conclude the paper in Section 7.

2 Related work

In his seminal 2002 paper, Douceur showed that only a central authority that verifies new nodes as they join the distributed system is guaranteed to prevent Sybils [11]. This approach conflicts with Tor's design philosophy that seeks to distribute trust and eliminate central points of control. In addition, a major factor contributing to Tor's network growth is the low barrier of entry, allowing operators to set up relays both quickly and anonymously. An identity-verifying authority would raise that barrier, alienate privacy-conscious relay operators, and impede Tor's growth. Barring a central authority, researchers have proposed techniques that leverage a resource that is difficult for an attacker to scale. Two categories of Sybil-resistant schemes turned out to be particularly popular, schemes that build on social constraints and schemes that build on computational constraints. For a broad overview of alternative Sybil defenses, refer to Levine et al. [19].

Social constraints rely on the assumption that it is difficult for an attacker to form trust relationships with honest users, e.g., befriend many strangers on online social networks. Past work leveraged this assumption in systems such as SybilGuard [39], SybilLimit [38], and SybilInfer [6]. Unfortunately, social graph-based defenses do not work in our setting because there is no existing trust relationship between relay operators.³ Note that we could create such a relationship by, e.g., linking relays to their operator's social networking account, or by creating a "relay operator web of trust," but again, we believe that such an effort would alienate relay operators and see limited adoption.

Orthogonal to social constraints, computational resource constraints guarantee that an attacker seeking to operate 100 Sybils needs 100 times the computational resources she would have needed for a single virtual identity. Both Borisov [5] and Li et al. [21] used computational puzzles for that purpose. Computational constraints work well in distributed systems where the cost of joining the network is low. For example, a lightweight client is sufficient to use BitTorrent, allowing even low-end consumer devices to participate. However, this is not the case in Tor because relay operations require constant use of bandwidth and CPU. Unlike in many other distributed systems, it is impossible to run 100 Tor relays while not spending the resources for 100 relays. Computational constraints are inherently tied to running a relay.

In summary, we believe that existing Sybil defenses are ill-suited for application in the Tor network; its distinctive features call for customized solutions that con-

¹ The term is a reference to a book in which the female protagonist, Sybil, suffers from dissociative identity disorder [29].

² The datasets are available online at https://nymity.ch/sybilhunting/.

³ Relay operators can express in their configuration that their relays are run by the same operator, but this denotes an intra-person and not