Real-World Sybil Attacks in BitTorrent Mainline DHT

Liang Wang
Dept. of Computer Science
University of Helsinki
Helsinki, Finland

Jussi Kangasharju
Helsinki Institute for Information Technology
University of Helsinki
Helsinki, Finland

Abstract: Distributed hash tables (DHTs) are a key building block for modern P2P content-distribution systems, for example in implementing the distributed tracker of BitTorrent Mainline DHT. DHTs, due to their fully distributed nature, are known to be vulnerable to certain kinds of attacks, and different kinds of defenses have been proposed against these attacks. In this paper, we consider two kinds of attacks on a DHT, one already known attack and one new kind of attack, and show how they can be targeted against Mainline DHT. We complement them with an extensive measurement study using honeypots, which shows that both attacks have been going on for a long time in the network and are still happening. We present numbers showing that the number of sybils in the Mainline DHT network is increasing and is currently around 300,000. We analyze the potential threats from these attacks and propose simple countermeasures against them.

I. INTRODUCTION

BitTorrent is the dominant P2P software for file sharing and content distribution. Originally BitTorrent was based on a centralized tracker, where one tracker would host one or more swarms, with each swarm corresponding to one file being distributed. More recently, distributed variants of BitTorrent have been developed. In these variants, the tracker functionality has been spread over the peers in the system using a distributed hash table (DHT). Popular DHT-based BitTorrent variants include VUZE and Mainline DHT (MLDHT).

In this paper we focus on MLDHT and its security problems. DHTs have long been known to have inherent security issues, e.g., [1], and our work in this paper shows how MLDHT has failed to take these issues into account. The effective lack of protection in MLDHT implies that it is very easy for an attacker to monitor large fractions of traffic on the system or even hijack large parts of the system. Due to these vulnerabilities of MLDHT, it turns out that even with very modest resources (a few computers), a determined attacker can effectively mount a very large-scale attack. We base these claims on the two forms of attack we present in this paper and on our measurement data, which shows that both attacks are being performed on a large scale in MLDHT. However, we are only able to observe the presence of these attacks, but not their actual impact; our work only shows the potential for damage, and we have not observed any actual malice happening on the network.

Specifically, the contributions of this paper are as follows:

• We present two possible routing table attacks on the MLDHT system: horizontal and vertical attack, both based on attacking the routing tables of nodes in the system.
• We analyze the damage to the system caused by such attacks and their potential to violate user privacy.
• Through extensive measurements, we discover that both of these attacks are ongoing in the current MLDHT, report on their detailed implementation, and discuss the implications of this.
• We discuss how existing security solutions could be applied to MLDHT in order to make it less vulnerable to attacks.

This paper is structured as follows. Section II reviews the background on BitTorrent Mainline DHT. In Section III we present the two attacks and discuss the possible damage to the system. In Section IV we discuss the real-world attacks we found and analyze their potential impact. Section V discusses possible defenses against the attacks. We discuss related work in Section VI, and conclude our paper in Section VII.

II. BACKGROUND ON MAINLINE DHT

In the BitTorrent system, to join a swarm, a peer needs to get meta information first. In standard BitTorrent, the meta information can be obtained from the torrent file, which also contains a list of centralized trackers to help a peer get the initial peer set to bootstrap the download.

Partly due to legal issues, but also to improve service availability and system robustness, distributed trackers have been developed. BitTorrent has two independent, incompatible distributed tracker implementations, even though both are based on the Kademlia DHT [2]. One is VUZE [3] and the other is MLDHT.

MLDHT implements the minimum functionality of Kademlia. In MLDHT, peers and content each have a 160-bit string as their ID. Content IDs are also known as infohashes. A peer uses this infohash to obtain the meta information and initial peer set. MLDHT supports four control messages:

1) PING: probe a node's availability. If the node fails to respond for some period of time, it will be purged from the routing table.
2) FIND_NODE: given a target ID, this message is used to find the k closest neighbors of the ID.
3) GET_PEERS: given an infohash, get the initial peer set.
4) ANNOUNCE_PEER: a peer announces that it belongs to a swarm.

Fig. 1: Illustration of normal operations and two kinds of attacks in MLDHT. (a) Normal operations. (b) Horizontal attack. (c) Vertical attack.

Figure 1a illustrates normal operation in MLDHT. Suppose we have 3 nodes A, B and C. A holds a file with infohash x = 59. Assume B is responsible for storing the peer set of x. Node C wants to download the file.

First, A should publish the file by storing x at B. A will call GET_PEERS iteratively to get closer and closer to B, and finally reaches it. Then, A will use ANNOUNCE_PEER to tell B that it is downloading (sharing) a file with infohash x. B then stores A's contact information in the corresponding peer set for x. Since A is the publisher, it is the only one in the peer set at the moment.

When A sends the GET_PEERS messages, two possibilities emerge. If the queried node knows this infohash already and stores some peers in the corresponding peer set, it will respond with the peer set. If it does not know the infohash, it will respond with the k closest nodes to the infohash in its routing table. In this way, A will get closer and closer to B, finally reach B, and the search finishes.

For C to download the file, it should get x first. It will do exactly the same as A did before by using GET_PEERS to approach B. Since B already saved the peer set for x, C can obtain the initial peer set from B. C joins the swarm, sets up connections to the peers in the peer set, and gets the metadata (torrent file) from other peers using BitTorrent extension protocols [4], [5]. Then the download process starts.

III. ATTACKS & DAMAGES

The sybil attack is a well-known attack originally introduced by Douceur [6]. The idea is to inject multiple fake identities into the system, and use them as a starting point to perform further attacks. It can also be considered a routing table attack, which tries to pollute users' routing tables by inserting sybils into them. In practice, there are two major strategies to perform routing table attacks, and we name them horizontal attack and vertical attack based on their characteristics. In this section, we show how these two attacks work, and how to combine [...]

A. Horizontal Attack

The horizontal attack spreads sybils widely across the system. The aim is to pollute as many routing tables as possible. The number of sybils in one routing table is not the concern; the goal is to pollute the maximum number of routing tables. A successful horizontal attack can let the attacker sniff most of the control messages and therefore hijack the system.

The k-bucket mechanism implies that an attacker can effectively intercept messages if he has at least one sybil among a node's k closest neighbors. This means an attacker only needs to inject N/(k − 1) sybils. However, we already know that the average number of hops in routing a message in the system is O(log(N)). Because k is usually less than 8, O(log(N)) will eventually become bigger than k as the network size grows. Thus the minimum number of sybils needed to perform a successful horizontal attack is in fact N/max(k, log(N)).

The next question is how many resources we need to perform the attack. The most straightforward way is to run one node instance for each sybil, which obviously requires a lot of computation and traffic resources. However, by exploiting the MLDHT protocol, this can be done with a very limited amount of physical resources. Figure 1b and Algorithm 1 show how the horizontal attack works in MLDHT.

Algorithm 1: Horizontal Attack

    foreach incoming message M do
        switch M do
            case PING
                PONG with random ID
            case FIND_NODE
                Reply as the owner of the queried ID
            case GET_PEERS
                Save the infohash and keep silent
            otherwise
                Drop M
        endsw
    end
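The dispatch logic of Algorithm 1 and the sybil-count bound N/max(k, log(N)) from the text can be sketched in Python as a purely in-memory illustration. This is a minimal sketch under stated assumptions, not the attackers' or the authors' implementation: the dictionary message shapes, the class name HorizontalSybil, the function name min_sybils, the default k = 8, and the choice of base-2 logarithm are all hypothetical. No networking is involved, and the real MLDHT wire format is not modeled.

```python
import math
import os
from typing import Optional


def min_sybils(n: int, k: int = 8) -> int:
    """Bound from the text: N / max(k, log N), rounded up.

    Assumption: log is taken base 2; the text does not state the base.
    """
    return math.ceil(n / max(k, math.log2(n)))


class HorizontalSybil:
    """Toy responder following the four cases of Algorithm 1.

    Messages are plain dicts with a hypothetical "type" field.
    """

    def __init__(self) -> None:
        # Infohashes observed in GET_PEERS queries.
        self.seen_infohashes = set()

    def handle(self, msg: dict) -> Optional[dict]:
        kind = msg.get("type")
        if kind == "PING":
            # PONG with a random 160-bit (20-byte) ID.
            return {"type": "PONG", "id": os.urandom(20)}
        if kind == "FIND_NODE":
            # Reply as if this node owned the queried ID.
            return {"type": "NODES", "id": msg["target"]}
        if kind == "GET_PEERS":
            # Save the infohash and keep silent.
            self.seen_infohashes.add(msg["infohash"])
            return None
        # Any other message is dropped.
        return None
```

Under these assumptions, a network of 1024 nodes with k = 8 gives log2(1024) = 10 > k, so min_sybils(1024) evaluates to ceil(1024 / 10) = 103, which shows how the logarithmic term takes over from k once the network is large enough.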
