Exploiting KAD: Possible Uses and Misuses
Total Page:16
File Type:pdf, Size:1020Kb
Exploiting KAD: Possible Uses and Misuses Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack Institut Eurecom Sophia–Antipolis, France {steiner,ennajjar,erbi}@eurecom.fr This article is an editorial note submitted to CCR. It has NOT been peer reviewed. Authors take full responsibility for this article’s technical content. Comments can be posted through CCR Online. ABSTRACT [26] there are quite a few peers that do not follow this rule Kad ID Peer-to-peer systems have seen a tremendous growth in the and change their very frequently. last few years and peer-to-peer traffic makes a major frac- 1.1 Routing Lookup tion of the total traffic seen in the Internet. The dominating kad a application for peer-to-peer is file sharing. Some of the most Routing in is based on prefix matching: Node for- wards a query, destined to a node b, to the node in his rout- popular peer-to-peer systems for file sharing have been Nap- ing table that has the smallest XOR-distance. The XOR- ster, FastTrack, BitTorrent, and eDonkey, each one counting d a, b a b d a, b a ⊕ b a million or more users at their peak time. distance ( ) between nodes and is ( )= . kad It is calculated bitwise on the Kad IDsofthetwonodes, We got interested in , since it is the only DHT that a b d a, b has been part of very popular peer-to-peer system with sev- e.g. the distance between = 1011 and = 0111 is ( )= 1011 ⊕ 0111 = 1100. For details of the implementation see eral million simultaneous users. As we have been studying P kad [27]. The entries in the routing table of a peer point to over the course of the last 18 months we have been P P both, fascinated and frightened by the possibilities kad of- peers that are a various distances from : A peer stores kad only a few contacts to peers that are far away in the ID space fers. Mounting a Sybil attack is very easy in and allows P to compromise the privacy of kad users, to compromise the and increasingly more contacts to peers as we get closer . correct operation of the key lookup and to mount DDOS For details of the implementation see [27]. RoutingtoagivenKad ID is done in an iterative way. with very little resources. To improve robustness against node churn that can result in In this paper, we will relate some of our findings and point out how kad can be used and misused. stale routing table entries and to improve and look-up speed, the requesting peer P runs three parallel routing lookups for a given key at the same time: A peer P first consults his Categories and Subject Descriptors routing table to determine the three peers closest to the Kad H.3 [INFORMATION STORAGE AND RETRIEVAL]: ID. P sends route requests to these three peers, which Systems and Software – Distributed systems may or may not return to P route responses containing new peers even closer to the Kad ID, which are queried by General Terms P in the next step. The routing lookup terminates when the returned peers are further away from the Kad ID than the Algorithms, Security peer returning them. While iterative routing experiences a slightly higher delay Keywords than recursive routing, it offers increased robustness against kad Distributed Hash Table, Sybil attack, peer-to-peer system. message loss and it greatly simplifies crawling the net- work. In kad, a routing lookup will be performed in a first step by both, the publish and the search module. 1. INTRODUCTION TO KAD kad is a Kademlia-based [15] peer-to-peerDHT routing 1.2 Publishing and Searching protocol implemented by several peer-to-peerapplications such A key in a peer-to-peer system is an identifier used to as Overnet [18], eMule [11], and aMule [1]. The two open– retrieve information. In many peer-to-peer systems a key source projects eMule and aMule have the largest number of is typically published on a single peer that is numerically simultaneously connected users since these clients connect closest to that key. In kad,todealwithnodechurn,akey to the eDonkey network, which is a very popular peer-to- is published on ten different peers whose Kad ID agrees at peersystem for file sharing. Recent versions of these clients least in the first 8-bits with the key. This range of Kad implement the kad protocol. IDs around a key that agree in the first 8-bits with the key As in other DHTs, each kad node has a global identifier, is called the tolerance zone. Note that the key is not referred to as Kad ID, which is a 128 bit randomly generated published on the ten peers closest to the key, but simply on identifier. The Kad ID is generated when the client appli- peers whose Kad IDs are in the tolerance zone. To assure cation is started for the first time and is then permanently persistence of the information stored, the owner periodically stored with that client. The Kad ID stays unchanged on republishes the information every 5 or 24 hours, depending subsequent join and leaves of the peer, until the user deletes on the type of information. the application or its preferences file. However, as we show As for the publishing, the search procedure uses the rout- ACM SIGCOMM Computer Communication Review 65 Volume 37, Number 5, October 2007 ing lookup to find the peer(s) closest to the key searched zone Z of the kad network. For this, one needs to introduce for. To increase the robustness of the search in case of stale sybils in the zone Z and to make them known, so that their routing table entries, three searches are launched in parallel. presence is reflected in the routing tables of the regular, i.e. If the first arriving route response contains peers that are non-sybil peers. closer to the destination, immediately new route requests We have developed a light-weight implementation of such are sent. The four most important message types are: a “spy” that is able to create thousands of sybilsonone single physical machine as they do not keep any state about • hello : to check if the other peer is still alive and to the interactions with the regular peers [25]. inform the other peer about one’s existence and the 16 Kad ID When we spy on a 8-bit zone, we introduce 2 sybils: the and IP address. first 8 bit are defined by the zone we spy on, the following • route request/response(kid): To find peers that are 16 bits are different for each sybil. The spy works as follows: closer to the Kad ID kid. • First, crawl a zone Z of the Kad ID space using our • publish request/response: to publish information. crawler to to learn about the peers P currently online whose Kad IDsareinZ. • search request/response(key): to search for infor- mation whose hash is key. • Then, send hello requests to the peers P in order to “poison” their routing tables with entries that point 2. EXPLORING KAD to our sybils. The peers that receive a hello request We have developed our own crawler for kad,withthe will add the sybil to their routing table if the corre- aim to crawl kad frequently and over a duration of several sponding bucket of the routing table is not filled. months. Our crawler runs on a local machine and uses a • Later, when a route request(kid) initiated by regu- simple breadth first search issuing route requests to find P kad lar peer reaches a sybil that request will be answered the peers currently participating in . The speed of our with a set of sybilswhoseKad IDs are closer to the crawler allows us to crawl the entire kad system (entire kid Z Kad ID target in case the falls into the zone and ignored space) in about 8 minutes, which was never done otherwise. before. During a full crawl, we found between 3 and 4.3 million different peers. Between 1.5 and 2 million peers are This way, P has the impression of approaching the tar- not located behind NATs or firewalls and can be directly get. Once P is “close enough” to the target Kad ID, contacted by our crawler. it will initiate a publish request or search request However, to limit the network load and the data volume, also destined to one of our sybil peers. Therefore, for we decided to crawl only a part of the Kad ID space by any route request that reaches one of our sybil peers carrying out a zone crawl on a 8-bit zone, where we try we can be sure that the follow-up publish request or to find all active peers whose Kad IDshavethesame8 search request will also end-up on the same sybil. high-order bits. A zone crawl explores one 256-th of the • Store the content of all the requests received in a database entire Kad ID space and takes less than 2.5 seconds. For for later evaluation. slightly less than 6 months we crawled the same zone every 5 minutes. The detailed results of our crawl are reported in As described in Section 1, a key is published ten times and [24, 26]. for a search three parallel search requests are issued. For We made some surprising findings such as (i) several thou- our spy scheme to work as intended, the optimum would sand kad clients that all had the same Kad ID and (ii) sev- be to attract exactly one copy of every search or publish eral hundred peers with the same sub-net IP addresses and request.