Kademlia: A Peer-to-peer Information System Based on the XOR Metric

Petar Maymounkov and David Mazières
{petar,dm}@cs.nyu.edu
http://kademlia.scs.cs.nyu.edu

Abstract

We describe a peer-to-peer system which has provable consistency and performance in a fault-prone environment. Our system routes queries and locates nodes using a novel XOR-based metric topology that simplifies the algorithm and facilitates our proof. The topology has the property that every message exchanged conveys or reinforces useful contact information. The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.

1 Introduction

This paper describes Kademlia, a peer-to-peer ⟨key,value⟩ storage and lookup system. Kademlia has a number of desirable features not simultaneously offered by any previous peer-to-peer system. It minimizes the number of configuration messages nodes must send to learn about each other. Configuration information spreads automatically as a side-effect of key lookups. Nodes have enough knowledge and flexibility to route queries through low-latency paths. Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes. The algorithm with which nodes record each other's existence resists certain basic denial of service attacks. Finally, several important properties of Kademlia can be formally proven using only weak assumptions on uptime distributions (assumptions we validate with measurements of existing peer-to-peer systems).

Kademlia takes the basic approach of many peer-to-peer systems. Keys are opaque, 160-bit quantities (e.g., the SHA-1 hash of some larger data). Participating computers each have a node ID in the 160-bit key space. ⟨key,value⟩ pairs are stored on nodes with IDs "close" to the key for some notion of closeness. Finally, a node-ID-based routing algorithm lets anyone locate servers near a destination key.

Many of Kademlia's benefits result from its use of a novel XOR metric for distance between points in the key space. XOR is symmetric, allowing Kademlia participants to receive lookup queries from precisely the same distribution of nodes contained in their routing tables. Without this property, systems such as Chord [5] do not learn useful routing information from queries they receive. Worse yet, because of the asymmetry of Chord's metric, Chord routing tables are rigid. Each entry in a Chord node's finger table must store the precise node preceding an interval in the ID space; any node actually in the interval will be greater than some keys in the interval, and thus very far from the key. Kademlia, in contrast, can send a query to any node within an interval, allowing it to select routes based on latency or even send parallel asynchronous queries.

To locate nodes near a particular ID, Kademlia uses a single routing algorithm from start to finish. In contrast, other systems use one algorithm to get near the target ID and another for the last few hops. Of existing systems, Kademlia most resembles Pastry's [1] first phase, which (though not described this way by the authors) successively finds nodes roughly half as far from the target ID by Kademlia's XOR metric. In a second phase, however, Pastry switches distance metrics to the numeric difference between IDs. It also uses the second, numeric difference metric in replication. Unfortunately, nodes close by the second metric can be quite far by the first, creating discontinuities at particular node ID values, reducing performance, and frustrating attempts at formal analysis of worst-case behavior.

This research was partially supported by National Science Foundation grants CCR 0093361 and CCR 9800085.

[Figure 1: Probability of remaining online another hour as a function of uptime. The x axis represents minutes. The y axis shows the fraction of nodes that stayed online at least x minutes that also stayed online at least x + 60 minutes.]

2 System description

Each Kademlia node has a 160-bit node ID. Node IDs are constructed as in Chord, but to simplify this paper we assume machines just choose a random, 160-bit identifier when joining the system. Every message a node transmits includes its node ID, permitting the recipient to record the sender's existence if necessary.

Keys, too, are 160-bit identifiers. To publish and find ⟨key,value⟩ pairs, Kademlia relies on a notion of distance between two identifiers. Given two 160-bit identifiers, x and y, Kademlia defines the distance between them as their bitwise exclusive or (XOR) interpreted as an integer, d(x, y) = x ⊕ y.

We first note that XOR is a valid, albeit non-Euclidean, metric. It is obvious that d(x, x) = 0, d(x, y) > 0 if x ≠ y, and ∀x, y : d(x, y) = d(y, x). XOR also offers the triangle property: d(x, y) + d(y, z) ≥ d(x, z). The triangle property follows from the fact that d(x, z) = d(x, y) ⊕ d(y, z) and ∀a ≥ 0, b ≥ 0 : a + b ≥ a ⊕ b.

Like Chord's clockwise circle metric, XOR is unidirectional. For any given point x and distance Δ > 0, there is exactly one point y such that d(x, y) = Δ. Unidirectionality ensures that all lookups for the same key converge along the same path, regardless of the originating node. Thus, caching ⟨key,value⟩ pairs along the lookup path alleviates hot spots. Like Pastry and unlike Chord, the XOR topology is also symmetric (d(x, y) = d(y, x) for all x and y).

2.1 Node state

Kademlia nodes store contact information about each other to route query messages. For each 0 ≤ i < 160, every node keeps a list of ⟨IP address, UDP port, Node ID⟩ triples for nodes of distance between 2^i and 2^(i+1) from itself. We call these lists k-buckets. Each k-bucket is kept sorted by time last seen: least-recently seen node at the head, most-recently seen at the tail. For small values of i, the k-buckets will generally be empty (as no appropriate nodes will exist). For large values of i, the lists can grow up to size k, where k is a system-wide replication parameter. k is chosen such that any given k nodes are very unlikely to fail within an hour of each other (for example k = 20).

When a Kademlia node receives any message (request or reply) from another node, it updates the appropriate k-bucket for the sender's node ID. If the sending node already exists in the recipient's k-bucket, the recipient moves it to the tail of the list. If the node is not already in the appropriate k-bucket and the bucket has fewer than k entries, then the recipient just inserts the new sender at the tail of the list. If the appropriate k-bucket is full, however, then the recipient pings the k-bucket's least-recently seen node to decide what to do. If the least-recently seen node fails to respond, it is evicted from the k-bucket and the new sender inserted at the tail. Otherwise, if the least-recently seen node responds, it is moved to the tail of the list, and the new sender's contact is discarded.

k-buckets effectively implement a least-recently seen eviction policy, except that live nodes are never removed from the list. This preference for old contacts is driven by our analysis of trace data collected by Saroiu et al. [4].
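To make the bucket-update rule concrete, the following Python sketch is our own illustration, not from the paper: the names distance, bucket_index, and KBucket are ours, and ping stands for a caller-supplied PING RPC. It shows the XOR distance, the bucket index an ID maps to, and the least-recently seen update policy.

from collections import OrderedDict

def distance(x: int, y: int) -> int:
    """XOR distance between two 160-bit IDs, interpreted as an integer."""
    return x ^ y

def bucket_index(my_id: int, other_id: int) -> int:
    """Index i of the k-bucket covering distance [2^i, 2^(i+1)): the
    position of the most significant bit in which the two IDs differ."""
    return distance(my_id, other_id).bit_length() - 1

class KBucket:
    """Contacts ordered by time last seen: least-recently seen at the
    head, most-recently seen at the tail."""
    def __init__(self, k: int = 20):
        self.k = k
        self.contacts = OrderedDict()  # node_id -> (ip, udp_port)

    def update(self, node_id, addr, ping) -> None:
        if node_id in self.contacts:        # known contact: move to tail
            self.contacts.move_to_end(node_id)
        elif len(self.contacts) < self.k:   # room left: insert at tail
            self.contacts[node_id] = addr
        else:                               # full: ping least-recently seen
            oldest_id, oldest_addr = next(iter(self.contacts.items()))
            if ping(oldest_id, oldest_addr):
                # Old node is alive: keep it (move to tail), drop newcomer.
                self.contacts.move_to_end(oldest_id)
            else:
                # Old node is dead: evict it, insert newcomer at tail.
                del self.contacts[oldest_id]
                self.contacts[node_id] = addr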

Figure 1 shows the percentage of Gnutella nodes that stay online another hour as a function of current uptime. The longer a node has been up, the more likely it is to remain up another hour. By keeping the oldest live contacts around, k-buckets maximize the probability that the nodes they contain will remain online.

A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes' routing state by flooding the system with new nodes. Kademlia nodes will only insert the new nodes in the k-buckets when old nodes leave the system.
2.2 Kademlia protocol

The Kademlia protocol consists of four RPCs: PING, STORE, FIND_NODE, and FIND_VALUE. The PING RPC probes a node to see if it is online. STORE instructs a node to store a ⟨key,value⟩ pair for later retrieval.

FIND_NODE takes a 160-bit ID as an argument. The recipient of the RPC returns ⟨IP address, UDP port, Node ID⟩ triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).

FIND_VALUE behaves like FIND_NODE, returning ⟨IP address, UDP port, Node ID⟩ triples, with one exception: if the RPC recipient has received a STORE RPC for the key, it just returns the stored value.

In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery. PINGs can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender's network address.
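As a rough illustration of the FIND_NODE reply described above (our own sketch; RoutingTable, its bucket layout, and find_node are hypothetical names, not the paper's specification), a recipient gathers contacts and returns the k of them closest to the target by XOR distance. A real implementation would walk buckets outward from the target's bucket rather than sort every contact, but the result is the same.

from typing import List, Tuple

Contact = Tuple[str, int, int]  # (IP address, UDP port, node ID) triple

class RoutingTable:
    def __init__(self, my_id: int, k: int = 20):
        self.my_id = my_id
        self.k = k
        # One bucket per distance range [2^i, 2^(i+1)), 0 <= i < 160.
        self.buckets: List[List[Contact]] = [[] for _ in range(160)]

    def find_node(self, target: int) -> List[Contact]:
        """Return up to k known contacts closest to target by XOR distance,
        drawing from multiple buckets when the closest one is not full."""
        everyone = [c for bucket in self.buckets for c in bucket]
        everyone.sort(key=lambda c: c[2] ^ target)
        return everyone[:self.k]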

The most important procedure a Kademlia participant must perform is to locate the k closest nodes to some given node ID. We call this procedure a node lookup. Kademlia employs a recursive algorithm for node lookups. The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of). The initiator then sends parallel, asynchronous FIND_NODE RPCs to the α nodes it has chosen. α is a system-wide concurrency parameter, such as 3.

In the recursive step, the initiator resends the FIND_NODE to nodes it has learned about from previous RPCs. (This recursion can begin before all α of the previous RPCs have returned.) Of the k nodes the initiator has heard of closest to the target, it picks α that it has not yet queried and resends the FIND_NODE RPC to them.¹ Nodes that fail to respond quickly are removed from consideration until and unless they do respond. If a round of FIND_NODEs fails to return a node any closer than the closest already seen, the initiator resends the FIND_NODE to all of the k closest nodes it has not already queried. The lookup terminates when the initiator has queried and gotten responses from the k closest nodes it has seen. When α = 1 the lookup algorithm resembles Chord's in terms of message cost and the latency of detecting failed nodes. However, Kademlia can route for lower latency because it has the flexibility of choosing any one of k nodes to forward a request to. A simplified sketch of this loop appears below.

Most operations are implemented in terms of the above lookup procedure. To store a ⟨key,value⟩ pair, a participant locates the k closest nodes to the key and sends them STORE RPCs. Additionally, each node re-publishes the ⟨key,value⟩ pairs that it has every hour.² This ensures persistence (as we show in our proof sketch) of the ⟨key,value⟩ pair with very high probability. Generally, we also require the original publishers of a ⟨key,value⟩ pair to republish it every 24 hours. Otherwise, all ⟨key,value⟩ pairs expire 24 hours after the original publishing, in order to limit stale information in the system.

Finally, in order to sustain consistency in the publishing-searching life-cycle of a ⟨key,value⟩ pair, we require that whenever a node w observes a new node u which is closer to some of w's ⟨key,value⟩ pairs, w replicates these pairs to u without removing them from its own database.

To find a ⟨key,value⟩ pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use FIND_VALUE rather than FIND_NODE RPCs.

¹ Bucket entries and FIND replies can be augmented with round trip time estimates for use in selecting the α nodes.
² This can be optimized to require far fewer than k² messages, but a description is beyond the scope of this paper.
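The sketch below condenses the node lookup into synchronous rounds. This is our own simplification: the paper's algorithm issues the α RPCs asynchronously, may start a round before the previous one finishes, and removes non-responsive nodes from consideration; node_lookup and rpc_find_node are our names, the latter standing in for the FIND_NODE RPC.

def node_lookup(target, seed_contacts, rpc_find_node, k=20, alpha=3):
    """Return the k closest contacts to target; contacts are
    (IP address, UDP port, node ID) triples."""
    by_dist = lambda c: c[2] ^ target
    shortlist = sorted(seed_contacts, key=by_dist)[:k]
    queried = set()

    def query(contacts):
        for contact in contacts:        # issued in parallel in the paper
            queried.add(contact[2])
            for found in rpc_find_node(contact, target):
                if all(found[2] != c[2] for c in shortlist):
                    shortlist.append(found)
        shortlist.sort(key=by_dist)
        del shortlist[k:]               # keep only the k closest seen

    while True:
        unqueried = [c for c in shortlist if c[2] not in queried]
        if not unqueried:
            return shortlist            # queried the k closest seen: done
        closest_before = by_dist(shortlist[0])
        query(unqueried[:alpha])        # recursive step: alpha at a time
        if by_dist(shortlist[0]) >= closest_before:
            # Round found nothing closer: widen to all k closest unqueried.
            query([c for c in shortlist if c[2] not in queried])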

Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the ⟨key,value⟩ pair at the closest node it observed to the key that did not return the value.

Because of the unidirectionality of the topology, future searches for the same key are likely to hit cached entries before querying the closest node. During times of high popularity for a certain key, the system might end up caching it at many nodes. To avoid "over-caching," we make the expiration time of a ⟨key,value⟩ pair in any node's database exponentially inversely proportional to the number of nodes between the current node and the node whose ID is closest to the key ID.³ While simple LRU eviction would result in a similar lifetime distribution, there is no natural way of choosing the cache size, since nodes have no a priori knowledge of how many values the system will store.

Buckets will generally be kept constantly fresh, due to the traffic of requests traveling through nodes. To avoid pathological cases when no traffic exists, each node refreshes a bucket in whose range it has not performed a node lookup within an hour. Refreshing means picking a random ID in the bucket's range and performing a node search for that ID.

To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes' k-buckets as necessary.

³ This number can be inferred from the bucket structure of the current node.
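A refresh, in code, just draws a random ID from the stale bucket's distance range and looks it up. This is our own sketch: node, node.lookup, and last_lookup_in_bucket are hypothetical attributes standing for a node's lookup procedure and per-bucket bookkeeping.

import random
import time

REFRESH_INTERVAL = 3600  # one hour, per the paper

def random_id_in_bucket(my_id: int, i: int) -> int:
    """A uniformly random ID at XOR distance [2^i, 2^(i+1)) from my_id."""
    return my_id ^ random.randrange(1 << i, 1 << (i + 1))

def refresh_stale_buckets(node) -> None:
    now = time.time()
    for i, last in enumerate(node.last_lookup_in_bucket):
        if now - last > REFRESH_INTERVAL:
            node.lookup(random_id_in_bucket(node.id, i))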
3 Sketch of proof

To demonstrate proper function of our system, we need to prove that most operations take ⌈log n⌉ + c time for some small constant c, and that a ⟨key,value⟩ lookup returns a key stored in the system with overwhelming probability.

We start with some definitions. For a k-bucket covering the distance range [2^i, 2^(i+1)), define the index of the bucket to be i. Define the depth, h, of a node to be 160 − i, where i is the smallest index of a non-empty bucket. Define node y's bucket height in node x to be the index of the bucket into which x would insert y minus the index of x's least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

Our next step will be to assume the invariant that every k-bucket of every node contains at least one contact if a node exists in the appropriate range. Given this assumption, we show that the node lookup procedure is correct and takes logarithmic time. Suppose the closest node to the target ID has depth h. If none of this node's h most significant k-buckets is empty, the lookup procedure will find a node half as close (or rather whose distance is one bit shorter) in each step, and thus turn up the node in h − log k steps. If one of the node's k-buckets is empty, it could be the case that the target node resides in the range of the empty bucket. In this case, the final steps will not decrease the distance by half. However, the search will proceed exactly as though the bit in the key corresponding to the empty bucket had been flipped. Thus, the lookup algorithm will always return the closest node in h − log k steps. Moreover, once the closest node is found, the concurrency switches from α to k. The number of steps to find the remaining k − 1 closest nodes can be no more than the bucket height of the closest node in the kth-closest node, which is unlikely to be more than a constant plus log k.
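Put compactly (our own restatement of the step counts above, not the paper's notation):

\[
\underbrace{(h - \log k)}_{\text{distance-halving steps}}
\;+\; \underbrace{\log k + O(1)}_{\text{remaining } k-1 \text{ nodes}}
\;=\; h + O(1) \;\le\; \lceil \log n \rceil + c,
\]

since, as noted above, a node's depth h is within a constant of log n with overwhelming probability in an n-node system.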

To prove the correctness of the invariant, first consider the effects of bucket refreshing if the invariant holds. After being refreshed, a bucket will either contain k valid nodes or else contain every node in its range if fewer than k exist. (This follows from the correctness of the node lookup procedure.) New nodes that join will also be inserted into any buckets that are not full. Thus, the only way to violate the invariant is for there to exist k + 1 or more nodes in the range of a particular bucket, and for the k nodes actually contained in the bucket all to fail with no intervening lookups or refreshes. However, k was precisely chosen for the probability of simultaneous failure within an hour (the maximum refresh time) to be small.

In practice, the probability of failure is much smaller than the probability of k nodes leaving within an hour, as every incoming or outgoing request updates nodes' buckets. This results from the symmetry of the XOR metric, because the IDs of the nodes with which a given node communicates during an incoming or outgoing request are distributed exactly compatibly with the node's bucket ranges.

Moreover, even if the invariant does fail for a single bucket in a single node, this will only affect running time (by adding a hop to some lookups), not correctness of node lookups. For a lookup to fail, k nodes on a lookup path must each lose k nodes in the same bucket with no intervening lookups or refreshes. If the different nodes' buckets have no overlap, this happens with probability 2^(−k²). Otherwise, nodes appearing in multiple other nodes' buckets will likely have longer uptimes and thus lower probability of failure.

Now we look at a ⟨key,value⟩ pair's recovery. When a ⟨key,value⟩ pair is published, it is populated at the k nodes closest to the key. It is also re-published every hour. Since even new nodes (the least reliable) have probability 1/2 of lasting one hour, after one hour the ⟨key,value⟩ pair will still be present on one of the k nodes closest to the key with probability 1 − 2^(−k). This property is not violated by the insertion of new nodes that are close to the key, because as soon as such nodes are inserted, they contact their closest nodes in order to fill their buckets and thereby receive any nearby ⟨key,value⟩ pairs they should store. Of course, if the k closest nodes to a key fail and the ⟨key,value⟩ pair has not been cached elsewhere, Kademlia will lose the pair.

4 Discussion

The XOR-topology-based routing that we use very much resembles the first step in the routing algorithms of Pastry [1], Tapestry [2], and Plaxton's distributed search algorithm [3]. All three of these, however, run into problems when they choose to approach the target node b bits at a time (for acceleration purposes). Without the XOR topology, there is a need for an additional algorithmic structure for discovering the target within the nodes that share the same prefix but differ in the next b-bit digit. All three algorithms resolve this problem in different ways, each with its own drawbacks; they all require secondary routing tables of size O(2^b) in addition to the main tables of size O(2^b · log_(2^b) n). This increases the cost of bootstrapping and maintenance, complicates the protocols, and for Pastry and Tapestry prevents a formal analysis of correctness and consistency. Plaxton has a proof, but the system is less geared for highly faulty environments like peer-to-peer networks.

Kademlia, in contrast, can easily be optimized with a base other than 2. We configure our bucket table so as to approach the target b bits per hop. This requires having one bucket for each range of nodes at a distance [j·2^(160−(i+1)b), (j+1)·2^(160−(i+1)b)] from us, for each 0 < j < 2^b and 0 ≤ i < 160/b, which amounts to an expected total of no more than (2^b − 1)·log_(2^b) n buckets with actual entries. The implementation currently uses b = 5.
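As a quick sanity check of that bucket count (the formula is the paper's; the function name and the numbers below are our own illustration):

import math

def max_expected_buckets(n: int, b: int) -> float:
    """The paper's bound (2^b - 1) * log_{2^b}(n) on non-empty buckets."""
    return (2**b - 1) * math.log(n, 2**b)

# With the implementation's b = 5 and, say, a million nodes:
print(max_expected_buckets(10**6, 5))  # ~124 buckets with actual entries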
5 Summary

With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.

References

[1] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. Accepted for Middleware, 2001. http://research.microsoft.com/~antr/pastry/.

[2] Ben Y. Zhao, John Kubiatowicz, and Anthony Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, U.C. Berkeley, April 2001.

[3] C. Greg Plaxton, Rajmohan Rajaraman, and Andréa W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the ACM SPAA, pages 311–320, June 1997.

[4] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. Technical Report UW-CSE-01-06-02, University of Washington, Department of Computer Science and Engineering, July 2001.

[5] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM '01 Conference, San Diego, California, August 2001.
