SkipNet: A Scalable Overlay Network with Practical Locality Properties

Nicholas J.A. Harvey∗†, John Dunagan∗, Michael B. Jones∗, Stefan Saroiu†, Marvin Theimer∗, Alec Wolman∗

Microsoft Research Technical Report MSR-TR-2002-92

∗ Microsoft Research, Microsoft Corporation, Redmond, WA 98052, {nickhar, jdunagan, mbj, theimer, alecw}@microsoft.com
† Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, {nickhar, tzoompy}@cs.washington.edu

Abstract: Scalable overlay networks such as Chord, CAN, Pastry, and Tapestry have recently emerged as flexible infrastructure for building large peer-to-peer systems. In practice, such systems have two disadvantages: They provide no control over where data is stored and no guarantee that routing paths remain within an administrative domain whenever possible. SkipNet is a scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data primarily by string names. SkipNet allows for both fine-grained and coarse-grained control over data placement: Content can be placed either on a pre-determined node or distributed uniformly across the nodes of a hierarchical naming subtree. An additional useful consequence of SkipNet’s locality properties is that partition failures, in which an entire organization disconnects from the rest of the system, can result in two disjoint, but well-connected overlay networks. Furthermore, SkipNet can efficiently re-merge these disjoint networks when the partition heals.

1 Introduction

Scalable overlay networks, such as Chord [34], CAN [28], Pastry [30], and Tapestry [40], have recently emerged as flexible infrastructure for building large peer-to-peer systems. A key function that these networks enable is a distributed hash table (DHT), which allows data to be uniformly diffused over all the participants in the peer-to-peer system.

While DHTs provide nice load balancing properties, they do so at the price of controlling where data is stored. This has at least two disadvantages: Data may be stored far from its users and it may be stored outside the administrative domain to which it belongs. This paper introduces SkipNet [14, 15], a distributed generalization of Skip Lists [26], adapted to meet the goals of peer-to-peer systems. SkipNet is a scalable overlay network that supports traditional overlay functionality as well as two locality properties that we refer to as content locality and path locality.

Content locality refers to the ability to either explicitly place data on specific overlay nodes or distribute it across nodes within a given organization. Path locality refers to the ability to guarantee that message traffic between two overlay nodes within the same organization is routed within that organization only.

Content and path locality provide a number of advantages for data retrieval, including improved availability, performance, manageability, and security. For example, nodes can store important data within their organization (content locality) and nodes will be able to reach their data through the overlay even if the organization becomes disconnected from the rest of the Internet (path locality). Storing data near the clients that use it also yields performance benefits. Placing content onto a specific overlay node (or a well-defined set of overlay nodes) enables provisioning of those nodes to reflect demand. Content placement also allows administrative control over issues such as scheduling maintenance for machines storing important data, thus improving manageability.

Content locality can improve security by allowing one to control the administrative domain in which data resides. Even when encrypted and digitally signed, data stored on an arbitrary overlay node outside the organization is susceptible to denial of service (DoS) attacks as well as traffic analysis. Although other techniques for improving the resiliency of DHTs to DoS attacks exist [3], content locality is a simple, zero-overhead technique.

Path locality provides additional security benefits to an overlay that supports content locality. Although some overlay designs [4] are likely to keep routing messages within an organization most of the time, none guarantee path locality. For example, without such a guarantee the route from explorer.ford.com to mustang.ford.com could pass through camaro.gm.com, a scenario that people at ford.com might prefer to prevent. With path locality, nodes requesting data within their organization traverse a path that never leaves the organization. This example also illustrates that path locality can be desirable even in a scenario where no content is being placed on nodes.

Controlling content placement is in direct tension with the goal of a DHT, which is to uniformly distribute data across a system in an automated fashion. A significant contribution of this paper is the concept of constrained load balancing, which is a generalization that combines these two notions: Data is uniformly distributed across a well-defined subset of the nodes in a system, such as all nodes in a single organization, all nodes residing within a given building, or all nodes residing within one or more data centers.

SkipNet supports efficient message routing between overlay nodes, content placement, path locality, and constrained load balancing. It does so by employing two separate, but related address spaces: a string name ID space as well as a numeric ID space. Node names and content identifier strings are mapped directly into the name ID space, while hashes of the node names and content identifiers are mapped into the numeric ID space. A single set of routing pointers on each overlay node enables efficient routing in either address space and a combination of routing in both address spaces provides the ability to do constrained load balancing.

A useful consequence of SkipNet’s locality properties is resiliency against a common form of Internet failure. Because SkipNet clusters nodes according to their name ID ordering, nodes within a single organization gracefully survive failures that disconnect the organization from the rest of the Internet. Furthermore, the organization’s SkipNet segment can be efficiently re-merged with the external SkipNet when connectivity is restored. In the case of uncorrelated, independent failures, SkipNet has similar resiliency to previous overlay networks [30, 34].

The basic SkipNet design, not including its enhancements to support constrained load balancing, network proximity-aware routing, reduced overhead for virtual nodes, or merge algorithms, has been concurrently and independently invented by Aspnes and Shah [1]. As described in Section 2, their work has a substantially different focus than our work and the two efforts are complementary to each other while still starting from the same underlying inspiration.

The rest of this paper is organized as follows: Section 2 describes related work, Section 3 describes SkipNet’s basic design, Section 4 discusses SkipNet’s locality properties, Section 5 presents enhancements to the basic design, Section 6 presents the ring merge algorithms, Section 7 discusses design alternatives to SkipNet, Section 8 presents a theoretical analysis of SkipNet, Section 9 presents an experimental evaluation, and Section 10 concludes the paper.

2 Related Work

A large number of peer-to-peer overlay network designs have been proposed recently, such as CAN [28], Chord [34], Freenet [6], Gnutella [12], Kademlia [23], Pastry [30], Salad [10], Tapestry [40], and Viceroy [22]. SkipNet is designed to provide the same functionality as existing peer-to-peer overlay networks, and additionally to provide improved content availability through explicit control over content placement.

One key feature provided by systems such as CAN, Chord, Pastry, and Tapestry is scalable routing performance while maintaining a scalable amount of routing state at each node. By scalable routing paths we mean that the expected number of forwarding hops between any two communicating nodes is small with respect to the total number of nodes in the system. Chord, Pastry, and Tapestry scale with $\log N$, where $N$ is the system size, while maintaining $\log N$ routing state at each overlay node. CAN scales with $D \cdot N^{1/D}$, where $D$ is a dimensionality parameter with a typical value of 6, while maintaining an amount of per-node routing state proportional to $D$.

A second key feature of these systems is that they are able to route to destination addresses that do not equal the address of any existing node. Each message is routed to the node whose address is “closest” to that specified in the destination field of a message; we interchangeably use the terms “route” and “search” to mean routing to the closest node to the specified destination. This feature enables implementation of a distributed hash table (DHT) [13], in which content is stored at an overlay node whose node ID is closest to the result of applying a collision-resistant hash function to that content’s name (i.e.

... Net, enabling, among other things, constrained load balancing.

Aspnes and Shah [1] have independently invented the same basic data structure that defines a SkipNet, which they call a Skip Graph. Beyond that, they investigate questions that are mostly orthogonal to those addressed in this paper. In particular, they describe and analyze different search and insertion algorithms and they focus on formal characterization of Skip Graph invariants. In contrast, our work is focused primarily on the content and path locality properties of the design, and we describe several extensions that are important in building a practical system: network proximity-aware routing is obtained by means of two auxiliary routing tables; constrained load balancing is supported through a combination of searches in both the string name and numeric address spaces that
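
To make the contrast between plain DHT placement and constrained load balancing concrete, the following is a minimal sketch under stated assumptions, not SkipNet's routing algorithm. It computes placements centrally over a known node list, whereas a real overlay would route to the target hop by hop. The use of SHA-1, the 32-bit ring, the ring-distance definition of "closest", the suffix test for organization membership, and all function names are illustrative assumptions. The names alpha.example.org and beta.example.org are hypothetical; the other node names come from the example in the Introduction.

```python
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the numeric ID space


def numeric_id(name: str) -> int:
    """Map a string name to a numeric ID by hashing it (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def closest_node(nodes, key):
    """Return the node whose numeric ID is nearest to hash(key) on the ring."""
    if not nodes:
        raise ValueError("no candidate nodes")
    target = numeric_id(key)

    def ring_distance(node):
        d = abs(numeric_id(node) - target)
        return min(d, RING_SIZE - d)

    return min(nodes, key=ring_distance)


def dht_place(nodes, key):
    """Plain DHT placement: any node in the system may receive the content."""
    return closest_node(nodes, key)


def constrained_place(nodes, org_suffix, key):
    """Constrained placement: hash-based choice restricted to one organization."""
    members = [n for n in nodes if n.endswith(org_suffix)]
    return closest_node(members, key)


NODES = [
    "explorer.ford.com",
    "mustang.ford.com",
    "camaro.gm.com",
    "alpha.example.org",  # hypothetical node
    "beta.example.org",   # hypothetical node
]

if __name__ == "__main__":
    for doc in ("budget.xls", "design.doc", "notes.txt"):
        print(doc, "->", dht_place(NODES, doc), "|",
              constrained_place(NODES, ".ford.com", doc))
```

In this sketch the unconstrained placement may land on any of the five nodes, while the constrained placement spreads content by hash only across the nodes whose names end in `.ford.com`, which is the behavior the Introduction describes as distributing data uniformly across a well-defined subset of nodes.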
