Scalable Supernode Selection in Peer-To-Peer Overlay Networks∗
Total Page:16
File Type:pdf, Size:1020Kb
Scalable Supernode Selection in Peer-to-Peer Overlay Networks∗ Virginia Lo, Dayi Zhou, Yuhong Liu, Chris GauthierDickey, and Jun Li {lo|dayizhou|liuyh|chrisg|lijun} @cs.uoregon.edu Network Research Group – University of Oregon Abstract ically fulfill additional requirements such as load bal- ance, resources, access, and fault tolerance. We define a problem called the supernode selection The supernode selection problem is highly challeng- problem which has emerged across a variety of peer-to- ing because in the peer-to-peer environment, a large peer applications. Supernode selection involves selec- number of supernodes must be selected from a huge tion of a subset of the peers to serve a special role. The and dynamically changing network in which neither the supernodes must be well-dispersed throughout the peer- node characteristics nor the network topology are known to-peer overlay network, and must fulfill additional re- a priori. Thus, simple strategies such as random selec- quirements such as load balance, resource needs, adapt- tion don’t work. Supernode selection is more complex ability to churn, and heterogeneity. While similar to than classic dominating set and p-centers from graph dominating set and p-centers problems, the supernode theory, known to be NP-hard problems, because it must selection problem must meet the additional challenge respond to dynamic joins and leaves (churn), operate of operating within a huge, unknown and dynamically among potentially malicious nodes, and function in an changing network. We describe three generic super- environment that is highly heterogeneous. node selection protocols we have developed for peer-to- Supernode selection shows up in many peer-to-peer peer environments: a label-based scheme for structured and networking applications. For example, in peer- overlay networks, a distributed protocol for coordinate- to-peer file sharing systems, such as Kazaa [11] and based overlay networks, and a negotiation protocol for Gnutella [6], protocols were developed for the designa- unstructured overlays. We believe an integrated ap- tion of qualified supernodes (ultrapeers) to serve the or- proach to the supernode selection problem can benefit dinary peers for scalable content discovery. Peer-to-peer the peer-to-peer community through cross-fertilization infrastructure services require methods for the judicious of ideas and sharing of protocols. placement of monitors/landmarks/beacons throughout the physical network for purposes of measurement, po- sitioning, or routing [5, 16, 14]. In ad-hoc wireless net- 1. Introduction works, connectivity under highly dynamic conditions is achieved by identifying a subset of the nodes to serve as bridging nodes. This subset is formed using a dis- We have identified a fundamental problem, which tributed dominating set protocol such as in [22, 2, 23] so we call the supernode selection problem, which occurs that every node is within broadcast range of a bridging across a spectrum of peer-to-peer applications, includ- node. Within sensor networks, supernodes are selected ing file sharing systems, distributed hash tables (DHTs), for the purpose of data aggregation under the conditions publish/subscribe architectures, and peer-to-peer cycle that they are well-distributed among the sensors and also sharing systems. The supernode selection problem also have sufficient remaining battery life [3]. shows up in the fields of sensor networks, ad-hoc wire- less networks, and peer-based Grid computing. Super- In this paper, we demonstrate that many peer-to-peer node selection involves the selection of a subset of the and networking applications are seeking solutions to peers in a large scale, dynamic network to serve a dis- variations on the same fundamental problem. We first tinguished role. The specially selected peers must be define a general model for the supernode selection prob- well-dispersed throughout the network, and must typ- lem, describing its key requirements and challenges. We show how the supernode selection problem is related ∗Supported in part by NSF 9977524 and an NSF Graduate Re- to dominating set and p-centers problems, and describe search Fellowship how the supernode selection problem is instantiated in several well-known peer-to-peer applications. We then Supernode Distribution Criteria. The supernodes present three general purpose supernode selection proto- must be distributed throughout the peer-to-peer overlay cols we have developed for the three basic types of over- network in a topologically sensitive way to meet one or lay networks used by peer-to-peer applications: struc- more of the distribution criteria listed below. tured overlay networks, coordinate-based overlay net- Access: non-supernodes must have low latency ac- works, and unstructured overlay networks. cess to one or more supernodes. Access can be measured SOLE: a label-based supernode selection protocol in hop counts or delay. for applications built on a structured overlay network Dispersal: supernodes must be evenly distributed such as CAN [16], Chord [20] or Pastry [18]. SOLE throughout the overlay network; they should not be clus- exploits the regular node labeling schemes used within tered within only a few subregions of the overlay. structured overlays to distribute supernodes evenly Proportion: a pre-specified global ratio of super- throughout the overlay, to ensure fast access from non- nodes to non-supernodes must be maintained to meet supernodes to supernodes, and to maintain a fixed sup- application-specific performance requirements. ernode to non-supernode ratio as the overlay grows and Load balance: supernodes should not serve more shrinks dynamically. than k non-supernodes, where k can be configured lo- PoPCorn: a protocol for applications that use a cally based on the resource capability of each supernode. coordinate-based overlay network using systems such as Note that these criteria are inter-related in that speci- Vivaldi [1] or GNP [14]. This protocol distributes a fixed fication of requirements for one may impact another, or number of supernodes evenly with respect to the topol- a given application may require multiple criteria. ogy of the overlay network using a repulsion model. Peer-to-Peer Factors. The design of supernode selec- H20: an advertisement-based protocol deployable in tion protocols within a peer-to-peer environment is chal- an unstructured overlay network. This protocol can be lenging because in addition to fulfilling the distribution used to find qualified supernodes from among the peers requirements outlined above, they must also deal with such that every ordinary peer is within k hops of C sup- factors arising from the underlying nature of large scale, ernodes (multiple distance domination). highly dynamic systems. Heterogeneity: Current large-scale peer-based sys- 2. Problem Definition and Background tems consist of a large number of heterogeneous nodes with differing hardware and software resources. A node We first describe the unique requirements and chal- might not be eligible to be a supernode unless it meets lenges of the supernode selection problem in the context certain minimum qualifications. These qualifications in- of peer-to-peer systems. We then describe dominating clude resources, such as CPU power, disk or memory set and p-centers problems from graph theory that form space, or battery life; stability, such as uptime or fault the theoretical underpinnings of the supernode selection tolerance; communication, such as bandwidth or fan- problem. We conclude by surveying examples of appli- out; and safety, such as trust or security. cations from peer-to-peer computing and other network- Adaptability to churn: Peer-to-peer environments are ing domains that call for scalable supernode selection extremely dynamic. Supernode selection protocols must protocols. be able to handle churn and respond quickly, especially when supernodes leave the system. Supernode selection 2.1. The Supernode Selection Problem protocols must also be adaptive to dynamic changes in network traffic and overlay topology. We broadly define the supernode selection problem Resilience and fault tolerance: When a given super- as that of selecting some subset of the peers in a large node dies, other supernodes should quickly take over its scale peer-to-peer overlay network to take a special role, functions or a new supernode should be quickly selected. with the designated supernodes providing service to Security: Supernodes may be vulnerable to denial the non-supernodes. A potentially large and undefined of service attacks, malicious supernodes can disrupt the number of supernodes must be selected from an un- system by failing to forward messages or by giving out known, large scale, and dynamically changing overlay wrong information. network. First, we describe key topological distribution criteria that the supernodes must fulfill relative to the 2.2. Dominating sets, p-centers, and leader elec- non-supernodes. Second, we enumerate characteristics tion of peer-to-peer networks that make supernode selection difficult. A wealth of research in graph theory, location theory, and distributed computing provides a formal foundation for the supernode selection problem. These algorithms were not designed for large scale peer- The basic dominating set problem is the problem of to-peer networks that exhibit a high degree of churn and finding a minimal subset of the vertices in graph G, that are dynamically heterogeneous. called the dominator set, such that every node is either a dominator or adjacent to