DistHash: A robust P2P DHT-based system for replicated objects

Ciprian Dobre*, Florin Pop*, Valentin Cristea*  *Computer Science Department, Faculty of Automatic Control and Computers Science, University POLITEHNICA of Bucharest (e-mails: {ciprian.dobre, florin.pop, valentin.cristea}@cs.pub.ro)

Abstract: Over the Internet today, computing and communications environments are significantly more complex and chaotic than classical distributed systems, lacking any centralized organization or hierarchical control. There has been much interest in emerging Peer-to-Peer (P2P) network overlays because they provide a good substrate for creating large-scale data sharing, content distribution and application-level multicast applications. In this paper we present DistHash, a P2P system designed to share large sets of replicated distributed objects in the context of large-scale, highly dynamic infrastructures. We present original solutions to achieve optimal message routing in hop-count and throughput, provide an adequate consistency approach among replicas, as well as provide a fault-tolerant substrate.

1. INTRODUCTION

Peer-to-peer (P2P) systems have been widely adopted for communication technologies, distributed system models, applications, platforms, etc. Such systems describe not a particular model, architecture or technology, but rather group a set of concepts and mechanisms for decentralized distributed computing and direct P2P information and data communication.

Currently there is much interest in P2P network overlays (distributed systems that are not under any hierarchical organization or centralized control) because they provide, to some extent, a long list of features: selection of nearby peers, redundant storage, efficient search/location of data items, data permanence or guarantees, hierarchical naming, security. P2P networks can potentially offer an efficient, self-organized, massively scalable and robust routing architecture in the context of wide-area networks, combining fault tolerance, load balancing and an explicit notion of locality.

DHT-based systems (Lua, et al., 2004) are an important class of P2P routing infrastructures. They support the rapid development of a wide variety of Internet-scale applications, ranging from distributed file and naming systems to application-layer multicast, and they enable scalable, wide-area retrieval of shared information. However, current DHT implementations face several problems. They assume that all peers equally participate in hosting published data objects or their location information, an assumption that can lead to a bottleneck at low-capacity peers. Also, in real-world P2P systems peers join the network according to loose rules, without any prior knowledge of the underlying topology.

In this paper we present an original DHT-based P2P system, DistHash, that optimally shares sets of distributed objects in highly dynamic, large-scale infrastructures. The system is based on the implementation of a DHT (distributed hash table) in which a) peers do not equally participate in hosting published data objects; b) peers can join or leave the network at any time, without prior knowledge; c) the underlying network infrastructure can be different from the adopted communication scheme.

The rest of this paper is organized as follows. In Section 2 we present related work. In Sections 3 and 4 we provide an extensive presentation of the concepts and algorithms used by the proposed system. In Section 5 we present several case scenarios and in Section 6 we conclude and present future work.

2. RELATED WORK

DHTs (Distributed Hash Tables) are decentralized distributed systems that partition the keys inserted into the hash table among the participating nodes. They usually form a structured overlay network in which each communicating node is connected to a small number of other nodes. Any DHT can easily be turned into a system offering communication services.

The Content Addressable Network (CAN) (Ratnasamy, et al., 2001) is a distributed, decentralized P2P infrastructure that provides hash-table functionality on an Internet-like scale. CAN is designed to be scalable, fault-tolerant, and self-organizing. The architectural design is a virtual multi-dimensional Cartesian coordinate space on a multi-torus. It routes messages in a d-dimensional space, where each node maintains a routing table with O(d) entries and any node can be reached in O(d*N^(1/d)) routing hops. Unlike in other solutions, the routing table does not grow with the network size, but the number of routing hops increases faster than log N. Still, CAN requires an additional maintenance protocol to periodically remap the identifier space onto nodes.

Pastry (Rowstron & Druschel, 2001) is a scalable, distributed object location and routing scheme based on a self-organizing overlay network of nodes connected to the Internet. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes.

Sharing similar properties with Pastry, Tapestry (Zhao, et al., 2004) employs decentralized randomness to achieve both load distribution and routing locality. The difference between Pastry and Tapestry lies in the handling of network locality and data object replication. Tapestry's architecture uses a variant of the distributed search technique, with additional mechanisms to provide availability, scalability, and adaptation in the presence of failures and attacks.

The Chord protocol (Stoica, et al., 2003) uses consistent hashing to assign keys to its peers. Consistent hashing is designed to let peers enter and leave the network with minimal interruption. Chord is closely related to both Pastry and Tapestry, but instead of routing towards nodes that share successively longer address prefixes with the destination, it forwards messages based on the numerical difference with the destination address. Although Chord adapts efficiently as nodes join or leave the system, unlike Pastry and Tapestry it makes no explicit effort to achieve good network locality.

The Kademlia (Maymounkov & Mazieres, 2002) P2P decentralized overlay network assigns each peer a NodeID in the 160-bit key space; (key, value) pairs are stored on peers with IDs close to the key. A NodeID-based routing algorithm is used to locate peers near a destination key. One of the key architectural elements of Kademlia is the use of a novel XOR metric for the distance between points in the key space. XOR is symmetric and allows peers to receive lookup queries from precisely the same distribution of peers contained in their routing tables. Kademlia can send a query to any peer within an interval, allowing it to select routes based on latency or to send parallel asynchronous queries. It uses a single routing algorithm throughout the process to locate peers near a particular ID.

The Viceroy (Malkhi, et al., 2002) P2P decentralized overlay network is designed to handle the discovery and location of data and resources in a dynamic butterfly fashion. Viceroy employs consistent hashing to distribute data so that it is balanced across the set of servers and resilient to servers joining and leaving the network. It utilizes the DHT to manage the distribution of data among a changing set of servers, allowing peers to contact any server in the network to locate any stored resource by name. The diameter of its overlay is better than that of CAN, and its degree is better than those of Chord, Tapestry and Pastry.

3. DESIGN OF DISTHASH

The architecture is described in Figure 1. The system is composed of peers (called Agents) and super-peers (called RAgents). All Agents store local replicas used to store distributed objects. RAgents are special peers that, in addition to local replicas, also have meta-data catalogues. A set of Agents is under the control of a central RAgent, together forming a cluster of peers. The RAgents form a complete graph of connections, and control messages are routed among RAgents. Thus, the RAgents connect together several connected clusters of Agents. Such a hierarchical interconnection topology ensures an optimum control scheme and best reflects the real-world behaviour of large Internet-scale systems. As demonstrated in (Barabasi, et al., 2000), networks as diverse as the Internet tend to organize themselves so that most peers have few links while a small number of peers have a large number of links.

Figure 1. DistHash's architecture.

In addition, we assume the existence of several lookup & discovery services (LUSes) located at different locations. These services are used to store information regarding the currently existing RAgents.

The way peers are connected into clusters is based on the geographic position of each peer, as well as on a set of condition metrics. The idea is based on our previous work (Dobre, et al., 2007), where we presented an algorithm designed to assist the peers of EVO (a P2P videoconferencing system) to dynamically detect the best reflectors to which to connect. The algorithm considers that the best reflectors are chosen based on their network and geographic location (network domain, AS domain, country, continent), and also based on their current load values, number of currently connected clients, and current network traffic.

In our architecture we likewise consider the existence of several LUSes, from which a newly entered Agent reads the current list of RAgents (network address and port on which to connect). It then tries to estimate the best RAgent to which to connect. In this case, the optimum RAgent is chosen based on the network and geographic position. But the newly entered Agent will also request the number of other Agents connected to several of the closest RAgents and will use this to compute a metric such that, in the end, the Agent connects to one of the closest RAgents having the smallest number of connected Agents. This scheme was previously introduced in the EVO P2P videoconferencing system (Legrand, et al., 2005).

The way clusters are formed follows an idea borrowed from the way PES (pending event structures) are built. As introduced in (Brown, 1988), the Calendar Queue is one of the most advanced PES structures, built upon the idea of dividing the queue into buckets. Similarly, we consider that we have a number of RAgents (each equivalent to a bucket) connecting several Agents (the length of the bucket, or bucketwidth). The determination of the number of Agents connected to a RAgent and of the number of RAgents is crucial to maintaining efficiency.
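To make this selection step concrete, the following is a minimal sketch (under our own assumptions, not the actual DistHash implementation) of how a newly entered Agent could rank the candidate RAgents returned by a LUS; the class names RAgentSelector, RAgentInfo and Location, as well as the integer proximity score, are illustrative only.

import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only -- class and field names are assumptions, not the DistHash API. */
public class RAgentSelector {

    /** Network and geographic position of a peer, as used by the selection metric. */
    public static class Location {
        String networkDomain, asDomain, country, continent;
    }

    /** Candidate RAgent as advertised by a lookup & discovery service (LUS). */
    public static class RAgentInfo {
        String host;
        int port;
        Location location = new Location();
        int connectedAgents;   // number of Agents currently connected to this RAgent
    }

    /** Smaller score = closer: same network domain, then AS domain, country, continent. */
    static int proximity(Location a, Location b) {
        if (a.networkDomain.equals(b.networkDomain)) return 0;
        if (a.asDomain.equals(b.asDomain)) return 1;
        if (a.country.equals(b.country)) return 2;
        if (a.continent.equals(b.continent)) return 3;
        return 4;
    }

    /** Choose the closest RAgent; among equally close ones, the least loaded. */
    public static RAgentInfo chooseBest(List<RAgentInfo> candidates, Location self) {
        return candidates.stream()
                .min(Comparator.comparingInt((RAgentInfo r) -> proximity(r.location, self))
                        .thenComparingInt(r -> r.connectedAgents))
                .orElse(null);
    }
}

The two-level comparison mirrors the description above: proximity (network domain, AS domain, country, continent) decides first, and the number of already connected Agents breaks ties.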

If the number of Agents is too great compared to the number of RAgents, the meta-data catalogue will grow too big and the operations on it will require a longer time to complete. On the other hand, if the number of Agents is too low compared to the number of RAgents, the number of control messages required to access information from the system will be higher. This is shown in Figure 2.

Figure 2. The clustering when there are few RAgents (left) or too many RAgents (right).

The solution is to allow the number of RAgents to increase and decrease correspondingly as the number of Agents grows (new Agents join the system) or shrinks (Agents leave the system). This is done at cluster level. Whenever the number of Agents connected to a RAgent is too high, a new RAgent is chosen from among the Agents (based on a voting algorithm) and the cluster is divided in two, by splitting the remaining Agents between the two RAgents. The meta-data catalogue is also divided between the two RAgents, and the LUS information is updated accordingly. This is best illustrated in Figure 3.

Figure 3. Forming of a new cluster.

On the other hand, whenever the number of Agents connected to a RAgent is too low, the cluster is destroyed by combining its Agents with the Agents of another cluster. The meta-data catalogue is also merged with the catalogue of the RAgent from the second cluster, and the LUS information is updated accordingly. This is best illustrated in Figure 4.

Figure 4. Merging of clusters.

The two operations involve the exchange of several control messages to update the current status of the clusters, as well as the possible reconnection of several Agents to new RAgents and of RAgents with other RAgents. In order to minimize the number of such costly operations, we take the thresholds for what too few or too many Agents mean to be the values 5 and 10,000. These values are based on the observations from (Brown, 1988).

Inside a cluster, peers connect as represented in Figure 5. Because the RAgent stores the meta-data catalogue, it must not represent a central point of failure for the entire cluster. For this reason a specially designated Agent also acts as a secondary backup replica for the RAgent. In case of failure the secondary Agent takes over from the original RAgent. All updates of the meta-data catalogue immediately propagate to the secondary RAgent replica as well.

Figure 5. Roles of peers inside a cluster.

4. DATA MANAGEMENT

As presented, each Agent maintains a copy of several replicated objects. These are Java serializable objects extending DistData (see Figure 6). The objects can represent anything, from basic data types to blobs or even data files.

Figure 6. Distributed objects.

In order to ensure data consistency among replicas we adopt a simple strategy and require that each object has an owner (an Agent that has the right to modify the current state of that particular object). The ownership, as well as the information regarding the objects currently existing inside a cluster, is maintained by the meta-data catalogue (Figure 7).
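To illustrate what such an object could look like, a minimal DistData-style base class is sketched below; the paper only specifies that objects are Java serializable objects extending DistData, so the objectId field and the BlobData example are assumptions (ownership itself is tracked in the meta-data catalogue, not in the object).

import java.io.Serializable;

/** Sketch of a replicated distributed object; only the class name DistData comes from the text,
    the field and the concrete subclass are illustrative assumptions. */
public abstract class DistData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String objectId;   // hash uniquely identifying the object in the system

    protected DistData(String objectId) {
        this.objectId = objectId;
    }

    public String getObjectId() {
        return objectId;
    }
}

/** Hypothetical concrete object: anything serializable, from basic data types to blobs or files. */
class BlobData extends DistData {
    private final byte[] payload;

    BlobData(String objectId, byte[] payload) {
        super(objectId);
        this.payload = payload;
    }

    public byte[] getPayload() {
        return payload;
    }
}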

Figure 7. Information stored in the meta-data catalogue.

The meta-data catalogue is organized in the form of a hash table, where the keys represent specific information regarding objects (such as the name of the Java class, specific access patterns, or other information the user creating a new object declares as being of interest). The values are linked lists of (key, value) mappings. The key in the linked list is an object identifier matching the particular pattern key (ID1, ID2, etc.); an object identifier is a hash uniquely identifying a particular object. Each value in turn points to a linked list of Agent identifiers (ID_Agent1, ID_Agent2, etc.). The first identifier always points to the Agent currently having ownership over that specific object.

Inside a replica, the objects are stored in a hashtable, where the key is the object identifier and the value is the object itself. In order to access a particular object, a request therefore travels to a RAgent, and from there to a particular Agent. The search operation on such a structure involves several stages, as presented in Figure 8.

Figure 8. The search operation.

First, a client initiating a search contacts an Agent (the LUSes are secured and no client has direct access to internal information regarding the list of RAgents). The Agent then initiates the search process in the name of the client: it contacts one RAgent and sends it the current search criterion. Inside a RAgent the search translates into finding the key in the meta-data catalogue corresponding to the search criterion. This operation requires M steps, where M represents the number of keys in the meta-data catalogue. This could be improved, for example, by considering separate catalogues for exact information (class names) and for search patterns. Searching for the objects corresponding to a particular class name would then require only log M operations (using the Java mapping of hash keys for the corresponding string values).

After finding the entries corresponding to the search criterion, the search involves obtaining the unique IDs of the corresponding objects, as well as the ID of the Agent owning each particular object. This requires 2P steps, where P is the number of objects matching the search criterion. Then the RAgent contacts each Agent and requests a copy of the current object having a particular ID. This involves again 2P messages being sent and received (in fact, the RAgent will not send one request for each object held by a particular Agent, but will group the requests into a single one and send it once, so the actual number of exchanged messages is smaller than 2P). An Agent retrieves a particular object corresponding to a particular ID in log L steps (a hash search), where L represents the number of objects currently held by that Agent. In the end, the data is collected and sent back to the Agent that initiated the search operation, and from there back to the client.

Among RAgents the search operation involves the following steps. A RAgent sends a request with a particular search criterion. Each RAgent responds with the list of objects matching the criterion, together with their corresponding object identifiers. The first RAgent then merges the received lists of objects together with its own list, such that, in the end, no two objects having the same identifier are returned by the search. Considering R RAgents, the total number of steps involved by the search is therefore S = R * M * (4P + log L).

The insert operation involves several steps. First, a client willing to add a new object into the system contacts an Agent. The Agent then sends the insert request to the RAgent to which it is connected. The RAgent selects two Agents that will further keep two replicas of the object stored locally. The RAgent updates the meta-data catalogue with the information regarding the new object, selects one Agent as being the owner of the object, and then sends the replicated object to the two newly selected Agents. The selection is done so as to obtain a balanced number of objects held by each Agent. This is accomplished by carefully maintaining in the memory of the RAgent a list with the number of objects held by each of the connected Agents and then selecting two Agents that previously kept few objects. This approach guarantees a balanced load of the Agents and also, as detailed later, ensures a highly fault-tolerant environment.

Figure 9. The insert operation.

Assuming an ideal equal split of the number of objects between all Agents in the system, and considering that the only search criterion is the class name of the objects, M = B / R, where B represents the number of objects kept in the entire system at a particular moment (the objects would be equally divided among clusters). The number of objects matching a particular class name would then be P = 1. Also, the number of objects kept by an Agent is given by L = (2B / R) / (N / R), meaning the total number of objects kept inside a cluster (2B / R) divided by the number of Agents in a cluster (N / R), where N represents the total number of Agents.
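The catalogue layout just described can be sketched with standard Java collections as below; the class name MetaDataCatalogue, the Entry helper and the register/search methods are our own illustrative names, and the linear scan over the keys corresponds to the M steps mentioned in the text.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Sketch of the meta-data catalogue kept by an RAgent (illustrative names). */
public class MetaDataCatalogue {

    /** One (object identifier -> Agents holding it) mapping; the first Agent is the owner. */
    public static class Entry {
        final String objectId;                                    // hash uniquely identifying the object
        final LinkedList<String> agentIds = new LinkedList<>();   // owner first, then replica holders
        Entry(String objectId) { this.objectId = objectId; }
    }

    /** Pattern key (e.g. Java class name, access pattern) -> objects matching that pattern. */
    private final Map<String, List<Entry>> table = new HashMap<>();

    public void register(String patternKey, String objectId, String ownerAgent, String replicaAgent) {
        Entry e = new Entry(objectId);
        e.agentIds.add(ownerAgent);      // first position = current owner
        e.agentIds.add(replicaAgent);
        table.computeIfAbsent(patternKey, k -> new ArrayList<>()).add(e);
    }

    /** Search as described in the text: scan the M keys, collect object IDs and their holders. */
    public List<Entry> search(String criterion) {
        List<Entry> matches = new ArrayList<>();
        for (Map.Entry<String, List<Entry>> k : table.entrySet()) {   // M steps over the keys
            if (k.getKey().contains(criterion)) {
                matches.addAll(k.getValue());
            }
        }
        return matches;
    }
}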

In the end, S = B * (4 + log(2B / N)). This means that, as the number of objects increases, the complexity of the search operation increases, but at the same time, as the number of Agents increases, the complexity of the search operation decreases.

In reality, using this approach the objects could in fact accumulate more around a single cluster. To cope with this situation the RAgent can also delegate the insert operation to other RAgents if the size of the local catalogue grows too big. This is best illustrated in Figure 9.

The search operation can also be optimized considering the following approach. A special query search message is implemented, with the semantics of returning the first object matching a particular criterion. Then, when a RAgent receives too many searches for a particular object located in another cluster, it can decide to move the object inside the local cluster. In this way, the search no longer travels through all RAgents and is served back to the client faster. Also, the read operation can be performed on any Agent having a replica of a particular object. A special request search syntax can return the address of an Agent having a particular object replica. The client can then initiate successive read operations on that particular replica, directly accessing one particular Agent. This greatly reduces the number of steps involved and offers greater flexibility to the client (the application developers using the system).

The update operation is an asynchronous operation, in which an Agent initiates an update on a particular object. The RAgent keeps a lock on the owner of the object and then, asynchronously, updates all active replicas of that object. As soon as the RAgent receives the request and performs the lock, it sends back a notification message that the update operation is in progress. This approach ensures consistency among the replicas (no two updates can happen simultaneously on different replicas, because only one owner of a particular object exists in the system).
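A rough sketch of the RAgent side of this update protocol is given below, assuming a simple lock set and an executor for the asynchronous propagation; the names UpdateCoordinator and sendReplica, as well as the single-threaded executor, are illustrative choices, not part of the described system.

import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch of the RAgent side of the update operation (illustrative names). */
public class UpdateCoordinator {

    private final Set<String> lockedObjects = ConcurrentHashMap.newKeySet();
    private final ExecutorService propagator = Executors.newSingleThreadExecutor();

    /**
     * Called when the owner Agent submits a new version of an object.
     * The lock on the owner guarantees no two updates run concurrently on different replicas.
     * Returns immediately after acknowledging that the update is in progress.
     */
    public boolean update(DistData newVersion, List<String> replicaAgents) {
        if (!lockedObjects.add(newVersion.getObjectId())) {
            return false;                          // another update on this object is still in progress
        }
        // acknowledge to the caller, then push the new version to all active replicas asynchronously
        propagator.submit(() -> {
            try {
                for (String agent : replicaAgents) {
                    sendReplica(agent, newVersion);   // network send, omitted in this sketch
                }
            } finally {
                lockedObjects.remove(newVersion.getObjectId());
            }
        });
        return true;                               // "update in progress" notification
    }

    private void sendReplica(String agentId, DistData data) {
        /* transport layer omitted in this sketch */
    }
}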

5. FAULT TOLERANCE MANAGEMENT

According to the previously described scheme, faults are eventually detected in the system. For example, when a client initiates an update operation, the RAgent could discover that a particular Agent is no longer active in the system. This is the reactive approach. We also consider a proactive fault detection approach, with Agents and RAgents periodically sending heartbeat messages among them (where the period is high enough not to introduce too much communication overhead in the system).

Whenever an Agent fails, a process of reallocating objects is initiated by the RAgent (Figure 10). First, all objects previously owned by the crashed Agent become owned by the surviving Agents still having replicas of those objects. In this manner we ensure that Agents running for a long time will eventually become owners of more and more objects, and the number of messages needed to change ownership will therefore decrease in the long term. Then, the RAgent selects one surviving Agent to be the new keeper of the secondary replica of each object previously held by the crashed Agent. The algorithm used to select this Agent considers a balancing of the number of objects among the surviving Agents at all times.

Figure 10. A crash scenario of an Agent.

The second crash scenario involves the disappearance of the RAgent itself. In this case the secondary RAgent replica takes its place (it updates the entry in the LUSes and connects to all other RAgents) and, among the remaining Agents, one is selected (based on a voting procedure) to become the new RAgent replica (see Figure 11).

Figure 11. A crash scenario of an RAgent.

Of course, the voting procedure has to be consistent, such that if two Agents observe the disappearance of the RAgent at almost the same time and both initiate the voting procedure, in the end the Agent selected to be the new RAgent replica is the same.

In the event of a transient failure (an Agent or RAgent temporarily leaving the system and then entering back at a later time), the RAgent (the new one or one from another cluster) detects the situation and informs the newly entered peer of the change. In this way, we can cope with both transient and permanent failures of nodes in the system. The system is also highly resilient to faults, as it allows a very large number of nodes to fail without losing any information (of course, if both Agents having the replicas of the same object fail at the same time the object is lost; however, if the two Agents fail at different moments of time, even if close to each other, there is a high probability that a new Agent will have already taken over the role of the first crashed Agent). The information is always rapidly replicated so as to offer a maximum degree of fault tolerance to clients.
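The reallocation step can be sketched as follows, reusing the illustrative MetaDataCatalogue from above; promoting the first surviving replica holder to owner and choosing the least-loaded Agent as the new secondary keeper follow the description in the text, while all names remain assumptions.

import java.util.List;
import java.util.Map;

/** Sketch of the RAgent reaction to a crashed Agent (illustrative names). */
public class FailureHandler {

    private final MetaDataCatalogue catalogue;
    private final Map<String, Integer> objectsPerAgent;   // load of each live Agent in the cluster

    public FailureHandler(MetaDataCatalogue catalogue, Map<String, Integer> objectsPerAgent) {
        this.catalogue = catalogue;
        this.objectsPerAgent = objectsPerAgent;
    }

    /** For every object the crashed Agent held: promote a surviving replica holder to owner
        and pick the least-loaded surviving Agent as the new keeper of the secondary replica. */
    public void onAgentCrash(String crashedAgent, List<MetaDataCatalogue.Entry> affected) {
        objectsPerAgent.remove(crashedAgent);
        for (MetaDataCatalogue.Entry e : affected) {
            e.agentIds.remove(crashedAgent);              // drop the dead peer from the entry
            if (e.agentIds.isEmpty()) continue;           // object lost only if all replicas died
            String newOwner = e.agentIds.getFirst();      // surviving replica holder becomes owner
            String newKeeper = leastLoadedAgent(newOwner);
            if (newKeeper != null) {
                e.agentIds.add(newKeeper);                // will receive a fresh secondary replica
                objectsPerAgent.merge(newKeeper, 1, Integer::sum);
            }
        }
    }

    private String leastLoadedAgent(String exclude) {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> en : objectsPerAgent.entrySet()) {
            if (!en.getKey().equals(exclude) && en.getValue() < bestLoad) {
                best = en.getKey();
                bestLoad = en.getValue();
            }
        }
        return best;
    }
}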

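For the proactive part, the heartbeat-based detection could be as simple as the following sketch; the 30-second period and the three-missed-heartbeats timeout are arbitrary assumed values (the text only requires the period to be large enough to keep the communication overhead low).

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of proactive failure detection via heartbeats (illustrative values and names). */
public class HeartbeatMonitor {

    private static final long CHECK_PERIOD_SECONDS = 30;   // assumed; chosen to keep overhead low
    private static final long TIMEOUT_MILLIS = 90_000;     // assumed: roughly 3 missed heartbeats

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final FailureListener listener;

    public interface FailureListener { void onPeerSuspected(String peerId); }

    public HeartbeatMonitor(FailureListener listener) {
        this.listener = listener;
        scheduler.scheduleAtFixedRate(this::checkPeers, CHECK_PERIOD_SECONDS,
                CHECK_PERIOD_SECONDS, TimeUnit.SECONDS);
    }

    /** Called whenever a heartbeat message arrives from an Agent or RAgent. */
    public void heartbeatReceived(String peerId) {
        lastSeen.put(peerId, System.currentTimeMillis());
    }

    private void checkPeers() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MILLIS) {
                listener.onPeerSuspected(e.getKey());     // hand over to the reallocation logic
            }
        }
    }
}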
6. CONCLUSIONS

In this paper we presented DistHash, a P2P system that emulates the functionality of a DHT. DistHash is a P2P overlay network designed to share large sets of replicated distributed objects in the context of large-scale, highly dynamic infrastructures. We presented the original solutions adopted to achieve optimal message routing in hop-count and throughput, to provide an adequate consistency approach among replicas, as well as to provide a fault-tolerant substrate.

The system provides middleware-level functionality to users needing a shared-memory-style system accessed by distributed applications running on an Internet-like large-scale infrastructure. The problems with previous approaches were their lack of scalability and of adaptation to the dynamic nature of peers (which join or leave the system at any time), and the fact that they did not consider the influence of the underlying network infrastructure.

DistHash is highly robust in the face of failures happening in various points of the system, as well as of peers dynamically entering or leaving the system. It also offers a balanced approach to the way replicated objects are kept inside the system. The operations are optimized in terms of steps required, as well as in the number of messages exchanged between peers. For that, we consider a hierarchical approach in which we organize peers in clusters. Inside each cluster the functionality is almost autonomous, so that the total number of peers does not influence the performance of the system. This means that DistHash also offers scalability to end-users.

The system is currently under implementation and we plan to test its performance against other existing solutions. We also plan to adopt several improvements in the routing, consistency, searching and fault-detection algorithms, so as to improve its performance even more. For that, we plan to also integrate DistHash with monitoring instruments and use the monitoring information to take decisions regarding the replicated objects in real time (Dobre, et al., 2007).

REFERENCES

Barabasi, A., Albert, R., Jeong, H., Bianconi, G. (2000), "Power-law distribution of the world wide web", Science, vol. 287.

Brown, R. (1988), "Calendar Queues: A fast O(1) priority queue implementation for the simulation event set problem", Comm. ACM, 31 (10), pp. 1220-1227.

Dobre, C., Voicu, R., Muraru, A., Legrand, I.C. (2007), "A Distributed Agent Based System to Control and Coordinate Large Scale Data Transfers", in Proc. of the 16th International Conference on Control Systems and Computer Science, Bucharest, Romania.

Legrand, I.C., Dobre, C., Voicu, R., Cirstoiu, C. (2005), "LISA (Local Host Information Service Agent)", in Proc. of the 15th International Conference on Control Systems and Computer Science, Bucharest, Romania, pp. 127-130.

Lua, E.K., Crowcroft, J., Pias, M., Sharma, R., Lim, S. (2004), "A survey and comparison of peer-to-peer overlay network schemes", IEEE Communications Surveys and Tutorials.

Malkhi, D., Naor, M., Ratajczak, D. (2002), "Viceroy: a scalable and dynamic emulation of the butterfly", in Proceedings of ACM PODC'02, Monterey, CA, USA, pp. 183-192.

Maymounkov, P., Mazieres, D. (2002), "Kademlia: A peer-to-peer information system based on the XOR metric", in Proceedings of IPTPS, Cambridge, MA, USA, pp. 53-65.

Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S. (2001), "A scalable content-addressable network", in Proc. ACM SIGCOMM'01, San Diego, CA.

Rowstron, A., Druschel, P. (2001), "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", in Proc. of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware).

Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H. (2003), "Chord: A scalable peer-to-peer lookup protocol for internet applications", IEEE/ACM Transactions on Networking, vol. 11, no. 1, pp. 17-32.

Zhao, B.Y., Huang, L., Stribling, J., Rhea, S.C., Joseph, A.D., Kubiatowicz, J.D. (2004), "Tapestry: A resilient global-scale overlay for service deployment", IEEE Journal on Selected Areas in Communications, vol. 22, no. 1, pp. 41-53.