Routing and Caching on DHTs

UNIVERSITÀ DEGLI STUDI DI PADOVA
Facoltà di Ingegneria
Corso di Laurea Magistrale in Ingegneria Informatica

Routing and Caching on DHTs

Candidate: Federica Bogo
Advisor: Prof. Enoch Peserico
Academic year 2009-2010

CONTENTS

1. Introduction
2. Routing and caching on P2P networks: an overview
   2.1 Modeling P2P traffic
   2.2 Unstructured P2P networks
       2.2.1 Routing algorithms
       2.2.2 Replication and caching approaches
   2.3 Structured P2P networks
       2.3.1 Distributed Hash Tables
       2.3.2 Routing algorithms
       2.3.3 Replication and caching approaches
   2.4 Conclusions
3. Caching effectiveness analysis for structured P2P networks
   3.1 Generalized model
       3.1.1 Mathematical background
   3.2 Routing assumptions
   3.3 Lower bound analysis
       3.3.1 The adversary model
       3.3.2 Routing table with 1-buckets
       3.3.3 Routing table with k-buckets
       3.3.4 Application to common DHTs
   3.4 A specific case: Kademlia
       3.4.1 Main features of the Kademlia network
       3.4.2 Probabilistic analysis
   3.5 Conclusions
       3.5.1 Extending the analysis
       3.5.2 Necessity of our assumptions
4. Cache values guided approach
   4.1 Network geometry
       4.1.1 The hypercube graph
       4.1.2 Hypercubic P2P networks
   4.2 A first intuitive approach
       4.2.1 Request load and routing load
       4.2.2 The forwarding value and adaptive routing concepts
       4.2.3 Routing and caching efficiency
       4.2.4 Simple comparison against Kademlia
       4.2.5 Introduction of inner nodes
   4.3 Our overlay
       4.3.1 DHT organization
       4.3.2 Joining and leaving nodes
   4.4 Our routing algorithm
       4.4.1 Lookup operations
       4.4.2 Caching mechanism
   4.5 Performance analysis
   4.6 Conclusions
5. Simulations
   5.1 Experimental setup and main results
   5.2 Results evaluation
6. Conclusions and open problems

1. INTRODUCTION

Traffic generated by P2P systems now accounts for a major portion of Internet traffic, and it is still increasing. Given the importance of this phenomenon, numerous studies addressing the design of efficient P2P overlays have been proposed in the last few years. The earliest approaches offered schemes that are nowadays labelled unstructured P2P networks: in these models, no precise control is exercised over object placement, and flooding search protocols are mostly employed. While such networks generally provide good self-organization capabilities and are easy to manage in distributed and evolving environments, they may suffer in terms of scalability and load balancing.

To address these problems, structured P2P networks (usually represented by Distributed Hash Tables, or DHTs) have been introduced. Structured networks use specialized placement algorithms to assign responsibility for each object to specific nodes, and well-defined directed search protocols to locate objects efficiently; moreover, they adopt ad hoc strategies to improve load balancing among nodes, for example by using cryptographic hashes to spread the mapping between objects and locations. However, such techniques generally rest on a problematic assumption: all the objects stored within the network are considered equally popular (that is, no object is assumed to be searched for more frequently than the others). A real environment may present a very different situation: generally, relatively few highly popular objects account for most of the requests. If this more common skewed access distribution is not adequately handled, a heavy lookup traffic load can easily arise at the peers responsible for popular objects and at the intermediary nodes that route queries to those peers.
In this way, a few individual nodes easily become overloaded, compromising the load balancing properties that DHTs aim to achieve. The phenomenon peaks with so-called hotspots: a hotspot occurs when a large number of peers wish to simultaneously access data from a very small set of nodes (or even from a single node), causing these target nodes to be swamped. Note, however, that this skewed popularity also brings some favorable opportunities: over the years, various caching and replication mechanisms have been proposed in order to substantially reduce the search cost for popular objects and to balance the workload of the whole network.

In our work, we show that, despite all the countermeasures adopted over the years to deal with heavy bias and fluctuation in object popularity, the problem of overloaded nodes can still arise in modern DHTs. Interestingly, we trace the persistence of this problem to two peculiar characteristics of structured P2P networks: first, the substantial lack of flexibility in DHT data placement and routing, which appear too strictly predetermined; second, the separation between the KBR (Key Based Routing) layer and the "storing" layer. Routing schemes seem to be totally unaware of the content cached by any intermediate node chosen as "next hop": therefore, finding the requested value within an intermediary node's cache becomes purely a matter of chance.

After this "pars destruens", in which the main flaws of the existing protocols are described, a "pars construens" is needed. The challenge here is to design a novel, alternative overlay that maintains a structured and efficient scheme but also introduces two features: a strict coupling between routing and caching (the latter must directly "guide" the former) and a randomized lookup algorithm.
In a network of N nodes, we require that our overlay guarantee, above all, the following properties:
• each node experiences a load (i.e., number of received queries) of O(log N);
• the size of each node's cache is kept O(1);
• the complexity of finding the requested value (in terms of number of hops) is equal to log N or, at most, polylog N.

Our aim is to develop a robust scheme, able to maintain these characteristics in any possible network configuration, even an unusual and highly destructive one. Such situations can arise from unexpected failures of nodes belonging to the network or, in particular, from deliberate attacks. A simple example is a Denial of Service (DoS) attack: by generating an unanticipated, massive and rapid increase in the number of requests targeting a limited and well-defined set of nodes, it causes these nodes to incur unbearably high loads, to degrade dramatically in performance and, eventually, to go down. Our P2P structure must take these anomalous conditions into account and provide adequate (and adaptive) mechanisms in order to achieve robustness and resilience. In this work, we propose and analyze two different caching algorithms for our overlay; in order to formally prove their robustness, we test them against an adaptive adversary that can choose the sender and the target of any issued query.

The rest of this work develops in greater depth the analysis briefly outlined in this Introduction. In Chapter 2 we describe the state of the art, presenting a survey of the most significant caching and replication schemes proposed so far, for structured as well as unstructured P2P networks. In Chapter 3, existing caching schemes are analyzed in order to find a lower bound on the number of cache hits that are expected to occur: the technique we employ applies to generic, simple models as well as to more sophisticated ones (e.g., Kademlia-like structures).
Chapter 4 presents our novel approach: the proposed overlay and the adopted routing and caching algorithms are explained in detail, and their main qualitative and quantitative properties are formally defined. The theoretical analysis developed in Chapter 3 is then validated through the experimental results reported in Chapter 5. Finally, in the Conclusions we sum up our contributions and suggest some open problems.

2. ROUTING AND CACHING ON P2P NETWORKS: AN OVERVIEW

In the last few years, P2P networks have rapidly become a basis for building distributed applications. Much of the attention has been focused on their construction and on search efficiency: in particular, routing algorithms have been widely studied. In most current search solutions, all peers are assumed to submit queries that uniformly search the contents stored in all nodes; unfortunately, the popularity of the contents is often quite skewed, which unbalances the workload over the whole network. Therefore, additional mechanisms such as caching and replication have been introduced.

Initially, caching and replication schemes for P2P networks were developed in order to handle bursts occurring in web traffic. Starting from the seminal work of Karger et al. [20], many caching algorithms for web environments have been proposed [39, 17]: in these cases, P2P-like structures are introduced in order to coordinate a distributed caching system and achieve load balancing for a small number of web servers. In general, less attention has been paid to caching and replication mechanisms that exclusively address P2P traffic: here, we summarize the main results achieved in this specific field.

After analyzing the skewed distribution of object popularity in greater detail, this chapter provides an overview of the most important P2P routing and caching algorithms proposed so far, for unstructured networks as well as for structured ones.
The main purpose of this brief survey is to explore the connections that exist between routing and caching, in order to understand the limitations suffered by current approaches and develop a novel strategy.
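
As a rough illustration of why skewed popularity unbalances a hashed placement, the following sketch computes the expected share of requests received by each node when objects are assigned to nodes by a cryptographic hash. It is not taken from this work: the Zipf-like popularity law with exponent 1.0, the modulo-based placement (a simplification of real DHT identifier spaces), and all function names are assumptions made only for this example.

```python
import hashlib


def responsible_node(key: str, num_nodes: int) -> int:
    """Map an object key to a node index via SHA-1 (simplified placement)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def zipf_weights(num_objects: int, exponent: float) -> list[float]:
    """Request probability of each object, ranked 1..num_objects.

    Assumed popularity law: probability proportional to 1 / rank**exponent.
    An exponent of 0.0 gives the uniform case.
    """
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_objects + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def expected_node_load(num_nodes: int, num_objects: int, exponent: float) -> list[float]:
    """Expected fraction of all requests that terminates at each node."""
    load = [0.0] * num_nodes
    weights = zipf_weights(num_objects, exponent)
    for rank, weight in enumerate(weights, start=1):
        load[responsible_node(f"object-{rank}", num_nodes)] += weight
    return load


if __name__ == "__main__":
    nodes, objects = 100, 10_000
    uniform = expected_node_load(nodes, objects, exponent=0.0)
    skewed = expected_node_load(nodes, objects, exponent=1.0)
    print(f"ideal share per node:     {1 / nodes:.4f}")
    print(f"max share, uniform case:  {max(uniform):.4f}")
    print(f"max share, Zipf-like case: {max(skewed):.4f}")
```

Hashing spreads the objects evenly, but it cannot spread the requests for a single popular object: under the assumed distribution, the node responsible for the most requested object receives at least that object's entire share of the traffic, regardless of how many nodes join the network. The sketch only counts requests at the responsible node and ignores the additional load on intermediary nodes along the routing path.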
