Detecting and Recovering from Overlay Routing Attacks in Peer-to-Peer Distributed Hash Tables

A thesis for the degree of Master of Science in Computer Science

Keith Needels [email protected]

Department of Computer Science, Rochester Institute of Technology

February 22, 2008

Committee:

Professor James Minseok Kwon, Chair

Professor Alan Kaminsky, Reader

Professor Warren R. Carithers, Observer

Abstract

Distributed hash tables (DHTs) provide efficient and scalable lookup mechanisms for locating data in peer-to-peer (P2P) networks. A number of issues, however, prevent DHT-based P2P networks from being widely deployed. One of these issues is security. DHT protocols rely on the users of the system to cooperate for lookup requests to successfully reach the correct destination. Users who fail to run the protocol correctly can severely limit the functionality of these systems. The fully distributed nature of DHTs compounds these security issues, as any security mechanism must be implemented in a non-centralized fashion for the system to remain truly P2P.

This thesis examines the security issues facing DHT protocols, and we propose an extension to one such protocol (called Chord) to mitigate the effects of attacks on the underlying lookup message routing mechanism when a minority of nodes in the system are malicious. Our modifications require no trust to exist between nodes in the network except during the joining process. Instead, each node makes use of locally known information about the network to evaluate hops encountered during the lookup routing process for validity. Hops that are determined to be invalid are avoided. These modifications to the Chord protocol have been implemented in a simulator and evaluated in the presence of malicious nodes. We present the results of this evaluation and compare them to the results obtained when running the unmodified Chord protocol.


Table of Contents

1. Introduction
2. Peer-to-Peer Protocols
   2.1. Overlay Networks
   2.2. Napster and Gnutella
   2.3. Distributed Hash Tables
        2.3.1. Chord
        2.3.2. Pastry
        2.3.3. Content Addressable Networks (CANs)
3. Peer-to-Peer Protocol Security Issues and Related Work
   3.1. Data Attacks
   3.2. Identifier Attacks
   3.3. Routing Attacks
4. Chord Secure Routing Design
   4.1. Threat Model
   4.2. Design Overview
   4.4. The Backtracking Algorithm
   4.5. The Hop Verification Algorithm
   4.6. Maintaining Statistical Data
   4.7. Joining the Network
   4.8. Updating Finger Table Entries
5. Simulator Design
   5.1. Using the GUI Utility
        5.1.1. Experiment Setup
        5.1.2. Viewing Experiment Results
   5.2. Writing Tests in Java
6. Evaluation
   6.1. Dropped Lookup Requests
   6.2. Incorrect Random Routing
   6.3. Malicious Sub-ring Routing
        6.3.1. Effect of the Standard Deviation Parameter
        6.3.2. Effect of the Pruning Parameter
7. Conclusion
8. References


1. Introduction

The popularity of peer-to-peer (P2P) networks took off in 1999 thanks to the success of Napster [9], and P2P systems have been a hot research topic ever since. Although applications that could be considered peer-to-peer existed before Napster, it was Napster that made P2P technology known throughout the world by giving the average Internet user the ability to easily obtain music and movies for free. Research in this area has resulted in a powerful class of P2P lookup protocols called distributed hash tables. While these protocols are scalable and efficient, they suffer from security vulnerabilities that prevent them from being widely deployed in open networks.

A peer-to-peer network can be defined as a network where there are no central servers. Instead, each user of the system is both a client and a server, and is referred to as a peer. Peers connect directly to each other to transfer data. In a true peer-to-peer network, peers also locate data without using a central server, without using any kind of hierarchical organization, and without making some peers more important than others. There is no single point of failure. If a peer fails, other peers can continue to use the system. If a single peer is using all of its bandwidth, other peers are not affected.

While illegal file sharing is perhaps the most well known application of peer-to-peer networks, there are many useful legal and ethical uses for these decentralized systems. These uses include overlay multicast, data backup, distributed file systems, distributed databases, instant messaging, DNS, and so on. Many large scale distributed systems that can benefit from an architecture where there is no single point of failure can make use of P2P technology.

Unfortunately for Napster, it was not a true peer-to-peer network, since file lookups were handled by a central server. This directory server was an easy legal target, and Napster was shut down. In the wake of Napster’s demise, many new peer-to-peer systems were developed. These systems were fully decentralized, but most of them relied on flooding techniques to locate peers containing desired data, which is inefficient and not guaranteed to find sought-after files. To solve these problems, yet another class of P2P systems was developed: distributed hash tables (DHTs). DHTs are fully decentralized systems that locate data efficiently, and they are today a popular research topic in the field of distributed systems. A few of these systems will be discussed in Section 2.

The fundamental purpose of a DHT is to find the peer (also called a node) responsible for a resource, given a key for that resource. Since it is not practical for a node to keep track of every node in the network, each node should only be responsible for keeping track of a small subset of other nodes. Finding the node responsible for a key is done by forwarding lookup requests through a structured overlay network. The nodes that a particular node keeps references to are the links that node has in the overlay. The number of hops needed to complete a lookup should also be small.

While DHTs are much more efficient and scalable than the flooding systems that were popular immediately after the fall of Napster, serious issues prevent them from being deployed in large, open networks, one of which is security. With DHTs, we must rely on other peers to correctly forward our lookup requests in order to find the peer responsible for a key. Unlike physical network routing, with overlay routing in an open DHT system, anybody can become a router.

An individual attacking a DHT has many types of attacks available to them. Attackers can modify, drop, and misroute lookup requests. They can take responsibility for certain data in order to deny its availability or provide modified data to other nodes. They can forward incorrect overlay routing table updates. The list goes on and on. DHTs, in their original forms, are an easy target for attackers. Large open P2P networks that use DHTs cannot exist until the fundamental security issues of the underlying DHT protocols are addressed.

The purpose of this thesis is to examine security issues facing DHT protocols and present an extension to the Chord DHT protocol to mitigate some of these available attacks. Our goal is to allow Chord to make use of readily available information that is obtained through the normal operation of the protocol to evaluate the lookup routing process and respond to attacks. This is done in a fully distributed fashion. We assume that a node trusts no node besides itself, except the bootstrap nodes that it uses to join the network. Our results show that our proposed extension to Chord is able to correctly complete lookups over 90% of the time with up to half of the network consisting of compromised nodes performing naïve attacks, and our extension offers significant improvement over the base protocol in the face of sophisticated attacks.

In order to understand the security issues that DHT protocols face, we first must understand P2P/DHT protocols, which is the goal of Section 2. In Section 3, we survey the vulnerabilities that are present in DHT protocols and the solutions that have been proposed. In Section 4, we detail our extension to the Chord protocol to avoid some of these attacks. We give an overview of the architecture of the software simulator used for testing these proposed changes in Section 5, and we evaluate these changes in the simulator in Section 6. We summarize and conclude in Section 7.


2. Peer-to-Peer Protocols

The purpose of this section is to provide background information on Peer-to-Peer networks. It is necessary to understand how these protocols work in order to understand the security issues involved with them. Although several protocols from the last century could be considered P2P (such as Usenet), we have chosen to focus on P2P protocols starting with Napster, which was released in 1999. This is when P2P protocols really became a popular research topic. While a few basic security issues will be laid out in this section, Section 3 will contain a detailed overview of these issues and research that has been performed.

2.1. Overlay Networks

A peer-to-peer network is a type of overlay network, meaning that nodes in the network are not linked together physically but are instead connected with virtual links over the underlying physical network (normally, the Internet). Peers typically communicate with the TCP protocol, and a link in a P2P network can be thought of as a TCP connection over the physical network and not a direct physical link.

One major difference between an overlay network and a physical network is that overlay networks can be organized in any way desired by the designer. Physical links are generally restricted only to machines that are within close proximity to each other with a physical connection (such as an Ethernet connection) between them. It is not possible to create and change physical links between nodes on the fly. It is, however, very easy to create an overlay link between two nodes simply by initiating a TCP connection.

Peer-to-peer protocols attempt to create an overlay network with a structure that allows any node to find other nodes responsible for desired data quickly. These protocols must define how the overlay network is to be organized, how to handle nodes that join and leave the network, how to route lookup requests through the network, and so on. To limit overhead, we only want each node to have a small number of links in the overlay network, but at the same time we want lookups to occur quickly. We also want to do this in a fully distributed manner where there are no single points of failure and no nodes that are more important than others in the operation of the system.

There have been many proposed peer-to-peer protocols. These protocols take various approaches to organizing an overlay network. Each has its own strengths and weaknesses, and some sacrifice being fully distributed for various reasons. The next few sections will give brief overviews of five of these protocols: Napster [9], Gnutella [7], CAN [10], Chord [13], and Pastry [11]. We will focus on Chord since it has been chosen as the target protocol of this thesis.


2.2. Napster and Gnutella

Although Napster [9] was not the first protocol to make use of decentralized, distributed resources, it is the protocol that made P2P technology famous by making the multimedia files of many individuals accessible to the world. While the files shared on Napster were fully distributed, the directory of those files was not. The Napster overlay network consisted of a single file directory server with every Napster peer connected directly to it and only to it. When users joined the network, they sent this directory server a list of all files they were sharing. When a user wanted to find a file, they sent their search phrase to the directory server, which returned a list of peers that had the file. Peers then connected directly to one another to transfer files.

The use of a central directory server means that Napster is not considered to be a “true” Peer-to-Peer system. This single point of failure eventually did fail when legal attacks forced it down, and the Napster network ceased to exist. After Napster failed, developers looked for a way to fully distribute the file lookup mechanism to prevent the network from being vulnerable to a single point of failure. One of the more well known protocols developed was called Gnutella.

Gnutella did away with the central directory server that Napster contained. Instead, each node was a directory server for its own files. Nodes in the overlay were connected to one another in a more or less arbitrary fashion. To perform a search, a node would send its search request to each node it was connected to in the overlay. Each node receiving the lookup request would then send the request to every node it was connected to, and so on. A node that contained the desired data would inform the searching node. This method is referred to as flooding. To keep a lookup request from flooding indefinitely, each lookup request had associated with it a time-to-live counter that each node would decrement before flooding the request to its neighbors. When this time-to-live counter reached zero, it would no longer be flooded.

There are several disadvantages to the flooding approach taken by the original Gnutella protocol. First, it is a very inefficient method of searching. A search request might result in thousands of messages being sent to and from thousands of nodes. Another side effect is that if a file exists in the system, there is no guarantee that it will be found. If the file resides on a node outside of the time-to-live radius, it will never be discovered. This means Gnutella is fundamentally not scalable. As the network grows, the search capability of a node does not necessarily grow, as search is limited by the time-to-live counter. Later releases of Gnutella were able to address these limitations to some extent by using a hierarchy of regular nodes and “supernodes.” The regular nodes would communicate only with supernodes, and supernodes would communicate with each other.
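The TTL-limited flooding just described can be sketched in a few lines. The topology, search predicate, and TTL values below are hypothetical, chosen to illustrate the point above: a file hosted outside the time-to-live radius is never found.

```python
def flood(graph, start, is_hit, ttl):
    """Gnutella-style flood: each round, every frontier node forwards the
    query to all of its neighbors; the TTL bounds the number of rounds."""
    hits, frontier, seen = [], {start}, {start}
    for _ in range(ttl):
        next_frontier = set()
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
                    if is_hit(neighbor):
                        hits.append(neighbor)
        frontier = next_frontier
    return hits
```

On a simple line topology {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}, a query started at node 0 for data held by node 4 fails with a TTL of 2 but succeeds with a TTL of 4, regardless of how many copies of the query are sent.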

Gnutella’s initial scalability issues fueled even more research into peer-to-peer networks. This research resulted in what are known as distributed hash table (DHT) protocols. These protocols use structured overlay networks, unlike Gnutella, which allowed the network to be arranged in any way imaginable. Location of desired data is guaranteed, and within a bounded number of hops.


2.3. Distributed Hash Tables

Unlike flooding protocols such as Gnutella, Distributed Hash Tables are structured. This means that nodes that want to participate cannot create arbitrary links in the overlay network. Each DHT protocol defines how nodes should form connections in the overlay, how those nodes should deal with new nodes joining, and how they should deal with nodes leaving and failing. More importantly, the protocols also define how lookup messages should be forwarded through the system. Three popular DHTs that have received a great deal of research attention include CAN [10], Chord [13], and Pastry [11]. Although DHTs can vary greatly from one to another, there are a few things that almost all of them share in common.

The entire purpose of a DHT, like any other P2P system, is to locate the node(s) responsible for a desired data item. Each data item has associated with it a key, which is known beforehand, such as a file name. Each key is associated with a node or a group of replica nodes that are responsible for maintaining the desired data or a reference to where the desired data might be found. Unlike Napster and Gnutella, most DHTs only support exact key searches and do not support keyword searches; supporting keyword search is a popular open research topic. All functionality other than locating the node responsible for a key, such as actually retrieving the resource being sought, is the responsibility of higher layers of the peer-to-peer application.

Each node in the DHT network is responsible for storing a subset of the overall key-data pair set. Each node maintains overlay links to a small number of nodes in the network for the purpose of routing lookup requests. Lookups are performed by forwarding the lookup request through the overlay network as defined by the DHT protocol. Typically, the number of links maintained by each node is O(log n) and the number of hops for a lookup request to complete is also O(log n), where n is the number of nodes in the system.

Nodes and keys in DHTs are both mapped to identifiers, usually by a hash function with well-distributed output. In Chord and Pastry, an identifier is simply an integer, and a hash function such as SHA-256 can be used to hash nodes and data keywords to their identifiers. In CAN, identifiers are points in a multi-dimensional coordinate system.

Identifiers are used to determine which node is responsible for which key. In Chord, a key is stored on the first node whose identifier is equal to or follows the key’s identifier on the ring. In Pastry, a key is stored on the node with the identifier numerically closest to the key’s identifier. In CAN, a key is stored on the node whose “zone” (centered on the node’s identifier coordinates) contains that key’s identifier coordinates.

For routing lookup requests, each node contains a routing table that contains some small subset of the nodes in the system. These routing tables are used to forward a lookup request progressively closer towards its destination. With Chord, for example, a lookup request is forwarded to the node in the routing table with the identifier that most closely precedes the key’s identifier.


While these are common aspects of most DHTs, the details of each protocol vary, and so do the routing table size and average lookup hop count. Chord and Pastry, for example, have a routing table size of O(log n), and lookup requests take O(log n) hops to reach their destination. CAN, on the other hand, has a constant-sized routing table, but lookup requests take O(n^(1/d)) hops, where d is a constant system parameter.

The next three subsections will provide more detail on Chord, Pastry, and CANs, with most of the attention focused on Chord since it is the target system of this thesis.

2.3.1. Chord

Chord’s identifiers are integers. The identifier for a key is obtained by hashing that key with a hash function, shared by all nodes in the system, that returns integers of some bit length m. This can be any well-distributed hash function; SHA-1 is used in the original Chord paper, giving m = 160. A node is assigned an identifier by hashing its IP address. Nodes and keys are then arranged on an identifier ring modulo 2^m. Each key’s value is stored on the first node with an identifier equal to or following that key’s identifier in the clockwise direction around the ring.
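Identifier assignment can be sketched as follows. This is a minimal sketch, not the thesis implementation: m = 10 is a toy value matching the small ring used in the figures, while SHA-1 with m = 160 matches the original paper, and the sample inputs are hypothetical.

```python
import hashlib

M = 10  # toy identifier bit length; the original Chord paper uses m = 160

def chord_id(value: str, m: int = M) -> int:
    """Map a key or a node's IP address onto the identifier ring [0, 2^m)
    by truncating a SHA-1 digest to m bits."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)
```

For example, chord_id("198.51.100.4") would place a node on the toy ring, and chord_id("some-file.mp3") would place a key; every node computes the same mapping because the hash function is shared.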

In order to find the nodes responsible for keys, each node has to store some routing information in a table. In Chord, this routing table is called a “finger table.” The finger table for a node with identifier id contains m entries, numbered from 0 to m-1. Finger table entry i stores the first node whose identifier succeeds id + 2^i (mod 2^m) in the clockwise direction. It is possible (and often probable) to have duplicate entries in the finger table. Figure 2.1 shows a sample finger table with an illustration of how the finger table is derived for a node with identifier 770. Node 770’s last finger table entry should be the node that succeeds 770 + 2^9. This node is Node 275, so a reference to Node 275 is stored in the last entry of Node 770’s finger table. The rest of the finger table entries are filled in by the same process for i = 0 through 8. Note that the first several finger table entries all point to the same node. This is because those entries’ targets are all very close to node 770, falling between 770 and 788.


Figure 2.1. An illustration of a sample finger table.
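The construction behind Figure 2.1 can be sketched as follows. The ring membership below is hypothetical except for the nodes named in the text (770, 788, and 275), chosen so that a 10-bit ring reproduces the entries described above.

```python
def successor(ident: int, nodes: list[int], m: int = 10) -> int:
    """First node identifier equal to or following ident, clockwise."""
    ring = 2 ** m
    ident %= ring
    ordered = sorted(nodes)
    for n in ordered:
        if n >= ident:
            return n
    return ordered[0]  # wrapped past zero

def finger_table(node_id: int, nodes: list[int], m: int = 10) -> list[int]:
    """Entry i holds the successor of node_id + 2^i (mod 2^m)."""
    return [successor(node_id + 2 ** i, nodes, m) for i in range(m)]
```

With the hypothetical ring [32, 275, 678, 770, 788, 950], finger_table(770, ...) yields Node 788 for the first five entries (their targets, 771 through 786, all fall between 770 and 788) and Node 275 for the last entry (the successor of 770 + 512 mod 1024 = 258), matching the figure's description.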

As Figure 2.1 illustrates, each node only has information about a subset of the nodes in the overall system. As the system grows much larger, the number of unique nodes in each node’s finger table becomes a smaller fraction of the overall number of nodes. The size of the finger table has been shown in [13] to be O(log n), where n is the number of nodes in the system. The advantage of the finger table is that when performing a lookup we can jump about half of the remaining distance between the node doing the routing and the node responsible for the key. This divide-and-conquer approach to routing lookup requests has been shown in [13] to use O(log n) hops per route. The algorithm for routing a lookup request from a node is simple: forward the request to the last finger table entry that precedes the identifier of the key. The node preceding the destination node will detect that the key falls between itself and its successor and return information about its successor to the node performing the lookup. Figure 2.2 shows an example of the route a lookup request might take through a Chord network. In this figure, Node 770 is performing a lookup request for Key 665, which it finds stored at Node 678.


Figure 2.2. An example of the route taken by a lookup in a Chord network.
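The routing rule can be sketched as below. The ring membership is hypothetical, so the intermediate hops need not match the figure, but the lookup for Key 665 starting at Node 770 again terminates at Node 678.

```python
def successor(ident, nodes, m=10):
    """First node identifier equal to or following ident on the ring."""
    ordered = sorted(nodes)
    ident %= 2 ** m
    return next((n for n in ordered if n >= ident), ordered[0])

def between(x, a, b):
    """True if x lies in the clockwise half-open ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

def route(start, key, nodes, m=10):
    """Trace a lookup for `key` from `start`: at each hop, forward to the
    last finger table entry that precedes the key's identifier."""
    ring = 2 ** m
    key %= ring
    path, current = [start], start
    while True:
        succ = successor((current + 1) % ring, nodes, m)
        if between(key, current, succ):   # succ is responsible for the key
            path.append(succ)
            return path
        nxt = succ                        # fall back to immediate successor
        for i in range(m):
            f = successor((current + 2 ** i) % ring, nodes, m)
            if between(f, current, (key - 1) % ring) and \
               (f - current) % ring > (nxt - current) % ring:
                nxt = f
        path.append(nxt)
        current = nxt
```

On the hypothetical ring [32, 275, 678, 770, 788, 950], route(770, 665, ...) returns [770, 275, 678]: the first hop wraps past zero to Node 275 (the last finger preceding the key), which then recognizes that Key 665 falls between itself and its successor, Node 678.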

For a new node to join a Chord network, it needs to know of any one node that is already in the network; finding such a node is done out of band. The joining node uses this node as a bootstrap node to perform a lookup on its own identifier. The node returned by this lookup will be the new node’s successor in the Chord ring. The new node then notifies its successor that it is that node’s new predecessor, and the successor informs its former predecessor that the newly joined node is now its successor. The joining node then uses its successor to perform the lookups needed to fill in its finger table. Since nodes join and leave continuously, each node must periodically re-perform these lookups, using a method called fix_fingers(), in order to keep its finger table up to date.
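The splice performed at join time can be sketched with a doubly linked ring. This is a simplified sketch: the naive successor walk below stands in for a real finger-table lookup, and finger table maintenance is omitted.

```python
class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor = self      # a lone node is its own neighbor
        self.predecessor = self

def find_successor(start: Node, key: int) -> Node:
    """Naive walk around the ring (a real node would use its fingers)."""
    n = start
    while True:
        s = n.successor
        if n.id < s.id:
            found = n.id < key <= s.id
        else:                      # interval wraps past zero (or single node)
            found = key > n.id or key <= s.id
        if found:
            return s
        n = s

def join(new: Node, bootstrap: Node) -> None:
    """Look up our own identifier through a bootstrap node, then splice
    into the ring between that successor and its former predecessor."""
    succ = find_successor(bootstrap, new.id)
    pred = succ.predecessor
    new.successor, new.predecessor = succ, pred
    pred.successor = new
    succ.predecessor = new
```

For example, starting from a ring of nodes 100, 500, and 900, a node with identifier 300 that joins through any bootstrap ends up linked between 100 and 500.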

2.3.2. Pastry

Pastry is in many respects similar to Chord. As with Chord, nodes and keys are hashed to integer identifiers that are placed on a ring and range between zero and the maximum hash output value. The identifier length used in [11] is 128 bits. Unlike Chord, the node responsible for a key is the node whose identifier is numerically closest to the key’s identifier.


Nodes keep track of identifiers in base 2^b, where b is a configurable parameter that usually has a value of 4. The first major difference between Pastry and Chord is the organization of each node’s routing table. A Pastry routing table is arranged into log_(2^b) N rows, where N is the number of nodes in the system. Each row has 2^b - 1 possible column entries. The entry at row i and column j is any node whose identifier shares its first i digits with the routing node’s identifier and has j as the next digit.

Each node also keeps track of additional nodes in a leaf set and a neighbor set. The leaf set contains L nodes, where L is another configuration parameter, and consists of the L nodes with identifiers numerically closest to this node’s: the first L/2 precede this node on the ring and the last L/2 succeed it. The neighbor set consists of the “physically closest” nodes to this node (as opposed to numerically), where closeness is defined by a proximity metric such as ping time.

When routing a lookup request, the first place a node looks is its leaf set. If the sought-after identifier lies between the identifier of the first node in the leaf set and that of the last node in the leaf set, then the node responsible for the key is known and the lookup request can be forwarded to its destination. If the identifier is not in range of the leaf set, the node uses its routing table instead. The lookup request is forwarded to a routing table entry whose identifier shares a prefix with the key’s identifier that is at least one digit longer than the prefix the current node shares with it. If no such node exists, the lookup request is simply forwarded to the node closest to the destination from among all nodes in the routing table, leaf set, and neighbor set. The expected number of hops for a lookup to complete is log_(2^b) N.

To join the network, a node sends a lookup request for its own identifier through a bootstrap node. The joining node creates its routing table by copying, from each node along this route, the routing table row that node used to forward the request. Broken routing table entries are filled in reactively as the missing entries are detected.

The peers that a node can keep in its routing table are flexible in Pastry. Each routing table entry can be any node that meets the prefix requirement. This can be used to exploit locality, and each node can attempt to fill its routing table with the entries that offer the best performance. This is in contrast to Chord, where there is only one correct entry for each routing table row.

2.3.3. Content Addressable Networks (CANs)

A CAN is quite different from Chord and Pastry. In a CAN, node and key identifiers are points on a d-torus in a d-dimensional space, where d is a configuration parameter. Each node is responsible for a “zone,” which is a bounded area of the overall CAN space surrounding the node’s identifier point. All keys that hash to points within a node’s zone are the responsibility of that node.


For routing, each node needs to know only about the nodes with bordering zones, which is a fixed, small number of nodes. Nodes forward lookup requests to any node whose zone is closer to the destination. In many cases, this might be the node whose coordinates make the most progress toward the destination, but when routing a node may take into consideration a tradeoff between distance gained and locality. The number of hops required to reach the destination is O(n^(1/d)). Since d is fixed for any one CAN, this means that as network size increases, the number of hops on a CAN route increases faster than it does with Chord or Pastry.
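The greedy forwarding step can be sketched as follows. The coordinates are hypothetical points on the unit 2-torus, and the locality tradeoff mentioned above is ignored: the sketch always picks the neighbor closest to the destination.

```python
import math

def torus_distance(p, q, size=1.0):
    """Euclidean distance where every coordinate wraps (a d-torus)."""
    return math.sqrt(sum(min(abs(a - b), size - abs(a - b)) ** 2
                         for a, b in zip(p, q)))

def next_hop(neighbors, dest):
    """Greedy CAN forwarding: hand the request to whichever bordering
    zone's node is closest to the destination point."""
    return min(neighbors, key=lambda n: torus_distance(n, dest))
```

Note the effect of the torus: a node at (0.05, 0.1) is only 0.15 away from a destination at (0.9, 0.1) because the first coordinate wraps, so it beats a neighbor at (0.3, 0.1) that looks closer on a flat plane.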


3. Peer-to-Peer Protocol Security Issues and Related Work

Distributed hash tables face many security issues that Napster and Gnutella did not. Since the Napster directory was centrally managed, all of the security mechanisms for performing lookups needed to exist in only one place. That central management, however, proved to be Napster’s fatal weakness. With Gnutella, the lack of network structure meant that a node only needed to create overlay links to as many nodes as possible to be reasonably assured that its flooded lookup requests would propagate correctly through the system.

DHTs are both fully distributed and structured. Users must rely on other nodes in the system to follow the structured protocol correctly for the system to work. In the physical networks that make up the Internet, routers are controlled by trusted corporations and other entities that are unlikely to attack the overall system. With an overlay network, on the other hand, end users control the virtual routers in the system. The relatively small size of an overlay network and the ability of a user to control multiple nodes in the overlay give attackers a great opportunity to compromise the system.

Survey papers have been written that enumerate the attacks available to a malicious user of a DHT system [12, 14]. These papers provide useful information that should be considered by anyone trying to secure a DHT. This section reviews those attacks and presents work that has attempted to address them. We will also discuss how those approaches motivate, and differ from, the approach taken in this thesis.

3.1. Data Attacks

A simple attack that can be performed on DHT protocols is an attack on the data stored in the system. An attacker can deny the existence of data that his or her nodes are responsible for and can modify any legitimate data those nodes store. The attacker can also introduce compromised data into the system.

Data integrity is an application level security issue. The sole purpose of a DHT protocol is, given a key, to find the node responsible for that key. The behavior of that node after it is found is not the responsibility of the lookup protocol used to find it. However, DHT protocols can help in the response to these attacks once they are detected by associating multiple nodes with each key, a technique known as replication. Replication is needed not only in case of an attack, but also in case of node failures.

Many DHT protocols include replication features. In Chord, replicas are stored on the set of nodes that immediately succeed the node that the protocol specifies should store a given key. That way, if the responsible node fails, the node that becomes responsible for that key already holds the associated data at the time of the failure. Pastry stores replicas on the nodes closest to the responsible root node on the ring. CAN uses multiple hash functions to generate multiple identifiers for each key, which results in a random distribution of replica identifiers throughout the network.

From a security standpoint, it can be argued that the replication approach taken by CAN is the most resistant to attacks, as [14] does. If replicas are all stored in a cluster of contiguous nodes, a malicious node in the area could potentially deny access to the entire replica set during the lookup process. By spreading the replica nodes out over the system with multiple hash functions (or by simply hashing the key multiple times with the same function), we can reduce the likelihood of such an attack succeeding. This type of replication can easily be adapted to both Chord and Pastry, although doing so results in more overhead.
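The "hash the key multiple times" variant mentioned above can be sketched in a few lines; the scheme of salting the key with a replica index is one possible realization, not the thesis's design.

```python
import hashlib

def replica_ids(key: str, replicas: int, m: int = 160) -> list[int]:
    """Derive several well-spread identifiers for one key by hashing the
    key together with each replica index."""
    ids = []
    for i in range(replicas):
        digest = hashlib.sha1(f"{key}#{i}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (2 ** m))
    return ids
```

Because each salted hash is effectively independent, the resulting identifiers land at unrelated points on the ring, so a single attacker region cannot cover the whole replica set.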

Since an attacker can deny the existence of data that he or she should be responsible for, when performing a lookup we need to check multiple replicas to be sure that the data really does not exist. Looking up multiple replicas in systems using multiple hash functions can be done in parallel so that the process does not add significant waiting time to the lookup process. Verifying the integrity of the received data is, again, outside the scope of a DHT protocol and outside of the scope of this thesis.

3.2. Identifier Attacks

If an attacker can position the nodes he controls in the network in such a way as to control all replica nodes for a data item, then replication may be rendered ineffective. This type of attack is possible when nodes are allowed to choose their own identifiers. An attacker can simply compute the identifiers of all of the replicas of a key and create nodes with those identifiers. An attacker can also place itself in strategic positions in order to force a victim to use the attacker’s routers for all routing table entries.

Any truly secure DHT protocol cannot allow nodes to choose their own identifiers. Identifiers must be assigned in a secure and verifiable fashion. We also cannot allow a node to simply keep generating new identifiers quickly, as this would, given enough time, let it obtain identifiers near the keys it wishes to attack.

One simple solution might be to force nodes to use the hash of their IP address as their identifier. This allows other nodes to easily verify the legitimacy of a node’s identifier and to ignore messages from nodes that are not using the correct identifier. However, in some cases an attacker may have a large range of IP addresses at his disposal, especially if IPv6 is being used. In this case, the attacker could hash IP addresses until he finds one that is close to an identifier he seeks and then use that IP address. Even when this is not the case, multiple users may be running nodes behind a NAT router and thus share the same IP address. We can hash the port that the P2P application is running on as well, but since users can choose their ports, this gives an attacker more available identifiers. In Chord, a node’s identifier is the hash of its IP address, port, and virtual node number. Since each user can run many virtual nodes, this gives an attacker access to an even wider array of available identifiers.
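The size of the attacker's search space is easy to see in a sketch. The exact serialization hashed here (and the example address) is an assumption for illustration; the point is only that ports and virtual node numbers multiply the identifiers one host can legitimately claim.

```python
import hashlib

def node_id(ip: str, port: int, vnode: int, m: int = 160) -> int:
    """Chord-style identifier from IP address, port, and virtual node
    number (the "ip:port:vnode" serialization is a hypothetical choice)."""
    digest = hashlib.sha1(f"{ip}:{port}:{vnode}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def identifiers_for_host(ip: str, ports: range, vnodes: int) -> set[int]:
    """Every identifier a single host can claim across its port and
    virtual node choices."""
    return {node_id(ip, p, v) for p in ports for v in range(vnodes)}
```

A single host free to pick among 100 ports and 10 virtual node numbers can already mint 1,000 distinct identifiers, which is what makes hashing candidates until one lands near a target key feasible.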


The allocation of IP address blocks is, technically, centrally managed by ICANN, so any application using IP addresses over the Internet can never be truly fully distributed. Working solutions to the identifier assignment problem can be achieved if we are willing to accept another centralized concept: certificate authorities (CAs).

A certificate authority can take a public key from a user and bind it to a random identifier chosen by the certificate authority. Nodes can verify the authenticity of other nodes' identifiers by checking the CA's signature. This has the added benefit of providing a public key infrastructure that can be used for exchanging messages between peers. The disadvantage is, of course, that a CA is a single point of trust and a single point of failure.
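As a rough illustration of the CA scheme (not an implementation from this thesis), the sketch below binds a public key to a CA-chosen random identifier. An HMAC under a CA-held secret stands in for a real asymmetric signature, purely to keep the example self-contained; all names are ours:

```python
import hashlib
import hmac
import os

# Secret held only by the hypothetical CA. A real deployment would use
# an asymmetric keypair so nodes can verify without any shared secret.
CA_SECRET = os.urandom(32)

def issue_certificate(public_key: bytes) -> dict:
    """Bind the node's public key to a random, CA-chosen identifier."""
    identifier = int.from_bytes(os.urandom(20), "big")  # 160-bit id
    payload = public_key + identifier.to_bytes(20, "big")
    sig = hmac.new(CA_SECRET, payload, hashlib.sha256).hexdigest()
    return {"public_key": public_key, "id": identifier, "sig": sig}

def verify_certificate(cert: dict) -> bool:
    """Check that the (public key, identifier) binding is CA-signed."""
    payload = cert["public_key"] + cert["id"].to_bytes(20, "big")
    expected = hmac.new(CA_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["sig"])
```

Because the identifier is chosen randomly by the CA, an attacker cannot steer his nodes toward a chosen key, and forging a binding requires breaking the signature.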

An attack related to the identifier attack is the Sybil attack [4]. A Sybil attack is an attack where a single attacker joins a peer-to-peer network with numerous identities, giving that attacker control of a large portion of the network. If an attacker gains control of a large enough portion of the network, the redundancy features used to recover denied or corrupted data can be rendered ineffective. An attacker who controls a large enough fraction of the network will control almost all of the data in the overall system. The attacker will also control most of the routers in the system, and can disrupt lookup requests travelling through the network.

This attack can occur when a system does not take measures to associate distinct entities with distinct identities. We would like each entity to be associated with a maximum of some small, constant number of identities. In a perfect world, we would also like identity assignment and verification to be performed in a completely distributed way, such as with a web of trust. Unfortunately, [4] shows that no system that uses a fully distributed identity verification method will be completely invulnerable to a Sybil attack.

While some papers have put forward distributed solutions that prevent Sybil attacks to a certain extent (for example, [2, 3]), the only solution that works completely is to use a central certificate authority. Trusted certificate authorities are proposed for DHTs by [1, 12, 14]. Since it may be unreasonable to expect the trusted authority to verify real-world identities, [1] proposes charging a fee for each certificate to limit the number of identities an attacker is willing to obtain. Another proposal is to force nodes to solve computational puzzles, an idea rejected by both [1] and [4]: puzzles must be easy enough for the slowest machines to solve in a reasonable amount of time, yet hard enough to prevent an attacker with large resources from obtaining many certificates quickly, which is not a reasonable combination of requirements.

In this thesis, we are not proposing a defense against Sybil attacks. Our defense is for a system with some minority fraction of the nodes compromised. Our main goal is to prevent routing attacks in a system where an attacker (or group of attackers) manages to compromise a subset of the legitimate nodes in the system. We will assume some Sybil attack defense mechanism is in place, such as a certificate authority charging money for certificates. The centralized nature of the certificate authority is unfortunate, but unavoidable as shown by [4].

3.3. Routing Attacks

Building on section 3.2., we will now assume that malicious nodes cannot choose their location in the overlay network and that an attacker cannot completely overwhelm the network by creating an unlimited number of identities. Even with these security mechanisms in place, an attacker controlling even a small fraction of randomly placed nodes can seriously disrupt the system. While these nodes can compromise the data they are supposed to be storing, replication allows us to find a good alternate node provided there are enough replicas. The real problem posed by the attacking nodes is their ability to compromise the DHT lookup routing protocol.

There are two ways to route through a DHT-based peer-to-peer network: recursively and iteratively. Recursive routing means that a lookup request is sent from hop to hop through the overlay network until it reaches its destination, which can then respond either directly to the node performing the lookup or by sending a response backwards along the lookup's path. Iterative routing means that the node performing a lookup contacts each node on the route one by one and asks for the next hop towards the destination. The disadvantage of iterative routing is that we must send a query to and receive a response from every node on the route, so lookups take about twice as long as they do with the recursive method when the destination responds directly to the node performing the lookup. The advantage is that iterative routing gives the node performing a lookup complete control over the routing process.

Both recursive and iterative routing can be compromised if a malicious node is encountered on the path to a lookup's destination. A malicious node can drop the lookup request, forward it to the wrong node, or respond with the wrong destination. With iterative routing, we are also vulnerable to an attack where malicious nodes keep sending us from one incorrect malicious node to another indefinitely, without ever reaching the destination. With recursive routing, this indefinite routing attack is treated the same way as a dropped packet: the lookup request was sent out and no response was received. It is important to note that since all DHTs must be fault tolerant, they all must deal with dropped lookup requests to some extent, as drops occur occasionally through non-malicious node failures. A malicious node may choose not to behave the same way at all times or when handling lookup requests from different parts of the system. While a normal node that has failed would be removed from the network, a malicious node can behave just well enough to remain in the network and then drop all lookup requests that it receives.

A lookup request needs to reach only one malicious node before the lookup is compromised. If the average hop count is h and the fraction of malicious nodes is f, then the probability of a route not containing any malicious nodes is (1-f)^h [14]. In Chord, the average hop count in an n-node network is approximately (1/2) log2(n). With a 1,000 node network, this means we can expect an average hop count of around 5. If 25% of nodes in the system are compromised, the probability of a lookup request avoiding every malicious node is 0.75^5, which is about 24%. So in this case, an attacker only needs to control 25% of nodes to disrupt 76% of lookups.
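The arithmetic above can be checked with a few lines of Python (the function name is ours):

```python
import math

def clean_route_probability(n: int, f: float) -> float:
    """Probability that a lookup avoids all malicious nodes, using the
    Chord approximation h = (1/2) * log2(n) hops and (1 - f)^h [14]."""
    h = 0.5 * math.log2(n)
    return (1 - f) ** h

# 1,000 nodes, 25% malicious: about 5 hops, roughly 24% of lookups
# traverse only honest nodes.
p = clean_route_probability(1000, 0.25)
```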

The effects of a routing attack may be exacerbated in systems that do not have a constrained routing table. A constrained routing table is a table where each entry has only one possible correct value. Chord has a constrained routing table: for a node with identifier n, the only correct entry for finger table entry i is the node that succeeds the value n + 2^i (mod 2^m) on the identifier ring. Pastry, on the other hand, does not have a constrained routing table. For a particular routing table entry in Pastry, any node that meets the prefix requirements is valid. Pastry tries to fill these entries with the matching nodes that have the best locality measurement in order to optimize performance. This can allow attackers to fake locality and increase the odds that their nodes are used as routing table entries by others, as shown by [1]. [1] also shows that it is easier in Pastry for an attacker to supply malicious nodes as routing table updates, especially for the top rows, since it is likely that an attacker controlling any significant fraction of the system will control at least one node for each short prefix.

Aside from the routing tables being constrained, routing table entry selection should be constrained as well, as pointed out in [12]. Otherwise, a malicious node can simply route using only the malicious nodes that appear in its constrained routing table. CAN, for example, allows each node to decide which node to route to next based on a tradeoff between progress towards the destination and round-trip time to the next hop. Since next hop selection is not constrained, we cannot verify that our lookup request is being routed correctly.

There are several design principles proposed by [12] for securing DHT protocols. The first of these is to define verifiable system invariants and verify them, and the second is to allow the node performing a lookup to observe lookup progress. The idea here is to use constrained, iterative routing. We should verify that those constraints are being met as we are routing. This is one of the major principles behind the proposals that we are making in this thesis. We propose here a way for verifying system invariants and for reacting to situations in which those invariants are not met.

One solution for avoiding routing attacks is proposed in [1]. This is a solution that works with the Pastry DHT. The goal is to successfully retrieve a set of replicas for a given key, where the replicas are a subset of the neighbor set for the root node responsible for a key. This is a contiguous set of nodes. A node performing a lookup will use its own neighbor set to compute the average numerical distance between node identifiers in the identifier ring. This value is then compared to the average distance between node identifiers in the replica node set that is returned from a lookup request. If the average distance between identifiers in the replica set is too large compared to our own computed average then it is determined that a malicious replica set was received.


If a node performing a lookup determines that a replica set is malicious, numerous lookup requests are then sent through the node’s neighbor set. These neighbor nodes will use a separate, constrained routing table to route the lookup requests through the network. The original Pastry protocol does not use constrained routing tables, so a separate table is kept. Since each node’s constrained routing table is different and not based on performance metrics, when a lookup request is sent through different neighbors it should take a diverse set of routes towards the destination. The set of replica sets received in response is combined, and all of these nodes are contacted and asked to provide their neighbor set. Any new nodes found are then asked to provide their neighbor set as well, and as long as new neighbor nodes are provided, this process is repeated up to three times. When this is completed, the closest nodes found to the key’s identifier are determined to be the correct replica set. This method was shown to find the correct replica set over 99.9% of the time when up to 30% of the nodes in a 100,000 node system are compromised.

Our proposed system also makes use of the average node identifier distance in the network, but it differs in that we use this information to actively avoid routing attacks during the routing process instead of reacting to them after the lookup request has completed. We use the average node identifier distance to verify system invariants while observing the lookup process, the two principles proposed by [12]. These ideas are explicitly rejected for Pastry by [1], which claims that this approach would add too many extra hops and not be very accurate. Our results with Chord, however, show a significant increase in routing success over the base Chord protocol. [1] relies on performing a large number of parallel lookups for the same identifier to find the responsible replica set, while our system does not.

A mechanism for defending against a “Byzantine join attack” is proposed by [5]. This proposed system, called S-Chord, modifies Chord routing tables to make use of swarms instead of individual peers. Swarms consist of the set of all nodes whose identifiers fall within (C ln n) / n of the swarm's point location, where C is a configurable constant and n is the number of nodes in the system. Each lookup request is forwarded from all of the nodes in one swarm to the next. If the number of Byzantine peers joining in a time period is below some configurable threshold, then the correct successor swarm for a key is found with high probability. This defense mechanism, however, requires each node to keep track of O(log^2 n) nodes, and each lookup request consists of O(log^2 n) messages.

Another mechanism for secure routing with Pastry is proposed by [8]. The idea is to move untrusted nodes into separate Pastry rings that interface with the main Pastry ring via two anchor nodes. Messages traveling through the main Pastry ring can then bypass the untrusted rings. The mechanism for deciding which nodes should be trusted and which should not is left as future work. With the ability to perfectly detect untrustworthy nodes, the probability of a lookup request completing successfully equals the fraction of nodes in the system that are trustworthy. An actual trust system is not provided by [8], however, and creating one remains an open problem. Our proposed modifications to Chord do not rely on any trust system.


4. Chord Secure Routing Design

This section describes the changes we are making to the Chord protocol in order to avoid routing attacks.

4.1. Threat Model

In order to design a defense, we first need to understand the attacks we are defending against. The purpose of this thesis is to propose a method of avoiding routing attacks. We will assume that attackers cannot choose their node identifiers; this can be achieved by using a certificate authority, as shown in Section 3.2. We will assume that some fraction of the overall set of nodes is compromised and that all of these nodes can collude with each other. We will also assume that the attacking nodes are a minority of the nodes in the system. This is an important assumption: we are not designing a defense against Sybil attacks.

We assume an attacker has the following basic capabilities:

• Attackers can drop lookup requests.
• Attackers can forward lookup requests to incorrect nodes.
• Attackers can direct lookup requests to other malicious nodes in any manner they wish.
• Attackers can be selective about which lookup requests they respond to correctly and which they do not.

In short, an attacker can do anything it wants with a lookup request that is sent to one of its nodes.

To evaluate the performance of this system, we will test against three different types of attacks. These attacks are designed to represent the most effective lines of attack available to a malicious node.

• Dropping Lookup Requests. This is a simple type of attack. When a malicious node performing this type of attack receives a lookup request, the node simply does not respond. The system must be designed to recover from nodes that drop lookup requests.

• Randomly Misrouting Lookup Requests. With this type of attack, the attacker does not drop the lookup request but instead sends the victim to some random next hop. This may be another misrouting node that sends the victim off in yet another random direction, preventing the victim from ever reaching the destination. This is more difficult to defend against than the lookup-dropping attack, since the attacking node appears to cooperate by providing a next hop.

• Performing a Sub-ring Attack. In this type of attack, a group of colluding attackers tries to cause lookup requests to end up at malicious nodes. Each attacker keeps two finger tables: the correct finger table that it uses for itself, and a second table consisting of the first malicious node succeeding each entry of its correct finger table. When an attacker receives a lookup request, it forwards the request using the malicious finger table. The lookup request is therefore “captured” by the attackers: it will only be forwarded through malicious nodes and will reach a malicious destination, which will often not be the correct destination. This attack is the most difficult to detect since the attackers appear to cooperate and route correctly. Each hop makes progress towards the destination, but the ultimate destination will always be a compromised node.

4.2. Design Overview

The main idea behind the proposed system is to use locally known statistical data about the average numerical difference between consecutive node identifiers to detect routing attacks during the routing process and to recover from detected attacks. We store the identifiers of the successors and predecessors of the nodes in our finger table for the purpose of computing the average numerical distance between node identifiers. We use a pruning mechanism to remove distance samples that are likely the result of malicious nodes in our finger table. As we route towards our destination, we use the computed average distance to determine whether the hops we encounter are likely valid or invalid based on their distances from routing table reference points called finger pointers. When an invalid hop is detected, we backtrack to the previous node on the route and request a next hop that makes less progress towards the destination, in an attempt to avoid the node that provided us with the invalid hop.
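The following hypothetical sketch shows one way the locally known statistics could be computed from identifier gaps; the exact pruning rule our system uses is described later, so the one-pass outlier cut below is only an illustrative placeholder:

```python
import statistics

def identifier_gaps(sorted_ids, m_bits):
    """Circular gaps between consecutive identifiers on a 2^m ring."""
    ring = 2 ** m_bits
    n = len(sorted_ids)
    return [(sorted_ids[(i + 1) % n] - sorted_ids[i]) % ring
            for i in range(n)]

def gap_statistics(samples, prune_factor=3.0):
    """Mean and population stdev of gap samples after pruning samples
    more than prune_factor standard deviations above the mean.
    (Placeholder rule; the thesis's pruning parameter is separate.)"""
    mean = statistics.mean(samples)
    sd = statistics.pstdev(samples)
    kept = [s for s in samples if s <= mean + prune_factor * sd]
    return statistics.mean(kept), statistics.pstdev(kept)
```

Pruning matters because a single malicious (or simply unlucky) sample with a huge gap would otherwise inflate both the mean and the standard deviation, loosening every later verification check.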

In order to control the routing process, we will use iterative routing as opposed to recursive routing. A recursive lookup occurs when a node performing a lookup sends the request out into the network and lets the other nodes forward it towards its destination. This gives the user no control over the route their lookup request takes. With iterative routing, the node performing the lookup contacts each node on the route towards the destination one by one and requests the next hop. This gives the node performing the lookup control over the routing process.

Finger pointers are the identifier values that a node looks up in order to fill in a finger table entry. For a node with identifier id and finger table row i, the value of the finger pointer is id + 2^i (mod 2^m). Since a finger table entry's pointer falls somewhere between two consecutive nodes, the distance between the pointer and the identifier of the node in the entry should be less than the distance between those two nodes. We compare this distance to the average distance between nodes, which we compute from our own finger table. If the distance is too large, the hop fails verification. We backtrack around nodes that either do not respond to our requests or provide a hop that fails verification.

4.4. The Backtracking Algorithm

Normally, the hop we take during the Chord routing process is the hop that most closely precedes the identifier of the key we are seeking. In our system, when a faulty node is detected during the routing process, we fall back to the previous node on the route and use its next closest preceding node to the destination. This offers less progress, but gives us a way to route around the faulty node. If a node runs out of hops to give because it has no more nodes in its routing table that precede the destination, we fall back to the previous node on the route and use its routing table, repeating the process.

All nodes that are determined to be faulty or malicious, or that have run out of hops to give, are stored in a temporary “black list” created for each lookup request, so that these nodes are never used again during that lookup. To avoid querying the same node multiple times, we request the entire finger table of a node the first time it appears on the path towards the destination and cache it for the duration of the lookup attempt.

There is a limitation in the Chord protocol that we must address. With Chord, every lookup request must be routed through the node that immediately precedes the node responsible for the key being sought. This means that if the node preceding the destination node is faulty, backtracking by itself cannot find a way around this node. To address this, each node is made aware of the identifier of the predecessor of every node in its finger table, and this identifier is stored as an extra column in the finger table. As long as we can find a non-faulty node with the destination node in its finger table, we can identify that node as the destination node by verifying that the key identifier we are seeking falls between that node’s identifier and its predecessor’s identifier. This allows us to bypass faulty nodes immediately preceding the destination. Since nodes often have many finger table entries for nearby identifiers, as we get closer to the destination we have a good chance of being able to find a node with the correct destination in its finger table, allowing us to bypass faulty nodes preceding the destination.
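The predecessor check described above amounts to a circular interval test. A minimal Python sketch (function names are ours) might look like:

```python
def in_ring_range(x, a, b, m_bits=160):
    """True if x lies in the half-open ring interval (a, b] of a
    2^m identifier ring, handling wrap-around at zero."""
    ring = 2 ** m_bits
    return (x - a) % ring != 0 and (x - a) % ring <= (b - a) % ring

def is_destination(key, node_id, predecessor_id):
    """A node is the destination for `key` when the key falls between
    its predecessor's identifier (exclusive) and its own (inclusive),
    which is exactly the check used to identify a destination from a
    finger table entry and its stored predecessor column."""
    return in_ring_range(key, predecessor_id, node_id)
```

With the predecessor stored alongside each finger table entry, any honest node whose table contains the destination lets us run this test locally, without asking the (possibly faulty) node immediately preceding the destination.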

We need to be careful about the circumstances under which we bypass nodes on the route to the destination. If we request a node's finger table and see that it contains a reference to what appears to be the destination node, we might be tempted to use this reference and bypass the rest of the routing process. The problem with bypassing this way is that nodes further from the destination are more likely to have out-of-date successor information: the destination may have changed while that node has not yet called its fix_fingers() method. Therefore, we only use bypassing as a last resort. We do not immediately bypass the rest of the nodes on a route when a node on the route knows of the destination. Instead, we bypass only if that node has run out of any other hops that we can use.

We illustrate the concept of backtracking and bypassing in Figure 4.1. In this situation, the first and second hops have passed verification. The node reached during the second hop has a reference to the destination node in its finger table, which we have verified by checking that the identifier of the key we are looking for falls between that node and its predecessor. We do not immediately bypass, and instead route to the closest preceding node (hop 3). Hop 3 provides a next hop that fails verification, so we fall back and ask for the second closest preceding node (hop 4). Hop 4 also provides a next hop that fails, so we fall back again and try the third closest preceding node (hop 5), which again provides a next hop that fails verification. This time when we fall back, the node we are using has no more preceding hops, so we now bypass and reach the destination with hop 6.

Figure 4.1. An illustration of backtracking and bypassing.

The modified closest_preceding_node() algorithm is shown in Figure 4.2. Since we are not always returning the closest preceding node, we have renamed this algorithm next_hop(). This function takes four input variables. The input variable id is the identifier being sought. The input variable index specifies which preceding node we want to obtain from the finger table: an index value of 1 means we want the closest preceding node, a value of 2 means the second closest preceding node, and so on. The index variable will have a value greater than 1 when we are trying to route around the node that was the closest preceding node because it either provided a hop that failed verification or ran out of preceding hops that we could use. The input variable nodeid is the identifier of the node that we are obtaining the next hop from, and the input variable fingertable is the finger table of that node.

The local variable uc in next_hop(), which is short for “unique count,” is the number of unique preceding nodes that have been counted while looking for the index-th closest preceding node.

The first thing the next_hop() algorithm does is check whether the identifier we are looking for falls between the node we are using and its successor. If it does, and the value of index is 1, then we can return the successor of the node we are using, which is the destination node. If index is greater than 1, it means we are trying to find a preceding node further back than this node's successor, which is impossible, so we return null.


n.next_hop(id, index, nodeid, fingertable):
    uc = 0
    if id in range (nodeid, fingertable[1]]:
        if index == 1:
            return fingertable[1]
        else:
            return null
    bypassnode = null
    for i = m down to 1:
        if id in range (fingertable[i].predecessor, fingertable[i]]:
            bypassnode = fingertable[i]
        if fingertable[i] in range (nodeid, id]:
            if i == m or fingertable[i] != fingertable[i+1]:
                uc = uc + 1
                if uc == index:
                    return fingertable[i]
    return bypassnode

Figure 4.2: Revised closest preceding node algorithm.
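For concreteness, here is an illustrative Python port of the Figure 4.2 pseudocode over a toy 6-bit ring (the node identifiers below follow the classic Chord example network; the data layout and names are our own, not part of the thesis's simulator):

```python
from collections import namedtuple

# fingers[0] is unused so that fingers[i] matches finger table entry i,
# whose pointer value is (id + 2^i) mod 2^m as in the text.
Finger = namedtuple("Finger", ["id", "predecessor"])
M_BITS = 6
RING = 2 ** M_BITS

def in_range(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (x - a) % RING != 0 and (x - a) % RING <= (b - a) % RING

def next_hop(key, index, node_id, fingers):
    """Return the index-th unique preceding node for key, the
    destination if key falls between this node and its successor,
    or the bypass candidate / None when no usable hop remains."""
    uc = 0  # unique preceding nodes counted so far
    if in_range(key, node_id, fingers[1].id):
        return fingers[1] if index == 1 else None
    bypassnode = None
    for i in range(M_BITS, 0, -1):
        f = fingers[i]
        if in_range(key, f.predecessor, f.id):
            bypassnode = f  # looks like the destination; keep as last resort
        if in_range(f.id, node_id, key):
            if i == M_BITS or f.id != fingers[i + 1].id:
                uc += 1
                if uc == index:
                    return f
    return bypassnode
```

For node 8 in the classic ring {1, 8, 14, 21, 32, 38, 42, 48, 51, 56}, entries 1-3 all point at node 14, so backtracking with a higher index skips the duplicates, and when the preceding entries are exhausted the function falls through to the bypass reference.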

The next step is to go through the finger table, from the last entry down to the first, just as in the unmodified closest_preceding_node() algorithm described in [13]. If we look at a finger table entry and determine that the identifier we are looking for falls between it and its predecessor, we save a reference to that node so it can be returned as the destination later, but only if no other valid next hop is found. As soon as we find a node that precedes the identifier we are looking for, we start increasing our unique counter (uc). After that, we only increase the unique counter when a finger table entry is different from the finger table entry that follows it. Once the unique counter equals the index we are looking for, we return the finger table entry that we were checking.

If we get all the way through our finger table and determine that we do not have enough entries for the requested index, then we simply return null. A return value of null indicates to the caller that this node has no more next hops to give. A return value that precedes the requested identifier indicates to the caller that the returned node is the next hop to the destination. A return value that succeeds the requested identifier indicates to the caller that the returned node is the destination that is responsible for the key being sought.

The next algorithm, shown in Figure 4.3, is the main routing algorithm: a modified version of find_successor(). This algorithm makes use of the next_hop() algorithm already described and of a verify_hop() algorithm that will be described later.


Our find_successor() algorithm keeps track of a stack of Chord nodes that are used during the routing process, called the routing stack. We use the word “router” to describe a node whose finger table is being used for routing purposes. This stack contains the identifier, finger table, and current backtracking index of every router on it; we show this as a stack of <id, fingertable, index> tuples that represent these three pieces of information. We also keep track of a blacklist, which is a list of routers that have either given us hops that failed verification or that have run out of nodes in their finger tables to give us as next hops. We also keep track of the number of hops taken so far in the variable attempts, and we allow the user to specify a limit on the maximum number of hops that will be used. The first thing pushed onto the routing stack is the routing table information for the node that is performing the lookup.

n.find_successor(id, hoplim):
    routerStates = new STACK of <id, fingertable, index> tuples
    blackList = new LIST of node identifiers
    attempts = 0

    routerStates.push(<n.identifier, n.fingertable, 0>)

    while !routerStates.isEmpty() and attempts < hoplim:
        curRouterState = routerStates.pop()
        curRouterState.index++
        nextHop = next_hop(id, curRouterState.index,
                           curRouterState.id, curRouterState.fingertable)
        if nextHop != null and ((verify_hop(curRouterState.id, nextHop.id,
                curRouterState.index) and !routerStates.contains(nextHop))
                or curRouterState.id == n.id):
            if id in range (curRouterState.id, nextHop.id]:
                return nextHop
            else if blackList.contains(nextHop.id):
                routerStates.push(curRouterState)
            else:
                routerStates.push(curRouterState)
                routerStates.push(<nextHop.id, nextHop.fingertable, 0>)
                attempts++
        else:
            blackList.add(curRouterState.id)

    return null

Figure 4.3. The algorithm for finding the successor node of an identifier.

Our main routing loop runs as long as there are routers still available on the routing stack and we have not exceeded the maximum number of hops. During each iteration, we pop the top router off the stack, increment its backtrack index, and perform a next hop lookup with its information. The first time a router is popped off the stack, its index value will be set to 1, meaning our next_hop() algorithm will use the closest preceding node in that router's finger table. If it is ever popped off the stack a second time, that value will be set to 2, meaning next_hop() will return the second unique preceding node in that router's finger table. This is how backtracking to route around problem nodes is performed. Once we have performed a next hop lookup, we make sure the lookup actually returned a next hop, use verify_hop() to check that the hop is valid, and check that the next hop is not already on the routing stack. If it passes these tests (and we allow our own node to always pass these tests), then we perform another test. If the returned node succeeds the identifier we are seeking, then it is the destination node, and we return it. If the returned node is on the blacklist, then we simply push the router we were using back onto the stack; during the next iteration its index will be increased and an earlier finger table entry will be used. If the next hop is not the destination and is not on the blacklist, then we contact that node and request its finger table. We then add this next hop to the routing stack with an initial index of 0.

If a hop fails verification, then we add the router that provided the hop to the blacklist. We do nothing to the routing stack. The previous router on the path will be used during the next iteration with a higher index value, so we will use a finger table entry that offers less progress.

If our router stack ever empties or we exceed the maximum number of hops, the lookup has failed and we return null to indicate that this has happened.

Figure 4.4. A sample route using the modified algorithm

Figure 4.4 shows an example of how routing works in the modified system. In this example, the dark nodes are behaving and the white nodes are compromised. The source node and the node responsible for the key being sought are both labeled. The first hop uses the closest preceding node in the source node's finger table. This node then provides a reference to a next hop that fails the verification algorithm. Our algorithm then goes back to the source node's finger table and calls next_hop() again with an index value of 2, which returns a reference to the second closest preceding node, shown as hop 2. The rest of the routing process completes without incident and the destination node is found.

4.5. The Hop Verification Algorithm

As you will recall, a finger table is a table with m entries, where m is the length in bits of the identifiers used in the network. Entry number i in the finger table points to the first node that succeeds the value (id + 2^i) mod 2^m; we call this the pointer value for entry i. We know that the pointer value points to a spot on the Chord ring that falls between the finger table entry and its predecessor. In order to verify that a hop is legitimate, we verify that the distance between a finger table pointer and the identifier of the node for that entry falls within the typical range we would expect between nodes in the Chord ring. This typical value is computed by averaging locally known distance values obtained from the data stored in our finger table, as described in the next subsection. The verify_hop() algorithm is shown in Figure 4.5.

n.verify_hop(firstNodeId, secondNodeId, indexUsed)
    fingerPointer = (firstNodeId + pow(2, indexUsed)) % pow(2, m)
    distance = (secondNodeId - fingerPointer) % pow(2, m)
    acceptableDistance = AVG_DISTANCE + (SD_MOD * STD_DISTANCE)
    if (distance > acceptableDistance):
        return false
    else:
        return true

Figure 4.5. The verify_hop() algorithm

The input parameters are the identifier of the node that provided the hop, the identifier of the node at the end of the hop, and the index we used to look up this hop in the routing table of the providing node. The distance variable is the clockwise difference between the identifier of the node at the end of the hop and the finger pointer used to point to it from the providing node. If this value is higher than the value we are willing to accept, we reject the hop; otherwise we accept it.

The acceptable distance is computed using three values. AVG_DISTANCE is the average distance between nodes, computed from locally known distance samples. STD_DISTANCE is the standard deviation of those samples. SD_MOD is a system parameter that controls how many standard deviations over the average we are willing to allow a node/finger pointer distance to be; it provides a way to balance false positives against false negatives. The acceptable distance is, as shown, the average distance plus the standard deviation scaled by SD_MOD.
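A concrete Java rendering of the check in Figure 4.5 may make the arithmetic clearer. Since Chord identifiers are m-bit values, BigInteger is used; AVG_DISTANCE, STD_DISTANCE, and SD_MOD would normally come from the node's locally maintained statistics, but in this sketch they are simply passed in as parameters.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Minimal sketch of the verify_hop() check from Figure 4.5. The class and
// parameter names are illustrative, not the simulator's actual API.
public class HopVerifier {
    public static boolean verifyHop(BigInteger firstNodeId, BigInteger secondNodeId,
                                    int indexUsed, int m,
                                    BigInteger avgDistance, BigInteger stdDistance,
                                    double sdMod) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(m);               // 2^m
        // fingerPointer = (firstNodeId + 2^indexUsed) mod 2^m
        BigInteger fingerPointer =
                firstNodeId.add(BigInteger.ONE.shiftLeft(indexUsed)).mod(ringSize);
        // clockwise distance from the pointer to the hop's endpoint
        BigInteger distance = secondNodeId.subtract(fingerPointer).mod(ringSize);
        // acceptableDistance = AVG_DISTANCE + SD_MOD * STD_DISTANCE
        BigInteger acceptable = avgDistance.add(
                new BigDecimal(stdDistance).multiply(BigDecimal.valueOf(sdMod))
                                           .toBigInteger());
        return distance.compareTo(acceptable) <= 0;
    }
}
```

Note that BigInteger.mod always returns a non-negative result, which handles hops that wrap around the top of the identifier space.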


By forcing the next hop to fall within an acceptable distance of the routing node’s finger table pointer value, we are tightly restricting where in the Chord ring the next hop may be. If an attacker does not have control of any nodes in that acceptable area, the attacker cannot fool us into using another attacking node as the next hop. Since nodes cannot arbitrarily place themselves wherever they wish in the Chord ring, it becomes much more difficult for a malicious node to have our lookup request forwarded to any node except the correct next node on the route.

4.6. Maintaining Statistical Data

In order to compute the average distance between nodes in the system, we will store some extra information about each node in our finger table. Figure 4.6 shows what a row in our finger table looks like.

The index, node identifier, and node remote reference columns appear in the normal Chord protocol's finger table. We also store the identifier of the node's predecessor and the identifiers of the nodes in its successor list. Knowing these identifiers allows us to generate samples for computing the average distance between nodes and the standard deviation of those distances. All of this data is obtained from a node when we call our fix_fingers() method to update a finger table entry.

Index | Node Identifier | Node Remote Reference | Node Predecessor Identifier | Node Successor List Identifiers

Figure 4.6. Row format for the modified finger table.

It is possible for a node to lie to us about the identifiers of its predecessor and successors if we do not take extra precautions. However, since we are operating under the assumption that nodes are not allowed to decide their own identifiers and that identifiers are verifiable, we can make sure that the provided identifiers are valid by storing some extra information. For example, if a certificate authority is being used, we can store signed node certificates, which contain a node's identifier, IP address, public key, and a signature of the granting authority. We also need a mechanism to prevent a node from using certificates of nodes that are not currently in the network. This can be done by requiring that nodes occasionally request the current date and time, signed with the responder's private key, from their predecessor and successors. This data can be requested by nodes when they update their finger tables, and the signature can be verified with the responding node's public key, allowing us to confirm that the node has recently been in the network. In the worst case, we can at least ping each of these nodes occasionally to make sure they really are in the network.

Even with these protection mechanisms in place, the attacker can still provide real node certificates for nodes that are not actually its predecessor and successor. This would cause the victim to calculate an average node distance that is higher than the actual


average that should have been calculated. To prevent this type of attack, we prune our data set.

The distances between node IDs on the Chord ring are exponentially distributed, as they are for any ring-based DHT with randomly assigned node IDs [1]. A useful property of the exponential distribution is that its mean equals its standard deviation. This means that the average distance between nodes and the standard deviation of these distances should be approximately equal. Since malicious nodes can provide successor and predecessor identifiers that are not consecutive to their own, we would expect the distances derived from those malicious nodes to be greater than the true distances, which would cause our computed standard deviation to exceed our computed average.
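The mean-equals-standard-deviation property is easy to check numerically. The sketch below places random identifiers on a ring, measures the gaps between consecutive identifiers, and compares their mean and standard deviation; the ring size, node count, and seed are arbitrary choices for illustration and are not taken from the simulator.

```java
import java.util.Arrays;
import java.util.Random;

// Quick numerical check of the claim above: for randomly placed node IDs on
// a ring, the consecutive gaps are approximately exponentially distributed,
// so their mean and standard deviation come out roughly equal.
public class GapStats {
    /** Returns {mean, standardDeviation} of the ring gaps between sorted ids. */
    public static double[] meanAndStd(long[] sortedIds, long ringSize) {
        int n = sortedIds.length;
        double[] gaps = new double[n];
        for (int i = 0; i < n; i++) {
            long next = sortedIds[(i + 1) % n];
            gaps[i] = Math.floorMod(next - sortedIds[i], ringSize); // wraps around
        }
        double mean = 0;
        for (double g : gaps) mean += g;
        mean /= n;
        double var = 0;
        for (double g : gaps) var += (g - mean) * (g - mean);
        var /= n;
        return new double[]{mean, Math.sqrt(var)};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 10000;
        long ringSize = 1L << 40;
        long[] ids = new long[n];
        for (int i = 0; i < n; i++) ids[i] = Math.floorMod(rnd.nextLong(), ringSize);
        Arrays.sort(ids);
        double[] ms = meanAndStd(ids, ringSize);
        System.out.printf("mean=%.1f std=%.1f ratio=%.3f%n", ms[0], ms[1], ms[1] / ms[0]);
    }
}
```

With 10,000 identifiers the ratio of standard deviation to mean lands close to 1, which is exactly the regularity the pruning step below relies on.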

We prune our sample set by throwing out the highest values until the computed standard deviation is “close enough” to the average. We do this with another system parameter, called the pruning parameter. We remove the highest distance sample values until the standard deviation of what is left is less than the average distance scaled by the pruning parameter. This algorithm is shown in Figure 4.7.

n.calculate_statistics(distanceSamples, pruningParameter)
    done = false
    average = stdeviation = 0
    distanceSamples.sortAscending()
    while (!done):
        average = AVG(distanceSamples)
        stdeviation = STDEV(distanceSamples)
        if (stdeviation > average * pruningParameter):
            distanceSamples.remove(distanceSamples.size() - 1)
        else:
            done = true
    return (average, stdeviation)

Figure 4.7. The calculate_statistics() algorithm

The justification for throwing out the largest distances when pruning (to obtain an average and standard deviation that are reasonably close) is that a node can lie by providing nodes that are further away than its actual successors and predecessor, but it cannot lie about a node that is too close, because no node exists that is closer than the correct successors and predecessor. The effectiveness of pruning is evaluated in Section 6.3.2.
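The pruning loop of Figure 4.7 translates directly into Java. This sketch uses doubles for brevity (the simulator itself works with BigInteger identifiers) and adds the "stop once half of the samples are gone" guard that is described with the GUI's pruning parameter in Section 5; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Java translation of the calculate_statistics() pruning loop from Figure 4.7.
public class DistancePruner {
    /** Returns {average, standardDeviation} after pruning the largest samples. */
    public static double[] calculateStatistics(List<Double> distanceSamples,
                                               double pruningParameter) {
        List<Double> s = new ArrayList<>(distanceSamples);
        Collections.sort(s);                       // ascending: largest at the end
        int minSize = Math.max(1, distanceSamples.size() / 2);
        while (true) {
            double average = mean(s);
            double stdev = std(s, average);
            if (stdev > average * pruningParameter && s.size() > minSize) {
                s.remove(s.size() - 1);            // drop the largest (possibly
            } else {                               // lied-about) distance sample
                return new double[]{average, stdev};
            }
        }
    }
    private static double mean(List<Double> s) {
        double sum = 0;
        for (double v : s) sum += v;
        return sum / s.size();
    }
    private static double std(List<Double> s, double m) {
        double var = 0;
        for (double v : s) var += (v - m) * (v - m);
        return Math.sqrt(var / s.size());
    }
}
```

Feeding the loop a sample set with a few inflated distances shows the intended effect: the outliers drive the standard deviation above the average, so they are discarded until the two statistics are in line.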

4.7. Joining the Network

In order to successfully join the Chord network in the presence of attackers, we need to make a few changes. We must know of some set of uncompromised bootstrap Chord nodes that are already in the system in order to join it securely. If we try to join with

compromised bootstrap nodes, they can simply put us into any Chord network they would like. These nodes must be found out of band, and this is the only situation in which we require trust to exist between nodes. We use a set of bootstrap nodes instead of a single node because, even with our security mechanisms in place, some incorrect lookups occur in the presence of attackers. To populate our initial finger table, we ask each bootstrap node to perform a lookup of each of our finger pointer identifiers ((nodeid + 2^i) mod 2^m for i from 0 to m-1). For each finger table entry, we use the node with the identifier closest to that entry's finger table pointer. The reason for this is, again, that nodes can lie about nodes that are too far away, but cannot lie about nodes that are too close.
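The selection rule above, keeping the candidate whose identifier most closely succeeds the finger pointer, can be sketched as follows. The names are hypothetical; in particular, closestSuccessor simply stands in for comparing the answers returned by the bootstrap nodes for one finger pointer lookup.

```java
import java.math.BigInteger;
import java.util.List;

// Sketch of the bootstrap finger selection described above. A lying bootstrap
// node can only offer a candidate that is further from the pointer than the
// true successor, never closer, so taking the minimum clockwise distance is
// safe as long as at least one bootstrap node answers honestly.
public class BootstrapJoin {
    public static BigInteger closestSuccessor(BigInteger fingerPointer,
                                              List<BigInteger> candidates, int m) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(m);   // 2^m
        BigInteger best = null;
        BigInteger bestDist = null;
        for (BigInteger c : candidates) {
            // clockwise distance from the pointer to the candidate
            BigInteger d = c.subtract(fingerPointer).mod(ringSize);
            if (bestDist == null || d.compareTo(bestDist) < 0) {
                best = c;
                bestDist = d;
            }
        }
        return best;
    }
}
```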

4.8. Updating Finger Table Entries

The next issue we must deal with is the case where a finger table update call (fix_fingers()) receives an incorrect entry. To avoid this, we modify fix_fingers(). If, during an update, we receive a new finger entry that is closer to the finger pointer than the old one, we accept it and make the change. However, if the new finger is further away than the old one, we check the old entry and the nodes in its successor list to make sure that all known nodes whose identifiers precede the new node's identifier have actually left the network. If none of these nodes are still in the network, we accept the update; otherwise we reject it and use the closest succeeding node that we were aware of as the finger table entry.
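The acceptance rule just described can be sketched as a small decision function. Identifiers are longs on a small ring for readability, and isAlive stands in for actually contacting a node; none of these names come from the simulator itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the modified fix_fingers() acceptance rule described above.
public class FingerUpdatePolicy {
    public static long chooseEntry(long ringSize, long fingerPointer,
                                   long oldEntry, List<Long> oldSuccessors,
                                   long newEntry, Predicate<Long> isAlive) {
        long newDist = Math.floorMod(newEntry - fingerPointer, ringSize);
        long oldDist = Math.floorMod(oldEntry - fingerPointer, ringSize);
        if (newDist < oldDist) {
            return newEntry;          // closer to the pointer: accept immediately
        }
        // The new entry is further away. Keep the closest node we already know
        // about that precedes it and is still alive; accept the new entry only
        // if every such node has actually left the network.
        long best = newEntry;
        long bestDist = newDist;
        List<Long> known = new ArrayList<>(oldSuccessors);
        known.add(oldEntry);
        for (long id : known) {
            long d = Math.floorMod(id - fingerPointer, ringSize);
            if (d < bestDist && isAlive.test(id)) {
                best = id;
                bestDist = d;
            }
        }
        return best;
    }
}
```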


5. Simulator Design

In order to test and evaluate the proposed changes to the Chord protocol, we built our own Chord simulator from scratch. The simulator is written in Java and requires Java version 1.5 or later to compile and run. Individual tests can be set up and run with an included GUI utility, and batches of tests can be written in Java by using the ChordController class. The source code is available online at http://www.csh.rit.edu/~keithn/thesis.htm.

Our simulation works in an iterative fashion rather than being multithreaded. Running a separate thread for each node is not reasonable with large Chord networks, and we are able to achieve the same functionality with an iterative simulation. Each node has a tick() method that is repeatedly called by the simulator, and the tick method calls the functions that need to be periodically called for the Chord protocol to run correctly.

The original Chord protocol as described in [13] is implemented in the class ChordNode. Three malicious node classes have been written that extend ChordNode to test the effectiveness of three types of attacks against the unmodified Chord protocol. The first is MDropperChordNode, a node that drops all lookup requests. The second is MRandomChordNode, a node that works with other MRandomChordNodes to forward lookup requests around the Chord ring randomly without ever reaching the final destination. The last is MColludingChordNode, a node that colludes with other MColludingChordNodes to form a sub-ring of the main Chord ring in order to capture lookup requests and forward them through the sub-ring.

The changes that we propose are implemented in the class SecureChordNode, which extends ChordNode. The same types of attacks have also been implemented against the modified protocol in order to test their effectiveness against our changes; the nodes that perform them are called MSDropperChordNode, MSRandomChordNode, and MSColludingChordNode. These classes all extend SecureChordNode and implement the same attacks as the nodes that extend the base ChordNode class. This class structure is illustrated in Figure 5.1.

It should be noted that the nodes of the default protocol are not compatible with the nodes of the extended protocol. ChordNodes cannot be mixed with SecureChordNodes due to protocol differences.


Figure 5.1. An overview of the various Chord node classes

During a simulation, a ChordController object is responsible for managing all of the nodes in the system. ChordController objects store nodes in ChordRing objects. A ChordRing acts as a data structure for the nodes and contains convenient operations that may be performed on the entire system. A ChordController may also have a ChordGUIUtil and a StatKeeper object associated with it. The ChordGUIUtil displays graphical results for an experiment, and the StatKeeper collects statistics during an experiment and displays them when requested.

We designed and developed all classes ourselves except for BigSquareRoot , which is a free utility class for calculating the square root of Java BigIntegers taken from [6]. This class is necessary for computing the standard deviation of the average node distance.

There are two main mechanisms for running tests. The first is to use the included ChordGUIUtil class, which contains a main method that displays a GUI allowing the user to run individual tests. The GUI utility will create a network with the specified number of nodes and the specified parameters, fill in the finger tables of all those nodes, and then simulate a specified number of test lookups. Output is displayed to the screen. The second mechanism involves coding tests in Java that make use of the ChordController class. This allows batches of tests to be set up and run, and the output is in comma-separated value format, which allows the data to be easily imported into a spreadsheet application. We explain how to use both of these mechanisms in the next two sections.

5.1. Using the GUI Utility

The included GUI utility is a convenient way to run simple experiments and to graphically observe how the unmodified Chord protocol and the secured Chord protocol operate. The tests you can run with this utility are very useful, but to have full control over your tests you will need to code your own tests to make use of the ChordController class.


Figure 5.2. A screenshot of the GUI utility

To use the Chord GUI utility, you will need to obtain the simulator source code from http://www.csh.rit.edu/~keithn/thesis and extract the provided archive file. On a machine with Java 1.5 or later installed, you can compile this code by typing:

>javac *.java

And you can run the GUI utility by typing:

>java ChordGUIUtil

When you execute the utility, the GUI window will appear. The utility consists of four main components:

• The left side of the screen contains the experiment setup panel. This panel allows you to enter parameters for an experiment and then run it.

• The middle of the screen is the Chord display panel. After an experiment is complete, you can graphically view the routes that lookups took during the experiment.

• The right side of the screen is the lookup panel. After an experiment is complete, this will display some information about each lookup that occurred during the experiment. You can click on one of the lookups and view it in the Chord display panel.

• The bottom of the screen contains the status area. After an experiment is complete, this box will display information about the experiment, such as the number of correct lookups and average hop counts.

Figure 5.2 shows a screenshot of the GUI utility.

5.1.1. Experiment Setup

The experiment setup panel allows you to set up and run a new experiment. In the current version of the software, an experiment consists of creating a Chord network with the given number of each type of nodes, forcing it to converge, and then performing a given number of random identifier lookups from random nodes. More advanced experiments can be run by coding them manually, which will be described in section 5.2.

Here are the fields on the experiment setup panel and their descriptions:

• Uncompromised Nodes: This is the number of good, uncompromised nodes you want in the system. These nodes always behave and run the protocol in the correct manner.

• Malicious Dropper Nodes: This is the number of bad "dropper" nodes. A dropper node simply drops any lookup request that it receives.

• Malicious Random Routing Nodes: This is the number of nodes that collude to forward lookup requests among themselves randomly. The goal is to cause a lookup request to be forwarded indefinitely without ever reaching its destination.

• Malicious Colluding Nodes: This is the number of bad "colluding" nodes. Colluding nodes work together to trick a good node that is performing a lookup by routing its lookup request only through other colluding nodes. Colluding nodes form a sub-ring of the overall network. When a lookup request hits one of these nodes, the request goes into the sub-ring and never comes out. This is intended to guarantee that the final node found will be a compromised node. To the node performing the lookup, it appears as if all is well.

• Default or Secure Nodes: This radio button lets you choose between the default, unmodified Chord protocol and the secure protocol proposed by this thesis.

• Standard Deviation Parameter: This parameter is used only in secure node mode. It is the number of standard deviations over the average node distance that you will allow a finger pointer/finger entry distance to be, as described in Section 6. If left blank, no hop verification will be performed. You may enter a non-integer value here. Negative values are also allowed.

• Pruning Parameter: This parameter is used only in secure node mode. It controls how tightly you want each node to prune its node distance sample set. In a perfect world, the standard deviation of node distances should equal the average node distance itself, since node distances form an exponential distribution. The pruning parameter is the maximum number of multiples of the average distance that you will allow the standard deviation to be. The highest distance values are thrown out of a node's sample set until this requirement is met (or half of all samples have been thrown out, in which case pruning stops). Normal values range from 0.8 to 1.2, with lower values working better when there are more nodes in the system. If left blank, no pruning will be performed. You may enter a non-integer value here. See Section 6 for more information on the pruning parameter.

• Maximum Number of Hops: This parameter is used only in secure node mode. Since secure node lookups may (rarely) go on for a very long time trying to route around many bad hops, this limits how many hops you are willing to allow a lookup to take.

• Number of Test Lookups: This is the number of test lookups to perform in the experiment. Using powers of 10 is recommended so that you can easily compute percentages from the statistics shown in the experiment's output.

• Random Seed: This is the random seed that will be used, which allows experiments to be repeated. It is recommended that you use a seed so that you can replicate any interesting behavior that is observed. Leaving this field blank will use a random seed.

Once you have entered all of the parameters needed, you can click the Run Test button to run the experiment. This will set up all node finger tables and generate statistical information if secure node mode is being used. It will then perform the number of random lookups that you specified. A progress bar will tell you what task is being performed and how much of that task is complete.

Once the experiment is complete, data will be displayed on the other three GUI panels.

5.1.2. Viewing Experiment Results

After an experiment run has been completed, you will see aggregate experiment data printed to the status panel, information for each lookup shown in the lookups panel, and nodes displayed along the ring in the Chord display panel. Blue dots are uncompromised nodes and red dots are compromised nodes.

The aggregate data shown at the bottom includes the total number of lookups performed and how many of them were successful and unsuccessful. Successful lookups completed and returned a reference to the correct destination node. Incorrect lookups completed but were fooled by colluding nodes into returning a reference to a compromised node. Failed lookups simply did not complete. You will also see hop count data and a hop count histogram, along with data about hop verification accuracy.

The lookup panel shows you lookups that have completed. From the panel, you can see which lookups were successful, corrupt (returned a reference to a wrong and malicious

node), and incomplete, along with how many hops these lookups took. Click on a lookup and then click "View" to see the route that lookup took displayed on the Chord panel. Blue lines represent hops that passed the verify_hop() algorithm, and red lines indicate hops that did not. Invalid hops are not counted in the hop count because the node does not actually contact the next node on that hop.

The Zoom In and Zoom Out buttons are useful for zooming in when the hop count is high for a particular lookup and the screen is cluttered.

5.2. Writing Tests in Java

While the provided GUI utility is useful for running quick tests and for visualizing how the proposed algorithms work, in order to run batches of tests to get more meaningful results you must write your own tests in Java. Tests are written using the ChordController class.

The major methods used from a ChordController object are listed below, along with brief descriptions of what they do. To get a complete understanding of how to use ChordController, you should read the Javadoc documentation that is available with the code. The tests used to obtain the evaluation results in Section 6 can be found in the file Tests.java. Viewing this file should help you understand how to write your own tests.

The major ChordController methods are:

• Constructor:
  o Takes no arguments.

• addChordNodes(int num):
  o Adds num Chord nodes to the network that implement the default, unmodified Chord protocol outlined in [13].

• addMDropperChordNodes(int num):
  o Adds num Chord nodes to the network that run the unmodified protocol and simply drop all lookup requests.

• addMRandomChordNodes(int num):
  o Adds num Chord nodes to the network that run the unmodified protocol and randomly route lookup requests among each other.

• addMColludingChordNodes(int num):
  o Adds num Chord nodes to the network that form a malicious sub-ring. All lookup requests received by one of these nodes are forwarded into the malicious sub-ring instead of through the full network.

• addSecureChordNodes(int num, double std_param, double prune_param, int maxhops):
  o Adds num Chord nodes to the network that implement the security modifications proposed by this thesis. The two system parameters are described in detail in Section 6. The maxhops parameter specifies the maximum number of hops we are willing to attempt before giving up on a lookup request.

• addMSDropperChordNodes(int num, double std_param, double prune_param, int maxhops):
  o Adds num Chord nodes that run the same attack as MDropperChordNodes, except using the secured protocol.

• addMSRandomChordNodes(int num, double std_param, double prune_param, int maxhops):
  o Adds num Chord nodes that run the same attack as MRandomChordNodes, except using the secured protocol.

• addMSColludingChordNodes(int num, double std_param, double prune_param, int maxhops):
  o Adds num Chord nodes that run the same attack as MColludingChordNodes, except using the secured protocol.

• startTracking():
  o Instructs the simulator to start keeping track of lookup statistics. You should call this before expediateConvergence() if you want to keep track of the average node distance and standard deviation statistics computed by the individual nodes.

• expediateConvergence():
  o Fills in the finger tables of all nodes in the system with the correct entries. This is meant to get the network up and running quickly so that tests can be performed.

• performRandomLookups(int num):
  o Performs num lookups of random identifiers from random uncompromised nodes in the system. You should have called startTracking() first so that the results of these lookups are recorded.

• showHumanStats():
  o Prints the recorded lookup statistics in a human-readable format.

• showCSVStats():
  o Prints the recorded lookup statistics in comma-separated form. This format is easy to import into spreadsheet applications.

• runTicks(int num):
  o Simulates num rounds of calling the periodic Chord protocol functions (fix_fingers, stabilize, and check_predecessor). This simulates how the network performs over time as nodes join the network.


6. Evaluation

This section will evaluate the proposed Chord modifications. Our evaluation seeks to answer the following questions:

• How well does the system perform in the presence of malicious nodes that simply drop lookup requests? (Section 6.1)

• How well does the system perform in the presence of malicious nodes that route lookup requests to random malicious nodes in the system instead of the correct next hop? (Section 6.2)

• How well does the system perform in the presence of malicious nodes that form a sub-ring (partition) of the overall Chord network and forward all received lookup requests into this sub-ring? (Section 6.3)

• How does the size of the network impact the success rate of the system? (Sections 6.1-6.3)

• How does the standard deviation parameter impact the success rate of the system? How does it impact the average hop count? How does it impact the false positive/false negative rate of the verify_hop algorithm? (Section 6.3.1)

• How does the pruning parameter impact the success rate of the system? How does it impact the average hop count? How well does it estimate the average size of the network? (Section 6.3.2)

6.1. Dropped lookup requests

One simple attack available to a malicious user is to refuse to respond to any lookup requests they receive. Chord deals with nodes that have failed through the stabilize() and check_predecessor() functions, but a malicious node can respond to these failure detection mechanisms in order to remain in the Chord ring while still ignoring lookup requests. Our routing algorithm was designed to deal with these types of nodes, as well as any failed nodes that have not yet been cleaned up by the failure checking functions.

Testing the performance of the system in the presence of nodes that drop lookup requests allows us to test the routing algorithm independently of the hop verification algorithm. For the following tests, the standard deviation parameter was set to infinity, which has the effect of disabling the hop verification functionality while still keeping the modified routing functionality. The networks consist of 1000 nodes, and the number of nodes dropping lookup requests was varied from 0 to 500. With each variation in the number of nodes dropping lookup requests, we created 100 different networks and performed 1000 random lookups on each network from a non-malicious node.

The experiment was run by converging each network to a stable state and then enabling the lookup request drop attack on randomly selected nodes. For comparison, we also

tested the default Chord protocol in the same scenario. The percentage of all lookups that resulted in success is shown in Figure 6.1.

Figure 6.1. Success rate when testing in the presence of nodes that drop lookup requests.

With the default Chord protocol, if any one node on the path to the destination drops the lookup request, the lookup fails. Without any malicious nodes, our simulations show that the average path length in a 1000 node Chord network is approximately 4.8 hops. If f is the fraction of uncompromised nodes, then the chance of there being no malicious node on the path to the destination is f^4.8, and this is reflected in the test results.
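This back-of-the-envelope model can be written out directly. The sketch below is illustrative only; the 4.8-hop figure is the simulated average path length quoted above.

```java
// With average path length h and fraction f of uncompromised nodes, an
// unmodified lookup succeeds only if every hop lands on a good node, so the
// success probability is f^h.
public class DropAttackModel {
    public static double successProbability(double goodFraction, double avgHops) {
        return Math.pow(goodFraction, avgHops);   // f^h
    }

    public static void main(String[] args) {
        for (double f : new double[]{1.0, 0.9, 0.7, 0.5}) {
            System.out.printf("f=%.1f -> %.1f%% of lookups succeed%n",
                    f, 100 * successProbability(f, 4.8));
        }
    }
}
```

At f = 0.5 the model predicts a success rate of roughly 3.6%, consistent with the "less than 10%" observed for the unmodified protocol below.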

The modified Chord protocol that we have proposed in this thesis fared much better than the unmodified version. With even 50% of nodes in the system dropping all lookup requests, we were able to route around those compromised nodes and find the node responsible for a key approximately 95% of the time. Routing around all those malicious nodes takes more hops, of course, and Figure 6.2 shows the effect of the fraction of compromised nodes on the average hop count for the modified Chord protocol.

Figure 6.2 shows that the average hop count doubles when 28% of nodes in the system are dropping lookup requests. The average hop count doubles again when 50% of nodes in the system are dropping lookup requests. The advantage, however, is that 95% of lookup requests will succeed, compared to less than 10% with the default Chord protocol.


Figure 6.2. Hop count results when testing in the presence of nodes that drop lookup requests.

6.2. Incorrect Random Routing

While the dropped lookup request tests show that the routing algorithm works well, it does not test the hop verification algorithm. This next type of attack tests both the routing algorithm and the hop verification algorithm. With this type of attack, the malicious nodes in the system direct lookup requests to other random malicious nodes in the system. The effect is to bounce the lookup request all over the network while never reaching the destination.

We test the effectiveness of this attack against both our modified system and the default Chord protocol. Since the modified system requests entire finger tables rather than single hops, our modified system’s random routing attackers will return a finger table full of random malicious nodes instead of the properly constructed finger table. Although this attack does test the hop verification algorithm, we would expect the hops to easily fail verification. This is because a random node in the system is likely to be many standard deviations of the average node identifier distance away from that entry’s finger pointer.

This experiment was run on a network of 1000 nodes. The number of compromised nodes was varied from 0 to 500. For each variation, we created 100 networks and ran 1000 tests on each network. This time, we needed to set the standard deviation parameter and pruning parameter. The standard deviation parameter was set to 3.0 and the pruning parameter was set to 1. We set the default Chord protocol to give up after 1000 hops to prevent an infinite loop. The results are shown in Figure 6.3.


Figure 6.3. Success rate results of testing in the presence of nodes that randomly misroute lookup requests among themselves.

We can see that the success rate is high for the modified Chord protocol when random routing nodes are present. The verify_hop() algorithm easily detects malicious hops when the hop is to a random location. The unmodified protocol, on the other hand, fails when a malicious node appears on the route to the destination. Its lookup requests will be routed throughout the set of malicious nodes endlessly. When a large number of nodes in the system are compromised, the modified system does increasingly return incorrect result nodes, but the success rate is still over 90%.

Figure 6.4 shows the effect of random router nodes on the average hop count in the modified protocol. Again, the average hop count in a Chord network of 1000 nodes running the default protocol is 4.8 when there are no malicious nodes in the system. Notice that even with no nodes compromised, our system had an average hop count of 6.3. This is because we are now using an SD_PARAM of 3.0, which increases our average hop count: we occasionally get false negatives from the verify_hop() algorithm, causing us to backtrack even when it is not actually necessary.


Figure 6.4. Average hop count when testing in the presence of nodes that randomly misroute lookup requests among themselves.

6.3. Malicious Sub-ring Routing

While the random routing tests show that the verify_hop() algorithm works, the hops it was given to check were easy to reject because they were completely random and often very far from where the legitimate next hop should be. The next tests are designed around a type of routing attack that is much more difficult to detect: all of the malicious nodes in the system collude and form a sub-ring of the overall ring. Each malicious node does this by maintaining an extra finger table in which each entry contains the first malicious node that succeeds that entry's finger pointer. This malicious finger table is the one returned to uncompromised nodes performing a lookup. Any lookup request that reaches a malicious node on its route will be captured by the colluding sub-ring.

The goal of the attacker is to emulate correct behavior as closely as possible, but still control the lookup process. The attacker wants to cause as many lookups as possible to end at a malicious node, so that the attacker can either deny the existence of the sought after data or provide compromised data. Since the nodes used in the finger table are the closest malicious nodes to the correct entry, our verify_hop() algorithm should have a more difficult time recognizing malicious hops, and should be less accurate.

The first test shows the effect of varying the number of these compromised nodes from 0 to 500 in a 1000 node network. For each variation, we created 100 networks and ran 1000 test lookups from random uncompromised nodes on random identifiers on each network. We tested our modified protocol with standard deviation parameters of 1.75 and 3.0, keeping the pruning parameter at 1.0, and we also tested the unmodified Chord protocol for comparison. Figure 6.5 shows the lookup success rate achieved when running this experiment.

Figure 6.5. Success rate when testing in the presence of malicious sub-ring nodes on a 1,000 node network.

While the lookup success rate was not as high as in the previous two tests, it was still much better than that of the unmodified Chord protocol. For example, with 20% of the nodes compromised, only 48.6% of lookups succeeded using the unmodified protocol, while 86% succeeded with the modified protocol using a standard deviation parameter of 1.75.

It should be noted that as the fraction of colluding nodes increases, the protocols eventually see their success rate “improve.” To see why, consider the case when 40% of nodes are compromised. Even if we were fooled by the attackers every time, 40% of our lookups would still return the correct destination node, because 40% of the correct destination nodes are themselves malicious. Therefore, we can never perform worse than the line y = x on this graph; that line is our lower bound.

Figure 6.6 shows the average hop count for a 1000 node system as the fraction of colluding sub-ring nodes varies.


Figure 6.6. Average hop count when testing in the presence of malicious sub-ring nodes on a 1,000 node network.

The modified protocol increases the average hop count even when no nodes are compromised, due to false positives in the verify_hop() algorithm. With an SD_PARAM of 1.75, we have a higher hop count than with an SD_PARAM of 3.0, but we also have a higher success rate. The SD_PARAM thus trades off false positives against false negatives, and we will see this more clearly in Section 6.3.1.

We repeated this experiment on a larger network to investigate how network size affects performance. Each network in this experiment has 10,000 nodes, and we varied the percentage of compromised nodes between 0% and 50%. For each variation, we created 10 networks and ran 1000 test lookups from random uncompromised nodes on random identifiers on each network. Figure 6.7 shows the results of this experiment. While the success rate is slightly lower with 10,000 nodes than it was with 1,000 nodes, the difference between the success rates of the modified and unmodified protocols is greater.

Figure 6.8 shows the average hop count for these experiments. Here we see the average hop count actually start to decrease after a high enough percentage of nodes have been compromised. This is because we are often being fooled by the malicious nodes and not correctly backtracking around them.


Figure 6.7. Success rate when testing in the presence of malicious sub-ring nodes on a 10,000 node network.

Figure 6.8. Average hop count when testing in the presence of malicious sub-ring nodes on a 10,000 node network.


6.3.1. Effect of the Standard Deviation Parameter

While the previous experiments demonstrate some of the effects of modifying the standard deviation parameter, this section looks at it more closely. The purpose of the standard deviation parameter is to allow the user to trade security for performance. A higher standard deviation parameter value allows more invalid hops to slip through the cracks, while a lower value erroneously flags more valid hops as invalid.
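The shape of this tradeoff can be sketched as a simple threshold test. Note that this is an illustrative sketch only: the names and the exact formula below are our assumptions, not the verify_hop() definition from Chapter 4. The idea is that a hop is accepted when its observed ring distance is no more than SD_PARAM standard deviations above the locally estimated mean inter-node distance, so raising SD_PARAM admits more hops (fewer false positives, more false negatives) and lowering it rejects more.

```java
// Hypothetical sketch of an SD_PARAM-style hop verification check.
class HopVerifier {
    final double meanDistance;  // locally estimated average gap between nodes
    final double stdDeviation;  // locally estimated standard deviation of gaps
    final double sdParam;       // the SD_PARAM security/performance knob

    HopVerifier(double meanDistance, double stdDeviation, double sdParam) {
        this.meanDistance = meanDistance;
        this.stdDeviation = stdDeviation;
        this.sdParam = sdParam;
    }

    // observedDistance: clockwise ring distance from the ideal finger
    // position to the node the previous hop actually handed us. Accept
    // the hop only if it falls within the statistical threshold.
    boolean verifyHop(double observedDistance) {
        return observedDistance <= meanDistance + sdParam * stdDeviation;
    }
}
```

Under this reading, an SD_PARAM near -1 drives the threshold toward zero (almost every hop is rejected), which matches the near-total lookup failure observed at the low end of the parameter sweep below.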

In order to see the effects of the standard deviation parameter more clearly, we ran an experiment in which all other aspects were kept constant while the standard deviation parameter was varied. For the experiment, we created networks of 1000 nodes, of which 750 were correctly running the Chord protocol and 250 were colluding nodes performing the sub-ring attack. The pruning parameter was kept constant at 1. We varied the standard deviation parameter between -1 and 9 in steps of 0.1. For each value, we created 100 networks and performed 1000 lookups on random identifiers from random non-compromised nodes. The success rate of these experiments is shown in Figure 6.9.

Figure 6.9. Success rate with various SD_PARAM values

With a very small standard deviation parameter value, most lookup attempts fail, but not because the node performing the lookup was deceived. These lookups fail because so few hops can pass the verify_hop test that no route to the destination can be found consisting entirely of hops that pass verification. As the standard deviation parameter is increased, more lookups succeed as the false positive rate is reduced. At the same time, however, the false negative rate increases and more invalid hops begin to pass verification, resulting in more lookup requests that end in deception (meaning a node was found, but it was not the correct node). The successful lookup rate tops out at a little over 80%, which is significantly higher than the 45% success rate that results when no hop verification is used.

The advantage of using a higher standard deviation parameter value is, of course, lower hop counts. Figure 6.10 shows the average hop count for the various standard deviation parameter values. We see an initial increase in the hop count for small SD_PARAM values because increasing the SD_PARAM results in fewer “dead ends” where no more valid hops can be taken. After peaking at an SD_PARAM value of around 0, the average hop count decreases as the SD_PARAM increases, since fewer hops fail verification. Comparing the average hop count graph to the success rate graph shows that for SD_PARAM values between 1.0 and 3.0, increasing the SD_PARAM slightly decreases lookup success while significantly decreasing the average hop count.

Figure 6.10. Average hop count with various SD_PARAM values

To further see the effects of varying the standard deviation parameter, we recorded the false positive and false negative rates for all calls to the verify_hop function. Our definition of the false positive rate is the total number of false positives divided by the total number of hops tested; likewise, the false negative rate is the total number of false negatives divided by the total number of hops tested. The results are shown in Figure 6.11.
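These rate definitions can be captured in a few lines of bookkeeping. The sketch below is ours (the simulator's actual counters may differ): a false positive is a valid hop flagged invalid, a false negative is an invalid hop that passed verification, and both rates are normalized by the total number of hops tested.

```java
// Tallies implementing the false positive / false negative rate
// definitions used in this section.
class VerificationStats {
    long falsePositives;  // valid hops flagged invalid by verify_hop
    long falseNegatives;  // invalid hops that passed verify_hop
    long hopsTested;      // every hop submitted for verification

    void record(boolean hopWasValid, boolean passedVerification) {
        hopsTested++;
        if (hopWasValid && !passedVerification) falsePositives++;
        if (!hopWasValid && passedVerification) falseNegatives++;
    }

    double falsePositiveRate() {
        return hopsTested == 0 ? 0.0 : (double) falsePositives / hopsTested;
    }

    double falseNegativeRate() {
        return hopsTested == 0 ? 0.0 : (double) falseNegatives / hopsTested;
    }
}
```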


Figure 6.11. False positive and false negative rates with various SD_PARAM values.

The effects of the SD_PARAM on the false positive and false negative rates are as expected. Figure 6.11 gives us a better feel for how accurate the verify_hop function really is.

6.3.2. Effect of the Pruning Parameter

The other major system configuration parameter is the pruning parameter. Pruning, as you will recall, is our attempt to remove incorrect data samples that were computed using information provided by malicious nodes. The distance samples in the Chord ring follow an exponential distribution, and in an exponential distribution we expect the standard deviation of the samples to equal the mean. Pruning is therefore done by removing the largest data samples until the computed standard deviation is less than the average scaled by the pruning parameter.
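The pruning loop just described can be sketched directly (class and method names are ours; the simulator's implementation may differ in detail): sort the samples, then repeatedly discard the largest remaining sample until the standard deviation is no more than the mean scaled by the pruning parameter. Because an exponential distribution's standard deviation equals its mean, honest samples should mostly survive with a pruning parameter near 1.0, while outsized samples injected by malicious nodes are the first to go.

```java
// Sketch of the pruning step: drop the largest distance samples until
// stdDev <= pruningParam * mean over the surviving samples.
class SamplePruner {
    // Mean of the first n elements of xs.
    static double mean(double[] xs, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) s += xs[i];
        return s / n;
    }

    // Population standard deviation of the first n elements of xs.
    static double stdDev(double[] xs, int n) {
        double m = mean(xs, n), s = 0;
        for (int i = 0; i < n; i++) s += (xs[i] - m) * (xs[i] - m);
        return Math.sqrt(s / n);
    }

    static double[] prune(double[] samples, double pruningParam) {
        double[] xs = samples.clone();
        java.util.Arrays.sort(xs);
        int n = xs.length;
        // Shrinking n discards the largest remaining (sorted) sample.
        while (n > 1 && stdDev(xs, n) > pruningParam * mean(xs, n)) {
            n--;
        }
        return java.util.Arrays.copyOf(xs, n);
    }
}
```

For example, one inflated sample among a handful of honest ones pushes the standard deviation well above the mean, so it is pruned on the first iteration.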

To test the pruning parameter, we ran an experiment in which we varied the pruning parameter between 0.5 and 2.0 while keeping the standard deviation parameter constant at 1.75, on networks of 1000 nodes with 25% of nodes colluding to run the sub-ring attack. The resulting success rates are shown in Figure 6.12.


Figure 6.12. Success rate with various pruning parameter values.

When the pruning parameter is very small, too much data is pruned and the resulting computed average is too small. This causes the verify_hop() function to fail often, producing a high false positive rate. As a result, it is difficult to find a path from source to destination consisting of verified hops, and the failure rate is high. As the pruning parameter is increased, the computed average distance grows and lookups succeed more often. When the pruning parameter is too high, data from the malicious nodes inflates the computed average distance, and the false negative rate increases, resulting in more deceived lookups.

An individual node can estimate the size of the network by dividing the maximum identifier value by the average distance between nodes that the node has computed. In Figure 6.13, we show what the average estimated size of the network is for the various pruning parameters in this experiment.
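The size estimate described above is a one-line computation; the sketch below (names are ours) makes the relationship explicit and shows why an inflated average distance, as produced by an overly permissive pruning parameter, directly translates into an underestimate of the network size.

```java
// A node's local estimate of the network size: the identifier space
// divided by the locally computed average inter-node distance.
class NetworkSizeEstimator {
    static double estimateSize(double maxIdentifier, double avgDistance) {
        return maxIdentifier / avgDistance;
    }
}
```

For instance, on a 16-bit identifier ring an average gap of 64 yields an estimate of 1024 nodes; if malicious data doubles the computed average gap, the estimate halves.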


Figure 6.13. Average estimated network size for various pruning parameter values.

Obviously, we would like the average estimated size to equal the actual network size of 1000. This is achieved with a pruning parameter value of 0.8. We found 0.8 to be a good pruning parameter value in nearly every test we ran, although when very few nodes are compromised a value of 1.0 may be a better choice.


7. Conclusion

Misbehaving nodes in distributed hash tables can cause severe disruptions, even when they only exist in small numbers. Several security concerns must be addressed in order to use distributed hash tables in situations where users cannot be trusted. In this thesis, we proposed a mechanism for mitigating the effects of one of those concerns: routing attacks.

Our solution does not require any overly complex modifications to the target DHT (Chord). We make use of information that is readily available to each node, and we do not require a trust relationship between nodes. This relative simplicity means that our defense mechanism should work well alongside other modifications to the Chord algorithm and can easily be combined with other security features.

Our system easily deals with simple attacks, such as dropped lookup requests and randomly misrouted lookup requests. For attacks that are difficult to detect, such as the sub-ring attack, our success rate drops but remains much greater than that of the unmodified Chord protocol, especially when less than 20% of the nodes in the network are compromised.

Our security features do increase the number of hops required to reach destinations. Different users have different security and performance needs, and so we provide parameters that can be used to balance these two needs.

Security in structured peer-to-peer networks is difficult because of their fully distributed nature, but we have shown that routing security can be greatly improved using only a relatively small amount of locally known information. Our technique should transfer to other DHTs that make use of constrained routing, and can serve as a crucial piece of a total security solution.


8. References

1. Castro, M., Druschel, P., Ganesh, A., Rowstron, A., Wallach, D.: Security for Peer-to-Peer Routing Overlays. Proc. of the Fifth Symposium on Operating Systems Design and Implementation (OSDI '02), Boston, MA (2002)
2. Danezis, G., Lesniewski-Laas, C., Kaashoek, M. F., Anderson, R.: Sybil-resistant DHT routing. Proc. of the 10th European Symposium on Research in Computer Security, Milan, Italy (2005)
3. Dinger, J., Hartenstein, H.: Defending the Sybil Attack in P2P Networks: Taxonomy, Challenges, and a Proposal for Self-Registration. Proc. of the First International Conference on Availability, Reliability and Security, Washington, DC (2006)
4. Douceur, J.: The Sybil Attack. Proc. of the First International Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, MA (2002)
5. Fiat, A., Saia, J., Young, M.: Making Chord Robust to Byzantine Attacks. Proc. of the European Symposium on Algorithms, Heidelberg, Germany (2005)
6. Gilleland, M.: Big Square Root. Accessed via the web: http://www.merriampark.com/bigsqrt.htm
7. Gnutella. Accessed via the web: http://www.gnutella.com
8. Heinbockel, W., Kwon, M.: Phyllo: A Peer-to-Peer Overlay Security Framework. Proc. of the First Workshop on Secure Network Protocols (NPSec), Boston, MA (2005)
9. Napster. Accessed via the web: http://www.napster.com
10. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A Scalable Content Addressable Network. Proc. of ACM SIGCOMM '01, San Diego, CA (2001)
11. Rowstron, A., Druschel, P.: Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. Proc. of IFIP/ACM Middleware 2001, Heidelberg, Germany (2001)
12. Sit, E., Morris, R.: Security Considerations for Peer-to-Peer Distributed Hash Tables. Proc. of the First International Workshop on Peer-to-Peer Systems (IPTPS '02), Cambridge, MA (2002)
13. Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., Balakrishnan, H.: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proc. of ACM SIGCOMM '01, San Diego, CA (2001)
14. Wallach, D.: A Survey of Peer-to-Peer Security Issues. Proc. of the International Symposium on Software Security, Tokyo, Japan (2002)
