Peer-to-Peer Systems and Distributed Hash Tables
COS 418: Advanced Computer Systems
Lecture 8
Mike Freedman
[Credit: Slides Adapted from Kyle Jamieson and Daniel Suo]

Today

1. Peer-to-Peer Systems
  – Napster, Gnutella, BitTorrent, challenges
2. Distributed Hash Tables
3. The Chord Lookup Service
4. Concluding thoughts on DHTs, P2P

What is a Peer-to-Peer (P2P) system?

[Figure: many nodes connected to each other through the Internet, with no central server]

• A distributed system architecture:
  – No centralized control
  – Nodes are roughly symmetric in function
• Large number of unreliable nodes

Why might P2P be a win?

• High capacity for services through parallelism:
  – Many disks
  – Many network connections
  – Many CPUs
• Absence of a centralized server may mean:
  – Less chance of service overload as load increases
  – Easier deployment
  – A single failure won't wreck the whole system
  – System as a whole is harder to attack

P2P adoption

Successful adoption in some niche areas

1. Client-to-client (legal, illegal) file sharing
2. Digital currency: no natural single owner (Bitcoin)
3. Voice/video telephony: user to user anyway
  – Issues: Privacy and control

Example: Classic BitTorrent

1. User clicks on download link
  – Gets torrent file with content hash, IP address of tracker
2. User's BitTorrent (BT) client talks to tracker
  – Tracker tells it list of peers who have the file
3. User's BT client downloads file from peers
4. User's BT client tells tracker it has a copy now, too
5. User's BT client serves the file to others for a while

Provides huge download bandwidth, without expensive server or network links
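The tracker exchange above can be pictured with a tiny sketch. This is not the real BitTorrent wire protocol (announces are bencoded HTTP requests); the Tracker class and announce() method below are hypothetical stand-ins that just show the information flow.

    # Minimal sketch of the classic tracker-based flow (hypothetical API,
    # not the real BitTorrent announce protocol).

    class Tracker:
        def __init__(self):
            self.swarms = {}                 # infohash -> set of peer addresses

        def announce(self, infohash, peer_addr):
            """A peer reports itself and learns about other peers (steps 2 and 4)."""
            peers = self.swarms.setdefault(infohash, set())
            others = list(peers - {peer_addr})
            peers.add(peer_addr)             # tracker now knows this peer has a copy
            return others                    # list of peers who have the file

    def download(tracker, infohash, my_addr):
        peers = tracker.announce(infohash, my_addr)   # step 2: ask the tracker
        # step 3: fetch pieces from each peer (omitted); step 5: keep seeding afterwards
        return peers

    tracker = Tracker()
    tracker.announce("abc123", "10.0.0.1:6881")           # an existing seeder
    print(download(tracker, "abc123", "10.0.0.2:6881"))   # ['10.0.0.1:6881']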

The lookup problem

[Figure: Publisher (N4) does put(key="Pacific Rim.mp4", value=[content]); elsewhere on the Internet a client does get("Pacific Rim.mp4"). With nodes N1-N6 scattered across the network, which node should answer?]

Centralized lookup (Napster)

[Figure: Publisher (N4) registers SetLoc("Pacific Rim.mp4", IP address of N4) with a central DB; the client sends Lookup("Pacific Rim.mp4") to the DB and then fetches the file from N4]

Simple, but O(N) state and a single point of failure
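A Napster-style central directory fits in a few lines, which also makes its weaknesses concrete: one table holds state for every file in the system, and nothing works if that one server dies. The DirectoryServer class and its method names are hypothetical, chosen to mirror the SetLoc/Lookup labels in the figure.

    # Hedged sketch of a centralized lookup directory (hypothetical names).

    class DirectoryServer:
        def __init__(self):
            self.locations = {}                  # filename -> set of IPs: O(N) state

        def set_loc(self, filename, ip):         # SetLoc("Pacific Rim.mp4", IP of N4)
            self.locations.setdefault(filename, set()).add(ip)

        def lookup(self, filename):              # Lookup("Pacific Rim.mp4")
            return self.locations.get(filename, set())

    db = DirectoryServer()
    db.set_loc("Pacific Rim.mp4", "ip-of-N4")    # publisher N4 registers its copy
    print(db.lookup("Pacific Rim.mp4"))          # client learns where to fetch from
    # If this single server is unreachable, no lookup succeeds.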

Flooded queries (original Gnutella)

[Figure: the client floods its Lookup("Pacific Rim.mp4") query to its neighbors, which re-flood it across nodes N1-N6 until it reaches Publisher (N4), which holds the key/value pair]

Robust, but O(N = number of peers) messages per lookup

Routed DHT queries (Chord)

[Figure: the client issues Lookup(H(data)); the query is routed node to node until it reaches Publisher (N4), which stores key=H(audio data), value=[content]]

Can we make it robust, with reasonable state and a reasonable number of hops?

Today

1. Peer-to-Peer Systems
2. Distributed Hash Tables
3. The Chord Lookup Service
4. Concluding thoughts on DHTs, P2P

What is a DHT (and why)?

• Local hash table:
  key = Hash(name)
  put(key, value)
  get(key) → value
• Service: Constant-time insertion and lookup

How can I do (roughly) this across millions of hosts on the Internet?
Distributed Hash Table (DHT)

What is a DHT (and why)?

• Distributed Hash Table:
  key = hash(data)
  lookup(key) → IP addr (Chord lookup service)
  send-RPC(IP address, put, key, data)
  send-RPC(IP address, get, key) → data
• Partitioning data in large-scale distributed systems
  – Tuples in a global database engine
  – Data blocks in a global file system
  – Files in a P2P file-sharing system

Cooperative storage with a DHT

[Figure: layered architecture. A distributed application calls put(key, data) and get(key) → data on the distributed hash table (DHash); DHash calls lookup(key) → node IP address on the lookup service (Chord); the lookup service runs across many nodes]

• App may be distributed over many nodes
• DHT distributes data storage over many nodes
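The layering in the figure can be written down directly: the DHash layer implements put/get by asking the lookup layer which node is responsible for a key and then sending that node an RPC. The lookup() and send_rpc() functions below are placeholder stubs for the Chord lookup service and the RPC transport, not real APIs.

    # Sketch of the DHash-over-Chord layering (placeholder lookup/RPC stubs).

    def lookup(key):
        """Chord lookup service: key -> IP address of the responsible node (stub)."""
        raise NotImplementedError

    def send_rpc(ip_addr, op, *args):
        """Send an RPC to a remote node and return its reply (stub)."""
        raise NotImplementedError

    # DHash layer: put/get expressed with lookup + RPC, as on the slide.
    def dht_put(key, data):
        ip_addr = lookup(key)                    # lookup(key) -> node IP address
        send_rpc(ip_addr, "put", key, data)      # send-RPC(IP address, put, key, data)

    def dht_get(key):
        ip_addr = lookup(key)
        return send_rpc(ip_addr, "get", key)     # send-RPC(IP address, get, key) -> data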

BitTorrent over DHT

• BitTorrent can use DHT instead of (or with) a tracker
• BitTorrent clients use DHT:
  – Key = file content hash ("infohash")
  – Value = IP address of peer willing to serve file
• Can store multiple values (i.e., IP addresses) for a key
• Client does:
  – get(infohash) to find other clients willing to serve
  – put(infohash, my-ipaddr) to identify itself as willing

Why the put/get DHT interface?

• API supports a wide range of applications
  – DHT imposes no structure/meaning on keys
• Key/value pairs are persistent and global
  – Can store keys in other DHT values
  – And thus build complex data structures
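With a DHT standing in for the tracker, the client side reduces to the two calls on the BitTorrent-over-DHT slide. The dht object below is a hypothetical put/get interface with a toy in-memory stand-in; real clients speak the Kademlia-based Mainline DHT protocol instead.

    # Sketch: trackerless peer discovery over a put/get DHT (toy stand-in).

    class ToyDHT:
        """In-memory stand-in for a real DHT; a key can hold multiple values."""
        def __init__(self):
            self.store = {}
        def put(self, key, value):
            self.store.setdefault(key, []).append(value)
        def get(self, key):
            return self.store.get(key, [])

    def advertise(dht, infohash, my_ipaddr):
        dht.put(infohash, my_ipaddr)             # put(infohash, my-ipaddr): "I can serve this"

    def find_peers(dht, infohash):
        return dht.get(infohash)                 # get(infohash): who else serves it?

    dht = ToyDHT()
    advertise(dht, "infohash-of-file", "10.0.0.7:6881")
    print(find_peers(dht, "infohash-of-file"))   # ['10.0.0.7:6881']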

Why might DHT design be hard?

• Decentralized: no central authority
• Scalable: low network traffic overhead
• Efficient: find items quickly (latency)
• Dynamic: nodes fail, new nodes join

Today

1. Peer-to-Peer Systems
2. Distributed Hash Tables
3. The Chord Lookup Service
  – Basic design
  – Integration with DHash DHT, performance

Chord lookup algorithm properties

• Interface: lookup(key) → IP address
• Efficient: O(log N) messages per lookup
  – N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive failures
• Simple to analyze

Chord identifiers

• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• SHA-1 distributes both uniformly
• How does Chord partition data?
  – i.e., map key IDs to node IDs
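Identifier assignment is just SHA-1 reduced onto the ring. A minimal sketch, assuming an m-bit identifier space (real Chord uses the full 160-bit SHA-1 output; the figures in this lecture use a 7-bit ring for readability):

    # Sketch: Chord identifiers = SHA-1, truncated to an m-bit ring.
    import hashlib

    M = 7                                        # identifier bits (160 in real Chord)

    def chord_id(text, m=M):
        digest = hashlib.sha1(text.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** m)

    print(chord_id("Pacific Rim.mp4"))           # key identifier  = SHA-1(key)
    print(chord_id("128.112.7.1:8001"))          # node identifier = SHA-1(IP address)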

Consistent hashing [Karger '97]

[Figure: a circular 7-bit ID space containing node IDs (e.g., Node 105) such as N10, N32, N90, N105, N120, and key IDs (e.g., Key 5) such as K5, K20, K80]

Key is stored at its successor: node with next-higher ID

Chord: Successor pointers

[Figure: the same ring, with each node holding a pointer to its successor, the next node clockwise (e.g., N10, N32, N60, N90, N105, N120)]
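The successor rule is a one-liner over a sorted list of node IDs: a key lives at the first node ID greater than or equal to the key ID, wrapping around the ring. A minimal sketch using the node IDs from the figure:

    # Sketch: consistent hashing, each key stored at its successor on the ring.
    import bisect

    RING_BITS = 7                                # the figure uses a 7-bit ID space

    def successor(node_ids, key_id):
        """Node responsible for key_id: next-higher ID, wrapping around."""
        ring = sorted(node_ids)
        i = bisect.bisect_left(ring, key_id % (2 ** RING_BITS))
        return ring[i % len(ring)]               # past the largest ID, wrap to the smallest

    nodes = [10, 32, 90, 105, 120]               # N10, N32, N90, N105, N120
    print(successor(nodes, 5))                   # K5  -> 10
    print(successor(nodes, 20))                  # K20 -> 32
    print(successor(nodes, 80))                  # K80 -> 90
    print(successor(nodes, 126))                 # wraps around the ring -> 10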

Basic lookup

[Figure: "Where is K80?" is forwarded around the ring from N10 through N32 and N60 until N90 answers "N90 has K80"]

Simple lookup algorithm

Lookup(key-id)
  succ ← my successor
  if my-id < succ < key-id   // next hop
    call Lookup(key-id) on succ
  else                       // done
    return succ

Correctness depends only on successors
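The pseudocode above hides the circular interval test. A runnable sketch of the same successor-only lookup, assuming each node knows only its own ID and its successor (node objects here are plain dicts; a real node would forward over the network):

    # Sketch: successor-only lookup, O(N) hops in the worst case.

    def between(x, a, b):
        """True if x lies in the circular interval (a, b]."""
        if a < b:
            return a < x <= b
        return x > a or x <= b                   # the interval wraps past zero

    def lookup(node, key_id):
        succ = node["succ"]
        if between(key_id, node["id"], succ["id"]):
            return succ                          # done: succ stores the key
        return lookup(succ, key_id)              # next hop: forward to my successor

    # Build the ring N10 -> N32 -> N60 -> N90 -> N105 -> N120 -> N10.
    ids = [10, 32, 60, 90, 105, 120]
    ring = [{"id": i} for i in ids]
    for a, b in zip(ring, ring[1:] + ring[:1]):
        a["succ"] = b

    print(lookup(ring[0], 80)["id"])             # "Where is K80?" -> 90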

Improving performance

• Problem: Forwarding through successor is slow
• Data structure is a linked list: O(n)
• Idea: Can we make it more like a binary search?
  – Need to be able to halve distance at each step

"Finger table" allows log N-time lookups

[Figure: node N80 with fingers pointing ½, ¼, 1/8, 1/16, 1/32, and 1/64 of the way around the ring]

Finger i points to successor of n+2^i

[Figure: node N80's finger for 80 + 2^5 = 112 (K112) points to N120, the successor of 112]

Implication of finger tables

• A binary lookup tree rooted at every node
  – Threaded through other nodes' finger tables
• Better than arranging nodes in a single tree
  – Every node acts as a root
    • So there's no root hotspot
    • No single point of failure
    • But a lot more state in total
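Finger i of node n points at successor(n + 2^i mod 2^m). A short sketch that builds one node's finger table from a known set of node IDs (a real node would discover each entry by doing lookups, not by reading a global list):

    # Sketch: finger[i] = successor(n + 2^i mod 2^m) for node n.
    import bisect

    M = 7                                        # bits in the identifier space

    def successor(node_ids, ident):
        ring = sorted(node_ids)
        i = bisect.bisect_left(ring, ident % (2 ** M))
        return ring[i % len(ring)]

    def finger_table(node_ids, n):
        return [successor(node_ids, n + 2 ** i) for i in range(M)]

    nodes = [5, 10, 20, 32, 60, 80, 99, 110]
    print(finger_table(nodes, 80))               # [99, 99, 99, 99, 99, 5, 20]
    # The fingers aim at 81, 82, 84, 88, 96, 112, and 16 (mod 128), so each
    # entry roughly halves the remaining distance around the ring.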

Lookup with finger table

Lookup(key-id)
  look in local finger table for
    highest n: my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done

Lookups Take O(log N) Hops

[Figure: N32 issues Lookup(K19); the query hops across the ring of N5, N10, N20, N32, N60, N80, N99, N110, roughly halving the remaining distance each time, until it reaches K19's successor N20]
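A runnable version of the finger-table lookup: forward to the farthest finger that still precedes the key, so each hop roughly halves the remaining distance. The Node class and the locally built ring are illustrative only; a real implementation would resolve each hop with an RPC.

    # Sketch: O(log N) lookup via closest-preceding finger (local simulation).
    M = 7

    def between(x, a, b):
        """True if x lies in the circular open interval (a, b)."""
        if a < b:
            return a < x < b
        return x > a or x < b

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.fingers = []                    # finger[i] = successor(id + 2^i)

        def lookup(self, key_id):
            for f in reversed(self.fingers):     # farthest finger first
                if between(f.id, self.id, key_id):
                    return f.lookup(key_id)      # next hop
            return self.fingers[0]               # finger[0] is my successor: done

    ids = [5, 10, 20, 32, 60, 80, 99, 110]
    nodes = {i: Node(i) for i in ids}
    ring = sorted(ids)
    def succ(x):
        x %= 2 ** M
        return nodes[next((n for n in ring if n >= x), ring[0])]
    for node in nodes.values():
        node.fingers = [succ(node.id + 2 ** i) for i in range(M)]

    print(nodes[32].lookup(19).id)               # Lookup(K19) from N32 -> 20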

An aside: Is log(n) fast or slow?

• For a million nodes, it's 20 hops
• If each hop takes 50 ms, lookups take a second
• If each hop has 10% chance of failure, it's a couple of timeouts
• So in practice log(n) is better than O(n) but not great

Joining: Linked list insert

[Figure: nodes N25 and N40 on the ring; N40 stores K30 and K38; new node N36 wants to join between them]

1. Lookup(36)

Join (2)

[Figure: N36 now points to N40 as its successor; N40 still stores K30 and K38]

2. N36 sets its own successor pointer

Join (3)

[Figure: K30 has moved to N36; K38 stays at N40]

3. Copy keys 26..36 from N40 to N36
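The three join steps are a linked-list insert plus a key handoff. A sketch with dict-based nodes standing in for real Chord nodes (the successor-only lookup from the earlier slide is reused to find N36's place on the ring):

    # Sketch: N36 joins between N25 and N40.

    def between(x, a, b):                        # x in circular interval (a, b]
        if a < b:
            return a < x <= b
        return x > a or x <= b

    def lookup(node, key_id):                    # successor-only lookup, as before
        if between(key_id, node["id"], node["succ"]["id"]):
            return node["succ"]
        return lookup(node["succ"], key_id)

    def join(new, some_node):
        succ = lookup(some_node, new["id"])      # 1. Lookup(36) finds successor N40
        new["succ"] = succ                       # 2. N36 sets its own successor pointer
        moved = {k for k in succ["keys"]
                 if not between(k, new["id"], succ["id"])}
        new["keys"] |= moved                     # 3. copy keys 26..36 from N40 to N36
        succ["keys"] -= moved                    # N40 keeps K38

    n25 = {"id": 25, "keys": set()}
    n40 = {"id": 40, "keys": {30, 38}}
    n25["succ"], n40["succ"] = n40, n25          # two-node ring N25 <-> N40

    n36 = {"id": 36, "keys": set()}
    join(n36, n25)
    print(n36["keys"], n40["keys"])              # {30} {38}
    # N25 still thinks its successor is N40; the notify/stabilize protocol
    # on the next slides repairs that pointer.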

Notify maintains predecessors

[Figure: N36 sends "notify N36" to its successor N40, and N25 sends "notify N25" to N36, so each node learns who its predecessor is]

Stabilize message fixes successor

[Figure: N25 still has N40 as its successor (stale link, marked ✘); N25 sends a stabilize message to N40, which replies "My predecessor is N36." N25 then makes N36 its successor]
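The maintenance protocol can be sketched straight from the Chord paper's stabilize/notify pseudocode: each node periodically asks its successor for that node's predecessor, adopts it if it sits in between, and then notifies the successor of its own existence. RPCs, failure handling, and the periodic timer are omitted.

    # Sketch of stabilize/notify (local objects instead of RPCs).

    def between(x, a, b):
        """True if x lies in the circular open interval (a, b)."""
        if a < b:
            return a < x < b
        return x > a or x < b

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.succ = self
            self.pred = None

        def stabilize(self):                     # run periodically on every node
            x = self.succ.pred                   # "who is your predecessor?"
            if x is not None and between(x.id, self.id, self.succ.id):
                self.succ = x                    # a node joined in between: adopt it
            self.succ.notify(self)               # "I might be your predecessor"

        def notify(self, n):
            if self.pred is None or between(n.id, self.pred.id, self.id):
                self.pred = n

    # N36 has joined with successor N40, but N25 still points at N40.
    n25, n36, n40 = Node(25), Node(36), Node(40)
    n25.succ, n40.succ = n40, n25
    n36.succ = n40

    n36.stabilize()                              # notify makes N40's predecessor N36
    n25.stabilize()                              # "My predecessor is N36." fixes N25's successor
    print(n25.succ.id, n40.pred.id)              # 36 36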

Joining: Summary

[Figure: after the join, N36 stores K30 and N40 stores K38]

• Predecessor pointer allows link to new node
• Update finger pointers in the background
• Correct successors produce correct lookups

Failures may cause incorrect lookup

[Figure: ring with N10, N80, N85, N102, N113, N120; N80 issues Lookup(K90) after some of its successors have failed]

N80 does not know correct successor, so incorrect lookup

Successor lists

• Each node stores a list of its r immediate successors
  – After failure, will know first live successor
  – Correct successors guarantee correct lookups
• Guarantee is with some probability

Today

1. Peer-to-Peer Systems
2. Distributed Hash Tables
3. The Chord Lookup Service
  – Basic design
  – Integration with DHash DHT, performance

The DHash DHT

• Builds key/value storage on Chord
• Replicates blocks for availability
  – Stores k replicas at the k successors after the block on the Chord ring
• Caches blocks for load balancing
  – Client sends copy of block to each server it contacted along lookup path
• Authenticates block contents

DHash data authentication

• Two types of DHash blocks:
  – Content-hash: key = SHA-1(data)
  – Public-key: Data signed by corresponding private key
• Chord File System example:
  [Figure: a root block (public key plus signature) points via content hash H(D) to a directory block D, which points via H(F) to a file inode block F, which points via H(B1) and H(B2) to data blocks B1 and B2]
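Content-hash blocks are self-certifying: the key is the SHA-1 of the data, so whoever fetches a block can verify it by re-hashing. A minimal sketch of that check, with a plain dict standing in for the DHT (public-key blocks would instead verify a signature against the key):

    # Sketch: self-verifying content-hash blocks, as in DHash.
    import hashlib

    def put_block(store, data):
        key = hashlib.sha1(data).hexdigest()     # key = SHA-1(data)
        store[key] = data
        return key

    def get_block(store, key):
        data = store[key]
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("block fails content-hash check (corrupt or forged)")
        return data

    store = {}
    k = put_block(store, b"a CFS data block")
    print(get_block(store, k) == b"a CFS data block")   # True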

DHash replicates blocks at r successors

[Figure: Block 17 is stored at its successor and replicated at the next r nodes around the ring (N5, N10, N20, N40, N50, N60, N68, N80, N99, N110)]

• Replicas are easy to find if successor fails
• Hashed node IDs ensure independent failure

Today

1. Peer-to-Peer Systems
2. Distributed Hash Tables
3. The Chord Lookup Service
  – Basic design
  – Integration with DHash DHT, performance
4. Concluding thoughts on DHTs, P2P

DHTs: Impact

• Original DHTs (CAN, Chord, Kademlia, Pastry, Tapestry) proposed in 2001-02
• Next 5-6 years saw proliferation of DHT-based apps:
  – Filesystems (e.g., CFS, Ivy, OceanStore, Pond, PAST)
  – Naming systems (e.g., SFR, Beehive)
  – DB query processing [PIER, Wisc]
  – Content distribution systems (e.g., CoralCDN)
  – Distributed databases (e.g., PIER)

Why don't all services use P2P?

1. High latency and limited bandwidth between peers (vs. intra/inter-datacenter)
2. User computers are less reliable than managed servers
3. Lack of trust in peers' correct behavior
  – Securing DHT routing hard, unsolved in practice

DHTs in retrospective

• Seem promising for finding data in large P2P systems
• Decentralization seems good for load
• But: the security problems are difficult
• But: churn is a problem, particularly if log(n) is big
• And: cloud computing addressed many of the economic reasons for P2P, as did the rise of ad-based business models
• DHTs have not had the hoped-for impact

What DHTs got right

• Consistent hashing
  – Elegant way to divide a workload across machines
  – Very useful in clusters: actively used today in Amazon and other systems
• Replication for high availability, efficient recovery
• Incremental scalability
• Self-management: minimal configuration
• Unique trait: no single server to shut down/monitor