Compsci 514: Computer Networks Lecture 13: Distributed Hash Table
Total Page:16
File Type:pdf, Size:1020Kb
CompSci 514: Computer Networks Lecture 13: Distributed Hash Table Xiaowei Yang Overview • What problems do DHTs solve? • How are DHTs implemented? Background • A hash table is a data structure that stores (key, object) pairs. • Key is mapped to a table index via a hash function for fast lookup. • Content distribution networks – Given an URL, returns the object Example of a Hash table: a web cache http://www.cnn.com0 Page content http://www.nytimes.com ……. 1 http://www.slashdot.org ….. … 2 … … … • Client requests http://www.cnn.com • Web cache returns the page content located at the 1st entry of the table. DHT: why? • If the number of objects is large, it is impossible for any single node to store it. • Solution: distributed hash tables. – Split one large hash table into smaller tables and distribute them to multiple nodes DHT K V K V K V K V A content distribution network • A single provider that manages multiple replicas. • A client obtains content from a close replica. Basic function of DHT • DHT is a virtual hash table – Input: a key – Output: a data item • Data Items are stored by a network of nodes. • DHT abstraction – Input: a key – Output: the node that stores the key • Applications handle key and data item association. DHT: a visual example K V K V (K1, V1) K V K V K V Insert (K1, V1) DHT: a visual example K V K V (K1, V1) K V K V K V Retrieve K1 Desired properties of DHT • Scalability: each node does not keep much state • Performance: look up latency is small • Load balancing: no node is overloaded with a large amount of state • Dynamic reconfiguration: when nodes join and leave, the amount of state moved from nodes to nodes is small. • Distributed: no node is more important than others. A straw man design (0, V1) (3, V2) 0 (1, V3) (2, V5) (4, V4) 1 2 (5, V6) • Suppose all keys are intergers • The number of nodes in the network is n. • id = key % n When node 2 dies (0, V1) 0 (2, V5) (4, V4) (1, V3) (3, V2) (5, V6) 1 • A large number of data items need to be rehashed. Fix: consistent hashing • When a node joins or leaves, the expected fraction of objects that must be moved is the minimum needed to maintain a balanced load. • A node is responsible for a range of keys • All DHTs implement consistent hashing Chord: basic idea • Hash both node id and key into a m-bit one-dimension circular identifier space • Consistent hashing: a key is stored at a node whose identifier is closest to the key in the identifier space – Key refers to both the key and its hash value. Basic components of DHTs • Overlapping key and node identifier space – Hash(www.cnn.com/image.jpg) à a n-bit binary string – Nodes that store the objects also have n-bit string as their identifiers • Building routing tables – Next hops – Distance functions – These two determine the geometry of DHTs • Ring, Tree, Hybercubes, hybrid (tree + ring) etc. – Handle node join and leave • Lookup and store interface Chord: ring topology Key 5 Node 105 K5 N105 K20 Circular 7-bit N32 ID space N90 K80 A key is stored at its successor: node with next higher ID Chord: how to find a node that stores a key? • Solution 1: every node keeps a routing table to all other nodes – Given a key, a node knows which node id is successor of the key – The node sends the query to the successor – What are the advantages and disadvantages of this solution? Solution 2: every node keeps a routing entry to the nodes successor (a linked list) N120 N10 Where is key 80? N105 N90 has K80 N32 K80 N90 N60 Simple lookup algorithm Lookup(my-id, key-id) n = my successor if my-id < n < key-id call Lookup(key-id) on node n // next hop else return my successor // done • Correctness depends only on successors • Q1: will this algorithm miss the real successor? • Q2: whats the average # of lookup hops? Solution 3: Finger table allows log(N)-time lookups ¼ ½ 1/8 1/16 1/32 1/64 1/128 N80 • Analogy: binary search Finger i points to successor of n+2i-1 N120 112 ¼ ½ 1/8 1/16 1/32 1/64 1/128 N80 • A finger table entry includes Chord Id and IP address • Each node stores a small table log(N) Chord finger table example 1 [1,2) 1 Keys: 2 [2,4) 3 5,6 4 [4,0) 0 0 2 [2,3) 3 Keys: 7 1 3 [3,5) 3 1 5 [5,1) 0 6 2 4 [4,5) 0 5 3 Keys: 4 5 [5,7) 0 2 7 [7,3) 0 Lookup with fingers Lookup(my-id, key-id) look in local finger table for highest node n s.t. my-id < n < key-id if n exists call Lookup(key-id) on node n // next hop else return my successor // done 5 N1 lookup(K54) // ask node n to findthe successor of id N8 n.find successor(id) K54 N56 if (id ∈ (n, successor]) N51 return successor; N14 else N48 // forward the query around the circle return successor.findsuccessor(id); N21 N42 (a) N38 N32 (b) Fig. 3. (a) Simple (but slow) pseudocode to findthe successor node of an identifi er id. Remote procedure calls and variable lookups are preceded by the remote node. (b) The path taken by a query from node 8 for key 54, using the pseudocode in Figure 3(a). N1 N1 Finger table lookup(54) N8 +1 N8 + 1 N14 N8 N8 + 2 N14 K54 N56 +2 N8 + 4 N14 N8 + 8 N21 N51 +4 N51 +32 +8 N14 N8 +16 N32 N14 N8 +32 N42 N48 +16 N48 N21 N42 N21 N42 N38 N38 N32 N32 (a) (b) Fig. 4. (a) The fingertable entries for node 8. (b) The path a query for key 54 starting at node 8, using the algorithm in Figure 5. // ask node n to findthe successor of id Notation Defi nition n.find successor(id) finger[k] first node on circle that succeeds (n + if (id ∈ (n, successor]) 2k−1) mod 2m, 1 ≤ k ≤ m return successor; else successor the next node on the identifier circle; n′ = closest preceding node(id); finger[1].node return n′.findsuccessor(id); predecessor the previous node on the identifier circle // search the local table for the highest predecessor of id n.closest preceding node(id) TABLE I for i = m downto 1 Defi nition of variables for node n, using m-bit identifi ers. if (finger[i] ∈ (n, id)) return finger[i]; return n; Fig. 5. Scalable key lookup using the fingertable. The example in Figure 4(a) shows the finger table of node 8. The first finger of node 8 points to node 14, as node 14 is the 0 6 first node that succeeds (8 + 2 ) mod 2 = 9. Similarly, the last tion, extended to use finger tables. If id falls between n and finger of node 8 points to node 42, as node 42 is the first node its successor, find successor is finished and node n returns its 5 6 that succeeds (8 + 2 ) mod 2 = 40. successor. Otherwise, n searches its finger table for the node This scheme has two important characteristics. First, each n′ whose ID most immediately precedes id, and then invokes node stores information about only a small number of other find successor at n′. The reason behind this choice of n′ is that nodes, and knows more about nodes closely following it on the the closer n′ is to id, the more it will know about the identifier identifier circle than about nodes farther away. Second, a node’s circle in the region of id. finger table generally does not contain enough information to As an example, consider the Chord circle in Figure 4(b), and directly determine the successor of an arbitrary key k. For ex- suppose node 8 wants to find the successor of key 54. Since the ample, node 8 in Figure 4(a) cannot determine the successor of largest finger of node 8 that precedes 54 is node 42, node 8 will key 34 by itself, as this successor (node 38) does not appear in ask node 42 to resolve the query. In turn, node 42 will determine node 8’s finger table. the largest finger in its finger table that precedes 54, i.e., node Figure 5 shows the pseudocode of the find successor opera- 51. Finally, node 51 will discover that its own successor, node Chord lookup example • Lookup(1,6) 1 [1,2) 1 Keys: 5,6 • Lookup(1,2) 2 [2,4) 3 4 [4,0) 0 0 2 [2,3) 3 Keys: 7 1 3 [3,5) 3 1 5 [5,1) 0 6 2 4 [4,5) 0 5 3 Keys: 4 5 [5,7) 0 2 7 [7,3) 0 Node join • Maintain the invariant 1.Each nodes successor is correctly maintained 2.For every node k, node successor(k) answers for k. Its desiraBle that finger taBle entries are correct • Each nodes maintains a predecessor pointer • Tasks: – Initialize predecessor and fingers of new node – Update existing nodes state – Notify apps to transfer state to new node Chord Joining: linked list insert N25 N36 1. Lookup(36) N40 K30 K38 • Node n queries a known node n to initialize its state • for its successor: lookup (n) Join (2) N25 2.