Prefix Hash Tree an Indexing Data Structure Over Distributed Hash
Total Page:16
File Type:pdf, Size:1020Kb
Prefix Hash Tree An Indexing Data Structure over Distributed Hash Tables Sriram Ramabhadran ∗ Sylvia Ratnasamy University of California, San Diego Intel Research, Berkeley Joseph M. Hellerstein Scott Shenker University of California, Berkeley International Comp. Science Institute, Berkeley and and Intel Research, Berkeley University of California, Berkeley ABSTRACT this lookup interface has allowed a wide variety of Distributed Hash Tables are scalable, robust, and system to be built on top DHTs, including file sys- self-organizing peer-to-peer systems that support tems [9, 27], indirection services [30], event notifi- exact match lookups. This paper describes the de- cation [6], content distribution networks [10] and sign and implementation of a Prefix Hash Tree - many others. a distributed data structure that enables more so- phisticated queries over a DHT. The Prefix Hash DHTs were designed in the Internet style: scala- Tree uses the lookup interface of a DHT to con- bility and ease of deployment triumph over strict struct a trie-based structure that is both efficient semantics. In particular, DHTs are self-organizing, (updates are doubly logarithmic in the size of the requiring no centralized authority or manual con- domain being indexed), and resilient (the failure figuration. They are robust against node failures of any given node in the Prefix Hash Tree does and easily accommodate new nodes. Most impor- not affect the availability of data stored at other tantly, they are scalable in the sense that both la- nodes). tency (in terms of the number of hops per lookup) and the local state required typically grow loga- Categories and Subject Descriptors rithmically in the number of nodes; this is crucial since many of the envisioned scenarios for DHTs C.2.4 [Comp. Communication Networks]: Dis- involve extremely large systems (such as P2P mu- tributed Systems|distributed applications; E.1 [Data sic file sharing). However, DHTs, like the Internet, Structures]: distributed data structures; H.3.1 [Info- deliver "best-effort” semantics; put's and get's are rmation Storage and Retrieval]: Content Anal- likely to succeed, but the system provides no guar- ysis and Indexing|indexing methods antees. As observed by others [36, 5], this conflict between scalability and strict semantics appears General Terms to be inevitable and, for many large-scale Inter- Algorithms, Design, Performance net systems, the former is deemed more important than the latter. Keywords While DHTs have enjoyed some success as a build- distributed hash tables, data structures, range queries ing block for Internet-scale applications, they are seriously deficient in one regard: they only directly 1. INTRODUCTION support exact match queries. Keyword queries can The explosive growth but primitive design of peer- be derived from these exact match queries in a to-peer file-sharing applications such as Gnutella straightforward but inefficient manner; see [25, 20] [7] and KaZaa [29] inspired the research community for applications of this to DHTs. Equality joins to invent Distributed Hash Tables (DHTs) [31, 24, can also be supported within a DHT framework; see [15]. However, range queries, asking for all ob- 14, 26, 22, 23]. Using a structured overlay network, jects with values in a certain range, are particularly DHTs map a given key to the node in the network difficult to implement in DHTs. This is because holding the object associated with that key; this DHTs use hashing to distribute keys uniformly and lookup operation lookup(key) can be used to sup- so can't rely on any structural properties of the key port the canonical put(key,value) and get(key) space, such as an ordering among keys. hash table operations. The broad applicability of ∗email [email protected] Range queries arise quite naturally in a number of Leaf nodes Keys 000* 000001 0 0 1 000100 000100 00100* 001001 1 001010* 001010 0 1 0 1 001010 001010 2 001011* 001011 0 1 0 1 001011 0011* 01* 010000 3 010101 0 1 10* 100010 101011 4 101111 0 1 110* 110000 110010 5 110011 0 1 110110 111* 111000 111010 6 Figure 1: Prefix Hash Tree potential application domains: against data loss when nodes go down1, the failure of any given node in the Prefix Hash Tree does not Databases Peer-to-peer databases [15] need to sup- affect the availability of data stored at other nodes. port SQL-type relational queries in a distributed fashion. Range predicates are a key component in But perhaps the most crucial property of PHT is SQL. that it is built entirely on top of the lookup inter- face, and thus can run over any DHT. That is, PHT Distributed computing Resource discovery requires uses only the lookup(key) operation common to locating resources within certain size ranges in a all DHTs and does not, as in SkipGraph [1] and decentralized manner. other such approaches, assume knowledge of nor require changes to the DHT topology or routing Location-aware computing Many applications want behavior. While designs that rely on such lower- to locate nearby resources (computing, human or layer knowledge and modifications are appropriate commercial) based on a user's current location, for contexts where the DHT is expressly deployed which is essentially a 2-dimensional range query for the purpose of supporting range queries, we ad- based on geographic coordinates. dress the case where one must use a pre-existing DHT. This is particularly important if one wants Scientific computing Parallel N-body computations to make use of publicly available DHT services, [34] require 3-dimensional range queries for accu- such as OpenHash [18]. rate approximations. The remainder of the paper is organized as fol- In this paper, we address the problem of efficiently lows. Section 2 describes the design of the PHT supporting 1-dimensional range queries over a DHT. data structure. Section 3 presents the results of an Our main contribution is a novel trie-based dis- experimental evaluation. Section 4 surveys related tributed data structure called Prefix Hash Tree (hence- work and section 5 concludes. forth abbreviated as PHT) that supports such queries. As a corollary, the PHT can also support heap queries (\what is the maximum/minimum ?"), prox- 2. DATA STRUCTURE This section describes the PHT data structure, along imity queries (\what is the nearest element to X with related algorithms. ?"), and, in a limited way, multi-dimensional ana- logues of the above, thereby greatly expanding the querying facilities of DHTs. PHT is efficient, in 2.1 PHT Description that updates are doubly logarithmic in the size For the sake of simplicity, it is assumed that the do- of the domain being indexed. Moreover, PHT is main being indexed is f0; 1gD, i.e., binary strings self-organizing and load-balanced. PHT also toler- ates failures well; while it cannot by itself protect 1But PHT can take advantage of any replication or other data-preserving technique employed by a DHT. of length D, although the discussion extends nat- in regions of the domain which are sparsely popu- urally to other domains. Therefore, the data set lated. Finally, property 5 ensures that the leaves indexed by the PHT consists of some number N of of the PHT form a doubly linked list, which en- D-bit binary keys. ables sequential traversal of the leaves for answer- ing range queries. In essence, the PHT data structure is a binary trie built over the data set. Each node of the trie is As described this far, the PHT structure is a fairly labeled with a prefix that is defined recursively: routine binary trie. The novelty of PHT lies in given a node with label l, its left and right child how this logical trie is distributed among the peers nodes are labeled l0 and l1 respectively. The root in the network; i.e., in how PHT vertices are as- is labeled with the attribute being indexed, and signed to DHT nodes. This is achieved by hashing downstream nodes are labeled as above. the prefix labels of PHT nodes over the DHT iden- tifier space. A node with label l is thus assigned The following properties are invariant in a PHT. 4 to the peer to which l is mapped by the DHT, i.e., the peer whose identifier is closest to HASH(l). 1. (Universal prefix) Each node has either 0 or 2 This hash-based assignment implies that given a la- children. bel, it is possible to locate its corresponding PHT node via a single DHT lookup. This \direct access" 2. (Key storage) A key K is stored at a leaf node property is unlike the successive link traversals as- whose label is a prefix of K. sociated with typical data structures and results in the PHT having several desirable features that are 3. (Split) Each leaf node stores atmost B keys. discussed subsequently. 4. (Merge) Each internal node contains atleast 2.2 PHT Operations (B + 1) keys in its sub-tree. This section describes algorithms for PHT opera- tions. 5. (Threaded leaves) Each leaf node maintains a pointer to the leaf nodes on its immediate left and and immediate right respectively.2 2.2.1 Lookup Given a key K, a PHT lookup operation returns Property 1 guarantees that the leaf nodes of the the unique leaf node leaf(K) whose label is a prefix PHT form a universal prefix set 3. Consequently, of K. Because there are (D + 1) distinct prefixes given any key K 2 f0; 1gD , there is exactly one leaf of K, there are (D + 1) potential candidates; an obvious algorithm is to perform a linear scan of node leaf(K) whose label is a prefix of K. Prop- these (D + 1) nodes until the required leaf node is erty 2 states that the key K is stored at leaf(K).