Ranked Queries in Index Data Structures

Total Pages: 16

File Type: PDF, Size: 1020 KB

Ranked Queries in Index Data Structures

Università degli Studi di Pisa
Dipartimento di Informatica
Dottorato di Ricerca in Informatica

Ph.D. Thesis
Iwona Bialynicka-Birula
Supervisor: Roberto Grossi
October 15, 2008

Abstract

A ranked query is a query which returns the top-ranking elements of a set, sorted by rank, where the rank corresponds to some preference function defined on the items of the set. This thesis investigates the problem of adding rank query capabilities to several index data structures on top of their existing functionality. Among the data structures investigated are suffix trees, range trees, and hierarchical data structures. We explore the problem of additionally specifying rank when querying these data structures. For example, in the case of suffix trees, we would like to obtain not all of the occurrences of a substring in a text or in a set of documents, but only the most preferable results. Most importantly, the cost of such a query must be proportional to the number of preferable results and not to the total number of occurrences, which can be too many to process efficiently.

First, we introduce the concept of rank-sensitive data structures. Rank-sensitive data structures are defined by analogy to output-sensitive data structures. Output-sensitive data structures are capable of reporting the items satisfying an online query in time linear in the number of items returned plus a sublinear function of the number of items stored. Rank-sensitive data structures are additionally given a ranking of the items, and only the top k best-ranking items are reported at query time, sorted in rank order. The query must remain linear only with respect to the number of items returned, which this time is not a function of the elements satisfying the query, but of the parameter k given at query time. We explore several ways of adding rank-sensitivity to different data structures and the different trade-offs that this incurs.

Adding rank to an index query can be viewed as adding an additional dimension to the indexed data set. Therefore, ranked queries can be viewed as multi-dimensional range queries, with the notable difference that we additionally want to maintain an ordering along one of the dimensions. Most range data structures do not maintain such an order, with the exception of the Cartesian tree. The Cartesian tree has multiple applications in range searching and other fields, but is rarely used for indexing due to its rigid structure, which makes it difficult to use with dynamic content. The second part of this work deals with overcoming this rigidness and describes the first efficient dynamic version of the Cartesian tree.

Acknowledgments

I would like to thank everybody at the University of Pisa, especially my advisor Professor Roberto Grossi, Professor Andrea Maggiolo Schettini, and Professor Pierpaolo Degano, who made my studies at the University of Pisa and this work possible. I would also like to thank Professor Anna Karlin for granting me guest access to the courses at the University of Washington. Last but not least, I would like to thank my family (Iwo Białynicki-Birula, Zofia Białynicka-Birula, Piotr Białynicki-Birula, Stefania Kube) and my husband (Vittorio Bertocci), without whose continuing faith and support this work would not have been possible.

Contents

Introduction

I Background
1 State of the Art
  1.1 Priority search trees
    1.1.1 Complexity
    1.1.2 Construction
    1.1.3 Three-sided query
    1.1.4 Dynamic priority search trees
    1.1.5 Priority search trees and ranked queries
  1.2 Cartesian trees
    1.2.1 Definition
    1.2.2 Construction
  1.3 Weight-balanced B-trees
    1.3.1 B-trees
    1.3.2 Weight-balanced B-trees
2 Base data structures
  2.1 Suffix trees
    2.1.1 Complexity
    2.1.2 Construction
    2.1.3 Applications
  2.2 Range trees
    2.2.1 Definition
    2.2.2 Space complexity
    2.2.3 Construction
    2.2.4 Orthogonal range search query
    2.2.5 Fractional cascading
    2.2.6 Dynamic operations on range trees
    2.2.7 Applications

II Making structures rank-sensitive
3 Introduction and definitions
4 Adding rank-sensitivity to suffix trees
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Light and heavy edges
    4.2.2 Rank-predecessor and rank-successor
  4.3 Approach
    4.3.1 Naive solution with quadratic space
    4.3.2 Naive solution with non-optimal time
    4.3.3 Rank-tree solution
  4.4 Ranked tree structure
    4.4.1 Leaf lists
    4.4.2 Leaf arrays
    4.4.3 Node arrays
  4.5 Ranked tree algorithms
    4.5.1 Top-k query
  4.6 Complexity
    4.6.1 Space complexity
    4.6.2 Time complexity
  4.7 Experimental results
    4.7.1 Structure construction algorithm
    4.7.2 Query time results
    4.7.3 Structure size results
  4.8 An alternative approach using Cartesian trees
    4.8.1 Data structure
    4.8.2 Query algorithm
    4.8.3 Discussion
5 Rank-sensitivity — a general framework
  5.1 Introduction and definitions
  5.2 The static case and its dynamization
    5.2.1 Static case on a single interval
    5.2.2 Polylog intervals in the dynamic case
  5.3 Multi-Q-heaps
    5.3.1 High-level implementation
    5.3.2 Multi-Q-heap: Representation
    5.3.3 Multi-Q-heap: Supported operations
    5.3.4 Multi-Q-heap: Lookup tables
    5.3.5 General case
  5.4 Conclusions

III Dynamic Cartesian trees
  5.5 Introduction
    5.5.1 Background
    5.5.2 Motivation
    5.5.3 Our results
  5.6 Preliminaries
    5.6.1 Properties of Cartesian trees
    5.6.2 Dynamic operations on Cartesian trees
  5.7 Bounding the number of edge modifications
  5.8 Implementing the insertions
    5.8.1 The companion interval tree
    5.8.2 Searching cost
    5.8.3 Restructuring cost
  5.9 Implementing deletions
  5.10 Conclusions

Conclusions
Index
Bibliography

List of Tables

1.1 Operations supported with the use of a priority search tree and their corresponding complexities.
4.1 A table for the quick answering of rank-successor queries computed for the tree in Figure 4.1. A cell in column i and row j identifies the rank-successor of leaf i at the ancestor of i whose depth is j. For clarity, leaves are labeled with their rank.
4.2 Table 4.1 with only the distinct rank-successors of each leaf distinguished.
4.3 The average ratio of the number of executions of the inner loop to k, depending on the suffix tree size and shape.
4.4 The query time in milliseconds depending on k.
4.5 Structure size and construction time (in milliseconds) depending on the length and type of the indexed text.

List of Figures

1.1 A sample priority search tree.
1.2 A sample dynamic priority search tree.
1.3 An example of a Cartesian tree. The nodes storing points ⟨x, y⟩ are located at coordinates (x, y) in the Cartesian plane. The edges e = (⟨x_L, y_L⟩, ⟨x_R, y_R⟩) are depicted as lines connecting the two coordinates (x_L, y_L) and (x_R, y_R). The tree induces a subdivision of the Cartesian plane, which is illustrated by dotted lines.
2.1 Suffix tree for the string senselessness$. Longer substrings are represented using start and end positions of the substring in the text. Leaves contain starting positions of their corresponding suffixes.
2.2 Range tree for the set of points {(1, 15), (2, 5), (3, 8), (4, 2), (5, 1), (6, 6), (7, 14), (8, 11), (9, 16), (10, 13), (11, 4), (12, 3), (13, 12), (14, 10), (15, 7), (16, 9)}. For clarity, the first coordinate has been omitted in the second-level tree nodes. The query 1 ≤ x ≤ 13, 2 ≤ y ≤ 11 is considered. The path of the query is marked using bold edges. The leaves and the roots of the second-level nodes considered for the query are marked with black dots. Finally, the ranges of points reported are indicated using gray rectangles.
2.3 The second-level structures of the range tree in Figure 2.2 linked using fractional cascading. For clarity, the first-level nodes are not depicted — they are the same as the ones in Figure 2.2. The bold lines indicate the fractional cascading pointers following the paths of the query in the primary tree. The bold dashed lines indicate the fractional cascading pointers followed from a node on the main query path to its child whose secondary structure needs to be queried.
4.1 A sample tree with the leaf rank indicated inside the leaf symbols. Heavy edges […]
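To make the rank-sensitive queries described in the abstract concrete, the following toy Python sketch takes the simplest possible route, in the spirit of the "naive solution with quadratic space" listed in the contents (Section 4.3.1). That reading of the section title, the suffix-trie representation, and all names below are mine, not the thesis's rank-tree structure: every node of a suffix trie keeps its occurrence list pre-sorted by rank, so a top-k query reports the k best occurrences in rank order without touching the rest.

```python
# Toy rank-sensitive substring index: a suffix trie in which every node keeps
# its occurrence list pre-sorted by rank, so a top-k query costs roughly
# O(|pattern| + k) after the walk. The space is quadratic in the text length,
# which is exactly the trade-off the thesis works to avoid.

def build_index(text, rank):
    # rank[i] = preference of the occurrence starting at i (smaller = better)
    root = {"children": {}, "occ": []}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "occ": []})
            node["occ"].append(i)
    # sort every occurrence list once, by rank
    stack = [root]
    while stack:
        node = stack.pop()
        node["occ"].sort(key=lambda i: rank[i])
        stack.extend(node["children"].values())
    return root

def top_k(root, pattern, k):
    node = root
    for ch in pattern:
        if ch not in node["children"]:
            return []
        node = node["children"][ch]
    return node["occ"][:k]          # best k occurrences, already in rank order

text = "senselessness"
rank = [abs(len(text) // 2 - i) for i in range(len(text))]  # prefer central positions
print(top_k(build_index(text, rank), "s", 3))               # [7, 8, 3]
```

The catch, as the abstract points out, is the quadratic space; the thesis's contribution is achieving rank-sensitivity without paying that price.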
Recommended publications
  • Interval Trees Storing and Searching Intervals
    Interval Trees: Storing and Searching Intervals
    • Instead of points, suppose you want to keep track of axis-aligned segments.
    • Range queries: return all segments that have any part of them inside the rectangle.
    • Motivation: wiring diagrams, genes on genomes.
    Simpler problem: 1-d intervals
    • Segments with at least one endpoint in the rectangle can be found by building a 2d range tree on the 2n endpoints.
      - Keep a pointer from each endpoint stored in the tree to its segment.
      - Mark segments as you output them, so that you don't output contained segments twice.
    • Segments with no endpoints in range are the harder part.
      - Consider just horizontal segments.
      - They must cross a vertical side of the region.
      - Leads to a subproblem: given a vertical line, find the segments that it crosses.
      - (y-coordinates become irrelevant for this subproblem.)
    Interval trees
    Recursively build a tree on the interval set S as follows: sort the 2n endpoints and let x_mid be the median point. Store the intervals that cross x_mid in node N, the intervals that lie completely to the left of x_mid in N_left, and the intervals that lie completely to the right of x_mid in N_right.
    Interval trees, continued
    • Will be approximately balanced, because by choosing the median we split the set of endpoints in half each time.
      - Depth is O(log n).
    • Have to store x_mid with each node.
    • Uses O(n) storage:
      - each interval is stored once, plus
      - there are fewer than n nodes (each node contains at least one interval).
    • Can be built in O(n log n) time.
    • Can be searched in O(log n + k) time [k = # […]
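As a rough illustration of the recursive construction and the stabbing-query subproblem described in the excerpt, here is a minimal Python sketch; the class name IntervalNode, the method stab, and the sample segments are all made up for this example, not taken from the lecture.

```python
# Minimal interval-tree sketch for stabbing queries ("which intervals cross x?").
# Intervals are (lo, hi) pairs with lo <= hi; this handles only the 1-d subproblem.

class IntervalNode:
    def __init__(self, intervals):
        endpoints = sorted(p for iv in intervals for p in iv)
        self.x_mid = endpoints[len(endpoints) // 2]            # median endpoint
        crossing = [iv for iv in intervals if iv[0] <= self.x_mid <= iv[1]]
        left     = [iv for iv in intervals if iv[1] <  self.x_mid]
        right    = [iv for iv in intervals if iv[0] >  self.x_mid]
        # Store the crossing intervals twice, sorted by left and by right
        # endpoint, so a stabbing query can stop early.
        self.by_left  = sorted(crossing, key=lambda iv: iv[0])
        self.by_right = sorted(crossing, key=lambda iv: iv[1], reverse=True)
        self.left  = IntervalNode(left)  if left  else None
        self.right = IntervalNode(right) if right else None

    def stab(self, x, out):
        if x <= self.x_mid:
            # report crossing intervals whose left endpoint is <= x
            for iv in self.by_left:
                if iv[0] <= x:
                    out.append(iv)
                else:
                    break
            if self.left:
                self.left.stab(x, out)
        else:
            # report crossing intervals whose right endpoint is >= x
            for iv in self.by_right:
                if iv[1] >= x:
                    out.append(iv)
                else:
                    break
            if self.right:
                self.right.stab(x, out)

segments = [(1, 4), (2, 9), (5, 7), (6, 11), (10, 12)]
tree = IntervalNode(segments)
hits = []
tree.stab(6, hits)                  # intervals crossing the vertical line x = 6
print(sorted(hits))                 # [(2, 9), (5, 7), (6, 11)]
```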
  • Balanced Trees Part One
    Balanced Trees, Part One
    Balanced Trees
    ● Balanced search trees are among the most useful and versatile data structures.
    ● Many programming languages ship with a balanced tree library.
      ● C++: std::map / std::set
      ● Java: TreeMap / TreeSet
    ● Many advanced data structures are layered on top of balanced trees.
    ● We'll see several later in the quarter!
    Where We're Going
    ● B-Trees (Today): a simple type of balanced tree developed for block storage.
    ● Red/Black Trees (Today/Thursday): the canonical balanced binary search tree.
    ● Augmented Search Trees (Thursday): adding extra information to balanced trees to supercharge the data structure.
    Outline for Today
    ● BST Review: refresher on basic BST concepts and runtimes.
    ● Overview of Red/Black Trees: what we're building toward.
    ● B-Trees and 2-3-4 Trees: simple balanced trees, in depth.
    ● Intuiting Red/Black Trees: a much better feel for red/black trees.
    A Quick BST Review
    Binary Search Trees [illustrated on a sample tree with keys 1–15 rooted at 9]
    ● A binary search tree is a binary tree with the following properties:
      ● Each node in the BST stores a key, and optionally, some auxiliary information.
      ● The key of every node in a BST is strictly greater than all keys to its left and strictly smaller than all keys to its right.
    ● The height of a binary search tree is the length of the longest path from the root to a leaf, measured in the number of edges.
    ● A tree with one node has height 0.
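To make the two definitions in the excerpt concrete, here is a tiny unbalanced-BST sketch in Python with a height computation; the names are mine and no balancing is attempted, so this is illustrative only.

```python
# Tiny unbalanced BST: insert keys and compute the height
# (longest root-to-leaf path, counted in edges).

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                       # duplicate keys are ignored

def height(root):
    if root is None:
        return -1                     # empty tree; a single node has height 0
    return 1 + max(height(root.left), height(root.right))

root = None
for k in [9, 5, 13, 1, 6, 10, 14]:    # a few keys from the sample tree mentioned above
    root = insert(root, k)
print(height(root))                   # 2 for this insertion order
```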
  • Augmentation: Range Trees (PDF)
    Lecture 9: Augmentation (6.046J, Spring 2015)
    This lecture covers augmentation of data structures, including
    • easy tree augmentation,
    • order-statistics trees,
    • finger search trees, and
    • range trees.
    The main idea is to modify "off-the-shelf" common data structures to store (and update) additional information.
    Easy Tree Augmentation. The goal here is to store x.f at each node x, which is a function of the node, namely f(subtree rooted at x). Suppose x.f can be computed (updated) in O(1) time from x, its children, and the children's f values. Then modifying a set S of nodes costs O(# of ancestors of S) to update x.f, because we need to walk up the tree to the root. Two examples of O(lg n) updates are
    • AVL trees: after rotating two nodes, first update the new bottom node and then update the new top node;
    • 2-3 trees: after splitting a node, update the two new nodes.
    In both cases, then update up the tree.
    Order-Statistics Trees (from 6.006). The goal of order-statistics trees is to design an Abstract Data Type (ADT) interface that supports the following operations:
    • insert(x), delete(x), successor(x),
    • rank(x): find x's index in the sorted order, i.e., the # of elements < x,
    • select(i): find the element with rank i.
    We can implement the above ADT using easy tree augmentation on AVL trees (or 2-3 trees) to store the subtree size: f(subtree) = # of nodes in it. Then we also have x.size = 1 + the sum of c.size over c in x.children. A sketch of this size augmentation follows the excerpt.
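Here is a minimal sketch of the subtree-size augmentation mentioned above, written on a plain unbalanced BST for brevity (the lecture uses AVL or 2-3 trees; the function names and the lack of rebalancing are my simplifications).

```python
# Order-statistics sketch: each node stores the size of its subtree,
# so rank(x) and select(i) take time proportional to the tree height.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(n):
    return n.size if n else 0

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    root.size = 1 + size(root.left) + size(root.right)  # x.size = 1 + sum of children sizes
    return root

def rank(root, key):
    """Number of keys strictly smaller than `key`."""
    if root is None:
        return 0
    if key <= root.key:
        return rank(root.left, key)
    return size(root.left) + 1 + rank(root.right, key)

def select(root, i):
    """Return the key with rank i (0-based), assuming 0 <= i < size(root)."""
    left = size(root.left)
    if i < left:
        return select(root.left, i)
    if i == left:
        return root.key
    return select(root.right, i - left - 1)

root = None
for k in [41, 20, 65, 11, 29, 50, 91]:
    root = insert(root, k)
print(rank(root, 50))   # 4 keys are smaller than 50
print(select(root, 4))  # the element with rank 4 is 50
```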
  • Advanced Data Structures
    Advanced Data Structures
    PETER BRASS, City College of New York
    CAMBRIDGE UNIVERSITY PRESS
    Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo. Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK. Published in the United States of America by Cambridge University Press, New York. www.cambridge.org. Information on this title: www.cambridge.org/9780521880374. © Peter Brass 2008. This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2008. ISBN-13 978-0-511-43685-7 eBook (EBL); ISBN-13 978-0-521-88037-4 hardback. Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
    Contents
    Preface
    1 Elementary Structures
      1.1 Stack
      1.2 Queue
      1.3 Double-Ended Queue
      1.4 Dynamical Allocation of Nodes
      1.5 Shadow Copies of Array-Based Structures
    2 Search Trees
      2.1 Two Models of Search Trees
      2.2 General Properties and Transformations
      2.3 Height of a Search Tree
      2.4 Basic Find, Insert, and Delete
      2.5 Returning from Leaf to Root
      2.6 Dealing with Nonunique Keys
      2.7 Queries for the Keys in an Interval
      2.8 Building Optimal Search Trees
      2.9 Converting Trees into Lists
      2.10 […]
  • Search Trees
    Lecture III
    "Trees are the earth's endless effort to speak to the listening heaven." – Rabindranath Tagore, Fireflies, 1928
    Alice was walking beside the White Knight in Looking Glass Land. "You are sad," the Knight said in an anxious tone: "let me sing you a song to comfort you." "Is it very long?" Alice asked, for she had heard a good deal of poetry that day. "It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it - either it brings tears to their eyes, or else -" "Or else what?" said Alice, for the Knight had made a sudden pause. "Or else it doesn't, you know. The name of the song is called 'Haddocks' Eyes.'" "Oh, that's the name of the song, is it?" Alice said, trying to feel interested. "No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged, Aged Man.'" "Then I ought to have said 'That's what the song is called'?" Alice corrected herself. "No you oughtn't: that's another thing. The song is called 'Ways and Means', but that's only what it's called, you know!" "Well, what is the song then?" said Alice, who was by this time completely bewildered. "I was coming to that," the Knight said. "The song really is 'A-sitting On a Gate': and the tune's my own invention." So saying, he stopped his horse and let the reins fall on its neck: then slowly beating time with one hand, and with a faint smile lighting up his gentle, foolish face, he began...
  • Computational Geometry: 1D Range Tree, 2D Range Tree, Line
    Computational Geometry, Lecture 17
    Computational geometry: algorithms for solving "geometric problems" in 2D and higher. Fundamental objects: point, line segment, line. Basic structures: point set, polygon, triangulation, convex hull.
    Orthogonal range searching
    • Input: n points in d dimensions, e.g., representing a database of n records each with d numeric fields.
    • Query: an axis-aligned box (in 2D, a rectangle). Report on the points inside the box: Are there any points? How many are there? List the points.
    • Goal: preprocess the points into a data structure to support fast queries. The primary goal is a static data structure; in 1D, we will also obtain a dynamic data structure supporting insert and delete.
    1D range searching
    In 1D, the query is an interval.
    • First solution using ideas we know: interval trees. Represent each point x by the interval [x, x]. This gives a dynamic structure that can list the k answers to a query in O(k lg n) time.
    • Second solution using ideas we know: sort the points and store them in an array; solve a query by binary search on the endpoints. This gives a static structure that can list the k answers to a query in O(k + lg n) time.
    • Goal: obtain a dynamic structure that can list the k answers to a query in O(k + lg n) time.
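A quick sketch of the second 1D solution described in the excerpt (sort the points once, then binary-search both endpoints of the query interval), using Python's bisect module; the sample values and names are mine.

```python
# Static 1-d range reporting: sort once, then answer [lo, hi] queries in
# O(log n + k) time by binary-searching both endpoints.
import bisect

points = sorted([15, 71, 3, 7, 9, 21, 23, 25, 45, 70, 72, 100, 120])

def range_report(pts, lo, hi):
    left = bisect.bisect_left(pts, lo)    # first index with pts[i] >= lo
    right = bisect.bisect_right(pts, hi)  # first index with pts[i] > hi
    return pts[left:right]                # the k answers, already sorted

print(range_report(points, 2, 23))        # [3, 7, 9, 15, 21, 23]
```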
  • Position Heaps for Cartesian-Tree Matching on Strings and Tries
    Position Heaps for Cartesian-tree Matching on Strings and Tries
    Akio Nishimoto (1), Noriki Fujisato (1), Yuto Nakashima (1), and Shunsuke Inenaga (1,2)
    (1) Department of Informatics, Kyushu University, Japan ({nishimoto.akio, noriki.fujisato, yuto.nakashima, [email protected]})
    (2) PRESTO, Japan Science and Technology Agency, Japan
    Abstract. Cartesian-tree pattern matching is a recently introduced scheme of pattern matching that detects fragments in a sequential data stream which have a similar structure to a query pattern. Formally, Cartesian-tree pattern matching seeks all substrings S′ of the text string S such that the Cartesian tree of S′ and that of a query pattern P coincide. In this paper, we present a new indexing structure for this problem, called the Cartesian-tree Position Heap (CPH). Let n be the length of the input text string S, m the length of a query pattern P, and σ the alphabet size. We show that the CPH of S, denoted CPH(S), supports pattern matching queries in O(m(σ + log(min{h, m})) + occ) time with O(n) space, where h is the height of the CPH and occ is the number of pattern occurrences. We show how to build CPH(S) in O(n log σ) time with O(n) working space. Further, we extend the problem to the case where the text is a labeled tree (i.e. a trie). Given a trie T with N nodes, we show that the CPH of T, denoted CPH(T), supports pattern matching queries on the trie in O(m(σ² + log(min{h, m})) + occ) time with O(Nσ) space.
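For intuition about what is being matched, the sketch below builds the Cartesian tree of each window with the standard stack construction and compares windows of the text against the pattern by tree shape. This is only a naive illustration of the matching criterion, under the usual leftmost-minimum tie-breaking assumption; it is not the CPH index from the paper, and every name in it is mine.

```python
# Build the Cartesian tree of a sequence as a parent array: the root is the
# (leftmost) minimum, and the property is applied recursively to both sides.

def cartesian_parents(seq):
    parent = [-1] * len(seq)
    stack = []                      # indices whose values increase along the stack
    for i, v in enumerate(seq):
        last = -1
        while stack and seq[stack[-1]] > v:   # strict '>' keeps leftmost minima on top
            last = stack.pop()
        if last != -1:
            parent[last] = i        # the popped subtree hangs to the left of i
        if stack:
            parent[i] = stack[-1]   # i becomes a right descendant of the stack top
        stack.append(i)
    return parent

def tree_shape(seq):
    """Canonical shape: each position's parent offset within the window."""
    par = cartesian_parents(seq)
    return tuple(p - i if p != -1 else None for i, p in enumerate(par))

def cartesian_matches(text, pattern):
    m = len(pattern)
    target = tree_shape(pattern)
    return [i for i in range(len(text) - m + 1)
            if tree_shape(text[i:i + m]) == target]

text = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
pattern = [2, 1, 3]                              # "unique minimum in the middle" shape
print(cartesian_matches(text, pattern))          # [0, 2, 5]
```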
  • Range Searching
    Range Searching (Subhash Suri, UC Santa Barbara)
    • Data structure for a set of objects (points, rectangles, polygons) for efficient range queries. [figure: a query rectangle Q in the xy-plane]
    • Depends on the type of objects and queries. Consider basic data structures with broad applicability.
    • Time-space tradeoff: the more we preprocess and store, the faster we can solve a query.
    • Consider data structures with (nearly) linear space.
    Orthogonal Range Searching
    • Fix an n-point set P. It has 2^n subsets. How many are possible answers to geometric range queries? [figure: six labeled points; some impossible rectangular ranges are (1,2,3), (1,4), (2,5,6), while the range (1,5,6) is possible]
    • Efficiency comes from the fact that only a small fraction of subsets can be formed.
    • Orthogonal range searching deals with point sets and axis-aligned rectangle queries.
    • These generalize 1-dimensional sorting and searching, and the data structures are based on compositions of 1-dim structures.
    1-Dimensional Search
    • Points in 1D: P = {p_1, p_2, ..., p_n}.
    • Queries are intervals. [figure: sample points 3, 7, 9, 15, 21, 23, 25, 45, 70, 71, 72, 100, 120 on a line]
    • If the range contains k points, we want to solve the problem in O(log n + k) time.
    • Does hashing work? Why not?
    • A sorted array achieves this bound. But it doesn't extend to higher dimensions.
    • Instead, we use a balanced binary tree.
    Tree Search [figure: a balanced binary tree on the keys 1, 3, 4, 7, 9, 12, 14, 15, 17, 20, 22, 24, 25, 27, 29, 31, with query endpoints x_lo = 2 and x_hi = 23 descending to leaves u and v]
    • Build a balanced binary tree on the sorted list of points (keys).
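The tree-search idea in the last bullets can be sketched as follows: build a balanced BST from the sorted keys and descend only into subtrees that can intersect [x_lo, x_hi]. The code and names are mine; the keys and query are taken from the figure placeholder above.

```python
# Balanced BST over sorted keys, plus a range query that only descends into
# subtrees that can intersect [lo, hi]; it touches O(log n + k) nodes.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build(sorted_keys):
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))

def report(node, lo, hi, out):
    if node is None:
        return
    if lo < node.key:               # the left subtree may contain keys >= lo
        report(node.left, lo, hi, out)
    if lo <= node.key <= hi:
        out.append(node.key)
    if node.key < hi:               # the right subtree may contain keys <= hi
        report(node.right, lo, hi, out)

keys = [1, 3, 4, 7, 9, 12, 14, 15, 17, 20, 22, 24, 25, 27, 29, 31]
root = build(keys)
answer = []
report(root, 2, 23, answer)
print(answer)   # [3, 4, 7, 9, 12, 14, 15, 17, 20, 22], in sorted order
```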
  • I/O-Efficient Spatial Data Structures for Range Queries
    I/O-Efficient Spatial Data Structures for Range Queries
    Lars Arge, Kasper Green Larsen
    MADALGO*, Department of Computer Science, Aarhus University, Denmark. E-mail: [email protected], [email protected]
    1 Introduction. Range reporting is one of the most fundamental topics in spatial databases and computational geometry. In this class of problems, the input consists of a set of geometric objects, such as points, line segments, rectangles, etc. The goal is to preprocess the input set into a data structure, such that given a query range, one can efficiently report all input objects intersecting the range. The ranges most commonly considered are axis-parallel rectangles, halfspaces, points, simplices and balls. In this survey, we focus on the planar orthogonal range reporting problem in the external memory model of Aggarwal and Vitter [2]. Here the input consists of a set of N points in the plane, and the goal is to support reporting all points inside an axis-parallel query rectangle. We use B to denote the disk block size in number of points. The cost of answering a query is measured in the number of I/Os performed, and the space of the data structure is measured in the number of disk blocks occupied; hence linear space is O(N/B) disk blocks.
    Outline. In Section 2, we set out by reviewing the classic B-tree for solving one-dimensional orthogonal range reporting, i.e. given N points on the real line and a query interval q = [q_1, q_2], report all T points inside q. In Section 3 we present optimal solutions for planar orthogonal range reporting, and finally in Section 4, we briefly discuss related range searching problems.
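As a toy way to make the I/O cost accounting above concrete, the sketch below stores N sorted points in blocks of B and counts the blocks read while answering an interval query. It is only an illustration of the O(T/B)-style reporting cost (plus the search cost that a B-tree would reduce to O(log_B N)), not any of the survey's data structures, and all names are mine.

```python
# Toy I/O accounting: points are stored sorted in blocks of B. A query
# [q1, q2] reads only the blocks that overlap the answer, so reporting T
# points touches about T/B + O(1) blocks.
import bisect

B = 4                                         # block size, in points
points = sorted(range(0, 100, 3))             # N sorted points
blocks = [points[i:i + B] for i in range(0, len(points), B)]
block_min = [blk[0] for blk in blocks]        # "routing keys", one per block

def range_report(q1, q2):
    reads = 0
    out = []
    b = max(0, bisect.bisect_right(block_min, q1) - 1)   # first candidate block
    while b < len(blocks) and blocks[b][0] <= q2:
        reads += 1                             # one disk read per block touched
        out.extend(p for p in blocks[b] if q1 <= p <= q2)
        b += 1
    return out, reads

ans, ios = range_report(20, 50)
print(len(ans), "points reported using", ios, "block reads")   # 10 points, 4 reads
```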
  • Parallel Range, Segment and Rectangle Queries with Augmented Maps
    Parallel Range, Segment and Rectangle Queries with Augmented Maps Yihan Sun Guy E. Blelloch Carnegie Mellon University Carnegie Mellon University [email protected] [email protected] Abstract The range, segment and rectangle query problems are fundamental problems in computational geometry, and have extensive applications in many domains. Despite the significant theoretical work on these problems, efficient implementations can be complicated. We know of very few practical implementations of the algorithms in parallel, and most implementations do not have tight theoretical bounds. In this paper, we focus on simple and efficient parallel algorithms and implementations for range, segment and rectangle queries, which have tight worst-case bound in theory and good parallel performance in practice. We propose to use a simple framework (the augmented map) to model the problem. Based on the augmented map interface, we develop both multi-level tree structures and sweepline algorithms supporting range, segment and rectangle queries in two dimensions. For the sweepline algorithms, we also propose a parallel paradigm and show corresponding cost bounds. All of our data structures are work-efficient to build in theory (O(n log n) sequential work) and achieve a low parallel depth (polylogarithmic for the multi-level tree structures, and O(n^ε) for sweepline algorithms). The query time is almost linear to the output size. We have implemented all the data structures described in the paper using a parallel augmented map library. Based on the library each data structure only requires about 100 lines of C++ code. We test their performance on large data sets (up to 10^8 elements) and a machine with 72-cores (144 hyperthreads).
  • Lecture Notes of CSCI5610 Advanced Data Structures
    Lecture Notes of CSCI5610 Advanced Data Structures
    Yufei Tao, Department of Computer Science and Engineering, Chinese University of Hong Kong
    July 17, 2020
    Contents
    1 Course Overview and Computation Models
    2 The Binary Search Tree and the 2-3 Tree
      2.1 The binary search tree
      2.2 The 2-3 tree
      2.3 Remarks
    3 Structures for Intervals
      3.1 The interval tree
      3.2 The segment tree
      3.3 Remarks
    4 Structures for Points
      4.1 The kd-tree
      4.2 A bootstrapping lemma
      4.3 The priority search tree
      4.4 The range tree
      4.5 Another range tree with better query time
      4.6 Pointer-machine structures
      4.7 Remarks
    5 Logarithmic Method and Global Rebuilding
      5.1 Amortized update cost
      5.2 Decomposable problems
      5.3 The logarithmic method
      5.4 Fully dynamic kd-trees with global rebuilding
      5.5 Remarks
    6 Weight Balancing
      6.1 BB[α]-trees
      6.2 Insertion
      6.3 Deletion
      6.4 Amortized analysis
      6.5 Dynamization with weight balancing
      6.6 Remarks
    7 Partial Persistence
      7.1 The potential method
      7.2 Partially persistent BST
      7.3 General pointer-machine structures
      7.4 Remarks
    8 Dynamic Perfect Hashing
      8.1 Two random graph results
      8.2 Cuckoo hashing
      8.3 Analysis
      8.4 Remarks
    9 Binomial and Fibonacci Heaps
      9.1 The binomial heap […]
  • CMSC 420: Lecture 7 Randomized Search Structures: Treaps and Skip Lists
    CMSC 420, Lecture 7 (Dave Mount): Randomized Search Structures: Treaps and Skip Lists
    Randomized Data Structures: A common design technique in the field of algorithm design involves the notion of using randomization. A randomized algorithm employs a pseudo-random number generator to inform some of its decisions. Randomization has proved to be a remarkably useful technique, and randomized algorithms are often the fastest and simplest algorithms for a given application. This may seem perplexing at first. Shouldn't an intelligent, clever algorithm designer be able to make better decisions than a simple random number generator? The issue is that a deterministic decision-making process may be susceptible to systematic biases, which in turn can result in unbalanced data structures. Randomness creates a layer of "independence," which can alleviate these systematic biases. In this lecture, we will consider two famous randomized data structures, which were invented at nearly the same time. The first is a randomized version of a binary tree, called a treap. This data structure's name is a portmanteau (combination) of "tree" and "heap." It was developed by Raimund Seidel and Cecilia Aragon in 1989. (Remarkably, this 1-dimensional data structure is closely related to two 2-dimensional data structures, the Cartesian tree by Jean Vuillemin and the priority search tree of Edward McCreight, both discovered in 1980.) The other data structure is the skip list, which is a randomized version of a linked list where links can point to entries that are separated by a significant distance. This was invented by Bill Pugh (a professor at UMD!).
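To show the tree-plus-heap idea concretely, here is a compact, generic treap-insertion sketch in Python (random priorities restored by rotations); it is a textbook-style treap of my own writing, not code from the lecture notes.

```python
# Treap sketch: BST order on keys, max-heap order on random priorities.
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # random priority => expected O(log n) height
        self.left = None
        self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority > root.priority:    # restore heap order on priorities
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def height(root):
    return -1 if root is None else 1 + max(height(root.left), height(root.right))

root = None
for k in range(1, 1025):             # worst-case insertion order for a plain BST
    root = insert(root, k)
print(height(root))                   # typically around 20 for 1024 sorted insertions
```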