MIT 6.851 Advanced Data Structures Prof

Total Page:16

File Type:pdf, Size:1020Kb

MIT 6.851 Advanced Data Structures Prof MIT 6.851 Advanced Data Structures Prof. Erik Demaine Spring '12 Scribe Notes Collection TA: Tom Morgan, Justin Zhang Editing: Justin Zhang Contents 1 1. Temporal data structure 1 4 Scribers: Oscar Moll (2012), Aston Motes (2007), Kevin Wang (2007) 1.1 Overview . 4 1.2 Model and definitions . 4 1.3 Partial persistence . 6 1.4 Full persistence . 9 1.5 Confluent Persistence . 12 1.6 Functional persistence . 13 2 2. Temporal data structure 2 14 Scribers: Erek Speed (2012), Victor Jakubiuk (2012), Aston Motes (2007), Kevin Wang (2007) 2.1 Overview . 14 2.2 Retroactivity . 14 3 3. Geometric data structure 1 24 Scribers: Brian Hamrick (2012), Ben Lerner (2012), Keshav Puranmalka (2012) 3.1 Overview . 24 3.2 Planar Point Location . 24 3.3 Orthogonal range searching . 27 3.4 Fractional Cascading . 33 4 4. Geometric data structure 2 35 2 Scribers: Brandon Tran (2012), Nathan Pinsker (2012), Ishaan Chugh (2012), David Stein (2010), Jacob Steinhardt (2010) 4.1 Overview- Geometry II . 35 4.2 3D Orthogonal Range Search in O(lg n) Query Time . 35 4.3 Kinetic Data Structures . 38 5 5. Dynamic optimality 1 42 Scribers: Brian Basham (2012), Travis Hance (2012), Jayson Lynch (2012) 5.1 Overview . 42 5.2 Binary Search Trees . 42 5.3 Splay Trees . 45 5.4 Geometric View . 46 6 6. Dynamic optimality 2 50 Scribers: Aakanksha Sarda (2012), David Field (2012), Leonardo Urbina (2012), Prasant Gopal (2010), Hui Tang (2007), Mike Ebersol (2005) 6.1 Overview . 50 6.2 Independent Rectangle Bounds . 50 6.3 Lower Bounds . 53 6.4 Tango Trees . 54 6.5 Signed greedy algorithm . 57 7 7. Memory hierarchy 1 59 Scribers: Claudio A Andreoni (2012), Sebastien Dabdoub (2012), Usman Masood (2012), Eric Liu (2010), Aditya Rathnam (2007) 7.1 Memory Hierarchies and Models of Them . 59 7.2 External Memory Model . 60 7.3 Cache Oblivious Model . 61 7.4 Cache Oblivious B-Trees . 62 8 8. Memory hierarchy 2 67 Scribers: Pedram Razavi (2012), Thomas Georgiou (2012), Tom Morgan (2010) 8.1 From Last Lectures. 67 8.2 Ordered File Maintenance [55] [56] . 67 8.3 List Labeling . 70 8.4 Cache-Oblivious Priority Queues [59] . 70 9 9. Memory hierarchy 3 73 Scribers: Tim Kaler(2012), Jenny Li(2012), Elena Tatarchenko(2012) 9.1 Overview . 73 9.2 Lazy Funnelsort . 73 9.3 Orthogonal 2D Range Searching . 75 10 10. Dictionaries 82 Scribers: Edward Z. Yang (2012), Katherine Fang (2012), Benjamin Y. Lee (2012), David Wilson (2010), Rishi Gupta (2010) 10.1 Overview . 82 10.2 Hash Function . 82 10.3 Basic Chaining . 84 10.4 FKS Perfect Hashing { Fredman, Koml´os,Szemer´edi(1984) [157] . 86 10.5 Linear probing . 87 10.6 Cuckoo Hashing { Pagh and Rodler (2004) [155] . 88 11 11. Integer 1 92 Scribers: Sam Fingeret (2012), Shravas Rao (2012), Paul Christiano (2010) 11.1 Overview . 92 11.2 Integer Data Structures . 92 11.3 Successor / Predecessor Queries . 93 11.4 Van Emde Boas Trees . 94 11.5 Binary tree view . 95 11.6 Reducing space . 96 12 12. Integer 2 98 Scribers: Kent Huynh (2012), Shoshana Klerman (2012), Eric Shyu (2012) 12.1 Introduction . 98 12.2 Overview of Fusion Trees . 98 12.3 The General Idea . 99 12.4 Sketching . 99 12.5 Desketchifying . 100 12.6 Approximating Sketch . 101 12.7 Parallel Comparison . 102 12.8 Most Significant Set Bit . 103 13 13. Integer 3 106 Scribers: Karan Sagar (2012), Jacob Hurwitz (2012), Jingjing Liu (2010), Meshkat Farrokhzadi (2007), Yoyo Zhou (2005) 13.1 Overview . 106 13.2 Survey of predecessor lower bound results . 106 13.3 Communication Complexity . 108 13.4 Round Elimination . 109 13.5 Proof of Predecessor Bound . 109 13.6 Sketch of the Proof for the Round Elimination Lemma . 110 14 14. Integer 4 113 Scribers: Leon Bergen (2012), Andrea Lincoln (2012), Tana Wattanawaroon (2012), Haitao Mao (2010), Eric Price (2007) 14.1 Overview . 113 14.2 Sorting reduced to Priority Queues . 113 14.3 Sorting reduced to Priority Queues . 114 14.4 Sorting for w = Ω(log2+" n) ................................114 15 15. Static Trees 120 Scribers: Jelle van den Hooff(2012), Yuri Lin(2012), Andrew Winslow (2010) 15.1 Overview . 120 15.2 Reductions between RMQ and LCA . 121 15.3 Constant time LCA and RMQ . 124 15.4 Level Ancestor Queries (LAQ) . 126 16 16. Strings 128 Scribers: Cosmin Gheorghe (2012), Nils Molina (2012), Kate Rudolph (2012), Mark Chen (2010), Aston Motes (2007), Kah Keng Tay (2007), Igor Ganichev (2005), Michael Walfish (2003) 16.1 Overview . 128 16.2 Predecessor Problem . 128 16.3 Suffix Trees . 132 16.4 Suffix Arrays . 134 16.5 DC3 Algorithm for Building Suffix Arrays . 136 17 17. Succinct 1 139 Scribers: David Benjamin(2012), Lin Fei(2012), Yuzhi Zheng(2012),Morteza Zadimoghaddam(2010), Aaron Bernstein(2007) 17.1 Overview . 139 17.2 Level Order Representation of Binary Tries . 141 17.3 Rank and Select . 143 17.4 Subtree Sizes . 146 18 18. Succinct 2 148 Scribers: Sanja Popovic(2012), Jean-Baptiste Nivoit(2012), Jon Schneider(2012), Sarah Eisenstat (2010), Mart´ıBol´ıvar (2007) 18.1 Overview . 148 18.2 Survey . 148 18.3 Compressed suffix arrays . 150 18.4 Compact suffix arrays . ..
Recommended publications
  • Cache Oblivious Search Trees Via Binary Trees of Small Height
    Cache Oblivious Search Trees via Binary Trees of Small Height Gerth Stølting Brodal∗ Rolf Fagerberg∗ Riko Jacob∗ Abstract likely to increase in the future [13, pp. 7 and 429]. We propose a version of cache oblivious search trees As a consequence, the memory access pattern of an which is simpler than the previous proposal of Bender, algorithm has become a key component in determining Demaine and Farach-Colton and has the same complex- its running time in practice. Since classic asymptotical ity bounds. In particular, our data structure avoids the analysis of algorithms in the RAM model is unable to use of weight balanced B-trees, and can be implemented capture this, a number of more elaborate models for as just a single array of data elements, without the use of analysis have been proposed. The most widely used of pointers. The structure also improves space utilization. these is the I/O model of Aggarwal and Vitter [1], which For storing n elements, our proposal uses (1 + ε)n assumes a memory hierarchy containing two levels, the times the element size of memory, and performs searches lower level having size M and the transfer between the in worst case O(logB n) memory transfers, updates two levels taking place in blocks of B elements. The in amortized O((log2 n)/(εB)) memory transfers, and cost of the computation in the I/O model is the number range queries in worst case O(logB n + k/B) memory of blocks transferred. This model is adequate when transfers, where k is the size of the output.
    [Show full text]
  • Compressed Suffix Trees with Full Functionality
    Compressed Suffix Trees with Full Functionality Kunihiko Sadakane Department of Computer Science and Communication Engineering, Kyushu University Hakozaki 6-10-1, Higashi-ku, Fukuoka 812-8581, Japan [email protected] Abstract We introduce new data structures for compressed suffix trees whose size are linear in the text size. The size is measured in bits; thus they occupy only O(n log |A|) bits for a text of length n on an alphabet A. This is a remarkable improvement on current suffix trees which require O(n log n) bits. Though some components of suffix trees have been compressed, there is no linear-size data structure for suffix trees with full functionality such as computing suffix links, string-depths and lowest common ancestors. The data structure proposed in this paper is the first one that has linear size and supports all operations efficiently. Any algorithm running on a suffix tree can also be executed on our compressed suffix trees with a slight slowdown of a factor of polylog(n). 1 Introduction Suffix trees are basic data structures for string algorithms [13]. A pattern can be found in time proportional to the pattern length from a text by constructing the suffix tree of the text in advance. The suffix tree can also be used for more complicated problems, for example finding the longest repeated substring in linear time. Many efficient string algorithms are based on the use of suffix trees because this does not increase the asymptotic time complexity. A suffix tree of a string can be constructed in linear time in the string length [28, 21, 27, 5].
    [Show full text]
  • Trans-Dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths
    JOURNAL or COMPUTER AND SYSTEM SCIENCES 48, 533-551 (1994) Trans-dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths MICHAEL L. FREDMAN* University of California at San Diego, La Jolla, California 92093, and Rutgers University, New Brunswick, New Jersey 08903 AND DAN E. WILLARDt SUNY at Albany, Albany, New York 12203 Received February 5, 1991; revised October 20, 1992 Two algorithms are presented: a linear time algorithm for the minimum spanning tree problem and an O(m + n log n/log log n) implementation of Dijkstra's shortest-path algorithm for a graph with n vertices and m edges. The second algorithm surpasses information theoretic limitations applicable to comparison-based algorithms. Both algorithms utilize new data structures that extend the fusion tree method. © 1994 Academic Press, Inc. 1. INTRODUCTION We extend the fusion tree method [7J to develop a linear-time algorithm for the minimum spanning tree problem and an O(m + n log n/log log n) implementation of Dijkstra's shortest-path algorithm for a graph with n vertices and m edges. The implementation of Dijkstra's algorithm surpasses information theoretic limitations applicable to comparison-based algorithms. Our extension of the fusion tree method involves the development of a new data structure, the atomic heap. The atomic heap accommodates heap (priority queue) operations in constant amortized time under suitable polylog restrictions on the heap size. Our linear-time minimum spanning tree algorithm results from a direct application of the atomic heap. To obtain the shortest-path algorithm, we first use the atomic heat as a building block to construct a new data structure, the AF-heap, which has no explicit size restric- tion and surpasses information theoretic limitations applicable to comparison-based algorithms.
    [Show full text]
  • How to Morph Graphs on the Torus
    How to Morph Graphs on the Torus Erin Wolf Chambers† Jeff Erickson‡ Patrick Lin‡ Salman Parsa† Abstract first algorithm to morph graphs on any higher-genus surface. In fact, it is the first algorithm to compute We present the first algorithm to morph graphs on the torus. any form of isotopy between surface graphs; existing Given two isotopic essentially 3-connected embeddings of the algorithms to test whether two graphs on the same surface same graph on the Euclidean flat torus, where the edges in both are isotopic are non-constructive 29 . Our algorithm drawings are geodesics, our algorithm computes a continuous [ ] outputs a morph consisting of O n steps; within each step, deformation from one drawing to the other, such that all edges ( ) all vertices move along parallel geodesics at (different) are geodesics at all times. Previously even the existence of constant speeds, and all edges remain geodesics (“straight such a morph was not known. Our algorithm runs in O n1+!=2 ( ) line segments”). Our algorithm runs in O n1+!=2 time; the time, where ! is the matrix multiplication exponent, and the ( ) running time is dominated by repeatedly solving a linear computed morph consists of O n parallel linear morphing steps. ( ) system encoding a natural generalization of Tutte’s spring Existing techniques for morphing planar straight-line graphs do embedding theorem. not immediately generalize to graphs on the torus; in particular, Cairns’ original 1944 proof and its more recent improvements rely on the fact that every planar graph contains a vertex of degree 1.1 Prior Results (and Why They Don’t Generalize).
    [Show full text]
  • Interval Trees Storing and Searching Intervals
    Interval Trees Storing and Searching Intervals • Instead of points, suppose you want to keep track of axis-aligned segments: • Range queries: return all segments that have any part of them inside the rectangle. • Motivation: wiring diagrams, genes on genomes Simpler Problem: 1-d intervals • Segments with at least one endpoint in the rectangle can be found by building a 2d range tree on the 2n endpoints. - Keep pointer from each endpoint stored in tree to the segments - Mark segments as you output them, so that you don’t output contained segments twice. • Segments with no endpoints in range are the harder part. - Consider just horizontal segments - They must cross a vertical side of the region - Leads to subproblem: Given a vertical line, find segments that it crosses. - (y-coords become irrelevant for this subproblem) Interval Trees query line interval Recursively build tree on interval set S as follows: Sort the 2n endpoints Let xmid be the median point Store intervals that cross xmid in node N intervals that are intervals that are completely to the completely to the left of xmid in Nleft right of xmid in Nright Another view of interval trees x Interval Trees, continued • Will be approximately balanced because by choosing the median, we split the set of end points up in half each time - Depth is O(log n) • Have to store xmid with each node • Uses O(n) storage - each interval stored once, plus - fewer than n nodes (each node contains at least one interval) • Can be built in O(n log n) time. • Can be searched in O(log n + k) time [k = #
    [Show full text]
  • 1 Introduction and Terminology
    A Survey of Solved Problems and Applications on Bandwidth, Edgesum and Pro le of Graphs Yung-Ling Lai National Chiayi Teacher College Chiayi, Taiwan, R.O.C. Kenneth Williams Western Michigan University Abstract This pap er provides a survey of results on the exact bandwidth, edge- sum, and pro le of graphs. A bibliographyof work in these areas is pro- vided. The emphasis is on comp osite graphs. This may be regarded as an up date of the original survey of solved bandwidth problems by Chinn, Chvatalova, Dewdney, and Gibbs[10] in 1982. Also several of the applica- tion areas involving these graph parameters are describ ed. 1 Intro duction and terminology For a graph G, V (G) denotes the set of vertices of G and E (G) denotes the set of edges of G. Let G = (V; E ) be a graph on n vertices. A 1-1 mapping f : V ! f1; 2;:::;ng is called a proper numbering of G. The bandwidth B (G) of aproper f numbering f of G is the number B (G)= maxfjf (u) f (v )j : uv 2 E g; f and the bandwidth B(G) of G is the number B (G)= minfB (G): f is a prop er numb ering of Gg: f A prop er numb ering f is called a bandwidth numbering of G if B (G)=B (G). f For example, Figure 1 shows bandwidth numb erings for the graphs P ;C ;K 4 5 1;4 and K . In general, B (P ) = 1, B (C ) = 2, B (K ) = dn=2e and B (K ) = 2;3 n n 1;n m;n m + dn=2e1form n.
    [Show full text]
  • Balanced Trees Part One
    Balanced Trees Part One Balanced Trees ● Balanced search trees are among the most useful and versatile data structures. ● Many programming languages ship with a balanced tree library. ● C++: std::map / std::set ● Java: TreeMap / TreeSet ● Many advanced data structures are layered on top of balanced trees. ● We’ll see several later in the quarter! Where We're Going ● B-Trees (Today) ● A simple type of balanced tree developed for block storage. ● Red/Black Trees (Today/Thursday) ● The canonical balanced binary search tree. ● Augmented Search Trees (Thursday) ● Adding extra information to balanced trees to supercharge the data structure. Outline for Today ● BST Review ● Refresher on basic BST concepts and runtimes. ● Overview of Red/Black Trees ● What we're building toward. ● B-Trees and 2-3-4 Trees ● Simple balanced trees, in depth. ● Intuiting Red/Black Trees ● A much better feel for red/black trees. A Quick BST Review Binary Search Trees ● A binary search tree is a binary tree with 9 the following properties: 5 13 ● Each node in the BST stores a key, and 1 6 10 14 optionally, some auxiliary information. 3 7 11 15 ● The key of every node in a BST is strictly greater than all keys 2 4 8 12 to its left and strictly smaller than all keys to its right. Binary Search Trees ● The height of a binary search tree is the 9 length of the longest path from the root to a 5 13 leaf, measured in the number of edges. 1 6 10 14 ● A tree with one node has height 0.
    [Show full text]
  • Lecture 04 Linear Structures Sort
    Algorithmics (6EAP) MTAT.03.238 Linear structures, sorting, searching, etc Jaak Vilo 2018 Fall Jaak Vilo 1 Big-Oh notation classes Class Informal Intuition Analogy f(n) ∈ ο ( g(n) ) f is dominated by g Strictly below < f(n) ∈ O( g(n) ) Bounded from above Upper bound ≤ f(n) ∈ Θ( g(n) ) Bounded from “equal to” = above and below f(n) ∈ Ω( g(n) ) Bounded from below Lower bound ≥ f(n) ∈ ω( g(n) ) f dominates g Strictly above > Conclusions • Algorithm complexity deals with the behavior in the long-term – worst case -- typical – average case -- quite hard – best case -- bogus, cheating • In practice, long-term sometimes not necessary – E.g. for sorting 20 elements, you dont need fancy algorithms… Linear, sequential, ordered, list … Memory, disk, tape etc – is an ordered sequentially addressed media. Physical ordered list ~ array • Memory /address/ – Garbage collection • Files (character/byte list/lines in text file,…) • Disk – Disk fragmentation Linear data structures: Arrays • Array • Hashed array tree • Bidirectional map • Heightmap • Bit array • Lookup table • Bit field • Matrix • Bitboard • Parallel array • Bitmap • Sorted array • Circular buffer • Sparse array • Control table • Sparse matrix • Image • Iliffe vector • Dynamic array • Variable-length array • Gap buffer Linear data structures: Lists • Doubly linked list • Array list • Xor linked list • Linked list • Zipper • Self-organizing list • Doubly connected edge • Skip list list • Unrolled linked list • Difference list • VList Lists: Array 0 1 size MAX_SIZE-1 3 6 7 5 2 L = int[MAX_SIZE]
    [Show full text]
  • Suffix Trees
    JASS 2008 Trees - the ubiquitous structure in computer science and mathematics Suffix Trees Caroline L¨obhard St. Petersburg, 9.3. - 19.3. 2008 1 Contents 1 Introduction to Suffix Trees 3 1.1 Basics . 3 1.2 Getting a first feeling for the nice structure of suffix trees . 4 1.3 A historical overview of algorithms . 5 2 Ukkonen’s on-line space-economic linear-time algorithm 6 2.1 High-level description . 6 2.2 Using suffix links . 7 2.3 Edge-label compression and the skip/count trick . 8 2.4 Two more observations . 9 3 Generalised Suffix Trees 9 4 Applications of Suffix Trees 10 References 12 2 1 Introduction to Suffix Trees A suffix tree is a tree-like data-structure for strings, which affords fast algorithms to find all occurrences of substrings. A given String S is preprocessed in O(|S|) time. Afterwards, for any other string P , one can decide in O(|P |) time, whether P can be found in S and denounce all its exact positions in S. This linear worst case time bound depending only on the length of the (shorter) string |P | is special and important for suffix trees since an amount of applications of string processing has to deal with large strings S. 1.1 Basics In this paper, we will denote the fixed alphabet with Σ, single characters with lower-case letters x, y, ..., strings over Σ with upper-case or Greek letters P, S, ..., α, σ, τ, ..., Trees with script letters T , ... and inner nodes of trees (that is, all nodes despite of root and leaves) with lower-case letters u, v, ...
    [Show full text]
  • Augmentation: Range Trees (PDF)
    Lecture 9 Augmentation 6.046J Spring 2015 Lecture 9: Augmentation This lecture covers augmentation of data structures, including • easy tree augmentation • order-statistics trees • finger search trees, and • range trees The main idea is to modify “off-the-shelf” common data structures to store (and update) additional information. Easy Tree Augmentation The goal here is to store x.f at each node x, which is a function of the node, namely f(subtree rooted at x). Suppose x.f can be computed (updated) in O(1) time from x, children and children.f. Then, modification a set S of nodes costs O(# of ancestors of S)toupdate x.f, because we need to walk up the tree to the root. Two examples of O(lg n) updates are • AVL trees: after rotating two nodes, first update the new bottom node and then update the new top node • 2-3 trees: after splitting a node, update the two new nodes. • In both cases, then update up the tree. Order-Statistics Trees (from 6.006) The goal of order-statistics trees is to design an Abstract Data Type (ADT) interface that supports the following operations • insert(x), delete(x), successor(x), • rank(x): find x’s index in the sorted order, i.e., # of elements <x, • select(i): find the element with rank i. 1 Lecture 9 Augmentation 6.046J Spring 2015 We can implement the above ADT using easy tree augmentation on AVL trees (or 2-3 trees) to store subtree size: f(subtree) = # of nodes in it. Then we also have x.size =1+ c.size for c in x.children.
    [Show full text]
  • Advanced Data Structures
    Advanced Data Structures PETER BRASS City College of New York CAMBRIDGE UNIVERSITY PRESS Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521880374 © Peter Brass 2008 This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2008 ISBN-13 978-0-511-43685-7 eBook (EBL) ISBN-13 978-0-521-88037-4 hardback Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Contents Preface page xi 1 Elementary Structures 1 1.1 Stack 1 1.2 Queue 8 1.3 Double-Ended Queue 16 1.4 Dynamical Allocation of Nodes 16 1.5 Shadow Copies of Array-Based Structures 18 2 Search Trees 23 2.1 Two Models of Search Trees 23 2.2 General Properties and Transformations 26 2.3 Height of a Search Tree 29 2.4 Basic Find, Insert, and Delete 31 2.5ReturningfromLeaftoRoot35 2.6 Dealing with Nonunique Keys 37 2.7 Queries for the Keys in an Interval 38 2.8 Building Optimal Search Trees 40 2.9 Converting Trees into Lists 47 2.10
    [Show full text]
  • Search Trees
    Lecture III Page 1 “Trees are the earth’s endless effort to speak to the listening heaven.” – Rabindranath Tagore, Fireflies, 1928 Alice was walking beside the White Knight in Looking Glass Land. ”You are sad.” the Knight said in an anxious tone: ”let me sing you a song to comfort you.” ”Is it very long?” Alice asked, for she had heard a good deal of poetry that day. ”It’s long.” said the Knight, ”but it’s very, very beautiful. Everybody that hears me sing it - either it brings tears to their eyes, or else -” ”Or else what?” said Alice, for the Knight had made a sudden pause. ”Or else it doesn’t, you know. The name of the song is called ’Haddocks’ Eyes.’” ”Oh, that’s the name of the song, is it?” Alice said, trying to feel interested. ”No, you don’t understand,” the Knight said, looking a little vexed. ”That’s what the name is called. The name really is ’The Aged, Aged Man.’” ”Then I ought to have said ’That’s what the song is called’?” Alice corrected herself. ”No you oughtn’t: that’s another thing. The song is called ’Ways and Means’ but that’s only what it’s called, you know!” ”Well, what is the song then?” said Alice, who was by this time completely bewildered. ”I was coming to that,” the Knight said. ”The song really is ’A-sitting On a Gate’: and the tune’s my own invention.” So saying, he stopped his horse and let the reins fall on its neck: then slowly beating time with one hand, and with a faint smile lighting up his gentle, foolish face, he began..
    [Show full text]