Hilbert R-Tree: an Improved R-Tree Using Fkactals

Total Page:16

File Type:pdf, Size:1020Kb

Hilbert R-Tree: an Improved R-Tree Using Fkactals Hilbert R-tree: An Improved R-tree Using Fkactals Ibrahim Kamel Christos Faloutsos * Department of Computer Science Department of Computer Science and University of Maryland Institute for Systems Research (ISR) College Park, MD 20742 University of Maryland kamelQcs.umd.edu College Park, MD 20742 [email protected] Hilbert R-tree. Our experiments show that the ‘2-to-3’ split policy provides a cornpro Abstract mise between the insertion complexity and the search cost, giving up to 28% savings over the We propose a new Rtree structure that out- R’ - tree [BKSSOO]on real data. performs all the older ones. The heart of the idea is to facilitate the deferred splitting ap preach in R-trees. This is done by propos- 1 Introduction ing an ordering on the R-tree nodes. This One of the requirements for the database management ordering has to be ‘good’, in the sense that systems (DBMSs) of the near future is the ability to it should group ‘similar’ data rectangles to handle spatial data [SSUSl]. Spatial data arise in gether, to minimize the area and perimeter many applications, including: Cartography [WhiOl]; of the resulting minimum bounding rectangles Computer-Aided Design (CAD) [OHM+841 [Gut84a]; (MBRs). computer vision and robotics [BB82]; traditional Following [KF93] we have chosenthe so-called databases, where a record with k attributes corre- ‘2D-c’ method, which sorts rectangles accord- sponds to a point in a k-d space; temporal databases, ing to the Hilbert value of the center of the where time can be considered as one more dimen- rectangles. Given the ordering, every node sion [KS91]; scientific databases with spatial-temporal has a well-defined set of sibling nodes; thus, data, such as the ones in the ‘Grand Challenge’ appli- we can use deferred splitting. By adjusting cations [Gra92], etc. the split policy, the Hilbert R-tree can achieve In the above applications, one of the most typical as high utilization as desired. To the contrary, queries is the mnge query: Given a rectangle, retrieve the R.-tree has no control over the space uti- all the elements that intersect it. A special csse of the lization, typically achieving up to 70%. We range query is the point query or stabbing query, where designed the manipulation algorithms in de- the query rectangle degeneratesto a point. tail, and we did a full implementation of the We focus on the R-tree [Gut84b] family of methods, *This resexch was partially fuuded by the hmtitutc for Sys- which contains someof the most efficient methods that term, FLsemch (ISR), by the National Science Foundation un- support range queries. The advantage of our method der Grants IF&9205273 and IFU-8958546 (PYI), with matching (and the rest of the R-tree-based methods) over the fuuds from EMPRESS Software Inc. und Think& h4achinen methods that use linear quad-trees and z-ordering is IIIC. that R-trees treat the data objects as a whole, while Pcrmieoion io copy without fee all or part of thir maicrial ir granid provided that tbe copier are not made or dirtribmted for quad-tree based methods typically divide objects into direci commercial advantage, the VLDB copyrigkt notice and quad-tree blocks, increasing the number of items to be the title of be publication and iio dde eppar, and notice ir stored. given that copying ir by pewnirrion of the Very Large Data Bare Endowment. To copy other&e, or to reprblieh, reqrirer a fee The most successful variant of R-trees seems to be and/or l puial pemirvior from ihe Endowment. the R*-tree [BKSSOO].One of its main contributions is Proceedings of the 20th VLDB Conference the idea of ‘forced-reinsert’ by deleting some rectangles Santiago, Chlle, 1994 from the overflowing node, and reinserting them. 500 The main idea in the present paper is:to impose an search performance. ordering on the data rectangles. The consequencesare Subsequent work on R-trees includes the work by important: using this ordering, each R-tree node has Greene [Gre89], Roussopoulos and Leiflrer [RL85], R+- a well defined set of siblings; thus, we can use the al- tree by Sellii et al. [SRF87], R-trees using Mini- gorithms for deferred splitting. By adjusting the split mum Bounding Ploygons [JaggOal,Kamel and Falout- policy (2-to-3 or 3-t&4 etc) we can drive the utilization sos [KF93] and the R*-tree [BKSSSO] of Beckmann a8 close to 100% as desirable. Notice that the R*-tree et al. , which seems to have better performance than does not have control over the utilization, typically Guttman R-tree “quadratic split”. The main idea in achieving an average of ~70%. the R*-tree is the concept of forced re-insert. When a The only requirement for the ordering is that it has node overflows, some of its children are carefully the to be ‘good’, that is, it should lead to small R-tree sen; they are deleted and re-inserted, usually resulting nodes. in a R-tree with better structure. The paper is organized as follows. Section 2 gives a brief description of the R-tree and its variants. Sec- tion 3 describes the Hilbert R-tree. Section 4 presents 3 Hilbert R-trees our experimental results that compare the Hilbert R- In this section we introduce the Hilbert R-tree and dis- tree with other R-tree variants. Section 5 gives the cuss algorithms for searching, insertion, deletion, and conclusions and directions for future research. overflow handling. The performance of the R-trees de- pends on how good is the algorithm that cluster the 2 Survey data rectangles to a node. We propose to use space filling curves (or fractals), and specifically, the Hilbert Several spatial accessmethods have been proposed. A curve to impose a linear ordering on the data rectan- recent survey can be found in [Sam89]. These meth- gles. ods fall in the following broad classes: methods that A space filling curve visits all the points in a Ic- transform rectanglea into points in a higher dimen- dimensional grid exactly once and never crosses it- sionality space [HN83, Fre87]; methods that use lin- self. The Z-order (or Morton key order, or bit- ear quadtrees [Gar82] [AS911 or, equivalently, the Z- interleaving, or Peano curve), the Hilbert curve, and ordering [Ore861 or other space filling curvea [FR89] the Gray-code curve [Fal88] are examples of space fill- [JaggOb]; and finally, methods based on treea (R- ing curves. In [FR89], it was shown experimentally tree [Gut84b], k-d-trees [Ben75], k-d-B-trees [RobBl], that the HiIbert curve achieves the beat clustering hB-trees [LS90], cell-trees [Gun891 e.t.c.) among the three above methods. One of the most promising approaches in the last Next we provide a brief introduction to the Hilbert class is the R-tree [Gut84b]: Compared to the trans- curve: The basic Hilbert curve on a 2x2 grid, denoted formation methods, R-trees work on the native space, by HI, is shown in Figure 1. To derive a curve of or- which has lower dimensionality; compared to the lin- der i, each vertex of the basic curve is replaced by the ear quadtrees, the R-trees do not need to divide the curve of order i - 1, which may be appropriately rc+ spatial objects into (several) pieces (quadtree blocks). tated and/or reflected. Figure 1 also shows the Hilbert The R-tree is an extension of the B-tree for multidi- curvea of order 2 and 3. When the order of the curve mensional objects. A geometric object is represented tends to infinity, the resulting curve is a fractal, with a by its minimum bounding rectangle (MBR): Non-leaf fractal dimension of 2 [Man77]. The Hilbert curve can nodes contain entries of the form (R&r) where ptr is be generalized for higher dimensionalitiea. Algorithms a pointer to a child node in the R-tree; R is the MBR to draw the two-dimensional curve of a given order, that covers all rectangles in the child node. Leaf nodes can be found in [Gri86], [JaggOb]. An algorithm for contain entries of the form (obj-id, R) where obj-id is a higher dimensionalities is in [Bia69]. pointer to the object description, and R is the MBR of The path of a space filling curve imposes a linear the object. The main innovation in the R-tree is that ordering on the grid points. Figure 1 shows one such father nodes are allowed to overlap. This way, the R- ordering for a 4 x 4 grid (see curve Ha). For example tree can guarantee at least 50% space utilization and the point (0,O) on the Hz curve has a Hilbert value remain balanced. of 0, while the point (1,l) has a Hilbert value of 2. Guttman proposed three splitting algorithms, the The Hilbert value of a rectangle needs to be defined. linear split, the quadraiic split and the exponedial Following the experiments in [KF93], a good choice is split. Their namea come from their complexity; among the following: the three, the quadratic split algorithm is the one that achieves the best trade-off between splitting time and Definition 1 : The Hilbed value of a rectangle is de- 501 “1 “2 “3 Figure 1: Hilbert Curves of order 1, 2 and 3 fined as the Hilbert value of its center. on the disk; the contents of the parent node ‘II’ are After this preliminary material, we are in a position shown in more detail. Every data rectangle in node now to describe the proposed methods. ‘I’ has Hilbert value 533; everything in node ‘II’ has Hilbert value greater than 33 and 5107 etc.
Recommended publications
  • Tree-Combined Trie: a Compressed Data Structure for Fast IP Address Lookup
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 12, 2015 Tree-Combined Trie: A Compressed Data Structure for Fast IP Address Lookup Muhammad Tahir Shakil Ahmed Department of Computer Engineering, Department of Computer Engineering, Sir Syed University of Engineering and Technology, Sir Syed University of Engineering and Technology, Karachi Karachi Abstract—For meeting the requirements of the high-speed impact their forwarding capacity. In order to resolve two main Internet and satisfying the Internet users, building fast routers issues there are two possible solutions one is IPv6 IP with high-speed IP address lookup engine is inevitable. addressing scheme and second is Classless Inter-domain Regarding the unpredictable variations occurred in the Routing or CIDR. forwarding information during the time and space, the IP lookup algorithm should be able to customize itself with temporal and Finding a high-speed, memory-efficient and scalable IP spatial conditions. This paper proposes a new dynamic data address lookup method has been a great challenge especially structure for fast IP address lookup. This novel data structure is in the last decade (i.e. after introducing Classless Inter- a dynamic mixture of trees and tries which is called Tree- Domain Routing, CIDR, in 1994). In this paper, we will Combined Trie or simply TC-Trie. Binary sorted trees are more discuss only CIDR. In addition to these desirable features, advantageous than tries for representing a sparse population reconfigurability is also of great importance; true because while multibit tries have better performance than trees when a different points of this huge heterogeneous structure of population is dense.
    [Show full text]
  • Balanced Trees Part One
    Balanced Trees Part One Balanced Trees ● Balanced search trees are among the most useful and versatile data structures. ● Many programming languages ship with a balanced tree library. ● C++: std::map / std::set ● Java: TreeMap / TreeSet ● Many advanced data structures are layered on top of balanced trees. ● We’ll see several later in the quarter! Where We're Going ● B-Trees (Today) ● A simple type of balanced tree developed for block storage. ● Red/Black Trees (Today/Thursday) ● The canonical balanced binary search tree. ● Augmented Search Trees (Thursday) ● Adding extra information to balanced trees to supercharge the data structure. Outline for Today ● BST Review ● Refresher on basic BST concepts and runtimes. ● Overview of Red/Black Trees ● What we're building toward. ● B-Trees and 2-3-4 Trees ● Simple balanced trees, in depth. ● Intuiting Red/Black Trees ● A much better feel for red/black trees. A Quick BST Review Binary Search Trees ● A binary search tree is a binary tree with 9 the following properties: 5 13 ● Each node in the BST stores a key, and 1 6 10 14 optionally, some auxiliary information. 3 7 11 15 ● The key of every node in a BST is strictly greater than all keys 2 4 8 12 to its left and strictly smaller than all keys to its right. Binary Search Trees ● The height of a binary search tree is the 9 length of the longest path from the root to a 5 13 leaf, measured in the number of edges. 1 6 10 14 ● A tree with one node has height 0.
    [Show full text]
  • Trees: Binary Search Trees
    COMP2012H Spring 2014 Dekai Wu Binary Trees & Binary Search Trees (data structures for the dictionary ADT) Outline } Binary tree terminology } Tree traversals: preorder, inorder and postorder } Dictionary and binary search tree } Binary search tree operations } Search } min and max } Successor } Insertion } Deletion } Tree balancing issue COMP2012H (BST) Binary Tree Terminology } Go to the supplementary notes COMP2012H (BST) Linked Representation of Binary Trees } The degree of a node is the number of children it has. The degree of a tree is the maximum of its element degree. } In a binary tree, the tree degree is two data } Each node has two links left right } one to the left child of the node } one to the right child of the node Left child Right child } if no child node exists for a node, the link is set to NULL root 32 32 79 42 79 42 / 13 95 16 13 95 16 / / / / / / COMP2012H (BST) Binary Trees as Recursive Data Structures } A binary tree is either empty … Anchor or } Consists of a node called the root } Root points to two disjoint binary (sub)trees Inductive step left and right (sub)tree r left right subtree subtree COMP2012H (BST) Tree Traversal is Also Recursive (Preorder example) If the binary tree is empty then Anchor do nothing Else N: Visit the root, process data L: Traverse the left subtree Inductive/Recursive step R: Traverse the right subtree COMP2012H (BST) 3 Types of Tree Traversal } If the pointer to the node is not NULL: } Preorder: Node, Left subtree, Right subtree } Inorder: Left subtree, Node, Right subtree Inductive/Recursive step } Postorder: Left subtree, Right subtree, Node template <class T> void BinaryTree<T>::InOrder( void(*Visit)(BinaryTreeNode<T> *u), template<class T> BinaryTreeNode<T> *t) void BinaryTree<T>::PreOrder( {// Inorder traversal.
    [Show full text]
  • Q-Tree: a Multi-Attribute Based Range Query Solution for Tele-Immersive Framework
    Q-Tree: A Multi-Attribute Based Range Query Solution for Tele-Immersive Framework Md Ahsan Arefin, Md Yusuf Sarwar Uddin, Indranil Gupta, Klara Nahrstedt Department of Computer Science University of Illinois at Urbana Champaign Illinois, USA {marefin2, mduddin2, indy, klara}@illinois.edu Abstract given in a high level description which are transformed into multi-attribute composite range queries. Some of the Users and administrators of large distributed systems are examples include “which site is highly congested?”, “which frequently in need of monitoring and management of its components are not working properly?” etc. To answer the various components, data items and resources. Though there first one, the query is transformed into a multi-attribute exist several distributed query and aggregation systems, the composite range query with constrains (range of values) on clustered structure of tele-immersive interactive frameworks CPU utilization, memory overhead, stream rate, bandwidth and their time-sensitive nature and application requirements utilization, delay and packet loss rate. The later one can represent a new class of systems which poses different chal- be answered by constructing a multi-attribute range query lenges on this distributed search. Multi-attribute composite with constrains on static and dynamic characteristics of range queries are one of the key features in this class. those components. Queries can also be made by defining Queries are given in high level descriptions and then trans- different multi-attribute ranges explicitly. Another mention- formed into multi-attribute composite range queries. Design- able property of such systems is that the number of site is ing such a query engine with minimum traffic overhead, low limited due to limited display space and limited interactions, service latency, and with static and dynamic nature of large but the number of data items to store and manage can datasets, is a challenging task.
    [Show full text]
  • Heaps a Heap Is a Complete Binary Tree. a Max-Heap Is A
    Heaps Heaps 1 A heap is a complete binary tree. A max-heap is a complete binary tree in which the value in each internal node is greater than or equal to the values in the children of that node. A min-heap is defined similarly. 97 Mapping the elements of 93 84 a heap into an array is trivial: if a node is stored at 90 79 83 81 index k, then its left child is stored at index 42 55 73 21 83 2k+1 and its right child at index 2k+2 01234567891011 97 93 84 90 79 83 81 42 55 73 21 83 CS@VT Data Structures & Algorithms ©2000-2009 McQuain Building a Heap Heaps 2 The fact that a heap is a complete binary tree allows it to be efficiently represented using a simple array. Given an array of N values, a heap containing those values can be built, in situ, by simply “sifting” each internal node down to its proper location: - start with the last 73 73 internal node * - swap the current 74 81 74 * 93 internal node with its larger child, if 79 90 93 79 90 81 necessary - then follow the swapped node down 73 * 93 - continue until all * internal nodes are 90 93 90 73 done 79 74 81 79 74 81 CS@VT Data Structures & Algorithms ©2000-2009 McQuain Heap Class Interface Heaps 3 We will consider a somewhat minimal maxheap class: public class BinaryHeap<T extends Comparable<? super T>> { private static final int DEFCAP = 10; // default array size private int size; // # elems in array private T [] elems; // array of elems public BinaryHeap() { .
    [Show full text]
  • AVL Trees, Page 1 ' $
    CS 310 AVL Trees, Page 1 ' $ Motives of Balanced Trees Worst-case search steps #ofentries complete binary tree skewed binary tree 15 4 15 63 6 63 1023 10 1023 65535 16 65535 1,048,575 20 1048575 1,073,741,823 30 1073741823 & % CS 310 AVL Trees, Page 2 ' $ Analysis r • logx y =anumberr such that x = y ☞ log2 1024 = 10 ☞ log2 65536 = 16 ☞ log2 1048576 = 20 ☞ log2 1073741824 = 30 •bxc = an integer a such that a ≤ x dxe = an integer a such that a ≥ x ☞ b3.1416c =3 ☞ d3.1416e =4 ☞ b5c =5=d5e & % CS 310 AVL Trees, Page 3 ' $ • If a binary search tree of N nodes happens to be complete, then a search in the tree requires at most b c log2 N +1 steps. • Big-O notation: We say f(x)=O(g(x)) if f(x)isbounded by c · g(x), where c is a constant, for sufficiently large x. ☞ x2 +11x − 30 = O(x2) (consider c =2,x ≥ 6) ☞ x100 + x50 +2x = O(2x) • If a BST happens to be complete, then a search in the tree requires O(log2 N)steps. • In a linked list of N nodes, a search requires O(N)steps. & % CS 310 AVL Trees, Page 4 ' $ AVL Trees • A balanced binary search tree structure proposed by Adelson, Velksii and Landis in 1962. • In an AVL tree of N nodes, every searching, deletion or insertion operation requires only O(log2 N)steps. • AVL tree searching is exactly the same as that of the binary search tree. & % CS 310 AVL Trees, Page 5 ' $ Definition • An empty tree is balanced.
    [Show full text]
  • Binary Search Tree
    ADT Binary Search Tree! Ellen Walker! CPSC 201 Data Structures! Hiram College! Binary Search Tree! •" Value-based storage of information! –" Data is stored in order! –" Data can be retrieved by value efficiently! •" Is a binary tree! –" Everything in left subtree is < root! –" Everything in right subtree is >root! –" Both left and right subtrees are also BST#s! Operations on BST! •" Some can be inherited from binary tree! –" Constructor (for empty tree)! –" Inorder, Preorder, and Postorder traversal! •" Some must be defined ! –" Insert item! –" Delete item! –" Retrieve item! The Node<E> Class! •" Just as for a linked list, a node consists of a data part and links to successor nodes! •" The data part is a reference to type E! •" A binary tree node must have links to both its left and right subtrees! The BinaryTree<E> Class! The BinaryTree<E> Class (continued)! Overview of a Binary Search Tree! •" Binary search tree definition! –" A set of nodes T is a binary search tree if either of the following is true! •" T is empty! •" Its root has two subtrees such that each is a binary search tree and the value in the root is greater than all values of the left subtree but less than all values in the right subtree! Overview of a Binary Search Tree (continued)! Searching a Binary Tree! Class TreeSet and Interface Search Tree! BinarySearchTree Class! BST Algorithms! •" Search! •" Insert! •" Delete! •" Print values in order! –" We already know this, it#s inorder traversal! –" That#s why it#s called “in order”! Searching the Binary Tree! •" If the tree is
    [Show full text]
  • 7. Hierarchies & Trees Visualizing Topological Relations
    7. Hierarchies & Trees Visualizing topological relations Vorlesung „Informationsvisualisierung” Prof. Dr. Andreas Butz, WS 2011/12 Konzept und Basis für Folien: Thorsten Büring LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 1 Outline • Hierarchical data and tree representations • 2D Node-link diagrams – Hyperbolic Tree Browser – SpaceTree – Cheops – Degree of interest tree – 3D Node-link diagrams • Enclosure – Treemap – Ordererd Treemaps – Various examples – Voronoi treemap – 3D Treemaps • Circular visualizations • Space-filling node-link diagram LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 2 Hierarchical Data • Card et al. 1999: data repository in which data cases are related to subcases • Many data collections have an inherent hierarchical organization – Organizational Charts – Websites (approximately hierarchical) Yee et al. 2001 – File system – Family tree – OO programming • Hierarchies are usually represented as tree visual structures • Trees tend to be easier to lay out and interpret than networks (e.g. no cycles) • But: as shown in the example, networks may in some cases be visualized as a tree LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 3 Tree Representations • Two kinds of representations • Node-link diagram (see previous lecture): represent connections as edges between vertices (data cases) http://www.icann.org • Enclosure: space-filling approaches by visually nesting the hierarchy LMU
    [Show full text]
  • 6.172 Lecture 19 : Cache-Oblivious B-Tree (Tokudb)
    How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul Guest Lecture in MIT 6.172 Performance Engineering, 18 November 2010. 6.172 —How Fractal Trees Work 1 My Background • I’m an MIT alum: MIT Degrees = 2 × S.B + S.M. + Ph.D. • I was a principal architect of the Connection Machine CM-5 super­ computer at Thinking Machines. • I was Assistant Professor at Yale. • I was Akamai working on network mapping and billing. • I am research faculty in the SuperTech group, working with Charles. 6.172 —How Fractal Trees Work 2 Tokutek A few years ago I started collaborating with Michael Bender and Martin Farach-Colton on how to store data on disk to achieve high performance. We started Tokutek to commercialize the research. 6.172 —How Fractal Trees Work 3 I/O is a Big Bottleneck Sensor Query Systems include Sensor Disk Query sensors and Sensor storage, and Query want to perform Millions of data elements arrive queries on per second Query recently arrived data using indexes. recent data. Sensor 6.172 —How Fractal Trees Work 4 The Data Indexing Problem • Data arrives in one order (say, sorted by the time of the observation). • Data is queried in another order (say, by URL or location). Sensor Query Sensor Disk Query Sensor Query Millions of data elements arrive per second Query recently arrived data using indexes. Sensor 6.172 —How Fractal Trees Work 5 Why Not Simply Sort? • This is what data Data Sorted by Time warehouses do. • The problem is that you Sort must wait to sort the data before querying it: Data Sorted by URL typically an overnight delay.
    [Show full text]
  • B-Trees M-Ary Search Tree Solution
    M-ary Search Tree B-Trees • Maximum branching factor of M • Complete tree has height = Section 4.7 in Weiss # disk accesses for find : Runtime of find : 2 Solution: B-Trees B-Trees • specialized M-ary search trees What makes them disk-friendly? • Each node has (up to) M-1 keys: 1. Many keys stored in a node – subtree between two keys x and y contains leaves with values v such that • All brought to memory/cache in one access! 3 7 12 21 x ≤ v < y 2. Internal nodes contain only keys; • Pick branching factor M Only leaf nodes contain keys and actual data such that each node • The tree structure can be loaded into memory takes one full irrespective of data object size {page, block } x<3 3≤x<7 7≤x<12 12 ≤x<21 21 ≤x • Data actually resides in disk of memory 3 4 B-Tree: Example B-Tree Properties ‡ B-Tree with M = 4 (# pointers in internal node) and L = 4 (# data items in leaf) – Data is stored at the leaves – All leaves are at the same depth and contain between 10 40 L/2 and L data items – Internal nodes store up to M-1 keys 3 15 20 30 50 – Internal nodes have between M/2 and M children – Root (special case) has between 2 and M children (or root could be a leaf) 1 2 10 11 12 20 25 26 40 42 AB xG 3 5 6 9 15 17 30 32 33 36 50 60 70 Data objects, that I’ll Note: All leaves at the same depth! ignore in slides 5 ‡These are technically B +-Trees 6 1 Example, Again B-trees vs.
    [Show full text]
  • AVL Trees and Rotations
    / AVL trees and rotations Q1 Operations (insert, delete, search) are O(height) Tree height is O(log n) if perfectly balanced ◦ But maintaining perfect balance is O(n) Height-balanced trees are still O(log n) ◦ For T with height h, N(T) ≤ Fib(h+3) – 1 ◦ So H < 1.44 log (N+2) – 1.328 * AVL (Adelson-Velskii and Landis) trees maintain height-balance using rotations Are rotations O(log n)? We’ll see… / or = or \ Different representations for / = \ : Just two bits in a low-level language Enum in a higher-level language / Assume tree is height-balanced before insertion Insert as usual for a BST Move up from the newly inserted node to the lowest “unbalanced” node (if any) ◦ Use the balance code to detect unbalance - how? Do an appropriate rotation to balance the sub-tree rooted at this unbalanced node For example, a single left rotation: Two basic cases ◦ “See saw” case: Too-tall sub-tree is on the outside So tip the see saw so it’s level ◦ “Suck in your gut” case: Too-tall sub-tree is in the middle Pull its root up a level Q2-3 Unbalanced node Middle sub-tree attaches to lower node of the “see saw” Diagrams are from Data Structures by E.M. Reingold and W.J. Hansen Q4-5 Unbalanced node Pulled up Split between the nodes pushed down Weiss calls this “right-left double rotation” Q6 Write the method: static BalancedBinaryNode singleRotateLeft ( BalancedBinaryNode parent, /* A */ BalancedBinaryNode child /* B */ ) { } Returns a reference to the new root of this subtree.
    [Show full text]
  • Tries and String Matching
    Tries and String Matching Where We've Been ● Fundamental Data Structures ● Red/black trees, B-trees, RMQ, etc. ● Isometries ● Red/black trees ≡ 2-3-4 trees, binomial heaps ≡ binary numbers, etc. ● Amortized Analysis ● Aggregate, banker's, and potential methods. Where We're Going ● String Data Structures ● Data structures for storing and manipulating text. ● Randomized Data Structures ● Using randomness as a building block. ● Integer Data Structures ● Breaking the Ω(n log n) sorting barrier. ● Dynamic Connectivity ● Maintaining connectivity in an changing world. String Data Structures Text Processing ● String processing shows up everywhere: ● Computational biology: Manipulating DNA sequences. ● NLP: Storing and organizing huge text databases. ● Computer security: Building antivirus databases. ● Many problems have polynomial-time solutions. ● Goal: Design theoretically and practically efficient algorithms that outperform brute-force approaches. Outline for Today ● Tries ● A fundamental building block in string processing algorithms. ● Aho-Corasick String Matching ● A fast and elegant algorithm for searching large texts for known substrings. Tries Ordered Dictionaries ● Suppose we want to store a set of elements supporting the following operations: ● Insertion of new elements. ● Deletion of old elements. ● Membership queries. ● Successor queries. ● Predecessor queries. ● Min/max queries. ● Can use a standard red/black tree or splay tree to get (worst-case or expected) O(log n) implementations of each. A Catch ● Suppose we want to store a set of strings. ● Comparing two strings of lengths r and s takes time O(min{r, s}). ● Operations on a balanced BST or splay tree now take time O(M log n), where M is the length of the longest string in the tree.
    [Show full text]