CS 758/858: Algorithms

Wheeler Ruml (UNH), Class 8, CS 758
http://www.cs.unh.edu/~ruml/cs758

Outline
■ COVID
■ Tries
■ BSTs
■ Algorithms
■ DP

COVID
■ check your Wildcat Pass before coming to campus
■ if you have concerns, let me know

Tries

Problem Statement
Given a linked list of items, print all possible subsets.
running time?

Searching
Structure                   Find   Insert   Delete
List (unsorted)
List (sorted)
Array (unsorted)
Array (sorted)
Heap
Hash table
Binary tree (unbalanced)
Binary tree (balanced)

Searching
What about long keys?
Can we detect a miss without examining the entire key?

Tries
trie: test digits of key, branching on values
■ some nodes do not hold values
■ fixed order
■ depth = length
■ canonical representation
name: from ‘retrieval’
CLRS: ‘trie’ = ‘radix tree’
Wikipedia: ‘trie’ ≠ ‘radix tree’
Sedgewick: ‘trie’ ≠ ‘digital search tree’
duplicate keys?
what’s their weakness?
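The trie described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the names `TrieNode`, `Trie`, `insert`, and `find` are assumptions, keys are taken to be strings whose characters play the role of "digits", and `find` also reports how many characters it examined so that an early miss is visible.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # one entry per "digit" (here: character) of the key
        self.has_value = False  # many nodes are only prefixes and hold no value
        self.value = None


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            # Branch on the next character, creating the child if needed.
            node = node.children.setdefault(ch, TrieNode())
        node.has_value = True
        node.value = value      # depth of this node equals len(key)

    def find(self, key):
        """Return (value, characters_examined); value is None on a miss."""
        node = self.root
        for i, ch in enumerate(key):
            node = node.children.get(ch)
            if node is None:
                # Miss detected without reading the rest of the key.
                return None, i + 1
        return (node.value if node.has_value else None), len(key)
```

With "tree" and "trie" stored, looking up "taxonomy" stops after two characters, while looking up "tr" reaches an existing node that holds no value. Inserting the same keys in any order gives the same structure, which is the "canonical representation" point on the slide.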
Not Tries
Wikipedia: ‘radix tree’ = ‘radix trie’ = ‘patricia trie’: compressed trie, every internal node has at least two leaves beneath
Sedgewick: ‘digital search tree’: value at every node, just like binary trees except test bits instead of full compare

Searching
Structure                   Find   Insert   Delete
List (unsorted)
List (sorted)
Array (unsorted)
Array (sorted)
Heap
Hash table
Binary tree (unbalanced)
Binary tree (balanced)
Trie

Problem Statement
Given a list of records, which may contain duplicates, return a list containing each record at most once.

Break
■ asst 4
■ asst 5

Binary Search Trees

Binary Search Tree Insertion
Invariant: For each node n, all nodes in n’s left subtree ≤ n, all nodes in right subtree ≥ n.

insert(n)
    n’s parent ← find-parent(n, root, nil)
    if parent is nil
        root ← n
    else
        if n should be before parent
            parent’s left child ← n
        else
            parent’s right child ← n

Find-Parent Specification
Need:
1) find-parent returns nil iff empty tree,
2) find-parent returns leaf node p directly adjacent to n. I.e., either
       predecessor(p) ≤ n ≤ p
   or
       p ≤ n ≤ successor(p)
   so that attaching n to p preserves BST ordering.

Property 1
find-parent(n, curr, parent)
    if curr doesn’t exist
        return parent
    if n should be before curr
        return find-parent(n, curr’s left child, curr)
    else
        return find-parent(n, curr’s right child, curr)

1) called as find-parent(n, root, nil), so the only way the return value is nil is if curr = root = nil (i.e., empty tree).
2) Invariant: if we replace curr with n (and ignore curr’s children), n is in proper place in the tree.

Property 2
Invariant: If we replace curr with n (and ignore curr’s children), n is in proper place in the tree.

Initialization: curr is root. Replacing root with n has n in correct place with respect to the zero remaining nodes.

Maintenance: We were at the correct position if curr were replaced by n. To set up the next iteration, we move to the correct side of curr. This preserves BST ordering with respect to curr as we move below it and it (and its other child) enters the ‘active tree’.

Termination: We terminate when we move off the correct side of a leaf. The BST invariant holds everywhere if that null pointer were replaced by n because there are no children to ignore.
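The insert and find-parent pseudocode, together with the invariant argued above, can be exercised with a small Python sketch. It rests on assumptions not stated in the slides: "n should be before" is read as a strict less-than on a `key` field (so equal keys go to the right), and the names `Node`, `BST`, and the `inorder` helper are illustrative only.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BST:
    def __init__(self):
        self.root = None

    def find_parent(self, n, curr, parent):
        # Falling off the tree: the last node visited is the parent.
        if curr is None:
            return parent
        if n.key < curr.key:          # assumed meaning of "n should be before curr"
            return self.find_parent(n, curr.left, curr)
        return self.find_parent(n, curr.right, curr)

    def insert(self, n):
        parent = self.find_parent(n, self.root, None)
        if parent is None:            # only possible when the tree is empty
            self.root = n
        elif n.key < parent.key:
            parent.left = n
        else:
            parent.right = n

    def inorder(self):
        """Return keys in sorted order; a quick check of the BST invariant."""
        out = []

        def walk(node):
            if node is not None:
                walk(node.left)
                out.append(node.key)
                walk(node.right)

        walk(self.root)
        return out
```

Inserting 5, 3, 8, 3, 7 in that order and calling `inorder()` yields the keys in nondecreasing order, which is what the ordering invariant promises. The recursion mirrors the slides; an iterative loop would avoid Python's recursion limit on very deep, unbalanced trees.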
Thus, when find-parent terminates, we know that:
    predecessor(p) ≤ n ≤ p
or
    p ≤ n ≤ successor(p)

Algorithms
Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier transform or the substitution of an n log n sort for an n² set of comparisons.
Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of a program lies. Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.
- Fred Brooks, 1974 (lead on IBM System/360, Turing Award 1999)

Algorithms: Modern Version
Smart data structures and dumb code works a lot better than the other way around.
- Guy Steele, 2002 (ACM Fellow, inventor of Scheme, editor of The Hacker’s Dictionary)

Types of Algorithms
■ divide and conquer
■ dynamic programming
■ greedy
■ backtracking
■ (reduction)

Dynamic Programming

Fibonacci Numbers
\[
F_n =
\begin{cases}
0 & \text{if } n = 0 \\
1 & \text{if } n = 1 \\
F_{n-1} + F_{n-2} & \text{for } n \ge 2
\end{cases}
\]
What is the complexity of the naive algorithm?
How to make this efficient?
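One way to explore both questions is to count calls. The sketch below is illustrative rather than taken from the lecture: `fib_naive` follows the recurrence directly and increments a caller-supplied one-element list `counter` (a hypothetical device for counting calls), while `fib_memo` stores each result in a dictionary so every subproblem is solved once.

```python
def fib_naive(n, counter):
    """Direct translation of the recurrence; counter[0] counts calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_memo(n, table=None):
    """Same recurrence, but each F_k is computed once and cached in table."""
    if table is None:
        table = {}
    if n < 2:
        return n
    if n not in table:
        table[n] = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    return table[n]


calls = [0]
print(fib_naive(20, calls), calls[0])   # 6765 21891
print(fib_memo(20))                     # 6765
```

The naive version makes 2·F(n+1) − 1 calls, so its work grows exponentially in n, whereas the cached version fills only n − 1 table entries.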
Memoization
■ recursive decomposition
■ polynomial number of subproblems
■ cache results in look-up table
one form of dynamic programming

EOLQs
■ What’s still confusing?
■ What question didn’t you get to ask today?
■ What would you like to hear more about?
Please write down your most pressing question about algorithms and put it in the box on your way out.
Thanks!