Treaps and Skip Lists [Sp’17]
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Trees • Binary Trees • Newer Types of Tree Structures • M-Ary Trees
Data Organization and Processing Hierarchical Indexing (NDBI007) David Hoksza http://siret.ms.mff.cuni.cz/hoksza 1 Outline • Background • prefix tree • graphs • … • search trees • binary trees • Newer types of tree structures • m-ary trees • Typical tree type structures • B-tree • B+-tree • B*-tree 2 Motivation • Similarly as in hashing, the motivation is to find record(s) given a query key using only a few operations • unlike hashing, trees allow to retrieve set of records with keys from given range • Tree structures use “clustering” to efficiently filter out non relevant records from the data set • Tree-based indexes practically implement the index file data organization • for each search key, an indexing structure can be maintained 3 Drawbacks of Index(-Sequential) Organization • Static nature • when inserting a record at the beginning of the file, the whole index needs to be rebuilt • an overflow handling policy can be applied → the performance degrades as the file grows • reorganization can take a lot of time, especially for large tables • not possible for online transactional processing (OLTP) scenarios (as opposed to OLAP – online analytical processing) where many insertions/deletions occur 4 Tree Indexes • Most common dynamic indexing structure for external memory • When inserting/deleting into/from the primary file, the indexing structure(s) residing in the secondary file is modified to accommodate the new key • The modification of a tree is implemented by splitting/merging nodes • Used not only in databases • NTFS directory structure -
Skip Lists: a Probabilistic Alternative to Balanced Trees
Skip Lists: A Probabilistic Alternative to Balanced Trees Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees. William Pugh Binary trees can be used for representing abstract data types Also giving every fourth node a pointer four ahead (Figure such as dictionaries and ordered lists. They work well when 1c) requires that no more than n/4 + 2 nodes be examined. the elements are inserted in a random order. Some sequences If every (2i)th node has a pointer 2i nodes ahead (Figure 1d), of operations, such as inserting the elements in order, produce the number of nodes that must be examined can be reduced to degenerate data structures that give very poor performance. If log2 n while only doubling the number of pointers. This it were possible to randomly permute the list of items to be in- data structure could be used for fast searching, but insertion serted, trees would work well with high probability for any in- and deletion would be impractical. put sequence. In most cases queries must be answered on-line, A node that has k forward pointers is called a level k node. so randomly permuting the input is impractical. Balanced tree If every (2i)th node has a pointer 2i nodes ahead, then levels algorithms re-arrange the tree as operations are performed to of nodes are distributed in a simple pattern: 50% are level 1, maintain certain balance conditions and assure good perfor- 25% are level 2, 12.5% are level 3 and so on. -
Skip B-Trees
Skip B-Trees Ittai Abraham1, James Aspnes2?, and Jian Yuan3 1 The Institute of Computer Science, The Hebrew University of Jerusalem, [email protected] 2 Department of Computer Science, Yale University, [email protected] 3 Google, [email protected] Abstract. We describe a new data structure, the Skip B-Tree that combines the advantages of skip graphs with features of traditional B-trees. A skip B-Tree provides efficient search, insertion and deletion operations. The data structure is highly fault tolerant even to adversarial failures, and allows for particularly simple repair mechanisms. Related resource keys are kept in blocks near each other enabling efficient range queries. Using this data structure, we describe a new distributed peer-to-peer network, the Distributed Skip B-Tree. Given m data items stored in a system with n nodes, the network allows to perform a range search operation for r consecutive keys that costs only O(logb m + r/b) where b = Θ(m/n). In addition, our distributed Skip B-tree search network has provable polylogarithmic costs for all its other basic operations like insert, delete, and node join. To the best of our knowledge, all previous distributed search networks either provide a range search operation whose cost is worse than ours or may require a linear cost for some basic operation like insert, delete, and node join. ? Supported in part by NSF grants CCR-0098078, CNS-0305258, and CNS-0435201. 1 Introduction Peer-to-peer systems provide a decentralized way to share resources among machines. An ideal peer-to-peer network should have such properties as decentralization, scalability, fault-tolerance, self-stabilization, load- balancing, dynamic addition and deletion of nodes, efficient query searching and exploiting spatial as well as temporal locality in searches. -
On the Cost of Persistence and Authentication in Skip Lists*
On the Cost of Persistence and Authentication in Skip Lists⋆ Micheal T. Goodrich1, Charalampos Papamanthou2, and Roberto Tamassia2 1 Department of Computer Science, University of California, Irvine 2 Department of Computer Science, Brown University Abstract. We present an extensive experimental study of authenticated data structures for dictionaries and maps implemented with skip lists. We consider realizations of these data structures that allow us to study the performance overhead of authentication and persistence. We explore various design decisions and analyze the impact of garbage collection and virtual memory paging, as well. Our empirical study confirms the effi- ciency of authenticated skip lists and offers guidelines for incorporating them in various applications. 1 Introduction A proven paradigm from distributed computing is that of using a large collec- tion of widely distributed computers to respond to queries from geographically dispersed users. This approach forms the foundation, for example, of the DNS system. Of course, we can abstract the main query and update tasks of such systems as simple data structures, such as distributed versions of dictionar- ies and maps, and easily characterize their asymptotic performance (with most operations running in logarithmic time). There are a number of interesting im- plementation issues concerning practical systems that use such distributed data structures, however, including the additional features that such structures should provide. For instance, a feature that can be useful in a number of real-world ap- plications is that distributed query responders provide authenticated responses, that is, answers that are provably trustworthy. An authenticated response in- cludes both an answer (for example, a yes/no answer to the query “is item x a member of the set S?”) and a proof of this answer, equivalent to a digital signature from the data source. -
The Randomized Quicksort Algorithm
Outline The Randomized Quicksort Algorithm K. Subramani1 1Lane Department of Computer Science and Electrical Engineering West Virginia University 7 February, 2012 Subramani Sample Analyses Outline Outline 1 The Randomized Quicksort Algorithm Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. It has no bearing on the running time. Subramani Sample Analyses The Randomized Quicksort Algorithm The Partition subroutine Function PARTITION(A,p,q) 1: {We partition the sub-array A[p,p + 1,...,q] about A[p].} 2: for (i = (p + 1) to q) do 3: if (A[i] < A[p]) then 4: Insert A[i] into bucket L. -
Adversarial Search
Adversarial Search In which we examine the problems that arise when we try to plan ahead in a world where other agents are planning against us. Outline 1. Games 2. Optimal Decisions in Games 3. Alpha-Beta Pruning 4. Imperfect, Real-Time Decisions 5. Games that include an Element of Chance 6. State-of-the-Art Game Programs 7. Summary 2 Search Strategies for Games • Difference to general search problems deterministic random – Imperfect Information: opponent not deterministic perfect Checkers, Backgammon, – Time: approximate algorithms information Chess, Go Monopoly incomplete Bridge, Poker, ? information Scrabble • Early fundamental results – Algorithm for perfect game von Neumann (1944) • Our terminology: – Approximation through – deterministic, fully accessible evaluation information Zuse (1945), Shannon (1950) Games 3 Games as Search Problems • Justification: Games are • Games as playground for search problems with an serious research opponent • How can we determine the • Imperfection through actions best next step/action? of opponent: possible results... – Cutting branches („pruning“) • Games hard to solve; – Evaluation functions for exhaustive: approximation of utility – Average branching factor function chess: 35 – ≈ 50 steps per player ➞ 10154 nodes in search tree – But “Only” 1040 allowed positions Games 4 Search Problem • 2-player games • Search problem – Player MAX – Initial state – Player MIN • Board, positions, first player – MAX moves first; players – Successor function then take turns • Lists of (move,state)-pairs – Goal test -
Tree-Combined Trie: a Compressed Data Structure for Fast IP Address Lookup
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 12, 2015 Tree-Combined Trie: A Compressed Data Structure for Fast IP Address Lookup Muhammad Tahir Shakil Ahmed Department of Computer Engineering, Department of Computer Engineering, Sir Syed University of Engineering and Technology, Sir Syed University of Engineering and Technology, Karachi Karachi Abstract—For meeting the requirements of the high-speed impact their forwarding capacity. In order to resolve two main Internet and satisfying the Internet users, building fast routers issues there are two possible solutions one is IPv6 IP with high-speed IP address lookup engine is inevitable. addressing scheme and second is Classless Inter-domain Regarding the unpredictable variations occurred in the Routing or CIDR. forwarding information during the time and space, the IP lookup algorithm should be able to customize itself with temporal and Finding a high-speed, memory-efficient and scalable IP spatial conditions. This paper proposes a new dynamic data address lookup method has been a great challenge especially structure for fast IP address lookup. This novel data structure is in the last decade (i.e. after introducing Classless Inter- a dynamic mixture of trees and tries which is called Tree- Domain Routing, CIDR, in 1994). In this paper, we will Combined Trie or simply TC-Trie. Binary sorted trees are more discuss only CIDR. In addition to these desirable features, advantageous than tries for representing a sparse population reconfigurability is also of great importance; true because while multibit tries have better performance than trees when a different points of this huge heterogeneous structure of population is dense. -
KP-Trie Algorithm for Update and Search Operations
The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016 722 KP-Trie Algorithm for Update and Search Operations Feras Hanandeh1, Izzat Alsmadi2, Mohammed Akour3, and Essam Al Daoud4 1Department of Computer Information Systems, Hashemite University, Jordan 2, 3Department of Computer Information Systems, Yarmouk University, Jordan 4Computer Science Department, Zarqa University, Jordan Abstract: Radix-Tree is a space optimized data structure that performs data compression by means of cluster nodes that share the same branch. Each node with only one child is merged with its child and is considered as space optimized. Nevertheless, it can’t be considered as speed optimized because the root is associated with the empty string. Moreover, values are not normally associated with every node; they are associated only with leaves and some inner nodes that correspond to keys of interest. Therefore, it takes time in moving bit by bit to reach the desired word. In this paper we propose the KP-Trie which is consider as speed and space optimized data structure that is resulted from both horizontal and vertical compression. Keywords: Trie, radix tree, data structure, branch factor, indexing, tree structure, information retrieval. Received January 14, 2015; accepted March 23, 2015; Published online December 23, 2015 1. Introduction the exception of leaf nodes, nodes in the trie work merely as pointers to words. Data structures are a specialized format for efficient A trie, also called digital tree, is an ordered multi- organizing, retrieving, saving and storing data. It’s way tree data structure that is useful to store an efficient with large amount of data such as: Large data associative array where the keys are usually strings, bases. -
Balanced Trees Part One
Balanced Trees Part One Balanced Trees ● Balanced search trees are among the most useful and versatile data structures. ● Many programming languages ship with a balanced tree library. ● C++: std::map / std::set ● Java: TreeMap / TreeSet ● Many advanced data structures are layered on top of balanced trees. ● We’ll see several later in the quarter! Where We're Going ● B-Trees (Today) ● A simple type of balanced tree developed for block storage. ● Red/Black Trees (Today/Thursday) ● The canonical balanced binary search tree. ● Augmented Search Trees (Thursday) ● Adding extra information to balanced trees to supercharge the data structure. Outline for Today ● BST Review ● Refresher on basic BST concepts and runtimes. ● Overview of Red/Black Trees ● What we're building toward. ● B-Trees and 2-3-4 Trees ● Simple balanced trees, in depth. ● Intuiting Red/Black Trees ● A much better feel for red/black trees. A Quick BST Review Binary Search Trees ● A binary search tree is a binary tree with 9 the following properties: 5 13 ● Each node in the BST stores a key, and 1 6 10 14 optionally, some auxiliary information. 3 7 11 15 ● The key of every node in a BST is strictly greater than all keys 2 4 8 12 to its left and strictly smaller than all keys to its right. Binary Search Trees ● The height of a binary search tree is the 9 length of the longest path from the root to a 5 13 leaf, measured in the number of edges. 1 6 10 14 ● A tree with one node has height 0. -
Lecture 04 Linear Structures Sort
Algorithmics (6EAP) MTAT.03.238 Linear structures, sorting, searching, etc Jaak Vilo 2018 Fall Jaak Vilo 1 Big-Oh notation classes Class Informal Intuition Analogy f(n) ∈ ο ( g(n) ) f is dominated by g Strictly below < f(n) ∈ O( g(n) ) Bounded from above Upper bound ≤ f(n) ∈ Θ( g(n) ) Bounded from “equal to” = above and below f(n) ∈ Ω( g(n) ) Bounded from below Lower bound ≥ f(n) ∈ ω( g(n) ) f dominates g Strictly above > Conclusions • Algorithm complexity deals with the behavior in the long-term – worst case -- typical – average case -- quite hard – best case -- bogus, cheating • In practice, long-term sometimes not necessary – E.g. for sorting 20 elements, you dont need fancy algorithms… Linear, sequential, ordered, list … Memory, disk, tape etc – is an ordered sequentially addressed media. Physical ordered list ~ array • Memory /address/ – Garbage collection • Files (character/byte list/lines in text file,…) • Disk – Disk fragmentation Linear data structures: Arrays • Array • Hashed array tree • Bidirectional map • Heightmap • Bit array • Lookup table • Bit field • Matrix • Bitboard • Parallel array • Bitmap • Sorted array • Circular buffer • Sparse array • Control table • Sparse matrix • Image • Iliffe vector • Dynamic array • Variable-length array • Gap buffer Linear data structures: Lists • Doubly linked list • Array list • Xor linked list • Linked list • Zipper • Self-organizing list • Doubly connected edge • Skip list list • Unrolled linked list • Difference list • VList Lists: Array 0 1 size MAX_SIZE-1 3 6 7 5 2 L = int[MAX_SIZE] -
Game Trees, Quad Trees and Heaps
CS 61B Game Trees, Quad Trees and Heaps Fall 2014 1 Heaps of fun R (a) Assume that we have a binary min-heap (smallest value on top) data structue called Heap that stores integers and has properly implemented insert and removeMin methods. Draw the heap and its corresponding array representation after each of the operations below: Heap h = new Heap(5); //Creates a min-heap with 5 as the root 5 5 h.insert(7); 5,7 5 / 7 h.insert(3); 3,7,5 3 /\ 7 5 h.insert(1); 1,3,5,7 1 /\ 3 5 / 7 h.insert(2); 1,2,5,7,3 1 /\ 2 5 /\ 7 3 h.removeMin(); 2,3,5,7 2 /\ 3 5 / 7 CS 61B, Fall 2014, Game Trees, Quad Trees and Heaps 1 h.removeMin(); 3,7,5 3 /\ 7 5 (b) Consider an array based min-heap with N elements. What is the worst case running time of each of the following operations if we ignore resizing? What is the worst case running time if we take into account resizing? What are the advantages of using an array based heap vs. using a BST-based heap? Insert O(log N) Find Min O(1) Remove Min O(log N) Accounting for resizing: Insert O(N) Find Min O(1) Remove Min O(N) Using a BST is not space-efficient. (c) Your friend Alyssa P. Hacker challenges you to quickly implement a max-heap data structure - "Hah! I’ll just use my min-heap implementation as a template", you think to yourself. -
CSE 326: Data Structures Binary Search Trees
Announcements (1/23/09) CSE 326: Data Structures • HW #2 due now • HW #3 out today, due at beginning of class next Binary Search Trees Friday. • Project 2A due next Wed. night. Steve Seitz Winter 2009 • Read Chapter 4 1 2 ADTs Seen So Far The Dictionary ADT • seitz •Data: Steve • Stack •Priority Queue Seitz –a set of insert(seitz, ….) CSE 592 –Push –Insert (key, value) pairs –Pop – DeleteMin •ericm6 Eric • Operations: McCambridge – Insert (key, value) CSE 218 •Queue find(ericm6) – Find (key) • ericm6 • soyoung – Enqueue Then there is decreaseKey… Eric, McCambridge,… – Remove (key) Soyoung Shin – Dequeue CSE 002 •… The Dictionary ADT is also 3 called the “Map ADT” 4 A Modest Few Uses Implementations insert find delete •Sets • Unsorted Linked-list • Dictionaries • Networks : Router tables • Operating systems : Page tables • Unsorted array • Compilers : Symbol tables • Sorted array Probably the most widely used ADT! 5 6 Binary Trees Binary Tree: Representation • Binary tree is A – a root left right – left subtree (maybe empty) A pointerpointer A – right subtree (maybe empty) B C B C B C • Representation: left right left right pointerpointer pointerpointer D E F D E F Data left right G H D E F pointer pointer left right left right left right pointerpointer pointerpointer pointerpointer I J 7 8 Tree Traversals Inorder Traversal void traverse(BNode t){ A traversal is an order for if (t != NULL) visiting all the nodes of a tree + traverse (t.left); process t.element; Three types: * 5 traverse (t.right); • Pre-order: Root, left subtree, right subtree 2 4 } • In-order: Left subtree, root, right subtree } (an expression tree) • Post-order: Left subtree, right subtree, root 9 10 Binary Tree: Special Cases Binary Tree: Some Numbers… Recall: height of a tree = longest path from root to leaf.