Persistent Adaptive Radix Trees: Efficient Fine-Grained Updates to Immutable Data
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
KP-Trie Algorithm for Update and Search Operations
The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016 KP-Trie Algorithm for Update and Search Operations Feras Hanandeh1, Mohammed Akour2, Essam Al Daoud3, Rafat Alshorman4, Izzat Alsmadi5 1Dept. of Computer Information Systems, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, Hashemite University, Zarqa, Jordan, [email protected] 2,5Dept. of Computer Information Systems, Faculty of Information Technology, Yarmouk University, Irbid, Jordan, [email protected], [email protected] 3,4Computer Science Department, Zarqa University [email protected]; [email protected] Abstract: Radix-Tree is a space optimized data structure that performs data compression by means of cluster nodes that share the same branch. Each node with only one child is merged with its child and is considered as space optimized. Nevertheless, it can't be considered as speed optimized because the root is associated with the empty string . Moreover, values are not normally associated with every node; they are associated only with leaves and some inner nodes that correspond to keys of interest. Therefore, it takes time in moving bit by bit to reach the desired word. In this paper we propose the KP-Trie which is consider as speed and space optimized data structure that is resulted from both horizontal and vertical compression. Keywords: Trie, radix tree, data structure, branch factor, indexing, tree structure, information retrieval. 1- Introduction The concept of Data structures such as trees has developed since the 19th century. Tries evolved from trees. They have different names Data structures are a specialized format for such as: Radix tree, prefix tree compact trie, efficient organizing, retrieving, saving and bucket trie, crit bit tree, and PATRICIA storing data. -
The Adaptive Radix Tree
Department of Informatics, University of Z¨urich MSc Basismodul The Adaptive Radix Tree Rafael Kallis Matrikelnummer: 14-708-887 Email: [email protected] September 18, 2018 supervised by Prof. Dr. Michael B¨ohlenand Kevin Wellenzohn 1 1 Introduction The goal of this project is to study and implement the Adaptive Radix Tree (ART), as proposed by Leis et al. [2]. ART, which is a trie based data structure, achieves its performance, and space efficiency, by compressing the tree both vertically, i.e., if a node has no siblings it is merged with its parent, and horizontally, i.e., uses an array which grows as the number of children increases. Vertical compression reduces the tree height and horizontal compression decreases a node's size. In Section 3 we describe how ART is constructed by applying vertical and horizontal compression to a trie. Next, we describe the point query procedure, as well as key deletion in Section 4. Finally, a benchmark of ART, a red-black tree and a hashtable is presented in Section 5. 2 Background - Tries A trie [1] is a hierarchical data structure which stores key-value pairs. Tries can answer both point and range queries efficiently since keys are stored in lexicographic order. Unlike a comparison-based search tree, a trie does not store keys in nodes. Rather, the digital representation of a search key is split into partial keys used to index the nodes. When constructing a trie from a set of keys, all insertion orders result in the same tree. Tries have no notion of balance and therefore do not require rebalancing operations. -
CS302ES Regulations
DATA STRUCTURES Subject Code: CS302ES Regulations : R18 - JNTUH Class: II Year B.Tech CSE I Semester Department of Computer Science and Engineering Bharat Institute of Engineering and Technology Ibrahimpatnam-501510,Hyderabad DATA STRUCTURES [CS302ES] COURSE PLANNER I. CourseOverview: This course introduces the core principles and techniques for Data structures. Students will gain experience in how to keep a data in an ordered fashion in the computer. Students can improve their programming skills using Data Structures Concepts through C. II. Prerequisite: A course on “Programming for Problem Solving”. III. CourseObjective: S. No Objective 1 Exploring basic data structures such as stacks and queues. 2 Introduces a variety of data structures such as hash tables, search trees, tries, heaps, graphs 3 Introduces sorting and pattern matching algorithms IV. CourseOutcome: Knowledge Course CO. Course Outcomes (CO) Level No. (Blooms Level) CO1 Ability to select the data structures that efficiently L4:Analysis model the information in a problem. CO2 Ability to assess efficiency trade-offs among different data structure implementations or L4:Analysis combinations. L5: Synthesis CO3 Implement and know the application of algorithms for sorting and pattern matching. Data Structures Data Design programs using a variety of data structures, CO4 including hash tables, binary and general tree L6:Create structures, search trees, tries, heaps, graphs, and AVL-trees. V. How program outcomes areassessed: Program Outcomes (PO) Level Proficiency assessed by PO1 Engineeering knowledge: Apply the knowledge of 2.5 Assignments, Mathematics, science, engineering fundamentals and Tutorials, Mock an engineering specialization to the solution of II B Tech I SEM CSE Page 45 complex engineering problems. -
Artful Indexing for Main-Memory Databases
Indexing for Main-Memory data systems: The Adaptive Radix Tree (ART) Ivan Sinyagin Memory Wall Why indexes ? Best data structure O(1) ? Binary Search ! Binary Search • Cache utilization is low • Only first 3-5 cache lines have good temporal locality • Only the last cache line has spacial locality • Updates in a sorted array are expensive Trees T-tree • Sorted array split into balanced BST with fat nodes (~ cache lines) • Better than RB/AVL • Updates faster, but still expensive • Similar to BS: useless data movement to CPU (useful only min and max) • Developed in mid 80s and still(!) used in many DBMS B+ tree • B+ tree • Fanout => minimize random access by shallowing the tree • Keys fit into a cache line • Increased cache utilization (all keys are useful) • 1 useful pointer • Pipeline stalls - conditional logic • Still expensive updates: splitting & rebalancing CSB+ tree CSB+ tree • ~ 1999-2000 • Improved space complexity • Great cache line utilization: keys + 1 pointer • Node size ~ cache line • Update overhead - more logic to balance Can we do better ? • Less conditional logic • Cheap updates: no rebalancing, no splitting • Preserve order => tree • Preserve few random accesses (low height) • Preserve cache line utilization • Preserve space complexity Tries Radix Tree Implicit keys Space complexity Radix Tree span • k bits keys => k/s inner levels and 2^s pointers • 32 bit keys & span=1 => 32 levels & 2 pointers • 32 bit keys & span=2 => 16 levels & 4 pointers • 32 bit keys & span=3 => 11 levels & 8 pointers • 32 bit keys & span=4 => 8 -
Haskell Communities and Activities Report
Haskell Communities and Activities Report http://tinyurl.com/haskcar Thirty Fourth Edition — May 2018 Mihai Maruseac (ed.) Chris Allen Christopher Anand Moritz Angermann Francesco Ariis Heinrich Apfelmus Gershom Bazerman Doug Beardsley Jost Berthold Ingo Blechschmidt Sasa Bogicevic Emanuel Borsboom Jan Bracker Jeroen Bransen Joachim Breitner Rudy Braquehais Björn Buckwalter Erik de Castro Lopo Manuel M. T. Chakravarty Eitan Chatav Olaf Chitil Alberto Gómez Corona Nils Dallmeyer Tobias Dammers Kei Davis Dimitri DeFigueiredo Richard Eisenberg Maarten Faddegon Dennis Felsing Olle Fredriksson Phil Freeman Marc Fontaine PÁLI Gábor János Michał J. Gajda Ben Gamari Michael Georgoulopoulos Andrew Gill Mikhail Glushenkov Mark Grebe Gabor Greif Adam Gundry Jennifer Hackett Jurriaan Hage Martin Handley Bastiaan Heeren Sylvain Henry Joey Hess Kei Hibino Guillaume Hoffmann Graham Hutton Nicu Ionita Judah Jacobson Patrik Jansson Wanqiang Jiang Dzianis Kabanau Nikos Karagiannidis Anton Kholomiov Oleg Kiselyov Ivan Krišto Yasuaki Kudo Harendra Kumar Rob Leslie David Lettier Ben Lippmeier Andres Löh Rita Loogen Tim Matthews Simon Michael Andrey Mokhov Dino Morelli Damian Nadales Henrik Nilsson Wisnu Adi Nurcahyo Ulf Norell Ivan Perez Jens Petersen Sibi Prabakaran Bryan Richter Herbert Valerio Riedel Alexey Radkov Vaibhav Sagar Kareem Salah Michael Schröder Christian Höner zu Siederdissen Ben Sima Jeremy Singer Gideon Sireling Erik Sjöström Chris Smith Michael Snoyman David Sorokin Lennart Spitzner Yuriy Syrovetskiy Jonathan Thaler Henk-Jan van Tuyl Tillmann Vogt Michael Walker Li-yao Xia Kazu Yamamoto Yuji Yamamoto Brent Yorgey Christina Zeller Marco Zocca Preface This is the 34th edition of the Haskell Communities and Activities Report. This report has 148 entries, 5 more than in the previous edition. -
STAT 534 Lecture 4 April, 2019 Radix Trees C 2002 Marina Meil˘A [email protected]
STAT 534 Lecture 4 April, 2019 Radix trees c 2002 Marina Meil˘a [email protected] Reading CLRS Exercise 12–2 page 269. 1 Representing sets of strings with radix trees Radix trees are used to store sets of strings. Here we will discuss mostly binary strings, but they can be extended easily to store sets of strings over any alphabet. Below is an example of a radix tree that stores strings over the alphabet {0, 1} (binary strings). Each left-going edge is labeled with a 0 and each right-going edge with a 1. Thus, the path to any node in the tree is uniquely described as a string of 0’s and 1’s. The path to the root is the empty string “–”. For example, the path to the leftmost node in the tree is the string “0”; the path to the lowest node in the tree is the string “1011”. Note that there can be at most 2 strings of length 1, at most 4 strings of length 2, and at most 2h strings of length h. All the strings of a given length are at the same depth in the tree. A prefix of the string a =(a1a2 ...an) is any “beginning part” of a, i.e any ′ ′ ′ ′ ′ string a = (a1a2 ...ak) with k < n such that ai = ai for i = 1,...k. Ina radix tree, if node b is an ancestor of node a, then the string represented by b is a prefix of the string (represented by) a. For example, “10” is a prefix and an ancestor of “100”. -
Wormhole: a Fast Ordered Index for In-Memory Data Management
Wormhole: A Fast Ordered Index for In-memory Data Management Xingbo Wu†, Fan Ni‡, and Song Jiang‡ †University of Illinois at Chicago ‡University of Texas at Arlington Abstract 94% of query execution time in today’s in-memory databases [17]. Recent studies have proposed many op- In-memory data management systems, such as key-value timizations to improve them with a major focus on hash- stores, have become an essential infrastructure in today’s table-based key-value (KV) systems, including efforts big-data processing and cloud computing. They rely on on avoiding chaining in hash tables, improving memory efficient index structures to access data. While unordered access through cache prefetching, and exploiting paral- indexes, such as hash tables, can perform point search lelism with fine-grained locking [10, 21, 35]. With these with O(1) time, they cannot be used in many scenarios efforts the performance of index lookup can be pushed where range queries must be supported. Many ordered close to the hardware’s limit, where each lookup needs indexes, such as B+ tree and skip list, have a O(logN) only one or two memory accesses to reach the requested lookup cost, where N is number of keys in an index. For data [21]. an ordered index hosting billions of keys, it may take Unfortunately, the O(1) lookup performance and ben- more than 30 key-comparisons in a lookup, which is an efits of the optimizations are not available to ordered order of magnitude more expensive than that on a hash indexes used in important applications, such as B+ tree table. -
Concurrent Tries with Efficient Non-Blocking Snapshots
Concurrent Tries with Efficient Non-Blocking Snapshots Aleksandar Prokopec Nathan G. Bronson Phil Bagwell Martin Odersky EPFL Stanford Typesafe EPFL aleksandar.prokopec@epfl.ch [email protected] [email protected] martin.odersky@epfl.ch Abstract root T1: CAS root We describe a non-blocking concurrent hash trie based on shared- C3 T2: CAS C3 memory single-word compare-and-swap instructions. The hash trie supports standard mutable lock-free operations such as insertion, C2 ··· C2 C2’ ··· removal, lookup and their conditional variants. To ensure space- efficiency, removal operations compress the trie when necessary. C1 k3 C1’ k3 C1 k3 k5 We show how to implement an efficient lock-free snapshot op- k k k k k k k eration for concurrent hash tries. The snapshot operation uses a A 1 2 B 1 2 4 1 2 single-word compare-and-swap and avoids copying the data struc- ture eagerly. Snapshots are used to implement consistent iterators Figure 1. Hash tries and a linearizable size retrieval. We compare concurrent hash trie performance with other concurrent data structures and evaluate the performance of the snapshot operation. 2. We introduce a non-blocking, atomic constant-time snapshot Categories and Subject Descriptors E.1 [Data structures]: Trees operation. We show how to use them to implement atomic size retrieval, consistent iterators and an atomic clear operation. General Terms Algorithms 3. We present benchmarks that compare performance of concur- Keywords hash trie, concurrent data structure, snapshot, non- rent tries against other concurrent data structures across differ- blocking ent architectures. 1. Introduction Section 2 illustrates usefulness of snapshots. -
(Fall 2019) :: Tree Indexes II
Tree Indexes 08 Part II Intro to Database Systems Andy Pavlo 15-445/15-645 Computer Science Fall 2019 AP Carnegie Mellon University 2 UPCOMING DATABASE EVENTS Vertica Talk → Monday Sep 23rd @ 4:30pm → GHC 8102 CMU 15-445/645 (Fall 2019) 3 TODAY'S AGENDA More B+Trees Additional Index Magic Tries / Radix Trees Inverted Indexes CMU 15-445/645 (Fall 2019) 4 B+TREE: DUPLICATE KEYS Approach #1: Append Record Id → Add the tuple's unique record id as part of the key to ensure that all keys are unique. → The DBMS can still use partial keys to find tuples. Approach #2: Overflow Leaf Nodes → Allow leaf nodes to spill into overflow nodes that contain the duplicate keys. → This is more complex to maintain and modify. CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID 5 9 <5 <9 ≥9 1 3 6 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert 6 5 9 <5 <9 ≥9 1 3 6 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert <6,(Page,Slot)> 5 9 <5 <9 ≥9 1 3 6 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert <6,(Page,Slot)> 5 9 <5 <9 ≥9 1 3 6 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert <6,(Page,Slot)> 5 9 <5 <9 1 3 6 7 8 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert <6,(Page,Slot)> 5 97 9 <5 <9 1 3 6 7 8 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 5 B+TREE: APPEND RECORD ID Insert <6,(Page,Slot)> 5 97 9 <5 <9 <9 ≥9 1 3 6 76 8 7 8 9 13 <Key,RecordId> CMU 15-445/645 (Fall 2019) 6 B+TREE: OVERFLOW LEAF NODES Insert 6 5 9 <5 <9 ≥9 1 3 6 7 8 9 13 6 CMU 15-445/645 (Fall 2019) 6 B+TREE: OVERFLOW LEAF NODES Insert 6 5 9 Insert 7 <5 <9 ≥9 1 3 6 7 8 9 13 6 7 CMU 15-445/645 (Fall 2019) 6 B+TREE: OVERFLOW LEAF NODES Insert 6 5 9 Insert 7 <5 <9 ≥9 Insert 6 1 3 6 7 8 9 13 6 7 6 CMU 15-445/645 (Fall 2019) 7 DEMO B+Tree vs. -
A Contention Adapting Approach to Concurrent Ordered Sets$
A Contention Adapting Approach to Concurrent Ordered SetsI Konstantinos Sagonasa,b, Kjell Winblada,∗ aDepartment of Information Technology, Uppsala University, Sweden bSchool of Electrical and Computer Engineering, National Technical University of Athens, Greece Abstract With multicores being ubiquitous, concurrent data structures are increasingly important. This article proposes a novel approach to concurrent data structure design where the data structure dynamically adapts its synchronization granularity based on the detected contention and the amount of data that operations are accessing. This approach not only has the potential to reduce overheads associated with synchronization in uncontended scenarios, but can also be beneficial when the amount of data that operations are accessing atomically is unknown. Using this adaptive approach we create a contention adapting search tree (CA tree) that can be used to implement concurrent ordered sets and maps with support for range queries and bulk operations. We provide detailed proof sketches for the linearizability as well as deadlock and livelock freedom of CA tree operations. We experimentally compare CA trees to state-of-the-art concurrent data structures and show that CA trees beat the best of the data structures that we compare against by over 50% in scenarios that contain basic set operations and range queries, outperform them by more than 1200% in scenarios that also contain range updates, and offer performance and scalability that is better than many of them on workloads that only contain basic set operations. Keywords: concurrent data structures, ordered sets, linearizability, range queries 1. Introduction With multicores being widespread, the need for efficient concurrent data structures has increased. -
Data Structures and Algorithms for Data-Parallel Computing in a Managed Runtime
Data Structures and Algorithms for Data-Parallel Computing in a Managed Runtime THÈSE NO 6264 (2014) PRÉSENTÉE LE 1ER SEPTEMBRE 2014 À LA FACULTÉ INFORMATIQUE ET COMMUNICATIONS LABORATOIRE DE MÉTHODES DE PROGRAMMATION 1 PROGRAMME DOCTORAL EN INFORMATIQUE, COMMUNICATIONS ET INFORMATION ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES PAR Aleksandar PROKOPEC acceptée sur proposition du jury: Prof. O. N. A. Svensson, président du jury Prof. M. Odersky, directeur de thèse Prof. D. S. Lea, rapporteur Prof. V. Kuncak, rapporteur Prof. E. Meijer, rapporteur Suisse 2014 Go confidently in the direction of your dreams. Live the life you’ve imagined. — Thoreau To my parents and everything they gave me in this life. Acknowledgements Writing an acknowledgment section is a tricky task. I always feared omitting somebody really important here. Through the last few years, whenever I remembered a person that influenced my life in some way, I made a note to put that person here. I really hope I didn’t forget anybody important. And by important I mean: anybody who somehow contributed to me obtaining a PhD in computer science. So get ready – this will be a long acknowledgement section. If somebody feels left out, he should know that it was probably by mistake. Anyway, here it goes. First of all, I would like to thank my PhD thesis advisor Martin Odersky for letting me be a part of the Scala Team at EPFL during the last five years. Being a part of development of something as big as Scala was an amazing experience and I am nothing but thankful for it. -
Hyperion: Building the Largest In-Memory Search Tree
Hyperion: Building the largest in-memory search tree MARKUS MÄSKER, Johannes Gutenberg University Mainz TIM SÜSS, Johannes Gutenberg University Mainz LARS NAGEL, Loughborough University LINGFANG ZENG, Huazhong University of Science and Technology ANDRÉ BRINKMANN, Johannes Gutenberg University Mainz Indexes are essential in data management systems to increase the Search trees have been used to index large data sets since speed of data retrievals. Widespread data structures to provide fast the 1950s [19]; examples are binary search trees [29], AVL and memory-efficient indexes are prefix tries. Implementations like trees [1], red-black trees [8], B-trees [9] and B+-trees. These Judy, ART, or HOT optimize their internal alignments for cache trees have the disadvantage of always storing complete keys and vector unit efficiency. While these measures usually improve inside their nodes and hence large parts of the keys many the performance substantially, they can have a negative impact on times. Binary trees also show bad caching behavior and do memory efficiency. not adapt well to new hardware architectures [44]. The result- In this paper we present Hyperion, a trie-based main-memory key-value store achieving extreme space efficiency. In contrast to ing performance issues become worse due to the increasing other data structures, Hyperion does not depend on CPU vector divergence between CPU and memory speed [42][46]. units, but scans the data structure linearly. Combined with a custom More memory-efficient data structures are tries [19][21] memory allocator, Hyperion accomplishes a remarkable data density which distribute the individual parts of a key over multiple while achieving a competitive point query and an exceptional range nodes to reduce redundancies.