Bit Flip Resilient Prefix Tree

Total Pages: 16

File Type: PDF, Size: 1020 KB

Bit Flip Resilient Prefix Tree

Master thesis
Database Group

Author: Mina Ahmadi
Supervisors: Prof. Dr.-Ing. Wolfgang Lehner, Dr.-Ing. Dirk Habich, Dipl.-Inf. Till Kolditz

February 2016

Aufgabenstellung (Master's Thesis Description)

Background

Hardware vendors increase main memory throughput and capacity by decreasing transistor feature sizes. While this also increases energy efficiency, one drawback is the decrease in reliability and an increase in hardware error rates. One particular effect is the bit flip, which causes a single or multiple bits to change their state due to several influences like heat, cosmic rays or electrical crosstalk. Already today, there are studies revealing severe multi-bit flip rates in large computer systems, which cannot be handled by typical error detecting and correcting (ECC) main memory. Nevertheless, this hardware trend allowed the development of main-memory centric database systems more than a decade ago. These systems store all business data in main memory, and they keep several additional data structures to manage all the data and accelerate access to it via index structures. Of the latter, one family recently proposed for use in analytical workloads is the prefix tree [1], whose main characteristics are that it divides a key into prefixes and that it excessively uses pointers. Bit flip detection was already proposed for in-memory B-trees [2].

Application Area

Index structures are only one portion of a database system that needs to be hardened against bit flips. This is required to be able to benefit from future hardware generations and to cope with increasing error rates.

Topic

The goal of this thesis is to investigate means to detect and correct bit flip errors for the prefix tree index structure. The student is encouraged to choose one hardware failure model resembling current or future bit flip rates. First, approaches for a general prefix tree structure should be examined. Then, the student shall consider techniques for optimized prefix tree variants. Drawbacks in throughput and memory footprint are to be measured, and the use of SIMD instructions must be regarded.

References

[1] Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner: QPPT: Query Processing on Prefix Trees. In: Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013), Asilomar, California, USA.

[2] Till Kolditz, Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner: Online Bit Flip Detection for In-Memory B-Trees on Unreliable Hardware. In: Proceedings of the Tenth International Workshop on Data Management on New Hardware (DaMoN 2014), Snowbird, Utah, USA.
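To make the prefix tree characterization above concrete, the following is a minimal sketch of a generalized prefix tree over 32-bit keys. It is purely illustrative and not the implementation of [1] or of this thesis; the 4-bit span per level, the node layout and all names are assumptions chosen for brevity.

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <optional>

// Illustrative sketch of a generalized prefix tree over 32-bit keys.
// Assumption: a fixed 4-bit span per level, i.e. 16 children per node and
// 32 / 4 = 8 levels; the deepest node stores the value.
struct Node {
    std::array<std::unique_ptr<Node>, 16> child; // one pointer per 4-bit prefix
    std::optional<uint64_t> value;               // set only at the last level
};

inline unsigned prefixAt(uint32_t key, unsigned level) {
    // Extract the 4-bit prefix used at the given level, most significant first.
    return (key >> (28 - 4 * level)) & 0xF;
}

void insert(Node& root, uint32_t key, uint64_t value) {
    Node* n = &root;
    for (unsigned level = 0; level < 8; ++level) {
        auto& slot = n->child[prefixAt(key, level)];
        if (!slot) slot = std::make_unique<Node>(); // create missing node
        n = slot.get();
    }
    n->value = value;
}

std::optional<uint64_t> lookup(const Node& root, uint32_t key) {
    const Node* n = &root;
    for (unsigned level = 0; level < 8; ++level) {
        const Node* next = n->child[prefixAt(key, level)].get();
        if (!next) return std::nullopt; // key not present
        n = next;
    }
    return n->value;
}
```

Each lookup follows one pointer per 4-bit prefix, so a flipped bit in any of these pointers can silently redirect the traversal; this is the failure mode the thesis aims to detect.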
Declaration of Authorship

I, Mina Ahmadi, declare that this master's thesis entitled 'Bit Flip Resilient Prefix Tree' is my own work. All information published and written by another person is acknowledged and referenced in the bibliography.

Dresden, 1st of February 2016

Abstract

Detecting bit flips in the index structures of databases is an important issue. Nowadays, index structures reside in main memory in order to avoid disk accesses and to make database processing faster. Since memories are large enough nowadays, keeping the indexes in memory is not a problem, but when bit flips occur we would like to detect them, to avoid accessing a part of memory that is not allocated or reading wrong values. When a bit flip happens, a bit changes from 0 to 1 or vice versa. Bit flips can happen for multiple reasons. Sometimes bit flips are transient, so they go away after a reboot, but sometimes they are permanent. The index structure explored here is the prefix tree. In a first step, this thesis adds parity bits to the pointers in the tree. In a second step, memory for the prefix tree is allocated in such a way that the location of each node can be computed, so a bit flip is detected by a mismatch between the computed location and the stored pointer. In a third step, the prefix tree is implemented so as to reduce the number of memory accesses, while bit flips are detected in the same way as in the second step. Based on the results of these implementations, parity bits detect bit flips only if the number of flipped bits in a byte is odd, whereas the implementations of the second and third steps detect any number of bit flips. Both of the latter methods have memory allocation issues that result in high memory consumption when many keys are inserted into the tree; a reasonable solution for this is given at the end of the thesis.

Acknowledgement

Firstly, I would like to express my sincere gratitude to my advisor Dipl.-Inf. Till Kolditz for the support of my master thesis, for his patience, motivation, and immense knowledge. Besides my advisor, I would like to thank the rest of my thesis committee: Prof. Dr.-Ing. Wolfgang Lehner and Dr.-Ing. Dirk Habich. Last but not least, I would like to thank my family, my parents, my brothers, my sister and my friends for supporting me spiritually throughout writing this thesis and my life in general.

Contents

List of Figures 10
List of Tables 12

1 Introduction 13
  1.1 Motivation 13
  1.2 Objectives 14
  1.3 Structure of the Thesis 14

2 Background and Related Works 15
  2.1 Background 15
    2.1.1 What is Bit Flip? 15
    2.1.2 Types of Error 15
    2.1.3 Soft Errors Detection Mechanisms 16
  2.2 Related Works 17
    2.2.1 A Study of Index Structures for Main Memory Database Management Systems 17
    2.2.2 KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures 21
    2.2.3 CTrie 22
    2.2.4 Burst Trie 23
    2.2.5 Online Bit Flip Detection for In-Memory B-Trees on Unreliable Hardware 25
    2.2.6 Efficient Verification of B-tree Integrity 28
    2.2.7 Efficient In-Memory Indexing with Generalized Prefix Trees 29
  2.3 Summary 31

3 Methodologies 32
  3.1 Basic Structure of Prefix Tree 33
  3.2 Prefix Tree with Parity Bits 34
  3.3 New Layout for Node Storage in Prefix Tree 36
    3.3.1 Density of Tree 36
    3.3.2 Creation of Tree 37
    3.3.3 Insertion of Bit Flips 38
    3.3.4 Evaluation 38
  3.4 Structure of Prefix Tree with Less Memory Access 39
    3.4.1 Structure of Tree 39
    3.4.2 Insertion in Tree 41
    3.4.3 Insertion of Bit Flips 43
    3.4.4 Value Retrieval from Tree 44
  3.5 Summary 46

4 Implementation 47
  4.1 Implementation of Basic Prefix Tree 47
  4.2 Prefix Tree with Parity Bits 48
  4.3 New Layout for Node Storage in Prefix Tree 51
    4.3.1 Density of Tree 51
    4.3.2 Creation of the Dense Tree 53
    4.3.3 Insertion of Bit Flips 54
    4.3.4 Evaluation 54
  4.4 Implementation of Prefix Tree with Less Memory Accesses 55
    4.4.1 Insertion 57
    4.4.2 Bit Flip Insertion 58
    4.4.3 Retrieving Values from Tree 60
  4.5 Summary 61

5 Evaluation and Future Work 62
  5.1 Evaluation 62
  5.2 Future Work 69

6 Bibliography 71

List of Figures

2.1 Extendible Hashing 19
2.2 C-Trie Insertion 23
2.3 Burst trie after insertion of a number of keys 24
3.1 Overall structure of the thesis 33
3.2 Traverse the tree for finding value of key 200 34
3.3 Tree structure 41
3.4 Insertion of keys into the tree 43
3.5 Value retrieval from tree 45
4.1 Setting parity bits for the pointers 49
4.2 Checking parity bits of the pointers 50
4.3 Number of created nodes after using Poisson distribution for key creation 52
4.4 Number of created nodes after using Uniform int distribution for key creation 52
4.5 Number of created nodes after using keys which are multiples of three 53
4.6 Virtual memory allocation for the first, second and third level 56
4.7 Insertion of bit flip in key-value pairs, nodes and flag bytes in the tree
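As a rough illustration of the first detection step described in the abstract (parity bits for the pointers in the tree), the sketch below computes one parity bit per byte of a 64-bit pointer and checks it on access. This is an assumption-laden sketch, not the thesis's actual encoding: the field names, the placement of the parity byte and the use of even parity are illustrative choices only.

```cpp
#include <cstdint>
#include <cstdio>

// Compute one parity bit per byte of a 64-bit pointer representation.
// Bit i of the result is the parity of byte i of ptrBits.
uint8_t computeParity(uint64_t ptrBits) {
    uint8_t parity = 0;
    for (int byte = 0; byte < 8; ++byte) {
        uint8_t b = static_cast<uint8_t>(ptrBits >> (8 * byte));
        b ^= b >> 4;   // fold the byte down to a single parity bit
        b ^= b >> 2;
        b ^= b >> 1;
        parity |= (b & 1u) << byte;
    }
    return parity;
}

// Hypothetical pointer wrapper: stores the pointer bits plus one parity byte.
struct ProtectedPointer {
    uint64_t raw;    // the pointer value as stored in the node
    uint8_t parity;  // one parity bit per byte of 'raw'

    void set(const void* p) {
        raw = reinterpret_cast<uintptr_t>(p);
        parity = computeParity(raw);
    }
    // Returns true if every byte of the stored pointer still matches its parity bit.
    bool check() const { return computeParity(raw) == parity; }
};

int main() {
    int x = 42;
    ProtectedPointer pp;
    pp.set(&x);
    pp.raw ^= 1ull << 17;  // simulate a single bit flip in the stored pointer
    std::printf("bit flip detected: %s\n", pp.check() ? "no" : "yes");
}
```

A single parity bit per byte catches any odd number of flips within that byte but misses an even number, which matches the limitation reported in the abstract; the second and third steps therefore derive the expected node locations from the allocation layout itself and compare them against the stored pointers.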