Uva-DARE (Digital Academic Repository)

Scalable distributed data structures for database management
Karlsson, S.J.
Publication date: 2000

Citation for published version (APA): Karlsson, S. J. (2000). Scalable distributed data structures for database management.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl). Download date: 01 Oct 2021

Chapter 1

Preliminaries

We start with an introduction to SDDSs, followed by a description of LH*, which gave rise to this line of data storage structures. We motivate their existence and point out some possible application areas. We then describe common (basic) data structures, known to most readers, together with their behavior and properties from the perspective of SDDSs. This defines the terminology and allows us to describe design choices more accurately later.

1.1 Birthground of SDDSs

In traditional distributed file systems, in implementations like NFS or AFS, a file resides entirely at one specific site. This presents obvious limitations.
These limitations concern not only the size of the file but also the scalability of access performance. To overcome them, distribution over multiple sites has been used. One example of such a scheme is round-robin [Cor88], where the records of a file are evenly distributed by rotating through the nodes as records are inserted. The hash-declustering method of [KTM084] assigns records to nodes on the basis of a hashing function. The range-partitioning method of [DGG+86] divides key values into ranges and assigns different ranges to different nodes. A common aspect of these schemes is their static behavior: the declustering criterion does not change over time. Hence, updating a directory or declustering function is not required. The price to pay is that the file cannot expand over more sites than initially allocated.

To overcome this limitation of static schemes, dynamic partitioning is used. The first such scheme is DLH [SPW90]. This scheme was designed for a shared-memory system. In DLH, the file is in RAM and the file parameters are cached in the local memory of each processor. The caches are refreshed selectively when addressing errors occur and through atomic updates to all the local memories at some points. DLH appears impressively efficient for high insertion rates.

1.2 SDDSs

SDDSs were proposed for distributing files in a network multi-computer environment, hence without shared memory. The first scheme was LH* [LNS93]. Distributed Dynamic Hashing (DDH) [Dev93] is another SDDS, based on Dynamic Hashing [Lar78]. Compared with LH*, DDH allows greater splitting autonomy by immediately splitting overflowing buckets. One drawback is that while LH* limits the number of forwardings to two [footnote 1] when the client makes an addressing error, DDH may use O(log_2 N) forwardings, where N is the number of buckets in the DDH file.
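
As an illustration of the two-forwarding bound mentioned for LH*, the following is a minimal sketch of LH*-style addressing as it is commonly presented for [LNS93]. The text above does not spell out these rules, so everything here is an assumption made for illustration: the hash family h_i(C) = C mod 2^i with a single initial bucket, the file state given by a level i and a split pointer n, and all function names. It is not taken from the thesis.

```python
def h(level: int, key: int) -> int:
    """Assumed linear-hashing function h_level(key) = key mod 2^level."""
    return key % (1 << level)


def bucket_level(a: int, i: int, n: int) -> int:
    """Level of bucket a in a file with level i and split pointer n."""
    return i + 1 if a < n or a >= (1 << i) else i


def true_address(key: int, i: int, n: int) -> int:
    """Correct bucket for key, computed from the real file state (i, n)."""
    a = h(i, key)
    return h(i + 1, key) if a < n else a


def client_address(key: int, img_i: int, img_n: int) -> int:
    """Address a client computes from its (possibly outdated) image."""
    return true_address(key, img_i, img_n)


def server_step(a: int, j: int, key: int) -> int:
    """Bucket a (level j) returns a if it keeps the key, else a forward address."""
    target = h(j, key)
    if target != a:
        lower = h(j - 1, key)
        if a < lower < target:  # do not forward past the end of the file
            target = lower
    return target


def route(key: int, i: int, n: int, img_i: int, img_n: int):
    """Return (final bucket, number of forwardings) for one request."""
    a = client_address(key, img_i, img_n)
    hops = 0
    while True:
        nxt = server_step(a, bucket_level(a, i, n), key)
        if nxt == a:
            return a, hops
        a, hops = nxt, hops + 1


def adjust_image(img_i: int, img_n: int, a: int, j: int):
    """Client update from an IAM carrying address a and level j of the
    bucket the client addressed first (assumed IAM content)."""
    if j > img_i:
        img_i, img_n = j - 1, a + 1
    if img_n >= (1 << img_i):
        img_i, img_n = img_i + 1, 0
    return img_i, img_n


# A new client (image level 0, pointer 0) addressing a file of 11 buckets
# (level 3, split pointer 3): key 9 needs two forwardings, 0 -> 1 -> 9.
print(route(9, 3, 3, 0, 0))
```

In this sketch a client with a stale image always addresses a bucket that exists, and each server decides locally whether to keep or forward the key; no component consults a central directory.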

[WBW94] extends LH* and DDH to control the load of a file more efficiently. The main idea is to manage several buckets of a file per server, whereas LH* and DDH have basically only one bucket per server. One also controls the server load, as opposed to the bucket load in LH*.

Both [KW94] and [LNS94] propose primary key ordered files. In [KW94] the access computations on the clients and servers use a distributed binary search tree, whereas the SDDSs in [LNS94], collectively termed RP*, use broadcast or distributed n-ary trees. It is shown that both kinds of SDDSs allow for much larger and faster files than the traditional ones.

Footnote 1: In theory, communication delays could trigger more forwarding [WBW94].

1.3 Requirements from SDDSs

SDDSs (Scalable Distributed Data Structures), such as LH* [LNS96], a distributed variant of Linear Hashing, and others [Dev93][WBW94][LNS94], open up new areas of storage capacity and data access. There are three requirements for an SDDS:

  First, it should have no central directory, to avoid hot-spots.
  Second, each client should have some approximate image of how data is distributed. This image should be improved each time a client makes an addressing error.
  Third, if the client has an outdated image, it is the responsibility of the SDDS to forward the data to the correct data server and to adapt the client's image.

SDDSs are good for distributed computing since they aim at minimizing communication, which in turn minimizes the response time and enables more efficient use of processor time.

In light of LH [Lit80] and LH* [LNS96] the following terms are used. The data sites, termed servers, can be used from any number of autonomous sites, termed clients. To avoid a hot-spot, there is no central directory for the addressing across the current structure of the file. Each client has its own image of this structure. An image can become outdated when the file expands.
The client may then send a request to an incorrect server. The servers forward such requests, possibly in several steps, towards the correct address. The correct server appends to the reply a special message to the client, called an Image Adjustment Message (IAM). The client adjusts its image, avoiding repetition of the error. A well-designed SDDS should make addressing errors occasional and forwards few, and should provide for the scalability of the access performance when the file grows. A typical SDDS scenario has many more clients than servers, and the clients are reasonably active, i.e., each has a hundred or more interactions in its lifetime.

1.4 Data Structures - Background

In this section we give a short overview of commonly used indexing data structures. We start by introducing desirable features of such data structures. In a distributed scenario using SDDSs these desired properties are of even higher importance. The workings of distributed data structures are not the main topic of this section, so it can be skipped by the expert on data structures.

Data in DBMSs is organized using data structures, also referred to as access structures, access paths, accelerators, indices, or indexing structures. We identify three important properties of a "good" data structure. First, the application's accesses to the individual elements encoded in the data structure should be fast, i.e., insertion and retrieval should be efficient (access overhead). Second, the storage overhead, i.e., the extra storage space needed for organizing the data and improving the access speed, should be low. Third, the data structure should handle the required amount of data scalably, i.e., the structure should dynamically adapt to different storage sizes without deteriorating performance.

1.4.1 Retrieval Methods

Data is structured in records containing fields (attributes), e.g., bank account information. Some fields are the target of retrieval and are called search keys. The search method depends on the inherent search characteristics, which can be classified along the following lines:

  Key retrieval (lookup)
  Range retrieval (domain specific)
  Approximate retrieval (sub-string, soundex)
  Predicate search (filtering)
  Multi-dimensional searches (point, spatial, space, nearest)
  Information retrieval (probabilistic)

Whereas some applications access data using one key, others may need to retrieve data in a certain range, which leads to range retrieval. Approximate search is another type of retrieval, often user specified, allowing for matching under a similarity measure. A special case of proximity is sub-string search, for example searching for addresses whose names contain the string "city". Soundex search allows one to search for names of people that sound like "John" (Jon, John, Jonny, Jonni, Johnnie, Johnny). Soundex searching is efficiently implemented by mapping the search key to a normalized (sound-invariant) spelling representation.

Other kinds of retrieval might consider several fields simultaneously, often referred to as different attributes or dimensions. Examples of dimensions are spatial (x, y) coordinates or, in a common database, Zip-code, Age, and Income. A multi-dimensional indexing structure allows retrieval of data using several of these keys (dimensions) at the same time. A more general case is predicate search, which allows the program to specify an arbitrary predicate that, when invoked on the data, returns a true/false value. If the predicate yields true, the data is returned to the user/application. Information retrieval is not that strict: it employs a scoring function that scores the data, returning the "best matches" ranked with the best match first.
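
To make the Soundex idea from this section concrete, here is a minimal sketch of the normalization step, assuming the widely used American Soundex coding rules (first letter kept, consonant groups mapped to digits, result padded to four characters). The thesis does not say which Soundex variant it has in mind, so the variant and the function names are assumptions.

```python
from collections import defaultdict

# Consonant groups of American Soundex (assumed variant).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Map a name to its four-character sound-invariant code."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:      # adjacent letters with equal codes count once
                code += digit
            prev = digit
        elif ch not in "HW":       # vowels (and Y) separate repeated codes
            prev = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def build_index(names):
    """Group names by Soundex code so that a lookup is a single key retrieval."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


index = build_index(["Jon", "John", "Jonny", "Jonni", "Johnnie", "Johnny", "Mary"])
print(index[soundex("John")])  # all six spellings of "John" share one code
```

The design point is that the approximate search is reduced to an exact key lookup on the normalized code, so any of the key-based structures discussed in this chapter can serve it. The coding is coarse, so unrelated names can also collide on the same code.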
Web search engines, such as http://www.altavista.com/, employ various searching and scoring methods.

1.4.2 Reasonable Properties

A reasonably well-behaved and efficient data structure can be expected to fulfill most of the following statements.

  A data structure is a container that stores n items of data.
  Each item is identified by its key(s) (algorithms typically assume that the keys are unique, but in practice this is often relaxed).
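
As a small illustration of a container in the above sense, the sketch below keeps n items sorted by key, allows duplicate keys, and supports three of the retrieval methods from Section 1.4.1 (key, range, and predicate retrieval). The class and method names are hypothetical, and a sorted in-memory list is only one simple way to realize such a container.

```python
import bisect
from typing import Any, Callable, List


class KeyedContainer:
    """Sorted in-memory container of (key, record) items; duplicate keys allowed."""

    def __init__(self) -> None:
        self._keys: List[Any] = []      # keys in sorted order
        self._records: List[Any] = []   # records, aligned with _keys

    def __len__(self) -> int:
        return len(self._keys)

    def insert(self, key: Any, record: Any) -> None:
        # Insert after any equal keys so insertion order is kept among duplicates.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def lookup(self, key: Any) -> List[Any]:
        """Key retrieval: all records stored under exactly this key."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._records[lo:hi]

    def range_query(self, low: Any, high: Any) -> List[Any]:
        """Range retrieval: all records with low <= key <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._records[lo:hi]

    def select(self, predicate: Callable[[Any], bool]) -> List[Any]:
        """Predicate search: a full scan returning records the predicate accepts."""
        return [r for r in self._records if predicate(r)]


people = KeyedContainer()
for name, age in [("Ann", 31), ("Bob", 25), ("Cid", 31), ("Dee", 47)]:
    people.insert(age, name)
print(people.lookup(31), people.range_query(25, 31))
```

Key and range retrieval use binary search and touch only the matching slice, whereas the predicate search has to scan every record, which is why it is the most general and the most expensive of the three.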