Exact Positioning of Data Approach to Memory Mapped Persistent Stores: Design, Analysis and Modelling


Exact Positioning of Data Approach to Memory Mapped Persistent Stores: Design, Analysis and Modelling

by Anil K. Goel

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science. Waterloo, Ontario, Canada, 1996. © Anil K. Goel 1996

I hereby declare that I am the sole author of this thesis. I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the purpose of scholarly research. I further authorize the University of Waterloo to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

The University of Waterloo requires the signatures of all persons using or photocopying this thesis. Please sign below, and give address and date.

Abstract

One of the primary functions of computers is to store information, i.e., to deal with long-lived or persistent data. Programmers working with persistent data structures are faced with the problem that there are two, mostly incompatible, views of structured data, namely data in primary and secondary storage. Traditionally, these two views of data have been dealt with independently by researchers in the programming language and database communities.

Significant research has occurred over the last decade on efficient and easy-to-use methods for manipulating persistent data structures in a fashion that makes the secondary storage transparent to the programmer. Merging primary and secondary storage in this manner produces a single-level store, which gives the illusion that data on secondary storage is accessible in the same way as data in primary storage. In complex design environments, a single-level store offers substantial performance advantages over conventional file or database access. These advantages are crucial to unconventional database applications such as computer-aided design, text management, and geographical information systems. In addition, a single-level store reduces complexity in a program by freeing the programmer from the responsibility of dealing with two views of data.

This dissertation proposes, develops and investigates a novel approach for implementing single-level stores using memory mapping. Memory mapping is the use of virtual memory to map data stored on secondary storage into primary storage so that the data is directly accessible by the processor's instructions. In this environment, all transfer of data to and from the secondary store takes place implicitly during program execution.

The methodology was motivated by the significant simplification in expressing complex data structures offered by the technique of memory mapping. This work parallels other proposals that exploit the potential of memory mapping, but develops a unique approach based on the ideas of segmentation and exact positioning of data in memory. Rigorous experimentation has been conducted to demonstrate the effectiveness and ease of use of the proposed methodology vis-à-vis the traditional approaches of manipulating structured data on secondary storage. The behaviour of high-level database algorithms in the proposed memory mapped environment, especially in highly parallel systems, has been investigated.
A quantitative analytical model of computation in this environment has been designed and validated through experiments conducted on several database join algorithms; parallel multi-disk versions of the traditional join algorithms were developed for this purpose. An analytical model of the system is extremely useful to data structure and algorithm designers for predicting general performance behaviour without having to construct and test specific algorithms. More importantly, a quantitative model is an essential tool for database subsystems such as a query optimizer.

Acknowledgements

The production of this dissertation would have been so much harder, if not impossible, without the guidance, encouragement, mentoring, unbounded patience, hard work, friendship and generosity of my supervisor Dr. Peter Buhr. I thank Peter for everything he has done for me since my arrival at Waterloo.

I am grateful to Dr. Prabhakar Ragde and Dr. Naomi Nishimura for co-supervising the theoretical aspects of this dissertation. Their contribution was instrumental in making this dissertation into a reality.

I thank my wife Aparna, daughter Ankita and son Anirudh, who have all contributed to the dissertation in more ways than one. They provided the flicker of light during the darkest hours and ensured that I did not get lost somewhere along the way.

I thank my external examiner, Dr. John Rosenberg of the University of Sydney, Australia, and the other members of my examining committee, Dr. Paul Larson, Dr. Frank Tompa and Dr. Bruno Preiss, for their valuable suggestions. Dr. Paul Larson, Dr. Bernhard Seeger, Andy Wai and David Clark provided valuable help during the early experiments. Dr. Ian Munro was always willing to discuss matters of theory. Several other people, in particular N. Asokan, Gopi Attaluri, Lauri Brown and Glenn Paulley, offered suggestions and encouragement that were instrumental in preserving my sanity, and I am thankful to all of them.

I am thankful to the Math Faculty Computing Facility at the University of Waterloo for maintaining a top notch computing environment and for employing me during the last three years of my stay at UW. MFCF and the wonderful people that work there certainly made my life much easier. In particular, I thank Bill Ince for never saying no.

Finally, I thank all the people I have interacted with at Waterloo for making my stay here most enjoyable.

To Aparna

Contents

1 Introduction
   1.1 The Single-Level Store
   1.2 Exact Positioning of Data Approach to Memory Mapping
       1.2.1 EPD Approach
       1.2.2 Multiple Accessible Databases and Inter-Database Pointers
       1.2.3 EPD Persistence Model
   1.3 Motivation
   1.4 The Thesis
       1.4.1 Dissertation Overview

2 Memory Mapping and Single-Level Stores
   2.1 Motivation for Using Memory Mapping
   2.2 Memory Mapping and the EPD Approach
       2.2.1 Non-Uniform Access Speed
       2.2.2 Advantages
       2.2.3 Disadvantages
   2.3 Survey of Related Work
       2.3.1 Software Approaches Based on Conventional Architectures
             PS-Algol / POMS
             Napier / Brown's Stable Store
             E / EXODUS Storage Manager
             Other Language Efforts
             Texas: Pointer Swizzling at Page Fault Time
             Hybrid Pointer Swizzling
             ObjectStore
             Cricket
             QuickStore
       2.3.2 Architectural Approaches
             Bubba Database System
             MONADS Architecture
             Model for Address-Oriented Software
             Single Address Space Operating Systems (SASOSs)
             Grasshopper Operating System
             IBM RS 6000 and AS/400
             Recoverable Virtual Memory
             Camelot Distributed Transaction System
             IBM's 801 prototype hardware architecture
             Clouds Distributed Operating System
       2.3.3 Others
   2.4 Summary

3 Using the EPD Approach to Build a Single-Level Store
   3.1 Database Design Methodology
       3.1.1 Design Objectives
       3.1.2 Basic Structure
       3.1.3 Representative
       3.1.4 Accessors
       3.1.5 Critique of Database
   3.2 Comparison of Database with Related Approaches
   3.3 Parallelism in Database
       3.3.1 Partitioned File Structures and Concurrent Retrievals
       3.3.2 Query Types and Parallelism
       3.3.3 Range Query Generators or Iterators
       3.3.4 Generic Concurrent Retrieval Algorithm
   3.4 Programming Issues and Interfaces
       3.4.1 Polymorphism
       3.4.2 Generic File Structures and Access Methods
       3.4.3 Storage Management
             Memory Organization
       3.4.4 Nested Memory Structure
       3.4.5 Address Space Tools
       3.4.6 Segment Tools
       3.4.7 Database Programming Interface
       3.4.8 Representative Interface
             Class Rep
             Class RepAccess
             Organization of Representative and Access Classes
             Class RepWrapper
       3.4.9 Heap Tools
             Storage Management Schemes
             Nesting Heaps
             Overflow Control
             Expansion Object
       3.4.10 Linked List Example
             List Application
       3.4.11 Linked List File Structure
             List Node
             Administration
             Expansion Class
             File Structure Class
             Access Class
             Generator
             Wrapper
       3.4.12 Programming Conventions
       3.4.13 B-Tree Example
             B-Tree Application
             Nested Memory Manager
   3.5 Analytical Modelling of the System
       3.5.1 Survey of Related Work
             Theoretical Models
             Database Studies
       3.5.2 Modelling
             Disk Transfer Time
             Memory Mapping Costs
       3.5.3 Using the Model to Analyze an Algorithm
   3.6 Summary
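The mechanism at the heart of the abstract — mapping persistent data to exact virtual addresses so that stored pointers remain directly usable — can be sketched in a few lines of C. The sketch below is a hedged illustration, not code from the thesis; the file name, fixed address, and segment size are assumptions.

```c
/* Minimal sketch of the exact-positioning idea: map a backing file at the
 * same virtual address in every run, so pointers written into the segment
 * remain valid across executions without relocation or swizzling.
 * The address, size, and file name are illustrative, not from the thesis. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEGMENT_ADDR ((void *)0x200000000000ULL) /* fixed position chosen at creation */
#define SEGMENT_SIZE (1UL << 20)                 /* 1 MiB segment */

int main(void) {
    int fd = open("store.db", O_RDWR | O_CREAT, 0666);
    if (fd < 0 || ftruncate(fd, SEGMENT_SIZE) < 0) { perror("open"); return 1; }

    /* MAP_FIXED demands exactly this address; a real system would record the
     * address in the segment header and fail gracefully on a conflict. */
    void *base = mmap(SEGMENT_ADDR, SEGMENT_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_FIXED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* A pointer stored here is already correct on the next run, because the
     * data reoccupies the same virtual addresses. */
    char **slot = (char **)base;
    if (*slot == NULL) {               /* first run: build the structure */
        *slot = (char *)base + 64;
        sprintf(*slot, "hello, persistent world");
    }
    printf("%s\n", *slot);             /* later runs: dereference directly */

    munmap(base, SEGMENT_SIZE);
    close(fd);
    return 0;
}
```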
Recommended publications
  • Memory Map: 68HC12 CPU and MC9S12DP256B Evaluation Board
ECE331 Handout 7: Microcontroller Peripheral Hardware

MC9S12DP256B block diagram: peripheral I/O devices associated with ports. Expanded microcontroller architecture: CPU and peripheral hardware details.

• CPU — components and control signals from the Control Unit; connections between the Control Unit and the data path; components and signal flow within the data path.
• Peripheral hardware memory map — memory and I/O devices are connected through the bus; all peripheral blocks are mapped to addresses so the CPU can read/write them; the physical memory type varies (register, PROM, RAM).

Memory map: 68HCS12 CPU and MC9S12DP256B evaluation board — configuration register set and physical memory.

HCS12 modes of operation:

  MODC  MODB  MODA  Mode                     Port A     Port B
  0     0     0     special single chip      G.P. I/O   G.P. I/O
  0     0     1     special expanded narrow  Addr/Data  Addr
  0     1     0     special peripheral       Addr/Data  Addr/Data
  0     1     1     special expanded wide    Addr/Data  Addr/Data
  1     0     0     normal single chip       G.P. I/O   G.P. I/O
  1     0     1     normal expanded narrow   Addr/Data  Addr
  1     1     0     reserved                 --         --
  1     1     1     normal expanded wide     Addr/Data  Addr/Data

  (G.P. = general purpose)

HCS12 ports for expanded modes.

Memory basics:
• RAM (Random Access Memory) — historically defined as a memory array with individual bit access; refers to memory with both read and write capabilities.
• ROM (Read Only Memory) — no capability for "online" memory write operations; writing typically requires high voltages or erasing by UV light.
• Volatility of memory — volatile memory loses data over time or when power is removed (RAM is volatile); non-volatile memory stores data even when power is removed (ROM is non-volatile).
• Static vs.
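Because every peripheral block in such a memory map is addressable like ordinary memory, the CPU manipulates devices with plain loads and stores. A minimal C sketch of the idiom follows; the register names match the classic HC12 layout (PORTA at 0x0000, DDRA at 0x0002), but treat the addresses as illustrative rather than as data-sheet facts.

```c
/* Sketch of memory-mapped peripheral access on a microcontroller: registers
 * occupy fixed addresses in the CPU's memory map, so the CPU reads and
 * writes them with ordinary load/store instructions. Addresses follow the
 * classic HC12 layout but are used here only as illustrations. */
#include <stdint.h>

#define PORTA_ADDR 0x0000u   /* Port A data register (illustrative) */
#define DDRA_ADDR  0x0002u   /* Port A data-direction register (illustrative) */

#define PORTA (*(volatile uint8_t *)PORTA_ADDR)
#define DDRA  (*(volatile uint8_t *)DDRA_ADDR)

void led_init(void) {
    DDRA |= 0x01u;           /* configure bit 0 of Port A as an output */
}

void led_toggle(void) {
    PORTA ^= 0x01u;          /* flip the pin; volatile forces a real bus access */
}
```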
  • Efficient Support of Position Independence on Non-Volatile Memory
Efficient Support of Position Independence on Non-Volatile Memory
Guoyang Chen (Qualcomm Inc.), Lei Zhang (North Carolina State University, Raleigh, NC), Richa Budhiraja (Qualcomm Inc.), Xipeng Shen (North Carolina State University, Raleigh, NC), Youfeng Wu (Intel Corp.)
In Proceedings of MICRO-50, Cambridge, MA, USA, October 14–18, 2017, 13 pages. https://doi.org/10.1145/3123939.3124543

ABSTRACT
This paper explores solutions for enabling efficient support of position independence of pointer-based data structures on byte-addressable Non-Volatile Memory (NVM). When a dynamic data structure (e.g., a linked list) gets loaded from persistent storage into main memory in different executions, the locations of the elements contained in the data structure could differ in the address spaces from one run to another. As a result, some special support must be provided to ensure that the pointers contained in the data structures always point to the correct locations, which is called position independence.

1 INTRODUCTION
Byte-Addressable Non-Volatile Memory (NVM) is a new class of memory technologies, including phase-change memory (PCM), memristors, STT-MRAM, resistive RAM (ReRAM), and even flash-backed DRAM (NV-DIMMs). Unlike on DRAM, data on NVM are durable, despite software crashes or power loss. Compared to existing flash storage, some of these NVM technologies promise 10-100x better performance, and can be accessed via memory instructions rather than I/O operations.

This paper shows the insufficiency of traditional methods in supporting position independence on NVM. It proposes a concept called
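One of the traditional remedies the paper contrasts its approach with is self-relative (offset-based) pointers: each "pointer" stores its distance from its own location rather than an absolute address, so the structure stays valid wherever the NVM region is mapped. A minimal sketch, with illustrative names, follows.

```c
/* Sketch of self-relative (offset-based) pointers, one classic technique
 * for position independence: a "pointer" records the distance from its own
 * location to its target, so the data structure survives being mapped at a
 * different base address in every run. Names here are illustrative. */
#include <stddef.h>

typedef struct { ptrdiff_t off; } rel_ptr;     /* off == 0 encodes NULL */

static inline void rel_set(rel_ptr *p, void *target) {
    p->off = target ? (char *)target - (char *)p : 0;
}

static inline void *rel_get(const rel_ptr *p) {
    return p->off ? (char *)p + p->off : NULL;
}

/* A linked-list node whose "next" link survives remapping at any base. */
typedef struct node {
    int     value;
    rel_ptr next;
} node;
```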
  • English Arabic Technical Computing Dictionary
English Arabic Technical Computing Dictionary
Arabeyes Arabisation Team — http://wiki.arabeyes.org/Technical Dictionary
Version 0.1 — April 29, 2007

This is a compilation of the Technical Computing Dictionary that is under development at Arabeyes, the Arabic UNIX project. The technical dictionary aims to translate and standardise technical terms that are used in software. It is an effort to unify the terms used across all Open Source projects and to present the user with consistent and understandable interfaces. This work is licensed under the FreeBSD Documentation License, the text of which is available at the back of this document. Contributors are welcome, please consult the URL above or contact [email protected]. (An Arabic version of this preface follows in the original.)

The dictionary opens with entries pairing English headwords with Arabic translations: Abortive release, Abort, Abscissa, Absolute address, Absolute pathname, Absolute path, Absolute, Abstract class, Abstract data type.
  • User Guide Commplete 4000 Single Board Computer (IPC-623C) User Guide S000277A Revision a All Rights Reserved
MultiTech Model IPC-623C Single Board Computer for CommPlete 4000 Server — User Guide

CommPlete 4000 Single Board Computer (IPC-623C) User Guide, S000277A, Revision A.

All rights reserved. This publication may not be reproduced, in whole or in part, without prior expressed written permission from Multi-Tech Systems, Inc. Copyright © 2002 by Multi-Tech Systems, Inc.

Multi-Tech Systems, Inc. makes no representation or warranties with respect to the contents hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Furthermore, Multi-Tech Systems, Inc. reserves the right to revise this publication and to make changes from time to time in the content hereof without obligation of Multi-Tech Systems, Inc., to notify any person or organization of such revisions or changes.

Record of Revisions
  Revision A — Manual released (08/12/02).

Patents
This product is covered by one or more of the following U.S. Patent Numbers: 5,301,274; 5,309,562; 5,355,365; 5,355,653; 5,452,289; 5,453,986. Other patents pending.

Trademarks
The Multi-Tech logo is a registered trademark of Multi-Tech Systems, Inc. NetWare is a registered trademark of Novell, Inc. Pentium is a registered trademark of Intel Corporation. SCO is a registered trademark of Santa Cruz Operation, Inc. UNIX is a registered trademark of X/Open Company, Ltd. Windows 95 and Windows NT are registered trademarks of Microsoft.

Multi-Tech Systems, Inc., 2205 Woodale Drive, Mounds View, Minnesota 55112. (763) 785-3500 or (800) 328-9717; Fax (763) 785-9874; Tech Support (800) 972-2439. Internet address: http://www.multitech.com

Table of Contents
  Chapter 1 - Introduction
  • Database : a Toolkit for Constructing Memory Mapped Databases Peter A
Database: A Toolkit for Constructing Memory Mapped Databases
Peter A. Buhr, Anil K. Goel and Anderson Wai
Dept. of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1

Abstract

The main objective of this work was an efficient methodology for constructing low-level database tools that are built around a single-level store implemented using memory mapping. The methodology allowed normal programming pointers to be stored directly onto secondary storage, and subsequently retrieved and manipulated by other programs without the need for relocation, pointer swizzling or reading all the data. File structures for a database, e.g. a B-Tree, built using this approach are significantly simpler to build, test, and maintain than traditional file structures. All access methods to the file structure are statically type-safe, and file structure definitions can be generic in the type of the record and possibly key(s) stored in the file structure, which affords significant code reuse. An additional design requirement is that multiple file structures may be simultaneously accessible by an application. Concurrency at both the front end (multiple accessors) and the back end (file structure partitioned over multiple disks) is possible. Finally, experimental results between traditional and memory mapped file structures show that performance of a memory mapped file structure is as good or better than the traditional approach.

1 Introduction

The main objective of this work was an efficient methodology for constructing low-level database tools that are built around a single-level store. A single-level store gives the illusion that data on disk (secondary storage) is accessible in the same way as data in main memory (primary storage), which is analogous to the goals of virtual memory.
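The toolkit's heap tools can be pictured as allocators whose own state lives inside the mapped segment, so that both the allocator metadata and the raw pointers among allocations persist across runs. The bump-allocator sketch below is an assumption-laden illustration of that idea, not the toolkit's actual design.

```c
/* Sketch of a bump allocator living inside a mapped segment: the allocator
 * state is itself stored in the segment, so allocations (and the raw
 * pointers among them) persist across runs. Layout and field names are
 * assumptions, not the toolkit's real interface. */
#include <stddef.h>

typedef struct {
    size_t used;    /* bytes handed out so far, persisted in the segment */
    size_t size;    /* total segment size, including this header */
    /* user data follows */
} seg_header;

void *seg_alloc(seg_header *h, size_t n) {
    n = (n + 7) & ~(size_t)7;                    /* keep 8-byte alignment */
    if (sizeof(seg_header) + h->used + n > h->size)
        return NULL;                             /* a real heap would chain an overflow segment */
    void *p = (char *)h + sizeof(seg_header) + h->used;
    h->used += n;
    return p;  /* stable across runs, since the segment reoccupies its address */
}
```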
  • Memory Mapping and DMA
Chapter 15: Memory Mapping and DMA

This chapter delves into the area of Linux memory management, with an emphasis on techniques that are useful to the device driver writer. Many types of driver programming require some understanding of how the virtual memory subsystem works; the material we cover in this chapter comes in handy more than once as we get into some of the more complex and performance-critical subsystems. The virtual memory subsystem is also a highly interesting part of the core Linux kernel and, therefore, it merits a look.

The material in this chapter is divided into three sections:
• The first covers the implementation of the mmap system call, which allows the mapping of device memory directly into a user process's address space. Not all devices require mmap support, but, for some, mapping device memory can yield significant performance improvements.
• We then look at crossing the boundary from the other direction with a discussion of direct access to user-space pages. Relatively few drivers need this capability; in many cases, the kernel performs this sort of mapping without the driver even being aware of it. But an awareness of how to map user-space memory into the kernel (with get_user_pages) can be useful.
• The final section covers direct memory access (DMA) I/O operations, which provide peripherals with direct access to system memory.

Of course, all of these techniques require an understanding of how Linux memory management works, so we start with an overview of that subsystem.
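The chapter's first section centers on implementing a driver's mmap method. A minimal sketch of the common pattern, using the kernel's remap_pfn_range() to map a range of physical pages into the calling process's VMA, is shown below; the device and its physical base address are hypothetical.

```c
/* Minimal sketch of a driver's mmap method in the style the chapter
 * describes: remap_pfn_range() maps a range of physical pages into the
 * calling process's VMA. The device and its physical base are hypothetical. */
#include <linux/mm.h>
#include <linux/module.h>

#define DEMO_PHYS_BASE 0x10000000UL   /* hypothetical device memory */

static int demo_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Map the device pages, uncached, into the user VMA. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    if (remap_pfn_range(vma, vma->vm_start,
                        DEMO_PHYS_BASE >> PAGE_SHIFT,
                        size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}
```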
  • Virtual Memory and Linux
Virtual Memory and Linux
Matt Porter, Embedded Linux Conference Europe, October 13, 2016

About the original author, Alan Ott: unfortunately, he is unable to be at ELCE 2016. He is a veteran embedded systems and Linux developer, and Linux Architect at SoftIron — a hardware company, strong on software, building 64-bit ARM servers and data center appliances (Overdrive 3000, with more products in process).

Physical memory: single address space
• Simple systems have a single address space. Memory and peripherals share it: memory is mapped to one part, peripherals to another.
• All processes and the OS share the same memory space, with no memory protection: processes can stomp one another, and user space can stomp kernel memory.
• CPUs with a single address space include the 8086-80286, ARM Cortex-M, 8- and 16-bit PIC, AVR, SH-1, SH-2, and most 8- and 16-bit systems.

x86 physical memory map
• Lots of legacy: RAM is split (DOS area and extended), hardware is mapped between the RAM areas, and high and extended memory are accessed differently.

Limitations
• Portable C programs expect flat memory; multiple memory access methods limit portability.
• Management is tricky: you need to know or detect total RAM and keep processes separated.
• No protection: rogue programs can corrupt the entire system.

Virtual memory
• Virtual memory is a system that uses an address mapping from virtual address space to physical address space: virtual addresses are mapped to physical RAM and to hardware devices (PCI devices, GPU RAM, on-SoC IP blocks), as the sketch below illustrates.
• Advantages: each process can have a different memory mapping, so one process's RAM is inaccessible (and invisible) to other processes.
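As a concrete taste of mapping hardware through virtual memory on embedded Linux, the hedged user-space sketch below maps a peripheral's physical registers into a process via /dev/mem. The physical address is illustrative, and this requires root privileges (and a kernel without STRICT_DEVMEM restrictions on the range).

```c
/* User-space illustration that virtual memory can map hardware as well as
 * RAM: /dev/mem lets a privileged process map a peripheral's physical
 * registers into its own address space. The physical address is illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const off_t phys = 0x3f200000;           /* hypothetical GPIO block */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, phys);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    printf("reg[0] = 0x%08x\n", regs[0]);    /* a plain load hits the device */
    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```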
  • I.MX51 Board Initialization and Memory Mapping Using the Linux Target Image Builder (LTIB) by Multimedia Application Division Freescale Semiconductor, Inc
Freescale Semiconductor Application Note AN3980, Rev. 0, 03/2010
i.MX51 Board Initialization and Memory Mapping Using the Linux Target Image Builder (LTIB)
by Multimedia Application Division, Freescale Semiconductor, Inc., Austin, TX

This application note provides general information regarding the board initialization process and the memory mapping implementation of the Linux kernel using the LTIB in an i.MX51 Board Support Package (BSP). The board initialization process is relatively complex, hence this application note gives a general overview of it while explaining more about the memory mapping. Knowledge of these aspects enables a better understanding of the BSP structure: when changes such as migrations to another board or device are needed (memories, external board chips), these are some of the elements that must be changed on the software side. This application note targets the i.MX51 BSPs, but it is applicable to any i.MX device; the structure and architecture of the system software is the same for all the i.MX BSPs. It covers the information and the initialization process flow of a BSP running LTIB on an i.MX51 platform.

Contents: 1. Linux Booting Process (1.1 General Bootloader Objectives; 1.2 Tags in Linux Booting Process); 2. Board Initialization Process (2.1 MACHINE_START Description and Flow; 2.2 Board Initialization Function); 3. Memory Map (3.1 I/O Mapping Function Flow and Description; 3.2 Memory Map on i.MX51); 4. References (4.1 Freescale Semiconductor Documents; 4.2 Standard Documents); 5. Revision History.
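The MACHINE_START record that Section 2.1 of the note walks through looked roughly like the sketch below in ARM Linux trees of that era: the board registers its I/O-mapping and init hooks, and the kernel invokes map_io early to establish the static virtual-to-physical I/O mappings. Exact field and function names varied by kernel version, so treat these as approximations rather than the BSP's exact code.

```c
/* Sketch of a MACHINE_START record as discussed in Section 2.1, roughly as
 * it appeared in ARM Linux trees of that era. Field and function names
 * varied by kernel version; treat them as approximations. */
#include <asm/mach/arch.h>
#include <asm/mach-types.h>

MACHINE_START(MX51_BABBAGE, "Freescale i.MX51 Babbage Board")
    .boot_params  = PHYS_OFFSET + 0x100,  /* where the bootloader left the ATAG list */
    .map_io       = mx51_map_io,          /* builds the static virtual->physical I/O mappings */
    .init_irq     = mx51_init_irq,        /* interrupt controller setup */
    .timer        = &mxc_timer,           /* system timer */
    .init_machine = mxc_board_init,       /* board-level device registration */
MACHINE_END
```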
  • Virtual Memory ‰ Virtual Memory with Paging ‰ Page Table ‰ Multilevel Page Table
Preview (CS 431 Operating Systems): memory management with a single process; with multiple processes; multi-process with fixed partitions; modeling multiprogramming; swapping; memory management with bitmaps; memory management with free lists; virtual memory; virtual memory with paging; page tables; multilevel page tables.

Memory management
An ideal memory would be infinitely large, infinitely fast, non-volatile, and inexpensive. No such memory exists, so most computers have a memory hierarchy: small, fast, very expensive, volatile registers; expensive cache memory; megabytes of medium-speed RAM; and tens or hundreds of gigabytes of slow, cheap, non-volatile disk storage (hard disks). Memory management is the part of the operating system which manages the memory hierarchy.

Memory management (mono-process)
Mono-programming without swapping or paging: memory is shared only between a user program and the operating system, and only one program can be loaded in memory. When the program finishes its job, a scheduler (the long-term scheduler) chooses one job from the pool of jobs, loads it into memory, and starts to run it.

Memory management (multi-process)
Multiprogramming with fixed partitions: memory is divided into n (possibly unequal size) partitions, which can be done manually when the system starts, with either
1. fixed memory partitions with separate input queues for each partition, or
2. fixed memory partitions with a single input queue.
With separate input queues for each partition, an arriving job is put into the queue for the smallest partition that is large enough to hold it, as the sketch below illustrates.
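A toy implementation of that placement policy, with illustrative partition sizes (a known drawback of this scheme, which the course material goes on to discuss, is that small jobs may queue while larger partitions sit idle):

```c
/* Toy illustration of fixed-partition placement with separate input queues:
 * an arriving job is directed to the smallest partition large enough to
 * hold it. Partition sizes and units are illustrative. */
#include <stdio.h>

#define NPART 4
static const int part_size[NPART] = {100, 200, 400, 800};  /* KB, ascending */

/* Returns the index of the smallest adequate partition, or -1 if none fits. */
int pick_partition(int job_size) {
    for (int i = 0; i < NPART; i++)
        if (job_size <= part_size[i])
            return i;
    return -1;
}

int main(void) {
    int jobs[] = {120, 90, 500, 900};
    for (int i = 0; i < 4; i++)
        printf("job %d KB -> partition %d\n", jobs[i], pick_partition(jobs[i]));
    return 0;
}
```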
  • Programming Models for Emerging Non-Volatile Memory Technologies ANDYHARDWARE RUDOFF
Programming Models for Emerging Non-Volatile Memory Technologies
HARDWARE — Andy Rudoff

Andy Rudoff is an Enterprise Storage Architect at Intel. He has more than 25 years of experience in operating systems internals, file systems, and networking. Andy is co-author of the popular UNIX Network Programming book. More recently, he has focused on programming models and algorithms for Non-Volatile Memory usage. [email protected]

Upcoming advances in Non-Volatile Memory (NVM) technologies will blur the line between storage and memory, creating a disruptive change to the way software is written. Just as NAND (Flash) has led to the addition of new operations such as TRIM, next generation NVM will support load/store operations and require new APIs. In this article, I describe some of the issues related to NVM programming, how they are currently being resolved, and how you can learn more about the emerging interfaces.

The needs of these emerging technologies will outgrow the traditional UNIX storage software stack. Instead of basic read/write interfaces to block storage devices, NVM devices will offer more advanced operations to software components higher up in the stack. Instead of applications issuing reads and writes on files, converted into block I/O on SCSI devices, applications will turn to new programming models offering direct access to persistent memory (PM). The resulting programming models allow applications to leverage the benefits of technological advances in NVM. The immediate success of these advances and next generation NVM technologies will rely on the availability of useful and familiar interfaces for application software as well as kernel components.
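The load/store model the article describes can be previewed with a file on a DAX-capable persistent-memory filesystem: map it, update it with ordinary stores, and flush explicitly. The sketch below uses portable msync() as the durability step; real PM code would use optimized flush instructions, e.g., through a library such as PMDK. The path is illustrative.

```c
/* Hedged sketch of the load/store programming model for persistent memory:
 * a mapped file is updated with an ordinary store, then flushed to reach
 * persistence. msync() is the portable fallback; optimized PM code would
 * use CPU flush instructions via a library. The path is illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/mnt/pmem/counter", O_RDWR | O_CREAT, 0666);
    if (fd < 0 || ftruncate(fd, 4096) < 0) return 1;

    uint64_t *counter = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) return 1;

    (*counter)++;                               /* a plain store: no read()/write() */
    msync(counter, sizeof *counter, MS_SYNC);   /* force it to persistence */

    munmap(counter, 4096);
    close(fd);
    return 0;
}
```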
  • Leanstore: In-Memory Data Management Beyond Main Memory
LeanStore: In-Memory Data Management Beyond Main Memory
Viktor Leis, Michael Haubenschild*, Alfons Kemper, Thomas Neumann
Technische Universität München; Tableau Software*

Fig. 1. Single-threaded in-memory TPC-C performance (100 warehouses), comparing BerkeleyDB, WiredTiger, LeanStore, and a pure in-memory system.

Abstract—Disk-based database systems use buffer managers in order to transparently manage data sets larger than main memory. This traditional approach is effective at minimizing the number of I/O operations, but is also the major source of overhead in comparison with in-memory systems. To avoid this overhead, in-memory database systems therefore abandon buffer management altogether, which makes handling data sets larger than main memory very difficult.

In this work, we revisit this fundamental dichotomy and design a novel storage manager that is optimized for modern hardware. Our evaluation, which is based on TPC-C and micro benchmarks, shows that our approach has little overhead in comparison with a pure in-memory system when all data resides in main memory. At the same time, like a traditional buffer manager, it is fully transparent and can manage very large data sets effectively. Furthermore, due to low-overhead synchronization, our implementation is also highly scalable on multi-core CPUs.

Two representative proposals for efficiently managing larger-than-RAM data sets in main-memory systems are Anti-Caching [7] and Siberia [8], [9], [10]. In comparison with a traditional buffer manager, these approaches exhibit one major weakness: They are not capable of maintaining a replacement strategy over relational and index data. Either the indexes,
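A key ingredient in making such a buffer manager cheap is referencing pages through tagged words ("swips" in LeanStore's terminology) that hold either a hot in-memory pointer or a cold page identifier. The encoding below is an assumed illustration of the idea, not LeanStore's actual layout; it relies on pointers being at least 2-byte aligned so the low bit is free for the tag.

```c
/* Sketch of a tagged page reference ("swip"): a single 64-bit word holds
 * either a hot in-memory pointer or a cold page identifier, distinguished
 * by a tag bit, so a tree can reference resident and evicted pages
 * uniformly. Encoding details are assumptions, not LeanStore's layout. */
#include <stdint.h>

#define COLD_TAG 1ULL   /* low bit set => page id; clear => raw pointer */

typedef struct { uint64_t word; } swip;

static inline int      swip_is_cold(swip s)  { return (int)(s.word & COLD_TAG); }
static inline void    *swip_ptr(swip s)      { return (void *)(uintptr_t)s.word; }
static inline uint64_t swip_pid(swip s)      { return s.word >> 1; }

/* Pointers are at least 2-byte aligned, so their low bit is always clear. */
static inline swip swip_from_ptr(void *p)     { return (swip){ (uint64_t)(uintptr_t)p }; }
static inline swip swip_from_pid(uint64_t id) { return (swip){ (id << 1) | COLD_TAG }; }
```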
  • A Survey of B-Tree Logging and Recovery Techniques
A Survey of B-Tree Logging and Recovery Techniques
GOETZ GRAEFE, Hewlett-Packard Laboratories

B-trees have been ubiquitous in database management systems for several decades, and they serve in many other storage systems as well. Their basic structure and their basic operations are well understood, including search, insertion, and deletion. However, implementation of transactional guarantees such as all-or-nothing failure atomicity and durability in spite of media and system failures seems to be difficult. High-performance techniques such as pseudo-deleted records, allocation-only logging, and transaction processing during crash recovery are widely used in commercial B-tree implementations but not widely understood. This survey collects many of these techniques as a reference for students, researchers, system architects, and software developers. Central in this discussion are physical data independence, separation of logical database contents and physical representation, and the concepts of user transactions and system transactions. Many of the techniques discussed are applicable beyond B-trees.

Categories and Subject Descriptors: H.2.2 [Database Management]: Physical Design—Recovery and restart; H.2.4 [Database Management]: Systems—Transaction processing; H.3.0 [Information Storage and Retrieval]: General
General Terms: Algorithms
ACM Reference Format: Graefe, G. 2012. A survey of B-tree logging and recovery techniques. ACM Trans. Datab. Syst. 37, 1, Article 1 (February 2012), 35 pages. DOI = 10.1145/2109196.2109197 http://doi.acm.org/10.1145/2109196.2109197

1. INTRODUCTION
B-trees [Bayer and McCreight 1972] are primary data structures not only in databases but also in information retrieval, file systems, key-value stores, and many application-specific indexes.
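To make the survey's subject concrete, here is a toy write-ahead-log record for B-tree page changes: the record carries an LSN and a page id, and the WAL rule requires the record to reach the log before the corresponding dirty page reaches the database. The layout and helper function are illustrative, not taken from the survey.

```c
/* Toy write-ahead-log record in the spirit of the techniques surveyed:
 * each B-tree page change is described by a record carrying the page id
 * and an LSN; the record must be durable in the log before the dirty page
 * is written back (the WAL rule). The layout is illustrative. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t lsn;        /* log sequence number, monotonically increasing */
    uint64_t page_id;    /* which B-tree page the change applies to */
    uint16_t offset;     /* where in the page the bytes change */
    uint16_t length;     /* how many bytes change */
    /* 'length' bytes of redo image follow in the log stream */
} log_record;

/* Appends one record; a real log manager would buffer, checksum, and fsync. */
int log_append(FILE *log, uint64_t lsn, uint64_t page_id,
               uint16_t off, const void *redo, uint16_t len) {
    log_record r = { lsn, page_id, off, len };
    return fwrite(&r, sizeof r, 1, log) == 1 &&
           fwrite(redo, 1, len, log) == len;
}
```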