FineLine: Log-structured Transactional Storage and Recovery


Caetano Sauer* (Tableau Software, Munich, Germany), Goetz Graefe (Google Inc., Madison, WI, USA), Theo Härder (TU Kaiserslautern, Kaiserslautern, Germany)
* Work done partially at TU Kaiserslautern

ABSTRACT

Recovery is an intricate aspect of transaction processing architectures. In its traditional implementation, recovery requires the management of two persistent data stores, a write-ahead log and a materialized database, which must be carefully orchestrated to maintain transactional consistency. Furthermore, the design and implementation of recovery algorithms have deep ramifications into almost every component of the internal system architecture, from concurrency control to buffer management and access path implementation. Such complexity not only incurs high costs for development, testing, and training, but also unavoidably affects system performance, introducing overheads and limiting scalability.

This paper proposes a novel approach for transactional storage and recovery called FineLine. It simplifies the implementation of transactional database systems by eliminating the log-database duality and maintaining all persistent data in a single, log-structured data structure. This approach not only provides more efficient recovery with less overhead, but also decouples the management of persistent data from in-memory access paths. As such, it blurs the lines that separate in-memory from disk-based database systems, providing the efficiency of the former with the reliability of the latter.

PVLDB Reference Format:
Caetano Sauer, Goetz Graefe, and Theo Härder. FineLine: Log-structured Transactional Storage and Recovery. PVLDB, 11(13): 2249-2262, 2018.
DOI: https://doi.org/10.14778/3275366.3275373

Figure 1: Single-storage principle of FineLine. (a) Dual storage: main memory backed by both a log and a database (DB). (b) FineLine: main memory backed by a single indexed log.

1. INTRODUCTION

Database systems widely adopt the write-ahead logging approach to support transactional storage and recovery. In this approach, the system maintains two forms of persistent storage: a log, which keeps track of and establishes a global order of individual transactional updates; and a database, which is the permanent store for data pages. The former is an append-only structure, whose writes must be synchronized at transaction commit time, while the latter is updated asynchronously by propagating pages from a buffer pool into the database.

The primary reason behind such a dual-storage design, illustrated in Fig. 1a, is to maximize transaction throughput while still providing transactional guarantees, taking into account the characteristics of typical storage architectures. The log, being an append-only structure, is write-optimized, while the database maintains data pages in a "ready-to-use" format and is thus read-optimized.
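To make this dual-storage discipline concrete, the minimal sketch below shows a toy engine in which transactions update pages in a buffer pool and append log records, commit synchronously flushes only the log, and dirty pages reach the database file asynchronously. All names (DualStorageEngine, propagate_dirty_pages, the file paths) and the textual log format are assumptions made for illustration, not the interface of any system discussed in this paper; a production engine would additionally fsync the files and guard against torn page writes.

```cpp
// Sketch of the dual-storage (write-ahead logging) discipline: an append-only
// log flushed synchronously at commit, and a database file whose pages are
// propagated asynchronously from a buffer pool.
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>

constexpr size_t kPageSize = 4096;

struct Page {
    std::array<char, kPageSize> data{};
    bool dirty = false;
};

class DualStorageEngine {
public:
    DualStorageEngine(const std::string& log_path, const std::string& db_path)
        : log_(log_path, std::ios::app | std::ios::binary),
          db_(db_path, std::ios::binary) {}

    // Update a page in the buffer pool and record the change in an in-memory
    // log buffer; nothing is persisted yet. (Bounds checks omitted.)
    void update(uint64_t page_id, size_t offset, const std::string& bytes) {
        Page& page = buffer_pool_[page_id];
        std::copy(bytes.begin(), bytes.end(), page.data.begin() + offset);
        page.dirty = true;
        pending_log_ += "UPDATE " + std::to_string(page_id) + " " +
                        std::to_string(offset) + " " + bytes + "\n";
    }

    // Commit forces the log to stable storage (the synchronous write path).
    void commit(uint64_t txn_id) {
        log_ << pending_log_ << "COMMIT " << txn_id << "\n";
        log_.flush();
        pending_log_.clear();
    }

    // Dirty pages reach the database file lazily, possibly long after commit;
    // this asynchronous propagation is what recovery must reconcile with the log.
    void propagate_dirty_pages() {
        for (auto& [page_id, page] : buffer_pool_) {
            if (!page.dirty) continue;
            db_.seekp(static_cast<std::streamoff>(page_id * kPageSize));
            db_.write(page.data.data(), kPageSize);
            page.dirty = false;
        }
        db_.flush();
    }

private:
    std::unordered_map<uint64_t, Page> buffer_pool_;
    std::ofstream log_;
    std::ofstream db_;
    std::string pending_log_;
};

int main() {
    DualStorageEngine engine("wal.log", "db.pages");
    engine.update(/*page_id=*/1, /*offset=*/0, "hello");
    engine.commit(/*txn_id=*/42);       // log is durable here
    engine.propagate_dirty_pages();     // database catches up later
}
```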
The downside of the dual-storage approach is that maintaining transactional consistency requires a careful orchestration between the log and the database. This requires complex software logic that not only leads to increased code maintenance and testing costs, but also unavoidably limits the performance of memory-resident workloads. Furthermore, the propagation of logged updates into the database usually lags behind the log by a significant margin, leading to longer outages during recovery from system failures.

These problems have been known for a long time, as noted recently by Stonebraker, referring to the now 30-year-old POSTGRES design [46]:

"... a DBMS is really two DBMSs, one managing the database as we know it and a second one managing the log." (Michael Stonebraker [47])

A single-storage approach eliminates these problems, but existing solutions are usually less reliable and more restrictive than traditional write-ahead logging, for instance because they require all data to fit in main memory, are designed for non-volatile memory and cannot perform well with hard disks and SSDs, or do not support media recovery.

This paper proposes a novel, single-storage approach for transactional storage and recovery called FineLine. It employs a generalized, log-structured data structure for persistent storage that delivers a sweet spot between the write efficiency of a traditional write-ahead log and the read efficiency of a materialized database. This data structure, referred to here as the indexed log, is illustrated in Fig. 1b.

The indexed log of FineLine decouples persistence concerns from the design of in-memory data structures, thus opening opportunities for several optimizations proposed for in-memory database systems. The system architecture is also general enough to support a wide range of storage devices and larger-than-memory datasets via buffer management. Unlike many existing proposals, this generalized approach not only improves performance but also provides all features of write-ahead logging as implemented in the ARIES family of algorithms, e.g., media recovery, partial rollbacks, index and space management, fine-granular recovery with physiological log records, and support for datasets larger than main memory. In fact, the capabilities of FineLine go beyond ARIES with support for on-demand recovery, localized (e.g., single-page) repair, and no-steal propagation, but with a substantially simpler design. Lastly, the FineLine design is completely independent of concurrency control schemes and access path implementation, i.e., it supports any storage and index structures with any concurrency control protocol.
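As a rough illustration of the indexed-log idea, the toy structure below keeps one append-only sequence of log records plus an index from node identifiers to the positions of that node's records, so a single node can be rebuilt by replaying only its own records instead of scanning the whole log. This is an in-memory sketch under that assumption only; FineLine's actual persistent structure and replay logic are described later in the paper, and all type and member names here are illustrative.

```cpp
// Toy illustration of an indexed log: one append-only record sequence plus an
// index from node identifier to the positions of that node's records.
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct LogRecord {
    uint64_t node_id;    // which node (page, record, ...) this update belongs to
    std::string payload; // description of the change (simplified)
};

class IndexedLog {
public:
    // Appending is the only write path; the index entry makes the record
    // retrievable per node without a sequential scan.
    void append(uint64_t node_id, std::string payload) {
        log_.push_back({node_id, std::move(payload)});
        index_[node_id].push_back(log_.size() - 1);
    }

    // "Fetching" a node means replaying its log records in order; here the
    // replay just concatenates payloads to keep the example self-contained.
    std::string fetch(uint64_t node_id) const {
        std::string node_image;
        auto it = index_.find(node_id);
        if (it == index_.end()) return node_image;
        for (size_t pos : it->second) {
            node_image += log_[pos].payload + ";";
        }
        return node_image;
    }

private:
    std::vector<LogRecord> log_;                              // append-only
    std::unordered_map<uint64_t, std::vector<size_t>> index_; // node -> positions
};

int main() {
    IndexedLog log;
    log.append(7, "insert k1=a");
    log.append(9, "insert k2=b");
    log.append(7, "update k1=c");
    std::cout << log.fetch(7) << "\n"; // replays only node 7's records
}
```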
In the remainder of this paper, Section 2 discusses related work. Section 3 introduces the basic system architecture and its components. Sections 4 and 5 then present the details of the logging and recovery algorithms, respectively. Finally, Section 6 provides some preliminary experiments and Section 7 concludes this paper.

2. RELATED WORK

This section discusses related work, focusing on the limitations that the FineLine design aims to overcome. This new approach is not expected to be superior to all existing approaches in their own target domains, but it is unique in the way it combines their advantages in a generalized design.

2.1 Single-storage approaches

System designs that provide a single persistent storage structure fall into two main categories: those that eliminate the log and those [...] amount of non-volatile memory is assumed. Some designs for in-memory databases employ a no-steal/no-force strategy [11, 33, 48], which eliminates persistent undo logging, but a log is still required for redo recovery.

The LogBase approach [50] uses a log-based representation for persistent data. It maintains in-memory indexes for key-value pairs where the leaf entries simply point to the latest version of each record in the log. This way, updates on the index are random in main memory but sequential on disk. In order to deliver acceptable scan performance, the persistent log is then reorganized by sorting and merging, similar to log-structured merge (LSM) trees [37].

Despite the single-storage advantage, this approach has three major limitations. The first limitation stems from the fact that the approach seems to be targeted at NoSQL key-value stores rather than OLTP database systems. LogBase is designed for write-intensive, disk-resident workloads, and thus it makes poor use of main memory by storing the whole index in it and requiring an additional layer of indirection, and thus additional caching, to fetch actual data records from the log. Furthermore, it destroys the clustering and sequentiality properties of access paths, making scans quite inefficient and precluding the use of memory-optimized index structures, e.g., the Masstree [34] used in Silo [48] or the Adaptive Radix Tree [31] used in HyPer [29].

The third limitation is that recovery requires a full scan of the log to rebuild the in-memory indexes. As a remedy, the authors propose taking periodic checkpoints of these indexes. However, these checkpoints essentially become a second persistent storage representation, and the single-storage characteristic is lost. While details of how recovery is implemented are not provided in the original publication [50], it seems that traditional recovery techniques like ARIES [35] or System R [21] are required.

Hyder [6] is a distributed transactional system that maintains a shared log as the only persistent storage structure. The design greatly simplifies recovery and enables simple scale-out without partitioning. It also exploits the implicit multi-versioning of its log-structured storage to perform optimistic concurrency control with its novel meld algorithm. Despite the design simplicity and scalability, its recovery and concurrency control protocols are tightly coupled. Furthermore, a fundamental assumption of Hyder is that the entire database is modelled as a binary search tree. These restrictions, while acceptable for the distributed scenario envisioned by the authors, have major performance implications for single-node, memory-optimized database systems.
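To illustrate the LogBase-style indirection and the recovery cost discussed above, the sketch below maintains an in-memory index whose entries point to the latest version of each record in an append-only log; after a failure, the index can only be rebuilt by scanning the entire log, which is exactly the cost that periodic index checkpoints try to avoid. The class and member names are assumptions made for this example, not LogBase's actual API.

```cpp
// Sketch of a log-structured store with an in-memory index over an
// append-only log, plus a full-scan index rebuild standing in for recovery.
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct LogEntry {
    std::string key;
    std::string value;
};

class LogStructuredStore {
public:
    // Writes are sequential on the log; the index update is a random access
    // in main memory.
    void put(const std::string& key, const std::string& value) {
        log_.push_back({key, value});
        index_[key] = log_.size() - 1; // point at the latest version
    }

    // Reads go through the index and then dereference into the log, an extra
    // level of indirection compared to a materialized page.
    bool get(const std::string& key, std::string* out) const {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        *out = log_[it->second].value;
        return true;
    }

    // Recovery without checkpoints: replay the whole log to rebuild the index.
    void rebuild_index_after_failure() {
        index_.clear();
        for (size_t pos = 0; pos < log_.size(); ++pos) {
            index_[log_[pos].key] = pos;
        }
    }

private:
    std::vector<LogEntry> log_;                     // stands in for the persistent log
    std::unordered_map<std::string, size_t> index_; // key -> offset of latest version
};

int main() {
    LogStructuredStore store;
    store.put("k1", "a");
    store.put("k1", "b");                // newer version further down the log
    store.rebuild_index_after_failure(); // full scan, proportional to log size
    std::string v;
    if (store.get("k1", &v)) std::cout << v << "\n"; // prints "b"
}
```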