The Design and Implementation of a Non-Volatile Memory Management System

Joy Arulraj

CMU-CS-- July 

School of Computer Science
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA

Thesis Committee:
Andrew Pavlo (Chair)
Todd Mowry
Greg Ganger
Samuel Madden, MIT
Donald Kossmann, Microsoft Research

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright ©  Joy Arulraj

This research was sponsored by the National Science Foundation under grant number CCF-, the University Industry Research Corporation, the Samsung Fellowship, and the Samson Fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Non-Volatile Memory, Database Management System, Logging and Recovery, Storage Management, Buffer Management, Indexing

To my family

Abstract

This dissertation explores the implications of non-volatile memory (NVM) for database management systems (DBMSs). The advent of NVM will fundamentally change the dichotomy between volatile memory and durable storage in DBMSs. These new NVM devices are almost as fast as volatile memory, but all writes to them are persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications.

We present the design and implementation of DBMS architectures that are explicitly tailored for NVM. The dissertation focuses on three aspects of a DBMS: (1) logging and recovery, (2) storage and buffer management, and (3) indexing. First, we present a logging and recovery protocol that enables the DBMS to support near-instantaneous recovery. Second, we propose a storage engine architecture and buffer management policy that leverages the durability and byte-addressability properties of NVM to reduce data duplication and data migration. Third, the dissertation presents the design of a range index tailored for NVM that is latch-free yet simple to implement. Altogether, the work described in this dissertation illustrates that rethinking the fundamental algorithms and data structures employed in a DBMS for NVM improves performance and availability, reduces operational cost, and simplifies software development.

Acknowledgments

Many people deserve special acknowledgments for making this dissertation possible. I thank my advisor Andy Pavlo for his unceasing support and thoughtful guidance. He was always generous with his time and gave me the freedom to pursue my research agenda. I greatly benefited from his expertise in designing and building software systems. He taught me everything from advanced topics in database systems to Wu-Tang Clan. I could not have asked for a better advisor.

I have been fortunate to have had the guidance of several remarkable mentors: Todd Mowry, Greg Ganger, Sam Madden, Donald Kossmann, Garth Gibson, and Michael Stonebraker. Their valuable feedback and constructive criticism helped shape this dissertation. I would like to express my gratitude to Sam, Mike, and Todd for their advice on the academic job market. I am indebted to Shan Lu, who fostered my interest in software systems. She took a chance on working with me and introduced me to systems research.

I have been fortunate to overlap with a multitude of exemplary researchers during my time at CMU: David Andersen, Subramanya Dulloor, Christos Faloutsos, Phil Gibbons, Anuj Kalia, Michael Kaminsky, Danai Koutra, Mike Kozuch, Viktor Leis, Hyeontaek Lim, Yixin Luo, Lin Ma, Prashanth Menon, Gennady Pekhimenko, Matthew Perron, Guru Prashanth, Ram Raghunathan, Majd Sakr, Jungmin Seo, Vivek Seshadri, Neil Shah, Sahil Singla, Anthony Tomasic, Dana Van Aken, Yingjun Wu, Huanchen Zhang, and Michael Zhang. The many members of the CMU Database Group and Parallel Data Lab made for wonderful company. I am grateful to Deborah Cavlovich, Karen Lindenfelser, Jessica Parker, and Marilyn Walgora for their support.

I worked on the BzTree project (Chapter ) during my internship at Microsoft Research. I am grateful to Justin Levandoski, Umar Farooq Minhas, and Paul Larson for their mentorship. Many others provided helpful feedback during the course of this work, including Phil Bernstein, Shel Finkelstein, David Lomet, Chandrasekaran Mohan, and Hamid Pirahesh.

I thank my friends in Pittsburgh who often gave me reasons to laugh and think: Vivek, Ranjan, SKB, Divya, Vikram, and Lavanya. I am sure I have missed many others.

Finally, I am deeply grateful for the love and support of my parents. I thank my wife Beno for being there with me every step of the way. My brother Leo and sister-in-law Vinoliya led by example and encouragement. I dedicate this dissertation to my family.

Contents

Abstract 

Acknowledgments 

Introduction
  Thesis Statement and Overview
  Contributions
  Outline

The Case for a NVM-oriented DBMS
  NVM-Only Architecture
    Memory-oriented System
    Disk-oriented System
  NVM+DRAM Architecture
    Anti-Caching System
    Disk-oriented System
  Experimental Evaluation
    System Configuration
    Benchmarks
    NVM-Only Architecture
    NVM+DRAM Architecture
    Recovery
  Discussion
    NVM-aware DBMS
    Anti-Caching with Synchronous Fetches
  Goal of this Dissertation

Storage Management
  DBMS Testbed
    In-Place Updates Engine (InP)
    Copy-on-Write Updates Engine (CoW)
    Log-structured Updates Engine (Log)
  NVM-Aware Engines
    In-Place Updates Engine (NVM-InP)
    Copy-on-Write Updates Engine (NVM-CoW)
    Log-structured Updates Engine (NVM-Log)
  Experimental Evaluation
    Benchmarks
    Runtime Performance
    Reads & Writes
    Recovery
    Execution Time Breakdown
    Storage Footprint
    Analytical Cost Model
    Impact of B+Tree Node Size
    NVM Instruction Set Extensions
    Discussion
  Summary

Logging and Recovery
  Recovery Principles
  Write-Ahead Logging
    Runtime Operation
    Commit Protocol
    Recovery Protocol
  Write-Behind Logging
    Runtime Operation
    Commit Protocol
    Recovery Protocol
  Replication
    Replication Schemes
    Record Format
  Experimental Evaluation
    Benchmarks
    Runtime Performance
    Recovery Time
    Storage Footprint
    Replication
    Impact of NVM Latency
    NVM Instruction Set Extensions
    Instant Recovery Protocol
    Impact of Group Commit Latency
  Summary

Buffer Management
  Buffer Management Principles
  NVM-Aware Buffer Management
    New Data Migration Paths
    Bypass DRAM during Reads
    Bypass DRAM during Writes
    Bypass NVM during Reads
    Bypass NVM during Writes
  Adaptive Data Migration
  Storage Hierarchy Selection
    Hierarchical Storage System Model
    Storage Hierarchy Recommender System
  Trace-Driven Buffer Manager
  Experimental Evaluation
    Experimental Setup
    Workload Skew Characterization
    Impact of NVM on Runtime Performance
    Storage Hierarchy Recommendation
    Data Migration Policies
    Buffer Management Policy Comparison
    Adaptive Data Migration
  Summary

Indexing
  Persistent Multi-Word CAS
    Durability
    Execution
  BzTree Architecture and Design
    Architecture
    Complexity and Performance
  BzTree Nodes
    Node Layout
    Leaf Node Operations
    Leaf Node Consolidation
    Internal Node Operations
  Structure Modifications
    Prioritizing Structure Modifications
    Node Split
    Node Merge
    Interplay Between Algorithms
  BzTree Durability and Recovery
    Persistent Memory Allocation
    Durability
    Recovery
  Experimental Evaluation
    Design Complexity
    Runtime Performance
    Durability
    Scan Performance
    PMwCAS Statistics
    Sensitivity Analysis
    Memory Footprint
    Execution Time Breakdown
  Summary

Related Work
  Logging and Recovery
  File-systems
  Replication
  Memory Management
  Hardware Support
  Software Testing
  Latch-Free Indexes
  Multi-Word CAS
  Persistent Indexes

Future Work
  Query Optimization
  Query Processing Algorithms
  Access Methods
  Larger-than-Memory
  SQL Extensions
  Testing
  High Availability

 Conclusion 

A Non-Volatile Memory Emulation
  NVM Hardware Emulator
    NUMA Interface
    PMFS Interface
  NVM-aware Memory Allocator

B Benchmarks
  YCSB
  TPC-C
  Voter
  CH-benCHmark
  AuctionMark

Bibliography 

List of Figures

NVM-Only Architecture – System configurations that use NVM exclusively for storage.
NVM+DRAM Architecture – System configurations that use both DRAM and NVM for storage.
YCSB Performance on NVM-only Architecture – Performance comparison for the YCSB benchmark across different workload mixtures.
YCSB Performance on NVM+DRAM Architecture – Performance comparison for the YCSB benchmark across different workload mixtures.
NVM Latency Evaluation – Performance comparison for the TPC-C benchmark using different NVM latencies.
Recovery Evaluation – Comparison of recovery schemes in H-Store using the TPC-C benchmark.

Storage Engine Architectures – Architectural layout of the three traditional storage engines supported in the DBMS testbed. The engine components accessed using the allocator interface and those accessed using the filesystem interface are bifurcated by the dashed line.
NVM-Aware Engines – Architectural layout of the NVM-optimized storage engines.
YCSB Performance (DRAM Latency) – The throughput of the engines for the YCSB benchmark without any latency slowdown.
YCSB Performance (Low Latency) – The throughput of the engines for the YCSB benchmark under the low NVM latency configuration (×).
YCSB Performance (High Latency) – The throughput of the engines for the YCSB benchmark under the high NVM latency configuration (×).
TPC-C Throughput – The performance of the engines for the TPC-C benchmark for all three NVM latency settings.
YCSB Reads – The number of load operations executed by the engines while running the YCSB workload.
YCSB Writes – The number of store operations executed by the engines while running the YCSB workload.
TPC-C Reads & Writes – The number of load and store operations executed by the engines while running the TPC-C benchmark.

Recovery Latency – The amount of time that the engines take to restore the database to a consistent state after a restart.
Execution Time Breakdown – The time that the engines spend in their internal components when running the YCSB benchmark.
Storage Footprint – The amount of storage occupied in NVM by the internal components of the engines.
B+Tree Node Size – The impact of B+tree node size on the performance of the NVM-aware engines. The engines run the YCSB workloads under low NVM latency (×) and low workload skew settings.
NVM Instruction Set Extensions – The impact of sync primitive latency on the performance of the NVM-aware engines. The engines run the YCSB workloads under low NVM latency (×) and low skew settings. Performance obtained using the current primitive, built using the SFENCE and CLFLUSH instructions, is shown on the left side of each graph to serve as a baseline.

Tuple Version Meta-data – The additional data that the DBMS stores to track tuple versions in an MVCC protocol.
Structure of WAL Record – Structure of the log record constructed by the DBMS while using the WAL protocol.
WAL Commit Protocol – The ordering of writes from the DBMS to durable storage while employing the WAL protocol.
WAL Recovery Protocol – The phases of the recovery protocol.
WAL Example – Contents of the WAL during recovery.
Structure of WBL Record – Structure of the log record constructed by the DBMS while using the WBL protocol.
WBL Commit Protocol – The ordering of writes from the DBMS to durable storage while employing the WBL protocol.
WBL Commit Timestamp Gaps – An illustration of successive system failures resulting in multiple commit timestamp gaps. The effects of transactions in those gaps are eventually undone by the garbage collector.
WBL Recovery Protocol – The phases of the recovery protocol.
WBL Example – Contents of the WBL during recovery.
Replication – The steps taken by the DBMS during replication.
YCSB Throughput – The throughput of the DBMS for the YCSB benchmark with different logging protocols and durable storage devices.
YCSB Latency – The latency of the DBMS for the YCSB benchmark with different logging protocols and durable storage devices.
TPC-C Throughput – The measured throughput for the TPC-C benchmark with different logging protocols and durable storage devices.
TPC-C Latency – The latency of the DBMS for the TPC-C benchmark with different logging protocols and durable storage devices.
Recovery Time – The time taken by the DBMS to restore the database to a consistent state after a restart with different logging protocols.
Storage Footprint – The storage space occupied by the internal components of the DBMS while using different recovery protocols.
Replication – The throughput of the DBMS for the YCSB benchmark with different replication schemes and logging protocols.
Impact of NVM Latency – The throughput for the YCSB benchmark with different logging protocols and NVM latency settings.
NVM Instruction Set Extensions (CLFLUSH vs. CLWB) – The throughput of the DBMS for the YCSB benchmark under the NVM-based configurations with different flush instructions.
Instant Recovery Protocol – The throughput of the DBMS for YCSB with the instant logging protocol on different storage devices.
Group Commit Latency – Impact of the group commit latency setting on the throughput of the DBMS for the write-heavy YCSB workload with different logging protocols and durable storage devices.

Storage Hierarchy Architectures – Candidate storage hierarchy organizations: (1) traditional storage hierarchy with DRAM and SSD, (2) storage hierarchy consisting of only NVM and no DRAM, and (3) storage hierarchy consisting of both NVM and DRAM.
Data Flow Paths – The different data flow paths in a multi-tier storage hierarchy consisting of DRAM, NVM, and SSD.
Workload Skew Characterization – CDF of the number of times a block is referenced in the traces associated with different workloads. Due to the inherent skew present in certain workloads, most of the buffer pool accesses are made to a small fraction of blocks.
Performance Impact of NVM – Comparison of the buffer manager's throughput on similarly priced NVM-SSD and DRAM-SSD storage hierarchies under different NVM latency configurations.
Performance Impact of NVM with Slower Storage – Comparison of the buffer manager's throughput on similarly priced NVM-HDD and DRAM-HDD storage hierarchies under different NVM latency configurations.
Storage Hierarchy Recommendation – (a) The total cost of the DRAM, NVM, and SSD devices used in a multi-tier storage system. (b–f) The performance/price numbers of candidate storage hierarchies on different benchmarks. Given a system cost budget and a target workload, the recommendation system performs a grid search to identify the storage hierarchy with the highest performance/price number.
Performance Impact of Bypassing DRAM – Comparison of the buffer manager's throughput when it adopts lazy and eager data migration policies for DRAM. We measure the performance impact of these policies across different NVM latency configurations and DRAM migration frequencies (D).
Performance Impact of Bypassing NVM – Comparison of the buffer manager's throughput when it adopts lazy and eager data migration policies for NVM. We measure the performance impact of these policies across different NVM latency configurations and NVM migration frequencies (N).
Impact of Bypassing NVM on Writes to NVM – Comparison of the number of writes performed on NVM when the buffer manager adopts lazy and eager data migration policies for NVM. We measure the impact of these policies across different NVM migration frequencies (N).
Impact of Storage Hierarchy – Comparison of the optimal data migration policy decision for bypassing DRAM across different storage hierarchies, NVM latency configurations, and DRAM migration frequencies.
Performance Impact of Buffer Management Policies – The impact of different buffer management policies on runtime performance across different NVM latency configurations.
Adaptive Data Migration – The impact of buffer management policy adaptation on runtime performance across different workloads.

PMwCAS Descriptor Table – Contents of the descriptor table used by threads to share information about the PMwCAS operation.
Node Layout – Node layout and details for the BzTree.
Node Split – Node split in the BzTree.
Random Key Access Distribution – The throughput of BzTree and Bw-tree for the YCSB benchmark with different workload mixtures.
Zipfian and Monotonic Key Access Distributions – The throughput of BzTree and Bw-tree for the YCSB benchmark.
Cost of Persistence – The throughput of BzTree for the YCSB benchmark in volatile and durable modes across different workload mixtures based on the random key access distribution.
Scan Performance – The throughput of BzTree and Bw-tree on different workload mixtures containing range scan operations.
Impact of Key Size and Free Space Size – The throughput of BzTree while running the YCSB benchmark under different key size and free space size settings.
Execution Time Breakdown – The time that BzTree spends in its internal components when running the balanced workload.

A. Write Bandwidth of Emulator Interfaces – Comparison of the durable write band- width of Intel Lab’s NVM emulator using the allocator and lesystem interfaces. . .  List of Tables

Technology Comparison – Comparison of emerging NVM technologies with other storage technologies [, , , ]: phase-change memory (PCM) [, , ], memristors (RRAM) [, ], and STT-MRAM (MRAM) [].

Operations Performed by NVM-Aware Engines – An overview of the steps performed by the NVM-aware storage engines while executing primitive database operations. The syncing mechanism is implemented using CLFLUSH and SFENCE instructions on the hardware emulator.
Analytical Cost Model – Cost model for estimating the amount of data written to NVM, while performing insert, update, and delete operations, by each engine.

Operations Performed by DBMS – An overview of the steps performed by the DBMS during its runtime operation, commit processing, and checkpointing.

Trace Footprints – Footprints of the traces associated with different benchmarks.

PMwCAS Summary Table – The size of the PMwCAS operations associated with different node and structure modification operations.



Chapter 

Introduction

Changes in computer trends have given rise to new application domains, such as Internet services, that support a large number of concurrent users and systems. What makes these modern applications unlike their predecessors is the scale at which they ingest information []. Database management systems (DBMSs) are often the critical component of these applications because they are responsible for ensuring that operations of concurrent transactions execute in the correct order and that their changes are not lost after a system crash [, ]. Optimizing the DBMS's performance is important because it determines how quickly an application can take in new information and use it to make new decisions [].

The performance of a DBMS is affected by how fast it can read and write data on non-volatile storage. DBMSs have always been designed to deal with the differences in the performance characteristics of non-volatile storage and volatile memory (DRAM). The key assumption has been that non-volatile storage is much slower than DRAM and only supports block-oriented reads and writes. But new non-volatile memory (NVM) technologies, which are almost as fast as DRAM and support fine-grained reads and writes, invalidate these previous design choices.

Existing DBMSs can be classified into two types based on the primary storage location of the database: (1) disk-oriented and (2) memory-oriented DBMSs. Disk-oriented DBMSs are based on the same hardware assumptions that were made in the first relational DBMSs from the 1970s, such as IBM's System R []. The design of these systems targets a two-level storage hierarchy comprising a fast but volatile byte-addressable memory for caching (i.e., DRAM), and a slow, non-volatile block-addressable device for permanent storage (i.e., SSD). These systems pessimistically assume that a transaction could access data that is not in memory and thus will incur a long delay to retrieve the needed data from disk. They employ legacy techniques, such as heavyweight concurrency-control schemes, to overcome these limitations [].

Recent advances in manufacturing technologies have greatly increased the capacity of DRAM available on a single computer. But disk-oriented systems were not designed for the case where most, if not all, of the data resides entirely in memory. The result is that many of their legacy components have been shown to impede their scalability for OLTP workloads []. In contrast, the architecture of memory-oriented DBMSs assumes that all data fits in main memory, and it therefore does away with the slower, disk-oriented components from the system. As such, these memory-oriented DBMSs have been shown to outperform disk-oriented DBMSs [, , , ]. But they still have to employ heavyweight components that can recover the database after a system crash because DRAM is volatile.

NVM is also referred to as storage-class memory or persistent memory.

 

                       DRAM     PCM      RRAM     MRAM     SSD      HDD
Read latency           – ns     – ns     – ns     – ns     – µs     – ms
Write latency          – ns     – ns     – ns     – ns     – µs     – ms
Sequential bandwidth   – GB/s   – GB/s   – GB/s   – GB/s   – GB/s   – GB/s
$/GB                   –        –        –        –        –        –
Addressability         Byte     Byte     Byte     Byte     Block    Block
Persistent             No       Yes      Yes      Yes      Yes      Yes
Endurance              >10^16   10^10    10^8     10^15    10^5     >10^16

Table .: Technology Comparison – Comparison of emerging NVM technologies with other storage technologies [, , , ]: phase-change memory (PCM) [, , ], memristors (RRAM) [, ], and STT-MRAM (MRAM) [].

The design assumptions underlying both disk-oriented and memory-oriented DBMSs are poised to be upended by the advent of NVM technologies, such as phase-change memory [, , ] and memristors [, ]. NVM supports byte-addressable loads and stores with low latency. This means that it can support the efficient architectures employed in memory-oriented DBMSs. But unlike DRAM, all writes to NVM are potentially durable, and therefore a DBMS can directly access the tuples on NVM after a system crash without needing to reload the database first. As shown in Table ., NVM differs from other storage technologies in the following ways:

● Byte-Addressability: NVM supports byte-addressable loads and stores, unlike other non-volatile devices that only support slow, bulk data transfers as blocks (a sketch of how such a store is made durable follows this list).
● High Write Throughput: NVM delivers more than an order of magnitude higher write throughput than SSD. More importantly, the gap between the sequential and random write throughput of NVM is much smaller than in other durable storage technologies.
● Read-Write Asymmetry: In certain NVM technologies, writes take longer to complete than reads. Further, excessive writes to a single memory cell can destroy it.
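Byte-addressability means a DBMS can make an individual update durable with ordinary store instructions followed by a cache-line flush and a store fence, rather than issuing a block I/O. The following C++ sketch is our own illustration, not code from the emulator or any DBMS discussed here; the 64-byte cache-line size and the DAX-style mapping of the buffer are assumptions.

```cpp
// Minimal sketch of persisting a byte-addressable store on x86 (assumptions:
// the target range is mapped from an NVM device, e.g., via a DAX-mapped file,
// and the cache-line size is 64 bytes). CLFLUSH evicts each dirty line so the
// store reaches the memory controller; SFENCE orders the flushes before any
// subsequent stores.
#include <immintrin.h>
#include <cstdint>
#include <cstring>

static void persist_range(const void *addr, size_t len) {
  const uintptr_t line = 64;
  uintptr_t start = reinterpret_cast<uintptr_t>(addr) & ~(line - 1);
  uintptr_t end = reinterpret_cast<uintptr_t>(addr) + len;
  for (uintptr_t p = start; p < end; p += line) {
    _mm_clflush(reinterpret_cast<const void *>(p));  // flush one cache line
  }
  _mm_sfence();  // fence so later stores are ordered after the flushes
}

int main() {
  // Stand-in for an NVM-mapped buffer; on real hardware this would come from
  // mmap() over a persistent-memory file (e.g., on PMFS).
  alignas(64) static char nvm_buffer[256];
  std::memcpy(nvm_buffer, "durable tuple bytes", 20);
  persist_range(nvm_buffer, 20);
  return 0;
}
```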

Although the advantages of NVM are obvious, making full use of them in a DBMS is non-trivial. Our evaluation of state-of-the-art disk-oriented and memory-oriented DBMSs on NVM shows that the two architectures achieve almost the same performance when using NVM []. This is because current DBMSs assume that memory is volatile, and thus their architectures are predicated on making redundant copies of changes on durable storage. This illustrates the need for a complete rewrite of the database system to leverage the unique properties of NVM.

Thesis Statement and Overview

This dissertation presents the design and implementation of DBMS architectures that are explicitly tailored for NVM. The resulting NVM-centric architectures have several key advantages over current systems:

1. They adopt a logging and recovery protocol that improves the availability of the DBMS by × compared to the write-ahead logging protocol.
2. Their storage engines leverage the durability and byte-addressability properties of NVM to avoid unnecessary data duplication. This improves the space utilization of the NVM device and extends its lifetime by reducing the number of device writes.
3. They employ a range index tailored for NVM that is latch-free yet simple to implement. This reduces the implementation and maintenance complexity of critical DBMS components.
4. Their buffer management policy leverages the direct-addressability property of NVM to reduce data migration. This improves the system's performance on a multi-tier storage hierarchy.

Our evaluation using different online transaction processing (OLTP) and analytical processing (OLAP) benchmarks shows that such NVM-centric architectures improve the runtime performance, availability, operational cost, and development cost of DBMSs [–, ].

Thesis Statement: Rethinking the fundamental algorithms and data structures employed in a database management system to leverage the characteristics of non-volatile memory improves availability, operational cost, development cost, and performance.

In the remainder of this chapter, we summarize the primary contributions of this work and conclude with an outline of this dissertation.

Contributions

This dissertation answers the following research questions, with the specific contributions listed under each:

1. How do state-of-the-art memory-oriented and disk-oriented DBMSs perform on non-volatile memory? [, ] (Chapter )
● A study of the impact of NVM on two OLTP DBMSs.
● We explore two possible architectures using non-volatile memory (i.e., NVM-only and NVM+DRAM architectures).

2. How should the storage engine architecture evolve to leverage NVM? [] (Chapter )
● We implement three storage engine architectures in a single DBMS: (1) in-place updates with logging, (2) copy-on-write updates without logging, and (3) log-structured updates.
● We then develop NVM-optimized variants of these architectures that reduce the computational overhead, storage footprint, and wear-out of NVM devices.

3. What changes are required in the logging and recovery algorithms to support fast recovery from failures? [, ] (Chapter )
● We present a logging and recovery protocol that is designed for a hybrid storage hierarchy with NVM and DRAM.
● We examine the impact of this redesign on the transactional throughput, latency, availability, and storage footprint of the DBMS.

4. How should we adapt the design of a range index for NVM? [] (Chapter )
● We propose the design of such an index that supports near-instantaneous recovery without requiring special-purpose recovery code.
● An evaluation of the impact of this redesign on software development and maintenance complexity, performance, and availability.

5. How should the DBMS manage data in a multi-tier storage hierarchy comprising DRAM, NVM, and SSD? (Chapter )
● We construct a new class of buffer management policies that are tailored for NVM.
● We characterize the impact of NVM on the performance of the DBMS across diverse storage hierarchy designs and varied workloads.

Outline

The remainder of this dissertation is organized as follows. Chapter  presents our initial foray into the use of NVM in existing DBMSs and makes the case for a new NVM-aware DBMS. Chapter  covers the design of NVM-aware variants of three different storage engine architectures that leverage the persistence and byte-addressability properties of NVM. Chapter  makes the case for a new logging and recovery protocol, called write-behind logging, that enables a DBMS to recover nearly instantaneously from system failures. Chapter  presents the design of a latch-free range index tailored for NVM that supports near-instantaneous recovery without requiring special-purpose recovery code. Chapter  introduces a new class of buffer management policies that maximize the utility of NVM in a multi-tier storage hierarchy. Chapter  presents a discussion of the related work. We highlight possible areas for future work in Chapter  and conclude the dissertation in Chapter .

Chapter

The Case for a NVM-oriented DBMS

This chapter presents our initial foray into the use of NVM in OLTP DBMSs. We test several DBMS architectures on a hardware-based NVM emulator and explore their trade-offs using two OLTP benchmarks. The read and write latencies of the emulator are configurable, and thus we can evaluate multiple potential NVM profiles that are not specific to a particular technology. To the best of our knowledge, our investigation is the first to use emulated NVM for OLTP DBMSs. Since it is unknown what future memory hierarchies will look like with NVM, we consider two potential use cases. The first is where the DBMS only has NVM storage with no DRAM. The second is where NVM is added as another level of the storage hierarchy between DRAM and SSD. In both configurations, the system still uses volatile CPU caches. NVM storage devices are currently prohibitively expensive and only support small capacities. For this reason, we use a NVM hardware emulator developed by Intel Labs in our evaluations in this dissertation []. Appendix A presents the architecture of the hardware emulator and the interfaces that it exports to the DBMS. We use the emulator's NUMA interface for evaluating the NVM-only DBMS architecture. We use the persistent memory file system (PMFS) interface in the evaluation of both the NVM-only and NVM+DRAM architectures.

NVM-Only Architecture

In the NVM-only architecture, the DBMS uses NVM exclusively for its storage. We compare a memory-oriented DBMS with a disk-oriented DBMS when both are running entirely on NVM storage using the emulator's NUMA interface. For the former, we use the H-Store DBMS [], while for the latter we use MySQL (v.) with the InnoDB storage engine. Both systems are tuned according to their "best practice" guidelines for OLTP workloads. The NVM-only architecture has implications for the DBMS's recovery scheme. In all DBMSs, some form of logging is used to guarantee recoverability in the event of a failure []. Disk-oriented DBMSs provide durability through the use of a write-ahead log, which is a type of physical logging wherein updated versions of data are logged to disk with each write operation. Such an approach has a significant performance overhead for main memory-oriented DBMSs [, ]. Thus, others have argued for the use of logical logging for main memory DBMSs, where the log contains a record of the high-level operations that each transaction executed.


Figure .: NVM-Only Architecture – System configurations that use NVM exclusively for storage. (a) Memory-oriented system; (b) Disk-oriented system.

Both the overhead of writing out log records and the size of the log itself are much smaller with logical logging. The downside, however, is that the recovery process takes longer because the DBMS must re-execute each transaction to restore the database state. In contrast, during recovery in a physical logging system, the log is replayed forward to redo the effects of committed transactions and then replayed backward to undo the effects of uncommitted transactions [, ]. But since all writes to memory are persistent under the NVM-only configuration, heavyweight logging protocols such as these are excessive and inefficient. We now discuss the runtime operations of the memory-oriented and disk-oriented DBMSs that we evaluated on the NVM-only configuration in more detail (see Figure .). For each architecture, we analyze the potential complications and performance pitfalls from using NVM in the storage hierarchy.
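To make the contrast concrete, the following sketch shows what the two kinds of log records might contain; the field names are illustrative assumptions, not the actual record formats used by H-Store or MySQL.

```cpp
// Illustrative sketch of the two logging styles discussed above. A logical
// (command) record only names the stored procedure and its arguments, so it
// is small but must be re-executed during recovery. A physical record keeps
// before/after images of the modified tuple, so it is larger but can be
// redone/undone directly.
#include <cstdint>
#include <string>
#include <vector>

struct CommandLogRecord {          // logical logging
  uint64_t txn_id;
  std::string procedure_name;      // e.g., "NewOrder"
  std::vector<std::string> args;   // parameters needed to re-execute it
};

struct PhysicalLogRecord {         // physical (ARIES-style) logging
  uint64_t txn_id;
  uint64_t table_id;
  uint64_t tuple_id;
  std::vector<uint8_t> before_image;  // needed to undo uncommitted changes
  std::vector<uint8_t> after_image;   // needed to redo committed changes
};

int main() {
  CommandLogRecord cmd{1, "NewOrder", {"w_id=1", "d_id=5"}};
  PhysicalLogRecord phys{1, 42, 1001, {0x00}, {0xFF}};
  return (cmd.args.size() + phys.after_image.size()) > 0 ? 0 : 1;
}
```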

Memory-oriented System

We use the emulator's NUMA interface to ensure that all of H-Store's in-memory data is stored on NVM. This data includes all tuples, indexes, and other database elements. We did not change any part of H-Store's storage manager or execution engine to use the byte-addressable NVM storage. But this means that the DBMS is not aware that writes to the memory are potentially durable. Since H-Store was designed for DRAM, it employs a disk-oriented logical logging scheme []. To reduce recovery time, the DBMS also periodically takes a non-blocking checkpoint of all the partitions and writes them out to a disk-resident checkpoint. For our experiments in Section ., we configured H-Store to write its checkpoints and log files to PMFS.

Disk-oriented System

In a disk-oriented DBMS, the system's internal data is divided into in-memory and disk-resident components. The DBMS maintains a buffer pool in memory to store copies of pages retrieved from the database's primary storage location on disk. We use the emulator's NUMA interface to store the DBMS's buffer pool in the byte-addressable NVM storage, while its data files and logs are stored in NVM through the PMFS interface.

Figure .: NVM+DRAM Architecture – System configurations that use both DRAM and NVM for storage. (a) Anti-Caching system; (b) Disk-oriented system.

Like H-Store, MySQL is not aware that modifications to the buffer pool are persistent when using the NUMA interface. MySQL uses a doublewrite mechanism for flushing data to persistent storage. This involves first writing out the pages to a contiguous buffer on disk before writing them out to the data file. The doublewrite mechanism serves two purposes. First, it protects against torn writes that can occur when the DBMS has to atomically commit data that is larger than the page size of the underlying storage device. Second, it also improves the performance of (synchronous) logging, as writes to the log buffer are sequential. This mechanism is not useful, however, in the NVM-only architecture, where both the doublewrite buffer and the data file are on NVM. Since the doublewrite mechanism maintains multiple copies of each tuple, the DBMS unnecessarily wastes storage space in the NVM. The performance difference between random and sequential I/O on NVM is also much smaller than on disk, so batching the writes together in the doublewrite buffer does not provide the same gains as it does on a disk. Furthermore, the overhead of fsync in PMFS is also lower than in disk-oriented file systems.
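The write ordering behind the doublewrite idea can be sketched as follows. This is a simplified illustration, not MySQL's actual implementation; the file names and the 16 KB page size are assumptions.

```cpp
// Simplified doublewrite-style flush: the page is first appended to a
// sequential doublewrite area and synced, and only then written to its home
// location in the data file. If the second write is torn by a crash, recovery
// can recopy the intact page from the doublewrite area.
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

constexpr size_t kPageSize = 16 * 1024;  // assumed page size

bool doublewrite_page(int dw_fd, int data_fd, const std::vector<uint8_t> &page,
                      off_t dw_offset, off_t home_offset) {
  if (pwrite(dw_fd, page.data(), kPageSize, dw_offset) != (ssize_t)kPageSize)
    return false;
  if (fsync(dw_fd) != 0) return false;  // the page copy is now crash-safe
  if (pwrite(data_fd, page.data(), kPageSize, home_offset) != (ssize_t)kPageSize)
    return false;
  return fsync(data_fd) == 0;           // the page is durable in place
}

int main() {
  int dw_fd = open("doublewrite.buf", O_CREAT | O_RDWR, 0644);
  int data_fd = open("table.ibd", O_CREAT | O_RDWR, 0644);
  std::vector<uint8_t> page(kPageSize, 0xAB);
  bool ok = doublewrite_page(dw_fd, data_fd, page, 0, 0);
  close(dw_fd);
  close(data_fd);
  return ok ? 0 : 1;
}
```

On NVM, as noted above, both copies land on the same byte-addressable device, so the extra copy mostly wastes space and write endurance.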

NVM+DRAM Architecture

In this configuration, the DBMS relies on both DRAM and NVM to satisfy its storage requirements. If we assume that the entire dataset cannot fit in DRAM, the question arises of how to split data between the two storage layers. Because of the relative latency advantage of DRAM over NVM, one strategy is to attempt to keep the hot data in DRAM and the cold data in NVM. One way is to use a buffer pool to cache hot data, as in traditional disk-oriented DBMSs. With this architecture, there are two copies of cached data: one persistent copy on disk and another copy cached in the DRAM-based buffer pool. The DBMS copies pages into the buffer pool as they are needed and then writes out dirty pages to the NVM for durability. Another approach is to use the anti-caching system design proposed in [], where all data initially resides in memory and cold data is evicted out to disk over time. One fundamental difference in this design is that exactly one copy of the data exists at any point in time. Thus, a tuple is either in memory or in the anti-cache. An overview of these two architectures is shown in Figure ..

Anti-Caching System

Anti-caching is a memory-oriented DBMS design that allows the system to manage databases that are larger than the amount of memory available without incurring the performance penalty of a disk-oriented system []. When the amount of in-memory data exceeds a user-defined threshold, the DBMS moves data to disk to free up space for new data. To do this, the system dynamically constructs blocks of the coldest tuples and writes them asynchronously to the anti-cache on disk. The DBMS maintains in-memory "tombstones" for each evicted tuple. When a running transaction attempts to access an evicted tuple through its tombstone, the DBMS aborts that transaction and fetches the data that it needs from the anti-cache without blocking other transactions. Once the data that the transaction needs is moved back into memory, it is restarted.

For this study, we propose an extension of anti-caching where the cold data is stored in an NVM-optimized hash table rather than on disk. We modify the cold data storage manager to adapt the anti-caching system to NVM. The original implementation uses the BerkeleyDB [] key-value store to manage anti-caching blocks. For NVM-backed data files, BerkeleyDB proved to be too heavyweight, as we need finer-grained control over writes to NVM. To this end, we implemented a lightweight array-based block store using the emulator's PMFS interface. Elements of the array are anti-cache blocks, and array indexes correspond to the anti-cache block id. If a block is transferred from the anti-cache to DRAM, the array index where the block was stored is added to a free list. When a new anti-cache block needs to be written, a vacant block is acquired from the free list. We use a slab-based allocation method, where each time the anti-cache is full, a new slab is allocated and added to the free list. If the anti-cache shrinks, then the DBMS compacts it by deallocating sparse slabs.
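A minimal sketch of this array-based block store is shown below. It keeps the block array in ordinary memory for brevity, whereas the real implementation places it in a PMFS-backed file; the class and method names are our own.

```cpp
// Sketch of an array-based anti-cache block store: block ids are array
// indexes, freed slots are recycled through a free list, and a new slab of
// slots is added whenever the free list runs dry.
#include <cstdint>
#include <vector>

class AntiCacheBlockStore {
 public:
  explicit AntiCacheBlockStore(size_t slab_size) : slab_size_(slab_size) {}

  // Store an evicted block and return its block id (the array index).
  size_t Put(std::vector<uint8_t> block) {
    if (free_list_.empty()) AllocateSlab();
    size_t id = free_list_.back();
    free_list_.pop_back();
    slots_[id] = std::move(block);
    return id;
  }

  // Fetch a block back into memory and return its slot to the free list.
  std::vector<uint8_t> Get(size_t id) {
    std::vector<uint8_t> block = std::move(slots_[id]);
    free_list_.push_back(id);
    return block;
  }

 private:
  void AllocateSlab() {
    size_t old_size = slots_.size();
    slots_.resize(old_size + slab_size_);
    for (size_t i = old_size; i < slots_.size(); i++) free_list_.push_back(i);
  }

  size_t slab_size_;
  std::vector<std::vector<uint8_t>> slots_;   // one entry per anti-cache block
  std::vector<size_t> free_list_;             // vacant block ids
};

int main() {
  AntiCacheBlockStore store(4);
  size_t id = store.Put({1, 2, 3});
  std::vector<uint8_t> block = store.Get(id);
  return block.size() == 3 ? 0 : 1;
}
```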

Disk-oriented System

We configure a disk-oriented DBMS to run on the NVM+DRAM architecture. We allow the buffer pool to remain in DRAM and store the data and log files using the PMFS interface. The main difference between this configuration and the NVM-only MySQL configuration presented in Section .. is that all main memory accesses in this configuration go to DRAM instead of NVM.

Experimental Evaluation

To evaluate these memory configurations and DBMS designs, we performed a series of experiments on the NVM emulator. We deployed four different system configurations: two executing entirely on NVM and two executing on a hybrid NVM+DRAM hierarchy. For the NVM-only analysis, we configured MySQL to execute entirely out of NVM and compared it with H-Store configured to run entirely in NVM. For the NVM+DRAM hierarchy analysis, we configured MySQL to use a DRAM-based buffer pool and store all persistent data in PMFS. As a comparison, we implemented the NVM adaptations to the anti-caching system described above by modifying the original H-Store-based anti-caching implementation. We used two benchmarks in our evaluation and a range of different configuration parameters.

System Configuration

All experiments were conducted on the NVM emulator described in Appendix A.. For each system, we evaluate the benchmarks on two different NVM latencies: 2× DRAM and 8× DRAM, where the base DRAM latency is approximately  ns. We consider these latencies to represent the best-case and worst-case NVM latencies, respectively. We chose this range of latencies to make our results as independent from the underlying NVM technology as possible. The sustained bandwidth of NVM is likely to be lower than that of DRAM. We leverage the bandwidth throttling mechanism in the emulator to throttle the NVM bandwidth to . GB/s, which is × lower than the available DRAM bandwidth on the platform [].

Benchmarks

We use the YCSB and TPC-C benchmarks in our evaluation. A detailed description of these benchmarks is provided in Appendix B. We use H-Store's internal benchmarking framework for both the H-Store on NVM and the anti-caching analysis. For the MySQL benchmarking, we use the OLTP-Bench [] framework. For the TPC-C benchmark, we use  warehouses and , items, resulting in a total data size of  GB. For the anti-cache trials, we evict data from the HISTORY, ORDERS, and ORDER_LINE tables, as these are the only tables where transactions insert new data. We will now discuss the results of executing the benchmarks on each of the NVM-only and NVM+DRAM architectures described in Sections . and ..

NVM-Only Architecture

YCSB: We evaluate YCSB on each system across the range of skew parameters and workload mixtures described above. We first consider the impact of NVM latency on the throughput of the memory-oriented and disk-oriented systems. The results for the read-heavy workload shown in Figure .b indicate that increasing the NVM latency decreases the throughput of H-Store and MySQL by 12.3% and 14.8%, respectively. There is no significant impact on H-Store's performance in the read-only workload shown in Figure .a, which indicates that latency mainly affects the performance of logging.

The throughput varies with the amount of skew in the workload. The impact of skew on H-Store's performance is more pronounced in the read-heavy workload shown in Figure .b. Throughput drops by 18.2% in the read-heavy workload as the skew level is reduced. The drop in throughput is due to the application's larger working set size, which increases the number of cache misses and subsequent accesses to NVM. In contrast, MySQL performs poorly on high-skew workloads, but its throughput improves by × as skew decreases. This is because a disk-oriented system uses locks to allow transactions to execute concurrently. Thus, if a significant portion of the transactions access the same tuples, then lock contention becomes a bottleneck.

We can summarize the above observations as follows: (1) increasing NVM latency impacts the performance of the logging mechanism, and (2) the throughput of memory-oriented and disk-oriented systems varies differently as skew decreases. We contend that the ideal system for a NVM-only architecture will possess features of both memory-oriented and disk-oriented systems.

Figure .: YCSB Performance on NVM-only Architecture – Performance comparison for the YCSB benchmark across different workload mixtures. (a) Read-Only; (b) Read-Heavy; (c) Balanced.

TPC-C: For the TPC-C benchmark, most transactions insert or access new records (i.e., NewOrder), and older records are almost never accessed. As such, there is strong temporal skew built into the semantics of the benchmark. Only a subset of the tables increases in size, and the rest are static. In Figure .a, we see that the throughput of both systems varies only slightly with an increase in NVM latency, and that for both latencies the throughput of H-Store is × higher than that of the disk-oriented system.

NVM+DRAM Architecture

YCSB: We use the same YCSB skew and workload mixes, but configure the amount of DRAM available to the DBMSs to be 1/8 of the total database size. There are several conclusions to draw from the results shown in Figure .. The first is that the throughput of the two systems trends differently as skew changes. For the read-heavy workload in Figure .b, anti-caching achieves × higher throughput than MySQL when skew is high, but only a .× improvement when skew is low. Other workload mixes have similar trends. This is because the anti-caching system performs best when there is high skew, since it needs to fetch fewer blocks and restart fewer transactions. In contrast, the disk-oriented system performs worse on the high-skew workloads due to high lock contention.

Figure .: YCSB Performance on NVM+DRAM Architecture – Performance comparison for the YCSB benchmark across different workload mixtures (data_size / mem_size = 8 in all panels). (a) Read-Only; (b) Read-Heavy; (c) Balanced.

We note that at the lowest skew level, MySQL's throughput decreases due to lower hit rates for data in the CPU's caches. Another notable finding is that neither system exhibits a major change in performance with longer NVM latencies. This is significant, as it implies that neither architecture is bottlenecked by the I/O on the NVM. Instead, the decrease in performance is due to the overhead of fetching and evicting data from NVM. For the disk-oriented system, this overhead comes from managing the buffer pool, while in the anti-caching system it comes from restarting transactions and asynchronously fetching previously evicted data. We can summarize the above observations as follows: (1) the throughput of the anti-caching system decreases as skew decreases, (2) the throughput of the disk-oriented system increases as skew decreases, and (3) neither architecture is bottlenecked by I/O when the latency of NVM is between 2–8× the latency of DRAM. Given these results, we believe that the ideal system architecture for a NVM+DRAM hierarchy would need to possess features of both anti-caching and disk-oriented systems to enable it to achieve high throughput regardless of skew.

TPC-C: We next ran the TPC-C benchmark on the anti-caching and disk-oriented DBMSs using different NVM latencies. The results in Figure .b show that the throughput of both DBMSs does not change significantly as NVM latency increases. This is expected, since all of the transactions' write operations are initially stored on DRAM.

Figure .: NVM Latency Evaluation – Performance comparison for the TPC-C benchmark using different NVM latencies. (a) NVM-only Architecture; (b) NVM+DRAM Architecture.

These results corroborate previous studies that have shown the × performance advantage of an anti-caching system over the disk-oriented DBMS []. For the anti-caching system, this workload essentially measures how efficiently it can evict data to PMFS (since no transaction reads old data).

Recovery

Lastly, we evaluate recovery schemes in H-Store using the emulator's NUMA interface. We implemented logical logging (i.e., command logging) and physical logging (i.e., ARIES) recovery schemes within H-Store. For each scheme, we first measure the DBMS's runtime performance when executing a fixed number of TPC-C transactions (,). We then simulate a system failure and measure how long it takes the DBMS to recover the database state from each scheme's corresponding log stored on PMFS.

For the runtime measurements, the results in Figure .a show that H-Store achieves × higher throughput when using logical logging compared to physical logging. This is because logical logging only records the executed commands and thus is more lightweight. The amount of logging data for the workload using this scheme is only  MB. In contrast, physical logging keeps track of all modifications made at tuple-level granularity, and its corresponding log occupies  MB. This reduced footprint makes logical logging more attractive for the first NVM devices, which are expected to have limited capacities. Next, in the results for the recovery times, Figure .b shows that logical logging is × slower than physical logging. One could reduce this time in logical logging by having the DBMS checkpoint more frequently, but this would impact steady-state performance []. We note that both schemes are essentially doing unnecessary work, since all writes to memory when using the NUMA interface are potentially durable. A better approach is to use a recovery scheme that is designed for NVM. This would allow a DBMS to combine the faster runtime performance of logical logging with the faster recovery of physical logging.

Figure .: Recovery Evaluation – Comparison of recovery schemes in H-Store using the TPC-C benchmark. (a) Throughput; (b) Recovery Time.

Discussion

For both configurations, our results show that the memory-oriented system outperforms the traditional disk-oriented system on NVM. This difference is most pronounced when the skew in the workload is high, as high skew introduces significant lock contention in a disk-oriented system. However, the performance of the memory-oriented system decreases as skew decreases, while the performance of the disk-oriented system increases as skew decreases. This convergence is interesting, as it signifies that neither architecture is ideally suited for NVM. We next discuss possible system architectures for NVM that leverage both memory-oriented and disk-oriented design features, thereby allowing uniform performance across all workload skews. In addition, we propose new possibilities for recovery schemes that take advantage of the persistence of NVM to provide nearly instant recovery.

NVM-aware DBMS

The results in Figure . show that the performance of the memory-oriented architecture converges with that of the disk-oriented system in workloads that involve writes, especially as skew decreases. We attribute this to the overhead of logging and diminishing benefits from H-Store's concurrency control scheme, which is optimized for main memory []. In the recovery experiments shown in Figure .b, we observe that recovery latency is high for logical logging. We can reduce this overhead by checkpointing more frequently, but this also degrades performance []. Based on these constraints, we recognize the need for a new DBMS that is designed explicitly for NVM. The recoverability mechanism of this system will leverage the persistence properties of NVM. We envision that this new DBMS will be a hybrid architecture that borrows ideas from both memory-oriented and disk-oriented systems.

Anti-Caching with Synchronous Fetches

As our results in Figure . show, the anti-caching system outperforms the disk-oriented architecture across all skew levels, but its performance advantage decreases as skew decreases. This is due to the overhead of aborting and restarting transactions when the DBMS must fetch evicted data.

Synchronously performing this retrieval is prohibitively expensive with disk-based storage, so the cost of restarting the transaction once the block is fetched is justified. However, if we replace the disk with NVM, then the cost of fetching a block is significantly lower. This means that it is better to stall the transaction and retrieve the data that it needs immediately, rather than aborting the transaction and restarting it after the data that it needs is moved into memory [].
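The policy difference can be summarized with the following sketch, which contrasts the two code paths; the helper functions are hypothetical stand-ins for the engine's eviction machinery, not its real API.

```cpp
// Sketch of the two policies for accessing an evicted tuple: stall and fetch
// synchronously when the anti-cache lives on NVM, versus abort, fetch
// asynchronously, and restart when it lives on disk.
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Tuple { std::vector<uint8_t> data; };

// Hypothetical stand-ins for the engine's eviction machinery.
static Tuple FetchEvictedBlockSync(size_t /*block_id*/) { return Tuple{{42}}; }
static void ScheduleAsyncFetchAndRestart(size_t /*block_id*/) {}

Tuple AccessEvictedTuple(size_t block_id, bool storage_is_nvm) {
  if (storage_is_nvm) {
    // NVM: a block fetch costs on the order of microseconds, so simply stall
    // the transaction and retrieve the data in place.
    return FetchEvictedBlockSync(block_id);
  }
  // Disk: a fetch takes milliseconds, so abort now, fetch asynchronously, and
  // restart the transaction once the block is resident in memory.
  ScheduleAsyncFetchAndRestart(block_id);
  throw std::runtime_error("transaction aborted: evicted tuple not resident");
}

int main() {
  Tuple t = AccessEvictedTuple(7, /*storage_is_nvm=*/true);
  return t.data.empty() ? 1 : 0;
}
```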

Goal of this Dissertation

In this chapter, we presented the results of our investigation of the impact of NVM on existing DBMSs. We explored two possible architectures using non-volatile memory (i.e., NVM-only and NVM+DRAM architectures). For each architecture, we evaluated memory-oriented and disk-oriented OLTP DBMSs. Our analysis shows that memory-oriented systems are better suited to take advantage of NVM and outperform their disk-oriented counterparts. However, in both the NVM-only and NVM+DRAM architectures, the throughput of the memory-oriented systems decreases as workload skew is reduced, while the throughput of the disk-oriented architectures increases as workload skew is reduced. Because of this, we conclude that neither system is ideally suited for NVM. Instead, a new system is needed with principles of both disk-oriented and memory-oriented systems and a lightweight recovery scheme designed to utilize the non-volatile property of NVM. In the remainder of this dissertation, we will present the design and implementation of such a DBMS that leverages the properties of NVM in its storage and recovery methods.

Chapter

Storage Management

In this chapter, we explore the fundamentals of storage and recovery methods in OLTP DBMSs running on a NVM-only storage hierarchy. This allows us to examine how the DBMS can leverage NVM while avoiding the overhead of dealing with the volatility of DRAM. We later extend these methods to a three-tier storage system that also includes DRAM and SSD in Chapters  and .

We implemented three storage engine architectures in a single DBMS: (1) in-place updates with logging, (2) copy-on-write updates without logging, and (3) log-structured updates. We then developed optimized variants of these approaches that reduce the computational overhead, storage footprint, and wear-out of NVM devices. For our evaluation, we configure the systems to only use NVM and volatile CPU-level caches on the hardware emulator (i.e., no DRAM). Our analysis shows that the NVM-optimized storage engines improve the DBMS's throughput by a factor of .× while reducing the number of writes to NVM by half.

The remainder of this chapter is organized as follows. In Section . we describe our DBMS testbed and the storage engines that we developed for this study. We then present in Section . our optimizations for these engines that leverage NVM's unique properties. We then present our experimental evaluation in Section ..

. DBMS Testbed We developed a lightweight DBMS, called N-Store, to evaluate different storage architecture designs for OLTP workloads. We did not use an existing DBMS, as that would require significant changes to incorporate the storage engines into a single system. Although some DBMSs support a pluggable storage engine back-end (e.g., MySQL, MongoDB), modifying them to support NVM would still require significant changes. We also did not want to taint our measurements with features that are not relevant to our evaluation. The DBMS's internal coordinator receives incoming transaction requests from the application and then invokes the target stored procedure. As a transaction executes in the system, it invokes queries to read and write tuples from the database. These requests are passed through a query executor that invokes the necessary operations on the DBMS's active storage engine.

A detailed description of the NVM hardware emulator is presented in Appendix A.


The DBMS uses pthreads to allow multiple transactions to run concurrently in separate worker threads. It executes as a single process with the number of worker threads equal to the number of cores, where each thread is mapped to a different core. Since we do not want the DBMS's concurrency control scheme to interfere with our evaluation, we partition the database and use a lightweight locking scheme where transactions are executed serially at each partition based on timestamp ordering []. Using a DBMS that supports a pluggable back-end allows us to compare the performance characteristics of different storage and recovery methods on a single platform. We implemented three storage engines that use different approaches for supporting durable updates to a database: () in-place updates engine, () copy-on-write updates engine, and () log-structured updates engine. Each engine also supports both primary and secondary indexes. We now describe these engines in detail. For each engine, we first discuss how it applies changes made by transactions to the database and then how it ensures durability after a crash. All of these engines are based on the architectures found in state-of-the-art DBMSs. That is, they use memory obtained through the hardware emulator's allocator interface as volatile memory and do not exploit NVM's persistence. Later, in Section ., we present our improved variants of these engines that are optimized for NVM.

.. In-Place Updates Engine (InP) The first engine uses the most common storage engine strategy in DBMSs. With in-place updates, there is only a single version of each tuple at all times. When a transaction updates a field of an existing tuple, the system writes the new value directly on top of the original one. This is the most efficient method of applying changes, since the engine does not make a copy of the tuple before updating it and only the updated fields are modified. The design of this engine is based on VoltDB [], which is a memory-oriented DBMS that does not contain legacy disk-oriented DBMS components such as a buffer pool. The InP engine uses the STX B+tree library for all of its indexes [].

Storage: Figure .a illustrates the architecture of the InP engine. The storage area for tables is split into separate pools for xed-sized blocks and variable-length blocks. Each block consists of a set of slots. The InP engine stores the table’s tuples in xed-size slots. This ensures that the tuples are byte-aligned and the engine can easily compute their offsets. Any eld in a table that is larger than 8 bytes is stored separately in a variable-length slot. The 8-byte location of that slot is stored in that eld’s location in the tuple. The tables’ tuples are unsorted within these blocks. For each table, the DBMS maintains a list of unoccupied tuple slots. When a transaction deletes a tuple, the deleted tuple’s slot is added to this pool. When a transaction inserts a tuple into a table, the engine rst checks the table’s pool for an available slot. If the pool is empty, then the engine allocates a new xed-size block using the allocator interface. The engine also uses the allocator interface to maintain the indexes and stores them in memory.

Recovery: Since the changes made by transactions committed after the last checkpoint are not written to "durable" storage, the InP engine maintains a durable write-ahead log (WAL) in the filesystem to assist in recovery from crashes and power failures.

Figure .: Storage Engine Architectures – Architectural layout of the three traditional storage engines supported in the DBMS testbed: (a) In-place Updates (InP), (b) Copy-on-Write Updates (CoW), and (c) Log-structured Updates (Log). The engine components accessed using the allocator interface and those accessed using the filesystem interface are bifurcated by the dashed line.
The WAL records the transactions' changes before they are applied to the DBMS []. As transactions execute queries that modify the database, the engine appends a new entry to the WAL for those changes. Each entry contains the transaction identifier, the table modified, the tuple identifier, and the before/after tuple images depending on the operation. The most well-known recovery protocol for in-place updates is ARIES []. With ARIES, the engine periodically takes checkpoints that are stored on the filesystem to bound recovery latency and reduce the storage space consumed by the log. In our implementation, we compress (gzip) the checkpoints on the filesystem to reduce their storage footprint on NVM. During recovery, the engine first loads the last checkpoint. It then replays the log to ensure that the changes made by transactions committed since the checkpoint are present in the database. Modifications made by uncommitted transactions at the time of failure are not propagated to the database. The InP engine uses a variant of ARIES that is adapted for main memory DBMSs with a byte-addressable storage

engine []. As we do not store physical changes to indexes in this log, all of the tables’ indexes are rebuilt during recovery because they may have been corrupted.

.. Copy-on-Write Updates Engine (CoW) The second storage engine performs copy-on-write updates: instead of modifying the original tuple, it creates a copy of the tuple and then modifies that copy. As the CoW engine never overwrites committed data, it does not need to record changes in a WAL for recovery. The CoW engine instead uses different lookup directories for accessing the versions of tuples in the database. With this approach, known as shadow paging in IBM's System R [], the DBMS maintains two lookup directories at all times: () the current directory, and () the dirty directory. The current directory points to the most recent versions of the tuples and only contains the effects of committed transactions. The dirty directory points to the versions of tuples being modified by active transactions. To ensure that transactions are isolated from the effects of uncommitted transactions, the engine maintains a master record that always points to the current directory. Figure .b presents the architecture of the CoW engine. After applying the changes to the copy of the tuple, the engine updates the dirty directory to point to the new version of the tuple. When the transaction commits, the engine updates the master record atomically to point to the dirty directory. The engine maintains an internal page cache to keep the hot pages in memory. System R implements shadow paging by copying the current directory to create the new dirty directory after every commit operation. Creating the dirty directory in this manner, however, incurs high I/O overhead. The CoW engine uses LMDB's copy-on-write B+trees [, , ] to implement shadow paging efficiently. Figure .b illustrates an update operation on a CoW B+tree. When the engine modifies the leaf node 4 in the current directory, it only needs to make a copy of the internal nodes lying along the path from that leaf node up to the root of the current version. The current and dirty directories of the copy-on-write B+tree share the rest of the tree. This significantly reduces the I/O overhead of creating the dirty directory, as only a fraction of the B+tree is copied. To further reduce the overhead of shadow paging, the CoW engine uses a group commit mechanism that batches the changes made by a group of transactions before committing the dirty directory.
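The essence of the commit step, atomically flipping the master record from the current directory to the dirty directory once the new versions are durable, can be sketched as follows. The Directory type and the Persist call are simplified placeholders and do not reflect LMDB's actual structures.

// Sketch of shadow-paging commit: swap the master record to the dirty
// directory only after the new versions are durable (types are placeholders).
#include <atomic>
#include <cstdint>
#include <map>

struct Directory {
  std::map<uint64_t, const void*> tuples;  // tuple id -> latest version
};

struct MasterRecord {
  std::atomic<Directory*> current;  // always points to a consistent directory
};

// Hypothetical durability hook; the CoW engine flushes dirty pages here.
void Persist(const Directory*) { /* fsync / flush dirty pages */ }

void Commit(MasterRecord& master, Directory* dirty) {
  Persist(dirty);                              // 1. make the new versions durable
  master.current.store(dirty,                  // 2. atomically publish them
                       std::memory_order_release);
  // The old current directory can now be garbage collected asynchronously.
}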

Storage: The CoW engine stores the directories on the filesystem. The tuples in each table are stored in an HDD/SSD-optimized format where all of the tuple's fields are inlined. This avoids the expensive random accesses that are required when some fields are not inlined. Each database is stored in a separate file, and the master record for the database is located at a fixed offset within that file. The engine supports secondary indexes as a mapping of secondary keys to primary keys. The downside of the CoW engine is that it creates a new copy of a tuple even if a transaction only modifies a subset of the tuple's fields. The engine also needs to keep track of references to tuples from different versions of the copy-on-write B+tree so that it can reclaim the storage space consumed by old, unreferenced tuple versions []. As we show in Section .., this engine has high write amplification (i.e., the amount of data written to storage is much higher than the amount of data written by the application). This increases wear on the NVM device, thereby reducing its lifetime.

Recovery: If the DBMS crashes before the master record is updated, then the changes present in the dirty directory are not visible after restart. Hence, there is no recovery process for the CoW engine. When the DBMS comes back on-line, the master record points to the current directory that is guaranteed to be consistent. The dirty directory is garbage collected asynchronously since it only contains the changes of uncommitted transactions.

.. Log-structured Updates Engine (Log) Lastly, the third storage engine uses a log-structured update policy. This approach originated from log-structured filesystems [], and it was later adapted to DBMSs as log-structured merge (LSM) trees [] for write-intensive workloads. The LSM tree consists of a collection of runs of data. Each run contains an ordered set of entries that record the changes performed on tuples. Runs reside either in volatile memory (i.e., the MemTable) or on durable storage (i.e., SSTables), with their storage layout optimized for the underlying storage device. The LSM tree reduces write amplification by batching the updates in the MemTable and periodically cascading the changes to durable storage []. The design for our Log engine is based on Google's LevelDB [], which implements the log-structured update policy using LSM trees.

Storage: Figure .c depicts the architecture of the Log engine. The Log engine uses a leveled LSM tree [], where each level in the tree contains the changes for a single run. The data starts from the MemTable stored in the topmost level and propagates down to SSTables stored in lower parts of the tree over time. The size of the run stored in a given level is k times larger than that of the run stored in its parent, where k is the growth factor of the tree. The Log engine allows us to control the size of the MemTable and the growth factor of the tree. It rst stores the tuple modications in a memory-optimized format using the allocator interface in the MemTable. The MemTable contains indexes to handle point and range queries efficiently. When the size of the MemTable exceeds a threshold, the engine ushes it to the lesystem as an immutable SSTable stored in a separate le. The Log engine also constructs a Bloom lter [] for each SSTable to quickly determine at runtime whether it contains entries associated with a tuple to avoid unnecessary index lookups. The contents of the MemTable are lost after system restart. Hence, to ensure durability, the Log engine maintains a WAL in the lesystem. The engine rst records the changes in the log and then applies the changes to the MemTable. The log entry contains the transaction identier, the table modied, the tuple identier, and the before/after images of the tuple depending on the type of operation. To reduce the I/O overhead, the engine batches log entries for a group of transactions and ushes them together. The log-structured update approach performs well for write-intensive workloads as it reduces random writes to durable storage. The downside of the Log engine is that it incurs high read am- plication (i.e., the number of reads required to fetch the data is much higher than that actually needed by the application). To retrieve a tuple, the Log engine rst needs to lookup the indexes of all the runs of the LSM tree that contain entries associated with the desired tuple []. To reduce this read amplication, the Log engine performs a periodic compaction process that merges a subset of SSTables. First, the entries associated with a tuple in different SSTables are merged into one entry in a new SSTable. Tombstone entries are used to identify purged tuples. Then, the engine builds .. NVM-AWARE ENGINES 

indexes for the new SSTable.
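The read path that the Bloom filters are meant to shorten can be sketched as follows. The MemTable, SSTable, and BloomFilter types are simplified stand-ins for LevelDB's, and the sketch returns the newest matching entry instead of coalescing partial updates across runs.

// Simplified LSM read path: newest run first, Bloom filter before each lookup.
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct BloomFilter {
  // Stand-in: may return a false positive, never a false negative.
  bool MayContain(uint64_t /*key*/) const { return true; }
};

struct SSTable {
  BloomFilter filter;
  std::map<uint64_t, std::string> index;  // key -> tuple entry
};

std::optional<std::string> Get(const std::map<uint64_t, std::string>& memtable,
                               const std::vector<SSTable>& sstables,  // newest first
                               uint64_t key) {
  if (auto it = memtable.find(key); it != memtable.end()) return it->second;
  for (const SSTable& run : sstables) {
    if (!run.filter.MayContain(key)) continue;  // skip the index lookup
    if (auto it = run.index.find(key); it != run.index.end()) return it->second;
  }
  return std::nullopt;  // tuple not present (or purged via a tombstone)
}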

Recovery: During recovery, the Log engine rebuilds the MemTable using the WAL, as the changes contained in it were not written to durable storage. It first replays the log to ensure that the changes made by committed transactions are present. It then removes any changes performed by uncommitted transactions, thereby bringing the MemTable to a consistent state.

. NVM-Aware Engines All of the engines described above are derived from existing DBMS architectures that are predicated on a two-tier storage hierarchy comprised of volatile DRAM and a non-volatile SSD/HDD. These devices have distinct hardware constraints and performance properties. The traditional engines were designed to account for and reduce the impact of these differences. For example, they maintain two layouts of tuples depending on the storage device. Tuples stored in memory can contain non-inlined fields because DRAM is byte-addressable and handles random accesses efficiently. In contrast, fields in tuples stored on durable storage are inlined to avoid random accesses because they are more expensive. To amortize the overhead of accessing durable storage, these engines batch writes and flush them in a deferred manner. Many of these techniques, however, are unnecessary in a system with an NVM-only storage hierarchy [, , ]. We adapt the storage and recovery mechanisms of these traditional engines to exploit NVM's characteristics. We refer to these optimized storage engines as the NVM-aware engines. As we show in our evaluation in Section ., these engines deliver higher throughput than their traditional counterparts while still ensuring durability. They reduce write amplification using NVM's persistence, thereby extending the lifetime of the NVM device. These engines only use the emulator's allocator interface with NVM-optimized data structures [, ]. Table . presents an overview of the steps performed by the NVM-aware storage engines while executing the primitive database operations. We note that the engine performs these operations within the context of a transaction. For instance, if the transaction aborts while executing an operation, the engine must undo the effects of the earlier operations performed by that transaction.
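The sync primitive that the NVM-aware engines rely on, implemented with CLFLUSH and SFENCE on the hardware emulator (see the caption of Table .), can be sketched as a small helper. The 64-byte cache-line size and the function name are assumptions.

// Sketch of the sync primitive: write back the cache lines covering
// [addr, addr + len) and fence, so the data becomes durable on NVM.
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // _mm_clflush, _mm_sfence

static constexpr size_t kCacheLine = 64;  // assumed cache-line size

void SyncToNvm(const void* addr, size_t len) {
  auto start = reinterpret_cast<uintptr_t>(addr) & ~(kCacheLine - 1);
  auto end = reinterpret_cast<uintptr_t>(addr) + len;
  for (uintptr_t line = start; line < end; line += kCacheLine) {
    _mm_clflush(reinterpret_cast<const void*>(line));  // evict and write back
  }
  _mm_sfence();  // order the flushes before subsequent stores
}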

.. In-Place Updates Engine (NVM-InP) One of the main problems with the InP engine described in Section .. is that it has a high rate of data duplication. When a transaction inserts a tuple, the engine records the tuple's contents in the WAL and then again in the table storage area. The InP engine's logging infrastructure also assumes that the system's durable storage device has orders of magnitude higher write latency than DRAM. It therefore batches multiple log records and flushes them periodically to the WAL using sequential writes. This approach, however, increases the mean response latency, as transactions need to wait for the group commit operation. Given this, we designed the NVM-InP engine to avoid these issues. Now, when a transaction inserts a tuple, rather than copying the tuple to the WAL, the NVM-InP engine only records a non-volatile pointer to the tuple in the WAL. This is sufficient because both the pointer and the tuple referred to by the pointer are stored on NVM. Thus, the engine can use the pointer to access the tuple after the system restarts without needing to re-apply changes in the WAL. It also stores indexes as non-volatile B+trees that can be accessed immediately when the system restarts without rebuilding.

Figure .: NVM-Aware Engines – Architectural layout of the NVM-optimized storage engines: (a) In-place Updates (NVM-InP), (b) Copy-on-Write Updates (NVM-CoW), and (c) Log-structured Updates (NVM-Log).

Storage: The architecture of the NVM-InP engine is shown in Figure .a, and Table . presents an overview of the steps to perform different operations. The engine stores tuples and non-inlined fields using fixed-size and variable-length slots, respectively. To reclaim the storage space of tuples and non-inlined fields inserted by uncommitted transactions after the system restarts, the NVM-InP engine maintains durability state in each slot's header. A slot can be in one of three states: unallocated, allocated but not persisted, or persisted. After the system restarts, slots that are allocated but not persisted transition back to the unallocated state. The NVM-InP engine stores the WAL as a non-volatile linked list. It appends new entries to the list using an atomic write. Each entry contains the transaction identifier, the table modified, the tuple identifier, and pointers to the operation's changes. The changes include tuple pointers for insert operations and field pointers for update operations on non-inlined fields. The engine persists this entry before updating the slot's state to persisted. If it does not ensure this ordering, then the

engine cannot reclaim the storage space consumed by uncommitted transactions after the system restarts, thereby causing non-volatile memory leaks. After all of the transaction's changes are safely persisted, the engine truncates the log. The engine supports primary and secondary indexes using non-volatile B+trees that it maintains using the allocator interface. We modified the STX B+tree library so that all operations that alter the index's internal structure are atomic [, ]. For instance, when adding an entry to a B+tree node, instead of inserting the key in sorted order, it appends the entry to a list of entries in the node. This modification is necessary because if the entry crosses cache line boundaries, the cache line write-backs required to persist the entry need not happen atomically. Our changes to the library ensure that the engine can safely access the index immediately after the system restarts, as it is guaranteed to be in a consistent state.
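The append-based node modification can be illustrated with the following simplified sketch: a new key is appended to the node's entry list and then published by persisting an entry count, rather than shifting entries to keep the node sorted. The node layout and the SyncToNvm helper are assumptions, not the modified STX B+tree's actual code.

// Simplified non-volatile B+tree leaf: entries are appended, then published
// by atomically persisting the entry count (layout is an assumption).
#include <atomic>
#include <cstddef>
#include <cstdint>

void SyncToNvm(const void*, size_t) { /* CLFLUSH + SFENCE, as sketched earlier */ }

struct NvmLeafNode {
  static constexpr uint32_t kMaxEntries = 32;
  struct Entry { uint64_t key; const void* tuple; };
  Entry entries[kMaxEntries];
  std::atomic<uint32_t> count{0};  // number of valid (persisted) entries

  bool Append(uint64_t key, const void* tuple) {
    uint32_t slot = count.load(std::memory_order_relaxed);
    if (slot == kMaxEntries) return false;        // caller must split the node
    entries[slot] = {key, tuple};
    SyncToNvm(&entries[slot], sizeof(Entry));     // persist the entry first
    count.store(slot + 1, std::memory_order_release);
    SyncToNvm(&count, sizeof(count));             // then publish it atomically
    return true;
  }
  // Lookups scan the first `count` entries; sorted order is not maintained.
};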

Recovery: The effects of committed transactions are durable after the system restarts because the NVM-InP engine immediately persists the changes made by a transaction when it commits. The engine therefore does not need to replay the log during recovery. However, the changes of uncommitted transactions may be present in the database because the memory controller can evict cache lines containing those changes to NVM at any time []. The NVM-InP engine therefore needs to undo those transactions using the WAL. To undo an insert operation, the engine releases the tuple's storage space using the pointer recorded in the WAL entry and then removes the entries associated with the tuple from the indexes. In the case of an update operation, the engine restores the tuple's state using the before image. If the after image contains non-inlined tuple fields, then the engine frees the memory occupied by those fields. For a delete operation, it only needs to update the indexes to point to the original tuple. To handle transaction rollbacks and DBMS recovery correctly, the NVM-InP engine releases storage space occupied by tuples or non-inlined fields only after it is certain that they are no longer required. As this recovery protocol does not include a redo process, the NVM-InP engine has a short recovery latency that only depends on the number of uncommitted transactions.
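A rough sketch of this undo-only pass is shown below; the log entry layout and the engine interface are hypothetical and only meant to mirror the three cases described above.

// Sketch of the NVM-InP undo-only recovery pass (interfaces are hypothetical).
#include <cstdint>
#include <string>
#include <vector>

enum class UndoType : uint8_t { INSERT, UPDATE, DELETE };

struct NvmWalEntry {
  UndoType type;
  uint64_t tuple_id;
  void* tuple_ptr;           // non-volatile pointer recorded at runtime
  std::string before_image;  // used to roll back updates
};

struct Engine {
  void FreeTupleSlot(void* /*ptr*/) {}
  void RemoveIndexEntries(uint64_t /*tuple_id*/) {}
  void RestoreTuple(uint64_t /*tuple_id*/, const std::string& /*before*/) {}
  void ReindexOriginalTuple(uint64_t /*tuple_id*/) {}
};

// Only entries of transactions active at the crash are processed; committed
// changes are already durable and need no redo.
void UndoUncommitted(Engine& engine, const std::vector<NvmWalEntry>& active_log) {
  for (auto it = active_log.rbegin(); it != active_log.rend(); ++it) {
    switch (it->type) {
      case UndoType::INSERT:
        engine.FreeTupleSlot(it->tuple_ptr);       // reclaim the slot
        engine.RemoveIndexEntries(it->tuple_id);   // drop index entries
        break;
      case UndoType::UPDATE:
        engine.RestoreTuple(it->tuple_id, it->before_image);
        break;
      case UndoType::DELETE:
        engine.ReindexOriginalTuple(it->tuple_id); // point indexes back
        break;
    }
  }
}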

.. Copy-on-Write Updates Engine (NVM-CoW) The original CoW engine stores tuples in self-contained blocks without pointers in the copy-on-write B+tree on the filesystem. The problem with this engine is that the overhead of propagating modifications to the dirty directory is high; even if a transaction only modifies one tuple, the engine needs to copy the entire block to the filesystem. When a transaction commits, the CoW engine uses the filesystem interface to flush the dirty blocks and updates the master record (stored at a fixed location in the file) to point to the root of the dirty directory []. These writes are expensive as they need to switch the privilege level and go through the kernel's VFS path. The NVM-CoW engine employs three optimizations to reduce these overheads. First, it uses a non-volatile copy-on-write B+tree that it maintains using the allocator interface. Second, the NVM-CoW engine directly persists the tuple copies and only records non-volatile tuple pointers in the dirty directory. Lastly, it uses the lightweight durability mechanism of the allocator interface to persist changes in the copy-on-write B+tree.

INSERT:
NVM-InP Engine: Sync tuple with NVM. Record tuple pointer in WAL. Sync log entry with NVM. Mark tuple state as persisted. Add tuple entry in indexes.
NVM-CoW Engine: Sync tuple with NVM. Store tuple pointer in dirty dir. Update tuple state as persisted. Add tuple entry in secondary indexes.
NVM-Log Engine: Sync tuple with NVM. Record tuple pointer in WAL. Sync log entry with NVM. Mark tuple state as persisted. Add tuple entry in MemTable.

UPDATE:
NVM-InP Engine: Record tuple changes in WAL. Sync log entry with NVM. Perform modifications on the tuple. Sync tuple changes with NVM.
NVM-CoW Engine: Make a copy of the tuple. Apply changes on the copy. Sync tuple with NVM. Store tuple pointer in dirty dir. Update tuple state as persisted. Add tuple entry in secondary indexes.
NVM-Log Engine: Record tuple changes in WAL. Sync log entry with NVM. Perform modifications on the tuple. Sync tuple changes with NVM.

DELETE:
NVM-InP Engine: Record tuple pointer in WAL. Sync log entry with NVM. Discard entry from table and indexes. Reclaim space at the end of transaction.
NVM-CoW Engine: Remove tuple pointer from dirty dir. Discard entry from secondary indexes. Recover tuple space immediately.
NVM-Log Engine: Record tuple pointer in WAL. Sync log entry with NVM. Mark tuple tombstone in MemTable. Reclaim space during compaction.

SELECT:
NVM-InP Engine: Find tuple pointer using index/table. Retrieve tuple contents.
NVM-CoW Engine: Locate tuple pointer in appropriate dir. Fetch tuple contents from dir.
NVM-Log Engine: Find tuple entries in relevant LSM runs. Rebuild tuple by coalescing entries.

Table .: Operations Performed by NVM-Aware Engines – An overview of the steps performed by the NVM-aware storage engines while executing primitive database operations. The syncing mechanism is implemented using CLFLUSH and SFENCE instructions on the hardware emulator.

Storage: Figure .b depicts the architecture of the NVM-CoW engine. The storage area for tuples is spread across separate pools for fixed-size and variable-length data. The engine maintains the durability state of each slot in both pools, similar to the NVM-InP engine. The NVM-CoW engine stores the current and dirty directories of the non-volatile copy-on-write B+tree using the allocator interface. We modified the B+tree from LMDB [] to handle modifications at a finer granularity to exploit NVM's byte addressability. The engine maintains the master record using the allocator interface to support efficient updates. When the system restarts, the engine can safely access the current directory using the master record because that directory is guaranteed to be in a consistent state. This is because the data structure is append-only and the data stored in the current directory is never overwritten. The execution steps for this engine are shown in Table .. The salient feature of this engine's design is that it avoids the transformation and copying costs incurred by the CoW engine. When a transaction updates a tuple, the engine first makes a copy and then applies the changes to that copy. It then records only the non-volatile tuple pointer in the dirty directory. The engine also batches transactions to amortize the cost of persisting the current directory. To commit a batch of

transactions, it first persists the changes performed by uncommitted transactions. It then persists the contents of the dirty directory. Finally, it updates the master record using an atomic durable write to point to that directory. The engine orders all of these writes using memory barriers to ensure that only committed transactions are visible after the system restarts.
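The ordering described above amounts to three persist steps, which can be sketched as follows; the directory types and the PersistRange helper are placeholders for the engine's allocator-interface primitives.

// Sketch of the NVM-CoW batch-commit ordering (types and helpers are
// placeholders for the engine's allocator-interface primitives).
#include <atomic>
#include <cstddef>

struct NvmDirectory { /* non-volatile copy-on-write B+tree root */ };

struct NvmMasterRecord {
  std::atomic<NvmDirectory*> current;
};

void PersistRange(const void*, size_t) { /* CLFLUSH + SFENCE */ }

void CommitBatch(NvmMasterRecord& master, NvmDirectory* dirty,
                 const void* new_tuples, size_t new_tuples_len) {
  // 1. Persist the tuple copies created by the batched transactions.
  PersistRange(new_tuples, new_tuples_len);
  // 2. Persist the dirty directory that points to those copies.
  PersistRange(dirty, sizeof(*dirty));
  // 3. Atomically and durably switch the master record to the dirty directory.
  master.current.store(dirty, std::memory_order_release);
  PersistRange(&master, sizeof(master));
  // Only after step 3 do the batched transactions become visible and durable.
}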

Recovery: As the NVM-CoW engine never overwrites committed data, it does not have a recovery process. When the system restarts, it first accesses the master record to locate the current directory. After that, it can start handling transactions. The storage space consumed by the dirty directory at the time of failure is asynchronously reclaimed by the NVM-aware allocator.

.. Log-structured Updates Engine (NVM-Log) The Log engine batches all writes in the MemTable to reduce random accesses to durable storage [, ]. The benefits of this approach, however, are not as evident for an NVM-only storage hierarchy because the performance gap between sequential and random accesses is smaller. The original log-structured engine that we described in Section .. incurs significant overhead from periodically flushing the MemTable to the filesystem and compacting SSTables to bound read amplification. Similar to the NVM-InP engine, the NVM-Log engine records all the changes performed by transactions in a WAL stored on NVM. Our NVM-Log engine avoids data duplication in the MemTable and the WAL, as it only records non-volatile pointers to tuple modifications in the WAL. Instead of flushing the MemTable out to the filesystem as an SSTable, it only marks the MemTable as immutable and starts a new MemTable. This immutable MemTable is physically stored in the same way on NVM as the mutable MemTable. The only difference is that the engine does not propagate writes to the immutable MemTables. We also modified the compaction process to merge a set of these MemTables into a new, larger MemTable. The NVM-Log engine uses an NVM-aware recovery protocol that has lower recovery latency than its traditional counterpart.

Storage: As shown in Figure .c, the NVM-Log engine uses an LSM tree to store the database. Each level of the tree contains a sorted run of data. Similar to the Log engine, this engine rst stores all the changes performed by transactions in the MemTable which is the topmost level of the LSM tree. The changes include tuple contents for insert operation, updated elds for update operation and tombstone markers for delete operation. When the size of the MemTable exceeds a threshold, the NVM-Log engine marks it as immutable and starts a new MemTable. We modify the periodic compaction process the engine performs for bounding read amplication to merge a set of immutable MemTables and create a new MemTable. The engine constructs a Bloom lter [] for each immutable MemTable to minimize unnecessary index lookups. Similar to the Log engine, the NVM-Log engine maintains a WAL. The purpose of the WAL is not to rebuild the MemTable, but rather to undo the effects of uncommitted transactions from the MemTable. An overview of the operations performed by the NVM-Log engine is shown in Table .. Like the NVM-InP engine, this new engine also stores the WAL as a non-volatile linked list of entries. When a transaction inserts a tuple, the engine rst ushes the tuple to NVM and records the non-volatile tuple pointer in a WAL entry. It then persists the log entry and marks the tuple  CHAPTER . STORAGE MANAGEMENT

as persisted. Finally, it adds an entry to the MemTable indexes. After the transaction commits, the engine truncates the relevant log entry because the changes recorded in the MemTable are durable. Its logging overhead is lower than the Log engine's, as it records less data and maintains the WAL using the allocator interface. The engine uses the non-volatile B+trees [, ] described in Section .. as MemTable indexes. Hence, it does not need to rebuild its indexes upon restarting.
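The flush-avoiding rotation can be sketched as follows: when the MemTable exceeds its size threshold, it is simply marked immutable and linked into the list that compaction later merges, and a new mutable MemTable is started. The MemTable type, the size accounting, and the threshold value are simplified assumptions.

// Sketch of NVM-Log MemTable rotation: no filesystem flush, just a marker
// (the MemTable type and size accounting are simplified assumptions).
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct MemTable {
  std::map<uint64_t, const void*> index;  // tuple id -> change entry on NVM
  size_t bytes_used = 0;
  bool immutable = false;
};

struct NvmLogEngine {
  static constexpr size_t kThreshold = 1 << 20;  // illustrative 1 MB limit
  std::unique_ptr<MemTable> active = std::make_unique<MemTable>();
  std::vector<std::unique_ptr<MemTable>> immutables;  // merged by compaction

  void MaybeRotate() {
    if (active->bytes_used < kThreshold) return;
    active->immutable = true;                 // stop propagating writes to it
    immutables.push_back(std::move(active));  // keep it on NVM, no SSTable flush
    active = std::make_unique<MemTable>();    // start a new mutable MemTable
  }
};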

Recovery: When a transaction commits, all the changes performed by that transaction are persisted in the in-memory component. During recovery, the NVM-Log engine only needs to undo the effects of uncommitted transactions on the MemTable. Its recovery latency is therefore lower than the Log engine's, as it no longer needs to rebuild the MemTable.

. Experimental Evaluation In this section, we present our analysis of the six different storage engine implementations. Our DBMS testbed allows us to evaluate the throughput, the number of reads/writes to the NVM device, the storage footprint, and the time that it takes to recover the database after restarting. We also use the perf toolkit to measure additional, lower-level hardware metrics of the system for each experiment []. The experiments were all performed on the NVM hardware emulator described in Appendix A. The engines access NVM using the allocator and filesystem interfaces of the emulator. We use the Intel memory latency checker [] to validate the emulator's latency and bandwidth settings. We set up the DBMS to use eight partitions in all of the experiments. We configure the node size of the STX B+tree and the CoW B+tree implementations to be  B and  KB, respectively. All transactions execute with the same serializable isolation level and durability guarantees.

.. Benchmarks We use the YCSB and TPC-C benchmarks for our evaluation. A detailed description of these benchmarks is provided in Appendix B. The tables in each database are partitioned in such a way that there are only single-partition transactions []. The database for the YCSB benchmark contains  million tuples (∼ GB). For each workload mixture and skew setting pair, we pre-generate a fixed workload of  million transactions that is divided evenly among the DBMS's partitions. Using a fixed workload that is the same across all the engines allows us to compare their storage footprints and read/write amplification. We configure the TPC-C workload to contain eight warehouses and , items. We map each warehouse to a single partition. The initial storage footprint of the database is approximately  GB.

.. Runtime Performance We begin with an analysis of the impact of NVM's latency on the performance of the storage engines. To obtain insights that are applicable to various NVM technologies, we run the benchmarks under three latency configurations on the emulator: () the default DRAM latency configuration ( ns), () a low NVM latency configuration that is × higher than DRAM latency ( ns), and () a high NVM latency configuration that is × higher than DRAM latency ( ns). We execute all workloads three times on each engine and report the average throughput.

Figure .: YCSB Performance (DRAM Latency) – The throughput of the engines (txn/sec) for the YCSB benchmark without any latency slowdown, under low and high skew: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

YCSB: Figures . to . present the throughput observed with the YCSB benchmark while vary- ing the workload mixture and skew settings under different latency congurations. We rst consider the read-only workload results shown in Figures .a, .a and .a. These results provide an upper bound on performance since transactions do not modify the database and the engines therefore do not need to ush changes from CPU caches to NVM during execution. The most notable observation is that the NVM-InP engine is not faster than the InP engine for both skew settings. This is because both engines perform reads using the allocator interface. The CoW engine’s throughput is lower than the in-place updates engine because for every transaction, it fetches the master record and then looks-up the tuple in the current directory. As the NVM-CoW engine accesses the master record and the non-volatile copy-on-write B+tree efficiently using the allocator interface, it is .–.× faster than the CoW engine. The Log engine is the slowest among all the engines because it coalesces entries spread across different LSM tree components to reconstruct tuples. The NVM-Log engine accesses the immutable MemTables using the allocator interface and delivers .× higher throughput compared to its traditional counterpart. We see that increasing the workload skew improves the performance of all the engines due to caching benets. The benets are most evident for the InP and NVM-InP engines; they achieve .× higher throughput compared to the low skew setting. The performance gains due to skew are minimal in case of the Log and NVM-Log engines due to tuple coalescing costs. We also observe that the performance gap between the two types of engines decreases in the read-only workload when we increase the NVM latency. In the high latency conguration, the NVM-CoW and the NVM-Log engines are .× and .× faster than their traditional counterparts.  CHAPTER . STORAGE MANAGEMENT

Figure .: YCSB Performance (Low Latency) – The throughput of the engines (txn/sec) for the YCSB benchmark under the low NVM latency configuration (×), under low and high skew: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

This is because the benefits of accessing data structures using the allocator interface are masked by slower NVM loads. The engines' throughput decreases sub-linearly with respect to the increased NVM latency. For example, with × higher latency, the throughput of the engines only drops by –.×. The NVM-aware engines are more sensitive to the increase in latency because they do not incur the tuple transformation and copying costs that dampen the effect of slower NVM accesses in the traditional engines. For the read-heavy workload, the results shown in Figures .b, .b and .b indicate that the throughput decreases for all the engines compared to the read-only workload because they must flush transactions' changes to NVM. Unlike in the read-only workload, where the two in-place engines had the same performance, in this workload we observe that the NVM-InP engine is .× faster than the InP engine due to its lower logging overhead. The performance of the CoW engine drops compared to its performance on the read-only workload because of the overhead of persisting the current directory. The drop is less prominent in the high skew workload because the updates are now concentrated on a few hot tuples, and therefore the number of copy-on-write B+tree nodes that are copied when creating the dirty directory is smaller. The benefits of our optimizations are more prominent for the balanced and write-heavy workloads. For the NVM-InP and the NVM-Log engines, we attribute this to lower logging overhead. In the case of the NVM-CoW engine, this is because it does not have to copy and transform tuples from the filesystem whenever it modifies them. This allows the engine to achieve .–.× higher throughput than the CoW engine. The performance gap between the Log and the CoW engines decreases because the former incurs lower tuple coalescing costs in these workloads. The Log engine is therefore .–.× faster than the CoW engine. It still lags behind the InP engine, however, because batching updates in the MemTable is not as beneficial in the NVM-only storage hierarchy. With

Figure .: YCSB Performance (High Latency) – The throughput of the engines (txn/sec) for the YCSB benchmark under the high NVM latency configuration (×), under low and high skew: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

increased latency, the throughput of all the engines decreases less on these write-intensive workloads than on the workloads that contain more reads. The throughput does not drop linearly with increasing NVM latency: with an × increase in latency, the throughput of the engines only drops by .–.×. We attribute this to the effects of caching and memory-level parallelism in the emulator.

TPC-C: Figure . shows the engines’ throughput while executing TPC-C under different la- tency congurations. Among all the engines, the NVM-InP engine performs the best. The NVM- aware engines are .–.× faster than the traditional engines. The NVM-CoW engine exhibits the highest speedup of .× over the CoW engine. We attribute this to the write-intensive nature of the TPC-C benchmark. Under the high NVM latency conguration, the NVM-aware engines de- liver .–.× higher throughput than their traditional counterparts. These trends closely follow the results for the write-intensive workload mixture in the YCSB benchmark. The benets of our opti- mizations, however, are not as signicant as previously observed with the YCSB benchmark. This is because the TPC-C transactions’ contain more complex program logic and execute more queries per transaction.

.. Reads & Writes We next measure the number of times that the storage engines access the NVM device while executing the benchmarks. This is important because the number of write cycles per bit is limited in different NVM technologies, as shown in Table .. We compute these results using hardware performance counters on the emulator with the perf framework []. These counters track the number of loads (i.e., reads) from and stores (i.e., writes) to the NVM device during execution. In each trial,

Figure .: TPC-C Throughput – The performance of the engines (txn/sec) for the TPC-C benchmark under all three NVM latency settings (DRAM latency, low NVM latency, and high NVM latency).

the engines’ access measurements are collected after loading the initial database.

YCSB: The results for NVM reads and writes while executing the YCSB benchmark are shown in Figures . and ., respectively. In the read-only workload, we observe that the Log engine performs the most load operations due to tuple coalescing. The NVM-aware engines perform up to 53% fewer loads due to better cache locality, as they do not perform any tuple deserialization operations. When we increase the workload skew, there is a significant drop in the NVM loads performed by all the engines. We attribute this to the caching of hot tuples in the CPU caches. In the write-intensive workloads, we observe that the CoW engine now performs the most NVM stores. This is because it needs to copy several pages while creating the dirty directory. This engine also performs the largest number of load operations; the copying mechanism itself requires reading data off NVM. Further, the I/O overhead of maintaining this directory reduces the number of hot tuples that can reside in the CPU caches. On the write-heavy workload, the NVM-aware engines perform –% fewer stores compared to their traditional counterparts. We attribute this to their lightweight durability mechanisms and smaller storage footprints, which enable them to make better use of the hardware caches. Even with increased workload skew, the NVM-aware engines perform –% fewer NVM writes. We note that the NVM accesses performed by the storage engines correlate inversely with the throughput delivered by these engines, as shown in Section ...

TPC-C: Figure . presents the NVM accesses performed while executing the TPC-C bench- mark. NVM-aware engines perform –% fewer writes compared to the traditional engines. We see that the access patterns are similar to that observed with the write-intensive workload mixture in the YCSB benchmark. The Log engine performs more writes in this benchmark compared to the YCSB benchmark because it has more indexes. This means that updating a tuple requires updating several indexes as well.

.. Recovery In this experiment, we evaluate the recovery latency of the storage engines. For each benchmark, we first execute a fixed number of transactions and then force a hard shutdown of the DBMS (SIGKILL). We then measure the amount of time that the system takes to restore the database to a consistent state.

Figure .: YCSB Reads – The number of load operations (in millions) executed by the engines while running the YCSB workload, under low and high skew: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

That is, a state where the effects of all committed transactions are durable and the effects of uncommitted transactions are removed. The number of transactions that need to be recovered by the DBMS depends on the frequency of checkpointing for the InP engine and on the frequency of flushing the MemTable for the Log engine. The CoW and NVM-CoW engines do not perform any recovery mechanism after the OS or DBMS restarts because they never overwrite committed data. They only have to garbage collect the previous dirty directory. This is done asynchronously and does not have a significant impact on the throughput of the DBMS.

YCSB: The results in Figure .a show the recovery measurements for the YCSB benchmark. We do not show the CoW and NVM-CoW engines as they never need to recover. We observe that the latency of the InP and Log engines grows linearly with the number of transactions that need to be recovered. This is because these engines rst redo the effects of committed transactions before undoing the effects of uncommitted transactions. In contrast, the NVM-InP and NVM-Log engines’ recovery time is independent of the number of transactions executed. These engines only need to undo the effects of transactions that are active at the time of failure and not the ones since the last checkpoint or ush. So the NVM-aware engines have a short recovery that is always less than a second.

TPC-C: The results for the TPC-C benchmark are shown in Figure .b. The recovery latency of the NVM-InP and NVM-Log engines is slightly higher than in the YCSB benchmark because TPC-C transactions perform more operations. However, the latency is still independent of the

Figure .: YCSB Writes – The number of store operations (in millions) executed by the engines while running the YCSB workload, under low and high skew: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

number of transactions executed, unlike in the traditional engines, because the NVM-aware engines ensure that the effects of committed transactions are persisted immediately.

.. Execution Time Breakdown We next analyze the time that the engines spend in their internal components during execution. We only examine YCSB with the low skew and low NVM latency configuration, which allows us to better understand the benefits and limitations of our implementations. We use event-based sampling with the perf framework [] to track the cycles executed within the engines' components. We start this profiling after loading the initial database. The engines' cycles are classified into four categories: () storage management operations through the allocator and filesystem interfaces, () recovery mechanisms such as logging, () index accesses and maintenance, and () other miscellaneous components. This last category is different for each engine; it includes the time spent synchronizing the engine's components and performing engine-specific tasks, such as compaction in the case of the Log and NVM-Log engines. As our testbed uses a lightweight concurrency control mechanism, these results do not contain any overhead from locking or latching []. The most notable result of this experiment, shown in Figure ., is that on the write-heavy workload the NVM-aware engines only spend –% of their time on recovery-related tasks, compared to the traditional engines, which spend as much as % of their time on them. We attribute this to the lower logging overhead in the case of the NVM-InP and NVM-Log engines and to the reduced cost of committing the dirty directory in the NVM-CoW engine. We observe that the proportion of time that the engines spend on recovery mechanisms increases as the workload becomes write-intensive. This explains why the benefits of our optimizations are more prominent for the balanced and write-heavy workloads.

Figure .: TPC-C Reads & Writes – The number of (a) load and (b) store operations (in millions) executed by the engines while running the TPC-C benchmark.

Figure .: Recovery Latency – The amount of time (ms) that the engines take to restore the database to a consistent state after a restart, as the number of transactions varies: (a) YCSB, (b) TPC-C.

These results highlight the benefits of optimizing the memory allocator to leverage NVM's characteristics. The NVM-aware engines spend most of their time performing storage management operations because their recovery mechanisms are so efficient. Interestingly, the engines performing copy-on-write updates spend a higher proportion of time on recovery-related tasks than the other engines, particularly on the read-heavy workload. This highlights the cost of creating and maintaining the dirty directory for large databases, even when using an efficient CoW B+tree. Another observation from Figure . is that the Log and NVM-Log engines spend a higher fraction of their time accessing and maintaining indexes. This is because they perform multiple index lookups on the LSM tree to reconstruct tuples. We observe that the NVM-Log engine spends less time performing the compaction process than the Log engine. This is due to the reduced overhead of maintaining the MemTables using the allocator interface.

Figure .: Execution Time Breakdown – The percentage of time that the engines spend in their internal components (storage, recovery, index, other) when running the YCSB benchmark: (a) Read-only Workload, (b) Read-heavy Workload, (c) Balanced Workload, (d) Write-heavy Workload.

Figure .: Storage Footprint – The amount of storage (GB) occupied in NVM by the internal components (table, index, log, checkpoint, other) of the engines: (a) YCSB Storage, (b) TPC-C Storage.

.. Storage Footprint In this experiment, we compare the engines' usage of NVM storage at runtime. The storage footprint of an engine is the amount of space that it uses for storing tables, logs, indexes, and other internal data structures. This metric is important because we expect that the first NVM products will initially have a higher cost than current storage technologies []. For this experiment, we periodically collect statistics maintained by our allocator and the filesystem meta-data during the workload execution. This is done after loading the initial database for each benchmark. We then report the peak storage footprint of each engine. For all of the engines, we allow their background processes (e.g., group commit, checkpointing, garbage collection, compaction) to execute while we collect these measurements.

YCSB: We use the balanced workload mixture and low skew setting for this experiment. The initial size of the database is  GB. The results shown in Figure .a indicate that the CoW engine has the largest storage footprint. Since this workload contains transactions that modify the database and tuples are accessed more uniformly, this engine incurs high overhead from continually creating new dirty directories and copying tuples. The InP and Log engines rely on logging to improve their recovery latency at the expense of a larger storage footprint. The InP engine's checkpoints have a high compression ratio and therefore consume less space. The NVM-aware engines have smaller storage footprints compared to the traditional engines. This is because the NVM-InP and NVM-Log engines only record non-volatile pointers to tuples and non-inlined fields in the WAL. As such, they consume –% less storage space than their traditional counterparts. For the CoW engine, its large storage footprint is due to duplicated data in its internal cache. In contrast, the NVM-CoW engine accesses the non-volatile copy-on-write B+tree directly using the allocator interface and only records non-volatile tuple pointers in this tree, not entire tuples. This allows it to use % less storage space.

InP engine:
Insert – Memory: T; Log: T; Table: T
Update – Memory: F + V; Log: 2 × (F + V); Table: F + V
Delete – Memory: є; Log: T; Table: є

CoW engine:
Insert – Memory: B + T ∥ T; Log: 0; Table: B ∥ T
Update – Memory: B + F + V ∥ F + V; Log: 0; Table: B ∥ F + V
Delete – Memory: B + є ∥ є; Log: 0; Table: B ∥ є

Log engine:
Insert – Memory: T; Log: T; Table: θ × T
Update – Memory: F + V; Log: 2 × (F + V); Table: θ × (F + V)
Delete – Memory: є; Log: T; Table: є

NVM-InP engine:
Insert – Memory: T; Log: p; Table: p
Update – Memory: F + V + p; Log: F + p; Table: 0
Delete – Memory: є; Log: p; Table: є

NVM-CoW engine:
Insert – Memory: T; Log: 0; Table: B + p ∥ p
Update – Memory: T + F + V; Log: 0; Table: B + p ∥ p
Delete – Memory: є; Log: 0; Table: B + є ∥ є

NVM-Log engine:
Insert – Memory: T; Log: p; Table: θ × T
Update – Memory: F + V + p; Log: F + p; Table: θ × (F + p)
Delete – Memory: є; Log: p; Table: є

Table .: Analytical Cost Model – Cost model for estimating the amount of data written to NVM, while performing insert, update, and delete operations, by each engine.

TPC-C: The graph in Figure .b shows the storage footprint of the engines while executing TPC-C. For this benchmark, the initial size of the database is  GB, and it grows to . GB. Transactions inserting new tuples increase the size of the internal data structures in the CoW and Log engines (i.e., the copy-on-write B+trees and the SSTables stored in the filesystem). By avoiding unnecessary data duplication using NVM's persistence property, the NVM-aware engines have –% smaller storage footprints. The space savings are more significant in this benchmark because the workload is write-intensive with longer running transactions. Thus, the logs in the InP and the Log engines grow more quickly compared to the small undo logs in their NVM-aware counterparts.

.. Analytical Cost Model We next present a cost model to estimate the amount of data written to NVM per operation by the traditional and NVM-optimized storage engines. This model highlights the strengths and weaknesses of these engines. We begin the analysis by stating the assumptions we use to simplify the model. First, database operations are presumed to always be successful. The amount of data written to NVM while performing an aborted operation depends on the stage at which the operation fails, so we restrict our analysis to successful operations. Second, the engines handle fixed-length and variable-length tuple fields differently. The fixed-length fields are stored inline, while the variable-length fields are stored separately. To illustrate this difference, we assume that the update operation

alters one fixed-length field and one variable-length field. Note that the tuple itself can contain any number of fixed-length and variable-length fields, depending on the database schema. Let us now describe some notation. We denote the size of the tuple by T. This depends on the specific table on which the engine performs the database operation. Let the sizes of the fixed-length field and the variable-length field altered by the update operation be F and V, respectively. These parameters depend on the table columns that are modified by the engine. The size of a pointer is represented by p. The NVM-optimized engines use non-volatile pointers to tuples and variable-length tuple fields to reduce data duplication. We use θ to denote the write-amplification factor of the engines performing log-structured updates. θ can be attributed to the periodic compaction mechanism that these engines perform to bound read amplification, and it depends on the type of LSM tree. Let B represent the size of a node in the CoW B+tree used by the CoW and NVM-CoW engines. We indicate small fixed-length writes to NVM, such as those used to maintain the status of tuple slots, by є. Given this notation, we present the cost model in Table .. The data written to NVM is classified into three categories: () memory, () log, and () table storage. We now describe some notable entries in the table. While performing an insert operation, the InP engine writes three physical copies of a tuple. In contrast, the NVM-InP engine only records the tuple pointer in the log and table data structures on NVM. In the case of the CoW and NVM-CoW engines, there are two possibilities depending on whether a copy of the relevant B+tree node is absent or present in the dirty directory. In the latter case, the engines do not need to make a copy of the node before applying the desired transformation. We distinguish these two cases in the relevant table entries using vertical bars. Note that these engines have no logging overhead. The performance gap between the traditional and the NVM-optimized engines, particularly for write-intensive workloads, follows directly from the cost model presented in the table.
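As a worked example of the model, consider an insert of a tuple of size T. The InP engine writes the tuple three times, once each to memory, the log, and the table, for a total of 3T bytes. The NVM-InP engine writes the tuple once and then only two pointers, for a total of T + 2p bytes. Since a tuple is typically much larger than an 8-byte pointer, this amounts to roughly a threefold reduction in the data written to NVM for inserts.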

.. Impact of B+Tree Node Size

We examine the sensitivity of our experimental results to the size of the B+tree nodes in this section. The engines performing in-place and log-structured updates use the STX B+tree [] for maintaining indexes, while the engines performing copy-on-write updates use the append-only B+tree [, , ] for storing the directories. In all our experiments, we use the default node size for both the STX B+tree ( B) and copy-on-write B+tree ( KB) implementations. For this analysis, we vary the B+tree node size and examine the impact on the engine's throughput, while executing different YCSB workloads under low NVM latency (×) and low workload skew settings. We restrict our analysis to the NVM-aware engines as they are representative of the other engines.
The graphs, shown in Figure ., indicate that the impact of B+tree node size is more significant for the CoW B+tree than the STX B+tree. In the case of the CoW B+tree, we observe that increasing the node size improves the engine's performance on read-heavy workloads. This can be attributed to smaller tree depth, which in turn reduces the amount of indirection in the data structure. It also reduces the amount of metadata that needs to be flushed to NVM to ensure recoverability. However, the engine's performance on write-heavy workloads drops as the B+tree nodes get larger. This is because of the copying overhead when performing updates in the dirty directory of the CoW B+tree.

Figure .: B+Tree Node Size – The impact of B+tree node size on the performance of the NVM-aware engines. The engines run the YCSB workloads (Read-Only, Read-Heavy, Balanced, Write-Heavy) under low NVM latency (×) and low workload skew settings. Panels: (a) NVM-InP engine (STX B+tree), (b) NVM-CoW engine (CoW B+tree), (c) NVM-Log engine (STX B+tree). Axes: throughput (txn/sec) versus B+tree node size (bytes).

We found that the engines performing copy-on-write updates perform well on both types of workloads when the node size is  KB. With the STX B+tree, our experiments suggest that the optimal node size is  B. This setting provides a nice balance between cache misses, instructions executed, TLB misses, and space utilization []. Hence, in all of our experiments in Section ., we configured the B+trees used by all the engines to their optimal performance settings.

.. NVM Instruction Set Extensions

In this experiment, we analyze the impact of newly proposed NVM-related instruction set extensions [] on the performance of the engines. As we describe in Appendix A., we currently implement the sync primitive using the SFENCE and CLFLUSH instructions. We believe that these new extensions, such as the PCOMMIT and CLWB instructions [], can be used to ensure the correctness and improve the performance of this primitive in future processors because they are more efficient and provide better control over how a DBMS interacts with NVM.
The PCOMMIT instruction guarantees the durability of stores to persistent memory. When the data is written back from the CPU caches, it can still reside in the volatile buffers on the memory controller. After the PCOMMIT instruction is executed, the store must become persistent. The CLWB instruction writes back a target cache line to NVM similar to the CLFLUSH instruction. It is, however, different in two ways: () it is a weakly-ordered instruction and can thus perform better than the strongly-ordered CLFLUSH instruction, and () it can retain a copy of the line in the cache hierarchy in exclusive state, thereby reducing the possibility of cache misses during subsequent accesses. In contrast, the CLFLUSH instruction always invalidates the cache line, which means that data has to be retrieved again from NVM.
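To make the discussion concrete, the following sketch shows one way such a sync primitive could be structured. This is an illustrative sketch rather than the exact code used in our implementation; the USE_CLWB switch and function names are assumptions, and a PCOMMIT-style drain would be added on processors that require it.

// Sketch of a sync primitive for NVM-resident data. The baseline path flushes
// each cache line with CLFLUSH and orders the flushes with SFENCE; the newer
// path substitutes the weakly ordered CLWB instruction, which does not
// invalidate the cache line.
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

static constexpr size_t kCacheLineSize = 64;

// Flush the address range [addr, addr + len) out of the CPU caches and order
// the write-backs before subsequent stores.
inline void SyncRange(const void *addr, size_t len) {
  uintptr_t start = reinterpret_cast<uintptr_t>(addr) & ~(kCacheLineSize - 1);
  uintptr_t end = reinterpret_cast<uintptr_t>(addr) + len;
  for (uintptr_t line = start; line < end; line += kCacheLineSize) {
#ifdef USE_CLWB  // proposed path: write back without invalidating the line
    _mm_clwb(reinterpret_cast<void *>(line));
#else            // baseline path: flush and invalidate the line
    _mm_clflush(reinterpret_cast<const void *>(line));
#endif
  }
  _mm_sfence();  // order the write-backs before later stores
}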

Figure .: NVM Instruction Set Extensions – The impact of sync primitive latency on the performance of the NVM-aware engines. The engines run the YCSB workloads (Read-Only, Read-Heavy, Balanced, Write-Heavy) under low NVM latency (×) and low skew settings. Performance obtained using the current primitive, built using the SFENCE and CLFLUSH instructions, is shown on the left side of each graph to serve as a baseline. Panels: (a) NVM-InP engine, (b) NVM-CoW engine, (c) NVM-Log engine. Axes: throughput (txn/sec) versus sync primitive latency (ns).

To understand the performance impact of a sync primitive comprising the PCOMMIT and CLWB instructions, we emulate its latency using the RDTSC and PAUSE instructions. We note that our software-based latency emulation does not capture all the complex interactions in real processors. However, it still allows us to perform a useful what-if analysis before these instruction set extensions are available. We vary the latency of the sync primitive from – ns and compare it with the currently used sync primitive. Since the traditional engines use PMFS [], which is loaded in as a kernel module, they require more changes for this experiment. We therefore restrict our analysis to the NVM-aware engines. We execute different YCSB workloads under low NVM latency (×) and low workload skew settings.
The results in Figure . show that the engines are sensitive to the performance of the sync primitive. Performance measurements of the engines while using the current sync primitive are shown on the left side of each graph to serve as a baseline. We observe that the throughput of all the engines drops significantly with increasing sync primitive latency. This is expected as these engines make extensive use of this primitive in their non-volatile data structures. The impact is therefore more pronounced on write-intensive workloads.
We note that the NVM-CoW engine is slightly less sensitive to the latency of the sync primitive than the NVM-InP and NVM-Log engines. We attribute this to the fact that this engine primarily uses data duplication to ensure recoverability and only uses the sync primitive to ensure the consistency of the CoW B+tree. In the case of the NVM-Log engine, its performance while executing the write-heavy workload is interesting. Its throughput becomes less than the throughput on the balanced workload only when the latency of the sync primitive is above  ns. This is because the engine needs to reconstruct tuples from entries spread across different LSM tree components.
We conclude that the trade-offs that we identified in these NVM-aware engines in the main body of the chapter still hold at higher sync primitive latencies. Overall, we believe that these new instructions will be required to ensure recoverability and improve the performance of future NVM-aware DBMSs.
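The software-based latency emulation mentioned above can be sketched as a busy-wait loop that follows the usual flush-and-fence sequence. The function name and the assumed clock-rate constant below are illustrative; a real harness would calibrate the conversion at startup.

// Sketch of emulating additional sync primitive latency in software. The
// calling thread spins with PAUSE until the requested number of nanoseconds,
// measured with the RDTSC cycle counter, has elapsed.
#include <cstdint>
#include <x86intrin.h>

// Assumed CPU frequency used to convert nanoseconds into cycles (illustrative).
static constexpr double kCyclesPerNanosecond = 2.4;

inline void EmulateSyncLatency(uint64_t extra_latency_ns) {
  const uint64_t start = __rdtsc();
  const uint64_t target_cycles =
      static_cast<uint64_t>(extra_latency_ns * kCyclesPerNanosecond);
  while (__rdtsc() - start < target_cycles) {
    _mm_pause();  // reduce power and pipeline pressure while busy-waiting
  }
}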

.. Discussion

Our analysis shows that the NVM access latency has the most impact on the runtime performance of the engines, more so than the amount of skew or the number of modifications to the database in the workload. This difference due to latency is more pronounced with the NVM-aware variants; their absolute throughput is better than the traditional engines, but longer latencies cause their performance to drop more significantly. This behavior is because heavyweight durability mechanisms no longer bottleneck them.
The NVM-aware engines also perform fewer store operations, which will help extend NVM device lifetimes. We attribute this to the reduction in redundant data that the engines store when a transaction modifies the database. Using the allocator interface with non-volatile pointers for internal data structures also allows them to have a smaller storage footprint. This in turn avoids polluting the CPU's caches with unnecessary copying and transformation operations. It also improves the recovery times of the engines that use a WAL since they no longer record redo information.
Overall, we find that the NVM-InP engine performs the best across a broad set of workload mixtures and skew settings for all NVM latency configurations.

The NVM-CoW engine did not perform as well for write-intensive workloads, but may be a better fit for DBMSs that support non-blocking read-only transactions. For the NVM-Log engine, many of its design assumptions are not well suited to a single-tier storage hierarchy. The engine is essentially performing in-place updates like the NVM-InP engine but with the additional overhead of maintaining its legacy components.

. Summary

This chapter presented the fundamentals of storage and recovery methods in OLTP DBMSs running on an NVM-only storage hierarchy. We implemented three storage engines in a modular DBMS testbed with different architectures: () in-place updates, () copy-on-write updates, and () log-structured updates. We then developed optimized variants of each of these engines that better make use of NVM's characteristics. Our experimental analysis with two different OLTP workloads showed that our NVM-aware engines outperform the traditional engines by up to .× while reducing the number of writes to the storage device by more than half on write-intensive workloads. We found that the NVM access latency has the most impact on the runtime performance of the engines, more so than the workload skew or the number of modifications to the database in the workload. Our evaluation showed that the NVM-aware in-place updates engine achieved the best throughput among all the engines with the least amount of wear on the NVM device.
In this chapter, we focused on an NVM-only storage hierarchy. It is possible today to replace DRAM with NV-DIMM [], and run an NVM-only DBMS unmodified on this storage system. Further, some NVM technologies, such as STT-RAM [], are expected to deliver lower read and write latencies than DRAM. NVM-only DBMSs would be a good fit for these technologies. However, for slower NVM technologies, such as PCM [, , ] and RRAM [, ], a two-tier storage system with DRAM and NVM is another viable alternative. We will present a logging and recovery protocol designed for such a system in the next chapter.

Chapter 

Logging and Recovery

A DBMS must guarantee the integrity of a database against application, operating system, and device failures []. It ensures the durability of updates made by a transaction by writing them out to durable storage, such as an SSD, before returning an acknowledgment to the application. Such storage devices, however, are much slower than DRAM, especially for random writes, and only support bulk data transfers as blocks.
During transaction processing, if the DBMS were to overwrite the contents of the database before committing the transaction, then it must perform random writes to the database at multiple locations on disk. DBMSs try to minimize random writes to disk by flushing the transaction's changes to a separate log on disk with only sequential writes on the critical path of the transaction. This method is referred to as write-ahead logging (WAL).
NVM upends the design assumptions underlying the WAL protocol. Although the performance advantages of NVM are obvious, it is still not clear how to make full use of it in a DBMS running on a hybrid storage hierarchy with both DRAM and NVM. In Chapter , we show that optimizing the storage methods for NVM improves both the DBMS performance and the lifetime of the storage device. These techniques, however, cannot be employed in a hybrid storage hierarchy, as they target an NVM-only system. Another line of research focuses on using NVM only for storing the log while keeping the database on disk []. This is a more cost-effective solution, as the cost of NVM devices is expected to be higher than that of disk. But this approach only leverages the low-latency sequential writes of NVM, and does not exploit its ability to efficiently support random writes and fine-grained data access.
Given this, we contend that it is better to employ logging and recovery algorithms that are designed for NVM. We designed such a protocol that we call write-behind logging (WBL). WBL not only improves the runtime performance of the DBMS, but it also enables it to recover nearly instantaneously from failures. The way that WBL achieves this is by tracking what parts of the database have changed rather than how it was changed. Using this logging method, the DBMS can directly flush the changes made by transactions to the database instead of recording them in the log. By ordering writes to NVM correctly, the DBMS can guarantee that all transactions are durable and atomic. This allows the DBMS to write less data per transaction, thereby improving an NVM device's lifetime.


To evaluate our approach, we implemented it in the Peloton [] in-memory DBMS and compared it against WAL using three storage technologies: NVM, SSD, and HDD. These experiments show that WBL with NVM improves the DBMS's throughput by .× while also reducing the database recovery time and the overall system's storage footprint. Our results also show that WBL only achieves this when the DBMS uses NVM; the DBMS actually performs worse than WAL when WBL is deployed on the slower, block-oriented storage devices (i.e., SSD, HDD). This is expected since our protocol is explicitly designed for fast, byte-addressable NVM.
The remainder of this chapter is organized as follows. We begin in Section . with an overview of the recovery principles of a DBMS. We then discuss logging and recovery implementations in modern DBMSs. We start with the ubiquitous WAL protocol in Section ., followed by our new WBL method in Section .. In Section ., we discuss how this logging protocol can be used in replicated environments. We present our experimental evaluation in Section ..

. Recovery Principles

A DBMS guarantees the integrity of a database by ensuring () that all of the changes made by committed transactions are durable and () that none of the changes made by aborted transactions or transactions that were running at the point of a failure are visible after recovering from the failure. These are referred to as durability and atomicity constraints, respectively [, ].
There are three types of failures that a DBMS must protect against: () transaction failure, () system failure, and () media failure. The first happens when a transaction is aborted either by the DBMS due to a conflict with another transaction or because the application chose to do so. System failures occur due to bugs in the DBMS/OS or hardware failures. Finally, in the case of a data loss or corruption on durable storage, the DBMS must recover the database from an archival version.
Almost every DBMS adopts the steal and no-force policies for managing the data stored in the volatile buffer pool and the database on durable storage []. The former policy allows a DBMS to flush the changes of uncommitted transactions at any time. With the latter, the DBMS is not required to ensure that the changes made by a transaction are propagated to the database when it commits. Instead, the DBMS records a transaction's changes to a log on durable storage before sending an acknowledgment to the application. Further, it flushes the modifications made by uncommitted transactions to the log before propagating them to the database.
During recovery, the DBMS uses the log to ensure the atomicity and durability properties. The recovery algorithm reverses the updates made by failed transactions using their undo information recorded in the log. In case of a system failure, the DBMS first ensures the durability of updates made by committed transactions by reapplying their redo information in the log on the database. Afterward, the DBMS uses the log's undo information to remove the effects of transactions that were aborted or active at the time of the failure. DBMSs can handle media failures by storing the database, the log, and the archival versions of the database (i.e., checkpoints) on multiple durable storage devices.
To appreciate why WBL is better than WAL when using NVM, we now discuss how WAL is implemented in both disk-oriented and in-memory DBMSs.

Tuple ID   Txn ID   Begin CTS   End CTS   Prev V   Data
101        —        1001        1002      —        X
102        —        1001        ∞         —        Y
103        305      1002        ∞         101      X'

Figure .: Tuple Version Meta-data – The additional data that the DBMS stores to track tuple versions in an MVCC protocol.

. Write-Ahead Logging

The most well-known recovery method based on WAL is the ARIES protocol developed by IBM in the s []. ARIES is a physiological logging protocol where the DBMS combines a physical redo process with a logical undo process []. During normal operations, the DBMS records transactions' modifications in a durable log that it uses to restore the database after a crash.
In this section, we provide an overview of ARIES-style WAL. We begin with discussing the original protocol for a disk-oriented DBMS and then describe optimizations for in-memory DBMSs. Our discussion is focused on DBMSs that use the multi-version concurrency control (MVCC) protocol for scheduling transactions [, ]. MVCC is the most widely used concurrency control scheme in DBMSs developed in the last decade, including Hekaton [], MemSQL, and HyPer. The DBMS records the versioning meta-data alongside the tuple data, and uses it to determine whether a tuple version is visible to a transaction. When a transaction starts, the DBMS assigns it a unique transaction identifier from a monotonically increasing global counter. When a transaction commits, the DBMS assigns it a unique commit timestamp by incrementing the timestamp of the last committed transaction. Each tuple contains the following meta-data:
● TxnId: A placeholder for the identifier of the transaction that currently holds a latch on the tuple.
● BeginCTS & EndCTS: The commit timestamps from which the tuple becomes visible and after which the tuple ceases to be visible, respectively.
● PreV: Reference to the previous version (if any) of the tuple.
Figure . shows an example of this versioning meta-data. A tuple is visible to a transaction if and only if its last visible commit timestamp falls within the BeginCTS and EndCTS fields of the tuple. The DBMS uses the previous version field to traverse the version chain and access the earlier versions, if any, of that tuple. In Figure ., the first two tuples are inserted by the transaction with commit timestamp 1001. The transaction with commit timestamp 1002 updates the tuple with ID 101 and marks it as deleted. The newer version is stored with ID 103. Note that the PreV field of the third tuple refers to the older version of the tuple. At this point in time, the transaction with identifier 305 holds a latch on the tuple with ID 103. See [, ] for a more detailed description of in-memory MVCC.
We now begin with an overview of the runtime operation of the DBMS during transaction processing and its commit protocol. Table . lists the steps in a WAL-based DBMS to execute database operations, process transaction commits, and take checkpoints. Later, in Section ., we present our WBL protocol for NVM systems.
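The per-tuple versioning meta-data and the visibility rule described above can be summarized with the following sketch. The concrete field widths, the Timestamp alias, and the boundary semantics of the visibility check are illustrative assumptions rather than the system's exact layout.

// Sketch of the versioning meta-data tracked for every tuple version.
#include <cstdint>

using Timestamp = uint64_t;
constexpr Timestamp kInfinity = UINT64_MAX;  // stands in for the ∞ end timestamp

struct TupleHeader {
  uint64_t txn_id;        // transaction currently holding a latch on the tuple (0 if none)
  Timestamp begin_cts;    // commit timestamp from which the version becomes visible
  Timestamp end_cts;      // commit timestamp after which the version ceases to be visible
  uint64_t prev_version;  // reference to the previous version of the tuple (0 if none)
};

// A version is visible to a transaction reading at `read_ts` iff the timestamp
// falls within the [begin_cts, end_cts) window (the exact inclusivity of the
// bounds is an assumption here).
inline bool IsVisible(const TupleHeader &t, Timestamp read_ts) {
  return t.begin_cts <= read_ts && read_ts < t.end_cts;
}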

Figure .: Structure of WAL Record – Structure of the log record constructed by the DBMS while using the WAL protocol. Fields: Checksum, LSN, Type, Commit Timestamp, Transaction Id, Table, Insert Location, Delete Location, After Image.

.. Runtime Operation

For each modification that a transaction makes to the database, the DBMS creates a log record that corresponds to that change. As shown in Figure ., a log record contains a unique log sequence number (LSN), the operation associated with the log record (i.e., INSERT, UPDATE, or DELETE), the transaction identifier, and the table modified. For INSERT and UPDATE operations, the log record contains the location of the inserted tuple or the newer version. Each record also contains the after-images (i.e., new values) of the tuples modified, as shown in Table .. In case of UPDATE and DELETE operations, it contains the location of the older version or the deleted tuple, respectively. This is known as the before-images (i.e., old values) of the modified tuples and is used to ensure failure atomicity.
A disk-oriented DBMS maintains two meta-data tables at runtime that it uses for recovery. The first is the dirty page table (DPT) that contains the modified pages that are in DRAM but have not been propagated to durable storage. Each of these pages has an entry in the DPT that marks the log record's LSN of the oldest transaction that modified it. This allows the DBMS to identify the log records to replay during recovery to restore the page. The second table is the active transaction table (ATT) that tracks the status of the running transactions. This table records the LSN of the latest log record of all active transactions. The DBMS uses this information to undo their changes during recovery.
To bound the amount of work to recover a database after a restart, the DBMS periodically takes checkpoints at runtime. ARIES uses fuzzy checkpointing where the checkpoint can contain the effects of both committed and uncommitted transactions []. Consequently, the DBMS must write out the DPT and ATT as a part of the checkpoint so that it can restore committed transactions and undo uncommitted transactions during recovery. After all the log records associated with a transaction are safely persisted in a checkpoint, the DBMS can remove those records from the log.
With an in-memory DBMS, transactions access tuples through pointers without indirection through a buffer pool []. The ARIES protocol can, therefore, be simplified and optimized for this architecture. Foremost is that a MVCC DBMS does not need to perform fuzzy checkpointing []. Instead, it constructs transactionally-consistent checkpoints that only contain the changes of committed transactions by skipping the modifications made by transactions that began after the checkpoint operation started. Hence, a MVCC DBMS neither stores the before-images of tuples in the log nor tracks dirty data (i.e., DPT) at runtime. Its recovery component, however, maintains an ATT that tracks the LSN of the latest log record written by each active transaction.
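A WAL record with the fields shown in Figure . might be represented as follows. The concrete types, and the use of an in-memory struct rather than a serialized byte layout, are simplifications for illustration only.

// Sketch of a WAL record carrying the fields listed in Figure ..
#include <cstdint>
#include <string>

enum class LogRecordType { INSERT, UPDATE, DELETE };

struct WalRecord {
  uint32_t checksum;          // detects torn or corrupted records during recovery
  uint64_t lsn;               // unique log sequence number
  LogRecordType type;         // operation associated with the record
  uint64_t commit_timestamp;  // commit timestamp of the owning transaction
  uint64_t txn_id;            // identifier of the transaction
  std::string table;          // table modified by the operation
  uint64_t insert_location;   // location of the inserted tuple / newer version
  uint64_t delete_location;   // location of the deleted tuple / older version
  std::string after_image;    // new values of the modified tuple
};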

.. Commit Protocol

We now describe how a WAL-based DBMS processes and commits transactions. When a transaction begins, the DBMS creates an entry in the ATT and sets its status as active.

Figure .: WAL Commit Protocol – The ordering of writes from the DBMS to durable storage while employing the WAL protocol. Components shown: the table heap in volatile storage, and the log and checkpoints on durable storage.

Figure .: WAL Recovery Protocol – The phases of the recovery protocol: (1) analysis, (2) redo, and (3) undo, shown over the log timeline (oldest active transaction, earliest redo log record, checkpoint, end of log).

For each modification that the transaction makes to the database, the DBMS constructs the corresponding log record and appends it to the log buffer. It then updates the LSN associated with the transaction in the ATT. The DBMS flushes all the log records associated with a transaction to durable storage (using the fsync command) before committing the transaction. This is known as synchronous logging. Finally, the DBMS marks the status of the transaction in the ATT as committed.
The ordering of writes from the DBMS to durable storage while employing WAL is presented in Figure .. The changes are first applied to the table heap and the indexes residing in volatile storage. At the time of commit, WAL requires that the DBMS flush all the modifications to the durable log. Then, at some later point, the DBMS writes the changes to the database in its next checkpoint.
As transactions tend to generate multiple log records that are each small in size, most DBMSs use group commit to minimize the I/O overhead []. It batches the log records for a group of transactions in a buffer and then flushes them together with a single write to durable storage. This improves the transactional throughput and amortizes the synchronization overhead across multiple transactions.
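The group commit mechanism described above can be sketched as a background thread that batches serialized log records and flushes them with a single fsync. All names, the fixed batching window, and the synchronization scheme below are illustrative assumptions rather than the actual implementation.

// Sketch of a group-commit loop: workers hand over serialized log records and
// block until the batch containing their records has been made durable.
#include <unistd.h>

#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class GroupCommitter {
 public:
  explicit GroupCommitter(int log_fd) : log_fd_(log_fd) {}

  // Called by a worker thread at commit time.
  void CommitRecord(const std::string &serialized_record) {
    std::unique_lock<std::mutex> lock(mutex_);
    uint64_t my_batch = current_batch_;
    buffer_.push_back(serialized_record);
    flushed_cv_.wait(lock, [&] { return durable_batch_ >= my_batch; });
  }

  // Background thread: flush the accumulated records once per window.
  void Run() {
    for (;;) {
      std::this_thread::sleep_for(std::chrono::microseconds(100));  // group commit window
      std::vector<std::string> batch;
      uint64_t batch_id;
      {
        std::lock_guard<std::mutex> guard(mutex_);
        batch.swap(buffer_);
        batch_id = current_batch_++;
      }
      for (const std::string &rec : batch) {
        if (write(log_fd_, rec.data(), rec.size()) < 0) { /* error handling omitted */ }
      }
      fsync(log_fd_);  // one synchronous write amortized over the whole batch
      {
        std::lock_guard<std::mutex> guard(mutex_);
        durable_batch_ = batch_id;
      }
      flushed_cv_.notify_all();  // workers can now report their transactions as committed
    }
  }

 private:
  int log_fd_;
  std::mutex mutex_;
  std::condition_variable flushed_cv_;
  std::vector<std::string> buffer_;
  uint64_t current_batch_ = 1;
  uint64_t durable_batch_ = 0;
};

Because every worker in a batch waits on the same fsync, the cost of the synchronous write is shared by all the transactions in that window.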

.. Recovery Protocol

The traditional WAL recovery algorithm (see Figure .) comprises three phases: () analysis, () redo, and () undo. In the analysis phase, the DBMS processes the log starting from the latest checkpoint to identify the transactions that were active at the time of failure and the modifications associated with those transactions. In the subsequent redo phase, the DBMS processes the log forward from the earliest log record that needs to be redone. Some of these log records could be from transactions that were active at the time of failure as identified by the analysis phase.

LSN   WRITE AHEAD LOG
1     BEGIN CHECKPOINT
2     END CHECKPOINT (EMPTY ATT)
3     TXN 1: INSERT TUPLE 100 (NEW: X)
4     TXN 2: UPDATE TUPLE 2 (NEW: Y')
…     …
22    TXN 20: DELETE TUPLE 20
23    TXN 1, 3, …, 20: COMMIT
24    TXN 2: UPDATE TUPLE 100 (NEW: X')
25    TXN 21: UPDATE TUPLE 21 (NEW: Z')
…     …
84    TXN 80: DELETE TUPLE 80
85    TXN 2, 21, …, 79: COMMIT
86    TXN 81: UPDATE TUPLE 100 (NEW: X'')
      SYSTEM FAILURE

Figure .: WAL Example – Contents of the WAL during recovery.

During the final undo phase, the DBMS rolls back uncommitted transactions (i.e., transactions that were active at the time of failure) using the information recorded in the log. This recovery algorithm is simplified for the MVCC DBMS. During the redo phase, the DBMS skips replaying the log records associated with uncommitted transactions. This obviates the need for an undo phase.
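The simplified recovery procedure for an MVCC DBMS can be sketched as a single redo pass that skips records from uncommitted transactions. The record layout and helper names are illustrative; the set of committed transactions is assumed to come from the analysis phase.

// Sketch of the MVCC-simplified redo pass: replay only committed transactions.
#include <cstdint>
#include <unordered_set>
#include <vector>

struct RedoRecord {
  uint64_t txn_id;
  // ... location and after-image fields as in the WAL record sketch above ...
};

void ApplyAfterImage(const RedoRecord &rec);  // reinstalls the tuple version (definition omitted)

void RedoPass(const std::vector<RedoRecord> &log_suffix,
              const std::unordered_set<uint64_t> &committed_txns) {
  for (const RedoRecord &rec : log_suffix) {
    if (committed_txns.count(rec.txn_id) == 0) {
      continue;  // record belongs to an uncommitted transaction: skip it
    }
    ApplyAfterImage(rec);  // restore the change made by a committed transaction
  }
}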

Figure . shows the contents of the log after a system failure. The records contain the after- images of the tuples modied by the transactions. At the time of system failure, only transactions 80 and 81 are uncommitted. During recovery, the DBMS rst loads the latest checkpoint that contains an empty ATT. It then analyzes the log to identify which transactions must be redone and which are uncommitted. During the redo phase, it reapplies the changes made by transactions committed since the latest checkpoint. It skips the records associated with the uncommitted transactions 80 and 81. After recovery, the DBMS can start executing new transactions.

Correctness: For active transactions, the DBMS maintains the before-images of the tuples they modified. This is sufficient to reverse the changes of any transaction that aborts. The DBMS ensures that the log records associated with a transaction are forced to durable storage before it is committed. To handle system failures during recovery, the DBMS allows for repeated undo operations. This is feasible because it maintains the undo information as before-images and not in the form of compensation log records [, ].

Although WAL supports efficient transaction processing when memory is volatile and durable storage cannot support fast random writes, it is inefficient for NVM storage []. Consider a transaction that inserts a tuple into a table. The DBMS first records the tuple's contents in the log, and it later propagates the change to the database. With NVM, the logging algorithm can avoid this unnecessary data duplication. We now describe the design of such an algorithm geared towards a DBMS running on a hybrid storage hierarchy comprising DRAM and NVM.

WAL
Runtime Operation:
• Execute the operation.
• Write changes to table heap on DRAM.
• Construct a log record based on operation (contains after-image of tuple).
• Append log record to log entry buffer.
Commit Processing:
• Collect log entries from log entry buffers.
• Sync the collected entries on durable storage.
• Mark all the transactions as committed.
• Inform workers about group commit.
Checkpointing:
• Construct checkpoint containing after-images of visible tuples.
• Write out transactionally consistent checkpoint to durable storage.
• Truncate unnecessary log records.

WBL
Runtime Operation:
• Execute the operation.
• Write changes to table heap on DRAM.
• Add an entry to the DTT for that modification (does not contain after-image of tuple).
Commit Processing:
• Determine dirty tuples using the DTT.
• Compute cp and cd for this group commit.
• Sync dirty blocks to durable storage.
• Sync a log entry containing cp and cd.
• Inform workers about group commit.
Checkpointing:
• Construct a checkpoint containing only the active commit identifier gaps (no after-images).
• Write out transactionally consistent checkpoint to durable storage.
• Truncate unnecessary log records.

Table .: Operations Performed by DBMS – An overview of the steps performed by the DBMS during its runtime operation, commit processing, and checkpointing.

. Write-Behind Logging

Write-behind logging (WBL) leverages fast, byte-addressable NVM to reduce the amount of data that the DBMS records in the log when a transaction modifies the database. The reason why NVM enables a better logging protocol than WAL is three-fold. Foremost, the write throughput of NVM is more than an order of magnitude higher than that of an SSD or HDD. Second, the gap between sequential and random write throughput of NVM is smaller than that of older storage technologies. Finally, individual bytes in NVM can be accessed by the processor, and hence there is no need to organize tuples into pages or go through the I/O subsystem.

WBL reduces data duplication by flushing changes to the database in NVM during regular transaction processing. For example, when a transaction inserts a tuple into a table, the DBMS records the tuple's contents in the database before it writes any associated meta-data in the log. Thus, the log is always (slightly) behind the contents of the database, but the DBMS can still restore it to the correct and consistent state after a restart.

We begin this section with an overview of the runtime operations performed by a WBL-based DBMS. We then present its commit protocol and recovery algorithm. Table . provides a summary of the steps during runtime, recovery, and checkpointing. Although our description of WBL is for MVCC DBMSs, we also discuss how to adapt the protocol for a single-version system.

Figure .: Structure of WBL Record – Structure of the log record constructed by the DBMS while using the WBL protocol. Fields: Checksum, LSN, Type, Persisted Commit Timestamp (cp), Dirty Commit Timestamp (cd).

.. Runtime Operation

WBL differs from WAL in many ways. Foremost is that the DBMS does not construct log records that contain tuple modifications at runtime. This is because the changes made by transactions are guaranteed to be already present on durable storage before they commit. As transactions update the database, the DBMS inserts entries into a dirty tuple table (DTT) to track their changes. Each entry in the DTT contains the transaction's identifier, the table modified, and additional meta-data based on the operation associated with the change. For INSERT and DELETE, the entry only contains the location of the inserted or deleted tuple, respectively. Since UPDATEs are executed as a DELETE followed by an INSERT in MVCC, the entry contains the location of the new and old version of the tuple. DTT entries never contain the after-images of tuples and are removed when their corresponding transaction commits. As in the case of WAL, the DBMS uses this information to ensure failure atomicity. But unlike in disk-oriented WAL, the DTT is never written to NVM. The DBMS only maintains the DTT in memory while using WBL.
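A DTT entry as described above could look like the following sketch (types and names are illustrative). Note that it records only locations, never after-images, and the whole table lives in DRAM.

// Sketch of a dirty tuple table (DTT) entry kept in volatile memory.
#include <cstdint>
#include <string>

enum class DttOperation { INSERT, UPDATE, DELETE };

struct DttEntry {
  uint64_t txn_id;        // transaction that made the change
  std::string table;      // table that was modified
  DttOperation op;        // operation associated with the change
  uint64_t new_location;  // inserted tuple / new version (INSERT, UPDATE)
  uint64_t old_location;  // deleted tuple / old version (DELETE, UPDATE)
};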

.. Commit Protocol

Relaxing the ordering of writes to durable storage complicates WBL's commit and recovery protocols. When the DBMS restarts after a failure, it needs to locate the modifications made by transactions that were active at the time of failure so that it can undo them. But these changes can reach durable storage even before the DBMS records the associated meta-data in the log. This is because the DBMS is unable to prevent the CPU from evicting data from its volatile caches to NVM. Consequently, the recovery algorithm must scan the entire database to identify the dirty modifications, which is prohibitively expensive and increases the recovery time.
The DBMS avoids this problem by recording meta-data about the clean and dirty modifications that have been made to the database by tracking two commit timestamps in the log. First, it records the timestamp of the latest committed transaction all of whose changes and updates of prior transactions are safely persisted on durable storage (cp). Second, it records the commit timestamp (cd, where cp < cd) that the DBMS promises to not assign to any transaction before the subsequent group commit finishes. This ensures that any dirty modifications that were flushed to durable storage will have only been made by transactions whose commit timestamp is earlier than cd.
While recovering from a failure, the DBMS considers all the transactions with commit timestamps earlier than cp as committed, and ignores the changes of the transactions whose commit timestamp is later than cp and earlier than cd. In other words, if a tuple's begin timestamp falls within the (cp, cd) pair, then the DBMS's transaction manager ensures that it is not visible to any transaction that is executed after recovery.
When committing a group of transactions, as shown in Table ., the DBMS examines the DTT entries to determine the dirty modifications. For each change recorded in the DTT, the DBMS persists the change to the table heap using the device's sync primitive (see Appendix A.).

Figure .: WBL Commit Protocol – The ordering of writes from the DBMS to durable storage while employing the WBL protocol. Components shown: the table heap in volatile storage, and the table heap and log on durable storage.

It then constructs a log entry containing cp and cd to record that any transaction with a commit timestamp earlier than cp has committed, and to indicate that it will not issue a commit timestamp later than cd for any of the subsequent transactions before the next group commit. It appends this commit record (see Figure .) to the log. The DBMS flushes the modifications of all the transactions with commit timestamps less than cp before recording cp in the log. Otherwise, it cannot guarantee that those transactions have been committed upon restart.
For long-running transactions that span a group commit window, the DBMS also records their commit timestamps in the log. Without this information, the DBMS cannot increment cp before the transaction commits. During recovery, it uses this information to identify the changes made by uncommitted transactions.
With WBL, the DBMS writes out the changes to locations spread across the durable storage device. For example, if a transaction updates tuples stored in two tables, then the DBMS must flush the updates to two locations in the durable storage device. This design works well for NVM as it supports fast random writes. But it is not a good choice for slower devices that incur expensive seeks to handle random writes.
To abort a transaction, the DBMS uses the information recorded in the DTT to determine the changes made by the transaction. It then discards those changes and reclaims their table heap storage.
The diagram in Figure . shows WBL's ordering of writes from the DBMS to durable storage. The DBMS first applies the changes on the table heap residing in volatile storage. But unlike WAL, when a transaction commits, the DBMS flushes all of its modifications to the durable table heap and indexes. Subsequently, the DBMS appends a record containing cp and cd to the log.
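Putting the pieces together, WBL's group commit can be sketched as follows. All helper functions are assumptions for illustration; the key point is the ordering: dirty tuples are synced to the NVM table heap before the (cp, cd) record is made durable.

// Sketch of a WBL group commit step.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct WblLogEntry {
  uint64_t persisted_cts;  // cp: all transactions up to this timestamp are durable
  uint64_t dirty_cts;      // cd: no commit timestamp beyond this will be issued
};                         //     before the next group commit

// Persist a range of NVM-resident bytes (e.g., the flush + fence sequence
// sketched earlier). Definition omitted.
void SyncRange(const void *addr, size_t len);

// Durably append a (cp, cd) record to the write-behind log. Definition omitted.
void AppendAndSyncLogEntry(const WblLogEntry &entry);

// `dirty_tuples` holds the NVM address and size of every tuple listed in the
// DTT for this group commit.
void WblGroupCommit(const std::vector<std::pair<const void *, size_t>> &dirty_tuples,
                    uint64_t cp, uint64_t cd) {
  // 1. Persist every dirty change to the table heap on NVM.
  for (const auto &tuple : dirty_tuples) {
    SyncRange(tuple.first, tuple.second);
  }
  // 2. Only after the data is durable, record the commit timestamps.
  AppendAndSyncLogEntry(WblLogEntry{cp, cd});
}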

.. Recovery Protocol

Before describing WBL's recovery algorithm, we first introduce the notion of a commit timestamp gap. A commit timestamp gap refers to the range of timestamps defined by the pair (cp, cd). The DBMS must ignore the effects of transactions that fall within such a gap while determining tuple visibility. This is equivalent to undoing the effects of any transaction that was active at the time of failure. The set of commit timestamp gaps that the DBMS needs to track increases on every system failure. To limit the amount of work performed while determining the visibility of tuples, the DBMS's garbage collector thread periodically scans the database to undo the dirty modifications associated with the currently present gaps. Once all the modifications in a gap have been removed by the garbage collector, the DBMS stops checking for the gap in tuple visibility checks and no longer records it in the log.
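The gap-aware visibility test implied by this design can be sketched as follows. The open-interval treatment of (cp, cd) follows the description above; the names and types are illustrative.

// Sketch of tuple visibility under WBL with commit timestamp gaps.
#include <cstdint>
#include <utility>
#include <vector>

using Timestamp = uint64_t;
using TimestampGap = std::pair<Timestamp, Timestamp>;  // (cp, cd)

inline bool InGap(Timestamp ts, const std::vector<TimestampGap> &gaps) {
  for (const TimestampGap &gap : gaps) {
    if (ts > gap.first && ts < gap.second) return true;  // later than cp, earlier than cd
  }
  return false;
}

inline bool IsVisibleWbl(Timestamp begin_cts, Timestamp end_cts, Timestamp read_ts,
                         const std::vector<TimestampGap> &gaps) {
  if (InGap(begin_cts, gaps)) return false;             // dirty version from a failed epoch
  return begin_cts <= read_ts && read_ts < end_cts;     // regular MVCC visibility
}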

Figure .: WBL Commit Timestamp Gaps – An illustration of successive system failures resulting in multiple commit timestamp gaps. The effects of transactions in those gaps are eventually undone by the garbage collector. The timeline shows a sequence of group commits in which the tracked gap set grows to { (101, 199) } after the first failure, then to { (101, 199), (301, 399) } after the second, and returns to the empty set once garbage collection completes.

Figure .: WBL Recovery Protocol – The phases of the recovery protocol. WBL has only an analysis phase, shown over the log timeline (checkpoint, end of log).

The example in Figure . depicts a scenario where successive failures result in multiple commit timestamp gaps. At the end of the first group commit operation, there are no such gaps and the current commit timestamp is 101. The DBMS promises to not issue a commit timestamp higher than 199 in the time interval before the second commit. When the DBMS restarts after a system failure, it adds (101, 199) to its set of gaps. The garbage collector then starts cleaning up the effects of transactions that fall within this gap. Before it completes the scan, there is another system failure. The system then also adds (301, 399) to its gap set. Finally, when the garbage collector finishes cleaning up the effects of transactions that fall within these two gaps, it empties the set of gaps that the DBMS must check while determining the visibility of tuples.
With WBL, the DBMS does not need to periodically construct WAL-style physical checkpoints to speed up recovery. This is because each WBL log record contains all the information needed for recovery: the list of commit timestamp gaps and the commit timestamps of long-running transactions that span across a group commit operation. The DBMS only needs to retrieve this information during the analysis phase of the recovery process. It can safely remove all the log records located before the most recent log record. This ensures that the log's size is always bounded.
As shown in Figure ., the WBL recovery protocol only contains an analysis phase. During this phase, the DBMS scans the log backward until the most recent log record to determine the currently existing commit timestamp gaps and timestamps of long-running transactions. There is no need for a redo phase because all the modifications of committed transactions are already present in the database. WBL also does not require a WAL-style undo phase. Instead, the DBMS uses the information in the log to ignore the effects of uncommitted transactions.
Figure . shows the contents of the log after a system failure. This example is based on the same workload used in Figure .. We note that transactions 2 and 80 span across a group commit operation. At the time of system failure, only transactions 80 and 81 are uncommitted.

LSN   WRITE BEHIND LOG
1     BEGIN CHECKPOINT
2     END CHECKPOINT (EMPTY CTG)
3     { (1, 100) }
4     { 2, (21, 120) }
5     { 80, (81, 180) }
      SYSTEM FAILURE

Figure .: WBL Example – Contents of the WBL during recovery.

During recovery, the DBMS loads the latest log record to determine the currently existing commit timestamp gaps and timestamps of long-running transactions. After this brief analysis phase, it can immediately start handling transactions again.

Correctness: When a transaction modifies the database, the DBMS only writes those changes to DRAM. Then when that transaction commits, the DBMS persists its changes to the table heap on durable storage. This prevents the system from losing the effects of any committed transaction, thereby ensuring the durability property. It ensures atomicity by tracking the uncommitted transactions using commit timestamp gaps. WBL allows repeated undo operations as it maintains logical undo information about uncommitted transactions.

Single-Versioned System: In a single-versioned DBMS with WBL, the system makes a copy of a tuple's before-image before updating it and propagating the new version to the database. This is necessary to support transaction rollbacks and to avoid torn writes. The DBMS stores the before-images in the table heap on durable storage. The DBMS's recovery process then only consists of an analysis phase; a redo phase is not needed because the modifications for all committed transactions are already present in the database. The DBMS, however, must roll back the changes made by uncommitted transactions using the before-images. As this undo process is done on demand, the DBMS starts handling transactions immediately after the analysis phase. Similar to the multi-versioned case, the DBMS uses the commit timestamps to determine the visibility of tuples and identify the effects of uncommitted transactions.

. Replication

With both the WAL and WBL protocols described above, the DBMS can recover from system and transaction failures. These protocols, however, are not able to handle media failures or corrupted data. This is because they rely on the integrity of durable data structures (e.g., the log) during recovery. These failures are instead overcome through replication, wherein the DBMS propagates changes made by transactions to multiple servers. When the primary server incurs a media failure, replication ensures that there is no data loss since the secondary servers can be configured to maintain a transactionally consistent copy of the database.
But replicating a database in a WBL-based DBMS is different than in a WAL DBMS. With WAL, the DBMS sends the same log records that it stores on its durable storage device over the network. The secondary server then applies them to its local copy of the database.

Figure .: Replication – The steps taken by the DBMS during replication between the primary server and the secondary server (volatile table heap and durable log on each server).

But since WBL's log records only contain timestamps and not the actual data (e.g., after-images), the DBMS has to perform extra steps to make WBL compatible with a replication protocol. We now describe the different replication schemes for a primary-secondary configuration. We later present how a DBMS transforms WBL log records to work with these schemes.

.. Replication Schemes

There are two schemes for replicating a database in a primary-secondary configuration that each provide different consistency guarantees: () synchronous, and () asynchronous. Figure . presents the steps executed by the DBMS during replication. With synchronous replication, the primary server sends log records and waits for acknowledgments from the secondary servers that they have flushed the changes to durable storage (steps 1–4 are on the transaction's critical path). In asynchronous replication, the primary server does not wait for acknowledgments from any of the secondary servers (steps 1–2).

.. Record Format

The primary server in a WBL-based system cannot simply send its log records to the secondary servers because they do not contain the after-images of the modified tuples. Thus, to support replication, the DBMS must construct additional WAL records that contain the physical modifications to the database and send them to the secondary servers. As we show in Section .., this additional step adds minimal computational overhead since replication is bound by network communication costs.
We now describe the failover procedure in the secondary server when the primary server goes down. By design, the DBMS only transfers the changes associated with the committed transactions to the secondary servers. Consequently, there is no need for an undo process on the secondary servers on a failover. After a failover, the secondary server can immediately start handling transactions on a transactionally consistent database. But if the DBMS uses asynchronous replication, then the effects of recently committed transactions might not be present in the secondary server.
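One way the primary could construct such a WAL-style replication record is sketched below; all names are illustrative assumptions. The after-image is copied out of the durable table heap, since the local WBL record does not carry it.

// Sketch of building a WAL-style record for replication from a WBL primary.
#include <cstdint>
#include <string>

struct ReplicationRecord {
  uint64_t commit_timestamp;  // commit timestamp of the transaction
  std::string table;          // table that was modified
  uint64_t location;          // location of the modified tuple
  std::string after_image;    // new values copied from the durable table heap
};

// Reads the tuple version at `location` from the table heap. Definition omitted.
std::string CopyAfterImage(const std::string &table, uint64_t location);

ReplicationRecord BuildReplicationRecord(uint64_t commit_ts, const std::string &table,
                                         uint64_t location) {
  return ReplicationRecord{commit_ts, table, location,
                           CopyAfterImage(table, location)};
}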

. Experimental Evaluation

We now present our analysis of the logging protocols. We implemented both WAL and WBL in Peloton, an in-memory HTAP DBMS that supports NVM [].

We compare the DBMS's runtime performance, recovery times, and storage footprint for two OLTP workloads. We then analyze the effect of using WBL in a replicated system. Next, we compare WBL against an instant recovery protocol based on WAL [, ]. Finally, we examine the impact of storage latency, group commit latency, and new CPU instructions for NVM on the system's performance.
We performed these experiments on the hardware emulator described in Appendix A.. We configured the NVM latency to be × that of DRAM and validated these settings using Intel's memory latency checker []. The emulator also includes two additional storage devices:
● HDD: Seagate Barracuda ( TB,  RPM, SATA .)
● SSD: Intel DC S ( GB, SATA .)
We modified Peloton to use the emulator's allocator and filesystem interfaces to store its logs, checkpoints, and table heap on NVM (see Appendix A.). When employing WAL, the DBMS maintains the log and the checkpoints on the filesystem, and uses fsync to ensure durability. When it adopts WBL, the DBMS uses the allocator for managing the durable table heap and indexes. Internally, it stores indexes in persistent B+trees [, ]. It relies on the allocator's sync primitive to ensure database durability. All the transactions execute with the same snapshot isolation level and durability guarantees. To evaluate replication, we use a second emulator with the same hardware that is connected via  Gb Ethernet with  µs latency.

.. Benchmarks

We use the YCSB and TPC-C benchmarks for our evaluation. Appendix B presents a detailed description of these benchmarks. For the YCSB benchmark, we use a database with  million tuples (∼ GB). We configure the TPC-C workload to contain eight warehouses and , items. We map each warehouse to a single partition. The initial storage footprint of the TPC-C database is approximately  GB.

.. Runtime Performance

We begin with an analysis of the recovery protocols' impact on the DBMS's runtime performance. To obtain insights that are applicable to different storage technologies, we run the YCSB and TPC-C benchmarks in Peloton while using either WAL or WBL. For each configuration, we scale up the number of worker threads that the DBMS uses to process transactions. The clients issue requests in a closed loop. We execute all the workloads three times under each setting and report the average throughput and latency. To provide a fair comparison, we disable checkpointing in the WAL-based configurations, since it is up to the administrator to configure the checkpointing frequency. We note that throughput drops by –% in WAL when the system takes a checkpoint.

YCSB: We first consider the read-heavy workload results shown in Figure .a. These results provide an approximate upper bound on the DBMS's performance because % of the transactions do not modify the database and therefore the system does not have to construct many log records. The most notable observation from this experiment is that while the DBMS's throughput with the SSD-WAL configuration is .× higher than that with the SSD-WBL configuration, its performance with the NVM-WBL configuration is comparable to that obtained with the NVM-WAL configuration. This is because NVM supports fast random writes unlike HDD.

Figure .: YCSB Throughput – The throughput of the DBMS for the YCSB benchmark with different logging protocols and durable storage devices (NVM-WBL, SSD-WBL, HDD-WBL, NVM-WAL, SSD-WAL, HDD-WAL). Panels: (a) Read-Heavy Workload, (b) Balanced Workload, (c) Write-Heavy Workload. Axes: throughput (txn/sec) versus number of worker threads.

The NVM-based configurations deliver .–.× higher throughput than the SSD-based configurations. This is because of the ability of NVM to support faster reads than SSD. The gap between the performance of the NVM-WBL and the NVM-WAL configurations is not prominent on this workload as most transactions only perform reads. The throughput of all the configurations increases with the number of worker threads as the increased concurrency helps amortize the logging overhead. While the WAL-based DBMS runs well for all the storage devices on a read-intensive workload, the WBL-based DBMS delivers lower performance while running on the HDD and SSD due to their slower random writes.
The benefits of WBL are more prominent for the balanced and write-heavy workloads presented in Figures .b and .c. We observe that the NVM-WBL configuration delivers .–.× higher throughput than the NVM-WAL configuration because of its lower logging overhead. That is, under WBL the DBMS does not construct as many log records as it does with WAL and therefore it writes less data to durable storage. The performance gap between the NVM-based and SSD-based configurations also increases on write-intensive workloads. With the read-heavy workload, the NVM-WBL configuration delivers only .× higher throughput than the SSD-WBL configuration. But on the balanced and write-heavy workloads, NVM-WBL provides .–.× higher throughput.
The transactions' average response time is presented in Figure .. As expected, the HDD-based configurations incur the highest latency across all workloads, especially for WBL. For example, on the write-heavy workload, the average latency of the HDD-WBL configuration is .× higher than the HDD-WAL configuration. This is because the random seek time of HDDs constrains the DBMS performance.

Figure .: YCSB Latency – The latency of the DBMS for the YCSB benchmark with different logging protocols and durable storage devices (NVM-WBL, SSD-WBL, HDD-WBL, NVM-WAL, SSD-WAL, HDD-WAL). Panels: (a) Read-Heavy Workload, (b) Balanced Workload, (c) Write-Heavy Workload. Axes: latency (ms) versus number of worker threads.

Figure .: TPC-C Throughput – The measured throughput for the TPC-C benchmark with different logging protocols and durable storage devices (NVM-WBL, SSD-WBL, HDD-WBL, NVM-WAL, SSD-WAL, HDD-WAL). Axes: throughput versus number of worker threads.

The SSD-based configurations have up to two orders of magnitude lower transaction latency compared to the HDD configurations because of their better write performance. On the write-heavy workload shown in Figure .c, the transaction latency of the NVM-WBL configuration is % and % lower than that observed with NVM-WAL and SSD-WAL, respectively. We attribute this to WAL's higher overhead and the higher write latency of SSD.

TPC-C: Figures . and . show the throughput and latency of the DBMS while executing TPC-C with a varying number of worker threads. As with YCSB, the DBMS achieves the highest throughput and the lowest latency using the NVM-WBL configuration. The NVM-WAL and SSD-WAL configurations provide .× and .× lower throughput compared to NVM-WBL. We attribute this to the large number of writes performed per transaction in TPC-C. We observe that the performance obtained across all configurations on the TPC-C benchmark is lower than that on the YCSB benchmark. This is because the transactions in TPC-C contain more complex program logic and execute more queries per transaction.

Figure .: TPC-C Latency – The latency of the DBMS for the TPC-C benchmark with different logging protocols and durable storage devices (NVM-WBL, SSD-WBL, HDD-WBL, NVM-WAL, SSD-WAL, HDD-WAL). Axes: latency (ms) versus number of worker threads.

Figure .: Recovery Time – The time taken by the DBMS to restore the database to a consistent state after a restart with different logging protocols. Panels: (a) YCSB, (b) TPC-C. Axes: recovery time (s) versus number of transactions (K).

.. Recovery Time

We evaluate the recovery time of the DBMS using the different logging protocols and storage devices. For each benchmark, we first execute a fixed number of transactions and then force a hard shutdown of the DBMS (SIGKILL). We then measure the amount of time for the system to restore the database to a consistent state. That is, a state where the effects of all committed transactions are durable and the effects of uncommitted transactions are removed. Recall from Section . that the number of transactions that the DBMS processes after restart in WAL depends on the frequency of checkpointing. With WBL, the DBMS performs garbage collection to clean up the dirty effects of uncommitted transactions at the time of failure. This garbage collection step is done asynchronously and does not have a significant impact on the throughput of the DBMS.

YCSB: The results in Figure .a present the recovery measurements for the YCSB benchmark. The recovery times of the WAL-based configurations grow linearly in proportion to the number of transactions that the DBMS recovers. This is because the DBMS needs to replay the log to restore the effects of committed transactions. In contrast, with WBL, we observe that the recovery time is independent of the number of transactions executed. The system only reverses the effects of transactions that were active at the time of failure as the changes made by all the transactions committed after the last checkpoint are already persisted.

Figure .: Storage Footprint – The storage space occupied by the internal components of the DBMS (table heap, index, log, checkpoint) while using different recovery protocols, measured on DRAM and NVM for the NVM-WBL and NVM-WAL configurations. Panels: (a) YCSB Storage Footprint, (b) TPC-C Storage Footprint. Axis: storage (GB).

The WBL-based configurations, therefore, have a short recovery.

TPC-C: The results for the TPC-C benchmark in Figure .b show that the recovery time of the WAL-based configurations is higher than that in the YCSB benchmark. This is because the TPC-C transactions perform more operations, and consequently require a longer redo phase. The recovery time of the WBL-based configurations, however, is still independent of the number of transactions executed unlike their WAL counterparts because they ensure that the effects of committed transactions are persisted immediately on durable storage.

.. Storage Footprint

We compare the storage utilization of the DBMS using either the WAL or WBL protocols while running on NVM. This metric is important because we expect that the first NVM products will initially be more expensive than current technologies [], and thus using less storage means a lower procurement cost.
We measure Peloton's storage footprint as the amount of space that it uses in either DRAM or NVM to store tables, logs, indexes, and checkpoints. We periodically collect statistics from the DBMS's storage manager and the filesystem meta-data during the workload execution. We perform these measurements after loading the initial database and report the peak storage footprint of the DBMS for each trial. For all of the configurations, we allow the DBMS's background processes (e.g., group commit, checkpointing, garbage collection) to execute while we collect these measurements.

YCSB: We use the balanced workload mixture for this experiment with an initial database size of  GB. The results in Figure .a show that the WAL-based configuration has a larger storage footprint than WBL. This is because WAL constructs log records that contain the physical changes associated with the modified tuples. In contrast, as described in Section .., WBL's log records do not contain this information. Another important difference is that while the WAL-based DBMS periodically constructs transactionally-consistent checkpoints of the database, WBL only requires the DBMS to write log records that contain the list of currently existing commit identifier gaps. As such, its logical checkpoints have a smaller storage footprint than WAL's physical checkpoints. Unlike WAL, WBL persists the indexes on durable storage to avoid rebuilding them during recovery.

[Figure .: Replication – The throughput of the DBMS for the YCSB benchmark with different replication schemes and logging protocols. (a) Read-Heavy Workload (b) Balanced Workload (c) Write-Heavy Workload]

The WBL-based DBMS consumes % less storage space on NVM than its WAL counterpart.

TPC-C: The graph in Figure .b shows the storage footprint of the engines while executing TPC-C. For this benchmark, the initial size of the database is  GB and it grows to . GB. Transactions inserting new orders increase the size of the table heap, log, and checkpoints in the WAL-based configuration. By reducing unnecessary data duplication using NVM's persistence property, the NVM-WBL configuration has a % smaller storage footprint on NVM. The space savings are more significant in this benchmark because the workload is write-intensive with longer running transactions. Thus, the log in the WAL-based configuration grows more quickly compared to the smaller undo log in WBL.

.. Replication
We now examine the impact of replication on the runtime performance of the DBMS while running the YCSB benchmark and using the NVM-based configurations. The results shown in Figure . indicate that the synchronous replication scheme reduces the throughput. On the read-heavy workload, the throughput drops by .× with both the NVM-WAL and NVM-WBL configurations. This shows that the overhead of constructing WAL-style log records when using WBL is lower than the overhead of sending the log records over the network. Under the asynchronous replication scheme, the DBMS's throughput drops by less than .× across all the workloads. The DBMS should, therefore, be configured to use this replication scheme when the user can afford to lose the effects of some recently committed transactions on a media failure.

The impact of replication is more prominent in the write-heavy workload shown in Figure .c. We observe that the throughput of the DBMS drops by .× when it performs synchronous replication. This is because the round-trip latency between the primary and secondary server ( µs) is higher than the durable write latency (. µs) of NVM. The networking cost is, thus, the primary performance bottleneck in replication. We conclude that a faster replication standard, such as NVMe over Fabrics [], is required for efficient transaction processing in a replicated environment containing NVM []. With this technology, the additional latency between a local and remote NVM device is expected to be less than a few microseconds. As every write to NVM must be replicated in most datacenter usage models, we expect WBL to outperform WAL in this replicated environment because it executes fewer NVM writes. We plan to investigate this in future work.

[Figure .: Impact of NVM Latency – The throughput for the YCSB benchmark with different logging protocols and NVM latency settings. (a) Read-Heavy Workload (b) Balanced Workload (c) Write-Heavy Workload]

.. Impact of NVM Latency
In this experiment, we analyze how NVM latency affects the runtime performance of the WBL and WAL protocols in the DBMS. We ran YCSB under three latency configurations for the emulator: () the default DRAM latency ( ns), () a low latency that is × slower than DRAM ( ns), and () a high latency that is × slower than DRAM ( ns). We use the emulator's throttling mechanism to reduce the NVM bandwidth to be × lower (. GB/s) than DRAM. The key observation from the results in Figure . is that the NVM-WAL configuration is more sensitive to NVM latency than NVM-WBL. On the write-heavy workload shown in Figure .c, with a × increase in NVM latency, the throughput of NVM-WAL drops by .×, whereas NVM-WBL only drops by .×. This is because the DBMS performs fewer stores to NVM with WBL. We observe that NVM latency has a higher impact on performance for write-intensive workloads. On the read-heavy workload shown in Figure .a, the throughput of the DBMS only drops by .–.× with a × increase in latency. We attribute this to the effects of caching and memory-level parallelism.

.. NVM Instruction Set Extensions
We next measure the impact of proposed NVM-related instruction set extensions on the DBMS's performance with the NVM-WBL configuration []. We examine the impact of using the CLWB instruction for flushing cache lines instead of the CLFLUSH instruction. The CLWB instruction reduces the possibility of compulsory cache misses during subsequent data accesses.

[Figure .: NVM Instruction Set Extensions (CLFLUSH vs. CLWB) – The throughput of the DBMS for the YCSB benchmark under the NVM-based configurations with different flush instructions. (a) Read-Heavy Workload (b) Balanced Workload (c) Write-Heavy Workload]

Figure . presents the throughput of the DBMS with the NVM-WBL conguration while us- ing either the CLWB or CLFLUSH instructions in its sync primitive. The throughput obtained with the NVM-WAL conguration, that does not use the sync primitive, is provided for comparison. We observe that the throughput under the NVM-WBL conguration exceeds that obtained with NVM-WAL when the DBMS uses the CLWB instruction. We attribute this to the effects of caching. The impact of the CLWB instruction is more prominent on the write-intensive workloads, where the WBL-based DBMS delivers .× higher throughput when using the CLWB instruction instead of the CLFLUSH instruction. Thus, an efficient cache ushing primitive is critical for a high-performance NVM-aware DBMS.

.. Instant Recovery Protocol
We now compare WBL against an instant recovery protocol based on WAL [, ]. This protocol uses on-demand single-tuple redo and single-transaction undo mechanisms to support almost instantaneous recovery from system failures. While processing transactions, the DBMS reconstructs the desired version of the tuple on demand using the information in the write-ahead log. The DBMS can, therefore, start handling new transactions almost immediately after a system failure. The downside is that, while recovery is not yet complete, the DBMS's performance is lower than that observed after traditional ARIES-style recovery.

Unlike the WAL-based instant recovery protocol, WBL relies on NVM's ability to support fast random writes and does not contain a redo process. To better understand the impact of the instant recovery protocol on the performance of the DBMS, we implemented it in our DBMS. We run the read-heavy YCSB workload on the DBMS while varying the fraction of the tuples in the table that must be reconstructed from 0.001 to 0.1. With more frequent checkpointing, a smaller fraction of tuples would need to be reconstructed. We configure the length of a tuple's log record chain to follow a uniform distribution over the following ranges: (, ) and (, ). The results shown in Figures .a and .b indicate that the performance drops with longer log record chains, especially when a larger fraction of tuples need to be reconstructed. When the maximum length of a log record chain is limited to  records, the throughput drops by .× when the DBMS needs to reconstruct % of the tuples in comparison to the throughput observed after recovery.
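The following is a hypothetical sketch of the on-demand redo step described above, in which a tuple's latest committed version is recovered by walking its chain of log records; the LogRecord layout and the ReconstructTuple function are illustrative assumptions, not the protocol's actual data structures.

#include <cstdint>
#include <vector>

// Hypothetical log record for on-demand redo: each record stores the tuple's
// after-image and a pointer to the previous record for the same tuple.
struct LogRecord {
  uint64_t commit_id;               // commit identifier of the writing transaction
  std::vector<uint8_t> after_image; // tuple contents written by that transaction
  const LogRecord *prev;            // older record in this tuple's chain (nullptr at end)
};

// Walk the chain backwards until a record from a committed transaction is
// found. Longer chains mean more work per access, which is why throughput
// drops as the maximum chain length grows.
std::vector<uint8_t> ReconstructTuple(const LogRecord *newest,
                                      uint64_t last_committed_id) {
  for (const LogRecord *rec = newest; rec != nullptr; rec = rec->prev) {
    if (rec->commit_id <= last_committed_id) {
      return rec->after_image;      // first durable committed version
    }
  }
  return {};                        // no committed version; tuple did not exist
}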

[Figure .: Instant Recovery Protocol – The throughput of the DBMS for YCSB with the instant logging protocol on different storage devices. (a) Chain length ∈ (, ) (b) Chain length ∈ (, )]

[Figure .: Group Commit Latency – Impact of the group commit latency setting on the throughput of the DBMS for the write-heavy YCSB workload with different logging protocols and durable storage devices.]

In contrast, when the length is limited to  records, the throughput drops by ×. After the recovery process is complete, the performance of the DBMS converges to that observed after traditional recovery. We conclude that the instant recovery protocol works well when the DBMS runs on a slower durable storage device. However, on a fast NVM device, WBL allows the DBMS to deliver high performance immediately after recovery. Unlike WAL, as we showed in Section .., it also improves device utilization by reducing data duplication.

.. Impact of Group Commit Latency

In this experiment, we analyze how the group commit latency affects the runtime performance of the WBL and WAL protocols in the DBMS. As the DBMS sends the results back to the client only after completing the group commit operation, this parameter affects the latency of a transaction. We run the write-heavy YCSB workload under different group commit latency settings ranging from 10 to 10000 µs. The most notable observation from the results in Figure . is that different group commit latency settings work well for different durable storage devices. Setting the group commit latency to , , and  µs works well for NVM, SSD, and HDD, respectively. We observe that there is a two orders of magnitude gap between the optimal group commit latency settings for NVM and HDD. The impact of this parameter is more pronounced in the case of NVM compared to the slower durable storage devices. When the group commit latency of the DBMS running on NVM is increased from  to  µs, the throughput drops by ×.
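For illustration, the following sketch shows a group-commit loop whose flush interval is the parameter varied in this experiment; the GroupCommitter class, its epoch-based bookkeeping, and the FlushToStorage stub are simplifying assumptions rather than Peloton's log manager.

#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Minimal group-commit sketch: committing transactions wait until the epoch
// they joined has been flushed; a background thread flushes once per interval.
class GroupCommitter {
 public:
  explicit GroupCommitter(std::chrono::microseconds interval)
      : interval_(interval), flushed_epoch_(0), current_epoch_(1) {}

  // Called by a transaction at commit time: returns once its epoch is durable.
  void CommitAndWait() {
    std::unique_lock<std::mutex> lock(mutex_);
    uint64_t my_epoch = current_epoch_;
    cv_.wait(lock, [&] { return flushed_epoch_ >= my_epoch; });
  }

  // Background loop: a longer interval batches more work per flush (better for
  // HDD/SSD), while a shorter interval keeps commit latency low (better for NVM).
  void Run() {
    for (;;) {
      std::this_thread::sleep_for(interval_);
      std::lock_guard<std::mutex> lock(mutex_);
      FlushToStorage();               // placeholder: write batched records durably
      flushed_epoch_ = current_epoch_;
      ++current_epoch_;
      cv_.notify_all();
    }
  }

 private:
  void FlushToStorage() {}            // stub for illustration
  std::chrono::microseconds interval_;
  uint64_t flushed_epoch_, current_epoch_;
  std::mutex mutex_;
  std::condition_variable cv_;
};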

. Summary
This chapter presented the write-behind logging protocol for emerging NVM technologies. We examined the impact of this redesign on the transactional throughput, latency, availability, and storage footprint of the DBMS. Our evaluation of the recovery algorithm in Peloton showed that across different OLTP workloads it reduces the system's recovery time by × and shrinks the storage footprint by .×. In this chapter, we focused on a two-tier storage system comprising volatile DRAM and a durable storage device that is either NVM, SSD, or HDD. A DBMS operating a three-tier storage hierarchy can also employ WBL. In this case, the DBMS stores the less frequently accessed tuples in the database on SSD. It manages the log and more frequently accessed tuples on NVM. As the bulk of the data is stored on SSD, the DBMS only requires an NVM device with a smaller storage capacity, thus shrinking the overall storage system cost. We explore the fundamentals of buffer management in such a three-tier storage hierarchy in the next chapter.

Chapter 

Buffer Management

The design of the buffer manager in traditional DBMSs is influenced by the difference in the performance characteristics of DRAM and SSD. The canonical data migration policy employed by the buffer manager is predicated on the assumptions that the DBMS can only operate on data residing on DRAM, and that SSD is orders of magnitude slower than DRAM [, ]. But NVM upends these design assumptions. This chapter explores the changes required in the buffer manager of the DBMS to leverage the unique properties of NVM in systems that still include DRAM and SSD. We describe a set of data migration optimizations enabled by NVM. The key idea is that since the DBMS can directly operate on NVM-resident data, it can adopt a lazy data migration policy for copying data over to DRAM. We illustrate that these optimizations have to be tailored depending on the characteristics of the storage hierarchy and the workload. We then make the case for a continuous adaptation mechanism in the buffer manager, called adaptive data migration, that achieves a near-optimal data migration policy for an arbitrary workload and storage hierarchy without requiring any manual tuning. We finally present a storage system recommender for identifying the optimal storage hierarchy for a workload given a cost budget.

There have been prior efforts on buffer management in a three-tier storage system including NVM [, ]. Renen et al. present a buffer manager that eagerly migrates data from SSD to DRAM. When a page is evicted from DRAM, the buffer manager considers admitting it into the NVM buffer. The fundamental idea is to only admit recently referenced pages. The buffer manager maintains an admission queue to keep track of pages considered for admission and only admits pages that were recently denied admission. Kwon et al. present the architecture of a three-tier file system that transparently migrates data among different levels in the storage system []. This file system is optimized for a specific NVM technology that is × slower than DRAM, so it does not cache NVM-resident data on DRAM. For the same reason, it bypasses DRAM while performing synchronous write operations.

In this chapter, we introduce a taxonomy of data migration policies that subsumes the specific schemes adopted by prior systems. We derive insights that are applicable to a broader range of three-tier storage systems and NVM technologies. In particular, we explore how the optimal data migration policy depends on workload and storage system characteristics.


The remainder of this chapter is organized as follows. We begin in Section . with an overview of buffer management principles. We present the NVM-related data flow optimization in Section .. We then describe the adaptive data migration technique in Section .. Section . presents the design of the storage system recommender. Section . describes the trace-driven buffer manager used in our analysis. We present our experimental evaluation in Section . and conclude in Section ..

. Buffer Management Principles
The buffer manager in a DBMS is responsible for bringing pages from durable storage into main memory as and when they are needed []. The buffer manager partitions the available memory into a set of fixed-size slots, which are collectively termed the buffer pool. The higher-level components of the DBMS, such as the query execution engine, need not concern themselves with whether a page is in the buffer pool or not. The execution engine only needs to request the buffer manager to retrieve a page using the page's logical identifier. If the page is not already present in the buffer pool, the buffer manager transparently retrieves the page from non-volatile storage. After updating a page, the execution engine must inform the buffer manager that it has modified the page so that the buffer manager propagates the changes to the copy of the page on storage. When the page is no longer needed, the execution engine must request the buffer manager to release it.

The buffer manager maintains some meta-data about each page in the buffer pool. This meta-data includes the number of active references made to the page and whether the page has been modified since it was brought into the buffer pool from storage. When a page is requested, the buffer manager first checks if it is present in the buffer pool. If the page is already present in memory, then it increments the number of active references to the page and returns the address of the slot containing the page. If the page is not present, then the buffer manager chooses a slot for replacement based on the buffer replacement policy (e.g., least recently used) []. If the page selected for replacement contains any modifications, then the buffer pool overwrites the corresponding page on non-volatile storage. It finally reads the requested page from storage into the replacement slot and returns the slot's address.

Prior research has shown that there is significant overhead associated with buffer management in a DBMS. When all the data fits in main memory, the cost of maintaining a buffer pool is nearly one-third of all the CPU cycles used by the DBMS []. This is because the buffer manager must keep track of meta-data about pages in the pool to enforce the buffer replacement policy and synchronize concurrent accesses from different threads to the pool. The overhead associated with managing disk-resident data has given rise to a class of new in-memory DBMSs that manage the entire database in main memory and do not contain a buffer pool [, , ]. In-memory DBMSs provide better throughput and lower latency than disk-based DBMSs due to this main memory orientation []. The fundamental limitation of in-memory DBMSs, however, is that they can deliver this improved performance only when the database is smaller than the amount of physical memory (DRAM) available in the system. If the dataset grows larger than the memory capacity, then the operating system will start to page virtual memory, and main memory accesses will cause page faults []. The execution of transactions is stalled until the pages are retrieved from non-volatile storage. The performance of an in-memory DBMS drops by up to % when the dataset exceeds the memory capacity, even if the working set fits in memory [].
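The fetch/release protocol described above can be summarized by the following simplified, single-threaded sketch; the BufferManager and Frame names, the fixed pool size, and the storage stubs are illustrative assumptions, and a real buffer manager must additionally synchronize concurrent accesses as noted above.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Per-slot meta-data: pin count (active references) and dirty flag.
struct Frame {
  uint64_t page_id = UINT64_MAX;
  int pin_count = 0;       // number of active references
  bool dirty = false;      // modified since it was read from storage
  char *data = nullptr;    // fixed-size slot in the buffer pool
};

class BufferManager {
 public:
  // Return the in-memory frame holding page_id, reading it from storage if needed.
  Frame *FetchPage(uint64_t page_id) {
    auto it = page_table_.find(page_id);
    if (it != page_table_.end()) {          // hit: pin and return
      it->second->pin_count++;
      return it->second;
    }
    Frame *victim = PickVictim();           // replacement policy (e.g., LRU)
    if (victim == nullptr) return nullptr;  // pool exhausted (all pages pinned)
    if (victim->dirty) {
      WriteToStorage(victim->page_id, victim->data);  // write back before reuse
    }
    page_table_.erase(victim->page_id);
    ReadFromStorage(page_id, victim->data);
    victim->page_id = page_id;
    victim->pin_count = 1;
    victim->dirty = false;
    page_table_[page_id] = victim;
    return victim;
  }

  // The execution engine calls this when it is done with the page.
  void ReleasePage(Frame *frame, bool modified) {
    frame->dirty |= modified;
    frame->pin_count--;
  }

 private:
  Frame *PickVictim() {                     // stub policy: first unpinned frame
    for (auto &f : frames_) if (f.pin_count == 0) return &f;
    return nullptr;
  }
  void ReadFromStorage(uint64_t, char *) {}        // stub: would read from SSD/NVM
  void WriteToStorage(uint64_t, const char *) {}   // stub: would write back
  std::vector<Frame> frames_ = std::vector<Frame>(16);  // fixed pool of slots
  std::unordered_map<uint64_t, Frame *> page_table_;
};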

[Figure .: Storage Hierarchy Architectures – Candidate storage hierarchy organizations: () traditional storage hierarchy with DRAM and SSD, () storage hierarchy consisting of only NVM and no DRAM, and () storage hierarchy consisting of both NVM and DRAM. (a) DRAM-SSD Hierarchy (b) NVM-SSD Hierarchy (c) DRAM-NVM-SSD Hierarchy]

Several techniques have been proposed to improve the performance of in-memory DBMSs while operating on larger-than-memory databases [, , , , , , , , , , , ]. These techniques exploit the skewed access patterns observed in modern database applications. In these workloads, certain data tuples are hot and are accessed more frequently than other cold tuples. It is advantageous to cache the hot data in memory since it is likely to be modified during this period. Once the age of a particular tuple crosses some threshold, the buffer manager can migrate the cold tuple out to cheaper secondary storage. With this data migration technique, the DBMS can still deliver high performance for transactions that operate on hot in-memory tuples while still being able to access the cold data if needed at a later point in time. But when and where the buffer manager migrates data between the different tiers in a storage system is highly dependent on the properties of the underlying storage technologies.

. NVM-Aware Buffer Management
In a disk-centric DBMS, the buffer manager caches data residing on SSD in the buffer pool located on DRAM, as illustrated in Figure .a. Since DRAM accesses are × faster than SSD operations, DBMSs manage a large buffer pool on DRAM. It is, however, difficult to deploy high-capacity DRAM systems due to three factors. First, it drives up the total cost of the system since DRAM is × more expensive than SSDs. Second, increasing DRAM capacity raises the total system power consumption. Lefurgy et al. report that as much as % of the total system energy is consumed by DRAM in commercial servers []. Lastly, DRAM scaling faces significant challenges due to limitations in the scaling techniques used in earlier generations for transistors and capacitors [].

NVM overcomes these limitations of DRAM by providing higher memory capacity, while still being competitive with respect to performance, cost, and power. Figure .b presents a candidate storage hierarchy where NVM replaces DRAM. This architecture delivers performance comparable to that of a DRAM-SSD hierarchy when NVM latency is less than × that of DRAM. Otherwise, the replacement of DRAM with slower NVM technologies reduces the performance of the DBMS. Figure .c presents the architecture of a hybrid storage hierarchy with both DRAM and NVM. Such a configuration can simultaneously reduce the overall cost of the system while still not sacrificing performance. The NVM buffer caches a significant fraction of the working set during system execution, thereby reducing SSD accesses. The DRAM buffer serves as a cache on top of NVM and only stores the hottest pages in the database. A DRAM buffer with only % of the capacity of the underlying NVM device bridges most of the latency gap between DRAM and NVM.

.. New Data Migration Paths
NVM introduces new data migration paths in the storage hierarchy. Consequently, the buffer manager has more options for moving data between the different tiers in the storage hierarchy. Leveraging these data migration paths to reduce data movement improves the DBMS's performance. The buffer manager also reduces the number of writes to NVM using these paths to extend the lifetime of devices with limited write endurance [].

Although the processor can access data stored on DRAM and NVM at cache-line granularity, we refrain from exploiting this property in our buffer manager [, ]. We contend that the block abstraction will continue to be employed in these tiers for two reasons. First, encryption, compression, and data validation algorithms are block-oriented. Second, maintaining the same granularity across all tiers of the hierarchy simplifies buffer management and eliminates the overhead associated with maintaining fine-grained meta-data.

Figure . presents the data flow paths in the NVM-aware multi-tier buffer manager. The default read path comprises three steps: moving data from SSD to NVM (1), then to DRAM (2), and lastly to the processor cache (3). Similarly, the default write path consists of three steps: moving data from the processor cache to DRAM (4), then to NVM (5), and finally to SSD (6). In the rest of this section, we describe how the buffer manager leverages the other data flow paths shown in Figure . to minimize the impact of NVM on the DBMS's performance and extend the lifetime of the NVM device by reducing write operations.

.. Bypass DRAM during Reads
Unlike existing storage technologies, such as SSDs, the processor can directly access data on NVM via read operations. This data flow path is labelled as (7) in Figure .. To access a block on SSD in a disk-centric DBMS, the DBMS must copy it over to DRAM (9) before it can operate on the copied data. With NVM, the buffer manager can leverage the new data flow path (7) to lazily migrate data from NVM to DRAM while serving read operations.

[Figure .: Data Flow Paths – The different data flow paths in a multi-tier storage hierarchy consisting of DRAM, NVM, and SSD. Paths (1)–(3) form the default read path (SSD to NVM to DRAM to processor cache), paths (4)–(6) form the default write path (processor cache to DRAM to NVM to SSD), and the remaining paths are the direct paths between non-adjacent tiers that bypass DRAM or NVM.]

Let Dr represent the number of references to a block on durable storage before the buffer manager copies the block over to DRAM. With existing storage technologies, Dr = . We refer to this data migration policy as eager migration. With NVM, the buffer manager can employ a wider range of lazy migration policies with higher values of Dr. Such policies reduce upward data migration between NVM and DRAM during read operations. They are beneficial when the capacity of DRAM is smaller than that of NVM. A lazy migration strategy ensures that colder data (i.e., data that has not been frequently referenced) on NVM does not evict hotter data in DRAM.

The optimal value of Dr depends on the application's workload. If the working set fits within the DRAM buffer, then the buffer manager employs a more eager migration policy (Dr ≤ ). A lazier policy would delay the data migration to DRAM, thereby increasing the impact of NVM latency on performance. If the working set does not fit in DRAM, but fits in the NVM buffer, then the buffer manager employs a lazy migration policy with a higher Dr. This strategy ensures that only the hot data is stored in DRAM.

In addition to considering the size of the working set, the buffer manager picks Dr depending on the ratio between the capacities of the DRAM and NVM buffers. In a storage hierarchy where the ratio approaches one, the buffer manager adopts a more eager policy to leverage the space available in DRAM. Higher values of Dr work well when the ratio approaches zero. In this case, lazy migration ensures that the DRAM buffer is used to store only the hot data. With the eager migration policy, the buffer manager always brings the block to DRAM while serving the read operation. Consequently, if the application then updates the same block, the writes are performed on DRAM. In contrast, a lazy migration policy increases the number of writes on NVM. This is because it is more likely that the block being updated is residing on NVM when the buffer manager adopts such a policy. This is not a problem for DBMS applications with skewed access patterns [, ]. Such applications tend to modify hot data that is cached in DRAM even when the buffer manager employs a lazy migration policy.
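A minimal sketch of this read-side policy is shown below; the LazyReadPolicy class and its per-block reference counters are illustrative, and it assumes that Dr = 1 corresponds to eager migration (i.e., promotion to DRAM on the first reference).

#include <cstdint>
#include <unordered_map>

// Sketch of the lazy read-migration policy: a block cached on NVM is served
// directly from NVM until it has been referenced migration_threshold (Dr)
// times, after which it is promoted to the DRAM buffer.
class LazyReadPolicy {
 public:
  explicit LazyReadPolicy(uint32_t migration_threshold)
      : migration_threshold_(migration_threshold) {}

  // Returns true if the block should now be copied from NVM to DRAM.
  bool OnNvmRead(uint64_t block_id) {
    uint32_t &refs = nvm_ref_counts_[block_id];
    refs++;
    if (refs >= migration_threshold_) {
      nvm_ref_counts_.erase(block_id);   // reset the counter once promoted
      return true;                       // hot enough for the DRAM buffer
    }
    return false;                        // keep serving the read from NVM
  }

 private:
  uint32_t migration_threshold_;         // assumed: 1 = eager, larger = lazier
  std::unordered_map<uint64_t, uint32_t> nvm_ref_counts_;
};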

.. Bypass DRAM during Writes
The buffer manager does not have complete autonomy over when to flush data to storage [, ]. It must coordinate with the log manager to ensure that all of the changes made by committed transactions are durable, and that none of the changes made by aborted or active transactions are visible after the DBMS recovers from a failure. These constraints are referred to as the atomicity and durability properties. If a transaction modifies a block and commits, and the buffer manager has not written the updated block to storage, then a system crash will leave the block in its old invalid state. This violates the durability property. On the other hand, if the buffer manager decides to write a modified block belonging to an active transaction, it violates the atomicity property. To prevent such scenarios, the buffer manager must refrain from making completely autonomous replacement decisions.

Since the contents of the DRAM buffer are lost after a system failure, the log manager records the information needed to recover from a failure on durable storage. Before updating a page, the DBMS writes its old contents to the log (i.e., the before image of the page). Similarly, when a page is about to be evicted from the buffer pool, its current contents are recorded in the log (i.e., the after image of the page). When recovering from a failure, the DBMS uses the information in the log to restore the database to a transactionally consistent state. To bound the amount of storage space taken up by the log records on NVM, the DBMS periodically takes checkpoints at runtime.

Ensuring the persistence of blocks containing log and checkpoint records is critical for the recoverability of the DBMS. The DBMS's performance is affected by the I/O overhead associated with existing secondary storage technologies, such as SSDs. As transactions tend to generate multiple log records that are each small in size, most DBMSs use the group commit optimization to reduce the I/O overhead []. The DBMS first batches the log records for a group of transactions in the DRAM buffer (4) and then flushes them together with a single write to SSD (6). This improves the operational throughput and amortizes the I/O overhead across multiple transactions.

With NVM, the buffer manager provides synchronous persistence with lower overhead by taking advantage of the ability to directly persist data on NVM during write operations [, ]. This data flow path is labelled as (8) in Figure .. The write operation bypasses DRAM since the data must eventually be persisted, and this data flow optimization shrinks the overall latency of the operation, especially on NVM devices whose write latencies are comparable to that of DRAM. In addition to reducing the synchronous persistence overhead by eliminating the redundant write to DRAM, it also avoids the potential eviction of other hot data blocks from the DRAM buffer.
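The following sketch contrasts the two write paths for a log record: staging it in a DRAM buffer for a later group-commit flush versus persisting it directly to a memory-mapped NVM region, bypassing DRAM. The LogWriter class and the FlushToNvm stub (standing in for a CLWB/CLFLUSH-plus-fence primitive such as the one sketched earlier) are illustrative assumptions.

#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative log writer with two paths: the default DRAM-staged path and the
// bypass-DRAM path that persists records directly on NVM.
class LogWriter {
 public:
  LogWriter(char *nvm_log, size_t nvm_capacity, bool bypass_dram)
      : nvm_log_(nvm_log), nvm_capacity_(nvm_capacity), nvm_offset_(0),
        bypass_dram_(bypass_dram) {}

  bool Append(const void *record, size_t len) {
    if (bypass_dram_) {
      if (nvm_offset_ + len > nvm_capacity_) return false;  // log region full
      std::memcpy(nvm_log_ + nvm_offset_, record, len);     // store to NVM
      FlushToNvm(nvm_log_ + nvm_offset_, len);               // durable immediately
      nvm_offset_ += len;
      return true;       // synchronous persistence without a DRAM copy
    }
    // Default path: buffer in DRAM; the group-commit thread flushes the batch later.
    dram_buffer_.insert(dram_buffer_.end(),
                        static_cast<const char *>(record),
                        static_cast<const char *>(record) + len);
    return true;
  }

 private:
  void FlushToNvm(const void *, size_t) {}  // placeholder: CLWB/CLFLUSH + fence
  char *nvm_log_;                           // memory-mapped NVM log region
  size_t nvm_capacity_, nvm_offset_;
  bool bypass_dram_;
  std::vector<char> dram_buffer_;           // DRAM staging area for group commit
};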

.. Bypass NVM During Reads
We next present how the buffer manager reduces the number of writes to NVM using an alternate data flow path during read operations. The default read path consists of moving the data from SSD to NVM (1) and eventually migrating it to DRAM (2). This optimization instead makes use of the direct data flow path from SSD to DRAM, which is labelled as (9) in Figure .. When the buffer manager observes that a requested page is not present in either the DRAM or NVM buffers, it copies the data on SSD directly to DRAM, thus bypassing NVM during read operations. If the data read into the DRAM buffer is not subsequently modified, and is selected for replacement, then the buffer manager discards it. If the page is modified and later selected for eviction from DRAM, the buffer manager copies it to NVM (5).

In the case of a modified page, bypassing NVM during read operations eliminates the first write to NVM. The buffer manager installs a copy of the page on NVM only after it has been evicted from DRAM. If, instead, the page were installed on NVM during the read operation, then it would be written twice: once at fetch time and again when the page is evicted from DRAM. For a page that is only read and not modified, this lazy data migration policy eliminates an unnecessary write to NVM when the page is fetched from SSD.

.. Bypass NVM During Writes
To reduce the number of writes to NVM, the buffer manager skips migrating data to NVM while serving write operations by using an alternate data flow path. The default write path consists of moving the data from DRAM to NVM (5) and then eventually migrating it to SSD (6). Instead of using the default path, this technique makes use of the data flow path from DRAM to SSD (10). Bypassing NVM during writes ensures that only pages frequently swapped out of DRAM are stored on NVM []. If the buffer manager employs an eager migration policy while copying data into DRAM, then this optimization prevents infrequently accessed pages residing in the DRAM buffer from polluting the NVM buffer. However, with a lazy migration policy, adopting this technique results in hot pages being loaded more slowly into the NVM buffer.

Let Nw represent the average number of references to a block on DRAM before the buffer manager copies the block over to NVM. With the default write path, Nw = . The buffer manager can support a wider range of data migration policies with NVM while serving write operations. Higher values of Nw reduce data migration from DRAM to NVM during write operations. Such a policy is beneficial when the capacity of DRAM is comparable to that of NVM. In such a storage hierarchy, a lazy migration strategy ensures that colder data on DRAM does not evict warmer data in the NVM buffer. This optimization reduces the number of writes to NVM since only the warmer pages identified by the buffer manager are stored in the NVM buffer.
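A combined sketch of the bypass-NVM decisions is shown below; the BypassNvmPolicy class and its counters are illustrative, and it assumes that Nw = 1 corresponds to the default write path (admit a dirty page into the NVM buffer on its first eviction from DRAM).

#include <cstdint>
#include <unordered_map>

// On the read side, a miss in both buffers is served straight from SSD into
// DRAM (bypassing NVM). On the write side, a dirty page evicted from DRAM is
// admitted into the NVM buffer only after it has accumulated
// nvm_admission_threshold (Nw) DRAM references; colder pages go straight to SSD.
enum class Target { NVM, SSD };

class BypassNvmPolicy {
 public:
  explicit BypassNvmPolicy(uint32_t nvm_admission_threshold)
      : nvm_admission_threshold_(nvm_admission_threshold) {}

  // Record a reference made to the page while it resides in DRAM.
  void OnDramReference(uint64_t page_id) { dram_ref_counts_[page_id]++; }

  // Decide where a dirty page should go when it is evicted from DRAM.
  Target OnDirtyEviction(uint64_t page_id) {
    uint32_t refs = dram_ref_counts_[page_id];
    dram_ref_counts_.erase(page_id);
    return (refs >= nvm_admission_threshold_) ? Target::NVM   // warm enough for NVM
                                              : Target::SSD;  // bypass NVM
  }

 private:
  uint32_t nvm_admission_threshold_;   // assumed: 1 = default path, larger = lazier
  std::unordered_map<uint64_t, uint32_t> dram_ref_counts_;
};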

. Adaptive Data Migration
The buffer manager's data migration policy consists of the frequencies with which it should bypass DRAM and NVM while serving read and write operations (Sections .. to ..). All of the above data flow optimizations are moot unless the buffer manager effectively adapts the data migration policy based on the characteristics of the workload and the storage hierarchy. The crux of our approach is to track the target metrics on the recent query workload at runtime, and then periodically adapt the policy in the background. Over time, this process automatically optimizes the policy for the application's workload and the storage hierarchy, and amortizes the adaptation cost across multiple queries. We now describe the information that the buffer manager collects to guide this process.

The buffer manager keeps track of two target metrics while executing the workload: the operational throughput (T) of the buffer manager and the number of write operations (W) performed on NVM. The goal is to determine the optimal configuration of the data migration policies that maximizes the throughput and minimizes writes to NVM. The cost function associated with a candidate data migration policy configuration consists of two weighted components associated with these target metrics:

$$\text{Cost}(T, W) = \lambda_1 T - \lambda_2 W$$

To adapt the buffer manager's data migration policy, we employ an iterative search method called simulated annealing (SA) []. This technique searches for a policy configuration that maximizes the cost function presented above. An attractive feature of SA is that it avoids getting caught at local optima, which are configurations that are better than any other nearby configurations but are not the globally optimal configuration []. It is a probabilistic hill climbing algorithm that migrates through a set of local optima in search of the global extremum. SA consists of two stochastic processes: one for generating candidate policy configurations and one for accepting a new configuration. Algorithm  presents the algorithm for tuning the data migration policy using SA. At each time step, SA randomly selects a new configuration (C′) close to the current one (C). It then evaluates the cost of that configuration (E′). Lastly, it decides to accept the configuration C′ or stay with C based on whether the cost of C′ is lower or higher than that of the current configuration. If C′ is better than C, then it immediately transitions to C′. Otherwise, it randomly accepts the new configuration with higher cost (C′) based on the Boltzmann acceptance probability factor.

SA is theoretically guaranteed to reach the global optimum with high probability. The control parameter T determines the magnitude of the perturbations of the energy function E. SA gradually decreases T over time. During the initial steps of SA, at high temperatures, the probability of uphill moves in the energy function (∆E > 0) is large. Despite temporarily increasing the energy, such non-beneficial uphill steps allow for a more extensive search for the globally optimal configuration. Over time, SA reduces the temperature. This gradual cooling mechanism corresponds to slowly decreasing the probability of accepting worse configurations as it explores the configuration state space.

Algorithm : Data Migration Policy Tuning Algorithm
Require: temperature reduction parameter α, threshold for the number of accepted transitions γ, initial data migration policy configuration C0, initial temperature T0, final temperature Tmin
function UPDATE-CONFIGURATION(α, γ, C0, T0, Tmin)
    # Initialization
    current configuration C = C0
    energy E = cost(C)
    temperature T = T0
    # Iterative search
    while T > Tmin do
        while number of accepted transitions < γ do
            new configuration C′ = neighbor(C)
            energy E′ = cost(C′)
            energy delta ∆E = E′ − E
            Boltzmann acceptance probability P = e^(−∆E/T)
            if ∆E < 0 or with acceptance probability P then
                # Accept new policy configuration
                C = C′
            end if
        end while
        # Reduce temperature
        T = T ∗ α
    end while
end function
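The following is a compact, self-contained rendering of this tuning loop; the cost and neighbor functions are supplied by the buffer manager, and the PolicyConfig structure and the choice of knobs it contains are illustrative assumptions. To reconcile the maximization of the cost function with the energy-minimization form of the algorithm, the sketch minimizes the negative of the cost.

#include <cmath>
#include <cstdint>
#include <functional>
#include <random>

struct PolicyConfig {
  uint32_t dram_read_threshold;   // Dr
  uint32_t nvm_write_threshold;   // Nw
};

// cost: the weighted metric above (to be maximized).
// neighbor: a small random perturbation of the policy knobs.
PolicyConfig Anneal(
    PolicyConfig c,
    const std::function<double(const PolicyConfig &)> &cost,
    const std::function<PolicyConfig(const PolicyConfig &, std::mt19937 &)> &neighbor,
    double t, double t_min, double alpha, int transitions_per_step) {
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  double energy = -cost(c);                         // energy = negative cost
  while (t > t_min) {
    for (int accepted = 0; accepted < transitions_per_step;) {
      PolicyConfig candidate = neighbor(c, rng);
      double candidate_energy = -cost(candidate);
      double delta = candidate_energy - energy;
      // Accept improvements immediately; accept worse configurations with the
      // Boltzmann probability e^(-delta/T) to escape local optima.
      if (delta < 0 || coin(rng) < std::exp(-delta / t)) {
        c = candidate;
        energy = candidate_energy;
        ++accepted;
      }
    }
    t *= alpha;                                     // gradual cooling
  }
  return c;
}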

. Storage Hierarchy Selection
We have so far focused on identifying an optimal data migration policy configuration for a particular workload given a storage hierarchy. The tuning algorithm presented in Section . assumes that we have already provisioned a multi-tier storage hierarchy that is a good fit for the workload. It is unclear, however, how to select such a hierarchy for a particular workload given a cost or performance constraint. In this section, we formulate an analytical model of a hierarchical storage system to improve the intuition behind provisioning a multi-tier storage hierarchy. We then identify the limitations of the model and present a recommender system that addresses them.

.. Hierarchical Storage System Model

We can model the multi-tier storage system as a linear hierarchy with n levels, L1, L2, ..., Ln. The performance of a particular level Li in the hierarchy is determined by two factors: the average access time ti and the device capacity Ci []. We assume that a copy of all blocks in level i exists in every level greater than i (i.e., in all lower levels in the hierarchy). The maximum information that can be stored in the system is equal to the capacity of the lowest level Cn, since copies of all blocks stored in the higher levels of the system must be present in Ln. We can characterize the performance impact of the device capacity at a particular level by the probability of finding the requested data block in that level. This is termed the hit ratio H. H is a monotonically increasing function with respect to the device capacity C. Let the cost per storage unit (e.g., per GB) of the device technology used at a particular level be given by the cost function P(ti). It decreases monotonically with respect to the access time ti of the device technology. Since a copy of all data blocks at level i exists in every level greater than i, the probability of a hit in level Li and misses in the higher levels is given by:

$$h_i = H(C_i) - H(C_{i-1})$$

Here, hi represents the relative number of successful data accesses at level i in the storage hierarchy. The effective average access time per block request is then given by:

$$T = \sum_{i=1}^{n} h_i \left( \sum_{j=1}^{i} t_j \right)$$
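As a worked example with purely illustrative hit ratios (these values are assumed, not measured): if H(C1) = 0.6 for the DRAM level, H(C2) = 0.95 for the NVM level, and H(C3) = 1 for the SSD level, then h1 = 0.6, h2 = 0.35, and h3 = 0.05, and

$$T = 0.6\,t_1 + 0.35\,(t_1 + t_2) + 0.05\,(t_1 + t_2 + t_3) = t_1 + 0.4\,t_2 + 0.05\,t_3,$$

that is, every request pays t1, 40% of the requests additionally pay t2, and 5% also pay t3.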

To maximize the operational throughput of the DBMS, we need to minimize T subject to storage system cost constraints. Given a storage system cost budget B, the goal is to select the device technology ti and determine the device capacity Ci for each level in the storage hierarchy. We formulate this problem as follows:

Minimize:
$$T = \sum_{i=1}^{n} \left(1 - H(C_{i-1})\right) t_i$$
Subject to the storage system cost constraint:
$$\sum_{i=1}^{n} P(t_i)\, C_i \le B$$

.. Storage Hierarchy Recommender System
H is a function of the workload locality and does not have a closed-form expression []. Consequently, instead of deriving an explicit solution to this optimization problem, the recommender system resorts to measuring the actual throughput on a particular workload to identify the optimal storage hierarchy. The goal of the recommender system is to identify a multi-tier storage hierarchy consisting of DRAM, NVM, and/or SSD that maximizes a user-defined objective function given a cost constraint. It searches across candidate storage hierarchies that meet the user-specified budget. We represent the set of candidate DRAM devices by {D0, D1, D2, ..., Dp}, the set of candidate NVM devices by {N0, N1, N2, ..., Nq}, and the set of candidate SSD devices by {S0, S1, S2, ..., Sr}. These devices have varying capacities and costs. We are provided with a cost function P that returns the cost of a particular device. For instance, P(Di) returns the cost of the DRAM device with capacity Di. We can prune the set of candidate storage hierarchies by only considering devices whose capacities are powers of two. With this restriction, the size of the set of candidate storage hierarchies is relatively small (p, q, and r < ). The recommender system does a pure grid search over the entire set []. During a particular trial on the grid, we only consider device triples {Di, Nj, Sk} that meet the user-specified budget B, as given by:

$$P(D_i) + P(N_j) + P(S_k) \le B$$

The system then measures the operational throughput on the storage hierarchy corresponding to the device triple {Di, Nj, Sk}. We configure D0 = 0 to model storage hierarchies containing only NVM and SSD devices (i.e., those that do not have DRAM). Similarly, we set N0 = 0 and S0 = 0 to model storage hierarchies without NVM and SSD, respectively. We note that the entire database must fit in the lowest level of the storage hierarchy. Since the cost of NVM is more than × higher than that of SSD, the latter device will likely continue to occupy the lowest level.
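A sketch of this grid search is shown below; the Device and Hierarchy structures and the measure_throughput callback (which would replay the workload trace on the candidate hierarchy) are illustrative assumptions.

#include <vector>

// Grid search over candidate device triples under a budget constraint.
// A device with zero capacity models a hierarchy without that tier.
struct Device { double capacity_gb = 0; double price = 0; };

struct Hierarchy { Device dram, nvm, ssd; double ops_per_dollar = 0.0; };

Hierarchy Recommend(const std::vector<Device> &dram_opts,
                    const std::vector<Device> &nvm_opts,
                    const std::vector<Device> &ssd_opts,
                    double budget,
                    double (*measure_throughput)(const Hierarchy &)) {
  Hierarchy best;
  for (const auto &d : dram_opts)
    for (const auto &n : nvm_opts)
      for (const auto &s : ssd_opts) {
        double cost = d.price + n.price + s.price;
        if (cost > budget || cost <= 0.0) continue;   // outside the cost budget
        Hierarchy h{d, n, s};
        double throughput = measure_throughput(h);    // e.g., replay the trace
        h.ops_per_dollar = throughput / cost;         // performance/price metric
        if (h.ops_per_dollar > best.ops_per_dollar) best = h;
      }
  return best;
}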

. Trace-Driven Buffer Manager
We developed a trace-driven buffer manager to evaluate different storage hierarchy designs and data migration policies. We gather traces from a real DBMS by running OLTP, OLAP, and HTAP workloads. The trace contains information about individual buffer pool operations.

At the beginning of the trace period, we take a snapshot of the DBMS's meta-data regarding the blocks stored in memory and on storage. This snapshot does not contain any user data. The buffer manager only simulates the movement of user data blocks and not their actual contents. This allows us to effectively run simulations of buffer management operations on large databases. During simulation, the trace requests are handed off to the buffer manager's worker threads. The buffer manager runs on top of a multi-tier storage hierarchy consisting of DRAM, NVM, and/or SSD. For instance, in the case of a three-tier DRAM-NVM-SSD storage hierarchy, it maintains two buffer pools on DRAM and NVM. While processing the trace requests, the buffer manager issues read and write operations to the appropriate devices in the storage hierarchy depending on the data migration policy.

We use the hardware emulator developed by Intel Labs to emulate NVM [, ]. The buffer manager uses the filesystem interface exported by the emulator. This allows the buffer manager to use the POSIX filesystem interface to read and write data to files stored on NVM. This interface is implemented by the persistent memory filesystem, a special filesystem optimized for NVM. Typically, in a block-oriented filesystem, file I/O requires two copies: one involving the block device and another involving the user buffer. The emulator's optimized filesystem, however, requires only one copy between the file and the user buffers.
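The simulation loop can be sketched as follows; the TraceRecord layout and the MultiTierBufferManager stubs are illustrative assumptions about the information recorded per buffer pool operation.

#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per buffer pool operation captured in the trace; only block
// identifiers are recorded, never user data.
struct TraceRecord {
  uint64_t block_id;
  bool is_write;        // read or write reference
  bool needs_persist;   // e.g., a log or checkpoint block that must be durable
};

class MultiTierBufferManager {
 public:
  void Read(uint64_t block_id) {}                 // stub: apply the migration policy
  void Write(uint64_t block_id, bool persist) {}  // stub: apply the migration policy
};

// Replay a trace against the simulated storage hierarchy and return the number
// of post-warm-up operations processed (used to compute throughput).
size_t ReplayTrace(const std::vector<TraceRecord> &trace,
                   MultiTierBufferManager &bm, size_t warmup_ops) {
  size_t processed = 0;
  for (size_t i = 0; i < trace.size(); i++) {
    const TraceRecord &op = trace[i];
    if (op.is_write) {
      bm.Write(op.block_id, op.needs_persist);
    } else {
      bm.Read(op.block_id);
    }
    if (i >= warmup_ops) processed++;   // collect statistics only after warm-up
  }
  return processed;
}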

. Experimental Evaluation
We now present an analysis of the proposed buffer management policies for a multi-tier storage hierarchy comprising NVM. To perform a fair comparison, we implemented all the policies in the same trace-driven buffer manager. We illustrate that:
● NVM improves throughput by reducing accesses to canonical storage devices due to its higher capacity-cost ratio compared to DRAM.
● The selection of a multi-tier storage hierarchy for a given workload depends on the working set size, the frequency of persistent writes, the system cost budget, and the performance and cost characteristics of NVM.
● Tuning the buffer management policy for the workload and the storage hierarchy shrinks the number of writes to NVM and improves throughput.

.. Experimental Setup
We perform our experiments on the NVM hardware emulator described in Appendix A. By default, we set the capacity of the DRAM and NVM buffers to  GB and  GB, respectively. Unless otherwise stated, we configured the NVM latency to be × that of DRAM and validated these settings using Intel's memory latency checker. The emulator's storage hierarchy also includes two additional devices:
● HDD: Seagate Barracuda ( TB,  RPM, SATA .)
● SSD: Intel DC S ( GB, SATA .)

Workloads: We use the TPC-C, Voter, CH-benCHmark, and AuctionMark workloads from the OLTP-Bench testbed in our evaluation [, ]. Appendix B presents a detailed description of these benchmarks. These workloads differ in their workload skews and frequencies of persistent writes.

Benchmark        Footprint
TPC-C             GB
CH-benCHmark      GB
Voter             GB
AuctionMark       GB

Table .: Trace Footprints – Footprints of the traces associated with different benchmarks. benchmarks. These workloads differ in their workload skews and frequencies of persistent writes.

Trace Collection: We ran the benchmarks on an instrumented Postgres DBMS (v.) []. All the transactions execute with the same serializable isolation level and durability guarantees. To collect the traces, we ran each benchmark for three hours, including a -minute warm-up period. At the end of the warm-up period, we take a snapshot of the DBMS's meta-data regarding the location of blocks in volatile memory and on durable storage. We then start recording the buffer pool references in the trace. During simulation, the buffer manager first loads the snapshot before executing the operations recorded in the trace.

The amount of data referenced at least once in a trace is termed its footprint. An important issue in using trace-driven simulations to study storage hierarchy design is that the traces must have a sufficiently large footprint for the storage configurations of interest []. Table . presents the footprints of the traces associated with the different benchmarks. For all experiments, we used half of the trace to warm up the simulator. We collect system statistics only after the buffer pools have been warmed up.

.. Workload Skew Characterization

We begin with a characterization of the workload skew present in the different workloads. Figure . shows the cumulative distribution function (CDF) of the number of buffer pool accesses per block in the workload traces. For the TPC-C benchmark shown in Figure .a, % of buffer pool references are made to % of the blocks, and % of the blocks only account for .% of the accesses. This illustrates that this workload is not highly skewed and has a large working set. Similarly, the CH-benCHmark also exhibits low skew, as depicted in Figure .d. % and % of the blocks account for % and % of the buffer pool references, respectively. Figure .b shows that the Voter benchmark exhibits the lowest degree of skew among all workloads since % of the referenced blocks account for only % of buffer pool references. This is because the workload mostly consists of short-lived transactions that generate many writes to the log. In contrast, AuctionMark exhibits the highest degree of skew among all workloads: .% of the blocks account for % of the buffer pool references, and % of the buffer pool accesses are made to % of the blocks. We attribute this to the temporally skewed item access patterns in AuctionMark.

[Figure .: Workload Skew Characterization – CDF of the number of times a block is referenced in the traces associated with different workloads. Due to the inherent skew present in certain workloads, most of the buffer pool accesses are made to a small fraction of blocks. (a) TPC-C (b) Voter (c) AuctionMark (d) CH-benCHmark]

.. Impact of NVM on Runtime Performance

In this section, we compare the buffer manager's throughput on similarly priced NVM-SSD and DRAM-SSD storage hierarchies to examine the impact of NVM on runtime performance. We do not consider a DRAM-NVM-SSD hierarchy in this experiment in order to isolate the utility of NVM. We configured the cost budget to be . The cost of NVM is derived from the current price of Intel's 3D XPoint-based Optane SSD P []. Given this budget, the capacities of the NVM and DRAM devices are  GB and  GB, respectively. Note that the NVM device's capacity is × higher than that of the DRAM device due to NVM's higher capacity-cost ratio. To obtain insights that are applicable to a broader range of NVM technologies, we quantify the impact of NVM under different latency configurations. We ran the experiment under three NVM latency configurations for the emulator, ranging from –× DRAM latency (– ns).

The results shown in Figure . illustrate that the NVM-SSD hierarchy outperforms its DRAM-based counterpart on most workloads and latency configurations. On the TPC-C benchmark, we observe that with the × latency configuration, the NVM-based hierarchy outperforms the DRAM-SSD hierarchy by .×. This is because NVM reduces the number of SSD accesses by × due to its larger capacity relative to DRAM. The reduction in time spent on disk operations negates the performance impact of slower NVM operations. With the × latency configuration, the performance gap drops to .×. This illustrates the impact of NVM's higher latency relative to DRAM. Both storage hierarchies deliver similar throughput on the × latency configuration. In this setting, slower NVM operations nullify the benefits of its higher capacity.

The impact of NVM is more pronounced on the Voter benchmark. This benchmark saturates the DBMS with many short-lived transactions that each update a small number of tuples. The buffer manager frequently flushes dirty blocks to durable storage while executing this workload. NVM improves runtime performance by efficiently absorbing these writes.

[Figure .: Performance Impact of NVM – Comparison of the buffer manager's throughput on similarly priced NVM-SSD and DRAM-SSD storage hierarchies under different NVM latency configurations. (a) TPC-C (b) Voter (c) AuctionMark (d) CH-benCHmark]

As shown in Figure .b, the performance gap between the two storage hierarchies varies from × to .× on the × and × latency configurations, respectively. On the AuctionMark workload shown in Figure .c, the NVM-SSD hierarchy outperforms its DRAM-based counterpart by .× with the × latency configuration. However, the trend reverses on the × latency configuration, where the latter hierarchy delivers .× higher throughput than the former. We attribute this to the workload's smaller working set, which fits in the DRAM buffer. The results for the CH-benCHmark workload, shown in Figure .d, illustrate that the NVM-based hierarchy delivers .× higher throughput compared to its DRAM-based counterpart on the × latency configuration. We attribute this to the larger working set associated with this workload. Even on the × latency configuration, the former storage hierarchy delivers % higher throughput than the latter. This demonstrates the performance impact of NVM on HTAP workloads.

Performance Impact of NVM with Slower Storage: We next compare the buffer manager's throughput on similarly priced NVM- and DRAM-based hierarchies that use an HDD instead of an SSD for secondary storage. The results in Figure . show that the utility of NVM is higher when operating on top of an HDD relative to an SSD. On the TPC-C benchmark, as shown in Figure .a, NVM-HDD outperforms its DRAM-based counterpart by × on the × latency configuration. This is larger than the .× performance gap between the NVM-SSD and DRAM-SSD hierarchies on the same configuration. This shows that the utility of the NVM buffer is higher on top of an HDD due to the latter's higher latency compared to an SSD. Even on the × latency configuration, the NVM-HDD hierarchy delivers .× higher throughput than DRAM-HDD. This is because the working set does not fit in the DRAM buffer.

[Figure .: Performance Impact of NVM with Slower Storage – Comparison of the buffer manager's throughput on similarly priced NVM-HDD and DRAM-HDD storage hierarchies under different NVM latency configurations. (a) TPC-C (b) Voter (c) AuctionMark (d) CH-benCHmark]

However, it fits in the NVM buffer. NVM absorbs the bulk of the buffer pool references, thereby shrinking the impact of the HDD on runtime performance. The benefits of NVM are more prominent on the Voter benchmark shown in Figure .b. The NVM-based hierarchy outperforms its DRAM-based counterpart by × on the × latency configuration. On this workload, the NVM buffer reduces HDD writes by × in comparison to the DRAM buffer. With the × configuration, the performance gap marginally drops to ×. Slower HDD operations dampen the performance impact of NVM latency on this workload. NVM-HDD outperforms its DRAM-based counterpart across all latency configurations on the AuctionMark workload, as shown in Figure .c. The performance gap between the NVM-SSD and NVM-HDD hierarchies (.×) is smaller than that between the DRAM-SSD and DRAM-HDD hierarchies (×). These results illustrate that NVM shrinks the performance impact of slower HDD operations by reducing capacity cache misses. This implies that the first place to spend money when designing a multi-tier storage hierarchy is on the cheaper tier (NVM) rather than the faster one (DRAM).

.. Storage Hierarchy Recommendation
We next focus on the storage hierarchy recommendation problem presented in Section .. In this experiment, we compare the performance/price numbers of multi-tier storage hierarchies. If the cost of a storage hierarchy is C and the throughput it delivers is T operations per second, then the performance/price number is given by T/C. This represents the number of operations executed per second per dollar. Given a system cost budget and a target workload, the recommender system identifies the storage hierarchy with the highest performance/price number.

Each storage system consists of at most three devices: DRAM, NVM, and SSD. We vary the capacity of the DRAM and NVM devices from  MB through  GB, and from  GB through  GB, respectively. We configured the capacity of the SSD device to be  TB. We examine the runtime performance of the buffer manager on both two- and three-tier storage hierarchies: DRAM-SSD, NVM-SSD, and DRAM-NVM-SSD. We configured the NVM latency to be × that of DRAM.

Storage System Cost: Figure .a presents the cost of candidate storage hierarchies. The cost of the DRAM-SSD hierarchy increases from  to  when we vary the capacity of the DRAM device from  MB through  GB. The cost of the NVM-SSD hierarchy raises from  to  when we vary the capacity of the NVM device from  GB through  GB.

Storage Hierarchy Recommendation: The performance/price numbers of the candidate storage hierarchies across different workloads are presented in Figure .. The recommender system performs a grid search to identify the storage hierarchy with the highest performance/price number on a target workload given a cost budget.

For the TPC-C benchmark, as shown in Figure .b, the storage system that delivers the highest performance/price number consists of  GB DRAM and  GB NVM on top of SSD. Expanding the capacity of the DRAM buffer to  GB improves performance by %, but this also raises the storage system cost by %. Similarly, reducing the capacity of the DRAM buffer to  MB shrinks performance and cost by % and %, respectively. The recommended storage hierarchy outperforms its NVM-SSD counterpart by %. This is because the DRAM buffer reduces the time spent on NVM read operations by %.

The optimal storage system for the Voter workload consists of  GB DRAM and  GB NVM, as shown in Figure .c. While executing this workload, the buffer manager frequently flushes dirty blocks to durable storage. In the absence of NVM, the buffer manager spends more time flushing data to SSD. So the performance/price number on a similarly priced  GB DRAM-SSD system is × lower than that of its NVM-based counterpart.

On the AuctionMark workload, as shown in Figure .d, an NVM-SSD system consisting of  GB NVM delivers the highest performance/price number. It delivers .× higher throughput compared to a similarly priced DRAM-SSD system with  GB DRAM. This illustrates the utility of NVM's higher capacity-cost ratio relative to DRAM. Adding a  GB DRAM buffer on top of NVM does not improve performance on this workload. Instead, it reduces throughput by %. The I/O overhead associated with data migration between DRAM and NVM overrides the utility of caching data on DRAM.

For the CH-benCHmark workload, the results in Figure .e show that the maximal performance/price number is delivered by a DRAM-NVM-SSD system with  GB DRAM and  GB NVM. Adding a  GB DRAM buffer on top of NVM increases throughput by % on this workload. This is because it reduces the time spent on NVM operations by %, thereby justifying the data migration overhead.

Impact of NVM Latency: We look at the impact of NVM latency on the selection of the storage hierarchy. Figure .f presents the results for the TPC-C benchmark with the × latency configuration.

[Figure: heat maps of cost and performance/price (Ops/$) over DRAM size (GB) and NVM size (GB). Panels: (a) Storage System Cost, (b) TPC-C, (c) Voter, (d) AuctionMark, (e) CH-benCHmark, (f) TPC-C (× latency).]
Figure .: Storage Hierarchy Recommendation – (a) The total cost of the DRAM, NVM, and SSD devices used in a multi-tier storage system. (b-f) The performance/price numbers of candidate storage hierarchies on different benchmarks. Given a system cost budget and a target workload, the recommendation system performs a grid search to identify the storage hierarchy with the highest performance/price number.

DRAM and  GB NVM on top of SSD. The capacity of the DRAM buffer has increased from  GB with the × latency conguration. This shows that the utility of the DRAM buffer has increased due to slower NVM operations. .. EXPERIMENTAL EVALUATION 

[Figure: throughput (K Ops) heat maps over NVM latency (2×, 4×, 8×) and DRAM migration frequency (1-1000). Panels: (a) TPC-C, (b) Voter, (c) AuctionMark, (d) CH-benCHmark.]
Figure .: Performance Impact of Bypassing DRAM – Comparison of the buffer manager's throughput when it adopts lazy and eager data migration policies for DRAM. We measure the performance impact of these policies across different NVM latency configurations and DRAM migration frequencies (D).

The results in Figure . illustrate how the selection of a multi-tier storage system for a given workload depending on the working set size, the frequency of persistent writes, the system cost budget, and the performance and cost characteristics of NVM.

.. Data Migration Policies In this section, we look at the impact of data migration policies on runtime performance and the number of writes performed on NVM. We begin by comparing the performance of the buffer manager when it employs the lazy and eager policies presented in Section .. We consider a storage hierarchy with  GB DRAM and  GB NVM buffers on top of SSD. We quantify the performance impact of four data flow optimizations: () bypassing DRAM (Dr, Dw) and () bypassing NVM (Nr, Nw) while serving read and write operations. To derive insights that are applicable to a broader range of NVM technologies, we do this analysis across three NVM latency configurations ranging from –× DRAM latency.
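A minimal sketch of the migration-frequency knobs (Dr, Dw, Nr, Nw) follows. The interpretation used here, namely that a block is admitted into a tier only after it has been referenced a configured number of times, and that eager migration corresponds to a threshold of one, is an assumption for illustration; the buffer manager's actual bookkeeping may differ.

#include <cstdint>
#include <unordered_map>

struct MigrationPolicy {
  uint32_t dram_read;   // Dr: admit into DRAM after this many reads
  uint32_t dram_write;  // Dw: admit into DRAM after this many writes
  uint32_t nvm_read;    // Nr: admit into NVM after this many reads
  uint32_t nvm_write;   // Nw: admit into NVM after this many writes
};

class MigrationTracker {
 public:
  explicit MigrationTracker(MigrationPolicy policy) : policy_(policy) {}

  // Returns true if the block should be copied into DRAM on this read.
  bool ShouldMigrateToDramOnRead(uint64_t block_id) {
    return ++read_counts_[block_id] >= policy_.dram_read;
  }

  // Returns true if a dirty block should be buffered in DRAM on this write;
  // otherwise the write bypasses DRAM and lands directly on NVM.
  bool ShouldMigrateToDramOnWrite(uint64_t block_id) {
    return ++write_counts_[block_id] >= policy_.dram_write;
  }

 private:
  MigrationPolicy policy_;
  std::unordered_map<uint64_t, uint32_t> read_counts_;
  std::unordered_map<uint64_t, uint32_t> write_counts_;
};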

Performance Impact of Bypassing DRAM: Figure . illustrates the performance impact of bypassing DRAM while serving read and write operations. We vary the DRAM migration frequencies (Dr, Dw) in lockstep from  through . We configured the buffer manager to adopt an eager policy for NVM (Nr, Nw = ). Since the DRAM migration frequencies are updated in lockstep, we denote them by D. With the baseline policy (D = ), the buffer manager eagerly moves data to DRAM. The results in Figure . demonstrate that the lazy migration policies work well for DRAM on most workloads. For the TPC-C benchmark shown in Figure .a, the throughput observed when D is  is % higher than that with the eager migration policy on the × latency configuration. The reasons for this are twofold. First, the lazy policy reduces the data migration between NVM and DRAM. Second,

[Figure: throughput (K Ops) heat maps over NVM latency (2×, 4×, 8×) and NVM migration frequency (1-1000). Panels: (a) TPC-C, (b) Voter.]
Figure .: Performance Impact of Bypassing NVM – Comparison of the buffer manager's throughput when it adopts lazy and eager data migration policies for NVM. We measure the performance impact of these policies across different NVM latency configurations and NVM migration frequencies (N).

it ensures that only frequently referenced data are moved to DRAM. The performance gap drops to % on the × latency configuration. This is because the lazy policy amplifies the performance impact of slower NVM operations. The benefits of lazy data migration are more prominent on the write-intensive Voter workload. Bypassing DRAM while performing writes nearly doubles the throughput, as shown in Figure .b. With the lazy policy, the buffer manager directly flushes dirty blocks to NVM instead of first writing them on DRAM. Since DRAM write latencies are comparable to those of NVM, particularly on the × latency configuration, bypassing DRAM during writes shrinks the overall write latency. Unlike other workloads, the eager policy works well for the AuctionMark workload, as depicted in Figure .c. It outperforms the lazy policy (D = ) by % on the × latency configuration. This is because the workload's working set fits in the DRAM buffer and shifts over time. But the lazy policy delays the migration of hot data from NVM to DRAM, thereby reducing the utility of the DRAM buffer. The performance gap shrinks to % with a lazier policy (D = ). The reduction in data movement between DRAM and NVM dampens the impact of delayed migration of the working set. Lastly, on the CH-benCHmark workload, the lazy policy delivers % higher throughput than its eager counterpart, as shown in Figure .d. Unlike AuctionMark, the working set of this workload is more stable. Even though the lazy policy results in delayed migration, the buffer manager eventually loads the working set into the DRAM buffer. This illustrates that the optimal migration policy depends on the workload characteristics.

Performance Impact of Bypassing NVM: Figure . illustrates the performance impact of bypassing NVM while serving read and write operations. In this experiment, we vary the NVM migration frequencies (Nr, Nw) in lockstep from  through . We configured the buffer manager to adopt an eager policy for DRAM (Dr, Dw = ). Since the NVM migration frequencies are updated in lockstep, we denote them by N. The results in Figure . show that eager migration (N = ) works well for NVM on most workloads. For the TPC-C benchmark shown in Figure .a, the throughput observed when N is set to  is % lower than that with the eager policy on the × latency configuration. This is because the time spent on SSD operations increases by × due to bypassing NVM during writes. The performance impact of lazy migration marginally drops to % on the × latency configuration. Slower NVM operations dampen the effect of writes landing on SSD.

[Figure: number of writes to NVM (K Ops) as a function of NVM migration frequency (1-1000). Panels: (a) TPC-C, (b) Voter.]
Figure .: Impact of Bypassing NVM on Writes to NVM – Comparison of the number of writes performed on NVM when the buffer manager adopts lazy and eager data migration policies for NVM. We measure the impact of these policies across different NVM migration frequencies (N).

[Figure: throughput (K Ops) heat maps over NVM latency (2×, 4×, 8×) and DRAM migration frequency (1-1000). Panels: (a) TPC-C ( MB DRAM), (b) TPC-C ( GB DRAM).]
Figure .: Impact of Storage Hierarchy – Comparison of the optimal data migration policy decision for bypassing DRAM across different storage hierarchies, NVM latency configurations, and DRAM migration frequencies.

The performance impact of NVM bypass is more prominent on the Voter workload shown in Figure .b. The throughput drops by % when N is set to  on the × latency configuration. These results illustrate that while lazy migration policies work well for DRAM, eager policies are a better fit for NVM.

Impact of NVM Bypass on Writes to NVM: Although lazy data migration negatively impacts runtime performance, it reduces the number of writes performed on NVM. Figure . presents the impact of NVM bypass on the number of NVM writes. For the TPC-C benchmark, as shown in Figure .a, the buffer manager performs .× fewer writes to NVM with a lazy migration policy (N = ) in comparison to eager migration. The impact of NVM bypass on the number of writes performed on NVM is equally pronounced on the Voter workload, as shown in Figure .b. Adopting the lazy migration policy (N = ) reduces the number of NVM writes by .×. These results illustrate that the optimal data migration policy must be chosen depending on the runtime performance requirements and write endurance characteristics of NVM.

Impact of Storage Hierarchy: We next consider how the optimal data migration policy varies across storage hierarchies. In this experiment, we consider two three-tier storage hierarchies with  MB and  GB DRAM buffers. We configured both systems to use a  GB NVM buffer on top of SSD. The results for the TPC-C benchmark depicted in Figure . show that the utility of lazy data migration varies across storage systems.

On the first system, as shown in Figure .a, the throughput with lazy migration (D = ) is % higher than that with its eager counterpart. The performance gap between the policies on this system is larger than that observed with a larger DRAM buffer ( GB) in Figure .a. This is because the lazy policy increases the utility of the smaller DRAM buffer by not polluting it with colder data. For this system, the optimal migration frequency remains unchanged even on slower latency configurations. The results for the second system shown in Figure .b illustrate that the lazy policy delivers % higher throughput on the × latency configuration. The utility of lazy migration is not as prominent on this system since the capacity of the DRAM buffer is one-eighth of that of the NVM buffer. The eager policy (D = ) outperforms its lazy counterpart on the × latency configuration. This is because the latter policy amplifies the impact of slower NVM operations, particularly when the relative size of the DRAM buffer compared to the NVM buffer is large. We conclude that the optimal migration policy depends not only on the workload and device characteristics, but also on the relative size of the DRAM buffer compared to the NVM buffer.

.. Buffer Management Policy Comparison In this section, we compare our proposed data migration policy against that presented by van Renen et al. []. We refer to the former and latter policies as A and B, respectively. We developed A based on the insights presented in Section ... This policy consists of lazy migration for DRAM (Dr = , Dw = ) and for NVM during reads (Nr = ), and eager migration for NVM during writes (Nw = ). We configured B following the guidelines in []. B consists of eager migration for DRAM (Dr = , Dw = ), and lazy migration for NVM during reads (Nr = ∞) and writes (Nw = ). These policies differ in two ways. With the former policy, the buffer manager initially moves data into NVM and lazily migrates it to DRAM. It frequently bypasses DRAM during writes and directly persists data on NVM. With B, the buffer manager initially moves data into DRAM and stores data evicted from DRAM on NVM. It bypasses NVM during writes in order to ensure that only frequently referenced data is stored on NVM. The results in Figure . illustrate that A works well across different workloads. For the TPC-C workload shown in Figure .a, it outperforms B by .×. The reasons for this are twofold. First, with the latter policy, the buffer manager bypasses NVM during writes. Although this scheme reduces the number of writes to NVM by ×, it increases the time spent by the buffer manager on SSD operations by .×. The former policy circumvents this problem by absorbing more writes on NVM. The buffer manager often reclaims space in the NVM buffer by discarding unmodified blocks. Second, bypassing DRAM during reads (Dr = ) reduces the data migration overhead between NVM and DRAM and ensures that only frequently referenced blocks are stored on DRAM. The performance gap between these policies shrinks on the × NVM latency configuration since slower NVM operations reduce the utility of lazy migration for DRAM. The results in Figure .b illustrate the utility of eager migration to NVM during writes. A outperforms B by .× on this workload. With the former policy, the buffer manager directly persists data on NVM instead of first buffering it on DRAM. Since DRAM write latencies are comparable to those of NVM, particularly on the × latency configuration, bypassing DRAM during writes reduces the overall write latency, thereby improving runtime performance.

[Figure: bar charts of throughput (K Ops) for migration policies A and B across NVM latency configurations (2×, 4×, 8×). Panels: (a) TPC-C, (b) Voter.]
Figure .: Performance Impact of Buffer Management Policies – The impact of different buffer management policies on runtime performance across different NVM latency configurations.

[Figure: throughput (K Ops) over the operation sequence (M operations) for the TPC-C and Voter benchmarks.]
Figure .: Adaptive Data Migration – The impact of buffer management policy adaptation on runtime performance across different workloads.

With policy B, NVM latency has little impact on runtime performance: the throughput drops by only % when we transition from a × latency configuration to a × configuration. This is because lazy migration to NVM increases the time spent on SSD operations, thereby reducing the impact of slower NVM operations.

.. Adaptive Data Migration In the previous experiments, we examined the utility of a fixed data migration policy. But identifying the optimal data migration policy is challenging due to the diversity of workloads and storage hierarchies. Thus, we now examine the ability of the buffer manager to automatically adapt the management policy at runtime. In this experiment, the buffer manager begins executing the workload with an eager policy for both DRAM (D = ) and NVM (N = ). During execution, it adapts the policy using the simulated annealing (SA) algorithm presented in Section .. This technique searches for the policy that maximizes the throughput given a target workload and storage hierarchy. We use an operation sequence with M entries. We configure the duration of a tuning step to be M operations to ensure that the impact of policy changes is clearly visible to the SA algorithm. The results in Figure . show that the buffer manager converges to a near-optimal policy for different workloads without requiring any manual tuning. For the TPC-C and Voter workloads, tuning the data migration policy increases throughput by % and %, respectively. The buffer manager converges to a hybrid policy, with lazy migration for DRAM and eager migration for NVM on both workloads. The throughput converges to a global optimum over time. We attribute this to the gradual cooling mechanism in SA that decreases the probability of accepting worse policies.
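The sketch below outlines a simulated-annealing tuner of this flavor. The neighbor-generation rule, the cooling schedule, and the run_tuning_step() hook (which executes one tuning step under a candidate policy and returns the observed throughput) are illustrative assumptions; the dissertation's tuner may use different settings.

#include <cmath>
#include <cstdint>
#include <functional>
#include <random>

struct Policy { uint32_t dram_frequency; uint32_t nvm_frequency; };

Policy Anneal(Policy current,
              const std::function<double(const Policy&)>& run_tuning_step,
              double temperature = 1.0, double cooling = 0.9, int steps = 50) {
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  const uint32_t grid[] = {1, 2, 5, 10, 100, 1000};  // assumed candidate frequencies
  std::uniform_int_distribution<int> pick(0, 5);

  double current_throughput = run_tuning_step(current);
  for (int i = 0; i < steps; i++) {
    // Perturb one knob at random to obtain a neighboring policy.
    Policy candidate = current;
    if (coin(rng) < 0.5) candidate.dram_frequency = grid[pick(rng)];
    else                 candidate.nvm_frequency  = grid[pick(rng)];

    double candidate_throughput = run_tuning_step(candidate);
    double delta = candidate_throughput - current_throughput;
    // Accept improvements always; accept regressions with a probability that
    // shrinks as the temperature cools, so the search settles near an optimum.
    if (delta > 0 ||
        coin(rng) < std::exp(delta / (temperature * current_throughput))) {
      current = candidate;
      current_throughput = candidate_throughput;
    }
    temperature *= cooling;
  }
  return current;
}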

. Summary This chapter explored the changes required in the buffer manager to leverage the unique properties of NVM in systems that still include DRAM and SSD. We introduced a taxonomy of data migration policies for DRAM and NVM. Our evaluation shows that an NVM-based storage hierarchy outperforms a similarly priced DRAM-SSD system by up to ×. We demonstrate that our continuous adaptation mechanism allows the DBMS to achieve a near-optimal data migration policy for an arbitrary workload without requiring any manual tuning. Thus far, this dissertation has focused on the implications of NVM for storage management, logging and recovery algorithms, and buffer management. Lastly, we examine the implications of NVM for index data structures in the next chapter.

Chapter 

Indexing

Multi-threaded concurrency is one of the keys to unlocking high performance in main-memory databases. To achieve concurrency on modern CPUs, several systems – both research and commercial – implement latch-free index structures to avoid bottlenecks inherent in latching (locking) protocols. For instance, MemSQL uses latch-free skip-lists [], while Microsoft's Hekaton main-memory OLTP engine uses the Bw-tree [], a latch-free B+Tree. The algorithms for latch-free index designs are often complex. They rely on atomic CPU hardware primitives such as compare-and-swap (CAS) to atomically modify index state. These atomic instructions are limited to a single word, and non-trivial data structures – such as a latch-free B+Tree – usually require multi-word updates, e.g., to handle operations like node splits and merges. These operations have to be broken up into multiple steps, thereby exposing intermediate states to other threads. As a result, the algorithms must handle subtle race conditions that may occur when intermediate states are exposed. In addition, some designs sacrifice performance to achieve latch-freedom. An example is the Bw-tree [], which uses a mapping table to map logical page identifiers to physical pointers. Nodes in the Bw-tree store logical pointers and must dereference the mapping table on each node access during traversal of the index. Such indirection leads to degraded performance on modern CPUs. Storing a latch-free index on NVM potentially enables both high performance and fast recovery. But it further complicates the implementation of latch-free indexes. The added complexity is mainly caused by the fact that CAS and other atomic hardware instructions do not persist their updates to NVM automatically and atomically. An update only modifies the target word in the processor cache and does not automatically update the target word in NVM. In case of a power failure, the volatile cache content is lost and the data in NVM may be left in an inconsistent state. Hence, we need a persistence protocol to ensure that an index recovers correctly after a system crash. In this chapter, we propose the BzTree, a high-performance latch-free B+Tree design for main-memory databases that has the following benefits. Reduced Complexity: The BzTree implementation makes use of PMwCAS: a high-performance, multi-word, compare-and-swap operation that also provides persistence guarantees when used on NVM []. The PMwCAS operation is implemented in software and requires no special hardware support other than a CAS (or equivalent) instruction. It is itself latch-free and either atomically


installs all new values or fails the operation without exposing intermediate state. Using PMwCAS to build a latch-free index has two major advantages. First, the PMwCAS guarantees that all multi-word updates are atomic, thus avoiding the need to handle complex race conditions that result from exposing intermediate state during multi-word operations. Second, PMwCAS allows the BzTree to avoid the logical-to-physical indirection used, for example, in the Bw-tree []. The BzTree stores direct memory pointers in both index and leaf nodes. High Performance: Using the YCSB workload on volatile RAM, we show that the BzTree outperforms the Bw-tree, a state-of-the-art index designed for DRAM-based systems. Given its portability, we also experimentally demonstrate that the penalty for running the BzTree on NVM is low: on realistic workloads, the overhead of persistence is % on average. We also show that the use of PMwCAS exhibits negligible contention even for larger multi-word operations. Even for highly skewed YCSB access patterns, the failure rate for updating multiple words across multiple BzTree nodes is only .% on average. Seamless Portability to NVM: The same BzTree implementation can run on both volatile DRAM and on NVM without any code changes. PMwCAS guarantees that upon success of an update (in this case to B+Tree nodes), the operation will be durable on NVM and persist across failures. Remarkably, recovery is handled entirely by the PMwCAS library without any BzTree-specific recovery code. The rest of this chapter is organized as follows. In Section ., we present the necessary background on the PMwCAS primitive. Section . presents the BzTree node layout and single-node updates, while Section . covers structure modifications. Durability and recoverability on NVM are covered in Section .. We present our experimental evaluation in Section . and conclude the chapter in Section ..

. Persistent Multi-Word CAS The BzTree relies on an efficient and persistent multi-word compare-and-swap operation, named PMwCAS, to update state in a latch-free and persistent manner. The design is based on a volatile version by Harris et al. [], which we enhance to guarantee persistence on NVM (details in []). The approach uses a descriptor to track metadata for the operation (details described later); these descriptors are pooled and eventually reused. The API for the PMwCAS is: ● AllocateDescriptor(callback = default): Allocate a descriptor that will be used throughout the PMwCAS operation. The user can provide a custom callback function for recycling memory pointed to by the words in the PMwCAS operation. ● Descriptor::AddWord(address, expected, desired): Specify a word to be modified. The caller provides the address of the word, the expected value, and the desired value. ● Descriptor::ReserveEntry(addr, expected, policy): Similar to AddWord except the new value is left unspecified; returns a pointer to the new_value field so it can be filled in later. Memory referenced by old_value/new_value will be recycled according to the specified recycling policy. ● Descriptor::RemoveWord(address): Remove the word previously specified as part of the PMwCAS.

PMwCAS status: UNDECIDED. PMwCAS size: 3 sub-operations.
Target Word's Address | Expected (Old) Value | New Value | Dirty Bit | Memory Recycling Policy
Address-1 | Value O | Value N | 0 | NONE
Address-2 | Pointer X | Pointer Y | 1 | FREE ONE
Address-3 | Pointer Q | Pointer R | 1 | FREE ONE
Figure .: PMwCAS Descriptor Table – Contents of the descriptor table used by threads to share information about the PMwCAS operation.

● PMwCAS(descriptor): Execute the PMwCAS and return true if it succeeded. ● Discard(descriptor): Cancel the PMwCAS (only valid before calling PMwCAS). No specified word will be modified. The API is identical for both volatile and persistent MwCAS. Under the hood, PMwCAS provides all the needed persistence guarantees, without additional actions by the application. To use PMwCAS, the application first allocates a descriptor and invokes the AddWord or ReserveEntry method once for each word to be modified. It can use RemoveWord to remove a previously specified word if needed. AddWord and ReserveEntry ensure that target addresses are unique and return an error if they are not. Calling PMwCAS executes the operation, while Discard aborts it. A failed PMwCAS will leave all target words unchanged. This behavior is guaranteed across a power failure when operating on NVM.
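The following usage sketch shows the API in action. The C++ declarations below are assumptions that stand in for the real library headers; only the operation names and their semantics come from the description above.

#include <cstdint>

// Assumed library surface for the sketch.
struct Descriptor {
  void AddWord(uint64_t* address, uint64_t expected, uint64_t desired);
};
Descriptor* AllocateDescriptor();   // assumed free-function form of the API
bool PMwCAS(Descriptor* descriptor);

// Atomically (and, on NVM, durably) swap two independent 64-bit words:
// either both updates take effect or neither does, and no intermediate
// state is visible to other threads.
bool SwapTwoWords(uint64_t* word_a, uint64_t old_a, uint64_t new_a,
                  uint64_t* word_b, uint64_t old_b, uint64_t new_b) {
  Descriptor* desc = AllocateDescriptor();
  desc->AddWord(word_a, old_a, new_a);
  desc->AddWord(word_b, old_b, new_b);
  return PMwCAS(desc);
}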

.. Durability When running on NVM, the PMwCAS provides durability guarantees through the use of instructions to selectively flush or write back a cache line, e.g., via the cache line write-back (CLWB) or cache line flush (CLFLUSH) instructions on Intel processors []. These instructions are carefully placed to ensure linearizable reads and writes and also guarantee correct recovery in case of a crash or power failure. This is achieved by using a single dirty bit on all modified words that are observable by other threads during the PMwCAS. For example, each modification that installs a descriptor address (or target value) sets a dirty bit to signify that the value is volatile, and that a reader must flush the value and unset the bit before proceeding. This protocol ensures that, before any dependent write is issued, the value read is guaranteed to survive a power failure.
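A sketch of this flush-before-read protocol follows, assuming the dirty flag occupies a designated high-order bit of the 64-bit word; the actual bit assignment is internal to the PMwCAS library.

#include <atomic>
#include <cstdint>
#include <immintrin.h>  // _mm_clwb, _mm_sfence (requires CLWB support)

constexpr uint64_t kDirtyBit = 1ull << 61;  // assumed position of the dirty flag

uint64_t ReadPersistent(std::atomic<uint64_t>* word) {
  uint64_t value = word->load(std::memory_order_acquire);
  if (value & kDirtyBit) {
    _mm_clwb(word);   // write the cache line back to NVM
    _mm_sfence();     // order the write-back before clearing the flag
    word->compare_exchange_strong(value, value & ~kDirtyBit);  // unset the dirty bit
    value &= ~kDirtyBit;
  }
  return value;       // safe to act on: this value survives a power failure
}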

.. Execution Internally, PMwCAS makes use of a descriptor that stores all the information needed to complete the operation. Figure . depicts an example descriptor for three target words. A descriptor contains, for each target word, () the target word's address, () the expected value to compare against, () the new value, () the dirty bit, and () a memory recycling policy. The policy field indicates whether the new and old values are pointers to memory objects and, if so, which objects are to be freed on the successful completion (or failure) of the operation. The descriptor also contains a status word

tracking the operation's progress. The PMwCAS operation itself is latch-free; the descriptor contains enough information for any thread to help complete (or roll back) the operation. The operation consists of two phases. Phase : This phase attempts to install a pointer to the descriptor in each target address using a double-compare single-swap (RDCSS) operation []. RDCSS applies a change to a target word only if the values of two words (including the one being changed) match their specified expected values. That is, RDCSS requires an additional "expected" value to compare against (but not modify) compared to a regular CAS. RDCSS is necessary to guard against subtle race conditions and maintain a linearizable sequence of operations on the same word. Specifically, we must guard against the installation of a descriptor for a completed PMwCAS (p1) that might inadvertently overwrite the result of another PMwCAS (p2), where p2 should occur after p1 (details in []). A descriptor pointer in a word indicates that a PMwCAS is underway. Any thread that encounters a descriptor pointer helps complete the operation before proceeding with its own work, making PMwCAS cooperative (typical for latch-free operations). We use one high-order bit (in addition to the dirty bit) in the target word to signify whether it is a descriptor or regular value. Descriptor pointer installation proceeds in target-address order to avoid deadlocks between two competing PMwCAS operations that might concurrently overlap. Upon completing Phase , a thread persists the target words whose dirty bit is set. To ensure correct recovery, this must be done before updating the descriptor's status field and advancing to Phase . We update status using CAS to either Succeeded or Failed (with the dirty bit set) depending on whether Phase  succeeded. We then persist the status field and clear its dirty bit. Persisting the status field "commits" the operation, ensuring its effects survive even across power failures. Phase : If Phase  succeeds, the PMwCAS is guaranteed to succeed, even if a failure occurs – recovery will roll forward with the new values recorded in the descriptor. Phase  installs the final values (with the dirty bit set) in the target words, replacing the pointers to the descriptor. Since the final values are installed one by one, it is possible that a crash in the middle of Phase  leaves some target fields with new values, while others point to the descriptor. Another thread might have observed some of the newly installed values and taken dependent actions (e.g., performing a PMwCAS of its own) based on the read. Rolling back in this case might cause data inconsistencies. Therefore, it is crucial to persist status before entering Phase . The recovery routine (covered next) can then rely on the status field of the descriptor to decide if it should roll forward or backward. If the PMwCAS fails in Phase , Phase  becomes a rollback procedure by installing the old values (with the dirty bit set) in all target words containing a descriptor pointer. Recovery: Due to the two-phase execution of PMwCAS, a target address may contain a descriptor pointer or normal value after a crash. Correct recovery requires that the descriptor be persisted before entering Phase . The dirty bit in the status field is cleared because the caller has not started to install descriptor pointers in the target fields; any failure that might occur before this point does not affect data consistency upon recovery.
The PMwCAS descriptors are pooled in a memory location known to recovery. Crash recovery then proceeds by scanning the descriptor pool. If a descriptor's status field signifies success, the

operation is rolled forward by applying the target values in the descriptor; if the status signifies failure, it is rolled back by applying the old values. Uninitialized descriptors are simply ignored. Therefore, recovery time is determined by the number of in-progress PMwCAS operations during the crash; this is usually on the order of the number of threads, meaning very fast recovery. In fact, in an end-to-end recovery experiment for the BzTree, we measured an average recovery time of  µs when running a write-intensive workload with  threads. Memory Management: Latch-free data structure implementations require a mechanism to manage memory lifetime and garbage collection; since there are no locks protecting memory deallocation, the system must ensure no thread can dereference a block of memory before it is freed. The BzTree uses a high-performance epoch-based recycling scheme []. A thread joins the current epoch before each operation it performs on the index to protect the memory it accesses from reclamation. It exits the epoch when it finishes its operation. When all the threads that joined an epoch E have completed and exited, the garbage collector reclaims the memory occupied by the descriptors deallocated in E. This ensures that no thread can possibly dereference a pointer after its memory is reclaimed. Since the PMwCAS is latch-free, descriptor memory lifetime is managed by the epoch-based recycling scheme. This ensures that no thread can possibly dereference a descriptor pointer after its memory is reclaimed and reused by another PMwCAS. If any of the -byte expected or target values are pointers to larger memory objects, these objects can also be managed by the same memory reclamation scheme. Each word in the descriptor is marked with a memory recycling policy that denotes whether and what memory to free on completion of the operation. For instance, if a PMwCAS succeeds, the user may want memory behind the expected (old) value to be freed once the descriptor is deemed safe to recycle. Section . discusses the details of the interplay between PMwCAS and memory reclamation.
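A coarse sketch of epoch-based reclamation is given below. The class name, the locking, and the "free everything retired before the oldest active epoch" rule are simplifications for illustration; the scheme actually used by PMwCAS and the BzTree is latch-free and more elaborate.

#include <cstdint>
#include <functional>
#include <list>
#include <map>
#include <mutex>
#include <utility>

class EpochManager {
 public:
  // A thread calls Enter() before touching the index and Exit() afterwards,
  // so that nothing it may still reference is reclaimed in between.
  uint64_t Enter() {
    std::lock_guard<std::mutex> g(mutex_);
    active_count_[current_]++;
    return current_;
  }
  void Exit(uint64_t epoch) {
    std::lock_guard<std::mutex> g(mutex_);
    if (--active_count_[epoch] == 0) active_count_.erase(epoch);
  }
  // Defer freeing an unlinked object (e.g., a retired descriptor or node).
  void Retire(std::function<void()> free_fn) {
    std::lock_guard<std::mutex> g(mutex_);
    retired_.push_back({current_, std::move(free_fn)});
  }
  // Advance the epoch and free everything retired in epochs that no active
  // thread can still be observing.
  void Advance() {
    std::lock_guard<std::mutex> g(mutex_);
    current_++;
    uint64_t oldest_active =
        active_count_.empty() ? current_ : active_count_.begin()->first;
    auto it = retired_.begin();
    while (it != retired_.end() && it->first < oldest_active) {
      it->second();            // safe: retired before the oldest active epoch began
      it = retired_.erase(it);
    }
  }

 private:
  std::mutex mutex_;
  uint64_t current_ = 1;
  std::map<uint64_t, uint64_t> active_count_;                       // epoch -> active threads
  std::list<std::pair<uint64_t, std::function<void()>>> retired_;   // FIFO by retire epoch
};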

. BzTree Architecture and Design .. Architecture The BzTree is a high-performance main-memory B+Tree. Internal nodes store search keys and pointers to child nodes. Leaf nodes store keys and either record pointers or actual payload values. Keys can be variable or fixed length. Our experiments assume leaf nodes store -byte record pointers as payloads (common in main-memory databases []), though we also discuss how to handle full variable-length payloads. The BzTree is a range access method that supports standard atomic key-value operations (insert, read, update, delete, range scan). Typical of most access methods, it can be deployed as a stand-alone key-value store, or embedded in a database engine to support ACID transactions, where concurrency control takes place outside of the access method as is common in most systems (e.g., within a lock manager) [, ]. Persistence Modes: We assume a system model with a single-level store where NVM is attached directly to the memory bus. The system may also contain DRAM, which is used as working storage. A salient feature of the BzTree is that its design works for both volatile and persistent environments. In volatile mode, BzTree nodes are stored in volatile DRAM. Content is lost after a system failure.

This mode is appropriate for use in existing main-memory system designs (e.g., Microsoft Hekaton []) that already contain recovery infrastructure to recover indexes. In durable mode, both internal and leaf nodes are stored in NVM. The BzTree guarantees that all updates are persistent and the index can recover quickly to a correct state after a failure. For disaster recovery (media failure), the BzTree must rely on common solutions like database replication. Metadata: Besides nodes, there are only two other -bit values used by the BzTree: ● Root pointer. This is a -bit pointer to the root node of the index. When running in persistence mode, this value is persisted in a known location to find the index upon restart. ● Global index epoch. When running in persistence mode, the BzTree is associated with an index epoch number. This value is drawn from a global counter (one per index) that is initially zero for a new index and incremented only when the BzTree restarts after a crash. This value is persisted in a known location, and is used for recovery purposes and to detect in-flight operations (e.g., space allocations within nodes) during a crash. We elaborate on the use of this value in Sections . and ..

.. Complexity and Performance The BzTree design addresses implementation complexities and performance drawbacks of state-of-the-art latch-free range indexes. Implementation Complexities: State-of-the-art range index designs usually rely on atomic primitives to update state. This is relatively straightforward for single-word updates. For example, the Bw-tree [] updates a node using a single-word CAS to install a pointer to a delta record within a mapping table. Likewise, designs like the MassTree [] use a CAS on a status word to arbitrate node updates. The implementation becomes more complex when handling multi-location updates, such as node splits and merges that grow (or shrink) an index. The Bw-tree breaks multi-node operations into steps that can be installed with a single atomic CAS; similar approaches are taken by other high-performance indexes to limit latching across nodes. These multi-step operations expose intermediate state to threads that concurrently access the index. This means the implementation must have special logic in place to allow a thread to (a) recognize when it is accessing an incomplete index (e.g., seeing an in-progress split or node delete) and (b) take cooperative action to help complete an in-progress operation. This logic leads to code "bloat" and subtle race conditions that are difficult to debug []. As we will see, the BzTree uses the PMwCAS primitive to update index state. We show that this approach performs well even when updating multiple nodes atomically. The BzTree thus avoids the subtle race conditions for more complex multi-node operations. In fact, using cyclomatic complexity analysis, we show that the BzTree design is at least half as complex as the Bw-tree and MassTree [], two state-of-the-art index designs. Performance Considerations: Some latch-free designs such as the Bw-tree rely on indirection through a mapping table to isolate updates (and node reorganizations) to a single location. Bw-tree nodes store logical node pointers, which are indexes into the mapping table storing the physical node

Cyclomatic complexity is a quantitative measure of the number of linearly independent paths through source code.

(a) Header (128 bits): Node Size (32), Status Word (64), Sorted Count (32).
(b) Record metadata entry (64 bits): PMwCAS Control (3), Visible (1), Offset (28), Key Length (16), Total Length (16).
(c) Status word (64 bits): PMwCAS Control (3), Frozen (1), Record Count (16), Block Size (22), Delete Size (22).
Figure .: Node Layout – Node layout and details for the BzTree.

pointers. This approach comes with a tradeoff. While it avoids propagation of pointer changes up the index, e.g., to parent nodes, it requires an extra pointer dereference when accessing each node. This effectively doubles the number of pointer dereferences during index traversal, leading to reduced performance, as we show in our experimental evaluation (Section .). The BzTree does not rely on indirection to achieve latch-freedom. Interior index nodes store direct pointers to child nodes to avoid costly extra pointer dereferences during traversal. As we show later in Section ., this translates into higher performance when compared to the state of the art in latch-free index design.

. BzTree Nodes In this section, we begin by describing the BzTree node organization and then discuss how the BzTree supports latch-free reads and updates on these nodes. We then describe node consolidation: an operation that reorganizes a node to reclaim dead space and speed up search. We defer discussion of multi-node operations such as splits and merges until Section ..

.. Node Layout The BzTree node representation follows a typical slotted-page layout, where fixed-size metadata grows "downward" into the node, and variable-length storage (key and data) grows "upward." Specifically, a node consists of: () a fixed-size header, () an array of fixed-size record metadata entries, () free space that buffers updates to the node, and () a record storage block that stores variable-length keys and payloads. All fixed-size metadata is packed into -bit aligned words so that it can easily be updated in a latch-free manner using PMwCAS. Header: The header is located at the beginning of a node and consists of three fields as depicted in Figure .a: () a node size field ( bits) that stores the size of the entire node, () a status word field ( bits) that stores metadata used for coordinating updates to a node (content discussed later in this section), and () a sorted count field ( bits), representing the last index in the record metadata array in sorted order; any entries beyond this point might be unsorted and represent new records added to the node.

Record Metadata Array: Figure .b depicts an entry in the record metadata array that consists of: () flag bits ( bits) that are broken into PMwCAS control bits ( bits) used as internal metadata for the PMwCAS (e.g., to mark dirty words that require a flush) along with a visible flag ( bit) used to mark a record as visible, () an offset value ( bits) that points to the full record entry in the key-value storage block, () a key length field ( bits) that stores the variable-length key size, and () a total length field ( bits) that stores the total length of the record block; subtracting key length from this value provides the record payload size. Free Space: Free space is used to absorb modifications to a node such as record inserts. This free space sits between the fixed-size record metadata array and the record storage block. The record metadata array grows "downward" into this space, while the data storage block grows "upward." However, internal index nodes do not contain free space; as we will discuss later, these nodes are search-optimized and thus do not buffer updates, as doing so results in degraded binary search performance. Record Storage Block: Entries in this block consist of contiguous key-payload pairs. Keys are variable-length byte strings. Payloads in internal BzTree nodes are fixed-length (-byte) child node pointers. In this chapter, we assume payloads stored in leaf nodes are -byte record pointers (as is common in main-memory databases []). However, the BzTree also supports storing full variable-length payloads within leaf nodes. We discuss how to update both types of payloads later in this section. Status Word: The status word, depicted in Figure .c, is a -bit value that stores node metadata that changes during an update. For leaf nodes, this word contains the following fields: () PMwCAS control bits ( bits) used to atomically update the word, () a frozen flag ( bit) that signals that the node is immutable, () a record count field ( bits) that stores the total number of entries in the record metadata array, () a block size field ( bits) storing the number of bytes occupied by the record storage block at the end of the node, and () a delete size field ( bits) that stores the amount of logically deleted space on the node, which is useful for deciding when to merge or reorganize the node. Status words for internal nodes only contain the first two fields; this is because we do not perform singleton updates on internal nodes and thus do not need the other fields. We opt to replace internal nodes wholesale (e.g., when adding or deleting a record) for search performance reasons. Internal and Leaf Node Differences: Besides status word format, internal and leaf nodes differ in that internal nodes are immutable once created, while leaf nodes are not. Internal nodes only store records in sorted order by key (for fast binary search) and do not contain free space. Leaf nodes, on the other hand, contain free space to buffer inserts (and updates if the leaf nodes store full record payloads). This means that leaf nodes consist of both sorted records (records present during node creation) and unsorted records (records added to the page incrementally). We chose this approach because the vast majority of updates in a B+Tree occur at the leaf level, thus it is important to have leaf nodes quickly absorb record updates "in place". On the other hand, internal index nodes are read-mostly and change less frequently, thus can tolerate wholesale replacement, e.g., when adding a new key as a result of a node split. In our experience, keeping internal index nodes search-optimized leads to better performance than an alternative approach that organizes internal nodes with both sorted and unsorted key space [].
PMwCAS relies on these bits to function properly. A detailed description is available in [].
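The metadata described above can be summarized as the following C++ bit-field sketch, with field widths taken from the node-layout figure. Real code packs these into plain 64-bit words so they can be updated with PMwCAS; the bit-field form here is only for readability.

#include <cstdint>

struct StatusWord {            // 64 bits, one per node
  uint64_t control      : 3;   // PMwCAS control bits
  uint64_t frozen       : 1;   // node is immutable when set
  uint64_t record_count : 16;  // entries in the record metadata array
  uint64_t block_size   : 22;  // bytes used by the record storage block
  uint64_t delete_size  : 22;  // logically deleted bytes on the node
};

struct RecordMetadata {        // 64 bits, one per record
  uint64_t control      : 3;   // PMwCAS control bits
  uint64_t visible      : 1;   // record is visible to readers when set
  uint64_t offset       : 28;  // offset of the record in the storage block
                               // (holds the allocation epoch during an insert)
  uint64_t key_length   : 16;  // length of the variable-length key
  uint64_t total_length : 16;  // key plus payload length
};

struct NodeHeader {            // 128 bits at the start of every node
  uint32_t node_size;          // size of the entire node
  uint32_t sorted_count;       // last sorted index in the record metadata array
  StatusWord status;           // coordinates concurrent updates
};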

.. Leaf Node Operations This section describes the latch-free read and update operations on BzTree leaf nodes. For writes, the basic idea is to employ the PMwCAS to manipulate the page and record metadata atomically in a latch-free manner, for both reserving space (in the case of copying variable-length data into the page) and making the update "visible" to concurrent threads accessing the page. Readers access pages uncontested; they are not blocked by writers. Table . summarizes the PMwCAS operations associated with all the tree operations.

Inserts New records are added to the free space available in the node. To insert a new record r, a thread first reads the frozen bit. If it is set, this means the page is immutable and may no longer be part of the index (e.g., due to a concurrent node delete). In this case, the thread must re-traverse the index to find the new incarnation of the "live" leaf node. Otherwise, the thread reserves space for r in both the record metadata array and record storage block. This is done by performing a -word PMwCAS on the following fields: () the node's status word to atomically increment the record count field by one and add the size of r to the block size value, and () the record metadata array entry to flip the offset field's high-order bit and set the rest of its bits equal to the global index epoch. If this PMwCAS succeeds, the reservation is a success. The offset field is overridden during this phase to remember the allocation's index epoch. We refer to this value as the allocation epoch and use it for recovery purposes (explained later). We steal the high-order bit to signal whether the value is an allocation epoch (set) or actual record offset (unset). The insert proceeds by copying the contents of r to the storage block and updating the fields in the corresponding record metadata entry, initializing the visible flag to  (invisible). Once the copy completes, the thread flushes r (using CLWB or CLFLUSH) if the index must ensure persistence. It then reads the status word value s to again check the frozen bit, aborting and retrying if the page became frozen (e.g., due to a concurrent structure modification). Otherwise, the record is made visible by performing a -word PMwCAS on () the -bit record metadata entry to set the visible bit and also set the offset field to the actual record block offset (with its high-order bit unset) and () the status word, setting it to s (the same value initially read) to detect conflict with a concurrent thread trying to set the frozen bit. If the PMwCAS succeeds, the insert is a success. Otherwise, the thread re-reads the status word (ensuring the frozen bit is unset) and retries the PMwCAS. The BzTree must be able to detect concurrent inserts of the same key to enforce, for instance, unique key constraints. We implement an optimistic protocol to detect concurrent key operations as follows. When an insert operation first accesses a node, it searches the sorted key space for its key and aborts if the key is present. Otherwise, it continues its search by scanning the unsorted key space. If it sees any record with an unset visible flag and an allocation epoch value equal to

the current global index epoch, this means it has encountered an in-progress insert that may be for the same key. An entry with an unset visible flag and an allocation epoch not equal to the global index epoch means it is either deleted or its allocation was in-progress during a crash from a previous incarnation of the index, and can be ignored (details in Section ..). Instead of waiting for the in-progress insert to become visible, the thread sets an internal recheck flag to remember to re-scan the unsorted key space and continues with its insert. The recheck flag is also set if the thread loses a PMwCAS to reserve space for its insert, since the concurrent reservation may be for the same key. Prior to setting its own visibility bit, the thread re-scans the unsorted key space if the recheck flag is set and examines all prior entries before its own position. Upon encountering a duplicate key, the thread zeroes out its entry in the record storage block and sets its offset value to zero; these two actions signify a failed operation that will be ignored by subsequent searches. If the thread encounters an in-progress operation during its scan, it must wait for the record to become visible, since this represents an operation that serialized behind the insert and may contain a duplicate key.
Note that setting this field atomically along with the reservation is safe, since it will only succeed if the space allocation succeeds.
Table .: PMwCAS Summary Table – The size of the PMwCAS operations associated with different node and structure modification operations.
Node operations: Insert [Allocation, Completion]; Delete; Update [Record Pointer, Inlined Payload]; Node Consolidation.
Structure modification operations (SMOs): Node Split [Preparation, Installation]; Node Merge [Preparation, Installation].

Delete To delete a record, a thread performs a -word PMwCAS on () a record's metadata entry to unset its visible bit and set its offset value to zero, signifying a deleted record, and () the node status word to increment the delete size field by the size of the target record. If the PMwCAS fails due to a concurrent delete or conflict on the status word, the thread retries the delete. If the failure is due to a concurrent operation that set the frozen bit on the node, the delete must re-traverse the index to retry on a mutable leaf node. Incrementing delete size allows the BzTree to determine when to delete or consolidate a node (Section .).

Update There are two methods to update an existing record, depending on whether a leaf node stores record pointers or full payloads. ● Record Pointers: If leaf nodes contain record pointers and the user wishes to update a record in place, the BzTree is passive and the update thread can traverse the pointer to access the record memory directly. If the update requires swapping in a new record pointer, this can be done in place within the record storage block. To do this, a thread reads both (a) the record metadata entry m to ensure it is not deleted and (b) the status word s to ensure the node is not

[Figure: (a) Freeze node to split – node N under parent P and grandparent G is frozen; (b) Install split – new nodes N′, O, and P′ replace N and P.]
Figure .: Node Split – Node split in the BzTree.

frozen. It then performs a -word PMwCAS consisting of () the -bit pointer in the storage block to install the new pointer, () the record's metadata entry, setting it to m (the same value as it read) to detect conflict with a competing delete trying to modify the word, and () the status word, setting it to s (the same value it read) to detect conflict with a competing flip of the frozen bit. ● Inline Payloads: If leaf nodes store full payloads, the update follows the same protocol as an insert by () allocating space in the metadata array and record storage block and () writing a (key, update_payload) record into the record block that describes the update. The update_payload can be either a full payload replacement or a "byte diff" describing only the part(s) of the payload that have changed. Unlike inserts, we treat concurrent updates to the same key as a natural race, supporting the "last writer wins" protocol. This means there is no need to detect concurrent updates to the same key.

Upsert The BzTree supports the upsert operation common in most key-value stores. If the record exists in the leaf node, the thread performs an update to that record. If the record does not exist, the thread performs an insert. In this case, if the insert fails due to another concurrent insert of the same key, the operation can retry as an update.

Reads BzTree update operations do not block readers. A reader traverses the index to the target leaf node. If the leaf node stores record pointers, a thread first performs a binary search on the sorted key space. If it does not find its search key (either the key does not exist or was deleted in the sorted space), it performs a sequential scan on the unsorted key space. If the key is found, it returns the record to the user. If leaf nodes store full record payloads, the search first scans the unsorted key space starting from the most recent entry, as recent update records will represent the latest payload for a record. If the key is not found, the search continues to the sorted key space. A read returns the most recent record it finds on the node that matches its search key. It ignores all concurrent update activity on the node by disregarding both the frozen bit and any in-progress record operations (unset visible bits). These concurrent operations are treated as natural races, since (a) any record-level concurrency must be handled outside the BzTree and (b) the frozen bit does not matter to reads, as it is used by operations attempting to reorganize the node to serialize with updates.

Range Scans The BzTree supports range scans as follows. A user opens a scan iterator by specifying a begin_key and an optional end_key (null if open-ended) defining the range they wish to scan. The scan then proceeds one leaf node at a time until termination. It begins by entering an epoch to ensure memory stability and uses the begin_key to find the initial leaf node. When entering a page, the iterator constructs a response array that lists the valid records (i.e., visible and not deleted) on the node in sorted order. In essence, the response array is a snapshot copy of the node's valid records in its record storage block. After copying the snapshot, the iterator exits its epoch so as not to hold back memory garbage collection. It then services record-at-a-time get_next requests out of its snapshot. Once it exhausts the response array, the iterator proceeds to the next leaf node by entering a new epoch and traversing the tree using a "greater than" search on the largest key in the response array; this value represents the high boundary key of the previous leaf node and will allow the traversal to find the next leaf node position in the scan. This process repeats until the iterator can no longer satisfy the user-provided range boundaries, or the user terminates the iterator.
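A usage sketch of such a scan iterator follows. The interface shape is an assumption inferred from the description above (begin_key, optional end_key, record-at-a-time get_next over per-leaf snapshots); the real BzTree API may differ.

#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

class ScanIterator {
 public:
  // Returns the next visible (key, payload) pair in key order, or std::nullopt
  // once the range boundaries are exhausted or the user terminates the scan.
  std::optional<std::pair<std::string, uint64_t>> get_next();
};

// Hypothetical factory that opens an iterator over [begin_key, end_key];
// a missing end_key means the scan is open-ended.
ScanIterator OpenScan(std::string_view begin_key,
                      std::optional<std::string_view> end_key);

uint64_t CountRecordsInRange(std::string_view lo, std::string_view hi) {
  ScanIterator it = OpenScan(lo, hi);
  uint64_t count = 0;
  while (it.get_next()) count++;   // records are served from per-leaf snapshots
  return count;
}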

.. Leaf Node Consolidation Eventually a leaf node's search performance and effective space utilization degrade due to side effects of inserts or deletes. Search degrades due to (a) the need to sequentially scan the unsorted key space (in the case of many inserts) and/or (b) a number of deletes adding to the "dead space" within the sorted key space, thereby inflating the cost of binary search. The BzTree will occasionally consolidate (reorganize) a leaf node to increase search performance and eliminate dead space. Consolidation is triggered when free space reaches a minimum threshold, or the amount of logically deleted space on the node is greater than a configurable threshold. To perform consolidation of a node N, a thread first performs a single-word PMwCAS on N's status word to set its frozen flag. This prevents any ongoing updates from completing and ensures the consolidation process sees a consistent snapshot of N's records. The process then scans N to locate pointers to all live records on the page – ignoring deleted and invisible records – and calculates the space needed to allocate a fresh node (the size of all valid records plus free space). If this space is beyond a configurable max page size, the process invokes a node split (covered in Section .). Otherwise, it allocates memory for a new node N′ along with some free space to buffer new node updates. It then initializes the header and copies over all live records from N to N′ in key-sequential order. Now, N′ contains all sorted records and is ready to replace N. Making N′ visible in the index requires "swapping out" a pointer to N at its parent node P to replace it with a pointer to N′. To do this, the thread uses its path stack (a stack recording node pointers during traversal) to find a pointer to P. If this pointer represents a frozen page, the thread must re-traverse the index to find the valid parent. It then finds the record r in P that stores the child pointer to N and performs an in-place update using a -word PMwCAS on () the -bit child pointer in r to install the pointer to N′ and () P's status word to detect a concurrent page freeze. If this PMwCAS succeeds, N′ is now live in the index and N can be garbage collected. However, N cannot be immediately freed, since this process is latch-free and other threads may still have pointers to N. The BzTree handles this case by using an epoch-based garbage collection approach to safely free

memory. Concurrency during Consolidation: Freezing a node prior to consolidation will cause any in-progress updates on that node to fail, as they will detect the set frozen bit when attempting a PMwCAS on the status word. The failed operations will then retry by re-traversing the tree to find a new "live" leaf node. If they again land on a frozen node, this is a signal to help complete the consolidation instead of "spinning" by continuously re-traversing the index hoping for a live node. In this case, each thread will start its own consolidation process and attempt to install it at the parent. This effectively makes threads race to install a consolidated node, though one will ultimately win. Afterward, each thread resumes its original operation.

.. Internal Node Operations
Updates to existing records on internal nodes are performed in place following the protocol discussed in the previous section for installing a new child pointer. To maintain search optimality of internal nodes, record inserts and deletes (e.g., as part of splitting or deleting a child node) create a completely new version of an internal node. In other words, an insert or delete in an internal node immediately triggers a consolidation. This process is identical to the leaf node consolidation steps just discussed: a new node will be created (except with one record added or removed), and its pointer will be installed at the parent.

. Structure Modifications
We now describe the latch-free algorithms used in the BzTree for structure modification operations (SMOs). Like single-node updates, the basic idea for SMOs is to employ the PMwCAS to update page state atomically and in a latch-free manner. This involves manipulating metadata like frozen bits, as well as manipulating search pointers within index nodes to point to new page versions (e.g., split pages). We begin with a presentation of the node split and node merge algorithms. We then discuss the interplay between the algorithms when commingling structural changes and data changes. We also explain why threads concurrently accessing the tree are guaranteed to not observe inconsistencies, which simplifies both implementation and reasoning about correctness.

.. Prioritizing Structure Modifications
Triggering SMOs in the BzTree relies on a simple deterministic policy. A split is triggered once a node's size passes a configurable max_size threshold (e.g., KB). Likewise, a node delete/merge is triggered once a node's size falls below a configurable min_size. If an update thread encounters a node in need of an SMO, it temporarily suspends its operation to perform the SMO before continuing its operation (we do not force readers to perform SMOs). Given that SMOs are relatively heavyweight, prioritizing them over (lightweight) single-record operations is important. Otherwise, in a latch-free race, single-record operations would always win and effectively starve SMOs.
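A minimal C++ sketch of this deterministic trigger policy is shown below; the threshold parameters stand in for the configurable max_size and min_size values and the names are illustrative.

#include <cstddef>

enum class SmoAction { kNone, kSplit, kMerge };

// Decide whether a node needs a structure modification before an update
// thread may apply its own record operation.
SmoAction PickSmoAction(std::size_t node_size, std::size_t max_size,
                        std::size_t min_size) {
  if (node_size > max_size) return SmoAction::kSplit;   // node too large
  if (node_size < min_size) return SmoAction::kMerge;   // node too small
  return SmoAction::kNone;
}

An update thread consults such a policy when it lands on a node and, if an SMO is indicated, performs it before retrying its own record operation; readers never do.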

.. Node Split
Node splits are broken into two phases: (1) a preparation phase that allocates and initializes new nodes with the SMO changes and (2) an installation phase that atomically installs the new nodes in the index. We now describe the split details with the aid of Figure ..
Preparation: To split a node N, we first perform a PMwCAS on its status word to set the frozen bit, as depicted in Figure .a. We then scan N to find all valid records and calculate a separator key k that provides a balanced split. We then allocate and initialize three new nodes: (1) a new version of N (call it N′) that contains all valid records with keys less than or equal to k, (2) a new sibling node O that contains all valid records with keys greater than k, and (3) a new version of N's parent node P (call it P′) that replaces the child pointer to N with a pointer to N′ and adds a new search record consisting of key k and a pointer to the new child O. All nodes are consolidated (search-optimized) and store sorted records.
Installation: Installation of a split involves "swapping out" P to replace it with P′, thereby making the new split nodes N′ and O visible in the index. Figure .b depicts this process. The installation is atomic and involves using a three-word PMwCAS to modify the following words: (1) the status word of P to set its frozen bit, where failure to set the bit means the split conflicts with another update to P, (2) the 64-bit child pointer to P at its parent G (N's grandparent) to swap in the new pointer to P′, and (3) G's status word to detect a concurrent page freeze. If the PMwCAS succeeds, the split is complete, and the old nodes P and N are sent to the epoch-protected garbage collector. On failure, a thread retries the split, and the memory for nodes N′, P′, and O can be deallocated immediately since they were never seen by another thread.
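The installation phase can be sketched in C++ as follows, reusing the assumed descriptor-style PMwCAS interface from the consolidation sketch above; none of these names are the real API.

#include <cstdint>

bool InstallSplit(Tree* tree, Node* n, Node* p, Node* p_prime, Node* g) {
  uint64_t p_status = p->status.load();
  uint64_t g_status = g->status.load();
  if (IsFrozen(p_status) || IsFrozen(g_status)) return false;   // retry later

  PMwCASDescriptor* d = tree->AllocateDescriptor();
  // (1) Freeze the old parent P; failure means a conflicting update to P.
  d->AddWord(&p->status, p_status, WithFrozenBit(p_status));
  // (2) Swap the 64-bit child pointer at grandparent G from P to P'.
  d->AddWord(g->ChildPointerAddress(p), ToWord(p), ToWord(p_prime));
  // (3) Re-write G's status word unchanged to detect a concurrent freeze of G.
  d->AddWord(&g->status, g_status, g_status);

  if (!d->MwCAS()) return false;   // caller frees N', O, P' and retries the split
  tree->RetireToEpochGC(p);        // old P and N become garbage once epochs drain
  tree->RetireToEpochGC(n);
  return true;
}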

.. Node Merge
The BzTree performs node merges in a latch-free manner similar to node splits. Before triggering a delete of a node N, we first find a sibling that will absorb N's existing records. We choose N's left sibling L if (1) it shares a common parent P and (2) it is small enough to absorb N's records without subsequently triggering a split (which would defeat the purpose of a merge). Otherwise, we look at N's right sibling R, verifying that it has enough space to absorb N's records without a split. If neither R nor L satisfies the merge constraints, we allow N to remain underfull until these constraints are met. In the remainder of this section, we assume N merges with its sibling L.
Preparation: To initiate the delete, we first perform a PMwCAS on the status words of both L and N to set their frozen bits. We then allocate and initialize two new nodes: (1) a new version of the left sibling L′ containing its own valid records and all of N's valid records, and (2) a new version of N and L's parent P′ that replaces the child pointer to L with a pointer to L′ and removes the search record containing the separator key between L and N along with the child pointer to N.
Installation: Installation of the node delete and merge involves installing the new version P′ in the index, which makes the merged child node L′ visible and removes N and L. This operation is identical to that of a node split: it replaces the parent P with P′ by both freezing P and updating its parent G to install the new child pointer to P′.
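The sibling-selection rule can be sketched as a small C++ helper; the node accessors used here are illustrative assumptions.

#include <cstddef>

Node* PickMergeSibling(Node* n, Node* left, Node* right, Node* parent,
                       std::size_t max_node_size) {
  auto can_absorb = [&](Node* s) {
    return s != nullptr && s->Parent() == parent &&            // common parent P
           s->ValidRecordBytes() + n->ValidRecordBytes() <= max_node_size;
  };
  if (can_absorb(left)) return left;    // prefer the left sibling L
  if (can_absorb(right)) return right;  // otherwise try the right sibling R
  return nullptr;                       // leave N underfull for now
}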

We chose to avoid merges that cross parent nodes to minimize the number of modified nodes.

.. Interplay Between Algorithms
The BzTree offloads the handling of ACID transactions to a higher software layer of the system. This could, for instance, be a logical concurrency control component in a decoupled database system []. The index itself is responsible for correctly serializing conflicting data and structural changes. We now describe how the BzTree ensures that threads do not observe the effects of in-progress changes.
Co-operative PMwCAS: B+Tree implementations typically rely on latches to prevent threads from observing changes performed by concurrent threads. The BzTree instead employs PMwCAS to accomplish this. As described in Section ., we employ a latch-free PMwCAS library. The PMwCAS operation is cooperative, in that any thread (reader or writer) that encounters an in-progress PMwCAS will first help along to complete the operation before continuing with its own. This policy effectively serializes PMwCAS operations that might conflict. It also ensures the atomicity of operations within the BzTree. Since all updates to the index are performed using PMwCAS, updates will either succeed uncontested, or the PMwCAS help-along protocol will arbitrate the conflict and abort some of the conflicting operations.
Record Operations and Structure Modifications: The BzTree employs the status word to correctly serialize data and structural changes that might conflict with each other. For instance, an in-progress consolidate or SMO will first set the frozen bit within a node. This causes all in-flight record-level operations to fail their PMwCAS due to the conflict on the status word. These record operations will then retry and see either (a) the frozen version of a node that requires maintenance, which they will attempt to complete, or (b) a new (unfrozen) version of the node that is ready for record updates.
Serializing Structure Modifications: The BzTree uses a cooperative approach for serializing conflicting SMOs. Consider a node deletion operation. To delete node N, the BzTree first checks if its left sibling L is alive. If it observes that L is frozen, then it detects that another structural change is in progress. In this case, the BzTree serializes the deletion of N (if still needed) after that of L.
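The cooperative ("help-along") convention described in the Co-operative PMwCAS paragraph can be sketched as a read loop: a thread that reads a word and finds an in-flight PMwCAS descriptor pointer first helps complete that operation, then retries its own read. The IsDescriptorPointer and HelpComplete helpers below are assumptions standing in for a wrapper around the PMwCAS library, not its real interface.

#include <atomic>
#include <cstdint>

uint64_t ReadStableWord(std::atomic<uint64_t>* word) {
  for (;;) {
    uint64_t v = word->load(std::memory_order_acquire);
    if (!IsDescriptorPointer(v)) return v;   // a plain value: safe to use
    HelpComplete(v);                         // finish the other thread's PMwCAS,
  }                                          // then retry the read
}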

. BzTree Durability and Recovery
In this section, we illustrate how the BzTree ensures recoverability of the tree across system failures using PMwCAS. The BzTree stores the tree either in DRAM when used in volatile mode, or on NVM when used in durable mode. In volatile mode, the BzTree does not flush the state of the tree to durable storage. However, when used in durable mode, it persists the tree on NVM to preserve it across system failures. The BzTree does not need to employ a specific recovery algorithm. It instead relies on the recovery algorithms of a persistent memory allocator and the PMwCAS library to avoid persistent memory leaks and to ensure recoverability, respectively. We now describe these algorithms in detail.

.. Persistent Memory Allocation
A classic volatile memory allocator with an allocate and free interface does not ensure correct recovery when used on NVM.

If the allocator marks a memory chunk as being in use (due to allocate), and the application (e.g., BzTree) fails to install the allocated chunk on NVM before a crash, then this causes a persistent memory leak. In this state, the memory chunk is "homeless" in that it can be seen neither by the application nor by the memory allocator after a crash. While creating a safe and correct persistent memory allocator is outside the scope of this chapter, there have been many proposals. We assume the availability of a three-stage allocator [] that provides the following states: (1) allocated, (2) activated, and (3) free. The application first requests the allocation of a memory chunk. The allocator updates the chunk's metadata to indicate that it has been allocated and returns it to the application. During recovery after a system failure, the allocator reclaims all allocated (but not activated) memory chunks. To retain ownership of the memory chunk even after a failure, the application must separately request that the allocator activate the memory chunk. At this point, the application owns the memory chunk and is responsible for its lifetime, including any cleanup after a failure. The application must carefully interact with the allocator in the activation process, through an interface (provided by the allocator) that is similar to posix_memalign, which accepts a reference to the target location for storing the address of the allocated memory. This design is employed by many existing NVM systems [, , , ]. The application owns the memory only after the allocator has successfully persisted the address of the newly allocated memory in the provided reference.
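A C++ sketch of the reserve/activate/free contract assumed above is shown below. The interface is illustrative, modeled on the posix_memalign-style activation described in the text, and is not a particular allocator's real API.

#include <cstddef>

struct NvmAllocator {
  virtual ~NvmAllocator() = default;

  // Reserve a chunk. After a crash, chunks that were allocated but never
  // activated are reclaimed by the allocator's own recovery pass.
  virtual void* Allocate(std::size_t bytes) = 0;

  // Transfer ownership to the application: the allocator durably stores the
  // chunk's address into *target, a persistent location supplied by the
  // caller, so the chunk remains reachable after a failure.
  virtual void Activate(void** target, void* chunk) = 0;

  // Return an activated chunk to the allocator.
  virtual void Free(void* chunk) = 0;
};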

.. Durability

There are two cases by which the BzTree handles the durability of index data.
● Variable-length Data: Newly inserted records as well as new node memory (allocated as part of a consolidate, split, or delete/merge) represent variable-length data in the BzTree. To ensure durability, the BzTree flushes all variable-length data before it can be read by other threads. That is, newly inserted record memory on a node is flushed before the atomic flip of its visible bit. Likewise, new node memory is flushed before it is "linked into" the index using a PMwCAS. This flush-before-visible protocol ensures that variable-length data in the BzTree is durable when it becomes readable to concurrent threads.
● Word-size Data: The durability of word-size modifications is handled by the PMwCAS operation. As mentioned in Section ., the PMwCAS ensures durability of all words it modifies upon acknowledging success. Thus, modifications like changing the node status word and reserving and updating a record's metadata entry are guaranteed to be durable when modified using the PMwCAS. In addition, all modifications performed by the PMwCAS are guaranteed to be durable when seen by concurrent readers.
The BzTree avoids inconsistencies arising from write-after-read dependencies. That is, it guarantees that a thread cannot read a volatile (not-yet-persistent) modification made by another thread. Otherwise, any action taken after the read (such as a dependent write) might not survive a crash and could lead to an inconsistent index. As mentioned above, the flush-before-visible protocol ensures this property for variable-length modifications to the BzTree. Likewise, the PMwCAS ensures this property for word-sized modifications.
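The flush step of the flush-before-visible rule can be sketched with the x86 CLFLUSH and SFENCE intrinsics from <immintrin.h>; a CLWB-based variant would simply swap the flush call. In the real tree the visibility flip itself is a durable PMwCAS; the helper below only covers the explicit flush of variable-length data.

#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Flush every cache line covering [addr, addr + len) and fence, so the data
// is durable before any later store (such as the visibility flip) is issued.
void PersistRange(const void* addr, std::size_t len) {
  constexpr std::uintptr_t kLine = 64;                  // cache-line size
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(addr) & ~(kLine - 1);
  std::uintptr_t end = reinterpret_cast<std::uintptr_t>(addr) + len;
  for (std::uintptr_t p = start; p < end; p += kLine)
    _mm_clflush(reinterpret_cast<const void*>(p));      // write back (and evict) the line
  _mm_sfence();                                         // order flushes before later stores
}

Usage follows the protocol above: copy the new record's key and payload into the node, call PersistRange on those bytes, and only then flip the record's visible bit so that no reader can ever observe non-durable data.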

.. Recovery
Memory Lifetime: The PMwCAS library maintains a pool of descriptors at a well-defined location on NVM. Each word descriptor contains a field specifying a memory recycling policy. This policy defines how the memory pointed to by the old value and new value fields should be handled when the PMwCAS operation concludes. The PMwCAS library supports two memory recycling policies: NONE and FREE-ONE. With the former policy, there is no need for recycling memory. The BzTree uses this policy for modifying non-pointer values, such as the status word in nodes. With the latter policy, the PMwCAS library frees the memory pointed to by the old (or new) value depending on whether the PMwCAS operation succeeds (or fails). The BzTree uses this policy when allocating and installing a new node in the tree. To activate the node memory, the BzTree provides a memory reference to the descriptor word responsible for holding a pointer to the node memory. This ensures an atomic transfer of the activated memory pointer to the descriptor. The memory's lifetime is then handled by the PMwCAS library. In case of a failure, the node's memory is reclaimed by the recovery algorithm. This obviates the need for the BzTree to implement its own memory recycling mechanism.
Recovery Steps: During recovery from a system failure, the allocator first runs its recovery algorithm to reclaim memory chunks that have been reserved but not yet activated. Then, the PMwCAS library executes its recovery algorithm to ensure that the effects of all successfully completed PMwCAS operations are persisted. As covered in Section ., upon restart after a crash, any in-flight PMwCAS operations marked as succeeded will roll forward; otherwise they will roll back. For operations involving memory pointer swaps, the PMwCAS will ensure that allocated and active memory dereferenced by its descriptors is correctly handled according to the provided memory recycling policy.
Aborted Space Allocations: While PMwCAS recovery can handle recovery of 64-bit word modifications, including pointer swaps and node memory allocations, it cannot handle recovery of dangling record space allocations within a node. As detailed in Section .., an insert is broken into two atomic parts: (1) record space allocation and (2) record initialization (copying key bytes and populating metadata) and making the record visible. The BzTree must be able to detect and recover failed inserts that allocated space within a node in (1), but crashed during (2) before the record was fully populated and made visible. The BzTree uses the allocation epoch for this purpose (as described in Section .., this value is temporarily stored in the offset field until (2) completes). Since this field is populated atomically during (1), any subsequent failure before the completion of (2) will be detected after recovery increments the global index epoch. Doing so will invalidate any searches – such as those done by inserts checking for duplicate keys – that encounter an allocation from a previous epoch. This dangling node space will be reclaimed when the node is rebuilt during consolidation or a structure modification.
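The dangling-insert check implied above can be sketched as follows: a record that is not yet visible and whose offset field still holds an allocation epoch older than the (post-recovery) global index epoch belongs to an insert that crashed between space allocation and record initialization. The RecordMetadata type and its accessors are assumptions for illustration.

#include <cstdint>

bool IsDanglingAllocation(const RecordMetadata& meta,
                          uint64_t global_index_epoch) {
  if (meta.visible()) return false;            // completed inserts are fine
  if (!meta.offset_holds_epoch()) return false;
  // Recovery bumped the global index epoch, so an in-flight allocation stamped
  // with an older epoch can never complete; searches skip it, and the space is
  // reclaimed when the node is next consolidated or split.
  return meta.allocation_epoch() < global_index_epoch;
}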

. Experimental Evaluation
We implemented the BzTree in approximately , lines of C++ code, using the PMwCAS library to ensure atomicity and durability of tree updates []. This library employs the Win32-native InterlockedCompareExchange64 to perform CAS. NVM devices based on new material technologies (e.g., Intel 3D XPoint) are not yet commercially available. We instead target flash-backed NVDIMMs. These NVDIMMs are DRAM whose data contents are saved to flash storage on power failure. We conduct experiments on a workstation running Windows Server  on an Intel Xeon E- CPU (at .GHz) with  physical cores.
And update if leaves contain full record payloads.

Evaluation Workloads We use the YCSB benchmark in our evaluation. A detailed description of this benchmark is presented in Appendix B. Both BzTree and Bw-tree are treated as standalone key-value record stores that accept read (Get), write (Insert/Delete/Update/Upsert), and range scan operations. By default, the write operations in the workload mixtures are Upserts. We configure the distribution of the keys accessed by the index operations to be based on the following distributions:
● Random: -bit integers from a Uniform distribution.
● Zipfian: -bit integers from a Zipfian distribution.
● Monotonic: -bit monotonically increasing integers.
We generate a skewed workload using the Zipfian distribution. Unless mentioned otherwise, our primary performance metric is throughput measured in operations per second. We use  worker threads, equal to the number of logical cores on our experiment machine. We use -byte keys and values, and configure the default page size for both BzTree and Bw-tree to be  KB. In all our experiments, we prefill the index with M records. We observed similar trends when the index is prefilled with ten million records. The BzTree, by default, assumes that keys are variable length and uses the offset field in the record metadata entry to dereference keys (there is no fixed-length optimization). We execute all the workloads three times under each setting and report the average throughput.

.. Design Complexity
As minimizing complexity is one of our primary goals, we begin by quantifying the BzTree's design complexity compared to the Bw-tree. The Bw-tree's latch-free tree algorithms make use of a single-word CAS []. Its complexity stems from the fact that multi-word updates temporarily leave the index in an inconsistent state that other threads must detect and handle. The BzTree instead uses PMwCAS to atomically install changes at multiple tree locations, and this reduces its complexity considerably. Consider the node split algorithm: if the node split operation propagates only up to the great-grandparent node, it involves atomic updates to five tree locations. With a single-word CAS approach, the developer must explicitly handle many of the 2^5 (32) possible intermediate states that could be exposed to concurrent threads. In contrast, with PMwCAS, the developer only needs to reason about two tree states: the initial state with none of the locations mutated, and the final state with all five locations successfully mutated. More broadly, PMwCAS shrinks the state space associated with mutating k tree locations from 2^k states to two states. To quantify the reduction in design complexity, we measure the lines of code (LOC) of the node split algorithm in the BzTree and the Bw-tree. While the Bw-tree implementation contains  LOC, the one in the BzTree contains only  LOC. This is because the BzTree implementation handles fewer tree states.

This is because each tree location can either be updated or not at a given point in time. Although some of these intermediate states might never be observed in practice, we note that the state space grows exponentially.

[Figure .: Random Key Access Distribution – The throughput of BzTree and Bw-tree for the YCSB benchmark with different workload mixtures. Panels: (a) Read-only Workload, (b) Read-mostly Workload, (c) Read-heavy Workload, (d) Balanced Workload. Axes: threads vs. throughput (K ops/sec).]

A relative reduction in LOC does not necessarily imply a more maintainable data structure. We therefore measured the cyclomatic complexity (CC) of these algorithms. CC is a quantitative measure of the number of linearly independent paths through a function's source code and represents the function's complexity. Higher values of CC therefore correspond to more complex functions that are harder to debug and maintain. The CC of the node split algorithms in the BzTree and the Bw-tree is  and , respectively. This demonstrates that PMwCAS reduces the design complexity of the BzTree's algorithms.

.. Runtime Performance
We now provide an analysis of the runtime performance of the BzTree compared to the Bw-tree on different workload mixtures and key access distributions. For each configuration, we scale up the number of worker threads. The worker threads process tree operations in a closed loop. These experiments are run with both indexes in volatile DRAM mode to (a) showcase the peak performance of the BzTree (we study durability in the next section) and (b) provide a fair comparison to the Bw-tree, since its design targets volatile DRAM with no straightforward extension to NVM.
Random Key Access Distribution: We first consider the results on the read-only workload with the random key distribution shown in Figure .a. These results provide an upper bound on the tree's performance because none of the operations modify the tree.

[Figure .: Zipfian and Monotonic Key Access Distributions – The throughput of BzTree and Bw-tree for the YCSB benchmark. Panels: (a) Read-only Workload (Zipfian), (b) Balanced Workload (Zipfian), (c) Read-only Workload (Monotonic), (d) Balanced Workload (Monotonic). Axes: threads vs. throughput (K ops/sec).]

The most notable observation from this experiment is that the BzTree delivers % higher throughput than the Bw-tree. This is primarily because the BzTree employs raw pointers to inter-link tree nodes, meaning readers do not have to use indirection to locate child nodes. The benefits of the BzTree's algorithms are more prominent on the write-intensive workloads in Figures .c and .d, where the BzTree's throughput is .× and .× higher than that of the Bw-tree, respectively. We attribute this gap to the reduction in algorithm complexity and the BzTree's ability to perform in-place updates.

Zipfian Key Access Distribution: Figures .a and .b present the throughput of the BzTree and Bw-tree on different workloads with the Zipfian key distribution. The benefits of the BzTree's reader-friendly algorithms are prominent on the read-only workload, where the BzTree outperforms the Bw-tree by %. By skipping the layer of indirection through the mapping table, readers can traverse the BzTree faster than the Bw-tree. Unlike the Bw-tree, the BzTree supports inlined updates in leaf nodes and always keeps the interior nodes consolidated. This reduces pointer chasing, thereby enabling a faster read path. On the balanced workload in Figure .b, the BzTree's throughput is .× higher than that of the Bw-tree. Since most key accesses are directed to a few leaf nodes under the Zipfian key distribution, the in-place update design of the BzTree reduces the need for frequent node splits in comparison to the Bw-tree. This shrinks the amount of work performed by writers in the BzTree, since split operations take more time to complete than single-record writes.

[Figure .: Cost of Persistence – The throughput of BzTree for the YCSB benchmark in volatile (ephemeral) and durable modes across different workload mixtures based on the random key access distribution. Panels: (a) Read-mostly Workload, (b) Balanced Workload. Axes: threads vs. throughput (K ops/sec).]

Monotonic Key Access Distribution: The performance of the Bw-tree and BzTree on workloads with the monotonic key distribution is shown in Figures .c and .d. We observe that on the balanced workload, the BzTree and Bw-tree deliver  M and  M operations per second, respectively. The benefits of processor caching are prominent on this workload since the keys are monotonically increasing. This is a pathological configuration for concurrent writers since they always contend on the same node. It is an approximate upper bound on the worst-case behavior of the BzTree's latch-free algorithms on write-intensive workloads.

.. Durability
We now examine the cost of persistence by measuring the runtime performance of the BzTree in volatile and durable modes on different workload mixtures based on the random key access distribution. As shown in Figures .a and .b, the persistence overhead is % and % on the read-mostly and balanced workloads, respectively. We attribute the small drop in throughput to the overhead of using PMwCAS in durable mode as opposed to the volatile version [].
In durable mode, the BzTree additionally uses the CLFLUSH instruction to write back the modified tree contents to NVM. Since CLFLUSH invalidates the line from the cache, this results in a compulsory cache miss when the same data is accessed after the line has been flushed. Future processors will support the CLWB instruction, which unlike CLFLUSH does not invalidate the line and instead only transitions it to a non-modified state []. We expect that such a lightweight cache-line flushing instruction will further increase the BzTree's throughput in durable mode by improving its caching behavior. This experiment illustrates that the same BzTree implementation can be used for indexes in both DRAM and NVM with a moderate cost to seamlessly support persistence.

.. Scan Performance
We next examine the performance of the BzTree and Bw-tree on different workload mixtures of the YCSB benchmark containing range scan operations. In this experiment, we configure the scan predicate's key range so that the scan operation starts from a uniformly random starting offset and returns at most  matching records. The most notable observation from the results shown in Figure . is that the BzTree scales better than the Bw-tree.

[Figure .: Scan Performance – The throughput of BzTree and Bw-tree on different workload mixtures containing range scan operations. Panels: (a) Read-mostly Workload, (b) Balanced Workload. Axes: threads vs. throughput (K ops/sec).]

On the read-mostly workload with range scan operations, as shown in Figure .a, the BzTree's throughput with  worker threads is .× that of its single-threaded performance. In contrast, the Bw-tree's throughput with  worker threads is .× that of its single-threaded performance. This is mainly due to less pointer chasing and fewer memory accesses in the BzTree. The Bw-tree must always perform delta updates to pages (new memory prepended to a node representing an update), even for -byte payload changes. This causes the scan to perform pointer chases over delta chains, e.g., when constructing a page snapshot to service get-next requests. The BzTree's throughput on the read-mostly workload is .× higher than that of the Bw-tree. We attribute this to the reduction in the indirection overhead of the range scan operation, which forms % of the read-mostly workload. On the balanced workload, as shown in Figure .b, the BzTree outperforms the Bw-tree by .×. This performance gap is realized by virtue of the reduction in algorithm complexity and the BzTree's ability to perform in-place updates, compared to the Bw-tree's use of delta updates. We observe that the absolute throughput of the BzTree on the balanced workload is .× higher than that on the read-mostly workload. This is because a range scan is more expensive than a write operation, and the latter operation is executed more often in the balanced workload.

.. PMwCAS Statistics
To perform record updates and install structure modifications, the BzTree uses the PMwCAS operation with the word count varying from  to . We analyzed the failure frequency of PMwCAS operations in the BzTree across varying degrees of contention on the balanced workload. We observe an increase of .% to .% in the fraction of failed PMwCAS operations going from  to  threads. This is primarily because multiple worker threads concurrently attempt to split the same leaf node and only one thread succeeds. A key takeaway is that on all configurations, the overall fraction of failed PMwCAS operations remains less than .%.

.. Sensitivity Analysis
We now analyze how the key size and the unsorted free space size affect the runtime performance of the BzTree on the YCSB benchmark. We ran the read-only and read-heavy workloads based on the random key distribution.

[Figure .: Impact of Key Size and Free Space Size – The throughput of BzTree while running the YCSB benchmark under different key size and free space size settings. Panels: (a) Key size (B), (b) Free space size (B).]


Key Size: In this experiment, we fix the page size to be  KB, and vary the key size from  B to  B. The key observation from the results in Figure .a is that the throughput drops by % when we increase the key size on the read-only workload. We attribute this to more expensive key comparisons in the case of longer keys, both in the interior and leaf nodes. The performance impact of key size is more prominent on the read-heavy workload, where we observe a throughput drop of %. This is because with longer keys, the leaf nodes are filled faster with fewer keys, and this leads to more frequent node splits that negatively affect throughput.

Free Space Size: Lastly, we examine the impact of the size of the free space on the BzTree's performance. We fix the page size to be  KB, and vary the free space size from  B to  B. The BzTree uses the remaining space in the leaf node to store the sorted keys, as described in Section ... Figure .b shows that the throughput increases by % when we decrease the free space size. This is because readers need to perform fewer key comparisons in leaf nodes, as a smaller free space region holds fewer unsorted keys. Reducing the free space in this manner, however, increases the frequency of node split operations and reduces space utilization. This is illustrated on the read-heavy workload, where the BzTree delivers its peak throughput when the free space size is  B. We attribute this to more node splits and more key comparisons under smaller and larger free space size settings, respectively.

.. Memory Footprint

We next compare the peak memory footprint of the BzTree and Bw-tree data structures while running the balanced workload in the YCSB benchmark. We observe that the BzTree's footprint is .× smaller than that of the Bw-tree for trees whose sizes range from  K to  M keys. For instance, when we prefill the index with  M keys, the peak memory footprints of the BzTree and Bw-tree are  MB and  MB, respectively. We attribute this to the compact node layout of the BzTree and its ability to buffer updates in place.

[Figure .: Execution Time Breakdown – The time that BzTree spends in its internal components when running the balanced workload. Components: leaf node search, internal node search, PMwCAS, rest; x-axis: workload type (Read-Mostly, Read-Heavy, Balanced); y-axis: time spent (%).]
.. Execution Time Breakdown
In this experiment, we analyze the time that the BzTree spends in its internal components during execution. We examine the balanced workload in the YCSB benchmark with the uniform key access distribution. We use profiling tools available in Windows Server to track the cycles executed within the different components of the BzTree []. We start this profiling after prefilling the index. The cycles are classified into four categories: (1) leaf node search, (2) internal node search, (3) PMwCAS, and (4) other miscellaneous components. This last category includes the time spent copying data and performing tasks such as garbage collection. The most notable result of this experiment, as shown in Figure ., is that even on the balanced workload, the BzTree only spends % of its time performing PMwCAS operations. This is because it spends the bulk of its time traversing the tree and searching the leaf and internal nodes. We observe that the proportion of time that the BzTree spends on searching nodes increases from % to % when the workload is not write-intensive. This explains why the BzTree optimizations are more beneficial for the balanced workload.
. Summary
It is challenging to design, debug, and extend latch-free indexing structures. This is because "traditional" latch-free designs rely on a single-word CAS instruction that requires the developer to carefully stage every atomic action so that each action leaves the tree in an intermediate state that is recognizable to concurrent accessors. Upcoming NVM environments will only make this task more difficult due to durability guarantees and the interplay with volatile CPU caches. With the BzTree design we demonstrate that using PMwCAS, a multi-word compare-and-swap with durability guarantees, helps reduce index design complexity tremendously. Our experimental evaluation shows that even though PMwCAS is computationally more expensive than a hardware-based single-word CAS, the simplicity that we gain by using PMwCAS improves not only the maintainability but also the performance of the BzTree. A cyclomatic complexity analysis shows that the BzTree is at most half as complex as state-of-the-art main-memory index designs (the Bw-tree and MassTree). Another benefit of the BzTree design is its flexibility: the same design can be used on both volatile DRAM and NVM, with a roughly % overhead to ensure persistence. In addition, existing B+Tree implementations that achieve durability on NVM often employ complex algorithms for ensuring recoverability.

The BzTree, on the other hand, does not rely on custom recovery techniques: it relies on the general-purpose PMwCAS to roll forward (or back) its in-flight word modifications before the index comes back online. This allows for near-instantaneous recovery of the BzTree index and is a defining feature of its design.

Chapter 

Related Work

In this chapter, we provide a discussion of related work. We begin with a discussion of work related to the general themes in this dissertation, then examine specific areas in depth. The design of a DBMS's architecture is predicated on the target storage hierarchy. There are essentially two types of DBMS architectures: disk-oriented [, ] and memory-oriented systems [, , , , ]. The former is exemplified by the first DBMSs, such as IBM's System R [], where the system is predicated on the management of blocks of tuples on disk using an in-memory cache; the latter by IBM's IMS/VS Fast Path [], where the system performs updates on in-memory data and relies on the disk to ensure durability. The need to ensure that all changes are durable has dominated the design of systems with both disk-oriented and memory-oriented architectures [, , ]. This has involved optimizing the layout of data for each storage layer depending on how fast it can perform random accesses []. Further, updates performed on tuples stored in memory need to be propagated to an on-disk representation for durability. Previous studies have shown that the overhead of managing this data movement for OLTP workloads is considerable []. The advent of NVM offers an intriguing blend of the two storage mediums. This dissertation explores the changes required in DBMS architectures to leverage NVM. It has benefited from prior work and in some cases built upon it.

. Logging and Recovery
Safe RAM is one of the first projects that explored the use of NVM in software applications []. Copeland et al. define Safe RAM as a memory with enough backup power so that its contents can be safely copied to disk in case of a power failure. Using simulations, they demonstrate the improvement in transactional throughput and latency when they replace the disk with Safe RAM. Agrawal et al. present recovery algorithms for a hypothetical database machine in which the memory is non-volatile []. They extend the write-ahead logging protocol to leverage NVM. This scheme does not require the DBMS to persist updated pages on NVM at the time of committing a transaction (i.e., a NO-FORCE policy). This allows a hot page to be updated in memory by several transactions without it ever being written to disk. This freedom, however, comes at the expense of high data duplication and slow recovery.


More recently, Fang et al. propose a NVM-centric DBMS architecture where in-memory log buffers and disk-based log files are replaced by a single log on NVM []. They extend the ARIES WAL protocol to NVM and address the problems of detecting partial writes and holes, and of ensuring recoverability. Wang et al. present a passive group commit method for a distributed logging protocol extension to Shore-MT []. Instead of issuing a barrier for every processor at commit time, the DBMS tracks when all of the records required to ensure the durability of a transaction are flushed to NVM. This scheme reduces log contention. It considers a log record as committed when it is written to a NVM buffer. When the log buffer is full, it is flushed to disk. To establish the ordering of log records, the protocol assigns a global sequence number based on a logical clock to each record. This work is based on the Shore-MT engine [], which means that the DBMS records page-level before-images in the undo logs before performing in-place updates. This results in high data duplication. PCMLogging is a logging protocol designed for a three-tier storage system with DRAM, NVM, and SSD []. This scheme involves writing implicit log records in modified pages. When a transaction is committed, PCMLogging flushes all the modified pages to NVM to ensure that the database contains the latest changes. To remove the effects of uncommitted transactions, the DBMS keeps track of a list of running transactions and the pages that they modified on NVM. The DBMS does not apply in-place updates on pages stored on NVM so that the changes can be rolled back if needed. This protocol does not leverage the byte-addressability property of NVM and employs a physical undo mechanism. Pelley et al. introduced a group commit mechanism to persist transactions' updates in batches to reduce the number of write barriers required for ensuring correct ordering on NVM []. The authors explore three NVM-centric DBMS architectures based on Shore-MT []. These architectures do not leverage the byte-addressability property of NVM. MARS [] is an in-place updates engine optimized for NVM that relies on a hardware-assisted primitive that allows multiple writes to arbitrary locations to happen atomically. MARS does away with undo log records by keeping the changes isolated using this primitive. Similarly, it relies on this primitive to apply the redo log records at the time of commit. In comparison, our WBL approach is based on non-volatile pointers, a software-based primitive, and uses existing (or upcoming) synchronization instructions. It removes the need to record redo information in the WAL, but still needs to store undo log records until the transaction commits. SOFORT [, ] is a hybrid storage engine designed for both OLTP and OLAP workloads. The engine is designed to not perform any logging and uses MVCC. It targets a two-tier storage system with DRAM and NVM. The engine stores the primary copy of the data on NVM, and supports the maintenance of auxiliary data structures on DRAM and NVM. To support hybrid workloads, SOFORT manages data in read-optimized main storage for OLAP and write-optimized delta storage for OLTP. The engine periodically merges the contents of the delta storage into the main storage to limit the size of the delta. SOFORT manages the main storage on NVM and supports near-instantaneous recovery after a system failure. Similar to SOFORT, we also make use of non-volatile pointers [], but we use these pointers differently.
SOFORT's non-volatile pointers are a combination of page ID and offset. We eschew the page abstraction in our engines since NVM is byte-addressable, and thus we use raw pointers that map to data's location in NVM.

REWIND is a userspace library for efficiently managing persistent data structures on NVM using WAL to ensure recoverability []. SiloR is an efficient parallelized logging, checkpointing, and recovery subsystem for in-memory DBMSs []. Oh et al. present a per-page logging approach for replacing a set of successive page writes to the same logical page with fewer log writes []. FOEDUS is a scalable OLTP engine designed for a two-tier storage system with DRAM and NVM []. It is based on the dual page primitive that points to a pair of logically equivalent pages: a mutable volatile page in DRAM containing the latest changes, and an immutable snapshot page on NVM. With logical equivalence, the existence of a volatile page implies that it contains the latest modifications. The absence of such a page implies that the snapshot page contains all the changes. FOEDUS accomplishes physical independence between volatile and snapshot pages by deriving the latter from the write-ahead log. This obviates the need for synchronization between these two physical entities. The engine employs a log gleaner that constructs snapshot pages in DRAM and sequentially writes them to NVM. FOEDUS reduces read amplification by routing the read operation to the appropriate version. The engine adopts an optimistic concurrency control scheme based on the Masstree and the Foster B-Tree data structures [, ]. During recovery, the log gleaner creates snapshot pages using the contents of the WAL. Although FOEDUS delivers high performance, it relies on canonical WAL and does not leverage the byte-addressability property of NVM. The instant-recovery protocol comprises on-demand single-tuple redo and single-transaction undo mechanisms to support almost instantaneous recovery from system failures [, ]. While processing transactions, the DBMS reconstructs the desired version of the tuple on demand using the information in the write-ahead log. The DBMS can, therefore, start handling new transactions almost immediately after a system failure. The downside is that, until recovery completes, the DBMS delivers lower performance than it would after traditional ARIES-style recovery. This protocol works well when the DBMS runs on a slower durable storage device. With NVM, however, WBL enables the DBMS to deliver higher performance than instant recovery immediately after restart, as it does not require an on-demand redo process. NV-logging focuses on the cost-effective use of NVM []. This protocol only uses NVM for storing the database log, since NVM is more expensive than SSD. The authors propose a per-transaction decentralized logging scheme that works around the contention associated with a centralized log buffer. The DBMS maintains a private log for each transaction. The log records are created in DRAM and later flushed to a circular log buffer on NVM. The recovery scheme is a variant of the ARIES protocol. Although NV-logging is cost-effective, this approach only leverages the low-latency sequential writes of NVM and does not exploit its ability to efficiently support random writes and fine-grained data access. Unlike WBL, all these systems require that the changes made to persistent data must be preceded by logging.

. File-systems
Beyond DBMSs, researchers have explored using NVM in file-systems. Baker et al. evaluate the utility of battery-backed DRAM as a client-side file cache in a distributed filesystem to reduce write traffic to file servers, and as a write buffer for write-optimized file systems to reduce server disk accesses [].

Rio is a persistent file cache that relies on an uninterruptible power supply to provide a safe, in-memory buffer for filesystem data []. It reduces transaction latency by absorbing synchronous writes to disk without losing data during system crashes. The authors propose a safe synchronization technique that writes dirty file cache data reliably to disk during the last stage of a system crash. BPFS uses a variant of shadow paging on NVM to support fine-grained atomic updates by relying on a special hardware instruction that ensures ordering between writes in different epochs []. It uses a copy-on-write technique and -byte in-place atomic updates to provide metadata and data consistency. SCMFS is a NVM-centric filesystem that utilizes the OS's virtual memory management module to map files to large contiguous virtual address ranges []. It supports efficient NVM accesses but does not offer any consistency guarantees for either metadata or data. PMFS is another filesystem from Intel Labs that is explicitly tailored for byte-addressable NVM [, ]. Unlike BPFS, it leverages in-place atomic updates with fine-grained logging for ensuring metadata consistency, and a copy-on-write technique for guaranteeing data consistency. It optimizes memory-mapped I/O by directly mapping NVM to the application's address space and using large page mappings. PMFS assumes a simpler hardware barrier primitive for flushing NVM stores to a power fail-safe destination. Direct access storage (DAX) is a mechanism for enabling direct access to files stored on NVM without copying the data to the page cache [, , , ]. This requires only one copy between the file and the user buffers, thus improving file I/O performance. EXT4 DAX extends the EXT4 file system to support direct mapping of NVM by bypassing the buffer cache []. It relies on journaling for atomic metadata updates. Unlike the EXT4 file system, EXT4 DAX does not support atomic updates to data. Aerie provides direct access for file data I/O using user-level leases for NVM updates []. It journals metadata but does not support atomic data updates and memory mapping. NOVA is a log-structured file system that provides synchronous file system semantics on NVM []. Each inode in NOVA has a separate log, thereby allowing concurrent updates across files without synchronization. NOVA uses a copy-on-write technique for writes by allocating new pages. It relies on logging and lightweight journaling for complex atomic updates. During recovery, NOVA scans all the logs to reconstruct the memory allocator's state. NOVA-Fortis extends NOVA by supporting lightweight file-system snapshots and providing protection against media errors and corruption due to software errors []. It demonstrates that a NVM-aware file-system can provide reliability guarantees with a tolerable performance impact. Kwon et al. present the architecture of a three-tier file-system that transparently migrates data among different levels in the storage system []. It supports performance-isolated access to NVM using a per-application log. This file-system is optimized for a specific NVM technology that is × slower than DRAM, so it does not cache NVM-resident data in DRAM. For the same reason, it bypasses DRAM while performing synchronous write operations. Renen et al. present a buffer manager that eagerly migrates data from SSD to DRAM []. When a page is evicted from DRAM, the buffer manager considers admitting it into the NVM buffer. The fundamental idea is to only admit recently referenced pages.
The buffer manager maintains an admission queue to keep track of pages considered for admission and only admits pages that were recently denied admission.

In Chapter , we introduce a taxonomy of data migration policies that subsumes the specic schemes adopted by prior systems. We derive insights that are applicable for a broader range of three-tier storage systems and NVM technologies. In particular, we explore how the optimal data migration policy depends on workload and storage system characteristics.

. Replication
With the WBL logging protocol described in Chapter , the DBMS can recover from system and transaction failures. However, it cannot cope with media failures or corrupted data. This is because it relies on the integrity of durable data structures (e.g., the log) during recovery. These failures are instead overcome through replication, wherein the DBMS propagates changes made by transactions to multiple servers [, ]. When the primary server incurs a media failure, replication ensures that there is no data loss since the secondary servers can be configured to maintain a transactionally consistent copy of the database. Mojim provides reliable and highly-available NVM by using a two-tier architecture that supports an additional level of redundancy and efficiently replicates the data stored on NVM []. It allows programmers to use fault-tolerant memory storage in their applications but does not provide the transactional semantics required by a DBMS. RAMCloud is a DRAM-based storage system that can be used as a low-latency key-value store []. While both Mojim and RAMCloud provide reliable memory-based storage, the former exports a memory-like interface to applications and the latter supports a key-value interface. Unlike RAMCloud, Mojim does not shard memory. It instead relies on fail-over to recover from failures.

. Memory Management
A NVM-aware memory allocator differs from a volatile memory allocator in three ways [, , ]. The first difference is that it provides a durability mechanism to ensure that modifications to data stored on NVM are persisted []. This is necessary because the changes made by a transaction to a location on NVM may still reside in volatile processor caches when the transaction commits. If a power failure happens before the changes reach NVM, then these changes are lost []. The allocator exposes a special API call to provide this durability. Internally, the allocator first writes out the cache lines containing the data from any level of the cache hierarchy to NVM using CLWB, the optimized cache flushing instruction that is part of the newly proposed NVM-related instruction set extensions []. Then, it issues a SFENCE instruction to ensure that the stores are ordered ahead of subsequent instructions. At this point, the stores may reside in the memory controller's write-pending queue (WPQ). In case of a power failure or shutdown, the asynchronous DRAM refresh (ADR) mechanism of newer NVM platforms automatically flushes the WPQ [], thus ensuring that the data is durable. The second variation is that the allocator provides a naming mechanism for allocations so that pointers to memory locations are valid even after the system restarts [, ]. The allocator ensures that the virtual memory addresses assigned to a memory-mapped region never change. With this mechanism, a pointer to a NVM location is mapped to the same virtual location after the OS or DBMS restarts. We refer to these pointers as non-volatile pointers.

Lastly, the allocator ensures the atomicity of all memory allocations so that after a system failure all memory regions are either allocated or available for use []. The allocator guarantees that there are no torn data writes, dangling references, or persistent memory leaks by decoupling memory allocation into two steps: (1) reserve and (2) activate. After the reserve step, the DBMS can use the reserved memory region for storing ephemeral data. In case of a failure, however, the allocator reclaims this memory region. To ensure that it owns a memory region even after a failure, the DBMS must request the allocator to separately activate the memory region by updating the metadata associated with that region. Under this two-step process, the DBMS first initializes the contents of a memory region after the reserve step but before it activates it. Mnemosyne and NV-heaps use software transactional memory to support transactional updates to data stored on NVM [, ]. While the former supports word-based transactions, the latter supports node-based transactions. The primitives provided by these systems allow programmers to use NVM in their applications but do not provide the transactional semantics required by a DBMS. Moraru et al. propose NVMalloc, a NVM-aware allocator that helps with wear-leveling and protects memory against erroneous writes []. They augment the CPU caches with a set of counters to keep track of the number of dirty cache lines that are yet to be written back to NVM. WAlloc is another efficient wear-aware NVM allocator []. These allocators perform wear-leveling in software. NVM-Malloc belongs to a family of allocators that assume that wear-leveling will be done in hardware []. It prevents memory leaks by decoupling memory allocation into two steps. It manages a single memory pool and uses relative pointers to keep track of objects. It uses a segregated-fit algorithm for blocks smaller than  KB, and employs a best-fit technique for larger blocks. The Persistent Memory Development Kit (PMDK) is a collection of NVM-centric libraries and tools []. The libpmemobj library provides a transactional object store on NVM and internally uses a NVM allocator. This allocator employs a segregated-fit algorithm with multiple size classes for smaller blocks less than  KB and a best-fit technique for larger blocks. To reduce fragmentation, it splits a  KB chunk into smaller blocks. But it does not contain a defragmentation mechanism. Makalu is a fail-safe NVM allocator that employs a recovery-time garbage collector to maintain internal consistency []. It reduces the allocation persistence overhead by lazily persisting unnecessary metadata and by using the garbage collector to avoid NVM leaks. Makalu's allocation strategy is similar to that of NVM-Malloc. Unlike NVM-Malloc, it only ensures the durability of a subset of allocator metadata and reconstructs missing metadata during recovery. Makalu uses regular volatile pointers and maps the memory pool at a fixed location to ensure that the pointers are valid across restarts. However, this can result in the un-mapping of other objects that have already been mapped in the address range. PAllocator is a special-purpose NVM allocator that uses multiple files to dynamically expand and shrink the managed NVM pools [, ]. Instead of using thread-local pools, PAllocator uses one allocator object per physical core. This works well for DBMSs that spawn and terminate several threads during query processing.
It defragments memory by leveraging the hole-punching feature of sparse files. It supports fast recovery by persisting the bulk of the metadata and internally uses hybrid DRAM-NVM data structures. X-Mem is a memory management framework that automatically places data structures in a two-tier storage system with DRAM and NVM []. It allocates objects of different data structures in disjoint regions of memory to preserve semantic information at the granularity of a single allocation.

X-Mem contains a profiling tool that identifies the access pattern to a data structure and improves performance by automatically mapping memory regions of the data structure to DRAM or NVM.

. Hardware Support
Persistent caches simplify NVM software development by obviating the need for a persistency model. Such caches can be constructed by ensuring that a battery backup is available to flush the cache contents to NVM upon failure [, ], or by not caching NVM accesses []. However, high NVM latency precludes its use as a processor cache [], and it is unclear how to provide battery backup for systems with large caches. Our DBMS architectures only assume volatile caches. Kolli et al. present a synchronous ordering (SO) scheme to formalize the memory persistency model implied by Intel's NVM-centric ISA extensions [, , ]. These extensions ensure correct ordering of NVM writes [, , , , ]. SO presents fewer opportunities for overlapping program execution and persist operations. To increase this overlap, the authors propose delegated ordering. This technique decouples persist-order enforcement from thread execution by buffering persists in hardware. BPFS uses barriers to divide program execution into epochs within which stores may be concurrently persisted []. It supports epoch barriers by tagging cache blocks with the current epoch id on every store. Since epoch barriers obviate stalls, this technique increases the overlap between program execution and persist operations compared to SO. However, it is tightly coupled with cache management and requires write permissions to be discarded when epochs are drained. Delegated ordering decouples cache management from the path taken by NVM writes. Pelley et al. introduce a taxonomy of persistency models ranging from conservative (strict persistency) to very relaxed (strand persistency) models []. They demonstrate that exposing additional persist concurrency to the NVM controller improves performance. They present a variant of epoch barriers that improves performance by managing inter-thread persist dependencies. However, they do not propose hardware implementations for the persistency models. Joshi et al. propose efficient persist barriers that support buffered epoch persistency []. The adoption of such persistency models will improve the performance of NVM-centric DBMSs. Kiln exports a transactional storage system interface to NVM by guaranteeing the atomicity, durability, and consistency properties []. It assumes that processor caches are persistent and leverages the implicit data versioning in caches. LOC adopts a custom hardware logging mechanism and multi-versioned caches to minimize intra- and inter-transaction dependencies []. Unlike these systems, DBMSs need to additionally provide the isolation guarantee between concurrently executing transactions.

. Software Testing

Lantz et al. propose Yat [], a hypervisor-based testing framework for NVM-centric software systems. It adopts a record-and-replay technique. During execution, it records all NVM writes by logging hypervisor exits. It replaces the instructions related to ensuring persistence on NVM with illegal instructions so that they can be traced via hypervisor exits. Yat splits the memory trace into segments that are delimited by persistence barriers. It allows writes within a segment to different NVM locations to be reordered. During replay, it considers all possible reorderings within each segment and invokes the recovery procedure after each execution. The authors restrict the number of examined reorderings to bound the testing time.

Oukid et al. present an automated testing framework that does not require software instrumentation []. This framework performs on-line testing and improves code coverage by not examining similar scenarios. It automatically crashes the evaluated software system and invokes the system's recovery procedure. The authors focus on improving software quality by efficiently covering a wide range of NVM-related errors within a reasonable time frame.
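The core replay loop of such a record-and-replay tester can be sketched as follows; the Write and Segment types and the recovery_check callback are illustrative assumptions, not Yat's actual interface.

#include <algorithm>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// One recorded store to NVM: (address, value).
struct Write { uint64_t addr; uint64_t value; };

// A segment is the set of writes between two persistence barriers; the
// hardware may make them durable in any order.
using Segment  = std::vector<Write>;
using NvmImage = std::unordered_map<uint64_t, uint64_t>;

// Replays every ordering of the writes in the segment and, for each ordering,
// simulates a crash after each prefix before running the recovery check.
bool test_segment(const NvmImage& base, Segment segment,
                  const std::function<bool(const NvmImage&)>& recovery_check) {
  auto by_addr = [](const Write& a, const Write& b) { return a.addr < b.addr; };
  std::sort(segment.begin(), segment.end(), by_addr);
  do {
    for (std::size_t crash = 0; crash <= segment.size(); crash++) {
      NvmImage image = base;                          // state durable before the segment
      for (std::size_t i = 0; i < crash; i++)
        image[segment[i].addr] = segment[i].value;    // writes that reached NVM
      if (!recovery_check(image)) return false;       // recovery failed for this ordering
    }
  } while (std::next_permutation(segment.begin(), segment.end(), by_addr));
  return true;
}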

. Latch-Free Indexes

Modern devices with multi-core processors and high-capacity memory mandate highly concurrent indexes. This gave rise to the design of the Bw-tree, a latch-free index built on top of the single-word CAS primitive []. Although such an index delivers high performance, it is challenging to design, debug, and extend. The developers must carefully design every SWCAS-based latch-free algorithm so that each atomic action leaves the tree in a valid intermediate state for other threads [, ]. We instead employ the stronger MWCAS primitive to simplify BzTree's design.

Other state-of-the-art data structures include ART and MassTree. ART is a trie-based data structure that employs an adaptive node structure and adopts an optimistic lock-coupling synchronization algorithm [, , ]. MassTree is a hybrid cache-conscious B-tree/trie data structure that eschews traditional latch coupling []. Unlike BzTree, its synchronization algorithm relies on the clever use of atomic operations and hand-over-hand latching. With this approach, developers need to keep track of the latches being held along different control flow paths in order to release the correct set of latches. This increases the design complexity of the data structure. In addition, MassTree and ART, by design, do not ensure durability on NVM.

. Multi-Word CAS

A multi-word CAS instruction (MWCAS) simplifies latch-free programming of high-performance data structures, as exemplified by the BzTree. A general-purpose MWCAS implementation is not available in hardware. Prior work has therefore focused on realizing MWCAS in software using the hardware-provided SWCAS [, , , ]. The PMwCAS library we use is based on the volatile MWCAS primitive proposed by Harris et al. [].

MWCAS and transactional memory systems are similar in that they require either all or none of the sub-operations to succeed []. Software transactional memory systems have seen limited adoption due to their high performance overhead []. In contrast, hardware transactional memory (HTM) [, ] exhibits lower performance overhead and can help simplify latch-free designs. However, HTM suffers from spurious transaction aborts, either due to transaction size or because of CPU cache associativity []. The PMwCAS library does not use HTM.
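From the caller's perspective, an MWCAS takes a set of (address, expected value, new value) triples and installs either all of the new values or none of them. The sketch below shows this usage pattern with a hypothetical mwcas() helper backed by a global mutex; it is not the PMwCAS API, and a real software implementation (such as the Harris et al. design) is latch-free and uses published descriptors with cooperative helping.

#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

// One word targeted by the multi-word CAS.
struct MwcasEntry {
  std::atomic<uint64_t>* addr;
  uint64_t expected;
  uint64_t desired;
};

std::mutex g_mwcas_lock;  // stand-in only; a real MWCAS is latch-free

// Hypothetical helper: atomically installs all entries or none of them.
bool mwcas(std::vector<MwcasEntry>& entries) {
  std::lock_guard<std::mutex> guard(g_mwcas_lock);
  for (const auto& e : entries)
    if (e.addr->load() != e.expected) return false;  // some word changed: fail
  for (const auto& e : entries)
    e.addr->store(e.desired);                        // install all new values
  return true;
}

// Example: atomically "move" a key between two node slots and bump the
// destination node's version counter in a single atomic step.
bool move_key(std::atomic<uint64_t>& src_slot, std::atomic<uint64_t>& dst_slot,
              std::atomic<uint64_t>& dst_version, uint64_t key) {
  uint64_t version = dst_version.load();
  std::vector<MwcasEntry> op = {
      {&src_slot, key, 0},                  // clear the source slot
      {&dst_slot, 0, key},                  // fill the empty destination slot
      {&dst_version, version, version + 1}  // invalidate concurrent readers
  };
  return mwcas(op);
}

A caller typically retries such an operation in a loop when mwcas() returns false because another thread changed one of the expected values.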

. Persistent Indexes

The advent of NVM triggered the development of several persistent indexes [, , , , ]. Since NVM devices are expected to have limited write endurance, these indexes focus on reducing the number of stores to NVM. The write-atomic B+Tree accomplishes this by using an indirection slot array to minimize the movement of index entries, and it adopts a redo-only logging algorithm to ensure durability []. The CDDS B+Tree is a multi-versioned B+Tree []. Instead of performing in-place updates and logging, it relies on versioning to guarantee recoverability and consistency. CDDS, by design, performs a large number of cache-line flushes, which degrades its performance. In contrast, NV-Tree [] employs an append-only update strategy and reconstructs the internal index nodes during recovery. However, it requires the internal nodes to be stored in consecutive memory blocks. FPTree [] is a hybrid DRAM-NVM index that keeps the internal nodes in volatile memory and stores the leaf nodes on NVM, at the cost of a partial index rebuild during recovery. It exploits hardware transactional memory and fine-grained locks to handle concurrent internal and leaf node accesses.

The BzTree design (Chapter ) differs from these approaches in three ways. First, the BzTree does not require custom recovery code. Prior persistent NVM index designs employ sophisticated logging and recovery algorithms. These logging algorithms record all the tree updates to persistent storage to achieve persistence, and they periodically write out checkpoints to speed up recovery. For example, the FPTree's node split algorithm requires the writer to log information about the node being split and the newly allocated leaf node. Depending on when the crash occurs within the algorithm, the writer either rolls the split operation forward or backward. Developing correct logging and recovery algorithms is challenging and contributes to the increased design complexity of these data structures. In addition, existing designs often require the reconstruction of internal index nodes after a system failure. In contrast, BzTree does not require a tree-level recovery algorithm. Second, the BzTree design works seamlessly across both volatile and persistent environments. To our knowledge, the BzTree is the only index design flexible enough to function and perform well in both environments. Lastly, the BzTree is the only design that is latch-free and highly concurrent, while also ensuring persistence guarantees on NVM.

Chapter 

Future Work

We see several directions for future work on NVM-centric DBMSs. Some of these directions focus on DBMS components that have not been explored in this dissertation, while others represent new questions arising from this work.

. Query Optimization

Cost-based query optimizers in modern DBMSs are designed to take into consideration the gap between the sequential and random I/O costs of durable storage devices. They do not, however, account for the read/write asymmetry that NVM exhibits for both sequential and random I/O [, ]. The optimizer must, therefore, differentiate between reads and writes and take into consideration the convergence of sequential and random access costs on NVM.

The table and index scan algorithms retrieve the tuples satisfying a given predicate by scanning through the associated table and index, respectively. The original cost functions of these algorithms in the optimizer only distinguish between sequential and random accesses. We can adapt these functions to indicate that these accesses are reads. Similarly, when the result of a sub-tree in an execution plan is needed multiple times by the associated parent node, the DBMS materializes it. The cost function for this materialization operation should indicate that the associated accesses to storage are writes.

We can adapt the cost function of a join algorithm by considering the two phases of the algorithm. The function should account for writing and reading all the data one time each. The reads during the join phase tend to be sequential, while the writes in the partitioning phase are random. We note that these adapted cost functions still do not account for the byte-addressability of NVM [].
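A minimal way to make a cost model asymmetry-aware is to split the I/O term into separate read and write components, as in the sketch below; the cost constants and the operator set are illustrative assumptions rather than the cost model of any particular optimizer.

// Illustrative NVM-aware cost model: reads and writes are priced separately,
// and sequential vs. random accesses are assumed to converge in cost.
struct NvmCostModel {
  double read_cost_per_page  = 1.0;  // relative cost units (assumed)
  double write_cost_per_page = 4.0;  // NVM writes assumed ~4x more expensive than reads

  // Table/index scans only read; sequential vs. random no longer matters much.
  double scan_cost(double pages_read) const {
    return pages_read * read_cost_per_page;
  }

  // Materializing a sub-plan result writes it once and reads it back on
  // every subsequent use by the parent operator.
  double materialize_cost(double pages_written, int num_reuses) const {
    return pages_written * write_cost_per_page +
           num_reuses * pages_written * read_cost_per_page;
  }

  // Partitioned join: the partitioning phase writes both inputs once (random),
  // and the join phase reads the partitions back once (sequential).
  double partitioned_join_cost(double outer_pages, double inner_pages) const {
    double pages = outer_pages + inner_pages;
    return pages * write_cost_per_page + pages * read_cost_per_page;
  }
};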

. Query Processing Algorithms

The algorithms backing the relational operators in main-memory DBMSs are designed to have low computational complexity and to exploit the processor caches in modern multi-core chips [, , , ]. We now need to redesign these algorithms to also reduce the number of writes to durable storage, given the read-write asymmetry and limited write endurance of NVM [, , , , ].


The cache-friendly implementation of the hash-join algorithm partitions the input tables so that every pair of partitions can fit within the CPU caches. Unfortunately, the partitioning phase necessitates writing the entire tables back to NVM in their partitioned form []. One can avoid this by keeping track of virtual partitions of the tables that only contain the identifiers of the tuples belonging to a given partition, and by accessing the records in place during the join phase. In this manner, virtual partitioning avoids data copying and reduces the number of writes at the expense of additional reads.

For sorting NVM-resident data, the DBMS can use a hybrid write-limited sorting algorithm called segment sort [, ]. This algorithm sorts a fraction of the input using the faster but write-intensive external merge-sort algorithm, and the remaining fraction using the slower but write-limited selection sort algorithm. The selection sort algorithm performs multiple read passes over the input and writes each element of the input only once, at its final location. The DBMS uses the fraction as a knob for constraining the write intensity of the algorithm relative to its symmetric-I/O counterpart, at the cost of lower performance.

Like sorting, there are also join algorithms that are designed for asymmetric NVM storage. The segmented Grace hash-join, unlike the regular Grace hash-join, materializes only a fraction of the input partitions and repeatedly iterates over the rest of the input to process the remaining partitions []. The associated read amplification does not hurt performance given the read-write asymmetry. We note that it is important for write-limited algorithms to converge to the I/O-minimal behavior of their symmetric-I/O counterparts at lower write-intensity levels.
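The virtual partitioning idea sketched above can be expressed as follows: the partitioning phase stores only tuple identifiers per partition, and the join phase dereferences them in place. The structure and names here are illustrative assumptions.

#include <cstdint>
#include <functional>
#include <vector>

struct Tuple { uint64_t key; uint64_t payload; };

// A virtual partition holds tuple identifiers, not tuple copies, so the
// partitioning phase writes only small identifier arrays instead of the table.
using VirtualPartition = std::vector<uint32_t>;

std::vector<VirtualPartition> build_virtual_partitions(
    const std::vector<Tuple>& table, std::size_t num_partitions) {
  std::vector<VirtualPartition> parts(num_partitions);
  for (uint32_t tid = 0; tid < table.size(); tid++) {
    std::size_t p = std::hash<uint64_t>{}(table[tid].key) % num_partitions;
    parts[p].push_back(tid);  // record the identifier, not the tuple
  }
  return parts;
}

// The join phase accesses the NVM-resident tuples in place through the
// identifiers, paying extra (cheaper) reads instead of a partitioned copy.
template <typename Visitor>
void probe_partition(const std::vector<Tuple>& table,
                     const VirtualPartition& part, Visitor&& visit) {
  for (uint32_t tid : part) visit(table[tid]);
}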

. Access Methods

Given the read-write asymmetry of NVM, it is important to redesign the persistent data structures that are used as access methods in a DBMS so that they perform fewer writes to NVM [, , , , ]. In a persistent NVM-aware B+tree index, the foremost change is to keep the entries in the leaf nodes unsorted so that the tree performs fewer writes and cache-line flushes when it is mutated [, ]. Unsorted key-value entries in a leaf node require an expensive linear scan. This operation is sped up by hashing the keys and using the hashes as a filter to avoid comparing the keys [].

Another design optimization is to selectively enforce persistence, where the tree only persists its leaf nodes and reconstructs its inner nodes during recovery []. In a storage hierarchy containing DRAM and NVM, the tree can persist the leaf nodes on NVM and maintain the inner nodes in DRAM. During recovery from a system failure, the tree rebuilds all the inner nodes that it placed in DRAM. Although this approach increases the recovery latency of the tree, the associated improvements in search and update operations during regular processing justify it [].

A DBMS can also leverage the asymmetric I/O property by temporarily relaxing the balance of the B+tree []. Such imbalance (potentially) causes extra reads to access the tree but reduces the number of writes. This is a good trade-off for NVM because reads are less expensive than writes, and fewer writes reduce the wear on the storage device. By periodically re-balancing the tree and reducing the number of writes, an NVM-aware B+tree outperforms the regular B+tree across different workloads. We note that other data structures used as access methods in a DBMS, such as hash tables, must also be redesigned for NVM.
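A sketch of such an unsorted leaf node with a one-byte fingerprint per entry is shown below; inserts append into the next free slot instead of shifting sorted entries, and lookups compare fingerprints before keys. The node layout and sizes are illustrative assumptions.

#include <cstdint>
#include <functional>
#include <optional>

// Unsorted NVM leaf node: entries are appended, never shifted, so an insert
// dirties only one slot, one fingerprint byte, and a small header.
struct LeafNode {
  static constexpr int kSlots = 64;
  uint16_t count = 0;              // number of occupied slots
  uint8_t  fingerprint[kSlots];    // 1-byte hash of each key, used as a filter
  uint64_t keys[kSlots];
  uint64_t values[kSlots];
};

inline uint8_t fp(uint64_t key) {
  return static_cast<uint8_t>(std::hash<uint64_t>{}(key));
}

bool leaf_insert(LeafNode& node, uint64_t key, uint64_t value) {
  if (node.count == LeafNode::kSlots) return false;  // caller must split the node
  int slot = node.count;
  node.keys[slot] = key;
  node.values[slot] = value;
  node.fingerprint[slot] = fp(key);
  // In a real persistent tree the new slot is flushed before the count,
  // so a torn insert is never visible after a crash.
  node.count = static_cast<uint16_t>(slot + 1);
  return true;
}

std::optional<uint64_t> leaf_lookup(const LeafNode& node, uint64_t key) {
  uint8_t f = fp(key);
  for (int i = 0; i < node.count; i++) {
    if (node.fingerprint[i] == f && node.keys[i] == key)  // filter, then compare
      return node.values[i];
  }
  return std::nullopt;
}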

. Larger-than-Memory Databases

The fundamental limitation of memory-oriented DBMSs is that their improved performance is only achievable when the database is smaller than the amount of physical memory available in the system. If the database does not fit in memory, then the OS will start to page virtual memory, and main memory accesses will cause page faults.

In a three-tier storage hierarchy with DRAM, NVM, and SSDs, the DBMS ensures that the "hot" data resides in DRAM. It then migrates the "cold" data to the NVM and SSD layers over time as the data ages and becomes less likely to be updated. The latency of a transaction that accesses a cold tuple will be higher in a three-tier storage hierarchy than that of a transaction accessing only DRAM-resident data. NVM, however, supports faster reads than SSD, and during update operations the DBMS quickly writes to the log and the database on NVM. Eventually, the DBMS migrates the cold data to SSD.

We note that the WBL protocol described in Chapter  can be used even in such a three-tier storage hierarchy. In this case, the DBMS uses an SSD to store the less frequently accessed tuples in the database. It stores the log and the more frequently accessed tuples on NVM. As the bulk of the data is stored on SSD, the DBMS only requires a less expensive NVM device with a smaller storage capacity. In Chapter , we examined the impact of dynamic data placement on a three-tier storage hierarchy containing NVM. However, the traces used in our experiments were obtained from a DBMS that employs WAL. We plan to explore the utility of WBL in a three-tier storage hierarchy in future work.
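A simple version of such a tiering decision can be expressed as two thresholds over a decayed per-tuple access counter, as in the sketch below; the thresholds and the decay rule are assumptions made for illustration.

#include <cstdint>

enum class Tier { DRAM, NVM, SSD };

// Illustrative three-tier placement: frequently accessed ("hot") tuples stay
// in DRAM, warm tuples live on NVM alongside the log, and cold tuples
// eventually migrate to SSD.
struct PlacementPolicy {
  uint32_t hot_threshold  = 64;  // accesses per epoch (assumed)
  uint32_t warm_threshold = 4;

  Tier classify(uint32_t accesses_this_epoch) const {
    if (accesses_this_epoch >= hot_threshold)  return Tier::DRAM;
    if (accesses_this_epoch >= warm_threshold) return Tier::NVM;
    return Tier::SSD;
  }

  // Applied at the end of every epoch so that tuples age toward colder tiers
  // unless they keep being accessed.
  uint32_t decay(uint32_t counter) const { return counter / 2; }
};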

. SQL Extensions

The DBMS should support SQL extensions that allow the user to control data placement on NVM [, ]. For instance, the user can indicate that certain performance-critical tables and materialized views should reside on NVM using the ON_NVM attribute. When this attribute is specified for a tablespace, the DBMS creates all the tables and materialized views within this tablespace on NVM.

ALTER TABLESPACE nvm_table_space DEFAULT ON_NVM;

By default, the DBMS stores all the columns in a table tagged with the ON_NVM attribute on NVM. However, the user can choose to store only a subset of the columns on NVM if desired. For instance, the following SQL statement excludes the ORDER_TAX column in the ORDERS table from being stored on NVM.

ALTER TABLE orders ON_NVM EXCLUDE(order_tax);

. Testing

Testing the correctness of an NVM DBMS is challenging because the ordering of writes to NVM is outside the control of the system. For instance, cache-line evictions can happen at any time, and the DBMS cannot completely control the behavior of the processor caches. Yat is a hypervisor-based framework for testing the correctness of NVM-oriented systems []. It adopts a record-and-replay method to simulate architectural failure conditions that are specific to NVM. After capturing a sequence of write and fence operations issued by the DBMS, Yat tests all permissible orderings of the operations to provide extensive coverage of possible error conditions. When it detects a system failure, it reports the exact sequence of operations that led to the failure. This information is useful for identifying the root cause of the failure. We plan to test the Peloton DBMS using NVM software validation frameworks in future work.

. High Availability

With the WBL logging protocol, the DBMS can recover from system and transaction failures. However, it cannot cope with media failures or corrupted data, because it relies on the integrity of durable data structures (e.g., the log) during recovery. These failures are instead overcome through replication, wherein the DBMS propagates changes made by transactions to multiple servers [, ]. When the primary server incurs a media failure, replication ensures that there is no data loss, since the secondary servers can be configured to maintain a transactionally consistent copy of the database.

The round-trip latency between the primary and secondary servers is on average a couple of orders of magnitude higher than the durable write latency of NVM. The networking cost is, thus, the major performance bottleneck in replication. A faster replication standard, such as NVMe over Fabrics [, ], is required for efficient transaction processing in a replicated environment containing NVM []. With this technology, the additional latency between a local and a remote NVM device is expected to be less than a few microseconds. As every write must be replicated in most usage models, we expect a logging scheme designed for NVM to outperform WAL in this replicated environment because it incurs fewer writes.

Chapter

Conclusion

In this dissertation, we presented the design and implementation of DBMS architectures that are explicitly tailored for NVM. It focused on three aspects of a DBMS: (1) logging and recovery, (2) storage and buffer management, and (3) indexing.

In Chapter , we presented the fundamentals of storage methods in an NVM DBMS. We implemented three storage engines with different architectures and then developed optimized variants of each of these engines that better make use of NVM's characteristics. Our analysis showed that the NVM-optimized engines outperform their traditional counterparts while reducing the number of writes to the storage device by more than half on write-intensive workloads. We attribute this to the reduction in the redundant data that the NVM-optimized engines store when a transaction modifies the database. We found that the NVM access latency has the most impact on the runtime performance of these engines, more so than the workload skew or the number of modifications to the database in the workload.

We next described WBL, an NVM-optimized logging and recovery protocol, in Chapter . WBL not only improved the runtime performance of the DBMS, but it also enabled it to recover nearly instantaneously from system failures. WBL achieves this by tracking what parts of the database have changed rather than how they were changed. With WBL, an NVM DBMS directly flushes the changes made by transactions to the database instead of recording them in the log. By ordering writes to NVM correctly, it guarantees that all transactions are durable and atomic. Our evaluation showed that WBL reduces the system's recovery time by ×.

We explored the fundamentals of buffer management in a three-tier storage system including DRAM, NVM, and SSD in Chapter . We described a set of data migration optimizations enabled by NVM. The key idea is that since the DBMS can directly operate on NVM-resident data, the buffer manager can adopt a lazy data migration policy for copying data over to DRAM. We illustrated that these optimizations have to be tailored to the characteristics of the storage hierarchy and the workload. We then made the case for adaptive data migration, a continuous adaptation mechanism in the buffer manager that achieves a near-optimal data migration policy for an arbitrary workload and storage hierarchy without requiring any manual tuning.

Lastly, in Chapter , we examined the implications of NVM for index data structures. With the BzTree design, we demonstrated that using PMwCAS, a multi-word compare-and-swap with durability guarantees, greatly reduces index design complexity. Our analysis showed that even though PMwCAS is computationally more expensive than a hardware-based single-word CAS, the simplicity that we gain by using PMwCAS improves not only the maintainability but also the performance of the BzTree. A cyclomatic complexity analysis showed that the BzTree is at least half as complex as state-of-the-art main-memory index designs.

All together, the work described in this dissertation illustrates that rethinking the key algorithms and data structures employed in a DBMS for NVM not only improves performance and reduces operational cost, but also simplifies development and enables the DBMS to support near-instantaneous recovery.

Appendix A

Non-Volatile Memory Emulation

NVM storage devices are currently prohibitively expensive and only support small capacities. For this reason, we use an NVM hardware emulator developed by Intel Labs [] in this dissertation. In this chapter, we present the architecture of the hardware emulator and the interfaces that it exports to applications. We then describe the internals of the NVM-aware memory allocator that we developed specifically for DBMSs.

A. NVM Hardware Emulator Intel has developed a NVM hardware emulator, called persistent memory evaluation platform (PMEP), that models the latency and bandwidth characteristics of upcoming NVM technologies [, ]. This emulator supports tunable read latencies and read/write bandwidths. It enables usto evaluate multiple hardware proles that are not specic to a particular NVM technology. Unlike NVM simulators, like PCM-SIM [], this emulator enables us to better understand the impact of cache evictions, prefetching, and speculative execution on long-running DBMS workloads. NVM technologies have higher read and write latency than DRAM. PMEP emulates the latency for the NVM partition using custom CPU microcode. The microcode estimates the additional cycles that the CPU would have to wait if DRAM is replaced by slower NVM and then stalls the CPU for those cycles. The accuracy of the latency emulation model has been validated by comparing the performance of several applications on emulated NVM and slower NUMA memory []. PMEP emulates the write bandwidth of NVM by limiting the number of DDR operations performed per microsecond. The emulator is implemented on a dual-socket Intel Xeon processor-based platform. Each pro- cessor has eight cores that run at . GHz and supports four DDR channels with two DIMMs per channel. The cores on each processor share a  MB L cache. The emulator’s custom BIOS par- titions the available DRAM memory into emulated NVM and regular (volatile) memory. Half of the memory channels on each processor are reserved for emulated NVM while the rest are used for regular memory. The emulated NVM is visible to the OS as a single NUMA node that interleaves memory (i.e., cache lines) across the two sockets. We congure the latency and bandwidth for the NVM partition by writing to CPU registers through the OS kernel. We refer interested readers to [] for more technical details on the emulator.

 A.. NVM HARDWARE EMULATOR 

We divide the NVM partition into two sub-partitions. The first sub-partition is available to software as a NUMA node. The second sub-partition is managed by the persistent memory file system (PMFS), a file system optimized for persistent memory []. Applications allocate and access memory in the first sub-partition using the libnuma library or tools such as numactl [, ]. We refer to this interface provided by the emulator as the NUMA interface. Applications can also use the regular POSIX file system interface to allocate and access memory in the second sub-partition through the PMFS interface. The sustained bandwidth of NVM is likely to be lower than that of DRAM. We therefore use PMEP's throttling mechanism to reduce the NVM bandwidth to be × lower (. GB/s) than DRAM. We now discuss the two emulator interfaces and how we utilize them.

A.. NUMA Interface This interface allows us to evaluate the performance of DBMSs on NVM without making major modications to the source code. All memory allocations for an application are assigned to the special NUMA node using numactl. Any read or write to memory are slowed down according to the emulator’s latency setting. One potential drawback of this interface is that the DBMS’s program code and OS data structures related to the DBMS’s processes also reside in NVM. Furthermore, memory for other unrelated processes in the system could be allocated to the NUMA node. We did not observe this issue in our trials because of the default Linux memory policy that favors allocations from regular (volatile) memory nodes. In addition, the DBMS’s program code is likely to be cached in the on-chip CPU caches, minimizing the overhead of fetching from the NVM.

A.. PMFS Interface The emulator also supports a le system interface that allows us to deploy DBMSs using NVM with DRAM. Traditional le systems that operate at block granularity and in a layer above the block device abstraction are not best suited for fast, byte-addressable NVM. This is because the overhead of translating between two different address spaces (i.e., virtual addresses in memory and blocks in the block device) and maintaining a page cache in a traditional le system is signicant. PMFS is a lightweight le system developed at Intel Labs that addresses this issue by completely avoiding page cache and block layer abstractions []. PMFS includes several optimizations for byte-addressable NVM that provide a signicant performance improvement over traditional le systems (e.g., EXT). Typically, in a block-oriented lesystem, le I/O requires two copies; one involving the block device and another involving the user buffer. PMFS, however, requires only one copy between the le and the user buffers. This improves the le I/O performance by –× compared to block-oriented lesystems. PMFS also allows applications to access NVM using memory-mapped I/O. Both of the above interfaces use memory from the emulated NVM. The key difference, however, is that the lesystem interface supports a naming mechanism that ensures that le offsets are valid after the system restarts. The downside of the lesystem interface is that it requires the application’s writes to go through the kernel’svirtual lesystem (VFS) layer. In contrast, when the application uses the allocator interface, it can write to and read from NVM directly within userspace. However, the allocator interface does not automatically provide a naming mechanism that is valid after a system restart. We use a memory allocator that is designed for NVM to overcome this limitation.  APPENDIX A. NON-VOLATILE MEMORY EMULATION

Figure A.: Write Bandwidth of Emulator Interfaces – Comparison of the durable write bandwidth of Intel Lab's NVM emulator using the allocator and filesystem interfaces. (a) Sequential Writes; (b) Random Writes. Both panels plot Bandwidth (MB/s) against Chunk Size (B).

A. NVM-aware Memory Allocator

An NVM-aware memory allocator for a DBMS needs to satisfy two essential requirements. The first is that it should provide a durability mechanism to ensure that modifications to data stored on NVM are persisted. This is necessary because the changes made by a transaction to a location on NVM may still reside in the volatile CPU caches when the transaction commits. If a power failure happens before the changes reach NVM, then these changes are lost. The allocator exposes a special API call to provide this durability mechanism. Internally, the allocator first writes back the modifications to NVM using the cache-line write back (CLWB) instruction []. This instruction writes back the modified data in the cache lines to NVM. Unlike the cache-line flush (CLFLUSH) instruction that is generally used for flushing operations, CLWB does not invalidate the line from the cache and instead only transitions it to a non-modified state. This reduces the possibility of a compulsory cache miss when the same data is accessed shortly after the line has been flushed. After flushing the cache lines, the allocator issues an SFENCE instruction to ensure that the data flushed from the CPU caches becomes durable. Otherwise, this data might still be buffered in the memory controller and lost in case of a power failure. From here on, we refer to the above-mentioned durability mechanism as the sync primitive.

The second requirement is that it should provide a naming mechanism for allocations so that pointers to locations in memory are valid even after the system restarts. The allocator ensures that the virtual memory addresses assigned to a memory-mapped region never change. With this mechanism, a pointer to an NVM location is mapped to the same virtual location after the OS or DBMS restarts. We refer to these pointers as non-volatile pointers [].

The NVM allocator that we use in our evaluation is based on the open-source NVM-related libpmem library []. We extended this allocator to follow a rotating best-fit allocation policy and to support multi-threaded usage. The allocator directly maps the NVM to its address space. Unlike the filesystem interface, accessing a region of memory obtained from this allocator does not require copying data to user buffers. After an OS restart, the allocator reclaims memory that has not been persisted and restores its internal metadata to a consistent state. This recovery mechanism is required only after the OS restarts and not after the DBMS restarts, because the allocator handles memory management for all applications.

To show that accessing NVM through the allocator interface is faster than using the filesystem interface, we compare them using a micro-benchmark. In this experiment, the application performs durable writes to NVM using the two interfaces with sequential and random access patterns. The application performs durable writes using the filesystem's fsync system call and the allocator's sync primitive. We vary the size of the data chunk that the application writes from  to  bytes. The results in Figure A. show that the NVM-aware allocator delivers –× higher write bandwidth than the filesystem. The performance gap is more evident when the application writes out small data chunks sequentially. We also note that the gap between sequential and random write bandwidth is lower than that observed in other durable storage technologies.
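A minimal sketch of such a sync primitive, assuming the x86 CLWB and SFENCE intrinsics and a 64-byte cache line: every cache line covering the modified range is written back, and a single fence orders the write-backs.

#include <immintrin.h>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumed cache-line size

// Write back every cache line covering [addr, addr + len) and make the
// write-backs durable before returning.
void nvm_sync(const void* addr, std::size_t len) {
  uintptr_t start = reinterpret_cast<uintptr_t>(addr) & ~(kCacheLineSize - 1);
  uintptr_t end   = reinterpret_cast<uintptr_t>(addr) + len;
  for (uintptr_t line = start; line < end; line += kCacheLineSize) {
    _mm_clwb(reinterpret_cast<void*>(line));  // write back without invalidating
  }
  _mm_sfence();  // ensure the flushed lines are ordered and durable
}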

Appendix B

Benchmarks

We now describe the benchmarks used in our evaluations in this dissertation. We ported these benchmarks in good faith to follow the original spirit of each benchmark's specification.

B. YCSB

The Yahoo! Cloud Serving Benchmark (YCSB) is a workload that is representative of large-scale services provided by web-scale companies []. It is a key-value store workload. We configure each tuple to consist of a unique key and  columns of random string data, each  bytes in size. Thus, the total size of a tuple is approximately  KB. The workload used for this analysis consists of two transaction types: a read transaction and an update transaction. The read transaction randomly selects a key and reads a single tuple. The update transaction randomly selects a key and updates all  non-key values of the selected tuple. We use five types of workload mixtures that allow us to vary the I/O operations that the DBMS executes. These mixtures represent different ratios of read and update transactions:

● Read-Only: % reads
● Read-Heavy: % reads, % updates
● Balanced: % reads, % updates
● Write-Heavy: % reads, % updates

In addition to the read-write mix, we also control the amount of skew that determines how often a tuple is accessed by transactions. We use YCSB's Zipfian distribution to model temporal skew in the workloads, meaning that newer items are accessed much more frequently than older items. The amount of skew is controlled by the Zipfian constant s > 0, where higher values of s generate more heavily skewed workloads. We pick values of s in the range of . to ., which is representative of a range of skewed workloads. We refer to workloads with s = 0.5 and s = 1.5 as low-skew and high-skew workloads, respectively.

B. TPC-C

This benchmark is an industry standard for evaluating the performance of OLTP systems []. The benchmark simulates an order-processing application and consists of nine tables and five different transaction types. Only two of the transaction types modify tuples, but they make up % of a TPC-C workload. For simplicity, we have configured transactions to only access data from a single warehouse. Thus, all transactions are single-sited (i.e., there are no distributed transactions) because warehouses are mapped to partitions.

B. Voter This is an OLTP benchmark that simulates a phone-based election application. It is derived from the software system used to record votes for a television talent show. The workload consists of many short-lived transactions that each update a small number of tuples. Users call in to vote for a con- testant which invokes a new transaction that updates the total number of votes for each contestant. A separate transaction is periodically invoked to display the vote totals for all contestants.

B. CH-benCHmark This is a complex HTAP workload that is derived from a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite. It is useful to evaluate DBMSs designed to serve both OLTP and OLAP workloads. CH-benCHmark extends the TPC-C benchmark with  additional analytical queries.

B. AuctionMark This is an OLTP benchmark that models the workload characteristics of an on-line auction site []. The database and workload properties are derived from information extracted from a well-known auction site. The workload consists of  transactions, two of which are periodically executed to process recently ended auctions. The user-to-item ratio follows a highly skewed Zipan distribution. The total number of transactions that target each item is temporally skewed, as items receive more activity as the auction approaches its closing. Bibliography

[] Apache Cassandra. http://datastax.com/documentation/cassandra/2.0/. [] Control NUMA policy for processes or shared memory. http://linux.die.net/man/8/numactl. [] Direct access for les (DAX). https://www.kernel.org/doc/Documentation/filesystems/dax.txt. [] H-Store. http://hstore.cs.brown.edu. [] LIBNVDIMM: Non-volatile devices. https://www.kernel.org/doc/Documentation/nvdimm/nvdimm.txt. [] Linux perf framework. https://perf.wiki.kernel.org/index.php/Main_Page. [] NUMA policy library. http://linux.die.net/man/3/numa. [] OLTPBenchmark.com. http://oltpbenchmark.com. [] Peloton Database Management System. http://pelotondb.org. [] Persistent memory le system (PMFS). https://github.com/linux-pmfs/pmfs. [] Persistent memory programming library. http://pmem.io/. [] PostgreSQL. https://www.postgresql.org/. [] SAP HANA administration guide: Load/unload a column table into/from memory. http://help.sap.com/saphelp_hanaplatform/helpdata/en/c1/ 33165bbb57101493c5fb19b5b8607f/content.htm. [] VoltDB. http://voltdb.com. [] Oracle TimesTen Products and Technologies. Technical report, February . [] Intel’s d memory is , times faster than modern storage. https://www.engadget.com/2015/07/28/intel-3d-memory-1000-times-faster/, July . [] JEDEC announces support for NVDIMM hybrid memory modules. https://www.jedec.org/news/pressreleases/ jedec-announces-support-nvdimm-hybrid-memory-modules, . [] Intel Architecture Instruction Set Extensions Programming Reference. https: //software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf,

 

. [] NVM Express over Fabrics specication. http://www.nvmexpress.org/specifications, . [] Oracle In-Memory Database White Paper. http://www.oracle.com/technetwork/ database/in-memory/overview/twp-oracle-database-in-memory-2245633.html, . [] Hpe unveils computer built for the era of big data. https://news.hpe.com/a-new-computer-built-for-the-big-data-era/, May . [] A new breakthrough in persistent memory gets its rst public demo. https://itpeernetwork.intel.com/ new-breakthrough-persistent-memory-first-public-demo/, May . [] Intel Optane SSD P Series. https://www.intel.com/content/www/us/en/products/memory-storage/ solid-state-drives/gaming-enthusiast-ssds/optane-905p-booktitle.html, . [] AGIGARAM. DDR NVDIMM. http://www.agigatech.com/ddr3.php. [] R. Agrawal and H. V. Jagadish. Recovery algorithms for database machines with nonvolatile main memory. In IWDM, . [] A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In VLDB, . [] J. H. Anderson, S. Ramamurthy, and R. Jain. Implementing wait-free objects on priority-based systems. In PODC, . [] V. Angkanawaraphan and A. Pavlo. AuctionMark: A Benchmark for High-Performance OLTP Systems. http://hstore.cs.brown.edu/projects/auctionmark. [] J. Arulraj, J. Levandoski, U. F. Minhas, and P.-A. Larson. BzTree: A high-performance latch-free range index for non-volatile memory. In VLDB, . [] J. Arulraj and A. Pavlo. How to build a non-volatile memory database management system. In SIGMOD, . [] J. Arulraj, A. Pavlo, and S. Dulloor. Let’s talk about storage & recovery methods for non-volatile memory database systems. In SIGMOD, . [] J. Arulraj, A. Pavlo, and P. Menon. Bridging the archipelago between row-stores and column-stores for hybrid workloads. In SIGMOD’. [] J. Arulraj, M. Perron, and A. Pavlo. Write-behind logging. In VLDB, . [] M. M. Astrahan et al. System R: relational approach to database management. In ACM TODS, volume , June . [] A. Badam and V. S. Pai. SSDAlloc: hybrid SSD/RAM memory management made easy. In NSDI, .  APPENDIX B. BIBLIOGRAPHY

[] P. Bailis, C. Fournier, J. Arulraj, and A. Pavlo. Research for Practice: Distributed consensus and implications of NVM on database management systems. In Queue, volume , July . [] M. Baker, S. Asami, E. Deprit, J. Ousterhout, and M. Seltzer. Non-volatile memory for fast, reliable le systems. In ASPLOS, . [] D. Bausch, I. Petrov, and A. Buchmann. Making cost-based query optimization asymmetry-aware. In DaMoN. ACM, . [] D. Beaver, S. Kumar, H. C. Li, J. Sobel, P. Vajgel, and F. Inc. Finding a needle in haystack: Facebook’s photo storage. In OSDI, . [] N. Ben-David et al. Parallel algorithms for asymmetric read-write costs. In SPAA, . [] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. In Journal of Machine Learning Research, volume , . [] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, . [] K. Bhandari, D. R. Chakrabarti, and H.-J. Boehm. Makalu: Fast recoverable allocation of non-volatile memory. In SIGPLAN Notices, . [] K. Bhandari et al. Implications of CPU caching on byte-addressable non-volatile memory programming. In HPL Tech Report, . [] T. Bingmann. STX B+ tree C++ template classes. http://panthema.net/2007/stx-btree/. [] G. E. Blelloch, J. T. Fineman, P. B. Gibbons, Y. Gu, and J. Shun. Sorting with asymmetric read and write costs. In SPAA, . [] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. In Commun. ACM, . [] P. Boncz, S. Manegold, and M. L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In VLDB, . [] D. R. Chakrabarti, H.-J. Boehm, and K. Bhandari. Atlas: Leveraging locks for non-volatile memory consistency. In ACM SIGPLAN Notices, volume , . [] A. Chatzistergiou, M. Cintra, and S. D. Viglas. REWIND: Recovery write-ahead system for in-memory non-volatile data-structures. In PVLDB, . [] F. Chen, M. Mesnier, and S. Hahn. A protected block device for persistent memory. In MSST, . [] S. Chen, P. B. Gibbons, and S. Nath. Rethinking database algorithms for phase change memory. In CIDR, . [] S. Chen and Q. Jin. Persistent b+-trees in non-volatile main memory. In PVLDB, volume , . [] H. Chu. MDB: A Memory-Mapped Database and Backend for OpenLDAP. Technical report, OpenLDAP, . 

[] J. Coburn, T. Bunker, M. Schwarz, R. Gupta, and S. Swanson. From ARIES to MARS: Transaction support for next-generation, solid-state drives. In SOSP, . [] J. Coburn, A. M. Cauleld, A. Akel, L. M. Grupp, R. K. Gupta, R. Jhala, and S. Swanson. Nv-heaps: making persistent objects fast and safe with next-generation, non-volatile memories. In ASPLOS. ACM, . [] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee. Better I/O through byte-addressable, persistent memory. In SOSP, . [] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In SoCC, . [] G. Copeland, T. Keller, R. Krishnamurthy, and M. Smith. The case for safe ram. In VLDB, . [] J. Corbet. LFCS: Preparing linux for nonvolatile memory devices. In LWN, April . [] J. Corbet. Supporting lesystems in persistent memory. https://lwn.net/Articles/610174/, . [] J. Dean and S. Ghemawat. LevelDB. http://leveldb.googlecode.com. [] J. DeBrabant, J. Arulraj, A. Pavlo, M. Stonebraker, S. Zdonik, and S. Dulloor. A prolegomenon on OLTP database systems for non-volatile memory. In ADMS@VLDB, . [] J. DeBrabant, A. Pavlo, S. Tu, M. Stonebraker, and S. Zdonik. Anti-caching: A new approach to database management system architecture. In VLDB, . [] D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In SIGMOD Rec., volume , . [] C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: SQL Server’s Memory-optimized OLTP Engine. In SIGMOD, . [] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudré-Mauroux. Oltp-bench: An extensible testbed for benchmarking relational databases. In PVLDB, volume , . [] A. Driskill-Smith. Latest advances and future prospects of STT-RAM. In Non-Volatile Memories Workshop, . [] S. R. Dulloor, S. K. Kumar, A. Keshavamurthy, P. Lantz, D. Subbareddy, R. Sankaran, and J. Jackson. System software for persistent memory. In EuroSys, . [] S. R. Dulloor, A. Roy, Z. Zhao, N. Sundaram, N. Satish, R. Sankaran, J. Jackson, and K. Schwan. Data tiering in heterogeneous memory systems. In EuroSys, . [] A. Eldawy, J. Levandoski, and P.-Å. Larson. Trekking through siberia: Managing cold data in a memory-optimized database. In PVLDB, volume . VLDB Endowment, . [] R. Fang, H.-I. Hsiao, B. He, C. Mohan, and Y. Wang. High performance database logging using storage class memory. In ICDE, .  APPENDIX B. BIBLIOGRAPHY

[] M. Franklin. Concurrency control and recovery. In The Computer Science and Engineering Handbook, . [] K. Fraser. Practical lock-freedom. PhD thesis, University of Cambridge, . [] S. Gao, J. Xu, B. He, B. Choi, and H. Hu. Pcmlogging: reducing transaction logging overhead with PCM. In CIKM, . [] H. Garcia-Molina and K. Salem. Main memory database systems: An overview. In IEEE TKDE, . [] V. Garg, A. Singh, and J. R. Haritsa. On improving write performance in PCM databases. Technical report, TR--, IISc, . [] D. Gawlick and D. Kinkade. Varieties of concurrency control in IMS/VS Fast Path. Technical report, Tandem, . [] G. Graefe, W. Guy, and C. Sauer. Instant recovery with write-ahead logging: Page repair, system restart, and media restore. In Synthesis Lectures on Data Management. Morgan & Claypool Publishers, . [] G. Graefe, H. Kimura, and H. Kuno. Foster b-trees. In TODS, volume , . [] G. Graefe, H. Volos, H. Kimura, H. Kuno, J. Tucek, M. Lillibridge, and A. Veitch. In-memory performance for big data. In VLDB, volume , . [] J. Gray, P. Helland, P. O’Neil, and D. Shasha. The dangers of replication and a solution. In SIGMOD Record, volume , . [] J. Gray, P. McJones, M. Blasgen, B. Lindsay, R. Lorie, T. Price, F. Putzolu, and I. Traiger. The recovery manager of the system R database manager. In ACM Comput. Surv., volume , June . [] M. Greenwald. Two-handed emulation: How to build non-blocking implementations of complex data-structures using DCAS. In PODC, . [] R. A. Hankins and J. M. Patel. Data morphing: An adaptive, cache-conscious storage technique. In VLDB, . [] T. Härder, C. Sauer, G. Graefe, and W. Guy. Instant recovery with write-ahead logging. In Datenbank-Spektrum, . [] S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker. OLTP through the looking glass, and what we found there. In SIGMOD, . [] S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, . [] R. Harris. Windows leaps into the NVM revolution. http://www.zdnet.com/article/windows-leaps-into-the-nvm-revolution/, Apr. . [] T. L. Harris, K. Fraser, and I. A. Pratt. A practical multi-word compare-and-swap operation. In DISC, . 

[] A. Hassan et al. Energy-efficient in-memory data stores on hybrid memory hierarchies. In DaMoN. ACM, . [] M. Hedenfalk. Copy-on-write B+ Tree. http://www.bzero.se/ldapd/. [] J. M. Hellerstein, M. Stonebraker, and J. R. Hamilton. Architecture of a Database System. In Foundations and Trends in Databases, volume , . [] M. Herlihy. A methodology for implementing highly concurrent data objects. In ACM TOPLAS, volume , Nov. . [] W. W. Hsu, A. J. Smith, and H. C. Young. Characteristics of production database workloads and the tpc benchmarks. In IBM Systems Journal, volume , . [] J. Huang, K. Schwan, and M. K. Qureshi. Nvram-aware logging in transaction systems. In VLDB, Dec. . [] Y. E. Ioannidis and E. Wong. Query optimization by simulated annealing, volume . ACM, . [] A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of strong shared memory primitives. In PODC, . [] B. L. Jacob, P. M. Chen, S. R. Silverman, and T. N. Mudge. An analytical model for designing memory hierarchies. In IEEE Transactions on Computers, volume . IEEE, . [] C. Jacobi, T. Slegel, and D. Greiner. Transactional memory architecture and implementation for IBM System Z. In MICRO, . [] R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsa. Shore-MT: a scalable storage manager for the multicore era. In EDBT, . [] A. Joshi, V. Nagarajan, M. Cintra, and S. Viglas. Efficient persist barriers for multicores. In MICRO. ACM, . [] R. Kallman et al. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. In VLDB, . [] A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, . [] H. Kim et al. Evaluating phase change memory for enterprise storage systems: A study of caching and tiering approaches. In FAST, . [] H. Kimura. FOEDUS: OLTP engine for a thousand cores and NVRAM. In SIGMOD, . [] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. In Science, volume . American Association for the Advancement of Science, . [] T. Klima. Using non-volatile memory (NVDIMM-N) as byte-addressable storage in windows server . https://channel9.msdn.com/events/build/2016/p470, . [] A. Kolli. Architecting Persistent Memory Systems. PhD thesis, University of Michigan, . [] A. Kolli, J. Rosen, S. Diestelhorst, A. Saidi, S. Pelley, S. Liu, P. M. Chen, and T. F. Wenisch. Delegated persist ordering. In MICRO. IEEE Press, .  APPENDIX B. BIBLIOGRAPHY

[] B. Kuszmaul. A Comparison of Fractal Trees to Log-Structured Merge (LSM) Trees. Technical report, Tokutek, . [] Y. Kwon, H. Fingler, T. Hunt, S. Peter, E. Witchel, and T. Anderson. Strata: A cross media le system. In SOSP, . [] D. Laney. -D data management: Controlling data volume, velocity and variety. Feb. . [] P. Lantz, S. Dulloor, S. Kumar, R. Sankaran, and J. Jackson. Yat: A validation framework for persistent memory software. In ATC, . [] P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. In VLDB, . [] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. W. Keller. Energy management for commercial servers. In Computer, volume , Los Alamitos, CA, USA, . IEEE Computer Society Press. [] V. Leis. Query Processing and Optimization in Modern Database Systems. PhD thesis, Technische Universität München, . [] V. Leis, M. Haubenschild, A. Kemper, and T. Neumann. Leanstore: In-memory data management beyond main memory. In ICDE, . [] V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE, . [] V. Leis, F. Scheibner, A. Kemper, and T. Neumann. The ART of practical synchronization. In DaMoN, . [] J. Levandoski, D. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for new hardware platforms. In ICDE, . [] J. Levandoski, D. B. Lomet, S. Sengupta, R. Stutsman, and R. Wang. High performance transactions in deuteronomy. In CIDR, . [] J. J. Levandoski, P.-Å. Larson, and R. Stoica. Identifying hot and cold data in main-memory databases. In ICDE, . [] LevelDB. Implementation details of LevelDB. https://leveldb.googlecode.com/svn/trunk/doc/impl.html. [] R.-S. Liu, D.-Y. Shen, C.-L. Yang, S.-C. Yu, and C.-Y. M. Wang. NVM Duet: Unied working memory and persistent store architecture. In SIGARCH Computer Architecture News, volume . ACM, . [] D. E. Lowell and P. M. Chen. Free transactions with rio vista. In SOSP, . [] Y. Lu, J. Shu, L. Sun, and O. Mutlu. Loose-ordering consistency for persistent memory. In ICCD. IEEE, . [] L. Ma et al. Larger-than-memory data management on modern storage hardware for in-memory oltp database systems. In DaMoN, . 

[] P. Macko. A simple PCM block device simulator for Linux. https://code.google.com/p/pcmsim/people/list. [] D. Makreshanski, J. Levandoski, and R. Stutsman. To lock, swap, or elide: On the interplay of hardware transactional memory and lock-free indexing. In PVLDB, volume , . [] N. Malviya, A. Weisberg, S. Madden, and M. Stonebraker. Rethinking main memory OLTP recovery. In ICDE, . [] J. A. Mandelman, R. H. Dennard, G. B. Bronner, J. K. DeBrosse, R. Divakaruni, Y. Li, and C. J. Radens. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). In IBM J. Res. Dev., volume , Riverton, NJ, USA. IBM Corp. [] Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In EuroSys, . [] MemSQL. How MemSQL Works. http://docs.memsql.com/4.1/intro/. [] J. Meza, Y. Luo, S. Khan, J. Zhao, Y. Xie, and O. Mutlu. A case for efficient hardware/software cooperative management of storage and memory. . [] Microsoft. Proling Tools. https://docs.microsoft.com/en-us/visualstudio/ profiling/profiling-feature-tour. [] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: a transaction recovery method supporting ne-granularity locking and partial rollbacks using write-ahead logging. In ACM TODS, volume , . [] D. Molka, D. Hackenberg, R. Schone, and M. S. Muller. Memory performance and cache coherency effects on an intel nehalem multiprocessor system. In PACT ’. [] I. Moraru, D. Andersen, M. Kaminsky, N. Tolia, P. Ranganathan, and N. Binkert. Consistent, durable, and safe memory management for byte-addressable non volatile main memory. In TRIOS, . [] M. Nanavati, M. Schwarzkopf, J. Wires, and A. Wareld. Non-volatile Storage: Implications of the datacenter’s shifting center. In Queue, volume , January . [] D. Narayanan and O. Hodson. Whole-system persistence. In SIGARCH Computer Architecture News, volume , . [] F. Nawab, D. R. Chakrabarti, T. Kelly, and C. B. Morrey III. Procrastination beats prevention: Timely sufficient persistence for efficient crash resilience. In EDBT, . [] T. Neumann, T. Mühlbauer, and A. Kemper. Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems. In SIGMOD, . [] NVM Express Inc. NVM Express over Fabrics specication. http://www.nvmexpress.org/specifications, . [] G. Oh, S. Kim, S.-W. Lee, and B. Moon. Sqlite optimization with phase change memory for mobile applications. In VLDB, volume , Aug. . [] M. Olson, K. Bostic, and M. Seltzer. Berkeley db. In Proceedings of the FREENIX Track:  USENIX Annual Technical Conference, .  APPENDIX B. BIBLIOGRAPHY

[] E. J. O’Neil, P. E. O’Neil, and G. Weikum. The lru-k page replacement algorithm for database disk buffering. In SIGMOD, . [] P. O’Neil, E. Cheng, D. Gawlick, and E. O’Neil. The log-structured merge-tree (lsm-tree). In Acta Inf., volume , June . [] D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. In SOSP, . [] I. Oukid. Architectural Principles for Database Systems on Storage-Class Memory. PhD thesis, Technische Universität Dresden, . [] I. Oukid, D. Booss, W. Lehner, P. Bumbulis, and T. Willhalm. SOFORT: A hybrid SCM-DRAM storage engine for fast data recovery. In DaMoN, . [] I. Oukid, D. Booss, A. Lespinasse, and W. Lehner. On testing persistent-memory-based software. In DaMoN, . [] I. Oukid, D. Booss, A. Lespinasse, W. Lehner, T. Willhalm, and G. Gomes. Memory management techniques for large-scale persistent-main-memory systems. In VLDB, . [] I. Oukid, J. Lasperas, A. Nica, T. Willhalm, and W. Lehner. FPTree: A hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory. In SIGMOD, . [] I. Oukid and W. Lehner. Data structure engineering for byte-addressable non-volatile memory. In SIGMOD, . [] A. Pavlo. On Scalable Transaction Execution in Partitioned Main Memory Database Management Systems. PhD thesis, Brown University, . [] A. Pavlo, C. Curino, and S. Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, . [] S. Pelley, P. M. Chen, and T. F. Wenisch. Memory persistency. In ACM SIGARCH Computer Architecture News, volume . IEEE Press, . [] S. Pelley, T. F. Wenisch, B. T. Gold, and B. Bridge. Storage management in the NVRAM era. In PVLDB, volume , . [] S. Pelley, T. F. Wenisch, and K. LeFevre. Do query optimizers need to be ssd-aware? In ADMS, . [] T. Perez and C. Rose. Non-volatile memory: Emerging technologies and their impact on memory systems. In PURCS Technical Report, . [] M. Pezzini, D. Feinberg, N. Rayner, and R. Edjlali. Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation. https://www.gartner.com/doc/2657815/, . [] S. Pilarski and T. Kameda. Checkpointing for distributed databases: Starting from the basics. In IEEE Trans. Parallel Distrib. Syst., . [] A. Prout. The Story Behind MemSQL’s Skiplist Indexes. http://blog.memsql.com/the-story-behind-memsqls-skiplist-indexes/, . 

[] M. K. Qureshi, V. Srinivasan, and J. A. Rivers. Scalable high performance main memory system using phase-change memory technology. In ISCA, . [] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, Inc., New York, NY, USA,  edition, . [] L. E. Ramos, E. Gorbatov, and R. Bianchini. Page placement in hybrid memory systems. In SC. ACM, . [] S. Raoux, G. Burr, M. Breitwisch, C. Rettner, Y. Chen, R. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung, and C. Lam. Phase-change random access memory: A scalable technology. In IBM Journal of Research and Development, volume , . [] D. Roberts. Efficient Data Center Architectures Using Non-Volatile Memory and Reliability Techniques. PhD thesis, University of Michigan, . [] O. Rodeh. B-trees, shadowing, and clones. In Trans. Storage, . [] M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured le system. In ACM Trans. Comput. Syst., volume , Feb. . [] A. Rudoff. Deprecating the PCOMMIT instruction. https://software.intel.com/ en-us/blogs/2016/09/12/deprecate-pcommit-instruction, . [] D. Schwalb, T. Berning, M. Faust, M. Dreseler, and H. Plattner. nvm malloc: Memory allocation for NVRAM. In ADMS, . [] N. Shavit and D. Touitou. Software transactional memory. In PODC, . [] D.-J. Shin, S. K. Park, S. M. Kim, and K. H. Park. Adaptive page grouping for energy efficiency in hybrid pram-dram main memory. In RACS. ACM, . [] V. Sikka, F. Färber, W. Lehner, S. K. Cha, T. Peh, and C. Bornhövd. Efficient transaction processing in SAP HANA database: The end of a column store myth. In SIGMOD, . [] R. Stoica and A. Ailamaki. Enabling efficient OS paging for main-memory OLTP databases. In DaMon, . [] M. Stonebraker. Operating system support for database management. In Communications of the ACM, volume . ACM, . [] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it’s time for a complete rewrite). In VLDB, . [] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams. The missing memristor found. In Nature, number , . [] H. Sundell and P. Tsigas. Lock-free deques and doubly linked lists. In JPDC, volume , July . [] The Transaction Processing Council. TPC-C Benchmark (Revision ..). http://www.tpc.org/tpcc/, June . [] A. van Renen et. al. Managing non-volatile memory in database systems. In SIGMOD, .  APPENDIX B. BIBLIOGRAPHY

[] S. Venkataraman, N. Tolia, P. Ranganathan, and R. H. Campbell. Consistent and durable data structures for non-volatile byte-addressable memory. In FAST, . [] S. D. Viglas. Adapting the b+-tree for asymmetric i/o. In ADBIS, . [] S. D. Viglas. Write-limited sorts and joins for persistent memory. In PVLDB, volume , . [] S. D. Viglas. Data management in non-volatile memory. In SIGMOD, . [] V. Viswanathan, K. Kumar, and T. Willhalm. Intel Memory Latency Checker. https: //software.intel.com/en-us/articles/intelr-memory-latency-checker. [] H. Volos, S. Nalli, S. Panneerselvam, V. Varadarajan, P. Saxena, and M. M. Swift. Aerie: Flexible le-system interfaces to storage-class memory. In EuroSys, . [] H. Volos, A. J. Tack, and M. M. Swift. Mnemosyne: lightweight persistent memory. In R. Gupta and T. C. Mowry, editors, ASPLOS. ACM, . [] C. Wang et al. NVMalloc: Exposing an aggregate SSD store as a memory partition in extreme-scale machines. In IPDPS. IEEE, . [] T. Wang and R. Johnson. Scalable logging through emerging non-volatile memory. In PVLDB, volume , . [] T. Wang, J. Levandoski, and P. A. Larson. Easy lock-free indexing in non-volatile memory. In SIGMOD, . [] P. Wu et al. Algorithm-directed data placement in explicitly managed non-volatile memory. In HPDC, . [] X. Wu and A. Reddy. Scmfs: a le system for storage class memory. In SC, . [] J. Xu and S. Swanson. Nova: A log-structured le system for hybrid volatile/non-volatile main memories. In FAST, . [] J. Xu, L. Zhang, A. Memaripour, A. Gangadharaiah, A. Borase, T. B. Da Silva, S. Swanson, and A. Rudoff. Nova-fortis: A fault-tolerant non-volatile main memory le system. In SOSP, . [] J. Yang, Q. Wei, C. Chen, C. Wang, K. L. Yong, and B. He. NV-Tree: Reducing consistency cost for NVM-based single level systems. In FAST, . [] R. M. Yoo, C. J. Hughes, K. Lai, and R. Rajwar. Performance evaluation of intel transactional synchronization extensions for high-performance computing. In SC, . [] S. Yu, N. Xiao, M. Deng, Y. Xing, F. Liu, Z. Cai, and W. Chen. Walloc: An efficient wear-aware allocator for non-volatile main memory. In IPCCC, . [] Y. Zhang, J. Yang, A. Memaripour, and S. Swanson. Mojim: A reliable and highly-available non-volatile memory system. In ASPLOS, . [] J. Zhao, S. Li, D. H. Yoon, Y. Xie, and N. P. Jouppi. Kiln: Closing the performance gap between systems with and without persistence support. In MICRO. IEEE, . [] W. Zheng, S. Tu, E. Kohler, and B. Liskov. Fast databases with fast durability and recovery through multicore parallelism. In OSDI, .