
FineLine: Log-structured Transactional Storage and Recovery

Caetano Sauer∗ (Tableau Software, Munich, Germany; [email protected]), Goetz Graefe (Google Inc., Madison, WI, USA; [email protected]), and Theo Härder (TU Kaiserslautern, Kaiserslautern, Germany; [email protected])

∗ Work done partially at TU Kaiserslautern.

ABSTRACT

Recovery is an intricate aspect of database architectures. In its traditional implementation, recovery requires the management of two persistent data stores—a write-ahead log and a materialized database—which must be carefully orchestrated to maintain transactional consistency. Furthermore, the design and implementation of recovery algorithms have deep ramifications into almost every component of the internal system architecture, from concurrency control to buffer management and access path implementation. Such complexity not only incurs high costs for development, testing, and training, but also unavoidably affects system performance, introducing overheads and limiting scalability.

This paper proposes a novel approach for transactional storage and recovery called FineLine. It simplifies the implementation of transactional database systems by eliminating the log-database duality and maintaining all persistent data in a single, log-structured data structure. This approach not only provides more efficient recovery with less overhead, but also decouples the management of persistent data from in-memory access paths. As such, it blurs the lines that separate in-memory from disk-based database systems, providing the efficiency of the former with the reliability of the latter.

PVLDB Reference Format:
Caetano Sauer, Goetz Graefe, and Theo Härder. FineLine: Log-structured Transactional Storage and Recovery. PVLDB, 11 (13): 2249-2262, 2018.
DOI: https://doi.org/10.14778/3275366.3275373

[Figure 1: Single-storage principle of FineLine. (a) Dual storage: main memory backed by a log and a database (DB). (b) FineLine: main memory backed by a single indexed log.]

1. INTRODUCTION

Database systems widely adopt the write-ahead logging approach to support transactional storage and recovery. In this approach, the system maintains two forms of persistent storage: a log, which keeps track of and establishes a global order of individual transactional updates; and a database, which is the permanent store for data pages. The former is an append-only structure, whose writes must be synchronized at transaction commit time, while the latter is updated asynchronously by propagating pages from a buffer pool into the database.

The primary reason behind such a dual-storage design, illustrated in Fig. 1a, is to maximize transaction throughput while still providing transactional guarantees, taking into account the characteristics of typical storage architectures. The log, being an append-only structure, is write-optimized, while the database maintains data pages in a "ready-to-use" format and is thus read-optimized.

The downside of the dual-storage approach is that maintaining transactional consistency requires a careful orchestration between the log and the database. This requires complex software logic that not only leads to increased code maintenance and testing costs, but also unavoidably limits the performance of memory-resident workloads. Furthermore, the propagation of logged updates into the database usually lags behind the log by a significant margin, leading to longer outages during recovery from system failures.

These problems have been known for a long time, as noted recently by Stonebraker, referring to the now 30-year-old POSTGRES design [46]:

"... a DBMS is really two DBMSs, one managing the database as we know it and a second one managing the log." — Michael Stonebraker [47]

A single-storage approach eliminates these problems, but existing solutions are usually less reliable and more restrictive than traditional write-ahead logging—for instance, because they require all data to fit in main memory, are designed for non-volatile memory and cannot perform well with hard disks and SSDs, or do not support media recovery.

This paper proposes a novel, single-storage approach for transactional storage and recovery called FineLine. It employs a generalized, log-structured data structure for persistent storage that delivers a sweet spot between the write efficiency of a traditional write-ahead log and the read efficiency of a materialized database. This data structure, referred to here as the indexed log, is illustrated in Fig. 1b.

The indexed log of FineLine decouples persistence concerns from the design of in-memory data structures, thus opening opportunity for several optimizations proposed for in-memory database systems. The system architecture is also general enough to support a wide range of storage devices and larger-than-memory datasets via buffer management. Unlike many existing proposals, this generalized approach not only improves performance but also provides all features of write-ahead logging as implemented in the ARIES family of algorithms—e.g., media recovery, partial rollbacks, index and space management, fine-granular recovery with physiological log records, and support for datasets larger than main memory. In fact, the capabilities of FineLine go beyond ARIES with support for on-demand recovery, localized (e.g., single-page) repair, and no-steal propagation, but with a substantially simpler design. Lastly, the FineLine design is completely independent from concurrency control schemes and access path implementation, i.e., it supports any storage and index structures with any concurrency control protocol.

In the remainder of this paper, Section 2 discusses related work. Section 3 introduces the basic system architecture and its components. Sections 4 and 5 then expose the details of the logging and recovery algorithms, respectively. Finally, Section 6 provides some preliminary experiments and Section 7 concludes this paper.

2. RELATED WORK

This section discusses related work, focusing on the limitations that the FineLine design aims to overcome. This new approach is not expected to be superior to all existing approaches in their own target domains, but it is unique in the way it combines their advantages in a generalized design.

2.1 Single-storage approaches

System designs that provide a single persistent storage structure fall into two main categories: those that eliminate the log and those that eliminate the database. Before discussing these in detail, it is important to point out that this work considers single and dual storage from a logical data-structure perspective. In other words, a single-storage approach is not one that maintains all data in a single physical device or class of devices, but rather one in which a single data structure is used to manage all persistent data.

Approaches that completely eliminate the log must employ a no-steal/force mechanism [25], i.e., they must synchronously and atomically propagate all updates made by a transaction at commit time, and uncommitted updates must not reach the materialized database. Early research on transaction processing has proposed some such designs [5, 24, 32], but they never made it into production use because of their inefficiency in traditional storage architectures. The POSTGRES system [46] mentioned earlier falls into the same category, with the main difference that a small amount of non-volatile memory is assumed. Some designs for in-memory databases employ a no-steal/no-force strategy [11, 33, 48], which eliminates persistent undo logging, but a log is still required for redo recovery.

The LogBase approach [50] uses a log-based representation for persistent data. It maintains in-memory indexes for key-value pairs where the leaf entries simply point to the latest version of each record in the log. This way, updates on the index are random in main memory but sequential on disk. In order to deliver acceptable scan performance, the persistent log is then reorganized by sorting and merging, similar to log-structured merge (LSM) trees [37].

Despite the single-storage advantage, this approach has three major limitations. The first limitation stems from the fact that the approach seems to be targeted at NoSQL key-value stores rather than OLTP database systems. LogBase is designed for write-intensive, disk-resident workloads, and thus it makes poor use of main memory by storing the whole index in it and requiring an additional layer of indirection—and thus additional caching—to fetch actual data records from the log. Second, it destroys clustering and sequentiality properties of access paths, making scans quite inefficient and precluding the use of memory-optimized index structures—e.g., the Masstree [34] used in Silo [48] or the Adaptive Radix Tree [31] used in HyPer [29]. The third limitation is that recovery requires a full scan of the log to rebuild the in-memory indexes. As a remedy, the authors propose taking periodic checkpoints of these indexes. However, these checkpoints essentially become a second persistent storage representation, and the single-storage characteristic is lost. While details of how recovery is implemented are not provided in the original publication [50], it seems like traditional recovery techniques like ARIES [35] or System R [21] are required.

Hyder [6] is a distributed transactional system that maintains a shared log as the only persistent storage structure. The design greatly simplifies recovery and enables simple scale-out without partitioning. It also exploits the implicit multi-versioning of its log-structured storage to perform optimistic concurrency control with its novel meld algorithm. Despite the design simplicity and scalability, its recovery and concurrency control protocols are tightly coupled. Furthermore, a fundamental assumption of Hyder is that the entire database is modelled as a binary search tree. These restrictions, while acceptable for the distributed scenario envisioned by the authors, have major performance implications for single-node, memory-optimized database systems. In this case, a single-storage approach that (i) accommodates arbitrary in-memory storage and index structures, and (ii) is independent of concurrency control algorithms would provide more choices for optimizing performance.

Another recent approach for a single-storage, log-structured database is Aurora [49]. Also aimed at distributed transactions, the goal of this design is to cut down the I/O costs of replication and durability, as well as to simplify recovery. It achieves that by shipping log records, grouped by some page-ID partitioning scheme, from processing nodes to replicated storage nodes, where these logs are saved for durability and also applied to local materialized copies of database pages. From a general perspective, a key architectural benefit of the approach is the decoupling of processing and storage via logging, which is also a central goal of this paper. Here, a more general architecture is presented and contrasted with write-ahead logging for general-purpose database systems, which is also applicable for single-node scenarios and compatible with many existing designs (e.g., by adding a transactional persistence layer to an existing in-memory database).

2.2 Recovery in in-memory databases

Achieving an appropriate balance between reliability in the presence of failures and high performance during normal processing is one of the key challenges when designing transaction processing systems. In traditional disk-based architectures, ARIES [35] and its physiological logging approach seem to be the most appropriate solution, given its success in production. However, the emergence of in-memory databases has since challenged this design.

One prominent alternative to ARIES is to employ logical logging, in which high-level operations are logged on coarser objects, such as a whole table, instead of low-level physical objects such as pages. The advantage of logical logging is that log volume, and therefore traffic to the persistent log device, is significantly reduced, thus increasing transaction throughput. However, a known trade-off of logging and recovery algorithms is that logical logging requires a stronger level of consistency on the persistent database [25], and maintaining such a level of consistency may, in some cases, outweigh the benefits of logical logging.

An example of logical logging is found, for instance, in the H-Store database system and its command logging approach [33]. Each log record describes a single transaction, encoded as a stored-procedure identifier and the arguments used to invoke it. Because arbitrarily complex transactions are logged with a single log record, the granularity of logging and recovery is the coarsest possible: the whole database. Consequently, the persistent database must be kept at the strongest possible level of consistency—transaction consistency [25]—and maintaining it requires expensive checkpointing procedures that rely on shadow paging [21] or copy-on-write snapshots [33, 29]. The CALC approach [39] is a more sophisticated checkpointing technique that minimizes overhead on transaction processing by establishing virtual points of consistency with a multi-phase process. However, its overhead is still noticeable, as copies of records must be produced and bitmaps must be maintained during a checkpoint; thus, transaction latency spikes are still observed during checkpoints. Because maintaining transaction-consistent checkpoints is expensive, they must be taken sparingly so as not to affect performance negatively. This, on the other hand, implies that longer recovery times are required in case of failures—not only because the persistent state is expected to be quite out of date, but also because log replay requires re-executing transactions serially [33].

One commonly overlooked aspect of command logging is that it does not really eliminate the overhead of generating physiological log records, since these are still required for transaction abort [33]. Rather, it eliminates the overhead of inserting these log records in a centralized log buffer and flushing them at commit time. These overheads, however, can also be mitigated in physiological logging approaches, e.g., with well-balanced log bandwidth [48], scalable log managers [28], and distributed logging [51]. With these concerns properly addressed, the advantages of command logging seem less compelling.

Given these limitations, a solution based on physiological logging—perhaps more general and more amenable to main-memory optimizations than traditional ARIES—seems more appropriate to provide efficient recovery with minimal overhead on transaction execution.

2.3 Instant recovery

Instant recovery [17] is a family of techniques that builds upon traditional write-ahead logging with physiological log records to improve database system availability. The main idea is to perform recovery actions incrementally and on demand, so that the system can process new transactions shortly after a failure and before recovery is completed. The instant recovery algorithms—instant restart after a system failure, instant restore after a media failure—exploit the independence of recovery among objects that is inherent to physiological logging [35]. Pages can be recovered independently—and thus incrementally and on demand—during redo recovery by retrieving and replaying the history of log records pertaining to each page independently. The same independence of recovery applies to transactions that must be rolled back for undo recovery. The adaptive logging approach [52] exploits this independence to improve recovery times in the command logging technique by adaptively incorporating physiological log records.

The support for incremental and on-demand recovery is crucial for decoupling downtime after a failure from the amount of recovery work that must be performed. In this new paradigm, a more useful metric of recovery efficiency is how quickly the system is able to regain its full performance after a failure, thus unifying the concerns of fast recovery and quick system warm-up [38].

2.4 Modern storage hardware

While the cost of DRAM has decreased substantially in the past years, another recent phenomenon in memory technology is the advancement of non-volatile memory devices. Although these efforts seem promising, assuming that the new technology will replace all other forms of storage is likely unrealistic. A relevant example is the advent of flash memory: rather than completely replacing magnetic hard disks, it is employed as a better fit for a certain type of demand, namely low I/O latency and lower (albeit not nonexistent) imbalance between sequential and random access speeds. However, in terms of sustained sequential bandwidth, endurance, and capacity, magnetic disks still provide the more economically favorable option. Therefore, it is reasonable to expect that non-volatile memory will, at least at first, coexist with other forms of storage media, fulfilling certain, but not all, application demands in a more cost-efficient manner.

The "five-minute rule", introduced three decades ago [22] and refined multiple times since then [20, 16], points to the importance of considering economic factors when planning memory and storage infrastructures. A recent re-evaluation of the five-minute rule [2], for instance, makes a strong case for the use of flash-based SSDs in the foreseeable future, potentially coexisting with non-volatile memory. Similar arguments have been made in support of buffer management in modern database systems [19, 30]. The emergence of new memory technologies is likely to further promote the importance of the five-minute rule, leading to a wider spectrum of choices and optimization goals. Thus, systems that embrace the heterogeneity of storage hardware with a unified software approach are desirable.

2.5 Log-structured storage

Much like the recovery log in a write-ahead logging approach with a no-force policy [25], an LSM-tree [37] increases I/O efficiency by converting random writes into sequential ones. The difference is that the recovery log usually serves a temporary role: during normal operation, log space is recycled as pages are propagated [44]; during restart, at least in a traditional ARIES-based approach [35], the log is mostly read sequentially from pre-determined offsets in the analysis, redo, and undo phases. Given these restricted access patterns, a typical recovery log is never indexed or reorganized like an LSM-tree.

Our previous work on instant recovery from media failures [17] introduces an indexed, incrementally reorganizable data structure to store log records. Closely resembling an LSM-tree and other forms of incremental indexing, the sequential recovery log is organized into partitions sorted primarily by page identifier and secondarily by LSN. This organization enables efficient retrieval of the history of updates of a single page as well as bulk access to contiguous pages. The instant restore technique [43] makes use of these features in the context of recovery from media failures, but the indexed log data structure has applications far beyond media recovery.

One important caveat of most log-structured approaches is that they rely on a separate write-ahead log [7] to provide ACID semantics. Given that the log itself can be reorganized and indexed, as done in instant recovery [17, 42], there is potential to unify the data structures used to store the log and those used to store data pages on persistent storage. In other words, applying techniques of LSM-trees to store and organize transaction logs seems to be an effective way to eliminate database storage and provide a single-storage transactional system.

Despite both having a log-structured storage component, FineLine differs from LSM-trees in some fundamental ways—perhaps the most notable one being the indexing of physiological log records pertaining to a database page rather than key-value pairs in the application domain. As explained in the remainder of this paper, this approach is key in completely decoupling in-memory data structures from their persistent representation and thus optimizing the performance of in-memory workloads. Section 6.6 empirically evaluates an LSM-tree in comparison with FineLine.

2.6 Summary of related work

In the wide design space between traditional disk-based and modern in-memory database systems, we identify four major design goals that none of the existing approaches seems to fulfill in a holistic and generalized manner. These are: a simplified and loosely coupled logging and recovery architecture with a single-storage approach, performance comparable to in-memory databases, efficient recovery following the instant recovery paradigm, and independence from the underlying memory technology in favor of heterogeneous and economically favorable infrastructures. We also identify log-structured techniques as a potential approach in designing a single-storage architecture that eliminates traditional database storage. FineLine aims to explore this potential and fulfill the design goals laid out above, hopefully blurring many of the sharp lines that separate different classes of database system architectures.

[Figure 2: Overview of the FineLine architecture: the application, in-memory data structures, the transaction manager, the lightweight buffer manager, and the log, connected by the operations insert/update/delete/scan, begin/commit/abort/rollback, allocate and fix/unfix node, fetch node, and append.]

3. ARCHITECTURE

Fig. 2 shows the four main components of FineLine. This section provides an overview of these components and their interaction.

Applications interact primarily with two components: in-memory data structures and the transaction manager. The latter is used for establishing transaction boundaries with begin and commit calls, as well as to voluntarily abort an active transaction. In-memory data structures are essentially key-value stores, usually storing uninterpreted binary data; a typical example is a B-tree. These are explicitly qualified as in-memory because their internal structure is not mapped directly to objects on persistent storage. Instead, their persistent representation is maintained exclusively in the log component. Note that there is no database storage, and thus the log serves as the single storage structure for all persistent data. Finally, a lightweight buffer manager component manages the memory used by data structures and controls caching of data objects.
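To make the division of responsibilities in Fig. 2 concrete, the following C++ sketch outlines the four component interfaces as they could look in a minimal implementation. All names (NodeID, LogRecord, Log::append, BufferManager::fix, and so on) are assumptions of this sketch rather than identifiers from the FineLine prototype, and the interfaces are deliberately reduced to the operations discussed in this section.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical component interfaces mirroring Fig. 2; all names are illustrative.
    using NodeID = std::uint64_t;
    using TxnID  = std::uint64_t;

    struct LogRecord {                      // generalized physiological log record
        NodeID node;                        // node affected by the update
        std::uint32_t seq;                  // node-local sequence number
        std::vector<std::byte> payload;     // redo information, uninterpreted here
    };

    struct Node;                            // opaque in-memory node (e.g., a B-tree node)

    class Log {                             // the single persistent structure: the indexed log
    public:
        // Append one transaction's log records as a unit at commit time.
        virtual void append(const std::vector<LogRecord>& commitGroup) = 0;
        // Return the history of log records of one node, for reconstruction.
        virtual std::vector<LogRecord> fetch(NodeID id) = 0;
        virtual ~Log() = default;
    };

    class BufferManager {                   // caching only: no dirty bits, no write-back
    public:
        virtual Node* fix(NodeID id) = 0;   // pin a node, fetching from the log on a miss
        virtual void unfix(Node* n) = 0;    // unpin; eviction simply drops the node
        virtual ~BufferManager() = default;
    };

    class TransactionManager {              // transaction boundaries and private log buffers
    public:
        virtual TxnID begin() = 0;
        virtual void commit(TxnID txn) = 0; // appends the private redo buffer to the log
        virtual void abort(TxnID txn) = 0;  // discards the private buffer; nothing persisted
        virtual ~TransactionManager() = default;
    };

Note that the buffer manager exposes no write-back or flush operation: in this architecture, the log's append is the only path from volatile to persistent storage.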

3.1 In-memory data structures

FineLine is designed to provide persistence to any collection of arbitrary storage structures that can be organized into nodes uniquely identified by a node ID. It can be used as the storage component of a database system—supporting either row- or column-based storage—as well as a generic software library that provides persistence to in-memory data structures with transactional guarantees.

The distinguishing feature of FineLine in contrast to existing approaches is that it provides persistence without mapping data structures directly to a persistent storage representation. This is illustrated in Fig. 3, which compares the propagation of changes from volatile to persistent storage, as well as the retrieval of data in the opposite direction, in traditional write-ahead logging (WAL; top) and in FineLine (bottom). A typical WAL-based database system propagates changes to persistent storage by flushing a node of a data structure into a corresponding page slot on disk. Because this propagation happens asynchronously (i.e., following a no-force policy [25]), these changes must also be recorded in a log for recovery. Following the WAL rule, a log record must be written before the affected page is written. FineLine, on the other hand, never flushes nodes or any other part of an in-memory data structure. Instead, it relies on the log, which is indexed for fast retrieval, as the only form of propagation to persistent storage. In order to retrieve a node into main memory, its most recent state is reconstructed from the log with the fetch operation. Sections 3.2 and 4 below discuss the details of the FineLine log, referring back to Fig. 3.

[Figure 3: Propagation in WAL (top) vs. FineLine (bottom). In WAL, nodes of a data structure are written to and read from database storage while log records are appended to a separate write-ahead log; in FineLine, log records are appended to the indexed log, which is reorganized internally, and nodes are fetched from it directly.]

The assumption that data structures are not mapped to persistent storage has profound implications for the rest of the system architecture. First, it naturally implements a no-steal policy, which eliminates the need for undo recovery and therefore significantly cuts down the amount of logged information as well as complexity. Furthermore, it eliminates bottlenecks of disk-based database systems in memory-abundant environments, because it allows the use of encoding, compression, addressing, and memory management techniques optimized for in-memory performance. Lastly, by decoupling in-memory data structures from any persistence concern, recovery overheads—such as the interference caused by periodic checkpoints and the maintenance of page LSN values and dirty bits—are also eliminated.

In order to provide fine-granular access to data on persistent storage, i.e., in the log, as well as to support incremental recovery from system and media failures, the FineLine architecture employs a generalized form of physiological logging. Instead of being restricted to pages, i.e., fixed-length blocks mapped to specific locations on persistent storage, log records pertain to nodes of a data structure. In this generalized definition, a node is any structural unit of an in-memory data structure, regardless of size, memory contiguity, or internal complexity. It may well be mapped directly to a page—which is even advisable to simplify buffer management—and in these cases the terms can be used interchangeably. However, a node can also be of variable length, point to non-inlined fields, and contain arbitrary internal structure. From the perspective of logging and recovery, a node is any structure that has a unique identifier and is the fundamental unit of caching, recovery, and fault containment.

The choice of what constitutes a node naturally yields a continuum of different logging and recovery algorithms from a single generalized template. If a node is simply a page of a storage structure like a B-tree, FineLine behaves much like ARIES and physiological logging, with the added advantage of simplified and on-demand recovery, better performance, and redo-only logging (i.e., no-steal). On the other hand, if a node represents a whole table, FineLine essentially delivers a logical logging mechanism. A node might also be a single record, which is then a form of physical logging (or value logging, as employed in Silo [48]). The key observation here is that these distinctions are essentially blurred under this new generalized model, where all data is kept in a single indexed log.

3.2 Log

The log component is the centerpiece of the FineLine design, and the main contribution of this work. As Fig. 2 illustrates, the log interface provides two primary functions: append and fetch. An internal reorganization function is also implemented to incrementally and continuously optimize the fetch operation; this is similar to merging and compaction in LSM-trees [37, 45, 8]. Internal aspects of the log are discussed in Section 4; this section focuses on the interface to the other components through the append and fetch operations, as illustrated in the bottom part of Fig. 3.

The append function is called by the transaction manager during transaction commit. Instead of appending a log record into the log for every modification, log records of a transaction are collected in a private log buffer and appended all at once at commit time. This is another key distinction to traditional write-ahead logging, and the process is discussed in detail in Section 4.

The fetch function is used to reconstruct a node from its history of changes in the log. Following a physiological logging approach, each log record describes changes to a single node, and the indexed log must support efficient retrieval of all log records associated with a given node identifier.

3.3 Lightweight buffer manager

Traditional buffer management in database systems has two key concerns: caching and update propagation. In contrast, FineLine employs a lightweight buffer manager whose only concern is caching of nodes. This involves: (1) managing the memory and life-cycle of individual nodes, (2) fetching requested nodes from persistent storage (i.e., from the log), (3) evicting less frequently or less recently used nodes, and (4) providing a transparent addressing scheme for pointers among nodes. The concerns of update propagation, on the other hand, include keeping track of version numbers (i.e., PageLSN values) and the durable state (i.e., dirty bits) of individual nodes, propagating changes to persistent storage by writing pages, and enforcing the WAL rule. FineLine eliminates these concerns by performing propagation exclusively via logging.

The FineLine design does not require a specific buffer management scheme; traditional page-based implementations [23]—preferably optimized for main memory [19, 30]—as well as record-based caching schemes [10, 13] are equally conceivable. It is nevertheless advisable to use nodes as the unit of caching and eviction, since this is a better fit for the rest of the system architecture.
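The caching-only role of the lightweight buffer manager can be illustrated with a possible fix path that reconstructs a node from the log on a cache miss. The sketch below is a simplification under assumed names (Log::fetch returning a node's history oldest-first, Node::apply replaying one redo record); it ignores pinning, concurrency, and the replacement policy.

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Minimal stand-ins for the log types sketched in Section 3 (names assumed).
    struct LogRecord {
        std::uint64_t node;
        std::uint32_t seq;
        std::vector<std::byte> payload;
    };
    struct Log {
        // Stub: a real implementation probes the partitioned index (Section 4).
        std::vector<LogRecord> fetch(std::uint64_t /*nodeId*/) { return {}; }
    };

    // A node knows how to replay one redo log record onto itself (assumed hook).
    struct Node {
        std::uint64_t id;
        void apply(const LogRecord& /*lr*/) { /* replay redo payload onto this node */ }
    };

    class LightweightBufferManager {
        Log& log_;                                        // the indexed log
        std::unordered_map<std::uint64_t, Node*> cache_;  // node ID -> cached node
    public:
        explicit LightweightBufferManager(Log& log) : log_(log) {}

        Node* fix(std::uint64_t nodeId) {
            auto it = cache_.find(nodeId);
            if (it != cache_.end()) return it->second;    // hit: no I/O, no version checks

            // Miss: reconstruct the node from its log history (the fetch operation).
            Node* n = new Node{nodeId};
            for (const LogRecord& lr : log_.fetch(nodeId)) n->apply(lr);  // oldest to newest
            cache_[nodeId] = n;
            return n;
        }

        void unfix(Node*) { /* usage bookkeeping only; no dirty bit, no WAL rule */ }

        void evict(std::uint64_t nodeId) {
            // Dropping a node is safe: its latest committed state is already in the log.
            auto it = cache_.find(nodeId);
            if (it != cache_.end()) { delete it->second; cache_.erase(it); }
        }
    };

Eviction needs no write-back and no dirty bits, because every committed change already resides in the indexed log.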

3.4 Transaction manager

The transaction manager component of FineLine is responsible for keeping track of active transactions, implementing a commit protocol, supporting rollback, guaranteeing transaction isolation, and maintaining transaction-private redo/undo logs. The latter aspect is a key distinction of the present design from traditional ones. Operations on in-memory data structures generate log records that are kept in a thread-local, per-transaction log buffer. During commit, all log records generated by a transaction are appended into the log atomically. Section 4 provides details of the commit protocol.

Note that because nodes are never flushed to persistent storage, FineLine implements a no-steal policy; this means that only redo information is required in the log. As implemented in most in-memory databases [29, 33], a transaction-private undo log is used for rollback and it may be discarded after commit. In essence, an aborted transaction leaves absolutely no trace of its existence in the persistent state of the database.

Concurrency control among isolated transactions is also an important role of the transaction manager. A two-phase locking approach, for instance, requires a lock manager component (omitted in Fig. 2). The private logs can also be reused for validation in optimistic concurrency control schemes as well as for multi-versioning (e.g., Silo [48] and HyPer [36]). The present design does not assume any specific concurrency control scheme; thus, the transaction manager is treated as an abstract component.

4. LOGGING

The FineLine indexed log serves as the only form of persistent storage, and thus it must support both writes and reads efficiently. Writes happen at commit time with the append function, while reads are performed with the fetch function. To provide acceptable fetch performance, the log is incrementally and continuously reorganized by merging and compacting partitions. This section discusses these operations in detail.

4.1 Indexed log organization

The indexed log is actually a partitioned index, in which probes require inspecting multiple partitions before returning a result [14]. A partitioned index is ideal for the FineLine log because it performs all writes sequentially, like a traditional write-ahead log, at a moderate and, thanks to caching, amortized cost on read performance. As discussed in Section 1, this is precisely the goal of a single-storage approach: combine the write efficiency of the log with the read efficiency of a database file.

[Figure 4: Sequential log (a) and its equivalent indexing (b). In the sequential log, the log records of three transactions on nodes A and B are interleaved in commit order; in the indexed log, each transaction forms a partition (p1, p2, p3) whose log records are sorted by node ID.]

Fig. 4 illustrates the partitioned-index log organization used in FineLine in contrast with a traditional sequential log. In this minimal example, three transactions generate log records on two nodes, A and B. In the traditional organization, each operation is assigned an LSN value (omitted here), and their total order must be reflected in the sequential log. In the indexed organization, log records are sorted by node ID within each partition. Log records pertaining to the same node, in turn, must be ordered by a sequence number that reflects the order in which the operations were applied to the node. In the example diagram, each transaction forms a new partition—which would be a naive implementation but illustrates the point well. The order of transactions, and thus the order of partitions, is irrelevant, because a total order is only required among log records of the same node. The reason for that is the absence of undo recovery, as discussed in Section 5. For a comprehensive discussion on the issue of log record ordering, we refer to the work of Wang and Johnson [51] and their GSN approach.

Each append operation, which is invoked by the commit protocol discussed below, creates a new partition in the log. Therefore, the indexed log maintains the append-only efficiency of a sequential log for write operations (i.e., transaction commits). For read operations, i.e., node fetches, the principal design challenge is to effectively approximate the read efficiency of a page-based database file by merging partitions. This is essentially the same challenge faced by all log-structured designs [37, 41, 45], and thus many techniques can be borrowed.

In the example, if node A were to be fetched, each of the three partitions would be probed and merged in order to replay the four log records in the correct order. This example assumes that the first log record represents the creation of the node, so that it is fully reconstructed from log records only. Section 4.3 provides further details of the fetch operation and the reorganization of log partitions. Before that, Section 4.2 discusses how new partitions are created with transaction commits.
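The ordering that Fig. 4b illustrates can be written down in a few lines: a partition is a run of log records sorted by node ID and, within a node, by a node-local sequence number. The sketch below uses assumed types (LogRecord, Partition) and shows how a commit group could be turned into a sorted partition and how a single partition could be probed for one node's history.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // One log record of the generalized physiological log (names assumed).
    struct LogRecord {
        std::uint64_t node;   // node ID: primary sort key within a partition
        std::uint32_t seq;    // node-local sequence number: secondary sort key
        // ... redo payload omitted ...
    };

    // A partition is a sorted, immutable run of log records (cf. Fig. 4b).
    using Partition = std::vector<LogRecord>;

    // Turn one commit group into a new partition by sorting it by (node, seq).
    Partition makePartition(std::vector<LogRecord> commitGroup) {
        std::sort(commitGroup.begin(), commitGroup.end(),
                  [](const LogRecord& a, const LogRecord& b) {
                      return a.node != b.node ? a.node < b.node : a.seq < b.seq;
                  });
        return commitGroup;
    }

    // Probe one partition for all log records of a given node (binary search on node ID).
    std::vector<LogRecord> probe(const Partition& p, std::uint64_t nodeId) {
        auto cmp = [](const LogRecord& r, std::uint64_t id) { return r.node < id; };
        auto lo = std::lower_bound(p.begin(), p.end(), nodeId, cmp);
        std::vector<LogRecord> out;
        for (auto it = lo; it != p.end() && it->node == nodeId; ++it) out.push_back(*it);
        return out;   // already in node-local sequence order
    }

Because each probe returns a node's records already in sequence order, no global LSN order is needed to replay them correctly; only partition recency matters when combining results from several partitions.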

4.2 Commit

Commit processing in ARIES (shown in Fig. 5a) maintains a centralized log buffer where a log record is appended for each update of any transaction. These appends must be synchronized to produce a global LSN order, and thus the contention is quite high. In the example, the commit of T2 causes all contents of the log buffer up to its commit log record to be flushed to persistent storage.

In the FineLine commit protocol, updates to in-memory data structures generate physiological log records that are maintained in transaction-private log buffers. At commit time, a transaction's log buffer is inserted into a system-wide commit queue. At this point, here referred to as the pre-commit, locks can be released [18] and resources can be freed—the transaction may not roll back anymore, and the commit will be acknowledged as soon as the log records are forced to persistent storage.

[Figure 5: Commit protocols of ARIES (a) and FineLine (b). In ARIES, transactions T1-T3 append to a centralized log buffer, and a commit forces a flush; in FineLine, transactions fill private buffers, pre-commit moves them into the pre-commit queue, and finishing an epoch (here, of T0 and T2) sorts the queue into a new last partition of the indexed log on persistent storage.]

The commit queue is formatted as a log page that can be appended directly to the indexed log. Before the append occurs, the log records in this page are sorted primarily by node ID and secondarily by a node-local sequence number. This sort can be made very efficient if log pages are formatted as an array of keys (or key prefixes) and pointers to a payload region within the page.

Each appended log page creates a new partition in the indexed log. (The choice of log page size depends on a good balance between maximizing write bandwidth, minimizing fragmentation, and optimizing group commit schedules [26].) An alternative implementation maintains the last partition unsorted, i.e., it retains the order of transactions established by the pre-commit phase. Since it is unsorted, the last partition can be much larger than a log page. To enable access to the history of individual nodes in this last partition, it must either be sorted or use an alternative indexing mechanism. (The experiments in Section 6 use this approach to implement FineLine as an extension of an existing WAL system.) For ease of exposition, we assume here that each group commit creates a new sorted partition that is directly appended to the log index.

Instead of forcing their own logs, transactions block waiting for a notification from an asynchronous commit service, which is responsible for the final commit, i.e., the "hardening", of transactions. This technique, borrowed from the Aether log manager [28] and also used in the Silo in-memory database [48], essentially implements the group commit approach [11], allowing for effective amortization of I/O costs and reduced system call overhead. As for minimizing the contention on a single log buffer, the consolidation approach used in Aether can also be employed here; note, however, that the contention is significantly reduced, because only one log buffer insert is required per transaction, instead of one per log record.

Each commit group defines an epoch, which is basically the unit of transaction hardening and can also be used for efficient resource management [48]. Once an epoch is finished, clients are notified and their commit is finally acknowledged. As with traditional group commit, epochs may be defined by a variety of policies—e.g., fixed time intervals, log volume thresholds, or a combination of both.

The example of Fig. 5b illustrates commit processing in FineLine. It starts with active transactions T1...T3, whose updates, represented by black dots, generate log records in a private buffer, represented by rectangles. A transaction T0 has already pre-committed and thus its log records are already in the pre-commit queue. Then, T2 enters the pre-commit phase—an event represented here by a white dot—and appends its logs to the pre-commit queue. The commit service then kicks in and starts a new epoch by atomically swapping the pre-commit queue with an empty one. Then, it sorts all logs of the previous epoch into a new partition of the indexed log. After that, the previous epoch is considered durable and the commit of T0 and T2 is acknowledged.

For now, we assume a single, system-wide log buffer, but our technique can be extended to multiple commit service threads to provide distributed logging. Such an extension is out of the scope of this paper, but existing techniques can be adapted [48, 51].
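The following C++ sketch condenses the commit path of this section: transactions fill private log buffers, pre-commit moves a buffer into a system-wide queue, and an asynchronous commit service closes an epoch by swapping the queue, sorting the group by (node ID, sequence number), appending it as a new partition, and then acknowledging all transactions of the epoch. The names and structure (CommitService, appendPartition) are assumptions of this sketch, not the prototype's implementation, and durability details such as forcing the new partition to storage are reduced to a stub.

    #include <algorithm>
    #include <condition_variable>
    #include <cstdint>
    #include <iterator>
    #include <mutex>
    #include <utility>
    #include <vector>

    struct LogRecord { std::uint64_t node; std::uint32_t seq; /* redo payload omitted */ };

    // Stub: write the sorted group as a new level-0 partition <0, e, e> and force it to storage.
    void appendPartition(std::uint64_t /*epoch*/, std::vector<LogRecord> /*sortedGroup*/) {}

    class CommitService {
        std::mutex m_;
        std::condition_variable durable_;
        std::vector<LogRecord> queue_;      // system-wide pre-commit queue
        std::uint64_t currentEpoch_ = 1;    // epoch currently being filled
        std::uint64_t durableEpoch_ = 0;    // last epoch made durable
    public:
        // Pre-commit: move the transaction-private buffer into the shared queue.
        // The caller may release locks now; the commit is not yet durable.
        std::uint64_t preCommit(std::vector<LogRecord>&& privateBuffer) {
            std::lock_guard<std::mutex> g(m_);
            queue_.insert(queue_.end(),
                          std::make_move_iterator(privateBuffer.begin()),
                          std::make_move_iterator(privateBuffer.end()));
            return currentEpoch_;           // epoch this transaction belongs to
        }

        // Block until the given epoch has been hardened by finishEpoch().
        void waitDurable(std::uint64_t epoch) {
            std::unique_lock<std::mutex> g(m_);
            durable_.wait(g, [&] { return durableEpoch_ >= epoch; });
        }

        // One round of the asynchronous commit service: close and harden an epoch.
        void finishEpoch() {
            std::vector<LogRecord> group;
            std::uint64_t epoch;
            {
                std::lock_guard<std::mutex> g(m_);
                if (queue_.empty()) return;
                group.swap(queue_);         // atomically swap in an empty queue
                epoch = currentEpoch_++;
            }
            // Sort by (node ID, node-local sequence number) and append as a new partition.
            std::sort(group.begin(), group.end(),
                      [](const LogRecord& a, const LogRecord& b) {
                          return a.node != b.node ? a.node < b.node : a.seq < b.seq;
                      });
            appendPartition(epoch, std::move(group));
            {
                std::lock_guard<std::mutex> g(m_);
                durableEpoch_ = epoch;
            }
            durable_.notify_all();          // acknowledge the whole commit group
        }
    };

A worker thread would call preCommit at transaction end, release its locks, and then block in waitDurable until the commit service finishes the epoch.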

4.3 Node fetch and merging

In the architecture diagram of Fig. 2, traversing a node pointer in an in-memory data structure may incur a cache miss in the lightweight buffer manager, which requires invoking the fetch operation on the log. To perform a node fetch, each log partition is probed in reverse order, starting from the most recent one, and the log records are merged and collected in a thread-local buffer. This process stops when a node-format log record is found, which can be either an initial node construction and allocation or a "backup" image of a node, containing all data required to fully reconstruct it. Then, starting from the node-format log record, all committed updates are replayed in a previously allocated empty node, bringing it to its most recent, transaction-consistent state.

In order to reduce the number of probes and deliver acceptable fetch performance in the long run, partitions are merged and compacted periodically with an asynchronous daemon, as in LSM-trees [37, 45]. To simplify the management of partitions with concurrent node fetches and merges, as well as to enable garbage collection, partitions are identified by a three-component key of the form ⟨level number, first epoch, last epoch⟩. Initial partitions generated by the commit protocol have level zero, and they are numbered sequentially according to their epoch number e, so their identifiers are of the form ⟨0, e, e⟩. A merge of partitions of level i yields a partition ⟨i + 1, a, b⟩, where a is the lowest first epoch of the merged partitions and b is the highest last epoch. Only consecutive partitions may be merged, so that all epochs in the interval [a, b] are guaranteed to be contained in the resulting partition.

[Figure 6: Partitioned log after one (c) and two (d) merges. The first-level merge combines the level-zero partitions ⟨0,1,1⟩ and ⟨0,2,2⟩ into ⟨1,1,2⟩, leaving ⟨0,3,3⟩; the second-level merge yields a single partition ⟨2,1,3⟩.]

Fig. 6 illustrates the merge process, where the two states (c and d) of the index are derived from the states (a and b) of Fig. 4. In the first merge, partitions p1 and p2, whose proper notation is ⟨0, 1, 1⟩ and ⟨0, 2, 2⟩, are merged into a level-one partition ⟨1, 1, 2⟩. The second merge then produces a single partition ⟨2, 1, 3⟩ using p3 (i.e., ⟨0, 3, 3⟩).

When performing a fetch operation, a list of partitions to be probed can be derived by computing maximal disjoint [first epoch, last epoch] intervals across all existing levels (i.e., the smallest set of partitions that covers all epochs). This can occur either by probing the index iteratively or by relying on auxiliary metadata—such implementation details are omitted here. Furthermore, while not illustrated in the examples, the creation of a new partition through merging does not overwrite the merged lower-level partitions—these can be garbage collected later on. We refer to previous work on partitioned B-trees for a thorough coverage of these techniques [14].

While the process of probing the partitioned index for log records and reconstructing a node is fairly simple, it may be inefficient if implemented naively, especially because the commit protocol is expected to produce multiple partitions (up to hundreds or thousands) per second. Therefore, the following paragraphs discuss techniques to optimize node fetches and mitigate such performance concerns.

As a first and foremost optimization, pages of the indexed log must be cached in main memory for efficient retrieval. This ensures that index probes incur either reads on leaf pages only or no reads at all for the most recent partitions. Second, system-maintained constraints and auxiliary data structures such as Bloom filters, succinct trees [53], or zone indexes [15] can be used to avoid probing partitions that are known not to contain a given node identifier. These two measures can substantially reduce the number of read operations required for a single node fetch.

When merging partitions, opportunities for compaction should be exploited. In the FineLine context, compaction can be characterized as follows: given a stream of log records l1 . . . ln pertaining to the same node, produce a stream of log records k1 . . . km such that the total size in bytes is reduced. Other than simple compression techniques like omitting the node identifier or applying delta encoding to keys inside each log record, a more effective compaction strategy involves node-format log records. If any of the li (1 ≤ i ≤ n) is a node-format log record, then a single node-format log record k1 can be produced (i.e., m = 1) for the whole stream. This log record essentially creates a snapshot of the node as reconstructed from li and all subsequent updates until ln. Such node-format log records deliver the best-possible read efficiency for the fetch operation, with the same cost as a single database page read in a traditional WAL approach.

Note that by employing effective node eviction policies for in-memory data structures, high-cost fetch operations requiring tens or hundreds of partition probes should be extremely rare. This is because the buffer manager should absorb the vast majority of reads when sufficient main memory is available. Therefore, the vast majority of node fetches should be of cold nodes. Since cold nodes are expected not to have been updated recently, it is very likely that all relevant log records have been merged into higher-level partitions, possibly into a single node-format log record. Such higher-level partitions, containing one node-format log record for most nodes, can be viewed as a generalized form of database storage as employed in a dual-storage architecture.
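The partition bookkeeping just described fits in a few lines of C++. The sketch below introduces an assumed PartitionId type for the three-component key, the identifier rule for level-zero partitions and merges, and a naive newest-first probe order; pruning the probe list to a minimal epoch cover and filtering with Bloom filters or zone indexes are only mentioned in comments.

    #include <algorithm>
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Three-component partition identifier <level, first epoch, last epoch> (assumed type).
    struct PartitionId {
        std::uint32_t level;
        std::uint64_t firstEpoch;
        std::uint64_t lastEpoch;
    };

    // A level-0 partition produced by commit epoch e is identified as <0, e, e>.
    PartitionId levelZero(std::uint64_t e) { return {0, e, e}; }

    // Merging consecutive partitions of the same level i yields <i + 1, lowest first, highest last>.
    PartitionId mergedId(const std::vector<PartitionId>& inputs) {
        if (inputs.empty()) throw std::invalid_argument("nothing to merge");
        PartitionId out{inputs.front().level + 1,
                        inputs.front().firstEpoch, inputs.front().lastEpoch};
        for (const PartitionId& p : inputs) {
            out.firstEpoch = std::min(out.firstEpoch, p.firstEpoch);
            out.lastEpoch  = std::max(out.lastEpoch, p.lastEpoch);
        }
        return out;   // covers every epoch in [firstEpoch, lastEpoch]
    }

    // A node fetch probes partitions newest-first. A real implementation would first
    // reduce this list to the smallest set of partitions covering all epochs and skip
    // partitions via Bloom filters or zone indexes before issuing any reads.
    std::vector<PartitionId> probeOrder(std::vector<PartitionId> all) {
        std::sort(all.begin(), all.end(),
                  [](const PartitionId& a, const PartitionId& b) {
                      return a.lastEpoch > b.lastEpoch;
                  });
        return all;
    }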

5. RECOVERY

The design and implementation of recovery algorithms must take into consideration the different levels of abstraction in a system architecture and the degree of consistency expected between them [25]. Traditional ARIES recovery, for instance, restores database pages to their most recent state and rolls back persisted updates made by loser transactions. This process involves scanning the log in three phases—analysis, redo, and undo. To work properly, the algorithm must "trust" the contents of the log file, and thus it relies on a certain degree of consistency provided by the file system. Within its layer of abstraction, the file system may employ its own recovery measures to guarantee such consistency, and these are hidden from the database system.

In contrast with ARIES, higher-level recovery in FineLine is significantly simpler—in fact, as discussed later on, there is actually no code path which is specific to recovery in a traditional sense. However, this simpler, almost transparent mechanism requires a degree of consistency from the indexed log data structure that cannot be delivered by file systems alone. To better explain how recovery works in FineLine, the following sub-sections explicitly separate these concerns into two levels of consistency: that expected from the internal indexed log data structure, and that expected from the transaction system as a whole (i.e., full ACID compliance). Lastly, a final sub-section discusses concerns of stable storage and media failures.

5.1 Indexed log consistency

As discussed in Section 3, the log interface provides two operations to other components: append, which appends one or more log pages to a new partition in the index, and fetch, which is essentially a sequence of index probes on each partition. Furthermore, incremental maintenance of the index requires adding a new partition when performing a merge and deleting a set of partitions during garbage collection (e.g., after a merge). The fundamental requirement for consistency is that these operations must appear atomic.

In order to support the atomicity of index operations, multiple design and implementation choices are conceivable, and discussing them would be beyond the scope of this paper. Nevertheless, it may be worthwhile to point out that the indexed log has a particular access pattern which is more restricted than that of, for instance, a relational index in main memory. Most importantly, there are no random updates—the commit protocol appends whole log pages and merging creates whole partitions at once. This makes designs based on shadow paging much more attractive; for instance, a copy-on-write B-tree [3, 40] would be perfectly suitable. Another approach would be to store each partition as a sequential file on disk and maintain all index information in main memory; a caveat, however, is that restoring this index information after a system failure substantially adds to the total recovery time, whereas a copy-on-write B-tree has practically instantaneous recovery [3].

5.2 Transaction consistency

Provided that the indexed log is kept consistent with atomic operations, no additional steps are required to perform recovery after a system failure, i.e., transactions can start immediately after the system boots up. This is because the node fetch protocol already replays updates up to the most recent committed log record. Furthermore, no undo actions are required because of the no-steal policy.

To understand why explicit recovery is not required, consider the steps involved in the three phases of ARIES recovery and why they are required. The goal of the first phase—log analysis—is basically to determine what needs to be redone (i.e., dirty pages and their dirty LSN) and what needs to be undone (i.e., loser transactions to be rolled back) [35]. In a no-steal propagation scheme, there is no undo phase, and therefore log analysis and checkpoints would only be concerned with dirty pages that might require redo. In preparation for the redo phase, ARIES recovery requires determining the dirty LSN—i.e., the LSN of the first update since the page was last flushed—of each dirty page. This is not required in FineLine because a node fetch automatically replays all committed updates without any additional information. Another way to look at this is that whatever information would be collected during log analysis (e.g., a dirty-page table) is embedded—and maintained atomically—in the indexed log itself.

In terms of availability in the presence of system failures, this recovery scheme naturally supports incremental and on-demand recovery, much like the instant restart algorithm [17]. However, FineLine goes beyond instant restart by eliminating an offline log analysis phase. Thus, the actual downtime after a failure depends solely on the time it takes to boot up the system again. A much more useful metric of recovery efficiency, in this case, is how quickly the system regains its "full speed", i.e., the pre-failure transaction throughput. Here, techniques such as fetching nodes in bulk and proactively merging partitions with high priority have a substantial impact. Also note that the concerns of efficient recovery are essentially unified with the concerns of buffer pool warm-up [38]. Unfortunately, discussing such techniques is out of the scope of this introduction to FineLine.

Given that recovery happens as a side effect of fetching nodes, and that no offline analysis is required, this also implies that checkpoints are not required during normal processing. This reduces not only overheads and interference on running transactions, but also—and perhaps most importantly—code complexity and architectural dependencies. Instead, the asynchronous merge and compaction of partitions is what shortens recovery time, speeds up warm-up after a restart, and improves node fetch performance. In fact, these three concerns are essentially the same in the generalized approach of FineLine.

Whether this novel recovery scheme actually has no recovery actions or the recovery actions are embedded in normal operations is simply a matter of perspective. As emphasized before, the single-storage approach of FineLine is in fact a generalization of sequential logs and database pages. In this architecture, there is no meaningful distinction between normal and recovery processing modes.

5.3 Stable storage and media failures

Recovery algorithms based on write-ahead logging rely on the stable storage assumption, i.e., that contents of the log are never lost. This is, of course, an ideal concept used simply for abstraction in theory. In practice, it basically means that the log must be kept on highly reliable—and expensive—storage. Given that the "head" of the sequential log is expected to be recycled quickly as transactions are finished and updates are propagated to their final destination in the database, the log device can be fairly small, so the high reliability cost is usually not a concern.

While the FineLine logging and propagation mechanisms are quite different from typical write-ahead logging designs, the concept of stable storage is still required, and it would be implemented in practice using exactly the same measures. In FineLine, log partitions of level zero, i.e., partitions generated by commit epochs and not yet merged, must be kept on stable storage. These partitions would then be eligible for garbage collection as soon as they have been merged into level-one partitions. Note, therefore, that the equivalent of propagating updates to the database is, in the FineLine approach, merging partitions.

This storage management approach using log partitions also replaces—or rather generalizes—backup and recovery strategies for media failures. While details are out of the scope of this paper, we briefly discuss the techniques involved. In ARIES, media recovery essentially requires archive copies of both the log and the database [35]. If the database device fails, a backup image is first restored on a replacement device (which may involve loading full and incremental backups), and then the log archive is replayed on the restored pages. Using an indexed log archive, which is actually quite similar to the FineLine log, instant restore [43] performs these operations incrementally and on demand, incurring mostly sequential I/O. In FineLine, a similar restore procedure can be employed, but instead of restoring a database device by merging a backup and a log archive, lost partitions can be recovered by re-merging lower-level partitions or replicas of the lost partitions.

We emphasize that, in comparison with traditional write-ahead-logging recovery, FineLine is by no means less reliable. While the concepts and algorithms are quite different in this generalized approach, the level of reliability depends solely on the hardware infrastructure and the level of redundancy of persistent storage; whether this redundancy takes the form of a log archive plus backup images or of partitions that are copied and merged does not affect the level of reliability. In fact, the FineLine approach enables much more cost-effective solutions than ARIES while achieving the same reliability. As argued in previous work on instant restore [43], the substantial reduction in mean time to repair presents two attractive options: to either increase availability without additional operational cost, or to maintain the same level of availability with reduced operational cost.
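Returning to the atomicity requirement of Section 5.1, one conceivable realization, shown below purely as an illustration and not as the prototype's mechanism, keeps the set of live partitions in an immutable list that writers replace wholesale while readers take snapshots; a copy-on-write B-tree or a shadow-paged index would serve the same purpose at larger scale. All names (PartitionSet, snapshot, replace) are assumptions of this sketch.

    #include <cstdint>
    #include <memory>
    #include <mutex>
    #include <vector>

    struct PartitionId { std::uint32_t level; std::uint64_t firstEpoch, lastEpoch; };

    // Copy-on-write "manifest" of live partitions: writers publish a whole new list,
    // readers take immutable snapshots, so every change appears atomic to fetches.
    class PartitionSet {
        using List = std::vector<PartitionId>;
        std::shared_ptr<const List> live_ = std::make_shared<List>();
        mutable std::mutex m_;
    public:
        // Node fetches work against a consistent snapshot of the partition list.
        std::shared_ptr<const List> snapshot() const {
            std::lock_guard<std::mutex> g(m_);
            return live_;
        }

        // Commit service and merge daemon publish additions and removals in one step,
        // e.g., retire the merged inputs and install the merge output together.
        void replace(const List& remove, const PartitionId& add) {
            std::lock_guard<std::mutex> g(m_);
            auto next = std::make_shared<List>(*live_);
            for (const PartitionId& r : remove) {
                for (auto it = next->begin(); it != next->end(); ++it) {
                    if (it->level == r.level && it->firstEpoch == r.firstEpoch &&
                        it->lastEpoch == r.lastEpoch) { next->erase(it); break; }
                }
            }
            next->push_back(add);
            live_ = std::move(next);   // old snapshots remain valid until released
        }
    };

A persistent variant would also have to write the new list (or manifest) atomically, which is where shadow paging or a copy-on-write B-tree, as mentioned in Section 5.1, comes in.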

6. EXPERIMENTS
This section describes some experiments performed to investigate the feasibility of the FineLine approach. We start with a description of the prototype implementation and a discussion of the hypotheses that we aim to prove or refute. Then, we present four experiments that compare FineLine with a traditional WAL system as well as an LSM-tree.

6.1 Implementation
To evaluate the feasibility of the single-storage approach of FineLine, we have adapted the Zero storage manager (http://github.com/caetanosauer/fineline-zero), a fork of Shore-MT [27] that implements instant restart and instant restore [17, 43], to eliminate the materialized database and perform logging as described in Section 4. The prototype aims to approximate the FineLine design within an existing WAL system, making as few implementation changes as possible. This also implies that the performance comparisons with the baseline system provide a fair evaluation of traditional WAL vs. FineLine, since the two share a large percentage of common code and infrastructure.

The commit protocol described in Section 4.2 is implemented in a slightly different manner, which aims to reuse the same log manager implementation as the baseline system [28]. Rather than immediately appending each epoch to the indexed log as a new partition, a suffix of the log is maintained unsorted—or rather, sorted by order of commit. Sorting and indexing run constantly in the background, so that this unordered suffix is kept fairly small. Because index fetches are only supported from the sorted partitions of the log, eviction of a node from the buffer pool must wait until all log records affecting that node have been sorted and indexed; this is easily achieved by maintaining an epoch field on each node, analogous to the PageLSN of ARIES [35]. To provide earlier availability during recovery, node fetches may also replay log records in the unsorted partition using sequential scans—this presents a trade-off between time to first transaction and overall recovery speed.

Unlike logging in ARIES, the FineLine prototype described above only logs redo information and appends all log records of a particular transaction in a single step during commit, as described in Section 4.2. Furthermore, because there is no persistent database, there are no checkpoints of any kind; instead, partitions are merged in the background to gradually improve the performance of node fetches, reduce recovery effort in case of a failure, and recycle log space.
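As a summary of the two mechanisms just described (the single-step append of a transaction's log records at pre-commit, and the epoch check that gates buffer-pool eviction on the progress of background sorting), the following sketch shows one possible shape of the code. It is a minimal illustration under assumed interfaces; LogRecord, LogManager, and Node are hypothetical simplifications, not the actual classes of the prototype.

#include <atomic>
#include <cstdint>
#include <vector>

using epoch_t = uint64_t;

struct LogRecord { uint64_t node_id; /* redo payload ... */ };

struct Node {
    // Highest epoch whose log records touched this node; it plays the same
    // role for the eviction check below as the PageLSN does in ARIES.
    epoch_t last_epoch = 0;
};

class LogManager {
public:
    // Pre-commit: one insertion into the unsorted log suffix for the whole
    // transaction, rather than one insertion per log record.
    epoch_t append_commit(const std::vector<LogRecord>& txn_records) {
        epoch_t e = ++current_epoch_;
        (void)txn_records;  // stub: copy the records into the unsorted suffix, tagged with e
        return e;
    }

    // Advanced by the background sorter/indexer as it turns the unsorted
    // suffix into sorted, indexed partitions.
    epoch_t last_indexed_epoch() const { return indexed_epoch_.load(); }

private:
    std::atomic<epoch_t> current_epoch_{0};
    std::atomic<epoch_t> indexed_epoch_{0};
};

// A node may leave the buffer pool only once every update to it is reachable
// through the indexed log; otherwise a later fetch would miss records that
// still sit in the unsorted suffix.
bool can_evict(const Node& node, const LogManager& log) {
    return node.last_epoch <= log.last_indexed_epoch();
}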

6.2 Hypotheses
This paper focuses on presenting FineLine as a viable architectural alternative to traditional WAL systems such as ARIES. Therefore, our main goal in these experiments is to demonstrate that the single-storage approach of FineLine is capable of delivering better performance than a WAL system, thanks to reduced logging overhead and a more efficient commit protocol, but with a simpler architecture and while retaining the advantages of transparent support for larger-than-memory workloads and efficient recovery. Furthermore, we show that recovery performance is drastically improved by providing on-demand, incremental recovery from system failures, as in the instant recovery approach [17, 42]. Lastly, we evaluate transaction performance as the workload size grows beyond the size of main memory and examine the trade-offs in comparison to LSM-trees.

More specifically, our experiments test the following hypotheses, each with a dedicated sub-section below:

1. The FineLine prototype improves performance—as measured by transaction throughput—in comparison with a WAL system, thanks to the lack of persistent undo logging and the new commit protocol.

2. The FineLine prototype is able to recover from a system failure and warm up the buffer pool with comparable performance to a state-of-the-art WAL system with support for instant recovery. In comparison with traditional WAL, it should recover much faster.

3. For larger-than-memory workloads, the performance of the FineLine prototype is still better than in a WAL system, but as the data set grows—or, equivalently, as the buffer pool size decreases—a turning point is expected where WAL performs better.

4. The application scenarios that favor LSM-trees also favor FineLine, and the same trade-offs can be achieved by tuning system parameters or design choices.

6.3 Transaction throughput
The workload used here consists of the TPC-C benchmark with scale factor 75, which produces ∼10 GB of initial database size. All TPC-C experiments shown in this paper were executed with up to 20 worker threads and Samsung 840 Pro SSDs as persistent storage devices.

[Figure 7: Average transaction throughput of WAL and FineLine with increasing worker thread count]

The experiment of Fig. 7 shows average transaction throughput (y-axis) for a 5-minute execution of the TPC-C benchmark with increasing worker-thread count (x-axis) on both the WAL system and FineLine. As the results show, throughput is roughly increased by a factor of 2. In order to test in-memory performance only, all these executions employ a buffer pool larger than the total dataset size. Furthermore, because both systems share the same implementation of system components other than logging and recovery, the scalability curve looks very similar.

[Figure 8: Average log bytes and insertions per transaction for the TPC-C benchmark. WAL: 5773.82 bytes and 14.02 inserts per transaction; FineLine: 3189.56 bytes and 1.23 inserts per transaction.]

To confirm that the observed performance improvement is a consequence of the reduced log volume without undo logging and the more efficient commit protocol, Fig. 8 shows the average log size (left side) as well as the average number of log buffer insertions (right side) per transaction for the experiment above. The log volume per transaction is reduced in FineLine by roughly 45%, which is attributed to the elimination of undo information. Note that this includes not only before-images of data values but also log-record metadata such as transaction ID, undo-next pointer, ID (for logical undo), etc. The number of log inserts per transaction is drastically reduced in FineLine to less than 9% of that of the WAL system; this is because one log buffer insertion is performed per transaction rather than per log record. The value computed here is higher than one because the experiment includes log insertions of both system transactions as well as the user transactions that initiate them. For instance, if a user transaction triggers the split of a B-tree node, the split has to be logged independently of the fate of the user transaction; thus, it is inserted in the log buffer before the pre-commit of the user transaction. Other system actions, such as page allocations, space-management tasks, and logged events, are also accounted for in this manner, which explains why FineLine generates 1.23 insertions per user transaction.

The experiments above confirm our first hypothesis postulated earlier, namely that redo-only logging and the efficient commit protocol greatly improve the performance of the logging subsystem, delivering higher transaction throughput.

6.4 Restart and warm-up
The restart experiment measures the time it takes to recover and warm up the buffer pool, i.e., to reach maximum, steady-state transaction throughput, after a system failure. To set up this experiment, the TPC-C benchmark is loaded and ten million transactions are executed, after which the system abruptly shuts down. During this phase, standard propagation measures are enabled in each system, namely buffer-pool write-back in WAL and background partition merging in FineLine. Then, the benchmark restarts and the average transaction throughput (y-axis) is plotted over time (x-axis); this shows how quickly the systems are able to recover and warm up the buffer pool.

[Figure 9: Restart and warm-up efficiency in WAL and FineLine (throughput in ktps over time in minutes)]
[Figure 10: Restart availability (i.e., zoomed-in graph; throughput over the first 40 seconds)]

The results are shown in Fig. 9, which compares restart time between WAL, FineLine, and WAL enhanced with instant restart [17] (shown as WAL-Instant). First, by comparing FineLine and WAL, we observe that transactions start executing (and thus the system becomes available) much earlier in the former—namely a few seconds vs. more than two minutes. This can be confirmed in Fig. 10, which essentially zooms into the first 40 seconds of the experiment to show when the first transactions are executed. Furthermore, because peak performance is higher in FineLine, warm-up happens at a similar pace but reaches a throughput roughly twice as high as in the WAL variant. The re-establishment of peak performance happens roughly one minute earlier in FineLine.

A second comparison can be made in Fig. 9 between FineLine and the WAL system with instant recovery. Because the latter is only offline during the log analysis phase, while redo and undo actions are performed on demand [17], it is actually able to warm up faster than FineLine, reaching peak throughput roughly half a minute earlier. FineLine actually has a similar behavior, but warm-up is slightly slower because of the cold index probes required to fetch nodes. This is likely a limitation of the implementation used in our prototype, which, as explained earlier, reuses the same basic logging infrastructure of the WAL system by maintaining the last partition of the log unsorted and then relies on sequential scans to replay updates in the unsorted part.

These results confirm our second hypothesis, namely that recovery performance in FineLine is comparable to a state-of-the-art WAL implementation with instant restart, but much better than a traditional WAL implementation.

6.5 Larger-than-memory workload
The next experiment evaluates the performance of larger-than-memory workloads in FineLine. We analyze average transaction throughput during a three-minute run of the TPC-C benchmark with a warmed-up buffer pool. The buffer pool size varies from 10% (1 GB) to 100% (10 GB) of the initial database size. The results are shown in the bar charts of Fig. 11. On the left side, the transaction throughput (y-axis) is plotted for each buffer size (x-axis) of the WAL system; on the right side, the same is shown for FineLine.

The results show that FineLine is quite effective in dealing with limited buffer pool sizes, except for the smaller sizes of 1 and 2 GB, where WAL delivers higher performance. This is expected because as the buffer size decreases, transaction latency is dominated by reads from the database in WAL and, equivalently, indexed log fetches in FineLine. However, thanks to the higher performance of FineLine, higher transaction throughput is observed for buffer sizes beyond 3 GB. Furthermore, these results indicate that FineLine makes better use of medium-sized buffer pools in relative terms—note, for example, how the 5-GB buffer pool delivers more than 50% of the performance of the 10-GB buffer pool, while the same size in the WAL variant delivers less than a third of maximum performance (exact numbers are 7.7 and 24.8 ktps, respectively).

These results confirm our third hypothesis, namely that FineLine still outperforms WAL as the dataset grows beyond main memory, up to a turning point where it becomes slower. Therefore, for large workloads with very scarce main memory, traditional WAL might still be preferred.
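To make the cost behind this turning point concrete, the sketch below outlines how a node fetch could look in a FineLine-style system: each buffer-pool miss probes the index of every sorted partition and replays the matching log records, so fetch latency grows with the number of not-yet-merged partitions. The types and functions are hypothetical simplifications rather than the prototype's actual interfaces; it is a sketch of the general idea only.

#include <cstdint>
#include <vector>

struct LogRecord { /* redo payload for one node update */ };
struct Node      { /* in-memory representation of a B-tree node */ };

// Hypothetical handle to one sorted, indexed partition of the log.
struct Partition {
    // Index probe: all records of the given node stored in this partition.
    std::vector<LogRecord> find(uint64_t /*node_id*/) const { return {}; }  // stub
};

void apply(Node& /*node*/, const LogRecord& /*rec*/) {}  // stub: redo one update

// On a buffer-pool miss, the node is rebuilt purely from the indexed log:
// one index probe per sorted partition, oldest to newest, then redo in order.
// Fewer (better-merged) partitions mean fewer probes and faster fetches, which
// is why background merging also speeds up warm-up and recovery.
Node fetch_node(uint64_t node_id, const std::vector<Partition>& partitions)
{
    Node node{};  // start from an empty node image
    for (const auto& p : partitions) {            // oldest partition first
        for (const auto& rec : p.find(node_id)) {
            apply(node, rec);                      // replay redo records in order
        }
    }
    return node;
}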

[Figure 11: Txn. throughput with varying buffer pool sizes]
[Figure 12: Transaction throughput of the YCSB benchmark]

6.6 Performance comparison with LSM-trees
A thorough comparison of FineLine with LSM-trees and traditional WAL systems would warrant a separate experiment and analysis paper, given the complexity of the design choices [8, 3], hardware configurations [9, 2], and trade-offs [4] involved. Thus, this section focuses on a simple benchmark based on the YCSB dataset to demonstrate pragmatically that the trade-offs offered by LSM-trees in comparison with WAL are also present in FineLine.

To test this hypothesis, we implemented an LSM storage module based on LevelDB [1] in the Shore-Kits framework that runs on top of both Shore-MT/Zero (the WAL system) and FineLine. The reason behind choosing LevelDB is its widespread adoption as an open-source project and thus its ease of use and understanding, which aligns with the pragmatic nature of our experiments. A comparison with more advanced and highly tunable LSM engines such as RocksDB [12] is left for future work.

To vary access patterns, we consider a setup similar to that of Arulraj et al. [3], which also evaluates log-structured and WAL approaches. The experiment executes 500 million operations on records of 1000 bytes with 8-byte keys; these are either read or updated randomly under four different workloads: read-only (100% reads), read-heavy (90% reads), balanced (50% reads), and write-heavy (10% reads). The storage device used here is a DELL PM1725a NVMe SSD and 20 worker threads are used. The issue of caching and read amplification is not considered here, as the differences between the systems are less pronounced than for write efficiency; thus, enough main memory is provided to all systems to maximize the cache hit ratio.

Fig. 12 shows the transaction throughput observed for each of the four workloads under the three different engines. Thanks to the lack of logging overheads (including log record generation, insertion into the centralized buffer, and I/O), the read-only workload exhibits by far the best performance in all systems. The first observation is that, in all workloads considered, FineLine provides the highest transaction throughput. An addition of 10% updates, as is the case in the read-heavy workload, causes a significant drop in the WAL system—this result shows how large the impact of logging overheads is, and how effective its optimization can be, as demonstrated by the FineLine result. Here, the LSM-tree is faster than the WAL system, but the latter performs better in the balanced and write-heavy workloads, which is a surprising result. This is likely due to a more scalable implementation of the WAL system.

[Figure 13: Data volume written in the YCSB benchmark]

Fig. 13 shows the total data volume written to persistent devices during the experiment. Note that the y-axis is in a log scale here. Except for the read-only scenario, in which LevelDB still performs background compaction, the WAL system exhibits the highest write amplification. This is because it has to write back pages of 64 KB when only a few 1 KB records were modified—note that this was expected, as the experiment is designed to benefit LSM-trees. The write volume increases substantially on all systems as the ratio of updates increases, but their proportions remain similar. Note that FineLine writes less data than LSM, even though it delivers significantly higher transaction throughput.

7. CONCLUSION
This paper presented a log-structured design for transactional storage systems called FineLine. The design follows a single-storage approach, in which all persistent data is maintained solely in an indexed log data structure, unifying database and recovery log in a more general approach. This novel storage architecture decouples in-memory data structures from their persistent representation, eliminating many of the overheads associated with traditional disk-based database systems.

The log-structured approach also greatly improves recovery capabilities in comparison with state-of-the-art designs. Because the indexed log contains all information required to always retrieve data items in their most recent, transaction-consistent state, recovery from a system failure is fast and makes the system available much earlier. Furthermore, unlike most state-of-the-art approaches, there are no checkpoints, no offline log scans, and no auxiliary data structures that must be rebuilt during recovery.

By mixing existing data management techniques—such as log-structured access methods, in-memory databases, physiological logging, and buffer management—in a novel way, FineLine achieves a design and architecture sweet spot between modern in-memory database systems and traditional disk-based approaches. It also introduces a new, generalized perspective on data storage and recovery, in which the distinctions between persisted data objects and logs, much like the distinctions between in-memory and disk-based data management, are essentially blurred.

Acknowledgements
We are grateful to Viktor Leis for his valuable feedback and insightful suggestions and to Lucas Lersch for being available to bounce off ideas in many informal discussions. This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG) through grant HA 1286/9-1.

8. REFERENCES
[1] LevelDB. https://github.com/google/leveldb, 2018. Online; accessed 2018-07-12.
[2] R. Appuswamy, R. Borovica-Gajic, G. Graefe, and A. Ailamaki. The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy. In Proc. ADMS Workshop, 2017.
[3] J. Arulraj, A. Pavlo, and S. Dulloor. Let's talk about storage & recovery methods for non-volatile memory database systems. In Proc. SIGMOD, pages 707–722, 2015.
[4] M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing access methods: The RUM conjecture. In Proc. EDBT, pages 461–466, 2016.
[5] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[6] P. A. Bernstein, C. W. Reid, and S. Das. Hyder - A transactional record manager for shared flash. In Proc. CIDR, pages 9–20, 2011.
[7] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2), 2008.
[8] N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal navigable key-value store. In Proc. SIGMOD, pages 79–94, 2017.
[9] J. DeBrabant, J. Arulraj, A. Pavlo, M. Stonebraker, S. B. Zdonik, and S. Dulloor. A prolegomenon on OLTP database systems for non-volatile memory. In Proc. ADMS Workshop, pages 57–63, 2014.
[10] J. DeBrabant, A. Pavlo, S. Tu, M. Stonebraker, and S. B. Zdonik. Anti-caching: A new approach to database management system architecture. PVLDB, 6(14):1942–1953, 2013.
[11] D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. Stonebraker, and D. A. Wood. Implementation techniques for main memory database systems. In Proc. SIGMOD, pages 1–8, 1984.
[12] S. Dong, M. Callaghan, L. Galanis, D. Borthakur, T. Savor, and M. Strum. Optimizing space amplification in RocksDB. In Proc. CIDR, 2017.
[13] A. Eldawy, J. J. Levandoski, and P. Larson. Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database. PVLDB, 7(11):931–942, 2014.
[14] G. Graefe. Sorting and indexing with partitioned B-trees. In Proc. CIDR, 2003.
[15] G. Graefe. Fast loads and fast queries. In Proc. DaWaK, pages 111–124, 2009.
[16] G. Graefe. The five-minute rule 20 years later (and how flash memory changes the rules). Commun. ACM, 52(7):48–59, 2009.
[17] G. Graefe, W. Guy, and C. Sauer. Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, Media Restore, and System Failover, Second Edition. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2016.
[18] G. Graefe, M. Lillibridge, H. A. Kuno, J. Tucek, and A. C. Veitch. Controlled lock violation. In Proc. SIGMOD, pages 85–96, 2013.
[19] G. Graefe, H. Volos, H. Kimura, H. A. Kuno, J. Tucek, M. Lillibridge, and A. C. Veitch. In-memory performance for big data. PVLDB, 8(1):37–48, 2014.
[20] J. Gray and G. Graefe. The five-minute rule ten years later, and other computer storage rules of thumb. SIGMOD Record, 26(4):63–68, 1997.
[21] J. Gray, P. R. McJones, M. W. Blasgen, B. G. Lindsay, R. A. Lorie, T. G. Price, G. R. Putzolu, and I. L. Traiger. The Recovery Manager of the System R Database Manager. ACM Comput. Surv., 13(2):223–243, 1981.
[22] J. Gray and G. R. Putzolu. The 5 minute rule for trading memory for disk accesses and the 10 byte rule for trading memory for CPU time. In Proc. SIGMOD, pages 395–398, 1987.
[23] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[24] T. Härder and A. Reuter. Optimization of Logging and Recovery in a Database System. In IFIP TC-2 Working Conference on Data Base Architecture, pages 139–156, 1979.
[25] T. Härder and A. Reuter. Principles of transaction-oriented database recovery. ACM Comput. Surv., 15(4):287–317, 1983.
[26] P. Helland, H. Sammer, J. Lyon, R. Carr, P. Garrett, and A. Reuter. Group commit timers and high volume transaction systems. In Proc. HPTS, pages 301–329, 1987.
[27] R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. Shore-MT: a scalable storage manager for the multicore era. In Proc. EDBT, pages 24–35, 2009.
[28] R. Johnson, I. Pandis, R. Stoica, M. Athanassoulis, and A. Ailamaki. Scalability of write-ahead logging on multicore and multisocket hardware. VLDB Journal, 21(2):239–263, 2012.
[29] A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In Proc. ICDE, pages 195–206, 2011.
[30] V. Leis, M. Haubenschild, A. Kemper, and T. Neumann. LeanStore: In-Memory Data Management Beyond Main Memory. In Proc. ICDE, 2018.
[31] V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In Proc. ICDE, pages 38–49, 2013.
[32] R. A. Lorie. Physical integrity in a large segmented database. ACM Trans. Database Syst., 2(1):91–104, 1977.
[33] N. Malviya, A. Weisberg, S. Madden, and M. Stonebraker. Rethinking main memory OLTP recovery. In Proc. ICDE, pages 604–615, 2014.
[34] Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In Proc. EuroSys, pages 183–196, 2012.
[35] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst., 17(1):94–162, 1992.
[36] T. Neumann, T. Mühlbauer, and A. Kemper. Fast serializable multi-version concurrency control for main-memory database systems. In Proc. SIGMOD, pages 677–689, 2015.
[37] P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-tree). Acta Inf., 33(4):351–385, 1996.
[38] K. Park, J. Do, N. Teletia, and J. M. Patel. Aggressive buffer pool warm-up after restart in SQL Server. In Proc. ICDE, pages 31–38, 2016.
[39] K. Ren, T. Diamond, D. J. Abadi, and A. Thomson. Low-overhead asynchronous checkpointing in main-memory database systems. In Proc. SIGMOD, pages 1539–1551, 2016.
[40] O. Rodeh. B-trees, shadowing, and clones. ACM Trans. Storage, 3(4), 2008.
[41] M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst., 10(1):26–52, 1992.
[42] C. Sauer. Modern techniques for transaction-oriented database recovery. PhD thesis, TU Kaiserslautern, Germany, Dr. Hut-Verlag München, 2017.
[43] C. Sauer, G. Graefe, and T. Härder. Instant restore after a media failure. In Proc. ADBIS, 2017.
[44] C. Sauer, L. Lersch, T. Härder, and G. Graefe. Update propagation strategies for high-performance OLTP. In Proc. ADBIS, 2016.
[45] R. Sears and R. Ramakrishnan. bLSM: a general purpose log structured merge tree. In Proc. SIGMOD, pages 217–228, 2012.
[46] M. Stonebraker. The design of the POSTGRES storage system. In Proc. VLDB, pages 289–300, 1987.
[47] M. Stonebraker. The land sharks are on the squawk box. Commun. ACM, 59(2):74–83, 2016.
[48] S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. In Proc. SOSP, pages 18–32, 2013.
[49] A. Verbitski, A. Gupta, D. Saha, M. Brahmadesam, K. Gupta, R. Mittal, S. Krishnamurthy, S. Maurice, T. Kharatishvili, and X. Bao. Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In Proc. SIGMOD, pages 1041–1052, 2017.
[50] H. T. Vo, S. Wang, D. Agrawal, G. Chen, and B. C. Ooi. LogBase: A scalable log-structured database system in the cloud. PVLDB, 5(10):1004–1015, 2012.
[51] T. Wang and R. Johnson. Scalable logging through emerging non-volatile memory. PVLDB, 7(10):865–876, 2014.
[52] C. Yao, D. Agrawal, G. Chen, B. C. Ooi, and S. Wu. Adaptive logging: Optimizing logging and recovery costs in distributed in-memory databases. In Proc. SIGMOD, pages 1119–1134, 2016.
[53] H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical range query filtering with fast succinct tries. In Proc. SIGMOD, pages 323–336, 2018.
