DASH: Database Shadowing for Mobile DBMS

DASH: Database Shadowing for Mobile DBMS Youjip Won1 Sundoo Kim2 Juseong Yun2 Dam Quang Tuan2 Jiwon Seo2 1KAIST, Daejeon, Korea 2Hanyang University, Seoul, Korea [email protected] [email protected] ABSTRACT 1. INTRODUCTION In this work, we propose Database Shadowing, or DASH, Crash recovery is a vital part of DBMS design. Algorithms which is a new crash recovery technique for SQLite DBMS. for crash recovery range from naive full-file shadowing [15] DASH is a hybrid mixture of classical shadow paging and to the sophisticated ARIES protocol [38]. Most enterprise logging. DASH addresses four major issues in the current DBMS's, e.g., IBM DB2, Informix, Micrsoft SQL and Oracle SQLite journal modes: the performance and write amplifi- 8, use ARIES or its variants for efficient concurrency control. cation issues of the rollback mode and the storage space re- SQLite is one of the most widely used DBMS's. It is quirement and tail latency issues of the WAL mode. DASH deployed on nearly all computing platform such as smart- exploits two unique characteristics of SQLite: the database phones (e.g, Android, Tizen, Firefox, and iPhone [52]), dis- files are small and the transactions are entirely serialized. tributed filesystems (e.g., Ceph [58] and Gluster filesys- DASH consists of three key ingredients Aggregate Update, tem [1]), wearable devices (e.g., smart watch [4, 21]), and Atomic Exchange and Version Reset. Aggregate Update elim- automobiles [19, 55]. As a library-based embedded DBMS, inates the redundant write overhead and the requirement to SQLite deliberately adopts a basic transaction management maintain multiple snapshots both of which are inherent in and crash recovery scheme. SQLite executes the transac- the out-of-place update. Atomic Exchange resolves the over- tions in a serial manner yielding no-steal/force buffer man- head of updating the locations of individual database pages agement. It uses page-granularity physical logging for crash exploiting order-preserving nature of the metadata update recovery. These two design choices make the SQLite trans- operation in modern filesystem. Version Reset makes the actions suffer from excessive IO overhead; the insertion of a result of the Atomic Exchange durable without relying on single 100 Byte record in PERSIST mode with the FULL expensive filesystem journaling. The salient aspect of DASH SYNC option (the default option in Android Marshmal- lies in its simplicity and compatibility with the legacy. DASH low) incurs five fdatasync() invocations [30] with 40 KB of does not require any modifications in the underlying filesys- writes to the storage (Figure 1). Despite its popularity and tem or the database organization. It requires only 451 LOC significance in all ranges of modern computing platforms, we to implement. In Cyclomatic Complexity score, which rep- argue that SQLite has not received proper attention from resents software complexity, DASH renders 33% lower (sim- researchers and practitioners and leaves much to be desired pler) mark than PERSIST and WAL modes of SQLite. We from an IO efficiency perspective. implement DASH for SQLite on Android and extensively The inefficiency of IO in SQLite transactions is caused evaluate it on widely used smartphone devices. DASH yields by the un-orchestrated interaction between SQLite and the 4× performance gain over PERSIST mode (default journal- EXT4 file system. Many studies have addressed this issue [23, ing mode). Compared to WAL mode (the fastest journaling 27, 30, 40, 49]. Each of these works has its own drawback, mode), DASH uses only 2.5% of the storage space on aver- which bars itself from practical deployment in the com- age. The transaction latency of DASH at 99.9% is one fourth modity platform, e.g., space overhead [30], garbage collec- of that of WAL mode. tion overhead [27], overhead of redundant write [49] or non- existent hardware platform [26, 40]. PVLDB Reference Format: In this work, we propose a new crash recovery technique, Youjip Won, Sundoo Kim, Juseong Yun, Dam Quang Tuan, Jiwon called Database Shadowing or DASH for short, for SQLite. Seo. Database Shadowing Revisited: From A Mobile Perspective. We observe that the modern mobile DBMS exhibits unique PVLDB, 12(7): 793-806, 2019. characteristics. The mobile DBMS is very small (typically DOI: https://doi.org/10.14778/3317315.3317321 less than 100 KB), accesses to the mobile DBMS are entirely serialized, and consecutive transactions update many pages in common. Furthermore, in many cases a DBMS is This work is licensed under the Creative Commons Attribution- exclusively opened by a dedicated application. These some- NonCommercial-NoDerivatives 4.0 International License. To view a copy what evident findings have profound implication in design- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For ing a crash recovery routine for SQLite. State of art recovery any use beyond those covered by this license, obtain permission by emailing routines are mostly built upon the foundation that the data- [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. base file is large (a few hundred MB at least) and there are Proceedings of the VLDB Endowment, Vol. 12, No. 7 multiple concurrent transactions with complicated depen- ISSN 2150-8097. dencies among them. Both of these characteristics make full DOI: https://doi.org/10.14778/3317315.3317321 793 file shadowing and shadowing paging [32, 62] an infeasible Table 1: Comparison of journal modes, : very good, : choice for crash recovery in modern DBMS design, the main good, ×: poor (PST: PERSIST mode, WAL: WAL mode,# reason being the excessive space overhead, the overhead of DASH: SHADOW mode, Thput: transaction throughput, redundant writes and the overhead of maintaining depen- Tlat: average transaction latency, TT lat: tail transaction la- dencies among the concurrent transactions. tency, Trcvy: recovery time) Based upon our observation, we establish three principles mode Thput Tlat TT lat Trcvy IO Vol space in designing a crash recovery scheme for SQLite. First, the PST × × space overhead of maintaining an extra copy of a database WAL # #× #× × file is not substantial. Second, there are no concurrent trans- DASH # # actions. Third, the filesystem guarantees that the updated # # filesystem metadata are made durable in the order in which they are updated. fdatasync() calls in a transaction. DASH uses only 2.5% DASH maintains a shadow file for each database file. The of the storage space of that of WAL mode. It reduces the shadow file in DASH represents the preceding state of the tail latency of an updated transaction at 99.9% by one associated database file, i.e., the one before the the most fourth against WAL mode. recent transaction was applied. DASH consists of three key technical ingredients: Aggregate Update, Atomic Exchange The beauty of DASH is its simplicity. It is implemented and Version Reset. In applying a transaction, DASH ag- with only 451 LOC. DASH does not require any modifica- gregates the dirty pages updated by the most recent two tions in the underlying filesystem or the database organiza- transactions and applies them to the shadow file as if they tion. It does not require any new hardware features, either. are from a single transaction. We call this method Aggre- The only significant change brought by DASH is its use of gate Update. After an Aggregate Update, the DBMS atom- a 256 bytes of unused space at the tail end of the data- ically switches the two files. Atomic Exchange guarantees base header page in SQLite. Database Shadowing can be the atomicity of the set of rename()'s without explicitly en- immediately deployed on the commodity platform. Table 1 forcing the order in which the results of its rename()'s are summarizes the comparison of Database Shadowing against made durable. Atomic Exchange dispenses with expensive PERSIST mode and WAL mode journaling. order-preserving primitives such as fdatasync() by exploit- The remainder of the paper is organized as follows. Sec- ing the order-preserving aspect of the metadata operation tion 2 explains background. Section 3 explains the charac- in modern filesystems [28, 35, 43, 53]. Version Reset guaran- teristics of SQLite. Section 4 provides an overview of DASH tees the durability of a transaction without invoking bulky and its key components. Section 5 explains the details of the filesystem journaling. implementation. Section 6 explains the optimization tech- Unlike full file shadowing, the database file and the shadow nique for the exclusive connection. In section 7, we evaluate file are not identical in DASH. They differ by those pages DASH and compare it with the existing journaling modes of updated by the most recently committed transaction. Un- SQLite. Section 8 summarizes the preceding works. Finally, like shadow paging [39], DASH updates the database pages section 9 concludes the paper. in the shadow file in an in-place manner. In-place update of DASH avoids the overhead of maintaining shadow copies of 2. BACKGROUND the database pages in a database file [6, 27]. DASH can be thought as file-granularity shadow paging. It is designed to 2.1 Synopsis be free from the inherent drawbacks of full file shadowing SQLite is a library-based DBMS. It accounts for more and shadow paging. DASH eliminates the redundant writes than 70% of the IOs on the Android platform [23]. The An- overhead in full file shadowing with \Aggregate Update", droid Runtime consists of essential libraries such as SQLite, mitigates the overhead of keeping track of the updated loca- libc, and various media libraries. At its inception, Android tions of the database pages in shadow paging with \Atomic smartphones used yaffs and jffs to manage the FTL- Exchange", and minimizes the overhead of committing a less Flash storage [29]. Recent smartphone products use a transaction with \Version Reset".

DASH: Database Shadowing for Mobile DBMS

Redbooks Paper Linux on IBM Zseries and S/390

CS 152: Computer Systems Architecture Storage Technologies

AMD Alchemy™ Processors Building a Root File System for Linux® Incorporating Memory Technology Devices

Recursive Updates in Copy-On-Write File Systems - Modeling and Analysis

F2punifycr: a Flash-Friendly Persistent Burst-Buffer File System

Zálohuj S BTRFS!

FRASH: Hierarchical File System for FRAM and Flash

Journaling File Systems

The Evolution of File Systems

Linux Kernel Series

CS 5472 - Advanced Topics in Computer Security

File Management What Is a File?