NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System

Jian Xu∗ ([email protected]), Lu Zhang∗ ([email protected]), Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, and Steven Swanson ([email protected])
University of California, San Diego

Andy Rudoff ([email protected])
Intel Corporation

∗The first two authors contributed equally to this work.

ABSTRACT
Emerging fast, persistent memories will enable systems that combine conventional DRAM with large amounts of non-volatile main memory (NVMM) and provide huge increases in storage performance. Fully realizing this potential requires fundamental changes in how system software manages, protects, and provides access to data that resides in NVMM. We address these needs by describing an NVMM-optimized file system called NOVA-Fortis that is both fast and resilient in the face of corruption due to media errors and software bugs. We identify and propose solutions for the unique challenges in adding fault tolerance to an NVMM file system, adapt state-of-the-art reliability techniques to an NVMM file system, and quantify the performance and storage overheads of these techniques. We find that NOVA-Fortis’ reliability features consume 14.8% of the storage for redundancy and reduce application-level performance by between 2% and 38% compared to the same file system with the features removed. NOVA-Fortis outperforms DAX-aware file systems without reliability features by 1.5× on average. It outperforms reliable, block-based file systems running on NVMM by 3× on average.

CCS CONCEPTS
• Information systems → Phase change memory; Mirroring; RAID; Point-in-time copies;

KEYWORDS
Persistent Memory, Non-volatile Memory, Direct Access, DAX, File Systems, Reliability

ACM Reference Format:
Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Steven Swanson, and Andy Rudoff. 2017. NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System. In Proceedings of ACM SIGOPS 26th Symposium on Operating Systems Principles (SOSP ’17). ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3132747.3132761

1 INTRODUCTION
Fast, persistent memory technologies (e.g., battery-backed NVDIMMs [37] or Intel and Micron’s 3D XPoint [36]) will enable computer systems with expansive memory systems that combine volatile and non-volatile memories. These hybrid memory systems offer the promise of dramatic increases in storage performance.

Integrating NVMMs into computer systems presents a host of interesting challenges. The most pressing of these focus on how we should redesign existing software components (e.g., file systems) to accommodate and exploit the different performance characteristics, interfaces, and semantics that NVMMs provide.

Several groups have proposed new file systems [12, 15, 64] designed specifically for NVMMs, and several Windows and
Linux file systems now include at least rudimentary support for them [10, 20, 63]. These file systems provide significant performance gains for data access and support “direct access” (or DAX-style) mmap() that allows applications to access a

file’s contents directly using load and store instructions, a likely “killer app” for NVMMs.

Despite these NVMM-centric performance improvements, none of these file systems provide the data protection features necessary to detect and correct media errors, protect against corruption due to misbehaving code, or perform consistent backups of the NVMM’s contents. File system stacks in wide use (e.g., ext4 running atop LVM, Btrfs, and ZFS) provide some or all of these capabilities for block-based storage. If users are to trust NVMM file systems with critical data, they will need these features as well.

From a reliability perspective, there are four key differences between conventional block-based file systems and NVMM file systems.

First, the memory controller reports persistent memory media errors as non-maskable interrupts rather than error codes from a block driver. Further, the granularity of errors is smaller (e.g., a cache line) and varies depending on the memory device.

Second, persistent memory file systems must support DAX-style memory mapping that maps persistent memory pages directly into the application’s address space. DAX is the fastest way to access persistent memory since it eliminates all operating system and file system code from the access path. However, it means a file’s contents can change without the file system’s knowledge, something that is not possible in a block-based file system.

Third, the entire file system resides in the kernel’s address space, vastly increasing vulnerability to “scribbles” – errant stores from misbehaving kernel code.

Fourth, NVMMs are vastly faster than block-based storage devices. This means that the trade-offs block-based file systems make between reliability and performance need a thorough re-evaluation.

We explore the impact of these differences on file system reliability mechanisms by building NOVA-Fortis, an NVMM file system that adds fault tolerance to NOVA [64] by incorporating snapshots, replication, checksums, and RAID-4 parity protection.

In applying these techniques to an NVMM file system, we have developed the principle of caveat DAXor (“let the DAXer beware”): applications that use DAX-style mmap() must accept responsibility for protecting their data’s integrity and consistency.

Protecting and guaranteeing consistency for DAX mmap()’d data is complex and challenging. The file system cannot fix that and should not try. Instead, the file system should studiously avoid imposing any performance overhead on DAX-style access, except when absolutely necessary. For data that is not mapped, the file system should retain responsibility for data integrity.

Caveat DAXor has two important consequences for NOVA-Fortis’ design. The first applies to most other NVMM file systems: to maximize performance, applications are responsible for enforcing ordering on stores to mapped data to ensure consistency in the face of system failure.

The second consequence arises because NOVA-Fortis uses parity to protect file data from corruption. Keeping error correction information up-to-date for mapped data would require interposing on every store, imposing a significant performance overhead. Instead, NOVA-Fortis requires applications to take responsibility for protecting data while it is mapped, and it restores parity protection when the memory is unmapped.

We quantify the performance and storage overhead of NOVA-Fortis’ fault-tolerance mechanisms and these design decisions and evaluate their effectiveness at preventing corruption of both file system metadata and file data.

We make the following contributions:
(1) We identify the unique challenges that the caveat DAXor principle presents to building a fault-tolerant NVMM file system.
(2) We describe a fast algorithm called Tick-Tock for NVMM data structures that combines atomic update with error detection and recovery.
(3) We adapt state-of-the-art techniques for data protection to work in NOVA-Fortis and to accommodate DAX-style mmap().
(4) We quantify NOVA-Fortis’ vulnerability to scribbles and develop techniques to reduce this vulnerability.
(5) We quantify the performance and storage overheads of NOVA-Fortis’ data protection mechanisms.

We find that the extra storage NOVA-Fortis needs to provide fault tolerance consumes 14.8% of file system space and reduces application-level performance by between 2% and 38% compared to NOVA. NOVA-Fortis outperforms DAX-aware file systems without reliability features by 1.5× on average. It outperforms reliable, block-based file systems running on NVMM by 3× on average.

To describe NOVA-Fortis, we start by providing a brief primer on NVMM’s implications for system designers, existing NVMM file systems, key issues in file system reliability, and the NOVA file system (Section 2). Then, we describe NOVA-Fortis’ snapshot and (meta)data protection mechanisms (Sections 3 and 4). Section 5 evaluates these mechanisms, and Section 6 presents our conclusions.

2 BACKGROUND
NOVA-Fortis targets memory systems that include emerging non-volatile memory technologies along with DRAM. This section first provides a brief survey of NVMM technologies and the opportunities and challenges they present. Then we


describe recent work on NVMM file systems and discuss key issues in file system reliability. Finally, we provide a brief primer on NOVA.

2.1 Non-volatile Memory Technologies
Modern server platforms support NVMM in the form of NVDIMMs [22, 37], and the Linux kernel includes low-level drivers for identifying physical address regions that are non-volatile, etc. NVDIMMs are commercially available from several vendors in the form of DRAM DIMMs that can store their contents to an on-board flash-memory chip in case of power failure with the help of super-capacitors.

NVDIMMs that dispense with flash and battery backup are expected to appear in systems soon. Phase change memory (PCM) [32, 44], resistive RAM (ReRAM) [18, 56], and 3D XPoint memory technology [36] are denser than DRAM and may enable very large, non-volatile main memories. Their latencies are longer than DRAM’s, however, making it unlikely that they will fully replace DRAM as main memory. Other technologies, such as spin-torque transfer RAM (STT-RAM) [28], are faster but less dense and may find other roles in future systems (e.g., as non-volatile caches [67]). These technologies are all under active development, and knowledge about their reliability and performance is evolving rapidly.

Commercial availability of these technologies appears to be at hand. The 3D XPoint memory technology has already appeared in SSDs [25] and is expected to appear on the processor memory bus shortly [26]. In addition, all major memory manufacturers have candidate technologies that could compete with 3D XPoint. Consequently, we expect hybrid volatile/non-volatile memory hierarchies to become common in large systems.

2.2 NVMM File Systems and DAX
Several groups have designed NVMM file systems [12, 14, 15, 63, 64] that address the unique challenges that NVMMs’ performance and byte-addressable interface present. One of these, NOVA, is the basis for NOVA-Fortis, and we describe it in more detail in Section 2.4.

NVMMs’ low latencies make software efficiency much more important than in block-based storage devices [7, 9, 62, 65].

NVMM-aware CPUs provide a load/store interface with atomic 8-byte¹ operations rather than a block-based interface with block- or sector-based atomicity. NVMM file systems can use these atomic updates to implement features such as complex atomic data and metadata updates, but doing so requires different data structures and algorithms than block-based file systems have employed.

¹NVMM-aware Intel CPUs provide 8-byte atomic operations. Other architectures may provide different atomicity semantics.

Since NVMMs reside on the processor’s memory bus, applications should be able to access them directly via loads and stores. NVMM file systems provide this ability via direct access (or “DAX”). DAX allows read and write system calls to bypass the page cache and access NVMM directly. DAX mmap() maps the NVMM physical pages that hold a file directly into an application’s address space, so the application can access and modify file data with loads and stores and use persist barriers to enforce ordering constraints. File systems for both Windows [20] and Linux [10, 63] support DAX mmap().

DAX mmap() is a likely “killer app” for NVMMs since it gives applications the fastest possible access to stored data and allows them to build complex, persistent, pointer-based data structures. The typical usage model would have the application create a large file in an NVMM file system, use mmap() to map it into its own address space, and then rely on a userspace library [11, 42, 61] to manage it.
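To make this usage model concrete, here is a minimal userspace sketch of our own (not code from the paper): it maps a file on a hypothetical DAX mount at /mnt/pmem and makes a store durable with a clwb-plus-fence persist barrier instead of msync(). The path, file size, and persist() helper are illustrative assumptions, and error handling is omitted for brevity.

    /* Sketch of the DAX mmap() usage model, assuming a DAX-mounted file
     * system at /mnt/pmem (hypothetical path) and an x86 CPU with clwb
     * (compile with -mclwb). Not taken from NOVA-Fortis. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <immintrin.h>

    #define FILE_SIZE (1ULL << 30)   /* one large, 1 GB file */

    /* Persist barrier: flush the affected cache lines toward the
     * persistence domain, then fence so later stores are ordered after. */
    static void persist(const void *addr, size_t len)
    {
        uintptr_t p = (uintptr_t)addr & ~(uintptr_t)63;
        for (; p < (uintptr_t)addr + len; p += 64)
            _mm_clwb((void *)p);
        _mm_sfence();
    }

    int main(void)
    {
        int fd = open("/mnt/pmem/data", O_CREAT | O_RDWR, 0644);
        ftruncate(fd, FILE_SIZE);
        /* On a DAX file system this maps NVMM pages directly: the loads
         * and stores below reach NVMM without going through the page cache. */
        char *pmem = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        strcpy(pmem, "hello, persistent world");
        persist(pmem, 24);            /* durable without calling msync() */
        munmap(pmem, FILE_SIZE);
        close(fd);
        return 0;
    }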
Allowing programmers to build useful data structures with NVMMs requires CPUs to make guarantees about when stores become persistent that programmers can use to guarantee consistency after a system crash [2, 6]. Without these guarantees it is impossible to build data structures in NVMM that are reliable in the face of power failures [11, 60, 61]. NVMM-aware systems provide some form of persist barrier that allows programmers to ensure that earlier stores become persistent before later stores. Researchers have proposed several different kinds of persist barriers [12, 30, 41]. For example, under x86 a persist barrier comprises a clflush or clwb [24] instruction to force cache lines into the system’s “persistence domain” and a conventional memory fence to enforce ordering. Once a store reaches the persistence domain, the system guarantees it will reach NVMM, even in the case of a crash. NOVA-Fortis and other NVMM file systems assume that these or similar instructions are available.

Building persistent data structures that are robust in the face of system and application failures is a difficult programming challenge and the first inspiration for the caveat DAXor principle. Building data structures in NVMM requires the application (or library) to take responsibility for allocating and freeing persistent memory and avoiding a host of new bugs that can arise in memory systems that combine persistent and transient state [11].

Applications are also responsible for enforcing ordering relationships between stores to ensure that the application can recover from an unexpected system failure. Conventional mmap()-based applications use msync() for this purpose. msync() works with DAX mmap(), but it is expensive, non-atomic, and only operates on pages. Persist barriers are much faster than msync() and better-suited to building


complex data structures, but they are more difficult to use correctly.

[Figure 1 diagram: an inode with head and tail pointers into a log of entries (an inode update, two file writes, and an uncommitted entry) that reference file pages marked live or stale.]

Figure 1: NOVA inode log structure – A NOVA inode log records changes to the inode (e.g., the mode change and two file write operations shown above). NOVA stores file data outside the log.

2.3 File System Consistency and Reliability
Apart from their core function of storing and retrieving data, file systems also provide facilities to protect the data they hold from corruption due to system failures, media errors, and software bugs (both in the file system and elsewhere). File systems have devised a variety of different techniques to guarantee system-wide consistency of file system data structures, including journaling [15, 58], copy-on-write [8, 12, 45], and log-structuring [46, 47].

Highly-reliable file systems like ZFS [8] and Btrfs [45] provide two key features to protect data and metadata: the ability to take snapshots of the file system (to facilitate backups) and a set of mechanisms to detect and recover from data corruption due to media errors and other causes. Existing DAX file systems provide neither of these features, limiting their usefulness in mission-critical applications. Below, we discuss the importance of each feature and existing approaches.

2.3.1 Snapshots. Snapshots provide a consistent image of the file system at a moment in time. Their most important application is facilitating consistent backups without unmounting the file system, affording protection against catastrophic system failures and the accidental deletion or modification of files.

Many modern file systems have built-in support for snapshots [8, 29, 45]. In other systems the underlying storage stack (e.g., LVM in Linux) can take snapshots of the underlying block device. Neither existing DAX-enabled NVMM file systems nor current low-level NVMM drivers support snapshots, making consistent online backups impossible.

2.3.2 Data Corruption. File systems are subject to a wide array of data corruption mechanisms, including media errors that cause storage media to return incorrect values and software errors that store incorrect data to the media. Data corruption and software errors in the storage stack have been thoroughly studied for hard disks [4, 49, 50], SSDs [34, 40, 51], and DRAM-based memories [52, 54, 55]. The results of the DRAM-based studies apply to DRAM-based NVDIMMs, but there have been no (publicly-available) studies of error behaviors in emerging NVMM technologies.

Storage devices use error-correcting codes (ECC) to protect against media errors. Errors that ECC detects but cannot correct result in uncorrectable media errors. For block-based storage, these errors appear as read or write failures from the storage driver. Intel NVMM-based systems report these media errors via an unmaskable machine-check exception (MCE) (see Section 4.1).

Software errors can also cause data corruption. If the file system is buggy, it may write data in the wrong place or fail to write at all. Other code in the kernel can corrupt file system data by “scribbling” [31] on file system data structures or data buffers. Scribbles are an especially critical problem for NVMM file systems, since the NVMM is mapped into the kernel’s address space. As a result, all of the file system’s data and metadata are always vulnerable to scribbles.

We discuss other prior work on file system reliability as it relates to NOVA-Fortis in Section 4.8.

2.4 The NOVA File System
NOVA-Fortis is based on the NOVA NVMM file system [64]. NOVA’s initial design focused on two goals: fully exposing the performance that NVMMs offer and providing very strong consistency guarantees – all operations in NOVA are atomic. Below, we describe the features of NOVA that are most relevant to our description of NOVA-Fortis.

Each inode in a NOVA file system has a private log that records changes to the inode. Figure 1 illustrates the relationship between an inode, its log, and file data.
NOVA stores the log as a linked list of 4 KB NVMM pages, so logs are non-contiguous. To perform a file operation that affects a single inode, NOVA appends a log entry to the log and updates the pointer to the log’s tail using an atomic, 64-bit store.

For writes, NOVA uses copy-on-write, allocating new pages for the written data. The log entry for a write holds pointers to the newly written pages, atomically replacing them in the file. NOVA immediately reclaims the resulting stale pages.

For complex file operations that involve multiple inodes (e.g., moving a file between directories), NOVA uses small, fixed-size journals (one per core) to store the new tail pointers for all the inodes and update them atomically.
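The append-then-commit pattern is worth seeing in code. The sketch below is a simplified rendering of the protocol just described, not NOVA’s actual source: the entry layout and the nova_inode, next_log_slot(), and persist_barrier() names are our own stand-ins.

    /* Sketch of NOVA's log append: persist the entry first, then commit
     * it with a single atomic 64-bit store to the log tail. Names and
     * layouts are illustrative, not from the NOVA source. */
    #include <stdint.h>
    #include <string.h>

    struct nova_inode {
        uint64_t log_head;   /* NVMM address of the first log page */
        uint64_t log_tail;   /* NVMM address just past the last committed entry */
    };

    struct file_write_entry {
        uint64_t type;       /* entry kind, e.g. FILE_WRITE */
        uint64_t offset;     /* file offset the write covers */
        uint64_t num_pages;  /* length of the write in 4 KB pages */
        uint64_t page_addr;  /* NVMM address of the copy-on-write pages */
    };

    /* Hypothetical helpers. */
    extern struct file_write_entry *next_log_slot(struct nova_inode *inode);
    extern void persist_barrier(const void *addr, size_t len);

    void log_append(struct nova_inode *inode, const struct file_write_entry *e)
    {
        struct file_write_entry *slot = next_log_slot(inode);

        memcpy(slot, e, sizeof(*slot));
        persist_barrier(slot, sizeof(*slot)); /* entry durable before commit */

        /* An aligned 8-byte store is atomic on x86 (see footnote 1), so a
         * crash leaves the tail pointing at either the old last entry or
         * the new one, never at a torn value. Real code would use a
         * volatile/WRITE_ONCE-style store to keep the compiler honest. */
        inode->log_tail = (uint64_t)(uintptr_t)slot + sizeof(*slot);
        persist_barrier(&inode->log_tail, sizeof(inode->log_tail));
    }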


NOVA periodically performs garbage collection on the inode logs by scanning and compacting them. Since the logs do not contain file data, they are shorter and garbage collection is less critical than in a conventional log-structured file system.

To maximize concurrency, NOVA uses per-CPU structures in DRAM to allocate NVMM pages and inodes. It also caches inode metadata in DRAM to minimize accesses to NVMM (which is projected to be slower than DRAM) and uses one DRAM-based radix tree per file to map file offsets to NVMM pages. NOVA divides the allocatable NVMM into multiple regions, one region per CPU core. A per-core allocator manages each of the regions, minimizing contention during memory allocation.

After a system crash, NOVA must scan all the logs to rebuild the memory allocator state. Since there are many logs, NOVA aggressively parallelizes the scan. Recovering a 50 GB NOVA file system takes just over 1/10th of a second [64].

3 SNAPSHOTS
NOVA-Fortis’ snapshot support lets system administrators take consistent snapshots of the file system while applications are running. The system can mount a snapshot as a read-only file system or roll the file system back to a previous snapshot. NOVA-Fortis supports an unlimited number of snapshots, and snapshots can be deleted in any order. NOVA-Fortis is the first NVMM file system that supports taking consistent snapshots while applications modify file data via DAX mmap().

Below, we describe the three central challenges that NOVA-Fortis’ snapshot mechanisms address: taking efficient snapshots, managing storage for snapshots, and taking usable snapshots of DAX mmap()’d files.

3.1 Taking and Restoring Snapshots
Applications expect efficient snapshot creation as well as high performance from NVMM file systems: taking a snapshot should have low latency, incur minimal writes to NVMM, and not block file system write operations. None of the existing NVMM file systems meet all these requirements.

NOVA-Fortis implements snapshots by maintaining a global snapshot ID for the file system and storing the current snapshot ID in each log entry. Taking a snapshot increments the global snapshot ID and records the old snapshot ID in a list of valid snapshots. Creating a new snapshot in NOVA-Fortis does not block file system write operations, and if no files are mmap()’d it takes constant time regardless of the file system volume size.

To restore the file system to a snapshot, NOVA-Fortis mounts the file system read-only. Then, to read a file, it traverses the log only while the log entries’ snapshot IDs are smaller than or equal to the target snapshot’s ID.

Below we describe how the snapshot mechanism interacts with normal file operations. Then, we describe how NOVA-Fortis takes consistent snapshots while applications are using DAX mmap(). Finally, we evaluate the impact of snapshots on NOVA-Fortis’ resource consumption and on application performance.

3.2 Snapshot Management
To take a snapshot NOVA-Fortis must preserve file contents from previous snapshots while also being able to recover the space a snapshot occupied after its deletion. In this description, we use “snapshot” to denote both the file system’s state when a snapshot is taken and the set of file operations since the previous snapshot.

NOVA-Fortis maintains a snapshot manifest for each snapshot. Entries in the manifest contain a pointer to a log entry, a birth snapshot ID, and a death snapshot ID. We denote a manifest entry with [birth snapshot ID, death snapshot ID). The manifest for a snapshot contains entries for all the log entries that were born (i.e., created) with that snapshot ID and have since died (i.e., become stale because another write replaced them) in a different snapshot. If a snapshot is deleted, the manifest entries for that snapshot become part of the following snapshot. This can result in a snapshot containing multiple manifest entries for the same file data. In this case, it is safe to remove the older entries (i.e., those with earlier birth snapshot IDs).

Figure 2 illustrates how NOVA-Fortis manages snapshot manifests. In Figure 2 (a), an application issues two writes. In (b), NOVA-Fortis takes a snapshot, increments the snapshot ID (Step 1), and performs a write to location 0. The new write log entry has snapshot ID 1 (Step 2), since it was born in snapshot 1.

The write to location 0 in (b) kills the write to that location in (a). NOVA-Fortis checks whether that log entry was born and died in different snapshots. In the example, it was born in snapshot 0 and died in snapshot 1, so NOVA-Fortis adds an entry to the snapshot 0 manifest (Step 3) and preserves the log entry. In Figure 2 (c), the application issues two writes that overwrite entries in snapshots 0 and 1, so NOVA-Fortis adds entries to those manifests.

In Figure 2 (d), NOVA-Fortis deletes snapshot 0 by removing it from the list of live snapshots and adding the manifest entries for snapshot 0 to snapshot 1’s manifest (Step 4). In effect, this moves the start of snapshot 1 to the beginning of snapshot 0.

Then, NOVA-Fortis starts a background thread that compacts the combined manifest by discarding all the manifest entries that cover one part of the file except the most recent (Step 5). As it discards the manifest entries, it marks the corresponding log entries as dead and reclaims the associated file data (Step 6). Finally, (e) shows the file system state after manifest compaction is complete.


[Figure 2 diagram: panels (a)–(e) show a file log and the snapshot manifests evolving through (a) write(0, 4K); write(4K, 4K); (b) take_snapshot(); write(0, 4K); (c) take_snapshot(); write(0, 4K); write(4K, 4K); (d) delete_snapshot(0); and (e) the state after garbage collection. The legend distinguishes snapshot entries, dead snapshot entries, file write entries (with address and snapshot ID), data in a snapshot, reclaimed data, and current data.]
Figure 2: Snapshots in NOVA-Fortis – NOVA-Fortis stores the current snapshot ID in each log entry and takes a snapshot by incrementing the snapshot ID. When a data block or log entry becomes dead in the current snapshot, NOVA-Fortis records its death in the snapshot manifest for the most recent snapshot it survives in. To delete a snapshot, NOVA-Fortis adds the manifest for the deleted snapshot to the manifest for the next snapshot and removes redundant entries.
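The following sketch shows the bookkeeping decision this design implies when a write kills an older log entry. It follows the rules in the text, but every name in it (global_snapshot_id, manifest_add(), reclaim()) is a hypothetical stand-in rather than NOVA-Fortis source.

    /* Sketch of snapshot manifest bookkeeping. A log entry carries the
     * snapshot ID it was born in; when a later write replaces (kills) it,
     * the entry is either reclaimed immediately or preserved in a
     * manifest. All names are illustrative. */
    #include <stdint.h>

    struct manifest_entry {
        uint64_t log_entry;  /* NVMM address of the dead log entry */
        uint64_t birth_id;   /* snapshot ID the entry was created in */
        uint64_t death_id;   /* snapshot ID in which it became stale */
    };

    extern uint64_t global_snapshot_id;   /* hypothetical: current snapshot */
    extern void reclaim(uint64_t log_entry);
    extern void manifest_add(uint64_t manifest_id, uint64_t log_entry,
                             uint64_t birth_id, uint64_t death_id);

    void kill_log_entry(uint64_t log_entry, uint64_t birth_id)
    {
        uint64_t death_id = global_snapshot_id;

        if (birth_id == death_id) {
            /* Born and died in the same snapshot: no snapshot can ever
             * observe this entry, so reclaim its data immediately. */
            reclaim(log_entry);
        } else {
            /* Some snapshot still needs this entry's data. Record a
             * [birth_id, death_id) entry in the manifest of the snapshot
             * the log entry was born in, and preserve the entry. */
            manifest_add(birth_id, log_entry, birth_id, death_id);
        }
    }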

NOVA-Fortis keeps the list of live snapshots in NVMM, but it keeps the contents of the manifests in DRAM. On a clean shutdown, it writes the manifests to NVMM in the recovery inode. After an unclean shutdown, NOVA-Fortis can reconstruct the manifests while it scans the inode logs during recovery.

3.3 Snapshots for DAX mmap()’d Files
Taking consistent snapshots while applications are modifying files using DAX-style mmap() requires NOVA-Fortis to reckon with the order in which stores to NVMM become persistent (i.e., reach physical NVMM so they will survive a system failure). These applications rely on the processor’s “memory persistence model” [41] to make guarantees about when and in what order stores become persistent, allowing them to restore their data to a consistent state when the application restarts after a system failure.

From the application’s perspective, reading a snapshot is equivalent to recovering from a system failure. In both cases, the contents of the memory-mapped file reflect its state at a moment when application operations might be in flight and when the application had no chance to shut down cleanly.

A naive approach to checkpointing mmap()’d files in NOVA-Fortis would simply mark each of the read/write mapped pages as read-only and then do copy-on-write when a store occurs, preserving the old pages as part of the snapshot. However, this approach can leave the snapshot in an inconsistent state: setting a page to read-only captures its contents for the snapshot, and the kernel requires NOVA-Fortis to set the pages as read-only one at a time. If the order in which NOVA-Fortis marks pages as read-only is incompatible with the ordering that the application requires, the snapshot will contain an inconsistent version of the file.

Consider an example with a data value D and a flag V that is True when D contains valid data. To store data in D while enforcing this invariant, a thread could issue a persist barrier between the stores to D and V. Assume D and V reside in different pages, and initially V is False.

If NOVA-Fortis marks D’s page as read-only before the program updates D, and marks V’s page as read-only after the program updates V, the snapshot has V equal to True, but D with its old, incorrect value.
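Written as application code, the example looks like the sketch below, where persist_barrier() stands in for the clwb-plus-fence sequence from Section 2.2 and D and V are assumed to occupy different DAX-mmap()’d NVMM pages. This is our illustration, not code from the paper.

    /* The D/V example from the text as DAX-mmap()'d application code.
     * D and V are assumed to point into different mmap()'d NVMM pages,
     * set up elsewhere; persist_barrier() is the clwb-plus-fence sketch
     * from Section 2.2. */
    #include <stddef.h>

    extern void persist_barrier(const void *addr, size_t len);

    int *D;   /* data value, in one NVMM page          */
    int *V;   /* valid flag, in a different NVMM page  */

    void publish(int value)
    {
        *D = value;
        persist_barrier(D, sizeof(*D)); /* D must be durable before V is set */
        *V = 1;                         /* V == 1 promises "D is valid"      */
        persist_barrier(V, sizeof(*V));
    }

A snapshot that captures V as 1 but D with its old value is exactly the inconsistency the page-fault-blocking scheme below prevents.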


Figure 3 illustrates how we resolve the problem: when NOVA-Fortis starts marking pages as read-only in (a), it blocks page faults on read-only mmap()’d pages from completing until it has marked all the mmap()’d pages read-only. D’s page fault in (b) will not complete (and the store to V will not occur) until all the pages (including V’s) have been marked read-only and the snapshot is complete in (c). Then NOVA-Fortis allows the copy-on-write for D’s page to proceed, allowing the modification of D to complete in (d). Since the assignment to V occurs after the assignment to D in program order, the copy-on-write and modification to V in (e) will occur after creating a consistent snapshot.

[Figure 3 diagram: page states over time (read/write; read-only; read-only with a waiting copy-on-write page fault) alongside the snapshot state of D and V through steps (a)–(e).]

Figure 3: Snapshots of mmap()’d data – NOVA-Fortis marks mmap()’d pages as read-only to capture their contents in the snapshot. To preserve the application’s constraints on when stores become persistent (here, that the assignment to V follow the assignment to D), it prevents page faults to read-only pages from completing until it can mark all the pages read-only.

This solution is also correct for multi-threaded programs. If thread 1 updates D and, later, thread 2 should update V, the threads must use some synchronization primitive to enforce that ordering. For instance, thread 1 may release a lock after storing to D. If D’s page is marked read-only before D changes, the unlock will not occur until the kernel marks all the mmap()’d pages read-only, ensuring that the store to V will not appear in the snapshot.

This approach guarantees that, for each thread, the snapshot will reflect a prefix of the program-order-based sequence of NVMM store operations the thread has performed. This model is equivalent to strict persistence [41], the most restrictive model for how NVMM memory operations should behave (i.e., in what order updates can become persistent) in a multi-processor system. CPUs may implement more aggressive, relaxed models for memory persistence, but strict persistence is strictly more conservative than all proposed models, so NOVA-Fortis’ approach is correct under those models as well.²

²This is directly analogous to sequential memory consistency being a valid implementation of any more relaxed memory consistency model. Strict persistence is a natural extension of sequential consistency to include a notion of persistence.

3.4 Design Decisions and Evaluation
There are several ways we could have addressed the problems of creating snapshots, managing the data they contain, and correctly capturing mmap()’d data. Our approach stores snapshot manifests in DRAM. The size of a snapshot manifest is proportional to the changes between snapshots, and each data block and log entry is referred to by at most one snapshot manifest. In the worst case the snapshot manifest size is proportional to the size of the live data in the file system; this happens when the user takes a snapshot and then modifies the entire file system.

In practice, the size of the manifests is manageable. We ran the fileserver workloads described in Section 5 and the WHISPER [39] workloads while taking snapshots every 10 seconds. WHISPER is a suite of eight applications that use DAX mmap() for data access. Table 1 shows the results. The size of the manifests ranged from 0.027% to 0.05% of the space in use in the file system.

    Application   Workload size   Max manifest size
    Fileserver    13 GB           3.6 MB
    Varmail       4 GB            2 MB
    Redis         4 GB            1.1 MB
    TPCC          8.5 GB          3.1 MB
    ctree         10 GB           3.3 MB

Table 1: Snapshot manifest size – The size of the snapshot manifest is proportional to the size of the changes between snapshots.

Taking snapshots has different effects on the performance of applications that use file-based access and those that use DAX-style mmap(). Figure 4 measures the impact on both groups.

484 SOSP ’17, October 28, 2017, Shanghai, China J. Xu and L. Zhang et al.

Figure 4: Snapshot performance impact – For file-based applications (“Filebench”), taking a snapshot every 10 s reduces performance by just 0.6% on average. WHISPER applications that use DAX mmap() see a larger drop: 3.7% on average.

Figure 5: Snapshot operation latency – The time required to mark pages read-only during snapshot creation is proportional to the number of write-faulted pages (300 in this example) in the DAX mmap()’d regions.

On the right, Figure 4 shows results for the WHISPER applications. On the left are results for four Filebench [57] workloads.

For all the applications, the figure compares performance without snapshots to performance while running for 1 to 5 minutes and taking a snapshot every 10 seconds. For the WHISPER applications that use DAX mmap(), taking periodic snapshots only reduces performance by 3.7% on average. For the file-based Filebench workloads, the impact is negligible – 0.6% on average.

Figure 5 shows the latency of snapshot operations. For file-based applications, snapshot creation takes constant time. For DAX mmap() applications, the time to mark mmap()’d regions as read-only is proportional to the number of write-faulted pages, and each page takes about 10 ns. Snapshot deletion always takes constant time, since it only needs to combine the deleted snapshot’s manifest with the next snapshot’s manifest; it then performs stale data reclamation in the background. The background thread takes about 400 ns to reclaim 4 KB of data. A copy-on-write page fault takes 2 µs: 1.8 µs for copying data to the new page and 200 ns to update the page table.

4 HANDLING DATA CORRUPTION IN NOVA-FORTIS
Like all storage media, NVMM is subject to data and metadata corruption from media failures and software bugs. To prevent, detect, and recover from data corruption, NOVA-Fortis relies on the capabilities of the system hardware and operating system as well as its own error detection and recovery mechanisms.

This section describes the interfaces that NOVA-Fortis expects from the memory system hardware and the OS and how it leverages them to detect and recover from corruption. We also discuss a technique that prevents data corruption in many cases and NOVA-Fortis’ ability to trade reliability for performance. Finally, we discuss NOVA-Fortis’ protection mechanisms in the context of recent work on file system reliability.


[Figure 6 diagram: the NVMM layout, with per-CPU allocation regions growing primary allocations from one end and replica allocations from the other, plus data pages, checksum regions, parity regions, and a dead zone (DZ) between primary and replica metadata.]

Figure 6: NOVA-Fortis space layout – NOVA-Fortis’ per-core allocators satisfy requests for primary and replica storage from different directions. They also store data pages and their checksums and parity pages separately. The “dead zone” (DZ) for metadata allocation guarantees a gap between primary and replica pages.

4.1 Detecting and Correcting Media Errors
NOVA-Fortis detects NVMM media errors with the same mechanisms that processors provide to detect DRAM errors. The details of these mechanisms determine how NOVA-Fortis and other NVMM file systems can protect themselves from media errors.

This section describes the interface that recent Linux kernels (e.g., Linux 4.10) and Intel processors provide via the PMEM low-level NVDIMM driver. Porting NOVA-Fortis to other architectures or operating systems may require NOVA-Fortis to adopt a different approach to error detection.

NOVA-Fortis assumes the memory system provides ECC for NVMM that is similar to (or perhaps stronger than) the single-error-correction/double-error-detection (SEC-DED) scheme that conventional DRAM uses. We assume the controller transparently corrects correctable errors and silently returns invalid data for undetectable errors.

For detectable but uncorrectable errors, Intel’s Machine Check Architecture (MCA) [23] raises a machine check exception (MCE) in response to uncorrectable memory errors. After the exception, MCA registers hold information that allows the OS to identify the memory address and instruction responsible for the exception.

The default response to an MCE in the kernel is a kernel panic. However, recent Linux kernels include a version of memcpy(), called memcpy_mcsafe(), that returns an error to the caller instead of crashing in response to memory-error-induced MCEs and allows the kernel software to recover from the exception. NOVA-Fortis always uses this function when reading from NVMM and checks its return code to detect uncorrectable media errors. Intel processors do not provide a mechanism for detecting store failures, and the memory controller transparently maps around faulty cells. In rare cases (e.g., an MCE occurring during a page fault), MCEs are not recoverable, and a kernel panic is inevitable.

When the processor hardware detects an uncorrectable media error, it “poisons” a contiguous region of physical addresses. The size of this region is the poison radius (PR) of a media error. We assume PRs are a power of two in size and aligned to that size. Loads from poisoned addresses cause an MCE, and all the data in the PR is lost. The poisoned status of a PR persists across system failures, and the PMEM driver collects a list of poisoned PRs at boot. On Intel processors the poison radius is 64 bytes (one cache line), but after boot, Linux reports poisoned regions at 512-byte granularity, so NOVA-Fortis uses a 512-byte PR.

We also assume that NVMM platforms will allow system software to clear a poisoned PR to make the address range usable again. Intel processors provide this capability via the “Clear Uncorrectable Error” command that is part of the Advanced Configuration and Power Interface (ACPI) specification [59].
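The read-side pattern this interface implies is small enough to sketch. Note that the exact signature and return convention of memcpy_mcsafe() have varied across kernel versions (it was later renamed), so the error handling below is illustrative rather than definitive.

    /* Sketch of an MCE-safe NVMM read in kernel code. The memcpy_mcsafe()
     * return convention has varied across kernel versions, so treat the
     * check below as illustrative. */
    #include <linux/errno.h>
    #include <linux/string.h>

    static int nvmm_read(void *dram_dst, const void *nvmm_src, size_t len)
    {
        /* A nonzero return indicates an uncorrectable media error (MCE)
         * was caught during the copy instead of panicking the kernel. */
        if (memcpy_mcsafe(dram_dst, nvmm_src, len))
            return -EIO;   /* caller attempts replica or parity recovery */
        return 0;
    }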
phases of a previous update, and NOVA-Fortis copies the After the exception, MCA registers hold information that al- primary to the replica, effectively completing the interrupted lows the OS to identify the memory address and instruction update. If both copies are corrupt, the metadata is lost, and responsible for the exception. NOVA-Fortis returns an error. The default response to an MCE in the kernel is a kernel panic. However, recent Linux kernels include a version of 4.3 Protecting File Data memcpy(), called memcpy_mcsafe(), that returns an error to NOVA-Fortis adopts RAID-4 parity protection and check- the caller instead of crashing in response to memory-error- sums to protect file data and it includes features to maximize induced MCEs, and allows the kernel software to recover protection for files that applications access data via DAX- from the exception. NOVA-Fortis always uses this function style mmap(). when reading from NVMM and checks its return code to detect uncorrectable media errors. Intel processors do not RAID Parity and Checksums NOVA-Fortis treats each provide a mechanism for detecting store failures, and the 4 KB file page as a stripe, and divides it into PR-sized (or memory controller transparently maps around faulty cells. larger) stripe segments, or strips.


4.3 Protecting File Data
NOVA-Fortis adopts RAID-4 parity protection and checksums to protect file data, and it includes features to maximize protection for files whose data applications access via DAX-style mmap().

RAID Parity and Checksums. NOVA-Fortis treats each 4 KB file page as a stripe and divides it into PR-sized (or larger) stripe segments, or strips. NOVA-Fortis stores a parity strip for each file page in a reserved region of NVMM. It also stores two copies of a CRC32 checksum for each data strip in separate reserved regions. Figure 6 shows the checksum and parity layouts in the NVMM.

When NOVA-Fortis performs a read, it first copies each strip of data into DRAM using memcpy_mcsafe() and calculates its checksum. If this checksum matches either stored copy of the checksum, NOVA-Fortis concludes the file data is correct and updates the mismatched copy of the checksum if needed.

If neither of the checksums matches the read data or a media error occurs, NOVA-Fortis attempts to restore the strip using RAID-4 parity and uses the strip’s checksum to determine if the recovery succeeded. If no other strip in the page is corrupt, recovery will succeed and NOVA-Fortis restores the target strip and its checksums. If more than one strip is corrupt, the file page is lost and the read fails.

Writes and atomic parity updates are simple since NOVA-Fortis uses copy-on-write for data: for each file page write, NOVA-Fortis allocates new pages, populates them with the written data, computes the checksums and parity, and finally commits the write with an atomic log append operation.

Caveat DAXor: Protecting DAX-mmap()’d Data. By design, DAX-style mmap() lets applications modify file data without involving the file system, so it is impossible for NOVA-Fortis to keep the checksums and parity for read/write mapped pages up-to-date. Instead, NOVA-Fortis follows the caveat DAXor principle and provides the following guarantee: the checksums and parity for data pages are up-to-date at all times, except when those pages are mapped read/write into an application’s address space.

We believe this is the strongest guarantee that NOVA-Fortis can provide on current hardware, and it raises several challenges. First, users that use DAX mmap() take on responsibility for detecting and recovering from both media errors (which appear as SIGBUS in userspace) and scribbles. This is an interesting challenge but beyond the scope of this paper. Second, NOVA-Fortis must be able to tell when a data page’s checksums and parity should match the data it contains and when they might not.

To accomplish this, when a portion of a file is mmap()’d, NOVA-Fortis records this fact in the file’s log, signifying that the checksums and parity for the affected pages are no longer valid. NOVA-Fortis only recomputes the checksums and parity for dirty pages on msync() and munmap(). On munmap(), it adds a log entry that restores protection for these pages when the last mapping for the page is removed. If the system crashes while pages are mapped, the recovery process will identify these pages while scanning the logs, recompute checksums and parity, and add a log entry to mark them as valid.

Design Decisions and Alternatives. Using RAID parity and checksums to protect file data is similar to the approach that ZFS [8] and IRON [43] take, but we store parity for each page rather than one parity page per file [43], and we maintain per-strip checksums instead of a single checksum for a whole stripe [8].

NOVA-Fortis chooses RAID-4 over RAID-5 because RAID-5 intermingles parity and data and would make DAX mmap() impossible, since parity bits would end up in the application’s address space.

Alternately, NOVA-Fortis could rely on RAIM [33], Chipkill [21], or other advanced ECC mechanisms to protect file data. These techniques would improve reliability, but they are not universally available and cannot protect against scribbles.
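The per-strip read path described earlier in this section reduces to the sketch below. The strip size, page descriptor layout, and helpers (strip_addr(), parity_addr(), xor_into()) are illustrative assumptions; crc32c() stands in for whichever CRC32 variant the file system actually uses, and nvmm_read() is the MCE-safe copy from Section 4.1.

    /* Sketch of the strip read path: verify against either stored
     * checksum copy, else reconstruct from the RAID-4 parity strip. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define STRIP_SIZE      512                 /* one poison radius */
    #define STRIPS_PER_PAGE (4096 / STRIP_SIZE) /* 4 KB page = 1 stripe */

    struct page_desc {
        uint32_t csum[2][STRIPS_PER_PAGE];  /* two checksum copies/strip */
        /* ... NVMM addresses of the data and parity strips ... */
    };

    extern const void *strip_addr(struct page_desc *pg, int i);
    extern const void *parity_addr(struct page_desc *pg);
    extern void xor_into(uint8_t *dst, const uint8_t *src, size_t len);
    extern uint32_t crc32c(uint32_t seed, const void *buf, size_t len);
    extern int nvmm_read(void *dst, const void *src, size_t len);

    int read_strip(struct page_desc *pg, int i, uint8_t *buf)
    {
        uint8_t tmp[STRIP_SIZE];
        int j;

        if (nvmm_read(buf, strip_addr(pg, i), STRIP_SIZE) == 0) {
            uint32_t c = crc32c(~0u, buf, STRIP_SIZE);
            if (c == pg->csum[0][i] || c == pg->csum[1][i])
                return 0;   /* good data; heal a stale checksum copy */
        }

        /* Media error or checksum mismatch: rebuild strip i by XOR-ing
         * the parity strip with every other data strip in the page. */
        if (nvmm_read(buf, parity_addr(pg), STRIP_SIZE))
            return -5;
        for (j = 0; j < STRIPS_PER_PAGE; j++) {
            if (j == i)
                continue;
            if (nvmm_read(tmp, strip_addr(pg, j), STRIP_SIZE))
                return -5;      /* a second bad strip: the page is lost */
            xor_into(buf, tmp, STRIP_SIZE);
        }
        /* The strip's own checksum tells us whether recovery succeeded. */
        return crc32c(~0u, buf, STRIP_SIZE) == pg->csum[0][i] ? 0 : -5;
    }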
4.4 Minimizing Vulnerability to Scribbles
Scribbles pose a significant risk to NOVA-Fortis’ data and metadata, since a scribble can impact large, contiguous regions of memory. We are not aware of any systematic study of the prevalence of these errors, but scribbles, lost writes, and misdirected writes are well-known culprits for file system corruption [19, 31, 66]. In practice, we expect that smaller scribbles are more likely than larger ones, in part because the bugs that result in larger scribbles would be more severe and more likely to be found and fixed.

To quantify the risk that these errors pose, we define the bytes-at-risk (BAR) for a scribble as the number of bytes it may render irretrievable.

NOVA-Fortis packs log entries into log pages, and it must scan the page to recognize each entry. Without protection, losing a single byte can corrupt a whole page. For replicated log pages, a scribble that spans both copies of a byte will corrupt the page. To measure the BAR for a scribble of size N, we measure the number of pages each possible N-byte scribble would destroy in an aged NOVA-Fortis file system.

Figure 7 shows the maximum and average metadata BAR for a 64 GB NOVA-Fortis file system with four protection schemes for metadata: “no replication” does not replicate metadata; “simple replication” allocates the primary and replica naively and tends to allocate lower addresses before higher addresses, so the primary and replica are often close; “two-way replication” separates the primary and replica by preferring low addresses for the primary and high addresses for the replica; and “dead-zone replication” extends “two-way” by enforcing a 1 MB “dead zone” between the primary and replica. The dead zone can store file data but not metadata. The more separation the allocator provides, the less likely a scribble will corrupt a pair of mirrored metadata pages.


Figure 7: Scribble size and metadata bytes at risk – Replicating metadata pages and taking care to allocate the replicas separately improves resilience to scribbles. The most effective technique enforces a 1 MB “dead zone” between replicas and eliminates the threat of a single scribble smaller than 1 MB. The graph omits zero values due to the vertical log scale.

Figure 6 shows an example of NOVA-Fortis’ two-way allocator with dead-zone separation. For each pair of mirrored pages, the dead zone forbids the primary and replica from becoming too close, but data pages can reside between them.

To stress the allocator’s ability to place the primary and replica far apart, we aged the file system by spawning multiple, multi-threaded Filebench workloads. When each workload finishes, we remove about half of its files and then restart the workload. We continue until the file system is 99% full.

The data show that even for the smallest, 1-byte scribble, the unprotected version will lose up to a whole page (4 KB) of metadata and an average of 0.06 pages. With simple replication, scribbles smaller than 4 KB have zero BAR. Under simple replication, an 8 KB scribble can corrupt up to 4 KB, but it affects only 0.04 pages on average.

Two-way replication tries to allocate the primary and replica farther apart, and it reduces the average bytes at risk with an 8 KB scribble to 2.9 × 10⁻⁵ pages, but the worst case remains the same because the allocator’s options are limited when space is scarce.

Enforcing the dead zone further improves protection: a 1 MB dead zone eliminates metadata corruption for scribbles smaller than 1 MB. The dead zone size is configurable, so NOVA-Fortis can increase the 1 MB threshold for scribble vulnerability if larger scribbles are a concern.

Scribbles also place data pages at risk. Since NOVA-Fortis stores the strips of data pages contiguously, scribbles that are larger than the strip size may cause data loss, but smaller scribbles do not. NOVA-Fortis could tolerate larger scribbles to data pages by interleaving strips from different pages, but this would disallow DAX-style mmap(). Increasing the strip size can also improve scribble tolerance, but at the cost of increased storage overhead for the parity strip.

4.5 Preventing Scribbles
The mechanisms described above let NOVA-Fortis detect and recover from data corruption. NOVA-Fortis can also borrow a technique from WAFL [31] and PMFS [15] to prevent scribbles: it marks all of NVMM as read-only and then clears Intel’s Write Protect Enable (WP) bit to disable all write protection when NOVA-Fortis needs to modify NVMM. Clearing and re-setting the bit takes ∼400 ns on our systems.

The WP approach only protects against scribbles from other kernel code. It cannot prevent NOVA-Fortis from corrupting its own data by performing “misdirected writes,” a common source of data corruption in file systems [5].
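For concreteness, here is an x86-flavored sketch of the write-window technique. It is illustrative only: the helper names are ours, and recent Linux kernels pin CR0.WP precisely to prevent this kind of toggling, so the PMFS-era approach shown here would need explicit kernel support.

    /* Sketch of a CR0.WP write window in the style of WAFL/PMFS. With
     * NVMM mapped read-only, the kernel briefly disables write protection
     * around its own stores; interrupts are disabled so the open window
     * cannot leak to other code running on this CPU. */
    #include <linux/irqflags.h>
    #include <linux/types.h>
    #include <asm/processor-flags.h>
    #include <asm/special_insns.h>

    extern void persist_barrier(const void *addr, size_t len);

    void nvmm_protected_store(u64 *nvmm_addr, u64 val)
    {
        unsigned long flags;

        local_irq_save(flags);                /* confine window to this CPU */
        write_cr0(read_cr0() & ~X86_CR0_WP);  /* open: writes now allowed   */
        *nvmm_addr = val;
        persist_barrier(nvmm_addr, sizeof(*nvmm_addr));
        write_cr0(read_cr0() | X86_CR0_WP);   /* close: NVMM read-only again */
        local_irq_restore(flags);
    }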
4.6 Relaxing Data and Metadata Protection
Many existing file systems can trade off reliability for improved performance (e.g., the data journaling options in ext4). NOVA-Fortis can do the same: it provides a relaxed mode that relaxes atomicity constraints on file data and metadata. In relaxed mode, write operations modify existing data directly rather than using copy-on-write, and metadata operations modify the most recent log entry for an inode directly rather than appending a new entry. Relaxed mode guarantees metadata atomicity by journaling the modified pieces of metadata. These changes improve performance, and we evaluate their impact in Section 5.


4.7 Protecting DRAM Data Structures
Corruption of DRAM data structures can result in file system corruption [13, 66], and NOVA-Fortis protects most of its critical DRAM data structures with checksums. Most DRAM structures that NOVA-Fortis does not protect are short-lived (e.g., the DRAM copies we create of metadata structures) or are not written back to NVMM. However, the snapshot logs and allocator state are exceptions, and they are vulnerable to corruption. The allocator protects the address and length of each free region with checksums, but it does not protect the pointers that make up the red-black tree that holds them, since we use the kernel’s generic red-black tree implementation.

4.8 Related and Future Work
Below, we describe proposed “best practices” for file system design and how NOVA-Fortis addresses them. Then, we describe areas of potential improvement for NOVA-Fortis.

4.8.1 Is NOVA-Fortis Ferrous? InteRnally cONsistent (IRON) file systems [43] provide a set of principles that lead to improved reliability. We designed NOVA-Fortis to embody these principles:

Check error codes. Uncorrectable ECC errors are the only errors that the NVMM memory system delivers to software (i.e., via MCEs). NOVA-Fortis uses memcpy_mcsafe() for all NVMM loads and triggers recovery if it detects an MCE. NOVA-Fortis also interacts with the PMEM driver that provides low-level management of NVDIMMs. For these calls, we check and respond to error codes appropriately.

Report errors and limit the scope of failures. NOVA-Fortis reports all unrecoverable errors as EIO rather than calling panic().

Use redundancy for integrity checks and distribute redundancy information. NOVA-Fortis’ tick-tock replication scheme stores the checksum for each replica with the replica, but it is careful to allocate the primary and replica copies far from one another. Likewise, NOVA-Fortis stores the parity and checksum information for data pages separately from the pages themselves.

Type-aware fault injection. For testing, we built a NOVA-Fortis-specific error injection tool that can corrupt data and metadata structures in specific, targeted ways, allowing us to test NOVA-Fortis’ detection and recovery mechanisms.

4.8.2 Areas for Improvement. There are several additional steps NOVA-Fortis could take to further improve reliability. We do not expect any of them to have a large impact on performance or storage overheads.

Sector or block failures in disks are not randomly distributed [3], and errors in NVMM are also likely to exhibit complex patterns of locality [35, 54, 55]. For instance, an NVMM chip may suffer from a faulty bank, row, or column, leading to a non-uniform error distribution. Or, an entire NVDIMM may fail. NOVA-Fortis’ allocator actively separates the primary and replica copies of metadata structures to eliminate logical locality, but it does not account for how the memory system maps physical addresses onto the physical memory. A layout-aware allocator could, for instance, ensure that replicas reside in different banks or different NVDIMMs.

NOVA-Fortis cannot keep running after an unrecoverable MCE (since they cause a panic()), but it could recover any corrupted data during recovery. The PMEM driver provides a list of poisoned PRs on boot, and NOVA-Fortis can use this information to locate and recover corrupted file data during mount. Without this step, NOVA-Fortis will still detect poisoned metadata, since reading from a poisoned PR results in a recoverable MCE, and NOVA-Fortis reads all metadata during recovery. Poisoned file data, however, could accumulate over multiple unrecoverable MCEs, increasing the chances of data loss.

Finally, NOVA-Fortis does not scrub data or metadata. PMEM detects media errors on reboot, but if a NOVA-Fortis file system ran continuously for a long time, undetected media errors could accumulate. Undetected scribbles to data and metadata can accumulate during normal operation and across reboots.

5 PERFORMANCE TRADE-OFFS
NOVA-Fortis’ reliability features improve its resilience but also incur overhead in terms of performance and storage space. This section quantifies these overheads and explores the trade-offs they allow.

5.1 Experimental Setup
We implemented NOVA-Fortis for Linux 4.10, and the source code is available at https://github.com/NVSL/linux-nova. We use the Intel Persistent Memory Emulation Platform (PMEP) [15] to emulate different types of NVMM and study their effects on NVMM file systems. PMEP supports configurable latencies and bandwidth for the emulated NVMM and emulates the clwb instruction with microcode. In our tests we configure the PMEP with 32 GB of DRAM and 64 GB of NVMM and choose two configurations for PMEP’s memory emulation system: we use the same read latency and bandwidth as DRAM to emulate fast NVDIMM-N [48], and we set the read latency to 300 ns and reduce the write bandwidth to 1/8th of DRAM’s to emulate slower PCM. For both configurations we set the clwb instruction latency to 40 ns.

We compare NOVA-Fortis against five other file systems. Ext4-DAX, xfs-DAX, and PMFS are the three DAX-enabled file systems.


Figure 8: File operation latency – NOVA-Fortis’ basic file operations are faster than competing file systems ex- cept in cases where the other file system provides weaker consistency guarantees and/or data protection (e.g., writes and overwrites vs. Ext4-DAX and xfs-DAX). Sacrificing these protections in NOVA-Fortis (“Relaxed mode”) improves performance.

Figure 9: NOVA-Fortis random read/write bandwidth on NVDIMM-N – Read bandwidth is similar across all the file systems except Btrfs, and NOVA-Fortis’ reliability mechanisms reduce its throughput by between 14% and 19%. For writes the cost of reliability is higher – up to 66% – but NOVA-Fortis still outperforms Btrfs by 33× with 16 threads.

None of them provides strong consistency guarantees (i.e., they do not guarantee that all operations are atomic), while NOVA does provide these guarantees. To compare to file systems with stronger guarantees, we also compare to ext4 in data journaling mode (ext4-dataj) and Btrfs running on the NVMM-based block device. Ext4 and xfs keep checksums for metadata, but they do not provide any recovery mechanisms for NVMM media errors or protection against stray writes.

5.2 Performance Impacts
To understand the impact of NOVA-Fortis’ reliability mechanisms, we begin by measuring the performance of individual mechanisms and basic file operations. Then we measure their impact on application-level performance.

We compare several versions of NOVA-Fortis: we start with our baseline, NOVA, and add metadata protection (“MP”), data protection (“DP”), and write protection (“WP”). “Relaxed mode” weakens consistency guarantees to improve performance and provides no data protection (Section 4.6).


[Figure 10 panels: Metadata Protection, Data Protection, Write Protection.]

Figure 10: NOVA-Fortis latencies for NVDIMM-N – Protecting file data is usually more expensive than protecting metadata because the cost of computing checksums and parity for data scales with access size.

5.3 Microbenchmarks

We evaluate basic file system operations: create, 4 KB append, 4 KB write, 512 B write, and 4 KB read. Figure 8 measures the latency of these operations with the NVDIMM-N configuration. Data for PCM shows similar trends.

Create is a metadata-only operation. NOVA is 1.9× to 5× faster than the existing file systems, and adding metadata protection increases the latency by 47% compared to the baseline. Append updates both metadata and data. Adding metadata and data protection increases the latency by 36% and 100%, respectively, and write protection increases the latency by an additional 22%. NOVA-Fortis with full protection (i.e., "w/ MP+DP+WP") is 59% slower than NOVA.

For overwrite, NOVA-Fortis performs copy-on-write for file data to provide data atomicity guarantees, so the latency is close to that of append. For 512 B overwrite, NOVA-Fortis has longer latency than the other DAX file systems since it requires reading and writing 4 KB. Full protection increases the latency by 2.2×. Relaxed mode is 3.8× faster than NOVA since it performs in-place updates. For read operations, data protection adds 70% overhead because it verifies the data checksum before returning data to the user.

Figure 10 breaks down the latency for NOVA-Fortis and its reliability mechanisms. For create, inode allocation and appending to the log combine to consume 48% of the latency, due to inode/log replication and checksum calculation. For 4 KB append and overwrite, data protection has almost the same latency as the memory copy itself (memcpy_nocache), and it accounts for 31% of the total latency for 512 B overwrite.

Figure 9 shows FIO [1] measurements of the multi-threaded read/write bandwidth of the file systems. For writes, NOVA-Fortis' relaxed mode achieves the highest bandwidth. With sixteen threads, metadata protection reduces NOVA-Fortis' bandwidth by 24% compared to the baseline, data protection reduces throughput by 37%, and enabling all of NOVA-Fortis' protection features reduces bandwidth by 66%. For reads, all the file systems scale well except Btrfs, and NOVA-Fortis' data protection incurs a 14% overhead with 16 threads due to checksum verification.

5.4 Macrobenchmarks

We use nine application-level workloads to evaluate NOVA-Fortis: four Filebench [57] workloads (fileserver, varmail, webproxy, and webserver), two key-value stores (RocksDB [17] and MongoDB [38]), the Exim email server [16], SQLite [53], and TPC-C running on Shore-MT [27]. Fileserver, varmail, webproxy, and Exim are metadata-intensive workloads, while the other workloads are data-intensive. Table 2 summarizes the workloads.

Figure 11 measures their performance on our five comparison file systems and several NOVA-Fortis configurations, normalized to NOVA's throughput on NVDIMM-N. NOVA-Fortis outperforms xfs-DAX and ext4-DAX by between 3% and 4.4×. PMFS shows similar performance to NOVA on data-intensive workloads, but NOVA-Fortis outperforms it by a wide margin (up to 350×) on metadata-intensive workloads.


Figure 11: Application performance on NOVA-Fortis – Reliability overheads and the benefits of relaxed mode have less impact on applications than on microbenchmarks (Figures 8 and 9). The change is especially small for read-intensive workloads (e.g., webproxy), while databases and key-value stores see greater differences. The numbers above the bars give NOVA throughput in operations per second.

Application            Data size   Notes
Filebench-fileserver   64 GB       R/W ratio: 1:2
Filebench-varmail      32 GB       R/W ratio: 1:1
Filebench-webproxy     32 GB       R/W ratio: 5:1
Filebench-webserver    32 GB       R/W ratio: 10:1
RocksDB                8 GB        db_bench's overwrite test
MongoDB                10 GB       YCSB's 50/50-read/write workload
Exim                   4 GB        Mail server
SQLite                 400 MB      Insert operation
TPC-C                  26 GB       The 'Payment' query

Table 2: Application benchmarks

Btrfs provides reliability features similar to NOVA-Fortis', but it is slower: NOVA-Fortis with all its protection features enabled outperforms it by between 26% and 42×. NOVA-Fortis achieves the largest improvements on metadata-intensive workloads, such as varmail and Exim.

Adding metadata protection reduces performance by between 0 and 9%, and using the WP bit costs an additional 0.1% to 13.4%. Enabling all protection features reduces performance by between 2% and 38%, with write-intensive workloads seeing the larger drops. The figure also shows that the performance benefits of giving up atomicity in file operations ("Relaxed mode") are modest – no more than 6.4%.

RocksDB sees the biggest performance loss with all of NOVA-Fortis' protections enabled because it issues many non-page-aligned writes, which result in extra reads, writes, and checksum calculations during copy-on-write. Relaxed mode avoids these overheads, so it improves performance for RocksDB more than for the other workloads.

For the PCM configuration, fileserver, webserver, and RocksDB show the largest performance drops compared to NVDIMM-N. Fileserver and RocksDB are write-intensive and saturate PCM's write bandwidth, while webserver is read-intensive and PCM's read latency limits its performance. Btrfs outperforms the DAX file systems on RocksDB because this workload does not call fsync frequently, allowing Btrfs to leverage the page cache.

Compared to the other file systems, NOVA-Fortis is more sensitive to NVMM performance because it has lower software overhead and so reveals the underlying NVMM performance more directly. Overall, NOVA outperforms the other DAX file systems by 1.75× on average, and adding full protection reduces performance by 12% on average compared to NOVA.
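The copy-on-write penalty behind RocksDB's slowdown (and relaxed mode's advantage) can be made concrete. The sketch below is our simplification with stubbed-out primitives, not the actual NOVA-Fortis write path: even a 512 B overwrite becomes a 4 KB copy plus recomputation of the page's protection information, while relaxed mode simply writes in place:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE 4096

    /* Stubs standing in for the NVMM allocator, checksum/parity
     * maintenance, and the atomic log update; error handling elided. */
    static uint8_t *nvmm_alloc_page(void) { return malloc(PAGE); }
    static void update_csum_and_parity(const uint8_t *page) { (void)page; }
    static void log_commit_atomic(uint64_t pgoff, const uint8_t *page)
    {
        (void)pgoff; (void)page;
    }

    /* Copy-on-write path: a sub-page overwrite reads the whole old page,
     * merges the new bytes, recomputes checksums and parity for 4 KB,
     * and atomically points the inode's log at the new page. */
    static void cow_overwrite(const uint8_t *old_page, uint64_t pgoff,
                              const void *src, size_t off, size_t len)
    {
        uint8_t *new_page = nvmm_alloc_page();

        memcpy(new_page, old_page, PAGE);    /* extra 4 KB read and write */
        memcpy(new_page + off, src, len);    /* the actual small update   */
        update_csum_and_parity(new_page);    /* cost scales with 4 KB     */
        log_commit_atomic(pgoff, new_page);  /* old page stays valid until
                                                this commit point         */
    }

    /* Relaxed mode: in-place update with no atomicity guarantee and no
     * data protection to recompute. */
    static void relaxed_overwrite(uint8_t *page, const void *src,
                                  size_t off, size_t len)
    {
        memcpy(page + off, src, len);
    }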


5.5 NVMM Storage Utilization

Protecting data integrity introduces storage overheads. Figure 12 shows the breakdown of space among (meta)data structures in an aged, 64 GB NOVA-Fortis file system. Overall, NOVA-Fortis devotes 14.8% of storage space to improving reliability. Of this, metadata redundancy accounts for 2.1%, and data protection occupies 12.7%.

Figure 12: NVMM storage utilization – File data: 82.4%; file parity: 11.1%; file checksums: 1.56%; primary logs: 2.0%; primary inodes: 0.1%; replica logs: 2.0%; replica inodes: 0.1%; unused: 0.75%. Extra storage required for reliability is highlighted to the right. Protecting data is more expensive than protecting metadata, consuming 12.7% of storage compared to just 2.1% for metadata.
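The parity that dominates this overhead is a simple RAID-4/5-style XOR code. A minimal sketch (our illustration; we assume eight 512 B strips per 4 KB page) shows why it is cheap to compute and how a single corrupt strip, identified by a checksum mismatch, is rebuilt:

    #include <stddef.h>
    #include <stdint.h>

    #define STRIP   512
    #define NSTRIPS 8   /* assumed: eight 512 B strips per 4 KB page */

    /* Compute the XOR parity strip for one page. */
    static void page_parity(const uint8_t page[NSTRIPS][STRIP],
                            uint8_t parity[STRIP])
    {
        for (size_t b = 0; b < STRIP; b++) {
            uint8_t x = 0;
            for (size_t s = 0; s < NSTRIPS; s++)
                x ^= page[s][b];
            parity[b] = x;
        }
    }

    /* Rebuild one corrupt strip by XORing the parity with the
     * surviving strips. */
    static void recover_strip(const uint8_t page[NSTRIPS][STRIP],
                              const uint8_t parity[STRIP],
                              size_t bad, uint8_t out[STRIP])
    {
        for (size_t b = 0; b < STRIP; b++) {
            uint8_t x = parity[b];
            for (size_t s = 0; s < NSTRIPS; s++)
                if (s != bad)
                    x ^= page[s][b];
            out[b] = x;
        }
    }

One parity strip per page costs 1/8 of the file data footprint, which is roughly the scale of the "File parity" slice in Figure 12.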

6 CONCLUSION

We have used NOVA-Fortis to explore the unique challenges that improving NVMM file system reliability presents. The solutions that NOVA-Fortis implements facilitate backups by taking consistent snapshots of the file system and provide significant protection against media errors and corruption due to software errors.

The extra storage required to implement these changes is modest, but their performance impact is significant for some applications. In particular, checking and maintaining checksums and parity for file data incurs a steep cost for both reads and writes, despite our use of very fast (XOR parity) and hardware-accelerated (CRC) mechanisms. Providing atomicity for unaligned writes is also a performance bottleneck.

These costs suggest that NVMM file systems should provide users with a range of protection options that trade off performance against the level of protection and consistency. For instance, NOVA-Fortis can selectively disable checksum-based file data protection and the write protection mechanism, and relaxed mode disables copy-on-write.

Making these policy decisions rationally is currently difficult due to a lack of two pieces of information. First, the rate of uncorrectable media errors in emerging NVMM technologies is not publicly known. Second, the frequency and size of scribbles have not been studied in detail. Without a better understanding of these areas, it is hard to determine whether the costs of these techniques are worth the benefits they provide.

Despite these uncertainties, NOVA-Fortis demonstrates that NVMM file systems can provide strong reliability guarantees while providing high performance and supporting DAX-style mmap(). It also makes a clear case for developing specialized file systems and reliability mechanisms for NVMM rather than blithely adapting existing schemes: the challenges NVMMs present are different, different solutions are appropriate, and systems built with these differences in mind can be very fast and highly reliable.

ACKNOWLEDGMENTS

This work was supported by STARnet, a Semiconductor Research Corporation program, sponsored by MARCO and DARPA, and by Intel Corporation, Toshiba Corporation, and NSF Award 1629395. We would like to thank our shepherd, Michael Swift, and the anonymous SOSP reviewers for their insightful comments and suggestions. We are also thankful to Subramanya R. Dulloor from Intel for his support and help with accessing PMEP.

REFERENCES

[1] Jens Axboe. 2017. Flexible I/O Tester. https://github.com/axboe/fio.
[2] Katelin Bailey, Luis Ceze, Steven D. Gribble, and Henry M. Levy. 2011. Operating System Implications of Fast, Cheap, Non-volatile Memory. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems (HotOS '13). USENIX Association. http://dl.acm.org/citation.cfm?id=1991596.1991599
[3] Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Garth R. Goodson, and Bianca Schroeder. 2008. An Analysis of Data Corruption in the Storage Stack. ACM Transactions on Storage 4, 3, Article 8 (Nov. 2008). https://doi.org/10.1145/1416944.1416947
[4] Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy, and Jiri Schindler. 2007. An Analysis of Latent Sector Errors in Disk Drives. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '07). ACM, 289–300. https://doi.org/10.1145/1254882.1254917
[5] L. N. Bairavasundaram, M. Rungta, N. Agrawal, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. M. Swift. 2008. Analyzing the Effects of Disk-Pointer Corruption. In 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). 502–511. https://doi.org/10.1109/DSN.2008.4630121
[6] Kumud Bhandari, Dhruva R. Chakrabarti, and Hans-J. Boehm. 2012. Implications of CPU Caching on Byte-addressable Non-volatile Memory Programming. HP Technical Report HPL-2012-236.
[7] Meenakshi Sundaram Bhaskaran, Jian Xu, and Steven Swanson. 2013. Bankshot: Caching Slow Storage in Fast Non-volatile Memory. In Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads (INFLOW '13). ACM, Article 1. https://doi.org/10.1145/2527792.2527793


(INFLOW ’13). ACM, New York, NY, USA, Article 1, 9 pages. USENIX Conference on File and Storage Technologies (FAST 17). https://doi.org/10.1145/2527792.2527793 USENIX Association, Santa Clara, CA, 149–166. [8] Jeff Bonwick and Bill Moore. 2007. ZFS: The Last Word inFile [20] Robin Harris. 2016. Windows leaps into the NVM Systems. (2007). revolution. (2016). http://www.zdnet.com/article/ [9] Adrian M. Caulfield, Todor I. Mollov, Louis Alex Eisner, Arup windows-leaps-into-the-nvm-revolution/. De, Joel Coburn, and Steven Swanson. 2012. Providing safe, [21] IBM. 1999. Chipkill Memory. (1999). http://www-05.ibm. user space access to fast, solid state disks. In Proceedings of the com/hu/termekismertetok/xseries/dn/chipkill.pdf. seventeenth international conference on Architectural Support [22] Intel. 2015. NVDIMM Namespace Specification. (2015). http: for Programming Languages and Operating Systems (ASPLOS //pmem.io/documents/NVDIMM_Namespace_Spec.pdf. XVII). ACM, New York, NY, USA, 387–400. https://doi.org/10. [23] Intel. 2016. Intel 64 and IA-32 Architectures Software 1145/2150976.2151017 Developer’s Manual, Volume 3, Chapter 15. (2016). [10] Dave Chinner. 2015. xfs: updates for 4.2-rc1. (2015). http: https://software.intel.com/sites/default/files/managed/a4/60/ //oss.sgi.com/archives/xfs/2015-06/msg00478.html. 325384-sdm-vol-3abcd.pdf, Version December 2016. [11] Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. [24] Intel. 2017. Intel Architecture Instruction Set Extensions Pro- Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson. gramming Reference. (2017). https://software.intel.com/sites/ 2011. NV-Heaps: Making Persistent Objects Fast and Safe default/files/managed/0d/53/319433-022.pdf. with Next-generation, Non-volatile Memories. In Proceed- [25] Intel. 2017. Intel Optane Memory. (2017). http://www. ings of the Sixteenth International Conference on Architectural intel.com/content/www/us/en/architecture-and-technology/ Support for Programming Languages and Operating Systems optane-memory.html. (ASPLOS ’11). ACM, New York, NY, USA, 105–118. https: [26] Intel. 2017. Intel ships first Optane memory modules for test- //doi.org/10.1145/1950365.1950380 ing. (2017). http://www.pcworld.com/article/3162177/storage/ [12] Jeremy Condit, Edmund B. Nightingale, Christopher Frost, intel-ships-first-optane-memory-modules-for-testing.html. Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. [27] Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia 2009. Better I/O through byte-addressable, persistent memory. Ailamaki, and Babak Falsafi. 2009. Shore-MT: A Scalable Stor- In Proceedings of the ACM SIGOPS 22nd Symposium on Operat- age Manager for the Multicore Era. In Proceedings of the 12th ing Systems Principles (SOSP ’09). ACM, New York, NY, USA, International Conference on Extending Technology: 133–146. https://doi.org/10.1145/1629575.1629589 Advances in Database Technology (EDBT ’09). ACM, New York, [13] Thanh Do, Tyler Harter, Yingchao Liu, Haryadi S. Gunawi, An- NY, USA, 24–35. https://doi.org/10.1145/1516360.1516365 drea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. [28] Takayuki Kawahara. 2011. Scalable Spin-Transfer Torque RAM HARDFS: Hardening HDFS with Selective and Lightweight Technology for Normally-Off Computing. Design & Test of Versioning. In Presented as part of the 11th USENIX Confer- Computers, IEEE 28, 1 (Jan 2011), 52–63. https://doi.org/10. ence on File and Storage Technologies (FAST 13). USENIX, San 1109/MDT.2010.97 Jose, CA, 105–118. 
[20] Robin Harris. 2016. Windows Leaps into the NVM Revolution. http://www.zdnet.com/article/windows-leaps-into-the-nvm-revolution/.
[21] IBM. 1999. Chipkill Memory. http://www-05.ibm.com/hu/termekismertetok/xseries/dn/chipkill.pdf.
[22] Intel. 2015. NVDIMM Namespace Specification. http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf.
[23] Intel. 2016. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3, Chapter 15. https://software.intel.com/sites/default/files/managed/a4/60/325384-sdm-vol-3abcd.pdf, December 2016.
[24] Intel. 2017. Intel Architecture Instruction Set Extensions Programming Reference. https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf.
[25] Intel. 2017. Intel Optane Memory. http://www.intel.com/content/www/us/en/architecture-and-technology/optane-memory.html.
[26] Intel. 2017. Intel Ships First Optane Memory Modules for Testing. http://www.pcworld.com/article/3162177/storage/intel-ships-first-optane-memory-modules-for-testing.html.
[27] Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. 2009. Shore-MT: A Scalable Storage Manager for the Multicore Era. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT '09). ACM, 24–35. https://doi.org/10.1145/1516360.1516365
[28] Takayuki Kawahara. 2011. Scalable Spin-Transfer Torque RAM Technology for Normally-Off Computing. IEEE Design & Test of Computers 28, 1 (Jan. 2011), 52–63. https://doi.org/10.1109/MDT.2010.97
[29] Ram Kesavan, Rohit Singh, Travis Grusecki, and Yuvraj Patel. 2017. Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL. In 15th USENIX Conference on File and Storage Technologies (FAST 17). USENIX Association.
[30] A. Kolli, J. Rosen, S. Diestelhorst, A. Saidi, S. Pelley, S. Liu, P. M. Chen, and T. F. Wenisch. 2016. Delegated Persist Ordering. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–13. https://doi.org/10.1109/MICRO.2016.7783761
[31] Harendra Kumar, Yuvraj Patel, Ram Kesavan, and Sumith Makam. 2017. High Performance Metadata Integrity Protection in the WAFL Copy-on-Write File System. In 15th USENIX Conference on File and Storage Technologies (FAST 17). USENIX Association, 197–212. https://www.usenix.org/conference/fast17/technical-sessions/presentation/kumar
[32] Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2009. Architecting Phase Change Memory as a Scalable DRAM Alternative. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09). ACM, 2–13. https://doi.org/10.1145/1555754.1555758


[33] P. J. Meaney, L. A. Lastras-Montaño, V. K. Papazova, E. Stephens, J. S. Johnson, L. C. Alves, J. A. O'Connor, and W. J. Clarke. 2012. IBM zEnterprise Redundant Array of Independent Memory Subsystem. IBM Journal of Research and Development 56, 1 (Jan. 2012), 43–53. https://doi.org/10.1147/JRD.2011.2177106
[34] Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu. 2015. A Large-Scale Study of Flash Memory Failures in the Field. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '15). ACM, 177–190. https://doi.org/10.1145/2745844.2745848
[35] J. Meza, Q. Wu, S. Kumar, and O. Mutlu. 2015. Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field. In 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 415–426. https://doi.org/10.1109/DSN.2015.57
[36] Micron. 2017. 3D XPoint Technology. http://www.micron.com/products/advanced-solutions/3d-xpoint-technology.
[37] Micron. 2017. Hybrid Memory: Bridging the Gap Between DRAM Speed and NAND Nonvolatility. http://www.micron.com/products/dram-modules/nvdimm.
[38] MongoDB, Inc. 2017. MongoDB. https://www.mongodb.com.
[39] Sanketh Nalli, Swapnil Haria, Mark D. Hill, Michael M. Swift, Haris Volos, and Kimberly Keeton. 2017. An Analysis of Persistent Memory Use with WHISPER. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, 135–148. https://doi.org/10.1145/3037697.3037730
[40] Iyswarya Narayanan, Di Wang, Myeongjae Jeon, Bikash Sharma, Laura Caulfield, Anand Sivasubramaniam, Ben Cutler, Jie Liu, Badriddine Khessib, and Kushagra Vaid. 2016. SSD Failures in Datacenters: What? When? And Why?. In Proceedings of the 9th ACM International Systems and Storage Conference (SYSTOR '16). ACM, Article 7. https://doi.org/10.1145/2928275.2928278
[41] Steven Pelley, Peter M. Chen, and Thomas F. Wenisch. 2014. Memory Persistency. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA '14). IEEE Press, 265–276. http://dl.acm.org/citation.cfm?id=2665671.2665712
[42] pmem.io. 2017. NVM Library. http://pmem.io/nvml.
[43] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2005. IRON File Systems. In The ACM Symposium on Operating Systems Principles (SOSP). ACM.
[44] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung, and C. H. Lam. 2008. Phase-change Random Access Memory: A Scalable Technology. IBM Journal of Research and Development 52, 4.5 (July 2008), 465–479. https://doi.org/10.1147/rd.524.0465
[45] Ohad Rodeh, Josef Bacik, and Chris Mason. 2013. BTRFS: The Linux B-Tree Filesystem. ACM Transactions on Storage 9, 3, Article 9 (Aug. 2013). https://doi.org/10.1145/2501620.2501623
[46] Mendel Rosenblum and John K. Ousterhout. 1991. The Design and Implementation of a Log-structured File System. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (SOSP '91). ACM, 1–15. https://doi.org/10.1145/121132.121137
[47] Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout. 2014. Log-structured Memory for DRAM-based Storage. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST '14). USENIX, 1–16. https://www.usenix.org/conference/fast14/technical-sessions/presentation/rumble
[48] Arthur Sainio. 2016. NVDIMM: Changes are Here So What's Next?. In In-Memory Computing Summit 2016.
[49] Bianca Schroeder, Sotirios Damouras, and Phillipa Gill. 2010. Understanding Latent Sector Errors and How to Protect Against Them. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST '10). USENIX Association. http://dl.acm.org/citation.cfm?id=1855511.1855517
[50] Bianca Schroeder and Garth A. Gibson. 2007. Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?. In USENIX Conference on File and Storage Technologies (FAST).
[51] Bianca Schroeder, Raghav Lagisetty, and Arif Merchant. 2016. Flash Reliability in Production: The Expected and the Unexpected. In 14th USENIX Conference on File and Storage Technologies (FAST 16). USENIX Association, 67–80. http://usenix.org/conference/fast16/technical-sessions/presentation/schroeder
[52] Bianca Schroeder, Eduardo Pinheiro, and Wolf-Dietrich Weber. 2009. DRAM Errors in the Wild: A Large-scale Field Study. In ACM SIGMETRICS.
[53] SQLite. 2017. SQLite. https://www.sqlite.org.
[54] Vilas Sridharan, Nathan DeBardeleben, Sean Blanchard, Kurt B. Ferreira, Jon Stearley, John Shalf, and Sudhanva Gurumurthi. 2015. Memory Errors in Modern Systems: The Good, The Bad, and The Ugly. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM.
[55] V. Sridharan and D. Liberty. 2012. A Study of DRAM Failures in the Field. In 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC). 1–11. https://doi.org/10.1109/SC.2012.13
[56] Dmitri B. Strukov, Gregory S. Snider, Duncan R. Stewart, and R. Stanley Williams. 2008. The Missing Memristor Found. Nature 453, 7191 (2008), 80–83.


[57] Vasily Tarasov, Erez Zadok, and Spencer Shepler. 2016. Filebench: A Flexible Framework for File System Benchmarking. USENIX ;login: 41 (2016).
[58] Stephen C. Tweedie. 1998. Journaling the Linux ext2fs Filesystem. In LinuxExpo '98: Proceedings of the 4th Annual Linux Expo.
[59] UEFI Forum. 2017. Advanced Configuration and Power Interface Specification. http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf.
[60] Shivaram Venkataraman, Niraj Tolia, Parthasarathy Ranganathan, and Roy Campbell. 2011. Consistent and Durable Data Structures for Non-volatile Byte-addressable Memory. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST '11). USENIX Association.
[61] Haris Volos, Andres Jaan Tack, and Michael M. Swift. 2011. Mnemosyne: Lightweight Persistent Memory. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11). ACM.
[62] Dejan Vučinić, Qingbo Wang, Cyril Guyot, Robert Mateescu, Filip Blagojević, Luiz Franca-Neto, Damien Le Moal, Trevor Bunker, Jian Xu, Steven Swanson, and Zvonimir Bandić. 2014. DC Express: Shortest Latency Protocol for Reading Phase Change Memory over PCI Express. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST '14). USENIX, 309–315. https://www.usenix.org/conference/fast14/technical-sessions/presentation/vucinic
[63] Matthew Wilcox. 2014. Add Support for NV-DIMMs to Ext4. https://lwn.net/Articles/613384/.
[64] Jian Xu and Steven Swanson. 2016. NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories. In 14th USENIX Conference on File and Storage Technologies (FAST 16). USENIX Association, 323–338. https://www.usenix.org/conference/fast16/technical-sessions/presentation/xu
[65] Jisoo Yang, Dave B. Minturn, and Frank Hady. 2012. When Poll Is Better than Interrupt. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST '12). USENIX. http://dl.acm.org/citation.cfm?id=2208461.2208464
[66] Yupu Zhang, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2010. End-to-end Data Integrity for File Systems: A ZFS Case Study. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST '10). USENIX Association. http://dl.acm.org/citation.cfm?id=1855511.1855514
[67] Jishen Zhao, Sheng Li, Doe Hyun Yoon, Yuan Xie, and Norman P. Jouppi. 2013. Kiln: Closing the Performance Gap Between Systems With and Without Persistence Support. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, 421–432. https://doi.org/10.1145/2540708.2540744
