NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System


Jian Xu* (University of California, San Diego)
Lu Zhang* (University of California, San Diego)
Amirsaman Memaripour (University of California, San Diego)
Akshatha Gangadharaiah (University of California, San Diego)
Amit Borase (University of California, San Diego)
Tamires Brito Da Silva (University of California, San Diego)
Steven Swanson (University of California, San Diego)
Andy Rudoff (Intel Corporation)

*The first two authors contributed equally to this work.

ABSTRACT

Emerging fast, persistent memories will enable systems that combine conventional DRAM with large amounts of non-volatile main memory (NVMM) and provide huge increases in storage performance. Fully realizing this potential requires fundamental changes in how system software manages, protects, and provides access to data that resides in NVMM. We address these needs by describing an NVMM-optimized file system called NOVA-Fortis that is both fast and resilient in the face of corruption due to media errors and software bugs. We identify and propose solutions for the unique challenges in adding fault tolerance to an NVMM file system, adapt state-of-the-art reliability techniques to an NVMM file system, and quantify the performance and storage overheads of these techniques. We find that NOVA-Fortis' reliability features consume 14.8% of the storage for redundancy and reduce application-level performance by between 2% and 38% compared to the same file system with the features removed. NOVA-Fortis outperforms DAX-aware file systems without reliability features by 1.5× on average. It outperforms reliable, block-based file systems running on NVMM by 3× on average.

CCS CONCEPTS

• Information systems → Phase change memory; Mirroring; RAID; Point-in-time copies;

KEYWORDS

Persistent Memory, Non-volatile Memory, Direct Access, DAX, File Systems, Reliability

ACM Reference Format:
Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Steven Swanson, and Andy Rudoff. 2017. NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System. In Proceedings of ACM SIGOPS 26th Symposium on Operating Systems Principles (SOSP '17). ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3132747.3132761

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). SOSP '17, October 28, 2017, Shanghai, China. © 2017 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5085-3/17/10. https://doi.org/10.1145/3132747.3132761

1 INTRODUCTION

Fast, persistent memory technologies (e.g., battery-backed NVDIMMs [37] or Intel and Micron's 3D XPoint [36]) will enable computer systems with expansive memory systems that combine volatile and non-volatile memories. These hybrid memory systems offer the promise of dramatic increases in storage performance.

Integrating NVMMs into computer systems presents a host of interesting challenges. The most pressing of these focus on how we should redesign existing software components (e.g., file systems) to accommodate and exploit the different performance characteristics, interfaces, and semantics that NVMMs provide.

Several groups have proposed new file systems [12, 15, 64] designed specifically for NVMMs, and several Windows and Linux file systems now include at least rudimentary support for them [10, 20, 63]. These file systems provide significant performance gains for data access and support "direct access" (or DAX-style) mmap() that allows applications to access a file's contents directly using load and store instructions, a likely "killer app" for NVMMs.
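Because DAX hands raw persistent memory to the application, using it correctly means pairing plain stores with explicit flushing. The fragment below is a minimal sketch of that access model, not code from the paper: the mount point /mnt/pmem is hypothetical, it assumes a DAX-capable file system and a kernel new enough to support MAP_SYNC, and it uses msync() as a portable stand-in for the CLWB-plus-fence sequence a tuned application (or a library such as PMDK) would issue.

/*
 * Minimal sketch of DAX-style access. Assumptions: a DAX-capable file
 * system mounted at the hypothetical /mnt/pmem, and Linux 4.15+ for
 * MAP_SYNC. This is an illustration, not NOVA-Fortis code.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE          /* older headers may lack these */
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

int main(void)
{
    int fd = open("/mnt/pmem/log.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, 4096) != 0)
        return 1;

    /* MAP_SYNC is only honored on DAX file systems; it guarantees the
     * mapping needs no later fsync() for file metadata. */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    strcpy(p, "hello, NVMM");        /* ordinary store instructions */

    /* The application, not the file system, orders this store to
     * persistence; msync() is the portable flush (tuned code would
     * use CLWB plus a fence instead). */
    msync(p, 4096, MS_SYNC);

    munmap(p, 4096);
    close(fd);
    return 0;
}

With MAP_SYNC the kernel promises the file's metadata is durable before the mapping is usable, so after a store the application only owes a cache-line flush and fence (approximated here by msync()) to make its data persistent.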
Despite these NVMM-centric performance improvements, none of these file systems provide the data protection features necessary to detect and correct media errors, protect against data corruption due to misbehaving code, or perform consistent backups of the NVMM's contents. File system stacks in wide use (e.g., ext4 running atop LVM, Btrfs, and ZFS) provide some or all of these capabilities for block-based storage. If users are to trust NVMM file systems with critical data, they will need these features as well.

From a reliability perspective, there are four key differences between conventional block-based file systems and NVMM file systems.

First, the memory controller reports persistent memory media errors as non-maskable interrupts rather than as error codes from a block driver. Further, the granularity of errors is smaller (e.g., a cache line) and varies depending on the memory device.

Second, persistent memory file systems must support DAX-style memory mapping, which maps persistent memory pages directly into the application's address space. DAX is the fastest way to access persistent memory since it eliminates all operating system and file system code from the access path. However, it means a file's contents can change without the file system's knowledge, something that is not possible in a block-based file system.

Third, the entire file system resides in the kernel's address space, vastly increasing its vulnerability to "scribbles" – errant stores from misbehaving kernel code.

Fourth, NVMMs are vastly faster than block-based storage devices. This means that the trade-offs block-based file systems make between reliability and performance need a thorough re-evaluation.

We explore the impact of these differences on file system reliability mechanisms by building NOVA-Fortis, an NVMM file system that adds fault tolerance to NOVA [64] by incorporating snapshots, replication, checksums, and RAID-4 parity protection.

In applying these techniques to an NVMM file system, we have developed the principle of caveat DAXor ("let the DAXer beware"): applications that use DAX-style mmap() must accept responsibility for protecting their data's integrity and consistency.

Protecting and guaranteeing consistency for DAX-mmap()'d data is complex and challenging. The file system cannot fix that and should not try. Instead, the file system should studiously avoid imposing any performance overhead on DAX-style access, except when absolutely necessary. For data that is not mapped, the file system should retain responsibility for data integrity.

Caveat DAXor has two important consequences for NOVA-Fortis' design. The first applies to most other NVMM file systems as well: to maximize performance, applications are responsible for enforcing ordering on stores to mapped data to ensure consistency in the face of system failure.

The second consequence arises because NOVA-Fortis uses parity to protect file data from corruption. Keeping error-correction information up to date for mapped data would require interposing on every store, imposing a significant performance overhead. Instead, NOVA-Fortis requires applications to take responsibility for protecting data while it is mapped, and it restores parity protection when the memory is unmapped.
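The sketch below shows one way an application could honor that contract; it is our illustration, not NOVA-Fortis code, and every name in it (mapped_region, region_seal, region_ok) is invented. While the file is mapped, the application maintains its own checksum over the data; sealing the region before munmap() leaves quiescent, verifiable pages behind, at which point the file system can restore its parity protection.

/*
 * Illustrative sketch only: an application-managed checksum for a
 * DAX-mapped region. All names and the layout are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct mapped_region {          /* assumed to start a page-aligned mapping */
    uint64_t checksum;          /* app-managed, covers data[] */
    uint8_t  data[4096 - sizeof(uint64_t)];
};

/* Fletcher-style checksum; any strong checksum would do here. */
static uint64_t app_checksum(const uint8_t *buf, size_t len)
{
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += buf[i];
        b += a;
    }
    return (b << 32) | (a & 0xffffffffu);
}

/* Seal before munmap(): once the pages are quiescent and flushed, the
 * file system can rebuild parity over them. */
static void region_seal(struct mapped_region *r)
{
    r->checksum = app_checksum(r->data, sizeof(r->data));
    msync(r, sizeof(*r), MS_SYNC);
}

/* Verify after re-mapping (e.g., post-crash): detects corruption that
 * occurred while the file system had no view of the stores. */
static int region_ok(const struct mapped_region *r)
{
    return r->checksum == app_checksum(r->data, sizeof(r->data));
}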
We quantify the performance and storage overheads of NOVA-Fortis' fault-tolerance mechanisms and of these design decisions, and we evaluate their effectiveness at preventing corruption of both file system metadata and file data.

We make the following contributions:

(1) We identify the unique challenges that the caveat DAXor principle presents to building a fault-tolerant NVMM file system.
(2) We describe a fast replication algorithm called Tick-Tock for NVMM data structures that combines atomic update with error detection and recovery.
(3) We adapt state-of-the-art techniques for data protection to work in NOVA-Fortis and to accommodate DAX-style mmap().
(4) We quantify NOVA-Fortis' vulnerability to scribbles and develop techniques to reduce this vulnerability.
(5) We quantify the performance and storage overheads of NOVA-Fortis' data protection mechanisms.

We find that the extra storage NOVA-Fortis needs to provide fault tolerance consumes 14.8% of file system space and reduces application-level performance by between 2% and 38% compared to NOVA. NOVA-Fortis outperforms DAX-aware file systems without reliability features by 1.5× on average. It outperforms reliable, block-based file systems running on NVMM by 3× on average.

To describe NOVA-Fortis, we start by providing a brief primer on NVMM's implications for system designers, existing NVMM file systems, key issues in file system reliability, and the NOVA file system (Section 2). Then, we describe NOVA-Fortis' snapshot and (meta)data protection mechanisms (Sections 3 and 4). Section 5 evaluates these mechanisms, and Section 6 presents our conclusions.

2 BACKGROUND

NOVA-Fortis targets memory systems that include emerging non-volatile memory technologies along with DRAM. This section first provides a brief survey of NVMM technologies and the opportunities and challenges they present. Then we describe recent work on NVMM file systems and discuss key issues in file system reliability. Finally, we provide a brief primer on NOVA.

2.1 Non-volatile Memory Technologies

Modern server platforms support NVMM in the form of NVDIMMs [22, 37], and the Linux kernel includes low-level drivers for identifying physical address regions that are non-volatile.

2.2 NVMM File Systems and DAX

Several groups have designed NVMM file systems [12, 14, 15, 63, 64] that address the unique challenges that NVMMs' performance and byte-addressable interface present. One of these, NOVA, is the basis for NOVA-Fortis, and we describe it in more detail in Section 2.4.

NVMMs' low latencies make software efficiency much more important than in block-based storage devices [7, 9, 62, 65].
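That last observation is easy to make concrete. The harness below is a back-of-the-envelope sketch of ours, not a benchmark from the paper: it contrasts reading a page through the kernel with pread() against copying the same page out of a DAX mapping with ordinary loads, which skips the syscall and file-system path entirely. The path /mnt/pmem/sample.dat is hypothetical, and a serious harness would pin the CPU and defeat compiler elision of the copy loop.

/*
 * Back-of-the-envelope sketch (not from the paper): syscall-path reads
 * vs. direct loads from a DAX mapping. /mnt/pmem/sample.dat is a
 * hypothetical file on a DAX mount.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    enum { LEN = 4096, ITERS = 100000 };
    static char buf[LEN];

    int fd = open("/mnt/pmem/sample.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, LEN) != 0)
        return 1;

    char *p = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    double t0 = now_sec();
    for (int i = 0; i < ITERS; i++)      /* kernel + file system path */
        pread(fd, buf, LEN, 0);
    double t1 = now_sec();
    for (int i = 0; i < ITERS; i++)      /* plain loads, no kernel */
        memcpy(buf, p, LEN);
    double t2 = now_sec();

    /* buf[0] is printed so the copies stay observable. */
    printf("pread: %.0f ns/op, DAX load: %.0f ns/op (buf[0]=%d)\n",
           (t1 - t0) / ITERS * 1e9, (t2 - t1) / ITERS * 1e9, buf[0]);
    return 0;
}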