GassyFS: An In-Memory File System That Embraces Volatility

Noah Watkins, Michael Sevilla, Carlos Maltzahn
University of California, Santa Cruz
{jayhawk,msevilla,[email protected]}

Abstract

For many years storage systems were designed for slow storage devices. However, the average speed of these devices has been growing exponentially, making traditional storage system designs increasingly inadequate. Sufficient time must be dedicated to redesigning future storage systems or they might never adequately support fast storage devices using traditional storage semantics. In this paper, we argue that storage systems should expose a persistence-performance trade-off to applications that are willing to explicitly take control over durability. We describe our prototype system, GassyFS, which stores file system data in distributed remote memory and provides support for checkpointing file system state. As a consequence of this design, we explore a spectrum of mechanisms and policies for efficiently sharing data across file system boundaries.

Figure 1: GassyFS is comparable to tmpfs for genomic benchmarks that fit on one node, yet it is designed to scale beyond one node. Its simplicity and well-defined workflows for its applications let us explore new techniques for sharing data.

1 Introduction

A wide range of data-intensive applications that access large working sets managed in a file system can easily become bound by the cost of I/O. As a workaround, practitioners exploit the availability of inexpensive RAM to significantly accelerate application performance using in-memory file systems such as tmpfs [10]. Unfortunately, durability and scalability are major challenges that users must address on their own.

While the use of tmpfs is a quick and effective way to accelerate application performance, in order to bound the data loss inherent in memory-based storage, practitioners modify application workflows to bracket periods of high-performance I/O with explicit control over durability (e.g., checkpointing), creating a trade-off between persistence and performance. But when single-node scalability limits are reached, applications face a choice: either adopt a distributed algorithm through an expensive development effort, or use a transparent scale-out solution that minimizes modifications to applications. In this paper we consider the latter solution, and observe that existing scale-out storage systems do not provide the persistence-performance trade-off despite applications already using a homegrown strategy.

Figure 1 shows how our system, GassyFS, compares to tmpfs when the working set for our genomics benchmarks fits on a single node. The figure demonstrates the types of workflows we target: the user ingests BAM input files (10GB, 40MB)¹, computes results, and checkpoints the results back into durable storage (24GB) using cp and sync. The slowdowns reported in Figure 1 demonstrate that GassyFS has performance comparable to tmpfs, in our limited tests, with the added benefit that it is designed to scale beyond one node.

¹ The workload uses SAMtools, a popular genomics tool kit. The jobs processed are (input/output sizes): sort (10GB/10GB), shuffle (10GB/52MB), convert into VCF (40MB/160MB), merge (20GB/13.4GB).
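To make the bracketing pattern above concrete, the following sketch shows one way such a workflow is typically scripted. It is illustrative only: the mount points, file names, and the single SAMtools invocation are assumptions, not the paper's actual benchmark scripts.

```python
# Illustrative sketch (not from the paper) of the bracketing pattern described
# above: stage data into an in-memory file system, run the compute phase at RAM
# speed, then explicitly persist the results. All paths and the SAMtools command
# are hypothetical placeholders.
import shutil
import subprocess

SCRATCH = "/mnt/tmpfs/job"    # assumed tmpfs (or GassyFS) mount point
DURABLE = "/data/results"     # assumed durable storage (e.g., NFS or local disk)

# 1. Ingest: copy the BAM input into volatile, high-speed storage.
shutil.copy("/data/inputs/sample.bam", f"{SCRATCH}/sample.bam")

# 2. Compute: run the I/O-heavy phase entirely against in-memory storage.
subprocess.run(
    ["samtools", "sort", "-o", f"{SCRATCH}/sample.sorted.bam",
     f"{SCRATCH}/sample.bam"],
    check=True,
)

# 3. Persist: explicitly copy the results back to durable storage.
shutil.copy(f"{SCRATCH}/sample.sorted.bam", f"{DURABLE}/sample.sorted.bam")
subprocess.run(["sync"], check=True)  # force dirty data to stable storage (cp + sync)
```

On a single node this is essentially the tmpfs workflow that Figure 1 benchmarks; GassyFS aims to preserve the same pattern while letting the compute phase span many nodes.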
Scaling the workload to volumes that do not fit into RAM without a distributed solution limits the scope of solutions to expensive, emerging byte-addressable media. One approach is to utilize the caching features found in some distributed storage systems. These systems are often composed from commodity hardware with many cores, lots of memory, high-performance networking, and a variety of storage media with differing capabilities. Applications that wish to take advantage of the performance heterogeneity of media (e.g., HDD vs. SSD) may use caching features when available, but such systems tend to use coarse-grained abstractions, impose redundancy costs at all layers, and do not expose abstractions for volatile storage; memory is only used internally. These limitations prevent applications that explicitly manage durability from scaling through the use of existing systems. Furthermore, existing systems originally designed for slow media may not be able to fully exploit the speed of RAM over networks that offer direct memory access. What is needed are scale-out solutions that allow applications to fine-tune their durability requirements to achieve higher performance when possible.

Existing solutions center on large-volume, high-performance storage constructed from off-the-shelf hardware and software components. For instance, iSCSI extensions for RDMA can be combined with remote RAM disks to provide a high-performance volatile block store formatted with a local file system or used as a block cache. While easy to assemble, this approach requires coarse-grained data management, and I/O must be funneled through the host for it to be made persistent. Non-volatile memories with RAM-like performance are emerging, but software and capacity remain limiting factors. While such solutions are easy to set up and scale, the problem of forcing all persistence-related I/O through a single host is exacerbated as the volume of managed data increases. The inability of these solutions to exploit application-driven persistence forces us to look to other architectures.

In the remainder of this paper we present a prototype file system called GassyFS, a distributed in-memory file system with explicit checkpointing for persistence. We view GassyFS as a stop-gap solution for providing applications with access to the very high-performance, large-volume persistent memories that are projected to become available in the coming years, allowing applications to begin exploring the implications of a changing I/O performance profile. As a result of including support for checkpointing in our system, we encountered a number of ways that we envision data being shared without crossing file system boundaries, by providing data management features for attaching shared checkpoints and external storage systems, allowing data to reside within the context of GassyFS as long as possible.

2 Architecture

Figure 2: GassyFS has facilities for explicitly managing persistence to different storage targets. A checkpointing infrastructure gives GassyFS flexible policies for persisting namespaces and federating data.

The architecture of GassyFS is illustrated in Figure 2. The core of the file system is a user-space library that implements a POSIX file interface. File system metadata is managed locally in memory, and file data is distributed across a pool of network-attached RAM managed by worker nodes and accessible over RDMA or Ethernet. Applications access GassyFS through a standard FUSE mount, or may link directly to the library to avoid any overhead that FUSE may introduce.
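As a rough illustration of the FUSE access path just described, the sketch below performs ordinary POSIX file I/O against an assumed GassyFS mount point; nothing GassyFS-specific appears in the application code, which is the point. The alternative path, linking directly against libgassy, is not shown because its API is not described in this excerpt.

```python
# Minimal sketch of the FUSE access path from Section 2. The mount point is an
# assumption; once GassyFS is mounted there, applications use plain POSIX calls.
# File data lands in the pool of network-attached RAM, while file system
# metadata is managed locally in memory.
import os

MOUNT = "/mnt/gassyfs"  # assumed GassyFS FUSE mount point

path = os.path.join(MOUNT, "dataset", "part-000.bin")
os.makedirs(os.path.dirname(path), exist_ok=True)

# Write a block through the mount, then read it back.
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "rb") as f:
    block = f.read(4096)

print(len(block), "bytes read back through the FUSE mount")
```

Because the access path is plain POSIX, existing tools such as SAMtools can run against GassyFS unmodified, which is how the system minimizes changes to applications.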
By default, all data in GassyFS is non-persistent. That is, all metadata and file data are kept in memory, and any node failure will result in data loss. In this mode GassyFS can be thought of as a high-volume tmpfs that can be instantiated and destroyed as needed, or kept mounted and used by applications with multiple stages of execution. The differences between GassyFS and tmpfs become apparent when we consider how users deal with durability concerns.

At the bottom of Figure 2 is a set of storage targets that can be used for managing persistent checkpoints of GassyFS. We discuss durability in more detail in the next section. Finally, we support a form of file system federation that allows checkpoint content to be accessed remotely to enable efficient data sharing between users over a wide-area network. Federation and data sharing are discussed in Section 6.

3 Durability

Persistence is achieved in GassyFS using a checkpoint feature that materializes a consistent view of the file system. In contrast to explicitly copying data into and out of tmpfs, users instruct GassyFS to generate a checkpoint of a configurable subset of the file system. Each checkpoint may be stored on a variety of supported backends, such as a local disk or SSD, a distributed storage system such as Ceph, RADOS, or GlusterFS, or a cloud-based service such as Amazon S3.

A checkpoint is saved using a copy-on-write mechanism that generates a new, persistent version of the file system. Users may forego the ability to time travel and use checkpointing only to recover from local failures. However, generating checkpoint versions allows for robust sharing of checkpoints while a file system is actively being used.

GassyFS takes advantage of its distributed design to avoid the single-node I/O bottleneck that arises when persisting data stored within tmpfs. Instead, each GassyFS worker node performs checkpoint I/O in parallel with all other nodes, storing data to a locally attached disk or to a networked storage system that supports parallel I/O. Next we discuss the modes of operation and how the content of a checkpoint is controlled.

Figure 3: Namespace checkpointing modes. Top: full/partial checkpoints. Middle: checkpointing attached file systems (e.g., connected with POSIX). Bottom: checkpointing inode types (e.g., code to access remote data).

4 Extensible Namespaces

In the previous section we discussed how persistence is achieved in GassyFS using a basic versioning checkpoint scheme. [...] since the mount is not managed by GassyFS, it has no ability to integrate the external content or maintain references within a checkpoint. And second, all I/O to the external mount must flow through the host, preventing [...]
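To ground the checkpoint model of Section 3 and Figure 3 (full or partial checkpoints of a configurable subtree, written in parallel by every worker node), the sketch below shows the general shape of such an operation. GassyFS's actual interface is not shown in this excerpt, so every name here (CheckpointSpec, write_shard, the backend labels) is a hypothetical stand-in used only to illustrate the structure.

```python
# Hypothetical sketch of the per-node parallel checkpoint flow from Section 3.
# Each worker node persists its own shard of the selected subtree concurrently,
# avoiding the single-host I/O funnel of cp + sync on tmpfs. None of these names
# come from GassyFS itself.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class CheckpointSpec:
    subtree: str   # configurable subset of the namespace, e.g. "/results"
    backend: str   # e.g. "local-ssd", "ceph", "s3" (illustrative labels)
    version: int   # copy-on-write checkpoints produce new versions


def write_shard(node: str, spec: CheckpointSpec) -> str:
    # Placeholder for the per-node work: serialize this node's portion of the
    # subtree and push it to the backend (local disk, RADOS, S3, ...).
    return f"{node}: {spec.subtree} v{spec.version} -> {spec.backend}"


def checkpoint(nodes: list[str], spec: CheckpointSpec) -> list[str]:
    # All worker nodes perform checkpoint I/O concurrently.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(lambda n: write_shard(n, spec), nodes))


if __name__ == "__main__":
    spec = CheckpointSpec(subtree="/results", backend="ceph", version=3)
    for line in checkpoint(["node0", "node1", "node2"], spec):
        print(line)
```

The important property is that no single host funnels the checkpoint traffic; each node writes its shard directly to the backend, which is what distinguishes GassyFS checkpoints from copying data out of tmpfs.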