Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support


Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*
Florida State University*, Lawrence Livermore National Laboratory+

Outline
• Background & Motivation
• Design
• Performance Evaluation
• Conclusion

Introduction
• High-performance computing (HPC) systems need efficient file systems to support large-scale scientific applications.
  – Different file systems are used for different kinds of data within a single job.
  – Both kernel- and user-level file systems can be used by applications.
  – Because kernel-level file systems suffer from development complexity, reliability, and portability issues, user-level file systems are increasingly leveraged for special-purpose I/O workloads.
• Filesystem in Userspace (FUSE)
  – A software interface for Unix-like operating systems.
  – It allows non-privileged users to create their own file systems without modifying kernel code.
  – The user-defined file system runs as a separate process in user space.
  – Examples: SSHFS, the GlusterFS client, FusionFS (BigData'14).

How Does FUSE Work?
• Execution path of a function call:
  – The request is sent to the user-level file system process: application program → VFS → FUSE kernel module → user-level file system process.
  – The data is returned to the application program: user-level file system process → FUSE kernel module → VFS → application program.
[Diagram: the application program and the user-level file system run in user space; the VFS, FUSE kernel module, Ext4, page cache, and storage device sit in kernel space.]
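The slides describe the FUSE callback model only at the level of the request path, so the following is a minimal, self-contained sketch of a user-level file system written against the FUSE high-level API (assuming libfuse 2.x, i.e. FUSE_USE_VERSION 26). It is not part of Direct-FUSE or the paper; it only illustrates that the file system is an ordinary user-space process whose callbacks are reached through the VFS and the FUSE kernel module as shown above.

```c
/* Minimal illustrative FUSE file system (high-level API, libfuse 2.x).
 * Not part of Direct-FUSE; a sketch of the callback model described above.
 * Build (assuming pkg-config and libfuse 2.x are installed):
 *   gcc hello_fuse.c -o hello_fuse `pkg-config fuse --cflags --libs`
 * Run:   ./hello_fuse /tmp/mnt
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

static const char *hello_path = "/hello";
static const char *hello_str  = "Hello from a user-level file system\n";

/* Each callback runs in the user-level file system process; every call
 * arrives via the VFS and the FUSE kernel module, as in the slide above. */
static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t)strlen(hello_str);
    } else {
        return -ENOENT;
    }
    return 0;
}

static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void)offset; (void)fi;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);   /* expose "hello" */
    return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    (void)fi;
    size_t len = strlen(hello_str);
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t)offset >= len)
        return 0;
    if (offset + size > len)
        size = len - (size_t)offset;
    memcpy(buf, hello_str + offset, size);
    return (int)size;
}

static struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* fuse_main() mounts the file system and enters the FUSE event loop. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Once mounted (e.g., ./hello_fuse /tmp/mnt), every read() an application issues on /tmp/mnt/hello pays the user-kernel mode switches, context switches, and memory copies that the next slides quantify.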
FUSE File System vs. Native File System

                                FUSE file system    Native file system
  # User-kernel mode switches          4                    2
  # Context switches                   2                    0
  # Memory copies                      2                    1

Number of Context Switches & I/O Bandwidth
• The complexity added to the FUSE file system execution path degrades I/O bandwidth.
  – tmpfs: a file system that stores data in volatile memory.
  – FUSE-tmpfs: a FUSE file system deployed on top of tmpfs.
  – The dd micro-benchmark and the perf profiling tool are used to gather the I/O bandwidth and the number of context switches.
  – Experiment method: continually issue 1000 writes.

  Block size   Write bandwidth,     Write bandwidth,   # Context switches,   # Context switches,
  (KB)         FUSE-tmpfs (MB/s)    tmpfs (GB/s)       FUSE-tmpfs            tmpfs
  4            163                  1.3                1012                  7
  16           372                  1.6                1012                  7
  64           519                  1.7                1012                  7
  128          549                  2.0                1012                  7
  256          569                  2.4                2012                  7

Breakdown of Metadata & Data Latency
• The actual file system operations (i.e., metadata or data operations) occupy only a small fraction of the total execution time.
  – Tests are run on tmpfs and FUSE-tmpfs.
  – Real operation (metadata): the time spent performing the operation itself.
  – Data movement: the time spent on the actual write within a complete write call.
  – Overhead: everything besides the above two, e.g., the time spent on context switches.
[Fig. 1: Time expense in metadata operations (create, close) on tmpfs and FUSE-tmpfs, latency in ns]
[Fig. 2: Time expense in data operations on tmpfs and FUSE-tmpfs for transfer sizes of 1 KB to 256 KB, latency in ns]

Existing Solution and Our Approach
• How can the overheads from FUSE be reduced?
  – Existing solution: build an independent user-space library that avoids going through the kernel (e.g., IndexFS (SC'14), FusionFS).
  – However, this approach cannot support multiple FUSE libraries with distinct file paths and file descriptors.
• We propose Direct-FUSE to support multiple backend I/O services for one application.
  – We adapted libsysio to our purpose in Direct-FUSE.
  – libsysio, developed by the Scalability team at Sandia National Laboratories, provides POSIX-like file I/O and name space support for remote file systems from an application's user-level address space.

Outline: Design

The Overview of Direct-FUSE
• Direct-FUSE consists of three main components:
  1. Adapted-libsysio
     – Intercepts file paths and file descriptors to identify the backend service.
     – Simplifies the metadata and data execution paths of the original libsysio.
  2. Lightweight-libfuse (not the real libfuse)
     – Abstracts file system operations from backend services into unified APIs.
  3. Backend services
     – Provide the defined file system operations (e.g., FusionFS).
[Diagram: the application program calls Direct-FUSE (adapted-libsysio plus lightweight-libfuse), which dispatches to backend services such as FUSE-Ext4 and the FusionFS client, backed by Ext4 and FusionFS servers.]

Path and File Descriptor Operations
• To facilitate the interception of file system operations for multiple backends, the operations are split into two categories (a sketch of this dispatch appears below):
  1. File path operations
     i. Intercept the prefix and path (e.g., sshfs:/sshfs/test.txt) and return the mount information.
     ii. Look up the corresponding inode based on the mount information, and redirect to the backend-defined operations.
  2. File descriptor operations
     i. Find the open-file record for the given file descriptor; the open-file record contains pointers to the inode, the current stream position, etc.
     ii. Redirect to the backend-defined operations based on the inode information in the open-file record.
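The slides do not show code for this dispatch, so the following is only a minimal sketch of how a prefix-keyed mount table and open-file records could redirect calls to per-backend operation tables. Every name in it (df_backend, df_open, df_read, the toy "mem:" backend) is hypothetical and invented for illustration; none of these are actual Direct-FUSE or libsysio APIs.

```c
/* Illustrative sketch of prefix-based backend dispatch, in the spirit of the
 * "Path and File Descriptor Operations" slide. All identifiers are made up
 * for this example and are not part of the real Direct-FUSE code base. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Unified operation table, as abstracted by the lightweight-libfuse layer. */
struct df_backend {
    const char *prefix;                        /* e.g., "sshfs:" or "mem:" */
    int  (*open)(const char *path, int flags);
    long (*read)(int fd, char *buf, size_t size, long off);
};

/* Open-file record kept by the dispatch layer: fd -> backend, position. */
struct df_open_file { const struct df_backend *be; int bfd; long pos; };

static const struct df_backend *mount_table[8];   /* prefix -> backend */
static int n_backends;
static struct df_open_file open_files[64];

/* File path operation: match the prefix, then redirect to that backend. */
static int df_open(const char *full_path, int flags)
{
    for (int i = 0; i < n_backends; i++) {
        size_t len = strlen(mount_table[i]->prefix);
        if (strncmp(full_path, mount_table[i]->prefix, len) == 0) {
            int fd = mount_table[i]->open(full_path + len, flags);
            if (fd >= 0 && fd < 64)
                open_files[fd] = (struct df_open_file){ mount_table[i], fd, 0 };
            return fd;
        }
    }
    return -ENOENT;                            /* no backend owns this prefix */
}

/* File descriptor operation: look up the open-file record, then redirect. */
static long df_read(int fd, char *buf, size_t size)
{
    if (fd < 0 || fd >= 64 || open_files[fd].be == NULL)
        return -EBADF;
    struct df_open_file *of = &open_files[fd];
    long n = of->be->read(of->bfd, buf, size, of->pos);
    if (n > 0)
        of->pos += n;                          /* advance the stream position */
    return n;
}

/* A toy in-memory backend standing in for, e.g., the FusionFS client. */
static const char mem_data[] = "hello from the mem: backend\n";
static int  mem_open(const char *path, int flags) { (void)path; (void)flags; return 3; }
static long mem_read(int fd, char *buf, size_t size, long off)
{
    (void)fd;
    long left = (long)sizeof(mem_data) - 1 - off;
    if (left <= 0) return 0;
    if ((long)size > left) size = (size_t)left;
    memcpy(buf, mem_data + off, size);
    return (long)size;
}
static const struct df_backend mem_backend = { "mem:", mem_open, mem_read };

int main(void)
{
    mount_table[n_backends++] = &mem_backend;  /* register one backend */
    char buf[64];
    int fd = df_open("mem:/anything", 0);
    long n = (fd >= 0) ? df_read(fd, buf, sizeof(buf) - 1) : 0;
    if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }
    return 0;
}
```

In this sketch, adding another backend amounts to registering one more operation table, which mirrors the role of the FUSE high-level operations each real backend must export, as listed on the next slide.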
Requirements for New Backends
• Interact with the FUSE high-level APIs.
• Be packaged as an independent user-space library.
  – The library contains the FUSE file system operations, the initialization function, and the unmount function.
  – If a backend passes specialized data to the FUSE module via fuse_mount(), that data has to be globalized for later file system operations.
• Be implemented in C/C++, or be binary compatible with C/C++.

Outline: Performance Evaluation

Experimental Methodology
• We compare the bandwidth of Direct-FUSE against a local FUSE file system and the native file system, on disk and in memory, using IOzone.
  – Disk: Ext4-fuse (a FUSE file system overlying Ext4), Ext4-direct (Ext4-fuse with the FUSE kernel bypassed), and Ext4-native (the original Ext4 on disk).
  – Memory: tmpfs-fuse, tmpfs-direct, and tmpfs-native, analogous to the three disk configurations.
• We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE.
  – FusionFS: a distributed file system that supports metadata- and write-intensive operations.

Sequential Write Bandwidth
• Direct-FUSE achieves write bandwidth comparable to the native file system.
  – Ext4-direct outperforms Ext4-fuse by 16.5% on average.
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.15x.
[Figure: sequential write bandwidth (MB/s) vs. transfer size (4 KB to 1024 KB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]

Sequential Read Bandwidth
• Similar to the sequential write results, the read bandwidth of Direct-FUSE is comparable to the native file system.
  – Ext4-direct outperforms Ext4-fuse by 2.5% on average.
  – tmpfs-direct outperforms tmpfs-fuse by at least 2.26x.
[Figure: sequential read bandwidth (MB/s) vs. transfer size (4 KB to 1024 KB) for the same six configurations]

Distributed I/O Bandwidth
• Direct-FUSE outperforms FusionFS in write bandwidth and shows comparable read bandwidth.
  – Writes benefit more from bypassing the FUSE kernel.
• Direct-FUSE delivers scalability similar to the original FusionFS.
[Figure: write and read bandwidth (MB/s) of fusionfs vs. direct-fusionfs on 1, 2, 4, 8, and 16 nodes]

Overhead Analysis
• The dummy read/write occupies less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small.
  – Dummy write/read: no actual data movement; the call returns as soon as it reaches the backend service.
  – Real write/read: the actual Direct-FUSE read and write I/O calls.
[Figure: latency (ns) of dummy vs. real writes and reads for transfer sizes from 1 B to 1 KB]

Conclusions
• We have revealed and analyzed the context-switch counts and time overheads in FUSE metadata and data operations.
• We have designed and implemented Direct-FUSE, which avoids crossing the kernel boundary and supports multiple FUSE backends simultaneously.
• Our experimental results indicate that Direct-FUSE achieves significant performance improvements over the original FUSE file systems.

Sponsors of This Research