Experiences on File Systems: Which Is the Best File System for You?

Jakob Blomer, CERN PH/SFT
CHEP 2015, Okinawa, Japan

Why Distributed File Systems

Physics experiments store their files in a variety of file systems for a good reason:
∙ The file system interface is portable: we can take our local analysis application and run it anywhere on a big data set.
∙ The file system as a storage abstraction is a sweet spot between data flexibility and data organization.

We like file systems for their rich and standardized interface.
We struggle to find an optimal implementation of that interface.

Shortlist of Distributed File Systems

[Figure: collage of distributed file system logos, among them the Quantcast File System and CernVM-FS.]

Agenda

1 What do we want from a distributed file system?
2 Sorting and searching the file system landscape
3 Technology trends and future challenges

Can one size fit all?

[Figure: radar chart comparing data classes along the dimensions change frequency, throughput (MB/s), throughput (IOPS), mean file size, data value, volume, confidentiality, cache hit rate, and redundancy. Data are illustrative.]

Data classes:
∙ Home folders
∙ Physics data: recorded, simulated, analysis results
∙ Software binaries
∙ Scratch area

Depending on the use case, the dimensions span orders of magnitude.

POSIX Interface

Essential file system operations:
∙ create(), unlink(), stat()
∙ open(), close(), read(), write(), seek()

Operations that are difficult for distributed file systems (DFSs):
∙ File locks
∙ Write-through
∙ Atomic rename()
∙ File ownership
∙ Extended attributes
∙ Unlinking opened files
∙ Symbolic links, hard links
∙ Device files, IPC files

Missing APIs: physical file location, file replication properties, file temperature, ...

∙ No DFS is fully POSIX compliant.
∙ It must provide just enough to not break applications.
∙ Often this can only be discovered by testing, for example with a small probe program like the one sketched below.
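As an illustration (not part of the original slides), the sketch below exercises a few of the operations listed as difficult: byte-range locking, atomic rename(), and unlinking an open file. These are the calls whose behaviour most often differs between distributed file systems, so running a small probe like this against a mount point is one way to do the discovery-by-testing mentioned above; the path names are made up for the example.

    /* posix_probe.c: toy probe for POSIX features that distributed file
     * systems often support only partially (illustrative sketch). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dir = (argc > 1) ? argv[1] : ".";
        char path[4096], newpath[4096];
        snprintf(path, sizeof(path), "%s/probe.tmp", dir);
        snprintf(newpath, sizeof(newpath), "%s/probe.renamed", dir);

        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Byte-range lock: often unsupported or advisory-only on a DFS */
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 0 };
        if (fcntl(fd, F_SETLK, &fl) < 0)
            perror("fcntl(F_SETLK)");
        else
            printf("byte-range locking: ok\n");

        /* Atomic rename(): semantics vary between DFS implementations */
        if (rename(path, newpath) < 0)
            perror("rename");
        else
            printf("rename: ok\n");

        /* Unlinking an opened file: POSIX keeps the data reachable via
         * the open descriptor until the last close */
        if (unlink(newpath) < 0)
            perror("unlink");
        else if (write(fd, "x", 1) == 1)
            printf("write after unlink: ok\n");

        close(fd);
        return 0;
    }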
Application-Defined File Systems?

Mounted file system:

    FILE *f = fopen("susy.dat", "r");
    while (...) {
        fread(...);
        ...
    }
    fclose(f);

∙ Application independent from the file system
∙ Allows for standard tools (ls, grep, ...)
∙ System administrator selects the file system

File system library (here: libhdfs):

    hdfsFS fs = hdfsConnect("default", 0);
    hdfsFile f = hdfsOpenFile(fs, "susy.dat", ...);
    while (...) {
        hdfsRead(fs, f, ...);
        ...
    }
    hdfsCloseFile(fs, f);

∙ Performance-tuned API
∙ Requires code changes
∙ Application selects the file system

What we ideally want is an application-defined, mountable file system: Fuse, Parrot (a minimal Fuse example is sketched below).
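To make the Fuse option concrete, here is a minimal, self-contained sketch in the spirit of the classic libfuse "hello world" (not from the original slides; it assumes libfuse 2.x and uses susy.dat and its contents as placeholders). A user-space process implements a handful of callbacks, the kernel mounts it like any other file system, and unmodified applications keep using fopen()/fread(); a real deployment would forward these callbacks to a storage back end.

    /* hellofs.c: minimal read-only FUSE file system (illustrative sketch).
     * Build (assuming libfuse 2.x): gcc hellofs.c -o hellofs `pkg-config fuse --cflags --libs`
     * Run: ./hellofs /some/mount/point */
    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <sys/stat.h>

    static const char *file_path = "/susy.dat";          /* placeholder name */
    static const char *file_data = "pretend physics data\n";

    static int hello_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755; st->st_nlink = 2;
        } else if (strcmp(path, file_path) == 0) {
            st->st_mode = S_IFREG | 0444; st->st_nlink = 1;
            st->st_size = strlen(file_data);
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                             off_t offset, struct fuse_file_info *fi)
    {
        (void)offset; (void)fi;
        if (strcmp(path, "/") != 0) return -ENOENT;
        fill(buf, ".", NULL, 0);
        fill(buf, "..", NULL, 0);
        fill(buf, file_path + 1, NULL, 0);   /* list "susy.dat" */
        return 0;
    }

    static int hello_open(const char *path, struct fuse_file_info *fi)
    {
        if (strcmp(path, file_path) != 0) return -ENOENT;
        if ((fi->flags & O_ACCMODE) != O_RDONLY) return -EACCES;
        return 0;
    }

    static int hello_read(const char *path, char *buf, size_t size,
                          off_t offset, struct fuse_file_info *fi)
    {
        (void)fi;
        if (strcmp(path, file_path) != 0) return -ENOENT;
        size_t len = strlen(file_data);
        if ((size_t)offset >= len) return 0;
        if (offset + size > len) size = len - offset;
        memcpy(buf, file_data + offset, size);
        return (int)size;
    }

    static struct fuse_operations hello_ops = {
        .getattr = hello_getattr,
        .readdir = hello_readdir,
        .open    = hello_open,
        .read    = hello_read,
    };

    int main(int argc, char *argv[])
    {
        /* fuse_main parses the mount point from argv and runs the event loop */
        return fuse_main(argc, argv, &hello_ops, NULL);
    }

Parrot achieves a similar effect without a mount point by intercepting the application's system calls in user space.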
Sorting and Searching the File System Landscape

[Figure: map of distributed file systems, grouped by their primary use case and its characteristic requirements.]

∙ Personal files (privacy, sharing, sync): dropbox/owncloud, Tahoe-LAFS, AFS
∙ Big Data (MapReduce workflows, commodity hardware, incremental scalability): HDFS, QFS, MooseFS, MapR FS, XtreemFS
∙ General purpose (ease of administration): Ceph, GlusterFS
∙ Supercomputers (fast parallel writes; InfiniBand, Myrinet, ...): Lustre, GPFS, OrangeFS, Panasas, BeeGFS, (p)NFS
∙ Shared disk (high level of POSIX compliance): OCFS2, GFS2
∙ Used in HEP (tape access, WAN federation, software distribution, fault-tolerance): dCache, XRootD, CernVM-FS, EOS

File System Architecture: Object-Based File System

Examples: Hadoop File System, Quantcast File System
∙ A single meta-data node handles create(), delete(), and other name-space operations; clients read and write file data directly on the data nodes. The head node can also help in job scheduling.
∙ Target: incremental scaling, large and immutable files. Typical for Big Data applications.

File System Architecture: Parallel File System

Examples: Lustre, MooseFS, pNFS, XtreemFS
∙ Dedicated meta-data servers handle create() and delete(); reads and writes go in parallel to many data servers.
∙ Target: maximum aggregated throughput, large files. Typical for high-performance computing.

File System Architecture: Distributed Meta-Data

Examples: Ceph, OrangeFS
∙ Meta-data handling is itself distributed over several servers, alongside the data servers.
∙ Target: avoid a single point of failure and the meta-data bottleneck. The modern general-purpose distributed file system.

File System Architecture: Symmetric, Peer-to-Peer

Examples: GlusterFS
∙ A distributed hash table maps hash(path) to the hosts of that path; there is no dedicated meta-data server (see the lookup sketch after this list).
∙ Target: conceptual simplicity, inherently scalable.
∙ Drawbacks: difficult to deal with node churn, slow lookup beyond the LAN. In HEP we use caching and catalog-based data management instead.
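The hash-based placement can be illustrated with a toy lookup in C (an assumption, not the GlusterFS algorithm: GlusterFS hashes file names into per-directory hash ranges via its elastic-hashing translator, while this sketch uses FNV-1a over the full path and a fixed, hypothetical server list). The point is only that every client computes the same path-to-server mapping without contacting a meta-data service.

    /* dht_lookup.c: toy "which server holds this path?" lookup in the
     * spirit of a symmetric, hash-based file system (illustrative only). */
    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a: a small, well-known string hash (not what GlusterFS uses). */
    static uint64_t fnv1a(const char *s)
    {
        uint64_t h = 1469598103934665603ULL;
        for (; *s; s++) {
            h ^= (unsigned char)*s;
            h *= 1099511628211ULL;
        }
        return h;
    }

    int main(void)
    {
        /* Hypothetical storage nodes; in a real system this list and the
         * hash ranges would come from the volume configuration. */
        const char *servers[] = { "node01", "node02", "node03", "node04" };
        const int nservers = sizeof(servers) / sizeof(servers[0]);

        const char *paths[] = { "/data/susy.dat", "/home/alice/notes.txt",
                                "/sw/release-42/lib.so" };

        for (int i = 0; i < 3; i++) {
            /* Every client computes the same mapping without asking a
             * meta-data server: that is the whole point of the design. */
            uint64_t h = fnv1a(paths[i]);
            printf("%-26s -> %s\n", paths[i], servers[h % nservers]);
        }
        return 0;
    }

A naive modulo placement also hints at the node-churn problem noted above: adding or removing a single server changes almost every mapping, which is why real systems work with hash ranges or consistent-hashing rings.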
Trends and Challenges

We are lucky: large data sets tend to be immutable everywhere.
For instance: media, backups, VM images, scientific data sets, ...
∙ Reflected in hardware: shingled magnetic recording drives
∙ Reflected in software: log-structured space management

We need to invest in scaling fault-tolerance and speed together with the capacity:
∙ Replication becomes too expensive at the Petabyte and Exabyte scale → erasure codes (a toy example follows below)
∙ Explicit use of SSDs:
  ∙ for meta-data (high IOPS requirement)
  ∙ as a fast storage pool
  ∙ as a node-local cache
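As a toy illustration of the erasure-code argument (not from the original slides): a file is split into k data blocks plus m parity blocks, and any k of the k+m blocks suffice to reconstruct it, so the storage overhead is m/k instead of the 2x or 3x of replication. The sketch below uses a single XOR parity block over k = 4 data blocks, which tolerates exactly one lost block; production systems such as Ceph use Reed-Solomon style codes with larger m.

    /* xor_parity.c: toy erasure code with k data blocks + 1 XOR parity block.
     * Losing any single block can be repaired by XOR-ing the survivors.
     * Real systems use Reed-Solomon codes that tolerate several losses. */
    #include <stdio.h>
    #include <string.h>

    #define K          4      /* number of data blocks                   */
    #define BLOCK_SIZE 8      /* bytes per block (tiny, for illustration) */

    static void xor_into(unsigned char *dst, const unsigned char *src)
    {
        for (int i = 0; i < BLOCK_SIZE; i++)
            dst[i] ^= src[i];
    }

    int main(void)
    {
        unsigned char blocks[K + 1][BLOCK_SIZE];   /* K data + 1 parity */

        /* Fill the data blocks with some recognizable content. */
        for (int b = 0; b < K; b++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                blocks[b][i] = (unsigned char)('A' + b);

        /* Encode: parity = XOR of all data blocks. */
        memset(blocks[K], 0, BLOCK_SIZE);
        for (int b = 0; b < K; b++)
            xor_into(blocks[K], blocks[b]);

        /* Simulate losing block 2, then rebuild it from the survivors. */
        const int lost = 2;
        unsigned char rebuilt[BLOCK_SIZE];
        memset(rebuilt, 0, BLOCK_SIZE);
        for (int b = 0; b <= K; b++)
            if (b != lost)
                xor_into(rebuilt, blocks[b]);

        printf("recovered block %d %s\n", lost,
               memcmp(rebuilt, blocks[lost], BLOCK_SIZE) == 0 ? "correctly"
                                                              : "INCORRECTLY");
        return 0;
    }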