File Systems Unfit As Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution


Abutalib Aghayev (Carnegie Mellon University), Sage Weil (Red Hat, Inc.), Michael Kuchnik (Carnegie Mellon University), Mark Nelson (Red Hat, Inc.), Gregory R. Ganger (Carnegie Mellon University), George Amvrosiadis (Carnegie Mellon University)

Abstract

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow.

Ceph addressed these issues with BlueStore, a new backend designed to run directly on raw storage devices. In only two years since its inception, BlueStore outperformed previous established backends and is adopted by 70% of users in production. By running in user space and fully controlling the I/O stack, it has enabled space-efficient metadata and data checksums, fast overwrites of erasure-coded data, inline compression, and decreased performance variability, and it has avoided a series of performance pitfalls of local file systems. Finally, it makes the adoption of backwards-incompatible storage hardware possible, an important trait in a changing storage landscape that is learning to embrace hardware diversity.

CCS Concepts: • Information systems → Distributed storage; • Software and its engineering → File systems management; Software performance.

Keywords: Ceph, object storage, distributed file system, storage backend, file system

SOSP '19, October 27–30, 2019, Huntsville, ON, Canada. © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-6873-5/19/10. https://doi.org/10.1145/3341301.3359656

1 Introduction

Distributed file systems operate on a cluster of machines, each assigned one or more roles such as cluster state monitor, metadata server, and storage server. Storage servers, which form the bulk of the machines in the cluster, receive I/O requests over the network and serve them from locally attached storage devices using storage backend software. Sitting in the I/O path, the storage backend plays a key role in the performance of the overall system.

Traditionally, distributed file systems have used local file systems, such as ext4 or XFS, directly or through middleware, as the storage backend [29, 34, 37, 41, 74, 84, 93, 98, 101, 102]. This approach has delivered reasonable performance, precluding questions on the suitability of file systems as a distributed storage backend. Several reasons have contributed to the success of file systems as the storage backend. First, they allow delegating the hard problems of data persistence and block allocation to well-tested and highly performant code. Second, they offer a familiar interface (POSIX) and abstractions (files, directories). Third, they enable the use of standard tools (ls, find) to explore disk contents.
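To make this convention concrete, the sketch below shows how a backend built on a local file system can map collections to directories and objects to files, delegating block allocation and durability to POSIX calls such as pwrite and fsync. This is an illustrative sketch only, not Ceph's actual FileStore code; the PosixObjectStore name and on-disk layout are hypothetical.

```cpp
// Illustrative sketch only (not Ceph's FileStore): an object store that
// leans on a local POSIX file system for allocation and durability.
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class PosixObjectStore {
 public:
  explicit PosixObjectStore(std::string root) : root_(std::move(root)) {}

  // Collections map to directories, objects map to files; the file system
  // handles block allocation, and fsync provides durability.
  bool write(const std::string& collection, const std::string& oid,
             uint64_t offset, const std::vector<char>& data) {
    const std::string dir = root_ + "/" + collection;
    mkdir(dir.c_str(), 0755);  // ignore EEXIST for brevity
    const std::string path = dir + "/" + oid;
    int fd = open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return false;
    ssize_t written = pwrite(fd, data.data(), data.size(), offset);
    bool ok = written == static_cast<ssize_t>(data.size()) && fsync(fd) == 0;
    close(fd);
    return ok;
  }

 private:
  std::string root_;  // mount point of the local file system, e.g. XFS
};
```

The appeal is clear: a few dozen lines buy persistence, allocation, and debuggability with ls and find. The rest of the paper argues that this convenience breaks down once the backend needs efficient transactions, fast ordered enumeration of metadata, and support for non-block hardware.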
Ceph [98] is a widely-used, open-source distributed file system that followed this convention for a decade. Hard lessons that the Ceph team learned using several popular file systems led them to question the fitness of file systems as storage backends. This is not surprising in hindsight. Stonebraker, after building the INGRES database for a decade, noted that "operating systems offer all things to all people at much higher overhead" [90]. Similarly, exokernels demonstrated that customizing abstractions to applications results in significantly better performance [31, 50]. In addition to the performance penalty, adopting increasingly diverse storage hardware is becoming a challenge for local file systems, which were originally designed for a single storage medium.

The first contribution of this experience paper is to outline the main reasons behind Ceph's decision to develop BlueStore, a new storage backend deployed directly on raw storage devices. First, it is hard to implement efficient transactions on top of existing file systems. A significant body of work aims to introduce transactions into file systems [39, 63, 65, 72, 77, 80, 87, 106], but none of these approaches have been adopted due to their high performance overhead, limited functionality, interface complexity, or implementation complexity. The experience of the Ceph team shows that the alternative options, such as leveraging the limited internal transaction mechanism of file systems, implementing Write-Ahead Logging in user space, or using a transactional key-value store, also deliver subpar performance.

Second, the local file system's metadata performance can significantly affect the performance of the distributed file system as a whole. More specifically, a key challenge the Ceph team faced was enumerating directories with millions of entries quickly, compounded by the lack of ordering in the returned results. Both Btrfs- and XFS-based backends suffered from this problem, and the directory splitting operations meant to distribute the metadata load were found to clash with file system policies, crippling overall system performance.
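A minimal sketch of the enumeration problem, assuming a backend that stores one object per file in a collection directory (the list_collection_sorted helper is hypothetical): POSIX readdir returns entries in no useful order, so producing a sorted listing means reading the entire directory and sorting it in memory, which becomes expensive once a directory holds millions of entries.

```cpp
// Illustrative sketch only: sorted listing of a huge, file-per-object
// directory. readdir() returns entries in no particular order, so the whole
// directory must be read and sorted in memory before an ordered enumeration
// can begin, which is costly for directories with millions of entries.
#include <dirent.h>

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> list_collection_sorted(const std::string& dir) {
  std::vector<std::string> names;
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) return names;
  while (struct dirent* entry = readdir(d)) {
    std::string name = entry->d_name;
    if (name != "." && name != "..") names.push_back(std::move(name));
  }
  closedir(d);
  std::sort(names.begin(), names.end());  // full in-memory sort on every listing
  return names;
}
```

Bounding per-directory entry counts is what motivates the directory splitting mentioned above, and splitting is precisely the operation that was found to clash with file system policies.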
At the same time, the rigidity of mature file systems prevents them from adopting emerging storage hardware that abandons the venerable block interface. The history of production file systems shows that, on average, they take a decade to mature [30, 56, 103, 104]. Once file systems mature, their maintainers tend to be conservative when it comes to making fundamental changes, due to the consequences of mistakes. On the other hand, novel storage hardware aimed at data centers introduces backward-incompatible interfaces that require drastic changes. For example, to increase capacity, hard disk drive (HDD) vendors are moving to Shingled Magnetic Recording (SMR) technology [38, 62, 82], which works best with a backward-incompatible zone interface [46]. Similarly, to eliminate the long I/O tail latency in solid state drives (SSDs) caused by the Flash Translation Layer (FTL) [35, 54, 108], vendors are introducing Zoned Namespace (ZNS) SSDs that eliminate the FTL, again exposing the zone interface [9, 26]. Cloud storage providers [58, 73, 109] and storage server vendors [17, 57] are already adapting their private software stacks to use zoned devices. Distributed file systems, however, are stalled by delays in the adoption of zoned devices in local file systems.

In 2015, the Ceph project started designing and implementing BlueStore, a user space storage backend that stores data directly on raw storage devices and metadata in a key-value store. By taking full control of the I/O path, BlueStore has [...] overhead of the resulting extent reference-counting through careful interface design; (3) BlueFS—a user space file system that enables RocksDB to run faster on raw storage devices; and (4) a space allocator with a fixed 35 MiB memory usage per terabyte of disk space.

In addition to the above contributions, we perform several experiments that evaluate the improvement of design changes from Ceph's previous production backend, FileStore, to BlueStore. We experimentally measure the performance effect of issues such as the overhead of journaling file systems, double writes to the journal, inefficient directory splitting, and update-in-place mechanisms (as opposed to copy-on-write).

2 Background

This section aims to highlight the role of distributed storage backends and the features that are essential for building an efficient distributed file system (§ 2.1). We provide a brief overview of Ceph's architecture (§ 2.2) and the evolution of Ceph's storage backend over the last decade (§ 2.3), introducing terms that will be used throughout the paper.

2.1 Essentials of Distributed Storage Backends

Distributed file systems aggregate storage space from multiple physical machines into a single unified data store that offers high-bandwidth and parallel I/O, horizontal scalability, fault tolerance, and strong consistency. While distributed file systems may be designed differently and use unique terms to refer to the machines managing data placement on physical media, the storage backend is usually defined as the software module directly managing the storage device attached to physical machines. For example, Lustre's Object Storage Servers (OSSs) store data on Object Storage Targets (OSTs) [102], GlusterFS's Nodes store data on Bricks [74], and Ceph's Nodes store data on Object Storage Devices (OSDs) [98]. In these and other systems, the storage backend is the software module that manages space on disks (OSTs, Bricks, OSDs) attached to physical machines (OSSs, Nodes).

Widely-used distributed file systems such as [...]