How to Teach an Old File System Dog New Object Store Tricks

USENIX HotStorage ’18
Eunji Lee, Youil Han (Chungbuk National University); Suli Yang, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau (University of Wisconsin–Madison)

Data-service Platforms
• Layering
  • Abstracts away underlying details
  • Reuses existing software
  • Agility in development, operation, and maintenance
• But layering is often at odds with efficiency
[Figure: ecosystem of data-service platforms]

Local File System
• The bottom layer of modern storage platforms
• Portability, extensibility, ease of development
[Figure: storage stacks — distributed data stores (Dynamo, MongoDB), (HBase, BigTable), and Ceph run over a key-value store (RocksDB, BerkeleyDB), a distributed file system (HDFS, GFS), or an object store daemon (Ceph), all of which sit on a local file system (ext4, XFS, Btrfs)]

Local File System: Mismatch
• Not intended to serve as an underlying storage engine; the two layers are mismatched
  • System-wide optimization: ignores the demands of individual applications
  • Little control over file system internals: applications suffer degraded QoS
  • Lack of required operations: no atomic update, no data movement or reorganization, no additional user-level metadata
• Result: out-of-control and sub-optimal performance

Current Solutions
• Bypass the file system (key-value store, object store, database)
  • But this relinquishes the file system’s benefits
• Extend the file system interface (add new features to the POSIX APIs)
  • Evolution is slow and conservative: stable maintenance is favored over application-specific optimizations (ext2/3/4 dates back to 1993)

Our Approach
• Use the file system as it is, but in a different manner
• Identify design patterns for user-level data platforms that take advantage of the file system while minimizing the negative effects of the mismatch

Contents
• Motivation
• Problem Analysis
• SwimStore
• Performance Evaluation
• Conclusion

Data-service Platform Taxonomy
• What is the best way to store objects atop a file system?
• Mapping: “object as a file”
• Packing: “multiple objects in a file”

Case Study: Ceph
• Ceph’s backend object store engines:
  • FileStore: mapping
  • KStore: packing
  • BlueStore: bypasses the file system and manages the raw device directly
[Figure: Ceph architecture — RGW, RBD, and CephFS over RADOS; each OSD uses FileStore, KStore, or BlueStore as its backend object store over a file system and storage device]

Mapping vs. Packing
• FileStore (mapping): each object is stored in its own file (“object as a file”), plus an object store log
• KStore (packing): objects are packed into shared files managed by an LSM-tree (“multiple objects in a file”), plus an object store log
A minimal sketch of these two write paths follows below.
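To make the two patterns concrete, here is a minimal, illustrative sketch (not Ceph source) of the two write paths using plain POSIX I/O; the function names, paths, and the in-memory index are hypothetical stand-ins.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdint>
    #include <map>
    #include <string>

    // Mapping ("object as a file"): every object becomes its own file, so each
    // write also dirties file system metadata (inode, directory entry, bitmap).
    bool put_mapped(const std::string& dir, const std::string& oid,
                    const void* buf, size_t len) {
        std::string path = dir + "/" + oid;
        int fd = ::open(path.c_str(), O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) return false;
        bool ok = ::pwrite(fd, buf, len, 0) == (ssize_t)len && ::fsync(fd) == 0;
        ::close(fd);
        return ok;
    }

    // Packing ("multiple objects in a file"): objects are appended to a shared
    // log file; a separate index (an LSM-tree in KStore) maps oid -> location.
    struct Extent { uint64_t off; uint64_t len; };

    bool put_packed(int log_fd, uint64_t& log_tail,
                    std::map<std::string, Extent>& index,
                    const std::string& oid, const void* buf, size_t len) {
        if (::pwrite(log_fd, buf, len, (off_t)log_tail) != (ssize_t)len) return false;
        if (::fsync(log_fd) != 0) return false;
        index[oid] = Extent{log_tail, (uint64_t)len};  // stands in for the LSM-tree insert
        log_tail += len;
        return true;
    }

Mapping pays for per-object metadata on every write, while packing amortizes metadata but must later compact the shared file; the measurements that follow show exactly this trade-off.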
Experimental Setup
• Ceph 12.01
• Amazon EC2 cluster: Intel Xeon quad-core, 32 GB DRAM, 2 x 256 GB SSDs, Ubuntu Server 16.04
• File system: XFS (recommended for Ceph)
• Backends: FileStore, KStore
• Benchmark: RADOS
• Metrics: IOPS, throughput, write traffic

Performance: Small Writes (4 KB)
• KStore outperforms FileStore by 1.5x
• FileStore’s loss comes from write amplification caused by file system metadata

Performance: Large Writes (1 MB)
• FileStore outperforms KStore by 1.6x
• KStore’s loss comes from write amplification caused by compaction

[Figure: average IOPS and write traffic breakdown. Total write traffic, normalized to the original write traffic, decomposes into original writes, logging, compaction, and file system metadata, with amplification factors ranging from 2.1x to 8.8x across the four configurations. Measured original write traffic: FileStore 332 MB (4 KB) and 3.8 GB (1 MB); KStore 864 MB (4 KB) and 2.4 GB (1 MB).]

Performance: Atomicity Cost
• File systems lack atomic update support, so backends rely on logging
• Logging incurs a double-write penalty: every object is written twice
• For large writes this roughly halves the usable bandwidth

QoS: FileStore
• Throughput over time fluctuates heavily for both 4 KB and 1 MB writes over XFS
• Causes: periodic flushes of buffered I/O from the page cache to storage, and transaction entanglement inside the file system
[Figure: FileStore throughput and background write traffic over 60 seconds for 4 KB and 1 MB writes on XFS]

QoS: KStore
• The user-level cache plus frequent compaction (write amplification from merges) keep throughput consistently poor, at roughly 40 MB/s for 1 MB writes
[Figure: KStore throughput and background write traffic over 60 seconds for 4 KB and 1 MB writes on XFS]

Summary
• Small objects suffer seriously from write amplification caused by file system metadata
• Large writes are sensitive to the extra traffic from logging (in both designs) and from frequent compaction (in the packing design); a sketch of the logging double write follows below
• Buffered I/O and the file system’s out-of-control flush mechanism make it hard to support QoS
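The double-write penalty called out above stems from emulating atomicity with a write-ahead log: the payload must become durable in the log before it is applied in place. A simplified sketch under those assumptions (this is not FileStore’s actual journal code; names and offsets are illustrative):

    #include <fcntl.h>
    #include <unistd.h>
    #include <string>

    // Make a single object update crash-safe without atomic file system writes:
    // write it to the log first, then again to its final location.
    bool atomic_put_via_wal(int wal_fd, off_t wal_off,
                            const std::string& obj_path,
                            const void* buf, size_t len) {
        // First write + sync: the payload becomes durable in the write-ahead log,
        // so a crash between the two steps can be repaired by replay.
        if (::pwrite(wal_fd, buf, len, wal_off) != (ssize_t)len) return false;
        if (::fdatasync(wal_fd) != 0) return false;

        // Second write + sync: apply the update to the object file in place.
        int fd = ::open(obj_path.c_str(), O_CREAT | O_WRONLY, 0644);
        if (fd < 0) return false;
        bool ok = ::pwrite(fd, buf, len, 0) == (ssize_t)len &&
                  ::fdatasync(fd) == 0;
        ::close(fd);
        return ok;  // every byte reached the device twice
    }

Because each byte crosses the device twice, large sequential writes see roughly half the raw bandwidth, which matches the 1 MB results above.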
SwimStore
• SwimStore: Shadowing with Immutable Metadata Store
• Goal: consistently excellent performance for all object sizes while running over a file system

Strategy 1: In-file Shadowing
• Problems addressed: file system metadata overhead, the double-write penalty, performance fluctuation, and compaction cost
• A new version of an object is written with direct I/O into a container file (shadowing), and an indexing log records its (key, offset, length)
• Unlike FileStore, which synchronously logs data to a raw-device journal and then asynchronously rewrites it through buffered I/O, SwimStore writes the data only once, into the file itself
• Caveat: user-facing latency increases, because file system access is slower than raw device access (file system metadata such as inodes and allocation bitmaps, plus transaction entanglement)

Strategy 2: Metadata-Immutable Container
• Create a container file and allocate its space in advance, so subsequent object writes never modify file system metadata
[Figure: normalized 4 KB write latency for per-file allocation, a single shared file, the metadata-immutable container, and the raw device; the container’s latency (~0.4) roughly matches the raw device, versus 1.0 for per-file allocation]

Strategy 3: Hole-Punching with Buddy-like Allocation
• Shadowing requires recycling the space occupied by obsolete data
• Opportunity: the file system offers a practically infinite logical address space, and punch-hole provides physical space reclamation
• But punching holes that are too small severely fragments the physical space
• Solution: manage freed space in buddy-like power-of-two size classes (2^0 … 2^n); garbage-collect small holes and punch-hole only the large ones (a short fallocate() sketch appears at the end of this page)

Architecture
• A pool of container files holds object data
• An LSM-tree (LevelDB) holds metadata (indexing, attributes, etc.)
• An intent log records metadata and checksums

Experimental Setup
• Ceph 12.01; SwimStore implemented in about 12K lines of C++
• Amazon EC2 cluster: Intel Xeon quad-core, 32 GB DRAM, 2 x 256 GB SSDs, Ubuntu Server 16.04
• File system: XFS (recommended for Ceph)
• Backends: FileStore, KStore, BlueStore, SwimStore
• Benchmark: RADOS
• Metrics: IOPS, throughput, write traffic

Performance Evaluation: IOPS
• Small writes: SwimStore is 2.5x better than FileStore, 1.6x better than BlueStore, and 1.1x better than KStore
• Large writes: SwimStore is 1.8x better than FileStore and 3.1x better than KStore
[Figure: IOPS normalized to FileStore for 4 KB, 16 KB, 64 KB, 256 KB, and 1 MB objects; annotated baselines are 1454, 1472, 881, 243, and 67 ops/s, respectively]

Performance Evaluation: Write Traffic
[Figure: write traffic normalized to SwimStore for 4 KB–1 MB objects; SwimStore’s own traffic is 1129 MB, 3020 MB, 5370 MB, 7342 MB, and 7342 MB, with FileStore, KStore, and BlueStore shown as ratios relative to it]

Conclusion
• Explored design patterns for building an object store atop a local file system
• SwimStore, a new backend object store, combines in-file shadowing, metadata-immutable containers, and hole-punching with buddy-like allocation
• It provides high performance with little performance variation, while retaining all the benefits of the file system

Thank you.
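To make the container mechanics listed in the conclusion concrete, here is a minimal sketch of the two fallocate() calls that the metadata-immutable container and hole-punching strategies rely on. The flags are standard Linux; the container size, function names, and the omission of direct I/O are assumptions for illustration, not SwimStore’s actual code.

    #include <fcntl.h>
    #include <unistd.h>
    #include <linux/falloc.h>
    #include <cstdint>

    // Create a container file and reserve all of its space up front, so later
    // object writes into it never change the inode size or allocation metadata.
    // (SwimStore additionally uses direct I/O for the writes; omitted here.)
    int create_container(const char* path, uint64_t bytes) {
        int fd = ::open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0) return -1;
        if (::fallocate(fd, 0, 0, (off_t)bytes) != 0) { ::close(fd); return -1; }
        return fd;
    }

    // After shadowing writes a new version elsewhere in the container, reclaim
    // the obsolete extent's physical space while leaving the file size (and the
    // rest of the file metadata) untouched.
    bool reclaim_extent(int fd, uint64_t off, uint64_t len) {
        return ::fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                           (off_t)off, (off_t)len) == 0;
    }

In the buddy-like scheme above, reclaim_extent would be applied only to freed extents in the larger power-of-two size classes, while small holes are consolidated by garbage collection to avoid fragmenting the container.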