Distributed Metadata Management for Parallel Filesystems

A Thesis Presented in Partial Fulfillment of the Requirements for the Degree
Master of Science in the Graduate School of The Ohio State University

By
Vilobh Meshram, B.Tech. (Computer Science)

Graduate Program in Computer Science and Engineering
The Ohio State University
2011

Master's Examination Committee:
Dr. D. K. Panda, Advisor
Dr. P. Sadayappan

© Copyright by Vilobh Meshram, 2011

Abstract

Much of the research in storage systems has focused on improving the scale and performance of data access, that is, the throughput of reads and writes of large amounts of file data. Parallel file systems do a good job of scaling large-file access bandwidth by striping or sharing I/O resources across many servers or disks. However, the same cannot be said about scaling file metadata operation rates.

Most existing parallel filesystems concentrate all metadata processing load on a single server. This centralized processing can guarantee correctness, but it severely hampers scalability. This downside is becoming more and more unacceptable as metadata throughput is critical for large-scale applications. Distributing the metadata processing load is critical to improving metadata scalability when handling a huge number of client nodes. However, in such a distributed scenario, a solution that speeds up metadata operations has to address two challenges simultaneously: scalability and reliability.

We propose two approaches to solve the challenges mentioned above for metadata management in parallel filesystems, with a focus on reliability and scalability. As demonstrated by experiments, our approach to distributed metadata management achieves significant improvements over native parallel filesystems by a large margin for all major metadata operations. With 256 client processes, our approach outperforms Lustre and PVFS2 by factors of 1.9 and 23, respectively, for directory creation. For the stat() operation on files, our approach is 1.3 and 3.0 times faster than Lustre and PVFS2, respectively.
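The abstract's central argument is that metadata operations scale only when their processing load can be spread across multiple servers in a way that every client can compute independently. As a minimal illustrative sketch of that idea (not code from the thesis; the hash function, helper names, and server count below are hypothetical), a client could map each file path deterministically to one of N metadata servers:

```c
/*
 * Minimal sketch of deterministic metadata placement, assuming a simple
 * hash-based scheme. Because the mapping is a pure function of the path,
 * every client picks the same metadata server without any coordination.
 */
#include <stdint.h>
#include <stdio.h>

/* FNV-1a hash of a path string (hypothetical helper, not from the thesis). */
static uint64_t hash_path(const char *path)
{
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    while (*path) {
        h ^= (uint8_t)*path++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Map a virtual path to a metadata-server index in [0, num_servers). */
static unsigned pick_mds(const char *path, unsigned num_servers)
{
    return (unsigned)(hash_path(path) % num_servers);
}

int main(void)
{
    const char *paths[] = { "/home/user/a.txt", "/home/user/b.txt", "/data/set1/x" };
    for (int i = 0; i < 3; i++)
        printf("%s -> MDS %u\n", paths[i], pick_mds(paths[i], 4));
    return 0;
}
```

The DUFS layer described in Chapter 4 is built around a similar idea: each file gets a File Identifier (FID), a deterministic mapping function (Section 4.2.2) places it on a back-end store, and the metadata itself is kept in a ZooKeeper-based service (Section 4.2).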
Dedication

This work is dedicated to my parents and my sister.

Acknowledgments

I consider myself extremely fortunate to have met and worked with some remarkable people during my stay at Ohio State. While a brief note of thanks does not do justice to their impact on my life, I deeply appreciate their contributions.

I begin by thanking my adviser, Dr. Dhabaleswar K. Panda. His guidance and advice during the course of my Master's studies have shaped my career. I am thankful to Dr. P. Sadayappan for agreeing to serve on my Master's examination committee. Special thanks to Xiangyong Ouyang for all the support and help. I would also like to thank Dr. Xavier Besseron for his insightful comments and discussions, which helped me strengthen my thesis. I am especially grateful to Xiangyong, Xavier, and Raghu, and I feel lucky to have collaborated closely with them. I would like to thank all my friends in the Network-Based Computing Research Laboratory for their friendship and support.

Finally, I thank my family, especially my parents and my sister. Their love, action, and faith have been a constant source of strength for me. None of this would have been possible without them.

Vita

April 18, 1986: Born, Amravati, India
2007: B.Tech., Computer Science, COEP, Pune University, Pune, India
2007-2009: Software Development Engineer, Symantec R&D India
2010-2011: Graduate Research Associate, The Ohio State University

Publications

Research Publications

Vilobh Meshram, Xavier Besseron, Xiangyong Ouyang, Raghunath Rajachandrasekar, and Dhabaleswar K. Panda. "Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems?" Accepted at the IASDS 2011 workshop, held in conjunction with Cluster 2011.

Vilobh Meshram, Xiangyong Ouyang, and Dhabaleswar K. Panda. "Minimizing Lookup RPCs in Lustre File System using Metadata Delegation at Client Side." OSU Technical Report OSU-CISRC-7/11-TR20, July 2011.

Raghunath Rajachandrasekar, Xiangyong Ouyang, Xavier Besseron, Vilobh Meshram, and Dhabaleswar K. Panda. "Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging?" To appear at the Resilience 2011 workshop, held in conjunction with Euro-Par 2011.

Fields of Study

Major Field: Computer Science and Engineering
Studies in High Performance Computing: Prof. D. K. Panda

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Parallel Filesystems
   1.2 Metadata Management in Parallel Filesystems
   1.3 Distributed Coordination Service
   1.4 Motivation of the Work
       1.4.1 Metadata Server Bottlenecks
       1.4.2 Consistency Management of Metadata
   1.5 Problem Statement
   1.6 Organization of Thesis

2. Related Work
   2.1 Metadata Management Approaches
   2.2 Scalable Filesystem Directories

3. Delegating Metadata at Client Side (DMCS)
   3.1 RPC Processing in the Lustre Filesystem
   3.2 Existing Design
   3.3 Design and Challenges for Delegating Metadata at Client Side
       3.3.1 Design of Communication Module
       3.3.2 Design of DMCS Approach
       3.3.3 Challenges
       3.3.4 Metadata Revocation
       3.3.5 Distributed Lock Management for the DMCS Approach
   3.4 Performance Evaluation
       3.4.1 File Open IOPS: Varying Number of Client Processes
       3.4.2 File Open IOPS: Varying File Pool Size
       3.4.3 File Open IOPS: Varying File Path Depth
   3.5 Summary

4. Design of a Decentralized Metadata Service Layer for Distributed Metadata Management
   4.1 Detailed Design of the Distributed Union FileSystem (DUFS)
       4.1.1 Implementation Overview
       4.1.2 FUSE-based Filesystem Interface
   4.2 ZooKeeper-based Metadata Management
       4.2.1 File Identifier
       4.2.2 Deterministic Mapping Function
       4.2.3 Back-end Storage
   4.3 Algorithm Examples for Metadata Operations
       4.3.1 Reliability Concerns
   4.4 Performance Evaluation
       4.4.1 Distributed Coordination Service Throughput and Memory Usage Experiments
       4.4.2 Scalability Experiments
       4.4.3 Experiments with Varying Numbers of Distributed Coordination Service Servers
       4.4.4 Experiments with Different Numbers of Mounts Combined Using DUFS
       4.4.5 Experiments with Different Back-end Parallel Filesystems
   4.5 Summary

5. Contributions and Future Work
   5.1 Summary of Research Contributions and Future Work
       5.1.1 Delegating Metadata at Client Side
       5.1.2 Design of a Decentralized Metadata Service Layer for Distributed Metadata Management

Bibliography

List of Tables

1.1 LDLM and Oprofile experiments
1.2 Transaction throughput with a fixed file pool size of 1,000 files
1.3 Transaction throughput with varying file pool
1.4 Transaction throughput with varying file pool
3.1 Metadata operation rates with different underlying storage

List of Figures

1.1 Basic Lustre design
1.2 ZooKeeper design
1.3 Example of a consistency issue with 2 clients and 2 metadata servers
3.1 Design of the DMCS approach
3.2 File open IOPS, each process accessing 10,000 files
3.3 File open IOPS, using 16 client processes
3.4 Time to finish open, using 16 processes each accessing 10,000 files
4.1 DUFS mapping from the virtual path to the physical path using the File Identifier (FID)
4.2 DUFS overview; A, B, C, and D show the steps required to perform an open() operation
4.3 Sample physical filename generated from a given FID
4.4 Algorithm for the mkdir() operation
4.5 Algorithm for the stat() operation
4.6 ZooKeeper throughput for basic operations, varying the number of ZooKeeper servers
4.7 ZooKeeper memory usage compared with DUFS and a basic FUSE-based file system
4.8 Scalability experiments with 8 client nodes and varying numbers of client processes
4.9 Scalability experiments with 16 client nodes and varying numbers of client processes
4.10 Operation throughput, varying the number of ZooKeeper servers
4.11 File operation throughput for different numbers of back-end stores
4.12 Operation throughput with respect to the number of clients for Lustre and PVFS2

Chapter 1: Introduction

High-performance computing (HPC) is an integral part of today's scientific, economic, social, and commercial fabric. We depend on HPC systems and applications for a wide range of activities such as climate modeling, drug research, weather forecasting, and energy exploration. HPC systems enable researchers and scientists to discover the origins of the universe, design automobiles and airplanes, predict weather patterns, model global trade, and develop life-saving drugs. Because of the nature of the problems they are trying to solve, HPC applications are often data-intensive. Scientific applications in astrophysics (CHIMERA and VULCAN2D), climate modeling (POP), combustion (S3D), fusion (GTC), visualization, astronomy, and other fields generate or consume large volumes of data. This data is on the order of terabytes and petabytes and is often shared by the entire scientific community.

Today's computational requirements are increasing at a geometric rate and involve large quantities of data. While the computational power of microprocessors has kept pace with Moore's law as a result of increased chip densities, performance improvements in magnetic storage have not seen a corresponding increase. The result has been a widening gap between the computational power and the I/O subsystem performance of current HPC systems. Hence, while supercomputers keep getting faster, we do not see a corresponding improvement in application performance, because of the I/O bandwidth bottleneck.