High Performance Storage

Linux Clusters Institute: High Performance Storage
University of Oklahoma, 05/19/2015 (workshop: 18-22 May 2015)
Mehmet Belgin, Georgia Tech, [email protected] (in collaboration with Wesley Emeneker)

The Fundamental Question
• How do we meet *all* user needs for storage?
• Is it even possible?
• Confounding factors:
  • User expectations (in their own words)
  • Budget constraints
  • Application needs and use cases
  • Expertise in the team
  • Existing infrastructure

Examples of Common Storage Systems
• Network File System (NFS) – a distributed file system protocol for accessing files over a network
• Lustre – a parallel, distributed file system
  • OSS – object storage server. This server stores and manages pieces of files (aka objects)
  • OST – object storage target. This disk is managed by the OSS and stores data
  • MDS – metadata server. This server stores file metadata
  • MDT – metadata target. This disk is managed by the MDS and stores file metadata
• General Parallel File System (GPFS) – a parallel, distributed file system
  • Metadata is not owned by any particular server or set of servers
  • All clients participate in filesystem management
  • NSD – network storage device
• Panasas/PanFS – a parallel, distributed file system
  • Metadata is owned by director blades
  • File data is owned by storage blades

Nomenclature
• Object store – a place where chunks of data (aka objects) are stored. Objects are not files, though they can store individual files or different pieces of files.
• Raw space – what the disk label shows. Typically given in base 10, i.e. 10 TB (terabytes) == 10*10^12 bytes
• Usable space – what "df" shows once the storage is mounted. Typically given in base 2, i.e. 10 TiB (tebibytes) == 10*2^40 bytes
• Usable space is often about 30% smaller (sometimes more, sometimes less) than raw space.

Which one is right for me?
Lustre

The End. Thanks for participating!

Before we start… What is a File System?

What is a filesystem?
• A system for files (Duh!)
• A source of constant frustration
• "A filesystem is used to control how data is stored and retrieved" – Wikipedia
• It's a container (that contains files)
• It's the set of disks, servers (computational components), networking, and software
• All of the above

Disclaimer
• There are no right answers
• There are wrong answers
  • No, seriously.
• It comes down to balancing tradeoffs of preferences, expertise, costs, and case-by-case analysis

Know Your Stakeholders
… and keep all of them happy! (at the same time)
1. Users
2. Managers and university leadership
3. University support staff
4. System administrators
5. Vendor
(diagram: balancing Users, Managers, and Sysadmins)

What do you need to support?
Common storage requirements (which most users can't articulate):
• Temporary storage for intermediate results from jobs (a.k.a. scratch)
• Long-term storage for runtime use
• Backups
• Archive
• Exporting said filesystem to other machines (like a user's Windows XP laptop)
• Virtual machine hosting
• Database hosting
• Map/Reduce (a.k.a. Hadoop)
• Data ingest and outgest (DMZ?)
• System administrator storage

Tradeoffs
First, try to define 'use purpose' and 'operational lifetime'…
• Speed (… is a relative term!)
• Space
• Cost
• Scalability
• Administrative burden
• Monitoring
• Reliability/Redundancy
• Features
• Support from vendor
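As a concrete illustration of the raw vs. usable space arithmetic from the Nomenclature slide above, the short sketch below converts a base-10 "disk label" capacity into base-2 units and reads the usable size of a mounted filesystem the same way "df" does. It assumes a Linux host; the /scratch mount point is a hypothetical placeholder.

```python
import os

def tb_to_tib(raw_tb: float) -> float:
    """Convert a base-10 'disk label' capacity (TB) to base-2 units (TiB)."""
    return raw_tb * 10**12 / 2**40

def usable_tib(mount_point: str) -> float:
    """Usable capacity of a mounted filesystem in TiB, from the same data 'df' uses."""
    st = os.statvfs(mount_point)
    return st.f_frsize * st.f_blocks / 2**40

if __name__ == "__main__":
    raw_tb = 10.0  # what the disk label says: 10 TB
    print(f"{raw_tb} TB raw is only {tb_to_tib(raw_tb):.2f} TiB before any overhead")
    # RAID parity, filesystem metadata, and reserved blocks shrink this further,
    # which is why usable space often ends up ~30% below the raw number.
    print(f"/scratch usable: {usable_tib('/scratch'):.2f} TiB")  # hypothetical mount point
```

The base-10/base-2 gap alone is about 9%; RAID, filesystem metadata, and reserved space account for the rest of the commonly quoted ~30% difference.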
Parallel/Distributed vs. Serial Filesystems*

Serial
• It doesn't scale beyond a single server
• It often isn't easy to make it reliable or redundant beyond a single server
• A single server controls everything

Parallel
• Speed increases as more components are added to it
• Built for distributed redundancy and reliability
• Multiple servers contribute to the management of the filesystem

*None of these things are 100% true

The Most Common Solutions for HPC
Want to access your data from everywhere? You need "Network Attached Storage (NAS)"!
• NFS (serial-ish)
• GPFS (parallel)
• Lustre (parallel)
• Panasas (parallel)
• What about others like OrangeFS, Gluster, Ceph, XtreemFS, CIFS, HDFS, Swift, etc.?

Prepare for a Challenge
Administrative burden & needed expertise (anecdotal), from low to high:
• NFS (low)
• Panasas
• GPFS
• Lustre (high)
Your mileage may vary!

Network File System (NFS)
• Can be built from commodity parts or purchased as an appliance
• A single server typically controls everything
• No software cost
• Compatible (not 100% POSIX)
• Underlying filesystem does not matter much (ZFS, ext3, …)
• True redundancy is harder (single point of failure)
• Mostly for low-volume, low-throughput workloads
• Strong client-side caching; works well for small files
• Requires minimal expertise and is (relatively) easy to manage
• Where does it fall for our tradeoffs?

General Parallel File System (GPFS)
• Can be built from commodity parts or purchased as an appliance
• All nodes in the GPFS cluster participate in filesystem management
• Metadata is managed by every node in the cluster
• Where does it fall in our tradeoffs?
(diagram: NSD servers and clients connected by a network)

Lustre
• Can be built from commodity parts, or purchased as an appliance
• Separate servers for data and metadata
• Where does it fall in our tradeoffs?
(Image credit: nor-tech.com)

Panasas
• Is an appliance
• Separate servers for metadata and data
• Where does it fall in our tradeoffs?
(Image credit: panasas.com)

Appliances
(screenshot of the Panasas management tool)
• Appliances generally come with vendor tools for monitoring and management
• Do these tools increase or decrease management complexity?
• How important is vendor support for your team?

Good idea? Bad idea? Let's discuss!
• NFS for everything
• Panasas for everything
• Lustre for everything
• GPFS for everything

How about…
• Lustre for work (files stored here are temporary)
• NFS for home
• Tape for backup and archival
• Lustre available everywhere
• Tape available on data movers
• NFS only available on login machines
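In a mixed layout like the one above (Lustre for scratch work, NFS for home, tape behind data movers), job scripts sometimes want to check what kind of filesystem a path actually lives on before deciding where to write. Below is a minimal sketch assuming a Linux client where /proc/mounts is readable and filesystem types show up as strings such as "lustre" or "nfs"; the paths in the example are hypothetical.

```python
import os

def fs_type(path: str, mounts_file: str = "/proc/mounts") -> str:
    """Return the filesystem type (e.g. 'lustre', 'nfs4', 'gpfs') of the mount holding path."""
    path = os.path.realpath(path)
    best_mp, best_type = "", "unknown"
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue
            mount_point, fstype = fields[1], fields[2]
            # Longest-prefix match: the deepest mount point containing 'path' wins.
            if (path == mount_point or path.startswith(mount_point.rstrip("/") + "/")) \
                    and len(mount_point) > len(best_mp):
                best_mp, best_type = mount_point, fstype
    return best_type

if __name__ == "__main__":
    for p in ("/scratch", os.path.expanduser("~")):  # hypothetical scratch area and NFS home
        print(f"{p}: {fs_type(p)}")
```

The same check can steer large temporary files onto the parallel filesystem while leaving small, frequently reopened files on NFS, where client-side caching helps.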
Designing your storage solution
• Who are the stakeholders?
• How quickly should we be able to read any one file?
• How will people want to use it?
• How much training will you need?
• How much training will your users need to effectively use your storage?
  • Do you have the knowledge necessary to do the training?
  • How often do they need the training?
• Do you need different tiers or types of storage?
  • Long-term
  • Temporary
  • Archive
• From what science/usage domains are the users?
  • aka what applications will they be using?
• What features are necessary?

Application Driven Tradeoffs
• Domain science
  • Chemistry
  • Aerospace
  • Bio* (biology, bioinformatics, biomedical)
  • Physics
  • Business
  • Economics
  • etc.
• Data and application restrictions
  • HIPAA and PHI
  • ITAR
  • PCI DSS
  • And many more (SOX, GLBA, CJIS, FERPA, SOC, …)

What you need to know
• What is the distribution of files?
  • sizes, count
• What is the expected workload?
  • How many bytes are written for every byte read?
  • How many bytes are read for each file opened?
  • How many bytes are written for each file opened?
• Are there any system-based restrictions?
  • POSIX conformance: do you need a POSIX filesystem?
  • Limitations on the number of files or files per directory
  • Network compatibility (IB vs. Ethernet)

Use Case: Data Movement
• Scenario: a user needs to import a lot of data
• Where is the data coming from?
  • Campus LAN?
  • Campus WAN?
  • WAN?
• How often will the data be ingested?
• Does it need to be outgested?
• What kind of data is it?
• Is it a one-time ingest or regular?

Designing your storage solution
• What technologies do you need to satisfy the requirements that you now have?
• Can you put a number on the following?
  • Minimum disk throughput from a single compute node
  • Minimum aggregate throughput for the entire filesystem for a benchmark (like iozone or IOR)
  • I/O load for representative workloads from your site
  • How much data and metadata is read/written per job?
  • Temporary space requirements
  • Archive and backup space requirements
  • How much churn is there in data that needs to be backed up?

Storage Devices
From high speed & cost / low capacity to low speed & cost / high capacity:
• Solid State
  • RAM
  • PCIe SSD
  • SATA/SAS SSD
• Spinning Disk
  • SAS
  • NL-SAS
  • SATA
• Tape
Device characteristics:
• Serial ATA (SATA): $/byte, large capacity, less reliable, slower (7.2k RPM)
• Serial Attached SCSI (SAS): $$/byte, small capacity, reliable, fast (15k RPM)
• Nearline-SAS (NL-SAS): SATA drives with a SAS interface; more reliable than SATA, cheaper than SAS, ~SATA speeds but with lower overhead
• Solid State Disk (SSD): no spinning disks, $$$/byte, blazing fast, reliable¹

What is an IOP?
• IOP == Input/Output Operation
• IOPS == Input/Output Operations per Second
• We care about two IOPS reports:
  • The number we tell people: "Our Veridian Dynamics Frobulator 2021 gets 300 PiB/s bandwidth!"
  • The number that affects users: "Our Veridian Dynamics Frobulator 2021 only gets 5 KiB/s for <insert your application's name>"
• Why the difference?

More tradeoffs …
Space vs. Speed
• Do you need 10 GiB/s and 10 TiB of space?
• Do you need 1 PiB of usable storage and 1 GiB/s?
• How do you meet your requirements?
Large vs. Small Files
• What is a small file?
• No hard rule. It depends on how you define it.
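Both questions above, "what is the distribution of files?" and "what is a small file?", are easiest to answer with data from your own users. The sketch below walks a directory tree and bins file sizes into power-of-two buckets; the path it scans is a hypothetical example and should point at a representative project directory.

```python
import os
from collections import Counter

def size_histogram(root: str) -> Counter:
    """Count files under 'root' by power-of-two size bucket."""
    buckets = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # broken symlink, permissions, or file removed mid-scan
            buckets[size.bit_length()] += 1  # files in bucket b are < 2**b bytes
    return buckets

if __name__ == "__main__":
    hist = size_histogram("/scratch/myproject")  # hypothetical path
    total = sum(hist.values()) or 1
    for b in sorted(hist):
        print(f"< {2**b:>14,} bytes: {hist[b]:>8,} files ({100 * hist[b] / total:.1f}%)")
```

If most of the file count sits in the small buckets while most of the bytes sit in a few large files, metadata performance and small-file handling matter at least as much as streaming bandwidth.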
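The "why the difference?" question on the IOPS slide usually comes down to I/O size: bandwidth is roughly IOPS times the size of each operation, so a device quoted with large sequential transfers looks nothing like the same device under small random requests. A minimal sketch of that arithmetic, using made-up illustrative numbers rather than measurements of any real product:

```python
def bandwidth(iops: float, io_size_bytes: float) -> float:
    """Bandwidth in bytes/s is (operations per second) x (bytes per operation)."""
    return iops * io_size_bytes

if __name__ == "__main__":
    KiB, MiB, GiB = 2**10, 2**20, 2**30
    # The number we tell people: large sequential I/O keeps every disk busy.
    print(f"20,000 IOPS x 1 MiB = {bandwidth(20_000, MiB) / GiB:.1f} GiB/s")
    # The number users see: small random I/O moves far fewer bytes per operation,
    # and the device sustains far fewer of those operations per second.
    print(f"   500 IOPS x 4 KiB = {bandwidth(500, 4 * KiB) / MiB:.2f} MiB/s")
```

The slide's gap between "300 PiB/s" marketing bandwidth and "5 KiB/s" application experience is exaggerated for effect, but the mechanism is the same.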