HPC Storage, Part 1&2

Linux Clusters Institute: HPC Storage, Part 1&2
Rutgers University, 19-23 August 2019
Garrett McGrath, Princeton Neuroscience Institute
[email protected]

HPC Storage Concepts, Planning and Implementation

Targets for Session #1
Target Audience: Those involved in designing, implementing, or managing HPC storage systems.
Outline:
● Concepts and Terminology
● Goals & Requirements
● Storage Hardware
● File Systems
● Wrap Up

Concepts and Terminology

What is Storage?
A place to store data, either temporarily or permanently.
● Processor Cache
  ○ Fastest access; closest to the CPU; temporary
● System Memory (DRAM)
  ○ Very fast access; close to the CPU but not on it; temporary
● Solid State Storage
  ○ Fast access
  ○ Can be system internal or part of an external storage system
  ○ Capable of high densities with high associated costs
● Spinning Disk
  ○ Slow; performance is tied to access behavior
  ○ Can be system internal or part of an external storage system
  ○ Capable of extremely high densities
● Tape
  ○ Extremely slow; typically found only in tape libraries
[Figure: storage hierarchy pyramid: CPU registers; cache (L1, L2, L3); memory (DRAM, HBM); solid state disk (SATA SSD, M.2 module, PCIe card); spinning disks (PMR, SMR, HAMR/MAMR); tape (DLT-S, DAT, AIT, LTO, QIC). Latency and size increase toward the bottom of the pyramid; bandwidth increases toward the top.]

Concepts and Terminology
• IOPS: Input/Output Operations Per Second
• RAID: Redundant Array of Inexpensive Disks
• JBOD: Just a Bunch Of Disks
• RAS: reliability, availability, serviceability
• Storage Server: provides direct access to storage devices and functions as the data manager for those disks
• Storage Client: accesses data, but plays no role in data management
• LAN: local area network
• WAN: wide area network
• SAN: storage area network

Concepts and Terminology
• High Availability (HA)
  • Components are configured in failover pairs
  • Prevents a single point of failure in the system
  • Prevents a service outage
• Failover Pairs
  • Active/Active
    • Both components share the load
    • On failure, one component takes over the complete load
  • Active/Passive
    • One component services requests, the other is in standby
    • On failure, the standby becomes active
• Networks
  • InfiniBand (IB)
  • Ethernet (TCP/IP)
• Host Connectivity
  • Host Bus Adapter (HBA)
  • Network Interface Card (NIC)
[Figure: storage system with an active controller and a standby controller in front of a row of data nodes (DN).]

Concepts and Terminology
• Raw Space: what the disk label shows. Typically given in base 10.
  • 10TB (terabyte) == 10*10^12 bytes
• Useable Space: what `df` shows once the storage is mounted. Typically given in base 2.
  • 10TiB (tebibyte) == 10*2^40 bytes
• Useable space is often about 30% smaller than raw space
  • Some space is used for RAID overhead, file system overhead, etc. File system overhead is applied after RAID overhead, further reducing the usable space.
  • Learning how to calculate this is a challenge (a rough estimate is sketched below)
  • Dependent on the levels of redundancy and the file system you choose
[Figure: RAID overhead examples: 3 drives = 2 storage drives + 1 parity drive (~33% RAID overhead); 4 drives = 3 storage drives + 1 parity drive (25% RAID overhead).]
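Not from the original slides: a minimal back-of-the-envelope sketch in Python of the raw-versus-usable calculation described above, assuming a single RAID set and a flat, made-up 5% file-system overhead. Real numbers depend on the RAID geometry, spare policy, and file system you actually deploy, so treat this only as a first approximation.

```python
def usable_tib(num_drives, drive_tb, parity_drives, fs_overhead=0.05):
    """Rough usable-capacity estimate for one RAID set.

    drive_tb is the marketing (base-10) capacity per drive.
    fs_overhead is an assumed file-system overhead fraction; the real
    value varies by file system and must be measured.
    """
    raw_bytes = num_drives * drive_tb * 10**12                    # base-10 raw capacity
    data_bytes = raw_bytes * (num_drives - parity_drives) / num_drives  # RAID overhead
    data_bytes *= (1 - fs_overhead)                               # file-system overhead after RAID
    return data_bytes / 2**40                                     # report in TiB (base 2)

# Example: eight 10 TB drives in a RAID 6-style layout (2 parity drives)
print(f"{usable_tib(8, 10, 2):.1f} TiB usable")   # ~51.8 TiB out of 80 TB raw
```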
Goals and Requirements

Which Storage Architecture Is Best?
● Short Answer: Whichever solution solves all your problems
● Long Answer: There is no single best solution for all scenarios
  ○ Each is designed to solve specific problems and serve specific requirements
  ○ Each works well when built and deployed according to its strengths
  ○ Usage requirements and access patterns define which is the best choice
    ■ Application Requirements
    ■ User Expectations
    ■ Budget Constraints
    ■ Expertise of the support team
● Compromise based on competing needs is almost always the end result

Storage System Design Goal: Balance
● The Ideal:
  ○ All components of the system contribute equally to the overall performance of the system
● The Reality:
  ○ Competing needs lead to compromises that cause imbalances in the system
Common Imbalances:
● Capacity is prioritized over bandwidth; the number of disks exceeds the performance capabilities of the controllers, disk interconnect, or HBAs
● Overall output of the storage system exceeds the network capacity of the computational systems

Requirements Evaluation
• Stakeholders
  • Computational Users
  • Management
  • Policy Managers
  • Funding Agencies
  • System Administration Staff
  • Infrastructure Support Staff
  • IT Security Staff
• Usage Patterns
  • Write dominant
  • Read dominant
  • Streaming I/O vs Random I/O
• User Profiles
  • Expert vs. Beginner
  • Custom vs. commercial application
• I/O Profiles
  • Serial I/O
  • Parallel I/O
  • MapReduce I/O
  • Large Files
  • Small Files
• Infrastructure Profile
  • Integrated with HPC resource
  • Standalone storage solution
  • Network connectivity
  • Security requirements

Gathering Stakeholder Requirements
• Who are your stakeholders?
• What features are they looking for?
• How will people want to use the storage?
• What usage policies need to be supported?
• From what science/usage domains are the users?
• What applications will they be using?
• How much space do they anticipate needing?
• Can they define the performance characteristics they need?
• Are there expectations of access from multiple systems?
• What is the distribution of files?
  • Sizes, count
• What is the typical I/O pattern?
  • How many bytes are written for every byte read?
  • How many bytes are read for each file opened?
  • How many bytes are written for each file opened?
• Are there any system-based restrictions?
  • POSIX conformance - do you need a POSIX interface to the file system?
  • Limitations on the number of files or files per directory
  • Network compatibility (IB, Eth)

Application I/O Access Patterns
• An application’s I/O transaction sizes, and the order in which they are accessed, define its I/O access pattern. This is a combination of how the application does I/O and how the file system handles I/O requests.
• For typical HPC file systems, sequential I/O of large blocks provides the best performance. Unfortunately, these types of I/O patterns aren’t the most common.
• Understanding the I/O access patterns of your major applications can help you design a solution your users will be happy with (a rough way to measure this is sketched below).
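Not part of the original deck: a rough Python sketch of how sequential large-block I/O and small random I/O can be compared on a single node. The file name and sizes are arbitrary choices for the demo, and the operating system page cache will inflate the results; real characterization work is normally done with purpose-built benchmarks (for example fio or IOR) using direct I/O.

```python
import os
import random
import time

PATH = "testfile.bin"            # hypothetical scratch file for the demo
SIZE = 256 * 1024 * 1024         # 256 MiB test file
BLOCK = 1 << 20                  # 1 MiB "large block" sequential reads
SMALL = 4096                     # 4 KiB random reads

def make_file():
    # Write incompressible data so the file occupies real blocks
    with open(PATH, "wb") as f:
        f.write(os.urandom(SIZE))

def sequential_read():
    # Stream the file from beginning to end in large blocks
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(BLOCK):
            pass
    return SIZE / (time.perf_counter() - start) / 1e6   # MB/s

def random_read(n_ops=10000):
    # Issue many small reads at random offsets and report IOPS
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for _ in range(n_ops):
            f.seek(random.randrange(0, SIZE - SMALL))
            f.read(SMALL)
    return n_ops / (time.perf_counter() - start)        # IOPS

if __name__ == "__main__":
    make_file()
    print(f"sequential: {sequential_read():.0f} MB/s")
    print(f"random 4K : {random_read():.0f} IOPS")
```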
Common Data Access Patterns
● Streaming (bandwidth centric)
  ○ Records are accessed only once; the file is read/written from beginning to end
  ○ Minimal overall IOPS
  ○ Files tend to be large and performance is measured in bandwidth
  ○ Common in digital media, HPC, scientific applications, DSP
● Discrete File I/O (IOP centric)
  ○ Small individual transactions; may not even read a full block at a time
    ■ Small files, random access
  ○ File IOPS can be high
  ○ Common in bioinformatics, rendering, home directories
● Transaction Processing (IOP centric)
  ○ Small transactions with good temporal locality; individual updates may be smaller than a block, but consecutive transactions tend to fall in contiguous blocks
  ○ File IOPS can be high
  ○ Common in databases and commercial applications

HPC I/O Access Patterns
● Traditional HPC
  ○ Streaming large block writes (low IOPS rates)
  ○ Large output files
  ○ Minimal metadata operations
● More common today
  ○ Random I/O patterns (high IOPS rates)
  ○ Smaller output files
  ○ Large number of metadata operations
Challenges:
● Choosing a block size that fits your application I/O pattern
● IOPS become more important with random I/O patterns and small files

Gathering Data Requirements
• Do you need different tiers or types of storage?
  • Active long-term (project space)
  • Temporary (scratch space)
  • Archive (disk or tape)
  • Backups (snapshots, disk, tape)
• Encryption
• Data Restrictions
  • HIPAA and PHI
  • ITAR
  • FISMA
  • PCI DSS
  • And many more (SOX, GLBA, CJIS, FERPA, SOC, …)
• Ingest/Outgest
  • Data transfer characteristics

Training and Support Requirements
• Training
  • Sys Admin Staff
    • How much training does your staff need?
    • Vendor supplied training?
    • Does someone on your staff have the expertise to provide training?
  • User Services
    • How much training does your user support staff need?
    • Does your user support staff have the expertise to provide user training?
  • Users
    • How much training will your users need to effectively use the system?
    • How often will training need to be provided?
• System Support
  • Does the vendor provide support for all components of your system?
  • Does support for parts of your system come from the open source community?
  • What are the support requirements for your staff?
    • 7x24
    • 8x5 M-F
  • Do you have Service Level Agreements (SLAs) with your user community?

Common Storage Usage
• Temporary storage for intermediate job results
  • Typical ‘scratch’ usage
• Active long-term storage for runtime use
• Backups
• Archival
• Data transfer services (DTNs)
• NFS/CIFS
• Centralized software repositories
• Data pre-processing
• Data post-processing
• Data serving
  • Data portals
  • Web services
• Virtual machine hosting
• Database hosting
• Data ingestion
• System Administration Storage
  • Log files
  • Monitoring
  • Cluster management tools

Common Design Tradeoffs
• Aggregate Speed or Bandwidth
• Capacity
• Scalability
• Cost/Budget
• Physical Space
• Environmental Needs
  • Power, cooling, etc.
• Reliability/Redundancy Features
• Sys Admin Features
  • Management tools
  • Monitoring
  • Vendor support
  • Community support

Storage Hardware

Storage Characteristics
• Controllers
• Host Connections
• Chassis
• Drawers/Trays
• Disk Channels
  • SAS, SATA protocols
• Disks
  • Spinning
  • Solid state
  • SAS, SATA, NVMe protocols
• JBOD
• RAID
• Block Storage
• Object Storage
• Networks
  • SAN, LAN, WAN
• Tape Drives
• Tapes
• Disk Cache

Storage Evaluation
• System Features
  • Data integrity features
    • RAID, erasure coding (a single-parity sketch follows below)
• User Interfaces
  • POSIX based file system
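Not in the original slides: a tiny Python illustration of the single-parity idea behind RAID 5-style data integrity. Parity is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the survivors; real arrays and erasure codes (such as Reed-Solomon in RAID 6 and object stores) add much more machinery on top of this.

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across equal-sized blocks
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]     # three data blocks in one stripe
parity = xor_blocks(data)              # parity block stored alongside the data

# Simulate losing block 1 and rebuilding it from the survivors plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print("rebuilt block:", rebuilt)
```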