HPC Storage, Part 1&2

Linux Clusters Institute: HPC Storage, Part 1&2
Rutgers University, 19-23 August 2019
Garrett McGrath, Princeton Neuroscience Institute

HPC Storage Concepts, Planning and Implementation

Targets for Session #1
Target Audience: Those involved in designing, implementing, or managing HPC storage systems.
Outline:
● Concepts and Terminology
● Goals & Requirements
● Storage Hardware
● File Systems
● Wrap Up

Concepts and Terminology
What is Storage?
A place to store data, either temporarily or permanently.
● Processor Cache
  ○ Fastest access; closest to the CPU; temporary
● System Memory (DRAM)
  ○ Very fast access; close to the CPU but not on it; temporary
● Solid State Storage
  ○ Fast access
  ○ Can be system internal or part of an external storage system
  ○ Capable of high densities with high associated costs
● Spinning Disk
  ○ Slow; performance is tied to access behavior
  ○ Can be system internal or part of an external storage system
  ○ Capable of extremely high densities
● Tape
  ○ Extremely slow; typically found only in libraries
[Figure: storage hierarchy. Tiers: CPU Registers; Cache (L1, L2, L3); Memory (DRAM, HBM); Solid State Disk (SATA SSD, M.2 Module, PCIe Card); Spinning Disks (PMR, SMR, HAMR/MAMR); Tape (DLT-S, DAT, AIT, LTO, QIC). Arrow labels: "Latency & Size Increase", "Bandwidth Increase".]

Concepts and Terminology
• IOPS: Input/Output Operations Per Second
• RAID: Redundant Array of Inexpensive Disks
• JBOD: Just a Bunch of Disks
• RAS: Reliability, Availability, Serviceability
• Storage Server: provides direct access to a storage device and functions as the data manager for that device
• Storage Client: accesses data, but plays no role in data management
• LAN: Local Area Network
• WAN: Wide Area Network
• SAN: Storage Area Network

Concepts and Terminology
• High Availability (HA)
  • Components are configured in failover pairs
  • Prevents a single point of failure in the system
  • Prevents a service outage
• Failover Pairs
  • Active/Active
    • Both components share the load
    • On failure, one component takes over the complete load
  • Active/Passive
    • One component services requests; the other is in standby
    • On failure, the standby becomes active
• Networks
  • InfiniBand (IB)
  • Ethernet (TCP/IP)
• Host Connectivity
  • Host Bus Adapter (HBA)
  • Network Interface Card (NIC)
[Figure: Storage, Active Controller, Standby Controller, and four nodes labeled DN.]

Concepts and Terminology
• Raw Space: what the disk label shows. Typically given in base 10.
  • 10TB (terabyte) == 10*10^12 bytes
• Usable Space: what `df` shows once the storage is mounted. Typically given in base 2.
  • 10TiB (tebibyte) == 10*2^40 bytes
• Usable space is often about 30% smaller than raw space
  • Some space is used for RAID overhead, file system overhead, etc. File system overhead is applied after RAID overhead, further reducing the usable space.
  • Learning how to calculate this is a challenge
  • Dependent on the levels of redundancy and the file system you choose
[Figure: 3 drives (2 storage drives, 1 parity drive): about 33% RAID overhead. 4 drives (3 storage drives, 1 parity drive): 25% RAID overhead.]

Goals and Requirements

Which Storage Architecture Is Best?
● Short Answer: Whichever solution solves all your problems
● Long Answer: There is no single best solution for all scenarios
  ○ Each is designed to solve specific problems and serve specific requirements
  ○ Each works well when built and deployed according to its strengths
  ○ Usage requirements and access patterns define which is the best choice
    ■ Application Requirements
    ■ User Expectations
    ■ Budget Constraints
    ■ Expertise in the support team
● Compromise based on competing needs is almost always the end result

Storage System Design Goal: Balance
● The Ideal:
  ○ All components of the system contribute equally to the overall performance of the system
● The Reality:
  ○ Competing needs will lead to compromises that cause imbalances in the system.
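One way to make the balance goal concrete is to treat the data path as a chain of stages and note that sustained throughput cannot exceed the slowest one. The sketch below is illustrative only: the stage names, drive counts, and per-component bandwidth figures are hypothetical assumptions, not numbers from this session.

```python
# Minimal sketch of a balance check for a storage data path.
# All figures are hypothetical assumptions chosen for illustration.
stage_limits_gb_per_s = {
    "disks": 60 * 0.25,       # 60 drives at an assumed 0.25 GB/s each
    "controllers": 2 * 5.0,   # 2 controllers at an assumed 5 GB/s each
    "hbas": 4 * 1.5,          # 4 HBAs at an assumed 1.5 GB/s each
    "client_network": 8.0,    # assumed aggregate network capacity
}


def find_bottleneck(limits):
    """Return (stage, limit) for the slowest stage in the data path."""
    stage = min(limits, key=limits.get)
    return stage, limits[stage]


def usable_fraction(limits):
    """Fraction of each stage's capability that the bottleneck lets you use."""
    _, cap = find_bottleneck(limits)
    return {stage: cap / limit for stage, limit in limits.items()}


if __name__ == "__main__":
    stage, cap = find_bottleneck(stage_limits_gb_per_s)
    print(f"bottleneck: {stage} at {cap:.1f} GB/s")
    for name, frac in usable_fraction(stage_limits_gb_per_s).items():
        print(f"{name}: {frac:.0%} of capability usable")
```

In a perfectly balanced system every stage would report 100%; stages far below that represent capability that was paid for but cannot be reached through the rest of the path.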
Common Imbalances:
● Capacity is prioritized over bandwidth; the number of disks exceeds the performance capabilities of the controllers, disk interconnect, or HBAs
● Overall output of the storage system exceeds the network capacity of the computational systems

Requirements Evaluation
• Stakeholders
  • Computational Users
  • Management
  • Policy Managers
  • Funding Agencies
  • System Administration Staff
  • Infrastructure Support Staff
  • IT Security Staff
• Usage Patterns
  • Write dominated
  • Read dominated
  • Streaming I/O vs. Random I/O
• User Profiles
  • Expert vs. Beginner
  • Custom vs. commercial application
• I/O Profiles
  • Serial I/O
  • Parallel I/O
  • MapReduce I/O
  • Large Files
  • Small Files
• Infrastructure Profile
  • Integrated with HPC resource
  • Standalone storage solution
  • Network connectivity
  • Security requirements

Gathering Stakeholder Requirements
• Who are your stakeholders?
• What features are they looking for?
• How will people want to use the storage?
• What usage policies need to be supported?
• From what science/usage domains are the users?
• What applications will they be using?
• How much space do they anticipate needing?
• Can they define the performance characteristics they need?
• Are there expectations of access from multiple systems?
• What is the distribution of files?
  • Sizes, count
• What is the typical I/O pattern?
  • How many bytes are written for every byte read?
  • How many bytes are read for each file opened?
  • How many bytes are written for each file opened?
• Are there any system-based restrictions?
  • POSIX conformance - do you need a POSIX interface to the file system?
  • Limitations on number of files or files per directory
  • Network compatibility (IB, Eth)

Application I/O Access Patterns
• An application’s I/O transaction sizes, and the order in which the data is accessed, define its I/O access pattern. This is a combination of how the application does I/O and how the file system handles I/O requests.
• For typical HPC file systems, sequential I/O of large blocks provides the best performance. Unfortunately, these I/O patterns aren’t the most common.
• Understanding the I/O access patterns of your major applications can help you design a solution your users will be happy with.

Common Data Access Patterns
● Streaming (bandwidth centric)
  ○ Records are accessed only once; the file is read/written from beginning to end
  ○ Minimal overall IOPS
  ○ Files tend to be large, and performance is measured in bandwidth
  ○ Common in Digital Media, HPC, Scientific Applications, DSP
● Discrete File I/O (IOPS centric)
  ○ Small individual transactions; may not even read a full block at a time
    ■ Small files, random access
  ○ File IOPS can be high
  ○ Common in bioinformatics, rendering, home directories
● Transaction Processing (IOPS centric)
  ○ Small transactions with good temporal locality; individual updates may be smaller than a block, but consecutive transactions tend to be in contiguous blocks
  ○ File IOPS can be high
  ○ Common in databases and commercial applications

HPC I/O Access Patterns
● Traditional HPC
  ○ Streaming large block writes (low IOPS rates)
  ○ Large output files
  ○ Minimal metadata operations
● More common today
  ○ Random I/O patterns (high IOPS rates)
  ○ Smaller output files
  ○ Large number of metadata operations
Challenges:
● Choosing a block size that fits your application I/O pattern
● IOPS become more important with random I/O patterns and small files

Gathering Data Requirements
• Do you need different tiers or types of storage?
  • Active long-term (project space)
  • Temporary (scratch space)
  • Archive (disk or tape)
  • Backups (snapshots, disk, tape)
• Encryption
• Data Restrictions
  • HIPAA and PHI
  • ITAR
  • FISMA
  • PCI DSS
  • And many more (SOX, GLBA, CJIS, FERPA, SOC, …)
• Ingest/Outgest
  • Data transfer characteristics

Training and Support Requirements
• Training
  • Sys Admin Staff
    • How much training does your staff need?
    • Vendor supplied training?
    • Does someone on your staff have the expertise to provide training?
  • User Services
    • How much training does your user support staff need?
    • Does your user support staff have the expertise to provide user training?
  • Users
    • How much training will your users need to effectively use the system?
    • How often will training need to be provided?
• System Support
  • Does the vendor provide support for all components of your system?
  • Does support for parts of your system come from the open source community?
  • What are the support requirements for your staff?
    • 7x24
    • 8x5 M-F
  • Do you have Service Level Agreements (SLAs) with your user community?

Common Storage Usage
• Temporary storage for intermediate job results
  • Typical ‘scratch’ usage
• Active long-term storage for runtime use
• Backups
• Archival
• Data transfer services (DTNs)
• NFS/CIFS
• Centralized software repositories
• System Administration Storage
  • Log files
  • Monitoring
  • Cluster management tools
• Data pre-processing
• Data post-processing
• Data serving
• Data portals
• Web services
• Virtual machine hosting
• Database hosting
• Data ingestion

Common Design Tradeoffs
• Aggregate Speed or Bandwidth
• Capacity
• Scalability
• Cost/Budget
• Physical Space
• Environmental Needs
  • Power, cooling, etc.
• Reliability/Redundancy Features
• Sys Admin features
  • Management tools
  • Monitoring
• Vendor support
• Community support

Storage Hardware

Storage Characteristics
• Controllers
  • Host Connections
• Chassis
  • Drawers/Trays
• Disk Channels
  • SAS, SATA Protocols
• Disks
  • Spinning
  • Solid state
  • SAS, SATA, NVMe Protocols
• JBOD
• RAID
• Block Storage
• Object Storage
• Networks
  • SAN, LAN, WAN
• Tape Drives
• Tapes
• Disk Cache

Storage Evaluation
• System Features
• User Interfaces
• Data integrity features
• POSIX based file system
• RAID, erasure
