
Linux Clusters Institute: High Performance Storage

University of Oklahoma, 05/19/2015

Mehmet Belgin, Georgia Tech [email protected] (in collaboration with Wesley Emeneker)

The Fundamental Question

• How do we meet *all* user needs for storage?

• Is it even possible?

• Confounding factors
  • User expectations (in their own words)
  • Budget constraints
  • Application needs and use cases
  • Expertise in team
  • Existing infrastructure

Examples of Common Storage Systems

• Network File System (NFS) – a distributed file system protocol for accessing files over a network
• Lustre – a parallel, distributed file system
  • OSS – object storage server. This server stores and manages pieces of files (aka objects)
  • OST – object storage target. This disk is managed by the OSS and stores data
  • MDS – metadata server. This server stores file metadata
  • MDT – metadata target. This disk is managed by the MDS and stores file metadata
• General Parallel File System (GPFS) – a parallel, distributed file system
  • Metadata is not owned by any particular server or set of servers
  • All clients participate in filesystem management
  • NSD – network shared disk
• Panasas/PanFS – a parallel, distributed file system
  • Metadata is owned by director blades
  • File data is owned by storage blades

Nomenclature

• Object store – a place where chunks of data (aka objects) are stored. Objects are not files, though they can store individual files or different pieces of files.
• Raw space – what the disk label shows. Typically given in base 10, i.e. 10 TB (terabytes) == 10*10^12 bytes
• Usable space – what “df” shows once the storage is mounted. Typically given in base 2, i.e. 10 TiB (tebibytes) == 10*2^40 bytes
• Usable space is often about 30% smaller (sometimes more, sometimes less) than raw space (a quick calculation follows below)
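As a quick sanity check of the base-10 vs. base-2 gap alone (before any RAID or filesystem overhead is counted), a one-line bc calculation, assuming bc is available:

$ echo 'scale=2; 10*10^12 / 2^40' | bc
9.09

So a drive labeled 10 TB holds at most about 9.09 TiB; parity, filesystem metadata, and reserved blocks shrink the usable number further.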

Which one is right for me?

Lustre

The End.

Thanks for participating!

Before we start… What is a File System?

What is a filesystem?

• A system for files (Duh!)

• A source of constant frustration

• A filesystem is used to control how data is stored and retrieved –Wikipedia

• It’s a container (that contains files)

• It’s the set of disks, servers (computational components), networking, and software

• All of the above

Disclaimer

• There are no right answers

• There are wrong answers • No, seriously.

• It comes down to balancing tradeoffs of preferences, expertise, costs, and case-by-case analysis

Know Your Stakeholders

… and keep all of them happy! (at the same time)

1. Users
2. Managers and University Leadership
3. University support staff
4. System administrators
5. Vendor

[Diagram: Users, Managers, Sysadmins]

What do you need to support?

Common Storage Requirements (which most users can’t articulate)
• Temporary storage for intermediate results from jobs (a.k.a. scratch)
• Long-term storage for runtime use
• Backups
• Archive
• Exporting said filesystem to other machines (like a user's Windows XP laptop)
• Virtual Machine hosting
• Database hosting
• Map/Reduce (a.k.a. Hadoop)
• Data ingest and outgest (DMZ?)
• System Administrator storage

Tradeoffs

First, try to define ‘use purpose’ and ‘operational lifetime’…

• Speed (… is a relative term!)
• Space
• Cost
• Scalability
• Administrative burden
• Monitoring
• Reliability/Redundancy
• Features
• Support from vendor

Parallel/Distributed vs. Serial Filesystems*

Serial
• It doesn’t scale beyond a single server
• It often isn't easy to make it reliable or redundant beyond a single server
• A single server controls everything

Parallel
• Speed increases as more components are added to it
• Built for distributed redundancy and reliability
• Multiple servers contribute to the management of the filesystem

*None of these things are 100% true

The Most Common Solutions for HPC

Want to access your data from everywhere?

You need “Network Attached Storage (NAS)”!

• NFS (serial-ish)
• GPFS (parallel)
• Lustre (parallel)
• Panasas (parallel)

• What about others like OrangeFS, XtreemFS, CIFS, HDFS, Swift, etc.?

Prepare for a Challenge

Administrative burden & needed expertise (anecdotal), from low to high:
• NFS (low)
• Panasas
• GPFS
• Lustre (high)

• Your mileage may vary!

Network File System (NFS)

• Can be built from commodity parts or purchased as an appliance

• A single server typically controls everything
• Where does it fall for our tradeoffs?
  *Speed *Space *Cost *Scalability *Administrative Burden *Monitoring *Reliability/Redundancy *Features *Vendor Support
• No software cost
• Compatible (not 100% POSIX)
• Underlying filesystem does not matter much (ZFS, …)
• True redundancy is harder (single point of failure)
• Mostly for low-volume, low-throughput workloads
• Strong client-side caching, works well for small files
• Requires minimal expertise and (relatively) easy to manage (a minimal export/mount sketch follows)
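To make the “one server exports, many clients mount” model concrete, here is a minimal sketch of a classic Linux NFS setup; the export path, subnet, and hostname are hypothetical, and options differ across NFS versions and distributions:

# On the server: one line in /etc/exports, then re-export
/export/home  10.0.0.0/24(rw,sync,root_squash)
$ exportfs -ra

# On a client: mount the share
$ mount -t nfs nfs-server.example.edu:/export/home /home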

General Parallel File System (GPFS)

• Can be built from commodity parts or purchased as an appliance (a few common GPFS commands are sketched below)
• All nodes in the GPFS cluster participate in filesystem management
• Metadata is managed by every node in the cluster
• Where does it fall in our tradeoffs?
  *Speed *Space *Cost *Scalability *Administrative Burden *Monitoring *Reliability/Redundancy *Features *Vendor Support

[Diagram: NSD Servers and Clients connected by a network]
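For a feel of day-to-day GPFS administration, a hedged sketch of a few standard mm* commands; the filesystem device name gpfs0 is hypothetical and output formats vary by release:

$ mmlscluster      # cluster members and quorum nodes
$ mmlsnsd          # NSDs and the servers that serve them
$ mmlsfs gpfs0     # filesystem attributes (block size, replication, ...)
$ mmdf gpfs0       # free/used space per NSD and storage pool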

Lustre

• Can be built from commodity parts, or purchased as an appliance
• Separate servers for data and metadata
• Where does it fall in our tradeoffs? (a brief lfs example follows the image credit)

*Speed *Space *Cost *Scalability *Administrative Burden *Monitoring *Reliability/Redundancy *Features *Vendor Support

* Image credit: nor-tech.com
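From the client side, the data/metadata split shows up in the lfs tool; a minimal sketch, where the stripe settings and file name are only illustrative:

$ lfs df -h                           # space per MDT and per OST, not just one total
$ lfs setstripe -c 4 -S 1M bigfile    # create a new file striped across 4 OSTs in 1 MiB stripes
$ lfs getstripe bigfile               # show which OSTs hold the file's objects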

Panasas

• Is an appliance
• Separate servers for metadata and data
• Where does it fall in our tradeoffs?

*Speed *Space *Cost *Scalability *Administrative Burden *Monitoring *Reliability/Redundancy *Features *Vendor Support

* Image credit: panasas.com

Appliances

[Screenshot of Panasas management tool]

• Appliances generally come with vendor tools for monitoring and management
• Do these tools increase or decrease management complexity?
• How important is vendor support for your team?

Good idea? Bad idea? Let’s discuss!

• NFS for everything

• Panasas for everything

• Lustre for everything

• GPFS for everything

How about…

• Lustre for work (files stored here are temporary)
• NFS for home
• Tape for backup and archival

• Lustre available everywhere
• Tape available on data movers
• NFS only available on login machines

Designing your storage solution

• Who are the stakeholders?
• How quickly should we be able to read any one file?
• How will people want to use it?
• How much training will you need?
• How much training will your users need to effectively use your storage?
  • Do you have the knowledge necessary to do the training?
  • How often do they need the training?
• Do you need different tiers or types of storage?
  • Long-term
  • Temporary
  • Archive
• From what science/usage domains are the users?
  • aka what applications will they be using?
• What features are necessary?

Application Driven Tradeoffs

• Domain Science
  • Chemistry
  • Aerospace
  • Bio* (biology, bioinformatics, biomedical)
  • Physics
  • Business
  • Economics
  • etc.
• Data and Application Restrictions
  • HIPAA and PHI
  • ITAR
  • PCI DSS
  • And many more (SOX, GLBA, CJIS, FERPA, SOC, …)

What you need to know

• What is the distribution of files?
  • sizes, count (see the survey one-liner below)
• What is the expected workload?
  • How many bytes are written for every byte read?
  • How many bytes are read for each file opened?
  • How many bytes are written for each file opened?
• Are there any system-based restrictions?
  • POSIX conformance. Do you need a POSIX filesystem?
  • Limitations on number of files or files per directory
  • Network compatibility (IB vs. …)
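One hedged way to get a first look at file counts and sizes on an existing system is a find/awk one-liner; /data is a hypothetical path and -printf assumes GNU find:

$ find /data -type f -printf '%s\n' \
    | awk '{n++; sum+=$1; if ($1 < 2^20) small++}
           END {printf "files: %d  <1MiB: %d  total GiB: %.1f\n", n, small, sum/2^30}'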

Use Case: Data Movement

• Scenario: User needs to import a lot of data
• Where is the data coming from?
  • Campus LAN?
  • Campus WAN?
  • WAN?
• How often will the data be ingested?
• Does it need to be outgested?
• What kind of data is it?
• Is it a one-time ingest or regular? (a transfer sketch follows)
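For a recurring ingest over a network link, a minimal sketch using rsync (hostname and paths are hypothetical); incremental, resumable copies matter more than raw speed on long WAN transfers:

$ rsync -avP user@instrument.example.edu:/export/run42/ /project/ingest/run42/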

Designing your storage solution

• What technologies do you need to satisfy the requirements that you now have?
• Can you put a number on the following?
  • Minimum disk throughput from a single compute node
  • Minimum aggregate throughput for the entire filesystem for a benchmark (like iozone or IOR)
  • I/O load for representative workloads from your site
  • How much data and metadata is read/written per job?
  • Temporary space requirements
  • Archive and backup space requirements
  • How much churn is there in data that needs to be backed up?

Storage Devices

Device hierarchy (speed & cost: high at top, low at bottom; capacity: the reverse):
• Solid State
  • RAM
  • PCIe SSD
  • SATA/SAS SSD
• Spinning Disk
  • SAS
  • NL-SAS
  • SATA
• Tape

• Serial ATA (SATA): $/byte, large capacity, less reliable, slower (7.2k RPM)
• Serial Attached SCSI (SAS): $$/byte, small capacity, reliable, fast (15k RPM)
• Nearline-SAS (NL-SAS): SATA drives with a SAS interface: more reliable than SATA, cheaper than SAS, ~SATA speeds but with lower overhead
• Solid State Disk (SSD): no spinning disks, $$$/byte, blazing fast, reliable

What is an IOP?

• IOP == Input/Output Operation
• IOPS == Input/Output Operations per Second
• We care about two IOPS reports
  • The number we tell people when we say “Our Veridian Dynamics Frobulator 2021 gets 300PiB/s bandwidth!”
  • The number that affects users: “Our Veridian Dynamics Frobulator 2021 only gets 5KiB/s for …”
• Why the difference? (see the illustration below)
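One hedged way to see why the two numbers diverge is to drive the same storage with very different I/O sizes; the file names are illustrative, and oflag=direct bypasses the page cache (see the dd slide later):

$ dd if=/dev/zero of=./big.dat bs=1M count=1024 oflag=direct      # few large sequential I/Os (~1 GiB)
$ dd if=/dev/zero of=./small.dat bs=4k count=262144 oflag=direct  # many tiny I/Os, same total bytes

The first run is limited mostly by bandwidth, the second mostly by per-operation overhead, so the reported MB/s can differ by orders of magnitude.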

More tradeoffs …

Space vs. Speed
• Do you need 10GiB/s and 10TiB of space?
• Do you need 1PiB of usable storage and 1GiB/s?
• How do you meet your requirements?

Large vs. Small Files
• What is a small file?
  • No hard rule. It depends on how you define it.
  • At GT, small is < 1MiB
• Why do you care?
  • Metadata operations are deadly. A metadata lookup on a 1TiB file takes the same time as a lookup on a 1KiB file (see the comparison below).
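A rough, hedged way to feel the per-file overhead on a given filesystem is to compare many small writes against one equivalent large write; the counts are illustrative, and the loop also pays process-startup costs, but the per-file create/open/close is what hurts on shared filesystems:

$ time bash -c 'for i in $(seq 1 10000); do dd if=/dev/zero of=f.$i bs=1k count=1 2>/dev/null; done'  # ~10 MiB as 10,000 files
$ time dd if=/dev/zero of=big.dat bs=1M count=10                                                      # ~10 MiB as one file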

Example Storage Solution (Georgia Tech) – informational purposes only, not a recommendation

Experienced catastrophic failure(s) with all of them at least once

• Panasas appliance for scratch (shared by all)
• GPFS appliance on SATA/NL-SAS/SAS for long-term (and some home)
• NFS on SATA for long-term (many servers)
• NFS on SATA for home (a few servers)
• NFS for administrative storage
• NFS for daily backups
• Coraid system (NFS) for application repository and VM images

• Building a GPFS from commodity components for scratch!

Storage Policies (Georgia Tech)

• 5GB home space
  • backed up daily
  • provided by GT
  • NFS
• ∞ project space
  • backed up daily
  • faculty-purchased, but GT buys the backup space
  • Mix of NFS and GPFS (transitioning to GPFS)
• 5TB/7TB Scratch/Temporary
  • not backed up
  • purchased by GT
  • PanFS (soon to be something else)

Storage Policies (Georgia Tech)

• Scratch
  • files older than 60 days are marked for removal (see the find example below)
  • Users are given one week to save their data (or make a plea for more time)
  • Marked files are removed after 1 week
  • Not backed up
• Quotas
  • Quota increases must be requested by owner or designated manager
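A policy like “older than 60 days” often starts with something as simple as find; a minimal sketch, assuming a hypothetical /scratch tree and modification time as the criterion (production setups may use a dedicated purge tool or filesystem policy engine):

$ find /scratch -type f -mtime +60 -print > /tmp/purge-candidates.txt   # list first, delete later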

Best Practices

• Benchmark the system whenever you can
  • Especially when you first get it (this is the baseline)
  • Then, every time you take the system down (so that you can tell if something has changed)
  • Run the EXACT SAME test! (a baseline-logging sketch follows this list)
• Test the redundancy and reliability
  • Does it survive a drive or server failure? Power something off or rip it out while you are putting a load on it
• Don’t rely solely on generic benchmarks
  • Run the applications your stakeholders care about
• Regularly get data about your data
• Monitor the status of your filesystem, proactively fix problems
• Constantly ask users (and other stakeholders) how they feel about performance
  • It doesn’t matter if benchmarks are good if they feel it is bad
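One hedged way to keep the “exact same test” honest is to script it once and keep dated logs; the flags mirror the iozone examples later in the deck, and the target path is hypothetical:

$ iozone -i 0 -i 1 -+n -I -r 1M -s 4G -f /scratch/bench/testfile | tee iozone-$(date +%F).log

Comparing today’s log against the first (baseline) log makes regressions after maintenance easy to spot.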

How About Cloud and Big Data?

Design/Standards:
• POSIX: Portable Operating System Interface (NFS, GPFS, Panasas, Lustre)
• REST: Representational State Transfer, designed for scalable web services

Case-specific solutions:
• Software defined, hardware independent storage (e.g. Swift)
• Proprietary object storage (e.g. S3 for AWS, which is RESTful)
• Geo-replication: DDN WOS, Azure; object storage: Ceph vs Gluster vs …
• Big data (map/reduce): Hadoop Distributed File System (HDFS), QFS, …
(A short POSIX vs. REST access example follows.)
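To make the POSIX-versus-REST distinction concrete: the same bytes are reached through a file path and the kernel in one case, and through an HTTP request in the other; a minimal sketch with hypothetical paths and endpoint:

$ cp /gpfs/project/run42/output.h5 .                              # POSIX: open/read via a mounted filesystem
$ curl -O https://objects.example.edu/project-bucket/output.h5    # REST: HTTP GET against an object endpoint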

Future

• Hybridization of storage
  • Connecting different storage systems
  • Seamless migration between storage solutions (object store <-> object store, POSIX <-> object)

• Ethernet connected drives
  • Seagate’s Kinetic interface
  • HGST’s open Ethernet drive

• YAC (Yet Another Cache)
  • Cache Acceleration Software
  • DDN Infinite Memory Engine
  • IBM FlashCache

BONUS material: a little bit of Benchmarking

• Use real user applicaons when possible!

• “dd” … quick & easy.

• “iozone” great for single/multi-node read/write performance

• “Bonnie++” simple to run, but a comprehensive suite of tests

• “zcav” good test for spinning hard disks, where speed is a function of distance from the first sector

dd

• Never run as root (destructive if used incorrectly)!
• Reads from an input file (“if”) and writes to an output file (“of”)
• You don’t need to use real files…
  • can read from devices, e.g. /dev/zero, /dev/random, etc.
  • can write to /dev/null
• Caching can be misleading… prefer direct I/O (oflag=direct)

Example:
$ dd if=/dev/zero of=./test.dd bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 37.8403 s, 28.4 MB/s
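For the read direction, a similar hedged one-liner reads the file just written back to /dev/null; iflag=direct again bypasses the page cache so the number reflects the storage, not RAM:

$ dd if=./test.dd of=/dev/null bs=1G count=1 iflag=direct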

iozone

• Common test utility for read/write performance
• Great for both single-node and multi-node testing (i.e. aggr. perf.)
• Sensitive to caching; use “-I” for direct I/O
• Can run multithreaded (-t)

• Simple, ‘auto’ mode:
  iozone -a
• Or pick tests using ‘-i’:
  -i 0: write/rewrite
  -i 1: read/re-read

Throughput test with 16 processes:
# iozone -i 0 -i 1 -+n -r 1M -s 1G -t 16 -I
…
Each process writes a 1048576 kByte file in 1024 kByte records

Children see throughput for 16 initial writers = 1582953.29 kB/sec
Parent sees throughput for 16 initial writers = 1542978.62 kB/sec
Min throughput per process = 97130.07 kB/sec
Max throughput per process = 100058.16 kB/sec
Avg throughput per process = 98934.58 kB/sec
Min xfer = 1019904.00 kB

Children see throughput for 16 readers = 1393510.91 kB/sec
Parent sees throughput for 16 readers = 1392664.11 kB/sec
Min throughput per process = 84657.09 kB/sec
Max throughput per process = 88483.99 kB/sec
Avg throughput per process = 87094.43 kB/sec
Min xfer = 1003520.00 kB

iozone multi-node testing

• Great for testing HPC storage “peak” aggregate performance
• Network becomes a significant contributor
• Requires a “hostfile” with: hosts, test_dir, iozone_path. E.g.:
  iw-h34-17 /gpfs/pace1/ddn /usr/bin/iozone
  iw-h34-18 /gpfs/pace1/ddn /usr/bin/iozone
  iw-h34-19 /gpfs/pace1/ddn /usr/bin/iozone
• Fire away!
  iozone -i 0 -i 1 -+n -e -r 128k -s <file_size> -t <num_threads> -+m <hostfile>
  -i  : tests (0: write/re-write, 1: read/re-read)
  -+n : no retests selected
  -e  : include flush/fflush in timing calculations
  -r  : record (block) size in kB
  -+m : hostfile listing client machines

Bonnie++

• Comprehensive set of tests (per Wikipedia):
  • Create files in sequential order
  • Stat files in sequential order
  • Delete files in sequential order
  • Create files in random order
  • Stat files in random order
  • Delete files in random order
• Just ‘cd’ to the directory on the filesystem, then run ‘bonnie++’
• Uses 2x client memory (by default) to avoid caching effects
• Reports performance (K/sec, higher is better) and the CPU used to perform operations (%CP, lower is better)
• Highly configurable, check its man page!

Bonnie++

$ bonnie++
Writing with putc()...done
…
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
atlas-6.pace. 7648M 39329  74 235328  34  2599   2 37794  67 37943   4  46.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   740   6   930   2    89   0   282   2   340   1   212   1
atlas-6.pace.gatech.edu,7648M,39329,74,235328,34,2599,2,37794,67,37943,4,46.9,0,16,740,6,930,2,89,0,282,2,340,1,212,1

zcav

• Part of the Bonnie++ suite
• “Constant Angular Velocity (CAV)” tests for spinning media
• I/O performance will differ depending on the distance of the heads from the center of the circular spinning media (first sector)
• Not meaningful for network attached storage
• SSD runs can be interesting (you expect to see a flat line, but…)

[Plots: SATA disk example (http://www.coker.com.au/bonnie++/zcav/results.html) and SSD example (GT machine)]

zcav

Example: What’s going on here??

First run:
$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
0.00 115.75 2.212
0.25 95.93 2.669
0.50 114.63 2.233
0.75 119.14 2.149
…

Second run (immediately after):
$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
#0.00 ++++ 0.092
#0.25 ++++ 0.094
#0.50 ++++ 0.091
…

When you run the same example twice, you see super fast “cached” results! Here’s how you flush I/O cache:

sync && echo 3 > /proc/sys/vm/drop_caches

The End. (for real this time)

Thanks for participating!
