How BeeGFS excels in extreme HPC scale-out environments
HPC Knowledge Meeting '19

www.beegfs.io | 2019 | Alexander Eekhoff, Manager System Engineering

About ThinkParQ

• Established in 2014 as a spin-off from the Fraunhofer Center for High-Performance Computing, with a strong focus on R&D
• 5 rankings in the top 20 on the IO-500 list
• Awarded the HPCwire 2018 Best Storage Product or Technology Award
• Together with partners, ThinkParQ provides fast, flexible, and solid storage solutions around BeeGFS for users' needs

HPC Knowledge Meeting ‘19 Delivering solutions for

• HPC
• AI / Deep Learning
• Life Sciences
• Oil and Gas

HPC Knowledge Meeting ‘19 Technology Partners

HPC Knowledge Meeting ‘19 Partners

[Partner logos by tier (Platinum, Gold, Partners) and region (APAC, EMEA, NA)]


HPC Knowledge Meeting ‘19 BeeGFS – The Leading Parallel Cluster File System

[Architecture diagram: Client Service, Metadata Service, Storage Service; direct parallel file access]

• Performance: well balanced from small to large files
• Ease of Use: easy to deploy and integrate with existing infrastructure
• Scalability: increase file system performance and capacity, seamlessly and nondisruptively
• Robustness: high availability design enabling continuous operations

HPC Knowledge Meeting ‘19 Quick Facts: BeeGFS

• A hardware-independent parallel file system (aka Software-defined Parallel Storage)
• Runs on various platforms: x86, ARM, OpenPOWER, …
• Supports multiple networks (InfiniBand, Omni-Path, Ethernet, …)
• Open source
• Runs on various distros: RHEL, SLES, Ubuntu, …
• NFS, CIFS, Hadoop enabled

[Diagram: /mnt/beegfs/dir with file chunks striped across Storage Servers #1-#5 and metadata on Metadata Server #1]
Simply grow capacity and performance to the level that you need

HPC Knowledge Meeting ‘19 Enterprise Features

BeeGFS Enterprise Features (under support contract):
• High Availability
• Quota Enforcement
• Access Control Lists (ACLs)
• Storage Pools

Support Benefits:
• Professional Support
• Customer Portal (training videos, additional documentation)
• Special repositories with early updates and hotfixes
• Guaranteed next-business-day response

End User License Agreement https://www.beegfs.io/docs/BeeGFS_EULA.txt

HPC Knowledge Meeting ‘19 How BeeGFS Works

beegfs.io What is BeeGFS

[Diagram: a file in /mnt/beegfs/dir is split into chunks 1, 2, 3 and striped across Storage Servers #1-#5; metadata (M) is kept on Metadata Server #1]

Simply grow capacity and performance to the level that you need

HPC Knowledge Meeting ‘19 BeeGFS Architecture

• Client Service
  • Native Linux module to mount the file system
• Management Service
  • Service registry and watchdog
• Metadata Service
  • Maintains striping information for files
  • Not involved in data access between file open/close
• Storage Service
  • Stores the (distributed) file contents
• Graphical Administration and Monitoring Service
  • GUI to perform administrative tasks and monitor system information
  • Can be used for a "Windows-style installation"
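To make the mapping concrete, each service ships as its own package and runs as its own daemon; a minimal sketch, assuming systemd-based BeeGFS 7.x packaging (unit names may differ per version and distribution):

# Management host: service registry / watchdog
$ systemctl start beegfs-mgmtd

# Metadata server(s)
$ systemctl start beegfs-meta

# Storage server(s)
$ systemctl start beegfs-storage

# Client nodes: helper daemon plus the kernel-module client
$ systemctl start beegfs-helperd
$ systemctl start beegfs-client    # mounts file systems listed in /etc/beegfs/beegfs-mounts.conf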

HPC Knowledge Meeting ‘19 BeeGFS Architecture

[Diagram: Clients with direct, parallel file access to Metadata Servers and Storage Servers; Management Host and Graphical Administration & Monitoring system alongside]

• Management Service
  • Meeting point for servers and clients
  • Watches registered services and checks their state
  • Not critical for performance, stores no user data
  • Typically not running on a dedicated machine

HPC Knowledge Meeting ‘19 BeeGFS Architecture

• Metadata Service
  • Stores information about the data:
    • Directory information
    • File and directory ownership
    • Location of user data files on storage targets
  • Not involved in data access between file open/close
  • Faster CPU cores improve latency
  • Manages one metadata target
    • In general, any directory on an existing local file system
    • Typically a RAID1 or RAID10 on SSD or NVMe devices
  • Stores complete metadata, including file size
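For illustration, the striping and ownership information held by the metadata service can be queried and set from any client with beegfs-ctl (the path and values below are examples; output details vary by release):

# Show which metadata node owns an entry and how the file is striped
$ beegfs-ctl --getentryinfo /mnt/beegfs/dir/somefile

# Define the stripe pattern for new files in a directory (e.g. 4 targets, 1m chunks)
$ beegfs-ctl --setpattern --numtargets=4 --chunksize=1m /mnt/beegfs/dir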

HPC Knowledge Meeting ‘19 BeeGFS Architecture

• Storage Service
  • Stores striped user file contents (data chunk files)
  • One or multiple storage services per BeeGFS instance
  • Manages one or more storage targets
    • In general, any directory on an existing local file system
    • Typically a RAID6 (8+2 or 10+2) or RAIDz2 volume, either internal or externally attached
    • Can also be a single HDD, NVMe, or SSD device
  • Multiple RDMA interfaces per server possible
    • Different storage service instances bind to different interfaces
    • Different IP subnets per interface so that routing works correctly
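A quick way to inspect the registered storage targets and their capacity is the standard client-side tooling; an illustrative sketch (exact output and options depend on the BeeGFS version):

# Free space and inode usage per metadata and storage target
$ beegfs-df

# List storage targets together with their reachability/consistency state
$ beegfs-ctl --listtargets --nodetype=storage --state

# Show which interfaces and protocol (RDMA vs. TCP) the client connections actually use
$ beegfs-net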

HPC Knowledge Meeting ‘19 Live per-Client and per-User Statistics

HPC Knowledge Meeting ‘19 BeeGFS - Design Philosophy

• Designed for Performance, Scalability, Robustness and Ease of Use
• Distributed metadata
• No Linux kernel patches; runs on top of ext4, XFS, ZFS, …
• Scalable multithreaded architecture
• Supports RDMA/RoCE & TCP (InfiniBand, Omni-Path, 100/40/10/1 GbE, …)
• Easy to install and maintain (user-space servers)
• Robust and flexible (all services can be placed independently)
• Hardware agnostic
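As a hedged example of the multi-network support, services and clients can be given an ordered list of interfaces to use via the connInterfacesFile setting in the BeeGFS config files (the file location and interface names below are examples):

# /etc/beegfs/beegfs-client.conf (excerpt)
# connInterfacesFile = /etc/beegfs/connInterfacesFile

# The referenced file lists one interface per line, in order of preference:
$ cat /etc/beegfs/connInterfacesFile
ib0
eth0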

HPC Knowledge Meeting ‘19 Key Features

beegfs.io High Availability I – Buddy Mirroring

• Built-in replication for high availability
• Flexible setting per directory
• Individual for metadata and/or storage
• Buddies can be in different racks or different fire zones

[Diagram: Storage Servers #1-#4 with Targets #101, #201, #301, #401 arranged into Buddy Group #1 and Buddy Group #2]
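A sketch of how buddy groups are typically defined with beegfs-ctl (target IDs and the directory are examples; options can differ slightly between releases):

# Pair two storage targets into buddy mirror group 1
$ beegfs-ctl --addmirrorgroup --nodetype=storage --primary=101 --secondary=201 --groupid=1

# Enable metadata mirroring (may require a quiesced file system; see the documentation)
$ beegfs-ctl --mirrormd

# Enable data mirroring for new files created under a directory
$ beegfs-ctl --setpattern --pattern=buddymirror /mnt/beegfs/projects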

HPC Knowledge Meeting ‘19 High Availability II – Shared storage

• Shared storage together with Pacemaker/Corosync
• No extra storage space needed
• Works in an active/active layout
• BeeGFS ha-utils simplify setup and administration

[Diagram: Storage Servers #1-#4 attached to shared Targets #101, #201, #301, #401]

HPC Knowledge Meeting ‘19 Storage Pool

• Support for different types of storage
• Single namespace across all tiers

[Diagram: storage services grouped into a Performance Pool (current projects) and a Capacity Pool (finished projects)]
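As a hedged illustration, pools are listed and assigned per directory through beegfs-ctl (the pool ID and path are examples):

# List the existing storage pools and their targets
$ beegfs-ctl --liststoragepools

# Place new files created under a directory into a specific pool
$ beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/finished_projects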

HPC Knowledge Meeting ‘19 BeeOND – BeeGFS On Demand

• Create a parallel file system instance on-the-fly
• Start/stop with one simple command
• Use cases: cloud computing, test systems, cluster compute nodes, …
• Can be integrated into the cluster batch system
• Common use case: per-job parallel file system
  • Aggregate the performance and capacity of local SSDs/disks in the compute nodes of a job
  • Take load off the global storage
  • Speed up "nasty" I/O patterns

[Diagram: Compute Nodes #1 to #n with user-controlled data staging to/from the global storage]

HPC Knowledge Meeting ‘19 The easiest way to set up a parallel filesystem…

# GENERAL USAGE
$ beeond start -n <nodefile> -d <storage path> -c <client mountpoint>

------

# EXAMPLE
$ beeond start -n $NODEFILE -d /local_disk/beeond -c /my_scratch

Starting BeeOND Services…
Mounting BeeOND at /my_scratch…
Done.

HPC Knowledge Meeting ‘19 BeeGFS Additional Features

• HA support
• Quota (user/group)
• ACLs
• Support for different types of storage
• Modification event logging
• Statistics in a time series database
• Cluster manager integration, e.g. Bright Cluster Manager, Univa
• Cloud readiness for AWS / Azure
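For the quota feature, usage and limits are handled via beegfs-ctl; a minimal sketch assuming quota enforcement is enabled in the server configuration (user/group names and limits are examples):

# Show current usage and limits for a user or a group
$ beegfs-ctl --getquota --uid jdoe
$ beegfs-ctl --getquota --gid researchers

# Set size and inode limits for a user
$ beegfs-ctl --setquota --uid jdoe --sizelimit=10T --inodelimit=5M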

HPC Knowledge Meeting ‘19 Bright Cluster Manager Integration

HPC Knowledge Meeting ‘19 BeeGFS and BeeOND

beegfs.io Scale from small

Converged Setup

HPC Knowledge Meeting ‘19 Into Enterprise

[Diagram: multiple dedicated storage services accessed in parallel by the clients (direct parallel file access)]

HPC Knowledge Meeting ‘19 to BeeOND

[Diagram: BeeOND storage services running on NVMe devices in the compute nodes]

HPC Knowledge Meeting ‘19 BeeGFS Use Cases

beegfs.io Alfred Wegener Institute for Polar and Marine Research

• The institute was founded in 1980 and is named after the meteorologist, climatologist and geologist Alfred Wegener
• Government funded
• Conducts research in the Arctic, in the Antarctic and in the high and mid latitude oceans
• Additional research topics:
  • North Sea research
  • Marine biological monitoring
  • Technical marine developments

• Current mission: In September 2019 the icebreaker Polarstern will drift through the Arctic Ocean for one year with 600 team members from 17 countries and use the gathered data to take climate and ecosystem research to the next level.

HPC Knowledge Meeting ‘19 Day to day HPC operations @AWI

• CS400
  • 11,548 cores
  • 316 nodes:
    • 2x Intel Xeon Broadwell 18-core CPUs
    • 64GB RAM (DDR4 2400MHz)
    • 400GB SSD
  • 4 fat compute nodes, as above, but 512GB RAM
  • 1 very fat node, 2x Intel Broadwell 14-core CPUs, 1.5TB RAM
  • Intel Omni-Path network
  • 1024TB fast parallel file system (BeeGFS)
  • 128TB home and software file system

HPC Knowledge Meeting ‘19 Do you remember BeeOND?

• Global BeeGFS storage on spinning disks
  • 1PB of scratch_fs providing 80GB/s
• 316 compute nodes
  • Each equipped with a 400GB SSD
  • 316 x 500MB/s per SSD equals roughly 150GB/s aggregate BeeOND burst "for free"

"Robust and stable, even in a case of unexpected power failure."
Dr. Malte Thoma, Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (Bremerhaven, Germany)

HPC Knowledge Meeting ‘19 Tokyo Institute of Technology: Tsubame 3

• Top national university for science and technology in Japan
• 130-year history
• Over 10,000 students, located in the Tokyo area

Tsubame 3
• Latest Tsubame supercomputer
• #1 on the Green500 in November 2017
• 14.110 GFLOPS per watt
• BeeOND uses 1PB of available NVMe

HPC Knowledge Meeting ‘19 Tsubame 3 Configuration

• 540 nodes
  • Four NVIDIA Tesla P100 GPUs per node (2,160 total)
  • Two 14-core Intel Xeon E5-2680 v4 processors (15,120 cores total)
  • Two dual-port Intel Omni-Path Architecture HFIs (2,160 ports total)
  • 2TB of Intel SSD DC Product Family NVMe storage devices
• Simple integration with Univa Grid Engine

HPC Knowledge Meeting ‘19 AIST (National Institute of Advanced Industrial Science and Technology)

• Japanese research institute located in the Greater Tokyo Area
• Over 2,000 researchers
• Part of the Ministry of Economy, Trade and Industry

ABCI (AI Bridging Cloud Infrastructure)
• Japanese supercomputer in production since July 2018
• Theoretical performance of 130 petaflops, one of the fastest in the world
• Will make its resources available through the cloud to various private and public entities in Japan
• #7 on the TOP500 list

HPC Knowledge Meeting ‘19 Largest Machine Learning Environment in Japan uses BeeOND

• 1,088 servers
  • Two Intel Xeon Gold CPUs (2,176 CPUs in total)
  • Four NVIDIA Tesla V100 GPU computing cards (4,352 GPUs in total)
  • Intel SSD DC P4600 series (NVMe) as local storage, 1.6TB per node (about 1.6PB in total)
• InfiniBand EDR
• Simple integration with Univa Grid Engine

HPC Knowledge Meeting ‘19 Spookfish

• Aerial survey system based in Western Australia
• High-resolution images are provided to customers who need up-to-date information on terrain they plan to utilize
• Information can be fed into Geographical Information System (GIS) and CAD applications

HPC Knowledge Meeting ‘19 Spookfish System Architecture

• Metadata server x 6
  • Supermicro chassis with 4 x Intel Xeon X7560 and 256GB RAM
  • Only performs MDS services
• Metadata target x 6 with buddy mirroring
• Converged storage server x 40
  • Dell R730 with 2 x Intel Xeon E5-2650 v4 CPUs and 128GB of RAM
  • Storage servers also perform processing for applications
  • Uses Linux cgroups to avoid out-of-memory events
  • cgroups not used for CPU usage; so far no issues with CPU shortage
• Storage target x 160 with buddy mirroring
• 10Gb Ethernet
• Performance exceeded expectations with 10GB/s read and 5-6GB/s write after tuning

"The result [of switching to BeeGFS] is that we're now able to process about 3 times faster with BeeGFS than with our old NFS server. We're seeing speeds of up to 10GB/s read and 5-6GB/s write."
–Spookfish

HPC Knowledge Meeting ‘19 CSIRO

• The Commonwealth Scientific and Industrial Research Organisation (CSIRO) has adopted the BeeGFS file system for their 2PB all-NVMe storage in Australia, making it one of the largest NVMe storage systems in the world.
• Overview:
  • 4 x metadata server
  • 32 x storage server (Dell, all NVMe; 24 x 3.2TB NVMe per server)
  • 2 PiB usable capacity
• Look forward to ISC to see what the beast can do!
• Further details: http://www.pacificteck.com/?p=437

HPC Knowledge Meeting ‘19 Follow BeeGFS:

HPC Knowledge Meeting ‘19 Sales engagement

beegfs.io What do we need?

• Full Address:
• Name:
• Email:
• Phone #:
• Business (university, research institute, life science, HPC, etc.):

• Quantity of Single Target (MDS) servers:
• Quantity of Multi Target (OSS) servers:
• RAID settings:
• Server type:
• Hardware platform (e.g. Intel Xeon, AMD, ARM):
• Capacity requirement:
• Performance requirement:
• Quantity of clients (rough number):
• BeeOND: up to 100, up to 500, > 500 nodes
• Support duration (3 years, 5 years):
• Expected support start date:
• System nickname (to distinguish multiple systems):
• Interconnect (EDR, FDR, QDR, OPA, 40/10 GigE):
• Linux distribution (e.g. Red Hat):

HPC Knowledge Meeting ‘19 BeeGFS terms for sizing

• The term “target” refers to a storage device exported through BeeGFS. Typically, a target is a RAID6 volume (for BeeGFS storage servers) or a RAID10 volume (for BeeGFS metadata servers) consisting of several disks, but it can optionally also be a single HDD or SSD.
• A “Single Target Server” exports exactly one target, either for storage or metadata.
• A “Multi Target Server” exports up to six targets for storage and optionally one target for metadata.
• An “Unlimited Target Server” exports an unlimited number of targets for storage and/or metadata.

HPC Knowledge Meeting ‘19 Pricing structure

HPC Knowledge Meeting ‘19

• 1st Level Support (Partner)
  • 1st level support is done by the reseller or a qualified partner (e.g. sub-contractor / sub-reseller) of the reseller. 1st level support staff have the knowledge level of general system administrators to perform the following tasks:
    • Definition of the problem, steps to reproduce and expected behavior
    • Description of the customer hardware setup
    • Description of the customer software setup (e.g. software, firmware and driver versions)
    • Gathering of other potentially relevant information such as log files
    • Attempts to solve problems based on previously known similar cases (e.g. recommendation of software updates or configuration changes)

• 2nd Level Support (Partner)
  • 2nd level support is done by a Gold Reseller or a qualified partner (e.g. sub-contractor / sub-reseller) of the Gold Reseller. 2nd level support staff have knowledge of BeeGFS concepts and tools as well as of the general storage system stack (e.g. storage devices and network tools / testing) to perform the following tasks:
    • Problem and root cause analysis (e.g. based on log file analysis)
    • Hardware check (e.g. network, storage devices, cables) and software check, including attempts to reproduce issues on a different test system to verify whether problems are caused by hardware malfunction at the customer site
    • Issue discussion and potential solution or work-around discussion with the customer
    • Definition of a minimal setup to reproduce problems before escalation to the higher support level

• 3rd Level Support (ThinkParQ)
  • 3rd level support is provided by ThinkParQ. The BeeGFS support team has detailed knowledge of BeeGFS internals. Incoming support tickets are prioritized based on severity.
    • Full problem and root cause analysis, optionally including remote login to the customer system via ssh
    • Code inspection for detailed internal analysis
    • Patch development with early update releases for supported customers
    • Recommendation of performance tuning methods and HPC consulting
    • Reaction time is next business day, German working hours

HPC Knowledge Meeting ‘19 Installation / Training

• Installation and training can be done remotely.

• Remote ssh installation: 1,200 USD per day

• BeeGFS Remote Training (agenda & time): 10-hour session
  • Free of charge for partners; for end customers we charge 1,200 USD
  • Introduction: BeeGFS basic concepts, architecture and features
  • How do I ... with BeeGFS: typical administrative tasks
  • Sizing and tuning
  • Designing and implementing storage solutions with BeeGFS
  • Projects with BeeGFS
  • Reference installations and best practices

• Sales & Presales Training:
  • BeeGFS sales and pre-sales training mapped to your customer focus, with file system comparison and use cases, to get your sales force up to speed. Please let us know when you are ready to schedule it.

HPC Knowledge Meeting ‘19 Pricing Model

BeeGFS sizing for your information:

The term “target” refers to a storage device exported through BeeGFS. Typically, a target is a RAID6 volume (for BeeGFS storage servers) or a RAID10 volume (for BeeGFS metadata servers) consisting of several disks, but it can optionally also be a single HDD or SSD.

• A “Single Target Server” exports exactly one target, either for storage or metadata.
• A “Multi Target Server” exports up to six targets for storage and optionally one target for metadata.
• An “Unlimited Target Server” exports an unlimited number of targets for storage and/or metadata.

HPC Knowledge Meeting ‘19 BeeGFS Storage Engine under the Hood

HPC Knowledge Meeting ‘19