Using On-Demand File Systems in HPC Environments

Mehmet Soysal, Marco Berghoff, Thorsten Zirwes, Marc-André Vef, Sebastian Oeste, André Brinkmann, Wolfgang E. Nagel, and Achim Streit

Steinbuch Centre for Computing (SCC) / Scientific Computing and Simulation (SCS)

KIT – The Research University in the Helmholtz Association, www.kit.edu

Overview

- Motivation
- Approach
- Related Work
- Use Cases and Results
- Remarks and Observations
- Conclusion & Future Work

Motivation

- The I/O subsystem (parallel file system) is a bottleneck in HPC systems: bandwidth, metadata, or latency
- It is a shared medium: applications interfere with each other
- New storage technologies are available (SSD, NVMe, NVRAM)

Motivation

Proposed Solution
- Bring data closer to the compute nodes
- On-demand file system (ODFS) on node-local storage
- Tailor a private ODFS to the job (see the job-script sketch below)

Advantages
- Dedicated bandwidth / IOPS
- Independent of the global file system
- Low latency due to SSD / NVMe / NVRAM
- No code changes needed in the application
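For illustration only, a minimal sketch of how such a private ODFS could be brought up inside a Slurm job with BeeOND. The paths (/local/ssd, /mnt/odfs), the nodefile name, and the application call (./my_app) are assumptions, and the beeond flags follow the usual BeeOND examples and may differ between versions:

    #!/usr/bin/env python3
    """Sketch: bring up a private on-demand file system (ODFS) inside a Slurm job.

    Assumptions (not from the slides): BeeOND is installed on the nodes, the
    node-local SSD is mounted at /local/ssd, and the flags follow the common
    BeeOND examples (they may differ between versions).
    """
    import os
    import subprocess

    NODEFILE = "nodefile.txt"          # hosts that will form the ODFS
    STORAGE_DIR = "/local/ssd/beeond"  # node-local storage target (assumed path)
    MOUNTPOINT = "/mnt/odfs"           # where the private ODFS gets mounted

    def write_nodefile() -> None:
        """Expand the Slurm node list into one hostname per line."""
        hosts = subprocess.run(
            ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
            check=True, capture_output=True, text=True,
        ).stdout
        with open(NODEFILE, "w") as f:
            f.write(hosts)

    def start_odfs() -> None:
        # beeond start -n <nodefile> -d <storage dir> -c <client mountpoint>
        subprocess.run(["beeond", "start", "-n", NODEFILE,
                        "-d", STORAGE_DIR, "-c", MOUNTPOINT], check=True)

    def stop_odfs() -> None:
        # Tear down the ODFS and delete the node-local data after the job.
        subprocess.run(["beeond", "stop", "-n", NODEFILE, "-L", "-d"], check=True)

    if __name__ == "__main__":
        write_nodefile()
        start_odfs()
        try:
            # Run the (hypothetical) application against the private ODFS
            # instead of the global $WORK file system.
            subprocess.run(["srun", "./my_app", "--output", MOUNTPOINT], check=True)
        finally:
            stop_odfs()

Because the ODFS only lives for the duration of the job, the application itself stays unchanged; only the output path in the job script points at the private mountpoint.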

HPC: current file system usage

[Diagram: App/Job 1 and App/Job 2 run on Nodes 1-6 and perform all their I/O on the global parallel file system ($HOME / $WORK / scratch).]

HPC: usage with on-demand file system

[Diagram: App/Job 1 and App/Job 2 each use a private on-demand file system built from the local storage of their own nodes (on-demand file system 1 and 2); the global parallel file system ($HOME / $WORK / scratch) remains underneath.]

Related Work / Approaches

File system features
- Spectrum Scale (GPFS) HAWC
- PFL / DOM / PCC (Lustre)
- BeeGFS storage pools

Hardware solutions
- Solid state disks
- Burst buffers
- In-bound cache

Libraries
- MPI-IO
- SIONlib
- HDF5 / NetCDF
- ADIOS

System reconfiguration
- Dynamic Remote Scratch
- Ramdisk Storage Accelerator
- BeeGFS On Demand (BeeOND)
- Lustre On Demand (LOD)

Testing Environment

ForHLR II cluster @ KIT
- 1152 nodes with 2x Intel Xeon E5-2660 v3 (20 cores per node) and 64 GB RAM
- 2 islands (816 / 336 nodes), 56 Gbit/s per node, CBB fabric
- Local SATA SSD (480 GB) per node, approx. 600 / 400 MB/s read / write

Scenarios
- Generic benchmark (IOzone)
- Two use cases from our users (240 nodes + 1): OpenFOAM and NAStJA
- Concurrent data staging (23 nodes + 1)

Throughput with IOzone
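The throughput comparison was measured with IOzone. As a hedged illustration (not the exact command line used for these measurements), a distributed throughput-mode run could look like the following; the record size, file size, thread count, and the client-list file names are assumptions:

    #!/usr/bin/env python3
    """Sketch of an IOzone throughput run on the private ODFS vs. the global FS.

    File size, record size, thread count and the client-list files are
    illustrative assumptions, not the parameters used for the measurements.
    """
    import subprocess

    def iozone_throughput(clients_file: str, threads: int) -> None:
        """Run IOzone in throughput mode: sequential write (-i 0) and read (-i 1)."""
        subprocess.run(
            [
                "iozone",
                "-i", "0", "-i", "1",  # test 0 = write/rewrite, test 1 = read/reread
                "-r", "1m",            # 1 MiB record size
                "-s", "4g",            # 4 GiB file per thread
                "-t", str(threads),    # number of parallel threads
                "-+m", clients_file,   # client list: "host  target_dir  iozone_path" per line
            ],
            check=True,
        )

    if __name__ == "__main__":
        # One client file points the threads at the ODFS mountpoint,
        # the other at a directory on the global parallel file system.
        iozone_throughput("clients_odfs.txt", threads=16)
        iozone_throughput("clients_global.txt", threads=16)

The target directory is taken from the client-list file, so the same invocation can be pointed either at the on-demand file system or at the global scratch.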

Use cases

NAStJA
- Use case 1: 240 nodes, 1 block per core; 4800 files / 4800 MB per snapshot
- Use case 2: data staging with 16, 19, and 20 cores per node on 23 nodes; concurrent stage-out

OpenFOAM
- Use case 1: laboratory burner flame; ~450k files / 120 GB per snapshot
- Use case 2: mixing of methane and air; generates the input files for use case 1 and writes results at high frequency

Both use cases were provided by our users and are actively used.

Average load on global FS (NAStJA)

OpenFOAM use case 1

NAStJA use case 1

Concurrent data staging
- Application (NAStJA) runs on 23 nodes + 1 MDS
- Application runs with 16, 19, and 20 tasks per node
- Comparative run without data staging
- Data staging with 1 process per node
- Data staging on the MDS with four processes
- Question: what is the impact of concurrent data staging on the application? (A stage-out sketch follows below.)
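For illustration only, a sketch of what such a per-node stage-out process could look like: a small watcher that copies finished snapshot files from the private ODFS to the global file system while the application keeps running. The paths, the snapshot_*.dat naming pattern, and the polling interval are assumptions, and the actual staging tool used for the experiments may work differently:

    #!/usr/bin/env python3
    """Sketch of a per-node stage-out process that runs alongside the application.

    Paths, the snapshot naming pattern and the polling interval are assumptions;
    the sketch also skips checks that a snapshot file is complete before copying.
    """
    import shutil
    import time
    from pathlib import Path

    ODFS_DIR = Path("/mnt/odfs/snapshots")      # private on-demand FS (assumed)
    GLOBAL_DIR = Path("/work/project/results")  # global parallel FS (assumed)
    POLL_SECONDS = 30

    def stage_out_once(staged: set) -> None:
        """Copy every snapshot file that has not been staged out yet."""
        for src in sorted(ODFS_DIR.glob("snapshot_*.dat")):
            if src in staged:
                continue
            shutil.copy2(src, GLOBAL_DIR / src.name)  # copy including metadata
            staged.add(src)

    if __name__ == "__main__":
        GLOBAL_DIR.mkdir(parents=True, exist_ok=True)
        already_staged = set()
        # In a real job the epilogue would terminate this loop; here it polls forever.
        while True:
            stage_out_once(already_staged)
            time.sleep(POLL_SECONDS)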

NAStJA use case 2 / stage out

- Data staging on the MDS has only minimal impact on the application.
- With 19 and 20 cores used by the application, there are very high initial peaks.
- Trade-off: fast data staging means high impact on the application; slow data staging means low impact.

Remarks and observations

- Loopback device as storage target: speedup and faster cleanup after the job (see the sketch below)
- Storage targets are very small (mind chunk size / stripe count!)
- A solution for very problematic use cases
- The application's I/O behavior is important
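For illustration only, a sketch of the loopback idea: the ODFS storage target lives in a single backing file on the node-local SSD, so cleanup after the job reduces to unmounting and deleting that one file. The size, the paths, and the ext4 file system are assumptions, and the mount/umount calls require appropriate privileges:

    #!/usr/bin/env python3
    """Sketch: loopback-backed storage target on the node-local SSD.

    Size, paths and the ext4 file system are illustrative assumptions, and the
    mount/umount calls need appropriate privileges. The point is that cleanup
    after the job shrinks to unmounting and deleting a single backing file.
    """
    import subprocess

    IMAGE = "/local/ssd/odfs_target.img"   # backing file on the node-local SSD (assumed)
    MOUNTPOINT = "/local/ssd/odfs_target"  # mountpoint used as ODFS storage target
    SIZE = "400G"

    def create_target() -> None:
        # Allocate a sparse backing file, put a file system into it, mount it via loopback.
        subprocess.run(["truncate", "-s", SIZE, IMAGE], check=True)
        subprocess.run(["mkfs.ext4", "-F", "-q", IMAGE], check=True)
        subprocess.run(["mkdir", "-p", MOUNTPOINT], check=True)
        subprocess.run(["mount", "-o", "loop", IMAGE, MOUNTPOINT], check=True)

    def destroy_target() -> None:
        # Cleanup after the job: unmount and remove the single backing file.
        subprocess.run(["umount", MOUNTPOINT], check=True)
        subprocess.run(["rm", "-f", IMAGE], check=True)

    if __name__ == "__main__":
        create_target()
        # ... the ODFS server would now use MOUNTPOINT as its storage target ...
        destroy_target()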

Conclusion & future work

Conclusion
- Reduces the load on the global file system
- Easy to set up
- Some applications might run slower; an I/O analysis is helpful

Future work
- Topology awareness
- In-situ post-processing
- Add further file systems (e.g., GekkoFS) and presets (small / huge files)
- Automatic data staging

Acknowledgement

- ADA-FS project, DFG Priority Programme SPPEXA “Software for Exascale Computing”
- Steinbuch Centre for Computing

Contact: [email protected]

Questions?
