Using On-Demand File Systems in HPC Environments

Mehmet Soysal, Marco Berghoff, Thorsten Zirwes, Marc-André Vef, Sebastian Oeste, André Brinkmann, Wolfgang E. Nagel, and Achim Streit
Steinbuch Centre for Computing (SCC) / Scientific Computing and Simulation (SCS)
KIT – The Research University in the Helmholtz Association
www.kit.edu
Wed, July 17, 2019

Overview

  • Motivation
  • Approach
  • Related Work
  • Use Cases and Results
  • Remarks and Observations
  • Conclusion & Future Work

Motivation

  • The I/O subsystem (the global parallel file system) is a bottleneck in HPC systems, whether in bandwidth, metadata rate, or latency.
  • It is a shared medium, so applications interfere with each other.
  • New storage technologies (SSD, NVMe, NVRAM) are becoming available in the compute nodes.

Approach

Proposed solution: bring data closer to the compute nodes.

  • Set up an on-demand file system (ODFS) on node-local storage.
  • Tailor a private ODFS to each job.

Advantages:

  • Dedicated bandwidth and IOPS per job
  • Independent of the global file system
  • Low latency due to SSD / NVMe / NVRAM
  • No code changes to the application needed

HPC: Current File System Usage

[Figure: the nodes of all jobs (App/Job 1 on nodes 1–3, App/Job 2 on nodes 4–6) share the global parallel file system, which provides $HOME, $WORK, and scratch.]

HPC: Usage with an On-Demand File System

[Figure: each job additionally mounts its own on-demand file system spanning only its nodes (on-demand file system 1 for job 1, on-demand file system 2 for job 2); the global parallel file system with $HOME, $WORK, and scratch remains in place.]

Related Work / Approaches

  • File system features: Spectrum Scale (GPFS) HAWC; Lustre PFL / DOM / PCC; BeeGFS storage pools
  • Hardware solutions: solid-state disks; burst buffers; in-bound cache
  • Libraries: MPI-IO; SIONlib; HDF5 / NetCDF; ADIOS
  • System reconfiguration: Dynamic Remote Scratch; RAM disk storage accelerator; BeeGFS On Demand (BeeOND); Lustre On Demand (LOD)

Testing Environment

ForHLR II cluster at KIT:

  • 1152 nodes, each with 2× Xeon E5-2660 v3 (20 cores) and 64 GB RAM
  • 2 islands (816 / 336 nodes), 56 Gbit/s per node, CBB fabric
  • Local SATA SSD (480 GB) per node, approx. 600 MB/s read / 400 MB/s write

Scenarios:

  • Generic benchmark
  • Two use cases from our users (240 nodes + 1): OpenFOAM and NAStJA
  • Concurrent data staging (23 nodes + 1)

Throughput with IOzone

[Figure: IOzone throughput results.]

Use Cases

NAStJA:

  • Use case 1: 240 nodes, 1 block per core; 4800 files / 4800 MB per snapshot
  • Use case 2: data staging with 16, 19, and 20 cores per node on 23 nodes; concurrent stage-out

OpenFOAM:

  • Use case 1: laboratory burner flame; ~450k files / 120 GB per snapshot
  • Use case 2: mixing of methane and air; generates the files for use case 1 and writes results at high frequency

Both use cases are provided by our users and actively used.

Average Load on the Global File System (NAStJA)

[Figure: average load on the global file system during the NAStJA runs.]

OpenFOAM Use Case 1

[Figure: results for OpenFOAM use case 1.]

NAStJA Use Case 1

[Figure: results for NAStJA use case 1.]
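Since the related-work list names BeeGFS On Demand (BeeOND) as one of the tools for setting up such a per-job file system, a minimal batch-job sketch may help make the workflow concrete. This is an illustration, not the scripts used in the talk: the mount points and application name are hypothetical, and the `beeond start`/`beeond stop` options are assumed from the BeeOND tooling.

```shell
#!/bin/bash
#SBATCH --nodes=4               # example size; the experiments here used up to 240 nodes
#SBATCH --time=01:00:00

# Hypothetical paths: storage directory on the node-local SSD
# and the client mount point for the private on-demand FS.
SSD_DIR=/mnt/ssd/beeond-$SLURM_JOB_ID
MNT_DIR=/mnt/odfs

# Build a nodefile from the allocation and start a private BeeOND
# instance across exactly this job's nodes (assumed CLI syntax).
NODEFILE=$(mktemp)
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$NODEFILE"
beeond start -n "$NODEFILE" -d "$SSD_DIR" -c "$MNT_DIR"

# Run the unmodified application against the on-demand file system.
mpirun ./my_app --output "$MNT_DIR/results"

# Stage results back to the global file system, then tear down the ODFS.
cp -r "$MNT_DIR/results" "$WORK/results-$SLURM_JOB_ID"
beeond stop -n "$NODEFILE" -L -d
```

Because the file system only exists for the lifetime of the job, anything needed afterwards has to be staged out before the `beeond stop`, which is exactly where the data-staging question below comes in.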
Concurrent Data Staging

  • The application (NAStJA) runs on 23 nodes plus 1 MDS node.
  • The application runs with 16, 19, and 20 tasks per node.
  • A comparative run without data staging is performed.
  • Data staging with 1 process per node.
  • Data staging on the MDS node with four processes.
  • Question: what is the impact of concurrent data staging on the application?

NAStJA Use Case 2 / Stage-Out

  • Data staging on the MDS node has only minimal impact.
  • With 19 and 20 cores used by the application, there are very high initial peaks.
  • Trade-off: fast data staging means high impact on the application; slow data staging means low impact.

Remarks and Observations

  • Loopback device: speedup, and faster cleanup after the job.
  • Storage targets are very small (mind chunk size and stripe count!).
  • A solution for very problematic use cases.
  • The application's I/O behavior is important.

Conclusion & Future Work

Conclusion:

  • Reduces the load on the global file system.
  • Easy to set up.
  • Some applications might run slower; an I/O analysis is helpful.

Future work:

  • Topology awareness
  • In-situ post-processing
  • Additional file systems (Ceph, GekkoFS) and presets (small/huge files)
  • Automatic data staging

Acknowledgement

  • ADA-FS project, DFG priority programme SPPEXA "Software for Exascale Computing"
  • Steinbuch Centre for Computing

Contact: [email protected]

Questions?
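The concurrent stage-out pattern discussed above — the application keeps writing snapshots to the fast on-demand file system while a separate copy process drains finished snapshots to the global file system — can be sketched as follows. The directories are stand-ins created with `mktemp` (in a real job they would be the ODFS mount and a $WORK directory), and the snapshot file names are made up for the illustration.

```shell
# Stand-ins for the node-local ODFS mount and the global parallel FS.
ODFS=$(mktemp -d)
GLOBAL=$(mktemp -d)

# The application writes a snapshot to the fast on-demand file system...
echo "field data, t=1" > "$ODFS/snapshot_0001.dat"

# ...and a background copy process stages the finished snapshots out,
# so the application itself never blocks on the global file system.
cp "$ODFS"/snapshot_*.dat "$GLOBAL"/ &
STAGE_PID=$!

# Meanwhile the application proceeds with the next time step.
echo "field data, t=2" > "$ODFS/snapshot_0002.dat"

# Before the job (and with it the ODFS) ends, staging must have finished.
wait "$STAGE_PID"
ls "$GLOBAL"
```

The number of copy processes per node is the knob the experiments above turn: more processes drain the ODFS faster but compete with the application for cores and network, which is the fast-staging/high-impact trade-off observed in NAStJA use case 2.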
