Developing Software for OLCF Frontier

Philip C. Roth
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory

ORNL is managed by UT-Battelle, LLC for the US Department of Energy

Context

• NVIDIA GPUs in OLCF's two most recent systems (Titan, Summit)
  – User base and staff have invested in CUDA and NVIDIA libraries
• OLCF's next system (Frontier) will have AMD GPUs
  – CUDA and related NVIDIA libraries not supported
• Portability tools are increasingly important to OLCF and its users
  – Enable "Day 1 Success" on Frontier
  – Support overlapping OLCF system lifetimes
  – Support developers/users who also target non-OLCF systems

Two High-Level Questions

• How to develop code that targets Frontier?
  – Will discuss inter- and intra-node options

• Where to do that development?
  – Will discuss currently available and expected development systems

The View from 30,000 Feet

• Frontier will support traditional HPC application programming languages (C, C++, Fortran)
• Frontier will support direct programming of GPUs with C++ (or C++-like) languages
• Frontier will use an operating system/CPU type that are well supported by the open source software community
• Frontier compute nodes will be "fat" nodes that feature multiple GPUs
  – Expressing computation as data parallel operations that run on a GPU will be required for good performance
• Frontier has a distributed memory, tightly-coupled cluster organization
  – MPI will be supported, and inter-node programming models built on MPI should work
  – Several PGAS options will be supported

Sounds like Summit: each of the points above applies to Summit as well.

Summit is a premier development platform for targeting Frontier

Development Systems

• OLCF Summit
  – Best platform for development at scale
  – Similar "fat node" architecture
  – Ability to use many looking-forward-to-Frontier approaches today (e.g., OpenMP offload, HIP, Kokkos)
• Frontier Center of Excellence (CoE) systems at Cray
  – Small system (tulip) available for ECP project teams
  – User cap due to system size
  – Application details available on request (Bronson Messer, [email protected])
• OLCF Early Access System expected in late 2020
• Your local Linux system, especially if it has a GPU
  – Much of Frontier's software stack uses open source software (e.g., ROCm)
  – Capability may depend on maturity of the software (e.g., Kokkos vs. OpenMP 5)
  – Some software does not yet support AMD GPUs

Compilers/Languages

• Three supported compiler suites
  – Cray Programming Environment (PE)
  – AMD
  – GCC
• All compiler suites support C, C++, and Fortran
• Some official support for other HPC languages
  – UPC (Cray, GCC)
  – Chapel (Cray)
  – Coarray Fortran, Coarray C++ (Cray)
  – Charm++ (Cray only?)
• Open source options (e.g., Python, Julia, "stock" LLVM/Clang) should work
  – GPU and model-specific CPU support may be limited or non-existent

Inter-process

• Several supported options for inter-process communication and synchronization
  – MPI
  – Charm++
  – Coarray Fortran/C++
  – UPC
  – Global Arrays
  – GASNet
  – OpenSHMEM
• Support may be limited to a specific compiler suite (e.g., Coarray Fortran is only available with the Cray compiler)

Intra-node

• Several options for targeting CPUs/GPUs in a fat node organization
  – Directives
  – Distinct kernel functions
  – Lambdas/functors

Intra-node: Directives

• Compiler annotations in source code that describe how to move data and how to parallelize code
• OpenMP 5.x, including offload to GPU, will be supported
  – Cray, AMD compilers
  – Possibly also GCC compilers
• OpenACC
  – In discussions, GCC suite only
  – CLACC (Joel Denny, [email protected]; LLVM-based, C only)
• Only option for a strict "Fortran-only" approach
• Path via Summit: OpenMP 4.5

Intra-node: Distinct Kernel Functions

• Heterogeneous-compute Interface for Portability (HIP)
  – C++ API very similar to CUDA, implemented in a header-only library
  – Language for writing GPU kernels in C++ with some C++11 features
  – Tools for converting CUDA code to HIP code
  – Can target either AMD or NVIDIA GPUs
  – Open source
• Collection of compatible libraries
  – Many analogous to NVIDIA libraries (e.g., rocSPARSE and cuSPARSE)
  – Some portability "shim" libraries to insulate application code
• OpenCL will be usable via AMD's implementation
• Path via Summit: modules for HIP and several HIP libraries (e.g., hipBLAS) are available and target NVIDIA GPUs

Intra-node: Lambdas/Functors

• Code to run on the GPU is specified using a C++ lambda or functor object
• Several options with varying degrees of maturity and likelihood of availability
  – Kokkos, RAJA: developers working on a HIP backend
  – YAKL ("Yet Another Kernel Launcher", Matt Norman from OLCF): like Kokkos but simplified, and with a Fortran interface to most functionality
  – Even SYCL/DPC++ may be available
• Path via Summit: OLCF does not provide modules, but Kokkos and YAKL are known to work on Summit CPUs and GPUs; RAJA should work

Fortran Revisited

• Directives will be the best-supported approach
• Efforts like YAKL provide an existence proof of the "jump to C/C++" approach
  – Transfer control and data from Fortran to C/C++, e.g., using ISO bindings or specially-built Fortran bindings
  – Use a preferred C/C++-based approach to transfer control and data between CPU and GPU
  – Requires knowledge and/or control of the underlying data representations in both Fortran and C/C++ contexts

Acknowledgements

• This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative.
• This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Summary

• Many options for preparing code for Frontier
  – Directives: OpenMP, maybe OpenACC
  – Distinct kernel functions: HIP, OpenCL
  – Lambdas/functors: Kokkos, RAJA, YAKL, maybe SYCL/DPC++
• Summit is a great platform for preparing code for Frontier…
  – …but pre-Frontier systems and your local Linux system are useful too
• For more information
  – OLCF Frontier web site at https://www.olcf.ornl.gov/frontier/
  – [email protected]
