Developing Software for OLCF Frontier

Philip C. Roth
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory

ORNL is managed by UT-Battelle, LLC for the US Department of Energy

Context

• NVIDIA GPUs in OLCF's two most recent systems (Titan, Summit)
  – User base and staff have invested in CUDA and NVIDIA libraries
• OLCF's next system (Frontier) will have AMD GPUs
  – CUDA and related NVIDIA libraries not supported
• Portability tools are increasingly important to OLCF and its users
  – Enable "Day 1 Success" on Frontier
  – Support overlapping OLCF system lifetimes
  – Support developers/users who also target non-OLCF systems

Two High-Level Questions

• How to develop code that targets Frontier?
  – Will discuss inter- and intra-node options

• Where to do that development?
  – Will discuss currently available and expected development systems

The View from 30,000 Feet

• Frontier will support traditional HPC application programming languages (C, C++, Fortran)
• Frontier will support direct programming of GPUs with C++ (or C++-like) languages
• Frontier will use an operating system/CPU type that are well supported by the open source software community
• Frontier compute nodes will be "fat" nodes that feature multiple GPUs
  – Expressing computation as data parallel operations that run on a GPU will be required for good performance
• Frontier has a distributed memory, tightly-coupled cluster organization
  – MPI will be supported, and inter-node programming models built on MPI should work
  – Several PGAS options will be supported

Sounds like Summit: each of the points above applies to Summit as well.

Summit is a premier development platform for targeting Frontier

Development Systems

• OLCF Summit
  – Best platform for development at scale
  – Similar "fat node" architecture
  – Ability to use many looking-forward-to-Frontier approaches today (e.g., OpenMP offload, HIP, Kokkos)
• Frontier Center of Excellence (CoE) systems at Cray
  – Small system (tulip) available for ECP project teams
  – User cap due to system size
  – Application details available on request (Bronson Messer, [email protected])
• OLCF Early Access System expected in late 2020
• Your local Linux system, especially if it has a GPU
  – Much of Frontier's software stack uses open source software (e.g., ROCm)
  – Capability may depend on maturity of the software (e.g., Kokkos vs. OpenMP 5)
  – Some software does not yet support AMD GPUs

Compilers/Languages

• Three supported compiler suites
  – Cray Programming Environment (PE)
  – AMD
  – GCC
• All compiler suites support C, C++, and Fortran
• Some official support for other HPC languages
  – UPC (Cray, GCC)
  – Chapel (Cray)
  – Coarray Fortran, Coarray C++ (Cray)
  – Charm++ (Cray only?)
• Open source options (e.g., Python, Julia, "stock" LLVM/Clang) should work
  – GPU and model-specific CPU support may be limited or non-existent

Inter-process

• Several supported options for inter-process communication and synchronization
  – MPI
  – Charm++
  – Coarray Fortran/C++
  – UPC
  – Global Arrays
  – GASNet
  – OpenSHMEM
• Support may be limited to a specific compiler suite (e.g., Coarray Fortran is only available with the Cray compiler)

Intra-node

• Several options for targeting CPUs/GPUs in a fat node organization
  – Directives
  – Distinct kernel functions
  – Lambdas/functors

Intra-node: Directives

• Compiler annotations in source code that describe how to move data and how to parallelize code
• OpenMP 5.x, including offload to GPU, will be supported
  – Cray, AMD compilers
  – Possibly also GCC compilers
• OpenACC
  – In discussions, GCC suite only
  – CLACC (Joel Denny, [email protected]; LLVM-based, C only)
• Only option for a strict "Fortran-only" approach
• Path via Summit: OpenMP 4.5

Intra-node: Distinct Kernel Functions

• Heterogeneous-compute Interface for Portability (HIP)
  – C++ API very similar to CUDA, implemented in a header-only library
  – Language for writing GPU kernels in C++ with some C++11 features
  – Tools for converting CUDA code to HIP code
  – Can target either AMD or NVIDIA GPUs
  – Open source
• Collection of compatible libraries
  – Many analogous to NVIDIA libraries (e.g., rocSPARSE and cuSPARSE)
  – Some portability "shim" libraries to insulate application code
• OpenCL will be usable via AMD's implementation
• Path via Summit: modules for HIP and several HIP libraries (e.g., hipBLAS) are available and target NVIDIA GPUs

Intra-node: Lambdas/Functors

• Code to run on the GPU is specified using a C++ lambda or functor object
• Several options with varying degrees of maturity and likelihood of availability
  – Kokkos, RAJA: developers working on a HIP backend
  – YAKL ("Yet Another Kernel Launcher", Matt Norman from OLCF): like Kokkos but simplified, and with a Fortran interface to most functionality
  – Even SYCL/DPC++ may be available
• Path via Summit: OLCF does not provide modules, but Kokkos and YAKL are known to work on Summit CPUs and GPUs; RAJA should work

Fortran Revisited

• Directives will be the best-supported approach
• Efforts like YAKL provide an existence proof of the "jump to C/C++" approach
  – Transfer control and data from Fortran to C/C++, e.g., using ISO bindings or specially-built Fortran bindings
  – Use a preferred C/C++-based approach to transfer control and data between CPU and GPU
  – Requires knowledge and/or control of the underlying data representations in both Fortran and C/C++ contexts

Acknowledgements

• This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative.
• This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Summary

• Many options for preparing code for Frontier
  – Directives: OpenMP, maybe OpenACC
  – Distinct kernel functions: HIP, OpenCL
  – Lambdas/functors: Kokkos, RAJA, YAKL, maybe SYCL/DPC++
• Summit is a great platform for preparing code for Frontier…
  – …but pre-Frontier systems and your local Linux system are useful too
• For more information
  – OLCF Frontier web site at https://www.olcf.ornl.gov/frontier/
  – [email protected]
