
Developing Software for OLCF Frontier
Philip C. Roth
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
ORNL is managed by UT-Battelle, LLC for the US Department of Energy

Context
• NVIDIA GPUs in OLCF’s two most recent systems (Titan, Summit)
  – User base and staff have an investment in CUDA and NVIDIA libraries
• OLCF’s next system (Frontier) will have AMD GPUs
  – CUDA and related NVIDIA libraries not supported
• Portability tools are increasingly important to OLCF and its users
  – Enable “Day 1 Success” of Frontier
  – Support overlapping OLCF system lifetimes
  – Support developers/users who also target non-OLCF systems

Two High-Level Questions
• How to develop code that targets Frontier?
  – Will discuss inter-process and intra-node options
• Where to do that development?
  – Will discuss currently available and expected development systems

The View from 30,000 Feet
• Frontier will support traditional HPC application programming languages (C, C++, Fortran)
• Frontier will support direct programming of GPUs with C++ (or C++-like) languages
• Frontier will use an operating system and CPU type that are well supported by the open source software community
• Frontier compute nodes will be “fat” nodes that feature multiple GPUs
  – Expressing computation as data parallel operations that run on a GPU will be required for good performance
• Frontier has a distributed memory, tightly-coupled cluster organization
  – MPI will be supported, and inter-node programming models built on MPI should work
  – Several PGAS options will be supported

Each of these points sounds like Summit. Summit is a premier development platform for targeting Frontier.

Development Systems
• OLCF Summit
  – Best platform for development at scale
  – Similar “fat node” architecture
  – Ability to use many looking-forward-to-Frontier approaches today (e.g., OpenMP offload, HIP, Kokkos)
• Frontier Center of Excellence (CoE) systems at Cray
  – Small system (tulip) available for ECP project teams
  – User cap due to system size
  – Application details available on request (Bronson Messer, [email protected])
• OLCF Early Access System expected in late 2020
• Your local Linux system, especially if it has a GPU
  – Much of Frontier’s software stack uses open source software (e.g., ROCm)
  – Capability may depend on maturity of the software (e.g., Kokkos vs. OpenMP 5)
  – Some software does not yet support AMD GPUs

Compilers/Languages
• Three supported compiler suites
  – Cray Programming Environment (PE)
  – AMD
  – GCC
• All compiler suites support C, C++, Fortran
• Some official support for other HPC languages
  – UPC (Cray, GCC)
  – Chapel (Cray)
  – Coarray Fortran, Coarray C++ (Cray)
  – Charm++ (Cray only?)
• Open source options (e.g., Python, Julia, “stock” LLVM/Clang) should work
  – GPU and model-specific CPU support may be limited or non-existent

Inter-process
• Several supported options for inter-process communication and synchronization
  – MPI
  – Charm++
  – Coarray Fortran/C++
  – UPC
  – Global Arrays
  – GASNet
  – OpenSHMEM
• Support may be limited to a specific compiler suite (e.g., Coarray Fortran only available with the Cray compiler)

Intra-node
• Several options for targeting CPUs/GPUs in the fat node organization
  – Directives
  – Distinct kernel functions
  – Lambdas/functors

Intra-node: Directives
• Compiler annotations in source code that describe how to move data and how to parallelize code
• OpenMP 5.x, including offload to GPU, will be supported
  – Cray, AMD compilers
  – Possibly also GCC compilers
• OpenACC
  – In discussions, GCC suite only
  – CLACC (Joel Denny, [email protected]; LLVM-based, C only)
• Directives are the only option for a strict “Fortran-only” approach
• Path via Summit: OpenMP 4.5

Intra-node: Distinct Kernel Functions
• Heterogeneous-compute Interface for Portability (HIP)
  – C++ API very similar to CUDA, implemented in a header-only library
  – Language for writing GPU kernels in C++ with some C++11 features
  – Tools for converting CUDA code to HIP code
  – Can target either AMD or NVIDIA GPUs
  – Open source
• Collection of compatible libraries
  – Many analogous to NVIDIA libraries (e.g., rocSPARSE and cuSPARSE)
  – Some portability “shim” libraries to insulate application code
• OpenCL will be usable via the AMD implementation
• Path via Summit: modules for HIP and several HIP libraries (e.g., hipBLAS) are available and target NVIDIA GPUs

Intra-node: Lambdas/Functors
• Code to run on the GPU is specified using a C++ lambda or functor object
• Several options with varying degrees of maturity and likelihoods of availability
  – Kokkos, RAJA developers working on HIP backend
  – YAKL (“Yet Another Kernel Launcher”, Matt Norman from OLCF): like Kokkos but simplified, and with a Fortran interface to most functionality
  – Even SYCL/DPC++ may be available
• Path via Summit: OLCF does not provide modules, but Kokkos and YAKL are known to work on Summit CPUs and GPUs; RAJA should work

Fortran Revisited
• Directives will be the best-supported approach
• Efforts like YAKL provide an existence proof of the “Jump to C/C++” approach
  – Transfer control and data from Fortran to C/C++, e.g., using ISO bindings or specially-built Fortran bindings
  – Use preferred C/C++-based
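
The “Jump to C/C++” approach from the Fortran Revisited slide, combined with the directive-based offload from the Intra-node: Directives slide, might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not code from the talk: the function name scale_on_device, its signature, and the Fortran interface shown in the comment are all hypothetical. The C++ function is given C linkage so that Fortran could bind to it through ISO_C_BINDING, and its loop carries an OpenMP 4.5-style target directive.

```cpp
#include <cassert>

// Hypothetical entry point (name and signature are illustrative assumptions).
// extern "C" gives the function an unmangled symbol, so a Fortran caller
// could declare it with an interface along these lines:
//
//   interface
//     subroutine scale_on_device(x, n, a) bind(C, name="scale_on_device")
//       use iso_c_binding, only: c_double, c_int
//       real(c_double), intent(inout) :: x(*)
//       integer(c_int), value :: n
//       real(c_double), value :: a
//     end subroutine scale_on_device
//   end interface
extern "C" void scale_on_device(double* x, int n, double a) {
    assert(n >= 0 && (x != nullptr || n == 0));

    // OpenMP target offload: map x to the device, run the loop there,
    // and copy the results back when the region ends.
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}
```

When the file is built without OpenMP enabled, the pragma is ignored and the loop runs serially on the host; with OpenMP enabled but no offload device present, implementations typically fall back to the host as well. That makes a sketch like this checkable on a local Linux system before moving to Summit. The flags that enable GPU offload are compiler-specific and are not shown here.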