Rocm: Leveraging Open Source Software for Accelerating

ROCm: Leveraging Open Source Software for Accelerating Applications with GPUs Derek Bouius – Product Manager Disclaimer The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. All open source software listed in this presentation is governed by the associated open source license. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18 ©2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Ryzen, Threadripper, EPYC, Infinity Fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. PCIe is a trademark (or registered trademark) of PCI-SIG Corporation. OpenCL is a trademark of Apple Inc. used by permission by Khronos Group, Inc. Linux is a trademark (or registered trademark) of Linus Torvalds. ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 2 What Scientists Care About: “Time-to-Results” Design Innovate Develop Analyze Execute ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 3 Merging of HPC and ML Into Same Workflow Accelerates “Time-to-Results” • Trained ANN can replace existing numerical solvers, enabling fast and scalable simulations – up to 100 Million times faster! https://arxiv.org/abs/1910.07291 ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 4 Accelerate Development Through Abstraction of Heterogenous Compute Infrastructure Domain Specific Languages (DSLs) Frameworks Directives Libraries ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 5 • The GCN ISA is free and open: https://developer.amd.com/resources/developer- guides-manuals/ • hcc (hip-clang) • Compiles HIP code • HIP (Heterogeneous Interface for Portability) is an interface that looks similar to CUDA AMD GPU • hcc is a fork of clang • It understands HIP and emits AMDGCN in the resulting Compilers: binary • hipcc -> hcc (clang) -> amdgcn C/C++ based • All the x86 pieces are dealt with in the same way • AOMP (AMD OpenMP Compiler) • Compiles C/C++ code with OpenMP “target” pragmas • Links with libomptarget to produce a binary that can offload work to the GPU • OpenCL • Khronos Industry Standard accelerator language ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 7 • Important technology to the industry • AMD has plans to support OpenMP 4.5+ target offload from FORTRAN with two open source AMD GPU options Compilers: • F18 (based on llvm) • gfortran the • FORTRAN compiler work is an ongoing effort FORTRAN • See the Frontier spec sheet for what is expected to be supported on Frontier story • https://www.olcf.ornl.gov/wp- content/uploads/2019/05/frontier_specsheet .pdf ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 8 Compiler Front Ends (C, C++, HIP, OpenMP, etc) Language Runtime API User Space ROCr System Runtime API GCN Assembly ROCm Thunk API Device LLVM Host LLVM ROCm Kernel Driver Compiler (GCN) Compiler Kernel Space Optimizer Optimizer AMDGPU Kernel Driver GCN Target CPU ISA Target CPU Code GPU Code ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 9 ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 10 Memcopy time interval GPU kernel time interval API call time interval Kernel parameters Kernel counters $ rocprof –hsa-trace -i input.txt SimpleConvolution Kernel timestamps 11 ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 11 Decoder ring: Math library equivalents CUBLAS ROCBLAS Basic Linear Algebra Subroutines CUFFT ROCFFT Fast Fourier Transforms CUDNN MIOPEN Deep Learning Library CUB ROCPRIM Optimized Parallel Primitives EIGEN EIGEN C++ Template Library for Linear Algebra MORE INFO AT: GITHUB.COM/ROCM-DEVELOPER-TOOLS/HIP à HIP_PORTING_GUIDE.MD ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 12 HIP: Heterogeneous-compute Interface for Portability C++ runtime API and kernel language that allows developers to create portable applications that can Portable HIP C++ (Host & Device Code) run on AMD’s accelerators as well as CUDA devices. • Is open-source #include “cuda.h” #include “hcc.h” • Provides an API for an application to leverage GPU acceleration for both AMD and CUDA devices nvcc hipcc • Syntactically similar to CUDA. Most CUDA API calls can be converted in place: cuda -> hip AMD GPU • Supports a strong subset of CUDA runtime Nvidia GPU functionality ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 13 AMD GPU Libraries A note on naming conventions: • roc* -> AMGCN library usually written in HIP • cu* -> NVIDIA PTX libraries • hip* -> usually interface layer on top of roc*/cu* backends hip* libraries: • Can be compiled by hipcc and can generate a call for the device you have: • hipcc->AMDGCN hipBLAS • hipcc->nvcc (inlined)->NVPTX rocBLAS cuBLAS ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 14 Hipify-perl: • Easy to use –point at a directory and it will attempt to hipify CUDA code • Very simple string replacement technique: may make incorrect translations • sed -e ‘s/cuda/hip/g’, (e.g., cudaMemcpy becomes hipMemcpy) • Recommended for quick scans of projects HIPify Tools Hipify-clang: • Requires clang compiler to parse files and perform semantic translation • More robust translation of the code • Generates warnings and assistance for additional analysis • High quality translation, particularly for cases where the user is familiar with the make system ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 15 Directives –OpenMP and OpenACC Abstract a Variety of Hardware Platforms OpenMP OpenACC CPU GPU Accelerator ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 16 Frameworks –HPC Kokkos RAJA OpenMP HIP ROCm CUDA ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 17 Frameworks – Machine Learning TensorFlow PyTorch (Caffe2) MIOpen HIP OpenCL ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 18 Domain Specific Languages – Portability of the Future? DSL 1 DSL 2 DSL n OpenMP OpenACC HIP ISO C++ ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 19 Accelerate Today https://github.com/RadeonOpenCompute/k8s-device-plugin https://hub.docker.com/r/rocm/k8s-device-plugin/ hub.docker.com/r/rocm/ Upstream support in SyLabs Singularity ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 20 For more detailed information… • www.amd.com/rocm HIP Training Webinar: • https://www.youtube.com/watch?v=3ZXbRJVvgJs • Sessions at SC19: contact [email protected] ADAC8 - ©2019 Advanced Micro Devices, Inc. All rights reserved. 21 .

Rocm: Leveraging Open Source Software for Accelerating

GLSL 4.50 Spec

Intel Corporation

Implementing FPGA Design with the Opencl Standard

Ipox® Portfolio Holding Analysis 02/24/2021

The Amd Linux Graphics Stack – 2018 Edition Nicolai Hähnle Fosdem 2018

AMD Socket AM2 with AMD M690T Chipset Reference Schematic D D

Khronos Template 2015

Advanced Micro Devices, Inc. 2485 Augustine Drive Santa Clara, California 95054

Khronos Native Platform Graphics Interface (EGL Version 1.4 - April 6, 2011)

The Openvx™ Specification

SYCL – Introduction and Hands-On

Opencl on The