SYCL 2020 Specification Release

SYCL 2020 Specification Release Michael Wong SYCL WG Chair Codeplay Distinguished Engineer ISOCPP Director & VP ISO C++ Directions Group [email protected] | wongmichael.com/about This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 1 Khronos Connects Software to Silicon Open, royalty-free interoperability standards to harness the power of GPU, multiprocessor and XR hardware 3D graphics, augmented and virtual reality, parallel programming, inferencing and vision acceleration Non-profit, member-driven standards organization, open to any company Well-defined multi-company governance and IP Framework Founded in 2000 >150 Members ~ 40% US, 30% Europe, 30% Asia This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 2 Khronos Active Initiatives 3D Graphics 3D Assets Portable XR Parallel Computation Desktop, Mobile Authoring Augmented and Vision, Inferencing, and Web and Delivery Virtual Reality Machine Learning Safety Critical APIs This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 3 Khronos Compute Acceleration Standards Higher-level Languages and APIs Streamlined development and performance portability Single source C++ programming Graph-based vision and with compute acceleration inferencing acceleration Lower-level APIs Direct Hardware Control GPU rendering + Intermediate Heterogeneous compute Representation SYCL and SPIR were compute acceleration (IR) supporting acceleration originally OpenCL parallel execution sub projects and graphics GPU CPUCPUCPU GPUGPU FPGA DSP Increasing industry interest in AI/Tensor HW parallel compute acceleration to Custom Hardware combat the ‘End of Moore’s Law’ This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 4 SYCL 2020 Launch! Open Standard for Single Source C++ Parallel Programming SYCL 2020 is released after 3 years of intense work Significant adoption in Embedded, Desktop and HPC markets Improved programmability, smaller code size, faster performance Based on C++17, backwards compatible with SYCL 1.2.1 Simplify porting of standard C++ applications to SYCL Closer alignment and integration with ISO C++ Backend acceleration API independent SYCL 2020 increases expressiveness and simplicity for modern C++ heterogeneous programming This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 5 SYCL 2020 Industry Momentum SYCL support growing from Embedded Systems through https://www.alcf.anl.gov/support-center/aurora/sycl-and-dpc-aurora Desktops to Supercomputers https://www.embeddedcomputing.com/technology/open-source/risc-v-open-source-ip/nsitexe-kyoto-microcomputer-and-codeplay-software-are-bringing-open-standards-programming-to-risc-v-vector-processor-for-hpc-and-ai-systems https://www.nextplatform.com/2021/02/03/can-sycl-slice-into-broader-supercomputing/ https://www.phoronix.com/scan.php?page=news_item&px=hipSYCL-New-Lite-Runtime https://software.intel.com/content/www/us/en/develop/articles/interoperability-dpcpp-sycl-opencl.html https://www.renesas.com/br/en/about/press-room/renesas-electronics-and-codeplay-collaborate-opencl-and-sycl-adas-solutions https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2021/nersc-alcf-codeplay-partner-on-sycl-for-next-generation-supercomputers/ https://research-portal.uws.ac.uk/en/publications/trisycl-for-xilinx-fpga https://www.imaginationtech.com/news/press-release/tensorflow-gets-native-support-for-powervr-gpus-via-optimised-open-source-sycl-libraries/ This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 6 SYCL 2020 Major Features • Unified Shared Memory (USM) • Code with pointers can work naturally without buffers or accessors • Parallel Reductions • Added built-in reduction operation to avoid boilerplate code and achieve maximum performance on hardware with built-in reduction operation acceleration • Work group and subgroup algorithms • Efficient parallel operations between work items • Class template argument deduction (CTAD) and template deduction guides • Simplified class template instantiation • Simplified use of Accessors with a built-in reduction operation • Reduces boilerplate code and streamlines the use of C++ software design patterns • Expanded interoperability • Efficient acceleration by diverse backend acceleration APIs • SYCL atomic operations are now more closely aligned to standard C++ atomics • Enhances parallel programming freedom This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 7 SYCL Single Source C++ Parallel Programming One-MKL Complex ML frameworks One-DNN Standard C++ can be directly compiled OneDPC C++ ML and accelerated SYCL-BLAS Application SYCL-Eigen Libraries Frameworks SYCL-DNN Code SYCL Parallel STL ... C++ Template C++ Template C++ Template C++ templates and lambda functions separate host & Libraries Libraries Libraries accelerated device code C++ Kernel Fusion can SYCL CPU give better performance Compiler Compiler on complex apps and libs than hand-coding CPU Other Backends Accelerated code SYCL is ideal for accelerating larger passed into device CPUCPUCPU GPUGPU CPUCPUCPU GPUGPU C++-based engines and applications OpenCL compilers FPGA DSP FPGA DSP with performance portability AI/Tensor HW AI/Tensor HW Custom Hardware Custom Hardware This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 8 Parallel Industry Initiatives C++11 C++14 C++17 C++20 C++23 SYCL 1.2 SYCL 1.2.1 SYCL 2020 SYCL 202X C++11 Single source C++11 Single source C++17 Single source C++20 Single source programming programming programming programming Many backend options Many backend options OpenCL 1.2 OpenCL 2.1 OpenCL 2.2 OpenCL 3.0 OpenCL C Kernel SPIR-V in Core Language 2011 2015 2017 2020 202X This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 9 SYCL Implementations in Development SYCL, OpenCL and SPIR-V, as open industry SYCL enables Khronos to standards, enable flexible integration and influence ISO C++ to (eventually) deployment of multiple acceleration technologies Source Code support heterogeneous compute DPC++ ComputeCpp triSYCL hipSYCL neoSYCL Uses LLVM/Clang Multiple Open source CUDA and SX-AURORA Part of oneAPI Backends test bed HIP/ROCm TSUBASA Experimental Experimental Any CPU CUDA+PTX Any CPU OpenCL+PTX OpenMP OpenMP CUDA NVIDIA GPUs NVIDIA GPUs Any CPU Any CPU NVIDIA GPUs VEO OpenCL + OpenCL + OpenCL + ROCm SPIR-V SPIR(-V) SPIR/LLVM AMD GPUs Intel CPUs NEC VEs Intel CPUs Intel CPUs XILINX FPGAs Intel GPUs Intel GPUs POCL Intel FPGAs Intel FPGAs (open-source OpenCL supporting CPUs and NVIDIA Multiple Backends in Development AMD GPUs GPUs and more) SYCL beginning to be supported on multiple (depends on driver stack) low-level APIs in addition to OpenCL Arm Mali IMG PowerVR e.g., ROCm and CUDA Renesas R-Car For more information: http://sycl.tech This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 10 SYCL in Embedded Systems Networks trained on high-end Neural Network Open industry standards, enable Training Data flexible integration and deployment desktop and cloud systems Training of multiple acceleration technologies Trained Compilation Networks Ingestion Applications link to compiled Compiled Vision / Inferencing C++ Application inferencing code or call Engine Code vision/inferencing API Code Hardware Acceleration APIs Sensor Data Diverse Embedded Hardware DSP Multi-core CPUs, GPUs Dedicated DSPs, FPGAs, Tensor Cores FPGA Hardware * Vulkan only runs on GPUs GPU This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 11 SYCL in HPC/Supercomputers Simulation Data Learning Three Pillars of HPC Languages Productivity Languages Productivity Languages Science Problem Solver Libraries, Parallel RT Big Data Stack, Stats Lib, Databases Deep Learning, Linear Alg, ML Today’s Supercomputing Development Workflow needs knowledge of Need Languages that allow system architecture and control of these Data Issues tools that control data OpenMP for C CUDA/pthreads/ C++ Application uses Set Data affinity, Data Layout, Data and Fortran OpenACC/OpenCL SYCL, Kokkos, Raja movement, Data Locality, highly Parameterized Code and dynamically Choose compose the algorithms (C++ templates, parallel STL, inlining and fusion, Algorithm abstractions) for target Libraries augment compiler Math, ML, Data Libraries;C++ Std, C, Python Libraries optimizations for Performance Portable programs Implement 2021 2020 2021 2021 and Test Use open standards to run Algorithm Performance Portable code on new generation, or different vendor’s, hardware with compiler optimization, explicit parametrization and Optimize dynamically composed algorithm Algorithm Based on IWOCL/SYCLCon 2020 keynote Hal FInkel: https://www.iwocl.org/wp- content/uploads/iwocl-syclcon-2020-finkel-keynote-slides.pdf This work is licensed under a Creative Commons Attribution 4.0 International License © The Khronos® Group Inc. 2021 - Page 12 Enabling Industry Engagement • SYCL working group values industry feedback - https://community.khronos.org/c/sycl

SYCL 2020 Specification Release

Introduction to Openacc 2018 HPC Workshop: Parallel Programming

The Importance of Data

Introduction to GPU Computing

Delivering Heterogeneous Programming in C++

Programmers' Tool Chain

Intel® Oneapi Programming Guide

Opencl SYCL 2.2 Specification

GPU Computing with Openacc Directives

Openacc Course October 2017. Lecture 1 Q&As

Of the Threading Building Blocks Flow Graph API, a C++ API for Expressing Dependency, Streaming and Data Flow Applications

SYCL – Introduction and Hands-On

Under the Hood of SYCL – an Initial Performance Analysis with an Unstructured-Mesh CFD Application