OpenCL Overview and Update
Neil Trevett | Khronos President
NVIDIA Vice President Developer Ecosystem
OpenCL Working Group Chair
@neilt3d
Santa Clara, May 2019
© Copyright Khronos® Group 2019 - Page 1

Khronos Mission
Khronos members are industry leaders from around the world that join to safely cooperate - to advance their own businesses and the industry as a whole
[Diagram labels: Software, Silicon]
Khronos is an open, member-driven industry consortium developing royalty-free standards, and vibrant ecosystems, to harness the power of silicon acceleration for demanding graphics rendering and computationally intensive applications

Active Khronos Standards
Industry involvement, engagement and feedback are the lifeblood of any successful standard

Levels of Khronos Engagement
• OpenCL Forum and Slack Channel: Open to all!
  - https://community.khronos.org/c/opencl
  - www.khr.io/slack
• Contribute to open source specs, CTS and tools: Spec fixes and suggestions made under the Khronos IP Framework. Open source contributions under repo’s CLA – typically Apache 2.0
  - https://github.com/KhronosGroup
• Advisory Panel: Advisors under the Khronos IP Framework can comment and contribute to requirements and draft specifications
• Working Group: Khronos members under IP Framework and NDA can participate and vote in working group meetings

OpenCL Heterogeneous Computing
• A programming and runtime framework for heterogeneous compute resources
• Low-level control over memory allocation and parallel task execution
• Simpler and relatively lightweight compared to GPU APIs
• Growing number of optimized OpenCL libraries
  - Vision and Imaging
  - Machine Learning and Inferencing
  - Linear Algebra and Mathematics
  - Physics
[Diagram labels: Application; Fragmented GPU API Landscape; GPU, CPU, FPGA, DSP; Heterogeneous Compute Resources]
• Many mobile SoCs and embedded systems are becoming increasingly heterogeneous
  - Autonomous vehicles
  - Vision and inferencing
[Diagram label: Custom Hardware]

OpenCL – Low-level Parallel Programming
• Low-level programming of heterogeneous parallel compute resources
  - One code tree can be executed on CPUs, GPUs, DSPs and FPGAs …
• OpenCL C or C++ language to write kernel programs to execute on any compute device
  - Platform Layer API - to query, select and initialize compute devices
  - Runtime API - to build and execute kernel programs on multiple devices
• The programmer gets to control:
  - What programs execute on what device
  - Where data is stored in various speed and size memories in the system
  - When programs are run, and what operations are dependent on earlier operations
[Diagram: a Host (CPU) and Devices (GPU, CPU, DSP, FPGA), each device running OpenCL Kernel Code. Kernel code is compiled for devices; the Runtime API loads and executes kernels across devices.]

OpenCL Versions
OpenCL 2.2:
• OpenCL C++ Kernel Language: Static subset of C++14, Templates and Lambdas
• SPIR-V 1.2 with C++14 support e.g.
constructors/destructors
• Pipes: Efficient device-scope communication between kernels
• Code Gen Optimizations:
  - Specialization constants at SPIR-V compilation time
  - Constructors and destructors of program scope global objects
  - User callbacks can be set at program release time
OpenCL 2.1:
• SPIR-V in Core
• Subgroups into core
• Subgroup query operations
• clCloneKernel
• Low-latency device timer queries
OpenCL 2.0:
• Shared Virtual Memory
• On-device dispatch
• Generic Address Space
• Enhanced Image Support
• C11 Atomics
• Pipes
• Android ICD
OpenCL 1.2:
• Device partitioning
• Separate compilation and linking
• Enhanced image support
• Built-in kernels / custom devices
• Enhanced DX and OpenGL Interop
OpenCL 1.1:
• 3-component vectors
• Additional image formats
• Multiple hosts and devices
• Buffer region operations
• Enhanced event-driven execution
• Additional OpenCL C built-ins
• Improved OpenGL data/event interop
OpenCL 1.2 is the de-facto baseline version for all non-FPGA processors
Specification timeline:
• Dec08 - OpenCL 1.0 Specification
• Jun10 - OpenCL 1.1 Specification (18 months)
• Nov11 - OpenCL 1.2 Specification (18 months)
• Nov13 - OpenCL 2.0 Specification (24 months)
• Nov15 - OpenCL 2.1 Specification (24 months)
• May17 - OpenCL 2.2 Specification (18 months)

OpenCL Conformant Implementations
[Chart: per-vendor conformance timelines in Desktop, Mobile, Embedded and FPGA groups, plotted against the specification timeline Dec08 (OpenCL 1.0), Jun10 (OpenCL 1.1), Nov11 (OpenCL 1.2), Nov13 (OpenCL 2.0), Nov15 (OpenCL 2.1). Vendor logos and row grouping were not preserved. Version|date entries in extraction order: 1.0|May09, 1.1|Jul11, 1.2|Jun12, 1.0|Aug09, 1.1|Aug10, 1.2|May12, 2.0|Dec14, 1.0|May10, 1.1|Feb11, 1.1|Mar11, 1.2|Dec12, 2.0|Jul14, 2.1|Jun16, 1.0|May09, 1.1|Jun10, 1.2|May15, 1.1|Aug12, 1.2|Mar16, 2.0|Apr17, 1.0|Feb11, 1.2|Sep13, 1.1|Nov12, 1.2|Apr14, 2.0|Nov15, 1.1|Apr12, 1.2|Dec14, 1.2|Sep14, 1.1|May13, 1.0|Jan10, 1.2|May15, 1.2|Aug15, 1.0|Jul13, 1.0|Dec14]
Adoption Since Last EVS
• Intel: 2.1 for latest processors on Windows and Linux
• Intel: 1.0 on Arria 10 GX FPGA
• NVIDIA: 1.2 for Turing GPUs on Windows and Linux
• Qualcomm: 2.0 on Adreno GPUs on Android
Vendor timelines are first conformant submission for each spec
generation

OpenCL Apps and Libs
OpenCL is Pervasive! 70% Increase in two years
[Logo wall. Category labels: Desktop Creative Apps; Parallel Computation Languages; Linear Algebra Libraries; Machine Learning Libraries; Math and Physics Libraries; Machine Learning Inferencing Compilers; Vision and Imaging Libraries. Names that survived as text: Modo, clDNN, Intel, Huawei, Android NNAPI, Arm Compute Library, CLBlast, SYCL-BLAS, AMD MIVisionX, TI Deep Learning Library (TIDL), VeriSilicon, Synopsys MetaWare EV]
https://www.khronos.org/opencl/resources/opencl-applications-using-opencl
https://www.iwocl.org/resources/opencl-libraries-and-toolkits/

OpenCL Open Source Specs and Tests
• Khronos has open sourced OpenCL Specifications and Conformance Tests
• Merge requests welcome from the community (subject to review by OpenCL working group)
• File bugs for specifications, headers etc.
• Source Materials for Specifications and Reference Documentation CONTRIBUTED Under Khronos IP Framework (you won’t assert patents against conformant implementations, and license copyright for Khronos use)
• Spec and Ref Language Source: Redistribution under CC-BY 4.0. Mix your own documentation!
• Spec Build System and Scripts: Contributions and Distribution under Apache 2.0
• Conformance Test Suite Source: Contributions and Distribution under Apache 2.0
• Khronos builds and ratifies the Canonical Specification under the Khronos IP Framework.
• Spec and Ref Language Source and derivative materials.
  Re-mixable under CC-BY by the industry and community
• Canonical Specification: No changes or re-hosting allowed
• Khronos Adopters Program: Anyone can test any implementation at any time. Conformant Implementations can use trademark and are covered by Khronos IP Framework
• Community built documentation and tools

Underway - Unified OpenCL Specification
• Unified OpenCL API specification will describe the API for all versions of OpenCL
  - Rather than having a separate specification per version
• OpenCL SPIR-V environment, extension and SPIR-V specs are already unified
  - Working well – good developer feedback
• Easier for developers to navigate
  - And to consistently apply specification fixes and clarifications
• Working Group has started prototyping the spec work
  - Short introductory section describing the unified aspects
  - "missing before X.Y" and "deprecated by X.Y" language
  - Roughly as in the SPIR-V specification
  - https://github.com/KhronosGroup/OpenCL-Docs/issues/77
• DRAFT unified spec is already uploaded!
  - https://github.com/KhronosGroup/OpenCL-Docs/files/3170333/OpenCL_API.pdf
  - Feedback welcome!

OpenCL Evolution
• May 2017: OpenCL 2.2
• Spec Maintenance Updates: Regular updates for spec clarifications and bug fixes
• OpenCL Extension Specs:
  - Vulkan / OpenCL Interop
  - Scratch-Pad Memory Management
  - Extended Subgroups
  - SPIR-V 1.4 ingestion for compiler efficiency
  - SPIR-V Extended debug info
• Target 2020: ‘OpenCL Next’ - Integration of Extensions plus New Core functionality
  - Vulkan-like loader and layers
  - ‘Flexible Profile’ for Deployment Flexibility
• Repeat Cycle for next Core Specification

Embedded Processors & OpenCL Conformance
• The embedded market is a new frontier needing advanced heterogeneous compute
  - E.g. Vision and inferencing using a wide range of processor architectures
• BUT OpenCL is currently monolithic – and arguably desktop/HPC-centric
  - E.g.
a processor without 32-bit IEEE floating point cannot realistically be conformant
  - Vendors and developers do not want software emulation of higher precisions
• Many functionality requirements change between different markets and processors
• OpenCL can drive significantly more adoption in the embedded processor market
[Table: Supported Precisions for DSP A, DSP B and DSP C. Rows: 8-bit integer, 16-bit integer, 32-bit integer, 64-bit integer, 16-bit float, 32-bit float, 64-bit float. Per-cell support marks were not preserved.]
Possible to be OpenCL Compliant?
• DSP A: No
• DSP B: No
• DSP C: Yes

Flexible Profile and Feature Sets
• In the OpenCL Next Flexible Profile, features become optional for enhanced deployment flexibility
  - API and language features e.g. floating point precisions
• Feature Sets reduce danger of fragmentation
  - Defined to suit specific markets – e.g. desktop, embedded vision and inferencing
• Implementations are conformant if they fully support feature set functionality
  - Supporting Feature Sets will help drive sales – encouraging consistent functionality per market
  - An implementation may support multiple Feature