Khronos SIGGRAPH Asia Macau Dec16
Total Page:16
File Type:pdf, Size:1020Kb
NNEF New! Neural Net Exchange Format Khronos VR Initiative! Graphics, Compute and Vision APIs Including Vulkan Next Generation GPU Acceleration SIGGRAPH Asia 2016, Macau Neil Trevett Vice President Developer Ecosystem, NVIDIA | President, Khronos [email protected] | @neilt3d November 2016 © Copyright Khronos Group 2016 - Page 1 Khronos Mission Software Silicon Khronos is an International Industry Consortium of over 100 companies creating royalty-free, open standard APIs to enable software to access hardware acceleration for 3D graphics, parallel computing and vision processing © Copyright Khronos Group 2016 - Page 2 Khronos Standards 3D for the Web - JavaScript API for 3D Graphics - Transmission of 3D runtime assets New! Khronos VR Initiative! NNEF Vision / Camera - Tracking and positioning - Scene Analysis 3D Graphics - Games and Virtual Reality - Augmented Reality - CG Effects Parallel Computation - Product Design - Machine Learning - Safety Critical - Scene Comprehension - 3D scene model construction - Video Transcoding © Copyright Khronos Group 2016 - Page 3 OpenCL – Low-level Parallel Programing • Low level programming of heterogeneous parallel compute resources - One code tree can be executed on CPUs, GPUs, DSPs and FPGA • OpenCL C language to write kernel programs to execute on any compute device - Platform Layer API - to query, select and initialize compute devices - Runtime API - to build and execute kernels programs on multiple devices • New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14 - Adaptable and elegant sharable code – great for building libraries - Templates enable meta-programming for highly adaptive software - Lambdas used to implement nested/dynamic parallelism GPU OpenCL CPU KernelOpenCL Kernel code DSP Runtime API CodeKernelOpenCL compiled for CPU loads and executes CPU CodeKernelOpenCL FPGA CodeKernel devices kernels across devices Devices Code Host © Copyright Khronos Group 2016 - Page 4 OpenCL 2.2 - Top to Bottom C++ Single Source C++ Programming Full support for features in C++14-based Kernel Language API and Language Specs Brings C++14-based Kernel Language into core specification OpenCL C++ Kernel Language SPIR-V 1.1 with C++ support Portable Kernel Intermediate Language SYCL 2.2 for single source C++ Support for C++14-based kernel language e.g. constructors/destructors SPIR-V in Core Subgroups into core Subgroup query operations Shared Virtual Memory clCloneKernel On-device dispatch Low-latency device Generic Address Space timer queries 3-component vectors Enhanced Image Support Device partitioning Additional image formats C11 Atomics Separate compilation and linking Multiple hosts and devices Pipes Enhanced image support Buffer region operations Android ICD Built-in kernels / custom devices Enhanced event-driven execution Enhanced DX and OpenGL Interop Additional OpenCL C built-ins Improved OpenGL data/event interop Dec08 18 months Jun10 18 months Nov11 24 months Nov13 24 months Nov15 7months May16 OpenCL 1.0 OpenCL 1.1 OpenCL 1.2 OpenCL 2.0 OpenCL 2.1 OpenCL 2.2 Specification Specification Specification Specification Specification PROVISIONAL © Copyright Khronos Group 2016 - Page 5 SPIR-V Transforms the Language Ecosystem • First multi-API, intermediate language for parallel compute and graphics - Natively represents structures in shader and kernel languages - https://www.khronos.org/registry/spir-v/papers/WhitePaper.pdf Diverse Languages and Frameworks Multiple Developer Advantages Use same front-end compiler for all platforms Standard Ship SPIR-V – not shader source code Tools for Portable Simpler and more reliable drivers analysis and optimization Reduces runtime kernel compilation time Intermediate Representation Hardware runtimes on multiple architectures © Copyright Khronos Group 2016 - Page 6 Third party kernel and SPIR-V Ecosystem shader Languages Khronos has open sourced ‘glslang’ GLSL to these tools and translators HLSL GLSL SPIR-V compiler Khronos plans to open source these tools soon OpenCL C OpenCL C++ https://github.com/KhronosGroup/SPIRV-Tools SPIR-V (Dis)Assembler LLVM to SPIR-V Bi-directional Translator SPIR-V Validator LLVM Other SPIR-V Intermediate • Khronos defined and controlled Forms cross-API intermediate language • Native support for graphics and parallel constructs • 32-bit Word Stream • Extensible and easily parsed • Retains data object and control IHV Driver ARB_gl_spirv flow information for effective Runtimes extension code generation and translation © Copyright Khronos Group 2016 - Page 7 SYCL for OpenCL • Single-source heterogeneous programming using STANDARD C++ - Use C++ templates and lambda functions for host & device code • Aligns the hardware acceleration of OpenCL with direction of the C++ standard - C++14 with open source C++17 Parallel STL hosted by Khronos Developer Choice The development of the two specifications are aligned so code can be easily shared between the two approaches C++ Kernel Language Single-source C++ Low Level Control Programmer Familiarity ‘GPGPU’-style separation of Approach also taken by device-side kernel source C++ AMP and OpenMP code and host code © Copyright Khronos Group 2016 - Page 8 OpenCV OpenCV Transparent API for OpenCL Kernel Offload • One OpenCL queue per CPU thread • Extensive and widely used open source • CPU threads can share a device vision library - written in optimized C/C++ • OpenCL kernels are executed asynchronously - Free-use BSD license • C++, C, Python and Java interfaces OpenCV Application - Windows, Linux, Mac OS, iOS and Android • Increasingly taking advantage of heterogeneous processing using OpenCL CPU CPU … CPU - OpenCV 3.X Transparent API; Thread Thread Thread single API entry for each function/algorithm - Dynamically loads OpenCL runtime if available; otherwise falls back to CPU code ocl::Queue ocl::Queue … ocl::Queue - Runtime Dispatching; no recompilation! ocl::Device … ocl::Device ocl::Context © Copyright Khronos Group 2016 - Page 9 OpenCL Conformant Implementations 1.0 | May09 1.1 | Jul11 1.2 | Jun12 1.0 | Aug09 1.1 | Aug10 1.2 | May12 2.0 | Dec14 1.0 | May10 1.1 | Feb11 Desktop 1.1 |Mar11 1.2 | Dec12 2.0 | Jul14 2.1 | Jun15 1.0 | May09 1.1 | Jun10 1.2 | May15 1.1 | Aug12 1.2 | Mar16 1.0 | Feb11 1.2 | Sep13 Mobile 1.1 | Nov12 1.2 | Apr14 2.0 | Nov15 1.1 | Apr12 1.2 | Dec14 1.2 | Sep14 1.1 | May13 1.0 | Jan10 Embedded 1.2 | May15 1.2 | Aug15 1.0 | Jul13 FPGA 1.0 | Dec14 Vendor timelines are Dec08 Jun10 Nov11 Nov13 Nov15 first implementation of OpenCL 1.0 OpenCL 1.1 OpenCL 1.2 OpenCL 2.0 OpenCL 2.1 each spec generation Specification Specification Specification Specification Specification © Copyright Khronos Group 2016 - Page 10 OpenCL Roadmap DISCUSSION • Thin, powerful, explicit run-time for control and predictability • Feature sets and dial-able precision for target market agility • Installable tools and three layer ecosystem for flexibility Applications Vendor-supplied and Math Language Tool open source middleware Libraries Front-ends Layers API Definitions Dial-able types Thin, explicit run-time with rigorous Real-time Pre- Explicit Self-synchronized, Stream and precision memory/execution model. emption and Asynch self-scheduled Processing … QoS scheduling DMA graphs Installable tool & Low-latency, fine-grain pre-emption validation layers and synchronization Features that can be enabled for particular target markets © Copyright Khronos Group 2016 - Page 11 Khronos Standards 3D for the Web - JavaScript API for 3D Graphics - Transmission of 3D runtime assets New! Khronos VR Initiative! NNEF Vision / Camera - Tracking and positioning - Scene Analysis 3D Graphics - Games and Virtual Reality - Augmented Reality - CG Effects Parallel Computation - Product Design - Machine Learning - Safety Critical - Scene Comprehension - 3D scene model construction - Video Transcoding © Copyright Khronos Group 2016 - Page 12 The Need for a New Generation GPU API • Explicit - Direct GPU control, predictable - no driver surprises • Faster - Reduce CPU overhead and latency, multi-thread scaling • Portable - Efficient on cloud, desktop, console, mobile and embedded OpenGL has evolved over 25 years and GPUs are increasingly programmable and GPUs will accelerate graphics, compute, vision continues to meet industry needs – but there is compute capable + platforms are becoming and deep learning across diverse platforms: a need for a complementary API approach mobile, memory-unified and multi-core FLEXIBILITY and PORTABILITY are key © Copyright Khronos Group 2016 - Page 13 Three New Generation GPU APIs Only Windows 10 Only Apple Cross Platform N Vulkan is extensible – just like OpenGL - and so is the new only generation API where hardware vendors can deliver innovations to market whenever they need © Copyright Khronos Group 2016 - Page 14 The Key Principle of Vulkan: Explicit Control • Application tells the driver what it is going to do You have to supply a LOT of information! - In enough detail that driver doesn’t have to guess - When the driver needs to know it And, you have to supply it early • In return, driver promises to do - What the application asks for - When it asks for it - Very quickly Efficient, predictable performance • No driver magic – no surprises © Copyright Khronos Group 2016 - Page 15 Vulkan - No Compromise Performance Potential Performance Gain ‘Half Way New Gen’ Retains Traditional Binding Model Mixes OpenGL ES 3.1/OpenCL 1.2 C++11-based kernel language Bindings to Objective-C and Swift Amount of work to port (missing functionality such as Geometry Shaders) from traditional