Open API Standards for Mobile Graphics, Compute and Vision Processing
GTC, March 2014
Neil Trevett
Vice President Mobile Ecosystem, NVIDIA
President, Khronos
© Copyright Khronos Group 2014 - Page 1
Khronos Connects Software to Silicon
• Open consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
• Defining the roadmap for low-level silicon interfaces needed on every platform
- Graphics, compute, rich media, vision, sensor and camera processing
• Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs every day…
Khronos Standards
Over 100 companies defining royalty-free APIs to connect software to silicon
• Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing
• 3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format with compression
• Acceleration in HTML5
- 3D in browser, no plug-in
- Heterogeneous computing for JavaScript
• Sensor Processing
- Vision Acceleration
- Camera Control (Camera Control API)
- Sensor Fusion
The OpenGL Family
• OpenGL 4.4 is the industry's most advanced 3D API
- Cross platform: Windows, Linux, Mac, Android
- Foundation for productivity apps
- Target for AAA engines and games
• OpenGL ES is the most pervasively available 3D API: 1.6 billion devices and counting
- Almost every mobile and embedded device, including Android and iOS
- Bringing proven desktop functionality to mobile
• WebGL is a JavaScript binding to OpenGL ES
- Enabling the Web with GPU access
- Almost pervasive availability on mobile and desktop browsers
- Truly portable 3D apps with HTML5
OpenGL ES 3.1 Launched at GDC!
• Headline features
- Compute shaders and draw-indirect
- Compute shaders can create geometry or other rendering data
• Expecting rapid adoption
- Driver upgrade for many SoCs
- Backward compatible with 2.0/3.0, so apps can incrementally adopt features
• Enabling desktop OpenGL to be used for mobile development
- ARB_ES3_1_compatibility specification to support an "OpenGL ES 3.1 context"
[Timeline: Working Group formed (2002), OpenGL ES 1.0 (2003), 1.1 (2004, driver update), 2.0 (2007, silicon update), 3.0 (2012, silicon update), 3.1 (2014, driver update)]
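Because OpenGL ES 3.1 is backward compatible with 2.0/3.0, an app can check the context version at run time and switch on newer paths only when they are available. The following is a minimal sketch under a few assumptions. The version string is passed in directly; a real app would get it from glGetString(GL_VERSION), which for OpenGL ES begins with "OpenGL ES <major>.<minor>". The feature names are hypothetical labels, not GL identifiers.

```python
import re

# An OpenGL ES GL_VERSION string starts with "OpenGL ES <major>.<minor>",
# followed by vendor-specific text. Here the string is supplied by the caller.
_ES_VERSION = re.compile(r"^OpenGL ES (\d+)\.(\d+)")


def parse_es_version(version_string):
    """Return (major, minor) for an OpenGL ES version string, or None."""
    m = _ES_VERSION.match(version_string)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def pick_features(version_string):
    """Hypothetical feature selection: adopt newer features incrementally."""
    version = parse_es_version(version_string)
    if version is None:
        raise ValueError("not an OpenGL ES version string: %r" % version_string)
    features = {"es2_baseline"}
    if version >= (3, 0):
        features.add("es3_core")
    if version >= (3, 1):
        features.update({"compute_shaders", "draw_indirect"})
    return features


print(sorted(pick_features("OpenGL ES 3.1 vendor-specific-info")))
# ['compute_shaders', 'draw_indirect', 'es2_baseline', 'es3_core']
```

Comparing version tuples keeps the checks cumulative. A 3.1 context gets every 3.0 path plus compute and draw-indirect, which matches the incremental-adoption point above.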
OpenGL Fallacy: Old and Inefficient
"Ancient crufty stuff": Immediate Mode, Display Lists, Fixed Function, Evaluators, Feedback, Selection
OpenGL Reality: Modern & Efficient
• Bindless (ARB)
• Multi-Draw Indirect (GL4.3)
• Texture Arrays (GL3.0)
• Shader Storage Buffer Objects, SSBO (GL4.3)
• Buffer Storage (GL4.4)
• Uniform Buffer Objects, UBO (GL3.1)
Classic OpenGL Model
[Diagram: the CPU issues direct drawing commands to the GPU via the command FIFO; memory holds buffer objects, texture objects, an indirect draw buffer object and the render target]
Efficient OpenGL Model
[Diagram: multiple CPU threads and the GPU share memory holding buffer objects, texture objects, several indirect draw buffer objects and the render target]
• Memory access mediated through OpenGL fences
• CPU writes memory, multi-threaded (no API)!
• GPU writes commands to memory
• GPU reads commands from memory
• No API: minimal CPU involvement
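In this model, CPU threads write draw commands straight into memory instead of calling the API. The sketch below packs DrawElementsIndirectCommand records into a byte buffer, using the layout defined in the OpenGL spec: count, instanceCount, firstIndex, baseVertex, baseInstance, five 32-bit values, 20 bytes each. Each worker thread fills its own disjoint slice, so no locking is needed.

The sketch makes several assumptions. In a real app the buffer would be a persistently mapped buffer object (GL4.4 buffer storage), consumed by the GPU through a multi-draw indirect call. The mesh tuples and the worker split are hypothetical.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# DrawElementsIndirectCommand as laid out in the OpenGL spec:
# GLuint count, instanceCount, firstIndex; GLint baseVertex; GLuint baseInstance.
CMD = struct.Struct("<IIIiI")  # 20 bytes, no padding


def write_commands(buf, meshes, first, last):
    """Worker: encode draws [first, last) into its own slice of buf.

    Slices never overlap, so workers need no locks and make no API calls.
    """
    for i in range(first, last):
        index_count, first_index, base_vertex = meshes[i]
        CMD.pack_into(buf, i * CMD.size,
                      index_count,  # count
                      1,            # instanceCount
                      first_index,  # firstIndex
                      base_vertex,  # baseVertex
                      i)            # baseInstance: lets shaders look up per-draw data


def build_indirect_buffer(meshes, workers=4):
    """Fill one command buffer for all meshes using several threads."""
    buf = bytearray(CMD.size * len(meshes))
    step = max(1, -(-len(meshes) // workers))  # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_commands, buf, meshes, s,
                               min(s + step, len(meshes)))
                   for s in range(0, len(meshes), step)]
        for f in futures:
            f.result()  # re-raise any worker exception
    return buf


# Hypothetical meshes: (index_count, first_index, base_vertex)
meshes = [(36, 0, 0), (6, 36, 24), (12, 42, 28)]
buf = build_indirect_buffer(meshes, workers=2)
print(CMD.unpack_from(buf, CMD.size))  # (6, 1, 36, 24, 1)
```

Partitioning by draw index is what lets the CPU side scale with cores. The only synchronization left is the fence that tells the GPU the buffer is ready.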
Results
• OpenGL enables scalable multi-threading with no new API
- CPU and GPU cores just write to memory
- GPU work creation: builds buffers, constructs MDI commands
• Integer multiple speedups, ~5x to ~15x (not a typo)
- In driver-limited cases, obviously
• Works TODAY on existing drivers!
- Mostly OpenGL 4.2+
- Extensions are at least EXT
• Does not require a new object model
- Does not require breaking existing applications
• http://blogs.nvidia.com/blog/2014/03/20/opengl-gdc2014/
EGL 1.5 Released
• EGL 1.5 brings functionality from multiple extensions into core
- Increased reliability and portability
• EGLImages
- Sharing textures and renderbuffers
• Context Robustness
- Defending against malicious code
• EGLSync objects
- Improved OpenGL/OpenCL interop
• Platform extensions
- Standardized interactions for multiple OS, e.g. Android and 64-bit platforms
• sRGB colorspace
• NEXT: EGLStreams into core for vision and camera interop
[Diagram: applications sit above EGL, which sits above the OS and display platforms. API interop: EGL provides efficient transfer of data and events between Khronos APIs. Application portability: EGL abstracts graphics context management, surface and rendering buffer binding, and rendering synchronization]
OpenCL as Parallel Compute Foundation
• 100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages
- Aparapi: Java language extensions for parallelism
- River Trail: language extensions to JavaScript
- PyOpenCL: Python wrapper around OpenCL
- Harlan: high-level language for GPU programming
- C++ AMP (Shevlin Park): uses Clang and LLVM
- Halide: image processing language
- WebCL: JavaScript binding to OpenCL
- Compiler directives for Fortran, C and C++
OpenCL provides vendor-optimized, cross-platform, cross-vendor access to heterogeneous compute resources (Device X, Device Y, Device Z)
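Every tool chain above ultimately launches OpenCL kernels. An OpenCL kernel is written for a single work-item and run over an index space (an NDRange). In OpenCL 1.x the global size must be a multiple of the work-group size. Hosts therefore usually pad the global size, and the kernel must bounds-check its global ID.

The sketch below emulates that execution model serially in pure Python, with no OpenCL runtime. run_ndrange and the gid argument are hypothetical stand-ins for clEnqueueNDRangeKernel and get_global_id(0). With PyOpenCL, the kernel body would instead be OpenCL C source.

```python
def round_up(global_size, local_size):
    """Pad the global size to a whole number of work-groups."""
    return -(-global_size // local_size) * local_size


def vector_add_kernel(gid, a, b, out, n):
    """Body of one work-item; stands in for an OpenCL C __kernel function."""
    if gid < n:  # padded work-items past the end of the data do nothing
        out[gid] = a[gid] + b[gid]


def run_ndrange(kernel, global_size, local_size, *args):
    """Serial emulation of a 1-D NDRange launch.

    A real device runs work-groups in parallel; the order here is arbitrary.
    Returns the padded global size actually launched.
    """
    padded = round_up(global_size, local_size)
    for group in range(padded // local_size):
        for lid in range(local_size):
            kernel(group * local_size + lid, *args)
    return padded


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
run_ndrange(vector_add_kernel, len(a), 4, a, b, out, len(a))
print(out)  # [11, 22, 33, 44, 55]
```

Five elements with a work-group size of 4 launch 8 work-items. The three extra work-items hit the `gid < n` guard and exit, which is why OpenCL kernels conventionally carry the element count as an argument.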
Widening OpenCL Ecosystem
[Diagram: OpenCL C kernel source is consumed by the OpenCL C runtime; alternative high-level languages for apps, frameworks and kernels are compiled by a SPIR generator (e.g. patched Clang) into SPIR, which an OpenCL run-time can consume; both paths target Device X, Device Y and Device Z]
• SPIR: Standard Portable Intermediate Representation
- SPIR 1.2 released January 2014
• SYCL: programming abstraction that combines the portability and efficiency of OpenCL with the ease of use and flexibility of C++
- SYCL 1.2 released here at GDC!
WebCL: Heterogeneous Computing for the Web
• WebCL 1.0 specification officially finalized today at GDC!
- https://www.khronos.org/webcl
• WebCL defines a JavaScript binding to the OpenCL APIs
- Enables initiation of kernels written in OpenCL C within the browser
• Typical use cases
- 3D asset codecs, video codecs and processing, imaging and vision processing
- Physics for WebGL games, online data visualization, augmented reality
[Diagram: JavaScript code uses the JavaScript Platform API to query, select and initialize compute devices (CPU, GPU, DSP, hardware), and the JavaScript Runtime API to build and execute OpenCL C kernels across multiple devices]
WebGL/WebCL Ecosystem
• Content: downloaded from the Web (JavaScript, HTML, CSS, ...)
• JavaScript middleware
- Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem
- Middleware can make WebGL and WebCL accessible to non-expert programmers, e.g. the three.js library (http://threejs.org/), used by the majority of WebGL content
• Browser: provides WebGL and WebCL alongside other HTML5 technologies (HTML5, JavaScript, CSS); no plug-in required
• OS-provided drivers
- WebGL uses OpenGL ES 2.0, or ANGLE to provide OpenGL ES 2.0 over DX9
- WebCL uses OpenCL 1.X
OpenVX: Power Efficient Vision Acceleration
• Acceleration API for real-time vision
- Focus on mobile and embedded systems
• Enable diverse efficient implementations
- From CPUs, through GPUs and DSPs, to dedicated hardware
• Foundational API for vision acceleration
- Can be used by middleware libraries or by applications directly
• Complementary to OpenCV
- Which is great for prototyping
• Khronos open source sample implementation
- To be released with the final specification
- Sample, not reference: the spec remains the definitive definition of OpenVX operation
[Diagram: the application uses OpenVX directly or through the OpenCV open source library and other higher-level CV libraries; OpenVX is provided by the open source sample implementation and by hardware vendor implementations]
OpenVX Graphs: The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
- Each node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
• EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs is not mandated
[Diagram: heterogeneous processing example, in which native camera control feeds an example OpenVX graph of four OpenVX nodes]
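Node fusion is why a graph beats calling operations one by one. An implementation that sees the whole graph ahead of time can merge nodes, so intermediate images never round-trip through memory.

The sketch below is a hypothetical Python model of that idea, not the OpenVX C API. The real API builds graphs with calls such as vxCreateGraph, vxVerifyGraph and vxProcessGraph. Here the Node and Graph classes are invented, and only linear chains of point-wise nodes are handled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    op: Callable[[int], int]  # point-wise operation on one pixel value


class Graph:
    """Hypothetical linear graph of point-wise nodes (not the OpenVX C API)."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes
        self.fused = None

    def verify(self):
        """Done once, before execution: fuse the whole chain into a single
        per-pixel function so intermediate images are never materialized."""
        ops = [n.op for n in self.nodes]

        def fused(px):
            for op in ops:
                px = op(px)
            return px

        self.fused = fused

    def process_unfused(self, image):
        """Memory-based style: every node reads and writes a full image.
        Returns (result, number of full-image passes)."""
        passes = 0
        for node in self.nodes:
            image = [node.op(px) for px in image]
            passes += 1
        return image, passes

    def process(self, image):
        """Graph-based style: one read and one write per pixel."""
        if self.fused is None:
            raise RuntimeError("verify() must be called before process()")
        return [self.fused(px) for px in image], 1


g = Graph([Node("scale", lambda p: p * 2),
           Node("offset", lambda p: p + 3),
           Node("clamp", lambda p: min(p, 255))])
g.verify()
print(g.process_unfused([0, 100, 200]))  # ([3, 203, 255], 3)
print(g.process([0, 100, 200]))          # ([3, 203, 255], 1)
```

Both paths give identical pixels, but the fused path makes one pass over memory instead of three. Tiling, which the slide also mentions, applies the same idea to neighborhood operations that cannot be fused per pixel.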
OpenVX 1.0 Function Overview
• Core data structures
- Images and image pyramids
- Processing graphs, kernels, parameters
• Image processing
- Arithmetic, logical, and statistical operations
- Multichannel color and bit-depth extraction and conversion
- 2D filtering and morphological operations
- Image resizing and warping
• Core computer vision
- Pyramid computation
- Integral image computation
• Feature extraction and tracking
- Histogram computation and equalization
- Canny edge detection
- Harris and FAST corner detection
- Sparse optical flow
OpenVX Specification Evolution
• OpenVX 1.0 defines a framework for creating, managing and executing graphs
• Focused set of widely used functions that are readily accelerated
• Implementers can add functions as extensions
• Widely used extensions adopted into future versions of the core
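Integral image computation appears in the core function list above. It is a good example of a readily accelerated building block: one pass builds the table, after which any rectangular sum costs four lookups.

This is a plain Python sketch of the technique itself, not OpenVX code. The function names are hypothetical.

```python
def integral_image(img):
    """Return the summed-area table ii, where ii[y][x] is the sum of
    img[j][i] over all j <= y and i <= x."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the original image over the inclusive rectangle
    [x0..x1] x [y0..y1], using at most four table lookups."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


ii = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(ii)                        # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(box_sum(ii, 1, 1, 2, 2))   # 28  (5 + 6 + 8 + 9)
```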
OpenVX and OpenCV are Complementary
• Governance
- OpenCV: community-driven open source with no formal specification
- OpenVX: formal specification defined and implemented by hardware vendors
• Conformance
- OpenCV: no conformance tests for consistency, and every vendor implements a different subset
- OpenVX: full conformance test suite and process creates a reliable acceleration platform
• Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: hardware abstracted for portability
• Scope
- OpenCV: very wide, with 1000s of imaging and vision functions and multiple camera APIs/interfaces
- OpenVX: tight focus on hardware-accelerated functions for mobile vision; uses an external camera API
• Efficiency
- OpenCV: memory-based architecture, where each operation reads and writes memory
- OpenVX: graph-based execution, with optimizable computation and data transfer
• Use case
- OpenCV: rapid experimentation
- OpenVX: production development & deployment
Vision Developers: API Choice
• GPU compute shaders
- Flexible GLSL compute and imaging
- Easy to integrate into existing graphics apps
- Can mean fewer APIs and contexts for the app to manage
- Limited to a single GPU
• Out-of-the-box vision framework (OpenVX)
- Vision operators and graph framework library
- Can run on dedicated hardware, with no compiler required
- Suited for low-power, always-on acceleration
- Easier performance portability to diverse hardware
• General-purpose heterogeneous programming framework (OpenCL)
- Flexible, low-level access to any device with an OpenCL compiler
- Single run-time framework for CPUs, GPUs, DSPs and hardware
- Needs a full compiler stack and IEEE precision
- Can be used to code new OpenVX nodes
Need for Camera Control API
• Advanced control of ISP and camera subsystem
- Generate sophisticated image streams for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
- Portable access to growing sensor diversity, e.g. depth sensors and sensor arrays
- Cross-sensor synchronization, e.g. synchronization of camera and MEMS sensors
- Advanced, high-frequency per-frame burst control of camera/sensor, e.g. ROI
- Multiple input and output re-circulating streams with RAW, Bayer or YUV processing
Scope of Camera Control API
• 3A: Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
• Lens, sensor and aperture control
[Diagram: lens, flash, focus and aperture feed the sensor and color filter array; Bayer data passes through pre-processing into the Image Signal Processor (ISP), whose RGB/YUV output goes through post-processing to image/vision applications]
Camera API Architecture will be FCAM-based
• No global state
- State travels with image requests
- Every stage in the pipeline may have different state
- Enables fast, deterministic state changes
• Synchronize devices
- Lens, flash, sound capture, gyro…
- Devices can schedule Actions
- E.g. to be triggered on exposure change
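"No global state" means that settings such as exposure are attached to each capture request, not stored as sticky sensor registers. A burst can then change state on every frame and still know exactly which settings produced which frame.

The toy model below is loosely inspired by the request/frame flow described on the slide. The Request, Frame and Sensor classes are hypothetical; this is not the FCam API or the Camera Control API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical capture request: all sensor state travels with it."""
    exposure_us: int
    gain: float


@dataclass
class Frame:
    index: int
    exposure_us: int  # copied from the request that produced this frame
    gain: float


class Sensor:
    """Toy sensor with no global settings: each frame uses exactly the
    state carried by its own request, in submission order."""

    def __init__(self):
        self.pending: List[Request] = []
        self.frame_count = 0

    def capture(self, request: Request):
        self.pending.append(request)

    def get_frame(self) -> Frame:
        req = self.pending.pop(0)
        frame = Frame(self.frame_count, req.exposure_us, req.gain)
        self.frame_count += 1
        return frame


# Hypothetical HDR-style burst: alternate short and long exposures per frame.
sensor = Sensor()
for r in [Request(1000, 1.0), Request(8000, 1.0), Request(1000, 2.0)]:
    sensor.capture(r)
print([sensor.get_frame().exposure_us for _ in range(3)])  # [1000, 8000, 1000]
```

Because each frame carries its own settings, downstream stages never have to guess whether a state change has taken effect yet. This is the "fast, deterministic state changes" property listed above.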
Low-level Sensor Abstraction API
• Advanced sensors everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
• Apps need sophisticated access to sensor data without coding to specific sensor hardware
- Sensor discoverability
- Sensor code portability
• Apps request semantic sensor information
- StreamInput defines possible requests, e.g. read physical or virtual sensors such as "Game Quaternion"
- Context detection, e.g. "Am I in an elevator?"
• StreamInput processing graph provides an optimized sensor data stream
- High-value, smart sensor fusion middleware can connect to apps in a portable way
- Apps can gain 'magical' situational awareness
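Virtual sensors such as an orientation estimate come from fusing several physical sensors. The slide does not say which fusion algorithms StreamInput middleware uses. The sketch below swaps in one widely used, simple technique, a single-axis complementary filter. The function names and the default alpha are hypothetical.

A gyroscope gives a smooth angle when its rate is integrated, but that angle drifts. An accelerometer gives a noisy but absolute tilt from gravity. The filter blends the two.

```python
import math


def accel_tilt_deg(ay, az):
    """Tilt angle implied by the measured gravity vector (noisy, no drift)."""
    return math.degrees(math.atan2(ay, az))


def complementary_step(angle_deg, gyro_dps, accel_deg, dt, alpha=0.98):
    """One filter update.

    angle_deg: previous fused estimate (degrees)
    gyro_dps:  gyro rate for this step (degrees per second)
    accel_deg: accelerometer tilt for this step (degrees)
    alpha:     weight on the integrated gyro path; (1 - alpha) pulls the
               estimate toward the accelerometer to cancel gyro drift
    """
    return alpha * (angle_deg + gyro_dps * dt) + (1.0 - alpha) * accel_deg


# Device held still at 10 degrees: the estimate converges to the accel angle.
angle = 0.0
for _ in range(500):
    angle = complementary_step(angle, 0.0, 10.0, 0.01)
print(round(angle, 2))  # 10.0
```

Real fusion middleware typically tracks full 3-D orientation (for example as a quaternion) rather than one angle. The trade-off is the same, though: trust the gyro over short time scales and an absolute reference over long ones.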
Khronos APIs for Augmented Reality
AR needs not just advanced sensor processing, vision acceleration, computation and rendering, but also for all these subsystems to work efficiently together
[Diagram: MEMS sensors feed sensor fusion; advanced camera control and stream generation (Camera Control API) feed vision processing; the application runs on CPUs, GPUs and DSPs; precision timestamps on all sensor samples; EGLStream carries stream data between APIs; 3D rendering and video composition on the GPU; audio rendering]
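Precision timestamps on all sensor samples let an AR app line up motion data with the exact camera frame being processed. Sensors and cameras run at different rates, so the app usually interpolates.

This is a minimal sketch under two assumptions. The gyro stream and the frame timestamp share one clock and units, and linear interpolation is accurate enough. The function name and the sample values are hypothetical.

```python
from bisect import bisect_left


def sample_at(timestamps, values, t):
    """Linearly interpolate a sensor stream at time t.

    timestamps must be sorted ascending; t outside the recorded range is
    clamped to the first or last sample.
    """
    if not timestamps:
        raise ValueError("empty sensor stream")
    i = bisect_left(timestamps, t)  # first index with timestamps[i] >= t
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    t0, t1 = timestamps[i - 1], timestamps[i]  # t0 < t <= t1, so t1 > t0
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical gyro samples (time in ms, rate in deg/s) and a frame at t = 15 ms.
gyro_t = [0, 10, 20]
gyro_v = [0.0, 100.0, 50.0]
print(sample_at(gyro_t, gyro_v, 15))  # 75.0
```

Without shared, precise timestamps this alignment would have to be guessed from arrival order. That is why the slide calls out timestamps on every sample.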
NVIDIA Use of Khronos Standards
• Shipping
- OpenGL 4.4
- OpenGL ES 3.X
- WebGL 1.0
- OpenCL 1.X
- EGL 1.4
• Implementing
- OpenVX 1.0 Provisional (foundational framework for VisionWorks)
• Participating
- Camera Working Group (Camera Control API)
- StreamInput