
Open API Standards for Mobile Graphics, Compute and Vision Processing
GTC, March 2014
Neil Trevett
Vice President Mobile Ecosystem, President Khronos

© Copyright 2014 - Page 1

Khronos Connects Software to Silicon

Open Consortium creating ROYALTY-FREE, open standard APIs for hardware acceleration

Defining the roadmap for low-level silicon interfaces needed on every platform

Graphics, compute, rich media, vision, sensor and camera processing

Rigorous specifications AND conformance tests for cross-vendor portability

Acceleration APIs BY the Industry FOR the Industry

Well over a BILLION people use Khronos APIs Every Day…

Khronos Standards

Over 100 companies defining royalty-free APIs to connect software to silicon

3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format with compression

Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing

Camera Control API

Acceleration in HTML5
- 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript

Sensor Processing
- Vision Acceleration
- Camera Control
- Sensor Fusion

The OpenGL Family

OpenGL 4.4 is the industry’s most advanced 3D API
- Cross platform – Windows, Linux, Mac, Android
- Foundation for productivity apps
- Target for AAA engines and games

OpenGL ES: the most pervasively available 3D API – 1.6 billion devices and counting
- Almost every mobile and embedded device, including Android and iOS
- Bringing proven desktop functionality to mobile

WebGL: JavaScript binding to OpenGL ES
- Enabling the Web with GPU access
- Almost pervasive availability on mobile and desktop browsers
- Truly portable 3D apps with HTML5

OpenGL ES 3.1 Launched at GDC!
• Headline features
  - Compute Shaders and Draw-Indirect
  - Compute shaders can create geometry or other rendering data
• Expecting rapid adoption
  - Driver upgrade for many SoCs
  - Backward compatible with 2.0/3.0 so apps can incrementally adopt features
• Enabling desktop OpenGL to be used for mobile development
  - ARB_ES3_1_compatibility specification to support “OpenGL ES 3.1 context”

OpenGL ES timeline
- 2002: Working Group formed
- 2003: OpenGL ES 1.0
- 2004: OpenGL ES 1.1 (driver update)
- 2007: OpenGL ES 2.0 (silicon update)
- 2012: OpenGL ES 3.0 (silicon update)
- 2014: OpenGL ES 3.1 (driver update)

OpenGL Fallacy: Old and Inefficient

Ancient crufty stuff:
- Immediate Mode
- Display Lists
- Fixed Function
- Evaluators
- Feedback
- Selectors
- Selection

OpenGL Reality: Modern & Efficient

- Bindless (ARB)
- Multi-Draw Indirect (GL 4.3)
- Texture Arrays (GL 3.0)
- Buffer Storage (GL 4.4)
- SSBO (GL 4.3)
- UBO (GL 3.1)

Classic OpenGL Model

[Diagram: the CPU sends direct drawing commands (cmd, cmd, cmd, ...) to the GPU via the command FIFO; memory holds buffer objects, texture objects, an indirect draw buffer object and the render target.]

Efficient OpenGL Model

[Diagram: several CPU cores and the GPU share memory holding buffer objects, texture objects, indirect draw buffer objects and the render target; memory access is mediated through OpenGL fences.]
- CPU writes memory: multi-threaded (no API)!
- GPU writes commands to memory
- GPU reads commands from memory
- No API – minimal CPU involvement

Results
• OpenGL enables scalable multi-threading with no new API
  - CPU and GPU cores just write to memory
  - GPU work creation: builds buffers, constructs MDI commands
• Integer multiple speedups, ~5x to ~15x (not a typo)
  - On driver-limited cases, obviously
• Works TODAY on existing drivers!
  - Mostly OpenGL 4.2+
  - Extensions are at least EXT
• Does not require a new object model
  - Does not require breaking existing applications
• http://blogs.nvidia.com/blog/2014/03/20/opengl-gdc2014/
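The "just write to memory" idea behind multi-draw indirect (MDI) can be illustrated without any graphics API. The sketch below packs draw records in the five-field, 20-byte layout that glMultiDrawElementsIndirect reads from an indirect draw buffer. The helper name `pack_draw_commands`, the sample scene, the little-endian byte order and the use of `baseInstance` as a per-draw ID are all assumptions made for illustration; they are not taken from the talk.

```python
import struct

# One indexed indirect draw record: five tightly packed 32-bit fields.
#   count, instanceCount, firstIndex (unsigned), baseVertex (signed),
#   baseInstance (unsigned). Little-endian is assumed here.
DRAW_ELEMENTS_INDIRECT = struct.Struct("<IIIiI")


def pack_draw_commands(meshes):
    """Pack (index_count, first_index, base_vertex) tuples into one buffer.

    Hypothetical helper: each mesh becomes one record, and base_instance
    carries the draw number so a shader could look up per-draw data.
    """
    buf = bytearray()
    for draw_id, (index_count, first_index, base_vertex) in enumerate(meshes):
        buf += DRAW_ELEMENTS_INDIRECT.pack(
            index_count,  # count: number of indices to draw
            1,            # instanceCount: one instance per draw
            first_index,  # firstIndex: offset into the shared index buffer
            base_vertex,  # baseVertex: offset into the shared vertex buffer
            draw_id,      # baseInstance: assumed per-draw ID convention
        )
    return bytes(buf)


# Hypothetical scene: three meshes sharing one vertex and one index buffer.
scene = [(36, 0, 0), (6, 36, 24), (300, 42, 28)]
commands = pack_draw_commands(scene)
draw_count = len(commands) // DRAW_ELEMENTS_INDIRECT.size
```

In a real application these bytes would be written into a mapped buffer object, by any CPU thread or by the GPU itself, and a single multi-draw call would then consume all `draw_count` records.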

EGL 1.5 Released
• EGL 1.5 brings functionality from multiple extensions into core
  - Increased reliability and portability
• EGLImages
  - Sharing textures and renderbuffers
• Context Robustness
  - Defending against malicious code
• EGLSync objects
  - Improved OpenGL/OpenCL interop
• Platform extensions
  - Standardized interactions for multiple OS, e.g. Android and 64-bit platforms
• sRGB colorspace rendering
• NEXT – EGLStreams into core for vision and camera interop

[Diagram: Applications sit above EGL, which sits above OS and Display Platforms.]
- API Interop: EGL provides efficient transfer of data and events between Khronos APIs
- Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization

OpenCL as Parallel Compute Foundation
• 100+ tool chains and languages leveraging OpenCL
  - Heterogeneous solutions emerging for the most popular programming languages

- WebCL: JavaScript binding to OpenCL
- River Trail: language extensions to JavaScript
- C++ AMP (Shevlin Park): uses Clang and LLVM
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL
- Harlan: high-level language for GPU programming
- Halide: image processing language

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Device X Device Y Device Z

Widening OpenCL Ecosystem

[Diagram: OpenCL C kernel source and alternative high-level languages for kernels, used by apps and frameworks, pass through a SPIR generator (e.g. patched Clang); the OpenCL run-time can consume SPIR as well as OpenCL C.]

SPIR
- Standard Portable Intermediate Representation
- SPIR 1.2 released January 2014

SYCL
- Programming abstraction that combines portability and efficiency of OpenCL with ease of use and flexibility of C++
- SPIR 1.2 Released here at GDC!

Device X Device Y Device Z

WebCL: Heterogeneous Computing for the Web
• WebCL 1.0 specification officially finalized today at GDC!
  - https://www.khronos.org/webcl
• WebCL defines JavaScript binding to the OpenCL APIs
  - Enables initiation of kernels written in OpenCL C within the browser
• Typical use cases
  - 3D asset codecs, video codecs and processing, imaging and vision processing
  - Physics for WebGL games, online data visualization, Augmented Reality

[Diagram: OpenCL C kernel code runs on CPU, GPU, DSP and other hardware through two JavaScript APIs.]
- JavaScript Platform API: to query, select and initialize compute devices
- JavaScript Runtime API: to build and execute kernels across multiple devices

WebGL/WebCL Ecosystem

Content
- Content downloaded from the Web: JavaScript, HTML, CSS, ...

JavaScript Middleware
- Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem
- Middleware can make WebGL and WebCL accessible to non-expert programmers
- E.g. three.js library (http://threejs.org/), used by the majority of WebGL content

Browser (HTML5, JavaScript, CSS)
- Browser provides WebGL and WebCL alongside other HTML5 technologies
- No plug-in required

OS Provided Drivers
- WebGL uses OpenGL ES 2.0, or ANGLE for OpenGL ES 2.0 over DX9
- WebCL uses OpenCL 1.X

OpenVX – Power Efficient Vision Acceleration
• Acceleration API for real-time vision
  - Focus on mobile and embedded systems
• Enable diverse efficient implementations
  - From CPUs, through GPUs and DSPs to dedicated hardware
• Foundational API for vision acceleration
  - Can be used by middleware libraries or by applications directly
• Complementary to OpenCV
  - Which is great for prototyping
• Khronos open source sample implementation
  - To be released with final specification
  - Sample, not reference: the spec remains the definitive definition of OpenVX operation

[Diagram: layered stack. Application; OpenCV open source library and other higher-level CV libraries; open source sample implementation and hardware vendor implementations.]

OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
  - Each Node can be implemented in software or accelerated hardware
  - Nodes may be fused by the implementation to eliminate memory transfers
  - Processing can be tiled to keep data entirely in local memory/cache
• EGLStreams can provide data and event interop with other Khronos APIs
  - BUT use of other Khronos APIs is not mandated

[Diagram: example OpenVX graph. Native camera control feeds a set of connected OpenVX nodes, followed by heterogeneous processing.]
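Node fusion and tiling, as described on the OpenVX Graphs slide, can be sketched with a toy model. This is plain Python, not the OpenVX API: the functions `run_unfused` and `run_fused_tiled`, the three per-pixel nodes and the tile size are hypothetical, and a flat list of pixel values stands in for an image.

```python
from typing import Callable, List

Pixel = int
PointOp = Callable[[Pixel], Pixel]


def run_unfused(image: List[Pixel], ops: List[PointOp]) -> List[Pixel]:
    """Run each node as its own pass, writing a full intermediate image."""
    current = image
    for op in ops:
        current = [op(p) for p in current]  # whole image read and written
    return current


def run_fused_tiled(image: List[Pixel], ops: List[PointOp],
                    tile: int = 4) -> List[Pixel]:
    """Run the whole chain one tile at a time; intermediates stay in the tile."""
    out: List[Pixel] = []
    for start in range(0, len(image), tile):
        block = image[start:start + tile]  # stands in for local memory/cache
        for op in ops:
            block = [op(p) for p in block]
        out.extend(block)
    return out


# Hypothetical three-node graph of per-pixel operations.
def brighten(p: Pixel) -> Pixel:
    return min(p + 40, 255)


def invert(p: Pixel) -> Pixel:
    return 255 - p


def threshold(p: Pixel) -> Pixel:
    return 255 if p >= 128 else 0


graph = [brighten, invert, threshold]
image = [0, 50, 100, 150, 200, 250, 30, 90, 220]
same_result = run_unfused(image, graph) == run_fused_tiled(image, graph)
```

Both schedules produce the same pixels; only the memory traffic differs. The sketch works this simply because every node is a per-pixel operation. Filters that read neighbouring pixels would need overlapping tile borders, which is the kind of detail a graph API leaves to the implementation.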

OpenVX 1.0 Function Overview
• Core data structures
  - Images and Image Pyramids
  - Processing Graphs, Kernels, Parameters
• Image Processing
  - Arithmetic, Logical, and statistical operations
  - Multichannel Color and BitDepth Extraction and Conversion
  - 2D Filtering and Morphological operations
  - Image Resizing and Warping
• Core Computer Vision
  - Pyramid computation
  - Integral Image computation
• Feature Extraction and Tracking
  - Histogram Computation and Equalization
  - Canny Edge Detection
  - Harris and FAST Corner detection
  - Sparse Optical Flow

OpenVX Specification Evolution
- OpenVX 1.0 defines framework for creating, managing and executing graphs
- Focused set of widely used functions that are readily accelerated
- Implementers can add functions as extensions
- Widely used extensions adopted into future versions of the core

OpenVX and OpenCV are Complementary

Governance
- OpenCV: Community driven open source with no formal specification
- OpenVX: Formal specification defined and implemented by hardware vendors

Conformance
- OpenCV: No conformance tests for consistency and every vendor implements different subset
- OpenVX: Full conformance test suite / process creates a reliable acceleration platform

Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability

Scope
- OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API

Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory
- OpenVX: Graph-based execution; optimizable computation, data transfer

Use Case
- OpenCV: Rapid experimentation
- OpenVX: Production development & deployment

Vision Developers – API Choice

GPU Compute Shaders
- Flexible GLSL compute and imaging
- Easy to integrate into existing graphics apps
- Can mean fewer APIs and contexts for app to manage
- Limited to single GPU

Out of the Box Vision Framework
- Vision operators and graph framework library
- Can run on dedicated hardware – no compiler
- Suited for low-power, always-on acceleration
- Easier performance portability to diverse hardware

General Purpose Heterogeneous Programming Framework
- Flexible, low-level access to any devices with OpenCL compiler
- Single run-time framework for CPUs, GPUs, DSPs, hardware
- Needs full compiler stack and IEEE precision
- Can be used to code new OpenVX nodes

Need for Camera Control API
• Advanced control of ISP and camera subsystem
  - Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
  - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
  - Cross sensor synch: e.g. synch of camera and MEMS sensors
  - Advanced, high-frequency per-burst control of camera/sensor: e.g. ROI
  - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing

Scope of Camera Control API

- 3A: Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
- Lens, sensor, aperture control

[Diagram: camera pipeline from sensor, color filter array, lens, flash, focus and aperture, through Bayer pre-processing, the Image Signal Processor (ISP) and RGB/YUV post-processing, to image/vision applications.]

Camera API Architecture will be FCAM-based
• No global state
  - State travels with image requests
  - Every stage in the pipeline may have different state
  - Enables fast, deterministic state changes
• Synchronize devices
  - Lens, flash, sound capture, gyro…
  - Devices can schedule Actions
  - E.g. to be triggered on exposure change
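The "state travels with image requests" principle can be sketched as a small model. The class names `Shot`, `Frame` and `Sensor` loosely echo FCam concepts, but everything below is a hypothetical Python illustration, not the FCam API and not the Khronos camera API; the brightness formula is only a stand-in for real imaging.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """A capture request that carries its own complete sensor state."""
    exposure_us: int
    gain: float


@dataclass(frozen=True)
class Frame:
    """A result tagged with the exact request that produced it."""
    shot: Shot
    brightness: float


class Sensor:
    """Toy pipelined sensor: no global settings, only a queue of requests."""

    def __init__(self) -> None:
        self._pending = deque()

    def capture(self, shot: Shot) -> None:
        """Queue a request; nothing about the sensor itself is reconfigured."""
        self._pending.append(shot)

    def get_frame(self) -> Frame:
        """Return the oldest queued request's frame, with its state attached."""
        shot = self._pending.popleft()
        # Stand-in for imaging: brightness scales with exposure and gain.
        return Frame(shot=shot, brightness=shot.exposure_us * shot.gain / 1000.0)


sensor = Sensor()
# Back-to-back requests with different settings, e.g. for an HDR-style burst.
burst = [Shot(1000, 1.0), Shot(8000, 1.0), Shot(1000, 2.0)]
for shot in burst:
    sensor.capture(shot)
frames = [sensor.get_frame() for _ in burst]
```

Because every frame comes back with the request that produced it, consecutive frames can use different settings without the application guessing when a global setting took effect.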

Low-level Sensor Abstraction API

Apps request semantic sensor information
- StreamInput defines possible requests, e.g.
  - Read Physical or Virtual Sensors, e.g. “Game Quaternion”
  - Context detection, e.g. “Am I in an elevator?”

Apps Need Sophisticated Access to Sensor Data
- Without coding to specific sensor hardware

Advanced Sensors Everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors

Sensor Discoverability

Sensor Code Portability

StreamInput processing graph provides optimized sensor data stream
- High-value, smart sensor fusion middleware can connect to apps in a portable way
- Apps can gain ‘magical’ situational awareness

Khronos APIs for Augmented Reality

AR needs not just advanced sensor processing, vision acceleration, computation and rendering, but also needs all these subsystems to work efficiently together

[Diagram: AR system components]
- MEMS Sensors and Sensor Fusion: precision timestamps on all sensor samples
- Camera Control API: advanced camera control and stream generation
- Vision Processing
- Application on CPUs, GPUs and DSPs
- EGLStream: stream data between APIs
- 3D Rendering and Video Composition on GPU
- Audio Rendering

NVIDIA Use of Khronos Standards
• Shipping
  - OpenGL 4.4
  - OpenGL ES 3.X
  - WebGL 1.0
  - OpenCL 1.X
  - EGL 1.4
• Implementing
  - OpenVX 1.0 Provisional (foundational framework for VisionWorks)
• Participating
  - Camera Working Group (Camera Control API)
  - StreamInput
