Khronos Overview
The State of the Art in Open Standards for Visual Computing
Neil Trevett
Khronos President
Vice President Mobile Content, NVIDIA

© Copyright Khronos Group 2013 - Page 1

Khronos Connects Software to Silicon

ROYALTY-FREE, for advanced hardware acceleration

Low level silicon to software interfaces needed on every platform

Graphics, video, audio, compute, vision, sensor and camera processing

Defines the forward looking roadmap for the silicon community

Shipping on billions of devices across multiple operating systems

Rigorous conformance tests for cross-vendor consistency

Khronos is OPEN for any company to join and participate

Acceleration APIs BY the Industry FOR the Industry

Making a Difference – One API at a Time

Well over 1 BILLION people are using what the Khronos members have created together - Every Day…

Khronos Standards

Over 100 companies defining royalty-free APIs to connect software to silicon

3D Asset Handling
- Advanced Authoring pipelines
- 3D Asset Transmission Format with streaming and compression

Visual Computing
- Object and Terrain Visualization
- Advanced scene construction

Acceleration in the Browser
- WebGL for 3D in browsers
- WebCL – Heterogeneous Computing for the web

Sensor Processing
- Mobile Vision Acceleration
- On-device Sensor Fusion

Callouts:
- glTF cooperation with MPEG for 3D Asset Compression!
- OpenCL 2.0 Finalized!
- OpenVX 1.0 Provisional Released!
- WebGL and WebCL Momentum!
- Camera Control API

OpenCL Milestones
• 24 month cadence for major OpenCL 2.0 update
- Slightly longer than 18 month cadence between versions of OpenCL 1.X
• Significant feedback from the developer community on Provisional Specification
- Many suggestions were incorporated into the final 2.0 specification
- Other feedback will be considered for future specification versions

Timeline:
- Dec08: OpenCL 1.0 released. Conformance tests released Dec08
- Jun10: OpenCL 1.1 Specification and conformance tests released
- Nov11: OpenCL 1.2 Specification and conformance tests released
- Jul13: OpenCL 2.0 Provisional Specification released for public review
- Nov13: OpenCL 2.0 Specification finalized and conformance tests released
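The cadence figures quoted for the OpenCL milestones can be checked with a little date arithmetic. The sketch below assumes the timeline reads Dec08 (1.0), Jun10 (1.1), Nov11 (1.2) and Nov13 (2.0 final); the day of month is a placeholder, and the names `RELEASES`, `months_between` and `cadence` are illustrative only.

```python
from datetime import date

# Release months as read from the milestone timeline. Only month and year
# are given on the slide, so the day is fixed to 1 as a placeholder.
RELEASES = [
    ("OpenCL 1.0", date(2008, 12, 1)),
    ("OpenCL 1.1", date(2010, 6, 1)),
    ("OpenCL 1.2", date(2011, 11, 1)),
    ("OpenCL 2.0", date(2013, 11, 1)),
]


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def cadence(releases):
    """Return (previous, next, months) for each consecutive pair of releases."""
    return [
        (prev_name, next_name, months_between(prev_date, next_date))
        for (prev_name, prev_date), (next_name, next_date)
        in zip(releases, releases[1:])
    ]


if __name__ == "__main__":
    for prev_name, next_name, months in cadence(RELEASES):
        print(f"{prev_name} -> {next_name}: {months} months")
```

Under these assumptions the 1.X gaps come out at 18 and 17 months and the gap to 2.0 at 24 months, which is consistent with the "24 month" versus "18 month" wording above.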

Key OpenCL 2.0 Features
• Shared Virtual Memory
- Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices
• Nested Parallelism
- Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks
• Generic Address Space
- Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application

Broad OpenCL Implementer Adoption
• Multiple conformant implementations shipping on desktop and mobile
- For CPUs and GPUs on multiple OS
• Android ICD extension released in latest extension specification
- OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
- ARM, Imagination, Vivante, Qualcomm, Samsung …

OpenCL as Parallel Compute Foundation
• 100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages

- C++ AMP: C++ syntax/compiler extensions
- OpenCL HLM
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
- Aparapi: Java language extensions for parallelism
- River Trail: Language extensions to JavaScript
- PyOpenCL: Python wrapper around OpenCL
- Harlan: High level language for GPU programming
- Shevlin Park: Uses Clang and LLVM
- Compiler directives for Fortran, C and C++

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Widespread Developers Leveraging OpenCL

• Broad uptake of OpenCL in commercial applications
- For desktop and increasingly mobile apps
• “OpenCL” on SourceForge, GitHub, Google Code, BitBucket finds over 2,000 projects
- x264
- Handbrake
- FFMPEG
- JPEG
- VLC
- OpenCV
- GIMP
- ImageMagick
- IrfanView
- Hadoop, Memcached
- Aparapi – A parallel API (for Java)
- Bolt – a Unified Heterogeneous Library
- Sumatra – next generation of compute enabled Java
- WinZip
- Crypto++
- Bullet physics library
- Etc. Etc.

OpenCL Academic Traction
• OpenCL at over 100 Universities Worldwide
- Teaching multi-faceted programming courses
- Research with top-tier Universities globally
• Complete University Kits available
- Presentation w/instructor & speaker notes
- Example code, & sample application
• Growing textbook ecosystem
- US, Japan, Europe, China and India
• Number of papers referencing OpenCL on Google Scholar is growing rapidly
- Over 2000 papers in 2012
• Commercial OpenCL training courses
- http://www.accelereyes.com/services/training

http://developer.amd.com/Resources/library/Pages/default.aspx

Leveraging Proven Native APIs into HTML5
• Khronos and W3C liaison
- Leverage proven native API investments into the Web
- Fast API development and deployment
- Designed by the hardware community
- Familiar foundation reduces developer learning curve

[Diagram: HTML, JavaScript and Canvas alongside native APIs and possible Web counterparts]
- WebCAM(!): Camera control and video processing
- WebVX?: Vision Processing
- WebStream?: Sensor Fusion
- Native: Camera Control, Path Rendering

Legend:
- JavaScript API shipping, acceleration being developed or work underway
- Possible future JavaScript APIs or acceleration
- Native APIs shipping or Khronos working group

Mobile Web is a Real Time Application

- Apple iPhone: 320x480 = 153K Pixels, 163 DPI
- Apple iPad: 1024x768 = 786K Pixels, 132 DPI
- Apple iPad Mini: 2048x1536 = 3100K Pixels, 326 DPI

- Buttery smooth touch interaction needs continuous 60Hz updates
- In 5 years the number of pixels to process on mobile screens has gone up by factor of TWENTY
- Need GPU Acceleration for everything Web!
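The "factor of TWENTY" figure follows from the screen resolutions listed on this slide. The sketch below assumes the three resolutions belong to the iPhone, iPad and iPad Mini in that order (the pairing is inferred from the slide layout), and the helper names are illustrative.

```python
# Resolutions as listed on the slide; the device pairing is an assumption
# inferred from the order of the labels.
SCREENS = {
    "Apple iPhone": (320, 480),
    "Apple iPad": (1024, 768),
    "Apple iPad Mini": (2048, 1536),
}


def pixel_count(width, height):
    """Total pixels on a screen of the given resolution."""
    return width * height


def growth_factor(small, large):
    """How many times more pixels `large` has than `small`."""
    return pixel_count(*large) / pixel_count(*small)


def pixels_per_second(resolution, refresh_hz=60):
    """Pixels that must be produced each second for continuous updates."""
    return pixel_count(*resolution) * refresh_hz


if __name__ == "__main__":
    for name, resolution in SCREENS.items():
        print(f"{name}: {pixel_count(*resolution)} pixels, "
              f"{pixels_per_second(resolution)} pixels/s at 60Hz")
    factor = growth_factor(SCREENS["Apple iPhone"], SCREENS["Apple iPad Mini"])
    print(f"Growth factor: {factor:.2f}")
```

With these inputs the largest screen has roughly 20.5 times the pixels of the smallest, and at 60Hz it needs close to 190 million pixels per second, which is the load behind the call for GPU acceleration.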

WebGL Availability in Browsers

Much WebGL content uses three.js library: http://threejs.org/

- Microsoft – “where you have IE11, you have WebGL – turned on by default and working all the time”
- Microsoft - WebGL also enabled for Windows applications - web app framework and web view
- Apple - WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
- Chrome OS - WebGL is the only cross-platform API to program the GPU
- Google IO announcement - Chrome on Android will soon launch with WebGL

Microsoft PhotoSynth2
• Demonstrated at Build 2013

http://channel9.msdn.com/Events/Build/2013/4-072 1:50

Cross-OS Portability

- HTML/CSS (on every platform): HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems
- SDK: Dalvik (Java), Objective C, C#: Preferred development environments not designed for portability
- C/C++, DirectX: Native code is portable - but apps must cope with different available APIs and libraries

OpenGL 3D API Family Tree

[Figure: family tree on a 2002-2013 timeline]
- Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
- Mobile 3D: OpenGL ES 1.0, OpenGL ES 1.1 (Fixed function 3D Pipeline), OpenGL ES 2.0 (Programmable vertex and fragment shaders), OpenGL ES 3.0, ES-Next
- Content: OpenGL ES 1.1 Content, OpenGL ES 2.0 Content, OpenGL ES 3.0 Content
- Web: WebGL 1.0, WebGL 2.0

- ES3 is backward compatible so new features can be added incrementally
- OpenGL 4.4 is a superset of DX11
- WebGL 2.0 is in development now - will bring OpenGL ES 3.0 functionality to the Web
  http://www.khronos.org/webgl/public-mailing-list/
  http://www.khronos.org/registry/webgl/specs/latest/
  http://www.khronos.org/webgl/wiki/Testing/Conformance

OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the programmer
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!

3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
- Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
- 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
- Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
- High-performance and low power – but pragmatic adoption strategy is key

- Audio: MP3
- Video: H.264
- Images: JPEG
- 3D: ?

An effective and widely adopted codec ignites previously unimagined opportunities for a media type

glTF – OpenGL Transmission Format
• Binary file format for efficient transmission of 3D assets
- Reduce network bandwidth and minimize client processing overhead
• Run-time neutral - DOES NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
- Can be used by any app or run-time – usually WebGL accelerated
• Scalable to handle compression and streaming
- Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
- Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
- Prototyping and optimizing efficient handling of COLLADA assets

A standards-based content pipeline, from Authoring to Playback, for rich native and Web 3D applications

COLLADA and glTF Open Source Ecosystem

- OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests, on GitHub
  https://github.com/KhronosGroup/OpenCOLLADA
  https://github.com/KhronosGroup/COLLADA-CTS
- COLLADA2GLTF Translator
  https://github.com/KhronosGroup/glTF
- Tool Interop
- Other authoring formats
- Web-based Tools
- Pervasive WebGL deployment
- Three.js glTF Importer
- Rest3D initiative

WebGL as Test-bed for 3D Asset Compression
• Integrating and benchmarking 3D geometry compression formats with glTF
- Baseline is GZIP
• Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
- Royalty-free graphics compression technology from MPEG (MIT License)
- Open3DGC is an efficient JavaScript and C/C++ implementation
- Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
- https://github.com/amd/rest3d/tree/master/server/o3dgc
• WebGL-loader is Google's lightweight compression for WebGL content
• OpenCTM uses LZMA compression

Initial Compression Results
• Compression Efficiency
- Gzip (default level=6)
- OpenCTM (default settings)
- Open3DGC and Webgl-loader
  - Positions on 14 bits
  - Normals and texCoords on 10 bits

[Chart: Size (MBytes), axis 0 to 400, for Gzip, OpenCTM, Webgl-loader + Gzip, Open3DGC-ASCII + Gzip and Open3DGC-Binary on three datasets: CAD (3748 models), 3D Scanned (78 models), MPEG dataset (1211 models)]

Open3DGC is 5x-9x more efficient than Gzip, 1.3x-2.4x more efficient than OpenCTM and 1.2x-1.5x more efficient than webgl-loader
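Two settings named in these results, quantizing positions to 14 bits and Gzip at its default level 6, can be illustrated with a small sketch. It is not Open3DGC, OpenCTM or webgl-loader: it only shows uniform quantization followed by gzip on synthetic random positions, and every function name below is hypothetical.

```python
import gzip
import random
import struct


def quantize(values, bits):
    """Map floats onto integer codes in [0, 2**bits - 1] over their min/max range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = levels / (hi - lo) if hi > lo else 0.0
    codes = [round((v - lo) * scale) for v in values]
    return codes, lo, hi


def dequantize(codes, lo, hi, bits):
    """Recover approximate floats from integer codes."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


def gzip_size(payload, level=6):
    """Size in bytes of `payload` after gzip at the given compression level."""
    return len(gzip.compress(payload, compresslevel=level))


def demo(count=3000, seed=0):
    """Compare gzip sizes of float32 positions and their 14-bit codes."""
    rng = random.Random(seed)
    positions = [rng.uniform(-1.0, 1.0) for _ in range(count)]  # synthetic data
    codes, lo, hi = quantize(positions, bits=14)
    raw = struct.pack(f"<{count}f", *positions)   # 4 bytes per coordinate
    packed = struct.pack(f"<{count}H", *codes)    # 14-bit codes in 16-bit slots
    return gzip_size(raw), gzip_size(packed)


if __name__ == "__main__":
    raw_size, packed_size = demo()
    print(f"gzip(float32): {raw_size} bytes, gzip(14-bit codes): {packed_size} bytes")
```

The sizes depend entirely on the input data, so this does not reproduce the ratios reported above; real mesh codecs also exploit connectivity and prediction, which this sketch leaves out.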

OpenVX – Power Efficient Vision Processing
• Acceleration API for real-time vision
- Focus on mobile and embedded systems
• Diversity of efficient implementations
- From programmable processors, through GPUs to dedicated hardware pipelines
• Tightly specified API with conformance
- Portable, production-grade vision functions
• Complementary to OpenCV
- Which is great for prototyping

[Diagram layers: Application; OpenCV open source library and other higher-level CV libraries; Open source sample implementation and hardware vendor implementations]

Acceleration for power-efficient vision processing

OpenVX Graphs
• Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Tiling extension enables user nodes (extensions) to also run in local memory
• VXU Utility Library for access to single nodes
- Easy way to start using OpenVX
• EGLStreams can provide data and event interop with other APIs
- BUT use of other Khronos APIs is not mandated

[Diagram: Example Graph and Flow. Native Camera Control feeds a graph of OpenVX Nodes, which feeds Heterogeneous Processing]
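The idea that nodes in a graph may be fused to eliminate memory transfers can be sketched in a few lines. The example below is not the OpenVX API: `LinearGraph`, `run_unfused`, `run_fused` and the three per-pixel operations are hypothetical names, and the graph is restricted to a simple chain, which is a special case of the directed graphs described above.

```python
from typing import Callable, List, Tuple

PixelOp = Callable[[int], int]


class LinearGraph:
    """A chain of per-pixel nodes; a deliberately simple stand-in for a vision graph."""

    def __init__(self) -> None:
        self.nodes: List[PixelOp] = []

    def add_node(self, op: PixelOp) -> "LinearGraph":
        self.nodes.append(op)
        return self

    def run_unfused(self, image: List[int]) -> Tuple[List[int], int]:
        """Run node by node; every node writes a full intermediate buffer."""
        buffers_written = 0
        current = image
        for op in self.nodes:
            current = [op(p) for p in current]
            buffers_written += 1
        return current, buffers_written

    def run_fused(self, image: List[int]) -> Tuple[List[int], int]:
        """Apply all nodes to each pixel in one pass; only the output is written."""
        def fused(p: int) -> int:
            for op in self.nodes:
                p = op(p)
            return p
        return [fused(p) for p in image], 1


def brighten(p: int) -> int:
    return min(p + 40, 255)


def invert(p: int) -> int:
    return 255 - p


def threshold(p: int) -> int:
    return 255 if p >= 128 else 0


if __name__ == "__main__":
    graph = LinearGraph().add_node(brighten).add_node(invert).add_node(threshold)
    image = [0, 100, 200, 250]
    print(graph.run_unfused(image))
    print(graph.run_fused(image))
```

Both paths produce the same pixels; the difference is how many full-size buffers are written along the way, which is the cost a graph-aware implementation is free to remove. Because the whole graph is declared before execution, the implementation can make that choice without the application changing.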

OpenVX 1.0 Function Overview
• Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
• Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
• Core Computer Vision
- Pyramid computation
- Integral Image computation
• Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow

OpenVX Participants and Timeline
• Aiming for specification finalization by mid-2014
• Itseez is working group chair
• Qualcomm and TI are specification editors

OpenVX and OpenCV are Complementary

Governance
- OpenCV: Open Source, Community Driven. No formal specification
- OpenVX: Formal specification and conformance tests. Implemented by hardware vendors
Scope
- OpenCV: Very wide. 1000s of functions of imaging and vision. Multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision. Use external camera API
Conformance
- OpenCV: No Conformance testing. Every vendor implements different subset
- OpenVX: Full conformance test suite / process. Reliable acceleration platform
Use Case
- OpenCV: Rapid prototyping
- OpenVX: Production deployment
Efficiency
- OpenCV: Memory-based architecture. Each operation reads and writes memory
- OpenVX: Graph-based execution. Optimizable computation, data transfer
Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability

OpenVX and OpenCL are Complementary

Use Case
- OpenCL: General Heterogeneous programming
- OpenVX: Domain targeted - vision processing
Architecture
- OpenCL: Language-based – needs online compilation
- OpenVX: Library-based - no online compiler required
Target Hardware
- OpenCL: ‘Exposed’ architected memory model – can impact performance portability
- OpenVX: Abstracted node and memory model - diverse implementations can be optimized for power and performance
Precision
- OpenCL: Full IEEE floating point mandated
- OpenVX: Minimal floating point requirements – optimized for vision operators
Ease of Use
- OpenCL: Focus on general-purpose math libraries with no built-in vision functions
- OpenVX: Fully implemented vision operators and framework ‘out of the box’

Typical Imaging Pipeline
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
• ISP may be a separate chip or within Application Processor

[Diagram: Lens -> Color Filter Array -> CMOS sensor -> Bayer -> Pre-processing -> Image Signal Processor (ISP) -> RGB/YUV -> Post-processing -> App, with 3A feeding back lens, sensor and aperture control]

Need for advanced camera control API:
- to drive more flexible app camera control
- over more types of camera sensors
- with tighter integration with the rest of the system

Khronos Camera API
• Catalyze camera functionality not available on any current platform
- Open API that aligns with future platform direction for easy adoption
- E.g. could be used to implement future versions of Android Camera HAL
• More detailed control per frame
- Focus, flash, format, Region of Interest (ROI) selection
• Global Timing & Synchronization
- E.g. Between cameras and MEMS sensors
• Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
• Control multiple sensors with synch and alignment
- Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Flexible processing/streaming
- Multiple output streams and streaming rows (not just frames)
- RAW, Bayer and YUV Processing

Camera API Design Philosophy
• C-language API starting from proven designs
- e.g. FCAM, Android Camera HAL V3
• Design alignment with widely used hardware standards
- e.g. MIPI CSI
• Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, surveillance, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not required
• Provide support for vendor-specific extensions

[Timeline dates: Apr13, Jul13, 4Q13, 1Q14, 2Q14, 3Q14. Milestones: Group charter approved, First draft specification, Provisional specification, Sample implementation and tests, Specification ratification]

‘Always On’ Camera and Sensor Processing
• Visual sensor revolution – driving need for significant vision acceleration
- Multi-sensors: Stereo pairs -> Plenoptic arrays -> Active depth cameras
• Devices should be always environmentally-aware – e.g. ‘wave to wake’
- BUT many sensor use cases consume too much power to actually run 24/7
• Smart use of sensors to trigger levels of processing capability
- ‘Scanners’ - very low power, always on, detect events in the environment

Processing tiers:
- ARM 7: 1 MIP and accelerometers can detect someone in the vicinity
- DSP / Hardware: Low power activation of camera to detect someone in field of view
- GPU / Hardware: Maximum acceleration for processing full depth sensor capability

Sensor Industry Fragmentation …

StreamInput - Sensor Fusion
• Defines access to high-quality fused sensor stream and context changes
- Implementers can optimize and innovate generation of the sensor stream

[Diagram layers: Applications; OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion) and Middleware (E.g. Augmented Reality engines, gaming engines); StreamInput; Sensor Hubs; Sensors]

- Middleware engines need platform-portable access to native, low-level sensor data stream
- Platforms can provide increased access to improved sensor data stream – driving faster, deeper sensor usage by applications
- Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
- Low-level native API defines access to fused sensor data stream and context-awareness
- StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection – enabling sensor subsystem vendors to increase ADDED VALUE

Khronos APIs for Augmented Reality

AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together

[Diagram blocks]
- MEMS Sensors
- Sensor Fusion
- Precision timestamps on all sensor samples
- Camera Control API: Advanced Camera Control and stream generation
- Vision Processing
- Application on CPUs, GPUs and DSPs
- Audio Rendering
- 3D Rendering and Video Composition On GPU
- EGLStream - stream data between APIs

Khronos DevU In Depth Sessions Today
