Khronos Overview December 2014 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem @neilt3d

© Copyright 2014 - Page 1
Why Do We Need Standards?
• Standards are interoperability interfaces
  - They enable communities to communicate and independently innovate
• Compelling user experiences can be created inexpensively to build mass markets
  - Don’t slow growth with functionality fragmentation that adds no value
• E.g. Wireless and IO standards
  - GSM/EDGE, UMTS/HSPA, LTE, IEEE 802.11, Bluetooth, USB …

Standards drive mobile market growth by expanding device capabilities

© Copyright Khronos Group 2014 - Page 2
Khronos Connects Software to Silicon
• Open Consortium creating APIs for hardware acceleration
  - Any company is welcome – many international members
• Defining the low-level silicon interfaces needed on every platform
  - Graphics, compute, rich media, vision, sensor and camera processing
• Commitment to ROYALTY-FREE specifications for use by the whole industry
  - State-of-the-art IP framework to protect members AND the standards
• Non-profit organization
  - Membership fees cover operating expenses
• Create and publish API Specifications AND Conformance Tests
  - For cross-vendor portability
• Strong industry momentum
  - 100s of man years invested by industry experts

Well over a BILLION people use Khronos APIs Every Day…

BOARD OF PROMOTERS

Over 100 members worldwide Any company or university welcome to join

http://accelerateyourworld.org/

Standards in the Real World

Right time to Standardize?

Vendor differences adding no value - fragmentation is slowing growth – clear goals emerge for a standard
Darwinian industry is still experimenting with what works and what doesn’t

REFINE BY COMMITTEE: Industry agrees on what to standardize – cooperative refinement from multiple viewpoints creates a robust solution
DESIGN BY COMMITTEE: Experimentation and design by committee can be slow and unfocused

A good standard enables implementation innovation
A bad standard stifles innovation and causes commoditization

Proven processes to accelerate time to a productive ecosystem
Proven IP Framework that protects Members’ IP and specifications in the market
A forum for the industry to come together to enable efficient silicon innovation

The Value of Khronos Participation

- See an early window into the future of the industry technology roadmap before products are developed
- Have a voice in how key standards evolve to suit your business needs
- Develop products in parallel with spec drafting for faster time to market
- Products are aligned with global market needs and trends
Members can ship products faster than non-members

Gather Industry Requirements for future silicon acceleration -> Draft Specifications Confidential to Khronos members -> Publicly Release Specifications and Conformance Tests -> Non-members Release Products

The Khronos standardization process is proven to RAPIDLY generate industry consensus on future hardware acceleration functionality to EFFICIENTLY create new market opportunities

IP Policy, Adoption & Conformance, and Working Group Processes

Note: These slides are for informational purposes only. Please consult the Khronos Membership Agreement, the Khronos Adopters Agreement and the Khronos Trademark Guidelines for the exact legally binding language.

Specification Development Phases

Previous version of specification -> Gather Requirements -> Discussion and voting on requirements -> Specification Development -> Specification Ratification

- Scope of Work proposals accepted from any member
- Design Proposals accepted from any member
- No more core specification changes

Decision to develop new release -> Scope of Work for new spec version agreed -> Ratification vote in Working Group -> Spec public release after Ratification vote by the Board of Promoters

Proposals for Extensions to any version of a Specification can be made and approved at any time
- Vendor Extension – no approval needed – but Khronos still designates registry key
- Multi-Vendor Extension – no approval needed – but Khronos still designates registry key
- Khronos Extension – Working group approval needed – specification Ratified and IP license applies

Working Group Decision Process
Companies in good standing can vote
  - Attendance at two of the last three working group meetings
Each company gets one vote
  - Regardless of membership level, company size or number of representatives at meeting
Companies can vote either through attendance or by email
  - Yay, nay, or abstain
Any member can make design proposals
Most votes pass with 66% of non-abstaining votes
Ratification votes require a 3/4 majority
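The voting rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and two details are assumptions because the slide does not state them, namely that the 3/4 ratification majority is also counted over non-abstaining votes, and that a result exactly on the threshold passes.

```python
from fractions import Fraction

# Thresholds taken from the slide; everything else is a hypothetical model.
STANDARD_THRESHOLD = Fraction(66, 100)   # "66% of non-abstaining votes"
RATIFICATION_THRESHOLD = Fraction(3, 4)  # "3/4 majority"


def in_good_standing(attendance):
    """True if the company attended at least two of its last three meetings.

    `attendance` is a list of booleans, oldest meeting first.
    """
    return sum(1 for attended in attendance[-3:] if attended) >= 2


def vote_passes(votes, ratification=False):
    """Tally one vote per company, leaving abstentions out of the base.

    `votes` maps a company name to "yay", "nay" or "abstain".
    """
    yays = sum(1 for v in votes.values() if v == "yay")
    nays = sum(1 for v in votes.values() if v == "nay")
    counted = yays + nays
    if counted == 0:
        # Assumption: a vote with no non-abstaining ballots does not pass.
        return False
    threshold = RATIFICATION_THRESHOLD if ratification else STANDARD_THRESHOLD
    # Assumption: meeting the threshold exactly is enough to pass.
    return Fraction(yays, counted) >= threshold
```

With two yays, one nay and one abstention, the share of non-abstaining votes is 2/3, which clears the 66% bar for an ordinary vote but falls short of the 3/4 needed for ratification.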

Proposals -> Email list / Documents repository -> Working Group Discussion -> 100% Consensus? -> Vote? -> Approved / Declined

- Working Group Chair ensures all proposals are put on agenda
- All proposals are available to all working group members to review as soon as they are posted
- Iterative process: the discussion can take place on both email list and on calls
- 100% Consensus? Yes: Approved. No: Vote
- Vote: the proposal needs 66% of the non-abstaining vote to be accepted. Yes: Approved. No: Declined
- Approved: the proposal is accepted by the working group
- Declined: discussion can be reopened by working group vote

Ratification Process

Specification Development -> Working Group Ratification vote -> Ratification period (42 days) -> Board ratification vote -> Specification Released

Working Group Ratification vote
- Each company has one vote
- Working group chair sends redline and clean release candidate to Promoters and all members. This starts the ratification clock. No further functional changes to the specification

Ratification period (42 days)
- Members review specification for IP inclusion. Members may file exclusion certificates to exclude essential IP during this period
- After the period, IP exclusion certificates can no longer be filed.

Board ratification vote
- Each board member has one vote
- The board reviews the specification package for completeness:
  • 2 independent implementations (1 for extensions)
  • Conformance tests and Adoption program
  • Logo and trademark
  • Khronos Processes have been followed

Unapproved specifications are sent back to working group
Specification is Ratified and approved for release. The mutual IP license is triggered

Khronos IP Framework - Balanced Protection

Khronos Members agree not to assert IP claims against other Members or Adopters for CONFORMANT IMPLEMENTATIONS OF RATIFIED Specifications

IP Licensed
- No Implementation IP is licensed – JUST IP explicitly in the specification
- Member can exclude participation in specific Working Groups
- Disclosed IP – named patent claims can be excluded from the mutual license
- License ONLY applies to conformant implementations of Khronos specifications
- Only ESSENTIAL IP is licensed (no commercial alternatives)

IP typically licensed is very narrow BUT is the IP needed to protect the specification for use in the industry

Khronos Conformance Process
• Implementers of Khronos specifications must be Adopters and pass conformance
  - Else NOT covered by the Khronos IP framework and cannot use the trademark!
• Khronos administers an Adopters Program for each API
  - Adopters program provides full test access and trademark license for small fee

Company implementing Khronos spec wishes to use the trademark
-> Company executes Adopters Agreement and pays fee (for unlimited products using that spec version)
-> Port and execute tests on products to generate test results
-> Upload passing test results to Khronos private web-site. Peer Review by Khronos members/Adopters
-> Successful Review of results enables products to use Khronos trademarks and to be listed on Khronos website

Adopter Benefit
- Restricted use trademark (not logo) with disclaimer language. Example: “We implemented OpenGL ES”
- Full use of logo and trademark with small disclaimer
- Full use of logo and trademark: OpenGL ES ™

Adoption Fees

You can adopt any specification version. Adopting later versions includes all earlier versions.
Some APIs offer discounts if you have adopted earlier versions
One adoption fee lets you submit as many products as you wish for that version of the specification

Members get a discount on adoption!

Conformance Reporting and Verification

Your customers can verify on the Khronos website that your product passed conformance and you are eligible to use the logo and are covered by the mutual IP license grant

Note the multiple conformance submissions for OpenGL ES by one company – one per range of similar products

Khronos has very precise rules for what range of products can be regarded as similar and so covered by one submission. Rules are defined in the Conformance Process Document

Adoption through the Industry Food Chain

Each company implementing and promoting a product must be an Adopter and pass conformance to use logo and participate in mutual IP license

IP Block -> Included in SOC -> Included in Device

Example 1: Company A -> Company B -> Company C = 3 Adopters, 3 test submissions

Example 2: Company A -> Company A -> Company A = 1 Adoption fee, 1 test submission

Company A is not selling IP or SOC as a separate product – so only needs to pass conformance for finished device

For efficiency, Companies A, B and C in the example can refer to another company’s submission results if their product is included BUT they must still be a paid Adopter and make the Submission that refers to another company’s results to be covered by IP and trademark licenses

Khronos Standards

3D Asset Handling
  - 3D authoring asset interchange
  - 3D asset transmission format with compression
Visual Computing
  - 3D Graphics
  - Heterogeneous Parallel Computing

Over 100 companies defining royalty-free APIs to connect software to silicon

Acceleration in HTML5
  - 3D in browser – no Plug-in
  - for JavaScript
Sensor Processing
  - Vision Acceleration
  - Camera Control
  - Sensor Fusion

Access to 3D on Over 2 BILLION Devices

1.9B Mobiles / year

300M Desktops / year Windows, Mac,

1B Browsers / year

Source: Gartner (December 2013)

Continuing OpenGL Innovation

Bringing state-of-the-art functionality to cross-platform graphics

OpenGL: 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3, 4.4, 4.5
Timeline: 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
DirectX: 9.0c, 10.0, 10.1, 11, 11.1, 11.2

What is new in OpenGL 4.5?
• Direct State Access (DSA)
  - Object accessors enable state to be queried and modified without binding objects to contexts - efficiency and flexibility for applications, tools and middleware
• Flush Control
  - Application can control flushing of pending commands before context switching – enabling high-performance multithreaded applications
• Robustness
  - Providing a secure platform for applications such as WebGL browsers e.g. preventing a GPU reset affecting any other running applications
• DX11 emulation features
  - Easier porting of applications between OpenGL and Direct3D
• OpenGL ES 3.1 API and compatibility
  - Enables development and execution of the latest OpenGL ES applications on desktop systems

OpenGL ES and WebGL Roadmap

OpenGL ES 1.0 (2003): Fixed function Pipeline
OpenGL ES 1.1 (2004): Driver Update
OpenGL ES 2.0 (2007): Silicon Update - Programmable Shaders
OpenGL ES 3.0 (2012): Silicon Update - 32-bit integers and floats, NPOT, 3D/depth textures, Texture arrays, Multiple Render Targets
OpenGL ES 3.1 (2014): Driver Update - Compute. Spec at GDC March 2014, Standard in Android L

WebGL 1.0 (2011)
WebGL 2.0: Under Development
WebGL 2.0 - Open Review http://www.khronos.org/registry/webgl/specs/latest/2.0/

OpenGL ES 3.1 Goals
• Bring developer requested features from desktop OpenGL 4 to mobile
  - Advanced features, modern programming styles
  - Higher performance with lower overhead
• Headline features - Compute Shaders and Draw-Indirect
  - Compute shaders can create geometry or other rendering data
  - …and also the draw commands needed to render them
  - Offload work from CPU to GPU – critical for mobile perf and power
• Run on OpenGL ES 3.0 hardware – expose hidden capabilities of shipping devices
  - Enable very rapid adoption across the industry
• Better looking, faster performing apps!

OpenGL ES 3.1 Adoption Momentum

• Widespread industry participation to release specification in March 2014
  - Tool and Game Engine Developers, GPU , SoC Vendors
  - Platform Owners, End Equipment Makers, Middleware ISVs
• Khronos launched the OpenGL ES 3.1 Adopters program in June 2014
  - Broad set of conformance tests to ensure reliable cross-vendor operation
• Google announced that OpenGL ES 3.1 is standard in Android L
  - At Google IO June 2014
• First wave of GPU vendors conformant in July 2014
  - ARM, , , NVIDIA, , Vivante
  - http://www.khronos.org/conformance/adopters/conformant-products#opengles

Google Android Extension Pack (AEP)
• Set of extensions for OpenGL ES 3.1
  - Accessible through a single query
  - Functionality to support AAA games
• Functionality from desktop OpenGL
  - Tessellation - Improves the detail of geometry rendered
  - Geometry shaders - Add details and shadows
  - ASTC Texture Compression - High quality texture compression
• Enables premium graphics effects
  - Deferred rendering
  - Physically-based shading
  - High Dynamic Range tone mapping
  - Global Illumination and reflection
  - Smoke and particle effects

Epic’s Rivalry demo using full Unreal Engine 4
Running in real-time on NVIDIA Tegra K1 with OpenGL ES 3.1 + AEP
https://www.youtube.com/watch?v=jRr-G95GdaM

Next Generation OpenGL Initiative

Platform Diversity and need for cross-platform API standards increasing
• Ground up re-design of API for high-efficiency access to graphics and compute on modern GPUs and platforms
• Design from first principles – even if it means breaking compatibility with traditional OpenGL
• An open-standard, cross-platform 3D+compute API for the modern era

After twenty-two years – the architecture of GPUs and platforms has radically changed

Ground-up Explicit API Redesign

Traditional OpenGL vs. Next Generation OpenGL

- Traditional: Originally architected for graphics workstations with direct renderers and split memory
  Next Generation: Matches architecture of modern platforms including mobile platforms with unified memory, tiled rendering
- Traditional: Driver does lots of work: state validation, dependency tracking, error checking. Limits and randomizes performance
  Next Generation: Explicit API – the application has direct, predictable control over the operation of the GPU
- Traditional: Threading model doesn’t enable generation of graphics commands in parallel to command execution
  Next Generation: Multi-core friendly with multiple command queues that can be created in parallel
- Traditional: Syntax evolved over twenty years – complex API choices can obscure optimal performance path
  Next Generation: Removing legacy requirements simplifies API design, reduces specification size and enables clear usage guidance
- Traditional: Shader language compiler built into driver. Only GLSL supported. Have to ship shader source
  Next Generation: Standard Intermediate Language as compiler target simplifies driver and enables front-end language flexibility and reliability
- Traditional: Despite conformance testing developers must often handle implementation variability between vendors
  Next Generation: Simpler API, common language front-ends, more rigorous testing increase cross vendor functional/performance portability

Cross Platform Challenge

- One family of GPUs
- One OS
- One GPU on one OS
- All Modern Platforms and GPUs

Participation of key players
Proven IP Framework
Battle-tested cooperative model
The drive to not let the 3D industry fragment

Portability

Streamlined API is easier to implement and test

Cross-vendor Portability
Standard intermediate language improves shader program portability and reduces driver complexity
Enhanced conformance testing methodology

WebGL 1.0.2 more than doubles conformance tests over 1.0.1: ~21200 vs. ~8900
1.0.3 suite will contain ~20% more tests
Most contributed by open source community

Status
• Organized as a joint project of ARB and OpenGL ES working groups
  - Likely to become standalone working group soon
  - Working at very high intensity since June
  - Making rapid progress
  - Very significant proposals and IP contributions received from members
• Participants come from all segments of the graphics industry
  - Including an unprecedented level of participation from game engine ISVs

glnext is shaping up to be amazing

• glnext will have the expected features and control of a modern API
• And the portability story of OpenGL
• OpenGL is already a critically important component of SteamOS
• We fully anticipate that glnext will continue this tradition.

We are super excited to contribute and work with the Next Generation OpenGL Initiative, and bring our experience of low-overhead and explicit graphics APIs to build an efficient standard for multiple platforms and vendors in Khronos. This work is of critical importance to get the most out of modern GPUs on both mobile and desktop, and to make it easier to develop advanced and efficient 3D applications – enabling us to build amazing future games with Frostbite on all platforms.

- Johan Andersson, Technical Director, Frostbite

Mobile Web is a Real Time Application

Apple iPhone: 320x480 = 153K Pixels, 163 DPI
Apple iPad: 1024x768 = 786K Pixels, 132 DPI
Apple iPad Mini: 2048x1536 = 3100K Pixels, 326 DPI

Buttery smooth touch interaction needs continuous 60Hz updates
In 5 years the number of pixels to process on mobile screens has gone up by factor of TWENTY
Need GPU Acceleration for Web Rendering!
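The "factor of TWENTY" claim can be checked with a little arithmetic. The sketch below uses only the screen resolutions quoted on this slide; the helper names are illustrative, and the per-second figure simply multiplies the pixel count by the 60Hz update rate mentioned above.

```python
# Screen resolutions quoted on the slide (width, height in pixels).
SCREENS = {
    "iPhone": (320, 480),
    "iPad": (1024, 768),
    "iPad Mini": (2048, 1536),
}

REFRESH_HZ = 60  # "continuous 60Hz updates"


def pixel_count(name):
    """Total pixels on the named screen."""
    width, height = SCREENS[name]
    return width * height


def growth_factor(old, new):
    """How many times more pixels the newer screen has than the older one."""
    return pixel_count(new) / pixel_count(old)


def pixels_per_second(name):
    """Pixels that must be produced each second to redraw every frame."""
    return pixel_count(name) * REFRESH_HZ


if __name__ == "__main__":
    for name in SCREENS:
        print(name, pixel_count(name), pixels_per_second(name))
    print("growth:", round(growth_factor("iPhone", "iPad Mini"), 2))
```

The exact products are 153,600 and 3,145,728 pixels, so the slide's rounded figures hold and the growth factor comes out at about 20.5.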

WebGL/WebCL Ecosystem

Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem

Content
  - Content downloaded from the Web: JavaScript, HTML, CSS, ...
JavaScript Middleware
  - Middleware can make WebGL and WebCL accessible to non-expert programmers
  - E.g. three.js library: http://threejs.org/ used by majority of WebGL content
HTML5 JavaScript / CSS
  - Browser provides WebGL and WebCL alongside other HTML5 technologies
  - No plug-in required
OS Provided Drivers
  - WebGL uses OpenGL ES 2.0 or Angle for OpenGL ES 2.0 over DX9
  - WebCL uses OpenCL 1.X

Pervasive WebGL
• WebGL on EVERY major desktop and mobile browser
• Portable (NO source change) 3D applications are possible for the first time

http://caniuse.com/#feat=webgl

WebGL Tool/Engine Ecosystem

Epic Citadel - WebGL HTML 5 Benchmark (Firefox 22)

https://www.youtube.com/watch?v=l9KRBuVBjVo

WebGL on Mobile
Unigine Engine Demo

http://crypt-webgl.unigine.com/

glTF - Transmitting 3D Assets to WebGL Apps
• ‘GL Transmission Format’
  - Runtime asset format for WebGL, OpenGL ES, and OpenGL applications
• Efficient Representation = Small Size AND Minimal Load Processing
  - JSON for scene structure and other high-level constructs
  - Binary mesh and animation data
  - Little or no processing to drop glTF data into client application
• Runtime Neutral
  - Can be created and used by any app or runtime
• Khronos is prototyping standards-based pipeline
  - Conditioning of COLLADA assets into glTF for WebGL applications
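The split between a JSON scene description and binary mesh data can be illustrated with a toy example. The layout below is hypothetical and is not the glTF schema: the keys `meshes`, `buffer`, `byteOffset`, `count` and `components` are invented here purely to show why a loader needs little processing when the JSON only describes where tightly packed vertex data lives.

```python
import json
import struct

# Hypothetical scene description: NOT the real glTF schema.
SCENE_JSON = json.dumps({
    "meshes": {
        "triangle": {
            "buffer": "positions",  # which binary blob holds the data
            "byteOffset": 0,        # where the vertex data starts
            "count": 3,             # number of vertices
            "components": 3,        # x, y, z per vertex
        }
    }
})

# Binary payload: three xyz vertices packed as little-endian 32-bit floats.
POSITIONS = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
BUFFERS = {"positions": struct.pack("<9f", *POSITIONS)}


def load_vertices(scene_text, buffers, mesh_name):
    """Read the JSON description, then unpack vertices from the binary blob."""
    mesh = json.loads(scene_text)["meshes"][mesh_name]
    components = mesh["components"]
    total = mesh["count"] * components
    flat = struct.unpack_from("<%df" % total, buffers[mesh["buffer"]],
                              mesh["byteOffset"])
    # Group the flat float sequence into one tuple per vertex.
    return [flat[i:i + components] for i in range(0, total, components)]


if __name__ == "__main__":
    print(load_vertices(SCENE_JSON, BUFFERS, "triangle"))
```

In a real client the packed bytes could be handed to the graphics API more or less as they are, which is the "little or no processing" point made on the slide; unpacking to Python tuples here is only for inspection.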

Authoring Playback

COLLADA and glTF Ecosystem

OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests On GitHub
Tool Interop
COLLADA2GLTF Translator
Other authoring formats

Web-based Tools

Pervasive WebGL deployment Three.js glTF Importer. Rest3D initiative

glTF Adoption!

three.js loader Cesium Engine

rest3d viewer Montage Viewer

glTF and Compression Extension
• Benchmarking 3D compression formats for implementation as glTF extensions
  - Baseline is GZIP
  - MPEG royalty-free Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
  - Open3DGC JavaScript and C/C++ implementation
  - WebGL-loader is Google lightweight compression format for WebGL content

Format                 CAD Models (Mbytes)   3D Scanned Models (Mbytes)   MPEG dataset (Mbytes)
OBJ                    1310 (100%)           736 (100%)                   600 (100%)
Gzip                   336 (26%)             204 (28%)                    157 (26%)
Webgl-loader           219 (17%)             117 (16%)                    103 (17%)
Open3DGC               67 (5%)               22 (3%)                      22 (4%)
Webgl-loader + Gzip    80 (6%)               38 (5%)                      26 (4%)

Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.5x more efficient than webgl-loader
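The percentages and the "5x-9x" summary can be recomputed from the sizes in the table. The sketch below copies the Mbyte figures from the slide; the function names are illustrative only.

```python
# Sizes in Mbytes from the slide: (CAD models, 3D scanned models, MPEG dataset).
SIZES_MB = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}


def percent_of_obj(fmt):
    """Size of each dataset as a whole-number percentage of the OBJ baseline."""
    return [round(100 * size / base)
            for size, base in zip(SIZES_MB[fmt], SIZES_MB["OBJ"])]


def size_ratio(larger, smaller):
    """How many times larger `larger` is than `smaller`, per dataset."""
    return [round(a / b, 1)
            for a, b in zip(SIZES_MB[larger], SIZES_MB[smaller])]


if __name__ == "__main__":
    for fmt in SIZES_MB:
        print(fmt, percent_of_obj(fmt))
    print("Gzip vs Open3DGC:", size_ratio("Gzip", "Open3DGC"))
    print("Webgl-loader + Gzip vs Open3DGC:",
          size_ratio("Webgl-loader + Gzip", "Open3DGC"))
```

The Gzip-to-Open3DGC ratios fall between roughly 5x and 9x, matching the slide. The comparison with webgl-loader is printed for the gzipped variant, since that row appears closest to the range quoted on the slide.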

Status and Open Source Resources for glTF
• Open specification; Open process
  - Spec and sample code: https://github.com/KhronosGroup/glTF
  - All features backed up by multiple implementations in code
  - glTF 0.8 schema available - getting very close to glTF 1.0!
• COLLADA2GLTF open-source converter is gaining robustness and momentum
  - https://github.com/KhronosGroup/glTF/tree/master/converter/COLLADA2GLTF
  - Binaries are available on GitHub for easy use
• Three.js glTF loader
  - https://github.com/KhronosGroup/glTF/tree/master/loaders/threejs
  - Most glTF features are already supported
• Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
  - Available at https://github.com/fabrobinet/glTF-webgl-viewer


OpenCL – Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
  - Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs, FPGA and hardware
  - Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language - Subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices

[Diagram: OpenCL Kernel Code running on CPU, GPU, DSP, FPGA and HW]

OpenCL 2.0 Updated November 2014
• OpenCL 2.0 Update
  • Clarifications for support for Blocks in OpenCL C;
  • Refinements to the precision requirements for math functions in fast math mode;
  • Clarification of flags that can be applied to pipes;
  • A new extension, cl_khr_device_enqueue_local_arg_types, for enqueueing device kernels to use arguments that are a pointer to a user defined type in local memory;
  • Clarification of the CL_MEM_KERNEL_READ_AND_WRITE flag to enable filtering of image formats that can be passed to a single kernel instance as read_write

OpenCL Roadmap
• What markets has OpenCL been aimed at?
• What problems is OpenCL solving?
• How will OpenCL need to adapt in the future?

[Diagram: target markets growing over successive versions from HPC, Desktop and Mobile to Web, FPGA, Embedded and Safety Critical, with the newest markets marked as the Discussion Focus for New Capabilities]

Dec08: OpenCL 1.0 Specification

18 months -> Jun10: OpenCL 1.1 Specification
  - 3-component vectors
  - Additional image formats
  - Multiple hosts and devices
  - Buffer region operations
  - Enhanced event-driven execution
  - Additional OpenCL C built-ins
  - Improved OpenGL data/event interop

18 months -> Nov11: OpenCL 1.2 Specification
  - Device partitioning
  - Separate compilation and linking
  - Enhanced image support
  - Built-in kernels / custom devices
  - Enhanced DX and OpenGL Interop

24 months -> Nov13: OpenCL 2.0 Specification
  - Shared Virtual Memory
  - On-device dispatch
  - Generic Address Space
  - Enhanced Image Support
  - C11 Atomics
  - Pipes
  - Android ICD

Roadmap Discussions
  - Binning/Triaging SW and HW features
  - Will use Provisional Specs
  - Some common requests:
    - C++ Programming
    - SPIR in Core
    - Refine and evolve Memory and Execution Models
    - Better debug and profiling
    - Trans-API Interop

OpenCL Implementations
Desktop
1.0 | May09 1.1 | Jul11 1.2 | Jun12

1.0 | Aug09 1.1 | Aug10 1.2 | May12 2.0 | Sep14

1.0 | May10 1.1 | Feb11

1.1 |Mar11 1.2 | Dec12 2.0 | Jul14

1.0 | May09 1.1 | Jun10 Mobile 1.1 | Aug12

1.0 | Feb11 1.2 | Sep13

1.2 | Aug14

1.1 | Nov12 1.2 | Apr14

1.0 | Jan10 1.1 | Apr12 1.1 | Dec14

1.1 | May13

1.0 | Jul13 FPGA 1.0 | Dec14

Dec08: OpenCL 1.0, Jun10: OpenCL 1.1, Nov11: OpenCL 1.2, Nov13: OpenCL 2.0

Key OpenCL 2.0 Features
• Shared Virtual Memory
  - Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices
• Nested Parallelism
  - Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks
• Generic Address Space
  - Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application

OpenCL Desktop Usage
• Broad commercial uptake of OpenCL
  - Mainly imaging, video and vision processing
  - Adobe, Apple, Corel, ArcSoft Etc. Etc.
• “OpenCL” on Sourceforge, Github, Google Code, Bitbucket finds over 2,000 projects
  - OpenCL implementations - Beignet, pocl
  - VLC, , FFMPEG, Handbrake
  - GIMP, ImageMagick, IrfanView
  - Hadoop, Memcached
  - WinZip, Crypto++ Etc. Etc.
• Desktop benchmarks use OpenCL
  - PCMark 8 – video chat and edit
  - Basemark CL, CompuBench Desktop

http://streamcomputing.eu/blog/2013-12-28/professional-consumer-media-software-/

Teaching OpenCL
• International textbooks
  - US, Japan, Europe, China and India
• Research Paper momentum
  - Over 4000 papers in 2013
• Commercial OpenCL training courses
  - http://arrayfire.com/#training
• Almost 100 University Courses with OpenCL

OpenCL Research Papers on Google Scholar

http://developer.amd.com/partners/university-programs/opencl-university-course-listings/

Khronos Foundational APIs

Developer Innovation
Market Momentum… Applications, libraries and frameworks that find OpenCL acceleration can deliver a better end-user experience

A successful standard enables and encourages innovation in implementation and usage
Deliver the lowest-level API abstraction possible that still provides portability – this is functionality needed on every platform

Implementer Innovation
Market Momentum… Many devices competing on performance and power to tap into the value of OpenCL content

OpenCL as Parallel Language Backend

- JavaScript binding for initiation of OpenCL C kernels
- Language for image processing and computational photography
- MulticoreWare open source project on Bitbucket
- Embedded array language for Haskell
- Java language extensions for parallelism
- River Trail: Language extensions to JavaScript
- Compiler directives for Fortran, C and C++
- PyOpenCL: Python wrapper around OpenCL
- Harlan: High level language for GPU programming

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Libraries and Languages using OpenCL

Library Name | Overview | Website
Accelerate | accelerate: An embedded language for accelerated array processing | http://hackage.haskell.org/package/accelerate
amgCL | Simple and generic algebraic multigrid framework | https://github.com/ddemidov/amgcl
Aparapi | API for data parallel Java. Allows suitable code to be executed on GPU via OpenCL. | https://code.google.com/p/aparapi/
ArrayFire | Array-based function library | https://www.accelereyes.com/products/arrayfire
Bolt | Bolt C++ Template Library | https://github.com/HSA-Libraries/Bolt/releases/tag/v1.1GA
Boost.Compute | Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. | https://github.com/kylelutz/compute
Bullet Physics | Bullet Physics OpenCL accelerated Rigid Body Pipeline | http://bulletphysics.org/wordpress/?p=381
C++ AMP | CLANG/LLVM based C++AMP 1.2 standard and transforms it into OpenCL-C | https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/Home
clBLAS | cl BLAS implementation | https://github.com/clMathLibraries/clBLAS
clFFT | OpenCL FFT Library | https://github.com/clMathLibraries/clFFT
clMAGMA | clMAGMA 1.1 is an OpenCL port of MAGMA | http://icl.cs.utk.edu/magma/software/view.html?id=190
clpp | OpenCL Data Parallel Primitives Library | https://code.google.com/p/clpp/
clSpMV | Sparse Matrix Solver | http://www.eecs.berkeley.edu/~subrian/clSpMV.html
Clyther | Python just-in-time specialization engine for OpenCL | http://srossross.github.io/Clyther/
Math Lib | OpenCL 1.2 Math library | https://www.codeplay.com/products/math/
Concord | C++ Heterogeneous Programming Framework (supports OpenCL 1.2), TBB-like | https://github.com/IntelLabs/iHRC/
COPRTHR | CO-PRocessing THReads (COPRTHR) SDK | http://www.browndeertechnology.com/coprthr.htm
DL- Data Layout | DL Enables Optimized Data Layout Across Heterogeneous Processors | http://www.multicorewareinc.com/dl.html
ForOpenCL | Fortran to OpenCL tool | http://sourceforge.net/projects/fortran-parser/files/ForOpenCL/
fortranCL | FortranCL is an OpenCL interface for Fortran 90. | https://code.google.com/p/fortrancl/
FSCL.Compiler | FSharp to OpenCL Compiler | https://github.com/GabrieleCocco/FSCL.Compiler
GATLAS | GPU Automatically Tuned Linear Algebra Software (project looks stalled) | https://github.com/cjang/GATLAS
GMAC | Global Memory for Accelerators | http://www.multicorewareinc.com/gmac.html
GPULib | Iterative sparse solvers | http://www.txcorp.com/
gpumatrix | A matrix and array library on GPU with interface compatible with Eigen. | https://github.com/rudaoshi/gpumatrix
GPUVerify | GPUVerify is a tool for formal analysis of GPU kernels written in OpenCL | http://multicore.doc.ic.ac.uk/tools/GPUVerify/
Halide | Halide Programming language for high-performance image processing | http://halide-lang.org/
Harlan | Harlan: A Scheme-Based GPU Programming Language | https://github.com/eholk/harlan
HOpenCL | Haskell OpenCL Wrapper API | https://github.com/bgaster/hopencl
libCL | C++ Generic parallel algorithms library | http://www.libcl.org/
Libra SDK | Cross Platform Acceleration API | http://www.gpusystems.com/libra.aspx
M³ Platform | Parallel Framework and Primitive Libraries | http://www.fixstars.com/en/products/m-cubed/
MUMPS | Direct Sparse solver | http://graal.ens-lyon.fr/MUMPS/
Octave | Octave acceleration via OpenCL | http://indico.cern.ch/event/93877/session/13/contribution/89/material/slides/0.pdf
Courtesy: AMD

Libraries and Languages using OpenCL #2

• Open Fortran Parser: ANTLR-based parsing tools that support the Fortran 2008 standard - http://fortran-parser.sourceforge.net/
• OpenACC to OpenCL Compiler: Rose based OpenACC to OpenCL Compiler - https://github.com/tristanvdb/OpenACC-to-OpenCL-Compiler
• OpenCL.jl: Julia OpenCL 1.2 bindings - https://github.com/jakebolewski/OpenCL.jl
• OpenCLIPP: OpenCL Integrated Performance Primitives - A library of optimized OpenCL image processing functions - https://github.com/CRVI/OpenCLIPP
• OpenCLLink: Lets Mathematica use the OpenCL parallel computing language - http://reference.wolfram.com/mathematica/OpenCLLink/guide/OpenCLLink.html
• OpenClooVision: Computer vision framework based on OpenCL and C# - http://opencloovision.codeplex.com/
• OpenCV-CL: OpenCL accelerated OpenCV - http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/opencv-cl_instructions-246.pdf
• OpenHMPP: Directive-based OpenACC and OpenHMPP Source to OpenCL compiler - http://www.caps-entreprise.com/products/caps-compilers/
• Paralution: C++ sparse iterative solvers and preconditioners library with OpenCL support - http://www.paralution.com/
• Pardiso: Direct Sparse solver - http://www.pardiso-project.org/
• Pencil: Intended as a suitable target language for the compilation of domain-specific languages (DSLs) - https://github.com/carpproject/pencil
• PETSc: Portable, Extensible Toolkit for Scientific Computation - http://www.mcs.anl.gov/petsc/
• PyOpenCL: OpenCL parallel computation API from Python - http://mathema.tician.de/software/pyopencl/
• QT with OpenCL: Using OpenCL with QT - http://doc.qt.digia.com/opencl-snapshot/
• RaijinCL: Library for matrix operations for OpenCL - http://www.raijincl.org/
• Rivertrail: JavaScript which supports Data Parallelism via OpenCL - https://github.com/rivertrail/rivertrail/wiki
• RNG: Random number generation for parallel computations - http://www.iro.umontreal.ca/~lecuyer/
• ROpenCL: Parallel Computing for R Using OpenCL - http://repos.openanalytics.eu/html/ROpenCL.html
• Rose Compiler: Rose Compiler with OpenCL Support - http://rosecompiler.org/
• Rust-OpenCl: OpenCL bindings for Rust - https://github.com/luqmana/rust-opencl
• ScalaCL: Scala support of OpenCL - https://github.com/ochafik/ScalaCL
• SkelCL: A library providing high-level abstractions for alleviated programming of modern parallel heterogeneous systems - https://github.com/skelcl/skelcl
• SnuCL: Naturally extends the original OpenCL semantics to the heterogeneous cluster - http://snucl.snu.ac.kr/
• SpeedIT 2.4: OpenCL-based OpenFoam acceleration library - http://vratis.com/index.php?option=com_content&view=category&layout=blog&id=49&Itemid=88&lang=en
• streamscan: StreamScan: Fast Scan Algorithms for GPUs without Global Barrier Synchronization - https://code.google.com/p/streamscan/
• SuperLU: Direct Sparse solver - http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
• TM-Task Management: Heterogeneous Task Scheduling and Management - http://www.multicorewareinc.com/tm.html
• Trilinos: Building blocks for the development of scientific applications; constructing and using sparse and dense matrices - http://trilinos.sandia.gov/
• VexCL: A C++ vector expression template library for OpenCL/CUDA - http://ddemidov.github.io/vexcl
• ViennaCL: Open-source linear algebra library for computations on many-core (GPUs, MIC) and multi-core CPUs - http://viennacl.sourceforge.net/
• VirtualCL: VirtualCL (VCL) cluster platform is a wrapper for OpenCL™ - http://www.mosix.cs.huji.ac.il/txt_vcl.html
• VOBLA: Vehicle for Optimized Basic Linear Algebra - Optimized Basic Linear Algebra DSL - https://github.com/carpproject/vobla
• VOCL: Virtualized OpenCL environment - http://www.mcs.anl.gov/~thakur/papers/xiao-vocl-inpar12.pdf
• VSI/Pro®: VSIPL implementation in OpenCL - http://www.techsource.com/press/pdfs/Run_Time-TechSource_press_release.pdf
• WAMS: Algebraic Multigrid Solver using state-of-the-art wavelet preconditioners; solver for sparse linear equations - http://www.newengland-scientific.com/
Courtesy: AMD

Widening OpenCL Ecosystem

[Diagram: inputs to the SPIR generator: OpenCL C kernel source; alternative languages for kernels; high-level frameworks; single source file applications. Caption: diverse, domain-specific languages, frameworks and tools]

SPIR Generator (e.g. patched Clang)

https://github.com/KhronosGroup/SPIR

SPIR is an easier compiler target than C

SYCL
Programming abstraction that combines portability and efficiency of OpenCL with ease of use and flexibility of C++
Single source file programming
SYCL 1.2 Provisional Updated November 2014

SPIR (Standard Portable Intermediate Representation)
First portable IR that includes support for parallel computation
Created in close cooperation with LLVM community
SPIR 2.0 Provisional Released August 2014 (uses LLVM 3.4)

OpenCL run-time can consume SPIR
[Diagram: OpenCL C and SPIR feed the OpenCL Runtime, which targets Device X, Device Y and Device Z]

SYCL for OpenCL
• Pronounced ‘sickle’ to go with ‘spear’ (SPIR)
• Royalty-free, cross-platform C++ programming layer
  - Builds on the concepts, portability & efficiency of OpenCL
  - Ease of use and flexibility of C++
• Single-source C++ development
  - C++ template functions can contain host & device code - e.g. parallel_sort (myData);
  - Construct complex reusable algorithm templates that use OpenCL for acceleration
• SYCL 1.2 Provisional spec released at GDC in March 2014
  - Updated at Supercomputing November 2014

SPIR Unleashes Language Innovation
• Front-ends
  - New language front-ends and programming abstractions for heterogeneous parallel programming target production quality OpenCL backends through SPIR
• Back-ends
  - New target platforms based on multicore, vector, VLIW or other technologies can reuse production quality language frontends and abstractions
  - E.g. OpenACC, C++ AMP and Python are targeting SPIR to access optimized back-ends across multiple vendors
• Tooling
  - Advanced program analysis and optimization of programs in SPIR form
• SPIR 2.0 supports full OpenCL 2.0 “C” kernel language
  - Generic address space
  - Device side kernel enqueue
  - C++11 atomics, Pipes, More…
  - Uses LLVM 3.4 with restrictions and conventions
[Diagram: Front-end Languages and Frameworks; Multi Vendor Tools; Multiple Hardware Architectures Backends]

Heterogeneous Computing and Mobile • Mobile SOCs now beginning to need more than just ‘GPU Compute’ - Multi-core CPUs, GPUs, DSPs, ISPs, specialized hardware blocks • OpenCL can provide a single programming framework for all processors on a SOC - OpenCL 1.2 Built-in Kernels for custom HW

Image Courtesy Qualcomm

APIs for Mobile Compute

GPU Compute Shaders (OpenGL 4.4 and OpenGL ES 3.1)
  - Pervasively available on almost any mobile device or OS
  - Easy integration into graphics apps – no API interop needed
  - Program in GLSL not C
  - Limited to acceleration on a single GPU

OpenCL - General Purpose Heterogeneous Programming Framework
  - Flexible, low-level access to any devices with OpenCL compiler
  - Open standard for any device or OS – being used as backend by many languages and frameworks
  - Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
  - Needs full compiler stack and IEEE precision

Metal - Integrated Graphics and Compute
  - Subset of a mix of OpenGL and OpenCL functionality
  - C++11-based kernel language
  - Apple only (iOS 8 only, A7 and later hardware), GPU only

CUDA - C/C++ Language Integrated GPU Compute
  - Easy programmability and low level access to GPU: Unified Memory, Virtual Addressing
  - Mature and optimized tools and performance
  - Extensive compute and imaging libraries available (NPP, cuFFT, cuBLAS, cuda-gdb, nvprof etc.)
  - NVIDIA only, GPU only

RenderScript - Easy, High-level Compute Offload from Java
  - C99 based kernel language for simple offload from Java apps to CPU and GPU
  - RS JIT Compilation provides host and device portability
  - Android only
  - Limited control over acceleration configuration

RenderScript and OpenCL
• RenderScript and OpenCL do not directly compete
  - RS addresses very different needs from OpenCL – at a different level in the stack
• RenderScript designed for 99% of Android developers - using Java
  - Code critical sections as native C - automatic offload to CPU/GPU
  - Programmer Simplicity and Portability across 1,000s of Android handsets
  - Future - Dynamic load balancing through integration with Android instrumentation and power management systems
• BUT - other types of developer need OpenCL-class control in native code
  - Middleware engines: Unity, Epic Unreal, metaio AR, Bullet Physics …
  - Leading edge apps: real-time video/vision/camera
  - OEM functionality: e.g. camera pipeline
  - These are the developers/apps/engines that hardware vendors want for differentiation
[Diagram: Compute and Graphics columns; Java row with RS and Java Binding to OpenGL ES (similar to JSR239); Native row]

OpenCL on Android can enable specialized access to native acceleration and be an effective backend for RenderScript innovation

Mobile OpenCL Shipping
• Android ICD extension released in latest extension specification
  - OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
  - ARM, Imagination, NVIDIA, Vivante, Qualcomm, Samsung …

Mixamo - Avatar Videoconferencing
• Real time facial animation capture on mobile – ported directly from PC
• Animate an avatar while conferencing
• Full GPU acceleration of vision processing using OpenCL

NVIDIA Tegra K1 Development Board

WebCL - Heterogeneous Computing for Web
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language - Subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices
• WebCL defines JavaScript binding to the OpenCL APIs
  - Enables initiation of OpenCL C Kernels from within the browser

[Diagram: OpenCL C kernel code; JavaScript Platform API to query, select and initialize compute devices; JavaScript Runtime API to build and execute kernels across multiple devices (CPU, GPU, DSP, HW)]

Motivation for WebCL
• Parallel acceleration for compute-intensive web applications
  - Portable and efficient access to heterogeneous multicore devices in JavaScript
• Typical Use Cases
  - 3D asset codecs, video codecs and processing, imaging and vision processing
  - Physics for WebGL games, Online data ,
• WebCL 1.0 specification officially released at GDC March 2014
  - https://www.khronos.org/webcl

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

WebCL Open Source Resources
• Implementations

  - Firefox extension (Mozilla Public License 2.0) - https://github.com/toaarnio/webcl-firefox
  - Samsung - WebKit (BSD) - https://github.com/SRA-SiliconValley/webkit-webcl
  - Uses Node.js (BSD) - https://github.com/Motorola-Mobility/node-webcl
  - AMD – Chromium build - https://github.com/amd/Chromium-WebCL
• WebCL Kernel Validator (open source)
  - https://github.com/KhronosGroup/webcl-validator
• OpenCL to WebCL Translator
  - https://github.com/wolfviking0/webcl-translator
• WebCL Conformance Tests
  - https://github.com/KhronosGroup/WebCL-conformance/
[Demo screenshots: based on Apple QJulia; based on Iñigo Quilez, Shader Toy; http://fract.ured.me/]

Khronos and W3C Cooperation
• Khronos and W3C liaison for Web APIs
  - Leverage proven native APIs
  - Fast API development/deployment
  - Designed by hardware community
  - Familiar foundation reduces developer learning curve

W3C Augmented Web Community Group discussing many of these vision issues for the Web: e.g. leveraging WebRTC in the short term
http://w3.org/community/ar

[Diagram: native Khronos APIs paired with JavaScript counterparts. Labels: WebSL?, JS Binding to Canvas, WebVX?, WebStream?, WebKCAM?, WebAudio, JavaScript Vision Processing, Sensor Fusion, Camera control, Native Path Rendering. Legend: native APIs shipping or Khronos working group; JavaScript API shipping, acceleration being developed or work underway; possible future JavaScript APIs or acceleration]

OpenMAX IL Media Acceleration

StageFright

OpenMAX IL enables diverse high-level media frameworks and applications to portably tap into silicon media acceleration
[Diagram labels: Low-level Acceleration; Media Acceleration]

OpenMAX IL – Video, Audio and Imaging
• Enables arbitrary multimedia pipelines by plugging blocks together
  - Componentized architecture abstracts multimedia functionality block interfaces
• Wide variety of building blocks for imaging, video and audio functions
  - Encode, decode, apply an effect, capture, render, split, mix, etc
• Enables blocks from different sources to work together
  - Blocks can be implemented in software or hardware

Portable & reusable media processing building blocks

OpenMAX IL – Component Graphs
• Standardized component interfaces enable flexible media graphs
  - Including tunneling between components for execution efficiency
• Wide variety of components for imaging, video and audio functions
  - Encode, decode, apply an effect, capture, render, split, mix, etc

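[Diagram: a *.mp4 / *.3gp File Reader sends AAC audio to an Audio Decoder, then an Audio Renderer, then Speakers; it sends MPEG4/H.264 video to a Video Decoder, whose decompressed video passes through a Video Scheduler and a Video Renderer to the Display; a Clock for AV Sync supplies time data]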

Example: MPEG-4 video synchronized with AAC audio decode

OpenSL ES and OpenMAX AL

[Diagram: overlapping feature sets of OpenSL ES (Advanced Audio) and OpenMAX AL (Multimedia). Labels: 3D Audio, Audio Effects, Advanced MIDI, Buffer queues, Audio Playback, Audio Recording, Basic MIDI, Video playback, Video recording, Radio and RDS, Camera, Image capture & display]

Both working groups collaborate to define common API functionality

OpenMAX AL - Object Oriented Media
• Connect media objects to process images and video with AV sync
  - Media Objects enable PLAY and RECORD of media
• Objects have control interfaces
  - Play, Seek, Rate, Audio, Video Post-processing, Metadata Extraction
  - Record, Camera, Video Encoder, Audio Encoder, Metadata Insertion, Radio, MIDI
• Extensive camera controls
  - Flash and metering modes, White balance and focusing controls
  - Exposure compensation, ISO Sensitivity, Shutter speed & Aperture, Zoom
• Analog radio controls
  - Tuning, RDS
[Diagram: an OpenMAX AL Media Object between data sources (DSrc) and data sinks (DSnk). Labels: URI, Memory, Camera, Audio Input, Analog Radio, Audio Mix, Display Window]

OpenMAX AL Video Playback Example
[Diagram: the Application uses EngineItf on the Engine Object and PlayItf on the Media Player Object, which feeds the Output Mix Object; a Play Event Callback returns to the Application]
• Create Engine object
  - To drive this session
• Create Audio Output Mix object
  - Method on Engine interface
  - Mix object drives audio output devices
• Create Media Player object
  - Method on Engine interface
  - Input is URI pointing to a local media file
  - Output drives display and audio output mix
• Register event callback
  - Method on Media Player interface
• Set PlayState to Playing
  - Method on Media Player interface
• Wait for end of file event
  - Via registered callback
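
The playback steps above can be sketched in code. What follows is a minimal Python model of the call sequence only, not the OpenMAX AL C API: the classes and methods (Engine, OutputMix, MediaPlayer, register_callback, set_play_state), the "END_OF_FILE" event name and the clip URI are all hypothetical stand-ins chosen for illustration.

```python
from enum import Enum


class PlayState(Enum):
    STOPPED = 0
    PLAYING = 1


class OutputMix:
    """Hypothetical stand-in for the audio output mix object."""


class MediaPlayer:
    """Hypothetical stand-in for a media player object."""

    def __init__(self, uri, output_mix, frames):
        self.uri = uri
        self.output_mix = output_mix
        self._frames = frames          # pretend length of the media file
        self._callback = None
        self.state = PlayState.STOPPED

    def register_callback(self, callback):
        # The application asks to be told about play events.
        self._callback = callback

    def set_play_state(self, state):
        # Switching to PLAYING "plays" the file to the end, then reports
        # end of file through the registered callback.
        self.state = state
        if state is PlayState.PLAYING:
            for _ in range(self._frames):
                pass                   # decoding and rendering would happen here
            self.state = PlayState.STOPPED
            if self._callback is not None:
                self._callback(self, "END_OF_FILE")


class Engine:
    """Hypothetical stand-in for the engine object that drives the session."""

    def create_output_mix(self):
        return OutputMix()

    def create_media_player(self, uri, output_mix, frames=3):
        return MediaPlayer(uri, output_mix, frames)


def play(uri):
    events = []
    engine = Engine()                                   # create engine
    mix = engine.create_output_mix()                    # create output mix
    player = engine.create_media_player(uri, mix)       # create media player
    player.register_callback(lambda p, e: events.append(e))  # register callback
    player.set_play_state(PlayState.PLAYING)            # set PlayState to Playing
    return events                                       # events seen via callback


if __name__ == "__main__":
    print(play("file:///clip.mp4"))
```

This toy player runs synchronously, so the end-of-file event has already arrived when set_play_state returns. A real player would run asynchronously, and the application would wait for the callback.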

OpenMAX AL Profiles and Extensions
• Two profiles:
  - Media Player – media playback-only devices
  - Media Player/Recorder – full-featured media devices
• Some features optional in all profiles
  - E.g. Vibra, LED, Analog Radio, MIDI, Digital TV
  - APIs are consistent when hardware is available
• Vendor-specific extensions can be integrated into future API core specs

[Diagram labels: Camera controls, Audio playback, Audio recording, Video playback, Video recording, Image rendering, Image capture]

Other OpenMAX AL Features
• Extensive camera controls
  - Flash and metering modes
  - White balance and focusing controls
  - Exposure compensation, ISO Sensitivity
  - Shutter speed & Aperture
  - Zoom (digital and optical)
• Analog radio controls
  - Tuning, RDS
• Audio routing
  - Application-selectable audio inputs and outputs, based on location, connectivity, etc.
  - I/O device capability querying
• Metadata extraction and insertion
  - Search/extract and insert/overwrite metadata in a variety of file formats

What’s New in OpenMAX AL 1.1
• Chaining of media objects
  - Explicit ordering of media processing steps
  - Transcoding
  - Audio replacement
• Dynamic sources and sinks
• Metadata support for streaming playback
• Content pipes
• Multiple version support
• Support for VP8 codec format
• New analog radio callback events for more fine-grained radio control
• New error codes for improved error handling

Audio Fragmentation
• Modern mobile devices have advanced audio capabilities
  - Including high-quality music and 3D gaming
• BUT - no standard way to access audio hardware acceleration
  - Even playing a simple sound on different platforms requires different code
• What about ALSA, OSS, GStreamer, OpenAL?
  - OpenAL is targeted for desktop PCs
  - OSS is obsolete, replaced by ALSA
  - ALSA is Linux specific
  - GStreamer is not an API – and not designed to be optimally hardware accelerated
  - All are released under variations of the GNU General Public License

OpenSL ES – Advanced Audio
• Create theater-quality audio experience
  - Even in a mobile device!
• Profiles reduce application customization
  - Applications can query available profiles
  - Develop to a specific profile or profile combination
• Full 3D audio functionality enhances any gaming experience
  - Perfect companion to OpenGL ES
• Designed for implementation by either a hardware or software solution
  - Unlike any other advanced audio API

OpenSL ES Profiles

Game-centric mobile devices: Advanced MIDI functionality, sophisticated audio capabilities such as 3D audio, audio effects, ability to handle buffers of audio, etc.

Music-centric mobile devices: High quality audio, ability to support multiple music audio codecs, audio streaming support

Basic mobile phones: Ring tone and alert tone playback (basic MIDI functionality), basic audio playback and record functionality, simple 2D audio games

OpenSL ES – Object-Oriented Audio
• OpenSL ES has an object-oriented programming model
  - Simplifies common use cases – but also extensible
• Engine Objects are central to any OpenSL ES session
  - Objects created using methods on the Engine Object interfaces
• OpenSL ES Objects enable PLAY and RECORD of audio
  - Perform some operation on an input and emit the result as output
  - Can handle almost any audio use case
• Objects have control interfaces
  - For application

What’s new in OpenSL ES 1.1

• Buffer queues
• Content pipes
• Better control of 3D performance
• Explicit object ordering
• Dynamic sources and sinks
• Metadata support for streaming playback
• Multiple version support
• Extension configuration support
• And more…

Mobile Vision Acceleration = New Experiences

Need for advanced sensors and the acceleration to process them

Computational Photography and Videography; Face, Body and Gesture Tracking; 3D Scene/Object Reconstruction; Augmented Reality

Visual Computing = Graphics AND Vision

[Diagram: Graphics Processing and Vision Processing linking Data and Imagery]

New mobile visual sensors for MORE DATA
Advanced mobile hardware for MORE PROCESSING
Enables closer intertwining of real and virtual worlds

Image: High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence. P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA

Vision Pipeline Challenges and Opportunities

Growing Camera Diversity: Capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Flexible sensor and camera control to generate required image stream

Diverse Vision Processors: Driving for high performance and low power
• Multi-core CPUs
• Programmable GPUs
• DSPs and DSP arrays
• Camera ISPs
• Dedicated vision IP blocks
Use best processing available for image stream processing – with code portability

Sensor Proliferation: Diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
  - GPS
  - WiFi (fingerprint)
  - Cellular trilateration
  - NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
Control/fuse vision data by/with all other sensor data on device

Vision Processing Power Efficiency
• Depth sensors = significant processing
  - Generate/use environmental information
• Wearables will need ‘always-on’ vision
  - With smaller thermal limit / battery than phones!
• GPUs have x10 CPU imaging power efficiency
  - GPUs architected for efficient pixel handling
• Traditional cameras have dedicated hardware
  - ISP = Image Signal Processor – on all SOCs today
• SOCs have space for more transistors
  - But can’t turn on at same time = Dark Silicon
• Potential for dedicated sensor/vision silicon
  - Can trigger full CPU/GPU complex
[Chart: Power Efficiency versus Computation Flexibility: Multi-core CPU X1, GPU Compute X10, Dedicated Hardware X100; call-outs for Advanced Sensors and Wearables]
But how to program specialized processors? Performance and Functional Portability

OpenVX – Power Efficient Vision Acceleration
• Out-of-the-Box vision acceleration framework
  - Enables low-power, real-time applications
  - Targeted at mobile and embedded platforms
• Functional Portability
  - Tightly defined specification
  - Full conformance tests
• Performance portability across diverse HW
  - Higher-level abstraction hides hardware details
  - ISPs, Dedicated hardware, DSPs and DSP arrays, GPUs, Multi-core CPUs …
• Enables low-power, always-on acceleration
  - Can run solely on dedicated vision hardware
  - Does not require full SOC CPU/GPU complex to be powered on
[Diagram: multiple Applications running on multiple Vision Accelerators]

OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
  - Each Node can be implemented in software or accelerated hardware
  - Nodes may be fused by the implementation to eliminate memory transfers
  - Processing can be tiled to keep data entirely in local memory/cache
• VXU Utility Library for access to single nodes
  - Easy way to start using OpenVX by calling each node independently
• EGLStreams can provide data and event interop with other Khronos APIs
  - BUT use of other Khronos APIs is not mandated

[Diagram: Native Camera Control feeds a graph of several OpenVX Nodes, whose output goes to Downstream Application Processing]

Example OpenVX Graph
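
To make the graph idea concrete, here is a minimal Python sketch of a directed processing graph that is verified once and then executed in dependency order. It is not the OpenVX C API: Graph, add_node, verify and process are hypothetical names, and the "blur" and "edges" nodes are toy list arithmetic rather than real vision kernels.

```python
class Graph:
    """Toy directed graph of processing nodes (hypothetical, not OpenVX)."""

    def __init__(self):
        self._nodes = {}      # name -> (function, list of input names)
        self._order = None    # execution order, filled in by verify()

    def add_node(self, name, func, inputs):
        self._nodes[name] = (func, list(inputs))
        self._order = None    # graph changed, so it must be verified again

    def verify(self, sources):
        """Check the graph and compute an execution order; raises on errors."""
        order, done, visiting = [], set(sources), set()

        def visit(name):
            if name in done:
                return
            if name not in self._nodes:
                raise ValueError(f"unknown input: {name}")
            if name in visiting:
                raise ValueError(f"cycle at node: {name}")
            visiting.add(name)
            for dep in self._nodes[name][1]:
                visit(dep)
            visiting.discard(name)
            done.add(name)
            order.append(name)

        for name in self._nodes:
            visit(name)
        self._order = order
        return order

    def process(self, **sources):
        """Run every node once, feeding each node the results of its inputs."""
        if self._order is None:
            self.verify(sources)
        values = dict(sources)
        for name in self._order:
            func, inputs = self._nodes[name]
            values[name] = func(*(values[i] for i in inputs))
        return values


def build_example():
    g = Graph()
    # Nodes may be added in any order; verify() sorts out the dependencies.
    g.add_node("edges", lambda b: [abs(y - x) for x, y in zip(b, b[1:])], ["blur"])
    g.add_node("blur", lambda img: [(x + y) / 2 for x, y in zip(img, img[1:])], ["camera"])
    return g


if __name__ == "__main__":
    graph = build_example()
    print(graph.verify(["camera"]))
    print(graph.process(camera=[0, 2, 4, 10])["edges"])
```

Because the whole graph is known before execution, an implementation is free to reorder, fuse or offload nodes. The sketch only computes an order, which is the simplest such decision.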

OpenVX 1.0 Function Overview
• Core data structures
  - Images and Image Pyramids
  - Processing Graphs, Kernels, Parameters
• Image Processing
  - Arithmetic, Logical, and statistical operations
  - Multichannel Color and BitDepth Extraction and Conversion
  - 2D Filtering and Morphological operations
  - Image Resizing and Warping
• Core Computer Vision
  - Pyramid computation
  - Integral Image computation
• Feature Extraction and Tracking
  - Histogram Computation and Equalization
  - Canny Edge Detection
  - Harris and FAST Corner detection
  - Sparse Optical Flow

OpenVX 1.0 defines framework for creating, managing and executing graphs
Focused set of widely used functions that are readily accelerated

OpenVX Specification Is Extensible
  - Khronos maintains extension registry
  - Implementers can add functions as extensions
  - Widely used extensions adopted into future versions of the core

Example Graph - Stereo Machine Vision

[Diagram: OpenVX Graph. Camera 1 and Camera 2 each feed a Stereo Rectify with Remap node; the results feed Compute Depth Map (User Node) and then Detect and track objects (User Node), which outputs Object coordinates. Image Pyramid, Delay and Compute Optical Flow nodes form a second path in the graph]

Tiling extension enables user nodes (extensions) to also optimally run in local memory
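
The benefit of fusing nodes and tiling can be illustrated with a small sketch. This is illustrative Python only, under the assumption that both nodes are simple per-pixel operations; brighten, threshold, run_unfused, run_fused_tiled and the tile size are hypothetical, and a real implementation would make these decisions inside the OpenVX runtime.

```python
def brighten(p):
    # Toy per-pixel node: add 10, clamped to the 8-bit range.
    return min(p + 10, 255)


def threshold(p):
    # Toy per-pixel node: binarize at 128.
    return 255 if p >= 128 else 0


def run_unfused(image):
    # Each node reads and writes a full-size buffer, so a whole
    # intermediate image is stored between the two nodes.
    intermediate = [brighten(p) for p in image]
    return [threshold(p) for p in intermediate]


def run_fused_tiled(image, tile=4):
    # Both nodes run back to back on one small tile at a time, so the
    # intermediate data never grows beyond `tile` pixels.
    out = []
    for start in range(0, len(image), tile):
        chunk = [brighten(p) for p in image[start:start + tile]]
        out.extend(threshold(p) for p in chunk)
    return out


if __name__ == "__main__":
    pixels = list(range(100, 140, 3))
    print(run_unfused(pixels) == run_fused_tiled(pixels))
```

Both functions return the same pixels. The difference is only in how much intermediate data exists at once, which is what lets a tiled pipeline stay in local memory.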

OpenVX and OpenCV are Complementary

Governance
  - OpenCV: Community driven open source with no formal specification
  - OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
  - OpenCV: No conformance tests for consistency and every vendor implements different subset
  - OpenVX: Full conformance test suite / process creates a reliable acceleration platform
Portability
  - OpenCV: APIs can vary depending on processor
  - OpenVX: Hardware abstracted for portability
Scope
  - OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
  - OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency
  - OpenCV: Memory-based architecture; each operation reads and writes memory
  - OpenVX: Graph-based execution; optimizable computation, data transfer
Use Case
  - OpenCV: Rapid experimentation
  - OpenVX: Production development & deployment

OpenVX Announcement
• Finalized OpenVX 1.0 specification released October 2014
  - www.khronos.org/openvx
• Full conformance test suite and Adopters Program immediately available
  - $20K Adopters fee ($15K for members) – working group reviews submitted results
  - Test suite exercises graph framework and functionality of each OpenVX 1.0 node
  - Approved Conformant implementations can use the OpenVX trademark
• Khronos working on open source sample implementation of OpenVX 1.0
  - Expected release on GitHub by end of 2014

Khronos APIs for Vision Processing
• Any compute API can be used for vision acceleration
  - OpenCL, OpenGL Compute Shaders …
• OpenVX is the only vision API that does not NEED a CPU/GPU complex
  - Can use any processor – from high-end GPU, through DSPs to hardware blocks
• Regardless of the underlying hardware – the application remains portable
  - The higher abstraction level of OpenVX protects app from hardware differences
• App portability to dedicated vision hardware and graph-based optimizations are the keys to achieving very low power vision processing

Many implementers may choose to use OpenCL or OpenGL Compute Shaders to implement OpenVX nodes and OpenVX to enable a developer to connect those nodes into a graph
[Diagram labels: Programmable Vision Processors; Dedicated Vision Hardware]

NVIDIA VisionWorks is Integrating OpenVX
• VisionWorks library contains diverse vision and imaging primitives
• Will leverage OpenVX for optimized primitive execution
• Can extend VisionWorks nodes through GPU-accelerated primitives
• Provided with sample library of fully accelerated pipelines
[Diagram: software stack, top to bottom: Applications and Middleware; Vision Pipeline Samples (Object Detection, SLAM, …) and 3rd Party Pipelines; VisionWorks Framework; VisionWorks Primitives (Corner Detection, Classifier, …) and 3rd Party; CUDA Libraries and GPU Libraries; Tegra K1]

Need for Camera Control API - OpenKCAM
• Advanced control of ISP and camera subsystem – with cross-platform portability
  - Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
  - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
  - Cross sensor synch: e.g. synch of camera and MEMS sensors
  - Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
  - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing

Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture

[Diagram: Auto Exposure (AE), Auto White Balance (AWB) and Auto Focus (AF); Image Signal Processor (ISP); EGLStreams; Image/Vision Applications]

OpenKCAM is FCAM-based
• FCAM (2010) Stanford/Nokia, open source
• Capture stream of camera images with precision control
  - A pipeline that converts requests into image stream
  - All parameters packed into the requests - no visible state
  - Programmer has full control over sensor settings for each frame in stream
• Control over focus and flash
  - No hidden daemon running
• Control ISP
  - Can access supplemental statistics from ISP if available
• No global state
  - State travels with image requests
  - Every pipeline stage may have different state
  - Enables fast, deterministic state changes

Khronos coordinating with MIPI on camera control and data formats
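
The request-based model described above, with all parameters packed into each request and no global state, can be sketched as follows. This is a minimal Python illustration, not the FCAM or OpenKCAM API: Shot, Frame, Sensor, capture, get_frame and the brightness formula are hypothetical and exist only to show that each frame depends solely on its own request.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One capture request; every setting travels with the request."""
    exposure_us: int
    gain: float
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    shot: Shot          # the exact settings used for this frame
    brightness: float   # toy stand-in for image data


class Sensor:
    """Hypothetical pipeline that turns queued requests into frames."""

    def __init__(self):
        self._pending = deque()

    def capture(self, shot):
        # Queue a request; nothing about the sensor itself is reconfigured.
        self._pending.append(shot)

    def get_frame(self):
        shot = self._pending.popleft()
        # Toy model: brightness depends only on this request's settings,
        # never on an earlier request or on hidden sensor state.
        flash_boost = 2.0 if shot.flash else 1.0
        return Frame(shot, shot.exposure_us * shot.gain * flash_boost)


def bracketed_burst():
    """Alternate short and long exposures frame by frame."""
    sensor = Sensor()
    short, long_ = Shot(1000, 1.0), Shot(8000, 1.0)
    for shot in (short, long_, short):
        sensor.capture(shot)
    return [sensor.get_frame() for _ in range(3)]


if __name__ == "__main__":
    for frame in bracketed_burst():
        print(frame.shot.exposure_us, frame.brightness)
```

Because each returned frame carries the request that produced it, the application can change settings on every frame of a burst and still know exactly which settings apply to which image.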

Sensor Industry Fragmentation …

Low-level Sensor Abstraction API

Apps request semantic sensor information
StreamInput defines possible requests, e.g.
  - Read Physical or Virtual Sensors e.g. “Game Quaternion”
  - Context detection e.g. “Am I in an elevator?”

Apps Need Sophisticated Access to Sensor Data Without coding to specific sensor hardware

Advanced Sensors Everywhere: Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
Sensor Discoverability
Sensor Code Portability

StreamInput processing graph provides optimized sensor data stream
High-value, smart sensor fusion middleware can connect to apps in a portable way
Apps can gain ‘magical’ situational awareness

Khronos APIs for Augmented Reality

AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together

[Diagram: MEMS Sensors and Sensor Fusion; Advanced Camera Control and stream generation; Vision Processing; Application on CPUs, GPUs and DSPs; Audio Rendering; 3D Rendering and Video Composition on GPU. Notes: Precision timestamps on all sensor samples; EGLStream - stream data between APIs]

Summary
• Khronos is building a trio of interoperating APIs for portable / power-efficient vision and sensor processing
• OpenVX 1.0 specification is now finalized and released
  - Full conformance tests and Adopters program immediately available
  - Khronos open source sample implementation by end of 2014
  - First commercial implementations already close to shipping
• Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!
  - $15K annual membership fee for access to all Khronos API working groups
  - Well-defined IP framework protects your IP and conformant implementations
• More Information
  - www.khronos.org
  - [email protected]
  - @neilt3d

Questions?

• www.khronos.org • [email protected] • @neilt3d
