Introduction to Khronos, December 2014
Neil Trevett, President of Khronos; Vice President of Mobile Ecosystem, NVIDIA
@neilt3d

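© Copyright 2014 - Page 1
Why Do We Need Standards?
• Standards are interoperability interfaces
 - They let communities communicate and innovate independently
• Create large markets by delivering a better user experience at lower cost
 - Do not slow growth with fragmentation of functionality that adds no value
• E.g. wireless and IO standards
 - GSM/EDGE, UMTS/HSPA, LTE, IEEE 802.11, Bluetooth, USB …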

By expanding device capabilities, standards drive market growth

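© Copyright Khronos Group 2014 - Page 2
Khronos Connects Software to Silicon
• Open consortium creating open standard APIs for hardware acceleration
 - Any company is welcome to join; many international member companies
• Defines the low-level silicon interfaces needed on every platform
 - Graphics, compute, multimedia, vision, sensor and camera processing
• Committed to royalty-free specifications that bring the industry together
 - A continually updated IP framework protects members' interests and the standards
• Non-profit organization
 - Membership fees cover only operating and engineering expenses
• Creates and publishes API specifications and conformance tests
 - To enable cross-vendor portability
• Strong industry commitment
 - Hundreds of person-years invested by industry experts
Silicon / Software: billions of people use Khronos APIs every day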

BOARD OF PROMOTERS

Over 100 members worldwide
Any company or university is welcome to join

http://accelerateyourworld.org/

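Standards in the Real World

When is the best time to standardize?

The industry is still experimenting, Darwinian-style, with what works and what does not
Differentiation between vendors no longer adds value – fragmentation keeps increasing – the goal of standardization becomes clear

Designed by the industry: experimentation and design are slow and not focused enough. A bad standard stifles innovation and causes commoditization.
Defined by the industry: the industry agrees on what is to be standardized and works together to define an efficient solution that reflects many viewpoints. A good standard enables implementation innovation.

A battle-tested process accelerates the growth of an efficient ecosystem
A battle-tested IP framework protects members' IP and the specifications themselves in the marketplace
An industry forum brings companies together to innovate efficiently in silicon technology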

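The Value of Participating in Khronos

• Gain early insight into future industry technology trends before product development begins
• Have a voice in the creation of industry standards so that they meet your own business needs
• Develop products while the specification is being drafted and be first to market
• Keep products in step with global market needs and trends
Member companies ship products earlier than non-member companies

Timeline: gather industry requirements for future silicon acceleration -> draft specifications open only to Khronos members -> specifications and conformance tests released publicly -> non-member companies ship products

In practice, the Khronos standardization process has been shown to build consensus quickly on future hardware acceleration capabilities and to create new market opportunities effectively.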

IP Policy, Adoption & Conformance, Working Group Process

Note: the following is for reference only. For legal details, refer to the Khronos Membership Agreement, the Khronos Adopters Agreement, and the Khronos Trademark Guidelines.

Specification Development Stages

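Statement-of-work proposals are accepted from any member company. Design proposals are accepted from any member company. After that, no further changes are made to the core specification.

Stages: previous specification version -> gather requirements -> discuss and vote on requirements -> specification development -> specification ratification

Milestones: decision to develop a new version -> agreement on the scope of work for the new version -> final working group vote passes -> after the final Board vote passes, the specification is publicly released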

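Proposals for extensions to any specification version can be submitted and approved at any time
 - Vendor extensions – no approval needed – but Khronos still assigns the registry key
 - Multi-vendor extensions – no approval needed – but Khronos still assigns the registry key
 - Khronos extensions – require working group approval – must complete specification ratification and IP licensing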

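Working Group Voting and Decision Process
• Member companies in good standing may vote
 - Attended at least two of the last three working group meetings
• One vote per company
 - Regardless of membership type, company size, or number of attendees
• Companies may vote yes, no, or abstain, either at a meeting or by email
 - A majority means more than 66% of non-abstaining votes
 - The final ratification vote requires a 3/4 majority
• Any member company can submit a design proposal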

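Decision flow (from the flowchart):
 - A proposal is posted to the mailing list or document repository; once posted, all working group members can see it
 - The working group chair ensures that every proposal is placed on a meeting agenda; decisions can be made on the mailing list or on conference calls
 - Discussion: if there is 100% consensus, the proposal passes
 - Otherwise the working group votes: a proposal needs 66% of non-abstaining votes to pass
 - Passed: the working group accepts the proposal
 - Rejected: the working group can vote to reopen the discussion, and the process repeats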
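
The voting rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Khronos tooling: the function names are hypothetical, "more than 66%" is read as a strict inequality, "3/4 majority" is read as at least 3/4, and a ballot with only abstentions is assumed to fail.

```python
from fractions import Fraction

PROPOSAL_THRESHOLD = Fraction(66, 100)  # "more than 66%" of non-abstaining votes
FINAL_THRESHOLD = Fraction(3, 4)        # 3/4 majority for the final ratification vote


def in_good_standing(attendance):
    """attendance: booleans for recent working group meetings, oldest first."""
    return sum(attendance[-3:]) >= 2


def _yes_share(votes):
    """Share of yes votes among non-abstaining votes, or None if nobody voted."""
    yes = sum(v == "yes" for v in votes.values())
    no = sum(v == "no" for v in votes.values())
    if yes + no == 0:
        return None  # assumption: an all-abstain ballot does not pass
    return Fraction(yes, yes + no)


def proposal_passes(votes):
    """votes: dict of company -> 'yes' | 'no' | 'abstain' (one vote per company)."""
    share = _yes_share(votes)
    return share is not None and share > PROPOSAL_THRESHOLD


def final_vote_passes(votes):
    share = _yes_share(votes)
    return share is not None and share >= FINAL_THRESHOLD


ballot = {"A": "yes", "B": "yes", "C": "no", "D": "abstain"}
print(proposal_passes(ballot), final_vote_passes(ballot))
```

With two yes votes, one no and one abstention, the yes share is 2/3 of the non-abstaining votes, which clears the proposal threshold but not the 3/4 needed for final ratification.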

Specification Ratification Process

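The working group chair sends redline and clean versions of the specification to the Promoter members and to all member companies. This starts the ratification process. From this point on, no functional changes may be made to the specification.

Members review the specification IP list. During this period, members may file exclusion certificates to exclude essential IP. Once the period ends, no further IP exclusion certificates may be filed.

The Board reviews the specification for completeness:
• Two independent implementations (one for an extension)
• Conformance tests and adopters program
• Logos and trademarks
• Throughout the Khronos process, each Board member company has one vote

Flow: specification development -> working group ratification vote passes -> ratification period (42 days) -> Board ratification vote (one vote per company) -> specification released

A specification that fails the vote is returned to the working group. When the specification passes, it is approved for release and the IP license takes effect.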

Khronos IP Framework - Balanced Protection

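Khronos members agree not to bring IP claims against conformant implementations of publicly released specifications by other members or adopters

IP license (diagram):
 - Implementation IP is not licensed, only the IP involved in the specification
 - Members can decline to join particular working groups
 - IP exclusion – specifically identified IP can be excluded from the reciprocal license
 - The license is granted only to implementations that pass conformance testing for the Khronos specification
 - Only essential IP is licensed (commercial aspects are not covered)

The IP license is usually narrow in scope, but it covers the IP needed to protect use of the specification across the industry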

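Khronos Conformance Process

• Implementers of Khronos specifications must be adopters and pass conformance testing
 - Otherwise they are not protected by the IP framework and may not use the trademarks!
• Every Khronos API has an adopters program
 - For a modest fee, adopters get the full test suite and a trademark license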

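Steps (from the flowchart):
 - A company implements a Khronos specification and wants to use its trademark
 - The company executes the Adopters Agreement and pays the fee (unlimited number of products)
 - The company ports and runs the tests on its product to generate test results
 - The results are uploaded to the Khronos website, where they are reviewed by member and adopter peers
 - Once the results pass review, the product may use the Khronos trademark and is listed on the Khronos website

Trademark use:
 - Restricted use of the trademark (not the logo), with a text statement of test status required, e.g. "This product uses OpenGL ES"
 - Adopter advantage: full use of the logo and trademark (OpenGL ES™) with only a simple statement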

Adoption Fees

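Any version can be adopted. Adopting a newer version covers all earlier versions. One adopter fee allows submission of an unlimited number of products that use that specification version.

For some APIs, adopters of an earlier version receive a discount when adopting a newer version.

Member companies receive a discount on adopters program fees!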

Conformance Reporting and Verification

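Your customers can verify your product's conformance results on the Khronos website, confirming that the product is permitted to use the logo and is covered by the reciprocal IP license

Note: one company may make multiple OpenGL ES conformance submissions – one submission per product family

Khronos has very specific rules defining which similar products a single submission can cover. See the Conformance Process Document for details.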

Adopters in the Industry Food Chain

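Every company that implements and ships a product must be an adopter and pass conformance testing before it can use the logo and take part in the reciprocal IP license

IP block -> used in -> SOC -> used in -> device

Example 1: Company A (IP block), Company B (SOC), Company C (device) = 3 adopter fees, 3 submissions

Example 2: Company A (IP block), Company A (SOC), Company A (device) = 1 adopter fee, 1 submission

For efficiency, companies A, B and C in the examples above may rely on another company's submission if their product is included in it, but each must still be a paying adopter and must submit the other company's results so that it is covered by the IP and trademark licenses.
In Example 2, Company A does not sell its IP or SOC as separate products, so only its final device needs to pass conformance testing.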

Khronos Standards

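3D Asset Handling
 - 3D authoring asset interchange
 - 3D asset transmission format with compression
Visual Computing
 - 3D graphics
 - Heterogeneous parallel computing
Acceleration in HTML5
 - 3D in the browser – no plug-in
 - Heterogeneous computing for JavaScript
Sensor Processing
 - Vision acceleration
 - Camera control
 - Sensor fusion

Over 100 companies jointly defining royalty-free APIs that connect software to silicon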

Access to 3D on Over 2 BILLION Devices

1.9B Mobiles / year

300M Desktops / year Windows, Mac, Linux

1B Browsers / year

Source: Gartner (December 2013)

Continuing OpenGL Innovation

Bringing state-of-the-art functionality to cross-platform graphics
Timeline 2004-2014 (figure): OpenGL 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3, 4.4, 4.5
DirectX over the same period: 9.0c, 10.0, 10.1, 11, 11.1, 11.2

What is new in OpenGL 4.5?
• Direct State Access (DSA)
 - Object accessors enable state to be queried and modified without binding objects to contexts - efficiency and flexibility for applications, tools and middleware
• Flush Control
 - Application can control flushing of pending commands before context switching – enabling high-performance multithreaded applications
• Robustness
 - Providing a secure platform for applications such as WebGL browsers e.g. preventing a GPU reset affecting any other running applications
• DX11 emulation features
 - Easier porting of applications between OpenGL and Direct3D
• OpenGL ES 3.1 API and shader compatibility
 - Enables development and execution of the latest OpenGL ES applications on desktop systems

OpenGL ES and WebGL Roadmap

OpenGL ES 1.0 (2003): fixed function pipeline
OpenGL ES 1.1 (2004): driver update
OpenGL ES 2.0 (2007): silicon update; programmable shaders
OpenGL ES 3.0 (2012): silicon update; 32-bit integers and floats, NPOT, 3D/depth textures, texture arrays, multiple render targets
OpenGL ES 3.1 (2014): driver update; compute shaders; spec at GDC March 2014; standard in Android L

WebGL 1.0 (2011)
WebGL 2.0: under development, open review at http://www.khronos.org/registry/webgl/specs/latest/2.0/

OpenGL ES 3.1 Goals
• Bring developer requested features from desktop OpenGL 4 to mobile
 - Advanced features, modern programming styles
 - Higher performance with lower overhead
• Headline features - Compute Shaders and Draw-Indirect
 - Compute shaders can create geometry or other rendering data
 - …and also the draw commands needed to render them
 - Offload work from CPU to GPU – critical for mobile perf and power
• Run on OpenGL ES 3.0 hardware – expose hidden capabilities of shipping devices
 - Enable very rapid adoption across the industry
• Better looking, faster performing apps!

OpenGL ES 3.1 Adoption Momentum

• Widespread industry participation to release specification in March 2014
 - Tool and Game Engine Developers, GPU Designers, SoC Vendors
 - Platform Owners, End Equipment Makers, Middleware ISVs
• Khronos launched the OpenGL ES 3.1 Adopters program in June 2014
 - Broad set of conformance tests to ensure reliable cross-vendor operation
• Google announced that OpenGL ES 3.1 is standard in Android L
 - At Google IO June 2014
• First wave of GPU vendors conformant in July 2014
 - ARM, Imagination Technologies, Intel, NVIDIA, Qualcomm, Vivante
 - http://www.khronos.org/conformance/adopters/conformant-products#opengles

Google Android Extension Pack (AEP)
• Set of extensions for OpenGL ES 3.1
 - Accessible through a single query
 - Functionality to support AAA games
• Functionality from desktop OpenGL
 - Tessellation - Improves the detail of geometry rendered
 - Geometry shaders - Add details and shadows
 - ASTC Texture Compression - High quality texture compression
• Enables premium graphics effects
 - Deferred rendering
 - Physically-based shading
 - High Dynamic Range tone mapping
 - Global Illumination and reflection
 - Smoke and particle effects

Epic’s Rivalry demo using full Unreal Engine 4
Running in real-time on NVIDIA Tegra K1 with OpenGL ES 3.1 + AEP
https://www.youtube.com/watch?v=jRr-G95GdaM

Next Generation OpenGL Initiative

Platform diversity and the need for cross-platform API standards are increasing
• Ground up re-design of API for high-efficiency access to graphics and compute on modern GPUs and platforms
• Design from first principles – even if it means breaking compatibility with traditional OpenGL
• An open-standard, cross-platform 3D+compute API for the modern era

After twenty two years – the architecture of GPUs and platforms has radically changed

Ground-up Explicit API Redesign

Traditional OpenGL | Next Generation OpenGL
Originally architected for graphics workstations with direct renderers and split memory | Matches architecture of modern platforms including mobile platforms with unified memory, tiled rendering
Driver does lots of work: state validation, dependency tracking, error checking. Limits and randomizes performance | Explicit API – the application has direct, predictable control over the operation of the GPU
Threading model doesn’t enable generation of graphics commands in parallel to command execution | Multi-core friendly with multiple command queues that can be created in parallel
Syntax evolved over twenty years – complex API choices can obscure optimal performance path | Removing legacy requirements simplifies API design, reduces specification size and enables clear usage guidance
Shader language compiler built into driver. Only GLSL supported. Have to ship shader source | Standard Intermediate Language as compiler target simplifies driver and enables front-end language flexibility and reliability
Despite conformance testing developers must often handle implementation variability between vendors | Simpler API, common language front-ends, more rigorous testing increase cross vendor functional/performance portability

Cross Platform Challenge

One family of GPUs
One OS
One GPU on one OS
All Modern Platforms and GPUs
 - Participation of key players
 - Proven IP Framework
 - Battle-tested cooperative model
 - The drive to not let the 3D industry fragment

Portability

Cross-vendor Portability
 - Streamlined API is easier to implement and test
 - Standard intermediate language improves shader program portability and reduces driver complexity
 - Enhanced conformance testing methodology

WebGL 1.0.2 doubles conformance tests over 1.0.1: ~21200 vs. ~8900
1.0.3 suite will contain ~20% more tests
Most contributed by open source community
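
As a quick check of the WebGL conformance figures, the arithmetic can be written out. The test counts are the approximate values from the slide; the 1.0.3 projection simply applies the stated ~20% increase and is an estimate, not a published number.

```python
tests_1_0_1 = 8900    # approximate count from the slide
tests_1_0_2 = 21200   # approximate count from the slide

# Growth from 1.0.1 to 1.0.2 ("doubles" on the slide is a round figure).
growth = tests_1_0_2 / tests_1_0_1

# Projection for 1.0.3, assuming exactly 20% more tests than 1.0.2.
projected_1_0_3 = tests_1_0_2 * 120 // 100

print(f"1.0.2 has {growth:.2f}x the tests of 1.0.1")
print(f"1.0.3 would have about {projected_1_0_3} tests")
```

From these rounded counts the growth is closer to 2.4x than 2x.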

Status
• Organized as a joint project of ARB and OpenGL ES working groups
 - Likely to become standalone working group soon
 - Working at very high intensity since June
 - Making rapid progress
 - Very significant proposals and IP contributions received from members
• Participants come from all segments of the graphics industry
 - Including an unprecedented level of participation from game engine ISVs

glnext is shaping up to be amazing

• glnext will have the expected features and control of a modern API
• And the portability story of OpenGL
• OpenGL is already a critically important component of SteamOS
• We fully anticipate that glnext will continue this tradition.

We are super excited to contribute and work with the Next Generation OpenGL Initiative, and bring our experience of low-overhead and explicit graphics to build an efficient standard for multiple platforms and vendors in Khronos. This work is of critical importance to get the most out of modern GPUs on both mobile and desktop, and to make it easier to develop advanced and efficient 3D applications – enabling us to build amazing future games with Frostbite on all platforms.

- Johan Andersson, Technical Director, Frostbite – Electronic Arts

Mobile Web is a Real Time Application

Apple iPhone: 320x480, 153K pixels, 163 DPI
Apple iPad: 1024x768, 786K pixels, 132 DPI
Apple iPad Mini: 2048x1536, 3100K pixels, 326 DPI

Buttery smooth touch interaction needs continuous 60Hz updates
+ In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
= Need GPU acceleration for web rendering!
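
The factor of twenty follows directly from the screen resolutions quoted on the slide. The short calculation below is a sketch; the device labels follow the slide, and the per-second figure assumes every pixel is redrawn on each 60Hz update.

```python
# Resolutions as quoted on the slide (width, height).
screens = {
    "iPhone (163 DPI)": (320, 480),
    "iPad (132 DPI)": (1024, 768),
    "iPad Mini (326 DPI)": (2048, 1536),
}

pixels = {name: w * h for name, (w, h) in screens.items()}
for name, count in pixels.items():
    print(f"{name}: {count:,} pixels")

# Growth from the smallest to the largest screen.
factor = pixels["iPad Mini (326 DPI)"] / pixels["iPhone (163 DPI)"]
print(f"Increase: {factor:.2f}x")

# Pixels to produce per second at continuous 60Hz updates,
# assuming a full-screen redraw every frame.
per_second = pixels["iPad Mini (326 DPI)"] * 60
print(f"At 60Hz: {per_second:,} pixels per second")
```

The slide's "3100K" for the 2048x1536 panel is a rounded figure; the exact product is 3,145,728 pixels.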

WebGL/WebCL Ecosystem

Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem

Content: JavaScript, HTML, CSS, ...
 - Content downloaded from the Web
JavaScript Middleware
 - Middleware can make WebGL and WebCL accessible to non-expert programmers
 - E.g. three.js library: http://threejs.org/ used by majority of WebGL content
HTML5, JavaScript / CSS
 - Browser provides WebGL and WebCL alongside other HTML5 technologies
 - No plug-in required
OS Provided Drivers
 - WebGL uses OpenGL ES 2.0 or Angle for OpenGL ES 2.0 over DX9
 - WebCL uses OpenCL 1.X

Pervasive WebGL
• WebGL on EVERY major desktop and mobile browser
• Portable (NO source change) 3D applications are possible for the first time

http://caniuse.com/#feat=webgl

WebGL Tool/Engine Ecosystem

Epic Citadel - WebGL HTML 5 Benchmark (Firefox 22)

https://www.youtube.com/watch?v=l9KRBuVBjVo

WebGL on Mobile
Unigine Engine Demo

http://crypt-webgl.unigine.com/

glTF - Transmitting 3D Assets to WebGL Apps
• ‘GL Transmission Format’
 - Runtime asset format for WebGL, OpenGL ES, and OpenGL applications
• Efficient Representation = Small Size AND Minimal Load Processing
 - JSON for scene structure and other high-level constructs
 - Binary mesh and animation data
 - Little or no processing to drop glTF data into client application
• Runtime Neutral
 - Can be created and used by any app or runtime
• Khronos is prototyping standards-based pipeline
 - Conditioning of COLLADA assets into glTF for WebGL applications
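
The split between a JSON scene description and raw binary mesh data can be illustrated with a toy asset. The layout below is hypothetical and is not the glTF schema: the field names (`nodes`, `meshes`, `byteOffset`, `count`, `components`) are assumptions made up for this sketch. The point is only that the binary block can be read with almost no processing.

```python
import json
import struct

# Binary part: three vertex positions packed as little-endian float32.
positions = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
binary_blob = struct.pack("<9f", *positions)

# JSON part: scene structure plus a description of where the mesh data sits.
# Hypothetical layout for illustration only, NOT the glTF schema.
scene_json = json.dumps({
    "nodes": [{"name": "triangle", "mesh": 0}],
    "meshes": [{"positions": {"byteOffset": 0, "count": 3, "components": 3}}],
})


def load_positions(scene_text, blob, mesh_index=0):
    """Return the vertex positions of one mesh as a list of tuples."""
    scene = json.loads(scene_text)
    view = scene["meshes"][mesh_index]["positions"]
    width = view["components"]
    total = view["count"] * width
    flat = struct.unpack_from(f"<{total}f", blob, view["byteOffset"])
    return [flat[i:i + width] for i in range(0, total, width)]


print(load_positions(scene_json, binary_blob))
```

In a real client the binary block would typically be handed to the graphics API as a vertex buffer without being unpacked on the CPU at all; the unpacking here is only to show that the data round-trips.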

Authoring Playback

COLLADA and glTF Ecosystem

 - OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests, on GitHub
 - Tool interop through COLLADA; other authoring formats
 - COLLADA2GLTF Translator
 - Web-based tools
 - Pervasive WebGL deployment
 - Three.js glTF Importer, Rest3D initiative

glTF Adoption!

three.js loader Cesium Engine

rest3d viewer Montage Viewer

glTF and Compression Extension
• Benchmarking 3D compression formats for implementation as glTF extensions
 - Baseline is GZIP
 - MPEG royalty-free Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
 - Open3DGC JavaScript and C/C++ implementation
 - WebGL-loader is Google lightweight compression format for WebGL content

Format | CAD Models (Mbytes) | 3D Scanned Models (Mbytes) | MPEG dataset (Mbytes)
OBJ | 1310 (100%) | 736 (100%) | 600 (100%)
Gzip | 336 (26%) | 204 (28%) | 157 (26%)
Webgl-loader | 219 (17%) | 117 (16%) | 103 (17%)
Open3DGC | 67 (5%) | 22 (3%) | 22 (4%)
Webgl-loader + Gzip | 80 (6%) | 38 (5%) | 26 (4%)

Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.5x more efficient than webgl-loader
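
The efficiency multiples can be recomputed from the sizes in the table. The sketch below uses only the rounded Mbyte figures from the slide, so the results are approximate. The function name is illustrative.

```python
# Sizes in Mbytes: (CAD models, 3D scanned models, MPEG dataset), from the table.
sizes_mb = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}


def advantage(baseline, candidate="Open3DGC"):
    """How many times smaller `candidate` is than `baseline`, per dataset."""
    return [round(b / c, 2) for b, c in zip(sizes_mb[baseline], sizes_mb[candidate])]


for baseline in ("Gzip", "Webgl-loader", "Webgl-loader + Gzip"):
    print(f"Open3DGC vs {baseline}: {advantage(baseline)}")
```

Against Gzip the ratios come out at roughly 5x to 9x, matching the slide. The 1.2x-1.5x claim appears to correspond most closely to the "Webgl-loader + Gzip" row rather than plain Webgl-loader, although with these rounded sizes the scanned-model column works out nearer 1.7x.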

Status and Open Source Resources for glTF
• Open specification; Open process
 - Spec, and sample code: https://github.com/KhronosGroup/glTF
 - All features backed up by multiple implementations in code
 - glTF 0.8 schema available - getting very close to glTF 1.0!
• COLLADA2GLTF open-source converter is gaining robustness and momentum
 - https://github.com/KhronosGroup/glTF/tree/master/converter/COLLADA2GLTF
 - Binaries are available on GitHub for easy use
• Three.js glTF loader
 - https://github.com/KhronosGroup/glTF/tree/master/loaders/threejs
 - Most glTF features are already supported
• Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
 - Available at https://github.com/fabrobinet/glTF-webgl-viewer

OpenCL – Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
 - Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs, FPGA and hardware
 - Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
 - Platform Layer API to query, select and initialize compute devices
 - Kernel language - Subset of ISO C99 + language extensions
 - C Runtime API to build and execute kernels across multiple devices
(Figure: OpenCL kernel code running on GPU, DSP, CPU, FPGA and HW)

OpenCL 2.0 Updated November 2014
• OpenCL 2.0 Update
 - Clarifications for support for Blocks in OpenCL C
 - Refinements to the precision requirements for math functions in fast math mode
 - Clarification of flags that can be applied to pipes
 - A new extension, cl_khr_device_enqueue_local_arg_types, for enqueueing device kernels to use arguments that are a pointer to a user defined type in local memory
 - Clarification of the CL_MEM_KERNEL_READ_AND_WRITE flag to enable filtering of image formats that can be passed to a single kernel instance as read_write

OpenCL Roadmap
• What markets has OpenCL been aimed at?
• What problems is OpenCL solving?
• How will OpenCL need to adapt in the future?
(Figure: target markets widening across versions: HPC, Desktop, Mobile, Web, FPGA, Embedded, Safety Critical; discussion focus for new capabilities)

OpenCL 1.0 Specification (Dec08)
OpenCL 1.1 Specification (Jun10, 18 months later)
 - 3-component vectors
 - Additional image formats
 - Multiple hosts and devices
 - Buffer region operations
 - Enhanced event-driven execution
 - Additional OpenCL C built-ins
 - Improved OpenGL data/event interop
OpenCL 1.2 Specification (Nov11, 18 months later)
 - Device partitioning
 - Separate compilation and linking
 - Enhanced image support
 - Built-in kernels / custom devices
 - Enhanced DX and OpenGL Interop
OpenCL 2.0 Specification (Nov13, 24 months later)
 - Shared Virtual Memory
 - On-device dispatch
 - Generic Address Space
 - Enhanced Image Support
 - C11 Atomics
 - Pipes
 - Android ICD
Roadmap Discussions
 - Binning/Triaging SW and HW features
 - Will use Provisional Specs
 - Some common requests: C++ Programming, SPIR in Core, Refine and evolve Memory and Execution Models, Better debug and profiling, Trans-API Interop
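
The release cadence on the roadmap can be checked from the four specification dates (Dec08, Jun10, Nov11, Nov13). This is a small sketch; only month and year are given on the slide, so the day of month used below is an arbitrary assumption.

```python
from datetime import date

# Release months from the roadmap; day 1 is a placeholder assumption.
releases = [
    ("OpenCL 1.0", date(2008, 12, 1)),
    ("OpenCL 1.1", date(2010, 6, 1)),
    ("OpenCL 1.2", date(2011, 11, 1)),
    ("OpenCL 2.0", date(2013, 11, 1)),
]


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


gaps = [
    (prev_name, name, months_between(prev_date, current_date))
    for (prev_name, prev_date), (name, current_date) in zip(releases, releases[1:])
]

for prev_name, name, months in gaps:
    print(f"{prev_name} -> {name}: {months} months")
```

Month-level arithmetic gives 18, 17 and 24 months, so the slide's "18 months" label for the 1.1 to 1.2 interval is a rounded figure.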

OpenCL Implementations
(Figure: conformant implementations per vendor, listed as version | date)
Desktop
1.0 | May09 1.1 | Jul11 1.2 | Jun12

1.0 | Aug09 1.1 | Aug10 1.2 | May12 2.0 | Sep14

1.0 | May10 1.1 | Feb11

1.1 |Mar11 1.2 | Dec12 2.0 | Jul14

1.0 | May09 1.1 | Jun10 Mobile 1.1 | Aug12

1.0 | Feb11 1.2 | Sep13

1.2 | Aug14

1.1 | Nov12 1.2 | Apr14

1.0 | Jan10 1.1 | Apr12 1.1 | Dec14

1.1 | May13

1.0 | Jul13 FPGA 1.0 | Dec14

Specification timeline: OpenCL 1.0 (Dec08), OpenCL 1.1 (Jun10), OpenCL 1.2 (Nov11), OpenCL 2.0 (Nov13)

Key OpenCL 2.0 Features
• Shared Virtual Memory
 - Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices
• Nested Parallelism
 - Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks
• Generic Address Space
 - Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application
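
The control flow behind nested parallelism can be emulated in plain Python. This is a conceptual sketch only: it uses no OpenCL API, the `run` and `sum_range` names are hypothetical, and a simple FIFO queue stands in for the device queue. In OpenCL C 2.0 itself, device-side launches are made with the `enqueue_kernel` built-in.

```python
from collections import deque


def run(host_tasks, kernel):
    """Drain an emulated device queue; kernels may enqueue further work."""
    queue = deque(host_tasks)
    launched = 0
    results = []
    while queue:
        task = queue.popleft()
        launched += 1
        # The kernel receives `queue.append` so it can launch child work
        # without returning control to the host.
        kernel(task, queue.append, results)
    return launched, results


def sum_range(task, enqueue, results, leaf_size=4):
    """Sum integers in [lo, hi), splitting large ranges into child kernels."""
    lo, hi = task
    if hi - lo > leaf_size:
        mid = (lo + hi) // 2
        enqueue((lo, mid))  # child launched from the "device"
        enqueue((mid, hi))
    else:
        results.append(sum(range(lo, hi)))


# The host submits a single task; all further launches come from kernels.
launched, partials = run([(0, 16)], sum_range)
print(launched, sum(partials))
```

The host makes one submission, yet seven kernel instances run in total, which is the scheduling flexibility the slide describes: the decision to subdivide work is taken where the data is, without a round trip to the host.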

OpenCL Desktop Usage
• Broad commercial uptake of OpenCL
 - Mainly imaging, video and vision processing
 - Adobe, Apple, Corel, ArcSoft Etc. Etc.
• “OpenCL” on Sourceforge, Github, Google Code, Bitbucket finds over 2,000 projects
 - OpenCL implementations - Beignet, pocl
 - VLC, , FFMPEG, Handbrake
 - GIMP, ImageMagick, IrfanView
 - Hadoop, Memcached
 - WinZip, Crypto++ Etc. Etc.
• Desktop benchmarks use OpenCL
 - PCMark 8 – video chat and edit
 - Basemark CL, CompuBench Desktop
http://streamcomputing.eu/blog/2013-12-28/professional-consumer-media-software-/

Teaching OpenCL
• International textbooks
 - US, Japan, Europe, China and India
• Research Paper momentum
 - Over 4000 papers in 2013
• Commercial OpenCL training courses
 - http://arrayfire.com/#training
• Almost 100 University Courses with OpenCL

OpenCL Research Papers on Google Scholar

http://developer.amd.com/partners/university-programs/opencl-university-course-listings/

Khronos Foundational APIs

A successful standard enables and encourages innovation in implementation and usage

API: deliver the lowest level abstraction possible that still provides portability – this is functionality needed on every platform

Developer Innovation
 - Market momentum: applications, libraries and frameworks that find OpenCL acceleration can deliver a better end-user experience

Implementer Innovation
 - Market momentum: many devices competing on performance and power to tap into the value of OpenCL content

OpenCL as Parallel Language Backend

 - JavaScript binding for initiation of OpenCL C kernels
 - Language for image processing and computational photography
 - MulticoreWare open source project on Bitbucket
 - Embedded array language for Haskell
 - Java language extensions for parallelism
 - River Trail: language extensions to JavaScript
 - Compiler directives for Fortran, C and C++
 - PyOpenCL: Python wrapper around OpenCL
 - Harlan: high level language for GPU programming

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Libraries and Languages using OpenCL

Library Name | Overview | Website
Accelerate | accelerate: An embedded language for accelerated array processing | http://hackage.haskell.org/package/accelerate
amgCL | Simple and generic algebraic multigrid framework | https://github.com/ddemidov/amgcl
Aparapi | API for data parallel Java. Allows suitable code to be executed on GPU via OpenCL. | https://code.google.com/p/aparapi/
ArrayFire | Array-based function library | https://www.accelereyes.com/products/arrayfire
Bolt | Bolt C++ Template Library | https://github.com/HSA-Libraries/Bolt/releases/tag/v1.1GA
Boost.Compute | Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. | https://github.com/kylelutz/compute
Bullet Physics | Bullet Physics OpenCL accelerated Rigid Body Pipeline | http://bulletphysics.org/wordpress/?p=381
C++ AMP | CLANG/LLVM-based implementation of the C++ AMP 1.2 standard that transforms it into OpenCL C | https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/Home
clBLAS | OpenCL BLAS implementation | https://github.com/clMathLibraries/clBLAS
clFFT | OpenCL FFT Library | https://github.com/clMathLibraries/clFFT
clMAGMA | clMAGMA 1.1 is an OpenCL port of MAGMA | http://icl.cs.utk.edu/magma/software/view.html?id=190
clpp | OpenCL Data Parallel Primitives Library | https://code.google.com/p/clpp/
clSpMV | Sparse Matrix Solver | http://www.eecs.berkeley.edu/~subrian/clSpMV.html
Clyther | Python just-in-time specialization engine for OpenCL | http://srossross.github.io/Clyther/
Codeplay Math Lib | OpenCL 1.2 Math library | https://www.codeplay.com/products/math/
Concord | C++ Heterogeneous Programming Framework (supports OpenCL 1.2), TBB-like | https://github.com/IntelLabs/iHRC/
COPRTHR | CO-PRocessing THReads (COPRTHR) SDK | http://www.browndeertechnology.com/coprthr.htm
DL - Data Layout | DL Enables Optimized Data Layout Across Heterogeneous Processors | http://www.multicorewareinc.com/dl.html
ForOpenCL | Fortran to OpenCL tool | http://sourceforge.net/projects/fortran-parser/files/ForOpenCL/
fortranCL | FortranCL is an OpenCL interface for Fortran 90. | https://code.google.com/p/fortrancl/
FSCL.Compiler | FSharp to OpenCL Compiler | https://github.com/GabrieleCocco/FSCL.Compiler
GATLAS | GPU Automatically Tuned Linear Algebra Software (project looks stalled) | https://github.com/cjang/GATLAS
GMAC | Global Memory for Accelerators | http://www.multicorewareinc.com/gmac.html
GPULib | Iterative sparse solvers | http://www.txcorp.com/
gpumatrix | A matrix and array library on GPU with interface compatible with Eigen. | https://github.com/rudaoshi/gpumatrix
GPUVerify | GPUVerify is a tool for formal analysis of GPU kernels written in OpenCL | http://multicore.doc.ic.ac.uk/tools/GPUVerify/
Halide | Halide Programming language for high-performance image processing | http://halide-lang.org/
Harlan | Harlan: A Scheme-Based GPU Programming Language | https://github.com/eholk/harlan
HOpenCL | Haskell OpenCL Wrapper API | https://github.com/bgaster/hopencl
libCL | C++ Generic parallel algorithms library | http://www.libcl.org/
Libra SDK | Cross Platform Acceleration API | http://www.gpusystems.com/libra.aspx
M³ Platform | Parallel Framework and Primitive Libraries | http://www.fixstars.com/en/products/m-cubed/
MUMPS | Direct Sparse solver | http://graal.ens-lyon.fr/MUMPS/
Octave | Octave acceleration via OpenCL | http://indico.cern.ch/event/93877/session/13/contribution/89/material/slides/0.pdf
Courtesy: AMD

Libraries and Languages using OpenCL #2

Library Name | Overview | Website
Open Fortran Parser | ANTLR-based parsing tools that support the Fortran 2008 standard | http://fortran-parser.sourceforge.net/
OpenACC to OpenCL Compiler | Rose based OpenACC to OpenCL Compiler. | https://github.com/tristanvdb/OpenACC-to-OpenCL-Compiler
OpenCL.jl | Julia OpenCL 1.2 bindings | https://github.com/jakebolewski/OpenCL.jl
OpenCLIPP | OpenCL Integrated Performance Primitives - A library of optimized OpenCL image processing functions | https://github.com/CRVI/OpenCLIPP
OpenCLLink | Lets Mathematica use the OpenCL parallel computing language | http://reference.wolfram.com/mathematica/OpenCLLink/guide/OpenCLLink.html
OpenClooVision | Computer vision framework based on OpenCL and C# | http://opencloovision.codeplex.com/
OpenCV-CL | OpenCL accelerated OpenCV | http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/opencv-cl_instructions-246.pdf
OpenHMPP | Directive-based OpenACC and OpenHMPP Source to OpenCL compiler | http://www.caps-entreprise.com/products/caps-compilers/
Paralution | C++ sparse iterative solvers and preconditioners library with OpenCL support | http://www.paralution.com/
Pardiso | Direct Sparse solver | http://www.pardiso-project.org/
Pencil | PENCIL is intended as a suitable target language for the compilation of domain-specific languages (DSLs). | https://github.com/carpproject/pencil
PETSc | Portable, Extensible Toolkit for Scientific Computation | http://www.mcs.anl.gov/petsc/
PyOpenCL | OpenCL parallel computation API from Python | http://mathema.tician.de/software/pyopencl/
QT with OpenCL | Using OpenCL with QT | http://doc.qt.digia.com/opencl-snapshot/
RaijinCL | Library for matrix operations for OpenCL | http://www.raijincl.org/
Rivertrail | JavaScript which supports Data Parallelism via OpenCL | https://github.com/rivertrail/rivertrail/wiki
RNG | Random number generation for parallel computations | http://www.iro.umontreal.ca/~lecuyer/
ROpenCL | Parallel Computing for R Using OpenCL | http://repos.openanalytics.eu/html/ROpenCL.html
Rose Compiler | Rose Compiler with OpenCL Support | http://rosecompiler.org/
Rust-OpenCl | OpenCL bindings for Rust. | https://github.com/luqmana/rust-opencl
ScalaCL | Scala support of OpenCL | https://github.com/ochafik/ScalaCL
SkelCL | SkelCL is a library providing high-level abstractions for alleviated programming of modern parallel heterogeneous systems | https://github.com/skelcl/skelcl
SnuCL | SnuCL naturally extends the original OpenCL semantics to the heterogeneous cluster | http://snucl.snu.ac.kr/
SpeedIT 2.4 | OpenCL based OpenFoam acceleration library | http://vratis.com/index.php?option=com_content&view=category&layout=blog&id=49&Itemid=88&lang=en
streamscan | StreamScan: Fast Scan Algorithms for GPUs without Global Barrier Synchronization | https://code.google.com/p/streamscan/
SuperLU | Direct Sparse solver | http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
TM - Task Management | Heterogeneous Task Scheduling and Management | http://www.multicorewareinc.com/tm.html
Trilinos | Building blocks for the development of scientific applications; constructing and using sparse and dense matrices | http://trilinos.sandia.gov/
VexCL | VexCL is a C++ vector expression template library for OpenCL/CUDA | http://ddemidov.github.io/vexcl
ViennaCL | Open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. | http://viennacl.sourceforge.net/
VirtualCL | VirtualCL (VCL) cluster platform is a wrapper for OpenCL™ | http://www.mosix.cs.huji.ac.il/txt_vcl.html
VOBLA | Vehicle for Optimized Basic Linear Algebra - Optimized Basic Linear Algebra DSL | https://github.com/carpproject/vobla
VOCL | Virtualized OpenCL environment | http://www.mcs.anl.gov/~thakur/papers/xiao-vocl-inpar12.pdf
VSI/Pro® | VSIPL implementation in OpenCL | http://www.techsource.com/press/pdfs/Run_Time-TechSource_press_release.pdf
WAMS | Algebraic Multigrid Solver using state-of-the-art wavelet preconditioners - solver for sparse linear equations | http://www.newengland-scientific.com/
Courtesy: AMD

Widening OpenCL Ecosystem

Inputs (figure): OpenCL C kernel source; alternative languages for kernels; high-level frameworks; single source file applications; diverse, domain-specific languages, frameworks and tools

SPIR Generator (e.g. patched Clang)
https://github.com/KhronosGroup/SPIR
SPIR is easier compiler target than C

SYCL
 - Programming abstraction that combines portability and efficiency of OpenCL with ease of use and flexibility of C++
 - Single source file programming
 - SYCL 1.2 Provisional Updated November 2014

SPIR (Standard Portable Intermediate Representation)
 - First portable IR that includes support for parallel computation
 - Created in close cooperation with LLVM community
 - SPIR 2.0 Provisional Released August 2014 (uses LLVM 3.4)

OpenCL run-time can consume SPIR
(Figure: OpenCL C Runtime dispatching to Device X, Device Y, Device Z)

SYCL for OpenCL
• Pronounced ‘sickle’ to go with ‘spear’ (SPIR)
• Royalty-free, cross-platform C++ programming layer
 - Builds on concepts portability & efficiency of OpenCL
 - Ease of use and flexibility of C++
• Single-source C++ development
 - C++ template functions can contain host & device code
 - e.g. parallel_sort (myData);
 - Construct complex reusable algorithm templates that use OpenCL for acceleration
• SYCL 1.2 Provisional spec released at GDC in March 2014
 - Updated at Supercomputing November 2014

SPIR Unleashes Language Innovation
• Front-ends
 - New language front-ends and programming abstractions for heterogeneous parallel programming target production quality OpenCL backends through SPIR
• Back-ends
 - New target platforms based on multicore, vector, VLIW or other technologies can reuse production quality language frontends and abstractions
 - E.g. OpenACC, C++ AMP and Python are targeting SPIR to access optimized back-ends across multiple vendors
• Tooling
 - Advanced program analysis and optimization of programs in SPIR form
• SPIR 2.0 supports full OpenCL 2.0 “C” kernel language
 - Generic address space
 - Device side kernel enqueue
 - C++11 atomics, Pipes, More…
 - Uses LLVM 3.4 with restrictions and conventions
(Figure: Front-end Languages and Frameworks; Multi Vendor Tools; Multiple Hardware Architectures Backends)

Heterogeneous Computing and Mobile
• Mobile SOCs now beginning to need more than just ‘GPU Compute’
  - Multi-core CPUs, GPUs, DSPs, ISPs, specialized hardware blocks
• OpenCL can provide a single programming framework for all processors on a SOC
  - OpenCL 1.2 Built-in Kernels for custom HW

Image Courtesy Qualcomm

APIs for Mobile Compute

GPU Compute Shaders (OpenGL 4.4 and OpenGL ES 3.1)
• Pervasively available on almost any mobile device or OS
• Easy integration into graphics apps – no API interop needed
• Program in GLSL not C
• Limited to acceleration on a single GPU

OpenCL - General Purpose Heterogeneous Programming Framework
• Flexible, low-level access to any devices with OpenCL compiler
• Open standard for any device or OS – being used as backend by many languages and frameworks
• Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
• Needs full compiler stack and IEEE precision

Metal - Integrated Graphics and Compute
• Subset of a mix of OpenGL and OpenCL functionality
• C++11-based kernel language
• Apple only (iOS 8 only, A7 and later hardware), GPU only

CUDA - C/C++ Language Integrated GPU Compute
• Easy programmability and low level access to GPU: Unified Memory, Virtual Addressing
• Mature and optimized tools and performance
• Extensive compute and imaging libraries available (NPP, cuFFT, cuBLAS, -gdb, nvprof etc.)
• NVIDIA only, GPU only

RenderScript - Easy, High-level Compute Offload from Java
• C99 based kernel language for simple offload from Java apps to CPU and GPU
• RS JIT Compilation provides host and device portability
• Android only
• Limited control over acceleration configuration

RenderScript and OpenCL
• RenderScript and OpenCL do not directly compete
  - RS addresses very different needs to OpenCL – at a different level in the stack
• RenderScript is designed for the 99% of Android developers using Java
  - Code critical sections as native C, with automatic offload to CPU/GPU
  - Programmer simplicity and portability across 1,000’s of Android handsets
  - Future: dynamic load balancing through integration with Android instrumentation and power management systems
• BUT other types of developer need OpenCL-class control in native code
  - Middleware engines: Unity, Epic Unreal, metaio AR, Bullet Physics …
  - Leading edge apps: real-time video/vision/camera
  - OEM functionality: e.g. camera pipeline
  - These are the developers/apps/engines that hardware vendors want for differentiation

[Diagram: Compute and Graphics columns across Java and Native layers; the Java layer shows RS and a Java binding to OpenGL ES (similar to JSR239)]

OpenCL on Android can enable specialized access to native acceleration and be an effective backend for RenderScript innovation

Mobile OpenCL Shipping
• Android ICD extension released in latest extension specification
  - OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
  - ARM, Imagination, NVIDIA, Vivante, Qualcomm, Samsung …

Mixamo - Avatar Videoconferencing
• Real time facial animation capture on mobile – ported directly from PC
• Animate an avatar while conferencing
• Full GPU acceleration of vision processing using OpenCL

NVIDIA Tegra K1 Development Board

WebCL - Heterogeneous Computing for Web
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language - Subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices
• WebCL defines JavaScript binding to the OpenCL APIs
  - Enables initiation of OpenCL C Kernels from within the browser

[Diagram: a JavaScript Platform API (to query, select and initialize compute devices) and a JavaScript Runtime API (to build and execute kernels across multiple devices) run OpenCL C kernel code on CPU, GPU, DSP and other hardware]

Motivation for WebCL
• Parallel acceleration for compute-intensive web applications
  - Portable and efficient access to heterogeneous multicore devices in JavaScript
• Typical Use Cases
  - 3D asset codecs, video codecs and processing, imaging and vision processing
  - Physics for WebGL games, Online data visualization, Augmented Reality
• WebCL 1.0 specification officially released at GDC March 2014
  - https://www.khronos.org/webcl

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

WebCL Open Source Resources
• Implementations
  - Nokia - Firefox extension (Mozilla Public License 2.0) - https://github.com/toaarnio/webcl-firefox
  - Samsung - WebKit (BSD) - https://github.com/SRA-SiliconValley/webkit-webcl
  - Motorola - Uses Node.js (BSD) - https://github.com/Motorola-Mobility/node-webcl
  - AMD – Chromium build - https://github.com/amd/Chromium-WebCL
• WebCL Kernel Validator (open source)
  - https://github.com/KhronosGroup/webcl-validator
• OpenCL to WebCL Translator
  - https://github.com/wolfviking0/webcl-translator
• WebCL Conformance Tests
  - https://github.com/KhronosGroup/WebCL-conformance/

[Demo images: based on Apple QJulia; based on Iñigo Quilez, Shader Toy (two images); http://fract.ured.me/]

Khronos and W3C Cooperation
• Khronos and W3C liaison for Web APIs
  - Leverage proven native APIs
  - Fast API development/deployment
  - Designed by hardware community
  - Familiar foundation reduces developer learning curve

W3C Augmented Web Community Group discussing many of these vision issues for the Web: e.g. leveraging WebRTC in the short term
http://w3.org/community/ar

[Diagram: native Khronos APIs paired with Web APIs. Labels: WebSL?, JS Binding to, Canvas, WebVX?, WebStream?, WebKCAM?, JavaScript Vision Processing, Sensor Fusion, Camera control, WebAudio, Native Path Rendering. Legend: native APIs shipping or Khronos working group underway; JavaScript API shipping, acceleration being developed or work underway; possible future JavaScript APIs or acceleration]

OpenMAX IL Media Acceleration

[Diagram: media frameworks such as StageFright use OpenMAX IL for low-level media acceleration]

OpenMAX IL enables diverse high-level media frameworks and applications to portably tap into silicon media acceleration

OpenMAX IL – Video, Audio and Imaging
• Enables arbitrary multimedia pipelines by plugging blocks together
  - Componentized architecture abstracts multimedia functionality block interfaces
• Wide variety of building blocks for imaging, video and audio functions
  - Encode, decode, apply an effect, capture, render, split, mix, etc
• Enables blocks from different sources to work together
  - Blocks can be implemented in software or hardware

Portable & reusable media processing building blocks

OpenMAX IL – Component Graphs
• Standardized component interfaces enable flexible media graphs
  - Including tunneling between components for execution efficiency
• Wide variety of components for imaging, video and audio functions
  - Encode, decode, apply an effect, capture, render, split, mix, etc

[Diagram: a File Reader (*.mp4 / *.3gp) sends AAC audio data to an Audio Decoder, Audio Renderer and speakers, and MPEG4/H.264 video data to a Video Decoder; decompressed video passes through a Video Scheduler to the Video Renderer and display; a Clock provides time for AV sync]

Example: MPEG-4 video synchronized with AAC audio decode

OpenSL ES and OpenMAX AL

[Diagram: overlapping feature sets of the two APIs. Advanced Audio: 3D Audio, Audio Effects, Advanced MIDI, Buffer queues. Multimedia: Video Playback, Video recording, Radio and RDS, Camera, Image capture & display. Shared: Audio playback, Audio Recording, Basic MIDI]

Both working groups collaborate to define common API functionality

OpenMAX AL - Object Oriented Media
• Connect media objects for processing of images and video with AV sync
  - Media Objects enable PLAY and RECORD of media
• Objects have control interfaces
  - Play, Seek, Rate, Audio, Video Post-processing, Metadata Extraction
  - Record, Camera, Video Encoder, Audio Encoder, Metadata Insertion, Radio, MIDI
• Extensive camera controls
  - Flash and metering modes, White balance and focusing controls
  - Exposure compensation, ISO Sensitivity, Shutter speed & Aperture, Zoom
• Analog radio controls
  - Tuning, RDS

[Diagram: an OpenMAX AL Media Object connects a data source (DSrc: URI, Memory, Camera, Audio Input, Analog Radio) to a data sink (DSnk: URI, Memory, Audio Mix, Display Window)]

OpenMAX AL Video Playback Example
• Create Engine object
  - To drive this session
• Create Audio Output Mix object
  - Method on Engine interface
  - Mix object drives audio output devices
• Create Media Player object
  - Method on Engine interface
  - Input is URI pointing to a local media file
  - Output drives display and audio output mix
• Register event callback
  - Method on Media Player interface
• Set PlayState to Playing
  - Method on Media Player interface
• Wait for end of file event
  - Via registered callback

[Diagram: the Application uses EngineItf on the Engine Object and PlayItf on the Media Player Object, which feeds the Output Mix Object; play events return through the Play Event Callback]
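The playback sequence above can be modeled in a few lines of plain Python. This is only a sketch of the object and callback flow, not the OpenMAX AL C API: the class names, the string play states, the `tick()` method that simulates playback progress, and the example URI are all assumptions made for illustration.

```python
from typing import Callable, List, Optional


class OutputMix:
    """Hypothetical stand-in for the Audio Output Mix object."""

    def __init__(self) -> None:
        self.received_ticks = 0


class MediaPlayer:
    """Hypothetical stand-in for a Media Player object and its play interface."""

    def __init__(self, uri: str, output_mix: OutputMix, length_ticks: int) -> None:
        self.uri = uri
        self.output_mix = output_mix
        self.length_ticks = length_ticks
        self.position = 0
        self.play_state = "STOPPED"
        self._callback: Optional[Callable[[str], None]] = None

    def register_callback(self, callback: Callable[[str], None]) -> None:
        self._callback = callback

    def set_play_state(self, state: str) -> None:
        self.play_state = state

    def tick(self) -> None:
        """Simulate one unit of playback (a real player advances on its own)."""
        if self.play_state != "PLAYING":
            return
        self.position += 1
        self.output_mix.received_ticks += 1
        if self.position >= self.length_ticks:
            self.play_state = "STOPPED"
            if self._callback is not None:
                self._callback("END_OF_FILE")


class Engine:
    """Hypothetical stand-in for the Engine object that creates other objects."""

    def create_output_mix(self) -> OutputMix:
        return OutputMix()

    def create_media_player(self, uri: str, output_mix: OutputMix,
                            length_ticks: int = 3) -> MediaPlayer:
        return MediaPlayer(uri, output_mix, length_ticks)


def play_to_end(uri: str) -> List[str]:
    engine = Engine()                               # 1. create Engine object
    mix = engine.create_output_mix()                # 2. create Audio Output Mix
    player = engine.create_media_player(uri, mix)   # 3. create Media Player
    events: List[str] = []
    player.register_callback(events.append)         # 4. register event callback
    player.set_play_state("PLAYING")                # 5. set PlayState to Playing
    while "END_OF_FILE" not in events:              # 6. wait for end of file
        player.tick()
    return events
```

Calling `play_to_end("file:///sdcard/clip.mp4")` walks the six steps in order; the application only learns that playback finished through the callback it registered.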

OpenMAX AL Profiles and Extensions
• Two profiles:
  - Media Player – media playback-only devices
  - Media Player/Recorder – full-featured media devices
• Some features optional in all profiles
  - E.g. Vibra, LED, Analog Radio, MIDI, Digital TV
  - APIs are consistent when hardware is available
• Vendor-specific extensions can be integrated into future API core specs

Camera controls Audio playback Audio recording Video playback Video recording Image rendering Image capture

Other OpenMAX AL Features
• Extensive camera controls
  - Flash and metering modes
  - White balance and focusing controls
  - Exposure compensation, ISO Sensitivity
  - Shutter speed & Aperture
  - Zoom (digital and optical)
• Analog radio controls
  - Tuning, RDS
• Audio routing
  - Application-selectable audio inputs and outputs, based on location, connectivity, etc.
  - I/O device capability querying
• Metadata extraction and insertion
  - Search/extract and insert/overwrite metadata in a variety of file formats

What’s New in OpenMAX AL 1.1
• Chaining of media objects
  - Explicit ordering of media processing steps
  - Transcoding
  - Audio replacement
• Dynamic sources and sinks
• Metadata support for streaming playback
• Content pipes
• Multiple version support
• Support for VP8 codec format
• New analog radio callback events for more fine-grained radio control
• New error codes for improved error handling

Audio Fragmentation
• Modern mobile devices have advanced audio capabilities
  - Including high-quality music and 3D gaming
• BUT - no standard way to access audio hardware acceleration
  - Even playing a simple sound on different platforms requires different code
• What about ALSA, OSS, GStreamer, OpenAL?
  - OpenAL is targeted at desktop PCs
  - OSS is obsolete, replaced by ALSA
  - ALSA is Linux specific
  - GStreamer is not an API – and not designed to be optimally hardware accelerated
  - All are released under variations of GNU Public License

OpenSL ES – Advanced Audio
• Create theater-quality audio experience
  - Even in a mobile device!
• Profiles reduce application customization
  - Applications can query available profiles
  - Develop to a specific profile or profile combination
• Full 3D audio functionality enhances any gaming experience
  - Perfect companion to OpenGL ES
• Designed for implementation by either a hardware or software solution
  - Unlike any other advanced audio API

OpenSL ES Profiles

Game-centric mobile devices: Advanced MIDI functionality, sophisticated audio capabilities such as 3D audio, audio effects, ability to handle buffers of audio, etc.

Music-centric mobile devices: High quality audio, ability to support multiple music audio codecs, audio streaming support

Basic mobile phones: Ring tone and alert tone playback (basic MIDI functionality), basic audio playback and record functionality, simple 2D audio games
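Because an application can query the available profiles and develop to a profile or a combination of profiles, the check reduces to a bitmask test. The sketch below is a plain-Python illustration; the `Profile` names and bit values are assumptions chosen for the example and are not taken from the OpenSL ES headers.

```python
from enum import IntFlag


class Profile(IntFlag):
    """Hypothetical profile flags for the three device classes above."""
    PHONE = 0x1
    MUSIC = 0x2
    GAME = 0x4


def supports(device_profiles: Profile, required: Profile) -> bool:
    """True only if every required profile is present on the device."""
    return (device_profiles & required) == required


# A music phone that implements the Phone and Music profiles.
music_phone = Profile.PHONE | Profile.MUSIC

can_run_music_app = supports(music_phone, Profile.MUSIC)
can_run_music_game = supports(music_phone, Profile.MUSIC | Profile.GAME)
```

Here `can_run_music_app` is true, while `can_run_music_game` is false because the device lacks the Game profile, so an application written to that combination would have to fall back or refuse to run.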

OpenSL ES – Object-Oriented Audio
• OpenSL ES has an object-oriented programming model
  - Simplifies common use cases – but also extensible
• Engine Objects are central to any OpenSL ES session
  - Objects created using methods on the Engine Object interfaces
• OpenSL ES Objects enable PLAY and RECORD of audio
  - Perform some operation on an input and emit the result as output
  - Can handle almost any audio use case
• Objects have control interfaces
  - For application

What’s new in OpenSL ES 1.1
• Buffer queues
• Content pipes
• Better control of 3D performance
• Explicit object ordering
• Dynamic sources and sinks
• Metadata support for streaming playback
• Multiple version support
• Extension configuration support
• And more…

Mobile Vision Acceleration = New Experiences

Need for advanced sensors and the acceleration to process them
• Computational Photography and Videography
• Face, Body and Gesture Tracking
• 3D Scene/Object Reconstruction
• Augmented Reality

Visual Computing = Graphics AND Vision

[Diagram: cycle between Data and Imagery, with Graphics Processing and Vision Processing as the two directions]

• New mobile visual sensors for MORE DATA
• Advanced mobile hardware for MORE PROCESSING
• Enables closer intertwining of real and virtual worlds

High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence, P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA

Vision Pipeline Challenges and Opportunities

Growing Camera Diversity - Capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Flexible sensor and camera control to generate required image stream

Diverse Vision Processors - Driving for high performance and low power
• Multi-core CPUs
• Programmable GPUs
• DSPs and DSP arrays
• Camera ISPs
• Dedicated vision IP blocks
Use best processing available for image stream processing – with code portability

Sensor Proliferation - Diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
  - GPS
  - WiFi (fingerprint)
  - Cellular trilateration
  - NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
Control/fuse vision data by/with all other sensor data on device

Vision Processing Power Efficiency
• Depth sensors = significant processing
  - Generate/use environmental information
• Wearables will need ‘always-on’ vision
  - With smaller thermal limit / battery than phones!
• GPUs have x10 CPU imaging power efficiency
  - GPUs architected for efficient pixel handling
• Traditional cameras have dedicated hardware
  - ISP = Image Signal Processor – on all SOCs today
• SOCs have space for more transistors
  - But can’t turn on at same time = Dark Silicon
• Potential for dedicated sensor/vision silicon
  - Can trigger full CPU/GPU complex

[Chart: Power Efficiency against Computation Flexibility: Dedicated Hardware X100, GPU Compute X10, Multi-core CPU X1. Other labels: Advanced Sensors, Wearables]

But how to program specialized processors? Performance and Functional Portability

OpenVX – Power Efficient Vision Acceleration
• Out-of-the-Box vision acceleration framework
  - Enables low-power, real-time applications
  - Targeted at mobile and embedded platforms
• Functional Portability
  - Tightly defined specification
  - Full conformance tests
• Performance portability across diverse HW
  - Higher-level abstraction hides hardware details
  - ISPs, Dedicated hardware, DSPs and DSP arrays, GPUs, Multi-core CPUs …
• Enables low-power, always-on acceleration
  - Can run solely on dedicated vision hardware
  - Does not require full SOC CPU/GPU complex to be powered on

[Diagram: multiple applications running on multiple vision accelerators]

OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
  - Each Node can be implemented in software or accelerated hardware
  - Nodes may be fused by the implementation to eliminate memory transfers
  - Processing can be tiled to keep data entirely in local memory/cache
• VXU Utility Library for access to single nodes
  - Easy way to start using OpenVX by calling each node independently
• EGLStreams can provide data and event interop with other Khronos APIs
  - BUT use of other Khronos APIs is not mandated

[Diagram: Native Camera Control feeds a graph of four OpenVX Nodes, whose output goes to Downstream Application Processing]

Example OpenVX Graph
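The idea of declaring the whole processing graph first, checking it, and only then running it can be sketched in plain Python. This is a toy model and not the OpenVX C API: `Graph`, `add_node`, `verify`, `process`, and the two pixel functions are hypothetical names, and images are simple nested lists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Image = List[List[int]]


@dataclass
class Node:
    name: str
    func: Callable[..., Image]
    inputs: List[str]
    output: str


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)

    def add_node(self, name: str, func: Callable[..., Image],
                 inputs: Iterable[str], output: str) -> None:
        self.nodes.append(Node(name, func, list(inputs), output))

    def verify(self, graph_inputs: Iterable[str]) -> List[Node]:
        """Order nodes by data dependency; fail on cycles or missing inputs."""
        available = set(graph_inputs)
        pending = list(self.nodes)
        ordered: List[Node] = []
        while pending:
            ready = [n for n in pending if all(i in available for i in n.inputs)]
            if not ready:
                raise ValueError("graph has a cycle or a missing input")
            for node in ready:
                ordered.append(node)
                available.add(node.output)
                pending.remove(node)
        return ordered

    def process(self, **graph_inputs: Image) -> Dict[str, Image]:
        """Run every node once, in verified order, and return all data objects."""
        data: Dict[str, Image] = dict(graph_inputs)
        for node in self.verify(data.keys()):
            data[node.output] = node.func(*(data[i] for i in node.inputs))
        return data


def abs_diff(a: Image, b: Image) -> Image:
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def threshold(img: Image, level: int = 10) -> Image:
    return [[255 if p > level else 0 for p in row] for row in img]


# Nodes are added out of order on purpose; verify() sorts them.
graph = Graph()
graph.add_node("threshold", threshold, ["diff"], "mask")
graph.add_node("abs_diff", abs_diff, ["prev", "curr"], "diff")

result = graph.process(prev=[[0, 0], [0, 0]], curr=[[0, 50], [5, 0]])
motion_mask = result["mask"]
```

Because the implementation sees the whole graph before any pixel is touched, it is free to reorder, fuse, or tile nodes; this toy version only reorders them.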

OpenVX 1.0 Function Overview
• Core data structures
  - Images and Image Pyramids
  - Processing Graphs, Kernels, Parameters
• Image Processing
  - Arithmetic, Logical, and statistical operations
  - Multichannel Color and BitDepth Extraction and Conversion
  - 2D Filtering and Morphological operations
  - Image Resizing and Warping
• Core Computer Vision
  - Pyramid computation
  - Integral Image computation
• Feature Extraction and Tracking
  - Histogram Computation and Equalization
  - Canny Edge Detection
  - Harris and FAST Corner detection
  - Sparse Optical Flow

OpenVX Specification Is Extensible
• Khronos maintains extension registry
• OpenVX 1.0 defines framework for creating, managing and executing graphs
• Focused set of widely used functions that are readily accelerated
• Implementers can add functions as extensions
• Widely used extensions adopted into future versions of the core

Example Graph - Stereo Machine Vision

[Diagram: OpenVX Graph with nodes: Stereo Rectify with Remap (one each for Camera 1 and Camera 2), Compute Depth Map (User Node), Detect and track objects (User Node) producing object coordinates, Image Pyramid, Compute Optical Flow, and Delay]

Tiling extension enables user nodes (extensions) to also optimally run in local memory

OpenVX and OpenCV are Complementary

Governance
  - OpenCV: Community driven open source with no formal specification
  - OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
  - OpenCV: No conformance tests for consistency and every vendor implements different subset
  - OpenVX: Full conformance test suite / process creates a reliable acceleration platform
Portability
  - OpenCV: APIs can vary depending on processor
  - OpenVX: Hardware abstracted for portability
Scope
  - OpenCV: Very wide, 1000s of imaging and vision functions; multiple camera APIs/interfaces
  - OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency
  - OpenCV: Memory-based architecture; each operation reads and writes memory
  - OpenVX: Graph-based execution; optimizable computation, data transfer
Use Case
  - OpenCV: Rapid experimentation
  - OpenVX: Production development & deployment

OpenVX Announcement
• Finalized OpenVX 1.0 specification released October 2014
  - www.khronos.org/openvx
• Full conformance test suite and Adopters Program immediately available
  - $20K Adopters fee ($15K for members) – working group reviews submitted results
  - Test suite exercises graph framework and functionality of each OpenVX 1.0 node
  - Approved Conformant implementations can use the OpenVX trademark
• Khronos working on open source sample implementation of OpenVX 1.0
  - Expected release on GitHub by end of 2014

Khronos APIs for Vision Processing
• Any compute API can be used for vision acceleration
  - OpenCL, OpenGL Compute Shaders …
• OpenVX is the only vision API that does not NEED a CPU/GPU complex
  - Can use any processor – from high-end GPU, through DSPs to hardware blocks
• Regardless of the underlying hardware – the application remains portable
  - The higher abstraction level of OpenVX protects app from hardware differences
• App portability to dedicated vision hardware and graph-based optimizations are the keys to achieving very low power vision processing

Many implementers may choose to use OpenCL or OpenGL Compute Shaders to implement OpenVX nodes and OpenVX to enable a developer to connect those nodes into a graph

[Diagram: Programmable Vision Processors and Dedicated Vision Hardware]

NVIDIA VisionWorks is Integrating OpenVX
• VisionWorks library contains diverse vision and imaging primitives
• Will leverage OpenVX for optimized primitive execution
• Can extend VisionWorks nodes through GPU-accelerated primitives
• Provided with sample library of fully accelerated pipelines

[Diagram: software stack, top to bottom: Applications and Middleware; Vision Pipeline Samples (Object Detection, SLAM, …) and 3rd Party Pipelines; VisionWorks Framework; VisionWorks Primitives (Corner Detection, Classifier, …) and 3rd Party primitives; CUDA Libraries and GPU Libraries; Tegra K1]

Need for Camera Control API - OpenKCAM
• Advanced control of ISP and camera subsystem – with cross-platform portability
  - Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
  - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
  - Cross sensor synch: e.g. synch of camera and MEMS sensors
  - Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
  - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing

Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture
Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

[Diagram: the camera subsystem feeds the Image Signal Processor (ISP), which delivers images through EGLStreams to Image/Vision Applications]

OpenKCAM is FCAM-based
• FCAM (2010) Stanford/Nokia, open source
• Capture stream of camera images with precision control
  - A pipeline that converts requests into image stream
  - All parameters packed into the requests - no visible state
  - Programmer has full control over sensor settings for each frame in stream
• Control over focus and flash
  - No hidden daemon running
• Control ISP
  - Can access supplemental statistics from ISP if available
• No global state
  - State travels with image requests
  - Every pipeline stage may have different state
  - Enables fast, deterministic state changes

Khronos coordinating with MIPI on camera control and data formats
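The request-based model described above, where every capture request carries its own complete settings and each returned frame is tagged with the request that produced it, can be illustrated with a short plain-Python sketch. `Shot`, `Frame`, `capture_stream`, and the toy brightness formula are assumptions made for this example; they are not the FCAM or OpenKCAM API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Shot:
    """A capture request: all sensor parameters travel with it."""
    exposure_us: int
    gain: float


@dataclass(frozen=True)
class Frame:
    """A captured image, tagged with the exact request that produced it."""
    shot: Shot
    brightness: float


def capture_stream(shots: Iterable[Shot], scene_luminance: int) -> Iterator[Frame]:
    """Convert a stream of requests into a stream of frames.

    The pipeline holds no settings of its own, so consecutive frames can use
    completely different parameters without any global state change.
    """
    for shot in shots:
        # Toy sensor model: brightness scales with exposure time and gain.
        yield Frame(shot=shot,
                    brightness=scene_luminance * shot.exposure_us * shot.gain)


# An exposure-bracketed burst: three requests, each with its own settings.
burst = [Shot(1000, 1.0), Shot(4000, 1.0), Shot(16000, 1.0)]
frames = list(capture_stream(burst, scene_luminance=2))
```

Each element of `frames` records the `Shot` that generated it, so downstream code never has to guess which settings were active for a given frame.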

Sensor Industry Fragmentation …

Low-level Sensor Abstraction API

Apps request semantic sensor information
StreamInput defines possible requests, e.g.
• Read Physical or Virtual Sensors e.g. “Game Quaternion”
• Context detection e.g. “Am I in an elevator?”

Apps Need Sophisticated Access to Sensor Data
Without coding to specific sensor hardware

Advanced Sensors Everywhere
Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors

Sensor Discoverability
Sensor Code Portability

• StreamInput processing graph provides optimized sensor data stream
• High-value, smart sensor fusion middleware can connect to apps in a portable way
• Apps can gain ‘magical’ situational awareness

Khronos APIs for Augmented Reality

AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together

[Diagram: MEMS Sensors feed Sensor Fusion; Advanced Camera Control and stream generation feeds Vision Processing; Application on CPUs, GPUs and DSPs; Audio Rendering; 3D Rendering and Video Composition on GPU. EGLStream streams data between APIs; precision timestamps on all sensor samples]

Summary
• Khronos is building a trio of interoperating APIs for portable / power-efficient vision and sensor processing
• OpenVX 1.0 specification is now finalized and released
  - Full conformance tests and Adopters program immediately available
  - Khronos open source sample implementation by end of 2014
  - First commercial implementations already close to shipping
• Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!
  - $15K annual membership fee for access to all Khronos API working groups
  - Well-defined IP framework protects your IP and conformant implementations
• More Information
  - www.khronos.org
  - [email protected]
  - @neilt3d

Questions?

• www.khronos.org • [email protected] • @neilt3d
