OpenCL Press Conference Tokyo, November 2011

Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group

© Copyright Khronos Group, 2011 - Page 1 Agenda • Introducing Khronos and announcing OpenCL 1.2! - Neil Trevett, NVIDIA • Guest Speakers - Imagination – Tony King-Smith - AMD Japan – Kazuo Hamaji - Fixstars – Akihiko Tomita • Q&A

© Copyright Khronos Group, 2011 - Page 2 Announcing OpenCL 1.2! • OpenCL 1.2 Specification publicly available today! - Significant updates - Khronos being responsive to developer requests - Updated OpenCL 1.2 conformance tests available - Multiple implementations underway • Backward compatible upgrade to OpenCL 1.1 - OpenCL 1.2 will run any OpenCL 1.0 and OpenCL 1.1 programs - OpenCL 1.2 platform can contain 1.0, 1.1 and 1.2 devices - Maintains embedded profile for mobile and embedded devices

© Copyright Khronos Group, 2011 - Page 3 Khronos - Connecting Software to Silicon • Creating open, royalty-free API standards - Focus on graphics, dynamic media, compute and sensor hardware • Low-level - just above raw silicon - “Foundation” functionality needed on every platform • Safe forum for industry cooperation - „By the industry for the industry‟ - Open to any company to join - IP framework to protect

members and industry enable software developers to turn silicon functionality into rich end user experiences

© Copyright Khronos Group, 2011 - Page 4 Apple Over 100 members – any company worldwide is welcome to join Board of Promoters

© Copyright Khronos Group, 2011 - Page 5

Khronos Family of Standards

StreamInput 3D Digital Asset Plugin-free Web Unified Sensor and

Exchange format 3D Web Content Compute Input Processing

accessibility

Authoring and and Authoring

A coordinated ecosystem of Cross platform Parallel Embedded and Vector 2D compute, graphics and media

desktop 3D Computing Mobile 3D standards and APIs

Application Application Acceleration Acceleration

Streaming Media Advanced Audio Surface Management

Khronos creates royalty-free specifications to meet real market needs and helps drive industry adoption across multiple platforms

© Copyright Khronos Group, 2011 - Page 6 Processor Parallelism

CPUs GPUs Multiple cores driving Emerging Increasingly general performance increases Intersection purpose data-parallel computing

Multi- Heterogeneous Graphics processor Computing APIs and programming Shading – e.g. OpenMP Languages

OpenCL is a programming framework for heterogeneous compute resources

© Copyright Khronos Group, 2011 - Page 7 The BIG Idea behind OpenCL • OpenCL execution model … - Define N-dimensional computation domain - Execute a kernel at each point in computation domain • C Derivative to write kernels – based on ISO C99 - APIs to discover devices in a system and distribute work to them • Targeting many types of device - GPUs, CPUs, DSPs, embedded systems, mobile phones.. Even FPGAs Traditional loops Data Parallel OpenCL void kernel void trad_mul(int n, dp_mul(global const float *a, const float *a, global const float *b, const float *b, global float *c) float *c) { { int id = get_global_id(0); int i; for (i=0; i

© Copyright Khronos Group, 2011 - Page 8 OpenCL Platform Model • One Host + one or more Compute Devices - Each Compute Device is composed of one or more Compute Units - Each Compute Unit is further divided into one or more Processing Elements

© Copyright Khronos Group, 2011 - Page 9 Anatomy of OpenCL • Platform Layer API - A hardware abstraction layer over diverse computational resources - Query, select and initialize compute devices - Create compute contexts and work-queues • Runtime API - Execute compute kernels - Manage scheduling, compute, and memory resources • Language Specification - C-based cross-platform programming interface - Subset of ISO C99 with language extensions - familiar to developers - Defined numerical accuracy - IEEE 754 rounding with specified maximum error - Online or offline compilation and build of compute kernel executables - Rich set of built-in functions

© Copyright Khronos Group, 2011 - Page 10 OpenCL Embedded Profile • OpenCL 1.0 has Embedded profile - no need for a separate “ES” spec • Almost identical functionality – some reduced precision requirements

• An always-on, connected, mobile device with multiple sensors, graphics and imaging PLUS a supercomputer – all in the palm of your hand will create a new wave of application opportunities…

A concept GPS phone processes images to recognize buildings and landmarks and uses the internet to supply relevant data

© Copyright Khronos Khronos 2009 Group, 2011 - Page 11 OpenCL Working Group Members

• Diverse industry participation – many industry experts - Processor vendors, system OEMs, middleware vendors, application developers - Academia and research labs, FPGA vendors • NVIDIA is chair, Apple is specification editor

Apple

© Copyright Khronos Group, 2011 - Page 12 OpenCL Milestones • Six months from proposal to released OpenCL 1.0 specification - Due to a strong initial proposal and a shared commercial incentive • Multiple conformant implementations shipping - For CPUs and GPUs on multiple OS • 18 month cadence between OpenCL 1.0, OpenCL 1.1 and now OpenCL 1.2 - Backwards compatibility protect software investment

OpenCL 1.2 Specification and OpenCL 1.0 conformance tests OpenCL working conformance tests group formed released released! Dec08 Jun10 Jun08 May09 Nov11 OpenCL 1.0 OpenCL 1.1 released Specification and conformance tests released

© Copyright Khronos Group, 2011 - Page 13 Major New Features in OpenCL 1.2 • Partitioning Devices - Applications can partition a device into sub-devices - Enables computation to be assigned to specific compute units - Reserve a part of the device for use for high priority/latency-sensitive tasks or effectively use shared hardware resources such as a cache • Separate compilation and linking of objects - Provides the capabilities and flexibility of traditional compilers - Create a library of OpenCL programs that other programs can link to • Enhanced Image Support - Added support for 1D images, 1D & 2D image arrays - OpenGL sharing extension now enables an OpenCL image to be created from an OpenGL 1D texture, 1D and 2D texture arrays

© Copyright Khronos Group, 2011 - Page 14 More Major New Features in OpenCL 1.2 • Custom devices and built-in kernels - Drive specialized custom devices from OpenCL – even if not programmable - Can enqueue built-in kernels to custom devices alongside OpenCL kernels • DX9 Media Surface Sharing - Efficient sharing between OpenCL and DirectX 9 or DXVA media surfaces • DX11 surface sharing - Efficient sharing between OpenCL and DirectX 11 surfaces • Installable Client Drivers (optional) - Portably handling multiple installed implementations from multiple vendors • And many other updates and additions..

© Copyright Khronos Group, 2011 - Page 15 Partitioning Devices Host • Devices can be partitioned into sub-devices - More control over how computation Compute Device is assigned to compute units Compute Device • Sub-devices may be used just like Compute Device

a normal device Compute Compute - Create contexts, building programs, Unit Unit further partitioning and creating Compute Compute Unit Unit command-queues Compute Compute • Three ways to partition a device Unit Unit - Split into equal-size groups - Provide list of group sizes - Group devices sharing a part of a Sub-device #1 Sub-device #2 Real-time Mainline cache hierarchy processing tasks processing tasks

© Copyright Khronos Group, 2011 - Page 16 Custom Devices and Built-in Kernels • Embedded platforms often contain specialized hardware and firmware - That cannot support OpenCL C • Built-in kernels can represent these hardware and firmware capabilities - Such as video encode/decode • Hardware can be integrated and controlled from the OpenCL framework - Can enqueue built-in kernels to custom devices alongside OpenCL kernels • FPGAs are one example of device that can expose built-in kernels - Latest FPGAs can support full OpenCL C as well • OpenCL becomes a powerful coordinating framework for diverse resources Built-in kernels enable - Programmable and non-programmable devices control of specialized controlled by one run-time processors and hardware from OpenCL run-time

© Copyright Khronos Group, 2011 - Page 17 Installable Client Driver • Analogous to OpenGL ICDs in use for many years - Used to handle multiple OpenGL implementations installed on a system • Optional extension - Platform vendor will choose whether to use ICD mechanisms • Khronos OpenCL installable client driver loader - Exposes multiple separate vendor installable client drivers (Vendor ICDs) • Application can access all vendor implementations - The ICD Loader acts as a de-multiplexor

Vendor #1 ICD Loader ensures OpenCL multiple implementations Vendor #2 are installed cleanly Application ICD Loader OpenCL ICD Loader enables application to use any of the installed implementations Vendor #3 OpenCL

© Copyright Khronos Group, 2011 - Page 18 OpenCL Desktop Implementations • http://developer.amd.com/zones/OpenCLZone/ • http://software.intel.com/en-us/articles/opencl-sdk/ • http://developer.nvidia.com/opencl

© Copyright Khronos Group, 2011 - Page 19 OpenCL Books – Available Now! • OpenCL Programming Guide - The “Red Book” of OpenCL - http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642 • OpenCL in Action - http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173/ • with OpenCL - http://www.amazon.com/Heterogeneous-Computing-with-OpenCL-ebook/dp/B005JRHYUS • The OpenCL Programming Book - http://www.fixstars.com/en/opencl/book/

© Copyright Khronos Group, 2011 - Page 20 Spec Translations • Japanese OpenCL 1.1 spec translation available today - http://www.cutt.co.jp/book/978-4-87783-256-8.html - Valued partnership between Khronos and CUTT in Japan • Working on OpenCL 1.2 specification translations - Japanese, Korean and Chinese

© Copyright Khronos Group, 2011 - Page 21 Khronos OpenCL Resources • OpenCL is 100% free for developers - Download drivers from your silicon vendor • OpenCL Registry - www.khronos.org/registry/cl/ • OpenCL 1.2 Reference Card - PDF version - http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf • Online Man pages - http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/ • OpenCL Developer Forums - Give us your feedback! - www.khronos.org/message_boards/

© Copyright Khronos Group, 2011 - Page 22 Looking Forward

OpenCL-HLM (High Level Model) Exploring high-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities

Long-term Core Roadmap Exploring enhanced memory and execution model flexibility to catalyze and expose emerging hardware capabilities

WebCL Bring parallel computation to the Web through a JavaScript binding to OpenCL

OpenCL-SPIR (Standard Parallel Intermediate Representation) Exploring low-level Intermediate Representation for code obfuscation/security and to provide target back-end for alternative high-level languages

© Copyright Khronos Group, 2011 - Page 23 Leveraging Native API Investment into HTML5 • HTML5 evolving into cross-platform programming platform - Gradually exposing complete system capabilities • Opportunity to synergize Web and native APIs development - Leverage native API investments, reduce developer learning cycles • Khronos and W3C creating close liaison

WebMAX? Device and WebAudio Sensor APIs Camera HTML and Advanced Device control and Browser JavaScript Orientation video Composition JavaScript Audio Working processing Groups

StreamInput Native

Native APIs shipping JavaScript API shipping Possible future or working group underway or working group underway JavaScript APIs

© Copyright Khronos Group, 2011 - Page 24 WebCL – Parallel Computing for the Web • Khronos launching new WebCL initiative - First announced in March 2011 - API definition already underway • JavaScript binding to OpenCL - Security is top priority • Many use cases - Physics engines to complement WebGL - Image and video editing in browser • Stay close to the OpenCL standard - Maximum flexibility - Foundation for higher-level middleware

© Copyright Khronos Group, 2011 - Page 25 WebCL Security Robustness • WebCL will be 100% secure - Just like WebGL – essential for any Web API WebCL • Robustness provided through: Robustness - WebCL language restrictions (disallowing pointers) - Memory protection - Provisions for detecting and preventing denial of service • OpenCL Robustness Extension - Can be implemented BY or OVER OpenCL - Most efficient if in OpenCL implementation itself WebCL Robustness Layer • WebCL Robustness Layer OpenCL Robustness Layer - Allows query for OpenCL Robustness extension - Provides software support for features not supported in OpenCL device

© Copyright Khronos Group, 2011 - Page 26 WebCL Open Process and Resources • Khronos open process to engage Web community - Public specification drafts, mailing lists, forums - http://www.khronos.org/webcl/ - [email protected] • Khronos welcomes new members to define and drive WebCL - [email protected] open sourced prototype for Firefox in May 2011 (LGPL) - http://webcl.nokiaresearch.com • Samsung open sourced prototype for WebKit in July 2011 (BSD) - http://code.google.com/p/webcl/

• Deformation Demo: • Calculates and renders transparent and reflective deformed spheres on top of photo background • Performance comparison - JS: ~1 FPS - WebCL: 87-116 FPS • http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc © Copyright Khronos Group, 2011 - Page 27

Visual Computing Ecosystem

High performance compute and graphics interop – buffer and events

Compute and mobile APIs interoperate through EGL

JavaScript bindings to OpenCL Parallel computation in HTML5

© Copyright Khronos Group, 2011 - Page 28 In Summary • OpenCL 1.2 Specification released by Khronos today! - www.khronos.org/opencl/ • Updates to OpenCL to meet developer needs and requests - Backwards compatible with previous versions • Extensive conformance tests already available - Multiple implementations underway • OpenCL forms a vital part of the open visual computing ecosystem - Crucial to bring parallel computation to mobile and web platforms • Strong ongoing investment in OpenCL’s roadmap - Core – exploring expanding OpenCL memory and execution models - OpenCL-HLM – parallel programming expressed through high-level languages - OpenCL-SPIR – low-level intermediate representations - WebCL – bring OpenCL capabilities to HTML5 browsers

© Copyright Khronos Group, 2011 - Page 29