GPU Acceleration for the Web: State of the Union
Neil Trevett
Khronos President
NVIDIA Vice President Mobile Content

© Copyright 2014 - Page 1

Khronos Connects Software to Silicon

Open consortium creating ROYALTY-FREE, OPEN STANDARDS for hardware acceleration

Defining the roadmap for low-level silicon interfaces needed on every platform

Graphics, compute, rich media, vision, sensor and camera processing

Rigorous specifications AND conformance tests for cross-vendor portability

Acceleration APIs BY the Industry FOR the Industry

Well over a BILLION people use Khronos APIs every day…

http://accelerateyourworld.org/

Mobile Web is a Real Time Application

 - Apple iPhone: 320x480 = 153K pixels, 163 DPI
 - Apple iPad: 1024x768 = 786K pixels, 132 DPI
 - Apple iPad Mini: 2048x1536 = 3100K pixels, 326 DPI

Buttery smooth touch interaction needs continuous 60Hz updates.

In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY.

Need GPU Acceleration for Web Rendering!
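The "factor of TWENTY" follows directly from the resolutions on this slide. A minimal Python sketch of the arithmetic, assuming each pixel is written exactly once per frame at 60Hz; that is only a lower bound, since real rendering adds overdraw and blending. The names `SMALLEST`, `LARGEST` and `pixels_per_second` are illustrative.

```python
# Screen resolutions quoted on the slide, as (width, height).
SMALLEST = (320, 480)    # 153K pixels
LARGEST = (2048, 1536)   # 3100K pixels
REFRESH_HZ = 60          # continuous updates for smooth touch interaction


def pixel_count(resolution):
    width, height = resolution
    return width * height


def pixels_per_second(resolution, hz=REFRESH_HZ):
    """Lower bound on pixel throughput: each pixel written once per frame."""
    return pixel_count(resolution) * hz


if __name__ == "__main__":
    growth = pixel_count(LARGEST) / pixel_count(SMALLEST)
    print(f"growth: {growth:.2f}x")      # growth: 20.48x
    print(pixels_per_second(LARGEST))    # 188743680
```

Roughly 189 million pixel writes per second, before any overdraw, is the scale of work that motivates moving web rendering onto the GPU.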

Access to 3D on Over 2 BILLION Devices

1.9B Mobiles / year

300M Desktops / year

1B Browsers / year

Source: Gartner (December 2013)

Pervasive WebGL

WebGL on EVERY major desktop browser, and coming to ALL major mobile browsers

Portable (NO source change) 3D applications are possible for the first time

http://caniuse.com/#feat=webgl

WebGL Tool/Engine Ecosystem

Epic Citadel - WebGL HTML 5 Benchmark (Firefox 22)

https://www.youtube.com/watch?v=l9KRBuVBjVo

WebGL on Mobile: Unigine Engine Demo

http://crypt-webgl.unigine.com/

WebGL Roadmap

[Figure: OpenGL ES timeline with the corresponding WebGL versions]
 - OpenGL ES 1.0 (2003): fixed function pipeline
 - OpenGL ES 1.1 (2004): driver update
 - OpenGL ES 2.0 (2007): programmable shaders (silicon update)
 - OpenGL ES 3.0 (2012): 32-bit integers and floats, NPOT, 3D/depth textures, texture arrays, multiple render targets (silicon update)
 - OpenGL ES 3.1 (2014): compute shaders (driver update)
 - WebGL 1.0 (2011), based on OpenGL ES 2.0
 - WebGL 2.0, based on OpenGL ES 3.0: under development

WebGL 2.0 - Open Review http://www.khronos.org/registry/webgl/specs/latest/2.0/

glTF - Transmitting 3D Assets to WebGL Apps
• ‘GL Transmission Format’
  - Runtime asset format for WebGL, OpenGL ES, and OpenGL applications
• Efficient Representation = Small Size AND Minimal Load Processing
  - JSON for scene structure and other high-level constructs
  - Binary mesh and animation data
  - Little or no processing to drop glTF data into the client application
• Runtime Neutral
  - Can be created and used by any app or runtime
• Khronos is prototyping a standards-based pipeline
  - Conditioning of COLLADA assets into glTF for WebGL applications

[Figure: glTF pipeline from Authoring to Playback]
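To illustrate the split between JSON scene structure and binary mesh data, here is a minimal Python sketch that reads vertex positions out of a binary buffer described by JSON. The field names (`accessors`, `bufferViews`, `componentType`, `count`, `type`) loosely follow later glTF releases and are an assumption here; the glTF 0.8 schema mentioned in this deck may differ. The sketch also assumes tightly packed data and ignores interleaved (strided) layouts.

```python
import json
import struct

GL_FLOAT = 5126  # OpenGL enum value for GL_FLOAT
COMPONENTS = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4}


def read_float_accessor(doc, blob, index):
    """Return the elements of a tightly packed float accessor as tuples."""
    accessor = doc["accessors"][index]
    if accessor["componentType"] != GL_FLOAT:
        raise ValueError("this sketch only handles float data")
    view = doc["bufferViews"][accessor["bufferView"]]
    width = COMPONENTS[accessor["type"]]
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    # Little-endian 32-bit floats, read straight out of the binary blob.
    flat = struct.unpack_from("<%df" % (accessor["count"] * width), blob, start)
    return [flat[i:i + width] for i in range(0, len(flat), width)]


if __name__ == "__main__":
    # JSON carries the structure; the binary blob carries the mesh data.
    doc = json.loads("""{
      "bufferViews": [{"buffer": 0, "byteOffset": 0, "byteLength": 36}],
      "accessors": [{"bufferView": 0, "componentType": 5126,
                     "count": 3, "type": "VEC3"}]
    }""")
    blob = struct.pack("<9f", 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    print(read_float_accessor(doc, blob, 0))
    # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```

Loading the mesh is a single binary unpack with no text-to-number parsing, which is the "minimal load processing" property; a WebGL client would typically hand the same bytes to a vertex buffer directly.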

COLLADA and glTF Ecosystem

[Figure: COLLADA and glTF ecosystem]
 - OpenCOLLADA Importer/Exporter
 - COLLADA Conformance Tests
 - Tool interop and other authoring formats
 - COLLADA2GLTF Translator, on GitHub
 - Web-based Tools
 - Three.js glTF Importer
 - Rest3D initiative
 - Pervasive WebGL deployment

Open Source Resources for glTF
• COLLADA2GLTF open-source converter is gaining robustness and momentum
  - https://github.com/KhronosGroup/glTF/tree/master/converter/COLLADA2GLTF
  - Binaries are available on GitHub for easy use
• Three.js glTF loader
  - https://github.com/KhronosGroup/glTF/tree/master/loaders/threejs
  - Most glTF features are already supported
• Open specification; open process
  - Spec and sample code: https://github.com/KhronosGroup/glTF
  - All features backed up by multiple implementations in code
  - glTF 0.8 schema available; getting very close to glTF 1.0!
• Converter using Open3DGC to compress 3D meshes, skinning, animations
  - Available at https://github.com/fabrobinet/glTF-webgl-viewer


glTF and Compression Extension
• Benchmarking 3D compression formats for implementation as glTF extensions
  - Baseline is GZIP
  - MPEG royalty-free Scalable Complexity 3D Mesh Compression codec (MPEG-SC3DMC)
  - Open3DGC JavaScript and C/C++ implementation
  - WebGL-loader is a lightweight compression format for WebGL content

Format                 CAD Models     3D Scanned Models   MPEG dataset
                       (Mbytes)       (Mbytes)            (Mbytes)
OBJ                    1310 (100%)    736 (100%)          600 (100%)
Gzip                   336 (26%)      204 (28%)           157 (26%)
Webgl-loader           219 (17%)      117 (16%)           103 (17%)
Open3DGC               67 (5%)        22 (3%)             22 (4%)
Webgl-loader + Gzip    80 (6%)        38 (5%)             26 (4%)

Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.7x more efficient than webgl-loader + Gzip.
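A short Python check of the efficiency figures, computing per-dataset size ratios from the table's own numbers. Because the table reports rounded Mbyte values, the ratios are approximate; `SIZES` and `ratio` are illustrative names.

```python
# Compressed sizes in Mbytes, copied from the table
# (columns: CAD Models, 3D Scanned Models, MPEG dataset).
SIZES = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}
DATASETS = ("CAD Models", "3D Scanned Models", "MPEG dataset")


def ratio(larger, smaller):
    """Per-dataset size of `larger` divided by size of `smaller`, to 1 decimal."""
    return [round(a / b, 1) for a, b in zip(SIZES[larger], SIZES[smaller])]


if __name__ == "__main__":
    for fmt in ("Gzip", "Webgl-loader", "Webgl-loader + Gzip"):
        print(f"{fmt} vs Open3DGC:", dict(zip(DATASETS, ratio(fmt, "Open3DGC"))))
```

On these numbers Open3DGC output is about 5.0x-9.3x smaller than Gzip, 3.3x-5.3x smaller than webgl-loader alone, and 1.2x-1.7x smaller than webgl-loader + Gzip.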

Motivation for WebCL
• Parallel acceleration for compute-intensive web applications
  - Portable and efficient access to heterogeneous multicore devices in JavaScript
• Typical Use Cases
  - 3D asset codecs, video codecs and processing, imaging and vision processing
  - Physics for WebGL games, online data visualization, Augmented Reality
• WebCL 1.0 specification officially released at GDC March 2014
  - https://www.khronos.org/webcl

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

WebCL - Heterogeneous Computing for the Web
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language: subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices
• WebCL defines JavaScript binding to the OpenCL APIs
  - Enables initiation of OpenCL C Kernels from within the browser

[Figure: JavaScript calls the OpenCL Platform API (to query, select and initialize compute devices) and the OpenCL Runtime API (to build and execute kernels across multiple devices); OpenCL C kernel code runs on GPU, DSP, CPU and other HW]
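To make the kernel/runtime split concrete, here is a CPU-only Python emulation of the OpenCL execution model: a kernel is a function of a global work-item ID, and the runtime launches it once per ID over a 1-D index space. The names `vector_add` and `enqueue_nd_range` are hypothetical; this does not use the WebCL JavaScript API, and it runs work-items sequentially where a real device would run them in parallel.

```python
from typing import Callable, List, Sequence


def vector_add(gid: int, a: Sequence[float], b: Sequence[float], out: List[float]) -> None:
    """Kernel body: each work-item handles exactly one element, chosen by its global ID."""
    out[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical 1-D launcher: invokes the kernel once per global ID.

    A GPU or DSP would execute these work-items in parallel; here they run in order.
    """
    for gid in range(global_size):
        kernel(gid, *args)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(a)
    enqueue_nd_range(vector_add, len(a), a, b, out)
    print(out)  # [11.0, 22.0, 33.0, 44.0]
```

In OpenCL C the same kernel would obtain its index from the built-in `get_global_id(0)` rather than taking `gid` as a parameter.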

WebGL/WebCL Ecosystem

[Figure: WebGL/WebCL software stack]
 - Content: downloaded from the Web (JavaScript, HTML, CSS, ...)
 - JavaScript Middleware: can make WebGL and WebCL accessible to non-experts, e.g. the three.js library (http://threejs.org/), used by the majority of WebGL content
 - Browser (HTML5, JavaScript / CSS): provides WebGL and WebCL alongside other HTML5 technologies; no plug-in required
 - OS Provided Drivers: WebGL uses OpenGL ES 2.0, or ANGLE for OpenGL ES 2.0 over DX9; WebCL uses OpenCL 1.X

Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem.

WebCL - Designed-in Architectural Security
• Leverages OpenCL 1.2 robustness/security extensions
  - Context Termination: to prevent DoS from long-running kernels
  - Memory Initialization: no leakage from out of bounds memory access
• API and Language Restrictions to ensure no unsafe behavior is possible
  - Structures are not supported as kernel arguments
  - Kernel names must be less than 256 characters
  - Mapping of CL memory objects into host memory space is not supported
  - Program binaries are not supported
  - Some OpenCL API functions & builtin functions may require translation
• WebCL Kernel Validator
  - Open source on GitHub
  - Static and dynamic kernel checking
  - Verifies memory accesses are inside valid memory areas
  - Run-time checks injected in code if necessary
  - https://github.com/KhronosGroup/webcl-validator
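The ideas behind memory initialization, injected run-time checks and the kernel-name restriction can be sketched in Python. `CheckedBuffer` and `kernel_name_allowed` are hypothetical, and the policy shown (out-of-range reads return 0, out-of-range writes are dropped) is one plausible choice made for illustration, not a description of how webcl-validator actually rewrites kernels.

```python
MAX_KERNEL_NAME_LENGTH = 256  # kernel names must be shorter than this


def kernel_name_allowed(name):
    """API restriction from the slide: kernel names under 256 characters."""
    return len(name) < MAX_KERNEL_NAME_LENGTH


class CheckedBuffer:
    """Hypothetical device buffer with zeroed storage and checked accesses."""

    def __init__(self, size):
        # Memory initialization: new buffers never expose stale contents.
        self._data = [0] * size

    def load(self, index):
        # Injected run-time check: out-of-range reads return 0.
        if 0 <= index < len(self._data):
            return self._data[index]
        return 0

    def store(self, index, value):
        # Injected run-time check: out-of-range writes are dropped.
        if 0 <= index < len(self._data):
            self._data[index] = value


if __name__ == "__main__":
    buf = CheckedBuffer(4)
    buf.store(2, 7)
    buf.store(99, 1)  # outside the buffer: ignored
    print([buf.load(i) for i in range(-1, 5)])  # [0, 0, 0, 7, 0, 0]
    print(kernel_name_allowed("a" * 255), kernel_name_allowed("a" * 256))  # True False
```

The explicit `0 <= index` test matters in Python too: without it, a negative index would silently wrap around to the end of the list, which is exactly the kind of out-of-bounds access the checks are meant to stop.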

WebCL Open Source Resources
• Implementations
  - Nokia - Firefox extension (Mozilla Public License 2.0) - https://github.com/toaarnio/webcl-firefox
  - Samsung - WebKit (BSD) - https://github.com/SRA-SiliconValley/webkit-webcl
  - Motorola - Uses Node.js (BSD) - https://github.com/Motorola-Mobility/node-webcl
  - AMD - Chromium build - https://github.com/amd/Chromium-WebCL
• WebCL Kernel Validator (open source)
  - https://github.com/KhronosGroup/webcl-validator
• OpenCL to WebCL Translator
  - https://github.com/wolfviking0/webcl-translator
• WebCL Conformance Tests
  - https://github.com/KhronosGroup/WebCL-conformance/

[Screenshots: WebCL demos based on Apple QJulia and on Iñigo Quilez's Shader Toy; http://fract.ured.me/]

Path Rendering Acceleration

Offload the CPU so the application can run as fast as possible
Make maximum use of the GPU for best performance and power

• CPU creates paths; CPU renders paths
  - Software scanline renderers can be high quality and portable
  - CPU has to process the complete renderer pipeline, stealing cycles from the application
  - Software rendering limits performance
• CPU creates paths; CPU tessellates paths into polygons; GPU uses standard 3D commands to process polygons
  - Tessellation loads the CPU, stealing cycles from the application, so performance is sometimes slower than software alone
  - Tessellation consumes a lot of data and memory bandwidth = power
  - Quality can be compromised due to tessellation accuracy
• CPU creates paths; new OpenGL path commands are defined to process paths directly on the GPU
  - Maximum CPU offload
  - Compact data format sent to GPU
  - GPU provides excellent performance and power
  - GPU can increase quality and functionality
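To see why CPU tessellation "consumes a lot of data", here is a minimal Python sketch that flattens one cubic Bezier segment into line segments by uniform parameter stepping. Uniform stepping is the simplest possible tessellator and is chosen only for illustration; production tessellators subdivide adaptively, and NV_path_rendering avoids tessellation entirely. The function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier in Bernstein form at parameter t in [0, 1]."""
    u = 1.0 - t
    b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def flatten_cubic(p0: Point, p1: Point, p2: Point, p3: Point, segments: int) -> List[Point]:
    """Uniformly tessellate the curve into `segments` lines (segments + 1 vertices)."""
    return [cubic_point(p0, p1, p2, p3, i / segments) for i in range(segments + 1)]


if __name__ == "__main__":
    curve = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
    for segments in (8, 64, 512):
        vertices = flatten_cubic(*curve, segments)
        # Floats sent as a path (4 control points) vs. as a tessellated polyline.
        print(segments, "segments:", 2 * len(curve), "floats as a path,",
              2 * len(vertices), "floats tessellated")
```

The path itself stays at 8 floats, while the polyline grows with the accuracy demanded; zooming in on a tessellated curve either exposes its facets or forces re-tessellation on the CPU.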

© 2014 NVIDIA - Page 19

NV_path_rendering OpenGL Extension
• Brings path processing directly to OpenGL
  - OpenGL processes paths as a fundamental primitive
  - No tessellation necessary
• Goals
  - Functionally complete for key standards: SVG, Canvas, PostScript, etc.
  - Much faster: often 4x to 100x faster than CPUs
  - Enhanced quality: can avoid approximations needed by CPU renderers
  - Lower power by leveraging GPU hardware
  - New functionality, e.g. mixing 2D paths with 3D and programmable shading

Stencil then Cover Approach
• Create a path object and pass it directly to the GPU
  - Cubic & quadratic Bezier segments, line segments, partial elliptical arcs
• Step 1: GPU “Stencils” the path object into the stencil buffer
  - GPU provides massively parallel stenciling of filled or stroked paths
  - Calculates winding rule or containment at every sub-pixel sample in parallel
• Step 2: “Cover” the path object and stencil test against its coverage
  - Test against path coverage determined in Step 1 and shade the path
  - Uses GPU MSAA anti-aliasing: 8 or 16 samples/pixel gives good quality

[Figure: Step 1: Stencil; Step 2: Cover; repeat]
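A CPU-only Python sketch of the two passes on a tiny sample grid, assuming a path already flattened to a closed polygon and one sample per pixel at pixel centers. Here Step 1 computes each sample's winding number directly by counting signed edge crossings; on the GPU the same per-sample counts are accumulated in the stencil buffer in parallel, and curved segments need no flattening. `stencil_then_cover` and its `fill_rule` argument are hypothetical names, not the NV_path_rendering API.

```python
def winding_number(px, py, polygon):
    """Signed count of polygon edges crossed by a ray from (px, py) toward +x."""
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 <= py < y1) or (y1 <= py < y0):
            # x where this edge crosses the horizontal line y = py
            x = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x > px:
                wn += 1 if y1 > y0 else -1
    return wn


def stencil_then_cover(polygon, width, height, fill_rule="nonzero"):
    # Step 1 (stencil): store the winding number for every sample.
    stencil = [[winding_number(x + 0.5, y + 0.5, polygon) for x in range(width)]
               for y in range(height)]
    # Step 2 (cover): shade samples that pass the stencil test, and clear the
    # stencil as the GPU does so the next path starts from zero.
    coverage = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            s = stencil[y][x]
            inside = s != 0 if fill_rule == "nonzero" else s % 2 == 1
            if inside:
                coverage[y][x] = 1
            stencil[y][x] = 0
    return coverage


if __name__ == "__main__":
    square = [(1, 1), (3, 1), (3, 3), (1, 3)]
    for row in stencil_then_cover(square, 4, 4):
        print(row)
```

Tracing the same square twice gives an interior winding number of 2, which the nonzero rule fills and the even-odd rule leaves empty; that is the per-sample decision the stencil test makes during the cover step.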

Enhanced Quality on GPU

[Figure: renderings compared across Skia, Cairo, Qt and NV_path_rendering]
• Stroking approximations on CPU are avoided by GPU
  - CPU artifacts shown: "weird big holes", "feathers?"
• Antialiasing on GPU for better antialiasing
  - CPU: regular grid sampling, sub-optimal
  - GPU offers jittered sampling pattern for free
• GPUs great at texturing: mip-mapping, anisotropic filtering, wrap modes
  - Moiré artifacts (Qt) vs. proper gradient filtering on GPU
• Eliminate conflation artifacts
  - Conflation artifacts on CPU: color bleeding (Cairo; similar for Qt & Skia)
  - Conflation free on GPU: multiple color AND stencil samples per pixel

New GPU Functionality
• Projective transformation
• Fast arbitrary path clipping
• Programmable shading: paint in GLSL, e.g. light source position for bump mapping
  - For filter and blending acceleration
• Mixing depth-tested text, 3D, and paths
• Fully sRGB-correct rendering
  - Linear RGB: transition between saturated red and saturated blue has a dark purple region
  - sRGB: perceptually smooth transition from saturated red to saturated blue
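The red-to-blue gradient example can be checked numerically with the standard sRGB transfer functions (IEC 61966-2-1). This Python sketch compares blending the stored sRGB-encoded values directly against decoding to linear light, blending, and re-encoding, and measures the total light the display emits at the midpoint. Function names are illustrative.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode one linear-light channel in [0, 1] as sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


RED = (1.0, 0.0, 0.0)   # saturated red, sRGB-encoded
BLUE = (0.0, 0.0, 1.0)  # saturated blue, sRGB-encoded


def naive_mix(t):
    """Blend the encoded values as if they were linear."""
    return lerp(RED, BLUE, t)


def srgb_correct_mix(t):
    """Decode to linear light, blend, then re-encode to sRGB."""
    linear = lerp(tuple(map(srgb_to_linear, RED)), tuple(map(srgb_to_linear, BLUE)), t)
    return tuple(linear_to_srgb(c) for c in linear)


def emitted_light(encoded):
    """Total linear-light intensity across the three channels."""
    return sum(srgb_to_linear(c) for c in encoded)


if __name__ == "__main__":
    print(round(emitted_light(naive_mix(0.5)), 3))         # 0.428
    print(round(emitted_light(srgb_correct_mix(0.5)), 3))  # 1.0
```

At the midpoint, blending the encoded values emits only about 43% of the light of either endpoint, which shows up as a darker purple band in the middle of the gradient; blending in linear light keeps the total constant across the transition.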

NVPR Resources and Adoption
• Adobe Illustrator CC shipping with NVPR!
  - Significant application acceleration
• Developer resources
  - http://developer.nvidia.com/nv-path-rendering
  - Open source SVG renderer: pr_svg
  - Whitepapers, FAQ, specification
  - NVprSDK: software development kit
  - Email: [email protected]
• Availability
  - Shipping in Release 275 drivers and beyond
  - All CUDA-capable NVIDIA GPUs, including mobile
• Proposed open standard
  - Submitted royalty-free to Khronos
  - For use in any OpenGL-family API including WebGL

Path Rendering Acceleration on Android Tablet

Vision Pipeline Challenges and Opportunities

Growing Camera Diversity: capturing color, range and lightfields
  - Camera sensors >20MPix
  - Novel sensor configurations
  - Stereo pairs
  - Plenoptic Arrays
  - Active Structured Light
  - Active TOF
  - Flexible sensor and camera control to generate the required image stream

Diverse Vision Processors: driving for high performance and low power
  - Camera ISPs
  - Dedicated vision IP blocks
  - DSPs and DSP arrays
  - Programmable GPUs
  - Multi-core CPUs
  - Use best processing available for image stream processing, with code portability

Sensor Proliferation: diverse sensor awareness of the user and surroundings
  - Light / Proximity
  - 2 cameras
  - 3 microphones
  - Touch
  - Position: GPS, WiFi (fingerprint), Cellular trilateration, NFC/Bluetooth Beacons
  - Accelerometer
  - Magnetometer
  - Gyroscope
  - Pressure / Temp / Humidity
  - Control/fuse vision data by/with all other sensor data on device

Khronos and W3C Cooperation
• Khronos and W3C liaison for Web APIs
  - Leverage proven native APIs
  - Fast API development/deployment
  - Designed by hardware community
  - Familiar foundation reduces developer learning curve
• W3C Augmented Web Community Group discussing many of these vision issues for the Web
  - e.g. leveraging WebRTC in the short term
  - http://w3.org/community/ar

[Figure: candidate JavaScript APIs over native acceleration: WebVX?, WebSL?, WebKCAM?, WebStream?, JS Binding to Canvas, WebAudio; covering JavaScript vision processing, sensor fusion, camera control and path rendering. Legend: JavaScript API shipping, or work underway; native APIs shipping, or Khronos working group; possible future JavaScript APIs or acceleration; native path rendering acceleration being developed.]

Questions?

• www.khronos.org • [email protected] • @neilt3d
