
Bringing the Power of the GPU to the Web
Neil Trevett
Vice President, NVIDIA
President, Khronos

© Copyright 2013 - Page 1

Mobile is the New Epicenter of Innovation

© Copyright Khronos Group 2013 - Page 2

Khronos Standards

Over 100 companies defining royalty-free APIs to connect software to silicon

• 3D Asset Handling
 - Advanced authoring pipelines
 - 3D asset transmission format with streaming and compression
• Visual Computing
 - Object and terrain visualization
 - Advanced scene construction
• Acceleration in the Browser
 - WebGL for 3D in browsers
 - WebCL – heterogeneous computing for the web
• Sensor Processing
 - Mobile vision acceleration
 - On-device sensor fusion
• Camera Control API

[...] is a Real Time Application

Device            Resolution   Pixels   DPI
Apple iPhone      320x480      153K     163
Apple iPad        1024x768     786K     132
Apple iPad Mini   2048x1536    3100K    326

• Buttery smooth touch interaction needs continuous 60Hz updates
• In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
• Need GPU Acceleration for everything Web!
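The "factor of TWENTY" claim can be checked with a few lines of arithmetic. The sketch below assumes the pairing of device labels to resolutions shown on the slide (the figure was flattened, so that pairing is a reconstruction), and the per-second figure simply multiplies the largest screen by the 60Hz update rate.

```python
# Screen resolutions as listed on the slide (label pairing is an assumption).
screens = {
    "iPhone": (320, 480),
    "iPad": (1024, 768),
    "iPad Mini": (2048, 1536),
}

def pixel_count(width, height):
    """Total number of pixels on a width x height screen."""
    return width * height

counts = {name: pixel_count(w, h) for name, (w, h) in screens.items()}

# Growth from the smallest to the largest screen.
growth = counts["iPad Mini"] / counts["iPhone"]

# Pixels that must be produced every second for continuous 60Hz updates.
pixels_per_second = counts["iPad Mini"] * 60

for name, n in counts.items():
    print(f"{name}: {n:,} pixels")
print(f"growth factor: {growth:.2f}x")
print(f"pixels per second at 60Hz: {pixels_per_second:,}")
```

The slide's rounded values (153K, 786K, 3100K) correspond to the exact products computed here.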

How are GPUs Accessible to the Web?
• Hardware composition
 - Within the browser stack – under the hood
• Vector Acceleration for SVG
 - Using NVIDIA OpenGL extensions
• 3D Developer Functionality
 - OpenGL ES functionality through JavaScript
• Compute Acceleration
 - Offloading compute intensive code to GPU
• Compression and streaming of 3D assets
 - For network transmission
• Camera, vision and sensor processing
 - Future JavaScript bindings to native APIs?

Mobile OS Adoption of Khronos APIs

OpenGL ES 2.0 Shipping - Android 2.2

OpenSL ES 1.0 (subset) Shipping – Android 2.3

OpenMAX AL 1.0 (subset) Shipping - Android 4.0

EGL 1.4 Shipping under SDK -> NDK and WebGL now Chrome soon

OpenGL 3.2 on MacOS

OpenCL 1.2 on MacOS

OpenGL ES 3.0 on iOS

WebGL: can enable on MacOS; iOS5 enables WebGL for iAds

WebGL – 3D on the Web – No Plug-in!
• Leveraging HTML 5 and the <canvas> element
 - WebGL defines JavaScript binding to OpenGL ES 2.0
 - Enables a 3D context for the canvas
• Low-level foundational Web API for accessing the GPU
 - Flexibility and direct GPU access
 - Enables higher-level frameworks and middleware

JavaScript binding to OpenGL ES 2.0, building on:
 - Availability of OpenGL and OpenGL ES on almost every web-capable device
 - Increasing JavaScript performance
 - HTML 5 Canvas Tag

WebGL Implementation Anatomy

• Content (JavaScript, HTML, CSS, ...): content downloaded from the Web.
• JavaScript Middleware: middleware can make WebGL accessible to non-expert 3D programmers. Much WebGL content uses the three.js library: http://threejs.org/
• Browser (HTML5, JavaScript, CSS): provides WebGL functionality alongside other HTML5 technologies - no plug-in required.
• OS Provided Drivers (OpenGL ES 2.0, OpenGL, DX9/ANGLE): WebGL on Windows can use Direct3D - for example, the ANGLE project creates OpenGL ES 2.0 over DX9.

WebGL Availability in Browsers

- “Where you have IE11, you have WebGL – turned on by default and working all the time” - Microsoft
- WebGL also enabled for Windows applications - web app framework and web view
- Apple - WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
- Chrome OS - WebGL is the only cross-platform API to program the GPU
- IO announcement - Chrome on Android will soon launch with WebGL

Cross-OS Portability

• HTML/CSS (the same on every platform): HTML5 provides cross platform portability. GPU through WebGL available soon on ~90% of mobile systems
• Dalvik (Java) SDK | Objective C | C#: Preferred development environments not designed for portability
• C/C++ (with platform APIs such as DirectX): Native code is portable - but apps must cope with different available APIs and libraries

WebGL First Wave Application Categories
• Maps and Navigation
• Modeling Tools and Repositories
• Games
• 3D Printing
• Visualization
• Music Videos and Promotion
• Education
• Photo Editors
• Music Visualizers
• Vision/Video Processing

Google Maps
• All rendering (2D and 3D) in Google Maps uses WebGL

Microsoft PhotoSynth2
• Demonstrated at Build 2013

http://channel9.msdn.com/Events/Build/2013/4-072 1:50

WebGL on Logan Android Tablet


OpenGL 3D API Family Tree

[Figure: timeline of API releases, 2002-2013]
• Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
• Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, ES-Next
• Web: WebGL 1.0 (OpenGL ES 2.0 content), WebGL 2.0 (OpenGL ES 3.0 content)

Callouts:
 - OpenGL ES 1.1 content: fixed function 3D pipeline
 - OpenGL ES 2.0 content: programmable vertex and fragment
 - ES3 is backward compatible so new features can be added incrementally
 - OpenGL 4.4 is a superset of DX11
 - WebGL 2.0 is in development now - will bring OpenGL ES 3.0 functionality to the Web

http://www.khronos.org/webgl/public-mailing-list/
http://www.khronos.org/registry/webgl/specs/latest/
http://www.khronos.org/webgl/wiki/Testing/Conformance

OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
 - Incorporates proven features from OpenGL 3.3 / 4.x
 - 32-bit integers and floats in shader programs
 - NPOT, 3D textures, depth textures, texture arrays
 - Multiple Render Targets for deferred rendering, Occlusion Queries
 - Instanced Rendering, Transform Feedback …
• Make life better for the programmer
 - Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
 - OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
 - #1 developer request!

Why Khronos for WebGL?
• Hardware API standards must take into account silicon design cycles
 - Multi-year pipeline of APIs that affect chips that take $100Ms to execute
 - Deep insights into silicon and driver architectures
 - Rigorous conformance tests and infrastructure
• Khronos is committed to being a good citizen in the larger Web community
 - Opened Khronos WebGL processes to enable cooperation with web community
• Khronos is the industry forum to drive hardware consensus and cooperation
 - Help create foundational support for higher-level [...] that access hardware capabilities

Leveraging Proven Native APIs into HTML5
• Khronos and W3C liaison
 - Leverage proven native API investments into the Web
 - Fast API development and deployment
 - Designed by the hardware community
 - Familiar foundation reduces developer learning curve

[Figure: JavaScript APIs layered over native APIs]
• JavaScript layer: HTML Canvas; WebCAM(!) for camera control and video processing; WebVX? for vision processing; WebStream? for sensor fusion
• Native layer: Camera Control, Path Rendering
• Legend:
 - JavaScript API shipping, acceleration being developed or work underway
 - Possible future JavaScript APIs or acceleration
 - Native APIs shipping or Khronos working group

OpenCL as Parallel Compute Foundation

• C++ AMP: C++ syntax/compiler extensions (Shevlin Park uses Clang and LLVM)
• OpenCL HLM: Compiler directives for Fortran, C and C++
• WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
• Aparapi: Java language extensions for parallelism
• River Trail: Language extensions to JavaScript
• PyOpenCL: Python wrapper around OpenCL
• Harlan: High level language for GPU programming

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

WebCL
• WebCL is a JavaScript binding to OpenCL APIs
 - Enables initiation of Kernels written in OpenCL C within the browser
 - Requires a conformant underlying OpenCL on the host system
• Leverage resources
 - 3D asset codecs, video codecs and processing, imaging and vision processing
 - Physics for WebGL games, online data visualization
• WebCL 1.0 based on OpenCL 1.1 Embedded Profile
 - Implementations may utilize OpenCL 1.1 or 1.2
• WebCL API is designed for complete security
 - Restriction of some OpenCL native functionality
 - WebCL kernel validation – similar to WebGL

WebCL 1.0 Kernels
• HTML data
 - [...], [...], ImageData sources bindable to WebCLBuffer & WebCLImage
 - [...]

WebCL 1.0 Security
• Leverages OpenCL 1.2 robustness/security extensions
 - Context Termination - to prevent DOS from long running kernels
 - Memory Initialization - so no leakage from out of bounds memory access
• Kernels passed through open source WebCL Kernel Validator
 - github.com/KhronosGroup/webcl-validator
 - Initializes local and private memory if underlying OpenCL implementation does not implement memory initialization extension
 - Keeps track of memory allocations and traces valid ranges for reads and writes
• API/Language Restrictions and definition of undefined OpenCL behavior
 - Kernels do not support structures as arguments
 - Kernel names must be less than 256 characters
 - Mapping of CL memory objects into host memory space is not supported
 - Binary kernels are not supported
 - Some OpenCL extensions may not be supported or may require translation
 - Certain OpenCL parameters may not directly carry over to WebCL

WebCL 1.0 Current Status
• WebCL 1.0 API definition is being publicly developed
 - Working Public Draft first released April 2012: www.khronos.org/webcl
• WebCL distribution lists
 - [email protected], [email protected]
• WebCL 1.0 specification finalization expected in 1H14
 - https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.
 - With conformance tests and utilities
 - Samsung contributed tests, working group reviewed
• WebCL Conformance Framework and Test Suite (WiP)
 - Full API coverage and input/output validation
 - Available on GitHub: https://github.com/KhronosGroup/WebCL-conformance/

OpenCL to WebCL Translator Utility
• OpenCL to WebCL Kernel Translator
 - Input: an OpenCL kernel
 - Output: a WebCL kernel and a log file that details the translation process
 - Tracked by a “meta” bug on Khronos public Bugzilla: http://www.khronos.org/bugzilla/show_bug.cgi?id=785
• Host API translation (WiP)
 - Input: OpenCL host API calls
 - Output: WebCL host API calls to be wrapped in JS
 - Provides verbose translation log file, detailing the process and any constraints
 - Tracked by a “meta” bug on Khronos public Bugzilla: http://www.khronos.org/bugzilla/show_bug.cgi?id=913

WebCL Prototype Implementations
• Nokia
 - Firefox build with integrated WebCL
 - Firefox extension, open sourced May 2011 (Mozilla Public License 2.0)
 - https://github.com/toaarnio/webcl-firefox
• Samsung
 - Uses WebKit, open sourced June 2011 (BSD)
 - https://github.com/SRA-SiliconValley/webkit-webcl
• Motorola Mobility
 - Uses Node.js, open sourced April 2012 (BSD)
 - https://github.com/Motorola-Mobility/node-webcl

Based on Apple QJulia Based on Iñigo Quilez, Shader Toy Based on Iñigo Quilez, Shader Toy http://fract.ured.me/

WebCL for Web Acceleration

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

Khronos APIs for Augmented Reality

W3C Augmented Web Community Group discussing many of these issues for the Web: e.g. leveraging WebRTC in the short term
http://w3.org/community/ar

[Figure: AR processing pipeline]
• MEMS Sensors -> Sensor Fusion
• Precision timestamps on all sensor samples
• Camera Control API: Advanced Camera Control and stream generation
• Vision Processing
• EGLStream - stream data between APIs
• Application on CPUs, GPUs and DSPs
• 3D Rendering and Video Composition on GPU
• Audio Rendering

AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together

3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
 - Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
 - 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
 - Avoid an ‘Internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
 - High-performance and low power – but pragmatic adoption strategy is key

Media type   Codec
Audio        MP3
Video        H.264
Images       JPEG
3D           ?

An effective and widely adopted codec ignites previously unimagined opportunities for a media type

glTF – OpenGL Transmission Format
• Binary file format for efficient transmission of 3D assets
 - Reduce network bandwidth and minimize client processing overhead
• Run-time neutral
 - DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
 - Can be used by any app or run-time – usually WebGL accelerated
• Scalable to handle compression and streaming
 - Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
 - Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
 - Prototyping and optimizing efficient handling of COLLADA assets

Authoring -> Playback: a standards-based content pipeline for rich native and Web 3D applications

COLLADA and glTF Open Source Ecosystem

[Figure: open source tools, all on GitHub]
• OpenCOLLADA Importer/Exporter (tool interop with COLLADA and other authoring formats): https://github.com/KhronosGroup/OpenCOLLADA
• COLLADA Conformance Tests: https://github.com/KhronosGroup/COLLADA-CTS
• COLLADA2GLTF Translator: https://github.com/KhronosGroup/glTF
• Web-based Tools
• Pervasive WebGL deployment
• Three.js glTF Importer
• Rest3D initiative

© Copyright Khronos Group 2013 - Page 31 WebGL as Test-bed for 3D Asset Compression • Integrating and benchmarking 3D geometry compression formats with glTF - Baseline is GZIP • Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC - Royalty-free graphics compression technology from MPEG (MIT License) - Open3DGC is efficient JavaScript and C/C++ implementation - Convertor using Open3DGC to compress 3D Meshes, Skinning, Animations - https://github.com/amd/rest3d/tree/master/server/o3dgc • WebGL-loader is Google lightweight compression for WebGL content • OpenCTM uses LZMA compression

© Copyright Khronos Group 2013 - Page 32 Initial Compression Results • Compression Efficiency - Gzip (default level=6) 400 - OpenCTM (default settings)

- Open3DGC and Webgl-loader 300

- Positions on 14 bits Gzip OpenCTM - Normals and texCoords on 10 bits 200 Webgl-loader + Gzip

Size Size (MBytes) Open3DGC-ASCII + Gzip 100 Open3DGC-Binary

0 CAD 3D Scanned MPEG dataset (3748 models) (78 models) (1211 models)

Open3DGC is 5x-9x more efficient than Gzip 1.3x-2.4x more efficient than OpenCTM and 1.2x-1.5x more efficient than webgl-loader
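The "positions on 14 bits" setting refers to quantizing vertex attributes before entropy coding. The following is a minimal sketch of uniform quantization followed by the deflate baseline, not the Open3DGC or webgl-loader algorithm; the random test data and the helper names `quantize` and `dequantize` are illustrative assumptions. Python's `zlib` produces the same deflate stream that Gzip wraps, at the same default level 6.

```python
import random
import struct
import zlib

def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] over the values' own range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = levels / (hi - lo) if hi > lo else 0.0
    return [round((v - lo) * scale) for v in values], lo, hi

def dequantize(codes, lo, hi, bits):
    """Reverse of quantize, up to half a quantization step of error."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]

# Hypothetical vertex coordinates; a real mesh would be far more compressible.
random.seed(0)
positions = [random.uniform(-1.0, 1.0) for _ in range(3000)]

raw = struct.pack(f"<{len(positions)}f", *positions)   # 32-bit floats
codes, lo, hi = quantize(positions, 14)
packed = struct.pack(f"<{len(codes)}H", *codes)        # 14-bit codes in 16-bit slots

deflated_raw = zlib.compress(raw, 6)
deflated_quantized = zlib.compress(packed, 6)

restored = dequantize(codes, lo, hi, 14)
max_error = max(abs(a - b) for a, b in zip(positions, restored))

print(len(raw), len(deflated_raw), len(packed), len(deflated_quantized))
print(f"max reconstruction error: {max_error:.6f}")
```

Quantization is lossy: the error is bounded by half a step, which is the trade the 14-bit and 10-bit settings make for smaller files.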

3DGC Decode Times
• JavaScript Decoding Speed
 - Desktop machine: Windows® 64-bit, 8GB RAM, Chrome, AMD Phenom™ II X4 B95 CPU @ 3.0GHz
 - Smart phone: Samsung Galaxy S4, Android 4.2.2, Chrome

Model       Number of triangles   Desktop decoding time (ms)   Smart phone decoding time (ms)
“Hand”      100K                  130                          1045
“Dilo”      54K                   85                           768
“Octopus”   34K                   65                           457

Decoding speed will become even more critical with dense 3D meshes generated by 3D digitization technologies (e.g. 3D scanners)

3D Codec can be accelerated by WebCL Kernels or (eventually) hardware
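To put the decode times listed above in perspective, the sketch below derives throughput and the phone-to-desktop slowdown from the three measurements. The frame-count column is an added assumption: it expresses the phone decode time in 60Hz frame periods to show how long decoding would block a single-threaded render loop.

```python
# (model, triangles, desktop ms, smart phone ms) from the measurements above.
results = [
    ("Hand", 100_000, 130, 1045),
    ("Dilo", 54_000, 85, 768),
    ("Octopus", 34_000, 65, 457),
]

FRAME_MS = 1000 / 60  # duration of one frame at 60Hz

def summarize(name, triangles, desktop_ms, phone_ms):
    """Derive throughput and relative cost for one model."""
    return {
        "model": name,
        "desktop_tris_per_ms": triangles / desktop_ms,
        "phone_tris_per_ms": triangles / phone_ms,
        "slowdown": phone_ms / desktop_ms,
        "phone_frames": phone_ms / FRAME_MS,
    }

summary = [summarize(*row) for row in results]

for s in summary:
    print(
        f'{s["model"]}: desktop {s["desktop_tris_per_ms"]:.0f} tri/ms, '
        f'phone {s["phone_tris_per_ms"]:.0f} tri/ms, '
        f'{s["slowdown"]:.1f}x slower, {s["phone_frames"]:.0f} frames at 60Hz'
    )
```

On these figures the phone is roughly 7x to 9x slower than the desktop, which is the gap the slide suggests closing with WebCL kernels or hardware.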

© Copyright Khronos Group 2013 - Page 34 Texture Compression is Key •Texture compression saves precious resources - Network bandwidth, device memory space AND device memory bandwidth •Developers need the same texture compression EVERYWHERE - Otherwise portable apps – such as WebGL need multiple copies of same texture ASTC

OpenGL ES 3.0 and OpenGL 4.3 Royalty-free ETC2 / EAC extensions -> Core BUT only optional in ES. MANDATED in once proven Quality Only 4bpp | 3 channel OpenGL ES 3.0 No alpha support OpenGL 4.3 Royalty-free NOT Royalty-free. Best quality. Platform Royalty-free Independent control of bit-rate Fragmentation ETC1 Backward compatible with ETC1 and # channels Mandated in ETC2: 4bpp | 3 channel 1 to 4 channel Android Froyo EAC: 4 (8) bpp | 1(2) channel 1-8bpp in fine steps DXTC/S3TC COMBINED: RGBA 8bpp | 4 channel (400M devices) Does not have 1-2 bit compression Windows WITH ALPHA

PVRTC iOS Pervasive Deployment 2008-2010 2012-2013 2014->

ASTC – Universal Texture Standard
• Adaptive Scalable Texture Compression (ASTC)
 - Quality significantly exceeds S3TC or PVRTC at same bit rate
• Industry-leading orthogonal compression rate and format flexibility
 - 1 to 4 color components: R / RG / RGB / RGBA
 - Choice of bit rate: from 8bpp to <1bpp in fine steps
• ASTC is royalty-free and so is available to be universally adopted
 - Shipping as OpenGL/OpenGL ES extension today for industry feedback
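The "fine steps" in ASTC bit rate come from varying the block footprint while the encoded block size stays fixed. The sketch below assumes the ASTC specification's fixed 128-bit block and a few of its 2D footprints (these details come from the specification, not from the slide) and computes bit rate and storage for a hypothetical 1024x1024 texture.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block encodes to 128 bits, whatever its footprint

def astc_bits_per_pixel(block_w, block_h):
    """Bit rate for a given block footprint in texels."""
    return ASTC_BLOCK_BITS / (block_w * block_h)

def compressed_bytes(width, height, block_w, block_h):
    """Storage for one texture level; partial blocks at the edges are padded."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8

def uncompressed_bytes(width, height, bits_per_pixel=24):
    return width * height * bits_per_pixel // 8

# A few footprints from the ASTC specification; the texture size is illustrative.
footprints = [(4, 4), (6, 6), (8, 8), (12, 12)]
width = height = 1024

print(f"uncompressed 24bpp: {uncompressed_bytes(width, height):,} bytes")
for bw, bh in footprints:
    bpp = astc_bits_per_pixel(bw, bh)
    size = compressed_bytes(width, height, bw, bh)
    print(f"{bw}x{bh}: {bpp:.2f} bpp, {size:,} bytes")
```

The 4x4, 6x6 and 8x8 footprints give 8bpp, 3.56bpp and 2bpp, and 12x12 drops below 1bpp, matching the range quoted in the bullets.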

[Image comparison: Original at 24bpp; ASTC compression at 8bpp, 3.56bpp and 2bpp]

Path Rendering Acceleration
• Offload the CPU so the application can run as fast as possible
 - Make maximum use of the GPU for best performance and power

• Software rendering: CPU creates paths -> CPU renders paths
 - Software scanline renderers can be high quality and portable
 - CPU has to process complete renderer pipeline – stealing cycles from the application
 - Software rendering limits performance
• Tessellation: CPU creates paths -> CPU tessellates paths into polygons -> GPU uses standard 3D commands to process polygons
 - Tessellation loads the CPU – stealing cycles from the application so perf sometimes slower than software alone
 - Tessellation consumes a lot of data and memory bandwidth = power
 - Quality can be compromised due to tessellation accuracy
• Direct path rendering: CPU creates paths -> GPU processes paths directly through new OpenGL path commands
 - Maximum CPU offload
 - Compact data format sent to GPU
 - GPU provides excellent performance and power
 - GPU can increase quality and functionality

NV_path_rendering OpenGL Extension
• Brings Path processing directly to OpenGL
 - No tessellation necessary
• Goals
 - Functionally complete for key standards: SVG, Canvas, PostScript etc.
 - Much faster - often 4x to 100x faster than CPUs
 - Enhanced quality – can avoid approximations needed by CPU renderers
 - Lower power by leveraging dedicated hardware
 - New functionality – e.g. mix 2D paths with 3D and programmable shading

Stencil then Cover Approach
• Create a path object and pass directly to the GPU
 - Cubic & quadratic Bezier segments, line segments, partial elliptical arcs
• GPU “Stencils” the path object into the stencil buffer
 - GPU provides massively parallel stenciling of filled or stroked paths
 - Calculate winding rule or containment at every sub-pixel sample in parallel
• “Cover” the path object and stencil test against its coverage
 - Test against path coverage determined in the 1st step and shade the path
• Uses GPU MSAA anti-aliasing
 - 8 or 16 samples/pixel gives good quality

Step 1: Stencil -> Step 2: Cover -> repeat
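The two steps can be illustrated on the CPU with a small sketch. This is not how NV_path_rendering is implemented: it assumes a polygonal path instead of Bezier segments, one sample per pixel instead of MSAA, a non-zero fill rule, and plain Python loops in place of the GPU's parallel stenciling. The function names are hypothetical.

```python
def winding_number(point, polygon):
    """Signed count of how many times the closed polygon winds around the point."""
    px, py = point
    winding = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # cross > 0: the point lies to the left of the edge (x0, y0) -> (x1, y1)
        cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and cross > 0:
            winding += 1   # edge crosses the sample's scanline going up
        elif y1 <= py < y0 and cross < 0:
            winding -= 1   # edge crosses the sample's scanline going down
    return winding

def stencil(polygon, width, height):
    """Step 1: write the winding number of each pixel centre into a stencil buffer."""
    return [
        [winding_number((x + 0.5, y + 0.5), polygon) for x in range(width)]
        for y in range(height)
    ]

def cover(stencil_buffer, color="#", background="."):
    """Step 2: shade every sample whose stencil value passes the non-zero test."""
    return [
        "".join(color if s != 0 else background for s in row)
        for row in stencil_buffer
    ]

# A square path on a 6x6 pixel grid.
square = [(1, 1), (5, 1), (5, 5), (1, 5)]
image = cover(stencil(square, 6, 6))
print("\n".join(image))
```

Because the cover step only reads the stencil buffer, the shading it applies can be arbitrary, which is what lets the GPU version mix paths with programmable shading.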

Enhanced Quality on GPU

[Figure panels: rendering comparisons between Skia, Cairo, Qt and NV_path_rendering]
• Stroking approximations avoided by GPU
 - CPU renderers show weird holes and big feathers
• GPU Offers Jittered Sampling for Free
 - Regular grid on CPU: sub-optimal antialiasing
 - Jitter pattern on GPU for better antialiasing
• GPUs great at texturing
 - Mip-mapping, anisotropic filtering, wrap modes
 - Proper gradient filtering on GPU; Moiré artifacts in Qt
• Eliminate Conflation Artifacts
 - Color bleeding conflation artifacts on CPU in Cairo; similar for Qt & Skia
 - Conflation free on GPU: multiple color AND stencil samples per pixel

Comparing Performance

New GPU Functionality

• Projective Transformation
• Fast Arbitrary Path Clipping
• Programmable Shading
 - Paint in GLSL – for filter and blending acceleration
 - Light source position for bump mapping
• Fully sRGB Correct Rendering
 - Linear RGB: transition between saturated red and saturated blue has dark purple region
 - sRGB: perceptually smooth transition from saturated red to saturated blue
• Mixing depth tested Text, 3D, and Paths

Mixing 2D and 3D

Standardization and Adoption Pipeline
• NVIDIA is proposing nvpr to the OpenGL working group at Khronos to create an open, royalty-free, cross-platform foundation for vector graphics acceleration

[Figure: adoption pipeline]
1. Vendor OpenGL Extension (nvpr is here!): initial functionality proposal. Prove concepts. Solicit industry feedback
2. Vector acceleration extension to OpenGL or OpenGL ES, then adopted into Core OpenGL and OpenGL ES: pervasive multi-vendor availability. Widespread application usage inspires silicon optimizations
3. Vector acceleration pervasive on desktop and mobile

Callouts:
 - Desktop and mobile displays typically >300 DPI
 - Mobile silicon is CUDA/OpenCL capable

Path Rendering Acceleration on Android Tablet

Summary
• Open standards such as WebGL and WebCL are enabling web applications to reach the power of the GPU through JavaScript
• GPU acceleration will soon become vital for Web applications wanting to leverage advanced use of camera and sensors
• Acceleration of path primitives directly on GPUs will drive smooth touch performance in the browser for new classes of applications and devices
• Work starting on 3D asset streaming and compression standards – to enable 3D as a social media type on the web
• The Web and hardware communities have a significant opportunity to leverage each other's efforts for the benefit of the industry
• Khronos is committed to enabling the hardware community to be a good citizen in creating the next generation of accelerated web standards
• www.khronos.org
• [email protected]
