
Bringing GPU Acceleration to the Web

Neil Trevett
Khronos President
Vice President Mobile Content

© Copyright 2012 | Page 1

GPUs are Good at Many Things …

Interactive ray tracing

Physics simulation

Gaming: Battlefield 3, EA

Product design

Data visualization

But traditionally GPUs have NOT been used for web and 2D graphics … that’s about to change …

How Can GPUs Enhance the Web?
• More functionality
- Helping to make HTML5 a complete apps platform
- 3D graphics: WebGL
- Compute: WebCL
• More performance
- Accelerating the bulk of the web, which is 2D
- Accelerating key standards such as SVG and Canvas
• Mobile computing is changing the need for web acceleration
Need MORE performance for TOUCH INTERACTIVITY at LOWER POWER levels…

Mobile – a New Era in Computing

[Chart: cumulative shipments (units, millions) of iOS & Android devices vs. MacOS & Windows PCs, Year 1 to Year 9. Source: Gartner, Apple, NVIDIA]

Mobile industry is 20 years faster to 100M/year shipments than PC

Mobile Thermal Design Point

- 4-5” screen takes 250-500mW
- 7” screen takes 1W
- 10” screen takes 1-2W
- Resolution makes a difference! The iPad3 screen takes up to 8W

Typical max system power levels before thermal failure, across the device classes shown: 2-4W, 4-7W, 6-10W and 30-90W

Even as battery technology improves, these thermal limits remain

How to Save Power?
• Much more expensive to MOVE data than COMPUTE data
• Energy efficiency must now be a key metric during silicon AND software design
- Awareness of where data lives, where computation happens, how it is scheduled
• Need to use hardware acceleration
- Lots of processing in parallel
- Efficient caching and memory usage
- Reduces data movement

Energy per operation (40nm, 1V process):
- Write 32-bits to memory: 600pJ
- Send 32-bits off-chip: 50pJ
- Send 32-bits 2mm: 24pJ
- 32-bit float operation: 7pJ
- 32-bit integer add: 1pJ
- 32-bit register write: 0.5pJ

Needs for Accelerated …
• In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY …
• High-resolution screens have significantly more pixels to process … and displays have reached over 300DPI
• Power efficiency by offloading from CPU to GPU
• GPU acceleration of the complete web – including vector graphics
• Smooth 60Hz touch interactivity with ALL rich web content
What open standards can help accelerate the Web?

- Apple iPhone: 320x480, 153K pixels, 163 DPI
- Apple iPad: 1024x768, 786K pixels, 132 DPI
- HTC One X: 720x1280, 921K pixels, 312 DPI
- Apple iPad3: 2048x1536, 3100K pixels, 264 DPI
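The energy table and the pixel counts above can be combined into a rough lower bound on what a single pass over the framebuffer costs. The sketch below assumes one 32-bit framebuffer write per pixel per frame at the quoted 600pJ and a 60Hz refresh; it ignores reads, overdraw, compositing and framebuffer compression, so it illustrates scaling rather than measuring any real device. The function names are illustrative.

```python
# Illustrative arithmetic only: assumes one 32-bit framebuffer write per
# pixel per frame at the 600 pJ "Write 32-bits to Memory" figure (40nm, 1V),
# and ignores reads, overdraw, compositing and compression.

WRITE_32BIT_PJ = 600.0  # energy per 32-bit memory write, in picojoules
FRAME_RATE_HZ = 60      # smooth 60Hz touch interactivity

SCREENS = {
    "Apple iPhone": (320, 480),
    "Apple iPad": (1024, 768),
    "HTC One X": (720, 1280),
    "Apple iPad3": (2048, 1536),
}


def pixel_count(width: int, height: int) -> int:
    """Pixels that must be produced for one full frame."""
    return width * height


def write_power_watts(width: int, height: int, hz: int = FRAME_RATE_HZ) -> float:
    """Lower-bound power, in watts, for writing every pixel once per frame."""
    joules_per_frame = pixel_count(width, height) * WRITE_32BIT_PJ * 1e-12
    return joules_per_frame * hz


if __name__ == "__main__":
    for name, (w, h) in SCREENS.items():
        mw = write_power_watts(w, h) * 1e3
        print(f"{name:<13}{pixel_count(w, h):>10,} px {mw:8.1f} mW")
    growth = pixel_count(2048, 1536) / pixel_count(320, 480)
    print(f"iPhone -> iPad3 pixel growth: {growth:.1f}x")
```

For the iPad3 this comes to about 113 mW for one write pass per frame, against about 5.5 mW at the original iPhone resolution: the same roughly 20x factor, paid on every frame before any actual rendering work.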

Khronos Connects Software to Silicon

Khronos creates ROYALTY-FREE, open standards for advanced hardware acceleration

Low-level “Foundation” functionality at the software/silicon interface needed on every platform

Graphics, video, audio, compute, visual and sensor processing

Shipping on billions of devices across multiple operating systems - rigorous conformance tests for cross-vendor consistency

Khronos standards define the forward looking roadmap for the silicon community

Khronos is OPEN for any company to join and participate

Acceleration APIs BY the Industry FOR the Industry

API Standards Evolution

- DESKTOP: New API technology first evolves on high-end platforms
- MOBILE: Mobile is the new platform for apps innovation. Mobile APIs unlock hardware and conserve battery life
- VISION AND SENSORS (OpenVL): Apps embrace mobility’s unique strengths and need complex, interoperating APIs with rich sensory inputs, e.g. Augmented Reality
- WEB INTEROP: Diverse platforms – mobile, TV, embedded – mean HTML5 will become increasingly important as a universal app platform

3D API Family Tree

[Figure: 3D API timeline, 2002-2013]
- Mobile 3D: OpenGL ES 1.0 → OpenGL ES 1.1 (fixed function 3D pipeline) → OpenGL ES 2.0 (programmable vertex and fragment shaders) → OpenGL ES 3.0 → ES-Next
- Content: OpenGL ES 1.1, OpenGL ES 2.0 and OpenGL ES 3.0 content
- Web: WebGL 1.0 → WebGL-Next
- Desktop: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3 → GL-Next

ES3 is backward compatible so new features can be added incrementally

OpenGL 4.3 is a Desktop 3D superset of DX11

OpenGL ES Deployment in Mobile

[Chart: Use of 3D APIs in Mobile Devices. Source: Jon Peddie Research]

On PC, DirectX is used for most apps. On mobile the situation is reversed.

OpenGL ES is the 3D API used in Android, iOS and almost every other mobile and embedded OS – other than Windows

WebGL – 3D on the Web – No Plug-in!
• Leveraging HTML5 and the <canvas> element
- WebGL defines a JavaScript binding to OpenGL ES 2.0
- Enables a 3D context for the canvas
• Low-level foundational Web API for accessing the GPU
- Flexibility and direct GPU access
- Enables higher-level frameworks and middleware

JavaScript binding to OpenGL ES 2.0 = availability of OpenGL and OpenGL ES on almost every web-capable device + increasing JavaScript performance + the HTML5 Canvas tag
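WebGL hands vertex data to the GPU through typed arrays, with gl.vertexAttribPointer(index, size, type, normalized, stride, offset) describing where each attribute lives inside a buffer. As a language-neutral sketch of that bookkeeping, the Python below computes the stride and byte offsets for a hypothetical interleaved position/normal/texcoord layout and packs vertices into the little-endian float32 bytes a Float32Array would hold. It does not call WebGL; the layout and helper names are assumptions for illustration.

```python
import struct
from typing import Dict, List, Sequence, Tuple

FLOAT32_BYTES = 4  # bytes per gl.FLOAT component

# Hypothetical interleaved layout: (attribute name, float components per vertex).
LAYOUT: List[Tuple[str, int]] = [("position", 3), ("normal", 3), ("texcoord", 2)]


def attribute_pointers(
    layout: Sequence[Tuple[str, int]]
) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Compute the stride and per-attribute (size, byte offset) pairs that would be
    passed to gl.vertexAttribPointer(index, size, gl.FLOAT, false, stride, offset)
    for a buffer where each vertex's attributes are stored back to back."""
    pointers: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, size in layout:
        pointers[name] = (size, offset)
        offset += size * FLOAT32_BYTES
    return offset, pointers  # after the loop, offset equals the vertex stride


def pack_vertices(vertices: Sequence[Sequence[float]]) -> bytes:
    """Flatten interleaved vertices into little-endian float32 bytes: the byte
    image a Float32Array would hold on a little-endian machine before
    gl.bufferData uploads it."""
    flat = [component for vertex in vertices for component in vertex]
    return struct.pack(f"<{len(flat)}f", *flat)


if __name__ == "__main__":
    stride, pointers = attribute_pointers(LAYOUT)
    print("stride:", stride)
    for name, (size, offset) in pointers.items():
        print(f"{name}: size={size} offset={offset}")
```

With this layout the stride is 32 bytes, and the normal and texcoord attributes start at byte offsets 12 and 24.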

WebGL Implementation Anatomy

- Content (JavaScript, HTML, CSS, ...): content downloaded from the Web
- JavaScript Middleware: middleware can make WebGL accessible to non-expert 3D programmers
- WebGL (alongside JavaScript and CSS): the browser provides WebGL functionality alongside other HTML5 specs - no plug-in required
- OpenGL ES 2.0 (or DX9/ANGLE): OS-provided drivers. WebGL on Windows can use Google ANGLE to create conformant OpenGL ES 2.0 over DX9

http://www.khronos.org/webgl/wiki/User_Contributions

WebGL Deployment
• WebGL 1.0 released at GDC, March 2011
- …, Apple, Google and … working closely with GPU vendors
• IE can be enabled with Chrome Frame
- ://developers.google.com/chrome/chrome-frame/
• Mobile WebGL beginning to ship – …, Opera, RIM
- Pervasive mobile WebGL expected during next 12 months

http://caniuse.com/#search=webgl
WebGL is not enabled by default in desktop … On iOS 5, WebGL is available to iAds

WebGL – Being Used by Millions Every Day

WebGL and Security
• WebGL is architecturally secure
- NO known WebGL security issues
- Impossible to access out-of-bounds or uninitialized memory
- Use of cross-origin images is blocked without permission through CORS
- Browsers maintain blacklists, used if unavoidable GPU driver bugs are discovered
• DoS attacks and GPU hardening
- Draw commands can run for a long time -> unresponsive system
- Even without loops in shaders
- WebGL working closely with GPU vendors to categorically fix this
- Short term: mandate ARB_robustness and associated GPU watchdog timer
- Longer term: GPUs need robust context switch and pre-emption

WebGL is web-hardening GPU usage: this helps ALL GPU-accelerated portions of the browser stack
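One ingredient of the out-of-bounds guarantee is draw-call validation: the WebGL 1.0 specification describes range checking under which a draw whose vertex fetches would fall outside a bound buffer fails with INVALID_OPERATION instead of reading past the end of it. The sketch below shows the shape of that check for drawElements. The helper names and the 96-byte buffer are hypothetical, and real browsers cache and optimize this validation rather than scanning every index on every call.

```python
from typing import Sequence

GL_NO_ERROR = 0
GL_INVALID_OPERATION = 0x0502  # standard OpenGL / WebGL error code


def max_fetchable_vertices(buffer_bytes: int, offset: int, stride: int,
                           size: int, component_bytes: int = 4) -> int:
    """How many whole vertices an attribute can read from its bound buffer.
    stride == 0 means tightly packed, as in OpenGL ES."""
    element_bytes = size * component_bytes
    if buffer_bytes < offset + element_bytes:
        return 0
    step = stride if stride else element_bytes
    return (buffer_bytes - offset - element_bytes) // step + 1


def validate_draw_elements(indices: Sequence[int], vertex_limit: int) -> int:
    """Reject the draw if any index would fetch past the end of a buffer,
    instead of letting the GPU read out-of-bounds memory."""
    if indices and max(indices) >= vertex_limit:
        return GL_INVALID_OPERATION
    return GL_NO_ERROR


if __name__ == "__main__":
    # Two hypothetical attributes sharing one 96-byte interleaved buffer (stride 32).
    limit = min(max_fetchable_vertices(96, 0, 32, 3),   # vec3 position at offset 0
                max_fetchable_vertices(96, 24, 32, 2))  # vec2 texcoord at offset 24
    print(limit,
          validate_draw_elements([0, 1, 2], limit),
          validate_draw_elements([0, 1, 3], limit))
```

The vertex limit is the minimum over all enabled attributes, because a single out-of-range attribute is enough to make the fetch unsafe.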

Why Khronos for WebGL?
• Hardware API standards must take into account silicon design cycles
- Multi-year pipeline of APIs that affect chips that take $100Ms to execute
- Rigorous conformance tests and infrastructure
• Khronos is a unique forum where browser and GPU vendors can cooperate
- Strong synergy from having both communities under one roof
• Khronos is committed to being a good citizen in the larger Web community
- Opened Khronos WebGL processes to enable cooperation with the web community
- http://www.khronos.org/webgl/public-mailing-list/
- http://www.khronos.org/registry/webgl/specs/latest/
- http://www.khronos.org/webgl/wiki/Testing/Conformance

Khronos is the industry forum to drive hardware consensus and cooperation and advocate hardware support for higher-level software standards

OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the …
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!

Texture Compression is Key
• Texture compression saves precious resources
- Saves network bandwidth, device memory space AND memory bandwidth
• Developers need the same texture compression EVERYWHERE
- Otherwise apps need multiple copies of the same texture for different platforms

[Figure: texture compression deployment timeline]
Platform fragmentation (2008-2010):
- ETC1: royalty-free; mandated in Android Froyo (400M devices); quality only 4bpp | 3 channel; no alpha support
- DXTC/S3TC (Windows) and PVRTC (iOS): NOT royalty-free
ETC2 / EAC (2012-2013):
- MANDATED in OpenGL ES 3.0 and OpenGL 4.3; royalty-free
- Backward compatible with ETC1
- ETC2: 4bpp | 3 channel; RGBA 8bpp | 4 channel WITH ALPHA
- EAC: 4 (8)bpp | 1 (2) channel
- Does not have 1-2 bit compression
ASTC (2014->):
- Extension -> core once proven, BUT only optional in ES
- Royalty-free; best quality
- Independent control of bit-rate and # channels: 1 to 4 channels, 1-8bpp in fine steps
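To put these bit rates in concrete terms, the sketch below estimates storage for a 2D texture at a given number of bits per pixel, optionally with a full mip chain. The 1024x1024 size and the uncompressed RGBA8 baseline are assumptions chosen for comparison, and the estimate ignores the block padding that compressed formats need once a mip level is smaller than one block.

```python
def texture_bytes(width: int, height: int, bits_per_pixel: float,
                  mipmapped: bool = False) -> float:
    """Approximate storage for a 2D texture at a given bit rate.
    Ignores the padding block-compressed formats need when a mip level is
    smaller than one block (for example 4x4 for ETC1/ETC2)."""
    total = 0.0
    w, h = width, height
    while True:
        total += w * h * bits_per_pixel / 8
        if not mipmapped or (w == 1 and h == 1):
            return total
        w, h = max(1, w // 2), max(1, h // 2)  # next mip level


# Bit rates quoted on the slide, plus uncompressed RGBA8 for comparison.
FORMATS = {
    "RGBA8 (uncompressed)": 32,
    "ETC1 / ETC2 RGB": 4,
    "ETC2 RGBA (with alpha)": 8,
}

if __name__ == "__main__":
    for name, bpp in FORMATS.items():
        kib = texture_bytes(1024, 1024, bpp) / 1024
        print(f"{name:<24}{kib:>8.0f} KiB for a 1024x1024 texture")
```

At 1024x1024 this gives 4096 KiB uncompressed, 512 KiB at 4bpp and 1024 KiB at 8bpp, an 8x saving in download size, memory footprint and memory bandwidth for the 4bpp formats.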

ASTC – Future Universal Texture Standard?
• Adaptive Scalable Texture Compression (ASTC)
- Quality significantly exceeds S3TC or PVRTC at the same bit rate
• Industry-leading orthogonal compression rate and format flexibility
- 1 to 4 color components: R / RG / RGB / RGBA
- Choice of bit rate: from 8bpp to <1bpp in fine steps
• ASTC is royalty-free and so is available to be universally adopted
- Shipping as GL/ES extension today for industry feedback
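ASTC’s fine-grained bit-rate control comes from a fixed 128-bit block whose texel footprint varies. Assuming the standard ASTC 2D footprints (4x4 up to 12x12), the sketch below derives bits per pixel and storage: 4x4, 6x6 and 8x8 give 8, about 3.56 and 2bpp, and 12x12 drops below 1bpp. The helper names are illustrative, and real encoders also choose per-block modes and partitions that this arithmetic ignores.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint

# The 2D block footprints (width x height in texels) defined for ASTC.
ASTC_2D_FOOTPRINTS = [
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
    (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12),
]


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    """Bit rate of one footprint: 128 bits spread over block_w * block_h texels."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Storage for one mip level: partial blocks at the edges still cost a full block."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


if __name__ == "__main__":
    for bw, bh in ASTC_2D_FOOTPRINTS:
        print(f"{bw:>2}x{bh:<2} {astc_bits_per_pixel(bw, bh):5.2f} bpp")
```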

[Images: original at 24bpp vs. ASTC compression at 8bpp, 3.56bpp and 2bpp]

Need for 3D Transmission Standard?
• Efficient codecs to compress and store binary 3D asset blobs
- Geometry, textures, materials, animations, physics…
• Separate from scene graph description – e.g. WebGL could use JSON
- Combine with RESTful APIs in Web services approach to flexible streaming and LOD management
- http://rest3d.wordpress.com/
• Many initiatives under way – time for communication and collaboration?
- MPEG 3D Mesh Progressive Streaming (3DMC), Bones Based Animation (BBA)
- Google Body compression - Delta and ZigZag encoding
- COLLADA2JSON at COLLADA working group
- Web3D Consortium – Fraunhofer, significant papers at Web3D Conference
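Delta and ZigZag encoding, listed here under Google Body compression, are simple building blocks for compressing quantized geometry: deltas turn slowly varying coordinates into small signed numbers, and ZigZag maps those to small unsigned numbers that a variable-length integer code can store in few bytes. The sketch below is a generic illustration of the two techniques, not Google Body’s actual format; the LEB128-style varint sizing and the sample coordinates are assumptions.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Replace each value with its difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    """Running sum restores the original values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def zigzag_encode(n: int) -> int:
    """Interleave signed integers as unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def varint_size(u: int) -> int:
    """Bytes needed to store u with 7 payload bits per byte (LEB128-style)."""
    size = 1
    while u >= 0x80:
        u >>= 7
        size += 1
    return size


if __name__ == "__main__":
    coords = [1000, 1003, 1001, 1006, 1004]  # hypothetical quantized x-coordinates
    packed = [zigzag_encode(d) for d in delta_encode(coords)]
    raw = sum(varint_size(zigzag_encode(c)) for c in coords)
    print(packed, raw, "bytes ->", sum(varint_size(p) for p in packed), "bytes")
```

For the sample run 1000, 1003, 1001, 1006, 1004 the encoded stream is 2000, 6, 3, 10, 3, which needs 6 varint bytes instead of 10; the gain grows with longer runs of nearby vertices.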

Audio: MP3 | Video: H.264 | Images: PNG/JPEG | 3D: ?
An effective and widely adopted codec ignites previously unimagined opportunities for a media type

OpenCL – Heterogeneous Computing

• C Platform Layer API
- Query, select and initialize compute devices
• Kernel Language Specification
- Subset of ISO C99 with language extensions
- Well-defined numerical accuracy - IEEE 754 rounding with specified max error
- Rich built-in functions: cross, dot, sin, pow, log …
• C Runtime API
- Runtime or build-time compilation of kernels
- Execute compute kernels across multiple devices

[Diagram: OpenCL kernel code dispatched across multiple CPUs and GPUs]
A low-level, cross-platform, cross-vendor standard for harnessing all system compute resources
One code tree can be executed on CPUs or GPUs
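The OpenCL execution model runs one kernel body per work-item over an index space (an NDRange), grouped into work-groups. The Python below emulates that model sequentially for a 1-D saxpy kernel; it is not the OpenCL API, and the function names are hypothetical. A real implementation compiles the kernel from OpenCL C and schedules work-groups across CPU or GPU compute units.

```python
from typing import Callable, List


def saxpy_kernel(global_id: int, a: float, x: List[float], y: List[float],
                 out: List[float]) -> None:
    """One work-item's share of the work. In OpenCL C the index would come
    from get_global_id(0); here it is passed in explicitly."""
    out[global_id] = a * x[global_id] + y[global_id]


def enqueue_nd_range(kernel: Callable[..., None], global_size: int,
                     local_size: int, *args) -> None:
    """Sequential stand-in for a 1-D NDRange launch: every work-item runs the
    same kernel with its own global id. A real device would execute the
    work-groups in parallel, in no guaranteed order."""
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size.
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            kernel(group_id * local_size + local_id, *args)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(x)
    enqueue_nd_range(saxpy_kernel, len(x), 2, 2.0, x, y, out)
    print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because each work-item touches only its own index, the result does not depend on the order in which work-items run, which is exactly what lets the same kernel execute on a CPU or a GPU.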

OpenCL as Parallel Compute Foundation

- HLM: C++ syntax/compiler extensions
- C++ AMP: C++ syntax/compiler extensions
- Aparapi: Java language extensions for parallelism
- RenderScript: C99 kernels for Dalvik with JIT compilation for device portability
- River Trail: Language extensions to JavaScript
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels

CUDA or DirectCompute may also be used as compiler targets – but OpenCL provides cross-platform, cross-vendor coverage

WebCL – for the Web
• JavaScript bindings to OpenCL APIs
- JavaScript initiates OpenCL C kernels on heterogeneous multicore CPU/GPU
• Stays close to the OpenCL standard
- Maximum flexibility to provide a foundation for higher-level middleware
• Minimal language modifications for 100% security and app portability
- E.g. mapping of CL memory objects into host memory space is not supported
• WebCL language restrictions for security
- E.g. disallowing pointers
• Compelling use cases
- Physics engines for WebGL games, image and video editing in browser
• API definition underway – public draft just released
- https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.

WebCL Demo
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

Expanding Platform Reach for Graphics and Computation

[Figure: graphics and compute across desktop, mobile and Web, with Interop marked on each platform and Typed Arrays on the Web]
Graphics:
- WebGL on majority of production desktops now
- WebGL pervasively available on mobile in next 12 months
Compute:
- OpenCL Full Profile on desktop; Full Profile and Embedded Profile on mobile
- OpenCL pervasively available on mobile in next 18-24 months
- WebCL will start deploying in next 12-18 months

Leveraging Proven Native APIs into HTML5
• Leverage native API investments into the Web
- Faster API development and deployment
- Familiar foundation reduces developer learning curve
• Khronos and W3C discussing closer liaison
- Multiple potential joint projects

[Figure: native APIs (including OpenVL) alongside existing and possible future JavaScript APIs: Device and Sensor APIs / Device Orientation, WebMAX? (camera control and video processing), WebAudio (advanced audio), Canvas, WebVL? (vision processing). Legend: native APIs shipping or working group underway; JavaScript API shipping or working group underway; possible future JavaScript APIs]

Goal: Accelerate Web Browsers with GPU
• Hybrid rendering on a mix of CPU and GPU typically fails
- Costly switching of resources between CPU and GPU robs performance
- Big win if the CPU is completely freed to run the higher-level web stack
• Vector rendering is the main issue
- So far, attempts at resolution-independent GPU acceleration have largely failed
• Path rendering has 30 years of heritage and history
- CPU scan-line algorithms = high-quality, fast and fully functional
• GPU acceleration must surpass CPU approaches on all fronts
- Performance
- Quality
- Functionality
- Conformance to standards
- Power efficiency

[Image credit: Perceptive Pixel]

Path Rendering Acceleration
• Offload the CPU so the application can run as fast as possible
- Make maximum use of the GPU for best performance and power

• CPU creates paths; CPU renders paths
- Software scanline renderers can be high quality and portable
- CPU has to process the complete renderer pipeline – stealing cycles from the application
- Software rendering limits performance
• CPU creates paths; CPU tessellates paths into polygons; GPU uses standard 3D commands to process the polygons
- Tessellation loads the CPU – stealing cycles from the application, so performance is sometimes slower than software alone
- Tessellation consumes a lot of data and memory bandwidth = power
- Quality can be compromised due to tessellation accuracy
• CPU creates paths; new OpenGL path commands let the GPU process paths directly
- Maximum CPU offload
- Compact data format sent to GPU
- GPU provides excellent performance and power
- GPU can increase quality and functionality
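The tessellation approach’s data cost grows as quality requirements tighten. The sketch below flattens a cubic Bezier segment by recursive de Casteljau subdivision with a simple control-point flatness test, one common flattening technique and not necessarily what any particular renderer uses; the function names and the sample curve are illustrative. Tightening the tolerance multiplies the vertices the CPU must generate and send to the GPU.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def _midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def flatten_cubic(p0: Point, p1: Point, p2: Point, p3: Point, tolerance: float,
                  depth: int = 0, max_depth: int = 16) -> List[Point]:
    """Recursively split a cubic Bezier at t = 0.5 (de Casteljau) until both
    inner control points lie within `tolerance` of the chord p0-p3.
    Returns the polyline vertices after p0."""
    flat = max(_distance_to_line(p1, p0, p3),
               _distance_to_line(p2, p0, p3)) <= tolerance
    if flat or depth >= max_depth:
        return [p3]
    p01, p12, p23 = _midpoint(p0, p1), _midpoint(p1, p2), _midpoint(p2, p3)
    p012, p123 = _midpoint(p01, p12), _midpoint(p12, p23)
    mid = _midpoint(p012, p123)  # the point on the curve at t = 0.5
    return (flatten_cubic(p0, p01, p012, mid, tolerance, depth + 1, max_depth)
            + flatten_cubic(mid, p123, p23, p3, tolerance, depth + 1, max_depth))


def tessellate_cubic(p0: Point, p1: Point, p2: Point, p3: Point,
                     tolerance: float) -> List[Point]:
    return [p0] + flatten_cubic(p0, p1, p2, p3, tolerance)


if __name__ == "__main__":
    curve = ((0.0, 0.0), (0.0, 100.0), (100.0, 100.0), (100.0, 0.0))
    for tol in (4.0, 1.0, 0.25, 0.0625):
        print(f"tolerance {tol:>6}: {len(tessellate_cubic(*curve, tol))} vertices")
```

Run as a script, it prints the vertex count for each tolerance; the count climbs steeply as the tolerance shrinks, and every one of those vertices is CPU work and bus traffic.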

NV_path_rendering OpenGL Extension
• Brings path processing directly to OpenGL
- No tessellation necessary
• Goals
- Functionally complete for key standards: SVG, Canvas, PostScript etc.
- Much faster – often 4x to 100x faster than CPUs
- Enhanced quality – can avoid approximations needed by CPU renderers
- Lower power by leveraging dedicated hardware
- New functionality – e.g. mix 2D paths with 3D and programmable shading

© 2012 NVIDIA - Page 30

Stencil then Cover Approach
• Map the path rendering task from a sequential algorithm…
- …to a pipelined and massively parallel task
• Create a path object and pass it directly to the GPU
- Cubic & quadratic Bezier segments, line segments, partial elliptical arcs
• Step 1: “Stencil” the path object into the stencil buffer
- GPU provides fast stenciling of filled or stroked paths
- Calculate winding rule or containment at every sub-pixel sample in parallel
• Step 2: “Cover” the path object and stencil test against its coverage
- Test against path coverage determined in the 1st step and shade the path
- Uses GPU MSAA anti-aliasing: 8 or 16 samples/pixel gives good quality
• Repeat: stencil, cover, stencil, cover …
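To make the two steps concrete, the sketch below emulates stencil-then-cover on the CPU for polygonal contours: the “stencil” step computes a winding number at each sample, and the “cover” step shades the samples whose count passes the nonzero or even-odd fill rule. It is an illustration only: it assumes curves have already been flattened to line segments, uses one sample at each pixel centre instead of 8 or 16 MSAA samples, and its function names are hypothetical rather than NV_path_rendering entry points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _is_left(a: Point, b: Point, p: Point) -> float:
    """> 0 if p is left of the directed edge a->b, < 0 if right, 0 if on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])


def winding_number(p: Point, contour: Sequence[Point]) -> int:
    """Signed number of times a closed contour winds around p (crossing test
    with a half-open rule so shared edge endpoints are counted once)."""
    wn = 0
    for i, a in enumerate(contour):
        b = contour[(i + 1) % len(contour)]
        if a[1] <= p[1] < b[1] and _is_left(a, b, p) > 0:
            wn += 1  # upward edge with p on its left
        elif b[1] <= p[1] < a[1] and _is_left(a, b, p) < 0:
            wn -= 1  # downward edge with p on its right
    return wn


def stencil_step(contours: Sequence[Sequence[Point]],
                 width: int, height: int) -> List[List[int]]:
    """'Stencil': the winding count at one sample per pixel (the pixel centre).
    A GPU computes this for every sub-pixel sample in parallel."""
    return [[sum(winding_number((x + 0.5, y + 0.5), c) for c in contours)
             for x in range(width)]
            for y in range(height)]


def cover_step(stencil: List[List[int]], fill_rule: str = "nonzero") -> List[List[bool]]:
    """'Cover': shade the samples whose stencil value passes the fill rule."""
    if fill_rule not in ("nonzero", "evenodd"):
        raise ValueError(f"unknown fill rule: {fill_rule}")

    def passes(w: int) -> bool:
        return w != 0 if fill_rule == "nonzero" else w % 2 == 1

    return [[passes(w) for w in row] for row in stencil]


if __name__ == "__main__":
    outer = [(0, 0), (8, 0), (8, 8), (0, 8)]
    inner = [(2, 2), (6, 2), (6, 6), (2, 6)]  # same orientation as outer
    stencil = stencil_step([outer, inner], 8, 8)
    for rule in ("nonzero", "evenodd"):
        print(rule)
        for row in cover_step(stencil, rule):
            print("".join("#" if hit else "." for hit in row))
```

Because the inner square has the same orientation as the outer one, its samples get a winding count of 2: nonzero fills them, while even-odd leaves a hole. Reversing the inner contour’s direction makes the count 0, so both rules leave a hole.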

Excellent Geometric Fidelity for Stroking
• Correct stroking is hard
- Lots of CPU implementations approximate stroking
• GPU-accelerated stroking avoids such short-cuts
- GPU has FLOPS to compute true stroke point containment

[Images: stroking with a tight end-point curve – GPU-accelerated vs. OpenVG reference and Qt]

Micrography
“Girl with Words in Her Hair”: 591 paths, 338,507 commands, 1,244,474 coordinates
Ron Maharik, Mikhail Bessmeltsev, Alla Sheffer, Ariel Shamir and Nathan Carr, SIGGRAPH 2011

More Details on nvpr
• Functionality: union of all major path rendering standards
- Enables mixing traditional functionality with 3D and programmable shading
• Point sampling for path filling is exact
- No approximations due to tessellation or subdivision
• Path stroking is exact
- Line segment & quadratic Bezier segment stroking is exact
- All stroke cap + join styles supported
- Dashing fully supported
• Minimal pre-computation required
- NO tessellation involved, NO recursive subdivision
- Fast to animate, morph, or edit paths

Enhanced Quality on GPU
• Stroking approximations avoided by GPU
- [Images: Skia, Cairo and Qt stroking on the CPU, annotated “weird big holes” and “feathers?”, vs. NV_path_rendering]
• GPU offers jittered sampling for free
- Sub-optimal antialiasing on CPU (regular grid) vs. better antialiasing on GPU (jitter pattern)
• GPUs are great at texturing: mip-mapping, anisotropic filtering, wrap modes
- Proper gradient filtering on GPU
- Moiré artifacts on CPU (similar for Qt & Skia) vs. GPU
• Eliminate conflation artifacts
- Multiple color AND stencil samples per pixel
- Color bleeding: conflation artifacts on CPU (Cairo), conflation free on GPU

Comparing Performance

Comparative Performance (Logarithmic Scale)

[Chart: performance of NVpr at 16x MSAA relative to other renderers (series NVpr16/Cairo, NVpr16/SkiaBitmap, NVpr16/SkiaGanesh, NVpr16/Direct2D GPU, NVpr16/Direct2D WARP) on a logarithmic axis from 0.10 to 1000.00, for the scenes Welsh_dragon, Cougar, tiger, Celtic_round_dogs, butterfly, spikes, American_Samoa, cowboy, Buonaparte, Embrace_the_World, Yokozawa and tiger_clipped_by_he, each at window sizes from 100x100 to 1100x1100. GeForce GTX 480, release drivers V.300, x16 MSAA]

New GPU Functionality

• Projective Transformation
• Fast Arbitrary Path Clipping
• Programmable Shading
- Paint in GLSL – for filter and blending acceleration
- e.g. light source position for BUMP Mapping
• Fully sRGB Correct Rendering
- Linear RGB transition between saturated red and saturated blue has a dark purple region
- sRGB: perceptually smooth transition from saturated red to saturated blue
• Mixing depth-tested Text, 3D, and Paths
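The sRGB item is easy to check numerically. The sketch below uses the standard sRGB transfer functions to compare the midpoint of a red-to-blue gradient computed two ways: interpolating the stored, sRGB-encoded values directly, and decoding to linear light, interpolating, then re-encoding. Which of these a given renderer performs depends on whether its interpolation and framebuffer writes are sRGB-aware; the function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[float, ...]


def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel in [0, 1] with the sRGB transfer curve."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lerp(a: RGB, b: RGB, t: float) -> RGB:
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def blend_encoded(a: RGB, b: RGB, t: float) -> RGB:
    """Interpolate the stored sRGB-encoded values directly."""
    return lerp(a, b, t)


def blend_linear_light(a: RGB, b: RGB, t: float) -> RGB:
    """Decode to linear light, interpolate, then re-encode for display."""
    mixed = lerp(tuple(map(srgb_to_linear, a)), tuple(map(srgb_to_linear, b)), t)
    return tuple(linear_to_srgb(c) for c in mixed)


RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

if __name__ == "__main__":
    for name, blend in (("encoded", blend_encoded), ("linear light", blend_linear_light)):
        mid = blend(RED, BLUE, 0.5)
        print(f"{name:>12}: stored midpoint {tuple(round(c, 3) for c in mid)}, "
              f"linear intensity of red channel {srgb_to_linear(mid[0]):.3f}")
```

Interpolating the encoded values gives a stored midpoint of 0.5 per channel, which is only about 21% of full intensity in linear light; interpolating in linear light gives a stored midpoint of about 0.735. That large brightness difference at the midpoint is why the choice of color space visibly changes the purple region of such a gradient.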

Mixing 2D and 3D

Resolution-independent Font Support
• Fonts are a standard, first-class part of all path rendering systems
- Foreign to 3D graphics systems such as OpenGL and …
• NV_path_rendering has built-in font support
- Can specify a range of path objects with a specified font
- Sequence or range of character points
• No requirement for applications to use the font API to load glyphs
- You can also load glyphs “manually” from your own glyph outlines
- Functionality provides OS portability

Path Geometric Queries
• glIsPointInFillPathNV
- Determine if object-space (x,y) position is inside or outside path, given a winding number mask
• glIsPointInStrokePathNV
- Determine if object-space (x,y) position is inside the stroke of a path
- Accounts for dash pattern, joins, and caps
• glGetPathLengthNV
- Returns approximation of geometric length of a given sub-range of path segments
• glPointAlongPathNV
- Returns the object-space (x,y) position and 2D tangent vector at a given offset into a specified path object
- Useful for “text follows a path”
• Queries are modeled after OpenVG queries
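glGetPathLengthNV and glPointAlongPathNV are what make “text follows a path” practical. As an illustration of what they compute, the sketch below implements the same two queries for a path made only of straight line segments; the NV_path_rendering versions also handle curves and arcs, and the Python function names and signatures here are hypothetical, not the extension’s API.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def segment_lengths(points: Sequence[Point]) -> List[float]:
    return [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]


def path_length(points: Sequence[Point], start_segment: int = 0,
                num_segments: Optional[int] = None) -> float:
    """Length of a sub-range of segments (the idea behind glGetPathLengthNV,
    restricted here to straight line segments)."""
    lengths = segment_lengths(points)
    end = len(lengths) if num_segments is None else start_segment + num_segments
    return sum(lengths[start_segment:end])


def point_along_path(points: Sequence[Point], distance: float) -> Tuple[Point, Point]:
    """Position and unit tangent at `distance` along the polyline (the idea
    behind glPointAlongPathNV). Distances are clamped to the path's extent."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    remaining = max(0.0, min(distance, path_length(points)))
    last: Optional[Tuple[Point, Point]] = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # a degenerate segment has no tangent
        tangent = ((x1 - x0) / seg, (y1 - y0) / seg)
        if remaining <= seg:
            t = remaining / seg
            return (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t), tangent
        remaining -= seg
        last = ((x1, y1), tangent)
    if last is None:
        raise ValueError("path has zero length")
    return last  # only reached through floating-point round-off at the very end


if __name__ == "__main__":
    baseline = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
    # Hypothetical glyph advances for a short string laid along the path.
    for advance in (0.0, 2.0, 4.0, 6.0):
        print(advance, point_along_path(baseline, advance))
```

Placing each glyph at its cumulative advance along the path, and rotating it to the returned tangent, is the basic text-on-a-path layout.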

Accelerated SVG Renderer
• Partial SVG renderer – pr_svg
- Path filling, transformations and grouping
- Path stroking with all stroking embellishments
- Clipping – including clipping paths to other arbitrary paths
- Painting with linear/radial gradients and images
- Basic compositing
- Coming in next update: markers and text
• Stuff that’s missing from pr_svg
- Filters, Blending, Opacity groups, Animation, JavaScript integration
- Not hard, just best done in context of a browser
• NVIDIA welcomes any community involvement
- http://developer.nvidia.com/nv-path-rendering

More Information
• Best drivers: OpenGL 4.3 beta driver
- www.nvidia.com/drivers
- Grab the latest beta drivers for your OS & GPU
- Runs on any CUDA-capable GPU (GeForce 8 onwards)
• Developer resources
- http://developer.nvidia.com/nv-path-rendering
- Whitepapers, FAQ, specification
- NVprSDK – software development kit
- NVprDEMOs – pre-compiled Windows demos
- YouTube videos demonstrate various NVpr DEMOs

Standardization and Adoption Pipeline
NVIDIA is committed to proposing nvpr to the OpenGL working group at Khronos to create an open, royalty-free, cross-platform foundation for vector graphics acceleration

Pipeline stages:
- Vendor OpenGL extension (nvpr is here!)
- OpenGL extension
- OpenGL vector acceleration adopted into core OpenGL or OpenGL ES
- Vector acceleration pervasive on desktop and mobile

Along the way:
- Initial functionality proposal. Prove concepts. Solicit industry feedback
- Pervasive multi-vendor availability. Widespread application usage inspires silicon optimizations
- Desktop and mobile displays typically >300 DPI
- Mobile silicon is CUDA/OpenCL capable

Web Page Rendering

Summary
• Open standards such as WebGL and WebCL are enabling web applications to reach the power of the GPU through JavaScript
• Acceleration of path primitives directly on GPUs will drive vector and web performance and power to enable new classes of applications and devices
• The Web and hardware communities have a significant opportunity to leverage each other’s efforts for the benefit of the industry
• Khronos is committed to enable the hardware community to be a good citizen in creating the next generation of accelerated …
