<<

Optimizations Intro Optimize Rendering Textures Other

Realtime Computer Graphics on GPUs Speedup Techniques, Other APIs

Jan Kolomazn´ık

Department of Software and Computer Science Education Faculty of Mathematics and Physics Charles University in Prague

1 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

Optimizations Intro

2 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

PERFORMANCE BOTTLENECKS I

I Most of the applications require steady framerate – What can slow down rendering? I Too much geometry rendered I CPU/GPU can process only limited amount of data per second I Render only what is visible and with adequate details I Lighting/shading computation I Use simpler material model I Limit number of generated fragments I Data transfers between CPU/GPU I Try to reuse/cache data on GPU I Use async transfers

3 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

PERFORMANCE BOTTLENECKS II

I State changes I Bundle object by materials I Use UBOs (uniform buffer objects) I GPU idling – cannot generate work fast enough I Multithreaded task generation I Not everything must be done in every frame – reuse information (temporal consistency) I CPU/Driver hotspots I Bindless textures I Instanced rendering I Indirect rendering

4 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

DIFFERENT NEEDS

I Large open world I Rendering lots of objects I Not so many details needed for distant sections I Indoors scenes I Often lots of same objects – instaced rendering I Only small portion of the scene visible – occluded by walls, . . . I CAD, molecular biology, . . . I All geometry visible at once I Lots of self-occlusions I Switching levels of detail

5 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

PROFILING

I Three rules of profiling: I Measure! I Measure! I Measure! I How to measure? I Code instrumentation I Sampling profilers I HW event counters I Profiling influences performance I Tools for GPU: I Nsight I RenderDoc I CodeXL

6 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

SCENE GRAPH

I Boundary volume hierarchies (BVH) I Spatial partitioning I Binary space partitioning I Quadtrees, octrees I Grids I Additional meta-objects: I Occluders I Collision geometry

7 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

STANDARD RENDERING PROCEDURE

1. Analyze world view 2. Transfer data on GPU I Upload/reuse geometry in VBOs I Upload/reuse textures 3. Issue draw commands

8 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

Optimize Rendering

9 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

QUERY OBJECTS

I How to use:

glGenQueries(GLsizei n, GLuint * ids); glBeginQuery(GLenum target, GLuint id); // ... glEndQuery(GLenum target);

I Example queries: GL_SAMPLES_PASSED GL_ANY_SAMPLES_PASSED GL_PRIMITIVES_GENERATED ... I Usualy used with one frame lag

10 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

OCCLUSION CULLING

I Do not render objects hidden behind others I Helper objects – occluders I CPU processing I Analyze scene graph + occluders to filter rendered geometry I GPU processing I Z-buffer pre-render I Render occluders to Z-buffer I Occlusion queries I Temporal consistency – Z-buffer reprojection

11 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

CONDITIONAL RENDERING

I GL version is 3.0 or greater I Render cheap object I Use occlusion query to see if any of it is visible I If it isn’t, skip rendering of expensive object

glBeginConditionalRender(GLuint id, GLenum mode); glEndConditionalRender();

12 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

Z-BUFFER OPTIMIZATIONS

I Early Z-test: I Fragment cannot modify depth value I Front-to-back rendering (rough sort) I Double speed Z-only: I First pass depth (stencil) only, write z-buffer – faster when no color buffer present I Second pass color writes, z-buffer read-only I Optimizations: I Z-pass for major occluders I Deffered shading

13 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

BACKFACE CULLING

I Good closed meshes I No need for triangle back-side rasterization I GPU can filter (face cull) according to vertex order: I glEnable( GL CULL FACE ); I glFrontFace( GL CCW ); I glCullFace( GL BACK ); // draw front faces only

14 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

CLIPPING PLANES

I Objects outside of the view frustum are not rendered I Set far-plane reasonable close (also increases z-buffer precision) I Harsh end of all geometry I Render background geometry I Hide the far plane in fog (glFog()) I More sophisticated effect in fragment shader/deffered shading

15 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

LEVELOF DETAIL (LOD)

I Optimal rendering efficiency: I details of distant objects ( pixel size) need not be drawn I the closest objects (and/or user focus) need the best available visual quality I dynamic level of detail I rendering system adjusts individual rendering accuracy I global parameter definition (e.g. total approximate number of drawn triangles) I advance data preparation: discrete LoD levels / continuous LoD pre-processing..

16 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

DISCRETE LOD

I fixed LoD levels are prepared in advance I shape approximations with different accuracy I can come from the finest model – generalization (can be time consuming) I rendering – choosing appropriate LoD level: I according to object-camera distance I according to bounding object projection size or even exact object projection – errors perpendicular to viewing direction are most noticeable I object importance, “player focus”, .. I global balancing (declared number of triangles to draw)

17 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

LODTRANSITIONS

I simple switching I hysteresis must be introduced (against unwanted ”popping”) I blending neighboring LoD levels I drawing both neighboring levels using transparency I linear combination (blending, “transition”) I another option I current LoD level is opaque I additional semitransparent new LoD (“over” a operation) I “z-writing” enabled only for the current LoD level

18 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

GENERATING LODS

I top-down approach I based on the simplest shape representation (low-poly), additional details are introduced I less frequent (subdivision surfaces..) I bottom-up I starts with detailed (most accurate) model I gradual simplification / generalization (data reduction)

19 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

TRIANGLE STRIPS/FANS

v0 v0 v1

v1 v1 v2

v2 v2 v0 v3 v3 v3

v5 v5 v4 v4 v4 v5 Figure: Figure: Figure: GL TRIANGLES GL TRIANGLE STRIP GL TRIANGLE FAN

20 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

LIMIT DRAW CALLS

I Instanced rendering I Multidraw commands – bundle more indexed render calls together void glMultiDrawElements(GLenum mode, const GLsizei * count, GLenum type, const GLvoid *indices, GLsizei drawcount); I Indirect rendering I GL DRAW INDIRECT BUFFER I Fill on GPU void glDrawElementsIndirect() void glDrawArraysIndirect() void glMultiDrawElementsIndirect()

21 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

MULTITHREADING

I Contexts are not thread-safe I Context can be current only in one thread I It can be used in multiple threads, but not in parallel I Multiple contexts can be used concurrently I Enable resource sharing – textures, VBOs, . . . I Split preparation work between multiple threads (task systems)

22 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

Textures

23 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

TEXTURE ACCESS

I Texture accesses expensive I Limit number of tex reads per fragment I Use reasonable sized textures I Use mip-mapping I Neighboring fragments should access neighboring texels I Textures have spatial caching

24 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

TEXTURE COMPRESSION I

I Invention: S3 in DirectX 6 (1998) I DXTC, in OpenGL: S3TC, DXT1 I Fixed compression ratio I necessary for memory management I 4:1 to 6:1 lossy compression I Decomposition into rectangle tiles (4 × 4 px) I each tile: two 16- colors and sixteen 2-bit indices (together – 4 bpp) I two extreme colors (R5G6B5), two more in-between colors (or one in-between and black) I each pixel is represented by a reference to one color I Extensions add alpha compression (128-bit): DXT3, DXT5

25 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

TEXTURE COMPRESSION II

I NVIDIA VTC (Volume Texture Compression) I 3D variant of S3TC I data blocks 4 × 4 × 1, 4 × 4 × 2, 4 × 4 × 4 I BPTC Texture Compression: I stream I Unsigned + float I Multiple gradients

26 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

Other APIs

27 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

OPENGLES

I OpenGL for Embedded Systems (smartphones, tablet computers, video game consoles) I Most widely deployed 3D graphics API in history I Subset of the OpenGL API I Latest version of OpenGL ES 3.2 I Adds tesselation and geometry

28 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

WEBGL

I JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins I Control code written in JavaScript and shader code that is written in OpenGL ES Shading Language I WebGL 1.0 is based on OpenGL ES 2.0, HTML5 canvas element I WebGL 2.0 is based on OpenGL ES 3.0 I WebAssembly I Compile code for virtual architecture I Executable in browser + OpenGL ES

29 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

DIRECTX 11 ()

I Available only for Microsoft platforms I Wraps same hardware – similar principles I API organization different (state machine vs. parameter objects) I Component Object Model (COM) I MS binary-interface standard for software components I Support for interprocess communication, multiple languages I Entities (textures, render targets, shaders, . . . ) are COM objects I Shading language HLSL I Similar to GLSL (almost 1:1 mapping) I Cross compilers available

30 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

DIRECT 3D

Basic interfaces:

IDXGISwapChain *swapchain ;// the pointer to the swap chain interface ID3D11Device *dev ;// the pointer to our Direct3D device interface ID3D11DeviceContext *devcon ;// the pointer to our Direct3D device context

I ID3D11Device thread-safe I Can prepare resources in multiple threads I ID3D11DeviceContext must use some synchronization I Draw commands cannot be issued in parallel

31 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

NEW APIS

I Opengl, DirectX (11 and older) I Old APIs – designed fo different ecosystems I Hotfixed for demands of modern hardware and software architectures I CPU time in driver – hot spot in lots of modern applications I Multicore CPUs cannot efficiently feed command to GPUs I AMD Mantle → Vulcan, I Explicit command buffer control I Multithreaded parallel rendering I Lower level API – developers have more control (also more code is needed) I DirectX 12 I Added similar features I Command queues, feeding from multiple threads I Metal 32 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

VULCAN

33 / 34 Optimizations Intro Optimize Rendering Textures Other APIs

VULCAN USAGE

I Create a VkInstance I Select a supported graphics card (VkPhysicalDevice) I Create a VkDevice and VkQueue for drawing and presentation I Create a window, window surface and swap chain I Wrap the swap chain images into VkImageView I Create a render pass that specifies the render targets and usage I Create framebuffers for the render pass I Set up the graphics pipeline I Allocate and record a command buffer with the draw commands for every possible swap chain image I Draw frames by acquiring images, submitting the right draw command buffer and returning the images back to the swap chain 34 / 34