Realtime Computer Graphics on Gpus Speedup Techniques, Other Apis

Optimizations Intro Optimize Rendering Textures Other APIs Realtime Computer Graphics on GPUs Speedup Techniques, Other APIs Jan Kolomazn´ık Department of Software and Computer Science Education Faculty of Mathematics and Physics Charles University in Prague 1 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Optimizations Intro 2 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PERFORMANCE BOTTLENECKS I I Most of the applications require steady framerate – What can slow down rendering? I Too much geometry rendered I CPU/GPU can process only limited amount of data per second I Render only what is visible and with adequate details I Lighting/shading computation I Use simpler material model I Limit number of generated fragments I Data transfers between CPU/GPU I Try to reuse/cache data on GPU I Use async transfers 3 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PERFORMANCE BOTTLENECKS II I State changes I Bundle object by materials I Use UBOs (uniform buffer objects) I GPU idling – cannot generate work fast enough I Multithreaded task generation I Not everything must be done in every frame – reuse information (temporal consistency) I CPU/Driver hotspots I Bindless textures I Instanced rendering I Indirect rendering 4 / 34 Optimizations Intro Optimize Rendering Textures Other APIs DIFFERENT NEEDS I Large open world I Rendering lots of objects I Not so many details needed for distant sections I Indoors scenes I Often lots of same objects – instaced rendering I Only small portion of the scene visible – occluded by walls, . I CAD, molecular biology, . I All geometry visible at once I Lots of self-occlusions I Switching levels of detail 5 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PROFILING I Three rules of profiling: I Measure! I Measure! I Measure! I How to measure? I Code instrumentation I Sampling profilers I HW event counters I Profiling influences performance I Tools for GPU: I NVIDIA Nsight I RenderDoc I CodeXL 6 / 34 Optimizations Intro Optimize Rendering Textures Other APIs SCENE GRAPH I Boundary volume hierarchies (BVH) I Spatial partitioning I Binary space partitioning I Quadtrees, octrees I Grids I Additional meta-objects: I Occluders I Collision geometry 7 / 34 Optimizations Intro Optimize Rendering Textures Other APIs STANDARD RENDERING PROCEDURE 1. Analyze world view 2. Transfer data on GPU I Upload/reuse geometry in VBOs I Upload/reuse textures 3. Issue draw commands 8 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Optimize Rendering 9 / 34 Optimizations Intro Optimize Rendering Textures Other APIs QUERY OBJECTS I How to use: glGenQueries(GLsizei n, GLuint * ids); glBeginQuery(GLenum target, GLuint id); // ... glEndQuery(GLenum target); I Example queries: GL_SAMPLES_PASSED GL_ANY_SAMPLES_PASSED GL_PRIMITIVES_GENERATED ... I Usualy used with one frame lag 10 / 34 Optimizations Intro Optimize Rendering Textures Other APIs OCCLUSION CULLING I Do not render objects hidden behind others I Helper objects – occluders I CPU processing I Analyze scene graph + occluders to filter rendered geometry I GPU processing I Z-buffer pre-render I Render occluders to Z-buffer I Occlusion queries I Temporal consistency – Z-buffer reprojection 11 / 34 Optimizations Intro Optimize Rendering Textures Other APIs CONDITIONAL RENDERING I GL version is 3.0 or greater I Render cheap object I Use occlusion query to see if any of it is visible I If it isn’t, skip rendering of expensive object glBeginConditionalRender(GLuint id, GLenum mode); glEndConditionalRender(); 12 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Z-BUFFER OPTIMIZATIONS I Early Z-test: I Fragment shader cannot modify depth value I Front-to-back rendering (rough sort) I Double speed Z-only: I First pass depth (stencil) only, write z-buffer – faster when no color buffer present I Second pass color writes, z-buffer read-only I Optimizations: I Z-pass for major occluders I Deffered shading 13 / 34 Optimizations Intro Optimize Rendering Textures Other APIs BACKFACE CULLING I Good closed meshes I No need for triangle back-side rasterization I GPU can filter (face cull) according to vertex order: I glEnable( GL CULL FACE ); I glFrontFace( GL CCW ); I glCullFace( GL BACK ); // draw front faces only 14 / 34 Optimizations Intro Optimize Rendering Textures Other APIs CLIPPING PLANES I Objects outside of the view frustum are not rendered I Set far-plane reasonable close (also increases z-buffer precision) I Harsh end of all geometry I Render background geometry I Hide the far plane in fog (glFog()) I More sophisticated effect in fragment shader/deffered shading 15 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LEVEL OF DETAIL (LOD) I Optimal rendering efficiency: I details of distant objects ( pixel size) need not be drawn I the closest objects (and/or user focus) need the best available visual quality I dynamic level of detail I rendering system adjusts individual rendering accuracy I global parameter definition (e.g. total approximate number of drawn triangles) I advance data preparation: discrete LoD levels / continuous LoD pre-processing.. 16 / 34 Optimizations Intro Optimize Rendering Textures Other APIs DISCRETE LOD I fixed LoD levels are prepared in advance I shape approximations with different accuracy I can come from the finest model – generalization (can be time consuming) I rendering – choosing appropriate LoD level: I according to object-camera distance I according to bounding object projection size or even exact object projection – errors perpendicular to viewing direction are most noticeable I object importance, “player focus”, .. I global balancing (declared number of triangles to draw) 17 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LODTRANSITIONS I simple switching I hysteresis must be introduced (against unwanted ”popping”) I blending neighboring LoD levels I drawing both neighboring levels using transparency I linear combination (blending, “transition”) I another option I current LoD level is opaque I additional semitransparent new LoD (“over” a operation) I “z-writing” enabled only for the current LoD level 18 / 34 Optimizations Intro Optimize Rendering Textures Other APIs GENERATING LODS I top-down approach I based on the simplest shape representation (low-poly), additional details are introduced I less frequent (subdivision surfaces..) I bottom-up I starts with detailed (most accurate) model I gradual simplification / generalization (data reduction) 19 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TRIANGLE STRIPS/FANS v0 v0 v1 v1 v1 v2 v2 v2 v0 v3 v3 v3 v5 v5 v4 v4 v4 v5 Figure: Figure: Figure: GL TRIANGLES GL TRIANGLE STRIP GL TRIANGLE FAN 20 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LIMIT DRAW CALLS I Instanced rendering I Multidraw commands – bundle more indexed render calls together void glMultiDrawElements(GLenum mode, const GLsizei * count, GLenum type, const GLvoid *indices, GLsizei drawcount); I Indirect rendering I GL DRAW INDIRECT BUFFER I Fill on GPU void glDrawElementsIndirect() void glDrawArraysIndirect() void glMultiDrawElementsIndirect() 21 / 34 Optimizations Intro Optimize Rendering Textures Other APIs MULTITHREADING I Contexts are not thread-safe I Context can be current only in one thread I It can be used in multiple threads, but not in parallel I Multiple contexts can be used concurrently I Enable resource sharing – textures, VBOs, . I Split preparation work between multiple threads (task systems) 22 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Textures 23 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE ACCESS I Texture accesses expensive I Limit number of tex reads per fragment I Use reasonable sized textures I Use mip-mapping I Neighboring fragments should access neighboring texels I Textures have spatial caching 24 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE COMPRESSION I I Invention: S3 in DirectX 6 (1998) I DXTC, in OpenGL: S3TC, DXT1 I Fixed compression ratio I necessary for memory management I 4:1 to 6:1 lossy compression I Decomposition into rectangle tiles (4 × 4 px) I each tile: two 16-bit colors and sixteen 2-bit indices (together – 4 bpp) I two extreme colors (R5G6B5), two more in-between colors (or one in-between and black) I each pixel is represented by a reference to one color I Extensions add alpha compression (128-bit): DXT3, DXT5 25 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE COMPRESSION II I NVIDIA VTC (Volume Texture Compression) I 3D variant of S3TC I data blocks 4 × 4 × 1, 4 × 4 × 2, 4 × 4 × 4 I BPTC Texture Compression: I Byte stream I Unsigned + float I Multiple gradients 26 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Other APIs 27 / 34 Optimizations Intro Optimize Rendering Textures Other APIs OPENGL ES I OpenGL for Embedded Systems (smartphones, tablet computers, video game consoles) I Most widely deployed 3D graphics API in history I Subset of the OpenGL API I Latest version of OpenGL ES 3.2 I Adds tesselation and geometry shaders 28 / 34 Optimizations Intro Optimize Rendering Textures Other APIs WEBGL I JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins I Control code written in JavaScript and shader code that is written in OpenGL ES Shading Language I WebGL 1.0 is based on OpenGL ES 2.0, HTML5 canvas element I WebGL 2.0 is based on OpenGL ES 3.0 I WebAssembly I Compile

Realtime Computer Graphics on Gpus Speedup Techniques, Other Apis

Amd Filed: February 24, 2009 (Period: December 27, 2008)

Candidate Features for Future Opengl 5 / Direct3d 12 Hardware and Beyond 3 May 2014, Christophe Riccio

AMD Firepro™ W5000

AMD Firepro™Professional Graphics for CAD & Engineering and Media & Entertainment

Comparison of Technologies for General-Purpose Computing on Graphics Processing Units

AMD Codexl 1.7 GA Release Notes

Media Kit 2019 Technology for Optimal Engineering Design

GPU Rigid Body Simulation Using Opencl Erwin Coumans

Club 3D Radeon R9 380 Royalqueen 4096MB GDDR5 256BIT | PCI EXPRESS 3.0

SAPPHIRE R9 285 2GB GDDR5 ITX COMPACT OC Edition (UEFI)

Porting Source to Linux

AMD Codexl 1.8 GA Release Notes