Realtime Computer Graphics on Gpus Speedup Techniques, Other Apis
Total Page:16
File Type:pdf, Size:1020Kb
Optimizations Intro Optimize Rendering Textures Other APIs Realtime Computer Graphics on GPUs Speedup Techniques, Other APIs Jan Kolomazn´ık Department of Software and Computer Science Education Faculty of Mathematics and Physics Charles University in Prague 1 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Optimizations Intro 2 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PERFORMANCE BOTTLENECKS I I Most of the applications require steady framerate – What can slow down rendering? I Too much geometry rendered I CPU/GPU can process only limited amount of data per second I Render only what is visible and with adequate details I Lighting/shading computation I Use simpler material model I Limit number of generated fragments I Data transfers between CPU/GPU I Try to reuse/cache data on GPU I Use async transfers 3 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PERFORMANCE BOTTLENECKS II I State changes I Bundle object by materials I Use UBOs (uniform buffer objects) I GPU idling – cannot generate work fast enough I Multithreaded task generation I Not everything must be done in every frame – reuse information (temporal consistency) I CPU/Driver hotspots I Bindless textures I Instanced rendering I Indirect rendering 4 / 34 Optimizations Intro Optimize Rendering Textures Other APIs DIFFERENT NEEDS I Large open world I Rendering lots of objects I Not so many details needed for distant sections I Indoors scenes I Often lots of same objects – instaced rendering I Only small portion of the scene visible – occluded by walls, . I CAD, molecular biology, . I All geometry visible at once I Lots of self-occlusions I Switching levels of detail 5 / 34 Optimizations Intro Optimize Rendering Textures Other APIs PROFILING I Three rules of profiling: I Measure! I Measure! I Measure! I How to measure? I Code instrumentation I Sampling profilers I HW event counters I Profiling influences performance I Tools for GPU: I NVIDIA Nsight I RenderDoc I CodeXL 6 / 34 Optimizations Intro Optimize Rendering Textures Other APIs SCENE GRAPH I Boundary volume hierarchies (BVH) I Spatial partitioning I Binary space partitioning I Quadtrees, octrees I Grids I Additional meta-objects: I Occluders I Collision geometry 7 / 34 Optimizations Intro Optimize Rendering Textures Other APIs STANDARD RENDERING PROCEDURE 1. Analyze world view 2. Transfer data on GPU I Upload/reuse geometry in VBOs I Upload/reuse textures 3. Issue draw commands 8 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Optimize Rendering 9 / 34 Optimizations Intro Optimize Rendering Textures Other APIs QUERY OBJECTS I How to use: glGenQueries(GLsizei n, GLuint * ids); glBeginQuery(GLenum target, GLuint id); // ... glEndQuery(GLenum target); I Example queries: GL_SAMPLES_PASSED GL_ANY_SAMPLES_PASSED GL_PRIMITIVES_GENERATED ... I Usualy used with one frame lag 10 / 34 Optimizations Intro Optimize Rendering Textures Other APIs OCCLUSION CULLING I Do not render objects hidden behind others I Helper objects – occluders I CPU processing I Analyze scene graph + occluders to filter rendered geometry I GPU processing I Z-buffer pre-render I Render occluders to Z-buffer I Occlusion queries I Temporal consistency – Z-buffer reprojection 11 / 34 Optimizations Intro Optimize Rendering Textures Other APIs CONDITIONAL RENDERING I GL version is 3.0 or greater I Render cheap object I Use occlusion query to see if any of it is visible I If it isn’t, skip rendering of expensive object glBeginConditionalRender(GLuint id, GLenum mode); glEndConditionalRender(); 12 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Z-BUFFER OPTIMIZATIONS I Early Z-test: I Fragment shader cannot modify depth value I Front-to-back rendering (rough sort) I Double speed Z-only: I First pass depth (stencil) only, write z-buffer – faster when no color buffer present I Second pass color writes, z-buffer read-only I Optimizations: I Z-pass for major occluders I Deffered shading 13 / 34 Optimizations Intro Optimize Rendering Textures Other APIs BACKFACE CULLING I Good closed meshes I No need for triangle back-side rasterization I GPU can filter (face cull) according to vertex order: I glEnable( GL CULL FACE ); I glFrontFace( GL CCW ); I glCullFace( GL BACK ); // draw front faces only 14 / 34 Optimizations Intro Optimize Rendering Textures Other APIs CLIPPING PLANES I Objects outside of the view frustum are not rendered I Set far-plane reasonable close (also increases z-buffer precision) I Harsh end of all geometry I Render background geometry I Hide the far plane in fog (glFog()) I More sophisticated effect in fragment shader/deffered shading 15 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LEVEL OF DETAIL (LOD) I Optimal rendering efficiency: I details of distant objects ( pixel size) need not be drawn I the closest objects (and/or user focus) need the best available visual quality I dynamic level of detail I rendering system adjusts individual rendering accuracy I global parameter definition (e.g. total approximate number of drawn triangles) I advance data preparation: discrete LoD levels / continuous LoD pre-processing.. 16 / 34 Optimizations Intro Optimize Rendering Textures Other APIs DISCRETE LOD I fixed LoD levels are prepared in advance I shape approximations with different accuracy I can come from the finest model – generalization (can be time consuming) I rendering – choosing appropriate LoD level: I according to object-camera distance I according to bounding object projection size or even exact object projection – errors perpendicular to viewing direction are most noticeable I object importance, “player focus”, .. I global balancing (declared number of triangles to draw) 17 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LODTRANSITIONS I simple switching I hysteresis must be introduced (against unwanted ”popping”) I blending neighboring LoD levels I drawing both neighboring levels using transparency I linear combination (blending, “transition”) I another option I current LoD level is opaque I additional semitransparent new LoD (“over” a operation) I “z-writing” enabled only for the current LoD level 18 / 34 Optimizations Intro Optimize Rendering Textures Other APIs GENERATING LODS I top-down approach I based on the simplest shape representation (low-poly), additional details are introduced I less frequent (subdivision surfaces..) I bottom-up I starts with detailed (most accurate) model I gradual simplification / generalization (data reduction) 19 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TRIANGLE STRIPS/FANS v0 v0 v1 v1 v1 v2 v2 v2 v0 v3 v3 v3 v5 v5 v4 v4 v4 v5 Figure: Figure: Figure: GL TRIANGLES GL TRIANGLE STRIP GL TRIANGLE FAN 20 / 34 Optimizations Intro Optimize Rendering Textures Other APIs LIMIT DRAW CALLS I Instanced rendering I Multidraw commands – bundle more indexed render calls together void glMultiDrawElements(GLenum mode, const GLsizei * count, GLenum type, const GLvoid *indices, GLsizei drawcount); I Indirect rendering I GL DRAW INDIRECT BUFFER I Fill on GPU void glDrawElementsIndirect() void glDrawArraysIndirect() void glMultiDrawElementsIndirect() 21 / 34 Optimizations Intro Optimize Rendering Textures Other APIs MULTITHREADING I Contexts are not thread-safe I Context can be current only in one thread I It can be used in multiple threads, but not in parallel I Multiple contexts can be used concurrently I Enable resource sharing – textures, VBOs, . I Split preparation work between multiple threads (task systems) 22 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Textures 23 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE ACCESS I Texture accesses expensive I Limit number of tex reads per fragment I Use reasonable sized textures I Use mip-mapping I Neighboring fragments should access neighboring texels I Textures have spatial caching 24 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE COMPRESSION I I Invention: S3 in DirectX 6 (1998) I DXTC, in OpenGL: S3TC, DXT1 I Fixed compression ratio I necessary for memory management I 4:1 to 6:1 lossy compression I Decomposition into rectangle tiles (4 × 4 px) I each tile: two 16-bit colors and sixteen 2-bit indices (together – 4 bpp) I two extreme colors (R5G6B5), two more in-between colors (or one in-between and black) I each pixel is represented by a reference to one color I Extensions add alpha compression (128-bit): DXT3, DXT5 25 / 34 Optimizations Intro Optimize Rendering Textures Other APIs TEXTURE COMPRESSION II I NVIDIA VTC (Volume Texture Compression) I 3D variant of S3TC I data blocks 4 × 4 × 1, 4 × 4 × 2, 4 × 4 × 4 I BPTC Texture Compression: I Byte stream I Unsigned + float I Multiple gradients 26 / 34 Optimizations Intro Optimize Rendering Textures Other APIs Other APIs 27 / 34 Optimizations Intro Optimize Rendering Textures Other APIs OPENGL ES I OpenGL for Embedded Systems (smartphones, tablet computers, video game consoles) I Most widely deployed 3D graphics API in history I Subset of the OpenGL API I Latest version of OpenGL ES 3.2 I Adds tesselation and geometry shaders 28 / 34 Optimizations Intro Optimize Rendering Textures Other APIs WEBGL I JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins I Control code written in JavaScript and shader code that is written in OpenGL ES Shading Language I WebGL 1.0 is based on OpenGL ES 2.0, HTML5 canvas element I WebGL 2.0 is based on OpenGL ES 3.0 I WebAssembly I Compile