Lecture 14: Scheduling the Graphics Pipeline on a GPU

Visual Computing Systems
Stanford CS348K, Spring 2020

Today

▪ Real-time 3D graphics workload metrics

▪ Scheduling the graphics pipeline on a modern GPU

Quick Review

Real-time graphics pipeline architecture

[Diagram: pipeline stages, with memory buffers (system state) feeding each stage]

Vertex data buffers → Vertex Generation → vertex stream
Buffers, textures → Vertex Processing → vertex stream
Primitive Generation → primitive stream
Buffers, textures → Primitive Processing → primitive stream
Fragment Generation (Rasterization) → fragment stream
Buffers, textures → Fragment Processing → fragment stream
Pixel Operations → output image buffer

Programming the graphics pipeline

▪ Issue draw commands → output image contents change

Type          Command
State change  Bind shaders, textures, uniforms
Draw          Draw using vertex buffer for object 1
State change  Bind new uniforms
Draw          Draw using vertex buffer for object 2
State change  Bind new state
Draw          Draw using vertex buffer for object 3
State change  Change depth test function
State change  Bind new shader
Draw          Draw using vertex buffer for object 4

Final rendered output should be consistent with the results of executing these commands in the order they are issued to the graphics pipeline.

GPU: heterogeneous parallel processor

[Diagram: many SIMD execution units with caches, alongside fixed-function hardware for texture, tessellation, clip/cull, rasterization, and Z-buffer/blend, all connected to GPU memory and fed by a scheduler / work distributor]

We're now going to talk about this scheduler.

Graphics workload metrics

Let's consider different workloads: average triangle size

Image credits:
https://www.theverge.com/2013/11/29/5155726/next-gen-supplementary-piece
http://www.mobygames.com/game/android/ghostbusters-slime-city/screenshots/gameShotId,852293/

Triangle size (data from 2010)

[Bar chart: percentage of total triangles vs. triangle area in pixels, binned [0-1], [1-5], [5-10], [10-20], ... [90-100], [>100]]

Low geometric detail

Credit: Pro Evolution Soccer 2010 (Konami)

Surface tessellation: procedurally generate fine triangle mesh from coarse mesh representation

[Slide shows the paper "Approximating Subdivision Surfaces with Gregory Patches for Hardware Tessellation" — Charles Loop (Microsoft Research), Scott Schaefer (Texas A&M University), Tianyun Ni and Ignacio Castaño (NVIDIA) — including its Figure 1 (input control mesh, generated parametric patches, displacement-mapped result) and Figure 2 (the Direct3D 11 graphics pipeline, with the new tessellator unit and programmable hull and domain shader stages). Images: coarse geometry vs. post-tessellation (fine) geometry. Image credit: Loop et al. 2009]

Graphics pipeline with tessellation

Five programmable stages in modern pipeline (OpenGL 4, Direct3D 11):

Vertex Generation → coarse vertices
Vertex Processing (1 in / 1 out) → coarse primitives
Coarse Primitive Processing (1 in / 1 out)
Tessellation (1 in / N out) → fine vertices
Fine Vertex Processing (1 in / 1 out)
Fine Primitive Generation (3 in / 1 out, for tris) → fine primitives
Fine Primitive Processing (1 in / small N out)
Rasterization / Fragment Generation (1 in / N out) → fragments
Fragment Processing (1 in / 1 out) → pixels
Frame-Buffer Ops (1 in / 0 or 1 out)

(For comparison, the slide also shows the pre-tessellation pipeline: Vertex Generation → Vertex Processing → Primitive Generation → Primitive Processing → Rasterization → Fragment Processing → Frame-Buffer Ops.)

Scene depth complexity

[Image credit: Imagination Technologies]

Rough approximation: T × A = S × D
  T = # triangles
  A = average triangle area
  S = pixels on screen
  D = average depth complexity

Amount of data generated (size of stream between consecutive stages)

[Diagram: the tessellation pipeline annotated with stream sizes]
- Input: compact geometric model (coarse vertices)
- After tessellation: high-resolution (post-tessellation) mesh
- "Diamond" structure of graphics workload: intermediate data streams tend to be larger than scene inputs or image output
- Output: frame-buffer pixels

Key 3D graphics workload metrics

▪ Data amplification from stage to stage
  - Average triangle size (amplification in rasterizer: 1 triangle → N pixels)
  - Expansion during primitive processing (if enabled)
  - Tessellation factor (if tessellation enabled)

▪ [Vertex/fragment/geometry] shader cost
  - How many instructions?
  - Ratio of math to data access instructions?

▪ Scene depth complexity
  - Determines number of depth and color buffer writes

Graphics pipeline workload changes dramatically across draw commands

▪ Triangle size is scene and frame dependent
  - Move far away from an object, triangles get smaller
  - Varies within a frame (characters are usually higher-resolution meshes than buildings)

▪ Varying complexity of materials, different numbers of lights illuminating surfaces
  - Tens to a few hundred instructions per shader

▪ Stages can be disabled
  - Depth-only rendering = NULL fragment shader
  - Post-processing effects = no vertex work

▪ Thousands of state changes and draw calls per frame

Example: rendering a "depth map" requires vertex but no fragment shading

Parallelizing the graphics pipeline

Adopted from slides by Kurt Akeley and Pat Hanrahan (Stanford CS448 Spring 2007)

GPU: heterogeneous parallel processor

[Diagram repeated: SIMD execution units with caches, plus fixed-function texture, tessellation, clip/cull, rasterize, and Z-buffer/blend hardware around the scheduler / work distributor and GPU memory]

We're now going to talk about this scheduler.

Reminder: requirements + workload challenges

▪ Pipeline accepts sequence of commands
  - Draw commands
  - State modification commands

▪ Processing commands has sequential semantics
  - Effects of command A must be visible before those of command B

▪ Relative cost of pipeline stages changes frequently and unpredictably (e.g., due to changing triangle size, rendering mode)

▪ Ample opportunities for parallelism
  - Many triangles, vertices, fragments, etc.

Simplified pipeline

Application

Geometry Processing
(For now: just consider all work, i.e., vertex/primitive processing, tessellation, etc., as "geometry" processing. I'm drawing the pipeline this way to match the suggested readings under this lecture.)

Rasterization

Fragment Processing

Frame-Buffer Ops

Output image

Simple parallelization (pipeline parallelism)

Separate hardware unit is responsible for executing the work in each stage.

What is my maximum speedup? (At most the number of pipeline stages; in practice, throughput is limited by the slowest stage.)

Application

Geometry Processing

Rasterization

Fragment Processing

Frame-Buffer Ops

Output image

A cartoon GPU: assume we have four separate processing pipelines. Leverages the data-parallelism present in the rendering computation.

Application

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

More realistic GPU

▪ A set of programmable cores (run vertex and fragment shader programs)
▪ Hardware for rasterization, texture mapping, and frame-buffer access

Rasterizer Rasterizer Rasterizer Rasterizer Depth Test Depth Test Depth Test Depth Test Shader Shader Shader Shader Processor Core Processor Core Processor Core Processor Core

Data Cache Texture Data Cache Texture Data Cache Texture Data Cache Texture

Render Target Blend Render Target Blend Render Target Blend Render Target Blend

Molnar's "sorting" taxonomy: implementations characterized by where communication occurs in the pipeline

Sort first: communication between Application and Geometry Processing
Sort middle: between Geometry Processing and Rasterization
Sort last fragment: between Fragment Processing and Frame-Buffer Ops
Sort last image composition: after Frame-Buffer Ops, before the output image

[Diagram: four replicated pipelines (Geometry Processing, Rasterization, Fragment Processing, Frame-Buffer Ops) between the Application and the output image, with the four sort points labeled]

Note: The term "sort" can be misleading for some. It may be helpful to instead use the term "distribution" rather than sort: the implementations are characterized by how and when they redistribute work onto processors. *

* The origin of the term "sort" is from "A Characterization of Ten Hidden-Surface Algorithms," Sutherland et al. 1974.

Sort first

Sort first

Application Sort!

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

Assign each replicated pipeline responsibility for a region of the output image. Do a minimal amount of work (compute screen-space vertex positions of each triangle) to determine which region(s) each input primitive overlaps.

Sort first work partitioning (partition the primitives to parallel units based on screen overlap)

[Screen divided into four regions:]
  1 | 2
  3 | 4
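The partitioning step can be sketched as follows. This is a minimal sketch: it assumes a 2×2 region layout matching the picture, and it approximates triangle/region overlap with the triangle's screen-space bounding box (a conservative test):

```python
# Sort-first sketch: compute only the screen-space bounding box of each
# triangle, then assign it to every pipeline whose region it overlaps.
def regions_overlapped(tri_xy, screen_w, screen_h):
    """tri_xy: three (x, y) screen-space vertices. Returns the set of
    region ids for a 2x2 tiling numbered 1 2 / 3 4 as on the slide."""
    xs = [v[0] for v in tri_xy]
    ys = [v[1] for v in tri_xy]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    half_w, half_h = screen_w / 2, screen_h / 2
    regions = set()
    corners = [(0, 0), (half_w, 0), (0, half_h), (half_w, half_h)]
    for r, (rx, ry) in enumerate(corners, start=1):
        # conservative test: bounding boxes overlap
        if (min_x < rx + half_w and max_x > rx and
                min_y < ry + half_h and max_y > ry):
            regions.add(r)
    return regions

# A triangle straddling the vertical midline of a 100x100 screen goes
# to both top regions (duplicated geometry work -- the "bad" case later):
print(regions_overlapped([(40, 10), (60, 10), (50, 30)], 100, 100))  # {1, 2}
```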

Sort first

Application

Sort!

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Good:
  - Simple parallelization: just replicate rendering pipeline and operate independently in screen regions (order maintained in each)
  - More parallelism = more performance
  - Small amount of sync/communication (communicate original triangles)
  - Early fine occlusion cull ("early z") just as easy as in a single pipeline

Sort first

Application

Sort!

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Bad:
  - Potential for workload imbalance (one part of screen contains most of scene)
  - Extra cost of triangle "pre-transformation" (needed to sort)
  - "Tile spread": as screen tiles get smaller, primitives cover more tiles (duplicate geometry processing across multiple parallel pipelines)

Sort first examples

▪ WireGL/Chromium* (parallel rendering with a cluster of GPUs)
  - "Front-end" node sorts primitives to machines
  - Each GPU is a full rendering pipeline (responsible for part of screen)

▪ Initial parallel versions of Pixar's RenderMan
  - Multi-core software renderer
  - Sort surfaces into screen tiles prior to tessellation

* Chromium can also be configured as a sort-last image composition system

Sort middle

Sort middle

Application

Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Sort!

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

Distribute primitives to pipelines (e.g., round-robin distribution). Assign each rasterizer a region of the render target. Sort after geometry processing, based on the screen-space projection of primitive vertices.

Interleaved mapping of screen

▪ Decreases the chance of one rasterizer processing most of the scene
▪ Most triangles overlap multiple screen regions (often overlap all)

Interleaved mapping (regions alternate between rasterizers):
  1 2 1 2
  2 1 2 1

Tiled mapping: each rasterizer owns one contiguous region of the screen
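The two static mappings can be contrasted in a few lines. This sketch assumes four rasterizers (0-3) rather than the slide's two, and made-up tile-grid sizes; the point is how many distinct owners a small cluster of screen tiles touches:

```python
# Two static assignments of screen tiles to rasterizers.
def interleaved_owner(tile_x, tile_y, n=2):
    # n x n pattern repeated across the screen: adjacent tiles go to
    # different rasterizers, so no one rasterizer owns a hot region
    return (tile_x % n) + (tile_y % n) * n

def tiled_owner(tile_x, tile_y, tiles_w, tiles_h, n=2):
    # contiguous quadrants: each rasterizer owns one block of the screen
    return (min(tile_x * n // tiles_w, n - 1)
            + min(tile_y * n // tiles_h, n - 1) * n)

# A small 2x2 cluster of tiles (e.g., one medium triangle) touches all
# four rasterizers under interleaving (good balance, but a broadcast),
# yet only one owner under the tiled mapping (little communication,
# but possible imbalance):
cluster = [(x, y) for x in range(2) for y in range(2)]
print({interleaved_owner(x, y) for x, y in cluster})  # {0, 1, 2, 3}
print({tiled_owner(x, y, 8, 8) for x, y in cluster})  # {0}
```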

Fragment interleaving in NVIDIA Fermi

Fine granularity interleaving Coarse granularity interleaving

Question 1: what are the benefts/weaknesses of each interleaving? Question 2: notice anything interesting about these patterns?

[Image source: NVIDIA]

Sort middle interleaved

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Sort! - BROADCAST

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Good:
  - Workload balance: both for geometry work AND onto rasterizers (due to interleaving)
  - Does not duplicate geometry processing for each overlapped screen region

Sort middle interleaved

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Sort! - BROADCAST

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Bad:
  - Bandwidth scaling: sort is implemented as a broadcast (each triangle goes to many/all rasterizers because of the interleaved screen mapping)
  - If tessellation is enabled, must communicate many more primitives than sort first
  - Duplicated per-triangle work across rasterizers

SGI RealityEngine [Akeley 93]: a sort-middle interleaved design

Sort-middle tiled (a.k.a. "chunking", "bucketing", "binning")

Step 1: sort triangles into bins
▪ Divide screen into tiles, one triangle list per "tile" of screen (called a "bin")
▪ Core runs vertex processing, computes 2D triangle/screen-tile overlap, inserts triangle into appropriate bin(s)

List of scene triangles

Core 1 Core 2 Core 3 Core 4

After processing the first five triangles (screen divided into a grid of 12 bins; triangles 1-3 fall in bin 1's tile, triangle 4 straddles bins 1 and 2, triangle 5 falls in bin 2's tile):

Bin 1 list: 1, 2, 3, 4
Bin 2 list: 4, 5
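Step 1 can be sketched as a bucketing pass over the triangle list. The tile size is an assumption of this sketch, and overlap is approximated by the triangle's 2D bounding box (conservative: a triangle may be inserted into a bin it does not actually cover):

```python
from collections import defaultdict

def bin_triangles(triangles, screen_w, screen_h, tile=16):
    """triangles: list of three (x, y) screen-space vertex tuples.
    Returns {bin_index: [triangle ids]}, bins numbered row-major."""
    tiles_w = (screen_w + tile - 1) // tile
    bins = defaultdict(list)
    for tri_id, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # conservative overlap test: the triangle's 2D bounding box
        x0, x1 = int(min(xs)) // tile, int(max(xs)) // tile
        y0, y1 = int(min(ys)) // tile, int(max(ys)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[ty * tiles_w + tx].append(tri_id)
    return bins

# A small triangle lands in one bin; a larger one is inserted into
# every bin its bounding box touches:
bins = bin_triangles([[(2, 2), (10, 2), (6, 10)],
                      [(2, 2), (30, 2), (16, 30)]], 32, 32)
print(dict(bins))  # {0: [0, 1], 1: [1], 2: [1], 3: [1]}
```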

Sort-middle interleaved vs. binning

Processor 1 | Processor 2
Processor 3 | Processor 4

Interleaved (static) assignment of screen tiles to processors:
  0 1 2 3 0 1
  2 3 0 1 2 3
  0 1 2 3 0 1
  2 3 0 1 2 3

Assignment to bins (B0 ... B23): the list of bins is a work queue; bins are dynamically assigned to processors.

Step 2: per-tile processing

▪ Cores process bins in parallel, performing rasterization, fragment shading, and frame-buffer update

▪ While there are more bins to process:
  - Assign bin to available core
  - For all triangles in the bin's list:
    - Rasterize
    - Fragment shade
    - Depth test
    - Update frame buffer
  - Write final pixels for the N×N tile of the render target

[Diagram: a processor core with rasterizer, depth test, shader, data cache, texture, and render-target blend units, consuming the bin's triangle list]

Reminder: reading less data conserves power

▪ Goal: redesign algorithms so that they make good use of on-chip memory or processor caches
  - And therefore transfer less data from memory

▪ A fact you might not have heard: it is far more costly (in energy) to load/store data from memory than it is to perform an arithmetic operation

"Ballpark" numbers [Sources: Bill Dally (NVIDIA), Tom Olson (ARM)]
  - Integer op: ~1 pJ *
  - Floating point op: ~20 pJ *
  - Reading 64 bits from small local SRAM (1 mm away on chip): ~26 pJ
  - Reading 64 bits from low-power mobile DRAM (LPDDR): ~1200 pJ

Implication: reading 10 GB/sec from memory costs ~1.6 watts

* Cost to just perform the logical operation, not counting the overhead of instruction decode, loading data from registers, etc.

What should the screen size of the bins be?

▪ Small enough for a tile of the color buffer and depth buffer (potentially supersampled) to fit in a shader processor core's on-chip storage (i.e., cache)

▪ Tile sizes in range 16x16 to 64x64 pixels are common

▪ ARM Mali GPU: commonly uses 16x16 pixel tiles

Binning "sorts" the scene in 2D space to enable efficient color/depth buffer access

Consider rendering without a sort (process triangles in order given): this sample is updated three times, but may have fallen out of cache in between accesses.

[Diagram: triangles 1-8 scattered across the screen, several overlapping the same sample]

Now consider step 2 of a tiled renderer:

  initialize Z and color buffer for tile
  for each triangle in tile:
      for each fragment:
          shade fragment
          update depth/color
  write color tile to final image buffer

Q. Why doesn't the renderer need to read the color or depth buffer from memory?
Q. Why doesn't the renderer need to write the depth buffer to memory? *

* Assuming the application does not need the depth buffer for other purposes.

Tile-based deferred rendering (TBDR)

▪ Many mobile GPUs implement deferred shading in the hardware!
▪ Divide step 2 of the tiled pipeline into two phases:
  - Phase 1: compute what fragment is visible at every sample
  - Phase 2: perform shading of only the visible quad fragments

[Diagram: triangles T1-T4 overlapping a tile's samples 1-8; after phase 1, each sample records the triangle visible at it, e.g., T1 visible at samples 1-3]
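A minimal sketch of the two phases for one tile. The fragment representation and shading interface here are assumptions of the sketch; the point is that each sample is shaded at most once, regardless of depth complexity:

```python
def tbdr_tile(fragments, num_samples, shade):
    """fragments: iterable of (sample_index, depth, triangle_id) produced
    by rasterizing the tile's triangles. shade(tri, sample) -> color."""
    # Phase 1: depth-only visibility pass over all fragments
    depth = [float("inf")] * num_samples
    visible = [None] * num_samples
    for sample, z, tri in fragments:
        if z < depth[sample]:
            depth[sample] = z
            visible[sample] = tri
    # Phase 2: shade only the surviving (visible) fragment per sample
    return [None if tri is None else shade(tri, s)
            for s, tri in enumerate(visible)]

# Three triangles cover sample 0, but shade() runs only once for it:
calls = []
frags = [(0, 0.9, "T1"), (0, 0.5, "T2"), (0, 0.7, "T3"), (1, 0.3, "T1")]
colors = tbdr_tile(frags, 2, lambda tri, s: calls.append(tri) or tri)
print(colors, len(calls))  # ['T2', 'T1'] 2
```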

Sort middle tiled (chunked)

▪ Good:
  - Good load balance (distribute many buckets onto rasterizers)
  - Low bandwidth requirements (why? when?)
  - Challenge: the "bucketing" sort has low contention (assuming each triangle only touches a small number of buckets), but there still is contention

▪ Recent examples:
  - Many mobile GPUs: Imagination PowerVR, ARM Mali, Qualcomm
  - Parallel software rasterizers:
    - Intel Larrabee software rasterizer
    - NVIDIA CUDA software rasterizer

Sort last

Sort last fragment

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Sort! - point-to-point

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

Distribute primitives to the top of the pipelines (e.g., round robin). Sort after fragment processing, based on the (x, y) position of each fragment.

Sort last fragment

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Sort! - point-to-point

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Good:
  - No redundant geometry processing or rasterization (but early z-cull is a problem)
  - Point-to-point communication during sort
  - Interleaved pixel mapping results in good workload balance for frame-buffer ops

Sort last fragment

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Sort! - point-to-point

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

▪ Bad:
  - Pipelines may stall due to primitives of varying size (due to the ordering requirement)
  - Bandwidth scaling: many more fragments than triangles
  - Hard to implement early occlusion cull (more bandwidth challenges)

Sort last image composition

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

frame buffer 0 | frame buffer 1 | frame buffer 2 | frame buffer 3

Merge Output image

Each pipeline renders some fraction of the geometry in the scene. Combine the color buffers, according to depth, into the final image.
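The merge step can be sketched as a per-pixel depth compare over the N color/depth buffer pairs. The data layout here is an assumption of the sketch, and note (anticipating the next slide) that this merge cannot honor order-dependent operations such as alpha blending:

```python
# Sort-last image composition: keep, per pixel, the color whose depth
# is nearest across all pipelines' full-screen buffers.
def depth_composite(buffers):
    """buffers: list of (color, depth) list pairs, all the same length."""
    n = len(buffers[0][0])
    out_color = [None] * n
    out_depth = [float("inf")] * n
    for color, depth in buffers:
        for i in range(n):
            if depth[i] < out_depth[i]:
                out_depth[i] = depth[i]
                out_color[i] = color[i]
    return out_color

# Pipeline 0 wins pixel 0 (nearer depth); pipeline 1 wins pixel 1:
print(depth_composite([(["red", "red"], [0.5, 0.9]),
                       (["blue", "blue"], [0.7, 0.2])]))  # ['red', 'blue']
```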

Sort last image composition

▪ Breaks the graphics pipeline architecture's abstraction: cannot maintain the pipeline's sequential semantics

▪ Simple implementation: N separate rendering pipelines
  - Can use off-the-shelf GPUs to build a massive rendering system
  - Coarse-grained communication (image buffers)

▪ Similar load imbalance problems as sort-last fragment

▪ Under high depth complexity, the bandwidth requirement is lower than sort-last fragment
  - Communicate final pixels, not all fragments

Recall: modern OpenGL 4 / Direct3D 11 pipeline (five programmable stages)

Vertex Generation → coarse vertices
Vertex Processing (1 in / 1 out) → coarse primitives
Coarse Primitive Processing (1 in / 1 out)
Tessellation (1 in / N out) → fine vertices
Fine Vertex Processing (1 in / 1 out)
Fine Primitive Generation (3 in / 1 out, for tris) → fine primitives
Fine Primitive Processing (1 in / small N out)
Rasterization / Fragment Generation (1 in / N out) → fragments
Fragment Processing (1 in / 1 out) → pixels
Frame-Buffer Ops (1 in / 0 or 1 out)

Modern GPU: programmable parts of the pipeline virtualized on a pool of programmable cores

[Diagram: a command processor / vertex generation unit feeding a work distributor/scheduler; a pool of programmable cores shared over a high-speed interconnect with fixed-function texture, tessellation, rasterizer, and frame-buffer-ops units; inter-stage vertex, primitive, and fragment queues between them]

Hardware is a heterogeneous collection of resources (programmable and non-programmable).
Programmable resources are time-shared by vertex/primitive/fragment processing work.
Must keep programmable cores busy: sort everywhere.
Hardware work distributor assigns work to cores (based on contents of inter-stage queues).

Sort everywhere (how modern high-end GPUs are scheduled)

[Eldridge 00]

Sort everywhere

Application Distribute

Geometry Processing Geometry Processing Geometry Processing Geometry Processing

Redistribute! - point-to-point

Rasterization Rasterization Rasterization Rasterization

Fragment Processing Fragment Processing Fragment Processing Fragment Processing

Sort! - point-to-point

Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops Frame-Buffer Ops

Output image

Distribute primitives to the top of the pipelines. Redistribute after geometry processing (e.g., round robin). Sort after fragment processing, based on the (x, y) position of each fragment.

Implementing sort everywhere

(Challenge: rebalancing work at multiple places in the graphics pipeline to achieve efficient parallel execution, while maintaining triangle draw order)
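The ordering mechanism the following walkthrough builds up can be sketched as a token protocol (the names, queue shapes, and round-robin batching here are assumptions of the sketch): a rasterizer appends a "switch" token after each batch, and a frame-buffer unit advances to the next rasterizer's queue only when it sees that token, so fragments retire in triangle draw order even though rasterizers run in parallel:

```python
from collections import deque

SWITCH = object()   # token: "this rasterizer's batch is done; move on"

def framebuffer_consume(queues):
    """queues: one deque per rasterizer, holding fragments and SWITCH
    tokens. Yields fragments in draw order; stalls (returns) rather
    than skipping ahead if the current queue is empty."""
    current = 0
    while True:
        q = queues[current]
        if not q:
            return                      # must wait for the current queue
        item = q.popleft()
        if item is SWITCH:
            current = (current + 1) % len(queues)
        else:
            yield item

# Rast 0 got T1 and T2 (batch of two), rast 1 got T3: even if rast 1
# finishes first, its fragments drain only after rast 0's batch.
q0 = deque(["T1 frag", "T2 frag", SWITCH])
q1 = deque(["T3 frag", SWITCH])
print(list(framebuffer_consume([q0, q1])))  # ['T1 frag', 'T2 frag', 'T3 frag']
```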

Starting state: draw commands enqueued for pipeline

Input: three triangles to draw (the fragments to be generated for each triangle by rasterization are shown below).

Command queue (oldest at bottom): Draw T3, Draw T2, Draw T1

[Diagram: a Geometry stage feeding Rasterizer 0 and Rasterizer 1, each feeding Frag Processing 0/1 and then Frame-buffer 0/1. T1 and T2 each produce four fragments; T3 produces three. Assume batch size is 2 for assignment to rasterizers. The render target is interleaved between frame-buffer 0 and frame-buffer 1 (0 1 / 1 0 checkerboard).]

After geometry processing, first two processed triangles assigned to rast 0

[Diagram state: the input queue holds Draw T3; Rasterizer 0's queue holds T1 and T2 (batch size is 2 for assignment to rasterizers).]

Assign next triangle to rast 1 (round-robin policy, batch size = 2). Q. What is the 'next' token for?

[Diagram state: Rasterizer 0's queue holds T1, T2, then a 'Next' token; Rasterizer 1's queue holds T3.]

Rast 0 and rast 1 can process T1 and T3 simultaneously (shaded fragments enqueued in frame-buffer unit input queues)

[Diagram state: 'Next' and T2 remain queued at Rasterizer 0; the frame-buffer input queues now hold T1's fragments (T1,1-T1,4) and T3's fragments (T3,1-T3,3), split between FB 0 and FB 1 by the interleaved pixel mapping.]

FB 0 and FB 1 can simultaneously process fragments from rast 0 (Notice updates to frame buffer)

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Next T2

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,3 1 0 T1,1 T3,1 T3,2 T1,2 T1,3 T1,4 Interleaved Frame-buffer 0 Frame-buffer 1 render target

Fragments from T3 cannot be processed yet. Why?

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Next T2

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,3 1 0 T1,1 T3,1 T3,2 T1,2 T1,3 T1,4 Interleaved Frame-buffer 0 Frame-buffer 1 render target

Rast 0 processes T2 (Shaded fragments enqueued in frame-buffer unit input queues)

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Next

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T2,3 T3,3 T2,4 1 0 T1,1 T2,1 T3,1 T2,2 T3,2 T1,2 T1,3 T1,4 Interleaved Frame-buffer 0 Frame-buffer 1 render target

Rast 0 broadcasts ‘next’ token to all frame-buffer units

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 Switch Switch T2,3 T3,3 T2,4 1 0 T1,1 T2,1 T3,1 T2,2 T3,2 T1,2 T1,3 T1,4 Interleaved Frame-buffer 0 Frame-buffer 1 render target

FB 0 and FB 1 can simultaneously process fragments from rast 0 (Notice updates to frame buffer)

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,3 1 0 T1,1 Switch T3,1 Switch T3,2 T1,2 T1,3 T1,4 T2,1 Interleaved render target Frame-buffer 0 Frame-buffer 1 T2,2 T2,3 T2,4

Switch token reached: frame-buffer units start processing input from rast 1 Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,3 1 0 T1,1 T3,1 T3,2 T1,2 T1,3 T1,4 T2,1 Interleaved render target Frame-buffer 0 Frame-buffer 1 T2,2 T2,3 T2,4

FB 0 and FB 1 can simultaneously process fragments from rast 1 (Notice updates to frame buffer)

Input:

Geometry Draw T1 1 2 3 4 Draw T2 1 2 3 4

Draw T3 1 2 3

Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 1 0 T1,1 T1,2 T1,3 T3,1 T3,2 T1,4 T2,1 T3,3 Interleaved render target Frame-buffer 0 Frame-buffer 1 T2,2 T2,3 T2,4
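The single-geometry-unit walkthrough above can be captured in a small simulation: round-robin distribution in batches of two, a switch token broadcast downstream each time a producer rotates, and frame-buffer units that rotate among their input queues only when they consume a token. This is an illustrative sketch (the triangle fragment positions are made up), not hardware pseudocode:

```python
from collections import deque

SWITCH = "SWITCH"  # token broadcast downstream when a producer rotates

# Hypothetical fragment positions for three triangles (made up for the sketch).
TRIS = {"T1": [(0, 0), (1, 0), (0, 1), (1, 1)],
        "T2": [(2, 0), (3, 0), (2, 1), (3, 1)],
        "T3": [(0, 2), (1, 2), (0, 3)]}
DRAW_ORDER = ["T1", "T2", "T3"]
BATCH = 2

# Stage 1: distribute draws to two rasterizer queues round-robin in batches,
# appending a switch token each time distribution rotates to the next unit.
rast_q = [deque(), deque()]
cur = 0
for i, tri in enumerate(DRAW_ORDER):
    if i > 0 and i % BATCH == 0:
        rast_q[cur].append(SWITCH)
        cur = (cur + 1) % 2
    rast_q[cur].append(tri)
for q in rast_q:
    q.append(SWITCH)  # close out the final span on each queue

# Stage 2: rasterize. Each fragment routes to the frame-buffer unit that owns
# its pixel (checkerboard interleave); switch tokens broadcast to ALL FB queues.
fb_q = [[deque(), deque()] for _ in range(2)]  # fb_q[fb_unit][rasterizer]
for r in range(2):
    for item in rast_q[r]:
        if item == SWITCH:
            for fb in range(2):
                fb_q[fb][r].append(SWITCH)
        else:
            for (x, y) in TRIS[item]:
                fb_q[(x + y) % 2][r].append((item, (x, y)))

# Stage 3: each FB unit drains its per-rasterizer queues strictly in span
# order, rotating only when it consumes a switch token: draw order is restored.
def drain(fb):
    out, r = [], 0
    while any(fb_q[fb]):
        item = fb_q[fb][r].popleft()
        if item == SWITCH:
            r = (r + 1) % 2
        else:
            out.append(item[0])
    return out

results = {fb: drain(fb) for fb in range(2)}
for fb, seen in results.items():
    # Triangles appear at each FB unit in submission order.
    assert all(DRAW_ORDER.index(a) <= DRAW_ORDER.index(b)
               for a, b in zip(seen, seen[1:]))
print(results)
```

Note the key property the tokens buy: no global synchronization, yet every frame-buffer unit observes triangles in submission order.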

Extending to parallel geometry units

Starting state: commands enqueued (Draw T1, Draw T2, Draw T3, Draw T4) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1 Assume batch size is 2 for Frag Processing 0 Frag Processing 1 assignment to geom units and to rasterizers.

0 1 1 0 Interleaved render target Frame-buffer 0 Frame-buffer 1

Distribute triangles to geom units round-robin (batches of 2)

Distrib

Next Input: T2 T4 T1 T3 Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 1 0 Interleaved render target Frame-buffer 0 Frame-buffer 1

Geom 0 and geom 1 process triangles in parallel (Results after T1 processed are shown. Note big triangle T1 broken into multiple work items. [Eldridge et al.]) Distrib Input: Next T4 T2 T3 Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 T1,b T1,a T1,c Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 1 0 Interleaved render target Frame-buffer 0 Frame-buffer 1

Geom 0 and geom 1 process triangles in parallel (Triangles enqueued in rast input queues. Note big triangles broken into multiple work items. [Eldridge et al.]) Distrib Input:

Next Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next Next 5 T1,b T3,b T2 T1,a T3,a T1,c T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 1 0 Interleaved render target Frame-buffer 0 Frame-buffer 1

Geom 0 broadcasts ‘next’ token to rasterizers

Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Switch Next Next Switch 5 T1,b T3,b T2 T1,a T3,a T1,c T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 1 0 Interleaved render target Frame-buffer 0 Frame-buffer 1

Rast 0 and rast 1 process triangles from geom 0 in parallel (Shaded fragments enqueued in frame-buffer unit input queues) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 Switch T3,b Next T3,a Switch T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T1,5 T2,3 T2,4 T1,3 T2,1 T1,4 T2,2 1 0 T1,1 T1,6 T1,2 T1,7 Interleaved render target Frame-buffer 0 Frame-buffer 1

Rast 0 broadcasts ‘next’ token to FB units (end of geom 0, rast 0)

Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 T3,b Switch T3,a Switch T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch 0 1 T1,5 T2,3 Switch T2,4 T1,3 T2,1 T1,4 T2,2 1 0 T1,1 T1,6 T1,2 T1,7 Interleaved render target Frame-buffer 0 Frame-buffer 1

Frame-buffer units process frags from (geom 0, rast 0) in parallel (Notice updates to frame buffer) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 T3,b Switch T3,a Switch T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T2,3 T2,4 T2,1 T2,2 1 0 Switch T1,6 Switch T1,7 T1,1 Interleaved T1,2 T1,3 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5

“End of rast 0” token reached by FB: FB units start processing input from rast 1 (fragments from geom 0, rast 1) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 T3,b Switch T3,a Switch T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T2,3 T2,4 T2,1 T2,2 1 0 T1,6 T1,7 T1,1 Interleaved T1,2 T1,3 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5

“End of geom 0” token reached by rast units: rast units start processing input from geom 1 (note “end of geom 0, rast 1” token sent to rast input queues) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 Next 5 T3,b T3,a T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch Switch 0 1 T2,3 T2,4 T2,1 T2,2 1 0 T1,6 T1,7 T1,1 Interleaved T1,2 T1,3 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5

Rast 0 processes triangles from geom 1 (Note rast 1 has work to do, but cannot make progress because its output queues are full) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5 Next T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch Switch 0 1 T3,5 T2,3 T2,4 T3,3 T2,1 T3,4 T2,2 1 0 T3,1 T1,6 T3,2 T1,7 T1,1 Interleaved T1,2 T1,3 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5

Rast 0 broadcasts “end of geom 1, rast 0” token to frame-buffer units

Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5 T4 Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch Switch Switch 0 1 T3,5 T2,3 Switch T2,4 T3,3 T2,1 T3,4 T2,2 1 0 T3,1 T1,6 T3,2 T1,7 T1,1 Interleaved T1,2 T1,3 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5

Frame-buffer units process frags from (geom 0, rast 1) in parallel (Notice updates to frame buffer. Also notice rast 1 can now make progress since space has become available)

Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch 0 1 T3,5 Switch T3,3 T4,2 T3,4 T4,1 1 0 T3,1 Switch T3,2 Switch T1,1 T2,1 T2,2 Interleaved T1,2 T1,3 T2,3 T2,4 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5 T1,6 T1,7

Switch token reached by FB: FB units start processing input from (geom 1, rast 0)

Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

Switch 0 1 T3,5 Switch T3,3 T3,4 1 0 T3,1 T4,2 T3,2 T4,1 T1,1 T2,1 T2,2 Interleaved T1,2 T1,3 T2,3 T2,4 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5 T1,6 T1,7

Frame-buffer units process frags from (geom 1, rast 0) in parallel (Notice updates to frame buffer) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,1 T3,2 1 0 T3,3 T3,4 Switch T4,2 Switch T4,1 T1,1 T3,5 T2,1 T2,2 Interleaved T1,2 T1,3 T2,3 T2,4 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5 T1,6 T1,7

Switch token reached by FB: FB units start processing input from (geom 1, rast 1) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,1 T3,2 1 0 T3,3 T3,4 T4,2 T4,1 T1,1 T3,5 T2,1 T2,2 Interleaved T1,2 T1,3 T2,3 T2,4 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5 T1,6 T1,7

Frame-buffer units process frags from (geom 1, rast 1) in parallel (Notice updates to frame buffer) Distrib Input:

Draw T1 1 2 3 4 Geometry 0 Geometry 1 5 6 7

Draw T2 1 2 3 4

Draw T3 1 2 3 4 5

Draw T4 1 2 Rasterizer 0 Rasterizer 1

Frag Processing 0 Frag Processing 1

0 1 T3,1 T3,2 1 0 T3,3 T3,4 T4,1 T4,2 T1,1 T3,5 T2,1 T2,2 Interleaved T1,2 T1,3 T2,3 T2,4 render target Frame-buffer 0 Frame-buffer 1 T1,4 T1,5 T1,6 T1,7
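With two levels of parallel producers, the tokens nest: each rasterizer forwards an end-of-span token per (geometry unit, rasterizer) pair, so a frame-buffer unit ends up draining its queues in exactly the lexicographic (geom, rast) order the slides just stepped through. A tiny sketch of that ordering invariant (illustrative only; hardware encodes it with tokens, not an explicit loop):

```python
from itertools import product

# Consumption order at a frame-buffer unit: one pass over the rasterizer
# queues per geometry span, i.e., lexicographic (geom, rast) order.
def fb_drain_order(num_geom: int, num_rast: int):
    return list(product(range(num_geom), range(num_rast)))

print(fb_drain_order(2, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```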

Parallel scheduling with data amplification

Geometry amplification ▪ Consider examples of one-to-many stage behavior during geometry processing in the graphics pipeline:

- Clipping amplifies geometry (clipping can result in multiple output primitives)

- Tessellation: the pipeline permits thousands of vertices to be generated from a single base primitive (challenging to maintain highly parallel execution)

- Primitive processing (“geometry shader”) outputs up to 1024 floats worth of vertices per input primitive
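For scale, a back-of-envelope on the geometry-shader limit above (the 16-floats-per-vertex layout is an assumption for illustration, not from the slides):

```python
# D3D11 geometry shader output limit, per the slide: 1024 floats per primitive.
MAX_GS_OUTPUT_FLOATS = 1024
BYTES_PER_FLOAT = 4
floats_per_vertex = 16  # assumed layout: position + a few attributes

max_output_bytes = MAX_GS_OUTPUT_FLOATS * BYTES_PER_FLOAT       # bytes per prim
max_verts_per_prim = MAX_GS_OUTPUT_FLOATS // floats_per_vertex  # amplification
print(max_output_bytes, max_verts_per_prim)  # 4096 64
```

So a single input primitive can expand to 4 KB of vertex data, roughly 64 vertices under this assumed layout.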

Thought experiment

Command Processor

T2 T4 T6 T8 T1 T3 T5 T7

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier

. . . Rasterizer . . .

Assume round-robin distribution of eight primitives to geometry pipelines, one rasterizer unit.

Consider the case of large amplification when processing T1

Command Processor

T2

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier

T1,6 T4,4 T6,5 T8,3 T1,5 T4,3 T6,4 T8,2 T1,4 T4,2 T6,3 T8,1 T1,3 T4,1 T6,2 T7,3 T1,2 T3,2 T6,1 T7,2 T1,1 T3,1 T5,1 T7,1

Notice: output from T1 processing fills output queue . . . Rasterizer . . .

Result: one geometry unit (the one producing outputs from T1) is feeding the entire downstream pipeline
- Serialization of geometry processing: other geometry units are stalled because their output queues are full (they cannot be drained until all work from T1 is completed)
- Underutilization of the rest of the chip: it is unlikely that one geometry producer is fast enough to produce pipeline work at a rate that fills the resources of the rest of the GPU.

Thought experiment: design a scheduling strategy for this case Command Processor

T2

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier

T1,6 T4,4 T6,5 T8,3 T1,5 T4,3 T6,4 T8,2 T1,4 T4,2 T6,3 T8,1 T1,3 T4,1 T6,2 T7,3 T1,2 T3,2 T6,1 T7,2 T1,1 T3,1 T5,1 T7,1

. . . Rasterizer . . .

1. Design a solution that is performant when the expected amount of data amplification is low.
2. Design a solution that is performant when the expected amount of data amplification is high.
3. What about a solution that works well for both?

The ideal solution always executes with maximum parallelism (no stalls), with maximal locality (units read and write to fixed-size, on-chip inter-stage buffers), and (of course) preserves order.

Implementation 1: fixed on-chip storage

Command Processor

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier

Small, on-chip buffers . . . Rasterizer . . .

Approach 1: make on-chip buffers big enough to handle common cases, but tolerate stalls
- Run fast for low amplification (never move output queue data off chip)
- Run very slow under high amplification (serialization of processing due to blocked units). Bad performance cliff.

Implementation 2: worst-case allocation

Command Processor

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier

......

Large, in-memory buffers . . . Rasterizer . . .

Approach 2: never block geometry units: allocate worst-case space in off-chip buffers (stored in DRAM)
- Run slower for low amplification (data goes off chip, then is read back in by rasterizers)
- No performance cliff for high amplification (still maximum parallelism; data still goes off chip)
- What is the overall worst-case buffer allocation if the four geometry units above are Direct3D 11 geometry shaders?

Implementation 3: hybrid
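One hedged way to approach the worst-case-allocation question above for Implementation 2: multiply the per-primitive GS output limit by the number of primitives each unit may have in flight and by the number of geometry units. The queue depth here is an assumed value for illustration, not a real GPU's:

```python
GS_LIMIT_BYTES = 1024 * 4   # 1024 floats per primitive (D3D11 GS output limit)
QUEUE_DEPTH = 32            # assumption: primitives buffered per geometry unit
NUM_GEOM_UNITS = 4          # from the figure

# Worst case: every buffered primitive in every unit amplifies to the limit.
total_bytes = GS_LIMIT_BYTES * QUEUE_DEPTH * NUM_GEOM_UNITS
print(total_bytes // 1024, "KB")  # 512 KB
```

The point of the exercise: worst-case allocation grows with both queue depth and unit count, which is why it lives in DRAM rather than on chip.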

Command Processor

Geometry Amplifier Geometry Amplifier Geometry Amplifier Geometry Amplifier (on-chip buffers)

Off-chip (spill) buffers

...... Rasterizer ......

Hybrid approach: allocate output buffers on chip, but spill to off-chip, worst-case-size buffers under high amplification
- Run fast for low amplification (high parallelism, no memory traffic)
- Less of a performance cliff for high amplification (high parallelism, but incurs more memory traffic)

NVIDIA GPU implementation: optionally resort work after the “Hull” shader stage (since the amplification factor is known at that point)
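Returning to Implementation 3: the spill policy can be sketched as a FIFO that prefers a small on-chip queue and overflows to a DRAM-backed buffer instead of stalling the producer. This is an illustrative model, not a driver implementation; note that once anything has spilled, new items also spill so FIFO order is preserved:

```python
from collections import deque

class HybridOutputBuffer:
    """FIFO with a small on-chip fast path and an off-chip spill path."""
    def __init__(self, on_chip_capacity: int):
        self.on_chip = deque()
        self.cap = on_chip_capacity
        self.spill = []                # stands in for a DRAM-backed buffer

    def push(self, item):
        # Fast path only while on-chip space remains AND nothing has spilled;
        # otherwise items must spill to keep FIFO order intact.
        if len(self.on_chip) < self.cap and not self.spill:
            self.on_chip.append(item)  # fast path: no memory traffic
        else:
            self.spill.append(item)    # slow path: off-chip traffic, no stall

    def pop(self):
        # Consume in FIFO order: on-chip entries first, then spilled ones.
        if self.on_chip:
            return self.on_chip.popleft()
        return self.spill.pop(0)

buf = HybridOutputBuffer(on_chip_capacity=2)
for v in ["a", "b", "c", "d"]:        # "c" and "d" overflow on-chip space
    buf.push(v)
print(len(buf.spill))                 # 2 items spilled off chip
print([buf.pop() for _ in range(4)])  # ['a', 'b', 'c', 'd'] (order preserved)
```

Under low amplification the spill list stays empty (fast, no memory traffic); under high amplification the producer never blocks, at the cost of off-chip bandwidth, which is exactly the trade-off on the slide.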
