Visual Computing Systems CMU 15-769, Fall 2016 Lecture
Lecture 20: Scheduling the Graphics Pipeline on a GPU

Today
▪ Real-time 3D graphics workload metrics
▪ Scheduling the graphics pipeline on a modern GPU

Quick aside: tessellation

Triangle size (data from 2010)
[Figure: histogram of percentage of total triangles (y-axis, 0 to 30) by triangle area in pixels (x-axis bins: [0-1], [1-5], [5-10], [10-20], [20-30], [30-40], [40-50], [50-60], [60-70], [70-80], [80-90], [90-100], [> 100]). Source: NVIDIA]

Low geometric detail
[Image credit: Pro Evolution Soccer 2010 (Konami)]

Surface tessellation
Procedurally generate fine triangle mesh from coarse mesh representation
[Image: coarse geometry and post-tessellation (fine) geometry. Image credit: Loop et al. 2009]

Approximating Subdivision Surfaces with Gregory Patches for Hardware Tessellation
Charles Loop (Microsoft Research), Scott Schaefer (Texas A&M University), Tianyun Ni (NVIDIA), Ignacio Castaño (NVIDIA)

Figure 1: The first image (far left) illustrates an input control mesh; regular (gold) faces do not have an incident extraordinary vertex, irregular quads (purple) have at least one extraordinary vertex, and triangular (green) faces are allowed. The second and third images show the parametric patches we generate. The final image is of the same surface with a displacement map applied.
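The slide's idea of procedurally generating a fine triangle mesh from a coarse one can be sketched in a few lines. The Python below is a minimal illustration only: it splits one flat triangle uniformly using barycentric coordinates, and it is neither the Gregory patch scheme of Loop et al. nor the pattern produced by the Direct3D 11 tessellator. The function name `tessellate_triangle` and the parameter `n` (a uniform tessellation factor) are hypothetical names chosen for this sketch.

```python
def tessellate_triangle(v0, v1, v2, n):
    """Uniformly split one coarse triangle into n * n fine triangles.

    v0, v1, v2 are the corner positions (tuples of floats) and n is a
    hypothetical uniform tessellation factor (n >= 1).
    Returns (vertices, triangles); triangles index into vertices.
    """
    if n < 1:
        raise ValueError("tessellation factor must be at least 1")

    index = {}      # maps grid coordinate (i, j) to a vertex index
    vertices = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            # Barycentric weights: a for v1, b for v2, c for v0.
            a = i / n
            b = j / n
            c = 1.0 - a - b
            index[(i, j)] = len(vertices)
            vertices.append(
                tuple(c * p0 + a * p1 + b * p2
                      for p0, p1, p2 in zip(v0, v1, v2))
            )

    triangles = []
    for i in range(n):
        for j in range(n - i):
            # Triangle with the same orientation as the coarse triangle.
            triangles.append(
                (index[(i, j)], index[(i + 1, j)], index[(i, j + 1)])
            )
            # Flipped triangle that fills the gap next to it, if any.
            if i + j < n - 1:
                triangles.append(
                    (index[(i + 1, j)], index[(i + 1, j + 1)],
                     index[(i, j + 1)])
                )
    return vertices, triangles


verts, tris = tessellate_triangle(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 4
)
print(len(verts), len(tris))
```

In this sketch a factor of n turns one input triangle into n * n output triangles, so the amount of geometry handed to later pipeline stages grows quadratically with the tessellation factor.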
Abstract

We present a new method for approximating subdivision surfaces with hardware accelerated parametric patches. Our method improves the memory bandwidth requirements for patch control points, translating into superior performance compared to existing methods. Our input is general, allowing for meshes that contain both quadrilateral and triangular faces in the input control mesh, as well as control meshes with boundary. We present two implementations of our scheme designed to run on Direct3D 11 class hardware equipped with a tessellator unit.

1 Introduction

Catmull-Clark subdivision surfaces [Catmull and Clark 1978] have become a standard for modeling free form shapes such as dynamic characters in movies and computer games. By adding displacement maps, we can create highly detailed shapes using a minimal amount of storage [Lee et al. 2000]. Tools such as ZBrush combine these two ideas to allow artists to edit models at multiple resolutions and automatically create low resolution control meshes and displacement maps.

Despite the prevalence of subdivision surfaces, realtime applications such as games predominately use polygon models to represent their geometry. The reason is understandable as GPU's are designed to accelerate polygon rendering and do so well. Yet, in many cases, subdivision surfaces are already part of the content creation pipeline for these applications. These surfaces are used in non-realtime parts of production such as cut-scenes, but their real-time counterparts are simplified polygon models of these high-resolution characters. Ideally, we could use a high resolution polygon model extracted at a high level of subdivision to better approximate the character. However, this approach has a number of problems:

• Animation requires updating a large number of vertices each frame using bone weights or morph targets, consuming computational resources and harming performance.
• Faceting artifacts occur, due to the static nature of the polygon mesh connectivity.
• Large polygon meshes consume significant disk, bus, and network resources to store and transmit.

Given that these subdivision surfaces already exist, we could simplify the content creation pipeline by skipping the precomputed, fixed polygonalization step.

Figure 2: The Direct3D 11 graphics pipeline.

To address this issue, API designers and hardware vendors have added a new tessellator unit to the graphics pipeline in Direct3D 11 [Drone et al. 2008].
This change adds new programmable stages, the hull shader and the domain shader, to the graphics pipeline that [...]

Graphics pipeline with tessellation
Five programmable stages in modern pipeline (OpenGL 4, Direct3D 11)

Pipeline without tessellation (data streams in brackets):
Vertex Generation
  [Vertices]
Vertex Processing (1 in / 1 out)
Primitive Generation (3 in / 1 out, for tris)
  [Primitives]
Primitive Processing (1 in / small N out)
Rasterization (Fragment Generation) (1 in / N out)
  [Fragments]
Fragment Processing (1 in / 1 out)
  [Pixels]
Frame-Buffer Ops (1 in / 0 or 1 out)

Pipeline with tessellation (data streams in brackets):
Vertex Generation
  [Coarse Vertices]
Vertex Processing (1 in / 1 out)
  [Coarse Primitives]
Coarse Primitive Processing (1 in / 1 out)
Tessellation (1 in / N out)
  [Fine Vertices]
Fine Vertex Processing (1 in / 1 out)
Fine Primitive Generation (3 in / 1 out, for tris)
  [Fine Primitives]
Fine Primitive Processing (1 in / small N out)
Rasterization (Fragment Generation) (1 in / N out)
  [Fragments]
Fragment Processing (1 in / 1 out)
  [Pixels]
Frame-Buffer Ops (1 in / 0 or 1 out)

Graphics workload metrics

Key 3D graphics workload metrics
▪ Data amplification from stage to stage
- Triangle size (amplification in rasterizer)
- Expansion during primitive processing (if enabled)
- Tessellation factor (if tessellation enabled)
▪ [Vertex/fragment/geometry] shader cost
- How many instructions?
- Ratio of math to data access instructions?
▪ Scene depth complexity
- Determines number of depth and color buffer writes
- Recall: early/high Z-cull optimizations are most efficient when pipeline receives triangles in depth order

Amount of data generated (size of stream between consecutive stages)
Vertex Generation
  [Coarse Vertices] (compact geometric model)
Vertex Processing (1 in / 1 out)
  [Coarse Primitives]
Coarse Primitive Processing (1 in / 1 out)
Tessellation (1 in / N out)
  [Fine Vertices] (high-resolution, post-tessellation mesh)
Fine Vertex Processing (1 in / 1 out)
Fine Primitive Generation (3 in / 1 out, for tris)
  [Fine Primitives]
Fine Primitive Processing (1 in / small N out)
Rasterization (Fragment Generation) (1 in / N out)
  [Fragments]
Fragment Processing (1 in / 1 out)
  [Pixels]
Frame-Buffer Ops (1 in / 0 or 1 out)
  (frame buffer pixels)

"Diamond" structure of graphics workload
Intermediate data streams tend to be larger than scene inputs or image output

Scene depth complexity
[Image credit: Imagination Technologies]
Rough approximation: TA = SD
T = # triangles
A = average triangle area
S = pixels on screen
D = average depth complexity

Graphics pipeline workload changes dramatically across draw commands
▪ Triangle size is scene and frame dependent
- Move far away from