
Understanding the graphics pipeline

Lecture 2
Original slides by: Suresh Venkatasubramanian
Updates by: Joseph Kider

Lecture Outline

► A historical perspective on the graphics pipeline
  – Dimensions of innovation
  – Where we are today
  – Fixed-function vs. programmable pipelines
► A closer look at the fixed-function pipeline
  – Walk through the sequence of operations
  – Reinterpret these as stream operations
► We can program the fixed-function pipeline!
  – Some examples
► What constitutes data and memory, and how access affects program design

The evolution of the pipeline

Elements of the graphics pipeline:
1. A scene description: vertices, triangles, …
2. Transformations that map the scene to a camera viewpoint
3. “Effects”: texturing, shadow mapping, lighting calculations
4. Rasterizing: converting geometry into pixels
5. Pixel processing: depth tests, stencil tests, and other per-pixel operations

Parameters controlling design of the pipeline:
1. Where is the boundary between CPU and GPU?
2. What transfer method is used?
3. What resources are provided at each step?
4. What units can access which GPU memory elements?

Generation I: 3dfx Voodoo (1996)

• One of the first true 3D game cards
• Worked by supplementing a standard 2D video card
• Did not do vertex transformations: these were done on the CPU
• Did do texture mapping and z-buffering
http://accelenation.com/?ac.id.123.2

[Figure: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer; vertex transforms run on the CPU, the remaining stages on the GPU, connected over PCI]

Generation II: GeForce/Radeon 7500 (1998)

• Main innovation: shifting the transformation and lighting calculations to the GPU
• Allowed multi-texturing, giving bump maps and other texture-map effects
• Faster AGP bus instead of PCI
http://accelenation.com/?ac.id.123.5

[Figure: the same pipeline, with every stage now on the GPU, connected to the CPU over AGP]

Generation III: GeForce3/Radeon 8500 (2001)

• For the first time, allowed a limited amount of programmability in the vertex pipeline
• Also allowed volume texturing and multi-sampling (for antialiasing)
http://accelenation.com/?ac.id.123.7

[Figure: the same pipeline on the GPU over AGP, with small vertex shaders in the vertex-transform stage]

Generation IV: Radeon 9700/GeForce FX (2002)

• This generation is the first generation of fully programmable graphics cards
• Different versions have different resource limits on fragment/vertex programs
http://accelenation.com/?ac.id.123.8

[Figure: the pipeline over AGP with a programmable vertex shader and a programmable fragment processor; the fragment processor reads texture memory]

Generation IV.V: GeForce6/X800 (2004)
Not exactly a quantum leap, but…
► Simultaneous rendering to multiple buffers
► True conditionals and loops
► Higher-precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier)
► PCIe bus
► More memory/program length/texture accesses
► Texture access by vertex shader

[Figure: the pipeline with programmable vertex and fragment shaders; both can now read texture memory]

Generation V: GeForce8800/HD2900 (2006)
Complete quantum leap
► Ground-up rewrite of the GPU
► Support for DirectX 10, and all it implies (more on this later)
► Geometry shader
► Support for general-purpose GPU programming
► Shared memory (… only)

[Figure: DirectX 10 pipeline: Input Assembler → Programmable Vertex Shader → Programmable Geometry Shader → Raster Operations → Programmable Pixel Shader → Output Merger]

Fixed-function pipeline

[Figure: 3D Application or Game → (3D API commands) → 3D API: OpenGL or Direct3D → CPU-GPU boundary (AGP/PCIe) → (GPU command & data stream) → GPU Front End → (pre-transformed vertices) → Programmable Vertex Processor → (transformed vertices) → Primitive Assembly, which also receives the vertex index stream → (assembled primitives) → Rasterization and Interpolation → (pre-transformed fragments) → Programmable Fragment Processor → (transformed fragments) → Raster Operations, which also receives the pixel location stream → (pixel updates) → Frame Buffer]

A closer look at the fixed-function pipeline

Pipeline Input

Vertex:
► Position (x, y, z)
► Color (r, g, b, a)
► Normal (Nx, Ny, Nz)
► Texture coordinates: (tx, ty, [tz]), (tx, ty), (tx, ty)
► Material properties*

Image: F(x, y) = (r, g, b, a)

ModelView Transformation

► Vertices are mapped from object space to world space, and from there to eye (camera) space
► M = model transformation (scene)
► V = view transformation (camera)

Each matrix transform is applied to each vertex in the input stream. Think of this as a kernel operator (with column vectors, M is applied first):

$$\begin{pmatrix} X' \\ Y' \\ Z' \\ W' \end{pmatrix} = V \, M \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Lighting

Lighting information is combined with normals and other parameters at each vertex in order to create new colors.

$$\mathrm{Color}(v) = \mathrm{emissive} + \mathrm{ambient} + \mathrm{diffuse} + \mathrm{specular}$$

Each term on the right-hand side is a function of the vertex color, position, normal, and material properties.

Clipping/Projection/Viewport (3D)

► More matrix transformations that operate on a vertex to transform it into viewport space.
► Note that a vertex may be eliminated from the input stream (if it is clipped).
► The viewport is two-dimensional; however, the vertex z-value is retained for depth testing.
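The ModelView, projection, and viewport steps are all per-vertex kernels. The following is a minimal CPU sketch in Python, not how any driver implements them. The helper names (`mat_mul`, `translate`, `perspective`, `vertex_stage`) are hypothetical, and the sketch assumes OpenGL-style column vectors, a gluPerspective-style projection matrix, and a [0, 1] depth range. Real hardware clips whole primitives against the clip planes; following the slide, this sketch simply drops a vertex that fails the clip test.

```python
import math

# Hypothetical helpers: 4x4 matrices are lists of rows, vectors are column
# vectors (an assumed OpenGL-style convention).
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translate(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def perspective(fovy_deg, aspect, near, far):
    # Same form as the classic gluPerspective matrix.
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def vertex_stage(vertices, model, view, proj, width, height):
    """Kernel applied independently to every vertex of the input stream."""
    mvp = mat_mul(proj, mat_mul(view, model))  # object -> world -> eye -> clip
    out = []
    for (x, y, z) in vertices:
        cx, cy, cz, cw = transform(mvp, [x, y, z, 1.0])
        # Clip test: drop the vertex unless w > 0 and -w <= x, y, z <= w.
        if cw <= 0 or not all(-cw <= c <= cw for c in (cx, cy, cz)):
            continue
        nx, ny, nz = cx / cw, cy / cw, cz / cw     # perspective divide
        sx = (nx + 1.0) * 0.5 * width              # viewport: 2D position...
        sy = (ny + 1.0) * 0.5 * height
        depth = (nz + 1.0) * 0.5                   # ...but z is kept for depth tests
        out.append((sx, sy, depth))
    return out
```

For example, with `model = translate(0.0, 0.0, -5.0)`, a 90° field of view, near = 1, and far = 10, a vertex at the object origin lands at the centre of a 640×480 viewport with depth 8/9, while a vertex that ends up behind the camera is removed from the stream. Because no vertex looks at any other, the loop body is exactly the per-element kernel that a GPU can run in parallel.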

The clip test is the first example of a conditional in the pipeline. However, it is not a fully general conditional. Why?

Rasterizing + Interpolation

► All primitives are now converted to fragments.
► Data type change! Vertices to fragments
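A minimal sketch of this conversion for a single triangle, assuming its vertices are already in viewport coordinates: every pixel centre inside the triangle becomes a fragment, and every vertex attribute is blended with barycentric weights. The function names are hypothetical. The interpolation here is linear in screen space (no perspective correction), and there is no tie-breaking fill rule, so a pixel centre lying exactly on an edge shared by two triangles would be emitted by both.

```python
def edge(ax, ay, bx, by, px, py):
    # Twice the signed area of the triangle (a, b, p).
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Hypothetical rasterizer. Each vertex is ((x, y), attrs), where attrs is a
    tuple of floats (color, texture coordinates, ...). Yields one fragment
    (px, py, interpolated_attrs) per covered pixel centre."""
    (x0, y0), a0 = v0
    (x1, y1), a1 = v1
    (x2, y2), a2 = v2
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return  # degenerate triangle: no fragments
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel centre
            # Barycentric weights; they sum to 1 and are all >= 0 inside.
            w0 = edge(x1, y1, x2, y2, cx, cy) / area
            w1 = edge(x2, y2, x0, y0, cx, cy) / area
            w2 = edge(x0, y0, x1, y1, cx, cy) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # The "free" linear interpolation operator, per attribute.
                attrs = tuple(w0 * p + w1 * q + w2 * r
                              for p, q, r in zip(a0, a1, a2))
                yield (px, py, attrs)
```

If each vertex carries its own position as an attribute, every fragment receives exactly its pixel-centre coordinates, which is a quick way to see that the interpolation is linear.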

Texture coordinates are interpolated from the texture coordinates of the vertices. This gives us a linear interpolation operator for free. VERY USEFUL!

Fragment attributes: (r, g, b, a), (x, y, z, w), (tx, ty), …

Per-fragment operations

► The rasterizer produces a stream of fragments.
► Each fragment undergoes a series of tests of increasing complexity.

Test 1: Scissor
If (fragment lies in fixed rectangle)
  let it pass
else
  discard it

The scissor test is analogous to clipping: the same operation, performed in fragment space instead of vertex space.

Test 2: Alpha
If (fragment.a >= reference value)
  let it pass
else
  discard it

The alpha test is a slightly more general conditional. Why?

► Stencil test: S(x, y) is the stencil buffer value for the fragment with coordinates (x, y).
  – If f(S(x, y)), let the pixel pass; else kill it. Update S(x, y) conditionally, depending on f(S(x, y)) and g(D(x, y)).
► Depth test: D(x, y) is the depth buffer value.
  – If g(D(x, y)), let the pixel pass; else kill it. Update D(x, y) conditionally.
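The whole chain of tests can be emulated per fragment as follows. This is a sketch under stated assumptions: the `Fragment` record and the `run_tests` function are hypothetical; `keep` and `incr` mimic the behaviour of OpenGL's `GL_KEEP` and saturating `GL_INCR` stencil operations; `sfail`, `zfail`, and `zpass` are the stencil updates applied when the stencil test fails, when the depth test fails, and when both pass; and the depth buffer is cleared to 1.0. Only the stencil and depth steps write persistent per-pixel state, and each fragment touches only its own (x, y).

```python
import operator
from dataclasses import dataclass

@dataclass
class Fragment:            # hypothetical fragment record
    x: int
    y: int
    depth: float
    rgba: tuple            # (r, g, b, a)

def keep(s):               # like GL_KEEP: leave the stencil value unchanged
    return s

def incr(s):               # like GL_INCR: saturating increment of an 8-bit value
    return min(s + 1, 255)

def run_tests(fragments, width, height, scissor, alpha_ref,
              stencil_func=lambda s: True, depth_func=operator.lt,
              sfail=keep, zfail=keep, zpass=keep):
    color = [[None] * width for _ in range(height)]
    depth = [[1.0] * width for _ in range(height)]   # cleared to the far plane
    stencil = [[0] * width for _ in range(height)]
    x0, y0, x1, y1 = scissor
    for f in fragments:
        # Test 1: scissor, a fixed rectangle in window coordinates.
        if not (x0 <= f.x < x1 and y0 <= f.y < y1):
            continue
        # Test 2: alpha, compare against a fixed reference value.
        if not f.rgba[3] >= alpha_ref:
            continue
        s = stencil[f.y][f.x]
        # Stencil test f(S(x, y)); a failing fragment may still update S.
        if not stencil_func(s):
            stencil[f.y][f.x] = sfail(s)
            continue
        # Depth test g(D(x, y)); the stencil update now depends on both tests.
        if not depth_func(f.depth, depth[f.y][f.x]):
            stencil[f.y][f.x] = zfail(s)
            continue
        stencil[f.y][f.x] = zpass(s)
        depth[f.y][f.x] = f.depth      # conditional depth update
        color[f.y][f.x] = f.rgba       # surviving fragment writes its color
    return color, depth, stencil
```

With a stencil test that always passes and `zfail=zpass=incr`, the stencil buffer counts how many fragments reached each pixel (the stencil "count" operation), while the depth test independently keeps the nearest fragment's color.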

► Stencil and depth tests are more general conditionals. Why?
► These are the only tests that can change the state of internal storage (stencil buffer, depth buffer).
► One of the update operations for the stencil buffer is a “count” operation. Remember this!
► Unfortunately, stencil and depth buffers have lower precision (8 and 24 bits, respectively).

Post-processing

► Blending: pixels are accumulated into final framebuffer storage:
new-val = old-val op pixel-value
If op is +, we can sum all the (say) red components of pixels that pass all tests.
► Problem: in Generation IV and earlier, blending can only be done in 8-bit channels (the channels sent to the video card), so precision is limited.
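The precision problem is easy to reproduce with a sketch of additive blending (op = +) on one channel. The `blend` function and its `bits` parameter are hypothetical; the sketch assumes an unsigned fixed-point channel that saturates at its maximum value.

```python
import operator

def blend(fragments, width, height, op=operator.add, bits=8):
    """Accumulate one channel (say red): new_val = old_val op pixel_value.

    Hypothetical sketch: the framebuffer channel is an unsigned integer of
    `bits` bits, so every result is clamped to [0, 2**bits - 1].
    """
    max_val = (1 << bits) - 1
    fb = [[0] * width for _ in range(height)]
    for x, y, red in fragments:          # fragments that passed all tests
        fb[y][x] = max(0, min(op(fb[y][x], red), max_val))
    return fb
```

Ten fragments with red = 40 should sum to 400, but an 8-bit channel stops at 255; only a wider channel (for example `bits=16`) gives the correct sum.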

We could use accumulation buffers, but they are very slow.

Quick Review: Buffers

► Color buffers
  – Front-left
  – Front-right
  – Back-left
  – Back-right
► Depth buffer (z-buffer)
► Stencil buffer
► Accumulation buffer

Quick Review: Tests

► Scissor test: if (fragment exists inside rectangle) keep it; else delete it
► Alpha test: compare the fragment’s alpha value against a reference value
► Stencil test: compare the fragment against the stencil map
► Depth test: compare a fragment’s depth to the depth value already present in the depth buffer, using one of:
  – Never
  – Always
  – Less
  – Less-Equal
  – Equal
  – Greater-Equal
  – Greater
  – Not-Equal

Readback = Feedback

What is the output of a “computation”?
1. Display on screen.
2. Render to a buffer and retrieve the values (readback).

Readbacks are VERY slow! PCI and AGP buses are asymmetric: DMA enables fast transfer TO the graphics card. Reverse transfer has traditionally not been required, and is much slower. PCIe is symmetric, but still very slow compared to GPU speeds.

What options do we have?
1. Render to off-screen buffers like the accumulation buffer
2. Copy from the framebuffer to texture memory?
3. Render directly to a texture?

This motivates the idea of a “pass” being an atomic “unit cost” operation.

Time for a puzzle…

An Example: Voronoi Diagrams

Definition

► You are given n sites (p₁, p₂, p₃, …, pₙ) in the plane (think of each site as having a color).
► Any point q in the plane is closest to some site pⱼ. Color q with color j.
► Compute this colored map on the plane. In other words, compute the nearest-neighbour diagram of the sites.

Example

So how do we do this on the graphics card? Note: this does not use any programmable features of the card.

Hint: Think in one dimension higher.
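Following the hint, the sketch below emulates the idea on the CPU: each site is drawn as a cone whose depth at a pixel is the distance from that pixel to the site, and a “less than” depth test with depth updates keeps the nearest site’s color at every pixel. The function name and the use of +∞ as the cleared depth are assumptions for illustration; on a real card each cone would be drawn as a mesh of triangles whose depths are interpolated by the rasterizer, so the distances would only be approximate.

```python
import math

def voronoi_by_cones(sites, width, height):
    """Return a label image: label[y][x] is the index (color) of the nearest site.

    Hypothetical CPU emulation: each site is rendered as a cone whose depth at a
    pixel equals the distance to the site; the depth test keeps the lowest cone.
    """
    depth = [[math.inf] * width for _ in range(height)]   # stand-in for "far"
    label = [[-1] * width for _ in range(height)]
    for i, (sx, sy) in enumerate(sites):      # draw one cone per site, one pass
        for y in range(height):
            for x in range(width):
                z = math.hypot(x - sx, y - sy)    # cone depth at this pixel
                if z < depth[y][x]:               # depth test: pass if smaller
                    depth[y][x] = z               # ...and update the buffer
                    label[y][x] = i               # the surviving fragment's color
    return label
```

The result matches a brute-force nearest-site search at every pixel, which is the claim on the slides: the lower envelope of the cones is the Voronoi diagram.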

The lower envelope of “cones” centered at the points is the Voronoi diagram of this set of points.

The Procedure

► In order to compute the lower envelope, we need to determine, at each pixel, the fragment having the smallest depth value.
► This can be done with a simple depth test.
  – Allow a fragment to pass only if it is smaller than the current depth buffer value, and update the buffer accordingly.
► The fragment that survives has the correct color.

Let’s make this more complicated

► The 1-median of a set of sites is a point q* that minimizes the sum of distances from all sites to itself.

$$q^* = \arg\min_{q} \sum_{i=1}^{n} d(p_i, q)$$

[Figure panels labeled “WRONG!” and “RIGHT!”]

A First Step

Can we compute, for each pixel q, the value $F(q) = \sum_{i=1}^{n} d(p_i, q)$?

We can use the cone trick from before, and instead of computing the minimum depth value, compute the sum of all depth values using blending.

What’s the catch? We can’t blend depth values!

► Using texture interpolation helps here.
► Instead of drawing a single cone, we draw a shaded cone, with an appropriately constructed texture map.
► Then, a fragment having depth z has color component 1.0 * z.
► Now we can blend the colors.
► OpenGL has an aggregation operator that will return the overall min.
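A CPU sketch of these two steps, assuming exact floating-point color channels (which sidesteps the precision issue): each site contributes a shaded cone whose color equals its depth, additive blending accumulates F(q) at every pixel, and a final minimum over the image stands in for the OpenGL min aggregation. The function name is hypothetical, and the result is the 1-median restricted to pixel positions, not the continuous optimum.

```python
import math

def one_median_by_blending(sites, width, height):
    """Return (pixel, F) for the pixel q minimizing F(q) = sum_i d(p_i, q).

    Hypothetical emulation: each site is drawn as a shaded cone whose color
    component is 1.0 * z (its depth); additive blending sums the colors.
    """
    color = [[0.0] * width for _ in range(height)]   # float channel: no 8-bit limit
    for (sx, sy) in sites:
        for y in range(height):
            for x in range(width):
                z = math.hypot(x - sx, y - sy)       # cone depth at this pixel
                color[y][x] = color[y][x] + 1.0 * z  # blending with op = +
    # Min aggregation over the whole image (ties broken by pixel position).
    f_min, pixel = min((color[y][x], (x, y))
                       for y in range(height) for x in range(width))
    return pixel, f_min
```

For sites at (0, 0), (2, 0), and (10, 0) on an 11×1 grid, it returns pixel (2, 0) with F = 10, the middle site, as expected for points on a line.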

Warning: we are ignoring issues of precision.

Now we apply a streaming perspective…

Two kinds of data

► Stream data (data associated with vertices and fragments)
  – Color/position/texture coordinates.
  – Functionally similar to member variables in a C++ object.
  – Can be used for limited message passing: I modify an object state and send it to you.
► “Persistent” data (associated with buffers)
  – Depth, stencil, textures.
  – Can be modified by multiple fragments in a single pass.
  – Functionally similar to a global array, BUT each fragment only gets one location to change.
  – Can be used to communicate across passes.

Who has access?

► Memory “connectivity” in the graphics use of a GPU is tricky.
► In a traditional C program, all global variables can be written by all routines.
► In the fixed-function pipeline, certain data is private.
  – A fragment cannot change a depth or stencil value of a location different from its own.
  – The framebuffer can be copied to a texture; a depth buffer cannot be copied in this way, and neither can a stencil buffer.
  – Only a stencil buffer can count (efficiently).
► In the fixed-function pipeline, depth and stencil buffers can be used in a multi-pass computation only via readbacks.
► A texture cannot be written directly.
► In programmable GPUs, the memory connectivity becomes more open, but there are still constraints.

Understanding access constraints and memory “connectivity” is a key step in programming the GPU.

How does this relate to stream programs?

► The most important question to ask when programming the GPU is: What can I do in one pass?
► Limitations on memory connectivity mean that a step in a computation may often have to be deferred to a new pass.
► For example, when computing the second smallest element, we could not store the current minimum in read/write memory.
► Thus, the “communication” of this value has to happen across a pass.

Graphics pipeline

[Figure: 3D Application or Game → (3D API commands) → 3D API: OpenGL or Direct3D → CPU-GPU boundary (GPU command & data stream) → GPU Front End → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer, divided into a vertex pipeline and a fragment pipeline]