Evolution of the Programmable Graphics Pipeline

Course Roadmap Evolution of the Programmable Graphics ►Graphics Pipeline (GLSL) ►GPGPU (GLSL) Pipeline Briefly ►GPU Computing (CUDA, OpenCL) Lecture 2 ►Choose your own adventure Original Slides by: Suresh Venkatasubramanian Choose your own adventure Updates by Joseph Kider Student Presentation Final Project ►Goal: Prepare you for your presentation and project Lecture Outline The evolution of the pipeline ► A historical perspective on the graphics pipeline Elements of the graphics pipeline: Parameters controlling design of the pipeline: Dimensions of innovation. 1. A scene description: vertices, triangles, colors, lighting 1. Where is the boundary Where we are today between CPU and GPU ? 2. Transformations that map the Fixed-function vs programmable pipelines scene to a camera viewpoint 2. What transfer method is used ? ► A closer look at the fixed function pipeline 3. “Effects”: texturing, shadow 3. What resources are provided Walk thru the sequence of operations mapping, lighting calculations at each step ? Reinterpret these as stream operations 4. Rasterizing: converting geometry 4. What units can access which into pixels GPU memory elements ? ► We can program the fixed-function pipeline ! 5. Pixel processing: depth tests, Some examples stencil tests, and other per-pixel ► What constitutes data and memory, and how operations. access affects program design. Generation I: 3dfx Voodoo (1996) Aside: Mario Kart 64 • One of the first true 3D game cards ►High fragment load / low vertex load • Worked by supplementing standard 2D video card. • Did not do vertex transformations: these were done in the CPU •Did dotexture mapping, z-buffering. http://accelenation.com/?ac.id.123.2 Rasterization Vertex Primitive Raster Frame Vertex Primitive and Frame Transforms Assembly Operations Buffer Transforms Assembly Interpolation Buffer CPU GPU PCI Image from: http://www.gamespot.com/users/my_shoe/ Aside: Mario Kart Wii Generation II: GeForce/Radeon 7500 (1998) • Main innovation: shifting the High fragment load / low vertex load? transformation and lighting calculations to the GPU • Allowed multi-texturing: giving bump maps, light maps, and others.. • Faster AGP bus instead of PCI http://accelenation.com/?ac.id.123.5 Rasterization Vertex Primitive Raster Frame Vertex Primitive and Frame Transforms Assembly Operations Buffer Transforms Assembly Interpolation Buffer GPU AGP Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/ Generation III: GeForce3/Radeon 8500(2001) Generation IV: Radeon 9700/GeForce FX (2002) • For the first time, allowed limited • This generation is the first generation amount of programmability in the of fully-programmable graphics cards vertex pipeline • Different versions have different • Also allowed volume texturing and resource limits on fragment/vertex multi-sampling (for antialiasing) programs http://accelenation.com/?ac.id.123.7 http://accelenation.com/?ac.id.123.8 Rasterization Rasterization Vertex Primitive Raster Frame Vertex Primitive Raster Frame Vertex Primitive and Frame Vertex Primitive and Frame Transforms Assembly Operations Buffer Transforms Assembly Operations Buffer Transforms Assembly Interpolation Buffer Transforms Assembly Interpolation Buffer GPU AGP AGP ProgrammableProgrammable SmallSmall vertex vertex ProgrammableProgrammable FragmentFragment shadersshaders VertexVertex shader shader ProcessorProcessor Texture Memory Generation IV.V: GeForce6/X800 (2004) NVIDIA NV40 Architecture ► Simultaneous rendering to multiple buffers 6 vertex ► True conditionals and loops shader units ► PCIe bus ► Vertex texture fetch Vertex Texture Fetch Rasterization Vertex Primitive Raster Frame Vertex Primitive and Frame Transforms Assembly Operations Buffer Transforms Assembly Interpolation Buffer PCIe ProgrammableProgrammable ProgrammableProgrammable FragmentFragment 16 fragment VertexVertex shader shader ProcessorProcessor shader units Texture Memory Texture Memory Slide adapted from Suresh Venkatasubramanian and Joe Kider Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html D3D 10 Pipeline Generation IV.V: GeForce6/X800 (2004) Not exactly a quantum leap, but… ► Simultaneous rendering to multiple buffers ► True conditionals and loops ► Higher precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier.) ► PCIe bus ► More memory/program length/texture accesses ► Texture access by vertex shader Rasterization Vertex Primitive Raster Frame Vertex Primitive and Frame Transforms Assembly Operations Buffer Transforms Assembly Interpolation Buffer AGP ProgrammableProgrammable ProgrammableProgrammable FragmentFragment VertexVertex shader shader ProcessorProcessor Texture Memory Texture Memory Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf Fixed-function pipeline Generation V: GeForce8800/HD2900 (2006) 3D API Commands 3D3D API: API: 3D3D Complete quantum leap OpenGLOpenGL or or ApplicationApplication Direct3D Or Game ► Ground-up rewrite of GPU Direct3D Or Game CPU-GPU Boundary (AGP/PCIe) Data Stream ► Support for DirectX 10, and all & Command it implies (more on this later) GPU ► Geometry Shader Vertex Pixel ► Support for General GPU Index Assembled Pixel programming Primitives Location Updates Stream Stream ► Shared Memory (NVIDIA only) Rasterization GPU Primitive Raster Frame GPU Primitive and Frame Front End Assembly Operations Buffer Front End Assembly Interpolation Buffer Input Programmable ProgrammableProgrammable Pre-transformed Pre-transformed Input ProgrammableProgrammable Raster Assembler Geometry PixelPixel Fragments Assembler VertexVertex shader shader Operations Vertices Shader ShaderShader ProgrammableProgrammable Vertices ProgrammableProgrammable AGP Fragments Transformed VertexVertex FragmentFragment Transformed Output ProcessorProcessor ProcessorProcessor Merger Geometry Shaders: Point Sprites Geometry Shaders: Point Sprites Geometry Shaders NVIDIA G80 Architecture Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf NVIDIA G80 Architecture Why Unify Shader Processors? Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf Why Unify Shader Processors? Unified Shader Processors Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf Terminology Shader Capabilities Shader Direct3D OpenGL Video card Model Example NVIDIA GeForce 6800 292.xATI Radeon X800 NVIDIA GeForce 8800 3 10.x 3.x ATI Radeon HD 2900 NVIDIA GeForce GTX 480 4 11.x 4.x ATI Radeon HD 5870 Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/ Shader Capabilities Evolution of the Programmable Graphics Pipeline Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/ Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf Evolution of the Programmable Evolution of the Programmable Graphics Pipeline Graphics Pipeline ►Not covered today: SM 5 / D3D 11 / GL 4 Tessellation shaders ►*cough* student presentation *cough* Later this semester: NVIDIA Fermi ►Dual warp scheduler ►Configurable L1 / shared memory ►Double precision ►… Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf New Tool: AMD System Monitor ►Released 01/04/2011 A closer look at the ► http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx A closer look at the http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx fixed-function pipeline Pipeline Input ModelView Transformation Vertex Image F(x,y) = (r,g,b,a) ►Vertices mapped from object space to world (x, y, z) space (r, g, b,a) ►M = model transformation (scene) (Nx,Ny,Nz) (tx, ty,[tz]) ►V = view transformation (camera) Each matrix transform (tx, ty) X’ X is applied to each vertex in the input Y (tx, ty) Y’ stream. Think of this Material Z’ M * V * Z as a kernel operator. properties* W’ 1 Lighting Clipping/Projection/Viewport(3D) Lighting information is combined with ►More matrix transformations that operate on normals and other parameters at each a vertex to transform it into the viewport vertex in order to create new colors. space. Color(v) = emissive + ambient + diffuse + ►Note that a vertex may be eliminated from specular the input stream (if it is clipped). Each term in the right hand side is a function of ►The viewport is two-dimensional: however, the vertex color, position, normal and material vertex z-value is retained for depth testing. properties. Clip test is first example of a conditional in the pipeline. However, it is not a fully general conditional. Why ? Rasterizing+Interpolation Per-fragment operations ►All primitives are now converted to ►The rasterizer produces a stream of fragments. fragments. ►Data type change ! Vertices to fragments ►Each fragment undergoes a series of tests with increasing complexity. Texture coordinates are interpolated from Fragment attributes: Test 1: Scissor texture coordinates of vertices. Scissor test is analogous to clipping (r,g,b,a) If (fragment lies in fixed rectangle) operation in fragment space instead of This gives us a linear interpolation operator let it pass else discard it vertex space. (x,y,z,w)

Load more