DirectX and GPU (Nvidia-Centric) History

XNA and Programmable Shaders

Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
10/12/09

DirectX and GPU (Nvidia-centric) History

[Figure: timeline. Axis marks: 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009. Entries, in order along the axis:]
• DirectX 2: 3dfx's first Voodoo chip
• DirectX 5: Riva 128
• DirectX 6 (Multitexturing): Riva TNT (NV4), NVidia's response to Voodoo2
• DirectX 7 (T&L): GeForce 256 (NV10)
• 3dfx demise
• DirectX 8 (SM 1.x): GeForce3 (NV20)
• Cg
• DirectX 9 (SM 2.0): GeForceFX (NV30)
• DirectX 9.0c (SM 3.0): GeForce 6 (NV40)
• DirectX 10 (SM 4.0): GeForce 8 (G80), Unified Shader Model
• GeForce 9
• GT 200 (GTX200): Dual × 1.4 billion transistors, ~1.2GHz
• GT 300

Adapted from David Kirk's slide

Why Programmable Shaders

• Hardwired pipeline
  – Produces limited effects
  – Effects look the same
  – Gamers want unique look-n-feel
  – Multi-texturing somewhat alleviates this, but not enough
  – Less interoperable, less portable
• Programmable Shaders
  – Vertex Shader
  – Pixel or Fragment Shader
  – Starting from DX 8.0 (assembly)
  – DX 9.0 added HLSL (High Level Shading Language)
  – HLSL (MS) is compatible with Cg (Nvidia)

Evolution of Graphics Processing Units

• Pre-GPU
  – Video controller
  – Dumb frame buffer
• First generation GPU
  – PCI bus
  – Rasterization done on GPU
  – ATI Rage, Nvidia TNT2, 3dfx Voodoo3 ('96)
• Second generation GPU
  – AGP
  – Include T&L into GPU (D3D 7.0)
  – Nvidia GeForce 256 (NV10), ATI Radeon 7500, S3 Savage3D ('98)
• Third generation GPU
  – Programmable vertex and fragment shaders (D3D 8.0, SM1.0)
  – Nvidia GeForce3, ATI Radeon 8500, Microsoft Xbox ('01)
• Fourth generation GPU
  – Programmable vertex and fragment shaders (D3D 9.0, SM2.0)
  – Nvidia GeForce FX, ATI Radeon 9700 ('02)
• Current generation GPU
  – Include geometry shader (SM 4.0 and DX10)
  – Nvidia G80 and up

Programmable Graphics Pipeline

[Figure: 3D Apps → (API commands) → 3D API: Direct3D → (GPU cmd & data stream) → GPU Frontend → (Vtx index) → Primitive Assembly → (Assembled polygons) → Rasterization & Interpolation → (Pixel location) → Raster Operations → (Pixel updates) → Frame Buffer. The Programmable Vertex Shader sits between the GPU Frontend and Primitive Assembly and emits transformed vertices; the Programmable Fragment Shader sits between Rasterization & Interpolation and Raster Operations and emits transformed fragments.]

Source: Cg tutorial

Graphics Programmable Pipeline

[Figure: DirectX 8.0 Pipeline. Vertices enter either the Fixed Function Pipeline (HW T&L) or the Vertex Shader, followed by Culling and Clipping, then Rasterization, then either the FixFunc Pixel stage or the Pixel Shader, then Blend and Mask. Image label: NVidia GeForce FX.]

Choice between programmable and fixed function pipeline (mutually exclusive, parallel pipelines)

[Figure: DirectX 10.0 Pipeline, Fully Programmable (No Fixed-Function Pipeline). Input Assembler → Vertex Shader → Geometry Shader → Rasterization → Pixel Shader → Output Merger.]

Shader Languages

• HLSL/Cg most common
  – Both are compatible
  – No assembly shaders allowed in DX 10.0.
• Other alternatives:
  – GLSL
  – Legacy DirectX shaders in assembly
  – Sh
  – OpenVidia (U of Toronto)

XNA Rendering Pipeline

[Figure: Vertex Data (Model Space) → Vertex Processing → Rasterization → Pixel Processing → Output Merger → Final image, with Other Memory Resource (Texture, Constants, etc.) feeding the processing stages. Legend: Data supply, Fixed operations, Programmable.]

• Vertex shader outputs transformed vertex position, texture coordinates, color, etc.
  – Solid deforming, skeletal animation, particle motion, etc.
• Rasterization interpolates and determines what pixels to draw
• Pixel shader outputs pixel color, depth (optional)
  – Per-pixel lighting, procedural texture generation, postprocessing effects, brightness, contrast, blur, etc.
Adapted from XNA 2.0 Game Programming Book

Motivation for Cg

    float3 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
    float3 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;

From Cg tutorial

The Cg Tutorial Book (Nvidia)

[Figure: from Cg tutorial]

Basic Shader Mechanics

• Data types:
  – Typically floats, and vectors/matrices of floats
  – Fixed size arrays
  – Three main types:
    • Per-instance data, e.g., per-vertex position
    • Per-pixel interpolated data, e.g., texture coordinates
    • Per-batch data, e.g., light position
  – Data are tightly bound to the GPU
  – Flow control is very simple:
    • No recursion
    • Fixed size loops for v_2_0 or earlier
    • Simple if-then-else statements allowed in the latest APIs
    • Texkill (asm) or clip (HLSL) or discard (GLSL) allows you to abort a write to a pixel (form of flow control)

Vertex Shader

• Transform to clip-space (i.e., screen space)
• Inputs:
  – Common inputs:
    • Vertex position (x, y, z, w)
    • Texture coordinate
    • Constant inputs
    • Can also have fog, color as input, but usually leaves them untouched for pixel shader
  – Output to Pixel (fragment) shader
• Vertex shader is executed once per vertex, could be less expensive than pixel shader

Vertex Shader (3.0)

[Figure: register layout of the vertex shader, fed by the vertex stream]
• 16 vertex data registers: v0, v1, v2, ..., v15
• 32 temporary registers: r0, r1, r2, ..., r31
• Constant float registers (at least 256): C0, C1, C2, ..., Cn
• 16 constant integer registers
• Loop register: aL
• Address register: a0
• 12 output registers: oPos (position), oTn (texture), oFog (fog), oD0 (diffuse color), oD1 (specular color), oPts (point size)

Each register is a 4-component vector register except aL

Pixel (or Fragment) Shader

• Determine each fragment's color
  – custom (sophisticated) pixel operations
  – texture sampling
• Inputs
  – Interpolated output from vertex shader
  – Typically vertex position, vertex normals, texture coordinates, etc.
  – These registers could be reused for other purposes (thus called GPGPU)
• Output
  – Color (including alpha)
  – Depth value (optional)
• Executed once per pixel so is executed a lot more times than vertex shader typically
  – It is advantageous to compute stuff on a per-vertex basis to improve performance

Pixel Shader (3.0)

[Figure: register layout of the pixel shader, fed by the pixel stream]
• Color (diff/spec) and texture coord. registers: v0, v1, ..., v9
• Constant registers (16 INT, 224 Float): C0, C1, ..., Cn
• Temporary registers: r0, r1, ..., r31
• Sampler registers: s0, s1, ..., s15 (up to 16 texture surfaces can be read in a single pass)
• Output registers: oC0 (color), oDepth (depth)

Use of the Vertex Shader

• Transform vertices to clip-space
• Pass normal, texture coordinates to PS
• Transform vectors to other spaces (e.g., texture space)
• Calculate per-vertex lighting (e.g., Gouraud shading)
• Distort geometry (waves, fish-eye camera)

Adapted from Mart Slot's presentation

Use of the Pixel Shader

• Texturing objects
• Per-pixel lighting (e.g., Phong shading)
• Normal mapping (each pixel has its own normal)
• Shadows (determine whether a pixel is shadowed or not)
• Environment mapping

HLSL / Cg

• Compatible, jointly developed by Microsoft and Nvidia
• A C-like language and syntax
• DX 10 will discontinue assembly shader
• But do not have
  – Pointers
  – Dynamic memory allocation
  – Unstructured/complex control structure
    • e.g., goto
    • Recursion (note that functions are inlined)
  – Bitwise operations (may have in the future)

Adapted from Mart Slot's presentation

A Simple Vertex Shader

    uniform extern float4x4 gWVP;       // Passed from C# apps

    struct VtxOutput {                  // Structure passed to fragment shader
        float4 position : POSITION;     // float4: reserved data type; POSITION, COLOR: semantics
        float4 color    : COLOR;
    };

    VtxOutput All_greenVS(float2 position : POSITION)   // Input to vertex shader
    {
        VtxOutput OUT;
        OUT.position = mul(float4(position, -30.0f, 1.0f), gWVP);
        OUT.color = float4(0, 1, 0, 1);
        return OUT;
    }

A Simple Vertex Shader (Alternative)

    uniform extern float4x4 gWVP;

    // No structure declaration
    void All_greenVS(float2 position : POSITION,
                     out float4 oPosition : POSITION,
                     out float4 oColor : COLOR)
    {
        oPosition = mul(float4(position, -30.0f, 1.0f), gWVP);
        oColor = float4(0, 1, 0, 1);
    }

Adapted from Cg Tutorial
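The vertex shader above only produces a clip-space position and a color. To draw anything, it has to be paired with a pixel shader and bound in an effect technique. The following is a minimal sketch, assuming an XNA-style .fx effect file compiled for shader model 2.0. The names All_greenPS, AllGreen and P0 are hypothetical and are not part of the original slides; the vertex shader is repeated only so the file is self-contained.

```hlsl
// World-view-projection matrix, set by the application (as in the slides).
uniform extern float4x4 gWVP;

// Structure passed from the vertex shader to the pixel shader.
struct VtxOutput {
    float4 position : POSITION;
    float4 color    : COLOR;
};

// Vertex shader from the slides: transforms a 2D input position
// (placed at z = -30) to clip space and outputs solid green.
VtxOutput All_greenVS(float2 position : POSITION)
{
    VtxOutput OUT;
    OUT.position = mul(float4(position, -30.0f, 1.0f), gWVP);
    OUT.color = float4(0, 1, 0, 1);
    return OUT;
}

// Hypothetical pixel shader: receives the interpolated COLOR
// and writes it unchanged to the render target.
float4 All_greenPS(float4 color : COLOR) : COLOR
{
    return color;
}

// Hypothetical technique binding both stages together.
technique AllGreen
{
    pass P0
    {
        VertexShader = compile vs_2_0 All_greenVS();
        PixelShader  = compile ps_2_0 All_greenPS();
    }
}
```

Because the color is constant across all vertices, the interpolated value the pixel shader receives is the same green everywhere. With per-vertex colors the same pixel shader would produce a smooth gradient.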
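The slides list per-pixel lighting and clip as typical pixel shader work. The sketch below illustrates both under stated assumptions; it is not taken from the slides. It assumes a vertex shader that passes a texture coordinate, a world-space normal and a world-space view vector in TEXCOORD0 to TEXCOORD2. Every name in it (gLightDir, gDiffuseColor, gSpecularColor, gShininess, gDiffuseMap, DiffuseSampler, PerPixelPS) is hypothetical. The specular term has the same form as the Cg tutorial expression pow(max(0, dot(Nf, H)), phongExp) quoted earlier.

```hlsl
// All uniforms below are hypothetical and would be set by the application.
uniform extern float3  gLightDir;       // direction from the surface toward the light
uniform extern float3  gDiffuseColor;   // material diffuse color
uniform extern float3  gSpecularColor;  // material specular color
uniform extern float   gShininess;      // specular exponent
uniform extern texture gDiffuseMap;     // base texture

sampler DiffuseSampler = sampler_state
{
    Texture   = <gDiffuseMap>;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
};

float4 PerPixelPS(float2 uv      : TEXCOORD0,
                  float3 normal  : TEXCOORD1,
                  float3 viewDir : TEXCOORD2) : COLOR
{
    float4 texel = tex2D(DiffuseSampler, uv);

    // clip() aborts the write to this pixel when its argument is negative,
    // here whenever the texture alpha is below 0.5.
    clip(texel.a - 0.5f);

    // Interpolated vectors are no longer unit length, so renormalize per pixel.
    float3 N = normalize(normal);
    float3 V = normalize(viewDir);
    float3 L = normalize(gLightDir);
    float3 H = normalize(L + V);        // half vector between light and view

    float diffuse  = max(0, dot(N, L));
    float specular = pow(max(0, dot(N, H)), gShininess);

    float3 rgb = texel.rgb * gDiffuseColor * diffuse
               + gSpecularColor * specular;
    return float4(rgb, texel.a);
}
```

Doing the lighting here rather than in the vertex shader costs more, since the function runs once per pixel, but highlights no longer depend on how finely the mesh is tessellated.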
