10/12/09

XNA and Programmable

Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology

DirectX and GPU (-centric) History

[Timeline figure; axis years: 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009]
• DirectX 2
• DirectX 5: Riva 128
• DirectX 6 (Multitexturing): Riva TNT (NV4)
• DirectX 7 (T&L): GeForce 256 (NV10)
• DirectX 8 (SM 1.x): GeForce3 (NV20)
• DirectX 9 (SM 2.0): GeForceFX (NV30)
• DirectX 9.0c (SM 3.0): GeForce 6 (NV40)
• DirectX 10 (SM 4.0): GeForce 8 (G80), Unified Model
• GeForce 9
• GT 200 (GTX200): dual x 1.4 billion transistors, ~1.2GHz
• GT 300
• Other milestones, in timeline order: 3dfx's first Voodoo chip, NVidia's response, 3dfx demise, Cg

Adapted from David Kirk’s slide

Why Programmable Shaders
• Hardwired pipeline
  – Produces limited effects
  – Effects look the same
  – Gamers want unique look-n-feel
  – Multi-texturing somewhat alleviates this, but not enough
  – Less interoperable, less portable
• Programmable Shaders
  – Vertex Shader
  – Pixel or Fragment Shader
  – Starting from DX 8.0 (assembly)
  – DX 9.0 added HLSL (High Level Shading Language)
  – HLSL (MS) is compatible with Cg (Nvidia)

Evolution of Graphics Processing Units
• Pre-GPU
  – Video controller
  – Dumb frame buffer
• First generation GPU
  – PCI bus
  – Rasterization done on GPU
  – ATI Rage, Nvidia TNT2, 3dfx (’96)
• Second generation GPU
  – AGP
  – Include T&L into GPU (D3D 7.0)
  – Nvidia GeForce 256 (NV10), ATI Radeon 7500, S3 Savage3D (’98)
• Third generation GPU
  – Programmable vertex and fragment shaders (D3D 8.0, SM1.0)
  – Nvidia GeForce3, ATI Radeon 8500, Microsoft Xbox (’01)
• Fourth generation GPU
  – Programmable vertex and fragment shaders (D3D 9.0, SM2.0)
  – Nvidia GeForce FX, ATI Radeon 9700 (’02)
• Current generation GPU
  – Include geometry shader (SM 4.0 and DX10)
  – Nvidia G80 and up


Programmable Graphics

[Figure: programmable graphics pipeline]
• 3D Apps -> (API commands) -> 3D API -> (GPU cmd & data stream) -> GPU Frontend
• GPU Frontend -> (Vtx index) -> Primitive Assembly -> (Assembled polygons) -> Rasterization & Interpolation -> (Pixel location) -> Raster Operations -> (Pixel updates) -> Frame Buffer
• Programmable Vertex Shader: produces the transformed vertices
• Programmable Fragment Shader: produces the transformed fragments

Source: Cg tutorial

Programmable Pipeline

[Figure: DirectX 8.0 and DirectX 10.0 pipelines; NVidia GeForce FX]
• DirectX 8.0 Pipeline
  – Choice between programmable and fixed function pipeline (mutually exclusive, parallel pipelines)
  – Fixed Function Pipeline: FixFunc HW T&L, Culling, Clipping, Rasterization, Pixel Blend, Mask
  – Programmable: Vertices -> Vertex Shader -> Rasterization -> Pixel Shader
• DirectX 10.0 Pipeline, Fully Programmable (No Fixed-Function Pipeline)
  – Input Assembler -> Vertex Shader -> Geometry Shader -> Rasterization -> Pixel Shader -> Output Merger

XNA Rendering Pipeline

[Figure legend: Data supply, Fixed operations, Programmable]
• Vertex Data (Model Space) -> Vertex Processing -> Rasterization -> Pixel Processing -> Output Merger -> Final image
• Other Memory Resource (Texture, Constants, etc)
• Vertex shader outputs transformed vertex position, texture coordinates, color, etc.
  – Solid deforming, skeletal animation, particle motion, etc.
• Rasterization interpolates and determines what pixels to draw
• Pixel shader outputs pixel color, depth (optional)
  – Per-pixel lighting, procedural texture generation, postprocessing effects, brightness, contrast, blur, etc.

Adapted from XNA 2.0 Game Programming Book

Shader Languages
• HLSL/Cg most common
  – Both are compatible
  – No assembly shaders allowed in DX 10.0
• Other alternatives:
  – GLSL
  – Legacy DirectX shaders in assembly
  – Sh
  – OpenVidia (U of Toronto)


Motivation for Cg

    float3 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
    float3 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;

From Cg tutorial

The Cg Tutorial Book (Nvidia)

From Cg tutorial
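The two shading lines quoted from the Cg tutorial can be wrapped into a function to show the inputs they rely on. This is only a sketch: the function name `PlasticColor` is hypothetical, and the meaning given to each parameter (for example, `H` as the half-angle vector) is an assumption based on the usual plastic shading model, not something the slide states.

```hlsl
// Hypothetical helper; parameter names follow the two lines quoted from the Cg tutorial.
float3 PlasticColor(float3 Nf,        // assumed: unit surface normal facing the viewer
                    float3 H,         // assumed: unit half-angle vector (light + view)
                    float3 Cd,        // assumed: diffuse material color
                    float3 Cs,        // assumed: specular material color
                    float3 cAmbient,  // assumed: ambient light contribution
                    float3 cDiffuse,  // assumed: diffuse light contribution
                    float  phongExp)  // assumed: specular exponent (shininess)
{
    // Scalar specular term, replicated to three components by the .xxx swizzle.
    float3 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;

    // Diffuse color modulated by ambient + diffuse light, plus the specular highlight.
    float3 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;
    return cPlastic;
}
```

Written in assembly, the same computation takes one instruction per dot product, max, power, multiply and add, which is the contrast a high-level shading language is meant to remove.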

Basic Shader Mechanics
• Data types:
  – Typically floats, and vectors/matrices of floats
  – Fixed size arrays
  – Three main types:
    • Per-instance data, e.g., per-vertex position
    • Per-pixel interpolated data, e.g., texture coordinates
    • Per-batch data, e.g., light position
  – Data are tightly bound to the GPU
  – Flow control is very simple:
    • No recursion
    • Fixed size loops for v_2_0 or earlier
    • Simple if-then-else statements allowed in the latest APIs
    • Texkill (asm) or clip (HLSL) or discard (GLSL) allows you to abort a write to a pixel (form of flow control)

Vertex Shader
• Transform to clip space (mapped to screen space after the perspective divide and viewport transform)
• Inputs:
  – Common inputs:
    • Vertex position (x, y, z, w)
    • Texture coordinate
    • Constant inputs
    • Can also have fog and color as inputs, but usually leaves them untouched for the pixel shader
  – Output to Pixel (fragment) shader
• Vertex shader is executed once per vertex, so it could be less expensive than the pixel shader
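As a small illustration of aborting a pixel write with `clip` in HLSL, the sketch below discards pixels whose texture alpha falls under a threshold. The names `gDiffuseSampler`, `gAlphaCutoff` and `AlphaTestPS` are hypothetical, and the application is assumed to bind the texture and set the threshold.

```hlsl
// Hypothetical inputs, assumed to be set by the application.
sampler2D gDiffuseSampler;        // texture to sample
float     gAlphaCutoff = 0.5f;    // pixels with alpha below this are dropped

float4 AlphaTestPS(float2 uv : TEXCOORD0) : COLOR
{
    float4 texel = tex2D(gDiffuseSampler, uv);

    // clip() discards the current pixel when its argument is negative,
    // so nothing is written for texels with alpha < gAlphaCutoff.
    clip(texel.a - gAlphaCutoff);

    return texel;
}
```

No recursion or variable-length loop is involved; the only control decision is whether the pixel survives.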


Vertex Shader (3.0)

[Figure: vertex shader 3.0 register model]
• Vertex stream -> 16 vertex data registers (v0, v1, v2, ..., v15)
• 32 temporary registers (r0, r1, r2, ..., r31)
• Constant float registers (at least 256): C0, C1, C2, ..., Cn
• 16 constant integer registers
• aL: loop register; a0: address register
• 12 output registers: oPos (position), oTn (texture), oFog (fog), oD0 (diff. color), oD1 (spec. color), oPts (pt size)
• Each register is a 4-component vector register except aL

Pixel (or Fragment) Shader
• Determine each fragment’s color
  – custom (sophisticated) pixel operations
  – texture sampling
• Inputs
  – Interpolated output from vertex shader
  – Typically vertex position, vertex normals, texture coordinates, etc.
  – These registers could be reused for other purposes (thus called GPGPU)
• Output
  – Color (including alpha)
  – Depth value (optional)
• Executed once per pixel, so it is typically executed many more times than the vertex shader
  – It is advantageous to compute on a per-vertex basis where possible to improve performance

Pixel Shader (3.0)

[Figure: pixel shader 3.0 register model]
• Pixel stream -> color (diff/spec) and texture coord. registers (v0, v1, ..., v9)
• Temporary registers (r0, r1, ..., r31)
• Constant registers (16 INT, 224 Float): C0, C1, ..., Cn
• Sampler registers (s0, s1, ..., s15): up to 16 texture surfaces can be read in a single pass
• Outputs: oC0 (color), oDepth (depth)

Use of the Vertex Shader
• Transform vertices to clip-space
• Pass normal, texture coordinates to PS
• Transform vectors to other spaces (e.g., texture space)
• Calculate per-vertex lighting (e.g., Gouraud shading)
• Distort geometry (waves, fish-eye camera)
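Several of the vertex shader uses listed on this slide (transforming to clip space, passing texture coordinates on, and per-vertex lighting) can be combined in one short shader. The following is a minimal sketch, not the lecture's code: every name other than `gWVP` is hypothetical, and it assumes a single directional light and a world matrix without non-uniform scale.

```hlsl
uniform extern float4x4 gWVP;       // world-view-projection matrix
uniform extern float4x4 gWorld;     // hypothetical: world matrix, assumed free of non-uniform scale
uniform extern float3   gLightDir;  // hypothetical: unit vector from surface toward the light (world space)
uniform extern float4   gDiffuse;   // hypothetical: material diffuse color

struct LitVtxOutput {
    float4 position : POSITION;     // clip-space position
    float4 color    : COLOR;        // per-vertex lit color, interpolated by the rasterizer
    float2 uv       : TEXCOORD0;    // texture coordinates passed through to the pixel shader
};

LitVtxOutput GouraudVS(float3 position : POSITION,
                       float3 normal   : NORMAL,
                       float2 uv       : TEXCOORD0)
{
    LitVtxOutput OUT;

    // Transform the vertex to clip space.
    OUT.position = mul(float4(position, 1.0f), gWVP);

    // Transform the normal to world space (w = 0 ignores translation).
    float3 n = normalize(mul(float4(normal, 0.0f), gWorld).xyz);

    // Lambertian diffuse term computed once per vertex (Gouraud shading).
    float ndotl = max(0, dot(n, gLightDir));
    OUT.color = float4(gDiffuse.rgb * ndotl, gDiffuse.a);

    OUT.uv = uv;
    return OUT;
}
```

Because the lighting runs once per vertex and the result is interpolated, the cost scales with vertex count, not with the number of pixels covered.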

Adapted from Mart Slot’s presentation


Use of the Pixel Shader
• Texturing objects
• Per-pixel lighting (e.g., Phong shading)
• Normal mapping (each pixel has its own normal)
• Shadows (determine whether a pixel is shadowed or not)
• Environment mapping

HLSL / Cg
• Compatible, jointly developed by Microsoft and Nvidia
• A C-like language and syntax
• DX 10 will discontinue assembly shaders
• But they do not have
  – Pointers
  – Dynamic memory allocation
  – Unstructured/complex control structures
    • e.g., goto
    • Recursion (note that functions are inlined)
  – Bitwise operations (may have in the future)

Adapted from Mart Slot’s presentation
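For per-pixel lighting, the work moves from the vertex shader into the pixel shader. A minimal sketch of a Phong-style pixel shader follows; all names are hypothetical, and it assumes the vertex shader has passed the world-space normal and position in `TEXCOORD0` and `TEXCOORD1`.

```hlsl
uniform extern float3 gLightDir;   // hypothetical: unit vector from surface toward the light (world space)
uniform extern float3 gEyePos;     // hypothetical: camera position (world space)
uniform extern float4 gDiffuse;    // hypothetical: material diffuse color
uniform extern float4 gSpecular;   // hypothetical: material specular color
uniform extern float  gPhongExp;   // hypothetical: specular exponent

float4 PhongPS(float3 normalW : TEXCOORD0,   // interpolated world-space normal
               float3 posW    : TEXCOORD1)   // interpolated world-space position
    : COLOR
{
    // Interpolated normals are generally not unit length, so renormalize per pixel.
    float3 n = normalize(normalW);
    float3 v = normalize(gEyePos - posW);    // direction from surface to eye
    float3 r = reflect(-gLightDir, n);       // light direction mirrored about the normal

    float diff = max(0, dot(n, gLightDir));
    float spec = pow(max(0, dot(r, v)), gPhongExp);

    // Suppress the highlight on surfaces facing away from the light.
    spec = (diff > 0.0f) ? spec : 0.0f;

    return float4(gDiffuse.rgb * diff + gSpecular.rgb * spec, gDiffuse.a);
}
```

The shader uses only features on the "have" side of the HLSL / Cg list: no pointers, no recursion, and a single conditional.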

A Simple Vertex Shader

    // float4x4 is a reserved data type; gWVP is passed from C# apps
    uniform extern float4x4 gWVP;

    struct VtxOutput {
        float4 position : POSITION;   // POSITION and COLOR are semantics
        float4 color    : COLOR;
    };

    // position is the input to the vertex shader
    VtxOutput All_greenVS(float2 position : POSITION)
    {
        VtxOutput OUT;
        OUT.position = mul(float4(position, -30.0f, 1.0f), gWVP);
        OUT.color = float4(0, 1, 0, 1);
        return OUT;                   // structure passed to fragment shader
    }

A Simple Vertex Shader (Alternative)

    uniform extern float4x4 gWVP;

    // No structure declaration: outputs are declared as out parameters
    void All_greenVS(float2 position : POSITION,
                     out float4 oPosition : POSITION,
                     out float4 oColor : COLOR)
    {
        oPosition = mul(float4(position, -30.0f, 1.0f), gWVP);
        oColor = float4(0, 1, 0, 1);
    }

Adapted from Cg Tutorial
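To actually draw with `All_greenVS`, it needs a pixel shader and, in a Direct3D 9 / XNA effect file, a technique that binds the two. The sketch below repeats the structure-based vertex shader so the file is self-contained; `All_greenPS`, the technique name `AllGreen`, and the `vs_2_0` / `ps_2_0` compile targets are assumptions, not part of the slides.

```hlsl
uniform extern float4x4 gWVP;   // world-view-projection matrix, set by the application

struct VtxOutput {
    float4 position : POSITION;
    float4 color    : COLOR;
};

// Vertex shader from the slide: places 2D input at z = -30 and colors it green.
VtxOutput All_greenVS(float2 position : POSITION)
{
    VtxOutput OUT;
    OUT.position = mul(float4(position, -30.0f, 1.0f), gWVP);
    OUT.color = float4(0, 1, 0, 1);
    return OUT;
}

// Hypothetical pixel shader: returns the interpolated vertex color unchanged.
float4 All_greenPS(float4 color : COLOR) : COLOR
{
    return color;
}

// Hypothetical technique binding both shaders; targets are an assumption.
technique AllGreen
{
    pass P0
    {
        VertexShader = compile vs_2_0 All_greenVS();
        PixelShader  = compile ps_2_0 All_greenPS();
    }
}
```

The `COLOR` output of the vertex shader and the `COLOR` input of the pixel shader are matched by semantic, which is how the interpolated value reaches the pixel stage.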
