Tutorial 5: Programming Graphics Hardware

Overview of the Tutorial: Morning
  8:30  Introduction to the Hardware Graphics Pipeline (Cyril Zeller)
  9:30  Controlling the GPU from the CPU: the 3D API (Cyril Zeller)
  10:15 Break
  10:45 Programming the GPU: High-level Shading Languages (Randy Fernando)
  12:00 Lunch

Introduction to the Hardware Graphics Pipeline
  Cyril Zeller

Overview of the Tutorial: Afternoon
  12:00 Lunch
  14:00 Optimizing the Graphics Pipeline (Matthias Wloka)
  14:45 Advanced Rendering Techniques (Matthias Wloka)
  15:45 Break
  16:15 General-Purpose Computation Using Graphics Hardware (Mark Harris)
  17:30 End

Overview
  Concepts:
    Real-time rendering
    Hardware graphics pipeline
  Evolution of the PC hardware graphics pipeline:
    1995-1998: Texture mapping and z-buffer
    1998: Multitexturing
    1999-2000: Transform and lighting
    2001: Programmable vertex shader
    2002-2003: Programmable pixel shader
    2004: Shader model 3.0 and 64-bit color support
  PC graphics software architecture
  Performance numbers

Real-Time Rendering
  Graphics hardware enables real-time rendering
  Real-time means a display rate of more than 10 images per second
  3D Scene = Collection of 3D primitives (triangles, lines, points)
  Image = Array of pixels

Hardware Graphics Pipeline
  Application Stage → 3D Triangles → Geometry Stage → 2D Triangles → Rasterization Stage → Pixels
  Geometry Stage, for each triangle vertex:
    Transform 3D position into screen position
    Compute attributes
  Rasterization Stage, for each triangle:
    Rasterize triangle
    Interpolate vertex attributes across triangle
    Shade pixels
    Resolve visibility

PC Architecture
  Motherboard:
    Central Processor Unit (CPU)
    System Memory
    Bus Port (PCI, AGP, PCIe)
  Video Board:
    Graphics Processor Unit (GPU)
    Video Memory

1995-1998: Texture Mapping and Z-Buffer
  CPU: Application / Geometry Stage
  GPU: Rasterization Stage (Rasterizer, Texture Unit, Raster Operations Unit)
  System Memory: 2D Triangles, Textures
  Bus (PCI)
  Video Memory: 2D Triangles, Textures, Frame Buffer
  PCI: Peripheral Component Interconnect
  3dfx’s Voodoo
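
Texture Mapping
  Triangle Mesh + Base Texture = Triangle Mesh textured with Base Texture

Texture Mapping: Texture Coordinates Interpolation
  Screen Space (x, y): triangle vertices (x0, y0), (x1, y1), (x2, y2)
  Texture Space (u, v): texture coordinates (u0, v0), (u1, v1), (u2, v2)
  A point (x, y) inside the triangle maps to interpolated texture coordinates (u, v)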
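
The interpolation of texture coordinates across a screen-space triangle can be sketched on the CPU as follows. This is a minimal Python illustration, not the hardware's actual algorithm: it assumes barycentric weights and plain (affine) interpolation, and the function names are hypothetical.

```python
def barycentric_weights(p, p0, p1, p2):
    """Return weights (w0, w1, w2) such that p = w0*p0 + w1*p1 + w2*p2."""
    (x, y), (x0, y0), (x1, y1), (x2, y2) = p, p0, p1, p2
    # Twice the signed area of the triangle
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        raise ValueError("degenerate triangle")
    w1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / area
    w2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / area
    return 1.0 - w1 - w2, w1, w2


def interpolate_uv(p, screen_vertices, uvs):
    """Affine interpolation of (u, v) at screen position p."""
    weights = barycentric_weights(p, *screen_vertices)
    u = sum(w * uv[0] for w, uv in zip(weights, uvs))
    v = sum(w * uv[1] for w, uv in zip(weights, uvs))
    return u, v
```

For example, `interpolate_uv((1, 1), [(0, 0), (4, 0), (0, 4)], [(0, 0), (1, 0), (0, 1)])` evaluates the texture coordinates at a point inside the triangle.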

Texture Mapping: Perspective-Correct Interpolation
  [Figure panels: Perspective-Incorrect, Perspective-Correct]

Texture Mapping: Magnification
  [Figure: Screen Space (x, y) mapped to Texture Space (u, v)]
  Nearest-Point Sampling or Bilinear Filtering
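
The difference between perspective-incorrect and perspective-correct interpolation can be sketched along a single edge. The Python below is an illustration under stated assumptions: `w0` and `w1` stand for the clip-space w of the two endpoints, `t` is the screen-space interpolation parameter, and all names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def affine_u(u0, u1, t):
    """Perspective-incorrect: interpolate u directly in screen space."""
    return lerp(u0, u1, t)


def perspective_correct_u(u0, w0, u1, w1, t):
    """Perspective-correct: interpolate u/w and 1/w, then divide."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w
```

When both endpoints share the same w, the two functions agree; they diverge as the depth difference along the edge grows.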

Texture Mapping: Minification
  [Figure: Screen Space (x, y) mapped to Texture Space (u, v)]

Texture Mapping: Mipmapping
  Bilinear Filtering or Trilinear Filtering
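
The sampling modes named on these slides can be sketched in Python for a single-channel texture stored as a list of rows. This is a simplified illustration, not the hardware's exact behavior: it assumes texel centers at half-integer positions, clamped edges, and a caller-supplied level of detail `lod`. All function names are hypothetical.

```python
import math


def texel(tex, x, y):
    """Fetch a texel, clamping coordinates to the texture edges."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def sample_nearest(tex, u, v):
    """Nearest-point sampling: pick the texel containing (u, v)."""
    h, w = len(tex), len(tex[0])
    return texel(tex, math.floor(u * w), math.floor(v * h))


def sample_bilinear(tex, u, v):
    """Bilinear filtering: blend the four texels around (u, v)."""
    h, w = len(tex), len(tex[0])
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = texel(tex, x0, y0) * (1 - fx) + texel(tex, x0 + 1, y0) * fx
    bottom = texel(tex, x0, y0 + 1) * (1 - fx) + texel(tex, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def sample_trilinear(mipmaps, u, v, lod):
    """Trilinear filtering: blend bilinear samples from two mipmap levels."""
    lod = min(max(lod, 0.0), len(mipmaps) - 1)
    lo = math.floor(lod)
    hi = min(lo + 1, len(mipmaps) - 1)
    f = lod - lo
    return sample_bilinear(mipmaps[lo], u, v) * (1 - f) + sample_bilinear(mipmaps[hi], u, v) * f
```

Here `mipmaps[0]` is the full-resolution texture and each later entry is a lower-resolution level.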

Texture Mapping: Anisotropic Filtering
  [Figure panels: Bilinear Filtering, Trilinear Filtering]

Texture Mapping: Addressing Modes
  [Figure panels: Wrap, Mirror]
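
Addressing modes decide what happens to a texture coordinate that falls outside [0, 1]. The Python sketch below shows a common interpretation of the wrap, mirror, and clamp modes for one coordinate; the function names are hypothetical and the exact edge behavior of real hardware may differ.

```python
import math


def wrap(u):
    """Wrap: repeat the texture, keeping only the fractional part."""
    return u - math.floor(u)


def mirror(u):
    """Mirror: repeat the texture, flipping it on every other repetition."""
    t = u % 2.0  # the mirrored pattern has period 2
    return 2.0 - t if t > 1.0 else t


def clamp(u):
    """Clamp: hold the edge value outside the [0, 1] range."""
    return min(max(u, 0.0), 1.0)
```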

Texture Mapping: Addressing Modes
  [Figure panels: Clamp, Border Color]

Raster Operations Unit (ROP)
  Input: Fragments from the Rasterizer and Texture Unit
  Fragment: Screen Position (x, y), Alpha Value a, Depth z, Color (r, g, b)
  Operations, in order:
    Scissor Test: (x, y) tested against scissor rectangle
    Alpha Test: a tested against reference value
    Stencil Test: stencil buffer value at (x, y) tested against reference value
    Z Test: z tested against z-buffer value at (x, y) (visibility test)
    Alpha Blending: (r, g, b) blended with color buffer value at (x, y):
      Ksrc * Colorsrc + Kdst * Colordst (src = fragment, dst = color buffer)
  Output: Pixels
  Frame Buffer: Stencil Buffer, Z-Buffer, Color Buffer
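
The order of raster operations can be sketched as a small Python function. This is an illustration only: real APIs let the application choose each comparison function, whereas the sketch assumes "alpha greater than or equal to the reference", "stencil equal to the reference", and "z less than the z-buffer value". The dictionary layout and all names are hypothetical.

```python
def raster_ops(frag, fb, scissor, alpha_ref, stencil_ref, k_src, k_dst):
    """Run one fragment through the tests; return True if a pixel was written."""
    x, y = frag["x"], frag["y"]
    x0, y0, x1, y1 = scissor
    # Scissor test: (x, y) against the scissor rectangle
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False
    # Alpha test (assumed comparison: pass when alpha >= reference)
    if frag["alpha"] < alpha_ref:
        return False
    # Stencil test (assumed comparison: pass when equal to reference)
    if fb["stencil"][y][x] != stencil_ref:
        return False
    # Z test (assumed comparison: pass when closer than the stored depth)
    if frag["z"] >= fb["depth"][y][x]:
        return False
    fb["depth"][y][x] = frag["z"]
    # Alpha blending: Ksrc * Colorsrc + Kdst * Colordst
    dst = fb["color"][y][x]
    fb["color"][y][x] = tuple(k_src * s + k_dst * d for s, d in zip(frag["color"], dst))
    return True
```

A fragment that fails any test is discarded before it can modify the depth or color buffers.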

1998: Multitexturing
  CPU: Application / Geometry Stage
  GPU: Rasterization Stage (Rasterizer, Multitexture Unit, Raster Operations Unit)
  System Memory: 2D Triangles, Textures
  Bus (AGP)
  Video Memory: 2D Triangles, Textures, Frame Buffer
  AGP: Accelerated Graphics Port
  NVIDIA’s TNT, ATI’s Rage

AGP
  PCI uses a parallel connection
  AGP uses a serial connection
    → Fewer pins, simpler protocol
    → Cheaper, more scalable
  PCI uses a shared-bus protocol
  AGP uses a point-to-point protocol
    → Bandwidth is not shared among devices
  AGP uses a dedicated system memory called AGP memory or non-local video memory
    The GPU can look up textures that reside in AGP memory
  Bandwidth: AGP = 2 x PCI (AGP2x = 2 x AGP, etc.)

Multitexturing
  Base Texture x Light Map = Base Texture modulated by Light Map
  [Figure: from UT2004 (c) Epic Games Inc. Used with permission]

1999-2000: Transform and Lighting
  “Fixed Function Pipeline”
  CPU: Application Stage
  GPU:
    Geometry Stage (Transform and Lighting Unit)
    Rasterization Stage (Rasterizer, Texture Unit, Register Combiner, Raster Operations Unit)
  System Memory: 3D Triangles, Textures
  Bus (AGP)
  Video Memory: 3D Triangles, Textures, Frame Buffer
  Register Combiner: Offers many more texture/color combinations
  NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500, S3’s Savage3D
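
Modulating a base texture by a light map, as on the Multitexturing slide, is a per-texel, per-channel multiplication. The Python sketch below assumes both textures have the same resolution and store RGB tuples in [0, 1]; real light maps are usually lower resolution and filtered on lookup. The function names are hypothetical.

```python
def modulate(base_color, light_color):
    """Multiply two RGB colors channel by channel."""
    return tuple(b * l for b, l in zip(base_color, light_color))


def multitexture(base_texture, light_map):
    """Modulate every texel of base_texture by the matching light map texel."""
    return [
        [modulate(b, l) for b, l in zip(base_row, light_row)]
        for base_row, light_row in zip(base_texture, light_map)
    ]
```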

Transform and Lighting Unit (TnL)
  Transform:
    Model or Object Space
      → World Matrix →
    World Space
      → View Matrix →
    Camera or Eye Space
      → Projection Matrix →
    Projection or Clip Space
      → Perspective Division and Viewport Matrix →
    Screen or Window Space
    (World Matrix and View Matrix can be combined into the Model-View Matrix)
  Lighting:
    Inputs: Material Properties, Light Properties, Vertex Color
    Output: Vertex Diffuse and Specular Color

Bump Mapping
  Bump mapping is about fetching the normal from a texture (called a normal map) instead of using the interpolated normal to compute lighting at a given pixel
  [Figure panels: Normal Map, Diffuse light without bumps, Diffuse light with bumps]
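
The chain of transforms from model space to window space can be sketched in Python with plain 4x4 matrices. The sketch assumes column vectors (matrix times vector), normalized device coordinates in [-1, 1], and a window origin at the top-left corner; these conventions differ between APIs, and the function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def to_window(position_model, world, view, projection, width, height):
    """Transform a model-space position (x, y, z, 1) to window coordinates."""
    model_to_clip = mat_mul(projection, mat_mul(view, world))
    x, y, z, w = mat_vec(model_to_clip, position_model)
    # Perspective division: clip space to normalized device coordinates
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform (assumed: y axis pointing down in window space)
    window_x = (ndc_x * 0.5 + 0.5) * width
    window_y = (1.0 - (ndc_y * 0.5 + 0.5)) * height
    return window_x, window_y, ndc_z
```

Concatenating the three matrices once per batch and applying the result to each vertex mirrors how a single model-to-projection matrix is typically passed to the GPU.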

Cube Texture Mapping
  Cubemap (covering the six faces of a cube)
  Environment Mapping (the reflection vector is used to look up the cubemap)
  Cubemap lookup (with direction (x, y, z))

Projective Texture Mapping
  Projected Texture
  Texture Projection: (x, y, z, w) → (x/w, y/w)
  Projective Texture lookup
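
Both lookups reduce a higher-dimensional coordinate to a 2D texture coordinate. The Python sketch below follows the face-selection convention of the OpenGL cube map specification (the major axis picks the face) and shows the projective divide; the face labels and function names are hypothetical, and other APIs may orient the faces differently.

```python
def cubemap_coords(x, y, z):
    """Map a direction (x, y, z) to (face, s, t) with s and t in [0, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:  # major axis is x
        face = "+x" if x > 0 else "-x"
        sc, tc, ma = (-z, -y, ax) if x > 0 else (z, -y, ax)
    elif ay >= az:  # major axis is y
        face = "+y" if y > 0 else "-y"
        sc, tc, ma = (x, z, ay) if y > 0 else (x, -z, ay)
    else:  # major axis is z
        face = "+z" if z > 0 else "-z"
        sc, tc, ma = (x, -y, az) if z > 0 else (-x, -y, az)
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


def projective_coords(x, y, z, w):
    """Projective texture lookup: divide by w to get 2D coordinates."""
    return x / w, y / w
```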

2001: Programmable Vertex Shader
  CPU: Application Stage
  GPU:
    Geometry Stage (Vertex Shader, no flow control)
    Rasterization Stage (Rasterizer with Z-Cull, Texture Shader, Register Combiner, Raster Operations Unit)
  System Memory: 3D Triangles, Textures
  Bus (AGP)
  Video Memory: 3D Triangles, Textures, Frame Buffer
  Z-Cull: Predicts which fragments will fail the Z test and discards them
  Texture Shader: Offers more texture addressing and operations
  NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500

Vertex Shader
  A programmable processor for any per-vertex computation

    void VertexShader(
        // Input per vertex
        in float4 positionInModelSpace,
        in float2 textureCoordinates,
        in float3 normal,
        // Input per batch of triangles
        uniform float4x4 modelToProjection,
        uniform float3 lightDirection,
        // Output per vertex
        out float4 positionInProjectionSpace,
        out float2 textureCoordinatesOutput,
        out float3 color
    )
    {
        // Vertex transformation
        positionInProjectionSpace = mul(modelToProjection, positionInModelSpace);
        // Texture coordinates copy
        textureCoordinatesOutput = textureCoordinates;
        // Vertex color computation
        color = dot(lightDirection, normal);
    }

Volume Texture Mapping
  Volume Texture (3D Noise)
  Noise Perturbation
  Volume Texture lookup (with position (x, y, z))

Hardware Shadow Mapping
  Shadow Map Computation:
    The shadow map contains the depth z/w of the 3D points visible from the light’s point of view
    A 3D point (x, y, z, w) seen from the spot light is stored at (x/w, y/w)
  Shadow Rendering:
    A 3D point (x, y, z, w) is in shadow if its depth z/w is greater than the value of the shadow map at (x/w, y/w), that is, if it lies behind the closest surface seen from the light
    A hardware shadow map lookup returns the result of this depth comparison as a value between 0 and 1
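
The shadow test can be sketched in Python as a depth comparison against the stored shadow map value. This illustration assumes that larger depth means farther from the light, that (x/w, y/w) already lies in [0, 1] texture space, and that a small `bias` is used to avoid self-shadowing; the bias and all names are assumptions, not part of the slides.

```python
def in_shadow(point_light_space, shadow_map, bias=1e-3):
    """Return True if the point is farther from the light than the stored depth."""
    x, y, z, w = point_light_space
    # Projective divide into shadow map space
    sx, sy, depth = x / w, y / w, z / w
    rows, cols = len(shadow_map), len(shadow_map[0])
    col = min(max(int(sx * cols), 0), cols - 1)
    row = min(max(int(sy * rows), 0), rows - 1)
    # Depth comparison against the closest surface seen from the light
    return depth > shadow_map[row][col] + bias
```

Hardware typically filters several such comparison results, which is why a lookup can return a value between 0 and 1 rather than a strict boolean.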

Antialiasing: Definition
  Aliasing: Undesirable visual artifacts due to insufficient sampling of:
    Primitives (triangles, lines, etc.) → jagged edges
    Textures or shaders → pixelation, moiré patterns
  Those artifacts are even more noticeable on animated images.
  Antialiasing: Method to reduce aliasing
    Texture antialiasing is largely handled by proper mipmapping and anisotropic filtering
    Shader antialiasing can be tricky (especially with conditionals)

Antialiasing: Supersampling and Multisampling
  Supersampling: Compute color and Z at higher resolution and display averaged color to smooth out the visual artifacts
  Multisampling: Same thing except only Z is computed at higher resolution
    Multisampling performs antialiasing on primitive edges only
  [Figure labels: Pixel Center, Sample]
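
The averaging step of supersampling can be sketched in Python: render at a higher resolution, then average each block of samples into one displayed pixel. The sketch assumes a single-channel image stored as a list of rows, an ordered-grid sample pattern, and a box filter; the function name is hypothetical.

```python
def downsample(image, factor):
    """Average each factor x factor block of samples into one pixel."""
    height, width = len(image) // factor, len(image[0]) // factor
    result = []
    for py in range(height):
        row = []
        for px in range(width):
            samples = [
                image[py * factor + sy][px * factor + sx]
                for sy in range(factor)
                for sx in range(factor)
            ]
            row.append(sum(samples) / len(samples))
        result.append(row)
    return result
```

A pixel partly covered by a primitive edge ends up with an intermediate value, which is what smooths out the jagged edge.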

2002-2003: Programmable Pixel Shader
  CPU: Application Stage
  GPU:
    Geometry Stage (Vertex Shader, static and dynamic flow control)
    Rasterization Stage (Rasterizer with Z-Cull, Pixel Shader with static flow control only, Texture Unit, Raster Operations Unit)
  System Memory: 3D Triangles, Textures
  Bus (AGP)
  Video Memory: 3D Triangles, Textures, Frame Buffer
  MRT: Multiple Render Target
  NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800

Pixel Shader
  A programmable processor for any per-pixel computation

    void PixelShader(
        // Input per pixel
        in float2 textureCoordinates,
        in float3 normal,
        // Input per batch of triangles
        uniform sampler2D baseTexture,
        uniform float3 lightDirection,
        // Output per pixel
        out float3 color
    )
    {
        // Texture lookup (tex2D returns four components; keep RGB)
        float3 baseColor = tex2D(baseTexture, textureCoordinates).rgb;
        // Light computation
        float light = dot(lightDirection, normal);
        // Pixel color computation
        color = baseColor * light;
    }

Shader: Static vs. Dynamic Flow Control

    void Shader(
        ...
        // Input per vertex or per pixel
        in float3 normal,
        // Input per batch of triangles
        uniform float3 lightDirection,
        uniform bool computeLight,
        ...
    )
    {
        ...
        // Static flow control (condition varies per batch of triangles)
        if (computeLight) {
            ...
            // Dynamic flow control (condition varies per vertex or pixel)
            if (dot(lightDirection, normal) > 0) {
                ...
            }
            ...
        }
        ...
    }

2004: Shader Model 3.0 and 64-Bit Color Support
  CPU: Application Stage
  GPU:
    Geometry Stage (Vertex Shader, static and dynamic flow control)
    Rasterization Stage (Rasterizer with Z-Cull, Pixel Shader with static and dynamic flow control, Texture Unit, Raster Operations Unit)
    64-Bit Color
  System Memory: 3D Triangles, Textures
  Bus (PCIe)
  Video Memory: 3D Triangles, Textures, Frame Buffer
  PCIe: Peripheral Component Interconnect Express
  NVIDIA’s GeForce 6800

PCIe
  Like AGP:
    Uses a serial connection → Cheap, scalable
    Uses a point-to-point protocol → No shared bandwidth
  Unlike AGP:
    General-purpose (not only for graphics)
    Dual-channel: Bandwidth is available in both directions
  Bandwidth: PCIe = 2 x AGP8x

Shader Model 3.0
  Shader Model 3.0 means:
    Longer shaders → More complex shading
    Pixel shader:
      Dynamic flow control → Better performance
      Derivative instructions → Shader antialiasing
      Support for 32-bit floating-point precision → Fewer artifacts
      Face register → Faster two-sided lighting
    Vertex shader:
      Texture access → Simulation on GPU, displacement mapping

64-Bit Color Support
  64-bit color means one 16-bit floating-point value per channel (R, G, B, A)
  Alpha blending works with 64-bit color buffer (as opposed to 32-bit fixed-point color buffer only)
  Texture filtering works with 64-bit textures (as opposed to 32-bit fixed-point textures only)
  Applications:
    High-precision image compositing
    High dynamic range imagery

High Dynamic Range Imagery
  The dynamic range of a scene is the ratio of the highest to the lowest luminance
  Real-life scenes can have high dynamic ranges of several millions
  Display and print devices have a low dynamic range of around 100
  Tone mapping is the process of displaying high dynamic range images on those low dynamic range devices
  High dynamic range images use floating-point colors
  OpenEXR is a high dynamic range image format that is compatible with NVIDIA’s 64-bit color format
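
The ideas on these two slides can be sketched in Python. `pack_rgba16f` packs four half-precision floats into 8 bytes (64 bits) using the standard `struct` module's `e` format; `dynamic_range` computes the luminance ratio; `tone_map` applies one simple operator, L / (1 + L), chosen here purely for illustration and not necessarily the operator used in the tutorial. All function names and the `exposure` parameter are hypothetical.

```python
import struct


def pack_rgba16f(r, g, b, a):
    """Pack one 64-bit color: four 16-bit floating-point channels."""
    return struct.pack("<4e", r, g, b, a)


def dynamic_range(luminances):
    """Ratio of the highest to the lowest luminance (all values must be > 0)."""
    return max(luminances) / min(luminances)


def tone_map(luminance, exposure=1.0):
    """Compress a luminance in [0, inf) to [0, 1) for display."""
    scaled = luminance * exposure
    return scaled / (1.0 + scaled)
```

Raising `exposure` brightens dark regions at the cost of compressing highlights, which is how a series from low to high exposure of the same scene can be produced from one high dynamic range image.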

Real-Time Tone Mapping
  The image is entirely computed in 64-bit color and tone-mapped for display
  [Figure: From low to high exposure image of the same scene]

PC Graphics Software Architecture
  CPU: Application → 3D API (OpenGL or DirectX) → Driver
  BUS
  GPU: Vertex Shader (runs the Vertex Program), Pixel Shader (runs the Pixel Program)
  Data sent to the GPU: Commands, Programs, Geometry (triangles, vertices, normals, etc.), Textures
  Memory: System Memory, Video Memory
  Pushbuffer: Contains the commands to be executed on the GPU
  The application, 3D API and driver are written in C or C++
  The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)

Evolution of Performance
  [Chart: Mpixels/s, Mvertices/s and Mtransistors on a logarithmic axis (10 to 10 000), by year from 1995 to 2004]
  Bus: PCI (133 MB/s), AGP (266 MB/s), AGP2x (533 MB/s), AGP4x (1.06 GB/s), AGP8x (2.1 GB/s), PCIe (4 GB/s)
  Video memory: 4 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB
  DirectX: DirectX 6, DirectX 7, DirectX 8, DirectX 9
  OpenGL: OpenGL 1.1, OpenGL 1.2, OpenGL 1.3, OpenGL 1.4, OpenGL 1.5

The Future
  Unified general programming model at primitive, vertex and pixel levels
  Scary amounts of:
    Floating point horsepower
    Video memory
    Bandwidth between system and video memory
  Lower chip costs and power requirements to make 3D graphics hardware ubiquitous:
    Automotive (gaming, navigation, heads-up displays)
    Home (remotes, media center, automation)
    Mobile (PDAs, cell phones)
More Information
www.realtimerendering.com developer.nvidia.com
Questions: [email protected]