
Tutorial 5: Programming Graphics Hardware

Introduction to the Hardware Graphics Pipeline
Cyril Zeller

Overview of the Tutorial: Morning

8:30   Introduction to the Hardware Graphics Pipeline (Cyril Zeller)
9:30   Controlling the GPU from the CPU: the 3D API (Cyril Zeller)
10:15  Break
10:45  Programming the GPU: High-level Languages (Randy Fernando)
12:00  Lunch

Overview of the Tutorial: Afternoon

12:00  Lunch
14:00  Optimizing the Graphics Pipeline (Matthias Wloka)
14:45  Advanced Rendering Techniques (Matthias Wloka)
15:45  Break
16:15  General-Purpose Computation Using Graphics Hardware (Mark Harris)
17:30  End

Overview

Concepts:
  - Real-time rendering
  - Hardware graphics pipeline
Evolution of the PC hardware graphics pipeline:
  - 1995-1998: Texture mapping and z-buffer
  - 1998: Multitexturing
  - 1999-2000: Transform and lighting
  - 2001: Programmable vertex shader
  - 2002-2003: Programmable pixel shader
  - 2004: Shader model 3.0 and 64-bit color support
PC architecture
Performance numbers

Real-Time Rendering

  - Graphics hardware enables real-time rendering
  - Real-time means a display rate of more than 10 images per second
  - 3D scene = collection of 3D primitives (triangles, lines, points)
  - Image = array of pixels

Hardware Graphics Pipeline

Application Stage → 3D triangles → Geometry Stage → 2D triangles → Rasterization Stage

Geometry Stage, for each triangle vertex:
  - Transform 3D position into screen position
  - Compute attributes
Rasterization Stage, for each triangle:
  - Rasterize triangle
  - Interpolate vertex attributes across triangle
  - Shade pixels
  - Resolve visibility

PC Architecture

Motherboard:
  - Central Processor Unit (CPU)
  - System Memory
  - Bus Port (PCI, AGP, PCIe)
Video Board:
  - Graphics Processor Unit (GPU)
  - Video Memory

1995-1998: Texture Mapping and Z-Buffer

CPU: Application / Geometry Stage
GPU: Rasterization Stage (Rasterizer, Texture Unit, Raster Operations Unit)
System Memory: 2D triangles, textures
Bus: PCI
Video Memory: 2D triangles, textures, frame buffer

PCI: Peripheral Component Interconnect
3dfx’s Voodoo

Texture Mapping

Triangle mesh + base texture = triangle mesh textured with base texture

Texture Mapping: Texture Coordinates Interpolation

Each triangle vertex has a position in screen space and a coordinate in texture space:
  (x0, y0) ↔ (u0, v0)
  (x1, y1) ↔ (u1, v1)
  (x2, y2) ↔ (u2, v2)
The texture coordinates (u, v) of a pixel (x, y) inside the triangle are interpolated from those of the three vertices.
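As a minimal sketch of the interpolation described above, the following Python computes barycentric weights for a pixel position and uses them to blend the texture coordinates of the three vertices. The function names are illustrative assumptions, not taken from the tutorial, and the interpolation is the plain screen-space form with no perspective correction.

```python
def barycentric_weights(p, p0, p1, p2):
    """Return the barycentric weights (w0, w1, w2) of 2D point p in a triangle."""
    (x, y), (x0, y0), (x1, y1), (x2, y2) = p, p0, p1, p2
    # Twice the signed area of the triangle.
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        raise ValueError("degenerate triangle")
    w1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / area
    w2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / area
    return 1.0 - w1 - w2, w1, w2


def interpolate_uv(p, screen_positions, tex_coords):
    """Interpolate (u, v) at screen position p from the three vertices."""
    weights = barycentric_weights(p, *screen_positions)
    u = sum(w * uv[0] for w, uv in zip(weights, tex_coords))
    v = sum(w * uv[1] for w, uv in zip(weights, tex_coords))
    return u, v
```

At a vertex the weights collapse to (1, 0, 0), (0, 1, 0) or (0, 0, 1), so the pixel receives exactly that vertex's texture coordinates.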

Texture Mapping: Perspective-Correct Interpolation

[Figure: perspective-incorrect vs. perspective-correct texturing]

Texture Mapping: Magnification

[Figure: screen space (x, y) and texture space (u, v); nearest-point sampling or bilinear filtering]
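The slides show perspective-correct interpolation only as a pair of images. As a hedged sketch of the usual technique, the Python below interpolates a texture coordinate between two endpoints both ways: directly in screen space (perspective-incorrect), and by interpolating u/w and 1/w and dividing (perspective-correct). The parameter w stands for the homogeneous w of each endpoint after projection; this parameterization and the function names are assumptions made for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def interpolate_affine(u0, u1, t):
    """Perspective-incorrect: interpolate u directly in screen space."""
    return lerp(u0, u1, t)


def interpolate_perspective(u0, w0, u1, w1, t):
    """Perspective-correct: interpolate u/w and 1/w, then divide."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w
```

When both endpoints have the same w the two functions agree; the results diverge as the depth difference across the primitive grows.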

Texture Mapping: Minification

[Figure: screen space (x, y) and texture space (u, v)]

Texture Mapping: Mipmapping

[Figure: bilinear filtering or trilinear filtering]
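The filters named on the magnification and mipmapping slides can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions: grayscale textures stored as square power-of-two lists of lists, texel centers at half-integer positions, edge clamping, and a mipmap chain built with a 2x2 box filter. The function names and the explicit `lod` parameter are hypothetical; real hardware derives the level of detail from screen-space derivatives.

```python
import math


def texel(level, x, y):
    """Fetch a texel from a 2D list, clamping indices to the edges."""
    h, w = len(level), len(level[0])
    return level[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def sample_nearest(level, u, v):
    """Nearest-point sampling: return the texel whose cell contains (u, v)."""
    h, w = len(level), len(level[0])
    return texel(level, math.floor(u * w), math.floor(v * h))


def sample_bilinear(level, u, v):
    """Bilinear filtering: blend the four texels around (u, v)."""
    h, w = len(level), len(level[0])
    x, y = u * w - 0.5, v * h - 0.5  # texel centers sit at half-integers
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = texel(level, x0, y0) * (1 - fx) + texel(level, x0 + 1, y0) * fx
    bottom = (texel(level, x0, y0 + 1) * (1 - fx)
              + texel(level, x0 + 1, y0 + 1) * fx)
    return top * (1 - fy) + bottom * fy


def build_mipmaps(base):
    """Build a mipmap chain by averaging 2x2 blocks down to a 1x1 level."""
    chain = [base]
    while len(chain[-1]) > 1:
        prev = chain[-1]
        half = len(prev) // 2
        chain.append([[(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                        + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                       for x in range(half)] for y in range(half)])
    return chain


def sample_trilinear(chain, u, v, lod):
    """Trilinear filtering: blend bilinear samples from two adjacent mip levels."""
    lod = min(max(lod, 0.0), len(chain) - 1.0)
    lower = math.floor(lod)
    upper = min(lower + 1, len(chain) - 1)
    frac = lod - lower
    return (sample_bilinear(chain[lower], u, v) * (1 - frac)
            + sample_bilinear(chain[upper], u, v) * frac)
```

Bilinear filtering reads one mip level; trilinear filtering reads two and blends them, which hides the visible seam where the selected level changes.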

Texture Mapping:

[Figure: bilinear filtering, trilinear filtering]

Texture Mapping: Addressing Modes

[Figure: wrap or mirror]
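An addressing mode decides what happens to a texture coordinate that falls outside [0, 1]. The sketch below remaps a single coordinate for the wrap and mirror modes, plus clamp, which the tutorial also lists among the addressing modes. The function names are illustrative assumptions; border color is left out because it returns a constant color rather than remapping the coordinate.

```python
import math


def wrap(t):
    """Wrap: repeat the texture, keeping only the fractional part."""
    return t - math.floor(t)


def mirror(t):
    """Mirror: repeat the texture, flipping it on every other repetition."""
    period = t - 2.0 * math.floor(t / 2.0)  # position within [0, 2)
    return period if period <= 1.0 else 2.0 - period


def clamp(t):
    """Clamp: pin the coordinate to the nearest edge of [0, 1]."""
    return min(max(t, 0.0), 1.0)
```

For example, the coordinate 1.25 maps to 0.25 under wrap, 0.75 under mirror, and 1.0 under clamp.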

Texture Mapping: Addressing Modes

[Figure: clamp or border color]

Raster Operations Unit (ROP)

Fragments arrive from the rasterizer and the texture unit. Each fragment carries a screen position (x, y), an alpha value a, a depth z and a color (r, g, b), and goes through the following stages in order:

  Scissor Test     screen position (x, y) tested against scissor rectangle
  Alpha Test       alpha value a tested against reference value
  Stencil Test     stencil buffer value at (x, y) tested against reference value
  Z Test           depth z tested against z-buffer value at (x, y) (visibility test)
  Alpha Blending   color (r, g, b) blended with color buffer value at (x, y):
                   Ksrc * Colorsrc + Kdst * Colordst
                   (src = fragment, dst = color buffer)

The frame buffer holds the stencil buffer, the z-buffer and the color buffer. The output of the unit is pixels.
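The sequence of raster operations can be sketched as a single function that either rejects a fragment or writes a blended pixel. This is an illustration under assumptions, not the hardware's behavior: the comparison functions are fixed here (alpha "greater than", stencil "equal", depth "less than") whereas 3D APIs make them configurable, and the dictionary layout and names are hypothetical.

```python
def raster_ops(fragment, buffers, scissor, alpha_ref, stencil_ref, k_src, k_dst):
    """Run one fragment through the tests; return True if a pixel was written."""
    x, y = fragment["x"], fragment["y"]
    left, top, right, bottom = scissor

    # Scissor test: screen position against the scissor rectangle.
    if not (left <= x < right and top <= y < bottom):
        return False
    # Alpha test: alpha value against a reference value ("greater than" here).
    if not fragment["a"] > alpha_ref:
        return False
    # Stencil test: stencil buffer value against a reference value ("equal" here).
    if buffers["stencil"][y][x] != stencil_ref:
        return False
    # Z test: fragment depth against the z-buffer value ("less than" here).
    if not fragment["z"] < buffers["depth"][y][x]:
        return False

    # The fragment is visible: record its depth and blend its color.
    buffers["depth"][y][x] = fragment["z"]
    src = fragment["color"]
    dst = buffers["color"][y][x]
    # Alpha blending: Ksrc * Colorsrc + Kdst * Colordst.
    buffers["color"][y][x] = tuple(k_src * s + k_dst * d for s, d in zip(src, dst))
    return True
```

A common choice of blend factors is Ksrc = a and Kdst = 1 - a, which composites the fragment over the existing color buffer value.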

1998: Multitexturing

CPU: Application / Geometry Stage
GPU: Rasterization Stage (Rasterizer, Multitexture Unit, Raster Operations Unit)
System Memory: 2D triangles, textures
Bus: AGP
Video Memory: 2D triangles, textures, frame buffer

AGP: Accelerated Graphics Port
NVIDIA’s TNT, ATI’s Rage

AGP

  - PCI uses a parallel connection; AGP uses a serial connection
    → Fewer pins, simpler protocol → Cheaper, more scalable
  - PCI uses a shared-bus protocol; AGP uses a point-to-point protocol
    → Bandwidth is not shared among devices
  - AGP uses a dedicated system memory called AGP memory or non-local video memory
    → The GPU can look up textures that reside in AGP memory
  - Bandwidth: AGP = 2 x PCI (AGP2x = 2 x AGP, etc.)
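The bandwidth rule above is simple doubling arithmetic. As a worked example, the sketch below starts from 133 MB/s for PCI, the figure quoted in the tutorial's performance chart, and applies "AGP = 2 x PCI" and "AGPnx = n x AGP". The function name is an illustrative assumption.

```python
def bus_bandwidths(pci_mb_per_s=133.0):
    """Return peak bandwidth in MB/s for PCI and the AGP generations."""
    agp = 2.0 * pci_mb_per_s              # AGP = 2 x PCI
    table = {"PCI": pci_mb_per_s, "AGP": agp}
    for multiplier in (2, 4, 8):          # AGP2x = 2 x AGP, and so on
        table["AGP%dx" % multiplier] = agp * multiplier
    return table
```

With the default starting value this yields 266 MB/s for AGP and 2128 MB/s for AGP8x, in line with the rounded 2.1 GB/s that the chart gives for AGP8x.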

Multitexturing

Base texture x light map = base texture modulated by light map
[Figure from UT2004 (c) Epic Games Inc. Used with permission]

1999-2000: Transform and Lighting

“Fixed Function Pipeline”
CPU: Application Stage
GPU: Geometry Stage (Transform and Lighting Unit), Rasterization Stage (Rasterizer, Texture Unit, Register Combiner, Raster Operations Unit)
System Memory: 3D triangles, textures
Bus: AGP
Video Memory: 3D triangles, textures, frame buffer

Register Combiner: Offers many more texture/color combinations
NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500, S3’s Savage3D
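Modulating a base texture by a light map is a per-texel, per-channel multiplication. The sketch below shows that operation on the CPU, assuming both textures have the same resolution and store colors as (r, g, b) tuples in [0, 1]; the function names are hypothetical.

```python
def modulate(base_color, light_map_color):
    """Multiply two colors channel by channel."""
    return tuple(b * l for b, l in zip(base_color, light_map_color))


def multitexture(base_texture, light_map):
    """Modulate every base texel by the matching light map texel."""
    return [[modulate(b, l) for b, l in zip(base_row, light_row)]
            for base_row, light_row in zip(base_texture, light_map)]
```

A light map texel of (1, 1, 1) leaves the base color unchanged, and darker light map texels dim it.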

Transform and Lighting Unit (TnL)

Transform:
  Model or Object Space
    → World Matrix → World Space
    → View Matrix → Camera or Eye Space
    → Projection Matrix → Projection or Clip Space
    → Perspective Division and Viewport Matrix → Screen or Window Space
  The World Matrix and the View Matrix combine into the Model-View Matrix.

Lighting:
  Inputs: material properties, light properties, vertex color
  Outputs: vertex diffuse and specular color

Bump Mapping

Bump mapping is about fetching the normal from a texture (called a normal map) instead of using the interpolated normal to compute lighting at a given pixel.

[Figure: normal map; diffuse light without bumps; diffuse light with bumps]
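The chain of spaces in the transform stage can be sketched with plain 4x4 matrices. The Python below is a simplified illustration under assumptions: column vectors multiplied on the right of each matrix, a hypothetical minimal projection matrix that only copies -z into w, and a viewport mapping from [-1, 1] to window pixels with no vertical flip. All function names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """Translation matrix; translation(0, 0, 0) is the identity."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def simple_projection():
    """Hypothetical minimal perspective matrix: copies -z into w."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, -1.0, 0.0]]


def transform_vertex(position, world, view, projection, width, height):
    """Take a model-space position to window coordinates."""
    model_view = mat_mul(view, world)       # model space -> eye space
    eye = mat_vec(model_view, position)
    clip = mat_vec(projection, eye)         # eye space -> clip space
    x, y, _, w = clip
    ndc_x, ndc_y = x / w, y / w             # perspective division
    # Viewport mapping from [-1, 1] to [0, width] x [0, height].
    return ((ndc_x * 0.5 + 0.5) * width, (ndc_y * 0.5 + 0.5) * height)
```

For instance, the point (1, 1, -2, 1) with identity world and view matrices lands at (480, 360) in a 640 x 480 window.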

Cube Texture Mapping

  - Cubemap: texture covering the six faces of a cube
  - Cubemap lookup (with direction (x, y, z))
  - Environment mapping: the reflection vector is used to look up the cubemap

Projective Texture Mapping

  - Texture projection: the projected texture is cast onto the scene
  - Projective texture lookup: (x, y, z, w) is looked up at (x/w, y/w)
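Both lookups reduce to a little arithmetic on the lookup vector. The sketch below picks the cube face hit by a direction (the axis with the largest magnitude), computes a reflection vector for environment mapping, and performs the projective division. The face labels and function names are assumptions for illustration; the per-face coordinate conventions, which differ between 3D APIs, are deliberately left out.

```python
def cubemap_face(x, y, z):
    """Return the cube face hit by direction (x, y, z)."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"


def reflect(incident, normal):
    """Reflect an incident vector about a unit normal: I - 2 (N . I) N."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))


def projective_coords(x, y, z, w):
    """Projective texture lookup position: divide x and y by w."""
    return (x / w, y / w)
```

A view ray going straight down onto an upward-facing surface reflects straight up, so the environment lookup reads the "+y" face.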

2001: Programmable Vertex Shader

CPU: Application Stage
GPU: Geometry Stage (Vertex Shader, no flow control), Rasterization Stage (Rasterizer with Z-Cull, Texture Shader, Register Combiner, Raster Operations Unit)
System Memory: 3D triangles, textures
Bus: AGP
Video Memory: 3D triangles, textures, frame buffer

Z-Cull: Predicts which fragments will fail the Z test and discards them
Texture Shader: Offers more texture addressing and operations
NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500

Vertex Shader

A programmable processor for any per-vertex computation

    void VertexShader(
        // Input per vertex
        in float4 positionInModelSpace,
        in float2 textureCoordinates,
        in float3 normal,

        // Input per batch of triangles
        uniform float4x4 modelToProjection,
        uniform float3 lightDirection,

        // Output per vertex
        out float4 positionInProjectionSpace,
        out float2 textureCoordinatesOutput,
        out float3 color
    )
    {
        // Vertex transformation
        positionInProjectionSpace = mul(modelToProjection, positionInModelSpace);

        // Texture coordinates copy
        textureCoordinatesOutput = textureCoordinates;

        // Vertex color computation
        color = dot(lightDirection, normal);
    }

Volume Texture Mapping

  - Volume texture lookup (with position (x, y, z))
  - Example: noise perturbation with a volume texture (3D noise)

Hardware Shadow Mapping

Shadow map computation:
  The shadow map contains the depth z/w of the 3D points visible from the light’s point of view. A point (x, y, z, w) seen from the spot light is stored at (x/w, y/w).

Shadow rendering:
  A 3D point (x, y, z, w) is in shadow if:
    z/w > value of shadow map at (x/w, y/w)
  that is, if a surface nearer to the spot light covers it.
  A hardware shadow map lookup performs this depth comparison and returns a value between 0 and 1.
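The shadow test can be sketched as a depth comparison against a small 2D array. The Python below is an illustration under assumptions: depth grows with distance from the light, a point counts as lit when its depth z/w does not exceed the stored value, the result is 1.0 for lit and 0.0 for shadowed, and the filtered variant is a plain unweighted average over a 2x2 neighborhood rather than what any particular hardware implements. The names and the `bias` parameter are hypothetical.

```python
def _texel_index(shadow_map, x, y, w):
    """Map the projected position (x/w, y/w), assumed in [0, 1), to a texel."""
    height, width = len(shadow_map), len(shadow_map[0])
    column = min(int(x / w * width), width - 1)
    row = min(int(y / w * height), height - 1)
    return row, column


def shadow_lookup(shadow_map, x, y, z, w, bias=0.0):
    """Return 1.0 if the point is lit and 0.0 if it is in shadow."""
    row, column = _texel_index(shadow_map, x, y, w)
    return 1.0 if z / w - bias <= shadow_map[row][column] else 0.0


def shadow_lookup_filtered(shadow_map, x, y, z, w, bias=0.0):
    """Average the comparison over a 2x2 neighborhood: a value in [0, 1]."""
    height, width = len(shadow_map), len(shadow_map[0])
    row, column = _texel_index(shadow_map, x, y, w)
    depth = z / w - bias
    results = []
    for d_row in (0, 1):
        for d_col in (0, 1):
            r = min(row + d_row, height - 1)
            c = min(column + d_col, width - 1)
            results.append(1.0 if depth <= shadow_map[r][c] else 0.0)
    return sum(results) / len(results)
```

Averaging several comparisons is what produces intermediate values between 0 and 1 along shadow edges.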

Antialiasing: Definition

  - Aliasing: Undesirable visual artifacts due to insufficient sampling of:
      - Primitives (triangles, lines, etc.) → jagged edges
      - Textures or shaders → pixelation, moiré patterns
    Those artifacts are even more noticeable on animated images.
  - Antialiasing: Method to reduce aliasing
      - Texture antialiasing is largely handled by proper mipmapping and anisotropic filtering
      - Shader antialiasing can be tricky (especially with conditionals)

Antialiasing: Supersampling and Multisampling

  - Supersampling: Compute color and Z at higher resolution and display averaged color to smooth out the visual artifacts
  - Multisampling: Same thing except only Z is computed at higher resolution. Multisampling performs antialiasing on primitive edges only

[Figure labels: pixel center, sample]
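Supersampling as described above amounts to rendering at a higher resolution and averaging blocks of samples. The sketch below assumes a grayscale image and a hypothetical `shade(u, v)` callback that returns the value of the scene at a point in the unit square; both the callback and the function names are illustrative.

```python
def downsample(image, factor):
    """Average each factor x factor block of samples into one pixel."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def render_supersampled(shade, width, height, factor):
    """Shade at factor times the resolution, then average down."""
    hi_w, hi_h = width * factor, height * factor
    samples = [[shade((x + 0.5) / hi_w, (y + 0.5) / hi_h)
                for x in range(hi_w)] for y in range(hi_h)]
    return downsample(samples, factor)
```

A pixel crossed by an edge ends up with an intermediate value that reflects how many of its samples fall on each side, which is what softens jagged edges.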

2002-2003: Programmable Pixel Shader

CPU: Application Stage
GPU: Geometry Stage (Vertex Shader, static and dynamic flow control), Rasterization Stage (Rasterizer with Z-Cull, Pixel Shader with static flow control only, Texture Unit, Raster Operations Unit)
System Memory: 3D triangles, textures
Bus: AGP
Video Memory: 3D triangles, textures, frame buffer

MRT: Multiple Render Target
NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800

Pixel Shader

A programmable processor for any per-pixel computation

    void PixelShader(
        // Input per pixel
        in float2 textureCoordinates,
        in float3 normal,

        // Input per batch of triangles
        uniform sampler2D baseTexture,
        uniform float3 lightDirection,

        // Output per pixel
        out float3 color
    )
    {
        // Texture lookup (tex2D returns a float4; keep the RGB channels)
        float3 baseColor = tex2D(baseTexture, textureCoordinates).rgb;

        // Light computation
        float light = dot(lightDirection, normal);

        // Pixel color computation
        color = baseColor * light;
    }

Shader: Static vs. Dynamic Flow Control

    void Shader(
        ...
        // Input per vertex or per pixel
        in float3 normal,

        // Input per batch of triangles
        uniform float3 lightDirection,
        uniform bool computeLight,
        ...
    )
    {
        ...
        // Static flow control (condition varies per batch of triangles)
        if (computeLight) {
            ...
            // Dynamic flow control (condition varies per vertex or pixel)
            if (dot(lightDirection, normal) > 0) {
                ...
            }
            ...
        }
        ...
    }

2004: Shader Model 3.0 and 64-Bit Color Support

CPU: Application Stage
GPU: Geometry Stage (Vertex Shader, static and dynamic flow control), Rasterization Stage (Rasterizer with Z-Cull, Pixel Shader with static and dynamic flow control, Texture Unit, Raster Operations Unit, 64-bit color)
System Memory: 3D triangles, textures
Bus: PCIe
Video Memory: 3D triangles, textures, frame buffer

PCIe: Peripheral Component Interconnect Express
NVIDIA’s GeForce 6800

PCIe

Like AGP:
  - Uses a serial connection → Cheap, scalable
  - Uses a point-to-point protocol → No shared bandwidth
Unlike AGP:
  - General-purpose (not only for graphics)
  - Dual-channels: Bandwidth is available in both directions
Bandwidth: PCIe = 2 x AGP8x

Shader Model 3.0

Shader Model 3.0 means:
  - Longer shaders → More complex shading
  - Pixel shader:
      - Dynamic flow control → Better performance
      - Derivative instructions → Shader antialiasing
      - Support for 32-bit floating-point precision → Fewer artifacts
      - Face register → Faster two-sided lighting
  - Vertex shader:
      - Texture access → Simulation on GPU,

64-Bit Color Support

  - 64-bit color means one 16-bit floating-point value per channel (R, G, B, A)
  - Alpha blending works with 64-bit color buffer (as opposed to 32-bit fixed-point color buffer only)
  - Texture filtering works with 64-bit textures (as opposed to 32-bit fixed-point textures only)
  - Applications:
      - High-precision image compositing
      - High dynamic range imagery

High Dynamic Range Imagery

  - The dynamic range of a scene is the ratio of the highest to the lowest luminance
  - Real-life scenes can have high dynamic ranges of several millions
  - Display and print devices have a low dynamic range of around 100
  - Tone mapping is the process of displaying high dynamic range images on those low dynamic range devices
  - High dynamic range images use floating-point colors
  - OpenEXR is a high dynamic range image format that is compatible with NVIDIA’s 64-bit color format
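The quantities above can be made concrete with a short sketch. It packs one RGBA pixel as four 16-bit floats (8 bytes, hence 64 bits), computes a dynamic range as a luminance ratio, and applies a tone mapping curve. The tutorial does not name a tone mapping operator, so the Reinhard-style curve L / (1 + L) used here is a swapped-in assumption, as are the function names and the `exposure` parameter.

```python
import struct


def pack_rgba16f(r, g, b, a):
    """Pack four 16-bit floats: 4 x 2 bytes = 64 bits per pixel."""
    return struct.pack("<4e", r, g, b, a)


def dynamic_range(luminances):
    """Ratio of the highest to the lowest non-zero luminance."""
    positive = [value for value in luminances if value > 0.0]
    return max(positive) / min(positive)


def tone_map(luminance, exposure=1.0):
    """Reinhard-style curve (an assumption): maps [0, inf) into [0, 1)."""
    scaled = luminance * exposure
    return scaled / (1.0 + scaled)
```

Raising `exposure` brightens the mapped image while the curve still keeps every output below 1, so very bright values compress instead of clipping.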

Real-Time Tone Mapping

The image is entirely computed in 64-bit color and tone-mapped for display

[Figure: from low to high exposure image of the same scene]

PC Graphics Software Architecture

CPU: Application (with its vertex program and pixel program), 3D API (OpenGL or DirectX), Driver
GPU: Vertex Shader, Pixel Shader
Sent over the bus from system memory to video memory: commands, programs, geometry (triangles, vertices, normals, etc.), textures

  - The application, 3D API and driver are written in C or C++
  - The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)
  - Pushbuffer: Contains the commands to be executed on the GPU

Evolution of Performance

[Chart, 1995 to 2004, axis ticks 10, 100, 1000, 10 000: Mpixels/s, Mvertices/s, Mtransistors]

Bus bandwidth:  PCI (133 MB/s), AGP (266 MB/s), AGP2x (533 MB/s), AGP4x (1.06 GB/s), AGP8x (2.1 GB/s), PCIe (4 GB/s)
Video memory:   4 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB
DirectX:        DirectX 6, DirectX 7, DirectX 8, DirectX 9
OpenGL:         OpenGL 1.1, OpenGL 1.2, OpenGL 1.3, OpenGL 1.4, OpenGL 1.5

The Future

  - Unified general programming model at primitive, vertex and pixel levels
  - Scary amounts of:
      - Floating point horsepower
      - Video memory
      - Bandwidth between system and video memory
  - Lower chip costs and power requirements to make 3D graphics hardware ubiquitous:
      - Automotive (gaming, navigation, heads-up displays)
      - Home (remotes, media center, automation)
      - Mobile (PDAs, cell phones)


More Information

www.realtimerendering.com developer.nvidia.com

Questions: [email protected]
