Evolution of GPUs

Chris Seitz

PDF created with pdfFactory Pro trial version www.pdffactory.com Overview

Concepts: Real-time rendering Hardware Evolution of the PC hardware graphics pipeline: 1995-1998: and z-buffer 1998: Multitexturing 1999-2000: Transform and lighting 2001: Programmable vertex 2002-2003: Programmable pixel shader 2004: Shader model 3.0 and 64-bit color support

©2004 Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Real-Time Rendering

Graphics hardware enables real-time rendering Real-time means display rate at more than 10 images per second

3D Scene = Image = Collection of Array of pixels 3D primitives (triangles, lines, points)

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com PC Architecture

Motherboard

CPU

System Memory

Bus Port (PCI, AGP, PCIe)

Graphics Board

Video Memory

GPU

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Hardware Graphics Pipeline

Application 3D Geometry 2D Rasterization Pixels Stage Triangles Stage Triangles Stage

For each For each triangle vertex: triangle:

Transform Rasterize position: triangle 3D -> screen Interpolate Compute vertex attributes attributes Shade pixels

Resolve visibility

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Real-time Graphics 1997: RIVA 128 –3M Transistors

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Real-time Graphics 2004:

©2004 NVIDIA Corporation. All rights reserved. UnRealEngine 3.0 Images Courtesy of Epic Games

PDF created with pdfFactory Pro trial version www.pdffactory.com ‘95-‘98: Texture Mapping & Z-Buffer

CPU GPU

Application / Rasterization Stage Geometry Stage Raster Texture Rasterizer Operations Unit Unit

2D Triangles (PCI) Frame 2D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Raster Operations Unit (ROP)

Raster Texture Rasterizer Fragments Operations Unit Unit

Scissor Test Fragment tested against scissor rectangle Screen Position Alpha Test (x, y) Frame Buffer tested against reference value Alpha Value Stencil Test a stencil buffer value at (x, y) tested against reference value Z-Buffer Depth Z Test z Color Buffer tested against z-buffer value at (x, y) Alpha Blending Color (Visibility test) Pixels (r, g, b)

blended with color buffer value at (x, y)

Ksrc * Colorsrc + Kdst * Colorsrc (src = fragment, dst = color buffer)

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Texture Mapping

Triangle Mesh Base Texture (with UV coordinates) + =

Sampling Magnification, Minification Filtering Bilinear, Trilinear, Anisotropic Mipmapping Perspective Correct Interpolation

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Texture Mapping: Mipmapping& Filtering

or

Bilinear Filtering TrilinearFiltering

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Filtering Examples

Point Bilinear Trilinear Anisotropic (4X) & Trilinear

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Incoming

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Riva TNT –7M Transistors -1998

Multitexture

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com 1998: Multitexturing

CPU GPU

Application / Rasterization Stage Geometry Stage Multi Raster Rasterizer Texture Operations Unit Unit

2 2D Triangles Bus (AGP) Frame 2D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Multitexturing

Base Texture modulated by Light Map

X

from UT2004 (c) Epic Games Inc. = Used with permission

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce 256 –23M Transistors -1999

Hardware Transform & Lighting

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com 1999-2000: Transform and Lighting

CPU GPU “Fixed Function Pipeline”

Application Geometry Stage Rasterization Stage Stage Transform Raster Register & Lighting Rasterizer Operations Texture Combiner Unit Unit Unit

3D Bus Triangles (AGP) Frame 3D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Transform and Lighting Unit (TnL)

Transform and Lighting Unit

Transform Lighting World Matrix Model or Object Space Material Properties World Space Light Properties Model-View Matrix View Matrix Vertex Color

Camera or Eye Space

Projection Matrix

Projection or Clip Space

Perspective Division & Viewport Matrix Vertex Diffuse & Screen or Window Space Specular Color

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Wanda and Bubble

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce 2

Pixel Multitexture

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce 3

Programmable Vertex Shading

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com 2001: Programmable Vertex Shader

CPU GPU

Application Geometry Stage Rasterization Stage Stage Vertex Raster Rasterizer Register Shader Texture Operations (no flow w/ Z-Cull Combiner control) Unit Unit (4 Textures)

3D Bus Triangles (AGP) Frame 3D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Vertex Shader

A programmable processor for any per-vertex computation

void VertexShader( // Input per vertex in float4 positionInModelSpace, in float2 textureCoordinates, in float3 normal,

// Input per batch of triangles uniform float4x4 modelToProjection, uniform float3 lightDirection,

// Output per vertex out float4 positionInProjectionSpace, out float2 textureCoordinatesOutput, out float3 color ) { // Vertex transformation positionInProjectionSpace= mul(modelToProjection, positionInModelSpace);

// Texture coordinates copy textureCoordinatesOutput= textureCoordinates;

// Vertex color computation color = dot(lightDirection, normal); } ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Bump Mapping

Bump mapping involves fetching the per-pixel normal from a normal map texture (instead of using the interpolated vertex normal) in order to compute lighting at a given pixel

+ =

Diffuse light Normal Map Diffuse light with bumps

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Cubic Texture Mapping

Cubemap

z Environment Mapping Reflection vector (x, y,z) y is used to lookup x into the cubemap

Cubemaplookup in direction (x, y, z)

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Projective Texture Mapping

Projected Texture

(x, y, z, w) Texture Projection (x/w, y/w)

Projective Texture lookup

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Volume Texture Mapping

z (x,y,z) y Volume Texture x Solid Textures 3D Noise Noise Perturbation

Volume Texture lookup with position (x, y, z)

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Hardware

Shadow Map Computation z/w The shadow map contains the depth

(z/w) of the 3D points visible from (x, y, z, w) Spot the light’s point of view: (x/w, y/w) light

Shadow Rendering shadow map A 3D point (x, y, z, w) is in shadow if: value z/w< value of shadow map at (x/w, y/w) (x, y, z, w) Spot (x/w, y/w) A hardware shadow map lookup light returns the value of this comparison between 0 and 1

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Antialiasing: Examples

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Antialiasing: Supersampling & Multisampling

Supersampling: Compute color and Z at higher resolution and display averaged color to smooth out the visual artifacts Multisampling: Same thing except only Z is computed at higher resolution Multisampling performs antialiasing on primitive edges only

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com X-Isle: Dinosaur Isle

©2004 NVIDIA Corporation. All rights reserved. Image courtesy of Crytek

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce 4

Multiple Vertex and Pixel

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Wolfman

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce FX ->100M Transistors

Programmable Pixel Shading (DirectX 9.0)

Scalable Architecture

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com ‘02-‘03: Programmable Pixel Shader

CPU GPU

Application Geometry Stage Rasterization Stage Stage Vertex Pixel Raster Shader Rasterizer Shader (static & Texture Operations w/ Z-Cull (static flow dynamic Unit control) Unit flow control) (16 textures)

3D Bus Triangles (AGP) Frame 3D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com PC Graphics Software Architecture

CPU GPU

Application BUS Vertex Commands Pixel Frame Shader Buffer 3D API Shader Programs (OpenGL or DirectX) Geometry

Textures Vertex Pixel Driver Program Program

The application, 3D API and driver are written in C or C++ The vertex and pixel programs are written in a high-level shading language (DirectX HLSL, OpenGL Shading Language, Cg) Pushbuffer: Contains the commands to be executed on the GPU

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Pixel Shader

A programmable processor for any per-pixel computation void PixelShader( // Input per pixel in float2 textureCoordinates, in float3 normal,

// Input per batch of triangles uniform sampler2D baseTexture, uniform float3 lightDirection,

// Output per pixel out float3 color ) { // Texture lookup float3 baseColor= tex2D(baseTexture, textureCoordinates);

// Light computation float light = dot(lightDirection, normal);

// Pixel color computation color = baseColor* light; ©2004 NVIDIA Corporation. All rig}hts reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Shader: Static vs. Dynamic Flow Control

void Shader( ... // Input per vertex or per pixel in float3 normal,

// Input per batch of triangles uniform float3 lightDirection, uniform bool computeLight, Static Flow Control ... (condition varies ) per batch of triangles) { ... if (computeLight) { ... if (dot(lightDirection, normal)) { Dynamic Flow Control ... (condition varies } per vertex or pixel) ... } ... ©2004 NVIDIA Corporation. All rights reserved. }

PDF created with pdfFactory Pro trial version www.pdffactory.com ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com GeForce 6800 –220M Transistors

Shader Model 3.0

FP64 & High Dynamic Range

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com 2004: Shader Model 3.0 & 64-Bit Color Support

CPU GPU

Application Geometry Stage Rasterization Stage Stage Vertex Rasterizer Pixel Shader Raster w/ Z-Cull Shader (static & Texture (static & Operations dynamic dynamic flow Unit flow control) Unit control)

64-Bit Color

3D Bus Triangles (PCIe) Frame 3D Triangles Textures Buffer Textures

System Memory Video Memory

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Shader Model 3.0

Longer shaders → More complex shading Pixel shader: Dynamic flow control → Better performance Derivative instructions → Shader antialiasing Support for 32-bit floating-point precision → Fewer artifacts Face register → Faster two-sided lighting Vertex shader: Lord of the Rings™ The Battle for Middle-earth™ Texture access → Simulation on GPU, displacement mapping Geometry Instancing → Better performance

Far Cry ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com The GeForce 6 Series Amazing Real-time Effects

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Shader Model 3.0 / 64-bit Floating Point Processing 3.0 Running On GeForce 6 Series

2 Million Triangle Detail Mesh Models High Dynamic Range Rendering Fully Customizable Shaders

©2004 NVIDIA Corporation. All rights reserved. UnRealEngine 3.0 Images Courtesy of Epic Games

PDF created with pdfFactory Pro trial version www.pdffactory.com Shader Model 3.0 / 64-bit Floating Point Processing Unreal Engine 3.0 Running On GeForce 6 Series

100 Million Triangle Source Content Scene High Dynamic Range Rendering

©2004 NVIDIA Corporation. All rights reserved. UnRealEngine 3.0 Images Courtesy of Epic Games

PDF created with pdfFactory Pro trial version www.pdffactory.com High Dynamic Range Imagery

The dynamic range of a scene is the ratio of the highest to the lowest luminance

Real-life scenes can have high dynamic ranges of several millions

Display and print devices have a low dynamic range of around 100

Tone mapping is the process of displaying high dynamic range images on those low dynamic range devices

High dynamic range images use floating-point colors

OpenEXR is a high dynamic range image format that is compatible with NVIDIA’s 64-bit color format

HDR Rendering Engine - Compute surface reflectance, save in HDR buffer Contributions from multiple lights are additive (blended) Add image-space special effects & Post to HDR buffer AA, Glow, Depth of Field, Motion Blur

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Real-Time Tone Mapping

The image is entirely computed in 64-bit color and tone-mapped for display

Renderings of the same scene, from low to high exposure

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Evolution of Performance

10000 CPU Frequency (GHz)

Bus Bandwidth (GB/sec) e Rat l Fill 1000 Pixe Pixel Fill Rate (MPixels/sec)

Vertex Rate (MVerts/sec)

100 ate Graphics Bandwidth x R rte (GB/sec) Ve idth ndw s Ba 10 phic Gra th dwid Ban Bus ency 1 Frequ CPU

0.1 1994 2004

4 MB 32 MB 64 MB 128 MB 256 MB 512 MB

DirectX 6 DirectX 7DirectX 8 DirectX 9

OpenGL 1.1 OpenGL 1.4 OpenGL 1.5

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Looking Ahead: Now + 10 years

CPU Frequency (GHz) 1000000 Bus Bandwidth (GB/sec) 127 Gvertx Pixel Fill Rate (MPixels/sec) 100000 Vertex Rate (MVerts/sec) Graphics (GFlops/sec) 10000 Graphics Bandwidth (GB/sec)

te s 1000 a lop ill R f l F ics ixe ph P te ra Ra G 100 x rte Ve

idth 100 GHz 10 ndw Ba US ncy B que .5 Mverts Fre PU 1 C

0.1 1994 2004 2014 100 MHz ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com NVIDIA SLI Multi-GPU

©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Progression of Graphics

Virtual Fighter Wanda Wolfman NV1 NV1x NV2x 1Mtrans 22Mtrans 63Mtrans

Dawn Nalu NV3x NV4x 130Mtrans 222M ©2004 NVIDIA Corporation. All rights reserved.

PDF created with pdfFactory Pro trial version www.pdffactory.com Thank You

PDF created with pdfFactory Pro trial version www.pdffactory.com