Evolution of the Programmable Graphics Pipeline
Patrick Cozzi
University of Pennsylvania
CIS 565 - Spring 2011

Administrivia
• Tip: google “cis 565”
• Slides posted before each class
• Tentative assignment dates on website
• 1st assignment handed out today
  - Write concisely
  - Due start of class, one week from today
• Google group in progress
• FYI: GDC Early Registration - 01/24

Survey Results
• 15/23 – graphics experience
• Most students have usable video cards
• Lerk – don’t be scared
• I want to be a Toys R Us kid too
• Class interests
  - Pure architecture
  - Game rendering
  - Physical simulations
  - Animation
  - Vision algorithms
  - Image/video processing
  - …

Course Roadmap
• Graphics Pipeline (GLSL)
• GPGPU (GLSL)
  - Briefly
• GPU Computing (CUDA, OpenCL)
• Choose your own adventure
  - Student Presentation
  - Final Project

Agenda
• Why program the GPU?
• Graphics Review
• Evolution of the Programmable Graphics Pipeline
  - Understand the past
• Goal: Prepare you for your presentation and project

Why Program the GPU?

Graphs from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf

Why Program the GPU?
• Compute (peak)
  - Intel Core i7 – 4 cores – 100 GFLOPS
  - NVIDIA GTX280 – 240 cores – 1 TFLOPS
• Memory Bandwidth
  - System Memory – 60 GB/s
  - NVIDIA GT200 – 150 GB/s
• Install Base
  - Over 200 million NVIDIA G80s shipped
Numbers from Programming Massively Parallel Processors.

NVIDIA GPU Evolution
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
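A quick back-of-the-envelope check on the Why Program the GPU? figures. The inputs are only the slide's rounded peak values, not measurements, and "per core" is a plain division that ignores clock speed, SIMD width, and achievable versus peak throughput:

```python
# Rounded peak figures from the slide (not measurements).
cpu_gflops, cpu_cores = 100.0, 4      # Intel Core i7
gpu_gflops, gpu_cores = 1000.0, 240   # NVIDIA GTX280 (1 TFLOPS)
sys_bw, gpu_bw = 60.0, 150.0          # GB/s: system memory vs NVIDIA GT200

compute_ratio = gpu_gflops / cpu_gflops    # times more peak compute on the GPU
bandwidth_ratio = gpu_bw / sys_bw          # times more peak memory bandwidth
cpu_per_core = cpu_gflops / cpu_cores      # GFLOPS per CPU core
gpu_per_core = gpu_gflops / gpu_cores      # GFLOPS per GPU core

print(f"compute: {compute_ratio:.1f}x, bandwidth: {bandwidth_ratio:.1f}x")
print(f"per core: CPU {cpu_per_core:.1f} vs GPU {gpu_per_core:.2f} GFLOPS")
```

The GPU comes out about 10x ahead in peak compute and 2.5x in bandwidth, even though each of its cores is individually much weaker: the advantage comes from having many more of them.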

Graphics Review
• Modeling
• Rendering
• Animation

Graphics Review: Modeling
• Modeling
  - Polygons vs Triangles
    - How do you store a triangle mesh?
  - Implicit Surfaces
  - Height maps
  - …
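One common answer to "How do you store a triangle mesh?" is an indexed mesh: each vertex position is stored once, and triangles refer to vertices by index, three indices per triangle (the same layout `glDrawElements` consumes with `GL_TRIANGLES`). A minimal sketch, assuming positions only (no normals or texture coordinates) and counter-clockwise winding; the helper names are hypothetical:

```python
# Shared vertex positions: each corner of the unit square is stored once.
positions = [
    (0.0, 0.0, 0.0),  # 0: bottom-left
    (1.0, 0.0, 0.0),  # 1: bottom-right
    (1.0, 1.0, 0.0),  # 2: top-right
    (0.0, 1.0, 0.0),  # 3: top-left
]
# Three indices per triangle, counter-clockwise when viewed from +z.
indices = [0, 1, 2, 0, 2, 3]


def triangles(positions, indices):
    """Yield each triangle as a tuple of three vertex positions."""
    for i in range(0, len(indices), 3):
        yield tuple(positions[j] for j in indices[i:i + 3])


def to_triangle_soup(positions, indices):
    """Expand to an unindexed 'triangle soup' that duplicates shared vertices."""
    return [p for tri in triangles(positions, indices) for p in tri]


soup = to_triangle_soup(positions, indices)
print(len(positions), "shared vertices vs", len(soup), "in the soup")
```

Here the savings are small (4 vs 6 vertices), but in a large closed mesh a typical vertex is shared by roughly six triangles, so indexing avoids most of the duplication.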

Triangles

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.


Implicit Surfaces
Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html

Height Maps
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
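A height map stores one elevation per grid sample, so it is compact but can only describe surfaces with a single height per (x, z) location (no overhangs or caves). Turning one into triangles for rasterization is a short loop. A minimal sketch, assuming a regular grid, y as the up axis, and an indexed output like the triangle-mesh layout above; `height_map_to_mesh` is a hypothetical helper:

```python
def height_map_to_mesh(heights, spacing=1.0):
    """Turn a 2D grid of heights into an indexed triangle mesh.

    heights[row][col] is the elevation sampled at grid point (col, row);
    each grid cell becomes two triangles wound so their normals point up (+y).
    """
    rows, cols = len(heights), len(heights[0])
    # One vertex per sample: x and z come from the grid, y from the height.
    positions = [(c * spacing, heights[r][c], r * spacing)
                 for r in range(rows) for c in range(cols)]
    indices = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i0 = r * cols + c   # corner (c, r)
            i1 = i0 + 1         # corner (c + 1, r)
            i2 = i0 + cols      # corner (c, r + 1)
            i3 = i2 + 1         # corner (c + 1, r + 1)
            indices += [i0, i2, i1, i1, i2, i3]
    return positions, indices


bump = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
positions, indices = height_map_to_mesh(bump)
print(len(positions), "vertices,", len(indices) // 3, "triangles")  # 9 vertices, 8 triangles
```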

Graphics Review: Rendering
• Rendering
  - Goal: Assign color to pixels
  - Two parts
    - Visible surfaces: what is in front of what for a given view
    - Shading: simulate the interaction of material and light to produce a pixel color
• What about ray tracing?

Rasterization

Visible Surfaces
• Z-Buffer / Depth Buffer
• Fragment vs Pixel

Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
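The z-buffer resolves visible surfaces per sample: each rasterized fragment carries a depth, and it overwrites the stored pixel only if it is closer. This is also why a fragment is not the same thing as a pixel: several fragments can land on one pixel, and at most one survives. A minimal sketch, assuming the fragments have already been produced by a rasterizer (the input tuples are hypothetical) and that smaller depth means closer:

```python
import math


def resolve_visibility(width, height, fragments):
    """Z-buffer visibility: keep, per pixel, the color of the nearest fragment.

    fragments is an iterable of (x, y, depth, color) tuples in any order.
    """
    depth = [[math.inf] * width for _ in range(height)]   # depth buffer
    color = [[None] * width for _ in range(height)]       # color buffer
    for x, y, z, c in fragments:
        if z < depth[y][x]:        # depth test (a LESS comparison)
            depth[y][x] = z        # depth write
            color[y][x] = c        # color write
    return color, depth


# Two triangles overlap at pixel (1, 0): two fragments, one pixel.
frags = [
    (0, 0, 0.5, "red"), (1, 0, 0.5, "red"),     # red triangle
    (1, 0, 0.2, "blue"), (2, 0, 0.2, "blue"),   # blue triangle, closer
]
color, _ = resolve_visibility(3, 1, frags)
print(color[0])  # ['red', 'blue', 'blue']
```

Because each pixel keeps only its nearest fragment, the result does not depend on the order in which triangles are drawn (for opaque surfaces).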

Shading

Images courtesy of A K Peters, Ltd. www.virtualglobebook.com
Image from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch14.html
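Shading "simulates the interaction of material and light." One of the simplest models is Lambertian diffuse reflection, the term a basic GLSL fragment shader computes with `max(dot(n, l), 0.0)`. A minimal sketch in Python for readability, assuming a single directional light and no ambient, specular, or shadowing terms; the function names are hypothetical:

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lambert(normal, to_light, albedo, light_color):
    """Diffuse (Lambertian) shading for one surface point.

    Brightness follows the cosine of the angle between the surface normal
    and the direction to the light; surfaces facing away receive zero.
    """
    n_dot_l = max(dot(normalize(normal), normalize(to_light)), 0.0)
    return tuple(a * l * n_dot_l for a, l in zip(albedo, light_color))


up = (0.0, 0.0, 1.0)
white = (1.0, 1.0, 1.0)
print(lambert(up, (0.0, 0.0, 1.0), (0.8, 0.2, 0.2), white))  # light head-on: full albedo
print(lambert(up, (1.0, 0.0, 0.0), (0.8, 0.2, 0.2), white))  # grazing light: black
```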

Graphics Pipeline

[Diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer]

• Raster Operations
  - Scissor Test
  - Stencil Test
  - Depth Test
  - Blending
Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/
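The raster operations on the Graphics Pipeline slide (scissor, stencil, depth, blending) run per fragment, in that order. A minimal sketch, assuming a software frame buffer and one fixed configuration of each stage (real APIs make the comparison functions, stencil updates, and blend factors configurable); `Fragment`, `FrameBuffer`, and `raster_ops` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float   # window-space depth in [0, 1]; smaller is closer
    rgba: tuple    # (r, g, b, a), each in [0, 1]


class FrameBuffer:
    def __init__(self, width, height):
        self.color = [[(0.0, 0.0, 0.0, 1.0)] * width for _ in range(height)]
        self.depth = [[1.0] * width for _ in range(height)]  # cleared to "far"
        self.stencil = [[0] * width for _ in range(height)]


def raster_ops(frag, fb, scissor, stencil_ref):
    """Run one fragment through scissor, stencil, depth, then blending.

    Returns True if the fragment updated the frame buffer. scissor is
    (x0, y0, x1, y1) with x1 and y1 exclusive. Illustrative settings: the
    stencil test is EQUAL and never modifies the stencil buffer, the depth
    test is LESS, and blending matches
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).
    """
    x0, y0, x1, y1 = scissor
    if not (x0 <= frag.x < x1 and y0 <= frag.y < y1):
        return False                                  # scissor test
    if fb.stencil[frag.y][frag.x] != stencil_ref:
        return False                                  # stencil test
    if frag.depth >= fb.depth[frag.y][frag.x]:
        return False                                  # depth test
    fb.depth[frag.y][frag.x] = frag.depth
    a = frag.rgba[3]
    dst = fb.color[frag.y][frag.x]
    # Blending: src * alpha + dst * (1 - alpha), applied to all four channels.
    fb.color[frag.y][frag.x] = tuple(s * a + d * (1.0 - a)
                                     for s, d in zip(frag.rgba, dst))
    return True
```

A cheap rejection early in the chain (for example, the scissor test) skips the more expensive depth read and blend for that fragment.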

Graphics Pipeline

Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Graphics Pipeline
Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Graphics Review: Animation
• Move the camera and/or agents, and re-render the scene
• In under about 16.7 ms per frame (60 fps)
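The 60 fps target translates directly into a per-frame time budget, and multiplying by the screen size shows how much per-pixel work that implies. A small worked calculation; the 1920×1080 resolution is an illustrative assumption, and real frames shade more than one fragment per pixel because of overdraw:

```python
def frame_budget_ms(fps):
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width, height, fps):
    """Pixels that must be colored each second, ignoring overdraw."""
    return width * height * fps


print(f"{frame_budget_ms(60):.2f} ms per frame at 60 fps")   # 16.67 ms per frame at 60 fps
print(f"{frame_budget_ms(30):.2f} ms per frame at 30 fps")   # 33.33 ms per frame at 30 fps
print(pixels_per_second(1920, 1080, 60))                     # 124416000
```

At roughly 124 million pixels per second, a purely serial loop would have about 8 ns per pixel (16.67 ms / 2,073,600 pixels), which is one way to see why this work is handed to parallel hardware.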

Evolution of the Programmable Graphics Pipeline
• Pre GPU
• Fixed function GPU
• Programmable GPU
• Unified Processors

Early 90s – Pre GPU
Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

Why GPUs?
• Exploit Parallelism
  - Pipeline parallel
  - Data-parallel
  - CPU and GPU executing in parallel
• Hardware: texture filtering, MAD (multiply-add), etc.

Generation I: 3dfx Voodoo (1996)
• Did not do vertex transformations: these were done on the CPU
• Did do …, z-buffering
Image from “7 years of Graphics”

[Diagram: pipeline from Vertex Transforms to Frame Buffer, split between CPU and GPU, connected over PCI]

Slide adapted from Suresh Venkatasubramanian and Joe Kider

Aside: Mario Kart 64
• High fragment load / low vertex load
Image from: http://www.gamespot.com/users/my_shoe/

Aside: Mario Kart Wii
• High fragment load / low vertex load?
Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/

Generation II: GeForce/Radeon 7500 (1998)
• Main innovation: shifting the transformation and lighting calculations to the GPU
• Allowed multi-texturing: giving bump maps, light maps, and others
• Faster AGP bus instead of PCI
Image from “7 years of Graphics”

[Diagram: the whole pipeline, Vertex Transforms through Frame Buffer, on the GPU, connected over AGP]

Slide from Suresh Venkatasubramanian and Joe Kider

Generation III: GeForce3/Radeon 8500 (2001)
• For the first time, allowed a limited amount of programmability in the vertex pipeline
• Also allowed volume texturing and multi-sampling (for antialiasing)
Image from “7 years of Graphics”

[Diagram: the same pipeline on the GPU, now with small vertex shaders, connected over AGP]

Slide from Suresh Venkatasubramanian and Joe Kider
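Generation II moved per-vertex transformation (and lighting) from the CPU to the GPU. The transform part is a 4×4 matrix multiply, a perspective divide, and a viewport mapping, applied independently to every vertex, which is exactly the data-parallel pattern GPUs exploit. A minimal sketch, assuming a combined model-view-projection matrix is given and leaving depth in normalized device coordinates (OpenGL additionally remaps it with `glDepthRange`); the function names are hypothetical:

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def transform_vertex(mvp, position, viewport_w, viewport_h):
    """Object-space position -> window (x, y) plus NDC depth."""
    x, y, z, w = mat_vec(mvp, [*position, 1.0])   # to clip space
    nx, ny, nz = x / w, y / w, z / w              # perspective divide -> NDC
    # Viewport transform: NDC [-1, 1] -> [0, width] and [0, height].
    sx = (nx + 1.0) * 0.5 * viewport_w
    sy = (ny + 1.0) * 0.5 * viewport_h
    return sx, sy, nz


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
# The same function runs independently on every vertex: data parallelism.
verts = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.5), (-1.0, -1.0, 0.0)]
print([transform_vertex(identity, v, 640, 480) for v in verts])
```

With the identity matrix, the NDC center maps to the middle of a 640×480 viewport and the corners (±1, ±1) map to the viewport corners.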

Generation IV: Radeon 9700/GeForce FX (2002)
• This generation is the first generation of fully-programmable graphics cards
• Different versions have different resource limits on fragment/vertex programs
Image from “7 years of Graphics”

[Diagram: pipeline with a programmable vertex shader and a programmable fragment processor; the fragment processor reads texture memory; connected over AGP]

Slide from Suresh Venkatasubramanian and Joe Kider

Generation IV.V: GeForce6/X800 (2004)
• Simultaneous rendering to multiple buffers
• True conditionals and loops
• PCIe bus
• Vertex texture fetch

[Diagram: the same pipeline connected over PCIe; both the vertex shader and the fragment processor read texture memory]

Slide adapted from Suresh Venkatasubramanian and Joe Kider

NVIDIA NV40 Architecture
• 6 vertex shader units
• 16 fragment shader units
• Vertex Texture Fetch
Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html

Generation V: GeForce8800/HD2900 (2006)
• Ground-up GPU redesign
• Support for Direct3D 10 / OpenGL 3
• Geometry Shaders
• Stream out / transform-feedback
• Unified shader processors
• Support for General GPU programming

[Diagram: Input Assembler → Programmable Vertex Shader → Programmable Geometry Shader → Raster Operations → Programmable Pixel (Fragment) Shader → Output Merger, connected over PCIe]

Slide adapted from Suresh Venkatasubramanian and Joe Kider

D3D 10 Pipeline
Image from David Blythe: http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

Geometry Shaders: Point Sprites

Geometry Shaders
Image from David Blythe: http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf
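A point-sprite geometry shader takes a single point and emits a small textured quad, typically four vertices as a triangle strip, so the application submits one vertex per sprite instead of four. In GLSL 1.50 the shader declares `layout(points) in;` and `layout(triangle_strip, max_vertices = 4) out;` and calls `EmitVertex()` once per corner. The same expansion, sketched in Python for readability, assuming the point is already in normalized device coordinates and ignoring aspect ratio; the helper names are hypothetical:

```python
def expand_point_sprite(center, size):
    """Expand one point into a screen-aligned quad, as a geometry shader would.

    center is (x, y) in normalized device coordinates; size is the quad's
    width and height in the same units. Returns four (position, texcoord)
    pairs in triangle-strip order, which describes two triangles.
    """
    cx, cy = center
    h = size / 2.0
    corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]   # strip order
    return [((cx + dx * h, cy + dy * h), ((dx + 1) / 2, (dy + 1) / 2))
            for dx, dy in corners]


def strip_to_triangles(strip):
    """Split a triangle strip into triangles, swapping every other pair
    so all triangles keep the same winding."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris


quad = expand_point_sprite((0.0, 0.0), 0.5)
print([pos for pos, _ in quad])
print(len(strip_to_triangles(quad)), "triangles")  # 2 triangles
```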

NVIDIA G80 Architecture

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Why Unify Shader Processors?

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
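The load-balancing argument for unified shader processors can be illustrated with a toy model. With separate vertex and pixel units, whichever side is overloaded sets the pace while the other idles (compare the Mario Kart "high fragment load / low vertex load" aside). A unified pool shares the work. This is a deliberately simplified sketch, assuming work divides perfectly across units and ignoring scheduling, memory, and pipeline dependencies; the workload numbers are made up for illustration, and the unit counts are borrowed from the NV40 slide:

```python
def frame_time(vertex_work, pixel_work, vertex_units, pixel_units, unified):
    """Toy model of one frame's duration, in arbitrary work units.

    Split design: vertex and pixel units are separate, so the slower side
    determines the frame time while the other side sits idle.
    Unified design: every unit can take either kind of work.
    """
    if unified:
        return (vertex_work + pixel_work) / (vertex_units + pixel_units)
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


# 6 vertex + 16 fragment units (NV40 counts); workloads are hypothetical.
workloads = {"pixel-heavy": (10, 300), "geometry-heavy": (300, 10)}
for name, (v, p) in workloads.items():
    split = frame_time(v, p, 6, 16, unified=False)
    uni = frame_time(v, p, 6, 16, unified=True)
    print(f"{name}: split {split:.2f}, unified {uni:.2f}")
```

In this model the unified design takes the same time for both workloads, while the split design suffers most when the work lands on the side with fewer units.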

Unified Shader Processors
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Terminology
Shader Model   Direct3D   OpenGL   Example
2.0b / 3.0     9          2.x      NVIDIA GeForce 6800 (SM 3.0), ATI Radeon X800 (SM 2.0b)
4              10.x       3.x      NVIDIA GeForce 8800, ATI Radeon HD 2900
5              11.x       4.x      NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Shader Capabilities

Tables courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Evolution of the Programmable Graphics Pipeline

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

Evolution of the Programmable Graphics Pipeline
• Not covered today:
  - SM 5 / D3D 11 / GL 4
    - Tessellation shaders
    - *cough* student presentation *cough*
  - Later this semester: NVIDIA Fermi
    - Dual warp scheduler
    - Configurable L1 / shared memory
    - Double precision
    - …

New Tool: AMD System Monitor
• Released 01/04/2011
• http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx
