Real-Time Graphics Architecture P

4/18/2007 Real-Time Graphics Architecture Lecture 4: Parallelism and Communication Kurt Akeley Pat Hanrahan http://graphics.stanford.edu/cs448-07-spring/ CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Topics 1. Frame buffers 2. Types of parallelism 3. Communication patterns and requirements 4. Sorting classification for parallel rendering (with examples) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 1 4/18/2007 Frame Buffers Raster vs. calligraphic Raster (image order) dominant choice Calligraphic (object order) Earliest choice (Sketchpad) E&S terminals in the 70s and 80s Works with light pens Scene complexity affects frame rate Monitors are expensive Still required for FAA simulation Increases absolute brightness of light points CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 2 4/18/2007 Frame buffer definitions What is a frame buffer? What can we learn by considering different definitions? CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Frame buffer definition #1 Storage for commands that are executed to refresh the display Allows for raster or calligraphiccalligraphic display (e. g. Megatech) “Frame buffer” for calligraphic display is a “display list” OpenGL “render list”? Key point: frame buffer contents are interpreted Color mapping Image scaling, warping Window system (overlay, separate windows, …) Address Recalculation Pipeline CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 3 4/18/2007 Frame buffer definition #2 Image memory used to decouple the render frame rate from the display frame rate Meets common understanding of frame buffer as image Leads naturally to double buffering One render buffer, one display buffer, swap n-buffering also possible, can control latency Key idea: decoupling enables general-purpose GPU Visual simulation has high render frame rate MCAD has low render frame rate Window manager has no frame rate CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Frame buffer definition #3 All pixel-assigned memory used to assemble and display the images being rendered Key point: frame buffer is active participant in rendering Leads to non-color buffers: depth, stencil, window control OpenGL treats these buffers as part of frame buffer Some reserve “frame buffer” for color images Should be n-buffered in some cases (sort last) RealityEngine frame buffer can be deeper than wide or high History cycles through this definition 2-D manipulation 3-D painters algorithm 3-D depth, stencil, accumulation, multi-pass Programmable shading CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 4 4/18/2007 Frame buffer is optional Calligraphic display If we don’t define display list as frame buffer “Follow-the-beam” rendering Minimizes latency Saves cost if frames are never “dropped” Talisman-like image assembly (3-D sprites) Old idea (visual simulation, window systems) GigaPixel render tile Frame buffer stores color images only Depth, stencil, etc. in small tile CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Dominant architecture is consistent SGI architectures look like ATI architectures, which look like NVIDIA architectures Details are evolving, but big picture remains the same Why is this? Simplicity of design Simplicity of algorithms Simplicity of immediate-mode approach CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 5 4/18/2007 Simplicity of design Frame buffer operations Blending: merge fragment and pixel color Deppgth Buffering: save nearest frag ment Stencil Buffering: simple pixel state machine Accumulation Buffering: high-resolution color arithmetic Antialiasing: (to be covered later) …. All frame buffer operations: Combine fragment and pixel data (not just a replace) But replace operation is optimized, e.g., no parity/ECC Are local (no intra-pixel dependencies) Why aren’t fragment operations programmable? CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Simplicity of algorithms Frame buffer employs brute-force simplicity Hidden surface elimination: Depth-buffer vs. sort/painter Capping: Stencil-based vs. object calculations Image-space algorithm is efficient Just samples, never “object” information, locality Just-in-time calculation, steady cost function Accumulation Buffer (high-resolution color arithmetic) The Accumulation Buffer, Haeberli and Akeley, Proceedings of SIGGRAPH ‘90 Volume rendering using 3D textures Multi-pass rendering Interactive Multi-pass Programmable Shading, Peercy, Olano, Airey, and Ungar, Proceedings of SIGGRAPH ‘00 CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 6 4/18/2007 Simplicity of immediate-mode Frame buffer contents are “context” Matches 2D/window-rendering model Rendering System Little graphics Frame buffer: state here most graphics state here CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Decreasing display bandwidth burden Historically display bandwidth was a limiting factor Hence “Sproull’s Rule”: fill rate >= display rate Now display bandwidth is almost inconsequential Year System FB (GB) Disp (GB) Disp / FB 1984 SGI 2000-series 0.3 0.14 1/2 1988 SGI GTX 1.8 * 0.29 1/6 1996 SGI InfiniteReality 12. 8 0600.60 1/20 2006 NVIDIA 7900 GTX 51.2 0.75 1/70 * VRAM provided separate video bandwidth CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 7 4/18/2007 Parallelism and Communication Parallelism and communication Parallelism – using multiple computational units to processes work in parallel Communication – connecting the computational units to allow work to be distributed and aggregated Issues Dependencies Ordering Sorting Scalability Computation Bandwidth Load balancing CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 8 4/18/2007 Parallelism taxonomy Hardware parallelism Virtual parallelism (time sharing a single (simultaneous execution on processor, usually with multiple processors) hardware support) Data parallelism [aka “parallelism”] (same task on similar data sets) Task parallelism (different tasks on similar OR differing data sets) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Parallelism taxonomy Hardware parallelism Virtual parallelism (time sharing a single (simultaneous execution on processor, usually with multiple processors) hardware support) Frame-parallelism Data parallelism (batch, SGI N-clops) [aka “parallelism”] (same task on similar Object-parallelism data sets) (geometry) Image-parallelism (fragment/pixel) Task parallelism (different tasks on similar OR differing data sets) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 9 4/18/2007 Parallelism taxonomy Hardware parallelism Virtual parallelism (time sharing a single (simultaneous execution on processor, usually with multiple processors) hardware support) Frame-parallelism Data parallelism (batch, SGI N-clops) [aka “parallelism”] (same task on similar Object-parallelism data sets) (geometry) Image-parallelism (fragment/pixel) Task parallelism Multi-processing (on multiple CPUs) (different tasks on similar OR differing Pipelining data sets) (the graphics pipeline) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Parallelism taxonomy Hardware parallelism Virtual parallelism (time sharing a single (simultaneous execution on processor, usually with multiple processors) hardware support) Frame-parallelism Data parallelism (batch, SGI N-clops) Multi-processing [aka “parallelism”] (graphics context switching) (same task on similar Object-parallelism data sets) (geometry) Multi-threading (almost defines a GPU-like Image-parallelism processor) (fragment/pixel) Task parallelism Multi-processing (on multiple CPUs) (different tasks on similar OR differing Pipelining data sets) (the graphics pipeline) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 10 4/18/2007 Parallelism taxonomy Hardware parallelism Virtual parallelism (time sharing a single (simultaneous execution on processor, usually with multiple processors) hardware support) Frame-parallelism Data parallelism (batch, SGI N-clops) Multi-processing [aka “parallelism”] (graphics context switching) (same task on similar Object-parallelism data sets) (geometry) Multi-threading (almost defines a GPU-like Image-parallelism processor) (fragment/pixel) Multi-processing Multi-processing Task parallelism (time sharing a single CPU) (on multiple CPUs) (different tasks on similar OR differing Pipelining Multi-threading data sets) (Direct3D-10 “common- (the graphics pipeline) core”) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Graphics is embarrassingly parallel Ample self-similar data sets … Frames, vertexes, fragments, texels, pixels With minimal dependencies Few intra-set dependencies Pixels (in the frame buffer) are the significant exception Inter-set dependencies are purely sequential “Graphics pipeline” is designed to minimize dependencies Other graphics architectures have more dependencies E.g., for global lighting effects But graphics pipeline has huge redundancies Hence many opportunities for optimization … How hard should we work to do things wrong ? CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 11 4/18/2007 Geometry parallelism trend (SGI) 20 15 Model 10 Transform Length Transform Width 5 0 X 00 G X IR 0 000 T RE 1 2 G VG CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Image parallelism trend (SGI) Rasterization 400 300 200 100 0 1000 2000 G GTX VGX RE IR CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 12 4/18/2007 The clear trend Shorter and wider Why ? CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Communication taxonomy Sorting Distribution

Real-Time Graphics Architecture P

20 Years of Opengl

PACKET 22 BOOKSTORE, TEXTBOOK CHAPTER Reading Graphics

Interactive Visualization of Large-Scale Architectural Models Over the Grid Strolling in Tang Chang’An City

Onyx2™ Deskside Workstation Owner's Guide

Infinitereality: Ein Real-Time- Grafiksystem

GPU Architecture:Architecture: Implicationsimplications && Trendstrends

Infinitereality: a Real-Time Graphics System

Infinitereality™ Video Format Combiner User's Guide

SGI High Performance Visualization Solutions

Graphics and Computing Gpus

Sgrortgtn'" 3000 Sertes Modular, Hlgh-Performance Servers Sgiorigin 3000 Series Product Stanford and SGI .Q.Y..~Iyi~W

High-Performance Computing at SGI and the Status of Climate and Weather Codes on the SGI Altix Gerardo Cisneros, Ph.D