Why GPU Computing?

Total Page:16

File Type:pdf, Size:1020Kb

Why GPU Computing? Early 3D Graphics Perspective study of a chalice Paolo Uccello, circa 1450 © NVIDIA Corporation 2009 Early Graphics Hardware Artist using a perspective machine Albrecht Dürer, 1525 Perspective study of a chalice Paolo Uccello, circa 1450 © NVIDIA Corporation 2009 Early Electronic Graphics Hardware SKETCHPAD: A Man-Machine Graphical Communication System Ivan Sutherland, 1963 © NVIDIA Corporation 2009 The Graphics Pipeline The Geometry Engine: A VLSI Geometry System for Graphics Jim Clark, 1982 © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Key abstraction of real-time graphics Rasterize Hardware used to look like this Pixel One chip/board per stage Test & Blend Framebuffer Fixed data flow through pipeline © NVIDIA Corporation 2009 SGI RealityEngine (1993) Vertex Rasterize Pixel Test & Blend Framebuffer !"#$%&'()(*+%!"#$%&'()*%)" +,#-.%/0,%-.//0&12%34+ © NVIDIA Corporation 2009 SGI InfiniteReality (1997) Vertex Rasterize Pixel Test & Blend Framebuffer 567$#*8%($%9)+%1)2%)%&"!"#$%&'3454,"#$6&%7"4*,#-.%/040'0&"7,%-.//0&12%3: © NVIDIA Corporation 2009 The Graphics Pipeline Vertex Remains a useful abstraction Rasterize Hardware used to look like this Pixel Test & Blend Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline !!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1'1-0 Vertex 4456-7$644 8-1*"8)%9**:,6-$';"9<",6-$';"=<",6-$';">? @ 10' 1 A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Rasterize Hardware used to look like this: Vertex, pixel processing became Pixel programmable Test & Blend Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline !!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1'1-0 Vertex 4456-7$644 8-1*"8)%9**:,6-$';"9<",6-$';"=<",6-$';">? @ 10'"1"A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Geometry Hardware used to look like this Rasterize Vertex, pixel processing became programmable Pixel New stages added Test & Blend Framebuffer © NVIDIA Corporation 2009 The Graphics Pipeline !!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1'1-0 Vertex 4456-7$644 8-1*"8)%9**:,6-$';"9<",6-$';"=<",6-$';">? @ 10'"1"A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Tessellation Hardware used to look like this Geometry Vertex, pixel processing became Rasterize programmable Pixel New stages added Test & Blend Framebuffer GPU architecture increasingly centers around shader execution © NVIDIA Corporation 2009 Modern GPUs: Unified Design Discrete Design Unified Design Shader A Shader B ibuffer ibuffer ibuffer ibuffer Shader Core obuffer obuffer obuffer obuffer Shader C Shader D Vertex shaders, pixel shaders, etc. become threads running different programs on a flexible core GeForce 8: Modern GPU Architecture 26?$ .7B"$%&??(8F)(# -($"B%C%09?$(#DE( @(#$(A%;<#(9=%.??"( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"( SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP TF TF TF TF TF TF TF TF ;<#(9=%1#6>(??6# L1 L1 L1 L1 L1 L1 L1 L1 L2 L2 L2 L2 L2 L2 Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer GeForce 8: Modern GPU Architecture 26?$ .7B"$%&??(8F)(# -($"B%C%09?$(#DE( @(#$(A%;<#(9=%.??"( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"( SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP TF TF TF TF TF TF TF TF ;<#(9=%1#6>(??6# L1 L1 L1 L1 L1 L1 L1 L1 L2 L2 L2 L2 L2 L2 Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer Modern GPU Architecture: GT200 26?$ .7B"$%&??(8F)(# -($"B%C%09?$(#DE( @(#$(A%;<#(9=%.??"( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"( SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP TF TF TF TF TF L1 L1 L1 L1 L1 ;<#(9=%-><(=")(# L1 L1 L1 L1 L1 TF TF TF TF TF SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP L2 L2 L2 L2 L2 L2 L2 L2 Framebuffer© NVIDIA CorporationFramebuffer 2009 Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer Next-Gen GPU Architecture: Fermi DRAM I/F DRAM I/F DRAM I/F HOST I/F HOST L2 DRAM I/F Giga Thread Giga DRAM I/F DRAM I/F © NVIDIA Corporation 2009 NVIDIA next-!"#$%&"'()*$+',-).",./'" GPUs Today Lessons from Graphics Pipeline Throughput is paramount Create, run, & retire lots of threads very rapidly !"#$%&' )9 +,-./ Use multithreading to hide latency 012-.31%((88 6(&*%+,-./ 012-.31 27 &'5*%+,-./ 012-.31 )% 012-.314 '56 68*%+,-./ !"#$%&'( ')*%+,-./ )*%+,-./ !""# $%%% $%%# $%!% Perspective 1 TeraFLOP in 1993 !"#$%&#'($)**+#,-$./' Computers no longer get faster, just wider You must re-think your algorithms to be parallel ! Data-parallel computing is most scalable solution © NVIDIA Corporation 2009 Why GPU Computing? Fact: Throughputnobody architecture cares about ! massivetheoretical parallelism peak Massive parallelism !Challenge:computational horsepower harness GPU power for real application performance GPU $"# !"# #<=4>&+234&?@&6.A !"#$#%&'()*%&+,-.- 0&12345 /0-&12345 GFLOPS ,-/&89*:;) 67.&89*:;) CPU CUDA Successes 146X 36X 18X 50X 100X Medical Imaging Molecular Dynamics Video Transcoding Matlab Computing Astrophysics U of Utah U of Illinois Elemental Tech AccelerEyes RIKEN 149X 47X 20X 130X 30X Financial simulation Linear Algebra 3D Ultrasound Quantum Chemistry Gene Sequencing Oxford Universidad Jaime Techniscan U of Illinois U of Maryland Accelerating Insight 4.6 Days 2.7 Days 3 Hours 8 Hours 30 Minutes 27 Minutes 16 Minutes 13 Minutes CPU Only Heterogeneous with Tesla GPU © NVIDIA Corporation 2009 GPU Computing 1.0 (Ignoring prehistory: Ikonas, Pixel Machine, Pixel-01/2#-34 GPU Computing 1.0: compute pretending to be graphics Disguise data as textures or geometry Disguise algorithm as render passes !Trick graphics pipeline into doing your computation! Term GPGPU coined by Mark Harris © NVIDIA Corporation 2009 Typical GPGPU Constructs © NVIDIA Corporation 2009 Typical GPGPU Constructs © NVIDIA Corporation 2009 GPGPU Hardware & Algorithms GPUs get progressively more capable Fixed-function ! register combiners ! shaders fp32 pixel hardware greatly extends reach Algorithms get more sophisticated Cellular automata ! PDE solvers ! ray tracing Clever graphics tricks High-level shading languages emerge GPU Computing 2.0 5 Enter CUDA GPU Computing 2.0: direct compute Program GPU directly, no graphics-based restrictions GPU Computing supplants graphics-based GPGPU November 2006: NVIDIA introduces CUDA © NVIDIA Corporation 2009 CUDA In One Slide Block Thread B(#GF)6>' B(#G$<#(9= ?<9#(= )6>9)%8(86#* Local barrier 8(86#* Kernel &''$% . Global barrier B(#G=(HD>( I)6F9) Kernel !"#$% 8(86#* . © NVIDIA Corporation 2009 CUDA C Example !"#$%&'()*+&,-#'./#01%02%3."'1%'2%3."'1%4(2%3."'1%4*5 6 3"- /#01%#%7%89%# : 09%;;#5 *<#=%7%'4(<#=%;%*<#=9 > Serial C Code ??%@0!"A,%&,-#'. BCDEF%A,-0,. &'()*+&,-#'./02%GH82%(2%*59 ++I."J'.++%!"#$%&'()*+)'-'..,./#01%02%3."'1%'2%3."'1%4(2%3."'1%4*5 6 #01%#%7%J."KA@$(H(4J."KAL#MH(%;%1N-,'$@$(H(9 #3 /# : 05%%*<#=%7%'4(<#=%;%*<#=9 > Parallel C Code ??%@0!"A,%)'-'..,. BCDEF%A,-0,. O#1N%GPQ%1N-,'$&?J."KA #01%0J."KA&%7%/0%;%GPP5%?%GPQ9 &'()*+)'-'..,.:::0J."KA&2%GPQRRR/02%GH82%(2%*59 Heterogeneous Programming Use the right processor for the right job Serial Code Parallel Kernel &''((()*+,-.)*/01)222$"#34%5 . Serial Code Parallel Kernel !"#((()*+,-.)*/01)222$"#34%5 . © NVIDIA Corporation 2009 GPU Computing 3.0 5 An Ecosystem GPU Computing 3.0: an emerging ecosystem Hardware & product lines Education & research Algorithmic sophistication Consumer applications Cross-platform standards High-level languages !"#$%&'()*$+,$-./)(*01$$%&0()2$3,2&*4$56/+7,89*4$/55* Products CUDA is in products from laptops to supercomputers ` © NVIDIA Corporation 2009 Emerging HPC Products New class of hybrid GPU-CPU servers 2 Tesla Upto 18 Tesla M1060 GPUs M1060 GPUs SuperMicro 1U Bull Bullx GPU Server Blade Enclosure © NVIDIA Corporation 2009 Algorithmic Sophistication Sort Sparse matrix Hash tables Fast multipole method Ray tracing (parallel tree traversal) © NVIDIA Corporation 2009 012$3"4/5.4$6'7($%89.)():+.)7#$76$;9+'4"$<+.')=-Vector Multiplication on Emerging Multicore Platforms", Williams et al, Supercomputing 2007 Cross-Platform Standards GPU Computing Applications Java Python tm Direct CUDA CUDA C OpenCL .NET Compute Fortran MATLAB 3 NVIDIA GPU with the CUDA Parallel Computing Architecture © NVIDIA Corporation 2009 OpenCL is trademark of Apple Inc. used under license to the Khronos Group Inc. CUDA Momentum Over 250 universities teach CUDA Over 1000 research papers CUDA powered TSUBAME 29th fastest supercomputer in the world 180 Million CUDA GPUs 100,000 active developers Over 500 CUDA Apps www.nvidia.com/CUDA © NVIDIA Corporation 2009 thrust Data Structures Algorithms ! !"#$%!&&'()*+(,)(+!-# ! !"#$%!&&%-#! ! !"#$%!&&"-%!,)(+!-# ! !"#$%!&&#('$+( ! !"#$%!&&'()*+(,.!# ! !"#$%!&&(/+0$%*)(,%+12 ! Etc. ! Etc. thrust: a library of data parallel algorithms & data structures with an interface similar to the C++ Standard Template
Recommended publications
  • Real-Time Graphics Architecture P
    4/18/2007 Real-Time Graphics Architecture Lecture 4: Parallelism and Communication Kurt Akeley Pat Hanrahan http://graphics.stanford.edu/cs448-07-spring/ CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Topics 1. Frame buffers 2. Types of parallelism 3. Communication patterns and requirements 4. Sorting classification for parallel rendering (with examples) CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 1 4/18/2007 Frame Buffers Raster vs. calligraphic Raster (image order) dominant choice Calligraphic (object order) Earliest choice (Sketchpad) E&S terminals in the 70s and 80s Works with light pens Scene complexity affects frame rate Monitors are expensive Still required for FAA simulation Increases absolute brightness of light points CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 2 4/18/2007 Frame buffer definitions What is a frame buffer? What can we learn by considering different definitions? CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 Frame buffer definition #1 Storage for commands that are executed to refresh the display Allows for raster or calligraphiccalligraphic display (e. g. Megatech) “Frame buffer” for calligraphic display is a “display list” OpenGL “render list”? Key point: frame buffer contents are interpreted Color mapping Image scaling, warping Window system (overlay, separate windows, …) Address Recalculation Pipeline CS448 Lecture 4 Kurt Akeley, Pat Hanrahan, Spring 2007 3 4/18/2007 Frame buffer definition #2 Image memory used to decouple the render frame rate from the display
    [Show full text]
  • 20 Years of Opengl
    20 Years of OpenGL Kurt Akeley © Copyright Khronos Group, 2010 - Page 1 So many deprecations! • Application-generated object names • Depth texture mode • Color index mode • Texture wrap mode • SL versions 1.10 and 1.20 • Texture borders • Begin / End primitive specification • Automatic mipmap generation • Edge flags • Fixed-function fragment processing • Client vertex arrays • Alpha test • Rectangles • Accumulation buffers • Current raster position • Pixel copying • Two-sided color selection • Auxiliary color buffers • Non-sprite points • Context framebuffer size queries • Wide lines and line stipple • Evaluators • Quad and polygon primitives • Selection and feedback modes • Separate polygon draw mode • Display lists • Polygon stipple • Hints • Pixel transfer modes and operation • Attribute stacks • Pixel drawing • Unified text string • Bitmaps • Token names and queries • Legacy pixel formats © Copyright Khronos Group, 2010 - Page 2 Technology and culture © Copyright Khronos Group, 2010 - Page 3 Technology © Copyright Khronos Group, 2010 - Page 4 OpenGL is an architecture Blaauw/Brooks OpenGL SGI Indy/Indigo/InfiniteReality Different IBM 360 30/40/50/65/75 NVIDIA GeForce, ATI implementations Amdahl Radeon, … Code runs equivalently on Top-level goal Compatibility all implementations Conformance tests, … It’s an architecture, whether Carefully planned, though Intentional design it was planned or not . mistakes were made Can vary amount of No feature subsetting Configuration resource (e.g., memory) Config attributes (e.g., FB) Not a formal
    [Show full text]
  • PACKET 22 BOOKSTORE, TEXTBOOK CHAPTER Reading Graphics
    A.11 GRAPHICS CARDS, Historical Perspective (edited by J Wunderlich PhD in 2020) Graphics Pipeline Evolution 3D graphics pipeline hardware evolved from the large expensive systems of the early 1980s to small workstations and then to PC accelerators in the 1990s, to $X,000 graphics cards of the 2020’s During this period, three major transitions occurred: 1. Performance-leading graphics subsystems PRICE changed from $50,000 in 1980’s down to $200 in 1990’s, then up to $X,0000 in 2020’s. 2. PERFORMANCE increased from 50 million PIXELS PER SECOND in 1980’s to 1 billion pixels per second in 1990’’s and from 100,000 VERTICES PER SECOND to 10 million vertices per second in the 1990’s. In the 2020’s performance is measured more in FRAMES PER SECOND (FPS) 3. Hardware RENDERING evolved from WIREFRAME to FILLED POLYGONS, to FULL- SCENE TEXTURE MAPPING Fixed-Function Graphics Pipelines Throughout the early evolution, graphics hardware was configurable, but not programmable by the application developer. With each generation, incremental improvements were offered. But developers were growing more sophisticated and asking for more new features than could be reasonably offered as built-in fixed functions. The NVIDIA GeForce 3, described by Lindholm, et al. [2001], took the first step toward true general shader programmability. It exposed to the application developer what had been the private internal instruction set of the floating-point vertex engine. This coincided with the release of Microsoft’s DirectX 8 and OpenGL’s vertex shader extensions. Later GPUs, at the time of DirectX 9, extended general programmability and floating point capability to the pixel fragment stage, and made texture available at the vertex stage.
    [Show full text]
  • Interactive Visualization of Large-Scale Architectural Models Over the Grid Strolling in Tang Chang’An City
    Interactive Visualization of Large-Scale Architectural Models over the Grid Strolling in Tang Chang’an City XU Shuhong1, HENG Chye Kiang2, SUBRAMANIAM Ganesan1, HO Quoc Thuan1, KHOO Boon Tat1 and HUANG Yan2 1 Institute of High Performance Computing, Singapore 2 Department of Architecture, National University of Singapore Keywords: remote visualization, grid-enabled visualization, large-scale architectural models, virtual heritage Abstract: Virtual reconstruction of the ancient Chinese Chang’an city has been continued for ten years at the National University of Singapore. Motivated by sharing this grand city with people who are geographically distant and equipped with normal personal computers, this paper presents a practical Grid-enabled visualization infrastructure that is suitable for interactive visualization of large-scale architectural models. The underlying Grid services, such as information service, visualization planner and execution container etc, are developed according to the OGSA standard. To tackle the critical problem of Grid visualization, i.e. data size and network bandwidth, a multi-stage data compression approach is deployed and the corresponding data pre-processing, rendering and remote display issues are systematically addressed. 1 INTRODUCTION Chang’an, meaning long-lasting peace, was the capital of China’s Tang dynasty from 618 to 907 AD. At its peak, it had a population of about one million. Measuring 9.7 by 8.6 kilometres, the city’s architecture inspired the planning of many capital cities in East Asia such as Heijo-kyo and Heian-kyo in Japan in the 8th century, and imperial Chinese cities like Beijing of the Ming and Qing dynasties. To bring this architectural miracle back to life, a team led by Professor Heng Chye Kiang at the National University of Singapore (NUS) has conducted extensive research since the middle of 90s.
    [Show full text]
  • Onyx2™ Deskside Workstation Owner's Guide
    Onyx2™ Deskside Workstation Owner’s Guide Document Number 007-3454-004 CONTRIBUTORS Written by Mark Schwenden and Carolyn Curtis Illustrated by Dan Young and Cheri Brown Production by Allen Clardy Engineering contributions by Bob Marinelli, Brad Morrow, Bharat Patel, Ed Reidenbach, Philip Montalban, Mike Mackovitch, Jim Ammon, Bob Murphy, Reuel Nash, Mohsen Hosseini, Suzanne Jones, Jeff Milo, Patrick Conway, Kirk Law, Dave Lima, and Martin Frankel St Peter’s Basilica image courtesy of ENEL SpA and InfoByte SpA. Disk Thrower image courtesy of Xavier Berenguer, Animatica. © 1998, Silicon Graphics, Inc.— All Rights Reserved The contents of this document may not be copied or duplicated in any form, in whole or in part, without the prior written permission of Silicon Graphics, Inc. RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94043-1389. FCC Warning This equipment has been tested and found compliant with the limits for a Class A digital device, pursuant to Part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications.
    [Show full text]
  • Infinitereality: Ein Real-Time- Grafiksystem
    InfiniteReality: Ein Real-Time- Grafiksystem Vortrag am GDV-Seminar ETH Zürich Andreas Deller 18. Mai 1998 05/19/98 InfiniteReality Seite 1 Outline I. Einführung A. Grundinformationen über InfiniteReality B. Eckdaten der InfiniteReality C. Zum Vortrag II. Architektur A. Übersicht B. Detailbeschreibungen 1. Geometry Board a. Host Interface b. Geometry Distributor c. Geometry Engines (2 oder 4 pro Pipeline) d. Geometry Raster FIFO 2. Raster Memory Board (1, 2, oder 4 pro Pipeline) a. Fragment Generators b. Image Engines 3. Display Generator Board III. Features 1. Virtual Texture 2. Scene Load Management IV. Host-Systeme A. Systembeschreibung 1. Node Cards 2. Beispielkonfigurationen V. Performance VI. Endnoten/Literaturangaben VII. Index 05/19/98 InfiniteReality Seite 2 Einführung Grundinformationen über InfiniteReality Das InfiniteReality(IR)-Grafiksystem von SGI (Silicon Graphics, Inc.) ist das erste Workstation-System, das für gleichmässige, garantierte Hertz-Raten entwickelt worden ist. Als Nachfolger der RealityEngine wird es in den Hochleistungs-Grafikcomputern Onyx und Onyx2 von SGI eingesetzt. Das Design musste derart angepasst werden, dass die IR-Engine sowohl von den Vorzügen der verbesserten Systemarchitektur der Onyx2 profitieren als auch in der Onyx optimal eingesetzt werden kann. Zudem sollte sie skalierbar sein, um den verschiedenen Kundenwünschen möglichst gut zu entsprechen und einen benötigten Ausbau ohne grossen Aufwand zu erledigen. Das IR-Grafiksystem wird als «third-generation graphics system» bezeichnet. Dies bedeuetet, dass das System beleuchtete, schattierte, tiefengepufferte, texturierte, antialiased Dreiecke hardwareunterstützt rendern kann, und zwar mit möglichst wenig Performance-Einbusse. Weiter sollten Texturen eine praktisch unbegrenzte Grösse haben dürfen, ohne die Performance erheblich zu beeinflussen. Der Programmierer sollte mit dem Handling der Textur so wenig Zeit wie möglich aufbringen müssen.
    [Show full text]
  • GPU Architecture:Architecture: Implicationsimplications && Trendstrends
    GPUGPU Architecture:Architecture: ImplicationsImplications && TrendsTrends DavidDavid Luebke,Luebke, NVIDIANVIDIA ResearchResearch Beyond Programmable Shading: In Action GraphicsGraphics inin aa NutshellNutshell • Make great images – intricate shapes – complex optical effects – seamless motion • Make them fast – invent clever techniques – use every trick imaginable – build monster hardware Eugene d’Eon, David Luebke, Eric Enderton In Proc. EGSR 2007 and GPU Gems 3 …… oror wewe couldcould justjust dodo itit byby handhand Perspective study of a chalice Paolo Uccello, circa 1450 Beyond Programmable Shading: In Action GPU Evolution - Hardware 1995 1999 2002 2003 2004 2005 2006-2007 NV1 GeForce 256 GeForce4 GeForce FX GeForce 6 GeForce 7 GeForce 8 1 Million 22 Million 63 Million 130 Million 222 Million 302 Million 754 Million Transistors Transistors Transistors Transistors Transistors Transistors Transistors 20082008 GeForceGeForce GTX GTX 200200 1.41.4 BillionBillion TransistorsTransistors Beyond Programmable Shading: In Action GPUGPU EvolutionEvolution -- ProgrammabilityProgrammability ? Future: CUDA, DX11 Compute, OpenCL CUDA (PhysX, RT, AFSM...) 2008 - Backbreaker DX10 Geo Shaders 2007 - Crysis DX9 Prog Shaders 2004 – Far Cry DX7 HW T&L DX8 Pixel Shaders 1999 – Test Drive 6 2001 – Ballistics TheThe GraphicsGraphics PipelinePipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending Framebuffer Beyond Programmable Shading: In Action TheThe GraphicsGraphics PipelinePipeline Vertex Transform
    [Show full text]
  • Infinitereality: a Real-Time Graphics System
    InfiniteReality: A Real-Time Graphics System John S. Montrym, Daniel R. Baum, David L. Dignam, and Christopher J. Migdal Silicon Graphics Computer Systems ABSTRACT Most of the features and capabilities of the InfiniteReality architec- TM The InfiniteReality graphics system is the first general-purpose ture are designed to support this real-time performance goal. Mini- workstation system specifically designed to deliver 60Hz steady mizing the time required to change graphics modes and state is as frame rate high-quality rendering of complex scenes. This paper important as increasing raw transformation and pixel fill rate. Many describes the InfiniteReality system architecture and presents novel of the targeted applications require access to very large textures features designed to handle extremely large texture databases, and/or a great number of distinct textures. Permanently storing such maintain control over frame rendering time, and allow user custom- large amounts of texture data within the graphics system itself is not ization for diverse video output requirements. Rendering perfor- economically viable. Thus methods must be developed for applica- mance expressed using traditional workstation metrics exceeds tions to access a “virtual texture memory” without significantly seven million lighted, textured, antialiased triangles per second, and impacting overall performance. Finally, the system must provide 710 million textured antialiased pixels filled per second. capabilities for the application to monitor actual geometry and fill rate performance on a frame by frame basis and make adjustments CR Categories and Subject Descriptors: I.3.1 [Computer if necessary to maintain a constant 60Hz frame update rate. Graphics]: Hardware Architecture; I.3.3 [Computer Graph- ics]: Picture/Image Generation Aside from the primary goal of real-time application performance, two other areas significantly shaped the system architecture.
    [Show full text]
  • Infinitereality™ Video Format Combiner User's Guide
    InfiniteReality™ Video Format Combiner User’s Guide Document Number 007-3279-001 CONTRIBUTORS Written by Carolyn Curtis Illustrated by Carolyn Curtis Production by Laura Cooper Engineering contributions by Gregory Eitzmann, David Naegle, Chase Garfinkle, Ashok Yerneni, Rob Wheeler, Ben Garlick, Mark Schwenden, and Ed Hutchins Cover design and illustration by Rob Aguilar, Rikk Carey, Dean Hodgkinson, Erik Lindholm, and Kay Maitz © 1996, Silicon Graphics, Inc.— All Rights Reserved The contents of this document may not be copied or duplicated in any form, in whole or in part, without the prior written permission of Silicon Graphics, Inc. RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94043-1389. Silicon Graphics, the Silicon Graphics logo, Onyx, and IRIS are registered trademarks and IRIX, Sirius Video, POWER Onyx, and InfiniteReality are trademarks of Silicon Graphics, Inc. X Window System is a trademark of Massachusetts Institute of Technology. InfiniteReality™ Video Format Combiner User’s Guide Document Number 007-3279-001 Contents List of Figures vii List of Tables ix About This Guide xi Audience xi Structure of This Guide xi Conventions xii 1. InfiniteReality Graphics and the Video Format Combiner Utility 1 Channel Input and Output 2 Video Format Combinations 2 Video Format Combiner Utility and the Sirius Video Option 2 Programmable Querying of Video Format Combinations 3 2.
    [Show full text]
  • SGI High Performance Visualization Solutions
    Next Generation Graphics Hardware Architectures Fabrizio Magugliani SE Director, EMEA [email protected] 1 SGI Confidential-Not for Redistribution Agenda • The ultimate architecture •Driving the technology •SGI® Onyx® 3000 series –InfiniteReality3™ –InfinitePerformance™ •SGI® Onyx® 300 •Silicon Graphics Fuel™ visual workstation •Conclusion •Q&A SGI Confidential-Not for Redistribution The Ultimate Architecture Because some people think that “There is no tool like an old tool” SGI Confidential-Not for Redistribution Driving the Technology Real-time visualization Large DBs Desktop Throughput and Visualization High Turnaround Throughput SGI® Reality Center™ Silicon Graphics Fuel™ Silicon Graphics® SGI® Origin® 3000 series SGI® Origin® 3800 SGI® Origin® 300 SGI™ Onyx® 3000 series Octane2® SGI® Onyx® 3800 SGI® Onyx® 300 (IRIX) (IRIX®) (IRIX) (IRIX) Distributed memory Shared memory SGI Confidential-Not for Redistribution SGI™ Onyx® 3000 Series • SGI Onyx 3000 series with InfiniteReality® graphics IntroducedIntroduced JulyJuly 2000 2000 • The industry’s highest quality graphics • Revolutionary NUMAflex™ modular computing design with integrated graphics SGI Confidential-Not for Redistribution InfiniteReality™ Performance, Flexibility, and Quality SGI Confidential-Not for Redistribution InfiniteReality™ Versatile Frame Buffer Displays Frame Buffer SGI Confidential-Not for Redistribution InfiniteReality™ Versatile Frame Buffer SGI Confidential-Not for Redistribution InfiniteReality™ Reality Center™ Multichannels are easy! Even in Edge-Blended
    [Show full text]
  • Graphics and Computing Gpus
    B APPENDIX Graphics and Computing GPUs John Nickolls Imagination is more Director of Architecture important than NVIDIA knowledge. David Kirk Chief Scientist Albert Einstein On Science, 1930s NVIDIA B.1 Introduction B-3 B.2 GPU System Architectures B-7 B.3 Programming GPUs B-12 B.4 Multithreaded Multiprocessor Architecture B-25 B.5 Parallel Memory System B-36 B.6 Floating-point Arithmetic B-41 B.7 Real Stuff: The NVIDIA GeForce 8800 B-46 B.8 Real Stuff: Mapping Applications to GPUs B-55 B.9 Fallacies and Pitfalls B-72 B.10 Concluding Remarks B-76 B.11 Historical Perspective and Further Reading B-77 B.1 Introduction Th is appendix focuses on the GPU—the ubiquitous graphics processing unit graphics processing in every PC, laptop, desktop computer, and workstation. In its most basic form, unit (GPU) A processor the GPU generates 2D and 3D graphics, images, and video that enable Window- optimized for 2D and 3D based operating systems, graphical user interfaces, video games, visual imaging graphics, video, visual computing, and display. applications, and video. Th e modern GPU that we describe here is a highly parallel, highly multithreaded multiprocessor optimized for visual computing. To provide visual computing real-time visual interaction with computed objects via graphics, images, and video, A mix of graphics the GPU has a unifi ed graphics and computing architecture that serves as both a processing and computing programmable graphics processor and a scalable parallel computing platform. PCs that lets you visually interact with computed and game consoles combine a GPU with a CPU to form heterogeneous systems.
    [Show full text]
  • Sgrortgtn'" 3000 Sertes Modular, Hlgh-Performance Servers Sgiorigin 3000 Series Product Stanford and SGI .Q.Y..~Iyi~W
    ENCLOSURE 2 • l~ Product Guide 5 ...... SGrOrtgtn'" 3000 Sertes Modular, Hlgh-Performance Servers SGIOrigin 3000 Series Product Stanford and SGI .Q.y..~IYi~w. Announce New ~l,;?!.Q[igln.~2..Q.Q Partnership in §.l,;?LQ.r:isln Biomedical ~ Supercomputing SGI Origin 3400 [more] SGI Origin 3800 SARA Supercomputer System Facility Installation [more] Software Modular NUMAflex Supercomputing [view] Bricks ill Partijioning SAIC Upgrades FBI's The SGI Origin 3000 series takes system modularity to Datasheet & National Instant new heights, as NUMAflex™ allows you to scale CPU, Criminal Background White Papers storage, and I/O components independently within each Check System with Server system. Complete multidimensional flexibility allows SGI Origin Technology Solutions organizations to deploy. service, and expand system [more] components in every possible dimension to meet any Related News ShareAPhoto Unveils business demand. The only limitation is your imagination. Revolutionary Online Contact Us Digital Video Solution Using SGI Origin 3000 Configure Series Server [more] This System SGIOrigin Origin 3000:S ~ll.if.S 3200 SGIOrigin dem 3400 p';:2l;? i:t ~{!q(lj(~~ M'l\;iUlilM\;j SGIOrigin f·;ilS~".~ pi<~.hi 3800 Related Sites SGIOnyx 3000 Series Storage Networking Streaming Media privacy policy IQyestions/CQmments • Copyright © 1993-2001 Silicon Graphics, Inc. All rights reserved. ITrademark InfonDation °SGIOrigin Product Overview 3000 Series SGITM Origin™ 3000 Series of Servers Product Overview The SGITM Origin™ 3000 Series of servers sets the standard for high • S.G.LQ.r.igin.~2..QQ performance in today's market place by delivering flexibility, resiliency, • S.g.LQrisin investment protection, and performance in a new and innovative 3200c package.
    [Show full text]