Introduction to GPU Computing

Introduction to Heterogeneous/Hybrid Computing
François Courteille | Senior Solutions Architect, NVIDIA

Agenda
  1. Introduction: Heterogeneous Computing & GPUs
  2. CPU versus GPU architecture
  3. Accelerated computing roadmap 2015

1. Introduction: Heterogeneous Computing & GPUs

Racing Toward Exascale

Accelerators Surge in World's Top Supercomputers
  • 100+ accelerated systems now on the Top500 list
  • 1/3 of total FLOPS powered by accelerators
  • NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers
  • Tesla supercomputers growing at 50% CAGR over the past five years
[Figure: Top500, number of accelerated supercomputers, 2013 to 2015]

Next-Gen Supercomputers Are GPU-Accelerated
  • SUMMIT and SIERRA (U.S. Dept. of Energy): pre-exascale supercomputers for science
  • NOAA: new supercomputer for next-gen weather forecasting
  • IBM Watson: breakthrough natural language processing for cognitive computing

What Is Heterogeneous Computing?
Application execution is split across a CPU, which provides high serial performance, and a GPU, which provides high data parallelism.

The World Leader in Visual Computing
  • Markets: gaming, design, enterprise virtualization, HPC & cloud service providers, autonomous machines
  • Platforms: PC, data center, mobile

Evolution of GPUs
  • RIVA 128: 3M transistors
  • GeForce 256: 23M transistors
  • GeForce 3: 60M transistors
  • GeForce FX: 250M transistors
  • GeForce 8800: 681M transistors
  • "Kepler": 7B transistors
[Figure: timeline with axis marks at 1995, 2000, 2001, 2003, 2006 and 2012, moving from fixed function to programmable shaders to CUDA]

Performance Lead Continues to Grow
[Figure: peak double-precision FLOPS (GFLOPS) and peak memory bandwidth (GB/s), 2008-2014, for NVIDIA GPUs (M1060, M2090, K20, K40, K80) versus x86 CPUs (Westmere, Sandy Bridge, Ivy Bridge, Haswell)]

GPU Motivation: Peak Flops & Memory BW
[Figure: CPU versus GPU]

Add GPUs: Accelerate Applications

GPUs Enable Faster Deployment of Improved Algorithms
[Figure: schedule pull-in due to GPUs, comparing GPUs and CPUs across seismic imaging algorithms: KPrSTM, KPrSDM, Gaussian Beam, CAZ WEM (VTI), Shot WEM (TTI), Wave Equation, Reverse Time Migration, Elastic Waveform Inversion]

Accelerating Discoveries
Using a supercomputer powered by 3,000 Tesla processors, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, "the perfect target for fighting the infection." Without GPUs, the supercomputer would need to be 5x larger for similar performance.

Accelerating Insights
"Now You Can Build Google's $1M Artificial Brain on the Cheap"

                 Google Datacenter       Stanford AI Lab
  Servers        1,000 CPU servers       3 GPU-accelerated servers
  Processors     2,000 CPUs              12 GPUs
  Cores          16,000                  18,432
  Power          600 kW                  4 kW
  Cost           $5,000,000              $33,000

Source: Deep learning with COTS HPC systems, A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, ICML 2013.

Power Is the Problem
120 petaflops | 376 megawatts: enough to power all of San Francisco.

CSCS Piz Daint: Top 10 in Top500 and Green500

  Top500 rank   TFLOP/s     Site
  1             33,862.7    National Super Computer Centre Guangzhou
  2             17,590.0    Oak Ridge National Lab (#1 USA)
  3             17,173.2    DOE, United States
  4             10,510.0    RIKEN Advanced Institute for Computational Science
  5             8,586.6     Argonne National Lab
  6             6,271.0     Swiss National Supercomputing Centre (CSCS) (#1 Europe)
  7             5,168.1     University of Texas
  8             5,008.9     Forschungszentrum Juelich
  9             4,293.3     DOE, United States
  10            3,143.5     Government

  Green500 rank   MFLOPS/W    Site
  1               4,389.82    GSIC Center, Tokyo Tech
  2               3,631.70    Cambridge University
  3               3,517.84    University of Tsukuba
  4               3,459.46    SURFsara
  5               3,185.91    Swiss National Supercomputing (CSCS)
  6               3,131.06    ROMEO HPC Center
  7               3,019.72    CSIRO
  8               2,951.95    GSIC Center, Tokyo Tech
  9               2,813.14    Eni
  10              2,629.10    (Financial Institution)
  16              2,495.12    Mississippi State (top non-NVIDIA)
  59              1,226.60    ICHEC (top x86 cluster)

Piz Daint: 5,272 nodes with 1 x Xeon E5-2670 (SB) CPU and 1 x NVIDIA K20X GPU, Cray XC30 at 2.0 MW.

CUDA: World's Most Pervasive Parallel Programming Model
  • 700+ university courses in 62 countries
  • 14,000 institutions with CUDA developers
  • 2,000,000 CUDA downloads
  • 487,000,000 CUDA GPUs shipped

370 GPU-Accelerated Applications
www.nvidia.com/appscatalog

70% of Top HPC Apps Accelerated
Intersect360 survey of top apps
  • Top 10 HPC apps: 90% accelerated
  • Top 50 HPC apps: 70% accelerated
Top 25 apps in survey: GROMACS, LAMMPS, SIMULIA Abaqus, NWChem, NAMD, LS-DYNA, AMBER, Schrodinger, ANSYS Mechanical, Exelis IDL, Gaussian, MSC NASTRAN, GAMESS, ANSYS Fluent, ANSYS CFX, WRF, Star-CD, VASP, CCSM, OpenFOAM, COMSOL, CHARMM, Star-CCM+, Quantum Espresso, BLAST
[Figure legend: all popular functions accelerated; some popular functions accelerated; in development; not supported]
Source: Intersect360, Nov 2015, "HPC Application Support for GPU Computing".

CORAL: Built for Grand Scientific Challenges
  • Fusion Energy: role of material disorder, statistics, and fluctuations in nanoscale materials and systems
  • Climate Change: study climate change adaptation and mitigation scenarios; realistically represent detailed features
  • Biofuels: search for renewable and more efficient energy sources
  • Astrophysics: radiation transport, critical to astrophysics, laser fusion, atmospheric dynamics, and medical imaging
  • Combustion: combustion simulations to enable the next-gen diesel/bio-fuels to burn more efficiently
  • Nuclear Energy: unprecedented high-fidelity radiation transport calculations for nuclear energy applications

2. CPU versus GPU Architecture

Low Latency or High Throughput?
CPU
  • Optimised for low-latency access to cached data sets
  • Control logic for out-of-order and speculative execution
  • Tens of threads
GPU
  • Optimised for data-parallel, throughput computation
  • Architecture tolerant of memory latency
  • Massive fine-grained threaded parallelism
  • More transistors dedicated to computation
  • Tens of thousands of threads
[Figure: CPU die (control, ALUs, cache, DRAM) versus GPU die (DRAM)]

GPU Architecture: Two Main Components
Global memory
  • Analogous to RAM in a CPU server
  • Accessible by both GPU and CPU
  • Currently up to 12 GB per GPU
  • Bandwidth currently up to ~288 GB/s (Tesla products)
  • ECC on/off (Quadro and Tesla products)
Streaming Multiprocessors (SMs)
  • Perform the actual computations
  • Each SM has its own control units, registers, execution pipelines, and caches
[Figure: GPU block diagram with host interface, Giga Thread scheduler, L2 cache, and DRAM interfaces]

GPU Architecture
[Figure: SM-0, SM-1, ..., SM-N sharing an L2 cache inside the GPU, connected to system memory and GPU DRAM]

GPU Memory Hierarchy
  • Per SM (SM-0, SM-1, ..., SM-N): registers, L1 cache, and shared memory (SMEM), ~1 TB/s
  • L2 cache, shared by all SMs
  • Global memory, ~150 GB/s

Scientific Computing Challenge: Memory Bandwidth
NVIDIA Technology Solves Memory Bandwidth Challenges

  Level            Bandwidth
  Register         10.8 TB/s
  Shared memory    1.3 TB/s
  L2 cache         280 GB/s
  GPU memory       177 GB/s
  PCI-Express      6.4 GB/s

Kepler GK110 Block Diagram
Architecture
  • 7.1B transistors
  • 15 SMX units
  • > 1 TFLOP FP64
  • 1.5 MB L2 cache
  • 384-bit GDDR5

GPU SM Architecture (Kepler SM)
  • Functional units = CUDA cores
      192 SP FP operations/clock
      64 DP FP operations/clock
  • Register file (256KB)
  • Shared memory (16-48KB)
  • L1 cache (16-48KB)
  • Read-only cache (48KB)
  • Constant cache (8KB)

SIMT Execution Model
  • Thread: sequential execution unit
      All threads execute the same sequential program
      Threads execute in parallel
  • Thread block: a group of threads
      Threads within a block can cooperate
      Light-weight synchronization
      Data exchange
  • Grid: a collection of thread blocks
      Thread blocks do not synchronize with each other
      Communication between blocks is expensive

SIMT Execution Model: Software and Hardware
  • Thread -> CUDA core: threads are executed by CUDA cores
  • Thread block -> multiprocessor: thread blocks are executed on multiprocessors
      Thread blocks do not migrate
      Several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file)
  • Grid -> device: a kernel is launched as a grid of thread blocks

SIMT Execution Model: Warps
  • Threads are organized into groups of 32 threads called "warps"
  • All threads within a warp execute the same instruction simultaneously

Simple Processing Flow
  1. Copy input data from CPU memory/NIC to GPU memory (over the PCI bus)
  2. Load GPU program and execute
  3. Copy results from GPU memory to CPU memory/NIC

Accelerator Fundamentals
  • We must expose enough parallelism to saturate the device
      Accelerator threads are slower than CPU threads
      Accelerators have orders of magnitude more threads
  • Fine-grained parallelism is good
  • Coarse-grained parallelism is bad
[Figure: thread-to-data mappings for threads t0 to t15, contrasting fine-grained and coarse-grained parallelism]

Best Practices: Optimize Data Locality (GPU)
  • Minimize data transfers between CPU and GPU
[Figure: system memory and GPU memory]

Best Practices: Optimize Data Locality (SM)
  • Minimize redundant accesses to L2 and DRAM
      Store intermediate results in registers instead of global memory
      Use shared memory for data frequently used within a thread block
      Use const __restrict__ to take advantage of the read-only cache
[Figure: SM, L2 cache, and GPU DRAM]

3 Ways to Accelerate Applications

  Libraries           Compiler Directives   Programming Languages
  Easy to use         Easy to use           Most performance
  Most performance    Portable code         Most flexibility

Resources
Learn more about GPUs
  • CUDA resource center: http://docs.nvidia.com/cuda
  • GTC on-demand and
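
The three-step processing flow, the grid/block/thread hierarchy, and the const __restrict__ advice from the slides can be sketched in a few lines of CUDA C++. This is only an illustrative sketch, not code from the presentation: the kernel name saxpy, the helpers check and saxpy_mismatches, the problem size, and the block size of 256 are all assumptions chosen for the example. Only standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaGetLastError, cudaFree) are used.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: stop with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One thread per element (fine-grained parallelism).
// const __restrict__ marks x as read-only and non-aliased, which allows the
// compiler to use the read-only cache on GPUs that provide one.
__global__ void saxpy(int n, float a,
                      const float* __restrict__ x, float* __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last block
        y[i] = a * x[i] + y[i];
    }
}

// Runs y = 3*x + y on the GPU with x = 1 and y = 2, then counts the
// elements that differ from the exact result 5.
int saxpy_mismatches(int n)
{
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float* dx = nullptr;
    float* dy = nullptr;
    check(cudaMalloc(&dx, bytes), "cudaMalloc(dx)");
    check(cudaMalloc(&dy, bytes), "cudaMalloc(dy)");

    // Step 1: copy input data from CPU memory to GPU memory.
    check(cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    // Step 2: launch the kernel as a grid of thread blocks.
    const int threadsPerBlock = 256;  // a multiple of the 32-thread warp size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    saxpy<<<blocks, threadsPerBlock>>>(n, 3.0f, dx, dy);
    check(cudaGetLastError(), "kernel launch");

    // Step 3: copy results from GPU memory back to CPU memory.
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    check(cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost), "copy back");

    check(cudaFree(dx), "cudaFree(dx)");
    check(cudaFree(dy), "cudaFree(dy)");

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hy[i] != 5.0f) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int mismatches = saxpy_mismatches(1 << 20);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Assuming a CUDA toolkit and a CUDA-capable GPU are available, this should build with something like nvcc saxpy.cu -o saxpy. The two host-to-device copies and the copy back are exactly the transfers the best-practice slide says to minimize, so a real application would try to keep data resident on the GPU across several kernel launches rather than copying around every call.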