A BRIEF HISTORY OF GPGPU
Mark Harris
Chief Technologist, GPU Computing
UNC Ph.D. 2003

GPGPU: General-Purpose computation on Graphics Processing Units
THE FIRST GPGPU: IKONAS RDS-3000
1978: Nick England & Mary Whitton founded Ikonas Graphics Systems
Tim Van Hook wrote microcode for solid modeling, ray tracing (SIGGRAPH ’86)
From a 1985 video: “All computation is taking place in the Adage 3000 Display”
http://www.virhistory.com/ikonas/ikonas.html

UNC PIXEL PLANES AND PIXELFLOW
Procedural textures on Pixel Planes 5 (Rhodes et al. 1992)
Proceedings of the 1992 Symposium on Interactive 3D Graphics, Cambridge, Massachusetts, 29 March - 1 April 1992
PixelFlow: 100,000+ 100 MHz 8-bit processors
Early real-time programmable shading (Olano/Lastra ‘98)
Kedem et al. (‘98) used PixelFlow for Unix password cracking
GEFORCE 1-3: THE DAWN OF GPGPU (‘99-’01)
GeForce 256: First “GPU”
GeForce 3: First programmable GPU
Vertex Shaders – programmable vertex transforms, 32-bit float
Data-dependent, configurable texturing + register combiners
Enabled early GPGPU results
Hoff (1999) – Voronoi diagrams on NVIDIA TNT2
Larsen & McAllister (2001): first GPU matrix multiplication (8-bit)
Rumpf & Strzodka (2001): first GPU PDEs (diffusion, image segmentation)
NVIDIA SDK Game of Life, Shallow Water (Greg James, 2001)
PHYSICALLY BASED SIMULATION ON GEFORCE 3
Approximate simulation of natural phenomena
Boiling liquid, fluid convection, chemical reaction-diffusion
Inaccurate due to low GPU precision
“Physically-Based Visual Simulation on Graphics Hardware”. Harris, Coombe, Scheuermann, and Lastra. Graphics Hardware 2002

NAMING A TREND
“Application of graphics hardware to non-graphics applications”
“General computations on graphics hardware”
“Exploiting special-purpose hardware for alternative purposes”
Let’s name this thing that people are doing!
November 2002: I coined “GPGPU” and created a home page, a home on the web to collect research / resources
Interest grew quickly: launched GPGPU.org August 2003
GEFORCE FX (2003): FLOATING POINT PIXELS
True programmability enabled broader simulation research
Ray Tracing (Purcell, 2002), Photon Maps (Purcell, 2003)
Radiosity (Carr et al., 2003 & Coombe et al., 2004)
PDE solvers
Red-black Gauss-Seidel (Harris et al., 2003)
Conjugate gradient (Bolz et al. 2003, Krueger et al. 2003)
Multigrid (Goodnight et al. 2003)
Physically-based simulation
Fluid and cloud simulation: (Krueger et al. 2003, Harris et al. 2003)
Cloth simulation (Green, 2003)
Ice crystal formation (Kim and Lin, 2003)
FFT (Moreland and Angel, 2003)
High-level language: Brook for GPUs (Buck et al. 2004)
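Of the solvers listed above, red-black Gauss-Seidel shows most directly why these methods suited pixel hardware: the grid is colored like a checkerboard, every red cell has only black neighbors, and so all cells of one color can be updated independently in a single pass. The following is a minimal CPU sketch in plain C++, not the shader code from the cited papers. The function names and the Laplace test problem (one boundary edge held at a fixed value, the others at zero) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One red-black Gauss-Seidel sweep for the 2D Laplace equation on an n x n grid.
// Boundary cells hold fixed (Dirichlet) values and are never written.
// Returns the largest change applied to any cell during the sweep.
double red_black_sweep(std::vector<double>& u, int n) {
    assert(n >= 3 && u.size() == static_cast<std::size_t>(n) * n);
    double max_delta = 0.0;
    for (int color = 0; color < 2; ++color) {        // 0 = "red" pass, 1 = "black" pass
        for (int y = 1; y < n - 1; ++y) {
            for (int x = 1; x < n - 1; ++x) {
                if ((x + y) % 2 != color) continue;  // skip cells of the other color
                // All four neighbors have the opposite color, so the updates
                // within one pass do not depend on each other.
                double updated = 0.25 * (u[y * n + x - 1] + u[y * n + x + 1] +
                                         u[(y - 1) * n + x] + u[(y + 1) * n + x]);
                max_delta = std::max(max_delta, std::fabs(updated - u[y * n + x]));
                u[y * n + x] = updated;
            }
        }
    }
    return max_delta;
}

// Hypothetical driver: row y = 0 is held at top_value, all other boundaries at 0.
// Sweeps until the largest per-sweep change drops below tolerance.
std::vector<double> solve_laplace(int n, double top_value, double tolerance, int max_sweeps) {
    std::vector<double> u(static_cast<std::size_t>(n) * n, 0.0);
    for (int x = 0; x < n; ++x) u[x] = top_value;
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        if (red_black_sweep(u, n) < tolerance) break;
    }
    return u;
}
```

Calling `solve_laplace(17, 1.0, 1e-10, 10000)` and reading the center cell should give a value close to 0.25, which follows from the symmetry of the problem. On a GPU, each color pass would correspond to one rendering pass over half of the pixels.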
GPU CLOUD SIMULATION
My Ph.D. dissertation: visually realistic cloud simulation on GPUs
2D & 3D incompressible Navier-Stokes fluid
Thermodynamics (latent heat, diffusion)
Water condensation / evaporation
Light scattering simulation for rendering
Programmed in OpenGL with pixel shaders
“Real-Time Cloud Simulation and Rendering”. Mark Harris. Ph.D. Dissertation, U. of North Carolina, 2003

CUDA AND THE G80 GPU (2006)
First GPU arch. and software platform designed for computing
Dedicated computing mode – threads rather than pixels/vertices
General, byte-addressable memory architecture
First C/C++ language and compiler for GPUs
CUDA C++ defines minimally extended subset of C++ with parallelism
2007 began a massive surge in GPGPU development
Not just graphics PhDs
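The move from pixels and vertices to threads can be illustrated without a GPU. The sketch below is plain C++ that emulates a one-dimensional grid of thread blocks on the CPU by looping over block and thread indices. It is not CUDA code: `saxpy_thread` and `launch_saxpy` are hypothetical names, and in real CUDA the indices come from the built-ins `blockIdx`, `blockDim`, and `threadIdx` while the loops are replaced by a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one logical thread: y[i] = a * x[i] + y[i].
// The (block, thread) pair stands in for CUDA's blockIdx.x and threadIdx.x.
void saxpy_thread(std::size_t block, std::size_t thread, std::size_t block_size,
                  float a, const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = block * block_size + thread;  // global thread index
    if (i < x.size()) {                           // threads past the end of the data do nothing
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical CPU stand-in for a 1D kernel launch: runs every thread of
// every block, one after another. On a GPU these would run in parallel.
void launch_saxpy(std::size_t block_size, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    assert(block_size > 0 && x.size() == y.size());
    std::size_t num_blocks = (x.size() + block_size - 1) / block_size;  // round up
    for (std::size_t block = 0; block < num_blocks; ++block) {
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            saxpy_thread(block, thread, block_size, a, x, y);
        }
    }
}
```

Each thread computes its own element index and reads or writes arbitrary addresses, which is the contrast with the earlier shader model, where a computation had to be phrased as drawing geometry into a render target.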
ACCELERATING DISCOVERIES
Using a supercomputer powered by 3,000 Tesla processors, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.”
Without GPUs, the supercomputer would need to be 5x larger for similar performance.
FROM HPC TO ENTERPRISE DATACENTERS
Oil & Gas, Higher Ed, Government, Supercomputing, Finance, Consumer Web
Air Force Research Laboratory
Tokyo Institute of Technology
Naval Research Laboratory
MACHINE LEARNING USING DEEP NEURAL NETWORKS
[Figure: deep neural network, Input → Result]
Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009
Visual Object Recognition Using Deep Convolutional Neural Networks, Rob Fergus (New York University / Facebook)
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985

GPU-ACCELERATED DEEP LEARNING START-UPS
Image Detection
Face Recognition
Gesture Recognition
Video Search & Analytics
Speech Recognition & Translation
Image and Video Understanding
Recommendation Engines
Indexing & Search
COMMON PROGRAMMING APPROACHES
Across Heterogeneous Architectures
Libraries: AmgX, cuBLAS
Compiler Directives
Programming Languages
x86
UNIFIED MEMORY: DRAMATICALLY LOWER DEVELOPER EFFORT
Past developer view: System Memory, GPU Memory
Developer view with Unified Memory: Unified Memory
PARALLELISM IN MAINSTREAM LANGUAGES
Enable more programmers to write parallel software
Give programmers the choice of language to use
Parallel computing support in key languages
C
FUTURE C++: PARALLEL STL
std::vector
// previous standard sequential loop
std::for_each(vec.begin(), vec.end(), f);

// explicitly sequential loop
std::for_each(std::seq, vec.begin(), vec.end(), f);

// permitting parallel execution
std::for_each(std::par, vec.begin(), vec.end(), f);
Complete set of parallel primitives: for_each, sort, reduce, scan, etc.
N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4352.html
ISO C++ committee voted unanimously to accept as official tech. specification working draft
Prototype: https://github.com/n3554/n3554
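The primitives named on this slide were later standardized in C++17, where the policies are spelled `std::execution::seq` and `std::execution::par` (header `<execution>`) rather than `std::seq` and `std::par`. The sketch below exercises for_each, sort, reduce, and scan through their sequential overloads so that it builds without a parallel backend. `PrimitiveResults` and `run_primitives` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Results of applying each primitive to the same input.
struct PrimitiveResults {
    std::vector<int> squared;      // for_each: element-wise update in place
    std::vector<int> sorted;       // sort
    int total = 0;                 // reduce
    std::vector<int> prefix_sums;  // scan (inclusive)
};

// With a C++17 parallel backend available, std::execution::par could be
// passed as the first argument to each algorithm call below.
PrimitiveResults run_primitives(const std::vector<int>& values) {
    PrimitiveResults r;

    r.squared = values;
    std::for_each(r.squared.begin(), r.squared.end(), [](int& v) { v *= v; });

    r.sorted = values;
    std::sort(r.sorted.begin(), r.sorted.end());

    // std::reduce may regroup and reorder the additions, which is what allows
    // a parallel implementation; integer addition tolerates that.
    r.total = std::reduce(values.begin(), values.end(), 0);

    r.prefix_sums.resize(values.size());
    std::inclusive_scan(values.begin(), values.end(), r.prefix_sums.begin());

    // Sanity check: the last inclusive prefix sum equals the reduction.
    assert(r.prefix_sums.empty() || r.prefix_sums.back() == r.total);
    return r;
}
```

For the input `{3, 1, 2}` this yields squares `{9, 1, 4}`, sorted values `{1, 2, 3}`, a total of 6, and prefix sums `{3, 4, 6}`. The design point of the proposal is visible in the call shapes: the algorithm names and arguments stay the same, and only the leading policy argument changes what execution is permitted.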
PARTING WORDS OF WISDOM
Stand Up!
“Keep it narrow and doable” – Fred Brooks
“Write a little bit every day” – Fred Brooks
“If you measure it, you can improve it” – Jen-Hsun Huang
“But you have to measure the right thing!”
1999: ADVENT OF THE GPU
NVIDIA GeForce 256
Coined the term “Graphics Processing Unit”
“A single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.”
Register Combiners
Configurable multipass shading
Beginning of GPU programmability
[Figure: register combiner diagram. Texture fetch and a register set (fragment color, specular color, fog color/factor, texture 0, texture 1, spare 0) feed General Combiner 0 and General Combiner 1 (each: 4 RGB inputs, 4 alpha inputs, 3 RGB outputs, 3 alpha outputs), followed by the Final Combiner (6 RGB inputs, 1 alpha input)]

EARLY PC & WORKSTATION GRAPHICS
Rasterizer, texture unit, z-buffer, frame buffer
Fixed-point math, fixed-function interpolation / texturing
Lengyel et al. (1990) – Robot motion planning
Use rasterizer to fill Minkowski sum polygons
HP 9000 TurboSRX Workstation
Hoff (1999) – Voronoi diagrams on NVIDIA TNT2
Render cones – rasterizer and z-buffer compute Voronoi diagram
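The cone trick works because a cone with its apex at a site has, over every pixel, a depth equal to the distance from that site. Drawing one cone per site with the depth test enabled therefore leaves each pixel colored by its nearest site. The following plain C++ sketch emulates this on the CPU with an explicit depth buffer. It is an illustration under stated assumptions, not Hoff's implementation: `Site` and `voronoi_by_depth_test` are hypothetical names, and the cone is evaluated exactly per pixel rather than rasterized from a triangle mesh.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Site {
    double x;
    double y;
};

// Returns, for each pixel of a width x height image (row-major), the index
// of the site whose cone wins the depth test, or -1 if there are no sites.
std::vector<int> voronoi_by_depth_test(const std::vector<Site>& sites, int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t pixel_count = static_cast<std::size_t>(width) * height;
    // "Clear" the z-buffer to the far plane and the color buffer to no owner.
    std::vector<double> depth(pixel_count, std::numeric_limits<double>::infinity());
    std::vector<int> owner(pixel_count, -1);

    for (std::size_t s = 0; s < sites.size(); ++s) {  // one "draw call" per cone
        for (int py = 0; py < height; ++py) {
            for (int px = 0; px < width; ++px) {
                // Depth of the cone over this pixel = distance to the site.
                double cone_depth = std::hypot(px - sites[s].x, py - sites[s].y);
                std::size_t idx = static_cast<std::size_t>(py) * width + px;
                if (cone_depth < depth[idx]) {        // standard less-than depth test
                    depth[idx] = cone_depth;
                    owner[idx] = static_cast<int>(s); // site index stands in for the cone's color
                }
            }
        }
    }
    return owner;
}
```

With two sites at (0, 0) and (7, 0) on an 8 x 1 image, pixels 0 through 3 are assigned to the first site and pixels 4 through 7 to the second. On fixed-function hardware the inner loops are exactly what the rasterizer and z-buffer already do, which is why no programmability was needed.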