The Evolution of Modern Parallel Computing
Sanford H. Russell, Director of CUDA Marketing, NVIDIA Corporation
Taipei | May 19, 2011
GPU Computing
It is not CPU vs. GPU
It is CPU + GPU
GPU Computing
Application Code
GPU: only the critical functions, parallelized using the CUDA programming model
CPU: the rest of the sequential code
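The split described on this slide (critical functions on the GPU, everything else left on the CPU) can be reasoned about with Amdahl's law. The sketch below is an illustration only, not something taken from the slides: the function name `amdahl_speedup`, the fractions, and the 20x kernel speedup are hypothetical values, and the model ignores host-to-device transfer time.

```python
def amdahl_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law).

    offloaded_fraction: share of the original runtime spent in the functions
                        moved to the GPU (0.0 to 1.0).
    kernel_speedup:     how much faster those functions run on the GPU.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    cpu_time = 1.0 - offloaded_fraction              # sequential code that stays on the CPU
    gpu_time = offloaded_fraction / kernel_speedup   # accelerated critical functions
    return 1.0 / (cpu_time + gpu_time)


# Hypothetical inputs: the same 20x kernel speedup applied to different runtime shares.
for fraction in (0.5, 0.9, 0.99):
    print(f"offloaded {fraction:.0%}: overall speedup {amdahl_speedup(fraction, 20.0):.2f}x")
```

Under these assumptions the overall gain is bounded by the code left on the CPU, which is why the functions worth porting are the ones that dominate the runtime.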
GPU Computing Milestones
2002: GPGPU. Programming on top of OpenGL (OGL).
2007: G80. First parallel computing architecture (single precision).
2007: NVIDIA C compiler and C SDK for the GPU.
GPU Computing Milestones (continued)
2009: Fermi architecture. First true HPC-class GPU, with DP, ECC and C++ support.
2010: Parallel Nsight for Microsoft Visual Studio. Industry-standard IDE.
Tesla GPUs Power 3 of Top 5 Supercomputers
#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS
Tesla in 3 of Top 5 Supercomputers
[Bar chart: performance of Tianhe-1A, Jaguar, Nebulae, Tsubame and Hopper II; vertical axis labeled Gigaflops, 0 to 2500]
Tesla Best Performance/Watt
[Chart: performance (left axis, Gigaflops, 0 to 2500) and power (right axis, Megawatts, 0 to 8) for Tianhe-1A, Jaguar, Nebulae, Tsubame and Hopper II]
World’s Fastest HPC Processor
Tesla M2090: The 512 Core Fermi
512 CUDA Cores
665 GFLOPS peak (double precision)
178 GB/s memory bandwidth
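The two headline figures above can be combined into a rough roofline estimate. The sketch below assumes that 665 GFlops is the double-precision peak and that 178 GB/s is the peak device memory bandwidth. The helper `attainable_gflops` and the AXPY-style example kernel are illustrative choices, not part of the slides.

```python
PEAK_GFLOPS = 665.0     # from the slide (assumed to be double-precision peak)
BANDWIDTH_GBS = 178.0   # from the slide (device memory bandwidth)


def attainable_gflops(flops_per_byte: float) -> float:
    """Simple roofline bound: limited by compute peak or by memory traffic."""
    return min(PEAK_GFLOPS, flops_per_byte * BANDWIDTH_GBS)


# Machine balance: FLOPs a kernel must perform per byte moved to reach peak.
balance = PEAK_GFLOPS / BANDWIDTH_GBS

# Hypothetical kernel: y[i] = a * x[i] + y[i] in double precision.
# 2 FLOPs per element; 24 bytes moved (read x, read y, write y).
axpy_intensity = 2.0 / 24.0

print(f"machine balance: {balance:.2f} FLOPs per byte")
print(f"AXPY-style bound: {attainable_gflops(axpy_intensity):.1f} GFLOPS")
```

With these numbers, a streaming kernel of this kind is limited by memory bandwidth well below the compute peak. Only kernels performing more than roughly 3.7 FLOPs per byte moved could approach 665 GFLOPS.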
Industry and Research Partners
Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Simulation
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance Modeling
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics
Simulating Quarks - Lattice QCD Simulation
Professor Ting-Wei Chiu, Department of Physics, National Taiwan University
Lattice QCD: 15 Tflops for $200,000
“Performance of Blue Gene/L at 1% the Cost”
Abaqus: Accelerated by CUDA
Faster = Better Quality
Engine block model (s4b), 5 million degrees of freedom
2x faster
Simulate more scenarios
More fuel-efficient engine
Lower CO2 emissions
The Future
ARM is Pervasive and Open … and supported by Microsoft
[Chart: annual shipments of ARM and x86 processors, in billions of units (0 to 9), 2005 to 2014e]
Source: ARM, Mercury Research, NVIDIA
Project Denver
NVIDIA-Designed High Performance ARM Core
CUDA GPU Roadmap
[Chart: DP GFLOPS per Watt (0 to 16) by year (2007 to 2013), in ascending order: Tesla, Fermi, Kepler, Maxwell]
Summary and Call to Action
Heterogeneous computing (CPU + GPU) has achieved commercial volume
Developers: Developing on NVIDIA-based systems allows you to scale your work onto heterogeneous clusters and supercomputers
Scientists: Publish your work and share it with your peers
Industry: Buy systems that are heterogeneous-capable