The Evolution of Modern Parallel Computing

Sanford H. Russell, Director of CUDA Marketing, NVIDIA Corporation

Taipei | May 19, 2011

GPU Computing

Is not CPU vs. GPU

It is CPU + GPU

GPU Computing

Application Code

GPU: Only Critical Functions (Parallelize using CUDA Programming Model)

CPU: Rest of Sequential CPU Code
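When only the critical functions move to the GPU, the overall gain depends on how much of the runtime those functions account for, because the sequential remainder still runs on the CPU. A minimal sketch of that relationship (Amdahl's law) follows. The function name and the example values (90% of runtime in critical functions, accelerated 20x) are illustrative assumptions, not figures from the slides.

```python
def overall_speedup(critical_fraction: float, gpu_speedup: float) -> float:
    """Whole-application speedup when only part of the runtime is accelerated.

    critical_fraction: share of original runtime spent in the functions
                       moved to the GPU (0.0 to 1.0), hypothetical input.
    gpu_speedup:       how much faster those functions run on the GPU,
                       hypothetical input.
    """
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    sequential_fraction = 1.0 - critical_fraction  # stays on the CPU
    return 1.0 / (sequential_fraction + critical_fraction / gpu_speedup)


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    print(f"{overall_speedup(0.9, 20.0):.2f}x overall")
```

The sketch suggests why profiling comes first: the larger the share of runtime in the offloaded functions, the closer the application gets to the GPU's raw speedup.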

GPU Computing Milestones

2002: GPGPU. Programming on top of OGL
2007: G80. First Parallel Computing Architecture (Single Precision)
2007: NVIDIA C Compiler. C SDK for GPU

GPU Computing Milestones

2009: 1st True HPC Class GPU. Fermi Architecture: DP, ECC and C++ support
2010: Industry Standard IDE. Parallel Nsight for Microsoft Visual Studio

Tesla GPUs Power 3 of Top 5

#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS
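To compare the three systems on a common footing, the slide's figures can be reduced to system performance per installed GPU. The sketch below does only that arithmetic. The PFLOPS values are whole-system figures as listed on the slide, so the CPUs in each machine also contribute, and the result overstates what a single GPU delivers. The names `SYSTEMS` and `tflops_per_gpu` are illustrative.

```python
# (rank, system, Tesla GPU count, system PFLOPS), as listed on the slide.
SYSTEMS = [
    (1, "Tianhe-1A", 7168, 2.5),
    (3, "Nebulae", 4650, 1.2),
    (4, "Tsubame 2.0", 4224, 1.194),
]


def tflops_per_gpu(gpu_count: int, system_pflops: float) -> float:
    """Whole-system performance divided by GPU count, in TFLOPS per GPU.

    This is a rough ratio only: CPU contributions are included in the
    numerator, so it is an upper bound on the per-GPU share.
    """
    return system_pflops * 1000.0 / gpu_count


if __name__ == "__main__":
    for rank, name, gpus, pflops in SYSTEMS:
        ratio = tflops_per_gpu(gpus, pflops)
        print(f"#{rank} {name}: {ratio:.3f} TFLOPS of system performance per GPU")
```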

Tesla in 3 of Top 5 Supercomputers

[Chart: Performance (axis labeled Gigaflops, 0 to 2500) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

Tesla Best Performance/Watt

[Chart: Performance (axis labeled Gigaflops, 0 to 2500) and Power (Megawatts, 0 to 8) for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

World’s Fastest HPC Processor

Tesla M2090: The 512 Core Fermi

512 CUDA Cores

665 GFlops

178 GB/s memory B/W
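The three M2090 figures above can be combined into two simple ratios: peak throughput per core, and peak arithmetic per byte of memory traffic. The sketch below is plain arithmetic on the slide's numbers. The constant and function names are illustrative, and the ratios describe peak rates, not measured application performance.

```python
# Peak figures as listed on the slide for the Tesla M2090.
CUDA_CORES = 512
PEAK_GFLOPS = 665.0
MEMORY_BANDWIDTH_GB_S = 178.0


def gflops_per_core(peak_gflops: float, cores: int) -> float:
    """Peak GFLOPS divided evenly across the CUDA cores."""
    return peak_gflops / cores


def flops_per_byte(peak_gflops: float, bandwidth_gb_s: float) -> float:
    """Peak floating-point operations available per byte moved from memory.

    A kernel that performs fewer operations per byte than this ratio is
    likely limited by memory bandwidth rather than by arithmetic.
    """
    return peak_gflops / bandwidth_gb_s


if __name__ == "__main__":
    print(f"{gflops_per_core(PEAK_GFLOPS, CUDA_CORES):.2f} GFLOPS per core")
    print(f"{flops_per_byte(PEAK_GFLOPS, MEMORY_BANDWIDTH_GB_S):.2f} FLOPs per byte")
```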

Industry and Research Partners

Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance modeling
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics

Simulating Quarks - Lattice QCD Simulation

Professor Ting-Wei Chiu
Department of Physics, National Taiwan University

15 Tflops for $200,000

Lattice QCD

“Performance of Blue Gene/L at 1% the Cost”
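The cost claim on this slide can be restated as a price per Tflops. The sketch below works through that arithmetic using the slide's two numbers (15 Tflops, $200,000) and the quoted 1% ratio. The implied reference cost is only what the quote implies arithmetically, not a figure stated in the slides, and all names are illustrative.

```python
# Figures taken from the slide.
SYSTEM_TFLOPS = 15.0
SYSTEM_COST_USD = 200_000.0
QUOTED_COST_RATIO = 0.01  # "at 1% the Cost"


def cost_per_tflops(cost_usd: float, tflops: float) -> float:
    """Dollars spent per Tflops of performance."""
    return cost_usd / tflops


def implied_reference_cost(cost_usd: float, ratio: float) -> float:
    """Cost of the comparison system implied by the quoted ratio.

    Derived from the quote alone; the slides give no actual price for
    the comparison system.
    """
    return cost_usd / ratio


if __name__ == "__main__":
    per_tflops = cost_per_tflops(SYSTEM_COST_USD, SYSTEM_TFLOPS)
    implied = implied_reference_cost(SYSTEM_COST_USD, QUOTED_COST_RATIO)
    print(f"About ${per_tflops:,.0f} per Tflops")
    print(f"Implied comparison cost: about ${implied:,.0f}")
```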

Abaqus: Accelerated by CUDA

Faster = Better Quality

Engine Block (s4b): 5 Million Degrees of Freedom

Simulate More Scenarios
2x Faster
More Fuel Efficient Engine
Lower CO2 Emissions

The Future

ARM is Pervasive and Open … and supported by Microsoft

[Chart: Annual Shipments (ARM), Units in Billions (axis 0 to 9), 2005 to 2014e]

Source: ARM, Mercury Research, NVIDIA

Project Denver

NVIDIA-Designed High Performance ARM Core

CUDA GPU Roadmap

[Chart: DP GFLOPS per Watt (axis 0 to 16) versus year (2007 to 2013) for the Tesla, Fermi, Kepler, and Maxwell architectures, in ascending order]

Summary and Call to Action

Heterogeneous Computing (CPU + GPU) has achieved commercial volume

Developers: Developing on NVIDIA-based systems allows you to scale your work onto Heterogeneous Clusters and Supercomputers

Scientists: Publish your work, share with your peers

Industry: Buy systems that are heterogeneous-capable