The Evolution of Modern Parallel Computing
Sanford H. Russell
Director of CUDA Marketing, NVIDIA Corporation
Taipei | May 19, 2011

GPU Computing
Is not CPU vs. GPU
It is CPU + GPU

GPU Computing
Application Code
GPU: Only Critical Functions. Parallelize using CUDA Programming Model
+
CPU: Rest of Sequential CPU Code

GPU Computing Milestones
2002: GPGPU. Programming on top of OGL
2007: G80. First Parallel Computing Architecture (Single Precision)
2007: NVIDIA C Compiler. C SDK for GPU

GPU Computing Milestones
2009: 1st True HPC Class GPU. Fermi Architecture: DP, ECC and C++ support
2010: Industry Standard IDE. Parallel Nsight for Microsoft Visual Studio

Tesla GPUs Power 3 of Top 5 Supercomputers
#1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
#3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
#4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS

Tesla in 3 of Top 5 Supercomputers
[Chart: Performance. Y-axis: Gigaflops, 0 to 2500. Systems: Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

Tesla Best Performance/Watt
[Chart: Performance and Power. Left axis: Gigaflops, 0 to 2500. Right axis: Megawatts, 0 to 8. Systems: Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

World’s Fastest HPC Processor
Tesla M2090: The 512 Core Fermi
512 CUDA Cores
665 GFlops
178 GB/s memory B/W

Industry and Research Partners
Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance modeling
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics

Simulating Quarks - Lattice QCD Simulation
Professor Ting-Wei Chiu
Department of Physics, National Taiwan University
15 Tflops for $200,000
Lattice QCD: “Performance of Blue Gene/L at 1% the Cost”

Abaqus: Accelerated by CUDA
Faster = Better Quality
Engine Block s4b, 5 Million Degrees of Freedom
2x Faster
Simulate More Scenarios
More Fuel Efficient Engine
Lower CO2 Emissions

The Future

ARM is Pervasive and Open … and supported by Microsoft
[Chart: Annual Shipments, ARM vs. x86. Y-axis: Units in Billions, 0 to 9. X-axis: 2005 to 2014e, with 2010 onward estimated]
Source: ARM, Mercury Research, NVIDIA

Project Denver
NVIDIA-Designed High Performance ARM Core

CUDA GPU Roadmap
[Chart: DP GFLOPS per Watt, 0 to 16. X-axis: 2007 to 2013. Architectures in ascending order: Tesla, Fermi, Kepler, Maxwell]

Summary and Call to Action
Heterogeneous Computing has achieved commercial volume
CPU + GPU
Developers: Developing on NVIDIA-based systems allows you to scale your work onto heterogeneous clusters and supercomputers
Scientists: Publish your work, share with your peers
Industry: Buy systems that are heterogeneous-capable