3-1-11: Master Deck- Please Save to Your Desktop

Slide 1: The Evolution of Modern Parallel Computing
Sanford H. Russell, Director of CUDA Marketing, NVIDIA Corporation
Taipei | May 19, 2011

Slide 2: GPU Computing
Is not CPU vs. GPU. It is CPU + GPU.

Slide 3: GPU Computing
Application Code:
  • GPU: Only Critical Functions. Parallelize using CUDA Programming Model.
  • CPU: Rest of Sequential CPU Code.

Slide 4: GPU Computing Milestones
  • 2002: GPGPU. Programming on top of OGL.
  • 2007: G80. First Parallel Computing Architecture (Single Precision).
  • 2007: NVIDIA C Compiler. C SDK for GPU.

Slide 5: GPU Computing Milestones
  • 2009: 1st True HPC Class GPU. Fermi Architecture: DP, ECC and C++ support.
  • 2010: Industry Standard IDE. Parallel Nsight for Microsoft Visual Studio.

Slide 6: Tesla GPUs Power 3 of Top 5 Supercomputers
  • #1: Tianhe-1A, 7168 Tesla GPUs, 2.5 PFLOPS
  • #3: Nebulae, 4650 Tesla GPUs, 1.2 PFLOPS
  • #4: Tsubame 2.0, 4224 Tesla GPUs, 1.194 PFLOPS

Slide 7: Tesla in 3 of Top 5 Supercomputers
[Chart: Performance in Gigaflops for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

Slide 8: Tesla Best Performance/Watt
[Chart: Gigaflops and Power in Megawatts for Tianhe-1A, Jaguar, Nebulae, Tsubame, Hopper II]

Slide 9: World’s Fastest HPC Processor
Tesla M2090: The 512 Core Fermi
  • 512 CUDA Cores
  • 665 GFlops
  • 178 GB/s memory B/W

Slide 10: Industry and Research Partners
  • Oil and gas: Reverse Time Migration, Kirchhoff Time Migration, Reservoir Sim
  • Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate Modeling
  • Government: Signal Processing, Satellite Imaging, Video Analytics, Synthetic Aperture Radar
  • Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Sequence Analysis, Genomics
  • Finance: Risk Analytics, Monte Carlo Options Pricing, Insurance modeling
  • Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Machine Vision, Electromagnetics

Slide 11: Simulating Quarks - Lattice QCD Simulation
Professor Ting-Wei Chiu, Department of Physics, National Taiwan University
Lattice QCD: 15 Tflops for $200,000
“Performance of Blue Gene/L at 1% the Cost”

Slide 12: Abaqus: Accelerated by CUDA
Faster = Better Quality
Engine Block s4b, 5 Million Degrees of Freedom
  • Simulate More Scenarios: 2x Faster
  • More Fuel Efficient Engine
  • Lower CO2 Emissions

Slide 13: The Future

Slide 14: ARM is Pervasive and Open … and supported by Microsoft
[Chart: Annual Shipments, Units in Billions, ARM vs. x86, 2005 to 2014e. Source: ARM, Mercury Research, NVIDIA]

Slide 15: Project Denver
NVIDIA-Designed High Performance ARM Core

Slide 16: CUDA GPU Roadmap
[Chart: DP GFLOPS per Watt, with Tesla, Fermi, Kepler and Maxwell plotted in that order against 2007, 2009, 2011, 2013]

Slide 17: Summary and Call to Action
Heterogeneous Computing has achieved commercial volume: CPU + GPU
  • Developers: Developing on NVIDIA based systems allows you to scale your work onto Heterogeneous Clusters and Supercomputers
  • Scientist: Publish your work, share with your peers
  • Industry: Buy systems that are heterogeneous capable
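
The "CPU + GPU" programming model in the deck (the sequential rest of the application stays on the CPU, and only critical functions are parallelized using CUDA) can be illustrated with a minimal sketch. This example is not from the deck. The SAXPY operation, the kernel name `saxpy`, the helper `saxpy_on_gpu`, and the launch configuration of 256 threads per block are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// The "critical function", parallelized for the GPU: y[i] = a * x[i] + y[i].
// (Illustrative choice of operation; not taken from the deck.)
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical host-side helper: copies the inputs to the GPU, launches the
// kernel, and copies the result back. Returns false on any CUDA error.
bool saxpy_on_gpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    if (x.size() != y.size()) return false;
    const int n = static_cast<int>(x.size());
    if (n == 0) return true;
    const size_t bytes = n * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                         // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up to cover n
        saxpy<<<blocks, threads>>>(n, a, d_x, d_y);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // freeing a null pointer is a no-op
    cudaFree(d_y);
    return ok;
}

int main() {
    // Sequential part on the CPU: set up the input data.
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // Only the critical function is handed to the GPU.
    if (!saxpy_on_gpu(2.0f, x, y)) {
        std::fprintf(stderr, "CUDA call failed\n");
        return EXIT_FAILURE;
    }

    // Sequential part on the CPU again: each element should be 2 * 1 + 2 = 4.
    for (int i = 0; i < n; ++i) {
        if (y[i] != 4.0f) {
            std::fprintf(stderr, "unexpected value at index %d\n", i);
            return EXIT_FAILURE;
        }
    }
    std::printf("all %d elements match\n", n);
    return EXIT_SUCCESS;
}
```

Assuming the file is saved as `saxpy.cu` (a hypothetical name), it can be built with `nvcc saxpy.cu -o saxpy`. The structure follows the split the deck describes: `main` holds the sequential code, and only the inner loop is moved into a kernel.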

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    17 pages
