LIFE AND MATERIAL SCIENCES
Mark Berger; [email protected]

NVIDIA
Founded 1993. Invented the GPU in 1999.
Computer graphics, visual computing, supercomputing, cloud & mobile computing.

NVIDIA - Core Technologies and Brands
GPU: GeForce®, Quadro®, Tesla®
Mobile: Tegra®
Cloud: GRID

Accelerated Computing: Multi-core plus Many-core
CPU: optimized for serial tasks.
GPU accelerator: optimized for many parallel tasks.
3-10x+ compute throughput, 7x memory bandwidth, 5x energy efficiency.

How GPU Acceleration Works
The compute-intensive functions (roughly 5% of the code) are moved to the GPU; the rest of the sequential code continues to run on the CPU. (A minimal CUDA offload sketch appears after the MD application table below.)

CPU + GPUs: Two-Year Heartbeat
Roadmap chart (DP GFLOPS per watt, 2008-2014): Tesla (CUDA), Fermi (FP64), Kepler (Dynamic Parallelism), Maxwell (Unified Virtual Memory), Volta (Stacked DRAM).

Kepler Features Make GPU Coding Easier
Hyper-Q: speeds up legacy MPI apps. Fermi exposed 1 work queue; Kepler exposes 32 concurrent work queues.
Dynamic Parallelism: less back-and-forth between CPU and GPU, simpler code.
(Sketches of both features appear after the MD application table below.)

Developer Momentum Continues to Grow (2008 vs. 2013)
CUDA-capable GPUs: 100M -> 430M
CUDA downloads: 150K -> 1.6M
Supercomputers: 1 -> 50
University courses: 60 -> 640
Academic papers: 4,000 -> 37,000

Explosive Growth of GPU-Accelerated Apps
The number of GPU-accelerated applications grew year over year from 2010 to 2012 (increases of 40% and 61%). Top scientific apps by domain (accelerated or in development):
Molecular Dynamics: AMBER, LAMMPS, CHARMM, NAMD, GROMACS, DL_POLY
Quantum Chemistry: QMCPACK, Gaussian, Quantum Espresso, NWChem, GAMESS-US, VASP
Climate & Weather: CAM-SE, COSMO, NIM, GEOS-5, WRF
Physics: Chroma, GTS, Denovo, ENZO, GTC, MILC
CAE: ANSYS Mechanical, ANSYS Fluent, MSC Nastran, OpenFOAM, SIMULIA Abaqus, LS-DYNA

NVIDIA GPU Life Science Focus
Molecular Dynamics: all key codes are GPU-available (AMBER, CHARMM, DESMOND, DL_POLY, GROMACS, LAMMPS, NAMD) with great multi-GPU performance. GPU-only codes: ACEMD, HOOMD-Blue. Focus: scaling to large numbers of GPUs.
Quantum Chemistry: key codes ported or being optimized. Active GPU acceleration projects: VASP, NWChem, Gaussian, GAMESS, ABINIT, Quantum Espresso, BigDFT, CP2K, GPAW, etc. GPU code: TeraChem.
Also: analytical and medical imaging instruments, genomics/bioinformatics.

Molecular Dynamics (MD) Applications

AMBER
  Features supported: PMEMD explicit solvent & GB implicit solvent
  GPU perf: > 100 ns/day JAC NVE on 2x K20s
  Release status: Released (AMBER 12, GPU support revision 12.2); multi-GPU, multi-node
  Notes/benchmarks: http://ambermd.org/gpus/benchmarks.htm#Benchmarks

CHARMM
  Features supported: Implicit (5x) and explicit (2x) solvent via OpenMM
  GPU perf: 2x C2070 equals 32-35x X5667 CPUs
  Release status: Released (release C37b1); single & multi-GPU in a single node
  Notes/benchmarks: http://www.charmm.org/news/c37b1.html#postjump

DL_POLY
  Features supported: Two-body forces, link-cell pairs, Ewald SPME forces, Shake VV
  GPU perf: 4x
  Release status: Released, source only, results published (release V 4.04); multi-GPU, multi-node
  Notes/benchmarks: http://www.stfc.ac.uk/CSE/randd/ccg/software/DL_POLY/25526.aspx

GROMACS
  Features supported: Implicit (5x) and explicit (2x) solvent
  GPU perf: 165 ns/day DHFR on 4x C2075s
  Release status: Released (release 4.6.2, first multi-GPU support); multi-GPU, multi-node

LAMMPS
  Features supported: Lennard-Jones, Gay-Berne, Tersoff & many more potentials
  GPU perf: 3.5-18x on Titan
  Release status: Released; multi-GPU, multi-node
  Notes/benchmarks: http://lammps.sandia.gov/bench.html#desktop and http://lammps.sandia.gov/bench.html#titan

NAMD
  Features supported: Full electrostatics with PME and most simulation features; 100M-atom capable
  GPU perf: 4.0 ns/day F1-ATPase on 1x K20X
  Release status: Released (NAMD 2.9); multi-GPU, multi-node

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison.
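The offload model from the "How GPU Acceleration Works" slide above can be illustrated with a minimal CUDA sketch (a hypothetical example, not taken from any of the codes listed): the compute-intensive loop becomes a __global__ kernel, while the surrounding sequential code stays on the CPU.

#include <cuda_runtime.h>
#include <stdio.h>

// Compute-intensive function: runs on the GPU, one thread per element.
__global__ void scale_add(const float *x, float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];   // the "5% of code" worth accelerating
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // The rest of the sequential code stays on the CPU.
    float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Launch the accelerated function on the GPU.
    scale_add<<<(n + 255) / 256, 256>>>(d_x, d_y, 3.0f, n);

    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);   // expect 5.0

    cudaFree(d_x); cudaFree(d_y); free(x); free(y);
    return 0;
}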
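Hyper-Q, from the Kepler features slide above, gives the GPU 32 hardware work queues, so kernels issued from independent CUDA streams (or from separate MPI ranks via the MPS proxy) no longer serialize through the single queue Fermi exposed. The sketch below uses independent streams; the kernel, sizes, and stream count are illustrative assumptions only.

#include <cuda_runtime.h>

__global__ void small_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main(void)
{
    const int n_streams = 8, n = 4096;
    cudaStream_t streams[n_streams];
    float *buf[n_streams];

    for (int s = 0; s < n_streams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buf[s], n * sizeof(float));
        cudaMemset(buf[s], 0, n * sizeof(float));
    }

    // Each stream submits its own work; on Kepler (Hyper-Q) these launches
    // can be scheduled concurrently instead of funneling through one queue.
    for (int s = 0; s < n_streams; ++s)
        small_kernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < n_streams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(buf[s]);
    }
    return 0;
}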
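Dynamic parallelism, the other Kepler feature above, lets a kernel launch child kernels directly on the device, cutting CPU-GPU round trips. A minimal sketch follows, assuming a compute-capability 3.5 device (compile with roughly nvcc -arch=sm_35 -rdc=true -lcudadevrt); the kernel names are hypothetical.

#include <cuda_runtime.h>
#include <cstdio>

__global__ void child(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// The parent kernel decides on the device how much extra work to launch,
// without returning control to the CPU.
__global__ void parent(float *data, int n)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        child<<<(n + 255) / 256, 256>>>(data, n);
}

int main(void)
{
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    parent<<<1, 32>>>(d_data, n);
    cudaDeviceSynchronize();

    float h0;
    cudaMemcpy(&h0, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %f\n", h0);   // expect 1.0

    cudaFree(d_data);
    return 0;
}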
New/Additional MD Applications Ramping

ACEMD
  Features supported: Production biomolecular dynamics (MD) software specially optimized to run on GPUs
  GPU perf: 150 ns/day DHFR on 1x K20
  Release status: Released; single and multi-GPU
  Notes: Written for use only on GPUs

DESMOND
  Features supported: TBD
  GPU perf: TBD
  Release status: Under development
  Notes: http://www.deshawresearch.com/resources_whatdesmonddoes.html

Folding@Home
  Features supported: Powerful distributed-computing molecular dynamics system; implicit solvent and folding
  GPU perf: Depends upon number of GPUs
  Release status: Released; GPUs and CPUs
  Notes: http://folding.stanford.edu; GPUs get 4x the points of CPUs

GPUGrid.net
  Features supported: High-performance all-atom biomolecular simulations; explicit solvent and binding
  GPU perf: Depends upon number of GPUs
  Release status: Released; NVIDIA GPUs only
  Notes: http://www.gpugrid.net/

HALMD
  Features supported: Simple fluids and binary mixtures (pair potentials, high-precision NVE and NVT, dynamic correlations)
  GPU perf: Up to 66x on M2090 vs. 1 CPU core
  Release status: Released (version 0.2.0); single GPU
  Notes: http://halmd.org/benchmarks.html#supercooled-binary-mixture-kob-andersen

HOOMD-Blue
  Features supported: Written for use only on GPUs
  GPU perf: Kepler 2x faster than Fermi
  Release status: Released (version 0.11.3); single and multi-GPU on 1 node; multi-GPU with MPI in September 2013
  Notes: http://codeblue.umich.edu/hoomd-blue/

mdcore
  Features supported: TBD
  GPU perf: TBD
  Release status: Released (version 0.1.7)
  Notes: http://mdcore.sourceforge.net/download.html

OpenMM
  Features supported: Implicit and explicit solvent, custom forces
  GPU perf: Implicit: 127-213 ns/day; explicit: 18-55 ns/day (DHFR)
  Release status: Released (version 5.1); multi-GPU
  Notes: Library and application for high-performance molecular dynamics

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison. (A minimal CUDA pair-force sketch illustrating how such codes map to the GPU follows below.)
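Most of the MD codes in these tables spend their time on short-range pair potentials such as Lennard-Jones, which map naturally to one GPU thread per particle. The following is a minimal, illustrative CUDA sketch of that pattern only: an all-pairs kernel with no neighbor lists, cutoffs, or periodic boundaries, and hypothetical names. It is not how any of the listed codes actually implements its force loop; production codes use cell/neighbor lists and mixed precision.

#include <cuda_runtime.h>

// Naive all-pairs Lennard-Jones forces: one thread per particle.
__global__ void lj_forces(const float4 *pos, float4 *force, int n,
                          float epsilon, float sigma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float fx = 0.0f, fy = 0.0f, fz = 0.0f;
    float sig2 = sigma * sigma;

    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pi.x - pos[j].x;
        float dy = pi.y - pos[j].y;
        float dz = pi.z - pos[j].z;
        float r2 = dx * dx + dy * dy + dz * dz;
        float sr2 = sig2 / r2;
        float sr6 = sr2 * sr2 * sr2;
        // LJ force magnitude over r, multiplying the displacement vector:
        // F = 24*eps*[2*(sigma/r)^12 - (sigma/r)^6] / r^2 * (r_i - r_j)
        float f = 24.0f * epsilon * sr6 * (2.0f * sr6 - 1.0f) / r2;
        fx += f * dx;
        fy += f * dy;
        fz += f * dz;
    }
    force[i] = make_float4(fx, fy, fz, 0.0f);
}

int main(void)
{
    const int n = 1024;
    float4 *d_pos, *d_force;
    cudaMalloc(&d_pos, n * sizeof(float4));
    cudaMalloc(&d_force, n * sizeof(float4));

    // Place particles on a simple line so no two positions coincide.
    float4 *h_pos = (float4 *)malloc(n * sizeof(float4));
    for (int i = 0; i < n; ++i)
        h_pos[i] = make_float4(i * 0.5f, 0.0f, 0.0f, 0.0f);
    cudaMemcpy(d_pos, h_pos, n * sizeof(float4), cudaMemcpyHostToDevice);

    lj_forces<<<(n + 127) / 128, 128>>>(d_pos, d_force, n, 1.0f, 1.0f);
    cudaDeviceSynchronize();

    cudaFree(d_pos); cudaFree(d_force); free(h_pos);
    return 0;
}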
AMBER 12 GPU Support Revision 12.2 (1/22/2013) benchmarks:

Kepler - Our Fastest Family of GPUs Yet
Factor IX benchmark running AMBER 12 GPU Support Revision 12.1. The blue (CPU-only) node contains dual E5-2687W CPUs (8 cores per CPU); the green nodes contain dual E5-2687W CPUs plus either 1x NVIDIA M2090, 1x K10, 1x K20, or 1x K20X.
Throughput (nanoseconds/day): 1 CPU node: 3.42; + M2090: 11.85 (3.5x); + K10: 18.90 (5.6x); + K20: 22.44 (6.6x); + K20X: 25.39 (7.4x).
GPU speedup/throughput increased from 3.5x (with an M2090) to 7.4x (with a K20X) compared to a CPU-only node (25.39 / 3.42 ≈ 7.4).

K20 Accelerates Simulations of All Sizes
Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off. The blue node contains 2x Intel E5-2687W CPUs (8 cores per CPU); each green node contains 2x Intel E5-2687W CPUs plus 1x NVIDIA K20 GPU.
Speedup compared to the CPU-only node (1.0x baseline) across TRPcage (GB), JAC (NVE PME), Factor IX (NVE PME), Cellulose (NVE PME), Myoglobin (GB), and Nucleosome (GB): individual speedups of 2.66x, 6.50x, 6.56x, 7.28x, 25.56x, and 28.00x.
Gain 28x throughput/performance by adding just one K20 GPU (Nucleosome) when compared to dual-CPU performance.

Quantum Chemistry Applications

Abinit
  Features supported: Local Hamiltonian, non-local Hamiltonian, LOBPCG algorithm, diagonalization/orthogonalization
  GPU perf: 1.3-2.7x
  Release status: Released (version 7.2.2); multi-GPU support
  Notes: www.abinit.org

ACES III
  Features supported: Integrating GPU scheduling into the SIAL programming language and SIP runtime environment
  GPU perf: 10x on kernels
  Release status: Under development; multi-GPU support
  Notes: http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/deumens_ESaccel_2012.pdf

ADF
  Features supported: Fock matrix, Hessians
  GPU perf: TBD
  Release status: Pilot project completed, under development; multi-GPU support
  Notes: www.scm.com

BigDFT
  Features supported: DFT; Daubechies wavelets; part of Abinit
  GPU perf: 5-25x (1 CPU core to GPU kernel)
  Release status: Released (version 1.6.0); multi-GPU support
  Notes: http://inac.cea.fr/L_Sim/BigDFT/news.html, http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/BigDFT-Formalism.pdf and http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/BigDFT-HPC-tues.pdf

Casino
  Features supported: Quantum Monte Carlo (QMC) electronic structure calculations for finite and periodic systems
  GPU perf: TBD
  Release status: Under development; multi-GPU support
  Notes: http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html

CASTEP
  Features supported: TBD
  GPU perf: TBD
  Release status: Under development
  Notes: http://www.castep.org/Main/HomePage

CP2K
  Features supported: DBCSR (sparse matrix multiply library)
  GPU perf: 2-7x
  Release status: Under development; multi-GPU support
  Notes: http://www.olcf.ornl.gov/wp-content/training/ascc_2012/friday/ACSS_2012_VandeVondele_s.pdf

GAMESS-US
  Features supported: Libqc with Rys quadrature algorithm, Hartree-Fock, MP2 and CCSD
  GPU perf: 1.3-1.6x; 2.3-2.9x HF
  Release status: Released; multi-GPU support
  Notes: http://www.msg.ameslab.gov/gamess/index.html

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison.

Quantum Chemistry Applications (continued)

GAMESS-UK
  Features supported: (ss|ss) type integrals within calculations using Hartree-Fock ab initio methods and density functional theory; supports organics & inorganics
  GPU perf: 8x
  Release status: Released (version 7.0); multi-GPU support
  Notes: http://www.ncbi.nlm.nih.gov/pubmed/21541963

Gaussian
  Features supported: Joint PGI, NVIDIA & Gaussian collaboration
  GPU perf: TBD
  Release status: Under development; multi-GPU support
  Notes: http://www.gaussian.com/g_press/nvidia_press.htm

GPAW
  Features supported: Electrostatic Poisson equation, orthonormalization of vectors, residual minimization method (RMM-DIIS)
  GPU perf: 8x
  Release status: Released; multi-GPU support
  Notes: https://wiki.fysik.dtu.dk/gpaw/devel/projects/gpu.html, Samuli Hakala (CSC Finland) & Chris O'Grady (SLAC)
Jaguar (Schrodinger, Inc.)
  Features supported: Investigating GPU acceleration
  GPU perf: TBD
  Release status: Under development; multi-GPU support
  Notes: http://www.schrodinger.com/kb/278, http://www.schrodinger.com/productpage/14/7/32/

LATTE
  Features supported: CU_BLAS, SP2 algorithm (see the cuBLAS sketch after these tables)
  GPU perf: TBD
  Release status: Released; multi-GPU support
  Notes: http://on-demand.gputechconf.com/gtc/2013/presentations/S3195-Fast-Quantum-Molecular-Dynamics-in-LATTE.pdf

MOLCAS
  Features supported: CU_BLAS support
  GPU perf: 1.1x
  Release status: Released (version 7.8); single GPU, additional GPU support coming in version 8
  Notes: www.molcas.org

MOLPRO
  Features supported: Density-fitted MP2 (DF-MP2), density-fitted local correlation methods (DF-RHF, DF-KS), DFT
  GPU perf: 1.7-2.3x projected
  Release status: Under development; multiple GPUs
  Notes: www.molpro.net, Hans-Joachim Werner

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison.

Quantum Chemistry Applications (continued)

  Features supported: Pseudodiagonalization, full
  Release status: Under development
  Notes: Academic port
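Several of the quantum chemistry ports above (for example LATTE's SP2 density-matrix purification and MOLCAS) rely on CU_BLAS (cuBLAS) for dense matrix products. The sketch below shows the general pattern of offloading a double-precision GEMM to cuBLAS; the matrix size and contents are made-up assumptions, and it is not code from either package. Link with -lcublas.

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main(void)
{
    const int n = 512;   // square matrices, chosen arbitrarily for the sketch
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);

    double *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, n * n * sizeof(double));
    cudaMalloc(&d_b, n * n * sizeof(double));
    cudaMalloc(&d_c, n * n * sizeof(double));
    cudaMemcpy(d_a, a.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C on the GPU (column-major, as in BLAS).
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, d_a, n, d_b, n, &beta, d_c, n);

    cudaMemcpy(c.data(), d_c, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);   // expect n * 1.0 * 2.0 = 1024.0

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}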