Molecular Dynamics (MD) on Gpus Feb

Molecular Dynamics (MD) on GPUs Feb. 2, 2017 Accelerating Discoveries Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid — “the perfect target for fighting the infection.” Without gpu, the supercomputer would need to be 5x larger for similar performance. 2 Overview of Life & Material Accelerated Apps MD: All key codes are GPU-accelerated QC: All key codes are ported or optimizing Great multi-GPU performance Focus on using GPU-accelerated math libraries, OpenACC directives Focus on dense (up to 16) GPU nodes &/or large # of GPU nodes GPU-accelerated and available today: ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResso, ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS- Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, OpenMM, PolyFTS, SOP-GPU* & more Quantum Espresso/PWscf, QUICK, TeraChem* Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU 3 MD vs. QC on GPUs “Classical” Molecular Dynamics Quantum Chemistry (MO, PW, DFT, Semi-Emp) Simulates positions of atoms over time; Calculates electronic properties; chemical-biological or ground state, excited states, spectral properties, chemical-material behaviors making/breaking bonds, physical properties Forces calculated from simple empirical formulas Forces derived from electron wave function (bond rearrangement generally forbidden) (bond rearrangement OK, e.g., bond energies) Up to millions of atoms Up to a few thousand atoms Solvent included without difficulty Generally in a vacuum but if needed, solvent treated classically (QM/MM) or using implicit methods Single precision dominated Double precision is important Uses cuBLAS, cuFFT, CUDA Uses cuBLAS, cuFFT, OpenACC Geforce (Workstations), Tesla (Servers) Tesla recommended ECC off ECC on 4 GPU-Accelerated Molecular Dynamics Apps Green Lettering Indicates Performance Slides Included GPUGrid.net MELD ACEMD GROMACS NAMD AMBER HALMD OpenMM CHARMM HOOMD-Blue PolyFTS DESMOND LAMMPS ESPResSO mdcore Folding@Home GPU Perf compared against dual multi-core x86 CPU socket. 5 Benefits of MD GPU-Accelerated Computing Why wouldn’t you want to turbocharge your research? • 3x-8x Faster than CPU only systems in all tests (on average) • Most major compute intensive aspects of classical MD ported • Large performance boost with marginal price increase • Energy usage cut by more than half • GPUs scale well within a node and/or over multiple nodes • K80 GPU is our fastest and lowest power high performance GPU yet Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive 6 ACEMD www.acellera.com 470 ns/day on 1 GPU for L-Iduronic acid (1362 atoms) 116 ns/day on 1 GPU for DHFR (23K atoms) M. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale, J. Chem. Theory and Comput. 5, 1632 (2009) www.acellera.com NVT, NPT, PME, TCL, PLUMED, CAMSHIFT1 1 M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput., 5, 2371–2377 (2009) 2 For a list of selected references see http://www.acellera.com/acemd/publications AMBER 16 June 2017 JAC_NVE on GP100s 450 23,558 atoms PME 2fs 404.09 400 370.32 350 320.19 320.14 300 Running AMBER version 16 250 ns/day The green nodes contain Dual Intel(R) 200 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 150 100 50 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 11 JAC_NVE on GP100s 900 23,558 atoms PME 4fs 800 782.11 714.23 700 614.42 613.16 600 Running AMBER version 16 500 ns/day The green nodes contain Dual Intel(R) 400 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 300 200 100 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 12 JAC_NPT on GP100s 400 23,558 atoms PME 2fs 360.64 350 333.03 295.75 295.42 300 250 Running AMBER version 16 200 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 150 Quadro GP100s GPUs (PCIe and NVLink) 100 50 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 13 JAC_NPT on GP100s 800 23,558 atoms PME 4fs 706.53 700 654.66 600 580.47 578.48 500 Running AMBER version 16 400 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 300 Quadro GP100s GPUs (PCIe and NVLink) 200 100 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 14 FactorIX_NVE on GP100s 180 90,906 atoms PME 166.61 160 142.45 140 120 106.23 105.98 Running AMBER version 16 100 ns/day The green nodes contain Dual Intel(R) 80 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 60 40 20 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 15 FactorIX_NPT on GP100s 160 90,906 atoms PME 146.34 140 126.75 120 102.27 102.26 100 Running AMBER version 16 80 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 60 Quadro GP100s GPUs (PCIe and NVLink) 40 20 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 16 Cellulose_NVE on GP100s 40 408,609 atoms PME 36.91 35 31.35 30 25 24.01 24.02 Running AMBER version 16 20 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 15 Quadro GP100s GPUs (PCIe and NVLink) 10 5 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 17 Cellulose_NPT on GP100s 35 408,609 atoms PME 32.37 30 28.76 25 22.76 22.8 20 Running AMBER version 16 ns/day The green nodes contain Dual Intel(R) 15 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 10 5 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 18 STMV_NPT on GP100s 25 1,067,095 atoms PME 23.13 20.22 20 15.64 15.43 15 Running AMBER version 16 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 10 Quadro GP100s GPUs (PCIe and NVLink) 5 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 19 TRPCAGE on GP100s 1500 304 atoms GB 1216.56 1250 1187.3 1000 Running AMBER version 16 750 The green nodes contain Dual Intel(R) ns/day Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 500 250 0 1 node + 1x GP100 1 node + 1x GP100 per node (PCIe) per node (NVLink) 20 Myoglobin on GP100s 600 2,492 atoms GB 470.41 458.28 443.49 447.23 450 Running AMBER version 16 300 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 150 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 21 Nucleosome on GP100s 25 21.29 25,095 atoms GB 20.51 20 15 Running AMBER version 16 11.47 11.29 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 10 Quadro GP100s GPUs (PCIe and NVLink) 5 0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink) 22 AMBER 16 February 2017 PME-Cellulose_NPT on K80s 20 PME-Cellulose_NPT 16 15.43 Running AMBER version 16.3 The blue node contains Dual Intel Xeon 6.6X E5-2699 [email protected] [3.6GHz Turbo] 12 11.36 (Broadwell) CPUs The green nodes contain Dual Intel ns/day 4.8X Xeon E5-2699 [email protected] [3.6GHz 8 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel 4 Xeon E5-2699 [email protected] [3.6GHz 2.35 Turbo] (Broadwell) 0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node 24 PME-Cellulose_NPT on P100s PCIe 40 PME-Cellulose_NPT 35 30.00 Running AMBER version 16.3 30 The blue node contains Dual Intel Xeon 25 E5-2699 [email protected] [3.6GHz Turbo] 21.85 (Broadwell) CPUs ns/day 12.8X 20 The green nodes contain Dual Intel 9.3X Xeon E5-2699 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 10 ➢ 1x P100 PCIe is paired with Single 5 Intel Xeon E5-2699 [email protected] 2.35 [3.6GHz Turbo] (Broadwell) 0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node 25 PME-Cellulose_NPT on P100s SXM2 40 PME-Cellulose_NPT 36.65 35 32.22 Running AMBER version 16.3 30 15.6X 13.7X The blue node contains Dual Intel Xeon 25 23.37 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 20 ns/day The green nodes contain Dual Intel 9.9X Xeon E5-2698 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 10 ➢ 1x P100 SXM2 is paired with Single 5 Intel Xeon E5-2698 [email protected] 2.35 [3.6GHz Turbo] (Broadwell) 0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per

Load more