NVIDIA Cryo-EM May 2019
Molecular Dynamics (MD) on GPUs
March 2019

Accelerating Discoveries

Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, "the perfect target for fighting the infection." Without GPUs, the supercomputer would need to be 5x larger for similar performance.

Overview of Life & Material Accelerated Apps

MD
• All key codes are GPU-accelerated
• Great multi-GPU, multi-node (dense) performance
• GPU-accelerated apps: ACEMD*, AMBER*, BAND, CHARMM, DESMOND, ESPResso, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more

QC
• All key codes are ported or optimizing
• GPU-accelerated math libraries, OpenACC directives
• GPU-accelerated apps: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
• Active acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more

green* = >90% of the workload is on GPU

MD vs. QC on GPUs

Calculations
  Molecular Dynamics: Simulates atomic positions over time; chemical-biological or chemical-material
  Quantum Chemistry: Properties - electronic properties, ground state, excitation, spectra. Examples: MO, PW, DFT, semi-emp
Forces
  Molecular Dynamics: Simple empirical formulas; no bond rearrangements
  Quantum Chemistry: Electron wave function; bond rearrangements allowed
Atom count
  Molecular Dynamics: Millions
  Quantum Chemistry: Thousands
Solvent
  Molecular Dynamics: Solvent included without difficulty
  Quantum Chemistry: Solvent optional; classical QM/MM or implicit methods
Numeric precision
  Molecular Dynamics: Primarily FP32
  Quantum Chemistry: Primarily FP64
Software acceleration
  Molecular Dynamics: CUDA - cuFFT
  Quantum Chemistry: CUDA - cuBLAS, cuFFT; Solvers - cuTensor, Eigen; OpenACC
NVIDIA GPUs
  Molecular Dynamics: Quadro for workstations; Tesla for data center
  Quantum Chemistry: Tesla for data center
Error correction (ECC)
  Molecular Dynamics: Not required
  Quantum Chemistry: Required

GPU-Accelerated Molecular Dynamics Apps

Performance Slides Available
ACEMD, AMBER/GTI, Chameleon, CHARMM, GROMACS, HOOMD-Blue, LAMMPS, NAMD

Other GPU-accelerated MD apps
• DESMOND/FEP
• ESPResSO
• Folding@Home
• Genesis
• GPUGrid.net
• HALMD
• HTMD
• mdcore
• MELD
• OpenMM
• PolyFTS

GPU performance compared against dual multi-core x86 CPU sockets.

MD Applications: GPU-Accelerated Computing
Turbocharge your research!
• Speedup of 3X-8X compared to CPU only in all tests (average)
• Majority of compute-intensive classical MD codes are ported to GPUs
• Large performance boost and improved TCO for compute infrastructure
• Tesla GPUs are more energy efficient (<50% of CPU-only computing)
• GPUs scale well within a node and/or over multiple nodes
• Tesla V100 is the highest-performance GPU

Try GPU-accelerated MD apps for free: nvidia.com/GPUTestDrive

AmberMD 18.10-AT_18.12
March 2019

AmberMD 18.10-AT_18.12 - Tesla V100-SXM2-32GB

Running AmberMD 18.10_AT_18.12. The blue node contains dual Intel Xeon Gold 6140 (Skylake) CPUs. The green nodes contain dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs.

[Chart: PME-Cellulose. Cellulose, 408,609 atoms. Panels: PME-Cellulose_NPT 2fs and PME-Cellulose_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 48.13 to 78.55 ns/day and from 15.4X to 24.2X.]

[Chart: PME-FactorIX. Factor IX, 90,906 atoms. Panels: PME-FactorIX_NPT 2fs and PME-FactorIX_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 207.66 to 326.85 ns/day and from 13.8X to 21.0X.]

[Chart: PME-JAC. DHFR, 23,558 atoms. Panels: PME-JAC_NPT 2fs and PME-JAC_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 506.36 to 687.83 ns/day and from 9.3X to 12.3X.]

[Chart: PME-STMV_NPT. Satellite Tobacco Mosaic Virus, 1,067,095 atoms. Panel: PME-STMV_NPT 4fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled 17.02, 19.94, and 20.83 ns/day and 17.9X, 21.0X, and 21.9X.]

AmberMD 18.10_AT_18.12 - P100 vs V100

All benchmarks compared as set: Cellulose, FactorIX, JAC, STMV. Running AmberMD 18.10_AT_18.12. The blue node contains dual Intel Xeon Gold 6140 (Skylake) CPUs. The green nodes contain dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla P100 SXM2 (16GB) GPUs or Tesla V100 SXM2 (32GB) GPUs.

[Chart: P100 vs V100. Panels: P100 and V100. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X GPUs. Axes: ns/day; speedup over dual CPU node (X). P100 results are labeled 30.22, 37.17, and 39.99 ns/day (9.7X, 11.9X, 12.8X); V100 results are labeled 48.13, 57.68, and 63.19 ns/day (15.4X, 18.5X, 20.3X).]

AmberMD 18.10-AT_18.12 - Tesla T4

Running AmberMD 18.10_AT_18.12. The blue node contains dual Intel Xeon Gold 6140 (Skylake) CPUs. The green nodes contain dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla T4 PCIe (16GB) GPUs.

[Chart: PME-Cellulose. Cellulose, 408,609 atoms. Panels: PME-Cellulose_NPT 2fs and PME-Cellulose_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X T4. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 16.0 to 24.9 ns/day and from 5.1X to 7.7X.]

[Chart: PME-FactorIX. Factor IX, 90,906 atoms. Panels: PME-FactorIX_NPT 2fs and PME-FactorIX_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X T4. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 79.6 to 123.8 ns/day and from 5.3X to 8.0X.]

[Chart: PME-JAC. DHFR, 23,558 atoms. Panels: PME-JAC_NPT 2fs and PME-JAC_NVE 2fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X T4. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled from 262.2 to 372.8 ns/day and from 4.8X to 6.7X.]

[Chart: PME-STMV_NPT. Satellite Tobacco Mosaic Virus, 1,067,095 atoms. Panel: PME-STMV_NPT 4fs. Bars: dual-CPU node (1.0X) and 1X, 2X, 4X T4. Axes: ns/day; speedup over dual CPU node (X). GPU results are labeled 10.7, 14.3, and 15.0 ns/day and 5.9X, 7.8X, and 8.2X.]

AmberMD recommended usage

Motherboard and CPU: Dual-socket with server x86-64 CPU
System memory: >=16GB
GPUs: Tesla V100 SXM2
GPUs per socket: 1 to 8
GPUs per task: 1 to 4 (case dependent)

GROMACS 2019.1
March 2019

GROMACS 2019.1 - Tesla V100-SXM2-32GB

Running GROMACS 2019.1. The blue node contains dual Intel Xeon Gold 6140 (Skylake) CPUs. The green nodes contain dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs.

[Chart: ADH Dodec. ADH, 134,000 atoms. Panel: ADH Dodec (h-bond). Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). The dual-CPU node is labeled 53.7 ns/day; GPU results are labeled 160.21, 184.67, and 193.52 ns/day and 3.0X, 3.4X, and 3.6X.]

[Chart: Cellulose. Cellulose, 408,609 atoms. Panel: Cellulose (h-bond). Bars: dual-CPU node (1.0X) and 1X, 2X, 4X V100. Axes: ns/day; speedup over dual CPU node (X). The dual-CPU node is labeled 15.13 ns/day; GPU results are labeled 44.49, 51.94, and 54.22 ns/day and 2.9X, 3.4X, and 3.6X.]

[Chart: STMV. Satellite Tobacco Mosaic Virus, 1,067,095 atoms. Bars include 1X, 2X, and 4X V100. Axes: ns/day; speedup over dual CPU node (X). Bars are labeled 10.24, 15.84, and 15.95 ns/day; the 2X and 4X V100 labels both read 4.5X.]
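
The speedup labels in these charts appear to be plain throughput ratios against the dual-CPU node. As a rough illustration, the Python sketch below takes the ns/day values labeled in the GROMACS 2019.1 ADH Dodec chart (53.7 for the dual-CPU node; 160.21, 184.67, and 193.52 for the GPU nodes) and reproduces the labeled 3.0X, 3.4X, and 3.6X. The helper names `speedup` and `days_to_simulate` are hypothetical and not part of GROMACS or AMBER, and the 1000 ns target is an assumption chosen only to show how ns/day translates into wall-clock time.

```python
# Hypothetical helpers for reading the benchmark charts; not part of GROMACS or AMBER.

def speedup(gpu_ns_per_day: float, cpu_ns_per_day: float) -> float:
    """Throughput ratio of a GPU node over the dual-CPU baseline node."""
    if cpu_ns_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return gpu_ns_per_day / cpu_ns_per_day


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns of simulated time."""
    if ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return target_ns / ns_per_day


# ns/day values as labeled in the GROMACS 2019.1 ADH Dodec chart.
cpu_baseline = 53.7
gpu_results = [160.21, 184.67, 193.52]

speedups = [round(speedup(g, cpu_baseline), 1) for g in gpu_results]
print(speedups)  # [3.0, 3.4, 3.6]

# Assumed target of 1000 ns (1 microsecond), chosen only for illustration.
for ns_per_day in [cpu_baseline] + gpu_results:
    days = days_to_simulate(1000.0, ns_per_day)
    print(f"{ns_per_day:7.2f} ns/day -> {days:5.1f} days")
```

The same ratio can be applied to any chart in this deck where the dual-CPU throughput is labeled. The AmberMD charts label only the 1.0X baseline, so there the CPU throughput would have to be back-computed from a GPU value and its speedup.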