Outline of tutorial at MSI

I. Intro to GPUs

A. MSI plans to add significantly more GPUs in the future; the current system is described at: https://www.msi.umn.edu/hpc/cascade

B. MSI currently has: i. 8 Westmere nodes with 4 Fermi M2070 GPGPUs/node (Cascade queue) ii. 4 SandyBridge nodes with 2 Kepler K20m GPGPUs/node (Kepler queue)

II. Computational and Biology Applications

A. Applications, including: i. AMBER, NAMD, LAMMPS, GROMACS, CHARMM, , DL_POLY, OpenMM ii. GPU only codes: ACEMD, HOOMD-Blue

B. , including: i. Abinit, BigDFT, CP2K, , GAMESS, GPAW, NWChem, Quantum Espresso, -CHEM, VASP & more ii. GPU only code: TeraChem (ParaChem)

C. Bioinformatics i. NVBIO (nvBowtie), BWA-MEM (Both currently in development) update

Updated: Oct. 21, 2013 Founded 1993 Invented GPU 1999 – Computer Graphics Visual Computing, Supercomputing, Cloud & Mobile Computing NVIDIA - Core Technologies and Brands

GPU Mobile Cloud

® ® GeForce Tegra GRID Quadro® , Tesla® Accelerated Computing Multi-core plus Many-cores

GPU Accelerator CPU Optimized for Many Optimized for Parallel Tasks Serial Tasks

3-10X+ Comp Thruput 7X Memory Bandwidth 5x Energy Efficiency How GPU Acceleration Works

Application Code

Compute-Intensive Functions Rest of Sequential 5% of Code CPU Code GPU CPU

+ GPUs : Two Year Heart Beat

32 Volta Stacked DRAM 16 Maxwell Unified Virtual Memory 8 Kepler Dynamic Parallelism 4

Fermi 2 FP64

DP GFLOPS GFLOPS per DP Watt 1

Tesla 0.5 CUDA

2008 2010 2012 2014 Kepler Features Make GPU Coding Easier Hyper-Q Dynamic Parallelism Speedup Legacy MPI Apps Less Back-Forth, Simpler Code FERMI 1 Work Queue CPU Fermi GPU CPU Kepler GPU

KEPLER 32 Concurrent Work Queues Developer Momentum Continues to Grow

100M 430M CUDA –Capable GPUs CUDA-Capable GPUs

150K 1.6M CUDA Downloads CUDA Downloads

1 50 Supercomputer Supercomputers

60 640 University Courses University Courses

4,000 37,000 Academic Papers Academic Papers

2008 2013 Explosive Growth of GPU Accelerated Apps

# of Apps Top Scientific Apps 200 61% Increase Molecular AMBER LAMMPS CHARMM NAMD Dynamics GROMACS DL_POLY 150 Quantum QMCPACK Gaussian 40% Increase Quantum Espresso NWChem Chemistry GAMESS-US VASP

CAM-SE 100 Climate & COSMO NIM GEOS-5 Weather WRF

Chroma GTS 50 Denovo ENZO GTC MILC

ANSYS Mechanical ANSYS Fluent 0 CAE MSC Nastran OpenFOAM 2010 2011 2012 SIMULIA Abaqus LS-DYNA

Accelerated, In Development Overview of Life & Material Accelerated Apps MD: All key codes are available AMBER, CHARMM, DESMOND, DL_POLY, GROMACS, LAMMPS, NAMD GPU only codes: ACEMD, HOOMD-Blue Great multi-GPU performance Focus: scaling to large numbers of GPUs / nodes

QC: All key codes are ported/optimizing: Active GPU acceleration projects: Abinit, BigDFT, CP2K, GAMESS, Gaussian, GPAW, NWChem, Quantum Espresso, VASP & more GPU only code: TeraChem Analytical instruments – actively recruiting Bioinformatics – market development

GPU Test Drive Experience GPU Acceleration

For Researchers, Biophysicists

Preconfigured with Molecular Dynamics Apps

Remotely Hosted GPU Servers

Free & Easy – Sign up, Log in and See Results

www.nvidia.com/gputestdrive

11 Molecular Dynamics (MD) Applications Features Application GPU Perf Release Status Notes/Benchmarks Supported

AMBER 12, GPU Revision Support 12.2 PMEMD Explicit Solvent & GB > 100 ns/day JAC Released http://ambermd.org/gpus/benchmarks. AMBER Implicit Solvent NVE on 2X K20s Multi-GPU, multi-node htm#Benchmarks

2x C2070 equals Release C37b1; Implicit (5x), Explicit (2x) Solvent Released 32-35x X5667 http://www.charmm.org/news/c37b1.html#pos CHARMM via OpenMM Single & Multi-GPU in single node CPUs tjump

Two-body Forces, Link-cell Pairs, Source only, Results Published Release V 4.04 Ewald SPME forces, 4x http://www.stfc.ac.uk/CSE/randd/ccg/softwar DL_POLY Multi-GPU, multi-node Shake VV e/DL_POLY/25526.aspx

165 ns/Day DHFR Released Release 4.6.3; 1st Multi-GPU support Implicit (5x), Explicit (2x) on GROMACS Multi-GPU, multi-node www..org 4X C2075s

http://lammps.sandia.gov/bench.html#desktop Lennard-Jones, Gay-Berne, 3.5-18x on Released and LAMMPS Tersoff & many more potentials ORNL Titan Multi-GPU, multi-node http://lammps.sandia.gov/bench.html#titan

4.0 ns/day Released Full electrostatics with PME and NAMD 2.9 F1-ATPase on 100M atom capable NAMD most simulation features http://www.ks.uiuc.edu/Research/namd/ 1x K20X Multi-GPU, multi-node GPU Perf compared against Multi-core CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison New/Additional MD Applications Ramping Features Application GPU Perf Release Status Notes Supported

Production biomolecular dynamics (MD) software specially 150 ns/day DHFR on 1x Released Written for use only on GPUs optimized to run on GPUs ACEMD K20 Single and Multi-GPUs http://www.acellera.com/

Bonded, pair, excluded interactions; Van Released, Version der Waals, electrostatic, nonbonded far TBD http://www.schrodinger.com/productpage/14/3/ DESMOND 3.4.0/0.7.2 interactions

Powerful distributed computing molecular Depends upon number of Released http://folding.stanford.edu dynamics system; implicit solvent and Folding@Home GPUs GPUs and CPUs GPUs get 4X the points of CPUs folding

High-performance all-atom biomolecular Depends upon number of http://www.gpugrid.net/ Released GPUGrid.net simulations; explicit solvent and binding GPUs

Simple fluids and binary mixtures (pair Up to 66x on 2090 vs. 1 Released, Version 0.2.0 http://halmd.org/benchmarks.html#supercooled-binary- potentials, high-precision NVE and NVT, HALMD CPU core Single GPU mixture-kob-andersen dynamic correlations)

Kepler 2X faster than Released, Version 0.11.3 http://codeblue.umich.edu/hoomd-blue/ Written for use only on GPUs HOOMD-Blue Fermi Single and Multi-GPU on 1 node Multi-GPU w/ MPI in late 2013

mdcore TBD TBD Released, Version 0.1.7 http://mdcore.sourceforge.net/download.html

Implicit: 127-213 ns/day Library and application for molecular dynamics on high- Implicit and explicit solvent, custom Released, Version 5.2 Explicit: 18-55 performance OpenMM forces Multi-GPU ns/day DHFR https://simtk.org/home/openmm_suite GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Quantum Chemistry Applications

Application Features Supported GPU Perf Release Status Notes

Local Hamiltonian, non-local Released; Version 7.4.1 www..org Hamiltonian, LOBPCG algorithm, 1.3-2.7X ABINIT Multi-GPU support diagonalization / orthogonalization

Integrating scheduling GPU into SIAL http://www.olcf.ornl.gov/wp- Under development programming language and SIP 10X on kernels content/training/electronic-structure- ACES III Multi-GPU support runtime environment 2012/deumens_ESaccel_2012.pdf

Available Q1 2014 Fock Matrix, Hessians TBD www.scm.com ADF Multi-GPU support

DFT; Daubechies , Released, Version 1.7.0 2-5X http://bigdft.org BigDFT part of Abinit Multi-GPU support

Code for performing quantum Monte Carlo (QMC) electronic structure Under development TBD http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html Casino calculations for finite and periodic Multi-GPU support systems

CASTEP TBD TBD Under development http://www.castep.org/Main/HomePage

http://www.olcf.ornl.gov/wp- Released DBCSR (spare matrix multiply library) 2-7X content/training/ascc_2012/friday/ACSS_2012_VandeVon CP2K Multi-GPU support dele_s.pdf

Libqc with Rys Quadrature Algorithm, 1.3-1.6X, Released http://www.msg.ameslab.gov/gamess/index.html GAMESS-US Hartree-Fock, MP2 and CCSD 2.3-2.9x HF Multi-GPU support GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Quantum Chemistry Applications

Application Features Supported GPU Perf Release Status Notes

(ss|ss) type integrals within calculations using Hartree Fock ab initio methods and Released, Version 7.0 8x http://www.ncbi.nlm.nih.gov/pubmed/21541963 GAMESS-UK density functional theory. Supports Multi-GPU support organics & inorganics.

Under development Joint PGI, NVIDIA & Gaussian TBD Multi-GPU support http://www.gaussian.com/g_press/nvidia_press.htm Gaussian Collaboration

Electrostatic poisson equation, Released https://wiki.fysik.dtu.dk/gpaw/devel/projects/gpu.ht GPAW orthonormalizing of vectors, residual 8x Multi-GPU support ml, Samuli Hakala (CSC Finland) & minimization method (rmm-diis) Chris O’Grady (SLAC)

Under development Schrodinger, Inc. Investigating GPU acceleration TBD Multi-GPU support http://www.schrodinger.com/kb/278 http://www.schrodinger.com/productpage/14/7/32/

http://on- Released demand.gputechconf.com/gtc/2013/presentations/S31 CU_BLAS, SP2 algorithm TBD LATTE Multi-GPU support 95-Fast-Quantum-Molecular-Dynamics-in-LATTE.pdf

Released, Version 7.8 MOLCAS CU_BLAS support 1.1x Single GPU; Additional GPU support www..org coming in Version 8

Density-fitted MP2 (DF-MP2), density Under development www..net fitted local correlation methods (DF-RHF, 1.7-2.3X projected MOLPRO Multiple GPU Hans-Joachim Werner DF-KS), DFT GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Quantum Chemistry Applications Features Application GPU Perf Release Status Notes Supported

Pseudodiagonalization, full Released Academic port. diagonalization, and density matrix 3.8-14X MOPAC2013 available Q1 2014 MOPAC2012 http://openmopac.net assembling Single GPU

Development GPGPU benchmarks: www.- sw.org Triples part of Reg-CCSD(T), CCSD Released, Version 6.3 7-8X And http://www.olcf.ornl.gov/wp- NWChem & EOMCCSD task schedulers Multiple GPUs content/training/electronic-structure- 2012/Krishnamoorthy-ESCMA12.pdf

Full GPU support for ground-state, real-time calculations; Kohn-Sham Hamiltonian, orthogonalization, 1.5-8X Released, Version 4.1.0 http://www.tddft.org/programs/octopus/ subspace diagonalization, poisson solver, time propagation

ONETEP TBD TBD Under development http://www2.tcm.phy.cam.ac.uk/onetep/

Density functional theory (DFT) Released First principles materials code that computes the plane wave 6-10X PEtot Multi-GPU behavior of the structures of materials calculations

Released, Version 4.0.1 http://www.q- RI-MP2 8x-14x Q-CHEM Multi-GPU support chem.com/doc_for_web/qchem_manual_4.0.pdf

GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Quantum Chemistry Applications

Features Application GPU Perf Release Status Notes Supported

NCSA Released University of Illinois at Urbana-Champaign Main features 3-4x QMCPACK Multiple GPUs http://cms.mcc.uiuc.edu/qmcpack/index.php/GPU _version_of_QMCPACK

Created by Irish Centre for PWscf package: linear algebra Quantum Released, Version 5.0 High-End Computing (matrix multiply), explicit 2.5-3.5x Multiple GPUs http://www.quantum-espresso.org/index.php and computational kernels, 3D FFTs Espresso/PWscf http://www.quantum-espresso.org/

Completely redesigned to 44-650X vs. exploit GPU parallelism. YouTube: Released, Version 1.5k “Full GPU-based solution” GAMESS CPU http://youtu.be/EJODzk6RFxE?hd=1 and TeraChem Multi-GPU/single node http://www.olcf.ornl.gov/wp- version content/training/electronic-structure-2012/Luehr- ESCMA.pdf

2x Hybrid Hartree-Fock DFT 2 GPUs Available on request functionals including exact By Carnegie Mellon University VASP comparable to Multiple GPUs http://arxiv.org/pdf/1111.0716.pdf exchange 128 CPU cores

3x NICS Electronic Structure Determination Workshop 2012: Under development http://www.olcf.ornl.gov/wp- Generalized Wang-Landau method with 32 GPUs vs. 32 WL-LSMS Multi-GPU support content/training/electronic-structure- (16-core) CPUs 2012/Eisenbach_OakRidge_February.pdf

GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Viz, “” and Related Applications Growing

Related Features Supported GPU Perf Release Status Notes Applications

3D of volumetric data Released, Version 5.5 Visualization from Visage Imaging. 70x Amira 5® and surfaces Single GPU http://www.visageimaging.com/overview.html

High-Throughput parallel blind Virtual Screening, Available upon request to Allows fast processing of large http://www.biomedcentral.com/1471- 100X authors BINDSURF ligand databases 2105/13/S14/S13 Single GPU

University of Bristol Empirical Free Released 6.5-13.4X http://www.bris.ac.uk/biochemistry/cpfg/bude/b BUDE Energy Forcefield Single GPU ude.htm

Released Schrodinger, Inc. GPU accelerated application 3.75-5000X Core Hopping Single and Multi-GPUs http://www.schrodinger.com/products/14/32/

Real-time shape similarity Released Open Eyes Scientific Software 800-3000X FastROCS searching/comparison Single and Multi-GPUs http://www.eyesopen.com/fastrocs GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Viz, “Docking” and Related Applications Growing

Related Features Supported GPU Perf Release Status Notes Applications

Interactive Released Written for use only on GPUs TBD http://www.molecular-visualization.com Visualizer Multi-GPU support

Molegro Virtual Energy grid computation, pose evaluation and guided differential 25-30x Single GPU http://www.molegro.com/gpu.php Docker 6 evolution

PIPER Protein http://www.bu.edu/caadlab/gpgpu09.pdf; Molecule docking 3-8x Released Docking http://www.schrodinger.com/productpage/14/40/

Lines: 460% increase Cartoons: 1246% increase Released, Version 1.6 Surface: 1746% increase 1700x http://pymol.org/ PyMol Single GPUs Spheres: 753% increase Ribbon: 426% increase

High quality rendering, large structures (100 million atoms), 100-125X or greater Released, Version 1.9.1 analysis and visualization tasks, Visualization from University of Illinois at Urbana-Champaign VMD on kernels Multi-GPU support http://www.ks.uiuc.edu/Research/vmd/ multiple GPU support for display of molecular orbitals GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison “Organic Growth” of Bioinformatics Applications Features GPU Application Release Status Website Supported Speedup Alignment of short sequencing Released, Version 0.6.2 6-10x http://seqbarracuda.sourceforge.net/ BarraCUDA reads Multi-GPU, multi-node Parallel search of Smith- Released, Version 2.0.10 10-50x http://sourceforge.net/projects/cudasw/ CUDASW++ Waterman database Multi-GPU, multi-node Parallel, accurate long read Released, Version 1.0.40 10x http://cushaw.sourceforge.net/ CUSHAW aligner for large genomes Multiple-GPU Protein alignment according to Released, Version 2.2.26 http://eudoxus.cheme.cmu.edu/gpublast/gpubla 3-4x GPU-BLAST BLASTP Single GPU st.html

Scalable motif discovery Released, Version 3.0.13 https://sites.google.com/site/yongchaosoftware/ mCUDA-MEME 4-10x algorithm based on MEME Multi-GPU, multi-node mcuda-meme

Aligns multiple query sequences MUMmer GPU 3-10x http://sourceforge.net/apps/mediawiki/mummer against reference sequence in Released, Version 3.0 gpu/index.php?title=MUMmerGPU parallel

Hardware and software for Released reference assembly, blast, SW, 400x http://www.seqnfind.com/ SeqNFind Multi-GPU, multi-node HMM, de novo assembly Short read alignment tool that is SOAP3 not heuristic based; reports all 10x Beta release, Version 0.01 http://soap.genomics.org.cn/soap3.html answers Released, Version 1.12 Fast short read alignment 6-8x http://ugene.unipro.ru/ UGENE Multi-GPU, multi-node Parallel linear regression on Released, Version 0.1-1 multiple similarly-shaped 150x http://insilicos.com/products/widelm WideLM Multi-GPU, multi-node models GPU Perf compared against same or similar code running on single CPU machine Performance measured internally or independently

MD Average Speedups

The blue node contains Dual E5-2687W CPUs 10 (8 Cores per CPU).

The green nodes contain Dual E5-2687W CPUs

(8 Cores per CPU) and 1 or 2 NVIDIA K10, K20, 8 or K20X GPUs.

6

4 Performance Performance Relative to CPU Only 2

0 CPU CPU + K10 CPU + K20 CPU + K20X CPU + 2x K10 CPU + 2x K20 CPU + 2x K20X

Average speedup calculated from 4 AMBER, 3 NAMD, 3 LAMMPS, and 1 GROMACS test cases. Error bars show the maximum and minimum speedup for each hardware configuration. Molecular Dynamics (MD) Applications Features Application GPU Perf Release Status Notes/Benchmarks Supported

AMBER 12, GPU Revision Support 12.2 PMEMD Explicit Solvent & GB > 100 ns/day JAC Released http://ambermd.org/gpus/benchmarks. AMBER Implicit Solvent NVE on 2X K20s Multi-GPU, multi-node htm#Benchmarks

2x C2070 equals Release C37b1; Implicit (5x), Explicit (2x) Solvent Released 32-35x X5667 http://www.charmm.org/news/c37b1.html#pos CHARMM via OpenMM Single & Multi-GPU in single node CPUs tjump

Two-body Forces, Link-cell Pairs, Source only, Results Published Release V 4.04 Ewald SPME forces, 4x http://www.stfc.ac.uk/CSE/randd/ccg/softwar DL_POLY Multi-GPU, multi-node Shake VV e/DL_POLY/25526.aspx

165 ns/Day DHFR Released Release 4.6.3; 1st Multi-GPU support Implicit (5x), Explicit (2x) on GROMACS Multi-GPU, multi-node www.gromacs.org 4X C2075s

http://lammps.sandia.gov/bench.html#desktop Lennard-Jones, Gay-Berne, 3.5-18x on Released and LAMMPS Tersoff & many more potentials ORNL Titan Multi-GPU, multi-node http://lammps.sandia.gov/bench.html#titan

4.0 ns/day Released Full electrostatics with PME and NAMD 2.9 F1-ATPase on 100M atom capable NAMD most simulation features http://www.ks.uiuc.edu/Research/namd/ 1x K20X Multi-GPU, multi-node GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison New/Additional MD Applications Ramping Features Application GPU Perf Release Status Notes Supported

Production biomolecular dynamics (MD) software specially 150 ns/day DHFR on 1x Released Written for use only on GPUs optimized to run on GPUs ACEMD K20 Single and Multi-GPUs http://www.acellera.com/

Bonded, pair, excluded interactions; Van Released, Version der Waals, electrostatic, nonbonded far TBD http://www.schrodinger.com/productpage/14/3/ DESMOND 3.4.0/0.7.2 interactions

Powerful distributed computing molecular Depends upon number of Released http://folding.stanford.edu dynamics system; implicit solvent and Folding@Home GPUs GPUs and CPUs GPUs get 4X the points of CPUs folding

High-performance all-atom biomolecular Depends upon number of http://www.gpugrid.net/ Released GPUGrid.net simulations; explicit solvent and binding GPUs

Simple fluids and binary mixtures (pair Up to 66x on 2090 vs. 1 Released, Version 0.2.0 http://halmd.org/benchmarks.html#supercooled-binary- potentials, high-precision NVE and NVT, HALMD CPU core Single GPU mixture-kob-andersen dynamic correlations)

Kepler 2X faster than Released, Version 0.11.3 http://codeblue.umich.edu/hoomd-blue/ Written for use only on GPUs HOOMD-Blue Fermi Single and Multi-GPU on 1 node Multi-GPU w/ MPI in late 2013

mdcore TBD TBD Released, Version 0.1.7 http://mdcore.sourceforge.net/download.html

Implicit: 127-213 ns/day Library and application for molecular dynamics on high- Implicit and explicit solvent, custom Released, Version 5.1 Explicit: 18-55 performance OpenMM forces Multi-GPU ns/day DHFR https://simtk.org/home/openmm_suite GPU Perf compared against Multi-core x86 CPU socket. GPU Perf benchmarked on GPU supported features and may be a kernel to kernel perf comparison Built from Ground Up for GPUs Computational Chemistry

What Study disease & discover drugs Predict drug and protein interactions

Speed of simulations is critical GPU READY Why APPLICATIONS Enables study of: ACEMD Longer timeframes AMBER Larger systems DL_PLOY More simulations GAMESS GROMACS How GPUs increase throughput & accelerate simulations LAMMPS NAMD AMBER 11 Application NWChem 4.6x performance increase with 2 GPUs with Q-CHEM only a 54% added cost* Quantum Espresso TeraChem

• AMBER 11 Cellulose NPT on 2x E5670 CPUS + 2x Tesla C2090s (per node) vs. 2xcE5670 CPUs (per node) • Cost of CPU node assumed to be $9333. Cost of adding two (2) 2090s to single node is assumed to be $5333 AMBER 12

26 Ross Walker (AMBER developer) video

27 Performance

AMBER 12

SAN DIEGO SUPERCOMPUTER CENTER 28 Explicit Solvent Performance (JAC-DHFR NVE Production Benchmark)

2xK20X 102.13 1xK20X 89.13 2xK20 95.59 1xK20 81.09 4xK10 (2 card) 97.95 3xK10 (1.5 card) 82.70 2xK10 (1 card) 67.02 1xK10 (0.5 card) 52.94 Single < $1000 GPU 2xM2090 (1 node) 58.28 1xM2090 43.74 1xGTX Titan 111.34 4xGTX680 (1 node) 118.88 3xGTX680 (1 node) 101.31 2xGTX680 (1 node) 86.84 1xGTX680 72.49 1xGTX580 54.46 96xE5-2670 (6 nodes) 41.55 64xE5-2670 (4 nodes) 58.19 48xE5-2670 (3 nodes) 50.25 32xE5-2670 (2 nodes) 38.49 16xE5-2670 (1 node) 21.13 0.00 20.00 40.00 60.00 80.00 100.00 120.00 140.00

Kepler K20 GPU Kepler K10 GPU M2090 GPU GTX GPU Gordon CPU

SAN DIEGO SUPERCOMPUTER CENTER Throughput in ns/day 29 Protein Folding Simulation With AMBER Accelerated By GPUs 2.95x Faster

168 ns/day 585 ns/day

2x8xE5-2670 CPU 2x8xE5-2670 CPU + Tesla K20X GPU

Data courtesy of AMBER.org AMBER Performance Example (TRP Cage) CPU 2x8xE5-2670 GPU K20

SAN DIEGO SUPERCOMPUTER CENTER 31 Historical Single Node / Single GPU AMBER Performance (DHFR Production NVE) PMEMD Peformance 120

100

80

GPU

60 CPU ns/day Linear (GPU) Linear (CPU) 40

20

0 2007 2008 2009 2010 2011 2012 2013 2014

SAN DIEGO SUPERCOMPUTER CENTER 32 Interactive MD? • Single nodes are now fast enough, GPU enabled cloud nodes actually make sense as a back end now.

SAN DIEGO SUPERCOMPUTER CENTER 33 Cost Comparison 4 simultaneous simulations, 23,000 atoms, 250ns each, 5 days maximum time to solution.

Traditional Cluster GPU Workstation Nodes Required 12 1 (4 GPUs) Interconnect QDR IB None Time to complete simulations 4.98 days 2.25 days

Power Consumption 5.7 kW (681.3 kWh) 1.0 kW (54.0 kWh) System Cost (per day) $96,800 ($88.40) $5200 ($4.75)

Simulation Cost (681.3 * 0.18) + (88.40 * 4.98) (54.0 * 0.18) + (4.75 * 2.25)

$562.87 $20.41

>25x cheaper AND solution obtained in less than half the time

SAN DIEGO SUPERCOMPUTER CENTER 34 Kepler - Our Fastest Family of GPUs Yet

30.00 Factor IX Running AMBER 12 GPU Support Revision 12.1 25.39 25.00 The blue node contains Dual E5-2687W CPUs 22.44 (8 Cores per CPU). 7.4x The green nodes contain Dual E5-2687W CPUs

20.00 18.90 (8 Cores per CPU) and either 1x NVIDIA M2090, 6.6x 1x K10 or 1x K20 for the GPU

15.00

11.85 5.6x

Nanoseconds Day / 10.00 3.5x 5.00 3.42

0.00 Factor IX 1 CPU Node 1 CPU Node + 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X M2090 GPU speedup/throughput increased from 3.5x (with M2090) to 7.4x (with K20X) when compared to a CPU only node 35 K10 Accelerates Simulations of All Sizes

30 Running AMBER 12 GPU Support Revision 12.1

The blue node contains Dual E5-2687W CPUs 24.00 25 (8 Cores per CPU).

19.98 The green nodes contain Dual E5-2687W CPUs 20 (8 Cores per CPU) and 1x NVIDIA K10 GPU

15

10

Speedup Speedup Compared to CPU Only 5.50 5.53 5.04 5 2.00

0 CPU TRPcage JAC NVE Factor IX NVE Cellulose NVE Myoglobin Nucleosome All GB PME PME PME GB GB Gain 24x performance by adding just 1 GPU Nucleosome when compared to dual CPU performance K20 Accelerates Simulations of All Sizes 30.00 28.00 Running AMBER 12 GPU Support Revision 12.1 25.56 SPFP with CUDA 4.2.9 ECC Off 25.00

The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) 20.00 Each green nodes contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPUs

15.00

10.00 7.28

6.50 6.56 Speedup Speedup Compared to CPU Only

5.00 2.66 1.00 0.00 CPU All TRPcage GB JAC NVE PME Factor IX NVE Cellulose NVE Myoglobin GB Nucleosome Molecules PME PME GB Gain 28x throughput/performance by adding just one K20 GPU Nucleosome when compared to dual CPU performance

37 K20X Accelerates Simulations of All Sizes 35 31.30 Running AMBER 12 GPU Support Revision 12.1

30 28.59 The blue node contains Dual E5-2687W CPUs (8 Cores per CPU). 25 The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA K20X GPU 20

15

10 8.30

7.15 7.43 Speedup Speedup Compared to CPU Only

5 2.79

0 CPU TRPcage JAC NVE Factor IX NVE Cellulose NVE Myoglobin Nucleosome All Molecules GB PME PME PME GB GB

Gain 31x performance by adding just one K20X GPU Nucleosome when compared to dual CPU performance

38 K10 Strong Scaling over Nodes

Cellulose 408K Atoms (NPT) Running AMBER 12 with CUDA 4.2 ECC Off 6

The blue nodes contains 2x Intel X5670 CPUs (6 Cores per CPU) 5

The green nodes contains 2x Intel X5670

CPUs (6 Cores per CPU) plus 2x NVIDIA 4 K10 GPUs 2.4x

3 CPU Only 3.6x With GPU

Nanoseconds Day / 2 5.1x 1

0 1 2 4 Cellulose Number of Nodes GPUs significantly outperform CPUs while scaling over multiple nodes K20 Scaling over 8 Nodes

Cellulose (NVE) Running AMBER 12 GPU Support 9 Revision 18

8 The blue nodes contain Dual E5- 2687W CPUs (8 Cores per CPU). 7 1.2x The green nodes contain Dual E5- 6 1.9x 2687W CPUs (8 Cores per CPU) and 5 2x NVIDIA K20 GPUs 3.1x CPU Only 4 5.3x With GPU

Nanoseconds/Day 3

2

1 Cellulose 0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes K20 Scaling over 8 Nodes

Factor IX Running AMBER 12 GPU Support 35 Revision 18

The blue nodes contain Dual E5- 30 1.3x 2687W CPUs (8 Cores per CPU). 1.8x 25 The green nodes contain Dual E5- 2.8x 2687W CPUs (8 Cores per CPU) and 20 2x NVIDIA K20 GPUs 4.7x CPU Only 15

With GPU Nanoseconds/Day

10

5 Factor IX

0 1 2 4 8 Number of Nodes GPUs significantly outperform CPUs while scaling over multiple nodes K20 Scaling over 8 Nodes

JAC (NVE) Running AMBER 12 GPU Support 140 Revision 18

120 The blue nodes contain Dual E5- 2687W CPUs (8 Cores per CPU).

100

The green nodes contain Dual E5- 1.7x 1.8x 2687W CPUs (8 Cores per CPU) and 80 2x NVIDIA K20 GPUs 2.4x CPU Only 60

4x With GPU Nanoseconds/Day 40

20

0 DHFR 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes K20X Scaling over 8 Nodes

Cellulose (NVE) Running AMBER 12 GPU Support Revision 18 9 The blue nodes contain Dual E5- 8 2687W CPUs (8 Cores per CPU). 1.3x 7 The green nodes contain Dual E5-

2x 2687W CPUs (8 Cores per CPU) and

6 2x NVIDIA K20X GPUs 3.3x 5

5.8x CPU Only 4 With GPU

Nanoseconds/Day 3

2

1 Cellulose 0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes K20X Scaling over 8 Nodes

Factor IX Running AMBER 12 GPU Support Revision 18 40 The blue nodes contain Dual E5- 35 2687W CPUs (8 Cores per CPU).

1.4x 30 The green nodes contain Dual E5- 2687W CPUs (8 Cores per CPU) and 1.9x 25 2x NVIDIA K20X GPUs 2.9x 20 5.1x CPU Only With GPU

15 Nanoseconds/Day

10

5 Factor IX

0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes K20X Scaling over 8 Nodes

JAC (NVE) Running AMBER 12 GPU Support Revision 18 140 The blue nodes contain Dual E5- 120 2687W CPUs (8 Cores per CPU).

The green nodes contain Dual E5- 100 1.7x 2687W CPUs (8 Cores per CPU) and 1.9x 2x NVIDIA K20X GPUs 80 2.5x 4.3x CPU Only 60

With GPU Nanoseconds/Day 40

20 DHFR

0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes Kepler – Universally Faster

9 Running AMBER 12 GPU Support Revision 12.1

8 The CPU Only node contains Dual E5-2687W

CPUs (8 Cores per CPU). 7

The Kepler nodes contain Dual E5-2687W CPUs 6 (8 Cores per CPU) and 1x NVIDIA K10, K20, or K20X GPUs 5 JAC Factor IX 4 Cellulose

3

2 Speedups Speedups Compared CPU to Only

1

0 CPU Only CPU + K10 CPU + K20 CPU + K20X Cellulose

The Kepler GPUs accelerated all simulations, up to 8x K10 Extreme Performance

Running AMBER 12 GPU Support Revision 12.1 JAC 23K Atoms (NVE) 120 The blue node contains Dual E5-2687W CPUs (8 Cores per CPU). 97.99 100 The green node contain Dual E5-2687W CPUs

(8 Cores per CPU) and 2x NVIDIA K10 GPUs

80

60

Nanoseconds Day / 40

20 12.47

0 DHFR 1 Node 1 Node Gain 7.8X performance by adding just 2 GPUs when compared to dual CPU performance K20 Extreme Performance

DHRF JAC 23K Atoms (NVE) Running AMBER 12 GPU Support Revision 12.1 120 SPFP with CUDA 4.2.9 ECC Off

The blue node contains 2x Intel E5-2687W CPUs 95.59 100 (8 Cores per CPU)

Each green node contains 2x Intel E5-2687W

80 CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPU

60

40 Nanoseconds Day /

20 12.47

0 1 Node 1 Node DHFR

Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance

48 Replace 8 Nodes with 1 K20 GPU

90.00 35000 $32,000.00 Running AMBER 12 GPU Support Revision 81.09 12.1 SPFP with CUDA 4.2.9 ECC Off 80.00 30000 The eight (8) blue nodes each contain 2x 70.00 65.00 Intel E5-2687W CPUs (8 Cores per CPU) 25000 60.00 Each green node contains 2x Intel E5- 2687W CPUs (8 Cores per CPU) plus 1x 50.00 20000 NVIDIA K20 GPU

Note: Typical CPU and GPU node pricing 40.00 15000 used. Pricing may vary depending on node configuration. Contact your preferred HW 30.00 vendor for actual pricing. 10000 20.00 $6,500.00

5000 10.00

0.00 0 Nanoseconds/Day Cost Cut down simulation costs to ¼ and gain higher performance DHFR

49 Replace 7 Nodes with 1 K10 GPU

Performance on JAC NVE Cost Running AMBER 12 GPU Support Revision 12.1 SPFP with CUDA 4.2.9 ECC Off 80 $35,000.00 $32,000 The eight (8) blue nodes each contain 2x 70 $30,000.00 Intel E5-2687W CPUs (8 Cores per CPU)

60 The green node contains 2x Intel E5-2687W $25,000.00

CPUs (8 Cores per CPU) plus 1x NVIDIA 50 K10 GPU $20,000.00 40 Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node $15,000.00 configuration. Contact your preferred HW 30

vendor for actual pricing. Nanoseconds Day / $10,000.00 20 $7,000

10 $5,000.00

0 $0.00 CPU Only GPU Enabled CPU Only GPU Enabled

Cut down simulation costs to ¼ and increase performance by 70% DHFR

50 Extra CPUs decrease Performance

Cellulose NVE Running AMBER 12 GPU Support Revision 12.1

8 The orange bars contains one E5-2687W CPUs (8 Cores per CPU). 7 The blue bars contain Dual E5-2687W CPUs (8

6 Cores per CPU)

5

4 1 E5-2687W 2 E5-2687W

3 Nanoseconds Day /

2

CPUs 2 GPUs 2 CPUs

1 CPU 2 GPUs 2 CPU 1 2 1

0 Cellulose CPU Only CPU with dual K20s

When used with GPUs, dual CPU sockets perform worse than single CPU sockets. Kepler - Greener Science

Running AMBER 12 GPU Support Revision 12.1 Energy used in simulating 1 ns of DHFR JAC 2500 The blue node contains Dual E5-2687W CPUs (150W each, 8 Cores per CPU).

The green nodes contain Dual E5-2687W CPUs 2000 Lower is better (8 Cores per CPU) and 1x NVIDIA K10, K20, or K20X GPUs (235W each).

1500 Energy Expended

1000 = Power x Time Energy Energy Expended (kJ)

500

0 CPU Only CPU + K10 CPU + K20 CPU + K20X The GPU Accelerated systems use 65-75% less energy Recommended GPU Node Configuration for AMBER Computational Chemistry Workstation or Single Node Configuration # of CPU sockets 2

Cores per CPU socket 4+ (1 CPU core drives 1 GPU)

CPU speed (Ghz) 2.66+ System memory per node (GB) 16

GPUs Kepler K10, K20, K20X

1-2 # of GPUs per CPU socket (4 GPUs on 1 socket is good to do 4 fast serial GPU runs)

GPU memory preference (GB) 6 GPU to CPU connection PCIe 2.0 16x or higher Server storage 2 TB

Network configuration Infiniband QDR or better 53 Scale to multiple nodes with same single node configuration Benefits of GPU AMBER Accelerated Computing

Faster than CPU only systems in all tests

Most major compute intensive aspects of classical MD ported

Large performance boost with marginal price increase

Energy usage cut by more than half

GPUs scale well within a node and over multiple nodes

K20 GPU is our fastest and lowest power high performance GPU yet

Try GPU accelerated AMBER for free – www.nvidia.com/GPUTestDrive 54 NAMD 2.9 K20 Accelerates Simulations of All Sizes

3 Running NAMD 2.9 with CUDA 4.0 ECC Off 2.7 2.6 The blue node contains 2x Intel E5-2687W CPUs 2.5 2.4 (8 Cores per CPU)

Each green node contains 2x Intel E5-2687W 2 CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPUs

1.5

1 Speedup Speedup Compared to CPU Only

0.5

0 CPU All Molecules ApoA1 F1-ATPase STMV

Gain 2.5x throughput/performance by adding just 1 GPU Apolipoprotein A1 when compared to dual CPU performance

56 Kepler - Our Fastest Family of GPUs Yet

4.50

ApoA1 4.00 Running NAMD version 2.9 4.00 The blue node contains Dual E5-2687W CPUs 3.57 (8 Cores per CPU). 3.45 3.50 2.9x The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and either 1x NVIDIA M2090, 3.00 1x K10 or 1x K20 for the GPU. 2.63 2.6x 2.50 2.5x 2.00

Nanoseconds/Day 1.9x 1.50 1.37

1.00

0.50

0.00 1 CPU Node 1 CPU Node + 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X Apolipoprotein A1 M2090 GPU speedup/throughput increased from 1.9x (with M2090) to 2.9x (with K20X) when compared to a CPU only node 57 Kepler - Our Fastest Family of GPUs Yet

1.4

F1-ATPase 1.25 Running NAMD version 2.9 1.2 The blue node contains Dual E5-2687W CPUs 1.12 (8 Cores per CPU).

1 0.96 The green nodes contain Dual E5-2687W 2.7x CPUs (8 Cores per CPU) and either 1x NVIDIA 0.83 M2090, 1x K10 or 1x K20 for the GPU. 0.8 2.4x

2.1x 0.6

0.46 1.8x Nanoseconds/Day

0.4

0.2

F1-ATPase 0 1 CPU Node 1 CPU Node + M2090 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X

GPU speedup/throughput increased from 1.8x (with M2090) to 2.7x (with K20X) when compared to a CPU only node 58 Kepler - Our Fastest Family of GPUs Yet

0.4

STMV 0.35 Running NAMD version 2.9 0.35 The blue node contains Dual E5-2687W CPUs 0.31 (8 Cores per CPU).

0.29 0.3 The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and either 1x NVIDIA

3.0x 0.25 M2090, 1x K10 or 1x K20 for the GPU. 0.22 2.7x 0.2 2.5x

Nanoseconds/Day 0.15 0.12 1.9x

0.1

0.05

0 Satellite Tobacco Mosaic Virus 1 CPU Node 1 CPU Node + M2090 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X GPU speedup/throughput increased from 1.9x (with M2090) to 3.0x (with K20X) when compared to a CPU only node 59 Kepler - Our Fastest Family of GPUs Yet

8 Running NAMD version 2.9 ApoA1 7.14 The blue node contains Dual E5-2687W CPUs 7 6.67 (8 Cores per CPU). 6.25 The green nodes contain Dual E5-2687W 6 5.2x CPUs (8 Cores per CPU) and either 2x NVIDIA 5.00 M2090, K10, K20, or K20X for the GPU. 5 4.9x

4 4.6x

Nanoseconds/Day 3 3.6x

2 1.37

1 Apolipoprotein A1 0 2 CPU Node 2 CPU Node + 2xM2090 2 CPU Node + 2xK10 2 CPU Node + 2xK20 2 CPU Node + 2xK20X

60 Kepler - Our Fastest Family of GPUs Yet

2.5 Running NAMD version 2.9 F1-ATPase 2.22 The blue node contains Dual E5-2687W CPUs 2.04 (8 Cores per CPU).

2 The green nodes contain Dual E5-2687W 1.79 CPUs (8 Cores per CPU) and either 2x NVIDIA 4.8x M2090, K10, K20, or K20X for the GPU.

1.56

1.5 4.4x

3.9x

1 3.4x Nanoseconds/Day

0.46 0.5

F1-ATPase

0 2 CPU Node 2 CPU Node + 2xM2090 2 CPU Node + 2xK10 2 CPU Node + 2xK20 2 CPU Node + 2xK20X

61 Kepler - Our Fastest Family of GPUs Yet

0.7 Running NAMD version 2.9 STMV 0.61 The blue node contains Dual E5-2687W CPUs 0.6 (8 Cores per CPU). 0.56 0.52 The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and either 2x NVIDIA 0.5 5.3x M2090, K10, K20, or K20X for the GPU.

0.41 0.4 4.9x

0.3 4.5x Nanoseconds/Day 3.6x 0.2

0.12 0.1

Satellite Tobacco Mosaic Virus

0 2 CPU Node 2 CPU Node + 2 CPU Node + 2xK10 2 CPU Node + 2xK20 2 CPU Node + 2xK20X 2xM2090

62 Kepler – Universally Faster

6 Running NAMD version 2.9

The CPU Only node contains Dual E5-2687W 5 CPUs (8 Cores per CPU). 5.1x The Kepler nodes contain Dual E5-2687W CPUs 4 4.7x (8 Cores per CPU) and 1 or two NVIDIA K10, 4.3x K20, or K20X GPUs. F1-ATPase 3 ApoA1 STMV 2.9x 2 2.6x

2.4x Speedup Speedup Compared to CPU Only

1

0 CPU Only 1x K10 1x K20 1x K20X 2x K10 2x K20 2x K20X | Kepler nodes use Dual CPUs | F1-ATPase The Kepler GPUs accelerate all simulations, up to 5.1x Average acceleration printed in bars ApoA1 Relative Performance 6

5.2 Running NAMD version 2.9 5 The blue node contains Single or Dual E5-2687W CPUs (8 Cores per CPU). 4.1 4 3.6 The green nodes contain Single or Dual E5-2687W CPUs (8 Cores per CPU) Relative to and either 1x or 2x NVIDIA M2090 or 2x CPU 2.9 3 2.7 K20X for the GPU.

1.9 2

1.0 1 0.4

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU + 1xCPU + 1xCPU + 1xCPU + 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU Apolipoprotein A1 E5-2687w K20x M2090 K20x CPU Single-Socket Dual-Socket

64 F1-ATPase Relative Performance 6

5.1 Running NAMD version 2.9 5 The blue node contains Single or Dual E5-2687W CPUs (8 Cores per CPU). 4.0 4 The green nodes contain Single or Dual 3.4 E5-2687W CPUs (8 Cores per CPU) Relative to and either 1x or 2x NVIDIA M2090 or 2.9 2x CPU 3 2.7 K20X for the GPU.

2 1.8

1.0 1 0.4

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU + 1xCPU + 1xCPU + 1xCPU + F1-ATPase 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU E5-2687w K20x M2090 K20x CPU Single-Socket Dual-Socket

65 STMV Relative Performance 6

5.1 Running NAMD version 2.9 5 The blue node contains Single or Dual E5-2687W CPUs (8 Cores per CPU). 4.0 4 The green nodes contain Single or Dual 3.4 E5-2687W CPUs (8 Cores per CPU) Relative to 2.9 and either 1x or 2x NVIDIA M2090 or 2x CPU 3 2.7 K20X for the GPU.

2 1.8

1.0 1 0.4

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU + 1xCPU + 1xCPU + 1xCPU + Satellite Tobacco Mosaic 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU Virus E5-2687w K20x M2090 K20x CPU Single-Socket Dual-Socket

66 NAMD CPU and GPU Scaling

Courtesy of Jim Phillips @ Beckman Institute UIUC

67 Outstanding Strong Scaling with Multi-STMV

Running NAMD version 2.9 100 STMV on Hundreds of Nodes Each blue XE6 CPU node contains 1x 1.2 AMD 1600 Opteron (16 Cores per CPU). Fermi XK6 1 Each green XK6 CPU+GPU node CPU XK6 2.7x contains 1x AMD 1600 Opteron (16 Cores per CPU) and an additional 1x 0.8 NVIDIA X2090 GPU.

2.9x 0.6

Nanoseconds Day / 0.4

0.2 3.6x 3.8x 0 32 64 128 256 512 640 768 # of Nodes Concatenation of 100 Satellite Tobacco Mosaic Virus Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers Replace 3 Nodes with 1 2090 GPU

Running NAMD version 2.9 F1-ATPase Each blue node contains 2x Intel Xeon X5550 CPUs (4 Cores per CPU). 4 CPU Nodes 0.8 9000 0.74 $8,000 The green node contains 2x Intel Xeon X5550 1 CPU Node +8000 0.7 1x M2090 GPUs CPUs (4 Cores per CPU) and 1x NVIDIA M2090 0.63 GPU 7000 0.6 Note: Typical CPU and GPU node pricing used. 6000 0.5 Pricing may vary depending on node configuration. 5000 Contact your preferred HW vendor for actual 0.4 $4,000 pricing. 4000 0.3 3000 0.2 2000

0.1 1000

0 0 Nanoseconds/Day Cost

Speedup of 1.2x for 50% the cost F1-ATPase K20 - Greener: Twice The Science Per Watt

1200000 Energy Used in Simulating 1 Nanosecond of ApoA1 Running NAMD version 2.9 1000000 Each blue node contains Dual E5-2687W CPUs (95W, 4 Cores per CPU).

800000 Each green node contains 2x Intel Xeon Lower is better X5550 CPUs (95W, 4 Cores per CPU) and 2x NVIDIA K20 GPUs (225W per GPU)

600000 Energy Expended

Energy Energy Expended (kJ) 400000 = Power x Time

200000

0 1 Node 1 Node + 2x K20 Cut down energy usage by ½ with GPUs

70 Kepler - Greener: Twice The Science/Joule Energy used in simulating 1 ns of SMTV 250000 Running NAMD version 2.9

The blue node contains Dual E5-2687W 200000 CPUs (150W each, 8 Cores per CPU).

Lower is better The green nodes contain Dual E5-2687W 150000 CPUs (8 Cores per CPU) and 2x NVIDIA K10, K20, or K20X GPUs (235W each).

100000 Energy Expended

Energy Energy Expended (kJ) = Power x Time

50000

0 CPU Only CPU + 2 K10s CPU + 2 K20s CPU + 2 K20Xs

Cut down energy usage by ½ with GPUs

Satellite Tobacco Mosaic Virus Recommended GPU Node Configuration for NAMD Computational Chemistry

Workstation or Single Node Configuration

# of CPU sockets 2 Cores per CPU socket 6+ CPU speed (Ghz) 2.66+ System memory per socket (GB) 32

Kepler K10, K20, K20X GPUs Fermi M2090, M2075, C2075

# of GPUs per CPU socket 1-2 GPU memory preference (GB) 6 GPU to CPU connection PCIe 2.0 or higher

Server storage 500 GB or higher

Network configuration Gemini, InfiniBand

72 Scale to multiple nodes with same single node configuration Summary/Conclusions Benefits of GPU Accelerated Computing

Faster than CPU only systems in all tests

Large performance boost with small marginal price increase

Energy usage cut in half

GPUs scale very well within a node and over multiple nodes

Tesla K20 GPU is our fastest and lowest power high performance GPU to date

Try GPU accelerated NAMD for free – www.nvidia.com/GPUTestDrive 73 LAMMPS More Science for Your Money (LAMMPS) Embedded Atom Model 6 5.5 Blue node uses 2x E5-2687W (8 Cores and 150W per CPU). 5 4.5 Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K10, K20, or K20X GPUs (235W). 4

3.3 2.92 3 2.47

2 1.7 Speedup Speedup Compared to CPU Only 1

0 CPU Only CPU + 1x CPU + 1x CPU + 1x CPU + 2x CPU + 2x CPU + 2x K10 K20 K20X K10 K20 K20X Experience performance increases of up to 5.5x with Kepler GPU nodes. K20X, the Fastest GPU Yet (LAMMPS)

7 Blue node uses 2x E5-2687W (8 Cores and 150W per CPU).

6 Green nodes have 2x E5-2687W and 2

NVIDIA M2090s or K20X GPUs (235W). 5

4

3

2 Speedup Speedup Relative to CPU Alone

1

0 CPU Only CPU + 2x M2090 CPU + K20X CPU + 2x K20X

Experience performance increases of up to 6.2x with Kepler GPU nodes. One K20X performs as well as two M2090s Get a CPU Rebate to Fund Part of Your GPU Budget

Acceleration in LAMMPS Loop Time Computation by Additional GPUs 20 The blue node contains Dual X5670 CPUs 18.2 (6 Cores per CPU). 18

16 The green nodes contain Dual X5570 CPUs (4 Cores per CPU) and 1-4 NVIDIA M2090 14 12.9 GPUs. 12 9.88 10

8

6 5.31 Normalized to CPU Only 4

2

0 1 Node 1 Node + 1x M2090 1 Node + 2x M2090 1 Node + 3x M2090 1 Node + 4x M2090

Increase performance 18x when compared to CPU-only nodes

Cheaper CPUs used with GPUs AND still faster overall performance when compared to more expensive CPUs! Excellent Strong Scaling on Large Clusters

LAMMPS Gay-Berne 134M Atoms

600 GPU Accelerated XK6 500

CPU only XE6 400 3.55x 300

200

LoopTime (seconds) 3.48x 3.45x 100

0 300 400 500 600 700 800 900 Nodes From 300-900 nodes, the NVIDIA GPU-powered XK6 maintained 3.5x performance compared to XE6 CPU nodes

Each blue Cray XE6 Nodes have 2x AMD Opteron CPUs (16 Cores per CPU) Each green Cray XK6 Node has 1x AMD Opteron 1600 CPU (16 Cores per CPU) and 1x NVIDIA X2090 GPUs Sustain 5x Performance for Weak Scaling

LAMMPS Weak Scaling with 32K Atoms per Node 45

40

35

30 6.7x 5.8x 4.8x 25

20

15 LoopTime (seconds) 10

5

0 1 8 27 64 125 216 343 512 729 Nodes Performance of 4.8x-6.7x with GPU-accelerated nodes when compared to CPUs alone

Each blue Cray XE6 Node have 2x AMD Opteron CPUs (16 Cores per CPU) Each green Cray XK6 Node has 1x AMD Opteron 1600 CPU (16 Core per CPU) and 1x NVIDIA X2090 Faster, Greener — Worth It! (LAMMPS)

Energy Consumed in one loop of EAM 140

120 GPU-accelerated computing uses Lower is better 53% less energy than CPU only

100

80

60 Energy Expended = Power x Time Power calculated by combining the component’s TDPs Energy Energy Expended (kJ) 40

20

0 1 Node 1 Node + 1 K20X 1 Node + 2x K20X

Blue node uses 2x E5-2687W (8 Cores and 150W per CPU) and CUDA 4.2.9. Green nodes have 2x E5-2687W and 1 or 2 NVIDIA K20X GPUs (235W) running CUDA 5.0.35.

Try GPU accelerated LAMMPS for free – www.nvidia.com/GPUTestDrive Accelerate LAMMPS Simulations with GPUs

7 6.6 “Summary of best speedups 6 versus running on a single XK7 CPU for CPU-only and 5 accelerated runs. Simulation is 400 timesteps for a 1

4 million molecule droplet. The

XK7 w/out GPU speedups are calculated

XK7 w/ GPU based on the single node Speedup 3 loop time of 440.3 seconds.”

2

1.0 1 *Brown, W.M., Yamada, M., “Implementing Molecule Dynamics on Hybrid High Performance Computers 0 1 Node – Three-Body Potentials,” Computer Physics Communications (2013, submitted) Accelerate LAMMPS Simulations with GPUs

250

211 “Summary of best speedups versus running on a single 200 XK7 CPU for CPU-only and accelerated runs. Simulation is 400 timesteps for a 1

150 million molecule droplet. The

speedups are calculated XK7 w/out GPU based on the single node

XK7 w/ GPU Speedup 100 loop time of 440.3 seconds.”

50 41.6

*Brown, W.M., Yamada, M., “Implementing Molecule Dynamics on Hybrid High Performance Computers 0 – Three-Body Potentials,” Computer 64 Nodes Physics Communications (2013, submitted) Molecular Dynamics with LAMMPS on a Hybrid Cray Supercomputer

W. Michael Brown National Center for Computational Sciences Oak Ridge National Laboratory

NVIDIA Technology Theater, Supercomputing 2012 November 14, 2012 Early Kepler LAMMPS Benchmarks on Titan 32.00 4 16.00 XK7+GPU 8.00

4.00 XK6 3

)

s )

( 2.00 s XK6+GPU

Atomic Fluid (

e

1.00 e

m 2 i

XK7+GPU m i

T 0.50 T 0.25 XK6 0.13 1 0.06 XK6+GPU 0.03 0

1 2 4 8 16 32 64 128 1 4 6 4 6 4 6 4 Nodes 1 6 5 2 9 8 2 0 0 3 1 4 6 1 3.0 8.00 XK7+GPU 2.5 4.00

2.0

)

s

2.00 (

)

s e ( XK6 1.5

Bulk Copper

m e

1.00 i

T m

i 1.0 T 0.50 XK6+GPU 0.5 0.25 0.0

0.13 1 4 6 4 6 4 6 4 Nodes 1 6 5 2 9 8 2 0 0 3 1 4 6 1 2 4 8 16 32 64 128 1 Early Kepler LAMMPS Benchmarks on Titan 64.00 32 32.00 XK7+GPU 16

16.00

)

)

s (

8 s

8.00 (

Protein e

XK6 e

m

i m

4.00 i

T 4 T 2.00 XK6+GPU 2

1.00

0.50

1

1 4 6 4 6 6

4

4

1 6 5 9 8

1 2 4 8 16 32 64 128 Nodes 2

2 0

3

0

4

6 1 128.00 1 16 64.00 14 32.00 XK7+GPU 12

16.00

) )

s 10 s

( 8.00

(

e

Liquid Crystal 4.00 XK6 8 e

m

m

i

i T 2.00 6 T 1.00 4 0.50 XK6+GPU 0.25 2 0.13 0

1 4 6 4 6 4 6 4 1 2 4 8 16 32 64 128 Nodes 1 6 5 2 9 8 2 0 0 3 1 4 6 1 Early Titan XK6/XK7 LAMMPS Benchmarks

18 Speedup with Acceleration on XK6/XK7 Nodes 16 1 Node = 32K Particles 14 900 Nodes = 29M Particles 12 10 8 6 4 2 0 Atomic Fluid (cutoff Atomic Fluid (cutoff Bulk Copper Protein Liquid Crystal = 2.5σ) = 5.0σ) XK6 (1 Node) 1.92 4.33 2.12 2.6 5.82 XK7 (1 Node) 2.90 8.38 3.66 3.36 15.70 XK6 (900 Nodes) 1.68 3.96 2.15 1.56 5.60 XK7 (900 Nodes) 2.75 7.48 2.86 1.95 10.14 CPU and GPU Scaling in Water Simulations (LAMMPS)

“Simulation rates for a fixed- size 256K molecule water box in parallel (Left) and a scaled-size water box with 32K molecules per node (Right). Parallel simulation rates with ideal efficiency would have a slope equal to that of the green line in each plot.”

*Brown, W.M., Yamada, M., “Implementing Molecule Dynamics on Hybrid High Performance Computers – Three-Body Potentials,” Computer Physics Communications (2013, submitted) Recommended GPU Node Configuration for LAMMPS Computational Chemistry

Workstation or Single Node Configuration

# of CPU sockets 2 Cores per CPU socket 6+ CPU speed (Ghz) 2.66+ System memory per socket (GB) 32

GPUs Kepler K10, K20, K20X

# of GPUs per CPU socket 1-2 GPU memory preference (GB) 6 GPU to CPU connection PCIe 2.0 or higher

Server storage 500 GB or higher

Network configuration Gemini, InfiniBand

88 Scale to multiple nodes with same single node configuration GROMACS 4.6.1 and 4.6.3 Erik Lindahl (GROMACS developer) video Kepler – Our Fastest Family of GPUs Yet

16 GROMACS 4.6.1 Water (192K Atoms) Running GROMACS 4.6.1

14 The blue nodes contains either single or dual E5-2687W CPUs (8 Cores per CPU). 12

3.1x The green nodes contain either single or 10 2.8x dual E5-2687W CPUs (8 Cores per

CPU) and either 1x NVIDIA M2090, 1x K10 or 1x K20 for the GPU

8 3.2x ns/Day

6 1.7x 1.7x

4

2

0 1 CPU Node 1 CPU Node + M2090 1 CPU Node + K10 1 CPU Node + K20 1 CPU Node + K20X

Single Sandy Bridge CPU per node with single K10, K20 or K20X produces best performance

91 K20X, The Fastest Yet

192K Water Molecules 25 Running GROMACS version 4.6.1 and CUDA 5.0.35 22.27 The blue node contains 2 E5-2687W CPUs 20 (150W each, 8 Cores per CPU). 3.1x The green nodes contain 2 E5-2687W CPUs (8 15 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs 13.65 (235W each).

10 1.9x

Nanoseconds/Day 7.26

5

0 CPU CPU + K20 CPU + 2x K20X

Using K20X nodes increases performance by 3.1x Water K20 Increases Performance

Water (192K Atoms) 70 Running GROMACS 4.6.3 66.29 The blue nodes contain 1x E5-2687W CPU 60 (150W TDP, 8 Cores per CPU).

2.7x The green nodes contain 1x E5-2687W CPU 50 (8 Cores per CPU) and 1x NVIDIA K20 GPU

(235W TDP per GPU). 39.12 40 1 CPU/Node (1 CPU + 1x K20)/Node 30 2.8x

Nanoseconds/Day 21.46 20 2.9x 11.70 10 2.7x

0 1 2 4 8

Get up to 2.9x performance compared to CPU-only nodes 2x K20 Increases Performance

Water (192K Atoms) 80 Running GROMACS 4.6.3 75.53 The blue nodes contain 1x E5-2687W CPU 70 (150W TDP, 8 Cores per CPU).

60 The green nodes contain 1x E5-2687W CPU 3.1x (8 Cores per CPU) and 2x NVIDIA K20 GPU

(235W TDP per GPU). 50 48.02

40 1 CPU/Node 3.5x (1 CPU + 2x K20)/Node

30

Nanoseconds/Day 26.84

20 15.02 3.7x 3.5x 10

0 1 2 4 8

Get up to 3.7x performance compared to CPU-only nodes Great Scaling in Large Systems

Water (3072K Atoms) 6 Running GROMACS 4.6.3

The blue nodes contain 1x E5-2687W CPU 5.00 (150W TDP, 8 Cores per CPU). 5

The green nodes contain 1x E5-2687W CPU (8 Cores per CPU) and 1x NVIDIA K20 GPU 4 (235W TDP per GPU). 2.9x

3 1 CPU/Node (1 CPU + 1x K20)/Node

2.35 Nanoseconds/Day 2 2.8x 1.19 1 0.70 2.8x 2.9x

0 1 2 4 8 Get up to 2.9x performance compared to CPU-only nodes Great Scaling in Large Systems

Water (3072K Atoms) 7 Running GROMACS 4.6.3 6.46 The blue nodes contain 1x E5-2687W CPU 6 (150W TDP, 8 Cores per CPU).

The green nodes contain 1x E5-2687W CPU 5 (8 Cores per CPU) and 2x NVIDIA K20 GPU

3.8x (235W TDP per GPU).

4 3.34 1 CPU/Node (1 CPU + 2x K20)/Node 3

Nanoseconds/Day 4x 2 1.64

3.9x 1 0.84 3.5x

0 1 2 4 8

Get up to 4x performance compared to CPU-only nodes Higher Performance with GPUs

Water (3072K Atoms) 10 Running GROMACS 4.6.3 2 M2090/node The blue nodes contain single E5-2687W CPUs (8 Cores per CPU).

1 M2090/node The green nodes contain single E5-2687W ns/day CPUs (8 Cores per CPU) and either 1x or (log scale) 2x NVIDIA M2090 for the GPU. Single E5-2687w/node 1 1 2 4 8

0.1 # Nodes (log scale) Higher Performance with GPUs Water (3072K Atoms) 10 Running GROMACS 4.6.3 2 M2090/node

The blue nodes contain dual E5-2687W 1 M2090/node CPUs (8 Cores per CPU).

The green nodes contain dual E5-2687W Dual E5-2687w/node CPUs (8 Cores per CPU) and either 1x or 2x NVIDIA M2090 for the GPU. ns/day (log scale) 1 1 2 4 8

0.1 # Nodes (log scale) Relative Performance Water (3072K Atoms) 2.5 Running GROMACS 4.6.3 2.2 The blue nodes contain either single or dual E5- 2 2687W CPUs (8 Cores per CPU).

The green nodes contain either single or dual E5- 1.5 2687W CPUs (8 Cores per CPU) and either 1x or 2x NVIDIA M2090 for the GPU. Relative to 1.2 2x CPU 1.0 1

0.6 0.5

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU+ 1xCPU+ 1xCPU+ 1xCPU+ 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU E5-2687w K20x M2090 K20x CPU Single-Socket Dual-Socket GROMACS Scaling; 1 CPU & 2 GPUs /node

Water (192K Atoms) Running GROMACS 4.6.3 100 The blue nodes contain single E5-2687W 2 K20/node CPUs (8 Cores per CPU).

The green nodes contain single E5- 2687W CPUs (8 Cores per CPU) and 2x NVIDIA K20 for the GPU. Single E5-2687

ns/day 10 (log scale)

1 1 2 4 8 Nodes (log scale) GROMACS Scaling; 2 CPUs & 2 GPUs /node

Water (192K Atoms) 100 2 K20/node Running GROMACS 4.6.3

The blue nodes contain dual E5-2687W CPUs (8 Cores per CPU).

Dual E5-2687 The green nodes contain dual E5-2687W CPUs (8 Cores per CPU) and 2x NVIDIA K20 for the GPU.

ns/day 10 (log scale)

1 1 2 4 8 # Nodes (log scale) Relative Performance 3.5 Water (192K Atoms) Running GROMACS 4.6.3 3 2.9 The blue nodes contain either single or dual E5-2687W CPUs (8 2.5 Cores per CPU).

2.0 2.0 The green nodes contain either 2 single or dual E5-2687W CPUs (8 Relative to 1.7 Cores per CPU) and either 1x or 2x 2x CPU 1.6 1.5 NVIDIA M2090 and K20 for the GPU. 1.1 1.0 1

0.6 0.5

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU + 1xCPU + 1xCPU + 1xCPU + 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU E5-2687w K20 M2090 K20 CPU Single-Socket Dual-Socket GROMACS Scaling; 1 CPU & 2 GPUs /node

Water (3072K Atoms)

10 Running GROMACS 4.6.3 2 K20/node The blue nodes contain single E5-2687W CPUs (8 Cores per CPU).

The green nodes contain single E5- 2687W CPUs (8 Cores per CPU) and 2x NVIDIA K20 for the GPU.

Single E5-2687 ns/day 1 (log scale) 1 2 4 8

0.1 Nodes (log scale) GROMACS Scaling; 2 CPUs & 2 GPUs /node Water (3072K Atoms) 10 Running GROMACS 4.6.3 2 K20/node The blue nodes contain dual E5-2687W CPUs (8 Cores per CPU).

The green nodes contain dual E5- Dual E5-2687 2687W CPUs (8 Cores per CPU) and 2x NVIDIA K20 for the GPU.

ns/day 1 (log scale) 1 2 4 8

0.1 # Nodes (log scale) Relative Performance

3 Water (3072K Atoms) 2.8 Running GROMACS 4.6.3

2.5 The blue nodes contain either 2.2 single or dual E5-2687W CPUs (8 2.0 Cores per CPU). 2 1.8 The green nodes contain either Relative to 1.7 2x CPU single or dual E5-2687W CPUs (8 1.5 Cores per CPU) and either 1x or 2x NVIDIA M2090 and K20 for the 1.2 GPU. 1.0 1

0.6 0.5

0 1xCPU 2xCPU 1xCPU+ 1xCPU+ 1xCPU + 1xCPU + 1xCPU + 1xCPU + 1xGPU 2xGPU 1xGPU 2xGPU 1xGPU 2xGPU E5-2687w K20 M2090 K20 CPU Single-Socket Dual-Socket K20X Scaling over 8 Nodes Water (192K Atoms) Running GROMACS 4.6.3 100 The blue nodes contain Dual E5-2687W 90 CPUs (8 Cores per CPU).

80 The green nodes contain Dual E5-2687W 70 CPUs (8 Cores per CPU) and 2x NVIDIA 2.3x K20X GPUs 60

50 2.7x CPU Only

40 With GPU Nanoseconds/Day 30 3x 20 3x 10

0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes K20X Scaling over 8 Nodes Water (3072K Atoms) Running GROMACS 4.6.3 9 The blue nodes contain Dual E5-2687W 8 CPUs (8 Cores per CPU).

7 The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 2x NVIDIA

6 K20X GPUs 2.8x 5

CPU Only 4 2.9x With GPU Nanoseconds/Day 3

2 2.8x 1 2.8x

0 1 2 4 8 Number of Nodes

GPUs significantly outperform CPUs while scaling over multiple nodes Additional Strong Scaling on Larger System 128K Water Molecules 160 Running GROMACS 4.6 pre-beta with CUDA 4.1 140 Each blue node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU) 120

2x Each green node contains 1x Intel X5670 100 (95W TDP, 6 Cores per CPU) and 1x NVIDIA M2070 (225W TDP per GPU) 80 CPU Only With GPU 60

Nanoseconds Day / 2.8x 40

20 3.1x

0 8 16 32 64 128 Number of Nodes

Up to 128 nodes, NVIDIA GPU-accelerated nodes deliver 2-3x performance when compared to CPU-only nodes Replace 3 Nodes with 2 GPUs

Running GROMACS 4.6 pre-beta with ADH in Water (134K Atoms) CUDA 4.1

4 CPU Nodes 9 9000 The blue node contains 2x Intel X5550 8.36 CPUs (95W TDP, 4 Cores per CPU) $8,000 1 CPU Node + 8 2x M2090 GPUs8000 The green node contains 2x Intel X5550 6.7 7 $6,500 7000 CPUs (95W TDP, 4 Cores per CPU) and 2x NVIDIA M2090s as the GPU (225W TDP 6 6000 per GPU)

5 5000 Note: Typical CPU and GPU node pricing 4 4000 used. Pricing may vary depending on node configuration. Contact your preferred HW 3 3000 vendor for actual pricing.

2 2000

1 1000

0 0 Nanoseconds/Day Cost

Save thousands of dollars and perform 25% faster Greener Science

ADH in Water (134K Atoms) Running GROMACS 4.6 with CUDA 4.1 12000

The blue nodes contain 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU) 10000

Lower is better The green node contains 2x Intel X5550 CPUs, Consumed) 8000 4 Cores per CPU) and 2x NVIDIA M2090s GPUs (225W TDP per GPU)

6000

KiloJoules (

4000 Energy Expended = Power x Time

2000 Energy Energy Expended

0 4 Nodes 1 Node + 2x M2090 (760 Watts) (640 Watts)

In simulating each nanosecond, the GPU-accelerated system uses 33% less energy Recommended GPU Node Configuration for GROMACS Computational Chemistry

Workstation or Single Node Configuration # of CPU sockets 2 Cores per CPU socket 6+ CPU speed (Ghz) 2.66+

System memory per socket (GB) 32

GPUs Kepler K10, K20, K20X

1x Kepler-based GPUs (K20X, K20 or K10): need fast Sandy # of GPUs per CPU socket Bridge or perhaps the very fastest Westmeres, or high-end AMD Opterons

GPU memory preference (GB) 6

GPU to CPU connection PCIe 2.0 or higher Server storage 500 GB or higher

111 Network configuration Gemini, InfiniBand CHARMM Release C37b1 GPU Acceleration via OpenMM (GPUs supported on single node) GPUs Outperform CPUs

Daresbury Crambin 19.6k Atoms 70 Running CHARMM release C37b1 60 The blue nodes contains 44 X5667 CPUs 50 (95W, 4 Cores per CPU).

The green nodes contain 2 X5667 CPUs and 40 1 or 2 NVIDIA C2070 GPUs (238W each).

30 Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node

Nanoseconds Day / configuration. Contact your preferred HW 20 vendor for actual pricing.

10

0 44x X5667 2x X5667 + 1x C2070 2x X5667 + 2x C2070 $44,000 $3000 $4000

1 GPU = 15 CPUs More Bang for your Buck

Daresbury Crambin (19.6k Atoms), Running CHARMM release C37b1

The blue nodes contain 44 X5667 CPUs (95W, 4 cores per CPU). The green nodes contain 2 X5667 CPUs and 1 or 2 NVIDIA C2070 GPUs (238W each).

[Chart: performance/price scaled to the CPU-only system, for 44x X5667, 2x X5667 + 1x C2070, and 2x X5667 + 2x C2070.]

Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node configuration. Contact your preferred HW vendor for actual pricing.

Using GPUs delivers 10.6x the performance for the same cost.

Greener Science with NVIDIA
Energy Used in Simulating 1 ns, Daresbury G1nBP (61.2k Atoms)

Running CHARMM release C37b1

The blue nodes contain 64 X5667 CPUs (95W, 4 cores per CPU). The green nodes contain 2 X5667 CPUs and 1 or 2 NVIDIA C2070 GPUs (238W each).

[Chart: energy expended (kJ) per nanosecond simulated; lower is better. Energy expended = power x time. Configurations: 64x X5667, 2x X5667 + 1x C2070, 2x X5667 + 2x C2070.]

Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node configuration. Contact your preferred HW vendor for actual pricing.

Using GPUs will decrease energy use by 75%.

CHARMM Future Release with Native GPU Improvements (Multiple GPUs supported on multiple nodes)
Courtesy of Antti-Pekka Hynninen @ NREL

ACEMD (www.acellera.com)

470 ns/day on 1 GPU for L-Iduronic acid (1,362 atoms); 116 ns/day on 1 GPU for DHFR (23K atoms)

M. Harvey, G. Giupponi and G. De Fabritiis, "ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale," J. Chem. Theory Comput. 5, 1632 (2009). www.acellera.com

Supported features: NVT, NPT, PME, TCL, PLUMED, CAMSHIFT [1]

[1] M. J. Harvey and G. De Fabritiis, "An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware," J. Chem. Theory Comput. 5, 2371-2377 (2009). [2] For a list of selected references see http://www.acellera.com/acemd/publications

Quantum Chemistry Applications

Each entry lists: features supported; GPU perf; release status; notes.

ABINIT - Features: local Hamiltonian, non-local Hamiltonian, LOBPCG algorithm, diagonalization/orthogonalization; GPU perf: 1.3-2.7x; Release status: released, Version 7.4.1, multi-GPU support; Notes: www.abinit.org

ACES III - Features: integrating GPU scheduling into the SIAL programming language and SIP runtime environment; GPU perf: 10x on kernels; Release status: under development, multi-GPU support; Notes: http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/deumens_ESaccel_2012.pdf

ADF - Features: Fock matrix, Hessians; GPU perf: TBD; Release status: available Q1 2014, multi-GPU support; Notes: www.scm.com

BigDFT - Features: DFT; Daubechies wavelets, part of ABINIT; GPU perf: 2-5x; Release status: released, Version 1.7.0, multi-GPU support; Notes: http://bigdft.org

Casino - Features: code for performing quantum Monte Carlo (QMC) electronic structure calculations for finite and periodic systems; GPU perf: TBD; Release status: under development, multi-GPU support; Notes: http://www.tcm.phy.cam.ac.uk/~mdt26/casino.html

CASTEP - Features: TBD; GPU perf: TBD; Release status: under development; Notes: http://www.castep.org/Main/HomePage

CP2K - Features: DBCSR (sparse matrix multiply library); GPU perf: 2-7x; Release status: released, multi-GPU support; Notes: http://www.olcf.ornl.gov/wp-content/training/ascc_2012/friday/ACSS_2012_VandeVondele_s.pdf

GAMESS-US - Features: libqc with Rys quadrature algorithm, Hartree-Fock, MP2 and CCSD; GPU perf: 1.3-1.6x; 2.3-2.9x for HF; Release status: released, multi-GPU support; Notes: http://www.msg.ameslab.gov/gamess/index.html

GAMESS-UK - Features: (ss|ss)-type integrals within calculations using Hartree-Fock ab initio methods and density functional theory; supports organics & inorganics; GPU perf: 8x; Release status: released, Version 7.0, multi-GPU support; Notes: http://www.ncbi.nlm.nih.gov/pubmed/21541963

Gaussian - Features: joint PGI, NVIDIA & Gaussian collaboration; GPU perf: TBD; Release status: under development, multi-GPU support; Notes: http://www.gaussian.com/g_press/nvidia_press.htm

GPAW - Features: electrostatic Poisson equation, orthonormalizing of vectors, residual minimization method (RMM-DIIS); GPU perf: 8x; Release status: released, multi-GPU support; Notes: https://wiki.fysik.dtu.dk/gpaw/devel/projects/gpu.html, Samuli Hakala (CSC Finland) & Chris O'Grady (SLAC)

Jaguar - Features: investigating GPU acceleration; GPU perf: TBD; Release status: under development, multi-GPU support; Notes: Schrodinger, Inc., http://www.schrodinger.com/kb/278, http://www.schrodinger.com/productpage/14/7/32/

LATTE - Features: cuBLAS, SP2 algorithm; GPU perf: TBD; Release status: released, multi-GPU support; Notes: http://on-demand.gputechconf.com/gtc/2013/presentations/S3195-Fast-Quantum-Molecular-Dynamics-in-LATTE.pdf

MOLCAS - Features: cuBLAS support; GPU perf: 1.1x; Release status: released, Version 7.8, single GPU; additional GPU support coming in Version 8; Notes: www.molcas.org

MOLPRO - Features: density-fitted MP2 (DF-MP2), density-fitted local correlation methods (DF-RHF, DF-KS), DFT; GPU perf: 1.7-2.3x projected; Release status: under development, multiple GPUs; Notes: www.molpro.net, Hans-Joachim Werner

MOPAC2012 - Features: pseudodiagonalization, full diagonalization, and density matrix assembling; GPU perf: 3.8-14x; Release status: released (academic port), single GPU; MOPAC2013 available Q1 2014; Notes: http://openmopac.net

NWChem - Features: triples part of Reg-CCSD(T), CCSD & EOMCCSD task schedulers; GPU perf: 7-8x; Release status: released, Version 6.3, multiple GPUs; Notes: development GPGPU benchmarks at www.nwchem-sw.org and http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/Krishnamoorthy-ESCMA12.pdf

Octopus - Features: full GPU support for ground-state, real-time calculations; Kohn-Sham Hamiltonian, orthogonalization, subspace diagonalization, Poisson solver, time propagation; GPU perf: 1.5-8x; Release status: released, Version 4.1.0; Notes: http://www.tddft.org/programs/octopus/

ONETEP - Features: TBD; GPU perf: TBD; Release status: under development; Notes: http://www2.tcm.phy.cam.ac.uk/onetep/

PEtot - Features: density functional theory (DFT) plane-wave pseudopotential calculations; GPU perf: 6-10x; Release status: released, multi-GPU; Notes: first-principles materials code that computes the behavior of the electron structures of materials

Q-CHEM - Features: RI-MP2; GPU perf: 8-14x; Release status: released, Version 4.0.1, multi-GPU support; Notes: http://www.q-chem.com/doc_for_web/qchem_manual_4.0.pdf

QMCPACK - Features: main features; GPU perf: 3-4x; Release status: released, multiple GPUs; Notes: NCSA, University of Illinois at Urbana-Champaign, http://cms.mcc.uiuc.edu/qmcpack/index.php/GPU_version_of_QMCPACK

Quantum Espresso/PWscf - Features: PWscf package: linear algebra (matrix multiply), explicit computational kernels, 3D FFTs; GPU perf: 2.5-3.5x; Release status: released, Version 5.0, multiple GPUs; Notes: created by the Irish Centre for High-End Computing, http://www.quantum-espresso.org/index.php and http://www.quantum-espresso.org/

TeraChem - Features: "full GPU-based solution", completely redesigned to exploit GPU parallelism; GPU perf: 44-650x vs. the GAMESS CPU version; Release status: released, Version 1.5K, multi-GPU/single node; Notes: YouTube: http://youtu.be/EJODzk6RFxE?hd=1 and http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/Luehr-ESCMA.pdf

VASP - Features: hybrid Hartree-Fock DFT functionals including exact exchange; GPU perf: 2x (2 GPUs comparable to 128 CPU cores); Release status: available on request, multiple GPUs; Notes: by Carnegie Mellon University, http://arxiv.org/pdf/1111.0716.pdf

WL-LSMS - Features: generalized Wang-Landau method; GPU perf: 3x with 32 GPUs vs. 32 (16-core) CPUs; Release status: under development, multi-GPU support; Notes: NICS Electronic Structure Determination Workshop 2012, http://www.olcf.ornl.gov/wp-content/training/electronic-structure-2012/Eisenbach_OakRidge_February.pdf

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison.

ABINIT
ABINIT on GPUs

Speed in the parallel version: for ground-state calculations, GPUs can be used. This is based on CUDA+MAGMA.

For ground-state calculations, the BigDFT part of ABINIT is also very well parallelized: MPI band parallelism, combined with GPUs.

BigDFT
Courtesy of the BigDFT team @ CEA

GAUSSIAN
Gaussian: key quantum chemistry code
ACS Fall 2011 press release: joint collaboration between Gaussian, NVIDIA and PGI for GPU acceleration: http://www.gaussian.com/g_press/nvidia_press.htm
No such release exists for Intel MIC or AMD GPUs.
Mike Frisch quote: "Calculations using Gaussian are limited primarily by the available computing resources," said Dr. Michael Frisch, president of Gaussian, Inc. "By coordinating the development of hardware, compiler technology and application software among the three companies, the new application will bring the speed and cost-effectiveness of GPUs to the challenging problems and applications that Gaussian's customers need to address."

GAMESS
GAMESS Partnership Overview

Mark Gordon and Andrey Asadchev, key developers of GAMESS, in collaboration with NVIDIA. Mark Gordon is a recipient of an NVIDIA Professor Partnership Award.

Quantum chemistry is one of the major consumers of CPU cycles at national supercomputer centers.

NVIDIA developer resources are fully allocated to the GAMESS code.

"We like to push the envelope as much as we can in the direction of highly scalable efficient codes. GPU technology seems like a good way to achieve this goal. Also, since we are associated with a DOE Laboratory, energy efficiency is important, and this is another reason to explore quantum chemistry on GPUs." - Prof. Mark Gordon, Distinguished Professor, Department of Chemistry, Iowa State University, and Director, Applied Mathematical Sciences Program, Ames Laboratory

GAMESS August 2011 GPU Performance

First GPU-supported GAMESS release via "libqc", a library for fast quantum chemistry on multiple NVIDIA GPUs in multiple nodes, with CUDA software: two-electron AO integrals and their assembly into a closed-shell Fock matrix.

[Chart: relative performance of the GAMESS Aug. 2011 release for two small molecules, Ginkgolide (53 atoms) and Vancomycin (176 atoms), comparing 4x E5640 CPUs with 4x E5640 CPUs + 4x Tesla C2070s.]

Upcoming GAMESS Q4 2013 Release

Multi-node with multi-GPU support
Rys quadrature Hartree-Fock: 8 CPU cores vs. 8 CPU cores + M2070 yields a 2.3-2.9x speedup; see the 2012 publication
Møller–Plesset perturbation theory (MP2): preliminary code completed; paper in development
CCSD(T): CCSD code completed, (T) in progress

GAMESS - New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock

Hartree-Fock GPU Speedups*

Adding 1x C2070 GPU speeds up computations by 2.3x to 2.9x.

[Chart: speedups of 2.3-2.9x for Taxol (6-31G, 6-31G(d), 6-31G(2d,2p), 6-31++G(d,p)) and Valinomycin (6-31G, 6-31G(d), 6-31G(2d,2p)).]

* A. Asadchev, M.S. Gordon, "New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock," Journal of Chemical Theory and Computation (2012)

GPAW
Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 1x E5-2687W CPU (8 cores per CPU). The green nodes contain 1x E5-2687W CPU (8 cores per CPU) and 1x or 2x NVIDIA K20X GPUs.

[Chart: speedup compared to CPU only for Silicon K=1, K=2, and K=3; roughly 1.4-1.6x with one K20X and 2.5-3.0x with two.]

Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 1x E5-2687W CPU (8 cores per CPU). The green nodes contain 1x E5-2687W CPU (8 cores per CPU) and 2x NVIDIA K20 or K20X GPUs.

[Chart: speedup compared to CPU only for Silicon K=1, K=2, and K=3; roughly 1.7x, 2.2x, and 2.4x.]

Increase Performance with Kepler

Running GPAW 10258

The blue nodes contain 2x E5-2687W CPUs (8 cores per CPU). The green nodes contain 2x E5-2687W CPUs (8 cores per CPU) and 2x NVIDIA K20 or K20X GPUs.

[Chart: speedup compared to CPU only for Silicon K=1, K=2, and K=3; roughly 1.3x, 1.4x, and 1.4x.]

Used with permission from Samuli Hakala

NWChem
NWChem 6.3 Release with GPU Acceleration

Addresses large, complex and challenging molecular-scale scientific problems in the areas of catalysis, materials, geochemistry and biochemistry on highly scalable, parallel computing platforms to obtain the fastest time-to-solution. Researchers can, for the first time, perform large-scale coupled cluster with perturbative triples calculations utilizing NVIDIA GPU technology. A highly scalable multi-reference coupled cluster capability will also be available in NWChem 6.3. The software, released under the Educational Community License 2.0, can be downloaded from the NWChem website at www.nwchem-sw.org.

NWChem - Speedup of the non-iterative calculation for various configurations/tile sizes

System: cluster consisting of dual-socket nodes constructed from:

• 8-core AMD Interlagos processors • 64 GB of memory • Tesla M2090 (Fermi) GPUs

The nodes are connected using a high-performance QDR InfiniBand interconnect.

Courtesy of Kowalski, K., Bhaskaran-Nair, K., et al. @ PNNL, JCTC (submitted)

Kepler, Faster Performance (NWChem)

Uracil Molecule

[Chart: time to solution (seconds): CPU only 165 s, CPU + 1x K20X 81 s, CPU + 2x K20X 54 s.]

Performance improves by 2x with one GPU and by 3.1x with two GPUs.

Quantum Espresso/PWscf
Kepler, fast science

AUsurf, Running Quantum Espresso version 5.0-build7 on CUDA 5.0.36

The blue node contains 2 E5-2687W CPUs (150W, 8 cores per CPU). The green nodes contain 2 E5-2687W CPUs and 1 or 2 NVIDIA M2090 or K10 GPUs (225W and 235W respectively).

[Chart: performance relative to CPU only, for CPU only, CPU + M2090, CPU + K10, CPU + 2x M2090, and CPU + 2x K10.]

Using K10s delivers up to 11.7x the performance per node over CPUs, and 1.7x the performance compared to M2090s.

Extreme Performance/Price from 1 GPU

Quantum Espresso simulations run on FERMI @ ICHEC. A 6-core 2.66 GHz Intel X5650 was used for the CPU; an NVIDIA C2050 was used for the GPU.

[Chart: price and performance scaled to the CPU-only system, for the Shilu-3 and Water-on-Calcite benchmarks. Calcite structure shown.]

Adding a GPU can improve performance by 3.7x while only increasing price by 25%.

Extreme Performance/Price from 1 GPU

Price and performance scaled to the CPU-only system. Quantum Espresso simulations run on FERMI @ ICHEC. A 6-core 2.66 GHz Intel X5650 was used for the CPU; an NVIDIA C2050 was used for the GPU.

[Chart: price and performance for AUSURF112 at a k-point and at the gamma-point. Calculation done for a gold surface of 112 atoms.]

Adding a GPU can improve performance by 3.5x while only increasing price by 25%.

Replace 72 CPUs with 8 GPUs

LSMO-BFO (120 Atoms), 8 k-points. Quantum Espresso simulations run on PLX @ CINECA. Intel 6-core 2.66 GHz X5550s were used for the CPUs; NVIDIA M2070s were used for the GPUs.

[Chart: elapsed time (minutes): 120 CPUs ($42,000) take about 223 minutes; 48 CPUs + 8 GPUs ($32,800) take about 219 minutes.]

The GPU-accelerated setup performs faster and costs 24% less.

QE/PWscf - Green Science

LSMO-BFO (120 Atoms), 8 k-points. Quantum Espresso simulations run on PLX @ CINECA. Intel 6-core 2.66 GHz X5550s were used for the CPUs; NVIDIA M2070s were used for the GPUs.

[Chart: power consumption (watts); lower is better. 120 CPUs ($42,000) vs. 48 CPUs + 8 GPUs ($32,800).]

Over a year, the lower power consumption would save $4,300 on energy bills.

NVIDIA GPUs Use Less Energy

Energy Consumption on Different Tests. Quantum Espresso simulations run on FERMI @ ICHEC. A 6-core 2.66 GHz Intel X5650 was used for the CPU; an NVIDIA C2050 was used for the GPU.

[Chart: energy consumption (kWh), CPU only vs. CPU+GPU; lower is better. GPU reductions of 54-58% across Shilu-3, AUSURF112, and Water-on-Calcite.]

In all tests, the GPU-accelerated system consumed less than half the energy of the CPU-only system.

QE/PWscf - Great Strong Scaling in Parallel

CdSe-159, Walltime of 1 full SCF. Quantum Espresso simulations run on STONEY @ ICHEC. Two quad-core 2.87 GHz Intel X5560s were used in each node; two NVIDIA M2090s were used in each node for the CPU+GPU test.

[Chart: walltime (s), CPU vs. CPU+GPU, lower is better, at 2 (16), 4 (32), 6 (48), 8 (64), 10 (80), 12 (96), and 14 (112) nodes (total CPU cores); GPU speedups of roughly 2.1-2.5x.]

Speedups up to 2.5x with GPU acceleration. CdSe-159: cadmium selenide nanodots.

QE/PWscf - More Powerful Strong Scaling

GeSnTe134, Walltime of full SCF. Quantum Espresso simulations run on PLX @ CINECA. Two 6-core 2.4 GHz Intel E5645s were used in each node; two NVIDIA M2070s were used in each node for the CPU+GPU test.

[Chart: walltime (s), CPU vs. CPU+GPU, lower is better, at 4 (48), 8 (96), 12 (144), 16 (192), 24 (288), 32 (384), and 44 (528) nodes (total CPU cores); GPU speedups of roughly 1.6-2.4x.]

Accelerate your cluster by up to 2.1x with NVIDIA GPUs.

Try GPU accelerated Quantum Espresso for free – www.nvidia.com/GPUTestDrive

TeraChem
TeraChem Supercomputer Speeds on GPUs

Time for SCF Step. TeraChem running on 8 C2050s on 1 node; NWChem running on 4,096 quad-core CPUs in the Chinook supercomputer.

Giant fullerene C240 molecule.

[Chart: time (seconds) for the SCF step: 4,096 quad-core CPUs ($19,000,000) vs. 8 C2050s ($31,000).]

Similar performance from just a handful of GPUs.

TeraChem Bang for the Buck

Performance/Price. TeraChem running on 8 C2050s on 1 node; NWChem running on 4,096 quad-core CPUs in the Chinook supercomputer. Giant fullerene C240 molecule.

Note: Typical CPU and GPU node pricing used. Pricing may vary depending on node configuration. Contact your preferred HW vendor for actual pricing.

[Chart: price/performance relative to the supercomputer: 493x for 8 C2050s ($31,000) vs. 4,096 quad-core CPUs ($19,000,000).]

Dollars spent on GPUs do 500x more science than those spent on CPUs.

Kepler's Even Better

Olestra, 453 atoms. TeraChem running on C2050 and K20c. The first graph is BLYP/6-31G(d); the second is B3LYP/6-31G(d).

[Charts: time (seconds) on C2050 vs. K20c for each method.]

Kepler (K20c) performs 2x faster than Tesla (C2050).

Viz, "Docking" and Related Applications Growing

Each entry lists: features supported; GPU perf; release status; notes.

Amira 5 - Features: 3D visualization of volumetric data and surfaces; GPU perf: 70x; Release status: released, Version 5.5, single GPU; Notes: visualization from Visage Imaging, http://www.visageimaging.com/overview.html

BINDSURF - Features: high-throughput parallel blind virtual screening; GPU perf: 100x; Release status: available upon request to authors, single GPU; Notes: allows fast processing of large ligand databases, http://www.biomedcentral.com/1471-2105/13/S14/S13

BUDE - Features: University of Bristol empirical free energy forcefield; GPU perf: 6.5-13.4x; Release status: released, single GPU; Notes: http://www.bris.ac.uk/biochemistry/cpfg/bude/bude.htm

Core Hopping - Features: GPU-accelerated application; GPU perf: 3.75-5000x; Release status: released, single and multi-GPU; Notes: Schrodinger, Inc., http://www.schrodinger.com/products/14/32/

FastROCS - Features: real-time shape similarity searching/comparison; GPU perf: 800-3000x; Release status: released, single and multi-GPU; Notes: OpenEye Scientific Software, http://www.eyesopen.com/fastrocs

Interactive Molecule Visualizer - Features: written for use only on GPUs; GPU perf: TBD; Release status: released, multi-GPU support; Notes: http://www.molecular-visualization.com

Molegro Virtual Docker 6 - Features: energy grid computation, pose evaluation and guided differential evolution; GPU perf: 25-30x; Release status: single GPU; Notes: http://www.molegro.com/gpu.php

PIPER Protein Docking - Features: molecule docking; GPU perf: 3-8x; Release status: released; Notes: http://www.bu.edu/caadlab/gpgpu09.pdf; http://www.schrodinger.com/productpage/14/40/

PyMol - Features: lines (460% increase), cartoons (1246%), surface (1746%), spheres (753%), ribbon (426%); GPU perf: 1700x; Release status: released, Version 1.6, single GPU; Notes: http://pymol.org/

VMD - Features: high-quality rendering, large structures (100 million atoms), analysis and visualization tasks, multiple-GPU support for display of molecular orbitals; GPU perf: 100-125x or greater on kernels; Release status: released, Version 1.9.1, multi-GPU support; Notes: visualization from University of Illinois at Urbana-Champaign, http://www.ks.uiuc.edu/Research/vmd/; John Stone (VMD developer) video

GPU perf compared against a multi-core x86 CPU socket. GPU perf benchmarked on GPU-supported features and may be a kernel-to-kernel performance comparison.

FastROCS

OpenEye Japan, Hideyuki Sato, Ph.D.
November 13, 2013. © 2012 OpenEye Scientific Software

ROCS on the GPU: FastROCS

[Chart: shape overlays per second, CPU vs. GPU.]

FastROCS scaling across 4x K10s (2 physical GPUs per K10); 53 million conformers (10.9 million compounds of PubChem at 5 conformers per molecule).

[Chart: conformers per second vs. number of individual K10 GPUs, 1 through 8. Note: each K10 has 2 physical GPUs on the board.]

"Organic Growth" of Bioinformatics Applications

Each entry lists: features supported; GPU speedup; release status; website.

BarraCUDA - Features: alignment of short sequencing reads; GPU speedup: 6-10x; Release status: released, Version 0.6.2, multi-GPU, multi-node; Website: http://seqbarracuda.sourceforge.net/

CUDASW++ - Features: parallel Smith-Waterman database search; GPU speedup: 10-50x; Release status: released, Version 2.0.10, multi-GPU, multi-node; Website: http://sourceforge.net/projects/cudasw/

CUSHAW - Features: parallel, accurate long-read aligner for large genomes; GPU speedup: 10x; Release status: released, Version 1.0.40, multiple GPUs; Website: http://cushaw.sourceforge.net/

GPU-BLAST - Features: protein alignment according to BLASTP; GPU speedup: 3-4x; Release status: released, Version 2.2.26, single GPU; Website: http://eudoxus.cheme.cmu.edu/gpublast/gpublast.html

mCUDA-MEME - Features: scalable motif discovery algorithm based on MEME; GPU speedup: 4-10x; Release status: released, Version 3.0.13, multi-GPU, multi-node; Website: https://sites.google.com/site/yongchaosoftware/mcuda-meme

MUMmerGPU - Features: aligns multiple query sequences against a reference sequence in parallel; GPU speedup: 3-10x; Release status: released, Version 3.0; Website: http://sourceforge.net/apps/mediawiki/mummergpu/index.php?title=MUMmerGPU

SeqNFind - Features: hardware and software for reference assembly, BLAST, Smith-Waterman, HMM, de novo assembly; GPU speedup: 400x; Release status: released, multi-GPU, multi-node; Website: http://www.seqnfind.com/

SOAP3 - Features: short-read alignment tool that is not heuristic based; reports all answers; GPU speedup: 10x; Release status: beta release, Version 0.01; Website: http://soap.genomics.org.cn/soap3.html

UGENE - Features: fast short-read alignment; GPU speedup: 6-8x; Release status: released, Version 1.12, multi-GPU, multi-node; Website: http://ugene.unipro.ru/

WideLM - Features: parallel linear regression on multiple similarly-shaped models; GPU speedup: 150x; Release status: released, Version 0.1-1, multi-GPU, multi-node; Website: http://insilicos.com/products/widelm

GPU speedup compared against the same or similar code running on a single-CPU machine; performance measured internally or independently.

NVBIO: Efficient Sequence Analysis on GPUs

Jacopo Pantaleoni, NVIDIA

nvBowtie: re-engineering Bowtie2 on NVBIO

We took Bowtie2’s algorithms and rewrote them on top of NVBIO

Supports the full spectrum of features: single end and paired ends; end-to-end and local alignment; best mapping and all mapping.

Favors accuracy over speed

© NVIDIA 2013 - ISMB

nvBowtie - Architecture Overview

[Diagram: a CPU background input thread decompresses and reformats FASTQ (gzip) input into read batches; the GPU consumes each batch through seed mapping, scoring, and traceback stages, reseeding as needed until the batch is finished; a CPU background output thread reformats and compresses the alignments into SAM/BAM. See the sketch of this batched, overlapped pipeline below.]
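The sketch below is a minimal, self-contained illustration of that batched, double-buffered style, written for this tutorial; it is not nvBowtie source code, and the align_batch kernel, fill_batch reader, and buffer sizes are placeholder assumptions. It uses two CUDA streams so that copying one batch of reads can overlap processing of the other, which is the essence of the pipeline above.

```
// Hedged sketch of a batched GPU pipeline (placeholder work, not nvBowtie code).
// Compile with: nvcc batch_pipeline.cu
#include <stdio.h>
#include <cuda_runtime.h>

#define BATCH  1024   // reads per batch (toy size)
#define NBATCH 8      // total batches to process

__global__ void align_batch(const int *reads, int *scores, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        scores[i] = reads[i] * 2;   // placeholder for seed/score/traceback work
}

static void fill_batch(int *buf, int n, int batch)   // stands in for the FASTQ reader thread
{
    for (int i = 0; i < n; ++i) buf[i] = batch * n + i;
}

int main(void)
{
    int *h_reads[2], *h_scores[2], *d_reads[2], *d_scores[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost((void**)&h_reads[b],  BATCH * sizeof(int));  // pinned for async copies
        cudaMallocHost((void**)&h_scores[b], BATCH * sizeof(int));
        cudaMalloc((void**)&d_reads[b],  BATCH * sizeof(int));
        cudaMalloc((void**)&d_scores[b], BATCH * sizeof(int));
        cudaStreamCreate(&stream[b]);
    }

    for (int batch = 0; batch < NBATCH; ++batch) {
        int s = batch % 2;                          // alternate buffers and streams
        cudaStreamSynchronize(stream[s]);           // wait until this buffer is free again
        fill_batch(h_reads[s], BATCH, batch);       // "input thread" work on the host
        cudaMemcpyAsync(d_reads[s], h_reads[s], BATCH * sizeof(int),
                        cudaMemcpyHostToDevice, stream[s]);
        align_batch<<<(BATCH + 255) / 256, 256, 0, stream[s]>>>(d_reads[s], d_scores[s], BATCH);
        cudaMemcpyAsync(h_scores[s], d_scores[s], BATCH * sizeof(int),
                        cudaMemcpyDeviceToHost, stream[s]);
        // a real pipeline would hand h_scores[s] to an output thread for SAM/BAM writing
    }
    cudaDeviceSynchronize();
    printf("last score = %d\n", h_scores[(NBATCH - 1) % 2][BATCH - 1]);

    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(h_reads[b]); cudaFreeHost(h_scores[b]);
        cudaFree(d_reads[b]);     cudaFree(d_scores[b]);
        cudaStreamDestroy(stream[b]);
    }
    return 0;
}
```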

nvBowtie - Simulated Reads

wgsim: 1M x 100bp reads - (wgsim -r 0.01 -1 100 -S11 -d0 -e0)

nvBowtie: K20

Bowtie2: 6-core SB

nvBowtie - Simulated Reads

Same sensitivity/accuracy at 3-4x the speed

Greater sensitivity/accuracy at same speed

nvBowtie - Real Datasets

Ion Proton, 100M x 175bp (8-350), end-to-end: speedup 4.3x; alignment rate +0.5%; disagreement 0.002%
Ion Proton, 100M x 175bp (8-350), local: speedup 7.6x; alignment rate -0.6%; disagreement 0.03%
Illumina HiSeq 2000 (ERR161544), 10M x 100bp x 2, end-to-end: speedup 2.4x; alignment rate unchanged; disagreement 0.006%
Illumina HiSeq 2000 (ERR161544), 10M x 100bp x 2, local: speedup 2.6x; alignment rate unchanged; disagreement 0.022%

nvBowtie

Same quality @ higher speed: best-mapping is 2-7x faster on a K20 than Bowtie2 on a 6-core SB

Opens new doors: All-mapping on K20 comparable to Best-mapping on 6-core SB

Possibly even faster: With more GPU memory, we could likely get 1.5x faster


Coming Soon: NVBIO: BWA-MEM, SeqAn; CRAM, Variant Calling …

Benefits of GPU Accelerated Computing

Faster than CPU only systems in all tests

Large performance boost with marginal price increase

Energy usage cut by more than half

GPUs scale well within a node and over multiple nodes

K20 GPU is our fastest and lowest power high performance GPU yet

Try GPU accelerated TeraChem for free – www.nvidia.com/GPUTestDrive

3 Ways to Accelerate Applications

Applications

Libraries: “drop-in” acceleration
OpenACC Directives: easily accelerate applications
Programming Languages: maximum flexibility

GPU Accelerated Libraries - “Drop-in” Acceleration for your Applications

Linear Algebra (FFT, BLAS, sparse matrix): NVIDIA cuFFT, cuBLAS, cuSPARSE
Numerical & Math (RAND, statistics): NVIDIA Math Lib, NVIDIA cuRAND
Data Structures & AI (sort, scan, zero sum): GPU AI - Board Games, GPU AI - Path Finding
Visual Processing (image & video): NVIDIA NPP, NVIDIA Video Encode
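As an illustration of the “drop-in” idea, the sketch below is a minimal example written for this tutorial (not taken from any application above): a single-precision matrix multiply offloaded to the GPU through cuBLAS, using the same GEMM call pattern a CPU BLAS library would provide.

```
// Minimal cuBLAS SGEMM sketch: C = alpha*A*B + beta*C on the GPU.
// Compile with: nvcc sgemm_example.c -lcublas
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 512;                         // square matrices for simplicity
    const float alpha = 1.0f, beta = 0.0f;
    size_t bytes = (size_t)n * n * sizeof(float);

    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < n * n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; hC[i] = 0.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Column-major GEMM, same semantics as the standard BLAS sgemm call.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f (expected %f)\n", hC[0], 2.0f * n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Aside from allocating device buffers and copying the data, the cublasSgemm call stands in directly for the CPU sgemm it replaces, which is what makes the libraries route "drop-in".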

GPU Programming Languages

Numerical analytics: R, MATLAB, Mathematica, LabVIEW
Fortran: OpenACC, CUDA Fortran
C: OpenACC, CUDA C (a minimal kernel example follows this list)
C++: Thrust, CUDA C++
Python: PyCUDA, Anaconda Accelerate

C#: GPU.NET
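At the other end of the spectrum from drop-in libraries, the fragment below is a minimal CUDA C kernel written for this tutorial (an illustration, not code from the applications above): a SAXPY loop expressed as a GPU kernel plus the host code that launches it.

```
// Minimal CUDA C example: y = a*x + y (SAXPY) on the GPU.
// Compile with: nvcc saxpy.cu
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hx = (float*)malloc(bytes), *hy = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc((void**)&dx, bytes);
    cudaMalloc((void**)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block; enough blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 5.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```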

Easiest Way to Learn CUDA

Learn from the Best

Anywhere, Any Time

It’s Free!

50K registered, 127 countries

Engage with an Active Community

Develop on GeForce, Deploy on Tesla

GeForce GTX Titan - Designed for Gamers & Developers: 1+ teraflop double-precision performance; Dynamic Parallelism; Hyper-Q for CUDA streams; available everywhere (develop on any GPU released in the last 3 years: Macs, PCs, workstations, clusters, supercomputers).

Tesla K20X/K20 - Designed for Cluster Deployment: ECC; 24x7 runtime; GPU monitoring; cluster management; GPUDirect-RDMA; Hyper-Q for MPI; 3-year warranty; integrated OEM systems; professional support.

TITAN: World’s Fastest Supercomputer - 18,688 Tesla K20X GPUs; 27 petaflops peak, 17.59 petaflops on Linpack; 90% of performance from GPUs.
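The 90% figure is consistent with the K20X's published double-precision peak of about 1.31 TFLOPS per GPU (a spec-sheet number, not one stated on the slide):

\[
18{,}688 \times 1.31\ \mathrm{TFLOPS} \approx 24.5\ \mathrm{PFLOPS},
\qquad
\frac{24.5\ \mathrm{PFLOPS}}{27\ \mathrm{PFLOPS}} \approx 0.90
\]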

Test Drive K20 GPUs! Experience The Acceleration - www.nvidia.com/GPUTestDrive

Run Computational Chemistry Apps on Tesla K20 GPUs today

Try Preconfigured Apps: AMBER, NAMD, GROMACS, LAMMPS, Quantum Espresso, TeraChem Or Load Your Own

Sign up for FREE GPU Test Drive on remotely hosted clusters