Molecular Dynamics (MD) on GPUs Feb. 2, 2017 Accelerating Discoveries
Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid — “the perfect target for fighting the infection.”
Without gpu, the supercomputer would need to be 5x larger for similar performance.
2 Overview of Life & Material Accelerated Apps
MD: All key codes are GPU-accelerated QC: All key codes are ported or optimizing Great multi-GPU performance Focus on using GPU-accelerated math libraries, OpenACC directives Focus on dense (up to 16) GPU nodes &/or large # of GPU nodes GPU-accelerated and available today: ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResso, ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS- Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, OpenMM, PolyFTS, SOP-GPU* & more Quantum Espresso/PWscf, QUICK, TeraChem* Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more
green* = application where >90% of the workload is on GPU 3 MD vs. QC on GPUs
“Classical” Molecular Dynamics Quantum Chemistry (MO, PW, DFT, Semi-Emp) Simulates positions of atoms over time; Calculates electronic properties; chemical-biological or ground state, excited states, spectral properties, chemical-material behaviors making/breaking bonds, physical properties
Forces calculated from simple empirical formulas Forces derived from electron wave function (bond rearrangement generally forbidden) (bond rearrangement OK, e.g., bond energies) Up to millions of atoms Up to a few thousand atoms Solvent included without difficulty Generally in a vacuum but if needed, solvent treated classically (QM/MM) or using implicit methods
Single precision dominated Double precision is important Uses cuBLAS, cuFFT, CUDA Uses cuBLAS, cuFFT, OpenACC Geforce (Workstations), Tesla (Servers) Tesla recommended ECC off ECC on
4 GPU-Accelerated Molecular Dynamics Apps Green Lettering Indicates Performance Slides Included
GPUGrid.net MELD ACEMD GROMACS NAMD AMBER HALMD OpenMM CHARMM HOOMD-Blue PolyFTS DESMOND LAMMPS ESPResSO mdcore Folding@Home
GPU Perf compared against dual multi-core x86 CPU socket. 5 Benefits of MD GPU-Accelerated Computing Why wouldn’t you want to turbocharge your research?
• 3x-8x Faster than CPU only systems in all tests (on average) • Most major compute intensive aspects of classical MD ported • Large performance boost with marginal price increase • Energy usage cut by more than half • GPUs scale well within a node and/or over multiple nodes • K80 GPU is our fastest and lowest power high performance GPU yet
Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive 6 ACEMD www.acellera.com
470 ns/day on 1 GPU for L-Iduronic acid (1362 atoms) 116 ns/day on 1 GPU for DHFR (23K atoms)
M. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale, J. Chem. Theory and Comput. 5, 1632 (2009) www.acellera.com
NVT, NPT, PME, TCL, PLUMED, CAMSHIFT1
1 M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput., 5, 2371–2377 (2009) 2 For a list of selected references see http://www.acellera.com/acemd/publications AMBER 16 June 2017 JAC_NVE on GP100s
450 23,558 atoms PME 2fs 404.09 400 370.32
350 320.19 320.14
300 Running AMBER version 16 250
ns/day The green nodes contain Dual Intel(R) 200 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 150
100
50
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
11 JAC_NVE on GP100s
900 23,558 atoms PME 4fs 800 782.11 714.23 700 614.42 613.16 600 Running AMBER version 16 500
ns/day The green nodes contain Dual Intel(R) 400 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 300
200
100
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
12 JAC_NPT on GP100s
400 23,558 atoms PME 2fs 360.64 350 333.03
295.75 295.42 300
250 Running AMBER version 16
200
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 150 Quadro GP100s GPUs (PCIe and NVLink)
100
50
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
13 JAC_NPT on GP100s
800
23,558 atoms PME 4fs 706.53 700 654.66
600 580.47 578.48
500 Running AMBER version 16
400
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 300 Quadro GP100s GPUs (PCIe and NVLink)
200
100
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
14 FactorIX_NVE on GP100s
180 90,906 atoms PME 166.61 160 142.45 140
120 106.23 105.98 Running AMBER version 16 100
ns/day The green nodes contain Dual Intel(R) 80 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 60
40
20
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
15 FactorIX_NPT on GP100s
160 90,906 atoms PME 146.34 140 126.75
120
102.27 102.26 100 Running AMBER version 16
80
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 60 Quadro GP100s GPUs (PCIe and NVLink)
40
20
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
16 Cellulose_NVE on GP100s
40 408,609 atoms PME 36.91 35 31.35
30
25 24.01 24.02 Running AMBER version 16
20
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 15 Quadro GP100s GPUs (PCIe and NVLink)
10
5
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
17 Cellulose_NPT on GP100s
35 408,609 atoms PME 32.37 30 28.76
25 22.76 22.8
20 Running AMBER version 16
ns/day The green nodes contain Dual Intel(R) 15 Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink)
10
5
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
18 STMV_NPT on GP100s
25 1,067,095 atoms PME 23.13 20.22 20
15.64 15.43 15 Running AMBER version 16
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 10 Quadro GP100s GPUs (PCIe and NVLink)
5
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
19 TRPCAGE on GP100s
1500 304 atoms GB 1216.56 1250 1187.3
1000 Running AMBER version 16
750 The green nodes contain Dual Intel(R)
ns/day Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink) 500
250
0 1 node + 1x GP100 1 node + 1x GP100 per node (PCIe) per node (NVLink)
20 Myoglobin on GP100s
600 2,492 atoms GB
470.41 458.28 443.49 447.23 450
Running AMBER version 16
300
ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100s GPUs (PCIe and NVLink)
150
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
21 Nucleosome on GP100s
25
21.29 25,095 atoms GB 20.51 20
15 Running AMBER version 16
11.47 11.29 ns/day The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + 10 Quadro GP100s GPUs (PCIe and NVLink)
5
0 1 node + 1x GP100 1 node + 1x GP100 1 node + 2x GP100 1 node + 2x GP100 per node (PCIe) per node (NVLink) per node (PCIe) per node (NVLink)
22 AMBER 16 February 2017 PME-Cellulose_NPT on K80s
20 PME-Cellulose_NPT 16 15.43 Running AMBER version 16.3
The blue node contains Dual Intel Xeon 6.6X E5-2699 [email protected] [3.6GHz Turbo] 12 11.36 (Broadwell) CPUs
The green nodes contain Dual Intel ns/day 4.8X Xeon E5-2699 [email protected] [3.6GHz 8 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel 4 Xeon E5-2699 [email protected] [3.6GHz 2.35 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
24 PME-Cellulose_NPT on P100s PCIe
40 PME-Cellulose_NPT 35
30.00 Running AMBER version 16.3 30 The blue node contains Dual Intel Xeon 25 E5-2699 [email protected] [3.6GHz Turbo] 21.85 (Broadwell) CPUs
ns/day 12.8X 20 The green nodes contain Dual Intel 9.3X Xeon E5-2699 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 10 ➢ 1x P100 PCIe is paired with Single 5 Intel Xeon E5-2699 [email protected] 2.35 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
25 PME-Cellulose_NPT on P100s SXM2
40 PME-Cellulose_NPT 36.65 35 32.22 Running AMBER version 16.3 30 15.6X 13.7X The blue node contains Dual Intel Xeon 25 23.37 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 20
ns/day The green nodes contain Dual Intel 9.9X Xeon E5-2698 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 10 ➢ 1x P100 SXM2 is paired with Single 5 Intel Xeon E5-2698 [email protected] 2.35 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
26 PME-Cellulose_NVE on K80s
20 PME-Cellulose_NVE 16.53 16 Running AMBER version 16.3
6.7X The blue node contains Dual Intel Xeon 11.85 E5-2699 [email protected] [3.6GHz Turbo] 12 (Broadwell) CPUs
ns/day 4.8X The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 8 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel 4 2.47 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
27 PME-Cellulose_NVE on P100s PCIe
40 PME-Cellulose_NVE 35 32.55 Running AMBER version 16.3 30 13.2X The blue node contains Dual Intel Xeon 25 23.34 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 20 The green nodes contain Dual Intel 9.4X Xeon E5-2699 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 10 ➢ 1x P100 PCIe is paired with Single 5 Intel Xeon E5-2699 [email protected] 2.47 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
28 PME-Cellulose_NVE on P100s SXM2
45 PME-Cellulose_NVE 40.88 40
35.16 35 Running AMBER version 16.3
30 16.6X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 24.94 25 14.2X (Broadwell) CPUs
ns/day 20 The green nodes contain Dual Intel 10.1X Xeon E5-2698 [email protected] [3.6GHz 15 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
10 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 5 2.47 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
29 PME-FactorIX_NPT on K80s
80 PME-FactorIX_NPT 70 66.68 Running AMBER version 16.3 60 5.8X The blue node contains Dual Intel Xeon 50 48.54 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
40
ns/day The green nodes contain Dual Intel 4.2X Xeon E5-2699 [email protected] [3.6GHz 30 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
20 ➢ 1x K80 is paired with Single Intel 11.43 Xeon E5-2699 [email protected] [3.6GHz 10 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
30 PME-FactorIX_NPT on P100s PCIe
140 PME-FactorIX_NPT 132.86 120 Running AMBER version 16.3 98.77 11.6X 100 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 80 (Broadwell) CPUs
ns/day 8.6X The green nodes contain Dual Intel 60 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 40 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single 20 Intel Xeon E5-2699 [email protected] 11.43 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
31 PME-FactorIX_NPT on P100s SXM2
180 PME-FactorIX_NPT 159.80 160 144.11 140 14.0X Running AMBER version 16.3
120 12.6X The blue node contains Dual Intel Xeon 106.25 E5-2699 [email protected] [3.6GHz Turbo] 100 (Broadwell) CPUs 9.3X ns/day 80 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz
60 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
40 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 20 11.43 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
32 PME-FactorIX_NVE on K80s
80 PME-FactorIX_NVE 71.49 70 6.0X Running AMBER version 16.3 60
51.14 The blue node contains Dual Intel Xeon 50 E5-2699 [email protected] [3.6GHz Turbo] 5.4X (Broadwell) CPUs 40 The green nodes contain Dual Intel ns/day Xeon E5-2699 [email protected] [3.6GHz 30 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
20 ➢ 1x K80 is paired with Single Intel 11.98 Xeon E5-2699 [email protected] [3.6GHz 10 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
33 PME-FactorIX_NVE on P100s PCIe
160 PME-FactorIX_NVE 145.83 140
12.2X Running AMBER version 16.3 120 105.86 The blue node contains Dual Intel Xeon 100 E5-2699 [email protected] [3.6GHz Turbo] 8.8X (Broadwell) CPUs
ns/day 80 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 60 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 40 ➢ 1x P100 PCIe is paired with Single 20 Intel Xeon E5-2699 [email protected] 11.98 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
34 PME-FactorIX_NVE on P100s SXM2
200
178.02 180 PME-FactorIX_NVE 159.24 160 14.9X Running AMBER version 16.3 140 The blue node contains Dual Intel Xeon 120 114.88 13.3X E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 100
ns/day The green nodes contain Dual Intel 80 9.6X Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 60 SXM2 GPUs
40 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 20 11.98 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
35 PME-JAC_NPT on K80s
250 PME-JAC_NPT 216.78 200 Running AMBER version 16.3
162.09 4.7X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 150 (Broadwell) CPUs 3.5X The green nodes contain Dual Intel ns/day Xeon E5-2699 [email protected] [3.6GHz 100 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
45.89 ➢ 1x K80 is paired with Single Intel 50 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
36 PME-JAC_NPT on P100s PCIe
350 PME-JAC_NPT 327.69 300 283.60 7.1X Running AMBER version 16.3 250 The blue node contains Dual Intel Xeon 6.2X E5-2699 [email protected] [3.6GHz Turbo] 200 (Broadwell) CPUs
ns/day The green nodes contain Dual Intel 150 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single 45.89 50 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
37 PME-JAC_NPT on P100s SXM2
450 PME-JAC_NPT 423.09 400 360.64 350 9.2X Running AMBER version 16.3 310.52 300 7.9X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 250 6.8X (Broadwell) CPUs
ns/day 200 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz
150 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
100 ➢ 1x P100 SXM2 is paired with Single 45.89 Intel Xeon E5-2698 [email protected] 50 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe per node per node per node
38 PME-JAC_NVE on K80s
250 234.99 PME-JAC_NVE 200 Running AMBER version 16.3 173.20 4.9X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 150 (Broadwell) CPUs
3.6X The green nodes contain Dual Intel ns/day Xeon E5-2699 [email protected] [3.6GHz 100 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
47.90 ➢ 1x K80 is paired with Single Intel 50 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
39 PME-JAC_NVE on P100s PCIe
400 PME-JAC_NVE 363.79 350 308.46 7.6X Running AMBER version 16.3 300 The blue node contains Dual Intel Xeon 250 E5-2699 [email protected] [3.6GHz Turbo] 6.4X (Broadwell) CPUs 200
ns/day The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 150 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 100 ➢ 1x P100 PCIe is paired with Single 47.90 50 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) per node per node
40 PME-JAC_NVE on P100s SXM2
500 473.10
450 PME-JAC_NVE 402.18 9.9X 400 Running AMBER version 16.3
350 339.81 8.4X The blue node contains Dual Intel Xeon 300 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 250 7.1X
ns/day The green nodes contain Dual Intel 200 Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 150 SXM2 GPUs
100 ➢ 1x P100 SXM2 is paired with Single 47.90 Intel Xeon E5-2698 [email protected] 50 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe per node per node per node
41 GB-Myoglobin on K80s
400 GB-Myoglobin 350 339.45
Running AMBER version 16.3 300 288.47 11.8X The blue node contains Dual Intel Xeon 250 E5-2699 [email protected] [3.6GHz Turbo] 10.0X (Broadwell) CPUs ns/day 200 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 150 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
100 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 50 Turbo] (Broadwell) 28.86
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
42 GB-Myoglobin on P100s PCIe
600 GB-Myoglobin 561.94 500 483.37 19.5X Running AMBER version 16.3 400 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 16.7X (Broadwell) CPUs
ns/day 300 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz
200 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single 100 Intel Xeon E5-2699 [email protected] 28.86 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 4x P100 PCIe (16GB) per node per node
43 GB-Myoglobin on P100s SXM2
700 GB-Myoglobin 639.37 600 534.28 22.2X Running AMBER version 16.3 500 The blue node contains Dual Intel Xeon 18.5X E5-2699 [email protected] [3.6GHz Turbo] 400 (Broadwell) CPUs
ns/day The green nodes contain Dual Intel 300 Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 200 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single 100 Intel Xeon E5-2698 [email protected] 28.86 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe 4x P100 PCIe per node per node
44 GB-Nucleosome on K80s
25
GB-Nucleosome 20.55 20 Running AMBER version 16.3 51.4X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 15 (Broadwell) CPUs
11.31 ns/day The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 10 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 5.84 28.3X ➢ 1x K80 is paired with Single Intel 5 Xeon E5-2699 [email protected] [3.6GHz 14.6X Turbo] (Broadwell) 0.40 0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
45 GB-Nucleosome on P100s PCIe
50 45.92 45 GB-Nucleosome 39.91 114.8X 40 Running AMBER version 16.3
35 99.8X The blue node contains Dual Intel Xeon 30 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 25 22.77 The green nodes contain Dual Intel 20 Xeon E5-2699 [email protected] [3.6GHz 56.9X Turbo] (Broadwell) CPUs + Tesla P100 15 PCIe (16GB) GPUs 11.91 10 ➢ 1x P100 PCIe is paired with Single 29.8X Intel Xeon E5-2699 [email protected] 5 [3.6GHz Turbo] (Broadwell) 0.40 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
46 GB-Nucleosome on P100s SXM2
60 GB-Nucleosome 50 48.29 46.29 Running AMBER version 16.3 120.7X 40 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 115.7X (Broadwell) CPUs 30
ns/day 25.53 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz
20 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 13.36 63.8X ➢ 1x P100 SXM2 is paired with Single 10 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) 0.40 33.4X 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
47 Rubisco-75K on K80s
1.6 Rubisco-75K 1.4 1.34 Running AMBER version 16.3 1.2 134.0X The blue node contains Dual Intel Xeon 1.0 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 0.8 0.69 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 0.6 Turbo] (Broadwell) CPUs + Tesla K80 69.0X (autoboost) GPUs 0.4 0.35 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 0.2 Turbo] (Broadwell)
0.01 35.0X 0.0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
48 Rubisco-75K on P100s PCIe
4.5 Rubisco-75K 4.20 4.0 420.0X 3.5 Running AMBER version 16.3
3.0 The blue node contains Dual Intel Xeon 2.69 E5-2699 [email protected] [3.6GHz Turbo] 2.5 (Broadwell) CPUs
ns/day 2.0 269.0X The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz
1.5 1.40 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
1.0 0.71 ➢ 1x P100 PCIe is paired with Single 140.0X Intel Xeon E5-2699 [email protected] 0.5 [3.6GHz Turbo] (Broadwell) 0.01 71.0X 0.0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
49 Rubisco-75K on P100s SXM2
5.0
4.46 4.5 Rubisco-75K
4.0 Running AMBER version 16.3
3.5 446.0X The blue node contains Dual Intel Xeon 3.06 3.0 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 2.5 306.0X
ns/day The green nodes contain Dual Intel 2.0 Xeon E5-2698 [email protected] [3.6GHz 1.57 Turbo] (Broadwell) CPUs + Tesla P100 1.5 SXM2 GPUs
1.0 0.80 157.0X ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 0.5 [3.6GHz Turbo] (Broadwell) 0.01 80.0X 0.0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
50 AMBER 14 AMBER 14 vs. AMBER 12
Courtesy of Scott Le Grand From GTC 2014 presentation
52 AMBER 14; large P2P and small Boost Clocks impacts
AMBER 14 (ns/day) on 4x K40; P2P and Boost Clocks Impact DHFR NVE PME, 2fs Benchmark (CUDA 6.0, ECC off) 250
215.18
196.68 200
150 132.97
125.77 ns/day 100
No Boost Boost No Boost Boost No P2P No P2P P2P P2P 50
0 2 x Xeon E5-2690 [email protected] + 4 x 2 x Xeon E5-2690 [email protected] + 4 x 2 x Xeon E5-2690 [email protected] + 4 x 2 x Xeon E5-2690 [email protected] + 4 x Tesla K40@745Mhz (no P2P) Tesla K40@875Mhz (no P2P) Tesla K40@745Mhz (P2P) Tesla K40@875Mhz (P2P) Series1 125.77 132.97 196.68 215.18
53 AMBER Performance Over Time
Courtesy of Scott Le Grand From GTC 2014 presentation 54 54 Cellulose on K40s, K80s and M6000s
20 Running AMBER version 14 PME-Cellulose_NVE The blue node contains Dual Intel E5- 15.38 16 14.90 2698 [email protected], 3.6GHz Turbo CPUs 13.67 The green nodes contain Dual Intel E5- 11.76 8.0X 12 7.7X 2698 [email protected], 3.6GHz Turbo CPUs + 10.49 7.1X either NVIDIA Tesla K40@875Mhz, Tesla 8.96 6.1X K80@562Mhz (autoboost), or Quadro 7.87 8 M6000@987Mhz GPUs 4.6X 5.4X 4.1X 4 Simulated Simulated Time (ns/day) 1.93
0 1 Haswell 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node Node + 1x K40 + 0.5x K80 + 1x K80 + 1x M6000 + 2x K40 + 2x K80 + 2x M6000
55 Factor IX on K40s, K80s and M6000s
80 Running AMBER version 14 PME-FactorIX_NVE 70 66.89 The blue node contains Dual Intel E5- 61.18 60.93 2698 [email protected], 3.6GHz Turbo CPUs 60 7.0X 50.70 6.4X The green nodes contain Dual Intel E5- 50 47.80 6.3X 2698 [email protected], 3.6GHz Turbo CPUs + 40.48 5.2X either NVIDIA Tesla K40@875Mhz, Tesla 40 5.0X K80@562Mhz (autoboost), or Quadro 33.59 M6000@987Mhz GPUs 30 4.2X 3.5X 20
Simulated Simulated Time (ns/day) 9.68 10
0 1 Haswell 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node Node + 1x K40 + 0.5x K80 + 1x K80 + 1x M6000 + 2x K40 + 2x K80 + 2x M6000
56 JAC on K40s, K80s and M6000s
250 Running AMBER version 14 225.34 PME-JAC_NVE 219.83 200.34 The blue node contains Dual Intel E5- 200 2698 [email protected], 3.6GHz Turbo CPUs 6.0X 5.9X 174.34 5.4X 161.53 The green nodes contain Dual Intel E5- 2698 [email protected], 3.6GHz Turbo CPUs + 150 134.82 4.7X 121.30 either NVIDIA Tesla K40@875Mhz, Tesla 4.3X K80@562Mhz (autoboost), or Quadro 3.6X 100 M6000@987Mhz GPUs 3.2X
50 37.38 Simulated Simulated Time (ns/day)
0 1 Haswell 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node 1 CPU Node Node + 1x K40 + 0.5x K80 + 1x K80 + 1x M6000 + 2x K40 + 2x K80 + 2x M6000
57 Cellulose on M40s
18
PME - Cellulose_NPT 15.90 16 Running AMBER version 14 14.40 14.9X 14 The blue node contain Single Intel Xeon 13.5X E5-2698 [email protected] (Haswell) CPUs 12 The green nodes contain Single Intel 10.12 10 Xeon E5-2697 [email protected] (IvyBridge) CPUs + Tesla M40 (autoboost) GPUs 8 9.5X
6 Simulated Simulated Time (ns/Day)
4
2 1.07
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
58 Cellulose on M40s
18 PME - Cellulose_NVE 17.13 16 15.41 16.0X Running AMBER version 14 14 The blue node contain Single Intel Xeon 14.4X E5-2698 [email protected] (Haswell) CPUs 12 10.50 The green nodes contain Single Intel 10 Xeon E5-2697 [email protected] (IvyBridge) 9.8X CPUs + Tesla M40 (autoboost) GPUs 8
6 Simulated Simulated Time (ns/Day)
4
2 1.07
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
59 FactorIX on M40s
80 PME - FactorIX_NPT 72.96 70 67.37 Running AMBER version 14 13.6X 60 The blue node contain Single Intel Xeon 12.5X E5-2698 [email protected] (Haswell) CPUs 50 46.90 The green nodes contain Single Intel Xeon E5-2697 [email protected] (IvyBridge) 40 CPUs + Tesla M40 (autoboost) GPUs 8.7X
30 Simulated Simulated Time (ns/Day) 20
10 5.38
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
60 FactorIX on M40s
90 PME - FactorIX_NVE 80.04 80 Running AMBER version 14 73.00 14.6X 70 The blue node contain Single Intel Xeon 13.3X E5-2698 [email protected] (Haswell) CPUs 60 The green nodes contain Single Intel 49.33 50 Xeon E5-2697 [email protected] (IvyBridge) CPUs + Tesla M40 (autoboost) GPUs 40 9.0X
30 Simulated Simulated Time (ns/Day)
20
10 5.47
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
61 JAC on M40s
250 PME - JAC_NPT 226.63 211.97 Running AMBER version 14 200 10.9X The blue node contain Single Intel Xeon 10.2X E5-2698 [email protected] (Haswell) CPUs 149.40 150 The green nodes contain Single Intel Xeon E5-2697 [email protected] (IvyBridge) 7.2X CPUs + Tesla M40 (autoboost) GPUs
100 Simulated Simulated Time (ns/Day)
50
20.88
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
62 JAC on M40s
300 PME - JAC_NVE 246.15 Running AMBER version 14 250 230.18 The blue node contain Single Intel Xeon 11.7X E5-2698 [email protected] (Haswell) CPUs 200 The green nodes contain Single Intel 157.68 10.9X Xeon E5-2697 [email protected] (IvyBridge) 150 CPUs + Tesla M40 (autoboost) GPUs 7.5X
100 Simulated Simulated Time (ns/Day)
50 21.11
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
63 Myoglobin on M40s
350 GB - Myoglobin 322.09 300.86 300 Running AMBER version 14 32.8X The blue node contain Single Intel Xeon 30.6X E5-2698 [email protected] (Haswell) CPUs 250 232.20 The green nodes contain Single Intel 200 Xeon E5-2697 [email protected] (IvyBridge) 23.6X CPUs + Tesla M40 (autoboost) GPUs 150
Simulated Simulated Time (ns/Day) 100
50
9.83 0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
64 Nucleosome on M40s
18 GB - Nucleosome 16.11 16 Running AMBER version 14
14 The blue node contain Single Intel Xeon 123.9X E5-2698 [email protected] (Haswell) CPUs 12 The green nodes contain Single Intel 10 Xeon E5-2697 [email protected] (IvyBridge) 9.05 CPUs + Tesla M40 (autoboost) GPUs 8 69.6X
6 Simulated Simulated Time (ns/Day) 4.67 4 35.9X 2
0.13 0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
65 TrpCage on M40s
900 831.91 GB - TrpCage 800 2.03X Running AMBER version 14 700 The blue node contain Single Intel Xeon E5-2698 [email protected] (Haswell) CPUs 600 551.36 The green nodes contain Single Intel 500 464.63 Xeon E5-2697 [email protected] (IvyBridge) CPUs + Tesla M40 (autoboost) GPUs 408.88 1.3X 400 1.1X
300 Simulated Simulated Time (ns/Day)
200
100
0 1 Node 1 Node + 1 Node + 1 Node + 1x M40 per node 2x M40 per node 4x M40 per node
66 Recommended GPU Node Configuration for AMBER Computational Chemistry Workstation or Single Node Configuration # of CPU sockets 2
Cores per CPU socket 6+ (1 CPU core drives 1 GPU)
CPU speed (Ghz) 2.66+ System memory per node (GB) 16
GPUs Kepler K20, K40, K80, P100
1-4 # of GPUs per CPU socket
GPU memory preference (GB) 6 GPU to CPU connection PCIe 3.0 16x or higher Server storage 2 TB
Network configuration Infiniband QDR or better 67 Scale to multiple nodes with same single node configuration 67 CHARMM DOMDEC-GUI July 2016 CHARMM DOMDEC-GUI 465 K System Benchmark
4 465 K System
(Her1_HER1_membrane) Running CHARMM version c40a1 3 *Higher is better The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs 2.15 The green nodes contain Dual Intel 2 ns/day Xeon E5-2698 [email protected] GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs 6.0X
1 Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error. 0.36
0 1 Haswell node 1 node + 1x K80 per node
69 CHARMM DOMDEC-GUI 534 K System Benchmark
2.0 534 K System
(POPC_PSPC_CHL1mixture) Running CHARMM version c40a1 1.5 1.43 *Higher is better The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs
The green nodes contain Dual Intel
1.0 Xeon E5-2698 [email protected] GHz (Haswell) ns/day 8.0X CPUs + Tesla K80 (autoboost) GPUs
0.5 Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error. 0.18
0.0 1 Haswell node 1 node + 1x K80 per node
70 CHARMM DOMDEC-GUI 20 K System Benchmark
80 20 K System (Crambin)
59.68 Running CHARMM version c40a1 60 *Higher is better The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs
The green nodes contain Dual Intel ns/day 40 Xeon E5-2698 [email protected] GHz (Haswell) CPUs + Tesla M40 GPUs 3.7X
20 16.00 Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
0 1 Haswell node 1 node + 1x M40 per node
71 CHARMM DOMDEC-GUI 61 K System Benchmark
35 61 K System (GlnBP) 30 *Higher is better Running CHARMM version c40a1 25.08 25 The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs 20 The green nodes contain Dual Intel
6.4X Xeon E5-2698 [email protected] GHz (Haswell) ns/day 15 CPUs + Tesla M40 GPUs
10 Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error. 5 3.90
0 1 Haswell node 1 node + 1x M40 per node
72 CHARMM DOMDEC-GUI 465 K System Benchmark
4 465 K System
(Her1_HER1_membrane) Running CHARMM version c40a1 3 *Higher is better The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs 2.27 The green nodes contain Dual Intel 2 Xeon E5-2698 [email protected] GHz (Haswell) ns/day CPUs + Tesla M40 GPUs 6.3X
1 Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error. 0.36
0 1 Haswell node 1 node + 1x M40 per node
73 GROMACS 2016 October 2016 Erik Lindahl (GROMACS developer) video
75 Water 1.5M on K80s
7 Water 1.5M 6.14 6 5.22 2.2X 5 Running GROMACS version 2016 1.9X The blue node contains Dual Intel Xeon 4 E5-2698 [email protected] [3.6GHz Turbo]
(Broadwell) CPUs ns/day 3 2.79 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz
2 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
1
0 1 Broadwell node 1 node + 2x K80 per node 1 node + 4x K80 per node
76 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 3M on K80s
4
Water 3M 3.05 3 2.66 2.3X 3 Running GROMACS version 2016 2.0X The blue node contains Dual Intel Xeon 2 E5-2698 [email protected] [3.6GHz Turbo]
(Broadwell) CPUs ns/day 2 1.32 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 1 (autoboost) GPUs
1
0 1 Broadwell node 1 node + 2x K80 per node 1 node + 4x K80 per node
77 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 1.5M on M40s
8 Water 1.5M 7.60 7 6.15 2.7X 6 2.2X Running GROMACS version 2016 5 The blue node contains Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo]
4 (Broadwell) CPUs ns/day The green nodes contain Dual Intel 2.79 3 Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 2 (autoboost) GPUs
1
0 1 Broadwell node 1 node + 2x M40 per node 1 node + 4x M40 per node
78 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 3M on M40s
4.5
3.94 4.0 Water 3M
3.5
2.97 3.0X Running GROMACS version 2016 3.0 The blue node contains Dual Intel Xeon 2.5 E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 2.3X 2.0 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz 1.5 1.32 Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs 1.0
0.5
0.0 1 Broadwell node 1 node + 2x M40 per node 1 node + 4x M40 per node
79 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 1.5M on P40s
9
8.07 8 Water 1.5M
7 6.60 2.9X Running GROMACS version 2016 6 2.4X The blue node contains Dual Intel Xeon 5 E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 4 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz 3 2.79 Turbo] (Broadwell) CPUs + Tesla P40 GPUs 2
1
0 1 Broadwell node 1 node + 2x P40 per node 1 node + 4x P40 per node
80 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 3M on P40s
4.5 4.19
4.0 Water 3M
3.5 3.36 3.2X Running GROMACS version 2016 3.0 2.5X The blue node contains Dual Intel Xeon 2.5 E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
ns/day 2.0 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz 1.5 1.32 Turbo] (Broadwell) CPUs + Tesla P40 GPUs 1.0
0.5
0.0 1 Broadwell node 1 node + 2x P40 per node 1 node + 4x P40 per node
81 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 1.5M on P100 PCIes
8 Water 1.5M 7.11 7 6.34 2.5X 6 Running GROMACS version 2016 5 2.3X The blue node contains Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo]
4 (Broadwell) CPUs ns/day
3 2.79 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + 2 Tesla P100 PCIe (16GB) GPUs
1
0 1 Broadwell node 1 node + 2x P100 PCIe (16GB) 1 node + 4x P100 PCIe (16GB) per node per node
82 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. Water 3M on P100 PCIes
4.0
3.5 Water 3M 3.43 3.16
3.0 2.6X 2.4X Running GROMACS version 2016 2.5 The blue node contains Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo]
2.0 (Broadwell) CPUs ns/day
1.5 1.32 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + 1.0 Tesla P100 PCIe (16GB) GPUs
0.5
0.0 1 Broadwell node 1 node + 2x P100 PCIe (16GB) 1 node + 4x P100 PCIe (16GB) per node per node
83 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. GROMACS 5.1.2 February 2017 Water 1.5M on K80s
7 Water 1.5M 6 5.75 Running GROMACS version 5.1.2
5 The blue node contains Dual Intel Xeon 1.9X E5-2699 [email protected] [3.6GHz Turbo] 4 (Broadwell) CPUs 3.49
3.04 The green nodes contain Dual Intel ns/day 3 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 1.1X (autoboost) GPUs 2 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 1 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
85 Water 1.5M on P100s PCIe
10 Water 1.5M 8 7.21 Running GROMACS version 5.1.2 6.96 The blue node contains Dual Intel Xeon 6 E5-2699 [email protected] [3.6GHz Turbo] 2.4X (Broadwell) CPUs 4.39 ns/day 2.3X The green nodes contain Dual Intel 4 Xeon E5-2699 [email protected] [3.6GHz 3.04 Turbo] (Broadwell) CPUs + Tesla P100 1.4X PCIe (16GB) GPUs 2 ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) 4x P100 PCIe (16GB) per node per node per node
86 Water 1.5M on P100s SXM2
9
7.88 8 Water 1.5M 7.18 7 6.70 2.6X Running GROMACS version 5.1.2
6 2.4X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 5 2.2X (Broadwell) CPUs
4.11 ns/day 4 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz 3.04 3 1.4X Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
2 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 1 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x 100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
87 Water 3M on K80s
3.5
Water 3M 2.98 3.0 Running GROMACS version 5.1.2 2.5 The blue node contains Dual Intel Xeon 2.2X E5-2699 [email protected] [3.6GHz Turbo] 2.0 (Broadwell) CPUs
1.59 The green nodes contain Dual Intel ns/day 1.5 1.38 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 1.2X (autoboost) GPUs 1.0 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 0.5 Turbo] (Broadwell)
0.0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
88 Water 3M on P100s PCIe
4.0 Water 3M 3.80 3.5 3.43
2.8X Running GROMACS version 5.1.2 3.0 2.5X The blue node contains Dual Intel Xeon 2.5 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 1.96
ns/day 2.0 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 1.5 1.38 Turbo] (Broadwell) CPUs + Tesla P100 1.4X PCIe (16GB) GPUs 1.0 ➢ 1x P100 PCIe is paired with Single 0.5 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0.0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) 4x P100 PCIe (16GB) per node per node per node
89 Water 3M on P100s SXM2
4.5
4.0 Water 3M 3.82
3.50 3.5 Running GROMACS version 5.1.2
3.0 2.8X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 2.5 2.5X (Broadwell) CPUs
ns/day 2.0 1.84 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz
1.5 1.38 Turbo] (Broadwell) CPUs + Tesla P100 1.3X SXM2 GPUs 1.0 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] 0.5 [3.6GHz Turbo] (Broadwell)
0.0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
90 Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration # of CPU sockets 2 Cores per CPU socket 6+ CPU speed (Ghz) 2.66+
System memory per socket (GB) 32
GPUs Kepler K20, K40, K80
1x # of GPUs per CPU socket Kepler GPUs: need fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons
GPU memory preference (GB) 6
GPU to CPU connection PCIe 3.0 or higher Server storage 500 GB or higher
91 Network configuration Gemini, InfiniBand 91 HOOMD-Blue 1.3.3 February 2017 lj-liquid on K80s
2500 lj-liquid 2000 1942.12 Running HOOMD-Blue version 1.3.3
1594.37 The blue node contains Dual Intel Xeon 5.9X E5-2699 [email protected] [3.6GHz Turbo] 1500 1324.84 (Broadwell) CPUs The green nodes contain Dual Intel timesteps/sec 4.9X Xeon E5-2699 [email protected] [3.6GHz 1000 avg Turbo] (Broadwell) CPUs + Tesla K80 4.1X (autoboost) GPUs ➢ 1x K80 is paired with Single Intel 500 326.52 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
93 lj-liquid on P100s PCIe
3500 lj-liquid 3217.68 3000 2912.66 9.9X Running HOOMD-Blue version 1.3.3 2500 The blue node contains Dual Intel Xeon 8.9X E5-2699 [email protected] [3.6GHz Turbo] /sec 2000 (Broadwell) CPUs
The green nodes contain Dual Intel timesteps 1500 Xeon E5-2699 [email protected] [3.6GHz
avg Turbo] (Broadwell) CPUs + Tesla P100 1000 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single 500 326.52 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe (16GB) 8x P100 PCIe (16GB) per node per node
94 lj-liquid on P100s SXM2
4000 lj-liquid 3500 3397.74 3129.11 Running HOOMD-Blue version 1.3.3 3000 10.4X The blue node contains Dual Intel Xeon 2500 E5-2699 [email protected] [3.6GHz Turbo] /sec 9.6X (Broadwell) CPUs 2000 The green nodes contain Dual Intel timesteps Xeon E5-2698 [email protected] [3.6GHz
avg 1500 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 1000 ➢ 1x P100 SXM2 is paired with Single 500 Intel Xeon E5-2698 [email protected] 326.52 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1x P100 SXM2 8x P100 SXM2 per node per node
95 lj_liquid_512k on K80s
600 lj_liquid_512k 526.47 500 12.1X Running HOOMD-Blue version 1.3.3 400 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 334.59 /sec (Broadwell) CPUs
300 The green nodes contain Dual Intel
timesteps Xeon E5-2699 [email protected] [3.6GHz 220.10 7.7X
avg Turbo] (Broadwell) CPUs + Tesla K80 200 (autoboost) GPUs 5.1X ➢ 1x K80 is paired with Single Intel 100 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) 43.43
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
96 lj_liquid_512k on P100s PCIe
1200 lj_liquid_512k 1045.50 1000 24.1X Running HOOMD-Blue version 1.3.3 800 770.18 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 17.7X (Broadwell) CPUs 600 534.54 The green nodes contain Dual Intel timesteps Xeon E5-2699 [email protected] [3.6GHz
avg 398.12 400 12.3X Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
9.2X ➢ 1x P100 PCIe is paired with Single 200 Intel Xeon E5-2699 [email protected] 43.43 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
97 lj_liquid_512k on P100s SXM2
1200 1119.76 lj_liquid_512k 1000 Running HOOMD-Blue version 1.3.3
793.36 25.8X 800 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 18.3X (Broadwell) CPUs 600 568.51
The green nodes contain Dual Intel timesteps
443.74 Xeon E5-2698 [email protected] [3.6GHz avg 400 13.1X Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 10.2X ➢ 1x P100 SXM2 is paired with Single 200 Intel Xeon E5-2698 [email protected] 43.43 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
98 lj_liquid_1m on K80s
350
lj_liquid_1m 303.00 300 Running HOOMD-Blue version 1.3.3 250 The blue node contains Dual Intel Xeon 13.7X E5-2699 [email protected] [3.6GHz Turbo]
/sec 200 (Broadwell) CPUs 181.42 The green nodes contain Dual Intel
timesteps 150 Xeon E5-2699 [email protected] [3.6GHz
Turbo] (Broadwell) CPUs + Tesla K80 avg 109.54 8.2X (autoboost) GPUs 100 ➢ 1x K80 is paired with Single Intel 5.0X Xeon E5-2699 [email protected] [3.6GHz 50 Turbo] (Broadwell) 22.07
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
99 lj_liquid_1m on P100s PCIe
800 lj_liquid_1m 700 672.46
Running HOOMD-Blue version 1.3.3 600 30.5X The blue node contains Dual Intel Xeon 500 465.58 E5-2699 [email protected] [3.6GHz Turbo]
/sec (Broadwell) CPUs 400 The green nodes contain Dual Intel timesteps 294.88 21.1X Xeon E5-2699 [email protected] [3.6GHz 300 avg Turbo] (Broadwell) CPUs + Tesla P100 204.67 13.4X PCIe (16GB) GPUs 200 9.3X ➢ 1x P100 PCIe is paired with Single 100 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) 22.07 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
100 lj_liquid_1m on P100s SXM2
800 lj_liquid_1m 707.73 700
Running HOOMD-Blue version 1.3.3 600 32.1X The blue node contains Dual Intel Xeon 488.04 500 E5-2699 [email protected] [3.6GHz Turbo] /sec (Broadwell) CPUs 400 22.1X The green nodes contain Dual Intel timesteps 315.07 Xeon E5-2698 [email protected] [3.6GHz
avg 300 Turbo] (Broadwell) CPUs + Tesla P100 221.02 14.3X SXM2 GPUs 200 10.0X ➢ 1x P100 SXM2 is paired with Single 100 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) 22.07 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
101 Microsphere on K80s
180 microsphere 166.74 160 9.5X 140 Running HOOMD-Blue version 1.3.3
120 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
/sec 98.43 (Broadwell) CPUs 100 The green nodes contain Dual Intel 80 timesteps Xeon E5-2699 [email protected] [3.6GHz 64.87 avg 5.6X Turbo] (Broadwell) CPUs + Tesla K80 60 (autoboost) GPUs 3.7X 40 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 17.53 20 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
102 Microsphere on P100s PCIe
400 microsphere 371.24 350
21.2X Running HOOMD-Blue version 1.3.3 300 257.58 The blue node contains Dual Intel Xeon 250 E5-2699 [email protected] [3.6GHz Turbo] /sec 14.7X (Broadwell) CPUs 200
179.54 The green nodes contain Dual Intel timesteps 145.71 Xeon E5-2699 [email protected] [3.6GHz avg 150 Turbo] (Broadwell) CPUs + Tesla P100 10.2X PCIe (16GB) GPUs 100 8.3X ➢ 1x P100 PCIe is paired with Single 50 Intel Xeon E5-2699 [email protected] 17.53 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
103 Microsphere on P100s SXM2
450 microsphere 400 384.72
350 Running HOOMD-Blue version 1.3.3 21.9X 300 The blue node contains Dual Intel Xeon
271.21 E5-2699 [email protected] [3.6GHz Turbo] /sec 250 (Broadwell) CPUs 15.5X 200 186.01 The green nodes contain Dual Intel timesteps Xeon E5-2698 [email protected] [3.6GHz
avg 151.51 150 Turbo] (Broadwell) CPUs + Tesla P100 10.6X SXM2 GPUs 100 ➢ 1x P100 SXM2 is paired with Single 8.6X Intel Xeon E5-2698 [email protected] 50 17.53 [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
104 Polymer on K80s
1600 polymer 1518.99 1400
1209.45 4.2X Running HOOMD-Blue version 1.3.3 1200 The blue node contains Dual Intel Xeon 975.14 1000 E5-2699 [email protected] [3.6GHz Turbo] /sec 3.3X (Broadwell) CPUs 800 The green nodes contain Dual Intel
timesteps 2.7X Xeon E5-2699 [email protected] [3.6GHz
avg 600 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 362.19 400 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 200 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
105 Polymer on P100s PCIe
3000
polymer 2480.70 2500 Running HOOMD-Blue version 1.3.3 2143.15 1999.64 6.8X 2000 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 5.9X (Broadwell) CPUs 1500 The green nodes contain Dual Intel
timesteps 5.5X Xeon E5-2699 [email protected] [3.6GHz avg 1000 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single 500 362.19 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe (16GB) 4x P100 PCIe (16GB) 8x P100 PCIe (16GB) per node per node per node
106 Polymer on P100s SXM2
3000 polymer 2651.56 2500 2272.27 Running HOOMD-Blue version 1.3.3 2111.99 7.3X 2000 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 6.3X (Broadwell) CPUs 1500 The green nodes contain Dual Intel
timesteps 5.8X Xeon E5-2698 [email protected] [3.6GHz avg 1000 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single 500 362.19 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node
107 Quasicrystal on K80s
1400 quasicrystal 1280.44 1200 16.3X Running HOOMD-Blue version 1.3.3 1000 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
/sec 800 767.90 (Broadwell) CPUs
The green nodes contain Dual Intel
timesteps 600 Xeon E5-2699 [email protected] [3.6GHz 502.53
avg Turbo] (Broadwell) CPUs + Tesla K80 9.8X (autoboost) GPUs 400 ➢ 1x K80 is paired with Single Intel 6.4X Xeon E5-2699 [email protected] [3.6GHz 200 Turbo] (Broadwell) 78.32
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
108 Quasicrystal on P100s PCIe
2500 quasicrystal 2261.72 2000 Running HOOMD-Blue version 1.3.3 1791.41 28.9X The blue node contains Dual Intel Xeon 1500 E5-2699 [email protected] [3.6GHz Turbo] /sec (Broadwell) CPUs 1199.64 22.9X
The green nodes contain Dual Intel timsteps 1000 Xeon E5-2699 [email protected] [3.6GHz avg 851.29 Turbo] (Broadwell) CPUs + Tesla P100 15.3X PCIe (16GB) GPUs 10.9X 500 ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) 78.32 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
109 Quasicrystal on P100s SXM2
3000 quasicrystal 2500 2429.68 Running HOOMD-Blue version 1.3.3
2000 1940.29 31.0X The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 24.8X (Broadwell) CPUs 1500 1249.90 The green nodes contain Dual Intel
timsteps Xeon E5-2698 [email protected] [3.6GHz avg 1000 939.53 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
16.0X ➢ 1x P100 SXM2 is paired with Single 500 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) 78.32 12.0X 0 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
110 Triblock-copolymer on K80s
1600 triblock-copolymer 1492.01 1400 4.1X Running HOOMD-Blue version 1.3.3 1200 1170.47 The blue node contains Dual Intel Xeon 1000 953.01 E5-2699 [email protected] [3.6GHz Turbo] /sec 3.2X (Broadwell) CPUs 800 The green nodes contain Dual Intel
timesteps 2.6X Xeon E5-2699 [email protected] [3.6GHz
avg 600 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 361.42 400 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 200 Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
111 Triblock-copolymer on P100s PCIe
3000 triblock-copolymer 2456.09 2500 Running HOOMD-Blue version 1.3.3 2155.27 1999.14 6.8X 2000 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
/sec 6.0X (Broadwell) CPUs 1500 5.5X The green nodes contain Dual Intel
timesteps Xeon E5-2699 [email protected] [3.6GHz avg 1000 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single 500 361.42 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe (16GB) 4x P100 PCIe (16GB) 8x P100 PCIe (16GB) per node per node per node
112 Triblock-copolymer on P100s SXM2
3000.00
triblock-copolymer 2587.91 2500.00 2253.83 Running HOOMD-Blue version 1.3.3 2132.92 7.2X 2000.00 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] /sec 6.2X (Broadwell) CPUs 1500.00 The green nodes contain Dual Intel
timesteps 5.9X Xeon E5-2698 [email protected] [3.6GHz avg 1000.00 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single 500.00 361.42 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node
113 LAMMPS 2016 February 2017 Atomic-Fluid Lennard-Jones 2.5 Cutoff on K80s 1.00 Atomic-Fluid Lennard-Jones
0.80 2.5 Cutoff
Running LAMMPS version 2016
0.60 0.57 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
1/seconds 0.40 0.37 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 1.5X Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 0.20
0.00 1 Broadwell node 1 node + 2x K80 per node
115 Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s PCIe 1.00 Atomic-Fluid Lennard-
0.80 Jones 2.5 Cutoff
0.62 Running LAMMPS version 2016 0.60 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 0.37 1/seconds 0.40 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 1.7X Turbo] (Broadwell) CPUs + Tesla P100 0.20 PCIe (16GB) GPUs
0.00 1 Broadwell node 1 node + 2x P100 PCIe (16GB) per node
116 Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s SXM2
1.00 Atomic-Fluid Lennard- Jones 2.5 Cutoff 0.75 Running LAMMPS version 2016 0.64 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 0.50 (Broadwell) CPUs
0.37 The green nodes contain Dual Intel 1/seconds 1.7X Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 0.25 SXM2 (autoboost) GPUs
0.00 1 Broadwell node 1 node + 2x P100 SXM2 per node
117 Atomic-Fluid Lennard-Jones 5.0 Cutoff on K80s 1.00 Atomic-Fluid Lennard-
0.80 Jones 5.0 Cutoff Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 0.60 (Broadwell) CPUs
The green nodes contain Dual Intel
1/seconds Xeon E5-2699 [email protected] [3.6GHz 0.40 0.36 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 0.26 3.6X ➢ 1x K80 is paired with Single Intel 0.20 0.14 Xeon E5-2699 [email protected] [3.6GHz 0.10 2.6X Turbo] (Broadwell) 1.4X 0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
118 Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s PCIe 1.00 Atomic-Fluid Lennard-Jones
0.80 5.0 Cutoff Running LAMMPS version 2016
The blue node contains Dual Intel Xeon 0.60 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel 0.37 0.38 Xeon E5-2699 [email protected] [3.6GHz 1/seconds 0.40 0.35 Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 0.22 3.7X 3.8X 0.20 3.5X ➢ 1x P100 PCIe is paired with Single 0.10 Intel Xeon E5-2699 [email protected] 2.2X [3.6GHz Turbo] (Broadwell) 0.00 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
119 Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s SXM2
1.00 Atomic-Fluid Lennard-
Running LAMMPS version 2016 0.75 Jones 5.0 Cutoff The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 0.50 0.41 The green nodes contain Dual Intel
1/seconds 0.36 Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 4.1X SXM2 GPUs 0.25 0.22 3.6X ➢ 1x P100 SXM2 is paired with Single 0.10 Intel Xeon E5-2698 [email protected] 2.2X [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
120 Course-grain Water on K80s
0.0100 Course-grain Water 0.0080
Running LAMMPS version 2016
0.0060 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
0.00437 0.00444 (Broadwell) CPUs 1/seconds 0.0040 The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz 1.0X Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 0.0020
0.0000 1 Broadwell node 1 node + 4x K80 per node
121 Course-grain Water on P100s PCIe
0.0100 0.0093 0.0090 Course-grain Water
0.0080
0.0070 2.1X Running LAMMPS version 2016 0.0061 0.0060 The blue node contains Dual Intel Xeon 0.0050 E5-2699 [email protected] [3.6GHz Turbo] 0.0044 (Broadwell) CPUs
1/seconds 0.0040 1.4X The green nodes contain Dual Intel 0.0030 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 0.0020 PCIe (16GB) GPUs
0.0010
0.0000 1 Broadwell node 1 node + 1 node + 4x P100 PCIe (16GB) 8x P100 PCIe (16GB) per node per node
122 Course-grain Water on P100s SXM2
0.0120 Course-grain Water 0.0110 0.0100 2.5X 0.0080 Running LAMMPS version 2016 0.0069 The blue node contains Dual Intel Xeon 0.0060 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 1/seconds 0.0044 1.6X 0.0040 The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 0.0020
0.0000 1 Broadwell node 1 node + 1 node + 4x P100 SXM2 8x 100 SXM2 per node per node
123 EAM on K80s
0.07 0.07 EAM 0.06 7.0X Running LAMMPS version 2016 0.05 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 0.04 0.04 (Broadwell) CPUs
The green nodes contain Dual Intel 0.03 1/seconds Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 0.02 4.0X (autoboost) GPUs 0.02 ➢ 1x K80 is paired with Single Intel 0.01 2.0X Xeon E5-2699 [email protected] [3.6GHz 0.01 Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
124 EAM on P100s PCIe
0.14 EAM 0.13 0.12 13.0X Running LAMMPS version 2016 0.10 The blue node contains Dual Intel Xeon 0.08 E5-2699 [email protected] [3.6GHz Turbo] 0.08 (Broadwell) CPUs
The green nodes contain Dual Intel 0.06
1/seconds Xeon E5-2699 [email protected] [3.6GHz 0.05 8.0X Turbo] (Broadwell) CPUs + Tesla P100 0.04 PCIe (16GB) GPUs 0.03 5.0X ➢ 1x P100 PCIe is paired with Single 0.02 Intel Xeon E5-2699 [email protected] 0.01 3.0X [3.6GHz Turbo] (Broadwell) 0.00 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
125 EAM on P100s SXM2
0.14 EAM 0.13 0.12 13.0X Running LAMMPS version 2016 0.10 The blue node contains Dual Intel Xeon 0.08 E5-2699 [email protected] [3.6GHz Turbo] 0.08 (Broadwell) CPUs
8.0X The green nodes contain Dual Intel 0.06 Xeon E5-2698 [email protected] [3.6GHz 0.05
1/seconds Turbo] (Broadwell) CPUs + Tesla P100 0.04 SXM2 GPUs 0.03 5.0X ➢ 1x P100 SXM2 is paired with Single 0.02 Intel Xeon E5-2698 [email protected] 0.01 3.0X [3.6GHz Turbo] (Broadwell) 0.00 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
126 Gay-Berne on K80s
0.05 Gay-Berne 0.04 0.04 Running LAMMPS version 2016
0.03 The blue node contains Dual Intel Xeon 4.0X E5-2699 [email protected] [3.6GHz Turbo] 0.03 (Broadwell) CPUs
The green nodes contain Dual Intel 3.0X Xeon E5-2699 [email protected] [3.6GHz 1/seconds 0.02 Turbo] (Broadwell) CPUs + Tesla K80 0.02 0.01 (autoboost) GPUs
2.0X ➢ 1x K80 is paired with Single Intel 0.01 Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
127 Gay-Berne on P100s PCIe
0.05 Gay-Berne 0.05
0.04 5.0X 0.04 Running LAMMPS version 2016 4.0X The blue node contains Dual Intel Xeon 0.03 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel 0.02 0.02 Xeon E5-2699 [email protected] [3.6GHz 1/seconds Turbo] (Broadwell) CPUs + Tesla P100 0.01 2.0X PCIe (16GB) GPUs 0.01 ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe (16GB) 2x P100 PCIe (16GB) 4x P100 PCIe (16GB) per node per node per node
128 Gay-Berne on P100s SXM2
0.05 Gay-Berne 0.05 0.04 5.0X 0.04 Running LAMMPS version 2016 4.0X The blue node contains Dual Intel Xeon 0.03 E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel 0.02
1/seconds 0.02 Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 0.01 2.0X SXM2 GPUs 0.01 ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x SXM2 2x SXM2 4x SXM2 per node per node per node
129 Rhodopsin on K80s
0.40 Rhodopsin 0.38 0.35 0.31 1.7X Running LAMMPS version 2016 0.30 1.4X The blue node contains Dual Intel Xeon 0.25 E5-2699 [email protected] [3.6GHz Turbo] 0.22 0.22 (Broadwell) CPUs
0.20 The green nodes contain Dual Intel
1/seconds Xeon E5-2699 [email protected] [3.6GHz 0.15 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
0.10 ➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz 0.05 Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1x K80 per node 2x K80 per node 4x K80 per node
130 Rhodopsin on P100s PCIe
0.60
Rhodopsin 0.52 0.50 0.48 2.4X Running LAMMPS version 2016 0.40 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] 0.33 2.2X (Broadwell) CPUs 0.30 0.29 1.5X The green nodes contain Dual Intel
0.22 Xeon E5-2699 [email protected] [3.6GHz 1/seconds 0.20 1.3X Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single 0.10 Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe 8x P100 PCIe (16GB) per node (16GB) per node (16GB) per node (16GB) per node
131 Rhodopsin on P100s SXM2
0.60
Rhodopsin 0.50 0.50 0.49 Running LAMMPS version 2016 2.2X 2.3X 0.40 0.38 The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo] (Broadwell) CPUs 0.30 1.7X 0.30 The green nodes contain Dual Intel
1/seconds 0.22 1.4X Xeon E5-2698 [email protected] [3.6GHz 0.20 Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single 0.10 Intel Xeon E5-2698 [email protected] [3.6GHz Turbo] (Broadwell)
0.00 1 Broadwell node 1 node + 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 8x P100 SXM2 per node per node per node per node
132 Recommended GPU Node Configuration for LAMMPS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets 2 Cores per CPU socket 6+ CPU speed (Ghz) 2.66+ System memory per socket (GB) 32
GTX Titan X, GPUs Kepler K20, K40, K80, M40
# of GPUs per CPU socket 1-2 GPU memory preference (GB) 6+ GPU to CPU connection PCIe 3.0 or higher
Server storage 500 GB or higher
Network configuration Gemini, InfiniBand Scale to thousands of nodes with same single node configuration 13 133 3 NAMD 2.12 July 2017 APOA1 on K80s
20 APOA1 17.73 16 14.92
Running NAMD version 2.12
12 The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs 8 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 4 3.45 (autoboost) GPUs
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
135 APOA1 on P100s PCIe
28 APOA1 24 22.58 22.85
20 Running NAMD version 2.12
16 The blue node contains Dual Intel Xeon
E5-2690 [email protected] [3.5GHz Turbo] ns/day 12 (Broadwell) CPUs The green nodes contain Dual Intel 8 Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs 4 3.45
0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe (16GB) per node (16GB) per node
136 APOA1 on P100s SXM2
30 APOA1 25 23.87 22.98 23.44
20 Running NAMD version 2.12
The blue node contains Dual Intel Xeon 15 E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs
10 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs 5 3.45
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
137 F1ATPASE on K80s
8 F1ATPASE 6.27 6 Running NAMD version 2.12 4.81 The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] ns/day 4 (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz 2 Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 1.15
0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
138 F1ATPASE on P100s PCIe
10 F1ATPASE 8 7.34 7.40 6.99 Running NAMD version 2.12 6 The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs 4 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 2 PCIe (16GB) GPUs 1.15
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe 4x P100 PCIe (16GB) per node (16GB) per node (16GB) per node
139 F1ATPASE on P100s SXM2
10 F1ATPASE 8 7.11 7.11 6.85 Running NAMD version 2.12 6 The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs 4 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 2 SXM2 GPUs 1.15
0 1 Broadwell node 1 node + 1 node + 1 node + 1x P100 SXM2 2x P100 SXM2 4x P100 SXM2 per node per node per node
140 STMV on K80s
3.0 STMV 2.5
2.085 2.0 Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] ns/day 1.5 1.274 (Broadwell) CPUs The green nodes contain Dual Intel 1.0 Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs 0.5 0.292
0.0 1 Broadwell node 1 node + 1 node + 1x K80 per node 2x K80 per node
141 STMV on P100s PCIe
3.0 STMV 2.5 2.32 2.15
2.0 Running NAMD version 2.12
The blue node contains Dual Intel Xeon 1.5 E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs
1.0 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 0.5 PCIe (16GB) GPUs 0.29
0.0 1 Broadwell node 1 node + 1 node + 1x P100 PCIe 2x P100 PCIe (16GB) per node (16GB) per node
142 STMV on P100s SXM2
3.0 STMV 2.5
2.077 2.0 Running NAMD version 2.12
The blue node contains Dual Intel Xeon 1.5 E5-2690 [email protected] [3.5GHz Turbo] ns/day (Broadwell) CPUs
1.0 The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 0.5 SXM2 GPUs 0.292
0.0 1 Broadwell node 1 node + 1x P100 SXM2 per node
143 NAMD 2.11 – Up to 2X Faster New GPU features in NAMD 2.11 Selected Text from the NAMD website
• GPU-accelerated simulations up to twice as fast as NAMD 2.10 • Pressure calculation with fixed atoms on GPU works as on CPU • Improved scaling for GPU-accelerated particle-mesh Ewald calculation
• CPU-side operations overlap better and are parallelized across cores. • Improved scaling for GPU-accelerated simulations
• Nonbonded force calculation results are streamed from the GPU for better overlap. • NVIDIA CUDA GPU-acceleration binaries for Mac OS X
145 NAMD 2.11 is up to 2x faster
25 APoA1 (92,224 atoms) 20 2.0X 1.6X
15
10 1.2X Simulated Simulated Time (ns/day) 5
0 1 Node 2 Nodes 4 Nodes
NAMD 2.10 & NAMD 2.11 contain Dual Intel E5-2697 [email protected] (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs 146 NAMD 2.11 APoA1 on 1 and 2 nodes
25 24.31 APoA1 4.7X Running NAMD version 2.11 (92,224 atoms) 19.73 20 The blue nodes contain Dual Intel E5- 2698 [email protected] (Haswell) CPUs 16.99 3.8X The green nodes contain Dual Intel E5- 15 2698 [email protected] (Haswell) CPUs + Tesla K80 (autoboost) GPUs 11.67 6.1X
10
4.2X Simulated Simulated Time (ns/day) 5.22 5 2.77
0 1 Node 1 Node + 1 Node + 2 Nodes 2 Nodes + 2 Nodes + 1x K80 2x K80 1x K80 2x K80
147 NAMD 2.11 APoA1 on 4 and 8 nodes
30 27.83 27.74 APoA1 Running NAMD version 2.11 (92,224 atoms) 25 23.52 The blue nodes contain Dual Intel E5- 1.7X 1.6X 2698 [email protected] (Haswell) CPUs 20.64 2.3X 20 The green nodes contain Dual Intel E5- 16.85 2698 [email protected] (Haswell) CPUs + Tesla 2.0X K80 (autoboost) GPUs 15
10.27
Simulated Simulated Time (ns/day) 10
5
0 4 Nodes 4 Nodes + 4 Nodes + 8 Nodes 8 Nodes + 8 Nodes + 1x K80 2x K80 1x K80 2x K80 148 NAMD 2.11 is up to 1.8x faster
10 F1-ATPase (327,506 atoms) 8 1.8X 1.4X
6
4 1.1X Simulated Simulated Time (ns/day) 2
0 1 Node 2 Nodes 4 Nodes
NAMD 2.10 & NAMD 2.11 contain Dual Intel E5-2697 [email protected] (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs 149 NAMD 2.11 F1-ATPase on 1 and 2 nodes
15 Running NAMD version 2.11
F1-ATPase The blue nodes contain Dual Intel E5- (327,506 atoms) 2698 [email protected] (Haswell) CPUs 10.58 10 The green nodes contain Dual Intel E5- 2698 [email protected] (Haswell) CPUs + Tesla 5.7X K80 (autoboost) GPUs 7.23 6.11 3.9X 5 3.87
Simulated Simulated Time (ns/day) 6.5X
4.1X 1.86 0.94
0 1 Node 1 Node + 1 Node + 2 Nodes 2 Nodes + 2 Nodes + 1x K80 2x K80 1x K80 2x K80
150 NAMD 2.11 F1-ATPase on 4 and 8 nodes
20 F1-ATPase Running NAMD version 2.11 15.74 The blue nodes contain Dual Intel E5- (327,506 atoms) 2698 [email protected] (Haswell) CPUs 15 14.22
12.62 2.3X The green nodes contain Dual Intel E5- 11.66 2.1X 2698 [email protected] (Haswell) CPUs + Tesla 3.5X K80 (autoboost) GPUs 10 3.2X
6.88 Simulated Simulated Time (ns/day)
5 3.63
0 4 Nodes 4 Nodes + 4 Nodes + 8 Nodes 8 Nodes + 8 Nodes + 1x K80 2x K80 1x K80 2x K80
151 NAMD 2.11 is up to 1.5x faster
4.0 STMV (1,066,628 atoms) 3.5 1.5X 3.0
2.5
2.0 1.5X
1.5
Simulated Simulated Time (ns/day) 1.0 1.1X
0.5
0.0 1 Node 2 Nodes 4 Nodes
NAMD 2.10 & NAMD 2.11 contain Dual Intel E5-2697 [email protected] (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs 152 NAMD 2.11 STMV on 1 and 2 nodes
4 STMV Running NAMD version 2.11 3.27 (1,066,628 atoms) The blue nodes contain Dual Intel E5- 3 2698 [email protected] (Haswell) CPUs 7.1X The green nodes contain Dual Intel E5- 2698 [email protected] CPUs (Haswell) + Tesla K80 (autoboost) GPUs 1.98 2 1.75 4.3X
7.6X Simulated Simulated Time (ns/day) 1.03 1
4.5X 0.46 0.23
0 1 Node 1 Node + 1 Node + 2 Nodes 2 Nodes + 2 Nodes + 1x K80 2x K80 1x K80 2x K80
153 NAMD 2.11 STMV on 4 and 8 nodes
8 STMV Running NAMD version 2.11 (1,066,628 atoms) 6.24 The blue nodes contain Dual Intel E5- 6 5.86 2698 [email protected] (Haswell) CPUs 3.6X The green nodes contain Dual Intel E5- 4.54 2698 [email protected] CPUs (Haswell) + Tesla 3.4X K80 (autoboost) GPUs 4 3.61 5.0X
4.0X Simulated Simulated Time (ns/day) 2 1.74
0.90
0 4 Nodes 4 Nodes + 4 Nodes + 8 Nodes 8 Nodes + 8 Nodes + 1x K80 2x K80 1x K80 2x K80
154 Benefits of MD GPU-Accelerated Computing Why wouldn’t you want to turbocharge your research?
• 3x-8x Faster than CPU only systems in all tests (on average) • Most major compute intensive aspects of classical MD ported • Large performance boost with marginal price increase • Energy usage cut by more than half • GPUs scale well within a node and/or over multiple nodes • K80 GPU is our fastest and lowest power high performance GPU yet
Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive 155 Molecular Dynamics (MD) on GPUs Dec. 19, 2016 GPU-Accelerated Quantum Chemistry Apps Green Lettering Indicates Performance Slides Included
Abinit Gaussian Octopus Quantum SuperCharger ACES III GPAW ONETEP Library ADF LATTE Petot RMG BigDFT LSDalton Q-Chem TeraChem CP2K MOLCAS QMCPACK UNM GAMESS-US Mopac2012 Quantum VASP Espresso NWChem WL-LSMS
GPU Perf compared against dual multi-core x86 CPU socket. 157