AMOEBA and HPC: Parallelising Polarisable Force Fields
Weronika Filinger (w.fi[email protected]), EPCC, The University of Edinburgh
Why we want to use polarisable force fields

The accuracy of biomolecular simulations can be seriously limited by the traditional treatment of electrostatics in empirically derived force fields. Non-polarisable models typically approximate electrostatic interactions using only fixed, usually atom-centred, charges, so the atomic charges are limited in their ability to adapt to changes in their local environment. Real physical systems, however, do adapt and undergo polarisation: for example, the orientation of atomic bonds will distort when placed in poorly conducting media such as water, changing the geometry, the energy and the way the system interacts. Polarisation is thus a necessary and important intermolecular interaction, required to accurately model chemical reactions in biological molecules and water-based systems, which are of great interest in fields such as chemistry, biochemistry and materials science.

Much effort has been put into developing mutually polarisable models, which enhance fixed-charge models by incorporating the polarisation of atoms and molecules into the dynamics of such systems. These polarisable models are more accurate but also more complex, and hence tend to be much more computationally demanding. There is therefore a strong requirement for the software packages that implement polarisable force fields to be capable of exploiting current and emerging HPC architectures, to make the modelling of ever more complex systems viable.

OpenMP Performance

The performance of dynamic on the Joint AMBER-CHARMM (JAC) benchmark [5], run on the UK ARCHER supercomputing service, is shown in Figure 1. Each ARCHER node has 24 cores, so the program can be run on up to 24 OpenMP threads. The JAC system consists of 2,489 atoms of dihydrofolate reductase (DHFR) in a bath of 7,023 water molecules, giving a total of 23,558 atoms. The simulation ran 100 MD steps in a canonical ensemble (NVT: constant number of atoms, volume and temperature) at a temperature of 298 K, with a time step of 1.0 femtoseconds. From the performance and scalability analysis we can note that:

• The application scales for up to 20 OpenMP threads, but a parallel efficiency above 50% is only observed for up to 12 threads;
• Slightly better performance and scalability are observed for larger systems, but memory usage restricts the maximum size of system that can be run;
• An attempt to improve the performance through OpenMP optimisations was not very successful; the small performance improvements obtained were compiler and architecture dependent;
• The difference between the observed and theoretical speedups suggests that, without increased parallel coverage, OpenMP optimisation will not improve the performance significantly.

Figure 1. Speedups for different numbers of OpenMP threads for the JAC benchmark on ARCHER. The ideal speedup is the speedup that would be obtained if the whole code were parallelised with maximal efficiency; the theoretical speedup is calculated using Amdahl's law [6] with a serial component (i.e. the fraction of the code that is executed serially) of 5%.

The next step is to create a hybrid MPI and OpenMP implementation, using OpenMP to exploit the shared memory within a node and MPI for the internode communications.

AMOEBA

AMOEBA [1] (Atomic Multipole Optimised Energies for Biomolecular Applications) is a polarisable force field that has been implemented and is used in codes such as TINKER [2] and AMBER [3]. AMOEBA replaces fixed partial charges with atomic multipoles, including an explicit dipole polarisation that allows atoms to respond to changes in their molecular environment.

[Image: Amoeba proteus, taken from http://www.microscopy-uk.org.uk/mag/imgsep01/amoebaproteus450.jpg]

AMOEBA has been shown to give more accurate structural and thermodynamic properties for a variety of simulated systems [4].
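The explicit dipole polarisation can be written in its standard mutual-induction form (a generic sketch of how such polarisable models work; the notation is ours, not taken from this poster): each polarisable site i acquires an induced dipole proportional to the total electric field it experiences,

```latex
\mu_i^{\mathrm{ind}} = \alpha_i \left( E_i^{\mathrm{perm}} + E_i^{\mathrm{ind}} \right)
```

where \alpha_i is the site polarisability, E_i^{\mathrm{perm}} is the field from the permanent multipoles, and E_i^{\mathrm{ind}} is the field from all the other induced dipoles. Because the induced dipoles appear on both sides, the equations must be solved self-consistently (typically by iteration) at every MD step, which is why evaluating the induced dipoles is so much more expensive than a fixed-charge evaluation.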
However, the time required to calculate the effect of the induced dipoles alone is 15-20 times larger than the cost of one time step for a fixed-charge model. If this computational time could be reduced, then the AMOEBA force field could, for example, model a much broader range of protein-ligand systems than is currently possible, or provide a means of refining the analysis of X-ray crystallography data to cope with large and challenging data sets, such as for ribosome crystals. The larger computational cost associated with the polarisation calculations needs to be mitigated by improving the parallelisation of codes such as TINKER.

MPI Parallelisation

An MPI parallelisation effort is under way. Due to the time constraints of the APES project and the complexity of the TINKER code, EPCC's efforts have been focused on a replicated data strategy; this should enable performance improvements within the time frame of the project. Micro-benchmarks of the most time-consuming subroutines parallelised with MPI, together with scaling profiles, indicate that using both MPI and OpenMP over the same regions will not give satisfactory scaling (Table 1): there is not enough computation to compensate for the MPI communication costs in most subroutines.

Table 1. Speedup and parallel efficiency for different numbers of MPI processes for the JAC benchmark.

MPI processes       | 1    | 2    | 4    | 8    | 16
Speedup             | 1    | 1.48 | 1.85 | 1.87 | 1.36
Parallel efficiency | 100% | 74%  | 46%  | 23%  | 9%
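The quantities in Table 1, and the theoretical speedup used for Figure 1, follow from two short calculations. A minimal sketch in Python (the function name is illustrative, not from the TINKER code; the values agree with Table 1 up to rounding):

```python
def amdahl_speedup(p, serial_fraction=0.05):
    """Amdahl's law: theoretical speedup on p threads/processes when a
    fraction of the code (here 5%, as quoted for Figure 1) runs serially."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Measured speedups for the JAC benchmark (Table 1).
measured_speedup = {1: 1.0, 2: 1.48, 4: 1.85, 8: 1.87, 16: 1.36}

for p, s in measured_speedup.items():
    # Parallel efficiency is the measured speedup divided by the process count.
    print(f"{p:2d} MPI processes: speedup {s:.2f}, efficiency {100 * s / p:.1f}%")

# With a 5% serial component, Amdahl's law caps the achievable speedup
# well below the thread count (about 7.7 on 12 threads).
print(f"Theoretical speedup on 12 threads: {amdahl_speedup(12):.1f}")
```

Note how quickly even a small serial fraction dominates as the process count grows, consistent with the efficiency collapse at 16 processes in Table 1 and the plateau around 20 OpenMP threads in Figure 1.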
Currently, work is being done to increase parallel coverage as much as possible while keeping communication and synchronisation between MPI processes to a minimum.

TINKER

TINKER is a software package for molecular mechanics and dynamics, with some special features for biopolymers, written by Jay Ponder (Washington University in St. Louis). TINKER consists of a collection of programs offering different functionalities and supports customised force field parameter sets for proteins, organic molecules, inorganic systems, etc., including the AMOEBA force field.

[Image: TINKER's logo, taken from TINKER's website [2]]

TINKER is written in Fortran and has been parallelised using OpenMP directives. Nearly 95% of the dynamic program included in the TINKER distribution, which can be used to generate a stochastic dynamics trajectory or to compute a molecular dynamics trajectory in a chosen ensemble, can be executed in parallel. However, the scalability of dynamic has been found to be limited, and further parallelisation efforts have therefore been undertaken by EPCC as part of the Advanced Potential Energy Surfaces (APES) project, an NSF-EPSRC funded US-UK collaboration.

[Image: DHFR in water, as used in the JAC benchmark, taken from http://www.gromacs.org/@api/deki/files/130/=dhfr_water.png]

Difficulties

During both stages of the parallelisation of TINKER, a number of issues affecting the performance and the effectiveness of the parallelisation have been observed:

• The fine-grained structure of the code, i.e. many small subroutines that are called many times, makes it hard to parallelise;
• Memory issues: large memory usage and indirect memory addressing produce bad memory locality and lead to memory thrashing, with a degradation in the overall performance;
• The large number of global variables makes it hard to identify dependencies between different variables and subroutines.

References

[1] J. W. Ponder et al., J. Phys. Chem. B, 2010, 114(8), 2549-2564.
[2] J. W. Ponder and F. M. Richards, J. Comput. Chem., 1987, 8, 1016-1024. Website: http://dasher.wustl.edu/tinker/
[3] D. A. Case et al. (2014), AMBER 14, University of California, San Francisco. Website: http://ambermd.org
[4] O. Demerdash, E-H. Yap and T. Head-Gordon, Annu. Rev. Phys. Chem., 2014, 65: 149-74.
[5] http://ambermd.org/amber10.bench1.html#jac, last visited on 9th April 2015.
[6] G. M. Amdahl, AFIPS Conf. Proc., 1967, 30: 483-485.