Multiscale Implementation of Infinite-Swap Replica Exchange Molecular Dynamics
Tang-Qing Yu (a), Jianfeng Lu (b, c, d), Cameron F. Abrams (e), and Eric Vanden-Eijnden (a, 1)

(a) Courant Institute of Mathematical Sciences, New York University, New York, NY 10012; (b) Department of Mathematics, Duke University, Durham, NC 27708; (c) Department of Physics, Duke University, Durham, NC 27708; (d) Department of Chemistry, Duke University, Durham, NC 27708; and (e) Department of Chemical and Biological Engineering, Drexel University, Philadelphia, PA 19104

Edited by Michael L. Klein, Temple University, Philadelphia, PA, and approved August 9, 2016 (received for review March 28, 2016)

Replica exchange molecular dynamics (REMD) is a popular method to accelerate conformational sampling of complex molecular systems. The idea is to run several replicas of the system in parallel at different temperatures that are swapped periodically. These swaps are typically attempted every few MD steps and accepted or rejected according to a Metropolis–Hastings criterion. This guarantees that the joint distribution of the composite system of replicas is the normalized sum of the symmetrized product of the canonical distributions of these replicas at the different temperatures. Here we propose a different implementation of REMD in which (i) the swaps obey a continuous-time Markov jump process implemented via Gillespie's stochastic simulation algorithm (SSA), which also samples exactly the aforementioned joint distribution and has the advantage of being rejection free, and (ii) this REMD-SSA is combined with the heterogeneous multiscale method to accelerate the rate of the swaps and reach the so-called infinite-swap limit that is known to optimize sampling efficiency. The method is easy to implement and can be trivially parallelized. Here we illustrate its accuracy and efficiency on the examples of alanine dipeptide in vacuum and C-terminal β-hairpin of protein G in explicit solvent. In this latter example, our results indicate that the landscape of the protein is a triple funnel with two folded structures and one misfolded structure that are stabilized by H-bonds.

importance sampling | REMD | SSA | HMM | protein-folding

Significance

Efficiently sampling the conformational space of complex molecular systems is a difficult problem that frequently arises in the context of molecular dynamics (MD) simulations. Here we present a modification of replica exchange MD (REMD) that is as simple to implement and parallelize as standard REMD but is shown to perform better in terms of sampling efficiency. The method is used here to investigate the multifunnel structure of the C-terminal β-hairpin of protein G in explicit solvent, which to date has required much longer REMD simulations to produce converged predictions of conformational statistics. In particular, we identify a potentially important folded β structure previously undetected by other importance sampling methods.

Author contributions: T.-Q.Y., J.L., and E.V.-E. designed research; T.-Q.Y., J.L., C.F.A., and E.V.-E. performed research; J.L. and E.V.-E. contributed new reagents/analytic tools; T.-Q.Y., C.F.A., and E.V.-E. analyzed data; and T.-Q.Y., J.L., C.F.A., and E.V.-E. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

(1) To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental.

11744–11749 | PNAS | October 18, 2016 | vol. 113 | no. 42 | www.pnas.org/cgi/doi/10.1073/pnas.1605089113

All-atom molecular dynamics (MD) simulations are increasingly used as a tool to study the structure and function of large biomolecules and other complex molecular systems from first principles. By allowing us to scrutinize these systems at spatiotemporal resolutions that are typically beyond experimental reach, MD simulations are changing the way we understand chemical and biological processes, and their impact in areas such as chemical engineering, synthetic biology, drug design, and so on is bound to increase in the coming years (1–3). However, exploiting the full potential of MD simulations remains a major challenge, primarily due to approximations in all-atom force fields and the limitation of accessible timescales available to standard MD. Force fields continue to be improved (4–7), and hardware developments involving special-purpose high-performance computers (8), high-performance graphics processing units (9), or massively parallel simulations (10) also permit us to enlarge the scope of MD simulations. However, it has been argued (11) that algorithm developments will be the key to access biologically relevant timescales with MD. In particular, there is a great need for methods capable of accelerating the exploration and sampling of the conformational space of complex molecular systems.

Among the various techniques that have been introduced to this end in recent years, replica exchange MD (REMD) (12) remains one of the most popular. The reasons for this success are simple: REMD does not require much prior information on the system (in particular, it does not require one to prescribe collective variables beforehand), it is fairly versatile, and it is easy to parallelize. The basic idea behind REMD is to introduce several additional copies, or replicas, of the system that have temperatures higher than the one(s) of interest and are therefore expected to explore its conformational space faster. Control parameters other than the temperature can also be used, as long as adjusting their values permits accelerated sampling. These replicas are then evolved in parallel using standard MD, and their temperatures are periodically swapped. If the swaps are performed in a thermodynamically consistent way, the correct canonical averages at any of these temperatures can be sampled from the pieces of the replica trajectories during which they feel this temperature. In the standard implementation of REMD, this is done by attempting swaps at a predefined frequency and accepting or rejecting them according to a Metropolis–Hastings criterion. This guarantees that the joint distribution of the replicas is the symmetrized product of their canonical distributions at the different temperatures. The frequency of swap attempts is an input parameter in the method, and recent simulations argue that higher frequencies generally provide better sampling (13–17). This was justified rigorously through large deviation theory (18). How to implement high-frequency swap-attempt REMD is not obvious, however, because increasing the attempt frequency of the swap also increases the computational cost, sometimes with little gain because this also leads to more rejections. One way to get around this difficulty is to take the so-called infinite-swap limit analytically and simulate the resulting limiting equations. This solution was first advocated in ref. 19 and later revisited in ref. 20. Unfortunately, this solution cannot be implemented directly if the number of temperatures is large, because the limiting equation involves coefficients whose calculation requires computing a sum over all possible permutations of the temperatures, and if there are N temperatures, there are of course N! such permutations. Further approximations are therefore required in practice (19, 20).

In this paper, we propose a different solution to this problem that involves no approximation. It is based on the observation that REMD can be reformulated in such a way that the temperature swaps satisfy a continuous-time Markov jump process (MJP) rather than a discrete-time Markov chain. This MJP can be simulated via Gillespie's stochastic simulation algorithm (SSA) (21), which is rejection-free: Given the current assignments of the temperatures across the replicas, the method computes directly the time at which the next swap occurs and proceeds with the MD until that time. This is in contrast with standard REMD, in which the swaps are proposed (and rejected or accepted) at fixed time lags. This REMD-SSA can then be combined with multiscale simulation schemes such as the heterogeneous multiscale method (HMM) (22–24) to effectively compute at the infinite-swap limit. Here, we discuss the theoretical and computational aspects of this reformulation of REMD involving a multiscale SSA, termed REMD-MSSA for short, and apply it to compute free-energy surfaces (FESs) of alanine dipeptide (AD) and a 16-residue β-hairpin from protein G. In both applications, we observe that REMD-MSSA is more efficient than standard REMD: In particular, the rapid convergence of the free energy calculation in the context of β-hairpin allows one to identify a β structure.

Methodology

... restrict ourselves to transposition moves in which two random indices are swapped and for which the total number of accessible states σ′ from σ is N(N − 1)/2.

These dynamics can also be reformulated as follows. Given that σ(t) = σ at time t, the probability that it jumps out of σ for the first time in the infinitesimal time interval [t′, t′ + dt] with t′ > t and reaches σ′ ≠ σ is given by

\[
\exp\left( -\sum_{\sigma''} \int_t^{t'} q_{\sigma,\sigma''}(X(s))\, ds \right) q_{\sigma,\sigma'}(X(t'))\, dt, \tag{6}
\]

where X(s) is the solution to Eq. 4 for s ∈ [t, t′] with σ(t) = σ fixed. This reformulation of the process can be used to design straightforward variants of Gillespie's SSA to simulate it; for this reason we will refer to it as REMD with SSA, or REMD-SSA for short. How to implement REMD-SSA with a finite ν is explained in Supporting Information, and how to reach the limit ν → ∞ is dealt with below. In both cases, these implementations are rejection-free and akin to kinetic Monte-Carlo schemes.

For any swapping rate ν, the dynamics described in Eqs. 4 and 5 satisfies detailed balance with respect to Eq. 2 and therefore has Eq. 2 as its equilibrium distribution (see Supporting Information for details).