Multiscale implementation of infinite-swap replica exchange

Tang-Qing Yua, Jianfeng Lub,c,d, Cameron F. Abramse, and Eric Vanden-Eijndena,1

aCourant Institute of Mathematical Sciences, New York University, New York, NY 10012; bDepartment of Mathematics, Duke University, Durham, NC 27708; cDepartment of Physics, Duke University, Durham, NC 27708; dDepartment of Chemistry, Duke University, Durham, NC 27708; and eDepartment of Chemical and Biological Engineering, Drexel University, Philadelphia, PA 19104

Edited by Michael L. Klein, Temple University, Philadelphia, PA, and approved August 9, 2016 (received for review March 28, 2016)

Replica exchange molecular dynamics (REMD) is a popular method values permits accelerated sampling. These replicas are then to accelerate conformational sampling of complex molecular systems. evolved in parallel using standard MD, and their temperatures are The idea is to run several replicas of the system in parallel at different periodically swapped. If the swaps are performed in a thermody- temperatures that are swapped periodically. These swaps are typically namically consistent way, the correct canonical averages at any of attempted every few MD steps and accepted or rejected according to a these temperatures can be sampled from the pieces of the replica Metropolis–Hastings criterion. This guarantees that the joint distribu- trajectories during which they feel this temperature. In the stan- tion of the composite system of replicas is the normalized sum of the dard implementation of REMD, this is done by attempting swaps symmetrized product of the canonical distributions of these replicas at at a predefined frequency and accepting or rejecting them according – the different temperatures. Here we propose a different implementa- to a Metropolis Hastings criterion. This guarantees that the joint tion of REMD in which (i) the swaps obey a continuous-time Markov distribution of the replicas is the symmetrized product of their ca- jump process implemented via Gillespie’s stochastic simulation algo- nonical distributions at the different temperatures. The frequency of rithm (SSA), which also samples exactly the aforementioned joint dis- swap attempts is an input parameter in the method, and recent tribution and has the advantage of being rejection free, and (ii)this simulations argue that higher frequencies generally provide better sampling (13–17). This was justified rigorously through large REMD-SSA is combined with the heterogeneous multiscale method to deviation theory (18). How to implement high-frequency swap- accelerate the rate of the swaps and reach the so-called infinite-swap attempt REMD is not obvious, however, because increasing the limit that is known to optimize sampling efficiency. The method is easy attempt frequency of the swap also increases the computational to implement and can be trivially parallelized. Here we illustrate its cost, sometimes with little gain because this also leads to more accuracy and efficiency on the examples of alanine dipeptide in vac- β rejections. One way to get around this difficulty is to take the so- uum and C-terminal -hairpin of protein G in explicit solvent. In this called infinite-swap limit analytically and simulate the resulting latter example, our results indicate that the landscape of the protein is limiting equations. This solution was first advocated in ref. 19 and a triple funnel with two folded structures and one misfolded structure later revisited in ref. 20. Unfortunately, this solution cannot be that are stabilized by H-bonds. implemented directly if the number of temperatures is large, be- cause the limiting equation involves coefficients whose calculation importance sampling | REMD | SSA | HMM | protein-folding requires computing sum over all possible permutations of the temperatures, and if there are N temperatures, there are of course ll-atom molecular dynamics (MD) simulations are increasingly N! such permutations. Further approximations are therefore re- Aused as a tool to study the structureandfunctionoflarge quired in practice (19, 20). biomolecules and other complex molecular systems from first prin- In this paper, we propose a different solution to this problem ciples. By allowing us to scrutinize these systems at spatiotemporal that involves no approximation. It is based on the observation resolutions that are typically beyond experimental reach, MD sim- that REMD can be reformulated in such a way that the tem- ulations are changing the way we understand chemical and biological perature swaps satisfy a continuous-time Markov jump process processes, and their impact in areas such as chemical engineering, (MJP) rather than a discrete-time . This MJP can syntheticbiology,drugdesign,andsoonisboundtoincreaseinthe coming years (1–3). However, exploiting the full potential of MD Significance simulations remains a major challenge, primarily due to approxi- mations in all-atom force fields and the limitation of accessible Efficiently sampling the conformational space of complex mo- timescales available to standard MD. Force fields continue to be lecular systems is a difficult problem that frequently arises in the improved (4–7), and hardware developments involving special-pur- context of molecular dynamics (MD) simulations. Here we pre- pose high-performance computers (8), high-performance graphics sent a modification of replica exchange MD (REMD) that is as processing units (9), or massively parallel simulations (10) also per- simple to implement and parallelize as standard REMD but is mit us to enlarge the scope of MD simulations. However, it has been shown to perform better in terms of sampling efficiency. The argued (11) that algorithm developments will be the key to access method is used here to investigate the multifunnel structure of biologically relevant timescales with MD. In particular, there is a the C-terminal β-hairpin of protein G in explicit solvent, which to great need for methods capable of accelerating the exploration and date has required much longer REMD simulations to produce sampling of the conformational space of complex molecular systems. converged predictions of conformational statistics. In particular, Among the various techniques that have been introduced to we identify a potentially important folded β structure previously this end in recent years, replica exchange MD (REMD) (12) undetected by other importance sampling methods. remains one of the most popular. The reasons for this success are simple: REMD does not require much prior information on the Author contributions: T.-Q.Y., J.L., and E.V.-E. designed research; T.-Q.Y., J.L., C.F.A., and E.V.-E. system (in particular, it does not require one to prescribe col- performed research; J.L. and E.V.-E. contributed new reagents/analytic tools; T.-Q.Y., C.F.A., lective variables beforehand), it is fairly versatile, and it is easy to and E.V.-E. analyzed data; and T.-Q.Y., J.L., C.F.A., and E.V.-E. wrote the paper. parallelize. The basic idea behind REMD is to introduce several The authors declare no conflict of interest. additional copies, or replicas, of the system that have temperatures This article is a PNAS Direct Submission. higher than the one(s) of interest and are therefore expected to 1To whom correspondence should be addressed. Email: [email protected]. explore its conformational space faster—control parameters other This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. than the temperature can also be used, as long as adjusting their 1073/pnas.1605089113/-/DCSupplemental.

11744–11749 | PNAS | October 18, 2016 | vol. 113 | no. 42 www.pnas.org/cgi/doi/10.1073/pnas.1605089113 Downloaded by guest on October 1, 2021 be simulated via Gillespie’s stochastic simulation algorithm (SSA) restrict ourselves to transposition moves in which two random indices are (21), which is rejection-free: Given the current assignments of the swapped and for which the total number of accessible states σ′ from σ is temperatures across the replicas, the method computes directly the NðN − 1Þ=2. time at which the next swap occurs and proceeds with the MD until σ = σ that time—this is in contrast with standard REMD, in which the These dynamics can also be reformulated as follows. Given that ðtÞ at time t, the probability that it jumps out of σ for the first time in the in- swaps are proposed (and rejected or accepted) at fixed time lags. finitesimal time interval ½t′; t′ + dt with t′ > t and reaches σ′ ≠ σ is given by This REMD-SSA can then be combined with multiscale simulations Z ! X t′ schemes such as the heterogeneous multiscale method (HMM) exp − qσ;σ′′ðXðsÞÞds qσ;σ′ðXðt′ÞÞdt, [6] (22–24) to effectively compute at the infinite-swap limit. Here, we σ′′ t discuss the theoretical and computational aspects of this refor- X ∈ ; ′ σ = σ mulation of REMD involving a multiscale SSA, termed REMD- where ðsÞ is the solution to Eq. 4 for s ½t t with ðtÞ fixed. This MSSA for short, and apply it to compute free-energy surfaces reformulation of the process can be used to design straightforward variants β of Gillespie’s SSA to simulate it—for this reason we will refer to it as REMD (FESs) of alanine dipeptide (AD) and a 16-residue -hairpin from with SSA, or REMD-SSA for short. How to implement REMD-SSA with a finite protein G. In both applications, we observe that REMD-MSSA is ν is explained in Supporting Information, and how to reach the limit ν → ∞ is more efficient than standard REMD: In particular, the rapid con- β dealt with below. In both cases, these implementations are rejection-free vergence of the free energy calculation in the context of -hairpin and akin to kinetic Monte-Carlo schemes. β allows one to identify a structure. For any swapping rate ν, the dynamics described in Eqs. 4 and 5 satisfies with respect to Eq. 2 and therefore has Eq. 2 as its equi- Methodology librium distribution (see Supporting Information for details). As a result, the REMD-SSA. Let us propose a reformulation of REMD such that (i) the tem- expectation of an observable A at any temperature 1=β can be calculated as Z j perature swaps are replaced by swaps in factors multiplying the forces acting Æ æ = x ρ x x on the replicas and (ii) these swaps occur via a continuous-time MJP—the A j Að Þ j ð Þd generalization to other control parameters is straightforward and is con- Z ! X XN sidered briefly in Using Parameters Other than Temperature. To this end, it is = x ϱ σ; X X Að i Þ1j=σðiÞ ð Þd [7] useful to take as a starting point the probability distribution that a REMD σ i=1 Z simulation is designed to sample. Suppose we use N replicas with positions T XN X = x ; ⋯; x β > β ⋯ > β 1 denoted collectively as f 1 Ng and let 1 2 N be the N inverse ≈ Aðx ðtÞÞ1 =σ ; dt, T i j ði tÞ temperatures that are being swapped over these replicas. Denote also 0 i=1 −1 −β VðxÞ ρ ðxÞ = Zβ e i the canonical distribution at inverse temperature β over the i i R i −β V x where 1 =σ = 1ifj = σðiÞ and 1 =σ = 0 otherwise, and similarly for 1 =σ : atomic potential VðxÞ (with Zβ = e i ð Þ dx). Then REMD samples the sym- j ðiÞ j ðiÞ j ði, tÞ i Here σ i, t denotes the index onto which i is mapped at time t by the time- metrized equilibrium probability density (18, 19): ð Þ dependent permutation σðtÞ.

1 X ϱðXÞ = ρσ ðx Þ⋯ρσ ðx Þ, [1] Infinite-Swap REMD. It can be shown that the sampling efficiency of REMD- N! ð1Þ 1 ðNÞ N σ SSA increases monotonically with ν (Supporting Information). This suggests one should operate the scheme in the infinite-swap limit ν → ∞, and so let us where the sum is taken over all of the permutation σ of the indices f1; ⋯; Ng consider briefly this limit next—how to operate within it in practice will be [with σðiÞ denoting the index onto which i is mapped by the permutation σ]. The described in the next section. As ν → ∞, the permutation σ evolves infinitely symmetrized density in Eq. 1 can be thought of as the marginal density on the faster than X, meaning that σ is always at equilibrium conditional on the positions X alone of the following joint distribution for X and the permutation σ: current value of XðtÞ, and XðtÞ only feels the average effect of σ. In other words, the dynamics of X is captured by the limiting equation 1 ϱ σ; X = ρ x ρ x _ −1 ð Þ σ ð 1Þ⋯ σ ð NÞ. [2] xj = m p , N! ð1Þ ðNÞ j pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi [8] p_ = − X ∇ x − γp + γ β−1η j Rj ð Þ V j j 2 m j . Performing temperature swaps is equivalent to evolving the permutation σ concurrently with the replica configurations X in a way that is consistent Here X with Eq. 2. In standard REMD this is done by proposing a new permutation X = β−1 β ω σ CHEMISTRY σ′ ≠ σ Rj ð Þ σ−1 ðjÞ X ð Þ, [9] after a fixed time lag and accepting or rejecting it according to a σ Metropolis–Hastings criterion. However, it is easy to modify the method and − make both X and σ continuous-time Markov processes in which the updates where σ 1ðjÞ denotes the index mapped onto j by the permutation σ,isthe of σ occur at random times. Introducing the symmetrized (mixture) potential averaged rescaling parameter of the force, with the average taken with respect to the equilibrium distribution of σ given X: −β X;σ XN e Vð Þ ϱðσ; XÞ −1 −1 ωX ðσÞdP = P . [10] X; σ = −β log ϱ σ; X = β β V x + cst, [3] −βVðX;σ′Þ Vð Þ ð Þ σðiÞ ð i Þ σ′e σ′ϱðσ′; XÞ i=1

β ≡ β ∇ X; σ = We note that Eq. 8 is exactly the infinite-swap REMD (ISREMD) formulated in BIOPHYSICS AND where we used 1 as reference temperature, and noting that xj Vð Þ − ref. 20.

1 COMPUTATIONAL BIOLOGY β βσ ∇Vðxj Þ, this amounts to imposing that: ðjÞ The equilibrium distribution sampled by the limiting equations (Eq. 8)is i) the replica positions evolve via standard MD (using, e.g., Langevin’s the mixed distribution ϱðXÞ in Eq. 1. Therefore, the canonical average of A at thermostat with friction coefficient γ) over the potential (Eq. 3), β can be estimated by j Z Æ æ = x ρ x x _ −1 A j Að Þ j ð Þd xj = m p ; j pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi [4] p_ = −β−1β ∇ x − γp + γ β−1 η ; Z j σðjÞ V j j 2 m j XN = Aðx Þη ; ðXÞϱðXÞdX where η is white noise with mean zero and covariance Æη t ηT t′ æ = i i j [11] j j ð Þ k ð Þ = δ δ − ′ i 1 j;k ðt t ÞId, and Z σ T XN ii) the permutation is updated via the continuous-time MJP with rate ≈ 1 x η X Að i ðtÞÞ i;j ð ðtÞÞdt, σ;σ′ X σ′ ≠ σ T q ð Þ with given by 0 i=1

−1 βðVðX;σ′Þ−VðX;σÞÞ where qσ;σ′ðXÞ = νaσ;σ′e 2 , [5] X η X = ω σ ν > ν i;j ð Þ 1j=σðiÞ X ð Þ [12] where 0 is a constant that controls the swapping rate (the higher , σ the more frequently the jumps occur), and the symmetric matrix aσ;σ′ = aσ′;σ indicates whether the permutation σ′ is accessible from σ,thatis,aσ;σ′ = 1if is the probability that the ith replica is at the jth temperature conditional on we allow the jump from σ to σ′ and aσ;σ′ = 0 otherwise—below we will the replica positions being at X.

Yu et al. PNAS | October 18, 2016 | vol. 113 | no. 42 | 11745 Downloaded by guest on October 1, 2021 In ISREMD, the permutation σ is no longer a dynamical variable: We Supporting Information. It is closer to what is done in standard REMD, ex- onlyneedtoevolveXðtÞ accordingtoEq.8. Unfortunately, this involves cept that many swaps are attempted at each MD step instead of one swap calculating the factor Rj ðXÞ defined in Eq. 9, and the cost of these calcu- every few MD steps (i.e., this implementation still permits one to effectively lations increases very rapidly with the number of replicas N, because the compute in the infinite-swap limit). number of permutations grows as a factorial of N. On top of this, to es- Like standard REMD, REMD-MSSA is easily parallelizable. For example, the η X timate averages we also need to compute i,j ð Þ, which is costly too. One MD simulations of each replica can be performed on different nodes with an way to address this problem is to introduce additional approximations, additional node used to evolve σ. At each MD step, the energies Vðxj Þ of the for example, partial swapping (18, 19). Next, we propose an alternative N replicas need to be communicated to the node evolving σ, and this node ^ strategy that permits us to reach the infinite-swap limit while working sends back to those evolving Xk the N values of Rj ðXkÞ. The calculation of directly with the original dynamics defined in Eqs. 4 and 5, without in- averages can be handled separately and requires the N values of Aðxj Þ and − ^η X troducing additional approximations. the NðN 1Þ values of i,jð kÞ. A version REMD-MSSA has been implemented into (v3.4.0.2) Implementation: REMD with Multiscale SSA. To simulate the process defined by (25) and used in the numerical examples presented next. To benchmark Eqs. 4 and 5 at large ν, that is, in a regime where the evolution of slow the method on a simple example, we also tested REMD-MSSA on a 1D variables X is effectively captured by the limiting equation (Eq. 8), we will use asymmetric double-well potential. These results are reported in Figs. S1–S3 a variant of the HMM (22–24). HMM’s basic idea is to compute on-the-fly the and show that REMD-MSAA combined with HMM permits one to reach the coefficients in the limiting equation (Eq. 8) for the slow variables X,asthey infinite-swap limit and is more efficient than standard REMD. are needed to evolve these variables, via short bursts of simulation of the fast process σ—these simulations are performed in what is called the microsolver Numerical Results in HMM, whereas the routine used to evolve the slow variables is referred to AD in Vacuum. As a first test of REMD-MSSA, we studied blocked as the macrosolver; the scheme is made complete by an estimator specifying AD (ACE-ALA-CBX) in vacuum under the CHARMM force how the data generated in the microsolver are used within the macrosolver field with CMAP corrections (26, 27). The free energy profile (i.e., how the unknown coefficients in the limiting equation for X are being ϕ ψ computed). In the present context, the HMM scheme we will use works along the two dihedral angles and for this system has been mapped (e.g., in ref. 28). We performed tests of REMD-MSSA as follows. ν ChooseatimestepΔt appropriate for the simulation via MD of the using either 4 or 12 replicas, using the values of specified in limiting equation (Eq. 8) and pick a frequency ν such that ν 1=Δt (for simulation details, and running the simulations for 50 ns. We 2 3 compared these results with those from an ISREMD simulation of example ν = 10 =Δt − 10 =Δt). Start the simulation with X0 = Xð0Þ and σ0 = σð0Þ, and set t0 = 0. Then for k ≥ 0, use the following three-step iteration 50 ns with four replicas (in which there are only 24 permutations) algorithm: using the analytic weights in ref. 10—such a calculation with ISREMD is not feasible with 12 replicas because the number of σ d + Δ i) Microsolver: Evolve k via SSA from tk to tk+1 tk t using the rate in permutations ( ≈ 4.8 × 108) becomes too large to manage. Finally, σ = σ = ≥ Eq. 5, that is: Set k;0 k, tk;0 tk, and for l 1, do: we compared the results with those from a standard REMD with −1 (a) Compute the lag to the next reaction via 12 replicas and swapping rate of 0.5 ps . The reconstructed free energy profiles at 300 K are shown in Fig. ln r 1A. The profiles calculated using REMD-MSSA with 4 and 12 τl = − , [13] σ q k;l−1 replicas (Fig. 1A, Left, Upper and Lower) and ISREMD with four wherePr is a random number uniformly picked in the interval ð0,1Þ and replicas (Fig. 1A, Upper Right) are in close agreement. These = X qσ qσ≠σ′ð kÞ; profiles also match those obtained with standard REMD (Fig. 1A, σ′≠σ Lower Right) and they are in close agreement with those estimated σ (b) pick k,l with probability by other enhanced sampling methods under the same force field, including those calculated with 100 ns of and 250-ns X qσ ; − ;σ ; ð Þ accelerated MD (28). In particular, we correctly identified the four = k l 1 k l k pσk;l . [14] qσk;l−1 known local minima [using the same terminology as in Mackerell et al. (27)]: C5 located at ð−154,160Þ,C7eq at ð−81,66Þ,C7ax at = + τ > + Δ (c) Set tk;l tk;l−1 l and repeat till the first L such that tk;L tk t; ð80, − 63Þ,andαR at ð−99,12Þ. σ = σ τ = + Δ − then set k+1 k;L and reset L tk t tk;L−1. To assess the sampling efficiency of our scheme, we first looked at the times the temperatures of each replicas take to make a ii) Estimator: Given the trajectory of σ, estimate η ðX Þ and R ðX Þ via i,j k j k round trip between their highest and lowest values; equivalently, this amounts to monitoring the values of σði, tÞ versus time for Z +Δ tk t each replica index i. It has been argued (16, 29–31) that the ^η X = 1 i,j ð kÞ Δ 1j=σði;sÞds shorter these round-trip times, the faster the scheme will converge. t tk [15] A similar conclusion was reached in ref. 17, using a quantity 1 XL = 1 =σ τ ; closely related to round-trip times: the lifetime, which gives the Δt j k;l ðiÞ l l=1 average time a temperature stays at a given value before being and updated to another. In the context of ISREMD, the round-trip times (and the lifetime) should be zero because this method

Z +Δ operates in the infinite-swap limit: This means that we can use this X β tk t ^ −1 i R ðX Þ = β 1 =σ ds test to quantify how close REMD-MSSA is from this limit. In Fig. j k Δt j ði, sÞ i tk 1B we show a typical trajectory σði, tÞ from REMD-MSSA and one X [16] −1 = β−1 β ^η X from standard REMD (with swap rate at 0.5 ps ) for the case of i i;j ð kÞ. i 12 replicas (the full set of trajectories can be found in Fig. S4). It can be seen that the round-trip times are indeed much shorter in iii) Macrosolver: Evolve Xk to Xk+1 using one time-step of size Δt in the X ^ X REMD-MSSA than in standard REMD. Similarly, the lifetime in MD integrator for Eq. 4 with Rj ð kÞ replaced by the factor Rj ð kÞ calcu- REMD-SSA is roughly 10 as =10−17 s compared with 102 ps lated in the estimator. Then repeat the three steps above for each time =10−10s for standard REMD. Another more stringent measure of step k. efficiency is the flatness of the temperature distributions at each σ We will refer to this scheme as REMD with multiscale SSA, or REMD-MSSA replica, or equivalently that of ði, tÞ for each replica index i.In- for short. It gives an accurate approximation of the solution to the limiting deed, it is known (30, 31) that these distributions should be uni- ^ σ equation (Eq. 8)ifν 1=Δt. Indeed, in this regime the sampled Rj ðXÞ and form at equilibrium [i.e., for each i ði, tÞ should spend the same ^η X i,jð Þ will be close to their averaged values in Eqs. 9 and 12. We also note proportion of time at every index value j]. Unlike the round-trip that a different implementation of the microsolver, based on a Metropolis– times, which can be short even if the replicas have not equilibrated Hastings algorithm can also be used; this implementation is discussed in yet, these distributions can only be flat if the replicas are themselves

11746 | www.pnas.org/cgi/doi/10.1073/pnas.1605089113 Yu et al. Downloaded by guest on October 1, 2021 A simulation. These traces indicate that our REMD-MSSA simula- tion experienced several folding/unfolding events within 50 ns. This is a significant speed-up compared with a bare MD simulation, in which folding event are expected to take place every 500 ns to 1 μs (38). We used the REMD-MSSA simulation data to generate FES along the two order parameters, β-strand H-bonds NH and back- bone radius of gyration RG, from a 100-ns REMD-MSSA run (Fig. 2A and Fig. S7). This FES captures the main features present in FES obtained from previous longer REMD simulations (32, 34, 39, 40). Specifically, we observe basins corresponding to conformations that form no β-strand H-bonds, form one or two H-bonds as partially folded β-sheet, and fully β-sheet with more than three H-bonds. We also compare this FES with the one from 120-ns standard REMD starting from the same initial structures (Fig. 2B). It can be seen that the folded basins are populated in REMD- MSSA whereas only a few samples are seen in standard REMD. The FES from REMD-MSSA is in better agreement with previous longer simulation studies, indicating that REMD-MSSA can give a more converged FES than standard REMD within a 100-ns run due to higher sampling efficiency. BC To test convergence, in Fig. 2 C and D we show the trajectories and the distributions of σði, tÞ for one representative replica, for both REMD-MSSA and standard REMD (more representative trajectories and distributions are shown in Figs. S8 and S9). The round-trip times and the lifetime observed in REMD-MSSA are − again shorter than those in REMD with swap rate 1 ps 1,with values similar to those reported for AD, and the temperature dis- tributions are also a significantly flatter in REMD-MSSA than in REMD. To measure how flat these distributions are across all of the replicas,P we calculated the relative entropy of each, that is, = = σ SðpiÞ jpiðjÞlogpiðjÞ pref ðjÞ,wherepiðjÞ is the distribution ði, tÞ and pref ðjÞ = 1=60 is the target uniform distribution: If piðjÞ = pref ðjÞ, SðpiÞ = 0, otherwise it is positive, and the higher the value of Fig. 1. (A) Free energy surfaces along the two dihedral angles ϕ and ψ for = AD in vacuum at 300 K. (Upper Left) Fifty-nanosecond REMD-MSSA simu- SðpiÞ 0, the less flat piðjÞ is. These relative entropies are shown lation with four replicas. (Upper Right) Fifty-nanosecond ISREMD simula- after ordering from smallest to largest in Fig. 2E: They are signifi- tion with four replicas using the analytical weights calculated from Eq. 10. cantly smaller for REMD-MSSA than for standard REMD, in- (Lower Left) Fifty-nanosecond REMD-MSSA simulation with 12 replicas. dicative of faster mixing by the former than by the latter. (Lower Right) Fifty-nanosecond standard REMD simulation with 12 replicas Remarkably, careful investigation of the FES in Fig. 2 leads us − and a swapping rate of 0.5 ps 1. All plots have 60 × 60 bins and filtered by to find a folded state that is sampled in REMD-MSSA but not in standard Gaussian kernel. Same level sets are used in the contour plots. standard REMD. For reference, the native state’s structure and (B)Trajectoriesσði, tÞ of one replica. (C) Distributions of σði, tÞ for the same H-bond registry are shown in Fig. 3A; the native state is char- replicaafter0.8nsofsimulation.InB and C, the orange curve is from REMD- acterized by the β-strand H-bond pairs (E14,T1), (T12,T3), and MSSA and the blue one is from standard REMD, both with 12 replicas. (D10,T5). We detect three metastable states whose representative β β structures are depicted in Fig. 3 as insets and labeled as N, 2,and

M. The respective backbone H-bond contact maps of these states CHEMISTRY at equilibrium. A representative distribution from REMD-MSSA are shown in Fig. 3B. From these H-bond registries, we recognize β β and one from standard REMD, both calculated via time averaging N as the native state, whereas 2 represents a distinct, nonnative of σði, tÞ over 0.8 ns of simulations, are shown in Fig. 1C (the full set β-hairpin state that is not sampled in our 100 ns of standard of distributions can also be found in Fig. S5). As can be seen, the REMD, and M a misfolded state. Fig. 3C shows the FES at 270 K β β distribution from REMD-MSSA is almost uniform and significantly in the space of RG and NH, with states N, 2, and M indicated. In flatter than that from standard REMD, indicating that the former this space, the three states are not well-distinguished, indicating has converged after 0.8 ns, whereas the latter has not. that the (RG, NH) space is likely not optimal for understanding the conformational distribution of the folded states. We show in Folding of Protein G β-Hairpin in Explicit Solvent. As a second test we Fig. 3D the FES at 270 K in the space of two new collective BIOPHYSICS AND

considered a β-hairpin peptide in explicit water, whose folding be- CMAP CMAP COMPUTATIONAL BIOLOGY variables, d1 and d2 , respectively, built from all samples in havior resembles that of larger protein, and which has been in- > CMAP the REMD-MSSA at 270 K for which NH 2. Here, d1 vestigated by many techniques, including standard REMD (32– β CMAP β measures the L1 distance from state 2 and d2 from state N 37). Specifically, we studied the C-terminal fragment of the Ig (see Supporting Information for their definition). In this space, the binding domain B1 of protein G [Protein Data Bank (PDB) ID three states are easily distinguishable. To the best of our knowl- β β code 2gb1]. This capped peptide sequence contains 16 residues, edge, the -sheet state we observe here ( 2) has never been de- β 1-Ace-GEWTYDDATKTFTVTE-NMe-16, with 256 atoms. tected in any other simulations of this peptide. To verify that 2 is This β-hairpin is known as a hard-to-fold protein. In our REMD- metastable, we launched a 220-ns standard MD simulation at β MSSA simulations, the initial structures for the replicas are chosen 270 K from a 2 sample. The state remained stable for the duration from an unfolded ensemble that is generated from a high-tem- of that simulation, as can be seen from the driftless traces of several perature MD run. We then run REMD-MSSA and check how fast collective variables that feature the conformational structure folded state emerges and reasonable free energy profiles in various (Fig. 3E and Fig. S10). We also calculated the average potential β β ± order parameters are obtained. In Fig. S6, we show traces of two energy difference between 2 and N to be 1.4 7.7 kcal/mol, order parameters, namely the number of β-strand H-bonds NH which indicates they are energetically indistinguishable. S18 β β (defined in Eq. ) and the rmsd of the Cα s from the native Even though state 2 looks similar to N, its H-bond registry is structure (PDB ID code 2gb1), obtained by monitoring the evo- different, as evidenced by the O–H contact maps in Fig. 3B.Their CMAP CMAP β lution for a few representative replicas in the REMD-MSSA distinction is revealed in the (d1 ,d2 ) space. To access 2 from

Yu et al. PNAS | October 18, 2016 | vol. 113 | no. 42 | 11747 Downloaded by guest on October 1, 2021 A B landscape for the folding thermodynamics of this β-hairpin that has at least two distinct and deep minima and a misfolded kinetic trap. Concluding Remarks We have proposed a multiscale variant of REMD in which the swaps are performed via SSA, which effectively permits one to reach the infinite-swap limit known to optimize sampling efficiency. This REMD-MSSA is as simple to implement and parallelize as standard REMD and can be used with any set of temperatures, including sophisticated choices like those advocated, for example, in ref. 29 that would most likely improve the efficiency even more. It can also be used with parameters other than temperature (Figs. S11 and S12). Here its usefulness and efficiency were benchmarked C D on test cases of various complexity. In particular, in the context of folding the G β-hairpin, our REMD-MSSA method was shown to outperform standard REMD and other enhanced sampling tech- niques, as evidenced by the fact that it allowed us to identify a folded β structure previously undetected by these other methods. These results indicate that REMD-MSSA can be an efficient and accurate alternative to existing enhanced sampling methods and should be useful in a broad variety of MD applications. Simulation Details AD. We used REMD-MSSA runs with temperatures in geo- E metric progression between 300 K and 1,200 K; other, perhaps more efficient, choices of temperatures could be used as well (29), but this one proved sufficient to our purpose. We used a time step of 2 fs for the Langevin dynamics and chose γ = 5ps−1. The bond lengths between hydrogen and heavy atoms were kept constant via M-SHAKE (52). We choose ν to obtain about 104 jumps of σ on average between two consecutive MD steps. This led to using ν = 10000=Δt with 4 replicas and ν = 500=Δt with 12 replicas—ν must be larger with 4 rather than 12 replicas because the energy gaps between the replicas are larger in the former Fig. 2. Free energy surfaces of β-hairpin at 270 K along the two order pa- case, implying that the jump rate in Eq. 5 is smaller. In the rameters β-strand HB number and backbone radius of gyration RG (A) from 120 ns of standard REMD at swapping rate 1 ps−1 and (B) from 100 ns of REMD- MSSA. (C)Trajectoriesofσði, tÞ for a given replica i.(D) Distributions of σði, tÞ for this replica, from a 36-ns-long run. (E) Relative entropy of the distributions of σði, tÞ (in an ascending order). In C–E the orange curve is for REMD-MSSA and the blue one for standard REMD.

β N, the amino-terminal strand must flip 180° so that its outside O and H can point inside to form a hydrogen bond with H and O on the carboxyl-terminal strand (C-strand). This implies that either state must unfold first to break all H-bonds to get a chance to fold into the β other. We indeed observe that 2 is populated predominantly by β transitions from the unfolded pool rather than from the N pool. Another difference between the two folded states is the turn β π structure. N has a large -turn (41) composed of D10, D9, A8, β γ T7, and K6, whereas 2 has a small -turn (42) only involving D9, A8, and T7. Previous studies proposed folding of this β-hairpin uses either a “zipper” mechanism (36, 43–46) where hydrogen bonds form sequentially from the turn, or a “hydrophobic col- lapse” in which the folded state arises from a collapsed globule (32, 33, 35, 39, 47, 48). It is not clear from our simulation results γ β which of these is preferred, but the -turn of 2, being so tight, may be unlikely to form before a few H-bonds first lock in a registry between the two strands. We therefore suggest a more detailed mechanistic study based on string method (49, 50) or transition path sampling (51) in the future. In addition to two folded β-sheet Fig. 3. (A)Nativeβ-hairpin structure with H-bond registry indicated. (B)H–O states, we observed one extremely stable misfolded state, labeled β β M. Its contact map and conformation are shown in Fig. 3. This contact map of for N, 2, and M; in each map, the horizontal axis indicates the residue number of the amide H and the vertical axis the residue number of the misfolded state is stabilized by three β-strand hydrogen bonds and carbonyl O. (C) FES at 270 K in the space of number of β-strand H-bonds NH and two salt bridges between the K6 ammonium group and the car- the backbone radius of gyration RG. (D) FES at 270 K in the space of two con- boxylates on D10 and D9. Neither folded state displays any such tact-map distances for conformations with more than two β-strand H-bonds. side-chain salt bridging, and we therefore suggest that state M The insets in C and D are representative conformations of the native folded serves as a kinetic trap that slows the acquisition of authentic state, β ; the folded state β ; and a misfolded state, M. The FESs in C and D are β N 2 -sheet structure, either in the native or secondary conformation. built from a 100-ns-long run of REMD-MSSA. (E)TracesofNH and RG from a β β The picture that arises in light of these results is of a funnel-like 220-ns-long bare MD simulation, where the blue is for 2 and the green is for N.

11748 | www.pnas.org/cgi/doi/10.1073/pnas.1605089113 Yu et al. Downloaded by guest on October 1, 2021 standard REMD simulation with 12 replicas under same tem- with temperatures ranging from 270 K to 690 K in a geometric perature setting, we use the same MD parameters as above with progression (55)—here, too, other choices of temperatures could − a swapping rate 0.5 ps 1. All MD simulations were carried out be used (29). The initial structures for these replicas were taken with Desmond (v3.4.0.2) (25). from the 4-ns NVT simulation at 700 K. We used a friction co- efficientof5ps−1 in the Langevin thermostat and choose ν = 3=Δt β-Hairpin. We solvated the protein with 1,549 water molecules, + to get about 1,000 temperature swaps per MD step. In the stan- neutralized with three Na ions, resulting in a total system size of dard REMD simulation with 60 replicas under the same tem- × × 3 4,906 atoms and box dimensions of 36.3 36.3 36.3 Å . OPLS- perature setting we use random-neighbor swapping at the rate of AA is the force field for protein (53) and TIP3P for water (54). 1ps−1 and the same MD parameters as above. All MD simulations Long-range electrostatics are handled via particel-mesh Ewald were carried out with Desmond (v3.4.0.2) (25). summation with a 64 × 64 × 64 mesh. The cutoff for short-range nonbonded interactions was set to 12 Å. The integration time ACKNOWLEDGMENTS. This work was supported by National Science Foun- step is 2 fs with the bonds between hydrogen and heavy atoms dation Grant DMR-1207389 (to C.F.A., E.V.-E., and T.-Q.Y.), National Institutes constrained via M-SHAKE (52). The system is equilibrated at of Health Grant GM100472, and National Science Foundation Grant DMS- 270 K and 1 atm via 1-ns NPT Langevin dynamics then a 50-ns 1454939 (to J.L.). T.-Q.Y. acknowledges the technical support from Shenglong NVT simulation. REMD-MSSA simulations have 60 replicas Wang (New York University).

1. Giberti F, Salvalaglio M, Mazzotti M, Parrinello M (2015) Insight into the nucleation of 29. Katzgraber HG, Trebst S, Huse Da, Troyer M (2006) Feedback-optimized parallel tem- urea crystals from the melt. Chem Eng Sci 121:51–59. pering Monte Carlo. J Stat Mech Theory Exp 2006, 10.1088/1742-5468/2006/03/P03018. 2. Dror RO, et al. (2013) Structural basis for modulation of a G-protein-coupled receptor 30. Doll JD, Plattner N, Freeman DL, Liu Y, Dupuis P (2012) Rare-event sampling: Occu- by allosteric drugs. Nature 503(7475):295–299. pation-based performance measures for parallel tempering and infinite swapping 3. Yu T-Q, Lapelosa M, Vanden-Eijnden E, Abrams CF (2015) Full kinetics of CO entry, Monte Carlo methods. J Chem Phys 137(20):204112. internal diffusion, and exit in myoglobin from transition-path theory simulations. 31. Doll JD, Dupuis P (2015) On performance measures for infinite swapping Monte Carlo J Am Chem Soc 137(8):3041–3050. methods. J Chem Phys 142(2):024111. 4. Freddolino PL, Harrison CB, Liu Y, Schulten K (2010) Challenges in protein folding 32. Zhou R, Berne BJ, Germain R (2001) The free energy landscape for beta hairpin simulations: Timescale, representation, and analysis. Nat Phys 6(10):751–758. folding in explicit water. Proc Natl Acad Sci USA 98(26):14931–14936. 5. Lindorff-Larsen K, et al. (2010) Improved side-chain torsion potentials for the Amber 33. Nguyen PH, Stock G, Mittag E, Hu CK, Li MS (2005) Free energy landscape and folding ff99SB protein force field. Proteins 78(8):1950–1958. mechanism of a beta-hairpin in explicit water: A replica exchange molecular dynamics 6. Robertson MJ, Tirado-Rives J, Jorgensen WL (2015) Improved peptide and protein tor- study. Proteins 61(4):795–808. sional energetics with the OPLSAA force field. J Chem Theory Comput 11(7):3499–3509. 34. Bussi G, Gervasio FL, Laio A, Parrinello M (2006) Free-energy landscape for beta 7. Lopes P, Guvench O, MacKerell J, Alexander D (2015) Current status of protein force hairpin folding from combined parallel tempering and metadynamics. J Am Chem Soc – fields for molecular dynamics simulations. Molecular Modeling of Proteins, Methods 128(41):13435 13441. in Molecular Biology, ed Kukol A (Springer, New York), Vol 1215, pp 47–71. 35. Best RB, Mittal J (2010) Balance between alpha and beta structures in ab initio protein – 8. Shaw DE, et al. (2010) Atomic-level characterization of the structural dynamics of folding. J Phys Chem B 114(26):8790 8798. proteins. Science 330(6002):341–346. 36. Best RB, Mittal J (2011) Free-energy landscape of the GB1 hairpin in all-atom explicit 9. Stone JE, et al. (2007) Accelerating molecular modeling applications with graphics solvent simulations with different force fields: Similarities and differences. Proteins – processors. J Comput Chem 28(16):2618–2640. 79(4):1318 1328. 10. Voelz VA, Bowman GR, Beauchamp K, Pande VS (2010) Molecular simulation of ab initio 37. Hatch HW, Stillinger FH, Debenedetti PG (2014) Computational study of the stability β protein folding for a millisecond folder NTL9(1-39). JAmChemSoc132(5):1526–1528. of the miniprotein trp-cage, the GB1 -hairpin, and the AK16 peptide, under negative – 11. Holdren JP, et al. (2010) Designing a digital future: Federally funded research and pressure. J Phys Chem B 118(28):7761 7769. 38. Thukral L, Smith JC, Daidone I (2009) Common folding mechanism of a beta-hairpin development in networking and information technology (The President’s Council of peptide via non-native turn formation revealed by unbiased molecular dynamics Advisors on Science and Technology, Washington, DC). simulations. J Am Chem Soc 131(50):18147–18152. 12. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for pro- 39. Bonomi M, Branduardi D, Gervasio FL, Parrinello M (2008) The unfolded ensemble tein folding. Chem Phys Lett 314(1–2):141–151. and folding mechanism of the C-terminal GB1 beta-hairpin. J Am Chem Soc 130(42): 13. Sindhikara D, Meng Y, Roitberg AE (2008) Exchange frequency in replica exchange 13938–13944. molecular dynamics. J Chem Phys 128(2):024103. 40. Ardevol A, Tribello GA, Ceriotti M, Parrinello M (2015) Probing the unfolded con- 14. Abraham MJ, Gready JE (2008) Ensuring mixing efficiency of replica-exchange mo- figurations of a β-hairpin using sketch-map. J Chem Theory Comput 11(3):1086–1093. lecular dynamics simulations. J Chem Theory Comput 4(7):1119–1128. 41. Rajashankar KR, Ramakumar S (1996) Pi-turns in proteins and peptides: Classification,

15. Rosta E, Hummer G (2009) Error and efficiency of replica exchange molecular dy- CHEMISTRY conformation, occurrence, hydration and sequence. Protein Sci 5(5):932–946. namics simulations. J Chem Phys 131(16):165102. 42. Némethy G, Printz MP (1972) The γ turn, a possible folded conformation of the 16. Sindhikara DJ, Emerson DJ, Roitberg AE (2010) Exchange often and properly in replica polypeptide chain. Comparison with the β turn. Macromolecules 5:755–758. exchange molecular dynamics. J Chem Theory Comput 6(9):2804–2808. 43. Muñoz V, Thompson PA, Hofrichter J, Eaton WA (1997) Folding dynamics and 17. Zhang BW, et al. (2016) Simulating replica exchange: Markov state models, proposal mechanism of beta-hairpin formation. Nature 390(6656):196–199. schemes, and the infinite swapping limit. J Phys Chem B. 44. Muñoz V, Henry ER, Hofrichter J, Eaton WA (1998) A statistical mechanical model for 18. Dupuis P, Liu Y, Plattner N, Doll JD (2012) On the infinite swapping limit for parallel beta-hairpin kinetics. Proc Natl Acad Sci USA 95(11):5872–5879. tempering. SIAM J Multiscale Model Simul 10:986–1022. 45. Du D, Zhu Y, Huang CY, Gai F (2004) Understanding the key factors that control the 19. Plattner N, et al. (2011) An infinite swapping approach to the rare-event sampling rate of beta-hairpin folding. Proc Natl Acad Sci USA 101(45):15915–15920. problem. J Chem Phys 135(13):134111. 46. Du D, Tucker MJ, Gai F (2006) Understanding the mechanism of beta-hairpin folding 20. Lu J, Vanden-Eijnden E (2013) Infinite swapping replica exchange molecular dynamics via phi-value analysis. Biochemistry 45(8):2668–2678. BIOPHYSICS AND leads to a simple simulation patch using mixture potentials. JChemPhys138(8):084105. 47. Klimov DK, Thirumalai D (2000) Mechanisms and kinetics of beta-hairpin formation. COMPUTATIONAL BIOLOGY 21. Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Proc Natl Acad Sci USA 97(6):2544–2549. – Chem 81(25):2340 2361. 48. Juraszek J, Bolhuis PG (2009) Effects of a mutation on the folding mechanism of a 22. Weinan E, Engquist WEB (2003) The heterogeneous multiscale methods. Commun beta-hairpin. J Phys Chem B 113(50):16184–16196. – Math Sci 1(1):87 132. 49. Weinan E, Ren W, Vanden-Eijnden E (2002) String method for the study of rare 23. Vanden-Eijnden E (2003) Numerical techniques for multi-scale dynamical systems with events. Phys Rev B 66:52301. – stochastic effects. Commun Math Sci 1(2):385 391. 50. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G (2006) String method in collective 24. Engquist WEB, Li X, Ren W, Vanden-Eijnden E (2007) Heterogeneous multiscale variables: Minimum free energy paths and isocommittor surfaces. J Chem Phys 125(2):24106. methods: A review. Commun Comput Phys 2(3):367–450. 51. Bolhuis PG, Chandler D, Dellago C, Geissler PL (2002) Transition path sampling: Throwing 25. Bowers KJ, et al. (2006) Scalable algorithms for molecular dynamics simulations on ropes over rough mountain passes, in the dark. Annu Rev Phys Chem 53:291–318. commodity clusters. Proceedings of the 2006 ACM/IEEE Conference on Super- 52. Lambrakos S, Boris J, Oran E, Chandrasekhar I, Nagumo M (1989) A modified shake computing (Assoc for Computing Machinery, New York). algorithm for maintaining rigid bonds in molecular dynamics simulations of large 26. MacKerell AD, et al. (1998) All-atom empirical potential for molecular modeling and molecules. J Comput Phys 85(2):473–486. dynamics studies of proteins. J Phys Chem B 102(18):3586–3616. 53. Jorgensen WL, Tirado-Rives J (1988) The OPLS [optimized potentials for liquid simu- 27. Mackerell AD, Jr, Feig M, Brooks CL, 3rd (2004) Extending the treatment of backbone lations] potential functions for proteins, energy minimizations for crystals of cyclic energetics in protein force fields: limitations of gas-phase quantum mechanics in peptides and crambin. J Am Chem Soc 110(6):1657–1666. reproducing protein conformational distributions in molecular dynamics simulations. 54. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison J Comput Chem 25(11):1400–1415. of simple potential functions for simulating liquid water. J Chem Phys 79:926–935. 28. Wang Y, Harrison CB, Schulten K, McCammon JA (2011) Implementation of accelerated 55. Kofke DA (2002) On the acceptance probability of replica-exchange Monte Carlo molecular dynamics in NAMD. Comput Sci Discov 4(1):015002. trials. J Chem Phys 117(15):6911–6914.

Yu et al. PNAS | October 18, 2016 | vol. 113 | no. 42 | 11749 Downloaded by guest on October 1, 2021