Neural networks-based variationally enhanced sampling

Luigi Bonati (a,b,c), Yue-Yu Zhang (b,d), and Michele Parrinello (b,c,d,e,1)

a Department of Physics, ETH Zurich, 8092 Zurich, Switzerland; b Facoltà di Informatica, Instituto di Scienze Computazionali, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland; c National Center for Computational Design and Discovery of Novel Materials (MARVEL), USI, 6900 Lugano, Switzerland; d Department of Chemistry and Applied Biosciences, ETH Zurich, 8092 Zurich, Switzerland; and e Computational Science, Italian Institute of Technology, 16163 Genova, Italy

Contributed by Michele Parrinello, July 9, 2019 (sent for review May 8, 2019; reviewed by Giuseppe Carleo and Jim Pfaendtner)

Sampling complex free-energy surfaces is one of the main challenges of modern atomistic simulation methods. The presence of kinetic bottlenecks in such surfaces often renders a direct approach useless. A popular strategy is to identify a small number of key collective variables and to introduce a bias potential that is able to favor their fluctuations in order to accelerate sampling. Here, we propose to use machine-learning techniques in conjunction with the recent variationally enhanced sampling method [O. Valsson, M. Parrinello, Phys. Rev. Lett. 113, 090601 (2014)] in order to determine such potential. This is achieved by expressing the bias as a neural network. The parameters are determined in a variational learning scheme aimed at minimizing an appropriate functional. This required the development of a more efficient minimization technique. The expressivity of neural networks allows representing rapidly varying free-energy surfaces, removes boundary effects artifacts, and allows several collective variables to be handled.

molecular dynamics | enhanced sampling | deep learning

Significance

Atomistic-based simulations are one of the most widely used tools in contemporary science. However, in the presence of kinetic bottlenecks, their power is severely curtailed. In order to mitigate this problem, many enhanced sampling techniques have been proposed. Here, we show that by combining a variational approach with deep learning, much progress can be made in extending the scope of such simulations. Our development bridges the fields of enhanced sampling and machine learning and allows us to benefit from the rapidly growing advances in this area.

Author contributions: L.B. and M.P. designed research; L.B. performed research; L.B., Y.-Y.Z., and M.P. analyzed data; and L.B. and M.P. wrote the paper.

Reviewers: G.C., Flatiron Institute; and J.P., University of Washington.

The authors declare no conflict of interest.

Published under the PNAS license.

1 To whom correspondence may be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1907975116/-/DCSupplemental.

Published online August 15, 2019. PNAS, September 3, 2019, vol. 116, no. 36, 17641–17647; www.pnas.org/cgi/doi/10.1073/pnas.1907975116.

Machine learning (ML) is changing the way in which modern science is conducted. Atomistic-based computer simulations are no exceptions. Since the work of Behler and Parrinello (1), neural networks (NNs) are now almost routinely used to generate accurate potentials (2, 3) or, more recently, Gaussian processes (4). ML methods have also been used to accelerate sampling, a crucial issue in molecular dynamics (MD) simulations, where standard methods allow only a very restricted range of time scales to be explored. An important family of enhanced sampling methods is based on the identification of suitable collective variables (CVs) that are connected to the slowest relaxation modes of the system (5). Sampling is then enhanced by constructing a suitable external bias potential of the chosen CVs. In this context, ML has been applied in order to identify appropriate CVs (6–10) and to construct new sampling methodologies (11–15). From these early experiences, it is also clear that enhanced sampling can in turn profit new ML applications (16).

Here, we shall focus on a relatively new method, called Variationally Enhanced Sampling (VES), in which the bias is determined by minimizing a functional Ω = Ω[V(s)], which depends on the bias potential V(s) (17). This functional is closely related to a Kullback–Leibler (KL) divergence (18). The bias that minimizes Ω is such that the probability distribution of s in the biased ensemble is equal to a preassigned target distribution p(s). The method has been shown to be flexible (19, 20) and has great potential also for heterodox applications different from enhanced sampling. Examples of these applications are the estimation of the parameters of Ginzburg–Landau free-energy models (18), the calculation of the critical indexes of second-order phase transitions (21), and the sampling of multithermal–multibaric ensembles (22).

Despite its many successes, VES is not without problems. The way in which VES is normally used is to expand the bias V(s) in a linear combination of orthonormal polynomials and to use the expansion coefficients as variational parameters. The choice of the basis set is often a matter of computational expediency and not grounded on physical motivations. Representing sharp free-energy features may require many terms in the basis set expansion. The number of variational parameters scales exponentially with the number of CVs and can become unmanageably large. Finally, nonoptimal CVs may lead to very slow convergence. Although different approaches have been suggested (23, 24), these problems remain a challenge.

In this paper, we use the expressivity (25) of NNs to represent the bias potential and a stochastic steepest descent framework for the determination of the NN parameters. In so doing, we have developed a more efficient stochastic optimization scheme that can be profitably applied also to more conventional VES applications.

Neural Networks-Based VES

Before illustrating our method, we recall some ideas of CV-based enhanced sampling methods and particularly of VES.

Collective Variables. It is often possible to reduce the description of the system to a restricted number of collective variables s(R), functions of the atomic coordinates R, whose fluctuations are critical for the process of interest to occur. We consider the equilibrium probability distribution of these CVs:

P(s) = (1/Z) ∫ dR e^{−βU(R)} δ(s − s(R)), [1]

where Z is the partition function of the system, U(R) its potential energy, and β = (k_B T)^{−1} the inverse temperature. We can define the associated free-energy surface (FES) as the logarithm of this distribution:

F(s) = −(1/β) log P(s). [2]

Then, an external bias is built as a function of the chosen CVs in order to enhance sampling. In umbrella sampling (26), the bias is static, while in metadynamics (27), it is iteratively built as a sum of repulsive Gaussians centered on the points already sampled.
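To make Eq. 2 concrete, the following short sketch estimates a 1D FES from an unbiased CV trajectory by histogramming P(s) and taking its logarithm. This is our illustration, not part of the paper's implementation; the synthetic double-well samples, the bin count, and β = 1 are assumptions.

```python
# Illustrative sketch of Eq. 2: F(s) = -(1/beta) log P(s), with P(s) estimated
# from a histogram of CV samples. The data are synthetic (two Gaussian basins
# standing in for an unbiased double-well trajectory).
import numpy as np

beta = 1.0  # inverse temperature 1/(kB*T), natural units (assumption)

rng = np.random.default_rng(0)
s_traj = np.concatenate([rng.normal(-1.0, 0.3, 50_000),
                         rng.normal(+1.0, 0.3, 50_000)])

hist, edges = np.histogram(s_traj, bins=100, density=True)  # P(s) estimate
centers = 0.5 * (edges[:-1] + edges[1:])                    # CV value per bin

F = np.full_like(hist, np.inf)          # empty bins formally have F = +inf
mask = hist > 0
F[mask] = -np.log(hist[mask]) / beta    # Eq. 2
F -= F[mask].min()                      # shift the global minimum to zero
```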

The Variational Principle. In VES, a functional of the bias potential is introduced:

Ω[V] = (1/β) log [ ∫ ds e^{−β(F(s)+V(s))} / ∫ ds e^{−βF(s)} ] + ∫ ds p(s) V(s), [3]

where p(s) is a chosen target probability distribution. The functional Ω is convex (17), and the bias that minimizes it is related to the free energy by the simple relation:

F(s) = −V(s) − (1/β) log p(s). [4]

At the minimum, the distribution of the CVs in the biased ensemble is equal to the target distribution:

p_V(s) = p(s), [5]

Fig. 1. NN representation of the bias. The inputs are the chosen CVs, whose values are propagated across the network in order to get the bias. The parameters are optimized according to the variational principle of Eq. 3.

where p_V(s) is defined as:

p_V(s) = e^{−β(F(s)+V(s))} / ∫ ds e^{−β(F(s)+V(s))}. [6]

In other words, p(s) is the distribution the CVs will follow when the V(s) that minimizes Ω is taken as bias. This can be seen also from the perspective of the distance between the distribution in the biased ensemble and the target one. The functional can be indeed written as βΩ[V] = D_KL(p ‖ p_V) − D_KL(p ‖ P) (18), where D_KL denotes the KL divergence.

The Target Distribution. In VES, an important role is played by the target distribution p(s). A careful choice of p(s) may focus sampling in relevant regions of the CVs space and, in general, accelerate convergence (28). This freedom has been taken advantage of in the so-called well-tempered VES (19). In this variant, one takes inspiration from well-tempered metadynamics (29) and targets the distribution:

p(s) = e^{−βF(s)/γ} / ∫ ds e^{−βF(s)/γ} ∝ [P(s)]^{1/γ}, [7]

where P(s) is the distribution in the unbiased system and γ > 1 is a parameter that regulates the amplitude of the s fluctuations. This choice of p(s) has proven to be highly efficient (19). Since at the beginning of the simulation F(s) is not known, p(s) is determined in a self-consistent way. Thus, also the target evolves during the optimization procedure. In this work, we update the target distribution at every iteration, although less frequent updates are also possible.

NN Representation of the Bias. The standard practice of VES has been so far to expand V(s) linearly on a set of basis functions and use the expansion coefficients as variational parameters. Here, this procedure is circumvented, as the bias is expressed as a deep NN, as shown in Fig. 1. We call this variant DEEP-VES. The inputs of the network are the chosen CVs, and this information is propagated to the next layers through a linear combination followed by the application of a nonlinear function a (25):

x^{l+1} = a(w^{l+1} x^l + b^{l+1}). [8]

Here, the nonlinear activation function is taken to be a rectified linear unit. In the last layer, only a linear combination is done, and the output of the network is the bias potential.

We are employing NNs since they are smooth interpolators. Indeed, the NN representation ensures that the bias is continuous and differentiable. The external force acting on the ith atom can then be recovered as:

F_i = −∇_{R_i} V = − Σ_{j=1}^{n} (∂V/∂s_j) ∇_{R_i} s_j, [9]

where the first term is efficiently computed via back-propagation. The coefficients {w^l} and {b^l}, which we lump into a single vector w, will be our variational coefficients. With this bias representation, the functional Ω[V] becomes a function of the parameters w. Care must be taken to preserve the symmetry of the CVs, such as the periodicity. In order to accelerate convergence, we also standardize the input to have mean zero and variance one (30).

The Optimization Scheme. As in all NN applications, training plays a crucial role. The functional Ω can be considered a scalar loss function with respect to the set of parameters w. We shall evolve the parameters following the direction of the Ω derivatives:

∂Ω/∂w = −⟨∂V/∂w⟩_V + ⟨∂V/∂w⟩_p, [10]

where the first average is performed over the system biased by V(s) and the second over the target distribution p(s). It is worth noting that the form of the gradients is analogous to unsupervised schemes in energy-based models, where the network is optimized to sample from a desired distribution (31).

At every iteration, the sampled configurations are used to compute the first term of the gradient. The second term, which involves an average over the target distribution, can be computed numerically when the number of CVs is small or using a Monte Carlo scheme when the number of CVs is large. In doing so, the exponential growth of the computational cost with respect to the number of CVs can be circumvented.
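The construction of Eqs. 8–10 can be summarized in a few lines of PyTorch. The sketch below is our illustration, not the authors' PLUMED/LibTorch implementation; the layer widths follow the [48,24,12] architecture used later, while the function names, tensor shapes, and the quadrature-grid treatment of the target average are assumptions.

```python
# A minimal sketch of the DEEP-VES ingredients: a ReLU network for V(s)
# (Eq. 8), CV-space derivatives by back-propagation (the -dV/ds_j factor of
# Eq. 9), and one stochastic update along the gradient of Eq. 10.
import torch

class BiasNet(torch.nn.Module):
    """Feed-forward bias V(s): linear layers + ReLU, last layer purely linear."""
    def __init__(self, n_cvs, hidden=(48, 24, 12)):
        super().__init__()
        layers, n_in = [], n_cvs
        for n_out in hidden:
            layers += [torch.nn.Linear(n_in, n_out), torch.nn.ReLU()]
            n_in = n_out
        layers.append(torch.nn.Linear(n_in, 1))   # output: the bias potential
        self.net = torch.nn.Sequential(*layers)

    def forward(self, s):                  # s: (n_samples, n_cvs)
        return self.net(s).squeeze(-1)     # V:  (n_samples,)

def bias_and_cv_force(model, s):
    """V(s) and -dV/ds via back-propagation; an MD engine would contract
    -dV/ds_j with the CV gradients grad_Ri s_j to get atomic forces (Eq. 9)."""
    s = s.clone().requires_grad_(True)
    V = model(s)
    (dV_ds,) = torch.autograd.grad(V.sum(), s)
    return V.detach(), -dV_ds

def omega_gradient_step(model, opt, s_biased, s_grid, w_grid):
    """Eq. 10: minus the average of dV/dw over configurations sampled under
    the current bias, plus the average over the target, here a quadrature sum
    with weights w_grid ~ p(s_k)*ds normalized to 1."""
    loss = -model(s_biased).mean() + (w_grid * model(s_grid)).sum()
    opt.zero_grad()
    loss.backward()     # autograd yields exactly the two terms of Eq. 10
    opt.step()
```

In this sketch, `opt` would be, e.g., `torch.optim.Adam(model.parameters(), lr=1e-3)`, consistent with the ADAM optimizer and learning rate quoted in Materials and Methods; for periodic CVs, `s` would hold the sine/cosine encoding discussed below.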

This scheme allows the bias to be optimized on-the-fly, without the need to stop the simulation and fit the network to a given dataset. The approach used here resembles a reinforcement learning algorithm: the bias potential acts as a policy that drives the system to sample the target distribution in the CVs space. In this respect, it is akin to a type of variational application of neural networks used in quantum physics (32).

Since the calculation of the gradients requires performing statistical averages, a stochastic optimization method is necessary. In standard VES applications, the method of choice has been so far the one of Bach and Moulines (33), which allows reaching the minimum with high accuracy, provided that the CVs are of good quality. Applying the same algorithm to NNs is rather complex. Therefore, we espouse the attitude prevailing in the NN community. Namely, we do not aim at determining the bias at the minimum; rather, we aim at reaching a value of the potential close enough to the optimum to be useful. In the present context, this means that a run biased by a V_s(s) that is close to the minimizing potential, and that can be indeed regarded as a stochastic estimate of it, can be used to compute Boltzmann equilibrium averages from the well-known umbrella sampling-like formula:

⟨O(R)⟩ = ⟨O(R) e^{βV(s(R))}⟩_V / ⟨e^{βV(s(R))}⟩_V, [11]

where ⟨·⟩_V denotes an average in the biased ensemble.

In order to measure the progression of the minimization toward the condition of Eq. 5, we monitor at iteration step n an approximate KL divergence between the running estimates of the biased probability p_V^(n)(s) and of the target p^(n)(s):

D_KL^(n) = Σ_s p_V^(n)(s) log [ p_V^(n)(s) / p^(n)(s) ]. [12]

These quantities are estimated as exponentially decaying averages, so that only a limited number of configurations contributes to the running KL divergence. The simulation is thus divided into 3 parts. In the first one, the ADAM optimizer (34) is used with a constant learning rate until a preassigned threshold value ε is reached. From then on, the learning rate is exponentially brought to zero, provided that D_KL^(n) remains below ε. Once the learning rate has become negligible, the network is no longer updated, and statistics is accumulated using Eq. 11.

Fig. 2. Results for the 2D model. (A) Potential energy surface of the model. (B, Upper) Evolution of the CV as a function of the number of iterations; the points are colored according to their energy value. (B, Lower) Evolution of the KL divergence between the bias and the target distribution (green) and of the learning rate scaling factor (gray). When the KL divergence is lowered below the threshold value, the learning rate is exponentially decreased until it is practically zero. (C) Free-energy profiles obtained from the NN and with the reweighting procedure, compared with the reference obtained by integrating the model. For the DEEP-VES curve, an estimate of the error, given by averaging the results of 8 different simulations, is also shown; the lack of accuracy of the static bias on the left shoulder, where the rapidly varying potential creates new pathways other than the minimum-energy one, is a consequence of the suboptimal character of the chosen variable, and these artifacts are practically removed by performing the reweight. (D) RMSE on the FES computed in the regions of energy up to 10 k_B T from the reference minimum.
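The running monitor of Eq. 12 and the learning-rate gating can be sketched as follows. This is our reading of the protocol, not the PLUMED implementation; the blending constant, bin handling, and default values mirror the parameters reported for the 2D example below.

```python
# Sketch of the convergence monitor: exponentially decaying histogram
# averages for p_V(s) and p(s), their KL divergence (Eq. 12), and the gated
# decay of the learning rate. Names and structure are illustrative.
import numpy as np

class KLMonitor:
    def __init__(self, n_bins, time_constant=5e4):
        self.p_V = np.full(n_bins, 1.0 / n_bins)  # running biased distribution
        self.p_t = np.full(n_bins, 1.0 / n_bins)  # running target distribution
        self.a = 1.0 / time_constant              # decay of the running average

    def update(self, hist_biased, hist_target):
        """Blend fresh per-iteration histograms into the running averages and
        return the current approximate divergence D_KL^(n) of Eq. 12."""
        self.p_V = (1 - self.a) * self.p_V + self.a * hist_biased / hist_biased.sum()
        self.p_t = (1 - self.a) * self.p_t + self.a * hist_target / hist_target.sum()
        m = (self.p_V > 0) & (self.p_t > 0)
        return float(np.sum(self.p_V[m] * np.log(self.p_V[m] / self.p_t[m])))

def next_learning_rate(lr, d_kl, eps=0.5, decay_time=5e3):
    """Phase 1: constant rate while D_KL >= eps. Phase 2: exponential decay,
    continued only as long as the running D_KL stays below the threshold."""
    return lr * np.exp(-1.0 / decay_time) if d_kl < eps else lr
```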

Fig. 3. Alanine dipeptide free-energy results. (A) DEEP-VES representation of the bias. (B) FES profile obtained with reweighting. (C) Reference from a 100-ns metadynamics simulation. The main features of the FES are captured by the NN, which allows for an efficient enhanced sampling of the conformations space. Finer details can be easily recovered with the reweighting procedure.

The longer the second part, the better the bias potential V_s, and the shorter phase 3 needs to be. Faster decay times instead need to be followed by longer statistics accumulation runs. However, in judging the relative merits of these strategies, it must be taken into account that the third phase, in which the bias is kept constant, involves a minor number of operations, and it is therefore much faster.

Results

The Wolfe–Quapp Potential. We first focus on a toy model, namely the 2D Wolfe–Quapp potential, rotated as in ref. 24. This is shown in Fig. 2A. The reason behind this choice is that the dominant fluctuations that lead from one state to the other are in an oblique direction with respect to x and y. We choose on purpose to use x only as a CV in order to exemplify the case of a suboptimal CV. This is representative of what happens in practice when, more often than not, some slow degrees of freedom are not fully accounted for (24).

We use a 3-layer NN with [48,24,12] nodes, resulting in 1,585 variational parameters, which are updated every 500 steps. The KL divergence between the biased and the target distribution is computed with an exponentially decaying average with a time constant of 5·10^4 iterations. The threshold is set to ε = 0.5 and the learning-rate decay time to 5·10^3 iterations. The results are robust with respect to the choice of these parameters (SI Appendix).

In the upper right panel (Fig. 2B), we show the evolution of the CV as a function of the number of iterations. At the beginning, the bias changes very rapidly, with large fluctuations that help to explore the configuration space. It should be noted that the combination of a suboptimal CV and a rapidly varying potential might lead the system to pathways different from the lowest-energy one. For this reason, it is important to slow down the optimization in the second phase, where the learning rate is exponentially lowered until it is practically zero. When this limit is reached, the frequency of transitions becomes lower, reflecting the suboptimal character of the CV (24), as can be seen in Fig. 2B. Although the bias itself is not yet fully converged, still the main features of the FES are captured by V_s(s) (Fig. 2C). This is ensured by the fact that while decreasing the learning rate, the running estimate of the KL divergence must stay below the threshold ε; otherwise, the optimization proceeds with a constant learning rate. This means that the final potential is able to promote efficiently transitions between the metastable states. The successive static reweighting refines the FES and leads to a very accurate result (Fig. 2D), removing also the artifacts caused in the first part by the rapidly varying potential.

Alanine Dipeptide and Tetrapeptide. As a second example, we consider the case of 2 small peptides, alanine dipeptide (Fig. 4) and alanine tetrapeptide (Fig. 5) in vacuum, which are often used as a benchmark for enhanced sampling methods. We will refer to them as Ala2 and Ala4, respectively. Their conformations can be described in terms of the Ramachandran angles φ_i and ψ_i, where the first ones are connected to the slowest kinetic processes. The smaller Ala2 has only 1 pair of such dihedral angles, which we will denote as {φ, ψ}, while Ala4 has 3 pairs of backbone angles denoted by {φ_i, ψ_i} with i = 1, 2, 3.

Fig. 4. Alanine dipeptide.

We want to show here the usefulness of the flexibility provided by DEEP-VES in different systems. For this purpose, we use the same architecture and optimization scheme as in the previous example. We only decrease the decay time of the learning rate in the second phase, since the dihedral angles {φ, ψ} are known to be a good set of CVs. In order to enforce the periodicity of the bias potential, the angles are first transformed into their sines and cosines: {φ, ψ} → {cos φ, sin φ, cos ψ, sin ψ}.

The simulation of Ala2 reached the KL divergence threshold after 3 ns, and from there on, the learning rate was exponentially decreased. The NN bias was no longer updated after 12 ns. At variance with the first example, when the learning is slowed down, and even when it is stopped, the transition rate is not affected: this is a signal that our set of CVs is good (SI Appendix). In Fig. 3, we show the free-energy profiles obtained from the NN bias following Eq. 4 and the one recovered with the reweighting procedure of Eq. 11. We compute the root-mean-square error (RMSE) on the FES up to 20 kJ/mol, as in ref. 19. The errors are 1.5 and 0.45 kJ/mol, corresponding to 0.6 and 0.2 k_B T, both well below the threshold of chemical accuracy.

In order to exemplify the ability of DEEP-VES to represent functions of several variables, we study the FES of Ala4 in terms of its 6 dihedral angles {φ1, ψ1, φ2, ψ2, φ3, ψ3} (Fig. 6). In the case of a small number of variables, the second integral on the right-hand side of Eq. 10 is calculated numerically on a grid. This is clearly not possible in a 6D space; thus, we resort to a Monte Carlo estimation based on the Metropolis algorithm (35). Convergence is helped by the fact that we use the well-tempered target distribution, which is broadened by the factor γ (Eq. 7). In this case, we cannot use Eq. 12 to monitor convergence. However, as the number of treated CVs increases, the probability that the space spanned by the CVs covers all of the relevant slow modes is very high, and less attention needs to be devoted to the learning-rate schedule. The transition from the bias optimization phase to the static bias regime can be initiated after multiple recrossings between the metastable states are observed. In order to verify the accuracy of these results, we compared the free-energy profiles obtained from the reweight with the results from metadynamics and its Parallel Bias version (36) (SI Appendix).
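The static reweighting used in these comparisons (and for Figs. 2D and 3B) is a direct application of Eq. 11 with the observable chosen as a binned indicator of s. Below is a minimal sketch, with illustrative names and binning, assuming a trajectory run under a fixed bias:

```python
# Sketch of Eq. 11 used for FES recovery: each frame of the statically biased
# run carries a weight exp(+beta*V(s)), which undoes the effect of the bias.
import numpy as np

def fes_from_reweighting(s_traj, V_traj, beta=1.0, bins=100):
    """s_traj: CV values along the run; V_traj: bias values at those frames."""
    w = np.exp(beta * (V_traj - V_traj.max()))   # max-shift for numerical safety
    hist, edges = np.histogram(s_traj, bins=bins, weights=w, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    F = np.full_like(hist, np.inf)
    m = hist > 0
    F[m] = -np.log(hist[m]) / beta               # Eq. 2 on the unbiased estimate
    return centers, F - F[m].min()
```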

Fig. 5. Alanine tetrapeptide.

Fig. 6. Two-dimensional FESs of Ala4 obtained with the reweighting, as a function of the 3 dihedral angles {φ1, φ2, φ3}. The FES in terms of the other pairs, as well as the comparison with the references, is reported in SI Appendix.

Silicon Crystallization. In the last example, the phase transition of silicon from liquid to solid at ambient pressure is studied. This is a complex phenomenon, characterized by a high barrier of around 80 k_B T, and several CVs have been proposed to enhance this process. Crystallization and melting are marked by a local character; as a consequence, many CVs are defined in terms of per-atom crystallinity measures. Then, the number of solid-like atoms is used as CV. In terms of these variables, the free-energy profile is characterized by very narrow minima around 0 and N_atoms, and in order to enhance the fluctuations in such a space in an efficient way, one often needs to employ high-order basis sets or adaptive variants of metadynamics (37) or of VES. This can be easily addressed using the flexibility of NNs, which we use here to represent the rapidly varying features of the free-energy profile and to avoid basis set expansion artifacts at the boundaries. In Fig. 7, we show the free-energy profile learned by the NN as well as the one recovered by reweighting. Even in the presence of very sharp boundaries, the NN is able to learn a very good bias V_s, which allows for going back and forth between the 2 states and for efficiently recovering a good estimate of the FES in the subsequent reweight. The KL divergence threshold is reached in 10 ns, and the bias is no longer updated after 30 ns.

Fig. 7. FES of silicon crystallization, in terms of the number of cubic diamond atoms in the system. Snapshots of the 2 minima, as well as the transition state, are also shown. The reweight is obtained using V_s.

Choice of the Parameters. After presenting the results, we would like to spend a few words on the choice of the parameters. This procedure requires setting 3 parameters, namely the time scale on which the KL divergence is calculated, the threshold ε, and the decay time for the learning rate. Our experience is that if the chosen set of CVs is good, this choice has a limited impact on the result. However, in the case of nonoptimal CVs, more attention is needed. The KL divergence should be calculated on a time scale in which the system samples the target distribution, so the greater the relaxation time of the neglected variables, the greater this time scale should be. A similar argument applies to the decay constant of the learning rate; since this has a limited impact on convergence, it is possible to choose it in the spirit of reaching the constant potential quickly, keeping it low (in the order of thousands of iterations). Finally, we found that the protocol is robust with respect to the epsilon parameter of the KL divergence threshold, provided that it is chosen in a range of values around 0.5. In the case of FES with larger dimensionality, it may be appropriate to increase this parameter in order to reach rapidly a good estimate of V_s. In the supplementary information, we report a study of the influence of these parameters on the accuracy of the bias learned and the time needed to converge for the Wolfe–Quapp potential.

Conclusions

In this work, we have shown how the flexibility of NNs can be used to represent the bias potential and the FES as well, in the context of the VES method. Using the same architecture and similar optimization parameters, we were able to deal with different physical systems and FES dimensionalities. This includes also the case in which some important degree of freedom is left out from the set of enhanced CVs. The case of alanine tetrapeptide with a 6D bias already shows the capability of DEEP-VES of dealing with high-dimensional FES. We plan to extend this to even higher dimensional landscapes, where the power of NNs can be fully exploited.

Our work is an example of a variational learning scheme, where the NN is optimized following a variational principle. Also, the target distribution allows an efficient sampling of the relevant configurational space, which is particularly important in the optics of sampling high-dimensional FES.

In the process, we have developed a minimization procedure alternative to that of ref. 17, which globally tempers the bias potential based on a KL divergence between the current and the target probability distribution. We think that conventional VES can also benefit from our approach and that we have made another step in the process of sampling complex free-energy landscapes. Future goals might include developing a more efficient scheme to exploit the variational principle in the context of NNs, as well as learning not only the bias but also the CVs on-the-fly. This work also allows tapping into the immense literature on ML and NNs for the purpose of improving enhanced sampling.

Materials and Methods

The VES-NN is implemented on a modified version of PLUMED2 (38), linked against LibTorch (the PyTorch C++ library). All data and input files required to reproduce the results reported in this paper are available on PLUMED-NEST (www.plumed-nest.org), the public repository of the PLUMED consortium (39), as plumID:19.060 (40). We plan to release the code also in the open-source PLUMED package. We take care of the construction and the optimization of the NN with LibTorch. The gradients of the functional with respect to the parameters are computed inside PLUMED according to Eq. 10. The first expectation value is computed by sampling, while the second is obtained by numerical integration over a grid or with Monte Carlo techniques. In the following, we report the setup for the VES-NN and the simulations for all of the examples reported in the paper.
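As an illustration of the Monte Carlo route for the target-distribution expectation value (the one used for the 6D Ala4 target), here is a minimal Metropolis (35) sketch. It is an assumption-laden toy, not the PLUMED code: log_p would come from the well-tempered target of Eq. 7 built on the current free-energy estimate, and the wrapping of periodic dihedrals is omitted.

```python
# Sketch of Metropolis sampling of the target p(s) in CV space, used when a
# quadrature grid is impractical. Symmetric Gaussian proposals are assumed.
import numpy as np

def metropolis_target_samples(log_p, s0, n_samples, step=0.3, rng=None):
    """log_p: callable returning log p(s) up to an additive constant."""
    if rng is None:
        rng = np.random.default_rng()
    s = np.asarray(s0, dtype=float).copy()
    logp = log_p(s)
    out = np.empty((n_samples, s.size))
    for i in range(n_samples):
        trial = s + step * rng.standard_normal(s.size)
        logp_trial = log_p(trial)
        if np.log(rng.random()) < logp_trial - logp:   # accept/reject
            s, logp = trial, logp_trial
        out[i] = s
    return out
```

For instance, `metropolis_target_samples(log_p, np.zeros(6), 25_000)` would produce the 25,000 target-distribution points per iteration quoted below for Ala4 (with `log_p` and the starting point being placeholders).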

Wolfe–Quapp Potential. The Wolfe–Quapp potential is a fourth-order polynomial: U(x, y) = x⁴ + y⁴ − 2x² − 4y² + xy + 0.3x + 0.1y. We rotated it by an angle θ = −3π/20 in order to change the direction of the path connecting the 2 minima. A Langevin dynamics is run with PLUMED using a timestep of 0.005, a target temperature of 1, and a friction parameter equal to 10 (in terms of natural units). The bias factor for the well-tempered distribution is equal to 10. An NN composed of [48,24,12] nodes is used to represent the bias. The learning rate is equal to 0.001. An optimization step of the NN is performed every 500 timesteps. The running KL divergence is computed on a timescale of 5·10⁴ iterations. The threshold is set to ε = 0.5, and the decay constant for the learning rate is set to 5·10³ iterations. The grid for computing the target distribution integrals has 100 bins. The parameters of the NN are kept the same also for the following examples, unless otherwise stated.

Peptides. For the alanine dipeptide (Ace-Ala-Nme) and tetrapeptide (Ace-Ala3-Nme) simulations, we use GROMACS (41) patched with PLUMED. The peptides are simulated in the canonical ensemble using the Amber99-SB force field (42) with a time step of 2 fs. The target temperature of 300 K is controlled with the velocity rescaling thermostat (43). For Ala2, we use the following parameters: decay constant equal to 10³ iterations and threshold equal to ε = 0.5. A grid of 50 × 50 bins is used. For Ala4, the Metropolis algorithm is used to generate a set of points according to the target distribution. At every iteration, 25,000 points are sampled. The learning rate is kept constant for the first 10 ns and then decreased with a decay time of 2·10³ iterations. In both cases, the bias factor is equal to 10.

Silicon. For the silicon simulations, we use the Large-Scale Atomic/Molecular Massively Parallel Simulator (44) patched with PLUMED, employing the Stillinger and Weber potential (45). A 3 × 3 × 3 supercell (216 atoms) is simulated in the isothermal-isobaric ensemble with a timestep of 2 fs. The temperature of the thermostat (43) is set to 1,700 K with a relaxation time of 100 fs, while the values for the barostat (46) are 1 atm and 1 ps. The CV used is the number of cubic diamond atoms in the system, defined according to ref. 47. The decay time for the learning rate is 2·10³ iterations, and a grid of 216 bins is used. The bias factor used is equal to 100.

ACKNOWLEDGMENTS. This research was supported by the National Centre for Computational Design and Discovery of Novel Materials MARVEL, funded by the Swiss National Science Foundation, and European Union Grant ERC-2014-AdG-670227/VARMET. Calculations were carried out on the Euler cluster at ETH Zurich. We thank Michele Invernizzi for useful discussions and for carefully reading the paper.

1. J. Behler, M. Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
2. J. Behler, Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys. 145, 170901 (2016).
3. L. Zhang, J. Han, H. Wang, R. Car, E. Weinan, Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
4. A. P. Bartók, M. C. Payne, R. Kondor, G. Csányi, Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
5. O. Valsson, P. Tiwary, M. Parrinello, Enhancing important fluctuations: Rare events and metadynamics from a conceptual viewpoint. Annu. Rev. Phys. Chem. 67, 159–184 (2016).
6. W. Chen, A. R. Tan, A. L. Ferguson, Collective variable discovery and enhanced sampling using autoencoders: Innovations in network architecture and error function design. J. Chem. Phys. 149, 072312 (2018).
7. M. Schöberl, N. Zabaras, P.-S. Koutsourelakis, Predictive collective variable discovery with deep Bayesian models. J. Chem. Phys. 150, 024109 (2019).
8. C. X. Hernández, H. K. Wayment-Steele, M. M. Sultan, B. E. Husic, V. S. Pande, Variational encoding of complex dynamics. Phys. Rev. E 97, 062412 (2018).

2187–2194 15, .Ce.Ter Comput. Theory Chem. J. rc al cd c.U.S.A. Sci. Acad. Natl. Proc. MTPes abig,MA, Cambridge, Press, (MIT 8–9 (1977). 187–199 23, 4920 (2017). 2489–2500 13, –0(2018). 1–10 149, 5100– 12, 114, 99, 12, 7 .M igi .Priel,Paedarm rmsnl oeua dynamics molecular single from diagrams Phase Parrinello, M. Piaggi, M. P. 47. rescaling. velocity through sampling Canonical Parrinello, M. Donadio, D. Bussi, G. 43. Hornak V. variationally 42. networks-based Neural Spoel Der 2019, Van D. Parrinello, 41. M. Zhang, L. Bonati, New enhanced in 2: L. reproducibility PLUMED 40. and Bussi, transparency G. Promoting Camilloni, consortium, PLUMED C. The Branduardi, D. 39. Gaussians. Bonomi, adaptive M. with Tribello, Metadynamics A. G. Parrinello, 38. M. Bussi, G. Branduardi, D. 37. land- free-energy high-dimensional of sampling Efficient Bonomi, M. Equation Pfaendtner, Teller, J. E. Teller, 36. H. A. Rosenbluth, N. arXiv:1412.6980 M. optimization. Rosenbluth, W. stochastic A. for Metropolis, method N. A 35. Adam: Ba, with J. approximation Kingma, stochastic P. D. smooth 34. Non-strongly-convex Moulines, E. Bach, F. 33. 6 .J atn,D .Tba,M .Ken osatpesr oeua dynamics molecular pressure Constant Klein, L. M. Tobias, J. D. phases Martyna, condensed in J. order G. local of 46. simulation Computer Weber, A. T. Stillinger, H. F. dynamics. molecular 45. short-range for algorithms parallel Fast Plimpton, S. 44. neural energy- artificial with on problem tutorial many-body quantum “A the Solving Huang, Troyer, M. F. Carleo, G. Ranzato, 32. M. Hadsell, R. Chopra, M S. Robert LeCun, K. Y. Orr, B. 31. G. smoothly Bottou, A L. LeCun, metadynamics: A. Y. Well-tempered 30. Parrinello, M. Bussi, G. Barducci, A. 29. iuain.aXv10.52 (2019). arXiv:1904.05624 simulations. parameters. (2006). backbone 712–725 protein improved of (2005). 1718 2019. August 1 https://www.plumed-nest.org/eggs/ Deposited at 19/060/. Available PLUMED-NEST. sampling. enhanced simulations. molecular bird. old an for feathers metadynamics. bias (2015). parallel with scapes machines. computing fast by (1953). calculations state of (2014). (2013). arXiv:1306.2119 O(1/n). rate convergence algorithms. silicon. of Phys. Phys. Chem. Comput. Theory Chem. networks. 2006). MA, Cambridge, Press, (MIT Eds. Taskar, B. Smola, A. in learning” based Sci. Comput. method. free-energy tunable (2008). and converging –9(1995). 1–19 117, hs e.B Rev. Phys. Science oprsno utpeabrfrefilsaddevelopment and fields force amber multiple of Comparison al., et .Ce.Phys. Chem. J. 40 (2007). 14101 126, –8(2012). 9–48 7700, RMC:Fs,flxbe n free. and flexible, Fast, GROMACS: al., et PNAS 0–0 (2017). 602–606 355, .Bkr .Hfa,B Scholkopf, B. Hofman, T. Bakir, G. Data, Structured Predicting 2257 (1985). 5262–5271 31, a.Methods Nat. 2725 (2012). 2247–2254 8, opt hs Commun. Phys. Comput. | 1748 (1994). 4177–4189 101, etme ,2019 3, September 7–7 (2019). 670–673 16, .Ce.Ter Comput. Theory Chem. J. rtisSrc.Fnt Bioinform. Funct. Struct. Proteins le,Efiin backprop. Efficient uller, ¨ 0–1 (2014). 604–613 185, | hs e.Lett. Rev. Phys. o.116 vol. .Ce.Phys. Chem. J. .Cmu.Chem. Comput. J. | o 36 no. 1087–1092 21, 5062–5067 11, 020603 100, et Notes Lect. .Comput. J. | 1701– 26, 17647 65, J. J.
