Neural Networks-Based Variationally Enhanced Sampling
Luigi Bonati (a,b,c), Yue-Yu Zhang (b,d), and Michele Parrinello (b,c,d,e)

(a) Department of Physics, ETH Zurich, 8092 Zurich, Switzerland; (b) Facoltà di Informatica, Istituto di Scienze Computazionali, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland; (c) National Center for Computational Design and Discovery of Novel Materials (MARVEL), USI, 6900 Lugano, Switzerland; (d) Department of Chemistry and Applied Biosciences, ETH Zurich, 8092 Zurich, Switzerland; (e) Computational Science, Italian Institute of Technology, 16163 Genova, Italy

Contributed by Michele Parrinello, July 9, 2019 (sent for review May 8, 2019; reviewed by Giuseppe Carleo and Jim Pfaendtner)

Sampling complex free-energy surfaces is one of the main challenges of modern atomistic simulation methods. The presence of kinetic bottlenecks in such surfaces often renders a direct approach useless. A popular strategy is to identify a small number of key collective variables and to introduce a bias potential that is able to favor their fluctuations in order to accelerate sampling. Here, we propose to use machine-learning techniques in conjunction with the recent variationally enhanced sampling method [O. Valsson, M. Parrinello, Phys. Rev. Lett. 113, 090601 (2014)] in order to determine such a potential. This is achieved by expressing the bias as a neural network. The parameters are determined in a variational learning scheme aimed at minimizing an appropriate functional. This required the development of a more efficient minimization technique. The expressivity of neural networks allows rapidly varying free-energy surfaces to be represented, removes boundary-effect artifacts, and allows several collective variables to be handled.

molecular dynamics | enhanced sampling | deep learning

Significance

Atomistic-based simulations are one of the most widely used tools in contemporary science. However, in the presence of kinetic bottlenecks, their power is severely curtailed. In order to mitigate this problem, many enhanced sampling techniques have been proposed. Here, we show that by combining a variational approach with deep learning, much progress can be made in extending the scope of such simulations. Our development bridges the fields of enhanced sampling and machine learning and allows us to benefit from the rapidly growing advances in this area.

Machine learning (ML) is changing the way in which modern science is conducted. Atomistic-based computer simulations are no exception. Since the work of Behler and Parrinello (1), neural networks (NNs) (2, 3) or Gaussian processes (4) are now almost routinely used to generate accurate potentials. More recently, ML methods have been used to accelerate sampling, a crucial issue in molecular dynamics (MD) simulations, where standard methods allow only a very restricted range of time scales to be explored. An important family of enhanced sampling methods is based on the identification of suitable collective variables (CVs) that are connected to the slowest relaxation modes of the system (5). Sampling is then enhanced by constructing an external bias potential V(s), which depends on the chosen CVs s. In this context, ML has been applied in order to identify appropriate CVs (6–10) and to construct new methodologies (11–15). From these early experiences, it is also clear that ML applications can in turn profit from enhanced sampling (16).

Here, we shall focus on a relatively new method, called Variationally Enhanced Sampling (VES) (17). In VES, the bias is determined by minimizing a functional Ω = Ω[V(s)]. This functional is closely related to a Kullback–Leibler (KL) divergence (18). The bias that minimizes Ω is such that the probability distribution of s in the biased ensemble, p_V(s), is equal to a preassigned target distribution p(s). The method has been shown to be flexible (19, 20) and has great potential also for applications different from enhanced sampling. Examples of these heterodox applications are the estimation of the parameters of Ginzburg–Landau free-energy models (18), the calculation of critical indexes in second-order phase transitions (21), and the sampling of multithermal–multibaric ensembles (22).

Although different approaches have been suggested (23, 24), the way in which VES is normally used is to expand V(s) in a linear combination of orthonormal polynomials and use the expansion coefficients as variational parameters. Despite its many successes, VES is not without problems. The choice of the basis set is often a matter of computational expediency and not grounded on physical motivations. Representing sharp features in the FES may require many terms in the basis-set expansion. The number of variational parameters scales exponentially with the number of CVs and can become unmanageably large. Finally, nonoptimal CVs may lead to very slow convergence.

In this paper, we use the expressivity (25) of NNs to represent the bias potential and a stochastic steepest descent framework for the determination of the NN parameters. In so doing, we have developed a more efficient stochastic optimization scheme that can also be profitably applied to more conventional VES applications.

Neural Networks-Based VES

Before illustrating our method, we recall some ideas of CV-based enhanced sampling methods and particularly VES.

Collective Variables. It is often possible to reduce the description of the system to a restricted number of CVs s = s(R), functions of the atomic coordinates R, whose fluctuations are critical for the process of interest to occur. We consider the equilibrium probability distribution of these CVs,

P(s) = \frac{1}{Z} \int d\mathbf{R}\, e^{-\beta U(\mathbf{R})}\, \delta(s - s(\mathbf{R})),   [1]

where Z is the partition function of the system, U(R) its potential energy, and β = (k_B T)^{-1} the inverse temperature. We can define the associated free-energy surface (FES) as the logarithm of this distribution:

F(s) = -\frac{1}{\beta} \log P(s).   [2]

Then, an external bias is built as a function of the chosen CVs in order to enhance sampling. In umbrella sampling (26), the bias is static, while in metadynamics (27), it is iteratively built as a sum of repulsive Gaussians centered on the points already sampled.
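As an illustration of Eqs. 1 and 2, the short NumPy sketch below estimates P(s) by histogramming CV values sampled along a trajectory and converts the histogram into a free-energy surface. It is only a sketch, not part of the original paper: the units of the Boltzmann constant, the temperature, the bin count, and the colvar.dat file name are assumptions made for the example.

```python
import numpy as np

kB = 0.0083144621   # Boltzmann constant in kJ/(mol K); assumed unit choice
T = 300.0           # assumed temperature in K
beta = 1.0 / (kB * T)

def fes_from_histogram(s_traj, bins=100):
    """Estimate P(s) on a grid from sampled CV values (Eq. 1, via histogramming)
    and return the free-energy surface F(s) = -(1/beta) log P(s) (Eq. 2)."""
    hist, edges = np.histogram(s_traj, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):   # empty bins give log(0)
        fes = -np.log(hist) / beta
    fes -= fes.min()                     # shift the FES minimum to zero
    return centers, fes

# Hypothetical usage: second column of a colvar file holds the sampled CV values.
# centers, fes = fes_from_histogram(np.loadtxt("colvar.dat")[:, 1])
```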
The Variational Principle. In VES, a functional of the bias potential is introduced:

\Omega[V] = \frac{1}{\beta} \log \frac{\int ds\, e^{-\beta(F(s)+V(s))}}{\int ds\, e^{-\beta F(s)}} + \int ds\, p(s) V(s),   [3]

where p(s) is a chosen target probability distribution. The functional Ω is convex (17), and the bias that minimizes it is related to the free energy by the simple relation

F(s) = -V(s) - \frac{1}{\beta} \log p(s).   [4]

At the minimum, the distribution of the CVs in the biased ensemble is equal to the target distribution,

p_V(s) = p(s),   [5]

where p_V(s) is defined as

p_V(s) = \frac{e^{-\beta(F(s)+V(s))}}{\int ds\, e^{-\beta(F(s)+V(s))}}.   [6]

In other words, p(s) is the distribution the CVs will follow when the V(s) that minimizes Ω is taken as bias. This can be seen also from the perspective of the distance between the distribution in the biased ensemble and the target one. The functional can indeed be written as βΩ[V] = D_KL(p ‖ p_V) − D_KL(p ‖ P) (18), where D_KL denotes the KL divergence.

The Target Distribution. In VES, an important role is played by the target distribution p(s). A careful choice of p(s) may focus sampling in relevant regions of the CV space and, in general, accelerate convergence (28). This freedom has been taken advantage of in the so-called well-tempered VES (19). In this variant, one takes inspiration from well-tempered metadynamics (29) and targets the distribution

p(s) = \frac{e^{-\beta F(s)/\gamma}}{\int ds\, e^{-\beta F(s)/\gamma}} \propto [P(s)]^{1/\gamma}.   [7]

Neural Network Representation of the Bias. The bias is represented by a feed-forward neural network whose inputs are the chosen CVs; their values are propagated across the network in order to obtain the bias (Fig. 1). Here, the nonlinear activation function is taken to be a rectified linear unit. In the last layer, only a linear combination is done, and the output of the network is the bias potential.

Fig. 1. NN representation of the bias. The inputs are the chosen CVs, whose values are propagated across the network in order to get the bias. The parameters are optimized according to the variational principle of Eq. 3.

We are employing NNs since they are smooth interpolators. Indeed, the NN representation ensures that the bias is continuous and differentiable. The external force acting on the ith atom can then be recovered as

\mathbf{F}_i = -\nabla_{\mathbf{R}_i} V = -\sum_{j=1}^{n} \frac{\partial V}{\partial s_j} \nabla_{\mathbf{R}_i} s_j,   [9]

where the first factor, ∂V/∂s_j, is efficiently computed via back-propagation. The network weights {w_i} and biases {b_i}, which we lump into a single vector w, will be our variational coefficients. With this bias representation, the functional Ω[V] becomes a function of the parameters w. Care must be taken to preserve the symmetries of the CVs, such as periodicity. In order to accelerate convergence, we also standardize the input to have mean zero and variance one (30).

The Optimization Scheme.
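As a minimal PyTorch sketch of the ingredients introduced above (an illustration under stated assumptions, not the authors' implementation or their optimization scheme): a feed-forward bias V_w(s) with ReLU hidden layers and a linear output layer, input standardization to zero mean and unit variance, the CV derivatives ∂V/∂s_j of Eq. 9 obtained by back-propagation, and a single plain stochastic steepest-descent step on Ω[V]. Differentiating Eq. 3 and using Eq. 6 gives the gradient ∂Ω/∂w = −⟨∂V/∂w⟩_{p_V} + ⟨∂V/∂w⟩_{p}, a difference of averages over the biased ensemble and the target distribution. The layer sizes, learning rate, and function names here are illustrative choices only.

```python
import torch
import torch.nn as nn

class NNBias(nn.Module):
    """V_w(s): feed-forward network taking the CVs as inputs and returning the bias."""
    def __init__(self, n_cvs, hidden=(48, 24), s_mean=None, s_std=None):
        super().__init__()
        # standardize inputs to zero mean and unit variance (assumed precomputed estimates)
        self.register_buffer("s_mean", torch.zeros(n_cvs) if s_mean is None else s_mean)
        self.register_buffer("s_std", torch.ones(n_cvs) if s_std is None else s_std)
        layers, n_in = [], n_cvs
        for n_out in hidden:
            layers += [nn.Linear(n_in, n_out), nn.ReLU()]   # hidden layers with ReLU
            n_in = n_out
        layers += [nn.Linear(n_in, 1)]                       # last layer: linear combination only
        self.net = nn.Sequential(*layers)

    def forward(self, s):
        return self.net((s - self.s_mean) / self.s_std).squeeze(-1)

def bias_and_cv_derivatives(model, s):
    """Return V(s) and dV/ds_j (first factor of Eq. 9) by back-propagation."""
    s = s.detach().requires_grad_(True)
    v = model(s)
    dv_ds, = torch.autograd.grad(v.sum(), s)
    # chaining -dv_ds with grad_R s_j (from the MD engine) gives the atomic forces of Eq. 9
    return v.detach(), dv_ds

def sgd_step(model, s_biased, s_target, lr=1e-3):
    """One plain stochastic steepest-descent step on Omega[V].
    s_biased: CV values sampled in the biased run; s_target: samples drawn from p(s)."""
    # gradient of this proxy w.r.t. the parameters equals -<dV/dw>_V + <dV/dw>_p
    omega_grad_proxy = -model(s_biased).mean() + model(s_target).mean()
    model.zero_grad()
    omega_grad_proxy.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
```

In practice the biased-ensemble average would be accumulated along the MD run and the target average evaluated on a grid or by sampling p(s); as stated in the introduction, the paper develops a more efficient minimization technique than this plain update.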