Arxiv:2101.11588V3 [Cs.LG] 29 Mar 2021 and Many More
Total Page:16
File Type:pdf, Size:1020Kb
Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks Daniel Schwalbe-Koda,∗) Aik Rui Tan,∗) and Rafael G´omez-Bombarelli†) Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 (Dated: 30 March 2021) Neural network (NN) interatomic potentials provide fast prediction of potential energy surfaces, closely match- ing the accuracy of the electronic structure methods used to produce the training data. However, NN predic- tions are only reliable within well-learned training domains, and show volatile behavior when extrapolating. Uncertainty quantification approaches can flag atomic configurations for which prediction confidence is low, but arriving at such uncertain regions requires expensive sampling of the NN phase space, often using atomistic simulations. Here, we exploit automatic differentiation to drive atomistic systems towards high-likelihood, high-uncertainty configurations without the need for molecular dynamics simulations. By performing adver- sarial attacks on an uncertainty metric, informative geometries that expand the training domain of NNs are sampled. When combined to an active learning loop, this approach bootstraps and improves NN potentials while decreasing the number of calls to the ground truth method. This efficiency is demonstrated on sampling of kinetic barriers and collective variables in molecules, and can be extended to any NN potential architecture and materials system. I. INTRODUCTION along a simulation may negate some of the acceleration provided by ML models. In addition, exhaustive explo- Recent advances in machine learning (ML) techniques ration or data augmentation of the input space is in- have enabled the study of increasingly larger and more tractable. Therefore, assessing the trustworthiness of NN complex materials systems.1{3 In particular, ML-based predictions and systematically improving them is funda- atomistic simulations have demonstrated predictions of mental for deploying ML-accelerated tools to real world potential energy surfaces (PESes) with accuracy com- applications, including the prediction of materials prop- parable to ab initio simulations while being orders of erties. magnitude faster.4{6 ML potentials employing kernels Quantifying model uncertainty then becomes key, since or Gaussian processes have been widely used for fit- it allows distinguishing new inputs that are likely to be ting PESes,7{9 and are particularly effective in low-data informative and worth labeling with ab initio simula- regimes. For systems with greater diversity in chemical tions, from those close to configurations already repre- composition and structures, such as molecular confor- sented in the training data. In this context, epistemic un- mations or reactions, larger training datasets are typi- certainty | the model uncertainty arising from the lack cally needed. Neural networks (NNs) can fit interatomic of appropriate training data | is much more relevant potentials to extensive datasets with high accuracy and to ML potentials than the aleatoric uncertainty, which lower training and inference costs.10,11 Over the last arises from noise in the training data. Whereas ML-based years, several models have combined different represen- interatomic potentials are becoming increasingly popu- tations and NN architectures to predict PESes with in- lar, uncertainty quantification applied to atomistic sim- creasing accuracy.11{15 They have been applied to pre- ulations is at earlier stages.27,28 ML potentials based on dict molecular systems,16,17 solids,18 interfaces,19 chem- Gaussian processes are Bayesian in nature, and thus ben- ical reactions,20,21 kinetic events,22 phase transitions23 efit from an intrinsic error quantification scheme, which 4,6 9,29 arXiv:2101.11588v3 [cs.LG] 29 Mar 2021 and many more. has been applied to train ML potentials on-the-fly or Despite their remarkable capacity to interpolate be- to accelerate nudged elastic band (NEB) calculations.30 tween data points, NNs are known to perform poorly NNs do not typically handle uncertainty natively and it is outside of their training domain24,25 and may fail catas- common to use approaches that provide distributions of trophically for rare events, such as those occurring in predictions to quantify epistemic uncertainty. Strategies atomistic simulations with large sizes or time scales not such as Bayesian NNs,31 Monte Carlo dropout32 or NN explored in the training data. Increasing the amount committees33{35 allow estimating the model uncertainty and diversity of the training data is beneficial to improve by building a set of related models and comparing their performance,21,26 but there are significant costs associ- predictions for a given input. In particular, NN commit- ated to generating new ground-truth data points. Con- tee force fields have been used to control simulations,36 to tinuously acquiring more data and re-training the NN inform sampling strategies37 and to calibrate error bars for computed properties.38 Even when uncertainty estimates are available to dis- ∗)D.S.-K. and A.R.T. contributed equally to this work tinguish informative from uninformative inputs, ML po- ) † Electronic mail: [email protected] tentials rely on atomistic simulations to generate new 2 trial configurations. For example, it is common to per- II. THEORY form molecular dynamics (MD) simulations with NN- based models to expand their training set in an active A. Neural network potentials learning (AL) loop.21,26,39 MD simulations explore phase space based on the thermodynamic probability of the An NN potential is a hypothesis function h that pre- PES. Thus, in the best case, ML-accelerated MD sim- θ dicts a real value of energy E^ = h (X) for a given atom- ulations produce outputs correlated to the training set if θ istic configuration X as input. X is generally described the ML potential accurately reproduces the ground truth by n atoms with atomic numbers Z n and nuclear PES. However, exploring this region only provides incre- Z+ coordinates R n×3. Energy-conserving2 atomic forces mental improvement to the potentials, which will still R F on atom i 2and Cartesian coordinate j are obtained struggle to observe rare events. In the worst case, MD ij by differentiating the output energy with respect to the trajectories can be unstable when executed with an NN atomic coordinates r , potential and sample unrealistic events that are irrele- ij vant to the true PES, especially in early stages of the AL cycle when the NN training set is not representa- @E^ tive of the overall configuration space. Gathering data F^ij = : (1) @r from ab initio MD prevents the latter issue, although at − ij a higher computational cost. Even NN simulations need The parameters θ are trained to minimize the expected to sample very large amounts of low uncertainty phase loss given the distribution of ground truth data space before stumbling upon uncertain regions. Some (X; E;LF) according to the dataset , works avoid performing dynamic simulations, but still re- D quire forward exploration of the PES to find new training points.40 Hence, one of the major bottlenecks for scaling min E [ (X; E; F; θ)] : (2) up NN potentials is minimizing their extrapolation errors θ (X;E;F)∼D L until they achieve self-sufficiency to perform atomistic simulations within the full phase space they will be used During training, the loss is usually computed by tak- in, including handling rare events. Inverting the problem ing the average mean squaredL error of the predicted and of exploring the configuration space with NN potentials target properties within a batch of size N, would allow for a more efficient sampling of transition states and dynamic control.41,42 N 1 X h 2 2i = αE Ei E^i + αF Fi F^ i ; (3) L N k − k k − k i=1 where αE and αF are coefficients indicating the trade- In this work, we propose an inverse sampling strat- off between energy and force-matching during training.12 egy for NN-based atomistic simulations by perform- The training proceeds using stochastic gradient descent- ing gradient-based optimization of a differentiable, based techniques. likelihood-weighted uncertainty metric. Building on the concept of adversarial attacks from the ML literature,43,44 new molecular conformations are sampled B. Uncertainty quantification by backpropagating atomic displacements to find local optima that maximize the uncertainty of an NN commit- To create a differentiable metric of uncertainty, we tee while balancing thermodynamic likelihood. These turned to NN committees. These are typically imple- new configurations are then evaluated using atomistic mented by training different hθ and obtaining a distribu- simulations (e.g. density functional theory or force fields) tion of predictions for each input X. For example, given and used to retrain the NNs in an AL loop. The tech- (m) M models implementing E^(m) = h (X), the mean and nique is able to bootstrap training data for NN potentials θ the variance of the energy of an NN potential ensemble starting from few configurations, improve their extrap- can be computed as olation power, and efficiently explore the configuration space. The approach is demonstrated in several atom- istic systems, including finding unknown local minima M 1 X in a toy two-dimensional PES, improving kinetic barrier E¯(X) = E^(m)(X); (4) M predictions for nitrogen inversion, increasing the stabil- m=1 ity of MD simulations in molecular systems, and sam- M 1 X pling of collective variables in alanine dipeptide. This σ2 (X) = E^(m)(X) E¯(X) 2; (5) E M 1 k − k work provides a new method to explore potential energy − m=1 landscapes without the need for brute-force ab initio MD simulations to propose trial configurations. and similarly for forces, 3 FIG. 1. a, Schematic diagram of the method. Nuclear coordinates of an input molecule are slightly displaced by δ. Then, a potential energy surface (PES) and its associated uncertainty are calculated with an NN potential committee. By backpropa- gating an adversarial loss through the NN committee, the displacement δ can be updated using gradient ascent techniques until the adversarial loss is maximized, thus converging to states that compromise high uncertainty with low energy.