IMPLEMENTATION OF METHODS TO ACCURATELY PREDICT TRANSITION PATHWAYS AND THE UNDERLYING POTENTIAL ENERGY SURFACE OF BIOMOLECULAR SYSTEMS

By DELARAM GHOREISHI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2019

© 2019 Delaram Ghoreishi

I dedicate this dissertation to my mother, my brother, and my partner, for their endless love, support, and encouragement.

ACKNOWLEDGMENTS

I am thankful to my advisor, Adrian Roitberg, for his guidance during my graduate studies. I am grateful for the opportunities he provided me and for allowing me to work independently. I also thank my committee members, Rodney Bartlett, Xiaoguang Zhang, and Alberto Perez, for their valuable input. I am grateful to the University of Florida Informatics Institute for providing financial support in 2016, allowing me to take a break from teaching and focus more on research. I would like to acknowledge my group members and friends for their moral support and technical assistance. Natali di Russo helped me become familiar with Amber. I thank Pilar Buteler, Sunidhi Lenka, and Vinicius Cruzeiro for daily conversations about science and life. Pancham Lal Gupta was my cpptraj encyclopedia. I thank my physicist colleagues, Ankita Sarkar and Dustin Tracy, who went through the intense physics coursework with me during the first year. I thank Farhad Ramezanghorbani, Justin Smith, Kavindri Ranasinghe, and Xiang Gao for helpful discussions regarding ANI and active learning. I also thank David Cerutti from Rutgers University for his help with the NEB implementation. I thank Pilar Buteler and Alvaro Gonzalez for the good times we had camping and climbing. Lastly, I express my sincere gratitude to Farhad Ramezanghorbani for always being there for me, for encouraging me, and for his significant scientific input. I also thank my mother, Fatemeh Kaheh, and my brother, Ramin Ghoreishi, for their love and encouragement at every step of my life. I am forever grateful to all three of them.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Minimum Energy Path Sampling
  1.2 Molecular Dynamics with Machine Learned Potentials

2 THEORY AND METHODS
  2.1 Statistical Mechanics
    2.1.1 Statistical Ensembles
    2.1.2 Microcanonical Ensembles: Constant N-V-E
    2.1.3 Canonical Ensembles: Constant N-V-T
    2.1.4 Isothermal-Isobaric Ensembles: Constant N-P-T
    2.1.5 Grand Canonical Ensembles: Constant µ-V-T
  2.2 Nudged Elastic Band
  2.3 String Method
  2.4 Free Energy and Transition Rate Calculations
  2.5 Computational Methods of Free Energy Calculations
    2.5.1 Free Energy Perturbation
    2.5.2 Thermodynamic Integration
    2.5.3 Bennett Acceptance Ratio
  2.6 Indirect Approach to Free Energy Calculations
  2.7 Feed-Forward Neural Networks
  2.8 Active Learning
  2.9 Transfer Learning
  2.10 ANI Neural Network Potentials
    2.10.1 Network Architecture
    2.10.2 Sampling the Chemical Space
      2.10.2.1 Normal Mode Sampling
      2.10.2.2 Molecular Dynamics Sampling

3 IMPLEMENTATION
  3.1 Implementation of Nudged Elastic Band in Amber
  3.2 Implementation of ANI-Amber Interface
  3.3 Sample Amber Input Files
    3.3.1 Sample Input File for NEB Simulations
    3.3.2 Sample Input File for ANI Simulations

4 NUDGED ELASTIC BAND: VALIDATION AND RESULTS
  4.1 Computational Details
    4.1.1 Test Case 1: Conformational Change of Alanine Dipeptide
    4.1.2 Test Case 2: α-helix to β-sheet Transition in Polyalanine
    4.1.3 Test Case 3: Base Eversion Pathway of the OGG1–DNA Complex
  4.2 Accuracy Tests
    4.2.1 Test Case 1: Conformational Change of Alanine Dipeptide
    4.2.2 Test Case 2: α-helix to β-sheet Transition in Polyalanine
    4.2.3 Test Case 3: Base Eversion Pathway of the OGG1–DNA Complex
  4.3 Timing Benchmarks

5 FREE ENERGY METHODS WITH MACHINE LEARNING
  5.1 Two Dimensional Energy Surface with ANI-Amber
  5.2 End-State Free Energy Corrections
    5.2.1 Conformational Free Energy with ANI-Amber
    5.2.2 Hydration Free Energy with ANI-Amber
      5.2.2.1 Data Preparation and Network Training
      5.2.2.2 Energy Prediction Results

6 CONCLUDING REMARKS AND FUTURE DIRECTIONS
  6.1 Final Remarks on Nudged Elastic Band
  6.2 Final Remarks on Free Energy Calculations with ANI-Amber

APPENDIX

A KABSCH ALGORITHM
B PARAMETERIZATION OF A CURVE
  B.1 Re-parameterization of a Curve
  B.2 Arclength of a Curve
  B.3 Arclength Parameterization
C DERIVATION OF EQUATION (2-42)
D PENALTY METHOD
E TWO DIMENSIONAL TEST POTENTIALS
  E.1 LEPS Potential
  E.2 LEPS Harmonic Oscillator Potential
F ALANINE DIPEPTIDE CONFORMATIONAL CHANGE

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

5-1 Free energy difference of cis/trans conformational transition

LIST OF FIGURES

2-1 Mass and spring representation of NEB
2-2 Force decoupling representation of NEB
2-3 Thermodynamic cycle of indirect approach
2-4 Feed-forward neural network
2-5 Active learning work-flow
2-6 Transfer learning work-flow
2-7 Radial symmetry functions
2-8 ANI neural network potential
3-1 MPI implementation of NEB in Amber
3-2 GPU implementation of NEB in Amber
3-3 Amber molecular dynamics
3-4 ANI-Amber molecular dynamics
4-1 Alanine dipeptide potential energy surface
4-2 Energy of NEB replicas in alanine dipeptide
4-3 End to end distance in polyalanine
4-4 Glycosidic angle vs. eversion distance
4-5 Performance comparison between sander and pmemd
4-6 Performance comparison for different nebfreq values
4-7 Performance dependence on the size of the data transfers
5-1 Ethylene glycol
5-2 Two-dimensional energy surface with GAFF
5-3 Two-dimensional energy surface with ANI-1x
5-4 cis conformer of N-methyl acetamide
5-5 trans conformer of N-methyl acetamide
5-6 Umbrella sampling
5-7 Endpoint correction
5-8 Data preparation
5-9 Energy prediction results
5-10 Cumulative number of data-points
F-1 φ dihedral angle change of NEB replicas in alanine dipeptide
F-2 ψ dihedral angle change of NEB replicas in alanine dipeptide

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMPLEMENTATION OF METHODS TO ACCURATELY PREDICT TRANSITION PATHWAYS AND THE UNDERLYING POTENTIAL ENERGY SURFACE OF BIOMOLECULAR SYSTEMS

By Delaram Ghoreishi

December 2019
Chair: Adrian Roitberg
Major: Physics

This thesis focuses on a fast implementation of the nudged elastic band (NEB) method and on the interfacing of a deep neural network potential with the AMBER molecular dynamics package. The details of the implementations and validation results are explored within this document. The reliability of physics-based simulations is restricted by the accuracy of the potential energies that govern the dynamics of a system of particles, as well as by the efficiency and precision of the advanced sampling techniques. Biological systems often experience transitions that completely change their conformations and functionalities. Locating the minimum energy pathways (MEP) of such transitions provides an insightful understanding of their properties. Experimental and conventional computational methods, however, are limited to sampling structures around the minimum energy states of the system. Replicating the transition path requires methods that can correctly identify the unstable conformations along the path, including the transition states. In this document, we explain how NEB overcomes these sampling issues and successfully predicts the MEP, and we provide a graphics processing unit (GPU) accelerated implementation of NEB in the particle mesh Ewald molecular dynamics (pmemd) module of AMBER. This GPU-accelerated implementation significantly enhances MEP predictions for biomolecules experiencing

conformational transitions in high dimensional phase space, without a priori knowledge of the reaction coordinates. On another note, the applicability of precise ab initio methods is limited in scenarios that demand fast and cost-effective predictions of molecular interactions in complex systems. Advances in computing hardware, specifically GPUs, along with automated data-driven machine learning (ML), have significantly changed the way scientific research is conducted over the past decade. ANAKIN-ME (ANI) is a deep neural network potential trained to reproduce high-precision quantum mechanics (QM) energies and forces at significant speedups over ab initio methods. The interface between ANI and the AMBER software suite allows computational scientists to perform molecular dynamics simulations that are as accurate as QM methods at speeds comparable to classical force fields. Through this interface, it is also possible to use other features implemented in the AMBER package, such as NEB, constant pH, and replica exchange molecular dynamics.

CHAPTER 1 INTRODUCTION 1.1 Minimum Energy Path Sampling∗

The statistical behavior gleaned from simulations of biomolecules yields detailed information about observed biochemical phenomena. Computational biologists can probe the frequency and mechanism of rare transitions that are hard to observe in experiments by locating the minimum free energy pathways. However, traditional molecular dynamics of proteins and biopolymers often fail to sample these important transitions, as the systems are thermally limited to low energy states on a rugged free energy surface. Precision and reproducibility in the results have been limited by the cost of the calculations. The use of graphics processing units (GPUs) significantly accelerates these intensive calculations, offering a base multiplier for enhanced sampling strategies that can be implemented on their advanced architecture, but the multiplier by itself is not enough. The community needs a set of efficient simulation algorithms implemented on vector-accelerated architectures that enhance the exploration of free energy surfaces for detecting the multitude of rare transition pathways. Different methods have been developed for finding transition pathways1;2;3;4. Some depend only on the initial structure and follow a minimum ascent path to reach a final structure5;6;7;8. It is not guaranteed, however, that the desired final structure is reached2. Other methods use the second derivatives of the potential energy function to locate the saddle points9;10;11. Once the saddle points are identified, local minimum states can be found using the steepest descent algorithm. But, since calculation and diagonalization of the second derivative matrix at each step of the simulation is expensive, these methods are applicable only to small systems. Other approaches determine the path when both initial and final states are identified12;13. Among these methods, chain-of-states algorithms12;13;14;15

∗ Section 1.1 was reprinted/adapted with permission from Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 4699–4707.

are compelling as they adjust and scale with resources to produce the desired precision and efficiency. The nudged elastic band (NEB)14 method in combination with simulated annealing, for instance, has proven successful in determining minimum energy paths (MEP) for rugged energy surfaces16;17. A minimum energy path is a transition passage connecting the initial and final states, for which any point along the path is at the minimum energy compared to other positions in the hyperplane perpendicular to the MEP at that point. Hence, the perpendicular component of the gradient of the potential energy at any point along the path is zero. NEB14, first proposed by Jónsson et al., is an evolution of methods such as self penalty walk (SPW)18, locally updated planes (LUP)19, and elastic band20. NEB results in a continuous representation of the MEP by simultaneous energy minimization of a series of connected replicas. This continuity is not guaranteed in LUP, in which the initial choice of the pathway could affect convergence to a connected path. In SPW, the converged path is not the MEP per se, but a path along which the averaged potential energy is minimized20. Moreover, the elastic band method can suffer from corner cutting and sliding down, problems that are easily excluded in NEB by force decoupling14. A more sophisticated approach to finding the MEP is the string method15, which results in a smoother path than NEB since it uses higher order interpolation schemes. In the string method, it is also possible to change the number of replicas on the fly. Moreover, methods such as temperature-dependent NEB (tNEB)21 and the finite temperature string method4 account for temperature corrections to the MEP. The purpose of this work is to accelerate the current implementation of NEB in the AMBER suite of molecular dynamics software22, in a way that is easily extendable to other chain-of-states methods for searching minimum energy paths, such as the string method.
Partial nudged elastic band (PNEB)16, which applies the NEB forces only to a user-defined subset of atoms, is the supported implementation of NEB in AMBER18. PNEB allows the use of NEB in systems with explicit solvent molecules, which are left alone to relax and adapt to the conformational changes of the system, free of any additional restraints

aside from their physical interactions with other molecules defined by force field parameters. Furthermore, this method decreases the communication overhead between the nodes by incorporating fewer atoms in the NEB calculations, which leads to better scaling of the code. We have extended PNEB by incorporating the routines into the particle mesh Ewald molecular dynamics (PMEMD) module of AMBER and further accelerated it by implementing those routines in CUDA to run on NVIDIA GPUs. In this implementation, a shuttle transfer was developed that minimizes the amount of data that must traverse the message passing interface (MPI) between GPUs. Additional performance enhancement has been made possible through a flag that lets users control the frequency at which NEB forces are computed. Running on NVIDIA Tesla P100 GPUs in parallel, the AMBER18 GPU-accelerated PNEB executes simulations more than 60 times faster than a two-core Intel Xeon Platinum 8160 CPU processor tasked with the same problem, with uncompromised numerical precision. The new implementation facilitates the study of biomolecules undergoing conformational transitions in a multidimensional phase space of thousands of atoms, which was otherwise hard to study with current CPU architectures. Our implementation of NEB for multi-GPU execution within the AMBER software suite could aid computational scientists in developing new drug compounds and novel materials by applying these powerful algorithms on commodity hardware. 1.2 Molecular Dynamics with Machine Learned Potentials

Computational simulations have revolutionized chemistry and physics by providing insight into physical phenomena at the atomic and molecular levels. Molecular properties can be obtained through a description of the electronic structure of any molecular system, which can be derived from high-level ab initio quantum mechanics (QM)23;24. As the number of electrons increases, the simple one-electron wavefunction needs to be replaced by a many-electron wavefunction that incorporates many-electron interactions25;26;27. The cost of these calculations becomes prohibitively high for all but the smallest systems. Various numerical approximations and computational techniques have been proposed to

accommodate the substantial cost of these calculations and generate results in a reasonable time frame. Hartree-Fock28;29;30;31 and post-Hartree-Fock32;33 methods, different forms of density functional theory (DFT)34;35, and empirical and classical physics-based methods36;37 are all widely used to approximate the exact solution. These approximations come at a cost: generally, the more accurate the method, the more computationally expensive it is. Among these techniques, coupled cluster theory is considered the gold standard in computational chemistry, delivering an accurate solution for many systems by accounting for electron correlation38;39;40. High accuracy coupled cluster methods such as CCSD(T)/CBS, however, are computationally expensive, with a scaling of O(N^7), and are applicable only to systems of tens of atoms41;42. On the other end of the spectrum, we find fast but less accurate classical physics-based methods, which parametrize a set of variables with a predefined functional form (a so-called force field) to reproduce experimental or quantum mechanical results36;43. These methods have been widely used to study large biomolecular systems, involving dynamical processes of proteins and drug molecules44;45. Computational techniques have become integral to the early stages of drug discovery, where the selected methods need to match the fast pace of drug design in a cost-effective manner. The applicability of highly accurate methods is limited in such scenarios, as they require massive computational resources. A revolution will come from a method that combines the two ends of the spectrum: as fast as classical force fields and as accurate as high-level QM models.

CHAPTER 2 THEORY AND METHODS 2.1 Statistical Mechanics

Two different approaches can be utilized to describe the properties of a thermodynamic system: macroscopic or microscopic. It is from these approaches that the two branches of thermodynamics, classical and statistical, emerge. In the macroscopic approach, the large scale properties of the system are considered. These properties can be perceived by human senses without the aid of magnifying devices; temperature, pressure, and volume are examples of macroscopic properties. The microscopic approach deals with the statistical properties of a large number of particles, on the order of Avogadro's number. Velocity, momentum, and kinetic energy are examples of microscopic properties. These two approaches, however distinct, are connected through a function called the partition function, from which all the macroscopic properties of a system in equilibrium can be derived. In a system with N particles, a microstate is a specific configuration that the particles of the system may occupy. If the system is in equilibrium, each microstate will occur with a certain probability, and the system is free to switch between different microstates as long as its macroscopic properties are unchanged; that is, even though the macroscopic properties are constant, the system itself is dynamic. Given a defined set of macroscopic properties, the various possible microstates of the system (considered all at once) form an ensemble. In other words, the statistical properties of a system depend on the distribution of the possible microstates of the system, and the knowledge of statistics can help us perceive a system consisting of discrete particles as a whole. In this picture, the partition function is a sum of weighted energetic functions of the allowed microstates, where the weights account for the probability of occurrence of the individual microstates.
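To make the weighted-sum picture concrete, the following Python sketch evaluates Boltzmann weights and their normalizing partition function for a toy two-level system (the energies and temperature are illustrative assumptions, not a system from this work):

```python
import math

def boltzmann_probabilities(energies, kT):
    """Microstate probabilities from Boltzmann weights; the partition
    function Z is the normalizing sum of the weights."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z

# Hypothetical two-level system: energies 0 and 1 (in units of kT).
probs, Z = boltzmann_probabilities([0.0, 1.0], kT=1.0)
```

The lower-energy microstate carries the larger probability, and the probabilities sum to one by construction.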
The rest of this section establishes the basis of statistical ensembles and their partition functions for classical systems.

2.1.1 Statistical Ensembles

The Hamiltonian of a system consisting of N particles with decoupled kinetic and potential energies can be written as:

\[ H(p, q) = \sum_{i=1}^{3N} \frac{p_i^2}{2m} + U(q_1, q_2, \ldots, q_{3N}) \tag{2-1} \]

where q and p are the 3N sets of coordinates and momenta, respectively. From Hamilton's equations

\[ \dot{q}_i = \frac{\partial H}{\partial p_i} \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i} \tag{2-2} \]

it is possible to study the time evolution of a system with a known initial position (q_0, p_0) in the 6N-dimensional phase space. However, these sets of equations can be solved analytically only in a few cases. Most of the time, numerical methods are required to investigate the evolution of the system, and the accuracy of numerical solutions is limited by machine precision. Nevertheless, we are more concerned with the macroscopic properties of a thermodynamic system than with the exact solution of its constituents' states over time. For an ergodic Hamiltonian46;47, the time average is the same as the ensemble average. Hence, the macroscopic properties can be derived from a computer simulation run over a sufficiently long time to allow the system to pass through all possible microstates corresponding to a specific ensemble. For any physical quantity f, the ensemble average ⟨f⟩ is equal to:

\[ \langle f \rangle = \frac{\int f(q, p)\, p(q, p)\, d\omega}{\int p(q, p)\, d\omega} \tag{2-3} \]

with p(q, p) being the probability of the appearance of the representative microstates in the phase space and

\[ d\omega = \frac{dq\, dp}{N!\,(2\pi\hbar)^{3N}} = \frac{d^{3N}q\, d^{3N}p}{N!\,(2\pi\hbar)^{3N}} \tag{2-4} \]

being the normalized volume element of the phase space, which is equivalent to the volume of the shell between the limits (p − Δ/2, q − Δ/2) and (p + Δ/2, q + Δ/2) divided by the volume corresponding to a single microstate, ω_0 = (2πħ)^{3N}. The N! term is

18 a correction factor to account for the permutation of the N indistinguishable particles, and

ħ is the reduced Planck constant. According to the Heisenberg uncertainty principle48, the minimum uncertainty of a simultaneous measurement of the 6N-dimensional coordinates and momenta is of the order of:

\[ \Delta q\, \Delta p = \prod_{i=1}^{3N} \Delta q_i\, \Delta p_i \simeq (\hbar/2)^{3N} \tag{2-5} \]

which is of the order of the volume assigned to one microstate, ω_0. Other considerations contributing to the derivation of the exact value of the volume of one microstate are out of the scope of this work and can be pursued in the work of Bose49. The normalizing factor in the denominator of equation 2-3 is the partition function.

2.1.2 Microcanonical Ensembles: Constant N-V-E

Consider an isolated system of N identical and indistinguishable particles, confined to a volume V with a given total energy E. This system evolves on a (6N−1)-dimensional constant-energy hypersurface. As a result of the equal a priori probabilities, the system can be at any point on this surface with equal likelihood, so the probability of finding a particular microstate can be written as:

\[ p(q, p) = \frac{\delta(H(q, p) - E)}{\int \delta(H(q, p) - E)\, d\omega} \tag{2-6} \]

The denominator of the above equation is the partition function of the microcanonical ensemble,

\[ \Omega(N, V, E) = \frac{1}{N!\,(2\pi\hbar)^{3N}} \int \delta(H(q, p) - E)\, d^{3N}q\, d^{3N}p \tag{2-7} \]

The entropy of the system is related to the partition function by:

\[ S(N, V, E) = k_B \ln \Omega(N, V, E) \tag{2-8} \]

in which k_B is the Boltzmann constant.

2.1.3 Canonical Ensembles: Constant N-V-T

The canonical ensemble is a closed system of N particles confined to volume V that is in equilibrium with a heat reservoir with temperature T. In this system, the canonical probability of finding a particular microstate at a certain point in the phase space depends on the corresponding energy value of the Hamiltonian at that point.

\[ p(q, p) = \frac{\exp(-H(q, p)/k_B T)}{\int \exp(-H(q, p)/k_B T)\, d\omega} \tag{2-9} \]

The canonical ensemble partition function acts as a normalizing factor.

\[ Q(N, V, T) = \frac{1}{N!\,(2\pi\hbar)^{3N}} \int \exp(-H(q, p)/k_B T)\, d^{3N}q\, d^{3N}p \tag{2-10} \]

The Helmholtz free energy is related to the partition function by:

\[ A(N, V, T) = -k_B T \ln Q(N, V, T) \tag{2-11} \]
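As a sanity check of equations 2-10 and 2-11, the sketch below evaluates the canonical partition function of a single one-dimensional harmonic oscillator by direct quadrature and compares it with the analytic classical result Q = k_BT/(ħω). The reduced units (m = ω = ħ = k_B = 1) are an assumption chosen for illustration:

```python
import numpy as np

# Assumed reduced units: m = omega = hbar = 1.
m, omega, hbar = 1.0, 1.0, 1.0

def canonical_Q(kT, n=200001, lim=20.0):
    """Q = (1/(2*pi*hbar)) * integral of exp(-H/kT) dq dp for a 1D
    harmonic oscillator (eq. 2-10 with a single degree of freedom and
    no indistinguishability factor)."""
    q = np.linspace(-lim, lim, n)
    dq = q[1] - q[0]
    # Position and momentum integrals factorize for H = p^2/2m + m w^2 q^2/2.
    Iq = np.sum(np.exp(-0.5 * m * omega**2 * q**2 / kT)) * dq
    Ip = np.sum(np.exp(-q**2 / (2.0 * m * kT))) * dq  # reuse grid for p
    return Iq * Ip / (2.0 * np.pi * hbar)

kT = 1.0
Q = canonical_Q(kT)
A = -kT * np.log(Q)  # Helmholtz free energy, eq. 2-11
# Classical analytic value: Q = kT/(hbar*omega) = 1, hence A = 0.
```

The quadrature reproduces the analytic value to high accuracy, illustrating that the partition function is nothing more than the normalized phase-space integral of the Boltzmann factor.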

2.1.4 Isothermal-Isobaric Ensembles: Constant N-P-T

In an isothermal-isobaric ensemble, the system has a fixed number of particles N at a constant pressure P and is in equilibrium with a heat reservoir with temperature T. The probability of a microstate of this system having pressure P and total energy E corresponding to Hamiltonian H(q, p) is equal to:

\[ p(q, p) = \frac{\exp(-(H(q, p) + PV)/k_B T)}{\int_0^{\infty} \int \exp(-(H(q, p) + PV)/k_B T)\, d\Omega\, dV} \tag{2-12} \]

in which

\[ d\Omega = \frac{dq\, dp}{N!\,\lambda^{3N}} = \frac{d^{3N}q\, d^{3N}p}{N!\,\lambda^{3N}} \tag{2-13} \]

with λ being the thermal de Broglie wavelength, acting as a normalization factor coming from the 3N integrals over the momenta:

\[ \int_{-\infty}^{\infty} \exp(-p^2/2m k_B T)\, d^{3N}p = (2\pi m k_B T)^{3N/2} \tag{2-14} \]

\[ \lambda = \frac{h}{\sqrt{2\pi m k_B T}} \tag{2-15} \]

The isothermal-isobaric partition function is equal to:

\[ \Delta(N, P, T) = \frac{1}{N!\,\lambda^{3N}} \int_0^{\infty} \int \exp(-(H(q, p) + PV)/k_B T)\, d^{3N}q\, d^{3N}p\, dV \tag{2-16} \]

The Gibbs free energy is related to the partition function by:

\[ G(N, P, T) = -k_B T \ln \Delta(N, P, T) \tag{2-17} \]
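As a concrete numeric example of the thermal de Broglie wavelength in equation 2-15 (argon at 300 K is an arbitrary illustrative choice, not a system from this work):

```python
import math

h = 6.62607015e-34   # Planck constant, J*s
kB = 1.380649e-23    # Boltzmann constant, J/K

def thermal_wavelength(mass_kg, T):
    """Thermal de Broglie wavelength, eq. 2-15."""
    return h / math.sqrt(2.0 * math.pi * mass_kg * kB * T)

m_Ar = 39.948 * 1.66053906660e-27  # mass of an argon atom in kg
lam = thermal_wavelength(m_Ar, 300.0)  # roughly 0.16 angstrom
```

The result is far smaller than typical interatomic distances, which is the usual justification for treating such systems with the classical partition functions used throughout this section.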

2.1.5 Grand Canonical Ensembles: Constant µ-V-T

Consider a system confined to volume V that is allowed to exchange particles and energy with a large reservoir. At equilibrium, the reservoir and the system have a common temperature T and chemical potential µ. The probability of a microstate of this system having

N particles and total energy E_N corresponding to Hamiltonian H(q, p) is equal to:

\[ p(q, p) = \frac{\exp(-(H(q, p) - \mu N)/k_B T)}{\sum_{N=0}^{\infty} \int \exp(-(H(q, p) - \mu N)/k_B T)\, d\omega} \tag{2-18} \]

with the grand canonical partition function being:

\[ \Xi(\mu, V, T) = \sum_{N=0}^{\infty} \frac{1}{N!\,(2\pi\hbar)^{3N}} \int \exp(-(H(q, p) - \mu N)/k_B T)\, d^{3N}q\, d^{3N}p \tag{2-19} \]

The pressure of the system is related to the partition function by:

\[ P(\mu, V, T) = \frac{k_B T}{V} \ln \Xi(\mu, V, T) \tag{2-20} \]
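As a consistency check of equations 2-19 and 2-20, the sketch below sums the grand canonical partition function for the special case of an ideal gas (U = 0), where the configurational integral contributes V^N and the momentum integration of equation 2-14 supplies the λ^{3N} factor. The reduced-unit parameter values are arbitrary assumptions for illustration:

```python
import math

def grand_partition_ideal_gas(mu, V, kT, lam, nmax=200):
    """Xi = sum_N (1/N!) * (exp(mu/kT) * V / lam**3)**N, i.e. eq. 2-19
    evaluated for an ideal gas.  The sum is accumulated term by term
    to avoid overflowing factorials."""
    x = math.exp(mu / kT) * V / lam**3
    term, total = 1.0, 1.0  # N = 0 term
    for N in range(1, nmax):
        term *= x / N
        total += term
    return total

mu, V, kT, lam = -1.0, 10.0, 1.0, 1.0   # arbitrary reduced units
Xi = grand_partition_ideal_gas(mu, V, kT, lam)
P = (kT / V) * math.log(Xi)             # pressure from eq. 2-20
N_avg = math.exp(mu / kT) * V / lam**3  # mean particle number
# For the ideal gas, P * V reproduces N_avg * kT (the ideal gas law).
```

Since the sum collapses to Xi = exp(e^{µ/kT} V/λ³) for this case, equation 2-20 immediately recovers PV = ⟨N⟩k_BT, a useful sanity check that the signs and prefactors above are consistent.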

2.2 Nudged Elastic Band∗

NEB applies a series of harmonic restraints between replicas (or images), which are first generated along a putative pathway. Replicas are linked to their nearest neighbors via springs,

∗ Section 2.2 was reprinted/adapted with permission from Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 46994707.

such that the entire system represents a discrete pathway, from reactants to products. The purpose of the springs is to distribute the replicas along the path and prevent them from sliding down to the minimum states14. The replicas evolve into a discrete representation of MEP by simultaneous energy minimization of the entire chain. Setting N − 2 replicas between the initial and final states, positions of the discrete points can be denoted by the array

[R_1, R_2, R_3, ..., R_N], where R_1 and R_N are the two endpoints, which are kept fixed in phase space throughout the simulation. Some variants of the NEB algorithm do not require fixed endpoints50;51;52. Figure 2-1 shows a mass and spring representation of NEB. Each replica is an atomic representation of the system at a certain position along the pathway that connects the initial and final states.

Figure 2-1. Mass and spring representation of the nudged elastic band method. Each circle represents an individual simulation called replica which is bound to its nearest neighbors by harmonic potentials modeled as springs in the figure.

If no guess for the reaction coordinate is available, this pathway can be constructed by placing half of the replicas on or close to the initial structure and the other half on or close to the final structure. This way of initializing the path requires shorter timesteps and weaker springs at the beginning of the simulation to ensure that the very stretched central spring does not exert strong forces on the particles of the system, allowing the ensemble to slowly approach a smooth path. Translational and rotational differences between adjacent replicas should be minimized before the calculation of the spring forces17. First, the translational differences are removed by placing the origin of the coordinate system at the corresponding center of mass (COM) coordinates. Then an optimal rotation matrix is applied to the coordinates of the neighboring replicas to minimize the root mean square deviation (RMSD) between the two sets of atomic coordinates53.
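A minimal NumPy sketch of this alignment step (COM removal followed by the optimal rotation of the Kabsch algorithm, described in appendix A) is shown below. Equal atomic masses are assumed for simplicity, and this is an illustration rather than the actual AMBER implementation:

```python
import numpy as np

def kabsch_align(P, Q):
    """Align replica coordinates P (n_atoms x 3) onto neighbor Q:
    remove the translational difference via the centroids, then apply
    the optimal rotation from an SVD of the covariance matrix."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                       # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                  # optimal proper rotation
    return P0 @ R.T + Q.mean(axis=0)    # rotated P in Q's frame
```

Using the aligned coordinates in place of the raw neighbor positions removes the rigid-body contribution to the inter-replica distance before the spring forces are evaluated.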

A tangent vector at each image position (τ_i) is responsible for decoupling the spring forces and the potential forces to prevent them from interfering. Only the perpendicular component of the forces defined by the force field parameters, and the parallel component of the spring forces, are considered in the equations of motion (refer to the inset in figure 2-2). The tangents are defined only for the subspace of the atoms included in the NEB force calculations, following previous work by Bergonzo et al.16. The total force acting on these atoms hence includes two orthogonal components:

\[ F_i^{NEB} = F_i^{\parallel} + F_i^{\perp} \tag{2-21} \]

\[ F_i^{\perp} = -\nabla V(R_i) + (\nabla V(R_i) \cdot \tau_i)\, \tau_i \tag{2-22} \]

\[ F_i^{\parallel} = (F_i^{s} \cdot \tau_i)\, \tau_i \tag{2-23} \]

where F_i^s is the spring force at the position of the i-th replica, and ∇V(R_i) is the potential force described by the force field. The tangents are defined based on the energy of the replica itself and its neighbors as:

\[ \tau_i = \begin{cases} R_{i+1} - R_i & \text{if } V_{i+1} > V_i > V_{i-1} \\ R_i - R_{i-1} & \text{if } V_{i+1} < V_i < V_{i-1} \end{cases} \tag{2-24} \]

in which V_i = V(R_i). In this definition, only the position of the neighboring replica with higher energy is considered. If both of the neighboring replicas have either higher or lower energy with respect to replica i, that is, if

\[ V_{i+1} > V_i < V_{i-1} \qquad \text{or} \qquad V_{i+1} < V_i > V_{i-1} \]

then a weighted average will be used to define the tangent estimates:

\[ \tau_i = \begin{cases} (R_{i+1} - R_i)\,\Delta V_i^{max} + (R_i - R_{i-1})\,\Delta V_i^{min} & \text{if } V_{i+1} > V_{i-1} \\ (R_{i+1} - R_i)\,\Delta V_i^{min} + (R_i - R_{i-1})\,\Delta V_i^{max} & \text{if } V_{i+1} < V_{i-1} \end{cases} \tag{2-25} \]

where

\[ \Delta V_i^{max} = \max(|V_{i+1} - V_i|,\, |V_{i-1} - V_i|) \]

\[ \Delta V_i^{min} = \min(|V_{i+1} - V_i|,\, |V_{i-1} - V_i|) \]

For more detailed information on the definition of tangents, the reader can refer to reference 54. The decoupling ensures a smooth convergence to the path and prevents the images from corner-cutting or sliding down14. The force projection decouples the dynamics of the images from the discrete distribution of the images along the path, such that only the true forces are responsible for relaxation of the images while the spring forces keep the images away from the minimum states. Figure 2-2 shows a schematic representation of the NEB force decoupling on a two-dimensional LEPS harmonic potential energy surface. For more information regarding this potential model refer to appendix A of reference 14.
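The tangent rule and force decoupling above can be sketched in a few lines of NumPy. This is an illustrative toy, not the AMBER implementation: the spring force magnitude follows the common |R_{i+1} − R_i| − |R_i − R_{i−1}| form, and the endpoints are simply held fixed:

```python
import numpy as np

def neb_forces(R, V, gradV, k=1.0):
    """NEB forces (eqs. 2-21 to 2-25) for a chain of replicas R[i]
    (flattened coordinate arrays) with energies V[i].  Endpoint
    replicas receive zero force (kept fixed)."""
    n = len(R)
    F = [np.zeros_like(R[0]) for _ in range(n)]
    for i in range(1, n - 1):
        # Energy-based tangent, eq. 2-24 / 2-25.
        if V[i + 1] > V[i] > V[i - 1]:
            tau = R[i + 1] - R[i]
        elif V[i + 1] < V[i] < V[i - 1]:
            tau = R[i] - R[i - 1]
        else:
            dmax = max(abs(V[i + 1] - V[i]), abs(V[i - 1] - V[i]))
            dmin = min(abs(V[i + 1] - V[i]), abs(V[i - 1] - V[i]))
            if V[i + 1] > V[i - 1]:
                tau = (R[i + 1] - R[i]) * dmax + (R[i] - R[i - 1]) * dmin
            else:
                tau = (R[i + 1] - R[i]) * dmin + (R[i] - R[i - 1]) * dmax
        tau = tau / np.linalg.norm(tau)
        # Perpendicular true force, eq. 2-22.
        g = gradV(R[i])
        f_perp = -g + np.dot(g, tau) * tau
        # Parallel spring force along the tangent (eq. 2-21/2-23).
        f_spring = k * (np.linalg.norm(R[i + 1] - R[i])
                        - np.linalg.norm(R[i] - R[i - 1]))
        F[i] = f_perp + f_spring * tau
    return F
```

For a chain of equally spaced replicas lying exactly on a straight MEP of a quadratic surface, both components vanish and the chain is stationary, which is the converged state the projection is designed to reach.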

Figure 2-2. Two-dimensional potential energy surface of LEPS potential coupled to a harmonic oscillator14. LEPS harmonic oscillator potential represents the energy of a system of four atoms. Atoms A and C are fixed. Atom B is allowed to move on the line connecting A and C. Another degree of freedom in terms of a harmonic oscillator is introduced by adding atom D that is coupled to B (refer to appendix A of reference 14). The X and Y axes in the figure represent the distance between atoms A and B and atoms B and D, respectively. The figure on the left shows a schematic representation of an initial path (the dashed straight line) and the replicas along that path (the circles). The force decoupling of one of the replicas is shown. The figure on the right represents a schematic MEP.

The minimization step would take the chain towards the local minimum that is most accessible to the initial path. Simulated annealing can increase the chance of NEB simulations

converging to the global MEP rather than a local MEP. Supervision may be required to prevent temperature increases from producing unphysical structures during the simulation. In the final phase of the simulation, a gradual decrease of the temperature to zero freezes the replicas along the minimum transition path. It is possible that the biological system of interest has multiple pathways connecting the two metastable states. This necessitates statistical analysis of multiple independent simulations resulting in different pathways55;56, which once again illustrates the importance of fast simulation techniques.

2.3 String Method

An alternative method for predicting transition pathways is the use of a parametrized string that evolves to the most probable path between two locations on the energy surface, i.e., the MEP. The original string method15 (the zero temperature string method) was developed for smooth energy landscapes. Within two years of their original paper, E et al. published the finite temperature string method4, which incorporates temperature effects for rough energy surfaces. The string evolves to the MEP under the potential forces while a specific parametrization is imposed. Arc-length parameterization results in equally spaced points along the string, while energy-weighted arc length yields better resolution around the transition state region. For specifics regarding the parametrization of a curve, refer to Appendix B. The mathematical derivations for the zero and finite temperature string methods from E et al.15;4 are presented below. To derive the equations for the dynamics of the string, consider the Langevin equation with a friction coefficient γ and a white noise ζ satisfying:

\[ F = -\nabla V(q) - \gamma \dot{q} + \zeta(t) \tag{2-26} \]

\[ \langle \zeta_i(t)\, \zeta_j(t') \rangle = 2 \gamma k_B T\, \delta_{ij}\, \delta(t - t') \tag{2-27} \]
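Equations 2-26 and 2-27 can be integrated numerically. The sketch below uses a minimal Euler-Maruyama scheme for the overdamped limit (the inertial term is dropped, so γq̇ = −∇V + ζ); all function names and parameter values are illustrative choices of our own:

```python
import numpy as np

def overdamped_langevin(grad_v, q0, n_steps, dt, gamma=1.0, kt=1.0, seed=0):
    """Euler-Maruyama trajectory for gamma*dq/dt = -grad V(q) + zeta(t),
    with <zeta_i(t) zeta_j(t')> = 2*gamma*kB*T*delta_ij*delta(t-t')."""
    rng = np.random.default_rng(seed)
    q = np.atleast_1d(np.array(q0, dtype=float))
    traj = np.empty((n_steps + 1,) + q.shape)
    traj[0] = q
    for n in range(n_steps):
        noise = rng.standard_normal(q.shape)
        # drift from the potential plus thermal noise of the right strength
        q = q - grad_v(q) * dt / gamma + np.sqrt(2.0 * kt * dt / gamma) * noise
        traj[n + 1] = q
    return traj
```

For a harmonic well V(q) = q²/2, the stationary distribution approaches the Boltzmann distribution with variance k_BT, which is a convenient sanity check.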

Assuming the potential V(q) has at least two minima, the purpose is to find the MEP going from an initial minimum state (A) to a final minimum state (B). A string φ* lies on the MEP if it satisfies

\[ (\nabla V)^{\perp}(\varphi^{*}) = 0 \tag{2-28} \]

where (∇V)⊥ is the component of the potential gradient in the hyperplane perpendicular to the string. The MEP is the stationary solution of the evolution of the string in the perpendicular hyperplane, whose normal velocity is

\[ u^{\perp} = -(\nabla V)^{\perp}(\varphi) \tag{2-29} \]

Assuming an intrinsic parameterization of the string for numerical purposes, equation 2-29 can be written as

\[ \varphi_t = -(\nabla V)^{\perp}(\varphi) + r \hat{t} \tag{2-30} \]

with \(\hat{t} = \varphi_\alpha / |\varphi_\alpha|\) being the unit tangent along φ. The scalar term r = r(α, t) is a Lagrange multiplier responsible for maintaining the intrinsic parameterization through an imposed constraint. If the parameterization is normalized arc length, with α = 0 at the initial state and α = 1 at the final state, then the constraint is

\[ \bigl( |\varphi_\alpha| \bigr)_\alpha = 0 \tag{2-31} \]
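As an illustration of this constraint, a toy zero-temperature string update can alternate a steepest-descent move with a redistribution of the nodes to equal arc length, a simplified stand-in for the Lagrange multiplier term of equation 2-30 (the double-well potential and all names below are our own):

```python
import numpy as np

def string_step(phi, grad_v, dt):
    """One string iteration: move interior nodes downhill, then
    reparametrize all nodes to uniform arc length (endpoints fixed)."""
    phi = phi.copy()
    phi[1:-1] -= dt * np.array([grad_v(p) for p in phi[1:-1]])
    seg = np.linalg.norm(np.diff(phi, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    s_new = np.linspace(0.0, s[-1], len(phi))        # uniform arc-length grid
    return np.column_stack([np.interp(s_new, s, phi[:, d])
                            for d in range(phi.shape[1])])

# double well V = (x^2 - 1)^2 + y^2 with minima at (-1, 0) and (1, 0)
grad = lambda p: np.array([4.0 * p[0] * (p[0] ** 2 - 1.0), 2.0 * p[1]])
# bowed initial guess connecting the two minima
phi = np.column_stack([np.linspace(-1.0, 1.0, 21),
                       np.sin(np.linspace(0.0, np.pi, 21))])
for _ in range(2000):
    phi = string_step(phi, grad, 0.01)
```

After the loop, the interior nodes have relaxed onto the straight MEP y = 0 and are equally spaced along it.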

Energy-weighted arc-length parameterization, however, is achieved using the constraint

\[ \bigl[ f\bigl(V(\varphi)\bigr)\, |\varphi_\alpha| \bigr]_\alpha = 0 \tag{2-32} \]

where f(V(φ)) is a function with a negative first derivative. The finite temperature string method is a generalization of the zero temperature string method which, instead of evolving one string, evolves an ensemble of strings \(\{\varphi^w\}\), each of which obeys

\[ \varphi_t^w = -(\nabla V)^{\perp}(\varphi^w) + (\eta^w)^{\perp} + r \hat{t} \tag{2-33} \]

Here ηw is a white noise satisfying

\[ \langle \eta_i^w(\alpha, t)\, \eta_j^w(\alpha', t') \rangle = 2 k_B T\, \delta_{ij}\, \delta(t - t')\, \delta_{\alpha \alpha'} \tag{2-34} \]

The ensemble mean, φ(α), is defined as the averaged curve over the ensemble of strings:

\[ \langle \varphi^w(\alpha) \rangle \equiv \varphi(\alpha) \tag{2-35} \]

The density function for this system is given by

\[ \rho(q, \alpha) = Q^{-1}(\alpha)\, \exp(-\beta V(q))\, \delta_{S(\alpha)}(q) \tag{2-36} \]

with the partition function Q acting as the normalization constant:

\[ Q(\alpha) = \int_{S(\alpha)} \exp(-\beta V(q))\, d\sigma \tag{2-37} \]

The ensemble mean φ(α) and the effective transition tube radius (the standard deviation) R(α) can then be determined from

\[ \varphi(\alpha) = Q^{-1}(\alpha) \int_{S(\alpha)} q\, \exp(-\beta V(q))\, d\sigma \tag{2-38} \]

and

\[ R^2(\alpha) = \lambda\, Q^{-1}(\alpha) \int_{S(\alpha)} | q - \varphi(\alpha) |^2\, \exp(-\beta V(q))\, d\sigma \tag{2-39} \]

with λ being a free constant parameter of order unity.

2.4 Free Energy and Transition Rate Calculations

Once the MEP has been identified, the free energy profile along the path can be determined through various free energy calculation techniques, such as umbrella sampling57. A rough estimate of the MEP often suffices to identify the region that needs to be sampled via umbrella sampling. The free energy difference between two points along the path can be obtained from

\[ F(\alpha) - F(0) = -k_B T \ln\!\left[ \frac{Q(\alpha)}{Q(0)} \right] \tag{2-40} \]

where Q(α) is the partition function restricted to the hyperplane S(α) normal to the MEP:

\[ Q(\alpha) = \int_{S(\alpha)} \exp(-\beta V(q))\, d\sigma \tag{2-41} \]

Combining equations 2-40 and 2-41 results in (for the derivation refer to Appendix C):

\[ F(\alpha) - F(0) = \int_0^{\alpha} \bigl\langle (\nabla V \cdot \hat{t}) \bigl[ (\hat{t} \cdot \varphi)_{\alpha} - \hat{t}_{\alpha} \cdot \varphi \bigr] \bigr\rangle\, d\alpha \tag{2-42} \]

where the angle bracket denotes the expectation value, i.e., the ensemble average over the distribution restricted to the hyperplane S(α). Transition rates can be calculated in terms of the free energy difference between the initial state and the transition state:

\[ k = K e^{-\beta \Delta F} \tag{2-43} \]

where K can be a constant defining the rate based on the frequency of collisions, as in the Arrhenius equation, or, if the assumptions of transition state theory hold58, it can take the value \(k_B T / h\). More accurate expressions for K can be derived based on Kramers' argument (see Chapter 9 of reference 59). Since our interest lies in finding the relative transition rates of mutated biomolecules with respect to each other, a discussion of how to derive K accurately is beyond the focus of this study.

2.5 Computational Methods of Free Energy Calculations

This section establishes the theoretical background for some of the most popular computational methods of free energy calculations. As explained in section 2.1.3, the Helmholtz free energy of a canonical ensemble can be calculated from its partition function:

\[ A(N,V,T) = -k_B T \ln Q(N,V,T) \tag{2-44} \]

In most systems, the contribution of the kinetic energy to the canonical partition function can be calculated analytically by performing a trivial integration. Hence, it suffices to devote our attention to the contribution of the potential energy to the free energy. This contribution depends on the configuration of the system and is accordingly called the configurational free energy. In this section, the terms Hamiltonian and potential energy may thus be used interchangeably; likewise, free energy and partition function refer to the configurational contributions to the free energy and the partition function.

2.5.1 Free Energy Perturbation

A transformation in a system can be described as a change in its Hamiltonian. Consider a system transforming from an initial state 'a' with the Hamiltonian H_a to a final state 'b' with the Hamiltonian H_b, for which the change in potential energy is ΔV = V_b − V_a. This transformation causes a change in the configurational free energy of the system that can be calculated from:

\[
\begin{aligned}
\Delta A &= A_b - A_a \\
&= -k_B T \ln \frac{Q_b(N,V,T)}{Q_a(N,V,T)} \\
&= -k_B T \ln \frac{\int \exp(-V_b(q)/k_B T)\, dq}{\int \exp(-V_a(q)/k_B T)\, dq} \\
&= -k_B T \ln \frac{\int \exp(-\Delta V(q)/k_B T)\, \exp(-V_a(q)/k_B T)\, dq}{\int \exp(-V_a(q)/k_B T)\, dq} \\
&= -k_B T \ln \bigl\langle \exp(-\Delta V(q)/k_B T) \bigr\rangle_a
\end{aligned}
\tag{2-45}
\]

where \(\langle \exp(-\Delta V(q)/k_B T) \rangle_a\) is the ensemble average of the Boltzmann factor in the reference state 'a'. Equation 2-45 is the free energy perturbation (FEP) method. It is also called exponential averaging, or the Zwanzig equation, after Robert Zwanzig, who first derived it60. Equation 2-45 is usually solved numerically for systems that include thousands of interacting particles. For this purpose, thousands of structures are sampled using molecular dynamics or Monte Carlo simulations, and single-point energies of these structures are calculated with the two different Hamiltonians. If the phase space overlap between the two Hamiltonians is adequate, i.e., if the structures sampled with one Hamiltonian are representative of structures created with the other, the FEP formulation is advantageous: it is then possible to generate numerous statistically independent structures at the cheaper Hamiltonian and calculate the free energy differences at the expensive Hamiltonian. However, the Boltzmann factors in equation 2-45 are often dominated by the contributions of a few structures with large energy values, which leads to large statistical instabilities. These instabilities can be suppressed by expanding the exponential term using either the Taylor series or the cumulant expansion61. Expanding the exponential term with a Taylor series results in

\[ \langle \exp(-\beta \Delta V) \rangle_a = \Bigl\langle \sum_{n=0}^{\infty} \frac{(-\beta)^n}{n!}\, \Delta V^n \Bigr\rangle_a = \sum_{n=0}^{\infty} \frac{(-\beta)^n}{n!}\, \langle \Delta V^n \rangle_a \tag{2-46} \]

in which β = 1/(k_B T). If the difference in the potential energies, ΔV, is small, the expansion can be approximated using the first few terms of the series. Using the cumulant expansion theorem61, the exponential term can be written as

\[ \langle \exp(-\beta \Delta V) \rangle_a = \exp\Bigl\{ \sum_{n=1}^{\infty} \frac{(-\beta)^n}{n!}\, \langle \Delta V^n \rangle_{a,c} \Bigr\} \tag{2-47} \]

in which \(\langle \Delta V^n \rangle_{a,c}\) is the nth cumulant of ΔV in the reference state 'a'. The cumulants can be written as averages of powers of ΔV; up to third order they are given by

\[
\begin{aligned}
\langle \Delta V \rangle_{a,c} &= \langle \Delta V \rangle_a \\
\langle \Delta V^2 \rangle_{a,c} &= \langle \Delta V^2 \rangle_a - \langle \Delta V \rangle_a^2 \\
\langle \Delta V^3 \rangle_{a,c} &= \langle \Delta V^3 \rangle_a - 3 \langle \Delta V^2 \rangle_a \langle \Delta V \rangle_a + 2 \langle \Delta V \rangle_a^3
\end{aligned}
\tag{2-48}
\]
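The exponential average of equation 2-45 and its second-order cumulant truncation can be compared numerically. For a Gaussian-distributed ΔV the cumulant expansion terminates at second order, so the two estimates should agree (a sketch on synthetic data; function names are our own):

```python
import numpy as np

def fep_exponential(dv, beta=1.0):
    """beta * Delta A from the Zwanzig relation (exponential averaging)."""
    return -np.log(np.mean(np.exp(-beta * dv)))

def fep_cumulant2(dv, beta=1.0):
    """Second-order cumulant truncation: beta*<dV> - beta^2 * var(dV) / 2."""
    return beta * np.mean(dv) - 0.5 * beta ** 2 * np.var(dv)

rng = np.random.default_rng(0)
dv = rng.normal(1.0, 0.5, size=200_000)   # synthetic Gaussian energy gaps
# exact result for a Gaussian gap: beta*mu - beta^2*sigma^2/2 = 0.875
```

Both estimators land near 0.875 here; for strongly non-Gaussian or poorly overlapping distributions, the exponential average is instead dominated by a few rare low-energy samples, which is the instability discussed above.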

Including only the first order of equation 2-47 recovers equation 2-46:

\[
\begin{aligned}
\exp(-\beta \langle \Delta V \rangle_{a,c}) &= \exp(-\beta \langle \Delta V \rangle_a) \\
&= \sum_{n=0}^{\infty} \frac{(-\beta)^n}{n!}\, \langle \Delta V^n \rangle_a \\
&= \Bigl\langle \sum_{n=0}^{\infty} \frac{(-\beta)^n}{n!}\, \Delta V^n \Bigr\rangle_a \\
&= \langle \exp(-\beta \Delta V) \rangle_a
\end{aligned}
\tag{2-49}
\]

Hence, the cumulant expansion truncated at first order is equivalent to the complete Taylor expansion.

2.5.2 Thermodynamic Integration

The Hamiltonian of a system transitioning from an initial state 'a' to a final state 'b' can be written as a function of a parameter λ that varies between 0 and 1, producing a Hamiltonian that transforms from V_a to V_b:

\[ H(\lambda) = \lambda V_b + (1 - \lambda) V_a \tag{2-50} \]

For intermediate values of λ, the Hamiltonian is a mix of the initial and final states and may not represent a physical system. This mathematical construct, however, allows the calculation of the free energy. The derivative of the free energy with respect to λ is:

\[
\begin{aligned}
\frac{d}{d\lambda} A(\lambda) &= -k_B T\, \frac{d}{d\lambda} \ln Q(\lambda) \\
&= -k_B T\, \frac{d}{d\lambda} \ln \int \exp(-V(q,\lambda)/k_B T)\, dq \\
&= \frac{\int \frac{\partial V(q,\lambda)}{\partial \lambda} \exp(-V(q,\lambda)/k_B T)\, dq}{\int \exp(-V(q,\lambda)/k_B T)\, dq} \\
&= \left\langle \frac{\partial V(q,\lambda)}{\partial \lambda} \right\rangle_{\lambda}
\end{aligned}
\tag{2-51}
\]

Hence the change in the free energy can be calculated from:

\[
\Delta A = \int_{\lambda=0}^{\lambda=1} \frac{d A(\lambda)}{d\lambda}\, d\lambda
= \int_{\lambda=0}^{\lambda=1} \left\langle \frac{\partial V(q,\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda
\tag{2-52}
\]
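Equation 2-52 can be sketched numerically: the window averages ⟨∂V/∂λ⟩_λ are integrated over λ with the trapezoidal rule. Here a callable stands in for the per-window simulation averages (an illustration under that assumption, not a production work-flow):

```python
import numpy as np

def thermodynamic_integration(mean_dvdl, lambdas):
    """Trapezoidal quadrature of <dV/dlambda> over the lambda windows.
    mean_dvdl : callable returning the ensemble average at a given lambda
    lambdas   : increasing sequence of lambda window values in [0, 1]
    """
    lambdas = np.asarray(lambdas, dtype=float)
    vals = np.array([mean_dvdl(l) for l in lambdas])
    # sum of trapezoid areas between consecutive windows
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(lambdas)))
```

For the linear mixing of equation 2-50, ∂V/∂λ = V_b − V_a, so a toy case where ⟨V_b − V_a⟩_λ = 2λ integrates to ΔA = 1.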

2.5.3 Bennett Acceptance Ratio

The Bennett acceptance ratio (BAR) method, proposed by Charles Bennett in 197662, calculates the free energy difference between two states using configurations from both. Starting from equation 2-45, for a function W(q) that is everywhere finite, it follows that:

\[
\begin{aligned}
\Delta A &= A_b - A_a = -k_B T \ln \frac{Q_b}{Q_a} \\
&= -k_B T \ln \left[ \frac{Q_b}{Q_a} \cdot \frac{\int W \exp(-U_b - U_a)\, dq}{\int W \exp(-U_a - U_b)\, dq} \right] \\
&= -k_B T \ln \frac{\langle W \exp(-U_b) \rangle_a}{\langle W \exp(-U_a) \rangle_b}
\end{aligned}
\tag{2-53}
\]

in which, for simplicity, U(q) = V(q)/k_B T is the scaled potential energy.

Equation 2-53 can be estimated by finite-sample averages over n_a and n_b statistically independent samples from the U_a and U_b ensembles, respectively. The standard error of the mean for this estimate can be calculated from the first-order approximation of the variance using Taylor series expansions:

\[
\begin{aligned}
\overline{(\Delta A - \Delta A_{est})^2} = \sigma^2_{\Delta A}
&= \mathrm{var}\!\left( -k_B T \ln \frac{\langle W \exp(-U_b) \rangle_a}{\langle W \exp(-U_a) \rangle_b} \right) \\
&= \mathrm{var}\bigl( -k_B T \ln \langle W \exp(-U_b) \rangle_a \bigr) + \mathrm{var}\bigl( -k_B T \ln \langle W \exp(-U_a) \rangle_b \bigr) \\
&= (k_B T)^2 \left[ \frac{\langle W^2 \exp(-2U_b) \rangle_a - \langle W \exp(-U_b) \rangle_a^2}{n_a \langle W \exp(-U_b) \rangle_a^2} + \frac{\langle W^2 \exp(-2U_a) \rangle_b - \langle W \exp(-U_a) \rangle_b^2}{n_b \langle W \exp(-U_a) \rangle_b^2} \right] \\
&= (k_B T)^2 \left[ \frac{\int \bigl( Q_b/n_b\, \exp(-U_a) + Q_a/n_a\, \exp(-U_b) \bigr)\, W^2 \exp(-U_b - U_a)\, dq}{\bigl[ \int W \exp(-U_b - U_a)\, dq \bigr]^2} - \frac{1}{n_a} - \frac{1}{n_b} \right]
\end{aligned}
\tag{2-54}
\]

Taking the functional derivative of the above expression with respect to W and setting it to zero, the W that minimizes the error is found to be:

\[ W(q) = \frac{\mathrm{const}}{Q_b/n_b\, \exp(-U_a) + Q_a/n_a\, \exp(-U_b)} \tag{2-55} \]

and equation 2-53 can be written as:

\[ \Delta A = -k_B T \left[ \ln \frac{\langle f(U_b - U_a + C) \rangle_a}{\langle f(U_a - U_b - C) \rangle_b} + C \right] \tag{2-56a} \]

\[ C = \ln \frac{Q_b\, n_a}{Q_a\, n_b} \tag{2-56b} \]

in which

\[ f(x) = \frac{1}{1 + \exp(x)} \tag{2-57} \]

is the Fermi function. For finite sample sizes, equations 2-56a and 2-56b can be solved self-consistently:

\[ \Delta A_{est} = -k_B T \left[ \ln \frac{\sum_a f(U_b - U_a + C)}{\sum_b f(U_a - U_b - C)} + C + \ln \frac{n_b}{n_a} \right] \tag{2-58a} \]

\[ \Delta A_{est} = -k_B T \left[ C + \ln \frac{n_b}{n_a} \right] \tag{2-58b} \]

The self-consistency criterion is satisfied when equation 2-58a equals equation 2-58b, and hence:

\[ \sum_a f(U_b - U_a + C) = \sum_b f(U_a - U_b - C) \tag{2-59} \]
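The self-consistency condition 2-59 can be solved by bisection on C, after which equation 2-58b yields ΔA. The sketch below (function and variable names are our own) operates on scaled potential differences evaluated on samples from the two ensembles:

```python
import numpy as np

def fermi(x):
    return 1.0 / (1.0 + np.exp(x))

def bar_beta_delta_a(du_f, du_r, lo=-50.0, hi=50.0):
    """beta * Delta A from the BAR self-consistency condition.
    du_f : U_b - U_a evaluated on samples from ensemble 'a'
    du_r : U_a - U_b evaluated on samples from ensemble 'b'
    """
    na, nb = len(du_f), len(du_r)
    # g(C) is strictly decreasing in C, so bisection finds the root of eq. 2-59
    def g(c):
        return np.sum(fermi(du_f + c)) - np.sum(fermi(du_r - c))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    # eq. 2-58b: Delta A_est = -kT * (C + ln(nb/na)); return it in units of kT
    return -(c + np.log(nb / na))
```

Two properties make a convenient sanity check: identical forward and reverse distributions must give ΔA = 0, and shifting the two energy-gap arrays by ±s shifts the answer by exactly s.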

2.6 Indirect Approach to Free Energy Calculations

Predicting accurate free energy differences is limited by the accuracy of the computational method and by adequate sampling of the configurational space. High-level quantum mechanical (QM) techniques, however accurate, are computationally expensive and are not widely used for free energy simulations (FES). But since the calculation of free energy differences depends only on the end states, it is possible to conduct these calculations at a reference potential and correct the end states with a higher level of theory. This indirect strategy was first developed by Gao63;64 and Warshel65;66 and further refined by others67;68;69;70. The method uses a thermodynamic cycle to find the free energy difference between two states with a more affordable method and then applies a correction to the end states. Figure 2-3 shows the thermodynamic cycle involved in indirect free energy calculations. To calculate the free energy difference between the two states 'a' and 'b' at a high level, it suffices to find this quantity at a low level and apply the end-state corrections. That is:

\[ \Delta A(a_{high} \to b_{high}) = \Delta A(a_{low} \to b_{low}) + \Delta A(a_{high} \to a_{low}) + \Delta A(b_{low} \to b_{high}) \tag{2-60} \]

Here, a high level refers to an accurate but computationally expensive QM or QM/MM method, while a low level indicates a fast but less accurate molecular mechanical (MM) method.

Figure 2-3. Thermodynamic cycle involved in the indirect approach of free energy calculations.

This approach reduces the cost and complexity of the calculations, since it decreases the number of QM computations significantly. However, obtaining an accurate value for the end-state corrections is not straightforward. The FEP method requires the simulations to be performed only at the reference MM potential. Recalculating the potential energies at the QM level for the independent structures, which is significantly faster than generating the distribution itself, then provides the answer. However, FEP is accurate only in the limit of infinite sampling. If the sampled structures do not cover the potential surface, or if the two levels of theory do not overlap significantly, FEP has convergence issues. Different procedures have been suggested to tackle this problem, such as using interaction energy differences rather than total potential energy differences71;72, or fixing the internal coordinates of the QM region73;74. The BAR method is more efficient than FEP and converges even when the overlap is small, but it requires performing simulations at the QM level, which is an immediate setback. Only a method that is both fast and accurate is worth considering for such simulations.
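The bookkeeping of the cycle in equation 2-60, with one-sided exponential-averaging end-state corrections evaluated on low-level samples, can be sketched as follows (synthetic inputs; function names are our own):

```python
import numpy as np

def zwanzig(dv, beta=1.0):
    """beta * Delta A for switching the Hamiltonian on fixed samples
    (exponential averaging, eq. 2-45)."""
    return -np.log(np.mean(np.exp(-beta * dv)))

def indirect_beta_delta_a(beta_da_low, dv_a, dv_b):
    """beta * Delta A(a_high -> b_high) assembled via the cycle of eq. 2-60.
    beta_da_low : beta * Delta A(a_low -> b_low), from cheap simulations
    dv_a, dv_b  : (V_high - V_low) on low-level samples of states a and b
    """
    corr_a = zwanzig(dv_a)   # beta * Delta A(a_low -> a_high)
    corr_b = zwanzig(dv_b)   # beta * Delta A(b_low -> b_high)
    # Delta A(a_high -> a_low) = -Delta A(a_low -> a_high)
    return beta_da_low - corr_a + corr_b
```

If the two end-state corrections are identical, they cancel and the high-level result equals the low-level one, which reflects the cycle closure.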

2.7 Feed-Forward Neural Networks

A feed-forward neural network is a computational model composed of multiple units called neurons, grouped into layers. The layers are connected such that the output of one layer is the input to the next. The term neural network originates in neurophysiology and denotes a model that imitates the brain's ability at pattern recognition.


Figure 2-4. A densely connected feed-forward representation of a neural network. Each layer is comprised of multiple neurons. The output of one layer acts as an input of the next layer. The arrows are the links between the neurons, and their direction is the direction of the forward propagation.

Figure 2-4 shows a diagram representing a three-layered neural network. These three layers consist of an output layer and two hidden layers, which are confined between the input values and the output layer. The arrows connecting the neurons are the links between consecutive nodes. Each link carries one of the weights of a linear regression model, as in equation 2-61a, and each neuron adds a bias.

\[ z_j^{(l)} = \sum_i w_{ji}^{(l)}\, a_i^{(l-1)} + b_j^{(l)} \tag{2-61a} \]

\[ a_j^{(l)} = g^{(l)}\bigl( z_j^{(l)} \bigr) \tag{2-61b} \]

in which z_j is the jth output variable, a_i is the ith activation or input variable, w_{ji} is the weight, b_j is the bias, and g is the nonlinear activation function; the superscripts indicate the layer number. The logistic sigmoid (σ), the hyperbolic tangent (tanh), and rectified linear units (ReLU) are three of the numerous possible activation functions75. The nonlinearity of the activation function is what allows the network to learn complex behaviors. A linear combination of multiple linear functions is itself linear; hence, if the activations were linear, all the hidden layers could be merged into the output layer, collapsing the network into a simple linear regression problem. Equation 2-61 can be written in a compact form as:

\[ Z^{(l)} = W^{(l)T} A^{(l-1)} + b^{(l)} \tag{2-62a} \]

\[ A^{(l)} = g^{(l)}\bigl( Z^{(l)} \bigr) \tag{2-62b} \]

From this equation it is apparent that the activations of each layer act as input to the next layer. The process of evaluating equation 2-62 from the input variables to the output layer is called forward propagation; this is the direction of information flow in the network. Training a neural network is the process of optimizing the weights and biases such that the output variables become close to the target values for a given set of input variables. This task is achieved through the minimization of a cost function, which measures the difference between the output and target values. That is, for a set of input variables \(\{x_i\}_{i=1}^{N}\) corresponding to a set of target values \(\{y_i\}_{i=1}^{N}\), the sum-of-squares error function can be written as:

\[ E(W, b) = \frac{1}{2} \sum_{i=1}^{N} \bigl\{ y_i - \hat{y}_i(x_i, W, b) \bigr\}^2 \tag{2-63} \]

in which \(\{\hat{y}_i\}_{i=1}^{N}\) are the output variables, i.e., the estimates of the target values. This cost function corresponds to the maximum likelihood solution for target values with a Gaussian distribution75. If the output variables of the network are nonlinear, the sum-of-squares error function can be non-convex, which makes finding the global minimum through gradient descent difficult. For classification problems, cross-entropy error functions are therefore better suited76;77. Optimizing the network parameters is achieved by finding a solution to:

∇E(W, b) = 0 (2-64)

Solving equation 2-64 is an optimization problem for a continuous nonlinear function in a high-dimensional parameter space, which can be tackled with an iterative procedure. First, the cost function is evaluated with randomly chosen initial values for the weights and biases. Second, the derivatives of the cost function with respect to these parameters are calculated:

\[ \frac{\partial E(W,b)}{\partial W^{(l)}} = \frac{\partial E(W,b)}{\partial Z^{(l)}}\, \frac{\partial Z^{(l)}}{\partial W^{(l)}} = \frac{\partial E(W,b)}{\partial Z^{(l)}}\, A^{(l-1)T} \tag{2-65a} \]

\[ \frac{\partial E(W,b)}{\partial b^{(l)}} = \frac{\partial E(W,b)}{\partial Z^{(l)}}\, \frac{\partial Z^{(l)}}{\partial b^{(l)}} \tag{2-65b} \]

in which:

\[ \frac{\partial E(W,b)}{\partial Z^{(l)}} = \frac{\partial E(W,b)}{\partial A^{(l)}}\, \frac{\partial A^{(l)}}{\partial Z^{(l)}} = \frac{\partial E(W,b)}{\partial A^{(l)}}\, g^{(l)\prime}(Z^{(l)}) = W^{(l+1)T}\, \frac{\partial E(W,b)}{\partial Z^{(l+1)}}\, g^{(l)\prime}(Z^{(l)}) \tag{2-66} \]

This process is called back-propagation. The parameters are then updated accordingly; the simplest approach is gradient descent:

\[ W^{(t+1)} = W^{(t)} - \alpha\, \frac{\partial E(W,b)}{\partial W^{(t)}} \tag{2-67a} \]

\[ b^{(t+1)} = b^{(t)} - \alpha\, \frac{\partial E(W,b)}{\partial b^{(t)}} \tag{2-67b} \]

In equation 2-67, the superscripts denote the iteration steps, and α is the learning rate, or step size. This iterative procedure is repeated until convergence is achieved.

2.8 Active Learning

Active learning is a procedure in which the learner generates its own training data based on some action selection criteria; in other words, the learner actively cooperates in the learning. The action selection incorporates new data into the training from regions where the model prediction is poor or has low confidence, among other factors78. A sophisticated selection strategy allows the network to explore the regions where it lacks information and drastically reduces the amount of training data required to improve the accuracy of prediction79: the more informative the sampled data, the cheaper the training becomes. One common approach to selective sampling is query by committee, henceforth referred to as the QBC algorithm80. In query learning, an ensemble of models cooperates in estimating the learner's uncertainty in prediction. The concept of QBC is as follows. The members of the committee are trained independently of one another. After every training iteration, each member predicts the values of the samples in the test set, and the members then compare their predicted values. If the predictions lie within a certain threshold of each other, the model has learned the pattern. Otherwise, a data generation algorithm produces more data of the same nature and incorporates them into the training set for the next iteration. This cycle is repeated until the desired accuracy is reached. This general work-flow is demonstrated in figure 2-5.
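The committee-disagreement test at the heart of QBC reduces to a few lines; here the spread is measured as the standard deviation across member predictions (a toy sketch, not the ANI implementation; names are our own):

```python
import numpy as np

def qbc_select(predictions, threshold):
    """Query-by-committee selection.
    predictions : (n_members, n_samples) array of committee outputs
    Returns the indices of samples whose committee spread exceeds the
    threshold; these are the samples to label and add to the training set.
    """
    spread = np.std(predictions, axis=0)
    return np.where(spread > threshold)[0]
```

One active-learning cycle then trains the members, runs this selection on newly generated conformations, labels the flagged ones with the reference method, and repeats until nothing is flagged.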

Figure 2-5. Active learning work-flow. After each round of training, the QBC criterion is tested on the newly generated data. If the predictions are within tolerance, the training is terminated. Otherwise, the labeled data on which the QBC criterion failed are included in the next training iteration.

2.9 Transfer Learning

In machine learning, a common assumption is that the training and the test data belong to the same feature space and distribution. This assumption, however convenient, may not be practical. It may be beneficial to reuse the available training data instead of building a new set. The ability to convey knowledge from a previously trained model in one domain to trigger the learning process of a different task is called transfer learning81. The purpose is to generalize the acquired knowledge such that it is applicable beyond a specific task and domain. Domain and task are common keywords in this field. A domain \(\mathcal{D}\) is a two-component object consisting of the feature space \(\mathcal{X}\) and the marginal probability P(X). A task \(\mathcal{T}\) also consists of two components: the label space \(\mathcal{Y}\) and the objective probability P(Y|X). Hence:

\[ \mathcal{D} = \{ \mathcal{X},\, P(X) \} \tag{2-68a} \]

\[ \mathcal{T} = \{ \mathcal{Y},\, P(Y \mid X) \} \tag{2-68b} \]

In the above definitions, X and Y are the n-dimensional input data and their corresponding label information:

\[ X = \{ x_i \}_{i=1}^{n}, \quad x_i \in \mathcal{X} \tag{2-69a} \]

\[ Y = \{ y_i \}_{i=1}^{n}, \quad y_i \in \mathcal{Y} \tag{2-69b} \]

In this notation, the definition of transfer learning is as follows: given a source domain \(\mathcal{D}_S\) specific to a source task \(\mathcal{T}_S\), and a different target domain \(\mathcal{D}_T\) coupled to a different target task \(\mathcal{T}_T\), transfer learning tries to incorporate the model knowledge gained from the source to improve the predictions performed on the target. Figure 2-6 shows a schematic representation of transfer learning. In this scenario, the knowledge transferred from the source work-flow reduces the amount of data required for training the target network.

2.10 ANI Neural Network Potentials

This section explains the theory and model architecture of the ANI neural network potentials, as well as the specifics of training a new model from the available ANI models. More information regarding the ANI potentials and models can be found in the related references82;83;84;85.

2.10.1 Network Architecture

In ANI, the local environment of each atom is captured by what are called atomic environment vectors (AEVs)82. These AEVs are generated from symmetry functions originating from the work of Behler and Parrinello86. The radial and angular nature of these symmetry functions encodes information about the local neighborhood of each atom. Figure 2-7 shows a representation of the radial symmetry functions. The element-wise radial functions indicate the strength of the effect of each atom on its surroundings. In figure 2-7, to find the contributions of the adjacent atoms to the energy of the carbon atom, the coordinates of the carbon atom should be input to the neighboring atoms' radial functionals.

Figure 2-6. Transfer learning work-flow. The knowledge acquired from training the source network reduces the size of the target database.

Figure 2-7. A representation of the radial symmetry functions centered at each atom's position. As the radial distance increases, the effect of the environment becomes less significant, which is illustrated by the dimming of the colors.

For a system containing N atoms, the radial (\(G_m^R\)) and angular (\(G_m^A\)) terms are as follows:

\[ G_m^R = \sum_{j \ne i}^{N} e^{-\eta (R_{ij} - R_s)^2}\, f_C(R_{ij}) \tag{2-70a} \]

\[ G_m^A = \sum_{j,k \ne i}^{N} \bigl( 1 + \cos(\theta_{ijk} - \theta_s) \bigr)^{\zeta}\, e^{-\eta \left( \frac{R_{ij} + R_{ik}}{2} - R_s \right)^2} f_C(R_{ij})\, f_C(R_{ik}) \tag{2-70b} \]

In equation 2-70, η is a parameter dictating the width of the Gaussian distributions, R_s and θ_s are radial and angular shift parameters, respectively, and ζ is a parameter that controls the width and the peaks of the angular distributions. f_C(R_{ij}) is a continuous cutoff function with a continuous first derivative:

\[ f_C(R_{ij}) = \begin{cases} 0.5 \cos\!\left( \dfrac{\pi R_{ij}}{R_C} \right) + 0.5 & R_{ij} \le R_C \\[4pt] 0.0 & R_{ij} > R_C \end{cases} \tag{2-71} \]

in which R_C is a cutoff radius, set to 4.6 Å and 3.1 Å for the radial and angular symmetry functions, respectively. There is one radial symmetry function for each atomic number and one angular symmetry function for each atomic number pair. For the ANI-1 potential82, 32 radial shift values are used in the radial part, and 8 radial plus 8 angular shift values are used in the angular part. For a network trained on four elements (H, C, N, O), this adds up to a total of 768 components in the AEVs: 128 radial and 640 angular components. Figure 2-8 shows a diagram of the ANI neural network potentials. The coordinates of the atoms generate the corresponding AEVs, which are the input variables to the element-specific networks. The outputs are the atomic energies, which add up to the total energy.

2.10.2 Sampling the Chemical Space

Currently, there are two commonly used ANI potentials: ANI-1x85 and ANI-1ccx84. ANI-1x has been trained to generate molecular energies and forces resembling the accuracy of the ωB97X density functional87 with the 6-31G(d) basis set88. Its data set originated from the ANI-1 data set83. ANI-1 was created by sampling near-equilibrium structures from a subset of the GDB-11 database. The GDB database, which stands for the generated and collected database, is a product of the Chemical Space Project89. This group has computationally enumerated all possible organic molecules that contain certain chemical elements up to a certain total number of atoms. For instance, the GDB-1790 database includes 166.4 billion structures that contain up to 17 atoms from C, N, O, S, and the halogens, and GDB-1191;92 contains 26.4 million molecules with up to 11 atoms from C, N, O, and F. The ANI-1 data set accommodates molecules from the GDB-11 database that consist of 1 to 8 heavy atoms limited to C, N, and O. The ANI-1x data set is a reduced version of the ANI-1 data set with only 5.5 million structures. This model has been trained with the active learning work-flow explained in section 2.8, which results in a considerable reduction in the data set size and higher precision. ANI-1ccx approaches the accuracy of coupled cluster theory with single, double, and perturbative triple excitations at the complete basis set limit, CCSD(T)/CBS39;38;41;40. ANI-1ccx has been trained via the transfer learning method explained in section 2.9 from the ANI-1x network. A portion of the parameters obtained from training ANI-1x was transferred and held fixed during the training of ANI-1ccx. The training set is comprised of 500 thousand data points labeled with an approximation of CCSD(T)/CBS energies. This data set is named CCSD(T)*/CBS84; it uses a linear-scaling domain-localized DPLNO-CCSD(T) method93;94 to reduce the computational cost of the calculations. Through the active learning work-flow, it is possible to train an ANI model to meet the specific needs of the system. Two common ways of generating samples to incorporate more data into each training cycle are explained below.

Figure 2-8. Schematic representation of ANI neural network potentials. Atomic numbers and coordinates generate the AEVs, which capture information regarding the local surroundings of each atom. The AEVs are input to the element-specific networks to generate their contributions to the total energy.

2.10.2.1 Normal Mode Sampling

An N-atom system has 3N − 6 normal mode (NM) coordinates, after excluding the translational and rotational degrees of freedom. These coordinates can be calculated by diagonalizing the Hessian matrix. A slight displacement of the atoms along the normal mode coordinates creates samples that are moderately different from the minimized structure, which allows sampling of the potential energy surface surrounding the minimum energy structure. As a result, the database includes a more diverse set of structures that can improve the quality of the training.

2.10.2.2 Molecular Dynamics Sampling

Normal mode sampling generates near-equilibrium structures and cannot incorporate the effects of thermal fluctuations. If a more diverse set that captures the qualities of the potential energy surface is desired, molecular dynamics should be used. Additionally, with molecular dynamics, it is possible to impose structural restraints or constraints.
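The normal mode sampling of section 2.10.2.1 can be sketched as follows: diagonalize the Hessian at the minimum, drop the near-zero modes, and displace along random combinations of the remaining modes, with smaller amplitudes along stiffer modes (a toy example with a mass-unweighted Hessian; all names are our own):

```python
import numpy as np

def normal_mode_samples(x0, hessian, n_samples, scale=0.1, seed=0):
    """Generate displaced structures around the minimized geometry x0."""
    rng = np.random.default_rng(seed)
    evals, evecs = np.linalg.eigh(hessian)
    keep = evals > 1e-8              # discard translational/rotational modes
    modes, k = evecs[:, keep], evals[keep]
    out = np.empty((n_samples, len(x0)))
    for n in range(n_samples):
        # displacement ~ scale / sqrt(k): softer modes are displaced more
        c = rng.standard_normal(keep.sum()) * scale / np.sqrt(k)
        out[n] = x0 + modes @ c
    return out

x0 = np.zeros(2)
hess = np.diag([0.0, 4.0])     # one zero mode, one stiff mode (toy Hessian)
samples = normal_mode_samples(x0, hess, 5)
```

In the toy case above, the zero mode is never displaced while the stiff mode is, mimicking how translations and rotations are excluded from the sampling.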

CHAPTER 3
IMPLEMENTATION

3.1 Implementation of Nudged Elastic Band in Amber∗

The PNEB routines have been implemented in the pmemd module of the Amber package and the parallel-MPI simulations can be performed with either CPU or GPU processors (i.e., with pmemd.MPI and pmemd.cuda.MPI). The CPU implementation was straight forward and does not need a detailed explanation. Figure 3-1 shows the MPI framework of the CPU code. Replicas have to communicate their coordinates to their nearest neighbors for harmonic force calculations. The two arrows connecting the replicas in the figure represent the MPI calls that regulate the data transfer. A higher number of replicas leads to a higher resolution of the path but demands more computational resources. Benchmarking can indicate the best balance between precision and cost. The number of replicas, however, usually goes above what a single cluster node can accommodate. Each replica offloads its computation task to multiple MPI processes. Process zero, which is the master rank, computes the NEB forces and broadcasts them to the other processes. Afterward, a molecular dynamics step is carried out to update the coordinates and velocities. GPU programming is inevitably susceptible to the latency of the data transfer between the device and the host memory which hinders the performance of the GPU-accelerated code. In the PNEB routines, specifically, the coordinate exchange between the replicas at each step of the simulation can aggravate the performance. Hence the GPU implementation required a data transfer optimization other than programming the PNEB routines with CUDA. Figure 3-2 shows the framework of the GPU implementation of PNEB in Amber. Replicas are denoted in this figure by white squares, with a blue polyalanine complex in the middle transitioning from an α-helix to β-sheet conformation. The figure also illustrates our optimized transfer scheme

∗ Section 3.1 was reprinted/adapted with permission from Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 46994707.

45 Figure 3-1. Each NEB replica offloads its computation to M threads. The MPI thread with rank zero calculates the NEB forces and broadcasts them to all other threads. named shuttle transfer. In practice, the information regarding the coordinates of all the atoms could be transferred between the processing units. However, not all coordinates are needed, and an exchange of a smaller set is sufficient. Our shuttle transfer reduces the amount of data that must traverse between the computing units by selectively coalescing the memory sections that only correspond to the two atom masks provided by the user for NEB calculations. These two masks contain the atoms that are included in NEB force calculations and the atoms involved in performing the root mean square fitting to the neighboring structures. Data communication routines are overlapped with the main force computation tasks to obtain further performance improvement. It is possible to increase the performance by applying multiple timesteps to compute NEB forces less frequently. The NEB forces are only updated every nth step of the simulation, while for the other steps, the most recent calculated NEB forces are applied to the replicas. Since for a small number of steps the coordinates do not change drastically, it is safe to skip the spring force calculations on the off-steps and apply the most recent NEB forces instead. The attempt for updating the NEB forces, however, should happen as frequently as possible or else the NEB forces might change too much for two consecutive steps and cause instability in the

simulations. The flag that enables this functionality is nebfreq, which has a default value of 1, corresponding to performing NEB on every step.

Figure 3-2. The NEB replicas, which are shown with a polyalanine complex transitioning from α-helix to β-sheet, communicate their coordinates (crd) with their nearest neighbors for the harmonic force calculations. In the shuttle transfer scheme, only the atoms needed in the NEB force calculations are transferred between the host and the device.

3.2 Implementation of ANI-Amber Interface

NeuroChem is a highly efficient NVIDIA GPU-driven software package that trains and tests ANI potentials. This package has been interfaced with Amber22 by incorporating the modifications required to take advantage of Amber's highly parallelizable code. The motivation for this work was to create a suitable framework for using ANI potentials in dynamical simulations to quickly and accurately study the time evolution of biological systems. This interface paves the way for practical use of models that are as accurate as QM methods yet have performance comparable to classical force fields. Other available features of Amber, like NEB95 and the different types of replica exchange molecular dynamics96;97;98;99, could likewise be employed.

Figure 3-3 shows the general workflow of a molecular dynamics simulation. At the beginning of the simulation, the coordinates and velocities of the atoms are initialized. At each iteration, the forces on each atom are calculated using the force field parameters, and the coordinates are updated using an integration algorithm with a user-defined timestep. This process is repeated until the total number of simulation steps is completed.
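The loop described above can be sketched in a few lines. This is an illustrative velocity-Verlet integrator in reduced units, not Amber's actual integration code:

```python
def md_loop(x, v, m, force_fn, dt, nstlim):
    """Minimal velocity-Verlet MD loop: obtain forces from the potential
    (a callback standing in for the force field), then update velocities
    and coordinates with a user-defined timestep, for nstlim steps."""
    f = force_fn(x)
    for _ in range(nstlim):
        v += 0.5 * dt * f / m      # first half-kick
        x += dt * v                # drift
        f = force_fn(x)            # new forces from the potential
        v += 0.5 * dt * f / m      # second half-kick
    return x, v

# usage: one particle in a harmonic well, F = -k x with k = m = 1
x_new, v_new = md_loop(1.0, 0.0, 1.0, lambda x: -x, dt=0.01, nstlim=10)
```

For a harmonic well this integrator conserves the total energy to second order in the timestep, which is the usual sanity check for an MD loop.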


Figure 3-3. Molecular dynamics workflow in Amber. After initializing coordinates and velocities, the forces on each atom are calculated through the force field. An integration scheme then updates the coordinates. The dynamics step is repeated until the total number of steps is reached.

In the ANI-Amber interface, the usual workflow of the dynamics is modified so that the forces are no longer determined from the force field parameters. As illustrated in figure 3-4, the Amber codebase is updated to transfer the coordinates to NeuroChem right after the integration step and to allow the network to calculate and send the energies and forces back. This interface is available in both the sander and the pmemd modules. The new code is written in a directive-based format and needs to be compiled using the '-ani' flag, which generates the sander.ANI and pmemd.ANI executable files. One GPU and one CPU processor are required to run this interface. The program offloads the dynamics calculations to the CPU processor and reserves the GPU for the network computations, which are the time-consuming part of the simulation. This way, the GPU power is used where it is required the most.

Figure 3-4. Molecular dynamics workflow with the ANI-Amber interface. After initializing coordinates and velocities, the forces on each atom are received from the ANI network: NeuroChem computes the atomic environment vectors (AEVs) and per-atom energies, sums them into the total energy ET, and obtains the forces by backpropagation (FT = −∇ET). An integration scheme then updates the coordinates, and the dynamics step is repeated until the total number of steps is reached.

3.3 Sample Amber Input Files

This section provides examples of Amber input files for the NEB and ANI simulations, with each specific flag explained. For information regarding the general Amber MD flags, the reader can refer to the latest Amber manual.

3.3.1 Sample Input File for NEB Simulations

A sample Amber input file for NEB simulations is provided below. Setting the 'ineb' flag to 1 activates the NEB calculations. The 'nebfreq' flag has a default value of 1, which corresponds to performing NEB every step. It is possible to update the NEB forces less frequently by setting 'nebfreq' to values higher than 1. This additional functionality improves performance at the expense of detail, so caution must be exercised. The 'skmin' and 'skmax' flags set the spring constants used to link the NEB replicas to each other, with default values of 50 and 100, respectively. The 'tgtfitmask' flag indicates the atoms used for performing the RMS fitting of the replicas to their neighbors, and 'tgtrmsmask' defines the atoms involved in the NEB force calculations; these are the atoms that are linked via springs to the neighboring replicas.
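The force caching implied by 'nebfreq' can be sketched as follows. The function and variable names here are hypothetical; this is an illustration of the behavior, not the Amber source:

```python
def neb_forces_with_caching(step, nebfreq, coords, cached, compute_neb_forces):
    """Return the NEB forces for this step: recompute only when the step is
    a multiple of nebfreq (or when nothing is cached yet); otherwise reuse
    the most recently computed forces."""
    if cached is None or step % nebfreq == 0:
        cached = compute_neb_forces(coords)
    return cached

# usage: count how often the expensive spring-force routine actually runs
calls = []
fake_neb = lambda crd: (calls.append(1), [0.0] * len(crd))[1]
cached = None
for step in range(10):
    cached = neb_forces_with_caching(step, 5, [0.0, 0.0], cached, fake_neb)
```

With nebfreq=5 over ten steps, the expensive routine runs only on steps 0 and 5, while the cached forces are applied on the off-steps.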

NEB sample input file
&cntrl
  imin=0,                   ! run MD
  ntc=1, ntf=1,             ! turn off shake
  ntpr=1, ntwx=1,           ! output settings
  ntb=0,                    ! non-periodic MD
  cut=999.0,                ! non-bond cut off
  igb=6,                    ! vacuum MD
  nstlim=50,                ! total MD steps
  dt=0.001,                 ! time step, in ps
  ig=-1,                    ! use random seed
  ntt=3, temp0=300.0, gamma_ln=75.0, ! temperature control
  ineb=1,                   ! perform NEB
  nebfreq=1,                ! perform NEB every step
  skmin=10, skmax=10,       ! NEB spring constants
  tgtfitmask=":*&!@H=",     ! atoms used in NEB RMS fittings
  tgtrmsmask=":*@N,CA,C,O", ! atoms involved in NEB force calculations
 /

3.3.2 Sample Input File for ANI Simulations

A sample Amber input file is provided below. The flag that incorporates the ANI-Amber interface is 'iani'. Setting this flag to 1 means the forces are retrieved from the ANI network; the default value for this option is 0. The path to the network directory should also be provided via a string input to 'ani_net_path', which informs NeuroChem where the network parameters are stored. The 'ani_ensemble_size' flag should be set to the number of networks in the QBC ensemble. The default value for this flag is 8, which means the energies and forces are the averaged values of eight separate predictions. The 'ani_cnst_file' is the

address of the file that stores the required constants for the calculation of the AEV vectors, and the list of the atomic elements supported by the network.
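The ensemble averaging implied by the ensemble-size flag can be illustrated with a minimal sketch. The toy "networks" below stand in for NeuroChem's trained models; this is not the interface's actual code:

```python
def ensemble_predict(coords, networks):
    """Average the energy and the per-atom forces over an ensemble of
    networks, each returning an (energy, forces) pair for the coordinates
    (an illustration of ensemble averaging; NeuroChem's internals differ)."""
    energies, forces = zip(*(net(coords) for net in networks))
    n = len(networks)
    mean_energy = sum(energies) / n
    mean_forces = [sum(fs) / n for fs in zip(*forces)]
    return mean_energy, mean_forces

# usage with two toy "networks" that disagree slightly
nets = [lambda c: (-1.0, [0.1, 0.2]), lambda c: (-3.0, [0.3, 0.4])]
energy, forces = ensemble_predict(None, nets)
```

Averaging over the ensemble smooths out the disagreement between the individual networks, which is also what makes the ensemble spread useful as a query-by-committee (QBC) uncertainty signal.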

ANI sample input file
&cntrl
  imin=0,                   ! run MD
  ntc=1, ntf=1,             ! turn off shake
  ntpr=1, ntwx=1,           ! output settings
  ntb=0,                    ! non-periodic MD
  cut=999.0,                ! non-bond cut off
  igb=6,                    ! vacuum MD
  nstlim=50,                ! total MD steps
  dt=0.001,                 ! time step, in ps
  ig=-1,                    ! use random seed
  ntt=3, temp0=300.0, gamma_ln=2.0, ! temperature control
  iani=1,                   ! perform ANI MD
  ani_net_path='/home/user/network_directory', ! ANI network path
  ani_ensemble_size=8,      ! ANI ensemble size
  ani_cnst_file='/home/user/ani_cnst_file.params', ! AEV parameters file
 /

CHAPTER 4
NUDGED ELASTIC BAND: VALIDATION AND RESULTS∗

4.1 Computational Details

Three different test cases have been selected for precision and performance examinations. All simulations were performed using the Amber18 and AmberTools18 suite of programs22. The structures for the first two test cases were built using the leap module of Amber16, with the ff14SB100 forcefield parameters. The structures for the third test case were obtained from Li et al.56. The TIP3P water model101 was used for explicit solvent simulations. As previously illustrated in the work of Bergonzo et al.16 and Li et al.56, a Langevin thermostat with a high collision frequency is required to control the temperature when performing NEB simulations. High values of the collision frequency may not be appropriate for normal MD simulations, but they are recommended for NEB simulations: a strongly coupled thermostat reduces instabilities due to the projection of the potential forces and the addition of the unrealistic springs. For this reason, a collision frequency of 1000 ps−1 was used to control the temperature for simulations performed in implicit solvent. For explicit solvent simulations, a lower collision frequency was used (refer to test case 3 for values), since the presence of water molecules increases the viscosity of the system. These values of the collision frequency are in line with previous NEB studies16;55;102;56. A strong thermostat, however, would increase the system's temperature too fast; to prevent this, a slow, linear change in the temperature is recommended.

4.1.1 Test Case 1: Conformational Change of Alanine Dipeptide

A capped alanine dipeptide with a total of 22 atoms was built. In this test case, the pathway between the so-called αR and the αL basins on the Ramachandran plot of alanine

∗ Chapter 4 was reprinted/adapted with permission from Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 4699–4707.

dipeptide is explored103;16. The impose command of tleap was used to initialize the two structures close to the two basins. The GB-Neck2104 generalized Born implicit solvent model, with a sodium chloride salt concentration of 0.2 M, was used for this test case. The initial structures were then minimized by performing 1500 steps of steepest descent followed by 1500 steps of conjugate gradient with a default tolerance of 0.001. After the minimization, the initial structure had (φ, ψ) = (−78.33◦, −10.58◦), while the final structure had (φ, ψ) = (57.50◦, 20.55◦). NEB calculations were performed using 16 replicas in implicit solvent. NEB forces were applied only to the backbone atoms, while all the atoms were included for fitting the neighboring structures to calculate the NEB forces. Simulated annealing was used along with NEB to improve the exploration of the energy landscape. First, 20 ps of simulation with timesteps of 0.5 fs was performed to heat the system to 300 K with spring constants of 10 kcal · mol−1 · Å−2. Afterward, the spring constants were raised to 50 kcal · mol−1 · Å−2 and 100 ps of simulation followed with timesteps of 1 fs, during which the temperature was held at 300 K. Next, 300 ps of simulated annealing105 with timesteps of 0.5 fs was performed to gradually increase the temperature up to 500 K and back down to 300 K. Short timesteps and a gradual increase of the temperature at this step are critical for the stability of the system. Following the simulated annealing, the temperature of the system was gradually decreased to zero over 120 ps of simulation with timesteps of 1 fs, followed by quenched MD for 200 ps.

4.1.2 Test Case 2: α-helix to β-sheet Transition in Polyalanine

Twelve alanine residues were created from the ACE-ALA(12)-NME sequence for a total of 112 atoms. The impose command in tleap was used to create the initial and final structures in an α-helix and a β-sheet conformation, respectively. Minimization was performed on the endpoint structures for 5000 steps of steepest descent followed by 5000 steps of conjugate gradient. The GBn model106 with a sodium chloride salt concentration of 0.2 M was used to model the implicit solvent. NEB calculations and the selection of the atoms included in the NEB region were performed as described in the previous test case. Hence, NEB forces

were applied only to the backbone atoms, while all the atoms were involved in fitting the neighboring structures.

4.1.3 Test Case 3: Base Eversion Pathway of the OGG1–DNA Complex

The OGG1–DNA complexes with intrahelical (initial) and extrahelical (final) endpoints, together with the additional intermediate structures along the major groove path56, were generated as described in the supplementary information of reference 56. The parameters for running NEB were the same as in the work of Li et al.56. The system contains 49534 atoms, 43698 of which belong to the solvent. The NEB simulations were performed with 32 replicas in explicit solvent. First, the replicas were equilibrated at 310 K, with spring constants of 1 kcal · mol−1 · Å−2 and a collision frequency of 100 ps−1, for 100 ps with timesteps of 1 fs. For the rest of the NEB simulations, the spring constants were raised to 20 kcal · mol−1 · Å−2 and the collision frequency was brought down to 75 ps−1. The system was equilibrated at 310 K for another 500 ps with 1 fs timesteps. The system's temperature was then raised to 380 K over 100 ps with timesteps of 1 fs. Further, the system was equilibrated at 380 K for an extra 200 ps, and finally the temperature was lowered to 310 K over 100 ps. For the last phase of the NEB simulation, the temperature was kept fixed at 310 K for an extra 500 ps in order to equilibrate the system at that temperature.

4.2 Accuracy Tests

We use the three test cases to demonstrate that numerical accuracy is not compromised in the new implementations. These tests compare the numerical values of a specific reaction coordinate along the path for the different implementations. The choice of the reaction coordinate was based on a priori knowledge for the alanine dipeptide conformational change along the φ and ψ angles, or on prior NEB simulations connecting the initial and final states of the second and third test cases.

4.2.1 Test Case 1: Conformational Change of Alanine Dipeptide

The first test case studies the conformational change of alanine dipeptide. The pathway between two stable conformations corresponding to two minimum regions on the potential

energy surface of the φ and ψ dihedral angles103;16 is explored. Figure 4-1 shows the energy landscape of this system along with the positions of the two metastable conformations and one of the possible transition pathways. Another choice for the pathway would be to proceed through the barrier around φ = 0◦ and ψ = 100◦. In this test case, we performed two sets of simulations: one set on sander (previous implementation) and the other on pmemd-CPU (recent CPU implementation). From each set, 20 independent simulations that proceeded through the pathway shown in figure 4-1 were selected. When running NEB simulations it is important to perform a set of independent simulations to identify all the possible transition pathways. Free energy profiles along the paths can then be obtained through various available free energy calculation techniques, such as umbrella sampling57, which can yield more insight into the preferred pathway in terms of the energetics along the path.


Figure 4-1. Two-dimensional potential energy landscape of alanine dipeptide in the φ and ψ dihedral angles space. The initial and final conformations are displayed on top of the energy surface, positioned close to their corresponding minimum states. The bottom pathway indicates the averaged results from two independent sets of simulations performed with the pmemd (red) and the sander (light blue). The two transition pathways are lined up close to each other, indicating that the two implementations agree.

Figure 4-2 shows the potential energies of the NEB replicas along the path. In order to have statistically reliable results, each point in the plot is averaged over the individual simulations. Replicas 1 and 16 correspond to the initial and final minimum energy states, respectively. As we move away from the initial state along the path, the energy of the replicas increases until it reaches a transition state with an energy value of ∼6.3 kcal · mol−1. Vertical bars show the standard deviation for each replica. The errors can be represented by dividing the standard deviations by the square root of the ensemble size. The two lines plotted in figure 4-2 show that the results obtained from pmemd-CPU are in good agreement with those obtained from sander.
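The error estimate described above (standard deviation divided by the square root of the ensemble size) is the standard error of the mean; a minimal sketch with illustrative numbers:

```python
import math

def ensemble_stats(values):
    """Per-replica statistics over an ensemble of independent NEB runs:
    mean, (population) standard deviation, and the standard error of
    the mean, std / sqrt(N)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std, std / math.sqrt(n)

# usage: one replica's energy (kcal/mol) across 4 hypothetical independent runs
mean, std, sem = ensemble_stats([6.1, 6.5, 6.2, 6.4])
```

With more independent runs, the standard error shrinks as 1/sqrt(N) even though the per-run standard deviation does not.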


Figure 4-2. The potential energy of the replicas, with the end replicas representing minimum states and replicas 9-11 representing the transition region. The light blue line represents simulations done with the sander and the red line represents simulations done with the pmemd-CPU. The lines in the plot are averaged over individual simulations and the vertical bars represent the standard deviations along the transition region.

For the plots demonstrating the φ and ψ dihedral angle changes along the path, refer to Appendix F.

4.2.2 Test Case 2: α-helix to β-sheet Transition in Polyalanine

The second test case is a small alanine peptide transitioning from α-helix to β-sheet. Figure 4-3 shows the peptide end to end distance along the path and the initial and the

final conformations. The blue and red lines are averaged over 50 individual simulations performed with pmemd-CPU and pmemd-GPU, respectively. Vertical bars represent the standard deviation. This test case demonstrates that the GPU implementation of PNEB provides results that are statistically equivalent to those achieved using the pmemd-CPU implementation.

Figure 4-3. End to end distance in the 12 alanine residue complex transitioning from α-helix to β-sheet. The blue line represents simulations done with pmemd-CPU while the red line shows simulations done with pmemd-GPU.

4.2.3 Test Case 3: Base Eversion Pathway of the OGG1–DNA Complex

8-Oxoguanine (8-oxoG) is a result of oxidation of guanine and one of the most common products of oxidative damage in DNA, which can lead to mutations in cells if not excised prior to DNA replication55;102;56. Human 8-oxoguanine–DNA glycosylase (OGG1) excises 8-oxoG from damaged DNA in base excision repair. Several studies have been reported that address different aspects of this transition to better understand the base eversion pathways and the preferred binding conformation107;108;55;102;56. The two endpoints correspond to the intrahelical and the extrahelical conformations in the base eversion pathway. Figure 4-4 illustrates the endpoint structures of the region involved in the transition plus the glycosidic angle versus the

eversion distance change of this transition. This eversion distance has been reported previously as a reaction coordinate for nucleic acid base eversion56;109. The red points are the results of simulations performed with nebfreq equal to 1. The light blue points are the results of simulations performed with nebfreq equal to 5 in all stages of the simulation but the first and the final, in which the value of nebfreq was set to 2. The data points are averaged values extracted from the trajectories of the last stage of 10 independent NEB simulations. The two sets agree, which shows that even when the NEB forces are updated every 5 steps the transition pathway still falls into the correct region. A perfect agreement between individual sets of simulations for systems that contain many degrees of freedom is not possible, and it is reasonable to expect a transition region rather than a single pathway. Once the MEPs are identified, advanced sampling techniques such as umbrella sampling57 can be used to calculate the free energy changes along the path. PNEB provides substantial computational savings by allowing one to focus only on the portion of the configuration space surrounding the MEPs56.

4.3 Timing Benchmarks

Figures 4-5 and 4-6 show the performance of the different implementations of PNEB in Amber. All the benchmarks were performed using the third test case, consisting of 49534 atoms. The GPU simulations were performed on NVIDIA P100 GPUs and Intel Xeon E5-2680v3 CPU processors linked via Mellanox FDR InfiniBand interconnects. The CPU simulations were performed on Intel Xeon Platinum 8160 processors. Figure 4-5 shows the scalability of the sander and pmemd-CPU implementations using different numbers of CPU processors per NEB replica. Porting the code from sander to pmemd provides a performance gain of about 1.6X, commensurate with the speed of the CPU pmemd engine relative to sander. Figure 4-6 shows the average increase in the performance of the CPU and GPU code with different GPU precision models110. The currently available GPUs have higher processing power for single precision floating-point arithmetic operations than for double precision.

Figure 4-4. Glycosidic angle vs. eversion distance involved in transitioning from intrahelical to extrahelical conformations in the OGG1–DNA complex. The blue points represent simulations performed with nebfreq=5 for all the stages but the first and the last, which had nebfreq=2. The red points represent simulations performed with nebfreq=1.


Figure 4-5. Performance comparison between sander and pmemd. All simulations have been performed on CPUs with timesteps of 1 fs.

The Amber SPFP precision model replaces double precision arithmetic with single precision combined with 64-bit fixed-point integer arithmetic for the accumulation of forces and energies. The SPFP precision model hence results in higher-performance simulations compared to the slow double precision DPFP model, without loss of accuracy110. For both the SPFP and DPFP models, the data transferred between the GPU and the CPU has a 64-bit double precision floating-point format. CPU benchmarks were performed for sander with nebfreq equal to 1, and for pmemd with nebfreq equal to 1, 2, 5, and 10. GPU benchmarks were performed for a full transfer with nebfreq equal to 1, and for a shuttle transfer with nebfreq equal to 1, 2, 5, and 10. The performance of a regular MD simulation has been included as a control for gauging the performance penalty caused by activating the NEB option. In this test case, the simulations were performed with timesteps of 1 fs; as a result, setting nebfreq=n corresponds to updating the NEB forces every n fs. For the shuttle transfers, the coordinates of 2630 out of the 49534 atoms traveled between the CPU host and GPU device memory every nebfreq steps. For the shuttle transfer with nebfreq equal to 1, roughly 61% of the total simulation time spent on the NEB calculations belongs to the data transfer. The use of the shuttle transfer with this system achieves more than a 2X performance gain in the case of SPFP. Compared to the initial implementation (sander), the GPU code results in more than a 10X performance enhancement over 32 CPU processors. For this test case, a further two-fold acceleration was observed by increasing nebfreq to 5. Another set of benchmarks is provided in Figure 4-7, which compares the speed of the GPU code for various numbers of atoms included in the shuttle transfer.
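A key property of the fixed-point accumulation used by SPFP is that integer addition is associative and commutative, so accumulated forces are reproducible regardless of summation order, unlike naive floating-point accumulation. A simplified sketch (the scaling factor is illustrative, not Amber's actual constant):

```python
FXP_SCALE = 1 << 40  # illustrative fixed-point scaling factor

def to_fixed(x):
    """Convert a force contribution to a 64-bit-style fixed-point integer."""
    return int(round(x * FXP_SCALE))

def accumulate(contributions):
    """Accumulate contributions as integers, then convert back. Because
    integer addition is order-independent, the total is bit-identical for
    any summation order of the same contributions."""
    total = 0
    for c in contributions:
        total += to_fixed(c)
    return total / FXP_SCALE

# two opposite orderings of the same contributions give identical totals
fwd = accumulate([1e-3, -2.5, 1e-3, 2.5])
rev = accumulate([2.5, 1e-3, -2.5, 1e-3])
```

This order-independence is what lets massively parallel GPU threads accumulate forces deterministically while still doing the expensive arithmetic in fast single precision.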
For each replica, during each transfer, the coordinates of a certain number of atoms are transferred: first from the device to the host of that replica, second through an MPI routine between the hosts of the neighboring replicas, and third from the hosts of the neighboring replicas to their corresponding devices. Data transfer speed depends on many factors, including the bandwidth of the hardware, the type and bandwidth of the interconnect linking the hardware, and the latency of each transfer call. As a general rule, however, minimizing


the size of the transferred data results in better performance. As shown in figure 4-7, the performance of the PNEB code is also dependent on the data transfer size. The use of peer-to-peer communication and interconnect architectures such as NVLink is likely to improve the performance further, but these have not been tested here.

Figure 4-6. Performance comparison of different nebfreq values for various CPU and GPU implementations. The performance of a regular MD simulation has been included to illustrate the cost of activating the NEB option. The simulations were performed with timesteps of 1 fs.


Figure 4-7. Performance dependence of the different GPU precision models on the size of the data transfers. "Shuttle atoms" specifies the number of atoms whose coordinates are transferred between the neighboring replicas.

CHAPTER 5
FREE ENERGY METHODS WITH MACHINE LEARNING

This chapter presents the results of studies performed with the ANI-Amber interface. The goal is to provide information regarding the accuracy of the implementation and to highlight possible applications that could benefit from this interface. Of particular interest is the use of these potentials in conformational and hydration free energy predictions.

5.1 Two Dimensional Energy Surface with ANI-Amber

This section provides a two-dimensional potential energy surface of ethylene glycol, with the formula (CH2OH)2. This molecule is represented in figure 5-1. The two-dimensional scan is generated by performing a full torsional scan of two dihedral angles of this molecule, denoted by Φ1 and Φ2 and containing the O-C-C-O and C-C-O-H atoms, respectively. The dihedral angles were incremented by 2◦ to create the initial structures. The single point energies of each structure are measured once with the general Amber forcefield (GAFF)111 and once with the ANI-1x neural network potentials.
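The 2° increments above define the grid of target angles for the scan; a sketch of the grid generation (the restrained minimizations and energy evaluations themselves are performed in Amber and NeuroChem):

```python
def scan_grid(step_deg=2):
    """Enumerate (phi1, phi2) dihedral targets for a full two-dimensional
    torsional scan, stepping each angle by step_deg over [-180, 180)."""
    angles = range(-180, 180, step_deg)
    return [(p1, p2) for p1 in angles for p2 in angles]

# 2-degree increments give a 180 x 180 grid of restrained-minimization targets
grid = scan_grid(2)
```

Each grid point then yields one single-point energy per potential, so the full scan requires 32400 evaluations with GAFF and another 32400 with ANI-1x.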

Figure 5-1. Ethylene glycol with the formula of (CH2OH)2. The two dihedral angles for torsional scan are angles containing O-C-C-O and C-C-O-H atoms.

Figures 5-2 and 5-3 demonstrate the two-dimensional energy surfaces represented by the GAFF and ANI-1x potentials, respectively. Both plots have been generated with Amber. The initial structures were created by performing a restrained minimization with GAFF to generate the desired Φ1 and Φ2 angles. Single point energies of these structures were then captured with GAFF and ANI-1x. The two-dimensional surface generated with ANI-1x is not as symmetric as the one generated with GAFF. Two reasons could cause this dissimilarity. First, the initial structures were created with GAFF, and it is expected to see different patterns in regions where the two potential energy surfaces do not overlap. Second, the ANI-1x network has not been trained to produce accurate energies for rotational conformers of the molecules in the training set. Increasing the energy prediction accuracy for rotational conformers is one of the aims of the ANI-2x potentials, which are currently in the final stages of release.


Figure 5-2. Two-dimensional potential energy surface of ethylene glycol. The Φ1 and Φ2 dihedral angles contain O-C-C-O and C-C-O-H atoms, respectively. This surface is generated using the general Amber forcefield.

5.2 End-State Free Energy Corrections

The machine-learned potentials can speed up dynamical calculations while maintaining the desired precision. The time-consuming free energy calculations, for instance, can benefit considerably from these potentials. In free energy calculations, it is necessary to allow the system to visit the less likely explored regions. Hence, long simulations are pivotal for the convergence of the results. Therefore, it is common to switch to less accurate classical force fields and sacrifice precision to achieve convergence. This section demonstrates how more accurate calculations can be achieved through the use of the ANI-Amber interface.

Figure 5-3. The same figure as 5-2, except that the potential energy surface has been generated using the ANI-1x potential.

5.2.1 Conformational Free Energy with ANI-Amber

The goal is to perform umbrella sampling57 to calculate the conformational free energy of N-methyl acetamide transforming from the cis to the trans conformation. Figures 5-4 and 5-5 demonstrate the cis and trans conformers of N-methyl acetamide.

Figure 5-4. cis conformer of N-methyl acetamide. This is the less favorable conformer, with a dihedral angle of ∼0◦ consisting of the Cα-C-N-Cα atoms.

66 Figure 5-5. trans conformer of N-methyl acetamide with the dihedral angle of ∼180◦ consisting of Cα-C-N-Cα atoms.

The parameter and coordinate files for N-methyl acetamide were built using the leap module of Amber1822, with the ff14SB100 forcefield parameters. This initial structure was then minimized for a total of 1000 steps, which included 500 steps of steepest descent followed by 500 steps of conjugate gradient. The resulting minimized structure has a backbone

dihedral angle ω = 180.0◦ corresponding to the Cα-C-N-Cα atoms (trans conformation). The umbrella sampling simulations were performed with windows 3◦ apart. To speed up the calculations, four equally spaced starting structures with ω dihedral angles of 180◦, 120◦, 60◦, and 0◦ were initially created and used for different sets of the umbrella sampling windows. Harmonic restraints with force constants of 200 kcal · mol−1 · Å−2 were applied to create the initial starting structures. For each window of the simulation, a restrained minimization including 500 steps of steepest descent and 1500 steps of conjugate gradient was performed to create the desired conformation. Next, 50 ps of restrained simulation with timesteps of 0.5 fs was performed to increase the temperature to 300 K. Then, 100 ps of production run with timesteps of 0.5 fs was performed at 300 K. During this stage, the value of the dihedral angle was measured every 50 steps. At the end of the simulations, a histogram of the umbrella sampling windows was plotted to check that they have appropriate overlap. The weighted histogram analysis method (WHAM)112 implementation of Alan Grossfield was used to generate the one-dimensional potential of mean force. The result of this

calculation is demonstrated in figure 5-6. The figure conveys that the trans conformation is more stable than the cis conformation by approximately 1.43 kcal · mol−1.
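The window layout and harmonic bias used in umbrella sampling can be sketched as follows. The force constant and the absence of a 1/2 prefactor are illustrative conventions here, not necessarily those of the restraint definition used in the simulations:

```python
def window_centers(start=0, stop=180, spacing=3):
    """Umbrella-window centers for the omega dihedral, spaced `spacing`
    degrees apart over the cis-to-trans range."""
    return list(range(start, stop + 1, spacing))

def bias_energy(omega, center, k=200.0):
    """Harmonic umbrella bias U = k * d^2 on a periodic dihedral (degrees),
    where d is the minimum-image angular difference to the window center
    (k is an illustrative force constant)."""
    d = (omega - center + 180.0) % 360.0 - 180.0
    return k * d * d

centers = window_centers()      # 0, 3, ..., 180 degrees
u = bias_energy(-179.0, 180.0)  # periodic wrap: only 1 degree from 180
```

The minimum-image wrap matters for a dihedral coordinate: −179° is only 1° away from the 180° window center, so the bias must not treat it as 359° away.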


Figure 5-6. Potential of mean force (PMF) plot of N-methyl acetamide transitioning from the cis conformer to the trans conformer. At a 0◦ peptide bond (cis conformation) the free energy corresponds to 1.43 kcal · mol−1. At a 180◦ peptide bond (trans conformation) the free energy equals 0.0 kcal · mol−1. The energy barrier height from cis to trans equals 12.58 kcal · mol−1.

The indirect free energy approach is used to calculate the free energy difference between these two conformations at a higher level of theory. For this purpose, 102.5 ns of production run with timesteps of 0.5 fs was performed at 300 K for each of the two conformers. The simulation frames were saved every 1 ps, which resulted in a trajectory consisting of 205 k frames. The first 5 k frames of the simulation were discarded to ensure that the systems were in a relaxed state. These two trajectories were then provided as input trajectories to the sander module for post-processing. Single point energies of each frame were recorded once with the forcefield and once with the ANI-1x potential. The endpoint correction calculation of the free energy is demonstrated in figure 5-7. In this figure, 'MM' stands for molecular mechanics forcefields, and 'ANI' stands for ANI potentials trained to predict quantum mechanics energies. The cumulant expansion method explained in section 2.5.1 was used to calculate the endpoint corrections. Table 5-1 shows the results of the free energy calculations.
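The second-order cumulant expansion used for the endpoint corrections estimates the MM-to-ANI free energy difference from the per-frame energy differences as ΔA ≈ ⟨ΔU⟩ − (β/2)·Var(ΔU). A minimal sketch under that assumption (the synthetic data and function name are illustrative, not the analysis script used in this work):

```python
def cumulant_correction(delta_u, temperature=300.0):
    """Second-order cumulant estimate of the free energy difference between
    two potentials, from per-frame energy differences delta_u = U_ANI - U_MM
    in kcal/mol: dA ~= <dU> - (beta/2) * Var(dU)."""
    kB = 0.0019872041                     # Boltzmann constant, kcal/(mol K)
    beta = 1.0 / (kB * temperature)
    n = len(delta_u)
    mean = sum(delta_u) / n
    var = sum((u - mean) ** 2 for u in delta_u) / n
    return mean - 0.5 * beta * var

# usage with a tiny synthetic set of per-frame energy differences
dA = cumulant_correction([-1.2, -1.0, -1.1, -0.9])
```

Evaluating this correction at both endpoints (cis and trans) and taking their difference is what converts the MM umbrella sampling result into the ANI-level estimate in the indirect cycle.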

∆A(cisMM→transMM) is the result obtained from umbrella sampling at the MM level, ∆A(cisANI→transANI)

Figure 5-7. The indirect approach to correct the free energy difference of N-methyl acetamide transitioning from the cis to the trans conformation. The free energy difference between a molecular mechanics (MM) forcefield and ANI potentials trained to replicate quantum mechanics (ANI) energies is calculated at both endpoints. The difference of these two terms is added to the conformational free energy change from cis to trans calculated at the MM level.

is the result obtained from the indirect approach by applying the endpoint corrections with

ANI-1x potential, and ∆A(cisQM→transQM) is the result of the ab initio calculations performed by Jorgensen and Gao in reference 113. All numbers are reported in units of kcal · mol−1. As demonstrated, including the QM correction brings the result close to the calculations performed at the ab initio level.

Table 5-1. Free energy difference for cis to trans conformational transition. All energies are in kcal · mol−1. The first, second, and third columns are the results obtained from MM umbrella sampling, MM umbrella sampling with ANI endpoint corrections, and ab initio calculations performed in reference 113, respectively.

∆A(cisMM→transMM)   ∆A(cisANI→transANI)   ∆A(cisQM→transQM)
1.43                2.51                  2.50
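The endpoint corrections follow the cumulant expansion of section 2.5.1 truncated at second order, ∆A(MM→ANI) ≈ ⟨∆E⟩ − (β/2)Var(∆E), with ∆E = E_ANI − E_MM evaluated on the MM-sampled frames. A minimal sketch of the arithmetic (hypothetical array names, not the production script):

```python
import numpy as np

def cumulant_correction(e_mm, e_ani, temperature=300.0):
    """Second-order cumulant estimate of dA(MM -> ANI), from single-point
    energies (kcal/mol) of the same MM-sampled frames under both potentials."""
    kB = 0.0019872041                  # Boltzmann constant, kcal/(mol K)
    beta = 1.0 / (kB * temperature)
    dE = np.asarray(e_ani) - np.asarray(e_mm)
    # dA ~= <dE> - (beta/2) Var(dE), the expansion truncated at second order
    return dE.mean() - 0.5 * beta * dE.var()
```

In the indirect cycle of figure 5-7, the corrected result is ∆A(MM) plus the trans-endpoint correction minus the cis-endpoint correction.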

5.2.2 Hydration Free Energy with ANI-Amber

The goal of this section is to perform end-state free energy calculations to correct hydration free energies. However, none of the ANI potentials so far has been trained to predict the potential energies of solute molecules in bulk water. Long-range interactions are currently absent from ANI network training, which disqualifies these potentials from accurately predicting the energetic features of solvated systems. To overcome this limitation, we used active learning

to train a potential that predicts correct energy values for clustered methane-water systems. The following sections present the details of the training and the results of this work.
5.2.2.1 Data preparation and network training

Molecular dynamics simulations of bulk water and of methane in water were conducted in Amber using the flexible simple point charge water model (SPC/Fw)114. This model introduces flexibility into the rigid simple point charge (SPC) water molecules115. ANI does not apply any constraints to internal degrees of freedom (DoF); therefore, sampling these DoFs is crucial for providing unbiased examples for training ANI neural network potentials (NNP). The SPC/Fw model allows sampling various conformations of the water molecules, in contrast to rigid water models such as TIP3P101 and extended SPC (SPC/E)116. Multiple MD simulations were performed in the NVT ensemble at temperatures chosen from 290-450 K to scan a wide range of the potential energy surface in the generated samples. The trajectories were stripped to generate snapshots of water clusters and methane-in-water clusters of 10-20 molecules. In the case of methane-in-water, the water molecules closest to methane resemble the first hydration shell around methane. Figure 5-8 demonstrates the workflow for creating the training dataset.


Figure 5-8. Data preparation workflow for creating clustered structures for AL training. First, a water box with a solute molecule is generated. Minimization, NPT, and NVT simulations are performed to relax the system and set the temperature to the desired value. Then, the excess water molecules are stripped to create the clustered solute-solvent structures. In the end, the box information is removed. The atom types and coordinates are stored with the corresponding QM energy values as labels.
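The stripping step of this workflow reduces, in essence, to keeping the n water molecules nearest the solute. A schematic numpy sketch (hypothetical helper and array names, not the cpptraj commands actually used):

```python
import numpy as np

def extract_cluster(solute_xyz, water_oxy_xyz, n_waters=15):
    """Indices of the n_waters water molecules closest to the solute,
    ranked by minimum oxygen-to-solute-atom distance (no periodic images:
    the box information has already been removed at this stage)."""
    # distance of every water oxygen to its nearest solute atom
    d = np.linalg.norm(
        water_oxy_xyz[:, None, :] - solute_xyz[None, :, :], axis=-1
    ).min(axis=1)
    return np.argsort(d)[:n_waters]
```

The selected waters plus the solute form one clustered training sample; repeating over frames and temperatures builds the data-pool.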

This sampling scheme results in a data-pool containing 200k sample coordinate files for the active learning (AL) process. After each iterative AL step, the QBC model-ensemble disagreement criterion is tested against this data-pool. A random subset of the samples that do not satisfy the QBC criterion is selected, with their corresponding QM energies as labels. The labeled data are appended to the data-set from which the AL process was initialized (i.e., ANI-1x); therefore, the level of theory for labeling the new data should be in line with the ANI-1x data-set. The QM single-point energies were calculated with Gaussian 09117 using the ωB97X density functional87 and the 6-31G(d) basis set88. The newly labeled data are appended to the existing data-set before the start of the next training iteration. This iterative process of data selection and labeling is repeated until the desired accuracy is reached.
5.2.2.2 Energy prediction results

The training process explained above has been repeated five times (i.e., five active learning iterations). To test whether the network is learning to predict correct energy values and can generalize its predictions to larger systems, a test set of 20 individual water-methane cluster structures, each containing 70 water molecules, was created. The single-point energies of these structures were calculated with Gaussian 09117 and compared with the predictions of the trained networks. Figure 5-9 shows the mean absolute error (MAE) between the network predictions and the QM energy values versus the iteration number. Iteration 0 is the ANI-1x network; this initial iteration yields an MAE of 19.24 kcal · mol−1, the direct result of using ANI-1x for the predictions. As expected, as the appropriate data-points are appended to the training set, the MAE decreases after each iteration. By the 5th iteration, the MAE drops to 3.49 kcal · mol−1. This significant decrease in the MAE indicates that the network is learning to predict the correct energy values. Continuing the training for a few more iterations is likely to produce even more accurate predictions.

[Plot: Mean absolute error (MAE, kcal/mol) versus active learning iteration, 0-5]

Figure 5-9. Mean absolute error (MAE) values of AL network predictions on the test set and their corresponding QM energy differences versus training iterations. The test set includes 20 individual structures containing water-methane clusters with 70 water molecules. Iteration 0 is the ANI-1x network, which initializes the AL process. After 5 repetitions, the MAE value has dropped from 19.24 kcal · mol−1 to 3.49 kcal · mol−1.

Figure 5-10 illustrates the cumulative number of structures in the data-pool that failed the QBC criterion after each round of training. These data-points are added to the training set before the next iteration begins. A limit of 7200 is set to avoid adding too many data-points per iteration; if more structures fail the QBC criterion in an iteration, 7200 of them are selected randomly to join the training set. The network training should continue until the cumulative sum plateaus. This state is achieved when the ensemble reaches agreement on all the structures in the data-pool. The agreement of the network ensemble on a prediction does not necessarily mean it is predicting the correct value: all the networks in the ensemble may have similarly inaccurate predictions for a sample and therefore satisfy the QBC criterion. However, failing to satisfy the QBC criterion is a definite sign that the network training has not yet succeeded. This is the main reason to include a test set to assess the accuracy of the predictions even while the QBC criterion is satisfied.
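The capped QBC selection described above can be sketched as follows (a minimal illustration using the ensemble standard deviation as the disagreement measure; the exact metric and threshold of the AL framework are not reproduced here):

```python
import numpy as np

def qbc_select(ensemble_preds, threshold, cap=7200, seed=0):
    """Indices of samples whose ensemble disagreement (std. dev. of the
    per-network energy predictions) exceeds `threshold`, capped at `cap`
    randomly chosen samples per iteration.
    ensemble_preds: array of shape (n_networks, n_samples)."""
    disagreement = np.std(ensemble_preds, axis=0)      # (n_samples,)
    failed = np.flatnonzero(disagreement > threshold)
    if len(failed) > cap:
        rng = np.random.default_rng(seed)
        failed = rng.choice(failed, size=cap, replace=False)
    return failed
```

The selected indices are the structures sent for QM labeling before the next training iteration.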

[Plot: Cumulative number of data-points versus active learning iteration, 0-5]

Figure 5-10. The cumulative sum of the number of structures added to the training set after each iteration. To avoid additions of too many data-points per iteration, the maximum limit for the total number of structures allowed to fail the QBC criteria is 7200.

CHAPTER 6
CONCLUDING REMARKS AND FUTURE DIRECTIONS
Parallel programming on graphics processing units, together with machine learning methods and the ability to quickly generate training data, has become routine across the sciences, and in computational science in particular. This thesis presents a fast implementation of the nudged elastic band method and the ANI-Amber interface, both efforts to make more time-efficient and accurate methods available for computational studies with the Amber molecular dynamics suite.
6.1 Final Remarks on Nudged Elastic Band∗

A fast implementation of the nudged elastic band (NEB) method in the particle mesh Ewald molecular dynamics (pmemd) module of the Amber software package, for both central processing units (CPU) and graphics processing units (GPU), has been presented. The accuracy of the new implementation has been validated on three cases: a conformational change of alanine dipeptide, the α-helix to β-sheet transition in polyalanine, and a large conformational transition in human 8-oxoguanine-DNA glycosylase in complex with DNA (OGG1-DNA). Timing benchmarks were performed on the explicitly solvated OGG1-DNA system containing ∼50k atoms. The GPU-optimized implementation of NEB achieves more than two orders of magnitude speedup over the previous CPU implementation running on two CPU cores. The speed and scalability of this implementation will enable NEB applications on larger and more complex systems. Fast and accurate studies of conformational transitions in biological systems can advance computational drug discovery by assisting the rational design of novel drug molecules. Efficient sampling algorithms are pivotal in identifying transition pathways on multidimensional rugged energy surfaces with numerous degrees of freedom.

∗ Section 6.1 was reprinted/adapted with permission from Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 4699–4707.

Chain-of-states methods like NEB can adjust to the available computational resources to efficiently predict transition pathways with the required precision. The new implementation takes advantage of the efficient vectorization of commodity GPUs and is easily extendable to other chain-of-states methods for transition path sampling.
6.2 Final Remarks on Free Energy Calculations with ANI-Amber

The ANI-Amber interface has been introduced in this thesis. This interface allows computational scientists to perform molecular dynamics with fast neural network potentials. Efficient sampling is no longer a concern, as long as the selected network represents the potential energy surface accurately. We intend to use the ANI-Amber interface to predict hydration free energies at the QM level of theory. The absence of long-range interactions from ANI potentials makes them ineligible for such applications as-is; however, training the network to produce correct energy values for clustered structures makes it possible to replicate the intermolecular interactions. The active learning framework for training a network for this purpose is presented in section 5.2.2.1. The results presented in section 5.2.2.2 are part of an ongoing project, and it is our intention to continue this research. After the AL training is complete and the network is prepared, a workflow similar to that of section 5.2.1 will be performed. The free energy calculation process is automated and ready to be used. The BAR method explained in section 2.5.3 is expected to result in better convergence, since it requires less overlap between the structures sampled by the two potentials. It is common practice to use FEP instead, since performing simulations on a system containing hundreds of atoms is computationally demanding; however, ANI potentials are significantly faster, and hence more affordable, than traditional QM methods. The inclusion of ANI potentials in Amber for hybrid QM/MM calculations could be an additional future direction. The challenging aspect of such an implementation is the correct representation of the electrostatic interactions between the QM and MM regions.

The ongoing efforts to train a network that can predict partial charges could make this implementation possible.
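The equal-sample self-consistent form of the BAR estimator (section 2.5.3) can be sketched as follows; this simplified version omits the ln(n_F/n_R) term and the variance estimate of the full method, and the names are hypothetical:

```python
import numpy as np

def bar_delta_a(w_f, w_r, beta, tol=1e-8):
    """Bennett acceptance ratio, equal-sample special case: solve
    sum_i f(beta*(w_f_i - dA)) = sum_j f(beta*(w_r_j + dA)) for dA,
    where f is the Fermi function, w_f = U1 - U0 on frames from state 0,
    and w_r = U0 - U1 on frames from state 1 (energies in kcal/mol)."""
    f = lambda x: 1.0 / (1.0 + np.exp(x))
    # g(dA) is monotonically increasing, so bisection finds the unique root
    g = lambda dA: f(beta * (w_f - dA)).sum() - f(beta * (w_r + dA)).sum()
    lo, hi = -50.0, 50.0      # bracket (kcal/mol); widen for larger |dA|
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

With MM and ANI single-point energies for frames sampled under each potential, this yields the endpoint free energy differences of the indirect cycle.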

APPENDIX A
KABSCH ALGORITHM
The Kabsch algorithm aims to find an orthogonal matrix U that minimizes the objective function

E = (1/2) Σ_{i=1}^{N} ω_i (U y_i − x_i)²    (A-1)

in which {x_i}_{i=1}^{N} and {y_i}_{i=1}^{N} are two sets of N-dimensional vectors, and {ω_i}_{i=1}^{N} is a set of predefined weights. The orthogonality of matrix U imposes the constraint:

Σ_{k=1}^{N} u_ki u_kj = δ_ij    (A-2)

Hence, by the use of the Lagrange multiplier matrix L, the restricted minimization problem turns into:

G = E + F    (A-3)

in which

F = (1/2) Σ_{i,j=1}^{N} l_ij ( Σ_{k=1}^{N} u_ki u_kj − δ_ij )    (A-4)

Following Kabsch's solution53 we can show that U should obey:

U(S + L) = R    (A-5)

in which r_ij and s_ij, the elements of R and S, are defined via:

r_ij = Σ_{k=1}^{N} ω_k y_ki x_kj    (A-6a)

s_ij = Σ_{k=1}^{N} ω_k x_ki x_kj    (A-6b)

The solutions for the Lagrange multipliers can be obtained from equation A-7 and the orthogonality of U:

RᵀR = [U(S + L)]ᵀ U(S + L) = (S + L)ᵀ UᵀU (S + L) = (S + L)(S + L)    (A-7)

RᵀR is symmetric and positive definite, with positive eigenvalues λ_i and corresponding eigenvectors a_i. Similarly, S + L is symmetric and positive definite, with eigenvalues √λ_i and the same eigenvectors a_i, and

l_ij = Σ_{k=1}^{N} √λ_k a_ki a_kj − s_ij    (A-8)

We can write:

R a_k = U(S + L) a_k = √λ_k U a_k = √λ_k b_k    (A-9)

from which the solution for U can be derived:

u_ij = Σ_{k=1}^{N} b_ki a_kj    (A-10)
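In practice the optimal rotation is often obtained through a singular value decomposition, which yields the same U while allowing an explicit guard against improper rotations. A minimal numpy sketch (a generic illustration, not the Amber implementation):

```python
import numpy as np

def kabsch(x, y, w=None):
    """Rotation U minimizing sum_i w_i |U y_i - x_i|^2 over proper
    rotations, for N x 3 coordinate arrays x and y (already centered)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones(len(x)) if w is None else np.asarray(w, float)
    c = (w[:, None] * x).T @ y              # 3x3 weighted correlation matrix
    u_svd, _, vt = np.linalg.svd(c)
    d = np.sign(np.linalg.det(u_svd @ vt))  # -1 would indicate a reflection
    return u_svd @ np.diag([1.0, 1.0, d]) @ vt
```

Applied to NEB image fitting, x and y would be the reference and current image coordinates after removing the centers of mass.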

APPENDIX B
PARAMETERIZATION OF A CURVE

Definition: A parametrized curve in Rⁿ is a function γ : I → Rⁿ from an interval I to Rⁿ.
B.1 Re-parameterization of a Curve

A curve in Rⁿ can be traced out in different ways, going from one parameterization to another. Consider a curve γ : [a, b] → Rⁿ. Also consider a function α : [a′, b′] → [a, b], where α is differentiable with a continuous derivative α′ and α′ ≠ 0. Then α is a one-to-one function from τ ∈ [a′, b′] to t ∈ [a, b] such that α(τ) = t. Now α can be used to re-parameterize the curve γ:

γ(t) = γ(α(τ)) = (γ∘α)(τ) = γ̃(τ),   α′(τ) = dt/dτ    (B-1)

γ̃ is a re-parameterization of γ. Using the chain rule gives

γ̃′(τ) = (dγ̃₁/dτ, ..., dγ̃ₙ/dτ)
       = (γ₁′(t) dt/dτ, ..., γₙ′(t) dt/dτ)    (B-2)
       = α′(τ) (γ₁′(t), ..., γₙ′(t))
       = α′(τ) γ′(t)

B.2 Arclength of a Curve

Assuming a particle moves on a curve γ, its position along the curve can be written as γ(t). Then γ′(t) represents the velocity of the moving particle and |γ′(t)| its speed.

Choosing t0 as the starting point, the arclength of the curve from t0 to t is defined as

s = l(t) = ∫_{t₀}^{t} |γ′(t)| dt    (B-3)

and

ds/dt = |γ′(t)|    (B-4)

Representing the curve as γ(t) = (x₁, ..., xₙ), with γᵢ′(t) = dxᵢ/dt, we get

ds/dt = √( (dx₁/dt)² + ... + (dxₙ/dt)² )    (B-5)

B.3 Arclength Parameterization

Assume a particle moves along a curve γ(t) starting from t₀ ∈ I. Since the speed |γ′(t)| > 0, s = l(t) is an increasing one-to-one function from I to another interval J with

l(t₀) = 0. Now consider the inverse function α = l⁻¹, where α : J → I with α(s) = t. Then γ̃ = γ∘α is the arclength re-parameterization of the curve118, with

α′(s) = 1/l′(t) = 1/|γ′(t)|    (B-6)

From the chain rule we have

γ̃′(s) = γ′(t) α′(s) = γ′(t)/|γ′(t)|    (B-7)

hence

|γ̃′(s)| = 1    (B-8)

This means that whenever a curve is parametrized by its arclength, the speed of the movement always equals one. It follows that

∫_{t₀}^{t} |γ̃′(t)| dt = t − t₀    (B-9)

This means that the arclength of the curve equals the difference between the ending and starting parameter values.
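Numerically, the arclength parameterization above corresponds to resampling a discretized path at equal arclength intervals, the spacing that chain-of-states methods aim to maintain between images. A small numpy sketch (hypothetical helper name):

```python
import numpy as np

def reparam_by_arclength(points, n_out):
    """Resample a polyline (m x d array) to n_out points equally spaced
    in arclength, using linear interpolation per coordinate."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])     # cumulative arclength
    s_new = np.linspace(0.0, s[-1], n_out)          # equal spacing in s
    return np.column_stack(
        [np.interp(s_new, s, points[:, k]) for k in range(points.shape[1])]
    )
```

After resampling, consecutive output points are (to linear-interpolation accuracy) separated by equal arclength, i.e., the discrete analogue of |γ̃′(s)| = 1.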

APPENDIX C
DERIVATION OF EQUATION (2-42)
Using the identity

∫₀^α F′(α) dα = F(α) − F(0)    (C-1)

we can get

F(α) − F(0) = −k_B T [ln Z(α) − ln Z(0)]
            = −k_B T ∫₀^α (∂ ln Z(α)/∂α) dα
            = −k_B T ∫₀^α (1/Z(α)) (∂Z(α)/∂α) dα
            = −k_B T ∫₀^α [ ∫_{S(α)} (∂/∂α) exp(−βV(q)) dσ / ∫_{S(α)} exp(−βV(q)) dσ ] dα
            = −k_B T ∫₀^α [ ∫_{S(α)} (−β ∂V(q)/∂α) exp(−βV(q)) dσ / ∫_{S(α)} exp(−βV(q)) dσ ] dα    (C-2)
            = ∫₀^α ⟨ ∂V(q)/∂α ⟩ dα
            = ∫₀^α ⟨ ∇V · ∂q/∂α ⟩ dα
            = ∫₀^α ⟨ ∇V · t̂ (t̂ · φ_α) ⟩ dα
            = ∫₀^α ⟨ (∇V · t̂) ((t̂ · φ)_α − t̂_α · φ) ⟩ dα

which is equation 2-42.

APPENDIX D
PENALTY METHOD
The penalty method attempts to approximate a constrained optimization problem with an unconstrained problem119,120. A penalty term added to the objective function is responsible for imposing a high cost on any constraint violation. Assume the optimization problem

Minimize { f(x) : x ∈ S }    (D-1)

where f : Rⁿ → R and S ⊆ Rⁿ is the constraint set. The penalty method replaces the optimization problem in equation D-1 by

Minimize { f(x) + c P(x) }    (D-2)

where c, the penalty parameter, is a positive constant, and P is a continuous function with

P(x) ≥ 0 for all x ∈ Rⁿ,   P(x) = 0 ⟺ x ∈ S

If multiple constraints exist, it is advisable to scale them so that each contributes a comparable amount to the penalty function; otherwise, the method will steer toward solutions that satisfy the dominant constraint rather than toward the true minimum. Care must also be taken in choosing the penalty parameter. A small value weakens the penalty function, while a large value creates steep functions at the constraint boundaries, which can cause convergence difficulties unless the search starts from a point close to the local minimum. Moreover, what counts as a large or small penalty parameter depends entirely on the particular optimization problem, which is challenging because the penalized function changes dynamically with the value of x and with which constraints are violated.
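A minimal numeric illustration of the method (a hypothetical one-dimensional problem solved with plain gradient descent): minimize f(x) = (x − 2)² subject to x ≤ 1, with the quadratic penalty P(x) = max(0, x − 1)². The penalized minimizer is (2 + c)/(1 + c), which approaches the constraint boundary x = 1 as c grows.

```python
def penalty_minimize(c, x0=3.0, lr=0.004, steps=500):
    """Gradient descent on f(x) + c*P(x), with f(x) = (x - 2)^2 and the
    constraint x <= 1 enforced by the quadratic penalty P(x) = max(0, x-1)^2."""
    x = x0
    for _ in range(steps):
        # d/dx [ (x-2)^2 + c*max(0, x-1)^2 ]
        grad = 2.0 * (x - 2.0) + 2.0 * c * max(0.0, x - 1.0)
        x -= lr * grad
    return x
```

Note the trade-off discussed above: the step size lr must shrink as c grows, since the penalized function becomes increasingly steep near the boundary.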

APPENDIX E
TWO DIMENSIONAL TEST POTENTIALS
E.1 LEPS Potential

The LEPS14 model gives the energy of a system of three atoms constrained to move along a straight line. Only one bond, either between atoms A and B or between atoms B and C, can be formed. The potential function is

V^LEPS(r_AB, r_BC) = Q_AB/(1+a) + Q_BC/(1+b) + Q_AC/(1+c)
    − [ J²_AB/(1+a)² + J²_BC/(1+b)² + J²_AC/(1+c)²
        − J_AB J_BC/((1+a)(1+b)) − J_BC J_AC/((1+b)(1+c)) − J_AB J_AC/((1+a)(1+c)) ]^{1/2}    (E-1)

Q functions represent electron-nuclei interactions and J functions represent electron-electron exchange interactions.

Q(r) = (d/2) ( (3/2) exp(−2α(r − r₀)) − exp(−α(r − r₀)) )
J(r) = (d/4) ( exp(−2α(r − r₀)) − 6 exp(−α(r − r₀)) )    (E-2)

In the equations above the parameters are

a = 0.05,  b = 0.30,  c = 0.05,
d_AB = 4.746,  d_BC = 4.746,  d_AC = 3.445,    (E-3)
r₀ = 0.742,  α = 1.942
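A direct transcription of equations E-1-E-3 for the collinear case (r_AC = r_AB + r_BC); a sketch useful for plotting the surface or exercising path-finding methods, not code from this work:

```python
import numpy as np

# LEPS parameters from equation E-3
A, B, C = 0.05, 0.30, 0.05
D = {"AB": 4.746, "BC": 4.746, "AC": 3.445}
R0, ALPHA = 0.742, 1.942

def q(r, d):  # Coulomb-type term, equation E-2
    return d / 2.0 * (1.5 * np.exp(-2 * ALPHA * (r - R0)) - np.exp(-ALPHA * (r - R0)))

def j(r, d):  # exchange-type term, equation E-2
    return d / 4.0 * (np.exp(-2 * ALPHA * (r - R0)) - 6 * np.exp(-ALPHA * (r - R0)))

def leps(r_ab, r_bc):
    """Collinear LEPS energy (equation E-1) with r_AC = r_AB + r_BC."""
    r_ac = r_ab + r_bc
    qs = (q(r_ab, D["AB"]) / (1 + A) + q(r_bc, D["BC"]) / (1 + B)
          + q(r_ac, D["AC"]) / (1 + C))
    jab, jbc, jac = j(r_ab, D["AB"]), j(r_bc, D["BC"]), j(r_ac, D["AC"])
    cross = (jab**2 / (1 + A) ** 2 + jbc**2 / (1 + B) ** 2 + jac**2 / (1 + C) ** 2
             - jab * jbc / ((1 + A) * (1 + B))
             - jbc * jac / ((1 + B) * (1 + C))
             - jab * jac / ((1 + A) * (1 + C)))
    return qs - np.sqrt(cross)
```

For example, an A-B bond at its equilibrium length with C far away lies several units below the fully dissociated arrangement, reproducing the two bound valleys of the surface.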

E.2 LEPS Harmonic Oscillator Potential

The LEPS plus harmonic oscillator potential14 represents the energy of a system of four atoms. Atoms A and C are fixed. Atom B is allowed to move on the line connecting A and C and can form only one bond, either with A or with C. Another degree of freedom is introduced, in the form of a harmonic oscillator, by adding an atom D coupled to B. The potential function is

V(r_AB, y) = V^LEPS(r_AB, r_AC − r_AB) + 2k_c ( r_AB − r_AC/2 + y/c )²    (E-4)

The parameters are the same as in the LEPS model except for

r_AC = 3.742,  k_c = 0.2025,  c = 1.154,  b = 0.80    (E-5)

APPENDIX F
ALANINE DIPEPTIDE CONFORMATIONAL CHANGE

[Plot: Φ (Degree) versus Replica ID, 0-16, for pmemd and sander]

Figure F-1. The φ dihedral angle change in conformational transition of alanine dipeptide. The light blue line represents simulations done with the sander and the red line represents simulations done with the pmemd-CPU. The lines in the plot are averaged over individual simulations and the vertical bars represent the standard deviations along the transition region.

[Plot: Ψ (Degree) versus Replica ID, 0-16, for pmemd and sander]

Figure F-2. The ψ dihedral angle change in conformational transition of alanine dipeptide. The light blue line represents simulations done with the sander and the red line represents simulations done with the pmemd-CPU. The lines in the plot are averaged over individual simulations and the vertical bars represent the standard deviations along the transition region.

REFERENCES
[1] Mckee, M. L.; Page, M. Computing Reaction Pathways on Molecular Potential Energy Surfaces; John Wiley & Sons, Ltd, 2007; pp 35–65.
[2] Henkelman, G.; Jóhannesson, G.; Jónsson, H. Theoretical Methods in Condensed Phase Chemistry; Kluwer Academic Publishers: Dordrecht, 2002; pp 269–302.
[3] E, W.; Vanden-Eijnden, E. Towards a Theory of Transition Paths. Journal of Statistical Physics 2006, 123, 503–523.
[4] E, W.; Ren, W.; Vanden-Eijnden, E. Transition pathways in complex systems: Reaction coordinates, isocommittor surfaces, and transition tubes. Chemical Physics Letters 2005, 413, 242–247.
[5] Cerjan, C. J.; Miller, W. H. On finding transition states. The Journal of Chemical Physics 1981, 75, 2800–2806.
[6] Nguyen, D. T.; Case, D. A. On finding stationary states on large-molecule potential energy surfaces. The Journal of Physical Chemistry 1985, 89, 4020–4026.
[7] Quapp, W. A gradient-only algorithm for tracing a reaction path uphill to the saddle of a potential energy surface. Chemical Physics Letters 1996, 253, 286–292.
[8] Henkelman, G.; Jónsson, H. A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives. The Journal of Chemical Physics 1999, 111, 7010–7022.
[9] Malek, R.; Mousseau, N. Dynamics of Lennard-Jones clusters: A characterization of the activation-relaxation technique. Physical Review E 2000, 62, 7723–7728.
[10] Taylor, H.; Simons, J. Imposition of geometrical constraints on potential energy surface walking procedures. The Journal of Physical Chemistry 1985, 89, 684–688.
[11] Baker, J. An algorithm for the location of transition states. Journal of Computational Chemistry 1986, 7, 385–395.
[12] Sevick, E. M.; Bell, A. T.; Theodorou, D. N. A chain of states method for investigating infrequent event processes occurring in multistate, multidimensional systems. The Journal of Chemical Physics 1993, 98, 3196–3212.
[13] Gillilan, R. E.; Wilson, K. R. Shadowing, rare events, and rubber bands. A variational Verlet algorithm for molecular dynamics. The Journal of Chemical Physics 1992, 97, 1757–1772.
[14] Jónsson, H.; Mills, G.; Jacobsen, K. W. Nudged elastic band method for finding minimum energy paths of transitions. Classical and Quantum Dynamics in Condensed Phase Simulations, 1998; pp 385–404.

[15] E, W.; Ren, W.; Vanden-Eijnden, E. String method for the study of rare events. Physical Review B 2002, 66, 052301.
[16] Bergonzo, C.; Campbell, A. J.; Walker, R. C.; Simmerling, C. A partial nudged elastic band implementation for use with large or explicitly solvated systems. International Journal of Quantum Chemistry 2009, 109, 3781–3790.
[17] Herbol, H. C.; Stevenson, J.; Clancy, P. Computational Implementation of Nudged Elastic Band, Rigid Rotation, and Corresponding Force Optimization. Journal of Chemical Theory and Computation 2017, 13, 3250–3259.
[18] Czerminski, R.; Elber, R. Self-avoiding walk between two fixed points as a tool to calculate reaction paths in large molecular systems. International Journal of Quantum Chemistry 1990, 38, 167–185.
[19] Choi, C.; Elber, R. Reaction path study of helix formation in tetrapeptides: Effect of side chains. The Journal of Chemical Physics 1991, 94, 751–760.
[20] Elber, R.; Karplus, M. A method for determining reaction paths in large molecules: Application to myoglobin. Chemical Physics Letters 1987, 139, 375–380.
[21] Crehuet, R.; Field, M. J. A temperature-dependent nudged-elastic-band algorithm. The Journal of Chemical Physics 2003, 118, 9563–9571.
[22] Case, D. et al. AMBER 18; University of California, San Francisco, 2018.
[23] Szabo, A.; Ostlund, N. S. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Dover Publications, 1996; p 466.
[24] Davies, M. et al. Challenges; MDPI AG, 2014; Vol. 5; pp 1–4.
[25] Löwdin, P.-O. Correlation Problem in Many-Electron Quantum Mechanics I. Review of Different Approaches and Discussion of Some Current Ideas; Wiley-Blackwell, 2007; pp 207–322.
[26] Dirac, P. A. M. Quantum Mechanics of Many-Electron Systems. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 1929, 123, 714–733.
[27] Møller, C.; Plesset, M. S. Note on an Approximation Treatment for Many-Electron Systems. Physical Review 1934, 46, 618–622.
[28] Kitaura, K.; Morokuma, K. A new energy decomposition scheme for molecular interactions within the Hartree-Fock approximation. International Journal of Quantum Chemistry 1976, 10, 325–340.
[29] Dalgarno, A.; Victor, G. A. The Time-Dependent Coupled Hartree-Fock Approximation. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 1966, 291, 291–295.

[30] Sellers, B. D.; James, N. C.; Gobbi, A. A Comparison of Quantum and Molecular Mechanical Methods to Estimate Strain Energy in Druglike Fragments. Journal of Chemical Information and Modeling 2017, 57, 1265–1275.
[31] Poater, J.; Solà, M.; Duran, M.; Fradera, X. The calculation of electron localization and delocalization indices at the Hartree-Fock, density functional and post-Hartree-Fock levels of theory. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 2002, 107, 362–371.
[32] Johnson, E. R.; Becke, A. D. A post-Hartree-Fock model of intermolecular interactions. The Journal of Chemical Physics 2005, 123, 024101.
[33] Becke, A. D. A new mixing of Hartree-Fock and local density-functional theories. The Journal of Chemical Physics 1993, 98, 1372–1377.
[34] Ayers, P. W.; Yang, W. Density-Functional Theory. 2003, 103–132.
[35] Runge, E.; Gross, E. K. U. Density-Functional Theory for Time-Dependent Systems. Physical Review Letters 1984, 52, 997–1000.
[36] Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics 2006, 65, 712–725.
[37] MacKerell, A. D.; Banavali, N. K. All-atom empirical force field for nucleic acids: II. Application to molecular dynamics simulations of DNA and RNA in solution. Journal of Computational Chemistry 2000, 21, 105–120.
[38] Purvis, G. D.; Bartlett, R. J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. The Journal of Chemical Physics 1982, 76, 1910–1918.
[39] Bartlett, R. J.; Musiał, M. Coupled-cluster theory in quantum chemistry. Reviews of Modern Physics 2007, 79, 291–352.
[40] Hobza, P.; Šponer, J. Toward True DNA Base-Stacking Energies: MP2, CCSD(T), and Complete Basis Set Calculations. 2002.
[41] Řezáč, J.; Riley, K. E.; Hobza, P. Extensions of the S66 Data Set: More Accurate Interaction Energies and Angular-Displaced Nonequilibrium Geometries. Journal of Chemical Theory and Computation 2011, 7, 3466–3470.
[42] Feller, D.; Dixon, D. A. Extended benchmark studies of coupled cluster theory through triple excitations. The Journal of Chemical Physics 2001, 115, 3484–3496.
[43] Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general amber force field. Journal of Computational Chemistry 2004, 25, 1157–1174.

[44] Karplus, M.; McCammon, J. A. Molecular dynamics simulations of biomolecules. Nature Structural Biology 2002, 9, 646–652.
[45] Karplus, M.; Petsko, G. A. Molecular dynamics simulations in biology. Nature 1990, 347, 631–639.
[46] Liverani, C.; Wojtkowski, M. P. Ergodicity in Hamiltonian Systems; Springer: Berlin, Heidelberg, 1995; pp 130–202.
[47] Walters, P. An Introduction to Ergodic Theory; Springer-Verlag, 2000; p 250.
[48] Compton, A. H.; Heisenberg, W. The Physical Principles of the Quantum Theory; Springer Berlin Heidelberg: Berlin, Heidelberg, 1984; pp 117–166.
[49] Bose, S. N. Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik 1924, 26, 178–181.
[50] Chen, D.; Costello, L. L.; Geller, C. B.; Zhu, T.; McDowell, D. L. Atomistic modeling of dislocation cross-slip in nickel using free-end nudged elastic band method. Acta Mater. 2019, 168, 436–447.
[51] Zhu, T.; Li, J.; Samanta, A.; Leach, A.; Gall, K. Temperature and Strain-Rate Dependence of Surface Dislocation Nucleation. Phys. Rev. Lett. 2008, 100, 025502.
[52] Zhu, T.; Li, J.; Samanta, A.; Kim, H. G.; Suresh, S. Interfacial plasticity governs strain rate sensitivity and ductility in nanostructured metals. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 3031–3036.
[53] Kabsch, W. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 1976, 32, 922–923.
[54] Henkelman, G.; Jónsson, H. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. The Journal of Chemical Physics 2000, 113, 9978–9985.
[55] Bergonzo, C.; Campbell, A. J.; de los Santos, C.; Grollman, A. P.; Simmerling, C. Energetic Preference of 8-oxoG Eversion Pathways in a DNA Glycosylase. Journal of the American Chemical Society 2011, 133, 14504–14506.
[56] Li, H.; Endutkin, A. V.; Bergonzo, C.; Fu, L.; Grollman, A.; Zharkov, D. O.; Simmerling, C. DNA Deformation-Coupled Recognition of 8-Oxoguanine: Conformational Kinetic Gating in Human DNA Glycosylase. Journal of the American Chemical Society 2017, 139, 2682–2692.
[57] Torrie, G.; Valleau, J. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. Journal of Computational Physics 1977, 23, 187–199.
[58] Pechukas, P. Transition state theory. Annual Review of Physical Chemistry 1981, 32, 159–177.

89 [59] Gardiner, C. W. Handbook of stochastic methods for physics, chemistry and the natural sciences, vol. 13 of. Springer series in synergetics 1985, [60] Zwanzig, R. W. HighTemperature Equation of State by a Perturbation Method. I. Nonpolar Gases. The Journal of Chemical Physics 1954, 22, 1420–1426. [61] Ma, S.-K. Statistical Mechanics. World Scientific 1985, [62] Bennett, C. H. Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys. 1976, 22, 245–268. [63] Gao, J.; Freindorf, M. Hybrid ab Initio QM/MM Simulation of N-Methylacetamide in Aqueous Solution. 1997, [64] Gao, J. Absolute free energy of solvation from Monte Carlo simulations using combined quantum and molecular mechanical potentials. J. Phys. Chem. 1992, 96, 537–540. [65] Wesolowski, T.; Warshel, A. Ab Initio Free Energy Perturbation Calculations of Solvation Free Energy Using the Frozen Density Functional Approach. J. Phys. Chem. 1994, 98, 5183–5187. [66] Luzhkov, V.; Warshel, A. Microscopic models for quantum mechanical calculations of chemical processes in solutions: LD/AMPAC and SCAAS/AMPAC calculations of solvation energies. J. Comput. Chem. 1992, 13, 199–213. [67] K¨onig,G.; Hudson, P. S.; Boresch, S.; Woodcock, H. L. Multiscale Free Energy Simulations: An Efficient Method for Connecting Classical MD Simulations to QM or QM/MM Free Energies Using Non-Boltzmann Bennett Reweighting Schemes. J. Chem. Theory Comput. 2014, 10, 1406–1419. [68] Hudson, P. S.; Boresch, S.; Rogers, D. M.; Woodcock, H. L. Accelerating QM/MM Free Energy Computations via Intramolecular Force Matching. J. Chem. Theory Comput. 2018, 14, 6327–6335. [69] Giese, T. J.; York, D. M. Development of a Robust Indirect Approach for MM QM Free Energy Calculations That Combines Force-Matched Reference Potential and Bennett’s Acceptance Ratio Methods. J. Chem. Theory Comput. 2019, acs.jctc.9b00401. [70] Wang, M.; Mei, Y.; Ryde, U. 
HostGuest Relative Binding Affinities at Density-Functional Theory Level from Semiempirical Molecular Dynamics Simulations. J. Chem. Theory Comput. 2019, 15, 2659–2671. [71] Beierlein, F. R.; Michel, J.; Essex, J. W. A Simple QM/MM Approach for Capturing Polarization Effects in ProteinLigand Binding Free Energy Calculations. J. Phys. Chem. B 2011, 115, 4911–4926. [72] Fox, S. J.; Pittock, C.; Tautermann, C. S.; Fox, T.; Christ, C.; Malcolm, N. O. J.; Essex, J. W.; Skylaris, C.-K. Free Energies of Binding from Large-Scale First-Principles

Quantum Mechanical Calculations: Application to Ligand Hydration Energies. J. Phys. Chem. B 2013, 117, 9478–9485. [73] Rod, T. H.; Ryde, U. Accurate QM/MM Free Energy Calculations of Enzyme Reactions: Methylation by Catechol O-Methyltransferase. 2005. [74] Rod, T. H.; Ryde, U. Quantum Mechanical Free Energy Barrier for an Enzymatic Reaction. Phys. Rev. Lett. 2005, 94, 138302. [75] Bishop, C. M. Pattern Recognition and Machine Learning; Springer, 2006. [76] Kline, D. M.; Berardi, V. L. Revisiting squared-error and cross-entropy functions for training neural network classifiers. Neural Comput. Appl. 2005, 14, 310–318. [77] Falas, T.; Stafylopatis, A.-G. The impact of the error function selection in neural network-based classifiers. In IJCNN'99: International Joint Conference on Neural Networks Proceedings (Cat. No. 99CH36339); 1999; pp 1799–1804. [78] Cohn, D. A.; Ghahramani, Z.; Jordan, M. I. Active Learning with Statistical Models. J. Artif. Intell. Res. 1996, 4, 129–145. [79] Fukumizu, K. Statistical active learning in multilayer perceptrons. IEEE Trans. Neural Networks 2000, 11, 17–26. [80] Freund, Y.; Seung, H. S.; Shamir, E.; Tishby, N. Selective Sampling Using the Query by Committee Algorithm. Mach. Learn. 1997, 28, 133–168. [81] Pan, S. J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [82] Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203. [83] Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 2017, 4, 170193. [84] Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. Outsmarting Quantum Chemistry Through Transfer Learning. 2018. [85] Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E.
Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733. [86] Behler, J.; Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 2007, 98, 146401. [87] Chai, J.-D.; Head-Gordon, M. Systematic optimization of long-range corrected hybrid density functionals. J. Chem. Phys. 2008, 128, 084106.

[88] Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self-Consistent Molecular Orbital Methods. XII. Further Extensions of Gaussian-Type Basis Sets for Use in Molecular Orbital Studies of Organic Molecules. J. Chem. Phys. 1972, 56, 2257–2261. [89] Reymond, J.-L. The Chemical Space Project. Acc. Chem. Res. 2015, 48, 722–730. [90] Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875. [91] Fink, T.; Reymond, J.-L. Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery Libraries. 2007. [92] Fink, T.; Bruggesser, H.; Reymond, J.-L. Virtual Exploration of the Small-Molecule Chemical Universe below 160 Daltons. Angew. Chemie Int. Ed. 2005, 44, 1504–1508. [93] Riplinger, C.; Pinski, P.; Becker, U.; Valeev, E. F.; Neese, F. Sparse maps - A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory. J. Chem. Phys. 2016, 144, 024109. [94] Neese, F. The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 73–78. [95] Ghoreishi, D.; Cerutti, D. S.; Fallon, Z.; Simmerling, C.; Roitberg, A. E. Fast Implementation of the Nudged Elastic Band Method in AMBER. J. Chem. Theory Comput. 2019, 15, 4699–4707. [96] Meng, Y.; Sabri Dashti, D.; Roitberg, A. E. Computing Alchemical Free Energy Differences with Hamiltonian Replica Exchange Molecular Dynamics (H-REMD) Simulations. J. Chem. Theory Comput. 2011, 7, 2721–2727. [97] Swails, J. M.; York, D. M.; Roitberg, A. E. Constant pH Replica Exchange Molecular Dynamics in Explicit Solvent Using Discrete Protonation States: Implementation, Testing, and Validation. J. Chem. Theory Comput. 2014, 10, 1341–1352.
[98] Bergonzo, C.; Henriksen, N. M.; Roe, D. R.; Swails, J. M.; Roitberg, A. E.; Cheatham, T. E. Multidimensional Replica Exchange Molecular Dynamics Yields a Converged Ensemble of an RNA Tetranucleotide. J. Chem. Theory Comput. 2014, 10, 492–499. [99] Cruzeiro, V. W. D.; Amaral, M. S.; Roitberg, A. E. Redox potential replica exchange molecular dynamics at constant pH in AMBER: Implementation and validation. J. Chem. Phys. 2018, 149, 072338. [100] Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone

Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. [101] Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926–935. [102] Li, H.; Endutkin, A. V.; Bergonzo, C.; Campbell, A. J.; de los Santos, C.; Grollman, A.; Zharkov, D. O.; Simmerling, C. A dynamic checkpoint in oxidative lesion discrimination by formamidopyrimidine–DNA glycosylase. Nucleic Acids Res. 2016, 44, 683–694. [103] Chekmarev, D. S.; Ishida, T.; Levy, R. M. Long-Time Conformational Transitions of Alanine Dipeptide in Aqueous Solution: Continuous and Discrete-State Kinetic Models. J. Phys. Chem. B 2004, 108, 19487–19495. [104] Nguyen, H.; Roe, D. R.; Simmerling, C. Improved Generalized Born Solvent Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9, 2020–2034. [105] Mathews, D. H.; Case, D. A. Nudged elastic band calculation of minimal energy paths for the conformational change of a GG non-canonical pair. J. Mol. Biol. 2006, 357, 1683–1693. [106] Mongan, J.; Simmerling, C.; McCammon, J. A.; Case, D. A.; Onufriev, A. Generalized Born Model with a Simple, Robust Molecular Volume Correction. 2006. [107] Cheng, X.; Kelso, C.; Hornak, V.; de los Santos, C.; Grollman, A. P.; Simmerling, C. Dynamic Behavior of DNA Base Pairs Containing 8-Oxoguanine. 2005. [108] Song, K.; Hornak, V.; de los Santos, C.; Grollman, A. P.; Simmerling, C. Computational Analysis of the Mode of Binding of 8-Oxoguanine to Formamidopyrimidine-DNA Glycosylase. 2006. [109] Song, K.; Campbell, A. J.; Bergonzo, C.; de los Santos, C.; Grollman, A. P.; Simmerling, C. An Improved Reaction Coordinate for Nucleic Acid Base Flipping Studies. J. Chem. Theory Comput. 2009, 5, 3105–3113. [110] Le Grand, S.; Götz, A. W.; Walker, R. C.
SPFP: Speed without compromise – A mixed precision model for GPU accelerated molecular dynamics simulations. Comput. Phys. Commun. 2013, 184, 374–380. [111] Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004, 25, 1157–1174. [112] Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 1992, 13, 1011–1021.

[113] Jorgensen, W. L.; Gao, J. Cis-trans energy difference for the peptide bond in the gas phase and in aqueous solution. J. Am. Chem. Soc. 1988, 110, 4212–4216. [114] Wu, Y.; Tepper, H. L.; Voth, G. A. Flexible simple point-charge water model with improved liquid-state properties. J. Chem. Phys. 2006, 124, 024503. [115] Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. Interaction Models for Water in Relation to Protein Hydration. Intermolecular Forces 1981, 331–342. [116] Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. The missing term in effective pair potentials. J. Phys. Chem. 1987, 91, 6269–6271. [117] Frisch, M. J. et al. Gaussian 09, Revision D.01; Gaussian, Inc.: Wallingford, CT, 2013. [118] Shahshahani, S. Differential and Integral Calculus. 2008. [119] Di Pillo, G.; Grippo, L. Exact Penalty Functions in Constrained Optimization. SIAM J. Control Optim. 1989, 27, 1333–1360. [120] Tessema, B.; Yen, G. A Self Adaptive Penalty Function Based Algorithm for Constrained Optimization. 2006; pp 246–253.

BIOGRAPHICAL SKETCH
Delaram Ghoreishi was born in Tehran, Iran. She attended the Farzanegan middle and high schools, administered under the National Organization for Development of Exceptional Talents. She received her Bachelor of Science degree in physics from the Sharif University of Technology in 2013. Delaram began her graduate studies in the Department of Physics at the University of Florida in August 2013 and graduated in December 2019 with a Doctor of Philosophy degree. Her research involved implementing and validating methods in the Amber suite of biomolecular simulation programs.
