
THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS & SCIENCES

MACHINE LEARNED FORCE FIELDS

By

COLE SHERIDAN

A Thesis submitted to the Department of Scientific Computing in partial fulfillment of the requirements for graduation with Honors in the Major

Degree Awarded: Fall, 2020


The members of the Defense Committee approve the thesis of Cole Sheridan defended on November 20, 2020.

Dr. Chen Huang Thesis Director

Dr. Wei Yang Outside Committee Member

Dr. Albert DePrince Committee Member

Dr. Xiaoqiang Wang Committee Member


Acknowledgements

I would like to express my sincerest thanks to Dr. Chen Huang, for his continuous support and encouragement, and his exceptional leadership throughout the course of this research. Without his assistance, this project may have never been realized. It has been an honor and a pleasure to do research with you throughout my undergraduate career. I would also like to express my gratitude to my defense committee, Dr. Chen Huang, Dr. Xiaoqiang Wang, Dr. Wei Yang, and Dr. Albert De Prince. While the circumstances surrounding the time of this prospectus may have caused difficulties in meeting, the continued support throughout this endeavor is greatly appreciated. Furthermore, a sincere thanks to Florida State University (FSU) and the FSU Department of Scientific Computing for providing the foundational education and resources required to complete this research project. Finally, a big thank you to my family and friends, who helped provide emotional support throughout these trying times.


Table of Contents

List of Figures
List of Tables
Abstract
Chapter 1. Introduction to Density Functional Theory
    1.1 Density Functional Theory
    1.2 Gaussian Basis
    1.3 Force Calculation
    1.4 Introduction to NWChem Program
Chapter 2. Introduction to Molecular Dynamics
    2.1 Basics of Molecular Dynamics Simulation
    2.2 Force Fields
Chapter 3. Introduction to Neural Networks
    3.1 Background
    3.2 ANN Force Fields
Chapter 4. Machine-Learned Force Fields for C-X Molecules and Radicals
    4.1 Neural Networks for Describing Heterogeneous Molecular Systems
    4.2 Training Set
    4.3 Training the Neural Network
Chapter 5. Machine Learning for a Simple Three-Dimensional Molecule
    5.1 Neural Networks for Describing Three-Dimensional Molecules
    5.2 Application to Cyanopolyyne
    5.3 Results and Discussion
Chapter 6. Conclusion
References


List of Figures

Figure 1.1 NWChem input file
Figure 2.1 Flowchart of processes for performing molecular dynamics simulations
Figure 3.1 Python code for a neural network using matrices
Figure 4.1 ANN for a diatomic system with two hidden layers
Figure 4.2 Python code for reproducing NWChem input files
Figure 4.3 Job submission script for C2
Figure 4.4 Graph of DFT energy vs. bond length
Figure 4.5 Graph of partial charge vs. bond length
Figure 4.6 Outputs of the original neural network for DFT energies
Figure 4.7 Outputs of the original neural network for partial charges
Figure 4.8 Error from different initialization seeds for ANNs with two hidden layers
Figure 4.9 Error from different initialization seeds for ANNs with three hidden layers
Figure 4.10 Errors in calculation from different ANN sizes
Figure 5.1 Structure of a cyanopolyyne molecule
Figure 5.2 Results of training with three MD points
Figure 5.3 Results of training with 30 MD points
Figure 5.4 Results of training with 110 MD points


List of Tables

Table 5.1 Input of the cyanopolyyne molecule at time step 90


Abstract

Studying molecules through their potential energy surfaces using molecular dynamics has greatly advanced our understanding of chemical and physical processes in many exciting systems. To perform these simulations, we need atomic forces. Accurate atomic forces can be calculated using ab initio methods (such as density functional theory). However, simulations based on ab initio methods are computationally costly. To reduce the cost, force fields that describe the interactions between atoms are often used. The accuracy of the simulations is then determined by the accuracy of the force fields. Machine learning has become a promising method for generating force fields with an accuracy close to ab initio methods. In particular, artificial neural networks (ANN) have been shown to be an efficient and adaptable method for generating accurate force fields. One prominent challenge for ANN force fields is to represent different chemical elements in heterogeneous systems. In this thesis, we develop a new ANN force field that is capable of recognizing different chemical elements by using atomic numbers as additional descriptors. Our new method is demonstrated by generating ANN force fields for C-X systems in a one-dimensional space, where X stands for H, O, N, and C atoms. Afterwards, ANN methods are used to calculate the molecular force field of a cyanopolyyne molecule in three-dimensional space based on the partial energy of each atom in the molecule.


Chapter 1. Introduction to Density Functional Theory

1.1 Density Functional Theory

To solve quantum systems, we often adopt the Born-Oppenheimer approximation, meaning that the nuclei and electrons of molecules in the system are treated separately due to the large difference in mass between the nuclei and the electrons [1]. The system's time-independent wave function can then be solved from

$$\hat{H}\Psi = E\Psi,$$

which gives

$$\left(\hat{T} + \hat{V}_{ext} + \hat{V}_{ee}\right)\Psi = E\Psi.$$

Above, $\hat{H}$ is the system's Hamiltonian, $E$ is the electronic energy, $\hat{T}$ is the kinetic energy, $\hat{V}_{ext}$ is the potential energy from the nuclei, and $\hat{V}_{ee}$ is the electron-electron interaction energy. $\hat{T}$ and $\hat{V}_{ee}$ are the same for all N-electron systems, while $\hat{V}_{ext}$ depends on the system. The above equation is an eigenvalue problem with eigenvalue $E$ and eigenvector $\Psi$. For systems that have many electrons, the equation is very difficult to solve due to the electron-electron interaction term $\hat{V}_{ee}$.

Density Functional Theory (DFT) is a method for efficiently calculating the electronic structure of materials and molecules by condensing the wave function into the electron density. DFT is based on the Hohenberg-Kohn (HK) theorem [2]. The HK theorem proves that a non-degenerate quantum system is fully determined by its electron density, given as

$$n(\mathbf{r}) = N \int \left|\Psi(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N)\right|^2 d\mathbf{r}_2 \cdots d\mathbf{r}_N,$$

where $N$ is the number of electrons, $\Psi$ is the system's wave function, and $\mathbf{r}_i$ is the coordinate of electron $i$. In other words, the HK theorem proves that the external potential, $v_{ext}(\mathbf{r})$, is determined by the system's electron density, $n(\mathbf{r})$, and therefore the electron density determines the ground-state wave function and all other electronic properties [3]. On the other hand, the total energy of a quantum system can be obtained by minimizing the energy with respect to its many-electron wave function,

$$E = \min_{\Psi} \left\langle \Psi \right| \hat{T} + \hat{V}_{ext} + \hat{V}_{ee} \left| \Psi \right\rangle.$$

Since the external potential and the wave function are determined by the electron density, the HK theorem also proves that the total energy is a functional of the electron density,

$$E[n] = F[n] + \int v_{ext}(\mathbf{r}) \, n(\mathbf{r}) \, d\mathbf{r},$$


where $F[n]$ is a universal functional of the density and the second term depends on the system. It is often difficult, if not impossible, to find $F[n]$, which has resulted in the development of various methods for performing DFT [3]. One such method is Kohn-Sham (KS) DFT [4], which is considered one of the most popular methods available for simulating quantum systems. Under KS-DFT, the electronic structure of the system is evaluated from the potential acting on the system's electrons, the sum of the external and effective potentials. This allows a material with N electrons to be represented by N one-electron KS equations [4]. The KS equation is a single-electron Schrödinger equation for non-interacting electrons that move in a local effective potential, $v_{KS}(\mathbf{r})$ (also known as the KS potential [4]). KS-DFT then seeks the KS orbitals, $\phi_i(\mathbf{r})$, by solving the following differential equation

$$\left(-\frac{1}{2}\nabla^2 + v_{KS}(\mathbf{r})\right)\phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}),$$

where $\epsilon_i$ is the orbital energy of $\phi_i$. The system's electron density is calculated as

$$n(\mathbf{r}) = \sum_i f_i \left|\phi_i(\mathbf{r})\right|^2,$$

where $f_i$ is the occupation number of orbital $i$. The total energy of the system in KS-DFT can be expressed as a functional of the density as

$$E[n] = T_s[n] + \int v_{ext}(\mathbf{r}) \, n(\mathbf{r}) \, d\mathbf{r} + E_H[n] + E_{xc}[n],$$

where $T_s[n]$ is the kinetic energy, $v_{ext}$ is the external potential, $E_H[n]$ is the Hartree energy, and $E_{xc}[n]$ is the exchange-correlation (XC) energy. The KS potential is obtained by taking the functional derivative of the total energy with respect to the electron density,

$$v_{KS}(\mathbf{r}) = v_{ext}(\mathbf{r}) + \frac{\delta E_H[n]}{\delta n(\mathbf{r})} + \frac{\delta E_{xc}[n]}{\delta n(\mathbf{r})}.$$

One key element in KS-DFT is the XC energy functional, $E_{xc}[n]$, which needs to be approximated in practice.

1.2 Gaussian Basis

In order to efficiently represent and solve the KS orbitals, KS-DFT requires a basis set of functions, often comprised of either atomic orbitals or plane waves. The most commonly implemented basis set, and the one used in this study, is the Gaussian basis set. Gaussian type orbitals (GTO), from which the Gaussian basis set is derived, are a set of functions


that, because of the Gaussian product theorem, may speed up the calculation of integrals by four to five orders of magnitude compared to traditional Slater orbitals [5]. This speedup may be attributed to the fact that the product of two GTOs centered on two different atoms is equivalent to a finite sum of Gaussians centered on the axis that connects the two atoms. When represented in Cartesian coordinates, a primitive GTO is defined as

$$\phi_{GTO}(x, y, z) = N \, x^{k} y^{m} z^{n} e^{-\alpha\left(x^2 + y^2 + z^2\right)},$$

where $k$, $m$, and $n$ form the Cartesian vector representation of the angular momentum, $\alpha$ controls the width of the Gaussian, and $N$ is a normalization constant. However, primitive GTOs on their own cannot represent an atomic orbital well, as they lack the correct cusp at the nucleus and decay too quickly at long range [5]. Therefore, the atomic orbitals must be approximated using multiple GTOs combined into a contracted Gaussian (CG) such that

$$\phi_{CG}(\mathbf{r}) = \sum_{p=1}^{P} c_p \, \phi_{GTO,p}(\mathbf{r}),$$

where $P$ is the number of GTOs and $c_p$ are the contraction coefficients. During the KS-DFT calculations, the $c_p$ are fixed. The minimal basis set is STO-3G, which uses three GTOs to represent each atomic orbital, though this may be extended further into double, triple, or quadruple zeta basis sets, which use two, three, or four CGs to represent each atomic orbital, respectively [5].
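As a concrete illustration of a contracted Gaussian, the short sketch below evaluates an s-type contraction. The exponents and contraction coefficients are the published STO-3G values for the hydrogen 1s orbital and are used here purely as an example; this is not code from this thesis.

```python
import numpy as np

def primitive_gto_s(r, alpha):
    """Normalized s-type primitive Gaussian: N * exp(-alpha * r^2)."""
    norm = (2.0 * alpha / np.pi) ** 0.75
    return norm * np.exp(-alpha * r**2)

# Standard STO-3G exponents and contraction coefficients for hydrogen 1s.
alphas = np.array([3.42525091, 0.62391373, 0.16885540])
coeffs = np.array([0.15432897, 0.53532814, 0.44463454])

def contracted_gaussian_s(r):
    """Contracted Gaussian: a fixed linear combination of primitive GTOs."""
    return sum(c * primitive_gto_s(r, a) for c, a in zip(coeffs, alphas))

r = np.linspace(0.0, 4.0, 5)  # distances from the nucleus, in Bohr
print(contracted_gaussian_s(r))
```

The contraction coefficients stay fixed during the KS-DFT calculation; only the coefficients multiplying whole contracted functions are optimized.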

1.3 Force Calculation

To derive the forces in the context of DFT, we first write down the total energy, which is a functional of the electron density and the atomic positions $\{\mathbf{R}_i\}$,

$$E\left[n, \{\mathbf{R}_i\}\right] = T_s[n] + E_H[n] + E_{xc}[n] + \int v_{ext}(\mathbf{r}) \, n(\mathbf{r}) \, d\mathbf{r} + \sum_{i<j}^{N_a} \frac{Z_i Z_j}{\left|\mathbf{R}_i - \mathbf{R}_j\right|},$$

where $Z_i$ is the nuclear charge of atom $i$ and $\mathbf{R}_i$ is the coordinate of atom $i$. $N_a$ is the number of atoms. The force on atom $i$ is obtained as

$$\mathbf{F}_i = -\frac{dE}{d\mathbf{R}_i} = -\int \frac{\delta E}{\delta n(\mathbf{r})} \frac{\partial n(\mathbf{r})}{\partial \mathbf{R}_i} \, d\mathbf{r} - \frac{\partial E}{\partial \mathbf{R}_i} = -\mu \int \frac{\partial n(\mathbf{r})}{\partial \mathbf{R}_i} \, d\mathbf{r} - \frac{\partial E}{\partial \mathbf{R}_i},$$

where $\mu = \delta E / \delta n$ is the system's chemical potential. Taking into consideration that the first term on the right-hand side of the above equation is zero due to the conservation of the electron number, and the fact that the KS kinetic energy $T_s$, the exchange-correlation energy, and the Hartree energy do not depend explicitly on $\{\mathbf{R}_i\}$, the force on atom $i$ can be rewritten as

$$\mathbf{F}_i = -\int n(\mathbf{r}) \frac{\partial v_{ext}(\mathbf{r})}{\partial \mathbf{R}_i} \, d\mathbf{r} - \frac{\partial}{\partial \mathbf{R}_i} \sum_{j \neq i} \frac{Z_i Z_j}{\left|\mathbf{R}_i - \mathbf{R}_j\right|}.$$


In the case that the basis set depends on the atoms' positions, an additional force, known as the Pulay force, needs to be calculated.

1.4 Introduction to NWChem Program

In this thesis, the NWChem program is used for the KS-DFT calculations. NWChem is an ab initio software package designed for high-performance parallel computing, with quantum chemical and molecular dynamics capabilities. As of the latest release, version 7.0.0, NWChem is capable of performing molecular mechanics, molecular dynamics, Hartree-Fock, density functional theory, time-dependent density functional theory, post-Hartree-Fock methods, QM/MM, and ONIOM. NWChem is appealing for its scalability and ability to use parallel computing resources. We use version 6.6.0 of NWChem and run the DFT computations in parallel on the Florida State University high-performance computing supercomputer. NWChem input files are generated using the Avogadro program, a molecular visualization software package designed for cross-platform use that allows for the creation of molecules or molecule chains with energy-minimized geometries. In addition to performing energy minimization, Avogadro supports NWChem and can generate input files for it. Further information about Avogadro may be found elsewhere [6].

Figure 1.1 NWChem input file for performing a DFT calculation for CH with a bond length of 0.7 Angstroms. All other input files follow a similar structure, though it should be noted that C2 and CO have a multiplicity of one due to having zero total magnetic moment, while the CH and CN radicals have a multiplicity of two (set on line 19) due to having one unpaired electron.


As seen in Figure 1.1, the NWChem input files required for this study are quite simple, requiring only 30 lines. In its simplest form, an NWChem input file requires a molecular geometry, a basis set, and a task.

The start or restart directive is not required in the input file, but it is recommended in order to tell NWChem whether the file is a new calculation, requiring a new database to be generated, or a repeat of a completed calculation, meaning the old database can be reused. If neither directive is specified, NWChem will infer which to use based on the name of the input file and whether a database for the molecule already exists. An additional string variable may be specified after the start/restart directive; if given, all files generated by NWChem will use that string as a prefix to the file name.

The title directive is also optional. It specifies a string of text, either in brackets or in quotation marks (the latter required if the string contains white space), that is assigned to all tasks or jobs that follow until it is redefined, allowing the user to identify which job or jobs are associated with a database. In the case of our study, it is imperative to use the restart directive to ensure that each NWChem job starts from the output of the previous job, allowing for a more stable KS-DFT calculation and a smoother energy curve. As a result, all input files for a given molecule or radical in our training set share the same title, since the restart directive requires the NWChem jobs to share their title.

The charge directive specifies how many electrons to subtract (for a positive charge) or add (for a negative charge) to the system. By default, the molecular charge in NWChem is zero. The charge directive need not be specified for either of our training sets, but it is included to adhere to best practice.

The final optional directive appearing in our input files is the echo directive. It is a single-word directive with no arguments, and it may appear anywhere in the file as long as it occupies its own line. Including the echo directive causes the contents of the input file to be echoed into the log file of the NWChem job, which helps troubleshoot any complications that may occur when running many simulations.

As for the mandatory directives, the geometry directive specifies the molecular geometry for the simulation, with three key components: keywords on the first line, including units and print or noprint with x, y, and z as arguments, which control which coordinates are written to the log file; the symmetry information, autosym or noautosym, which specifies whether NWChem should detect the symmetry of the geometric system or not; and the Cartesian coordinates of the atoms in the molecule. Lattice parameters may also be specified for periodic systems.

The basis directive allows the user to specify a basis set or basis library to be used in the NWChem calculations. Defining a basis library tells NWChem to perform the simulation using one of its standard built-in basis sets. Basis sets may be defined for specific atoms using their atomic symbol, for all atoms using an asterisk, or for all atoms except certain ones using an asterisk and the keyword except followed by the atomic symbols.
In this study, we use the def2-tzvpd basis set for our one-dimensional case, a valence triple-zeta polarization basis with additional diffuse functions, in the hope that the increased range over NWChem's default basis set may allow for greater stability. For the three-dimensional training set, the default 6-31G* basis set is utilized for the sake of time. The DFT module allows for the calculation of closed- and open-shell densities and KS orbitals using the Gaussian basis set. While a plethora of directives is available in the DFT module to allow for certain calculations, for this study we are interested only in the XC


functional and the multiplicity. The XC functional is set using the xc directive; in our study, we use the B3LYP functional for its accuracy in DFT and MD calculations. The multiplicity of the system is defined using the mult directive. The multiplicity is defined as 2S+1, where S is the total spin of the system. For closed-shell molecules, such as the cyanopolyyne molecule, there is no unpaired electron, resulting in a singlet ground state (a total spin of 0, that is, S = 0, which gives a multiplicity of 1). However, the radicals in our study have one unpaired electron, resulting in a total spin of 1/2, that is, a multiplicity of 2.

Additionally, for this study the cgmin parameter is specified, which minimizes the KS-DFT energy directly instead of using the traditional method in which the KS-DFT equation is solved as a fixed-point equation in a self-consistent manner. Direct minimization is often slower than solving a fixed-point equation, but it provides a more stable approach by ensuring that the total energy decreases with each step. It should also be noted that the maximum number of iterations has been increased to 5000 in the DFT module, as the molecules and radicals in our training set failed to converge within NWChem's default of 30 iterations.

The properties module serves a similar purpose to the DFT module, allowing the user to make specifications for the NWChem simulation. Once again, the full extent of the properties module is beyond the scope of this review; for the purposes of our study, we need only the mulliken directive, which performs Mulliken population analysis in order to obtain the partial charges on atoms.

Finally, the task directive is a single-line directive requiring a theory and an operation to be defined, an operation being a specific calculation within the scope of the theory. It should be noted that the dft and property keywords in the task directive are not the same as the DFT and properties modules; the property keyword in the task directive specifies that NWChem will calculate the properties of the wave function. Further explanations of NWChem and its input files can be found in the NWChem documentation [7].
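Putting these directives together, a minimal input file of the kind described might look like the sketch below. This is an illustration assembled from the directives discussed above, not a reproduction of the actual input files; the restart name and bond length are example values, and the layout of the 30-line input shown in Figure 1.1 may differ.

```
echo
restart CH
title "CH"
charge 0

geometry units angstroms noautosym
  C 0.000 0.000 0.000
  H 0.700 0.000 0.000
end

basis
  * library def2-tzvpd
end

dft
  xc b3lyp
  mult 2
  cgmin
  iterations 5000
end

property
  mulliken
end

task dft energy
task dft property
```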


Chapter 2. Introduction to Molecular Dynamics

2.1 Basics of Molecular Dynamics Simulation

Molecular dynamics (MD) is a computational method for simulating the movement of atoms and molecules in systems for which an analytical treatment is not possible due to the vast number of atoms. A brief illustration of the process of MD simulations is given in the figure below.

Figure 2.1 Flowchart of processes for performing MD simulations.

The concept of performing MD simulations is considered to derive from the early 1950s, with principles built on Monte Carlo methods whose roots trace to the eighteenth century [8]. The study of N-body systems and the time evolution of such systems dates to the seventeenth century, with Newton's studies, and continued studies during that period, of celestial bodies and the mechanics and stability of the solar system [8]. Many of the methods developed during this period are still used in MD simulations today. Work on studying N-body systems using analog computers dates as early as 1941, when a study was performed to model atomic motions with physical models in hopes of replicating the structure of liquids and examining their behavior [8]. With the advent of digital computers, Fermi and coworkers reported in 1955 the use of MANIAC I to compute the time evolution of the equations of motion for a many-body system, though the first realistic simulation of


matter occurred half a decade later, in 1960, when Gibson et al. simulated radiation damage in solid copper using a Born-Mayer interaction [9]. Today, many software packages are available for performing MD simulations, including AMBER, LAMMPS, and NWChem. For this study, the MD simulations are performed using NWChem, in which the Newtonian equations of motion are integrated using the Verlet method, and NVT simulations are performed. Verlet integration is often used for MD simulations, as it is time-reversible and conserves the system's total energy (kinetic energy plus potential energy) in NVE simulations. Newton's equation of motion is

$$m_k \frac{d^2 \mathbf{r}_k(t)}{dt^2} = \mathbf{F}_k = -\nabla_{\mathbf{r}_k} V\left(\mathbf{r}_1, \ldots, \mathbf{r}_N\right),$$

in which $t$ is time, $\mathbf{r}_k(t)$ is the position vector of atom $k$ at time $t$, $V$ is the scalar potential and is a function of the atoms' coordinates, $\mathbf{F}_k$ are the atomic forces, given by the negative gradients of the potential $V$, and $m_k$ is the mass of particle $k$. The velocity Verlet method, the most used form of the Verlet method, is given below:

$$\mathbf{r}(t + \Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\Delta t + \frac{1}{2}\mathbf{a}(t)\,\Delta t^2,$$

$$\mathbf{v}(t + \Delta t) = \mathbf{v}(t) + \frac{1}{2}\left[\mathbf{a}(t) + \mathbf{a}(t + \Delta t)\right]\Delta t,$$

in which $\Delta t$ is the time step and $\mathbf{a}$ is the acceleration.

There are three main ensembles under which molecular dynamics simulations are often performed: NVE, NPT, and NVT, known as the microcanonical ensemble, the isothermal-isobaric ensemble, and the canonical ensemble, respectively, the last of which is used for this study. For the NVE ensemble, the number of particles (N), volume (V), and total energy (E) of the system do not change, representing an adiabatic process, one without an exchange of heat with the environment. For the NPT ensemble, the number of particles (N), pressure (P), and temperature (T) of the system are conserved. The NPT ensemble is considered ideal for performing simulations that replicate a laboratory setting with an open flask at ambient temperature and pressure. Finally, for the NVT ensemble, the number of particles (N), volume (V), and temperature (T) of the system are conserved. The NVT ensemble relies on a thermostat, which allows the generation of a statistical ensemble giving the probability of each microstate at a constant temperature. There is a plethora of thermostats to choose from for the NVT ensemble, each adding or removing energy to or from the system in order to maintain a constant temperature, but doing so in more or less realistic ways depending on the thermostat. The four most popular thermostats for the NVT ensemble are the Nosé-Hoover thermostat, the Berendsen thermostat, the Andersen thermostat, and Langevin dynamics. The Nosé-Hoover thermostat was originally developed by Nosé, who introduced an extra degree of freedom into the Hamiltonian of the system to account for a heat bath that maintains the system's temperature, and was further improved by Hoover. The Berendsen thermostat rescales the velocities of the atoms to maintain the desired temperature. The Andersen thermostat maintains


the temperature by reassigning atoms' velocities according to the Maxwell-Boltzmann distribution. Langevin dynamics is a mathematical approach that mimics the friction and thermal fluctuations of a heat bath while maintaining a constant temperature through a damping constant and a noise term. Note that none of these thermostats produces the dynamics of the true NVT ensemble, because they artificially modify the dynamics of the atoms.
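As an illustration of the velocity Verlet scheme given above, the following minimal Python sketch integrates a single particle in a one-dimensional harmonic well. The potential, mass, and time step are arbitrary example choices, not values used in this work.

```python
import numpy as np

def velocity_verlet(force, m, r, v, dt, n_steps):
    """Integrate Newton's equation of motion with the velocity Verlet scheme."""
    a = force(r) / m
    traj = [r]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt**2   # position update
        a_new = force(r) / m               # force at the new position
        v = v + 0.5 * (a + a_new) * dt     # velocity update
        a = a_new
        traj.append(r)
    return np.array(traj)

# Example: harmonic oscillator V(r) = 0.5 k r^2, so F(r) = -k r.
k, m = 1.0, 1.0
traj = velocity_verlet(lambda r: -k * r, m, r=1.0, v=0.0, dt=0.05, n_steps=500)
print(traj[:5])
```

Because the update is symmetric in time, running it with a negated time step retraces the trajectory, which is the time-reversibility property mentioned above.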

2.2 Force Fields

A force field is an analytical mathematical expression of the energy of a system as it relates to the coordinates of the molecules and/or atoms within it. A system's force field is composed of the interatomic potential energy between the molecules in the system and parameters related to the form of the system. The parameters for the system are typically generated from ab initio calculations or experimental data; in this report, we use KS-DFT to calculate these parameters. While there is a multitude of force fields available, the expression typically consists of the intramolecular (local) contributions to the total energy plus the repulsion, van der Waals (vdW), and electrostatic interactions in the system, as in the case below:

$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{inv} + E_{vdW} + E_{elec},$$

where the bond stretching, $E_{bond}$, bond angle bending, $E_{angle}$, dihedral angle torsion, $E_{torsion}$, inversion energy/improper torsion, $E_{inv}$, vdW energy, $E_{vdW}$, and electrostatic energy, $E_{elec}$, take forms such as

$$E_{bond} = \sum_{bonds} \frac{1}{2} k_b \left(r - r_0\right)^2,$$

$$E_{angle} = \sum_{angles} \frac{1}{2} k_\theta \left(\theta - \theta_0\right)^2,$$

$$E_{torsion} = \sum_{torsions} \frac{1}{2} V_n \left[1 + \cos\left(n\phi - \phi_0\right)\right],$$

$$E_{inv} = \sum_{inversions} \frac{1}{2} k_\omega \left[1 + \cos\left(\omega - \omega_0\right)\right],$$

$$E_{vdW} = \sum_{i<j} V_{LJ}\left(r_{ij}\right), \qquad E_{elec} = \sum_{i<j} \frac{q_i q_j}{4 \pi \epsilon_0 \, r_{ij}}.$$

The force field equation may take many forms. In the form demonstrated above, a harmonic potential is considered for the bond stretching and bond angle bending terms, while other functions, such as the Morse potential, could be used to improve accuracy. Harmonic functions are much less expensive in terms of computing and are reasonably good approximations of bond stretching. It should be noted, however, that harmonic potentials are poor approximations for bond lengths more than 10% away from the equilibrium value, and they prevent the study of chemical processes, as a harmonic bond cannot be broken. Angle bending can also be represented by a harmonic potential, a trigonometric potential, or the Urey-Bradley potential [10]. Both the angle bending


and bond stretching are typically not central to the properties of interest, which is why a rigid structure may be used as a substitution [11]. Dihedral and torsional terms are required for molecules containing four or more atoms; as a result, these terms are not taken into account during the one-dimensional phase of this work. Dihedral and torsional potentials help determine the local structure of the molecule and its relative stability, which is important for calculating the symmetry functions in a three-dimensional environment. VdW forces arise from the dynamical polarization between two objects and are often modeled by the Lennard-Jones (LJ) potential. VdW forces act between atoms belonging to different molecules in the system, and sometimes between sufficiently separated atoms within the same molecule, though the latter does not occur in this study. The LJ potential is defined by the depth of the potential well, $\epsilon$, the distance between particles, $r$, and either the distance at which the potential is zero, $\sigma$, or the distance at which the potential reaches its minimum, $r_m$. The expression for the LJ potential is

$$V_{LJ}(r) = 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],$$

in which the $(\sigma/r)^{12}$ term represents the Pauli repulsion, and the $(\sigma/r)^{6}$ term represents the attractive vdW forces at long range.
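As a sketch only, the LJ form above can be evaluated directly; the reduced units (epsilon = sigma = 1) and the sample distances below are arbitrary example values.

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """LJ potential: V(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

r = np.linspace(0.9, 3.0, 8)
print(lennard_jones(r))  # repulsive at short range, weakly attractive beyond

# The minimum sits at r_m = 2^(1/6) * sigma, where V(r_m) = -epsilon.
r_m = 2.0 ** (1.0 / 6.0)
print(r_m, lennard_jones(r_m))
```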


Chapter 3. Introduction to Neural Networks

3.1 Background

Artificial neural networks (ANN) are a series of algorithms made to replicate the decision-making processes of the human brain, allowing computers to interpolate information learned from a training set of data in a manner that can be used to make inferences about an unknown test set of data. ANNs consist of neurons, or nodes, arranged in multiple layers, including an input layer, an output layer, and some number of hidden layers in between. Data is fed into the nodes of the input layer, each with a predetermined weight and bias. Each node then calculates its output using an activation function, which maps the input to a certain threshold: the node fires if the input meets or exceeds the threshold and fails to fire if it does not. There are many activation functions available, such as the sigmoid function, hyperbolic tangent, sine function, and binary step. In this study, the hyperbolic tangent is used as the activation function. While for the most part the choice of activation function in the input and hidden layers of an ANN does not matter, it is suggested that a linear activation function be used for the final, output layer of the ANN to allow for an open range of outputs. Once the input is processed and each node either fires or fails to fire, the output of each node is fed into the nodes of the next layer, iteratively, until the final output is obtained from the node, or nodes, in the output layer. Mathematically, for an example ANN with a single hidden layer, with $M$ input nodes, $N_h$ nodes in the hidden layer, and one node in the output layer, the output of the ANN may be expressed as

$$y = b^{(2)} + \sum_{j=1}^{N_h} w_{j}^{(2)} \, f\!\left(b_j^{(1)} + \sum_{i=1}^{M} w_{ij}^{(1)} x_i\right),$$

where $w_{ij}^{(k)}$ is the weight connecting node $j$ in layer $k$ with node $i$ in layer $k-1$, $b_j^{(k)}$ is the bias associated with node $j$ in layer $k$, $b^{(2)}$ is the bias given to the output layer (or $b^{(x)}$ for an ANN with $x$ layers), and $f$ is the activation function associated with the hidden layer. An ANN can also be written compactly using a series of matrix multiplications. Consider an ANN with $M$ inputs, organized in an $M \times 1$ vector and denoted by $\mathbf{y}_0$. The weights of each layer are represented by matrices $W_i$ of size $N_i \times N_{i-1}$, where $N_i$ and $N_{i-1}$ are the numbers of nodes in layer $i$ and layer $i-1$ of the ANN, respectively. Also denote the biases on the nodes of layer $i$ as a vector $\mathbf{b}_i$. The output $\mathbf{y}_i$ of each layer can be written as

$$\mathbf{y}_i = f_i\left(W_i \mathbf{y}_{i-1} + \mathbf{b}_i\right), \qquad i = 1, 2, \ldots,$$


where $f_i$ is the activation function for the nodes on layer $i$.

The first attempt at artificially simulating the decision-making hierarchy of the human brain using artificial neurons can be attributed to McCulloch and Pitts in 1943. McCulloch and Pitts' artificial neuron used an activation threshold to determine whether, based on the inputs to the neuron, the neuron would yield an output of zero, failing to fire or remaining inactive, or one, firing and stimulating its neighbors [12]. Rosenblatt created the first artificial network arranging neurons into an input and an output layer, called the perceptron, fifteen years later in 1958. While Rosenblatt's perceptron worked for simple logical functions, Minsky and Papert introduced the hidden layer in 1969 to further expand the set of functions that could be represented by an artificial neural network. While the hidden layers increased the scope of neural networks, they came with the drawback of substantial difficulty in optimizing the weights and connections of the nodes between the input and hidden layers of the network [13]. The optimization issue was resolved by Paul Werbos in 1974 with the introduction of backpropagation as a method to teach the neural network. During the same year, Little proposed a non-linear activation function to allow a continuous range of outputs for the perceptron instead of the traditional binary output [12].

ANNs learn information from a training set of data. During supervised or semi-supervised ANN training, the data within the training set is fed into the neural network, and the output from the ANN is compared to the expected output for the training data. The nodes are then recursively updated with weights and biases that correct errors in the ANN's output relative to the expected output, through the process of backpropagation. During the training and backpropagation process, the weights and biases of the ANN are not fixed; rather, each value in the training set has a fixed input and expected output. Once the input has gone through the ANN and an output is obtained, the gradient of the cost function, usually a cross-entropy or squared-error loss function, with respect to the weights of the network is calculated using the ANN's output, the expected output, and the chain rule. This can be utilized to find optimal weights and biases by minimizing the cost function with respect to the weights and biases, which will be touched upon more later. It is also possible to train ANNs without a training set, in a process called unsupervised learning. However, due to the difficulty of finding errors in the ANN without a training set, unsupervised learning has a limited scope of use and is not utilized in this study.

It is inefficient for the backpropagation algorithm to calculate the cost gradient for each weight separately, due to duplicate calculations and the need to recompute intermediate results at each stage. To maximize efficiency, the backpropagation algorithm calculates the gradient with respect to the weighted input of each layer L, backwards through the ANN, starting at the output layer. What allows this to work for backpropagation is the fact that the weights in each layer may only affect the loss through their effect on the next layer, meaning this gradient is the only data required to calculate the gradient of the weights for layer L, and this may be repeated for each preceding layer, directly computing the impact of the weights on the output and avoiding the need to recompute derivatives in subsequent layers of the network.
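To make the backward pass concrete, the following minimal sketch trains a single-hidden-layer network with a tanh activation, a linear output, and a squared-error loss on synthetic one-dimensional data. It is an illustrative example, not the code used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 1)), np.zeros((8, 1))  # layer 1: 1 input -> 8 hidden
W2, b2 = rng.normal(size=(1, 8)), np.zeros((1, 1))  # layer 2: 8 hidden -> 1 output

x = np.linspace(-1, 1, 50).reshape(1, -1)           # training inputs
t = np.sin(np.pi * x)                               # expected outputs

lr = 0.01
for _ in range(2000):
    # Forward pass.
    y1 = np.tanh(W1 @ x + b1)
    y2 = W2 @ y1 + b2                               # linear output layer
    # Backward pass: gradient at the output layer, then at the hidden layer.
    d2 = y2 - t                                     # dLoss/dz2 for squared error
    d1 = (W2.T @ d2) * (1.0 - y1**2)                # chain rule through tanh
    # Gradient-descent updates, averaged over the training points.
    W2 -= lr * d2 @ y1.T / x.shape[1]
    b2 -= lr * d2.mean(axis=1, keepdims=True)
    W1 -= lr * d1 @ x.T / x.shape[1]
    b1 -= lr * d1.mean(axis=1, keepdims=True)

print(np.mean((y2 - t) ** 2))                       # final training error
```

Note that the hidden-layer gradient d1 is computed entirely from the output-layer gradient d2, which is exactly the layer-by-layer reuse described above.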


It should be noted that while increasing the number of nodes and hidden layers allows for a better regression fit of the data by the ANN, a balance must be maintained in the number of layers and nodes in order to prevent underfitting or overfitting the data. Overfitting is a substantial issue for ANNs, especially in fitting functions, and is similar to the error arising in polynomial curve fitting: past a certain point, in the case of ANNs a certain number N of nodes in each hidden layer at which the training error is minimized, the error gradually increases as a result of overfitting the data and a poor fit being generated. Behler suggests the use of a 3-50-50-50-1 ANN as the average ANN size that would be used in most molecular studies [14]. In addition, ANNs are limited in scope to the data in the training set, meaning that neural networks often yield substantial errors when used to extrapolate to data that is not similar to the training set.

For this study, MATLAB is used to generate the ANNs for its simplified black-box approach to neural network configuration, as this study is interested in demonstrating how ANNs can be used to accurately calculate molecular force fields and in addressing the overfitting problem in these simulations, rather than in manually constructing ANNs. There are many software packages that allow for the creation, training, and testing of ANNs, such as TensorFlow [15], PyTorch [16] (on which the code shown below is modeled), and the MATLAB deep learning toolbox [17]; information on each of these packages may be found elsewhere. Another main advantage of MATLAB's deep learning toolbox is that it drastically decreases the time needed to debug the program. For comparison, consider constructing an ANN manually in Python using NumPy matrices: an ANN class using matrix multiplication to perform the feed-forward process for our study is mathematically expressed as

$$\mathbf{y}_1 = f_1\left(W_1 \mathbf{x} + \mathbf{b}_1\right), \qquad \mathbf{y}_2 = f_2\left(W_2 \mathbf{y}_1 + \mathbf{b}_2\right), \qquad \mathbf{y}_3 = W_3 \mathbf{y}_2 + \mathbf{b}_3,$$

where $W_n$, $\mathbf{b}_n$, and $f_n$ are the weights, biases, and activation function corresponding to layer $n$, respectively. The input vector, $\mathbf{x} = [B, A]$, and output vector, $\mathbf{y}_3 = [E, q]$, will be described later, though it is important to note that the input and output are $2 \times 1$ vectors in this example. In order to save the weights and biases from the training of the ANN and re-use them in the testing phase, two additional matrices, Z1 and Z2, are introduced, independent of the ANN, that store the weights and biases, with dimensions set by $N_1$ and $N_2$, the numbers of nodes in hidden layers one and two of the ANN, respectively; the second half of the stored vectors is stored and obtained in a similar fashion from Z2. The code for this neural network class is shown below.


Figure 3.1 Python code that represents a neural network using matrices, based on the PyTorch neural network model.
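The listing referred to in Figure 3.1 appears as an image in the original document. As a stand-in, the following is a minimal sketch of what such a matrix-based class might look like, assuming tanh activations on the hidden layers and a linear output layer; the class and method names are hypothetical.

```python
import numpy as np

class NeuralNetwork:
    """Sketch of a two-hidden-layer feed-forward network, [B, A] -> [E, q]."""

    def __init__(self, n1, n2, seed=0):
        rng = np.random.default_rng(seed)
        # Weight matrices and bias vectors for each layer (2 -> n1 -> n2 -> 2).
        self.W1, self.b1 = rng.normal(size=(n1, 2)), np.zeros((n1, 1))
        self.W2, self.b2 = rng.normal(size=(n2, n1)), np.zeros((n2, 1))
        self.W3, self.b3 = rng.normal(size=(2, n2)), np.zeros((2, 1))

    def forward(self, x):
        """Feed-forward pass; x is a 2 x 1 input vector [B, A]."""
        y1 = np.tanh(self.W1 @ x + self.b1)
        y2 = np.tanh(self.W2 @ y1 + self.b2)
        return self.W3 @ y2 + self.b3       # linear output layer -> [E, q]

    def save(self):
        """Store the trained weights and biases for re-use in testing,
        in the spirit of the Z1/Z2 arrays described in the text."""
        return {"W1": self.W1, "b1": self.b1, "W2": self.W2,
                "b2": self.b2, "W3": self.W3, "b3": self.b3}

net = NeuralNetwork(n1=25, n2=25)
print(net.forward(np.array([[1.5], [6.0]])))  # bond length B, atomic number A
```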

3.2 ANN Force Fields

The first studies determining the potential energy surface (PES) of a system using an ANN date to 1992, when Sumpter and Noid used an ANN to study the vibrational spectrum of polyethylene and its relationship to the PES of the system [18]. In that study, a force field consisting of 18 parameters was used, with each parameter varied and the corresponding vibrational spectrum calculated. The ANN was trained on this information, using the vibrational spectrum, discretized into 426 parts, as the input, with 18 output nodes corresponding to the force field parameters [12]. Fischer, Peterson, and Lüthi then, in 1995, demonstrated the use of an ANN to predict force constants as a function of molecular structure by replicating the constants obtained from empirical rules. The first use of an ANN trained on electronic structure calculations was presented in the same year by Doren and coworkers [19].


Today, ANNs are explored as a potential means to speed up ab initio calculations of molecular properties without sacrificing accuracy, as is typically done with approximate methods. Machine learning methods have already found application in successfully predicting reaction pathways, calculating formation energies, calculating atomic forces, and predicting quantum mechanical molecular energies, with speed-ups of up to five orders of magnitude [12]. However, machine learning algorithms have yet to be successfully applied to complex chemical environments, such as encompassing all degrees of freedom of organic compounds. According to Behler, machine learning models must adhere to three principles in order to ensure energy conservation and to be useful: they must be rotationally and translationally invariant, the exchange of two identical atoms must yield the same result, and the model must describe a molecule's conformation in a unique way for a given set of atoms and their positions [14].

Within the last two decades, Behler has performed significant research on the use of machine learning as a means to quickly generate accurate PESs for molecules that would otherwise prove computationally demanding for traditional ab initio simulations. In 2007, Behler proposed that, given that the total energy of the system can be represented as a sum of its atomic contributions, a series of standard neural networks, one per atom in the system and dependent on the coordinates of the atoms in the system, could be used to calculate the individual energy contribution of each atom and thus the total energy of the system [9]. In order for the set of neural networks to work, and to prevent variance in the energy as a result of interchanging two atoms, all neural networks within the ML system must share the same structure, weights, and biases.

There are two main concerns that have to be addressed when it comes to training neural networks for molecular structures. The first is the tendency of a neural network to become biased toward certain rotations or positions. This means that the neural network may consider a molecule rotated about one or more of its axes to be an entirely different molecule from its unrotated counterpart, and therefore produce different energies for the same molecule. This problem may be solved via the use of symmetry functions, which map the molecules to a uniform coordinate basis, eliminating issues of directionality. By doing so, two structures with different energies will have different symmetry function values, identical local environments yield the same values, the values are not affected by translation or rotation of the system, and the number of symmetry functions is independent of the coordination number of an atom. Behler's symmetry functions only consider the energetically relevant local environment, utilizing a cutoff function, $f_c(R_{ij})$, for which interatomic separations, $R_{ij}$, larger than a cutoff radius, $R_c$, are considered to have zero value [14]. The symmetry function or functions used, denoted by $G_i$, do not have to be unique, so long as the set of values is suitable for describing the environment. Examples include a radial symmetry function, summing Gaussians with parameters $\eta$ and $R_s$:

$$G_i^{rad} = \sum_{j \neq i} e^{-\eta \left(R_{ij} - R_s\right)^2} f_c\left(R_{ij}\right),$$

and an angular term for triplets of atoms with parameters $\zeta$, $\lambda$, and $\eta$:

$$G_i^{ang} = 2^{1-\zeta} \sum_{j,k \neq i} \left(1 + \lambda \cos\theta_{ijk}\right)^{\zeta} e^{-\eta\left(R_{ij}^2 + R_{ik}^2 + R_{jk}^2\right)} f_c\left(R_{ij}\right) f_c\left(R_{ik}\right) f_c\left(R_{jk}\right),$$

where $\theta_{ijk}$ is the angle centered on atom $i$.
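The following minimal Python sketch evaluates these descriptors for a single atom, assuming Behler's standard cosine cutoff function; all parameter values are arbitrary examples rather than values used in this work.

```python
import numpy as np

def f_cutoff(r, rc):
    """Behler's cosine cutoff: smoothly decays to zero at the cutoff radius rc."""
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_sf(r_ij, eta, r_s, rc):
    """Radial symmetry function: a sum of Gaussians over neighbor distances."""
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * f_cutoff(r_ij, rc))

def angular_sf(r_ij, r_ik, r_jk, cos_theta, zeta, lam, eta, rc):
    """Angular symmetry function contribution from one (j, k) triplet around i."""
    return (2.0 ** (1 - zeta) * (1 + lam * cos_theta) ** zeta
            * np.exp(-eta * (r_ij**2 + r_ik**2 + r_jk**2))
            * f_cutoff(r_ij, rc) * f_cutoff(r_ik, rc) * f_cutoff(r_jk, rc))

# Example: three neighbor distances (in Angstroms) around one atom.
r_ij = np.array([1.2, 2.3, 4.1])
print(radial_sf(r_ij, eta=0.5, r_s=0.0, rc=6.0))
print(angular_sf(1.2, 1.3, 2.1, cos_theta=0.4, zeta=1.0, lam=1.0, eta=0.05, rc=6.0))
```

Because each descriptor is built only from interatomic distances and angles, rotating or translating the whole structure leaves the G values, and hence the ANN inputs, unchanged.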


In addition to requiring a set of symmetry functions that uniquely describe the configuration of the system, ANN PESs require sufficiently large training sets in order to correctly describe the physics of the system and to avoid spurious extrema. Given that MD simulations only require the energetically accessible segments of configuration space to be represented accurately, it has been suggested to sample only low-energy regions for the training set. Artrith and Behler suggested that the ANN itself may be exploited to reveal poorly sampled regions. According to the pair, the ANN will predict similar energies for structures that are similar to the training set; at the same time, poorly sampled regions in the testing set will result in significantly different energies and forces being computed by the ANN, indicating that those structures require additional ab initio calculations to be added to the training set [13]. Behler suggests that molecular structures beyond the range of the extrapolation capabilities of the neural network may be found by comparing each atom's symmetry function vector with the minimum and maximum values of the symmetry functions found within the training set. Systems containing an atom or atoms whose vector has a value falling outside the range of the training set require additional training in order for the neural network to make valid predictions of the system's properties.

Neural networks may also overfit the training set, causing poor fitting of structures in between the sampled values of the symmetry functions. Overfitting in the neural network may be detected by comparing the errors found in the training and testing sets of the neural network. If the errors on the two sets are similar, the neural network has good fitting properties; however, if the errors differ, it suggests that the neural network is overfitting the training set and additional data is required in the training set. When altering the training set and iterating, the difference between the training and testing errors will ideally diminish until a certain point in the iterative process where further changes cause the difference in the errors to begin increasing again. The optimal set is the one produced just before the difference in the errors begins to increase.


Chapter 4. Machine-Learned Force Fields for C-X Molecules and Radicals

4.1 Neural Networks for Describing Heterogeneous Molecular Systems

For the simple diatomic cases considered in the one-dimensional study, atoms are presented to the ANNs using their Cartesian coordinates. Given that the diatomic systems in this study exist in a one-dimensional space and all feature carbon as the central element, there is no concern about rotational or translational variance in representing the system, as each system is uniquely identified by the bond length connecting the atoms and the atomic number of the non-central atom, denoted by X. Using MATLAB's deep learning toolbox, the ANNs may be represented in 2-N-N-2 and 2-N-N-N-2 notation, with N being the size of the hidden layers. The ANNs have inputs B and A, where B is the bond length and A is the atomic number of the non-central atom, and outputs E and q, where E is the DFT energy of the system in Hartree and q is the partial charge of the non-central atom. The ANNs can be visualized as:

Figure 4.1 ANN for a diatomic system with two hidden layers.

where $f_i$ is the activation function for layer $i$, $w_{1j}^{(i)}$ and $b_{1j}^{(i)}$ are the weights and biases for the first input of node $j$ of layer $i$, and $w_{2j}^{(i)}$ and $b_{2j}^{(i)}$ are the weights and biases for the second input, respectively. While this is not a concern for the one-dimensional case, in the three-dimensional case additional considerations must be taken into account for rotational and translational invariance and for the representation of the molecule(s) being studied.


For our study, MATLAB’s deep learning toolbox was used to demonstrate the application of ANNs to calculate molecular force fields and to examine the overfitting problem of ANNs. For our study, six ANNs were created, one for each molecule/radical in our training set, and then one which incorporated all molecules and radicals in our training set, to determine the viability of representing molecular force fields with ANNs. While Behler’s suggestion of using a 3-50-50-50- 1 ANN was a sound starting point for our ANNs, due to the use of simple molecules and radicals in this study, this large ANN scheme leads to drastic overfitting of the training data. Therefore, at the start of the study, our ANNs had a 2-25-25-2 configuration. After initial runs, and determination of the viability of the ANNs, further analysis was done on what the appropriate size for the hidden layers, and how many hidden layers should be within the ANNs, testing for both two and three hidden layers from 5 to 25 nodes in each layer, and comparing the error yielded from each. To prevent bias in the training of the neural networks, a Monte Carlo simulation was used for both two layer and three layer 2-N-N-2 and 2-N-N-N-2 ANNs, where N = [5,10,15,20,25], using different seeds for the randomized initialization of the weights and biases. The seed that resulted in the lowest error during training and testing phases was then used to test for the best size ANN.

4.2 Training Set

For the training set used to optimize the ANNs' weights and biases, we utilized four C-X systems, where X stands for C, O, N, and H atoms. The reason for using carbon-based configurations for the training set is fairly apparent given the scope of studying organic molecules and their dependency on carbon as the central element. One of the most important properties of carbon is that it has four valence electrons in its outer shell, meaning that carbon may form four chemical bonds. As a result, carbon is capable of having single, double, and/or triple bonds with a plethora of other elements, and thanks to its relatively small size, having only six electrons, carbon easily fits into most molecular configurations, unlike other elements of group 14 of the periodic table, such as silicon. Carbon is capable of forming bonds with most, if not all, elements on the periodic table, and may create rings or chains of atoms of various sizes, not only creating many molecules, but creating the same configuration of atoms with varying bond angles and molecular shapes, allowing for the formation of isomers with different molecular properties. It is because of carbon's role as a fundamental component of organic chemistry and its versatility in creating molecules that we consider these C-X molecules and radicals for the training set of our ANNs.

The four systems used in our training set are dicarbon (C2), the methylidyne radical (CH), carbon monoxide (CO), and the cyano radical (CN). These systems were chosen for the training set simply because they are diatomic, meaning that calculations and simulations of the molecules in the training set may be done in an efficient manner, taking a matter of seconds per simulation, and the ANNs can forgo the use of symmetry functions to uniquely represent the different configurations in the training set. The training set only focuses on fitting DFT energies and partial charges with respect to the bond lengths; it does not consider many-body interactions, nor does it consider bond angles and dihedral angles.


• C2 is an inorganic chemical gas and the second simplest form of carbon (apart from a single carbon atom), composed of two carbon atoms connected by a double bond; C2 is kinetically unstable at standard temperature and pressure (STP).

• CH is a radical composed of a carbon atom with a single bond to a hydrogen atom. Due to the unpaired valence electron, the CH radical is a highly reactive gas.

• CO, at STP, is an odorless, tasteless, and flammable gas, known for its toxicity to animals with hemoglobin. The carbon and oxygen atoms are connected by a triple bond.

• CN is a radical and one of the first molecules to be detected in the interstellar medium.

For our training set, we obtain KS-DFT energies for these systems using NWChem as the bond length varies from 0.5 to 8.1 Angstroms for each of the compounds. Because the bond length is increased by only 0.2 Angstroms with each simulation, it would be inefficient to create all NWChem input files by hand. For this reason, a simple Python code has been developed to replicate the NWChem input files based on a single original file for each compound:

Figure 4.2 Python code for reproducing the NWChem input files used for the training set.
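The listing in Figure 4.2 is an image in the original document; a minimal sketch consistent with the description that follows (templates named CX.nw, a replacex placeholder, and bond lengths from 0.5 to 8.1 Angstroms in 0.2-Angstrom steps) might look like this:

```python
import sys

molecule = sys.argv[1]                      # e.g. "CH" reads the template CH.nw

with open(molecule + ".nw") as f:
    template = f.read()                     # contains the marker 'replacex'

# Bond lengths from 0.5 to 8.1 Angstroms in steps of 0.2 (39 input files).
for count in range(1, 40):
    bond = 0.5 + 0.2 * (count - 1)
    with open(f"{molecule}{count}.nw", "w") as out:
        out.write(template.replace("replacex", f"{bond:.1f}"))
```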


The ReplicateAngstrom.py Python script (Figure 4.2) depends on a CX.nw template file and takes the molecular formula as an input argument. There are four CX.nw input files, CC.nw, CH.nw, CO.nw, and CN.nw, which are similar to the file found in Figure 1.1, with the exceptions that the molecule matches that of the title and that the bond length in the chemical structure (where 0.7 appears in Figure 1.1) is replaced with the term replacex. The Python code takes the content of the template file for the specified molecule and creates a new file for each replication, replacing the term replacex with the length of the bond. This is repeated in a loop until all bond lengths are covered. It should also be noted that because CH and CN are radicals, molecules with one unpaired electron, they are calculated with a multiplicity of two, while C2 and CO have a multiplicity of one. The new files generated by the Python code are labeled by the molecule and a numerical count, such that CX1.nw has bond length 0.5 A, CX2.nw has bond length 0.7 A, and so forth. These jobs are submitted via a shell script which executes each job independently using the chemical name and count of the input files, and creates a matching NWChem log file of the form logCX# for the respective molecule and count.

Figure 4.3 Job submission script for C2.

A final Python code was written to extract the DFT energies and partial charges from the NWChem log files and store them in a .csv file for use by the neural network during training and testing.
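The extraction script itself is not shown in the text; a minimal sketch might look like the following, under the assumptions that the log files follow the logCX# naming convention described above and that the total energy appears on NWChem's "Total DFT energy" line (partial charges would be parsed analogously from the Mulliken analysis block).

```python
import csv
import re

rows = []
for count in range(1, 40):
    bond = 0.5 + 0.2 * (count - 1)
    with open(f"logCH{count}") as f:
        log = f.read()
    # Take the last reported energy in case the log contains several.
    energy = re.findall(r"Total DFT energy\s*=\s*(-?\d+\.\d+)", log)[-1]
    rows.append([f"{bond:.1f}", energy])

with open("CH_training.csv", "w", newline="") as f:
    csv.writer(f).writerows([["bond_length", "energy"]] + rows)
```

After performing the simulations, the following energies were found for the bond lengths studied: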


Figure 4.4 Total energy versus bond length for C2, CH, CO, and CN.

Figure 4.5 Partial charge on the non-central atom vs bond length for C2, CH, CO, and CN.


4.3 Training the Neural Network

Before using ANNs to calculate configuration energies of organic molecules consisting of carbon, hydrogen, and oxygen, preliminary work was done to configure the ANN model and ensure that it operates as expected in the simplest-case scenario. In this preliminary case, multiple ANNs were created to train on configuration energies with carbon as the central atom of diatomic molecules and radicals: one for each separate compound, and one for all compounds together. Carbon was studied bonded with each of the four elements C, H, O, and N independently, resulting in C-C, C-H, C-O, and C-N. For each diatomic configuration, the central atom is placed at the Cartesian coordinate [0, 0, 0], with the bonded outer atom located at Cartesian coordinates [x, 0, 0], where x varies between 0.5 and 8.1 angstroms from the central atom, with a spacing of 0.2 angstroms between NWChem simulations, as noted in the previous section.

In this preliminary training, the ANNs take an input vector consisting of two parts: the position of the outer bonded atom and the atomic number of the outer atom. Originally, 2-25-25-2 ANNs were used, later expanded to 2-25-25-25-2, and then further explored for the optimal ANN size. Because the diatomic system varies only along one axis, rotational and translational variance are not considered to have an effect on the ANNs. It should be addressed that in the following work, dealing with compounds in a three-dimensional space, a symmetry function set is often required to keep the ANNs from becoming biased and considering two identical molecular systems, one rotated to some degree about any axis and/or translated, to be two different systems with two different configuration energies.

For the preliminary training, the training set is that from the previous section, with the testing set having the same diatomic configurations but with half the spacing between consecutive bond lengths. The goal of this preliminary training is to ensure that the ANNs can accurately calculate DFT energies and partial charges within the scope of the basic preliminary set. Once the ANN is successful in this task, more complicated training sets with a broader scope of organic molecules may be considered for the study. An additional consideration in the simplest-case training is solving the overfitting problem of ANNs, so as to provide the minimal error possible when determining the molecular force field using ANNs. To accomplish this, neural networks with both two and three hidden layers were tested in a Monte Carlo simulation, varying the random number generator's seed used to initialize the weights and biases in training. These ANNs were tested over the considered sizes with 50 random seeds each, ranging from 0 to 5000, and were of size 2-N-N-2 and 2-N-N-N-2, with N = [5, 10, 15, 20, 25]. After obtaining the resulting errors in training and testing the ANNs, the seed with the lowest total error in calculation was used to test for the best size N for both two- and three-layer ANNs, with N = [5, 10, 15, 20, 25].

For the original training of a single ANN representing all molecules with a 2-25-25-2 configuration, the following results were produced, with X representing the ANN output and circles representing the expected outputs from the training set:


Figure 4.6 Outputs of the neural network for total energies for the training set.

Figure 4.7 Outputs of the neural network for partial charges for the training set. Note that CC is on a scale of $10^{-4}$.


with a maximum error of approximately $2.2381 \times 10^{-4}$ Hartree for the energies and 0.0015e for the partial charges. This relatively small error is already enough to demonstrate that, at least in the simplest case, ANNs may be used to calculate molecular force fields, and that a single ANN can be used for all molecules in our one-dimensional case. However, it may already be seen in the 2-25-25-2 case that the ANN has a slight overfitting problem. To correct for this, and to determine what size of ANN was most appropriate, a Monte Carlo simulation was first run to determine whether the random number generator seed used to initialize the neural network had any effect on the resulting error during the training process. It was found that the seed had very little effect on the resulting error, and that it was negligible which seed was used for testing different ANN sizes, resulting in seed 2042 being chosen at random.
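The seed and size sweep described above was performed with MATLAB's deep learning toolbox. Purely as an illustration of the procedure, the sketch below reproduces the idea in Python with scikit-learn on synthetic stand-in data; this is an assumed substitute, not the thesis code.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: rows of [bond length B, atomic number A]; Y: rows of [energy E, charge q].
# Synthetic data stands in for the NWChem training set here.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.5, 8.1, 200), rng.choice([1, 6, 7, 8], 200)])
Y = np.column_stack([np.sin(X[:, 0]), 0.01 * X[:, 1]])

best = None
for n in [5, 10, 15, 20, 25]:               # hidden-layer sizes to test
    for seed in rng.integers(0, 5000, 5):   # a few random initialization seeds
        net = MLPRegressor(hidden_layer_sizes=(n, n), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=int(seed))
        net.fit(X, Y)
        err = np.max(np.abs(net.predict(X) - Y))
        if best is None or err < best[0]:
            best = (err, n, int(seed))

print("lowest max error %.3e with N=%d, seed=%d" % best)
```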

Figure 4.8 Error in calculations from Monte Carlo simulation of different initialization seeds for ANNs for two hidden layers. Partial charge has been shortened to P.c.


Figure 4.9 Error in calculations from Monte Carlo simulation of different initialization seeds for ANNs for three hidden layers. Partial charge has been shortened to P.c.

When determining the appropriate ANN size to prevent both underfitting and overfitting of the DFT energy and partial charge calculations, two separate simulations were run simultaneously, examining two-hidden-layer and three-hidden-layer ANNs, and the error resulting from each was graphed in Figure 4.10 for comparison. Originally, consideration was given to testing ANNs with between 1 and 25 nodes in the hidden layers. However, the range of considered values was reduced to 5 through 25, as the errors for the first few values of N were far too large to compare conveniently when graphed. Testing determined that the best two-hidden-layer ANN over all points was 2-15-15-2, with an error of 3.3390 × 10⁻⁴ Hartree for energies and 0.0015e for partial charges. The best three-hidden-layer ANN over all points was 2-13-13-13-2, with an error of 5.2728 × 10⁻⁴ Hartree for energies and 0.0020e for partial charges, even though 2-22-22-2 and 2-20-20-20-2 achieved the minimum of the maximum error at any single point.
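A corresponding sketch of this size scan, reusing the hypothetical train_and_test stand-in from the previous sketch with the seed fixed at 2042:

    # Sweep the hidden-layer width N for the two- and three-hidden-layer
    # architectures and keep the size with the smallest total error.
    results = {}
    for N in range(5, 26):
        results[("2-hidden", N)] = train_and_test(2042, layout=(2, N, N, 2))
        results[("3-hidden", N)] = train_and_test(2042, layout=(2, N, N, N, 2))
    best_architecture = min(results, key=results.get)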


Figure 4.10 Maximum error at any point for calculations from different ANN sizes by the number of nodes in the hidden layers, denoted by N.


Chapter 5. Machine Learning for A Simple Three-Dimensional Molecule

5.1 Neural Networks for Describing Three-Dimensional Molecules

In the previous chapter, we demonstrated how to parameterize energies and dipoles for molecules that were essentially one-dimensional. That case was simple because, at most, only the atomic numbers of the atoms and the bond length between them are required to uniquely describe the molecule. The same cannot be said for multi-atomic molecules or those that exist in three-dimensional space. Beyond the atomic numbers of the atoms and the lengths of the bonds connecting them, one must also consider bond angles, the rotation and location of the molecule in space, and, in more complex simulations involving multiple molecules, the interatomic forces and bonds, including hydrogen bonds, attractive forces, and any covalent or metallic bonds, among other factors contributing to the total energy of the molecule or system.

To represent a molecule in three-dimensional space, symmetry functions must be utilized to properly describe the symmetries of the molecule. In Behler and Parrinello's work, two types of symmetry functions are proposed: radial symmetry functions and bond-angle (angular) symmetry functions. The radial symmetry functions define which atoms around a central atom affect the calculation of that atom's energy [20]. While all atoms in the system have some effect on the energy of every other atom, at a certain distance this effect is small enough to be considered zero. The point at which the contribution to the energy is considered zero is determined by a cutoff radius, Rc, usually between 2 and 6 angstroms depending on the complexity of the system and the desired accuracy. However, for simple molecules, such as the cyanopolyyne molecule used in this study, the use of symmetry functions may be forgone because the molecule is linear.

The radial symmetry functions alone are not enough to obtain a full representation of the atomic environment, as direction is not considered. To improve the representation, angular symmetry functions are also used. Behler and Parrinello proposed a series of angular symmetry functions, once again with parameters left to the user's discretion. The angular symmetry functions depend on the angle, θ, describing the connection of the central atom, i, with atoms j and k within Rc, and they approach zero as the distance between any two atoms forming the bond angle approaches and passes Rc, though problems may arise when representing non-central atoms in the bond due to the summation across the system [20]. Utilizing these symmetry functions, we can consider the atomic energy of atom i as a function of the positions of the atoms encompassed within a certain radius around atom i; since these positions cannot be supplied to the ANN directly, they are expressed through a unique representation, {Gi}.
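For concreteness, a minimal Python sketch of a Behler-Parrinello-style radial symmetry function is given below, using the Gaussian form with the standard cosine cutoff [20]; eta, Rs, and Rc are user-chosen parameters, as discussed above.

    import numpy as np

    def f_cut(R, Rc):
        # Cosine cutoff function: decays smoothly to zero at the cutoff
        # radius Rc and is exactly zero beyond it.
        return np.where(R < Rc, 0.5 * (np.cos(np.pi * R / Rc) + 1.0), 0.0)

    def G_radial(R_ij, eta=1.0, Rs=0.0, Rc=6.0):
        # Radial symmetry function for one atom i: a sum of Gaussians over
        # the distances R_ij to all neighbors j inside the cutoff sphere.
        R_ij = np.asarray(R_ij, dtype=float)
        return float(np.sum(np.exp(-eta * (R_ij - Rs) ** 2) * f_cut(R_ij, Rc)))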


Recall, though, that the symmetry functions used to represent the molecule are left to the discretion of the user.

An additional concern when representing multi-atomic molecules is that the total energy of the molecule is the sum of the partial energies of each atom within the molecule. While it is theoretically possible to train a single ANN to represent the entire molecule based on the symmetry functions, this would restrict the ANN to one and only one molecule. For a more consistent result, and to allow the ANNs to be utilized for multiple differing molecules, it is suggested to use one ANN for each atom within the molecule, with the sum of the outputs being the total energy of the molecule. While the former method could have been used here, since this study considers only one molecule, the latter was used to stay consistent with best practice in the field.

5.2 Application to Cyanopolyyne

In what follows, we use cyanopolyyne, which allows us to avoid symmetry functions because cyanopolyyne is a linear molecule.

Figure 5.1 Structure of a cyanopolyyne molecule

Each atom can be represented by the bond lengths of the bonds on each side of the atom (using 0 where a bond is not present) and the bond angle at the atom. Therefore, each atom is represented by a 3 × 1 input vector. Table 5.1 shows such a representation: the first row is the bond length to the "left" of the atom in angstroms, the second row is the bond angle in radians, and the third row is the bond length of the bond to the "right" of the atom in angstroms.

Atom                          H        C        C        C        N
Left bond length (Angstrom)   0.0000   2.1208   2.3126   2.5484   2.2550
Bond angle (rad)              0.0000   3.0767   2.4679   2.7990   0.0000
Right bond length (Angstrom)  2.1208   2.3126   2.5484   2.2550   0.0000

Table 5.1 Input of the cyanopolyyne molecule (HCCCN) at time-step 90. "0" is used for the bond length where a bond is not present; for those atoms, the bond angle is also marked "0".
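For reference, the representation in Table 5.1 transcribes directly into per-atom input vectors; the short sketch below (in Python rather than the MATLAB used for training) does exactly that.

    import numpy as np

    #                  H       C       C       C       N
    X = np.array([[0.0000, 2.1208, 2.3126, 2.5484, 2.2550],   # left bond length
                  [0.0000, 3.0767, 2.4679, 2.7990, 0.0000],   # bond angle (rad)
                  [2.1208, 2.3126, 2.5484, 2.2550, 0.0000]])  # right bond length
    inputs = [X[:, a].reshape(3, 1) for a in range(5)]        # one 3x1 vector per ANN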


The bond angles were calculated using the law of cosines:

\theta_{jik} = \arccos\left( \frac{d_{ij}^{2} + d_{ik}^{2} - d_{jk}^{2}}{2\, d_{ij}\, d_{ik}} \right),

where \theta_{jik} is the angle connecting central atom i to atoms j and k (with i, j, and k unique), and

d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2},

where x_i, y_i, and z_i are the Cartesian coordinates of atom i.
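These formulas translate directly into code; the short Python sketch below computes the pairwise distances from the Cartesian coordinates and the bond angle at the central atom.

    import numpy as np

    def distance(ri, rj):
        # Euclidean distance between atoms located at Cartesian coordinates
        # ri and rj (each a length-3 sequence).
        return float(np.linalg.norm(np.asarray(ri) - np.asarray(rj)))

    def bond_angle(ri, rj, rk):
        # Angle at central atom i between neighbors j and k, in radians,
        # via the law of cosines.
        dij, dik, djk = distance(ri, rj), distance(ri, rk), distance(rj, rk)
        return float(np.arccos((dij**2 + dik**2 - djk**2) / (2.0 * dij * dik)))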

Using ANNs, the computation of the molecular energy in three-dimensional space is accomplished by summing the partial energies of each atom. On the other hand, we only have the total energies from the NWChem calculations. As a result, traditional supervised training of the ANNs on our training data is not directly applicable. To overcome this difficulty, the weights and biases of each ANN are set from the weights-and-biases matrix passed into the objective function and tested with selected points from the training set. The outputs of the neural networks are then summed to obtain the total energy, which is compared to the expected output from the training set (the total energy of the molecule) at the selected points. For our study, five 3-5-1 ANNs were used to represent the cyanopolyyne molecule for the sake of computation time, with some tolerance for error as a result of the small ANN size.

When training the ANNs for the cyanopolyyne molecule, MATLAB's fminsearch() function was used as the optimizer for the weights and biases. The fminsearch function uses the Nelder-Mead simplex algorithm to minimize the error across all points in the error function, resulting in a consistent set of ANNs across multiple training points. Additionally, because the Nelder-Mead simplex algorithm updates the value of each weight and bias by plus or minus five percent at each training step, the computation time for finding the minimum error depends heavily on the starting weights and biases. To help minimize the time required for each minimization, the weights and biases from the previous, smaller training set were used as the starting weights and biases for each larger training set thereafter. Due to the computational cost of training with multiple points, with each weight and bias updated individually by a relatively small amount at each point in training, fminsearch was used to find optimal weights and biases for 1, 2, 3, 30, and 110 training points from our 999-point training set, obtained using the first N indexes of 1:30:999, where N is the number of training points used to train the ANNs. In this way the training points contain those from the previous N points used in each training iteration, with the exception of the 110-point case, which used the first 110 indexes of 1:9:999.

Because the fminsearch optimizer must set the weights and biases of the ANNs directly in order to minimize the training error, MATLAB's deep learning toolbox was not used here; instead, each ANN was created manually using MATLAB's neural network class. The ANNs were created with a customized function, CreateNeuralNetwork, which takes the number of layers, the size of the hidden layers, a binary vector indicating whether each layer has biases (1 for yes, 0 for no), and an example input. To keep the ANNs consistent, all inputs were 3 × 1 vectors as noted above, even if the atom could be uniquely represented by a single number.
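Putting the pieces together, the sketch below outlines this training scheme in Python; it is a sketch rather than the MATLAB implementation used in this study, with SciPy's Nelder-Mead minimizer standing in for MATLAB's fminsearch, linear activations throughout (discussed below), and illustrative array shapes and names.

    import numpy as np
    from scipy.optimize import minimize

    N_ATOMS, N_IN, N_HID = 5, 3, 5         # five 3-5-1 ANNs, as in the text
    P = N_HID * N_IN + N_HID + N_HID + 1   # parameters per atomic ANN

    def total_energy(theta, X):
        # Sum of the per-atom network outputs (linear activations).
        # X has shape (n_points, N_ATOMS, N_IN): one 3x1 input vector per atom.
        E = np.zeros(X.shape[0])
        for a in range(N_ATOMS):
            p = theta[a * P:(a + 1) * P]
            W1 = p[:N_HID * N_IN].reshape(N_HID, N_IN)
            b1 = p[N_HID * N_IN:N_HID * N_IN + N_HID]
            W2, b2 = p[N_HID * N_IN + N_HID:-1], p[-1]
            E += (X[:, a, :] @ W1.T + b1) @ W2 + b2   # atom a's partial energy
        return E

    def objective(theta, X, E_dft):
        # Error against the NWChem total energies at the selected points.
        return np.sum(np.abs(total_energy(theta, X) - E_dft))

    # X_train, E_train would be the selected MD points, e.g. the first N
    # indexes of 1:30:999 as described above:
    # theta0 = np.zeros(N_ATOMS * P)
    # fit = minimize(objective, theta0, args=(X_train, E_train),
    #                method="Nelder-Mead", options={"maxiter": 200000})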


Original testing utilized the hyperbolic tangent as the activation function for each layer within the ANN, with the exception of the output layer, which used a linear function. However, because the original testing used random initial weights and biases, all nodes within the ANNs would always fire, resulting in a single output regardless of the ANN's input. While in traditional practice ANNs use a sigmoid-type function as the activation function for the hidden layers, our ANNs used a linear activation function for all layers to avoid all nodes sitting beyond the threshold of the activation function with the initial, untrained weights and biases.

During training, a similar function, SetWeightsAndBiases, was used to update the weights and biases of the ANNs, taking as inputs the ANN to update, the number of layers, the size of the hidden layers, and the weights-and-biases matrix. The weights-and-biases matrix is an N × 2 matrix, where N is the number of weights in the ANN, organized such that the first column contains the weights of each node in the ANN and the second contains each node's respective bias. It should be noted that because the input and output layers do not have biases in our ANNs, some biases in the weights-and-biases matrix are unused. While the bias-connect vector could be used to avoid setting these biases in SetWeightsAndBiases, no error arises from setting them, as they are not used in calculations unless the layer has biases as specified in the ANN's initialization.
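The saturation problem described above is easy to reproduce in a toy example; in the Python snippet below (with assumed weight magnitudes), the tanh hidden units are driven toward ±1 for typical bond-length inputs, illustrating how an untrained network with such activations loses sensitivity to its input.

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(scale=10.0, size=(5, 3))       # untrained, large random weights
    for x in ([0.0, 0.0, 2.1], [2.3, 2.5, 2.8]):  # two different atomic inputs
        print(np.tanh(W @ np.asarray(x)))         # hidden activations pushed to +/-1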

5.3 Results and Discussion

Due to the complexities of representing molecules in three-dimensional space, the computational cost of training the ANNs for the cyanopolyyne molecule is far greater than that of the one-dimensional case. As a result, only the 3-5-1 ANNs are considered in this study, and some error must be tolerated due to the ANNs' small size. While solving for the optimal ANN size to minimize the error in this three-dimensional case is possible, following methods similar to those of the one-dimensional case, it is beyond the time constraints of this study.

Recall that the five 3-5-1 ANNs representing the cyanopolyyne molecule were trained with one, two, three, thirty, and one hundred and ten MD points from our training set. The one-point and two-point training cases were used primarily for testing the ANN code, as the ANNs should be able to accurately fit one or two points: the number of unknowns (the outputs at each point) is significantly smaller than the number of nodes used to solve for them, so the error at the training points should be near zero, though for the sake of time a small error at convergence is acceptable. However, these cases drastically underfit the full training set, as only the 1st and 30th time-steps were used; as a result, the ability of these training cases to represent all 999 points in the training set is not of concern for this study. While it is still a poor representation of all 999 points in the training set, the three-MD-point training is presented in Figure 5.2, with a total error across the three training points of 2.5871 × 10⁻³ Hartree, a maximum error at any point in the 999-point training set of 0.1674 Hartree, and a total error when tested with the complete training set of 42.4181 Hartree.


Figure 5.2 Results of training with three MD points, showing the minimization of error from updating weights and biases during training (left), a comparison of the total energy at the training points from the ANNs and the training set at convergence (center), and testing at all 999 points in the training set (right).

Using only three MD points, training took approximately 25 minutes to converge. In comparison, using 30 MD points took approximately 90 minutes to converge, resulting in a total error across the 30 training points of 0.1967 Hartree, a maximum error at any point in the 999-point training set of 0.0452 Hartree, and a total error when tested with the complete training set of 29.5117 Hartree, as seen in Figure 5.3.

Figure 5.3 Results of training with 30 MD points, showing the minimization of error from updating weights and biases during training (left), a comparison of the total energy at the training points from the ANNs and the training set at convergence (center), and testing at all 999 points in the training set (right).

As for the 110-MD-point training, which took approximately 270 minutes to converge, the total error across all 110 training points was 0.7090 Hartree, with a maximum error at any point in the


entire training set of 0.0269 Hartree and a total error across the entire training set of 6.7311 Hartree.

Figure 5.4 Results of training with 110 MD points, showing the minimization of error from updating weights and biases during training (left), a comparison of the total energy at the training points from the ANNs and the training set at convergence (center), and testing at all 999 points in the training set (right).

While training with all 999 points from the training set was considered, 110 points was chosen as a reasonable cap due to the computation time required for convergence. For perspective, it takes approximately 40 minutes to update the weights and biases in a single training step over all 999 points; considering that convergence may take 750 to 1,000 iterations of training if we are lucky, this would amount to approximately 30,000 to 40,000 minutes of training, or 500 to 666 hours.


Chapter 6. Conclusion

The use of ANNs to generate accurate force fields for performing molecular dynamics calculations could, if successful, greatly increase the efficiency of chemical simulations and help broaden the timescales over which MD simulations may be observed. ANNs have many advantages: they are applicable to a plethora of systems, may be scaled for accuracy or speed, and can be constructed to emulate any electronic structure method. One of the largest hindrances facing ANN-based MD simulations is the ANN's inability to extrapolate beyond the bounds of the training set. Extensive research has been performed on increasing the accuracy of machine-learned force fields, looking at increasing the efficiency of training in order to limit the number of costly ab initio simulations that must be performed while still providing accurate and useful approximations. That being said, it remains beyond reason that machine learning could completely replace ab initio simulations: the simulations provide the foundation from which the ANNs learn, and in complex or lengthy cases they may prove more efficient than training ANNs to reproduce them.

In addition to the ANNs' limited capacity to provide accurate and useful data beyond the bounds of the training set, another hindrance for machine-learned force fields is the requirement that atoms in the molecule be represented in a way that is translationally and rotationally invariant, interchangeable, and unique for complex systems. Simply using the Cartesian coordinates of all the atoms in the system cannot uniquely represent the system, as symmetry operations (such as translation and rotation) change the coordinates without changing the system's energy. In addition, a new ANN would be needed for each new system. Again, prior research has yielded a solution to this problem through the use of symmetry functions. Additionally, using one ANN per atom to calculate each atom's contribution to the system's energy helps to eliminate these problems. While each atom in the system requires an independent ANN to calculate its partial contribution to the system, each ANN should have the same structure, weights, and biases in order to reflect the requirement that the inputs be interchangeable for a system.

The replication of ab initio simulations and their respective force fields is not a new application of ANNs. This research builds on the foundation of prior work in the field to increase the cost savings that can be attributed to the use of ANNs. In this study, a preliminary case of simple diatomic molecules and radicals in a one-dimensional space demonstrated the validity of representing molecular systems through vector notations, the dependence of ANN accuracy on ANN size, and the overfitting error that may arise from large ANNs. Beyond this, a cyanopolyyne molecule was studied, demonstrating the efficiency of ANNs for representing organic molecules in MD simulations in comparison to the use of ab initio methods. While the methods used in this study are not all-encompassing and should be improved upon to further increase the cost savings attributed to the use of ANNs, this study serves as a promising example that ANNs may be used to represent chemical simulations on large timescales with reasonable accuracy and efficiency.


References

[1] M. Born and J. R. Oppenheimer, "On the Quantum Theory of Molecules," Annalen der Physik, vol. 389, no. 20, pp. 457-484, 1927.

[2] P. Hohenberg and W. Kohn, "Inhomogeneous Electron Gas," Physical Review, vol. 136, p. B864, 1964.

[3] D. Bagayoko, "Understanding density functional theory (DFT) and completing it in practice," AIP Advances, vol. 4, p. 127104, 2014.

[4] W. Kohn and L. J. Sham, "Self-Consistent Equations Including Exchange and Correlation Effects," Physical Review, vol. 140, p. A1133, 1965.

[5] B. J. Alder and T. E. Wainwright, "Studies in Molecular Dynamics. I. General Method," The Journal of Chemical Physics, vol. 31, pp. 459-466, 1959.

[6] M. D. Hanwell, D. E. Curtis, D. C. Lonie, T. Vandermeersch, E. Zurek and G. R. Hutchison, "Avogadro: An advanced semantic chemical editor, visualization, and analysis platform," Journal of Cheminformatics, vol. 4, p. 17, 2012.

[7] E. Aprà, E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, H. J. J. van Dam, Y. Alexeev, J. Anchell, V. Anisimov, F. W. Aquino, R. Atta-Fynn, J. Autschbach, N. P. Bauman, J. C. Becca, D. E. Bernholdt, K. Bhaskaran-Nair et al., "NWChem: Past, present, and future," The Journal of Chemical Physics, vol. 152, p. 184102, 2020.

[8] J. D. Bernal, "The Bakerian Lecture, 1962. The structure of liquids," Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, vol. 280, pp. 299-322, 1964.

[9] J. Behler and M. Parrinello, "Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces," Physical Review Letters, vol. 98, p. 146401, 2007.

[10] H. C. Urey and C. A. Bradley, "The vibrations of pentatonic tetrahedral molecules," Physical Review, vol. 38, no. 11, 1931.

[11] M. González, "Force fields and molecular dynamics simulations," Collection SFN, vol. 12, pp. 169-200, 2011.

[12] J. Behler, "Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations," Physical Chemistry Chemical Physics, vol. 13, pp. 17930-17955, 2011.


[13] J. Behler, "First Principles Neural Network Potentials for Reactive Simulations of Large Molecular and Condensed Systems," Angewandte Chemie International Edition, vol. 56, pp. 12828-12840, 2017.

[14] J. Behler, "Constructing High-Dimensional Neural Network Potentials: A Tutorial Review," International Journal of Quantum Chemistry, vol. 115, pp. 1032-1050, 2015.

[15] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, R. Jozefowicz, Y. Jia, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, M. Schuster, R. Monga, S. Moore, D. Murray, C. Olah, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

[16] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, "Automatic differentiation in PyTorch," in NIPS-W, 2017.

[17] MATLAB, version 9 (R2020), Natick, Massachusetts: The MathWorks Inc., 2020.

[18] B. G. Sumpter and D. W. Noid, Chem. Phys. Lett., vol. 192, p. 455, 1992.

[19] T. B. Blank, S. D. Brown, A. W. Calhoun and D. J. Doren, "Neural network models of potential energy surfaces," The Journal of Chemical Physics, vol. 103, pp. 4129 - 4137, 1995.

[20] J. S. Smith, O. Isayev and A. E. Roitberg, "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost," Chemical Science, vol. 8, pp. 3192-3203, 2017.
