Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1793
Alexandria: A General Drude Polarizable Force Field with Spherical Charge Density
MOHAMMAD MEHDI GHAHREMANPOUR
ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0624-7 UPPSALA urn:nbn:se:uu:diva-380687 2019 Dissertation presented at Uppsala University to be publicly examined in Room B21, Uppsala Biomedical Centre, Husargatan 3, Uppsala, Monday, 27 May 2019 at 09:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Kresten Lindorff-Larsen (Department of Biology, University of Copenhagen).
Abstract Ghahremanpour, M. M. 2019. Alexandria: A General Drude Polarizable Force Field with Spherical Charge Density. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1793. 69 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0624-7.
Molecular-mechanical (MM) force fields are mathematical functions that map the geometry of a molecule to its associated energy. MM force fields have been extensively used for an atomistic view into the dynamics and thermodynamics of large molecular systems in their condensed phase. Nevertheless, the grand challenge in force field development, which remains to be addressed, is to predict properties of materials with different chemistries and in all their physical phases. Force fields are, in principle, derived through supervised machine learning methods. Therefore, the first step toward more accurate force fields is to provide high-quality reference data from which the force fields can learn. Thus, we benchmarked quantum-mechanical methods at different levels of theory in predicting molecular energetics and electrostatic properties. As a result, the Alexandria library was released as an open access database of molecular properties. The second step is to use potential functions that describe interactions between molecules accurately. For this, we incorporated electronic polarization and charge penetration effects into the Alexandria force field. The Drude model was used for the explicit inclusion of electronic polarization. The distribution of the atomic charges was described by either a 1s-Gaussian or an ns-Slater density function to account for charge penetration effects. Moreover, the 12-6 Lennard-Jones (LJ) potential function, commonly used in force fields, was replaced by the Wang-Buckingham (WBK) function to describe the interaction of two particles at very short distances. In contrast to the 12-6 LJ function, the WBK function is well behaved at short distances because it has a finite limit as the distance between two particles approaches zero. The third step is free and open source software (FOSS) for systematic optimization of the built-in force field parameters. For this, we developed the Alexandria chemistry toolkit, which is currently part of the GROMACS software package. With these three steps, the Alexandria force field was developed for alkali halides and for organic compounds consisting of (H, C, N, O, S, P) and halogens (F, Cl, Br, I). We demonstrated that the Alexandria force field described alkali halides in gas, liquid, and solid phases with an overall performance better than the benchmarked reference force fields. We also showed that the Alexandria force field predicted the electrostatics of isolated molecules and molecular complexes in agreement with density functional theory at the B3LYP/aug-cc-pVTZ level of theory.
Keywords: Molecular mechanics, Force field, Drude oscillator model, Alexandria library, GROMACS
Mohammad Mehdi Ghahremanpour, Department of Cell and Molecular Biology, Box 596, Uppsala University, SE-75124 Uppsala, Sweden.
© Mohammad Mehdi Ghahremanpour 2019
ISSN 1651-6214 ISBN 978-91-513-0624-7 urn:nbn:se:uu:diva-380687 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-380687) To Rezvan
List of papers
This thesis is based on the following papers, which are referred to in the text by their Roman numerals.
I Ghahremanpour, M. M., van Maaren, P. J., Ditz, J., Lindh, R., Van der Spoel, D. (2016) Large-Scale Calculations of Gas Phase Thermochemistry: Enthalpy of Formation, Standard Entropy and Heat Capacity. J. Chem. Phys., 145, 114305.
II Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. (2018) The Alexandria Library: A Quantum Chemical Database of Molecular Properties for Force Field Development. Sci. Data, 5, 180062.
III Ghahremanpour, M. M., van Maaren, P. J., Caleman, C., Hutchison, G. R., Van der Spoel, D. (2018) Polarizable Drude Model with s-type Gaussian or Slater Charge Density for General Molecular Mechanics Force Fields. J. Chem. Theory Comput. 14, 5553-5566.
IV Walz, M. M., Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. (2018) Phase-Transferable Force Field for Alkali Halides. J. Chem. Theory Comput., 14, 5933-5948.
V Van der Spoel, D., Ghahremanpour, M. M., Lemkul, J. (2018) Small Molecule Thermochemistry: A Tool for Empirical Force Field Development. J. Phys. Chem. A, 122, 8982-8988.
VI Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. Efficient Physics-Based Polarizable Charges: from Organic Compounds to Proteins. Manuscript.
Reprints were made with permission from the publishers.
Contents
1 Introduction
  1.1 Quantum-Mechanical Models
    1.1.1 Wave function theory
    1.1.2 Density functional theory
  1.2 Molecular-Mechanical Models
2 Alexandria Library
  2.1 Data Availability
  2.2 Properties in the Alexandria Library
    2.2.1 Molecular thermochemistry
    2.2.2 Molecular electrostatics
  2.3 Computational details
  2.4 Technical validation
3 Intermolecular Potential Energy Function
  3.1 Quantum mechanical approximations for intermolecular energies
  3.2 Alexandria force field approximations for intermolecular energies
    3.2.1 Electrostatic and Charge Penetration
    3.2.2 Electronic Polarization
    3.2.3 Repulsion and Dispersion
4 Generation of Polarizable Atomic Charges
  4.1 Electrostatic Potential (ESP)-fitting with Drude Model
  4.2 Alexandria Charge Model
5 Parameterization
  5.1 Linear Fitting
    5.1.1 Singular value decomposition
    5.1.2 Bootstrapping
  5.2 Nonlinear Fitting
    5.2.1 Bayesian inference
    5.2.2 Bayesian computation
    5.2.3 Simulated annealing
6 Alexandria Chemistry Toolkit
  6.1 Extracting Quantum Chemistry Data
  6.2 Generation of Force Field Atom Types
  6.3 Optimization of Force Field Parameters
  6.4 Generation of Molecular Topology and Atomic Charges
  6.5 Coulomb Integrals for Distributed Charge Densities
  6.6 Parallelization
  6.7 License
7 Summary of papers
  7.1 Paper I
  7.2 Paper II
  7.3 Paper III
  7.4 Paper IV
  7.5 Paper V
  7.6 Paper VI
8 Populärvetenskaplig Sammanfattning på Svenska
References

Abbreviations
ACM   Alexandria Charge Model
ACT   Alexandria Chemistry Toolkit
BSSE  Basis Set Superposition Error
DFA   Density Functional Approximation
DFT   Density Functional Theory
DM    Drude Model
ESP   Electrostatic Potential
HF    Hartree-Fock
MM    Molecular Mechanics
QM    Quantum Mechanics
SVD   Singular Value Decomposition
1. Introduction
The longstanding and far-reaching goal in computational chemistry is to predict, with a single model, the state of chemical compounds with different chemistries and in all their physical phases. At the same time, the model should be applicable to systems of any size. However, theory-driven molecular models are all approximations limited to their domain(s) of applicability. The approximations range from quantum mechanics (QM) to molecular mechanics (MM). The tradeoff between the simplicity (brute force) and the complexity (sophistication) of these approximations makes it challenging to design a single model that predicts the structure and properties of molecules accurately and at a moderate computational cost. For instance, QM models at a high level of theory are more accurate than MM models but at increased computational cost. This limits their applicability to small- and medium-sized systems. Consequently, MM models are increasingly used for an atomistic view into the dynamics and thermodynamics of large systems such as liquids and biomacromolecules [1, 2, 3]. In spite of all the progress in MM models [4, 5, 6, 7, 8], they are not yet transferable across both physical phases and chemistries. The aim of the Alexandria project is to provide a platform to systematically advance MM models toward these goals. This platform consists of three main components: High Quality Reference Data (HQRD), Free and Open Source Software (FOSS), and Accurate Potential Energy Functions (APEF). MM models are parametric; the role of these components together, as explained in what follows, is central to the systematic development of MM models.
Figure 1.1. The development procedure of the Alexandria force field in a nutshell.
This thesis is structured to explain Fig. 1.1 as follows: the first part of Chapter 1 briefly explains the basics of wave function theory and density functional theory, which are used to provide quantum chemistry reference data. The second part of Chapter 1 explains molecular mechanics and the formulation of classical force fields. In Chapters 2-6, I turn to explaining HQRD, APEF, and FOSS specifically in the context of the Alexandria project. Chapter 2 introduces the Alexandria library, which is related to the HQRD component. Chapters 3 and 4, which are about the APEF component, explain the potential energy functions used in the Alexandria force field. Chapters 5 and 6 are related to the FOSS component. Chapter 5 explains the optimization algorithms implemented to parameterize the potential functions. Chapter 6 introduces the Alexandria Chemistry Toolkit (ACT) as free and open source software for force field parameterization. Finally, Chapter 7 summarizes each paper included in this thesis.
1.1 Quantum-Mechanical Models

1.1.1 Wave function theory

The Schrödinger wave equation for a non-relativistic and time-independent system is formulated as follows:

$$\hat{H}\Psi(\mathbf{x}) = E\Psi(\mathbf{x}) \tag{1.1}$$

where $\hat{H}$ is the Hamiltonian operator of $M$ nuclei and $N$ electrons with position vectors $\mathbf{R}_a$ and $\mathbf{r}_i$, respectively. $\Psi(\mathbf{x})$ is the molecular wave function, where $\mathbf{x}$ is a vector consisting of three spatial coordinates $\mathbf{r}$ and one spin coordinate $\omega$. $E$ is the energy of the molecule. In the absence of external magnetic and electric fields, $\hat{H}$ is defined, in atomic units, as:

$$\hat{H} = -\frac{1}{2}\sum_{a}^{M}\frac{\nabla_a^2}{M_a} - \frac{1}{2}\sum_{i}^{N}\nabla_i^2 - \sum_{a}^{M}\sum_{i}^{N}\frac{Z_a}{|\mathbf{R}_a - \mathbf{r}_i|} + \sum_{a=1}^{M}\sum_{b>a}^{M}\frac{Z_a Z_b}{|\mathbf{R}_a - \mathbf{R}_b|} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} \tag{1.2}$$

where the first two terms are the kinetic energy of the nuclei and the electrons, respectively. $\nabla^2$ is the Laplace operator, $M_a$ is the mass of nucleus $a$, and $Z_a$ is its charge. The third term is the attractive electrostatic interaction between nuclei and electrons. The last two terms are the repulsive electrostatic energies due to the nucleus-nucleus and electron-electron interactions, respectively.

Eqn. 1.1 and Eqn. 1.2 can be simplified through two central approximations: the Born-Oppenheimer (BO) and the Hartree-Fock (HF) approximations. The BO approximation assumes that the positions of the nuclei in a molecule are fixed relative to the positions of the electrons. This means that electrons move in the field of fixed nuclei [9]. This is a useful approximation because nuclei are much heavier than electrons; thus, they move much more slowly [9]. As a result, the nuclear kinetic energy can be neglected and the nucleus-nucleus repulsion becomes a constant, so that Eqn. 1.2 simplifies to

$$\hat{H}_e = -\frac{1}{2}\sum_{i}^{N}\nabla_i^2 - \sum_{a}^{M}\sum_{i}^{N}\frac{Z_a}{|\mathbf{R}_a - \mathbf{r}_i|} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} \tag{1.3}$$
where $\hat{H}_e$ is called the electronic Hamiltonian. The wave function of a molecule with $N$ electrons is antisymmetric with respect to the interchange of the coordinates $\mathbf{x}$ (space and spin) of any two electrons. This means that

$$\Psi(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N) = -\Psi(\mathbf{x}_2, \mathbf{x}_1, \cdots, \mathbf{x}_N) \tag{1.4}$$

If we assume that these $N$ electrons do not interact with each other, the molecular wave function is simply the product of $N$ one-electron spin orbitals $\chi(\mathbf{x})$:

$$\Psi(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N) = \chi_i(\mathbf{x}_1)\chi_j(\mathbf{x}_2)\cdots\chi_k(\mathbf{x}_N) \tag{1.5}$$

This product is known as the Hartree Product (HP), which becomes antisymmetric if it is written as a determinant. This is often referred to as the Slater determinant (SD) [9, 10]:

$$\Psi_{SD}(\mathbf{x}) = \frac{1}{\sqrt{N!}}\left|\chi_i(\mathbf{x}_1)\chi_j(\mathbf{x}_2)\cdots\chi_k(\mathbf{x}_N)\right| \tag{1.6}$$

Note that $\chi(\mathbf{x})$ can also be written as the product of one spatial orbital $\psi(\mathbf{r})$ and one spin function $\omega(\alpha/\beta)$ [10]:

$$\chi(\mathbf{x}) = \psi(\mathbf{r})\,\omega(\alpha/\beta) \tag{1.7}$$

where $\omega(\alpha)$ is for spin up and $\omega(\beta)$ is for spin down. The exact $\psi(\mathbf{r})$ is not known, but it can be approximated by a linear combination of a finite number of basis functions $\phi(\mathbf{r})$, given by
$$\psi_i(\mathbf{r}) = \sum_{j}^{k} c_{ij}\,\phi_j(\mathbf{r}) \tag{1.8}$$

where $c_{ij}$ is the coefficient of the basis function $j$ in the spatial orbital $i$ [9, 11]. For a Gaussian basis function, Eqn. 1.8 is written as:

$$\psi_i(\mathbf{r}) = \sum_{j}^{k} c_{ij}\,e^{-\xi_j r^2} \tag{1.9}$$

where $\xi_j$ is the exponent of the basis function $j$.

In the HF approximation, the energy and the electronic Hamiltonian of a molecule are approximated by sums over $N$ one-electron energies and one-electron Hamiltonian (Fock) operators [9], given by
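$$E_e = \sum_{i}^{N}\varepsilon_i \tag{1.10}$$

$$\hat{H}_e = \sum_{i}^{N}\hat{f}(i) \tag{1.11}$$

such that

$$\hat{f}(i)\,\chi(\mathbf{x}_i) = \varepsilon_i\,\chi(\mathbf{x}_i) \tag{1.12}$$

where the Fock operator is defined as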
$$\hat{f}(i) = -\frac{1}{2}\nabla_i^2 - \sum_{a}^{M}\frac{Z_a}{|\mathbf{R}_a - \mathbf{r}_i|} + U_i^{HF} \tag{1.13}$$

where $U_i^{HF}$ is the average potential energy felt by the $i$'th electron because of the presence of the other $(N-1)$ electrons, given by

$$U_i^{HF} = \sum_{j}^{N}\left(\hat{J}_j - \hat{K}_j\right) \tag{1.14}$$

where $\hat{J}$ is the Coulomb operator and $\hat{K}$ is the exchange operator, which does not have a classical interpretation [11]. Note that for $i = j$ in Eqn. 1.14, the Coulomb and the exchange energies are equal with opposite signs; thus, the Coulomb self-interaction energy cancels out [11, 10].

Now we need to find the best set of spin orbitals forming the Slater determinant that minimizes the electronic energy of the molecule, subject to the constraint that the spin orbitals remain orthonormal [9, 11]. This can be formulated as

$$E_{HF} = \min_{\Psi_{SD}\rightarrow\Psi}\frac{\int\Psi_{SD}^{*}\,\hat{H}_e\,\Psi_{SD}\,d\tau}{\int\Psi_{SD}^{*}\,\Psi_{SD}\,d\tau} \tag{1.15}$$

based upon the variational principle. The lowest energy obtained from this procedure corresponds to the ground state of the molecule. Since Eqn. 1.12 is nonlinear, Eqn. 1.15 is solved iteratively using a self-consistent field (SCF) procedure, which, in practice, optimizes the coefficients of the basis functions used in Eqn. 1.8 to expand the spatial orbitals [11]. The number of basis functions profoundly affects the accuracy and the efficiency of quantum chemistry models. A larger basis set increases accuracy at increased computational cost. However, any set of basis functions is finite; thus, QM models suffer from the basis set incompleteness error. The basis sets used for this work are mentioned in Chapter 2 and their performance is discussed in paper II. For the curious reader, theoretical details of different basis sets are explained in ref [9].

As explained above, the Born-Oppenheimer and the HF approximations simplify the wave function theory. Nevertheless, the wave function of an N-electron molecule is a complicated function that depends on 4N variables and cannot be measured experimentally. It should be noted that the HF approximation considers the correlation between the motion of electrons with the same spin, but it neglects the correlation between electrons with opposite spins in the wave function.
This is compensated for in post-Hartree-Fock (PHF) methods such as the Møller-Plesset (MP) perturbation theory, the coupled-cluster (CC) theory, and the full configuration interaction (FCI), at the price of higher computational cost. The reader is referred to ref [9] for the theory of PHF methods.
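The variational principle and the basis set incompleteness error discussed above can be illustrated with a small sketch. It is not taken from the thesis: it assumes the textbook case of a hydrogen atom described by a single normalized Gaussian trial orbital exp(-alpha r^2), for which the energy in atomic units is 3*alpha/2 - 2*sqrt(2*alpha/pi). The function names and the golden-section search are illustrative choices.

```python
import math

def trial_energy(alpha: float) -> float:
    """Energy (hartree) of the hydrogen atom for the trial orbital exp(-alpha r^2).

    Assumed textbook expressions: kinetic energy 3*alpha/2 and
    nuclear attraction -2*sqrt(2*alpha/pi).
    """
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)

def golden_section_minimum(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate the minimum of a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # the minimum lies in [a, d]
        else:
            a = c   # the minimum lies in [c, b]
    return 0.5 * (a + b)

# Variational optimization of the single exponent (hypothetical search bounds).
alpha_opt = golden_section_minimum(trial_energy, 0.01, 2.0)
e_opt = trial_energy(alpha_opt)
e_exact = -0.5  # exact ground-state energy of the hydrogen atom in hartree

print(f"optimal exponent : {alpha_opt:.6f}")
print(f"variational E    : {e_opt:.6f} hartree")
print(f"exact E          : {e_exact:.6f} hartree")
```

The optimized energy stays above the exact value, as the variational principle requires, and the remaining gap is an example of the incompleteness error of a one-function basis set.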
1.1.2 Density functional theory

Density functional theory (DFT) is another approach to compute the energy of a chemical compound based on the principles of quantum mechanics. DFT considers the molecular electron density, $\rho(\mathbf{r})$, as the main variable rather than the molecular wave function. The electron density can be probed experimentally and only depends on three spatial variables. $\rho(\mathbf{r})$ integrates to the number of electrons ($N$). The nuclear charge is generally treated as a point charge; therefore, in the ground state, the electron density forms a maximum or a cusp at the position of each nucleus ($\mathbf{R}$). The charge of a nucleus can also be obtained from the electron density and its gradient as the position of the nucleus is approached [10, 11]:

$$\lim_{r\rightarrow R}\left[\frac{\partial}{\partial r} + 2Z\right]\bar{\rho}(r) = 0 \tag{1.16}$$

where $\bar{\rho}(r)$ is the spherical average of $\rho(\mathbf{r})$ [10].

The Hohenberg-Kohn (HK) theorems form the core of density functional theory. The proof of the HK theorems can be found in refs [10, 11]. The first HK theorem states that the ground-state energy of a molecule is a unique functional of its electron density. The second HK theorem states that the energy given by the functional corresponds to the ground state if and only if the electron density is the exact electron density of the ground state. Therefore, to use DFT, we need to obtain the electron density of the ground state of the molecule as well as the functional that maps the electron density to its associated energy.

The electron density of a system consisting of $N$ non-interacting electrons can be obtained by a sum over squared one-electron orbitals:

$$\rho(\mathbf{r}) = 2\sum_{i}^{N/2}\psi_i^{*}(\mathbf{r})\,\psi_i(\mathbf{r}) \tag{1.17}$$

where $\psi^{*}(\mathbf{r})$ is the complex conjugate of $\psi(\mathbf{r})$, which is expanded as a linear combination of basis functions $\phi(\mathbf{r})$ following Eqn. 1.8. Therefore, we need to optimize the coefficients of the basis functions to minimize the energy of the molecule. However, the search for spatial orbitals is constrained such that any trial electron density $\tilde{\rho}(\mathbf{r})$ must integrate to the number of electrons $N$. This constraint is taken into account by means of a Lagrange multiplier $\mu$ as follows:

$$\frac{\partial}{\partial\tilde{\rho}(\mathbf{r})}\left[E - \mu\left(\int\tilde{\rho}(\mathbf{r})\,d\mathbf{r} - N\right)\right] = 0 \tag{1.18}$$
where $\mu$ is also called the chemical potential, which is used for determining partial atomic charges, as explained in Chapter 4 (see Section 4.2). According to the second HK theorem, the energy associated with any trial density $\tilde{\rho}(\mathbf{r})$ is higher than or equal to the ground-state energy of the molecule:

$$E[\tilde{\rho}(\mathbf{r})] \geq E[\rho(\mathbf{r})] \tag{1.19}$$

Therefore, the search for the electron density needs to continue until the energy difference between successive trial electron densities falls below a threshold. We now need to define $E[\rho(\mathbf{r})]$. The Kohn-Sham (KS) theory states that $E[\rho(\mathbf{r})]$ consists of four components [11], given by
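$$E[\rho(\mathbf{r})] = T[\rho(\mathbf{r})] + U_{ee}[\rho(\mathbf{r})] + U_{ne}[\rho(\mathbf{r})] + E_{xc}[\rho(\mathbf{r})] \tag{1.20}$$

where $T$ gives the kinetic energy of the electron density, $U_{ee}$ yields the electron-electron repulsion energy, $U_{ne}$ gives the nucleus-electron energy, and $E_{xc}$ gives the exchange-correlation energy. The first three terms can be obtained as follows:

$$T[\rho(\mathbf{r})] = \sum_{i}^{N}\left\langle\psi_i(\mathbf{r})\right|-\frac{1}{2}\nabla^2\left|\psi_i(\mathbf{r})\right\rangle \tag{1.21}$$

$$U_{ee}[\rho(\mathbf{r})] = \frac{1}{2}\iint\frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' \tag{1.22}$$

$$U_{ne}[\rho(\mathbf{r})] = \int\rho(\mathbf{r})\,\nu_{ext}(\mathbf{r})\,d\mathbf{r} \tag{1.23}$$

where $\nu_{ext}(\mathbf{r})$ is the external potential due to the nuclei, which is attractive for electrons:

$$\nu_{ext}(\mathbf{r}) = -\sum_{a}\frac{Z_a}{|\mathbf{r} - \mathbf{R}_a|} \tag{1.24}$$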
The cumbersome term in Eqn. 1.20 is the exchange-correlation functional Exc, for which no exact expression is known. Therefore, much effort has been put into developing density functional approximations (DFA). Many approaches have been explored to approximate Exc. Some of these are the Local Density Approximation (LDA), the Generalized Gradient Approximation (GGA), the Meta-GGA, and the hybrid functionals, such as the B3LYP functional used in this work. The main idea behind a hybrid functional is to include a fraction of the Hartree-Fock exchange in the functional in a parametric approach. The amount of HF exchange energy is controlled by empirical parameters. It is beyond the scope of this thesis to review the density functional approximations, so the reader is referred to Ref [12] for a comprehensive review of these methods.
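Three of the statements made in this section about the electron density can be checked numerically on a simple case: the normalization to N, the cusp condition of Eqn. 1.16, and the nucleus-electron energy of Eqn. 1.23. The sketch below is an illustration only. It assumes the exact ground-state density of the hydrogen atom, rho(r) = exp(-2r)/pi in atomic units, takes the nuclear potential as attractive (-Z/r), and uses a hand-written Simpson rule; all names are hypothetical.

```python
import math

Z = 1.0  # nuclear charge of hydrogen

def density(r: float) -> float:
    """Exact 1s electron density of a one-electron atom (atomic units)."""
    return Z**3 / math.pi * math.exp(-2.0 * Z * r)

def radial_integral(f, r_max: float = 40.0, n: int = 20000) -> float:
    """Integrate f(r) * 4*pi*r^2 over [0, r_max] with the composite Simpson rule.

    n must be even; r_max is assumed large enough for the density to vanish.
    """
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f(r) * 4.0 * math.pi * r * r
    return total * h / 3.0

def density_times_potential(r: float) -> float:
    """rho(r) * nu_ext(r) with nu_ext = -Z/r; the r^2 weight removes the singularity."""
    return 0.0 if r == 0.0 else -Z * density(r) / r

# The density integrates to the number of electrons (N = 1 here).
n_electrons = radial_integral(density)

# Nucleus-electron energy, Eqn. 1.23, for a single nucleus at the origin.
u_ne = radial_integral(density_times_potential)

# Cusp condition, Eqn. 1.16: (d rho / d r) / rho approaches -2Z at the nucleus.
step = 1e-6
cusp_ratio = (density(step) - density(0.0)) / step / density(0.0)

print(f"N    = {n_electrons:.8f}")
print(f"U_ne = {u_ne:.8f} hartree")
print(f"cusp = {cusp_ratio:.5f} (expected close to {-2.0 * Z})")
```

For a spherically symmetric density such as this one, the spherical average in Eqn. 1.16 is the density itself, which is why the cusp ratio can be evaluated directly.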
1.2 Molecular-Mechanical Models

The main source of the high computational cost of QM methods is the explicit presence of electrons and their interactions with each other and with nuclei in the quantum-mechanical Hamiltonian. However, a molecular-mechanical Hamiltonian takes electrons into account implicitly in the form of partial charges associated with the atoms inside a molecule. The atoms are held to each other by means of classical springs representing the chemical bonds. The bond information then is built into the model in terms of bond lengths, bond angles, and dihedral angles. These are often referred to as the internal coordinates of the molecule (Figure 1.2).
Figure 1.2. The internal coordinates of a molecule in terms of the bond length l, the bond angle θ, and the dihedral angle φ.
In molecular mechanics, the system is described in terms of classical particles. Therefore, Newton's equations of motion need to be solved instead of the Schrödinger wave equation or the equations of DFT. Newton's equations are formulated as

$$\dot{x} = \frac{p}{m} \tag{1.25}$$

$$\ddot{x} = -\frac{1}{m}\frac{\partial U(x)}{\partial x} \tag{1.26}$$

where $x$ is the position, $p$ is the momentum, and $m$ is the mass of the particle. $U$ is the potential energy, which is a function of the position. $\dot{x}$ and $\ddot{x}$ represent the velocity ($v$) and acceleration ($a$), respectively. It has been shown elsewhere that the laws of motion in classical mechanics follow from quantum mechanics [13]. Newton's equations of motion cannot be solved analytically for systems larger than two particles; this is known as the many-body problem. Thus, finite-difference methods like the leap-frog algorithm are used to solve Newton's equations numerically as follows [14]:

$$x(t + \delta t) = x(t) + \delta t\,v\!\left(t + \tfrac{1}{2}\delta t\right) \tag{1.27}$$

$$v\!\left(t + \tfrac{1}{2}\delta t\right) = v\!\left(t - \tfrac{1}{2}\delta t\right) + \delta t\,a(t) \tag{1.28}$$

where the equations are solved step-by-step based on the time interval $\delta t$.

The transition from quantum mechanics to molecular mechanics is, however, a drastic simplification. It needs to be examined carefully whether this transition is justified. The influence of the quantum character of a particle on its behavior can be roughly estimated from its minimum quantum width, $\sigma_x$. For a (nearly) classical particle in thermal equilibrium, $\sigma_x$ can be obtained from Heisenberg's uncertainty principle [13]:

$$\sigma_x \geq \frac{\hbar}{2\sqrt{m k_B T}} \tag{1.29}$$

where $\hbar = h/2\pi$ and $h$ is Planck's constant, $k_B$ is the Boltzmann constant, and $T$ is the temperature of the particle. The quantum effect becomes important when the force acting on the particle changes considerably over the width of the particle. For instance, this happens when the width of the particle exceeds ∼0.01 nm in the condensed phase with an inter-particle distance of a few tenths of a nanometer [13]. Table 1.1 lists $\sigma_x$ for the electron and some elements of the periodic table at temperatures from 10 K to 300 K. The values in this table show that the quantum character of particles decreases as the temperature increases. Heavy elements like carbon, oxygen, and iodine can be assumed to be nearly classical at 300 K. However, hydrogen has a significant amount of quantum character even at 300 K [13].
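The leap-frog update of Eqns. 1.27 and 1.28 can be sketched in a few lines. The example below is a minimal illustration, not the integrator of any particular simulation package: it assumes a single particle in one dimension on a harmonic potential with unit mass and unit force constant, so that the exact trajectory x(t) = cos(t) is available for comparison. The start-up step that shifts the initial velocity back by half a time step is one common choice among several.

```python
import math

def leap_frog(x0: float, v0: float, accel, dt: float, n_steps: int) -> list:
    """Propagate one particle with the leap-frog scheme and return its positions.

    accel(x) returns the acceleration -(1/m) dU/dx at position x (Eqn. 1.26).
    """
    # Start-up (an assumption of this sketch): estimate v(-dt/2) from v(0).
    v_half = v0 - 0.5 * dt * accel(x0)
    x = x0
    positions = [x]
    for _ in range(n_steps):
        v_half = v_half + dt * accel(x)   # Eqn. 1.28: v(t + dt/2)
        x = x + dt * v_half               # Eqn. 1.27: x(t + dt)
        positions.append(x)
    return positions

def harmonic_acceleration(x: float) -> float:
    """Acceleration for U(x) = x^2 / 2 with unit mass (illustrative potential)."""
    return -x

dt = 0.01
n_steps = 628  # roughly one oscillation period (2*pi)
trajectory = leap_frog(1.0, 0.0, harmonic_acceleration, dt, n_steps)
exact = math.cos(n_steps * dt)

print(f"leap-frog x(t) = {trajectory[-1]:.6f}")
print(f"exact     x(t) = {exact:.6f}")
```

Because velocities live at half steps and positions at full steps, each update uses the most recent value of the other quantity, which keeps the trajectory bounded for a sufficiently small time step.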
Table 1.1. The minimal quantum width (in nm) of the electron and some atoms at temperatures between 10 and 300 K, derived from Heisenberg's uncertainty relation [13].

      m (u)      10 K     30 K     100 K    300 K
e     0.000545   4.7      2.7      1.5      0.86
H     1          0.11     0.064    0.035    0.02
C     12         0.032    0.018    0.010    0.0058
O     16         0.028    0.016    0.0087   0.0050
I     127        0.0098   0.0056   0.0031   0.0018
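The entries of Table 1.1 follow directly from the lower bound in Eqn. 1.29. As a rough check, the sketch below evaluates that bound with CODATA values of the physical constants; the function and variable names are illustrative, and the masses are the rounded values in atomic mass units used for the table (about 0.000545 u for the electron).

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s
K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

def min_quantum_width_nm(mass_u: float, temperature: float) -> float:
    """Lower bound of Eqn. 1.29, hbar / (2 sqrt(m k_B T)), returned in nm."""
    mass_kg = mass_u * AMU
    return HBAR / (2.0 * math.sqrt(mass_kg * K_B * temperature)) * 1e9

# Rounded masses in u (assumed values for this illustration).
MASSES_U = {"e": 0.000545, "H": 1.0, "C": 12.0, "O": 16.0, "I": 127.0}
TEMPERATURES_K = (10.0, 30.0, 100.0, 300.0)

widths = {
    name: [min_quantum_width_nm(mass, t) for t in TEMPERATURES_K]
    for name, mass in MASSES_U.items()
}

print("      " + "  ".join(f"{t:>7.0f} K" for t in TEMPERATURES_K))
for name, row in widths.items():
    print(f"{name:<4}  " + "  ".join(f"{w:>9.2g}" for w in row))
```

Since the width scales as the inverse square root of both mass and temperature, each row can also be obtained from the hydrogen row by dividing by the square root of the mass in u.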
The importance of quantum effects on a physical property can be measured experimentally through the isotope effect [13, 1]. The quantum effect is not negligible if a property of interest depends strongly on the isotopic composition of the system. However, quantum corrections can be applied to some properties calculated via classical simulations. For instance, quantum corrections can be applied to thermodynamic functions such as the Helmholtz free energy, and to structural quantities such as the radial distribution function g(r). These are explained in other places [1, 15]. To directly take quantum effects into account for light nuclei like hydrogen, other methods, such as path-integral simulation, can be used, but at a higher computational cost. This is explained in Ref [1].

In spite of all these shortcomings, molecular mechanics has been extensively used in theoretical studies of molecular systems in the condensed phase over the last four decades. The low computational cost of MM methods allows sampling of the configurational space of a molecular system. This links microscopic properties, such as the atomic interactions in a liquid, to macroscopic properties, such as density, pressure, temperature, and free energy [1]. This has helped the study of the dynamics and thermodynamics of macromolecules in solution, from biomolecules to synthetic polymers, at the atomistic level [3]. This is not only of academic interest but also useful in industrial applications, such as drug discovery [16]. Therefore, in the Alexandria project, we attempt to increase the accuracy of molecular simulations for describing molecular interactions at the atomic level.

As mentioned earlier, molecular mechanics maps the geometry of a molecule to its associated energy through a potential energy function U. U is a scalar-valued mathematical function such that -∇U = F, where F is a conservative force field [14]. Potential functions consist of a set of built-in parameters (constants) that influence their output (energy). The accuracy of the molecular energies depends on the form and the parameters of these potential functions. Parameters also control the transferability of the potential functions between compounds with different chemistries. The reason for this is that these parameters are, in essence, derived by (supervised) machine learning algorithms that learn from training datasets.
Thus, the data quality and the diversity of the molecules in the training sets determine the domain of accuracy and transferability of the resulting parameters.

In the early years of force field development, experimental data were the only source from which to learn the parameters. However, the amount of experimental data for molecular properties is limited compared to the vast chemical space of synthetic and virtual compounds. Generating quantum chemistry data for chemical compounds from different parts of the chemical space provides an alternative source of reference data to accelerate progress in molecular-mechanical force fields [17, 18, 19]. QM methods allow calculation of molecular properties even for toxic compounds for which performing experiments is hazardous. For this reason, we have benchmarked different quantum chemistry methods against the accessible experimental data. The result was the Alexandria library, which has been released as an open access quantum chemistry database of optimized geometries and gas-phase physico-chemical properties of chemical compounds for force field development (see papers II and III). However, it should be noted that for the bulk and dynamic properties of solids and liquids, experiment is still our main source of data (see paper IV).

The form of the potential functions also profoundly influences the accuracy and computational efficiency of MM models. Force fields have traditionally been formulated as [1]:
$$U = \sum_{\text{bonds}}\frac{1}{2}k_l(l - l_e)^2 + \sum_{\text{angles}}\frac{1}{2}k_\theta(\theta - \theta_e)^2 + \sum_{\text{torsions}}\sum_{n}k_{\phi,n}\left[1 + \cos(n\phi + \delta_n)\right] + \sum_{ij}\frac{1}{4\pi\varepsilon_0}\frac{q_i q_j}{r_{ij}} + \sum_{ij}4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \tag{1.30}$$

where $k_l$ is the bond stretch force constant; $l_e$ is the equilibrium bond length; $l$ is the bond length of two connected atoms $ij$; $k_\theta$ is the bending angle force constant; $\theta_e$ is the equilibrium bending angle; $\theta$ is the bending angle of three connected atoms $ijk$; $\phi$ is the dihedral angle of four connected atoms $ijkl$; $n$ is the number of minima in a rotation of $2\pi$ around the $j-k$ bond; $k_{\phi,n}$ is the dihedral angle force constant; $\delta_n$ is the phase shift angle; $r_{ij}$ is the distance between atoms $i$ and $j$; $q$ is the partial atomic charge; and $\varepsilon_{ij}$ and $\sigma_{ij}$ are Lennard-Jones (LJ) parameters.

Eqn. 1.30 has been used extensively over the last 40 years. At the same time, much effort has been devoted to developing potential functions with a more realistic form that, for example, responds to changing conditions of the environment such as the electric field. An example is a form that accounts for the redistribution of the electron density of a molecule due to interaction with the electric field produced by other molecules in the system [20, 21, 22, 23, 24]. Models like the fluctuating charge [25], the induced point dipoles [26], and the Drude oscillator [23, 27, 28] have included the linear response of the electron density to an external electric field, known as the dipole polarizability. It is beyond the scope of this thesis to review all of them here, but the Drude oscillator will be explained in detail in Chapter 3.

A substantial effort has also been made to explicitly include the charge penetration effect; this refers to the overlap between two charge densities. The contribution of the charge overlap to the electrostatic energy becomes important at close distances. However, the point charge model used in Eqn. 1.30 is inadequate to account for charge penetration effects, because it gives a singularity in the electrostatic energy surface at very short distances.
To overcome this problem, Hall et al. suggested, in 1986, describing the distribution of partial atomic charges by a spherically symmetric Gaussian density function [29, 30]. Later, in 1991, Rappé et al. used the valence s-type Slater orbital of each atom to treat partial atomic charges in MM simulations [31]. The advances in modern computers have recently made it possible to develop Gaussian or Slater charge models either for specific cases like water [32, 33, 34], carbon dioxide [35], and alkali halides [36], or for a small set of compounds [37, 38].

The inclusion of charge polarization and penetration effects has increased the physical realism of force fields [37, 38]. However, it has added at least two parameters per atom to the parameter space of the force fields. One parameter determines the polarizability of each atom. The other determines the diffuseness of the partial charge. Therefore, the chemical transferability of these parameters remains to be addressed on large molecular databases (see paper III).

Following this path of force field development, we have attempted to increase the accuracy of intermolecular interactions in the Alexandria force field by parametrizing a Drude polarizable model with spherical charge densities using either a 1s-Gaussian or a Slater density function (see papers III and IV). We have also replaced the 12-6 LJ potential function, the last term in Eqn. 1.30, by a potential function that provides a more realistic description of the inter-particle repulsion interaction (see paper IV). The intermolecular potential functions used in the Alexandria force field are explained in Chapter 3.
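The difference between the point charge model of Eqn. 1.30 and a spherical Gaussian charge density can be made concrete with a short sketch. It assumes the standard closed-form result for two normalized Gaussian charge distributions, q_i q_j erf(beta_ij r)/r with beta_ij = beta_i beta_j / sqrt(beta_i^2 + beta_j^2), written in atomic units. The symbol beta for the width parameter and the numerical values are illustrative and are not parameters of the Alexandria force field.

```python
import math

def coulomb_point(qi: float, qj: float, r: float) -> float:
    """Coulomb energy of two point charges (atomic units); diverges as r -> 0."""
    return qi * qj / r

def coulomb_gaussian(qi: float, qj: float, r: float,
                     beta_i: float, beta_j: float) -> float:
    """Coulomb energy of two spherical Gaussian charge densities.

    Each density is assumed to be q * (beta/sqrt(pi))^3 * exp(-beta^2 r^2).
    """
    beta_ij = beta_i * beta_j / math.sqrt(beta_i**2 + beta_j**2)
    if r == 0.0:
        # Finite limit of erf(beta_ij * r) / r for r -> 0.
        return qi * qj * 2.0 * beta_ij / math.sqrt(math.pi)
    return qi * qj * math.erf(beta_ij * r) / r

# Hypothetical charges and width parameters, chosen only for illustration.
q_i, q_j = 1.0, -1.0
beta_i = beta_j = 2.0

for r in (5.0, 2.0, 1.0, 0.5, 0.1):
    e_point = coulomb_point(q_i, q_j, r)
    e_gauss = coulomb_gaussian(q_i, q_j, r, beta_i, beta_j)
    print(f"r = {r:4.1f}  point = {e_point:9.4f}  gaussian = {e_gauss:9.4f}")

e_contact = coulomb_gaussian(q_i, q_j, 0.0, beta_i, beta_j)
print(f"r =  0.0  gaussian = {e_contact:9.4f} (finite)")
```

At large separation the two models coincide, while at short range the Gaussian model levels off at a finite value, which is the behavior that removes the singularity mentioned above.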
2. Alexandria Library
This chapter explains the theoretical background, computational details and validation of the calculated molecular properties provided in the Alexandria library. It is based on papers I and II. The Alexandria library has been designed to parameterize the potential en- ergy functions used in the Alexandria force field. It is an open access database of the optimized geometry and physico-chemical properties of molecules ob- tained using quantum chemistry calculations. The first version of the library, AlexandriaLib.v1.1, consists of 2704 organic and inorganic compounds (paper II). The elements available in the library are colored in Fig. 2.1A and their fraction is displayed in Fig. 2.1B. However, the first version of the Alexandria force field only supports the elements that are shown in green in Fig. 2.1A. The alkali elements are available in their ionic form only while the halogens are available both as halide ions and as atoms in polyatomic compounds.
Figure 2.1. A: The 39 colored elements are available in the AlexandriaLib.v1.1. The elements supported by the first version of the Alexandria force field are shown in green. B: The fraction of the elements that are available in the AlexandriaLib.v1.1.
2.1 Data Availability

The Alexandria library is freely available from the Zenodo repository and can be downloaded from https://doi.org/10.5281/zenodo.1004711.
2.2 Properties in the Alexandria Library

The thermochemical equations explained in section 2.2.1 are implemented in the GROMACS software package [39] to calculate molecular thermochemistry using classical force fields (paper V). The electrostatic equations explained in section 2.2.2 are implemented in the Alexandria chemistry toolkit for parameterizing the Alexandria force field (papers III and VI).
2.2.1 Molecular thermochemistry

The gas-phase molecular thermochemistry is useful for obtaining potential functions for bond deformation (stretching, bending, and twisting) and for evaluating these potential functions in terms of the molecular vibrational frequencies (see paper V). The thermochemical properties discussed here are the internal energy ($E$), the molar enthalpy of formation ($\Delta_f H$), the heat capacity at constant volume ($C_V$) and at constant pressure ($C_P$), and the standard entropy ($S^0$). The internal energy of a molecule is the total energy contained within the molecule. The molar enthalpy of formation of a chemical compound is the change of enthalpy during the formation of one mole of the compound from its constituent elements. The molar heat capacity is the amount of energy that must be transferred to one mole of a substance to increase its temperature by one kelvin. The entropy of one mole of a substance in its standard state is called the standard molar entropy.

Statistical thermodynamics calculates $E$, $C_V$, and $S^0$ of an isolated molecule in the gas phase from its geometry and its motional degrees of freedom. The thermodynamic functions are:

$$E = RT^2 \left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} \tag{2.1}$$

$$C_V = 2RT \left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} + RT^2 \left(\frac{\partial^2 \ln Q}{\partial T^2}\right)_{N,V} \tag{2.2}$$

$$S^0 = R \ln Q + RT \left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} \tag{2.3}$$

where $R$ is the ideal gas constant, $T$ is the absolute temperature, and $Q$ is the partition function of the canonical ensemble ($N$, $V$, $T$). The heat capacity at constant pressure can be approximated from the heat capacity at constant volume and the second virial coefficient using the equation of state of a gas. The virial expansion expresses the equation of state as a polynomial either in $\bar{V}^{-1}$ or in $P$ as follows [40]:
$$\frac{P\bar{V}}{RT} = 1 + B_{2v}(T)\,\bar{V}^{-1} + B_{3v}(T)\,\bar{V}^{-2} + \cdots \tag{2.4}$$

$$\frac{P\bar{V}}{RT} = 1 + B_{2p}(T)\,P + B_{3p}(T)\,P^2 + \cdots \tag{2.5}$$

where $\bar{V}$ is the molar volume, $P$ is the pressure, and $B_{nv}(T)$ and $B_{np}(T)$ are the virial coefficients of the expansions in $\bar{V}^{-1}$ and in $P$, respectively. $B_{2v}(T)$ and $B_{2p}(T)$ are related by
$$B_{2v}(T) = RT\,B_{2p}(T) \tag{2.6}$$

Therefore, using Eqn. 2.5 and Eqn. 2.6, we can approximate the equation of state of a gas to first order in $P$ as follows:
$$P\bar{V} = RT + B_{2v}(T)\,P \tag{2.7}$$

This is a good approximation because the third virial coefficient is negligible even at very high pressures [40]. From the thermodynamic relations, we can derive that

$$C_P - C_V = T \left(\frac{\partial P}{\partial T}\right)_{\bar{V}} \left(\frac{\partial \bar{V}}{\partial T}\right)_{P} \tag{2.8}$$

Therefore, applying Eqn. 2.8 to Eqn. 2.7 results in

$$C_P = C_V + R + 2P\,\frac{dB_{2v}(T)}{dT} + \frac{P^2}{R}\left(\frac{dB_{2v}(T)}{dT}\right)^2 \tag{2.9}$$
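The step from Eqn. 2.7 to Eqn. 2.9 is not spelled out in the text. One possible route is sketched below, writing $B'$ as shorthand for $dB_{2v}(T)/dT$ (a notation introduced only for this sketch) and solving Eqn. 2.7 once for the molar volume and once for the pressure:

```latex
\begin{align*}
\bar{V} &= \frac{RT}{P} + B_{2v}(T)
  &&\Rightarrow\quad
  \left(\frac{\partial \bar{V}}{\partial T}\right)_{P} = \frac{R}{P} + B' \\[4pt]
P &= \frac{RT}{\bar{V} - B_{2v}(T)}
  &&\Rightarrow\quad
  \left(\frac{\partial P}{\partial T}\right)_{\bar{V}}
  = \frac{R}{\bar{V} - B_{2v}} + \frac{RT\,B'}{\left(\bar{V} - B_{2v}\right)^{2}}
  = \frac{P}{T} + \frac{P^{2}B'}{RT} \\[4pt]
C_P - C_V &= T\left(\frac{P}{T} + \frac{P^{2}B'}{RT}\right)\left(\frac{R}{P} + B'\right)
  = R + 2P\,B' + \frac{P^{2}}{R}\,B'^{2}
\end{align*}
```

The last line reproduces the three correction terms of Eqn. 2.9, and setting $B' = 0$ recovers the ideal-gas result.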
It should be noted that $B_{2v}(T)$ is zero for an ideal gas; hence, Eqn. 2.9 simplifies to

$$C_P = C_V + R \tag{2.10}$$

The canonical partition function of an ideal gas can be decomposed into partition functions of different degrees of freedom, namely electronic (el), translational (tr), rotational (rot), and vibrational (vib) motions using [40, 41]:
$$Q = Q_{el}\,Q_{tr}\,Q_{rot}\,Q_{vib} \tag{2.11}$$

The rigid-rotator model is used to determine the contribution of the rotational motions to the total partition function [42]. In this model, the lengths of the bonds between the atoms of the molecule are considered to remain fixed while the molecule rotates. The partition function of a rigid rotator is defined as:

$$Q_{rot} = \frac{1}{\sigma}\,\frac{T}{\Theta_{rot}} \tag{2.12}$$

where $\sigma$ is the symmetry number and
$$\Theta_{rot} = \frac{h^2}{8\pi^2 I k_B} \tag{2.13}$$

where $I$ is the moment of inertia, $h$ is Planck's constant, and $k_B$ is Boltzmann's constant. The harmonic-oscillator approximation is usually used to describe the bond vibration around its equilibrium length. This is a good approximation only if the amplitude of the vibrations is small. For a harmonic oscillator, the partition function is defined as
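$$Q_{vib} = \frac{e^{-h\nu/2k_BT}}{1 - e^{-h\nu/k_BT}} \tag{2.14}$$

where $\nu$ is the frequency of vibration and the zero of energy is taken at the bottom of the potential well, so that the zero-point energy $h\nu/2$ appears in the numerator. Considering all these approximations, the internal energy $E$ can be calculated by applying Eqn. 2.1 to Eqn. 2.11: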
$$E = E_{tr} + E_{rot} + E_{vib} \tag{2.15}$$

where $E_{tr}$ is $\frac{3}{2}RT$ for both linear and nonlinear polyatomic ideal gases, while $E_{rot}$ is $RT$ for linear and $\frac{3}{2}RT$ for nonlinear molecules. The contribution of the vibrational modes to the internal thermal energy is given by

$$E_{vib} = RT \sum_{i=1}^{3n-f} \left[ \frac{h\nu_i}{2k_BT} + \frac{h\nu_i/k_BT}{e^{h\nu_i/k_BT} - 1} \right] \tag{2.16}$$

where $n$ is the number of atoms and $f$ is 5 for a linear and 6 for a nonlinear polyatomic molecule. Applying Eqn. 2.2 to Eqn. 2.11 gives the heat capacity at constant volume $C_V$:
$$C_V = C_{tr} + C_{rot} + C_{vib} \tag{2.17}$$

where $C_{tr}$ is $\frac{3}{2}R$ for all molecules, while $C_{rot}$ is $R$ for linear and $\frac{3}{2}R$ for nonlinear molecules. The contribution of the vibrational modes to the heat capacity is given by [40, 41]

$$C_{vib} = R \sum_{i=1}^{3n-f} \left(\frac{h\nu_i}{k_BT}\right)^2 \frac{e^{-h\nu_i/k_BT}}{\left(1 - e^{-h\nu_i/k_BT}\right)^2} \tag{2.18}$$

Similarly, by applying Eqn. 2.3 to Eqn. 2.11, the standard entropy $S^0$ is given by
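$$S^0 = S_{tr} + S_{rot} + S_{vib} \tag{2.19}$$

$$S_{tr} = R \left[ \frac{5}{2} + \ln\frac{k_BT}{P} + \ln\left(\frac{2\pi M k_BT}{h^2}\right)^{3/2} \right] \tag{2.20}$$

$$S_{rot} = R \left[ \frac{3}{2} + \ln\frac{\sqrt{\pi}}{\sigma} + \ln\frac{T^{3/2}}{\sqrt{\Theta_{rot}}} \right] \tag{2.21}$$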
where $P$ is the pressure and $M$ is the mass of the molecule. The vibrational entropy is given by [40, 41]

$$S_{vib} = R \sum_{i=1}^{3n-f} \left[ \frac{h\nu_i/k_BT}{e^{h\nu_i/k_BT} - 1} - \ln\left(1 - e^{-h\nu_i/k_BT}\right) \right] \tag{2.22}$$

More information than the vibrational frequencies is needed to calculate the enthalpy of formation $\Delta_f H(M,T)$ of a molecule at a given temperature. It is computed in a number of steps [41]. The enthalpy of formation at temperature $T$ is given by
$$\Delta_f H(M,T) = \Delta_f H(M,0) + \Delta\Delta H(M,T) - \sum_{x=1}^{N} \left[\Delta\Delta H(x,T)\right] \tag{2.23}$$

where $\Delta_f H(M,0)$ is the enthalpy of formation of the molecule at 0 K, and $\Delta\Delta H(M,T)$ corresponds to the energy needed to increase the temperature of molecule $M$ from 0 K to $T$. This can be calculated as $H_{corr} - ZPC$, where $H_{corr}$ is the thermal correction to the enthalpy of the molecule and $ZPC$ is the zero-point correction. $\Delta\Delta H(x,T)$ represents the amount of energy needed to increase the temperature from 0 K to $T$ for atom $x$ in molecule $M$. The enthalpy of formation at 0 K is given by
$$\Delta_f H(M,0\,\mathrm{K}) = E_0(M) + \sum_{x=1}^{N} \left[\Delta_f H(x,0) - E_0(x)\right] \tag{2.24}$$

where $\Delta_f H(x,0)$ is the enthalpy of formation of atom $x$ at 0 K, $N$ is the number of atoms in the molecule, and $E_0$ is the total electronic energy. It should be noted that $\Delta\Delta H(x,T)$ and $\Delta_f H(x,0)$ were obtained from experimental data in the thermochemical theories used in this study [43, 44].
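The harmonic-oscillator contributions of Eqns. 2.16, 2.18, and 2.22 can be evaluated directly from a list of vibrational frequencies. The following is a minimal sketch, not the GROMACS implementation referred to above. It assumes the standard textbook forms with the zero-point energy included in the vibrational energy, frequencies supplied as wavenumbers in cm^-1, and the translational and rotational modes already removed. The function name and the sample wavenumbers are hypothetical.

```python
import math

R = 8.314462618        # ideal gas constant, J mol^-1 K^-1
H = 6.62607015e-34     # Planck's constant, J s
KB = 1.380649e-23      # Boltzmann's constant, J K^-1
C_CM = 2.99792458e10   # speed of light in cm s^-1 (converts cm^-1 to Hz)


def vibrational_thermochemistry(wavenumbers_cm, temperature):
    """Return (E_vib, C_vib, S_vib) for an ideal gas of harmonic oscillators.

    wavenumbers_cm: the 3n - f strictly positive harmonic wavenumbers (cm^-1).
    temperature:    absolute temperature in K (must be > 0).
    Units: E_vib in J/mol; C_vib and S_vib in J/(mol K).
    """
    e_sum = c_sum = s_sum = 0.0
    for wn in wavenumbers_cm:
        x = H * C_CM * wn / (KB * temperature)   # h*nu_i / (k_B*T)
        boltz = math.exp(-x)                     # exp(-h*nu_i / (k_B*T))
        # x*boltz/(1 - boltz) is the same quantity as x/(exp(x) - 1)
        e_sum += 0.5 * x + x * boltz / (1.0 - boltz)
        c_sum += x * x * boltz / (1.0 - boltz) ** 2
        s_sum += x * boltz / (1.0 - boltz) - math.log1p(-boltz)
    return R * temperature * e_sum, R * c_sum, R * s_sum


# Illustrative wavenumbers only; they are not taken from the Alexandria library.
e_vib, c_vib, s_vib = vibrational_thermochemistry([1600.0, 3700.0, 3800.0], 298.15)
print(f"E_vib = {e_vib:.1f} J/mol, C_vib = {c_vib:.3f} J/(mol K), S_vib = {s_vib:.3f} J/(mol K)")
```

Adding the translational and rotational terms quoted in the text to these values gives $E$, $C_V$, and $S^0$, and, for an ideal gas, $C_P = C_V + R$ as in Eqn. 2.10.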
2.2.2 Molecular electrostatics

The molecular electron density $\rho(\mathbf{r})$ generates an electrostatic potential $\phi(\mathbf{r})$ at an arbitrary point $\mathbf{r}$ in space. The electrostatic potential is, by definition, the work done to bring a unit positive charge from infinity to that point. It is a physical observable that can be determined from quantum mechanical calculations. Following the superposition principle, $\phi(\mathbf{r})$ can be computed by integrating the contributions from individual differential elements of the electron density as follows:

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0} \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d\mathbf{r}' \tag{2.25}$$

where $\varepsilon_0$ is the permittivity of free space. It should be noted that the nuclei of the molecule also contribute to $\phi(\mathbf{r})$ (see Chapter 5). If the point
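$\mathbf{r}$ is outside the distribution of electron density and $r \gg r'$, the electrostatic potential can be evaluated through the Taylor expansion of $|\mathbf{r} - \mathbf{r}'|^{-1}$ [13]:

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} \approx \frac{1}{r} + (\hat{\mathbf{r}} \cdot \mathbf{r}')\,\frac{1}{r^2} + \frac{1}{2}\left[3(\hat{\mathbf{r}} \cdot \mathbf{r}')^2 - r'^2 I\right]\frac{1}{r^3} + \cdots \tag{2.26}$$

where $\hat{\mathbf{r}} = \mathbf{r}/r$ and $I$ is the identity matrix. By inserting Eqn. 2.26 into Eqn. 2.25, we get

$$\phi(\mathbf{r}) \approx \frac{1}{4\pi\varepsilon_0}\left[\frac{Q}{r} + \frac{\mu_0}{r^2} + \frac{\Theta_0}{r^3} + \cdots\right] \tag{2.27}$$

$Q$ is the monopole moment, sometimes called the zeroth moment of the molecular electron density. In principle, this is the total charge of the molecule. $Q$ is given by

$$Q = \int \rho(\mathbf{r}')\, d\mathbf{r}' \tag{2.28}$$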
$\mu_0$ is the vector of the permanent dipole moment, which measures the polarity of the molecular electron density. $\mu_0$ is given by

$$\mu_0 = \int \rho(\mathbf{r}')\,(\hat{\mathbf{r}} \cdot \mathbf{r}')\, d\mathbf{r}' \tag{2.29}$$
$\Theta_0$ is the tensor of the permanent quadrupole moment, which describes the deviation of the distribution of the molecular electron density from spherical symmetry. $\Theta_0$ is given by

$$\Theta_0 = \frac{1}{2}\int \rho(\mathbf{r}')\left[3(\hat{\mathbf{r}} \cdot \mathbf{r}')^2 - r'^2 I\right] d\mathbf{r}' \tag{2.30}$$

The quadrupole tensor can be written as a traceless 3 × 3 matrix in Cartesian coordinates if one writes $(\hat{\mathbf{r}} \cdot \mathbf{r}')^2$ as:
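$$(\hat{\mathbf{r}} \cdot \mathbf{r}')^2 = \hat{\mathbf{r}} \cdot (\mathbf{r}'\mathbf{r}') \cdot \hat{\mathbf{r}} \tag{2.31}$$

where $\mathbf{r}'\mathbf{r}'$ is the outer product of the vector $\mathbf{r}'$ with itself. This results in a matrix that can be written in terms of the Cartesian components $(x, y, z)$ of the vector $\mathbf{r}'$ as follows:

$$\mathbf{r}'\mathbf{r}' = \begin{bmatrix} x^2 & xy & xz \\ yx & y^2 & yz \\ zx & zy & z^2 \end{bmatrix} \tag{2.32}$$

Finally, we get

$$\frac{1}{2}\left[3(\hat{\mathbf{r}} \cdot \mathbf{r}')^2 - r'^2 I\right] = \frac{1}{2}\,\hat{\mathbf{r}} \cdot \begin{bmatrix} 3x^2 - r'^2 & 3xy & 3xz \\ 3yx & 3y^2 - r'^2 & 3yz \\ 3zx & 3zy & 3z^2 - r'^2 \end{bmatrix} \cdot \hat{\mathbf{r}} \tag{2.33}$$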
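For a set of point charges, the integrals over the charge density in Eqns. 2.28 to 2.30 reduce to sums, which makes the structure of the traceless quadrupole tensor easy to check numerically. The sketch below is an illustration under that point-charge assumption and is not the Alexandria toolkit code. The function names are hypothetical, charges are signed, and the dipole is returned as the Cartesian vector whose projection on the unit vector appears in Eqn. 2.29.

```python
def monopole(charges):
    """Total charge: the discrete analogue of Eqn. 2.28."""
    return sum(charges)


def dipole(charges, positions):
    """Dipole vector sum_k q_k * r_k: the discrete analogue of Eqn. 2.29."""
    return [sum(q * pos[a] for q, pos in zip(charges, positions)) for a in range(3)]


def traceless_quadrupole(charges, positions):
    """Theta_ab = 1/2 * sum_k q_k * (3*r_a*r_b - r^2*delta_ab), cf. Eqns. 2.30-2.33."""
    theta = [[0.0] * 3 for _ in range(3)]
    for q, pos in zip(charges, positions):
        r2 = pos[0] ** 2 + pos[1] ** 2 + pos[2] ** 2
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                theta[a][b] += 0.5 * q * (3.0 * pos[a] * pos[b] - r2 * delta)
    return theta


# Hypothetical linear arrangement (+1, -2, +1) along z, in arbitrary units.
charges = [1.0, -2.0, 1.0]
positions = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
for row in traceless_quadrupole(charges, positions):
    print(row)
```

In this arrangement the monopole and dipole vanish, so the quadrupole is the leading term of the expansion in Eqn. 2.27.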
Note that higher electric moments such as the octupole and hexadecapole moments are not described here, but they are available in the Alexandria library. The interested reader is referred to Ref [14] for the higher electric moments.

The shape of the molecular electron density changes when it interacts with an external electric field; hence, the total energy of the molecule changes. The static response of a molecule to a homogeneous external electric field, $F$, can be studied by expanding its energy in a Taylor series [45, 46]:

$$E(F) = E(0) + \left(\frac{\partial E}{\partial F}\right)_{F=0} F + \frac{1}{2}\left(\frac{\partial^2 E}{\partial F^2}\right)_{F=0} F^2 + \frac{1}{6}\left(\frac{\partial^3 E}{\partial F^3}\right)_{F=0} F^3 + \cdots \tag{2.34}$$

where

$$-\left(\frac{\partial E}{\partial F}\right)_{F=0} = \mu_0 \tag{2.35}$$

$$-\left(\frac{\partial^2 E}{\partial F^2}\right)_{F=0} = \alpha \tag{2.36}$$

$$-\left(\frac{\partial^3 E}{\partial F^3}\right)_{F=0} = \beta \tag{2.37}$$

Here $\mu_0$ is the vector of the permanent dipole moment, $\alpha$ is the polarizability tensor, which is the linear part of the response of the molecular electron density to the external electric field, and $\beta$ is the first hyperpolarizability [45].

Instead of expanding the energy, we can expand the dipole moment of a molecule in an external electric field [46], written as

$$\mu = \mu_0 + \alpha F + \frac{1}{2}\beta F^2 + \cdots \tag{2.38}$$

where $\alpha F$ gives the vector of the induced dipole moment, $\mu_1$ [45]:
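$$\mu_1 = \alpha F \tag{2.39}$$

which can be written in matrix form as

$$\begin{bmatrix} \mu_x \\ \mu_y \\ \mu_z \end{bmatrix} = \begin{bmatrix} \alpha_{xx} & \alpha_{xy} & \alpha_{xz} \\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz} \\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz} \end{bmatrix} \begin{bmatrix} F_x \\ F_y \\ F_z \end{bmatrix} \tag{2.40}$$

From the polarizability tensor, the isotropic polarizability [46, 47],

$$\bar{\alpha} = \frac{\alpha_{xx} + \alpha_{yy} + \alpha_{zz}}{3} \tag{2.41}$$

and the polarizability anisotropy [47],

$$\Delta\alpha = \sqrt{\frac{(\alpha_{xx} - \alpha_{yy})^2 + (\alpha_{xx} - \alpha_{zz})^2 + (\alpha_{yy} - \alpha_{zz})^2 + 6\left(\alpha_{xy}^2 + \alpha_{xz}^2 + \alpha_{yz}^2\right)}{2}} \tag{2.42}$$

can be calculated. Other definitions of the polarizability anisotropy can be found in Ref [46].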
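Eqns. 2.39 to 2.42 translate into a few lines of code. The sketch below assumes a symmetric 3 × 3 polarizability tensor stored as nested lists and the common square-root form of the anisotropy. The function names and the sample tensor are hypothetical and serve only to illustrate the formulas.

```python
import math


def induced_dipole(alpha, field):
    """mu_1 = alpha * F (Eqns. 2.39 and 2.40) for a 3x3 tensor and a 3-vector."""
    return [sum(alpha[i][j] * field[j] for j in range(3)) for i in range(3)]


def isotropic_polarizability(alpha):
    """One third of the trace of the polarizability tensor (Eqn. 2.41)."""
    return (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0


def polarizability_anisotropy(alpha):
    """Anisotropy in the square-root form assumed here for Eqn. 2.42."""
    diagonal = ((alpha[0][0] - alpha[1][1]) ** 2
                + (alpha[0][0] - alpha[2][2]) ** 2
                + (alpha[1][1] - alpha[2][2]) ** 2)
    off_diagonal = alpha[0][1] ** 2 + alpha[0][2] ** 2 + alpha[1][2] ** 2
    return math.sqrt((diagonal + 6.0 * off_diagonal) / 2.0)


# Hypothetical axially symmetric tensor in arbitrary units.
alpha = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 4.0]]
print(isotropic_polarizability(alpha))
print(polarizability_anisotropy(alpha))
print(induced_dipole(alpha, [0.0, 0.0, 2.0]))
```

With this definition the anisotropy vanishes for an isotropic tensor, and for an axially symmetric one it reduces to the difference between the parallel and perpendicular components.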
2.3 Computational details

All quantum chemistry calculations were performed with the Gaussian software package (versions 09 [48] and 16 [49]). The standard G2, G3, G4 [50, 51, 52, 53, 43], CBS-QB3 [54, 55], W1U, and W1BD [44] methods were used to calculate the enthalpy of formation, heat capacity, and absolute entropy at room temperature. The Weizmann family of methods was used on a subset of about 600 compounds only, due to computational cost.

The B3LYP density functional was used with the aug-cc-pVTZ basis set [56, 57, 58] to optimize molecular geometries and to calculate frequencies, electric moments up to hexadecapole, the polarizability tensor, and the electrostatic potential (ESP) surface of the molecules. However, the aug-cc-pVTZ-PP basis set was used for iodine to take relativistic pseudopotentials into account. Quantum-based partial atomic charges were computed for each molecule using different charge-generating algorithms, including Mulliken Population Analysis (MPA) [59], Hirshfeld Population Analysis (HPA) [60], ESP charges [61], and the Charge Model 5 (CM5) [62]. The Merz-Kollman scheme was used to generate the grids around the molecule in order to calculate the electrostatic potential and its corresponding atomic charges [63]. As a reference, the same calculations were also performed at the HF/6-311G** level of theory [64, 65, 66, 67], which is similar to widely used methods for calculating partial atomic charges in the virtual screening of large chemical libraries.
Figure 2.2. The number of quantum chemistry calculations performed at each level of theory.
2.4 Technical validation

Paper II explains the procedure used to evaluate the optimized geometry of the molecules provided in the AlexandriaLib.v1.1. The thermochemistry and the electrostatic properties obtained from quantum chemistry calculations were validated by comparing them to experimental data. Several sources of experimental data on physico-chemical properties were used, such as the National Institute of Standards and Technology (NIST), the Design Institute for Physical Properties (DIPPR) [68], and the Handbooks of Chemistry [69, 70, 71]. In some cases, multiple experimental values were reported for the same property of a molecule. If these values were similar to each other, the average and the standard deviation of the values were taken to be the reference value and the error, respectively. Where the discrepancy between values was significant, the values were cross-referenced against the original publication to check for transcription errors. If the original publication was not accessible, the value was excluded from our statistics and reported as a suspected error in the experimental data in papers I and II.
3. Intermolecular Potential Energy Function
This chapter is based on papers III and IV. It briefly explains the quantum mechanical and molecular mechanical approximations used in computing the long-range and the short-range intermolecular interactions.
3.1 Quantum mechanical approximations for intermolecular energies

The interaction energy between two molecules depends on the distance between the molecules, $r$, and their orientations. The supermolecular approach computes the interaction energy, $E_{int}$, between molecules A and B as follows [72, 73]:

$$E_{int} = E_{AB} - (E_A + E_B) \tag{3.1}$$

where the interaction energy is simply the difference between the energy of the dimer ($E_{AB}$) and the energies of the two monomers ($E_A$ and $E_B$). However, there is always a non-physical lowering of the monomer's energy in calculations of the dimer, because each monomer uses the basis set of the interacting partner to lower its own energy. This is called the basis set superposition error (BSSE) [72, 73].

To reduce the BSSE, one solution is to use the time-independent Rayleigh-Schrödinger perturbation theory (RSPT). This method is also often called the polarization theory. The RSPT can be formulated for two interacting molecules A and B as follows [72, 73]:

$$\left(\hat{H}^0 + \xi \hat{H}'\right)\Psi_{AB} = E_{AB}\Psi_{AB} \tag{3.2}$$
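where $\hat{H}^0$ is the unperturbed Born-Oppenheimer Hamiltonian operator of the dimer AB, given by

$$\hat{H}^0 = \hat{H}_A + \hat{H}_B \tag{3.3}$$

and $\hat{H}'$ is the perturbation operator consisting of the electrostatic interaction between the electrons and nuclei of molecule A with those of molecule B, given by [47]

$$\hat{H}' = \frac{1}{4\pi\varepsilon_0} \iint \frac{\hat{\rho}^A(\mathbf{r}_1)\,\hat{\rho}^B(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}\, d\mathbf{r}_1\, d\mathbf{r}_2 \tag{3.4}$$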
where $\hat{\rho}$ is the charge density operator. $\xi$ defines the order of perturbation and varies between 0 and 1. $\xi = 0$ switches off the electrostatic interactions between the two molecules; hence, the dimer energy is the sum of the monomer energies and the dimer wave function is the product of the monomer wave functions [73]:

$$E_{AB}(\xi = 0) = E_A + E_B \tag{3.5}$$
$$\Psi_{AB}(\xi = 0) = \Psi_A \Psi_B \tag{3.6}$$

On the other hand, $\xi = 1$ completely includes the electrostatic interactions between the two monomers; thus, $E_{AB}(\xi = 1)$ and $\Psi_{AB}(\xi = 1)$ are the exact energy and wave function of the dimer. As a result of RSPT, the interaction energy and the dimer wave function can be expressed as infinite power series in $\xi$ as follows [73]:

$$E_{int}(\xi) = \sum_{n=1}^{\infty} \xi^n E^{(n)}_{pol} \tag{3.7}$$

$$\Psi_{AB}(\xi) = \sum_{n=0}^{\infty} \xi^n \Psi^{(n)}_{pol} \tag{3.8}$$

where the zeroth-order wave function is the product $\Psi_A \Psi_B$ of Eqn. 3.6. It should be noted that Eqns. 3.7 and 3.8 converge only for small values of $\xi$. The energy terms in Eqn. 3.7 are often called polarization energies [72, 73]. The first-order polarization energy is equivalent to the classical electrostatic interaction energy between two charge distributions. However, an additional energy term becomes important at short intermolecular distances. This term is often called the charge penetration energy, which is a result of the overlap of the electron densities of the monomers. Therefore, the first-order polarization energy is the sum of the electrostatic and penetration energies [72]:
$$E^{(1)}_{pol} = E^{(1)}_{elstat} + E^{(1)}_{penetr} \tag{3.9}$$

The second- and the third-order polarization energies include the induction and dispersion energies [72]:
$$E^{(2)}_{pol} = E^{(2)}_{ind} + E^{(2)}_{disp} \tag{3.10}$$
$$E^{(3)}_{pol} = E^{(3)}_{ind} + E^{(3)}_{disp} \tag{3.11}$$

The second-order induction energy, $E^{(2)}_{ind}$, results from the mutual polarization of the monomers by the static electric field of their unperturbed partners. However, the third-order induction energy, $E^{(3)}_{ind}$, corresponds to the simultaneous polarization of the monomers by the field of their partners. $E^{(2)}_{disp}$ and $E^{(3)}_{disp}$ result from the intermolecular correlation of the electrons of the monomers.

The problem with the RSPT is that it neglects the electron exchange between the monomers. A simultaneous tunneling of two electrons is called the
electron exchange [73]. If the interacting monomers have $W_A$ and $W_B$ electrons, the number of quantum states, $M$, as the result of electron exchange is given by

$$M = \frac{(W_A + W_B)!}{W_A!\,W_B!} \tag{3.12}$$

Note that the electron exchange might bring two electrons with the same spin into the same orbital; this is not allowed by the Pauli exclusion principle. It results in an energy cost; therefore, the energy associated with the electron exchange is repulsive.

The lack of electron exchange in RSPT can be cured by applying an antisymmetrization operator $\hat{A}$ to $\Psi_{AB}(\xi = 0)$ [72, 73]:
$$\hat{A}\Psi_{AB}(\xi = 0) = \hat{A}\Psi_A \Psi_B \tag{3.13}$$
However, the unperturbed Hamiltonian $\hat{H}^0$ is no longer the sum of the monomer Hamiltonians, as defined in Eqn. 3.3. Many methods have been explored over the years to resolve this problem [74]. One of these is symmetry-adapted perturbation theory (SAPT) [73], which was used in paper III. SAPT adds energy correction terms, arising from the electron exchange, to each order of the polarization energies [72]. As a result, we get:
$$E^{(1)} = E^{(1)}_{pol} + E^{(1)}_{exch} \tag{3.14}$$

$$E^{(2)} = E^{(2)}_{pol} + E^{(2)}_{exch-ind} + E^{(2)}_{exch-disp} \tag{3.15}$$

$E^{(1)}_{exch}$ accounts for about 90% of the exchange energy in the interaction energy at van der Waals distances [73]. The reader is referred to Refs [47, 75] for the details of the SAPT and the exchange energy corrections.

The RSPT and SAPT methods, as shown above, decompose the intermolecular interaction energy into different energy components. Table 3.1 summarizes the distance dependence and the sign of the first- and second-order energy components that mainly contribute to the total interaction energy [72].
Table 3.1. Intermolecular distance dependence and the sign of energy for the components of intermolecular interaction energy [72].

Component             r dependence                               Sign
$E^{(1)}_{elstat}$    $r^{-1}, r^{-2}, r^{-3}, \cdots$           +/-
$E^{(1)}_{penetr}$    $e^{-ar}$                                  +
$E^{(1)}_{exch}$      $e^{-br}$                                  +
$E^{(2)}_{ind}$       $r^{-4}, r^{-6}$                           -
$E^{(2)}_{disp}$      $r^{-6}, r^{-8}, r^{-10}, \cdots$          -
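To make the distinction between short-range and long-range components in Table 3.1 concrete, the sketch below compares only the leading functional forms. All prefactors are set to one and the decay constants `a` and `b` are hypothetical, so the numbers carry no physical meaning beyond the relative rate of decay with distance. The function name is an assumption introduced for this illustration.

```python
import math


def leading_magnitudes(r, a=1.0, b=1.0):
    """Leading-order distance dependence (magnitudes only) of the Table 3.1 terms.

    r is the intermolecular distance in arbitrary units; a and b are
    hypothetical decay constants for the penetration and exchange terms.
    """
    return {
        "elstat": r ** -1,           # leading multipole term
        "penetr": math.exp(-a * r),  # charge penetration
        "exch": math.exp(-b * r),    # exchange repulsion
        "ind": r ** -4,              # leading induction term
        "disp": r ** -6,             # leading dispersion term
    }


for r in (2.0, 5.0, 10.0, 20.0):
    terms = leading_magnitudes(r)
    print(r, {name: f"{value:.2e}" for name, value in terms.items()})
```

At large separations the exponential terms fall below every power-law term, which is why penetration and exchange are treated as short-range effects while electrostatics, induction, and dispersion determine the long-range behavior.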
3.2 Alexandria force field approximations for intermolecular energies

Molecular-mechanical force fields use simple potential energy functions to compute the intermolecular interaction energy of a molecular system, as explained in Chapter 1 (see Eqn. 1.30). The remainder of this chapter explains the potential functions implemented to include explicit terms for electronic polarization and charge penetration effects in the Alexandria force field. Our aim is to use potential functions that change smoothly as a function of distance and that remain well-behaved as the distance between two molecules approaches zero.
3.2.1 Electrostatic and Charge Penetration

The electrostatic energy between atoms $i$ and $j$ can be computed through the Coulomb integral as follows: