
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1793

Alexandria: A General Drude Polarizable Force Field with Spherical Charge Density

MOHAMMAD MEHDI GHAHREMANPOUR

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0624-7 UPPSALA urn:nbn:se:uu:diva-380687 2019

Dissertation presented at Uppsala University to be publicly examined in Room B21, Uppsala Biomedical Centre, Husargatan 3, Uppsala, Monday, 27 May 2019 at 09:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Kresten Lindorff-Larsen (Department of Biology, University of Copenhagen).

Abstract

Ghahremanpour, M. M. 2019. Alexandria: A General Drude Polarizable Force Field with Spherical Charge Density. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1793. 69 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0624-7.

Molecular-mechanical (MM) force fields are mathematical functions that map the geometry of a molecule to its associated energy. MM force fields have been used extensively to obtain an atomistic view of the dynamics and thermodynamics of large molecular systems in the condensed phase. Nevertheless, the grand challenge in force field development, which remains to be addressed, is to predict the properties of materials with different chemistries and in all their physical phases. Force fields are, in principle, derived through supervised machine learning methods. Therefore, the first step toward more accurate force fields is to provide high-quality reference data from which the force fields can learn. Thus, we benchmarked quantum-mechanical methods, at different levels of theory, in predicting molecular energetics and electrostatic properties. As a result, the Alexandria library was released as an open access database of molecular properties. The second step is to use potential functions that accurately describe the interactions between molecules. For this, we incorporated electronic polarization and charge penetration effects into the Alexandria force field. The Drude model was used for the explicit inclusion of electronic polarization. The distribution of the atomic charges was described by either a 1s-Gaussian or an ns-Slater density function to account for charge penetration effects. Moreover, the 12-6 Lennard-Jones (LJ) potential function, commonly used in force fields, was replaced by the Wang-Buckingham (WBK) function to describe the interaction of two particles at very short distances. In contrast to the 12-6 LJ function, the WBK function is well behaved at short distances because it has a finite limit as the distance between two particles approaches zero. The third step is free and open source software (FOSS) for systematic optimization of the built-in force field parameters.
For this, we developed the Alexandria Chemistry Toolkit, which is currently part of the GROMACS software package. With these three steps, the Alexandria force field was developed for alkali halides and for organic compounds consisting of H, C, N, O, S, P, and the halogens F, Cl, Br, and I. We demonstrated that the Alexandria force field described alkali halides in the gas, liquid, and solid phases with an overall performance better than that of the benchmarked reference force fields. We also showed that the Alexandria force field predicted the electrostatics of isolated molecules and molecular complexes in agreement with density functional theory at the B3LYP/aug-cc-pVTZ level of theory.

Keywords: Molecular mechanics, Force field, Drude oscillator model, Alexandria library, GROMACS

Mohammad Mehdi Ghahremanpour, Department of Cell and Molecular Biology, Box 596, Uppsala University, SE-75124 Uppsala, Sweden.

© Mohammad Mehdi Ghahremanpour 2019

ISSN 1651-6214 ISBN 978-91-513-0624-7 urn:nbn:se:uu:diva-380687 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-380687)

To Rezvan

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Ghahremanpour, M. M., van Maaren, P. J., Ditz, J., Lindh, R., Van der Spoel, D. (2016) Large-Scale Calculations of Gas Phase Thermochemistry: Enthalpy of Formation, Standard Entropy and Heat Capacity. J. Chem. Phys., 145, 114305.

II Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. (2018) The Alexandria Library: A Quantum Chemical Database of Molecular Properties for Force Field Development. Sci. Data, 5, 180062.

III Ghahremanpour, M. M., van Maaren, P. J., Caleman, C., Hutchison, G. R., Van der Spoel, D. (2018) Polarizable Drude Model with s-type Gaussian or Slater Charge Density for General Molecular Mechanics Force Fields. J. Chem. Theory Comput. 14, 5553-5566.

IV Walz, M. M., Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. (2018) Phase-Transferable Force Field for Alkali Halides. J. Chem. Theory Comput., 14, 5933-5948.

V Van der Spoel, D., Ghahremanpour, M. M., Lemkul, J. (2018) Small Molecule Thermochemistry: A Tool for Empirical Force Field Development. J. Phys. Chem. A, 122, 8982-8988.

VI Ghahremanpour, M. M., van Maaren, P. J., Van der Spoel, D. Efficient Physics-Based Polarizable Charges: from Organic Compounds to Proteins. Manuscript.

Reprints were made with permission from the publishers.

Contents

1 Introduction
1.1 Quantum-Mechanical Models
1.1.1 Wave function theory
1.1.2 Density functional theory
1.2 Molecular-Mechanical Models

2 Alexandria Library
2.1 Data Availability
2.2 Properties in the Alexandria Library
2.2.1 Molecular thermochemistry
2.2.2 Molecular electrostatics
2.3 Computational details
2.4 Technical validation

3 Intermolecular Potential Energy Function
3.1 Quantum mechanical approximations for intermolecular energies
3.2 Alexandria force field approximations for intermolecular energies
3.2.1 Electrostatic and Charge Penetration
3.2.2 Electronic Polarization
3.2.3 Repulsion and Dispersion

4 Generation of Polarizable Atomic Charges
4.1 Electrostatic Potential (ESP)-fitting with Drude Model
4.2 Alexandria Charge Model

5 Parameterization
5.1 Linear Fitting
5.1.1 Singular value decomposition
5.1.2 Bootstrapping
5.2 Nonlinear Fitting
5.2.1 Bayesian inference
5.2.2 Bayesian computation
5.2.3 Simulated annealing

6 Alexandria Chemistry Toolkit
6.1 Extracting Quantum Chemistry Data
6.2 Generation of Force Field Atom Types
6.3 Optimization of Force Field Parameters
6.4 Generation of Molecular Topology and Atomic Charges
6.5 Coulomb Integrals for Distributed Charge Densities
6.6 Parallelization
6.7 License

7 Summary of papers
7.1 Paper I
7.2 Paper II
7.3 Paper III
7.4 Paper IV
7.5 Paper V
7.6 Paper VI

8 Populärvetenskaplig Sammanfattning på Svenska (Popular Science Summary in Swedish)

References

Abbreviations

ACM Alexandria Charge Model
ACT Alexandria Chemistry Toolkit
BSSE Basis Set Superposition Error
DFA Density Functional Approximation
DFT Density Functional Theory
DM Drude Model
ESP Electrostatic Potential
HF Hartree-Fock
MM Molecular Mechanics
QM Quantum Mechanics
SVD Singular Value Decomposition

1. Introduction

The longstanding and far-reaching goal in computational chemistry is to predict, with a single model, the state of chemical compounds with different chemistries and in all their physical phases. At the same time, the model should be applicable to systems of any size. However, theory-driven molecular models are all approximations limited to their domain(s) of applicability. The approximations range from quantum mechanics (QM) to molecular mechanics (MM). The trade-off between the simplicity (brute force) and the complexity (sophistication) of these approximations makes it challenging to design a single model that predicts the structure and properties of molecules efficiently and at a moderate computational cost. For instance, QM models at a high level of theory are more accurate than MM models, but at an increased computational cost, which limits their applicability to small and medium-sized systems. Consequently, MM models are increasingly used to obtain an atomistic view of the dynamics and thermodynamics of large systems such as liquids and biomacromolecules [1, 2, 3]. In spite of all the progress in MM models [4, 5, 6, 7, 8], they have not yet reached the level of sophistication needed to be transferable across phases and chemistries. The aim of the Alexandria project is to provide a platform to systematically advance MM models toward these goals. This platform consists of three main components: High Quality Reference Data (HQRD), Free and Open Source Software (FOSS), and Accurate Potential Energy Functions (APEF). Because MM models are parametric, these components together, as explained in what follows, are central to the systematic development of MM models.

Figure 1.1. The development procedure of the Alexandria force field in a nutshell.

This thesis is structured to explain Fig. 1.1 as follows. The first part of Chapter 1 briefly explains the basics of wave function theory and density functional theory, which are used to provide quantum chemistry reference data. The second part of Chapter 1 explains molecular mechanics and the formulation of classical force fields. In Chapters 2-6, I turn specifically to HQRD, APEF, and FOSS in the context of the Alexandria project. Chapter 2 introduces the Alexandria library, which is related to the HQRD component. Chapters 3 and 4, which concern the APEF component, explain the potential energy functions used in the Alexandria force field. Chapters 5 and 6 are related to the FOSS component. Chapter 5 explains the optimization algorithms implemented to parameterize the potential functions. Chapter 6 introduces the Alexandria Chemistry Toolkit (ACT) as free and open source software for force field parameterization. Finally, Chapter 7 summarizes each paper included in this thesis.

1.1 Quantum-Mechanical Models

1.1.1 Wave function theory

The Schrödinger wave equation for a non-relativistic and time-independent system is formulated as follows:

$$\hat{H}\Psi(\mathbf{x}) = E\Psi(\mathbf{x}) \tag{1.1}$$

where $\hat{H}$ is the Hamiltonian operator of M nuclei and N electrons with position vectors $\mathbf{R}_a$ and $\mathbf{r}_i$, respectively. Ψ(x) is the molecular wave function, where x is a vector consisting of three spatial coordinates r and one spin coordinate ω. E is the energy of the molecule. In the absence of external magnetic and electric fields, $\hat{H}$ is defined, in atomic units, as:

$$\hat{H} = -\sum_{a=1}^{M}\frac{\nabla_a^2}{2M_a} - \frac{1}{2}\sum_{i=1}^{N}\nabla_i^2 - \sum_{a=1}^{M}\sum_{i=1}^{N}\frac{Z_a}{|\mathbf{R}_a-\mathbf{r}_i|} + \sum_{a=1}^{M}\sum_{b>a}^{M}\frac{Z_aZ_b}{|\mathbf{R}_a-\mathbf{R}_b|} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|} \tag{1.2}$$

where the first two terms are the kinetic energies of the nuclei and the electrons, respectively. $\nabla^2$ is the Laplace operator, $M_a$ is the mass of nucleus a, and $Z_a$ is its charge. The third term is the attractive electrostatic interaction between nuclei and electrons. The last two terms are the repulsive electrostatic energies arising from the nucleus-nucleus and electron-electron interactions, respectively. Eqn. 1.1 and Eqn. 1.2 can be simplified through two central approximations: the Born-Oppenheimer (BO) and the Hartree-Fock (HF) approximations. The BO approximation assumes that the nuclei in a molecule are fixed relative to the electrons, so that the electrons move in the field of fixed nuclei [9]. This is a useful approximation because nuclei are much heavier than electrons and therefore move much more slowly [9]. As a result, Eqn. 1.2 simplifies to

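$$\hat{H}_e = -\frac{1}{2}\sum_{i=1}^{N}\nabla_i^2 - \sum_{a=1}^{M}\sum_{i=1}^{N}\frac{Z_a}{|\mathbf{R}_a-\mathbf{r}_i|} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|} \tag{1.3}$$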

where $\hat{H}_e$ is called the electronic Hamiltonian. The wave function of a molecule with N electrons is antisymmetric with respect to the interchange of the coordinates (spatial and spin) of any two electrons. This means that

$$\Psi(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N) = -\Psi(\mathbf{x}_2,\mathbf{x}_1,\ldots,\mathbf{x}_N) \tag{1.4}$$

If we assume that these N electrons do not interact with each other, the molecular wave function is simply the product of N one-electron spin orbitals χ(x):

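$$\Psi(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N) = \chi_i(\mathbf{x}_1)\,\chi_j(\mathbf{x}_2)\cdots\chi_k(\mathbf{x}_N) \tag{1.5}$$

This product is known as the Hartree product (HP). It does not satisfy Eqn. 1.4, but it becomes antisymmetric if it is written as a determinant, which is often referred to as the Slater determinant (SD) [9, 10]:

$$\Psi_{SD}(\mathbf{x}_1,\ldots,\mathbf{x}_N) = \frac{1}{\sqrt{N!}}\begin{vmatrix} \chi_i(\mathbf{x}_1) & \chi_j(\mathbf{x}_1) & \cdots & \chi_k(\mathbf{x}_1) \\ \chi_i(\mathbf{x}_2) & \chi_j(\mathbf{x}_2) & \cdots & \chi_k(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \chi_i(\mathbf{x}_N) & \chi_j(\mathbf{x}_N) & \cdots & \chi_k(\mathbf{x}_N) \end{vmatrix} \tag{1.6}$$

Note that χ(x) can also be written as the product of a spatial orbital ψ(r) and a spin function ω(α/β) [10]: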

$$\chi(\mathbf{x}) = \psi(\mathbf{r})\,\omega(\alpha/\beta) \tag{1.7}$$

where ω(α) denotes spin up and ω(β) spin down. The exact ψ(r) is not known, but it can be approximated by a linear combination of a finite number of basis functions φ(r), given by

$$\psi_i(\mathbf{r}) = \sum_{j}^{k} c_{ij}\,\phi_j(\mathbf{r}) \tag{1.8}$$

where $c_{ij}$ is the coefficient of basis function j in spatial orbital i [9, 11]. For Gaussian basis functions, Eqn. 1.8 is written as:

$$\psi_i(\mathbf{r}) = \sum_{j}^{k} c_{ij}\,e^{-\xi_j r^2} \tag{1.9}$$

where $\xi_j$ is the exponent of basis function j. In the HF approximation, the energy and the electronic Hamiltonian of a molecule are approximated by sums over N one-electron energies and one-electron Hamiltonian (Fock) operators [9], given by

$$E_e = \sum_{i}^{N}\varepsilon_i \tag{1.10}$$

$$\hat{H}_e = \sum_{i}^{N}\hat{f}(i) \tag{1.11}$$

such that

$$\hat{f}(i)\,\chi(\mathbf{x}_i) = \varepsilon_i\,\chi(\mathbf{x}_i) \tag{1.12}$$

where the Fock operator is defined as

$$\hat{f}(i) = -\frac{1}{2}\nabla_i^2 - \sum_{a=1}^{M}\frac{Z_a}{|\mathbf{R}_a-\mathbf{r}_i|} + U_i^{HF} \tag{1.13}$$

where $U_i^{HF}$ is the average potential energy felt by the i'th electron due to the presence of the other (N − 1) electrons, given by

$$U_i^{HF} = \sum_{j}^{N}\left(\hat{J}_j - \hat{K}_j\right) \tag{1.14}$$

where $\hat{J}$ is the Coulomb operator and $\hat{K}$ is the exchange operator, which has no classical interpretation [11]. Note that for i = j in Eqn. 1.14, the Coulomb and exchange energies are equal in magnitude and opposite in sign; thus, the Coulomb self-interaction energy cancels out [11, 10]. Now we need to find the set of spin orbitals forming the Slater determinant that minimizes the electronic energy of the molecule, subject to the constraint that the spin orbitals remain orthonormal [9, 11]. Based on the variational principle, this can be formulated as

$$E_{HF} = \min_{\Psi_{SD}\to\Psi}\frac{\int\Psi_{SD}^{*}\,\hat{H}_e\,\Psi_{SD}\,d\tau}{\int\Psi_{SD}^{*}\,\Psi_{SD}\,d\tau} \tag{1.15}$$

The lowest energy obtained from this procedure corresponds to the ground state of the molecule within the HF approximation. Since Eqn. 1.12 is nonlinear (the Fock operator depends on its own solutions through $U_i^{HF}$), Eqn. 1.15 is solved iteratively using a self-consistent field (SCF) procedure, which, in practice, optimizes the coefficients of the basis functions used in Eqn. 1.8 to expand the spatial orbitals [11]. The number of basis functions profoundly affects the accuracy and efficiency of quantum chemistry models: a larger basis set increases accuracy at increased computational cost. However, any set of basis functions is finite; thus, QM models suffer from basis set incompleteness error. The basis sets used in this work are mentioned in Chapter 2, and their performance is discussed in paper II. For the curious reader, theoretical details of different basis sets are explained in Ref. [9]. As explained above, the Born-Oppenheimer and HF approximations simplify wave function theory. Nevertheless, the wave function of an N-electron molecule is a complicated function that depends on 4N variables and cannot be measured experimentally. It should be noted that the HF approximation accounts for the correlation between the motions of electrons with the same spin, but it neglects the correlation between electrons with opposite spins in the wave function.
This is compensated for in post-Hartree-Fock (PHF) methods, such as Møller-Plesset (MP) perturbation theory, coupled-cluster (CC) theory, and full configuration interaction (FCI), at the price of higher computational cost. The reader is referred to Ref. [9] for the theory of PHF methods.
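To make the variational principle of Eqn. 1.15 and the basis set incompleteness error concrete, the following minimal Python sketch treats the hydrogen atom with a single normalized s-type Gaussian basis function of exponent ξ, in the spirit of Eqn. 1.9. It is an illustration only and not part of the Alexandria workflow. For this one-electron, one-function case the energy has the standard closed form E(ξ) = 3ξ/2 − 2√(2ξ/π) hartree, which is assumed here rather than derived. Minimizing over ξ (with a hand-written golden-section search) gives an energy that stays above the exact value of −0.5 hartree, as the variational principle requires. The gap is a pure basis set incompleteness error.

```python
import math

EXACT_H_ENERGY = -0.5  # exact non-relativistic ground-state energy of H (hartree)


def gaussian_energy(xi: float) -> float:
    """Energy (hartree) of H described by one normalized Gaussian exp(-xi r^2).

    Kinetic energy 3*xi/2 plus nuclear attraction -2*sqrt(2*xi/pi).
    """
    return 1.5 * xi - 2.0 * math.sqrt(2.0 * xi / math.pi)


def minimize_golden(f, a: float, b: float, tol: float = 1e-10) -> float:
    """Golden-section search for the minimizer of a unimodal function f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# E(xi) is convex for xi > 0, so the search interval contains a single minimum.
xi_opt = minimize_golden(gaussian_energy, 1e-3, 5.0)
e_opt = gaussian_energy(xi_opt)
print(f"optimal exponent xi = {xi_opt:.4f}")          # 8/(9*pi) = 0.2829
print(f"variational energy = {e_opt:.4f} hartree")    # -4/(3*pi) = -0.4244
print(f"error vs exact     = {e_opt - EXACT_H_ENERGY:.4f} hartree")
```

Adding more Gaussians with different exponents (a larger basis, Eqn. 1.9) would lower the energy toward −0.5 hartree. This is the accuracy-versus-cost trade-off described above.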

1.1.2 Density functional theory

Density functional theory (DFT) is another approach to computing the energy of a chemical compound based on the principles of quantum mechanics. DFT uses the molecular electron density, ρ(r), as the main variable rather than the molecular wave function. The electron density can be probed experimentally and depends on only three spatial variables. ρ(r) integrates to the number of electrons (N). The nuclear charge is generally treated as a point charge; therefore, in the ground state, the electron density forms a maximum, or cusp, at the position of each nucleus (R). The charge of a nucleus can also be obtained from the electron density and its gradient as the position of the nucleus is approached [10, 11]:

$$\lim_{r\to R}\left[\frac{\partial}{\partial r} + 2Z\right]\bar{\rho}(r) = 0 \tag{1.16}$$

where $\bar{\rho}(r)$ is the spherical average of ρ(r) [10]. The Hohenberg-Kohn (HK) theorems form the core of density functional theory. Proofs of the HK theorems can be found in Refs. [10, 11]. The first HK theorem states that the ground-state energy of a molecule is a unique functional of its electron density. The second HK theorem states that the energy given by the functional corresponds to the ground state if and only if the electron density is the exact ground-state electron density. Therefore, to use DFT, we need to obtain the ground-state electron density of the molecule as well as the functional that maps the electron density to its associated energy. The electron density of a system consisting of N non-interacting electrons can be obtained as a sum over squared one-electron orbitals:

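$$\rho(\mathbf{r}) = 2\sum_{i}^{N/2}\psi_i^{*}(\mathbf{r})\,\psi_i(\mathbf{r}) \tag{1.17}$$

where the factor 2 accounts for the double occupation of each spatial orbital in a closed-shell molecule, and $\psi^{*}(\mathbf{r})$ is the complex conjugate of ψ(r), which is expanded as a linear combination of basis functions φ(r) following Eqn. 1.8. Therefore, we need to optimize the coefficients of the basis functions to minimize the energy of the molecule. However, the search for spatial orbitals is constrained such that any trial electron density $\tilde{\rho}(\mathbf{r})$ must integrate to the number of electrons N. This constraint is taken into account by means of a Lagrange multiplier μ as follows:

$$\frac{\partial}{\partial\tilde{\rho}(\mathbf{r})}\left[E - \mu\left(\int\tilde{\rho}(\mathbf{r})\,d\mathbf{r} - N\right)\right] = 0 \tag{1.18}$$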

where μ is also called the chemical potential, which is used for determining partial atomic charges, as explained in Chapter 4 (see Section 4.2). According to the second HK theorem, the energy associated with any $\tilde{\rho}(\mathbf{r})$ is higher than or equal to the ground-state energy of the molecule:

$$E[\tilde{\rho}(\mathbf{r})] \geq E[\rho(\mathbf{r})] \tag{1.19}$$

Therefore, the search for the electron density continues until the change in energy between successive trial densities falls below a convergence threshold. We now need to define E[ρ(r)]. In Kohn-Sham (KS) theory, E[ρ(r)] consists of four components [11], given by

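$$E[\rho(\mathbf{r})] = T[\rho(\mathbf{r})] + U_{ee}[\rho(\mathbf{r})] + U_{ne}[\rho(\mathbf{r})] + E_{xc}[\rho(\mathbf{r})] \tag{1.20}$$

where T gives the kinetic energy of the electron density, $U_{ee}$ yields the electron-electron repulsion energy, $U_{ne}$ gives the nucleus-electron energy, and $E_{xc}$ gives the exchange-correlation energy. The first three terms can be obtained as follows:

$$T[\rho(\mathbf{r})] = \sum_{i}^{N}\left\langle\psi_i(\mathbf{r})\right|-\frac{1}{2}\nabla^2\left|\psi_i(\mathbf{r})\right\rangle \tag{1.21}$$

$$U_{ee}[\rho(\mathbf{r})] = \frac{1}{2}\iint\frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' \tag{1.22}$$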

$$U_{ne}[\rho(\mathbf{r})] = \int\rho(\mathbf{r})\,\nu_{ext}(\mathbf{r})\,d\mathbf{r} \tag{1.23}$$

where $\nu_{ext}(\mathbf{r})$ is the external potential due to the nuclei:

$$\nu_{ext}(\mathbf{r}) = -\sum_{a}\frac{Z_a}{|\mathbf{r}-\mathbf{R}_a|} \tag{1.24}$$
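As a numerical check of Eqns. 1.16, 1.17, 1.23 and 1.24, the sketch below uses the exact hydrogen 1s density ρ(r) = e^(−2r)/π in atomic units (Z = 1). This test case is an assumption chosen because every quantity is known analytically: the density integrates to N = 1, it satisfies the cusp condition, and, with the attractive potential ν_ext(r) = −Z/r, it gives U_ne = −1 hartree. Radial integrals use a hand-written composite Simpson rule. The cutoff radius and grid size are arbitrary illustrative choices.

```python
import math

Z = 1.0  # nuclear charge of hydrogen (atomic units)


def rho_1s(r: float) -> float:
    """Exact ground-state electron density of the hydrogen atom, e^(-2r)/pi."""
    return math.exp(-2.0 * r) / math.pi


def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


R_MAX, N_INT = 30.0, 3000  # radial cutoff (bohr) and number of intervals

# Normalization constraint: the density integrates to the number of electrons.
n_electrons = simpson(lambda r: 4.0 * math.pi * r**2 * rho_1s(r), 0.0, R_MAX, N_INT)

# Eqns. 1.23-1.24 with v_ext(r) = -Z/r; the volume element 4 pi r^2 cancels
# one power of r, so the integrand stays finite at the origin.
u_ne = simpson(lambda r: -4.0 * math.pi * Z * r * rho_1s(r), 0.0, R_MAX, N_INT)

# Eqn. 1.16: cusp condition, d(rho)/dr + 2 Z rho -> 0 near the nucleus.
# The density is spherical, so it equals its own spherical average.
r0, h = 1e-4, 1e-6
drho_dr = (rho_1s(r0 + h) - rho_1s(r0 - h)) / (2.0 * h)
cusp_residual = (drho_dr + 2.0 * Z * rho_1s(r0)) / rho_1s(r0)

print(f"N     = {n_electrons:.8f}")
print(f"U_ne  = {u_ne:.8f} hartree")
print(f"cusp residual = {cusp_residual:.2e}")
```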

The cumbersome term in Eqn. 1.20 is the exchange-correlation functional $E_{xc}$, whose exact form is unknown. Therefore, much effort has been put into developing density functional approximations (DFA). Many approaches have been explored to approximate $E_{xc}$, among them the Local Density Approximation (LDA), the Generalized Gradient Approximation (GGA), meta-GGAs, and hybrid functionals such as B3LYP, which is used in this work. The main idea behind a hybrid functional is to include a fraction of the Hartree-Fock exchange in the functional in a parametric way, with the amount of HF exchange controlled by empirical parameters. It is beyond the scope of this thesis to review density functional approximations, so the reader is referred to Ref. [12] for a comprehensive review of these methods.
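The parametric mixing that defines a hybrid functional fits in a few lines. The sketch below assumes the standard B3LYP form, E_xc = E_x^LDA + a0(E_x^HF − E_x^LDA) + a_x(E_x^B88 − E_x^LDA) + E_c^VWN + a_c(E_c^LYP − E_c^VWN), with a0 = 0.20, a_x = 0.72 and a_c = 0.81. The component energies would come from a quantum chemistry code evaluated on the same density. The values in the demo are made-up placeholders, and the class and field names are hypothetical, not the API of any package.

```python
from dataclasses import dataclass

# Empirical B3LYP mixing coefficients (Becke's three parameters).
A0, AX, AC = 0.20, 0.72, 0.81


@dataclass
class XCComponents:
    """Component energies (hartree) evaluated on the same electron density.

    Hypothetical container for illustration only.
    """
    ex_lda: float   # local (Slater) exchange
    ex_hf: float    # exact Hartree-Fock exchange from the Kohn-Sham orbitals
    ex_b88: float   # Becke 88 gradient-corrected exchange
    ec_vwn: float   # local VWN correlation
    ec_lyp: float   # Lee-Yang-Parr gradient-corrected correlation


def b3lyp_xc(c: XCComponents) -> float:
    """Exchange-correlation energy mixed according to the B3LYP recipe."""
    exchange = c.ex_lda + A0 * (c.ex_hf - c.ex_lda) + AX * (c.ex_b88 - c.ex_lda)
    correlation = c.ec_vwn + AC * (c.ec_lyp - c.ec_vwn)
    return exchange + correlation


# Placeholder numbers, not results for any real molecule.
demo = XCComponents(ex_lda=-10.0, ex_hf=-11.0, ex_b88=-10.5, ec_vwn=-1.0, ec_lyp=-0.8)
print(f"E_xc(B3LYP) = {b3lyp_xc(demo):.3f} hartree")
```

Here only a0 sets the fraction of exact HF exchange (20%). This single empirical parameter is what makes B3LYP a hybrid rather than a pure GGA functional.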

1.2 Molecular-Mechanical Models

The main source of the high computational cost of QM methods is the explicit presence of electrons, and their interactions with each other and with the nuclei, in the quantum-mechanical Hamiltonian. In contrast, a molecular-mechanical Hamiltonian takes electrons into account implicitly in the form of partial charges associated with the atoms in a molecule. The atoms are held together by classical springs representing the chemical bonds. The bond information is then built into the model in terms of bond lengths, bond angles, and dihedral angles, which are often referred to as the internal coordinates of the molecule (Figure 1.2).

Figure 1.2. The internal coordinates of a molecule in terms of the bond length l, the bond angle θ, and the dihedral angle φ.
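The internal coordinates of Figure 1.2 follow directly from Cartesian atomic positions. The pure-Python sketch below computes a bond length l, a bond angle θ and a dihedral angle φ. The dihedral uses a common atan2-based formula with the convention cis = 0° and trans = 180°. That sign convention is an assumption for illustration, and the helper names are hypothetical rather than taken from any force field code.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def sub(a: Vec, b: Vec) -> list[float]:
    return [a[k] - b[k] for k in range(3)]


def dot(a: Vec, b: Vec) -> float:
    return sum(a[k] * b[k] for k in range(3))


def cross(a: Vec, b: Vec) -> list[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def bond_length(i: Vec, j: Vec) -> float:
    """Distance l between two bonded atoms i and j."""
    d = sub(j, i)
    return math.sqrt(dot(d, d))


def bond_angle(i: Vec, j: Vec, k: Vec) -> float:
    """Angle theta (degrees) at the central atom j of the triplet i-j-k."""
    u, v = sub(i, j), sub(k, j)
    cos_t = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(i: Vec, j: Vec, k: Vec, l: Vec) -> float:
    """Dihedral phi (degrees) of i-j-k-l around the j-k bond; cis = 0, trans = 180."""
    b0, b1, b2 = sub(i, j), sub(k, j), sub(l, k)
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the two outer bonds onto the plane perpendicular to j-k.
    v = [b0[m] - dot(b0, b1) * b1[m] for m in range(3)]
    w = [b2[m] - dot(b2, b1) * b1[m] for m in range(3)]
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


print(bond_length((0, 0, 0), (1, 0, 0)))                      # 1.0
print(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))            # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
```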

In molecular mechanics, the system is described as a set of classical particles. Therefore, Newton's equations of motion need to be solved instead of the Schrödinger wave equation or the Kohn-Sham equations of DFT. Newton's equations are formulated as

$$\dot{x} = \frac{p}{m} \tag{1.25}$$

$$\ddot{x} = -\frac{1}{m}\frac{\partial U(x)}{\partial x} \tag{1.26}$$

where x is the position, p is the momentum, and m is the mass of the particle. U is the potential energy, which is a function of the position. $\dot{x}$ and $\ddot{x}$ represent the velocity (v) and acceleration (a), respectively. It has been shown elsewhere that the laws of motion in classical mechanics follow from quantum mechanics [13]. Newton's equations of motion cannot be solved analytically for systems of more than two particles; this is known as the many-body problem. Thus, finite-difference methods such as the leap-frog algorithm are used to solve

Newton's equations numerically as follows [14]:

$$x(t+\delta t) = x(t) + \delta t\,v\!\left(t+\tfrac{1}{2}\delta t\right) \tag{1.27}$$

$$v\!\left(t+\tfrac{1}{2}\delta t\right) = v\!\left(t-\tfrac{1}{2}\delta t\right) + \delta t\,a(t) \tag{1.28}$$

where the equations are solved step by step with the time step δt. The transition from quantum mechanics to molecular mechanics is, however, a drastic simplification, and it needs to be examined carefully whether this transition is justified. The influence of the quantum character of a particle on its behavior can be roughly estimated from its minimum quantum width, $\sigma_x$. For a (nearly) classical particle in thermal equilibrium, the momentum spread is $\sigma_p = \sqrt{m k_B T}$, so $\sigma_x$ follows from Heisenberg's uncertainty principle [13]:

$$\sigma_x \geq \frac{\hbar}{2\sqrt{m k_B T}} \tag{1.29}$$

where $\hbar = h/2\pi$, h is Planck's constant, $k_B$ is the Boltzmann constant, and T is the temperature. Quantum effects become important when the force acting on the particle changes considerably over the width of the particle. For instance, this happens when the width of the particle exceeds ∼0.01 nm in the condensed phase with an inter-particle distance of a few tenths of a nanometer [13]. Table 1.1 lists $\sigma_x$ for the electron and some elements of the periodic table at temperatures from 10 K to 300 K. The values in this table show that the quantum character of particles decreases as the temperature increases. Heavier elements like carbon, oxygen, and iodine can be considered nearly classical at 300 K. However, hydrogen retains a significant amount of quantum character even at 300 K [13].
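The leap-frog scheme of Eqns. 1.27 and 1.28 keeps velocities at half steps, half a time step out of phase with the positions. The minimal sketch below integrates a one-dimensional harmonic oscillator, U(x) = kx²/2, whose exact solution x(t) = x0 cos(ωt) makes the accuracy easy to check. The system, mass, force constant and time step are illustrative reduced-unit choices, not settings from any force field or MD package.

```python
import math


def leapfrog(x0: float, v_half: float, force, mass: float, dt: float, n_steps: int):
    """Leap-frog integration following Eqns. 1.27-1.28.

    x0 is x(0) and v_half is v(-dt/2). Returns the positions x(n*dt), n = 0..n_steps.
    """
    x, v = x0, v_half
    trajectory = [x]
    for _ in range(n_steps):
        a = force(x) / mass      # a(t) = -(dU/dx)/m, Eqn. 1.26
        v = v + dt * a           # v(t + dt/2) = v(t - dt/2) + dt a(t)
        x = x + dt * v           # x(t + dt) = x(t) + dt v(t + dt/2)
        trajectory.append(x)
    return trajectory


k, m, dt = 1.0, 1.0, 0.01        # force constant, mass, time step (reduced units)
omega = math.sqrt(k / m)
x0 = 1.0
# Start at rest, v(0) = 0; step the velocity back half a step to get v(-dt/2).
v_minus_half = 0.0 - 0.5 * dt * (-k * x0 / m)
traj = leapfrog(x0, v_minus_half, lambda x: -k * x, m, dt, 1000)

t_end = 1000 * dt
print(f"leap-frog x(t={t_end:.0f}) = {traj[-1]:.5f}")
print(f"exact     x(t={t_end:.0f}) = {x0 * math.cos(omega * t_end):.5f}")
```

Only one force evaluation is needed per step, and the scheme is time-reversible. Both properties help explain why leap-frog is a common default integrator in MD codes.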

Table 1.1. The minimal quantum width (nm) of the electron and some atoms at temperatures between 10 and 300 K, derived from Heisenberg's uncertainty relation [13].

| Particle | m (u) | 10 K | 30 K | 100 K | 300 K |
|---|---|---|---|---|---|
| e | 0.000549 | 4.7 | 2.7 | 1.5 | 0.86 |
| H | 1 | 0.11 | 0.064 | 0.035 | 0.020 |
| C | 12 | 0.032 | 0.018 | 0.010 | 0.0058 |
| O | 16 | 0.028 | 0.016 | 0.0087 | 0.0050 |
| I | 127 | 0.0098 | 0.0056 | 0.0031 | 0.0018 |
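Table 1.1 follows from Eqn. 1.29 taken at equality, σ_x = ħ/(2√(m k_B T)). The short sketch below recomputes its rows from CODATA constants, hard-coded to the digits shown, so that the entries can be checked or extended to other masses and temperatures. The electron mass used here is 5.4858 × 10⁻⁴ u. Printed to two significant figures, the rows should match the table.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
K_B = 1.380649e-23          # Boltzmann constant, J/K
AMU = 1.66053906660e-27     # atomic mass unit, kg


def quantum_width_nm(mass_u: float, temperature_k: float) -> float:
    """Minimal quantum width sigma_x = hbar / (2 sqrt(m k_B T)), in nm."""
    m = mass_u * AMU
    return HBAR / (2.0 * math.sqrt(m * K_B * temperature_k)) * 1e9


masses = {"e": 5.4858e-4, "H": 1.0, "C": 12.0, "O": 16.0, "I": 127.0}
for name, mass in masses.items():
    widths = [quantum_width_nm(mass, t) for t in (10, 30, 100, 300)]
    print(name, " ".join(f"{w:.2g}" for w in widths))
```

Because σ_x scales as 1/√(mT), a lighter isotope or a lower temperature always increases the quantum width. This is why hydrogen, rather than carbon or iodine, is the element for which classical MM is most questionable.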

The importance of quantum effects for a physical property can be measured experimentally through the isotope effect [13, 1]: the quantum effect is not negligible if a property of interest depends strongly on the isotopic composition of the system. However, quantum corrections can be applied to some properties calculated via classical simulations. For instance, quantum corrections can be applied to thermodynamic functions, such as the Helmholtz free energy, and to structural quantities, such as the radial distribution function g(r). These are explained elsewhere [1, 15]. To take quantum effects into account directly for light nuclei such as hydrogen, other methods, such as path-integral simulation, can be used, but at a higher computational cost, as explained in Ref. [1].

In spite of these shortcomings, molecular mechanics has been used extensively in theoretical studies of molecular systems in the condensed phase over the last four decades. The low computational cost of MM methods allows sampling of the configurational space of a molecular system. This links microscopic properties, such as the atomic interactions in a liquid, to macroscopic properties, such as density, pressure, temperature, and free energy [1]. This has helped the study of the dynamics and thermodynamics of macromolecules in solution, from biomolecules to synthetic polymers, at the atomistic level [3]. This is not only of academic interest but also useful in industrial applications, such as drug discovery [16]. Therefore, in the Alexandria project, we attempt to increase the accuracy of molecular simulations for describing molecular interactions at the atomic level.

As mentioned earlier, molecular mechanics maps the geometry of a molecule to its associated energy through a potential energy function U. U is a scalar-valued mathematical function such that F = −∇U, where F is a conservative force [14]. Potential functions consist of a set of built-in parameters (constants) that influence their output (energy). The accuracy of the molecular energies depends on the form and the parameters of these potential functions. The parameters also control the transferability of the potential functions between compounds with different chemistries. The reason for this is that these parameters are, in essence, derived by (supervised) machine learning algorithms that learn from training datasets.
Thus, the quality of the data and the diversity of the molecules in the training sets determine the domain of accuracy and transferability of the resulting parameters. In the early years of force field development, experimental data were the only source from which to learn the parameters. However, the amount of experimental data for molecular properties is limited compared with the vast chemical space of synthetic and virtual compounds. Generating quantum chemistry data for chemical compounds from different parts of chemical space provides an alternative source of reference data that can accelerate progress in molecular-mechanical force fields [17, 18, 19]. QM methods even allow the calculation of molecular properties for toxic compounds, for which performing experiments is hazardous. For this reason, we have benchmarked different quantum chemistry methods against the available experimental data. The result was the Alexandria library, which has been released as an open access quantum chemistry database of optimized geometries and gas-phase physico-chemical properties of chemical compounds for force field development (see papers II and III). However, it should be noted that for the bulk and dynamic properties of condensed phases, such as liquids, experiment is still our main source of data (see paper IV).

The form of the potential functions also profoundly influences the accuracy and computational efficiency of MM models. Force fields have traditionally been formulated as [1]:

$$\begin{aligned} U ={}& \sum_{\text{bonds}}\frac{1}{2}k_l(l-l_e)^2 + \sum_{\text{angles}}\frac{1}{2}k_\theta(\theta-\theta_e)^2 + \sum_{\text{torsions}}\sum_{n}k_{\phi,n}\left[1+\cos(n\phi+\delta_n)\right] \\ &+ \sum_{ij}\frac{q_iq_j}{4\pi\varepsilon_0 r_{ij}} + \sum_{ij}4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}-\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \end{aligned} \tag{1.30}$$

where $k_l$ is the bond stretching force constant; $l_e$ is the equilibrium bond length; l is the bond length of two connected atoms ij; $k_\theta$ is the angle bending force constant; $\theta_e$ is the equilibrium bending angle; θ is the bending angle of three connected atoms ijk; φ is the dihedral angle of four connected atoms ijkl; n is the number of minima in a rotation of 2π around the j-k bond; $k_{\phi,n}$ is the dihedral force constant; $\delta_n$ is the phase shift angle; $r_{ij}$ is the distance between atoms i and j; q is the partial atomic charge; $\varepsilon_0$ is the vacuum permittivity; and $\varepsilon_{ij}$ and $\sigma_{ij}$ are the Lennard-Jones (LJ) parameters. Eqn. 1.30 has been used extensively over the last 40 years. At the same time, much effort has been devoted to developing potential functions with a more realistic form that, for example, responds to changes in the environment, such as the electric field. An example is a form that accounts for the redistribution of the electron density of a molecule due to its interaction with the electric field produced by other molecules in the system [20, 21, 22, 23, 24]. Models such as fluctuating charges [25], induced point dipoles [26], and the Drude oscillator [23, 27, 28] include the linear response of the electron density to an external electric field, known as the dipole polarizability. It is beyond the scope of this thesis to review all of them here, but the Drude oscillator is explained in detail in Chapter 3.

A substantial effort has also been made to include the charge penetration effect explicitly; this refers to the overlap between two charge densities. The contribution of charge overlap to the electrostatic energy becomes important at short distances. However, the point charge model used in Eqn. 1.30 is inadequate to account for charge penetration effects, because it gives a singularity in the electrostatic energy surface as the distance between two charges approaches zero.
To overcome this problem, Hall et al. suggested, in 1986, describing the distribution of partial atomic charges by a spherically symmetric Gaussian density function [29, 30]. Later, in 1991, Rappé et al. used the valence s-type Slater orbital of each atom to treat partial atomic charges in MM simulations [31]. Advances in modern computers have recently made it possible to develop Gaussian or Slater charge models either for specific cases like water [32, 33, 34], carbon dioxide [35], and alkali halides [36], or for small sets of compounds [37, 38].

The inclusion of charge polarization and penetration effects has increased the physical realism of force fields [37, 38]. However, it has added at least two parameters per atom to the parameter space of the force fields: one parameter determines the polarizability of each atom, and the other determines the diffuseness of its partial charge. Therefore, the chemical transferability of these parameters remains to be assessed on large molecular databases (see paper III).

Following this path of force field development, we have attempted to increase the accuracy of intermolecular interactions in the Alexandria force field by parameterizing a Drude polarizable model with spherical charge densities, using either a 1s-Gaussian or a Slater density function (see papers III and IV). We have also replaced the 12-6 LJ potential function, the last term in Eqn. 1.30, by a potential function that provides a more realistic description of the inter-particle repulsion (see paper IV). The intermolecular potential functions used in the Alexandria force field are explained in Chapter 3.
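The sketch below illustrates why point charges cannot describe charge penetration while spherical charge densities can. It compares the Coulomb term of Eqn. 1.30 with the interaction of two normalized 1s-Gaussian charge densities, ρ_i(r) = q_i(ζ_i²/π)^(3/2) exp(−ζ_i² r²). For such densities the interaction energy has the standard closed form q_i q_j erf(ζ_ij r)/(4πε₀ r) with ζ_ij = ζ_iζ_j/√(ζ_i² + ζ_j²), which stays finite as r → 0 and approaches the point-charge result at large r. The sketch covers only the Gaussian case, not the Slater one. The Coulomb prefactor is the value commonly used in MD codes, and the charges, exponents and LJ parameters are arbitrary illustrative values, not Alexandria parameters.

```python
import math

# 1/(4 pi eps0) in kJ mol^-1 nm e^-2 (value commonly used in MD codes).
COULOMB_K = 138.935458


def coulomb_point(qi: float, qj: float, r: float) -> float:
    """Point-charge Coulomb term of Eqn. 1.30 (kJ/mol); diverges as r -> 0."""
    return COULOMB_K * qi * qj / r


def coulomb_gaussian(qi: float, qj: float, zeta_i: float, zeta_j: float, r: float) -> float:
    """Interaction of two 1s-Gaussian charge densities (kJ/mol); finite at r = 0."""
    zeta_ij = zeta_i * zeta_j / math.sqrt(zeta_i**2 + zeta_j**2)
    if r < 1e-12:
        # Limit of erf(zeta_ij r)/r as r -> 0 is 2 zeta_ij / sqrt(pi).
        return COULOMB_K * qi * qj * 2.0 * zeta_ij / math.sqrt(math.pi)
    return COULOMB_K * qi * qj * math.erf(zeta_ij * r) / r


def lennard_jones(eps: float, sigma: float, r: float) -> float:
    """12-6 Lennard-Jones term of Eqn. 1.30 (kJ/mol)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Hypothetical anion-cation pair: charges in e, Gaussian exponents in nm^-1.
qi, qj = -1.0, 1.0
zeta_i, zeta_j = 10.0, 12.0
for r in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"r = {r:4.2f} nm  point = {coulomb_point(qi, qj, r):10.1f}  "
          f"gaussian = {coulomb_gaussian(qi, qj, zeta_i, zeta_j, r):10.1f} kJ/mol")
```

The difference between the two columns at short range is the charge penetration energy that a point-charge model misses. A finite short-range limit is also the motivation given above for replacing the 12-6 LJ term.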

2. Alexandria Library

This chapter explains the theoretical background, computational details, and validation of the calculated molecular properties provided in the Alexandria library. It is based on papers I and II.

The Alexandria library has been designed to parameterize the potential energy functions used in the Alexandria force field. It is an open access database of the optimized geometries and physico-chemical properties of molecules obtained from quantum chemistry calculations. The first version of the library, AlexandriaLib.v1.1, consists of 2704 organic and inorganic compounds (paper II). The elements available in the library are colored in Fig. 2.1A, and their fractions are displayed in Fig. 2.1B. However, the first version of the Alexandria force field supports only the elements shown in green in Fig. 2.1A. The alkali elements are available in their ionic form only, while the halogens are available both as halides and as atoms in polyatomic compounds.

Figure 2.1. A: The 39 colored elements are available in the AlexandriaLib.v1.1. The elements supported by the first version of the Alexandria force field are shown in green. B: The fraction of the elements that are available in the AlexandriaLib.v1.1.

2.1 Data Availability

The Alexandria library is freely available from the Zenodo repository at https://doi.org/10.5281/zenodo.1004711.

2.2 Properties in the Alexandria Library

The thermochemical equations explained in Section 2.2.1 are implemented in the GROMACS software package [39] to calculate molecular thermochemistry using classical force fields (paper V). The electrostatic equations explained in Section 2.2.2 are implemented in the Alexandria chemistry toolkit for parameterizing the Alexandria force field (papers III and VI).

2.2.1 Molecular thermochemistry

Gas-phase molecular thermochemistry is useful for obtaining potential functions for bond deformation (stretching, bending, and twisting). It is also useful for evaluating these potential functions in terms of molecular vibrational frequencies (see paper V). The thermochemical properties discussed here are:

- the internal energy ($E$);
- the molar enthalpy of formation ($\Delta_f H$);
- the heat capacities at constant volume ($C_V$) and at constant pressure ($C_P$);
- the standard entropy ($S^0$).

The internal energy of a molecule is the total energy contained within the molecule. The molar enthalpy of formation of a chemical compound is the change of enthalpy during the formation of one mole of the compound from its constituent elements. The molar heat capacity is the amount of energy that must be transferred to one mole of a substance to raise its temperature by one kelvin. The entropy of one mole of a substance in a standard state is called the standard molar entropy.

Statistical thermodynamics calculates $E$, $C_V$, and $S^0$ of an isolated molecule in the gas phase from its geometry and its motional degrees of freedom. The thermodynamic functions are:

$$E = RT^2\left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} \tag{2.1}$$

$$C_V = 2RT\left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} + RT^2\left(\frac{\partial^2 \ln Q}{\partial T^2}\right)_{N,V} \tag{2.2}$$

$$S^0 = R\ln Q + RT\left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} \tag{2.3}$$

where $R$ is the gas constant, $T$ is the absolute temperature, and $Q$ is the partition function of the canonical ensemble $(N,V,T)$. The heat capacity at constant pressure can be approximated from the heat capacity at constant volume and the second virial coefficient, using the equation of state of a gas. The virial expansion expresses the equation of state as a polynomial either in $\bar{V}^{-1}$ or in $P$, as follows [40]:

$$\frac{P\bar{V}}{RT} = 1 + B_{2v}(T)\bar{V}^{-1} + B_{3v}(T)\bar{V}^{-2} + \cdots \tag{2.4}$$

$$\frac{P\bar{V}}{RT} = 1 + B_{2p}(T)P + B_{3p}(T)P^2 + \cdots \tag{2.5}$$

Here $\bar{V}$ is the molar volume and $P$ is the pressure. $B_{v}(T)$ and $B_{p}(T)$ are the virial coefficients of the expansions in $\bar{V}^{-1}$ and in $P$, respectively. $B_{2v}(T)$ and $B_{2p}(T)$ are related by

$$B_{2v}(T) = RT\,B_{2p}(T) \tag{2.6}$$

Therefore, using Eqns. 2.5 and 2.6, we can approximate the equation of state of a gas to first order as follows:

$$P\bar{V} = RT + B_{2v}(T)P \tag{2.7}$$

This is a good approximation because the third virial coefficient is negligible even at very high pressures [40]. From the thermodynamic relations, we can derive that

$$C_P - C_V = T\left(\frac{\partial P}{\partial T}\right)_V\left(\frac{\partial V}{\partial T}\right)_P \tag{2.8}$$

Therefore, applying Eqn. 2.8 to Eqn. 2.7 results in

$$C_P = C_V + R + 2P\frac{dB_{2v}(T)}{dT} + \frac{P^2}{R}\left(\frac{dB_{2v}(T)}{dT}\right)^2 \tag{2.9}$$
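Eqn. 2.9 is a closed-form correction and can be evaluated directly. The following is a minimal Python sketch of it. It assumes SI units: $C_V$ in J mol$^{-1}$ K$^{-1}$, $P$ in Pa, and $dB_{2v}/dT$ in m$^3$ mol$^{-1}$ K$^{-1}$. The function name `cp_from_cv` is hypothetical and is not taken from GROMACS or the Alexandria toolkit.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def cp_from_cv(cv, pressure, dB2v_dT):
    """Molar C_P from C_V via Eqn. 2.9, for a gas obeying P*V = R*T + B2v(T)*P.

    cv       -- molar heat capacity at constant volume, J mol^-1 K^-1
    pressure -- pressure P, Pa
    dB2v_dT  -- temperature derivative of B2v(T), m^3 mol^-1 K^-1
    """
    x = pressure * dB2v_dT  # P dB2v/dT, which has units of J mol^-1 K^-1
    return cv + R + 2.0 * x + x * x / R


# For an ideal gas (dB2v/dT = 0) this reduces to C_P = C_V + R (Eqn. 2.10).
cp_ideal = cp_from_cv(20.8, 1.0e5, 0.0)
```

The product $P\,dB_{2v}/dT$ is computed once because it appears both linearly and quadratically in Eqn. 2.9.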

It should be noted that $B_{2v}(T)$ is zero for an ideal gas; hence, Eqn. 2.9 simplifies to

$$C_P = C_V + R \tag{2.10}$$

The canonical partition function of an ideal gas can be decomposed into partition functions of different degrees of freedom, namely electronic (el), translational (tr), rotational (rot), and vibrational (vib) motions, using [40, 41]:

$$Q = Q_{el}\,Q_{tr}\,Q_{rot}\,Q_{vib} \tag{2.11}$$

The rigid-rotor model is used to determine the contribution of the rotational motions to the total partition function [42]. In this model, the bond lengths of the molecule are assumed to remain fixed while the molecule rotates. The partition function of a linear rigid rotor is defined as:

$$Q_{rot} = \frac{1}{\sigma}\frac{T}{\Theta_{rot}} \tag{2.12}$$

where $\sigma$ is the symmetry number and

$$\Theta_{rot} = \frac{h^2}{8\pi^2 I k_B} \tag{2.13}$$

where $I$ is the moment of inertia, $h$ is Planck's constant, and $k_B$ is Boltzmann's constant. The harmonic-oscillator approximation is usually used to describe the vibration of a bond around its equilibrium length. This is a good approximation only if the amplitude of the vibrations is small. For a harmonic oscillator, the partition function is defined as

$$Q_{vib} = \frac{e^{-h\nu/2k_BT}}{1 - e^{-h\nu/k_BT}} \tag{2.14}$$

where $\nu$ is the frequency of vibration. Considering all these approximations, the internal energy $E$ can be calculated by applying Eqn. 2.1 to Eqn. 2.11:

$$E = E_{tr} + E_{rot} + E_{vib} \tag{2.15}$$

where $E_{tr}$ is $\frac{3}{2}RT$ for both linear and nonlinear polyatomic ideal gases, while $E_{rot}$ is $RT$ for linear and $\frac{3}{2}RT$ for nonlinear molecules. The contribution of the vibrational modes to the internal thermal energy is given by

$$E_{vib} = RT\sum_{i=1}^{3n-f}\left(\frac{h\nu_i}{2k_BT} + \frac{h\nu_i/k_BT}{e^{h\nu_i/k_BT} - 1}\right) \tag{2.16}$$

where $n$ is the number of atoms and $f$ is 5 for a linear and 6 for a nonlinear polyatomic molecule. Applying Eqn. 2.2 to Eqn. 2.11 gives the heat capacity at constant volume $C_V$:

$$C_V = C_{tr} + C_{rot} + C_{vib} \tag{2.17}$$

where $C_{tr}$ is $\frac{3}{2}R$ for all molecules, while $C_{rot}$ is $R$ for linear and $\frac{3}{2}R$ for nonlinear molecules. The contribution of the vibrational modes to the heat capacity is given by [40, 41]

$$C_{vib} = R\sum_{i=1}^{3n-f}\left(\frac{h\nu_i}{k_BT}\right)^2\frac{e^{-h\nu_i/k_BT}}{\left(1 - e^{-h\nu_i/k_BT}\right)^2} \tag{2.18}$$

Similarly, by applying Eqn. 2.3 to Eqn. 2.11, the standard entropy $S^0$ is given by

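$$S^0 = S_{tr} + S_{rot} + S_{vib} \tag{2.19}$$

$$S_{tr} = R\left[\frac{5}{2} + \ln\frac{k_BT}{P} + \ln\left(\frac{2\pi Mk_BT}{h^2}\right)^{3/2}\right] \tag{2.20}$$

$$S_{rot} = R\left[\frac{3}{2} + \ln\frac{\sqrt{\pi}}{\sigma} + \ln\frac{T^{3/2}}{\sqrt{\Theta_{rot}}}\right] \tag{2.21}$$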

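where $P$ is the pressure and $M$ is the mass of the molecule. Eqn. 2.21 is the expression for a nonlinear molecule, for which $\Theta_{rot}$ stands for the product of its three rotational temperatures. The vibrational entropy is given by [40, 41]

$$S_{vib} = R\sum_{i=1}^{3n-f}\left[\frac{h\nu_i/k_BT}{e^{h\nu_i/k_BT} - 1} - \ln\left(1 - e^{-h\nu_i/k_BT}\right)\right] \tag{2.22}$$

More information than the vibrational frequencies is needed to calculate the enthalpy of formation $\Delta_f H(M,T)$ of a molecule at a given temperature. It is computed in a number of steps [41]. The enthalpy of formation at $T$ K is given by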

$$\Delta_f H(M,T) = \Delta_f H(M,0) + \Delta\Delta H(M,T) - \sum_{x=1}^{N}\Delta\Delta H(x,T) \tag{2.23}$$

where $\Delta_f H(M,0)$ is the enthalpy of formation of the molecule at 0 K. $\Delta\Delta H(M,T)$ is the energy needed to raise the temperature of molecule $M$ from 0 K to $T$. It can be calculated as $H_{corr} - ZPC$, where $H_{corr}$ is the thermal correction to the enthalpy of the molecule and $ZPC$ is the zero-point correction. $\Delta\Delta H(x,T)$ is the energy needed to raise the temperature of atom $x$ in molecule $M$ from 0 K to $T$. The enthalpy of formation at 0 K is given by

$$\Delta_f H(M,0) = E_0(M) + \sum_{x=1}^{N}\left[\Delta_f H(x,0) - E_0(x)\right] \tag{2.24}$$

where $\Delta_f H(x,0)$ is the enthalpy of formation of atom $x$ at 0 K, $N$ is the number of atoms in the molecule, and $E_0$ is the total electronic energy. It should be noted that $\Delta\Delta H(x,T)$ and $\Delta_f H(x,0)$ were taken from experimental data in the thermochemical theories used in this study [43, 44].
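The harmonic-oscillator expressions for $E_{vib}$, $C_{vib}$, and $S_{vib}$ (Eqns. 2.16, 2.18, 2.22) and the translational entropy (Eqn. 2.20) can be evaluated from a list of vibrational wavenumbers. The following is a minimal Python sketch under several assumptions:

- SI constants are used.
- Wavenumbers are in cm$^{-1}$ and already restricted to the $3n-f$ vibrational modes.
- The gas is ideal.

The function names are hypothetical and are not the GROMACS API. The sums are written in terms of $x = h\nu/k_BT$ using `math.expm1`, so that both very soft and very stiff modes stay numerically stable.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
KB = 1.380649e-23         # Boltzmann constant, J K^-1
R = 8.314462618           # molar gas constant, J mol^-1 K^-1
C_CM = 2.99792458e10      # speed of light, cm s^-1
AMU = 1.66053906660e-27   # atomic mass unit, kg


def vibrational_thermo(wavenumbers_cm, temperature):
    """Harmonic-oscillator E_vib (J/mol, incl. zero-point energy), C_vib and S_vib (J/mol/K).

    wavenumbers_cm -- the 3n - f vibrational wavenumbers in cm^-1 (Eqns. 2.16, 2.18, 2.22)
    """
    e_sum = cv_sum = s_sum = 0.0
    for nu_tilde in wavenumbers_cm:
        x = H * C_CM * nu_tilde / (KB * temperature)  # h nu / (k_B T)
        q = math.exp(-x)                              # e^{-x}
        one_minus_q = -math.expm1(-x)                 # 1 - e^{-x}, accurate for small x
        e_sum += 0.5 * x + x * q / one_minus_q        # x/2 + x/(e^x - 1)
        cv_sum += x * x * q / one_minus_q ** 2        # x^2 e^{-x} / (1 - e^{-x})^2
        s_sum += x * q / one_minus_q - math.log(one_minus_q)
    return R * temperature * e_sum, R * cv_sum, R * s_sum


def translational_entropy(mass_amu, temperature, pressure=1.0e5):
    """Eqn. 2.20 (Sackur-Tetrode) in J mol^-1 K^-1; mass in u, pressure in Pa."""
    m = mass_amu * AMU
    thermal = (2.0 * math.pi * m * KB * temperature / H ** 2) ** 1.5
    return R * (2.5 + math.log(KB * temperature / pressure) + math.log(thermal))


# Argon at 298.15 K and 1 bar.
s_tr_argon = translational_entropy(39.948, 298.15)
```

For argon, `s_tr_argon` comes out at about 154.8 J mol$^{-1}$ K$^{-1}$. A very stiff mode ($x \gg 1$) contributes essentially only its zero-point energy to $E_{vib}$, and practically nothing to $C_{vib}$ and $S_{vib}$. A very soft mode approaches the classical limit $C_{vib} \to R$ per mode.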

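2.2.2 Molecular electrostatics

The molecular electron density $\rho(\mathbf{r}')$ generates an electrostatic potential $\phi(\mathbf{r})$ at an arbitrary point $\mathbf{r}$ in space. By definition, $\phi(\mathbf{r})$ is the work per unit charge needed to bring a positive test charge from infinity to that point. The electrostatic potential is a physical observable that can be determined from quantum mechanical calculations. Following the superposition principle, $\phi(\mathbf{r})$ can be computed by integrating the contributions from individual differential elements of the electron density as follows:

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int\frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,d\mathbf{r}' \tag{2.25}$$

where $\varepsilon_0$ is the permittivity of free space. It should be noted that the nuclei of the molecule also contribute to $\phi(\mathbf{r})$ (see Chapter 5). If the point $\mathbf{r}$ is outside the distribution of electron density and $r \gg r'$, the electrostatic potential can be evaluated through the Taylor expansion of $|\mathbf{r} - \mathbf{r}'|^{-1}$ [13]:

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} \approx \frac{1}{r} + \frac{\hat{\mathbf{r}}\cdot\mathbf{r}'}{r^2} + \frac{1}{2}\,\frac{3(\hat{\mathbf{r}}\cdot\mathbf{r}')^2 - r'^2}{r^3} + \cdots \tag{2.26}$$

where $\hat{\mathbf{r}} = \mathbf{r}/r$. By inserting Eqn. 2.26 into Eqn. 2.25, we get

$$\phi(\mathbf{r}) \approx \frac{1}{4\pi\varepsilon_0}\left(\frac{Q}{r} + \frac{\hat{\mathbf{r}}\cdot\boldsymbol{\mu}_0}{r^2} + \frac{\hat{\mathbf{r}}\cdot\boldsymbol{\Theta}_0\cdot\hat{\mathbf{r}}}{r^3} + \cdots\right) \tag{2.27}$$

$Q$ is the monopole moment, sometimes called the zeroth moment of the molecular electron density. In principle, this is the total charge of the molecule. $Q$ is given by

$$Q = \int\rho(\mathbf{r}')\,d\mathbf{r}' \tag{2.28}$$

$\boldsymbol{\mu}_0$ is the vector of the permanent dipole moment, which measures the polarity of the molecular electron density. $\boldsymbol{\mu}_0$ is given by

$$\boldsymbol{\mu}_0 = \int\rho(\mathbf{r}')\,\mathbf{r}'\,d\mathbf{r}' \tag{2.29}$$

$\boldsymbol{\Theta}_0$ is the tensor of the permanent quadrupole moment, which measures the deviation of the molecular electron density from spherical symmetry. $\boldsymbol{\Theta}_0$ is given by

$$\boldsymbol{\Theta}_0 = \frac{1}{2}\int\rho(\mathbf{r}')\left(3\,\mathbf{r}'\mathbf{r}' - r'^2\,\mathbf{I}\right)d\mathbf{r}' \tag{2.30}$$

where $\mathbf{I}$ is the identity matrix. The quadrupole tensor can be written as a traceless $3\times3$ matrix in Cartesian coordinates if one writes $(\hat{\mathbf{r}}\cdot\mathbf{r}')^2$ as:

$$(\hat{\mathbf{r}}\cdot\mathbf{r}')^2 = \hat{\mathbf{r}}\cdot(\mathbf{r}'\mathbf{r}')\cdot\hat{\mathbf{r}} \tag{2.31}$$

where $\mathbf{r}'\mathbf{r}'$ is the outer product of the vector $\mathbf{r}'$ with itself. This results in a matrix that can be written in terms of the Cartesian components $x$, $y$, $z$ of $\mathbf{r}'$ as follows:

$$\mathbf{r}'\mathbf{r}' = \begin{bmatrix} x^2 & xy & xz\\ yx & y^2 & yz\\ zx & zy & z^2 \end{bmatrix} \tag{2.32}$$

Finally, since $\hat{\mathbf{r}}\cdot\mathbf{I}\cdot\hat{\mathbf{r}} = 1$, we get

$$\frac{1}{2}\left[3(\hat{\mathbf{r}}\cdot\mathbf{r}')^2 - r'^2\right] = \hat{\mathbf{r}}\cdot\frac{1}{2}\begin{bmatrix} 3x^2 - r'^2 & 3xy & 3xz\\ 3yx & 3y^2 - r'^2 & 3yz\\ 3zx & 3zy & 3z^2 - r'^2 \end{bmatrix}\cdot\hat{\mathbf{r}} \tag{2.33}$$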
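For a set of point charges, the integrals in Eqns. 2.28 to 2.30 reduce to sums over the charges. This makes the definitions easy to check numerically. The following Python/NumPy sketch is illustrative only; the thesis evaluates the moments from the QM electron density. It assumes positions given relative to the expansion origin. For a charged molecule, the dipole and quadrupole depend on that origin. The function name is hypothetical.

```python
import numpy as np


def point_charge_multipoles(charges, positions):
    """Discrete analogue of Eqns. 2.28-2.30 for point charges.

    charges   -- shape (N,), e.g. in units of e
    positions -- shape (N, 3), relative to the expansion origin
    Returns (Q, mu, Theta), with Theta the traceless quadrupole tensor of Eqn. 2.30.
    """
    q = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    monopole = q.sum()                                  # Eqn. 2.28
    dipole = q @ r                                      # Eqn. 2.29: sum_i q_i r_i
    outer = np.einsum("i,ij,ik->jk", q, r, r)           # sum_i q_i r_i r_i^T
    r2 = np.einsum("i,ij,ij->", q, r, r)                # sum_i q_i |r_i|^2
    quadrupole = 0.5 * (3.0 * outer - r2 * np.eye(3))   # Eqn. 2.30
    return monopole, dipole, quadrupole


# A linear -1/+2/-1 arrangement along z: no net charge, no dipole, nonzero quadrupole.
Q, mu, theta = point_charge_multipoles([-1.0, 2.0, -1.0],
                                       [[0.0, 0.0, -1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

In this example, `theta` is diag(1, 1, -2) in units of charge times length squared. Its trace vanishes, as required by the traceless form of Eqn. 2.30.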

Note that higher electric moments, such as the octupole and hexadecapole moments, are not described here, but they are available in the Alexandria library. The interested reader is referred to Ref. [14] for the higher electric moments.

The shape of the molecular electron density changes when the molecule interacts with an external electric field; hence, the total energy of the molecule changes. The static response of a molecule to a homogeneous external electric field $\mathbf{F}$ can be studied by expanding its energy in a Taylor series [45, 46]:

$$E(F) = E(0) + \left.\frac{\partial E}{\partial F}\right|_{F=0}F + \frac{1}{2}\left.\frac{\partial^2 E}{\partial F^2}\right|_{F=0}F^2 + \frac{1}{6}\left.\frac{\partial^3 E}{\partial F^3}\right|_{F=0}F^3 + \cdots \tag{2.34}$$

where

$$-\left.\frac{\partial E}{\partial F}\right|_{F=0} = \mu_0 \tag{2.35}$$

$$-\left.\frac{\partial^2 E}{\partial F^2}\right|_{F=0} = \alpha \tag{2.36}$$

$$-\left.\frac{\partial^3 E}{\partial F^3}\right|_{F=0} = \beta \tag{2.37}$$

Here $\mu_0$ is the permanent dipole moment vector, and $\alpha$ is the polarizability tensor, which describes the linear response of the molecular electron density to the external electric field. $\beta$ is the first hyperpolarizability [45]. Instead of expanding the energy, we can expand the dipole moment of a molecule in an external electric field [46]:

$$\boldsymbol{\mu} = \boldsymbol{\mu}_0 + \alpha\mathbf{F} + \frac{1}{2}\beta\mathbf{F}^2 + \cdots \tag{2.38}$$

where $\alpha\mathbf{F}$ gives the induced dipole moment vector, $\boldsymbol{\mu}_1$ [45]:

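$$\boldsymbol{\mu}_1 = \alpha\mathbf{F} \tag{2.39}$$

which can be written in matrix form as

$$\begin{bmatrix}\mu_x\\ \mu_y\\ \mu_z\end{bmatrix} = \begin{bmatrix}\alpha_{xx} & \alpha_{xy} & \alpha_{xz}\\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz}\\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz}\end{bmatrix}\begin{bmatrix}F_x\\ F_y\\ F_z\end{bmatrix} \tag{2.40}$$

From the polarizability tensor, the isotropic polarizability [47, 46],

$$\bar{\alpha} = \frac{\alpha_{xx} + \alpha_{yy} + \alpha_{zz}}{3} \tag{2.41}$$

and the polarizability anisotropy [47],

$$\Delta\alpha = \sqrt{\frac{(\alpha_{xx} - \alpha_{yy})^2 + (\alpha_{xx} - \alpha_{zz})^2 + (\alpha_{yy} - \alpha_{zz})^2 + 6\left(\alpha_{xy}^2 + \alpha_{xz}^2 + \alpha_{yz}^2\right)}{2}} \tag{2.42}$$

can be calculated. Other definitions of the polarizability anisotropy can be found in Ref. [46].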
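Eqns. 2.39, 2.41, and 2.42 translate directly into a few lines of NumPy. The sketch below assumes a symmetric $3\times3$ tensor in any consistent polarizability unit; the function names are hypothetical. Both invariants are unchanged by a rotation of the molecular frame, which is why they are convenient for comparing calculated and experimental polarizabilities.

```python
import numpy as np


def polarizability_invariants(alpha):
    """Isotropic polarizability (Eqn. 2.41) and anisotropy (Eqn. 2.42) of a symmetric 3x3 tensor."""
    a = np.asarray(alpha, dtype=float)
    iso = np.trace(a) / 3.0
    diag_part = (a[0, 0] - a[1, 1]) ** 2 + (a[0, 0] - a[2, 2]) ** 2 + (a[1, 1] - a[2, 2]) ** 2
    off_part = 6.0 * (a[0, 1] ** 2 + a[0, 2] ** 2 + a[1, 2] ** 2)
    aniso = np.sqrt(0.5 * (diag_part + off_part))
    return iso, aniso


def induced_dipole(alpha, field):
    """Eqn. 2.39: mu_1 = alpha F."""
    return np.asarray(alpha, dtype=float) @ np.asarray(field, dtype=float)


alpha = np.diag([1.0, 1.0, 4.0])
iso, aniso = polarizability_invariants(alpha)  # 2.0 and 3.0
```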

2.3 Computational details

All quantum chemistry calculations were performed with the Gaussian software package (versions 09 [48] and 16 [49]). The standard G2, G3, G4 [50, 51, 52, 53, 43], CBS-QB3 [54, 55], W1U, and W1BD [44] methods were used to calculate the enthalpy of formation, heat capacity, and absolute entropy at room temperature. Because of their computational cost, the Weizmann methods were applied only to a subset of about 600 compounds.

The B3LYP density functional with the aug-cc-pVTZ basis set [56, 57, 58] was used for the following:

- optimizing molecular geometries;
- calculating frequencies;
- calculating electric moments up to the hexadecapole;
- calculating the polarizability tensor;
- calculating the electrostatic potential (ESP) surface of the molecules.

For iodine, however, the aug-cc-pVTZ-PP basis set was used, which includes a relativistic pseudopotential. Quantum-based partial atomic charges were computed for each molecule using different charge-generating algorithms, including Mulliken Population Analysis (MPA) [59], Hirshfeld Population Analysis (HPA) [60], ESP charges [61], and Charge Model 5 (CM5) [62]. The Merz-Kollman scheme was used to generate the grid points around each molecule at which the electrostatic potential was calculated and the corresponding atomic charges were fitted [63]. As a reference, the same calculations were also performed at the HF/6-311G** level of theory [64, 65, 66, 67]. This level is similar to widely used methods for calculating partial atomic charges in the virtual screening of large chemical libraries.

Figure 2.2. The number of quantum chemistry calculations performed at each level of theory.

2.4 Technical validation

Paper II explains the procedure used to evaluate the optimized geometries of the molecules provided in AlexandriaLib.v1.1. The thermochemical and electrostatic properties obtained from quantum chemistry calculations were validated by comparing them to experimental data. Several sources of experimental physico-chemical data were used, such as the National Institute of Standards and Technology (NIST), the Design Institute for Physical Properties (DIPPR) [68], and the Handbooks of Chemistry [69, 70, 71].

In some cases, multiple experimental values were reported for the same property of a molecule. If these values were similar to each other, their average and standard deviation were taken as the reference value and its uncertainty, respectively. Where the discrepancy between values was significant, the values were cross-referenced against the original publications to check for transcription errors. If the original publication was not accessible, the value was excluded from our statistics and reported as a suspected error in the experimental data in papers I and II.

3. Intermolecular Potential Energy Function

This chapter is based on papers III and IV. It briefly explains the quantum mechanical and molecular mechanical approximations used in computing the long-range and the short-range intermolecular interactions.

3.1 Quantum mechanical approximations for intermolecular energies

The interaction energy between two molecules depends on the distance between the molecules, $r$, and on their relative orientation. The supermolecular approach computes the interaction energy, $E_{int}$, between molecules A and B as follows [72, 73]:

$$E_{int} = E_{AB} - (E_A + E_B) \tag{3.1}$$

The interaction energy is thus simply the difference between the energy of the dimer ($E_{AB}$) and the energies of the two monomers ($E_A$ and $E_B$). However, the dimer calculation always introduces a non-physical lowering of each monomer's energy, because each monomer uses the basis set of its interacting partner to lower its own energy. This is called the basis set superposition error (BSSE) [72, 73].

One way to reduce the BSSE is to use time-independent Rayleigh-Schrödinger perturbation theory (RSPT), which is also often called polarization theory. RSPT can be formulated for two interacting molecules A and B as follows [72, 73]:

$$\left(\hat{H}^0 + \xi\hat{H}'\right)\Psi_{AB} = E_{AB}\Psi_{AB} \tag{3.2}$$

where $\hat{H}^0$ is the unperturbed Born-Oppenheimer Hamiltonian operator of the dimer AB, given by

$$\hat{H}^0 = \hat{H}_A + \hat{H}_B \tag{3.3}$$

and $\hat{H}'$ is the perturbation operator describing the electrostatic interaction between the electrons and nuclei of molecule A and those of molecule B, given by [47]

$$\hat{H}' = \frac{1}{4\pi\varepsilon_0}\iint\frac{\hat{\rho}^A(\mathbf{r}_1)\,\hat{\rho}^B(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}\,d\mathbf{r}_1\,d\mathbf{r}_2 \tag{3.4}$$

where $\hat{\rho}$ is the charge density operator. $\xi$ is the perturbation parameter that sets the order of the perturbation, and it varies between 0 and 1. Setting $\xi = 0$ switches off the electrostatic interactions between the two molecules. Hence, the dimer energy is the sum of the monomer energies, and the dimer wave function is the product of the monomer wave functions [73]:

$$E_{AB}(\xi = 0) = E_A + E_B \tag{3.5}$$

$$\Psi_{AB}(\xi = 0) = \Psi_A\Psi_B \tag{3.6}$$

On the other hand, $\xi = 1$ fully includes the electrostatic interactions between the two monomers; thus, $E_{AB}(\xi = 1)$ and $\Psi_{AB}(\xi = 1)$ are the exact energy and wave function of the dimer. In RSPT, the interaction energy and the dimer wave function are expressed as infinite power series in $\xi$ as follows [73]:

$$E_{int}(\xi) = \sum_{n=1}^{\infty}\xi^n E^{(n)}_{pol} \tag{3.7}$$

$$\Psi_{AB}(\xi) = \sum_{n=0}^{\infty}\xi^n \Psi^{(n)}_{pol} \tag{3.8}$$

where $\Psi^{(0)}_{pol} = \Psi_A\Psi_B$. It should be noted that Eqns. 3.7 and 3.8 converge only for small values of $\xi$. The energy terms in Eqn. 3.7 are often called polarization energies [72, 73]. The first-order polarization energy is equivalent to the classical electrostatic interaction energy between two charge distributions. However, an additional energy term becomes important at short intermolecular distances. This term, often called the charge penetration energy, results from the overlap of the electron densities of the monomers. Therefore, the first-order polarization energy is the sum of the electrostatic and penetration energies [72]:

$$E^{(1)}_{pol} = E^{(1)}_{elstat} + E^{(1)}_{penetr} \tag{3.9}$$

The second- and third-order polarization energies consist of the induction and dispersion energies [72]:

$$E^{(2)}_{pol} = E^{(2)}_{ind} + E^{(2)}_{disp} \tag{3.10}$$

$$E^{(3)}_{pol} = E^{(3)}_{ind} + E^{(3)}_{disp} \tag{3.11}$$

The second-order induction energy, $E^{(2)}_{ind}$, results from the polarization of each monomer by the static electric field of its unperturbed partner. The third-order induction energy, $E^{(3)}_{ind}$, in contrast, corresponds to the simultaneous (mutual) polarization of the monomers by the fields of their partners. $E^{(2)}_{disp}$ and $E^{(3)}_{disp}$ result from the intermolecular correlation of the electrons of the monomers.

The problem with RSPT is that it neglects electron exchange between the monomers. A simultaneous tunneling of two electrons is called electron exchange [73]. If the interacting monomers have $W_A$ and $W_B$ electrons, the number of quantum states, $M$, resulting from electron exchange is given by

$$M = \frac{(W_A + W_B)!}{W_A!\,W_B!} \tag{3.12}$$

Note that electron exchange might bring two electrons with the same spin into the same orbital, which is forbidden by the Pauli exclusion principle. This results in an energy cost; therefore, the energy associated with electron exchange is repulsive. The lack of electron exchange in RSPT can be cured by applying an antisymmetrization operator $\hat{A}$ to $\Psi_{AB}(\xi = 0)$ [72, 73]:

$$\hat{A}\Psi_{AB}(\xi = 0) = \hat{A}\Psi_A\Psi_B \tag{3.13}$$

However, the unperturbed Hamiltonian $\hat{H}^0$ can then no longer be written as the sum of the monomer Hamiltonians defined in Eqn. 3.3. Many methods have been explored over the years to resolve this problem [74]. One of them is symmetry-adapted perturbation theory (SAPT) [73], which was used in paper III. SAPT adds exchange correction terms to each order of the polarization energies [72]. As a result, we get:

$$E^{(1)} = E^{(1)}_{pol} + E^{(1)}_{exch} \tag{3.14}$$

$$E^{(2)} = E^{(2)}_{pol} + E^{(2)}_{exch-ind} + E^{(2)}_{exch-disp} \tag{3.15}$$

$E^{(1)}_{exch}$ accounts for about 90% of the exchange energy in the interaction energy at van der Waals distances [73]. The reader is referred to Refs. [47, 75] for the details of SAPT and the exchange energy corrections. As shown above, the RSPT and SAPT methods decompose the intermolecular interaction energy into different energy components. Table 3.1 summarizes the distance dependence and the sign of the first- and second-order energy components that contribute most to the total interaction energy [72].

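Table 3.1. Intermolecular distance dependence and sign of the components of the intermolecular interaction energy [72].

$$\begin{array}{lll}
\text{Component} & r\text{ dependence} & \text{Sign}\\
\hline
E^{(1)}_{elstat} & r^{-1},\ r^{-2},\ r^{-3},\ \ldots & +/-\\
E^{(1)}_{penetr} & e^{-ar} & +\\
E^{(1)}_{exch} & e^{-br} & +\\
E^{(2)}_{ind} & r^{-4},\ r^{-6} & -\\
E^{(2)}_{disp} & r^{-6},\ r^{-8},\ r^{-10},\ \ldots & -
\end{array}$$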

3.2 Alexandria force field approximations for intermolecular energies

Molecular-mechanical force fields use simple potential energy functions to compute the intermolecular interaction energy of a molecular system, as explained in Chapter 1 (see Eqn. 1.30). The rest of this chapter explains the potential functions implemented to include explicit terms for electronic polarization and charge penetration effects in the Alexandria force field. Our aim is to use potential functions that change smoothly as a function of distance and that remain well-behaved as the distance between two molecules approaches zero.

3.2.1 Electrostatic and Charge Penetration

The electrostatic energy between atoms $i$ and $j$ can be computed through the Coulomb integral as follows:

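$$J_{ij} = \frac{q_iq_j}{4\pi\varepsilon_0}\iint\frac{\rho_i(\mathbf{r}_i)\,\rho_j(\mathbf{r}_j)}{|\mathbf{r}_i - \mathbf{r}_j|}\,d\mathbf{r}_i\,d\mathbf{r}_j \tag{3.16}$$

where $q_i$ is the magnitude and $\rho_i(\mathbf{r}_i)$ is the normalized distribution of the partial charge centered on the position of atom $i$. Conventionally, the Dirac $\delta$ function is used to describe $\rho_i(\mathbf{r}_i)$; therefore, Eqn. 3.16 simplifies to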

$$J_{ij} = \frac{1}{4\pi\varepsilon_0}\frac{q_iq_j}{|\mathbf{r}_i - \mathbf{r}_j|} \tag{3.17}$$

where the atomic charges are treated as point charges. The fundamental problem with the point-charge model is the inverse-distance singularity. It also neglects the charge penetration effect at very short distances. To circumvent these shortcomings, the distribution $\rho_i(\mathbf{r}_i)$ can be described by a spherical Slater-type orbital (STO) or a Gaussian-type orbital (GTO). Based on the principles of quantum mechanics, $\rho_{STO}(r)$ is the square of the Slater wave function, as follows [9]:

$$\rho_{STO}(r,n,\zeta) = \frac{(2\zeta)^{2n+1}}{4\pi(2n)!}\left(r^{n-1}e^{-\zeta r}\right)^2 \tag{3.18}$$

where $n$ is the principal quantum number and $\zeta$ is the orbital exponent, in inverse distance, that determines the diffuseness of the charge distribution. Similarly, $\rho_{GTO}(r)$ is the square of the Gaussian wave function [9]:

$$\rho_{GTO}(r,\xi) = \left[\left(\frac{2\xi}{\pi}\right)^{3/4}e^{-\xi r^2}\right]^2 \tag{3.19}$$

$$\rho_{GTO}(r,\xi) = \left(\frac{2\xi}{\pi}\right)^{3/2}e^{-2\xi r^2} \tag{3.20}$$

where $\xi$ is the orbital exponent, in inverse distance squared, that determines the diffuseness of the charge distribution. Fig. 3.1 and Fig. 3.2 display the wave function and the radial probability distribution, respectively, for the 1s STO and GTO.
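A quick way to check Eqns. 3.18, 3.20, and 3.21 is to confirm that each density integrates to one unit of charge. The same check confirms that the $\beta$ form (Eqn. 3.21, $\beta = \sqrt{2\xi}$) is the same function as the $\xi$ form. The Python sketch below uses only the standard library and a simple midpoint rule. The exponents ($\zeta = 10$ nm$^{-1}$, $\xi = 50$ nm$^{-2}$) and function names are hypothetical illustration values, not Alexandria parameters.

```python
import math


def rho_sto(r, n, zeta):
    """Eqn. 3.18: normalized Slater charge density (r in nm, zeta in nm^-1)."""
    norm = (2.0 * zeta) ** (2 * n + 1) / (4.0 * math.pi * math.factorial(2 * n))
    return norm * (r ** (n - 1) * math.exp(-zeta * r)) ** 2


def rho_gto(r, xi):
    """Eqn. 3.20: normalized Gaussian charge density (xi in nm^-2)."""
    return (2.0 * xi / math.pi) ** 1.5 * math.exp(-2.0 * xi * r * r)


def rho_gto_beta(r, beta):
    """Eqn. 3.21, with beta = sqrt(2 xi) in nm^-1."""
    return (beta * beta / math.pi) ** 1.5 * math.exp(-(beta * r) ** 2)


def total_charge(rho, r_max=3.0, steps=20000):
    """4 pi * integral_0^r_max of r^2 rho(r) dr, by the midpoint rule."""
    h = r_max / steps
    return 4.0 * math.pi * h * sum(((k + 0.5) * h) ** 2 * rho((k + 0.5) * h) for k in range(steps))


norm_sto = total_charge(lambda r: rho_sto(r, 2, 10.0))  # close to 1
norm_gto = total_charge(lambda r: rho_gto(r, 50.0))     # close to 1
```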

Figure 3.1. The wave function $\psi_{1s}(r)$ of the 1s-STO and the 1s-GTO as a function of the distance $r$ from the nucleus.

To make the units of ζ and ξ consistent with each other, Eqn. 3.20 can be rewritten in terms of β as follows [76]:

$$\rho_{GTO}(r,\beta) = \left(\frac{\beta^2}{\pi}\right)^{3/2}e^{-\beta^2r^2} \tag{3.21}$$

where $\beta = \sqrt{2\xi}$ in nm$^{-1}$. Thus, the solution of the Coulomb integral (Eqn. 3.16) for two interacting Gaussian charge densities is as follows [76]:

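$$J^{GTO}_{ij}(r) = \frac{1}{4\pi\varepsilon_0}\frac{q_iq_j}{|\mathbf{r}_i - \mathbf{r}_j|}\,\mathrm{erf}\left(\beta_{ij}r_{ij}\right) \tag{3.22}$$

where $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$ and

$$\beta_{ij} = \frac{\beta_i\beta_j}{\sqrt{\beta_i^2 + \beta_j^2}} \tag{3.23}$$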

Figure 3.2. Radial distribution function $4\pi r^2|\psi_{1s}(r)|^2$ of the 1s-STO and the 1s-GTO as a function of the distance $r$ from the nucleus.

Note that for $i = j$, $\beta_{ii} = \beta_i/\sqrt{2}$. The solution of the Coulomb integral (Eqn. 3.16) for two interacting Slater charge densities, with principal quantum numbers $n$ and $m$ and exponents $\zeta_i$ and $\zeta_j$, is given by [77]

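$$
\begin{aligned}
J^{STO}_{ij}(r) = {} & \frac{1}{4\pi\varepsilon_0}\frac{q_iq_j}{|\mathbf{r}_i - \mathbf{r}_j|}\,\frac{4\zeta_i^{2n+1}\zeta_j^{2m+1}}{(2n)!\,(2m)!}\,\frac{\partial^{2n-2}}{\partial\zeta_i^{2n-2}}\frac{\partial^{2m-2}}{\partial\zeta_j^{2m-2}}\Bigg\{\frac{1}{\zeta_i^3\zeta_j^3}\Bigg[1 - \frac{(3\zeta_i^2 - \zeta_j^2)\,\zeta_j^4}{(\zeta_i - \zeta_j)^3(\zeta_i + \zeta_j)^3}e^{-2\zeta_i r_{ij}}\\
& - \frac{(3\zeta_j^2 - \zeta_i^2)\,\zeta_i^4}{(\zeta_j - \zeta_i)^3(\zeta_i + \zeta_j)^3}e^{-2\zeta_j r_{ij}} - \frac{\zeta_i\zeta_j^4}{(\zeta_i - \zeta_j)^2(\zeta_i + \zeta_j)^2}\,r_{ij}e^{-2\zeta_i r_{ij}} - \frac{\zeta_i^4\zeta_j}{(\zeta_i - \zeta_j)^2(\zeta_i + \zeta_j)^2}\,r_{ij}e^{-2\zeta_j r_{ij}}\Bigg]\Bigg\}
\end{aligned}
\tag{3.24}
$$

where $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$. Eqn. 3.24 is written for $\zeta_i \neq \zeta_j$; the case of equal exponents follows as the limit $\zeta_j \to \zeta_i$.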
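The Gaussian result (Eqns. 3.22 and 3.23) and the 1s-1s case of Eqn. 3.24 ($n = m = 1$, so no $\zeta$-derivatives are needed) are short enough to evaluate directly. The following is a minimal Python sketch, not the Alexandria toolkit implementation. It assumes units of kJ mol$^{-1}$, nm, and e, with $1/(4\pi\varepsilon_0) \approx 138.935$ kJ mol$^{-1}$ nm e$^{-2}$.

The Slater function requires $\zeta_i \neq \zeta_j$ and $r > 0$. As $r \to 0$ the bracket in Eqn. 3.24 tends to zero, so very small distances lose precision to cancellation. The function names are hypothetical.

```python
import math

ONE_4PI_EPS0 = 138.935458  # 1/(4 pi eps0) in kJ mol^-1 nm e^-2 (approximate)


def coulomb_gto(r, qi, qj, beta_i, beta_j):
    """Eqns. 3.22-3.23: Coulomb energy of two Gaussian charges (r in nm, beta in nm^-1)."""
    beta_ij = beta_i * beta_j / math.sqrt(beta_i ** 2 + beta_j ** 2)
    return ONE_4PI_EPS0 * qi * qj * math.erf(beta_ij * r) / r


def coulomb_sto_1s1s(r, qi, qj, zi, zj):
    """Eqn. 3.24 with n = m = 1 and zi != zj (r in nm, zeta in nm^-1)."""
    d2 = zi * zi - zj * zj                                # (zi - zj)(zi + zj)
    c_i = (3 * zi * zi - zj * zj) * zj ** 4 / d2 ** 3
    c_j = (3 * zj * zj - zi * zi) * zi ** 4 / (-d2) ** 3
    d_i = zi * zj ** 4 / d2 ** 2
    d_j = zi ** 4 * zj / d2 ** 2
    ei, ej = math.exp(-2 * zi * r), math.exp(-2 * zj * r)
    bracket = 1.0 - c_i * ei - c_j * ej - d_i * r * ei - d_j * r * ej
    return ONE_4PI_EPS0 * qi * qj * bracket / r
```

Both functions approach the point-charge result (Eqn. 3.17) at large $r$ and stay finite as $r \to 0$. For Gaussians the limit is $2\beta_{ij}/\sqrt{\pi}$ times $q_iq_j/(4\pi\varepsilon_0)$. For 1s Slater densities the limit is $\zeta_i\zeta_j(\zeta_i^2 + 3\zeta_i\zeta_j + \zeta_j^2)/(\zeta_i + \zeta_j)^3$ in the same units. This finite limit is the charge penetration behavior that the point-charge model lacks.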

3.2.2 Electronic Polarization

The Alexandria force field uses the classical Drude model (DM) to take electronic polarization into account explicitly. In this model, an atom is represented as a two-particle system consisting of a core particle connected to a Drude particle by a harmonic spring [78, 79, 80]. Because the Drude particle is massless in the Alexandria force field, I shall refer to it as a shell, to be consistent with other studies [81, 82, 83].

Figure 3.3. A Drude-type polarizable atom. The atom is split into a positive core particle connected to a negative massless shell particle by a harmonic spring. The core represents the nucleus and the shell represents the electron cloud.

The partial charge of a Drude-type atom ($q_a$) is the sum of the charge on its core ($q_c$) and that on its shell ($q_s$). The electronic polarization energy ($U_P$) is expressed as the energy of the harmonic spring connecting the core of the atom to its shell, which is given by [83]:

$$U_P = \frac{1}{2}kd^2 \tag{3.25}$$

where $d$ is the core-shell distance under the influence of the external electric field induced by the other charged particles in the system, and $k$ is the force constant of the spring, defined as [83]

$$k = \frac{q_s^2}{\alpha_a} \tag{3.26}$$

where $\alpha_a$ is the atomic polarizability. The charge on the core is chosen to be positive to represent the nucleus, and the charge on the shell is negative to represent the electron cloud. For two interacting atoms (Fig. 3.4), the sum of the electrostatic and polarization energies is given by:

Figure 3.4. Two Drude-type polarizable atoms interacting with each other.

$$E = J(c_A{-}c_B) + J(c_A{-}s_B) + J(s_A{-}c_B) + J(s_A{-}s_B) + U^A_P + U^B_P \tag{3.27}$$

where the first term represents the repulsive nucleus-nucleus interaction, the second and third terms represent the attractive nucleus-electron interactions, and the fourth term represents the repulsive electron-electron interaction. The last two terms are the polarization energies of atoms A and B, respectively. It should be noted that Eqn. 3.27 explicitly takes the charge penetration energy at short distances into account, because the charge on the shell particles is treated as a smeared charge.
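For a single Drude atom in a uniform field $\mathbf{F}$, balancing the spring force against the electric force on the shell gives $k\mathbf{d} = q_s\mathbf{F}$. With Eqn. 3.26, the induced dipole $q_s\mathbf{d}$ then equals $\alpha_a\mathbf{F}$, the linear response of Eqn. 2.39. The Python sketch below illustrates only this idealized case. It assumes an isolated Drude atom with no core-shell or intermolecular electrostatics, and consistent units in which Eqn. 3.26 holds as written. The function name and parameter values are hypothetical.

```python
import numpy as np


def drude_response(alpha, q_shell, field):
    """Shell displacement, induced dipole and spring energy of one Drude atom in a uniform field.

    Uses k = q_s^2 / alpha (Eqn. 3.26) and U_P = k d^2 / 2 (Eqn. 3.25).
    """
    k = q_shell ** 2 / alpha
    field = np.asarray(field, dtype=float)
    d = q_shell * field / k          # force balance on the shell: k d = q_s F
    mu_induced = q_shell * d         # dipole created by the displaced shell charge
    u_pol = 0.5 * k * float(d @ d)   # Eqn. 3.25
    return d, mu_induced, u_pol


d, mu, u = drude_response(alpha=0.001, q_shell=-2.0, field=[0.0, 0.0, 5.0])
```

The induced dipole equals $\alpha_a\mathbf{F}$ regardless of the shell charge. $q_s$ only sets how far the shell moves ($d = \alpha_a F/q_s$): a larger $|q_s|$ gives a stiffer spring and a smaller displacement.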

3.2.3 Repulsion and Dispersion

As explained above, the repulsion (rep) and dispersion (disp) energies are components of the intermolecular interaction energy. The repulsion is often called Pauli repulsion, as it is related to the Pauli exclusion principle, which forbids two electrons with the same spin from occupying the same spatial orbital. These interactions were characterized before the birth of quantum mechanics; however, they are quantum mechanical in origin. The sum of these energies is often referred to as the van der Waals (vdW) energy:

$$E_{vdw} = E_{rep} + E_{disp} \tag{3.28}$$

Gustav Mie proposed a potential energy function as an expansion in powers of $1/r$ to describe both the repulsion and the dispersion as follows [84]:

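$$U_{vdw}(r) = \varepsilon\,\frac{n}{n-m}\left(\frac{n}{m}\right)^{\frac{m}{n-m}}\left[\left(\frac{\sigma}{r}\right)^n - \left(\frac{\sigma}{r}\right)^m\right] \tag{3.29}$$

where $\sigma$ is the distance (nm) at which the potential is zero and $\varepsilon$ is the well depth (kJ mol$^{-1}$). Later, John Lennard-Jones proposed the special case of Eqn. 3.29 with $n = 12$ and $m = 6$ [85]:

$$U_{vdw}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^6\right] \tag{3.30}$$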

The $r^{-6}$ term describing the dispersion energy is consistent with quantum mechanics. The $r^{-12}$ term for the repulsion energy, however, is arbitrary. As shown in Table 3.1, the exchange-repulsion energy depends exponentially on the distance. Therefore, R. A. Buckingham derived a potential function that expresses the repulsion as an exponential term and the dispersion in terms of $1/r$ [86]:

$$U_{vdw}(r) = \frac{\varepsilon}{1 - \frac{6}{\gamma}}\left[\frac{6}{\gamma}e^{\gamma\left(1 - \frac{r}{\sigma}\right)} - \left(\frac{\sigma}{r}\right)^6\right] \tag{3.31}$$

where $\gamma$ is a dimensionless constant describing the steepness of the repulsion, and $\sigma$ is the position of the energy minimum. As the distance between two particles approaches zero, Eqn. 3.30 goes to positive infinity and Eqn. 3.31 goes to negative infinity. To circumvent this unphysical singularity for the Pauli repulsion at the origin, Wang and coworkers modified Eqn. 3.31 in 2013 to [87]:

$$U_{vdw}(r) = \frac{2\varepsilon}{1 - \frac{3}{\gamma + 3}}\,\frac{\sigma^6}{\sigma^6 + r^6}\left[\frac{3}{\gamma + 3}e^{\gamma\left(1 - \frac{r}{\sigma}\right)} - 1\right] \tag{3.32}$$

Figure 3.5 plots the energy of two interacting particles as a function of their distance, calculated with Eqn. 3.32.

Figure 3.5. $E_{vdw}$ is the energy between two interacting particles computed with the Wang-Buckingham potential function (Eqn. 3.32); $r$ is the distance between the two particles.

Eqn. 3.32 can be approximately decomposed into the Pauli repulsion (Eqn. 3.33) and the dispersion (Eqn. 3.34) as follows:

$$U_{rep}(r) = \frac{2\varepsilon}{1 - \frac{3}{\gamma + 3}}\,\frac{\sigma^6}{\sigma^6 + r^6}\,\frac{3}{\gamma + 3}\,e^{\gamma\left(1 - \frac{r}{\sigma}\right)} \tag{3.33}$$

$$U_{disp}(r) = -\frac{2\varepsilon}{1 - \frac{3}{\gamma + 3}}\,\frac{\sigma^6}{\sigma^6 + r^6} \tag{3.34}$$

Note that Eqns. 3.33 and 3.34 have finite limits as the distance between two particles approaches zero, given by

$$\lim_{r\to0}U_{rep}(r) = \frac{6\varepsilon}{\gamma}e^{\gamma} \tag{3.35}$$

$$\lim_{r\to0}U_{disp}(r) = -\frac{2\varepsilon}{\gamma}(\gamma + 3) \tag{3.36}$$

As depicted in Figs. 3.6 and 3.7, Eqns. 3.33 and 3.34, respectively, are well-behaved at very short distances.

Figure 3.6. $E_{rep}$ is the Pauli repulsion energy between two interacting particles computed with Eqn. 3.33, where $r$ is the distance between the two particles.

Figure 3.7. $E_{disp}$ is the dispersion energy between two interacting particles computed with Eqn. 3.34, where $r$ is the distance between the two particles.

In paper IV, we evaluated a number of potential functions, including Eqns. 3.30, 3.31, and 3.32, on alkali halides in the gas, solid, and liquid phases at room and elevated temperatures. We demonstrated that Eqn. 3.32 outperformed the other benchmarked potentials. Therefore, the Wang-Buckingham potential (Eqn. 3.32) is used in the Alexandria force field to describe the Pauli repulsion and the dispersion interactions.
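The van der Waals functions of this section (Eqns. 3.29 to 3.34) are compact enough to compare side by side. The Python sketch below lets one verify three properties numerically:

- the Mie function with $n = 12$, $m = 6$ reduces to Lennard-Jones;
- the Buckingham and Wang-Buckingham minima lie at $r = \sigma$ with depth $-\varepsilon$;
- the repulsion and dispersion parts of Eqn. 3.32 reach the finite limits of Eqns. 3.35 and 3.36.

The parameters used in such checks ($\varepsilon = 0.5$ kJ mol$^{-1}$, $\sigma = 0.35$ nm, $\gamma = 12$) are hypothetical. The function names are illustrative and are not the Alexandria or GROMACS API.

```python
import math


def mie(r, eps, sigma, n=12, m=6):
    """Eqn. 3.29; sigma is the zero crossing."""
    c = n / (n - m) * (n / m) ** (m / (n - m))
    return c * eps * ((sigma / r) ** n - (sigma / r) ** m)


def lennard_jones(r, eps, sigma):
    """Eqn. 3.30."""
    return 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)


def buckingham(r, eps, sigma, gamma):
    """Eqn. 3.31; sigma is the position of the minimum. Diverges to -inf as r -> 0."""
    return eps / (1.0 - 6.0 / gamma) * (6.0 / gamma * math.exp(gamma * (1.0 - r / sigma))
                                        - (sigma / r) ** 6)


def wbk_rep(r, eps, sigma, gamma):
    """Eqn. 3.33: repulsive part of the Wang-Buckingham potential."""
    pref = 2.0 * eps / (1.0 - 3.0 / (gamma + 3.0))
    return pref * sigma ** 6 / (sigma ** 6 + r ** 6) * 3.0 / (gamma + 3.0) * math.exp(gamma * (1.0 - r / sigma))


def wbk_disp(r, eps, sigma, gamma):
    """Eqn. 3.34: dispersion part of the Wang-Buckingham potential."""
    pref = 2.0 * eps / (1.0 - 3.0 / (gamma + 3.0))
    return -pref * sigma ** 6 / (sigma ** 6 + r ** 6)


def wbk(r, eps, sigma, gamma):
    """Eqn. 3.32 as the sum of Eqns. 3.33 and 3.34; finite at r = 0."""
    return wbk_rep(r, eps, sigma, gamma) + wbk_disp(r, eps, sigma, gamma)
```

Because the $\sigma^6/(\sigma^6 + r^6)$ damping factor replaces the bare $r^{-6}$, the Wang-Buckingham functions can be evaluated at $r = 0$ without special handling. The Lennard-Jones and Buckingham functions cannot.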

4. Generation of Polarizable Atomic Charges

This chapter is based on papers III and VI. It explains the algorithms implemented in the Alexandria project to generate polarizable partial atomic charges for organic compounds.

4.1 Electrostatic Potential (ESP)-fitting with Drude Model

The nuclei and the charge density of a molecule generate an electrostatic potential $\phi$ at any point $\mathbf{r}$ around the molecule (see Chapter 2). Quantum-mechanical methods compute $\phi(\mathbf{r})$ as follows:

$$\phi^{QM}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\left[\sum_{a}^{N}\frac{Z_a}{|\mathbf{R}_a - \mathbf{r}|} + \int\frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d\mathbf{r}'\right] \tag{4.1}$$

where $N$ is the number of nuclei, $Z_a$ is the atomic number, and $\mathbf{R}_a$ is the position of nucleus $a$. $\rho(\mathbf{r}')$ is the molecular electron density. The molecular electrostatic potential (MEP) is a continuous physical property; however, a discrete representation of the MEP is needed for numerical analysis. Therefore, Eqn. 4.1 is evaluated at a series of grid points around the molecule [63, 61]. The electrostatic potential at point $\mathbf{r}$ can also be approximated by applying the classical Drude model, given by

$$\phi^{DM}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\left[\sum_{a}^{N}\frac{q^c_a}{|\mathbf{R}_a - \mathbf{r}|} + \sum_{b}^{M}q^s_b\int\frac{\rho_b(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d\mathbf{r}'\right] \tag{4.2}$$

where $N$ and $M$ are the numbers of core and shell particles, respectively. $q^c_a$ is the partial charge on core $a$, and $q^s_b$ is the partial charge on shell $b$, with a normalized local distribution $\rho_b(\mathbf{r}')$.

The ESP-fitting algorithm implemented for the Alexandria project generates polarizable atomic charges by reproducing $\phi^{QM}$ at a series of points around the molecule using Eqn. 4.2 (paper III). It performs a restrained least-squares fit that can be written in matrix notation as follows:

$$\begin{bmatrix}\mathbf{J}\\ \mathbf{V}\end{bmatrix}\mathbf{q} = \begin{bmatrix}\mathbf{u}\\ \mathbf{v}\end{bmatrix} \tag{4.3}$$

where $\mathbf{J}$ is the Coulomb matrix. Its elements, $J_{ij}$, are computed from the distance between atom $j$ in the molecule and grid point $i$ around the molecule, using the Coulomb integral (see Section 3.2.1). Restraints are imposed to ensure that the charges on symmetrically equivalent atoms are equal and that the sum of the partial charges equals the net charge of the molecule. The matrix $\mathbf{V}$, which is appended to $\mathbf{J}$, contains the additional linear equations that encode these restraints. Their right-hand sides are collected in the vector $\mathbf{v}$, which is appended to the vector $\mathbf{u}$ containing the QM electrostatic potentials. For $m$ grid points around a molecule with $n$ polarizable atoms, Eqn. 4.3 takes the form:

$$
\begin{bmatrix}
J^c_{11} & J^c_{12} & J^c_{13} & \cdots & J^c_{1n}\\
J^c_{21} & J^c_{22} & J^c_{23} & \cdots & J^c_{2n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
J^c_{m1} & J^c_{m2} & J^c_{m3} & \cdots & J^c_{mn}\\
V^p_{11} & V^p_{12} & V^p_{13} & \cdots & V^p_{1n}\\
V^p_{21} & -V^p_{22} & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & V^p_{k3} & \cdots & -V^p_{kn}
\end{bmatrix}
\begin{bmatrix} q^c_1\\ q^c_2\\ q^c_3\\ \vdots\\ q^c_n \end{bmatrix}
=
\begin{bmatrix}
\phi^{QM}_1 - J^s_1\\
\phi^{QM}_2 - J^s_2\\
\vdots\\
\phi^{QM}_m - J^s_m\\
Q_T - \sum_{j}^{n} q^s_j\\
0\\
\vdots\\
0
\end{bmatrix}
\tag{4.4}
$$

The terms in Eqn. 4.4 are defined as follows:

- $J^c_{ij}q^c_j$ is the electrostatic potential at grid point $i$ produced by the charge $q^c_j$ on the core of atom $j$.
- $\phi^{QM}_i$ is the quantum mechanical electrostatic potential at grid point $i$.
- $J^s_i$ is the electrostatic potential at grid point $i$ produced by all the shell particles.
- $Q_T$ is the net charge of the molecule.
- $V^p$ is a penalty factor that ensures that the restraints encoded in vector $\mathbf{v}$ are satisfied.

For instance, in the example above, the charges on atoms 1 and 3 are restrained to be equal to the charges on atoms 2 and $n$, respectively. To simplify the fitting procedure, we vary only $q^c_i$, while $q^s_i$ is fixed at $-1$ e for hydrogen and at $-2$ e for other elements. The contribution of the shells therefore becomes a constant term in the linear equations of matrix $\mathbf{J}$, and it moves to the vector $\mathbf{u}$ on the right-hand side of Eqn. 4.3. Eqn. 4.4 can be summarized as

$$\mathbf{A}\mathbf{q} = \mathbf{b} \tag{4.5}$$

which can be solved with the singular value decomposition (SVD) algorithm. We then compute the goodness of fit from:

$$\chi^2 = \lVert \mathbf{A}\mathbf{q} - \mathbf{b} \rVert^2 \tag{4.6}$$
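Below is a minimal NumPy sketch of the restrained fit in Eqns. 4.4 to 4.6. It rests on several assumptions. The Coulomb elements are taken to be point-charge values $J^{c}_{ij} = 1/r_{ij}$ in atomic units, whereas the thesis uses the Gaussian or Slater integrals of Chapter 3. All restraint rows share a single penalty weight. The shell potential $J^{s}_{i}$ is supplied precomputed. The function names are hypothetical. The right-hand side of the total-charge row is scaled by the same penalty, so that row enforces $\sum_j q^{c}_{j} = Q_T - \sum_j q^{s}_{j}$. NumPy's `lstsq` solves the least-squares problem with an SVD-based LAPACK driver.

```python
import numpy as np


def coulomb_matrix(grid, sites):
    """Point-charge Coulomb matrix J[i, j] = 1 / |r_i - r_j| in atomic units (an assumption)."""
    diff = grid[:, None, :] - sites[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=-1)


def fit_core_charges(J_core, phi_qm, phi_shell, q_shell, q_total,
                     equal_pairs=(), penalty=100.0):
    """Restrained ESP fit of the core charges, A q = b (Eqns. 4.4 and 4.5).

    J_core      : (m, n) core Coulomb matrix J^c
    phi_qm      : (m,) QM electrostatic potential on the grid points
    phi_shell   : (m,) potential J^s produced by all (fixed) shell particles
    q_shell     : (n,) fixed shell charges q^s
    q_total     : net charge Q_T of the molecule
    equal_pairs : (i, j) index pairs whose core charges are restrained to be equal
    """
    n = J_core.shape[1]
    rows = [J_core]
    rhs = [phi_qm - phi_shell]
    # Total-charge restraint, penalty applied to both sides:
    # penalty * sum_j q^c_j = penalty * (Q_T - sum_j q^s_j)
    rows.append(np.full((1, n), penalty))
    rhs.append(np.array([penalty * (q_total - q_shell.sum())]))
    # Symmetry restraints: penalty * (q^c_i - q^c_j) = 0
    for i, j in equal_pairs:
        row = np.zeros((1, n))
        row[0, i], row[0, j] = penalty, -penalty
        rows.append(row)
        rhs.append(np.zeros(1))
    A = np.vstack(rows)
    b = np.concatenate(rhs)
    q_core = np.linalg.lstsq(A, b, rcond=None)[0]   # SVD-based least squares
    chi2 = float(np.sum((A @ q_core - b) ** 2))     # goodness of fit, Eqn. 4.6
    return q_core, chi2


# Synthetic check: three collinear sites, shells placed on the cores.
rng = np.random.default_rng(0)
sites = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [2.4, 0.0, 0.0]])
directions = rng.normal(size=(60, 3))
grid = 5.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True) + sites.mean(axis=0)
J = coulomb_matrix(grid, sites)
q_shell = np.array([-1.0, -2.0, -1.0])
q_true = np.array([1.3, 1.4, 1.3])
phi_shell = J @ q_shell
phi_qm = J @ q_true + phi_shell
q_fit, chi2 = fit_core_charges(J, phi_qm, phi_shell, q_shell,
                               q_total=float((q_true + q_shell).sum()),
                               equal_pairs=[(0, 2)])
```

The synthetic data are consistent with the restraints, so the fit recovers the input core charges and $\chi^2$ is essentially zero. With real QM potentials the residual stays finite, and the penalty weight sets how strictly the restraints win over the ESP rows.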

Because the shell particles contribute to the right-hand side of Eqn. 4.5, the fitting algorithm must be combined with optimization of the shell positions in the mean electric field of the nuclei. The two steps are therefore repeated iteratively (Fig. 4.1).

[Flowchart: Fit $q^{c}_{i}$ to ESP → Shell minimization → Compute $\chi^2$ → $\chi^2 \le \epsilon$? If no, return to the fit; if yes, stop.]

Figure 4.1. Iterative ESP fitting. "Shell minimization" denotes energy minimization of the shell particle positions in the field of the fixed core particles, which ensures that the force on every shell particle is zero at every iteration. The shell positions are minimized with the GROMACS package [39].
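The loop in Fig. 4.1 can be sketched as a driver that receives its three steps as callables. The driver and the toy one-dimensional stand-ins are hypothetical. In the thesis, GROMACS performs the shell minimization and the SVD performs the charge fit, and neither is reproduced here. A `max_iter` cap is added as an assumption because the figure shows no exit for a fit that never reaches ε.

```python
def fit_with_shell_relaxation(fit_charges, relax_shells, compute_chi2,
                              shells, eps=1e-10, max_iter=200):
    """Alternate charge fitting and shell relaxation as in Fig. 4.1.

    fit_charges(shells)          -> core charges from the linear ESP fit (shells frozen)
    relax_shells(q_core, shells) -> shell positions minimized in the field of fixed cores
    compute_chi2(q_core, shells) -> goodness of fit (Eqn. 4.6)
    Returns (q_core, shells, chi2, iterations); stops after max_iter without convergence.
    """
    for iteration in range(1, max_iter + 1):
        q_core = fit_charges(shells)             # "Fit q^c_i to ESP"
        shells = relax_shells(q_core, shells)    # "Shell minimization"
        chi2 = compute_chi2(q_core, shells)      # "Compute chi^2"
        if chi2 <= eps:                          # "chi^2 <= eps?" yes -> stop
            break
    return q_core, shells, chi2, iteration


# Toy one-dimensional stand-ins (illustrative only, not a physical model):
# the fitted charge depends on the shell coordinate and vice versa.
def toy_fit(s):
    return 2.0 - 0.5 * s


def toy_relax(q, s):
    return 0.5 * q


def toy_chi2(q, s):
    return (q - toy_fit(s)) ** 2


q, s, final_chi2, n_iter = fit_with_shell_relaxation(toy_fit, toy_relax, toy_chi2, shells=0.0)
# The toy iteration converges toward the fixed point q = 1.6, s = 0.8.
```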

4.2 Alexandria Charge Model

The ground-state internal energy $E_a$ of an isolated atom depends on its charge $q_a$. This dependence can be approximated by a Maclaurin series around the neutral atomic reference state at a fixed external potential ν:

$$E_a(q_a) = E_a(0) + \left(\frac{\partial E_a}{\partial q_a}\right)_{\nu} q_a + \frac{1}{2!}\left(\frac{\partial^2 E_a}{\partial q_a^2}\right)_{\nu} q_a^2 + \cdots \tag{4.7}$$

Parr et al. have shown that the first and second coefficients of this Maclaurin series are, in principle, the electronegativity ($\chi_a$) and the hardness ($\eta_a$) of an atom in its ground state [88, 89]:

$$\chi_a = -\mu = \left(\frac{\partial E_a}{\partial q_a}\right)_{\nu} \tag{4.8}$$

$$\eta_a = \left(\frac{\partial^2 E_a}{\partial q_a^2}\right)_{\nu} \tag{4.9}$$

In other words, the electronegativity of an atom is the negative of its chemical potential μ (see Eqn. 1.18). The hardness is the resistance of the chemical potential to a change in the charge of the atom. Thus, $\eta_a$ controls the amount of charge that flows into or out of the atom. Therefore, Eqn. 4.7 truncated at second order simplifies to:

$$E_a(q_a) = E_a(0) + \chi_a q_a + \frac{1}{2}\eta_a q_a^2 \tag{4.10}$$

Inside a molecule, interactions with other atoms polarize the charge density of an atom, and this polarization changes the internal energy of the atom. In the Alexandria force field, this polarization energy is approximated by the classical Drude model:

$$E_P(q_s, \alpha_a) = \frac{(d\,q_s)^2}{2\alpha_a} \tag{4.11}$$

where $q_s$ is the charge on the shell particle, d is the core-shell distance, and $\alpha_a$ is the atomic polarizability (see Chapter 3 for details). Eqn. 4.10 can therefore be written for a Drude polarizable atom inside a molecule as

$$E_a = \chi_a (q_c + q_s) + \frac{1}{2}\eta_a (q_c + q_s)^2 + \frac{(d\,q_s)^2}{2\alpha_a} \tag{4.12}$$

The electrostatic energy $E_m$ of a molecule containing n Drude polarizable atoms can thus be calculated as

$$E_m = \sum_{i}^{n}\left[\chi^{a}_{i} (q^{c}_{i} + q^{s}_{i}) + \frac{1}{2}\eta^{a}_{i} (q^{c}_{i} + q^{s}_{i})^2 + \frac{(d_i\,q^{s}_{i})^2}{2\alpha^{a}_{i}}\right] + V_C \tag{4.13}$$

where $V_C$ is the Coulomb energy including all the core-core, core-shell, and shell-shell pair interactions:

$$V_C = \frac{1}{2}\sum_{ij}^{n} q^{c}_{i} q^{c}_{j} J^{cc}_{ij} + \frac{1}{2}\sum_{ij}^{n} q^{c}_{i} q^{s}_{j} J^{cs}_{ij} + \frac{1}{2}\sum_{ij}^{n} q^{s}_{i} q^{s}_{j} J^{ss}_{ij} \tag{4.14}$$

where $J_{ij}$ is the Coulomb integral:

$$J_{ij} = \frac{1}{4\pi\varepsilon_0}\iint \frac{\rho(\mathbf{r}_i)\,\rho(\mathbf{r}_j)}{|\mathbf{r}_i - \mathbf{r}_j|}\,d\mathbf{r}_i\,d\mathbf{r}_j \tag{4.15}$$

where $\rho(\mathbf{r}_i)$ and $\rho(\mathbf{r}_j)$ are the normalized charge distributions centered at the positions of particles i and j, respectively, and $\varepsilon_0$ is the vacuum permittivity. Chapter 3 explains the Coulomb integral for different charge distributions. The electronegativity of atom i inside a molecule can be calculated by taking the derivative of the molecular electrostatic energy with respect to the charge of atom i:

$$\chi^{m}_{i} = \frac{\partial E_m}{\partial q^{a}_{i}} \tag{4.16}$$

For simplicity, however, we keep the charge on the shell constant and vary only the charge on the core. Eqn. 4.16 can therefore be rewritten as follows:

$$\chi^{m}_{i} = \frac{\partial E_m}{\partial q^{c}_{i}} = \chi^{a}_{i} + \eta^{a}_{i} q^{c}_{i} + \eta^{a}_{i} q^{s}_{i} + \frac{1}{2}\sum_{j \ne i}^{n} q^{c}_{j} J^{cc}_{ij} + \frac{1}{2}\sum_{j \ne i}^{n} q^{s}_{j} J^{cs}_{ij} \tag{4.17}$$

The charge on the shell was set to -1 e for hydrogen and to -2 e for all other atoms, matching the number of electrons in the ground-state s orbital. The principle of chemical potential equalization (CPE) requires the atomic electronegativities inside a molecule to be equal at equilibrium, which leads to n linear conditions:

$$\chi^{m}_{1} = \chi^{m}_{2} = \cdots = \chi^{m}_{n} = \chi^{m}_{eq} \tag{4.18}$$

One additional condition is needed to conserve the total charge $Q_T$ of the molecule. This gives n + 1 linear equations:

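$$
\begin{bmatrix}
\eta^{a}_{1} & \frac{1}{2}J^{cc}_{12} & \cdots & \frac{1}{2}J^{cc}_{1n} & -1 \\
\frac{1}{2}J^{cc}_{21} & \eta^{a}_{2} & \cdots & \frac{1}{2}J^{cc}_{2n} & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{1}{2}J^{cc}_{n1} & \frac{1}{2}J^{cc}_{n2} & \cdots & \eta^{a}_{n} & -1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}
\begin{bmatrix} q^{c}_{1} \\ q^{c}_{2} \\ \vdots \\ q^{c}_{n} \\ \chi^{m}_{eq} \end{bmatrix}
=
\begin{bmatrix}
-\chi^{a}_{1} - \eta^{a}_{1} q^{s}_{1} - \frac{1}{2}\sum_{j \ne 1}^{n} q^{s}_{j} J^{cs}_{1j} \\
-\chi^{a}_{2} - \eta^{a}_{2} q^{s}_{2} - \frac{1}{2}\sum_{j \ne 2}^{n} q^{s}_{j} J^{cs}_{2j} \\
\vdots \\
-\chi^{a}_{n} - \eta^{a}_{n} q^{s}_{n} - \frac{1}{2}\sum_{j \ne n}^{n} q^{s}_{j} J^{cs}_{nj} \\
Q_T - \sum_{i}^{n} q^{s}_{i}
\end{bmatrix}
\tag{4.19}
$$

As in the ESP-fitting algorithm explained in Section 4.1, the shell particles contribute to the right-hand side of Eqn. 4.19. The ACM algorithm is therefore combined with relaxation of the shell positions in the mean field of the cores. Note that all terms involving the shell particles in Eqn. 4.19 vanish for a non-polarizable molecule.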
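The NumPy sketch below shows one way to assemble and solve Eqn. 4.19. The function name and the made-up two-atom parameters are illustrative assumptions. If no shell charges are passed, the sketch reduces to the non-polarizable case, where all shell terms vanish.

```python
import numpy as np


def acm_charges(chi, eta, J_cc, q_total, q_shell=None, J_cs=None):
    """Solve the (n + 1) x (n + 1) system of Eqn. 4.19.

    chi, eta : (n,) atomic electronegativities chi^a_i and hardnesses eta^a_i
    J_cc     : (n, n) core-core Coulomb integrals (diagonal ignored)
    q_total  : net molecular charge Q_T
    q_shell  : (n,) fixed shell charges q^s_i; None for a non-polarizable molecule
    J_cs     : (n, n) core-shell Coulomb integrals (diagonal ignored)
    Returns (core charges q^c, equilibrium electronegativity chi_eq).
    """
    chi, eta = np.asarray(chi, dtype=float), np.asarray(eta, dtype=float)
    n = chi.size
    q_shell = np.zeros(n) if q_shell is None else np.asarray(q_shell, dtype=float)
    J_cs = np.zeros((n, n)) if J_cs is None else np.asarray(J_cs, dtype=float)
    off_diag = ~np.eye(n, dtype=bool)            # mask selecting j != i

    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = 0.5 * np.where(off_diag, J_cc, 0.0)
    M[np.arange(n), np.arange(n)] = eta          # diagonal: eta^a_i
    M[:n, n] = -1.0                              # coefficient of chi_eq
    M[n, :n] = 1.0                               # total-charge row

    rhs = np.empty(n + 1)
    rhs[:n] = -chi - eta * q_shell - 0.5 * (np.where(off_diag, J_cs, 0.0) @ q_shell)
    rhs[n] = q_total - q_shell.sum()

    solution = np.linalg.solve(M, rhs)
    return solution[:n], solution[n]


# Made-up two-atom, non-polarizable example (arbitrary consistent units).
J_pair = np.array([[0.0, 0.5], [0.5, 0.0]])
q_core, chi_eq = acm_charges(chi=[0.1, 0.3], eta=[1.0, 1.0], J_cc=J_pair, q_total=0.0)
# The less electronegative atom 0 becomes positive: q_core ~ [0.133, -0.133], chi_eq = 0.2
```

To check an implementation, substitute the solution back into Eqn. 4.17. Both atoms should then give the same $\chi^{m}_{i}$, equal to the returned $\chi^{m}_{eq}$.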

5. Parameterization

This chapter is based on papers III, IV and VI. It briefly explains the theoretical background of the algorithms used to optimize the parameters of the Alexandria force field.

5.1 Linear Fitting

5.1.1 Singular value decomposition

In this thesis, the singular value decomposition (SVD) algorithm is used to perform linear regression. The SVD is a factorization that decomposes an m × n matrix A into three matrices [90]:

$$\mathbf{A} = \mathbf{U}\Sigma\mathbf{V}^{T} \tag{5.1}$$

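U is a unitary m × m matrix whose columns are the eigenvectors of $\mathbf{A}\mathbf{A}^{T}$, and V is a unitary n × n matrix whose columns are the eigenvectors of $\mathbf{A}^{T}\mathbf{A}$. Σ is a diagonal m × n matrix of singular values, which are the non-negative square roots of the eigenvalues of $\mathbf{A}^{T}\mathbf{A}$ [90]. The singular values are ordered as follows:

$$\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots \ge \sigma_n \ge 0 \tag{5.2}$$

One application of the SVD is solving systems of linear equations,

$$\mathbf{b} = \mathbf{A}\mathbf{x} \tag{5.3}$$

such that

$$\hat{\mathbf{x}} = \Sigma^{-1}\hat{\mathbf{b}} \tag{5.4}$$

where $\hat{\mathbf{b}} = \mathbf{U}^{T}\mathbf{b}$ and $\hat{\mathbf{x}} = \mathbf{V}^{T}\mathbf{x}$ represent b and x in the U and V bases, respectively. For a non-square or rank-deficient A, $\Sigma^{-1}$ is understood as the pseudo-inverse, obtained by inverting only the non-zero singular values. Eqn. 5.4 follows from inserting the identities $\mathbf{U}\mathbf{U}^{T} = \mathbf{I}$ and $\mathbf{V}\mathbf{V}^{T} = \mathbf{I}$ into Eqn. 5.3: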

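$$\mathbf{U}\mathbf{U}^{T}\mathbf{b} = \mathbf{A}\mathbf{V}\mathbf{V}^{T}\mathbf{x} \tag{5.5}$$

$$\mathbf{U}\hat{\mathbf{b}} = \mathbf{A}\mathbf{V}\hat{\mathbf{x}} \tag{5.6}$$

$$\hat{\mathbf{b}} = \mathbf{U}^{T}\mathbf{A}\mathbf{V}\hat{\mathbf{x}} \tag{5.7}$$

$$\hat{\mathbf{b}} = \Sigma\hat{\mathbf{x}} \tag{5.8}$$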
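The sketch below checks the derivation numerically. It uses NumPy's economy-size SVD, so $\hat{\mathbf{b}}$ has n components. It forms the pseudo-inverse of Σ by inverting only singular values above a relative cutoff, and that cutoff value is an assumed tolerance. The function name is hypothetical.

```python
import numpy as np


def svd_solve(A, b, rcond=1e-12):
    """Least-squares solution of b = A x via the SVD (Eqns. 5.1 to 5.8)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T, s sorted descending
    b_hat = U.T @ b                                    # b in the U basis
    keep = s > rcond * s[0]                            # treat tiny singular values as zero
    x_hat = np.zeros_like(s)
    x_hat[keep] = b_hat[keep] / s[keep]                # Eqn. 5.4 with the pseudo-inverse of Sigma
    return Vt.T @ x_hat                                # x = V x_hat


# Straight-line fit y = c0 + c1 * t through three points.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x = svd_solve(A, b)   # approximately [0.6667, 0.5]
```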

5.1.2 Bootstrapping

The bootstrapping technique is used to quantify the uncertainty associated with the optimum values of the parameters obtained from the SVD algorithm [91]. Bootstrapping infers statistics of a population from randomly resampled data. It starts by drawing a large number of “bootstrapped samples” randomly, with replacement, from the original sample data. This process generates thousands of hypothetical samples, each containing the same number of subjects as the original sample. However, a given subject may occur a different number of times than it does in the original sample. Finally, the statistical analysis of interest, the SVD in our case, is repeated on each of these bootstrapped resamples [92]. This yields a large number of estimates of the statistic of interest, usually represented as a distribution known as the bootstrap distribution (Fig. 5.1).

Figure 5.1. The process of bootstrapping statistical analysis. Different subjects in the samples are shown in green, blue and orange.
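Here is a minimal bootstrap of an SVD-based least-squares fit. The data are made up: each molecular value is a sum of per-atom-type contributions, loosely mirroring how tune_pol relates atomic to molecular polarizabilities. The function name, counts and "true" values are hypothetical.

```python
import numpy as np


def bootstrap_fit(A, b, n_boot=1000, seed=0):
    """Bootstrap distribution of least-squares parameters for b ~ A x.

    Each resample draws len(b) rows (subjects) of (A, b) with replacement and
    refits them with NumPy's SVD-based lstsq. Returns (mean, std, all estimates).
    """
    rng = np.random.default_rng(seed)
    m = len(b)
    estimates = np.empty((n_boot, A.shape[1]))
    for k in range(n_boot):
        idx = rng.integers(0, m, size=m)                          # resample with replacement
        estimates[k] = np.linalg.lstsq(A[idx], b[idx], rcond=None)[0]
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1), estimates


# Synthetic data: atom-type counts per molecule and noisy molecular values.
rng = np.random.default_rng(1)
counts = rng.integers(0, 5, size=(200, 3)).astype(float)
true_params = np.array([1.0, 2.0, 0.5])
observed = counts @ true_params + rng.normal(scale=0.1, size=200)
mean, std, _ = bootstrap_fit(counts, observed)
```

The standard deviation of the bootstrap distribution serves as the uncertainty of each fitted parameter. The full `estimates` array can also be histogrammed to inspect the bootstrap distribution directly.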

5.2 Nonlinear Fitting

5.2.1 Bayesian inference

The Bayesian formalism makes inferences about a model parameter $\theta_i$ in terms of probability statements conditioned on the observed data D, denoted $P(\theta_i|D)$ in this thesis. In other words, Bayesian statistics applies probability theory to make inferences about unknown parameters. This contrasts with the conventional frequentist approach, in which parameters are fixed and observations are random [93]. According to Bayes' rule, the posterior distribution of parameter $\theta_i$ is proportional to the product of its prior distribution and its likelihood [93]:

$$P(\theta_i|D) \propto P(\theta_i)\,P(D|\theta_i) \tag{5.9}$$

The prior distribution $P(\theta_i)$ represents the belief before any data were observed, while the posterior distribution reflects that belief revised by the observed data [93]. If the posterior and the prior distributions belong to the same distribution family, as depicted in Fig. 5.2, they are termed conjugate distributions. Eqn. 5.9 is normalized by dividing it by $P(D)$, the probability of observing the data over all possible values of θ:

$$P(\theta_i|D) = \frac{P(D|\theta_i)\,P(\theta_i)}{P(D)} \tag{5.10}$$

The set of parameter values $\theta_1, \ldots, \theta_n$ partitions the parameter space, and this partition divides the observed data D into disjoint parts [93]:

$$D = (D \cap \theta_1) \cup (D \cap \theta_2) \cup \cdots \cup (D \cap \theta_n) \tag{5.11}$$

where $(D \cap \theta_1)$ and $(D \cap \theta_2)$ are disjoint because $\theta_1$ and $\theta_2$ are disjoint. By the law of total probability, the probability of the observable D is therefore the sum of the probabilities of its disjoint parts, which becomes an integral for a continuous parameter:

$$P(D) = \int P(D \cap \theta)\,d\theta \tag{5.12}$$

Applying the multiplication rule to each disjoint probability in Eqn. 5.12 gives [93]

$$P(D) = \int P(D|\theta)\,P(\theta)\,d\theta \tag{5.13}$$

Thus, Eqn. 5.10 can be rewritten as:

$$P(\theta_i|D) = \frac{P(D|\theta_i)\,P(\theta_i)}{\int P(D|\theta)\,P(\theta)\,d\theta} \tag{5.14}$$

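[Plot: prior $P(\theta_i)$, likelihood $P(D|\theta_i)$, and posterior $P(\theta_i|D)$ as functions of $\theta_i$; horizontal axis 0 to 8, vertical axis 0 to 1.]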

Figure 5.2. Bayes' rule states that the posterior distribution of parameter $\theta_i$, shown in blue, is proportional to the product of its prior distribution, $P(\theta_i)$, and its likelihood, $P(D|\theta_i)$.
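When θ is one-dimensional, Eqns. 5.12 to 5.14 can be evaluated numerically on a parameter grid. The sketch below assumes a normal prior and normally distributed observations with a known noise level. These form a conjugate pair, so the grid posterior can be checked against the closed-form result. All numbers are made up.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule for the integral of y(x) on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


theta = np.linspace(-5.0, 10.0, 20001)               # grid over the parameter space
prior = np.exp(-0.5 * ((theta - 2.0) / 1.5) ** 2)   # normal prior: mean 2, sd 1.5
prior /= trapezoid(prior, theta)                     # normalized P(theta)

data = np.array([3.1, 2.7, 3.6, 3.3])                # made-up observations D
sigma = 0.5                                          # assumed known noise level
likelihood = np.exp(-0.5 * (((data[:, None] - theta) / sigma) ** 2).sum(axis=0))
likelihood /= (sigma * np.sqrt(2.0 * np.pi)) ** data.size   # P(D | theta)

evidence = trapezoid(likelihood * prior, theta)      # denominator of Eqn. 5.14, P(D)
posterior = likelihood * prior / evidence            # P(theta | D)
post_mean = trapezoid(theta * posterior, theta)
```

This grid approach does not scale to high-dimensional parameter spaces. That limitation motivates the stochastic methods of Section 5.2.2.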

5.2.2 Bayesian computation

The posterior distribution in Eqn. 5.14 can be computed analytically if the integral in the denominator (the normalization factor) has a closed form [94], as it does for conjugate priors. For complex models with a high-dimensional parameter space, however, conjugate distributions offer little relief, and a numerical method or a stochastic simulation is required [94]. Monte Carlo is a stochastic integration method widely used to evaluate an integral by computing the value of the integrand h(Θ) at a finite sequence of points randomly generated from an arbitrary distribution π(Θ) [94]. This approach becomes inefficient if π(Θ) is not chosen to match the features of h(Θ). However, if the domain Θ is bounded, a uniform distribution U(Θ) is sufficient for most problems. In minimization problems, the integrand is usually defined as a loss function that measures the deviation of the predictions $\hat{D}$ from the observed data D. In molecular modeling, the loss function measures the deviation of the molecular properties predicted by the model from reference values. The task is then to explore the parameter space for the set of parameter estimates $\hat{\Theta}$ that minimizes the expected loss. For this thesis, the loss function is formulated as follows:

$$\mathcal{L}(\hat{\Theta}) = \sum_i \omega_i \chi^2_i + \Lambda \tag{5.15}$$

where $\chi^2_i$ contains the least-squares residuals for molecular property i, weighted by the factor $\omega_i$:

$$\chi^2_i = \frac{1}{k}\sum_{j}^{k}\left(D_{ij} - \hat{D}_{ij}\right)^2 \tag{5.16}$$

where k is the number of data points available for property i. Λ is an $l_2$-norm regularizer added to the loss function. It discourages overfitting by confining the search to the region of parameter space between the hyperparameters L and U, the lower and upper bounds, respectively. It is defined as:

$$\Lambda = \frac{1}{2}\left[\sum_i \left(\hat{\theta}_i - L\right)^2 H\!\left(L - \hat{\theta}_i\right) + \sum_i \left(\hat{\theta}_i - U\right)^2 H\!\left(\hat{\theta}_i - U\right)\right] \tag{5.17}$$

where H denotes the Heaviside step function. After the trust region of each parameter is found, a box-constrained algorithm fine-tunes each parameter inside its trust region. The box-constrained problem is formulated as follows [95]:

$$\min_{\hat{\Theta} \in K} \mathcal{L}(\hat{\Theta}) \tag{5.18}$$

where

$$K = \{\hat{\Theta} \in \mathbb{R}^n : l_i \le \hat{\theta}_i \le u_i,\ i = 1, \ldots, n\} \tag{5.19}$$

and $l_i$ and $u_i$ are the lower and upper bounds, respectively, for parameter $\hat{\theta}_i$.

The parameter estimates are generated as a Markov chain (MC) from distribution π(Θ), so the integration method is called Markov chain Monte Carlo (MCMC). A Markov chain is a sequence of random variables $\theta^1, \theta^2, \ldots, \theta^N$ in which, for any t, the distribution of $\theta^t$ depends only on the value of $\theta^{t-1}$ [96, 97]. A Markov chain can be generated by starting at some point $\theta^0$ in the parameter space. Then, for each t, the transition $\theta^{t-1} \to \theta^t$ is accepted if it lowers the loss function. Otherwise, it takes place through a transition distribution $J_t(\theta^t|\theta^{t-1})$ according to the Metropolis criterion. In this work, the transition distribution is defined as a Boltzmann distribution:

$$J_t(\theta^t|\theta^{t-1}) = e^{-\Delta\mathcal{L}/T} \tag{5.20}$$

where

$$\Delta\mathcal{L} = \mathcal{L}(\theta^t) - \mathcal{L}(\theta^{t-1}) \tag{5.21}$$

and T is a weighting factor, often called the effective temperature even though it has no physical meaning in most optimization problems. For a parameter estimate $\theta^*$ proposed for $\theta^t$, we can write

$$\theta^* = \theta^{t-1}\left(1 + \delta\,U(0,1)\right) \tag{5.22}$$

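then

$$\theta^t = \begin{cases} \theta^* & \text{if } \Delta\mathcal{L} < 0 \\ \theta^* & \text{else if } J_t \ge U(0,1) \\ \theta^{t-1} & \text{otherwise} \end{cases}$$

where U(0,1) generates a uniform random number between 0 and 1, and δ is a factor controlling the size of the perturbation of $\theta^{t-1}$.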
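Here is a sketch of the loss defined by Eqns. 5.15 to 5.17. The property container and the `predict` callables are hypothetical stand-ins for a force-field evaluation. The Heaviside arguments are read as $H(L - \hat{\theta}_i)$ and $H(\hat{\theta}_i - U)$, so the regularizer acts only outside the bounds.

```python
import numpy as np


def chi2(reference, predicted):
    """Eqn. 5.16: mean squared deviation for one molecular property."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((reference - predicted) ** 2))


def bound_penalty(theta, lower, upper):
    """Eqn. 5.17: quadratic penalty that is non-zero only outside [lower, upper]."""
    theta = np.asarray(theta, dtype=float)
    below = np.heaviside(lower - theta, 0.0)   # H(L - theta_i)
    above = np.heaviside(theta - upper, 0.0)   # H(theta_i - U)
    return 0.5 * float(np.sum((theta - lower) ** 2 * below)
                       + np.sum((theta - upper) ** 2 * above))


def loss(theta, properties, weights, lower, upper):
    """Eqn. 5.15: weighted sum of per-property chi^2 values plus the regularizer.

    properties maps a property name to (reference values, predict), where
    predict(theta) returns the model's predictions for those references.
    """
    total = sum(weights[name] * chi2(reference, predict(theta))
                for name, (reference, predict) in properties.items())
    return total + bound_penalty(theta, lower, upper)


# Hypothetical one-parameter model predicting two "dipole" values.
properties = {"dipole": ([1.0, 2.0], lambda th: [th[0], 2.0 * th[0]])}
weights = {"dipole": 2.0}
print(loss([1.5], properties, weights, lower=0.0, upper=2.0))      # 1.25
print(bound_penalty([0.5, -1.0, 3.0], lower=0.0, upper=2.0))       # 1.0
```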

5.2.3 Simulated annealing

The Metropolis procedure generates a population of parameter sets Θ at some effective temperature, denoted T in the exponential term of Eqn. 5.20 [98]. This T is in fact a parameter that controls the acceptance ratio through the transition distribution $J_t$: the higher the temperature, the higher the acceptance ratio. The simulated annealing (SA) process starts the MCMC simulation at a high temperature. It then lowers the temperature according to a predefined schedule until the Markov chain freezes at some point in the parameter space.
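The Metropolis rule of Section 5.2.2, combined with annealing, can be sketched in plain Python. Several choices here are assumptions: the geometric cooling schedule, perturbing one parameter per step, and the symmetric U(-1, 1) draw in the multiplicative move (Eqn. 5.22 as written uses U(0, 1)). A toy quadratic loss stands in for Eqn. 5.15.

```python
import math
import random


def simulated_annealing(loss, theta0, delta=0.1, t_start=1.0, t_end=1e-4,
                        n_steps=20000, seed=1):
    """Metropolis MCMC (Eqns. 5.20 to 5.22) with a geometric cooling schedule.

    One randomly chosen parameter is perturbed per step, and the best
    parameter set visited is returned together with its loss.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = loss(theta)
    best_theta, best = list(theta), current
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    T = t_start
    for _ in range(n_steps):
        proposal = list(theta)
        i = rng.randrange(len(theta))
        # Multiplicative move as in Eqn. 5.22, but with a symmetric U(-1, 1) draw
        proposal[i] *= 1.0 + delta * rng.uniform(-1.0, 1.0)
        new = loss(proposal)
        delta_loss = new - current                                        # Eqn. 5.21
        if delta_loss < 0 or math.exp(-delta_loss / T) >= rng.random():   # Metropolis
            theta, current = proposal, new
            if current < best:
                best_theta, best = list(theta), current
        T *= cooling                                                      # anneal
    return best_theta, best


best_theta, best_loss = simulated_annealing(
    lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2, theta0=[1.0, -1.0])
```

A multiplicative move never changes the sign of a parameter, which keeps positive quantities such as hardnesses or density exponents positive. The flip side is that a parameter started at exactly zero never moves.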

6. Alexandria Chemistry Toolkit

This chapter explains the Alexandria Chemistry Toolkit (ACT) developed to derive parameters of the Alexandria force field from quantum-chemistry and experimental data. The ACT also generates the molecular topology and atomic partial charges for chemical compounds for performing molecular simulations with the GROMACS package.

6.1 Extracting Quantum Chemistry Data

Two Alexandria programs, gauss2molprop and merge_mp, extract the geometry and properties of a molecule from the output of quantum-chemistry calculations performed with the Gaussian software [48, 49]. The gauss2molprop program uses the open-source Open Babel code [99] to parse and extract the quantum data from the Gaussian output file. It stores the data in an XML file of molecular properties, called a molprop file for short. We contributed code to Open Babel for extracting the optimized geometry, frequencies, partial atomic charges, and thermochemical and electrostatic properties from Gaussian output files. The procedure is summarized in Fig. 6.1.

6.2 Generation of Force Field Atom Types

SMiles ARbitrary Target Specification (SMARTS) patterns [100] are used to define atom types. The Open Babel software [99] was used to generate atom types for each compound. The definition of atom types is based on the Generalized Amber Force Field (GAFF) [101]. The SMARTS patterns in Open Babel (available in version 2.4.1 or later) were modified to reproduce the published definition of GAFF for each atom type [101].

6.3 Optimization of Force Field Parameters

Three ACT programs were used in this thesis to optimize the force field parameters. They apply either the SVD algorithm combined with bootstrapping (SVD/B) or Bayesian Monte Carlo (BMC) simulation combined with simulated annealing (SA), denoted BMC/SA (see Fig. 6.1). The programs are:

• tune_pol uses the SVD/B algorithm to derive the optimum values and uncertainties of atomic polarizabilities from experimental molecular isotropic polarizabilities (paper III).
• tune_zeta uses the BMC/SA simulation to optimize the exponents of the spherical Gaussian and ns-Slater density functions. It reproduces molecular electrostatic potentials and electric moments using the ESP-fitting charge generation algorithm (paper III).
• tune_eem uses the BMC/SA simulation to optimize atomic electronegativities, atomic absolute hardnesses, and the exponents of the spherical Gaussian and ns-Slater density functions. It reproduces molecular electrostatic potentials and electric moments using the Alexandria charge model (paper VI).

Figure 6.1. Extraction of quantum chemistry data merged with experimental data for optimization of the parameters of the Alexandria force field.

6.4 Generation of Molecular Topology and Atomic Charges

The Alexandria gentop program generates the molecular topology file (.top), the coordinate file (.gro), and the partial atomic charges. The generated files use the input format of the GROMACS package.

6.5 Coulomb Integrals for Distributed Charge Densities

Eqn. 3.22 and its analytical derivative with respect to r were implemented to compute the Coulomb energy and force between two interacting Gaussian charge densities. For two interacting Slater charge densities, Eqn. 3.24 was implemented in Mathematica, which generated C++ code for the analytical computation of the Coulomb integral. The analytical derivative with respect to r, needed for Coulomb forces between two Slater charge densities, was also derived with Mathematica. Eqn. 3.24 contains many terms with large powers, particularly for principal quantum numbers greater than three. The equations were therefore implemented with the arbitrary-precision arithmetic library “Class Library for Numbers” to avoid numerical instabilities.
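Eqn. 3.22 is not reproduced in this section. The sketch below therefore assumes the widely used closed form for two normalized spherical Gaussian densities, $J(r) = \mathrm{erf}(\beta_{ij} r)/r$ with $\beta_{ij} = \beta_i\beta_j/\sqrt{\beta_i^2 + \beta_j^2}$, in atomic units. If the thesis uses a different exponent convention, the combined exponent changes accordingly. The function name is hypothetical.

```python
import math


def gaussian_coulomb(r, beta_i, beta_j):
    """Coulomb integral J(r) and dJ/dr for two unit 1s-Gaussian charge densities.

    Assumes rho(r) = (beta / sqrt(pi))**3 * exp(-(beta * r)**2) and atomic units,
    i.e. 1 / (4 pi eps0) = 1. Multiply by q_i * q_j for the energy; F = -q_i q_j dJ/dr.
    """
    b = beta_i * beta_j / math.sqrt(beta_i ** 2 + beta_j ** 2)   # combined exponent
    if r < 1e-8:
        # Finite limit as r -> 0: no Coulomb singularity for smeared charges
        return 2.0 * b / math.sqrt(math.pi), 0.0
    erf_br = math.erf(b * r)
    J = erf_br / r
    dJ_dr = 2.0 * b / math.sqrt(math.pi) * math.exp(-(b * r) ** 2) / r - erf_br / r ** 2
    return J, dJ_dr
```

At large r, the error function approaches one and J reduces to the point-charge 1/r. As r approaches zero, J tends to the finite value $2\beta_{ij}/\sqrt{\pi}$, the penetration behavior that motivates smeared charges.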

6.6 Parallelization

The Message Passing Interface (MPI) is used to run the Bayesian Monte Carlo simulation on multiple central processing units (CPUs). Specifically, the evaluation of the loss function (Eqn. 5.15) is distributed over multiple CPUs.

6.7 License

The ACT is free software released under the terms of the GNU General Public License as published by the Free Software Foundation. The ACT can therefore be modified and/or redistributed under version 2 of the License or any later version.

7. Summary of papers

7.1 Paper I

The aim of this paper was to identify a quantum-mechanical (QM) method that accurately predicts molecular thermochemistry at a moderate computational cost. This QM method then provided reference data on molecular energetics for parameterizing the Alexandria force field: the enthalpy of formation ($\Delta_f H^0$), the standard entropy ($S^0$), and the heat capacity ($C_V$). The thermochemistry calculations of six popular QM methods were benchmarked against experiments: G2, G3, G4, CBS-QB3, W1U, and W1BD. The performance of these methods was compared on ∼2000 molecules with up to 47 atoms. Our results showed that, considering both accuracy and computational cost, the G4 method was more efficient than the other benchmarked methods for large-scale quantum calculations of molecular thermochemistry.

Moreover, the large number of molecules uncovered possible hidden shortcomings of the QM thermochemistry methods. For example, our results, consistent with previous studies, demonstrated that QM calculations performed on a single optimized geometry underestimated the gas-phase $S^0$ of flexible molecules. We showed that this systematic deviation from the experimental $S^0$ corresponds roughly to the Boltzmann equation ($S = R\ln\Omega$), where R is the ideal gas constant and Ω is the number of possible conformations. This relation was used to empirically correct the calculated entropy of molecules with multiple conformations. This paper also predicted the molecular thermochemistry of over 700 compounds for which the available databases contained no experimental data. Finally, to help others analyze thermodynamic properties, we implemented a tool, obthermo, and a table of reference atomization energies for popular thermochemistry methods in the Open Babel software package.
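As a worked illustration of the size of this correction, consider a molecule with Ω = 3 accessible conformations (an arbitrarily chosen count). Its single-conformer entropy would be raised by

$$\Delta S = R\ln\Omega = 8.314\ \mathrm{J\,mol^{-1}\,K^{-1}} \times \ln 3 \approx 9.13\ \mathrm{J\,mol^{-1}\,K^{-1}}$$

How Ω is counted for a real molecule determines the actual correction. The example only shows the arithmetic.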

7.2 Paper II

The accuracy and reliability of molecular-mechanical (MM) force fields are determined mainly by the quality of the data and the size of the training sets. To predict molecular properties across a vast chemical space, force fields need to be trained on a large variety of chemical compounds. The reference values of molecular properties also need to be as accurate as possible.

Paper II is a data descriptor for the first version of the Alexandria library, AlexandriaLib.v1.1. The Alexandria library was released as an open and freely accessible database covering 2704 compounds. It contains the optimized geometries, frequencies, electric moments up to the hexadecupole, electrostatic potential maps, polarizability tensors, and thermochemistry obtained from quantum chemistry calculations. Computed values are tabulated and, where available, compared to experimental data.

7.3 Paper III

The aim of this paper was to explicitly include electronic polarization and charge penetration effects in the Alexandria force field to describe the electrostatics of organic compounds with different chemistries. To do so, a Drude-type polarizable model was developed. In this model, the core particle was treated as a positive point charge and the shell particle as a negative smeared charge. The core particle represented the nucleus, and the shell particle represented the electron cloud. The distribution of the smeared charge was described by either a 1s-Gaussian or an ns-Slater (n = 1, 2, 3) density function. Experimental and quantum-chemistry data on molecular electrostatic properties from the Alexandria library were used to parameterize the model. The number and variety of compounds in the training set ensured that the model was chemically transferable beyond the Alexandria library. The uncertainty in the optimized value of each parameter was also reported, which will allow uncertainties in the model parameters to be propagated into future predictions. Our results demonstrated that the Alexandria Drude model predicts the dipole moments and isotropic polarizabilities of molecules in agreement with experiments, within the accuracy of density functional theory (DFT) at the B3LYP/aug-cc-pVTZ level. Moreover, explicitly including electronic polarization in the force field reduced the root-mean-square deviation from DFT dipole moments by more than 50% for 152 dimers and clusters. We also showed that introducing polarizable smeared charges, which model the charge penetration effect, markedly improved the accuracy of the electrostatic interaction energy of the water dimer.

7.4 Paper IV

The aim of this paper was to describe alkali halides in the gas, liquid, and solid phases with the Alexandria force field. The model was developed in two steps, as explained below.

In the first step, a Drude-type polarizable model was parameterized by reproducing the experimental dipole moments of alkali halide pairs. In this model, the partial charges on the core and shell particles of each ion were described by the same 1s-Gaussian density function. In the second step, potential energy functions describing van der Waals interactions were parameterized in combination with the Drude model. These included the 12-6 and 8-6 Lennard-Jones potentials, the standard Buckingham function, and the Wang-Buckingham potential function. The parameterization reproduced experimental ion-pair dissociation energies, inter-ionic equilibrium distances, vibrational frequencies, and alkali halide crystal densities. All the models parameterized in this study, as well as four reference force fields, were benchmarked against experimental physicochemical properties of alkali halides. Our results systematically demonstrated that the Alexandria Drude model, combined with the Wang-Buckingham potential function, predicted the tested properties in the gas, liquid, and solid phases with good accuracy.

7.5 Paper V

Density functional theory (DFT) is commonly used to calculate the thermochemical properties of chemical compounds in the gas phase. However, its high computational cost limits its applicability to small compounds. Accurate classical force fields could, in principle, complement these quantum-mechanical methods because they are computationally much cheaper. Paper V benchmarked two popular empirical force fields for predicting molecular thermochemistry: the General Amber Force Field (GAFF) and the CHARMM General Force Field (CGenFF). With these force fields we calculated four properties for about 1800 small molecules: the internal thermal energy (E), the zero-point energy (ZPE), the heat capacity at constant volume ($C_V$), and the standard absolute entropy ($S^0$). The results were compared to experimental data as well as to results from G4 quantum-chemical calculations. For the benchmarked thermochemical properties, the force field calculations deviated more from the experimental data than the G4 method did, particularly for the standard absolute entropy. This work, however, suggested that classical force fields, with some tuning, could indeed complement DFT in thermochemical applications.

7.6 Paper VI

Efficiently generating partial atomic charges for proteins, and for their complexes with organic compounds, under different physiological conditions has been challenging. Biomolecular force fields use tabulated charges for proteins, which capture the effect of the local chemical environment on the atomic charges only on average. Generating charges from quantum chemistry calculations is also prohibitively expensive, even for small proteins. To overcome this problem, this work presented the Alexandria Charge Model (ACM), which is based on the well-known chemical potential equalization (CPE) method. The CPE was combined with the classical Drude model to make the polarizable ACM (PACM). The optimum values and uncertainties of the model parameters were inferred from quantum chemistry data using the Bayesian formalism. We demonstrated that the ACM and PACM yield atomic charges whose predicted electrostatic properties of neutral organic compounds agree with high-level density functional theory (DFT). The computational cost of the ACM and PACM is substantially lower than that of the reference methods benchmarked in this study. This makes it possible to generate atomic charges for proteins, for example at different protonation states. Moreover, the PACM generates Drude-type polarizable charges for proteins, which is unprecedented.

8. Popular Science Summary

Advances in computer technology have made computational methods useful in all areas of chemistry. Applications range from studying chemical reactions in solution to the rational design of chemical compounds with new properties and functions. People skilled with these computational tools can therefore help discover new drugs or design materials with new properties. Computational chemistry can thus benefit society, for example by delivering effective medicines at a lower cost. Computational chemistry makes it possible to predict the physicochemical properties of compounds before they are synthesized in the laboratory. This allows, for example, the toxicity of industrial chemicals to be estimated in advance. Such estimates can prevent the mass production and release of synthetic compounds that are harmful to humans and the environment. Another application of computational chemistry is exploring new chemicals, for example biofuels that could replace fossil fuels as energy carriers.

For these reasons, computational chemistry methods have been under continuous development for many years. The long-term goal has been, and still is, to predict with a single unified model the state of compounds with different chemical properties in all physical states of aggregation. However, all molecular computational models contain certain assumptions and approximations that limit their respective areas of application.

This thesis focuses on the development of molecular mechanics models. These models are based on classical rather than quantum mechanics. The calculations are therefore both simpler and faster than with quantum-mechanical models, at some cost in accuracy. On the other hand, the lower computational cost makes them very well suited for large systems of biological and medical interest. This thesis presents a new classical molecular model, the so-called Alexandria force field. A force field is a set of mathematical functions designed to calculate the energy of a chemical compound from the bonded and non-bonded interactions of its constituent atoms. Bonded interactions involve atoms held together by covalent bonds. Non-bonded interactions involve atoms that are not covalently bound but that affect each other through electrostatic forces and van der Waals interactions.

Understanding non-bonded interactions is important in rational drug design, because most drugs bind to their receptors through non-covalent interactions. The main aim of this work was therefore to build the Alexandria force field with more physically realistic models for electrostatic forces and van der Waals interactions than those used previously in other force fields.

The form of the force field equations and their built-in parameters determine the accuracy of the force field. Traditionally, force fields of this kind treat partial atomic charges as point charges, because the electrostatic energy between two point charges is easy to calculate. In reality, the electrons are spread out around the atomic nucleus, and interactions with other atoms constantly change the actual shape of the charge distribution. This redistribution of charge is called charge polarization. In the Alexandria force field, the point charges are replaced by spherically distributed, polarizable charges.

Our results show that these spherically distributed, polarizable charges improved the accuracy of the electrostatic interaction energy between two water molecules. Moreover, we showed that the Alexandria force field predicts the electrostatic properties of isolated organic compounds in agreement with high-level quantum-mechanical models, at a fraction of the computational cost. Because these analyses were based on many different chemical compounds, the Alexandria force field is likely applicable to compounds beyond those in this training set. A complete Alexandria force field has been developed for alkali halides such as sodium chloride. The physicochemical properties that the force field predicts for these compounds were compared with experimental measurements and with calculations using four other force fields. Overall, the Alexandria force field was the most accurate of all tested force fields in predicting the properties of alkali halides in the gas, liquid, and solid phases.

In summary, the Alexandria project presented in this thesis provides a platform for the systematic development of phase-independent and chemically transferable molecular mechanics force fields. However, much work remains before the Alexandria force field can be used in applications such as drug development. This requires, for example, dedicated parameterization of the Alexandria force field for biomacromolecules such as proteins and nucleic acids.

Acknowledgment

The work presented in this thesis was carried out in the Computational Biology and Bioinformatics research group at the Institute of Cellular and Molecular Biology (ICM), Biomedical Center, Uppsala University, Sweden. This work would not have been possible without the help of many people. First and foremost, I am deeply grateful to my supervisor David van der Spoel for giving me the chance to do what I like most. I thank David for his continuous support over these years. I thank my colleagues in David's lab and my coauthors for their support and inspiration. I thank my colleagues in the ICM for their friendship and help. I also want to thank the staff members and administrators at the department.

References

[1] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids. Great Clarendon Street, Oxford, OX2 6DP, UK: Oxford University Press, 2017.
[2] C. Caleman, P. J. van Maaren, M. Hong, J. S. Hub, L. T. Costa, and D. van der Spoel, “Force field benchmark of organic liquids: Density, enthalpy of vaporization, heat capacities, surface tension, compressibility, expansion coefficient and dielectric constant,” J. Chem. Theory Comput., vol. 8, pp. 61–74, 2012.
[3] J. R. Perilla, B. C. Goh, C. K. Cassidy, B. Liu, R. C. Bernardi, T. Rudack, H. Yu, Z. Wu, and K. Schulten, “Molecular dynamics simulations of large macromolecular complexes,” Curr. Opin. Struct. Biol., vol. 31, pp. 64–74, 2015.
[4] M. G. Martin and J. I. Siepmann, “Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes,” J. Phys. Chem. B, vol. 102, no. 14, pp. 2569–2577, 1998.
[5] M. G. Martin and J. I. Siepmann, “Novel configurational-bias Monte Carlo method for branched molecules. Transferable potentials for phase equilibria. 2. United-atom description of branched alkanes,” J. Phys. Chem. B, vol. 103, no. 21, pp. 4508–4517, 1999.
[6] B. Chen and J. I. Siepmann, “Transferable potentials for phase equilibria. 3. Explicit-hydrogen description of normal alkanes,” J. Phys. Chem. B, vol. 103, no. 25, pp. 5370–5379, 1999.
[7] N. Ferrando, V. Lachet, J. M. Teuler, and A. Boutin, “Transferable Force Field for Alcohols and Polyalcohols,” J. Phys. Chem. B, vol. 130, 2009.
[8] P. Bai, M. Tsapatsis, and J. I. Siepmann, “TraPPE-zeo: Transferable potentials for phase equilibria force field for all-silica zeolites,” J. Phys. Chem. C, vol. 117, no. 46, pp. 24375–24387, 2013.
[9] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry. Mineola, N.Y.: Dover Publications Inc., 1989.
[10] W. Koch and M. C. Holthausen, A Chemist’s Guide to Density Functional Theory. D-69469 Weinheim: Wiley-VCH Verlag GmbH, 2001.
[11] D. B. Cook, Handbook of Computational Quantum Chemistry. Mineola, New York: Dover Publications Inc., 2005.
[12] A. J. Cohen, P. Mori-Sánchez, and W. Yang, “Challenges for Density Functional Theory,” Chem. Rev., vol. 112, pp. 289–320, 2012.
[13] H. J. C. Berendsen, Simulating the Physical World: Hierarchical Modeling from Quantum Mechanics to Fluid Dynamics. The Edinburgh Building, Cambridge CB2 8RU, UK: Cambridge University Press, 2007.
[14] A. R. Leach, Molecular Modelling: Principles and Applications. Edinburgh Gate, Harlow, Essex CM20 2JE, England: Pearson Education Limited, 2001.

[15] W. G. Gibson, “Quantum corrections to the radial distribution function of a fluid,” Mol. Phys., vol. 28, pp. 793–800, 1974.

[16] W. L. Jorgensen, “The many roles of computation in drug discovery,” Science, vol. 303, no. 5665, pp. 1813–1818, 2004.

[17] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Sci. Data, vol. 1, p. 140022, 2014.

[18] J. M. Simmie, “A database of formation enthalpies of nitrogen species by compound methods (CBS-QB3, CBS-APNO, G3, G4),” J. Phys. Chem. A, vol. 119, no. 42, pp. 10511–10526, 2015.

[19] D. Hait and M. Head-Gordon, “How accurate is density functional theory at predicting dipole moments? An assessment using a new database of 200 benchmark values,” 2017.

[20] P. Cieplak, P. A. Kollman, and T. Lybrand, “A new water potential including polarization: Application to gas-phase, liquid, and crystal properties of water,” J. Chem. Phys., vol. 92, pp. 6755–6760, 1990.

[21] R. Chelli, R. Righini, S. Califano, and P. Procacci, “Towards a polarizable force field for molecular liquids,” J. Mol. Liq., vol. 96-97, pp. 87–100, 2002.

[22] G. A. Kaminski, H. A. Stern, B. J. Berne, and R. A. Friesner, “Development of an accurate and robust polarizable molecular mechanics force field from ab initio quantum chemistry,” J. Phys. Chem. A, vol. 108, pp. 621–627, 2004.

[23] E. Harder, V. M. Anisimov, T. Whitfield, A. D. MacKerell, Jr., and B. Roux, “Understanding the dielectric properties of liquid amides from a polarizable force field,” J. Phys. Chem. B, vol. 112, pp. 3509–3521, 2008.

[24] J. W. Ponder et al., “Current status of the AMOEBA polarizable force field,” J. Phys. Chem. B, vol. 114, pp. 2549–2564, 2010.

[25] S. Patel, A. D. Mackerell, and C. L. Brooks, “CHARMM fluctuating charge force field for proteins: II - Protein/solvent properties from molecular dynamics simulations using a nonadditive electrostatic model,” J. Comput. Chem., vol. 25, pp. 1504–1514, 2004.

[26] J. W. Ponder and D. A. Case, “Force fields for protein simulations,” Adv. Prot. Chem., vol. 66, p. 27, 2003.

[27] G. Lamoureux and B. Roux, “Modeling induced polarization with classical Drude oscillators: Theory and molecular dynamics simulation algorithm,” J. Chem. Phys., vol. 119, pp. 3025–3039, 2003.

[28] G. Lamoureux, A. D. MacKerell, and B. Roux, “A simple polarizable model of water based on classical Drude oscillators,” J. Chem. Phys., vol. 119, pp. 5185–5197, 2003.

[29] G. G. Hall and K. Tsujinaga, “The molecular electrostatic potential of some simple molecules,” Theor. Chim. Acta, vol. 69, pp. 425–436, 1986.

[30] G. G. Hall and C. M. Smith, “The electron density of the water molecule,” Theor. Chim. Acta, vol. 69, pp. 71–81, 1986.

[31] A. K. Rappé and W. A. Goddard III, “Charge equilibration for molecular dynamics simulations,” J. Phys. Chem., vol. 95, pp. 3358–3363, 1991.

[32] A. Baranyai and P. T. Kiss, “A transferable classical potential for the water molecule,” J. Chem. Phys., vol. 133, p. 144109, 2010.

[33] A. Baranyai and P. T. Kiss, “Polarizable model of water with field-dependent polarization,” J. Chem. Phys., vol. 135, p. 234110, 2011.

[34] P. T. Kiss and A. Baranyai, “Density maximum and polarizable models of water,” J. Chem. Phys., vol. 137, p. 084506, 2012.

[35] H. Jiang, O. A. Moultos, I. G. Economou, and A. Z. Panagiotopoulos, “Gaussian-charge polarizable and nonpolarizable models for CO2,” J. Phys. Chem. B, vol. 120, no. 5, pp. 984–994, 2016.

[36] P. T. Kiss and A. Baranyai, “A new polarizable force field for alkali halide ions,” J. Chem. Phys., vol. 141, p. 114501, 2014.

[37] A. G. Donchev, V. D. Ozrin, M. V. Subbotin, O. V. Tarasov, and V. I. Tarasov, “A quantum mechanical polarizable force field for biomolecular interactions,” Proc. Natl. Acad. Sci. U.S.A., vol. 102, pp. 7829–7834, 2005.

[38] Q. Wang, J. A. Rackers, C. He, R. Qi, C. Narth, L. Lagardere, N. Gresh, J. W. Ponder, J. P. Piquemal, and P. Ren, “General model for treating short-range electrostatic penetration in a molecular mechanics force field,” J. Chem. Theory Comput., vol. 11, no. 6, pp. 2609–2618, 2015.

[39] S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl, “GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit,” Bioinformatics, vol. 29, pp. 845–854, 2013.

[40] D. A. McQuarrie, Statistical Mechanics. New York: Harper & Row, 1976.

[41] J. W. Ochterski, Thermochemistry in Gaussian. Pittsburgh, PA: Gaussian, Inc., 2000.

[42] P. H. Berens, D. H. J. Mackay, G. M. White, and K. R. Wilson, “Thermodynamic and quantum corrections from molecular dynamics for liquid water,” J. Chem. Phys., vol. 79, pp. 2375–2389, 1983.

[43] L. A. Curtiss, P. C. Redfern, and K. Raghavachari, “Gaussian-4 theory,” J. Chem. Phys., vol. 126, p. 084108, 2007.

[44] E. C. Barnes, G. A. Petersson, J. A. Montgomery, M. J. Frisch, and J. M. L. Martin, “Unrestricted coupled cluster and Brueckner doubles variations of W1 theory,” J. Chem. Theory Comput., vol. 5, no. 10, pp. 2687–2693, 2009.

[45] F. Jensen, Introduction to Computational Chemistry. The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England: John Wiley and Sons Ltd, 2004.

[46] P. Calaminici, K. Jug, and M. Köster, “Density functional calculations of molecular polarizabilities and hyperpolarizabilities,” J. Chem. Phys., vol. 109, no. 18, p. 7756, 1998.

[47] A. J. Stone, The Theory of Intermolecular Forces. Great Clarendon Street, Oxford, OX2 6DP, UK: Oxford University Press, 2013.

[48] M. J. Frisch et al., “Gaussian 09 Revision B.01,” Gaussian Inc., Wallingford, CT, 2009.

[49] M. J. Frisch et al., “Gaussian 16 Revision A.03,” Gaussian Inc., Wallingford, CT, 2016.

[50] J. A. Pople, M. Head-Gordon, D. J. Fox, K. Raghavachari, and L. A. Curtiss, “Gaussian-1 theory: A general procedure for prediction of molecular energies,” J. Chem. Phys., vol. 90, pp. 5622–5629, 1989.

[51] L. A. Curtiss, C. Jones, G. W. Trucks, K. Raghavachari, and J. A. Pople, “Gaussian-1 theory of molecular energies for second-row compounds,” J. Chem. Phys., vol. 93, pp. 2537–2545, 1990.

[52] L. A. Curtiss, K. Raghavachari, G. W. Trucks, and J. A. Pople, “Gaussian-2 theory for molecular energies of first- and second-row compounds,” J. Chem. Phys., vol. 94, pp. 7221–7230, 1991.

[53] L. A. Curtiss, K. Raghavachari, P. C. Redfern, V. Rassolov, and J. A. Pople, “Gaussian-3 (G3) theory for molecules containing first and second-row atoms,” J. Chem. Phys., vol. 109, pp. 7764–7776, 1998.

[54] J. A. Montgomery Jr., M. J. Frisch, J. W. Ochterski, and G. A. Petersson, “A complete basis set model chemistry. VI. Use of density functional geometries and frequencies,” J. Chem. Phys., vol. 110, pp. 2822–2827, 1999.

[55] J. A. Montgomery Jr., M. J. Frisch, J. W. Ochterski, and G. A. Petersson, “A complete basis set model chemistry. VII. Use of the minimum population localization method,” J. Chem. Phys., vol. 112, pp. 6532–6542, 2000.

[56] A. D. Becke, “Density-functional exchange-energy approximation with correct asymptotic-behavior,” Phys. Rev. A, vol. 38, pp. 3098–3100, 1988.

[57] R. A. Kendall, T. H. Dunning, Jr., and R. J. Harrison, “Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions,” J. Chem. Phys., vol. 96, pp. 6796–6806, 1992.

[58] T. H. Dunning, Jr. and K. A. Peterson, “Approximating the basis set dependence of coupled cluster calculations: Evaluation of perturbation theory approximations for stable molecules,” J. Chem. Phys., vol. 113, pp. 7799–7808, 2000.

[59] R. S. Mulliken, “Electronic population analysis on LCAO-MO molecular wave functions. I,” J. Chem. Phys., vol. 23, pp. 1833–1840, 1955.

[60] F. L. Hirshfeld, “Bonded-atom fragments for describing molecular charge densities,” Theor. Chim. Acta, vol. 44, pp. 129–138, 1977.

[61] C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman, “A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: The RESP model,” J. Phys. Chem., vol. 97, pp. 10269–10280, 1993.
[62] A. V. Marenich, S. V. Jerome, C. J. Cramer, and D. G. Truhlar, “Charge model 5: An extension of Hirshfeld population analysis for the accurate description of molecular interactions in gaseous and condensed phases,” J. Chem. Theory Comput., vol. 8, no. 2, pp. 527–541, 2012.

[63] B. H. Besler, K. M. Merz Jr., and P. A. Kollman, “Atomic charges derived from semiempirical methods,” J. Comput. Chem., vol. 11, pp. 431–439, 1990.

[64] W. J. Hehre, R. Ditchfield, and J. A. Pople, “Self-consistent molecular orbital methods. XII. Further extensions of Gaussian-type basis sets for use in molecular orbital studies of organic molecules,” J. Chem. Phys., vol. 56, pp. 2257–2261, 1972.

[65] M. M. Francl, W. J. Pietro, W. J. Hehre, J. S. Binkley, M. S. Gordon, D. J. DeFrees, and J. A. Pople, “Self-consistent molecular orbital methods. XXIII. A polarization-type basis set for second-row elements,” J. Chem. Phys., vol. 77, pp. 3654–3665, 1982.

[66] T. Clark, J. Chandrasekhar, G. W. Spitznagel, and P. V. R. Schleyer, “Efficient diffuse function-augmented basis sets for anion calculations. III. The 3-21+G basis set for first-row elements, Li-F,” J. Comput. Chem., vol. 4, pp. 294–301, 1983.

[67] P. M. Gill, B. G. Johnson, J. A. Pople, and M. J. Frisch, “The performance of the Becke-Lee-Yang-Parr (B-LYP) density functional theory with various basis sets,” Chem. Phys. Lett., vol. 197, pp. 499–505, 1992.

[68] R. L. Rowley, W. V. Wilding, J. L. Oscarson, Y. Yang, and N. F. Giles, Data Compilation of Pure Chemical Properties (Design Institute for Physical Properties). New York: American Institute of Chemical Engineers, 2012.

[69] D. R. Lide, CRC Handbook of Chemistry and Physics, 90th edition. Cleveland, Ohio: CRC Press, 2009.

[70] C. L. Yaws, Yaws’ Handbook of Thermodynamic Properties for Hydrocarbons and Chemicals. http://www.knovel.com: Knovel, 2009.

[71] C. L. Yaws, Yaws’ Critical Property Data for Chemical Engineers and Chemists. http://www.knovel.com: Knovel, 2012.

[72] P. Cieplak, F.-Y. Dupradeau, Y. Duan, and J. Wang, “Polarization effects in molecular mechanical force fields,” J. Phys.: Condens. Matter, vol. 21, p. 333102, 2009.

[73] B. Jeziorski, R. Moszynski, and K. Szalewicz, “Perturbation theory approach to intermolecular potential energy surfaces of van der Waals complexes,” Chem. Rev., vol. 94, no. 7, pp. 1887–1930, 1994.

[74] V. Magnasco and R. McWeeny, Theoretical Models of Chemical Bonding, vol. 4. New York, USA: Springer, 1991.

[75] T. M. Parker, L. A. Burns, R. M. Parrish, A. G. Ryno, and C. D. Sherrill, “Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency and performance for interaction energies,” J. Chem. Phys., vol. 140, no. 9, p. 094106, 2014.

[76] D. M. Elking, G. A. Cisneros, J.-P. Piquemal, T. A. Darden, and L. G. Pedersen, “Gaussian multipole model (GMM),” J. Chem. Theory Comput., vol. 6, pp. 190–202, 2010.

[77] R. Hentschke, E. M. Aydt, B. Fodi, and E. Schöckelmann, Molekulares Modellieren mit Kraftfeldern. Wuppertal, Germany: Bergische Universität Wuppertal, 2004.

[78] B. G. Dick and A. W. Overhauser, “Theory of the dielectric constants of alkali halide crystals,” Phys. Rev., vol. 112, pp. 90–103, 1958.

[79] P. C. Jordan, P. J. van Maaren, J. Mavri, D. van der Spoel, and H. J. C. Berendsen, “Towards phase transferable potential functions: Methodology and application to nitrogen,” J. Chem. Phys., vol. 103, pp. 2272–2285, 1995.

[80] P. E. M. Lopes, J. Huang, J. Shim, Y. Luo, H. Li, B. Roux, and A. D. MacKerell, Jr., “Polarizable force field for peptides and proteins based on the classical Drude oscillator,” J. Chem. Theory Comput., vol. 9, pp. 5430–5449, 2013.

[81] P. J. D. Lindan and M. J. Gillan, “Shell-model molecular dynamics simulation of superionic conduction in CaF₂,” J. Phys.: Condens. Matter, vol. 5, pp. 1019–1030, 1993.

[82] P. J. D. Lindan, “Dynamics with the shell model,” Mol. Simul., vol. 14, no. 4-5, pp. 303–312, 1995.

[83] P. J. van Maaren and D. van der Spoel, “Molecular dynamics simulations of water with a novel shell-model potential,” J. Phys. Chem. B, vol. 105, pp. 2618–2626, 2001.

[84] G. Mie, “Zur kinetischen Theorie der einatomigen Körper,” Annalen der Physik, vol. 316, pp. 657–697, 1903.

[85] J. E. Jones, “On the determination of molecular fields,” Proc. R. Soc. London Ser. A, vol. 106, pp. 463–477, 1924.

[86] R. A. Buckingham, “The classical equation of state of gaseous helium, neon and argon,” Proc. R. Soc. London Ser. A, vol. 168, pp. 264–283, 1938.

[87] L. P. Wang, J. Chen, and T. Van Voorhis, “Systematic parametrization of polarizable force fields from quantum chemistry data,” J. Chem. Theory Comput., vol. 9, pp. 452–460, 2013.

[88] R. G. Parr, R. A. Donnelly, M. Levy, and W. E. Palke, “Electronegativity: The density functional viewpoint,” J. Chem. Phys., vol. 68, pp. 3801–3807, 1978.

[89] R. G. Parr and R. G. Pearson, “Absolute hardness: Companion parameter to absolute electronegativity,” J. Am. Chem. Soc., vol. 105, no. 26, pp. 7512–7516, 1983.

[90] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” Numer. Math., vol. 14, pp. 403–420, 1970.

[91] D. A. Freedman, “Bootstrapping regression models,” Ann. Stat., vol. 9, pp. 1218–1228, 1981.

[92] E. S. Banjanovic and J. W. Osborne, “Confidence intervals for effect sizes: Applying bootstrap resampling,” Pract. Assess. Res. Eval., vol. 21, pp. 1–20, 2016.

[93] W. M. Bolstad, Introduction to Bayesian Statistics. Hoboken, New Jersey: John Wiley and Sons, Inc., 2004.

[94] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis. 6000 Broken Sound Parkway NW, Suite 300: CRC Press, Taylor and Francis Group, 2014.

[95] F. Facchinei, J. Júdice, and J. Soares, “Generating box-constrained optimization problems,” ACM Trans. Math. Softw., vol. 23, pp. 443–447, 1997.

[96] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts, USA: The MIT Press, 2012.

[97] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice. London: Chapman and Hall, 1996.

[98] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983.

[99] N. M. O’Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch, and G. R. Hutchison, “Open Babel: An open chemical toolbox,” J. Cheminf., vol. 3, p. 33, 2011.

[100] “SMARTS - A Language for Describing Molecular Patterns.” http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html, 2008.

[101] J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case, “Development and testing of a general AMBER force field,” J. Comput. Chem., vol. 25, pp. 1157–1174, 2004.

[102] T. Darden, D. York, and L. Pedersen, “Particle mesh Ewald: An N-log(N) method for Ewald sums in large systems,” J. Chem. Phys., vol. 98, pp. 10089–10092, 1993.

[103] F. Mazzocchi, “Could Big Data be the end of theory in science?,” EMBO Rep., vol. 16, pp. 1250–1255, 2015.

[104] D. A. Anapolitanos, “Theories and their models,” J. Gen. Philos. Sci., vol. 20, pp. 201–211, 1989.

[105] P. Achinstein, “Theoretical models,” Br. J. Philos. Sci., vol. 16, pp. 102–120, 1965.

[106] B. T. Thole, “Molecular polarizabilities with a modified dipole interaction,” Chem. Phys., vol. 59, pp. 341–345, 1981.

[107] M. Levitt, M. Hirshberg, R. Sharon, and V. Daggett, “Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution,” Comput. Phys. Commun., vol. 91, no. 1, pp. 215–231, 1995.

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1793 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-380687 2019