
20 YEARS OF POLARIZABLE

FORCE FIELD DEVELOPMENT for biomolecular systems

Thor van Heesch

Supervised by Daan P. Geerke and Paola Gori-Giorg

Contents

1 Introduction
2 Force fields: Basics, Caveats and Extensions
  2.1 The Classical Approach
  2.2 The caveats of point-charge electrostatics
  2.3 Physical phenomenon of polarizability
  2.4 Common implementation methods of electronic polarization
  2.5 Accounting for anisotropic interactions
3 Knitting the reviews and perspectives together
  3.1 Polarizability: A smoking gun?
  3.2 New branches of electronic polarization
  3.3 Descriptions of electrostatics
  3.4 Solvation and polarization
  3.5 The rise of new challenges
  3.6 Parameterization or polarization?
  3.7 Enough response: how far away?
  3.8 The last perspectives
  3.9 A new hope: the next-generation force fields
4 Learning with machines
  4.1 Replace the functional form with machine learned force fields
  4.2 A different take on polarizable force fields
  4.3 Are transferable parameters a universal requirement?
  4.4 The difference between derivation and prediction
  4.5 From small molecules to long range interactions
  4.6 Enough knowledge to fold a protein?
  4.7 Boltzmann generators, a not so hypothetical machine anymore
5 Summary: The Red Thread

Abstract

In this literature study we aimed to answer the following question: What has changed in the outlook on polarizable force field development during the last 20 years? The theory, history, methods, and applications of polarizable force fields are discussed to address this question. This investigation showed that the quality of the force field potential is decisive for any type of atomistic sampling technique. If the underlying energy function contains flaws, these flaws are, one way or another, embedded in the vast amount of data that we seek to understand in order to reproduce a wide range of quantifiable observables accurately. Subsequently, we identified two key challenges associated with the development of polarizable force fields: first, how to determine transferable and accurate parameter sets, and second, how to advance the underlying physical model without computational overhead. After twenty years on the classical avenue of force field development, the modeling community has finally made the transition towards a general acceptance of the need to develop more physically sound models. During the last 5 years machine learning techniques have emerged that provide new means to remove bottlenecks in the current process towards the development of an accurate force field potential.

1 Introduction

Next-generation atomistic force fields include polarization effects for the simulation of biomolecular systems. How this change came about is a different story; to explain this generational transition we begin our study in the late 1950s. Physicists Bernie Alder and Thomas Wainwright were the first to translate digital computation into the study of many-particle systems.[1] Eventually, the offspring of their research brought the simulation method called molecular dynamics (MD) to reality. Currently, classical MD simulation methods are being applied to study a multitude of physical, chemical, and biological systems, ranging from pure liquids to large complex systems such as proteins and cell membranes. [2, 3] As a result, atomistic simulations have become an important tool to understand fundamental processes of biological systems.

Since the pioneering work of Alder and Wainwright, computing performance has increased by more than a trillion-fold.[4] This rapid development of digital machines led to the expansion of system sizes and an increase of timescales. This was not the only advancement: technological progress equally inspired the drive to search for faster, more efficient and more accurate underlying physical models for our simulation methods. In response, a diverse set of atomistic simulation methods developed (co-)independently for the simulation of electrolytes, ionic liquids, metal-organic frameworks, biomolecular systems, and other types of nano-materials.

In this study we will keep our attention focused on the simulation of biomolecular systems. For this particular simulation field, the inclusion of explicit polarization effects has dominated the evolutionary process towards obtaining an improved description of the systems under investigation.
To understand the reasons why the biomolecular simulation community chose to include polarization effects into the atomistic model, we will let ourselves be inspired by the following remarkable observation: a large number of reviews specifically address recent developments in polarizable force fields: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. This (incomplete) selection of publications together accounts for more than 2000 citations according to Google Scholar. The period that these reviews span, from 2001 till the present, amounts to almost two decades. Therefore, it might be fruitful to ask ourselves the following question: what has changed in the outlook on polarizable force field development during the last 20 years?

First, this approach will allow us to assess the progress made and may show us the edge of where polarizable force fields are right now. Second, this method could expose whether there are persistent factors that hamper our progress towards a polarizable force field for general use in biomolecular simulation. After the evaluation of this question we will discuss the open ends and look forward to how these new challenges can be solved. This will include the exploration of how Machine Learning (ML) methods and novel sampling techniques can aid the development of general-use biomolecular force fields. Finally, we will summarise these efforts and provide an outlook on the current status of polarizable force field development for biomolecular simulations.

2 Force fields: Basics, Caveats and Extensions

Before we proceed with an analysis of the aforementioned reviews, we will provide the necessary background information to understand what is about to be discussed. The next sections will therefore explain the basics behind force fields together with the caveats of choosing a model that is based on fixed point charges, i.e., a non-polarizable force field. The first section discusses the components of the potential-energy function in terms of bonded and non-bonded interactions and continues seamlessly with the caveats related to this classical approach. The next section will focus on the methods used to refine the model for its intent and purpose. Finally, the contributions of anisotropic charge distributions are discussed in the last section.

2.1 The Classical Approach

In molecular dynamics (MD) simulations the atoms are more often than not treated as point-like particles. Their reciprocal interactions, in combination with the dynamic equations of motion, determine how the system will evolve over time. This simple approach is well suited to simulate the collective behavior of atoms in molecular structures and ensembles. Moreover, when a few assumptions are set aside, this level of theory can determine both the micro- and macroscopic properties of the respective system in line with expectations. [17] However, there is still plenty of room for uncertainty to develop during the course of the simulation. The biggest assumption that makes atomistic simulations to some extent a conjecture is the lack of an explicit description of the electrons. In addition, this omission implies that the system is always assumed to be in the electronic ground state.

As it were, these factors seem like a major shortcoming for creating a faithful molecular model. But the simplicity of the atomistic model is also its greatest strength. Accounting for the electronic behavior implicitly allows for long simulation times up to the regime of µs, while maintaining relatively low computational cost.[18, 19, 20] Given the reason for treating atoms as simple particles, how do we resolve the need for inclusion of fundamental electrostatic effects without turning to on-the-fly quantum mechanical calculations?

Atomistic simulation methods have developed an implicit way to account for the existence and effects of electrons. The electronic energy is formulated as a parametric function of the nuclear coordinates, and the corresponding parameters are subsequently fitted to experimental or higher-level computational data. Such a parameterised potential that describes all forces in the system is called a force field.
This description allows us to think more pragmatically about what chemistry conveys in molecular simulations. Simply put, chemistry becomes knowing the energy as a function of the nuclear coordinates, and molecular properties become knowing how the energy changes upon adding a perturbation to the system. [17] Determining the physics using this approach is justified in classical mechanics, because this is the basis of methods that give access to the Boltzmann-weighted ensembles from which macroscopic properties of the system directly follow. [21] Still, there are numerous ways to construct such energy functions. In the following section we will therefore discuss the most important components for the construction of a force field.

2.1.1 An atomistic potential-energy function

Figure 1: Schematic illustration of the terms in a classical fixed-charge force field, i.e. bond stretching ($E_{\mathrm{bond}}$), bond-angle bending ($E_{\mathrm{angle}}$), dihedral-angle torsion ($E_{\mathrm{torsion}}$), and improper dihedral-angle bending ($E_{\mathrm{improper}}$) as well as van der Waals ($E_{\mathrm{vdW}}$) and electrostatic ($E_{\mathrm{ele}}$) interactions. Figure courtesy of ref. [22]

The majority of force fields describe the interactions between the atoms in the system via a potential energy function of the atomic coordinates. The total molecular potential energy is then composed of many terms and its exact form is unknown. Today, classical biomolecular force fields use essentially the same gross approximation for the potential energy as proposed by Levitt and Lifson in 1969:[23]

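$$E_{\mathrm{pot}}(\vec{r}) = E_{\mathrm{spr}}(\vec{r}) + E_{\mathrm{ang}}(\vec{r}) + E_{\mathrm{dih}}(\vec{r}) + E_{\mathrm{imp}}(\vec{r}) + E_{\mathrm{vdW}}(\vec{r}) + E_{\mathrm{ele}}(\vec{r}) \tag{1}$$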

Here, $E_{\mathrm{pot}}$ is the potential-energy function of the system and $\vec{r}$ are the coordinates of all atoms in the system. $E_{\mathrm{spr}}$, $E_{\mathrm{ang}}$, $E_{\mathrm{dih}}$ and $E_{\mathrm{imp}}$ are the bonded terms for bond stretching, bond-angle bending, dihedral-angle torsion, and improper dihedral-angle bending (or out-of-plane distortions) in the description of molecules. Together these terms account for all the energy contributions due to covalent interactions: $E_{\mathrm{cov}} = E_{\mathrm{spr}}(\vec{r}) + E_{\mathrm{ang}}(\vec{r}) + E_{\mathrm{dih}}(\vec{r}) + E_{\mathrm{imp}}(\vec{r})$. The interactions between atoms that are not directly connected via covalent bonds and bond angles are collected in the potential-energy term for all nonbonded interactions: $E_{\mathrm{nb}} = E_{\mathrm{vdW}} + E_{\mathrm{ele}}$. Here the van der Waals term, $E_{\mathrm{vdW}}$, accounts for dispersion and core-core repulsion, and the latter term, $E_{\mathrm{ele}}$, for the electrostatic interactions of the system. Together with the potential-energy terms for the covalent interactions this again sums up to the full potential-energy expression of the force field:

$$E_{\mathrm{pot}} = E_{\mathrm{cov}} + E_{\mathrm{nb}} \tag{2}$$

See figure 1 for a schematic illustration of the terms in a classical fixed-charge force field. Additional terms can certainly be added to this general expression for $E_{\mathrm{pot}}$, e.g., hydrogen-bonding interaction terms (used in the early days), cross-terms that describe coupling between the first three terms of the covalent interactions, or other types of restraining potential-energy functions that prevent the system from drifting away. [17] However, these extra descriptors are usually correction terms that compensate for omissions made in the other terms in order to still obtain realistic free energy profiles, rather than improvements of the description of the other terms. Despite the major importance of all the electronic effects, the partial charges of the atoms are fixed during simulations that are based upon classical force fields. Historically, this choice was made due to the high computational cost and added complexity associated with more physics-based force fields. [24]

Starting from this decision, the continuous development of fixed-charge force fields has actually proven to be surprisingly successful. A recent publication by S. Riniker illustrates the development of this class of force fields and provides an overview of the major force-field families, with detailed discussions about the different covalent and nonbonded force-field terms, parametrization strategies and the historic background of these models. [22] Therefore, the aspects mentioned above revolving around fixed-charge (additive) force fields will only be touched upon in this literature study. Instead, we will mainly focus on the effects and importance of extending the description of the classical electrostatics in biomolecular simulations.

2.1.2 Van der Waals potential

In (almost) all fixed-charge force fields, the van der Waals interactions are described using a 12-6 Lennard-Jones (LJ) functional form

$$E^{\mathrm{vdW}}_{ij}(r_{ij}) = \frac{C_{12}(i,j)}{r_{ij}^{12}} - \frac{C_{6}(i,j)}{r_{ij}^{6}} \tag{3}$$

with $C_{12}(i,j) = 4\epsilon_{ij}\sigma_{ij}^{12}$ and $C_{6}(i,j) = 4\epsilon_{ij}\sigma_{ij}^{6}$, where $r_{ij}$ is the distance between atoms $i$ and $j$, $\epsilon_{ij}$ denotes the depth of the attractive well, and $\sigma_{ij}$ the inter-particle distance at which the potential changes sign.[25, 26]

The function of the van der Waals energy, $E_{\mathrm{vdW}}$, is to describe the repulsion or attraction between atoms that are not directly bonded. Often $E_{\mathrm{vdW}}$ is interpreted as the non-polar part of the interaction, not related to the electrostatic energy due to charges.[17] The atoms in the system are grouped into atom types, each of which has its own LJ parameters and typically represents atoms in a specific chemical environment, for example, a nitrogen in an amine group or an oxygen in a hydroxyl group.

However, it should be noted that this widely used non-bonded term also has some (major) flaws embedded. Recently, Frenkel et al. noted that the LJ potential, which was originally derived to describe the cohesive energy of crystals of noble gases, is now used so often for systems where it is not expected to be particularly realistic that its disadvantages have become very relevant.[27] The implications of the arguments made against the LJ 12-6 potential can be summarised as follows: the LJ 12-6 potential is anything but a well-defined standard and, in particular for proteins and nano-colloids, it is not a good model, mainly because the range of attraction is too large compared to the effective diameter. To top it off, Frenkel et al. argue that there is no obvious indication of why a truncated LJ 12-6 potential should have any special merits that outweigh its disadvantages.[27] Instead, they argue that no such advantage exists and present a simple model pair potential to unify all the different, ambiguous truncation methods used in the past.
In contrast to all the shifted, truncated and interpolated LJ 12-6 models, once a cut-off distance, $r_c$, is specified the model is uniquely defined.¹ If required, their class of LJ-like potentials could be used for numerical studies of systems of particles with short-ranged attraction. For example, a set of thermodynamic and transport properties is reported for the cases $r_c$ = 2.0 (“atomic liquid”) and $r_c$ = 1.2 (“colloidal suspension”).[27] In short, the main message is that we should be careful with continuing trends based on old habits during the construction of next-generation force fields.
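As an illustration of the 12-6 LJ form of equation 3, the following minimal Python sketch evaluates the pair energy from a well depth and a zero-crossing distance. The function names and the numerical values of the well depth and zero-crossing distance are hypothetical and chosen only for illustration; they are not parameters of any particular force field.

```python
def lj_coefficients(epsilon, sigma):
    """Return (C12, C6) from the well depth epsilon and zero-crossing distance sigma."""
    c12 = 4.0 * epsilon * sigma**12
    c6 = 4.0 * epsilon * sigma**6
    return c12, c6


def lj_energy(r, c12, c6):
    """12-6 Lennard-Jones pair energy: repulsive r^-12 term minus attractive r^-6 term."""
    return c12 / r**12 - c6 / r**6


# Hypothetical parameters (illustration only).
EPSILON = 1.0   # well depth, kJ/mol
SIGMA = 0.34    # distance at which the potential changes sign, nm

C12, C6 = lj_coefficients(EPSILON, SIGMA)

# The minimum of the 12-6 form lies at 2^(1/6) * sigma.
R_MIN = 2.0 ** (1.0 / 6.0) * SIGMA

if __name__ == "__main__":
    for r in (SIGMA, R_MIN, 2.5 * SIGMA):
        print(f"r = {r:.4f} nm  E = {lj_energy(r, C12, C6):+.6f} kJ/mol")
```

With these definitions the energy vanishes at the zero-crossing distance and reaches its minimum, equal to minus the well depth, at 2^(1/6) times that distance, which can serve as a quick sanity check on an LJ implementation.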

2.1.3 Electrostatic energy

The electrostatic part of the nonbonded interaction should account for the internal (re)distribution of the electrons, creating positive and negative parts of the molecule. [10] At the lowest order of approximation, this electronic behaviour can be modeled by assigning a bond dipole to the bond or, more conventionally, partial charges to each atom. For example, in classical fixed-charge force fields, the pairwise Coulomb interaction between the point charges of atoms $i$ and $j$ is considered as

$$E^{\mathrm{ele}}_{ij}(r_{ij}) = \frac{q_i q_j}{4\pi\varepsilon_0\varepsilon_1}\,\frac{1}{r_{ij}} \tag{4}$$

where $q$ is the partial charge, $\varepsilon_0$ the electric constant, $\varepsilon_1$ the background dielectric permittivity, and $r_{ij}$ the distance between atoms $i$ and $j$.[22] The classical electrostatic interactions can be viewed as the remaining long-range part, $1/r$, of the quantum mechanical electrostatic interactions, after all short-range contributions like bonds, repulsion and dispersion are removed. Still, obtaining a good description of the electrostatic interaction between molecules (or between different parts of the same molecule) is one of the big challenges in force field development. [10] The predictive ability of MD simulations relies on the accuracy of the underlying force field.

¹ The united model potential vanishes quadratically at a cut-off distance $r_c$ and represents no specific substance. The authors stress that their models rather represent generic models. However, the authors mention, and I quote, "very often the ubiquitous LJ 12-6 potential is used in exactly the same way."[27]
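The pairwise Coulomb expression of equation 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: charges are in units of e, distances in nm and energies in kJ/mol, so that 1/(4πε0) takes the value of roughly 138.935 kJ mol^-1 nm e^-2; all pairs are summed without exclusions or cut-off; and the function names, charges and positions are hypothetical.

```python
import math
from itertools import combinations

# 1/(4*pi*eps0) expressed in kJ mol^-1 nm e^-2.
COULOMB_CONSTANT = 138.935458


def coulomb_pair_energy(q_i, q_j, r_ij, eps_rel=1.0):
    """Coulomb energy of one pair of point charges (e, nm -> kJ/mol)."""
    return COULOMB_CONSTANT * q_i * q_j / (eps_rel * r_ij)


def total_coulomb_energy(charges, positions, eps_rel=1.0):
    """Sum the pair energy over all unique pairs.

    Assumption of this sketch: no bonded exclusions and no cut-off are applied.
    """
    energy = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r_ij = math.dist(positions[i], positions[j])
        energy += coulomb_pair_energy(charges[i], charges[j], r_ij, eps_rel)
    return energy


# Hypothetical set of three fixed point charges (illustration only).
CHARGES = [-0.8, 0.4, 0.4]                                        # e
POSITIONS = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0)]   # nm

E_TOTAL = total_coulomb_energy(CHARGES, POSITIONS)

if __name__ == "__main__":
    print(f"Total Coulomb energy: {E_TOTAL:.3f} kJ/mol")
```

Because the charges are constants in this sketch, the energy depends on the environment only through the distances, which is exactly the limitation of the fixed-charge approximation.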

2.2 The caveats of point-charge electrostatics

From equation 4 it is evident that not all types of electrostatic interactions are accounted for, and as such the deficiencies in the electrostatic potential that result from the fixed partial atomic charge approximation have been widely acknowledged over the past decade.[11, 10, 28, 12, 29, 9, 13, 30] Both Jensen and Hagler independently made special efforts to summarize all the components that are currently unaccounted for in most “standard” force fields and that are likely required to achieve experimental accuracy in (bio)molecular force fields.[17, 31] The following 8 points summarize the combined efforts in addressing the caveats associated with fixed point-charge force fields:

i) Electronic polarization effects are not accounted for, as in fixed point-charge models only two-body interactions are included. However, for polar bodies the three-body contribution is quite significant, perhaps 10–20% of the two-body term. [32] Moreover, the response of the molecular dipole moments to variations in dielectric conditions is not taken into account by the fixed-charge approximation models.[33, 34]

ii) The geometric dependence of charge and higher-order multipoles is omitted. [35, 36] Studies have shown that both partial atomic charges and higher-order electric moments have a significant dependence on the geometry. Consequently, these quantities cannot fulfill the requirement of transferability in parameter sets.

iii) The effects and energetics associated with charge transfer (CT) that occur between compounds, as for example in H-bonded complexes, are not accounted for in additive force fields. [37, 38, 39]

iv) The partial charge model cannot correctly model the electrostatics surrounding a molecule, resulting in errors in the electrostatic potential ranging from 10 to 20 kJ/mol.[17]

From section 2.1.2 we already concluded that the use of the standard LJ potential is not really justified. In addition, the following points will specifically address caveats hidden in the van der Waals potential and why these components should be accounted for or revisited.

v) The $r^{-6}$ dispersion term is an approximation omitting higher-order terms, which have been indicated to be important for close-packed systems such as globular proteins. [40, 41]

vi) Short-range electrostatic penetration (CP) effects due to electron cloud overlap are unaccounted for. [42, 43, 44, 45] CP effects arise naturally when two atoms, at a typical van der Waals distance apart, penetrate each other's electron clouds. The result is an attractive interaction, which cannot be described by a fixed partial charge model; the effect is instead accounted for implicitly in the van der Waals energy term.

vii) Another reason to revisit the van der Waals potential originates from the fact that atoms are not always spherical, which results in anisotropy of van der Waals parameters. [46, 47, 48, 45, 49]

viii) The need to account for anisotropic interactions has been further demonstrated by showing that inclusion of anisotropic exchange-repulsion, charge penetration, and dispersion effects, along with atomic multipoles, yielded substantive improvement in accounting for 91,000 dimer energies of 13 small organic compounds as well as several experimental systems. [50]

What should be noted from this extensive but most certainly incomplete list is that errors of a few kJ/mol are enough to draw the wrong qualitative conclusion from a simulation. In addition, as a more general observation about the points above, the origin of the errors made in standard force fields can be condensed to five topics that either require re-evaluation or an improved description. In summary, the 5 main caveats of point-charge electrostatics can be traced back to the omission or inadequate description of polarization, dispersion, anisotropy, charge transfer and charge penetration effects; see figure 2 for an illustration of these 5 themes.

Figure 2: Electron cloud representations that illustrate the 5 main caveats of point-charge electrostatics: electronic polarization, dispersion, anisotropy, charge transfer and charge penetration effects.

As mentioned in the first sentence of this study, the biomolecular simulation community chose to improve the classical potential by accounting for electronic polarization effects first. One of the main reasons is that we are interested in simulations of solvated biological macromolecules. In these types of simulations the electronic properties vary as a function of the environment, and electrostatic interactions in particular represent the dominant interaction in polar environments. The community therefore anticipated that including polarization effects would yield a more physically realistic and consistent model,[51] mainly because additive force fields do not provide information about the dependence of the charge distribution on the thermodynamic state of the system, nor do they resolve molecular motions causing fluctuations in the electric field.[52, 53]

In addition, point charges are inadequate to reflect the electrostatic interactions in systems with varying polarity, such as proteins. Besides, omitting explicit polarizability effects becomes a drawback when describing molecular interactions occurring in amphiphilic environments,[54, 55] as the contribution of electrostatic interactions to the system will be either overestimated or underestimated.[56] However, when electronic polarization effects are taken into account, the dipole moments of the molecular fragments in the core of the system are actually susceptible to the total electric field produced by the rest of the system as well as by their direct molecular environment. The inclusion of an extra set of polarizability parameters is therefore considered to be the answer to treating hydrophobic versus highly polar environments in a transferable way.[54] The perfect force field potential obviously does not only account for polarization effects; however, we hope that the message is clear: there is plenty of room to improve the classical force field.
Still, it remains to be seen if a combination of technological advancements and scientific goodwill will achieve a long overdue transition in atomistic simulations that will surpass the elementary description of electrostatics that atomic point charges currently offer.[17, 31, 30] In the remaining sections of this chapter we will therefore explore the physical phenomenon of polarizability in more depth and show how to upgrade the classical force field by explicitly accounting for polarization effects.

2.3 Physical phenomenon of polarizability

In general, the polarization of a body depends on the electric field strength. This relation can be explained as follows. All matter is built up of electrically charged particles, which at the core are either negative electrons or positive nuclei. Together these particles combine into neutral atoms and molecules, but in other cases the combination results in charged particles, for example, ions either in solutions or crystals. [57] What these constituents have in common is that during the act of polarization the electron density of these particles is reshuffled by an external electric field (usually generated by other molecules). [53] The electric field polarizes the electron density away from the nuclei, creating a depletion of charge on one side of the molecule (δ+) and an increase in charge on the other side (δ−). The result is a more asymmetric electron density distribution that causes a dipole along the field lines. The difference between the dipole moments before and after the application of the field is defined as the induced dipole moment, and the degree to which these induced dipoles align with (and augment) the electric field is referred to as polarizability. If a body shows an induced dipole moment differing from zero upon application of a uniform field, the body is said to be polarizable. [57]

What should be noticed is that the magnitude of the polarizability of a molecule correlates with the interaction between electrons and nuclei. The number of electrons affects how much influence the nuclei have on the overall charge distribution in the molecule. Fewer electrons means smaller and denser electron clouds, resulting in less pronounced shielding effects. For this reason small atoms are typically less prone to become polarized by an electric field, because their electrons are more tightly localized around their nucleus.
In contrast, large negative ions are easily polarized, since diffuse electron clouds and large atomic radii limit the interaction of the outer electrons with the nucleus.

In most cases the polarizability is well described as isotropic and scalar proportionality can be assumed. When the inducible dipole moments, $\vec{\mu}_i$, are assigned to (heavy) atoms $i$, they can subsequently be determined using equation 5, where $\alpha_i$ is the polarizability of the species with units of C m$^2$ V$^{-1}$, and $\vec{E}_i$ is the electric field at that site.[53, 52]

$$\vec{\mu}_i = \alpha_i (4\pi\varepsilon_0) \vec{E}_i \tag{5}$$

Here, the atomic or molecular polarizability can be represented as a tensor. In the case of isotropic polarizability, the tensor has three non-zero elements on the diagonal and $\alpha_{xx} = \alpha_{yy} = \alpha_{zz}$. In the case of anisotropic polarizability, the tensor is still symmetric ($\alpha_{\alpha\beta} = \alpha_{\beta\alpha}$; $\alpha, \beta = x, y, z$), with the diagonal elements generally not equal to each other and with non-zero off-diagonal elements (in an arbitrary coordinate system). We can write the induced dipole moment as

$$\begin{pmatrix} \mu_x \\ \mu_y \\ \mu_z \end{pmatrix} = \begin{pmatrix} \alpha_{xx} & \alpha_{xy} & \alpha_{xz} \\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz} \\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \tag{6}$$

Often, the values of the off-diagonal elements are not known experimentally, although in the past various algorithms have been proposed to derive the atomic polarizability tensor in a molecule from ab initio calculations.[58, 59] Yet, this may not really be necessary to capture the major polarization effects in case the polarizability is not very anisotropic, i.e., $\alpha_{xx}$, $\alpha_{yy}$, and $\alpha_{zz}$ are not very different from each other. For example, for water the diagonal components of the molecular polarizability are $\alpha_{xx} = 1.415$, $\alpha_{yy} = 1.528$, and $\alpha_{zz} = 1.468$ in units of $4\pi\varepsilon_0 \cdot 10^{-3}$ nm$^3$ [84], where the x-axis is perpendicular to the plane of the molecule and the y-axis is parallel to the line connecting the two hydrogen atoms. Since these values differ by less than 4% from their average, using an isotropically polarizable dipole in a water molecule is a simple yet accurate way to account for its polarizability.

Furthermore, note that the polarizability of a single atom is isotropic. But if one considers the polarizability of larger systems of covalently bonded atoms, the anisotropy has to be considered, meaning that the induced dipole moment and the electric field may have different orientations. [52] This directional effect arises when a molecule or ion is not completely spherical, so that the molecular polarizability cannot be described as a scalar. Thus, for a complete description the use of a polarizability tensor is necessary (while still assuming that the effects remain linear). [57] When the anisotropic component of the polarizability should be taken into account is discussed further in section 2.5.
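The water example can be checked numerically with a minimal Python sketch that applies a polarizability tensor to a field vector, as in equation 6, and compares the anisotropic response with the isotropic approximation. The assumptions of this sketch are that the tensor is expressed in its principal-axis frame (zero off-diagonal elements), that the 4πε0 prefactor is absorbed into the units of the polarizability quoted in the text, and that the field vector is an arbitrary, hypothetical choice; the function names are illustrative.

```python
def induced_dipole(alpha, field):
    """Return mu_a = sum_b alpha_ab * E_b for a 3x3 tensor and a 3-vector."""
    return [sum(alpha[a][b] * field[b] for b in range(3)) for a in range(3)]


def diagonal_tensor(xx, yy, zz):
    """Polarizability tensor in its principal-axis frame (zero off-diagonals)."""
    return [[xx, 0.0, 0.0], [0.0, yy, 0.0], [0.0, 0.0, zz]]


# Diagonal components for water quoted in the text,
# in units of 4*pi*eps0 * 1e-3 nm^3.
ALPHA_XX, ALPHA_YY, ALPHA_ZZ = 1.415, 1.528, 1.468
ALPHA_ISO = (ALPHA_XX + ALPHA_YY + ALPHA_ZZ) / 3.0

ALPHA_ANISO = diagonal_tensor(ALPHA_XX, ALPHA_YY, ALPHA_ZZ)
ALPHA_ISOTROPIC = diagonal_tensor(ALPHA_ISO, ALPHA_ISO, ALPHA_ISO)

# Largest relative deviation of a diagonal component from the average.
MAX_REL_DEVIATION = max(
    abs(a - ALPHA_ISO) / ALPHA_ISO for a in (ALPHA_XX, ALPHA_YY, ALPHA_ZZ)
)

# Hypothetical field vector in arbitrary units (illustration only).
FIELD = [1.0, 1.0, 1.0]
MU_ANISO = induced_dipole(ALPHA_ANISO, FIELD)
MU_ISO = induced_dipole(ALPHA_ISOTROPIC, FIELD)

if __name__ == "__main__":
    print("average polarizability:", round(ALPHA_ISO, 4))
    print("max relative deviation:", round(MAX_REL_DEVIATION, 4))
    print("anisotropic dipole:", MU_ANISO)
    print("isotropic dipole:  ", MU_ISO)
```

For a strongly anisotropic tensor the two dipoles would differ in both magnitude and direction, which is the situation in which the scalar approximation breaks down.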

2.4 Common implementation methods of electronic polarization

Polarization is a response property; in terms of molecular interactions this means that we are dealing with non-additivity: if molecule A is polarized by molecule B, A will interact differently with a third molecule C than if A had not been polarized by B. We thus have to find a way for our model to account for this non-additive response to its environment. Over the past decade this important question of how to explicitly treat polarization effects has been answered in various ways. Hence, multiple families of force fields have explored and implemented polarization effects in order to model atomic polarization during molecular simulations. However, by far the three most popular approaches are the induced point dipole (IPD), Drude oscillator (DO) and fluctuating charge (FQ) models. Figure 3 illustrates the electron clouds of the three different implementation methods compared to the fixed-charge model.

Figure 3: Electron cloud represented by the fixed-charge model and polarization models. Both the induced point dipole and the Drude oscillator models can represent the deformation of the electron cloud (dipole induction). The fluctuating charge model can represent the redistribution of charges within a molecule, while the induced dipole or the Drude oscillator models are required to represent the electron deformation of monoatomic ions and out-of-plane polarization. Figure courtesy of ref. [13]

By accounting for the extra polarization term, the total electrostatic energy is now a sum of the Coulomb energy between all the charges and dipoles in the system and a self-energy term corresponding to the work needed to change the charge distribution: [13]

$$E_{\mathrm{elec}} = E_{\mathrm{self}} + E_{\mathrm{Coulomb}} \tag{6}$$

To account for the non-additive polarization effects we only have to define and solve the expressions for the self-energy corresponding to the polarization model. The following paragraphs will describe the three main polarizable models in more detail.

2.4.1 Fluctuating charge

In the fluctuating charge (FQ) model the partial charge of each atom is placed at the site of the atomic nucleus. The basis of this model is the most similar to that of the classical non-polarizable force fields. However, the partial charges in the FQ treatment are not static. Instead, they are free to change during the simulation. This dynamic flow of charge allows the model to account for polarization in the system. FQs are implemented by assigning fictitious masses to the charges, which are then treated as separate degrees of freedom in the equations of motion. Charge keeps flowing between atoms until the instantaneous electronegativities on the atomic sites are equalized.[60] These instantaneous electronegativities depend on the electronegativity of the atom, the hardness that represents the resistance to the flow of electrons from or towards the atom, and lastly the external electrostatic potential.

The main advantage of the FQ force field is that no interaction terms beyond those of classical force fields are required, and the electrostatic interactions are calculated using a standard Coulomb potential. The model thus represents polarization along the direction of atomic bonds in a cost-effective manner. Through-bond and through-space polarization are treated equivalently in the FQ model, which in principle also enables the description of intermolecular charge transfer.[17] However, the major drawback of the FQ approach becomes apparent when polarization in any direction other than along the bonds should be accounted for.[11] Take for example an aromatic molecule such as benzene: the FQ model cannot account for the charge polarization when the external field is perpendicular to the plane of the molecule. This lack of out-of-plane polarization can easily be remedied by including additional point charges.[61, 62] Up until now the charge equilibration (CHEQ) force field has put the most effort into applying the FQ model in biomolecular simulations.[63, 64] In addition, the ABEEMσπ polarizable force field, which is also based upon the FQ principle, has been made applicable to the simulation of complexes of base pairs with amino acid residues.[65]
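The electronegativity equalization described above can be illustrated with a minimal two-site sketch. It assumes a quadratic site energy with hypothetical parameters in arbitrary units (`chi` for the electronegativity, `hardness`, and a site-site Coulomb `coupling`); none of these values come from a published force field. The sketch also shows the directional limitation mentioned in the text: a field along the bond shifts charge, whereas a field perpendicular to it leaves the charges unchanged.

```python
def external_potential(position, field):
    """Potential of a uniform field E at `position`: phi(r) = -E . r."""
    return -sum(e * x for e, x in zip(field, position))


def equalize_two_sites(chi1, chi2, j1, j2, j12, phi1=0.0, phi2=0.0):
    """Charge on site 1 of a neutral pair (site 2 carries the opposite charge).

    Assumed energy model with q1 = q and q2 = -q:
      E(q) = (chi1 + phi1) q - (chi2 + phi2) q + 0.5 (j1 + j2) q^2 - j12 q^2
    Setting dE/dq = 0 equalizes the two instantaneous electronegativities.
    """
    stiffness = j1 + j2 - 2.0 * j12
    if stiffness <= 0.0:
        raise ValueError("hardness must outweigh the site-site coupling")
    return ((chi2 + phi2) - (chi1 + phi1)) / stiffness


def site_electronegativity(chi, hardness, coupling, q_self, q_other, phi=0.0):
    """Instantaneous electronegativity dE/dq_self of one site."""
    return chi + phi + hardness * q_self + coupling * q_other


# Two identical sites on the x axis (hypothetical parameters, arbitrary units).
sites = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
chi, hardness, coupling = 5.0, 10.0, 2.0


def induced_charge(field):
    """Charge that flows onto site 1 in a uniform external field."""
    phi1, phi2 = (external_potential(r, field) for r in sites)
    return equalize_two_sites(chi, chi, hardness, hardness, coupling, phi1, phi2)


q_along_bond = induced_charge((1.0, 0.0, 0.0))     # field parallel to the bond
q_perpendicular = induced_charge((0.0, 0.0, 1.0))  # field normal to the bond
print(q_along_bond, q_perpendicular)
```

For a planar molecule the same argument holds for every atom in the plane: a perpendicular field gives all sites the same external potential, so no electronegativity difference arises and no charge flows.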

2.4.2 Charge on spring

The Drude oscillator (DO) or charge-on-spring (COS) model also works on the basis of representing an inducible dipole moment at polarizable sites.[66, 67, 52] Each polarizable atom is depicted by a pair of point charges. However, what is different compared to the ID model is that one partial charge is placed at the core of the atom and the other charge is attached to it via a harmonic spring. Summing these two charges yields the total partial charge of the site, and the magnitude of this total partial charge is kept constant during the whole of the simulation. The charge on the spring has no mass and is called the Drude particle; its use is similar to that of pseudo-atoms for modeling lone pairs. The Drude particle is free to move anywhere around the atomic center in response to the electrostatic environment. The resulting displacement gives rise to atomic dipole polarizability.[68] This approach is considered to be the most intuitive from a chemical point of view, as the two particles carrying the partial charges can be regarded as the nucleus and the electron cloud of the atom, which together mimic the polarization effects. Several groups have put considerable effort into the development of Drude oscillator-based polarizable force fields, but thus far only MacKerell et al. have made their polarizable force field (based on the CHARMM package) suitable for the simulation of biomolecules.[51, 12]
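How the displacement of the Drude particle produces a dipole can be made explicit with a short sketch. It assumes a uniform external field, reduced units (Coulomb constant set to one), and hypothetical values for the Drude charge `q_d` and spring constant `k_d`. At equilibrium the spring force balances the electrostatic force, so the induced dipole is proportional to the field with an effective polarizability of q_d²/k_d.

```python
def drude_response(q_d, k_d, field):
    """Equilibrium displacement and induced dipole of a Drude particle.

    Force balance on the massless particle: k_d * d = q_d * E.
    """
    displacement = [q_d * e / k_d for e in field]
    dipole = [q_d * d for d in displacement]
    return displacement, dipole


def drude_polarizability(q_d, k_d):
    """Effective isotropic polarizability implied by the force balance."""
    return q_d ** 2 / k_d


# Hypothetical parameters in reduced units.
q_d, k_d = -2.0, 4.0
field = (0.0, 0.0, 0.5)

displacement, dipole = drude_response(q_d, k_d, field)
alpha = drude_polarizability(q_d, k_d)
print(displacement, dipole, alpha)
```

The negatively charged particle moves against the field, while the resulting dipole points along it, as expected for an electron cloud displaced relative to its nucleus.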

2.4.3 Polarizable point dipole

The next method adds a set of inducible point dipoles assigned to polarization sites, i.e., to an atomic center, lone pair or interaction site between bonds,[69, 70] and maintains the framework of fixed atomic charge force fields. To account for polarization, the field due to the explicit charges is calculated first. Thereafter, the dipoles are calculated as the field multiplied by a local polarizability tensor.[71] The induced dipoles, which arise in response to the permanent charges, themselves also create an electric field, so that mutual polarization between induced dipoles is present.[53] Electrostatic interactions can therefore no longer be approximated by just the charge-charge interactions described by a standard Coulomb potential. An iterative procedure is usually employed to solve the non-additive electrostatic energy term, which also includes the charge-dipole and dipole-dipole interactions.[71, 72] The most frequently used polarizable force field based on inducible dipoles is implemented in the AMBER simulation package as the AMBER ff02 force field.[73, 72] The other major induced dipole model is the AMOEBA force field.[8]

A brief background on the use of Thole screening factors. At the start of the development of the induced-dipole models it was discovered that the model is susceptible to a so-called polarization catastrophe, although this phenomenon is not exclusive to this type of model. At short range, point dipoles can cause artificially strong interactions between one another. To alleviate this problem, Thole proposed to screen the polarization interactions with a damping function when two atoms come closer than a distance threshold, which later became known as the Thole model.[8, 74]

Even though the Thole model has been considered effective for many years, recent studies revealed many inadequacies related to the use of the Thole damping function.[75, 76] One alternative strategy to prevent the "polarization catastrophe", used in the "early days" for solvent models, is to damp the linear dependence of the polarization on the electrostatic field. In this case the polarizabilities are damped when the electrostatic field has a higher magnitude than a predefined threshold, to prevent unlimited polarization.[77, 78] More recently, simple modifications were proposed to remedy some of the inadequacies of the Thole model; improvements were obtained by separating the parameters of the nonlinear effects and optimizing the exponent in the many-body energy function.[79]
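The iterative solution of mutual polarization and the polarization catastrophe can be sketched for the simplest possible case: two identical, collinear polarizable sites in a uniform field along their axis. The sketch uses units in which an undamped point dipole μ produces a field 2μ/r³ on its axis and polarizabilities have units of volume. The exponential damping factors and the width `a = 0.39` are an assumed Thole-type form chosen for illustration; they are not taken from the text. Without damping the iteration diverges once 2α/r³ ≥ 1.

```python
import math


def axial_coupling(r, alpha_i, alpha_j, a=None):
    """Factor c such that a dipole mu yields a field c * mu on its axis at distance r.

    Undamped point dipoles give c = 2 / r^3. If a damping width `a` is given,
    exponential Thole-type factors lambda3 and lambda5 (assumed form) are applied.
    """
    if a is None:
        return 2.0 / r ** 3
    x = a * r ** 3 / math.sqrt(alpha_i * alpha_j)
    lam3 = 1.0 - math.exp(-x)
    lam5 = 1.0 - (1.0 + x) * math.exp(-x)
    return (3.0 * lam5 - lam3) / r ** 3


def induced_dipole(alpha, r, e0, a=None, tol=1e-10, max_iter=500, blowup=1e6):
    """Self-consistent dipole on each of two identical collinear sites.

    Returns None when the iteration diverges (polarization catastrophe).
    """
    c = axial_coupling(r, alpha, alpha, a)
    mu = 0.0  # by symmetry both sites carry the same dipole
    for _ in range(max_iter):
        mu_new = alpha * (e0 + c * mu)  # external field plus field of the partner
        if abs(mu_new) > blowup:
            return None
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return None


alpha, e0 = 2.0, 1.0  # hypothetical polarizability and field strength
far = induced_dipole(alpha, 10.0, e0)                  # weak mutual polarization
close_undamped = induced_dipole(alpha, 1.5, e0)        # diverges, returns None
close_damped = induced_dipole(alpha, 1.5, e0, a=0.39)  # stays finite
print(far, close_undamped, close_damped)
```

At large separation the damped and undamped results coincide, which is the intended behavior of such a screening function: it only modifies the interaction where the point-dipole picture breaks down.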

2.5 Accounting for anisotropic interactions

Above, we have discussed that one of the major shortcomings of most force fields is the inability to model anisotropic charge distributions. Point charges are not able to reproduce relevant interaction features, or they are unable to properly estimate the directionality of the interactions.[80] Having proper knowledge of this spatial charge effect embedded in the force field is critical for determining the equilibrium geometry and energy of molecular complexes. Anisotropy becomes especially important for charge distributions such as σ-holes, lone pairs, and aromatic systems. A few examples of planar molecules that have obvious polarizability anisotropy are shown in figure 4. Considering these type- and conformation-specific charge effects, it is no easy task to develop a polarizable force field that properly models all anisotropic effects. In the quest to account for anisotropy it is therefore important to ask what the right balance is for including anisotropic effects and how much rigor should be applied to account for these spatial charge effects.

Figure 4: Representative molecules that have substantial polarization anisotropies. Figure courtesy of ref. [81]

Anisotropy can be accounted for in both polarizable and non-polarizable force fields; the difference lies in the extent to which the force field can describe the dynamic character of the charge. In principle an anisotropic object is stretched or deformed by a force to varying degrees depending on its orientation. In a static manner one could account for this deformation before the actual deformation occurs by introducing extra point charges that are usually fixed in magnitude and position based on the type of interaction they should represent. For example, anisotropy can be included by the addition of off-centered point charges to represent lone pairs. Early on, the addition of off-centered charges was already found to be effective in improving structures and interaction energies as well as easy to implement.[82] This approach of adding virtual charge interaction sites has subsequently been applied to model electron lone pairs.[12] This simple solution is able to mimic the complicated directionality of H-bonding interactions reasonably well.[11] In addition, the same remedy can also be applied to describe σ-holes. The σ-hole is a region of positive electrostatic potential on a halogen atom, which can interact with a lone pair on a hetero-atom, thereby forming the so-called halogen bond. In the classical fixed point-charge model, the halogen atoms have a spherical negative electrostatic potential, but by simply attaching an off-centered positive charge to the halogen atom an extra interaction site is created.[83, 84] Similarly, off-centered charges can also be used to account for π-bonding by attaching two negative charges to each heavy atom, although this approach has been found to be computationally inefficient.[84]

Still, one might wonder, with how much confidence is the position of such a virtual site determined? Some methods used ab initio derived dipole and quadrupole moments to fit optimal positions of the charges.[85] However, even after refinements this method yields a large number of possible locations for the placement of these charges.
Other methods are exclusively applicable to specific types of bonding, such as the σ-hole and halogen bonds, limiting the transferability of the efforts.[86, 83] Likewise, some approaches are tailored towards individual elements, such as sulfur.[87] Another path is to use experimental data to parameterize the positions of the off-centered charges.[87]

Adding off-centered charges has been shown to be an effective, albeit ad hoc, solution for describing different interaction types, including H-bonding. Still, if even the QM/DFT community has not come to a consensus about the complex nature of hydrogen bonds,[88] how can off-centered charges properly describe their mechanism? That adding off-centered point charges is no panacea becomes evident from how poorly this method reproduces the electrostatic potential (ESP) of a molecule.[80] By this reasoning, a more recent avenue aimed to determine off-center point charges by optimizing the reproduction of the QM ESP with additional point charges.[89]²,³ This approach shows promise, as the method was able to reduce the atom's mean ESP error by 65.8% for a set of small molecules. Despite this, the authors note that they cannot say with certainty that their method is appropriate for all possible organic molecules, even though it has already been applied successfully to over 100 small organic molecules. On the other hand, a minimal distributed charge model has also been developed to determine a minimal number of off-centered point charges needed to approach the reference ESP.[92] Overall, there is no clear answer to whether more or fewer off-centered charges improve the anisotropic charge description.

Nevertheless, if the static approach is not sufficient to model anisotropy, how do we account for the anisotropic charge deformation?
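As a concrete illustration of the static approach discussed above, the following sketch compares the electrostatic potential around a halogen atom described by a single point charge with that of the same atom carrying an additional off-centered positive site on the extension of the C–X bond. All charges and distances are illustrative assumptions, not fitted parameters, and the potential is a bare Coulomb sum in units of e/Å. The single charge gives the same potential in every direction, whereas the off-centered site produces a positive cap along the bond axis and a negative belt around it, which is the qualitative signature of a σ-hole.

```python
import math


def esp(point, sites):
    """Coulomb potential at `point` from (charge, position) pairs, in e/Angstrom."""
    return sum(q / math.dist(point, pos) for q, pos in sites)


# Halogen at the origin, C-X bond along z with the carbon on the negative side.
# Both models carry the same net charge of -0.10 e (illustrative values only).
point_charge_model = [(-0.10, (0.0, 0.0, 0.0))]
virtual_site_model = [
    (-0.30, (0.0, 0.0, 0.0)),  # halogen center
    (+0.20, (0.0, 0.0, 1.5)),  # off-centered site on the bond extension
]

axial = (0.0, 0.0, 3.0)       # probe on the bond axis, beyond the halogen
equatorial = (3.0, 0.0, 0.0)  # probe perpendicular to the bond

for name, model in [("point charge", point_charge_model),
                    ("virtual site", virtual_site_model)]:
    print(f"{name}: axial {esp(axial, model):+.4f}, "
          f"equatorial {esp(equatorial, model):+.4f}")
```

The position and magnitude of the extra site are free parameters in this picture, which is exactly the ambiguity in virtual-site placement raised in the text.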
Polarizable force fields are more flexible than their non-polarizable counterparts and are therefore able to address the anisotropy problem dynamically. For example, in the Drude oscillator model the atomic polarizability, α, of a given atom is given by equation 6.[68]

\[ \alpha = \frac{q_D^2}{k_D} \tag{6} \]

Here, $q_D$ is the charge on a Drude particle and $k_D$ the force constant of the spring that connects the Drude particle.

In this model, $k_D$ could, for atoms that act as H-bond acceptors, actually be treated as a vector rather than a scalar, allowing for an anisotropic representation of polarizabilities.[11] However, a disadvantage of the Drude particle method is the introduction of extra charges, which means more interactions to evaluate. Thus, in practice, a constant value of $k_D$ is used for all atoms, so that $q_D$ determines the polarizability of an individual atom.

Another approach to improve the dynamic description of anisotropic charge distributions is the use of atomic multipoles, the model that is utilized by the polarizable point dipole force field approaches.[8, 9, 10, 13] Multipoles are series expansions that can represent arbitrary angular distributions. Atomic multipoles truncated at the quadrupole have been shown to be sufficient to model common chemical interactions, such as the previously mentioned σ-holes, lone pairs, and π-bonding.[30] However, these higher-order moments also come at a certain cost. Furthermore, it has been shown, in contrast to common assumptions, that by itself the multipole approach cannot completely account for all energetically important effects of atomic-level anisotropy.[50] See figure 5 for a comparison of the ESP based upon point charges, multipoles and QM approaches.

A more recent approach that aims for the reproduction of ab initio polarizability anisotropy is the work of Wang et al., in which a set of atomic polarizability parameters for a new polarizable Gaussian model (pGM) has been developed.[81] Instead of fitting the ESP, this method takes advantage of the ability of the Gaussian model to screen all short-range electrostatic interactions and subsequently fits the ab initio molecular polarizability tensors directly. The authors suggest that by using Gaussian functions to seamlessly treat multipoles and electron penetration effects, a promising polarization framework has been developed. The main reason is that the pGM will most likely improve the stability of charge fitting, energy, and force calculations and the accuracy of multi-body polarization.

Figure 5: Comparison of electrostatic anisotropy illustrated by the ESP of bromobenzene with (i) off-centered charges, (ii) atomic multipoles, and (iii) the reference QM potential. Figure courtesy of ref. [13]

² This QM ESP fit is very similar to slightly older approaches,[90, 91] although the methods differ in how they treat symmetry. The new method maintains the symmetry around the atom and bonding environment when placing the off-center point charges.
³ The framework, QUBE, in which this method is applied, is discussed in more detail in section 4.3.

3 Knitting the reviews and perspectives together

In the following chapter we will discuss the content and outlooks of 12 reviews and perspectives that addressed the developments of polarizable force fields over the past 20 years.

3.1 Polarizability: A smoking gun?

The first review in our list already concluded that without a doubt polarizable force fields provide a far superior physical description compared to their fixed-charge counterparts.[5] The future seemed bright, as the authors, Halgren and Damm, noticed that after twenty years the field finally started to gain traction within the (bio)molecular simulations community. Apart from the technical limitations of the early 2000s, there was still no real proof or reason to believe that including polarizability would improve accuracy, for example in the calculation of ligand-receptor binding affinities; i.e., there was as of yet no "smoking gun".[5]

Six years later, in 2007, Warshel et al. made special efforts to review a wide range of applications where the inclusion of polarizability is important.[6] Unfortunately, the trial's long-awaited smoking gun still failed to surface, as it turned out that the inclusion of induced dipoles made no major difference in binding calculations of neutral molecules.[93] At the time, the convergence problems were still more problematic than the errors associated with the implicit inclusion of the induced dipoles in the parametrization procedure. Another highlight of the review described the procedure of a systematic (polarizable and non-polarizable) force field calibration by calculations of cation solvation energies in water and of cation binding sites in proteins.[6] Interestingly, the polarizable models performed better only when the system was moved from water to other environments, and even then only in the case of divalent ions (when dealing with ions that are in contact with water).

What should have been noticed by now is that a general biomolecular force field was still far away. For example, the study above about the cation solvation energies had to carry out a complete empirical parameterization procedure on the system of interest.
Nevertheless, although these early (not especially convincing) studies seem like small steps, the subtle differences between polarizable and non-polarizable force fields stimulated a more widespread realization of the importance of accounting for polarization effects. To wrap up this review, Warshel et al. mentioned in their outlook that consistent quantum mechanical studies with QM/MM inclusion of the rest of the environment should be extremely useful in further force field development, as in this way the effect of the induced dipoles can be separated from charge-transfer effects.[6]

3.2 New branches of electronic polarization

Already in 2009 Cieplak et al. touched exactly on this latter point, as their review focused for a major part on the progress in how electronic polarization effects are incorporated into force fields.[7] At this point in time, roughly five active branches of polarizable force field methods were in development: the fluctuating charge (FQ), Drude oscillator, induced point dipoles, electronic polarization via quantum mechanical treatment (QM) or mixed QM/MM,[94, 95, 96] and polarization treatment using a continuum solvent.[97] Most efforts had been related to explicitly induced dipoles and fewer to Drude oscillator and FQ approaches. In general, the parameterization of all these approaches was considered to be a much slower process compared with that of their additive counterparts. Therefore, the application of polarizable force fields was sparse and became even sparser due to the extra associated computational costs.[7] Cieplak et al. attributed the rise of these many different branches to the fact that the force fields of ten years earlier were "overdue for much-needed advances". However, the question of which approach to including polarizability would become the method of choice, or would be most suited for biomolecular simulations, remained open.

From 2000 till 2010 the much-needed development occurred and molecular force fields were about to approach a generational transition, moving away from well-established and well-tuned, but physically less sound, additive point charge models towards more realistic and expensive polarizable models. Ponder et al. wondered if the new polarizable force field parameterizations actually reached a new level of predictive power over their non-polarizable predecessors.[8] In their review they addressed the validation of the AMOEBA force field, which was a leading example of the next-generation force fields at the moment of publication.[8, 98, 99, 100, 101] AMOEBA focused on the description of intermolecular interactions, which is especially important for protein-ligand binding predictions. The result was a much improved reproduction of structural and thermodynamic properties. An example is a study that investigated the binding free energies of a series of benzamidine-like inhibitors to trypsin with both a polarizable and a non-polarizable potential.[102, 103] The study revealed that the polarization between water and benzamidine was responsible for 4.5 kcal/mol out of the total 45.8 kcal/mol hydration free energy. On the other hand, the polarization between trypsin-in-water and benzamidine lowered the binding energy by 22.4 kcal/mol. These diverging results can be explained as follows: when polarization effects are included in the description of the medium, the protein becomes responsive and can screen the permanent electrostatic interactions, thereby weakening the attraction between benzamidine and trypsin.

The results from the protein-ligand binding studies with inclusion of polarizability looked very promising, although the hydration free energies (HFE) of the drug-like compounds in the SAMPL 2009 test revealed much larger errors.[104] The main sources of these errors in the HFE were attributed to uncertainty in the experimental data as well as to problems with halogenated molecules and nitro compounds.⁴

Both the point dipole and the Drude oscillator models use point charges to account for the permanent electrostatics, which in essence limits the physical character of the method.
Real atoms are anisotropic while the point charge models are intrinsically isotropic.[30] Lone pairs, π-clouds and σ-holes are some examples of anisotropic interaction sites that are caused by specific electron distributions in molecules. These effects can be incorporated within point charge models by adding extra point charges, but this requires extra computation or extra attention during the development and parameterization process.[105] In comparison, multipole models naturally capture any non-spherical contribution of the atomic charge density, due to the inclusion of monopoles (charges) and higher-order terms such as dipoles and quadrupoles. Still, Ponder et al. acknowledged that further fine-tuning would be necessary to accurately describe dynamical properties that are not sampled at ambient conditions, aromatic interactions (paradoxically, interactions involving anisotropy), and solvent models.

3.3 Descriptions of electrostatics

Regarding the latter obstacle involving solvation free energies, the descriptions of solvation and electrostatics go hand in hand when studying biomolecular folding, binding, enzyme catalysis and dynamics. Consequently, understanding the relation between these two phenomena is important for the development of polarizable force fields. The review published in 2012 by Ren and co-workers discussed the advances made in the solvation of biomolecules and was aimed squarely at the computational biophysics community,[9] and thus also at those with an interest in the advancement of polarizable force fields. By then several groups had performed MD simulations using polarizable force fields to study ion behavior or to determine ion solvation energies, employing the three different approaches for including polarizability effects mentioned above.[100, 106, 107, 108] Despite all the efforts, modeling explicit ions remained a considerable challenge due to the subtle nature and complex dynamics of ions.[9]

Notwithstanding the slow progress⁵ in the development of classical polarizable force field methods for ions, the development of advanced classical electrostatic models beyond simple polarization was an area of great interest in the early 2010s. In addition to polarization, further quantum effects became a topic of interest; for example, local charge-transfer (CT) and penetration effects were shown to play an important role in short-range molecular interactions in water, aromatics and high-valence ions.[109, 45, 110, 111] The inclusion of empirical, additive terms for CT and penetration effects has been shown to be rather effective, due to their short-range nature.[112, 109, 110, 113] This is mainly because these interactions can be treated with local cut-offs, resulting in negligible additional computational cost relative to polarizable electrostatics computed with particle-mesh Ewald summation. In light of these improvements, Ren et al. (much like Warshel et al. before them) made the prognosis that the future direction would start to focus more on applying ab initio treatments and polarizable force fields or hybrid QM/MM approaches.[9]

⁴ The atomic polarizability values in the AMOEBA force field were taken from the original work of Thole, which did not include halogen atoms.[74] The nitro compounds were challenging because of large bond length changes between the gas and liquid phase, as well as intricate "push-pull" polarization that was not captured in their "simple" polarization model.

3.4 Solvation and polarization

One year later, in 2013, Cisneros et al. reviewed methods that accurately describe electrostatics in classical biomolecular simulations in explicit solvents.[10] Apart from the computational methods that have been developed to deal with the long-range nature of the electrostatic interaction, the ways of representing the molecular electronic charge cloud beyond the fixed point-charge representation were discussed extensively. Note that I focus solely on the aspect of including polarizability, but Cisneros et al. also showed examples where the molecular electronic clouds can be accurately described by the inclusion of higher-order multipoles and/or continuous electrostatic functions.[10, 114] Again, the authors state that in spite of the presence of various models for treating polarization effects, the progress toward the development and application of general-purpose polarizable force fields was still limited. The limitations were once more attributed to concerns related to computational speed and the lack of understanding of the importance of polarization effects. Another facet blocking the way towards widely applicable polarizable force fields is found in the challenges associated with other terms in the potential energy function, such as the van der Waals interactions and the short-range valence term.

At the time, most polarizable force fields were parametrized using both gas-phase ab initio data and experimental properties, although to differing extents. Deriving more accurate representations of the molecular electronic clouds (often) depends upon fitting the electrostatic property of interest (usually the atomic charge) to a molecular electrostatic potential (MEP) obtained from ab initio, density functional, or semi-empirical wave function methods.
However, the individual contribution of atoms in a molecule to the polarizability is not physically observable; it is only the molecular polarizability that can be measured.[30] Still, polarization may be described entirely in terms of classical electrostatics, and as such, it should be distinguished from the dispersion interaction that arises from instantaneous fluctuations of molecular charge distributions, which is entirely quantum in origin.[10] Even though the overall molecular polarizabilities are recovered exactly in the ab initio approach, determining such atomic polarizabilities from quantum mechanics is not trivial. On the other hand, fixed-charge force field simulation technology has become rather mature over the years, whereas much room is left for development in the parameterization process of polarizable force fields. Cisneros et al. therefore expected to see growth of polarizable force fields along with their increasing application to unconventional molecular systems, mainly where traditional force fields have been challenged.

⁵ Judging from the publication dates of the articles mentioned above (ranging between 2003 and 2007) and the year of publication of the review article itself, the progress in describing ions seemed to be relatively slow.

3.5 The rise of new challenges

In 2015 Baker gave an overview of the systems to which polarizable force fields have been applied as well as the pros and cons that arise due to the inclusion of polarizability effects.[11] By the time of this publication a shift in the application and coverage of polarizable force fields had occurred, as parameters for fully optimized polarizable force fields were being published along with associated code implemented in standard simulation software. Together, AMBER ff02, AMOEBA, CHARMM Drude and CHEQ cover all major classes of biomolecules⁶ (either with complete or partial coverage),[72, 99, 8, 51, 115, 116] but Baker notes that not all teething problems have been eliminated and that the relatively slow sampling rate of polarizable force fields remained an issue. For future directions in the development of polarizable force fields, Baker categorized the issues that need more attention into three groups: parameterization, sampling and protein-ligand binding.

One of the challenges for (both polarizable and non-polarizable) force field developers should be the identification of any weaknesses in parameter sets. Automated schemes for the optimization of force field parameters circumvent the need for manual involvement, thereby reducing human errors and resulting in systematic and reproducible parameters.[11, 117, 118] The following two studies show successful applications of such automated parameterization protocols. First, the ForceBalance method for obtaining a single parameter set has been shown to be able to optimize parameters for simplified AMOEBA water models.[119, 120] Second, QM target data have been used to parameterize small molecules in an automated manner for both polarizable and non-polarizable force fields.[121]

The second challenge addresses the reputation of polarizable force fields for being slow. In absolute terms this reputation is neither true nor false, but compared to the non-polarizable counterparts it is justified.
Of course this issue also depends on the research question, i.e., on whether a more accurate description or adequate sampling is more important. Enhanced sampling methods and Hamiltonian replica exchange between simulations with two different force fields are proposed by Baker as possible solutions to the sampling problem.[11, 122, 123]

The last challenge, protein-ligand binding, is actually a combination of both previous issues, as the challenge comes from both the accuracy of the force fields and the quantity of sampling. The free energy estimates will be poor if the two previous issues are not accounted for. At the time of publication, polarizable force fields were not being applied in large-scale free energy calculations due to the lack of parameters for a diverse range of small molecules. However, the AMOEBA method (for which a parameter set for small molecules was already available) displayed the capability to produce reliable estimates of experimental binding free energies.[124] As a final remark Baker states that if automated methods for generating force field parameters become successfully applicable to polarizable force fields, it is likely that the accuracy of calculating binding free energies will be further enhanced.

⁶ Major classes: proteins, nucleic acids, lipids, carbohydrates.

3.6 Parameterization or polarization?

In 2016 Lemkul et al. described the latest developments in the Drude force field parametrization and application.[12] The main focus is placed on the Drude-2013 polarizable force field for proteins, DNA, lipids and carbohydrates. A set of small organic compounds has been used as target data for the parameterization process. QM calculations and condensed-phase experimental data were targeted during the iterative refinement protocol that was previously developed for the parameterization of the additive CHARMM force field, which yielded a parameter set that is transferable across molecules.[125] To achieve such a robust and transferable model, the parameterization protocol involved flexible tuning of Thole screening factors on a per-atom basis, the use of atom-pair-specific LJ parameters, and scaling of gas-phase polarizabilities based on the nature of the model compounds.[12]

Going briefly back to Baker: he noted that, compared to their additive counterparts, polarizable force fields tend to include more individual parameters per atom, which is of course inherent to the more complex non-additive potential function.[11] Nevertheless, granted that polarizable force fields represent a more physically sound potential, atoms should be more adaptive to a diverse set of surroundings. The actual implication should therefore be that fewer distinct atom types are necessary for a polarizable force field than for its additive counterpart. As of 2015 the CHARMM Drude polarizable force field required as many distinct atom types as the non-polarizable analogue, which indicates a point for future improvement.[126]

By 2011 the parameterization of small molecules and ions for the CHARMM Drude model seemed as good as finished.[127] However, the full release of the Drude-2013 force field was delayed (as the name suggests) by 2 years.
The extra time required for the fully applicable polarizable protein force field to become available was caused by challenges associated with non-additive effects that arose when going from small molecules to larger, polymeric macromolecules.[12] For instance, the interactions between positively and negatively charged residues showed small imbalances, as these were not treated at the small-molecule level. Consequently, this lack of a description of multiple charged residues caused nonphysical interactions that could result in a polarization catastrophe.

3.7 Enough response: how far away?

The number of reviews specifically aimed at the development of polarizable force fields remained relatively small, until a burst of three consecutive publications appeared in 2019. Before I discuss these three recent literature studies in further detail, it might be a good idea to wrap up the conclusions, challenges and issues identified thus far. After this pre-evaluation we can hopefully find breakthroughs for these problems in the publications of 2019.

In the early 2000s a general-purpose polarizable force field was still far from reality and the development was rather limited compared to that of the fixed-charge counterparts, mainly because the parameterization of polarizable force fields was much slower and came with considerably higher computational costs. Another issue associated with the parameterization process is the non-trivial way in which atomic polarizabilities are obtained from quantum mechanical data, in addition to the need for fewer distinct atom types than in the additive counterparts. As the development of methods to account for polarizability effects started to move on apace, many other issues arose, such as how to handle anisotropic interactions, charge penetration effects, dynamic ionic systems and the limitations of other terms in the potential energy function. Finally, challenges associated with non-additive effects arise when going from small molecules to larger, polymeric macromolecules and/or systems with multiple charged residues.

Apart from the often bleak but hopeful outlooks, considerable progress was made towards a generally applicable polarizable force field for biomolecular simulations, as automated schemes for the optimization of force field parameters as well as enhanced sampling methods and other algorithmic improvements began to soften the computational burden of developing and applying the more expensive polarizable models.
And at last, complete polarizable models appeared of which the CHARMM Drude and the AMOEBA force fields are two examples that cover a large array of system types and also have been through various iterations of testing and improvements.

3.8 The last perspectives

By 2016, however, considerable efforts were still required to address all problems associated with the inclusion of polarizability in one package, and the amount of up-to-date software was still limited compared to that available for non-polarizable force fields. What had become clear at this point is that the fixed point-charge treatment has become outdated and that polarizable force fields have started to enter the mainstream of biomolecular simulation. Obviously, there are still pieces of the polarizable force field puzzle that have been omitted so far. Therefore, we continue with our final course of reviews, after which I hope to construct a clear picture of how (some of) the current issues have been resolved and which challenges still need to be tackled in the coming years.

3.8.1 Jing et al.

The first review of 2019 was published by Jing et al. and, as always, provides an update on the recent advances and applications, as well as future directions, of polarizable force fields in biomolecular simulations.[13] Advances in the utilization of computational resources, especially the application of graphics processing units (GPUs), have made microsecond MD simulations using polarizable force fields accessible.[128, 129, 130] The accuracy and coverage of polarizable force fields have also improved in recent years; see figure 6 for the polarizable force fields that are easily accessible and applicable to biomolecular simulations. The AMOEBA force field has recently been extended to DNA and RNA and has also become more flexible with sampling in different environments.[131] The CHARMM Drude force field has been refined for DNA and recently also includes carbohydrates and halogenated molecules.[84, 132]

Figure 6: Schematic of polarizable force fields for biomolecules and available software. Figure courtesy of ref. [13].

With regard to the development of systematic and automatic parameterization approaches, Jing et al. expect that these protocols will profit from the continuous advancement of QM methods and of machine learning (ML) approaches, which are becoming more frequently adopted in chemistry.[13] According to Jing et al., the combination of polarizable force fields with enhanced sampling methods has not yet been fully explored, but is starting to gain more interest. This class of methods, which can potentially extend the simulation timescales of large biomolecular complexes, is illustrated with examples of orthogonal space sampling, Markov state models, and milestoning.[133, 134, 135]⁷ Still, the authors note that further improvements of the underlying physical models are necessary, particularly for short-range interactions such as charge penetration (CP) and charge transfer (CT).[44, 42]

⁷ Further strengthened by the recent publication of Celerse et al., in which the steered molecular dynamics methodology has been implemented in the framework of the massively parallel Tinker-HP software, allowing for long polarizable as well as non-polarizable MD simulations of large proteins.[136]

3.8.2 Bedrov et al.

In May of 2019 Bedrov et al. published a review (with over 435 references) in which they discuss the different polarization models in MD simulations of ionic materials, a subject that is important for many applications in chemistry and biology, as well as in energy storage and conversion.[14] The review provides the pros and cons of the different polarization treatments for highly ionic materials. Furthermore, the authors address and compare the methods and strategies for the parametrization of polarizable models. Some of the aspects and case studies they discuss are beyond the scope of our discussion, which aims to encompass issues specific to biomolecular simulations.⁸ Nevertheless, the strategies for the development of polarizable models presented in this review are equally relevant to our premise of investigating biomolecular systems, as the physical models and challenges rest upon the same fundamental basis of modeling a many-body response property.

As discussed several times before, the treatment of short-range interactions is a multi-faceted subject. Compared to ab initio methods, force fields often fail to give good approximations in the short-range polarization regime.[137] Solutions such as incorporating CT and CP effects, nonlinear behavior of short-range induction, and/or other non-classical effects into the potential have been proposed to overcome this type of difficulty; however, less involved adjustments have also been utilized.[14] As mentioned in the theory section, one method that is often utilized is the addition of Thole screening functions to correct for the penetration energy when molecules closely approach each other (countering the "polarization catastrophe").[74, 138] Alternatively, the screening functions may be replaced by 1−2 and 1−3 exclusions (bonds and angles) for the interaction of induced dipoles, similar to how such exclusions are already used for the Coulomb and Lennard-Jones interactions.[14] Another strategy to overcome the "polarization catastrophe" is to damp the linear dependence between the polarization and the electrostatic field.[139, 78]
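To make the "polarization catastrophe" and its Thole-type remedy concrete, the following Python sketch considers the simplest possible case: two identical, isotropic point polarizabilities in a uniform field along their connecting axis. The undamped expression follows from the induced-dipole equations for this pair. The damped variant uses the exponential Thole form with a damping factor of 0.39, a value commonly associated with AMOEBA that is assumed here rather than taken from the reviews. The polarizability value, the distances, and the function names are purely illustrative.

```python
import math


def alpha_parallel_undamped(alpha, r):
    """Effective per-atom polarizability along the pair axis, no damping.

    Solves mu = alpha * (E + 2 * mu / r**3) for two identical point
    polarizabilities (alpha in A^3, r in A) and returns mu / E.
    """
    return alpha / (1.0 - 2.0 * alpha / r**3)


def alpha_parallel_thole(alpha, r, a=0.39):
    """Same quantity with exponential Thole damping of the dipole field.

    a = 0.39 is an assumed, AMOEBA-like damping factor.
    """
    x = r**3 / alpha                                  # u**3 with u = r / alpha**(1/3)
    lam3 = 1.0 - math.exp(-a * x)                     # scales the 1/r^3 term
    lam5 = 1.0 - (1.0 + a * x) * math.exp(-a * x)     # scales the 1/r^5 term
    axial_coupling = (3.0 * lam5 - lam3) / r**3       # equals 2 / r^3 without damping
    return alpha / (1.0 - alpha * axial_coupling)


alpha = 1.0                                 # hypothetical polarizability in A^3
r_critical = (2.0 * alpha) ** (1.0 / 3.0)   # undamped response diverges here

for r in (3.0, 2.0, 1.5, 1.3, 1.0):
    print(f"r = {r:4.2f} A  undamped = {alpha_parallel_undamped(alpha, r):8.3f}"
          f"  Thole = {alpha_parallel_thole(alpha, r):6.3f}")
```

In this toy model the undamped response grows without bound as the separation approaches (2α)^(1/3) and becomes negative below it, whereas the damped response stays finite and positive at all separations.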
Lastly, to account for the variations in the short-range interactions caused by fluctuations in the electric field strength, Gaussian electrostatic models (GEM) have been developed to accurately reproduce (quasi-exact quantum) permanent electric fields.[140, 141, 142, 143]

The poor dynamics mentioned previously for simulations of ions can be attributed to the anisotropic interactions of water with ionic liquids.[144] In addition, multiple possibilities for (bifurcated) hydrogen bonding cause competition with the inter-ionic interactions. This complex behavior is often evaluated by calculating the free energy of hydration (∆G_hyd). In the past, methods to improve these types of description introduced NBFIX terms.[126, 127] This approach has been applied to describe various ion−ion parameters in the context of both the additive CHARMM36 and Drude polarizable force fields, as well as for other force fields.[145, 146, 147] In more recent studies the Drude model has been extended, while using NBFIX terms, with molecular ions representative of charged moieties in biological macromolecules.[84] For the AMOEBA potential, a systematic optimization of atomic ions and molecular ions, as well as more recently lanthanides, has been reported by means of a cation-specific parameterization of the Thole polarization damping model in combination with an ab initio bottom-up strategy.[111, 148, 149]

Interactions due to induced polarization have been shown to play an important structural and dynamic role in the description of ionic liquids (ILs), as well as in the solvation and transport of small metal cations. In addition, the description of the interactions of the ions with co-solvents or solutes improves with a polarizable model. Alternative approaches originating from non-polarizable simulations, e.g., scaling charges or reparametrizing Lennard-Jones interactions, incur a trade-off, since these byways only model an averaged effect of the induced dipoles. The collective dynamics and sometimes even the molecular dynamics of ions may agree with experimental observations; however, local interactions such as hydrogen bonds are affected by the down-scaling of charges.[14] Bedrov et al. end their discussion on a high note, as polarizable force fields for ionic liquids from various groups are currently under development and will soon be available to the computational community, enabling large-scale simulations with high-performance packages such as NAMD for Drude approaches, or Tinker-HP for AMOEBA and other models ranging from induced point dipoles up to Gaussian-based electrostatics.[150, 151, 14]

⁸ Due to the vast number of topics discussed in this review, I will cherry-pick a few key points that the rest of the discussion will benefit from, even though many more interesting and valuable topics are presented in the review by Bedrov et al. The (omitted) case studies concern several classes of bulk systems, including dilute and concentrated electrolytes with a focus on emerging battery applications, bulk ionic liquids, and electrolytes at charged surfaces. As a side note, in the latter class isotropic approximations are no longer valid and explicit inclusion of the polarization of both the electrolyte and the electrode can be important.

3.8.3 Melcr and Piquemal

In the following perspective, published at the end of 2019, Melcr and Piquemal discuss how and when the accuracy is affected if the simulation accounts for electronic many-body polarization.[16] There are three main sections. The first part highlights a revival of the implicit treatment of electronic polarization via the electronic continuum correction (ECC), which introduces almost no additional computational cost. Despite the approximate character of the ECC approach, lipid force fields including polarization in this way have been shown to accurately describe interactions between phospholipid bilayers and cations, as well as with other lipids and negatively charged phosphatidylserine.[152, 153, 154] For several different types of systems the ECC approach has yielded promising results, allowing for realistic simulations of, for example, membrane proteins at physiological ionic conditions.[16]⁹

⁹ For the extensive list of references to all the systems that have been shown to yield good results with the ECC approach, consult the references in the perspective of Melcr and Piquemal.[16]

However, the implicit ECC approach can only be used beyond its approximations when ad hoc interaction terms are introduced, which makes the method not generally applicable.[155] For that reason the implementation of explicit methods that account for electronic polarization and/or effects beyond polarization becomes necessary.

The subject of how to capture effects beyond electronic polarization is addressed in the second part of the perspective. This particular matter has often been raised in the outlooks of the other reviews and is therefore of special interest to our discussion. We have seen before in the reviews of Ren et al. and Jing et al. that short-range interactions (CP and CT) will and should start to play a more important role in biomolecular simulations. An exemplary field where these effects play an essential role is the simulation of metalloproteins.[16] However, the anti-cooperative interactions (i.e., the total energy being lower than the purely additive contributions) involved with divalent metal cations are difficult to capture with electronic polarization models. Efforts have been made to improve the physical description of these potentials by including the short-range interactions. One of the first of such force fields is the Sum of Interactions Between Fragments Ab Initio (SIBFA), a multipole-based force field that explicitly incorporates many-body charge transfer effects [156, 105, 157] and more recently also penetration corrections in the electrostatic description.¹⁰ Originally SIBFA was developed for divalent ionic metalloprotein systems, which are normally difficult to handle,[159, 43, 160] but it has been further extended to halogens and DNA.[161, 162] Melcr and Piquemal notice that improvement in capturing the correct physics is a general trend in current developments.[16] Like SIBFA, but more recently, the AMOEBA force field developed into the AMOEBA+ model, a potential that accounts for short-range physical effects.[163] In addition, the authors note that several other novel developments of general polarizable force fields are starting to appear.[164, 165, 166]¹¹

Lastly, the authors pose the question whether polarizable simulations had become computationally tractable by 2019. Advancements in computational power, theory, and algorithms have resulted in practical and achievable simulations with explicit polarizable dipoles of complex and relevantly sized systems.[169, 14, 170, 171] Another noteworthy achievement is the time scales that have been sampled with explicit polarization models, as they have become on par with those of the fixed point-charge counterparts due to efficient implementations using fast algorithms for the calculation of binding free energies.[12, 136, 129] Furthermore, fully variational polarizable embedding has also been realized in hybrid QM/MM molecular simulations. Melcr and Piquemal expect that these hybrid explicit polarization/ECC simulations will become important for the multi-level global treatment of polarization "across very large complex molecular systems", which is considered to be important for the computational modeling of lipid membranes.[172, 171, 16] More use of the ECC approach has also been suggested by Bedrov et al., who present it as an alternative to the computationally demanding explicit modeling of electronic polarization.[14]

As a final note before moving to the last review, I quote Melcr and Piquemal, who answer the question of why accurate biomolecular simulations should account for electronic polarization: "Biomolecules in the real world cannot turn off their polarizability. Hence, molecular dynamics simulations, which aim to give a realistic, robust, and predictive results, cannot afford to neglect this important contribution to the electrostatic interaction."[16]
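The implicit ECC treatment discussed above is usually realized by scaling the charges of ions and charged groups by 1/√ε_el, where ε_el is the electronic (high-frequency) dielectric constant of the medium. The Python sketch below illustrates this with ε_el = 1.78, a value commonly quoted for water and assumed here; the function names and the unit point charges are hypothetical.

```python
import math

EPS_EL_WATER = 1.78  # assumed electronic dielectric constant of water


def ecc_scale(charges, eps_el=EPS_EL_WATER):
    """Return ECC-scaled charges q / sqrt(eps_el)."""
    factor = 1.0 / math.sqrt(eps_el)
    return [q * factor for q in charges]


def coulomb(q1, q2, r):
    """Coulomb energy of two point charges in reduced units (prefactor 1)."""
    return q1 * q2 / r


cation, anion = ecc_scale([+1.0, -1.0])
print(f"scaled charges: {cation:+.3f}, {anion:+.3f}")
print(f"pair energy at r = 3: full {coulomb(1.0, -1.0, 3.0):.4f}, "
      f"ECC {coulomb(cation, anion, 3.0):.4f}")
```

Scaling both charges by 1/√ε_el divides the pair energy by ε_el, which is how the electronic screening enters without any extra cost per time step.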

3.8.4 Inakollu et al.

The last review of 2019 (published as an arXiv preprint), by Inakollu et al., summarises the recent developments in accounting for essential biomolecular interactions as described by polarizable force fields, together with the accompanying applications that elucidate difficult biological questions.[15] The description of highly anisotropic cation-π interactions (commonly occurring in proteins between aromatic and charged amino acids) has recently been improved in the Drude-2013 model by introducing atom-pair-specific Lennard-Jones parameters along with virtual particles at selected ring centroids.[173] On the other hand, the multipole-based AMOEBA model (which inherently has less trouble describing anisotropy) only needed derived parameters for aromatic interactions (including nucleobases), based on comprehensive studies of molecular structures, energetics, and liquid-phase thermodynamic properties, and on systematic comparison with both QM calculations and experimental measurements.[174, 131]

¹⁰ For more information on how the ab initio-type SIBFA force field is developed and compares to other ab initio force field families, I recommend the perspective published in 2018 by Xu et al.[158]
¹¹ Other noteworthy and novel methods that include short-range interactions are a non-variational bond capacity model and a QM/MM combination of fluctuating charges and fluctuating dipoles.[167, 168]

Recent progress in protein simulations showed that the accuracy of protein structure refinement and of the description of conformational ensembles of intrinsically disordered proteins (IDPs) improved with the inclusion of explicit polarisation.¹²[15, 175] Without polarization effects the sampling of IDPs yielded inconsistent ensembles, because protein structure and dynamics are affected significantly by induced polarization.[176, 177, 178, 179] However, the study employing a polarizable model also mentioned the difficulty of sampling the native structures of the selected proteins.[175] The authors suggest that future work should address this issue by further refinement of the parameters. One avenue that has recently been shown to be important for the simulation of IDPs is an improved description of dispersion forces.[176]

¹² The widely used AMBER99SB7 and CHARMM36/CHARMM36m5 were selected to represent classical force fields, while AMOEBA-2013 and Drude-2013 were used to represent polarizable force fields.

The inherent components of the force field potential can also be improved upon; one such component that has recently shown systematic improvements is the van der Waals interaction. Inakollu et al. (re)address the issue of how polarizable models should account for London dispersion interactions.[15] A known issue with polarizable force fields is the overestimation of molecular C6 dispersion parameters, due to static charges and dipole moments that are typically closer to their gas-phase QM estimates than those of the additive force fields.[180, 181] New methods have been developed to derive dispersion parameters, which will eventually make polarizable potentials more transferable.[85, 182, 183] For example, Visscher et al. showed that combining C6 and C8 attractive terms with a C11 repulsive potential results in satisfactory models, and that explicit inclusion of higher-order dispersion terms could be a viable choice for the future of polarizable force field development.[182]

Furthermore, Inakollu et al. highlight that QM/MM MD simulations are powerful methods to study how the environment affects the reactivity or spectroscopic properties of a critical component.[15] However, this approach has been shown to be prone to imbalances between the interactions of the QM and MM parts, caused by the enhanced partial charges in additive force fields. Polarizable force fields have been hypothesised to fill this gap between the two parts, and multiple attempts have been made to use a polarizable model instead of a classical force field for the MM part.[171, 184, 185, 186] However, the use of polarizable models resulted in poor hydration free energies compared to purely classical results. Another QM/MM study investigated the Claisen rearrangement in chorismate mutase and the hydroxylation reaction in p-hydroxybenzoate hydroxylase, using a polarizable model as the MM part. This resulted in only moderate effects on the activation and reaction (free) energies.[186] The authors therefore concluded that further validation work is required to establish the best QM/MM-based procedure for handling polarization effects in enzymatic reactions. On the other hand, combined QM and polarizable force field (AMOEBA) simulations have been shown to be an attractive method for the prediction and elucidation of infrared spectra measured in solution, and also for the calculation of spectra of biomolecular systems.[187] Inclusion of explicit polarization improved the sensitivity of the spectra to the environment by proper inclusion of solvent-solute interactions such as hydrogen bonds.[188, 189, 190]

As a final remark, Inakollu et al. note in their outlook that more systematic validation is required to advance the current polarizable force fields, including both the underlying physical models and their parameterization procedures. Extra emphasis is placed on the development of automated and systematic parameterization techniques, as this direction has recently shown encouraging advancements. Nevertheless, the authors also celebrate the fruitful applications of polarizable force fields that have provided many new insights into biological processes.[15]
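As a rough illustration of the higher-order dispersion model mentioned above, the sketch below evaluates a pair potential with a repulsive r⁻¹¹ term and attractive r⁻⁶ and r⁻⁸ terms. The functional form C11/r¹¹ − C6/r⁶ − C8/r⁸ is my reading of the description given in this section, not a form taken from ref. [182], and the coefficients are arbitrary numbers in reduced units.

```python
def pair_energy(r, c6, c8, c11):
    """Assumed form: C11 / r^11 repulsion minus C6 / r^6 and C8 / r^8 attraction."""
    return c11 / r**11 - c6 / r**6 - c8 / r**8


def c8_to_c6_ratio(r, c6, c8):
    """Ratio of the r^-8 to the r^-6 dispersion contribution at distance r."""
    return (c8 / r**8) / (c6 / r**6)


C6, C8, C11 = 1.0, 1.0, 1.0  # hypothetical coefficients, reduced units

for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r:3.1f}  E = {pair_energy(r, C6, C8, C11):12.5f}"
          f"  C8/C6 term ratio = {c8_to_c6_ratio(r, C6, C8):6.3f}")
```

The r⁻⁸ contribution matters mostly near contact and falls off as 1/r² relative to the r⁻⁶ term, so at long range the potential approaches the familiar −C6/r⁶ behavior.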

3.9 A new hope: the next-generation force fields

Have the developments since 2016 provided us with more insight into how to solve the polarizable puzzle, or did we end up with an even more complicated picture to complete? Again the coverage of polarizable force fields has increased, and the simulations themselves have become more tractable and more easily accessible through the distribution of software packages such as NAMD, OpenMM, and Tinker-HP. Parameterization protocols are being automated. Not only is the inclusion of polarization effects being tackled, but CP and CT are also being accounted for by improved potentials such as SIBFA and AMOEBA+. In addition, an improved anisotropic description of charges has been included in the latest Drude model. Finally, we have seen revisions of the van der Waals interactions through the inclusion of higher-order dispersion effects. Considering all this progress, a general trend can be observed: all five major omissions (see figure 2) made by the fixed point-charge models are being addressed in recent years. This is a hopeful notion, but the holy grail of a computationally cheap force field that is also physically accurate remains a hidden treasure yet to be unveiled.

Thus, what has changed in the outlook on polarizable force field development during the last 20 years? Researchers are still aiming for better physical models. The sampling problem remains unsolved. The description of ions, hydrogen bonds, and intrinsically disordered proteins has improved, but still leaves much room for improvement. More validation has to be carried out, as not all new polarizable force fields have been benchmarked against the same tests in a systematic manner. Finally, we can learn more from advancements made in QM methods and, more recently, in ML techniques. The latter direction might be considered a buzzword; however, we are interested in how this field, full of algorithms and experience with complex puzzles, can help us to overcome the challenges mentioned above. The final part of this study will therefore investigate this fashionable avenue of ML for the development of general-use biomolecular force fields.

4 Learning with machines

As a general observation from the discussion above, there are two main challenges associated with the development of polarizable force fields: first, how to determine transferable and accurate parameter sets, and second, how to improve the underlying physical model while maintaining reliable sampling rates of phase space. Coincidentally or not, these questions are closely intertwined, as the solution to both requires a balance between the complexity of the atomic system and the simplicity of its description. This challenge is closely analogous to what Dirac already established in 1929: “The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble. It therefore becomes desirable that approximate practical methods of applying quantum mechanics should be developed, which can lead to an explanation of the main features of complex atomic systems without too much computation.”[191]

A 2019 review by Noé et al. about machine learning (ML) for molecular simulation used the same quote in its introduction to illustrate how Dirac’s statement remains topical, even after more than 90 years.[192] Nevertheless, in the last decade a fresh wind has begun to blow, as the application of ML techniques is becoming an integral part of the molecular simulation community. The potential of ML is especially promising for the development of approximate methods that aim to understand complex atomic systems while circumventing the direct solution of those equations that are “much too complicated to be soluble”. With this in mind, it is not so strange that Noé et al. envision that in the near future ML will transform the way we study molecular systems in silico.[192]

So, why let the machines learn? At its core, ML is a data mining technique used to create data-derived models: a large data set is molded into a mathematical expression for the extraction of complex patterns and relationships. Subsequently, these models allow us to predict specific properties of the data. How does such a procedure work? In the simplest case, the “machine” is trained on a large number of examples and “learns” how to interpret them, such that it knows what to do when presented with unseen observations.[192] The link to atomic and molecular systems should be clear now, since we know there exists a structure-property relationship that connects atomistic configurations to chemical properties.[193] Thus, the machine should learn how to map the former onto the latter without solving the first-principles equations. However, there is, as always, an if: this operating principle will only succeed if the “machine” has “learned” from enough examples.

A big challenge in applying ML to chemistry is therefore the formulation of problems associated with the simulation of molecular systems as machine learning problems.[192] Obviously, one can consider many different approaches for many different types of problems, which is only logical as there are also many different aspects of molecular simulation. In our case, the development of polarizable force fields, we are mainly interested in acquiring an improved description of the physical molecular model, ideally at a computational cost that is lower than or equal to that of the current, more expensive yet less accurate models. In practice that means the type of ML we are interested in can be used to (i) calibrate and correct the results of physics-based models to account for some of their embedded systematic errors, (ii) aid traditional modeling and simulation methods, and (iii) assist the advancement of new physics-based simulation methods.[194] That is not to say that ML techniques are not also having a great influence on many other areas of chemistry. Methods based on ML can be used to explore the vastness of chemical space or to design new compounds. Some specific examples mentioned by Noé et al. include the direct prediction of physicochemical or pharmaceutical properties and the design of materials and molecules with specifically tailored properties, among many other possibilities.[192]¹³

In the following sections I will attempt to highlight the power of ML in the context of the most persistent problems and difficult challenges in the development of polarizable force fields. I will therefore mainly address physics-based ML approaches for molecular simulation. However, I will occasionally blend in some non-ML studies to emphasize the problem or contextualize the subject at hand.

4.1 Replace the functional form with machine learned force fields

As discussed in the theory section, the functional form of a force field is established by first constructing various types of interaction potentials (bonded and non-bonded) and then fitting the parameters belonging to this function to reference data, either experimental or ab initio. Most of the time the functional form is determined before the reference data is chosen. However, what if we replace the classical function by a deep neural network (NN) or some other machine learning model?[197] The artificial model can subsequently be trained on the available data to learn how to reproduce target properties, most commonly the energies and forces obtained from ab initio calculations.¹⁴

Figure 7: Schematic illustration of the ANI-1 potential versus DFT calculations. The ANI-1 neural network is trained on the total energy of a molecule, E_T, and is, after training, orders of magnitude faster than DFT computations. Figure courtesy of ref. [199].

¹³ The latter two have been the subject of two recent reviews.[195, 196]
¹⁴ Often a large number of quantum chemical calculations is used to train such machine-learned force fields, but in principle experimental data could also be included.[198]

In the last decade the ML approach to force field design has already shown big advances. One striking example is the ANI deep NN potential, the first generation of which was trained on a database of "millions" of QM calculations on more than 57000 small organic molecules; see figure 7 for a schematic comparison of the potential with DFT calculations.[199] The result is a very accurate interatomic potential, of course only relative to the accuracy of the underlying QM calculations. However, there is one major difference compared to the QM computations: the NN potential is orders of magnitude faster. In addition, the framework could be extended to molecules larger than those in the training data, emphasizing that machine-learned potentials have already become an attractive substitute for expensive QM methods for simulating structural properties of, for example, drug-like molecules.[200]

One of the main advantages of this function substitution is that NNs are universal function approximators, and thus have the ability to try and evaluate a large number of possible functional forms. So, instead of being constrained by a predefined energy function, the NN can in principle incorporate multi-body correlations (e.g., polarization effects) that are generally ignored in classical force fields (unless specified explicitly at a higher cost). Knowing all the benefits, one might wonder whether these NN potentials are actually computationally tractable. Indeed, a big challenge in making NN potentials widely applicable resides in the associated computational cost.[201] Even though the calculation of energies and forces with a trained NN potential is several orders of magnitude faster than with quantum chemical calculations, the NN approach cannot (as of now) surpass the speed of standard classical force fields. However, Noé et al. believe that, as the field of ML force fields evolves, the software for practical applications and MD simulations will become a more applicable alternative to the methods used at present. In addition, the same arsenal of enhanced sampling methods¹⁵ (which should also be used more often in combination with polarizable force fields) could equally well be applied to ML force fields for the simulation of longer timescales and more extensive systems.

Still, there is another downside associated with the increased flexibility of NNs mentioned above, since copious amounts of data are required to train machine learning models. If the training data set is not "complete", the resulting model may perform poorly in regions of conformational space where no training data has been provided. From this point of view the ML model is only as good as the training data that is available. However, this heuristic problem is nothing new compared to the classical parameterization approaches, since these are often based on the assumption that the fitted properties of small molecules can be stitched together to simulate large protein structures by assigning atom types to each parameter.[204]¹⁶

¹⁵ E.g. [202, 203].
¹⁶ However, as we have seen before and as other studies have shown as well, the non-bonded parameters in a force field, such as the electrostatic and LJ terms, depend on additional effects that are usually not included in the atom type, such as local polarization effects.[80, 205]

Another4.2 A different player using take on a polarizabledifferent angle force that fields stayed for a while under the radar of the regular polarizable force field development is the FFLUX model,17 which was called formerly QCTFF.[206] What FFLUX sets apart from all other classical models is its completely different architecture that avoids (harmonic) potentials for bonded, valence and torsion angles. Furthermore, the convention of partitioning the poten- tial in bonded and non-bonded parts is abandoned. Poperlier et al. argue that this sharp distinction does not properly reflect the complexity of the atomic interactions found in condensed matter.[207] One prime example that is especially important for the simulation of biomolecular system is hydrogen bonding, since this type of ’bond’ is probably the oldest type of interaction to challenge the artificial distinction between bonded and non-bonded interaction. Instead, FFLUX is a force field po- tential that utilizes the machine-learning method kriging (also known as Gaussian process regression) and the topological energy partitioning method called Interact- ing Quantum Atoms (IQA).[208, 209, 210] Just like the many other approaches, FFLUX attempts to address the same key issue, namely to provide a more realistic description of electrostatics than the point- charge models can offer. Towards this end plays machine learning a crucial role for the FFLUX model. In contrast to other multipolar based force fields18 estab- lishes the ML component a direct link between an atomic multipole moment and the nuclear positions of surrounding atoms. 
Initially, proof-of-concept studies such as geometry optimizations were published,[207, 210, 211] and more recently these developments have led to simulations of a flexible, polarizable water model with a multipolar description of electrostatics.[212] Currently, the possibilities of adaptive sampling are being investigated, and simulations of oligopeptides with the FFLUX model are expected to follow soon.[213]
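To make the kriging step more tangible, the following is a minimal sketch of Gaussian process regression in pure Python. It is not the FFLUX implementation: the squared-exponential kernel, its length scale, the noise term, the constant mean, and the one-dimensional toy data (a single geometric feature mapped to a made-up multipole moment) are all assumptions chosen for illustration.

```python
import math

LENGTH_SCALE = 0.05  # assumed kernel width, in the same units as the input


def rbf_kernel(a, b):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(train_x, train_y, noise=1e-8):
    """Return the constant mean and the weights (K + noise*I)^-1 (y - mean)."""
    mean = sum(train_y) / len(train_y)
    cov = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
            for j, b in enumerate(train_x)]
           for i, a in enumerate(train_x)]
    weights = solve(cov, [y - mean for y in train_y])
    return mean, weights


def predict(x_new, train_x, mean, weights):
    """Posterior mean of the Gaussian process at x_new."""
    return mean + sum(w * rbf_kernel(x_new, x) for w, x in zip(weights, train_x))


# Hypothetical training data: one geometric feature (e.g. a bond length in Å)
# and a made-up multipole moment that varies smoothly with it.
train_x = [0.90, 0.95, 1.00, 1.05, 1.10]
train_y = [-0.60 - 0.8 * (x - 1.0) for x in train_x]

mean, weights = fit(train_x, train_y)
print(predict(1.02, train_x, mean, weights))  # prediction at an unseen geometry
```

In a realistic setting the input would be a vector of internal coordinates of the surrounding atoms rather than a single number, and the kernel hyperparameters would be optimized rather than fixed by hand.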

4.3 Are transferable parameters a universal requirement?

The particular assumption of building a transferable force field described above has been challenged and viewed from a different angle by Allen and co-workers. This group has developed a new class of QUantum mechanical BEspoke (QUBE) biomolecular force fields, in which the non-bonded terms are derived specifically for the system of interest.[89] The QUBE parameterization protocol is based on atoms-in-molecule partitioning of the electron density, from which the non-bonded component of the force field is subsequently derived.[85] The bonded components (bonds and angles) rely only on the quantum mechanical Hessian of a molecule, while the torsions were fitted to QM dihedral energy scans.[214, 215] To scale the force field to arbitrarily large system sizes and to treat charge and Lennard-Jones parameter derivation in a consistent way, the QUBE non-bonded parameters are derived from a single QM calculation. Furthermore, the ONETEP linear-scaling density functional theory code has been added to the parameterization protocol, so that the QUBE non-bonded parameters can be derived for large systems, including proteins. In addition, improvements to the electrostatic potential of system-specific small molecule force fields included a new method to add off-centre point charges to account for anisotropic effects. The tailored approach has also recently been released as the QUBEKit package, a parameter derivation toolkit written in Python that automates the generation of system-specific small molecule force field parameters directly from quantum mechanics.[216] See figure 8 for a schematic illustration of the functionality and capabilities of QUBEKit. As a proof of concept, a complete set of parameters for 109 small organic molecules has been rederived and tested by comparing the simulated liquid properties with experiment. QUBEKit is competitive with standard transferable force fields, as the method yields mean unsigned errors of 0.024 g/cm³, 0.79 kcal/mol, and 1.17 kcal/mol for the liquid density, heat of vaporization, and free energy of hydration, respectively.

17 (corrects to electrostatics in biomolecular simulations...)
18 E.g., polarizable point dipole models employing multipolar electrostatics, such as SIBFA and AMOEBA, where errors are typically overcome by damping functions.[210]

Figure 8: Schematic illustration of the QUBEKit Python package workflow. QUBEKit automates the derivation of the bond, angle, dihedral and non-bonded force field parameters for small organic molecules and provides simulation-ready input files in BOSS, GROMACS, or OpenMM format. Figure courtesy of ref. [216].

In short, by efficiently tailoring the force field parameters to the system under study, the assumptions of transferability made about each parameter are significantly reduced. Furthermore, the method is able to scale towards larger systems: the developers have shown that their system-specific QM force field yields accurate conformational preferences of short peptides and could outperform the OPLS-AA and OPLS-AA/M force fields in reproducing experimental J coupling values in protein simulations of GB3 and ubiquitin.[89] In addition, the QUBE protocol has been applied to the L99A mutant of T4 lysozyme, where the model outperformed the OPLS-AA/M force field in the prediction of absolute binding free energies of six benzene analogs to the protein (mean unsigned errors of 0.85 vs 1.26 kcal/mol).[217]

4.4 The difference between derivation and prediction

The most obvious disadvantages of tailor-made force fields are the resulting system specificity (similar to over-fitting) and the costly underlying QM calculations that are required. Alternatively, ML approaches have shown a very promising way around the expensive calculations required for the parameterization procedure: one such ML model relies only on the molecular environment as the starting point (see figure 9 for the workflow of the protocol).[218] Bleiziffer et al. developed an ML model that predicts atoms-in-molecule charges for fixed-charge force fields with nothing but the 2D molecular topology as input. With such a simple descriptor, only a SMILES string is needed as input to predict the partial charges of a new molecule.[219]

Figure 9: Workflow for predicting partial charges extracted from DFT electron densities. The partial charges predicted by ML do not depend on the 3D conformation, in contrast to those obtained by fitting to the electrostatic potential (ESP). Figure courtesy of ref. [218].

The key to obtaining accurate atomic charges was to train the ML model on a large and diverse data set of approximately 130,000 lead-like compounds. Afterwards, the trained model was able to provide quick and accurate atomic charges for a test data set of drug-like compounds, with an RMSE of less than 0.02 e. Furthermore, the ML-based approach gains a speed-up of several orders of magnitude compared to the original derivation of DDEC partial charges, the method used in the QUBE force field to obtain the atomic charges.[220]

The work above has shown that an ML-based approach can be constructed for the prediction of partial charges for classical fixed-charge force fields. One might therefore wonder how far away the development of such a tool is for the prediction of other electrostatic parameters, such as atomic polarizabilities, anisotropic polarizabilities, Thole scale factors, and lone pairs, as well as bonded and van der Waals terms. Such approaches could especially improve the prediction of atomic polarizabilities for polarizable force fields, as the derivation of this term has been a long-standing problem.

Toward this goal, Heid et al. trained both a linear increment scheme and a multilayer perceptron NN on a large data set of QM atomic polarizabilities and partial atomic charges, using only the atom type and connectivity as input.[221] The models predicted atomic polarizabilities and charges with average errors of 0.023 Å³ and 0.019 e using the NN, and 0.063 Å³ and 0.069 e using the simple increment scheme. Clearly, the NN approach outperforms the simple increment scheme, by roughly a factor of 3. Note that the algorithm relies only on the connectivities of the atoms within a molecule, thus omitting dependencies related to the 3D conformation.
However, for general application this might actually be beneficial. The dependencies have been smoothed out by the large training set used (made up of about 166,000 atoms in 10,000 molecules), so the predicted electrostatic parameters are more or less independent of geometry.¹⁹
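The idea behind an increment scheme that sees only atom types and connectivity can be sketched in a few lines. The example below is a toy illustration, not the model of Heid et al.: the element labels, the base values, the neighbor corrections, and the additive functional form are hypothetical placeholders. What it does show is that no 3D coordinates enter the prediction, so topologically equivalent atoms receive identical parameters.

```python
# Hypothetical increments in Å^3; NOT fitted values from the cited work.
BASE_INCREMENT = {"C": 1.00, "H": 0.40, "O": 0.70}
NEIGHBOR_CORRECTION = {
    ("C", "O"): -0.05,  # carbon bonded to oxygen
    ("C", "H"): 0.01,   # carbon bonded to hydrogen
    ("O", "C"): 0.03,   # oxygen bonded to carbon
}


def atomic_polarizabilities(elements, bonds):
    """Assign a polarizability to each atom from its element and bonded neighbors.

    elements: list of element symbols, one per atom.
    bonds: list of (i, j) index pairs; no geometry is used anywhere.
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    alphas = []
    for i, element in enumerate(elements):
        alpha = BASE_INCREMENT[element]
        for j in neighbors[i]:
            alpha += NEIGHBOR_CORRECTION.get((element, elements[j]), 0.0)
        alphas.append(alpha)
    return alphas


# Methanol as a connectivity graph: C(0), O(1), three H on C, one H on O.
elements = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

alphas = atomic_polarizabilities(elements, bonds)
print(alphas)
```

Because the three methyl hydrogens share the same element and the same bonded neighbor, they are assigned the same value regardless of the conformation of the molecule.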

19 A large data set prevents the trained model from relying indirectly on geometry through the conformation dependence of the input vector. Because each structural element occurs in multiple molecules in different conformations, the resulting parameters do not depend much on geometry and are, according to the authors, both general and transferable between different molecules.[221] Furthermore, it should be noted that the atomic polarizabilities are indeed largely independent of the 3D structure of a molecule, but the partial charges have been shown to change to some degree.[222, 223]

Both studies above require large and costly QM-derived data sets. Accurate and abundant reference data to train and apply such ML-based approaches are not always easily available or within reach of those who are actually interested. Furthermore, the molecular polarizability, α, is an essential ingredient in polarizable force fields, but also a property that is sensitive to the underlying electronic structure description.[224] This sensitivity introduces uncertainty in the process of obtaining accurate predictions of α, making the determination rather difficult and expensive.

In contrast, Wilkins et al. recently demonstrated that, using a symmetry-adapted machine-learning approach, it is possible to predict molecular polarizabilities for a diverse set of 52 larger molecules (including challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) with an accuracy that exceeds that of hybrid DFT.[225] Although the approach utilizes a database of small organic molecules, the trained model can be used to make predictions for larger compounds with an accuracy that rivals DFT, and it can be systematically improved by extending the training set. This study exemplifies how the design of polarizable force fields could greatly benefit from new ML-based methods that provide inexpensive atom-centered estimates of fully anisotropic α's.

4.5 From small molecules to long range interactions

Returning to the two studies by Bleiziffer et al. and Heid et al.: these may prove especially useful for applications aimed at small molecules. However, for the simulation of larger systems, any change in the molecular environment could have detrimental consequences, for example in the prediction of secondary structural features.

A recent DFT study by Kondo et al. revealed, by comparing a minimal H-bond model with complete α-helices of short alanine peptides, that hydrogen bond donors and acceptors are generally depolarized in α-helices.[226] This depolarization effect has been attributed to repulsive interactions with the neighboring polar peptide groups in the α-helix backbone. Classical force fields lack this depolarization effect and yield results similar to those of the minimal H-bond models, resulting in overestimated H-bond energies in α-helices. This comparative QM study has pointed out the chemical origin of (another) incompleteness hidden in classical force fields, and it also puts on display the possible (negative) consequences of using small molecules as target data during the parameterization procedure or, rather, of neglecting long-range interactions.

Hence, one fundamental challenge resides in the modeling of long-range interactions. In the context of ML approaches, for example, if only quantum calculations on small molecules are used to train the force field, interactions beyond the locality or scale of the training compounds could be missed. The locality of ML force fields could therefore be insufficient to capture the correct electrostatic interactions and/or long-range van der Waals interactions.[227] The main reason is the slow decay, ~1/r, of the Coulomb interaction, which makes convergence of the model virtually impossible when only a local ML scheme is used. Usually, long-range effects are incorporated by explicitly separating the local many-body contribution to the total energy from a classical electrostatic term approximated via pairwise Coulomb interactions. For instance, atomic partial charges can be learned on local energy terms and at the same time linked to electrostatic interactions to obtain a total energy that is used during training.[228] However, capturing long-range electrostatic interactions without prior knowledge about the characteristics of the target is non-trivial with most methods.

Many approaches have attempted to account explicitly for long-range interactions, such as Coulomb kernels, many-body tensor representations, and multi-scale invariants.[229, 230] However, these frameworks are still built upon a global representation of the system. Recently, Grisafi and Ceriotti have successfully challenged the locality ansatz of long-range physics by introducing a non-local representation of the system using an additive atom-centered model.[231] The crux of their framework is the atom-density potential used, which adopts the same asymptotic behavior as the electrostatic potential and folds global information on the structure and composition of a given system into a local representation.
This work provides a conceptual framework to incorporate non-local physics into atomistic machine learning, but more effort is needed to establish a systematic and formal connection of this new long-range description with the description of short-range interactions, which has already reached a stage of maturity.[231]
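The slow decay of the Coulomb interaction, and the common remedy of adding an explicit pairwise electrostatic term to a local model, can be illustrated with a small sketch. The code below is an illustration under stated assumptions and is not taken from the cited works: `local_energy` is a hypothetical stand-in for a short-range ML model, the charges are fixed inputs rather than learned quantities, and the conversion factor is the commonly used approximate value for energies in kcal/mol with distances in Å and charges in units of e.

```python
import math

# Approximate Coulomb conversion factor in kcal mol^-1 Å e^-2 (assumed units).
COULOMB_CONSTANT = 332.0637


def remaining_fraction(power, r, r0=1.0):
    """Fraction of an r^-power interaction left at distance r relative to r0."""
    return (r0 / r) ** power


def coulomb_energy(charges, positions):
    """Pairwise Coulomb energy of point charges (no cutoff, no periodicity)."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(positions[i], positions[j])
            energy += COULOMB_CONSTANT * charges[i] * charges[j] / r
    return energy


def total_energy(charges, positions, local_energy):
    """Local (short-range) model energy plus an explicit long-range Coulomb term."""
    return local_energy(positions) + coulomb_energy(charges, positions)


# At ten times the reference distance, 1/r retains 10% of its strength,
# whereas a dispersion-like 1/r^6 term retains only about one millionth.
print(remaining_fraction(1, 10.0), remaining_fraction(6, 10.0))

# Two opposite unit charges 1 Å apart; the local model is a dummy returning zero.
pair_energy = total_energy(
    charges=[1.0, -1.0],
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    local_energy=lambda positions: 0.0,
)
print(pair_energy)
```

A local descriptor with a finite cutoff would be blind to everything the Coulomb term contributes beyond that cutoff, which is why the electrostatic part is usually treated separately.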

In4.6 theEnough sections knowledge above we have to fold seen a protein? how ML techniques may be exploited to facilitate quick parameter assignment, but also how this approach may provide an alternative to expensive QM methods for simulating structural and physical properties of small molecules. Furthermore, much effort has been put into the design of transferable force fields that are trained on a large number of small molecules and could in principle be used to simulate much larger molecules, e.g., proteins.[199, 232] Even though there are still challenges to overcome such as proper inclusion of long-range effects, ML techniques are developing to become more than just a novel method for extracting force field parameters from large data sets, as they can actually learn the underlying QM potential energy surfaces. Despite the challenges laying ahead, the likelihood that machine-learned force fields will be widely used in protein simulations is very high.[192, 201] The first steps have already been made, for example, the ANI potential (the small molecule ML force field discussed in section 5.1) has continued its development by including QM calculations on water, amino acids, and small peptides in the latest genera- tion.20 More applications of (transferable) ML force fields to larger systems include polypeptide test simulations as well as the simulation of a 50 ns of a cellulose- binding domein protein (1EXG) in the folded state.[201] Nevertheless, Noé et al. remark that, as of yet, ML force fields have not been able to carry out actual pro- tein folding simulations (nor have been able to predict thermodynamic or kinetic properties of proteins).[201] At the moment there remain significant challenges that

20 see the github page for the most up-to-date development: https://github.com/isayev/ASE_ANI learning with machines 38 need to be overcome towards this goal. Thus, the answer to whether our generally applicable biomolecular potential should expand its physics-based know-how by including polarization effects or should just simply learn harder, remains to be seen.

4.7 Boltzmann generators, a not so hypothetical machine anymore

Maybe the answer above was given a long time ago and we have been delusional all along, because Auguste Comte stated in 1830: “Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry − an aberration which is happily almost impossible − it would occasion a rapid and widespread degeneration of that science.”[233] To verify our scientific trust in mathematics, let us see how times have changed and what exactly we have been aiming to achieve since the first Monte Carlo (MC) and Molecular Dynamics (MD) simulations of the fifties.[234, 1]

To carry out insightful atomistic simulations we need, in principle, only two ingredients.[235] First, we need an accurate description of the inter-atomic interactions. This requirement can be represented by a single potential energy function U(x), where x denotes all atomic coordinates of the system. Second, given U(x), we need a statistically sufficient number of realizations of x sampled from the so-called Boltzmann probability distribution P(x) to predict or estimate thermodynamic, equilibrium, or kinetic properties of the system. The Boltzmann distribution is proportional to exp[-U(x)/kBT], where T is the system temperature and kB is Boltzmann's constant. Atomistic simulations in a nutshell: with statistical mechanics we aim to compute the average behavior of physical systems on the basis of their microscopic states.

To illustrate this goal further and make our discussion a bit more concrete, consider how we could answer the following question: what is the probability that a protein will be folded at a given temperature?²¹ To solve this problem we have to evaluate all equilibrium states, x, of the protein. This is extremely difficult because x could hold the coordinates of hundreds of thousands if not millions of particles. We thus have to map a rocky energy "landscape" in a high-dimensional space, made up of an exponentially large number of low-energy regions that are separated by steep hills whose energies are far above the minima at room temperature; see the left-hand side of figure 10.[235] In the protein-folding example, this means we have to evaluate all possible ways of placing all protein atoms in space, of which there are astronomically many. In addition, we have to compute the probability of each such configuration in the equilibrium ensemble, and finally compare the total probability of unfolded and folded configurations.[236]

To sample the Boltzmann distribution of a given system we currently rely on stepwise methods, such as MC or MD, that make tiny changes to x. As the procedure above makes clear, many simulation steps are needed to produce statistically independent samples. This problem is often dubbed the sampling problem. What complicates matters even more is that these simulations get trapped in metastable (long-lived) states. For example, sampling a single folding or unfolding event with atomistic MD may take a year on a supercomputer.[236] In an ideal situation, we could hypothesize a machine that, given U(x), produces unbiased one-shot samples from exp[-U(x)], thus circumventing the sampling problem altogether. Thus far, such a fantastic machine has remained a hypothetical construct.

21 More generally, if we have solutions to such questions, we can learn a lot about how molecules and materials work, but we could also design drug molecules and materials with properties defined beforehand.

Figure 10: Representation of how a deep neural network learns a function Fxz mapping from a rugged landscape, P(x), to a smoother distribution, p(z), using sample configurations. Figure courtesy of ref. [235].

Nevertheless, this fantasy did not stop Noé et al., who elegantly combined deep machine learning and statistical mechanics to develop the so-called Boltzmann generators, which are able to identify transformations of x to a new coordinate z that create a smoother search surface (see the right-hand side of figure 10).[192] Boltzmann generators are trained on the energy function of a many-body system and learn to provide unbiased, one-shot samples from its equilibrium state. This is achieved by training an invertible neural network to learn a coordinate transformation from a system's configurations to a so-called latent space representation, in which the low-energy configurations of different states are close to each other and can be easily sampled. In this fashion, the Boltzmann generator can sample micro-states directly from their equilibrium distribution.

More specifically, the authors trained a deep NN to learn a transformation from x to z such that, when z is sampled from a simple probability distribution p(z), e.g., a Gaussian normal distribution, the NN is able to map z back, through the inverse transformation from z to x, onto a high-probability region of P(x). This network transformation, Fzx, should then provide an x with a high Boltzmann weight.
This exact procedure has been coined the Boltzmann generator; see figure 11 for an illustration of its workflow. The breakthrough of this approach is that these transformations can already be learned with minimal information about the system and, even more importantly, that no arbitrary reaction coordinates are required to drive the sampling. As a benchmark demonstration of their approach, Noé et al. considered the challenging case of the bovine pancreatic trypsin inhibitor (BPTI) protein.[236] Not only could the Boltzmann generators efficiently sample key metastable conformational states and the pathways linking them, which were not included in the training, they also yielded accurate free energy differences between these states.

Figure 11: The Boltzmann generator works as follows: 1. We sample from a simple (e.g., Gaussian) distribution. 2. An invertible deep neural network is trained to transform this simple distribution to a distribution px(x) that is similar to the desired Boltzmann distribution of the system of interest. 3. To compute thermodynamic quantities, the samples are re-weighted to the Boltzmann distribution using statistical mechanics methods. Figure courtesy of ref. [192].
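The three steps in the caption of figure 11 can be mimicked in one dimension. The sketch below is a toy under stated assumptions, not the method of Noé et al.: the invertible map is a fixed affine transform standing in for a trained invertible neural network, the tilted double-well potential and the value of kBT are arbitrary choices, and the re-weighting is plain self-normalized importance sampling. The result is compared with brute-force quadrature, which is only feasible because x is a single number here.

```python
import math
import random

K_T = 1.0  # thermal energy, in the same arbitrary units as the potential


def potential(x):
    """Tilted double well; a toy stand-in for a molecular energy function."""
    return 4.0 * (x * x - 1.0) ** 2 + 0.5 * x


# Hypothetical invertible map F_zx. A real Boltzmann generator would use a
# trained invertible neural network instead of this fixed affine transform.
SCALE, SHIFT = 1.0, 0.0


def latent_to_config(z):
    return SCALE * z + SHIFT


def generated_density(x):
    """Density p_x(x) of the generated samples, from the change of variables."""
    z = (x - SHIFT) / SCALE
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) / abs(SCALE)


def reweighted_right_well_probability(n_samples, seed=0):
    """Estimate P(x > 0) under the Boltzmann distribution by re-weighting."""
    rng = random.Random(seed)
    weight_sum = 0.0
    weighted_hits = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)          # step 1: sample the simple distribution
        x = latent_to_config(z)          # step 2: map to a configuration
        weight = math.exp(-potential(x) / K_T) / generated_density(x)  # step 3
        weight_sum += weight
        if x > 0.0:
            weighted_hits += weight
    return weighted_hits / weight_sum


def reference_right_well_probability(lo=-3.0, hi=3.0, n_grid=60001):
    """Brute-force quadrature of the same probability on a uniform grid."""
    dx = (hi - lo) / (n_grid - 1)
    total = 0.0
    right = 0.0
    for i in range(n_grid):
        x = lo + i * dx
        p = math.exp(-potential(x) / K_T)
        total += p
        if x > 0.0:
            right += p
    return right / total


estimate = reweighted_right_well_probability(20000)
reference = reference_right_well_probability()
print(estimate, reference)
```

The better the generated density matches the Boltzmann distribution, the more uniform the weights become and the fewer samples are wasted, which is the role of the training in step 2.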

Note that sampling these BPTI states is no simple feat, as obtaining them and their associated free energies with ordinary methods would have required extremely long and costly simulations.[236]

To summarize and evaluate the applications of this method: Boltzmann generators can overcome rare-event sampling problems in many-body systems by learning to generate unbiased equilibrium samples from different metastable states in one shot. Furthermore, Boltzmann generators are able to compute free energy differences from disconnected data, interpolate realistic pathways, and explore configuration space, all without a reaction coordinate. In addition, Boltzmann generators include temperature dependence. When a reaction coordinate is included, Boltzmann generators can effectively compute free energy profiles by connecting data. With this new development, Noé et al. reveal a promising new direction in the computational exploration of complex molecular systems.

Nonetheless, additional challenges remain.[235] How will Boltzmann generators perform in even higher dimensions?²² For cases that require a much larger conformational space to be represented in the latent space, such as explicitly solvated biomolecules or defects in materials, Noé et al. note that the method is not yet ready. Enhanced sampling methods combined with Boltzmann generators could possibly offer the right type of relief to improve performance.[235, 237] As of yet we do not know how Boltzmann generators will mix with current methods. To end on a bright note, these challenges are very likely to be overcome, and when that happens Boltzmann generators may become the state-of-the-art sampling technique.
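The rare-event problem that stepwise samplers face can be reproduced with a few lines of Metropolis Monte Carlo. This is a generic textbook sketch, not code from any of the cited works; the symmetric double-well potential, the barrier height of 50 kBT, the step size, and the number of steps are illustrative assumptions.

```python
import math
import random


def double_well(x, barrier=50.0):
    """Symmetric double well with minima at x = -1 and x = +1, in units of kBT."""
    return barrier * (x * x - 1.0) ** 2


def metropolis(n_steps, x0=-1.0, max_step=0.1, seed=0):
    """Stepwise sampling of exp[-U(x)] with small random displacements."""
    rng = random.Random(seed)
    x, u = x0, double_well(x0)
    samples = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-max_step, max_step)
        u_trial = double_well(x_trial)
        # Accept with probability min(1, exp[-(U_trial - U_current)]).
        if u_trial <= u or rng.random() < math.exp(-(u_trial - u)):
            x, u = x_trial, u_trial
        samples.append(x)
    return samples


samples = metropolis(5000)
fraction_right = sum(x > 0.0 for x in samples) / len(samples)
print(fraction_right)
```

By symmetry, half of the equilibrium population lives in the right-hand well, yet a walker started on the left has a negligible chance of crossing a barrier of this height within the run. A one-shot sampler that draws from both wells directly would not suffer from this trapping.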

22 In the reported applications, the dimension of x ranged from two to several thousand; for BPTI, the dimensionality was kept low by simulating the protein in an implicit solvent rather than in a bath of explicit water molecules.

5 Summary: The Red Thread

Let us remind ourselves of the question we set out to answer in this study: What has changed in the outlook on polarizable force field development during the last 20 years? We have seen that many researchers have invested themselves in improving the underlying model, either by gradual improvement of earlier work or by redesigning the force field from scratch. In either case, force field calibration remains laborious, often relying on manual adjustments, ad hoc remedies, or semi-automated methods that apply optimization strategies to an under-determined optimization problem.[238, 239, 118, 240] Even with the computing resources currently available, a truly general-purpose polarizable force field has not yet manifested itself in the biomolecular simulation community.

With the advancements of the last two decades we have come a long way towards that goal; see figure 12 for a timeline of the reviews, perspectives and potentials discussed in this work. We have moved from no real proof of the necessity of including polarizability for accurate predictions to a clear conviction of the contribution of polarizability to the electrostatic interaction.[5, 16] We have moved from a recommendation limited to studies of the charge energetics of protein interiors to many applications of polarizable force fields that have provided new insights, such as the critical role of polarization for the stability of nucleic acids and proteins, base-pair flipping, ion distribution around DNA, diffusion, and permeation of small molecules.[6, 12, 13] The latter applications especially reflect the progress we have made, since only ten years ago the application of polarizable force fields to large biomolecular simulations was still far from routine due to the high associated computational burden.[7] We can thus say that the situation has changed rapidly and that the much-needed advances have most certainly happened.
Still, some aspects have remained the same: for example, the development of advanced classical electrostatic models beyond simple polarization, such as CT and CP, is still ongoing.[9, 13] Furthermore, the standard classical potential is being revised and refined consciously. Yet, simultaneously, more systematic validation is essential to overcome the limitations of the current models.[15]

Figure 12: Timeline of polarizable force field reviews and perspectives over the last 20 years, accompanied by the potentials mentioned in this study. Note that since 2010 reviews have become more frequent, indicating growing motivation among researchers to improve force fields.

More recently, machine learning techniques are becoming integral to the development of more accurate potentials and faster sampling techniques. The classical functional form is being replaced by machine-learned force fields, such as the ANI-1 and FFLUX potentials.[199, 212] In line with this deviation from the classical approach, other approaches appeared, for instance the QUBE biomolecular force field, which questions the necessity of transferable parameter sets.[216] Instead, the QUBE force field derives system-specific parameters in an automated fashion. ML approaches have shown a very promising way to circumvent the expensive calculations required for the parameterization procedure by predicting charges and atomic and molecular polarizabilities.[218, 221, 225] Furthermore, efforts have been made to incorporate non-local physics into atomistic machine learning to overcome the locality ansatz often made in ML models.[231] Finally, novel sampling techniques are being developed, such as the not so hypothetical Boltzmann generators, which have the potential to truly overcome the sampling problem without the need for complicated reaction coordinates.[236]

To conclude the present study: we have discussed the theory, history, methods, and applications of polarizable force fields. All in all, the quality of the force field potential is decisive for any type of sampling technique. Whether we are considering MD, MC, latent space, enhanced or biased sampling methods, if the underlying energy function contains flaws, they are one way or another embedded in the vast amount of data we seek to understand.
Subsequently, we can identify two key challenges associated with the development of polarizable force fields: first, how to determine transferable and accurate parameter sets, and second, how to advance the underlying physical model while maintaining reliable sampling rates of phase space. The classical avenue has finally made the transition towards accepting the need to develop more physically sound models, while ML techniques may provide a new direction to alleviate bottlenecks in the current process towards the development of an accurate potential. Furthermore, it should be stressed that ML techniques may not only be exploited to facilitate quick parameter assignment, but have also shown that they can actually learn the underlying molecular potential and provide new means to develop exciting and alternative sampling techniques. As a final remark, the gain in computer power is often completely overshadowed by the increase in the power of algorithms. So, the journey towards the perfect polarizable force field should still be full of surprises.

References

[1] Berni Julian Alder and Thomas Everett Wainwright. Phase transition for a hard sphere system. The Journal of chemical physics, 27(5):1208–1209, 1957.

[2] Daan Frenkel and Berend Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Elsevier, 2001.

[3] Wilfred F van Gunsteren, Dirk Bakowies, Riccardo Baron, Indira Chandrasekhar, Markus Christen, Xavier Daura, Peter Gee, Daan P Geerke, Alice Glättli, Philippe H Hünenberger, et al. Biomolecular modeling: goals, problems, perspectives. Angewandte Chemie International Edition, 45(25):4064–4092, 2006.

[4] Expert Exchange. Processing power compared, 2015.

[5] Thomas A Halgren and Wolfgang Damm. Polarizable force fields. Current opinion in structural biology, 11(2):236–242, 2001.

[6] Arieh Warshel, Mitsunori Kato, and Andrei V Pisliakov. Polarizable force fields: history, test cases, and prospects. Journal of Chemical Theory and Computation, 3(6):2034–2045, 2007.

[7] Piotr Cieplak, François-Yves Dupradeau, Yong Duan, and Junmei Wang. Polarization effects in molecular mechanical force fields. Journal of Physics: Condensed Matter, 21(33):333102, 2009.

[8] Jay W Ponder, Chuanjie Wu, Pengyu Ren, Vijay S Pande, John D Chodera, Michael J Schnieders, Imran Haque, David L Mobley, Daniel S Lambrecht, Robert A DiStasio Jr, et al. Current status of the amoeba polarizable force field. The journal of physical chemistry B, 114(8):2549–2564, 2010.

[9] Pengyu Ren, Jaehun Chun, Dennis G Thomas, Michael J Schnieders, Marcelo Marucho, Jiajing Zhang, and Nathan A Baker. Biomolecular electrostatics and solvation: a computational perspective. Quarterly reviews of biophysics, 45(4):427–491, 2012.

[10] G Andrés Cisneros, Mikko Karttunen, Pengyu Ren, and Celeste Sagui. Classical electrostatics for biomolecular simulations. Chemical reviews, 114(1):779–814, 2013.

[11] Christopher M Baker. Polarizable force fields for molecular dynamics simulations of biomolecules. Wiley Interdisciplinary Reviews: Computational Molecular Science, 5(2):241–254, 2015.

[12] Justin A Lemkul, Jing Huang, Benoît Roux, and Alexander D MacKerell Jr. An empirical polarizable force field based on the classical drude oscillator model: development history and recent applications. Chemical reviews, 116(9):4983–5013, 2016.

[13] Zhifeng Jing, Chengwen Liu, Sara Y Cheng, Rui Qi, Brandon D Walker, Jean-Philip Piquemal, and Pengyu Ren. Polarizable force fields for biomolecular simulations: Recent advances and applications. Annual Review of biophysics, 48:371–394, 2019.

[14] Dmitry Bedrov, Jean-Philip Piquemal, Oleg Borodin, Alexander D MacKerell Jr, Benoît Roux, and Christian Schröder. Molecular dynamics simulations of ionic liquids and electrolytes using polarizable force fields. Chemical reviews, 2019.

[15] VS Inakollu, Daan P Geerke, Christopher N Rowley, and Haibo Yu. Polarisable force fields: What do they add in biomolecular simulations? arXiv preprint arXiv:1910.14237, 2019.

[16] Josef Melcr and Jean-Philip Piquemal. Accurate biomolecular simulations account for electronic polarization. arXiv preprint arXiv:1909.03732, 2019.

[17] Frank Jensen. Introduction to computational chemistry. John wiley & sons, 2017.

[18] Andreas W Götz, Mark J Williamson, Dong Xu, Duncan Poole, Scott Le Grand, and Ross C Walker. Routine microsecond molecular dynamics simulations with amber on gpus. 1. generalized born. Journal of chemical theory and computation, 8(5):1542–1555, 2012.

[19] Romelia Salomon-Ferrer, Andreas W Götz, Duncan Poole, Scott Le Grand, and Ross C Walker. Routine microsecond molecular dynamics simulations with amber on gpus. 2. explicit solvent particle mesh ewald. Journal of chemical theory and computation, 9(9):3878–3888, 2013.

[20] Chunkit Hong, D Peter Tieleman, and Yi Wang. Microsecond molecular dynamics simulations of lipid mixing. Langmuir, 30(40):11993–12001, 2014.

[21] Terrell L Hill. An introduction to statistical thermodynamics. Courier Corporation, 1986.

[22] Sereina Riniker. Fixed-Charge Atomistic Force Fields for Molecular Dynamics Simulations in the Condensed Phase: An Overview. Journal of Chemical Information and Modeling, 58(3):565–578, 2018.

[23] Michael Levitt and Shneior Lifson. Refinement of protein conformations using a macromolecular energy minimization procedure. Journal of molecular biology, 46(2):269–279, 1969.

[24] H Berendsen. Report of cecam workshop: models for protein dynamics. Orsay, May, 1976.

[25] John Edward Lennard-Jones. On the determination of molecular fields. ii. from the equation of state of gas. Proc. Roy. Soc. A, 106:463–477, 1924.

[26] John E Lennard-Jones. Cohesion. Proceedings of the Physical Society, 43(5):461, 1931.

[27] Xipeng Wang, Simón Ramírez-Hinestrosa, Jure Dobnikar, and Daan Frenkel. The lennard-jones potential: when (not) to use it. Physical Chemistry Chemical Physics, 2020.

[28] Omar Demerdash, Lee-Ping Wang, and Teresa Head-Gordon. Advanced models for water simulations. Wiley Interdisciplinary Reviews: Computational Molecular Science, 8(1):e1355, 2018.

[29] Garland R Marshall. Limiting assumptions in molecular modeling: electrostatics. Journal of computer-aided molecular design, 27(2):107–114, 2013.

[30] Salvatore Cardamone, Timothy J Hughes, and Paul LA Popelier. Multipolar electrostatics. Physical Chemistry Chemical Physics, 16(22):10367–10387, 2014.

[31] AT Hagler. Force field development phase ii: Relaxation of physics-based criteria... or inclusion of more rigorous physics into the representation of molecular energetics. Journal of computer-aided molecular design, pages 1–60, 2018.

[32] Gregory R Medders, Volodymyr Babin, and Francesco Paesani. A critical assessment of two-body and three-body interactions in water. Journal of chemical theory and computation, 9(2):1103–1114, 2013.

[33] Bertrand Guillot. A reappraisal of what we have learnt during three decades of computer simulations on water. Journal of molecular liquids, 101(1-3):219–260, 2002.

[34] Jay W Ponder and David A Case. Force fields for protein simulations. In Advances in protein chemistry, volume 66, pages 27–85. Elsevier, 2003.

[35] U Dinur and AT Hagler. Geometry-dependent atomic charges: Methodology and application to alkanes, aldehydes, ketones, and amides. Journal of Computational Chemistry, 16(2):154–170, 1995.

[36] Daria Galimberti, Alberto Milani, and Chiara Castiglioni. Charge mobility in molecules: charge fluxes from second derivatives of the molecular dipole. The Journal of Chemical Physics, 138(16):164115, 2013.

[37] Pavel Hobza and Zdeněk Havlas. Blue-shifting hydrogen bonds. Chemical reviews, 100(11):4253–4264, 2000.

[38] Kazuhiko Honda. An effective potential function with enhanced charge-transfer-type interaction for hydrogen-bonding liquids. The Journal of chemical physics, 117(8):3558–3569, 2002.

[39] Jan Rezac and Pavel Hobza. Benchmark calculations of interaction energies in noncovalent complexes and their applications. Chemical reviews, 116(9):5038–5071, 2016.

[40] Anthony Stone. The theory of intermolecular forces. OUP Oxford, 2013.

[41] Joseph O Hirschfelder, Charles F Curtiss, Robert Byron Bird, and Maria Goeppert Mayer. Molecular theory of gases and liquids, volume 26. Wiley New York, 1954.

[42] Qiantao Wang, Joshua A Rackers, Chenfeng He, Rui Qi, Christophe Narth, Louis Lagardere, Nohad Gresh, Jay W Ponder, Jean-Philip Piquemal, and Pengyu Ren. General model for treating short-range electrostatic penetration in a force field. Journal of chemical theory and computation, 11(6):2609–2618, 2015.

[43] Christophe Narth, Louis Lagardère, Etienne Polack, Nohad Gresh, Qiantao Wang, David R Bell, Joshua A Rackers, Jay W Ponder, Pengyu Y Ren, and Jean-Philip Piquemal. Scalable improvement of spme multipolar electrostatics in anisotropic polarizable molecular mechanics using a general short-range penetration correction up to quadrupoles. Journal of computational chemistry, 37(5):494–506, 2016.

[44] Joshua A Rackers, Qiantao Wang, Chengwen Liu, Jean-Philip Piquemal, Pengyu Ren, and Jay W Ponder. An optimized charge penetration model for use with the amoeba force field. Physical Chemistry Chemical Physics, 19(1):276–291, 2017.

[45] Maxim Tafipolsky and Kay Ansorg. Toward a physically motivated force field: hydrogen bond directionality from a symmetry-adapted perturbation theory perspective. Journal of chemical theory and computation, 12(3):1267–1279, 2016.

[46] AT Hagler. Quantum derivative fitting and biomolecular force fields: functional form, coupling terms, charge flux, nonbond anharmonicity, and individual dihedral potentials. Journal of chemical theory and computation, 11(12):5555–5572, 2015.

[47] Uri Dinur and Arnold T Hagler. Direct evaluation of nonbonding interactions from ab initio calculations. Journal of the American Chemical Society, 111(14):5149–5151, 1989.

[48] AJ Stone and SL Price. Some new ideas in the theory of intermolecular forces: anisotropic atom-atom potentials. The Journal of Physical Chemistry, 92(12):3325–3335, 1988.

[49] Hamed Eramian, Yong-Hui Tian, Zach Fox, Habtamu Z Beneberu, and Miklos Kertesz. On the anisotropy of van der waals atomic radii of o, s, se, f, cl, br, and i. The Journal of Physical Chemistry A, 117(51):14184–14190, 2013.

[50] Mary J Van Vleet, Alston J Misquitta, and JR Schmidt. New angles on standard force fields: toward a general approach for treating atomic-level anisotropy. Journal of chemical theory and computation, 14(2):739–758, 2018.

[51] Pedro EM Lopes, Jing Huang, Jihyun Shim, Yun Luo, Hui Li, Benoît Roux, and Alexander D MacKerell Jr. Polarizable force field for peptides and proteins based on the classical drude oscillator. Journal of chemical theory and computation, 9(12):5430–5449, 2013.

[52] Haibo Yu and Wilfred F Van Gunsteren. Accounting for polarization in molecular simulation. Computer Physics Communications, 172(2):69–85, 2005.

[53] Steven W Rick and Steven J Stuart. Potentials and algorithms for incorporating polarizability in computer simulations. Reviews in computational chemistry, 18:89–146, 2002.

[54] Daan P Geerke and Wilfred F van Gunsteren. Calculation of the free energy of polarization: Quantifying the effect of explicitly treating electronic polarization on the transferability of force-field parameters. The Journal of Physical Chemistry B, 111(23):6425–6436, 2007.

[55] Wilfred F van Gunsteren, Paul K Weiner, and Anthony J Wilkinson. Computer simulation of biomolecular systems: theoretical and experimental applications, volume 3. Springer Science & Business Media, 2013.

[56] Chris Oostenbrink, Alessandra Villa, Alan E Mark, and Wilfred F Van Gunsteren. A biomolecular force field based on the free enthalpy of hydration and solvation: the gromos force-field parameter sets 53a5 and 53a6. Journal of computational chemistry, 25(13):1656–1676, 2004.

[57] Carl Johan Friedrich Böttcher and OC Van Belle. Dielectrics in static fields. Elsevier Scientific Publishing Company, 1973.

[58] AJ Stone. Distributed polarizabilities. Molecular Physics, 56(5):1065–1082, 1985.

[59] Jon R Maple and Carl S Ewig. An ab initio procedure for deriving atomic polarizability tensors in molecules. The Journal of Chemical Physics, 115(11):4981–4988, 2001.

[60] Wilfried J Mortier, Swapan K Ghosh, and S Shankar. Electronegativity-equalization method for the calculation of atomic charges in molecules. Journal of the American Chemical Society, 108(15):4315–4320, 1986.

[61] Christopher M Baker and Guy H Grant. Modeling aromatic liquids: toluene, phenol, and pyridine. Journal of chemical theory and computation, 3(2):530–548, 2007.

[62] Stefano Rendine, Stefano Pieraccini, Alessandra Forni, and Maurizio Sironi. Halogen bonding in ligand–receptor systems in the framework of classical force fields. Physical Chemistry Chemical Physics, 13(43):19508–19516, 2011.

[63] Sandeep Patel and Charles L Brooks III. Charmm fluctuating charge force field for proteins: I parameterization and application to bulk organic liquid simulations. Journal of computational chemistry, 25(1):1–16, 2004.

[64] Sandeep Patel, Alexander D Mackerell Jr, and Charles L Brooks III. Charmm fluctuating charge force field for proteins: Ii protein/solvent properties from molecular dynamics simulations using a nonadditive electrostatic model. Journal of computational chemistry, 25(12):1504–1514, 2004.

[65] Cui Liu, Yue Li, Bing-Yu Han, Li-Dong Gong, Li-Nan Lu, Zhong-Zhi Yang, and Dong-Xia Zhao. Development of the abeem σπ polarization force field for base pairs with amino acid residue complexes. Journal of chemical theory and computation, 13(5):2098–2111, 2017.

[66] P Drude, RA Millikan, and RC Mann. The theory of optics. Longmans, Green, and Co, 1902.

[67] TP Straatsma and JA McCammon. Molecular dynamics simulations with interaction potentials including polarization development of a noniterative method and application to water. Molecular Simulation, 5(3-4):181–192, 1990.

[68] Guillaume Lamoureux and Benoît Roux. Modeling induced polarization with classical drude oscillators: Theory and molecular dynamics simulation algorithm. The Journal of chemical physics, 119(6):3025–3039, 2003.

[69] Jon R Maple, Yixiang Cao, Wolfgang Damm, Thomas A Halgren, George A Kaminski, Linda Y Zhang, and Richard A Friesner. A polarizable force field and continuum solvation methodology for modeling of protein-ligand interactions. Journal of Chemical Theory and Computation, 1(4):694–715, 2005.

[70] Buyong Ma, Jenn-Huei Lii, and Norman L Allinger. Molecular polarizabilities and induced dipole moments in molecular mechanics. Journal of Computational Chemistry, 21(10):813–825, 2000.

[71] Malcolm E Davis and J Andrew McCammon. Electrostatics in biomolecular structure and dynamics. Chemical Reviews, 90(3):509–521, 1990.

[72] Piotr Cieplak, James Caldwell, and Peter Kollman. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and n-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases. Journal of Computational Chemistry, 22(10):1048–1057, 2001.

[73] Junmei Wang, Romain M Wolf, James W Caldwell, Peter A Kollman, and David A Case. Development and testing of a general amber force field. Journal of computational chemistry, 25(9):1157–1174, 2004.

[74] B Th Thole. Molecular polarizabilities calculated with a modified dipole interaction. Chemical Physics, 59(3):341–350, 1981.

[75] Omar Demerdash, Yuezhi Mao, Tianyi Liu, Martin Head-Gordon, and Teresa Head-Gordon. Assessing many-body contributions to intermolecular interactions of the amoeba force field using energy decomposition analysis of electronic structure calculations. The Journal of chemical physics, 147(16):161721, 2017.

[76] Yuezhi Mao, Omar Demerdash, Martin Head-Gordon, and Teresa Head-Gordon. Assessing ion–water interactions in the amoeba force field using energy decomposition analysis of electronic structure calculations. Journal of chemical theory and computation, 12(11):5422–5437, 2016.

[77] Haibo Yu, Tomas Hansson, and Wilfred F van Gunsteren. Development of a simple, self-consistent polarizable model for liquid water. The Journal of chemical physics, 118(1):221–234, 2003.

[78] Anna-Pitschna E Kunz and Wilfred F van Gunsteren. Development of a nonlinear classical polarization model for liquid water and aqueous solutions: Cos/d. The Journal of Physical Chemistry A, 113(43):11570–11579, 2009.

[79] Chengwen Liu, Rui Qi, Qiantao Wang, J-P Piquemal, and Pengyu Ren. Capturing many-body interactions with classical dipole induction models. Journal of chemical theory and computation, 13(6):2751–2761, 2017.

[80] Christian Kramer, Alexander Spinn, and Klaus R Liedl. Charge anisotropy: where atomic multipoles matter most. Journal of chemical theory and computation, 10(10):4488–4496, 2014.

[81] Junmei Wang, Piotr Cieplak, Ray Luo, and Yong Duan. Development of polarizable gaussian model for molecular mechanical calculations i: Atomic polarizability parameterization to reproduce ab initio anisotropy. Journal of chemical theory and computation, 15(2):1146–1158, 2019.

[82] Richard W Dixon and Peter A Kollman. Advancing beyond the atom-centered model in additive and nonadditive molecular mechanics. Journal of Computational Chemistry, 18(13):1632–1646, 1997.

[83] William L Jorgensen and Patric Schyman. Treatment of halogen bonding in the opls-aa force field: application to potent anti-hiv agents. Journal of chemical theory and computation, 8(10):3895–3901, 2012.

[84] Fang-Yu Lin, Pedro EM Lopes, Edward Harder, Benoît Roux, and Alexander D MacKerell Jr. Polarizable force field for molecular ions based on the classical drude oscillator. Journal of chemical information and modeling, 58(5):993–1004, 2018.

[85] Daniel J Cole, Jonah Z Vilseck, Julian Tirado-Rives, Mike C Payne, and William L Jorgensen. Biomolecular force field parameterization via atoms-in-molecule electron density partitioning. Journal of chemical theory and computation, 12(5):2312–2323, 2016.

[86] Mahmoud AA Ibrahim. Molecular mechanical study of halogen bonding in drug discovery. Journal of computational chemistry, 32(12):2564–2574, 2011.

[87] Xin Cindy Yan, Michael J Robertson, Julian Tirado-Rives, and William L Jorgensen. Improved description of sulfur charge anisotropy in opls force fields: model development and parameterization. The Journal of Physical Chemistry B, 121(27):6626–6636, 2017.

[88] Stephanie CC van der Lubbe and Célia Fonseca Guerra. The nature of hydrogen bonds: a delineation of the role of different energy components on hydrogen bond strengths and lengths. Chemistry–An Asian Journal, 14(16):2760–2769, 2019.

[89] Alice Allen. Quantum Mechanically Derived Biomolecular Force Fields. PhD thesis, University of Cambridge, 2019.

[90] Edward Harder, Victor M Anisimov, Igor V Vorobyov, Pedro EM Lopes, Sergei Y Noskov, Alexander D MacKerell, and Benoît Roux. Atomic level anisotropy in the electrostatic modeling of lone pairs for a polarizable force field based on the classical drude oscillator. Journal of chemical theory and computation, 2(6):1587–1597, 2006.

[91] Ramu Anandakrishnan, Charles Baker, Saeed Izadi, and Alexey V Onufriev. Point charges optimally placed to represent the multipole expansion of charge distributions. PloS one, 8(7), 2013.

[92] Oliver T Unke, Mike Devereux, and Markus Meuwly. Minimal distributed charges: Multipolar quality at the cost of point charge electrostatics. The Journal of chemical physics, 147(16):161712, 2017.

[93] Jan Florian, Myron F Goodman, and Arieh Warshel. Theoretical investigation of the binding free energies and key substrate-recognition components of the replication fidelity of human dna polymerase β. The Journal of Physical Chemistry B, 106(22):5739–5753, 2002.

[94] Robert B Murphy, Dean M Philipp, and Richard A Friesner. A mixed quantum mechanics/molecular mechanics (qm/mm) method for large-scale modeling of chemistry in protein environments. Journal of Computational Chemistry, 21(16):1442–1457, 2000.

[95] François Dehez, János G Ángyán, Ignacio Soteras Gutiérrez, F Javier Luque, Klaus Schulten, and Christophe Chipot. Modeling induction phenomena in intermolecular interactions with an ab initio force field. Journal of chemical theory and computation, 3(6):1914–1926, 2007.

[96] Hans Martin Senn and Walter Thiel. Qm/mm methods for biomolecular systems. Angewandte Chemie International Edition, 48(7):1198–1229, 2009.

[97] Yu-Hong Tan and Ray Luo. Continuum treatment of electronic polarization effect. The Journal of chemical physics, 126(9):03B606, 2007.

[98] Pengyu Ren and Jay W Ponder. Consistent treatment of inter- and intramolecular polarization in molecular mechanics calculations. Journal of computational chemistry, 23(16):1497–1506, 2002.

[99] Pengyu Ren and Jay W Ponder. Polarizable atomic multipole water model for molecular mechanics simulation. The Journal of Physical Chemistry B, 107(24):5933–5947, 2003.

[100] Alan Grossfield, Pengyu Ren, and Jay W Ponder. Ion solvation thermodynamics from simulation with a polarizable force field. Journal of the American Chemical Society, 125(50):15671–15682, 2003.

[101] Pengyu Ren and Jay W Ponder. Temperature and pressure dependence of the amoeba water model. The Journal of Physical Chemistry B, 108(35):13427–13437, 2004.

[102] Dian Jiao, Jiajing Zhang, Robert E Duke, Guohui Li, Michael J Schnieders, and Pengyu Ren. Trypsin-ligand binding free energies from explicit and implicit solvent simulations with polarizable potential. Journal of computational chemistry, 30(11):1701–1711, 2009.

[103] Yue Shi, Dian Jiao, Michael J Schnieders, and Pengyu Ren. Trypsin-ligand binding free energy calculation with amoeba. In 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 2328–2331. IEEE, 2009.

[104] Janet Newman, Vincent J Fazio, Tom T Caradoc-Davies, Kim Branson, and Thomas S Peat. Practical aspects of the sampl challenge: providing an extensive experimental data set for the modeling community. Journal of biomolecular screening, 14(10):1245–1250, 2009.

[105] Nohad Gresh, G Andrés Cisneros, Thomas A Darden, and Jean-Philip Piquemal. Anisotropic, polarizable molecular mechanics studies of inter- and intramolecular interactions and ligand-macromolecule complexes. a bottom-up strategy. Journal of chemical theory and computation, 3(6):1960–1986, 2007.

[106] Dian Jiao, Christopher King, Alan Grossfield, Thomas A Darden, and Pengyu Ren. Simulation of ca2+ and mg2+ solvation using polarizable atomic multipole potential. The Journal of Physical Chemistry B, 110(37):18553–18559, 2006.

[107] Guillaume Lamoureux and Benoît Roux. Absolute hydration free energy scale for alkali and halide ions established from simulations with a polarizable force field. The journal of physical chemistry B, 110(7):3308–3322, 2006.

[108] Troy W Whitfield, Sameer Varma, Edward Harder, Guillaume Lamoureux, Susan B Rempe, and Benoit Roux. Theoretical study of aqueous solvation of k+ comparing ab initio, polarizable, and fixed-charge models. Journal of chemical theory and computation, 3(6):2068–2082, 2007.

[109] Revati Kumar, Fang-Fang Wang, Glen R Jenness, and Kenneth D Jordan. A second generation distributed point polarizable water model. The Journal of chemical physics, 132(1):014309, 2010.

[110] GA Cisneros, S Na-Im Tholander, O Parisel, TA Darden, D Elking, L Perera, and J-P Piquemal. Simple formulas for improved point-charge electrostatics in classical force fields and hybrid quantum mechanical/molecular mechanical embedding. International journal of quantum chemistry, 108(11):1905–1912, 2008.

[111] Johnny C Wu, Jean-Philip Piquemal, Robin Chaudret, Peter Reinhardt, and Pengyu Ren. Polarizable molecular dynamics simulation of zn (ii) in water using the amoeba force field. Journal of chemical theory and computation, 6(7):2059–2070, 2010.

[112] Daniel Hagberg, Gunnar Karlström, Björn O Roos, and Laura Gagliardi. The coordination of uranyl in water: a combined quantum chemical and molecular simulation study. Journal of the American Chemical Society, 127(41):14250–14256, 2005.

[113] Maxim Tafipolsky and Bernd Engels. Accurate intermolecular potentials with physically grounded electrostatics. Journal of Chemical Theory and Computation, 7(6):1791–1803, 2011.

[114] Celeste Sagui, Lee G Pedersen, and Thomas A Darden. Towards an accurate representation of electrostatics in classical force fields: Efficient implementation of multipolar interactions in biomolecular simulations. The Journal of chemical physics, 120(1):73–87, 2004.

[115] Brad A Bauer and Sandeep Patel. Recent applications and developments of charge equilibration force fields for modeling dynamical charges in classical molecular dynamics simulations. Theoretical Chemistry Accounts, 131(3):1153, 2012.

[116] Timothy R Lucas, Brad A Bauer, and Sandeep Patel. Charge equilibration force fields for molecular dynamics simulations of lipids, bilayers, and integral membrane protein systems. Biochimica et Biophysica Acta (BBA)-Biomembranes, 1818(2):318–329, 2012.

[117] Andreas Krämer, Marco Hülsmann, Thorsten Köddermann, and Dirk Reith. Automated parameterization of intermolecular pair potentials using global optimization techniques. Computer Physics Communications, 185(12):3228–3239, 2014.

[118] Robin M Betz and Ross C Walker. Paramfit: Automated optimization of force field parameters for molecular dynamics simulations. Journal of computational chemistry, 36(2):79–87, 2015.

[119] Lee-Ping Wang, Todd J Martinez, and Vijay S Pande. Building force fields: An automatic, systematic, and reproducible approach. The journal of physical chemistry letters, 5(11):1885–1891, 2014.

[120] Lee-Ping Wang, Teresa Head-Gordon, Jay W Ponder, Pengyu Ren, John D Chodera, Peter K Eastman, Todd J Martinez, and Vijay S Pande. Systematic improvement of a classical molecular model of water. The Journal of Physical Chemistry B, 117(34):9956–9972, 2013.

[121] Lei Huang and Benoît Roux. Automated force field parameterization for nonpolarizable and polarizable atomic models based on ab initio target data. Journal of chemical theory and computation, 9(8):3543–3556, 2013.

[122] Alessandro Barducci, Jim Pfaendtner, and Massimiliano Bonomi. Tackling sampling challenges in biomolecular simulations. In Molecular modeling of proteins, pages 151–171. Springer, 2015.

[123] Christopher M Baker and Robert B Best. Matching of additive and polarizable force fields for multiscale condensed phase simulations. Journal of chemical theory and computation, 9(6):2826–2837, 2013.

[124] Jiajing Zhang, Yue Shi, and Pengyu Ren. Polarizable force fields for scoring protein–ligand interactions. Protein-Ligand Interactions, pages 99–120, 2012.

[125] Alex D MacKerell Jr, Donald Bashford, MLDR Bellott, Roland Leslie Dunbrack Jr, Jeffrey D Evanseck, Martin J Field, Stefan Fischer, Jiali Gao, H Guo, Sookhee Ha, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The journal of physical chemistry B, 102(18):3586–3616, 1998.

[126] Christopher M Baker, Pedro EM Lopes, Xiao Zhu, Benoît Roux, and Alexander D MacKerell Jr. Accurate calculation of hydration free energies using pair-specific lennard-jones parameters in the charmm drude polarizable force field. Journal of chemical theory and computation, 6(4):1181–1198, 2010.

[127] Haibo Yu, Troy W Whitfield, Edward Harder, Guillaume Lamoureux, Igor Vorobyov, Victor M Anisimov, Alexander D MacKerell Jr, and Benoît Roux. Simulating monovalent and divalent ions in aqueous solution using a drude polarizable force field. Journal of chemical theory and computation, 6(3):774–786, 2010.

[128] Peter Eastman, Jason Swails, John D Chodera, Robert T McGibbon, Yutong Zhao, Kyle A Beauchamp, Lee-Ping Wang, Andrew C Simmonett, Matthew P Harrigan, Chaya D Stern, et al. Openmm 7: Rapid development of high performance algorithms for molecular dynamics. PLoS computational biology, 13(7):e1005659, 2017.

[129] Louis Lagardère, Luc-Henri Jolly, Filippo Lipparini, Félix Aviat, Benjamin Stamm, Zhifeng F Jing, Matthew Harger, Hedieh Torabifard, G Andrés Cisneros, Michael J Schnieders, et al. Tinker-hp: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields. Chemical Science, 9(4):956–972, 2018.

[130] Jing Huang, Justin A Lemkul, Peter K Eastman, and Alexander D MacKerell Jr. Molecular dynamics simulations using the drude polarizable force field on gpus with openmm: Implementation, validation, and benchmarks. Journal of computational chemistry, 39(21):1682–1689, 2018.

[131] Changsheng Zhang, Chao Lu, Zhifeng Jing, Chuanjie Wu, Jean-Philip Piquemal, Jay W Ponder, and Pengyu Ren. Amoeba polarizable atomic multipole force field for nucleic acids. Journal of chemical theory and computation, 14(4):2084–2108, 2018.

[132] Poonam Pandey, Asaminew H Aytenfisu, Alexander D MacKerell Jr, and Sairam S Mallajosyula. Drude polarizable force field parametrization of carboxylate and n-acetyl amine carbohydrate derivatives. Journal of chemical theory and computation, 15(9):4982–5000, 2019.

[133] Chao Lu, Xubin Li, Dongsheng Wu, Lianqing Zheng, and Wei Yang. Predictive sampling of rare conformational events in aqueous solution: designing a generalized orthogonal space tempering method. Journal of chemical theory and computation, 12(1):41–52, 2015.

[134] Brooke E Husic and Vijay S Pande. Markov state models: From an art to a science. Journal of the American Chemical Society, 140(7):2386–2396, 2018.

[135] Ron Elber. Perspective: Computer simulations of long time dynamics. The Journal of chemical physics, 144(6):060901, 2016.

[136] Frederic Célerse, Louis Lagardère, Etienne Derat, and Jean-Philip Piquemal. Massively parallel implementation of steered molecular dynamics in tinker-hp: comparisons of polarizable and non-polarizable simulations of realistic systems. Journal of chemical theory and computation, 2019.

[137] Anthony J Stone. Electrostatic damping functions and the penetration energy. The Journal of Physical Chemistry A, 115(25):7017–7027, 2011.

[138] Piet Th Van Duijnen and Marcel Swart. Molecular and atomic polarizabilities: Thole’s model revisited. The Journal of Physical Chemistry A, 102(14):2399–2407, 1998.

[139] Jun Wang, Piotr Cieplak, Qin Cai, Meng-Juei Hsieh, Junmei Wang, Yong Duan, and Ray Luo. Development of polarizable models for molecular mechanical calculations. 3. polarizable water models conforming to thole polarization screening schemes. The Journal of Physical Chemistry B, 116(28):7999–8008, 2012.

[140] Dennis M Elking, G Andrés Cisneros, Jean-Philip Piquemal, Thomas A Darden, and Lee G Pedersen. Gaussian multipole model (gmm). Journal of chemical theory and computation, 6(1):190–202, 2009.

[141] Robert E Duke, Oleg N Starovoytov, Jean-Philip Piquemal, and G Andrés Cisneros. Gem*: A molecular electronic density-based force field for molecular dynamics simulations. Journal of chemical theory and computation, 10(4):1361–1365, 2014.

[142] Robin Chaudret, Nohad Gresh, Christophe Narth, Louis Lagardère, Thomas A Darden, G Andrés Cisneros, and Jean-Philip Piquemal. S/g-1: An ab initio force-field blending frozen hermite gaussian densities and distributed multipoles. proof of concept and first applications to metal cations. The Journal of Physical Chemistry A, 118(35):7598–7612, 2014.

[143] Jean-Philip Piquemal and G Andrés Cisneros. Status of the gaussian electrostatic model, a density-based polarizable force field, 2016.

[144] C Schröder, Johannes Hunger, A Stoppa, Richard Buchner, and O Steinhauser. On the collective network of ionic liquid/water mixtures. ii. decomposition and interpretation of dielectric spectra. The Journal of chemical physics, 129(18):184501, 2008.

[145] Yun Luo, Wei Jiang, Haibo Yu, Alexander D MacKerell, and Benoît Roux. Simulation study of ion pairing in concentrated aqueous salt solutions with a polarizable force field. Faraday discussions, 160:135–149, 2013.

[146] Justin A Lemkul and Alexander D MacKerell Jr. Balancing the interactions of mg2+ in aqueous solution and with nucleic acid moieties for a polarizable force field based on the classical drude oscillator model. The Journal of Physical Chemistry B, 120(44):11436–11448, 2016.

[147] Jejoong Yoo and Aleksei Aksimentiev. New tricks for old dogs: improving the accuracy of biomolecular force fields by pair-specific corrections to non-bonded interactions. Physical Chemistry Chemical Physics, 20(13):8432–8449, 2018.

[148] Aude Marjolin, Christophe Gourlaouen, Carine Clavaguéra, Pengyu Y Ren, Jean-Philip Piquemal, and Jean-Pierre Dognon. Hydration gibbs free energies of open and closed shell trivalent lanthanide and actinide cations from polarizable molecular dynamics. Journal of molecular modeling, 20(10):2471, 2014.

[149] Yi-Jung Tu, Matthew J Allen, and G Andrés Cisneros. Simulations of the water exchange dynamics of lanthanide ions in 1-ethyl-3-methylimidazolium ethyl sulfate ([emim][etso4]) and water. Physical Chemistry Chemical Physics, 18(44):30323–30333, 2016.

[150] Wei Jiang, David J Hardy, James C Phillips, Alexander D MacKerell Jr, Klaus Schulten, and Benoît Roux. High-performance scalable molecular dynamics simulations of a polarizable force field based on classical drude oscillators in namd. The journal of physical chemistry letters, 2(2):87–92, 2010.

[151] Joshua A Rackers, Zhi Wang, Chao Lu, Marie L Laury, Louis Lagardere, Michael J Schnieders, Jean-Philip Piquemal, Pengyu Ren, and Jay W Ponder. Tinker 8: software tools for molecular design. Journal of chemical theory and computation, 14(10):5273–5289, 2018.

[152] Josef Melcr, Hector Martinez-Seara, Ricky Nencini, Jiří Kolafa, Pavel Jungwirth, and OH Samuli Ollila. Accurate binding of sodium and calcium to a popc bilayer by effective inclusion of electronic polarization. The Journal of Physical Chemistry B, 122(16):4546–4557, 2018.

[153] Josef Melcr, Tiago Mendes Ferreira, Pavel Jungwirth, and OH Samuli Ollila. Improved cation binding to lipid bilayer with negatively charged pops by effective inclusion of electronic polarization. Journal of chemical theory and computation, 2019.

[154] Hanne Antila, Pavel Buslaev, Fernando Favela-Rosales, Tiago M Ferreira, Ivan Gushchin, Matti Javanainen, Batuhan Kav, Jesper J Madsen, Josef Melcr, Markus S Miettinen, et al. Headgroup structure and cation binding in phosphatidylserine lipid bilayers. The Journal of Physical Chemistry B, 123(43):9066–9079, 2019.

[155] Elise Duboué-Dijon, Philip E Mason, Henry E Fischer, and Pavel Jungwirth. Hydration and ion pairing in aqueous mg2+ and zn2+ solutions: force-field description aided by neutron scattering experiments and ab initio molecular dynamics simulations. The Journal of Physical Chemistry B, 122(13):3296–3306, 2017.

[156] Nohad Gresh, Jean-Philip Piquemal, and Morris Krauss. Representation of zn (ii) complexes in polarizable molecular mechanics. further refinements of the electrostatic and short-range contributions. comparisons with parallel ab initio computations. Journal of computational chemistry, 26(11):1113–1130, 2005.

[157] Jean-Philip Piquemal, Hilaire Chevreau, and Nohad Gresh. Toward a separate reproduction of the contributions to the hartree-fock and dft intermolecular interaction energies by polarizable molecular mechanics with the sibfa potential. Journal of chemical theory and computation, 3(3):824–837, 2007.

[158] Peng Xu, Emilie B Guidez, Colleen Bertoni, and Mark S Gordon. Perspective: Ab initio force field methods derived from quantum mechanics. The Journal of Chemical Physics, 148(9):090901, 2018.

[159] Jean-Philip Piquemal, Nohad Gresh, and Claude Giessner-Prettre. Improved formulas for the calculation of the electrostatic contribution to the intermolec- ular interaction energy from multipolar expansion of the electronic distribu- tion. The Journal of Physical Chemistry A, 107(48):10353–10359, 2003.

[160] Louis Lagardère, Léa El-Khoury, Sehr Naseem-Khan, Félix Aviat, Nohad Gresh, and Jean-Philip Piquemal. Towards scalable and accurate molecular dynamics using the sibfa polarizable force field. In AIP Conference Proceedings, volume 1906, page 030018. AIP Publishing, 2017.

[161] Krystel El Hage, Jean-Philip Piquemal, Zeina Hobaika, Richard G Maroun, and Nohad Gresh. Could an anisotropic molecular mechanics/dynamics po- tential account for sigma hole effects in the complexes of halogenated com- pounds? Journal of computational chemistry, 34(13):1125–1135, 2013.

[162] Nohad Gresh, Judit E Sponer, Mike Devereux, Konstantinos Gkionis, Benoit de Courcy, Jean-Philip Piquemal, and Jiri Sponer. Stacked and h-bonded cy- tosine dimers. analysis of the intermolecular interaction energies by parallel quantum chemistry and polarizable molecular mechanics. The Journal of Phys- ical Chemistry B, 119(30):9477–9495, 2015. references 56 [163] Chengwen Liu, Jean-Philip Piquemal, and Pengyu Ren. Amoeba+ classical potential for modeling molecular interactions. Journal of chemical theory and computation, 2019.

[164] Jing Huang, Andrew C Simmonett, Frank C Pickard IV, Alexander D MacK- erell Jr, and Bernard R Brooks. Mapping the drude polarizable force field onto a multipole and induced dipole model. The Journal of chemical physics, 147(16):161702, 2017.

[165] Akshaya K Das, Lars Urban, Itai Leven, Matthias Loipersberger, Abdulrah- man Aldossary, Martin Head-Gordon, and Teresa Head-Gordon. Develop- ment of an advanced force field for water using variational energy decompo- sition analysis. arXiv preprint arXiv:1905.07816, 2019.

[166] Joshua A Rackers and Jay W Ponder. Classical pauli repulsion: An anisotropic, atomic multipole model. The Journal of chemical physics, 150(8):084104, 2019.

[167] Pier Paolo Poier, Louis Lagardère, Jean-Philip Piquemal, and Frank Jensen. Molecular dynamics using nonvariational polarizable force fields: Theory, periodic boundary conditions implementation, and application to the bond capacity model. Journal of chemical theory and computation, 15(11):6213–6224, 2019.

[168] Tommaso Giovannini, Alessandra Puglisi, Matteo Ambrosetti, and Chiara Cappelli. Polarizable qm/mm approach with fluctuating charges and fluctu- ating dipoles: The qm/fqfµ model. Journal of chemical theory and computation, 15(4):2233–2245, 2019.

[169] Rui Qi, Zhifeng Jing, Chengwen Liu, Jean-Philip Piquemal, Kevin N Dalby, and Pengyu Ren. Elucidating the phosphate binding mode of phosphate- binding protein: The critical effect of buffer solution. The Journal of Physical Chemistry B, 122(24):6371–6376, 2018.

[170] Louis Lagardère, Félix Aviat, and Jean-Philip Piquemal. Pushing the limits of multiple-time-step strategies for polarizable point dipole molecular dynamics. The journal of physical chemistry letters, 10:2593–2599, 2019.

[171] Daniele Loco, Louis Lagardère, Andres Cisneros, Giovanni Scalmani, Michael Frisch, Filippo Lipparini, Benedetta Mennucci, and Jean-Philip Piquemal. To- wards large scale hybrid qm/mm dynamics of complex systems with ad- vanced point dipole polarizable embeddings. Chemical Science, 2019.

[172] Daniele Loco, Louis Lagardère, Stefano Caprasecca, Filippo Lipparini, Benedetta Mennucci, and Jean-Philip Piquemal. Hybrid qm/mm molecular dynamics with amoeba polarizable embedding. Journal of chemical theory and computation, 13(9):4025–4033, 2017.

[173] Fang-Yu Lin and Alexander D MacKerell. Improved modeling of cation-π and anion-ring interactions using the drude polarizable empirical force field for proteins. Journal of computational chemistry, 2019. references 57 [174] Changsheng Zhang, David Bell, Matthew Harger, and Pengyu Ren. Polar- izable multipole-based force field for aromatic molecules and nucleobases. Journal of chemical theory and computation, 13(2):666–678, 2017.

[175] Anhui Wang, Zhichao Zhang, and Guohui Li. Higher accuracy achieved in the simulations of protein structure refinement, protein folding, and intrinsi- cally disordered proteins using polarizable force fields. The journal of physical chemistry letters, 9(24):7110–7116, 2018.

[176] Paul Robustelli, Stefano Piana, and David E Shaw. Developing a molecular dynamics force field for both folded and disordered protein states. Proceedings of the National Academy of Sciences, 115(21):E4758–E4766, 2018.

[177] Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L de Groot, Helmut Grubmüller, and Alexander D MacKerell Jr. Charmm36m: an improved force field for folded and intrinsically disordered proteins. Nature methods, 14(1):71, 2017.

[178] Jing Huang and Alexander D MacKerell Jr. Force field development and simulations of intrinsically disordered proteins. Current opinion in structural biology, 48:40–48, 2018.

[179] Paul S Nerenberg and Teresa Head-Gordon. New developments in force fields for biomolecular simulations. Current opinion in structural biology, 49:129–138, 2018.

[180] Mohamad Mohebifar, Erin R Johnson, and Christopher N Rowley. Evaluat- ing force-field london dispersion coefficients using the exchange-hole dipole moment model. Journal of chemical theory and computation, 13(12):6146–6157, 2017.

[181] Evan T Walters, Mohamad Mohebifar, Erin R Johnson, and Christopher N Rowley. Evaluating the london dispersion coefficients of protein force fields using the exchange-hole dipole moment model. The Journal of Physical Chem- istry B, 122(26):6690–6701, 2018.

[182] Koen M Visscher and Daan P Geerke. Deriving force-field parameters from first principles using a polarizable and higher order dispersion model. Journal of chemical theory and computation, 15(3):1875–1883, 2019.

[183] Derk P Kooi and Paola Gori-Giorgi. A variational approach to london disper- sion interactions without density distortion. The journal of physical chemistry letters, 10(7):1537–1541, 2019.

[184] Eliot Boulanger and Walter Thiel. Toward qm/mm simulation of enzymatic reactions with the drude oscillator polarizable force field. Journal of chemical theory and computation, 10(4):1795–1809, 2014.

[185] Gerhard König, Frank Pickard, Jing Huang, Walter Thiel, Alexander MacK- erell, Bernard Brooks, and Darrin York. A comparison of qm/mm simulations with and without the drude oscillator model based on hydration free energies of simple solutes. Molecules, 23(10):2695, 2018. references 58 [186] Abir Ganguly, Eliot Boulanger, and Walter Thiel. Importance of mm polariza- tion in qm/mm studies of enzymatic reactions: Assessment of the qm/mm drude oscillator model. Journal of chemical theory and computation, 13(6):2954– 2961, 2017.

[187] Tommaso Giovannini, Rosario Roberto Riso, Matteo Ambrosetti, Alessandra Puglisi, and Chiara Cappelli. Electronic transitions for a fully polarizable qm/mm approach based on fluctuating charges and fluctuating dipoles: Lin- ear and corrected linear response regimes. arXiv preprint arXiv:1906.03852, 2019.

[188] David Semrouni, Ashwani Sharma, Jean-Pierre Dognon, Gilles Ohanessian, and Carine Clavaguéra. Finite temperature infrared spectra from polarizable molecular dynamics simulations. Journal of chemical theory and computation, 10(8):3190–3199, 2014.

[189] Florian Thaunay, Jean-Pierre Dognon, Gilles Ohanessian, and Carine Clavaguéra. Vibrational mode assignment of finite temperature infrared spec- tra using the amoeba polarizable force field. Physical Chemistry Chemical Physics, 17(39):25968–25977, 2015.

[190] Florian Thaunay, Florent Calvo, Edith Nicol, Gilles Ohanessian, and Carine Clavaguéra. Infrared spectra of deprotonated dicarboxylic acids: Irmpd spec- troscopy and empirical valence-bond modeling. ChemPhysChem, 20(6):803– 814, 2019.

[191] Paul Adrien Maurice Dirac. Quantum mechanics of many-electron systems. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathe- matical and Physical Character, 123(792):714–733, 1929.

[192] Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, and Cecilia Clementi. Machine learning for molecular simulation. arXiv preprint arXiv:1911.02792, 2019.

[193] J Hachmann, TL Windus, JA McLean, V Allwardt, AC Schrimpe-Rutledge, MAF Afzal, and M Haghighatlari. Framing the role of big data and modern data science in chemistry. Technical report, tech. rep, 2018.

[194] Mojtaba Haghighatlari and Johannes Hachmann. Advances of machine learn- ing in molecular modeling and simulation. Current Opinion in Chemical Engi- neering, 23:51–57, 2019.

[195] Benjamin Sanchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular de- sign using machine learning: Generative models for matter engineering. Sci- ence, 361(6400):360–365, 2018.

[196] Keith T Butler, Daniel W Davies, Hugh Cartwright, Olexandr Isayev, and Aron Walsh. Machine learning for molecular and materials science. Nature, 559(7715):547–555, 2018.

[197] Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. Schnet–a deep learning architecture for molecules and mate- rials. The Journal of Chemical Physics, 148(24):241722, 2018. references 59 [198] Justin Chen, Jiming Chen, Giovanni Pinamonti, and Cecilia Clementi. Learn- ing effective molecular models from experimental observables. Journal of chem- ical theory and computation, 14(7):3849–3858, 2018.

[199] Justin S Smith, Olexandr Isayev, and Adrian E Roitberg. Ani-1: an extensible neural network potential with dft accuracy at force field computational cost. Chemical science, 8(4):3192–3203, 2017.

[200] Daniel J Cole, Joshua T Horton, Lauren Nelson, and Vadiraj Kurdekar. The future of force fields in computer-aided drug design, 2019.

[201] Frank Noé, Gianni De Fabritiis, and Cecilia Clementi. Machine learning for protein folding and dynamics. Current Opinion in Structural Biology, 60:77–84, 2020.

[202] Alessandro Laio and Michele Parrinello. Escaping free-energy minima. Pro- ceedings of the National Academy of Sciences, 99(20):12562–12566, 2002.

[203] Jordane Preto and Cecilia Clementi. Fast recovery of free energy landscapes via diffusion-map-directed molecular dynamics. Physical Chemistry Chemical Physics, 16(36):19181–19191, 2014.

[204] Christopher G Mayne, Melanie Muller, and Emad Tajkhorshid. Parameteriz- ing small molecules using the force field toolkit (fftk). University of Illinois at Urbana-Champaign, 2015.

[205] Alexandre Tkatchenko and Matthias Scheffler. Accurate molecular van der waals interactions from ground-state electron density and free-atom reference data. Physical review letters, 102(7):073005, 2009.

[206] Paul LA Popelier. Qctff: on the construction of a novel protein force field. International Journal of Quantum Chemistry, 115(16):1005–1011, 2015.

[207] François Zielinski, Peter I Maxwell, Timothy L Fletcher, Stuart J Davie, Nicodemo Di Pasquale, Salvatore Cardamone, Matthew JL Mills, and Paul LA Popelier. Geometry optimization with machine trained topological atoms. Sci- entific reports, 7(1):1–18, 2017.

[208] MA Blanco, A Martín Pendás, and E Francisco. Interacting quantum atoms: a correlated energy decomposition scheme based on the quantum theory of atoms in molecules. Journal of chemical theory and computation, 1(6):1096–1109, 2005.

[209] Noel Cressie. Statistics for spatial data. Terra Nova, 4(5):613–617, 1992.

[210] Joseph CR Thacker, Alex L Wilson, Zak E Hughes, Matthew J Burn, Peter I Maxwell, and Paul LA Popelier. Towards the simulation of biomolecules: optimisation of peptide-capped glycine using fflux. Molecular Simulation, 44(11):881–890, 2018.

[211] Zak E Hughes, Joseph CR Thacker, Alex L Wilson, and Paul LA Popelier. Description of potential energy surfaces of molecules using fflux machine learning models. Journal of chemical theory and computation, 15(1):116–126, 2018. references 60 [212] Zak E Hughes, Emmanuel Ren, Joseph CR Thacker, Benjamin CB Symons, Arnaldo F Silva, and Paul LA Popelier. A fflux water model: Flexible, polariz- able and with a multipolar description of electrostatics. Journal of computational chemistry, 41(7):619–628, 2020.

[213] Paul Popelier, Matthew Burn, Benjamin Symons, Alex Wilson, and Zak Hughes. Fflux: on knowledgeable quantum atoms. In Report on the Work- shop “Developing High-Dimensional Potential Energy Surfaces: From the Gas Phase to Materials” Göttingen, April 24-26, 2019, page 118.

[214] Alice EA Allen, Michael C Payne, and Daniel J Cole. Harmonic force constants for molecular mechanics force fields via hessian matrix projection. Journal of chemical theory and computation, 14(1):274–281, 2018.

[215] Jorge M Seminario. Calculation of intramolecular force fields from second- derivative tensors. International journal of quantum chemistry, 60(7):1271–1277, 1996.

[216] Joshua T Horton, Alice EA Allen, Leela S Dodda, and Daniel J Cole. Qubekit: automating the derivation of force field parameters from quantum mechanics. Journal of chemical information and modeling, 59(4):1366–1381, 2019.

[217] Daniel J Cole, Israel Cabeza de Vaca, and William L Jorgensen. Computation of protein–ligand binding free energies using quantum mechanical bespoke force fields. MedChemComm, 10(7):1116–1120, 2019.

[218] Patrick Bleiziffer, Kay Schaller, and Sereina Riniker. Machine learning of par- tial charges derived from high-quality quantum-mechanical calculations. Jour- nal of chemical information and modeling, 58(3):579–590, 2018.

[219] David Weininger, Arthur Weininger, and Joseph L Weininger. Smiles. 2. algo- rithm for generation of unique smiles notation. Journal of chemical information and computer sciences, 29(2):97–101, 1989.

[220] Louis P Lee, Daniel J Cole, Chris-Kriton Skylaris, William L Jorgensen, and Mike C Payne. Polarized protein-specific charges from atoms-in-molecule electron density partitioning. Journal of chemical theory and computation, 9(7):2981–2991, 2013.

[221] Esther Heid, Markus Fleck, Payal Chatterjee, Christian Schröder, and Alexan- der D MacKerell Jr. Toward prediction of electrostatic parameters for force fields that explicitly treat electronic polarization. Journal of chemical theory and computation, 15(4):2460–2469, 2019.

[222] Uwe Koch and Anthony J Stone. Conformational dependence of the molecu- lar charge distribution and its influence on intermolecular interactions. Journal of the Chemical Society, Faraday Transactions, 92(10):1701–1708, 1996.

[223] Pär Söderhjelm, Jacob Kongsted, and Ulf Ryde. Conformational depen- dence of isotropic polarizabilities. Journal of chemical theory and computation, 7(5):1404–1414, 2011. references 61 [224] Diptarka Hait and Martin Head-Gordon. How accurate are static polarizabil- ity predictions from density functional theory? an assessment over 132 species at equilibrium geometry. Physical Chemistry Chemical Physics, 20(30):19800– 19810, 2018.

[225] David M Wilkins, Andrea Grisafi, Yang Yang, Ka Un Lao, Robert A DiStasio, and Michele Ceriotti. Accurate molecular polarizabilities with coupled cluster theory and machine learning. Proceedings of the National Academy of Sciences, 116(9):3401–3406, 2019.

[226] Hiroko X Kondo, Ayumi Kusaka, Colin K Kitakawa, Jinta Onari, Shusuke Yamanaka, Haruki Nakamura, and Yu Takano. Hydrogen bond donors and acceptors are generally depolarized in α-helices as revealed by a molecular tailoring approach. Journal of computational chemistry, 40(23):2043–2052, 2019.

[227] Jan Hermann, Robert A DiStasio Jr, and Alexandre Tkatchenko. First- principles models for van der waals interactions in molecules and materials: Concepts, theory, and applications. Chemical Reviews, 117(6):4714–4758, 2017.

[228] Benjamin Nebgen, Nicholas Lubbers, Justin S Smith, Andrew E Sifain, Andrey Lokhov, Olexandr Isayev, Adrian E Roitberg, Kipton Barros, and Sergei Tre- tiak. Transferable dynamic molecular charge assignment using deep neural networks. Journal of chemical theory and computation, 14(9):4687–4698, 2018.

[229] Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Müller, and O Anatole Von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Physical review letters, 108(5):058301, 2012.

[230] Haoyan Huo and Matthias Rupp. Unified representation of molecules and crystals for machine learning. arXiv preprint arXiv:1704.06439, 2017.

[231] Andrea Grisafi and Michele Ceriotti. Incorporating long-range physics in atomic-scale machine learning. The Journal of chemical physics, 151(20):204105, 2019.

[232] Justin S Smith, Ben Nebgen, Nicholas Lubbers, Olexandr Isayev, and Adrian E Roitberg. Less is more: Sampling chemical space with active learning. The Journal of chemical physics, 148(24):241733, 2018.

[233] Auguste Comte. Cours de philosophie positive, 6 vols. Paris: Bachelier, 42:1851–54, 1830.

[234] Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth, Au- gusta H Teller, and Edward Teller. Equation of state calculations by fast com- puting machines. The journal of chemical physics, 21(6):1087–1092, 1953.

[235] Mark E Tuckerman. Machine learning transforms how microstates are sam- pled. Science, 365(6457):982–983, 2019.

[236] Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann genera- tors: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457):eaaw1147, 2019. references 62 [237] Ming Chen, Tang-Qing Yu, and Mark E Tuckerman. Locating landmarks on high-dimensional free energy surfaces. Proceedings of the National Academy of Sciences, 112(11):3235–3240, 2015.

[238] Bruno AC Horta, Pascal T Merz, Patrick FJ Fuchs, Jozica Dolenc, Sereina Riniker, and Philippe H Hünenberger. A gromos-compatible force field for small organic molecules in the condensed phase: The 2016h66 parameter set. Journal of chemical theory and computation, 12(8):3825–3850, 2016.

[239] Marie L Laury, Lee-Ping Wang, Vijay S Pande, Teresa Head-Gordon, and Jay W Ponder. Revised parameters for the amoeba polarizable atomic mul- tipole water model. The Journal of Physical Chemistry B, 119(29):9423–9437, 2015.

[240] Koen M Visscher and Daan P Geerke. Deriving a polarizable force field for biomolecular building blocks with minimal empirical calibration. The Journal of Physical Chemistry B, 2020.