20 Years of Polarizable Force Field Development
for biomolecular systems
Thor van Heesch
Supervised by Daan P. Geerke and Paola Gori-Giorgi
Contents
1 Introduction
2 Force fields: Basics, Caveats and Extensions
2.1 The Classical Approach
2.2 The caveats of point-charge electrostatics
2.3 Physical phenomenon of polarizability
2.4 Common implementation methods of electronic polarization
2.5 Accounting for anisotropic interactions
3 Knitting the reviews and perspectives together
3.1 Polarizability: A smoking gun?
3.2 New branches of electronic polarization
3.3 Descriptions of electrostatics
3.4 Solvation and polarization
3.5 The rise of new challenges
3.6 Parameterization or polarization?
3.7 Enough response: how far away?
3.8 The last perspectives
3.9 A new hope: the next-generation force fields
4 Learning with machines
4.1 Replace the functional form with machine learned force fields
4.2 A different take on polarizable force fields
4.3 Are transferable parameters a universal requirement?
4.4 The difference between derivation and prediction
4.5 From small molecules to long range interactions
4.6 Enough knowledge to fold a protein?
4.7 Boltzmann generators, a not so hypothetical machine anymore
5 Summary: The Red Thread
Abstract
In this literature study we aimed to answer the following question: What has changed in the outlook on polarizable force field development during the last 20 years? To address this question, we discussed the theory, history, methods, and applications of polarizable force fields. This investigation showed that the quality of the force field potential is decisive for any type of atomistic sampling technique. If the underlying energy function contains flaws, these flaws are, one way or another, embedded in the vast amount of data from which we seek to reproduce a wide range of quantifiable observables accurately. We identified two key challenges associated with the development of polarizable force fields: first, how to determine transferable and accurate parameter sets, and second, how to advance the underlying physical model without computational overhead. After twenty years of classical force field development, the modeling community has finally made the transition towards a general acceptance of the need to develop more physically sound models. During the last 5 years, machine learning techniques have emerged that provide new means to remove bottlenecks in the development of an accurate force field potential.
1 Introduction
Next-generation atomistic force fields include polarization effects for the simulation of biomolecular systems. How this change came about is a story in itself; to explain this generational transition, we begin our study in the late 1950s, when physicists Bernie Alder and Thomas Wainwright were the first to apply digital computation to the study of many-particle systems.[1] Their research eventually gave rise to the simulation method called molecular dynamics (MD). Currently, classical MD simulation methods are applied to study a multitude of physical, chemical, and biological systems, ranging from pure liquids to large complex systems such as proteins and cell membranes.[2, 3] As a result, atomistic simulations have become an important tool for understanding fundamental processes in biological systems.
Since the pioneering work of Alder and Wainwright, computing performance has increased more than a trillion-fold.[4] This rapid development of digital machines led to larger system sizes and longer timescales. Technological progress equally inspired the search for faster, more efficient, and more accurate physical models underlying our simulation methods. In response, a diverse set of atomistic simulation methods developed, partly independently, for the simulation of electrolytes, ionic liquids, metal-organic frameworks, biomolecular systems, and other types of nanomaterials.
In this study we keep our attention focused on the simulation of biomolecular systems. For this particular simulation field, the inclusion of explicit polarization effects has dominated the evolution towards an improved description of the systems under investigation. To understand why the biomolecular simulation community chose to include polarization effects in the atomistic model, we take our cue from a remarkable observation: the number of reviews that specifically address recent developments in polarizable force fields is large.[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] Together, this (incomplete) selection of publications accounts for more than 2000 citations according to Google Scholar. These reviews span almost two decades, from 2001 to the present. It is therefore worth asking: what has changed in the outlook on polarizable force field development during the last 20 years?
First, this approach allows us to assess the progress made and may reveal where the frontier of polarizable force fields lies today. Second, it could expose persistent factors that hamper progress towards a polarizable force field for general use in biomolecular simulation. After evaluating this question, we discuss the open ends and look ahead to how these new challenges can be solved. This includes an exploration of how Machine Learning (ML) methods and novel sampling techniques can aid the development of general-use biomolecular force fields. Finally, we summarise these efforts and provide an outlook on the current status of polarizable force field development for biomolecular simulations.
2 Force fields: Basics, Caveats and Extensions
Before proceeding with an analysis of the aforementioned reviews, we provide the background information needed for the discussion that follows. The next sections therefore explain the basics behind force fields, together with the caveats of choosing a model that is based on fixed point charges, i.e., a non-polarizable force field. The first section discusses the components of the potential-energy function in terms of bonded and non-bonded interactions, and the second covers the caveats related to this classical approach. The sections after that focus on the methods used to refine the model for its intended purpose. Finally, the contributions of anisotropic charge distributions are discussed in the last section.
2.1 The Classical Approach
In molecular dynamics (MD) simulations, atoms are more often than not treated as point-like particles. Their mutual interactions, in combination with the equations of motion, determine how the system evolves over time. This simple approach is well suited to simulating the collective behavior of atoms in molecular structures and ensembles. Moreover, provided a few assumptions are accepted, this level of theory can reproduce both the micro- and macroscopic properties of the system within reasonable expectations.[17]
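To make the interplay between pair interactions and equations of motion concrete, the minimal sketch below integrates Newton's equations for two identical atoms on a line. It assumes a 12-6 Lennard-Jones pair potential in reduced units (ε = σ = m = 1) and uses the velocity Verlet integrator. The function names, the integrator choice, and all parameter values are our own illustrative assumptions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dE/dr; a positive value pushes the atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def simulate_dimer(r0=1.3, mass=1.0, dt=1e-3, n_steps=5000):
    """Velocity Verlet for two atoms on a line that start at rest.

    Returns the final positions and the total energy after every step.
    """
    x = [0.0, r0]
    v = [0.0, 0.0]

    def forces(pos):
        f = lj_force(pos[1] - pos[0])
        return [-f, f]  # Newton's third law: equal and opposite forces

    f = forces(x)
    energies = []
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]               # drift
        f = forces(x)                                           # new forces
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        kinetic = 0.5 * mass * sum(vi * vi for vi in v)
        energies.append(kinetic + lj_energy(x[1] - x[0]))
    return x, energies


positions, energies = simulate_dimer()
print("final separation:", positions[1] - positions[0])
print("total-energy fluctuation:", max(energies) - min(energies))
```

Velocity Verlet is time-reversible and symplectic. As a result, the total energy fluctuates slightly but does not drift systematically, which is why this family of integrators is a common choice in MD.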
However, there is still plenty of room for uncertainty to develop during the course of a simulation. The biggest assumption, which makes atomistic simulations to some extent conjectural, is the absence of an explicit description of the electrons. In addition, this absence implies that the system is always assumed to be in its electronic ground state.
At first sight, these factors seem like a major shortcoming for creating a faithful molecular model. However, the simplicity of the atomistic model is also its greatest strength. Accounting for the electronic behavior implicitly allows for long simulation times, up to the µs regime, while maintaining a relatively low computational cost.[18, 19, 20] Given these reasons for treating atoms as simple particles, how do we include fundamental electrostatic effects without turning to on-the-fly quantum mechanical calculations?
Atomistic simulation methods have developed an implicit way to account for the existence and effects of electrons. The electronic energy is formulated as a parametric function of the nuclear coordinates, and the corresponding parameters are subsequently fitted to experimental or higher-level computational data. Such a parameterised potential, which describes all forces in the system, is called a force field. This description allows us to think more pragmatically about what chemistry conveys in molecular simulations. Simply put, chemistry becomes knowledge of the energy as a function of the nuclear coordinates, and molecular properties become knowledge of how the energy changes when a perturbation is applied to the system.[17] This approach is justified within classical mechanics, because it underlies the methods that give access to Boltzmann-weighted ensembles, from which the macroscopic properties of the system directly follow.[21] Still, there are numerous ways to construct such energy functions. In the following section we therefore discuss the most important components for the construction of a force field.
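The statement that macroscopic properties follow from Boltzmann-weighted ensembles can be illustrated with a minimal sketch. It uses a discrete, hypothetical set of configurations whose force-field energies are already known. In an actual MD simulation the weighting is implicit, because configurations are sampled in proportion to their Boltzmann weight. The explicit sum, the function name, and the example numbers below are our own assumptions for illustration.

```python
import math

K_B = 8.314462618e-3  # Boltzmann (molar gas) constant in kJ mol^-1 K^-1


def boltzmann_average(energies, observable, temperature=300.0):
    """Boltzmann-weighted average <A> = sum_i A_i exp(-E_i/kT) / Z.

    energies:   potential energies of the configurations in kJ/mol
    observable: value of the property A in each configuration
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift avoids overflow; it cancels in the ratio
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)       # partition function (up to the constant shift)
    return sum(w * a for w, a in zip(weights, observable)) / z


# Hypothetical two-state example: state 2 lies 2 kJ/mol above state 1.
print(boltzmann_average([0.0, 2.0], [1.0, 3.0]))
```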
2.1.1 An atomistic-potential energy function
Figure 1: Schematic illustration of the terms in a classical fixed-charge force field, i.e., bond stretching ($E_{\mathrm{bond}}$), bond-angle bending ($E_{\mathrm{angle}}$), dihedral-angle torsion ($E_{\mathrm{torsion}}$), and improper dihedral-angle bending ($E_{\mathrm{improper}}$), as well as van der Waals ($E_{\mathrm{vdW}}$) and electrostatic ($E_{\mathrm{ele}}$) interactions. Figure courtesy of ref. [22].
Most force fields describe the interactions between the atoms in the system via a potential-energy function of the atomic coordinates. The total molecular potential energy is composed of many terms, and its exact form is unknown. Today, classical biomolecular force fields use essentially the same gross approximation for the potential energy as that proposed by Levitt and Lifson in 1969:[23]
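$$E_{\mathrm{pot}}(\vec{r}) = E_{\mathrm{spr}}(\vec{r}) + E_{\mathrm{ang}}(\vec{r}) + E_{\mathrm{dih}}(\vec{r}) + E_{\mathrm{imp}}(\vec{r}) + E_{\mathrm{vdW}}(\vec{r}) + E_{\mathrm{ele}}(\vec{r}) \tag{1}$$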
Here, $E_{\mathrm{pot}}$ is the potential-energy function of the system and $\vec{r}$ denotes the coordinates of all atoms in the system. $E_{\mathrm{spr}}$, $E_{\mathrm{ang}}$, $E_{\mathrm{dih}}$, and $E_{\mathrm{imp}}$ are the bonded terms for bond stretching, bond-angle bending, dihedral-angle torsion, and improper dihedral-angle bending (or out-of-plane distortions) in the description of molecules. Together these terms account for all the energy contributions due to covalent interactions: $E_{\mathrm{cov}} = E_{\mathrm{spr}}(\vec{r}) + E_{\mathrm{ang}}(\vec{r}) + E_{\mathrm{dih}}(\vec{r}) + E_{\mathrm{imp}}(\vec{r})$. The interactions between atoms that are not directly connected via covalent bonds and bond angles are described by the potential-energy term for all nonbonded interactions, $E_{\mathrm{nb}} = E_{\mathrm{vdW}} + E_{\mathrm{ele}}$. The van der Waals term, $E_{\mathrm{vdW}}$, accounts for dispersion and core-core repulsion, and the latter term, $E_{\mathrm{ele}}$, accounts for the electrostatic interactions of the system. Together with the potential-energy terms for the covalent interactions, this again sums up to the full potential-energy expression of the force field:
$$E_{\mathrm{pot}} = E_{\mathrm{cov}} + E_{\mathrm{nb}} \tag{2}$$
See Figure 1 for a schematic illustration of the terms in a classical fixed-charge force field.
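The following minimal sketch shows how Eqs. (1) and (2) combine into a single energy evaluation. The text does not specify the functional forms of the bonded terms. We therefore assume common textbook choices: harmonic bonds, angles, and impropers, and a periodic cosine torsion. These forms differ in detail between force-field families. The function names, the topology layout, the Coulomb constant value, and all parameter values are hypothetical. The nonbonded loop assumes that the list of pairs already excludes covalently bonded neighbours.

```python
import math

KE = 138.935458  # approx. 1/(4*pi*eps0) in kJ mol^-1 nm e^-2


def sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def norm(a):
    return math.sqrt(dot(a, a))


def bond_angle(a, b, c):
    """Angle a-b-c in radians."""
    u, w = sub(a, b), sub(c, b)
    cos_t = dot(u, w) / (norm(u) * norm(w))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding


def dihedral(a, b, c, d):
    """Dihedral angle a-b-c-d in radians, in [-pi, pi]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(cross(n1, n2), b2) / norm(b2)
    return math.atan2(y, dot(n1, n2))


def potential_energy(x, top, eps_1=1.0):
    """Evaluate the terms of Eq. (1) for coordinates x (nm) and a topology dict."""
    e = dict.fromkeys(["spr", "ang", "dih", "imp", "vdW", "ele"], 0.0)
    for i, j, kb, b0 in top["bonds"]:                  # harmonic bonds
        e["spr"] += 0.5 * kb * (norm(sub(x[i], x[j])) - b0) ** 2
    for i, j, k, kt, t0 in top["angles"]:              # harmonic angles
        e["ang"] += 0.5 * kt * (bond_angle(x[i], x[j], x[k]) - t0) ** 2
    for i, j, k, m, kp, n, delta in top["dihedrals"]:  # periodic torsions
        phi = dihedral(x[i], x[j], x[k], x[m])
        e["dih"] += kp * (1.0 + math.cos(n * phi - delta))
    for i, j, k, m, kx, xi0 in top["impropers"]:       # harmonic impropers
        e["imp"] += 0.5 * kx * (dihedral(x[i], x[j], x[k], x[m]) - xi0) ** 2
    for i, j, c12, c6, qiqj in top["pairs"]:           # nonbonded pairs only
        r = norm(sub(x[i], x[j]))
        e["vdW"] += c12 / r ** 12 - c6 / r ** 6        # Eq. (3)
        e["ele"] += KE * qiqj / (eps_1 * r)            # Eq. (4)
    e["cov"] = e["spr"] + e["ang"] + e["dih"] + e["imp"]
    e["nb"] = e["vdW"] + e["ele"]
    e["total"] = e["cov"] + e["nb"]                    # Eq. (2)
    return e


# Hypothetical planar four-atom chain in the trans conformation; placeholder parameters.
coords = [[0.0, 0.1, 0.0], [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.1, -0.1, 0.0]]
topology = {
    "bonds": [(0, 1, 2.0e5, 0.1), (1, 2, 2.0e5, 0.1), (2, 3, 2.0e5, 0.1)],
    "angles": [(0, 1, 2, 400.0, math.pi / 2), (1, 2, 3, 400.0, math.pi / 2)],
    "dihedrals": [(0, 1, 2, 3, 1.0, 2, 0.0)],
    "impropers": [],
    "pairs": [(0, 3, 1.0e-6, 2.0e-3, -0.1)],
}
print(potential_energy(coords, topology))
```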
Additional terms can, of course, be added to this general expression for $E_{\mathrm{pot}}$. Examples include hydrogen-bonding interaction terms (as used in the early days), cross-terms that describe coupling between the first three covalent terms, and restraining potential-energy functions that prevent the system from drifting away.[17] However, such extra descriptors are usually correction terms that compensate for omissions in the other terms so that realistic free-energy profiles are still obtained, rather than improvements to the description of those terms. Despite the major importance of electronic effects, the partial charges of the atoms are fixed during simulations based on classical force fields. Historically, this choice was made because of the high computational cost and added complexity associated with more physics-based force fields.[24]
Despite this static choice, the continued development of fixed-charge force fields has proven surprisingly successful. A recent publication by S. Riniker illustrates the development of this class of force fields and provides an overview of the major force-field families. It includes detailed discussions of the different covalent and nonbonded force-field terms, parametrization strategies, and the historical background of these models.[22] Therefore, the aspects of additive force fields mentioned above will only be touched upon in this literature study. Instead, we will mainly focus on the effects and importance of extending the description of classical electrostatics in biomolecular simulations.
2.1.2 Van der Waals potential
In (almost) all fixed-charge force fields, the van der Waals interactions are described using a 12-6 Lennard-Jones (LJ) functional form
$$E^{\mathrm{vdW}}_{ij}(r_{ij}) = \frac{C_{12}(i,j)}{r_{ij}^{12}} - \frac{C_{6}(i,j)}{r_{ij}^{6}} \tag{3}$$
with $C_{12}(i,j) = 4\epsilon_{ij}\sigma_{ij}^{12}$ and $C_{6}(i,j) = 4\epsilon_{ij}\sigma_{ij}^{6}$, where $r_{ij}$ is the distance between atoms i and j, $\epsilon_{ij}$ denotes the depth of the attractive well, and $\sigma_{ij}$ the inter-particle distance at which the potential changes sign.[25, 26]
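The relation between $(C_{12}, C_6)$ and $(\epsilon, \sigma)$ in Eq. (3) can be checked numerically. The sketch below uses hypothetical function names and illustrative parameter values. It confirms that the potential crosses zero at $r = \sigma$ and that its minimum, of depth $-\epsilon$, lies at $r_{\min} = (2C_{12}/C_6)^{1/6} = 2^{1/6}\sigma$.

```python
def lj_coefficients(epsilon, sigma):
    """Return (C12, C6) from the well depth epsilon and zero-crossing distance sigma."""
    return 4.0 * epsilon * sigma ** 12, 4.0 * epsilon * sigma ** 6


def lj_pair_energy(r, c12, c6):
    """12-6 Lennard-Jones energy of Eq. (3)."""
    return c12 / r ** 12 - c6 / r ** 6


def lj_minimum(c12, c6):
    """Distance where dE/dr = 0: r_min = (2 C12 / C6)^(1/6) = 2^(1/6) sigma."""
    return (2.0 * c12 / c6) ** (1.0 / 6.0)


# Illustrative values only (epsilon in kJ/mol, sigma in nm), not from a published force field.
c12, c6 = lj_coefficients(epsilon=0.5, sigma=0.3)
r_min = lj_minimum(c12, c6)
print(r_min, lj_pair_energy(r_min, c12, c6))  # approximately 0.3367 nm and -0.5 kJ/mol
```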
The van der Waals energy, $E_{\mathrm{vdW}}$, describes the repulsion or attraction between atoms that are not directly bonded. It is often interpreted as the non-polar part of the interaction, i.e., the part not related to the electrostatic energy due to charges.[17] In practice, the atoms in the system are grouped into atom types, each with its own LJ parameters, which typically represent atoms in a specific chemical environment. Examples are a nitrogen in an amine group or an oxygen in a hydroxyl group.
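LJ parameters are assigned per atom type, whereas Eq. (3) needs pair parameters. Force fields therefore combine the type parameters with a combining rule. The text does not state which rule is used, so the sketch below assumes two widely used choices. The first is the Lorentz-Berthelot rule (arithmetic mean of σ, geometric mean of ε), as used in, e.g., AMBER and CHARMM. The second is a geometric mean applied directly to $C_6$ and $C_{12}$, as used in, e.g., GROMOS. The atom-type names and parameter values are hypothetical placeholders.

```python
import math

# Hypothetical atom types with placeholder LJ parameters (epsilon in kJ/mol, sigma in nm).
ATOM_TYPES = {
    "N_amine": {"epsilon": 0.70, "sigma": 0.33},
    "O_hydroxyl": {"epsilon": 0.80, "sigma": 0.30},
}


def lorentz_berthelot(type_i, type_j):
    """Pair (epsilon_ij, sigma_ij): geometric mean of epsilon, arithmetic mean of sigma."""
    a, b = ATOM_TYPES[type_i], ATOM_TYPES[type_j]
    return math.sqrt(a["epsilon"] * b["epsilon"]), 0.5 * (a["sigma"] + b["sigma"])


def geometric_c6_c12(type_i, type_j):
    """Pair (C6_ij, C12_ij) as geometric means of the per-type coefficients."""
    def coefficients(name):
        p = ATOM_TYPES[name]
        return 4.0 * p["epsilon"] * p["sigma"] ** 6, 4.0 * p["epsilon"] * p["sigma"] ** 12

    c6_i, c12_i = coefficients(type_i)
    c6_j, c12_j = coefficients(type_j)
    return math.sqrt(c6_i * c6_j), math.sqrt(c12_i * c12_j)


print(lorentz_berthelot("N_amine", "O_hydroxyl"))
print(geometric_c6_c12("N_amine", "O_hydroxyl"))
```

For unlike pairs the two rules give the same $\epsilon_{ij}$ but different $\sigma_{ij}$: the geometric rule on $C_6$ and $C_{12}$ implies $\sigma_{ij} = \sqrt{\sigma_i \sigma_j}$. Parameters fitted with one rule should therefore not simply be reused with the other.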
However, it should be noted that this widely used nonbonded term also has some (major) flaws. Recently, Frenkel et al. noted that the LJ potential, originally derived to describe the cohesive energy of crystals of noble gases, is now so often used for systems where it is not expected to be particularly realistic that its disadvantages have become very relevant.[27] Their arguments against the LJ 12-6 potential can be summarised as follows: the LJ 12-6 potential is anything but a well-defined standard, and for proteins and nano-colloids in particular it is not a good model. This is mainly because its range of attraction is too large compared to the effective particle diameter.
To top it off, Frenkel et al. argue that there is no obvious reason why a truncated LJ 12-6 potential should have any special merits that outweigh its disadvantages.[27] Instead, they argue that no such advantage exists and present a simple model pair potential to unify the many different truncation methods used in the past. In contrast to all the shifted, truncated, and interpolated LJ 12-6 models, their model is uniquely defined once a cut-off distance, $r_c$, is specified.¹ If required, this class of LJ-like potentials can be used for numerical studies of systems of particles with short-ranged attraction. For example, a set of thermodynamic and transport properties is reported for the cases $r_c = 2.0$ (“atomic liquid”) and $r_c = 1.2$ (“colloidal suspension”), in units of σ.[27] In short, the main message is that we should be careful not to continue trends based on old habits when constructing next-generation force fields.
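The sketch below shows what "uniquely defined once $r_c$ is specified" means in practice. It implements the LJ-like pair potential in the form we read from ref. [27], $\phi(r) = \epsilon\,\alpha\,(\sigma^2/r^2 - 1)\,((r_c/r)^2 - 1)^2$ for $r < r_c$ and zero beyond. The prefactor $\alpha = 27 (r_c/\sigma)^2 / [4((r_c/\sigma)^2 - 1)^3]$ normalizes the well depth to $-\epsilon$. This reconstruction is our assumption and should be checked against the original paper before use. The function names are hypothetical, and reduced units ($\epsilon = \sigma = 1$) are used.

```python
import math


def lj_like(r, r_c=2.0, epsilon=1.0, sigma=1.0):
    """LJ-like pair potential (form as read from ref. [27]); zero for r >= r_c."""
    if r >= r_c:
        return 0.0
    x2 = (r_c / sigma) ** 2
    # alpha normalizes the well depth to -epsilon for any choice of r_c
    alpha = 27.0 * x2 / (4.0 * (x2 - 1.0) ** 3)
    return epsilon * alpha * ((sigma / r) ** 2 - 1.0) * ((r_c / r) ** 2 - 1.0) ** 2


def lj_like_minimum(r_c=2.0, sigma=1.0):
    """Position of the minimum, obtained from d(phi)/dr = 0."""
    return r_c * math.sqrt(3.0 / (1.0 + 2.0 * (r_c / sigma) ** 2))


for r_c in (2.0, 1.2):  # "atomic liquid" and "colloidal suspension" cut-offs
    r_min = lj_like_minimum(r_c)
    print(r_c, r_min, lj_like(r_min, r_c))
```

Unlike a truncated-and-shifted LJ 12-6 potential, both the energy and the force of this form go to zero smoothly at $r_c$. As a result, no separate shifting or switching convention has to be chosen.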
2.1.3 Electrostatic energy
The electrostatic term of the nonbonded interactions should account for the internal (re)distribution of the electrons, which creates positively and negatively charged regions of the molecule.[10] At the lowest order of approximation, this electronic behaviour can be modeled by assigning a dipole to each bond or, more conventionally, a partial charge to each atom. In classical fixed-charge force fields, for example, the pairwise Coulomb interaction between the point charges of atoms i and j is given by
$$E^{\mathrm{ele}}_{ij}(r_{ij}) = \frac{1}{4\pi\varepsilon_0\varepsilon_1}\,\frac{q_i q_j}{r_{ij}} \tag{4}$$
where $q$ is the partial charge, $\varepsilon_0$ the electric constant, $\varepsilon_1$ the background dielectric permittivity, and $r_{ij}$ the distance between atoms i and j.[22] The classical electrostatic interactions can be viewed as the remaining long-range ($\propto 1/r$) part of the quantum-mechanical electrostatic interactions after all short-range contributions, such as bonding, repulsion, and dispersion, have been removed. Still, obtaining a good description of the electrostatic interaction between molecules (or between different parts of the same molecule) is one of the big challenges in force field development.[10] The predictive ability of MD simulations relies on the accuracy of the underlying force field.

¹ The unified model potential vanishes quadratically at the cut-off distance $r_c$ and represents no specific substance; the authors stress that their potentials are generic models. However, as they note, “very often the ubiquitous LJ 12-6 potential is used in exactly the same way.”[27]
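Eq. (4) is usually evaluated in molecular units. The sketch below converts $1/(4\pi\varepsilon_0)$ to kJ mol⁻¹ nm e⁻² from SI constants and evaluates the pair energy, with $\varepsilon_1$ entering as the parameter `eps_1`. The function and constant names are hypothetical. The sketch also makes the fixed-charge assumption explicit: $q_i$ and $q_j$ are plain inputs that do not respond to their environment.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C (exact in the 2019 SI)
N_AVOGADRO = 6.02214076e23   # Avogadro constant, mol^-1 (exact in the 2019 SI)
EPS0 = 8.8541878128e-12      # electric constant, F/m (CODATA 2018)

# 1/(4 pi eps0) in kJ mol^-1 nm e^-2: J -> kJ is a factor 1e-3 and m -> nm a factor 1e9
F_COULOMB = E_CHARGE ** 2 * N_AVOGADRO / (4.0 * math.pi * EPS0) * 1e6


def coulomb_pair_energy(q_i, q_j, r_ij, eps_1=1.0):
    """Eq. (4): charges in units of e, distance in nm, energy in kJ/mol."""
    return F_COULOMB * q_i * q_j / (eps_1 * r_ij)


print(F_COULOMB)                                         # about 138.935
print(coulomb_pair_energy(1.0, -1.0, 0.3))               # hypothetical +1/-1 pair in vacuum
print(coulomb_pair_energy(1.0, -1.0, 0.3, eps_1=78.4))   # same pair, permittivity of water
```

Because the charges are inputs rather than functions of the coordinates, the total electrostatic energy is a plain sum over pairs. The energy of each pair is the same regardless of the surrounding molecules.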
2.2 The caveats of point-charge electrostatics
From equation 4 it is evident that not all types of electrostatic interactions are accounted for. Accordingly, the deficiencies in the electrostatic potential that result from the fixed partial atomic charge approximation have been widely acknowledged over the past decade.[11, 10, 28, 12, 29, 9, 13, 30] Jensen and Hagler each made special efforts, independently, to summarize the components that are currently unaccounted for in most “standard” force fields and that are likely required to achieve experimental accuracy in (bio)molecular force fields.[17, 31] The following 8 points summarize their combined efforts in addressing the caveats associated with fixed-point-charge force fields:
i) Electronic polarization effects are not accounted for, as fixed-point-charge models include only two-body interactions. However, for polar molecules the three-body contribution is quite significant, perhaps 10–20% of the two-body term.[32] Moreover, fixed-charge models do not take into account the response of molecular dipole moments to changes in the dielectric environment.[33, 34]
ii) The geometry dependence of charges and higher-order multipoles is omitted.[35, 36] Studies have shown that both partial atomic charges and higher-order electric moments depend significantly on the geometry. Consequently, fixed values of these quantities cannot fully meet the requirement of transferable parameter sets.
iii) The effects and energetics of charge transfer (CT) between molecules, for example in H-bonded complexes, are not accounted for in additive force fields.[37, 38, 39]
iv) The partial-charge model cannot correctly reproduce the electrostatic potential surrounding a molecule, resulting in errors in the electrostatic potential of 10 to 20 kJ/mol.[17]
In section 2.1.2 we already concluded that the use of the standard LJ potential is often not justified. The following points specifically address caveats hidden in the van der Waals potential and explain why these components should be accounted for or revisited.
v) The $r^{-6}$ dispersion term is an approximation that omits higher-order terms (e.g., $r^{-8}$ and $r^{-10}$), which have been shown to be important for close-packed systems such as globular proteins.[40, 41]