Albert-Ludwigs-Universität Freiburg, Faculty of Mathematics and Physics
Water models and hydrogen bonds
Dissertation submitted for the doctoral degree of the Faculty of Mathematics and Physics of the Albert-Ludwigs-Universität Freiburg im Breisgau
Freiburg Institute for Advanced Studies
Submitted by Roman Shevchuk
Supervised by Prof. Dr. Gerhard Stock / Dr. Francesco Rao
Freiburg, 2014
Dean: Prof. Dr. Michal Růžička
Vice-Dean: Prof. Dr. Andreas Buchleitner
Thesis supervisor: Prof. Dr. Gerhard Stock
Referee: Prof. Dr. Gerhard Stock
Co-referee: PD Dr. Thomas Wellens
Date of the oral examination: 08.05.2014
Contents
Introduction
1 Molecular simulations
  1.1 Force fields
  1.2 Newtonian dynamics
  1.3 Thermostats
  1.4 Barostats
  1.5 Water models in molecular dynamics
  1.6 Simulation details
2 Water phase diagram and water anomalies
  2.1 Water phase diagram
  2.2 Water anomalies
3 Water supercooling and freezing
  3.1 General perspective
  3.2 Test of water freezing
4 Complex network approach for molecular dynamics trajectories and hydrogen bond as an order parameter
  4.1 Complex network as a tool to study molecular simulations
  4.2 Hydrogen bond criteria
5 Applications
  5.1 Study of classical water models at ambient pressure
  5.2 Effect of polarizability
  5.3 Free energy landscape of water
  5.4 Proton transfer
Conclusions
Bibliography
Acknowledgment

Introduction
For every phenomenon, however complex, someone will eventually come up with a simple and elegant theory. This theory will be wrong. Rotschild’s Rule
Water is the most important substance for all living organisms on Earth. About 80 percent of a living cell consists of water [1]. It acts as a solvent and a thermoregulator and provides the environment for the vast majority of biochemical processes. At the fundamental level, water directly influences several biologically relevant processes, including protein folding [2], protein-protein association [2–5] and amyloid aggregation [6]. A single water molecule consists of two hydrogen atoms and one oxygen atom, forming a V-shaped molecule with an angle of about 106°. Because oxygen has a higher electronegativity than hydrogen, the oxygen side of the molecule is partially negative and the hydrogen side is partially positive. Consequently, the dipole moment points from the oxygen towards the midpoint between the hydrogens. This charge separation causes water molecules to attract each other through highly directional hydrogen bonds (the relatively positive regions being attracted to the relatively negative regions), and to attract other polar molecules as well [7].
One of the most interesting properties of water is its polymorphism. At least 15 crystalline forms of ice are known [8]. The number of crystalline modifications of Si or Ge is comparable, but their structural diversity is connected with the transition from semiconductor to metal; in water ice, by contrast, the nature of the intermolecular interactions stays the same. Water molecules keep their individuality, and what changes is the order and structure of the hydrogen bond network [9]. Since there are so many possible crystal structures of water, two questions naturally emerge: (i) is there any residual structure in liquid water? (ii) how does water crystallize into ice? To address these questions, Bernal and Fowler proposed the concept of a hydrogen bond network that is continuous in space [10]. Modern experimental and computational techniques leave no doubt that at normal conditions water molecules are connected through a three-dimensional network of hydrogen bonds [11, 12]. Many interesting results have been obtained by simulations [13–17] and experiments [18,19]. The problem is that, even today, no experimental method can track the motion of single water molecules in the bulk liquid or explicitly detect all hydrogen bonds in the bulk. This is where computer simulations come into play. The first computer simulations of water were done at the end of the 1960s [20,21]. At that time it was possible to simulate a system of a few hundred water molecules, with van der Waals interactions described by a Lennard-Jones potential [22]. With the rise of computational power, the number of simulated molecules increased by several orders of magnitude [23], and new refined (and more complex) water models appeared, including molecular flexibility and polarizability [24–27].
In this thesis we focus on several aspects of molecular dynamics studies of liquid water, particularly the temperature response of some of the most popular water models, including the structure of their hydrogen bond network. Apart from commonly used thermodynamic measurements, we apply a recently developed complex network framework [16,28]. Within this framework the system is described by a discrete set of microstates evolving in time. Microstates represent the nodes of a transition network, where a link is placed between two microstates if the system jumped from one to the other along the molecular dynamics trajectory. Network analysis, for example of the cluster structure, makes it possible to characterize both the thermodynamics and the kinetics of the system. Combining the complex network framework with more conventional tools, such as the radial distribution function, yields a detailed description of liquid water. A short overview of this thesis is presented below:
• In Chapter 1 an introduction to the basic principles of molecular dynamics simulations is provided. The most commonly used approaches for temperature and pressure coupling are described, as well as the difference between classical molecular dynamics and Langevin dynamics.

• In Chapter 2 the phase diagram of water is presented, together with a description of some of water's properties and its so-called anomalies. In particular, the density and thermodynamic anomalies, such as the density maximum above the melting temperature and the anomalous increase of viscosity in the supercooled region, are highlighted.

• In Chapter 3 we briefly describe the problems related to supercooled water. The results of a microsecond-long simulation of water in this region are shown, and the correlation between water energy, density and structural order, as well as possible scenarios of water freezing, are discussed.

• In Chapter 4 we analyze molecular dynamics trajectories via the complex network approach. A detailed description of how the complex network is built for the case of liquid water is provided. In the second section of this chapter the hydrogen bond definitions commonly used in molecular dynamics are analyzed in detail.
• In Chapter 5 applications of the methods and tools described above are provided. In particular, the free-energy landscape of water in the temperature range 220 K < T < 340 K is studied via complex network analysis. We present a comparative analysis of seven classical water models as well as the polarizable SWM4-NDP water model. Moreover, a simplified complex network analysis for the case of proton transfer in bulk water is presented.

All molecular simulations presented in this thesis (except the ones described in Section 5.4) have been prepared, launched and analyzed by myself. The statistical tools and algorithms used for the analysis have been coded by me in collaboration with Dr. D. Prada-Gracia and included in a software library called AQUAlab (GPL license, available at raolab.com).
Some results of this thesis were published in the following papers:
– R. Shevchuk, D. Prada-Gracia, and F. Rao. Water structure-forming capabilities are temperature shifted for different models. J. Phys. Chem. B, 116(25):7538–7543, 2012.
– R. Shevchuk and F. Rao. Note: Microsecond long atomistic simulation of supercooled water. J. Chem. Phys., 137:036101, 2012.
– D. Prada-Gracia, R. Shevchuk, P. Hamm, and F. Rao. Towards a microscopic description of the free-energy landscape of water. J. Chem. Phys., 137:144504, 2012.
– D. Prada-Gracia*, R. Shevchuk* and F. Rao. The quest for self-consistency in hydrogen bond definitions. J. Chem. Phys., 139:084501, 2013. (* These authors contributed equally to this work.)
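The transition network described in the Introduction (microstates as nodes, a link whenever the trajectory jumps from one microstate to another) can be illustrated with a minimal Python sketch. It is not the AQUAlab implementation: the function name `build_transition_network`, the toy trajectory and the choice to ignore steps in which the microstate does not change are assumptions made only for this illustration.

```python
from collections import Counter


def build_transition_network(microstates):
    """Build a transition network from a discrete trajectory.

    microstates: sequence of hashable labels, one per saved snapshot.
    Returns (node_weights, link_weights):
      node_weights[s]    = number of snapshots spent in microstate s
      link_weights[a, b] = number of observed jumps a -> b (a != b)
    """
    node_weights = Counter(microstates)
    link_weights = Counter()
    # Compare each snapshot with the next one along the trajectory.
    for source, target in zip(microstates, microstates[1:]):
        if source != target:  # assumption: only real jumps create links
            link_weights[(source, target)] += 1
    return node_weights, link_weights


# Hypothetical toy trajectory of microstate labels.
trajectory = ["A", "A", "B", "A", "C", "C", "A"]
nodes, links = build_transition_network(trajectory)
```

In this picture the node weights estimate the populations of the microstates (thermodynamics), while the link weights carry the information on the transitions between them (kinetics).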
Chapter 1
Molecular simulations
In recent years, computer simulations have become, alongside traditional experiments, a useful tool to elucidate physical and chemical processes at the molecular level. Here we mainly use classical molecular dynamics simulations, a tool that makes it possible to simulate a microscopic system with all-atom resolution using simple Newtonian equations of motion. Molecular dynamics has multiple applications: it is used for the refinement of molecular structures obtained from experiments (crystallography, NMR or electron microscopy), for the interpretation of experimental data, for the prediction of functional properties of biological systems and for sampling regions of phase space that are unreachable in experiments [29]. The first molecular simulations of water were made around forty years ago and were able to calculate the trajectory of a few hundred atoms for several picoseconds [30]. Since then, the increase of computational power has allowed simulations to become significantly larger in size and longer in time. Several simulation packages, such as GROMACS [31], NAMD [32] and LAMMPS [33], make it possible to exploit modern hardware and parallel (multi-cluster) algorithms. Here we briefly describe the basic concepts of molecular dynamics simulations.
1.1 Force fields
In classical molecular dynamics covalent bonds cannot be broken. The potential energy of the system, $U(r)$, depends on the positions of all $N$ atoms, $r = (r_1, r_2, ..., r_N)$. Moreover, the system is characterized by the mass $m_i$ of each atom and by certain boundary conditions. In practice a molecular simulation is performed with one of the available potentials (force fields), such as CHARMM [34], AMBER [35] or OPLS [36], where the potential typically has the following form:
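$$U(r) = \sum_{\mathrm{bonds}} K_b (r - r_0)^2 + \sum_{\mathrm{angles}} K_a (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\chi - \delta)\right] + \sum_{\mathrm{impr.\,dih.}} K_{ijkl} (S - S_0)^2 + U_{LJ}(r) + U_E(r) \tag{1.1}$$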
where $r$ is the length of a bond, $\theta$ is the bond angle, $\chi$ is the dihedral angle, $S$ is the improper dihedral angle and $r_{ij}$ is the distance between two atoms; all other variables are parameters of the model, whose numerical values can differ between force fields. The coefficients $K_i$ of each term are either fitted to ab initio data or determined empirically so as to best match the experimental behavior of the studied system. The Lennard-Jones potential represents the repulsion and the van der Waals forces [22] and is defined as:
$$U_{LJ}(r) = \sum_{i<j} 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right], \tag{1.2}$$
where $\epsilon_{ij}$ is the depth of the potential well and $\sigma_{ij}$ is the distance at which the pair potential is zero. The electrostatic term is the Coulomb interaction between the partial charges $q_i$ and $q_j$:
$$U_E(r) = \sum_{i<j} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}. \tag{1.3}$$
Figure 1.1: Schematic illustration of the terms of the bonded potential energy in molecular dynamics simulations.
1.2 Newtonian dynamics
In classical mechanics, the time evolution of the system is governed by Newton's equations:
$$\ddot{r}_i = f_i / m_i, \tag{1.4}$$
where $f_i$ is the force acting on the $i$-th atom, derived from the potential: $f_i = -\partial U / \partial r_i$. It is assumed that the system occupies a volume of appropriate shape, so that periodic boundary conditions can be applied. In a numerical simulation the system moves in discrete steps of a small time interval $\Delta t$. The value of $\Delta t$ has to be smaller than the period of the fastest vibrations of the system in order to obtain a reasonable trajectory. The moves are performed with numerical algorithms [37–40] that yield the coordinates $r_i$ and velocities $\dot{r}_i$ of each atom at the next timestep $t_0 + \Delta t$, provided that these values are known at time $t_0$. The most common practice is to apply periodic boundary conditions and to calculate the energy of the long-ranged electrostatic interactions via the particle-mesh Ewald method [39]. To improve efficiency, constraints are applied to the covalent bonds. This approach introduces additional forces that act on the atoms along their bonds. Hence the bond between atoms $i$ and $j$ gives rise to a pair of forces: the force $g_{ij} = \lambda_{ij}(r_i - r_j)$ acting on atom $i$ and the force $g_{ji} = \lambda_{ji}(r_j - r_i)$ acting on atom $j$, where the coefficients $\lambda_{ij}$ and $\lambda_{ji}$ are equal [29].
Newtonian dynamics requires that the system keeps its total energy constant and moves in a way predefined by its initial conditions (i.e. the starting positions and velocities of the atoms). Real systems, however, involve stochastic degrees of freedom through coupling to the external environment, which acts as a heat bath. In this case the total energy of the system fluctuates within a distribution characterized by a certain temperature and pressure.
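The stepwise integration of Eq. (1.4) can be sketched for a single particle in one dimension. The text cites several integration algorithms [37–40] without singling one out, so the velocity Verlet scheme below is an assumption chosen for illustration; the harmonic force, the reduced units and all names (`velocity_verlet`, `harmonic_force`, `K_SPRING`, `MASS`) are likewise hypothetical. The sketch also shows why $\Delta t$ must be a small fraction of the fastest vibrational period: with 100 steps per period the total energy stays close to its initial value.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1D with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


# Hypothetical test system in reduced units: a harmonic "bond".
K_SPRING = 1.0
MASS = 1.0


def harmonic_force(x):
    """f = -dU/dx for U(x) = K_SPRING * x**2 / 2."""
    return -K_SPRING * x


def total_energy(x, v):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


x0, v0 = 1.0, 0.0
period = 2.0 * math.pi * math.sqrt(MASS / K_SPRING)
dt = period / 100.0  # timestep well below the vibrational period
x1, v1 = velocity_verlet(x0, v0, harmonic_force, MASS, dt, n_steps=1000)
energy_drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

In a real simulation the scalar position becomes the full coordinate vector and the force comes from a force field such as Eq. (1.1); the update pattern stays the same.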
Here we briefly introduce the most common algorithms for temperature and pressure coupling in molecular dynamics.
1.3 Thermostats
1.3.1 Andersen thermostat
A simple way to obtain temperature coupling is to periodically redraw the velocity of each particle from a Maxwell-Boltzmann distribution [41]. This can be done either by randomizing all velocities simultaneously every $\tau_T / \Delta t$ steps, or by randomizing every particle with a small probability $\Delta t / \tau_T$ at every timestep, where $\Delta t$ is the timestep and $\tau_T$ is the characteristic coupling time. This algorithm avoids some of the ergodicity issues of other algorithms, since energy cannot flow back and forth between energetically decoupled components of the system, as it can in velocity-scaling schemes. However, it can slow down the kinetics of the system by randomizing its correlated motions.
1.3.2 Berendsen thermostat
The Berendsen algorithm mimics weak coupling with first-order kinetics to an external heat bath at a given temperature $T_0$ [42]. The effect of this algorithm is that a deviation of the system temperature from $T_0$ is slowly corrected according to:
$$\frac{dT}{dt} = \frac{T_0 - T}{\tau}, \tag{1.5}$$
which means that a temperature deviation decays exponentially with a time constant $\tau$. This method of coupling has the advantage that the strength of the coupling can be varied and adapted to the specific system. The Berendsen thermostat suppresses the fluctuations of the kinetic energy. This means that it does not generate a proper canonical ensemble, so, rigorously speaking, the sampling is incorrect. The error scales with $1/N$, so for very large systems most ensemble averages are not affected significantly, except for the distribution of the kinetic energy itself. However, fluctuation properties, such as the heat capacity, are affected [31].
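The first-order relaxation of Eq. (1.5) can be checked numerically with a short Python sketch. The velocity scaling factor $\lambda = [1 + (\Delta t/\tau)(T_0/T - 1)]^{1/2}$ used here is the form in which the Berendsen scheme is commonly written; it is not given in the text above. The sketch is idealized in that only the thermostat acts on the temperature (no forces), and the function names and numerical values are illustrative assumptions.

```python
import math


def berendsen_scaling_factor(t_current, t_ref, dt, tau):
    """Velocity scaling factor lambda for one weak-coupling step."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_current - 1.0))


def relax_temperature(t_start, t_ref, dt, tau, n_steps):
    """Apply the scaling repeatedly and record the temperature.

    Idealized: the temperature changes only through the thermostat.
    """
    temperature = t_start
    history = [temperature]
    for _ in range(n_steps):
        lam = berendsen_scaling_factor(temperature, t_ref, dt, tau)
        # Velocities scale by lam, so kinetic energy and T scale by lam**2.
        temperature *= lam * lam
        history.append(temperature)
    return history


# Illustrative values (assumptions): temperatures in K, times in ps.
T_REF = 300.0
T_START = 340.0
DT = 0.002
TAU = 1.0
N_STEPS = 500  # total time N_STEPS * DT = 1 ps = TAU

history = relax_temperature(T_START, T_REF, DT, TAU, N_STEPS)
analytic = T_REF + (T_START - T_REF) * math.exp(-N_STEPS * DT / TAU)
```

Each scaling step is an explicit Euler step of Eq. (1.5), so after a time equal to $\tau$ the deviation from $T_0$ has dropped to about $1/e$ of its initial value, in agreement with the exponential decay.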
1.3.3 Velocity-rescaling thermostat
The velocity-rescaling thermostat [43] is similar to the Berendsen thermostat but has an additional stochastic term that ensures a correct kinetic energy distribution by modifying the kinetic energy according to
$$dK = (K_0 - K)\frac{dt}{\tau_T} + 2\sqrt{\frac{K K_0}{N_f}}\,\frac{dW}{\sqrt{\tau_T}}, \tag{1.6}$$
where $K$ is the kinetic energy, $K_0$ is its target value, $N_f$ is the number of degrees of freedom and $dW$ is a Wiener process. This thermostat produces a correct canonical ensemble and still has the advantages of the Berendsen thermostat: first-order decay of temperature deviations and no oscillations.
1.3.4 Nosé-Hoover thermostat
In the Nosé-Hoover scheme the system Hamiltonian is extended by introducing a thermal reservoir and a friction term in the equations of motion [44, 45]. The friction force is proportional to the product of each particle's velocity and a friction parameter, $\xi$. This parameter is a dynamic quantity with its own momentum $p_\xi$ and equation of motion; its time derivative is calculated from the difference between the current kinetic energy and the reference temperature [31]. In this case the Newtonian equation has an additional term:
$$\frac{d^2 r_i}{dt^2} = \frac{f_i}{m_i} - \frac{p_\xi}{Q}\frac{dr_i}{dt}, \tag{1.7}$$
where $Q$ is the coupling constant, and the equation of motion for the heat bath is:
$$\frac{dp_\xi}{dt} = T - T_0, \tag{1.8}$$
where $T_0$ is the reference temperature and $T$ is the current temperature of the system.
1.3.5 Langevin dynamics
Another way to introduce stochastic degrees of freedom into the system is to add random forces and to compensate for their overheating effect with phenomenological friction terms [46]. In this way the modified Newton equation takes the form:
$$\ddot{r}_i = f_i / m_i - \gamma_i \dot{r}_i + F_i / m_i, \tag{1.9}$$
where the force $F_i$ is a random function of time which fluctuates very rapidly in comparison with the integration timestep $\Delta t$. This force does not depend on the positions and velocities of the atoms.
Then, the integration scheme of the system can be written as:
$$v\left(t + \tfrac{1}{2}\Delta t\right) = \alpha\, v\left(t - \tfrac{1}{2}\Delta t\right) + \frac{1 - \alpha}{m\gamma} F(t) + \sqrt{\frac{k_B T}{m}\left(1 - \alpha^2\right)}\; r_i^G, \tag{1.10}$$
$$r(t + \Delta t) = r(t) + \Delta t\, v\left(t + \tfrac{1}{2}\Delta t\right), \tag{1.11}$$
where
$$\alpha = \left(1 - \frac{\gamma \Delta t}{m}\right). \tag{1.12}$$
Here $r_i^G$ is Gaussian distributed noise with $\mu = 0$, $\sigma = 1$.
1.4 Barostats
1.4.1 Berendsen barostat
The Berendsen barostat rescales the coordinates and the size of the simulation box every step [31, 42], or every $n$ steps, with a matrix $\mu$, which has the effect of a first-order kinetic relaxation of the pressure towards a given reference pressure $P_0$ according to
$$\frac{dP}{dt} = \frac{P_0 - P}{\tau_p}. \tag{1.13}$$
The matrix $\mu$ is defined as
$$\mu_{ij} = \delta_{ij} - \frac{n \Delta t}{3\tau_p}\,\beta_{ij}\left\{P_{0ij} - P_{ij}(t)\right\}, \tag{1.14}$$
where $\beta$ is the isothermal compressibility of the system. It is worth noting that the Berendsen barostat does not give the exact NPT ensemble but only an approximation to it.
1.4.2 Parrinello-Rahman barostat
The Parrinello-Rahman pressure coupling scheme is similar to the Nosé-Hoover thermostat [31, 45, 47, 48]. With the Parrinello-Rahman barostat, the box vectors, represented by the matrix $b$, obey the matrix equation of motion:
$$\frac{d^2 b}{dt^2} = V W^{-1} b'^{-1} (P - P_{ref}). \tag{1.15}$$
Here, the volume of the system is denoted as $V$ and $W$ is a matrix parameter that determines the strength of the coupling (similarly to $Q$ in the Nosé-Hoover scheme). The matrices $P$ and $P_{ref}$ are the current and reference pressures. The equations of motion also have to be modified:
$$\frac{d^2 r_i}{dt^2} = \frac{F_i}{m_i} - M\frac{dr_i}{dt}, \tag{1.16}$$
where $M$ is:
$$M = b^{-1}\left[b\,\frac{db'}{dt} + \frac{db}{dt}\,b'\right] b'^{-1}. \tag{1.17}$$
The mass parameter $W^{-1}$ determines the strength of the coupling and the possible deformation of the simulation box.
It depends on the isothermal compressibilities $\beta$, the pressure coupling time $\tau_p$ and the largest matrix element $L$ of the simulation box:
$$\left(W^{-1}\right)_{ij} = \frac{4\pi^2 \beta_{ij}}{3\tau_p^2 L}. \tag{1.18}$$
1.5 Water models in molecular dynamics
1.5.1 Classical water models
Computer simulations of water started with the pioneering paper by Rahman and Stillinger about forty years ago [21]. The most important issue when performing water simulations is the choice of the potential model used to describe the interaction between molecules [49,50]. A large number of water models exist for molecular simulations. They differ in which specific features of real water they reproduce, such as the correct temperature of the density maximum or the melting temperature. The most widely used "classical" water potentials are simple rigid non-polarizable models such as TIP3P, SPC, TIP4P and TIP4P/2005 [51–55]. However, with the increase of computational power, new polarizable and flexible potentials have begun to appear [26, 56]. The simplest water models place positive charges on the hydrogen atoms, and a Lennard-Jones interaction site and the negative charge on the position of the oxygen. Classical water models differ in three significant aspects: (i) the geometry of the molecule, i.e. the length of the OH bond and the H-O-H angle; (ii) the position of the charges (the negative charge of the oxygen can be displaced from the center of the oxygen atom or even split); (iii) the target properties, i.e. the properties of real water which the model is fitted to reproduce. The Lennard-Jones parameters and the geometry of the most widely used classical water models are shown in Table 1.1.
Figure 1.2: Schematic representation of three-site (a), four-site (b) and five-site (c) water models. All parameters can vary depending on the particular water model. Figure is adapted from Ref. [57].
All the water models were developed to reproduce certain water properties.
As a consequence, when compared on one single property they give different results. An example is shown in Fig. 1.3 for the case of density.
Figure 1.3: Maximum in density for several water models at atmospheric pressure. Filled circles: experimental results, lines: simulation results. Figure is adapted from Ref. [50].
Table 1.1: Potential parameters of the classical water models. The distance between the oxygen and hydrogen is denoted as d_OH. The angle formed by hydrogen, oxygen and the other hydrogen atom is denoted as H-O-H. The parameters of the Lennard-Jones potential are denoted as σ and ε/k_B. The charge of the hydrogen is q_H. All the models (except TIP5P) place the negative charge in a point M at a distance d_OM from the oxygen along the H-O-H bisector. For TIP5P, d_OM is the distance between the oxygen and the L sites placed at the lone electron pairs. A schematic picture of the different water models is given in Fig. 1.2. The table is adapted from Ref. [50].

Water model   d_OH [Å]   H-O-H [°]   σ [Å]    ε/k_B [K]   q_H [e]   d_OM [Å]
SPC           1.0        109.47      3.1656   78.20       0.41      0
SPC/E         1.0        109.47      3.1656   78.20       0.423     0
TIP3P         0.9572     104.52      3.1506   76.52       0.417     0
TIP4P         0.9572     104.52      3.1540   78.02       0.52      0.15
TIP4P/2005    0.9572     104.52      3.1589   93.2        0.5564    0.1546
TIP5P         0.9572     104.52      3.1200   80.51       0.241     0.70

1.5.2 Non-classical water models
With the recent increase of computational power it has become possible to simulate relatively large systems with potentials that explicitly take into account effects such as polarizability or flexibility. Rigid water models generally give an excessive stabilization of the dimer compared with polarizable models [58]. Although the computing time needed to simulate a polarizable water model is approximately one order of magnitude higher than for the rigid-body water models described above, it should increase the accuracy of the simulation results and shed light on the role of polarization in the water anomalies. Polarizability is the ability of the electronic cloud of an atom to change its distribution in the presence of an external field. In classical rigid water models this effect was not implemented because of its computational cost. In this case the polarization effects are neglected, which can be a source of errors and deviations from the experimental data. Recently, however, several polarizable water models such as BK, SWM4 and AMOEBA were developed [25, 26, 59]. There are different ways to implement polarization. For example, in the AMOEBA force field polarization effects are treated via mutual induction of dipoles at atomic centers, where the atomic polarizabilities were derived from experimental data. In terms of computational time this approach is 8 times slower than the simulation of a classical rigid-body water model. It is also worth mentioning that for vdW interactions AMOEBA uses a 14-7 potential [60] with repulsion-dispersion parameters placed on both oxygens and hydrogens, instead of the commonly used Lennard-Jones potential, which is applied only to the oxygen atoms. Another way to introduce polarization is to use a Drude oscillator potential. In this case a point charge is connected to the oxygen atom via a classical spring. In the absence of an external field the spring particle remains on the oxygen site and the net charge on the oxygen is zero; to balance the positive charges of the hydrogens, a dummy particle with negative charge is introduced. However, the description of some processes, such as proton transfer, requires the breaking and formation of covalent bonds [61]. For these purposes more complex water potentials are used [62].
These potentials use ab initio calculations to represent the reacting fragments, while the remainder of the system is treated classically. One of the simplest methods is the Empirical Valence Bond method, in which the ab initio potential energy surface is fitted with an analytic form [63]. At the same time there are attempts to create coarse-grained potentials that mimic the behavior of water [64]. The aim of such a model is to give a qualitatively good description of the water properties while remaining computationally fast. In general such models can be tuned to reproduce some water properties, such as the density, but the lack of a fully atomistic description introduces errors in other properties that depend on the reorientational motion of the hydrogens.
1.6 Simulation details
Unless specified otherwise, all simulations of bulk water in this work were done as follows. The GROMACS simulation package was used to run the molecular dynamics [31]. The Berendsen barostat [42], the velocity-rescaling thermostat [43] and the particle-mesh Ewald method [39] were used for pressure coupling, temperature coupling and the calculation of long-range electrostatics, respectively. The coupling times for the barostat and the thermostat were set to τP = 1.0 ps and τT = 1.0 ps, respectively. This combination of pressure and temperature coupling is expected to produce a correct canonical distribution of the kinetic energy, keeping in mind that the Berendsen barostat only approximates the NPT ensemble (Section 1.4.1). Non-covalent interactions were treated with a 1.2 nm cut-off. The integration timestep was set to 2 fs. This value was chosen in order to monitor the kinetics of a single hydrogen bond, whose lifetime is on a similar timescale. All simulations were done at atmospheric pressure and with periodic boundary conditions. The data were collected over 25000 snapshots taken from a 100 ps long run after a 10 ns equilibration under the same conditions. This equilibration length was chosen so that the system equilibrates at low temperatures. In all bulk water simulations the box contains 1024 water molecules.
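As a quick consistency check on the numbers quoted in this section, the following Python sketch works out how often a snapshot has to be saved. It only uses the values stated above (2 fs timestep, 100 ps production run, 25000 snapshots, 10 ns equilibration); the variable names are hypothetical and no GROMACS input is implied.

```python
# Values quoted in Section 1.6.
TIME_STEP_FS = 2.0            # integration timestep
PRODUCTION_LENGTH_PS = 100.0  # length of the production run
N_SNAPSHOTS = 25000           # number of saved snapshots
EQUILIBRATION_NS = 10.0       # length of the equilibration run

production_fs = PRODUCTION_LENGTH_PS * 1000.0            # ps -> fs
snapshot_interval_fs = production_fs / N_SNAPSHOTS       # time between snapshots
steps_between_snapshots = round(snapshot_interval_fs / TIME_STEP_FS)
production_steps = round(production_fs / TIME_STEP_FS)
equilibration_steps = round(EQUILIBRATION_NS * 1.0e6 / TIME_STEP_FS)  # ns -> fs

print(snapshot_interval_fs, steps_between_snapshots)
print(production_steps, equilibration_steps)
```

With these values a snapshot is saved every 4 fs, that is every second integration step, which is the time resolution available for following individual hydrogen bonds.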
Chapter 2
Water phase diagram and water anomalies
2.1 Water phase diagram
Water is present on Earth as a gas, a liquid and a solid. Its properties are of great interest to researchers from various fields for the following reasons. First, water plays a central role in biological processes, and studying the dynamic and kinetic properties of water molecules can help in investigating the role of water around biomolecules. Second, water is one of the most prevalent substances in the universe, and the investigation of its properties can shed some light on the composition and behavior of objects in outer space. Third, water has a rich phase diagram with many different crystalline forms, and studying its properties and structure can help to investigate the general laws of phase transitions and the properties of amorphous, liquid and crystalline substances. H2O ice is characterized by one of the most complex phase diagrams: at least 16 different crystalline and amorphous modifications are observed at different pressures P and temperatures T [65, 66]. Some of these crystalline forms are stable, others (Ic, IV, IX, XII) exist only in metastable form. In the crystal phases at normal pressure the local structure of water is close to perfectly tetrahedral, while at high pressures it becomes distorted [67]. At pressures higher than 5 katm two independent interpenetrating hydrogen bond networks are formed (ice VI, VII, VIII) [68–70]. In general, all these possible phases of water can occur in nature due to the restructuring of the water hydrogen bond network [71]. However, because phase transitions occur on longer timescales than are accessible to molecular dynamics simulations, the direct observation of crystallization is impossible. For this purpose, methods based on energy calculations of structures constructed beforehand are used [72, 73]. Fig. 2.1 shows the experimental phase diagram together with the results for the TIP4P water model [73, 74].
Although the two diagrams are not quantitatively the same, the TIP4P model is able to capture the main features of the water phase diagram.
Figure 2.1: Phase diagrams of water. Left panel: simulation results from the TIP4P water model. Right panel: experimental phase diagram. Only stable phases of ice are shown. Adapted from Ref. [73].
2.2 Water anomalies
The anomalies of water are properties in which the behavior of liquid water differs from what is found in other liquids [75]. In the following section we highlight some of the anomalous properties of water. At atmospheric pressure, after passing the melting point the density of water increases, reaches its maximum at 277 K and only then decreases, while in other liquids the density always decreases with increasing temperature [76,77]. Such a maximum is the only one occurring in liquids in their stable liquid phases just above the melting point [77]. The high density of liquid water is due mainly to the cohesive nature of the hydrogen-bonded network, with each water molecule capable of forming four hydrogen bonds. This reduces the free volume and ensures a relatively high density, partially compensating for the open nature of the hydrogen-bonded network. The anomalous temperature-density behavior of water can be explained in terms of the range of environments within whole or partially formed clusters with differing degrees of dodecahedral puckering [78,79].
Another interesting property related to the water density is that the density of liquid water is higher than the density of ice. It is usual for liquids to contract on freezing and expand on melting. This is because the molecules are in fixed positions within the solid but require more space to move around within the liquid [80].
The structure of ice Ih is open, with a low packing efficiency, and all the water molecules are involved in four directed, tetrahedrally oriented hydrogen bonds. On passing the melting point some of these bonds break and some become distorted. This differs from other solids, where breaking bonds upon melting requires more space and therefore the density decreases [80]. It is worth noting that this situation does not occur with the high-pressure ices (III, VI, VII), which expand on melting [81].
It could be expected that, due to the large cavities in the hydrogen bond network, water should have a high isothermal compressibility, $\kappa_T = -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T$. In fact, water has an unusually low compressibility (0.46 GPa$^{-1}$, compared with 1.05 GPa$^{-1}$ for CCl4 at 300 K) [82, 83]. The low compressibility of water is due to the cohesive nature of its hydrogen bonds. This means that there is in fact not as much free space as might be expected. The temperature dependence of the compressibility is also different from that of typical liquids. In a typical liquid the compressibility increases with increasing temperature (the structure becomes less compact). But because the structure of water becomes more open at lower temperatures, its capacity to be compressed increases on cooling [84–86]. At sufficiently low temperatures, where the liquid-amorphous phase transition occurs, the compressibility reaches its maximum [86] (see Fig. 2.2).
Figure 2.2: Isothermal compressibility of water. Solid lines are data from Ref. [86], symbols represent the data from Refs. [84,85,87]. Figure is adapted from Ref. [86].
Water has the highest specific heat of all liquids except ammonia. This occurs because, as water is heated, the increased movement of the water molecules causes the hydrogen bonds to bend and break. As the energy absorbed in these processes is not available to increase the kinetic energy of the water, it takes considerable heat to raise water's temperature.
Also, as water is a light molecule, there are more molecules per gram than for most similar substances to absorb this energy [57,76]. However, the occurrence of a maximum in the specific heat as the pressure or temperature is varied across the extension of the coexistence line is well documented. This is understood through the definition of the 'Widom line', a term introduced to define the locus of maximum correlation length that extends into the single fluid phase beyond the critical point [88].
Another striking property of water is the anomalous increase of its viscosity with decreasing temperature [89, 90]. The water cluster equilibrium shifts towards the more open structure as the temperature is lowered. This structure is formed by stronger hydrogen bonding. This creates larger clusters and reduces the ability to move, or in other words increases the viscosity [57]. It is also interesting that the Stokes-Einstein relation, which connects viscosity and temperature, $D = \frac{k_B T}{6\pi\eta r}$ (here $D$ is the diffusion coefficient, $\eta$ is the viscosity and $r$ is the approximate radius of the particle), is violated for water. At low temperatures the temperature dependence of the diffusion can be fitted with the Arrhenius law, while at high temperatures it behaves according to the empirical Vogel-Fulcher-Tammann relation $D = D_0 \exp\left(\frac{k T_0}{T - T_0}\right)$, where $D_0$ and $T_0$ are fitting coefficients. An example of such behavior is shown in Fig. 2.3 [90–92].
Figure 2.3: The temperature dependence of the inverse of the self-diffusion coefficient of water. Red line is a fit to the Vogel-Fulcher-Tammann relation, dashed line is a fit to the Arrhenius law. Figure is adapted from Ref. [90].
Here we have explained only some of the unusual properties of water, but it is evident that its properties are strongly correlated with the local structure of its hydrogen bonds.
In order to study the structure and dynamics of hydrogen bond networks, various experiments have been performed [71, 73, 90, 93–95] and theories proposed [9, 10, 13, 16, 66], but the whole picture is still unclear. For example, the inhomogeneities of liquid water, and its structure in general, remain an open question [16].

Chapter 3

Water supercooling and freezing

3.1 General perspective

Water freezing is not simply the reverse of ice melting. Melting is a single-step process that occurs at the melting point as ice is heated, whereas freezing of liquid water on cooling requires ice crystal nucleation and crystal growth, which is generally initiated a few degrees below the melting point even for pure water [96]. Here we refer to liquid water below its melting temperature as supercooled water. Liquid water may be easily supercooled to 248 K and, with more difficulty, to the temperature of homogeneous nucleation $T_H \approx 225$ K at atmospheric pressure [84, 97]. Supercooled water is a metastable phase of liquid water below the melting temperature [66]. In this regime, the transition to the solid phase is irreversible once the process is activated.

Figure 3.1: Schematic illustration indicating the various phases of liquid water. Figure is adapted from Ref. [97].

At these low temperatures water is still a liquid, but glassy water, also called amorphous ice, can exist when the temperature drops below the glass transition temperature $T_g$ (about 130 K at 1 atm). Although glassy water is a solid, its structure exhibits a disordered, liquid-like arrangement [66]. This state of water has been known for many years and is called low-density amorphous ice (LDA). Around thirty years ago another form of amorphous ice with a much higher density (high-density amorphous ice, HDA) was obtained experimentally [98] (see Fig. 3.1). Low-density ice was originally obtained by depositing water vapor upon a cold plate [99] or by rapid cooling of small water droplets [100].
Upon heating to 130 K this form of ice transforms into a highly viscous liquid [101]. High-density ice, on the other hand, was obtained by compressing hexagonal ice Ih at temperatures below 150 K [66, 98, 102]. On further compression HDA crystallizes into high-density crystalline ice [103]. Moreover, on changing the pressure these two forms (LDA and HDA) can interconvert with a volume change of about 20%. Thus it remains unresolved whether HDA should be considered a glassy state of liquid water or a collapsed crystal state. Recently it was hypothesized that at higher temperatures LDA and HDA turn into low-density and high-density liquid phases, respectively [13, 66]. However, the possible liquid-liquid critical point lies in the so-called ”no man’s land”, a region almost unreachable for experiments because supercooled water freezes at such temperatures.

An interesting discussion recently developed on the relationship between the crystallization rate and the time scales of equilibration within the liquid phase [104, 105]. Calculations using a coarse-grained monatomic model of water, the mW model, suggested that equilibration of the liquid below the temperature of homogeneous nucleation $T_H \approx 225$ K is slower than ice nucleation [105]. This observation has important consequences for a proposed theory of water anomalies, which predicts a second critical point below $T_H$ where a liquid-liquid phase transition occurs [13]. Although it has attracted attention [106–109], this theory is not without problems. If ice nucleation is faster than liquid relaxation, the liquid-liquid transition would lose its meaning from a thermodynamic point of view, because the liquid phase would not be equilibrated [104]. It is worth noting that, over the whole history of molecular dynamics simulations of water, there is still no evidence of systematic water nucleation [14].
3.2 Test of water freezing

To investigate the relaxation properties of an atomistic model in the supercooled region below $T_H$, a 3 µs long molecular dynamics simulation of the TIP4P-Ew water model was performed. The length of this calculation is one order of magnitude larger than the 350 ns used to study freezing with the mW model [105]. The simulation was run at 190 K and 1250 atm. These values are close to the estimated liquid-liquid critical point for TIP4P-Ew [15], congruous with recent calculations on the similar TIP4P/2005 model [107].

Structural parameters are designed to distinguish between different phases by analyzing the geometrical structure. Here we used two different approaches to estimate the structural order of water molecules. The first is the tetrahedral order parameter $q_i$, which takes into account the configuration of the four nearest neighbors of water molecule $i$. It was calculated as

$$q_i = 1 - \frac{3}{8}\sum_{j=1}^{3}\sum_{k=j+1}^{4}\left(\cos\psi_{jik} + \frac{1}{3}\right)^2, \tag{3.1}$$

where $\psi_{jik}$ is the angle formed by the oxygens of the central molecule $i$ and its neighbors $j$ and $k$ [71]. The value of this order parameter averaged over the ensemble of water molecules at a single timestep is denoted $Q_T$. The second parameter is the bond-orientational order parameter $Q_6$ developed by Steinhardt et al. [110]. This parameter is a function of a projection of the density field onto averaged spherical harmonic components. To calculate $Q_6$ we first need the set of quantities

$$q^i_{l,m} = \frac{1}{4}\sum_{j\in n_i} Y_l^m(\phi_{ij}, \theta_{ij}), \qquad -l \le m \le l, \tag{3.2}$$

where the sum runs over the four nearest neighbors $n_i$ of molecule $i$, and $Y_l^m$ is the $l, m$ spherical harmonic function associated with the angular coordinates of the vector $\vec{r}_i - \vec{r}_j$ joining molecules $i$ and $j$, measured with respect to an arbitrary external frame.
These quantities are then summed over all particles to obtain a global metric,

$$Q_{l,m} = \sum_{i=1}^{N} q^i_{l,m}, \tag{3.3}$$

and then contracted along the $m$ axis to produce a parameter that is invariant with respect to the orientation of the arbitrary external frame,

$$Q_l = \frac{1}{N}\left(\sum_{m=-l}^{l} Q_{l,m} Q^*_{l,m}\right)^{1/2}. \tag{3.4}$$

The most probable value of $Q_l$ for an amorphous phase approaches zero in the thermodynamic limit, while it is finite for a crystalline phase [104]. We used $l = 6$ because it was found empirically to be useful for distinguishing liquid water from ice [104, 111]. It is worth noting that the main difference between these order parameters is that $Q_6$ measures the crystalline order of the whole system, whereas the tetrahedral order parameter underlying $Q_T$ describes the order around a single water molecule. The latter can differ from molecule to molecule, showing that at the same instant some waters keep a tetrahedral, ice-like structure while others have a distorted, liquid-like structure.

Figure 3.2: Time series for the 3 µs trajectory. (A) potential energy $E_p$ [kJ/mol]; (B) density $\rho$ [g cm$^{-3}$]; (C) tetrahedral order parameter $Q_T$; (D) $Q_6$ parameter. Right panels show the probability distribution of the respective quantities.

Under the simulated conditions, water freezing was not observed, as shown by the time series of the potential energy $E_p$ (Fig. 3.2A). Fluctuations are of the order of 0.5 kJ/mol per molecule with no systematic drift. It has been observed that once freezing is activated the potential energy drifts very quickly to low values, with large energy changes (e.g. roughly 5 and 2 kJ/mol per molecule for TIP4P at 230 K [14] and TIP4P/2005 at 242 K [112], respectively). The time series of the density $\rho$ and the tetrahedral order parameter $Q_T$ [71] are shown in Fig. 3.2B-C.
They respectively correlate and anticorrelate with the potential energy (Pearson correlation coefficient $r = 0.69$ and $-0.86$; see upper panel of Fig. 3.3). The distributions of both $\rho$ and $Q_T$ show an appreciable bump at one of the tails (see right panels of Fig. 3.2B-C), suggesting the presence of a subpopulation. For the tetrahedral order parameter, the subpopulation emerges at values around 0.873 (red dashed line and right side of Fig. 3.2C). This fluctuation is localized in a time window between 2.3 and 2.6 µs, in correspondence with a decrease of both the density and the potential energy. It is interesting to note that density subpopulations have been interpreted by some [111] as a signature of the aforementioned liquid-liquid transition.

To check whether this fluctuation corresponded to an ice nucleation attempt, the $Q_6$ order parameter [104, 110, 113] was calculated (Fig. 3.2D). In the time window between 2.3 and 2.6 µs the value of the parameter is around 0.025, with no signs of ice nucleation. Moreover, no correlation with the energy was found ($r = 10^{-6}$). Since the value of $Q_6$ for hexagonal ice is expected to be one order of magnitude larger [113], no evidence for ice nucleation is found in the present trajectory. Furthermore, neither correlation nor anticorrelation was observed between $Q_6$ and any of the other calculated parameters (bottom panel of Fig. 3.3).
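The tetrahedral order parameter of Eq. 3.1 can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used for this work: the function names are hypothetical, positions are plain coordinate tuples, and periodic boundary conditions are ignored for brevity.

```python
import math
from itertools import combinations


def four_nearest(center, others):
    """Return the four positions in `others` closest to `center` (no periodic images)."""
    return sorted(others, key=lambda p: math.dist(center, p))[:4]


def tetrahedral_order(center, neighbors):
    """q_i = 1 - 3/8 * sum over neighbor pairs of (cos(psi_jik) + 1/3)^2."""
    if len(neighbors) != 4:
        raise ValueError("exactly four neighbors are required")
    total = 0.0
    for a, b in combinations(neighbors, 2):  # the six pairs (j, k) with j < k
        va = [x - c for x, c in zip(a, center)]
        vb = [x - c for x, c in zip(b, center)]
        dot = sum(x * y for x, y in zip(va, vb))
        cos_psi = dot / (math.hypot(*va) * math.hypot(*vb))
        total += (cos_psi + 1.0 / 3.0) ** 2
    return 1.0 - 0.375 * total


def mean_tetrahedral_order(oxygens):
    """Q_T: average of q_i over all oxygen positions (a list) of one snapshot."""
    q_values = []
    for i, center in enumerate(oxygens):
        others = oxygens[:i] + oxygens[i + 1:]
        q_values.append(tetrahedral_order(center, four_nearest(center, others)))
    return sum(q_values) / len(q_values)


# Four neighbors at the vertices of a regular tetrahedron around the origin
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
q_perfect = tetrahedral_order((0, 0, 0), tetra)
```

For the regular tetrahedron every angle satisfies $\cos\psi = -1/3$, so each term of the sum vanishes and $q_i$ equals one. Any distortion of the first shell lowers the value, which is why $Q_T$ is sensitive to the ice-like or liquid-like character of the local environment.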
To check that water molecules can still move under the studied conditions, we also calculated the oxygen mean-square displacement (MSD) as

$$\mathrm{MSD}(t) = \langle (r_i(t) - r_i(0))^2 \rangle, \tag{3.5}$$

where $r_i$ denotes the coordinates of a single atom (Fig. 3.4).

Figure 3.3: Instantaneous relationship between $Q_6$, $Q_T$, density and potential energy.

At timescales shorter than 1 ns, water shows subdiffusive behavior (dotted line in Fig. 3.4). At longer times the system enters a diffusive regime, following the linear relationship $\mathrm{MSD} \propto t$ (dashed line), with a maximum average displacement of 3.47 nm after 3 µs. Taking into account that the molecular diameter is around 0.3 nm, water molecules have diffused about 11.5 molecular diameters (the average box side length is 3.14 nm).

These results provide evidence that the liquid phase of the TIP4P-Ew model is at equilibrium in the supercooled regime before ice nucleation. This result is in agreement with another µs-long simulation of supercooled water with a 5-site model [111], suggesting that equilibration of the liquid phase below $T_H$ is a common feature of atomistic models.

Figure 3.4: Oxygen mean square displacement (MSD) [nm$^2$] versus time [ns]. The dashed and dotted lines represent a linear and a power-law (exponent equal to 0.1) regression, respectively. The diffusion coefficient extracted from the linear regime is $6.6 \times 10^{-9}$ cm$^2$/s. The g_msd function of GROMACS was used with 150 windows to improve statistics.

The mW
model has been shown to reproduce several properties of water, including the density and the phase diagram [114]. But the lack of hydrogens, and consequently of molecular reorientations [17], might considerably speed up the time scales. The differences in relaxation kinetics between atomistic models and the mW model are probably due to the lack of molecular reorientations in the latter. Clearly, further experimental validation is needed to clarify which proposed mechanism (if any) is closer to real water.

Chapter 4

Complex network approach for molecular dynamics trajectories and hydrogen bond as an order parameter

Molecular dynamics simulations can provide important information about the thermodynamics and kinetics of the simulated systems [28]. Order parameters are conventionally used for this purpose [115, 116]. Some of the order parameters commonly used to measure the structure of liquids were described in the previous chapter. Unfortunately, reduced descriptions based on order parameters are known to be inaccurate in many cases [28, 115, 117–120]. A description based on an order parameter cannot always assign a given value of the order parameter to a single state. Moreover, in some cases the kinetic description based on the order parameter is wrong. An example of such a problem is the stochastic two-state model studied in Ref. [115] (see Fig. 4.1). The failure originates from overlaps in the order parameter distribution, i.e., configurations with different properties correspond to the same value of the coordinate, making the discrimination between states almost impossible [121, 122]. To improve this situation, a new arsenal of tools has emerged that makes use of complex networks and the theory of stochastic processes [28, 123–125], as described in the following.

4.1 Complex network as a tool to study molecular simulations

A network is a set of items, which we will call nodes, with connections between them, called edges.
Systems taking the form of networks abound in the world [126]. Here we call a “complex network” a network with non-trivial topological properties. Surprisingly, such networks can be obtained from many sociological [127], biological [128] or technological systems [129]. Analysis of networks built from real systems provides much useful information. For example, network analysis makes it possible to detect the most vulnerable nodes, whose removal would strongly reduce the connectivity of the network. Another useful property of networks is community (cluster) structure, i.e., groups of nodes that have a high density of edges within them and a lower connectivity between groups. Social complex networks naturally split into groups according to interests, friendship, age or occupation. The same happens with complex networks built from other systems, but for some systems splitting into communities is not so easy. Many algorithms have been proposed for this purpose [130–133]. Some are fast but less accurate; others predict the cluster structure better but require more computational time. Another important aspect is that the output of an algorithm depends on the structure of the complex network. For the typical analysis of a molecular dynamics trajectory, not all conventional algorithms are able to properly map the free-energy landscape [124, 134].

Here we describe one complex network approach to map the free-energy landscape of a system from a molecular dynamics simulation. The basic idea is to map a dynamical system onto a discrete set of microstates and their interconversion rates, as calculated from the original trajectory. The advantage of this approach is that it allows different parameters to be merged into a single order parameter.

Figure 4.1: Time series of an artificial order parameter of a stochastic two-state model. (a) The conventional histogram method is unable to distinguish between two states with the same value of an order parameter. (b) Network clusterization techniques allow the lumping of kinetically homogeneous regions of the network into states and build a model of the original process. Figure is adapted from Ref. [115].

Figure 4.2: Example of a complex network obtained from molecular dynamics. Here, microstates were defined as different conformations of a protein. The upper panel shows the whole complex network; in the lower panel, nodes that belong to the same cluster are merged together. Figure is adapted from Ref. [135].

To obtain the transition network from a molecular dynamics trajectory, the following procedure is applied. For the snapshot at time $t$, a microstate is defined for each water molecule on the basis of some order parameter. In the case of water the most natural parameter is the hydrogen bond structure of its solvation shells [16]. This microstate represents a single node of the transition network. The order parameter is then evaluated at the next snapshot, $t + \Delta t$, to obtain the corresponding microstate. If the two microstates $i$ and $j$ are different, a link with weight $W_{ij} = 1$ is added to the transition network; if the microstate remained the same, a self-link $W_{ii}$ is added. If a transition occurs again, the weight of the corresponding link is increased by one. Repeating this procedure for all snapshots in the trajectory yields the transition network.
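The bookkeeping behind the transition network can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the microstates are assumed to be already encoded as hashable labels (here single letters), one per saved snapshot of one molecule, and the function names are hypothetical.

```python
from collections import Counter


def build_transition_network(microstates):
    """Count node visits and directed links (self-links included) along a time series."""
    node_weight = Counter(microstates)
    link_weight = Counter(zip(microstates[:-1], microstates[1:]))
    return node_weight, link_weight


def transition_probabilities(link_weight):
    """Normalize the link weights leaving each node so that they sum to one."""
    out_total = Counter()
    for (i, _j), w in link_weight.items():
        out_total[i] += w
    return {(i, j): w / out_total[i] for (i, j), w in link_weight.items()}


# Toy time series of microstate labels for a single molecule
series = ["A", "A", "B", "A", "B", "B"]
nodes, links = build_transition_network(series)
probs = transition_probabilities(links)
```

In this toy series the link A→B is observed twice and the self-link A→A once, so the estimated probability of leaving A for B is 2/3. For a bulk simulation the counters obtained for the individual molecules can simply be added, since `Counter` objects support addition.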
At equilibrium the normalized weight of a node is equal to its probability, and the weight of a link between two nodes is proportional to the transition probability [28].

In the case of liquid water the definition of the microstate has to capture the topology of the hydrogen-bond network around a given water molecule, which determines the structural and dynamical properties of the bulk. However, the binding partners of any central water molecule are not predefined but keep exchanging on a fast picosecond time scale [136]. Therefore, any approach to define a microstate must be invariant to interchanging water molecules, as well as binding sites [16]. To simplify the definition of the microstate it is useful to assume that each water molecule can form at most four hydrogen bonds (two on the oxygen and one on each hydrogen). In some cases all four possible hydrogen bonds are formed, but in others there are broken bonds and distorted loops (see Fig. 4.3). The microstate definition describes each possible structure by a unique string that encodes the connectivity through hydrogen bonds. For each molecule a search for hydrogen bond partners is performed. After finding the molecules that form the first solvation shell, the search expands in a tree-like manner. Each subsequent solvation shell is a new generation and follows, in order, in the microstate string, numbered by its position in the fully hydrogen-bonded tree up to the second solvation shell [16].

From an operative point of view, the gradient-cluster algorithm used to analyze the transition network works on a per-node basis by deleting all the links (transitions) of a node except the most visited one, which represents the local direction of the gradient. When applied to the whole network, the algorithm provides a set of disconnected trees, each of them representing a collective pathway of relaxation to the bottom of the local free-energy basin of attraction (gradient-cluster, gray regions in Fig. 4.4).

Figure 4.3: Water microstates.
(a) Conformation in which all four hydrogen-bonding sites of each water molecule connect to new water molecules, and the corresponding microstate string. Water molecules are numbered according to their appearance in the tree search, and water molecules from subsequent generations are placed next to each other. (b) If a hydrogen-bonding site is empty (e.g., molecule 5), it is labeled as 0, as are all subsequent entries down the tree. Small loops, such as 1-2-3, are included in a natural fashion. Figure is adapted from Ref. [16].

Each gradient-cluster represents a structurally and kinetically well-defined molecular arrangement with an extension of up to two solvation shells [16]. The application of the conformational network technique is shown in Chapter V of this work.

As observed elsewhere [119, 124, 135, 137], the transition network synthetically encodes the complex organization of the underlying free-energy landscape. Specifically, densely connected regions of the network correspond to free-energy basins, i.e., metastable regions of the configuration space. Several algorithms can be used to extract this information, including the max flow theorem [119], random walks [124, 138] or transition gradient analysis [137, 139]. All these approaches aim to partition the network into kinetically and structurally well-defined basins of attraction.

In enthalpy-driven free-energy landscapes, of which proteins are an archetypal example, the probability to stay inside a given basin, $Z_{in}$, is much larger than the probability to hop outside, $Z_{out}$ [119, 135]. That is, basin hopping is a rare event. Moreover, the number of neighboring basins is usually very limited, with the emergence of well-defined transition pathways [28, 125, 135]. This is not the case for water [16]. Being a liquid, it is mainly characterized by entropic basins of attraction. As illustrated in Fig.
4.4, $Z_{in}$ and $Z_{out}$ become comparable because the sum of the many small inter-basin transition probabilities ($Z_{out}$) is similar to the few highly populated intra-basin relaxations ($Z_{in}$). In other words, the probability to leave basin $i$ is similar to the probability to stay in it. This observation would lead to the conclusion that, at the atomic level, water does not have any type of configurational selection. However, this is not true when all the contributions to $Z_{out}$ are considered separately:

$$Z_{out} = \sum_i Z_{out}^{(i)}. \tag{4.1}$$

Structural inhomogeneities, i.e., configurational selection, emerge because

$$\max Z_{in}^{(i)} > \max Z_{out}^{(i)}, \tag{4.2}$$

meaning that the probability of an intra-basin transition is larger than that of hopping to any other specific basin. When this condition holds, the environment of a given water molecule alternately adopts a number of different configurations, each of them characterized by a specific free-energy basin of attraction. This is an emergent property of water at ambient temperature [16].

Figure 4.4: Configuration-space networks. Pictorial representation of the relative balance between intra-basin ($Z_{in}$) and inter-basin ($Z_{out}$) transition probabilities from the point of view of a node (in blue). Gray regions represent free-energy basins of attraction as detected by the gradient-algorithm [137, 139].

4.2 Hydrogen bond criteria

The hydrogen bond is one of the possible order parameters that can be used to obtain the free-energy landscape of water. It represents a fundamental interaction in molecular systems [140]. Its peculiarity resides in the aspects it shares with both covalent bonds and van der Waals interactions. In hexagonal ice the energy of the hydrogen bond is part electrostatic (90%) and part covalent (10%) [141]; however, it is not clear whether this is also the case for liquid water.
The strong directionality, together with the ease with which it is formed and broken at ambient conditions, makes it an important ingredient in water structure and dynamics [142], protein stability [143] and ligand binding [144]. Nevertheless, a universal definition of this interaction is still missing [145]. The case is even more difficult for molecular dynamics, where different potentials for water are used [136].

Hydrogen bonds are formed between two polar atoms via a hydrogen which is covalently bound to one of the two. This interaction is highly directional. For example, in bulk water at 300 K the angle OH-O is mostly below 30 degrees [146], while the donor-acceptor distance is around 3.5 Å [147]. Despite the apparent simplicity, the presence of thermal fluctuations as well as the non-trivial effects of the environment have made the development of an operative definition of this bond difficult.

In the last decades, several definitions were proposed based on computer simulations [136]. The most popular ones detect bond formation by using a mixture of distances and angles between the two partners [148–150]. Others tried to avoid cutoffs altogether by proposing topology-based definitions [151–153]. Given the many degrees of freedom involved in molecular association, it is now clear that all definitions retain some degree of arbitrariness [154].

In most cases, hydrogen bond definitions were developed at specific thermodynamic conditions. However, not much is known about the behavior of those definitions as a function of temperature and water model. This section is an effort to present a transparent comparison between hydrogen bond definitions under several different conditions, including temperature, water model and cutoff dependence. Here, we present an assessment of the most used hydrogen bond definitions based on the analysis of molecular dynamics simulations of water in a temperature range from 220 K to 400 K.
Six among the most widespread classical water models were used in the analysis, including SPC [52], SPC/E [53], TIP3P [54], TIP4P [74], TIP4P-Ew [51] and TIP4P/2005 [55]. A comparison of these water models per se is presented in Chapter V.

Six hydrogen bond definitions were considered. Here we distinguish two broad classes of hydrogen bond definitions: geometrical and topological (Fig. 4.5). The difference between them is that geometrical definitions make use of cutoffs on inter-atomic distances and angles, while topological definitions mostly avoid this problem by using topological criteria. A brief description of the definitions follows.

Figure 4.5: Hydrogen bond definitions can be roughly partitioned into two classes: geometrical and topological.

Geometrical definitions

1. $r_{OH}$. In this definition the oxygen-hydrogen distance ($r_{OH}$) is used as the criterion (Fig. 4.5A) [149]. In the original work, a cutoff of 2.3 Å was proposed by simulating amorphous ice at T = 10 K with the TIPS2 potential [155]. The distance cutoff value is related to the position of the first minimum in the oxygen-hydrogen radial distribution function.

2. $r_{OO}\Theta$. This definition makes use of both the oxygen-oxygen distance ($r_{OO}$) and the ∠OOH angle ($\Theta$) between two water molecules. In the original work, a bond was considered formed when $r_{OO}$ and $\Theta$ were smaller than 3.5 Å and 30 degrees, respectively [150]. The distance cutoff was taken from the position of the first minimum in the oxygen-oxygen radial distribution function. Since the distribution of the angle $\Theta$ lacks a clear signature of the bond state, the cutoff value was taken from experimental data [146, 147].

3. Sk. The hydrogen bond definition of Skinner and collaborators is based on an empirical correlation between the occupancy $N$ of the O···H $\sigma^*$ orbital and the geometries observed in molecular dynamics simulations [148].
Two water molecules were considered bonded if the value of $N$ is higher than a certain cutoff, which is taken at the position of the first minimum in the distribution of $N$. In the original paper $N$ was defined as

$$N = \exp(-r/0.343)\,(7.1 - 0.05\phi + 0.00021\phi^2), \tag{4.3}$$

where $\phi$ is the angle between the water molecule bisector and a vector between the oxygen of a water molecule and the hydrogen of a possible partner (see Fig. 4.6). A cutoff equal to 0.0085 was chosen by analyzing MD simulations of the SPC/E model at ambient conditions.

Topological definitions

4. $D\Theta$. A hydrogen bond is formed between a hydrogen atom and its nearest oxygen not covalently bound to it. An additional restriction was imposed: the angle $\Theta$ had to be lower than $\pi/3$. In the original work [152], this definition was applied to the study of the SPC/E water model for temperatures ranging from 273 to 373 K.

5. DA. Two criteria for the hydrogen bond were used: (i) the acceptor is defined as the closest oxygen to a donating hydrogen and (ii) this hydrogen is the first or second nearest neighbor of the oxygen. As a consequence, the total number of hydrogen bonds per water is limited to four. This definition was proposed with simulations of the EMP water model at 292 K [151].

6. TP. A hydrogen bond is formed between a hydrogen and its closest oxygen. When more than one hydrogen bond between the two water molecules is found, the one with the shortest oxygen-hydrogen distance is considered to be formed [153]. This definition was mainly evaluated at ambient conditions using the TIP4P/2005 water model.

Figure 4.6: Pictorial representation of the distances and angles used for hydrogen bond definitions. The z axis is perpendicular to the molecular plane. Figure is adapted from Ref. [148].

To analyze the differences between the hydrogen bond definitions described above, we analyzed the number of hydrogen bonds per molecule.
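The three geometrical definitions reduce to simple tests on distances and angles, as the following Python sketch illustrates with the default cutoffs quoted above. It is only a sketch: the function names are hypothetical, coordinates are assumed to be in Å, the angle $\Theta$ is assumed to be measured at the donor oxygen, and in Eq. 4.3 the quantity $r$ is assumed to be the intermolecular O···H distance in Å with $\phi$ in degrees.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def _angle_deg(u, v):
    """Angle between vectors u and v in degrees."""
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hbond_r_oh(h_donor, o_acceptor, cutoff=2.3):
    """r_OH definition: O...H distance below the cutoff."""
    return math.dist(h_donor, o_acceptor) < cutoff


def hbond_r_oo_theta(o_donor, h_donor, o_acceptor, r_cut=3.5, theta_cut=30.0):
    """r_OO-Theta definition: O-O distance and O-O-H angle both below their cutoffs."""
    r_oo = math.dist(o_donor, o_acceptor)
    theta = _angle_deg(_sub(o_acceptor, o_donor), _sub(h_donor, o_donor))
    return r_oo < r_cut and theta < theta_cut


def skinner_occupancy(r, phi):
    """Occupancy N of Eq. 4.3 (assumed units: r in angstrom, phi in degrees)."""
    return math.exp(-r / 0.343) * (7.1 - 0.05 * phi + 0.00021 * phi ** 2)


def hbond_sk(r, phi, cutoff=0.0085):
    """Sk definition: bonded when the occupancy exceeds the cutoff."""
    return skinner_occupancy(r, phi) > cutoff


# Idealized linear arrangement: donor O at the origin, its H pointing at the acceptor O
o_d, h_d, o_a = (0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (2.8, 0.0, 0.0)
linear_is_bonded = hbond_r_oh(h_d, o_a) and hbond_r_oo_theta(o_d, h_d, o_a)
```

All three tests depend on cutoffs, which is what makes them sensitive to the thermodynamic conditions and to the water model. The topological definitions instead require a nearest-neighbor search over all molecules and are not covered by this sketch.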
Here we performed the analysis on the SPC/E model, since results of its simulations were used to define the most recent Sk hydrogen bond criterion [148]. Discrepancies were found in the distribution of the number of bonded partners (Fig. 4.8). At 300 K the geometrical definitions were quite consistent with each other, with a larger fraction of three-hydrogen-bonded configurations for Sk. For the topological case, DA and TP agreed on the number of four-coordinated molecules. However, the former detected a larger fraction of three- and two-bonded molecules, while TP presented a non-negligible fraction of cases with five partners and no evidence for two-bonded molecules. This scenario changes when a topological definition is coupled with an angle cutoff ($D\Theta$). In this case, results almost identical to those of the conventional $r_{OO}\Theta$ were found, with an agreement that persists over the entire temperature range, as shown in Fig. 4.7 (green and cyan data).

Figure 4.7: Average number of hydrogen bonds per water molecule as a function of temperature [K] for the six hydrogen bond definitions: $r_{OH}$ (orange), $r_{OO}\Theta$ (green), Sk (red), $D\Theta$ (cyan), DA (blue) and TP (purple).

Kinetics was analyzed in terms of hydrogen bond lifetime distributions. The lifetime was calculated as follows. For each definition, pairwise hydrogen bonds among all water molecules were calculated for every frame. For each water pair that formed a bond, the time span for which that particular bond lasted is called its lifetime. The distribution was then calculated by building a histogram of all the lifetimes collected in the molecular trajectory. The average lifetime is denoted by the symbol $\tau$.

Distributions for the six hydrogen bond definitions at 300 K are shown in Fig. 4.9A. Fastest decays (i.e.
shorter lifetimes) were observed for $r_{OO}\Theta$ (green) and $D\Theta$ (cyan), strongly suggesting that fluctuations along the $\Theta$ angle are mainly responsible for the faster kinetics. On the other hand, the longest lifetimes were found with the TP definition. At very short times (<200 fs) purely topological approaches provided the best results (inset of Fig. 4.9A). In fact, both DA (blue) and TP (purple) showed a smooth decay, in contrast to all the other definitions, which showed a debatable oscillating behavior [156]. This observation strongly suggests that those fluctuations are an artifact of the use of cutoffs. For the average lifetime $\tau$, Arrhenius behavior was observed in the range from 260 K to 400 K (Fig. 4.9B). [...] ($P_3$) and two or fewer ($P_{210}$) bonds with a generic first solvation shell. In the calculation of $P_4^*$ the propensity of $P_4$ was subtracted. A more comprehensive study of these four propensities, including temperature and water model dependence, is presented in Chapter V of this thesis.

Figure 4.8: Average number of bonded partners ($N_{hb}$) for the six hydrogen bond definitions at 300 K.

Figure 4.9: Hydrogen bond kinetics for the six different definitions; the color code is the same as in Fig. 4.7. (A) The lifetime distribution at T = 300 K. (B) The average hydrogen bond lifetime $\tau$ [ps] versus 1000/T [K$^{-1}$] (error bars are smaller than the symbol size). Arrhenius behavior is observed in the range of temperatures from 260 to 400 K.

Figure 4.10: Graphical representation of a fully coordinated molecule, $P_4$.

In Fig. 4.11 hydrogen bond propensities including the second solvation shell are presented.
The behavior of these propensities strongly depends on the hydrogen bond definition taken into account. Consistency was found within two groups. The first one includes rOH, rOOΘ and DΘ, and the second one TP and DA. Sk did not match either group very well. The value of P4, i.e., the probability of having a four-coordinated water molecule with fully coordinated first and second shells (Fig. 4.5B), was equal to 0.34 and 0.58 at 220 K for Sk and TP, respectively (red data). As the temperature was increased this difference became even more pronounced. A similar disagreement was also observed for the other three propensities.

An interesting case is given by P4*. This quantity reports on four-coordinated water molecules with an arbitrarily disordered second solvation shell. For all definitions this quantity presented a peak. However, TP and DA were an exception, with a maximum that is much shallower and located at a higher temperature than for the other approaches. This leads to an overestimation of four-coordinated water molecules, which are predicted to be the most abundant configuration at temperatures as high as 400 K. This result is counterintuitive, as water molecules with three or fewer hydrogen bonds would be expected to represent a larger fraction of the sample at such a high temperature.

Figure 4.11: Hydrogen bond propensities including the second solvation shell for temperatures between 220 K and 400 K. P4, P4*, P3 and P210 are shown in red, blue, light blue and very light blue, respectively. [Panels: rOH, rOOΘ, Sk, DΘ, DA, TP. Axes: population versus temperature in K.]

Substantial discrepancies among definitions were also found in the case of P210 (water molecules with two bonds or less).
For the case of TP this probability was essentially zero at all temperatures, while it grew with temperature in all the other cases.

So far, hydrogen bonds were described on the basis of propensities and kinetics. Now we investigate the robustness of the geometrical definitions with respect to cutoff choice. The aim of the following analysis is to understand the influence of temperature and water model on the distributions relevant to cutoff choice. In fact, default cutoff values were originally proposed from experiments and calculations at specific temperatures and for specific water models. Although in most cases prescriptions were given to properly choose the cutoffs, default values were often applied in conditions far away from those of the original works. To check this, the temperature dependence was investigated with conventional radial distribution functions (RDF). Here and in the following text we define the RDF as:

\[ g(r) = \frac{n(r)}{4\pi r^{2} \rho \, dr}, \tag{4.4} \]

where n(r) is the average number of particles found in a shell of thickness dr at distance r from a reference particle and ρ is the density of the system, i.e. the number of particles over volume.

For rOH, the distribution that matters is the oxygen-hydrogen radial distribution function (g(r), left column in Fig. 4.12). The plot shows that the first minimum becomes less pronounced with temperature while its position gets closer to the origin (from 2.424 to 2.419 Å, left middle row of Fig. 4.12). When the cutoff was chosen according to the position of the minimum, the average number of hydrogen bonds per molecule was significantly affected despite the small change in the cutoff value. In the bottom left panel of Fig. 4.12 the difference between a "standard" cutoff approach (empty circles) and a temperature-dependent cutoff (filled circles) is shown. Similar results were obtained for the other two geometrical definitions.
For these cases the value of the cutoff was chosen according to the position of the first minimum of the oxygen-oxygen radial distribution function and of the distribution of the occupancy N for rOOΘ and Sk, respectively (second and third columns of Fig. 4.12). Since the distribution of the angle Θ lacks bimodal behavior, we intentionally did not study the dependence on the Θ cutoff.

Figure 4.12: Temperature dependence for cutoff choice. Data relative to the rOH, rOOΘ and Sk definitions are shown in the first, second and third column, respectively. (Top) The oxygen-hydrogen and oxygen-oxygen radial distribution functions and the occupancy distribution are displayed from left to right for 240 K, 280 K and 320 K. (Middle) Cutoff dependence as a function of temperature. (Bottom) Average number of hydrogen bonds with fixed and variable cutoffs, shown as empty and filled circles, respectively. [Axes: (top) g(r) versus r in Å, and probability versus occupancy N; (middle) distance in Å or occupancy N versus temperature in K; (bottom) number of hydrogen bonds versus temperature in K.]

Interestingly, radial distribution functions depend not only on temperature but also on the water model under study. This suggests a further dependence on cutoff choice. To verify this idea, we ran MD simulations of six of the most commonly used water models. In Fig. 4.13 and Fig. 4.14 results for the Sk definition are presented. The data report the position of the first minimum in the occupancy N distribution as a function of temperature for different water models.
According to the original prescription [148], the hydrogen bond cutoff should be taken as the position of this minimum. The plot shows that this value strongly depends on both water model and temperature. Similar conclusions can be drawn for the case of rOH and rOO.

Figure 4.13: The position of the first minimum of the occupancy distribution relative to the Sk definition for different water models (TIP3P, SPC, TIP4P, SPC/E, TIP4P-Ew, TIP4P/2005). The red line refers to the SPC/E model, which was used for the rest of the analysis presented in this section. [Axes: N min versus temperature in K.]

To study the Sk definition more precisely, we built a contour map for the SPC/E and TIP3P water models in a similar way as in the original study [148] (see Fig. 4.14). While for SPC/E the default value of the cutoff overlaps with the obtained minimum, for the TIP3P water model these two values are different. However, it is also clear that even using the minimum of the occupancy does not perfectly separate bound and unbound states.

Overall, the analysis presented here brought to light a number of limitations in current approaches, highlighting a general lack of consensus among them. Somewhat surprisingly, two of the most recent definitions, Sk and TP, were the ones that agreed the least with each other. This certainly motivates the exploration of alternative routes, like the use of multi-body definitions going beyond the classical pairwise models [157,158].

Figure 4.14: Contour map with the parameters used for the Sk definition for the SPC/E and TIP3P water models at 300 K. In the left panel (SPC/E) the default value of the occupancy cutoff overlaps with the minimum we found (yellow line). With the TIP3P water model the default value of the occupancy (black line) differs from the minimum we found (dotted line). [Axes: Ψ versus r in Å.]
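The prescription of placing a cutoff at the first minimum of a distribution (the occupancy N for Sk, or g(r) for the distance-based definitions) can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure used to produce the figures: the function name `first_minimum` and the synthetic test curve are hypothetical, and the search assumes a smooth, already binned curve. A noisy histogram would probably need smoothing before such a search is reliable.

```python
import numpy as np

def first_minimum(x, y):
    """Return the x-position of the first local minimum of y after its first peak.

    Hypothetical helper: x and y are the bin centers and values of an
    already computed distribution (for example g(r) or an occupancy histogram).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Locate the first interior local maximum (the first peak).
    peak = None
    for i in range(1, len(y) - 1):
        if y[i] > y[i - 1] and y[i] >= y[i + 1]:
            peak = i
            break
    if peak is None:
        raise ValueError("no local maximum found")

    # Walk to the right of the peak until the curve starts to rise again.
    for i in range(peak + 1, len(y) - 1):
        if y[i] <= y[i - 1] and y[i] < y[i + 1]:
            return x[i]
    raise ValueError("no local minimum found after the first peak")

# Synthetic curve with a peak at x = 1 and its first minimum at x = 2.
x = np.linspace(0.0, 4.0, 401)
y = 1.0 + np.cos(np.pi * (x - 1.0))
cutoff = first_minimum(x, y)
```

Applied to a distribution recomputed at each temperature and for each water model, the returned position would play the role of the temperature- and model-dependent cutoff discussed in this section.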
Chapter 5
Applications

In the previous chapters we described some of the water anomalies as well as the problem of the hydrogen bond definition in molecular dynamics. We also described the complex network framework for the analysis of molecular dynamics trajectories. In this chapter the results of the molecular dynamics simulations of water are presented. In particular, we compare seven widely used classical rigid-body water models in terms of their local structure over a wide range of temperatures. We also present a similar analysis of the polarizable SWM4-NDP water model. The complex network approach is then used to build the free energy landscape of water. Finally, we show the analysis of proton transfer events in bulk water with a simplified network approach.

5.1 Study of classical water models at ambient pressure

The simplest water models in molecular dynamics consist of only three atoms connected by unbreakable covalent bonds. Their potential is composed of only two terms: a Lennard-Jones potential for the van der Waals attraction and repulsion, and a simple electrostatic potential [22]. Surprisingly, these relatively simple models with fixed charges and geometry are able to reproduce the phase diagram as well as many of the anomalies of water with good accuracy [159, 160]. For example, all popular classical water models present a density maximum [55,161]. However, only those that explicitly included this information in the fitting of the potential are able to correctly reproduce the experimental value, located at around 277 K at ambient pressure [162].

Due to their speed, TIP3P [54] and SPC [52] were traditionally used for biomolecular simulations in explicit water. Nowadays, more elaborate models can be easily used and their impact on the calculation assessed [163].
Optimized four-site models reproducing the experimental temperature of maximum density seem to improve the accuracy of biomolecular simulations. For example, Best and collaborators showed that predicted helical propensities are in better agreement with experiments when the TIP4P/2005 water model is chosen in place of the traditional TIP3P [164]. Others reported that TIP4P-Ew provides better free-energy estimates compared to conventional water models [165]. In both studies, the improved behavior was not connected to a clear microscopic property of the water model. In this respect, one limitation is the lack of a common framework to compare the structural behavior of liquid water at the atomic level.

Here, the seven most popular classical water models, namely SPC [52], SPC/E [53], TIP3P [54], TIP4P [74], TIP4P-Ew [51], TIP4P/2005 [55] and TIP5P [24], were investigated in terms of their local structure forming capabilities, that is, their ability to form structured or partially structured environments of the size of up to two solvation shells through hydrogen bonds.

The simulations of the classical water models were performed in the temperature range from 210 K to 350 K in steps of 10 K. TIP5P data was collected from 230 K, just before approaching the glass transition [166].

The location of the maximum density was obtained from 1 ns long simulations after 10 ns of equilibration. The temperature of maximum density was extracted by polynomial fitting around the maximum. Variations from the literature (see Table 5.1) may be due to size effects and a different treatment of the electrostatics. The location of the TIP3P density maximum was obtained by running further simulations at lower temperatures.

Table 5.1: Temperature of maximum density calculated from our simulations (TMD), as found in the literature (TMDref) and the structural temperature shift (∆Ts) for the seven water models investigated in this section.

Water model   TMD [K]   TMDref [K]       ∆Ts [K]   T [K]
TIP3P         199       182 [161]        65        229
SPC           226       228 [161]        42        247
SPC/E         250       241 [167]        18        275
TIP4P         256       248 [54]         20        268
TIP4P/2005    280       278 [55]         0         287
TIP4P-Ew      273       274 [51]         6         281
TIP5P         282       285 [161,168]    n.a.      269

The free energy of a configuration i is given by

\[ \Delta F_i = -k_B T \log(P_i), \tag{5.1} \]

where k_B is the Boltzmann constant, T the temperature and P_i the population of the selected configuration. The enthalpy is estimated by summing up all pairwise contributions to the enthalpy between the water molecules belonging to the same configuration (i.e. the sum of the Lennard-Jones and electrostatic interactions). The tetrahedral order parameter [71] for a water molecule i was calculated in the same way as for the microsecond simulation described in Chapter III of this work:

\[ q_i = 1 - \frac{3}{8} \sum_{j=1}^{3} \sum_{k=j+1}^{4} \left( \cos\psi_{jik} + \frac{1}{3} \right)^{2}. \tag{5.2} \]

Here, we focus on the water structural propensities which were used to analyze hydrogen bond definitions in Chapter IV. Water structure forming capabilities were investigated by analyzing the hydrogen-bond network of each water molecule in the simulation box together with its first and second solvation shells. A maximum of four hydrogen bonds per molecule was considered. A bond is formed when the distance between oxygens and the angle O-H-O are smaller than 3.5 Å and 30 degrees, respectively [150]. Water structures were grouped into four archetypal configurations of population Pi(*): the fully coordinated first and second solvation shells for a total of 16 hydrogen bonds (P4, see Fig. 5.1 for a schematic representation); the fully coordinated first shell, in which one or more hydrogen bonds between the first and the second shells are missing or loops are formed (P4*); the three-coordinated water molecule (P3) and the rest (P210).
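As an illustration of Eq. (5.2), the tetrahedral order parameter of one molecule can be evaluated from the positions of its four nearest oxygen neighbors. The sketch below is an assumption-laden example rather than the analysis code of this work: the function name `tetrahedral_order` is hypothetical, ψ_jik is taken to be the angle at the central oxygen i between the directions to neighbors j and k, and the four neighbors are assumed to have been selected beforehand with periodic images already unwrapped.

```python
import numpy as np

def tetrahedral_order(center, neighbors):
    """q = 1 - 3/8 * sum_{j<k} (cos(psi_jk) + 1/3)^2 for four neighbors.

    Hypothetical helper: `center` is the oxygen position of molecule i and
    `neighbors` holds the positions of its four nearest oxygen neighbors
    (assumed already corrected for periodic boundary conditions).
    """
    center = np.asarray(center, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    if neighbors.shape != (4, 3):
        raise ValueError("expected exactly four neighbor positions")

    # Unit vectors pointing from the central oxygen to each neighbor.
    vecs = neighbors - center
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # Sum over the six distinct neighbor pairs (j < k).
    total = 0.0
    for j in range(3):
        for k in range(j + 1, 4):
            cos_psi = np.dot(vecs[j], vecs[k])
            total += (cos_psi + 1.0 / 3.0) ** 2
    return 1.0 - 0.375 * total

# Perfect tetrahedron around the origin: every pair has cos(psi) = -1/3.
tetrahedron = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
q_ideal = tetrahedral_order((0, 0, 0), tetrahedron)
```

For a perfect tetrahedral arrangement every term of the double sum vanishes, so q equals one; distorted environments give smaller values.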
Within this representation the sum over the four populations is always equal to one at each temperature.

In Fig. 5.2, the temperature dependence of the four microscopic water structures is shown. Among the different water models, the qualitative behavior is strikingly similar. Three main types of temperature scaling were observed: increasing population with decreasing temperature (enthalpically stabilized); increasing population with increasing temperature (entropically stabilized); and a population with a maximum, where a turnover between enthalpic and entropic stabilization takes place at a model-dependent temperature. All four water configurations fall into one of these three main classes. The population of the fully ordered structure, P4, increases with decreasing temperature (Fig. 5.2, red empty circles). Consequently, this configuration is enthalpically stabilized. This is not the case when defects in the hydrogen bond structure are introduced (P4*, filled red circles). For this configuration the population increases with decreasing temperature until it reaches a maximum, which coincides with a rapid increase of the population of the fully coordinated configuration. The maximum is located in a temperature range close to the temperature of maximum density of the model under consideration (dashed vertical line).

Figure 5.1: Schematic representation of the four possible configurations of water solvation shells (P4, P4*, P3, P2 populations, see text). Dashed lines represent hydrogen bonds. For clarity, all water molecules are labeled with numbers.

Finally, both P3 and P210 are mainly entropically stabilized, showing larger populations at higher temperatures.
Taken together, these results indicated that specific water configurations dominate in each temperature range: full coordination extending to at least two solvation shells at low temperatures, four-coordinated configurations with no spatial extension at intermediate temperatures and mainly disordered ones at higher temperatures.

Despite these similarities, an important difference among the models is the temperature range at which the respective configurations become dominant. For example, the maximum population of P4* for the SPC model was observed around 245 K. This is not the case for TIP4P/2005, where the maximum is located at a temperature 40 K higher. The same behavior was observed when comparing the temperatures at which P4 and P3 are equal (e.g., around 270 K for TIP4P/2005).

Figure 5.2: Temperature dependence of water structure populations for seven classical water models. P4, P4*, P3 and P210 are shown as red empty, filled red, blue and cyan circles, respectively (see text for details). The gray stretch highlights the temperature difference between the calculated position of the temperature of maximum density (vertical dashed line, see also Table 5.1) and the experimental value at 277 K (solid line). The bottom right monochrome plot shows the superposition of all models after temperature shifting each data set (TIP4P/2005 data was used as reference). For each temperature, the sum over the four groups is equal to one. [Panels: SPC/E, SPC, TIP3P, TIP4P/2005, TIP4P-Ew, TIP4P, TIP5P and the superposition. Axes: population versus temperature.]

These observations suggested that a temperature shift