
Histograms and free energies ChE210D

Today's lecture: basic, general methods for computing entropies and free energies from histograms taken in molecular simulation, with applications to phase equilibria.

Overview of free energies

Free energies drive many important processes and are one of the most challenging kinds of quantities to compute in simulation. Free energies pertain to systems at constant temperature, and ultimately are tied to summations involving partition functions. There are many kinds of free energies that we might compute.

Macroscopic free energies

We may be concerned with the Helmholtz free energy or the Gibbs free energy. We might compute changes in these as a function of their natural variables. For single-component systems:

$$A(T, V, N)$$

$$G(T, P, N)$$

For multicomponent systems,

$$A(T, V, N_1, \dots, N_M)$$

$$G(T, P, N_1, \dots, N_M)$$

Typically we are only interested in the dependence of these free energies along a single parameter, e.g.,

$A(V)$, $G(P)$, $G(T)$, etc., for constant values of the other independent variables.

Free energies for changes in the potential

It is also possible to define a free energy change associated with a change in the interaction potential. Initially the energy function is $U_0(\mathbf{r}^N)$ and we perturb it to $U_1(\mathbf{r}^N)$. If this change happens in the canonical ensemble, we are interested in the free energy associated with this perturbation:

$$\Delta A = A_1(T, V, N) - A_0(T, V, N)$$


$$= -k_B T \ln \left( \frac{\int e^{-\beta U_1(\mathbf{r}^N)}\, d\mathbf{r}^N}{\int e^{-\beta U_0(\mathbf{r}^N)}\, d\mathbf{r}^N} \right)$$

What kinds of states 1 and 0 might we use to evaluate this expression? Here are a few sample applications:

• electrostatic free energy – charging of an atom or atoms in a molecule, in which state 0 has zero partial charges and state 1 has finite values

• dipolar free energy – adding a point dipole to an atom between states 0 and 1

• solvation free energy – one can “turn on” interactions between a solvent and a solute as a way to determine the free energy of solvation

• free energy associated with a field – states 0 and 1 correspond to the absence and presence, respectively, of a field, such as an electrostatic field

• restraint free energy – turning on some kind of restraint, such as confining a molecule to have a particular conformation or location in space. Such restraints would correspond to energetic penalties for deviations from the restrained space in state 1.

• free energies of alchemical transforms – we convert one kind of molecule (e.g., CH4) to another kind (e.g., CF4). This gives the relative free energies of these two kinds of molecules in the system of interest (e.g., solvation free energies in solution).

Potentials of mean force (PMFs)

Oftentimes we would like to compute the free energy along some order parameter or reaction coordinate of interest. These are broadly termed potentials of mean force, for reasons we will see shortly. This perspective enables us to understand free-energetic driving forces in many processes. For the purposes of this discussion, we will notate a PMF by $F(\xi)$, where $\xi$ is the reaction coordinate of interest. This coordinate might be, for example:

• an intra- or intermolecular distance (or combination of distances)

• a bond or torsion angle

• a structural order parameter (e.g., degree of crystallinity, number of hydrogen bonds)

Consider the example of a protein in aqueous solution interacting with a surface. The reaction coordinate might be the distance between the center of mass of the protein and the surface:


The PMF along $z$, $F(z)$, would give the free energy of the system as a function of the protein-surface distance. It might look something like:

[Figure: sketch of $F(z)$ versus $z$, with a minimum at the preferred binding distance and a barrier separating bound and unbound states.]

This curve would show us:

• the preferred distance at which the protein binds to the surface, from the value of 푧 at the free energy minimum

• the free energy change upon binding, from the difference in free energy between the minimum and large values of 푧

• the barrier in free energy for binding and unbinding, from the height of the hump

Importantly, the free energy function does not just include the direct potential energy interactions between atoms in the molecule and atoms in the surface. It also includes the effects of all of the interactions involving the solvent molecules. This may be crucial to the behavior of the system.


For example, the direct pairwise interactions of an alkane with a silica surface will be the same regardless of whether the solvent is water or octanol. However, the net interaction of the alkane and surface will be very different in the two cases due to solvent energies and entropies, and this effect is exactly determined by the PMF.

Definition

Formally, a potential of mean force (the free energy) along some reaction coordinate $\xi$ is given by a partial integration of the partition function. In the canonical ensemble, we begin with the configurational part of the Helmholtz free energy,

$$F(\xi) = A_c(T, V, N, \xi) = -k_B T \ln Z(T, V, N, \xi) = -k_B T \ln \int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N$$

Here, $\hat{\xi}(\mathbf{r}^N)$ is a function that returns the value of the order parameter for a particular configuration $\mathbf{r}^N$. The integral in this expression entails a delta function that filters for only those Boltzmann factors for configurations with the specified $\xi$.

One can think of the PMF as the free energy when the system is constrained to a given value of $\xi$. Notice that we have the identity

$$\int e^{-\beta F(\xi)}\, d\xi = e^{-\beta A_c}$$

The potential of mean force is so named because the negative of its derivative gives the average force along the direction of $\xi$ at equilibrium. We proceed to find the derivative of the PMF:

$$\frac{dF(\xi)}{d\xi} = -k_B T \frac{d}{d\xi} \ln \int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N = -k_B T\, \frac{\frac{d}{d\xi} \int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}{\int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}$$

To make progress, we need the mathematical identity

$$\frac{d}{da} \int g(x)\, \delta(x - a)\, dx = \int \frac{dg(x)}{dx}\, \delta(x - a)\, dx$$

This allows us to pull the derivative inside the integral:

$$\frac{dF(\xi)}{d\xi} = -k_B T\, \frac{\int \left( \frac{d}{d\xi} e^{-\beta U(\mathbf{r}^N)} \right) \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}{\int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}$$


$$= -k_B T\, \frac{\int \left( -\beta \frac{dU}{d\xi} e^{-\beta U(\mathbf{r}^N)} \right) \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}{\int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}$$

$$= -\frac{\int f_\xi\, e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}{\int e^{-\beta U(\mathbf{r}^N)}\, \delta[\xi - \hat{\xi}(\mathbf{r}^N)]\, d\mathbf{r}^N}$$

Here, the term $f_\xi$ gives the force along the direction of $\xi$,

$$f_\xi = -\frac{dU}{d\xi} = -\frac{d\mathbf{r}^N}{d\hat{\xi}} \cdot \nabla U = \frac{d\mathbf{r}^N}{d\hat{\xi}} \cdot \mathbf{f}^N$$

The remainder of the terms in the PMF equation serve to average the force for a specified value of $\xi$. Thus,

$$\frac{dF(\xi)}{d\xi} = -\langle f_\xi(\xi) \rangle$$

Paths

Keep in mind that free energies are state functions. That is, if we are to compute a change in any free energy between two conditions, we are free to pick an arbitrary path of interest between them. This ultimately lends flexibility to the kinds of simulation approaches that we can take to compute free energies.

Overview of histograms in simulation

Until now, we have focused mainly on computing property averages in simulation. Histograms, on the other hand, are concerned with computing property distributions. These distributions can be used to compute averages, but they contain much more information. Importantly, they relate to the fluctuations in the ensemble of interest, and ultimately can be tied to statistical-mechanical partition functions. It is through this connection that histograms enable us to compute free energies and entropies.

Definitions and measurement in simulation

For the purposes of illustration, we will consider a histogram in potential energy. In our simulation, we might measure the distribution of the variable $U$ using a long simulation run and many observations of the instantaneous value of $U$.


In classical systems, the potential energy is a continuously-varying variable. Therefore, the underlying $\wp(U)$ is a continuous probability distribution. However, in the computer we must measure a discretized version of this distribution.

• We specify a minimum and maximum value of the energy that defines a range of energies in which we are interested. Let these be $U_{\min}$ and $U_{\max}$.

• We define a set of 푚 bins into which the energy range is discretized. Each bin has a bin width of

$$\delta U = \frac{U_{\max} - U_{\min}}{m}$$

• Let the variable 푘 be the bin index. It varies from 0 to 푚 − 1. The average energy of bin 푘 is then given by

$$U_k = U_{\min} + \left(k + \frac{1}{2}\right) \delta U$$

• We create a histogram along the energy bins. This is simply an array in the computer that measures counts of observations:

$$c_k = \text{counts of } U \text{ observations in the range } [U_k - \delta U/2,\; U_k + \delta U/2)$$

For the sake of simplicity, we will often write the histogram array using the energy, rather than the bin index,

$$c(U) = c_k \quad \text{where } k = \mathrm{int}\!\left( \frac{U - U_{\min}}{\delta U} \right)$$

Here the int function returns the integer part of its argument. For example, int(2.6) = 2.

To determine a histogram in simulation, we perform a very large number of observations 푛 from a long, equilibrated molecular simulation. At each observation, we update:

$$c(U) \leftarrow c(U) + 1$$

This update is only performed if 푈min ≤ 푈 < 푈max. Otherwise, the energy would be outside of the finite range of energies of interest. However, we still need to keep count of all energies, whether or not inside the range, in order to properly normalize our histogram.

We can normalize the histogram to determine a discretized approximation to the true underlying distribution ℘(푈) in the energy range of interest:


$$\tilde{\wp}(U)\, \delta U = \frac{c(U)}{n}$$

where $n$ is the total number of observations, including those energies outside of the defined range. On the LHS we include the bin width so as to approximate the continuous probability differential $\wp(U)\, dU$. Thus,

$$\tilde{\wp}(U) = \frac{c(U)}{n\, \delta U}$$

In the limit of an infinite number of observations from an infinitely long, equilibrated simulation, this approximation converges to the true one in the following manner:

$$\tilde{\wp}(U_k)\, \delta U = \int_{U_k - \delta U/2}^{U_k + \delta U/2} \wp(U)\, dU$$

This equation simply says that we sum up all of the underlying probabilities for the continuous energies within a bin to obtain the observed, computed probabilities. As the bin width goes to zero, we have

$$\lim_{\delta U \to 0,\, n \to \infty} \tilde{\wp}(U) = \wp(U)$$

Notice that there are two components to this limit:

• We need an infinite number of observations.

• We need an infinite number of bins.

These two limits "compete" with each other: as we increase the number of bins, we need more observations so that we have enough counts in each bin to have good statistical accuracy. Practically speaking, we must choose a finite bin width that enables us to balance the length of the run with statistical accuracy in each bin. Typically, for the energy example above,

• the bin width is chosen to be of the order of a basic energy scale in the force field. For a Lennard-Jones system, this might be 휖.

• the simulation is performed long enough to achieve on the order of ~1000 or more average counts per bin, that is, $n \geq m \times 1000$.
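
To make this bookkeeping concrete, here is a minimal Python sketch of the accumulation and normalization steps. The array U_samples, the range, and the bin count are all hypothetical stand-ins for actual simulation output:

```python
import numpy as np

# Hypothetical energy trace standing in for simulation output.
rng = np.random.default_rng(0)
U_samples = rng.normal(loc=-500.0, scale=15.0, size=200_000)

U_min, U_max, m = -560.0, -440.0, 120    # illustrative range and bin count
dU = (U_max - U_min) / m                 # bin width
c = np.zeros(m, dtype=int)               # histogram counts c_k

n = len(U_samples)                       # count ALL observations for normalization
for U in U_samples:
    if U_min <= U < U_max:               # only tally energies inside the range
        k = int((U - U_min) / dU)
        c[k] += 1

U_k = U_min + (np.arange(m) + 0.5) * dU  # bin-center energies
p_tilde = c / (n * dU)                   # discretized estimate of p(U)
```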

Statistical considerations

Keep in mind that the computation of a histogram is subject to the same statistical considerations as simple simulation averages. That is, the histogram needs to be accumulated over many correlation times of the energy $U$ in order to reach good statistical accuracy. It can be shown that the expected squared error in the histogram bin $c(U)$ goes as

$$\sigma^2_{c(U)} \propto c(U)$$

This implies that the error goes as the square root of the number of counts. Similarly, the expected squared error in the corresponding estimate $\tilde{\wp}(U)$ goes as:

$$\sigma^2_{\tilde{\wp}(U)} \propto \frac{\wp(U)}{n}$$

The relative error in $\tilde{\wp}(U)$ is given by:

$$\frac{\sigma_{\tilde{\wp}(U)}}{\tilde{\wp}(U)} \propto \frac{1}{\sqrt{n\, \wp(U)}}$$

Notice that the relative error is higher for lower values of $\wp(U)$, i.e., at the tails of the distribution.

Multidimensional histograms

In this example, we considered only a histogram of potential energy. However, it is possible to construct histograms of many kinds of simulation observables. In other ensembles, we might compute distributions of other fluctuating macroscopic quantities like $N$ in GCMC simulations and $V$ in $NPT$ simulations. We can also view histograms of arbitrary parameters of interest, such as the end-to-end distance of a polymer chain or the number of hydrogen bonds a water molecule makes.

We can also compute joint distributions using multidimensional histogram arrays. For example,

$$\tilde{\wp}(U, V)\, \delta U\, \delta V = \frac{c(U, V)}{n}$$

Note that any continuous variables will require discretization and specification of a bin width. Discrete variables, on the other hand, do not require such a definition because the underlying distribution itself is discrete:

$$\tilde{\wp}(N) = \frac{c(N)}{n}$$

Many kinds of distributions, however, require the specification of a minimum and maximum observable value.


Connection to statistical mechanics

The true power of histograms is that they allow us to measure fluctuations in the simulation that can be used to extract underlying partition functions. That is, we measure $\wp(U)$ in simulation and then we post-process this discretized function to make connections to free energies and entropies.

Consider the energy distribution $\wp(U)$. Its form in the canonical ensemble is given by the expression:

$$\wp(U) = \frac{\Omega_c(U, V, N)\, e^{-\beta U}}{Z(T, V, N)}$$

where $Z(T, V, N)$ is the configurational partition function,

$$Z(T, V, N) = \int \Omega_c(U, V, N)\, e^{-\beta U}\, dU$$

Here, Ω푐 is the configurational density of states. It is the part of the microcanonical partition function that corresponds to the potential energy and configurational coordinates. It is defined by the equation

$$\Omega_c(U, V, N) = \int \delta[U(\mathbf{r}^N) - U]\, d\mathbf{r}^N$$

The reason we are concerned only with the configurational energy distribution is that the kinetic component can always be treated analytically, within the context of equilibrium simulations.

We can rewrite both using the dimensionless configurational entropy:

$$S_c(U, V, N) \equiv \ln \Omega_c(U, V, N)$$

$$\wp(U) = \frac{e^{S_c(U,V,N) - \beta U}}{Z(T, V, N)}, \qquad Z(T, V, N) = \int e^{S_c(U,V,N) - \beta U}\, dU$$

Basic histogram analysis and reweighting

For the remainder of this section we will drop the subscript "c" from the configurational properties, for simplicity.

The above formalism shows that if we were able to compute the function 푆(푈, 푉, 푁), we would be able to predict the complete energy distribution ℘(푈) at any temperature. In fact, things are a bit simpler than this: we only need the energy-dependence of 푆, since the volume and number of particles stay fixed.


For example, we can compute the average potential energy at any temperature using the expression

$$\langle U \rangle(T) = \int U\, \wp(U; T)\, dU = \frac{\int U\, e^{S(U) - \beta U}\, dU}{\int e^{S(U) - \beta U}\, dU}$$

Here, we use the notation ℘(푈; 푇) to signify a distribution at a given specified temperature 푇.

Extracting the entropy and density of states from a histogram

We might extract an estimate for $S$ by measuring $\wp(U; T)$ from a histogram in a canonical simulation at specified temperature $T$. Inverting the above relationship,

$$S(U) = \ln \left[ \frac{Z(T)\, \wp(U; T)}{e^{-\beta U}} \right] = \ln \wp(U; T) + \beta U + \ln Z(T) = \ln \wp(U; T) + \beta U - \beta A(T)$$

Here, $A(T)$ technically denotes the configurational part of the Helmholtz free energy, which is $k_B T \ln[N!\, \Lambda(T)^{3N}]$ less than the total free energy in a single-component system.

We can measure ℘(푈; 푇) using a histogram along a set of discrete energies. Post-simulation, we can then take this measured distribution and use it to compute a discrete approximation to the dimensionless entropy function at the same energies,

$$S(U_k) = \ln \wp(U_k; T) + \beta U_k - \beta A(T)$$

Notice that the partition function part of this expression is a temperature-dependent constant that is independent of potential energy $U$. Thus, we can compute the entropy function $S$ to within an additive constant using this approach. This also says that we can compute the configurational density of states to within a multiplicative constant. This basic inversion of the probability distribution function underlies many histogram-based methods.

Keep in mind that $S(U)$ is fundamentally independent of temperature since it stems from the microcanonical partition function. That is, any temperature dependences on the RHS of this equation should exactly cancel to leave a $T$-independent function. Another way of putting this is that, in principle, it does not matter at which temperature we measure $\wp(U; T)$. At equilibrium, the $\wp(U; T)$ from every temperature should give the same $S(U)$ function.
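
As a sketch, this inversion is a one-line post-processing step on the histogram from the earlier code sketch (reusing its U_k, c, n, and dU); the unknown $-\beta A(T)$ is simply dropped, so the estimate carries an arbitrary additive constant. Reduced units with $k_B = 1$ are assumed:

```python
import numpy as np

# Estimate S(U) = ln p(U; T) + beta*U, up to the additive constant -beta*A(T).
# Assumes U_k, c, n, dU from the histogram sketch above; k_B = 1 units.
beta = 1.0                         # inverse temperature of the simulation
mask = c > 0                       # only bins with counts carry information
S_est = np.log(c[mask] / (n * dU)) + beta * U_k[mask]
```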


Free energy differences from histograms

Imagine that we measure $\wp(U; T_1)$ and $\wp(U; T_2)$ from two simulations at different temperatures. We can use the relationship of both with the density of states to compute a free energy difference. We construct

$$\ln \frac{\wp(U; T_2)}{\wp(U; T_1)} = [S(U) - \beta_2 U + \beta_2 A(T_2)] - [S(U) - \beta_1 U + \beta_1 A(T_1)] = -(\beta_2 - \beta_1) U + \beta_2 A(T_2) - \beta_1 A(T_1)$$

Rearranging,

$$\beta_2 A(T_2) - \beta_1 A(T_1) = \ln \frac{\wp(U; T_2)}{\wp(U; T_1)} + (\beta_2 - \beta_1) U$$

This expression provides us with a way to determine the configurational free energy difference between temperatures 1 and 2 using measured histograms of the energy distribution.

You may find it interesting that the RHS is dependent on $U$, whereas the LHS is not. In fact, the free energies should have no $U$-dependence. In principle, any value of $U$ could be plugged into the RHS and the same free energy would be returned. In practice, statistical errors in the measured $\wp(U)$ mean that the estimate for the free energy can have different accuracies depending on the value of $U$ chosen. More on this below.
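
One hedged way to implement this in practice is to average the RHS over all well-sampled overlapping bins rather than pick a single $U$; the sketch below assumes hypothetical normalized histogram estimates p1 and p2 on common bin centers U_k:

```python
import numpy as np

def free_energy_difference(U_k, p1, p2, beta1, beta2, min_p=1e-8):
    """Estimate beta2*A(T2) - beta1*A(T1) from two energy distributions."""
    overlap = (p1 > min_p) & (p2 > min_p)   # bins sampled well in both runs
    est = np.log(p2[overlap] / p1[overlap]) + (beta2 - beta1) * U_k[overlap]
    return est.mean()                       # each bin is an estimate; average
```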

Reweighting

Having computed a discrete approximation to the underlying energy distribution, we can also use it to predict $\wp(U)$ at temperatures other than the original simulation temperature. This basic procedure is called reweighting, since we use a distribution measured at one temperature to predict that at another. The scheme is as follows.

Imagine 푆(푈) is computed from ℘(푈; 푇1) in a canonical simulation at 푇1. We have

$$S(U) = \ln \wp(U; T_1) + \beta_1 U - \beta_1 A(T_1)$$

We want to predict ℘(푈; 푇2) at another temperature 푇2. We have,

$$\wp(U; T_2) = \frac{e^{S(U) - \beta_2 U}}{Z(T_2)}$$

Plugging in the above expression for 푆,

$$\wp(U; T_2) = \frac{Z(T_1)}{Z(T_2)}\, \wp(U; T_1)\, e^{-(\beta_2 - \beta_1) U}$$


However, the last term on the RHS involving the ratio of partition functions is independent of 푈. We can find it using the probability normalization condition,

$$\int \wp(U; T_2)\, dU = 1$$

Thus,

$$\wp(U; T_2) = \frac{\wp(U; T_1)\, e^{-(\beta_2 - \beta_1) U}}{\int \wp(U; T_1)\, e^{-(\beta_2 - \beta_1) U}\, dU}$$

This equation is an important one. It states that a distribution measured at 푇1 can be used to predict a distribution at a different 푇2. Thus, in principle, we would only need to perform a single simulation, measure ℘(푈) once, and use this expression to examine the energy distribution at any other temperature of interest. For example, we might compute the average energy as a function of temperature using the equation

$$\langle U \rangle(T_2) = \int U\, \wp(U; T_2)\, dU = \frac{\int U\, \wp(U; T_1)\, e^{-(\beta_2 - \beta_1) U}\, dU}{\int \wp(U; T_1)\, e^{-(\beta_2 - \beta_1) U}\, dU}$$

Similar expressions could be found for other moments of the potential energy, such as $\langle U^2 \rangle$. These could be used to compute the temperature dependence of the heat capacity, using the relationship $k_B T^2 C_V = \langle U^2 \rangle - \langle U \rangle^2$.
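
A minimal sketch of this single-histogram reweighting, assuming a normalized histogram p1 on bin centers U_k and reduced units with $k_B = 1$; the shift by the maximum exponent anticipates the precision issues discussed later in these notes:

```python
import numpy as np

def reweight_average(U_k, p1, beta1, beta2):
    """Predict <U>(T2) and C_V(T2) from a histogram measured at T1."""
    logp = np.full_like(p1, -np.inf)         # empty bins contribute zero weight
    nz = p1 > 0
    logp[nz] = np.log(p1[nz])
    x = logp - (beta2 - beta1) * U_k
    w = np.exp(x - x.max())                  # unnormalized reweighted distribution
    w /= w.sum()
    U_avg = np.sum(w * U_k)
    U2_avg = np.sum(w * U_k ** 2)
    Cv = beta2 ** 2 * (U2_avg - U_avg ** 2)  # from k_B T^2 C_V = <U^2> - <U>^2
    return U_avg, Cv
```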

Statistical analysis

Unfortunately, there are practical limits to the histogram reweighting procedure just described. The main problem is the measurement of $\wp(U)$ to good statistical accuracy in the tails of its distribution. Consider that a typical $\wp(U)$ is very sharply peaked:

[Figure: sketch of a sharply peaked distribution $\wp(U)$.]


The width of the distribution relative to the mean goes roughly as $N^{-1/2}$, so that, for macroscopic systems, the distribution is infinitely peaked.

The implication of this result is that it is very hard to measure the distribution at its tails, where we typically only have a few counts in each bin. If we reweight a measured distribution to a temperature where the tails become the high-probability region, the error can be magnified to be very large.

[Figure: distributions $\wp(U)$ at $T_1$ and at $T_2 > T_1$; the mean energy $U^*$ of the $T_2$ distribution lies well within the tail of the $T_1$ distribution.]

Here, the mean of the distribution at the new temperature is well within the tail region of the distribution at the original temperature. If we reweight $\wp(U; T_1)$ to $T_2$, errors in the tails will be magnified. The error in the new distribution can be written approximately as

$$\sigma^2_{\wp(U; T_2)} \approx \sigma^2_{\wp(U; T_1)} \left( \frac{\wp(U; T_2)}{\wp(U; T_1)} \right)^2$$

This formula is derived using error propagation rules and the reweighting expression.

Notice that if $T_2 = T_1$, the error is the same at each energy as from the original measurement. Otherwise, we must weight the error by a ratio involving the two probability distributions. In the above picture, the ratio at the mean energy

$$\frac{\wp(U^*; T_2)}{\wp(U^*; T_1)}$$

is very large due to the small probability of this energy at $T_1$. Therefore, the error is greatly magnified in the reweighting procedure according to the above equation.

In general,

• Distributions are subject to statistical inaccuracies at their tails, owing to the finite number of counts in each bin.


• If the important parts of the energy distribution at $T_2$ correspond to the tails of a measured distribution at $T_1$, a reweighting procedure will fail due to large statistical errors.

• If computed from a measured $\wp(U; T)$, the accuracy of an estimated $S(U)$ function is only good around the frequently-sampled energies, i.e., for energies where the histogram has many entries.

These limitations constrain the determination of free energies and entropies using measurements from single histograms. The solution here, as we will see next, is to incorporate multiple histogram estimates of these quantities.

Multiple histogram reweighting

Imagine that we were able to take histograms from multiple simulations $i$, each at a different temperature $T_i$. For each, we could get a different estimate of the underlying dimensionless configurational entropy function $S$:

$$S(U) = \ln \wp(U; T_i) + \beta_i U - \beta_i A(T_i)$$

We should get the same function every time. We don’t know the free energies, but they are independent of 푈 and thus can be treated as additive constants. We can therefore shift the different estimates for the entropy function up or down in value until they all match each other:

[Figure: histograms $\wp(U)$ measured at $T_1 < T_2 < T_3$, and the corresponding curves $\ln \wp(U) + \beta U + \text{const}$, which can be shifted vertically to collapse onto a single $S(U)$.]


This is the basic idea behind multiple-histogram methods. Notice that by finding the amounts to shift each curve to achieve overlap, we are essentially determining free energy differences between the adjacent state points.

The trick in these calculations is to determine the shift amounts. The best ways of doing this take into account the different errors for each energy in each measured ℘(푈). Below we describe methods for computing free energy differences so as to maximize the statistical quality of the predicted results.

For this approach to work, we must have good overlap between the histograms. That is, there must be ranges of the histograms that overlap with a reasonable number of counts in each, for statistical efficiency. Otherwise, it will be challenging to determine the shift amounts to good statistical accuracy.

Ferrenberg-Swendsen reweighting and WHAM

Ferrenberg and Swendsen in 1989 proposed an optimal way to stitch together different histograms by minimizing the statistical error in the computed density of states and entropy function. It was later generalized by [Kumar et al., 1992] and was named the Weighted Histogram Analysis Method (WHAM). Although the original derivation relied on certain forms for the error propagation, the same result can be derived using a more elegant maximum likelihood approach, which is what we discuss here.

The maximum likelihood approach is a statistical method for parameterizing probability models. It simply says the following: posit some form of the underlying probability distribution function for a random process. Then, given an observed set of events, the best parameters for that distribution are those that maximize the probability (likelihood) of the observed events.

In this context, imagine we are trying to determine $S(U)$ from $J$ simulations at different temperatures $T_j$. We know that the underlying $S(U)$ for each should be the same, but that the measured histograms $c_j(U)$ will be different because the simulations are performed at different temperatures. The observed events are the energies tabulated in the histograms.

Say that we make $n$ independent observations of the energy $U$ at each temperature. Here, independent implies that we are beyond twice the correlation time. In order to make a connection with the histogram, we will discretize all of energy space into discrete values $U_k$ separated by intervals $\delta U$. Before we start, let's define some notation:

• $n$ – number of observations at each temperature

• $i$ – index of observations ($i = 1, \dots, n$)

• $j$ – index of temperature ($j = 1, \dots, J$)

• $k$ – index of discrete energy values

• $U_k$ – discrete energy values

• $\Omega_k, S_k$ – density of states and entropy at the values $U_k$

• $U_{ij}$ – energy of the $i$th observation at temperature $j$

• $c_{jk}$ – count of histogram entries for temperature $j$ and energy bin $k$

Derivation

With these notations, we construct the total probability or likelihood $L$ of making the $nJ$ observations from the simulations in terms of a yet-unknown density of states function:

$$L = \prod_{i=1}^{n} \prod_{j=1}^{J} \wp(U_{ij}; T_j) = \prod_{i=1}^{n} \prod_{j=1}^{J} \frac{e^{S(U_{ij}) - \beta_j U_{ij}}}{Z_j}$$

Here, the 푍푗 are formally given by

$$Z_j = \int e^{S(U) - \beta_j U}\, dU$$

In terms of our discrete energies,

$$Z_j = \sum_k e^{S_k - \beta_j U_k}$$

Note that

$$Z_j = e^{-\beta_j A_j}$$

where the $A_j$ are the free energies at each temperature.

With all of these considerations, we can rewrite the above probability in terms of histograms:

$$L = \prod_k \prod_{j=1}^{J} \left( \frac{e^{S_k - \beta_j U_k}}{Z_j} \right)^{c_{jk}}$$


$$= \prod_k \prod_{j=1}^{J} \left( e^{S_k - \beta_j U_k + \beta_j A_j} \right)^{c_{jk}}$$

Taking the logarithm,

$$\ln L = \sum_k \sum_{j=1}^{J} c_{jk} [S_k - \beta_j U_k + \beta_j A_j] = \sum_k \sum_{j=1}^{J} c_{jk} [S_k - \beta_j U_k] + n \sum_{j=1}^{J} \beta_j A_j$$

According to the maximum likelihood approach, we want to maximize this probability with respect to any adjustable parameters in our model and the given observations. The former are the yet-unknown values $S_k$ while the latter are the histograms $c_{jk}$. Therefore, we take the derivative with respect to $S_k$ and set it equal to zero:

$$0 = \frac{\partial \ln L}{\partial S_k} \quad \text{for all } k$$

Evaluating,

$$0 = \sum_{j=1}^{J} \left( c_{jk} + n \beta_j \frac{\partial A_j}{\partial S_k} \right)$$

The latter derivative is

$$\beta_j \frac{\partial A_j}{\partial S_k} = -\frac{\partial \ln Z_j}{\partial S_k} = -\frac{e^{S_k - \beta_j U_k}}{Z_j}$$

Substituting back in:

$$0 = \sum_{j=1}^{J} \left( c_{jk} - n\, e^{S_k - \beta_j U_k + \beta_j A_j} \right) \quad \text{for all } k$$

We can solve for 푆푘:

$$e^{S_k} = \left[ \sum_{j=1}^{J} c_{jk} \right] \left[ n \sum_{j=1}^{J} e^{-\beta_j U_k + \beta_j A_j} \right]^{-1}$$


Notice that this equation now provides us with a recipe for computing the density of states (the LHS) based on histogram data. Let's rewrite it without the index $k$ for clarity:

$$e^{S(U)} = \frac{c_{\mathrm{tot}}(U)}{n} \left[ \sum_{j=1}^{J} e^{-\beta_j U + \beta_j A_j} \right]^{-1}$$

Here, we have made the definition that $c_{\mathrm{tot}}(U)$ gives the total number of histogram counts of energy $U$, at any temperature.

Even though this expression gives us a way to determine the entropy function from multiple simulation results, note that the RHS involves the free energy at every temperature. This free energy depends on the entropy itself:

$$A_j = -\beta_j^{-1} \ln \sum_U e^{S(U) - \beta_j U}$$

Iterative solution

How can we solve for $S(U)$? Ferrenberg and Swendsen suggested an iterative solution:

1. An initial guess is made for the 퐽 values of 퐴푗. One can simply choose 퐴푗 = 0 for all 푗.

2. The (discretized) entropy at every energy is computed using

$$S(U) = \ln c_{\mathrm{tot}}(U) - \ln n - \ln \sum_{j=1}^{J} e^{-\beta_j U + \beta_j A_j} \quad \text{for all } U$$

3. The free energies are recalculated using

$$-\beta_j A_j = \ln \sum_U e^{S(U) - \beta_j U} \quad \text{for all } j$$

4. Steps 2 and 3 are repeated until the free energies no longer change upon each iteration. In practice, one typically checks to see if the free energies change less than some fractional tolerance.

5. According to the above equations, the entropy function and the combinations $\beta_j A_j$ can be determined only to within an additive constant. Thus, with each iteration, one typically sets the free energy at one temperature equal to zero, $A_{j=1} = 0$.
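
The following is one possible compact implementation of this loop, assuming equal run lengths n at inverse temperatures betas, total counts c_tot on bin centers U_k, and $k_B = 1$; it is a sketch, not the only way to organize the iteration:

```python
import numpy as np
from scipy.special import logsumexp

def wham(U_k, c_tot, betas, n, tol=1e-10, max_iter=10000):
    """Iterate the Ferrenberg-Swendsen equations for S(U) and beta_j*A_j."""
    betaA = np.zeros(len(betas))                  # step 1: guess A_j = 0
    with np.errstate(divide="ignore"):            # empty bins give S = -inf
        ln_ctot = np.log(c_tot)
    for _ in range(max_iter):
        # step 2: S(U) = ln c_tot - ln n - ln sum_j exp(-beta_j U + beta_j A_j)
        S = ln_ctot - np.log(n) - logsumexp(
            -np.outer(betas, U_k) + betaA[:, None], axis=0)
        # step 3: -beta_j A_j = ln sum_U exp(S(U) - beta_j U)
        betaA_new = -logsumexp(S[None, :] - np.outer(betas, U_k), axis=1)
        betaA_new -= betaA_new[0]                 # step 5: fix A_1 = 0
        if np.max(np.abs(betaA_new - betaA)) < tol:   # step 4: convergence
            return S, betaA_new
        betaA = betaA_new
    return S, betaA
```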


This iterative procedure can be fairly computationally expensive since it involves sums over all temperatures and all discretized energies with each iteration. It is not uncommon to require on the order of 100-1000 iterations to achieve good convergence.

However, once convergence has been achieved, the energy distribution at any temperature—even temperatures not included in the original dataset—can be computed with the expression

$$\wp(U; T) \propto e^{S(U) - \beta U}$$

The constant of proportionality is determined by normalizing the probability distribution.

This general approach is termed multiple histogram reweighting, and it can be shown to be the statistically optimal method for reweighting data from multiple simulations. That is, it makes the maximum use of information and results in reweighted distributions for $\wp(U; T)$ that have the least statistical error. Importantly, it provides statistically optimal estimates of free energy differences $A_j - A_{j+1}$.

Configurational weights

Using the multiple histogram technique, it is possible to define a weight associated with each configuration $i$ at each temperature $j$ at the reweighting temperature $T$:

$$w_{ij} = \frac{e^{S(U_{ij}) - \beta U_{ij}}}{c_{\mathrm{tot}}(U_{ij})}$$

In practice, the logarithms of the weights are maintained in the computer rather than the weights themselves, so as to maintain precision.

Any property depending on the configurational coordinates can be determined by a sum over all of the 푛퐽 observations, using the weights:

$$\langle X \rangle = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij} X_{ij}}{\sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij}}$$

Here, 푋푖푗 is the value of the property 푋 for the 푖th configuration observed at temperature 푗. For example, one can compute the mean potential energy,

$$\langle U \rangle = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij} U_{ij}}{\sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij}}$$


However, keep in mind that 푋 can be any property of interest. The multiple histogram technique provides a way to determine its average at any arbitrary reweighting temperature 푇 and provides a statistically optimal estimator of the average.

This approach also enables one to determine the distribution ℘(푋) using the weights:

$$\wp(X) \propto \sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij}\, \delta_{X_{ij}, X}$$

That is, for a given value of $X$, the probability $\wp(X)$ includes all of the weights from configurations $ij$ that have $X_{ij} = X$. Again, the constant of proportionality stems from the probability normalization condition.
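
A sketch of this tally, assuming flattened per-observation arrays U_obs and X_obs covering all $i, j$, a converged entropy array S on the histogram bins, and the same binning parameters as before:

```python
import numpy as np

def reweighted_average(X_obs, U_obs, S, c_tot, U_min, dU, beta):
    """Statistically optimal <X> at reweighting inverse temperature beta."""
    k = ((U_obs - U_min) / dU).astype(int)          # bin index per observation
    log_w = S[k] - beta * U_obs - np.log(c_tot[k])  # keep ln w for precision
    w = np.exp(log_w - log_w.max())                 # common scale cancels below
    return np.sum(w * X_obs) / np.sum(w)
```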

Binless implementation

One possible drawback of this approach is that it requires us to specify a bin width $\delta U$ for computing histograms and tabulating energies. It is also possible to follow this derivation in the limit that $\delta U \to 0$. In this limit, the expression for the free energies is:

$$-\beta_j A_j = -\ln n + \ln \left[ \sum_{l=1}^{J} \sum_{i=1}^{n} e^{-\beta_j U_{il}} \left( \sum_{m=1}^{J} e^{-\beta_m U_{il} + \beta_m A_m} \right)^{-1} \right]$$

where $l$ and $m$ are also indices over temperature. As with the previous approach, this equation must be iterated to solve for the free energies $A_j$. Notice that it involves a quadruple loop: once over observations (the index $i$) and three times over temperatures (the indices $j$, $l$, and $m$). Therefore, convergence can be much slower than the histogram version.

Once the free energies are determined using this expression, the expression for the configuration weights is:

$$w_{ij} \propto \frac{e^{-\beta U_{ij}}}{\sum_{l=1}^{J} e^{\beta_l A_l - \beta_l U_{ij}}}$$
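
A sketch of the binless iteration follows; U[l, i] holds the energy of observation $i$ from the run at inverse temperature betas[l], $k_B = 1$ is assumed, and the update is essentially the equation above evaluated with log-sum-exp for stability:

```python
import numpy as np
from scipy.special import logsumexp

def binless_free_energies(U, betas, tol=1e-8, max_iter=10000):
    """Solve for beta_j*A_j from raw energies, with no histogram bins."""
    J, n = U.shape
    U_flat = U.reshape(-1)                     # all n*J observations U_il
    betaA = np.zeros(J)
    for _ in range(max_iter):
        # per-observation denominator: ln sum_m exp(-beta_m U_il + beta_m A_m)
        denom = logsumexp(-np.outer(betas, U_flat) + betaA[:, None], axis=0)
        # beta_j A_j = ln n - ln sum_{l,i} exp(-beta_j U_il - denom_il)
        betaA_new = np.log(n) - logsumexp(
            -np.outer(betas, U_flat) - denom[None, :], axis=1)
        betaA_new -= betaA_new[0]              # additive constant: A_1 = 0
        if np.max(np.abs(betaA_new - betaA)) < tol:
            return betaA_new
        betaA = betaA_new
    return betaA
```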

Precision issues

It is very important to keep in mind that the combined exponentiation and logarithm functions in these expressions require careful treatment in numerical evaluation in the computer in order to avoid precision inaccuracies. More on this issue is described in the document "Simulation best practices." As a quick example, consider the computation of


$$-F = \ln \sum_{i=1}^{M} e^{w_i}$$

To get around numerical precision issues, we rearrange this expression so that the terms in the exponential are better-behaved:

$$-F = \ln \left[ e^{w_{\max}} \sum_{i=1}^{M} \frac{e^{w_i}}{e^{w_{\max}}} \right] = w_{\max} + \ln \sum_{i=1}^{M} e^{w_i - w_{\max}}$$

where

$$w_{\max} = \max_i w_i$$
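
In Python this rearrangement is a few lines (scipy.special.logsumexp implements the same shift internally):

```python
import numpy as np

def neg_F(w):
    """Numerically stable evaluation of -F = ln sum_i exp(w_i)."""
    w = np.asarray(w, dtype=float)
    w_max = w.max()                  # shift so the largest exponent is zero
    return w_max + np.log(np.sum(np.exp(w - w_max)))
```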

Other thermodynamic potentials

The multiple histogram technique is not limited to determining $S(U)$. It can also be used to determine the joint functions $S(U, N)$ or $S(U, V)$ if histograms are taken not only in energy but in particle number or volume fluctuations, respectively. Such might be the case in grand-canonical or isothermal-isobaric MC simulations. The Ferrenberg-Swendsen equations change slightly in each case, but the derivation proceeds conceptually as above. We will discuss one implementation in the context of phase equilibria, below.

Alternatively, one can find other underlying thermodynamic potentials, besides the entropy $S$, using the reweighting approach. Imagine, for example, that we perform $J$ simulations in the grand-canonical ensemble, each at the same temperature $T$ but with a different chemical potential $\mu_j$. For each, we collect a histogram in the particle number, $c_{jk}$, where $k$ is now an index over different values of $N$.

Statistical mechanics shows that the histogram of 푁 in the grand canonical ensemble relates to the Helmholtz free energy,

$$\wp(N; \mu) = \frac{Q(N)\, e^{\beta \mu N}}{\Xi(\mu)} = \frac{e^{-\beta A(N) + \beta \mu N}}{\Xi(\mu)}$$


Here, the dependence of 퐴 and Ξ on 푇 and 푉 has been suppressed since these variables are constant throughout.

The total likelihood is therefore

$$L = \prod_{i=1}^{n} \prod_{j=1}^{J} \wp(N_{ij}; \mu_j) = \prod_{i=1}^{n} \prod_{j=1}^{J} \frac{e^{-\beta A(N_{ij}) + \beta \mu_j N_{ij}}}{\Xi_j}$$

A derivation similar to the one above shows that the reweighting equations providing the solution to $A(N)$ and the $J$ values $\Xi_j$ are:

$$-\beta A(N) = \ln c_{\mathrm{tot}}(N) - \ln n - \ln \sum_{j=1}^{J} e^{\beta \mu_j N + \beta F_j} \quad \text{for all } N$$

$$-\beta F_j = \ln \sum_N e^{-\beta A(N) + \beta \mu_j N} \quad \text{for all } j$$

Notice here the change in roles relative to the previous example:

• 푁 takes the place of 푈.

• 휇 takes the place of 푇.

• Instead of computing $S(U)$, we compute the underlying function $A(N)$. That is, we find the partition function corresponding to fluctuations in $N$ at constant $T$, since all of our simulations were at the same temperature.

• We compute $F_j = -k_B T \ln \Xi_j$ at each chemical potential, rather than $A_j$ at each temperature. By statistical mechanics, $F_j = -P_j V$. Thus, the differences between these free energies provide information about the pressure differences.

It is possible to determine many different kinds of free-energetic quantities in this way. In general,

• To compute a free energy or entropy $F$ as a function of some macroscopic parameter $X$, the simulation must sample fluctuations in $X$. For example, $A(V)$ requires fluctuations in $V$ and thus would necessitate use of the isothermal-isobaric ensemble.


• To have good statistical accuracy, a wide range of fluctuations in $X$ must be sampled. This can be accomplished by performing multiple simulations at different values of the parameter conjugate to $X$. To compute $A(V)$, different pressures $P$ would be imposed for multiple simulations at the same temperature.

• The multiple-histogram reweighting procedure will compute $F(X)$ as well as the relative partition functions at the different simulation conditions. In the example for $A(V)$, we would find $A(V)$ as well as $G_j$, where $G$ is the Gibbs free energy of the isothermal-isobaric ensemble.

Potentials of mean force (PMFs)

The connection between a PMF and simulation observables stems from the probability distribution of the reaction coordinate of interest. Based upon the fact that the total partition function can be expressed as a sum over the PMF partition function, we have that

$$\wp(\xi) = \frac{e^{-\beta F(\xi)}}{Z}$$

That is, the probability that one will see configurations with any value of $\xi$ is proportional to a Boltzmann factor involving the potential of mean force at $\xi$.

This approach enables us to measure PMFs in simulation by computing histograms along reaction coordinates. We have,

$$F(\xi) = -k_B T \ln \wp(\xi) - k_B T \ln Z = -k_B T \ln \wp(\xi) + \text{const}$$

Notice that $Z$ provides an additive constant independent of the reaction coordinate. Typically, we cannot determine the PMF in absolute units, which would require the determination of the absolute free energy of the system. Instead, we can only compute $F(\xi)$ curves to within an additive constant. This is not normally a problem, however, since we are usually only interested in relative free energies along $\xi$.
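
A minimal sketch of this measurement, assuming an array xi_samples of reaction-coordinate observations and reduced units with $k_B = 1$ (so that the temperature T enters directly as $k_B T$); the curve is reported relative to its minimum since the additive constant is arbitrary:

```python
import numpy as np

def pmf_from_samples(xi_samples, xi_min, xi_max, m, T):
    """Estimate F(xi) = -k_B T ln p(xi) + const from sampled xi values."""
    counts, edges = np.histogram(xi_samples, bins=m, range=(xi_min, xi_max))
    centers = 0.5 * (edges[:-1] + edges[1:])
    sampled = counts > 0                   # empty bins give no PMF estimate
    F = -T * np.log(counts[sampled])       # normalization -> additive const
    return centers[sampled], F - F.min()   # shift the minimum to zero
```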

One of the limitations of this basic approach is that we cannot compute large changes in free energies due to the statistics of the histogram. Imagine that we examine a free energy difference at two values of $\xi$, $F(\xi_2) - F(\xi_1)$. Say that this value is equal to $4 k_B T$, which is a moderate energy equal to about 2 kcal/mol at room temperature. The relative probabilities at these two values would be

$$\frac{\wp(\xi_2)}{\wp(\xi_1)} = e^{-\beta (F(\xi_2) - F(\xi_1))} = e^{-4} \approx 0.02$$

If we were to measure a histogram and the bin for 휉1 contained 1000 entries, the bin for 휉2 would only contain ~20 entries. Thus, the statistical error for the latter would be quite high.

The main point here is that it is difficult to measure free energies for rarely sampled states. One way around this, which we will discuss in a later lecture, is to bias our simulation to visit these states. Though the introduction of the bias modifies the ensemble we are in, we can exactly account for this fact in the computation of thermodynamic properties.

Weighted Histogram Analysis Method (WHAM)

If multiple simulations are performed at different state conditions (e.g., different $T$, $P$, or $\mu$), the Ferrenberg-Swendsen reweighting approach can also be used to compute potentials of mean force at some set of target conditions. This is quite straightforward.

As before, imagine that 퐽 simulations are performed at different conditions. In each simulation 푗, 푛 observations 푖 are made. In addition to computing the energy, one tallies the value of the reaction coordinate for each observation:

$$\xi_{ij}$$

Then, the PMF at some final set of conditions (the reweighting temperature, chemical potential, and/or pressure) is found by tallying the configurational weights to compute the distribution in 휉:

$$\wp(\xi) \propto \sum_{j=1}^{J} \sum_{i=1}^{n} w_{ij}\, \delta_{\xi_{ij}, \xi}$$

Ultimately, this distribution will need to be accumulated in a discretized array if $\xi$ is a continuous variable. Then, the PMF simply comes from

$$F(\xi) = -k_B T \ln \wp(\xi)$$

Histogram methods for computing phase equilibria

The computation of phase equilibria—saturation conditions, critical points, phase boundaries, etc.—is a major enterprise in the molecular simulation community. There are many ways that one might go about this, including the Gibbs ensemble, and they are discussed in detail in [Chipot & Pohorille, Free Energy Calculations: Theory and Applications in Chemistry and Biology, Springer, 2007]. Of all of these methods, however, the current gold standard for phase equilibrium computations is one that relies on multiple histogram reweighting. This approach is often the quickest and has the maximum accuracy.

Here we will discuss these calculations in the context of liquid-vapor equilibria. Other phase behavior—such as liquid-solid, vapor-solid, or liquid-liquid (in mixtures)—can be treated using this approach as well, although special techniques may be needed to treat structured phases.

Grand canonical perspective

The basic idea of our calculations is the following:

• For a given temperature, we want to find the conditions at which phase equilibrium occurs. In a single-component system, we only have one free parameter to specify this condition in addition to $T$, per the Gibbs phase rule. This could be $P$ or $\mu$.

• We will focus on $\mu$ instead of $P$ and perform simulations in the grand canonical ensemble. Thus, we want to find the value of $\mu$ at coexistence that generates an equilibrium between liquid and vapor phases. We could equally choose $P$ and perform $NPT$ simulations; however, particle insertions and deletions are generally much more efficient than volume fluctuations and thus we go with the former for computational reasons.

• If the state conditions place us at conditions of phase equilibrium (i.e., on the liquid-vapor phase line), we expect the GCMC simulation to spontaneously fluctuate between a dense liquid (low $U$, large $N$) and a dilute gas (high $U$, small $N$). These fluctuations can be observed by examining the joint distribution

$$\wp(U, N)$$

• At phase equilibrium, ℘(푈, 푁) will be bimodal, and the total probability weight under the two peaks will be equal. One peak corresponds to the liquid phase, and the other to the gas phase.

• We will find the conditions of phase equilibrium (푇, 휇) that result in a bimodal ℘(푈, 푁). We will generate ℘(푈, 푁; 푇, 휇) by reweighting simulation data from multiple simulations, each not necessarily on the phase coexistence line.

GCMC implementation

We need to gather enough histogram data so that we will be able to reweight $\wp(U, N; T, \mu)$ with good statistical accuracy. We perform $J$ simulations in the grand-canonical ensemble at different temperatures $T_j$ and chemical potentials $\mu_j$. For each we measure the joint distribution $\wp(U, N; T_j, \mu_j)$. The temperatures and chemical potentials should be chosen so that the energy and particle number distributions all overlap with each other and span the complete range of liquid-to-gas densities. Typically we perform the following kinds of simulations:

• one near the (presumed) critical point that has large fluctuations in 푈 and 푁

• one in the gas phase

• several in the liquid phase, of varying temperature

The point is that we must have good histogram statistics for all the particle numbers and energies that will ultimately be relevant at phase equilibrium.

[Figure: overlapping $(U, N)$ histograms from a gas-phase simulation, a near-critical simulation, a liquid-phase simulation, and a lower-$T$ liquid-phase simulation.]

Once these histograms are taken, Ferrenberg-Swendsen iterations are used to compute the free energy at each state point. The relevant equations for GCMC simulations are:

$$S(U, N) = \ln c_{\mathrm{tot}}(U, N) - \ln n - \ln \sum_{j=1}^{J} e^{-\beta_j U + \beta_j \mu_j N + \beta_j F_j} \quad \text{for all } U, N$$

$$-\beta_j F_j = \ln \sum_U \sum_N e^{S(U,N) - \beta_j U + \beta_j \mu_j N} \quad \text{for all } j$$

This procedure is performed until values for the discretized entropy $S(U, N)$ and free energies $F_j = -k_B T_j \ln \Xi_j$ converge. Keep in mind that one value of $F$ must be set equal to zero.

Once these values are determined, the joint distribution $\wp(U, N)$ can be computed for an arbitrary reweighting $T, \mu$ using


$$\wp(U, N; T, \mu) \propto e^{S(U,N) - \beta U + \beta \mu N}$$

The constant of proportionality is given by the normalization condition.

Using the reweighting equation, one finds phase equilibrium by adjusting 휇 given a value of 푇 until a bimodal distribution appears and the integral under each peak (the total probability) is the same. Notice that this is a very fast operation since the Ferrenberg-Swendsen reweighting only needs to be performed once, at the beginning, to determine 푆(푈, 푁).
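
As a sketch of this adjustment, assuming a converged 2D array S over bin centers U_k and particle numbers N_k, one can reweight to a trial $(\beta, \mu)$ and bisect on $\mu$ until the two peaks carry equal weight; N_star denotes the dividing value of $N$ at the minimum between the peaks:

```python
import numpy as np

def peak_weights(S, U_k, N_k, beta, mu, N_star):
    """Fraction of probability in the gas (N < N*) and liquid (N > N*) peaks."""
    logp = S - beta * U_k[:, None] + beta * mu * N_k[None, :]
    p = np.exp(logp - logp.max())       # unnormalized, overflow-safe
    p /= p.sum()
    gas = p[:, N_k < N_star].sum()
    return gas, 1.0 - gas

def coexistence_mu(S, U_k, N_k, beta, N_star, mu_lo, mu_hi, tol=1e-8):
    """Bisect on mu: raising mu shifts weight toward the liquid peak."""
    while mu_hi - mu_lo > tol:
        mu = 0.5 * (mu_lo + mu_hi)
        gas, liq = peak_weights(S, U_k, N_k, beta, mu, N_star)
        if gas > liq:
            mu_lo = mu                  # too gas-like: increase mu
        else:
            mu_hi = mu
    return 0.5 * (mu_lo + mu_hi)
```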

The following figure shows what this distribution might look like, taken from a review article by Panagiotopoulos [Panagiotopoulos, J Phys: Condens Matter 12, R25 (2000)].

By repeating this procedure at different reweighting temperatures, one can map out the phase envelope. In the $T$-$\rho$ plane, this might look like:

[Figure: $T$-$\rho$ phase diagram showing gas and liquid branches meeting at the critical point, with the gas + liquid coexistence region underneath.]


The average density of each phase can be determined using:

$$\langle \rho \rangle_L = \frac{1}{V} \frac{\sum_U \sum_{N > N^*} N\, \wp(U, N)}{\sum_U \sum_{N > N^*} \wp(U, N)}, \qquad \langle \rho \rangle_G = \frac{1}{V} \frac{\sum_U \sum_{N < N^*} N\, \wp(U, N)}{\sum_U \sum_{N < N^*} \wp(U, N)}$$

Here, 푁∗ is the value of 푁 at the minimum between the two peaks in the distribution.

Pressures can be found using the fact that

$$\beta P V = \ln \Xi(\mu, T, V)$$

We cannot compute absolute pressures using this equation, because we cannot compute $\Xi$ absolutely (recall that one of the $F_j$ was set to zero). However, we can compare the pressures of two different state points:

$$\beta_2 P_2 V - \beta_1 P_1 V = \ln \frac{\Xi(\mu_2, T_2, V)}{\Xi(\mu_1, T_1, V)} = \ln \frac{\sum_U \sum_N e^{S(U,N) - \beta_2 U + \beta_2 \mu_2 N}}{\sum_U \sum_N e^{S(U,N) - \beta_1 U + \beta_1 \mu_1 N}}$$

By letting one state point correspond to a very dilute, high-temperature gas phase, we can compute its absolute pressure using the ideal gas law. This equation then can be used to relate the pressure at other state points back to the ideal gas reference.

Critical point reweighting and finite-size scaling

How can one predict the location of the critical point? It turns out that systems have a kind of universal behavior at this point in the phase diagram. For many fluids, their behavior at the critical point follows the same trends as the three-dimensional Ising model (or, equivalently, the lattice gas); that is, they fall in the Ising universality class or the Ising criticality class. In particular, the probability distribution $\wp(U, N)$ assumes a universal form.

We can locate the true critical point by finding reweighting values of $T_c, \mu_c$ so that $\wp(U, N)$ maps onto the universal Ising distribution, which has been computed to high accuracy by a number of researchers.

Near the critical point, fluctuations become macroscopic in size. This means that computed properties become sensitive to the size of the system being studied. In particular, the computed $T_c, \mu_c$ can have a substantial dependence on the system size $V$. Fortunately, these parameters have a scaling-law behavior with $V$. To locate the infinite-system-size $T_c, \mu_c$, one performs multiple studies for different values of $V$, each time computing the critical point using the procedure above. Then, one extrapolates to $V \to \infty$. An example of this analysis appears in [Panagiotopoulos, 2000].
