4th CP2K tutorial

August 31st -September 4th, Zurich Sampling Free Energy Surfaces by MD

Marcella Iannuzzi

Department of Chemistry, University of Zurich

http://www.cp2k.org Complex Processes by MD Choose a suitable model of the system

Determine the thermodynamic conditions ⇒ Ensemble averages

Equilibrium sampling of physical quantities

RDF O-O Cl-H

1.5 2 2.5 3 3.5 4 4.5 5

Predictive power frustrated by sampling fast degrees of freedom with time-steps from < 0.1 fs (CPMD) up to 1 fs (MM)

~ few ns

2 Rare Events Phase Transitions, Conformational Rearrangements, Chemical Reactions, Nucleation, Diffusion, Growth, etc.

Activation Energies Minimum energy pathways

Δ

Exploration of configurational space

Complex and high dimensional Intrinsically multidimensional configurational space order parameter Multitude of unknown Entropic bottlenecks intermediates and products Unforeseen processes, many Diffusive trajectories irrelevant transition states

3 Canonical Partition Function The Laplace transform of the density of state

Q(N,V,T)= exp( E)⌦(N,V,E)dE Z Probability of the macrostate at a given T

1 1 Q(N,V,T)= exp (rN , pN ) drN dpN = Z(N,V,T) N!h3N H ⇤()3N N! Z ⇥ ⇤

one dimensional integral over E replaced by configurational integral analytic kinetic part is integrated out

(r) configurational partition Z(N,V,T)= e U dr function

4 Free Energy

Helmholtz free energy or thermodynamic potential

1 A = ln Q(N,V,T) Thermodynamics Statistical Mechanics

1 Z A = ln 1 entropic and enthalpic Z contributions 0 ⇥

(r,p) Macroscopic state 0 corresponds to a portion of Q0 e H drdp the phase space : Γ0 0

(r,p) Q e H0 drdp 0 Macroscopic state 0 corresponds to H0

0 (r,p) Macroscopic state 0 corresponds to a value of a Q0 e H drdp macroscopic parameter, e.g T

5 Perturbation formalism

Reference (0) and target system (1) (r, p)= (r, p)+ (r, p) H1 H0 H

(r,p) e H0 Probability of finding (0) in configuration (r,p) 0(r, p)= (r,p) P eH drdp Free energy difference

1 e 1 drN dpN 1 e 0 e drN dpN A = ln H = ln H H e 0 drN dpN e 0 drN dpN R H R H R R 1 A = ln exp (rN , pN ) H 0 Integrating ⌦out the⇥ analytic kinetic part ⇤↵ 1 U e U 0 A0,1 = ln e 0 (r, p) = ⇥F ⇤ ⇥ ⇤ ⇥F ⇤1 e ⇥ U ⇤0 6 Using a similar approach we can derive a formula for the statistical average of any mechanical property (rN ) in the target system, in terms of statistical averages over con- F formations representative of the reference ensemble

N N N N (r )e U1 r (r )e U e U0 r (rN ) = F = F (11.22) 1 e 1 rN e e 0 rN F R U R U U ⌦ ↵ After multiplying both the numeratorR and the denominatorR by Z0, one obtains

(rN )exp (rN ) (rN ) = F U 0 (11.23) F 1 exp [ (rN )] ⌦ h ⇥ U i0 ⇤↵ ⌦ ↵ Examples of properties that can be calculated are the average potential energy, forces, molecular dipole moment or torsional angles in a flexible molecule. Since A is calculated as the average over a quantity that depends on , as expressed U in Eq. 11.21, then this average can be taken over the probability distribution ( ), P0 U instead of (rN ), thus giving a one dimensional integral over the energy difference P0 Limitations 1 A = ln exp [ ] ( )d . (11.24) U P0 U U Z The integrand is the probability distribution multiplied by the Boltzmann1 factor, which gives a shifted function as shown in Fig. 11.1. This indicatesA = that the valueln of theexp integral [ ] 0( )d 38 C. Chipot and A. Pohorille U P U U Z 2.4

2.0 exp(−bDU ) Shifted function

1.6 P0(DU ) Low-!U tail is poorly sampled

) low statistical accuracy

U 1.2 D ( but important contribution to !A P 0.8 P0(DU )x exp(−bDU ) } 0.4

0.0 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 DU

Fig. 2.1. P0(U), the Boltzmann factor exp ( U) and their product, which is the inte- Figure 11.1: 0 the Boltzmann factor, and their product. The low- tail of the integrand is poorly grand in (2.12). TheP low-U tail of the integrand, markedU with stripes is poorly sampled sampled with 0( ) andAccuracy therefore is known ⇒ with target low statistical and accuracy. reference However, it provides systems an are similar⇒ overlapping regions P U with P0(Uimportant) and, contribution therefore, to the is integral. known with low statistical accuracy. However, it provides an important contribution to the integral depends on the low-energy tail of the distribution. If the distribution is Gaussian

2 1 1 ( 0) 0( )= exp U h Ui (11.25) P U p2⇡ 22 1 to (2.12), we obtain  1

C 0 2 0 00 exp( A)= exp 208U U Γ 2 /22 dU Γ(2.15) Γ p2⇡ h i0 Z h i Here, C is independent of U insufficient statistics or incomplete overlap ⇒ enhanced sampling 1 7 C = exp U 2 (2.16) h i0 2  ✓ ◆ Comparing (2.13) and (2.15), we note that exp ( U) P0(U) is a Gaussian, as 2 is P0(U), but is not normalized and shifted toward low U by . This means that reasonably accurate evaluation of A it via direct numerical integration is pos- sible only if the probability distribution function in the low-U region is sufficiently well known up to two standard deviations from the peak of the integrand or 2 +2standarddeviationsfromthepeakofP (U),locatedat U . This state- 0 h i0 ment is clearly only qualitative — the reader is referred to Chap. 6 for detailed error analysis in FEP methods. This simple example, nevertheless, clearly illustrates the limitations in the direct application of (2.12). If is small, e.g., equal to kBT ,95% of the sampled values of U are within 2 of the peak of exp ( U) P (U) at 0 room temperature. However, if is large, for example equal to 4kBT ,thispercent- age drops to 5%. Moreover, most of these samples correspond to U larger than U 2 (the peak of the integrand). For this value of , U smaller than the h i0 peak of the integrand will be sampled, on average, only 63 out of 106 times. Not surprisingly, estimates of A will be highly inaccurate in this case, as illustrated in Fig. 2.1. Several techniques for dealing with this problem will be discussed later in this chapter and in Chap. 6. Order Parameters

Variables chosen to describe changes in the system

Reaction coordinate : the order parameter corresponds to the pathway along which the transformation occur in nature

Collective variable : fully represented as function of coordinates

Indicating intermediate stages of the transformation: mutation point

torsion angle annihilation non-bonded

Different possible definitions of OP Set up of system with desired values of OP Effects on accuracy and efficiency of ΔA calculations Smoothness of the simulated path

8 Extended Ensemble N Select parameters, continuous functions of coordinates ⇠ˆi(r )

Density of States ⌦ (N,V,E,⇠)= [ (rN ) E] ⇧ [⇠ˆ (rN ) ⇠ ] drN ⇠ U i i i Z ⇣ ⌘ ⇠ = ⇠ { i} Canonical Partition Function (rN ) N N Q (N,V,T,⇠)= e U ⇧ [⇠ˆ (r ) ⇠ ] dr ⇠ i i i Z ⇣ ⌘

1 Free Energy A = ln Q ⇠ ⇠

N ⇠ˆi(r ) must distinguish among metastable states

select specific configurations in the partition function 9 3 Methods Based on Probability Distributions and Histograms 85

Chap. 1 that the free energy difference, A(⇠),betweenthestatesdescribedby⇠ and ⇠0 is 1 (⇠) A (⇠)=A (⇠) A (⇠0)= ln . (3.13) (⇠0) In practice, the continuous function (⇠) is represented as a histogram consisting of M bins. If all bins have equal size ⇠ =(⇠ ⇠ ) /M then 1 0

fi [⇠0 +(i 0.5) ⇠]= (3.14) fj j X where fi is the number of sampled configurations for which the order parameter takes a value between ⇠ +(i 1) ⇠ and ⇠ + i⇠. 0 0 Combining (3.13) and (3.14) leads to a formula for histogram-based estimates of (⇠) 1 fi A (⇠0 +(i 0.5) ⇠)= ln . (3.15) f1 In practice, this simple formula will hardly ever work, especially if the free energy changes appreciably with ⇠.Consider,forexample,twostatesofthesystems,⇠i and ⇠ such that A(⇠ ) A(⇠ )=5k T .Then,onaverage,theformerstateStratification Scheme j i j B is sampled only seven times for every 1,000 configurations sampled from the latter state. Such nonuniform sampling is undesirable, as it leads to a considerable loss of statistical accuracy.Free For theenergy free energy butane profile shown isomeris in Fig. 3.1,ation transitions between 7 Probability distribution of the order parameter 6 Not uniform sampling 1 (⇠1) 5 A(⇠)=A(⇠ ) A(⇠ )= ln P 1 0 (⇠ ) 4 P 0 3

A (kcal/mol) 2 Histogram of M bins ⇠ =(⇠ ⇠ )/M 1 0 1

0 fi -60 0 60 120 180 240 (⇠0 +(i 0.5)⇠)= Θ (degrees) C-C-C-C P j fj Fig. 3.1. Free energy of isomerization of butane as a function of the C–C–C–C torsional angle, Figure.Becauseofhighenergybarriers,transitionsbetweenstable 11.4: Free energy of isomerization of butane as a functiontrans of theand C-C-C-Cgauche torsionalrotamers angle are P rare,⇥. Because which makes of high calculation energy barriers, of the transitions free energy between in stable a single trans simulation and gauche highly rotamers inefficient. are rare, Instead,which makes the calculation calculation was of the performed free energyRestrain in four in a overlapping single simulationthe windows, system highly whose inefficient. edgeswithin Instead, are marked a the window by harmonic potential oncalculation the x-axis. was In performed each window, in four the overlapping probability windows, density whose functions edges are and marked the free on energiesthe x-axis. were In determinedeach window, as the functions probability of density. They functions were subsequently and the free energies shifted were so determined that they matched as functions in of the ⇥. overlapping regions, yielding the freeOverlapping energy profile in the fullwindows range of consecutive windows overlap one can build the complete probability distribution by match- 2 ing (⇠) in the overlapping regions, as illustrated in Fig. 11.4. P (⇠1 ⇠0) Stratification improves efficiencyEfficient even if the free sampling energy is constant and the motion⌧ = L⌧w along ⇠ is strictly diffusing. In general, by dividing the entire range into L windows of / LD equal suze, the computer time required to acquire the desired statistics in each window, ⌧w, ⇠ is proportional to the characteristic timeReconstruct of diffusion within the a window full probability by matching 2 [(⇠1 ⇠0)/L] ⌧w . (11.57) / D⇠ 10

The total computer time is approximately

2 (⇠1 ⇠0) ⌧ = L⌧w . (11.58) / LD⇠

It decreases linearly with the number of windows, at least for large windows. For small windows, the statistic accumulated is affected by more correlation. Anyway, ⌧w should be longer than the time needed to equilibrate the system along the degrees of freedom orthogonal to ⇠. In particular when the motion along ⇠ is strongly correlated with the motion along orthogonal DoF, special care is needed. The windows must be large enough, in order to to hinder the equilibration in the orthogonal DoF. For these reason, at some point the reduction of the window size is not helping with the computational cost, but might spoil the good statistical sampling.

11.6.1 Importance Sampling A powerful strategy to improve the efficiency of free energy calculations is based on mod- ifying the underlying sampling probability such that important regions in phase space are visited more frequently.

217 Good Coordinates for Pathways

Capture the essential physics include all relevant DoF and properly describes the dynamics

q distinguishes between A and B but might fail in describing essential aspects of the transition

Discriminate configurations of Reversibility reactants and products and intermediates Fast equilibration of orthogonal DoF (no relevant bottlenecks) Characterisation of the mechanisms of transition 11 landscape as a function of two order parameters is displayed. Such a free energy surface might result, for instance, for a system with a first-order . The variable ⇠ could then be the size of a cluster of the stable phase forming in the metastable phase and ⇠0 could be some other, important degree of freedom. Due to a rare fluctuation a system initially in the metastable state A can overcome the nucleation barrier following a pathway (thick solid line) that crosses the transition state corresponding to the critical nucleus. After passing the transition state the system then relaxes into B. Although ⇠ can be used to tell whether the system is in A or in B, it fails to capture all important features of the transition. Indeed, during the transition a systematic motion along ⇠0 must occur. The failure becomes apparent by computinf the free energy profile as a function of ⇠ only. This function, shown inHypothetical the bottom panel of the figure, displays a barrier with2D top at ⇠⇤ , whichFree des not coincide with the transition region. Rather, configurations with ⇠ = ⇠⇤ (distributed along the dotted 7TransitionPathSamplingandtheCalculationofFreeEnergiesline) will most likely belong to the basins of attraction of regions A or B. 251

x′ (a)

B

A Not including important DoF leads to wrong x A x interpretation ( ) (b)

x* x

Fig. 7.1. (a) Hypothetical free energyFigure 11.17: landscapeHypothetical free energyA( landscape.⇠, ⇠) as a function of two selected degrees of freedom ⇠ and ⇠. SuchThe a transition free path energy sampling (TPS) surface method is mighta result, technique for de- instance, for a system with a 12 first-order phase transition. The variable ⇠ could244 then be the size of a cluster of the stable phase forming in the metastable phase and ⇠ could be some other, important degree of freedom. Due to a rare fluctuation a system initially in the metastable state A can overcome the nucleation barrier following a pathway (thick solid line) that crosses the transition state corresponding to the critical nucleus. After passing the transition state the system then relaxes in the stable state B. Although the cluster size ⇠ can be used to tell wether the system is in state A or B it fails to capture all important features of the transition because during the transition system- atic motion along ⇠ must occur. (b) The failure of ⇠ to include the essential physics of the transition becomes apparent when we determine the free energy of the system as a function of the variable ⇠ only, A(⇠)= d⇠ exp A(⇠, ⇠) . This function, shown in the lower panel { } of the figure, displays a barrier separating the ⇠-values corresponding to the region A from those from region B. The topR of the barrier, located at ⇠⇤,does,however,notcoincidewith the transition region. Rather, configurations with ⇠ = ⇠⇤ (distributed along the dotted line in the upper panel) will most likely belong to the basins of attraction of regions A or B along a given reaction coordinate [10]. Of course, it is also trivially possible to gen- erate an equilibrium distribution in configuration space by applying path sampling techniques to a path probability in which the requirement that paths start and end in certain regions has been removed. In this case free energies can be simply calculated from configurations lying on the sampled pathways [11]. For such an unconstrained ensemble of pathways additional precautions must be taken to guarantee that the rare event of interest occurs at least on some of the trajectories collected in the simulation. These issues are discussed in Sect. 7.4. Transition path sampling can also be helpful in the calculation of free ener- gies in the context of fast-switching methods described in Chap. 5. As shown by Jarzynski [12], equilibrium free energies can be computed from the work performed on a system in repeated transformations carried out arbitrarily far from equilib- rium. From a computational point of view, this remarkable theorem is attractive because it promises efficient free energy calculations due to the reduced cost of Some simple collective variables

Derivable function of the degrees of freedom to which a given value can be assigned

Distance R R Generalized Coordination Number | I J | Select two arbitrary lists of atoms L1 and L2 and a reference distance r0 Angle (RI , RJ , Rk) n N N rij L1 L2 1 Dihedral (RI , RJ , Rk, RL) 1 r0 CL1L2 = m ⇤ rij ⇥ ⌅ NL1 j=1 i=1 1 r Difference of distances RI RJ RJ RK ⌦ ⌥⌦ 0 | | | | ⇥ ⇧ ⌃

Coordination number Generalised coordination number n N N rij L1 L2 1 1 r0 CL1L2 = m N ⇤ rij ⇥ ⌅ L1 j=1 i=1 1 ⌥ r0 ⇥ ⇧ ⌃ Generalised displacement

[klm] 1 1 D = di vˆ[klm] dj vˆ[klm] First Prev Next Last Go Back Full Screen Close Quit L1L2 • • • • • • • • NL1 · NL2 · i L1 j L2 13 CP2K input for CV

In SUBSYS add one section per CV

&COLVAR &COLVAR &COLVAR &DISTANCE &COORDINATION &RMSD AXIS X KINDS_FROM N &FRAME ATOMS 1 4 KINDS_TO O COORD_FILE_NAME planar.xyz &END DISTANCE R_0 [angstrom] 1.8 &END &END COLVAR NN 8 &FRAME ND 14 COORD_FILE_NAME cage.xyz &END COORDINATION &END &END COLVAR SUBSET_TYPE LIST &COLVAR ATOMS 1 3 5 6 8 9 &DISTANCE_FUNCTION ALIGN_FRAMES T ATOMS 4 6 6 1 &END COEFFICIENT -1.00000 &END # distance 1 = ( 4 - 6 ) # distance 2 = ( 6 - 1 ) &END DISTANCE_FUNCTION &END COLVAR

14 Constraints and Restraints

In MOTION add one section per constraint

&CONSTRAINT &COLLECTIVE &COLLECTIVE COLVAR 1 TARGET [deg] 0.0 INTERMOLECULAR MOLECULE 1 COLVAR 1 TARGET 5. &RESTRAINT TARGET_GROWTH 1.1 K [kcalmol] 4.90 TARGET_LIMIT 10. &END &END COLLECTIVE &END COLLECTIVE &END CONSTRAINT

15 Geometrical Constraints Implicit functions of the degrees of freedom of the system

( R , h, )=0 ˙ ( R , h, )=0 { I } { I }

To freeze fast degrees of freedom and increase the time step: e.g., intra-molecular bonds by distance constraints

To explore only a sub-region of the conformational space

As reaction coordinates : constrained dynamics and thermodynamic integration

To prevent specific events or reactions

Lagrange formulation for simple constraints, functions of RI

( R , P )= ( R , P ) ⇥( R ) L { I } { I } L { I } { I } { I } The Lagrange multipliers ensure that positions and velocities satisfy the constraints

16 Shake-Rattle algorithm Modified velocity Verlet scheme by additional constraint forces

First update of velocities (first half step) and positions t VI = VI (t)+ FI (t) RI = RI (t)+tVI 2MI

Positions’ correction by constraint forces 2 t (p) RI (t + t)=RI + gI (t) 2MI

Calculation of the new forces FI(t+∂t)

Update of velocity (second half step) t (v) VI (t + t)=VI + [FI (t + t)+gI (t + t)] 2MI Correction by the constraint forces Constraint Forces

⇤⇥ ( ) (p) (p) ⇤⇥( RI ) 1 J = ⇤ e( ⇤ )= J⇥ ⇥⇥( ⇤ ) ⇥ { } gI (t)= { } { } { } ⇤⇥ ⇤R ⇥ I Set of non-linear equations solved iteratively

∂σ ∂σ δt2 ∂σ ∂σ α V α · V′ α β λv (v) (v) ⇤⇥( RI ) I = I + β =0 g (t)= { } ∂RI ∂RI 2MI ∂RI ∂RI I ⇤R I I β " I # I ! ! ! !

H.C. Andersen, J. Comp. Phys., 52, 24 (1983) 17 Thermodynamic Integration

1 dA along a one dimensional A( ) A( )= d generalized coordinate ξ(x) 1 0 d 0 Path-independent

Potential of Mean Force exerted on ξ q (x, p) (,q1..qN 1,p ..pN 1) generalized coordinate to simplify derivative

⇤ ⇥ q dA ⇤⇥H e Hdq1..dqN 1dp ..dpN 1 ⇥ = = H ⇥ q ⇥ d e Hdq1..dqN 1dp ..dpN 1 ⇥ force at ξ, averaged over instantaneous force on ξ fluctuations of other DoF

⇤ ⇤U 1 ⇤ ln J x H = | | [J] = i ⇤⇥ ⇤⇥ ⇤⇥ ij q ⇥ ⇥ j

mechanical + entropic

18 Blue Moon Ensemble

Series of constrained MD simulations = + (⇥ ⇥(r)) H H F(S) ! mean force acting on the ∂F(S)! (⇥ ⇥(r)) A(ξ) S − system to hold ξ constant f MDn = ! ⇥ independent∂S !MDn MD replica MD1 ˙ q MD1 =0 : p (q, p )

tMD2 un-constrained <..> = constrained corrected <..>F tMD1 ξS

SMD1 SMD2 SMD3 ξ1 ξ2 ξ3 Fixman Potential 1 1 ⇥ 2 = + ln Z Z = F 2 ⇥ m ⇥x H H i i i ⇤ ⇥

mean force acting on ξ dA F = F ˙ related to external force d⇥ ⇥ to hold ξ constant

19 ...some difficulties MD performed at fixed ξ, collecting statistics of the force acting on ξ ⇒ λ∇ξ, constraint force

Many estimates at ξ to reduce statistical errors

Many windows to get accurate integrals

May not be easy to prepare by hand the system at given ξ

Different possible pathways: the starting configuration selects one path, but crossing is rare, ξ(r)=ξ partially sampled or rate limiting

Multidimensionality (more coordinates) too expensive

20 Metadynamics

Canonical equilibrium distribution under potential V(r)

Choose a set of relevant Collective Variables S(r):{Sα(r)}, such that the process is well defined in the reduced space Σ(S)

A(S) e 1 V (r) P (S)= A(S)= ln dr e ⇥(S S(r)) dS e A(S) ⇤ ⇥ Perform MD and re-map each micro-state by projecting the trajectory into the configuration space Σ(S): meta-trajectory S(r(t))

Enhance the exploration by adding a penalty potential that discourages the system to visit already explored macro-states

S(r) S(r (t )) 2 [ G ⇥ ] 2S2 VMTD(S(r),t)= Wt⇥ e t⇥=G,2G,...

New probability distribution generated under the action of V+VMTD

A Laio et al. Proc. Natl. Acad. Sci. U.S.A., 99, 12562 (2002) M Iannuzzi et al, PRL, 90, 238302 (2003) 21 History Dependent Potential Non Markovian Coarse-grained MD

F(S) V(t) Escape t4 Fill-up by successive contributions Keeping track of the explored space t3 Direct estimate of FES topology t2 t1 S

Eliminate metastability and reconstruct A(S) within Σ(S) A (S,t)= V (S(r),t) G MTD

Flattening of free energy surface

W/ 0 G

[A(S) A (S,t)] A(S)=A(S) AG(S,t) P (S) e G A(S) tMD ⇥G ⇥s ⇥

C Micheletti et al, PRL, 92, 170601 (2004) 22 2D FES

Histogram of a trajectory generated by V+VMTD provides a direct estimate of free energy

23 Extended Lagrangian MTD

Enforcing adiabatic separation, s other time scales and memory effects

Introduction of auxiliary variables s : {sα} one for each Sα(r) with large enough M 1 1 = K V (r)+ M s˙ k (s S (r))2 V (s,t) L 2 2 G coupling to system sampling thermalization DoF enhancement

(s s (t ))2 G ⇥ 2s2 VG(s,t)= Wt⇥ e t⇥

For large t and slow deposition rate, VG approximates the free energy and the meta-trajectory s(t) follows the MEP

1 V (r)+ 1 k (s S (r))2 A (s)= ln dr e [ 2 ] k ⇤ P ⇥

lim Ak(s)=A(s) k ⇥

M Iannuzzi et al, PRL, 90, 238302 (2003) 24 Accuracy of FES profile

Dynamics generating the equilibrium distribution associated with A(s)-AG(s,t)

different Averaging over many independent trajectories 1 A(s) (s,t)= (A (s,t) A(s) A (s,t) A(s) )2 profiles ⇥ G ⇥ G ⇤ ⇤ ds (s,t) 0.6 ¯(t)= error ds 0.2 Empirical error estimate 100 300 500 700 900 s W number of Gaussians ⇥¯ = C(d) V D ⇤G too large ∆s would smear out A(s) details : ∆s/L<0.1 ds(Amax A(s)) t = ⇥ :A(s)

The selected CV must discriminate among the relevant states (reactants, products, TS)

The number of hills required to fill the well is proportional to 1/(∆)NCV

The sampling of large variations of the CV over almost flat energy regions is expensive: diffusive behavior

MTD is not the true dynamics. Reaction rates are derived a posteriori from the estimated FES

The analysis of the trajectory is needed to isolate the TS

With proper choices of CV and parameters, the MTD trajectory describes the most probable pathway taking into account also possible kinetic effects (lager and shallower channels are preferred)

The accuracy in the evaluation of the FES depends on hills’ shape and size, and on the deposition rate. The ideal coverage VG ({S})= -A({S}) (flat surface) 26

MTD input

&FREE_ENERGY &METADYN &METAVAR COLVAR 2 ! TORSION DO_HILLS T SCALE 0.22 LAMBDA 0.8 LAGRANGE MASS 30 NT_HILLS 40 &END METAVAR SLOW_GROWTH TEMPERATURE 300 TEMP_TOL 100 WW 0.0001 &PRINT HILL_TAIL_CUTOFF 2 &COLVAR P_EXPONENT 8 COMMON_ITERATION_LEVELS 3 Q_EXPONENT 20 &EACH MD 1 &METAVAR &END COLVAR 1 ! COORDINATION &END SCALE 0.18 &HILLS LAMBDA 0.8 COMMON_ITERATION_LEVELS 3 MASS 20 &EACH &WALL MD 1 POSITION 2.0 &END TYPE QUADRATIC &END &QUADRATIC &END DIRECTION WALL_PLUS K 1.0 &END METADYN &END QUADRATIC &END WALL &END FREE_ENERGY &END METAVAR

28 PLUMED in CP2K

External library with advanced MTD capabilities:

Many different collective variables Well-tempered MTD Multiple walkers MTD Bias exchange MTD Reconaissance MTD

2 versions work with : Plumed 1.3, Plumed 2.x

29 Installation with PLUMED v1.3

Need to separately compile Plumed with cp2k Different procedures for v 1.3 and 2.x

Compiling Plumed 1.3 with a current cp2k version: 1. Download a modified 1.3 release from http://www.cp2k.org/static/ downloads/plumed/ 2. Extract the archive, run the provided script plumedpatch_cp2k.sh 3. Compile the plumed library 4. Compile cp2k with the flag –D__PLUMED_CP2K

Detailed instructions provided under: http://www.cp2k.org/ howto:install_with_plumed

Issues: Need to recompile with this complicated 2-step procedure every time you want to update the code 1.3 is an outdated version of plumed

30 Installation with PLUMED v2.x

Compiling Plumed 2.x with a current cp2k version:

1. Download plumed from the official website

3. Modify cp2k ARCH file as specified under http://www.cp2k.org/howto:install_with_plumed

5. Compile cp2k

Much more straightforward procedure, giving you the most recent version of plumed, which is actively maintained.

Use CP2K v 2.7

31 Using Plumed

In the CP2K input file: &FREE_ENERGY ... &METADYN USE_PLUMED .TRUE. PLUMED_INPUT_FILE ./filename.inp &END METADYN &END FREE_ENERGY

Additional plumed input file (v2): a: DISTANCE ATOMS=5,7 b: ANGLE ATOMS=7,9,15 Colvar definition

metad: METAD ARG=a,b PACE=500 HEIGHT=1.2 SIGMA=0.35,0.35 FILE=HILLS

PRINT STRIDE=10 ARG=phi,psi,metad.bias FILE=COLVAR

32 Article

pubs.acs.org/JPCC

Building Blocks for Two-Dimensional Metal−Organic Frameworks Confined at the Air−Water Interface: An Ab Initio Study Ralph Koitz, Marcella Iannuzzi,* and Jürg Hutter Department of Chemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland *S Supporting Information Tris Terpyridine networks

ABSTRACT: Two-dimensional molecular sheets are of prime interest in nanoscience and technology. A promising class of such materials is 2D metal−organic frameworks (MOFs), assembled by cross-linking precursors with metal ions. It was recently TTPB is confined on a surface of liquid demonstrated that such MOFs can be synthesized from monomers confined at an air− water, metal ion dissolved in the bulk water interface. In order to elucidate this process at the atomic scale, we study a large flat tris-terpyridine-derived molecule (TTPB) on a water surface using ab initio molecular coordinates to N-pockets dynamics. We investigate the properties of the molecule and examine its reaction with Zn fl fi ions from the liquid phase. The uid substrate signi cantly stabilizes the adsorbate while Metal ion coordination leads to maintaining sufficient conformational flexibility to allow dynamic rearrangement and chemical reactions. The successful uptake and binding of ions is the first step toward formation of 2D-MOFs linking TTPB molecules to dimers and large 2D MOFs.

■ INTRODUCTION membranes, free-standing functional sheets, and improved Since the advent of graphene and related materials, two- graphene analogues. To this end, we aim to gain insight into dimensional sheet-like molecules have become the target of the relevant processes at the atomic scale using advanced intense study.1,2 Up to now, the large-scale fabrication of well- computational methods. defined nanosheets has remained a challenge. A popular A number of recent simulation studies have explored the 10−12 method to synthesize these materials is through the properties of water surfaces, focusing particularly on decomposition of precursor molecules on suitable surfaces.3,4 hydrogen bonding at the interface, diffusivity, and ion solvation. However, some aspects of this approach limit its usefulness. Water slabs with adsorbed molecules are less widely studied, First, substrates are typically only reactive at the surface, leaving but some classical simulations of surfactants on water have been the bulk of the solid unutilized for the reaction. Second, the performed.13 Self-assemblyKoitz, R., and Iannuzzi, reactions M., & Hutter, to produce J. (2015). JPCC large, 119(8), two- 4023–4030 33 solid with its rigidly placed binding sites limits the conforma- dimensional layers on solids constitute an important field in tional flexibility of adsorbed species and thus the range of surface science that has been exhaustively studied in recent accessible reaction products.5,6 Third, removing the final years.3,14−17 In contrast, the approach of using fluid surfaces to 7 extended sheet from the support may be challenging. carry out reactions and produce extended sheets is currently Alternative approaches to preparing 2D sheets are thus highly largely unexamined, both experimentally and computationally. desirable. An exception to this has been the use of liquid interfaces for the A promising route toward preparing extended 2D coordina- formation of self-assembled monolayers, as well as the large tion polymers or 2D metal−organic frameworks (MOFs) has fi 8,9 eld of surfactants and lipid bilayers. In these applications, recently been presented. Suitable monomeric precursors can however, no new bonds are formed, and products are typically be confined on the surface of water, induced to form a dense weakly bound aggregates. Previous investigations have focused layer by increasing lateral pressure, and linked with metal ions either on photochemical processes involving molecules on from the liquid phase to form large sheets.8,9 This way, the water,18,19 salts and ions at/near the surface,20−22 and substance liquid−vapor interface of water is used as a “surface” on which 23 the reaction is performed, while metal ions can be supplied uptake across the gas/liquid interface. Reactions such as the enzymatic cleavage of surface-confined esters have also been from the bulk of the solution and used to assemble an extended 24 sheet. The substrate can provide reactants from the bulk liquid studied. Large molecules “adsorbed” at the water surface and by diffusion, lightly and dynamically stabilize adsorbates, and their reactions with solutes have not been examined thoroughly facilitate easy removal by drying. Therefore, all three issues so far, particularly at the atomic scale. mentioned above are overcome. By systematically exploiting the advantages of liquids as a support for chemical reactions, it Received: October 9, 2014 may be possible to prepare new materials or develop new Revised: January 17, 2015 routes toward relevant products such as tailored single-layer Published: February 4, 2015

© 2015 American Chemical Society 4023 DOI: 10.1021/jp510199k J. Phys. Chem. C 2015, 119, 4023−4030 Zn ion migration and binding

mechanism and energetics of Zn binding by MTD

2 CVs

Coordination number of Zn by TTPB (nZn)

Distance between Zn and centre of TTPB molecule (dZn)

Hills spawned every 50 steps after 7.5ps of unbiased MD nZn dZn [Å] 3 12 2 8

1 4

0 3 6 3 6 time [ps] 34 The Journal of Physical Chemistry C Article

approximately 50%. Compared to the bulk-like region, the fraction of DDAA species decreases from 0.65, and the occurrence of DA molecules roughly doubles. On the other hand, the DAA and DDA water molecules are approximately equally common on the free and covered surface sides, exhibiting increased probabilities compared to the bulk. In ff the TTPBw−Zn3 simulations, the di erences in DDAA frequencies between the covered and free surfaces are less pronounced. The fraction of DA species is about 40% lower on the TTPB-covered side, and the occurrence of DAA species is somewhat higher. Furthermore, we note a small number of double-acceptor (AA) species on the TTPB-covered side that are almost completely absent fromFES all other vertical of regions. migration Generally, the TTPBw and TTPBw−Zn3 systems exhibit quite similar depth dependencies of their H-bond distributions. It The Journal of Physical Chemistry C should be noted that in the TTPBw−Zn3 simulation,Article a total of six Cl− ions are present in the bulk liquid, which may account approximately 50%. Compared to the bulk-like region, the for some of the differences in H-bonding patterns between the fraction of DDAA species decreases from 0.65, and the two simulations. nZn Our results agree with previous conclusions about the occurrence of DA molecules roughly doubles. On the other 0 3 distributions of H-bonds near the air−water interface.KJ/mol hand, the DAA and DDA water molecules are approximately Furthermore, they suggest that the adsorbate has a subtle MEP equally common on the free and covered surface sides, influence on the H-bonding pattern of near-surface water exhibiting increased probabilities compared to the bulk. In molecules, reordering them to affect the frequencies-100 of different -300 the TTPB −Zn simulations, the differences in DDAA D/A combinations. w 3 2 Ion Uptake from the Solution. Once the TTPB frequencies between the covered and free surfaces are less monomer is confined at the water surface, the next step in pronounced. The fraction of DA species is about 40% lower on the formation of a MOF is the coordination of metal-200 ions, the TTPB-covered side, and the occurrence of DAA species is which serve as linkers between molecules. This process involves -400 somewhat higher. Furthermore, we note a small number of diffusion of the ions from the liquid phase to the surface and 1 subsequent formation of three coordinative Nterpy−Zn bonds. double-acceptor (AA) species on the TTPB-covered side that 2+ -300 are almost completely absent from all other vertical regions. Having established that Zn binds to the terpy moieties of TTPB in a stable way (vide supra), we now examine the -500 Generally, the TTPBw and TTPBw−Zn3 systems exhibit quite 2+ transition TTPB + Zn (aq) ⇌ TTPB−Zn in detail, focusing on similar depth dependencies of their H-bond distributions. It 0 one isolated ion uptake rather than the saturation-400 of all three should be noted that in the TTPBw−Zn3 simulation, a total of ligand sites. six Cl− ions are present in the bulk liquid, which may account We use MTD to accelerate the simulation of the process. As for some of the differences in H-bonding patterns between the CVs, we choose4 the distance8 from12 Zn to the center of TTPB, dZn,TTPB, and the Zn−N coordination number, nZn,N. Starting two simulations. from a configurationdZn with [Å] Zn2+ rather far away from the Our results agree with previous conclusions about the molecule, along the MTD trajectory, dZn,TTPB slowly decreases, distributions of H-bonds near the air−water interface. and eventually nZn,N increases from 0 to 3, that is, full Furthermore, they suggest that the adsorbate has a subtle coordination of Zn by terpyridine. AsZn the migrates simulation goes towards on, the surface. 2+ influence on the H-bonding pattern of near-surface water the3-step bias potential binding drives Zn processout of the complexof the back Zn into ion, the assisted by rotation of pyridyl groups. molecules, reordering them to affect the frequencies of different solution. We can thus elucidate the molecular details of ion binding and describe the energy landscape of the process. D/A combinations. Ion Insertion: Free Energy and Mechanism. We plot the Ion Uptake from the Solution. Once the TTPB aggregatedBinding energy energy data as a functionof Zn of competes the two CVs, as with well as the larger number of configurations in monomer is confined at the water surface, the next step in a minimum-energy path for the first Zn2+ uptake event in Figure 5a.solution. The free-energy More profile alongmicrostates this path is shown for in dissolved Zn lower in free energy. the formation of a MOF is the coordination of metal ions, 2+ Figure 5b, with minima and barriers labeled for comparison. Figure 5. (a) Reconstructed FES of Zn uptake from solution as a which serve as linkers between molecules. This process involves function of Zn N coordination number and distance from Zn to the 35 ff Starting from the bulk water (point 1), the ion has to escape a − di usion of the ions from the liquid phase to the surface and minimum of 430 kJ·mol−1 due to the solvation free energy, center of the TTPB molecule. Minimum-energy path representing the first Zn2+ uptake event from bulk water. Significant points (minima, subsequent formation of three coordinative Nterpy−Zn bonds. · −1 overcomes a barrier of about 165 kJ mol (point 2), and settles barriers) are numbered 1−7. (b) Free-energy profile along the Having established that Zn2+ binds to the terpy moieties of into a wider and deeper minimum somewhat closer to TTPB −1 minimum-energy path shown in (a). (c) Snapshots from the MTD TTPB in a stable way (vide supra), we now examine the (−520 kJ·mol , point 3). The ion then traverses a second trajectory showing the stepwise binding of Zn (orange) by TTPB. fi transition TTPB + Zn2+ ⇌ TTPB−Zn in detail, focusing on barrier (point 4) until it is bound by the rst pyridyl group Color codes are as those in Figure 1. (aq) fi · −1 one isolated ion uptake rather than the saturation of all three (nZn,N = 1). After a nal, smaller barrier (127 kJ mol , point 6), TTPBw−Zn3 is formed, characterized by a very narrow ligand sites. −1 minimum with a depth of −388 kJ·mol (point 7). solvated by H2O in a bulk-like environment. There is also a We use MTD to accelerate the simulation of the process. As The FES exihibits a number of minima. Most notable is a broad minimum for nZn,N between 1 and 2 (point 5), where the CVs, we choose the distance from Zn to the center of TTPB, broad and deep “valley” with uncoordinated Zn (nZn,N ≈ 0) at ion is partially coordinated by TTPB and still has substantial the center of mass of TTPB up to d ≈ 8 Å with Zn freedom of movement. dZn,TTPB, and the Zn−N coordination number, nZn,N. Starting Zn,TTPB fi 2+ from a con guration with Zn rather far away from the 4027 DOI: 10.1021/jp510199k J. Phys. Chem. C 2015, 119, 4023−4030 molecule, along the MTD trajectory, dZn,TTPB slowly decreases, and eventually nZn,N increases from 0 to 3, that is, full coordination of Zn by terpyridine. As the simulation goes on, the bias potential drives Zn2+ out of the complex back into the solution. We can thus elucidate the molecular details of ion binding and describe the energy landscape of the process. Ion Insertion: Free Energy and Mechanism. We plot the aggregated energy data as a function of the two CVs, as well as a minimum-energy path for the first Zn2+ uptake event in Figure 5a. The free-energy profile along this path is shown in 2+ Figure 5b, with minima and barriers labeled for comparison. Figure 5. (a) Reconstructed FES of Zn uptake from solution as a Starting from the bulk water (point 1), the ion has to escape a function of Zn−N coordination number and distance from Zn to the −1 center of the TTPB molecule. Minimum-energy path representing the minimum of 430 kJ·mol due to the solvation free energy, fi 2+ fi · −1 rst Zn uptake event from bulk water. Signi cant points (minima, overcomes a barrier of about 165 kJ mol (point 2), and settles barriers) are numbered 1−7. (b) Free-energy profile along the into a wider and deeper minimum somewhat closer to TTPB −1 minimum-energy path shown in (a). (c) Snapshots from the MTD (−520 kJ·mol , point 3). The ion then traverses a second trajectory showing the stepwise binding of Zn (orange) by TTPB. barrier (point 4) until it is bound by the first pyridyl group Color codes are as those in Figure 1. fi −1 (nZn,N = 1). After a nal, smaller barrier (127 kJ·mol , point 6), TTPBw−Zn3 is formed, characterized by a very narrow −1 minimum with a depth of −388 kJ·mol (point 7). solvated by H2O in a bulk-like environment. There is also a The FES exihibits a number of minima. Most notable is a broad minimum for nZn,N between 1 and 2 (point 5), where the broad and deep “valley” with uncoordinated Zn (nZn,N ≈ 0) at ion is partially coordinated by TTPB and still has substantial the center of mass of TTPB up to dZn,TTPB ≈ 8 Å with Zn freedom of movement.

4027 DOI: 10.1021/jp510199k J. Phys. Chem. C 2015, 119, 4023−4030