Fundamentals of Quantum Mechanics for Chemistry

Gérald MONARD

Equipe de Chimie et Biochimie Théoriques, UMR 7565 CNRS - Université Henri Poincaré, Faculté des Sciences - B.P. 239, 54506 Vandœuvre-lès-Nancy Cedex - FRANCE

http://www.monard.info/

Outline

1. Fundamentals of Quantum Mechanics for Chemistry
   - Hartree-Fock methods
   - Density Functional Theory
   - The QM scaling problem
   - Semiempirical methods
   - Molecular Mechanics
2. Fundamentals of QM/MM methods
   - Partitioning
   - QM/MM interactions
   - Cutting covalent bonds
   - ONIOM
   - Some available software
3. Selected QM/MM applications
   - Solvent effects
   - Spectroscopy
   - Biochemistry
4. Fundamentals of Linear Scaling methods
   - QM Bottlenecks
   - General ideas and solutions
   - Some available software
5. Focus on some Linear Scaling methods
   - CG-DMS
   - Mozyme
   - Divide & Conquer
6. Selected Linear Scaling applications
   - Energy Decomposition; Charge Transfer & Polarization
   - Born-Oppenheimer Molecular Dynamics
7. Parallelization of QM/MM and Linear Scaling methods

Fundamentals of Quantum Mechanics for Chemistry (1)

Some approximations

And there was the Schrödinger equation...

$$ H_0 \Psi_0 = E_0 \Psi_0 $$

where:

- $H_0$ is a Hamiltonian operator that describes a molecular system
- $\Psi_0$ is a wavefunction (a solution of the Schrödinger equation) that describes a state of the system
- $E_0$ is the energy associated with $\Psi_0$

1 equation + 2 unknowns (given $H_0$) = $+\infty$ solutions!

From now on:
- ground state
- closed shell
- non-relativistic

Fundamentals of Quantum Mechanics for Chemistry (2)

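Born-Oppenheimer approximation

- nuclei are fixed point charges
- only electrons are represented by a wavefunction $\Psi$

$$ H \Psi = E_{elec} \Psi \qquad (1) $$

$$ H = T_e + V_{eN} + V_{ee} = \underbrace{-\frac{1}{2} \sum_i \Delta_i}_{\text{kinetic energy}} + \underbrace{\sum_i \sum_K \frac{-Z_K}{r_{iK}}}_{e^-\text{-nuclei inter.}} + \underbrace{\sum_i \sum_{j<i} \frac{1}{r_{ij}}}_{e^-\text{-}e^- \text{ inter.}} \qquad (2) $$

$$ E = E_{elec} + E_{nuclei} = \langle \Psi | H | \Psi \rangle + \sum_K \sum_{L>K} \frac{Z_K Z_L}{R_{KL}} $$

Fundamentals of Quantum Mechanics for Chemistry (3)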

Orbital approximation

- each electron is described by a mono-electronic wavefunction: the Molecular Orbital (MO)

$$ \Psi(1,2,\ldots,n) = \psi_1(1)\,\psi_2(2) \ldots \psi_n(n) $$

- all MOs are combined in a Slater determinant (Pauli principle)

Linear Combination of Atomic Orbitals (LCAO)

- Each MO $\psi_i$ is expanded on a basis set of functions $\varphi_\mu$: the Atomic Orbitals (AO)

$$ \psi_i = \sum_\mu^{AOs} c_{\mu i}\, \varphi_\mu $$

- the real coefficients $c_{\mu i}$ are the unknowns of the problem

Fundamentals of Quantum Mechanics for Chemistry (4)

Variational Principle

- The electronic energy $E_{elec}$ corresponds to a minimum with respect to each MO $\psi_i$:

$$ \forall i \quad \frac{\partial E_{elec}}{\partial \psi_i} = 0 $$

Hartree-Fock equations

$$ \forall i \quad F \psi_i = \varepsilon_i \psi_i $$

where $F$ is the mono-electronic Fock operator

The Hartree-Fock method (1)

The Fock operator

$$ F(1) = H^c(1) + \sum_j \left[ J_j(1) - K_j(1) \right] $$

- $H^c(1)$ is the one-electron core Hamiltonian

$$ H^c(1) = -\frac{1}{2} \Delta_1 - \sum_K \frac{Z_K}{R_{1K}} $$

- $J_j(1)$ is the Coulomb operator

$$ J_j(1) = \int \psi_j^*(2)\, \frac{1}{r_{12}}\, \psi_j(2)\, d\tau_2 $$

- $K_j(1)$ is the exchange operator

$$ K_j(1)\, \psi_i(1) = \psi_j(1) \int \psi_j^*(2)\, \frac{1}{r_{12}}\, \psi_i(2)\, d\tau_2 $$

The Hartree-Fock method (2)

The Roothaan-Hall equations

- $N$ electrons, $\varphi_\mu$ AO basis set, closed shell, ground state
  - occupied MO = 2 electrons; virtual MO = 0 electrons

$$ \psi_i = \sum_\mu^{AO} c_{\mu i}\, \varphi_\mu $$

- the Hartree-Fock equations can be rewritten:

$$ FC = SC\varepsilon $$

with

- $C$ the matrix of $c_{\mu i}$ coefficients
- $\varepsilon$ the diagonal energy matrix
- $S$ the overlap matrix:

$$ S_{\mu\nu} = \langle \varphi_\mu | \varphi_\nu \rangle $$

The Hartree-Fock method (3)

The Density Matrix

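From the MO coefficients $c_{\mu i}$, it is possible to build a density matrix whose elements are:

$$ P_{\mu\nu} = \sum_j^{MO} n_j\, c_{\mu j}\, c_{\nu j} \quad \text{with } n_j = 0 \text{ or } 2 \text{ (occupation number)} $$

The Fock matrix

$$ F_{\mu\nu} = H^c_{\mu\nu} + \sum_\lambda^{AO} \sum_\eta^{AO} P_{\lambda\eta} \left[ (\mu\nu|\lambda\eta) - \frac{1}{2} (\mu\eta|\lambda\nu) \right] $$

with:

$$ (\mu\nu|\lambda\eta) = \int\!\!\int \varphi_\mu(1)\,\varphi_\nu(1)\, \frac{1}{r_{12}}\, \varphi_\lambda(2)\,\varphi_\eta(2)\, dr_1\, dr_2 $$

The Hartree-Fock energy

$$ E_{elec} = \frac{1}{2} \sum_\mu \sum_\nu P_{\mu\nu} \left[ H^c_{\mu\nu} + F_{\mu\nu} \right] $$

The Hartree-Fock method (4)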

The Hartree-Fock algorithm

1. Compute mono- and bielectronic integrals
2. Build the core Hamiltonian $H^c$ (invariant)
3. Guess an initial density matrix
4. Build the Fock matrix $F$
5. Orthogonal transformation using $S^{1/2}$: $F' = S^{-1/2} F S^{-1/2}$ and $C' = S^{1/2} C$, which gives

$$ F' C' = C' \varepsilon $$

6. Diagonalization of the Fock matrix $F'$: the $C'$ coefficients are obtained
7. Inverse transformation $C' \rightarrow C$
8. Build the new density matrix; back to step 4 unless converged

The Hartree-Fock method (5)
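Steps 5 to 8 of the Hartree-Fock algorithm can be sketched with NumPy as follows. This is only an illustration of the orthogonalization and diagonalization steps, not a full SCF code: the function name `solve_roothaan` and the 2x2 matrices `F` and `S` are made up for the example and do not come from real integrals.

```python
import numpy as np

def solve_roothaan(F, S, n_electrons):
    """Solve F C = S C eps for a closed-shell system (illustrative sketch)."""
    # S^(-1/2) from the eigendecomposition of the symmetric,
    # positive-definite overlap matrix
    s_vals, s_vecs = np.linalg.eigh(S)
    S_inv_sqrt = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Step 5: orthogonal transformation, F' = S^(-1/2) F S^(-1/2)
    F_prime = S_inv_sqrt @ F @ S_inv_sqrt
    # Step 6: diagonalization of F' gives eps and C'
    eps, C_prime = np.linalg.eigh(F_prime)
    # Step 7: inverse transformation, C = S^(-1/2) C'
    C = S_inv_sqrt @ C_prime
    # Step 8: new density matrix, P = 2 * C_occ C_occ^T (n_j = 2 for occupied MOs)
    n_occ = n_electrons // 2
    C_occ = C[:, :n_occ]
    P = 2.0 * C_occ @ C_occ.T
    return eps, C, P

# Made-up matrices, for illustration only
F = np.array([[-1.0, -0.5], [-0.5, -0.8]])
S = np.array([[1.0, 0.4], [0.4, 1.0]])
eps, C, P = solve_roothaan(F, S, n_electrons=2)
```

In a real SCF loop, `F` would be rebuilt from `P` at each iteration (step 4) until the density matrix stops changing. The trace of `P @ S` gives back the number of electrons, which is a convenient sanity check.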

The Hartree-Fock method is an ab initio method

- No (empirical) parameters ⇒ ab initio method
- Orbital approximation ⇒ no electronic correlation

Other ab initio methods (post-Hartree-Fock methods)

- Møller-Plesset Perturbation Theory (MP2, MP4, etc.)
- Configuration Interaction (CI)
- Coupled Cluster (e.g. CCSD(T))
- MultiConfigurational Self-Consistent-Field (MCSCF)

Density Functional Theory (1)

Background

- DFT (Density Functional Theory) methods are (almost) ab initio methods which include electronic correlation at a cost similar to a Hartree-Fock calculation. In most cases, a DFT calculation is even less costly than a HF calculation.
- DFT methods rely on the Hohenberg-Kohn theorem (1964), which states that the ground state energy $E$ of a system is a functional of the electronic density of this system, $\rho(\vec{r})$. Any electronic density $\rho'(\vec{r})$ other than the real electronic density will necessarily lead to a higher energy (variational principle).

Density Functional Theory (2)

A different approach

Unlike wavefunction-based ab initio methods, DFT methods try to find a simple 3-dimensional function $\rho(\vec{r})$ and not a complex 3N-dimensional wavefunction.

$$ \rho : \mathbb{R}^3 \rightarrow \mathbb{R}, \qquad \vec{r} \mapsto \rho(\vec{r}) $$

From the Hohenberg-Kohn theorem, the energy $E$ depends on the electronic density. It is said that $E$ is a functional of the electronic density:

$$ E : \left( \mathbb{R}^3 \rightarrow \mathbb{R} \right) \rightarrow \mathbb{R}, \qquad \rho \mapsto E[\rho(\vec{r})] $$

Density Functional Theory (3)

The Kohn-Sham approach

Let's write:

$$ E[\rho(\vec{r})] = U[\rho(\vec{r})] + T[\rho(\vec{r})] + E_{xc}[\rho(\vec{r})] $$

with:

- $U[\rho(\vec{r})]$ the classical electrostatic energy

$$ U[\rho(\vec{r})] = \underbrace{\sum_A^{nuclei} \int \frac{-Z_A\, \rho(\vec{r})}{|\vec{r} - \vec{R}_A|}\, d\vec{r}}_{\text{electron-nuclei attraction}} + \underbrace{\frac{1}{2} \int\!\!\int \frac{\rho(\vec{r})\, \rho(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|}\, d\vec{r}\, d\vec{r}\,'}_{\text{electron-electron repulsion}} $$

- $T[\rho(\vec{r})]$ is defined as the kinetic energy of a system with the same electronic density $\rho(\vec{r})$ but in which the electrons don't interact

- $E_{xc}[\rho(\vec{r})]$ the rest of the energy: exchange and electronic correlation contributions to the total energy + the difference between $T[\rho(\vec{r})]$ and the real kinetic energy

Density Functional Theory (4)

Kohn-Sham orbitals

Kohn and Sham suggested decomposing the total electronic density into a sum of individual contributions for each electron:

$$ \rho(\vec{r}) = \sum_i^{N_\alpha} \rho_i^\alpha(\vec{r}) + \sum_i^{N_\beta} \rho_i^\beta(\vec{r}) = \sum_i^{N_\alpha} \left| \psi_i^\alpha(\vec{r}) \right|^2 + \sum_i^{N_\beta} \left| \psi_i^\beta(\vec{r}) \right|^2 $$

($\alpha$: spin up; $\beta$: spin down)

$\psi_i^\alpha$, $\psi_i^\beta$: Kohn-Sham molecular orbitals, or "auxiliary" orbitals

Density Functional Theory (5)

The Kinetic Energy operator is defined using Kohn-Sham orbitals

One can then define $T[\rho(\vec{r})]$:

$$ T[\rho(\vec{r})] = \sum_{\sigma=\alpha,\beta} \sum_i^{N_\sigma} \int \psi_i^\sigma(\vec{r}) \left( -\frac{\Delta}{2} \right) \psi_i^\sigma(\vec{r})\, d\vec{r} $$

Be careful: $T[\rho(\vec{r})]$ is not a real functional of the density since it is only defined using Kohn-Sham molecular orbitals (and not using $\rho(\vec{r})$).

Density Functional Theory (6)

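The Kohn-Sham Equations

By applying the variational principle to the ground state energy:

$$ \frac{\partial E[\rho(\vec{r})]}{\partial \rho_i^\alpha(\vec{r})} = \frac{\partial E[\rho(\vec{r})]}{\partial \rho_i^\beta(\vec{r})} = 0 $$

One can find the one-electron Kohn-Sham equations:

$$ h_{KS}\, \psi_i = \varepsilon_i\, \psi_i $$

with $h_{KS}$ the one-electron Kohn-Sham operator

$$ h_{KS} = -\frac{\Delta}{2} - \sum_A \frac{Z_A}{|\vec{r} - \vec{R}_A|} + \int \frac{\rho(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|}\, d\vec{r}\,' + V_{xc} $$

and $V_{xc} = \dfrac{\partial E_{xc}[\rho(\vec{r})]}{\partial \rho(\vec{r})}$ the exchange-correlation potential

Density Functional Theory (7)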

Exchange-correlation functionals

If the "true" exchange-correlation functional were known, the Kohn-Sham equations would give the exact electronic density of a ground state system.

This is not the case ⇒ approximations have to be made ⇒ various exchange-correlation models exist.

First case: local methods

$$ E_{xc} = \int \varepsilon_{xc}(\rho)\, \rho(\vec{r})\, d\vec{r} $$

Second case: non-local (gradient corrected) methods, GGA (Generalized Gradient Approximation)

$$ E_{xc} = \int \varepsilon_{xc}\!\left( \rho, \vec{\nabla}\rho \right) \rho(\vec{r})\, d\vec{r} $$

Density Functional Theory (8)

Exchange-correlation potential models: Local density methods

Usually, the exchange and the correlation contributions are separated:

$$ E_{xc} = E_x + E_c $$

$E_x$:
- LDA: Local Density Approximation (exact exchange energy in a homogeneous electron gas)

$$ E_x = -\frac{3e^2}{4\pi} \int \rho(\vec{r}) \left[ 3\pi^2 \rho(\vec{r}) \right]^{1/3} d\vec{r} $$

$E_c$:
- VWN (Vosko-Wilk-Nusair)
- PZ (Perdew-Zunger)
- PW92 (Perdew-Wang, 1992)
- etc.

Density Functional Theory (9)

Exchange-correlation potential models: Non-local density methods

$E_x$:
- PW86 (Perdew-Wang, 1986)

$$ \varepsilon_x^{PW86} = \varepsilon_x^{LDA} \left( 1 + a x^2 + b x^4 + c x^6 \right)^{1/15} \quad \text{with } x = \frac{|\vec{\nabla}\rho|}{\rho^{4/3}} $$

  and $a$, $b$, and $c$ real parameters
- B88 (Becke 1988)

$$ \varepsilon_x^{B88} = \varepsilon_x^{LDA} - \beta \rho^{1/3} \frac{x^2}{1 + 6 \beta x \sinh^{-1} x} $$

  with $\beta$ an atomic parameter
- PW91
- PBE

$E_c$:
- LYP (Lee-Yang-Parr)
- PW91
- PBE (Perdew-Burke-Ernzerhof)
- P86

$E_{xc}$:
- BP86 = B88 + P86
- BLYP = B88 + LYP
- PBE
- BPW91 = B88 + PW91
- etc.

Density Functional Theory (10)

Exchange-correlation potential models: Hybrid methods

In hybrid methods, the exchange energy contains a fraction of "exact" exchange energy, calculated in the same manner as the Hartree-Fock exchange energy (but using Kohn-Sham orbitals).

Ex.: B3LYP

$$ E_{xc}^{B3LYP} = E_{xc}^{LDA} + a_0 \left( E_x^{HF} - E_x^{LDA} \right) + a_x \left( E_x^{B88} - E_x^{LDA} \right) + a_c \left( E_c^{LYP} - E_c^{LDA} \right) $$

with $a_0 = 0.20$, $a_x = 0.72$, $a_c = 0.81$

Common hybrid methods: B3LYP, PBE0 (also named PBE1PBE)

The QM scaling problem (1)
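The B3LYP mixing formula is a plain linear combination of energy components, which the following sketch makes explicit. The function name `b3lyp_xc` and its argument names are hypothetical, and the numerical inputs in the usage line are arbitrary numbers chosen only to exercise the arithmetic; in practice each component comes from a quantum chemistry code.

```python
def b3lyp_xc(ex_lda, ec_lda, ex_hf, ex_b88, ec_lyp, a0=0.20, ax=0.72, ac=0.81):
    """Combine exchange and correlation energy components with the B3LYP weights."""
    exc_lda = ex_lda + ec_lda  # E_xc^LDA = E_x^LDA + E_c^LDA
    return (exc_lda
            + a0 * (ex_hf - ex_lda)    # fraction of "exact" exchange
            + ax * (ex_b88 - ex_lda)   # B88 gradient correction to exchange
            + ac * (ec_lyp - ec_lda))  # LYP correction to correlation

# Arbitrary illustrative values (not computed energies)
exc = b3lyp_xc(ex_lda=-1.0, ec_lda=-0.1, ex_hf=-1.2, ex_b88=-1.1, ec_lyp=-0.05)
```

Setting the three weights to zero recovers the pure LDA exchange-correlation energy.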

[Figure: wall clock CPU time (seconds) versus number of water molecules for the energy of a water cluster; three panels (3-21G, 6-31G*, and 6-311+G** basis sets), each comparing HF, BLYP, B3LYP, MP2, and CCSD(T).]

- (H2O)n water cluster (n from 1 to 216)
- 1 energy calculation
- G09.B01 (NProcShared=4, Mem=8Gb, MaxDisk=36Gb)
- Wall clock time limit: 1 hour
- Intel(R) Xeon(R) CPU E5620 2.40GHz (8 cores), 32Gb RAM

The QM scaling problem (2)

Theoretical CPU scaling order for different QM methods

QM method       Scaling
semiempirical   O(N^3)
DFT             O(N^3) to O(N^4)
Hartree-Fock    O(N^4)
MP2             O(N^5)
CCSD(T)         O(N^7)
Full CI         O(exp N)

The (H2O)n example: n max in 1/2 hour (4 cores)

Basis set    HF    BLYP   B3LYP   MP2   CCSD(T)
3-21G        216   128    128     32    8
6-31G*       96    96     96      24    4
6-311+G**    32    32     28      16    4

The QM scaling problem (3)

How to solve the QM scaling problem?

- Moore's Law: CPU power doubles every 18 months ⇒ doubling the size of a molecular system is possible:
  - O(N^3) scaling: every 18 x 3 months = 4.5 years
  - O(N^4) scaling: every 6 years
  - O(N^5) scaling: every 7.5 years, etc.
- Parallelism is not a valid option in the long run
  - Good speed-ups are difficult to obtain (Amdahl's Law)
  - non-linear scaling of the "standard" algorithms
  - standard algorithms are not parallel friendly

⇒ change the methods: use approximate quantum methods
  - semiempirical QM methods
  - molecular mechanics (MM) force fields
  - combined QM/MM methods

⇒ change the algorithms
  - Linear scaling algorithms

Semiempirical methods (1)
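The Moore's Law estimate can be reproduced with a few lines of arithmetic: doubling N multiplies the cost of an O(N^k) method by 2^k, which requires k successive doublings of CPU power. The function name `years_to_double_system` is made up for this sketch, and the 18-month doubling period is the figure quoted in the slide.

```python
def years_to_double_system(scaling_order, doubling_period_months=18.0):
    """Years needed before a system twice as large costs the same CPU time."""
    # Doubling N multiplies the cost by 2**scaling_order, i.e. it takes
    # scaling_order successive doublings of CPU power to compensate.
    return scaling_order * doubling_period_months / 12.0

for order in (3, 4, 5):
    print(f"O(N^{order}): {years_to_double_system(order)} years")
```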

They are as old as ab initio methods

- PPP (Pariser-Parr-Pople) method: 1950s
- Extended Hückel method: 1960s
- CNDO: 1960s
- INDO: 1960s
- etc.

A shared assumption

- ab initio (HF) calculations are too time consuming
- the equations are simplified to yield accessible timings for "real" molecules
- some parameters are introduced to correct the loss of information
- these parameters are obtained from experimental data ⇒ empirical parameters (hence the term semiempirical methods)

Semiempirical methods (2)

NDDO: Neglect of Diatomic Differential Overlap

- Most modern semiempirical methods are NDDO based:
  - MNDO (1977)
  - AM1 (1985)
  - PM3 (1989)
  - PDDG/PM3 & PDDG/MNDO (2002)
  - PM6 (2007)
  - and counting...
- They are based on a simplification of the Hartree-Fock equations

Standard QM algorithm (Hartree-Fock)

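Roothaan Equations (closed shells)

Total energy:

$$ E = \frac{1}{2} \sum_\mu \sum_\nu P_{\mu\nu} \left[ H^c_{\mu\nu} + F_{\mu\nu} \right] + \sum_A \sum_{B>A} \frac{Z_A Z_B}{R_{AB}} $$

Density matrix element ($c_{\mu j}$: M.O. coefficients):

$$ P_{\mu\nu} = 2 \sum_j^{occ} c_{\mu j}\, c_{\nu j} $$

Fock matrix element:

$$ F_{\mu\nu} = H^c_{\mu\nu} + \sum_\lambda \sum_\eta P_{\lambda\eta} \left[ (\mu\nu|\lambda\eta) - \frac{1}{2} (\mu\eta|\lambda\nu) \right] $$

Bielectronic integrals:

$$ (\mu\nu|\lambda\eta) = \int\!\!\int \varphi_\mu^*(1)\, \varphi_\nu(1)\, \frac{1}{r_{12}}\, \varphi_\lambda^*(2)\, \varphi_\eta(2)\, dr_1\, dr_2 $$

The Roothaan equations ($\varepsilon$: M.O. eigenvalues; $S$: overlap matrix; $C$: M.O. coefficient matrix):

$$ FC = SC\varepsilon $$

Standard QM algorithm (Hartree-Fock)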

Hartree-Fock SCF algorithm

1. Compute mono- and bielectronic integrals: O(N^4)
2. Build the core Hamiltonian $H^c$ (invariant)
3. Guess an initial density matrix
4. Build the Fock matrix $F$
5. Orthogonal transformation using $S^{1/2}$: O(N^3)

$$ F' C' = C' \varepsilon $$

6. Diagonalization of the Fock matrix $F'$: O(N^3). The $C'$ coefficients are obtained
7. Inverse transformation $C' \rightarrow C$
8. Build the new density matrix: O(N^3); back to step 4 unless converged

Semiempirical methods (3)

NDDO approximations

- Only valence shell electrons are considered ⇒ core electrons are taken into account by reducing the nuclear charges (effective nuclear charge) and by introducing empirical functions to model the interactions between (nuclei + core electrons) and the other particles
- A minimal basis set is used. Usually: a minimal Slater Type Orbital basis set
- ZDO approximation (Zero Differential Overlap): all products between basis functions corresponding to a single electron but centered on different atoms are neglected:

$$ \varphi_\mu^A(i) \cdot \varphi_\nu^B(i) = 0 \quad \text{if } A \neq B $$

Semiempirical methods (4)

Consequences of the ZDO approximation

- The overlap matrix $S$ is equal to unity: $S = I$ ⇒ there is no orthogonalization step in the SCF procedure
- one-electron three-center integrals (two centers for the basis functions and one center for the operator) are considered to be equal to zero
- All three-center and four-center bielectronic integrals are neglected (these are the most numerous integrals) ⇒ the number of integrals scales as O(N^2) (where N is the number of basis functions)
- $\sum_A \sum_{A>B} \dfrac{Z_A Z_B}{R_{AB}}$ in the HF equations is replaced by $\sum_A \sum_{A>B} f_{AB}(R_{AB})$, a parameterized core-core repulsion function

Semiempirical methods (5)

Semiempirical SCF algorithm

1. Compute mono- and bielectronic integrals: O(N^2)
2. Build the core Hamiltonian $H^c$ (invariant)
3. Guess an initial density matrix
4. Build the Fock matrix $F$
5. Diagonalization of the Fock matrix $F$: O(N^3). The $C$ coefficients are obtained
6. Build the new density matrix from $C$: O(N^3); back to step 4 unless converged

Semiempirical methods (6)

PM3 vs. ab initio

[Figure: wall clock CPU time (seconds) versus number of water molecules for the energy of a water cluster, 3-21G basis set vs. PM3; two panels comparing B3LYP, BLYP, CCSD(T), MP2, and HF (all with 3-21G) with PM3, one on a 0 to 3500 s scale and one on a 0 to 100 s scale.]

- (H2O)n water cluster (n from 1 to 216)
- 1 energy calculation
- Gaussian G09.B01
- Wall clock time limit: 1 hour
- Intel(R) Xeon(R) CPU E5620 2.40GHz (8 cores), 32Gb RAM
- 3-21G:
  - NProcShared=4
  - Mem=8Gb
  - MaxDisk=36Gb
- PM3:
  - NProcShared=1

Semiempirical methods (7)

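An example of an NDDO method: MNDO

MNDO: Modified Neglect of Differential Overlap (Dewar & Thiel: 1977)

Nomenclature:
- $A$, $B$: atoms ($A \neq B$)
- $\mu$, $\nu$: atomic orbitals from $A$
- $\lambda$, $\eta$: atomic orbitals from $B$

Fock matrix elements:

$$ F_{\mu\mu} = U_{\mu\mu} - \sum_{B \neq A} Z'_B\, (\mu\mu|s_B s_B) + \sum_\nu^A P_{\nu\nu} \left[ (\mu\mu|\nu\nu) - \frac{1}{2} (\mu\nu|\mu\nu) \right] + \sum_B \sum_{\lambda,\eta}^B P_{\lambda\eta}\, (\mu\mu|\lambda\eta) $$

$$ F_{\mu\nu} = - \sum_{B \neq A} Z'_B\, (\mu\nu|s_B s_B) + \frac{1}{2} P_{\mu\nu} \left[ 3 (\mu\nu|\mu\nu) - (\mu\mu|\nu\nu) \right] + \sum_B \sum_{\lambda,\eta}^B P_{\lambda\eta}\, (\mu\nu|\lambda\eta) $$

$$ F_{\mu\lambda} = \beta_{\mu\lambda} S_{\mu\lambda} - \frac{1}{2} \sum_\nu^A \sum_\eta^B P_{\nu\eta}\, (\mu\nu|\lambda\eta) $$

Semiempirical methods (8)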

Parameters (1)

- $U_{\mu\mu}$ ($U_{ss}$ and $U_{pp}$): these terms represent the one-electron one-center integrals, i.e. the sum of the kinetic energy of one electron in the atomic orbital $\varphi_\mu$ of $A$ and the potential energy of the same electron due to its attraction by the core of $A$ (nucleus + core electrons)
- Coulomb one-center bielectronic integrals $(\mu\mu|\nu\nu)$ are generally noted $g_{\mu\nu}$, while exchange one-center bielectronic integrals $(\mu\nu|\mu\nu)$ are generally noted $h_{\mu\nu}$.

In the MNDO method, the $g_{\mu\nu}$ and $h_{\mu\nu}$ integrals are evaluated from experimental spectroscopic data (Oleari, 1966). There are five one-center bielectronic integrals:

$$ g_{ss} = (ss|ss) \qquad g_{sp} = (ss|pp) \qquad h_{sp} = (sp|sp) $$

$$ g_{pp} = (pp|pp) \qquad g_{pp'} = (pp|p'p') \qquad (pp'|pp') = \frac{1}{2} (pp|pp) - \frac{1}{2} (pp|p'p') $$

Semiempirical methods (9)

Parameters (2)

- one-electron two-center integrals $\beta_{\mu\lambda}$ are computed using the formula:

$$ \beta_{\mu\lambda} = \frac{\beta_\mu^A + \beta_\lambda^B}{2} $$

  where $\beta_\mu^A$ and $\beta_\lambda^B$ are two atomic parameters of the MNDO method.

- two-electron two-center integrals $(\mu\nu|\lambda\eta)$ are computed using a multipole expansion which uses two kinds of parameters, $D_i$ and $\rho_j$, computed from $\zeta$, the Slater atomic orbital exponents

For example, for an element of the second row of the periodic table: 5 parameters, $D_1, D_2, \rho_0, \rho_1, \rho_2$

Semiempirical methods (10)

Parameters (3)

- core-core repulsion functions are computed using the formula:

$$ f_{AB}(R_{AB}) = Z'_A Z'_B\, (s_A s_A|s_B s_B) \left[ 1 + e^{-\alpha_A R_{AB}} + e^{-\alpha_B R_{AB}} \right] $$

  where $\alpha_A$ and $\alpha_B$ are atomic parameters.

- In the case where $(A, B)$ represents a hydrogen bond ($A$ = N or O, and $B$ = H):

$$ f_{XH}(R_{XH}) = Z'_X Z'_H\, (s_X s_X|s_H s_H) \left[ 1 + R_{XH}\, e^{-\alpha_X R_{XH}} + e^{-\alpha_H R_{XH}} \right] $$

  (with $R_{XH}$ in Å)

Semiempirical methods (11)

Parameters (4)

Thus, MNDO semiempirical parameters are defined per atom.

Example: C

- $U_{ss}$, $U_{pp}$, $\zeta$ (in MNDO: $\zeta_s = \zeta_p$)
- $\beta_s$, $\beta_p$, $\alpha$, $D_1$, $D_2$, $\rho_0$, $\rho_1$, $\rho_2$
- $g_{ss}$, $g_{pp}$, $g_{sp}$, $g_{pp'}$, $h_{sp}$

⇒ 16 parameters

Semiempirical methods (12)

Two improvements of the MNDO method: AM1 and PM3

- AM1 (Austin Model 1): 1985 (Dewar et al.)
- PM3 (Parametric Method 3): 1989 (Stewart)

The $s$ and $p$ Slater atomic orbital exponents are now different ($\zeta_s \neq \zeta_p$), and the core-core function is modified:

$$ f_{AB}(R_{AB}) = f_{AB}^{MNDO}(R_{AB}) + \frac{Z'_A Z'_B}{R_{AB}} \left[ \sum_k a_k^A\, e^{-b_k^A (R_{AB} - c_k^A)^2} + \sum_k a_k^B\, e^{-b_k^B (R_{AB} - c_k^B)^2} \right] $$

- $a_k^A$, $b_k^A$, $c_k^A$: atomic parameters
- AM1: $k$ goes from 1 to 4; PM3: $k$ goes from 1 to 2
- In the case of the PM3 method, the $g_{\mu\nu}$ and $h_{\mu\nu}$ integrals are now optimized parameters

In the case of the carbon element: AM1: 29 parameters; PM3: 23 parameters

Semiempirical methods (13)

Determination of the semiempirical parameters

The semiempirical parameters are optimized (= fitted) to reproduce a given set of experimental data from small molecules in the gas phase:

- geometrical structures
- heats of formation ($\Delta H_f$)
- dipole moments
- ionization potentials

MNDO, AM1, PM3, etc. differ because they make use of different semiempirical equations, different numbers of parameters, different numbers of optimized parameters (experimental parameters vs. optimized parameters), and different sets of experimental data.

Semiempirical methods (14)

Advantages and disadvantages of the semiempirical methods

- A lot faster than Hartree-Fock and post-Hartree-Fock methods.
- Electronic correlation is implicitly taken into account through the use of parameters fitted to experimental data.
- Give, when properly used, better results than the Hartree-Fock method.

- The quality of a semiempirical computation depends on the way the semiempirical parameters have been fitted:

  experimental data = small gas phase molecules ⇒ domain of validity for semiempirical methods!

Semiempirical methods (15)

Semiempirical methods (16)

Semiempirical: what is it good for?

- enthalpies, heats of formation
- gas phase geometries (stable structures) of small molecules
- transition state geometries
- frequency calculations
- intermolecular interactions (currently being improved)

Selected publications

- MNDO: Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99, 4899–4907
- AM1: Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902–3909
- PM3: Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209–220
- PDDG: Repasky, M.; Chandrasekhar, J.; Jorgensen, W. J. Comput. Chem. 2002, 23, 1601–1622
- PM6: Stewart, J. J. P. J. Mol. Model. 2007, 13, 1173–1213

Molecular Mechanics (1)

How can we further speed up the calculations?

- In many problems, an accurate description of the electronic wavefunction is not necessary
- This is true when no chemical change occurs during a simulation

⇒ Molecular Mechanics is a simplification of the description of a molecular system at the atomic level, where no explicit electrons are considered

⇒ the energy of a system is then defined solely by the positions of the nuclei (Born-Oppenheimer approximation)

Molecular Mechanics (2)

Quantum Mechanics around the equilibrium structure

1 water molecule

[Figure: the three vibrational modes of a water molecule: symmetric stretch (3657 cm-1), asymmetric stretch (3776 cm-1), and bend (1595 cm-1).]

Deformations around the equilibrium geometry can be modelled using harmonic potentials.

Molecular Mechanics (3)

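Quantum Mechanics around the equilibrium structure

Many water molecules

Interacting water molecules:

- van der Waals contacts:

$$ E_{vdw}^{ij} = \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{R_{ij}} \right)^{12} - 2 \left( \frac{\sigma_{ij}}{R_{ij}} \right)^{6} \right] $$

- electrostatic dipole-dipole interactions, replaced by charge-charge interactions:

$$ E_{elec} = \sum_i \sum_{j<i} \frac{1}{4\pi\varepsilon_0} \frac{q_i q_j}{r_{ij}} $$

Molecular Mechanics (4)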

Force Fields

- Molecular Mechanics (MM) is the application of Newtonian mechanics (classical mechanics) to molecular systems.
- In a molecule, each atom is considered as a point charge
- The point charges interact through a parametrized force field
- A force field is an equation describing all possible interactions in a molecular system, associated with pre-defined parameters: force field = equation + parameters
- In most cases, the connectivity of the system remains constant (⇒ no chemical reaction)

Molecular Mechanics (5)

Molecular Interactions described by a force field

[Figure: schematic of the interactions described by a force field: bond stretching, angle bending, bond rotation (torsion), out-of-plane (improper torsion), non-bonded interactions (electrostatic), and non-bonded interactions (van der Waals).]

Molecular Mechanics (6)

Transferability / Additivity

Molecular Mechanics is based on two main assumptions:

- Transferability: properties of chemical subgroups are similar in small molecules and in large compounds (e.g. a carbonyl C=O group has very similar stretching properties in H2CO and in a 10,000 atom structure)
- Additivity: the effective molecular energy can be expressed as a sum of potentials describing all interactions in the molecular system:
  - van der Waals and electrostatic interactions (non-bonded interactions)
  - bond length and angle deviations, internal torsion flexibility, etc. (bonded interactions)

Molecular Mechanics (7)

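Example of a force field: AMBER

AMBER: general force field for the description of proteins and nucleic acids (DNA, RNA).

$$ E_{pot} = \sum_b^{bonds} \frac{1}{2} k_b (r - r_b)^2 + \sum_a^{angles} \frac{1}{2} k_a (\theta - \theta_a)^2 + \sum_d^{dihedrals} \sum_n \frac{V_n}{2} \left( 1 + \cos(n\omega - \gamma) \right) + \sum_i^{atoms} \sum_{j>i}^{atoms} \left\{ \frac{1}{4\pi\varepsilon_0\varepsilon_r} \frac{q_i q_j}{r_{ij}} + \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] \right\} $$

Molecular Mechanics (8)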

An example using the AMBER force field (ff03): N-methylacetamide

Atom number   Atom name   Residue name   Residue number
1             1HH3        ACE            1
2             CH3         ACE            1
3             2HH3        ACE            1
4             3HH3        ACE            1
5             C           ACE            1
6             O           ACE            1
7             N           NME            2
8             H           NME            2
9             CH3         NME            2
10            1HH3        NME            2
11            2HH3        NME            2
12            3HH3        NME            2

Molecular Mechanics (9)

AMBER atom types and atom charges (ff03): N-methylacetamide

Atom number   Name   Type   Charge
1             1HH3   HC      0.0760
2             CH3    CT     -0.1903
3             2HH3   HC      0.0760
4             3HH3   HC      0.0760
5             C      C       0.5124
6             O      O      -0.5502
7             N      N      -0.4239
8             H      H       0.2901
9             CH3    CT     -0.0543
10            1HH3   H1      0.0627
11            2HH3   H1      0.0627
12            3HH3   H1      0.0627

Molecular Mechanics (10)

AMBER bond types (ff03): N-methylacetamide

Bond     Number   k_b     r_b
CT–HC    3        340.0   1.090
CT–C     1        317.0   1.522
C–O      1        570.0   1.229
C–N      1        490.0   1.335
N–H      1        434.0   1.010
N–CT     1        337.0   1.449
CT–H1    3        340.0   1.090

Molecular Mechanics (11)

AMBER angle types (ff03): N-methylacetamide

Angle       Number   k_a    θ_a
HC–CT–HC    3        35.0   109.50
HC–CT–C     3        50.0   109.50
CT–C–O      1        80.0   120.40
CT–C–N      1        70.0   116.60
O–C–N       1        80.0   122.90
C–N–H       1        50.0   120.00
C–N–CT      1        50.0   121.90
H–N–CT      1        50.0   118.04
N–CT–H1     3        50.0   109.50
H1–CT–H1    3        35.0   109.50

Molecular Mechanics (12)

AMBER dihedral and improper types (ff03): N-methylacetamide

Dihedral     Number   n   V_n     γ
HC–CT–C–O    3        1   0.80    0.0
                      3   0.08    180.0
HC–CT–C–N    3        0   0.00    0.0
CT–C–N–H     1        2   10.00   180.0
CT–C–N–CT    1        2   10.00   180.0
O–C–N–H      1        2   2.50    180.0
                      1   2.00    0.0
O–C–N–CT     1        2   10.00   180.0
C–N–CT–H1    3        0   0.00    0.0
H–N–CT–H1    3        0   0.00    0.0

Improper     Number   n   V_n     γ
H–N–C–CT     1        2   1.1     180.0
O–C–N–CT     1        2   1.1     180.0

Molecular Mechanics (13)

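AMBER van der Waals types (ff03): N-methylacetamide

Atom type   σ_i      ε_i
C           1.9080   0.0860
CT          1.9080   0.1094
H           0.6000   0.0157
HC          1.4870   0.0157
H1          1.3870   0.0157
N           1.8240   0.1700
O           1.6612   0.2100

$$ E_{vdw}^{ij} = \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{R_{ij}} \right)^{12} - 2 \left( \frac{\sigma_{ij}}{R_{ij}} \right)^{6} \right] $$

$$ \sigma_{ij} = \sigma_i + \sigma_j \quad \text{and} \quad \varepsilon_{ij} = \sqrt{\varepsilon_i\, \varepsilon_j} $$

Molecular Mechanics (14)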
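As a worked example of the combination rules, the sketch below builds the pair parameters for two ff03 atom types from the van der Waals table and evaluates the 12-6 energy. The dictionary `VDW` and the function names are made up for this illustration; the numbers are the table values, assumed here to be in Å and kcal/mol (the usual AMBER convention, not stated on the slide).

```python
import math

# (sigma_i, eps_i) per atom type, copied from the ff03 van der Waals table
VDW = {
    "C": (1.9080, 0.0860), "CT": (1.9080, 0.1094), "H": (0.6000, 0.0157),
    "HC": (1.4870, 0.0157), "H1": (1.3870, 0.0157), "N": (1.8240, 0.1700),
    "O": (1.6612, 0.2100),
}

def pair_parameters(type_i, type_j):
    """Combination rules: sigma_ij = sigma_i + sigma_j, eps_ij = sqrt(eps_i * eps_j)."""
    sigma_i, eps_i = VDW[type_i]
    sigma_j, eps_j = VDW[type_j]
    return sigma_i + sigma_j, math.sqrt(eps_i * eps_j)

def e_vdw(type_i, type_j, r_ij):
    """E_vdw = eps_ij * [(sigma_ij / r)^12 - 2 (sigma_ij / r)^6]."""
    sigma_ij, eps_ij = pair_parameters(type_i, type_j)
    x6 = (sigma_ij / r_ij) ** 6
    return eps_ij * (x6 * x6 - 2.0 * x6)

sigma_on, eps_on = pair_parameters("O", "N")
e_min = e_vdw("O", "N", sigma_on)  # energy at the minimum of the well
```

With this form of the potential, the minimum lies at R_ij = σ_ij and its depth is exactly ε_ij, which is why the factor 2 appears in front of the sixth-power term.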

Some usual force fields

- AMBER: Assisted Model Building and Energy Refinement (UCSF); specialized in the modeling of proteins and nucleic acids (DNA, RNA)
- CHARMm: Chemistry at HARvard Macromolecular Mechanics (Harvard, Strasbourg); specialized in the modeling of proteins
- MM2, MM3, MM4: Allinger Molecular Mechanics (UGA); specialized in organic compounds
- MMFF94: Merck Molecular Force Field (Merck Res. Lab.); specialized in organic compounds
- OPLS: Optimized Potentials for Liquid Simulations (Yale)
- AMOEBA: polarizable force field for water, ions, and proteins (WUSTL)
- etc.

Molecular Mechanics (15)

Simulating infinite systems

[Figure]

Molecular Mechanics (16)

Simulating infinite systems

- Periodic Boundary Conditions (PBC): a molecular system is enclosed in a box (the unit cell) and is replicated infinitely in the three space dimensions (the images).

- Minimum Image Convention: only the coordinates of the unit cell are recorded. As an atom leaves the unit cell by crossing the boundary, an image enters to replace it ⇒ the total number of particles is conserved.

Molecular Mechanics (17)

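Long-range electrostatic interactions

- The Coulomb energy in periodic domains (neutral system):

$$ E_{elec} = \frac{1}{2} \sum_{\vec{n}}{}^{'} \sum_i \sum_j \frac{q_i q_j}{|\vec{r}_i - \vec{r}_j + \vec{n}|} $$

  (the prime indicates that the $i = j$ terms are omitted when $\vec{n} = \vec{0}$)

  The sum is conditionally convergent (= slow convergence, if any)

- cut-off: if $r_{ij} > r_{cut\text{-}off}$ ⇒ $\dfrac{1}{r_{ij}} = 0$; non-physical but speeds up computations

Molecular Mechanics (18)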
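The cut-off approach, combined with the nearest periodic image of each pair, can be sketched as follows for a cubic box. This is a minimal illustration under stated assumptions: the function name `coulomb_cutoff_energy` is made up, the box is cubic with edge `box_length`, the prefactor 1/4πε0 is omitted as in the Coulomb sum of the slide, and the cut-off is assumed not to exceed half the box length.

```python
import numpy as np

def coulomb_cutoff_energy(positions, charges, box_length, r_cutoff):
    """Sum of q_i q_j / r_ij over pairs, nearest image only, plain cut-off."""
    positions = np.asarray(positions, dtype=float)
    charges = np.asarray(charges, dtype=float)
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            # Shift each component by a whole number of box lengths so that
            # the separation points to the nearest periodic image of j
            d -= box_length * np.round(d / box_length)
            r = np.linalg.norm(d)
            if r <= r_cutoff:  # pairs beyond the cut-off contribute nothing
                energy += charges[i] * charges[j] / r
    return energy

# Two opposite unit charges near opposite faces of a box of length 10:
# their nearest-image distance is 1, not 9.
e = coulomb_cutoff_energy([[0.5, 0.0, 0.0], [9.5, 0.0, 0.0]], [1.0, -1.0], 10.0, 5.0)
```

The double loop is O(N^2) and only meant to show the logic; production codes use neighbor lists, and replace the plain truncation with Ewald-type sums for the long-range part.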

The Ewald Summation

- The Coulomb sum can be converted into a sum of two absolutely and rapidly convergent series, in direct and reciprocal space.
- This conversion is accomplished by adding to each point charge a Gaussian charge density of opposite sign and same magnitude as the point charge:

$$ \rho_i(\vec{r}) = -q_i\, \alpha^3 \exp(-\alpha^2 r^2) / \sqrt{\pi^3} \qquad (3) $$

  where $\alpha$ is a positive parameter which determines the width of the Gaussians
- This charge distribution screens the interaction between neighbouring point charges ⇒ fast convergence in direct space.
- The sum over the distribution of opposite Gaussian charges converges quickly in reciprocal space, using a Fourier transform.

Molecular Mechanics (19)

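The Ewald Summation

It was demonstrated (by Ewald, 1921) that:

$$ E_{elec} = U^r + U^m + U^0 $$

with $U^r$ the direct sum, $U^m$ the reciprocal sum, and $U^0$ the self-interaction term (which corrects the interactions between the counter charges introduced in the system).

$$ U^r = \frac{1}{2} \sum_{i,j} \frac{q_i q_j}{4\pi\varepsilon_0} \sum_{\vec{n}}{}^{'} \frac{\mathrm{erfc}(\alpha |\vec{r}_i - \vec{r}_j + \vec{n}|)}{|\vec{r}_i - \vec{r}_j + \vec{n}|} \qquad (4) $$

$$ U^m = \frac{1}{2\pi L^3} \sum_{i,j} \frac{q_i q_j}{4\pi\varepsilon_0} \sum_{\vec{m} \neq \vec{0}} \frac{\exp\left( -(\pi \vec{m}/\alpha)^2 + 2\pi i\, \vec{m} \cdot (\vec{r}_i - \vec{r}_j) \right)}{\vec{m}^2} \qquad (5) $$

$$ U^0 = \frac{-\alpha}{\sqrt{\pi}} \sum_i q_i^2 \qquad (6) $$

$\vec{m} = \dfrac{2\pi}{L} \vec{n}$ a reciprocal space vector, $\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = 1 - \dfrac{2}{\sqrt{\pi}} \displaystyle\int_0^x e^{-u^2}\, du$

Molecular Mechanics (20)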

Particle Mesh Ewald (Darden et al., 1993)

- The computation time of the Ewald summation grows as O(N^2), where N is the number of particles in the periodic system.
- To speed up computations, the Particle Mesh Ewald (PME) method was designed. Its computation time grows as O(N log N).
- It is based on the use of a cut-off in direct space and of the Fast Fourier Transform (FFT) in reciprocal space.

Molecular Mechanics (21)

Molecular Dynamics (MD)

- Molecular Dynamics (MD) is the simulation of the behavior of a molecular system over time.
- It is performed by solving Newton's equations of motion:

$$ m_i\, \vec{a}_i = \vec{F}_i $$

  ($m_i$: the mass of the particle; $\vec{a}_i$: its acceleration; $\vec{F}_i$: the external forces acting on it)
- Newton's equations of motion are solved by numerical integration
- $\Delta t$ is the time increment
  - At each $t$, the potential energy and the forces must be computed
  - Positions ($\vec{x}_i$), velocities ($\vec{v}_i$), and forces ($\vec{F}_i$) must be known at each $t$

Molecular Mechanics (22)

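Molecular Dynamics Integrators

There are many ways of solving Newton's equations of motion.

Verlet (velocity form):

$$ r(t + \Delta t) = r(t) + \Delta t\, v(t) + \frac{\Delta t^2\, a(t)}{2} $$

$$ a(t + \Delta t) = \frac{f(t + \Delta t)}{m} $$

$$ v(t + \Delta t) = v(t) + \frac{1}{2} \Delta t \left[ a(t) + a(t + \Delta t) \right] $$

Leapfrog:

$$ v\!\left(t + \tfrac{1}{2}\Delta t\right) = v\!\left(t - \tfrac{1}{2}\Delta t\right) + \Delta t\, a(t) $$

$$ r(t + \Delta t) = r(t) + \Delta t\, v\!\left(t + \tfrac{1}{2}\Delta t\right) $$

$$ a(t + \Delta t) = \frac{f(t + \Delta t)}{m} $$

Molecular Mechanics (23)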
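The Verlet update (in its velocity form) can be tried on a one-dimensional harmonic oscillator, where the exact trajectory is known. This is a minimal sketch: the function names, the unit mass and force constant, and the time step are arbitrary choices made for the illustration.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v for n_steps of length dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + dt * v + 0.5 * dt * dt * a  # r(t + dt)
        a_new = force(x) / mass             # a(t + dt) from the new forces
        v = v + 0.5 * dt * (a + a_new)      # v(t + dt)
        a = a_new
    return x, v

# Harmonic oscillator with k = m = 1 (arbitrary units)
k, m = 1.0, 1.0

def total_energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

With this small time step the total energy stays very close to its initial value, which is the practical check used to validate an integrator and a choice of Δt.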

Conservation of the Energy

If the system is isolated, the total energy is conserved:

$$ m_i \frac{d\vec{v}_i}{dt} = -\frac{dE}{d\vec{x}_i} \quad \text{with} \quad \sum_i \frac{dE}{d\vec{x}_i} = 0 $$

gives

$$ E_{total} = E_{kin.} + E_{pot.} = \text{constant} $$

Molecular Mechanics (24)

Timestep $\Delta t$

- $\Delta t$ must be small enough to ensure the conservation of the total energy
- The larger the $\Delta t$, the fewer energy computations are needed for a given simulation length
- Nyquist-Shannon sampling theorem:

$$ \Delta t \leq 2\pi \sqrt{\frac{\mu}{k}} $$

  with $k$ the strongest force constant in the system and $\mu$ its associated reduced mass
- In practice, $\Delta t \sim 1$ fs (1 fs = $10^{-15}$ s)

Molecular Mechanics (25)
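To see why the practical time step is about 1 fs, note that 2π√(μ/k) is the period of the corresponding harmonic vibration, which can also be obtained from its wavenumber as T = 1/(c ν̃). The sketch below applies this to the fastest water vibration quoted earlier (asymmetric stretch, 3776 cm-1); the function name is made up for the illustration.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def vibrational_period_fs(wavenumber_cm1):
    """Period T = 1 / (c * wavenumber) of a vibration, in femtoseconds."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm1)

period = vibrational_period_fs(3776.0)  # asymmetric O-H stretch of water
steps_per_period = period / 1.0         # number of 1 fs steps per vibration
```

The period is a little under 9 fs, so a 1 fs time step samples the fastest motion with roughly nine points per oscillation.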

The Ergodic hypothesis

- How long should a molecular dynamics simulation be run?
- Ergodic hypothesis: at $t \sim +\infty$, all accessible states have been explored by the system ⇒ it is not possible to wait until $t = +\infty$!
- In practice:
  - 1 year of CPU time = 31.5e6 s
  - typical MD length = 1 to 100 ns

⇒ It is difficult to ensure proper "convergence" (= that all accessible states have been explored)

Molecular Mechanics (26)

Thermodynamic ensembles

- NVE: constant number of atoms (N), constant volume (V), constant energy (E) ⇒ an isolated molecular system in a periodic box
- NVT: constant N, V, and temperature (T) ⇒ the system in the periodic box is coupled to a thermostat of infinite mass (temperature coupling)
- NPT: constant N, T, and pressure (P) ⇒ the system in the periodic box is coupled to a thermostat and a barostat (temperature and pressure coupling) ⇒ the size of the box changes over time
- µVT/µPT: constant V or P, T, and chemical potential (µ) ⇒ multiple phases (at least two); the chemical potential is constant for each phase.

Molecular Mechanics (27)

Properties accessible by MD

Nearly all non-reactive properties are accessible:

- Molecular conformations; static properties (heat of vaporization, radial distribution functions, dielectric constant, etc.)
- Dynamical properties (diffusion constant, transport, etc.)
- Phase change; state change; protein folding
- Molecular recognition; signal
- Free energy changes (solvation, alchemical transformation, thermodynamic cycle, etc.)

QM/MM Methods: Foundations (1)

How to simulate a very large "reactive" molecular system?

Quantum Mechanics
- Description of the behavior of electrons and nuclei
- Allows the breaking and forming of covalent bonds
- CPU time intensive → limited to small systems

Molecular Mechanics
- Atoms = interacting point charges
- Poor description of chemical reactions
- Fast computations → suitable for large systems

QM/MM Methods: Foundations (2)

General Idea

- Partitioning of the total system
- Active part = small number of atoms ⇒ description by Quantum Mechanics (QM) ⇒ the quantum part
- Rest of the system ⇒ description by Molecular Mechanics (MM) ⇒ the classical part
- The MM part acts as a perturbation to the QM part
- The coupling is called a QM/MM method

QM/MM Methods: Foundations (3)

Seminal papers

- Warshel, A.; Levitt, M. J. Mol. Biol. 1976, 103, 227–249
- Singh, U. C.; Kollman, P. A. J. Comput. Chem. 1986, 7, 718–730
- Field, M.; Bash, P.; Karplus, M. J. Comput. Chem. 1990, 11, 700–733

Selected reviews

- Åqvist, J.; Warshel, A. Chem. Rev. 1993, 93, 2523–2544
- Monard, G.; Merz, K. M., Jr. Acc. Chem. Res. 1999, 32(10), 904–911
- Monard, G.; Prat-Resina, X.; González-Lafont, A.; Lluch, J. Int. J. Quant. Chem. 2003, 93(3), 229–244
- Amara, P.; Field, M. J. In Encyclopedia of Computational Chemistry; John Wiley & Sons, Ltd, 2002
- Lin, H.; Truhlar, D. G. Theor. Chem. Acc. 2007, 117, 185–199

QM/MM Methods: Foundations (4)

QM/MM Hamiltonians

H = HQM + HMM + HQM/MM

HQM/MM describes the interactions between the quantum part and the classical part

The QM hamiltonian

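$$
H_{QM} = -\frac{1}{2}\sum_{i}^{e^-}\Delta_i - \sum_{i}^{e^-}\sum_{K}^{\text{nuclei}}\frac{Z_K}{r_{iK}} + \sum_{i}^{e^-}\sum_{j>i}^{e^-}\frac{1}{r_{ij}} + \sum_{K}^{\text{nuclei}}\sum_{L>K}^{\text{nuclei}}\frac{Z_K Z_L}{R_{KL}}
$$

QM/MM Methods: Foundations (5)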

The MM hamiltonian

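$$
H_{MM} = \sum_{b}^{\text{bonds}}\frac{1}{2}k_b(r-r_b)^2 + \sum_{a}^{\text{angles}}\frac{1}{2}k_a(\theta-\theta_a)^2 + \sum_{d}^{\text{dihedrals}}\sum_{n}\frac{V_n}{2}\left(1+\cos(n\omega-\gamma)\right) + \sum_{i}^{\text{atoms}}\sum_{j>i}^{\text{atoms}}\left\{\frac{1}{4\pi\varepsilon_0\varepsilon_r}\frac{q_i q_j}{r_{ij}} + \varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]\right\}
$$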
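The individual terms of the MM Hamiltonian above can be written as small functions. This is a minimal sketch, not the implementation of any particular force field: the function names are hypothetical, and the unit system (kcal/mol, Å, elementary charges, with the Coulomb prefactor 332.0637) is an assumption of the example.

```python
import math

# e^2 / (4 pi eps0) in kcal mol^-1 Angstrom (assumed unit system)
COULOMB = 332.0637


def bond_energy(k_b, r, r_b):
    """Harmonic bond term: 1/2 k_b (r - r_b)^2."""
    return 0.5 * k_b * (r - r_b) ** 2


def angle_energy(k_a, theta, theta_a):
    """Harmonic angle term: 1/2 k_a (theta - theta_a)^2, angles in radians."""
    return 0.5 * k_a * (theta - theta_a) ** 2


def dihedral_energy(terms, omega):
    """Sum over n of V_n/2 (1 + cos(n omega - gamma)); terms is a list of (n, v_n, gamma)."""
    return sum(0.5 * v_n * (1.0 + math.cos(n * omega - gamma)) for n, v_n, gamma in terms)


def nonbonded_pair_energy(q_i, q_j, eps_ij, sigma_ij, r_ij, eps_r=1.0):
    """Coulomb plus 12-6 Lennard-Jones term for one atom pair."""
    coulomb = COULOMB * q_i * q_j / (eps_r * r_ij)
    ratio6 = (sigma_ij / r_ij) ** 6
    lennard_jones = eps_ij * (ratio6 ** 2 - 2.0 * ratio6)
    return coulomb + lennard_jones
```

With the factor 2 in front of the attractive term, the Lennard-Jones part has its minimum, of depth ε_ij, at r_ij = σ_ij, so σ_ij plays the role of the minimum-energy distance in this convention.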

The QM/MM hamiltonian

$$
H_{QM/MM} = \underbrace{-\sum_{i}^{e^-}\sum_{C}^{\text{classical}}\frac{Q_C}{r_{iC}}}_{e^-\text{-charge interactions}} + \underbrace{\sum_{K}^{\text{nuclei}}\sum_{C}^{\text{classical}}\frac{Z_K Q_C}{R_{KC}}}_{\text{nuclei-charge interactions}} + V_{QM/MM}^{\text{van der Waals}}
$$

QM/MM Methods: Foundations (6)

Rewriting the equations in terms of electrostatic and non-electrostatic interactions:

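$$
H = H_{\text{elec}} + H_{\text{non-elec}}
$$

$$
H_{\text{elec}} = \underbrace{-\frac{1}{2}\sum_{i}^{e^-}\Delta_i - \sum_{i}^{e^-}\sum_{K}^{\text{nuclei}}\frac{Z_K}{r_{iK}} + \sum_{i}^{e^-}\sum_{j>i}^{e^-}\frac{1}{r_{ij}}}_{\text{standard equations}} + \underbrace{\sum_{i}^{e^-}\sum_{C}^{\text{classical}}\frac{-Q_C}{r_{iC}}}_{\text{wavefunction polarization by external charges}}
$$

$$
H_{\text{non-elec}} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + \sum_{K}^{\text{nuclei}}\sum_{C}^{\text{classical}}\frac{Z_K Q_C}{R_{KC}} + \sum_{K}^{\text{nuclei}}\sum_{L>K}^{\text{nuclei}}\frac{Z_K Z_L}{R_{KL}} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + V_{QM+QM/MM}^{\text{nuclei}}
$$

QM/MM Methods: Foundations (7)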

QM/MM Implementations

- $H_{\text{elec}}$ and $V_{QM+QM/MM}^{\text{nuclei}}$ can be computed using a standard quantum mechanics code.
- The term describing the electron-classical charge interaction is incorporated into the core Hamiltonian of the quantum subsystem (electrostatic embedding scheme).
- $H_{MM}$ and $V_{QM/MM}^{\text{van der Waals}}$ are computed using a standard molecular mechanics code and are relatively easy to implement.

QM/MM Methods: Foundations (8)
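To make the bookkeeping concrete, the nuclei-classical charge term, the sum over K and C of Z_K Q_C / R_KC, is a plain double loop over QM nuclei and MM point charges. The sketch below assumes atomic units (charges in e, distances in bohr, energy in hartree). The function name and the input layout are hypothetical and not taken from any particular code.

```python
import math


def nuclei_charge_energy(nuclei, charges):
    """Sum of Z_K * Q_C / R_KC over QM nuclei K and classical charges C (atomic units).

    nuclei:  list of (Z_K, (x, y, z)) with positions in bohr
    charges: list of (Q_C, (x, y, z)) with positions in bohr
    """
    energy = 0.0
    for z_k, pos_k in nuclei:
        for q_c, pos_c in charges:
            energy += z_k * q_c / math.dist(pos_k, pos_c)
    return energy
```

For example, a proton at the origin and an oxygen nucleus 2 bohr away along z, facing a single charge of -0.5 e at z = 4 bohr, give -0.5/4 - 8 × 0.5/2 = -2.125 hartree.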

Calibrating QM/MM interactions
- The calibration of the QM/MM interactions is the main problem facing QM/MM methods
- The QM/MM interaction should quantitatively reproduce the interaction between the classical and the quantum parts, as if the whole system were computed quantum mechanically
- This quantitative reproduction depends on three points:

1. The choice of $Q_C$ or, more generally, the choice of the MM force field
2. The choice of the van der Waals parameters used to describe $V_{QM/MM}^{\text{van der Waals}}$
3. The way the classical charges polarize the quantum subsystem

QM/MM Methods: Foundations (9)

The choice of $Q_C$

- $Q_C$ must be chosen to reproduce the electrostatic field exerted by the MM part on the QM part
- It is a good approximation to take the charge definitions from an empirical force field and incorporate those charges into $H_{\text{elec}}$, because MM charges are designed to properly reproduce electrostatic potentials
- However, MM charges can differ greatly between force fields
- No systematic studies so far

QM/MM Methods: Foundations (10)

The choice of the van der Waals components
- Specific sets of van der Waals parameters and potential energy functions should be redefined to properly reproduce non-electrostatic QM/MM interactions
- All these parameters depend on the MM force field ($Q_C$), the QM method, and the basis set

Selected papers
- Small solute in water:
  Freindorf, M.; Gao, J. J. Comput. Chem. 1996, 17, 386–395
  Riccardi, D.; Li, G.; Cui, Q. J. Phys. Chem. B 2004, 108, 6467–6478
- Proteins, nucleic acids:
  Freindorf, M.; Shao, Y.; Furlani, T. R.; Kong, J. J. Comput. Chem. 2005, 26, 1270–1278
  Pentikäinen, U.; Shaw, K. E.; Senthilkumar, K.; Woods, C. J.; Mulholland, A. J. J. Chem. Theory Comput. 2009, 5, 396–410
- Beyond Lennard-Jones:
  Giese, T. J.; York, D. M. J. Chem. Phys. 2007, 127, 194101

QM/MM Methods: Foundations (11)

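Classical charge polarization
- ab initio: similar to the electron-nuclei interaction

$$
H'^{\text{core}} = H^{\text{core}} - \sum_{i}^{\text{electrons}}\sum_{C}^{\text{classical}}\frac{Q_C}{r_{iC}}
$$

$$
H'^{\text{core}}_{\mu\nu} = \langle\mu|H'^{\text{core}}|\nu\rangle = \langle\mu|H^{\text{core}}|\nu\rangle - \sum_{i}\sum_{C}\left\langle\mu\left|\frac{Q_C}{r_{iC}}\right|\nu\right\rangle
$$

QM/MM Methods: Foundations (12)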
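For the ab initio case, the extra one-electron integrals have the same form as nuclear attraction integrals, with Q_C in place of Z_K. As a minimal sketch, and assuming a basis of normalized s-type Gaussians only (a simplification made for this example), each integral has a closed form based on the Boys function F0. All function names are hypothetical and atomic units are assumed.

```python
import math


def boys_f0(t):
    """Zeroth-order Boys function F0(t)."""
    if t < 1e-12:
        return 1.0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def inverse_distance_integral(a, center_a, b, center_b, point):
    """<mu| 1/|r - point| |nu> for two normalized s-type Gaussians with exponents a and b."""
    p = a + b
    norm = (2.0 * a / math.pi) ** 0.75 * (2.0 * b / math.pi) ** 0.75
    ab2 = math.dist(center_a, center_b) ** 2
    # Centre of the Gaussian product of the two basis functions
    product_center = [(a * xa + b * xb) / p for xa, xb in zip(center_a, center_b)]
    pc2 = math.dist(product_center, point) ** 2
    return norm * (2.0 * math.pi / p) * math.exp(-a * b / p * ab2) * boys_f0(p * pc2)


def embed_point_charges(h_core_element, a, center_a, b, center_b, charges):
    """Return Hcore_{mu nu} - sum_C Q_C <mu| 1/r_C |nu> for a list of (Q_C, position)."""
    shift = sum(q_c * inverse_distance_integral(a, center_a, b, center_b, pos_c)
                for q_c, pos_c in charges)
    return h_core_element - shift
```

Far from the basis functions, the integral tends to 1/R, so a distant charge acts on a diagonal element like a classical point-charge potential. A negative MM charge raises the element and a positive one lowers it.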

Classical charge polarization: the special case for semiempirical methods

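Interaction terms at the ab initio and semiempirical levels:

- QM e⁻–nuclei
  ab initio: $\left\langle\mu\left|\frac{-Z_K}{R_{Ki}}\right|\nu\right\rangle$
  semiempirical: $-Z'_K\,(\mu\nu|s_K s_K)$
- QM/MM e⁻–MM charge
  ab initio: $\left\langle\mu\left|\frac{-Q_C}{R_{Ci}}\right|\nu\right\rangle$
  semiempirical: $-Q_C\,(\mu\nu|s_C s_C)$
- QM nuclei–nuclei
  ab initio: $\frac{Z_K Z_L}{R_{KL}}$
  semiempirical: $Z'_K Z'_L\,(s_K s_K|s_L s_L)\,f(R_{KL}) + Z'_K Z'_L\,g(R_{KL})/R_{KL}$
- QM/MM nuclei–MM charge
  ab initio: $\frac{Z_K Q_C}{R_{KC}}$
  semiempirical: many ways...
  - Field, M.; Bash, P.; Karplus, M. J. Comput. Chem. 1990, 11, 700–733
  - Luque, F. J.; Reuter, N.; Cartier, A.; Ruiz-López, M. F. J. Phys. Chem. A 2000, 104, 10923–10931
  - Wang, Q.; Bryce, R. A. J. Chem. Theory Comput. 2009, 5, 2206–2211

QM/MM Methods: Cutting Covalent Bonds (1)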

- Link Atoms
- Connection Atoms
- Local Self Consistent Field
- Generalized Hybrid Orbitals

[Figure: a C–C bond cut at the boundary between the classical part and the quantum part, leaving an incomplete valency]

QM/MM Methods: Cutting Covalent Bonds (2)

Link atom method [1]
- A monovalent atom is added along the X-Y bond: the link atom
- Usually the link atom is a hydrogen, but some implementations use a halogen such as fluorine or chlorine
- Interaction with the MM part? It should interact with the MM part, except for the few closest atoms [2]
- The link atom can be free or constrained along the X-Y bond
- Easiest implementation
- Gives accurate answers as long as it is placed sufficiently far away from the reactive atoms (3-4 covalent bonds)
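For the constrained variant, the link atom can be placed on the X-Y axis at a fixed distance from the QM frontier atom X. The following is a minimal sketch of that geometric step only. The function name and the default distance of 1.09 Å (a typical C-H bond length) are assumptions of this example, and actual implementations may scale the distance differently.

```python
import math


def place_link_atom(qm_atom, mm_atom, bond_length=1.09):
    """Return the position of a link atom on the X-Y bond axis.

    qm_atom:     position of the QM frontier atom X
    mm_atom:     position of the MM frontier atom Y
    bond_length: X-link atom distance, in the same unit as the coordinates
                 (1.09 Angstrom is an assumed default)
    """
    distance = math.dist(qm_atom, mm_atom)
    if distance == 0.0:
        raise ValueError("frontier atoms coincide")
    scale = bond_length / distance
    return tuple(x + scale * (y - x) for x, y in zip(qm_atom, mm_atom))
```

Because the position is a function of the X and Y coordinates only, the link atom adds no independent degrees of freedom in this variant.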

[1] Field, M.; Bash, P.; Karplus, M. J. Comput. Chem. 1990, 11, 700–733
[2] Reuter, N.; Dejaegere, A.; Maigret, B.; Karplus, M. J. Phys. Chem. A 2000, 104, 1720–1735

QM/MM Methods: Cutting Covalent Bonds (3)

Connection atoms [3,4]
- A monovalent pseudo-atom is added at the Y position: the connection atom
- Its behavior mimics that of a methyl group
  - semiempirical: Antes and Thiel, 1999
  - DFT (pseudo-potential): Zhang, Lee and Yang, 1999
- Pro: no supplementary atom (MM: Y atom; QM: connection atom)
- Con: each covalent bond type (C-C, C-N, etc.) needs to be reparametrized

[3] Antes, I.; Thiel, W. J. Phys. Chem. A 1999, 103(46), 9290–9295
[4] Zhang, Y.; Lee, T. S.; Yang, W. J. Chem. Phys. 1999, 110, 46; Zhang, Y. Theor. Chem. Acc. 2006, 116, 43–50

QM/MM Methods: Cutting Covalent Bonds (4)

Local Self Consistent Field [5,6,7]
- The two electrons of the frontier bond are described by a strictly localized bond orbital (SLBO)
- Its electronic properties are considered constant during the chemical reaction
- Using model systems and the MM assumption that bond properties are transferable, it is possible to determine the representation of the SLBO in the atomic orbital basis set of the quantum part
- With this representation frozen, the other QM molecular orbitals, orthogonal to the SLBOs, are generated using a local self-consistent procedure

[5] Théry, V.; Rinaldi, D.; Rivail, J.-L.; Maigret, B.; Ferenczy, G. J. Comput. Chem. 1994, 15, 269–282
[6] Assfeld, X.; Rivail, J.-L. Chem. Phys. Lett. 1996, 263(1–2), 100–106
[7] Monard, G.; Loos, M.; Théry, V.; Baka, K.; Rivail, J.-L. Int. J. Quant. Chem. 1996, 58(2), 153–159

QM/MM Methods: Cutting Covalent Bonds (5)

Local Self Consistent Field
To simplify:
1. The MOs describing the frontier bonds are known (transferable SLBOs extracted from a model system)
2. The other MOs, describing the rest of the quantum fragment, are built orthogonal to the frozen orbitals with a local SCF procedure
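The orthogonality requirement in step 2 can be illustrated on its own. In a non-orthogonal atomic orbital basis with overlap matrix S, a coefficient vector is made orthogonal to a frozen orbital by projecting out its component along that orbital. The sketch below shows only this constraint, not the LSCF procedure itself, and the function names are hypothetical.

```python
def overlap_product(u, s, v):
    """Return u^T S v for coefficient vectors u, v and overlap matrix S (nested lists)."""
    n = len(u)
    return sum(u[i] * s[i][j] * v[j] for i in range(n) for j in range(n))


def project_out_frozen(vector, frozen, s):
    """Remove from `vector` its component along the frozen orbital, in the metric S."""
    weight = overlap_product(frozen, s, vector) / overlap_product(frozen, s, frozen)
    return [v - weight * f for v, f in zip(vector, frozen)]
```

After the projection, the overlap between the frozen orbital and the new vector is zero. Orbitals built from such projected functions stay orthogonal to the frozen SLBO.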

- LSCF is available at the semiempirical and ab initio levels
- Pro: no supplementary atom, proper chemical description of the X-Y bond
- Con: difficult to implement, especially at the ab initio level

QM/MM Methods: Cutting Covalent Bonds (6)

Generalized Hybrid Orbitals [8]
- Extension of the LSCF method
- The classical frontier atom is described by a set of orbitals divided into two sets: auxiliary orbitals and active orbitals
- The latter set is included in the SCF calculation, while the former generates an effective core potential for the frontier atom
- Available at the semiempirical, SCC-DFTB [9] and ab initio [10] levels
- Pros and cons similar to LSCF

[8] Gao, J.; Amara, P.; Alhambra, C.; Field, M. J. J. Phys. Chem. A 1998, 102, 4714–4721
[9] Pu, J.; Gao, J.; Truhlar, D. G. J. Phys. Chem. A 2004, 108, 5454–5463
[10] Pu, J.; Gao, J.; Truhlar, D. G. J. Phys. Chem. A 2004, 108, 632–650

QM/MM Methods: the case of ONIOM (1)

Some peculiar QM/MM methods: ONIOM-like methods

[Figure: size of the system (small: 1; large: 1+2) plotted against the level of computation (low level, high level), with the label "What we would like to model"]

$$
E_{\text{total}} = E^{\text{Low}}_{1+2} + E^{\text{High}}_{1} - E^{\text{Low}}_{1}
$$

QM/MM Methods: the case of ONIOM (2)

Different Approaches
- IMOMM [11]: QM/MM with no inclusion of the MM charges in the QM core Hamiltonian (no QM polarization in the original version)
- IMOMO [12]: QM/QM (low level QM polarization)
- ONIOM [13]: N-layered scheme

$$
E_{\text{total}} = E^{\text{Low}}_{1+2+3} + E^{\text{Medium}}_{1+2} - E^{\text{Low}}_{1+2} + E^{\text{High}}_{1} - E^{\text{Medium}}_{1}
$$
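The two- and three-layer extrapolation formulas are simple sums of separately computed energies. The sketch below assumes that each energy has already been obtained from some external calculation. The function and argument names are hypothetical.

```python
def oniom2_energy(e_low_full, e_high_model, e_low_model):
    """Two-layer extrapolation: E(Low, 1+2) + E(High, 1) - E(Low, 1)."""
    return e_low_full + e_high_model - e_low_model


def oniom3_energy(e_low_full, e_medium_mid, e_low_mid, e_high_model, e_medium_model):
    """Three-layer extrapolation.

    E(Low, 1+2+3) + E(Medium, 1+2) - E(Low, 1+2) + E(High, 1) - E(Medium, 1)
    """
    return e_low_full + e_medium_mid - e_low_mid + e_high_model - e_medium_model
```

A useful sanity check: if all levels of theory coincide, the correction terms cancel pairwise and the extrapolated energy reduces to the energy of the full system.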

Note to Gaussian users: please use the 'EmbedCharge' keyword.

Cutting covalent bonds
- Link atom scheme

[11] Maseras, F.; Morokuma, K. J. Comput. Chem. 1995, 16, 1170–1179
[12] Humbel, S.; Sieber, S.; Morokuma, K. J. Chem. Phys. 1996, 105, 1959
[13] Svensson, M.; Humbel, S.; Froese, R. D. J.; Matsubara, T.; Sieber, S.; Morokuma, K. J. Phys. Chem. 1996, 100, 19357–19363

Availability of QM/MM methods

Commercial and academic software (non exhaustive list)

On the MM side:
- AMBER
- BOSS
- GROMACS + Gaussian/GAMESS/CPMD

On the QM side:
- CP2K
- CPMD (with GROMOS)
- Gaussian09 ONIOM implementation
- NWCHEM
- Qsite

Other software (non exhaustive list)
- ChemShell: a layer on top of other QM and MM software (Daresbury, UK; P. Sherwood)
- Tinker-Gaussian (Nancy, France; X. Assfeld & M. F. Ruiz-López)
- Tinker-Molcas (Marseille, France; N. Ferré)