<<

Advanced Molecular Science: Electronic Structure Theory

Krzysztof Szalewicz et al. Department of Physics and Astronomy, University of Delaware, Newark, DE 19716, USA (Dated: December 17, 2017) Abstract These Lecture Notes were prepared during a one-semester course at the University of Delaware. Some lectures were given by students and the corresponding notes were also prepared by students. The goal of this course was to cover the material from first principles, assuming only the knowledge of standard at advanced undergraduate level. Thus, all the concepts are defined and all theorems are proved. There is some amount of material looking ahead which is not proved, it should be obvious from the context. About 95% of the material given in the notes was actually presented in the class, in the traditional blackboard and chalk manner.

1 CONTENTS

I. Introduction 5 A. Spinorbitals 6 B. Products of complete basis sets 7

II. Symmetries of many-particle functions 7 A. 8 B. 10

III. Separation of nuclear and electronic motion 12 A. Hamiltonian in relative coordinates 13 B. Born-Oppenheimer approximation 15 C. Adiabatic approximation and nonadiabatic correction 16

IV. The independent-particle model: the Hartree-Fock method 18 A. Slater determinant and antisymmetrizer 19 B. Slater-Condon rules 21 C. Derivation of Hartree-Fock equations 22

V. Second-quantization formalism 27 A. Annihilation and creation operators 27 B. Products and commutators of operators 28 C. Hamiltonian and number operator 30 D. Normal products and Wick’s theorem 31 1. Normal-Product 31 2. Contractions (Pairings) 32 3. Time-independent Wick’s theorem 32 4. Outline of proof of Wick’s theorem 33 5. Comprehensive proof of Wick’s theorem 34 6. Particle-hole formalism 37 7. Normal products and Wick’s theorem relative to the Fermi vacuum 38 8. Generalized Wick’s theorem 39 9. Normal-product form of operators with respect to Fermi’s vaccum 40

VI. Density-functional theory 41 A. Thomas-Fermi-Dirac method 42 B. Hohenberg-Kohn theorems 47 C. Kohn-Sham method 49 D. Local density approximation 53 E. Generalized gradient approximations (GGA) 55

2 F. Beyond GGA 57

VII. Variational Method 58 A. Configuration Interaction (CI) method 59 1. Size extensivity of CI 61 2. MCSCF, CASSCF, RASSCF, and MRCI 62 B. Basis sets and basis set convergence 64 C. Explicitly-correlated methods 66 1. Coulomb cusp 66 2. Hylleraas function 67 3. Slater geminals 68 4. Explicitly-correlated Gaussian functions 69

VIII. Many-body perturbation theory (MBPT) 70 A. Rayleigh Schrödinger perturbation theory (classical derivation) 70 B. Hylleraas variation principle 74 C. Møller-Plesset perturbation theory 76 D. Diagrammatic expansions for MPPT 82 1. Diagrammatic notation 82 2. One-particle operator 83 3. Two-particle operators 84 4. Hugenholtz diagrams 86 5. Antisymmetrized Goldstone diagrams 87 6. Diagrammatic representation of RSPT 87 E. Time versions 88 1. Time version of the first kind 88 2. Time version of the second kind 88 F. Connected and disconnected diagrams 89 G. Linked and unlinked diagrams 90 H. Factorization lemma (Frantz and Mills) 92 I. Linked-cluster theorem 94 J. Removal of spin 96

IX. theory 97 A. Exponential ansatz 97 B. Size consistency 98 C. CC method with double excitations 99 D. Equivalence of CC and MBPT theory 108 E. Noniterative triple excitations correction 110 F. Full triple and higher excitations 113

3 X. Linear response theory 114 A. Response function 114 1. Density-density response function 117 2. Calculation of properties from response functions 119 B. Linear response in CC approach 120 1. CC equations 120 2. Hellmann-Feynman theorem 122 3. Linear response CC for static perturbation 124 4. Lambda equations 126

XI. Treatment of excited states 128 A. Excitation energies from TD-DFT 128 B. Limitations of single-reference CC metods 131 C. The equation-of-motion coupled-cluster method 131 D. Multireference coupled-cluster methods 133

XII. Intermolecular interactions 136 A. Symmetry-adapted perturbation theory 137 B. Asymptotic expansion of interaction energy 141 C. Intermolecular interactions in DFT 142

XIII. Diffusion Monte Carlo 143

XIV. Density-matrix approaches 146 A. Reduced density matrices 148 B. Spinless density matrices 150 C. N-representability 151 D. functional theory 154 E. Contracted Schrödinger equation 156

XV. Density matrix renormalization group (DMRG) 159 A. Singular value decomposition 159 B. SVD applications 161 C. DMRG 162 D. Expectation values and diagrammatic notation 163 E. Matrix product ansatz 165 F. DMRG algorithm 166 G. DMRG in practice 167 H. Dynamic correlation and excited states 167 I. Applications to atoms and molecules 167 J. Limitations 168

4 I. INTRODUCTION

The subject of these lecture notes will be methods of solving Schödinger’s equation for atoms, molecules, biomolecular aggregates,

and solids. Schrödinger’s equation provides very accurate description of most types of matter under most conditions, where by “most" we will understands the materials and conditions on the Earth. Exceptions include materials that include heavy atoms where relativistic effects have to be accounted for and high-precision measurements where not only the relativistic but also (QED) effects play a role. We will restrict our attention to solutions of Schrödinger’s equation. Incorporation of relativistic and QED effects can be achieved by a fairly straightforward extension of the methods discussed here. We will also restrict our attention to systems built of and nuclei treated as point particles. Thus, we will not consider phenomena which involve nuclear reactions. However, many of the methods discussed here are used in theoretical nuclear physics.

For systems with up to 4 electrons and 1 to 3 nuclei, one can now solve Schrödinger’s equation almost to any desired precision, although for the most complicated systems of this type it requires huge amounts of computer resources. Some of the methods used in such calculations, such as the variational method with explicitly correlated functions (i.e., functions depending explicitly on -electron distances), will be briefly discussed here, but we will devote most of the time to systems that are larger and for which such methods are not applicable. The difficulty of solving Schrödinger’s equation for systems with 5 and more electrons originates from dimensionality of the problem. For example, the benzene molecule contains 12 nuclei and 42 electrons, so that Schrödinger’s equation for this system is 162-dimensional. Thus, this equation can be solved only by making approximations (although quantum diffusion Monte Carlo methods which will be considered later on do solve such equations “almost" directly). The main approximation applied is many-particle (or many-electron or many-body) expansion. Therefore, most of the material covered here belongs to the branch of physics called many-body physics. The concept of many-particle expansion is based on the observation that in a many-particle system the most important interactions are those involving only two particles. This leads to several method hierarchical in the number of particle interaction considered.

The particles that we will consider almost almost exclusively will be electrons. Many- particle theories applied to bound states of such particles are known as electronic structure theory. The reasons for using the word “structure" are uncleared, but probably relates to shell structure of atoms and orbital picture of molecules.

5 A. Spinorbitals

1 Electrons are of spin 2 . The wave function of a single electron depends on the space coordinate r = [x,y,z] with each variable in the range (−∞,∞) and on the ± 1 spin coordinate s which takes only values 2 . Therefore, the wave function for a single electron, called spinorbital, can be written in the form of the so-called spinor " # ψ (r) = + ↔ ψ(r,s) ≡ ψ(x). ψ (r) − where the ψ+ component is the amplitude of finding the electron with spin projection, ~ the eigenvalue of the operator Sz, equal to /2 (“spin up") whereas the ψ component − which is the amplitude of finding this particle with spin projection −~/2 (“spin down"). Note that an electron in the state is in general in a mixed spin state. It will be more convenient to use the other form of the wave function shown in the eqution above 1 ψ (r) = ψ(r, ) + 2 1 ψ (r) = ψ(r,− ). − 2 This form is particularly convenient to use in the expectation values of operators (matrix elements) involving spinorbitals. In the one-electron case X Z hψ|f φi = d3rψ(x)f (r)φ(x), s= 1/2,1/2 − i.e., we sum over spin variable and integrate over the three space variables. Since f does not contain spin operators, the sum over the spin degrees of freedom can be computed immediately. In most cases we will consider pure spin states, i.e., states with the property that either

ψ+(r) or ψ (r) is zero. For example, − " # ψ (r) = + 0

is the spinorbital with spin projections ~/2 or “spin alpha (α)" state, whereas the other option is called “spin beta (β)" state. In such cases, the pure-spin spinorbitals can be denoted as

ψ+(r) and ψ (r) or ψ(r)α(s) and ψ0(r)β(s) or ψα(r) and ψβ(r), − where α(1/2) = β(−1/2) = 1 and α(−1/2) = β(1/2) = 0 and the spatial part is called

the orbital. Note the ψ+(r) and ψ (r) are now different spinorbitals, whereas before − they were components of a single spinorbital. We also continue using the symbol ψ(x)

6

assuming that ψ(x) either describes pure spin alpha or is zero. One sometimes uses s=1/2 a somewhat confusing notation where spinorbital and orbital are denoted by the same symbol, so that we have, for example, ψ(x) = ψ(r)α(s). The meaning of this symbol should be obvious from the context. If ψ and φ represent the same spin projection, they are simultaneously nonzero at either s = 1/2 or −1/2 and zero at the opposite value, so that the spin summation can be performed and leaves only the spatial integral. If spin projections are opposite, at each value of s one of the spinorbitals is zero, so that the result of summation over spins is zero. The same becomes more transparent in the alpha/beta notation, for example, X Z Z hψα|f φαi = α2 d3rψ(r)f (r)φ(r) = d3r ψ(r)f (r)φ(r) s= 1/2,1/2 − and obviously for opposite spins one gets zero.

B. Products of complete basis sets

One of main theorems used in the many-body theory tells that a complete basis set in the space of a many-particle functions can be formed as a product of complete basis sets of single-particle functions. Let us show that this is the case on the simplest example { } of a function of two variables. Lets assume that gi(x) i∞=1 is a complete basis set of { } one variable. Then the set of products gi(x)gj(y) is a complete set in the space of two variables, i.e., any functions f (x,y) can be expanded in this set

X∞ f (x,y) = cijgi(x)gj(y). i,j=1 To see that this is indeed the case, consider the function f (x,y) at a fixed value of x denoted by x0. Since f (x0,y) is just a function of a single variable, we may write X f (x0,y) = dj(x0)gj(y). j

However, taken at different values of x0, dj(x0) is just a single-variable function and can be expanded in our basis X dj(x) = cijgi(x) i which proves the theorem.

II. SYMMETRIES OF MANY-PARTICLE FUNCTIONS

Since electrons are fermions, the electronic wave functions have to be antisymmetric. This chapter will show how to achieve this goal. The notion of antisymmetry is related

7 to permutations of electrons’ coordinates. Therefore we will start with the discussion of the permutation group.

A. Symmetric group

The permutation group, known also under the name of symmetric group, it the group of all operations on a set of N distinct objects that order the objects in all possible ways.

The group is denoted as SN (we will show that this is a group below). We will call these operations permutations and denote them by symbol σi. For a set consisting of numbers 1, 2, ..., N, the permutation σi orders these numbers in such a way that k is at jth position. Often a better way of looking at permutations is to say that permutations are all mappings of the set 1, 2, ..., N onto itself: σi(k) = j, where j has to go over all elements. The number of permutations is N! Indeed, we can first place each object at positions 1, so there are N possible placements. For each case, we can place one of the remaining N −1 objects at the second positions, so that the number of possible arrangements is now N(N − 1). Continuing in this way, we prove the theorem. For three numbers: 1, 2, 3, there are the following 3! = 6 arrangements: 123, 132, 213, 231, 312, 321. One can use the following “matrix" to denote permutations: ! 1 2 ... k ... N σ = σ(1) σ(2) ... σ(k) ... σ(N)

The order of columns in the matrix above is convenient, but note that if the columns were ordered differently, this will still be the same permutation. An example of a permutation in this notation is ! 1 2 3 4 σ = 3 4 1 2

We define the operation of multiplication within the set of permutations as (σ ◦σ 0)(k) = σ(σ 0(k)). For example, if ! ! 1 2 3 4 1 2 3 4 σ = σ = 1 3 4 1 2 2 2 4 3 1 then ! 1 2 3 4 σ ◦ σ = . 2 1 3 1 2 4 We can now check if these operations satisfy the group postulates ◦ ∈ • Closure: σ σ 0 SN . The proof is obvious since the product of permutations gives a number from the set, therefore is a permutation.

8 • Existence of unity I: this is the permutation σ(k) = k.

1 1 • Existence of inverse, i.e., for each σ there exists σ − such that σ ◦ σ − = I. Clearly, 1 the inverse can be defined such that if σ(k) = j, then σ − (j) = k.

• Multiplications are associative:

◦ ◦ ◦ ◦ σ3 (σ2 σ1) = (σ3 σ2) σ1.

Proof is in a homework problem.

One important theorem resulting from these definitions is that the set of products of a single permutation with all elements of SN is equal to SN ◦ σ SN = SN .

Proof: Due to closure, the only possibility of not reproducing the whole group is that two

different elements of SN are mapped by σ onto the same element:

σ ◦ σ 0 = σ 000 = σ ◦ σ 00.

1 Multiplying this equation by σ − , we get σ 0 = σ 00 which contradicts our assumption. { 1} 1 Another theorem states that σ − = SN . This is equivalent to saying that σ and σ − are in one-to-one correspondence. Indeed, assume that there are two permutations that ◦ ◦ 1 are inverse to σ: σ1 σ = I = σ2 σ. Multiplying this by σ − from the right, we get that σ1 = σ2. One important property of permutation is that each permutation can be written as a product of the simplest possible permutations called transpositions. A transposition is a permutation involving only two elements:    σ(i) = j !  1 2 ... i ... j ... N τ = τij = (ij) =  σ(j) = i = .  1 2 ... j ... i ... N  σ(k) = k for k , i,j

To prove that any permutation can be written as a product of transpositions, we just construct such a product. For a permutation σ written as ! 1 2 ... k ... N σ = (1) i1 i2 ... ik ... iN { } first find i1 in the set 1,2,...,N and then transpose it with 1 (unless i1 = 1, in which case do nothing). This maps i1 in 1. Then consider the set with i1 removed, find i2, and transpose it with 2. Continuing in this way, we get the mapping of expression (1) which proves the theorem. The decomposition of a permutation into transposition is

not unique as we can always add τijτij = 1. Although the number of transpositions in

9 a decomposition is not unique, this number is always either odd or even for a given permutation. The proof of this important theorem is given as a homework. Thus, (−1)πσ , − where πσ is the number of permutations in an arbitrary decomposition, is always 1 or 1 for a given permutation and we can classify each permutation as either odd or even. We say that each permutation has a definitive parity. πσ π 1 One theorem concerning the parity of permutations is that that (−1) = (−1) σ− , i.e., that a permutation and its inverse have the same parity. This results from the fact that each transposition is its own inverse.

B. Determinant

The fundamental zeroth-order approximation for the wave function in theory of many fermions is Slater’s determinant. Thus, we have to study the concept of determinant. For × A a general N N matrix with elements aij, the determinant is defined as

a a ... a 11 12 1N a a ... a X |A| ≡ A 21 22 2N − πσ det = = ( 1) aσ(1)1aσ(2)2 ... aσ(N)N (2) ...... σ

aN1 aN2 ... aNN where the sum is over all permutations of numbers 1 to N and πσ is the parity of the permutation. There are several important theorems involving that we will now prove. First, let us show that that |A| = |AT |, which also means that the definition (280) can be written as X |A| − πσ = ( 1) a1σ(1)a2σ(2) ... aNσ(N). (3) σ

To prove this property, first consider σ(i) = 1. There must be one such aiσ(i) in each term in formula (3). Denote this value of i in a given term by i1 and move ai11 to the first position in the product

a1σ(1)a2σ(2) ... ai 1 ... aNσ(N) = ai 1a2σ(2) ... ai 1σ(i 1) ai +1σ(i +1) aNσ(N) 1 1 1− 1− 1 1

Next, look for σ(i) = 2 = σ(i2) and move ai22 it the second position in the product. Continuing, one eventually gets

a1σ(1)a2σ(2) ... aσ(N)N = ai11ai22 ... aikk ... aiN N . (4)

The set i1,i2,...,iN is a permutation σ˜: σ˜(k) = ik. Note that σ˜ , σ in general. Also, the permutations σ˜ originating from different terms in expansion (3) are all different. This is ◦ 1 so since from σ(ik) = k and ik = σ˜(k) it follows that σ(σ˜(k)) = k = (σ σ˜)(k). Thus, σ˜ = σ − . Therefore, if we sum all possible terms on the right-hand side of Eq. (4), we sum over

10 { 1} all permutations of SN (as shown earlier, σ − = SN ). The only remaining issue is the 1 sign. The sign is right since we have proved that the parity of σ and σ − is the same. This completes the proof. The next important theorem says that if one interchanges two columns (or rows) in a determinant, the value of the determinant changes sign

|A | −|A| i j = ↔ A where i j denotes a matrix with such interchange. The proof is as follows. We can ↔ assume without loss of generality that i < j. Denote: n o A { } A = akl i j = akl0 ↔

akl = akl0 if l , i,j (5)

aki = akj0 , akj = aki0 (6) |A| | A | Therefore, in the expansions of and i j , we can identify identical terms, modulo ↔ sign. Pick up a term in the expansion of |A|

− πσ ··· ··· ··· ( 1) aσ(1)1aσ(2)2 aσ(i)i aσ(j)j aσ(n)n where σ is here some fixed permutation of 1,2,···n. To find the corresponding term in | A | the expansion of i j ↔ X | A | − πσ˜ ··· ··· ··· i j = ( 1) aσ0˜(1)1 aσ0˜(2)2 aσ0˜(i)i aσ0˜(j)j aσ0˜(n)n ↔ σ˜ we should choose: σ˜(k) = σ(k) for k , i,j since, due to (5), aσ0 (k)k = aσ(k)k if k , i,j. Analogously, σ˜(i) = σ(j) and σ˜(j) = σ(i) since, due to (6),

aσ0˜(j)j = aσ˜(j)i = aσ(i)i, where the second equality results from our assumption σ˜(j) = σ(i), and, similarly,

aσ0˜(i)i = aσ˜(i)j = aσ(j)j.

This can be done for all n! terms in |A| so that there is one to one correspondence between terms, modulo sign. Since ( ) σ(k) k , i,j  ◦  σ˜(k) = ◦ = σ τij (k) (σ τij)(k) k = i or j

(if k , i,j, τij has no effect), the permutations σ˜ and σ differ by one transposition and therefore (−1)πσ = −(−1)πσ˜ , which proves the theorem.

11 Another theorem states that if a column of a matrix is a linear combination of two (or more) column matrices, the determinant of this matrix is equal to the linear combination of determinants, each containing one of these column matrices:

|A a b c | |A a b | |A a c | ( j = β + γ ) = β ( j = ) + γ ( j = ) . (7)

The proof follows from the fact that the definition of determinant implies that each term in the expansion (280) contains exactly one element from each column and each row.

Thus, each term contains the factor βbi + γci and can be written as a sum of two terms. Pulling the coefficients in front of determinants proves the theorem. One more theorem which is the subject of a homework is that the determinant of a product of two matrices is the product of determinants: |AB| = |A||B|. This theorem can be used to prove that the determinant of a unitary matrix, i.e., a matrix with the UU I property † = , where the dagger denotes a matrix which is transformed and complex conjugated, is a complex number of modulus 1. Indeed

 T  2 1 = |UU†| = |U||U†| = |U|| U ∗ | = |U||U∗| = |z|

where we used the theorem about the determinant of a transformed matrix. Finally, a homework problem shows that the determinant of A can be computed using the so-called Laplace’s expansion X X |A| − i+j |M | − i+j |M | = ( 1) aij ij = ( 1) aij ij . i j M A where the matrix ij is obtained from matrix by removing the ith row and jth column.

III. SEPARATION OF NUCLEAR AND ELECTRONIC MOTION

For a molecule consisting of K particles, nuclei and electrons, the Hamiltonian is

K 2 K X ~ X qiqj H = − ∇2 + (8) 2m Ri |R − R | i=1 i i

12 the gain is not as dramatic since the number of electrons in molecules containing heavier atoms is much larger than the number of nuclei. Nevertheless, this separation is always performed since it easier to solve equations that concern (i.e., electrons) than several different kinds. The separation of nuclear and electronic motion is a good approximation since a nucleus is at least about√ 2000 times heavier than an electron and therefore the former particles move about 2000 times slower. Thus, as the slow nuclei move, the fast electrons follow them and their distribution around nuclei is not much different than in the case of stationary nuclei. Such separation of motions is called the adiabatic approximation. In the case of molecules, we more often uses the so- called Born-Oppenheimer (BO) approximation which is a further simplification of the adiabatic one. The BO approximation, called also the clamped-nuclei approximation, just means that electrons move in the field of nuclei clamped in space. The solutions of the clamped-nuclei Schrödinger’s equation are called the electronic states. In many cases, one has to go beyond the adiabatic approximation. This is needed for small molecules when one needs to get very accurate results or for any size molecules in certain regions of nuclear configurations where the adiabatic approximation breaks down do to strong interactions between energetically close electronic states. One usually starts from the adiabatic approximation and solves equations that couple the electronic and nuclear motions in a perturbative fashion, computing in this way the so-called nonadiabatic effects.

A. Hamiltonian in relative coordinates

To simplify notation, let us restrict our attention to diatomic molecules with nuclear

masses M1 and M2. A generalization to molecules with more nuclei is straightforward. Let Ri, i = 1,2, denote the coordinate of the two nuclei, whereas the coordinates of the N electrons will be denoted by r˜i, all coordinates still in a space-fixed system. Now introduce the center of mass (CM)

 N  1  X  R = M R + M R + m r˜ , CM M  1 1 2 2 e i i=1 where me is the mass of an electron and M = M1+M2+Nme the total mass of the molecule, and relative coordinates 1 R = R − R r = r˜ − (R + R ). 1 2 i i 2 1 2 We have chosen to measure electronic positions from the geometric center of nuclei. Another possible choice is to measure them from the center of nuclear mass. To transform the Hamiltonian (8), we have to perform some chain-rule differentiations

13 → corresponding to the following change of variables: [R1,R2,r˜1,...,r˜N ] [RCM,R,r1,...,rN ] ∂ ∂ ∂X ∂ ∂X X ∂ ∂x M ∂ ∂ 1 X ∂ = CM + + i = 1 + − , ∂X ∂X ∂X ∂X ∂X ∂x ∂X M ∂X ∂X 2 ∂x 1 CM 1 1 i i 1 CM i i ∂ M ∂ ∂ 1 X ∂ = 2 − − , ∂X M ∂X ∂X 2 ∂x 2 CM i i ∂ m ∂ ∂ = e + . ∂x˜i M ∂XCM ∂xi Now second derivatives

 2 2  2 2 2 ∂ M1 ∂ ∂ 1 X ∂  M1 ∂ ∂ M1 ∂ X ∂ ∂ X ∂ = + +   +2 − − , 2 M 2 ∂X2 4  ∂x  M ∂X ∂X M ∂X ∂x ∂X ∂x ∂X1 ∂XCM i i CM CM i i i i

 2 2  2 2 2 ∂ M2 ∂ ∂ 1 X ∂  M2 ∂ ∂ M2 ∂ X ∂ ∂ X ∂ = + +   −2 − + , 2 M 2 ∂X2 4  ∂x  M ∂X ∂X M ∂X ∂x ∂X ∂x ∂X2 ∂XCM i i CM CM i i i i ∂2 m 2 ∂2 ∂2 m ∂ ∂ = e + + 2 e . 2 M 2 2 M ∂X ∂x ∂x˜i ∂XCM ∂xi CM i Plug the derivatives in the kinetic energy part of the Hamiltonian

 2 ~2 2 ~2 2 ~2 ~2 ~2 M1 ∂ ∂ X ∂  ∂ ∂ ∂ X ∂ T = − − −   − + x 2 M2 2 2M ∂X2 8M  ∂x  M ∂X ∂X 2M ∂X ∂x ∂XCM 1 1 i i CM CM i i ~2 ∂ X ∂ + 2M ∂X ∂x 1 i i  2 ~2 2 ~2 2 ~2 ~2 ~2 M2 ∂ ∂ X ∂  ∂ ∂ ∂ X ∂ − − −   + + 2 M2 2 2M ∂X2 8M  ∂x  M ∂X ∂X 2M ∂X ∂x ∂XCM 2 2 i i CM CM i i ~2 X − ∂ ∂ 2M ∂X ∂x 2 i i ~2 Nm ∂2 ~2 X ∂2 ~2 ∂ X ∂ − e − − , 2 M2 2 2m 2 M ∂X ∂x ∂XCM e i ∂xi CM i i Terms 4 and 10 cancel, so do terms 5, 11, and 15. Terms 1, 7, and 13 can be added together and the masses in the numerators add to M. We therefore now get

 2 2 2 2 2 2 2 2 2 ! ~ ∂ ~ ∂ ~ X ∂ ~ X ∂  ~ 1 1 ∂ X ∂ T = − − − −   + − , x 2M 2 2µ ∂X2 2m 2 8µ  ∂x  2 M M ∂X ∂x ∂XCM e i ∂xi i i 1 2 i i where 1/µ = 1/M1 + 1/M2. Since the CM coordinates appear only in the first term, the center of mass motion can be separated. After adding the terms in the other two

14 directions, the remaining Hamiltonian, expressed only in relative coordinates, can be written as

 2 ~2 ~2 X ~2 X ~2 X 2 2   1 H = − ∇ − ∇ −  ∇r  − ∇R · ∇r + V, 2µ R 2m ri 8µ  i  2 µ i e i i a i where we denoted 1 − 1 = 1 and V denotes the second term in the Hamiltonian (8). M2 M1 µa Since this term contains only interparticle distances, it is uneffected by the transformation.

B. Born-Oppenheimer approximation

The Hamiltonian can be divided into two parts

H = H0 + H0 (9) ~2 X H = − ∇2 + V (10) 0 2m ri e i  2 ~2 ~2 X ~2 X 2   1 H0 = − ∇ −  ∇r  + ∇R · ∇r . (11) 2µ R 8µ  i  2 µ i i a i

The Hamiltonian H0 is called the electronic Hamiltonian since it acts only on electronic coordinates. It is also called clamped-nuclei Hamiltonian since it describes the system if

H0 is neglected (and H0 becomes zero if nuclear masses go to infinity so that nuclei do not move, are clamped in space). Such approach is called the BO approximation. The electronic Schrödinger equation is   ~2 X  2  − ∇ + V ψ(r ,...,r ;R) = E(R)ψ(r ,...,r ;R).  2m ri  1 N 1 N e i

Since the equation is different for each internuclear separation R = |R|, the wave function and the energy depend parametrically on R. We use the word “parametrically" to emphasize that R is not a variable in the electronic Schrödinger equation, but the equation has to be solved separately for each value of R that is of interest. For molecules with more than two nuclei, the electronic wave function depends parametrically on the positions of all nuclei. Despite the name “clamped-nuclei" approximation, one solves for nuclear motion in the BO approximation. To do this, one assumes the exact wave function to be a product of the electronic wave function and of a function of R

≈ Ψ (r1,...,rN ;R) ψ(r1,...,rN ;R)f (R).

15 Next, approximate H0 by its first term and plug this function into the approximate Schrödinger’s equation   ~2 ~2 X  2 2  − ∇ − ∇ + V ψ(r ,...,r ;R)f (R) = Eψ(r ,...,r ;R)f (R).  2µ R 2m ri  1 N 1 N e i The function f can be pulled out from the second and third term of the Hamiltonian. We make now one more approximation and neglect the terms resulting from the action of the first term on ψ. Then we can write   ~2 ~2 X 2  2  − ψ(r ,...,r ,R)∇ f (R)+f (R)− ∇ + V ψ(r ,...,r ,R) = Eψ(r ,...,r ;R)f (R). 2µ 1 N R  2m ri  1 N 1 N e i and integrate over electron coordinates assuming hψ|ψi = 1 for all R. We then get " ~2 # − ∇2 + E(R) f (R) = Ef (R). 2µ R Thus, the electronic energy becomes the potential energy surface for the motion of nuclei.

C. Adiabatic approximation and nonadiabatic correction

The BO approximation discussed above can be obtained from a more rigorous procedure that originates from the exact solutions of the Schrödinger equation for all particles. We can expand such a solution in complete basis sets in electronic and nuclear coordinates using the theorem discussed earlier X X X ˜ ˜ Ψ (r1,...,rN ,R) = cijψi(r1,...,rN )gj(R) = ψi(r1,...,rN ) cijgj(R) ij i j X ˜ ˜ = ψi(r1,...,rN )fi(R) i where the second, equivalent form is more convenient to use. However, since we want to use the solutions of the electronic Schrödinger equation rather than some arbitrary complete basis set, our expansion becomes X Ψ (r1,...,rN ,R) = ψj(r1,...,rN ;R)fj(R). (12) j One can view this expression as using a different complete basis set for each R. We now insert the expansion (12) into Schrödinger’s equation (with CM separated), multiply by ψi(r1,...,rN ;R), and integrate over electronic coordinates. Let’s work out the first term in the operator H0: ~2 X   ~2 Xh     i − hψ | ∇2 ψ if = − hψ | ∇2 ψ if + hψ |ψ i∇2 f + 2hψ | ∇ ψ i · ∇ f , 2µ i R j j 2µ i R j j i j R j i R j R j j j (13)

16 where the parentheses inside integrals indicate that differentiations with respect to R are performed only inside the parentheses. Similarly, for the third term we get   2 ~ X  X  − hψ |∇R · ∇r ψ if  = (14) 2µ i  k j j a j k     2 ~ X  X  X  − hψ |∇R · ∇r ψ if + hψ | ∇r ψ i · ∇Rf . 2µ  i  k j j i k j j a j k k (15)

The sum of the first term in Eq. (13), of the matrix element of second operator in Eq. (11), and of the first term in Eq. (14) can be written as

 2   ~2 X ~2 X X ~2 X X  2      − hψ | ∇ ψ if − hψ | ∇r  ψ if − hψ |∇R · ∇r ψ if 2µ i R j j 8µ i  k  j j 2µ i  k j j j j k a j k X ˜ = Hij0 fj (16) j

˜ where Hij0 are the matrix elements of H0 between the electronic wave functions with H0 ˜ interpreted in such a way that it does not act outside the integrals. i.e., Hij0 are simple functions of R. With this definition, we can write Schrödinger’s equation as

~2 2 X − ∇ f (R) + E (R)f (R) − Ef (R) + H˜ 0 (R)f (R) 2µ R i i i i ij j j ~2 X   ~2 X X − hψ | ∇ ψ i · ∇ f (R) − hψ | ∇ ψ i · ∇ f (R) = 0 (17) µ i R j R j 2µ i rk j R j j a j k

where we used the orthonormality of electronic wave functions for each R to obtain the first three terms. The last two terms will be written as   X 2 1   1  B (R) · ∇Rf (R) = −~  hψ | ∇Rψ i + hψ | ∇r ψ i · ∇Rf (R). ij j µ i j 2µ i k j  j a k

We will now show that Bii(R) = 0 for real electronic functions (one can always choose electronic functions to be real, for proof see Shankar p. 177). This is because we have

h | i h | i h | i h | i 0 = ∇R ψi ψi = ∇Rψi ψi + ψi ∇Rψi = 2 ψi ∇Rψi

so that the first term is zero. The second term is zero since it is proportional to the expectation value of the momentum operator. The latter value is zero since for real wave function the probability of finding momentum P and −P is the same (in one dimension

17 ipx/~ 2 ipx/~ 2 |he− |ψi| = |he |ψi| and this result generalizes to any number of dimensions). Now we can move the off-diagonal to the right-hand side, getting " ~2 # 2 Xh i − ∇ + E (R) + H˜ 0 (R) − E f (R) = − H˜ 0 f (R) + B (R) · ∇ f (R) (18) 2µ R i ii i ij j ij R j j,i Note that this equation is still equivalent to Schrödinger’s equation. This set of coupled equations can be solved directly for very small molecules, but usually one solves it perturbatively, treating the right-hand side as a perturbation. The last form of Schrödinger’s equation is appropriate for making the approximations discussed above. Since usually the off-diagonal matrix elements are smaller than diagonal one, one obvious approximation is to neglect the right-hand side. This gives the adiabatic

approximation. The resulting equation for fi(R) differs from the BO equation by the term ˜ Hii0 (R) which is called the adiabatic or diagonal correction. Thus, the BO approximation differes from the adiabatic approximation by this correction. The adiabatic equation is of the same degree of difficulty as the BO equation since in each case nuclei move on a potential energy surface. Since the diagonal correction is usually small, in most current calculations it is neglected. The adiabatic approximation fails when potential two energy surfaces E (R) and E (R) i i0 become close to each other. Clearly, in such cases some off-diagonal matrix elements are not significantly smaller compared to diagonal ones since there are two electronic wave functions which are similar. In such cases, one has to include at last the off-diagonal matrix elements that couple these states.

IV. THE INDEPENDENT-PARTICLE MODEL: THE HARTREE-FOCK METHOD

Our problem to solve it the time-independent Schrödinger equation with the Hamil- tonian

N N N N ~2 X Xnuc X Z e2 1 X e2 Hˆ = − ∇2 − a + (19) 2m i |r − R | 2 |r − r | i=1 a=1 i=1 i a i

where m denotes electron’s mass, e electron’s charge, N is the number of electrons, Nnuc is the number of nuclei, Za is the charge of nucleus a, ri are positions of electrons, Ra are positions of nuclei. Note that this Hamiltonian is the same as the Hamiltonian defined by Eq. (10) except that we neglected the nuclear-nuclear repulsion terms. These terms give just a constant in any type of electronic structure approach and this constant can be simply added to the final result. We also dropped the subscript “0" since this will be the only Hamiltonian considered from now on. Despite the simplification of eliminating nuclear degrees of freedom, the solution of the clamped-nuclei Schrödinger’s equation for even simple molecules, such as the

18 water molecule with 10 electrons and 30 spatial degrees of freedom, appears as an impossible task. The main idea for simplifications that may come to mind is to solve such equation one electron at a time, which is then a 3-dimensional problem. In the most straightforward approach, this would mean that one neglects all interactions between electrons in the Hamiltonian (19). With such an approximation, the problem rigorously separates into N one-electron problem when the wave function is written as a product of one-electron functions. However, this straightforward independent-particle model works poorly. In particular, when an electron in a molecule or solid is far from a nucleus, it does not see an object of charge Za since other electrons screen the nuclear charge. There were several efforts at the beginnings of quantum mechanics to scale nuclear charges to account for the screening. One step further is to include in the one-electron equation an interaction with an electron cloud representing an average of the electron positions, leading to a family of mean-field methods. It turns out there is a rigorous and systematic way of achieving the best possible representation of the mean field, called the Hartree-Fock (HF) method. The wave function in this method is an antisymmetrized product of one-electron functions and the method still requires solving only one-electron equations, however, the set of equations is coupled.

A. Slater determinant and antisymmetrizer

The wave function in the HF method is written in the form of Slater determinant

x x x φk1 ( 1) φk1 ( 2) ... φk1 ( N )

1 φk (x1) φk (x2) ... φk (xN ) Ψ (x ,x ,...,x ) = √ 2 2 2 (20) 1 2 N ... N! x x x φkN ( 1) φkN ( 2) ... φkN ( N ) { } where xi = ri,si denotes the spatial and spin coordinates of ith electron and single- x electron functions φki ( j) are called spinorbitals. The spin variable takes on the values ± 1 2 . We will use only pure-spin spinorbitals which means that a given φi(x) has to be 1 − 1 zero at s = 2 and nonzero at s = 2 or vice verse. The orbitals form a complete set of one-electron functions and the subset included in the Slater determinant is an arbitrary subset of such spinorbitals. Of course, as it follows from properties of determinants, all spinorbitals have to be different, otherwise the determinant is zero. As we will show soon, the normalization factors assures that the determinant is normalized to 1 if the set of spinorbitals is orthonormal. The determinant can also be written as the result of the action of an operator A called antisymmetrizer 1 X A = (−1)πσ P (21) N! σ σ

19 where the sum is over all N! permutations Pσ of N electrons. Since Pσ acts now on electron coordinates, we call it an operator. The normalization factor assures that the antisymmetrizer is idempotent, i.e., A2 = A. This can be seen from the following X 2 1 π π A = (−1) σ (−1) σ0 P ◦ P . (N!)2 σ σ 0 σ,σ 0 Consider a fixed value of σ. The product of P with all operators P is equal to the set σ σ 0 of all SN operators. Thus, as we sum over σ, we get N! times the set of all permutation operators. Thus, it will be equal to A if the signs are right. This is the case since if we write P = P ◦ P , and expand P and P into products of transpositions, we see that σ 00 σ σ 0 σ σ 0 the number of transpositions in P is the sum of the numbers of transpositions in P and σ 00 σ P . σ 0 Acting with A on the product of spinorbitals, we get √   Ψ x x x A x x x ( 1, 2,..., N ) = N! φk1 ( 1)φk2 ( 2) ... φkN ( N )

we indeed get the Slater determinant since the antisymmetrizer realizes the definition of the determinant. Let us now prove that the Slater determinant is normalized if the spinorbitals are normalized: XZ h | i 3 φi φj = φi∗(x)φj(x)d r = δij s where we defined the bracket notation that will be used for integrals from now on. Notice that the bracket includes summation over the spins. This summation runs over 1 − 1 s = 2 and s = 2 . If i = j, the spinorbital is nonzero at one of the values of s. Two spinorbitals may have the same spatial part, but differ by spin. Then for each value of s one of the spinorbitals is zero, which satisfies orthonormality. Other pairs of different spinorbitals can be orthogonal already due to different spins or/and due to orthogonality of spatial components (usually one assumes, however, that different spatial parts are always orthogonal). The overlap integral can be written as     hΨ |Ψ i hA x x x |A x x x i = N! φk1 ( 1)φk2 ( 2) ... φkN ( N ) φk1 ( 1)φk2 ( 2) ... φkN ( N ) where the brackets denote integral over space and spin coordinates of all electron. Since A is obviously Hermitian and we have shown that it is idempotent, we can move it to the ket getting     hΨ |Ψ i h x x x |A x x x i = N! φk1 ( 1)φk2 ( 2) ... φkN ( N ) φk1 ( 1)φk2 ( 2) ... φkN ( N ) .

Consider first the identity permutation in A. For this term, the integral separates into the product of N one-electron integrals with the integrand in each one-electron integral

20 being the square modulus of a spinorbital. Thus, each such integral is 1. Now consider the term such that electron 1 is permuted with 2:     h x x x | x x x i φk1 ( 1)φk2 ( 2) ... φkN ( N ) φk1 ( 2)φk2 ( 1) ... φkN ( N ) .

h x | x i h x | x i Now, we get two integrals that are zero: φk1 ( 1) φk2 ( 1) and φk2 ( 2) φk1 ( 2) . Thus, this contribution is zero. Clearly, any permutation of electrons in the ket leads to zero A 1 h | i term. Thus, reduces to N! I and Ψ Ψ = 1.

B. Slater-Condon rules

Matrix elements of the Hamiltonian (19) with Slater determinants can be written in terms of matrix elements between spinorbitals using the so-called Slater-Condon rules. Let us define the operators

N ~2 Xnuc Z e2 h(i) = − ∇2 − a (22) 2m i |r − R | a=1 i a XN Fˆ = h(i) (23) i=1 N N 1 X e2 1 X Gˆ = = g(ij) (24) 2 |r − r | 2 i

ˆ where we introduced short-hand notation replacing xi by i. The rules for the operator F are

XN h | ˆ i Ψ FΨ = hii (25) i=1 h | ˆ i Ψ FΨ 0 = hik (26)

hΨ |FˆΨ 00i = 0 (27)

where Ψ denotes the determinant built from the set of spinobitals φ1, φ2, ... φN , Ψ 0 differs from Ψ by replacement of the spinorbital φi by the spinobital φk, k > N, and Ψ 00 includes two such replacements. We have also introduced a short-hand notation for h | i the integrals, i.e., hik = φi hφk . The rules are actually valid for any set of orbitals, but we will need them only for the set specified. The proof of Eq. (25) is as follows. Similarly as we did when proving the normalization of Slater determinants, we can move the antisymmetrizer, only now from ket to bra. We have to first commute it with the operator Fˆ. This is possible since this operator is symmetric, i.e., does not change if we permute any electrons in it. Thus, we can pull Fˆ

21 through the antisymmetrizer. Then using the Hermiticity and idempotency of A, we get X h | ˆ i hA | i Ψ FΨ = N! (φ1(1)φ2(2) ... φN (N)) h(i)(φ1(1)φ2(2) ... φN (N)) i hA | = N! (φ1(1)φ2(2) ... φN (N)) [h(1)φ1(1)φ2(2) ... φN (N)

+ φ1(1)h(2)φ2(2) ... φN (N) i +φ1(1)φ2(2) ... h(N)φN (N)]

We can see that similarly as in proof of normalization, any permutation of the electron in A 1 the bra will lead to a zero integral. Thus, reduces to N! I which proves Eq. (25). In the case of Eq. (26), we will have in the ket one spinorbital, φk, which is orthogonal to all spinorbitals in the bra. Thus, the integral involving this spinorbital can be nonzero only if h acts on it. Moreover, in the bra one has to permute the electrons in such a way that spinorbital φi is the function of the same electron as φk since φi is orthogonal to all spinorbital in the bra. Thus, only a single permutation in the bra survives, which proves Eq. (26).

In the case of Eq. (27), we will have in the ket two spinorbitals, φk and φk, which are absent in bra. Only one of them can be acted upon by h, so that the other spinorbital will always make the integrals zero, which proves Eq. (27). The analogous formulas involving Gˆ are (proofs in a homework):

N 1 X   hΨ |GˆΨ i = g − g (28) 2 ijij ijji i,j=1 XN h | ˆ i  −  Ψ GΨ 0 = gijkj gijjk (29) j=1 h | ˆ i − Ψ GΨ 00 = gijkl gijlk (30)

hΨ |GˆΨ 000i = 0 (31) where Ψ 000 denotes a triply substituted Slater determinant and

X e2 h | i 3r 3r x x x x gijkl = φiφj gφiφj = d 1d 2φi∗( 1)φj∗( 2)| − |φk( 1)φl( 2). r1 r2 s1,s2 " C. Derivation of Hartree-Fock equations

The Hartree-Fock methods seeks the Slater determinant Ψ which minimizes the expectation value of the Hamiltonian h | i HF Ψ HΨ ≥ E = min E0. Ψ hΨ |Ψ i

22 The Ritz variational principle guarantees that EHF is always greater or equal to the exact ground-state energy E0 of a given system. Since Ψ is built from spinorbitals, the method finds the optimal spinorbitals for the ground state of a system and can be considered to be the ultimate mean-field method. We will always assume that the spinorbitals are orthonormal, so that Ψ is normalized and we can write

EHF = minhΨ |HΨ i. Ψ Using the Slater rules for the Hamiltonian (19), the expectation value can be written as N N X 1 X   hΨ |HΨ i = h + g − g ii 2 ijij ijji i=1 i,j=1 where we still assume that the set of spinorbitals is enumerated by 1,...,N and we included i = j term in the second sum since the two terms add to zero in this case. To find the minimum, we have to vary each orbital. Since the orbitals are complex, one has to vary both the real part and the imaginary part. Equivalently, one can vary the orbital and its complex conjugate → → φi φi + δφi φi∗ φi∗ + δφi∗.

We will start from varying ψi∗’s only and we will find that this is sufficient to obtain a solvable set of equations. One can then check that varying ψi’s gives an equivalent set of equations. Since we assumed that the spinorbitals are orthonormal, we have to imposed h | i the condition ψi ψj = δij during the optimization. This can be done by adding to the expectation value the the condition multiplied by Lagrange’s undetermined multipliers. Thus, we will minimize XN L h | i − h | i −  = Ψ HΨ λij φi φj δij . i j=1 ≤

Replacing all φi∗ by φi∗ + δφi∗, we get X 1 X     L[{φ + δφ }] = hφ + δφ |hφˆ i + h(φ + δφ ) φ + δφ |gˆ φ φ − φ φ i i∗ i∗ i i i 2 i i j j i j j i i ij X − h | i −  λij φi + δφi φj δij i j ≤ X 1 X   = hφ |hφˆ i + hφ φ |gˆ φ φ − φ φ i i i 2 i j i j j i i ij X 1 X   1 X   + hδφ |hφˆ i + hδφ φ |gˆ φ φ − φ φ i + hφ δφ |gˆ φ φ − φ φ i i i 2 i j i j j i 2 i j i j j i i ij ij X − h | i λij δφi φj i j ≤

23 where the first two terms give the value of the functional at the minimum and we have used the fact that the spinorbitals are orthonormal at the minimum so there is no orthonormality term in this part. We have also omitted the terms that are products

of orbital increments as they are of second order. The term with δφj can be shown to be equal to the preceding term 1 X   1 X   1 X   hφ δφ |gˆ φ φ − φ φ i = hδφ φ |gˆ φ φ − φ φ i = hδφ φ |gˆ φ φ − φ φ i 2 i j i j j i 2 j i j i i j 2 i j i j j i ij ij ij where in the first step with interchanged coordinates of electron 1 and 2 in the integral and in the second step we interchanged the summation indices. We can now write XXZ L { } HF { } 3 × [ φi∗ + δφi∗ ] = E [ φi∗ ] + d r1δφi∗(x1) i s1    XXZ   X  ˆ 3 − −  h(r1)φi(x1) + d r2φ∗(x2)gˆ(r1,r2) φi(x1)φj(x2) φj(x1)φi(x2) λijφj(x1)  j  j s2 j

Since at the minimum the linear increment has to be zero for an arbitrary δφi∗, this can be only achieve if the whole expression in the large square bracket is equal to zero for any

x1 XXZ   X ˆ 3 − h(r1)φi(x1) + d r2φj∗(x2)gˆ(r1,r2) φi(x1)φj(x2) φj(x1)φi(x2) = λijφj(x1). j s2 j

These are the Hartree-Fock equations for spinorbitals. Let us rewrite this equation introducing the so-called Coulomb and exchange operators X XXZ ˆ ˆ 3 J(r1) = Jj(r1) = d r2φj∗(x2)gˆ(r1,r2)φj(x2) j j s2 X XXZ ˆ ˆ 3 K(r1)φ(x1) = Kj(r1)φ(x1) = d r2φj∗(x2)gˆ(r1,r2)φ(x2)φj(x1) j j s2

where φ(x1) is an arbitrary spinorbital. Note that while Jˆ is just a multiplicative operator, Kˆ is an integral one since it integrates over the function it acts upon. Notice that the operators Jˆ and Kˆ do not depend on spin but act on functions including spin coordinate. Using these operators, one can rewrite the HF equations as h i X ˆ ˆ ˆ h(r1) + J(r1) + K(r1) φi(x1) = λijφj(x1). (32) j

Equations (32) are the set of N equations for N spinorbitals φi depending also on N(N + 1)/2 Lagrange’s multipliers. It is possible to transform these equations to the so- called canonical form where only the diagonal multipliers are present. To achieve this

24 goal, we will use the important theorem stating that Slater determinants are invariant under unitary transformations of spinorbitals (i.e., the determinant is the same when expressed in the original and in the transformed spinorbitals). Let us denote the trans- formed set of spinorbitals by φi0, so that

    φ φ  1   10       φ2   φ20   =   → 0 =   = U      ...   ...      φN φN0

UU I with † = . The Slater determinant built of transformed spinorbitals can be written as

P P P i u1iφi(1) i u1iφi(2) ... i u1iφi(N) P P P i u2iφi(1) i u2iφi(2) ... i u2iφi(N) Ψ 0 = φ0 (x1)φ0 (x2)...φ0 (xN ) = 1 2 N ...... P P P i uNiφi(1) i uNiφi(2) ... i u2iφi(N)   φ (1) φ (2) ... φ (N)  1 1 1     φ2(1) φ2(2) ... φ2(N)  = U  = |U|Ψ    ......    φN (1) φN (2) ... φN (N) where |...| denote a determinant and [...] a matrix and where we started to use short- ≡ hand notation xi i. Since the determinant of a unitary matrix is a complex number of modulus 1, |U| is just a multiplicative phase factor which is irrelevant. If we apply the unitary transformation to the HF equations, i.e., replace all spinobitals by transformed spinorbitals, this substitution has to be made also in the Jˆ and Kˆ operators. Let us now show that these operators are invariant under such transformation. For Jˆ we have

XXZ ˆ { } 3 J[ φi0 ] = d r2(φj0 )∗(2)gˆ(r1,r2)φj0 (2) j s2    XXZ X X 3    = d r gˆ(r ,r ) u∗ φ∗(2) u φ (2) 2 1 2  ji i  jk k  j s2 i k    XXZ X X 3    = d r gˆ(r ,r ) u φ∗(2) u φ (2) 2 1 2  ij i  jk k  j s2 i k Z X 3 X X = d r2gˆ(r1,r2) φi∗(2)φk(2) uji∗ ujk. s2 i,k j

25 The last sum is the same as in U†U = I, so that XZ X ˆ { } 3 J[ φi0 ] = d r2gˆ(r1,r2) φi∗(2)φk(2)δik s2 i,k XXZ 3 ˆ { } = d r2gˆ(r1,r2)φi∗(2)φi(2) = J[ φi ], i s2 i.e., the Coulomb operator is indeed invariant under the unitary transformation of orbitals. An analogous proof holds for Kˆ . Let us write HF equations in matrix form h i hˆ(r) + Jˆ(r) + Kˆ (r)  =  and rewrite them in transformed form h i hˆ(r) + Jˆ(r) + Kˆ (r) U†0 = U†0 and multiply this equation from left by U to get h i hˆ(r) + Jˆ(r) + Kˆ (r) UU†0 = UU†0.

Since matrix  is symmetric, we can always find a unitary transformation that diagonalizes it, getting hˆ i   h(r) + Jˆ(r) + Kˆ (r) 0 = diag 0  Denoting the diagonal elements of diag by i, we can write the canonical HF equations as h i ˆ ˆ ˆ h(r) + J(r) + K(r) φi(x) = iφi(x) i = 1,2,...,N. These equations are sometimes called pseudo-eigenvalue equations since the operators ˆ ˆ J and K depend on all φi’s. The equations have to be solved iteratively, i.e., one first assumes some initial orbitals (for example, for atoms these can be the hydrogenic orbitals), solves the resulting eigenvalue problem, computes new operators Jˆ and Kˆ using the spinorbitals just obtained, and so on. The convergence of iteration can sometimes be a problem and several methods have been developed to deal with such problems. The most often used method of solving HF problems is to expand the spinorbitals in

terms of some known basis functions χi X φi = cijχj. j

Then Hartree-Fock equations become matrix pseudo-eigenvalue equations, sometimes called Hartree-Fock-Roothaan or Hartree-Fock-Roothaan-Hall equations. The basis functions can be in particular atomic orbitals and such approach is then called the linear combination of atomic orbitals (LCAO) method. This approach can be used not only to

26 compute energies for atoms, molecules, and solids, but also as an interpretative tool. In particular, it can be used to interpret the chemical bond in molecules. In the simplest case of a diatomic molecules if one restricts restricts linear combinations to pairs of orbitals with similar energy, one of the two molecular orbital energies resulting from such a pair is lower than either energy whereas the other one is higher (this fact is not obvious). Thus, if the spinorbitals corresponding to the lower level are occupied (such spinorbitals are called the bonding ones), there is a gain in energy. This picture can be also applied to solids where the orbital energies corresponding to different pairs of atoms will be very close to each other, forming the so-called bands. The relations between the highest filled or partly filled band and the lowest empty band determine whether a solid is an insulator, semiconductor, or conductor. However, for conductors the HF method encounters several problems.

V. SECOND-QUANTIZATION FORMALISM

In many-particle theory, one often uses the so-called formalism of , i.e., we express formulas in terms of operators that create or annihilate particles at some spinorbitals. The name of the formalism is somewhat misleading. It results from the first use of such operators to quantize electromagnetic field. This happened in the late 1920s, after the formulation of Schrödinger’s equation which was the first quantization, i.e., the quantization of particles. Since we will consider only particles, this leads to a kind of oxymoron: we will use second-quantization tools in first quantization approach.

A. Annihilation and creation operators

Slater determinants can be described using occupation number representation:

√ ( h i n = 1 if i ∈ {k ,k ,...,k } Ψ A x x x ↔ | i i 1 2 N = N! φk1 ( 1)φk2 ( 2) ... φkN ( N ) n1, n2,..., ni,... ni = 0 otherwise

where we assumed that spinorbitals form an ordered set. Conventionally, we order spinorbitals coming from the HF method according to their orbital energies (for degenerate spinorbitals, the order is arbitrary). To have a unique relation to the standard way of writing the Slater determinant, we assume that that the spinorbitals on the left-hand side are also ordered. The number of positions in the occupation number representation is infinite, but we can display explicitly only the sequence up to the highest occupied orbital. For example, √ A ↔ | i Ψ = 5! [φ2(x1)φ7(x2)φ9(x3)φ10(x4)φ11(x5)] 0100001011100 ...

27 The creation operator can now be defined   i i  a |... 0 ...i = (−1)σ |... 1 ...i a |Ψ i =  i† i†  i  | i ai† ... 1 ... = 0

where σ denotes the number of ‘1’s before the position i. In words, if spinorbital φi is absent in |Ψ i, this orbital is added at the ith place. If this orbital is present, the action of

ai† gives zero. The phase factor (−1)σ is needed in order to create exactly the same determinant as | i would be built from the set of the spinorbitals present in Ψ plus the spinorbital φi. To | i see that the phase is correct, first add φi as the first row to Ψ . This gives us a uniquely defined determinant, but it may differ by phase from the standard determinant. We can then permute this row with other rows until it arrives at ith position. This gives the phase factor (−1)σ . We can use the creation operators to define the ground state determinant of a system with N electrons: | i | i Φ0 = a1†a2† ... aN† vac . A homework exercise will show the order chosen gives the correct ground-state deter- minant. In the case of a closed-shell system (e.g., atoms with complete shells occupied), this definition gives a unique determinant. For open-shell systems where the highest occupied spinorbital is degenerate, several determinants can be created. In most cases, a linear combination of such determinants is required to form a wave function with proper symmetry properties. The annihilation operator is defined analogously to the creation operator

 i i  | i − σ | i | i  ai ... 1 ... = ( 1) ... 0 ... ai Ψ =  i ,  | i  ai ... 0 ... = 0

so the we may consider this action as first permuting φi until it arrives at the first row and then annihilating it.

B. Products and commutators of operators

If several creation operators act in sequence, the order is important since these operators do not commute. Let us find the commutation rule by acting with a pair of creation operators on a state with zero electrons. We call such a state true vacuum and denote by |vaci. We get (assume without loss of generality that i < j):

j i j | i | ··· | i ai†aj† vac = ai† ... 1 = ... 1 ... 1 ...

28 i i j | i | i −| i aj†ai† vac = ai† ... 1 ... = ... 1 ... 1 ...

Thus, since ai†ai† = 0 from the definition, we have h i a†a† = −a†a† or a†,a† = 0, i j j i i j + where [a,b]+ = ab +ba, is the anticommutator. Thus, the creation operators anticommute.

We have considered here the action of ai†aj† on the vacuum state only, but one can easily see that the result is the same for any determinant since if either φi or φj is included in the determinant, we get zero. In the opposite case, reasoning is the same as for the vacuum case. Clearly, the commutation rule for annihilation operators is analogous to that for the creation operators h i a ,a = 0. i j +

To show this, we have to act on a determinant containing both φi and φj and the minus sign results analogously to the creation operators case. Let us now find our commutation rules for products of creation and annihilation operators ( 0 if i < Ψ a†a |Ψ i = . i i |Ψ i if i ∈ Ψ Note the phase factors cancel: (−1)σ (−1)σ = 1. Analogously,

( ∈ | i 0 if i Ψ aia† Ψ = . i |Ψ i if i < Ψ

Thus, in the anticommutator, one of the two terms will always reproduce |Ψ i, so that h i a†,a = 1. i i + Finally, consider (again assuming i < j)   0 if j < Ψ or i ∈ ψ | i  a†aj Ψ = i j . i  σ  (−1) j (−1)σi |... 1 ... 0 ...i otherwise   0 if j < Ψ or i ∈ ψ | i  aja† Ψ = i j i  σ +1  (−1) j (−1)σi |... 1 ... 0 ...i otherwise where the additional power of −1 results from the fact that the number of occupied states increases by one due to the action of ai†. Thus, the action in the opposite orders gives the same result times −1, so that we have h i a†,a = δ . i j + ij

29 C. Hamiltonian and number operator

Since linear combinations of creation and annihilation operators can be used to construct an arbitrary wave function in the Hilbert space, one can use such linear combination to construct various operators. Let us first construct a simple operator called the occupation number operator

ˆ Ni = ai†ai

As shown above, this operator acting on any determinant gives 0 if spinorbital i is absent and recovers the determinant when it is present. We may say therefore that the eigenvalues of this operator are 0 and 1 and these eigenvalues are occupation numbers | i for φi in Ψ ˆ | i | i Ni Ψ = ni Ψ . We can now construct the number operator

X ˆ ∞ ˆ N = Ni i=1 which gives X ˆ | i ∞ | i | i N Ψ = ni Ψ = N Ψ . i=1 Thus, this operator “detects" the number of electrons in |Ψ i. Let us now consider the one-electron part of the Hamiltonian

XN ˆ ˆ F = f (ri). i=1

We postulate that Fˆ can be written as

X ˆ ∞ F = fijai†aj i,j=1

h | ˆ i where, as before, fij = φi f φj . To prove this expression, it is sufficient to show that the matrix elements of this Hamiltonian with arbitrary determinants are the same as those resulting from the Slater-Condon rules

X X X h | ˆ i ∞ h | i Ψ FΨ = fij Ψ ai†ajΨ = fijδij = fii i,j=1 i,j Ψ i Ψ ∈ ∈ We could reduce the sum to go only over indices of the orbitals present in |Ψ i since if j is not in the range, then the action of aj gives zero, whereas if i is not in the range,

30 the determinant created in the ket is orthogonal to the one in the bra. For indices in the

range, the action of ai†aj with i , j, as discussed earlier, gives either zero or a determinant different from the original one which makes the matrix element equal to zero. Thus, the

only case when the matrix element is nonzero is i = j and then ai†ai is just the occupation number operator. This proves the theorem for the same determinant on both sides. Now consider the case when spinorbital k in the ket is replaced by spinorbital l. All the arguments from the previous case still hold except that in addition i has to be equal

to k to annihilate φk, otherwise we get zero. Thus, the next to last sum is only over j X X h | ˆ i ∞ h | i h | i Ψ FΨ 0 = fij Ψ ai†ajΨ 0 = fij Ψ ai†ajΨ 0 = fkl. i,j=1 i,j Ψ ,Ψ ∈ 0 In this case, j has to be equal to k to annihilate the replacement spinorbital and i has to be i equal to k to create φk in Ψ 0 , which gives just a single matrix element. The overall sign is plus since the spinorbitals are annihilated and created at the same position. For the Ψ 00i

case, ai†aj is unable to annihilate two replacement orbitals, so the result is zero. Thus, the second-quantized form of Fˆ gives the same matrix elements as the first-quantized form and therefore the two forms are equivalent. The proof of a similar expression for the operator Gˆ is left as a homework problem

1 X∞ Gˆ = g a a a a . 2 ijkl i† j† l k i,j,k,l=1 Notice that the order of indices is different in the matrix element from that in the string of operators.

D. Normal products and Wick’s theorem

1. Normal-Product

The normal-product of creation and annhilation operators is defined as rearragned product of these operators such that all creation operators are to the left of all annihilation operators with a phase factor corresponding to the parity of the permutation producing the rearrangement. For an arbitrary product of creation and annihilation operators ABC... normal-product is denoted as n[ABC..] and is given as

σ n[ABC...] = (−1) a†b†...uv...

where a†b†...uv... = Pˆ(ABC...) Pˆ being the permutation of operators A,B,C,... and σ being the parity of permutation. This definition is not unique, since any rearrangement of the creation operators among

31 themselves and/or the annihilation operators among themselves is permissible but would always be accompanied by an appropriate change in the phase factor; thus all forms of a normal-product are equivalent. Examples are as follows:

n[a†b] = a†b, n[ab†] = −b†a, n[ab] = ab = −ba, n[a†b†] = a†b† = −b†a†

n[a†bc†d] = −a†c†bd = a†c†db = c†a†bd = −c†a†db The usefulness of the normal-product form is that its physical vacuum expectation value is zero: hvac|n[ABC...]|vaci = 0 if [ABC...] is not empty.

2. Contractions (Pairings)

In order to be able to compute expectation values of general operator strings, we will take advantage of Wick’s theorem. In order to be able to formulate this we need to define the contraction (or pairing) of operators. For a pair of creation or annihilation operators A,B, we define their contraction as

AB ≡ AB − n[AB]

Specifically, the four possibilities are:

a† b† = a† b† − a† b† = 0,

a b = ab − ab = 0,

a† b = a† b − a† b = 0, − − a b† = ab† ( b†a) = [a,b†]+ = δab. A normal-product with contractions is defined as follows:

n[ABC...R...S...T ...V ...] = (−1)σ RT SV ...n[ABC...] where all the contracted pairs have been put in front of the normal-product and σ is the parity of the permutation.

3. Time-independent Wick’s theorem

A product of a string of creation and annihilation operators is equal to their normal- product plus the sum of all possible normal-products with contractions. Symbolically,

ABCD... = n[ABCD...]+n[ABCD...]+n[ABCD...]+n[ABCD...]+...+n[ABCD...]+n[ABCD...]

32 +... + n[ABCD...] + n[ABCD...] + n[ABCD...] + n[ABCD...] + ...

Thus, all possible contractions of one pair, two pairs etc. are included. The importance of the above result is that the vacuum expectation value of any normal-product with contractions is zero unless all operators are contracted. The reason is that each contraction contributes a factor of zero or 1 and, if an uncontracted normal-product remains, its vacuum expectation value is zero. For example, consider a†bc†de†f , applying Wick’s theorem we get

a†bc†de†f = n[a†bc†de†f ] + n[a†bc†de†f ] + n[a†bc†de†f ] + n[a†bc†de†f ] + n[a†bc†de†f ] where we have omitted all contractions except those of the form ab†, since they vanish. Since no fully contracted term survives, the vacuum expectation value of this operator product is zero. A more complex example ab†cd†ef † is given as a homework.

4. Outline of proof of Wick’s theorem

In a normal-ordered product p†q†...uv all contractions vanish since in such a product there can be no contractions involving annihilation operator to the left of creation operator. Thus, if a string of operators is already in normal-product form we have X p†q†...uv = n[p†q†...uv] + (All possible contractions) since all terms in the sum vanish. Thus Wick’s theorem holds in this case. Consider next the case where one pair of operators is out of normal order:

 −  p†q†...rs†...uv = p†q†... [r,s†]+ s†r ... uv − = p†q†...δrs...uv p†q†...s†r ... uv

= n[p†q†...rs†...uv] + n[p†q†...rs†...uv]

All other contractions vanish, so Wick’s theorem still holds. Now consider the case where we have two annihilation operators to the left of one of the creation operators:

p†q†...rst†...uv = p†q†...rst†...uv − p†q†...rt†s ... uv

= p†q†...rst†...uv − p†q†...rt†s...uv + p†q†...t†rs ... uv

= n[p†q†...rst†...uv] + n[p†q†...rst†...uv] + n[p†q†...rst†...uv] again satisfying Wick’s theorem, since all other contractions vanish. This procedure can be continued for all pairs of operators out of normal order.

33 5. Comprehensive proof of Wick’s theorem

We shall prove this theorem in three steps. We first prove a lemma L1 which expresses an arbitrary normal-product, multiplied on the right by a single operator, in terms of the normal-product and normal-products with pairings of all the operators involved. Next, we shall generalize this lemma (L2) for normal-products with contraction and, finally, we shall use the two lemmas to prove the general theorem. L1: Xk n[M1M2...Mk]Ml = n[M1M2...MkMl] + n[M1...Mi...MkMl] i=1

Consider Ml is an annihilator, say a, then indeed all the normal-product with contraction appearing on the right hand side vanishes. And we get

n[M1M2...Mk]a = n[M1M2...Mka]

Since a is an annihilator, it can be taken inside the normal-product. We can thus assume that Ml is a creator say b†. Moreover, without any loss of generality, we can further assume that all the operators Mi, i = 1, ..., k are annihilators. Since one can easily extend this special case to a general case as follows. We simply multiply from the left both sides of L1 with the product of pertinent creation operator. These being to the left of all the operators may be brought inside of all the normal-products. Then we can add to the right hand side the terms in which these added creators are contracted, one by one, with the last operator in the product, b†. All these terms vanish, since a contraction of two creators vanishes, and will not change the validity of our identity. Finally, we may rearrange the order of the operator Mi, i = 1, ..., k in each normal-product as desired. Moreover, any permutation of the operators Mi, i = 1, ..., k will not reverse the ordering of the contrated pairs, since the only contraction present is with the right most operator in the product, which is not affected by the permutation.

It thus remains to prove L1 for the special case in which M1,M2,...,Mk are annihilators, say a1,a2,...,ak, and Ml is the creator b†, i.e.,

Xk n[a1a2...ak]b† = n[a1a2...akb†] + n[a1...ai...akb†] (33) i=1 We can now use induction, since above equation is clearly valid for k = 1,

n[a1]b† = n[a1b†] + n[a1b†] − a1b† = b†a1 + δab

and gives anticommutation relation. We suppose that Eq. (33) is valid for k = N ≥ 1 and prove that it is also valid for k = N +1.

34 To do so wmultiply Eq. (33) by an arbitrary annihilator, say a0, from the left.

XN a0 n[a1a2...aN ]b† = a0 n[a1a2...aN b†] + a0 n[a1...ai...aN b†] (34) i=1 Consider now the left hand side of Eq. (34). Since all the operators in the normal-product are annihilators, we can bring a0 in the normal-product.

a0 n[a1a2...aN ]b† = n[a0a1a2...aN ]b† (35)

Similarly we can rewrite all the terms under the summation symbol on the right hand side of Eq. (34) obtaining

XN XN a0 n[a1...ai...aN b†] = n[a0a1...ai...aN b†] (36) i=1 i=1 Finally, the first term on the right hand side of Eq. (34) can be rearranged to the form

− N − N a0 n[a1a2...aN b†] = ( 1) a0b† n[a1a2...aN ] = ( 1) (n[a0b†] + a0b†) n[a1a2...aN ] (37) where in the last equation we have used definition of contraction.

− − n[a0b†]n[a1a2...aN ] = b†a0a1...aN = n[b†a0a1...aN ]

− N a0b† n[a1a2...aN ] = n[a0b†a1a2...aN ] = ( 1) n[a0a1...aN b†] Substituting above two equations in Eq. (37) we get

− N+1 − 2N a0 n[a1a2...aN b†] = ( 1) n[b†a0a1...aN ] + ( 1) n[a0a1...aN b†]

a0 n[a1a2...aN b†] = n[a0a1...aN b†] + n[a0a1...aN b†] (38) Now substituting Eqs. (35), (36), and (38) into Eq. (34), we finally get

XN n[a0a1a2...aN ]b† = n[a0a1...aN b†] + n[a0a1...aN b†] + n[a0a1...ai...aN b†] i=1

XN n[a0a1a2...aN ]b† = n[a0a1...aN b†] + n[a0a1...ai...aN b†] (39) i=0 We now generalize L1 to the case of the normal-products with contraction. L2:

Xk n[M1M2...Mi...... Mk] Ml = n[M1M2...Mi...... MkMl]+ n[M1M2...Mi...... Mj.....MkMl] j=1,j

35 where C designates the index set of those operators Mi, i = 1,...,k which are already contracted in the normal-product on the left hand side. Thus in the normal-product on the left hand side and in the first term on the right ∈ hand side only the operators Mi, i C, are contracted, while in the last term there is an additional contraction involving the last operator Ml and some yet unpaired operators Mi, i = 1,...,k; i < C. The proof of this lemma is very easy when one realizes that L2 reduces to L1 when C is empty. This is because all the contracted terms on left hand

side, first term on the right hand side, and only the terms not contracted with Ml in the second term on the right hand side can be taken out of the normal-product. We are now ready to prove Wick’s theorem. We shall use again the mathematical induction, since from the definition of a contraction the theorem holds for N = 2.

M1M2 = n[M1M2] + n[M1M2]

= n[M1M2] + M1M2

and, trivially for N = 1. We thus assume its validity for N ≥ 2 and prove that it is then also valid for N + 1. Indeed, multiplying Wick’s theorem for N operators with an arbitrary creation or annihilation

operator MN+1 from the right we obtain X M1M2...MN MN+1 = n[M1M2...MN ]MN+1 + n[M1...Mi...Mj...MN ]MN+1 1 i

XN M1M2...MN MN+1 = n[M1M2...MN MN+1] + n[M1...Mi...MN MN+1] i=1 X X X + n[M1...Mi...Mj...MN MN+1] + n[M1...Mi...Mj...Mk...MN ...MN+1] 1 i

36 second term of Eq.(40). For N even, the terms in the last line contain all possible normal- products with all but one operator contracted, since (N + 1) is then odd. Conversely, for odd N, Eq.(41) contains all possible fully contracted terms. Consequently, Wick’s theorem also holds for (N + 1) operators and, thus, in general.

6. Particle-hole formalism

Instead of referring all SDs and their matrix elements back to the vacuum state | i | i | i I = a1a2...aN = a1†a2†...aN† vac it’s more convenient to begin with a fixed reference state also called as Fermi vacuum, in contrast with the physical vacuum |vaci. | i ≡ | i | i 0 Φ0 = ijk...n and define other SD’s relative to it, e.g. | ai ≡ | i | i Φi a†i Φ0 = ajk...n (single excitation), | abi ≡ | i | i Φij a†b†ji Φ0 = abk...n (double excitation), | i ≡ | i | i Φi i Φ0 = jk...n (electron removal), | ai ≡ | i | i Φ a† Φ0 = aijk...n (electron attachment) etc. Notice also that | abi | bai −| bai −| abi Φij = Φji = Φij = Φji The spinorbitals i,j,k,...,n are occupied in |0i are called hole states (they appear explicitly only when an electron is excited out of them by, e.g. i, creating a hole in the reference state), while the other spinorbitals a,b,... are called particle states. We shall use the letters i,j,k,... to indicate indices restricted to hole states, the letters a,b,c,... to indicate indices restricted to particle states and the letters p,q,r,... to indicate any state (either hole or particle, without restriction). We assume an energy level separating the filled (hole) states (present in |0i) with the empty (particle) state. This energy level is called Fermi level. Using this notation, we find that

i†|0i = 0, a|0i = 0

h0|i = 0, h0|a† = 0

It is convenient to define a new set of operators, sometimes called pseudo-creation and pseudo-annihilation operators (or quasi-operators), via

bi = ai†, bi† = ai

ba = aa, ba† = aa†

37 Thus bi† creates a vacancy in state i while bi eliminates such a vacancy. The particle pseudo-operators are identical to the ordinary particle operators, while the hole pseudo- creation and pseudo-annihilation operators are equivalent to the ordinary hole annihila- tion and creation operators, respectively. The motivation for this notation is that all pseudo-annihilation operators operating to the right on the Fermi vacuum state give zero and all pseudocreation operators operating to the left on the Fermi vacuum state also give zero, | i h | bp 0 = 0, 0 bp† = 0

7. Normal products and Wick’s theorem relative to the Fermi vacuum

Now we modify the concepts of normal products, contractions and Wick’s theorem so that they relate to a reference state (the Fermi vacuum) instead of the physical vacuum. A product of creation and/or annihilation operators is said to be in normal order relative | i ≡ | i | i to the Fermi vacuum 0 Φ0 = ijk...n if all pseudo-creation operators a†, ... and i, ... are to the left of all pseudo-annihilation operators a, ... and i†, ... . Using the notation

bi† = ai = i, bi = ai† = i†, ba† = aa† = a†, ba = aa = a

the product is in normal order if all the bp† operators are to the left of all the bp operators. Since | i h | bp 0 = 0, 0 bp† = 0 the Fermi-vacuum expectation value of a normal-ordered product of such operators vanishes. To distinguish the new type of normal product from the previous type, it is often written as − σ N[ABC...] = ( 1) bp†bq†...bubv, instead of n[ABC...] when the ordering is relative to the physical vacuum. The power σ of

the phase factor is the parity of the permutation from ABC... to bp†bq†...bubv. Contractions relative to the Fermi vacuum will be denoted by brackets above the operators instead of below, and we have AB ≡ AB − N[AB] So for contractions relative to the Fermi vacuum we find that the only nonzero contrations are

i†j = δij, ab† = δab A normal product with contractions is also defined in the same way as in the case where it is relative to the physical vacuum:

N[ABC...R...S...T ...V ...] = (−1)σ RT SV ...N[ABC...]

38 Quantity True vacuum formalism Fermi vacuum formalism | i | i | i vacuum state vac 0 or Φ0

creation operator ba† = aa† = a† bi† = ai = i annihilation operator ba = aa = a bi = ai† = i† normal product of operators n[ABC...] N[ABC..]

If we recall the proof of Wick’s theorem, we see immediately that the same proof will apply to particle-hole formalism versions of the theorem. We only have to replace everywhere the true vacuum quantities with the corresponding Fermi vacuum quantities, as indicated in the table given above. We can thus write immediately the particle-hole form of Wick’s theorem as follows: X ABCD... = N[ABCD...] + (All possible contractions) as indicated, the sum is over all possible contractions of one pair, two pairs etc. The operatore are particle-hole operators defined with respect to |0i. Obviously, the usefulness of this theorem is at least partly due to the fact that the Fermi vacuum expectation value of a normal product vanishes unless it is fully contracted, so that

X h0|A...B...C...D...|0i = h0|N[A...B...C...D...]|0i

where the sum is over all fully contracted normal products. From here on, unless explicitly stated otherwise, whenever we talk of the vacuum we will be referring to the Fermi vacuum and whenever we talk of normal products or contractions, we are referring to these concepts relative to the Fermi vacuum.

8. Generalized Wick’s theorem

To complete this phase of the analysis, we need one more theorem, the generalized Wick’s theorem dealing with products of normal products of operators. This is needed since we shall have to evaluate matrix elements of the normal-product operator Wˆ between various Slater determinants (not just the reference SD), as for example in

h ab...| ˆ | de...i h | ˆ | i Φij... W Φlm... = 0 i†j†...baW d†e†...ml 0 Here we have a vacuum expectation value of a product of three operator strings, each of which separately is in normal-product form, since

N[i†j†...ba] = i†j†...ba,

N[d†e†...ml] = d†e†...ml

39 The generalized Wick’s theorem states that a general product of creation and annihilation operators in which some operator strings are already in normal-product form is given as the overall normal product of all the creation and annihilation operators plus the sum of all overall normal products with contractions except that, since contractions of pairs of operators that are already in normal order vanish, no contractions between pairs of operators within the same original normal product need be included: X 0 N[A1A2...]N[B1B2...]N[C1C2...] = N[A1A2...B1B2...C1C2...]+ N[(All possible contractions)] where the sum is over contractions of one pair at a time, two pairs, etc., and the prime on the summation sign indicates that no ”internal” contractions. Note that the case in which the original product contains some individual creation or annihilation operators not within any normal product is also included in the scope of the generalized Wick’s theorem, since for such operators A = N[A].

9. Normal-product form of operators with respect to Fermi’s vaccum

One-electron operators: Let us consider a one-electron operator X Fˆ = hp|fˆ|qip†q (42) pq

Using Wick’s theorem, p†q = N[p†q] + p†q The contracted term vanishes unless p and q are the same hole state (call it i), when it is equal to 1, and thus X X Fˆ = hp|fˆ|qi N[p†q] + hi|fˆ|ii pq i X ˆ h | ˆ| i = FN + i f i i ˆ where FN is the normal-product form of the operator Eq. (42), X ˆ h | ˆ| i FN = p f q N[p†q] pq

ˆ h | ˆ | i The expectation value of FN for Fermi vacuum is zero, i.e., 0 FN 0 = 0. To show it, consider the four possible permutations of p†q for particle and hole operators. Case I: Both operators correspond to particle states, then h0|N[a†b]|0i = h0|a†b|0i = 0 since there is no particle state to annihilate in |0i.

Case II: Both operators correspond to hole states, then h0|N[i†j]|0i = h0|ji†|0i = 0. Case III: One of the operators corresponds to a hole state and other to a a particle state,

40 such that h0|N[i†a]|0i = h0|i†a|0i = 0. Case IV: One of the operators corresponds to a particle state and the other one to a hole h | | i h | | i h | ai state, such that 0 N[a†i] 0 = 0 a†i 0 = 0 Φi = 0. Therefore, we have X h | ˆ| i h | ˆ| i ˆ ˆ h | ˆ| i 0 F 0 = i f i , F = FN + 0 F 0 i ˆ Note that FN contains hole-hole, particle-particle, and hole-particle terms, X X X X ˆ FN = fij N[i†j] + fab N[a†b] + fia N[i†a] + fai N[a†i] ij ab ia ai X X X X − = fij ji† + fab a†b + fia i†a + fai a†i ij ab ia ai Two-electron operators: Next consider a two-electron operator, 1 X Gˆ = hpq|gˆ|rsi p q sr (43) 2 † † pqrs The derivation of the normal-product for of this operator is left for homework.

VI. DENSITY-FUNCTIONAL THEORY

The solution of Schrödinger’s equation in the clamped-nuclei approximation is a 3N- dimensional function (not including the spin degrees of freedom), where N is the number of electrons ˆ HΨ (x1,x2,...,xN ) = EΨ (x1,x2,...,xN ). Each wave function gives a unique X Z 3 3 | |2 ρ(r) = N d r2 ...d rN Ψ (x,x2,...,xN ) . (44) s,s2,...,sN Note that since Ψ is antisymmetric, it does not matter which particle is left out of the integrations. We see immediately that Z d3rρ(r) = N.

The other commonly used symbol denoting ρ is n. Obviously, solutions of electronic structure problems would be much easier if one could replace Ψ by ρ, an object that is only three dimensional. While it may seem initially impossible, attempts to do so go back to the early days of quantum mechanics and we now know solid mathematical background for such an approach. Here is a summary of major historical developments in this field.

41 1927 Llewellyn H. Thomas proposes to apply the expressions coming from quantum statistical treatment of uniform electron gas (the latter can be found in most statistical mechanics textbooks) to atoms. While electron density in atoms is obviously non-uniform, Thomas assumed that it is uniform locally.

1928 Enrico Fermi comes independently with a similar idea.

1930 Paul Dirac extends theory to include the so-called exchange terms. Such approach is now called the Thomas-Fermi-Dirac (TFD) method. The essence of this theory is that the energy of a system is written as a functional of electron density with all terms in the functional originating from quantum statistical treatment of electron gas.

1935 Carl Weizsäcker proposes a correction to the kinetic energy term in TFD.

1951 John Slater develops a method which is a combination of the HF method and TFD, in particular, the density is computed from the Slater determinant and the method solves one-electron equations similar to HF equations. The Slater method was in many respects similar to the Kohn-Sham method discussed below, but was missing a rigorous derivation.

1964 Pierre Hohenberg and Walter Kohn (HK) prove that there exists a functional of ρ which upon minimization gives the exact ground-state energy. The method is called the density-funtional theory (DFT). However, HK say nothing on how to construct such a functional.

1965 Kohn and Lu Jeu Sham (KS) derive one-electron equations similar to HF equations that can be solved for spinorbitals which then give ρ which minimizes the density functional. The functional used in the original KS method has several terms taken from the TFD method and therefore this approach is now called the local-density approximation (LDA).

1998 Walter Kohn receives Nobel prize in chemistry for DFT.

2017 Hundreds of approximation to the unknown exact density functional have been proposed by now and DFT is the most used computational method in many fields of physics and chemistry.

A. Thomas-Fermi-Dirac method

Although it is often stated that the Thomas-Fermi-Dirac (TFD) method originates from quantum statistical mechanics, no statistical approach is needed to derive this

42 method in its basic form. The reason is that the statistical treatment is taken for temperature T → 0, when one can use non-statistical quantum mechanics. The Thomas-Fermi (TF) theory expressions come from considering a system of non- iteracting spin 1/2 fermions of mass equal to the electrons mass placed in a cubic box. Then one adds the interelectron Coulomb interactions as a first-order correction neglecting at this point the permutational symmetry of wave function (like in the Hartree approach). The TFD extension fixes this deficiency, i.e., computes the first-order correction accounting for antisymmetry. Let us consider a system of N noniteracting spin 1/2 fermions of mass equal to the electrons mass m placed in a cubic box of side L (volume V = L3). Since the Hamiltonian is separable into Hamitonians of individual particles, the solution of the Schödinger equation for such system reduces to solutions of single-particle equations, and then to separate solutions for each dimension, giving orbitals r 8 ψ (r) = ψ (x)ψ (y)ψ (z) = sin(k x)sin(k y)sin(k z) nx,ny ,nz x y z V x y x and orbital energies

π2~2    = n2 + n2 + n2 n , n , n = 1,2,... nx,ny ,nz 2mL2 x y z x y z where π π π k = n k = n k = n . x L x y L y z L z We assumed that the box extends from 0 to L in each dimension and that the potential

is zero inside the box and infinite outside, so that the boundary conditions are ψx(0) = ψx(L) = 0 and similarly for other dimensions. One may also assume periodic boundary conditions: ψx(x) = ψx(x + L) and similarly for other dimensions and this assumption leads to same results in the limit of large number of particles. We assume that each orbital corresponds to two spinorbitals, one with s = 1/2 and one with s = −1/2. The total wave function for the ground state of this system is then the Slater orbital built from N spinorbitals with lowest energies. This means that each orbital energy level is doubly occupied or doubly degenerate (the highest occupied energy level is called Fermi level). The total energy of the system is

Xocc E = nx,ny ,nz . (45) nx,ny ,nz

If we put a dot in three-dimensional coordinate system for each point nx,ny,nz, the part of the space with positive coordinates will be divided into cubes of side 1. For large q 2 2 2 ≤ N, the surface formed by largest such values, n¯x,n¯y,n¯z, limited by n¯x + n¯y + n¯z r for some fixed, sufficiently large r, is well approximated by the surface of the sphere with the

43 1 4 3 radius r. The volume of the considered part of the space limited by this surface is 8 3 πr 1 3 so the number of states inside this surface is nr = 3 πr , where we multiplied by 2 to 2 include spin degeneracy. The number of states in a shell r,r +dr is therefore dnr = πr dr. 2~2 2 2 All these states have (approximately) the same orbital energies r = (π /2mL )r . Thus, we can obtain the total energy of the system by integrating

Z Z rF 2~2 3~2 π 2 2 π 5 E = rdnr = 2 r πr dr = 2 rF 0 2mL 10mL

where rF denoted the radius corresponding to the Fermi level. This radius can be found from 1 N = πr3 3 F which gives π3~2 3N 5/3 E = . 10mL2 π This energy can be further written in terms of electron (number) density ρ = N/V as

π3~2  3N 5/3 π3~2 3ρ5/3 E = L3 = V = C V ρ5/3 (46) 10m πL3 10m π F

3 ~ 2 2/3 where CF = 10 m (2π ) is the so called Fermi constant. We will later use atomic units 3 2 2/3 where this constant reduces to CF = 10 (2π ) . Note that this energy is just the the kinetic energy, the only energy in the case of noninteracting gas. For future reference, let us find the expression for Fermi’s energy and Fermi’s wave

vector. The Fermi energy is the orbital energy at rF

π2~2 π2~2 3N 2/3 π2~2  3N 2/3 π2~2 3ρ2/3  = r2 = = = (47) F 2mL2 F 2mL2 π 2m πL3 2m π where the Fermi wave vector is

p 3ρ1/3  1/3 k = 2m = π~ = ~ 3π2ρ (48) F F π Thomas and Fermi used the expression of Eq. (46) as the kinetic energy in their model even if it was applied to atoms, molecules, or solids despite the fact that the electron density in such systems is obviously not constant. A critical assumption of TF model is that the density can be assumed locally constant, the so-called local density approximation (LDA). Next, Thomas and Fermi moved to interacting electron gas, i.e., added to the Hamiltonian the electron Coulomb repulsion term as a perturbation operator

N 1 X e2 Uˆ = (49) ee 2 |r − r | i

44 ˆ and included the first-order correction, i.e., the expectation value of Uˆee = G with the product of ground-state spinorbitals in their energy expression. If we recall derivetion of Eq. (28), N 1 X   hΨ |GˆΨ i = g − g (50) 2 ijij ijji i,j=1 using just the product produces only the first term in this expression. It term can be written as N N 1 X 1 X X e2 3r 3r x x x x gijij = d 1d 2φi∗( 1)φj∗( 2)| − |φi( 1)φj( 2) (51) 2 2 r1 r2 i,j=1 i,j=1 s1,s2 " The sum over spinorbitals can be replaced by electron density. To see it, let us write Eq. (44) for Slater determinant. Due to orthonormality of spinorbitals, the spinorbitals of coordinates integrated over must be the same. Thus the only surviving terms are those where the consecutive squares of modulus of a given spinorbital are depending on x − XXN XXN XN (N√ 1)! ρ(r) = N φ∗(x)φi(x) = φ∗(x)φi(x) = 2 ψ∗(r)ψi(r) (52) 2 i i i ( N!) s i=1 s i=1 i=1 √ where 1/ N! comes from the definition of the determinant and the factor (N − 1)! is the

number of permutations (identical in the bra and ket) of spinorbitals other than φi(x). In the last step, we have integrated over spin recalling that pairs of spinorbitals are related to the same orbital. Using Eq. (52) we can write Eq. (51) as 1 e2 3r 3r r r d 1d 2ρ( 1)| − |ρ( 2) = JH[ρ]. (53) 2 r1 r2

This term is known under" the name of Hartree energy and denoted by JH[ρ]. It described Coulombic interaction of electron density with itself. For atoms, moleccules, and solids, Thomas and Fermi included, of course, also the Coulomb interaction of electron with nuclei N N N Xnuc X Z e2 X Vˆ = − a = vˆ(r ). (54) |r − R | i a=1 i=1 i a i=1 Since this is a one-electron operators, its expectation value with the ground-state Slater determinant XN Z h | ˆ i Ψ V Ψ = vii = ρ(r)vˆ(r) = V [ρ]. i=1 Since we now have different kinetic energy at teach point of space, we have to average the expression (46) over the space, obtaining in this way the kinetic energy of the TF method Z Z 1 T TF = Ed3r = C ρ5/3(r) V F

45 The total energy expression in the TF metod is therefore

TF T [ρ] = TTF[ρ] + V [ρ] + JH[ρ].

This functional of ρ can be minimized with respect to ρ, we will not discuss these methods. For atoms, the functional has often been evaluated with densities obtained from the HF method. The TFD method is an extension of the TF method by including the permutational symmetry in evaluating the expectation value of the Uee operator, the term sometimes called the Dirac exchange energy. In contrast to the Hartree term which is valid for any set of orbitals, the Dirac term is explicitly computed with orbitals of the noninteracting gas. The exchange integral of HF theory can be written in terms of one-particle density matrix

N N 1 X 1 X X e2 3r 3r x x x x K = gijji = d 1d 2φi∗( 1)φj∗( 2)| − |φj( 1)φi( 2). 2 2 r1 r2 i,j i,j=1 s1,s2 " Let’s first sum over spin. We have X X ψ∗ (1)σ(1)ψ (1)σ 0(1) ψ (2)σ(2)ψ∗ (2)σ 0(2) i0 j0 i0 j0 s1 s2 where i0 and j0 are orbital indices. Note that i0 is coupled with the same σ in both places − as it is the same spinorbital. Thus, if in the sum over s1 we have, say, + combination of spins, the same combination apears in the sum over s2. Therefore, the only nonvanishing terms are ++ and −−, so we get an overal factor of 2 from spin summation and we can write     XN/2 2 XN/2 3 3   e   K = d r d r  ψ∗(r )ψ (r )  ψ (r )ψ∗(r ) 1 2  i 1 i 2  |r − r |  i 1 i 2  i 1 2 i 1" e2 3r 3r r r r r = d 1d 2ρ1( 1, 2)| − |ρ1( 2, 1) 4 r1 r2 2 1 2 e " 3r 3r r r = d 1d 2 ρ1( 1, 2) | − | 4 r1 r2 " The quantity ρ1 is the one-electron (reduced, i.e., integrated over spin) density matrix

XN/2 ρ1(r1,r2) = 2 ψi(r1)ψi∗(r2) (55) i and the factor 2 in its definition leads to the factor 1/4 in the expression for K. We will now compute the one-electron density matrix for noniteracting electron gas. In contrast to what we did when deriving the kinetic energy expression, it is now more

46 convenient to assume the periodic boundary conditions. One can prove that for large N the two conditions give the same answers, but we will not do it since at this point we made much more drastic approximations then could arise from a possible inconsistency resulting from different boundary conditions. With the periodic conditions, the orbital wave functions is of the form √1 ik r ψn ,n ,n = e · x y z V 2π ± ± where kv = L nv with nv = 0, 1, 2,.... The density matrix of this system is Xocc 2 ik (r r ) ρ (r ,r ) = e · 1− 2 . 1 1 2 V nx,ny ,nz

Analogously as before, for large N we can change summation to integration

Z occ Z k =kF 2 ik r 1 | | ik r 3 ρ (r ,r ) = e · 12 dn dn dn = e · 12 d k 1 1 2 V x y z 4π3 where changing variables with used dkx = (2π/L)dnx and so on, which gives overall Jacobian V/(8π3). The upper limit of the integration was defined in Eq. (48). This integral can be evaluated in spherical coordinates and this evaluation is given as a homework. The result is 1 ρ (r ,r ) = [sin(sk ) − sk cos(sk )] (56) 1 1 2 π2s3 F F F | | − where s = r12 . It is now natural to view ρ1 as a function of variables s = r1 r2 and r = (r1 + r2)/2. We see that it is independent of r and of the direction of s, as expected for the uniform gas. Notice that ρ1 does depend on ρ via kF. To compute the Dirac exchange energy used in TFD, we assume as before that the uniform gas expression is valid locally and with ρ dependent on r averge over space, obtaining Z 3  3 1/3 K = C ρ4/3(r)d3r,C = . (57) D x x 4 π The derivation of this expression is left as homework.

B. Hohenberg-Kohn theorems

The first HK theorem states that, for the ground state of a system, the knowledge of ρ allows one to determine Ψ and vice verse. The latter is obvious from the definition of density. For systems consisting of atoms with no external potentials, the proof of the former part of this theorem is very simple. Since the sources of the potential are nuclei and the electron interaction with a nucleus is singular, the density will have sharp peaks

47 exactly at the positions of nuclei. The steepness depends on the charge of a nucleus. Thus, the knowledge of density gives us locations and charges of nuclei. Therefore, one can write Schrödinger’s equation and solve it to find Ψ . The importance of this theorem is that is shows that the density alone gives all the needed information about the system. Also note that the theorem discusses only the exact ground-state density. The second HK theorem states that there exists a functional of density, denoted by E[ρ], that upon minimization with respect to ρ gives the ground-state energy Z ≥ 3 E[ρ] E[ρ0] = E0 where ρ(r) > 0 and d rρ(r) = N.

The density is arbitrary except for satisfying the two conditions listed, originating from the definition of ρ in Eq. (44). To prove this theorem, we will follow the arguments given by Levy. We will start from the Ritz variational principle h | i E0 = min Ψ Hˆ Ψ Ψ where Ψ belongs to the Hilbert space of normalized antisymmetric N-electron functions. We can then write " # h | i h | i E0 = min Ψ Hˆ Ψ = min min Ψ Hˆ Ψ Ψ ρ Ψ ρ → with ρ constrained by the conditions specified in the theorem. The meaning of the double minimizations is as follows. We go over the space of all possible ρ’s and for a given

ρ find all Ψ 0s that give this ρ. We select such Ψ out of this set that gives the lowest expectation value of the Hamiltonian. Clearly, if we go over all ρ’s, we will eventually find the ground-state energy, which proves the theorem. One subtlety to discuss is whether for an arbitrary ρ there exists an antisymmetric Ψ which gives this ρ via Eq. (44). One can indeed prove that this is the case (by constructing a set of orthonormal spinorbitals from the density and then constructing a Slater determinant from the density). We say that each ρ (fulfilling the constraints) is N-representable. However, we do not need to prove this theorem to complete the proof of the second HK theorem. Since for each Ψ there exists a ρ, we will sweep the space of all Ψ ’s when going over all ρ’s. If there were ρ’s that are not N-representable (which is not the case), we could just ignore them. The importance of Hohenberg-Kohn work is mainly conceptual, stemming from the fact that it has put density-functional theory on a solid mathematical ground, in contrast to the TFD and Slater methods which both were based on ad hoc arguments. However, the HK theorems did not offer any new practical tools. The proof that the functional exists is via the wave functions, so it tells us nothing about finding the actual functional that could be applied without invoking wave functions. Significant efforts have been made by many researchers to find good approximations to such a functional, but in fact all the proposed pure density functionals work poorly. The family of methods that do

48 work, now known under the name DFT, are in fact not true DFT approaches, i.e., are not based on density alone. These methods originate from the Kohn-Sham ideas which will be discussed next and use spinorbitals in addition to densities.

C. Kohn-Sham method

Let us write the Hamiltonian of Eq. (19) with the following notation for the three consecutive terms

Hˆ = Tˆ + Vˆ + Uˆee. The matrix elements of the multiplicative operator Vˆ which is the sum of one-electron terms N N N Xnuc X Z e2 X Vˆ = − a = vˆ(r ) |r − R | i a=1 i=1 i a i=1 can be written as an explicit functional of density

X Z Z h | ˆ i 3 3 | |2 3 Ψ V Ψ = N d r1 ...d rN vˆ(r1) Ψ (x1,x2,...,xN ) = d r1vˆ(r1)ρ(r1). (58) s1,...,sN

We can therefore write HK theorem as "Z # 3 HK E0 = min d r vˆ(r)ρ(r) + F [ρ] ρ

where     HK HK HK h | ˆ ˆ i h | ˆ ˆ i F [ρ] = T [ρ] + Uee [ρ] = min Ψ T + Uee Ψ = Ψmin[ρ] T + Uee Ψmin[ρ] . Ψ ρ → There is nothing new in this equation except for introducing the notation.

One might think that an explicit density functional can be obtained when the Uˆee operator is neglected. This is not so since the operator Tˆ is a differential operator. Thus, if we replace Vˆ by Tˆ in Eq. (58), this equation cannot be integrated to depend on density only. To overcome this difficulty, Kohn and Sham replaced T HK[ρ] by an expression used in the HF method, i.e., by the expectation value of Tˆ with a Slater determinant

N 1 X T HK[ρ] → T [{φ [ρ]}] = − hφ |∇2φ i S i 2 i i i=1 ~ where we started to use atomic units such that = e = me = 1. The notation introduced in TS indicates that spinorbitals can be considered to be determined by density. We may say { } that TS[ φi[ρ] ] is an explicit functional of orbitals and an implicit functional of density

49 (we will not make use of these concepts). The density can be calculated from a Slater determinant using Slater-Condon’s rules for the following electron-density operator

XN − ρˆ = δ(r ri), i=1 i.e.,

XN XZ XN X XN h | i 3 − 2 ˜ 2 ρ(r) = Ψ ρˆΨ = d r1 φi∗(x1)δ(r r1)φi(x1) = φi(x) = φi(r) , i=1 s1 i=1 s i=1 (59) ˜ where we replaced s1 by s in the next to last equation, φi is the orbital part of the spinorbital φi, and where we assumed pure spin states, so that the sum over the spin part is one. So far we do not know how to determine the spinorbitals, we will get to this issue later on. HK The next important idea of Kohn and Sham was to write the Uee [ρ] term as a sum of the Coulomb interaction of the density with itself Z 1 ρ(r)ρ(r ) 3r 3r 0 EH[ρ] = d d 0 | − | , 2 r r0 called the Hartree energy, and of the remainder. This term appears in the HF theory as the expectation value of the N-electron Coulomb operator and in the THD theory, so this choice was natural. With both discussed approximations, the FHK[ρ] functional can be written as HK { } F [ρ] = TS[ φi[ρ] ] + EH[ρ] + Exc[ρ] where the last term, called the exchange-correlation energy, collects all interactions not included in the first two terms. It is worth to write this term explicitly HK − { } HK − Exc[ρ] = T [ρ] TS[ φi[ρ] ] + Uee [ρ] EH[ρ].

Thus, despite the label “exchange-correlation", this term includes kinetic energy correc- tions. The term is expected to correct for the electron correlation effects not included in EH[ρ] and for the effects resulting from antisymmetrization of the Slater determinant which in the HF method lead to the exchange operator. When a concrete Exc[ρ] is constructed, one usually considers separately the correlation and exchange components,

denoted by Ec[ρ] and Ex[ρ], respectively. All the hundreds of DFT methods in use differ by the selection of Exc. In the simplest case used in the original KS paper, this term is taken from the TFD theory. We will discuss various choices of Exc later on. The complete KS functional can be written as Z KS { } 3 E0 [ρ] = TS[ φi[ρ] ] + d rvˆ(r)ρ(r) + EH[ρ] + Exc[ρ]

50 We will find its minimum in a way analogous to that used in the derivation of the HF

method. Since ρ is expressed in terms of φi via Eq. (59), variation of ρ will be expressed via variations of φi’s. Thus, we will vary, as in the HF method, → φi∗ φi∗ + δφi∗ and this variation will imply the variation of ρ

XXN XXN →   ρ(r) ρ(r) + δρ(r) = φi∗(x) + δφi∗(x) φi(x) = ρ(r) + φi(x)δφi∗(x) s i=1 s i=1 We have to now impose the two conditions on ρ. The positiveness condition is automa- tically satisfied if the definition (59) is used. The normalization to N will be achieved if each orbital is normalized. We will in addition require that orbitals are orthogonal to each other since only then Eq. (59) holds. Thus, the conditions will be imposed in exactly the same way as in the HF method, i.e., we will minimize

XN   LKS KS − h | i − [ρ] = E0 [ρ] λij φi φj δij j i ≥ The linear variations of the kinetic energy, the nuclear attraction, the Hartree, and the constraints terms are exactly the same as in the HF method

N 1 X δT = − hδφ |∇2φ i S 2 i i i=1

Z XN 3 h | i δ d rvˆ(r)ρ(r) = δφi vφˆ i i=1 XN h |ˆ i δEH[ρ] = δφi Jφi i=1 XN XN h | i −  h | i δ λij φi φj δij = λij δφi φj . j i j i ≥ ≥ For the exchange-correlation (xc) energy term, we have to use symbolic notation since we do not know this term explicitly. This term is assumed in the form of an integral of the so-called xc energy density Z 3 Exc[ρ] = d r exc[ρ](r)

where exc(r) is some function of ρ(r), in the simplest cases it can be just a power of ρ. Therefore, we can write Z O 2 3 O 2 Exc[ρ + δρ] = Exc[ρ] + δExc[ρ] + [(δρ) ] = Exc[ρ] + d r exc0 (r)δρ(r) + [(δρ) ]

51 where exc0 (r) is defined as the function which integrated with δρ(r) gives δExc[ρ]. Thus, 2 this is an analog of the standard derivative: f (x + δx) = f (x) + f 0(x)δx + O((δx) ). We will use notation δExc e0 (r) ≡ (r) ≡ v (r) xc δρ xc and we call vxc(r) the functional derivative of Exc[ρ]. Notice that vxc(r) is a function of r, whereas Exc[ρ] is just a single real number for a given ρ. Although this definition may appear to be abstract, it is simple to find vxc in practice. For example, the exchange 4/3 energy density in TFD is ex(r) = Cxρ(r) so that Z 3 4/3 Ex[ρ] = Cx d r ρ(r) Z Z  4  E [ρ + δρ] = C d3r (ρ(r) + δρ(r))4/3 = C d3r ρ(r)4/3 + ρ(r)1/3δρ(r) + ... , x x x 3 where we applied Taylor’s expansion of f (x) = x4/3 (true of any value ρ(r) = x). Thus, 4 1/3 vx(r) = 3 ρ(r) in this case. One may also comment that the familiar derivation of Euler- Lagrange’s equations in classical mechanics uses concepts analogous to those defined above and can also be formulated using functional derivatives.

The derivation of a formula for vxc(r) gets a bit more complicated if the exchange- correlation energy depends also on derivatives of ρ, as it will be discussed later on. In this case, we have Z 3 Exc[ρ] = d r exc[ρ,∇ρ](r). ≡ The quantity exc[ρ,∇ρ](r) is some concrete function of ρ and of ρi ∂ρ/∂xi. Although the derivatives ρi are defined by ρ, we can first treat them as independent variables [like in df (x,y(x)) = (∂f /∂x)dx + (∂f /∂x)dy = (∂f /∂x)dx + (∂f /∂x)(dy/dx)dx]. At a given point r

and for a given ρ, the linear variation of exc[ρ,∇ρ](r) is the sum of the increments δρ and δρi’s multiplied by the regular partial derivatives of exc with respect to these variables − − O 2 δexc = exc[ρ + δρ,ρ1 + δρ1,ρ2 + δρ2,ρ3 + δρ3] exc[ρ,∇ρ] [(δρ) ]

∂exc ∂exc ∂exc ∂exc = δρ + δρ1 + δρ2 + δρ3. ∂ρ ∂ρ1 ∂ρ2 ∂ρ3

We now have to eliminate the increments δρi’s in favor of δρ, similarly as in the derivation of the Euler-Lagrange equations. This can be done integrating by parts: Z Z Z ! 3 ∂exc 3 ∂exc ∂δρ ∂exc ∞ − 3 ∂ ∂exc d r δρi = d r = δρ d r δρ. ∂ρi ∂ρi ∂xi ∂ρi ∂xi ∂ρi −∞ The surface term vanishes since δρ vanishes at infinity and we eventually have   Z Z X 3 3 ∂exc ∂ ∂exc  d r v (r)δρ = d r  − δρ. xc  ∂ρ ∂x ∂ρ  i i i

52 Using the definition of the functional derivative, we can write the linear variation of xc energy in terms of variations of orbitals as

Z N Z 3 XX 3 δExc[ρ] = d r vxc(r)δρ(r) = d r vxc(r)φi(x)δφi∗(x). s i=1

Now we have determined variations of all terms in LKS[ρ]. Assuming that all variations

of spinorbitals are zero except for spinorbital φi∗ (or, alternatively, noticing that all variations are independent), we get  1  X hδφ | − ∇2 + vˆ + Jˆ + v φ i − λ hδφ |φ i = 0. i 2 xc i ij i j j

As in the HF case, it implies that the ket has to be identically equal to zero  1  X − ∇2 + vˆ(r) + Jˆ(r) + v (r) φ (x) = λ φ (x) 2 xc i ij j j

Since ρ is obviously invariant to unitary transformation of orbitals, so is vxc(r) and we can diagonalize the matrix  by a unitary transformation, obtaining the canonical KS equations  1  − ∇2 + vˆ(r) + Jˆ(r) + v (r) φ (x) = φ (x). 2 xc i i The KS equation are similar to HF equations. The major difference is the presence of the vxc functional and the absence of the Kˆ operator. One might think that it would be a good idea to include Kˆ , but the predictions of such methods are poor. However, there is a whole family of density functionals that add a fraction of Kˆ at the same time subtracting an equivalent part of vx. Such approaches are called hybrid DFT methods. Another major difference between HF and KS approaches is that in the former case one computes the total system energy as an expectation value of the Hamiltonian, whereas in the latter case one uses the appropriate KS functional. Of course, this is consistent with the way each type of equations was obtained.

D. Local density approximation

The exchange-correlation energy is defined in the KS method as

− − Exc = T + Uee TS EH

where T is the exact kinetic energy and Uee is the exact electron repulsion energy. Thus, despite its name, it should correct also for the deficiencies in the description of kinetic

energy by TS. However, little work is done in this direction, major efforts in the field have been directed into improving the description of components resulting from electron

53 correlation and electron exchanges (however, in the so-called meta-GGA theories that will be discussed later, one used terms related to kinetic energies).

The electron repulsion term is partly accounted in KS approach by the EH term, Coulomb repulsion of density with itself. This term is the same as in the HF method, except that it is computed with KS rather than HF densities. In the HF approach, this term does not include any correlation effect (if electron correlation energy is defined as − Ecorr = Eexact EHF). Thus, we do not expect it to completely describe electron correlation in the KS approach. We need further contributions. The need for some term related to electron exchanges is clear when we realize that the exchange operator of the HF method is missing in KS orbital equation. It would be simple to add this operator and indeed the so-called hybrid methods which will be discussed alter on do it. However, just adding the complete Kˆ was found to give poor results. The truth is that the meaning of the words "exchange" and "correlation" in relation to

Exc should be taken in only a loose sense. Nevertheless, there is a rich literature devoted to constructing Exc functionals and most of this work is based on solid physics. One often discusses separately the two terms writing

Exc = Ec + Ex

So, how one gets any expression for Exc? The preceding discussion might indicate that this is an impossible task. However, as usual in physics, one studies simple, exactly solvable models and tries to design expression which work for such models. One of the most important model is homogeneous interacting electron gas (HIEG). As discussed in Sec. ??, the TFD model is derived from this physical system. This also means that if the TFD model is applied to HIEG, it is expected to work very well, and this is indeed the case. The KS approach also uses the local density approximation (LDA) as does TFD. In fact, the original KS model in the 1965 is often called the LDA approach. It is similar to

TFD in several respects. First, the terms V and EH are identical. Next, Kohn and Sham have also taken the Ex from TFD: Ex = KD where KD is defined by Eq. (57). Thus the main difference is the use of TS instead of TTF, which was the main feature of Slater’s density functional theory. The Ec was usually set to zero in early LDA variants, although there were some attempts to approximate this term based on perturbation theory of interacting electron gas. Note that while the noninteracting electron gas problem can be solved exactly, no analytic solutions exist for the interaction gas except in the limits of very large and very small densities. The use of LDA may seem as a huge approximation for atoms molecules and solids. Indeed, in its initial form KS/LDA did not work well for molecules, in many cases predicting that well-known molecules are not bound. The greatest successes of this approach were for metal, where the conduction electrons resemble electron gas.

54 The LDA method is still widely used. The main difference between the modern LDA and the original KS version is the addition of an Ec term fitted to nearly exact numerical calculations for interacting uniform electron gas performed by Ceperley and Alder in 1980. The calculations were performed using the diffusions Monte Carlo (DMC) method which will be discussed later on. The only quantity obtained by Ceperley and Alder was the total energy of the electron gas as function of density. One can then arbitrarily define the correlation energy as

− − Ec[ρ] = Etotal[ρ] TS[ρ] ED[ρ]. (60)

These numerical results were then fitted by some simple analytic functions. One may

notice that there is no V and EH terms in this equation. The reason is that for the interacting uniform electron gas, one has to use a uniform positive background to compensate for the electron charges and the Coulomb interactions included in these two terms add up to zero. Notice further the arbitrariness of definition (60): the two subtracted terms are not the exact kinetic and exchange energies of the system but some approximations of these quantities.

The fits of Ec are usually expressed in terms of the quantity rS called the Wigner-Seitz radius and defined by 4 1 V πr3 = = 3 S ρ N i.e., it is the radius of a sphere with volume corresponding to the volume occupied by one

electron. Ec is usually expressed as Z 3 Ec = ρ(r)c(r)d r

where c(r) is called correlation energy density. Several fits of this quantitiy have been published, a particularly simple one was developed by Chachiyo in 2016 ! b b c = aln 1 + + 2 rs rs

where a and b are fit parameters. Other fits have been published by Vosko-Wilk-Nussair (VWN) in 1981 and by Perdew and Wang (PW92) in 1992.

E. Generalized gradient approximations (GGA)

LDA has been developed using uniform electron gas as the underlying model, a system where ∇ρ(r) = 0, but applying the resulting formulas to systems where ∇ρ(r) , 0. It is therefore natural to include ∇ρ(r) in DFT. Attempts to do so have a long history: the Weizsäcker correction to the TF kinetic energy is the earliest example. The expansion

55 in powers of ∇ρ(r) was discussed in the HK 1964 paper. This expansion, now called gradient expansion approximation (GEA), is usually expressed in terms of the quantity |∇ρ(r)| |∇ρ(r)| s(r) = = 2 1/3 4/3 2kFρ(r) 2(3π ) ρ(r) or in terms of |∇ρ(r)| x(r) = . ρ(r)4/3 The exchange energy can then be written as Z Z 4/3  2  3 4/3 3 Ex = Cx ρ (r) 1 + Dxs (r) d r = Cx ρ (r)Fx(s)d r where Fx(s) is called the enhancement factor. Notice that this expression does reduce to the uniform gas limit for ∇ρ(r) = 0. Note also that there are no terms linear in components of ∇ρ(r). The reason is that Ex is assumed to be a quantity independent of the external potential, i.e., only dependent on electron-electron interactions. Thus, this quantity should be invariant under rotations. The coefficient Dx can be determined by requesting that Ex satisfies some exact constraints or by fitting to experimental data or to results from wave function based calculations. We will not discuss these issues since calculations applying GEA have demonstrated that it gives poor results, usually worse than LDA. The underlying reasons is that GEA violates several exact conditions that LDA does satisfy (again, we will not discuss these issue). The problems of GEA were solved by a family of methods known under the name generalized gradient approximations (GGA). To explain heuristically the main idea of these solutions let us consider the exchange hole defined as

| r r |2 −1 ρ1( 1, 2) ρx(r1,r2) = 2 ρ(r1

where ρ1 has been defined by Eq. (55). Numerical GEA calculations show that while this | − | quantity is well reproduced by GEA for small r1 r2 , at some range of intermediate values of interelectron separation the values of the exchange hole are much too large. One simple solution is to cut-off this region in numerical calculations. However, the unphysical behaviour is clearly related to the enhancement factor Fx(s) increasing too fast. Thus, the GGA methods use some fnctions multiplying s2 that damp this growth. One of the popular GGA enhancement factors was introduced by Becke in 1988

2 β Fx(x) = 1 + x 1 1 + 6βx sinh− (x) with β = 0.0042 fitted to reproduce atomic HF energies. One can plot this function to see that it increases slower than x2 for large x. GGAs perform much than LDA for virtually all systems and are now the mainstream DFT methods.

56 F. Beyond GGA

To build on success of GGA, one can think about including further terms in GEA, starting from the s4 term. This leads to a family of methods called meta-GGA which use also terms dependent on the so-called spinorbital kinetic energy density

XXN 1 2 τ(r) = ∇φi(x) . 2 s i=1 Another extension of GGA approach is the inclusion of the HF exchange operator in the KS one-electron equations, which leads to a family of the so-called hybrid GGA methods. The HF exchange is, of course, calculated with KS orbitals, so it is different from the actual HF exchange. This term is often called “exact” exchange, but of course it is not exact. The HF exchange operator is multiplied by some fractional number α and − exchange potential vx is then multiplied by 1 α. There are also methods called range- separated hybrid (RSH) methods which admix the HF exchange at a variable amount | − | depending on the value of r1 r2 in the exchange integral. The whole family of DFT method is sometimes visualized in the form of the so-called “Jacob” ladder proposed by Perdew. The consecutive “rungs” are

———— virtual (e.g., RPA)

———— hybrid: HF exchange

———— metaGGA: |∇ρ|2, τ

———— GGA: ∇ρ

———— LDA: ρ(r) The last rung are theories that use virtual orbitals. As discussed earlier, the KS method is not a true DFT method since it uses orbitals. However, all the rungs but the top one use only the occupied orbitals. Although the use of orbitals is numerically more costly than the use of density only, the restriction to occupied orbitals makes the cost increase manageable and even hybrid metaGGA methods are numerically much less expensive than even the simplest wave function methods above the HF level. However, it is possible to use KS both occupied and virtual orbitals in many-body methods such as those discussed in the next sections. One of the simples ones is the random-phase approximation (RPA) which can be viewed as a special case of the coupled cluster method with double excitations (CCD) [see Sec. IX C]. Of course, the costs of such approaches as the same as costs of the corresponding wave function approaches. One may ask why to use DFT in such cases. One reason is that the unperturbed problem may be closer to the exact solution than in the case when HF is used as zeroth-order approximation. This is the case in particular for metals where HF work poorly.

57 Let us now present a very breif list of most popular functionals (the total number of DFT functionals proposed is a few hundred). LDA nonempirical Kohn and Sham 1965 solids BLYP nonempirical Becke, Lee, Yang, and Parr 1988 molecules PBE nonempirical Perdew, Burke, and Ernzerhof 1996 molecules and solids SCAN nonempirical Perdew et al. 1995 molecules and solids B3LYP fitted Becke 1993 molecules PBE0 fitted Adamo and Barone 1999 molecules M06-2X fitted Truhlar et al. 2006 molecules The second and third functional belong to the GGA rung, the fourth is metaGGA, the fifth and sixth are hybrid GGAs, and the last one is a hybrid metaGGA. The first four functionals can be classified as nonempirical, i.e., the parameters were fixed mostly using various exact conditions, possibly with some fitting to a limited set of data such as atomic total energies. In the fitted functionals, the parameters were adjusted by fitting DFT predictions to a set of benchmark date obtained both from experiments and from accurate calculations using wave function methods.

VII. VARIATIONAL METHOD

The variation principle is an approximation method that provides a simple way of placing an upper bound on the ground-state energy of any quantum system. We start with the inequality

$$\langle E\rangle = \frac{\langle\psi|H|\psi\rangle}{\langle\psi|\psi\rangle} \geq E_0 \qquad (61)$$

where ⟨E⟩ is the expectation value of the energy, |ψ⟩ is an arbitrary state, and E₀ is the lowest eigenvalue of the Hamiltonian H. The proof of this claim is as follows. First, let us assume that the arbitrary state is normalized, i.e., ⟨ψ|ψ⟩ = 1. If we expand |ψ⟩ = Σ_n c_n|n⟩ with Σ_n |c_n|² = 1 to ensure normalization, then we can write for the expectation value of the energy

$$\langle E\rangle = \sum_{n,m=0}^{\infty} c_m^* c_n\,\langle m|H|n\rangle = \sum_{n,m=0}^{\infty} c_m^* c_n\, E_n\,\delta_{mn} = \sum_{n=0}^{\infty}|c_n|^2 E_n$$

$$\langle E\rangle = E_0\sum_{n=0}^{\infty}|c_n|^2 + \sum_{n=0}^{\infty}|c_n|^2\left(E_n - E_0\right) \geq E_0 .$$

In the case of a non-degenerate ground state, we have equality only if |c₀| = 1, which implies that c_n = 0 for all n ≠ 0. If we consider a family of states |ψ(α)⟩ which depend on some number of parameters α_i, we can define

$$E(\alpha) = \frac{\langle\psi(\alpha)|H|\psi(\alpha)\rangle}{\langle\psi(\alpha)|\psi(\alpha)\rangle} \geq E_0 .$$

Here we still have the relation E(α) ≥ E₀ for all parameters α. The lowest upper bound on the ground-state energy is then obtained from the minimum value of E(α) over the range of parameters α, i.e., by setting the first derivative to zero,

$$\left.\frac{\partial E}{\partial\alpha_i}\right|_{\alpha=\alpha_k} = 0 ,$$

giving us the upper bound E₀ ≤ E(α_k). Unfortunately, the variational method does not tell us how far above the ground state E(α_k) lies. Despite this limitation, when the set of states |ψ(α)⟩ is chosen fairly close to the ground state, the variational method can give remarkably accurate results.
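As a concrete illustration of this recipe (not part of the original notes), the following minimal Python sketch minimizes E(α) for a single Gaussian trial function ψ(r) = exp(−αr²) applied to the hydrogen atom, for which the expectation value has the closed form E(α) = 3α/2 − 2√(2α/π) in atomic units; the optimizer and the trial form are illustrative choices.

```python
# Variational principle for a Gaussian trial function on the hydrogen atom (a.u.).
import numpy as np
from scipy.optimize import minimize_scalar

def energy(alpha):
    """Closed-form <psi|H|psi>/<psi|psi> for psi = exp(-alpha r^2), Z = 1."""
    return 1.5 * alpha - 2.0 * np.sqrt(2.0 * alpha / np.pi)

res = minimize_scalar(energy, bounds=(1e-3, 10.0), method="bounded")
print(f"optimal alpha = {res.x:.4f}")        # ~0.2829
print(f"E(alpha_min)  = {res.fun:.4f} Eh")   # ~-0.4244, an upper bound to the exact -0.5
```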

A. Configuration Interaction (CI) method

The basic idea of Configuration Interaction (CI) is to diagonalize the N-electron Hamiltonian in a basis of N-electron functions, or Slater determinants. Essentially, we represent the exact wave function as a linear combination of N-electron trial functions and then use the variational method to minimize the energy. If a complete basis were used, we would obtain the exact energies of both the ground state and all excited states of the system. In principle, this provides an exact solution to the many-electron problem; however, in practice only a finite set of N-electron trial functions is manageable, so the CI wavefunction expansion is typically truncated at specific excited configurations. As a result of the size restrictions on practical CI calculations, CI provides only upper bounds to the exact energies. The CI wavefunction is a linear combination of known Slater determinants |Φ_i⟩ with unknown coefficients. This allows us to write the eigenvectors of our Hamiltonian as

$$|\Psi_j\rangle = \sum_i c_{ij}\,|\Phi_i\rangle .$$

Generally, the Slater determinants are constructed from excitations of the Hartree-Fock "reference" determinant |Φ₀⟩,

$$|\Psi\rangle = c_0|\Phi_0\rangle + \sum_{r,a} c_a^r\,|\Phi_a^r\rangle + \sum_{\substack{r<s\\ a<b}} c_{ab}^{rs}\,|\Phi_{ab}^{rs}\rangle + \sum_{\substack{r<s<t\\ a<b<c}} c_{abc}^{rst}\,|\Phi_{abc}^{rst}\rangle + \dots \qquad (62)$$

Here |Φ_a^r⟩ denotes a singly excited determinant (an electron promoted from occupied spinorbital a to virtual spinorbital r), |Φ_ab^rs⟩ a doubly excited state wavefunction, and so on. We now optimize our total CI wavefunction via the Ritz variational method,

$$E = \frac{\langle\Psi_{CI}|\hat{H}|\Psi_{CI}\rangle}{\langle\Psi_{CI}|\Psi_{CI}\rangle} .$$

If we then expand the CI wavefunction in a linear combination of our Slater determinants, we get

$$E = \frac{\sum_i\sum_j c_i^* c_j\,\langle\Phi_i|\hat{H}|\Phi_j\rangle}{\sum_i\sum_j c_i^* c_j\,\langle\Phi_i|\Phi_j\rangle} .$$

The variational procedure corresponds to setting all the derivatives of the energy with respect to the expansion coefficients c_i equal to zero. Rearranging, we get

$$E\sum_i\sum_j c_i^* c_j\,\langle\Phi_i|\Phi_j\rangle = \sum_i\sum_j c_i^* c_j\,\langle\Phi_i|\hat{H}|\Phi_j\rangle$$

and differentiating with respect to a coefficient c_j,

$$\frac{\partial E}{\partial c_j}\sum_{ik} c_i^* c_k\,\langle\Phi_i|\Phi_k\rangle + 2E\sum_i c_i\,\langle\Phi_i|\Phi_j\rangle = 2\sum_i c_i\,\langle\Phi_i|\hat{H}|\Phi_j\rangle + \sum_{ik} c_i^* c_k\,\frac{\partial}{\partial c_j}\langle\Phi_i|\hat{H}|\Phi_k\rangle .$$

The first term vanishes from the minimization of the energy, and the last term vanishes since the matrix elements do not depend on the coefficients. Since the basis functions are orthonormal, we obtain

$$E\sum_i c_i\,\delta_{ij} = \sum_i c_i\,\langle\Phi_i|\hat{H}|\Phi_j\rangle$$

$$\sum_i H_{ij}\,c_i - E\sum_i\delta_{ij}\,c_i = 0$$

where H_{ij} = ⟨Φ_i|Ĥ|Φ_j⟩. Since there is one such equation for each j, we can collect them into a matrix equation,

$$(\mathbf{H} - E\,\mathbf{I})\,\mathbf{c} = 0, \qquad \mathbf{H}\mathbf{c} = E\mathbf{c}$$

$$\begin{pmatrix} H_{00}-E & H_{01} & \dots & H_{0j} & \dots \\ H_{10} & H_{11}-E & \dots & H_{1j} & \dots \\ \vdots & \vdots & \ddots & \vdots & \\ H_{j0} & \dots & \dots & H_{jj}-E & \dots \\ \vdots & \vdots & & \vdots & \ddots \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_j \\ \vdots \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \vdots \end{pmatrix} \qquad (63)$$

Solving these secular equations is equivalent to diagonalizing the CI matrix. The CI energy is then obtained as the lowest eigenvalue of the CI matrix, and the corresponding

eigenvectors contain the ci coefficients in front of the determinants in Eq. 62. In this case, the second lowest eigenvalue corresponds to the first excited state, the third lowest is the second excited state and so on. We have mentioned that the CI expansion is typically truncated at specific excited configurations. From studying the Slater-Condon rules, we know that only singly and

doubly excited states can interact directly with the reference state: matrix elements between determinants that differ in more than two spinorbitals vanish. Due to Brillouin's theorem, the matrix elements ⟨S|H|Φ₀⟩ are also zero. The structure of the CI matrix, in the basis of the HF Slater determinant and its excited configurations, is then

$$\mathbf{H} = \begin{pmatrix}
\langle\Phi_0|H|\Phi_0\rangle & 0 & \langle\Phi_0|H|D\rangle & 0 & 0 & \dots \\
0 & \langle S|H|S\rangle & \langle S|H|D\rangle & \langle S|H|T\rangle & 0 & \dots \\
\langle D|H|\Phi_0\rangle & \langle D|H|S\rangle & \langle D|H|D\rangle & \langle D|H|T\rangle & \langle D|H|Q\rangle & \dots \\
0 & \langle T|H|S\rangle & \langle T|H|D\rangle & \langle T|H|T\rangle & \langle T|H|Q\rangle & \dots \\
0 & 0 & \langle Q|H|D\rangle & \langle Q|H|T\rangle & \langle Q|H|Q\rangle & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} \qquad (64)$$

where |Φ₀⟩ is the Hartree-Fock reference state, |S⟩ denotes the singly excited states, |D⟩ the doubly excited states, and so on. The blocks ⟨X|H|Y⟩ which are not necessarily zero may still be sparse, meaning that most of their elements are zero. Let us look at a matrix element belonging to the block ⟨D|H|Q⟩: the element ⟨Φ_ab^rs|H|Φ_cdef^tuvw⟩ will be nonzero only if φ_a and φ_b are contained in the set {φ_c, φ_d, φ_e, φ_f}, and if φ_r and φ_s are contained in the set {φ_t, φ_u, φ_v, φ_w}. The task at hand is then to calculate each matrix element and to diagonalize the CI matrix. As we include more and more excitations in the CI expansion, we capture more and more of the electron correlation. CI also needs large basis sets in order to capture the correlation energy efficiently. We can increase the size of the CI matrix by adding more excited configurations or by increasing the basis-set size. However, there is a problem with adding more and more excitations or basis functions: it is very expensive to do so. If the number of spinorbitals produced by HF is 2M, the number of determinants that can be constructed is $\binom{2M}{N}$, where N is the number of electrons. Taking into account all possible excitations in the expansion is known as Full CI (FCI), and this method scales roughly as O(N!). Because of the complexity of Full CI, what is usually done is to restrict the expansion to the lower excitations and truncate the CI matrix; e.g., CI with doubles (CID) takes into consideration only double excitations. Since the single excitations do not couple directly to the ground state, the most significant contribution to the correlation energy must come from the double excitations, which are the first excitations coupled to the HF Slater determinant. This gives a reduced matrix which is much more feasible for practical computation; however, it introduces another problem: the loss of size extensivity.
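As an illustration of Eq. (63) and of the factorial growth of the determinant space, here is a minimal numerical sketch (not from the notes); the matrix elements below are hypothetical numbers chosen only to show the mechanics.

```python
import numpy as np
from math import comb

# Toy "CI matrix" in an orthonormal determinant basis (hypothetical values).
H = np.array([[-1.00, 0.00, 0.15],
              [ 0.00, 0.80, 0.10],
              [ 0.15, 0.10, 1.20]])

# Solving (H - E I) c = 0 is an ordinary symmetric eigenvalue problem.
E, C = np.linalg.eigh(H)
print("CI energies:", E)                 # lowest eigenvalue = ground-state CI energy
print("ground-state coefficients:", C[:, 0])

# Growth of the full determinant space: binom(2M, N) determinants
# for N electrons in 2M spinorbitals.
for M, N in [(10, 4), (20, 10), (50, 20)]:
    print(f"2M = {2*M:3d}, N = {N:3d}: {comb(2*M, N):,} determinants")
```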

1. Size extensivity of CI

A method is said to be size extensive if the energy calculated scales linearly with the number of particles N, i.e. the word "extensive" is used in the same sense as in

thermodynamics. The truncated CI will introduce errors in the wave function, which will in turn cause errors in the energy and all other properties. A particular result of truncating the N-electron basis is that the CI energies obtained are no longer size extensive. Let us show that CI is not size extensive through an example. Consider two noninteracting

hydrogen (H₂) molecules. We expect the total energy of the two molecules to be the sum of the energies of the individual molecules, i.e., E(2H₂) = 2E(H₂). Using CID for a single H₂ molecule gives the exact energy (within the basis); however, if we use the CID method for the pair of molecules, the energy of the two molecules at large separation will not be the same as the sum of their energies calculated separately. The CI wavefunction for this system has the form

$$|\Psi\rangle = \mathcal{A}\,|\psi_a\rangle|\psi_b\rangle$$

$$|\Psi\rangle = \mathcal{A}\left(a_0|\sigma^2\rangle_a + a_2|\sigma^{*2}\rangle_a\right)\left(a_0|\sigma^2\rangle_b + a_2|\sigma^{*2}\rangle_b\right)$$

$$|\Psi\rangle = \mathcal{A}\left(a_0^2\,|\sigma^2\rangle_a|\sigma^2\rangle_b + a_0 a_2\,|\sigma^2\rangle_a|\sigma^{*2}\rangle_b + a_0 a_2\,|\sigma^{*2}\rangle_a|\sigma^2\rangle_b + a_2^2\,|\sigma^{*2}\rangle_a|\sigma^{*2}\rangle_b\right)$$

where $\mathcal{A}$ is the antisymmetrization operator, the state |σ²⟩ corresponds to both electrons being in the ground (bonding) orbital, and |σ*²⟩ corresponds to both electrons in the excited (antibonding) orbital. Notice that the last term in the expansion contains double excitations on both molecules. With respect to the dimer this is a quadruply excited configuration, which is truncated out in a CID calculation. In order to recover the missing part of the expected total energy, we would have to include quadruply excited states in the CI basis, since local double excitations can occur simultaneously on both subsystems. It is clear that the fraction of the correlation energy recovered by a truncated CI diminishes as the size of the system increases, making it a progressively less accurate method. If we do truncate CI, we should also realize that the spinorbitals in the Slater determinants come from the HF method, so we should allow those orbitals to re-optimize as we take linear combinations of the determinants. We should also consider, for example, not exciting the inner-shell orbitals, since the computational cost of those excitations is large while their effect on energy differences is small. We can neglect these orbitals by "freezing" the core orbitals and allowing excitations only among the higher orbitals.
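The failure of size extensivity described above can be checked numerically in a toy model (a sketch under assumptions made up for illustration): each H₂ molecule is reduced to two configurations, |σ²⟩ and |σ*²⟩, coupled by a hypothetical matrix element v with excitation energy Δ; CID for the noninteracting dimer keeps the reference and the two local double excitations but discards the quadruple excitation.

```python
import numpy as np

delta, v = 1.0, 0.2        # hypothetical excitation energy and coupling (a.u.)

# Monomer: 2x2 "CI" problem in the basis {|sigma^2>, |sigma*^2>}.
H_mono = np.array([[0.0, v],
                   [v, delta]])
E_mono = np.linalg.eigvalsh(H_mono)[0]      # exact (= CID = FCI) monomer energy

# Noninteracting dimer, full (FCI) space: Kronecker sum of the two monomers.
I2 = np.eye(2)
H_dimer = np.kron(H_mono, I2) + np.kron(I2, H_mono)
E_fci = np.linalg.eigvalsh(H_dimer)[0]      # equals exactly 2 * E_mono

# CID for the dimer: drop the quadruply excited |sigma*^2>_A |sigma*^2>_B
# configuration (index 3 in the Kronecker-product basis).
keep = [0, 1, 2]
E_cid = np.linalg.eigvalsh(H_dimer[np.ix_(keep, keep)])[0]

print(f"2 * E(H2)  = {2*E_mono:.6f}")
print(f"FCI dimer  = {E_fci:.6f}")
print(f"CID dimer  = {E_cid:.6f}   (above 2*E(H2): not size extensive)")
```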

2. MCSCF, CASSCF, RASSCF, and MRCI

The Multi-Configurational Self-Consistent Field (MCSCF) method is another approach related to the CI method, in which we decide on a set of determinants that can sufficiently describe our system. Each of the determinants is constructed from spinorbitals that are not fixed, but optimized so as to lower the total energy as much as possible. The main idea here is to use the variational principle to optimize not only the coefficients in front of the determinants, but also the spinorbitals used to construct the determinants. In a sense, the MCSCF method is a combination of the CI method and the HF method (if the number of determinants chosen is just 1, we get back the HF method). The classical MCSCF approach follows very closely the Ritz variational method described before. We start with the MCSCF wavefunction, which has the form of a finite linear combination of Slater determinants Φ_I,

$$\Psi^{MCSCF} = \sum_I c_I\,\Phi_I$$

where c_I are the variational coefficients. Next, we calculate the coefficients in front of the determinants using the variational method, without changing the determinants. Then we vary the spinorbitals in the determinants at fixed CI coefficients to obtain the best determinants. Finally, we repeat by going back and expanding the MCSCF wavefunction in terms of the newly optimized determinants. The MCSCF method is mainly used to generate a qualitatively correct wavefunction, i.e., to recover the "static" part of the correlation. The goal is usually not to recover a large fraction of the total correlation energy, but to recover all the changes that occur in the correlation energy for a given process. A major problem that this procedure faces is figuring out which configurations are necessary to include for the property of interest. The Complete Active Space Self-Consistent Field (CASSCF) method is a special case of the MCSCF method. From the molecular orbitals computed in HF, we partition the space of these orbitals into an active and an inactive space. The inactive space of spinorbitals is chosen from the low-energy orbitals, i.e., the orbitals doubly occupied in all determinants (inner shells). The remaining spinorbitals belong to the active space. Within the active space, we consider all possible occupancies and excitations of the active spinorbitals to obtain the set of determinants in the expansion of the MCSCF wavefunction (hence, "complete"). A common notation used for CASSCF is the following: [n,m]-CASSCF, where n is the number of electrons distributed in all possible ways among m orbitals. For example, [11,8]-CASSCF for the molecule NO pertains to the problem of 11 valence electrons being distributed among all configurations that can be constructed from 8 molecular orbitals. As for any full CI expansion, the CASSCF expansion quickly becomes too large to be useful, except for fairly small active spaces. To overcome this problem, a variation called the Restricted Active Space Self-Consistent Field (RASSCF) method is used. In the RASSCF method, the active orbitals are divided into 3 subspaces, RAS1, RAS2, and RAS3, each of which has restrictions on the excitations allowed. A typical example is one where RAS1 includes occupied orbitals out of which excitations are allowed from the HF reference determinant, RAS2 includes orbitals treated at the full CI level or limited to SDTQ excitations, and RAS3 includes virtual orbitals that are empty in the HF determinant. The full CI expansion within the active space severely restricts the number of orbitals and electrons that can be treated by CASSCF methods. Additional configurations beyond those from the RAS2 space can be generated by allowing excitations from one space to another, for example by allowing 2 electrons to be excited from RAS1 to RAS3. In essence, a typical RASSCF calculation generates configurations by a combination of a full CI in a small number of orbitals (RAS2) and a CISD in a somewhat larger orbital space (RAS1 and RAS3). Excitation energies of truncated CI methods such as the ones described above are generally too high, since the excited states are not as well correlated as the ground state. To correlate the ground and excited states on a more equal footing, one can use a method called Multi-Reference Configuration Interaction (MRCI), which uses more than one reference determinant, from which singly, doubly, and higher excited determinants are generated (this set of reference determinants is called the model space). MRCI gives a better correlation of the ground state, which is important if the system under consideration has more than one dominant determinant, since some higher excited determinants are also taken into the CI space. The CI expansion is then obtained by replacing the spinorbitals in the model space by other virtual orbitals.

B. Basis sets and basis set convergence

The standard wave functions used in solving the Schrödinger equation for atoms and molecules are constructed from antisymmetrized products of spinorbitals. In most methods, these spinorbitals are generated by expansion in a finite set of simple basis functions. The choice of basis functions for a molecular calculation is therefore important and depends on which system we wish to analyze. There are hundreds of basis sets that can be used, each optimized for a specific purpose. The most general types include Slater-type orbitals (STO) and Gaussian-type orbitals (GTO). Here, we will consider Thom Dunning's correlation-consistent basis sets, which were designed for converging post-HF calculations systematically to the complete-basis-set limit using extrapolation techniques. Correlation-consistent basis sets are built by adding functions corresponding to electron shells to a core set of HF functions. What we need for carrying out accurate correlated calculations is not only a set of spinorbitals that resemble as closely as possible the occupied orbitals of the atomic systems, but also a set of virtual correlating orbitals into which the correlated electrons can be excited. Obvious candidates here are the canonical orbitals from the HF calculations; however, since the lowest virtual HF orbitals are very diffuse, they are not well suited for correlating the ground-state electrons, except when the full set of orbitals is used. Another strategy is to generate correlating atomic orbitals for molecular calculations by relying on the energy criterion alone, i.e., to adjust the exponents of the correlating orbitals so as to maximize their contribution to the correlation energy.
By doing this, we should be able to generate sets of correlating orbitals that are more compact, i.e., contain fewer primitive basis functions. This construction yields correlation-consistent basis sets: each correlating orbital is represented by a single primitive chosen so as to maximize its contribution to the correlation energy, and all correlating orbitals that make similar contributions to the correlation energy are added simultaneously. A hierarchy of basis sets can then be set up that is correlation-consistent in the sense that each basis set contains all correlating orbitals that lower the energy by comparable amounts, as well as all orbitals that lower the energy by larger amounts. The main advantage of this construction is that it allows us to employ smaller primitive sets. Correlation-consistent basis sets were designed to converge systematically to the complete-basis-set limit using extrapolation techniques. Let us consider the structure of the correlation-consistent basis sets in more detail. We will start with the two main families of basis sets, cc-pVXZ and cc-pCVXZ, where X = D, T, Q, 5, 6, 7, ... Here, cc-p stands for correlation-consistent polarized, and V and CV stand for valence and core-valence, respectively; p indicates the presence of polarization functions in the basis set. XZ is the zeta label, which tells us how many basis functions are used for each atomic orbital. As we increase X, we add functions of higher angular momentum, which span a larger angular space. The basis functions are added in shells; e.g., for the C atom, cc-pVDZ consists of [3s2p1d], cc-pVTZ of [4s3p2d1f], and cc-pVQZ of [5s4p3d2f1g]. The main difference between the two families is that the cc-pCVXZ basis sets extend the standard cc-pVXZ sets with additional flexibility in the core region. A prefix aug- can be added to the two families of basis sets above to indicate that one set of diffuse functions is added for every angular momentum present in the basis, improving the flexibility in the outer valence region. As the number of basis functions increases, the wavefunctions become better represented and the energy decreases, approaching the complete-basis-set (CBS) limit. An infinite number of basis functions is impossible to employ in practice, but we can try to estimate the energy at the CBS limit. By using hierarchical basis sets, i.e., correlation-consistent sets with adjacent cardinal numbers, we can calculate the energy at a couple of points and then extrapolate to higher cardinal numbers or to the CBS limit. If we look at the dependence of the HF energy on the basis-set size, we see that the error in the HF energy should decay exponentially with the cardinal number X. The correlation energy converges differently, as E ∝ X⁻³. This allows us to carry out calculations at, for example, the Dζ and Tζ levels and fit the energies as a function of X (e.g., on a logarithmic plot). This fit can then be used to extrapolate the energies to higher ζ, or even to the CBS limit.
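As an illustration of such an extrapolation, here is a minimal sketch of a common two-point formula consistent with the X⁻³ behavior of the correlation energy; the numerical energies below are hypothetical placeholders, not computed values.

```python
def cbs_two_point(e_x, x, e_y, y):
    """Two-point extrapolation assuming E(X) = E_CBS + A * X**-3."""
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)

# Hypothetical correlation energies (hartree) in cc-pVTZ (X=3) and cc-pVQZ (X=4).
e_tz, e_qz = -0.27540, -0.28310
print(f"E_CBS estimate = {cbs_two_point(e_qz, 4, e_tz, 3):.5f} Eh")
```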

C. Explicitly-correlated methods

In this section, we will consider methods that utilize wavefunctions that depend

explicitly on the interelectronic distance r₁₂. Such explicitly correlated wavefunctions lead to much faster convergence of the CI-type expansion and improve dramatically the accuracy of the energy. Recall that in the HF method the electrons interact only in an averaged way, so that the HF wavefunction has no explicit dependence on r₁₂ and, in particular, no special structure near r₁₂ = 0. This approximation overestimates the probability of finding two electrons close together and thus overestimates the electron-repulsion energy. To account for the correlation between the electrons, we must somehow build the interelectronic distance into the calculation. However, these explicitly correlated methods do bring a couple of problems. First, the resulting algorithms are much more difficult to implement. Second, they are incompatible with concepts such as orbitals and electron configurations, since they abandon the one-electron approximation from the very beginning.

1. Coulomb cusp

We will consider the behavior of the exact wavefunctions for coinciding particles; in particular, the points where the electronic Hamiltonian becomes singular and gives rise to a cusp in the wavefunction. For simplicity, we will examine the ground state of He, for which we can easily generate accurate approximations to the true wavefunction. The Hamiltonian of He is given by

$$H = -\frac{1}{2}\nabla_1^2 - \frac{1}{2}\nabla_2^2 - \frac{2}{|\mathbf{r}_1|} - \frac{2}{|\mathbf{r}_2|} + \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|} .$$

We can see here that the singularities of this Hamiltonian occur if r₁ = 0, r₂ = 0, or |r₁ − r₂| = 0. At these points, the exact solution of Schrödinger's equation must provide contributions to HΨ that balance the singularities in H, to ensure that the local energy remains constant and equal to the energy eigenvalue E. The only possible source of this balancing is the kinetic energy term. It is convenient to express the Hamiltonian
in terms of the coordinates r₁, r₂, and r₁₂, where r₁ and r₂ are the distances of the electrons from the nucleus and r₁₂ is the interelectronic distance. Doing so, we get

$$H = -\frac{1}{2}\sum_{i=1}^{2}\left(\frac{\partial^2}{\partial r_i^2} + \frac{2}{r_i}\frac{\partial}{\partial r_i} + \frac{2Z}{r_i}\right) - \left(\frac{\partial^2}{\partial r_{12}^2} + \frac{2}{r_{12}}\frac{\partial}{\partial r_{12}} - \frac{1}{r_{12}}\right) - \left(\frac{\mathbf{r}_1\cdot\mathbf{r}_{12}}{r_1 r_{12}}\frac{\partial}{\partial r_1} + \frac{\mathbf{r}_2\cdot\mathbf{r}_{21}}{r_2 r_{21}}\frac{\partial}{\partial r_2}\right)\frac{\partial}{\partial r_{12}} .$$

Schrödinger's equation must be well behaved, so the singularities must somehow cancel, leading to nuclear and interelectronic cusps. In order for the singularities to cancel, the terms multiplying 1/r_i and 1/r₁₂ must cancel. We will only look at the electron-electron cusp, for which the terms containing 1/r₁₂ must vanish in HΨ. From the second term above, we find that this leads to

$$\left.\frac{\partial\Psi}{\partial r_{12}}\right|_{r_{12}=0} = \frac{1}{2}\,\Psi(r_{12}=0)$$

which describes the behavior of the wavefunction when the electrons coincide and represents the electron-electron cusp condition. This cusp condition is impossible to fulfill using orbital-based wave functions. If we do an FCI expansion for He in terms of Slater-type orbitals, we get

$$\Psi^{FCI} = e^{-\zeta(r_1+r_2)}\sum_{ijk} c_{ijk}\,\left(r_1^i r_2^j + r_1^j r_2^i\right)\, r_{12}^{2k}$$

where the summation is over all nonnegative integers. This FCI expansion thus contains

all possible combinations of powers of r1, r2 and r12. Our wavefunction now includes the interelectronic distance r12; however, since only even powers of r12 are present, the cusp condition can never be satisfied. This missing cusp condition in the wavefunction leads to slow convergence of CI with respect to the basis set. This is an intrinsic problem shared by all wavefunction expansions in orbital products. In order to fix this problem

and gain faster convergence, we will introduce an explicit linear dependence on r₁₂ into the wavefunction,

$$\Psi^{CI}_{r_{12}} = \left(1 + \tfrac{1}{2}r_{12}\right)\Psi^{CI} .$$

Now if we take the derivative, we get

$$\left.\frac{\partial\Psi^{CI}_{r_{12}}}{\partial r_{12}}\right|_{r_{12}=0} = \frac{1}{2}\,\Psi^{CI}(r_{12}=0) = \frac{1}{2}\,\Psi^{CI}_{r_{12}}(r_{12}=0)$$

which satisfies the Coulomb cusp condition exactly. In general, we may impose the correct Coulomb cusp behavior on any determinant-based wave function Φ by multiplying the expansion by some correlating function γ, such that

$$\gamma = 1 + \frac{1}{2}\sum_{i<j} r_{ij} .$$
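A quick symbolic check of this cusp construction (not part of the notes): take any smooth factor containing only even powers of r₁₂ as a stand-in for the orbital-product expansion, multiply it by (1 + r₁₂/2), and verify that the derivative at r₁₂ = 0 equals half the value of the function there. The specific smooth factor below is a hypothetical choice for illustration.

```python
import sympy as sp

r = sp.symbols('r12', nonnegative=True)
# Smooth factor with only even powers of r12 (stand-in for a CI expansion).
phi = sp.exp(-r**2) * (1 + r**2 / 3)
psi = (1 + r / 2) * phi            # correlating prefactor (1 + r12/2)

lhs = sp.diff(psi, r).subs(r, 0)   # dPsi/dr12 at r12 = 0
rhs = sp.Rational(1, 2) * psi.subs(r, 0)
print(lhs, rhs, sp.simplify(lhs - rhs) == 0)   # both 1/2, so the cusp condition holds
```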

2. Hylleraas function

The Hylleraas function is one such correlating expansion. Hylleraas was the first to succeed in constructing an accurate wavefunction for the singlet S ground state of helium. If we
generalize the FCI expansion in STOs to include all powers of r₁₂, we obtain

$$\Psi^{H} = e^{-\zeta(r_1+r_2)}\sum_{ijk} c_{ijk}\,\left(r_1^i r_2^j + r_2^i r_1^j\right)\, r_{12}^{k}$$

which is usually expressed as

$$\Psi^{H} = e^{-\zeta s}\sum_{ijk} c_{ijk}\, s^i\, t^{2j}\, u^k$$

where s, t, and u are the so-called Hylleraas coordinates,

$$s = r_1 + r_2, \qquad t = r_1 - r_2, \qquad u = r_{12} .$$

The Hylleraas function is usually truncated according to i + 2j + k ≤ N; however, it still presents very high accuracy with only a few terms, especially with Helium. This function is also only applicable to few electron atomic systems, since the complexity of the function increases dramatically with more electrons.
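The truncation rule i + 2j + k ≤ N is easy to enumerate; the short sketch below (an illustration, not from the notes) lists the exponent triples of the retained terms s^i t^(2j) u^k and counts how quickly the basis grows with N.

```python
from itertools import product

def hylleraas_terms(N):
    """Exponent triples (i, j, k) of the terms s^i t^(2j) u^k with i + 2j + k <= N."""
    return [(i, j, k)
            for i, j, k in product(range(N + 1), repeat=3)
            if i + 2 * j + k <= N]

for N in range(1, 6):
    print(N, len(hylleraas_terms(N)))
```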

3. Slater geminals

Geminals, or two-electron functions, are another type of explicitly correlated functions; they represent a generalization of single-electron orbitals that accounts for intra-orbital correlation effects. The wavefunction is expanded in two-electron basis functions in addition to orbital products. The primary cusp condition suggests that such an expansion is effective for geminal basis functions with the small-r₁₂ behavior

$$f_{12} = \frac{1}{2} r_{12} + O\!\left(r_{12}^2\right) .$$

Including these f₁₂ functions requires two-electron integrals over operators such as f₁₂ and r₁₂⁻¹, as well as

$$K_{12}^{(Q)} = \left(\nabla_1 f_{12}\right)\cdot\left(\nabla_1 f_{12}\right) .$$

Most explicitly correlated methods have employed correlation factors such as the linear r₁₂ (R12) form or Gaussian-type geminals (GTG),

$$f_{12}^{R12} = \frac{1}{2} r_{12}, \qquad f_{12}^{GTG} = \sum_{G}^{N_G} c_G\, e^{-\zeta_G r_{12}^2} .$$

A downside of R12 functions is that the associated energies do not always recover a sufficient fraction of the correlation energy. GTGs do not suffer from such a problem at large r₁₂; however, they never fulfill the cusp condition exactly. Despite this, a modest number of GTGs can still represent a suitable range of r₁₂ accurately. The main disadvantage is that the computation of the integrals involved can become relatively costly, especially for operators quadratic in f₁₂, which involve on the order of N_G²/2 primitive operations. Slater-type geminals (STG), or Slater geminals, of the form

$$f_{12}^{STG} = -\frac{r_c}{2}\, e^{-r_{12}/r_c}$$

where r_c is a scale-length parameter, remedy the above problems of GTGs. These functions use STOs as geminal basis functions to incorporate the interelectronic distance. The STG form reduces the quadratic operators to simple exponentials, i.e.,

$$K_{12}^{(Q)} = \frac{1}{4}\, e^{-2 r_{12}/r_c}, \qquad \left(f_{12}^{STG}\right)^2 = r_c^2\, K_{12}^{(Q)} .$$

It turns out that STGs provide better results than methods based on GTG or R12 factors. For example, results of at least 5ζ quality are obtained in a Tζ basis when STGs are used. From a computational point of view, STGs are also more efficient due to their compact and short-range form.
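The different short-range behavior of the three correlation factors can be checked numerically; in the sketch below (illustrative constants, not prescribed values) the slope at r₁₂ = 0 is 1/2 for the linear R12 and STG factors, satisfying the cusp condition, while it vanishes for a Gaussian geminal.

```python
import numpy as np

rc = 1.0   # hypothetical scale-length parameter
h = 1e-6   # step for a forward-difference slope at r12 = 0

f_r12 = lambda r: 0.5 * r                       # linear r12 (R12) factor
f_gtg = lambda r: -0.5 * np.exp(-1.5 * r**2)    # single Gaussian geminal (illustrative)
f_stg = lambda r: -(rc / 2) * np.exp(-r / rc)   # Slater geminal

for name, f in [("R12", f_r12), ("GTG", f_gtg), ("STG", f_stg)]:
    slope = (f(h) - f(0.0)) / h
    print(f"{name}: df/dr12 at 0 = {slope:.4f}   (cusp condition requires 0.5)")
```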

4. Explicitly-correlated Gaussian functions

Explicitly correlated Gaussians were proposed to describe N-particle wavefunctions using a basis of exponential functions whose argument involves the squares of the interparticle distances,

$$\psi^{ECG} = \mathcal{A}\, e^{-\sum_{i<j}^{N}\alpha_{ij}\,(\mathbf{r}_i - \mathbf{r}_j)^2}$$

N ≥ 3. The main disadvantages they seem to exhibit are that they are unable to describe the electron-nuclear cusp, they vanish too quickly for large distances, and the Gaussian correlation factor does not reproduce the electron-electron cusp, as mentioned in the previous section.

VIII. MANY-BODY PERTURBATION THEORY (MBPT)

The second-quantized formalism is perhaps most extensively utilized in the field of perturbation theory of many-electron systems. This is due to the tedious derivations necessary to arrive at feasible working formulae, especially at the higher orders of PT.

A. Rayleigh Schrödinger perturbation theory (classical derivation)

Let us first review the essence of nondegenerate Rayleigh-Schrödinger perturbation theory. Consider the time-independent Schrödinger equation

$$\hat{H}\Psi_n = E_n\Psi_n . \qquad (65)$$

Finding solutions to this equation is, in most cases, a difficult task. Assume, however, that the Hamiltonian consists of two Hermitian parts, a zeroth-order part and a perturbation,

$$\hat{H} = \hat{H}_0 + \hat{V} . \qquad (66)$$

It is convenient to write this in the form

$$\hat{H} = \hat{H}_0 + \lambda\hat{V} \qquad (67)$$

where λ is an "order parameter" that is used to classify the various contributions by their order. We assume that the solutions of the zeroth-order eigenvalue problem for Ĥ₀ are known,

$$\hat{H}_0\Phi_n = E_n^{(0)}\Phi_n \qquad (68)$$

with

$$\langle\Phi_m|\Phi_n\rangle = \delta_{mn} . \qquad (69)$$

If Φn is nondegenerate, it is possible to number the solutions in such a way that

$$\lim_{\lambda\to 0}\Psi_n = \Phi_n, \qquad \lim_{\lambda\to 0}E_n = E_n^{(0)} . \qquad (70)$$

And if there are degeneracies, it is possible to choose the zeroth-order solutions so that (70) is still satisfied. We define

$$\chi_n = \Psi_n - \Phi_n, \qquad \Delta E_n = E_n - E_n^{(0)} . \qquad (71)$$

Here we have partitioned Ψ_n into two parts, one parallel (i.e., proportional) to Φ_n and the other orthogonal to it. It is therefore convenient to use intermediate normalization:

$$\langle\Phi_n|\Phi_n\rangle = 1, \qquad \langle\chi_n|\Phi_n\rangle = 0,$$
$$\langle\Psi_n|\Phi_n\rangle = \langle\Phi_n + \chi_n|\Phi_n\rangle = 1, \qquad (72)$$
$$\langle\Psi_n|\Psi_n\rangle = 1 + \langle\chi_n|\chi_n\rangle .$$

To proceed further, we use the order parameter λ and expand:

$$\Psi_n = \Phi_n + \chi_n = \Psi_n^{(0)} + \lambda\Psi_n^{(1)} + \lambda^2\Psi_n^{(2)} + \dots \qquad \left(\Psi_n^{(0)} \equiv \Phi_n\right) \qquad (73)$$
$$E_n = E_n^{(0)} + \Delta E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \dots$$

Substituting into the Schrödinger equation

$$\left(\hat{H} - E_n\right)\Psi_n = 0 \qquad (74)$$

with Ĥ = Ĥ₀ + λV̂, we get

$$\left(\hat{H}_0 + \lambda\hat{V} - E_n^{(0)} - \lambda E_n^{(1)} - \lambda^2 E_n^{(2)} - \dots\right)\left(\Psi_n^{(0)} + \lambda\Psi_n^{(1)} + \lambda^2\Psi_n^{(2)} + \dots\right) = 0 . \qquad (75)$$

Equating coefficients of powers of λ gives, for λ⁰, λ¹, and λ², respectively:

$$\left(\hat{H}_0 - E_n^{(0)}\right)\Psi_n^{(0)} = 0 \quad \text{(zeroth order)}, \qquad (76)$$
$$\left(\hat{H}_0 - E_n^{(0)}\right)\Psi_n^{(1)} = \left(E_n^{(1)} - \hat{V}\right)\Psi_n^{(0)} \quad \text{(first order)}, \qquad (77)$$
$$\left(\hat{H}_0 - E_n^{(0)}\right)\Psi_n^{(2)} = \left(E_n^{(1)} - \hat{V}\right)\Psi_n^{(1)} + E_n^{(2)}\Psi_n^{(0)} \quad \text{(second order)} \qquad (78)$$

and in general, for λᵐ, the mth-order equation

$$\left(\hat{H}_0 - E_n^{(0)}\right)\Psi_n^{(m)} = \left(E_n^{(1)} - \hat{V}\right)\Psi_n^{(m-1)} + \sum_{l=0}^{m-2} E_n^{(m-l)}\Psi_n^{(l)} \qquad (79)$$

which becomes

$$\left(E_n^{(0)} - \hat{H}_0\right)\Psi_n^{(m)} = \hat{V}\Psi_n^{(m-1)} - \sum_{l=0}^{m-1} E_n^{(m-l)}\Psi_n^{(l)} . \qquad (80)$$

In order to get expressions for E_n^{(m)} we apply ⟨Φ_n| to each equation and integrate. For λ¹ we get

$$\langle\Phi_n|\hat{H}_0 - E_n^{(0)}|\Psi_n^{(1)}\rangle = \langle\Phi_n|E_n^{(1)} - \hat{V}|\Phi_n\rangle . \qquad (81)$$

By the Hermitian property of Ĥ₀ we have

$$\underbrace{\langle\left(\hat{H}_0 - E_n^{(0)}\right)\Phi_n|\Psi_n^{(1)}\rangle}_{=0} = E_n^{(1)} - \underbrace{\langle\Phi_n|\hat{V}|\Phi_n\rangle}_{\equiv V_{nn}} \qquad (82)$$

and so

$$E_n^{(1)} = \langle\Phi_n|\hat{V}|\Phi_n\rangle = V_{nn} . \qquad (83)$$

Thus we have obtained E_n^{(1)} without knowledge of Ψ_n^{(1)}, and the same can be done for each order m:

$$\underbrace{\langle\Phi_n|E_n^{(0)} - \hat{H}_0|\Psi_n^{(m)}\rangle}_{=0} = \langle\Phi_n|\hat{V}|\Psi_n^{(m-1)}\rangle - \sum_{l=0}^{m-1} E_n^{(m-l)}\underbrace{\langle\Phi_n|\Psi_n^{(l)}\rangle}_{=\delta_{l0}} \qquad (84)$$

giving

$$E_n^{(m)} = \langle\Phi_n|\hat{V}|\Psi_n^{(m-1)}\rangle . \qquad (85)$$

Thus, in principle, we can obtain each E_n^{(m)} from the previous Ψ_n^{(m-1)} and then solve for Ψ_n^{(m)}, etc., while always maintaining ⟨Φ_n|Ψ_n^{(m)}⟩ = 0 (m > 0). To calculate Ψ_n^{(m)} we can expand it in terms of the known zeroth-order solutions Φ_k. This exploits the fact that the eigenfunctions of any semibounded Hermitian operator form a complete set:

$$\Psi_n^{(m)} = \sum_k a_{kn}^{(m)}\Phi_k = \sum_k |\Phi_k\rangle\langle\Phi_k|\Psi_n^{(m)}\rangle, \qquad a_{kn}^{(m)} = \langle\Phi_k|\Psi_n^{(m)}\rangle \ \text{(to be determined)} \qquad (86)$$

To obtain a_{kn}^{(m)} we multiply the mth-order equation by ⟨Φ_k| and integrate:

$$\underbrace{\langle\Phi_k|E_n^{(0)} - \hat{H}_0|\Psi_n^{(m)}\rangle}_{\left(E_n^{(0)} - E_k^{(0)}\right)a_{kn}^{(m)}} = \underbrace{\langle\Phi_k|\hat{V}|\Psi_n^{(m-1)}\rangle}_{\sum_j\langle\Phi_k|\hat{V}|\Phi_j\rangle\langle\Phi_j|\Psi_n^{(m-1)}\rangle} - \sum_{l=0}^{m-1} E_n^{(m-l)}\underbrace{\langle\Phi_k|\Psi_n^{(l)}\rangle}_{=a_{kn}^{(l)}} \qquad (87)$$

Thus

$$\left(E_n^{(0)} - E_k^{(0)}\right) a_{kn}^{(m)} = \sum_j V_{kj}\, a_{jn}^{(m-1)} - \sum_{l=0}^{m-1} E_n^{(m-l)}\, a_{kn}^{(l)} . \qquad (88)$$

In this equation the l = 0 contributions are to be interpreted as a_{kn}^{(0)} = ⟨Φ_k|Φ_n⟩ = δ_{kn}. This result provides a system of equations for the a_{kn}^{(m)} coefficients, to be solved order by order, but the first thing to notice is that we have no equation for a_{nn}^{(m)}; this coefficient is arbitrary, corresponding to the arbitrariness of adding any multiple of the zeroth-order solution Φ_n. This arbitrariness appears for each order Ψ_n^{(m)} separately. The following choice of intermediate normalization can thus be made for each order:

$$\langle\Phi_n|\Psi_n^{(m)}\rangle = 0 \quad (m>0), \qquad a_{nn}^{(m)} = 0 \quad (m>0).$$

Consequently

$$a_{nn}^{(m)} = \delta_{m0} . \qquad (89)$$

Since a_{kn}^{(0)} = δ_{kn}, the first-order equation becomes:

$$\left(E_n^{(0)} - E_k^{(0)}\right) a_{kn}^{(1)} = \sum_j V_{kj}\underbrace{a_{jn}^{(0)}}_{\delta_{jn}} - E_n^{(1)} a_{kn}^{(0)} = V_{kn} - E_n^{(1)} a_{kn}^{(0)} = V_{kn} \quad (n\neq k)$$

$$a_{kn}^{(1)} = \frac{V_{kn}}{E_n^{(0)} - E_k^{(0)}} \quad (n\neq k) . \qquad (90)$$

Thus we have the well-known result

$$\Psi_n^{(1)} = \sum_{k\neq n}\frac{V_{kn}}{E_n^{(0)} - E_k^{(0)}}\,\Phi_k . \qquad (91)$$

From this we get the second-order energy,

$$E_n^{(2)} = \langle\Phi_n|\hat{V}|\Psi_n^{(1)}\rangle = \sum_k\langle\Phi_n|\hat{V}|\Phi_k\rangle\langle\Phi_k|\Psi_n^{(1)}\rangle = \sum_k a_{kn}^{(1)} V_{nk} = \sum_{k\neq n}\frac{V_{nk}V_{kn}}{E_n^{(0)} - E_k^{(0)}} = -\sum_{k\neq n}\frac{|V_{kn}|^2}{E_k^{(0)} - E_n^{(0)}} \qquad (92)$$

which is also well known. This process can be continued in the same manner to higher orders, e.g.,

$$a_{kn}^{(2)} = \left(E_n^{(0)} - E_k^{(0)}\right)^{-1}\left(\sum_{j\neq n} a_{jn}^{(1)} V_{kj} - E_n^{(1)} a_{kn}^{(1)} - E_n^{(2)} a_{kn}^{(0)}\right) = \sum_{j\neq n}\frac{V_{kj}V_{jn}}{\left(E_k^{(0)} - E_n^{(0)}\right)\left(E_j^{(0)} - E_n^{(0)}\right)} - \frac{V_{kn}V_{nn}}{\left(E_k^{(0)} - E_n^{(0)}\right)^2} \quad (k\neq n) . \qquad (93)$$

Using (93) we can write E_n^{(3)},

$$E_n^{(3)} = \sum_{k,j\neq n}\frac{V_{nk}V_{kj}V_{jn}}{\left(E_k^{(0)} - E_n^{(0)}\right)\left(E_j^{(0)} - E_n^{(0)}\right)} - V_{nn}\sum_{k\neq n}\frac{V_{nk}V_{kn}}{\left(E_k^{(0)} - E_n^{(0)}\right)^2} . \qquad (94)$$

It is evident that while this procedure is quite straightforward, the book-keeping for the generation of order by order wave function and energy is cumbersome.
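The order-by-order formulas (83), (92), and (94) are easy to check numerically for a small matrix model; in the sketch below the model Ĥ₀ and V̂ are hypothetical numbers, and the partial sums are compared against exact diagonalization.

```python
import numpy as np

E0 = np.array([0.0, 1.0, 2.5, 4.0])          # zeroth-order (diagonal) energies
V = np.array([[0.00, 0.05, 0.02, 0.01],
              [0.05, 0.10, 0.03, 0.02],
              [0.02, 0.03, -.05, 0.04],
              [0.01, 0.02, 0.04, 0.08]])     # Hermitian perturbation (hypothetical)
H = np.diag(E0) + V
n = 0                                        # ground state

E1 = V[n, n]
E2 = sum(V[n, k]**2 / (E0[n] - E0[k]) for k in range(4) if k != n)
E3 = sum(V[n, k] * V[k, j] * V[j, n] / ((E0[k] - E0[n]) * (E0[j] - E0[n]))
         for k in range(4) for j in range(4) if k != n and j != n) \
     - V[n, n] * sum(V[n, k]**2 / (E0[k] - E0[n])**2 for k in range(4) if k != n)

exact = np.linalg.eigvalsh(H)[0]
# The partial sum through third order should lie close to the exact eigenvalue
# because the perturbation is small compared with the level spacings.
print("E(0+1+2+3) =", E0[n] + E1 + E2 + E3, "   exact =", exact)
```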

B. Hylleraas variation principle

Hylleraas showed that the first order wave function and the second order energy can also be determined variationally. According to Hylleraas variation principle, if the trial (1) wave function Ψ˜n is an approximate solution of the first order wave function, then using (1) (77) and, multiplying it with hΨ˜n | and integrating,     h ˜ (1)| ˆ − (0) | ˜ (1)i h ˜ (1)| (1) − ˆ | i Ψn H0 En Ψn = Ψn En V Φn

    h ˜ (1)| ˆ − (0) | ˜ (1)i h ˜ (1)| ˆ − (1) | i 0 = Ψn H0 En Ψn + Ψn V En Φn (95)

To this equation we add the equation for the second-order energy,

˜(2) h | ˆ − (1)| ˜ (1)i En = Φn V En Ψn (96)

Adding (95) and (96)     ˜(2) h | ˆ − (1)| ˜ (1)i h ˜ (1)| ˆ − (0) | ˜ (1)i h ˜ (1)| ˆ − (1) | i En = Φn V En Ψn + Ψn H0 En Ψn + Ψn V En Φn     h ˜ (1)| ˆ − (1) | i h ˜ (1)| ˆ − (0) | ˜ (1)i = 2Re Ψn V En Φn + Ψn H0 En Ψn (97)

If we define the functional

$$J_2\!\left[\tilde{\Psi}_n^{(1)}\right] = 2\,\mathrm{Re}\,\langle\tilde{\Psi}_n^{(1)}|\hat{V} - E_n^{(1)}|\Phi_n\rangle + \langle\tilde{\Psi}_n^{(1)}|\hat{H}_0 - E_n^{(0)}|\tilde{\Psi}_n^{(1)}\rangle \qquad (98)$$

then we can write

$$J_2\!\left[\tilde{\Psi}_n^{(1)}\right] \geq E_n^{(2)} . \qquad (99)$$

If Ψ̃_n^{(1)} is the exact first-order correction to the wave function, it follows that

$$J_2\!\left[\tilde{\Psi}_n^{(1)}\right] = E_n^{(2)} . \qquad (100)$$

74   ˜ (1) (2) Otherwise, the functional J2 Ψn yields an upper bound for En . Then it can be proved that (99) for the first-order correction follows directly from the variation of functional   ˜ (1) J2 Ψn equated to zero.       ˜ (1) h ˜ (1)| ˆ − (1) | i h | ˆ − (1) | ˜ (1)i δJ2 Ψn = δΨn V En Φn + Φn V En δΨn (101)     h ˜ (1)| − (0) | ˜ (1)i h ˜ (1)| − (0) | ˜ (1)i + δΨn Hˆ 0 En Ψn + Ψn Hˆ 0 En δΨn   ˜ (1) ˜ ˜ Requiring that δJ2 Ψn = 0 for any δΨ (including δΨ ∗), then      h ˜ (1)| ˆ − (1) | i h ˜ (1)| ˆ − (0) | ˜ (1)i = δΨn V En Φn + δΨn H0 En Ψn then,     − (0) ˜ (1) (1) − (0) Hˆ 0 En Ψn = En Vˆ Ψn (102)

(1) (1) for which Ψ˜n = Ψn is a solution (since the above relation is equivalent to the first-order equation).   (0) ˜ (1) Next we show that if En is the lowest eigen value of Hˆ 0 then δJ2 Ψn is an upper (2) (1) bound for En . Taking the trial wave function Ψ˜n as,

(1) (1) Ψ˜n = Ψn + χ (103)

Using (103) in (98) gives,       ˜ (1) h (1) | ˆ − (1) | i h (1) | ˆ − (0) | (1) i J2 Ψn = 2Re Ψn + χ V En Φn + Ψn + χ H0 En Ψn + χ       h (1)| ˆ − (1) | i h | ˆ − (1) | i h (1)| ˆ − (0) | (1)i = 2Re Ψn V En Φn + 2Re χ V En Φn + Ψn H0 En Ψn     h | − (0) | (1)i h | − (0) | i + 2Re +χ Hˆ 0 En Ψn + χ Hˆ 0 En χ (104) | {z }  (1)  E Vˆ Φ n − | ni Second and fourth term in the above equation cancels each other,       ˜ (1) (1) h | − (0) | i J2 Ψn = J2 Ψn + χ Hˆ 0 En χ (105)   (0) h | − (0) | i If En is the lowest eigen value of Hˆ 0 then the integral χ Hˆ 0 En χ is nonnegative and zero if and only if, χ is the corresponding eigenfunction. Therefore,   ˜ (1) (2) J2 Ψn > En (106)

Thus J₂[Ψ̃_n^{(1)}], with an arbitrary trial function Ψ̃_n^{(1)} containing adjustable parameters, can be used in a variational approach for finding approximations to the first-order wave function and the second-order energy, and it provides an upper bound to E_n^{(2)} in the case of a state having the lowest zeroth-order energy (provided that Φ_n is an exact eigenfunction of Ĥ₀).
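A small numerical illustration of the Hylleraas principle (hypothetical matrix model, not from the notes): minimizing J₂ over trial vectors orthogonal to Φ_n reproduces the sum-over-states value of E_n^(2), and any other trial vector gives a value above it.

```python
import numpy as np
from scipy.optimize import minimize

E0 = np.array([0.0, 1.0, 2.5, 4.0])           # zeroth-order energies (hypothetical)
V = np.array([[0.00, 0.05, 0.02, 0.01],
              [0.05, 0.10, 0.03, 0.02],
              [0.02, 0.03, -.05, 0.08],
              [0.01, 0.02, 0.08, 0.12]])      # Hermitian perturbation (hypothetical)
n = 0   # ground state: lowest zeroth-order energy, as the theorem requires

def J2(x):
    """Hylleraas functional for a real trial vector with components x_k, k != n."""
    psi = np.zeros_like(E0)
    psi[1:] = x                               # enforce <Phi_n|psi~> = 0
    return 2 * psi @ V[:, n] + psi @ ((E0 - E0[n]) * psi)

res = minimize(J2, x0=np.zeros(3))
E2_exact = sum(V[n, k]**2 / (E0[n] - E0[k]) for k in range(1, 4))
print("min J2 =", res.fun, "   E^(2) =", E2_exact)   # the two agree; J2 >= E^(2) otherwise
```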

C. Møller-Plesset perturbation theory

The role of the many-body theory is to evaluate the expressions of energy for different orders coming from RSPT, containing many electron wave functions in terms of orbital contributions. The matrix elements should be expressed in terms integrals over one- electron functions. In the course of quantum mechanical application, the following points should be clarified:

1. The nonrelativistic Born-Oppenheimer many-body Hamiltonian projected to a given basis set can be most conveniently specified by the usual second quantized form. Underlying basis set is assumed to be orthonormalized; MBPT calculations are usually performed in the molecular orbital (MO) basis which meets this criterion.

2. The choice of the zeroth-order Hamiltonian is arbitrary, any Hermitain operator

would do in principle. In wishes to chose Hˆ 0 as close to Hˆ as possible in order to obtain favorable convergence properties of the perturbation series. On the other

hand, Hˆ 0 should be as simple as possible, since one should be able to diagonalize it and obtain its complete set of eigenfunctions. A practical balance between these

two conflicting requirements is to choose Hˆ 0 as the Fock operator: X ˆ ˆ H0 = F = εppˆ†pˆ (107) p

in terms of molecular spinorbital operators pˆ and orbital energies εp. By this choice, the perturbation operator Vˆ describes the electron correlation (the error of the Hartree-Fock approach) and the aim of the perturbation calculation is to improve the HF energy towards the exact solution of the Schro¨dinger equation in the same basis set. This is the so-called Møller-Plesset partitioning. The formal expansion of the MPPT partitioned Hamiltonian may be written as

X 1 Xh i Hˆ = Hˆ + Vˆ ⇒ Vˆ = Hˆ − Fˆ = − Jˆ(i) − Kˆ (i) (108) 0 r i

76 where 1 X X 1 X Hˆ = hpq||rsi{p q sr} + hpi||qii{p q} + hij||iji (109) 2 4 † † † 2 pqrs pqi ij

and, X Uˆ = hpi||qii{p†q} + h0|Uˆ |0i pqi X X = hpi||qii{p†q} + hij||iji (110) pqi ij

Then we can write 1 X 1 X Vˆ = Hˆ − Uˆ = hpq||rsi{p q sr} − hij||iji (111) 2 4 † † 2 pqrs ij

In the above derivation no assumption has been made about F̂. We can now assume the canonical HF case, in which (107) is valid.

3. Accepting the partition described by (107), the solution of the zeroth-order equation involves the solution of the Hartree-Fock problem. We have to specify the ground states and excited many-electron states explicitly. The ground state is simply the Fermi vacuum,

$$|\Psi_0^{(0)}\rangle = |\text{Fermi vacuum}\rangle = |\text{HF}\rangle = |0\rangle \qquad (112)$$

The excited states can be classified according to the number of electrons to be excited. Singly excited states are given by

$$\Psi_K^{(0)} = \hat{a}^\dagger\hat{i}\,|0\rangle \qquad (113)$$

where K labels the i → a excitation. Equation (113) expresses that an electron is annihilated in spinorbital i and created in spinorbital a. A doubly excited state is given by

$$\Psi_K^{(0)} = \hat{a}^\dagger\hat{b}^\dagger\hat{j}\hat{i}\,|0\rangle \qquad (114)$$

where

$$K = \begin{cases} i \to a \\ j \to b \end{cases}$$

77 Now let us evaluate RSPT theory formulae using second quantization. Starting from zeroth order, we have

X n o X ˆ (0) ˆ| i ˆ ˆ | i | i H0Ψ0 = F 0 = εi i†i 0 + εi 0 i i Xocc | i = εi 0 (115) i

Here only hole-hole pair has contributed. So the Fermi vacuum is the zeroth-order eigen ˆ P function in the ground-state of the H0. And i εi is the sum of the energies of the occupied orbitals and not the HF energy. The first order contribution is given by,

(1) h | ˆ | i E0 = 0 V 0 (116)

It follows that the energy to the first order will be

(0) (1) h | ˆ | i h | ˆ | i E = E0 + E0 = 0 H0 0 + 0 V 0 h | | i = 0 Hˆ 0 + Vˆ 0 h | | i h | | i = 0 Hˆ 0 = 0 Hˆ 1 + Hˆ 2 0 (117)

Using the second quantized form of the Hˆ as mentioned previously, we can expression one-electron and two-electron as,

X 1 X Hˆ = Hˆ + hi|hˆ|ii, Hˆ = Hˆ + Hˆ + hij|vˆ|iji (118) 1 1,N 2 2,N 02,N 2 1 ij

Equating (118) in (117) we get,

X 1 X E = hi|hˆ|ii + hij|vˆ|iji (119) 2 i ij

E = Eref = EHF (120) which is the expectation value of the full Hamiltonian with the Hartree-Fock wave function, the Hartree-Fock electronic. We see that, using the Møller-Plesset partitioning, the first order of perturbation theory corrects the sum of orbital energies to the true HF energy.

78 Then first-order energy can be written as,

(1) h | ˆ | i − h | ˆ | i E0 = 0 H 0 0 H0 0 X X 1 X = − ε + h + hij||iji i ii 2 i i ij X 1 X = − u + hij||iji ii 2 i ij X 1 X = − hij||iji + hij||iji 2 ij ij 1 X = − hij||iji (121) 2 ij

ˆ ˆ ˆ where we have used F = H + U and fpq = hpq + upq. In deriving the second order result, the explicit form of the perturbation operation Vˆ should be specified. We can write,

− Vˆ = Hˆ Hˆ 0 (122)

To evaluate the second order formula, the only matrix element we need is V0K since the second-order energy correction can also be written as:

2 (2) X |V | E = − 0K (123) 0 (0) − (0) n,K EK E0 where K labels an excited state. In principle, it can be a p-fold state with p = 1,2,3....

However, it is easy to show only p = 2 contribute to V0K . Let us check first the role of singly excited states. From the Brillouin theorem we know that the full Hamiltonian does not have such a matrix element:

h (0)| ˆ | (0)i H0K = Ψ0 H ΨK = 0 (124)

that is,

h (0)| ˆ ˆ | (0)i h (0)| (0)i h (0)| ˆ | (0)i Ψ0 H0 + V ΨK = EK Ψ0 ΨK + Ψ0 V ΨK

= V0K = 0 (125)

where the zeroth-order Schro¨dinger equation and the orthogonality of the zeroth-order

states are utilized. It follows that V0K = 0 if K is a singly excited state. And for any excited state higher than doubly excited state it will also give zero ˆ contribution to V0K = 0 because V contains at most two-elctron terms, then using Slater- Condon rule for two-electron operator with more than two non-coincidences we get zero.

79 So, only doubly excited states contribute to the matrix element V0K , thus only they enter the second-order formula (123). With this result, the matrix element of V0K = 0 can be evaluated. Then h (0)| ˆ − ˆ | (0)i Ψ0 H H0 ΨK (126) First, we evaluate one-electron part of Vˆ using generalized Wick’s theorem and (118), X X h (0)| ˆ | (0)i h (0)|{ }| (0)i h (0)|{ }{ ˆ ˆˆ}| (0)i Ψ0 H1 ΨK = hpq Ψ0 pˆ†qˆ ΨK = hpq Ψ0 pˆ†qˆ aˆ†b†ji Ψ0 (127) pq pq here p and q include both hole and particle states. Second term in Fˆ will not contribute (0) (0) because Ψ0 and ΨK are orthogonal. X h (0)|{ } { } { } { } = hpq Ψ0 p†qa†b†ji + p†qa†b†ji + p†qa†b†ji + p†qa†b†ji + ... pq | (0)i + all allowed contractions Ψ0 (128) Here all terms we will get are not fully contracted, so vacuum expectation value of such

a operator vanishes. Same argument holds for Hˆ 0. So only non-zero contribution will come from two-electron operator of Hˆ . Now,

(0) (0) (0) 1 X (0) hΨ |Hˆ |Ψ i = hΨ |Hˆ + Hˆ + hij|vˆ|iji|Ψ i (129) 0 2 K 0 2,N 02,N 2 K ij ˆ ˆ Again here also second term will become zero because H02,N has similar form as H0 and third term vanishes because of orthogonality. Then,

(0) (0) (0) (0) 1 X (0) (0) hΨ |Hˆ |Ψ i = hΨ |Hˆ |Ψ i = hpq|vˆ|rsihΨ |{p q sr}{a b ji}|Ψ i (130) 0 2 K 0 2,N K 2 0 † † † † 0 pqrs Using generalized Wick’s theorem and collecting fully contracted terms with non-zero contractions,

1 X (0) = hpq|vˆ|rsihΨ |{p q sra b ji} + {p q sra b ji} + {p q sra b ji} 2 0 † † † † † † † † † † † † pqrs

{ } { }| (0)i + p†q†sra†b†ji + p†q†sra†b†ji Ψ0 (131) First term has no contribution, then 1 X h i = hpq|vˆ|rsi −δ δ δ δ + δ δ δ δ − δ δ δ δ + δ δ δ δ 2 pi qj sa rb pi qj sb ra pj qi sb ra pj qi sa rb pqrs 1 = [−hij|vˆ|bai + hij|vˆ|abi − hji|vˆ|abi + hji|vˆ|bai] 2 = [hij|vˆ|abi − hij|vˆ|bai] = [hij||abi] (132)

80 Then collecting all non-zero terms and substituting into (123). The excitation energy in the denominator of the second-order formula is determined by the change in the sum of the orbital energies due to the change in the occupancy of the orbitals upon excitation:

$$E_0^{(2)} = -\sum_{a<b}\sum_{i<j}\frac{|\langle ij||ab\rangle|^2}{\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j} \qquad (133)$$

Equation (133) is the second-order Møller-Plesset (MP2) formula for the correlation correction in terms of the spinorbitals.
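A minimal sketch of how Eq. (133) might be evaluated in practice is given below; the orbital-energy array eps and the antisymmetrized-integral array g[p,q,r,s] = ⟨pq||rs⟩ are assumed to come from a converged HF calculation, and the occupied/virtual bookkeeping is a hypothetical convention, not a prescription from the notes.

```python
def mp2_energy(eps, g, nocc):
    """E(2) = - sum_{i<j, a<b} |<ij||ab>|^2 / (e_a + e_b - e_i - e_j).

    eps  : array of spinorbital energies (occupied first, then virtual)
    g    : antisymmetrized two-electron integrals, g[p, q, r, s] = <pq||rs>
    nocc : number of occupied spinorbitals
    """
    nso = len(eps)
    e2 = 0.0
    for i in range(nocc):
        for j in range(i + 1, nocc):
            for a in range(nocc, nso):
                for b in range(a + 1, nso):
                    num = abs(g[i, j, a, b]) ** 2
                    den = eps[a] + eps[b] - eps[i] - eps[j]
                    e2 -= num / den
    return e2
```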

Similarly, for the third-order formula (94), we have already calculated V₀₀ = E₀^(1) and V₀K; the only unknown matrix element is V_KJ = ⟨Ψ_K^(0)|V̂|Ψ_J^(0)⟩. Here also only doubly excited states contribute, through the two-electron operator term, as mentioned before. Using the generalized Wick's theorem we can write,

h cd| | abi h cd| | abi Ψlm p†p Ψij = 0, Ψlm p†q Ψij = 0 (134)

Then,

h (0)| ˆ | (0)i h (0)|{ }{ }{ }| (0)i VKJ = ΨK V ΨJ = Ψ0 l†m†dc p†q†sr a†b†ji Ψ0

h (0)|{ } { } = Ψ0 l†m†dcp†q†sra†b†ji + l†m†dcp†q†sra†b†ji

+ {l†m†dcp†q†sra†b†ji} + {l†m†dcp†q†sra†b†ji}

{ } | (0)i + l†m†dcp†q†sra†b†ji + ... + many terms Ψ0 (135)

Remaining work is left as an exercise. If we evaluate all fully contracted terms then we will end up with,

occ vir (3) 1 X X hij||abihab||lmihlm||iji E = 0 8  − −  − − i,j,l,m ab εa + εb εi εj (εa + εb εl εm) occ vir 1 X X hij||abihab||cdihcd||iji 8  − −  − −  i,j a,b,c,d εa + εb εi εj εc + εd εi εj occ vir XX hij||abihlb||cjihac||ili (136)  − −  − − i,j,l a,b,c εa + εb εi εj (εa + εc εi εl)

This is MP3 formula for the correction energy in terms of the spinorbitals.

D. Diagrammatic expansions for MPPT

1. Diagrammatic notation

Evaluating terms in the second-quantization treatment can be cumbersome and error-prone, so to make the calculations easier a diagrammatic notation was introduced. It helps to list all nonvanishing distinct terms in the perturbation sums, to elucidate certain cancellations in these sums, and to provide some systematics for the discussion and manipulation of the various surviving terms. Time ordering represents the time sequence in the application of the various operators, and this is indicated in the diagrams by means of a time axis for the sequence of events. Another common arrangement is to place the time axis horizontally, from right to left. The actual time at which each event occurs (i.e., an operator acts) is irrelevant; only the

FIG. 1. Time ordering.

sequence is significant. We start with the representation of a Slater determinant (SD). The reference state (the Fermi vacuum) is represented by nothing, i.e., by a position on the time axis at which there are no lines or other symbols. Any other SD is represented by vertical or diagonal directed lines, pointing upward for particles and downward for holes, with labels identifying the spinorbitals. The horizontal double line represents the point

$$|\Psi_i^a\rangle = \{\hat{a}^\dagger\hat{i}\}\,|0\rangle$$

of operation of the normal-product operator, and below or above it we have the Fermi vacuum. To avoid phase ambiguity, we can indicate which particle index appears above which hole index.

$$|\Psi_{ij}^{ab}\rangle = \{\hat{a}^\dagger\hat{b}^\dagger\hat{j}\hat{i}\}\,|0\rangle$$

2. One-particle operator

Now we consider the representation of operators. We begin with a one-electron operator in normal-product form, say,

$$\hat{U}_N = \sum_{pq}\langle p|\hat{u}|q\rangle\,\{\hat{p}^\dagger\hat{q}\} \qquad (137)$$

acting on the singly excited Slater determinant $|\Psi_i^a\rangle = \{\hat{a}^\dagger\hat{i}\}|0\rangle$. The action and representation of the individual terms in the sum over p, q in (137) will depend on whether p and q are particle or hole indices. For illustration we will consider only the two cases particle-particle (pp) and particle-hole (ph); the remaining cases are left as an exercise. We begin with a (pp) term; applying the one-electron operator to the singly excited Slater determinant we obtain (using the generalized Wick's theorem)

$$\langle b|\hat{u}|c\rangle\,\{\hat{b}^\dagger\hat{c}\}\{\hat{a}^\dagger\hat{i}\}|0\rangle = \langle b|\hat{u}|c\rangle\,\delta_{ac}\,|\Psi_i^b\rangle = \langle b|\hat{u}|a\rangle\,|\Psi_i^b\rangle \qquad (138)$$

which is represented by a one-body vertex diagram. At the bottom of the diagram we have |Ψ_i^a⟩ and at the top we have |Ψ_i^b⟩, the resulting determinant. The point of action of the operator is marked by the interaction line (or vertex). We associate the integral ⟨b|û|a⟩ with the vertex as a multiplicative factor. Note that the bra spinorbital in the integral corresponds to the line leaving the vertex, while the ket corresponds to the entering line.

Similarly, for a (ph) term,

$$\langle b|\hat{u}|j\rangle\,\{\hat{b}^\dagger\hat{j}\}\{\hat{a}^\dagger\hat{i}\}|0\rangle = \langle b|\hat{u}|j\rangle\,|\Psi_{ij}^{ab}\rangle \qquad (139)$$

showing that the resulting determinant is |Ψ_ij^ab⟩. The following principles are used to draw these kinds of diagrams:

1. The interaction is denoted by a dotted, horizontal line and the electron orbitals involved in that interaction by solid, vertical lines, connected with the interaction line to a vertex.

2. A core orbital is represented by a line directed downwards (hole line) and a virtual orbital by a line directed upwards (particle line).

3. The orbitals belonging to the initial state (to the right in the matrix element) have their arrows pointing toward the interaction vertex, those of the final state away from the vertex.

3. Two-particle operators

We now turn to a two-particle operator in normal-product form,

$$\hat{W} = \frac{1}{2}\sum_{pqrs}\langle pq|rs\rangle\,\{\hat{p}^\dagger\hat{q}^\dagger\hat{s}\hat{r}\} = \frac{1}{4}\sum_{pqrs}\langle pq||rs\rangle\,\{\hat{p}^\dagger\hat{q}^\dagger\hat{s}\hat{r}\} . \qquad (140)$$

This operator is denoted by an interaction line connecting two half-vertices at the same level (i.e., the same point on the time axis). The two half-vertices and the interaction line constitute a single vertex. Each individual half-vertex has one incoming and one outgoing line, each of which may be a particle line or a hole line. The association of line labels with the two-electron integral indices and the creation or annihilation operators follows the same rule as for one-body vertices:

incoming line ↔ annihilation operator ↔ ket state
outgoing line ↔ creation operator ↔ bra state      (141)

electron 1 ↔ left half-vertex
electron 2 ↔ right half-vertex      (142)

The integral indices associated with a two-body vertex are assigned according to the scheme

⟨left-out right-out | left-in right-in⟩      (143)

while the corresponding operator product can be described by

{(left-out)† (right-out)† (left-in)(right-in)} (144)

Diagrams employing this representation of the two-body interaction (which is based on non-antisymmetrized integrals) are called Goldstone diagrams. Consider a simple example of vacuum expectation value of Wˆ 2. Then using Wick’s theorem we obtain, 1 X 1 X h0|Wˆ 2|0i = hpq|rsi htu|vwih0|{pˆ qˆ sˆrˆ}{tˆ uˆ wˆvˆ}|0i 2 2 † † † † pqrs tuvw     1 X X  =  hij|abihab|iji − hij|abihba|iji 4   abij abij      1  X X  + − hij|abihab|jii + hij|abihba|jii (145) 4    abij abij 

The diagrammatic description of terms can be done easily using rules defined earlier. The first and fourth diagrams are equivalent (by exchange of the two half-vertices at the

top or bottom) and so are the second and third. So keeping only first and third terms in h l the sum. And to identify the correct phase factor, there is a rule (−1) − . Where h is the number of hole lines in the loop and l is the number of loops. So, 1 X 1 X h0|Wˆ 2|0i = hij|abihab|iji − hij|abihab|jii (146) 2 2 abij abij

The factor 1/2 derives from the fact that each of these diagrams is symmetric under reflection in a vertical plane through its middle. Now if we want to do similar calculations for matrix element like h0|Wˆ 3|0i, Goldstone representation will have number of distinct diagrams as the number of interaction vertices increases, reflecting the individual listing

85 of each possible exchange. There is also some difficulty in making sure that all those distinct possibilities have been listed exactly once, since it is not always easy to determine whether two diagrams are equivalent. However, the advantage of Goldstone diagrams is the straightforward determination of phase factors. The difficulties associated with the use of the Goldstone representation can be overcome by basing the analysis on the antisymmetric integrals hpq||rsi. Since the exchange contribution is incorporated within each antisymmetrized integral, such an approach leads to a much smaller number of distinct diagrams. The diagrams using this representation of the Wˆ operator are called Hugenholtz diagrams.

4. Hugenholtz diagrams

They maintain the usual (Goldstone) form for one-body operators but represent the two-body vertex as a single large dot with two incoming and two outgoing lines (each of which can be a particle or hole line). The labels on the outgoing lines appear in the bra part of the antisymmetrized integral, while the incoming labels appear in the ket part. The order of the labels in each part is indeterminate, and therefore the phase of the corresponding algebraic interpretation is indeterminate. The Hugenholtz representation of the h0|Wˆ 2|0i matrix element has just one distinct diagram instead of two, Expansion of the antisymmetrized integrals in terms of ordinary

$$\langle 0|\hat{W}^2|0\rangle = [\text{single Hugenholtz diagram}] = \frac{1}{4}\sum_{abij}\langle ij||ab\rangle\langle ab||ij\rangle$$

integrals gives four terms, which are equal in pairs, reproducing the two-term result obtained with Goldstone diagrams. The weight factor 1/4 is obtained by counting the number of pairs of equivalent lines in the diagram: a pair of lines is equivalent if they connect the same pair of vertices in the same direction. Each pair of equivalent lines contributes a factor 1/2. The diagram for h0|Wˆ 2|0i has two such pairs, resulting in a weight factor 1/4. Goldstone and Hugenholtz representation is left as an exercise. It is a good exercise to convince yourself the power of Goldstone and Hugenholtz representation.

5. Antisymmetrized Goldstone diagrams

The antisymmetrized Goldstone diagrams can be summarized by the following rules:

1. Generate all distinct Hugenholtz skeletons.

2. For each skeleton assign arrows in all distinct ways to generate Hugenholtz diagrams.

3. Expand each Hugenholtz diagram into an ASG diagram in any of the possible equivalent ways.

4. Interpret each two-body vertex in each ASG diagram in terms of an antisymmetrized integral, with the usual hleft-out right-out||left-in right-ini arrangement.

5. Interpret each one-body vertex in each ASG diagram as in ordinary Goldstone diagrams.

6. Assign a phase factor (−1)^{h−l}, as for ordinary Goldstone diagrams.

7. Assign a weight factor (1/2)^n, where n is the number of equivalent line pairs; two lines are equivalent if they connect the same two vertices in the same direction.

6. Diagrammatic representation of RSPT

The zero- and first-order energies are given by

$$E_0^{(0)} = \sum_i \varepsilon_i \qquad (147)$$

$$E_0^{(1)} = -\frac{1}{2}\sum_{ij}\langle ij||ij\rangle \qquad (148)$$

The second-order energy expression can be alternatively written in the following equivalent


form which is more useful,

$$E_0^{(2)} = \sum_{K\neq 0}\frac{\langle\Psi_0^{(0)}|\hat{V}|\Psi_K^{(0)}\rangle\langle\Psi_K^{(0)}|\hat{V}|\Psi_0^{(0)}\rangle}{E_0^{(0)} - E_K^{(0)}} = \langle\Psi_0^{(0)}|\hat{V}\hat{R}_0\hat{V}|\Psi_0^{(0)}\rangle = \frac{1}{4}\sum_{a,b,i,j}\frac{|\langle ij||ab\rangle|^2}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b} \qquad (149)$$

where

$$\hat{R}_0 = \sum_{K\neq 0}\frac{|\Psi_K^{(0)}\rangle\langle\Psi_K^{(0)}|}{E_0^{(0)} - E_K^{(0)}} \qquad (150)$$

is called the resolvent operator. Its presence in an expression is represented diagrammatically

Rˆ0

a i j b

by a thin horizontal line cutting the particle-hole lines, as shown on the figure. Rˆ0 does not change the state on which it operates, it only represents the division by the energy

denominator; therefore any particle or hole lines present below the point of action of R̂₀ continue unchanged above it. The expressions we have derived for MP1, MP2, and MP3 are correlation energies which correct the Hartree-Fock energy. Computationally, MP2 is relatively inexpensive and gives a significant improvement. In principle, one could go up to higher orders of perturbation theory (MP3, MP4, etc.), but the computer programs become much harder to write, and the results (perhaps surprisingly) do not necessarily get any better.

E. Time versions

The diagrams which may be transformed one into the other by topological deformations (transformations) which do not preserve the order of operators along the time axis are referred to as time versions of the same diagrams.

1. Time version of the first kind

Time version of the first kind may be obtained one from another by the permutation of vertices which do not change the particle-hole character of any of the fermion line in the diagram. For example, the diagrams in Fig.(2) are time versions of the first kind of a fourth order energy diagram with two U vertices.

2. Time version of the second kind

When the vertex permutation changes the hole-particle character of at least one line, we obtain time versions of the second kind. Thus the diagram in Fig. (3) is a time version of the second kind of the first diagram.

FIG. 2. Time version of first kind.

FIG. 3. Time version of second kind.

F. Connected and disconnected diagrams

In the second-order wavefunction we have either disconnected or connected diagrams,

$$|\Psi^{(2)}\rangle = \hat{R}_0\hat{W}\hat{R}_0\hat{W}\,|0\rangle .$$

Fig. (4) shows all the possible Hugenholtz diagrams for the second-order wavefunction contribution. Thus we have one disconnected diagram (i), which yields a quadruply excited contribution, while the remaining diagrams are connected and correspond to triply (ii), (iii), doubly (iv), (v), (vi), and singly (vii), (viii) excited contributions. No vacuum contribution can arise, since any diagram having only internal lines (i.e., an energy diagram) would lead to a dangerous denominator.

FIG. 4. Second order wavefunction correction.

FIG. 5. Third order wavefunction correction.

G. Linked and unlinked diagrams

We have seen in the preceding section that for the wavefunction contribution we obtain disconnected diagrams already in the second order. In higher orders of perturbation theory, disconnected diagrams of another type occur. For example, a few possible third-order wavefunction diagrams which are disconnected are shown in Fig. (5). All of these diagrams are bona fide wavefunction diagrams (i.e., they contain no dangerous denominators). We shall see, however, that the last diagram (iii) has a very different character than the former two diagrams (i) and (ii), since it contains an energy diagram as a disconnected part. We shall refer to energy diagrams, which have no external lines, as vacuum diagrams (or vacuum parts when they form a disconnected part of some diagram), since they represent Fermi-vacuum mean values. Further, a disconnected diagram is unlinked if it has at least one disconnected vacuum part, and linked if it has no disconnected vacuum part.

FIG. 6. Unlinked part of third order wavefunction correction.

Any unlinked diagram is by definition a disconnected diagram, while a linked diagram can be either connected or disconnected. In the latter case, however, none of its disconnected parts can be a vacuum diagram. On the other hand, a connected diagram is always linked (even if it is a vacuum diagram), while a disconnected diagram can be either linked or unlinked, depending on whether all of its disconnected parts are of a non-vacuum type or not, respectively. Obviously, each unlinked diagram has a number of time versions of the first kind, since its disconnected parts can be positioned relative to one another in all distinct ways which do not introduce dangerous denominators. Thus, in the case of the diagram of Fig. (5(iii)), there are two possible time versions, as shown in Fig. (6). The contributions from these two time versions differ only in the denominator part, since all the scalar factors associated with the vertices and all the operators associated with external lines are clearly identical. Designating the denominator of the vacuum part, considered as a separate diagram, by a and, similarly, the denominator of the part involving external lines (considered separately) by b, the contribution from both time versions is

$$N\,\frac{1}{b(a+b)b} + N\,\frac{1}{a(a+b)b}$$

where N designates the identical numerator part. Carrying out the sum we get

$$N\left[\frac{1}{b(a+b)}\left(\frac{1}{b} + \frac{1}{a}\right)\right] = N\,\frac{a+b}{b(a+b)ab} = N\,\frac{1}{ab^2} . \qquad (151)$$

91 FIG. 7. Linked disconnected diagrams for parts A and B.

The above result is easily seen to be precisely the contribution, except for the sign, from the third order renormalization term in third order corrected wavefuntion given as,         3  |Ψ (3)i =  Rˆ Wˆ −h0|Wˆ Rˆ Wˆ |0iRˆ2Wˆ |0i (152)  0 0 0     | {z } | {z } P rincipal term Renormalization term  which is given, up to the sign, by the product of the second order energy contribution and the first order wavefunction contribution taken with the second order denominator − 1 1 vertex and thus equals N a b2 . Therefore, the renormalization term Eq.(152) exacly cancels the contribution from unlinked diagrams Fig.(6) originating from the principal third order term, given by Eq.(151).

H. Factorization lemma (Frantz and Mills)

Consider all the possible time versions for a linked diagram consisting of two disconnected parts called A and B as shown in Fig.(7). Let the set of energy denominators for the part a b A alone be ∆µ, µ = 1,...,m, and for the part B be ∆ν, ν = 1,...,n. The denominators are A B numbered along the time axis, i.e., the lowest denominators are ∆1 and ∆1 in parts A and B respectively. The denominator contribution from all time versions of the first kind corresponding to all possible orderings of permutation vertices in parts A and B relative to one another

92 AB can be written as Dmn given as

m+n X Y  1 AB A B − Dmn = ∆α(p) + ∆β(p) (153) α,β p=1 { }

where the summation extends over all the sets of (m + n) integer pairs Γp = (α(p),β(p)) such that 0 ≤ α(p) ≤ m & 0 ≤ β(p) ≤ n.

Γp is defined as follows:

1. Γ1 = (1,0) or Γ1 = (0,1).

2. Γp+1 = (α(p) + 1,β(p)) or Γp+1 = (α(p),β(p) + 1).

3. Γm+n = (m + n); α(m + n) = m, β(m + n) = n. where we also define A B ∆0 = ∆0 = 0 A For the seperate disconnected parts the denominators are given by the products of ∆µ or B ∆ν which can alos be written using the general expression Eq.(153) as m n Y  1 Y  1 A AB A − B AB B − Dm = Dm0 = ∆µ & Dn = D0n = ∆ν (154) µ=1 ν=1

A B AB where we define D0 = D0 = D00 = 1 The desired factorization lemma can now be simly stated as

AB A B Dmn = DmDn (155)

The proof is easily carried out using mathematical induction. The lemma holds when m = 0 or n = 0, Since

AB A B A AB A B B Dm0 = DmD0 = Dm or D0n = D0 Dn = Dn in agreement with Eq.(154). Assume that the lemma holds for M = m − 1,N = n and − ≥ AB A B AB M = m,N = n 1, m,n 1, i.e. DMN = DMDN . Clearly, all the terms in Dmn can be divided into two disjoint classes according to whether the leftmost interaction occurs in A or in B subgraph, respectively, The last (top) denomimator factor being always the same as A B required by (3), namely (∆m + ∆n), we can write   1   AB A B − AB AB Dmn = ∆m + ∆n Dm 1,n + Dm,n 1 (156) − − Since all the remaining factors are identical with those characterizing the disconnected diagrams which results when one top vertex is deleted: either in the A part (M = m − 1,N = n) or in the B part (M = m,N = n − 1). This result holds even when m or n equals 1.

93 Since the lemma Eq.(155) holds for M = m−1,N = n and M = m,N = n−1 by assumption, we can write Eq.(156) as

  1   AB A B − A B A B Dmn = ∆m + ∆n Dm 1Dn + DmDn 1 (157) − − The denominator of seperate parts are given by a single product, Eqs.(154), so we have

  1 A A A − ⇒ A A A Dm = Dm 1 ∆m = Dm 1 = Dm∆m − −   1 B B B − ⇒ B B B Dn = Dn 1 ∆n = Dn 1 = Dn ∆n − − so we get   1   AB A B − A A B A B B A B Dmn = ∆m + ∆n Dm∆mDn + DmDn ∆n = DmDn proving the lemma.

I. Linked-cluster theorem

We saw that for third order correction to the wavefunction, the renormalization term was cancelled by the unlinked term from the principal term. This happens at all orders, so the contribution in each order is given by all the linked diagrams. In this derivation the linked-diagram expansions for the wave function and energy are substituted into the recursive form Eq.(158) of the Schro¨dinger equation (This equation can be found in Shavitt’s Chapter 2 Eq.(2.75)), and the factorization theorem is used to show that this expansion satisfies the equation. To prove this assertion we first rewrite Eq.(2.75) from Shavitt’s book in a form appropriate for RSPT, | i | i  − | i Ψ = 0 + Rˆ0 Wˆ ∆E Ψ (158)

∆E = h0|Wˆ |Ψ i (159)

ˆ ≡ ˆ ˆ ˆ − (1) − − − (1) where R0 R0 (E0),W = V E and ∆E = E Eref = E E0 E . The implicit equations (158), (159) for |Ψ i and ∆E are entirely equivalent to the Schro¨dinger equation. We need to prove that these equations are satisfied by the linked-diagram expansions

X∞ h n i |Ψ i = Rˆ Wˆ |0i (160) 0 L n=0

X∞  n h | ˆ ˆ ˆ | i ∆E = 0 W R0W 0 L (161) n=1 where the subscript L indicates that the summations are limited to linked diagrams only (note that the n = 0 term is missing in the summation for ∆E in Eq.(161) because

94 h0|Wˆ |0i = 0). We are going to prove this assertion by substituting Eq.(160) and Eq.(161) into the recursive equations (158), (159) and showing that the latter are then satisfied. We first substitute Eq.(160) in Eq.(159), obtaining

X∞ h n i ∆E = h0|Wˆ Rˆ Wˆ |0i (162) 0 L n=1

It is easy to verify that all the closed diagrams that can be formed by adding a new top vertex to the upwards-open linked n-vertex diagrams are linked (because all disconnected parts of the open diagram must be closed by the single added vertex) and constitute the complete set of all closed linked (n+1)-vertex diagrams. Therefore Eq.(162) is consistent with Eq.(159). Next we substitute Eq.(160) in Eq.(158), resulting in

X∞  h n i |Ψ i = |0i + Rˆ Wˆ − ∆E Rˆ Wˆ |0i 0 0 L n=0 (163) X∞ h n i X∞ h n i = |0i + Rˆ Wˆ Rˆ Wˆ |0i − ∆ERˆ Rˆ Wˆ |0i 0 0 L 0 0 L n=0 n=0

Each term of the first sum over n in the second line of Eq.(163) consists of all the upwards-open (n + 1)-vertex diagrams that can be formed by adding one vertex (and the corresponding resolvent) to all upwards-open linked n vertex diagrams. Each resulting diagram either is linked or is unlinked with a single separate closed part (if the added vertex closed a disconnected part of the n-vertex open diagram) and has the top vertex of the closed part as the top vertex of the entire diagram. We may therefore rewrite Eq.(163) in the form

X X X | i | i ∞ h  n | ii ∞ h h n | ii i − ∞ h n | ii Ψ = 0 + Rˆ0Wˆ Rˆ0Wˆ 0 + Rˆ0Wˆ Rˆ0Wˆ 0 ∆ERˆ0 Rˆ0Wˆ 0 L L U L n=0 n=0 n=0

(164) where the subscript U indicates restriction to unlinked terms. The factorization theorem can then be used to show the cancellation of the last two sums in this equation, because each term in the third sum can be described by an open diagram with an insertion above its top vertex; this diagram cancels the contributions to the second sum from the sum of corresponding unlinked two-part open diagrams in which the top vertex of the closed part is the top vertex of the entire diagram. The remaining terms of the right-hand side are equivalent to the linked-diagram expansion Eq.(160), proving that this expansion satisfies Eq.(158) and the Schro¨dinger equation.

95 J. Removal of spin

So far the formalism has been specified in terms of spinorbitals, and no attempt has been made to consider the effects of spin. However, since the nonrelativistic Hamiltonian does not contain spin coordinates, integration over the spin variables is easily carried out and results in significant economies in the calculations. The simplest way in which spin affects the perturbation theory summations is that some integrals vanish because of spin orthogonality. Thus if we indicate the spin factor of a spinorbital by putting a bar over β spinorbitals, and no bar over α’s, we have

hpq||rsi = hpq|v|rsi − hpq|v|sri, hpq¯||rs¯i = hpq|v|rsi, hpq¯||rs¯ i = −hpq|v|sri, hpq¯ ||rs¯ i = hpq|v|rsi, hpq¯ ||rs¯i = −hpq|v|sri, hp¯q¯||r¯s¯i = hpq|v|rsi − hpq|v|sri, where the integrals on the r.h.s. are over the spatial factors only, and

hpq¯ ||rsi = hpq¯||rsi = hpq||rs¯ i = hpq||rs¯i = 0, hp¯q¯||rsi = hpq||r¯s¯i = 0, hp¯q¯||rs¯ i = hp¯q¯||rs¯i = hpq¯ ||r¯s¯i = hpq¯||r¯s¯i = 0. Thus, out of the 16 possible combinations of spin assignments to the four orbitals in an antisymmetric two-electron integral, 10 of the resulting integrals vanish completely and four are reduced to a single spatial integral. Taking the second-order energy in the canonical RHF case as an example, we’ve: 1 X hij||abihab||iji E(2) = 4 ab abij ij

1 X 1 h i E(2) = hij||abihab||iji + hij¯||ab¯ihab¯||ij¯i + hij¯||ab¯ ihab¯ ||ij¯i + 4 ab abij ij 1 X 1 h i hij¯ ||ab¯ihab¯||ij¯ i + hij¯ ||ab¯ ihab¯ ||ij¯ i + hi¯j¯||a¯b¯iha¯b¯||i¯j¯i 4 ab abij ij

1 X 1 E(2) = [hij||abihab||iji + hij|v|abihab|v|iji + hij|v|baihab|v|jii]+ 4 ab abij ij 1 X 1 [hij|v|baihab|v|jii + hij|v|abihab|v|iji + hij||abihab||iji] 4 ab abij ij

96 1 X 1 E(2) = [2hij||abihab||iji + 2hij|v|abihab|v|iji + 2hij|v|baihab|v|jii] 4 ab abij ij 1 X 1 E(2) = [(hij|v|abi − hij|v|bai)(hab|v|iji − hab|v|jii) + hij|v|abihab|v|iji + hij|v|baihab|v|jii] 2 ab abij ij

1 X 1 E(2) = [hij|v|abihab|v|iji − hij|v|baihab|v|iji − hij|v|abihab|v|jii] 2 ab abij ij 1 X 1 + [hij|v|baihab|v|jii + hij|v|abihab|v|iji + hij|v|baihab|v|jii] 2 ab abij ij

1 X 1 E(2) = [2hij|v|abihab|v|iji + 2hij|v|baihab|v|jii − hij|v|abihab|v|jii − hij|v|baihab|v|iji] 2 ab abij ij Exchanging electron labels in second part of second and third term in above equation. 1 X 1 E(2) = [2hij|v|abihab|v|iji + 2hij|v|baihba|v|iji − hij|v|abihba|v|iji − hij|v|baihab|v|iji] 2 ab abij ij where the summations are over the distinct spatial orbitals only. Since a, b are dummy summation indices and can be interchanged, we find that the first two terms in the brackets are equal (after summation), and so are the third and fourth. Thus X 1 E(2) = [2hij|v|abihab|v|iji − hij|v|abihba|v|iji] ab abij ij

X hij|v|abi E(2) = [2hab|v|iji − hba|v|iji] ab abij ij Similar treatments hold for other terms.

IX. COUPLED CLUSTER THEORY

The coupled cluster theory was introduced in 1960 by Coester and Kummel for calculating nuclear binding energies. In 1966 J. Cizek and latter with J. Paldus reformulated the method for electron correlation in atoms and molecules.

A. Exponential ansatz

Ψcc = ΩΦ0 Tˆ Ψcc = e Φ0

97 where Ω is often called wave operator as it takes an unperturbed solution into the exact solution and ˆ ˆ ˆ ˆ ˆ T = T1 + T2 + T3 + ...... Tm where X ˆ a{ ˆ ˆ} T1 = ti a†i ia 1 X Tˆ = tab{aˆ bˆ jˆiˆ} 2 (2!)2 ij † † ijab . . . 1 X Tˆ = tab...{aˆ bˆ .....jˆiˆ} m (m!)2 ij... † † ij... ab... ≤ ab... where m N and N represents the number of electrons. tij... are coefficients to be determined, usually referred as "amplitudes" for the corresponding operators. Also

ab − ab − ba ba tij = tji = tij = tji

The simplest couple cluster approach is that of coupled cluster doubles (CCD) in which Tˆ is truncated to ˆ ˆ TCCD = T2 The most common extension of this model is coupled cluster singles and doubles (CCSD), defined by ˆ ˆ ˆ TCCSD = T1 + T2 and similarly ˆ ˆ ˆ ˆ TCCSDT = T1 + T2 + T3

B. Size consistency

Consider a system AB composed of two non-interacting components A and B

Φ0(AB) = Φ0(A)Φ0(B)

T (AB) = T (A) + T (B) then T (AB) Ψ (AB) = e Φ0(AB) T (A)+T (B) Ψ (AB) = e Φ0(A)Φ0(B)

98 T (A) T (B) Ψ (AB) = e Φ0(A)e Φ0(B) Ψ (AB) = Ψ (A)Ψ (B) This separability of wavefunction ensures the additivity of the energy

H(AB)Ψ (AB) = [H(A) + H(B)]Ψ (A)Ψ (B)

H(AB)Ψ (AB) = [E(A) + E(B)]Ψ (A)Ψ (B) H(AB)Ψ (AB) = [E(A) + E(B)]Ψ (AB)

C. CC method with double excitations

The Schrodinger equation is

HΨCCD = ECCDΨCCD (165)

h | | i h | i Φ0 H ΨCCD = ECCD Φ0 ΨCCD h | i We can put Φ0 ΨCCD = 1 by the choice of intermediate normalization

h | | i ECCD = Φ0 H ΨCCD (166)

T2 where ΨCCD = e Φ0 In order to make our calculation easy, we write the total Hamiltonian H in normal order form i.e. h | | i H = HN + 0 H 0

H = HN + Eref where

HN = FN + WN X 1 X H = f {pˆ qˆ} + hpq||rsi{pˆ qˆ sˆrˆ} N pq † 4 † † pq pqrs and h | | i Eref = 0 H 0 So equation (170) becomes

− h | − | i ECCD Eref = Φ0 (H Eref ) ΨCCD

h | | i ∆ECCD = Φ0 HN ΨCCD | i | i Let for simplicity Φ0 = 0 so

h | T2 | i ∆ECCD = 0 HN e 0

99 1 ∆E = h0|H (1 + T + T 2)|0i (167) CCD N 2 2 2

The first term in above equation is zero because HN is in normal order form and also the third term is zero due to slatter-condon rule so we get

h | | i ∆ECCD = 0 HN T2 0

1 X X 1 X  ∆E = tab h0| f {pˆ qˆ} + hpq||rsi{pˆ qˆ sˆrˆ} {aˆ bˆ jˆiˆ}|0i CCD 4 ij pq † 4 † † † † ijab pq pqrs The first term is zero since it is in normal order form and the second term becomes

1 XX ∆E = hpq||rsitab h0|{pˆ qˆ sˆrˆ}{aˆ bˆ jˆiˆ}|0i CCD 16 ij † † † † ijab pqrs As we know that the full contraction terms survive which are

1 XX  ∆E = h0|p q sra b ji|0i+h0|p q sra b ji|0i+h0|p q sra b ji|0i+h0|p q sra b ji|0i CCD 16 † † † † † † † † † † † † † † † † ijab pqrs

1 XX  ∆E = δ δ δ δ − δ δ δ δ − δ δ δ δ + δ δ δ δ CCD 16 pi qj sb ra pi qj sa rb pj qi sb ra pj qi sa rb ijab pqrs so

1 X  ∆E = hij||abi − hij||bai − hji||abi + hji||bai tab CCD 16 ij ijab since hij||abi = hij||bai so the above equation becomes

1 X ∆E = hij||abitab CCD 4 ij ijab

ab To calculate energy, we need the amplitudes tij and we can obtain equation for these amplitudes by projecting equation (169) onto all double excitation i.e.

h ab| | i h ab| i Φij HN ΨCCD = ECCD Φij ΨCCD

h ab| T2 | i h ab| T2 | i Φij HN e 0 = ECCD Φij e 0

1 1 hΦab|H (1 + T + T 2)|0i = E hΦab|(1 + T + T 2)|0i ij N 2 2 2 CCD ij 2 2 2

100 1 hΦab|H (1 + T + T 2)|0i = E hΦab|T |0i ij N 2 2 2 CCD ij 2

1 1 X h ab| 2 | i h ab| a0b0 i a0b0 Φ HN (1 + T2 + T ) 0 = ECCD Φ Φ t ij 2 2 (2!)2 ij i0j0 i0j0 i0j0a0b0 1 hΦab|H (1 + T + T 2)|0i = E tab (168) ij N 2 2 2 CCD ij Let’s evaluate each term separately of the LHS of equation (168). The first term is equal to

h ab| | i h || i Φij HN 0 = ab ij in above equation we used the Slatter-Condon rule. Now let’s evaluate the second term

1 X hΦab|H T |0i = hΦab|H |Φcditcd ij N 2 4 ij N kl kl klcd 1 X hΦab|H T |0i = hΦab|(F + W )|Φcditcd ij N 2 4 ij N N kl kl klcd

We solve each term separately in above equation. Let’s name the first term as L1 and the second as L2, so

1 X L = hΦab|F |Φcditcd 1 4 ij N kl kl klcd 1 XX L = f h0|{iˆ jˆ bˆaˆ}{pq}{cˆ dˆ lˆkˆ}|0itcd 1 4 pq † † † † kl klcd pq Here the possible contractions are 16, four of which are

    h0|{i†j†ba}{pq}{c†d†lk}|0i + h0|{i†j†ba}{pq}{c†d†lk}|0i +

    h |{ }{ }{ }| i h |{ }{ }{ }| i cd 0 i†j†ba pq c†d†lk 0 + 0 i†j†ba pq c†d†lk 0 tkl

  − − cd = δikδjlδbdδapδcq + δikδjlδacδbpδdq δikδjqδlpδacδbd δiqδkpδjlδacδbd tkl As there are twelve more terms, in each case we obtain contributions that are equal to 1 the above four terms canceling the factor 4 and hence we get X X X X cb ad − ab − ab L1 = factij + fbdtij fljtil fkitij c d l k

101 Changing the dummy summation indices in some terms and permuting some indices gives the following result.

X X − cb − ac − ab − ab L1 = (factij fbctij ) (fjktik fiktjk ) c k In canonical Hartee-Fock case

X X − bc − ac − ab − ab L1 = (εaδactij εbδbctij ) (εjδjktik εiδiktjk ) c k

− ba − ab − ab − ab L1 = (εatij εbtij ) (εjtij εitji )

− − ab L1 = (εa + εb εi εj)tji For the two particle part of the linear term, we have to evaluate

1 XX L = hpq||rsih0|{i j ba}{p q sr}{c d lk}|0itcd 2 16 † † † † † † kl pqrs klcd To obtain valid contractions in this case we must form two contraction each. They can be classified into three cases (a) contract two pairs of hole-index operators (b) contract two pairs of particle-index operators (c) contract one pair of each type.

1 XX L = hpq||rsih0|{i j ba}{p q sr}{c d lk}|0itcd 2a 16 † † † † † † kl pqrs klcd 1 XX L = hpq||rsih0|{ba}{p q sr}{c d }|0itcd 2a 8 † † † † kl pqrs cd

1 XX  L = hpq||rsi h0|{ba}{p q sr}{c d }|0i + h0|{ba}{p q sr}{c d }|0i+ 2a 16 † † † † † † † † pqrs klcd  h |{ }{ }{ }| i h |{ }{ }{ }| i cd 0 ba p†q†sr c†d† 0 + 0 ba p†q†sr c†d† 0 tkl

1 XX   L = hpq||rsi δ δ δ δ − δ δ δ δ − δ δ δ δ + δ δ δ δ tcd 2a 16 bq ap ds cr bq ap cs dr bp aq ds cr bq ap cs dr kl pqrs klcd

1 X L = hab||cditcd 2a 2 ij cd Now

102 1 XX L = hpq||rsih0|{i j ba}{p q sr}{c d lk}|0itcd 2b 16 † † † † † † kl pqrs klcd 1 XX L = hpq||rsih0|{i j }{p q sr}{lk}|0itcd 2b 8 † † † † kl pqrs kl

By similar contraction as for L2a, we get 1 X L = hkl||ijitab 2b 2 kl kl

For L2c, 1 XX L = hpq||rsih0|{i j ba}{p q sr}{c d lk}|0itcd 2c 16 † † † † † † kl pqrs klcd

1 XX  L = hpq||rsi h0|{i j ba}{p q sr}{c d lk}|0i + h0|{i j ba}{p q sr}{c d lk}|0i+ 2c 16 † † † † † † † † † † † † pqrs klcd  h |{ }{ }{ }| i h |{ }{ }{ }| i cd 0 i†j†ba p†q†sr c†d†lk 0 + 0 i†j†ba p†q†sr c†d†lk 0 tkl

1 XX  L = hpq||rsi h0|{j b}{p q sr}{c k}|0itac − h0|{i b}{p q sr}{c k}|0itac− 2c 16 † † † † ik † † † † jk pqrs klcd  h |{ }{ }{ }| i ba h |{ }{ }{ }| i bc 0 j†a p†q†sr c†k 0 tik + 0 i†a p†q†sr c†k 0 tjk

The first term in above equation can be contracted in four ways

h0|{j†b}{p†q†sr}{c†k} + {j†b}{p†q†sr}{c†k} + {j†b}{p†q†sr}{c†k} + {j†b}{p†q†sr}{c†k}|0i

After simplifying, we get −h || i ac = bk cj tik

Apply the same procedure on the other three terms of L2c and then after combining all the terms, we get X  − h || i ac − h || i ac − h || i bc h || i bc L2c = bk cj tik bk ci tjk ak cj tik + ak cj tjk kc

After adding L1, L2a, L2b, L2c, we get

L = L1 + L2a + L2b + L2c

103 1 X 1 X L = (ε + ε − ε − ε )tab + hab||cditcd + hkl||ijitab a b i j ji 2 ij 2 kl cd kl X  − h || i ac − h || i ac − h || i bc h || i bc bk cj tik bk ci tjk ak cj tik + ak cj tjk kc

Now we solve for the quadratic term in equation ()

1 Q = hΦab|H ( T 2)|0i ij N 2 2

1 Q = hΦab|(F + W )( T 2)|0i ij N N 2 2

1 1 Q = hΦab|(F )( T 2)|0i + hΦab|W ( T 2)|0i ij N 2 2 ij N 2 2 The first term in above equation having one electron operator is zero. If we think in 2 terms of diagram, this becomes more clear. Since T2 corresponds to quadruple excitation while the target state is a double excitation, we must use a −2 de-excitation level diagram − but FN has at most 1 de-excitation and hence becomes zero. The second term having two electron operator is

1 Q = hΦab|W (T 2)|0i 2 ij N 2

1 X X X ef Q = hpq||rsih0|{i j ba}{p q sr}{c d lk}{e f nm}|0itcdt 8 † † † † † † † † kl mn pqrs m>n,e>f k>l,c>d No nonzero contractions are possible between the third and fourth normal products in above equation and thus, to obtain nonzero contributions, four of the eight operators in the third and fourth normal products have to be contracted with the first product, and the remaining four with the second product. We shall first consider the case in which the four operators of the first product are contracted with the four operators of the fourth. This term, and the similar one in which the four contractions are between the first and third normal products, represent unlinked contributions since the set of contractions involving the first normal product is decoupled from the set involving the second. Considering the inequalities in the restricted summations over m, n, e, f and the restriction i > j,a > b, the contractions between the first and fourth products can be accomplished in only one way:

1 X X X ef Q = hpq||rsih0|{i j ba}{p q sr}{c d lk}{e f nm}|0itcdt a 8 † † † † † † † † kl mn pqrs m>n,e>f k>l,c>d

104 1 X X Q = hpq||rsih0|{p q sr}{c d lk}|0itcdtab a 8 † † † † kl ij pqrs k>l,c>d The above term can be contracted in four possible ways which gives equal contribution and is given

1 X Q = hkl||cditcdtab a 2 kl ij k>l,c>d The same result is obtained (after renaming the summation indices) for the case in which the four operators of the first product are contracted with those in the third product, and thus we get

1 X Q = hkl||cditcdtab b 2 kl ij k>l,c>d The remaining terms in the quadratic contribution fall into four classes, depending on the pattern of contractions of the first normal product. In class (a) the two hole operators of the first product are contracted with either the third or the fourth product (i.e. i† and i† are contracted with k and l, respectively, or with m and n, respectively, using ordered sums) while the two particle operators are contracted with the fourth or third product, respectively. These two types of contraction produce equal results, canceling a factor 1/2 . Then converting to unrestricted summations adds a factor 1/4, which is later canceled by the four equivalent ways of contracting the remaining operators, giving

1 XX Q = hpq||rsih0|{p q sr}{lk}{c d }|0itcdtab c 16 † † † † kl ij pqrs klcd 1 X Q = hkl||cditcdtab c 4 ij kl klcd In class (b) one hole and one particle operator of the first normal product are contracted with operators in the third product, while the remaining two operators are contracted with operators in the fourth. Converting to unrestricted summations, which introduces an additional factor 1/16, we find that there are 64 choices for these contractions.

Specifically, there are four ways for i† and a to be contracted with operators in the third product while j† and b can be contracted with operators in the fourth product in four ways, giving 16 equal terms; contracting i† and a with operators in the fourth product while j† and b are contracted with operators in the third product give 16 more terms equal to the above, for a total of 32 equal terms. Another set of 32 equal terms is obtained by contracting i† and b with operators in the third product while j† and a are contracted with operators in the fourth product, or vice versa. In total, after renaming the summation indices and performing the remaining contractions we get

105 1 XX Q = hpq||rsih0|{p q sr}{c k}{d l}|0i(tactbd − tbctad) d 4 † † † † ik jl ik jl pqrs klcd X h || i ac bd − bc ad Qd = pq rs (tik tjl tik tjl ) klcd X h || i ac bd bd ac Qd = pq rs (tik tjl + tik tjl ) klcd In classes (c) and (d) three operators of the first normal product are contracted with operators in the third product and one with an operator in the fourth, or vice versa. In class (c) the set of three operators in the first product consists of two particle operators and one hole operator while in class (d) it consists of one particle operator and two hole operators. Furthermore, each case can be generated in two distinct ways, depending on whether the set of three operators is j†ab or j†ab for (c) and i† j†a or i† j†b for (d). There are 16 possibilities in each case: the set of three operators in the first product can be contracted with operators in the third or the fourth product, and in each case these three contractions can be done in four ways, while the remaining single contraction can be chosen in two ways. The 16 possibilities lead to equivalent results, canceling the factor

1/16 obtained by converting to unrestricted summations. As an example, the first Qe term can be written in the form

1 X X X ef hpq||rsih0|{i j ba}{p q sr}{c d lk}{e f nm}|0itcdt 8 † † † † † † † † kl mn pqrs mnef klcd 1 XX = − h0|{p q sr}{c d k}{l}|0itcdtab 8 † † † † kj li pqrs klcd The sign reflects the odd number of interchanges needed to move all the contracted operators to the front in pairs (note that the summation index m is changed to l after the contraction). The remaining operators can be contracted in four ways:

h0|{p†q†sr}{c†d†k}{l} + {p†q†sr}{c†d†k}{l} + {p†q†sr}{c†d†k}{l} + {p†q†sr}{c†d†k}{l}|0i

1 X = − hkl||cditabtcd 2 ik jl klcd

Similarly for the second term of Qd, we get

1 X − hkl||cditcdtab 2 ik jl klcd

106 and hence we get

1 X Q = − hkl||cdi(tcdtab + tabtcd) e 2 ik jl ik jl klcd and using the same procedure for case (d), we get

1 X Q = − hkl||cdi(tactbd + tbdtac) f 2 ij kl ij kl klcd When all Q’s are put together, we get

1 X 1 X 1 X X Q = hkl||cditcdtab + hkl||cditcdtab + hkl||cditcdtab + hpq||rsi(tactbd 8 kl ij 8 kl ij 4 ij kl ik jl klcd klcd klcd klcd 1 X 1 X + tbdtac) − hkl||cdi(tcdtab + tabtcd) − hkl||cdi(tactbd + tbdtac) ik jl 2 ik jl ik jl 2 ij kl ij kl klcd klcd

Equation (168), after putting all the values and some cancellation, we get

1 X 1 X X εabtab = hab||iji + hab||cditcd + hkl||ijitab − hbk||cjitac − hbk||ciitac ij ij 2 ij 2 kl ik jk cd kl kc  1 X X − hak||cjitbc + hak||cjitbc + hkl||cditcdtab + hpq||rsi(tactbd + tbdtac) ik jk 4 ij kl ik jl ik jl klcd klcd 1 X 1 X − hkl||cdi(tcdtab + tabtcd) − hkl||cdi(tactbd + tbdtac) 2 ik jl ik jl 2 ij kl ij kl klcd klcd

In order to solve for the following CCD amplitude equation, we proceed as follows

1 X 1 X X εabtab = hab||iji + hab||cditcd + hkl||ijitab − hbk||cjitac − hbk||ciitac ij ij 2 ij 2 kl ik jk cd kl kc  1 X X − hak||cjitbc + hak||cjitbc + hkl||cditcdtab + hpq||rsi(tactbd + tbdtac) ik jk 4 ij kl ik jl ik jl klcd klcd 1 X 1 X − hkl||cdi(tcdtab + tabtcd) − hkl||cdi(tactbd + tbdtac) 2 ik jl ik jl 2 ij kl ij kl klcd klcd

ab ab h || i εij tij = ab ij + L(t) + Q(tt) where L(t) and Q(tt) corresponds to the linear and quadratic amplitudes respectively. Now the question is how to solve for the amplitude t’s. Here we will use the iterative method. First we substitute L(t) and Q(tt) equal to zero. Thus the first approximation to ab tij is

107 hab||iji tab = ij ab εij This gives an estimate of each amplitude. This approximate value is then substituted then back on the right hand side to evaluate the left hand side and so forth. Finally, one can achieve a self-consistency of the iterative process and obtain the CC function for the ground state of the system. A more efficient way is when the initial amplitudes are taken from a short CI expansion, with subsequent linearization of terms containing the initial (known) amplitudes.

D. Equivalence of CC and MBPT theory

Here we will show that CC form of wave function can be derived from the infinite order of MBPT wave function. The total wave function in MBPT can be written as X ∞ ˆ n | i ΨMBP T = (R0VN ) 0 n=0

(1) (2) = Φ0 + Ψ + Ψ + ..... 0 where the superscripts indicate the order in VN and where VN = FN + W The algebraic expression for individual orders of Ψ can be written as

1 X hab||iji X ha|fˆ|ii Ψ (1) = Φab + Φa 4 ab ij εa i abij εij ai i

1 X hab||cdihcd||iji X hak||cjihcb||iki 1 X haj||cbihcb||iji P si(2) = Φab − Φab + Φa 8 ab cd ij ab bc ij 2 cb a i abcdij εij εij abcijk εij εik abcij εij εi 1 X hab||dkihad||iji 1 X hlc||jkihab||ili 1 X hcd||klihab||iji + Φabc− Φabc+ Φabcd+...... 4 abc ad ijk 4 abc ab ijk 16 abcd ab ijkl abcdijk εijk εij abcijkl εijk εil abcdijkl εijkl εij

In Ψ1 , Ψ2 etc. contain expressions corresponding to connected and disconnected diagrams. Let r=1 contain all the expressions corresponding to connected form of wave function diagrams. We can represent this class by connected operator Tˆ, X ˆ ∞ ˆ n | i T = (R0VN ) 0 C n=0 where XN ˆ ˆ T = Tm m=1

108 We can see that in perturbation theory, infinite order of connected form corresponds to Tm. so we can write N X (n) Tm = Tm m=1 | i { ˆ n | i} Tm 0 = (R0VN ) 0 C,m and hence the corresponding expansion for amplitudes is

X abc... ∞ abc...(n) tijk... = tijk... n=0

(1) As an example, we can write the expression corresponding to T2 is

(1) 1 X hab||iji T = Φab 2 4 ab ij abij εij where ab(1) hab||iji t = ij ab εij

Initial few terms in the expansion of T1 , T2 interms of MBPT expressions are given

| i (1) (2) (3) | i T1 0 = (T1 + T1 + T1 + ...) 0

X ha|fˆ|ii 1 X haj||cbihcb||iji 1 X haj||cbihcb||iji  = a†i + a†i − a†i + ... |0i εa 2 cb a 2 cb a ai i abcij εij εi abcij εij εi

| i (1) (2) (3) | i T2 0 = (T2 + T2 + T2 + ...) 0

1 X hab||iji 1 X hab||cdihcd||iji X hak||cjihcb||iki  = a b ji + a b ji − a b ji + .... |0i 4 ab † † 8 ab cd † † ab bc † † abij εij abcdij εij εij abcijk εij εik

similarly we can write for T3 and other terms. Now we will show that r = 2(having two disconnected parts) expression of MBPT corresponds to the square terms in CC theory. we will illustrate this, for m=2, r=2 and in first order. We can write the last term of Ψ (2) as

1 X hcd||klihab||iji 1 1 X  1 1  = hcd||klihab||iji + {a b ji}{c d lk} 16 abcd ab 2 16 abcd ab abcd cd † † † † abcdijkl εijkl εij abcdijkl εijkl εij εijkl εkl

109 1 1 X  1 1  = hcd||klihab||iji + {a b ji}{c d lk} 2 16 ab cd ab ab cd cd † † † † abcdijkl (εij + εkl )εij (εij + εkl )εkl after simplification, we get

1 1 X hcd||klihab||iji = {a b ji}{c d lk} 2 16 cd ab † † † † abcdijkl εkl εij

1 (1) = (T )2 2 2 if we collect terms in a similar fashion, at the end we will get all the terms of exponential in CC theory i.e.

Tˆ | i ΨMBP T = e 0

E. Noniterative triple excitations correction

CCSDT (coupled cluster single double and triple) approximation is more accurate than CCSD but has an order of N 8 and hence is very expensive computationally. In order to reduce the cost, MBPT have been used to account for the famous (T) correction and is called CCSD(T) approach instead of using full CCSDT approximation. Now we will show that how MBPT can be used in (T) correction of connected triple excitation.

We can decompose the normal ordered Hamiltonian HN as follows

0 1 HN = H + H = FN + VN

where zeroth order component of the Hamiltonian is taken to be the Fock operator such

that the perturbation operator is then the remaining two electron operator i.e. VN . Also we can decompose the cluster operators as done before in previous section i.e.

(1) (2) (3) Tm = (Tm + Tm + Tm + ... (169)

We can define our Hamiltonian as

¯ T T H = e HN e

For T = T1 + T2 + T3, the above equation takes the form

1 1 1 H¯ = (H + H T + H T + H T + H T 2 + H T 2 + H T 2 + H T T + H T T + ...... ) N N 1 N 2 N 3 2 N 1 2 N 2 2 N 3 N 1 2 N 1 3 C The proof of this relation is asked in the homework problem set. As in the previous section, we proved the equivalence between CC theory and MBPT interms of wave

110 function so in a similar way we can show the equivalence in terms of energy. The CCSD energy contains contributions identical to those of MBPT(2) and MBPT(3) energy, but lacks triple excitation contribution necessary for MBPT(4). Thus a natural approach to the "triples problem" is to correct the CCSD energy for the missing MBPT(4) terms using the CCSDT similarity-transformed Hamiltonian,

¯ T1 T2 T3 T1+T2+T3 H = e− − − HN e

For m=1,2,3 in equation (169) and then plug in the above equation we get

H¯ = H¯ (0) + H¯ (1) + H¯ (2) + .... Here we are interested in calculating E(4) so we will need H¯ (4) which is

¯ (4) 3 H = (VN T2 )C so (4) h | 3 | i E = 0 (VN T2 )C 0 3 plugging the VN and T2 operator in above equation and using Wick’s theorem, we get

1 ab(3) X E(4) = t hij||abi (170) 4 ij ijab

ab(3) in above equation, tij is not known and need to be calculated as follows

h ab| ¯ (3)| i 0 = Φij H 0

1 0 = hΦab|(F T 2 + V T 2 + V T 2 + V T 2 + V (T 1)2)|0i ij N 2 N 1 N 2 N 3 2 N 2 2 abc(2) we can see from above equation that it contains T3 and hence will involve tijk amplitude. so in order to find this amplitude, we have

h abc| ¯ (2)| i 0 = Φijk H 0

1 0 = hΦabc|V T 1 + V T 2 + V (T 1)2|0i ijk N 2 N 3 2 N 2 after plugging the operator and using Wick’s theorem we get X X abc abc(2) | | h || i ad(1) − | | h || i ab(1) εijk tijk = P (i ij)P (a bc) bc dk tij P (i kj)P (c ab) lc jk til (171) d l where P (p|qr) permutation operators perform anti-symmetric permutations of index 2 3 p with indices q and r. These T3 amplitudes may then be used to compute the T2

111 amplitudes, which may then be used in equation (170) to compute the triple excitation (4) contribution to the forth-order energy, ET . The corrected CCSD energy is

(4) ECCSD+T (4) = ECCSD + ET

and is referred as CCSD + T (4) method. If on the other hand one choose to use the

converged CCSD T2 amplitudes rather than first order T2 in equation ( 174 ) then one can obtain different correction which is called CCSD+T(CCSD) or CCSD[T]

[4] ECCSD+T (CCSD) = ECCSD + ET

This approach is reported to give quantitatively incorrect predictions of molecular properties for some systems. In 1989 a similar analysis was developed by Raghavachari et al., who determined that a fifth-order energy contribution involving single excitations, [5] denoted EST ;, should be included in the CCSD correction, as well. This component may be derived based on the second-order T3 contribution to the third-order T1 operator, which subsequently contributes to fourth-order T2. Although the diagrammatic techniques [5] described above are particularly convenient for deriving EST , here we will simply present the final equation [5] 1 X E = hjk||bcitatabc ST 4 i ijk ijkabc where the triple-excitation amplitudes are determined using a modified form of Eq. (

174 ) that includes converged T2 amplitudes X X abc abc(2) | | h || i ad − h || i bc εijk tijk = P (i jk)P (a bc) bc di tjk la jk til d l Hence, the total CCSD(T) energy may be succinctly written as

[4] [5] ECCSD(T ) = ECCSD + ET + EST This method of energy calculation is called CCSD(T) approach and is the "Gold Standard" in quantum chemistry. The second method to solve for the amplitude equation is the multivariable Newton- Raphson Method. We can see that the amplitude equation we have is nonlinear. We can ab write the amplitude equation in matrix form (by defining tij as the ij,ab element of the t column vector) as

0 = a + bt + ctt (172)

where a = hab||iji. The solution of these nonlinear algebraic equations pose a substantial difficulty in implementing coupled cluster theory.

112 To solve for the nonlinear amplitude equation, we choose t such that the vector f(t) defined as f (t) ≡ a + bt + ctt (173) becomes equal to zero. This is done by expanding f(t) about the point t0. Keeping only linear terms in this Taylor expansion and setting f(t) equal to zero, one obtains equation for the changes ∆t in the t amplitudes, which can be expressed as

X∂f ab  a ab ij cd fij b(t) = 0 = fij (t0) + ∆tkl (174) cd t klcd ∂tkl 0

The step lengths(corrections to t0) can be obtained by solving the above set of linear equations and then used to update the t amplitudes

t = t0 + ∆t

These values of t can then be used as a new t0 vector for the next application of Eq. (174). This multidimensional Newton-Raphson procedure, which involves the solution of a large number of coupled linear equations, is then repeated until the ∆t values are sufficiently small (convergence). Although the first applications of the coupled cluster method to quantum chemistry did employ this Newton-Raphson scheme, the numerical problems involved in solving the large multivariable inhomogenous equations (174) has led more recent workers to use the perturbative techniques discussed already. To solve for nonlinear equation of coupled cluster theory, other methods were devised. One such method within the perturbative framework is the reduced linear equations technique, developed by Purvis and Bartlett. Although this method can efficiently solve a large systems of linear equations but can also be used for nonlinear coupled cluster equations by assuming an approximate linearization of the nonlinear terms.

F. Full triple and higher excitations

Due to the on going growth of in computational resources, it is nowadays often possible to perform full Coupled Cluster singles, doubles and triples (CCSDT) calculations in cases that demand very high accuracy. This method was first formulated in 1987. The 8 complete inclusion of T3 makes it harder and hence scales as N . However it is necessary in many cases to go beyond this level and to include correlation effects beyond CCSDT. The next method is CC single, double, triples, and quadruples (CCSDTQ) which is very expensive with a computational scaling of N 10. It is of interest to explore methods intermediate between CCSDT and CCSDTQ with a reduced scaling of the cost. The CC wave function including quadruple excitation is given by ΨCCSDTQ = exp(T1+T2+T3+T4).

113 X. LINEAR RESPONSE THEORY

The linear response theory is used in situations when a system of electrons is subject to small perturbations. For example example an electric or magnetic field from a probe in an experiment. The response properties of a system determine the screening (dielectric) properties of the system and can be used to study excited states of a system. The density-density response function determines the second-order dispersion energy in the symmetry-adapted perturbation theory (SAPT) and the exact exchange-correlation energy of the density functional theory Exc.

A. Response function

| i Let us consider a system in the ground state Ψ0 of a Hamiltonian operator Hˆ 0 i.e., | i | i ˆ Hˆ 0 Ψ0 = E0 Ψ0 . Let A be an observable of the system then its expectation value is ˆ h | ˆ i i(t t0)H0 A0 = Ψ0 AΨ0 . The time evolution operator in this situation is Uˆ (t,t0) = e− − and i(t t )E therefore the wave function should just have a phase change in time e− − 0 0 . Let the system be subject to a perturbation Hˆ 1(t) = F(t)Bˆ, where F(t) is a time-dependent field coupled to an observable Bˆ of the system. The total Hamiltonian becomes Hˆ (t) = ˆ Hˆ 0 + Hˆ 1(t). As a consequence the expectation value of A also become time dependent in general i.e., h | ˆ| i A(t) = Ψ (t) A Ψ (t) t > t0 . (175) − ˆ The difference A(t) A0 is called the response of A to the perturbation Hˆ 1(t). The response in general can be written as − ··· A(t) A0 = A1(t) + A2(t) + A3(t) , (176) where A1(t) is the change which is first-order (linear) in the perturbation Hˆ 1(t), A2(t) is second-order (quadratic) and so on. We will limit discussion to linear response A1(t). i(t t )Hˆ The time evolution operator can be written now as Uˆ (t,t0) = e− − 0 0 Uˆ1(t,t0). The state of the system after the application of perturbation evolves as

ˆ | i | i i(t t0)H0 | i Ψ (t) = Uˆ (t,t0) Ψ0 = e− − Uˆ1(t,t0) Ψ0 . (177)

The time-dependent Schrödinger wave equation gives ∂|Ψ (t)i i = [Hˆ + Hˆ (t)]|Ψ (t)i (178) ∂t 0 1 ∂Uˆ (t,t ) ˆ ˆ 1 0 i(t t0)H0 ˆ i(t t0)H0 i = e − H (t)e− − Uˆ (t,t ) (179) ∂t 1 1 0 Z t ˆ ˆ ˆ ˆ − i(t0 t0)H0 ˆ i(t t0)H0 ˆ U1(t,t0) = U1(t0,t0) i dt0 e − H1(t0)e− − U1(t0,t0) (180) t0

114 As perturbation is turned on at t = t0 so Uˆ1(t0,t0) = 1. Therefore, Z t ˆ ˆ ˆ − i(t0 t0)H0 ˆ i(t t0)H0 ˆ U1(t,t0) = 1 i dt0 e − H1(t0)e− − U1(t0,t0) (181) t0 This integral transformation equation can be solved iteratively. The zeroth-order solution ˆ (0) is U1 (t,t0) = 1, thus, the first-order solution is Z t (1) ˆ ˆ ˆ − i(t0 t0)H0 ˆ i(t t0)H0 U1 (t,t0) = 1 i dt0 e − H1(t0)e− − . (182) t0 The time evolution operator to first-order in perturbation is

" Z t # ˆ ˆ ˆ ˆ (1) i(t t0)H0 − i(t0 t0)H0 ˆ i(t t0)H0 U (t,t0) = e− − 1 i dt0 e − H1(t0)e− − . (183) t0 The linear response then can be written as − h | ˆ| i − h ˆ i A1(t) = A(t) A0 = Ψ (t) A Ψ (t) Ψ0AΨ0 h | (1) ˆ (1) | i − h | ˆ| i = Ψ0 Uˆ †(t,t0)AUˆ (t,t0) Ψ0 Ψ0 A Ψ0 (184)

Consider

(1) ˆ (1) Uˆ †(t,t0)AUˆ (t,t0) " Z t # " Z t # ˆ ˆ − +i(t t0)H0 ˆ i(t t0)H0 − − = 1 + i dt0 F(t0)Bˆ(t0 t0) e − Ae− − 1 i dt0 F(t0)Bˆ(t0 t0) t0 t0 " Z t # " Z t # − ˆ − − − = 1 + i dt0 F(t0)Bˆ(t0 t0) A(t t0) 1 i dt0 F(t0)Bˆ(t0 t0) t0 t0 " Z t #" Z t # − ˆ − − ˆ − − = 1 + i dt0 F(t0)Bˆ(t0 t0) A(t t0) i dt0 F(t0)A(t t0)Bˆ(t0 t0) t0 t0 Z t Z t ˆ − − ˆ − − − ˆ − ˆ 2 = A(t t0) i dt0 F(t0)A(t t0)Bˆ(t0 t0) + i dt0 F(t0)Bˆ(t0 t0)A(t t0) + O(F ) t0 t0 Z t ˆ − − { ˆ − − − − ˆ − } ˆ 2 = A(t t0) i dt0 F(t0) A(t t0)Bˆ(t0 t0) Bˆ(t0 t0)A(t t0) + O(F ) t0 Z t h i ˆ − − ˆ − − ˆ 2 = A(t t0) i dt0 F(t0) A(t t0),Bˆ(t0 t0) + O(F ), (185) t0 h ˆ − − i where A(t t0),Bˆ(t0 t0) is a commutator. Using Eq. (185) in Eq. (184) and keeping terms upto first order in field F we get

Z t h | ˆ| i − h |h ˆ − − i| i − h | ˆ| i A1(t) = Ψ0 A Ψ0 i dt0 F(t0) Ψ0 A(t t0),Bˆ(t0 t0) Ψ0 Ψ0 A Ψ0 t0 Z t − h |h ˆ − − i| i = i dt0 F(t0) Ψ0 A(t t0),Bˆ(t0 t0) Ψ0 (186) t0

115 Now consider

h ˆ − − i A(t t0),Bˆ(t0 t0) ˆ − − − − ˆ − = A(t t0)Bˆ(t0 t0) Bˆ(t0 t0)A(t t0) i(t t )H i(t t )Hˆ i(t t )Hˆ i(t t )Hˆ i(t t )Hˆ i(t t )Hˆ = e − 0 0 Aeˆ − − 0 0 Beˆ − 0− 0 0 − e 0− 0 0 Beˆ − 0 0 Aeˆ − − 0 0 i(t t )H h i i(t t )H = e 0− 0 0 Aˆ(t − t0),Bˆ e− 0− 0 0 , (187)

ˆ ˆ ˆ i(t t0)H0 ˆ i(t t0)H0 i(t0 t0)H0 where A(t − t0) = e − Ae− − . As e− − is unitary, and it is well known in quantum mechanics that unitary transformations do not change expectation values. We confirm it here as h i h i h | ˆ − − | i h | i(t0 t0)H0 ˆ − i(t0 t0)H0 | i Ψ0 A(t t0),Bˆ(t0 t0) Ψ0 = Ψ0 e − A(t t0),Bˆ e− − Ψ0 h i h | i(t0 t0)E0 ˆ − i(t0 t0)E0 | i = Ψ0 e − A(t t0),Bˆ e− − Ψ0 h |h ˆ − i| i = Ψ0 A(t t0),Bˆ Ψ0 . (188)

Thus linear response from Eq. (186) using Eq. (188) is

Z t − h |h ˆ − i| i A1(t) = i dt0 F(t0) Ψ0 A(t t0),Bˆ Ψ0 (189) t0 The response function is defined as h i − ≡ − − h | ˆ − ˆ | i χAB(t t0) iΘ(t t0) Ψ0 A(t t0),B Ψ0 , (190) where Θ(t − t0) is time step function which has value 1 for t ≥ t0 and zero otherwise.This ensures the causality. It allows us to replace upper limit by ∞.Thus linear response of Aˆ in terms of the response function is Z ∞ − A1(t) = dt0 F(t0)χAB(t t0) (191) −∞ where the lower limit has been extended to −∞ as field F(t) is zero for all values below t0. Let us write Eq. (191)in Fourier space, Z Z Z Z 1 ∞ iωt 1 ∞ ∞ ∞ iωt i(t t )ω dω A (ω)e− = dt0dωdω0F(ω)χ (ω0)e− 0 e− − 0 0 2π 1 (2π)2 AB −∞ Z−∞ Z−∞ Z−∞ 1 ∞ ∞ ∞ i(ω ω )t iω t = dt0dωdω0F(ω)χ (ω0)e− − 0 0 e− 0 (2π)2 AB Z −∞Z −∞ −∞ 1 ∞ ∞ iω t = dωdω0 F(ω)χ (ω0)δ(ω − ω0)e− 0 2π AB Z−∞ −∞ 1 ∞ iωt = dω F(ω)χ (ω)e− , (192) 2π AB −∞

116 R i(ω ω )t where we used 2πδ(ω − ω0) = ∞ dt0e− − 0 0 . The relation in Eq. (192) holds for any value of t. So, we can write −∞

A1(ω) = F(ω)χAB(ω) (193)

The Fourier transform of Eq. (190) gives frequency dependent response. Z Z h i ∞ iωτ − ∞ h | ˆ ˆ | i iωτ χAB(ω) = dτ χAB(τ)e = i dτ Θ(τ) Ψ0 A(τ),B Ψ0 e

−∞ Z Z −∞iω τ 1 ∞ ∞ e 0 h i = lim dτdω − hΨ | Aˆ(τ),Bˆ |Ψ ieiωτ , (194) + 0 0 0 η 0 2π ω0 + iη → −∞ −∞ R iω τ i e− 0 where we have used the integral representation of time step function Θ(τ) = limη 0+ ∞ dω0 . 2π ω0+iη P | ih | → −∞ Now using 1 = j∞=0 Ψj Ψj we get

Z Z hΨ | ˆ|Ψ ihΨ | ˆ|Ψ i X∞ 1 ∞ ∞ 0 A j j B 0 i(ω ω+Ω )τ χ (ω) = lim dτdω e− 0− j AB + 0 η 0 2π ω0 + iη → j=0 −∞ −∞ Z Z hΨ | ˆ|Ψ ihΨ | ˆ|Ψ i X∞ 1 ∞ ∞ 0 B j j A 0 i(ω ω Ω )τ − lim dτdω e− 0− − j , (195) + 0 η 0 2π ω0 + iη → j=0 −∞ −∞ − where Ωj = Ej E0. Now using the standard integral

ZZ i(ω0 ω+Ωj )τ Z − 1 e− − δ(ω0 ω + Ωn) 1 dτdω0 = dω0 = − , (196) 2π ω0 + iη ω0 + iη ω Ωj + iη we can write Eq. (195) as   X hΨ |Aˆ|Ψ ihΨ |Bˆ|Ψ i hΨ |Bˆ|Ψ ihΨ |Aˆ|Ψ i ∞  0 j j 0 − 0 j j 0  χAB(ω) = lim  , (197) η 0+  ω − Ω + iη ω + Ω + iη  → j=1 j j where the j = 0 terms cancel out in the first and second terms.This is the so-called Lehman representation of response function.

1. Density-density response function

Consider the perturbation coupled to electron density as Z 3 Hˆ 1(t) = d r0 v1(r0,t)nˆ(r0), (198) where v1(r0,t) is the fluctuation in the external potential. The corresponding change in the density is Z Z ∞ 3 − n1(r,t) = dt0 d r0 χnn(r,r0,t t0)v1(r0,t0), (199) −∞

117 − where χnn(r,r0,t t0) is the density-density response function which can be written as − − − h | − | i χnn(r,r0,t t0) = i Θ(t t0) Ψ0 [nˆ(r,t t0),nˆ(r0)] Ψ0 . (200)

In frequency space we can write Z 3 n1(r,ω) = d r0 χnn(r,r0,ω)v1(r0,ω), (201)

where χnn(r,r0,ω) is the density-density response function in the frequency space. A fluctuation in the Kohn-sham potential vs(r0,ω) can induce a change in density Z 3 n1(r,ω) = d r0 χs(r,r0,ω)vs 1(r0,ω), (202)

where χs(r,r0,ω) is the Kohn-Sham response function. The Kohn-Sham potential is given as

vs(r,ω) = vH(r,ω) + vxc(r,ω) + vext(r,ω)

vs 1(r,ω) = vH1(r,ω) + vxc1(r,ω) + vext1(r,ω) (203) Z Z 3 δvH[n](r,ω) 3 δvxc[n](r,ω) = d r0 n1(r0,ω) + d r0 n1(r0,ω) + v1(r,ω) δn(r ,ω) δn(r ,ω) 0 n 0 n Z Z 0 0 n (r ,ω) = d3r 1 0 + d3r f (r,r ,ω)n (r ,ω) + v (r,ω) 0 |r − r | 0 xc 0 1 0 1 Z 0 3 = d r0 [w(r,r0) + fxc(r,r0,ω)]n1(r0,ω) + v1(r,ω) Z 3 = d r0 fHxc(r,r0,ω)n1(r0,ω) + v1(r,ω) (204)

r r 1 r r δvxc[n](r,ω) where w( , 0) = r r , fxc( , 0,ω) = δn(r ,ω) is the so-called exchange-correlation | − 0| 0 n0 kernel and fHxc(r,r0,ω) = w(r,r0) + fxc(r,r0,ω) is the so-called Hartree-xc kernel. The KS response function χs relates the change of density n1 and change of KS potential given in Eq. (204) as Z 3 n1(r1,ω) = d r0 χs(r,r0,ω)vs 1(r0,ω) Z 3 3 3 = d r0d r00 χs(r,r0,ω)fHxc(r0,r00,ω)n1(r00,ω) + d r0 χs(r,r0,ω)v1(r0,ω). (205)

Consider"

3 3 d r0d r00 χs(r,r0,ω)fHxc(r0,r00,ω)n1(r00,ω)

" 3 3 3 = d r0d r00d r000 χs(r,r0,ω)fHxc(r0,r00,ω)χ(r00,r000,ω)v1(r000,ω) (206)

$ 118 Using Eq. (206) and Eq. (201) in Eq. (205) we can write Z Z 3 3 d r000 χ(r,r000,ω)v1(r000,ω) = d r000 χs(r,r000,ω)v1(r000,ω)

3 3 3 + d r0d r00d r000 χs(r,r0,ω)fHxc(r0,r00,ω)χ(r00,r000,ω)v1(r000,ω) (207)

Since, this relationship$ is valid for any arbitrary perturbation v1, hence, one can write

3 3 χ(r,r0,ω) = χs(r,r0,ω) + d r00d r000 χs(r,r00,ω)fHxc(r00,r000,ω)χ(r000,r0,ω)

3 "3 = χs(r,r0,ω) + d r00d r000 χs(r,r00,ω)[w(r00,r000) + fxc(r00,r000,ω)]χ(r000,r0,ω) (208)

This equation is the" so-called Dyson screening equation. The iterative formal solution is possible if on knows the exchange-correlation kernel fxc. If we set fxc = 0 we end up with the random-phase approximation (RPA).

2. Calculation of properties from response functions

The response function can be used to calculate polarizability of a system which is very important physical property. If a system is subject to an electric field E(t) = εsin(ωt)eˆz then the 1st-order dipole polarization p1 is given as Z − p1(t) = dt0 (t t0)E(t0), (209)

p1(ω) = (ω)E(ω) (210) where is the dipole-dipole polarizability tensor. The perturbation as a consequence of

E(t) is v1 = zεsin(ωt) which in Fourier-space becomes v1(r,ω) = εz/2 which can polarize the system of electrons and change density Z 3 n1(r,ω) = d r0 χ(r,r0,ω)v1(r0,ω) (211)

The z-component of the polarization would be Z − 3 p1z = d r zn1(r,ω) (212)

− 3 3 = d rd r0 z χ(r,r0,ω)v1(r0,ω) (213) ε = −" d3rd3r z χ(r,r ,ω)z (214) 2 0 0 0 Comparison of Eqs. 210 and 214 leads" to the conclusion that the dipole-dipole polarizability αzz is given as − 3 3 αzz(ω) = d rd r0 z χ(r,r0,ω)z0. (215)

" 119 Now using the Lehman representation of the response function χ and using the fact

3 3 h | | ih | | i d rd r0 z z0 Ψ0 nˆ(r) Ψj Ψj nˆ(r0) Ψ0 X X h | | ih | | i "= Ψ0 zk Ψj Ψj zl Ψ0 k l X X h | | ih | | i = Ψ0 zk Ψj Ψj zl Ψ0 k l h | | i 2 = Ψ0 Z Ψj , (216) P where Z = i zi, the polarizability can be written as   X ∞  1 − 1  h | | i 2 αzz(ω) = lim   Ψ0 Z Ψj η 0+ ω − Ω + iη ω + Ω + iη  → j=1 n j   X 2Ω ∞  j  h | | i 2 = lim   Ψ0 Z Ψj (217) η 0+ ( + )2 − Ω2  → j=1 ω iη n

The polarizability generally would be defined as

3 3 αab = d rd r0rarb0 χ(r,r0,ω) (218)   X XN XN " ∞  2Ωn h | | ih | | i = lim   Ψ0 ral Ψj Ψj r0 Ψ0 (219) η 0+ (ω + iη)2 − Ω2  b m → j=1 j l m

B. Linear response in CC approach

1. CC equations

The Schrödinger equation for coupled cluster system is already given by equation (165) and the corresponding wavefunction is given by equation (166).

| i Tˆ | i Ψ CCD = e 0 (220) where Tˆ is the excitation operator and |0 > is the Fermi vacuum state. For normal ordered Hamiltonian, the time independent Schrödinger equation with CC method is given as

ˆ | i | i HN Ψ CCD = ∆E Ψ CCD (221) ˆ | i − | i (HN Ψ CCD ∆E Ψ CCD) = 0 ˆ − | i (HN ∆E) Ψ CCD = 0

120 Here ∆E is the energy with respect to vacuum reference state. Applying eTˆ to the left of the above equation yields

Tˆ ˆ Tˆ − | i ( e− HN e ∆E ) 0 = 0 which we can write as (Hˆ − ∆E)|0 > = 0

Hˆ Tˆ ˆ Tˆ Hˆ where = ( e− HN e ). Here we can see that which is also called the CC effective Hamiltonian/similarity transformed CC Hamiltonian, is non-hermitian and is a symmetric transformation of normal ordered Hamiltonian. Also, ∆E is the energy corresponding to the state vector |0 >. Hˆ Tˆ ˆ Tˆ Now let us take a closer look to the CC effective Hamiltonian = e− HN e . Using Baker-Campbell-Housdorff expansion one can write that

ˆ ˆ 1 1 e BAeˆ B = Aˆ + [A,ˆ Bˆ] + [[A,ˆ Bˆ],Bˆ] + [[[A,ˆ Bˆ],Bˆ],Bˆ] + ... (222) − 2! 3! The transformed CC Hamiltonian thus becomes

ˆ ˆ 1 1 1 e T Hˆ eT = Hˆ +[Hˆ ,Tˆ]+ [[Hˆ ,Tˆ],Tˆ]+ [[[Hˆ ,Tˆ],Tˆ],Tˆ]+ [[[[Hˆ ,Tˆ],Tˆ],Tˆ],Tˆ] (223) − N N N 2! N 3! N 4! N Here the series terminate with four fold term as the Hamiltonian has atmost two- particle interactions. Now Tˆ has only particle creation and hole annihilation operators and only possible non-zero contractions are

AB† = δab

i†j = δij ˆ ˆ which shows that [Tm,Tn] = 0. Therefore the only non-zero terms will be commutation ˆ ˆ between HN and T . Since only non-zero contractions are particle creation with particle annihilation on it’s left and hole annihilation with hole creation on its left, the only non- zero terms will be HN on the left. Hˆ therefore becomes

H TT ˆ Hˆ = Hˆ + Hˆ Tˆ + N + ... = (H eT ) N N 2! N C Here ’C’ denotes that only fully connected terms are included. Now we can replace the term in Schrödinger equation for CC system.

ˆ Tˆ | i | i (HN e 0 )CCD = ∆E 0 (224)

121 Now let us define the ground state and excited state projecton operators P, Q as

P = |0ih0| (225)

Q = I − P (226) which has the properties :

P 2 = P (227)

Q2 = (1 − P )2 = 1 − P − P + P 2 = Q (228) Therefore using projection operators equation (8) can be written as

(h0|H|ˆ 0i) = ∆E (229)

and

h ab...|H|ˆ i ( φij... 0 ) = 0 (230) i.e.

P HP = ∆EP (231) and

QHP = 0 (232) These are called the CC amplitude equations. We are going to use these equations and Hellmann Feynman theorem to derive CC energy functional using linear response theory.

2. Hellmann-Feynman theorem

As we already know by now, response theory is an alternative of studying properties of molecular system in presence of perturbation ( i.e. external electric or magnetic field or displacement between nuclei etc. ). Using Hellmann-Feynman theorem, we can study first or higher order properties even if the wavefunction is not know. The theorem states that the expectation value of first and higher order property is equivalent to the energy derivative with respect to applied perturbation at the point when perturbation is equal to zero. Let us try to prove the theorem. Since we are considering a system in a perturbed field, we can write Hˆ = Hˆ (λ) and a Maclaurin series expansion gives

122 Hˆ (λ) = Hˆ (0) + λHˆ 1 + λ2Hˆ 2 + ... (233)

ˆ (n) 1 dnHˆ | where H = n! dλn λ=0 Here Hˆ (0) is the unperturbed Hamiltonian of the system and λ is the perturbation parameter. For linear perturbation all higher order terms with n > 1 will be zero and the Hamiltonian will be

Hˆ (λ) = Hˆ (0) + λHˆ 1 (234)

Similarly, the energy and the wavefunctions will be

E(λ) = E(0) + λE(1) + ... (235)

and

Ψ (λ) = Ψ (0) + λΨ (1) + ... (236)

Now Schrödinger equation gives

Hˆ (λ)Ψ (λ) = E(λ)Ψ (λ) (237)

(Hˆ (0) + λHˆ (1))(Ψ (0) + λΨ (1) + ...) = (E(0) + λE(1) + ...)(Ψ (0) + λΨ (1) + ...)

h (0)| ˆ (1)| (0) Ψ H Ψ > 1 dE(λ)| = E = (238) hΨ (0)|Ψ (0) > dλ λ=0 which is a trivial case of Hellmann-Feynman theorem i.e. first order property of a system can be studied as derivative of the perturbed energy at λ = 0 i.e. at zero perturbation. We can take an example of a system in an external electric field, E~ ( perturbation in this case). The energy of the system is given by

X E~ −E·~ E( ) = qu~ru (239)

dE~(~) It is quite conspicuous from the energy equation that first order property i.e. E | d a ~=0 is nothing but the dipole moment of the system. The higher order properties canE E be obtained following same method.

123 3. Linear response CC for static perturbation

Now that we have all required equations, we can derive energy functional for CC. The amplitude equation for CC is given by equation (234), (235)

P HP = ∆EP

and

QHP = 0

Taking derivative of equation (234) with repect to λ gives

d∆E d Pˆ = Pˆ HˆPˆ (240) dλ dλ Hˆ Tˆ ˆ Tˆ Now = ( e− HN e )

ˆ ˆ ˆ d Hˆ − Tˆ dT ˆ Tˆ Tˆ ˆ Tˆ dT Tˆ dHN Tˆ = e− HN e + e− HN e + e− e dλ λ=0 dλ dλ dλ

= [Hˆ, Tˆ λ] + Hˆ [λ]

ˆ Hˆ [λ] Tˆ dHN Tˆ ˆ λ dTˆ where = e− dλ e and T = dλ λ=0 λ=0

On inserting this expression into above energy derivative gives,

Pˆ[Hˆ,Tˆ λ]Pˆ

= PˆHˆ(Pˆ + Qˆ )Tˆ λPˆ − PˆTˆ λ(ˆPˆ + Qˆ )HˆP

= ∆EPˆTˆ λPˆ + PˆHˆQˆ Tˆ λPˆ − ∆EPˆTˆ λPˆ = PˆHˆQˆ Tˆ λPˆ

This expression can be simplified by using CC amplitude equation as follows :

Qˆ HˆPˆ = 0

124 dHˆ Qˆ Pˆ = 0 dλ i.e.

Qˆ {Hˆ [λ] + [Hˆ,Tˆ λ]}Pˆ = 0

Now, the second term gives

Qˆ [Hˆ,Tˆ λ]Pˆ = Qˆ Hˆ(Pˆ + Qˆ )Tˆ λPˆ − Qˆ Tˆ λ(Pˆ + Qˆ )HPˆ = Qˆ (Qˆ HˆQˆ − ∆E)Qˆ Tˆ λPˆ

Therefore using the fact that Qˆ2 = Qˆ , we get

λ 1 [λ] Qˆ Tˆ Pˆ = Qˆ (∆E − Qˆ HˆQˆ )− Qˆ Hˆ Pˆ

and equation(236) can be writen as where λ d∆E ∆E = dλ λ=0

λ [λ] 1 [λ] ∆E Pˆ = PˆHˆ Pˆ + PˆHˆQˆ (∆E − Qˆ HˆQˆ )− Qˆ Hˆ Pˆ (241)

In the above equation, Tˆ λ has been eliminated. ˆ ˆ ˆ ˆ 1 ˆ We can now define effective resolvent operator R(λ) = Q[∆E(λ) − QH(λ)Q]− Q The above equation then becomes

∆EλPˆ = PˆHˆ [λ]Pˆ + PˆHRˆ Qˆ Hˆ [λ]Pˆ (242)

To enhance the fact that this equation is valid at λ = 0, let us write the above equation in following manner :

∆E(1)Pˆ = PˆHˆ [λ](0)Pˆ + PˆHˆ(0)R(0)Qˆ Hˆ [λ](0)Pˆ

Here we can see that PˆH(0)ˆ Rˆ (0)Qˆ is independent of λ (perturbation) at λ = 0 and does not contain the perturbation operator and therefore we define a new operator Λ as Λ(0) = PˆH(zeroˆ )Rˆ (0)Qˆ and therefore we get

∆EλPˆ = Pˆ(1 + Λ)Hˆ [λ]Pˆ (243)

125 which when integrated over λ gives

∆EPˆ = Pˆ(1 + Λ)HˆPˆ (244)

This is called fundamental CC energy functional which is independent of perturbation and if it is solved at λ = 0, Λ needs to be solved only once. The Λ operator satisfies few linear equations under stationary conditions which is going to be our next section.

4. Lambda equations

To exploit stationary requirements of a functional we are going to set the coefficients of dependent functions equal to zero so that all higher derivatives vanish and henceforth Helmann-Feynman theorem is still valid. We start with a functional defined as E(Λ,Tˆ) = PˆE(Λ,Tˆ)Pˆ = Pˆ(1 + Λ)HˆPˆ Hˆ Tˆ ˆ Tˆ where = e− HN e and Λ is a dexcitation operator and Tˆ is an excitation operator satisfying satisfying equations

ΛPˆ = 0, ΛQˆ = Λ, PˆTˆ = 0, Qˆ Tˆ = Qˆ

Now variation of the functional with respect to its argument will be given by

ˆ E ˆ ˆ ˆ Hˆ ˆ ˆ ˆ Tˆ ˆ Tˆ ˆ ˆ ˆ Hˆ ˆ ˆ P δ P = P δΛQ P + P (1 + ΛQ)δ(e− HN e )P = 0 + P (1 + ΛQ)[ ,δT ]P (245)

Since the functional is stationary with respect to its argument, we can set the coefficients of δΛ and δTˆ is zero satisfying CC amplitude equations. We can simplify the second term using CC amplitude equations PˆHˆPˆ = ∆EPˆ and Qˆ HˆPˆ = 0 i.e.

Pˆ(1 + ΛQˆ )[Hˆ,δTˆ]Pˆ = Pˆ(Hˆ + ΛQˆ Hˆ − ∆EΛ)Qδˆ TˆPˆ

The stationary condition of the functional with respect to Tˆ gives

Pˆ(Hˆ + ΛQˆ Hˆ − ∆EΛ)Qˆ = 0

To evaluate Λ, we can either use inversion of the operator from equation (32) which is a tedius precedure or we can exploit the stationary condition of the functional and get some linear equations that Λ satisfies.

126 Starting from above equation we get,

Pˆ(1 + ΛQˆ )(Hˆ − ∆E) = 0

Here an extra term ∆EPˆQˆ (which is equal to zero since P,ˆ Qˆ are orthonormal operators) has been added to derive the above steps. Now projection of Pˆ will produce CC energy functional again but projecting Qˆ to the right gives the Λˆ equations.

Pˆ(1 + ΛQˆ )(Hˆ − ∆E)Qˆ = 0

PˆHˆQˆ + PˆΛHˆQˆ − ∆EPˆΛQˆ = 0

The equation has energy dependence which can produce some disconnected terms. The energy dependence can be eliminated the following way.

PˆΛHˆQˆ = Pˆ[Λ,Hˆ]Qˆ + PˆHˆ(Pˆ + Qˆ )ΛQˆ

ˆ Hˆ ˆ ˆ ˆ ˆHˆ ˆ ˆ = P (Λ )CQ + ∆EP ΛQ + P QΛQ

Therefore the energy equation after eliminating energy dependence is given as

ˆHˆ ˆ ˆ Hˆ ˆ ˆHˆ ˆ ˆ P Q + P (Λ )CQ + P QΛQ = 0 (246)

ˆ ˆ Tˆ ˆ ˆ ˆ Tˆ ˆ ˆ ˆ Tˆ ˆ ˆ or, P (HN e )CQ + P (Λ(HN e )C)CQ + P (HN e )CQΛQ = 0 (247) where the last term is discunnected. For an arbitrary excited state, the above equation can be written in explicit form as

| ˆ Tˆ | ab...i | ˆ Tˆ | ab...i < 0 HN e φij... + < 0 Λ(HN e )C φij... C (248)

X | ˆ Tˆ | cd...i h cd...| | ab...i + < 0 HN e φkl... C φkl... Λ φij... = 0 k

127 | ˆ Tˆ | ai | ˆ Tˆ | ai < 0 HN e φi + < 0 Λ(HN e )C φi C (249) where the disconnected term doesn’t contribute to the Λ equation since there is no intermediate state between vacuum and singly excited state. For doubly excited state, Λ equation looks like

| ˆ Tˆ | abi | ˆ Tˆ | abi < 0 HN e φij + < 0 Λ(HN e )C φij C (250)

X | ˆ Tˆ | c i h c | | abi + < 0 HN e φk C φk Λ φij = 0 k=ij c=a,b Now once we know these linear equations, we can solve for Λ.

XI. TREATMENT OF EXCITED STATES

A. Excitation energies from TD-DFT

The poles of density-density response function give exact excitation energies. A finite response can be sustained by a system at its excitation frequencies even in the absence of

any external perturbation. Therefore, setting v1 = 0 in Eq. (205) one can write

3 3 n1(r,Ω) = d r0d r00 χs(r0,r00,Ω)fHxc(r0,r00,Ω)n1(r00,Ω) (251)

In the spin dependent formalism" X n (r,Ω) = d3r d3r χ (r,r ,Ω)f (r ,r ,Ω)n (r ,Ω) (252) 1σ 0 00 s,σσ 0 0 Hxc,σ 0σ 00 0 00 1σ 00 00 σ 0σ 00 "R If we pre-multiply with d3r f (r,r ,Ω) and using the notation 000 Hxc,σσ 0 000 Z g (r,Ω) = d3r f (r,r ,Ω)n (r ,Ω), (253) σσ 0 0 Hxc,σσ 0 0 1σ 0 0 we can write Eq. 252 as X g (r,Ω) = d3r d3r f (r,r ,Ω)χ (r ,r ,Ω)g (r ,Ω) (254) σσ 0 0 00 Hxc,σσ 0 0 s,σ 0σ 00 0 00 σ 00σ 00 00 σ 00σ 000 " The Kohn-Sham response function can be written as

X Φjkσ∗ (r)Φjkσ (r0) χ (r,r0,Ω) = δ α , (255) s,σσ 0 σσ 0 jkσ Ω − ω + iη jk jkσ

128 where

− αjkσ = fkσ fjσ (256)

Φjkσ (r) = φjσ∗ (r)φkσ (r) (257) − ωjkσ = εjσ εkσ (258)

X δ α σ 0σ 00 jkσ 0 3 3 gσσ (r,Ω) = d r0d r00 fHxc,σσ (r,r0,Ω)Φ∗ (r0)Φjkσ (r00)gσ σ (r00,Ω) 0 Ω − ω 0 jkσ 0 0 00 000 jkσ 0 jkσ 00σ 000 " (259) X α Z Z jkσ 0 3 3 = d r0 fHxc,σσ (r,r0,Ω)Φ∗ (r0) d r00Φjkσ (r00)gσ σ (r00,Ω) Ω − ω 0 jkσ 0 0 0 000 jkσ 0 jkσ 000 | {z }| {z } (260) P R Now multiplying Eq. 260 by d 3 r Φ ( r ) and using σ 0 jk0σ XZ H (Ω) = d3r Φ (r)g (r,Ω), (261) jkσ jkσ σσ 0 σ 0 3 3 K = d rd r0 Φ∗ (r)f (r,r0,Ω)Φ∗ (r0), (262) jkσ,j0k0σ 0 jkσ Hxc,σσ 0 j0k0σ 0 we can write " X α j0k0σ 0 3 3 Hjkσ (Ω) = d rd r0 Φ∗ (r)fHxc,σσ (r,r0,Ω)Φ∗ (r0)Hj k σ (263) Ω − ω jkσ 0 j0k0σ 0 0 0 0 j0k0σ 0 j0k0σ 0 | {z } " X αj k σ H (Ω) = 0 0 0 K H (264) jkσ Ω − ω jkσ,j0k0σ 0 j0k0σ 0 j0k0σ 0 j0k0σ 0 X H (Ω) = α K β (Ω) (265) jkσ j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 j0k0σ 0 Hj k σ β (Ω) = 0 0 0 (266) j0k0σ 0 Ω − ω j0k0σ 0 − Now multiplying and dividing Eq. 265 by Ω ωjkσ we can write X (Ω − ω )β = α K β (Ω), (267) jkσ jkσ j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 j0k0σ 0 X ω β (Ω) + α K β (Ω) = Ωβ (Ω) (268) jkσ jkσ j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 jkσ j0k0σ 0 X h i δ δ δ ω + α K β (Ω) = Ωβ (Ω) (269) j,j0 kk0 σσ 0 j0k0σ 0 j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 jkσ j0k0σ 0

129 Since indices j(j0) and k(k0) can take values such that if one runs over occupied orbitals then the other has to run over virtual orbitals. We will be denoting occupied orbitals by

i(i0) and virtual orbitals by a(a0). Therefore, Eq. 269 can be written as X h i δ δ δ ω + α K β (Ω) = Ωβ (Ω) (270) ij0 ak0 σσ 0 j0k0σ 0 j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 iaσ j0k0σ 0 X h i δ δ δ ω + α K β (Ω) = Ωβ (Ω) (271) aj0 ik0 σσ 0 j0k0σ 0 j0k0σ 0 jkσ,j0k0σ 0 j0k0σ 0 aiσ j0k0σ 0

X δ δ δ ω + α K β ii0 aa0 σσ 0 i0a0σ 0 i0a0σ 0 iaσ,i0a0σ 0 i0a0σ 0 i a σ 0 0X0 + δ δ δ ω + α K β ia0 ai0 σσ 0 a0i0σ 0 a0i0σ 0 iaσ,i0a0σ 0 a0i0σ 0 i a σ 0X0 0 = (δ δ δ ω − K )β + K β  (272) ii0 aa0 σσ 0 i0a0σ 0 iaσ,i0a0σ 0 i0a0σ 0 iaσ,i0a0σ 0 a0i0σ 0 i0a0σ 0 Therefore Eqs. 271 and 270 can be written as X (δ δ δ ω − K )β + K β  = Ωβ (273) ii0 aa0 σσ 0 i0a0σ 0 iaσ,i0a0σ 0 i0a0σ 0 iaσ,i0a0σ 0 a0i0σ 0 iaσ i a σ X0 0 0 −K β + (δ δ δ ω − K )β  = Ωβ (274) aiσ,i0a0σ 0 i0a0σ 0 aa0 ii0 σσ 0 a0i0σ 0 iaσ,a0i0σ 0 a0i0σ 0 aiσ i0a0σ 0 Now defining X = −β and Y = β and using the fact that ω = −ω we iaσ iaσ iaσ aiσ i0a0σ 0 a0i0σ 0 can rewrite Eqs. 273 and 274 as X (δ δ δ ω + K )X + K Y  = −ΩX (275) ii0 aa0 σσ 0 a0i0σ 0 iaσ,i0a0σ 0 i0a0σ 0 iaσ,i0a0σ 0 a0i0σ 0 iaσ i a σ 0 0X0 K X + (δ δ δ ω + K )Y  = ΩY (276) aiσ,i0a0σ 0 i0a0σ 0 aa0 ii0 σσ 0 a0i0σ 0 iaσ,a0i0σ 0 a0i0σ 0 aiσ i0a0σ 0 If Kohn-Sham orbitals are real then from Eq. 262 K = K . Therefore, we can iaσ,i0a0σ 0 aiσ,a0i0σ 0 write ! ! ! ! AB X −1 0 X = Ω (277) BA Y 0 1 Y where matrices A and B are called Hessians and have elements

A (Ω) = δ δ δ ω + K (Ω) (278) iaσ,i0a0σ 0 ii0 aa0 σσ 0 a0i0σ 0 iaσ,i0a0σ 0 B (Ω) = K (279) iaσ,i0a0σ 0 iaσ,i0a0σ 0

The Eq.277 is known as Casida Equation which can in principle be solved to get the exact − excitation energies Ωn. The solution of Eq. 277 also gives Ωn which is deexcitation energy. The excitation energy energy Ωn may represent single as well as multiple

130 excitations.One needs the exact Kohn-Sham orbitals and corresponding energy eigenvalues

along with the exchange correlation kernel fxc which is unknown in general, moreover, it will be an infinite dimensional problem and in practice one needs to approximate fxc and solve it for finite dimensions. One such approximation is Tamm-Dancoff approximation which ignores deexcitation processes. Another approximation called small matrix approximation (SMA) in which off-diagonal elements of the matrices A and B are neglected which may work well in special conditions.

B. Limitations of single-reference CC metods

The conventional, single-reference, coupled-cluster method is very effective for electronic states domminated by a single determinant, such as most molecular ground states near their equilibrium geometry. Such stateds are predominantly closed-shell singlet states, and CC calculations on them produce pure singlet wave functions. But even these states become dominated by more than one determinant when one or more bonds are stretched close to breaking, besides, most excited, ionized and electron-attached states are open-shell states, so that single-reference CC based on RHF orbitals is then not usually appropriate for the calculation of entire potential- energy surfaces. One solution to these problems is to resort to multireference methods. An effective alternative in many cases is provided by the equation-of-motion coupled-cluster (EOM-CC) method.

C. The equation-of-motion coupled-cluster method

The basic idea of EOM-CC is to start with a conventional CC calculation on some initial state, usually a vonveniently chosen closed-shell state, and obtain the desired target state by application of a CI-like linear operator acting on the initial state CC wave function. Althogh the calculations for the the two states must use the same set of nuclei in the same geometrical arrangement and the same set of spinorbitals defining a common Fermi state |0i, they need not have the same number of electrons. In the EOM-CC method we consider two Schrödinger-equation eigenstates simultaneously,

an initial state Ψ0 and a target state Ψk,

ˆ ˆ HΨ0 = E0Ψ0,HΨk = EkΨk. (280) The initial state is often referred as the reference state. The aim of the method is to determine the energy difference

− ωk = Ek E0 (281) If we use the normal-product form of the Hamiltonion, equations (280) become

131 ˆ HN Ψ0 = ∆E0Ψ0 (282) ˆ HN Ψk = ∆EkΨk (283) − − h | ˆ | i where ∆E0 = E0 Eref and ∆Ek Eref , with Eref = 0 H 0 . Then we have

− ωk = ∆Ek ∆E0 (284) The initial-state coupled-cluster wave function is represented by the action of an Tˆ | i exponential wave operator Ω0 = e on a single-determinant reference function 0 ,

| i | i Tˆ | i Ψ0 = Ω0 0 = e 0 (285) ˆ An operator Rk is used to generate the target state from the initial state,

| i ˆ | i Ψk = Rk Ψ0 (286) so that, using 285, the target-state Schrödinger equation refcl can be written in the form

ˆ ˆ Tˆ | i ˆ Tˆ | i HN Rke 0 = ∆EkRke 0 (287) In the EOM-CC case, if all possible excitations from the initial state are included we have

X ˆ a{ ˆ} Rk = r0 + ri aˆ†i + ... (288) i,a ˆ ˆ Since Rkis an excitation operator, it commutes with the CC cluster operator T and all its components. Tˆ ˆ ˆ Multiplying (287) on the left with e− and using the commutation between Rk and T ˆ ˆ and using the commuation between Rk and T , we get

H ˆ | i ˆ | i Rk 0 = ∆kRk 0 (289) H Tˆ ˆ Tˆ where = e− HN e ˆ | i H showing that Rk 0 is a right eighenfuction of with eigenvalue ∆Ek. And it has h |ˆ left eigenfunctions 0 Lk, with the same eigenvalues ∆Ek as the corresponding right ˆ | i eigenfunctions Rk 0 , satisfying

h |ˆ H h |ˆ 0 Lk = 0 Lk∆Ek (290) ˆ The operator Lk is a de-excitation operator.

132 X ˆ i {ˆ } Lk = l0 + la i†aˆ + ... (291) i,a and therefore safisfies

ˆ ˆ ˆ ˆ ˆ LkP = 0,Lk = LkQ (292)

For the initial state (k=0) we have Rˆ0 = 1,ˆ but Lˆ 0 , 1ˆ The two sets of eigenfunctions are biorthogonal and can be normalized to satisfy

h |ˆ ˆ | i 0 LkRl 0 = δkl (293) They provide a resolution of the identity,

X ˆ | ih |ˆ 1ˆ = Rk 0 0 Lk (294) k

Also, because Rˆ0 = 1ˆ we have

h |ˆ | i 0 Lk 0 = δk0 (295)

Since Rˆ0 = 1, the initial-state version of (289) is

H| i | i 0 = ∆E0 0 (296) ˆ Multiplying this equation on the left by Rk and substracting it from (289), we obtain the EOM-CC equation in the form

H ˆ | i − ˆ | i [ ,Rk] 0 = (∆Ek ∆E0)Rk 0 (297) or H ˆ | i ˆ | i ( Rk 0 )C = ωkRk 0 (298)

D. Multireference coupled-cluster methods

As in the case of quasidegenerate perturbation theory, multireference coupled-cluster (MRCC) theory is designed to deal with electronic states for which a zero-order description in terms of a single Slater determinant does not provide an adequate starting point for calculating the electron correlation effects. All multireference methods are based on the generalized Bloch equation (Lindrgen 1974)

− [Ω,Hˆ 0]Pˆ = Vˆ ΩPˆ ΩPˆVˆ ΩPˆ (299)

133 The projection operator Pˆ projects onto a model space spanned by a set of model

functions Φα, X X ˆ | ih | ˆ ˆ | ih | P = Φα Φα = Pα,Pα = Φα Φα (300) α α and Ω = ΩPˆ is the wave operator, which, when operating on the model space, produces the space spanned by the perturbated wave functions,

Ψα = ΩΦα (301)

By rearranging the terms in (299), and using Hˆ 0 + Vˆ = Hˆ , at the same time noting thatPˆΩ = Pˆ, the result is

Hˆ Ω = Ω(Hˆ 0Pˆ + Vˆ Ω) = Ω(Hˆ 0 + Vˆ )Ω (302) or

Hˆ Ω = ΩHˆ Ω (303) ˆ The functions Ψα, are not individually eigenfunctions of H but span the space of ˜ eigenfunctions Ψα for which the model space forms a zero-order approximation,

ˆ ˜ ˜ HΨα = EαΨα (304) Applying Pˆ from the left, we get the matrix eigenvalue equation

ˆ ˆ ˜ ˜ P HΩΨα = EαΨα (305) The operator

Hˆ ef f = PˆHˆ Ω (306) ˆ which operators entirely in P -space and whose eigenfunctions and eigenvalues are Φα and Eα, respectively, is called the effective Hamiltonion operator. With this notation, the generalized Bloch equation (303) can be written in the form

Hˆ Ω = ΩHˆ ef f (307)

In the Hilbert-space approach to MRCC theory assumes a separate Fermi-vacuum definition, and thus a separate partition of the spinorbitals into hole and particle states, for which model-space dererminant. The wave operator is separated into individual wave operators for the different model states,

134 X X Tˆ α ˆ Ω = Ωα = e Pα (308) α α Substituting the definition of the wave operator (308), the generalized Bloch equation (307) may be written in the form X X ˆ Tˆ β ˆ Tˆ β ˆ ˆ ef f ˆ He Pβ = e PβH P (309) β β Tˆ α ˆ Projection on the left with e and on the right with Pα we obtain X Tˆ α ˆ Tˆ α ˆ Tˆ α Tˆ β ˆ ˆ ef f ˆ e− He Pα = e− e PβH Pα (310) β h ab... | h ˆˆ ˆ | Applying an external-space determinant Φij... (α) = aˆ†ib†j...Φα on the left and | i the model function Φα on the right, we obtain equations for the external-excitation ab... ˆ α emplitudes tij... (α) contained in the operators T X h ab... | Tˆ α ˆ Tˆ α | i h ab... | Tˆ α Tˆ β | ih | ˆ ef f | i Φij... (α) e− He Φα = Φij... (α) e− e Φβ Φβ H Φα (311) β The matrix elements of the effective Hamiltonian Hˆ ef f appearing in this equation are obtained, using (306) and (308), as

ef f h | ˆ ef f | i h | ˆ | i h | ˆ Tˆ α | i Hβα = Φβ H Φα = Φβ HΩ Φα = Φβ He Φα (312)

We can use the CC effective Hamiltonian for model Φα,

α Tˆ α Tˆ α H = e− Heˆ (313) Then the equations for the external-excitation amplitudes take the form X h ab... |Hα| i h ab... | Tˆ α Tˆ β | i ef f Φij... (α) Φα = Φij... (α) e− e Φβ Hβα (314) β To evaluate the matrix element in (312), we note that

h | Tˆ α h | − ˆ α h | Φα e− = Φα (1 T + ...) = Φα (315) and therefore

ef f h | Tˆ α ˆ Tˆ α | i h |Hα| i Hαα = Φα e− He Φα = Φα Φα (316) Tˆ α Tˆ α We insert e e− = 1 and obtain

ef f h | Tˆ α Hα| i Hβα = Φβ e Φα (317)

135 Next, we consider the first factor in the sum on the r.h.s. of(314)

ab... h ab... | Tˆ α Tˆ β | i h ab... | Tˆ α Tˆ β | xy... i Sij.... (αβ) = Φij... (α) e− e Φβ = Φij... (α) e− e Φuv...(α) (318) Insering a resolution of the identity between the two exponentials, we obtain X ab... h ab... | Tˆ | ih | Tˆ β | xy... i Sij.... (αβ) = Φij... (α) e− ΦI ΦI e Φuv...(α) (319) I

The series expansions of the exponentials in this equation result in expressions, involving CI-like amplitudes, corresponding to linear combinations of Tˆ amplitudes and their products. Combing the matrix Hef f with the CI-like amplitudes, an eigenvalue function of the Hef f can be derived. The diagrammatic representation and the evaluation of this expansion are decribed in detail by Paldus, Li and Petraco (2004). Several applications of Hilbert-space SU-MRCCSD were discussed by Li and Paldus (2003c, 2004), who compared model spaces of different dimensions ith high-excitation single- reference CI.

XII. INTERMOLECULAR INTERACTIONS

Intermolecular interactions (forces) determine the structure and properties of clusters, nanostructures, and condensed phases including biosystems. The interaction energy of a cluster of N atoms or molecules (called monomers) with n electrons is defined in the following way. The time-independent Schrödinger equation in Born-Oppenheimer approximation can be written as

ˆ H(r1,...,rn;Q1,...,QN )Ψ (x1,...,xn;Q1,...,QN ) = Etot(Q1,...,QN )Ψ (x1,...,xn;Q1,...,QN ) with the usual notation for the electron coordinates and with the variable Qi = (Ri,ωi,ξi) denoting the set of coordinates needed to specify the geometry of ith monomer: the position of the center of mass, Ri, set of three Euler angles ωi defining the orientation of the monomer, and a set of internal monomer coordinates, ξi. The interaction energy is the defined as the difference between this quantity and the sum of monomer energies E (ξ ) i i X − Eint(Q1,...,QN ) = Etot(Q1,...,QN ) Ei(ξi). i Interaction energies defined in this way are sometimes called “vertical" interaction energies since the geometry of each monomer is the same as its geometry in the dimer. The interaction energy of an N-mer can be represented in the form of the following many-body expansion (assuming rigid monomers)

Eint[N] = Eint[2,N] + Eint[3,N] + ... + Eint[N,N],

136 where the term Eint[k,N] is called the k-body contribution to the N-mer energy. The two- body contribution is just the sum of interaction energies of all isolated monomer pairs, i.e., all dimers X Eint[2,N] = Eint(Qi,Qj)[2,2]. i

This definition applied to a trimer shows that the trimer three-body energy is

X3 − Eint(Q1,Q2,Q3)[3,3] = Eint[N] Eint(Qi,Qj)[2,2] i

The higher-rank terms are defined in an analogous way. Any of the electronic structure methods can be used to compute interaction energies from the definition given above (so-called supermolecular approach). However, since interaction energies are usually more than an order of magnitude smaller in absolute value than chemical-bond energies and at least four orders of magnitude smaller in absolute value than the total electronic energies of atoms or molecules, the most natural method for investigating these phenomena is to start from isolated monomers and treat the interactions as small perturbations of this system. Such an approach is called symmetry-adapted perturbation theory (SAPT). An early version of SAPT was introduced already in 1930s by Eisenschitz and London. A generally applicable SAPT was developed in late 1970s and 1980s.

A. Symmetry-adapted perturbation theory

The simplest perturbation theory of intermolecular interactions is just the standard Rayleigh-Schrödinger (RS) perturbation theory discussed earlier. For a dimer, we partition the total Hamiltonian as ˆ ˆ ˆ ˆ ˆ ˆ H = H0 + V = HA + HB + V ˆ ˆ where HX is the Hamiltonian of the isolated monomer X and V is the intermonomer interaction potential

XX ZαZβ XX Z XX Zβ XX 1 Vˆ = − α − + . R r r r α A β B αβ α A j B jα i A β B iβ i A j B ij ∈ ∈ ∈ ∈ ∈ ∈ ∈ ∈ The zeroth-order problem is − (Hˆ 0 E0)Φ0 = 0

137 A B where Φ0 = Φ Φ and E0 = EA + EB. We get the standard set of RS equations

Xn ˆ − (n) − ˆ (n 1) (k) (n k) (H0 E0)ΦRS = V ΦRS− + ERS ΦRS− k=1

(n) h | ˆ (n 1)i ERS = Φ0 V ΦRS− . It can be shown (a homework problem) that the first-order energy

(1) h | ˆ i ERS = Φ0 V Φ0 can be expressed as a Coulomb interaction of unperturbed charge densities of monomers, i.e., an electrostatic interaction. Therefore, this terms is usually called the electrostatic (1) energy and denoted as Eelst. The second-order energy can be written in the form of the usual spectral expansion

|h A B| A Bi|2 (2) X Φ Φ V Φ Φ E = 0 0 k l . RS EA + EB − EA − EB k+l,0 0 0 k l This energy consists of two physically distinct components, the induction energy

|h A B| A Bi|2 |h A B| A Bi|2 (2) X Φ Φ V Φ Φ X Φ Φ V Φ Φ E = 0 0 k 0 + 0 0 0 l ind EA − EA EB − EB k,0 0 k l,0 0 l and the dispersion energy

|h A B| A Bi|2 (2) XX Φ Φ V Φ Φ E = 0 0 k l . disp EA + EB − EA − EB k,0 l,0 0 0 k l In the induction energy expression, one can integrate in the first (second) sum over the coordinates of system B (A), obtaining in this way the electrostatic potential of monomer B (A) acting on system A (B). Thus, the induction energy is the response of a monomer to the electrostatic field of the interacting partner. The dispersion energy term is a pure quantum effect resulting from the correlation of electronic positions in system A with those in system B. The RS approach uses wave functions that are not globaly antisymmetric, they are antisymmetric only with respect to exchanges of electrons within monomer. We often say that the RS theory violates the Pauli’s exclusion principle. Despite of this, it was shown by performing numerical calculations for one- and two-electron monomers that the RS theory actually does converge to the correct ground-state energy. However, this is not true anymore if even one of the monomers includes three or more electrons. We will return to this subject later on. Even more serious problem is that the RS approach does not give the repulsive walls at short intermonomer distances, i.e., becomes

138 unphysical there. It is easiest to see this in interactions of rare-gas atoms where the electrostatic energy is very small (cf. classical electrostatic interactions of spherical charge distribution), so the interaction energy is dominated by the second-order term which we know is negative for the ground state. Despite problems at small separations, the RS method gives nearly exact energies at large separations, as will be discussed below, and is the basis for the multipole expansion of interaction energy, also discussed later. To solve this problem of the RS method, one has to antisymmetrize the wave functions.

We cannot use anymore the RS approach since already Φ0 has to be antisymmetrized, so that the zeroth-order equation does not hold. There are several ways of introducing antisymmetry constraint in a perturbative way, leading to the family of SAPT methods. One way to derive several variants of SAPT is to iterate the Bloch form of Schrödinger’s equation

h | i −  Ψ = Φ0 + Rˆ0 Φ0 Vˆ Ψ Vˆ Ψ (320) where X | ih | ˆ Φm Φm R0 = − (321) Em E0 m,0 is the same resolvent operator as used before. To derive Bloch’s equation, write Schrödinger’s equation as   Hˆ 0 + Vˆ Ψ = (E0 + ∆E)Ψ or

 −   −  Hˆ 0 E0 Ψ = ∆E Vˆ Ψ (322) − −| ih | and act from the left with Rˆ0. Since Rˆ0(H0 E0) = 1 Φ0 Φ0 and assuming intermediate | | normalization raketΦ0 Ψ = 1, we get the Bloch equation where ∆E = raketΦ0 Vˆ Ψ from h | multiplication of Eq. (322) by Φ0 . The Bloch equation can be iterated starting from replacing Ψ by Φ0. We get ˆ E − ˆ Ψn = Φ0 + R0 ( n V )Ψn 1 − with E h | ˆ i n = Φ0 V Ψn 1 . − Note that n is not the order of perturbation theory here. This set of equations is equivalent to RS perturbation theory in the sense that consecutive iterations reproduce higher and higher orders of this theory. However, since AˆΨ = Ψ , where Aˆ is the antisymmetrizer defined earlier, we can insert Aˆ in front of Ψ on the right-hand side of Eq. (320). After iterating, one get the following set of equations

ˆ h | ˆ G i − ˆ Fˆ Ψn = Φ0 + R0 ( Φ0 V Ψn 1 V ) Ψn 1 − − 139 hΦ |Vˆ Gˆ Ψ i E = 0 0 n 1 (323) n h |Gˆ −i Φ0 0Ψn 1 − ˆ ˆ where Fˆ , G and G0 can be A or 1. Particular choices of these operators lead to the following SAPT methods Fˆ Gˆ Gˆ 0 Ψ0 name Aˆ 1 1 Φ0 Symmetrized RS (SRS) Aˆ Aˆ 1 1 Φ0 Jeziorski-Kolos (JK) Aˆ Aˆ Aˆ Aˆ Φ0 Eisenschitz-London-Hirschfelder-van der Avoird (EL-HAV)

One may think that the EL-HAV method, applying the antisymmetrizer in all possible places, should work best. This is not the case since at large intermolecular separations this method in low order is not compatible with the RS approach (we omit the proof), and, as mentioned above, this approach is very accurate at such separation. The reason is that the exchange effects in interaction energies, i.e., effect resulting from the global antisymmetrization, decay exponentially with increasing intermonomer separation R, whereas the total interaction energy, as it will be shown below, decays as inverse powers of R. One can see the former from the zeroth-order antisymmetrized wave function for two interacting hydrogen atoms

sAˆ ± ˆ ± [1sA(r1)1sB(r2)] = (1 P12)1sA(r1)1sB(r2) = 1sA(r1)1sB(r2) 1sA(r2)1sB(r1)

where we use spin-free approach and therefore we consider two types of states which after multiplication by singlet and triplet spin functions will form antisymmetric wave

functions. The first term in the last part of the equation written above is the Φ0 of the RS theory and the second term is the exchange one. The latter term, when used in the bra of expression (323) will lead to integrals where electron 1 is on center A in the bra and on center B in the ket, and similarly for electron 2. The effect is that all such integrals are proportional to two-center overlap integrals, and such integrals have to decay exponentially since wave functions decay exponentially. In contrast to EL-HAV, the SRS and JK methods have correct asymptotics. This correctness is evident for SRS since the SRS wave functions corrections are the same as those of RS. Moreover, the two latter methods are identical in the first two orders. In practice, modern SAPT implementations always use SRS due to its simplicity. The derivation presented above assumed that one knows exact wave functions of monomers. In practice, it is possible only for the smallest atoms. Thus, a generally applicable SAPT theory has to use methods analogous to the MBPT methods discussed earlier. The zeroth-order approximation that can be computed accurately for very large systems is the Hartree-Fock level. One then simultaneously accounts for the intramonomer correlation energies, e.g., at the MP2 level or at the CCSD level, and the SAPT expansion effects.

140 We conclude this section by examining the spectrum of Li–H in order to understand why RS perturbation theory has to diverge for dimers containing one or two monomers with three or more electrons. The simplest example of such system is Li–H. The left column of the figure shows the spectrum of the unperturbed system, i.e., the energies Li H Ek + El . The right column shows the physical spectrum of LiH at the minimum separation of the dimer. The word physical indicates that the wave functions are completely antisymmetrized. The middle column shows the spectrum of LiH at the same R but in the space of functions that are antisymmetized only within Li. Since the Pauli exclusion principle does not apply to such states, one may have states well approximated by a wave function with three electrons occupying the 1s orbital: two electrons of Li and one electron coming from H. One can show (see a homework problem) that the lowest energy of such as system is much below the lowest physical energy and that the continuous spectrum starts below the lowest physical state. The physical state to appear in this spectrum by are “submerged" in the unphysical (sometimes called Pauli- forbidden) continuum. Since this means that the physical states are degenerate with this continuum, one cannot expect convergence.

FIG. 8. Spectrum of Li–H

.

B. Asymptotic expansion of interaction energy

When the distance between monomers becomes large, one can expand the interaction potential Vˆ in multipole series. This series is defined in most E&M textbooks. For the

141 electron repulsion term, we have

m X Xl K 1 ∞ < lAlB m m = Q (r )Q− (r ), − l +l +1 lA 1 lB 2 |r1 r2| m= l< R A B lA,lB=0 −

l< = min(lA,lB), where K is a combinatorial coefficients (l + l )! Km = (−1)lB A B lAlB − − 1/2 [(lA + m)!(lA m)!(lB + m)!(lB m)!] m and the solid harmonics are expressed through the standard spherical harmonics Ql (r) (called also 2lth-pole moment operator)

 4π 1/2 Qm(r) = − rlY m(rˆ). l 2l + 1 l One of homework problems shows that the first-order electrostatic energy can be written as (1) 3 3 Eelst = ρA(r1)v(r1,r2)ρB(r2)d r1d r2 where "

1 1 X Zβ 1 X Z 1 X ZαZβ v(r ,r ) = − − α + (324) 1 2 |r − r | N |r − R | N |r − R | N N |R − R | 1 2 A β 1 β B α 2 α A B α,β α β with the sums running over the nuclei of system A and B and Zγ ’s denoting the nuclear charges. Let’s apply the asymptotic expansion to the first term in this expression

1 r r 3r 3r ρA( 1)| − |ρB( 2)d 1d 2 = r1 r2 m X Xl K Z Z " ∞ < lAlB m 3 m 3 ρ (r )Q (r )d r ρ (r )Q− (r )d r l +l +1 A 1 lA 1 1 B 2 lB 2 2 m= l< R A B lA,lB=0 − The first (second) of the two integrals can be recognized as component of the multipole

moment of monomer A (B) of rank lA (lB). If the molecules are neutral and polar, the first nonvanishing moment is the dipole moment. For such systems the electrostatic energy decays as 1/R3. Similar derivations can be performed for the remaining terms in Eq. (324) and in higher orders of RS perturbation theory.

C. Intermolecular interactions in DFT

Density functional theory (DFT) is the most often used method in computational studies of matter. In the standard Kohn-Sham (KS) implementation, all electron cor-

142 relation effects are included in the exchange-correlation energy. The exact form of this energy is unknown, and a large number of approximate functionals have been constructed to describe it, as discussed earlier. While such functionals describe many properties of matter quite accurately, there are also several properties where all existing functionals fail, and one such example are intermolecular interactions which involve atoms or molecules separated by several angstroms or more. The local density approximation (LDA) obviously misses any interactions between distant regions. The semilocal generalized gradient approximations (GGA’s) still cannot describe long-range electron correlations due to the limited range of the exchange-correlation hole. The size of such a hole is of the order of 1 Å, so correlation interactions between regions separated by much more than this distance cannot be recovered. One can say that these methods are myopic with the range of vision of about 1 Å. Interaction energies given by most DFT methods can be brought to agreement with accurate interaction energies by adding a negative correction, which at very large R (for systems with no dipole and quadrupole moments) is simply the dispersion energy. For shorter R, the dispersion energy has to be tapered, differently for each DFT method. This observation led to a family of methods supplementing DFT interaction energies by a “dispersion" correction (computed for example as an atom-atom function fitted to results of calculations with wave function methods and properly tapered) referred to as DFT+D type methods. The DFT+D methods are reasonably successful, reproducing complete interaction energy curves with errors of the order of a few percent, but this approach is not anymore a first-principles one. There were also several so-called nonlocal density functionals created. These are first- principles approaches but at the present time are less accurate than DFT+D methods.

XIII. DIFFUSION MONTE CARLO

The diffusion Monte Carlo method provides a different way, than what we have seen so far in this course, to solve time-dependent Schrodinger equation of a system. Let’s consider a single particle m in a one-dimensional box. Transformation from real time to imaginary time can be done by making the following changes:

τ = it (325)

and V (x) → −V (x). Using this transformation the Schrodinger equation reads: ~ ~2 2 − − ∂τ ψ = /2m∂xψ [V (x) Er]ψ (326) One can solve this equation as: X − − ~ ψ(x,τ) = cnφn(x)exp[ (En ER)τ/ ] (327)

143 where φn(x) and En are the eigenstates and eigenvalues of the time-independent Schrodinger → ∞ equation, respectively. There are three possibilities for τ : (i) if ER > E0 the wavefunction diverges exponentially fast. (ii) if ER < E0 the wavefunction vanishes and (iii) if ER = E0 we get ψ(x,τ) = c0φ0(x). This behavior provides the basis of the DMC method: for ER = E0 the wavefunction ψ(x,τ) converges to the ground state φ0(x) regardless of the choice of initial wavefunction ψ(x,0) as long as there is an overlap

between the initial wavefunction and the ground state, namely as long as c0 , 0. Path integral method can be used to solve 327. Readers are referred to standard quantum mechanics text books to convince themselves that the following equation is true:

Z N 1 N ∞ Y− Y ψ(x,τ) = lim dxj W (xn)P (xn,xn 1)ψ(x0,0) (328) N − →∞ −∞ j=0 n=1 where the probability density P and the weight function W can be obtained as: √ ~ − − 2 ~ P (xn,xn 1) = m/2π ∆τ exp[ m(xn xn 1) /(2 ∆τ)] (329) − −

− − ~ W (xn) = exp[ (V (xn) ER)∆τ/ ] (330) R with ∆τ = τ/N. Note that ∞ P (x,y)dy = 1 and the exponential part of P is Gaussian√ −∞ ~ probability for the random variable xn with mean xn 1 and variance σ = ∆τ/m. − Equation 328 should be solved numerically and one may use the so-called Monte Carlo method. In this method an N-dimensional integral

Z N 1 ∞ Y− I = dxjf (x0,...,xN 1)P (x0,...,xN 1) (331) − − −∞ j=0 with P being probability density can be approximated as:

N X (i) (i) I = 1/N f (x0 ,...,xN 1) (332) − i=1,x(i) P ∈ (i) ∈ − xj P means i = 1,2,...,N;j = 0,1,...,N 1 are selected randomly with probability density P. It is worth mentioning that the larger N the better approximation for I.

While the Monte Carlo method is able to calculate ψ(x,τ), it is unable to find E0 and φ0(x0). An improvement over this method is called Diffusion Monte Carlo which will be explained here. The basic idea is to consider the wavefunction a probability density sampling the initial

wavefunction,ψ(x0,0), at N0 points. In fact, this method generates N0 Gaussian random walkers which evolve in time:

(i) (i) (i) xn = xn 1 + σρn (333) −

144 (i) (i) (i) where xn is generated by 329 with mean value xn 1 and variance σ. ρn is a Gaussian random number in the interval [0,1] with mean being− 0 and variance 1. It is obvious that this stochastic process looks exactly like Brownian diffusion process. The generated "random walkers" are called "particles" or "replicas" in the DMC method. Instead of tracing the motion of each particle, one follows the motion of whole ensemble of replicas. The integrand in 328 can be interpreted as:

W (xn)P (xn,xn 1)...W (x2)P (x2,x1)W (x1)P (x1,x0)ψ(x0,0) (334) − where ψ(x0,0), P (x1,x0), W (x1), ..., P (xn,xn 1) and W (xn) are process 0, process 1, process − 2, ..., process 2N − 1 and process 2N, respectively. Initial state: The 0th process describes particles distributed according to the initial − − wavefunction,ψ(x0,0), which is typically chosen as δ function (ψ(x0,0) = δ(x x0)). Diffusive displacement: The DMC algorithm produces x1 = x0 + σρ1, x2 = x1 + σρ2, etc. by generating random numbers ρn;n = 1,2,... Birth-death processes: After each time step, each particle is replaced by a number of replicas which is given by:

mn = min[int[W (xn)] + u,3] (335) where u is a random number which is uniformly distributed in [0,1]. If mn = 0 the particle is deleted and diffusion process is terminated (death). If mn = 1 there is no effect, the particle stays alive and the algorithm takes it to the next diffusion step. If mn = 2 the particle goes to the next diffusion step and another particle starts off a new series at the present location (birth). If mn = 3, the scenario is similar to the previous case but there are 2 newly born replicas starting off at the current location. Algorithm: Now, it is time to summarize the algorithmic steps of the DMC: (i) 1) One starts with N0 particles at positions x0 ,i = 1,2,..,N0 which are placed according to the distribution ψ(x0,0). It is more convenient to choose all replicas to start at the same point x0. 2) Rather than following the fate of each replica, one follows all replicas simultaneously: √ (j) (j) ~ (j) x1 = x0 + ∆τ/mρ1 ;j = 1,2,...,N0 (336)

This is regarded as one-step diffusion process of replicas. (j) (j) 3) Once the new position x1 is calculated, one evaluates W (x1 ) through 330 and from (j) (j) 335 one determines a set of integers m1 for j = 1,2,...,N0. Replicas with m1 = 0 are (j) (j) terminated. If m1 = 1 replicas are left unaffected. Replicas with m1 = 2,3 go to the next diffusion step, but 1,2 more replica(s) should be added to the system at the current position.

4) The number of replicas is counted and N1 is determined. 5) During the combined diffusion and birth-death processes, the distribution of replicas

145 (j) changes in such a way that the coordinate x1 now is distributed according to the probability density ψ(x,∆τ).

6) As a result of birth-death processes, the total number of replicas,N1, is now different from N0. One wants to have almost constant number of replicas during the calculations. Therefore, one can use a suitable choice of ER to fix the increased or decreased number of replicas. Note that for sufficiently small ∆τ 330 can be approximated as W (x)  − − ~ 1 (V (x) ER)∆τ/ . Now, averaging over all replicas: − − ~ < W >1 1 (< V >1 ER)∆τ/ (337)

PN1 (j) with < V >1= 1/N1 j=1 V (x1 ) One would like < W >1 to be eventually always unity. Therefore,

(1) ER =< V >1 (338)

(2) Er can be evaluated as (the proof is left for homework): (2) (1) ~ − ER = ER + /∆τ(1 N1/N0) (339)

The diffusive displacement, the birth-death processes and estimation of new ER are repeated until ER and distribution of replicas converge to stationary values. Now, the distribution of replicas can be interpreted as the ground state wavefunction and the ground state energy can be calculated as E0 = limn < V >n. →∞

XIV. DENSITY-MATRIX APPROACHES

The quantum state of a single particle thus far has been described by a wavefunction Ψ (x) in coordinate and spin space. In this section, we will consider an alternative representation of the quantum state, called the density matrix. The density matrix was originally introduced in quantum statistical mechanics to describe a system for which the state was incompletely specified. Although describing a quantum system with the density matrix is equivalent to using the wavefunction, it has been shown that density matrices are more practical for certain time-dependent problems. The general N-order density matrix is formally defined as

≡ γN (x10 x20 ...xN0 ,x1x2 ...xN ) ΨN (x10 ,x20 ,...,xN0 )ΨN∗ (x1,x2,...,xN ) (340) { } where xi = ri,s denotes spatial and spin coordinates. Note that the density matrix { } { } contains two sets of independent quantities, xi0 and xi , that gives γN a numerical value. Equivalently, Eq. 340 can be viewed as the coordinate representation of the density operator, | ih | γˆN = ΨN ΨN (341)

146 since h | | i h | ih | i x10 x20 ...xN0 γˆN x1x2 ...xN = x10 x20 ...xN0 ΨN ΨN x1x2 ...xN | i Note that γˆN can also be thought of as the projection operator onto the state ΨN . We then have for normalized ΨN , Z N N N Tr(γˆN ) = ΨN∗ (x )ΨN (x )dx = 1

N { }N ˆ where x stands for the set xi i=1. The trace of an operator A is defined as the sum of diagonal elements of the matrix representing Aˆ, or the integral if the representation is continuous as above. It can also be verified that h ˆi ˆ ˆ A = Tr(γˆN A) = Tr(AγˆN )

From this, the density operator γˆN can be seen to carry the same information as the N- | i | i electron wave function ΨN . Note that while Ψ is defined only up to an arbitrary phase factor, γˆN for a state is unique. γˆN is also positive semidefinite and Hermitian. The state of the system is said to be pure if it can be described by a wavefunction, and mixed if it cannot. A system in a mixed state can be characterized by a probability distribution over all accessible pure states. We can think of γN as an element of a matrix (density matrix); if we set xi = xi0 for all i, we get the diagonal elements of the density matrix, ≡ | |2 γN (x1x2 ...xN ) ΨN∗ (x1,x2,...,xN )ΨN (x1,x2,...,xN ) = ΨN (x1,x2,...,xN ) which is the N-order density matrix for a pure state. Note that this is also the probability distribution associated with a solution of the Schrödinger equation. We can express the Schrödinger equation in density-matrix formalism by taking the time derivative of the density operator and using Hermiticity and commutation relations, ! ! ∂ ∂ ∂ γˆ = |Ψ i hΨ | + |Ψ i hΨ | ∂t N ∂t N N N ∂t N ! ! ∂ Hˆ Hˆ γˆ = |Ψ i hΨ | − |Ψ i hΨ | ∂t N i~ N N N i~ N ∂ h i i~ γˆ = H,ˆ γˆ (342) ∂t N N This equation describes how the density operator evolves in time. We can generalize the density operator γˆN to the ensemble density operator X ˆ | ih | Γ = pi Ψi Ψi (343) i | i where pi is the probability of the system being found in the state Ψi , and the sum is over the complete set of all accessible pure states. pi has the following properties since it is a probability: X ≥ pi 0, pi = 1 i

147 We can then rewrite Eq. 342 in terms of the ensemble density matrix to obtain ∂ h i i~ Γˆ = H,ˆ Γˆ (344) ∂t which is true if Γˆ only involves states with the same number of particles, as is true in the canonical ensemble. This equation is also known as the von Neumann equation, the quantum mechanical analog of the Liouville equation. For stationary states, Γˆ is independent of time, which means that h i H,ˆ Γˆ = 0 which implies that Hˆ and Γˆ share the same eigenvectors. Work done in statistical mechanics deal heavily with systems at thermal equilibrium, where the density matrix is characterized by thermally distributed populations in the quantum states

e βHˆ ρˆ = − Z

where β = 1/kBT , kB is the Boltzmann constant, and Z is the partition function

βHˆ Z = Tr(e− )

In this language, one can express a thermally averaged expectation value as Tr(Ωˆ ρˆ) hΩˆ i = Z With a mixed state, we have less than perfect knowledge of what the quantum state is. We can describe how much less information there is by defining the entropy as − S = kBTr[ρ lnρ] The basic Hamiltonian operator, Eq. 19, is a sum of two symmetric one-electron operators and a symmetric two-electron operator, neither depending on spin. Along with the fact that the wavefunctions ΨN are antisymmetric , the expectation values of the density operator can be systematically simplified by integrating the probability densities over N − 2 of its variables, giving rise to concepts of reduced density matrix and spinless density matrix.

A. Reduced density matrices

The reduced density matrix of order p is defined as

γp(x10 x20 ...xp0 ,x1x2 ...xp) = !Z Z N ··· γ (x0 x0 ...x0 x ...x ,x x ...x ...x )dx ...dx (345) p N 1 2 p p+1 N 1 2 p N p+1 N

148 N where p is a binomial coefficient, and γN is defined as Eq. 340. This is also known as taking the partial trace of the density matrix. For example, the first-order density matrix

γ1 is defined as Z Z γ1(x10 ,x1) = N ... Ψ ∗(x10 x2 ...xN )Ψ (x1x2 ...xN )dx2 ...xN (346) and normalizes to Z Tr γ1(x10 ,x1) = γ1(x1,x1)dx1 = N

Similarly, the second-order density matrix γ2 is defined as Z Z N(N − 1) γ (x x ,x x ) = ··· Ψ (x x x ...x )Ψ (x x x ...x )dx ...dx (347) 2 10 20 1 2 2 ∗ 10 20 3 N 1 2 3 N 3 N and normalizes to the number of electron pairs ZZ N(N − 1) Tr γ (x x ,x x ) = γ (x x ,x x )dx dx = 2 10 20 1 2 2 1 2 1 2 1 2 2

The reduced density matrices γ1 and γ2 just defined are coordinate-space representations of operators γˆ1 and γˆ2, acting on the one- and two-particle Hilbert spaces, respectively. We can express the one-particle operator in terms of its eigenvalues and eigenvectors X | ih | γˆ1 = ni ψi ψi i | i where the eigenvalues ni are the occupation numbers and the eigenvectors ψi are the natural spin orbitals. Similarly, the two-particle operator can be expressed as X | ih | γˆ2 = gi θi θi i | i where the eigenvalues gi are the occupation numbers and the eigenvectors θi are called ≥ ≥ natural geminals. It also follows that ni 0 and gi 0. Comparing these two operators with Eq. 343, we can see that ni is proportional to the probability of the one-electron | i state ψi being occupied and gi is proportional to the probability of the two-electron | i state θi being occupied. Now let us consider the expectation values of one- and two-electron operators with an antisymmetric N-body wavefunction Ψ . For a one-electron operator

XN ˆ O1 = O1(xi,xi0) i=1

we have Z h ˆ i ˆ O1 = Tr(O1γN ) = O1(x1,x10 )γ1(x10 ,x1)dx1dx10 (348)

149 − If the one-electron operator is local, i.e. O1(r0,r) = O1(r)δ(r0 r), we can conventionally write down only the diagonal part; thus Z hOˆ i = Tr(Oˆ γ ) = [O (x )γ (x ,x )] dx 1 1 N 1 1 1 10 1 x10 =x1 1

Similarly, if the two-electron operator is local, we have

XN ˆ O2 = O2(xi,xj) i

and the corresponding expectation value ZZ hOˆ i = Tr(Oˆ γ ) = [O (x ,x )γ (x ,x ,x ,x )] dx dx 2 2 N 2 1 2 2 10 20 1 2 x10 =x1,x20 =x2 1 2

We thus obtain for the expectation value of the Hamiltonian, Eq. 19, in terms of density matrices

ˆ E = Tr(HγˆN ) = E[γ1,γ2] Z ZZ  1   1 − ∇2 = 1 + v(r1) γ1(x10 ,x1) dx1 + | − |γ2(x1x2,x1x2)dx1dx2 (349) 2 x10 =x1 r1 r2 We can further simplify this result by integrating over the spin variables.

B. Spinless density matrices

The first-order and second-order spinless density matrices are defined by Z ρ1(r10 ,r1) = γ1(r10 s1,r1s1)ds1 Z Z ··· = N Ψ ∗(r10 s1x2 ...xN )Ψ (r1s1x2 ...xN )ds1dx2 ...dxN (350)

and Z ρ2(r10 r20 ,r1r2) = γ2(r10 s1r20 s2,r1s1r2s2)ds1ds2 Z Z N(N − 1) = ··· Ψ (r s r s x ...x )Ψ (r s r s x ...x )ds ds dx ...dx (351) 2 ∗ 10 1 20 2 3 N 1 1 2 2 3 N 1 2 3 N

We can introduce a shorthand notation for the diagonal elements of ρ1, Z Z ··· | |2 ρ1(r1) = ρ1(r1,r1) = N Ψ ds1dx2 ...xN

150 and similarly for ρ2, Z Z N(N − 1) ρ (r ,r ) = ρ (r r ,r ,r ) = ··· |Ψ |2ds ds dx ...dx 2 1 2 2 1 2 1 2 2 1 2 3 N Also note that from the above definitions, we can express the first-order density matrix in terms of the second-order density matrix Z 2 ρ(r0 ,r ) = ρ (r0 r ,r r )dr 1 1 N − 1 2 1 2 1 2 2 Z 2 ρ(r ) = ρ (r ,r )dr 1 N − 1 2 1 2 2 The expectation value of the Hamiltonian, Eq. 349, in terms of density matrices now becomes

E = E[ρ1(r10 ,r1),ρ2(r1,r2)] Z Z ZZ  1  1 − ∇2 = ρ1(r0,r) dr + v(r)ρ(r) + | − |ρ2(r1,r2)dr1dr2 (352) 2 r0=r r1 r2 where the three terms represent the electronic kinetic energy, the nuclear- electron potential energy, and the electron-electron potential energy, respectively. Note that since we can express the first-order density matrix in terms of the second-order, only the second-order density matrix is needed for the expectation value of the Hamiltonian.

C. N-representability

From Eq. 349, one may hope to minimize the energy with respect to the density matrices, thus avoiding having to work with the 4N- dimensional wavefunction. Since only the second-order density matrix is needed for the energy minimization, the trial γ2 must correspond to some antisymmetric wavefunction Ψ ; i.e. for any guessed second- order density matrix γ2 there must be a Ψ from which it comes via its definition, Eq. 347. This is the N-representability problem for the second-order density matrix. For a trial wavefunction to be N-representable, it must correspond to some antisymmetric wavefunction from which it comes via Eq. 345. It’s a difficult task to obtain the necessary and sufficient conditions for a reduced density matrix γ2 to be derivable from an antisymmetric wavefunction Ψ . Instead, it may be easier to solve the ensemble N- representability problem for Γ2, where Γp is the p-th order mixed state (ensemble) density matrix defined as

Γp(x10 x20 ...xp0 ,x1x2 ...xp) = !Z Z N ··· Γ (x0 x0 ...x0 x ...x ,x x ...x ...x )dx ...dx (353) p N 1 2 p p+1 N 1 2 p N p+1 N

151 Since ˆ 0 ≤ ˆ ˆ E0 = Tr(HΓN ) Tr(HΓN ) it is completely legitimate to enlarge the class of trial density operators for an N-electron problem from a pure-state set to the set of positive unit-trace density operators made up from N-electron states. This minimization leads to the N-electron ground state energy

and the ground state γˆN if it is not degenerate, or an arbitrary linear combination γˆN (convex sum) of all degenerate ground states if it is degenerate. Thus the minimization ˆ in Eq. 352 can be done over ensemble N-representable Γ2. For a given Γ1, X ˆ | ih | Γ1 = ni ψi ψi i the necessary and sufficient conditions for it to be N-representable are that

≤ ≤ 0 ni 1 (354) ˆ for all of the eigenvalues of Γ1. This conforms nicely with the Pauli exclusion principle. Let us now prove this theorem of the necessary and sufficient conditions for the first- order density matrix to be N-representable. The necessary conditions for Γ1 and Γ2 are such that they satisfy Eq. 353 for a proper ΓN . The sufficient conditions are those that guarantee the existence of a ΓN that reduces to Γ1 or Γ2. The set of Γ1 or Γ2 that simultaneously satisfies both necessary and sufficient conditions is called the set of N { } { } -representable Γ1 or Γ2. If the energy is minimized over sets Γ1 and Γ2 satisfying only necessary conditions, an energy lower than the true energy can be obtained (lower bound). If it is minimized over sets satisfying only sufficient conditions, an energy higher than the true energy is obtained (upper bound). If one minimizes the energy over all sets satisfying the sufficient conditions, the ground-state energy is obtained. The necessary conditions on γ1 and γ2 imposed by N-representability are also called | i | i Pauli conditions and are as follows. If ψi is some normalized spinorbital state and ψiψj × is a normalized 2 2 Slater determinant built from orthonormal ψi and ψj, then ≤ h | | i ≤ 0 ψi γˆ1 ψi 1 (355) ≤ h | | i ≤ 0 ψiψj γˆ2 ψiψj 1 (356) In the coordinate representation, they can be written as ZZ ≤ ≤ 0 dx1dx10 ψi∗(x10 )γ1(x10 ,x1)ψi(x1) 1 (357) ZZZZ 1 h i 0 ≤ dx dx dx dx |ψ (x )ψ (x )|γ (x x ,x x )|ψ (x )ψ (x )| ≤ 1 (358) 2 1 10 2 20 i∗ 10 j∗ 10 2 10 20 1 2 i∗ 1 j∗ 2

Eq. 355 is equivalent to the requirement that the eigenvalues of γˆ1 are given by Eq. 354, whereas Eq. 356 is not equivalent to the eigenvalues of γˆ2 since the eigenfunctions in

152 × general are not 2 2 Slater determinants. Let us prove this relation for γˆ1. We start by introducing the field creation and annihilation operators, which create and annihilate one-particle states that are the eigenfunction in coordinate space X X ˆ ˆ ψ†(x) = ψi∗(x)aˆi† ψ(x) = ψi(x)aˆi i i

where Z Z ˆ ˆ aˆi† = dxψi(x)ψ†(x) aˆi = dxψi∗(x)ψ(x)

Let us consider an arbitrary single-particle operator Aˆ(i) and do a spectral decomposition X X ˆ | ih | ˆ| ih | | ih | A(i) = α α A β β = Aαβ α β αβ αβ

The simplest N-particle operation can be constructed as

ˆ ˆ ˆ ··· ˆ AN = A(1) + A(2) + + A(N)

where Aˆ(i) acts on the states of the ith particle, X X ˆ | ··· ··· i h | ˆ| i| ··· ··· i | ··· ··· i A(i) α1 αi αN = βi A αi α1 βi αN = Aβi αi α1 βi αN βi βi giving XN X ˆ | ··· ··· i | ··· ··· i AN α1 αi αN = Aβi αi α1 βi αN i=1 βi We can instead write Aˆ in terms of creation and annihilation operators X Z ˆ ˆ ˆ A = Aαβaˆα† aˆβ = dx1dx10 A(x1,x10 )ψ†(x1)ψ(x10 ) αβ

The expectation energy of Aˆ is then obtained from Z h ˆi h | ˆ| i h | ˆ ˆ | i A = Ψ A Ψ = dx1dx10 A(x1,x10 ) Ψ ψ†(x1)ψ(x10 ) Ψ

ˆ ˆ if we now compare this equation to Eq. 348, and let A = O1, we find that the first-order density matrix can be written as

h | ˆ ˆ | i γ1(x10 ,x1) = Ψ ψ†(x1)ψ(x10 ) Ψ

We can now rewrite Eq. 357 as ZZ ≤ h | ˆ ˆ | i ≤ 0 dx1dx10 ψi∗(x10 ) Ψ ψ†(x1)ψ(x10 ) Ψ ψi(x1) 1

153 ZZ X X ≤ h | | i ≤ 0 dx1dx10 ψi∗(x10 ) Ψ ψl∗(x1)aˆl† ψj(x10 )aˆj Ψ ψi(x1) 1 l j which can be reduced down to ≤ h | | i ≤ 0 Ψ aˆi†aˆi Ψ 1 ˆ where aˆi†aˆi = Ni is just the number operator, which generates the occupation number for ˆ the ith orbital. Since Ni is a projection operator, the expectation value of it is always nonnegative and less than or equal to 1. Hence the Pauli condition for the first order density matrix, Eq. 355 is satisfied.

The sufficient conditions require that the there must exist a ΓN that reduces to Γ1 and Γ2. The proof for Γ1 is as follows. First we need a simple lemma about vectors and convex sets. A set is convex if an arbitrary positively weighted average of any two elements in the set also belongs to the set. Next we define an extreme element of a convex set as

an element E such that E = p1Y1 + p2Y2 implies that Y1 and Y2 are both multiples of E. L Then the lemma states that the set of vectors v = (v1,v2,...) in a space of arbitrary but ≤ ≤ P fixed dimension with 0 vi 1 and vi = N is convex and its extreme components are the vectors with N components equal to 1 and all other components equal to 0. Given ˆ this lemma, it is clear that any γˆ1 or Γ1 satisfying Eq. 354 is an element of a convex 0 0 set whose extreme elements are those γ1 or Γ1 that have N eigenvalues equal to 1 and 0 ˆ 0 the rest equal to 0. Each of these γˆ1 or Γ1 determines up to a phase a determinantal 0 N-electron wavefunction and a unique corresponding pure-state density operator γˆN . 0 Some positively weighted sum of these γN will be the ΓN that reduces to the given γˆ1 or ˆ Γ1 through Eq. 353. Hence, sufficiency is proved.

D. Density matrix functional theory

We have studied DFT in depth in an earlier section, where we started from a variational principle that had the electron density ρ(r) as the basic variable. Using the concept of density matrices, we can construct a corresponding density-matrix-functional theory

(DMFT), in which the basic variable is the first-order reduced density matrix γ1(x0,x), or the first-order spinless density matrix ρ1(r0,r). The main advantage of DMFT over DFT is that the kinetic energy as a functional of the density matrix is known, therefore there is no need to introduce an auxiliary system. The unknown exchange- correlation part only has to describe only has to describe the electron- electron interactions, whereas in DFT, the kinetic energy part was also included. As we know, the first-order density matrix determines the density Z

ρ(r) = ρ1(r,r) = γ1(x0,x) ds x=x0 and by the Hohenberg-Kohn theorem, it determines all properties including the energy,

E = E[γ1] = E[ρ1]

154 The explicit form of the functional is given via a constrained search, where the search can be over all trial density operators

ˆ E[γ1] = min Tr(γˆN H) γ γ N → 1 The variational principle for DMFT can be written as

{ − } δ E[γ1] µN[γ1] = 0 (359)

where µ is the chemical potential. This variational principle stands for

E0 = minE[γ1] (360) γ1

If we parametrize γ1 in terms of natural spinorbitals ψi and occupation numbers ni, we obtain ! ∂E µ = for all i ∂n i ψi ,ni ,nj assuming that

0 < ni < 1 (361)

If there are any natural orbitals among the complete set of natural orbitals for which Eq. 361 is not true, then the above assertion for µ is not true, for such an orbital with ≤ ni = 1 has ∂E/∂ni µ. It is a conjecture that Eq. 361 holds for all orbitals in an atom or molecule. If the variation in Eq. 359 is chosen for the natural orbitals themselves with orthonormalization conditions imposed, the result is a set of coupled differential equations for the natural orbitals, which are very different from the KS equations derived previously. From Eq. 349, the first-order density matrix determines all components of the energy explicitly except for Vee[γ], constrained search in Eq. 360 may be restricted to searching for the γN that minimizes Vee,

ˆ Vee[γ1] = min Tr(γˆN Vee) γ γ N → 1 → → or since γN γ2 γ1, we could write ZZ 1 Vee[γ1] = min γ2(x1x2,x1x2)dx1dx2 γ2 γ1 r → 12 where the trial γ2 would have to satisfy the N-representability conditions outlined in the previous section.

155 E. Contracted Schrödinger equation

The many-body Hamiltonian operator is the sum of 1 and 2 electron operators, which is why the energy of an N-electron system can be expressed as a functional of the 2- RDM, which depends on the variables of electrons. This gives us a method of studying the structure of electronic systems by determining the 2-RDM instead of the N-electron wavefunction. The main question is whether the Schrödinger equation can be mapped into the 2 electron space; and if it is capable, what would be the properties of the resulting equations? There were two approaches to answering this question: one method is to integrate the Schrödinger equation obtained in first quantization, resulting in what’s called the density equation; the second method is to apply a contracting mapping to the matrix representation of the Schrödinger equation, obtaining what’s known as the Contracted Schrödinger equation (CSE). It was found that although the two equations looked very different, they are in fact equivalent. In this section, we will put focus on the contracted Schrödinger equation. We begin with an introduction to the notation. The many-body Hamiltonian can be written as X 1 X H = h a a + hij|klia a a a ij i† j 2 i† j† l k ij ijkl where the 1-electron basis is assumed to be finite and formed by 2 K orthogonal spinorbitals. We can rewrite it as 1 X H = K a a a a (362) 2 ijkl i† j† l k ijkl where the elements of the two-particle reduced Hamiltonian matrix are given by 1 K = (h δ + h δ ) + hij|kli ijkl N − 1 ik jl jl ik is the reduced Hamiltonian, which has the same properties of the 2-electron matrix. In second quantization formalism, the p-order reduced density matrix (p-RDM) can be written as 1 p ΨΨ 0 h | | i D = Ψ a† a† ...a† aj ...aj aj Ψ 0 (363) i1,i2...,ip,j1,j2,...,jp p! i1 i2 ip p 2 1

When Ψ , Ψ 0, we have an expression defining an element of the p-order transition reduced density matrix (p-TRDM). We will work with the case of pure states with Ψ = Ψ 0,

p Ψ 1 h | | i D = Ψ a† a† ...a† aj ...aj aj Ψ (364) i1,i2...,ip,j1,j2,...,jp p! i1 i2 ip p 2 1 where the complementary matrix to the p-RDM is the p-order holes reduced density matrix (pHRDM),

p ¯ Ψ 1 h | | i D = Ψ aj ...aj aj a† a† ...a† Ψ (365) i1,i2...,ip,j1,j2,...,jp p! p 2 1 i1 i2 ip

156 where hole implies that Ψ itself is the reference state. This second quantization formalism for the density matrix is equivalent to the expressions in the previous sections. The equivalence is shown using the field creation and annihilation operators formulated in section C. Integrating the N-electron density matrix,

N D = Ψ (1,2...N)Ψ (1 ,2 ...N ) 1,2...N;10,20...N 0 ∗ 0 0 0

over coordinates (p + 1) to N defines the p-RDM Z p Ψ D = Ψ (1,2...N)Ψ (10,20 ...p0 ...N)d(p + 1)...dN 1,2...p,1020...p0

→ where we have changed notation for convenience (i1,i2 ...ip,j1,j2 ...jp) (1,2...p,1020 ...p0). Let us begin with a quantum system of N fermions characterized by the Schrödinger equation (SE) | i | i H ψn = En ψn

where the wavefunction ψn depends on the coordinates for the N particles. We will use second quantization to derive the contracted Schrödinger equation, emphasizing the use of of test functions for contracting the SE onto a lower particle space. Nakatsuji’s theorem tells us that there is a one-to-one mapping between the N-representable RDM solutions of the CSE and the wavefunction solutions of the SE. The proof will be covered as homework. Because the N-particle Hamiltonian contains only two-electron excitations, the expectation value of H yields an energy

1 X X ψ E = K hψ|a a a a |ψi = K 2D 2 ijkl i† j† l k ijkl ij,kl ijkl ijkl

where ψ 1 2D = hψ|a a a a |ψi ij,kl 2 i† j† l k Next define functions to test the two-electron space

h ij| h | Φkl = ψ ai†aj†alak

If we take the inner product of the test functions with the SE, we obtain

h | i h | | i 2 ψ ai†aj†alakHψ = E ψ ai†aj†alak ψ = 2E Dij,kl

If we substitute the Hamiltonian, Eq. 362 into the above, obtaining

1 X K hψ|a a a a a a a a |ψi = 2E 2D 2 pqst i† j† l k p† q† t s ij,kl pqst

157 Rearranging creation and annihilation operators to produce RDMs, we generate the 2,4-CSE

  X  X K2DΨ + 3 K 3DΨ + K 3DΨ + 6 K 4DΨ = E 2DΨ ij,kl pq,it pqj,ktl pq,jt pqi,ltk pq,st pqij,stkl ij,kl pqt pqst where the first term comes from considering two pairs of indices being the same (this removes two pairs of operators, leaving us with the 2-RDM). The second term comes from considering one pair of indices being the same (this removes one pair of operators, leaving us with the two 3-RDM terms). The factor of 3 in front comes from the different ways the operators can be rearranged. And the last term comes from considering no indices being the same (this allows us to commute the operators so all creation operators are on the left and all annihilation operators are on the right). The factor of 6 comes from the possible arrangements of the indices. The underlines in the first term indicate matrices. We can see that this depends on the 3-RDM and 4-RDM, causing it to be indeterminate. If we knew how to build higher ordered RDM from the 2-RDM, this equation would allow us to solve iteratively for the 2-RDM. This class of problems is called the reconstruction problem, and two approaches have been explored. One approach is to explicitly represent the 3-RDM and 4-RDM as functionals of the 2-RDM, and the other method is to construct a family of higher 4-RDMs from the 2-RDM by imposing ensemble representability conditions. Besides this difficulty, we also have to ask whether the solutions of this equation will coincide with those of the SE. The derivation for the 2-CSE shows that the SE implies the 2-CSE, but does the inverse hold true? The answer is yes; we must show that the p-CSE for p ≥ 2 is equivalent to the SE by stating and proving the following theorem. Theorem [Nakatsuji]: If the RDM’s are N-representable, then the p -CSE is satisfied by the p-, (p + 1)-, and (p + 2)-RDM if and only if the N-DM satisfies the Schrödinger equation. The proof of this theorem will be a homework problem. First, the SE is satisfied if and only if the following dispersion relation is satisfied.

$$\langle\Psi|H^2|\Psi\rangle - \langle\Psi|H|\Psi\rangle^2 = 0$$

Therefore, we must prove that the 2-CSE in second-quantized form satisfies this dispersion relation. Since for p > 2 the p-CSE implies the 2-CSE, the demonstration is also valid for the higher-order equations. Note that this is not valid for the 1-CSE, since the Hamiltonian includes 2-electron terms. One important consequence of the equivalence of the 2-CSE and higher-order CSEs with the SE is that the CSEs may be applied to the study of excited states.

XV. DENSITY MATRIX RENORMALIZATION GROUP (DMRG)

The Density Matrix Renormalization Group (DMRG) is an approximation method which was first conceived by Steven White in 1992 as a way to handle strongly correlated quantum lattices. In the context of molecular physics, the renormalization group structure is often ignored, and DMRG is viewed as a special wave-function ansatz. We will start by writing the wave function as
$$|\Psi\rangle = |\Psi_{\mathrm{HF}}\rangle + |\Psi_{\mathrm{corr}}\rangle \qquad (366)$$

$$\langle\Psi_{\mathrm{HF}}|\Psi\rangle = 1, \qquad (367)$$
where $|\Psi_{\mathrm{HF}}\rangle$ is the Hartree-Fock wave function and $|\Psi_{\mathrm{corr}}\rangle$ is the correlation part. In most methods it is assumed that $|\Psi_{\mathrm{corr}}\rangle$ is small compared to the exact wave function. Issues arise when the coefficients in the expansion of $|\Psi_{\mathrm{corr}}\rangle$ are of the order of unity or larger. The primary challenge of strongly correlated systems is that a large number of determinants contribute significantly to the wave function. The goal of DMRG is to overcome this complexity by encoding the idea of locality into a wave function which may include all possible determinants. In other words, most of the quantum phase space is not explored by physical ground states, which makes the strongly correlated problem far more tractable. Currently DMRG is able to describe about 40 electrons in 40 orbitals for compact molecules, and in elongated molecules it can describe about 100 electrons in 100 orbitals.

A. Singular value decomposition

When discussing DMRG, singular value decomposition (SVD) proves to be an indispensable tool for breaking down the coefficient tensor of a strongly correlated wave function. SVD decomposes any $M\times N$ matrix $\psi$ ($M \ge N$) as

$$\psi = UDV^T, \qquad (368)$$
where $U$ is an $M\times M$ column-orthogonal matrix,

$$U^T U = I_{M\times M}, \qquad (369)$$
$V$ is a square $N\times N$ column-orthogonal matrix,

$$V^T V = I_{N\times N}, \qquad (370)$$
and $D$ is a diagonal matrix whose elements $\sigma_1,\cdots,\sigma_N$ are arranged in decreasing order.
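As a concrete illustration (not part of the original notes), these defining relations can be checked numerically with NumPy. The matrix below is an arbitrary example, and the thin SVD returned by `numpy.linalg.svd` corresponds to one of the equivalent formulations discussed below.

```python
import numpy as np

# Arbitrary example matrix psi with M >= N (hypothetical 4 x 3 case).
rng = np.random.default_rng(0)
M, N = 4, 3
psi = rng.standard_normal((M, N))

# Thin SVD: U is M x N column-orthogonal, D is N x N diagonal, V is N x N.
U, sigma, Vt = np.linalg.svd(psi, full_matrices=False)
D = np.diag(sigma)                        # singular values, decreasing order

assert np.allclose(U.T @ U, np.eye(N))    # column orthogonality of U
assert np.allclose(Vt @ Vt.T, np.eye(N))  # orthogonality of V
assert np.allclose(U @ D @ Vt, psi)       # psi = U D V^T

# The squared singular values are the eigenvalues of psi^T psi.
assert np.allclose(np.sort(sigma**2), np.linalg.eigvalsh(psi.T @ psi))
```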

The columns of $U$ are chosen as the eigenvectors of $\psi\psi^T$, the columns of $V$ are chosen as the eigenvectors of $\psi^T\psi$, and the singular values are the square roots of the eigenvalues of $\psi^T\psi$. The SVD is in general not unique, and in many texts the diagonal matrix is defined to be square while either the left or the right orthogonal matrix is defined to be rectangular, but all of these formulations are equivalent.

In order to prove the SVD we must first show that the eigenvalues of a real symmetric matrix of the form $\psi^T\psi$ are real and non-negative and that its eigenvectors are orthogonal.

Given a real-valued $\psi$ with dimensions $M\times N$, we may construct $\psi^T\psi$, which is real and symmetric.

Given any eigenvalue $\lambda$ and corresponding normalized eigenvector $\vec{x}$, we will show that $\lambda$ must be real:

$$\langle\vec{x},\psi^T\psi\,\vec{x}\rangle = \lambda\langle\vec{x},\vec{x}\rangle = \lambda \qquad (371)$$
but
$$\langle\vec{x},\psi^T\psi\,\vec{x}\rangle = \langle\psi^T\psi\,\vec{x},\vec{x}\rangle = \lambda^*\langle\vec{x},\vec{x}\rangle = \lambda^* \qquad (372)$$

This means that $\lambda = \lambda^*$, i.e., $\lambda$ is real.

Now we will show that these eigenvalues are non-negative.

$$\lambda = \langle\vec{x},\psi^T\psi\,\vec{x}\rangle = \langle\psi\vec{x},\psi\vec{x}\rangle = |\psi\vec{x}|^2 \ge 0. \qquad (373)$$

Now let us assume that $\psi^T\psi$ has distinct eigenvalues, and take any two of them, $(\lambda,\mu)$, with their corresponding normalized eigenvectors $(\vec{x},\vec{y})$. We will show that the eigenvectors are orthogonal:

$$\lambda\langle\vec{y},\vec{x}\rangle = \langle\vec{y},\psi^T\psi\,\vec{x}\rangle = \langle\psi^T\psi\,\vec{y},\vec{x}\rangle = \mu\langle\vec{y},\vec{x}\rangle \;\Rightarrow\; (\lambda-\mu)\langle\vec{y},\vec{x}\rangle = 0 \;\Rightarrow\; \langle\vec{y},\vec{x}\rangle = 0 \quad \text{since } \mu\neq\lambda.$$

If some of the eigenvalues are degenerate, the corresponding eigenvectors can be chosen linearly independent, and they are orthogonal to all eigenvectors belonging to different eigenvalues by the argument above; therefore we may orthogonalize them within each degenerate subspace using the Gram-Schmidt process and so obtain a fully orthogonal set of eigenvectors.

Now the proof of the SVD is as follows. For all non-zero eigenvalues $\lambda_i$ ($i\in\{1,\cdots,r\}$) of $\psi^T\psi$, ordered from largest to smallest, we define $\sigma_i = \sqrt{\lambda_i}$ and $\vec{u}_i = \psi\vec{x}_i/\sigma_i$. Then

$\langle\vec{u}_i,\vec{u}_j\rangle = \delta_{i,j}$. These $\vec{u}_i$ may therefore be extended to a basis for $\mathbb{R}^M$. Now construct a matrix $U$ out of these $\vec{u}_i$ by defining each of these vectors, in order, as the columns of $U$, and then define the matrix $V$ in the same way by using the $\vec{x}_i$ as its columns. From this definition we see that

$$(U^T\psi V)_{i,j} = \vec{u}_i^{\,T}\psi\,\vec{x}_j = \frac{\vec{x}_i^{\,T}\psi^T\psi\,\vec{x}_j}{\sigma_i} = \begin{cases} 0 & i \text{ or } j > r \\ \sigma_i\,\delta_{i,j} & \text{else} \end{cases} \;\equiv\; D_{i,j}. \qquad (374)$$
From this we have shown that $\psi = UDV^T$.
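The constructive steps of this proof can be mirrored numerically. Below is a minimal sketch (not from the notes), assuming a full-rank $\psi$ so that $r = N$ and no basis extension is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 5, 3
psi = rng.standard_normal((M, N))       # full rank with probability 1, so r = N

# Eigendecomposition of psi^T psi: eigenvalues lambda_i, eigenvectors x_i.
lam, X = np.linalg.eigh(psi.T @ psi)    # ascending order
lam, X = lam[::-1], X[:, ::-1]          # reorder from largest to smallest

sigma = np.sqrt(lam)                    # sigma_i = sqrt(lambda_i)
U = psi @ X / sigma                     # columns u_i = psi x_i / sigma_i
V = X

assert np.allclose(U.T @ U, np.eye(N))             # the u_i are orthonormal
assert np.allclose(U.T @ psi @ V, np.diag(sigma))  # (U^T psi V)_ij = sigma_i delta_ij
assert np.allclose(U @ np.diag(sigma) @ V.T, psi)  # psi = U D V^T
```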

B. SVD applications

Let us now apply the SVD to a quantum system A, described by an $M$-dimensional orthogonal basis $\{|i\rangle_A\}_{i=1}^{M}$, which is surrounded by an environment B described by an $N$-dimensional orthogonal basis $\{|j\rangle_B\}_{j=1}^{N}$.

The state of this combined time-independent system can be represented as

$$|\Psi\rangle = \sum_{i=1}^{M}\sum_{j=1}^{N} \psi_{i,j}\,|i\rangle_A|j\rangle_B, \qquad (375)$$
where $\psi$ is real.

Now suppose that some operator O acts on the quantum system but not the environment. The expectation value of this operator O may be written as

$$\langle O\rangle = \sum_{i,i'=1}^{M}\sum_{j,j'=1}^{N} \psi_{i,j}\,\psi_{i',j'}\,\langle i|O|i'\rangle\,\delta_{j,j'} = \sum_{i,i'=1}^{M}\sum_{j=1}^{N} \psi_{i,j}\,\psi_{i',j}\,\langle i|O|i'\rangle = \sum_{i,i'=1}^{M} \rho^{A}_{i',i}\,O_{i,i'} = \mathrm{Tr}_A[\rho^A O], \qquad (376)$$
where $\rho^A = \mathrm{Tr}_B[\rho] = \mathrm{Tr}_B[\,|\Psi\rangle\langle\Psi|\,]$ is the reduced density matrix for the system.
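Equation (376) is easy to verify numerically; the sketch below (not from the notes) uses a random real coefficient matrix and a random symmetric operator acting on system A only.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 4
psi = rng.standard_normal((M, N))
psi /= np.linalg.norm(psi)              # normalize the state coefficients

O = rng.standard_normal((M, M))
O = 0.5 * (O + O.T)                     # a real symmetric operator on A

# Direct evaluation: <O> = sum_{i,i',j} psi_{ij} psi_{i'j} <i|O|i'>
expval_direct = np.einsum('ij,kj,ik->', psi, psi, O)

# Via the reduced density matrix rho^A = psi psi^T (trace over environment B).
rho_A = psi @ psi.T
expval_rdm = np.trace(rho_A @ O)

assert np.isclose(expval_direct, expval_rdm)
```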

As an example, let us consider a specific case of the setup described above, in which system A consists of a single spin in contact with an environment B that also contains a single spin:

$$|\Psi\rangle = \frac{1}{5}\big(4\,|\!\uparrow\downarrow\rangle - 3\,|\!\downarrow\uparrow\rangle\big), \qquad (377)$$
then the resultant coefficient matrix is
$$\psi = \frac{1}{5}\begin{pmatrix} 0 & 4 \\ -3 & 0 \end{pmatrix}, \qquad (378)$$

and it has the singular value decomposition
$$U = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad D = \frac{1}{5}\begin{pmatrix} 4 & 0 \\ 0 & 3 \end{pmatrix}, \qquad V = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad (379)$$
and from this we may show that (proved in the homework for the general case)

$$\rho^A = \frac{16}{25}\,\vec{u}_1\vec{u}_1^{\,T} + \frac{9}{25}\,\vec{u}_2\vec{u}_2^{\,T}. \qquad (380)$$
Now suppose we were to approximate this RDM with only its largest eigenvalue (the most probable state). Clearly this approximation would yield inexact expectation values, but in general, as the coefficient matrices become sufficiently large and a smaller proportion of the eigenvectors is truncated, approximations of this type become better and better until the solutions converge to the exact case. This is the real essence of DMRG: we use the SVD to lower the dimensionality of the space over which we search for ground states by searching only over the most probable states.
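For this specific two-spin state the numbers above can be reproduced directly; the following short check (not from the notes) also illustrates the D = 1 truncation that underlies DMRG.

```python
import numpy as np

# Coefficient matrix of |Psi> = (4|ud> - 3|du>)/5, Eq. (378).
psi = np.array([[0.0, 4.0],
                [-3.0, 0.0]]) / 5.0

U, sigma, Vt = np.linalg.svd(psi)
print(sigma)                        # [0.8, 0.6], i.e. weights 16/25 and 9/25

rho_A = psi @ psi.T                 # reduced density matrix of spin A
print(np.linalg.eigvalsh(rho_A))    # eigenvalues 9/25 and 16/25

# Keeping only the largest singular value (a D = 1 truncation) gives an
# approximate, no longer normalized, state: the basic DMRG truncation step.
psi_trunc = sigma[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.norm(psi - psi_trunc))   # truncation error = 0.6
```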

C. DMRG wave function

To understand the DMRG wave function, let us start with the FCI wave function expanded in a complete basis of determinants,

$$|\Psi\rangle = \sum_{\{s_j\}} \psi^{s_1 s_2\cdots s_L}\,|s_1 s_2\cdots s_L\rangle, \qquad (381)$$
$$|s_j\rangle \in \{\,|\mathrm{vac}\rangle,\,|\!\uparrow\rangle,\,|\!\downarrow\rangle,\,|\!\uparrow\downarrow\rangle\,\}. \qquad (382)$$
Here $|s_1 s_2\cdots s_L\rangle$ describes the occupancy of $L$ orbitals, and the coefficient tensor $\psi$ in the expansion above has dimension $4^L$. This problem becomes intractable as $L$ gets large since, unlike in ordinary CI, we cannot truncate this tensor: sparsity cannot be assumed for strongly correlated systems. The FCI tensor may be exactly decomposed by singular value decomposition as follows:

$$\psi^{s_1 s_2\cdots s_L} = \sum_{\alpha_1} U[1]_{s_1,\alpha_1}\,s[1]_{\alpha_1}\,V[1]_{\alpha_1,s_2\cdots s_L}, \qquad (383)$$

$$A[1]^{s_1}_{\alpha_1} \equiv U[1]_{s_1,\alpha_1}\,s[1]_{\alpha_1}. \qquad (384)$$
And we may again do this to decompose $V[1]$,
$$V[1]_{\alpha_1,s_2\cdots s_L} = \sum_{\alpha_2} U[2]_{s_2,\alpha_2}\,s[2]_{\alpha_2}\,V[2]_{\alpha_2,s_3\cdots s_L}, \qquad (385)$$

$$A[2]^{s_2}_{\alpha_1,\alpha_2} \equiv U[2]_{s_2,\alpha_2}\,s[2]_{\alpha_2}. \qquad (386)$$
We may continue doing this until the coefficient tensor is exactly decomposed,

$$\psi^{s_1 s_2\cdots s_L} = \sum_{\{\alpha_k\}} A[1]^{s_1}_{\alpha_1}\,A[2]^{s_2}_{\alpha_1,\alpha_2}\cdots A[L]^{s_L}_{\alpha_{L-1}} = \mathrm{Tr}\big[A[1]^{s_1}A[2]^{s_2}\cdots A[L]^{s_L}\big]. \qquad (387)$$
This form is useful because, instead of variationally optimizing the FCI tensor, we may optimize over the tensors of its decomposition and truncate the virtual ($\alpha$) dimension. As the dimension $D$ of the virtual index is increased, the MPS ansatz includes a larger region of the full Hilbert space until it exactly captures the original FCI wave function.
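The successive-SVD construction of Eqs. (383)-(387) is straightforward to carry out numerically. The sketch below (not from the notes) decomposes a small random coefficient tensor into an MPS, verifies that the decomposition is exact, and then repeats it with the virtual dimension truncated to an illustrative value D = 8.

```python
import numpy as np

def tensor_to_mps(psi, d, L, D=None):
    """Decompose a d**L coefficient vector into MPS site tensors A[k] by
    sweeping left to right with SVDs, optionally truncating the bond to D."""
    A = []
    rest = psi.reshape(1, -1)                 # virtual dimension 1 on the left
    for k in range(L - 1):
        chi = rest.shape[0]
        mat = rest.reshape(chi * d, -1)       # group (alpha_{k-1}, s_k)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        if D is not None:                     # truncate the virtual index
            U, s, Vt = U[:, :D], s[:D], Vt[:D, :]
        A.append(U.reshape(chi, d, -1))       # A[k]^{s_k}_{alpha_{k-1}, alpha_k}
        rest = np.diag(s) @ Vt                # carry s V^T on to the next site
    A.append(rest.reshape(rest.shape[0], d, 1))
    return A

def mps_to_tensor(A):
    """Contract the MPS back into the full coefficient vector."""
    out = A[0]
    for Ak in A[1:]:
        out = np.einsum('...a,asb->...sb', out, Ak)
    return out.reshape(-1)

L, d = 5, 4                                   # 5 orbitals, 4 local states each
psi = np.random.default_rng(3).standard_normal(d**L)
psi /= np.linalg.norm(psi)

A = tensor_to_mps(psi, d, L)                  # no truncation: exact
assert np.allclose(mps_to_tensor(A), psi)

A8 = tensor_to_mps(psi, d, L, D=8)            # truncated MPS, D = 8
print(np.linalg.norm(mps_to_tensor(A8) - psi))   # truncation error
```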

D. Expectation values and diagrammatic notation

At this point it is convenient to introduce diagrammatic notation for tensors. In tensor networks, objects are denoted by shapes with lines attached to them, and the number of lines determines what type of object is being dealt with: scalars have no lines, vectors have one, matrices have two, and higher-dimensional tensors have more.

FIG. 9. Diagrammatic notation for tensors.

Using this notation we may represent the FCI tensor and the MPS (Fig. 10), and we may compute overlap integrals by contracting two tensors with like indices (Fig. 11).

Lastly, we may use this notation to represent operators and their expectation values. Given a many-body operator $O$,

FIG. 10. Diagrammatic notation for FCI tensor and MPS tensor.

FIG. 11. Diagrammatic notation for FCI contractions and MPS overlaps.

$$O = \sum_{\{r_j,s_j\}} O^{r_1 r_2\cdots r_L}_{s_1 s_2\cdots s_L}\,|r_1 r_2\cdots r_L\rangle\langle s_1 s_2\cdots s_L|, \qquad (388)$$
we may write it as:

FIG. 12. Diagrammatic notation for an arbitrary operator.

Expectation values are calculated by contracting the open indices on each side with the appropriate wave functions.

FIG. 13. Diagrammatic notation for an expectation value.
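As an illustration of how these diagrammatic contractions translate into code, the overlap of Fig. 11 amounts to sweeping an "environment" matrix from left to right through the chain. A minimal sketch (not from the notes; the random MPS below is purely illustrative):

```python
import numpy as np

def random_mps(L, d, D, rng):
    """A random MPS with physical dimension d and bond dimension D."""
    dims = [1] + [D] * (L - 1) + [1]
    return [rng.standard_normal((dims[k], d, dims[k + 1])) for k in range(L)]

def overlap(bra, ket):
    """<bra|ket> computed by contracting site tensors from left to right,
    as in the diagrams of Fig. 11 (real tensors assumed)."""
    E = np.ones((1, 1))                          # left boundary
    for B, K in zip(bra, ket):
        # contract the previous virtual bonds and the shared physical index s
        E = np.einsum('ab,asc,bsd->cd', E, B, K)
    return E[0, 0]

rng = np.random.default_rng(4)
L, d, D = 6, 4, 5
mps = random_mps(L, d, D, rng)

# Check against brute-force contraction into the full coefficient vector.
full = mps[0]
for A in mps[1:]:
    full = np.einsum('...a,asb->...sb', full, A)
full = full.reshape(-1)

assert np.isclose(overlap(mps, mps), full @ full)
```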

E. Matrix product ansatz

In DMRG it is assumed that the wave-function can be written as a matrix product state as described before (MPS ansatz), where the virtual dimensions are truncated to D.

This wave function is invariant under a number of transformations: between any two $A[i]$ we may insert the identity (written as $GG^{-1}$ for any invertible matrix $G$ acting on the virtual index) without changing the wave function. Exploiting this gauge freedom, the DMRG wave function may be written in canonical form as

$$|\Psi\rangle = \sum_{\{s_j\}} \mathrm{Tr}\big[L[1]^{s_1}\cdots L[p-1]^{s_{p-1}}\,C[p]^{s_p}\,R[p+1]^{s_{p+1}}\cdots R[L]^{s_L}\big]\,|s_1 s_2\cdots s_L\rangle \qquad (389)$$
where the $L$'s and $R$'s satisfy the orthogonality conditions

$$\sum_{s_j,\alpha_{k-1}} (L^{s_j}[k])^{\dagger}_{\alpha_k,\alpha_{k-1}}\,L^{s_j}[k]_{\alpha_{k-1},\beta_k} = \delta_{\alpha_k,\beta_k} \qquad (390)$$
$$\sum_{s_j,\alpha_k} R^{s_j}[k]_{\alpha_{k-1},\alpha_k}\,(R^{s_j}[k])^{\dagger}_{\alpha_k,\beta_{k-1}} = \delta_{\alpha_{k-1},\beta_{k-1}} \qquad (391)$$
From these $L^{s_j}$ and $R^{s_j}$ tensors we may define sets of renormalized many-particle basis states $\{|L_{\alpha}\rangle\},\{|R_{\alpha}\rangle\}$, where

$$|L_{\alpha_{p-1}}\rangle = \sum_{\{s_j\}\{\alpha_1\cdots\alpha_{p-2}\}} L[1]^{s_1}_{\alpha_1}\cdots L[p-1]^{s_{p-1}}_{\alpha_{p-2},\alpha_{p-1}}\,|s_1 s_2\cdots s_{p-1}\rangle, \qquad (392)$$

$$|R_{\alpha_{p+1}}\rangle = \sum_{\{s_j\}\{\alpha_{p+2}\cdots\alpha_{L-1}\}} R[p+2]^{s_{p+2}}_{\alpha_{p+1},\alpha_{p+2}}\cdots R[L]^{s_L}_{\alpha_{L-1}}\,|s_{p+2}s_{p+3}\cdots s_L\rangle, \qquad (393)$$
where

$$\langle L_{\alpha_{p-1}}|L_{\beta_{p-1}}\rangle = \delta_{\alpha_{p-1},\beta_{p-1}} \qquad (394)$$
$$\langle R_{\alpha_{p+1}}|R_{\beta_{p+1}}\rangle = \delta_{\alpha_{p+1},\beta_{p+1}} \qquad (395)$$
These left and right vectors represent renormalized bases of the many-body Hilbert spaces built from orbitals $1$ to $p-1$ and from orbitals $p+2$ to $L$, respectively. Consider the left side: at each site $k$ from $1$ to $p-1$, the many-body basis is augmented by one orbital and subsequently truncated again to at most $D$ renormalized basis states,

$$\{|L_{\alpha_{k-1}}\rangle\}\otimes\{|s_k\rangle\} \;\to\; |L_{\alpha_k}\rangle = \sum_{\alpha_{k-1},s_k} A[k]^{s_k}_{\alpha_{k-1},\alpha_k}\,|L_{\alpha_{k-1}}\rangle|s_k\rangle. \qquad (396)$$
This implies that DMRG is a renormalization group for many-body Hilbert spaces.
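The left-orthogonality condition (390) is what one enforces in practice when bringing a site tensor into canonical form. A minimal sketch (not from the notes), using a QR decomposition on a hypothetical site tensor:

```python
import numpy as np

rng = np.random.default_rng(5)
d, Dl, Dr = 4, 6, 6
A = rng.standard_normal((Dl, d, Dr))      # site tensor A^{s_k}_{alpha_{k-1}, alpha_k}

# Left-orthonormalize: group (alpha_{k-1}, s_k) into rows and QR-decompose.
Q, R = np.linalg.qr(A.reshape(Dl * d, Dr))
Lk = Q.reshape(Dl, d, -1)                 # left-canonical tensor L^{s_k}[k]

# Check Eq. (390): summing over s_k and alpha_{k-1} gives the identity.
assert np.allclose(np.einsum('asb,asc->bc', Lk, Lk), np.eye(Lk.shape[2]))

# The R factor would be absorbed into the next site, A[k+1] <- R A[k+1],
# which leaves the overall wave function unchanged (the gauge freedom above).
```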

F. DMRG algorithm

Our goal is to approximate the diagonalization of the exact Hamiltonian in the orthogonal basis $\{|L_{\alpha_{p-1}}\rangle\}\otimes\{|s_p\rangle\}\otimes\{|s_{p+1}\rangle\}\otimes\{|R_{\alpha_{p+1}}\rangle\}$. The DMRG algorithm consists of successive sweeps over the orbitals, during which two neighboring MPS tensors are variationally optimized. We achieve this by combining two adjacent matrices of the MPS into a new tensor, which we then variationally optimize,

$$\sum_{\alpha_i} A[i]^{s_i}_{\alpha_{i-1},\alpha_i}\,A[i+1]^{s_{i+1}}_{\alpha_i,\alpha_{i+1}} = B[i]^{s_i,s_{i+1}}_{\alpha_{i-1},\alpha_{i+1}}. \qquad (397)$$
We will do this by choosing a trial wave function (for the first step only) and extremizing the Lagrangian

$$\mathcal{L} = \langle\Psi(B[i])|\hat{H}|\Psi(B[i])\rangle - E_i\,\langle\Psi(B[i])|\Psi(B[i])\rangle, \qquad (398)$$
with respect to the complex conjugate of $B[i]$. From the variational principle we know this yields the eigenvalue problem

$$H[i]^{\mathrm{eff}}\,B[i] = E_i\,B[i]. \qquad (399)$$
Once $B[i]$ is found, it is decomposed with an SVD and truncated if there are more than $D$ singular values.

During each iteration $B[i]$ is constructed, the corresponding effective Schrödinger equation is solved, the solution $B[i]$ is decomposed using singular value decomposition, $A[i]$ is redefined as the contraction of $U[i]$ with $s[i]$, and $A[i+1]$ is set to the corresponding right-normalized matrix $V[i]$. Once this is done, $i \to i+1$ (or $i \to i-1$) and the process is repeated until $i = L$ (or $i = 1$); the sweep then reverses direction, and the algorithm goes back and forth in consecutive "sweeps" until some convergence criterion is met. These calculations are repeated with increasing virtual dimension $D$ in order to monitor convergence, since we may often extrapolate $E_{\mathrm{FCI}}$ from the DMRG energies $E_D$.
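A single two-site update of this kind can be sketched compactly. In the snippet below (not from the notes) the effective Hamiltonian is simply a given dense symmetric matrix standing in for the true contraction of the operator with the left and right environments, and it is diagonalized exactly; a real implementation would build it from an MPO and use an iterative solver such as Davidson or Lanczos.

```python
import numpy as np

def two_site_update(H_eff, Dl, d, Dr, D):
    """One two-site DMRG update as described above.  H_eff is the effective
    Hamiltonian in the {|L>|s_i>|s_{i+1}>|R>} basis, given here as a dense
    symmetric matrix of size (Dl*d*d*Dr) x (Dl*d*d*Dr)."""
    # 1. Solve the effective eigenvalue problem, Eq. (399).
    E, C = np.linalg.eigh(H_eff)
    E0, B = E[0], C[:, 0]

    # 2. Reshape the ground state into B[i]^{s_i s_{i+1}}_{alpha_{i-1}, alpha_{i+1}}.
    B = B.reshape(Dl * d, d * Dr)

    # 3. SVD and truncate to at most D singular values.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    U, s, Vt = U[:, :D], s[:D], Vt[:D, :]
    s /= np.linalg.norm(s)                    # renormalize after truncation

    # 4. New site tensors: A[i] <- U with s absorbed, A[i+1] <- V^T.
    A_i  = (U * s).reshape(Dl, d, -1)
    A_i1 = Vt.reshape(-1, d, Dr)
    return E0, A_i, A_i1

# Toy usage with a random symmetric "effective Hamiltonian".
rng = np.random.default_rng(6)
Dl, d, Dr, D = 3, 4, 3, 4
n = Dl * d * d * Dr
H_eff = rng.standard_normal((n, n))
H_eff = 0.5 * (H_eff + H_eff.T)

E0, A_i, A_i1 = two_site_update(H_eff, Dl, d, Dr, D)
print(E0, A_i.shape, A_i1.shape)
```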

G. DMRG in practice

Before a DMRG algorithm is implemented, it is hugely important to choose an appropriate orbital ordering for the MPS ansatz and an appropriate initial wave function. The error in the energy caused by the starting guess is estimated to be an order of magnitude smaller than that caused by the ordering of the orbitals. It is often best to start with a small active space and subsequently add in previously frozen orbitals; the orbitals themselves can be obtained from a small CASSCF, CI, or HF calculation. The choice and ordering of orbitals is non-trivial. In early DMRG work White was able to order the orbitals easily since the system was one-dimensional: neighboring electrons were far more correlated than electrons far from one another. For many elongated molecules a spatially local basis is useful since the system is roughly one-dimensional.

H. Dynamic correlation and excited states

DMRG is not able to account for dynamic electron correlation on its own. These correlations may be recovered by using DMRG as the active-space solver in a CASSCF-type treatment. DMRG may find excited states by projecting out lower-lying eigenstates, or by targeting specific excited states with state-specific algorithms. Additionally, DMRG may be combined with linear response theory to give DMRG-LRT, which allows for the calculation of excited states and other response properties.

I. Applications to atoms and molecules

DMRG has been applied to many systems in order to calculate ground-state properties, excited-state energies, polarizabilities, and many properties that require the one- or two-body reduced density matrix (which may easily be computed from the MPS), such as spin densities and dipole moments. Currently it is one of the best methods for one-dimensional and nearly one-dimensional systems, for heavy molecules in which relativistic effects are important, and for systems, such as transition-metal complexes, that have a very large active space.

J. Limitations

Although DMRG is applicable to most systems, it is only efficient at describing locality in one spatial dimension. In large molecules the orbital ordering and the MPS structure as a whole become impractical. For higher-dimensional systems, more general tensor network states are being developed, which increase the number of virtual dimensions summed over and help encode locality into the TNS wave function.
