
Journal of Molecular Structure (Theochem), 229 (1991) 115-137
Elsevier Science Publishers B.V., Amsterdam

ON THE EVALUATION OF NON-ORTHOGONAL MATRIX ELEMENTS

JACOB VERBEEK and JOOP H. VAN LENTHE
Theoretical Chemistry Group, University of Utrecht, Padualaan 14, 3584 CH Utrecht (The Netherlands)
(Received 23 February 1990)

ABSTRACT

Löwdin's formula for non-orthogonal matrix elements is derived and elucidated. Algorithms for cofactor evaluation are discussed. It is shown that for matrix elements involving two-electron operators the effort associated with the computation of the cofactors is only about three floating point operations per cofactor. We conclude that the non-orthogonality problem does not reside in the evaluation of the cofactors and that in situ generation is a viable approach. Explicit algorithms are given.

INTRODUCTION

However interesting ab initio valence bond theory may be, it has not kept pace with the molecular orbital models. This is because of the mathematical and technical problems brought about by the use of non-orthogonal orbitals, especially in the evaluation of matrix elements. In 1955 Löwdin derived an expression [1] which has become the starting point for the majority of algorithms that have been devised [2-9,14,15]. These methods differ mainly in the way the cofactors that appear in Löwdin's formula are generated. We recently developed a computer code [4,8] for VB/VBSCF/VBCI [10] calculations which uses Löwdin's formula as well.

In order to put Löwdin's formula into use one needs the one- and two-electron integrals and their cofactors. In general, the integrals will involve orbitals which are linear combinations of "raw" basis functions. An integral package generates the integrals in the "raw" basis. Therefore either the cofactors must be transformed to the "raw" basis or the integrals must be transformed to the orbital basis. We assume that the number of matrix elements warrants the integral transformation. This transformation can be done once and for all, before the matrix elements are calculated, and does not require any specific (non-orthogonal) know-how [11,12]. The cofactors are another story. They are absent in orthogonal calculations,

and consequently their role in regular ab initio quantum chemistry is quite modest. Their evaluation is often suggested to be the bottle-neck in non-orthogonal calculations [3,9,13-16]. Löwdin already proposed matrix inversion and the subsequent use of the Jacobi ratio theorem as a means of calculating the cofactors. This is a fine scheme provided the matrix to be inverted is non-singular. In practice the matrix very often is singular, so a generalisation of Löwdin's route is called for. Several solutions have been formulated [2-6,9] that are often different in appearance but very similar in essence.

This paper is an attempt to unify these approaches and to provide a conceptual framework to throw some light on the abstruse algebraic manipulations. We present a derivation of the Löwdin formula and the cofactors are discussed. We discuss determinants in terms of "volume"; this allows for an intuitive picture of the subject. The first- and second-order cofactors are introduced as derivatives of a determinant. A scheme for efficient cofactor evaluation is then outlined using the concepts introduced previously. The L-d-R decomposition of the matrix of overlap integrals is briefly described. Finally, the actual cofactor algorithms used in our program are given. We believe that these algorithms are an optimal blend of existing strategies.

A DERIVATION OF LÖWDIN'S FORMULA FOR NON-ORTHOGONAL MATRIX ELEMENTS [1]

We consider matrix elements of one- and two-electron operator representations in a Slater determinant basis:

\[
\langle A \,|\, \hat H \,|\, B \rangle \;=\; \sum_{i}^{N} \langle A \,|\, h(i) \,|\, B \rangle \;+\; \sum_{i<j}^{N} \langle A \,|\, g(i,j) \,|\, B \rangle
\qquad (1)
\]

A and B are normalised Slater determinants, constructed from the orbitals {a} and {b} respectively, and Ĥ is the electronic hamiltonian operator, which consists of a one-electron part, h, and a two-electron part, g. The electron labels are referred to as i and j. By "label" we mean the set of coordinates which corresponds to one electron (three spatial and one spin coordinate). N is the number of electrons. The members of {a} are not necessarily orthogonal to the members of {b}. "Non-orthogonal matrix elements" are matrix elements involving non-orthogonal orbitals.

A product of N orbitals in which label 1 is assigned to the first orbital, label 2 to the second and so on, is called a Hartree product. It is textbook knowledge that the normalised operator that turns a Hartree product into a Slater determinant (the antisymmetriser) is idempotent and hermitian, so that using the turn-over rule, a matrix element can be written in terms of N! N-electron integrals. If this expression were used, the computational effort needed to evaluate matrix elements would scale with N! (throughout this paper the exclamation mark denotes "factorial" rather than enthusiasm). This presents a serious difficulty, generally called "the N! problem". Starting from eqn. (1) and via the N! formulation, we derive below Löwdin's general formula for matrix elements. This formula shows that the problem size scales as a polynomial in N.

If a Slater determinant is expanded in N! terms in the usual way (repeated Cauchy expansion [17]) each term is a product of the N distinct spin-orbitals, with the normalisation factor (N!)^{-1/2}. The first term in the expansion is the Hartree product. The other terms can be generated from the Hartree product by permuting the labels over the orbitals in all ways, yielding the other N! - 1 terms. Each term receives a "+" or "-", depending on the number of transpositions with respect to the Hartree product being even or odd.
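The expansion just described is easy to check numerically for the simplest matrix element, the overlap of two determinants: the sum of the N! signed Hartree-product terms equals the determinant of the matrix of overlap integrals. A minimal sketch, with random numbers standing in for the overlap integrals ⟨a_i|b_j⟩ (all names are ours, not the paper's):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 4
S = rng.normal(size=(N, N))      # stand-ins for the overlap integrals <a_i|b_j>

def parity(perm):
    """(-1)^p: parity of a permutation from its inversion count."""
    inversions = sum(a > b for a, b in itertools.combinations(perm, 2))
    return -1 if inversions % 2 else 1

# Sum of the N! signed products of overlap integrals for <A|B>
# (normalisation constant dropped, as in the text).
overlap = sum(parity(p) * np.prod([S[i, p[i]] for i in range(N)])
              for p in itertools.permutations(range(N)))
assert np.isclose(overlap, np.linalg.det(S))
```

The O(N!) loop is of course exactly what Löwdin's formula avoids.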
A matrix element can then be written as (N!)^2 integrals over the N labels. Per N-electron integral an orbital of the "left" Slater determinant is connected with an orbital of the "right" Slater determinant through their unique common label. The connection may or may not involve the operators. One can recognise three important facts [18]. (a) For each N-electron integral the outcome of the integration depends on the connections and not on the labels used to make the connections. (b) If two integrals have the same connections, but use the labels in a different order, they will have the same overall parity, because the renumbering which converts the first integral into the second brings along the same sign for both determinants and the net parity therefore cancels. (c) Both the total one-electron and the total two-electron operators are symmetric with respect to all labels. These considerations show, in accordance with the above, that of the (N!)^2 integrals only N! are unique and each unique integral occurs with the same frequency. Considering just the N! unique integrals, dropping the normalisation constant (N!)^{-1} suffices:

\[
\langle A \,|\, \hat H \,|\, B \rangle \;=\; \sum_{i}^{N} \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, h(i) \,\Big|\, \sum_{P} (-1)^p \{P\, b_1(1)b_2(2)\cdots b_N(N)\} \Big\rangle
\]
\[
\;+\; \sum_{i<j}^{N} \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, g(i,j) \,\Big|\, \sum_{P} (-1)^p \{P\, b_1(1)b_2(2)\cdots b_N(N)\} \Big\rangle
\qquad (2)
\]

where P is the permutation operator that scatters the labels over the orbitals in all ways, and p is the number of label transpositions needed to generate the term from the Hartree product. Let us denote the ith one-electron term and the ijth two-electron term by (h(i))_ab and (g(i,j))_ab. The labelling is quite arbitrary, so that if we know how to deal with, say, h(1) and g(1,2), mutatis mutandis the result can be transferred to the other labels. For h(1):

\[
(h(1))_{ab} \;=\; \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, h(1) \,\Big|\, \sum_{P} (-1)^p \{P\, b_1(1)b_2(2)\cdots b_N(N)\} \Big\rangle
\qquad (3)
\]

The b term carrying label 1 is brought to the front per group of terms:

\[
(h(1))_{ab} \;=\; \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, h(1) \,\Big|\, \sum_{i}^{N} b_i(1)\,(-1)^{i-1} \{\textstyle\sum_{P'} (-1)^{p'} P'\, \hat b_i^{-1}\, b_2(2)b_3(3)\cdots b_N(N)\} \Big\rangle
\qquad (4)
\]

The integration over label 1 now can be performed separately:

\[
(h(1))_{ab} \;=\; \sum_{i}^{N} \langle a_1(1)\,|\,h(1)\,|\,b_i(1)\rangle \,(-1)^{i-1}\, \Big\langle a_2(2)a_3(3)\cdots a_N(N) \,\Big|\, \{\textstyle\sum_{P'} (-1)^{p'} P'\, \hat b_i^{-1}\, b_2(2)b_3(3)\cdots b_N(N)\} \Big\rangle
\qquad (5)
\]

For each i the rightmost integral is a sum of (N-1)! integrals. Each of these (N-1)! integrals involves integration over N-1 labels, and each orbital a is connected to just one b via its label, without intermission of an electronic operator. They are products of N-1 overlap integrals, each product carrying a sign according to P''s permutations being even or odd with respect to the natural order of the Hartree product \(\hat b_i^{-1}\, b_2(2)b_3(3)\cdots b_N(N)\).

By definition, the sum of the (N-1)! signed products of (N-1) overlap integrals is the determinant of the (N-1)×(N-1) matrix of overlap integrals in which the orbitals a_2, ..., a_N label the rows and the b orbitals with b_i omitted label the columns. Recalling that \(\hat b_i^{-1}\, b_2(2)b_3(3)\cdots b_N(N)\) stands for the product of the N b orbitals without the ith b orbital, the determinant is seen to be the (1,i)th minor of the N×N matrix of overlap integrals of {a} and {b}, having ⟨a_i(1)|b_j(1)⟩ at position (i,j). This matrix will be called S_ab. Minors and cofactors are described in Appendix 1. It should be noted that the overlap integrals involve spin orbitals (see ref. 19 for some interesting remarks on the effect of the related blocking structure). It is also important to realise that, for A ≠ B, S_ab is not a metric, so that singularity of S_ab does not imply a dependency in the spin-orbital basis. The extra parity of \(\hat b_i^{-1}\) coincides with the parity which turns the minor into a cofactor (cf. Appendix 1). We conclude that the weight factor of the ith one-electron integral in the summation of eqn. (5) is the (1,i)th cofactor of S_ab:

\[
(h(1))_{ab} \;=\; \sum_{i}^{N} \langle a_1(1)\,|\,h(1)\,|\,b_i(1)\rangle \cdot S^{(1,i)}
\qquad (6)
\]

For the general case h(j) the reasoning is essentially the same. We wish to isolate the jth label per group of b terms. b_i then has to interchange (i-j) times to let the jth label be assigned to b_i. The minor now is the (j,i)th minor of S_ab, and the parity of \(\hat b_i^{-1}\) again turns the minor into the cofactor. The expression for the h(j) part is then

\[
(h(j))_{ab} \;=\; \sum_{i}^{N} \langle a_j(1)\,|\,h(1)\,|\,b_i(1)\rangle \cdot S^{(j,i)}
\qquad (7)
\]

Summing over all labels yields the total one-electron contribution:

\[
\Big\langle A \,\Big|\, \sum_{i}^{N} h(i) \,\Big|\, B \Big\rangle \;=\; \sum_{i,j}^{N} \langle a_i(1)\,|\,h(1)\,|\,b_j(1)\rangle \cdot S^{(i,j)}
\qquad (8)
\]

It is convenient to introduce the matrix of one-electron integrals h_ab, which has ⟨a_i(1)|h(1)|b_j(1)⟩ at position (i,j), and the first-order adjugate, adj(S_ab), cf. Appendix 1. The total one-electron contribution now can be written as the trace of a matrix product:

\[
\Big\langle A \,\Big|\, \sum_{i}^{N} h(i) \,\Big|\, B \Big\rangle \;=\; \mathrm{Tr}\,[\,h_{ab}\cdot \mathrm{adj}(S_{ab})\,]
\qquad (9)
\]

This expression closely resembles an energy expression in terms of integrals and density matrix elements. In the case where A equals B, and the wavefunction is just A, the cofactors in fact are the elements of the one-electron density matrix [20]. If A ≠ B the adjugate is a one-electron transition density matrix. The starting point for g(1,2) is the two-electron analogue of eqn. (3):

\[
(g(1,2))_{ab} \;=\; \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, g(1,2) \,\Big|\, \sum_{P} (-1)^p \{P\, b_1(1)b_2(2)\cdots b_N(N)\} \Big\rangle
\qquad (10)
\]

The b terms with labels 1 and 2 have to be isolated. This is achieved in the same way as before, now bringing two b terms to the front. This means one has to distinguish N·(N-1) groups, one for each assignment of label 1 to one b term and label 2 to another:

\[
(g(1,2))_{ab} \;=\; \Big\langle a_1(1)a_2(2)\cdots a_N(N) \,\Big|\, g(1,2) \,\Big|\, \sum_{i\neq j}^{N} b_i(1)\,b_j(2) \{\textstyle\sum_{P''} (-1)^{p''} P''\, \hat b_i^{-1}\hat b_j^{-1}\, b_3(3)b_4(4)\cdots b_N(N)\} \Big\rangle
\qquad (11)
\]

However, the case b_i(1)b_j(2) can be converted into the case b_j(1)b_i(2) by just one interchange (the parity swaps), so that they can be combined into a single term. This reduces the summation from N·(N-1) terms to N·(N-1)/2 terms:

\[
(g(1,2))_{ab} \;=\; \sum_{k<l}^{N} \big[\, \langle a_1(1)a_2(2)\,|\,g(1,2)\,|\,b_k(1)b_l(2)\rangle \;-\; \langle a_1(1)a_2(2)\,|\,g(1,2)\,|\,b_l(1)b_k(2)\rangle \,\big]\cdot S^{(12,kl)}
\qquad (12)
\]

For the general term g(i,j) the same reasoning yields

\[
(g(i,j))_{ab} \;=\; \sum_{k<l}^{N} \big[\, \langle a_i(1)a_j(2)\,|\,g(1,2)\,|\,b_k(1)b_l(2)\rangle \;-\; \langle a_i(1)a_j(2)\,|\,g(1,2)\,|\,b_l(1)b_k(2)\rangle \,\big]\cdot S^{(ij,kl)}
\qquad (13)
\]

It should be noted that the two-electron integrals may vanish because of spin integration. It is convenient to introduce the (super)matrix G_ab, which has the difference of the two-electron integrals appearing in the last formula at position (ij,kl), and the second-order adjugate adj^(2)(S_ab), cf. Appendix 1. The two-electron contribution can be written as the trace of a matrix product of two matrices of dimension N·(N-1)/2:

\[
\Big\langle A \,\Big|\, \sum_{i<j}^{N} g(i,j) \,\Big|\, B \Big\rangle \;=\; \mathrm{Tr}\,[\,G_{ab}\cdot \mathrm{adj}^{(2)}(S_{ab})\,]
\qquad (14)
\]

Again this expression closely resembles an energy expression in terms of integrals and density matrix elements. Using the matrices defined above, we obtain Löwdin's general formula for a matrix element in the form:

\[
\langle A \,|\, \hat H \,|\, B \rangle \;=\; \mathrm{Tr}\,[\,h_{ab}\cdot \mathrm{adj}(S_{ab})\,] \;+\; \mathrm{Tr}\,[\,G_{ab}\cdot \mathrm{adj}^{(2)}(S_{ab})\,]
\qquad (15)
\]

This formula holds in both the orthogonal and the non-orthogonal cases. h_ab and G_ab are second- and fourth-rank contravariant tensors, whereas the first- and second-order adjugates are (transposed) second- and fourth-rank covariant tensors (cf. Appendix 2). Each term in the traces of eqn. (15) therefore is an inner product of a contravariant and a covariant vector, and hence the matrix element does not change upon any non-singular transformation amongst the orbitals of A or the orbitals of B. This of course is already apparent from the determinantal form of A and B. A valence bond structure generally comprises a linear combination of Slater determinants, so that one matrix element in the structure basis requires the evaluation of several expressions of type (15). In this paper we do not explore the savings that can be made by exploiting the interrelation of these terms; see, however, refs. 4 and 21.
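The one-electron part of Löwdin's formula (eqns. (8) and (9)) can be checked against the brute-force N! expansion at toy scale. In the sketch below, random numbers stand in for the integrals, and the adjugate is built by explicit minors, which is only sensible for such a small case; all names are ours:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 3
S = rng.normal(size=(N, N))   # overlap integrals <a_i|b_j> (random stand-ins)
h = rng.normal(size=(N, N))   # one-electron integrals <a_i|h|b_j>

def parity(perm):
    inversions = sum(a > b for a, b in itertools.combinations(perm, 2))
    return -1 if inversions % 2 else 1

# Brute force over the N! permutations:
# <A| sum_i h(i) |B> = sum_P (-1)^p sum_i h[i,P(i)] prod_{k!=i} S[k,P(k)]
brute = 0.0
for perm in itertools.permutations(range(N)):
    for i in range(N):
        prod = 1.0
        for k in range(N):
            if k != i:
                prod *= S[k, perm[k]]
        brute += parity(perm) * h[i, perm[i]] * prod

def adjugate(M):
    """First-order adjugate: adj(M)[j,i] = cofactor M^(i,j), by explicit minors."""
    n = M.shape[0]
    A = np.empty_like(M)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            A[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return A

lowdin = np.trace(h @ adjugate(S))   # eqn (9): Tr[h_ab . adj(S_ab)]
assert np.isclose(brute, lowdin)
```

The same trace with h replaced by S itself gives N·|S|, which is a quick consistency check on the cofactor signs.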

Determinants as volumes and cofactors as derivatives [17,22,23]

From now on the subscripts a and b are omitted from S, although it must be remembered that S is different for different combinations of A and B. A square N-dimensional real matrix S, that is any collection of N^2 real numbers, can be looked upon as a prescription for how to transform a set of numbers {x_1 to x_N} to another set of numbers {y_1 to y_N}: Sx = y, or

\[
\begin{aligned}
S_{11}x_1 + S_{12}x_2 + \cdots + S_{1N}x_N &= y_1 \\
S_{21}x_1 + S_{22}x_2 + \cdots + S_{2N}x_N &= y_2 \\
&\;\;\vdots \\
S_{N1}x_1 + S_{N2}x_2 + \cdots + S_{NN}x_N &= y_N
\end{aligned}
\qquad (16)
\]

Given N basis vectors that span a linear space, the N numbers may refer to the basis vectors so as to realise a vector in this space. If the same space is spanned by another basis, other numbers are needed to indicate the same vector. The transformation of x to y now can be assumed to stand for a transformation of the basis of x, so that y merely equals x albeit expressed in another basis. Let us adopt this viewpoint. Alternatively y can be taken to be an image of x, expressed in the same basis. One is free to choose one's metric (though the linear space changes with the metric), so let us choose the unit matrix as a metric for case x. The basis vectors then span a (hyper)cube having a unit (hyper)volume. In general the transformation of the basis vectors by S distorts the cube into a parallelepiped, thereby changing the volume. The new basis in terms of the old basis coincides with S and the new metric equals S†S. S†S is hermitian, but need not be positive definite, as it may have eigenvalues equal to zero. The ratio of the new and the old volume, that is determined by the distortion of the angles and the lengths of the basis vectors, is the determinant of S.

If one component of x is lost in the transformation, the pictorial representation of the new basis is an (N-1)-dimensional body and the determinant (volume) of S is zero (S is singular). The transformation of the basis vectors then cannot be inverted: y equals x after projection onto an (N-1)-dimensional subspace. Likewise more components can get lost. The volume of S is linear in each of the lengths of its vectors. The length of a vector is not linear in each of its components unless it has just one non-zero component. The more the vectors are aligned the less sensitive the volume will be to elongation of the vectors (absolutely, not relatively).
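The volume picture is easiest to see in two dimensions: for two unit vectors separated by an angle θ, the determinant is sin θ, maximal for perpendicular vectors and vanishing as they align. A minimal numerical illustration (our construction, not the paper's):

```python
import numpy as np

# Two unit column vectors at angle theta: |S| = sin(theta), the area of the
# parallelogram they span; maximal at theta = pi/2, zero on alignment.
for theta in (np.pi / 2, np.pi / 4, 0.01):
    S = np.array([[1.0, np.cos(theta)],
                  [0.0, np.sin(theta)]])
    assert np.isclose(np.linalg.det(S), np.sin(theta))
```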
If the alignment is minimal (all vectors perpendicular), the volume is maximal for given lengths of the vectors, and equals the product of the lengths of the vectors. If the vectors are not perpendicular, the angles (their cosines) appear in the expression of the volume, and the determinant in terms of the elements of the matrix becomes complicated (N! terms). Assuming the vectors have unit length, the value of the determinant is a measure of the "amount of linear independence" in the coefficients of the set of linear equations associated with S. The determinant then is unity at most and approaches zero as the vectors become aligned and the set of linear equations approaches linear dependency. In general, all N^2 entries of the matrix play a role in the determinant. The contribution of each separate element can be established by expanding the determinant along the element's row or column, in terms of elements of the matrix S_ik and their cofactors S^(i,k):

\[
|S| \;=\; \sum_{i}^{N} S_{ik}\cdot S^{(i,k)}
\qquad (17)
\]

Say S_mk changes as

\[
S'_{mk} \;=\; S_{mk} + t
\qquad (18)
\]

where t is a parameter. The determinant now can be written as a function of the parameter t:

\[
|S'(t)| \;=\; \sum_{i}^{N} S'_{ik}\cdot (S')^{(i,k)} \;=\; |S| \;+\; t\cdot S^{(m,k)}
\qquad (19)
\]

The determinant thus is a linear function of the parameter t, the slope and the intercept being the cofactor of t (or S_mk) and |S| respectively. A first-order cofactor therefore can be looked upon as the first derivative of the determinant with respect to the corresponding element of the matrix. A second-order cofactor is a first-order cofactor of a first-order cofactor. As far as the signs are concerned this is not evident: the parity could be spoiled by the fact that the submatrix corresponding to the first-order cofactor has fewer rows and columns. However, say the first-order cofactor is S^(i,k), and the second-order cofactor is made from S^(i,k) by striking out the jth row and the lth column in the submatrix corresponding to S^(i,k). If i < j and k < l, the jth row and the lth column of S appear as the (j-1)th row and the (l-1)th column of the submatrix; the associated parity (-1)^{(j-1)+(l-1)} = (-1)^{j+l} combines with (-1)^{i+k} to the usual second-order factor (-1)^{i+j+k+l}, so the derivative picture carries the correct sign:

\[
(\mathrm{adj}(S))_{ik} \;=\; S^{(k,i)} \;=\; \frac{\partial |S|}{\partial S_{ki}}
\qquad (20)
\]

\[
(\mathrm{adj}^{(2)}(S))_{ij,kl} \;=\; S^{(kl,ij)} \;=\; \frac{\partial^2 |S|}{\partial S_{ki}\,\partial S_{lj}}
\]
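The derivative picture of eqn. (20) and its second-order analogue can be verified by finite differences, since the determinant is linear in each individual element. A sketch on a random 4×4 matrix (index choices arbitrary, names ours):

```python
import numpy as np

def cof1(M, i, k):
    """First-order cofactor M^(i,k): strike row i and column k."""
    sub = np.delete(np.delete(M, i, axis=0), k, axis=1)
    return (-1) ** (i + k) * np.linalg.det(sub)

def cof2(M, i, j, k, l):
    """Second-order cofactor M^(ij,kl): strike rows i<j and columns k<l."""
    sub = np.delete(np.delete(M, [i, j], axis=0), [k, l], axis=1)
    return (-1) ** (i + j + k + l) * np.linalg.det(sub)

rng = np.random.default_rng(1)
S = rng.normal(size=(4, 4))

# d|S| / dS[1,2] equals the first-order cofactor S^(1,2).
eps = 1e-6
Sp, Sm = S.copy(), S.copy()
Sp[1, 2] += eps
Sm[1, 2] -= eps
d1 = (np.linalg.det(Sp) - np.linalg.det(Sm)) / (2 * eps)
assert np.isclose(d1, cof1(S, 1, 2))

# d^2|S| / dS[0,1] dS[2,3] equals the second-order cofactor S^(02,13).
def det_shifted(M, shifts):
    X = M.copy()
    for r, c, t in shifts:
        X[r, c] += t
    return np.linalg.det(X)

e = 1e-4
d2 = (det_shifted(S, [(0, 1, e), (2, 3, e)])
      - det_shifted(S, [(0, 1, e), (2, 3, -e)])
      - det_shifted(S, [(0, 1, -e), (2, 3, e)])
      + det_shifted(S, [(0, 1, -e), (2, 3, -e)])) / (4 * e * e)
assert np.isclose(d2, cof2(S, 0, 2, 1, 3), atol=1e-6)
```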

An efficient scheme for cofactor evaluation

Bearing in mind the concept of volume, the determinant in an orthogonal, diagonal representation can be seen to be a function of only N independent parameters, namely N lengths. Let us assume S can be transformed to this form through pre- and post-multiplication by L and R, both being N×N matrices having a unit determinant. L transforms the old basis (see preceding section) and R transforms the new basis, so that each transformed new basis vector is parallel to just one transformed old basis vector, or has vanished:

L·S·R = d, with (d)_ik = 0 for i ≠ k and |L| = |R| = 1

so

\[
|S| \;=\; |d| \;=\; \prod_{i}^{N} d_{ii}
\qquad (21)
\]

There are two ways in which L and R can be constructed. (i) In N steps, the ith vector of the old basis and the ith vector of the new basis are projected out of all vectors yet to come ("asymmetrical Schmidt orthogonalisation", biorthogonalisation) [2,25]. The result then will depend on the order of the vectors, and L and R are non-unitary. (ii) Diagonalisation of the new metric S†S, yielding the eigenvectors V (= R) in the old basis. The eigenvectors in the new basis SV are an orthogonal set and hence can be made diagonal by pre-multiplication with a unitary matrix. This matrix is usually denoted U† (= L). This technique ("asymmetrical Löwdin orthogonalisation", singular-value decomposition [42], corresponding orbital transformation [24,25]) yields unitary L and R matrices. Since we are not interested in the appearance of the transformed vectors we prefer biorthogonalisation, because it is simpler. In the diagonal representation the derivatives (cofactors) can be easily established:

\[
(\mathrm{adj}(d))_{kk} \;=\; \frac{\partial (d_{11}\,d_{22}\,d_{33}\cdots d_{NN})}{\partial d_{kk}} \;=\; \prod_{i\neq k}^{N} d_{ii}
\qquad (22)
\]

\[
(\mathrm{adj}^{(2)}(d))_{ij,kl} \;=\; \frac{\partial^2 |d|}{\partial d_{ki}\,\partial d_{lj}} \;=\; \delta_{ik}\,\delta_{jl} \prod_{m\neq i,j}^{N} d_{mm}
\qquad (23)
\]

Writing S = L^{-1}·d·R^{-1} and using

\[
\begin{aligned}
\mathrm{adj}(A\,B\,C) &= \mathrm{adj}(C)\cdot \mathrm{adj}(B)\cdot \mathrm{adj}(A) \\
\mathrm{adj}^{(2)}(A\,B\,C) &= \mathrm{adj}^{(2)}(C)\cdot \mathrm{adj}^{(2)}(B)\cdot \mathrm{adj}^{(2)}(A) \\
\mathrm{adj}^{(2)}(A) &= |A|^{-1}\cdot (\mathrm{adj}(A))^{(2)} \\
|L| = |R| = 1 \;\Rightarrow\; & \mathrm{adj}(L^{-1}) = L, \quad \mathrm{adj}^{(2)}(L^{-1}) = L^{(2)}, \quad \mathrm{adj}(R^{-1}) = R, \quad \mathrm{adj}^{(2)}(R^{-1}) = R^{(2)}
\end{aligned}
\]

one obtains

\[
\mathrm{adj}(S) \;=\; R\cdot \mathrm{adj}(d)\cdot L
\qquad (24)
\]
\[
\mathrm{adj}^{(2)}(S) \;=\; R^{(2)}\cdot \mathrm{adj}^{(2)}(d)\cdot L^{(2)}
\qquad (25)
\]

The theorem involving A only is the Jacobi ratio theorem [17], which requires A to be non-singular. This always is the case for L and R. The definition of the compound matrices (A^(2)) is given in Appendix 1. It is noted that the transformation properties of the adjugates are somewhat odd: one might have expected the undoing of the transformation by L and R to involve their inverses (see Appendix 2). Expressions (24) and (25) can be inserted in Löwdin's formula:

\[
\langle A \,|\, \hat H \,|\, B \rangle \;=\; \mathrm{Tr}\,[\,h\cdot \mathrm{adj}(S)\,] + \mathrm{Tr}\,[\,G\cdot \mathrm{adj}^{(2)}(S)\,]
\;=\; \mathrm{Tr}\,[\,h\cdot R\cdot \mathrm{adj}(d)\cdot L\,] + \mathrm{Tr}\,[\,G\cdot R^{(2)}\cdot \mathrm{adj}^{(2)}(d)\cdot L^{(2)}\,]
\qquad (26)
\]

This route for matrix element evaluation will be called possibility I: the cofactors are calculated in the biorthogonal basis and transformed to the orbital basis, after which they are used in Löwdin's formula [1,2,4,5,7-9,14,15]. The second-order cofactors should not be transformed using eqn. (25). If this equation were used the number of operations would be O(N^6), as the matrix multiplications involve matrices of dimension N·(N-1)/2. Instead, by using the so-called Binet-Cauchy theorem [17], the matrix multiplications can be performed before the compound is formed. The theorem reads as follows: given some square matrices of equal dimension, the product of the kth order compound matrices is equal to the kth order compound of the product of the matrices:

\[
A^{(k)}\cdot B^{(k)}\cdot C^{(k)} \;=\; (A\cdot B\cdot C)^{(k)}
\qquad (27)
\]

The reason for this by no means obvious property of compound matrices is that they are orderly "blown up" matrices. This theorem can be used if a matrix exists (e, say), the second-order compound of which is proportional to the second-order adjugate of d:

\[
\mathrm{constant}\cdot e^{(2)} \;=\; \mathrm{adj}^{(2)}(d) \;\Rightarrow\; \mathrm{adj}^{(2)}(S) \;=\; \mathrm{constant}\cdot (R\cdot e\cdot L)^{(2)}
\qquad (28)
\]

This allows for the transformation of the O(N^4) cofactors using O(N^4) operations. So how does one find e? The second-order adjugate of d has a very simple structure: it is a diagonal matrix having at diagonal position (j-1)·(j-2)/2 + i (with i < j) the product of all but the ith and the jth diagonal elements of the diagonal matrix. If d is non-singular, e can be seen to be a diagonal matrix too, having the reciprocal ith diagonal element of d at the ith diagonal position, that is, e is the inverse of d. The constant is the product of the diagonal elements of d, that is, the constant is the determinant of d, so

\[
\mathrm{adj}^{(2)}(S) \;=\; \mathrm{constant}\cdot (R\cdot e\cdot L)^{(2)} \;=\; |d|\cdot (R\cdot d^{-1}\cdot L)^{(2)}
\qquad (29)
\]

The product of R, d^{-1} and L is the inverse of S, so that, via the simple relation between the inverse of a matrix and its adjugate, the first-order cofactors can be used to calculate the second-order cofactors:

\[
S^{(ij,kl)} \;=\; \frac{S^{(i,k)}\,S^{(j,l)} \;-\; S^{(i,l)}\,S^{(j,k)}}{|S|}
\qquad (30)
\]
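The Binet-Cauchy property (eqn. (27)) that makes this shortcut legitimate can itself be checked numerically. The sketch below builds the second-order compound from 2×2 minors, with row and column pairs in lexicographic order (our indexing convention, not the paper's):

```python
import numpy as np
from itertools import combinations

def compound2(M):
    """Second-order compound matrix M^(2): all 2x2 minors of M,
    rows/columns indexed by pairs (i < j) in lexicographic order."""
    n = M.shape[0]
    pairs = list(combinations(range(n), 2))
    C = np.empty((len(pairs), len(pairs)))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            C[a, b] = M[i, k] * M[j, l] - M[i, l] * M[j, k]
    return C

rng = np.random.default_rng(5)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
# Binet-Cauchy for compounds: (A.B)^(2) = A^(2).B^(2)
assert np.allclose(compound2(A @ B), compound2(A) @ compound2(B))
```

For an N×N matrix the compound has dimension N·(N-1)/2, which is exactly why transforming adj^(2) with eqn. (25) directly would cost O(N^6).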

Equation (30) is the Jacobi ratio theorem for second-order adjugates [17]. From a computational point of view, the Binet-Cauchy theorem states the clue. It should be noted that the fact that for one-configuration wavefunctions the two-electron density matrix is completely determined by the one-electron density matrix [20] is a statement of the Jacobi ratio theorem. The scheme must be adapted if S is singular, but this does not present a serious difficulty (see section on algorithms for calculating all first- and second-order cofactors of S).

Instead of transforming the cofactors back to the original basis one also can transform the one- and two-electron integrals to the new (biorthogonal) basis. This form can be arrived at by performing a similarity transformation with L^{-1} for the one-electron part and (L^{(2)})^{-1} for the two-electron part at the right-hand side of eqn. (26):

\[
\langle A \,|\, \hat H \,|\, B \rangle \;=\; \mathrm{Tr}\,[\,L\cdot h\cdot R\cdot \mathrm{adj}(d)\,] + \mathrm{Tr}\,[\,L^{(2)}\cdot G\cdot R^{(2)}\cdot \mathrm{adj}^{(2)}(d)\,]
\;=\; \mathrm{Tr}\,[\,h'\cdot \mathrm{adj}(d)\,] + \mathrm{Tr}\,[\,G'\cdot \mathrm{adj}^{(2)}(d)\,]
\qquad (31)
\]

with h' = L·h·R and G' = L^(2)·G·R^(2), so h' and G' stand for the transformed integrals. The integral transformation requires O(N^5) operations per matrix element. This scheme will be called possibility II: the cofactors are calculated in the biorthogonal basis, and the one- and two-electron integrals are transformed to the same basis, after which Löwdin's formula is used [26,27].

Once the investment in the integral transformation has been done, possibility II allows for an increase in efficiency. As mentioned above, the adjugates of d are very sparse: they only have non-zero elements at the diagonal. Therefore only the diagonal elements of h' and G' appear in the trace, and the matrix multiplications and the traces take O(N^2) operations. If d is singular once, adj(d) has only one non-zero element, picking out just one one-electron integral. adj^(2)(d) then has (N-1) non-zero diagonal elements, thus allowing only the corresponding two-electron integrals to appear in the trace. The number of operations required is then O(N) per matrix element. If d is singular twice all one-electron integrals are weighted by zeros, and just one element of G' contributes to the matrix element. The effort then is O(N^0) only. These short-cuts in Löwdin's formula in fact are the Slater-Condon rules for biorthogonal orbitals [26]. If {a} and {b} are members of a common orthonormal basis, L, R, L^(2) and R^(2) are unit matrices, rendering the transformation to h' and G' superfluous. The Slater-Condon rules then apply directly to the original integrals. This reflects the reason why, per matrix element, orthonormal calculations are much more efficient than non-orthogonal calculations. In the orthonormal case the adjugates contain just ones and zeros. The ones just pick out the relevant integrals for the matrix element.
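The sparsity counting above — one surviving one-electron integral when d is singular once, and N-1 surviving diagonal entries in the second-order adjugate — can be illustrated for a small diagonal d (the numbers are arbitrary, the helper names ours):

```python
import numpy as np

def adj_diag(d):
    """First-order adjugate of a diagonal matrix, given its diagonal d:
    diagonal entries are products of all but one diagonal element (eqn 22)."""
    n = len(d)
    return np.diag([np.prod(np.delete(d, i)) for i in range(n)])

d = np.array([2.0, 3.0, 0.0, 5.0])      # N = 4, d singular once
A = adj_diag(d)
assert np.count_nonzero(A) == 1          # only one one-electron integral survives
assert A[2, 2] == 2.0 * 3.0 * 5.0

# Second-order adjugate diagonal: products of all but two diagonal elements (eqn 23).
pairs = [(i, j) for j in range(4) for i in range(j)]
a2 = [np.prod(np.delete(d, [i, j])) for (i, j) in pairs]
assert sum(1 for v in a2 if v != 0.0) == 3   # N - 1 = 3 two-electron entries survive
```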
Indeed, the Slater-Condon rules are in fact the relations between the structure of the matrix of overlap integrals S and its first- and second-order adjugates. In the orthogonal case the distribution of the ones in the adjugates bears a simple relationship to the occupation numbers of the spin orbitals of the Slater determinants in question, so that the adjugates can be dispensed with altogether. The summations in Löwdin's formula are very much restricted because of the sparsity of the adjugates and only very few integrals are needed per matrix element. In the non-orthogonal case the function space is spanned in a much less orderly fashion, so that, per matrix element, one has to face the transformation of the cofactors to the orbital basis and the tiresome summations (possibility I), or the elaborate transformation of the one- and two-electron integrals (possibility II). The transformation of the cofactors is much cheaper than the transformation of the integrals, because the O(N^4) cofactors are characterised by O(N^2) numbers, whereas the O(N^4) integrals, from the practical viewpoint, are independent numbers. Therefore, as has been known for a long time [29,30], possibility I is faster than possibility II.

The effort associated with a four-index transformation scales with n_c·n_r^4 [12], where n_c is the number of orbitals occupied in the Slater determinants ("cooked" orbitals), and n_r is the number of "raw" orbitals. Therefore, if the number of requisite matrix elements is small (smaller than the number of "cooked" orbitals), it pays, per matrix element, to transform either the cofactors to the "raw" basis (possibility I') [6], or the "raw" integrals directly to the biorthogonal basis (possibility II') [30]. In the latter case the two transformations (1, "raw" to "cooked"; 2, "cooked" to biorthogonal) are combined as one, and given the willingness to transform integrals per matrix element, the non-orthogonality problem is alleviated for free.
Still, possibility I' is more efficient, because then only the first-order cofactors need to be transformed to the "raw" basis and the "raw" second-order cofactors can be evaluated directly using the "raw" first-order cofactors. It is striking that this kind of reasoning is parallelled completely in the theory of analytical gradient methods (cf. eqn. (32) in ref. 31). It should be realised that in these algorithms all raw integrals contribute to every matrix element. A third possibility [28,32-36] seems to be the transformation of all integrals to a dual, biorthogonal basis [37], before the matrix elements are calculated.

In this basis the orbitals of the "left" Slater determinants are biorthogonal to the orbitals of the "right" Slater determinants. It should be noted that the total number of orbitals then doubles, the integrals lose some of their symmetry, and a matrix representation of a hermitian operator becomes non-hermitian. The biorthogonal Slater-Condon rules, or more sophisticated formulae [32,33,35,36], then apply directly, without the O(N^5) integral transformation per matrix element. Only when the configuration space is exploited to the full (full configuration interaction (CI), CAS) is the method exact [28]. Truncation of the wavefunction renders the matrix elements approximate, and the results are disappointing [38], though in the case of weakly bonded systems the approximation has been reported as "not too serious" [36]. We want to be able to calculate a few million exact non-orthogonal matrix elements. Therefore our attention will be devoted to possibility I, using integrals in the "cooked" orbital basis.

The L-d-R decomposition of S [2]

The purpose of the decomposition is to eliminate the off-diagonal elements of S by means of non-singular transformations (biorthogonalisation). Though it is not essential, we restrict ourselves to transformations which leave the determinant unchanged. If a matrix S is post-multiplied by a unit matrix, augmented with a non-zero element at position (i,j) above the diagonal (i < j), the result is a weighted addition of the ith column to the jth column; pre-multiplication by a unit matrix augmented with a non-zero element below the diagonal likewise results in a weighted addition of one row to another. In principle, all off-diagonal elements of S can be eliminated this way, so that chains of these operations can turn S into a diagonal matrix, without changing its determinant. Along the way, care must be taken not to let a later elimination introduce a non-zero element at an already blanked position of S. This can be achieved by following a strict order. Knowing that the post-multiplying matrices are going to add columns to columns, one should have the ith post-multiplying matrix (R_i) clear all elements in the ith row to the right of the ith diagonal element. This transformation matrix then is a unit matrix, augmented with non-zero elements to the right of the ith diagonal position. One has (N-1) rows to clear, so one needs (N-1) post-multiplying matrices. Likewise, the ith pre-multiplying matrix (L_i) wipes out the elements of the ith column below the ith diagonal element. It should be noted that in the ith step the elements lying both right of and below the ith diagonal element change too. The whole procedure symbolically is

L_{N-1}·L_{N-2}···L_1·S·R_1·R_2···R_{N-1} = d

\[
L \;=\; \prod_{i=N-1}^{1} L_i, \qquad R \;=\; \prod_{i=1}^{N-1} R_i
\qquad (32)
\]

L has zeros to the right of the diagonal, R has zeros below the diagonal. Both matrices have a unit diagonal. The accumulations of L and R take only partial matrix multiplications (O(N^2)). For example, in a six-dimensional case, after three L and three R operations

\[
L_3 L_2 L_1 \cdot S \cdot R_1 R_2 R_3 \;=\;
\begin{pmatrix}
d_{11} & 0 & 0 & 0 & 0 & 0 \\
0 & d_{22} & 0 & 0 & 0 & 0 \\
0 & 0 & d_{33} & 0 & 0 & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times
\end{pmatrix}
\qquad (33)
\]

(crosses denote elements yet to be eliminated).

This technique (elimination) is a standard method for solving a set of linear equations. Detailed descriptions can be found in most textbooks on numerical linear algebra [39-41,43]. The number of operations is O(N^3). Before the ith step one should permute rows and columns so that the largest element of the submatrix yet to be cleared occurs in the ith diagonal position (pivot search), to assure numerical stability. The symbolic definition of the Slater determinants changes accordingly, as does the parity associated with the permutations. If S is singular this pivot search pushes the singularity forward, so that it will show at the final diagonal position(s). This is helpful as the singularity is localised, so that in further stages it can be dealt with easily. During the decomposition the Slater determinants become reordered. If they happened to have this new order from the start the pivot search would not have had any effect, so that it is permitted to omit the pivot search in the mathematical treatment [40].
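The elimination can be sketched as follows. This is a minimal version without the pivot search, so it assumes the leading pivots stay non-zero (here guaranteed by making the test matrix strongly diagonal); `ldr` is our name, not the paper's:

```python
import numpy as np

def ldr(S):
    """L-d-R decomposition by elimination: returns L, d, R with
    L @ S @ R = d diagonal and |L| = |R| = 1 (no pivot search)."""
    N = S.shape[0]
    d = S.astype(float).copy()
    L = np.eye(N)
    R = np.eye(N)
    for i in range(N):
        if d[i, i] == 0.0:
            continue  # a singularity; a pivot search would push it to the end
        # R_i: clear row i to the right of the diagonal (column operations).
        Ri = np.eye(N)
        Ri[i, i + 1:] = -d[i, i + 1:] / d[i, i]
        d = d @ Ri
        R = R @ Ri
        # L_i: clear column i below the diagonal (row operations).
        Li = np.eye(N)
        Li[i + 1:, i] = -d[i + 1:, i] / d[i, i]
        d = Li @ d
        L = Li @ L
    return L, d, R

rng = np.random.default_rng(3)
S = rng.normal(size=(5, 5)) + 4.0 * np.eye(5)   # comfortably non-singular pivots
L, d, R = ldr(S)
assert np.allclose(L @ S @ R, d)
assert np.allclose(d, np.diag(np.diag(d)))                  # d is diagonal
assert np.isclose(np.linalg.det(S), np.prod(np.diag(d)))    # eqn (21)
```

For clarity the sketch forms full N×N transformation matrices per step (O(N^4) total); the partial multiplications described in the text bring this down to O(N^3).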

Algorithms for calculating all first- and second-order cofactors of S

The aforementioned considerations have led us to select the following algorithms. If the nullity (number of zeros at the diagonal of d, dimension minus rank) of S is three or more, all relevant cofactors are zero. We distinguish three more cases depending on the nullity being 0, 1 or 2.

It is assumed that S has been decomposed through L-d-R decomposition (L·S·R = d). All reasoning holds in any case where the matrix has been transformed to diagonal form without changing the determinant.

The nullity of S is zero

The first-order cofactors are evaluated via

\[
\mathrm{adj}(S) \;=\; |d|\cdot (R\cdot d^{-1}\cdot L)
\qquad (34)
\]

This equation is a slight modification of eqn. (24): the use of d^{-1} instead of the adjugate reduces the number of operations (d^{-1} = |d|^{-1}·adj(d)). The determinant of d is simply the product of the diagonal elements of d, while d^{-1} has (d_ii)^{-1} at the ith diagonal position, and zeros elsewhere. The number of operations is dominated by the matrix multiplications and is O(N^3). The second-order cofactors are evaluated using the Jacobi ratio theorem:

\[
S^{(ij,kl)} \;=\; \frac{S^{(i,k)}\,S^{(j,l)} \;-\; S^{(i,l)}\,S^{(j,k)}}{|S|}
\qquad (35)
\]

The evaluation of an element of the second-order adjugate requires only two multiplications and one subtraction (a 2×2 determinant of first-order cofactors), so that the total effort to construct all the second-order cofactors scales with N⁴. These algorithms were suggested by Löwdin [1].
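The non-singular case can be sketched directly. This is an illustrative check, assuming numpy; for clarity the first-order adjugate is obtained as |S|·S⁻¹ (equivalent to eqn. (34), since S⁻¹ = R·d⁻¹·L), and each second-order cofactor is then a 2×2 determinant of first-order cofactors divided by |S|, as in the Jacobi ratio theorem. The function names are ours.

```python
import numpy as np

def adjugate(S):
    """First-order adjugate of a non-singular S: adj(S) = |S| * S^{-1}.
    The cofactor S^(i,j) sits at the transposed position adj[j, i]."""
    return np.linalg.det(S) * np.linalg.inv(S)

def second_cofactor_jacobi(S, adj, k, l, i, j):
    """Jacobi ratio theorem: second-order cofactor S^(k,l,i,j)
    (rows k<l and columns i<j deleted) from a 2x2 determinant
    of first-order cofactors, divided by |S|."""
    c = lambda r, s: adj[s, r]                 # first-order cofactor S^(r,s)
    return (c(k, i) * c(l, j) - c(k, j) * c(l, i)) / np.linalg.det(S)

def second_cofactor_direct(S, k, l, i, j):
    """Reference value: explicit signed minor with rows k,l and
    columns i,j struck out."""
    M = np.delete(np.delete(S, [k, l], axis=0), [i, j], axis=1)
    return (-1) ** (k + l + i + j) * np.linalg.det(M)
```

Per cofactor the Jacobi route costs two multiplications and one subtraction (the division by |S| can be folded into one of the factors once), in line with the N⁴ scaling claimed in the text.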

The nullity of S is one

The first-order cofactors are evaluated via

adj(S) = R·adj(d)·L   (36)

d has one zero diagonal element, so that its adjugate has just one non-zero element. The adjugate of S thus has rank one, and its elements must be products of the elements of just two N-dimensional vectors. These can be seen to be the last column of R and the last row of L, one of them weighted by the non-zero element of the adjugate of d. The effort is O(N²). Unfortunately, for the second-order cofactors there is a problem: the Jacobi ratio theorem breaks down. However, it still is of use. By definition, a matrix of nullity one must have at least one non-zero first-order cofactor, say S^(p,q). This allows for the introduction of a parameter t that controls the singularity of S:

s′_pq = s_pq + t,   s′_kl = s_kl for (k,l) ≠ (p,q)   (37)

|S′(t)| = |S| + t·S^(p,q) = t·S^(p,q)

All first- and second-order cofactors now are linear functions of the parameter, as can be seen by expansion along column q (cf. eqns. (17)-(19)). For the first-order cofactors

(S′(t))^(k,l) = Σ_{i≠k} s′_iq·(S′(t))^(k,i,l,q)

= Σ_{i≠k} s_iq·S^(k,i,l,q) + t·S^(k,p,l,q)

= S^(k,l) + t·S^(k,p,l,q)   (38)

For the second-order cofactors

(S′(t))^(k,l,i,j) = Σ_{m≠k,l} s′_mq·(S′(t))^(k,l,m,i,j,q)

= Σ_{m≠k,l} s_mq·S^(k,l,m,i,j,q) + t·S^(k,l,p,i,j,q)

= S^(k,l,i,j) + t·S^(k,l,p,i,j,q)   (39)

The first three superscripts of the third-order cofactors refer to deleted rows, the second three to deleted columns. By no means do we want to use these awkward formulae; they serve merely to prove the linearity. The desired second-order adjugate of S equals the second-order adjugate of S′(t) for t = 0. Knowing that the elements of the adjugate of S′(t) are linear in t, the second-order adjugate of S is seen to be accessible through interpolation of the second-order adjugate of S′(t) for two reasonable values of t, e.g. ±1:

adj^(2)(S) = ½·[adj^(2)(S′(t))_{t=1} + adj^(2)(S′(t))_{t=−1}]   (40)

For both values of t the adapted S matrix is non-singular, so that the method used in the case of nullity zero can be invoked to get the adjugates at t = ±1. The fact that the singularity is nicely localised by the pivot search facilitates the construction of S′: one simply has to put a 1 or a −1 at d_NN and go on as for a non-singular problem. The two adjugates at t = ±1 are intermediates only, so their elements can be combined into the elements of adj^(2)(S) right away. The effort still scales with N⁴; compared with the case of nullity zero, it requires twice as many operations. Algorithms like these were reported by van Montfort [5], Broer-Braam [6], Gallup et al. [4] and Leasure and Balint-Kurti [9]. The use of interpolation, which is reminiscent of the parameter method of Gallup et al., was suggested to us by Dr. W.I.I. van der Kallen and Dr. S.J. van Edixhoven of the Mathematical Institute of the University of Utrecht. It should be noted that the solution of Prosser and Hagstrom [2] is an O(N⁵) algorithm.
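The nullity-one case can be checked numerically. The sketch below, with names of our own choosing, verifies two claims: the first-order adjugate of a nullity-one matrix has rank one, and the interpolation of eqn. (40) at t = ±1 reproduces the second-order adjugate exactly (each cofactor of S′(t) is linear in t, because t enters a single matrix element). For clarity the cofactors of the shifted matrices are computed here as explicit minors rather than via the Jacobi ratio theorem.

```python
import numpy as np

def first_cofactors(S):
    """All first-order cofactors S^(i,j) as a matrix (untransposed)."""
    n = len(S)
    return np.array([[(-1) ** (i + j) *
                      np.linalg.det(np.delete(np.delete(S, i, 0), j, 1))
                      for j in range(n)] for i in range(n)])

def second_adjugate(S):
    """All second-order cofactors S^(k,l,i,j) (rows k<l, columns i<j
    deleted), gathered with lexicographically ordered pair labels."""
    n = len(S)
    pairs = [(i, j) for j in range(n) for i in range(j)]
    out = np.empty((len(pairs), len(pairs)))
    for a, (k, l) in enumerate(pairs):
        for b, (i, j) in enumerate(pairs):
            M = np.delete(np.delete(S, [k, l], 0), [i, j], 1)
            out[a, b] = (-1) ** (k + l + i + j) * np.linalg.det(M)
    return out

def second_adjugate_by_interpolation(S, p, q):
    """Eqn (40): with S'(t) = S + t*E_pq, every second-order cofactor is
    linear in t, so the value for the singular S (t = 0) is the average
    of the values at t = +1 and t = -1, where S'(t) is non-singular."""
    E = np.zeros_like(S)
    E[p, q] = 1.0
    return 0.5 * (second_adjugate(S + E) + second_adjugate(S - E))
```

The position (p,q) should carry a non-zero first-order cofactor, so that |S′(±1)| = ±S^(p,q) ≠ 0.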

The nullity of S is two

All first-order cofactors are zero. Let us recall the decomposition of the second-order adjugate in terms of L, d and R:

adj^(2)(S) = R^(2)·adj^(2)(d)·L^(2)   (41)

Analogously to adj(d) in the case of nullity one, adj^(2)(d) now has just one non-zero element, so that

adj^(2)(S) = constant·x·yᵗ   (42)

x and yᵗ can readily be identified as the last column of R^(2) and the last row of L^(2), whereas the constant equals the non-zero element of the second-order adjugate of d:

(x)_ij = (R^(2))_{ij,(N−1,N)}

(y)_kl = (L^(2))_{(N−1,N),kl}   (43)

constant = ∏_{i=1}^{N−2} d_ii

Here x, R^(2), y and L^(2) carry the [(j−1)·(j−2)/2 + i, i < j] type labelling, whereas d has normal matrix labels. The construction of x and y takes O(N²) operations, so that overall the bottle-neck is the combination of the N·(N−1)/2 elements of x with the same number of elements of y. Compared with the case of nullity zero, this process takes one-third of the number of operations. This algorithm was devised by Prosser and Hagstrom [2].
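The structural claim of eqn. (42) can be verified by brute force. The sketch below (our own illustration, assuming numpy) does not implement the compound-matrix construction of eqns. (41)-(43); it merely checks that for a matrix of nullity two all first-order cofactors vanish and the matrix of second-order cofactors is a rank-one outer product.

```python
import numpy as np

def second_adjugate(S):
    """Matrix of all second-order cofactors of S, with row-pair (k<l)
    and column-pair (i<j) labels in (j-1)(j-2)/2 + i style order."""
    n = len(S)
    pairs = [(i, j) for j in range(n) for i in range(j)]
    out = np.empty((len(pairs), len(pairs)))
    for a, (k, l) in enumerate(pairs):
        for b, (i, j) in enumerate(pairs):
            M = np.delete(np.delete(S, [k, l], axis=0), [i, j], axis=1)
            out[a, b] = (-1) ** (k + l + i + j) * np.linalg.det(M)
    return out

# a 5x5 matrix of rank 3, i.e. nullity two
rng = np.random.default_rng(4)
S = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 5))
adj2 = second_adjugate(S)
```

With N = 5 the second-order adjugate is a 10×10 matrix; its rank-one structure is what makes the x·yᵗ factorisation of eqn. (42) possible.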

DISCUSSION

For an arbitrary overlap matrix S the calculation of all the first- and second-order cofactors is dominated by the O(N⁴) second-order cofactors and takes O(N⁴) operations at most. On average one needs just about three floating point operations per second-order cofactor. The evaluation of a matrix element using Löwdin's formula therefore requires O(N⁴) operations. The transformation of second-order cofactors is much cheaper than the transformation of two-electron integrals because the cofactors are interrelated by a quadratic dependency, as opposed to the two-electron integrals which, from the practical viewpoint, are independent numbers. Therefore, if two-electron integrals are involved, one should transform cofactors (possibility I), not integrals (possibility II). Straightforward use of Löwdin's formula, either in the orbital basis or the "raw" basis, is more efficient than the use of biorthogonal or corresponding orbitals.

In Löwdin's expression for a matrix element involving a one- and a two-electron operator, a second-order cofactor will be combined with the difference of two two-electron integrals. In general, these integrals are stored in a supermatrix in which the position defines the integral. The integral addresses must be calculated from the orbital labels, the integrals must be fetched, and they have to be subtracted, before the cofactor, which took about three floating point operations, enters the picture. The integrals must be multiplied by the cofactor, and the results must be summed. It is clear that the effort associated with all this is at least comparable with the effort associated with the evaluation of the cofactors. This conclusion is confirmed by actual timings in our code, and is in complete accordance with the findings of Gallup [44]. It is incompatible with the analysis of Raimondi and Gianinetti [13]. The evaluation of the cofactors does not cause Löwdin's formula to be impracticable, so that, if only for reasons of simplicity, they can be evaluated in situ. Fetching unique pre-computed cofactors, which necessitates extra address calculations, is probably about as tiresome as recalculating them whenever needed. Generally, the matrix of overlap integrals will change only gradually in going from one matrix element to another. The decomposition then will change gradually too ("Woodbury" or "Sherman/Morrison" formulae, "reinforcement method", cf. refs. 39 and 41), so that the new cofactors can be related to the old ones [14,15]. This philosophy (i.e. the use of the information common to several matrix elements) will be beneficial only if the parts of the summations involving these second-hand cofactors and integrals are not repeated [3,45].
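The cost argument can be made concrete with a schematic inner loop. This sketch is ours, not the authors' code: the dense integral array `g`, its index convention, and the pair labelling are illustrative assumptions, chosen only to show that each term involves address calculations, two fetches, a subtraction, a multiplication and an addition, i.e. work at least comparable to the roughly three flops spent on the cofactor itself.

```python
import numpy as np

def two_electron_contraction(g, adj2, pairs):
    """Schematic contraction of second-order cofactors with differences
    of two-electron integrals.  Per term: two integral fetches, one
    subtraction, one multiplication, one addition -- already comparable
    to the ~3 flops that produced the cofactor adj2[a, b]."""
    total = 0.0
    for a, (k, l) in enumerate(pairs):
        for b, (i, j) in enumerate(pairs):
            total += (g[k, i, l, j] - g[k, j, l, i]) * adj2[a, b]
    return total
```

In a real code `g` would be a supermatrix addressed through computed labels; the dense array here stands in for that lookup.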

ACKNOWLEDGEMENT

We wish to thank Dr. P.J.A. Ruttink and Dr. F.B. van Duijneveldt for their comments on the manuscript and Dr. W.I.I. van der Kallen and Dr. S.J. van Edixhoven for their unseen understanding of the singularity problem. The stimulating discussions with Dr. G.G. Balint-Kurti and Dr. H.B. Broer are greatly appreciated.

REFERENCES

1 P.O. Löwdin, Phys. Rev., 97 (1955) 1474.
2 F. Prosser and S. Hagstrom, Int. J. Quantum Chem., 2 (1968) 89.
3 D.L. Cooper, J. Gerratt and M. Raimondi, Adv. Chem. Phys., 69 (1987) 319.
4 G.A. Gallup, R.L. Vance, J.R. Collins and J.M. Norbeck, Adv. Quantum Chem., 16 (1982) 229.
5 J.Th. van Montfort, Ph.D. Thesis, University of Groningen, 1980.
6 H.B. Broer-Braam, Ph.D. Thesis, University of Groningen, 1981.

7 P.C. Hiberty and J.M. Lefour, personal communication, 1988. See for example, P.C. Hiberty and J.M. Lefour, J. de Chim. Phys., 5 (1987) 84.
8 R.D. Harcourt and W. Roso, Can. J. Chem., 56 (1978) 1093.
9 S.C. Leasure and G.G. Balint-Kurti, Phys. Rev., Ser. A, 31 (1985) 2107.
10 J.H. van Lenthe and G.G. Balint-Kurti, J. Chem. Phys., 78 (1983) 5699.
11 M. Yoshimine, J. Comput. Phys., 11 (1973) 449.
12 V.R. Saunders and J.H. van Lenthe, Mol. Phys., 48 (1983) 923.
13 M. Raimondi and E. Gianinetti, J. Phys. Chem., 92 (1988) 899.
14 I.C. Hayes and A.J. Stone, Mol. Phys., 53 (1984) 69.
15 G. Figari and V. Magnasco, Mol. Phys., 55 (1985) 319.
16 A.F. Voter and W.A. Goddard III, Chem. Phys., 57 (1981) 253.
17 A.C. Aitken, Determinants and Matrices, Oliver and Boyd, Edinburgh, 1959.
18 J.C. Slater, Quantum Theory of Matter, 2nd edn., McGraw-Hill, New York, 1968, p. 222.
19 M. Raimondi, W. Campion and M. Karplus, Mol. Phys., 34 (1977) 1483.
20 R. McWeeny and B.T. Sutcliffe, Methods of Molecular Quantum Mechanics, Academic Press, London, 1969.
21 R. McWeeny, Int. J. Quantum Chem., 34 (1988) 25.
22 E.D. Nering, Elementary Linear Algebra, W.B. Saunders, Philadelphia, 1974.
23 V.I. Smirnov, Linear Algebra and Group Theory, McGraw-Hill, New York, 1961.
24 A.T. Amos and G.G. Hall, Proc. R. Soc. London, Ser. A, 263 (1961) 483.
25 F.E. Harris, Adv. Quantum Chem., 3 (1967) 61.
26 H.F. King, R.E. Stanton, H. Kim, R.E. Wyatt and R.G. Parr, J. Chem. Phys., 47 (1967) 1936.
27 H. Ågren, R. Arneberg, J. Müller and R. Manne, Chem. Phys., 83 (1984) 53.
28 P.A. Malmqvist, Int. J. Quantum Chem., 30 (1986) 479.
29 F. Prosser and S. Hagstrom, J. Chem. Phys., 48 (1968) 4807.
30 H.F. King and R.E. Stanton, J. Chem. Phys., 48 (1968) 4808.
31 P. Pulay, Adv. Chem. Phys., 69 (1987) 241.
32 M. Moshinski and T.H. Seligman, Ann. Phys., 66 (1971) 311.
33 A.A. Cantu, D.J. Klein and F.A. Matsen, Theor. Chim. Acta, 38 (1975) 341.
34 J.M. Norbeck and R. McWeeny, Chem. Phys. Lett., 34 (1975) 206.
35 Z.E. Slatterly, Ph.D. Thesis, University of London, 1982.
36 W.H.H. Rijks, Ph.D. Thesis, University of Nijmegen, 1988, p. 120.
37 J.P. Dahl, Int. J. Quantum Chem., 14 (1978) 191.
38 M.A. Robb and R. McWeeny, personal communication, 1988.
39 A. Kielbasinski and H. Schwetlick, Numerische Lineare Algebra, VEB Deutscher Verlag der Wissenschaften, Berlin, 1988.
40 J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, New York, 1983.
41 D.K. Faddeev and V.N. Faddeeva, Computational Methods of Linear Algebra, Freeman, San Francisco, CA, 1963.
42 A. Ben-Israel, Generalized Inverses, Wiley, New York, 1974.
43 J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1988.
44 G.A. Gallup, Phys. Rev., Ser. A, 35 (1987) 35.
45 M. Raimondi, M. Simonetta and G.F. Tantardini, Comp. Phys. Rep., 2 (1985) 171.
46 B. Spain, Tensor Calculus, Oliver and Boyd, Edinburgh, 1960.
47 E.A. Hylleraas, Mathematical and Theoretical Physics, Vol. I, Wiley, New York, 1970, p. 50.
48 J. Verbeek and J.H. Langenberg, TURTLE, an ab initio VB/VBSCF/VBCI program, Theoretical Chemistry Group, Utrecht, 1989. J. Verbeek, Ph.D. Thesis, University of Utrecht, 1990.

APPENDIX I

A minor of a square, n-dimensional matrix A is the determinant of a matrix made from A by striking out rows and columns. We will distinguish first-order minors, obtained by striking out one row and one column, and second-order minors, obtained by striking out two rows and two columns. A signed first- or second-order minor, the sign being (−1)^(i+k) or (−1)^(i+j+k+l), where i and j refer to deleted rows, and k and l to deleted columns, is the first- or second-order cofactor. These cofactors will be denoted A^(i,k) or A^(i,j,k,l). With each element of A, a_ik, one can associate a cofactor A^(i,k). If the elements of a row of A are multiplied by their cofactors, and the products are summed, one gets the determinant of A (the usual Cauchy expansion). If the elements of a row are multiplied by the cofactors of another row and then added, one effectively expands a determinant which has two equal rows, so that this sum will be zero. A similar reasoning holds for columns. In order to represent this property of the cofactors in matrix form, one defines a matrix, called the first-order adjugate adj(A), which has the (i,j)th cofactor at the (j,i)th, i.e. transposed, position. The matrix product of adj(A) and the original A has zero off-diagonal elements, and the determinant of A at the diagonal positions:

adj(A)·A = A·adj(A) = |A|·1   (A1.1)

where 1 is a unit matrix. These equations show the relation between the adjugate of A and the inverse of A (A⁻¹ = |A|⁻¹·adj(A)), though it must be stressed that the adjugate of A always exists, whereas the inverse is defined for non-singular A only. It should be noted that if the columns of A and the rows of adj(A) are taken to be vectors in a linear space, the ith row of adj(A) is a vector that is orthogonal to all but the ith column of A. With respect to the columns of A the rows of adj(A) are the so-called reciprocal vectors, cf. the reciprocal lattice in x-ray diffraction [32,47].
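The defining properties of the first-order adjugate can be checked directly. A small sketch, assuming numpy (the function name is ours):

```python
import numpy as np

def adjugate_direct(A):
    """First-order adjugate: the (i,j)th cofactor of A is placed at the
    transposed position (j,i), as in Appendix 1."""
    n = len(A)
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj
```

The identity adj(A)·A = A·adj(A) = |A|·1 (eqn. (A1.1)) encodes both the Cauchy expansion (diagonal) and the equal-rows argument (off-diagonal zeros), and hence the reciprocal-vector property of the rows of adj(A).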
If one forms all n·(n−1)/2 2×2 determinants (a_ik·a_jl − a_il·a_jk) which can be constructed from rows i and j of A and multiplies them by their (second-order) cofactors, the sum again will yield the determinant of A (Laplace expansion [17]). Analogously to the first-order case, the use of second-order cofactors stemming from another choice of the rows yields zero. In order to represent this property of the second-order cofactors one defines two square matrices of dimension n·(n−1)/2. The first is the second-order compound matrix A^(2), which has (a_ik·a_jl − a_il·a_jk) at position (ij,kl); ij and kl are (j−1)·(j−2)/2 + i and (l−1)·(l−2)/2 + k respectively, with i < j and k < l. The second is the second-order adjugate adj^(2)(A), which has the (ij,kl)th second-order cofactor at the transposed position (kl,ij), so that, analogously to eqn. (A1.1),

adj^(2)(A)·A^(2) = A^(2)·adj^(2)(A) = |A|·1
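The second-order analogue of eqn. (A1.1) can be verified in the same spirit. A sketch with numpy, using the pair-label ordering given above (function names ours):

```python
import numpy as np

def compound2(A):
    """Second-order compound matrix: the 2x2 minor from rows (i,j) and
    columns (k,l) at position (ij, kl), pairs ordered by the
    (j-1)(j-2)/2 + i rule with i < j."""
    n = len(A)
    pairs = [(i, j) for j in range(n) for i in range(j)]
    return np.array([[A[i, k] * A[j, l] - A[i, l] * A[j, k]
                      for (k, l) in pairs] for (i, j) in pairs])

def adjugate2(A):
    """Second-order adjugate: the (ij,kl)th second-order cofactor at the
    transposed position (kl, ij)."""
    n = len(A)
    pairs = [(i, j) for j in range(n) for i in range(j)]
    m = len(pairs)
    out = np.empty((m, m))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            M = np.delete(np.delete(A, [i, j], axis=0), [k, l], axis=1)
            out[b, a] = (-1) ** (i + j + k + l) * np.linalg.det(M)
    return out
```

The diagonal of the product reproduces the Laplace expansion; the vanishing off-diagonal elements express the "other choice of rows" property.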

APPENDIX 2

Vectors, the components of which are partial derivatives of a function, are covariant vectors, i.e. a non-singular coordinate transformation in the coordinate space induces an inverse-transposed transformation in the gradient space [23,46]. For example, consider a differentiable function f(x₁,x₂,x₃) and its three-dimensional gradient vector v

v = (∂f/∂x₁, ∂f/∂x₂, ∂f/∂x₃)ᵗ   (A2.1)

Say the coordinates are transformed by a matrix A as

x′₁ = a₁₁·x₁ + a₁₂·x₂ + a₁₃·x₃
x′₂ = a₂₁·x₁ + a₂₂·x₂ + a₂₃·x₃   (A2.2)
x′₃ = a₃₁·x₁ + a₃₂·x₂ + a₃₃·x₃

The components of the original gradient vector can be related to the components of the new gradient vector (v′) using the chain rule for differentiation. For the first component

∂f/∂x₁ = (∂f/∂x′₁)·(∂x′₁/∂x₁) + (∂f/∂x′₂)·(∂x′₂/∂x₁) + (∂f/∂x′₃)·(∂x′₃/∂x₁)

= a₁₁·v′₁ + a₂₁·v′₂ + a₃₁·v′₃   (A2.3)

The others behave similarly, so we write

v = Aᵗ·v′  or  v′ = (Aᵗ)⁻¹·v   (A2.4)

The gradient transformation matrix is the inverse-transpose of the coordinate transformation matrix. In our case the function is the determinant of S, the coordinates are the elements of S, and the gradient is the collection of first-order cofactors. The cofactors can be represented by a matrix having the (i,j)th derivative at the (i,j)th position. This matrix then equals the transposed first-order adjugate, and is a collection of N N-dimensional covariant vectors, i.e. a covariant second-rank tensor, so that

L·S·R = d

adj(S) = R·adj(d)·L   (A2.5)

This result was obtained somewhat differently in eqn. (24). A vector that transforms like the coordinates is a contravariant vector. An inner product of a covariant and a contravariant vector is invariant with respect to any non-singular transformation:

(v′)ᵗ·w′ = (A·v)ᵗ·((Aᵗ)⁻¹·w) = vᵗ·Aᵗ·(Aᵗ)⁻¹·w = vᵗ·w   (A2.6)

where v is a contravariant vector, w is a covariant vector and A is a non-singular transformation. In the unitary case the distinction between contravariant and covariant transformations is academic, as the inverse-transpose of a unitary matrix coincides with the matrix itself.
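The central claim of Appendix 2, that the gradient of |S| with respect to the elements of S is the matrix of first-order cofactors (the transposed adjugate), can be verified numerically. A sketch with numpy, names ours; since |S| is linear in each individual element, the central difference is exact up to round-off.

```python
import numpy as np

def cofactor_matrix(S):
    """Matrix of first-order cofactors, (i,j)th cofactor at position
    (i,j): the transpose of the first-order adjugate."""
    n = len(S)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(S, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)
    return C

def det_gradient_fd(S, h=1e-6):
    """Central-difference derivative of |S| with respect to each
    element s_ij; exact here because |S| is linear in each element."""
    n = len(S)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(S)
            E[i, j] = h
            G[i, j] = (np.linalg.det(S + E) - np.linalg.det(S - E)) / (2 * h)
    return G
```

The agreement of the two matrices is precisely the statement ∂|S|/∂s_ij = S^(i,j) used in the covariant-tensor argument above.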