

Large-Scale Ab Initio Molecular Electronic Structure Calculations

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy in the Graduate School of The Ohio State University

By

Donald Clifford Comeau, B.S.

The Ohio State University
1990

Dissertation Committee:
Charles F. Bender
C. Weldon Mathews
Russell M. Pitzer

Approved by
Isaiah Shavitt, Adviser
Department of Chemistry

To Kathy and Kim

Acknowledgements

I would like to thank my advisor, Professor Isaiah Shavitt, for his guidance, instruction, suggestions, and support. While the friendship and support of all my fellow graduate students and post-docs is acknowledged, special assistance to this project was provided by Randy Diffenderfer, Kyungsun Kim, Melanie Pepper, Eric Stahlberg, and Bob Zellmer. Professor Russell M. Pitzer also provided help at several points. Portions of this work were supported by a Graduate School Fellowship and a Presidential Fellowship, both from The Ohio State University. Computer time was provided by the Instruction and Research Computer Center of The Ohio State University, the NASA-Lewis Research Center, Boeing Computing Services, the Pittsburgh Supercomputing Center, and the Ohio Supercomputer Center.

Special thanks go to Kathy for her patience and encouragement, to Kim for her squeals of delight, and to my parents for instilling my thirst for knowledge and for guiding my early steps on the road of learning.

Vita

April 26, 1961 . . . . . Born, Anchorage, Alaska

1982 . . . . . . . . . . B.S., summa cum laude, Columbia Union College, Takoma Park, Maryland

1983-1984 . . . . . . . Graduate Teaching Associate, Department of Chemistry, The Ohio State University, Columbus, Ohio

1984-1986 . . . . . . . Graduate Research Associate, Department of Chemistry, The Ohio State University, Columbus, Ohio

1987-1988 . . . . . . . Graduate Research Associate, Department of Chemistry, The Ohio State University, Columbus, Ohio

1989-1990 . . . . . . . Graduate Research Associate, Ohio Supercomputer Center, Columbus, Ohio

Publications

“An Ab Initio Determination of the Potential-Energy Surfaces and Rotation-Vibration Energy Levels of Methylene in the Lowest Triplet and Singlet States and the Singlet-Triplet Splitting,” D. C. Comeau, I. Shavitt, P. Jensen, and P. R. Bunker, J. Chem. Phys., 90, (1989), 6491-6500.

“A Progress Report on the Status of the COLUMBUS MRCI Program System,” R. Shepard, I. Shavitt, R. M. Pitzer, D. C. Comeau, M. Pepper, H. Lischka, P. G. Szalay, R. Ahlrichs, F. B. Brown, and J.-G. Zhao, Int. J. Quant. Chem. Symposium, 22, (1988), 149-165.

“The Location and Characterization of Stationary Points on Molecular Potential Energy Surfaces,” D. C. Comeau, R. J. Zellmer, and I. Shavitt, in: Geometrical Derivatives of Energy Surfaces and Molecular Properties, (P. Jørgensen and J. Simons, eds.), D. Reidel Publishing, (1986), 243-251.

Fields of Study

Major Field: Chemistry

Theoretical Chemistry. Professor Isaiah Shavitt.

Table of Contents

DEDICATION
ACKNOWLEDGEMENTS
VITA
LIST OF FIGURES
LIST OF TABLES

CHAPTER

I Introduction

II Electronic Structure Methods

III Potential Energy Surface Stationary Points
    3.1 Potential Energy Surfaces
    3.2 Gradients
    3.3 Internal Coordinates
        3.3.1 Traditional Internal Coordinates
        3.3.2 Nonredundant Subset of Cartesian Coordinates
    3.4 Analytic Polynomial Potential Surfaces
    3.5 Geometry Optimization
        3.5.1 Newton-Raphson
        3.5.2 Hessian Update
        3.5.3 Our Method for Finding Stationary Points
    3.6 MINPT Program
    3.7 Least Squares Equations
    3.8 Frequencies
    3.9 Test Case
    3.10 Calculations on CH5+
    3.11 Further Work

IV Computer Programming
    4.1 Optimization
        4.1.1 Algorithm
        4.1.2 Clean Code
        4.1.3 Clean Inner Loops
        4.1.4 Computational Kernels
        4.1.5 Unrolling Loops
    4.2 Portability
        4.2.1 I/O
        4.2.2 Clean Code
        4.2.3 Isolate Dependencies
        4.2.4 Original Methods
        4.2.5 MDC
        4.2.6 Logical Expressions in MDC
        4.2.7 Computational Kernels
    4.3 Cray Matrix Operations
        4.3.1 Ordinary Matrix Multiplication
        4.3.2 Symmetric Matrix Operations
    4.4 Formula Tape Loops
        4.4.1 Distinct Row Table
        4.4.2 External Space Treatment
        4.4.3 Symmetry
        4.4.4 Numbering the Configurations
        4.4.5 Matrix Element Evaluation
        4.4.6 Matrix Element Contributions from Integrals with External Indices
    4.5 Results
    4.6 Further Work
        4.6.1 Taking More Advantage of GUGA Organization
        4.6.2 Parallel Processing
        4.6.3 Documentation and User Interface

V CH2 Potential Energy Surfaces
    5.1 Background
        5.1.1 Electronic Structure
    5.2 Basis Sets
        5.2.1 Segmented Gaussian Basis Sets
        5.2.2 Atomic Natural Orbital Basis Sets
        5.2.3 Core Correlating Basis Sets
        5.2.4 Final Basis Set
        5.2.5 Correct H Basis
    5.3 Preliminary Calculations
        5.3.1 Reference Space
        5.3.2 Frozen-Core MCSCF
        5.3.3 Wrong Symmetry References
        5.3.4 Core Correlation
        5.3.5 Hartree-Fock Interacting Space
    5.4 Potential Energy Surfaces
    5.5 Analysis of the Surface
        5.5.1 Dissociation Energy
        5.5.2 Triplet-Singlet Splitting
    5.6 Further Work

VI Conclusions

APPENDICES
    A g Orbitals

BIBLIOGRAPHY

List of Figures

1 Orientation of Methyl Chloride for Removing Redundant Cartesian Coordinates

2 Demonstration That Two Energy Points With Gradients Are Inadequate to Describe a Two-Dimensional Surface

3 CH5+

4 Effective Speed of Matrix-Matrix Multiplication as a Function of Matrix Size on a Cray Y-MP, Assuming 2n^3 Operations

5 Speed for Creating a Square Copy of a Triangular Matrix on a Cray X-MP

6 Speed of Symmetric Matrix-Matrix Multiplication as a Function of Matrix Size on a Cray Y-MP

7 Sample Shavitt Graph

8 The External Portion of the DRT for a Singles and Doubles CI Calculation

9 Shavitt Graph Corresponding to the DRT in Table 5. The small numbers represent the arc and row weights

10 External DRT for W, X, and Y Vertices

11 External Portion of DRT for Walks with Orbitals of Different Symmetries

12 Diagram of Two Configurations That Have at Least Three Noncoincidences. Note the three loops

13 Representative Examples of External Portion of DRT Loops with Four External Indices and Their Integral Contributions

14 Representative Example of External Portion of DRT Loops with Two External Indices and Its Integral Contributions

15 Time in CPU Seconds for One Iteration of the CI Diagonalization Program

16 Orientation of CH2

17 Natural Orbital Occupation Numbers

List of Tables

1 MINPT Testing

2 H2O Frequencies (cm-1)

3 CH5+ Geometry

4 CH5+ Harmonic Frequencies (cm-1)

5 Distinct Row Table Corresponding to the Shavitt Graph in Figure 9. It is for a full CI with 5 orbitals, 4 electrons, and singlet spin.

6 Descriptions of the Test Cases Used to Determine the Speed of the COLUMBUS CI Diagonalization Program

7 CPU Time in Seconds for Small CH2 Test Case

8 CPU Time in Seconds for Cs Symmetry CH4 Test Case

9 CPU Time in Seconds for Large CH2 Test Case

10 Time per CI Diagonalization Iteration for Several CH2 Calculations

11 Ab initio values for the singlet-triplet splitting (in kcal/mol) for various basis sets and levels of theory

12 Abbreviations Used to Identify Different Types of Calculations

13 CH2 Orbitals

14 G0 Basis Set

15 V0 Basis Set

16 Total and Relative Energies for CH4 Using Small Natural Orbital Basis Sets

17 Total and Relative Energies for CH2 Using Small Natural Orbital Basis Sets

18 Total Energies and Singlet-Triplet Splitting for CH2 Using Small Natural Orbital Basis Sets

19 Total and Relative Energies for CH4 Using Large Natural Orbital Basis Sets

20 Total and Relative Energies for CH2 Using Large Natural Orbital Basis Sets

21 Total Energies and Singlet-Triplet Splitting for CH2 Using Large Natural Orbital Basis Sets

22 Comparison of Total Energies and Singlet-Triplet Splitting for an Ordinary Basis (V0) and a Core Correlating Basis (V1)

23 Comparison of Total Energy and Singlet-Triplet Splitting for a Medium (C1) and a Large (C2) Core Correlating Basis Set

24 Comparison of Total Energy and Singlet-Triplet Splitting for Valence (MA, NA) and Core-Correlating (C1, C2) Basis Sets

25 Comparison of Total Energy and Singlet-Triplet Splitting for Valence and Core-Correlating Calculations

26 Comparison of Total Energy and Singlet-Triplet Splitting using Carbon g and Hydrogen d Basis Functions

27 F7 Basis Set

28 Total Energy, Singlet-Triplet Splittings, and Dissociation Energy Using the F7 Basis Set

29 F8 Basis Set

30 Total Energies and Singlet-Triplet Splittings for F7 and F8 Basis Sets

31 Comparison of Total Energies for the F7 and F8 Basis Sets at Different Geometries

32 Total Energy Differences at Various Geometries for the F7 and F8 Basis Sets

33 Total Energies and Singlet-Triplet Splitting for Various Calculations Using the V0 Basis Set

34 Effect of the Reference Space on the Singlet-Triplet Splitting

35 Effect of MCSCF Frozen Core on the Singlet-Triplet Splitting

36 Effect of Wrong-Symmetry References on the Singlet-Triplet Splitting

37 Effect of Core Correlation on the Singlet-Triplet Splitting

38 Effect of Hartree-Fock Interacting Space Restrictions on the Singlet-Triplet Splitting

39 3B1 CH2 Geometries

40 3B1 CH2 Energies

41 1A1 CH2 Geometries

42 1A1 CH2 Energies

43 Timings for Calculating Surfaces (CPU Seconds on a Cray X-MP)

44 Potential energy parameters for the X 3B1 state of CH2 determined by fitting the potential expansion of Eq. 5.2 to the ab initio data of Table 40

45 Potential energy parameters for the a 1A1 state of CH2 determined by fitting the potential expansion of Eq. 5.2 to the ab initio data of Table 42

46 Rotation-vibration energies (in cm-1) for the X 3B1 state of CH2

47 Rotation-vibration energies (in cm-1) for the a 1A1 state of CH2

48 Dissociation Energy

49 Triplet-Singlet Splitting

50 Legendre and Associated Legendre Functions

51 Spherical Harmonics

52 s-f Cubic Harmonics

53 g Cubic Harmonics

54 s-f ARGOS Symmetry Orbital Coefficients

55 g ARGOS Symmetry Orbital Coefficients

CHAPTER I

Introduction

Early this century, Bohr, Schrödinger, Heisenberg, and others developed quantum mechanics. At that point, all of the fundamental equations of chemistry could be written down. Unfortunately, the road from equation to solution has been long and tortuous. As Dirac said:

The underlying physical laws necessary for ... the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble.1

In principle, Schrödinger created theoretical chemistry when he first wrote his famous differential equation

H\psi = E\psi, \qquad (1.1)

where H is the Hamiltonian operator for the system of interest, E is the energy of the system, and $\psi$ is the wavefunction for the system. The Schrödinger equation2 can be solved explicitly in only a few special cases. The particle in a box, the harmonic oscillator, and the hydrogen atom were solved quite quickly after the equation was developed. The H2+ molecular ion followed not long after.

That short list sums up all the important cases where the Schrödinger equation can be solved analytically. All more complicated situations must be handled numerically. With a few exceptions, such as the helium atom3 and the hydrogen molecule,4 this was not practical until the arrival of the digital computer, shortly after World War II. While some of the theoretical techniques used today were developed before that time, they weren’t practical without a computer. A famous example is Hartree’s development of the Self-Consistent Field (SCF) method. A surprising number of calculations were performed with the help of his retired father and other “computers.”5,6 Nonetheless, SCF calculations were few and far between. Now that computers are commonplace, SCF is one of the most widely used theoretical methods.

Since that time there have been tremendous advances, both in theoretical techniques and computing power. These improvements allow today’s chemists to perform calculations far beyond the reach of their forerunners. Supercomputers are readily available to nearly all researchers around the world. Computational techniques are being modified to take advantage of the unique computing capabilities and requirements of these machines. The challenges are to obtain practical chemical information and to continue the advance in techniques and capabilities. In its own small way, this work attempts to address these challenges.

The next chapter briefly reviews the standard techniques and procedures used in the COLUMBUS system of quantum chemistry programs.7 The reader familiar with this subject can safely skim or even skip over it. Unusual situations will be covered specifically in later chapters.

One of the more advanced techniques is the calculation of gradients of the energy with respect to distortions of molecular geometry. This is a gold mine of additional information. What is the best way to make use of it? Chapter III addresses this question.

Supercomputers offer a potential speedup of 100 times over conventional computers. Unfortunately, this increase in speed is not obtained merely by using a supercomputer instead of a conventional computer. The program must be organized and optimized carefully to provide the desired increase in speed. Application of these techniques to the COLUMBUS CI programs is discussed in Chapter IV.

Techniques are not useful unless they can be applied to practical problems. In Chapter V, our newly optimized programs are used to calculate high quality potential energy surfaces for the lowest two states of CH2.

The final chapter gives a brief look at the future. Both short-term and long-term possibilities are considered. We don’t know the specifics of the future, but we know the capabilities and applications of theoretical chemistry will increase.

CHAPTER II

Electronic Structure Methods

Properties of a quantum mechanical system in a stationary state are determined by solving the Schrödinger equation,

H\psi = E\psi, \qquad (2.1)

for the “wave function,” $\psi$, and the energy, E. For molecular systems, the Born-Oppenheimer approximation, that the nuclei can be considered stationary compared to the much lighter electrons, is used. With this assumption, the nonrelativistic spin-free Hamiltonian operator for molecular systems is

H = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_{i,A} \frac{Z_A}{r_{iA}} + \sum_{i<j} \frac{1}{r_{ij}} + \sum_{A<B} \frac{Z_A Z_B}{r_{AB}}, \qquad (2.2)

where i and j refer to electrons, A and B refer to nuclei, $Z_A$ is the charge on nucleus A, the r’s are distances, and atomic units have been used.

The equation can be solved exactly for one-electron atoms, such as hydrogen, giving a wave function of the form

\psi_{n\ell m} = N R_{n\ell}(r) Y_{\ell m}(\theta,\phi), \qquad (2.3)

where $R_{n\ell}$ is an associated Laguerre polynomial, $Y_{\ell m}$ is a spherical harmonic, and N is a normalization constant. While many-electron systems cannot be solved exactly, approximate solutions are constructed from one-electron functions that are qualitatively the same as hydrogenic wavefunctions.

The most common solution methods expand the wavefunction in a basis set. The quality of the basis set determines the limits on the quality of the entire calculation, because the final wavefunction is built piece by piece from the initial basis set.

Primitive basis functions are nearly always Cartesian gaussians,

g = x^l y^m z^n e^{-\alpha r^2}. \qquad (2.4)

While Slater functions (with an exponential factor of $e^{-\zeta r}$, as in the hydrogen atom) have the correct cusp and long-range character, gaussian functions are used because integrals over them can be calculated very efficiently. Appropriate linear combinations produce true spherical harmonics from the Cartesian products. (See Appendix A.)

The basis functions used in further calculations are fixed linear combinations of these primitive basis functions,

\chi = \sum_i c_i g_i. \qquad (2.5)

In segmented contractions, each primitive contributes to only one final basis function. General contractions allow a primitive to contribute to an arbitrary number of basis functions,8

\chi_j = \sum_i c_{ij} g_i. \qquad (2.6)

To obtain reasonable quality results, the basis set needs to be at least at the double-zeta plus polarization (DZP) level. This means there are two basis functions for each occupied orbital (double-zeta) and a set of basis functions with angular momentum larger than any occupied orbital (polarization functions). Segmented basis sets can be found in reference 9 and references therein. Correlation consistent, generally-contracted basis sets are reported in reference 10. Some of the calculations reported here use DZP basis sets, while others use much larger basis sets.
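To make Eq. (2.6) concrete, a general contraction is just a weighted sum of primitive values, so a set of contracted functions sharing the same primitives can be evaluated as a small matrix product. The sketch below is only an illustration; the exponents and coefficients are invented, not taken from any basis set used in this work.

```python
import numpy as np

# Invented s-type primitive exponents (alpha in Eq. 2.4) and a general
# contraction matrix c_ij (primitive i contributes to contracted function j).
alphas = np.array([13.0, 2.0, 0.4])
C = np.array([[0.2, 0.0],
              [0.5, 0.1],
              [0.4, 0.9]])

def contracted_values(r):
    """Values of the contracted functions chi_j at distance r (Eq. 2.6)."""
    g = np.exp(-alphas * r**2)   # s-type primitives; x^l y^m z^n factor is 1
    return g @ C                 # chi_j(r) = sum_i c_ij g_i(r)

print(contracted_values(0.5))    # values of the two contracted functions
```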

Several different methods are used to describe basis sets. One simply lists the number of basis functions for each l value. This list is enclosed by parentheses (()) for primitive basis functions and brackets ([]) for contracted basis sets. A slash is used to separate the description of the “large” atom (usually anything other than hydrogen) basis set from the description of the “small” atom basis set. For example, a reasonable (DZP) basis set for H2O would be a (9s5p1d/4s1p) primitive basis set contracted to [4s2p1d/2s1p].

Pople basis sets use one digit for each contracted basis function, and the value of the digit tells the number of primitive functions included in the contraction. Other symbols used include a hyphen (-) to separate core and valence orbitals, an asterisk (*) to indicate polarization, a plus (+) to show diffuse functions, and a G to indicate that gaussians are used. A valence DZP basis in this notation is 6-31G**.

Hartree-Fock Self-Consistent Field (SCF) methods assume the wave function can be written as an antisymmetrized product of one-electron functions or spin orbitals,

\Phi = |\varphi_1(1)\,\varphi_2(2)\cdots\varphi_N(N)|. \qquad (2.7)

(The vertical bars indicate the determinantal form of the antisymmetrized product.) Each spin orbital, $\varphi_i$, satisfies a one-electron eigenvalue equation,

F\varphi_i = \epsilon_i\varphi_i. \qquad (2.8)

This shows how each electron behaves in the average field of all the other electrons.

A potential energy surface can be determined by calculating the energy at a large number of molecular geometries. Directly calculating the gradient of the energy with respect to the coordinates of the nuclei provides additional information about the surface. As discussed in a later chapter, this information is very helpful when determining stationary points, such as minima and saddle points.

SCF calculations only include the average effects of electron interaction. They ignore instantaneous effects. One way to include these “correlation” effects is configuration interaction (CI).11 A CI wave function is a linear combination of spin- and space-adapted configuration functions (CFs),

\Psi = \sum_i^{n_R} c_i^R \Phi_i^R + \sum_i^{n_S} c_i^S \Phi_i^S + \sum_i^{n_D} c_i^D \Phi_i^D. \qquad (2.9)

The most common form of CI includes in the expansion all configurations that differ from the SCF (or reference) configuration by only one or two electrons. This is called singles and doubles CI, or SD-CI. In fact, this is so common we use CI to mean singles and doubles CI, unless explicitly stated otherwise.

There are times when a single configuration (CF) does a particularly poor job of describing the system of interest. This can happen when there are near degeneracies or when the ground state configuration doesn’t dissociate properly. This is handled by allowing the wavefunction to be a linear combination of a few important CFs. The form of the wave function is the same as for a CI calculation, but the procedure optimizes both the orbitals used to construct the CFs and the coefficients used to construct the final wavefunction from the CFs. This is called Multi-Configuration Self-Consistent Field (MCSCF).

An MCSCF calculation accounts for “nondynamic” correlation effects due to near degeneracies, but still does not address the “dynamic” or instantaneous correlation: a CI calculation is still needed for that purpose. In order to include nondynamic correlation effects in the CI calculation, each configuration in the MCSCF is considered a “reference CF” ($\Phi_i^R$) in the CI calculation. Any CF which is a single ($\Phi_i^S$) or double ($\Phi_i^D$) excitation from any of the reference CFs is included in the CI expansion. This is called multireference CI (MR-CI), in contrast to the single-reference CI (SR-CI) discussed above.

The most important information not included in an SD-CI calculation is the “disconnected doubles” terms, the simultaneous excitation of two electron pairs. These can be approximated by the Davidson correction,

\Delta E_Q = (1 - c_0^2) E_{\mathrm{corr}}, \qquad (2.10)

where $c_0$ is the coefficient of the reference configuration in the CI expansion and $E_{\mathrm{corr}}$ is the correlation energy obtained by the CI.12,13
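Because Eq. (2.10) is a simple closed-form correction, it can be stated directly as code; the numbers below are invented for illustration only.

```python
def davidson_correction(c0, e_corr):
    """Davidson estimate of the disconnected-doubles energy (Eq. 2.10).

    c0     -- coefficient of the reference configuration in the CI expansion
    e_corr -- correlation energy recovered by the SD-CI (hartrees)
    """
    return (1.0 - c0**2) * e_corr

# Invented example: a dominant reference (c0 = 0.97) and
# -0.2 hartree of SD-CI correlation energy.
print(davidson_correction(0.97, -0.2))   # about -0.012 hartree
```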

We performed our calculations using the COLUMBUS system of programs. Electron integrals were calculated by ARGOS.14 It produces integrals over symmetry-adapted basis functions. It can use generally-contracted basis sets and basis functions through g. It is limited to Abelian symmetries.

SCF calculations were performed by SCFPQ.15 SCF gradients were calculated by a program from the Schaefer group16 which was partially based on routines from HONDO.17 Pitzer adapted it for use with the COLUMBUS program system.18

The MCSCF program was written by Shepard, Shavitt, and Simons.19 It uses the GUGA method, which is discussed later. The COLUMBUS Graphical Unitary Group Approach (GUGA) CI programs were first written by Lischka, Brown, Shepard, and Shavitt.20 Shavitt21 developed the GUGA based on work by Paldus.22

The CI expansion is represented by a Shavitt graph. (See Figure 7 in Chapter IV.) Each configuration is represented by a walk from the head to the tail of the graph. This graph is useful for several purposes. It provides a convenient way of mapping from a specific configuration to its number in the CI expansion. Even more important, the Hamiltonian matrix element between two configurations can be determined by the shape of the “loops” formed by the walks representing the two configurations. More detail is provided later.

CHAPTER III

Potential Energy Surface Stationary Points

3.1 Potential Energy Surfaces

The result of a typical ab initio calculation is the total electronic energy of the system. Under the Born-Oppenheimer approximation, this energy can be considered a function of the nuclear geometry of the molecule. Evaluating this function or energy at different geometries creates a surface in n-dimensional space, known as the “Potential Energy Surface.”

Properties of this surface directly relate to molecular properties of chemical interest. For example, minima on this surface are stable molecular equilibrium geometries. Harmonic vibrational frequencies are related to the eigenvalues of the Hessian at a stationary point. Saddle-point geometries correspond to molecular transition points. Reaction paths between minima show how chemical reactions may proceed.

Because of the plethora of information provided, potential energy surface calculations are very important. Unfortunately, because of the large number of calculations required, they are much more expensive than single-point calculations. Depending on the amount of information desired, they may be one to three orders of magnitude more work. For this reason, efficient use of the data available is very important.

3.2 Gradients

In order to obtain additional and more accurate information about these surfaces, programs have been developed to calculate not only the energy as a function of molecular geometry, but first and higher derivatives as well. These techniques were first developed by Pulay.23,24

The gradient calculations increase the time taken for a calculation at any particular point. (In the work discussed here, the total cost of SCF energy and gradient was about three times the cost of SCF energy alone.) But gradients dramatically increase the amount of information available at that point. Since the amount of information is proportional to the number of degrees of freedom in a molecule, the larger the molecule, the more important the gradients. (The point is even more true for higher derivatives.) By using gradients, calculations will need to be done at significantly fewer points. In addition, by avoiding the need for using finite difference calculations to determine derivatives, the loss of precision in such difference calculations is also avoided.

Special care should be taken when performing gradient calculations. Since gradient and energy data from nearby points will be used together, they need to be very consistent and precise. The error in the data must be much less than the differences between values at nearby points. One example of the special care required was that the cut-off for neglecting integrals needed to be lowered from $10^{-9}$ to $10^{-20}$.

An even more important consideration is the convergence of the SCF procedure. If the SCF energy is converged until the error is less than $\epsilon^2$, then the error in the orbitals will be about $\epsilon$. The error in the gradients will also be proportional to $\epsilon$, since they are directly related to the orbitals.25 This requires the energy to be converged further than usual. For example, we found that we needed to converge the energy to $10^{-12}$ instead of the $10^{-9}$ we found adequate for calculations concerned solely with the energy.

3.3 Internal Coordinates

Potential energy surfaces are often described using internal coordinates. On the other hand, our programs and their input use Cartesian coordinates. Cartesians are necessary in any case for mass-weighting the potential surface for finding frequencies26 and intrinsic reaction paths.27,28 With these considerations, it would seem that using Cartesian coordinates would be ideal to describe potential energy surfaces. Unfortunately, there are too many of them. A molecule with n atoms has 3n Cartesian coordinates. If the molecule is not linear, then it has only 3n - 6 internal coordinates. The 3n Cartesian coordinates include information about the location and orientation of the molecule that is not necessary.

Since the energy of a molecule is independent of the “external” coordinates, the location and orientation of the molecule in space, we want to avoid including this information in our investigation of the molecule’s “internal” properties. This is usually accomplished by choosing a set of internal coordinates, bonds, angles, etc., that describe the molecule in a chemically reasonable manner. We instead use a subset of the Cartesian coordinates to describe the internal structure of the molecule.

3.3.1 Traditional Internal Coordinates

When we think of the internal structure of a molecule, we think in terms of bonds between atoms, angles between these bonds, and the twisting or distorting of the bonds out of planes. Thus it should be no surprise that the usual method of quantitatively describing the structure of a molecule is with coordinates that describe these features. For a molecule with n atoms, the usual choice is n - 1 bonds, n - 2 bond angles, and n - 3 out-of-plane angles. (How many of these out-of-plane angles are angles between a bond and the plane formed by two other bonds and how many are dihedral or torsional angles depends on the particular structure of the molecule.) For linear molecules, the choices are n - 1 bonds and 2(n - 2) bends.

One disadvantage of this method is the large degree of arbitrariness in the selection of the coordinates. There are n(n-1)/2 pairs of atoms (for bonds), n(n-1)(n-2)/6 triples of atoms (for bond angles), and n(n-1)(n-2)(n-3)/24 atomic quadruples (out-of-plane angles). In practice, reasonable chemical intuition adequately dictates most of the choices. There still are situations in which the choices are arbitrary. For example, consider cyclic molecules.

The choice of internal coordinates influences how the search for a minimum will proceed. A poorly chosen set may harm or even prevent convergence to a stationary point.29

Using gradients can be a bit challenging with coordinates of these types. Gradients are nearly always calculated directly in Cartesian coordinates. They then have to be transformed to internal coordinates. For the internal coordinates described above, the transformation is nonlinear. The gradient along a bend depends on the value of the angle itself.

To avoid problems of this type, internal coordinates are often expressed as differences between the current geometry and a reference geometry (usually chosen to be a stationary point). This has the advantage that small changes appear as small values for the coordinates. The transformation between Cartesians and these coordinates can then be treated as approximately linear.

The disadvantage of this solution appears when the molecule distorts far from the reference geometry. Interesting studies can include soft vibrational modes, reaction paths, or even larger regions of the surface. In such investigations, no reference geometry will be a good approximation to all the geometries of interest. Nonredundant Cartesians side-step this problem completely, because they do not need or use a reference geometry.

Another difficulty is that the units are inequivalent. Bonds are measured in units of distance while angles are measured in units of angle. Various force constants are then in different units. This is related to the problem of nonlinearity mentioned above, but is also inconvenient for least squares (discussed below).

While there are ways to work around all the problems listed, they can be completely avoided by using Cartesian coordinates directly.

3.3.2 Nonredundant Subset of Cartesian Coordinates

We handle these problems in our program by using a nonredundant subset of Cartesian coordinates. In effect, we simply ignore six of the Cartesian coordinates. Precisely which six we ignore depends on the situation, but a valid six can always be found. For example, place one atom at the origin and ignore its coordinates. Place another atom on an axis, such as the z axis, and ignore its x and y coordinates. Finally, place a third atom in a plane, such as the yz plane, and ignore its x coordinate.

A specific example should make this clear. Consider methyl chloride as presented in Figure 1.

Figure 1: Orientation of Methyl Chloride for Removing Redundant Cartesian Coordinates

First place the carbon atom at the origin. Then place the chlorine atom along the z axis. Choose one of the hydrogen atoms and place it in the yz plane. Since this molecule has $C_{3v}$ symmetry, its geometry has been completely described. For optimizing the geometry, these coordinates (the chlorine z and a hydrogen y and z) are the only ones that need to be considered. The other hydrogen coordinates can be calculated from those of the first. In order to calculate frequencies, one needs the complete surface, not merely the symmetric portions. This means the coordinates of the other two hydrogens must be included. But this still leaves us with only 9 coordinates instead of the 15 in the complete set of Cartesians.
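The bookkeeping described above is easy to express in code. The sketch below, a hypothetical helper rather than part of any program described in this work, rebuilds the full 15 Cartesian coordinates of methyl chloride from the three symmetry-distinct ones used during optimization (the chlorine z and one hydrogen's y and z).

```python
import numpy as np

def ch3cl_geometry(z_cl, y_h, z_h):
    """Full C3v methyl chloride geometry from three nonredundant coordinates.

    Carbon sits at the origin, chlorine on the z axis, one hydrogen in the
    yz plane; the other two hydrogens follow by 120-degree rotations about z.
    """
    def rot_z(angle, v):
        c, s = np.cos(angle), np.sin(angle)
        return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]])

    h1 = np.array([0.0, y_h, z_h])
    return {
        "C":  np.zeros(3),
        "Cl": np.array([0.0, 0.0, z_cl]),
        "H1": h1,
        "H2": rot_z(2.0 * np.pi / 3.0, h1),
        "H3": rot_z(-2.0 * np.pi / 3.0, h1),
    }

print(ch3cl_geometry(1.78, 1.03, -0.36))  # invented bond values, for illustration
```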

If the molecule possesses some symmetry, even more Cartesian coordinates can be ignored during the process of searching for the minimum geometry. For example, if a molecule contains a plane of symmetry, the coordinates of the atoms on one side of the plane can be ignored. They will simply mirror the coordinates of atoms on the other side of the plane, and thus contain no additional information. Of course, the same idea applies to a symmetry axis.

An interesting point concerning the handling of gradients and coordinates ignored due to symmetry needs to be raised. If one of the nonredundant Cartesian coordinates specifies the location of more than one atom (due to symmetry), the corresponding gradient will need to be scaled by the number of atoms involved. This is because the gradient value is for moving just that one atom, while an entire set of atoms will change position when that coordinate is changed and a new calculation performed.

Actually, this is better handled by considering the linear transformation between the full Cartesian basis and the set of nonredundant Cartesians actually used. This will have the same effect of scaling the gradient as before, but the justification is even easier to see. Of course, the transformation from the Cartesian basis to the degree of freedom basis potentially loses information. It is only valid when the molecule is in a “standard” location and orientation. At these locations the orthogonal complement of the transformation from the Cartesian basis to the degrees of freedom basis will transform the geometry to a zero vector. The transformation from degrees of freedom back to Cartesians is always valid.

While our work with Cartesians originated and proceeded in quite an ad hoc manner,30 we now realize that it can be developed in a very formal manner. An example is given by Banerjee and coworkers,31,32 who developed these ideas at about the same time we did. (See also King and Komornicki.33)

3.4 Analytic Polynomial Potential Surfaces

There are several practical advantages of working with analytic polynomial potential energy surfaces. First, it is trivial to characterize the surface. Stationary points can be identified quite easily. While it is true that polynomials are not capable of describing large regions of the true surface, they are completely adequate for describing a local region of the surface. If this local region contains a stationary point, then finding the stationary point on the true surface is equivalent to finding the stationary point of the polynomial. Even if the stationary point of the polynomial is outside of the local region, it will probably be a reasonable guess for the location of that stationary point on the true surface.

3.5 Geometry Optimization

Geometry optimization is one of the most important tasks of theoretical chemistry. Identifying the stable structure of molecules is a necessary preliminary to determining vibration-rotation frequencies or reaction paths. Locating a stationary point on the potential energy surface involves finding a point at which the gradient is zero. Like any high-dimensional minimization problem, it is nontrivial.

If gradients are not available, very simple, but time-consuming, single-step methods can be used.34,35

Fortunately, analytic gradients are available and affordable for most of the common ab initio methods of quantum chemistry. When this data is available, Newton-Raphson or related methods are usually used. The modified methods are most often used because pure Newton-Raphson requires second derivatives.

3.5.1 Newton-Raphson

The Newton-Raphson method is very simple and direct. If Hessian information is available directly, it is trivial to implement. One only has to solve the following system of linear equations:

H \Delta x = -g, \qquad (3.1)

where H is the Hessian matrix of second derivatives, g is the gradient vector, and $\Delta x$ is the step to the next guess for the minimum. If the Hessian is not directly available, some effort will be necessary to produce the required second derivatives.
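In matrix terms Eq. (3.1) is a single linear solve. A minimal sketch, with an invented two-dimensional Hessian and gradient rather than data from any calculation here:

```python
import numpy as np

def newton_raphson_step(hessian, gradient):
    """Solve H dx = -g (Eq. 3.1) for the step toward the stationary point."""
    return np.linalg.solve(hessian, -gradient)

# Invented example values.
H = np.array([[2.0, 0.3],
              [0.3, 1.5]])
g = np.array([0.4, -0.1])
print(newton_raphson_step(H, g))
```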

A disadvantage of this method is that it doesn’t use all the available information. Information from nearby points that might be useful is ignored. Another problem is that the steps are often too long. Not only can they take you past the confidence region, but into another area of the surface entirely. There are ways to scale the step size, but they are not always adequate. As a result, if you are not close enough to the stationary point of interest, you may end up wandering aimlessly over the surface.

3.5.2 Hessian Update

Quasi-Newton Hessian update methods are useful for exploring a potential surface when gradient information is available.36 Two specific examples are the Murtagh-Sargent37 variant of the Davidon-Fletcher-Powell method38,39 and the Broyden-Fletcher-Goldfarb-Shanno method.40-43 The exploration begins with an approximate Hessian matrix. This approximate Hessian may come from such disparate sources as a one-time calculation, chemical knowledge such as typical stretching or bending frequencies, or, as is often the case, simply the identity matrix. Steps are taken based on this matrix and the current gradient. At each point, the new gradient and energy data is folded into the approximate Hessian by one of several different formulas. It is hoped that the Hessian will improve to the point that it will allow the location of the stationary point of interest.
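As one concrete instance of such an update formula, here is a minimal sketch of the BFGS update in its usual textbook form, where s is the step just taken and y the corresponding change in the gradient; it is offered for orientation, not as the exact code used by any program mentioned here.

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update of an approximate Hessian H.

    s -- step just taken (x_new - x_old)
    y -- change in gradient (g_new - g_old)
    """
    Hs = H @ s
    return (H
            + np.outer(y, y) / (y @ s)
            - np.outer(Hs, Hs) / (s @ Hs))
```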

This method has the advantage that it can take into account chemical informa­ tion such as the usual frequencies of various motions (for example, a C—H, O—H, or C—C stretch or a bend or a wag). Unfortunately, to do this in a reasonable manner requires the use of internal coordinates. Their disadvantages have been discussed above.

A problem with these methods is that old information is only used indirectly as it is folded into the current Hessian.

In spite of the problems mentioned above, in practice these methods perform very well. As a result, they are at the heart of most automatic geometry minimization programs available today.

3.5.3 Our Method for Finding Stationary Points

This section presents the procedure used in this work to find stationary points. No claim is made that it is best or ideal. It worked easily with the tools at hand. It makes some attempt to be efficient. It has been used successfully in other projects such as the CH4 stereomutation transition point,44,25 linear and rhombic C4,45 SO2,46 and HCCO.47

First, begin with a reasonable starting guess. This might be an experimental value, a result from a lower quality calculation, or chemical knowledge of similar species. Around this starting guess, calculate enough ab initio points to fit a quadratic surface. (If gradient data are available, this will be one calculation for each degree of freedom.) Use the stationary point on that fitted surface as the location to perform another ab initio calculation. Add that point to your data, refit the surface, and do yet another ab initio point at the new surface’s stationary point. Repeat until the ab initio gradient is below the selected threshold.
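Schematically, the loop looks like the sketch below, where `ab_initio` and `fit_quadratic` are hypothetical stand-ins for an energy-plus-gradient calculation and the least-squares fit of Section 3.7.

```python
import numpy as np

def refine_stationary_point(points, ab_initio, fit_quadratic,
                            tol=1e-6, max_points=30):
    """Fit-and-step loop sketched from the procedure described above.

    points        -- list of (x, energy, gradient) tuples from the initial
                     ab initio calculations around the starting guess
    ab_initio     -- hypothetical callable: x -> (energy, gradient)
    fit_quadratic -- hypothetical callable: points -> (x_ref, g, H), the
                     gradient and Hessian of the fitted surface at x_ref
    """
    while len(points) < max_points:
        x_ref, g, H = fit_quadratic(points)
        x_new = x_ref + np.linalg.solve(H, -g)   # stationary point of the fit
        e_new, g_new = ab_initio(x_new)          # one new calculation there
        if np.linalg.norm(g_new) < tol:
            return x_new
        points.append((x_new, e_new, g_new))     # fold the point in and refit
    raise RuntimeError("did not converge")
```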

In practice, as the new points move away from the original starting guess, old points will need to be discarded. They lie outside of the quadratic region of the current best guess for the stationary point and hence will only harm the least-squares fit. Also, as the procedure converges, the quality of the fit must improve, so the quadratic region becomes smaller. The size of the quadratic region depends on the precision needed at the moment. The more precise the fit required, the smaller the area over which this will be possible.

Newton-Raphson steps, such as the first follow-up point, often overshoot the actual stationary point. Adding predicted points to the original data and calculating a new surface avoids the blind wandering that can result from pure Newton-Raphson. Because of this overshoot, the second predicted point is often in between the initial guess and the first predicted point. At other times, the second point will be quite near the first point. This is good news, because it suggests the stationary point is in the quadratic region and convergence is near.

While heading toward the stationary point, the eigenvalue spectrum of the Hessian determines the character of the surface around that point. If the eigenvalues are all positive, the region is that of a minimum. One negative eigenvalue characterizes a transition state. Sudden changes suggest insufficient data and an arbitrary surface. Other changes may result from a more complicated surface than expected. Care will be necessary to make sure the true potential surface is understood before proceeding.

This method is clearly related to the Hessian update schemes briefly discussed above. As new points and gradients are calculated, they are folded into a new Hessian via the least squares fit and the polynomial potential energy surface. An interesting feature of this method is that one can monitor such information as how well the points have been fit, which points are the least well fit, and, through singular value decomposition (SVD),48,49 about which directions the least information is available.

Unfortunately, it is doubtful whether these advantages can compare with the ease with which Hessian update methods can be automated. While, with some effort, this method could probably be automated, most of the advantages would then be lost.

This method would really shine if second derivative information were directly available. The Hessian information at nearby points could be combined and processed as easily as gradient information is handled now.

3.6 MINPT Program

MINPT is the program that actually solves for the polynomial surface and finds stationary points on that surface. Some effort has been made to automate the flow of data from SCF gradient calculations to MINPT and from there back to the integral program, ARGOS. There is a lot of room for improvement in this scheme, but the small level of automation that does exist can greatly speed work and decrease the number of errors.

Production geometry optimizers choose the points at which to perform calculations themselves. These choices are relatively unimportant as long as the system accomplishes the desired result. Unfortunately, if the calculation happens to go astray, the user may have little hope of returning it to the desired path.

The system used here does not close the loop so tightly. A complete set of calculations may be performed at once, but only on the user’s explicit command. The automation merely makes the data’s journey from step to step through the calculation more error-free and less tedious. The disadvantage is that even when the calculation does proceed without anomaly, the user must still be involved.

Currently, the program can handle only energy and gradient data. The reason is that gradients are the only derivatives that we can currently calculate. There would be no difficulties, in either the theory or the programming, if we had additional data available. Any level of derivatives could be handled without any problems.

While the program allows simplified input for complete polynomials, it also allows complete flexibility in specifying which monomials to include in the overall polynomial.25 This gives the user flexibility in handling cases where a 3rd or 4th degree polynomial is needed for a few of the degrees of freedom.

The polynomial is fit to the ab initio data using the least squares method. The following section discusses this fitting in detail. The stationary point of the polynomial surface is found using the Newton-Raphson method discussed above. Since we have the analytic form of the polynomial, the necessary derivatives can be determined efficiently.

3.7 Least Squares Equations

When fitting a surface to data using least squares, one could naively assume that as long as the number of data items (energy and gradient values at each geometry) exceeded the number of terms in the polynomial, there would be enough data. Due to inherent redundancies in the data, this is not the case. Consider the two-dimensional case pictured in Figure 2.

Figure 2: Demonstration That Two Energy Points With Gradients Are Inadequate to Describe a Two-Dimensional Surface

The polynomial to describe the surface is simply

a x^2 + b x y + c y^2 + d x + e y + f, \qquad (3.2)

which has six terms. Each geometry provides three data values: one energy and two gradient components, symbolized in the diagram by E, $E_x$, and $E_y$. Simply considering quantity of data, two geometries, say at the origin and along the x axis, should be enough. They are not. There is not enough information to determine c, the coefficient of the $y^2$ term. The redundant data is also relatively obvious, in hindsight: the difference in energy between the two points is proportional to the gradient in the x direction.

To have sufficient data to fit the surface, data from a point in the y direction would also be necessary. (Actually, any three noncollinear points would be adequate. The closer the angle between them is to perpendicular, the more independent the data. The actual orientation of the offset points will have no effect.)
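The redundancy is easy to verify numerically. The throwaway sketch below builds the least-squares rows for the six-term polynomial of Eq. (3.2) at the two geometries just discussed and confirms that the matrix has rank 5, one short of the six unknowns:

```python
import numpy as np

def rows(x, y):
    """Least-squares rows for E, dE/dx, dE/dy at (x, y); columns are
    the coefficients (a, b, c, d, e, f) of Eq. (3.2)."""
    return np.array([
        [x * x, x * y, y * y, x,   y,   1.0],   # energy value
        [2 * x, y,     0.0,   1.0, 0.0, 0.0],   # gradient, x component
        [0.0,   x,     2 * y, 0.0, 1.0, 0.0],   # gradient, y component
    ])

A = np.vstack([rows(0.0, 0.0), rows(1.0, 0.0)])  # two points on the x axis
print(np.linalg.matrix_rank(A))  # 5, so c (the y^2 coefficient) is undetermined
```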

When fitting an equation to a set of data, you end up with an overdetermined set of equations. They have the same form as an ordinary set of linear equations,

A x = b, \qquad (3.3)

where A is a matrix of values (or derivatives) of the polynomial terms at different geometries, b is the vector of actual values of properties at those points, and x is the vector of polynomial coefficients for which we are solving. The difference between this system and an ordinary system of linear equations is that the matrix A is not square but rectangular. It will have r rows for the number of data items and c columns for the number of terms in the polynomial. Of course, b will be a vector with r elements and x will be a vector with c elements. An overdetermined system will have r > c, or ideally, r \gg c.

To solve the system of equations in a least squares sense, one can solve the equation

A^t A x = A^t b. \qquad (3.4)

Since $A^t A$ is a c × c matrix and $A^t b$ is a c-length vector, we now have an ordinary set of linear equations. In fact, these are called the normal equations for least squares. In principle, they can be solved by ordinary linear equation techniques. In practice, the normal equations are ill-conditioned. If $\kappa$ is the condition number of A, then $\kappa^2$ will be the condition number of $A^t A$. One needs to use very stable methods to obtain reliable solutions to these equations. Unfortunately, we were not able to consistently solve these equations. We decided to go back and use techniques for solving the original least squares equations (Equation 3.3) directly.

An example of such a method is QR decomposition with pivoting. This method is provided by the LINPACK routines SQRDC, SQRSL, and SQRST.48 (Code for SQRST appears in the book,48 but not in the distribution.) Because this method includes pivoting, terms in the polynomial with insufficient data to provide reliable coefficients are ignored instead of receiving nonsensical coefficients.

Another stable method is Singular Value Decomposition (SVD). Routines for using this method can be found in both LINPACK48 and reference 49. The advantage of this method is that insufficient information can be detected and removed from the polynomial in any direction, not just in the direction of a monomial term. This sounds incredibly useful. In practice, we haven’t seen this marvelous behavior, but neither have we done any careful, exhaustive testing. QR with pivoting has been adequate for our purposes so far.
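The conditioning argument is easy to reproduce with modern tools. The sketch below solves Eq. (3.3) both through the normal equations and through an SVD-based solver; it is only an illustration of the numerical point, not the LINPACK route actually used by MINPT.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 6))
A[:, 5] = A[:, 4] + 1e-6 * rng.standard_normal(60)   # nearly dependent columns
b = rng.standard_normal(60)

# Normal equations (Eq. 3.4): the condition number is squared.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based least squares works on A directly and is far more stable.
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))    # kappa vs. kappa squared
print(np.linalg.norm(x_normal - x_svd))              # disagreement between routes
```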

Unless there is a particular reason for doing otherwise, all the data in a least squares fit should be treated equivalently, with equal weight. To avoid an unintended weighting, the data in a least squares fit should preferably be in the same units. This is impossible when fitting energy and gradient data. The energy has units of energy (obviously), while the gradients will be in units of energy per distance. In our case, the actual units happen to be hartrees and hartrees/bohr.

In this situation, the best one can do is ensure that all the elements of the data matrix have their error in the same decimal place.48 Once again, how to do this is not obvious, but consideration of SCF convergence will give us a number to work with. The energy of an SCF calculation converges faster than the orbitals. If the error in the orbitals is proportional to $\epsilon$, the error in the energy will be proportional to $\epsilon^2$. Since the gradients are not stationary with respect to orbital variations, they will also have an error proportional to $\epsilon$. If the energy data is weighted by $1/\epsilon$, it should have an error in about the same decimal place as the gradient data. Since the energy is converged to $10^{-12}$, the energy should be weighted by $10^6$.

Empirical observations support this rough handwaving argument. Weighting the energy by $10^6$ provides better fits to the data, more consistent surfaces, and more reliable convergence of the entire procedure.

Table 1 shows the results of a series of tests run using MINPT and some sample input. Blank lines divide the data into sections that illustrate various points. A few data items are duplicated in several sets. The final column, the condition number (cond), indicates the numerical stability of the least squares matrix. Smaller numbers are better.

The first set of numbers demonstrates the effect of weighting the energy data compared to the corresponding gradients. As the weight (E_w) increases, the fit to the energies (||ΔE||) improves without harming the overall fit (||r||). This is very important when searching for stationary points. If the energies are fit well, then the polynomial surface at least mimics the character of the true surface quite well.

The other three sections in the table demonstrate interesting, but less important, points. The second section shows the harm to the fit if the pivoting tolerance (tol) is slack enough to allow some of the polynomial terms to be removed from the fit. The third section shows the improvement gained by shifting the energy (E_con) solely for the purpose of the numerical behaviour. The final section demonstrates that nothing is gained in shifting the energy by other than an integral value.

Table 1: MINPT Testing(a)

E_w     tol      ||r||    ||ΔE||   E_con    cond
1       10^-6    1.4e-5   6e-8     38.      3e3
10^4    10^-10   2.1e-5   3e-11    38.      1.9e7
10^5    10^-10   2.1e-5   3e-13    38.      1.5e8
10^6    10^-10   2.1e-5   3e-15    38.      1.9e9
10^7    10^-12   2.1e-5   3e-16    38.      1.9e10

1       10^-6    1.4e-5   6e-8     38.      3e3
10^6    10^-6    5.2e-3   7e-13    38.      1.5e5
10^6    10^-9    5.2e-3   7e-13    38.      1.5e5
10^6    10^-10   2.1e-5   3e-15    38.      1.9e9
10^7    10^-10   5.2e-3   7e-15    38.      3.4e9
10^7    10^-12   2.1e-5   3e-16    38.      1.9e10

10^6    10^-10   2.1e-5   2e-14    0.       1.9e9
10^6    10^-10   2.1e-5   9e-15    30.      1.9e9
10^6    10^-10   2.1e-5   3e-15    38.      1.9e9
10^6    10^-10   2.1e-5   3e-15    38.9296  1.9e9
10^6    10^-10   2.1e-5   3e-15    40.      1.9e9

10^6    10^-9    5.2e-3   7e-13    38.      1.5e5
10^6    10^-9    5.2e-3   7e-13    38.929   1.5e5
10^6    10^-9    5.2e-3   7e-13    38.9296  1.5e5

(a) Column headings: E_w, energy weight; tol, tolerance for pivoting; ||r||, norm of the residuals; ||ΔE||, largest error in the energy fit; E_con, constant added to the energy (hartrees); cond, condition number.

The above is true at least for finding stationary points. Interesting counter-data comes from later work attempting to develop a more precise procedure for obtaining consistent frequencies. That work indicated that a weight of 1 for the energy was best.25 Of course, this should not be confused with weighting the energy and gradients equally. As discussed above, because they have different units, that is impossible! This difference in recommended weight is not completely unreasonable. Finding stationary points and high quality frequencies are quite different tasks. To find a stationary point, one wants the best possible fit over as large a region as feasible. For harmonic frequencies, the best possible fit in the immediate region of the stationary point is required.

3.8 Frequencies

Once you have a polynomial surface that mimics the true surface well enough to assist in finding stationary points, it should be possible to use the polynomial surface to obtain frequencies. Since we have only used quadratic surfaces, we have only attempted to obtain harmonic frequencies, but if a higher order polynomial were used, anharmonic frequencies should also be possible. Of course, this would require a more complicated method.

Frequencies can be obtained from the eigenvalues of the mass-weighted Hessian by the formula

\nu_i = \frac{\sqrt{\lambda_i}}{2\pi}, \qquad (3.5)

where $\nu_i$ is the ith frequency and $\lambda_i$ is the ith eigenvalue. The elements of the mass-weighted Hessian are given by the expression

\mathcal{H}_{A_i,B_j} = \frac{1}{\sqrt{m_A m_B}} \frac{\partial^2 E}{\partial A_i \partial B_j}, \qquad (3.6)

where $m_A$ is the mass of the Ath atom, and $A_i$ is the ith coordinate of atom A.
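Numerically, Eqs. (3.5) and (3.6) amount to a mass-weighting followed by a symmetric eigenvalue problem. The sketch below assumes the full Cartesian Hessian and per-coordinate masses are already in mutually consistent units; the two-coordinate example data are invented.

```python
import numpy as np

def harmonic_frequencies(hessian, masses):
    """Frequencies from a Cartesian Hessian via Eqs. (3.5) and (3.6).

    masses -- one mass per Cartesian coordinate (each atom's mass repeated
              three times in a real calculation)
    """
    w = 1.0 / np.sqrt(masses)
    mw_hessian = hessian * np.outer(w, w)       # Eq. 3.6 mass weighting
    evals = np.linalg.eigvalsh(mw_hessian)
    return np.sqrt(np.clip(evals, 0.0, None)) / (2.0 * np.pi)   # Eq. 3.5

# Invented two-coordinate example.
H = np.array([[1.0, 0.2],
              [0.2, 0.5]])
print(harmonic_frequencies(H, np.array([12.0, 1.0])))
```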

The task at this point should be clear. A polynomial (and hence the Hessian) in a nonredundant set of Cartesians has been calculated. The Hessian in the complete Cartesian basis is needed. For the moment, ignore symmetry. Assume the nonredundant Cartesians describe all motions other than translations and rotations. Obviously, if the polynomial surface is symmetry restricted, it will be impossible to obtain all the frequencies. That means that the only coordinates which have been ignored describe translations or rotations of the molecule.

As one would expect, the translational and rotational invariance of the energy is the key to solving our problem. Translational invariance leads to the equation

\sum_A \frac{\partial E}{\partial A_i} = 0, \qquad (3.7)

while rotational invariance leads to

\sum_A \left( A_i \frac{\partial E}{\partial A_j} - A_j \frac{\partial E}{\partial A_i} \right) = 0, \quad (i \neq j), \qquad (3.8)

where $A_i$ (i = x, y, z) is a Cartesian coordinate of atom A and the sums are over all atoms. Differentiating with respect to a particular Cartesian coordinate of atom B yields the equations

\sum_A \frac{\partial^2 E}{\partial A_i \partial B_k} = 0 \qquad (3.9)

and

\sum_A \left( A_i \frac{\partial^2 E}{\partial A_j \partial B_k} - A_j \frac{\partial^2 E}{\partial A_i \partial B_k} \right) + \delta_{ik} \frac{\partial E}{\partial B_j} - \delta_{jk} \frac{\partial E}{\partial B_i} = 0. \qquad (3.10)

These yield a system of equations that can be solved to give the Hessian elements necessary to construct a Hessian over all 3n Cartesian coordinates. The single derivatives that appear in the last equation will be zero at a stationary point. At any other point, the translation and rotation invariance equations (Equations 3.7 and 3.8) can be solved to provide the missing gradient values.

One very important advantage of this method is that the calculated rotational and translational "normal modes" will have zero frequencies, to numerical precision. Other methods can have translational or rotational "frequencies" large enough to be confused with low-frequency vibrations. This can be avoided by projecting the correct translational and rotational eigenvectors out of the space.50 While this shouldn't cause a significant problem, it is not as clean; it is an indication of slightly impure eigenvectors and suggests some undesirable mixing is occurring.

Next consider the problem of Hessians for symmetry-restricted surfaces. For example, to optimize a geometry, only a symmetric surface is required. Actually, the more general question of any motion orthogonal to the surface will be addressed.

There are two important questions that must be answered. One, what will be the eigenvalues of motions orthogonal to the degrees of freedom explicitly described? Two, what is the relationship between eigenvectors of the complete Cartesian Hessian and the partial Cartesian Hessian constructed from the Hessian in the degrees-of-freedom basis?

During this discussion, the Hessian of the potential energy surface will be considered in three different basis sets. First, H itself will be the complete Hessian in the Cartesian basis set. The Hessian over the degrees of freedom in the original degrees-of-freedom basis set will be H_f. When the degrees-of-freedom Hessian is expressed in the Cartesian basis set, H_c will be used. Note that while H and H_c have the same size, 3n × 3n, and are both expressed in the same Cartesian basis, they do not contain the same information. H_c is still only the degrees-of-freedom Hessian, while H is the complete Cartesian Hessian. Finally, ℋ_m is a Hessian in a mass-weighted Cartesian basis.

First the question of motions orthogonal to the degrees of freedom is considered. If F is the matrix describing each degree of freedom in terms of the Cartesian basis, then the degrees-of-freedom Hessian can be expressed in the Cartesian basis set by

    H_c = Fᵗ H_f F.    (3.11)

H_c will be a 3n × 3n matrix for a molecule with n atoms. Assuming one is working with f degrees of freedom, then H_f will be an f × f matrix and F an f × 3n matrix. While f can be any number in the range 1 ≤ f ≤ 3n, it will become clear that the only useful choices are when f includes all the motions of a particular symmetry. If the degrees-of-freedom basis is chosen orthonormal, then

    F Fᵗ = 1.    (3.12)

Now assume v is a vector in the Cartesian basis describing a motion orthogonal to the degrees of freedom included in F. Then Fv = 0. (The motion v can be, for example, an asymmetric motion when considering a symmetric surface.) The product H_c v will be 0 because v is orthogonal to the vectors that make up F,

    H_c v = Fᵗ H_f F v = Fᵗ H_f 0 = 0v.    (3.13)

As a result, v is an eigenvector of H_c with a zero eigenvalue.

Now consider the relationship of eigenvalues and eigenvectors of the complete Cartesian Hessian, H, with those of the degrees-of-freedom Hessian in the Cartesian basis, H_c. The degrees-of-freedom Hessian, H_f, is related to the complete Cartesian Hessian, H, by

    H_f = F H Fᵗ.    (3.14)

If a vector, v, is within the degrees-of-freedom space, then the degrees-of-freedom projection operator, FᵗF, which satisfies

    (FᵗF)(FᵗF) = Fᵗ(FFᵗ)F = Fᵗ 1 F = FᵗF,    (3.15)

will have no effect on it,

    FᵗF v = v.    (3.16)

Assume v is an eigenvector of H,

    H v = λv.    (3.17)

Now v can be shown to also be an eigenvector of H_c with the same eigenvalue,

    H_c v = Fᵗ H_f F v = FᵗF H FᵗF v = FᵗF H v = FᵗF λv = λv.    (3.18)

This means normal motions which are adequately described by the degrees of freedom can be correctly determined from the partial Hessian. The only way to guarantee that the degrees of freedom completely describe a particular normal motion is to include all Cartesian motions of the particular symmetry.

Since frequencies are actually obtained from a mass-weighted Hessian, we must consider whether that will have any effect on the results. A mass-weighted Hessian is related to a Hessian in the Cartesian basis by

    ℋ_m = D H D,    (3.19)

where D is a diagonal matrix. Each diagonal element is equal to the −1/2 power of the mass of the atom described by the corresponding Cartesian coordinate.

Before proceeding, we must show that FᵗF and D commute. Multiplying an arbitrary matrix on the right (left) by a diagonal matrix effectively multiplies each column (row) by the corresponding diagonal element of the diagonal matrix. However, each degree of freedom (or row in F) contains nonzero elements only for equivalent atoms, which must have the same mass. When the product FD is calculated, each column of F is scaled appropriately for the atom to whose Cartesian coordinate it corresponds. But since each row of F only contains nonzero elements for equivalent atoms, this is the same as scaling each row appropriately for the atoms included in that degree of freedom. This operation can be expressed as D_f F, where D_f is a diagonal matrix whose diagonal elements are equal to the −1/2 power of the mass of the atoms included in the corresponding degree of freedom. Using the equality FD = D_f F and its transpose DFᵗ = Fᵗ D_f, we can see that FᵗF and D commute,

    FᵗF D = Fᵗ D_f F = D FᵗF.    (3.20)

Assume v is an eigenvector of ℋ_m, the mass-weighted, complete Cartesian Hessian,

    ℋ_m v = λv.    (3.21)

The vector v will also be an eigenvector of the mass-weighted, partial Cartesian Hessian, ℋ_c = D H_c D, with the same eigenvalue. Using the definitions of ℋ_c and H_c,

    ℋ_c v = D H_c D v = D FᵗF H FᵗF D v.    (3.22)

Since FᵗF and D commute, and FᵗF v = v,

    ℋ_c v = FᵗF D H D FᵗF v = FᵗF ℋ_m v = FᵗF λv = λv.    (3.23)

Thus frequencies calculated from the complete Cartesian Hessian, H, and the partial Cartesian Hessian, H_c, will be the same, provided the eigenvector is completely described by F.

The situation is much simpler for zero eigenvalues. If v is an eigenvector of H_c with a zero eigenvalue, then D⁻¹v will be an eigenvector of ℋ_c with a zero eigenvalue,

    ℋ_c D⁻¹v = D H_c D D⁻¹ v = D H_c v = D 0 = 0v.    (3.24)

Thus eigenvectors of H_c and ℋ_c with zero eigenvalues are trivially related. This is connected with the fact that the motions describing translations and rotations of a molecule are independent of the masses of the atoms in the molecule.

When working in a symmetry-restricted nonredundant set of Cartesians, the first step is to include appropriate translations and rotations in the degrees-of-freedom basis by solving Equations 3.9 and 3.10. Then transform the Hessian to the Cartesian basis and mass weight it, as described above.

The final conclusion is that all the motions perpendicular to the degrees of freedom used to describe the polynomial surface will have zero eigenvalues. This is quite convenient. In addition, the symmetric frequencies can be obtained from the surface used to optimize the geometry. As would be expected, to calculate frequencies of other symmetries, the surface must be expanded to include additional degrees of freedom.

Table 2: H2O Frequencies (cm⁻¹)

                          This Work           This Work          Binkley, Pople,
                          Optimum Geometry    Pople's Geometry   and Hehre(a)
  bend                      1799.3              1800.8             1801
  symmetric stretch         3812.4              3808.5             3809
  asymmetric stretch        3945.8              3939.9             3941

  (a) Reference 51.

3.9 Test Case

To make sure the method works, we applied it to a test case. We selected H2O from work by Pople.51 We used the 3-21G basis sets, as introduced in that paper.

As can be seen in Table 2, the geometries and frequencies obtained are nearly identical.

3.10 Calculations on CH5+

For a more challenging test of our method, we looked at CH5+. It is an unusual molecular ion and has received a fair amount of theoretical study.52-57 As the protonated form of the smallest saturated hydrocarbon, methane, it is an interesting model system. The prototype SE2 reaction at a saturated site, CH4 + H+, produces CH5+ as an intermediate.53,58

Protonated methane forms readily in a mixture of methane and methane cation,

    CH4+ + CH4 → CH5+ + CH3,    (3.25)

in one of the most fundamental ion-molecule reactions. As a result, it plays an important role as an intermediate in ion-molecule reactions. It is often used in chemical ionization mass spectroscopy.59 There is even evidence of CH5+ occurring in solution.60

The structure of CH5+ is fairly well known. It is most easily visualized as methane with one H atom replaced by an H2 molecule. The most stable configuration has one of the H2-fragment hydrogens eclipsing one of the CH3 hydrogens. The H2 group has a nearly free rotation through a transition state in which the H2 hydrogens are staggered between the CH3 hydrogens. In fact, because another transition state, of C2v symmetry, also has very low energy, all the hydrogens are dynamically equivalent at room temperature.53

In our work, we investigated the Cs structure labeled IIIb in reference 52. This is the structure with the hydrogens staggered. A diagram of this molecule appears in Figure 3. The basis set used consisted of a Dunning [4s2p/2s] contraction61 of Huzinaga (9s5p/4s) primitives.62 The polarization functions were as recommended by Dunning and Hay.9 This is known as a double-zeta plus polarization (DZP) basis set. All calculations were carried out at the SCF level.

The initial geometry and final, optimized geometry are reported in Table 3.

One of the noticeable points is the relaxation of the CH3 fragment. The original work had constrained it to a C3v geometry. In addition, the somewhat larger basis set tightened the H2 bond and increased the distance between the CH3 and H2 portions of the ion.

Figure 3: CH5+

Table 3: CH5+ Geometry

  Parameter              Initial     Final

  Bond lengths (bohr)
  CH1                    2.039       2.050425
  CH2 = CH3              2.039       2.066101
  CH4 = CH5              2.308       2.350615

  Bond angles (degrees)
  H1CH2 = H1CH3          110.8       115.889
  H2CH3                  110.8       109.281
  H1CH4 = H1CH5          108.4       106.252
  H4CH5                   46.76       41.426
  H2CH4 = H3CH5           86.76       85.891


The frequencies of CH5+ in this geometry are listed in Table 4. In addition to the results calculated in this work, some of our earlier, less rigorous results30 and some more recent, very high quality results57 are also included.

Comparison with our earlier results shows that some of the frequencies were quite good, while others were relatively poor. This is due to the uneven quality of the surface used to minimize the geometry. Such a surface is more than adequate for finding an optimum geometry. However, obtaining consistent, reliable frequencies requires more care, as discussed above.

Table 4: CH5+ Harmonic Frequencies (cm⁻¹)

            This Work   Earlier Work(a)   Komornicki and Dixon(b)

  A' block
  ω1         3412.9        3411.0            3365    CH3 deg. stretch
  ω2         3270.6        3270.6            3228    CH3 breathing
  ω3         3141.3        3140.2            3088    H2 stretch
  ω4         1641.0        1795.7            1667    H2-CH3 stretch
  ω5         1562.3        1612.6            1593    CH3 rock/bend
  ω6         1400.9        1514.5            1420    CH3 rock
  ω7         1119.6        1118.8            1154    CH3 bend

  A" block
  ω1         3329.9                          3272    CH3 asym. stretch
  ω2         2288.7                          2299    H2 rock
  ω3         1556.5                          1583    CH3 asym. bend/rock
  ω4         1251.8                          1262    CH3 asym. rock
  ω5          176.8i                          201i   H2 rotation

  (a) Reference 30.  (b) Reference 57.

The reported frequencies are fairly similar to those reported by Komornicki and Dixon in their more recent study using a larger basis set.57 This stability with different basis sets suggests the frequencies reported are reliable, at least at the SCF level. Of course, for actual comparison with experiment, the scaled frequencies reported by the other authors to reflect the effects of anharmonicity and correlation would provide a better starting point.

3.11 Further Work

Several improvements and changes to the methods presented here come quickly to mind. First, it would be interesting to try the MINPT program with higher derivatives. Derivatives up to third order are available for SCF.63,64 As discussed earlier, there should be no problems, but one can never be certain until the work is actually done.

The procedures used would profit from more automation. They were developed for a system of programs designed for single-point calculations. On a UNIX™ system, it should be easy to set up scripts for this purpose. While not reducing the computer time required, this could greatly reduce the amount of human effort consumed and hence decrease the total amount of time used. (Some progress in this area has been made while tidying up a few details for this report. Unfortunately, the procedures, while much improved, are far from completely automatic.)

While the SVD method promises to be very useful for solving the least squares equations, our brief experiments with incomplete or insufficient information didn't show any improvement when compared to QR with pivoting. It would be worth some time to see if this is the situation in general, or whether, with more care and understanding, the additional flexibility of SVD can be utilized to advantage.

Now for further work on CH5+ itself. An obvious next step, after realizing the stationary point we have investigated is actually a transition state, is to look at the true minimum itself. In addition, one would like to know what the reaction path between the two stationary points looks like.27,28,25,65

Since CH5+ is a closed-shell molecule, the SCF description is fairly reasonable. However, the bonding is obviously not traditional. Correlation, such as that provided by CI, will probably have a noticeable effect on the frequencies. Correlation will also be important as the study extends to other transition points and to reaction paths.

Once CH5+ is well characterized, the work should be extended to similar compounds of other elements in the same column of the periodic chart as carbon: SiH5+ and GeH5+.

CHAPTER IV

Computer Programming

The initiatives started by the National Science Foundation, and followed by several states and universities, make computer time on such supercomputers as Crays, Cybers, and large IBMs readily available. This computer time provides great potential for performing much more accurate calculations or for performing calculations on physical systems which were previously inaccessible.

The challenge is to take advantage of the computing power so recently made available. Ordinary scalar computers perform only one operation at a time. Conceptually, vector computers perform operations on entire vectors at once. (Actually, the operations are pipelined through segmented arithmetic units.) A scalar program running on a supercomputer will run only three or four times faster than on ordinary mainframes, such as an IBM 3081. In order to obtain the increases in speed of 10 to 100 times that are possible, the program must be modified. Depending on the program, this modification (vectorization) may be straightforward or may be quite difficult.

One unfortunate fact is that different supercomputers may require different modifications. The new minisupercomputers (Convex, Titan, FPS), which provide performance/cost ratios matching or exceeding those of traditional supercomputers,66 can require somewhat different modifications. And finally, the programs should still work efficiently and effectively on existing scalar computers. In order for a program to be used effectively in a wide range of environments, it must be very portable, but still optimized.

At times a program can be vectorized simply from careful and insightful reading of the source code. More often this method gives the simpler vectorizations, but only a portion of the potential speedup. To achieve all of the speedup that is possible, it may be necessary to consider the algorithms used in a program at a more fundamental level. At this more basic level, a reorganization that would have no effect on a scalar computer can have a significant effect on the amount of vectorization possible on a vector computer.

This chapter's following sections address the issues raised above. The first section discusses optimization, particularly vectorization. The second section considers how to make programs portable without losing the efficiency gained in the first section. The next section looks at the tremendous speed available from a Cray matrix operation. The Graphical Unitary Group Approach (GUGA) formalism is examined to see how it can be used to organize a Configuration Interaction (CI) calculation in terms of matrix operations. This chapter concludes with the results of applying these techniques to the COLUMBUS CI program and a brief look at future possibilities.

4.1 Optimization

As mentioned above, a large amount of work can be required to take advantage of the tremendous speeds possible on a modern supercomputer. In this section we take a look at the work required to optimize the COLUMBUS programs for a Cray computer.

Our work concentrated on the Cray. Nonetheless, most of the principles, and many of the specifics, also apply to other computers, both vector and scalar oriented.

4.1.1 Algorithm

Using the correct algorithm is the single most important optimization possible. No amount of what is usually referred to as optimization will correct the costs of a poorly chosen algorithm.

In his Programming Pearls column, Bentley provided an amusing yet significant example of the importance of the algorithm chosen.67 He demonstrates the actual crossing point between a cubic algorithm on a Cray and a linear algorithm on a TRS-80. Of course the example is absurd; that's the point. But it is surprising how often the importance of the proper algorithm is overlooked. This example gives the lie to a commonly expressed attitude: "It's on a Cray. It doesn't matter how it's done."

The choice of the correct algorithm must not be confused with the theoretically best algorithm. The specifics of the situation must be taken into account. There are many instances when a simple algorithm will be more effective than a more sophisticated one. For example, sequentially searching a short list is usually faster than using the theoretically correct binary search.68
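As a hypothetical illustration (not from the original programs), compare the two searches below: the sequential loop does almost nothing per element, while the binary search pays for extra bookkeeping on every step, so for short lists the simple loop usually wins.

      INTEGER FUNCTION ISEQ(KEY, LIST, N)
C     Sequential search; typically fastest for short lists.
      INTEGER KEY, N, LIST(N), I
      DO 10 I = 1, N
         IF (LIST(I) .EQ. KEY) THEN
            ISEQ = I
            RETURN
         END IF
   10 CONTINUE
      ISEQ = 0
      END

      INTEGER FUNCTION IBIN(KEY, LIST, N)
C     Binary search of a sorted list; O(log n) comparisons,
C     but more bookkeeping per step.
      INTEGER KEY, N, LIST(N), LO, HI, MID
      LO = 1
      HI = N
   10 IF (LO .GT. HI) THEN
         IBIN = 0
         RETURN
      END IF
      MID = (LO + HI) / 2
      IF (LIST(MID) .EQ. KEY) THEN
         IBIN = MID
      ELSE IF (LIST(MID) .LT. KEY) THEN
         LO = MID + 1
         GO TO 10
      ELSE
         HI = MID - 1
         GO TO 10
      END IF
      END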

One reason for the importance of algorithm selection is that the resulting optimization will be portable. The proper algorithm on one machine will probably be the correct algorithm on another machine. An O(n ln n) algorithm will always be faster than an O(n²) algorithm for large problems, regardless of the machine or architecture. Any optimization that does not detract from portability is precious indeed.

There are exceptions, of course. Details of the algorithm's implementation may differ from machine to machine. Computational kernels address this issue, as discussed later. Gross differences in hardware (such as the Connection Machine or other massively parallel systems) may dictate different algorithms.69 This situation is probably best handled by different programs.

As later sections will show, the most important optimization technique on a Cray is to express the arithmetic as matrix operations. The GUGA formalism21,70 specifies which arithmetic operations need to be performed in a CI diagonalization. The manner and order in which the operations are performed is up to the implementor. How this is organized to provide matrix operations is discussed in detail in a later section.

Two even better examples of optimization by algorithm development are direct CI and GUGA itself. The idea of direct CI,71 never explicitly constructing the Hamiltonian matrix, allowed CI to address a much larger class of problems. By providing the conceptual framework, GUGA allows the direct CI to be organized in a very time- and space-efficient way. These steps, in addition to allowing larger problems on the most powerful computers, allowed large calculations to be performed on much smaller computers.72

These strides would not have been possible by simply applying ordinary optimization techniques to the then-current programs. Nonetheless, more mundane optimizations are important. One needs to take advantage of the resources available. Dramatic breakthroughs are not everyday occurrences.

4.1.2 Clean Code

At first glance, "clean code" is a very surprising item to see in a discussion on optimization. It would be expected under the category maintainability, or maybe portability, but not optimization. Yet it does have its place here. The reason is optimizing compilers.

Optimizing compilers are no longer fantasy. An ordinary compiler may produce better code from a particular contorted construct than from a straightforward one. A good optimizing compiler will produce faster code from a simple construct than that produced by an ordinary compiler for any high-level language construct. In fact, the code produced can be better than hand-coded assembly. (For example, the cf77 Fortran compiler on the Cray produces better code for SAXPY than the Cray Assembly Language (CAL) version available in SCILIB.73) Constructs that are recognized by the compiler will often result in better code than if the programmer attempts to second-guess the compiler and provide a more efficient method directly.

One example of this is 2-dimensional arrays. Before the advent of optimizing compilers, it was common practice to use 1-dimensional arrays to simulate 2-dimensional arrays. The effective index of the 2-dimensional array would be transformed into an index of a 1-dimensional array explicitly in the program. This allowed certain optimizations, such as strength reduction: calculating the index by a series of accumulating additions instead of a series of independent multiplications.74

A good optimizing compiler will make the same transformation as the programmer in the previous method. Except this time the compiler knows that the indices being manipulated are temporary, so additional optimizations, such as leaving the value in a register and never assigning it to memory, are possible.
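A schematic Fortran example of the contrast might look like the following; the first fragment hand-codes the strength reduction, the second states the intent and leaves the transformation to the compiler. (The array names and the doubling operation are arbitrary; B is the 1-dimensional array holding the same N by N data as A.)

C     Old style: simulate A(I,J) with a 1-dimensional array and
C     hand-coded strength reduction of the column offset.
      IJ = 1 - N
      DO 20 J = 1, N
         IJ = IJ + N
         DO 10 I = 1, N
            B(IJ+I-1) = 2.0 * B(IJ+I-1)
   10    CONTINUE
   20 CONTINUE

C     Clean style: let the compiler transform the subscripts.
      DO 40 J = 1, N
         DO 30 I = 1, N
            A(I,J) = 2.0 * A(I,J)
   30    CONTINUE
   40 CONTINUE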

One particularly relevant class of optimizing compilers is vectorizing compilers. These compilers identify appropriate DO-loops and produce vector instructions to perform the loop instead of scalar instructions. While the classes of loops that can be identified and vectorized are steadily increasing, clean loops are always more likely to be recognized. In addition, the code produced will often be better, as discussed above. Explicit vector operations such as those proposed in Fortran 90 will make this even easier.75

The process of making clean loops is usually the majority of the work performed when a program is "vectorized." The ease or difficulty of this task depends on how well the program was originally written and whether the basic algorithms lend themselves to vectorization.

4.1.3 Clean Inner Loops

Most of a program's time is spent inside inner loops. This is simply because that code is executed more times than the rest of the code in a program. As a result, it is most time-effective to concentrate the optimization efforts on inner loops.

The following guidelines are most often heard when discussing changes necessary for vectorization. Many users are surprised to find that they often improve scalar performance also.

No IO

Input/Output (IO) is always much slower than central processing unit (CPU) instructions. It must be removed from inner loops. It is much more efficient to read or write a large block of data outside a loop than to process a small piece at a time inside a loop.
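A minimal sketch of the idea, with illustrative unit numbers and record layouts:

C     Poor: one small transfer per iteration.
      DO 10 I = 1, N
         READ (ITAPE) X(I)
         Y(I) = Y(I) + A * X(I)
   10 CONTINUE

C     Better: one block transfer, then a clean computational
C     loop that the compiler can vectorize.
      READ (ITAPE) (X(I), I = 1, N)
      DO 20 I = 1, N
         Y(I) = Y(I) + A * X(I)
   20 CONTINUE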

Few IF Statements

Decision making and branching always take time. If the decision can be made outside of the loop once and for all, it doesn't have to be made repeatedly as the loop is executed.

4.1.4 Computational Kernels

The concept of computational kernels is the most important idea for writing portable, optimized code. Many of these advantages appear under portability, but there is a significant reason to use computational kernels for optimization: let someone else do the optimization!

While we are fairly good programmers, it is unlikely we could write code as well as someone whose job is to write the most efficient code possible. And even if we could write code just as well, it is still inefficient to repeat what someone else has done.

Most computer systems provide efficient, optimized, assembly language routines for performing common core tasks, such as matrix and vector operations. Whenever these routines are used in your program, you are probably obtaining the most efficient performance possible for your machine.

There are times no efficient library routine is available for a particular task. One must write it oneself. This does not negate the importance of computational kernels. All their portability advantages are still relevant. The programmer can freely optimize the new routine for the current computer, without impairing the program's performance on other machines, or the portability of the program that actually calls the kernel routine. A standard Fortran version will be available for use on a new machine until an optimized version can be prepared, if that is necessary.

One of the points raised at a recent Symposium on Supercomputing was the need for more and higher-level computational kernels. At one time, everyone used his own square root and trigonometric routines. But eventually, quality versions of these routines were standardized and even prepared in silicon.76,77 BLAS routines78 are very common, but they are not yet fully standardized. The standards should expand to include higher-level operations such as matrix, eigenvalue, and FFT calculations.79

4.1.5 Unrolling Loops

Here is a general example that displays the importance of computational kernels. At times, a high proportion of the speed increase of assembly language routines can be realized in a high-level language by unrolling the innermost loop.68 (An unrolled loop has the inner operations explicitly repeated several times and the iteration count reduced accordingly.) In fact, this optimization is performed automatically by some compilers.80

However, on a vector machine, unrolling the inner loop would slow the program down, because it is the inner loop that is vectorized. On these machines, the penultimate loop should be unrolled.81 This provides the same speed advantages as on scalar machines, without harming vectorization. Using this technique, the speed of a Fortran matrix multiplication approaches the speed of an assembly language version.
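A sketch of the technique applied to a matrix product follows; for simplicity it assumes N is a multiple of four (a production version would add a cleanup loop for the remainder).

C     Matrix product with the penultimate (K) loop unrolled by
C     four.  The inner I loop still vectorizes; each pass now
C     performs four multiply-adds per load and store of C(I,J).
      DO 100 J = 1, N
         DO 90 I = 1, N
            C(I,J) = 0.0
   90    CONTINUE
         DO 200 K = 1, N, 4
            DO 300 I = 1, N
               C(I,J) = C(I,J) + A(I,K  )*B(K  ,J)
     &                         + A(I,K+1)*B(K+1,J)
     &                         + A(I,K+2)*B(K+2,J)
     &                         + A(I,K+3)*B(K+3,J)
  300       CONTINUE
  200    CONTINUE
  100 CONTINUE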

The challenge is for the program to execute the rolled or unrolled code as appropriate for a given computer without masking the underlying operation being performed. If a portion of the program consumes enough time that it truly warrants applying this optimization technique, we have identified a computational kernel. The main routine can simply call the kernel routine to accomplish the desired task. The computational kernel optimized for the current machine will be used. When the program is ported to a different architecture, the kernel routine will be replaced with the appropriate version for the other machine, and the program will continue to operate efficiently.

4.2 Portability

One of the purposes of science is to gain information. If the information gained cannot be shared with others, it is of no use. One of the purposes of theoretical chemistry is to develop programs that can be used to describe systems of chemical interest. If these programs cannot be shared, their usefulness decreases dramatically. Improvements in the theory developed by different researchers cannot be easily combined. In fact, portability is also useful to the isolated research group.

For better or for worse, a computing environment is never stable. Faster and more powerful hardware is continually becoming available. One must make sure that these additional resources can be utilized quickly and easily. If a program must be largely rewritten, much of the advantage of a new system is already lost.

While little time here will be spent discussing program maintenance, it is very important and closely related to portability. After all, much more time is spent modifying and maintaining programs than is ever spent designing or writing them. The cute expressions "All useful programs have at least one bug" and "All programs can be made one instruction shorter" both contain the truth that no program is ever static. Programs are continually made more capable, faster, or more reliable. Any steps that can make these processes easier and less laborious advance the cause of science, because more time can then be spent on productive research.

4.2.1 IO

It is unfortunate, but Fortran IO is not portable. One author observed that "whereas the syntax of these facilities is now standardized, in many cases the semantics have not been standardized."82 In practice, certain important subsets are portable: formatted IO on fairly short records (80 characters or less) and sequential unformatted IO when used either for input or output, but not both.

While these cases do cover most of the IO performed, much IO is still not portable. Examples include direct access with long record lengths, writing at the end of a file after reading it, or special interaction with the operating system. An honest, straightforward approach works best: admit that IO is not portable and take steps to minimize the damage.

The approach we have developed is to write a set of low-level IO routines that mimic the Fortran IO they replace. Then a version of each low-level routine is written for every machine on which the program needs to run. At first, this sounds like unnecessary work, but it is actually much less work than would otherwise be required. If standard Fortran IO is adequate for the machine, then the low-level routine will be trivial. If Fortran IO is not adequate, then something different would be necessary in any case. By using a separate low-level IO routine, in the future, this type of IO can be performed on this particular machine without any additional effort.
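A minimal sketch of one such routine is given below. The routine name, the fixed physical record length, and the assumption that every logical record on the unit spans the same number of physical records are all inventions for illustration; the real routines are more general.

      SUBROUTINE DAWRIT(IUNIT, IREC, BUF, LENW)
C     Hypothetical low-level routine: write one logical
C     direct-access record as NPHYS physical records of at most
C     LPHYS working-precision words each.  On machines where
C     standard Fortran IO suffices, the body collapses to a
C     single WRITE.
      INTEGER IUNIT, IREC, LENW
      DOUBLE PRECISION BUF(LENW)
      INTEGER LPHYS, NPHYS, IP, I0, IL, I
      PARAMETER (LPHYS = 4096)
      NPHYS = (LENW + LPHYS - 1) / LPHYS
      DO 10 IP = 1, NPHYS
         I0 = (IP - 1) * LPHYS + 1
         IL = MIN(LENW, IP * LPHYS)
C        This sketch assumes all logical records on the unit
C        use the same number of physical records.
         WRITE (IUNIT, REC = (IREC-1)*NPHYS + IP)
     &         (BUF(I), I = I0, IL)
   10 CONTINUE
      END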

Direct access files on IBM systems provide several examples of this point. When using the MVS operating system, direct access records can only be 32K bytes (or 4K working-precision words) long. For our applications, this is unreasonably short. So our low-level routines allow several physical records to be treated as one logical record. In addition, when writing a new direct access file in MVS, the file is first "initialized," which actually means it is written twice. The IO routines avoid this unnecessary IO by writing the file sequentially whenever possible. And finally, when using the CMS operating system, special system routines are called instead of Fortran IO.

One possible fear is that the additional routine will hurt the program's performance. While it will slow the program slightly, the cost is guaranteed not to be significant. First, the IO will take much longer than the additional subroutine call. Second, the system IO routines on many computers involve many subroutine calls, so an additional one will be a negligible increase.

4.2.2 Clean Code

Clean, standard, and readable code is much more likely to be portable than unstructured "spaghetti" code. Confusing code often conceals dependencies on details of one particular implementation that may not hold in another standard-conforming implementation.

In addition, clean code is much more readable and understandable by the people who must maintain and port the code. Contrary to general opinion, clean code does not hurt performance. People will spend more time examining most of the code than will be spent by the computer. In an empirical investigation of a large number of Fortran programs, Knuth found that more than half of the time is usually spent executing less than 4 percent of the code.83 Time spent optimizing the program should be spent in this small portion, while the vast remainder of the program can be written for the programmers that must maintain it.

4.2.3 Isolate Dependencies

There are many kinds of machine dependencies. There are hardware differences, such as an integer word being 64 bits on a Cray and 32 bits on an IBM. There are operating system differences: VAX hardware could be running VMS, Ultrix, or BSD Unix. Nonstandard features such as bit operations are implemented differently by each compiler. Different scientific or mathematical libraries may be available. It is impossible to completely prevent machine dependencies. The solution is to limit the problems that will result from them.

The first step is to isolate the machine-dependent code from the machine-independent code. Once the dependent and independent code have been separated, the machine-independent code is forever free from any of the abuses or kludges that may be required by the dependent code. Any changes or improvements to this code can be made without fear of breaking the program on another machine.

Once the machine-dependent code has been isolated in separate routines (or possibly small portions of otherwise independent routines), one knows precisely where the trouble spots are. When changes here are necessary, care must be taken to avoid breaking the program on other machines. When the program is ported to a new system, the portion of the program that must be covered with a fine-tooth comb is ready and waiting for inspection. The vast majority of the program may be safely ignored, at least in the first pass.

This separation of independent and dependent code is unlikely to harm performance. Most of the dependent code will be in areas such as IO and system interface that have negligible effect on performance. Computationally intensive dependencies will be handled by computational kernels, which are further discussed below.

4.2.4 Original Methods

Our original methods were quite simple. This was facilitated by the fact that we were only working on an IBM and a Cray. The IBM version was the "master" version. A simple editor script transformed IBM code to Cray code by changing variable types (such as DOUBLE PRECISION to REAL and INTEGER*2 to INTEGER) and double precision BLAS and LINPACK calls to single precision. Other more significant differences, such as bit operations or word size differences, were hidden in system-dependent subroutines.

This method worked quite well at the time, but in the long term, it had some significant disadvantages. One is that it never could have been flexible enough to handle the varied environments in which our programs currently are used. Another was the use of a master version. When converting to a Cray version, information such as which variables were INTEGER*2 and which were INTEGER was lost. This meant that for a change to be permanent, it had to be made in the IBM version. This was inconvenient when debugging a problem on a Cray.

Pitzer's ARGOS and SCF programs used a more sophisticated approach.14,15 Comment codes, such as C#X, were inserted in front of machine-dependent lines. Editor scripts removed the comments from whichever lines were appropriate for the machine of interest. This was much more expandable and flexible than the previous method.

Unfortunately, it also had serious drawbacks. The master version of a program was unusable on any machine. The user had to debug one source file and update a different one. Line numbers on machine-dependent lines were limited to 2 digits. While some of the codes were mnemonic, C#C meant Cray and C#V meant VAX, others were quite arbitrary. Could the user remember that C#Q meant standard Fortran 77 IO was available? This made the master version of a program hard to read.

4.2.5 MDC

If all machine dependencies could be isolated in separate subroutines, our task would be complete. Unfortunately, this is not realistic. Insisting on this can cause large subroutines to be declared dependent merely because one very small portion is machine dependent. Subroutine calls which are inside inner loops must always be avoided for performance reasons. How should machine-dependent code in such places be handled?

One solution is to use a preprocessor, such as the one provided by the C language.84,85 This preprocessor is powerful and flexible enough that even though the C language itself is inherently less portable than Fortran, in practice, the average C program is more portable than the average Fortran program. The #ifdef construct controls what source code is actually compiled in a given environment. The ability to #define macros allows one to refer to the desired code indirectly. Finally, the necessary definitions can be localized to one file and #included where desired.

The C language is common enough that it was very tempting to use that preprocessor. Unfortunately, its lack of "knowledge" of Fortran input requirements (such as the 72-character line length limit) would make some features, such as #define, difficult to use. In addition, a preprocessor implemented in Fortran would have advantages. It can be guaranteed to run in any environment where our other programs can run.

Traditional preprocessors have other disadvantages. One is that the compiler sees a different program than the programmer. Unless the compiler and preprocessor are integrated, compiler error messages will refer to the wrong line. Since debugging information is based on what the compiler sees, symbolic debuggers will be less convenient to use.

Another problem is that the preprocessor has to be executed before every compile. Considering the speed of most current computers and the ability of many operating systems to have one command actually execute several commands internally, this disadvantage is more perceived than real. Nonetheless, these perceptions could harm acceptance by other programmers. If some of the programmers on a project choose not to work with the preprocessor, most of its advantages have been lost.

To solve the machine-dependent code problem, while addressing the other complications mentioned above, Shepard and Bair developed the MDC program.86 The MDC program is written in Fortran. As with traditional preprocessors, it provides commands (such as *MDC*IF and *MDC*ENDIF) to control which source lines are processed by the compiler.

The break with traditional preprocessors is the treatment of code that should not be executed. Unlike most preprocessors, the unwanted lines are not deleted. They are merely commented out. None of the information in the original source file is lost. This has several significant implications.

Since no information has been lost, the programmer can work directly on the preprocessed file. As long as the editing done on the program does not affect the MDC blocks themselves, the program will never have to be re-preprocessed for that environment.

When the time comes for the program to be transported to a new machine, the source code for the current machine, not a special generic version, can be fed through MDC once again. This time, however, the uncommented code will be appropriate for the new environment.

Here is an example of MDC code. It is used for unpacking information stored in an array of working-precision words. Some machines have library routines that help. Other machines need explicit code. These have different word orders, which affect how the unpacking is done.

*mdc*if cray fps
      real*8 p(*)
      integer u(nuw)
*mdc*elseif vax
      integer p(*)
      integer u(*)
*mdc*else
      integer p(2,*)
      integer u(2,*)
*mdc*endif
*mdc*if cray
      nuw32 = ((nuw+1)/2)*2
      call unpack(p,32,u,nuw32)
*mdc*elseif fps
      npw = (nuw+1)/2
      call viup32(p,1,u,1,npw)
*mdc*elseif vax
      do 10 i = 1, nuw
         u(i) = p(i)
   10 continue
*mdc*elseif littleendian
      do 10 i = 1, (nuw+1)/2
         u(2,i) = p(1,i)
         u(1,i) = p(2,i)
   10 continue
*mdc*else
c     ...big-endian.
      do 20 i = 1, (nuw+1)/2
         u(1,i) = p(1,i)
         u(2,i) = p(2,i)
   20 continue
*mdc*endif

4.2.6 Logical Expressions in MDC

One lack of the original MDC program was the method of indicating whether a section of code should be turned on. The *MDC*IF line beginning a block of code contained a list of keywords for which the block should be turned on. When the MDC program was run, another list of keywords was given, indicating which blocks should be turned on. If a keyword appeared both as an input option and in the *MDC*IF line, the block was turned on. A new keyword was required for every combination of possibilities for site, hardware, or operating system. This caused the creation of a large number of keywords. Some simple examples include SUNANL for code running on a Sun at Argonne and IBMOSU for code running at Ohio State on an IBM.

A major contribution to this particular programming project has been the inclusion of a full-featured logical expression evaluator for *MDC*IF statements. This is important because it provides the flexibility of working with a reasonable set of keywords. Each type of dependency can be given a descriptive keyword. Keywords can be combined as appropriate to describe the code.

The desire for the code to be in portable Fortran prevented the use of automatic tools such as yacc and lex that would have made the task trivial. A collection of string-scanning routines allows the tokens to be identified and then encoded. As the input is parsed, the logical values and operators are pushed onto separate stacks.

Proper precedence is obtained by controlling the order in which the operators are applied. The priority of and over or is obtained by applying ands as soon as both operands are available, while ors wait until the end of the expression. A not, like an and, is applied as soon as its operand is seen. It has higher priority than an and because it must appear immediately prior to its operand. An open parenthesis prevents evaluation of prior operators until the closing parenthesis has been seen.
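A compact sketch of this two-stack discipline is shown below. It assumes the tokens have already been encoded as integers by the string-scanning routines, with codes invented here for illustration, and it assumes a syntactically valid expression (no stack-underflow checks).

      LOGICAL FUNCTION MDCEVL(ITOK, NTOK)
C     Sketch of the two-stack evaluator described in the text.
C     Token codes (invented for this sketch):
C       1 = keyword turned on     0 = keyword turned off
C      -1 = .AND.   -2 = .OR.   -3 = .NOT.   -4 = (   -5 = )
C     A syntactically valid expression is assumed.
      INTEGER NTOK, ITOK(NTOK)
      INTEGER MAXSTK
      PARAMETER (MAXSTK = 50)
      LOGICAL LVAL(MAXSTK)
      INTEGER IOPS(MAXSTK), NV, NO, I, IT
      NV = 0
      NO = 0
      DO 50 I = 1, NTOK
         IT = ITOK(I)
         IF (IT .LT. 0 .AND. IT .NE. -5) THEN
C           .AND., .OR., .NOT., ( : just push; ors wait.
            NO = NO + 1
            IOPS(NO) = IT
            GO TO 50
         END IF
         IF (IT .EQ. -5) THEN
C           Close paren: apply the waiting ors back to the "(",
C           then discard it; the result is a fresh operand.
   20       IF (IOPS(NO) .EQ. -2) THEN
               LVAL(NV-1) = LVAL(NV-1) .OR. LVAL(NV)
               NV = NV - 1
               NO = NO - 1
               GO TO 20
            END IF
            NO = NO - 1
         ELSE
C           Keyword operand: push its truth value.
            NV = NV + 1
            LVAL(NV) = IT .EQ. 1
         END IF
C        Apply nots at once, and ands as soon as both operands
C        are available.
   10    IF (NO .GT. 0) THEN
            IF (IOPS(NO) .EQ. -3) THEN
               LVAL(NV) = .NOT. LVAL(NV)
               NO = NO - 1
               GO TO 10
            ELSE IF (IOPS(NO) .EQ. -1) THEN
               LVAL(NV-1) = LVAL(NV-1) .AND. LVAL(NV)
               NV = NV - 1
               NO = NO - 1
               GO TO 10
            END IF
         END IF
   50 CONTINUE
C     End of expression: apply the waiting ors.
   30 IF (NO .GT. 0) THEN
         LVAL(NV-1) = LVAL(NV-1) .OR. LVAL(NV)
         NV = NV - 1
         NO = NO - 1
         GO TO 30
      END IF
      MDCEVL = LVAL(1)
      END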

The problem combinations listed above can now be indicated by SUN .AND. ANL and IBM .AND. OSU.

4.2.7 Computational Kernels

The most powerful tool in the continual struggle between optimization and portability is the computational kernel. Whenever a fundamental computational (or IO) building block is identified, it is implemented in the most efficient manner for the machines available. Then the program can be guaranteed to be working efficiently whenever this kernel is used. There is no loss of portability, because the high-level program simply calls the kernel routine. How this routine accomplishes its task is of no concern to the high-level routine, as long as the result is correct and efficient.

One of the best examples of this idea is matrix multiplication. The calculation is usually implemented by three nested DO-loops. Mathematically, the order of the three loops is completely insignificant. In practice, the performance on a computer can be affected dramatically by the order chosen. On a Cray (and most other vector processors), the speed is greatest if the inner loop is an outer product, as in the following example.

      DO 100 J = 1, N
         DO 90 I = 1, N
            C(I,J) = 0.0
   90    CONTINUE
         DO 200 K = 1, N
            DO 300 I = 1, N
               C(I,J) = C(I,J) + A(I,K) * B(K,J)
  300       CONTINUE
  200    CONTINUE
  100 CONTINUE

On the other hand, for the Floating Point Systems (FPS) and other array processors, the most efficient way to code this operation is with an inner-product inner loop.

      DO 100 I = 1, N
         DO 200 J = 1, N
            C(I,J) = 0.0
            DO 300 K = 1, N
               C(I,J) = C(I,J) + A(I,K) * B(K,J)
  300       CONTINUE
  200    CONTINUE
  100 CONTINUE

While the difference is much smaller, this second form of the matrix product is usually faster on scalar machines due to fewer memory references. (It can be significantly faster on the newest workstations, where the time for a floating point operation is comparable to the speed of a memory reference.)

Of course, both Cray and FPS offer highly optimized assembly language routines that are much faster than these simple Fortran routines. This does not negate the importance of computational kernels. In fact, it exemplifies their applicability. On those machines someone else has already written the computational kernel. You can take advantage of their expertise, enjoy high performance on these machines, and still have a program that ports easily to other machines and architectures.

4.3 Cray Matrix Operations

Cray computers perform matrix operations at nearly peak speed, even for relatively small matrices. This is one of the most important observations necessary for optimizing programs for Cray computers. It is common knowledge that vector operations are much faster than scalar operations. For very long vectors, quite respectable speeds can be obtained. Unfortunately, the overhead of keeping the vector registers full and the inevitable memory bank conflicts prevent the speed of simple vector operations from ever reaching very close to the theoretical maximum.

On the other hand, matrix operations reuse information in the vector registers. Fewer memory operations are required. Less overhead is required to keep the arithmetic units busy. Operations on relatively small matrices can reach speeds within 10% of the theoretical peak.

Calculations on traditional computers also profit from an organization centered around matrix operations. The advantages of matrix operations over vector operations on these computers may not be as significant, but they still exist. For example, they concentrate a lot of work (O(n³) operations) in a very small amount of code and data space. This allows memory caches to be used effectively. In addition, the vendor will probably supply optimized matrix routines in the system library.

4.3.1 Ordinary Matrix Multiplication

There axe many possible ways a matrix multiplication can be coded in Fortran. It can be explicitly coded as three nested DO loops with the inner loop a dot product.

(This is probably the most natural.) It can be coded with the inner loop a sum of a scalar times a vector and a vector

y <-y + ax, (4.1)

(a SAXPY operation in BLAS parlance78). This vectorizes the best. Or it can be written simply as a call to a library routine.

Figure 4 compares the speed of several methods of calculating a matrix-matrix product. The range of performance is quite striking. The vector methods (DOT— dot product inner loop and AXY— SAXPY innerloop) are significantly slower than the library matrix routines. While these vector methods might eventually obtain the speed of the other methods, it would only be for incredibly large matrices.

(A rough extrapolation suggests matrices larger than 2000 x 2000.) The direct 74

Figure 4: Effective Speed of Matrix-Matrix Multiplication as a Function of Matrix Size on a Cray Y-MP, Assuming 2n³ Operations. (The legend codes GMS, MXM, DOT, XPY, UNRL, and MXV are explained in the text.)

The direct matrix-matrix product library routine (MXM) reaches half speed for matrices of order 13 and has reached full speed by order 53.

The speed of a matrix-vector product library routine (MXV) is interesting, not because it might actually be used instead of a matrix-matrix routine for forming a matrix product, but because some operations are more readily expressed as matrix-vector operations. Knowing how much speed is sacrificed is important. A matrix-vector product does reach the same speed as the explicit matrix operation. Unfortunately, the speed ramps up more slowly. Matrices of order 27 are required for half speed and order 110 for full speed.

The Strassen's-method library routine (GMS) uses an algorithm to multiply matrices that requires only O(n^(log2 7)) (approximately O(n^2.81)) operations instead of the usual O(n³). Unfortunately, it doesn't become consistently faster than traditional methods until matrices of order 180.

It is interesting to observe that for matrices larger than order 320, Strassen's method produces "speeds" (calculated assuming 2n³ operations) greater than the theoretical maximum of 333 Mflops. This points out a danger of trusting simple-minded timing routines. When Strassen's method is used, the hardware is executing slower, but finishing sooner, because it has less work to perform. As a specific example, using MXM to multiply two 500 × 500 matrices requires 250 million operations and, at a speed of 300 Mflops, requires 0.83 seconds. On the other hand, the SGEMMS routine took 0.66 seconds but needed only 180 million operations, for an actual speed of 270 Mflops but an effective speed of 380 Mflops.

The unrolled Fortran (UNRL) results show that Fortran can nearly approach the speed of the assembly language library routine. The trick is to unroll the penultimate loop instead of the inner loop, as would be done on a scalar machine. This avoids harming vectorization. This technique is clearly useful for operations similar to matrix products that are not available in the system library. Yet such code would be worthless on many other machines and would need to be replaced with either rolled code or code with the inner loop unrolled. This once again demonstrates the importance of computational kernels for achieving portable optimization.

These results clearly demonstrate the importance of organizing calculations as matrix operations. Matrix operations are 2 to 3 times faster for large matrices, and the difference is even greater for smaller matrices. While Fortran can approach the speed of the library routine, it also requires the code to be organized as a matrix operation.

4.3.2 Symmetric Matrix Operations

Operations on symmetric matrices are often required. In order to save IO and memory, only the lower (or upper) half of the matrix is stored. Unfortunately, there are no library routines available on the Cray that work with matrices stored in this manner. It was time to develop our own computational kernel.

We started with a list of methods to try: explicit Fortran, using a square-matrix library routine on a "squared" copy of the matrix, and a "symmetric matrix" library routine which expects a square matrix but uses only the elements on or below the diagonal.

There are two obvious ways to create a square version of a triangular matrix: copy the rows one at a time, or create an index vector to map the elements using the scatter/gather hardware. A plot of the speed of this operation appears in Figure 5. Clearly, explicitly copying each row is faster for all sizes of matrices.
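A sketch of the row-copy approach follows; the packed storage order shown (row-wise lower triangle) is an assumption of the sketch.

      SUBROUTINE TRTOSQ(N, AP, A)
C     Expand a symmetric matrix stored as a packed lower
C     triangle, row by row (AP(i*(i-1)/2 + j) = A(i,j), j <= i),
C     into a full square matrix.  The storage order is an
C     assumption of this sketch.
      INTEGER N, I, J, II
      DOUBLE PRECISION AP(*), A(N,N)
      DO 20 I = 1, N
         II = I*(I-1)/2
C        Copy row i of the triangle, then mirror it.
         DO 10 J = 1, I
            A(I,J) = AP(II + J)
            A(J,I) = AP(II + J)
   10    CONTINUE
   20 CONTINUE
      END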

Figure 6 reports the speed of multiplying a symmetric matrix by an ordinary matrix for the various methods. The speed of multiplying ordinary square matrices (MXM) is included for reference. Typical Fortran code for processing symmetric matrices directly (SMXM) performs quite poorly. Creating a square copy of the symmetric matrix and using the ordinary library routine (COPY) performed very well. The same is true for the "symmetric matrix" library routine (SYMM), which only requires one copy of the matrix. They are nearly as fast as the square matrix routine itself.

Figure 6: Speed of Symmetric Matrix-Matrix Multiplication as a Function of Matrix Size on a Cray Y-MP. (Codes in legend explained in the text.)

The speed of the matrix operation overwhelms the overhead caused by copying the matrix. This is due to the fact that a matrix multiply requires O(n³) operations while squaring the matrix requires only O(n²) operations. The timing situation is more complicated for a symmetric matrix-vector product because it only requires O(n²) operations.

Figure 5: Speed for Creating a Square Copy of a Triangular Matrix on a Cray X-MP

These data demonstrate the importance of explicit timing tests and computational kernels. On most machines, the explicit Fortran would probably be fastest. It would have been nice to have had the time to experiment with Cray Assembly Language (CAL) routines.

An interesting point is that the data on squaring a matrix were gathered on a Cray X-MP. More recent tests indicate that on a Y-MP, using scatter is faster for matrices smaller than order 100. This demonstrates that concern for performance optimization increases maintenance costs. It also demonstrates the importance of computational kernels: whichever routine is best for the particular computer and environment can be used.

4.4 Formula Tape Loops

The COLUMBUS CI programs use the Graphical Unitary Group Approach (GUGA) to organize the calculation. GUGA provides a mapping from CFs to CF numbers and gives the contribution of each integral to the Hamiltonian matrix. A careful examination of a few of its features will allow us to organize our calculation to take advantage of the speed provided by matrix operations.

Several excellent theoretical developments of the GUGA are available.21,22,70,87

In contrast, this will be a quite pragmatic exposition. Some aspects of the GUGA can, in hindsight, be understood as convenient organizational tricks with possible application in other situations. These include the structure of the Distinct Row Table (DRT), its Shavitt graph representation, and the configuration-to-CF-number mapping. And while the quantitative aspects of matrix element construction require the underlying theory, several qualitative aspects can be understood directly.

The CI program uses the Davidson method to obtain eigenvalues of the Hamiltonian.88 This method only requires the product of the Hamiltonian matrix, H, with trial CI vectors, c,

    σ = H c.    (4.2)

The Hamiltonian matrix never needs to be explicitly constructed or stored. This product is the most time-consuming portion of a CI calculation. Its efficient handling is the focus of this section.
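To make the role of this product concrete, the fragment below sketches one step of a simplified, single-vector Davidson update built solely from the vector sigma = Hc and the diagonal of H. The routine and argument names are hypothetical, and a production code expands a subspace of trial vectors instead of correcting a single one.

      SUBROUTINE DAVSTP(N, C, SIGMA, HDIAG, ELAM, DELTA)
C     One step of a simplified, single-vector Davidson update.
C     C      current trial vector (assumed normalized)
C     SIGMA  the product H*C, formed externally from the
C            formula tape; H itself is never stored
C     HDIAG  diagonal elements of H
C     A sketch only, with hypothetical names.
      INTEGER N, I
      DOUBLE PRECISION C(N), SIGMA(N), HDIAG(N), ELAM, DELTA(N)
      DOUBLE PRECISION R
C     Rayleigh quotient  lambda = c'Hc  (c normalized).
      ELAM = 0.0D0
      DO 10 I = 1, N
         ELAM = ELAM + C(I) * SIGMA(I)
   10 CONTINUE
C     Residual and preconditioned correction,
C     delta(i) = (sigma(i) - lambda*c(i)) / (lambda - H(i,i)).
      DO 20 I = 1, N
         R = SIGMA(I) - ELAM * C(I)
         DELTA(I) = R / (ELAM - HDIAG(I))
   20 CONTINUE
      END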

4.4.1 Distinct Row Table

Consider a CI wavefunction constructed from the orbitals

    φ1, φ2, ..., φn.    (4.3)

Each configuration can be completely described by whether each orbital φi contains zero, one, or two electrons and, if it contains one electron, whether it is coupled so as to increase or decrease the overall spin. Each of these possibilities can be labeled by a step number: 0 for no electrons, 1 for one electron coupled "up," 2 for one electron coupled "down," and 3 for two electrons. Now each configuration can be completely described by an n-digit number consisting solely of the digits 0, 1, 2, and 3. This is called the step vector representation.

Imagine a configuration constructed one orbital at a time. After each orbital is added, the subconfiguration obtained can be described by the number of orbitals, i, the number of electrons, N_i, and the overall spin, S_i. Each of these substates is called a row (after the manner in which it appears in Gel'fand-Tsetlin or Paldus tableaux) and will appear as a vertex in the Shavitt graph.

Each configuration can now be considered as a walk from a vacuum row (no orbitals, no electrons, no spin) through other rows until a row is reached that describes the desired state (n orbitals, N electrons, spin S).

Assuming there are many more orbitals, n, than electrons, N, the approximate number of possible configurations is given by11

    N_conf ≈ 2 (2ne/N)^N.    (4.4)

The number of possible rows will be much less than

    ¼ n N²,    (4.5)

since there can only be n levels, at most ½N electron pairs, and ½N intermediate unpaired electrons for a low-spin case. This much smaller number gives us something reasonable around which to organize the calculation.

For drawing the Shavitt graph, a different method of describing each row is used. To describe the spin, one less than the multiplicity,

    b_i = 2S_i,    (4.6)

is used. The number of spin-paired electron "pairs," a_i, is used in addition, to replace the number of electrons,

    N_i = 2a_i + b_i.    (4.7)

The Shavitt graph is laid out on a grid. Each horizontal line (or level) is labeled by an orbital number. The smallest horizontal spacing is used for the b_i's. The a_i spacing is chosen to be a little larger than that required to accommodate the maximum number of b_i's needed by any a_i.

Arcs (or edges) connect the rows as needed to describe any of the configurations.

Arcs only connect rows that differ by a valid change from one substate to another by one of the four ways electrons can be placed in an orbital. For example, an arc will never directly connect two rows that differ by three electrons. Chaining indices, k_{i,d}, record which row in the immediately lower level is connected to row i via an arc representing step number d. If the arc does not exist, a zero is stored.

A sample Shavitt graph for a multireference SD-CI appears in Figure 7. The limited number of ways electrons can be added to an orbital and the regular structure of the grid on which the DRT is laid out mean that arcs can have only four different slopes. A vertical arc denotes no electrons, a slightly slanted arc denotes one electron coupled "up," a more slanted arc denotes one electron coupled "down," and the most slanted arc denotes two electrons in an orbital.

Figure 7: Sample Shavitt Graph. (The graph spans the doubly occupied, active, and virtual orbital regions.)

4.4.2 External Space Treatment

Orbitals in a CI calculation can be partitioned into two sets, internal and external (or virtual) orbitals. Internal orbitals are occupied in some or all of the reference CFs. Electrons only occupy external orbitals in excited CFs. Since there are usually many more external orbitals than internal ones, they must be treated very efficiently.

In a singles and doubles CI (including multireference CI) calculation, there will be at most two electrons in the external space. As a result, the structure of the DRT for the external space is very simple. (See Figure 8.) In fact, it is so simple that it is never explicitly constructed. But the program is written taking this simple structure into account, so it will be discussed extensively.

Integrals can be classified by the number of indices they have in the external space. Two-electron integrals can have between zero and four external indices, while one-electron integrals can have between zero and two. Since the program treats external and internal orbitals differently, this makes a very convenient distinction for breaking the program into pieces.

Figure 8: The External Portion of the DRT for a Singles and Doubles CI Calculation

Since there are at most two electrons in the external space, only a limited number of substates can be created. The rows which identify each of these substates have each been assigned a name:

W two electrons, singlet coupled

X two electrons, triplet coupled

Y one electron

Z no electrons

The rationale for the names of these external vertices is obvious in Figure 8. Another commonly used set of names is S, T, D, and V, for Singlet, Triplet, Doublet, and Valence.87,89

4.4.3 Symmetry

The important question of symmetry is not addressed by the basic GUGA. Fortunately, for Abelian symmetries, it is not difficult to add. For nondegenerate irreps, irrep products can be calculated directly. The symmetry of a walk or CF is simply the product of the irreps of the singly occupied orbitals. To avoid recalculating this value, a vector over internal walks is used to store the symmetry of each internal walk.
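For the nondegenerate irreps of D2h and its subgroups, a convenient realization of this product is to label the irreps 0-7 so that the irrep product becomes a bitwise exclusive-or. The following sketch assumes such a labeling (the specific labels are an assumption of the example, not a convention quoted from the program):

```python
# For D2h and its subgroups the (nondegenerate) irreps can be numbered so
# that the irrep product is a bitwise XOR.
def walk_symmetry(step_vector, orbital_irreps):
    """Irrep of a walk: XOR of the irreps of the singly occupied orbitals
    (step numbers 1 and 2); steps 0 and 3 are totally symmetric."""
    sym = 0
    for d, irrep in zip(step_vector, orbital_irreps):
        if d in (1, 2):          # singly occupied orbital
            sym ^= irrep
    return sym

# C2v example with hypothetical labels a1=0, a2=1, b1=2, b2=3:
# one electron in a b1 orbital and one in a b2 orbital couple to a2.
print(walk_symmetry((1, 2), (2, 3)))     # -> 1  (i.e., A2)
```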

Symmetry in the external space is handled somewhat differently. Imagine that for each existing row (or vertex) there were one row for each irrep in the symmetry group. Arcs would only exist for changes that are symmetry allowed. Adding either zero or two electrons would never change the symmetry. Adding one electron would produce the symmetry corresponding to the product of the irrep of the orbital in question and the current overall symmetry of the partial configuration.90

Thus, if the overall symmetries of two external walks differ, they are numbered as if they were on different DRTs. Even if only the pattern of symmetries is different, the walks are numbered separately.

Matching up walks for calculating loop values is too complicated if symmetry for the entire DRT is implemented using this scheme. However, the simplicity of the external portion of the DRT makes this method very effective and efficient. Internal walks of a particular symmetry are matched with all external walks which will produce a CF with the desired overall symmetry. The number of external walks of each symmetry at each external vertex is known, so the correct CF number offsets can be applied.

4.4.4 Numbering the Configurations

If the configurations are going to be processed efficiently, there needs to be a convenient and efficient mapping from CF to CF number. Here is the procedure used.

Ordering an Arbitrary DRT

The numbering information is prepared from the bottom, or tail, of the graph, leading toward the top, or head. (This is known as forward lexical ordering. Reverse lexical ordering follows the same principles, but begins at the head of the graph.) A value, or weight, is associated with each arc and row. For each row i, this weight, x_i, is the number of walks that lead from the vacuum row to this row. For the vacuum row itself, this is one. Each row in the next level is considered, one at a time. The furthest-left arc leading to this row (the one with the smallest step value) is assigned a weight of zero, y_{i,d_0} = 0. Each following arc is assigned a weight which is the sum of the previous arc's weight and the weight of the row from which the previous arc leads,

Vi,d = Vi.d-i + (4.8)

Finally, the row weight itself is the sum of the last arc weight and the corresponding row weight,

x_i = y_{i,3} + x_{k_{i,3}}. (4.9)

The CF number of a particular CF can now be calculated as one more than the sum of the arc weights which it crosses on the way from the tail of the graph to the head.

This is made clearer in Figure 9 and Table 5. Table 5 shows a DRT, complete with weights and chaining indices, for a full CI with 5 orbitals, 4 electrons, and singlet spin. Figure 9 contains the corresponding Shavitt graph.

Table 5: Distinct Row Table Corresponding to the Shavitt Graph in Figure 9. It is for a full CI with 5 orbitals, 4 electrons, and singlet spin. (Row 1 is the vacuum row; y_{i,0} is always zero and is not tabulated.)

Row  Level  a_i  b_i  k_{i,0}  k_{i,1}  k_{i,2}  k_{i,3}  y_{i,1}  y_{i,2}  y_{i,3}  x_i
  1    0     0    0      0        0        0        0        0        0        0      1
  2    1     1    0      0        0        0        1        0        0        0      1
  3    1     0    1      0        1        0        0        0        0        0      1
  4    1     0    0      1        0        0        0        0        0        0      1
  5    2     2    0      0        0        0        2        0        0        0      1
  6    2     1    1      0        2        0        3        0        0        1      2
  7    2     1    0      2        0        3        4        0        1        2      3
  8    2     0    2      0        3        0        0        0        0        0      1
  9    2     0    1      3        4        0        0        1        0        0      2
 10    2     0    0      4        0        0        0        0        0        0      1
 11    3     2    0      5        0        6        7        0        1        3      6
 12    3     1    1      6        7        8        9        2        5        6      8
 13    3     1    0      7        0        9       10        0        3        5      6
 14    3     0    2      8        9        0        0        1        0        0      3
 15    3     0    1      9       10        0        0        2        0        0      3
 16    3     0    0     10        0        0        0        0        0        0      1
 17    4     2    0     11        0       12       13        0        6       14     20
 18    4     1    1     12       13       14       15        8       14       17     20
 19    4     1    0     13        0       15       16        0        6        9     10
 20    5     2    0     17        0       18       19        0       20       40     50

The small numbers in Figure 9 give the row and arc weights. The CF number of the walk shown in bold is 1 + 0 + 2 + 0 + 8 + 20 = 31. If the five orbitals were a, b, c, d, and e, it would represent the configuration b²de.
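The complete bookkeeping fits in a short sketch. The following Python fragment is an illustrative reimplementation (not the program's data structures): it generates the distinct rows, applies Eqs (4.8) and (4.9), and reproduces both the 50 walks and the CF number 31 of the b²de walk of Table 5:

```python
STEP = {0: (0, 0), 1: (0, 1), 2: (1, -1), 3: (1, 0)}   # d -> (da, db)

def build_drt(n, nelec, spin):
    """Distinct rows (a, b) at each level that lie on at least one walk
    from the vacuum to the head (n orbitals, nelec electrons, spin S)."""
    b_head = int(round(2 * spin))
    a_head = (nelec - b_head) // 2
    rows = [[(0, 0)]]                              # level 0: the vacuum row
    for k in range(1, n + 1):
        level = set()
        for a, b in rows[k - 1]:
            for da, db in STEP.values():
                r = (a + da, b + db)
                if r[1] < 0:
                    continue
                ra, rb = a_head - r[0], b_head - r[1]   # still to gain
                # head reachable iff ra >= 0, ra >= -rb, and the minimum
                # number of occupied steps, ra + max(rb, 0), fits in the
                # remaining n - k levels
                if ra >= 0 and ra >= -rb and ra + max(rb, 0) <= n - k:
                    level.add(r)
        rows.append(sorted(level, reverse=True))   # Table 5 order: a, b descending
    return rows

def lexical_weights(rows):
    """Forward-lexical arc weights y and row weights x, Eqs (4.8)-(4.9):
    the leftmost arc into a row gets weight 0, each later arc adds the row
    weight below the previous arc, and x counts the walks from the vacuum."""
    x = {(0, (0, 0)): 1}
    y = {}
    for k in range(1, len(rows)):
        for row in rows[k]:
            acc = 0
            for d, (da, db) in STEP.items():       # steps left to right
                lower = (row[0] - da, row[1] - db)
                if lower in rows[k - 1]:
                    y[(k, row, d)] = acc           # Eq (4.8)
                    acc += x[(k - 1, lower)]
            x[(k, row)] = acc                      # Eq (4.9)
    return x, y

def cf_number(steps, y):
    """CF number = 1 + sum of the arc weights crossed by the walk."""
    a = b = 0
    total = 1
    for k, d in enumerate(steps, start=1):
        a, b = a + STEP[d][0], b + STEP[d][1]
        total += y[(k, (a, b), d)]
    return total

rows = build_drt(5, 4, 0)                    # the Table 5 case
x, y = lexical_weights(rows)
print(x[(5, (2, 0))])                        # 50 configurations in all
print(cf_number((0, 3, 0, 1, 2), y))         # the b2de walk -> 31
```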

Numbering of External Walks

The CF numbering method provided by the GUGA is conceptually simple and efficient. But for an arbitrary DRT, it requires the explicit use of the arc weights to calculate the CF number for an arbitrary walk. This is inconvenient for vectorizing a program. Fortunately, the simplicity and regularity of the external portion of the DRT can be exploited to yield a very simple numbering scheme.

Figure 9: Shavitt Graph Corresponding to the DRT in Table 5. The small numbers represent the arc and row weights.

Figure 10: External DRT for the W, X, and Y Vertices

The internal portion of the DRT uses reverse lexical ordering, starting from the head of the graph. This results in separate numbering for the W, X, Y, and Z rows. The external space uses the forward lexical ordering discussed above.

First, there is only one external Z walk, so it has no effect on CF numbering. Then, it is as if there were a separate DRT connected to each of the external vertices, W, X, and Y. Figure 10 demonstrates how the W, X, and Y portions of an external DRT are numbered. The analytic formulas to use are obvious:

Y_i = i, (4.10)

X_{ij} = (j-1)(j-2)/2 + i, (4.11)

W_{ij} = j(j-1)/2 + i, (4.12)

where i and j are the levels at which an electron is added, with j > i for the X walks and j ≥ i for the W walks.

Symmetry actually simplifies this. Orbitals of the same symmetry are blocked together, so they can be treated as a group. For orbitals of different symmetry, the walk number becomes

W_{ij} = X_{ij} = (j-1)N_i + i, (4.13)

where N_i is the number of external orbitals having the same symmetry as orbital i, and i and j have been numbered within their symmetry blocks. See the example in Figure 11. For orbitals of the same symmetry, the equations are the same as for no symmetry.
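In code, the external numbering reduces to triangular and rectangular index arithmetic. A sketch, assuming the 1-based conventions of Eqs (4.10)-(4.13) as written above:

```python
def y_walk(i):
    """One external electron at level i, Eq (4.10)."""
    return i

def x_walk(i, j):
    """Triplet pair, i < j, Eq (4.11): triangular numbering, no diagonal."""
    assert i < j
    return (j - 1) * (j - 2) // 2 + i

def w_walk(i, j):
    """Singlet pair, i <= j, Eq (4.12): triangular numbering with diagonal."""
    assert i <= j
    return j * (j - 1) // 2 + i

def pair_walk_cross_sym(i, j, n_i):
    """Pair with i and j in different symmetry blocks, Eq (4.13):
    rectangular numbering, n_i external orbitals in i's block."""
    return (j - 1) * n_i + i

# The W walks for three orbitals enumerate the pairs 11,12,22,13,23,33:
print([w_walk(i, j) for j in (1, 2, 3) for i in range(1, j + 1)])  # 1..6
```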

4.4.5 Matrix Element Evaluation

To calculate the matrix element between two CFs, first draw the two walks corresponding to each CF. Both walks will begin at the tail and end at the head of the graph. Where the configurations are the same, the walks will either be superimposed on top of each other or parallel. Where there is a noncoincidence, the walks will either approach each other or separate. The graph formed by the two walks will consist of portions where they overlap and "loops" where they separate and rejoin. (See Figure 12.)

Figure 11: External Portion of DRT for Walks With Orbitals of Different Symmetries

Figure 12: Diagram of Two Configurations That Have at Least Three Noncoincidences. Note the three loops.

If there are three or more loops, there are more than two noncoincidences. The Slater-Condon rules state that matrix elements between CFs that differ by more than two noncoincidences will be zero. Thus the matrix element described by the walks will be zero. If there are two loops, then there will be at most contributions from two-electron integrals. More quantitative rules are required because, even if there is only one loop, the value might be zero because the noncoincidences "overlap."

Using second quantization, in a Hilbert space constructed from a one-electron basis, the Hamiltonian can be recast as

H = Σ_{i,j} a_{ij} ⟨i|h|j⟩ + Σ_{i,j,k,ℓ} a_{ijkℓ} [ij;kℓ]. (4.14)

Clearly, matrix elements consist of sums over products of integrals and coupling coefficients, a_{ij} and a_{ijkℓ}. The coupling coefficients are matrix elements of unitary group generators, E_{ij}, and of products of two generators, E_{ij}E_{kℓ} − δ_{kj}E_{iℓ}.

As can be found in reference 70, insightful use of unitary group properties eventually leads to the formula for single generators,

⟨m′|E_{ij}|m⟩ = ∏_{k=i}^{j} W(T_k, b_k), (4.15)

for a loop extending from level i−1 to level j (i < j), where T_k is the "shape" of the loop segment at level k, b_k is the b value of the m walk at level k, and W(T, b) is the value for that particular loop segment. There is a similar formula for two-generator matrix elements. (The matrix element vanishes unless the two walks coincide outside the loop.)

Only a few specific loop segment shapes have nonzero values. Using the above formula, the values of these segments can be tabulated. Using the coupling coefficients from the tables and the integral values previously determined, matrix elements can be calculated easily.

4.4.6 Matrix Element Contributions from Integrals with External Indices

Exhaustive tables of nonzero loop values have been tabulated.91 The task is to use them to organize the calculation in terms of matrix operations. Only a few illustrative examples are presented here. They suffice to demonstrate that this organization is possible. The actual changes made were based on an examination of the Fortran code. A complete reworking from the tables would be a useful project.

Four-External Matrix Elements

We first consider loops of type 1, such as in Figure 13. Assume each of the four indices i < j < k < ℓ is in a different symmetry block.

Figure 13: Representative Examples of External Portions of DRT Loops with Four External Indices and Their Integral Contributions

The CF numbers for the bra walks are given by

offset_1 + (ℓ−1)N_J + j, (4.16)

and for the ket walks by

offset_2 + (k−1)N_I + i, (4.17)

where the offsets are appropriate for the internal walk symmetries of the external indices. For fixed k and ℓ values, the i and j indices describe subvectors of the

CF list. If the integral linear combinations, I_{ijkℓ} = [ij;kℓ] + [jk;iℓ], are blocked correctly, then the operation

σ_j = I_{ijkℓ} c_i (4.18)

can be performed as an explicit matrix-vector operation. Of course, the corresponding

σ_i = I_{ijkℓ} c_j (4.19)

must be performed in order to include both the upper and lower halves of the Hamiltonian matrix.

What can be obtained by not ignoring the k and ℓ indices? Unfortunately, a matrix-matrix product is not possible. But consider (ik) = (k−1)N_I + i as a single index ranging from 1 through N_I N_K. It will be continuous because i varies from 1 through N_I and k varies from 1 through N_K. Treat (jℓ) analogously. Assuming the integral linear combinations are blocked appropriately, the integral contributions now have the form

σ_{(ik)} = I_{(ik)(jℓ)} c_{(jℓ)}. (4.20)

This is still a matrix-vector product, but now the vectors are N_I N_K and N_J N_L elements long instead of merely N_I and N_J elements long. Even for relatively small basis sets, this will move these products well into the asymptotic speed region for matrix-vector products.
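The reorganization is purely one of bookkeeping, as the following numpy sketch shows (the block dimensions are hypothetical, and zero-based compound indices replace the 1-based ones of the text); it illustrates the arrangement that, as noted below, has so far been carried through for only some loop types:

```python
import numpy as np

# Hypothetical symmetry-block dimensions for the four external indices.
N_I, N_K, N_J, N_L = 8, 6, 7, 5
rng = np.random.default_rng(0)
I4 = rng.standard_normal((N_I, N_K, N_J, N_L))   # blocked combinations I_ijkl
C = rng.standard_normal((N_J, N_L))              # ket coefficients c_(jl)

# Eq (4.18) form: one short matrix-vector product per fixed (k, l) pair.
S = np.zeros((N_I, N_K))
for k in range(N_K):
    for l in range(N_L):
        S[:, k] += I4[:, k, :, l] @ C[:, l]

# Eq (4.20) form: compound indices (ik) = k*N_I + i and (jl) = l*N_J + j
# turn the whole contribution into a single long matrix-vector product.
I2 = I4.transpose(1, 0, 3, 2).reshape(N_K * N_I, N_L * N_J)
sigma = I2 @ C.T.ravel()

print(np.allclose(sigma, S.T.ravel()))           # True: same contribution
```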

This “supermatrix-supervector” enhancement has not been implemented for most loop types. The integrals are not arranged properly. The ease and usefulness of this change were not recognized when the programs were being modified. It would be worthwhile to add this change.

When the indices are symmetry distinct, the type 2 loops are handled the same as type 1 loops, except that different integral linear combinations, I_{ijkℓ} = [ij;kℓ] + [ik;jℓ], are used. When i and j can have the same symmetry, the story changes. Now i and j can refer to the same level, leading to loops of type 4. Since type 1 and type 2 loops change into each other if i is allowed to become greater than j, it is straightforward to process both as parts of one matrix. Allow each index, i and j, to range independently from 1 through N_I (= N_J), instead of enforcing i < j. Then use an integral linear combination matrix of the form

           [ij;kℓ] + [jk;iℓ]   i < j   (Type 1)
I_{ijkℓ} = [ii;kℓ] + [ik;iℓ]   i = j   (Type 4)     (4.21)
           [ij;kℓ] + [ik;jℓ]   i > j   (Type 2)

The type 4 loops are handled as the diagonal of the integral contribution matrix.
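A sketch of how such a combined matrix could be assembled, assuming a hypothetical integral accessor two_el(p, q, r, s) for [pq;rs] and the combination rules of Eq (4.21) as reconstructed above:

```python
import numpy as np

def combined_integrals(two_el, k, l, n):
    """Fold the type-1 (i < j), type-4 (i == j), and type-2 (i > j) loops
    into one n-by-n combination matrix, as in Eq (4.21).  two_el(p, q, r, s)
    is a hypothetical accessor returning the two-electron integral [pq;rs]."""
    I = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i < j:                                    # type 1
                I[i, j] = two_el(i, j, k, l) + two_el(j, k, i, l)
            elif i == j:                                 # type 4 (the diagonal)
                I[i, j] = two_el(i, i, k, l) + two_el(i, k, i, l)
            else:                                        # type 2
                I[i, j] = two_el(i, j, k, l) + two_el(i, k, j, l)
    return I
```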

Figure 14: Representative Example of the External Portion of a DRT Loop with Two External Indices and Its Integral Contributions

Two-External Matrix Elements

Two-external integrals have two indices in the internal portion of the DRT and two in the external portion. The contribution from the internal portion of a loop is pre-tabulated and used as needed during the construction of the Hamiltonian matrix-vector product.

While having only two indices in the external part simplifies the calculation of a particular loop type, there are actually more distinct loop types. For example, consider the external portion of a two-external loop in Figure 14. The k and ℓ indices don't appear because they now reside in the internal portion of the DRT.

The existence of index p is an example of the flexibility provided by having only two external integral indices. Since it appears outside of the loop, it has no effect on the integral contribution to various matrix elements. But it does influence which matrix elements the integral contributes to, so it appears in the sums.

The formulas for the bra and ket walk numbers are very similar to those in the four-external case,

CF_bra = offset_1 + (i−1)N_P + p, (4.22)

CF_ket = offset_2 + (j−1)N_P + p. (4.23)

The integral contribution now has the form

I_{ij} = A([ij;kℓ] + ½[jk;iℓ]), (4.24)

where A is the contribution from the internal part of the loop. Now the integral contribution to the matrix element becomes

σ_{(ip)} = I_{ij} c_{(jp)}, (4.25)

in which the spectator index p labels whole blocks of CFs, so the contribution can be evaluated as a matrix-matrix product.
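Stacking the segments for all values of the spectator index p as columns turns Eq (4.25) into a single matrix-matrix product, the source of the two-external speedups reported in the next section. A schematic numpy sketch with hypothetical dimensions:

```python
import numpy as np

N_ext, N_p = 40, 25                       # hypothetical block dimensions
rng = np.random.default_rng(0)
I = rng.standard_normal((N_ext, N_ext))   # combined contributions I_ij
C = rng.standard_normal((N_ext, N_p))     # C[j, p]: ket segment for each p

# Eq (4.25) for every p at once: Sigma[i, p] = sum_j I[i, j] * C[j, p].
# The spectator index supplies the second matrix dimension, so one call
# to a matrix-matrix multiply replaces N_p separate matrix-vector calls.
Sigma = I @ C
```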

4.5 Results

Now for the important practical question: what effect did the changes discussed above have on the COLUMBUS CI diagonalization program? How much faster is the program? This question is easily addressed by comparing execution times before and after optimizing the program.

Three test cases were used for most of the timing tests. Detailed information is provided in Table 6. Briefly, they consist of a small CH2 calculation, a large-basis CH4 calculation, and a large-basis, multi-reference CH2 calculation.

Execution times are reported for these test cases in Tables 7 through 9. The numbers are broken down by the number of integral indices in the external space, as discussed earlier.

The small CH2 test case shows a relatively small speed improvement. This is due to the small basis. The number of external orbitals determines vector lengths and hence calculation speed. This test case has relatively few external orbitals. Nonetheless, a speed increase of 2x is respectable. (Actually, this calculation was not run with the final version of the program; doing so could have increased the speed improvement to about 2.5x.)

The sample CH4 calculation has a much larger basis set. The small number of configurations is the result of the calculation having only a single reference configuration. This example shows an improvement of more than 4x. This result was quite pleasing. Coupled with the large memory of a Cray computer, this speed allows very large calculations to be tackled.

The final large CH2 results demonstrate that the speed increase holds for a large-basis, large multi-reference calculation. The table reports a speed increase of

Table 6: Descriptions of the Test Cases Used to Determine the Speed of the COLUMBUS CI Diagonalization Program

Small CH2
  State           3A″ (3B1 CH2 in Cs symmetry)
  Basis           (9s5p1d/5s1p) → [3s2p1d/3s1p] (general contraction)
  Orbitals        21a′ (one frozen core), 6a″
  References      99 (complete valence space)
  Configurations  32,144

CH4 in Cs symmetry
  State           1A′
  Basis           6-311+G**(2df), i.e., (12s6p2d1f/5s1p) → [5s4p2d1f/3s1p]
  Orbitals        35a′, 23a″
  Reference       SCF
  Configurations  11,539

Large CH2
  State           3B1
  Basis           (11s7p3d2f/7s2p) → [9s6p3d2f/5s2p]
  Orbitals        34a1 (C 1s not frozen), 7a2, 15b1, 22b2
  References      51 (complete valence space)
  Configurations  517,239

Table 7: CPU Time in Seconds for the Small CH2 Test Case. Ratios with the new time are reported in parentheses.

                    Cray X-MP/24                 IBMᵇ
Programᵃ           new            old             old

CITRANᶜ
  sort             0.21      0.19 (0.9)      0.80 (3.8)
  totalᵈ           1.86      1.79 (0.9)     10.39 (5.6)
CIDIAG
  4x               1.11      1.38 (1.2)      4.11 (3.7)
  3x               0.94      1.88 (2.0)      8.14 (8.7)
  2x               2.93     12.18 (4.2)     38.91 (13.3)
  1xᵉ              2.08      2.03 (1.0)      9.23 (4.4)
  0xᵉ              0.38      0.37 (1.0)      3.15 (8.3)
  1st iter         7.48     18.88 (2.5)     64.49 (8.6)
  totalᵈ          53.49    109.06 (2.0)    400.97 (7.5)ᶠ

ᵃ CITRAN is the integral transformation program and CIDIAG is the CI diagonalization program. CIDIAG is usually the dominant computational step.
ᵇ The IBM times are for the old programs because the timing changes are expected to be negligible, except for 2x, where the time will increase due to full matrix multiplications.
ᶜ CITRAN times were included because in the new version the sort phase calculates the linear combinations needed for 4x and 3x. Thus this step is actually slower in the new program.
ᵈ Total refers to the time for the entire calculation including all overhead, such as link-editing.
ᵉ The differences in Cray times for 1x and 0x give an indication of the precision of the timings because those routines are identical in the two versions.
ᶠ This calculation actually timed out, but it was almost finished.

Table 8: CPU Time in Seconds for the Cs Symmetry CH4 Test Case. Ratios with the new time are reported in parentheses.

                       Cray X-MP/24                  IBMᵇ
Programᵃ       Pittsburgh, new   Boeing, old          old

CITRANᶜ            23.68
CIDIAG
  4x                0.55       1.98 (3.6)     10.78 (19.6)
  3x                0.07       0.21 (3.0)      2.00 (28.6)
  2x                0.18       1.36 (7.6)      9.28 (51.6)
  1x                0.03       0.03 (1.0)      0.26 (8.7)
  0x                0.00       0.00 (1.0)      0.11 (??)
  1st iter          0.83       3.57 (4.3)     22.43 (27.0)
  totalᵈ            5.21      22.03 (4.2)    139.7  (26.8)

ᵃ CITRAN is the integral transformation program and CIDIAG is the CI diagonalization program. CIDIAG is usually the dominant computational step.
ᵇ The IBM times are for the old programs because the timing changes are expected to be negligible, except for 2x, where the time will increase due to full matrix multiplications.
ᶜ CITRAN time included for comparison with the CIDIAG time.
ᵈ Total refers to the time for the entire calculation including all overhead, such as link-editing.

Table 9: CPU Time in Seconds for the Large CH2 Test Case. Ratios with the new time are reported in parentheses.

               Cray X-MP/24   Cray X-MP/24
Programᵃ        Ohio, new     Boeing, old

CITRANᵇ
  sort            12.97       12.34 (1.0)
  totalᶜ          38.38       38.16 (1.0)
CIDIAG
  4x              22.77       70.73 (3.1)
  3x              18.17       48.47 (2.7)
  2x              45.06      278.46 (6.2)
  1xᵈ             25.03       21.56 (0.9)
  0xᵈ              2.33        2.40 (1.0)
  1st iter       114.07      421.73 (3.7)

ᵃ CITRAN is the integral transformation program and CIDIAG is the CI diagonalization program. CIDIAG is usually the dominant computational step.
ᵇ CITRAN times were included because in the new version the sort phase calculates the linear combinations needed for 4x and 3x. Thus this step is actually slower in the new program.
ᶜ Total refers to the time for the entire calculation including all overhead, such as link-editing.
ᵈ The differences in times for 1x and 0x give an indication of the difference in speed of the two machines because those routines are identical in the two versions.

3.7x. If the slower effective speed of the computer used to calculate the new result is taken into account, the speed increase is again more than 4x. (The difference in computer speed is apparent in the 1x times; that routine did not change between the two versions.)

It is interesting to consider the results in a little more detail. Because they were such a small portion of the total time, the 0x and 1x portions were not optimized. Thus, their speed didn't change (unless the speed of the computer itself changed). For the multi-reference calculations, these portions now consume a non-negligible fraction of the total time. Clearly these routines now deserve attention.

The speed of the two-external portion of the program increased by between 6x and 7x. This significant speed increase is due to the extensive use of matrix-matrix multiplications. The four- and three-external portions of the program are also significantly faster, but not by as great an amount; they are around 3x faster. They take advantage only of matrix-vector products.

Two factors determine the size of a CI calculation: the basis set and the reference space. The larger the basis set, the larger the external space. Increasing the external space increases vector lengths and hence the speed of the program. Increasing the reference space increases the complexity of the internal space, but has no effect on the external space. This does not affect the speed of the calculation, only the total time. These observations are borne out by the speed improvements demonstrated by these test cases.

An obvious question is how the time of a CI calculation depends on the number of configurations. Table 10 reports the sizes and times of several CH2 calculations. Details on the calculations appear in Chapter V. As the numbers in the table and in Figure 15 indicate, the time required for a CI calculation is not far from linear in the number of configurations. The plotted curve through the singlet points shows times proportional to about the number of configurations raised to the 1.34 power, while the triplet points indicate a power of about 1.06. While this

Table 10: Time per CI Diagonalization Iteration (in CPU seconds) for Several CH2 Calculations

State  Calculationᵃ    Configurations   Time per Iteration
1A1    CVR FC              139,939             18.65
1A1    CVR HF              416,296             66.39
1A1    CVR HF-DRT          419,698             68.28
1A1    CVR                 488,448             86.28
1A1    CVR 2-SYM           610,806            111.73
1A1    CVR 4-SYM           702,366            133.62
3B1    CVR FC              190,621             30.02
3B1    CVR HF              512,514            109.51
3B1    CVR HF-DRT          517,239            114.96
3B1    CVR                 729,934            176.92
3B1    CVR 2-SYM           901,342            225.94
3B1    CVR 4-SYM         1,139,860            276.49

ᵃ Definitions for these codes are given in Chapter V.

is not exactly linear, it is not far from it. Once again, it demonstrates that very large calculations are feasible with these programs.

An interesting sidelight of the CH4 numbers is the relative times for the transformation and the diagonalization: the transformation takes longer than the CI diagonalization! Observations suggest this is a general result. The most time-consuming step for a single-reference CI calculation is the integral transformation. While it is thought that the time required for the transformation can be reduced, the conclusion is the same: single-reference CI calculations are inexpensive.

The numbers in Chapter V indicate that the CH2 singlet and triplet potential energy surfaces required about 30 hours. Increasing the CI diagonalization time by a factor of 4 would have increased this to more than 55 hours. This time does not include the many preliminary calculations that were performed. This significant increase in cost would have made the potential energy surfaces more tedious, time-consuming, and expensive to obtain.

Figure 15: Time in CPU Seconds for One Iteration of the CI Diagonalization Program. (CPU time plotted against the number of configurations for the singlet and triplet CH2 calculations.)

Finally, a few comments about speed itself, not time used. Recent Cray computers, including the Cray Y-MP, include hardware performance monitors. These monitors can record the precise number of floating-point operations performed in a given period of time, without affecting the performance of the executing program. This allows the speed of a program to be easily determined.

A large basis set calculation on C3H3 has achieved 192 Mflops.92 A complete-valence-space calculation on CH3 in C2v symmetry using a cc-pVQZ basis reached 231 Mflops.93 This calculation has 5,666,940 configurations. Since the theoretical maximum speed of a Cray Y-MP is 333 Mflops, these are very impressive speeds for an actual production program.

4.6 Further Work

These results make it clear that forethought, planning, and careful programming can produce highly optimized, yet portable, programs. Speeds of 200 Mflops can be realized on a Cray Y-MP, yet a Sun-3 can be used for debugging or program development.

4.6.1 Taking More Advantage of GUGA Organization

We have seen the usefulness of the organizational features offered by the GUGA. A simple examination of a few loop types has been used to explain the existing program organization and to point out explicit improvements that are possible.

It would be worthwhile to undertake an exhaustive examination of all GUGA loop types in the light of vectorization and matrix operation possibilities. Further improvements would probably come to light. In addition, the resulting subroutines would be very clear, well documented, and easy to maintain.

A specific example is including the internal part of the DRT in the matrix optimization. This can be useful either to change vector operations into matrix operations or simply to increase the length of the vectors processed.

4.6.2 Parallel Processing

It is unlikely that CPUs 100 times faster than a Cray Y-MP will be available in the immediate future. But joining 100 Y-MP-class CPUs is a reasonable prospect.

Taking advantage of the computing power available in a large number of CPUs is a challenging task. Yet this is the challenge that must be faced as we attempt calculations on larger and larger molecular systems.

Several aspects of the GUGA lend themselves to parallelization. First is the organization in terms of matrix products. Matrix operations achieve near-peak speed with relatively small matrices. Thus, a large matrix operation can be "stripmined" and spread across a large number of CPUs without seriously harming the performance on any one CPU.

The detailed discussion of classes of loops and integral contributions suggests that they could be handled by separate processors. In fact, some different classes would contribute to different portions of the σ vector, so they could proceed without interference.

Finally, the program allows the c and σ vectors to be segmented so that only a few portions have to be in main memory at one time. A parallel computer could take advantage of this situation by treating each segment on a different processor.

This is a wide range of parallel options. Low-level parallelization can use the relatively small amount of work in a task to do load balancing. This ensures that all processors are active at all times. High-level parallelization requires few synchronization points and so experiences less overhead. Experimentation will teach us what level of parallelization is appropriate for this program on various hardware platforms.

4.6.3 Documentation and User Interface

If we have stressed the good points of the COLUMBUS program system, the weak points must at least be acknowledged. The programs were developed with an eye on the results, and they have been maintained and improved by other results-oriented users. As a result, the documentation is mostly sketchy and incomplete. The input is organized to simplify the program, not to meet the needs of the user.

If the programs are ever to see very wide use, both of these issues will need to be addressed.

CHAPTER V

CH2 Potential Energy Surfaces

5.1 Background

Theoretical chemists have been interested in CH2 for a long time. It is the simplest polyatomic molecule with a triplet ground state, is the prototype carbene, and plays an important role as an intermediate in chemical reactions. Its small size encourages hope that accurate calculations will be feasible, while its high reactivity makes experimental observations difficult. These factors make accurate calculations both interesting and useful.

CH2 has been involved in several controversies between experimentalists and theoreticians. Since this work is an indirect result of the more recent conflict, the next few paragraphs briefly review this interesting history. More information can be found in several good review articles.94,95

The spectrum of 3B1 CH2 was first analyzed by Herzberg in 1961.96,97 While recognizing that the molecule might be bent, he concluded it was linear because of the absence of K > 0 bands. While some early calculations suggested it might be bent,98 they were not taken seriously. In fact, some theoretical studies assumed it was linear.99-101 In one investigation which did obtain a bent structure, the researchers accepted the experimental evidence for the linear structure because the bending potential was very flat.102

Bender and Schaefer made names for themselves and for theoretical chemistry when they performed a series of calculations that allowed them to insist that 3B1 CH2 was bent.103 When ESR results also indicated a bent structure,104 the vacuum ultraviolet absorption spectrum was reanalyzed. While requiring the ad hoc assumption that predissociation broadened the expected K > 0 lines beyond detection, other evidence did suggest a bent molecule.105 Of course, 3B1 CH2 is weakly bent, so "quasi-linear" is really a better description.

The more recent disagreement concerned the difference in energy between the 3B1 CH2 ground state and the 1A1 CH2 first excited state. Most experiments suggested a difference of between 8 and 9 kcal/mol, but they were indirect measurements. Using laser photodetachment photoelectron spectroscopy of CH2− ions, Lineberger and co-workers performed the first direct measurement and obtained a value of 19.5 kcal/mol.106 Since it was a direct measurement, chemists took this result very seriously.

Semi-empirical calculations provided values ranging from 0 to 44 kcal/mol.107,108 While in many situations these methods can be of great practical value and utility, they clearly could not settle this question.

Table 11: Ab initio values for the singlet-triplet splitting (in kcal/mol) for various basis sets and levels of theory.

                        Basis Setʰ
Calculationᵃ    MBS      DZ      DZP      VL

SCF            40.1ᵇ    32.4ᶜ   26.2ᶜ   24.8ᵈ
2C-SCF         31.6ᵉ    22.8ᶜ   12.8ᶜ   10.7ᶠ
SR-CI                   23.7ᶜ   14.6ᶜ   13.1ᵍ
2R-CI                   22.2ᶜ   12.2ᶜ    9.8ᶠ

ᵃ See Table 12.
ᵇ Reference 55.
ᶜ Reference 109.
ᵈ Reference 110.
ᵉ Reference 102.
ᶠ Reference 111.
ᵍ Reference 112.
ʰ Basis set abbreviations: MBS, minimal basis set; DZ, double zeta; DZP, double zeta with polarization; VL, very large basis sets, from (8s5p3…

On the other hand, as the quality improved, ab initio calculations showed a clear convergence on a value near 10 kcal/mol. This was true for improvements in either the basis set or the level of correlation. This can be clearly seen in Table 11, which is a portion of Table 5 in reference 94. The abbreviations for the types of calculations are defined in Table 12. More detail on the types of calculations will be given later.

Table 12: Abbreviations Used to Identify Different Types of Calculations

Basic Calculation Types:ᵃ

SCF       Self-Consistent Field
2C-SCF    Two-Configuration Self-Consistent Field for the singlet (SCF for the triplet)
SR-CI     Single-Reference Configuration Interaction
2R-CI     Two-Reference Configuration Interaction for the singlet (SR-CI for the triplet)
CVS-SCF   Complete-Valence-Space SCF (complete-active-space SCF with all valence electrons and orbitals in the active space)
MR-CI     Multireference Configuration Interaction
CVR-CI    Complete-Valence-Reference CI (complete active space with all valence electrons and orbitals is the reference for the MR-CI)

Modifiers to Basic Calculation Types:ᵇ

FC        Frozen Core
HF        Hartree-Fock Interacting Space Limitations
HF-DRT    Hartree-Fock Interacting Space Limitations, but only deleting rows and arcs, not individual walks
MF        Uses orbitals from an MCSCF performed with Frozen Core
CV        Core-Valence: only single excitations allowed from core orbitals
2-SYM     References of Two Symmetries are Allowed. These are the two C2v symmetries that correlate with the Cs symmetry: A1 and B2 for the singlet and B1 and A2 for the triplet.
4-SYM     References of all Four C2v Symmetries are Allowed
HF2       HF and 2-SYM
HF-DRT2   HF-DRT and 2-SYM
Dav       Davidson's correction

ᵃ Only one of these will be used at a time.
ᵇ More than one of these may be used. A few are already combinations of two others.

This table makes several points clear. First, the basis set needs to be of at least DZP quality; smaller basis sets are inadequate. Second, the singlet state needs at least two configurations to be treated equivalently with the triplet state. The reason for this is discussed later. Finally, correlation has a significant effect on the singlet-triplet splitting. The net result is that a 2R-CI calculation with a DZP basis set provides an answer quite close to the correct value. One can feel confident in this value because of the clear pattern of convergence.

This controversy encouraged researchers to perform additional direct measurements of the singlet-triplet splitting.113,114 One of the more interesting ones was performed by Lee and co-workers.114 An excimer laser dissociated ketene. The velocities of the fragments were obtained using an ionizer and a quadrupole mass selector. This allowed the heats of formation of singlet and triplet methylene to be calculated. The singlet-triplet splitting obtained was 8.5 ± 0.8 kcal/mol.

A direct spectroscopic measurement of the singlet-triplet gap was obtained in the Laser Magnetic Resonance (LMR) spectroscopy work of McKellar and coworkers.115 A perturbation was found in the triplet levels due to a nearby singlet level, which allowed the relative positions of the triplet and singlet energy ladders to be determined. Later, a direct singlet-triplet transition was found.

LMR spectroscopy uses a powerful magnetic field and the Zeeman effect to tune molecular transitions into resonance with a particular laser line. This very sensitive technique is ideal for studying highly reactive triplet species, such as CH2.

Unfortunately, the interpretation of LMR results is very difficult. One needs a good idea of the correct answer in order to limit the possible assignments that must be considered. High-quality theoretical calculations can provide a good starting point for the LMR analysis. The following work is one effort to provide that information.

Now back to the photoelectron spectrum that began this discussion. Using a flowing afterglow ion source, this spectrum was remeasured. This time lines from vibrationally excited states of CH2−, or "hot" bands, were missing. From the remaining lines, it was easy to determine that the triplet-singlet splitting was about 9 kcal/mol.116 This is yet another example in the history of science of the challenges faced when attempting to interpret experimental results.

5.1.1 Electronic Structure

To understand the electronic structure of CH2, let us begin by looking at a chart of the lowest orbitals (Table 13). The molecule is oriented as indicated in Figure 16.

Table 13: CH2 Orbitals

1a1   C 1s                            Carbon inner shell
2a1   C(2s + 2p_z) + H_a1s + H_b1s    Symmetric C-H bond
1b2   C 2p_y + H_a1s − H_b1s          Asymmetric C-H bond
3a1   C(2s − 2p_z)                    In-plane nonbonding orbital
1b1   C 2p_x                          Out-of-plane nonbonding orbital

Figure 16: Orientation of CH2

The 1a1 orbital is the carbon 1s inner-shell orbital. The 2a1, 1b2, and 3a1 orbitals come from an sp² hybridization on carbon. Six of the eight electrons fill the