Chapter 1

Introduction

"If you teach a computer to write a piece of music by feeding it an algorithm, have you composed the resulting piece or has the computer?" - Alexander Gelfand; The Sounds of Science; The Walrus (Toronto, Canada); Jun 2007

1.1 A Journey through Possibilities in Scientific Computing

In broad terms, Scientific Computing is the application of computers to solving a variety of computational problems in Science. This ever-expanding field, originally oriented purely towards applications of computing methods to Science, has matured over the past decade or so and has, in fact, contributed many original algorithms and techniques to the general field of Computer Science. With advances in modern scientific techniques enabling better cures for diseases, a deeper view into the origin of our universe and better control of tiny electronics for designing the next generation of computing devices, the related data and the complexity of handling its interpretation have become a major challenge. At this juncture, methods in Computer Science and Scientific Computing have helped scientists gain a faster and better understanding of the data at their disposal. Today, the methods of Scientific Computing have a far-reaching impact on the way Science is understood, explored and put to daily use in everyone's life.

The use of the computer as a tool for scientific discovery has largely been made possible by immense technological advances in the development of the Central Processing Unit (CPU), commonly referred to as the processor. In the latter half of the last decade, these developments have brought the power of a supercomputer to a desktop machine requiring far less infrastructure (in terms of electrical power and cooling) for tapping into its computational resources. The development of IBM's Cell processor[1, 2, 3] and multi-core processors from Intel[4], AMD[5] and Sun Microsystems[6] has brought to the desktop what was possible only on a supercomputer a few years ago: Parallel Programming[7, 8].

In recent years, computing power has slowly started moving away from the mere desktop to the living room (set-top boxes, high-end gaming stations such as the PS3[9] and X-Box[10]) and to mobile devices, which has opened up a whole new possibility of utilizing the computing power offered by these devices for scientific exploration. This trend is evident from the recent release of the Folding@Home clients for the PS3 and X-Box gaming machines[11]. These new generation gaming devices are, in fact, powered by highly parallel vector processors such as IBM's Cell processor, which in turn provides a very attractive platform for high-performance scientific code. One of the first codes to take advantage of this new architecture is the Folding@Home project. This project uses a highly distributed setup of voluntary contributions from a number of PC or PS3 owners to study the protein folding problem, largely in relation to various disorders affecting humans[12]. There is also a trend towards using the auxiliary power offered by a graphics processing unit (GPU) to do some computations asynchronously with the main CPU[13]. Out of curiosity, the author himself has been exploring the use of programmable mobile devices for performing small quantum chemical calculations[14].

So much is the thrust in the field of Scientific Computing that the leading peer-reviewed journal Nature carried a special news feature series titled "2020 Computing"[15]. In this series of articles, scientists working in various fields evaluated the possible uses of computers in the coming years based on current trends and research. Fields as diverse as Geography, Earth and Seismic Science, Astronomy, Economic and Financial analysis and even Psychology increasingly use massive computing power which, many a time, handles real-time physical data. Apart from conventional semiconductor-based computers, there is also a trend towards building highly parallel computers based on quantum theory[16, 17]. These computers use a Quantum Bit (generally known as a qubit) as the building block rather than a conventional Binary Digit (referred to as just a Bit). A qubit, in comparison to its classical counterpart, can take three kinds of states: 1, 0 and a superposition of 1 and 0, potentially allowing a vast amount of information to be represented in a single such quantum bit. Further, these computers are claimed to be far superior at solving certain problems that are very difficult or not solvable using present-day technology[18].

All these interesting aspects of Scientific Computing are the basic propelling factors behind the author's interest in investigating them, though in this case restricted to quantum chemical methods and related visualization tools. The current work focuses on the scaling problem of current ab initio quantum chemical codes. It also delves into the development of Web-based tools and an integrated development environment (IDE) specifically tailored for structure-based computational chemistry.

1.2 Codes, Scalability and Linear Scaling Algorithms

Over the past decade or so, computational chemistry has played a major role in assisting the experimental chemist, either in explaining the results of experiments or in discovering unexplored chemical pathways. In recent years, however, the computational chemist has come to play a more proactive role as a medium of future discoveries and the design of novel molecules and clusters.

[Figure 1.1 appears here. The schematic orders methods by computational cost: empirical force-field (FF) and semi-empirical methods can handle system sizes of roughly O(10000) down to O(1000) atoms, while ab initio HF, MP2 and CC methods are restricted to roughly O(100), O(10) and a few atoms respectively; time/storage complexity increases in the same direction as the system size that can be treated decreases.]

Figure 1.1: Complexity and sophistication of various methods used in structure based computational chemistry.

Of the various methods available to a computational chemist, ab initio quantum chemical methods enjoy wider acceptance for their ability to predict experimental results quite accurately. Other methods, such as molecular mechanics[19, 20] and semi-empirical[21, 22, 23] methods, are fast but not as reliable, being based on a number of assumptions. Ab initio quantum chemical methods, on the other hand, provide approximate solutions to the Schrödinger equation (Eq. 1.1) which are accurate enough to describe the interactions in a many-particle system. In Eq. 1.1, H is the Hamiltonian, E represents the total energy and Ψ represents the molecular wavefunction.

HΨ = EΨ    (1.1)

However, these approximate solutions, such as the Hartree-Fock (HF) equations, have a very high scaling factor (typically O(N^3)) with respect to the number of basis functions N used to approximate the atomic orbitals[24, 25]. More accurate methods which take electron correlation effects into consideration, such as Møller-Plesset second order perturbation theory (MP2) and coupled cluster (CC) methods, have an even worse scaling factor of O(N^5) or higher[24, 26]. This scaling is not only in terms of compute requirements but also in terms of memory requirements, which further restricts the use of these methods to only a few hundred atoms for the HF method and to only a few tens of atoms for correlated (or post-HF) methods. The schematic representation in Fig. 1.1 sums up the memory and time complexities of various methods for a single point evaluation of the energy. This also clearly brings out the need for improvements in the current conventional methods and codes if they are ever to be routinely applied to larger systems of chemical and biological interest. The ensuing Sections bring out some methodological and computational details pertaining to the ab initio quantum chemical methods investigated in this thesis.
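To get a rough feel for what these scaling factors mean in practice, the short illustration below (not taken from any particular code; the sizes are arbitrary) computes how the formal operation counts quoted above grow when the number of basis functions is doubled:

```python
# Illustrative only: relative growth of formal operation counts with basis
# set size for the methods discussed in the text. Numbers are schematic
# growth factors, not timings.

def relative_cost(n_small: int, n_large: int, order: int) -> float:
    """Factor by which an O(N^order) step grows when N increases."""
    return (n_large / n_small) ** order

for order, label in [(2, "one-electron integrals"),
                     (3, "diagonalization"),
                     (4, "two-electron integrals (ERIs)"),
                     (5, "MP2 transformation")]:
    print(f"{label}: doubling N costs {relative_cost(100, 200, order):.0f}x more")
```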

1.2.1 Hartree-Fock, Møller-Plesset and Density Functional Theory

The goal of all quantum chemical methods applied to many-electron systems is to calculate the chemical properties of molecular systems from first principles. This essentially involves solving the Schrödinger equation (Eq. 1.1), as accurately as possible, to determine the wavefunction Ψ.

Hartree-Fock Method

The time-independent Schrödinger equation cannot be solved exactly except for the simplest systems, so for practical applications a series of approximations has been introduced. One of the most important of these is the Born-Oppenheimer approximation[27], wherein the motions of the electrons and the nuclei are treated separately owing to the enormous difference in their masses. Another major simplification, based on the Born-Oppenheimer approximation, was introduced by Hartree and Fock[28, 29]. The so-called canonical Hartree-Fock equations are given as:

F ψ_i = ε_i ψ_i    (1.2)

where F is known as the Fock operator and the index i runs over the wavefunctions used to represent the system, whose orbital energies are ε_i. Roothaan and Hall[30, 31] derived a particular case of the Hartree-Fock equations by expanding the molecular orbitals ψ_i as linear combinations of atomic orbitals (AOs) φ, which are in turn expanded in a fixed set of basis functions. Writing the expansion over the basis functions φ_μ as

ψ_i = Σ_μ C_μi φ_μ    (1.3)

the Hartree-Fock equations can be represented in a matrix form known as the Roothaan equations. For closed shell systems they are of the form:

FC = SCε    (1.4)

In the above equation, F is the Fock matrix, C is the expansion coefficient matrix and S is the overlap matrix, while ε contains the eigenvalues representing the orbital energies of the system. All the matrices are of dimension N × N for a molecular system represented with N basis functions. Eq. 1.4 is a form of pseudo-eigenvalue problem and hence cannot be solved directly; an iterative method is employed for solving it. This method involves providing a trial function as an initial guess and then successively improving it using the Self-Consistent Field (SCF) procedure. The basic computational steps involved in the HF SCF procedure are depicted in Fig. 1.2. The major task in the SCF procedure is evaluating the one- and two-electron integrals over the basis functions φ, which are needed at each iteration of the SCF. However, the integrals themselves do not change during the SCF procedure. Of the two types of integrals, the two-electron integrals, also termed electron repulsion integrals (ERIs), are the more compute intensive. While ERIs have a complexity of O(N^4), for the one-electron integrals it is O(N^2). In fact, ERI evaluation is the most computationally expensive as well as storage-intensive part of the SCF procedure. Once the integrals are evaluated, the overlap matrix S and a guess density matrix P are set up. Next, the initial Fock matrix F is set up from the one-electron Hcore matrix, the density matrix P and the two-electron integrals < ij|kl > as follows:

F_ij = H^core_ij + Σ_kl P_kl [ < ij|kl > - (1/2) < ik|jl > ]    (1.5)

The Fock matrix is then diagonalized to obtain the MO coefficient matrix C, which is then used to refine the guess P matrix. This iterative process is continued till the difference between the old P matrix and the newer refined matrix is below a threshold value. The total energy of the system is then computed as:

E = E_NN + (1/2) Σ_ij P_ij H^core_ji + (1/2) Σ_ij P_ij F_ji    (1.6)

[Figure 1.2 appears here. The flowchart runs: Start; read nuclear coordinates etc.; calculate {S_ij}, {h_ij} and the two-electron integrals; diagonalize S and form X = S^(-1/2); guess P^(0); set up the initial Fock matrix F; form F' = X^T F X and diagonalize it (F'C' = C'E), obtaining C = XC'; form P^(new) from C; if not converged, repeat from the Fock build; finally write the MOs (matrix C) and P^(n); Stop.]

Figure 1.2: Flowchart for the iterative SCF procedure of a Hartree-Fock calculation at a fixed geometry. The following notation is used: S represents the overlap matrix, h_ij are the elements of the Hcore matrix, F represents the Fock matrix, P denotes the density matrix and C represents the MO coefficient matrix. See text for details.

In the above equation, the E_NN, P_ij H^core_ji and P_ij F_ji terms represent the nuclear-nuclear repulsion energy, the nuclear-electron attraction energy and the two-electron repulsion energy, respectively. Note that the diagonalization procedure (which typically scales as O(N^3)) is also one of the computationally expensive parts of the SCF procedure. However, this cost is far less than that of evaluating the two-electron integrals. Section 1.2.2 provides a few more details on the computational and storage requirements.
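The SCF cycle of Fig. 1.2 can be summarized in a few lines of code. The following is a minimal, dense-matrix Python sketch under stated assumptions (an in-core ERI array in the < ij|kl > convention of Eq. 1.5, and the hypothetical argument names in the docstring); it is not the implementation used by GAMESS or any other production code, which use far more elaborate integral-driven schemes:

```python
import numpy as np
from scipy.linalg import eigh  # solves the generalized problem F C = S C e

def scf(h_core, s, eri, n_occ, max_iter=50, tol=1e-6):
    """Minimal closed-shell SCF sketch following Fig. 1.2.

    h_core : (N, N) one-electron Hcore matrix, s : (N, N) overlap matrix,
    eri    : (N, N, N, N) array of < ij|kl > integrals,
    n_occ  : number of doubly occupied orbitals.
    """
    n = h_core.shape[0]
    p = np.zeros((n, n))                      # initial guess density
    e_old = 0.0
    for _ in range(max_iter):
        # F_ij = Hcore_ij + sum_kl P_kl [<ij|kl> - 1/2 <ik|jl>]  (Eq. 1.5)
        j = np.einsum('ijkl,kl->ij', eri, p)  # Coulomb contribution
        k = np.einsum('ikjl,kl->ij', eri, p)  # exchange contribution
        f = h_core + j - 0.5 * k
        _, c = eigh(f, s)                     # diagonalize F in the S metric
        c_occ = c[:, :n_occ]
        p = 2.0 * c_occ @ c_occ.T             # refined density matrix
        e_el = 0.5 * np.sum(p * (h_core + f)) # electronic part of Eq. 1.6
        if abs(e_el - e_old) < tol:           # converged on the energy
            break
        e_old = e_el
    return e_el, c, p
```

Adding the nuclear repulsion E_NN to the returned electronic energy gives the total energy of Eq. 1.6.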

Møller-Plesset Method

While the HF approximation is extremely successful in many cases, it has some known limitations[24]. A number of methods, generally termed post-HF methods, provide more accurate descriptions of chemical systems. It must be emphasized here, however, that the HF SCF procedure forms the basis of all post-HF methods. In other words, all post-HF methods are corrections introduced to the HF energy and are not a completely different theory. Examples of such post-HF methods include configuration interaction (CI), Møller-Plesset perturbation theory (MPPT) and coupled-cluster (CC) methods[24, 32]. In the present work, one such method, viz. Møller-Plesset perturbation theory, is considered. MPPT is a perturbation method wherein the zeroth order term is the HF wavefunction. The perturbation is the difference between the approximate Hamiltonian and the actual molecular Hamiltonian. When such a perturbation expansion is carried out, corrections to the HF-SCF energy and wavefunction to various orders can be obtained. A perturbation expansion truncated at order 2 is called second order Møller-Plesset perturbation theory (MP2). The present work restricts its investigations to the MP2 level of theory. The MP2 correction to the energy is given by:

E^(2) = (1/4) Σ_{p,q,r,s} | < pq||rs > |^2 / (ε_p + ε_q - ε_r - ε_s)    (1.7)

where < pq||rs > = < pq|rs > - < pq|sr >, with p, q running over occupied and r, s over virtual spin orbitals.

Here ε_p is the SCF energy of orbital p. The integrals < pq|rs > are termed superintegrals and are formed by transformation of the AO-level integrals. These transformed integrals (also termed MO-level integrals) are the same two-electron integrals required for the SCF procedure, except that they are now transformed from the AO basis to the MO basis using the MO coefficients obtained from the SCF procedure. The transformation is given as follows:

< pq|rs > = Σ_i Σ_j Σ_k Σ_l C_pi C_qj C_rk C_sl < ij|kl >    (1.8)

Here p, q, r, s are MO indices, i, j, k, l are AO indices and C is the MO coefficient matrix. The MO indices take values over all the occupied as well as virtual orbitals, while the AO indices run over all the basis functions. Evaluation of all < pq|rs > as per Eq. 1.8 has a computational complexity of O(N^8). However, it can be rearranged as a series of partial transformations so that the complexity is reduced to O(N^5). One commonly employed scheme, in which four quarter transformations are performed, is shown below:

< pj|kl > = Σ_{i=1}^{N} C_pi < ij|kl >    (1.9)

< pq|kl > = Σ_{j=1}^{N} C_qj < pj|kl >    (1.10)

< pq|rl > = Σ_{k=1}^{N} C_rk < pq|kl >    (1.11)

< pq|rs > = Σ_{l=1}^{N} C_sl < pq|rl >    (1.12)

Note that each of the quarter transformations involves O(N^5) terms. This scaling can be understood from the fact that Σ_{i=1}^{N} C_pi < ij|kl > (i.e. the RHS of Eq. 1.9) involves N operations and is varied over p, j, k, l. The index p takes values over all occupied as well as virtual orbitals, say N', while j, k, l take values over all N basis functions. Thus the total number of computations is N'·N^4 and hence, in general, a quarter transform scales as O(N^5). Overall, the full transformation requires 4·O(N^5) evaluations. For the above equations, the intermediate integrals can be stored either in main memory or in secondary storage. The minimum storage requirement of such a quarter transform is O(N^3), owing to the fact that the intermediate storage array needs to accommodate about N^3 elements for each of the N' transformed indices.
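A minimal sketch of the four quarter transformations (Eqs. 1.9-1.12) and the subsequent MP2 sum is given below, assuming an in-core AO integral array; the function names and arguments are illustrative only. Note that the MP2 routine shown uses the closed-shell spatial-orbital form of the correction rather than the spin-orbital form of Eq. 1.7:

```python
import numpy as np

def ao_to_mo(eri_ao, c):
    """Four O(N^5) quarter transformations of Eqs. 1.9-1.12.

    eri_ao : (N, N, N, N) array of AO integrals < ij|kl >,
    c      : (N, N) MO coefficient matrix, c[i, p] = C_pi.
    """
    # Each einsum contracts one AO index with C at O(N^5) cost; the
    # one-shot transform of Eq. 1.8 would instead cost O(N^8).
    q1 = np.einsum('ip,ijkl->pjkl', c, eri_ao)   # Eq. 1.9
    q2 = np.einsum('jq,pjkl->pqkl', c, q1)       # Eq. 1.10
    q3 = np.einsum('kr,pqkl->pqrl', c, q2)       # Eq. 1.11
    return np.einsum('ls,pqrl->pqrs', c, q3)     # Eq. 1.12

def mp2_energy(eri_mo, eps, n_occ):
    """Closed-shell, spatial-orbital variant of the MP2 sum (cf. Eq. 1.7).

    eri_mo : MO-level integrals from ao_to_mo (same < pq|rs > convention
             as the text), eps : orbital energies from the SCF step,
    n_occ  : number of doubly occupied orbitals.
    """
    e2 = 0.0
    n = len(eps)
    for i in range(n_occ):
        for j in range(n_occ):
            for a in range(n_occ, n):
                for b in range(n_occ, n):
                    iajb = eri_mo[i, a, j, b]
                    ibja = eri_mo[i, b, j, a]
                    e2 += iajb * (2.0 * iajb - ibja) / (
                        eps[i] + eps[j] - eps[a] - eps[b])
    return e2
```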

Density Functional Method

Density Functional Theory (DFT) is one of the most widely used ab initio methods, as it provides a correlation correction to the energy with a timing factor similar to HF. The basic foundation of DFT was laid by the formulation of the celebrated Hohenberg-Kohn (HK) theorems[33], in which the electron density plays the role of the basic variable. They can be stated as:

(1) The external potential is determined, within an additive constant, by the ground state electron density, and conversely.

(2) The energy due to any normalized, non-negative trial density that satisfies certain conditions is variational.

As an implication of the above theorems, the energy can be expressed as a functional E[ρ(r)] of the ground state density ρ(r). In DFT, the electronic energy functional is expressed in terms of contributions from the kinetic energy (T), the external potential (V) and the electron-electron interaction energy (U) as: E[ρ] = T[ρ] + V[ρ] + U[ρ]. A practical, orbital-based solution to this formulation was provided by Kohn and Sham[34] with the introduction of orbitals in 1965. Applying the HK theorems with respect to the KS orbitals yields the KS equations:

( -(1/2)∇^2 + V_eff(r) ) ψ_i(r) = ε_i ψ_i(r)    (1.13)

The effective potential V_eff is a sum of the external potential V_ext, the electron-electron repulsion term and the exchange-correlation potential V_xc, as given below:

V_eff(r) = V_ext(r) + ∫ ρ(r')/|r - r'| dr' + V_xc(r)    (1.14)

The KS equivalent of the HF matrix form (Eq. 1.4), using a finite basis expansion, can now be written as:

H^KS C = SCε    (1.15)

Here H^KS resembles F from the HF equations (Eq. 1.4). This term, as indicated in Eq. 1.13 and Eq. 1.14, comprises the contributions from the two-electron integrals (as in the case of HF) and involves two additional terms: exchange and correlation. Note that the exchange and correlation term in H^KS is not known exactly and is derived from E_xc. This results in an SCF procedure similar to that of HF, although each iteration is more expensive than its HF counterpart, as it involves numerical integration for evaluating the exchange-correlation integrals. The most widely used exchange-correlation functionals are the so-called hybrid functionals, which combine HF exchange with the exchange-correlation functionals of DFT. The present thesis uses one such popular functional, Becke's three-parameter exchange functional, also called B3LYP[35], but the methods presented could be used with any other functional without any loss of generality. The B3LYP model is defined by:

E_xc^B3LYP = (1 - a) E_x^LSDA + a E_x^HF + b ΔE_x^B88 + (1 - c) E_c^LSDA + c E_c^LYP    (1.16)

In the above equation, a, b and c are taken to be 0.20, 0.72 and 0.81, respectively. E_x^LSDA and E_c^LSDA are the local spin density approximation exchange and correlation energies, E_x^HF is the HF exchange term, while ΔE_x^B88 and E_c^LYP are the DFT exchange and correlation terms provided by Becke and by Lee, Yang and Parr, respectively. For more elaborate details regarding the theory and application of these methods, the reader is directed elsewhere[24, 26, 36].

1.2.2 Ab initio one-electron Property Evaluation, Geometry Optimization and its Scalability

The methods described in the previous section can be used for describing various one-electron properties of a molecular system. These one-electron properties, like the electron density, electrostatic potential, electronic moments etc., can be computed from the molecular wavefunction or the density matrix obtained from the SCF procedure. Computation of a property itself is an O(N^2) process, which is far less expensive than obtaining the density matrix or the wavefunction, as evident from the previous section. The O(N^2) complexity comes from the fact that the basic operation involved in all property evaluations is the binary product over the N basis functions. For instance, the molecular electrostatic potential (MESP) at a point r can be computed from the ab initio molecular wavefunction as:

V(r) = Σ_A Z_A / |r - R_A| - Σ_ij P_ij ∫ φ_i(r') φ_j(r') / |r - r'| dr'    (1.17)

Here Z_A is the charge on nucleus A located at R_A, P_ij is an element of the density matrix and φ_i, φ_j are the basis functions.
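A sketch of how Eq. 1.17 translates to code is shown below. The nuclear term is computed directly, while the electronic term is assumed to come from precomputed attraction-type one-electron integrals (the hypothetical v_ints array, as would be supplied by an integral library); it is the O(N^2) contraction of these with the density matrix that dominates the cost:

```python
import numpy as np

def mesp_at_point(r, nuclei, p, v_ints):
    """Sketch of Eq. 1.17: MESP at a point r.

    nuclei : list of (Z_A, R_A) tuples (charge, position array),
    p      : (N, N) density matrix from the SCF procedure,
    v_ints : (N, N) array of integrals int phi_i(r') phi_j(r') / |r - r'| dr'
             for this r, assumed precomputed by an integral library.
    """
    v_nuc = sum(z / np.linalg.norm(r - pos) for z, pos in nuclei)
    v_el = np.sum(p * v_ints)   # the O(N^2) contraction noted in the text
    return v_nuc - v_el
```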

1.2.3 A note on Geometry Optimization methods

Prediction and reproduction of molecular conformation has been one of the most successful applications of quantum chemistry[24, 26, 44, 45, 46, 47]. Even with minimal basis sets, quantum chemical programs have been able to reproduce bond lengths accurate to ±0.02 Å and bond angles to ±5° in a number of cases[24]. Application of larger basis sets and better levels of theory that incorporate electron correlation effects can now produce geometries that challenge crystallographic data for accuracy. Even though there is a strong case for using quantum chemical calculations to predict molecular structure, the process is an extremely demanding task. For M atoms, the energy is a function of 3M - 6 (or 3M - 5) vibrational degrees of freedom. An exhaustive search for the minima might need to visit all "3M" regions; however, this thesis restricts itself to finding the best possible local minima. From the previous Sections it is clear that the energy of a molecule E is a parametric function of the nuclear positions X = (X_1, X_2, ..., X_3M). A geometry optimization procedure involves moving from X to X_new such that E(X_new) < E(X). The energy can then be expanded in a Taylor series[24, 44] about X as:

E(X_new) = E(X) + q^t f(X) + (1/2) q^t H(X) q + ...    (1.18)

Here q is the displacement, q = (X_new - X). The energy gradient f is given as:

f_i = ∂E(X)/∂X_i    (1.19)

The corresponding Hessian matrix element is given by:

H_ij(X) = ∂^2 E(X) / (∂X_i ∂X_j)    (1.20)

Although the Taylor series is infinite, the quadratic form is an accurate enough description near an extremal position (say X_e). The extremum point is by definition characterized by f(X_e) = 0, i.e. the derivative of the energy w.r.t. the nuclear positions vanishes. Thus, for X in the neighbourhood of X_e,

E(X) = E(X_e) + (1/2) q^t H(X_e) q    (1.21)

The gradients can also be written as:

f(X_new) = f(X) + H(X) q    (1.22)

When X_new = X_e, f(X_e) = 0 and the above equation becomes:

f(X) = -H(X) q    (1.23)

The solution of Eq. 1.23 leads to some of the most efficient procedures used to find extrema of functions of several variables for which the functional form of E(X) is not explicit in X. With the assumption that H is nonsingular, the displacement q, which leads to a solution X_e from any given X "near enough" for the energy function to be nearly quadratic, can be computed as:

q = -H^(-1)(X) f(X)    (1.24)

All the algorithms basically use the above equation to decide the direction and the step size to be taken from a starting point X in search of an extremum. In the case of the HF and MP2 methods, analytic calculation of the derivatives and the Hessian is possible and is generally used instead of numerically differentiating the energy[26]. For DFT, however, it is common to use numerical derivatives of the energy, though some quantum chemical codes do provide analytic derivatives. One of the earliest works on analytic derivatives for HF was reported by Handy et al. and implemented in a once popular package called MICROMOL[48]. MICROMOL in itself started a sort of revolution in quantum chemistry codes (QCC) by targeting its development at microcomputers and PCs, in contrast to the mainframes on which all other popular codes were based. Later on, this trend was followed by others like TURBOMOLE[49], Gaussian[50] and GAMESS[51], which were ported to the PC platform as its power and availability grew dramatically during the 1990s. Once the gradients and Hessian are defined, the necessary and sufficient conditions for a minimum energy structure are:

f_i = 0    (1.25)

H_ij > 0    (1.26)

However, in actual practice, f_i is checked against a "near zero" threshold (say |f_i| < ε, where ε = 10^-4) rather than for equality with zero. The algorithms that drive a given guess geometry towards the minimum energy structure can be broadly classified as:

(1) those that work without gradients

(2) those employing numerical gradients and second derivatives

(3) those based on analytic gradients and numerical second derivatives

(4) those using analytic gradients and second derivatives

A number of such methods are described in detail elsewhere[52]; a few relevant methods are described here. Of the types of methods enumerated above, quantum chemical programs generally use type 3 algorithms. Type 4 methods are the most reliable ones, as they utilize the most information about the energy function; however, they turn out to be far too expensive for routine application to geometry optimization. Type 1 algorithms are among the simplest to implement.

Direct function based methods

Methods such as simplex optimization[52], for instance the one given by Nelder and Mead[53], fall under this category. A brief algorithmic description of this method is presented here:

(1) The algorithm begins by constructing a simplex for the optimization problem. A simplex is a geometrical construct with M + 1 interconnected vertices, where M is the dimensionality of the energy function (i.e. the number of coordinates representing the molecular system). For a problem involving minimization over three variables, a tetrahedral simplex is used. Similarly, for a non-linear arrangement of 3M Cartesian coordinates a simplex of 3M + 1 vertices is used; if internal coordinates are used instead, the simplex has 3M - 5 vertices.

(2) Once the simplex is set up, the algorithm locates a minimum by moving around on the potential energy surface in a manner akin to the motion of an amoeba[52]. There are three basic kinds of moves possible:

(a) The first move involves reflection of the vertex with the highest value through the opposite face of the simplex, so as to attempt to generate a new point that is lower in value.

(b) If this new point is found to be lower than the previous one, the reflection is followed by an expansion; the second kind of move. When the optimizer reaches the "bottom of a valley", the above moves would leave it stuck. To avoid this, the simplex contracts along the direction of the highest-valued vertex; again in an attempt to "zero in" on the minimum.

(c) If this move too fails, a third kind of move is possible in which the simplex contracts in all directions.

(3) Step 2 is repeated until no further moves are possible or none of the moves gives a lower energy configuration. At this juncture the simplex algorithm is said to have converged.

The simplex method is attractive because of its relatively straightforward implementation and because it requires no gradients whatsoever. However, the disadvantages of this method completely outweigh its usefulness for routine application to ab initio geometry optimization. Of the many disadvantages, the prime one is setting up the simplex itself, which requires 3M + 1 energy evaluations. With each energy evaluation being a very expensive proposition for QC methods (see Section 1.2.1), the simplex algorithm turns out to be inefficient and impractical for geometry optimization.
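For illustration, the sketch below drives SciPy's Nelder-Mead implementation on a cheap analytic surrogate (the toy_energy function is invented for this example); as argued above, building the initial simplex alone would cost 3M + 1 real ab initio energy evaluations, which is exactly why the method is impractical for QC geometry optimization:

```python
import numpy as np
from scipy.optimize import minimize

def toy_energy(x):
    # Cheap analytic stand-in for an ab initio energy surface.
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.cos(5 * x))

x0 = np.zeros(6)  # e.g. 3M - 6 internal coordinates of a 4-atom system
res = minimize(toy_energy, x0, method='Nelder-Mead',
               options={'xatol': 1e-6, 'fatol': 1e-6})
print(res.x, res.fun)
```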

Methods using gradients

Gradient-based methods overcome this problem by taking no more than M steps to reach a minimum for a system with M variables. Of the various gradient-based methods, the steepest descents and conjugate gradient minimization procedures[52] are among the most popular. In the steepest descents method, steps are taken in a direction parallel to the net force, which is analogous to walking straight downhill. For a molecular system with 3M Cartesian coordinates, this direction can be represented using a 3M-dimensional unit vector s_k = -f_k/|f_k|, where f_k is the gradient of the energy w.r.t. the coordinates at point k. While s_k provides the direction of movement, the amount of movement needed to locate the minimum can be decided either arbitrarily or using a simple line search procedure. In the conjugate gradient method, the gradients at each point are orthogonal but the directions are conjugate. A set of conjugate directions has the property that for a quadratic function of M variables, the minimum will always be reached in M steps. The conjugate gradient method moves in a direction v_k from the starting point X, where v_k is computed from the gradient at that point and the previous direction vector v_(k-1) as:

v_k = -f_k + γ_k v_(k-1)    (1.27)

Here γ_k is a scalar given by:

γ_k = (f_k · f_k) / (f_(k-1) · f_(k-1))    (1.28)

Note that Eq. 1.27 can only be applied from the second step onward; the first step is taken as in the steepest descents method, either arbitrarily or using a simple line search procedure. The gradient-based methods described above provide quite good performance in terms of the number of steps required to reach the minimum, which is generally M if the function depends on M variables. As the direction of the gradient is determined by the largest of the interatomic forces, the method is generally robust even if the initial configuration is far away from the extremum.
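A minimal sketch of this conjugate gradient scheme (Eqs. 1.27-1.28, with the Fletcher-Reeves form of γ_k) is shown below; the grad callable and the fixed step length alpha, standing in for a proper line search, are assumptions of the illustration:

```python
import numpy as np

def conjugate_gradient(grad, x, n_steps, alpha=0.1):
    """Sketch of the conjugate gradient scheme of Eqs. 1.27-1.28.

    grad  : callable returning the energy gradient f at geometry x,
    alpha : fixed step length standing in for a line search.
    """
    f = grad(x)
    v = -f                               # first step: steepest descent
    for _ in range(n_steps):
        x = x + alpha * v
        f_new = grad(x)
        gamma = (f_new @ f_new) / (f @ f)  # Eq. 1.28
        v = -f_new + gamma * v             # Eq. 1.27
        f = f_new
    return x
```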

Methods using second derivatives

The Newton-Raphson method is one of the simplest of the type 4 algorithms that explicitly use the Hessian. Geometrically, this method consists of extending a tangent line at the current point X_k until it crosses zero and then setting the next guess X_(k+1) to the abscissa of that zero crossing. Algebraically, Newton-Raphson derives from the Taylor series expansion of the function about a starting point (cf. Eq. 1.18). However, computation of the second-derivative matrix is in itself a very expensive process. Moreover, in a number of cases (especially in quantum chemical calculations) computation of analytic second derivatives might be either too expensive or not possible at all. Thus, a class of methods called quasi-Newton methods has become popular. These methods start with an approximate Hessian H or its inverse and gradually improve it over successive iterations. The most popular of these quasi-Newton update procedures, currently used in most quantum chemical codes, are the Davidon-Fletcher-Powell (DFP) method and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm[52]. The BFGS Hessian update procedure is given by the following expression:

H_(k+1) = H_k + [ (X_(k+1) - X_k) ⊗ (X_(k+1) - X_k) ] / [ (X_(k+1) - X_k) · (f_(k+1) - f_k) ]
        - [ H_k · (f_(k+1) - f_k) ] ⊗ [ H_k · (f_(k+1) - f_k) ] / [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ]
        + [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ] u ⊗ u    (1.29)

In the above expression, ⊗ is used to denote the outer product of two vectors and u is defined as:

u = (X_(k+1) - X_k) / [ (X_(k+1) - X_k) · (f_(k+1) - f_k) ] - [ H_k · (f_(k+1) - f_k) ] / [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ]    (1.30)

The matrix H is often initialized to a unit matrix. Generally, for an energy function of M variables, quasi-Newton methods converge to a minimum in about M steps, even when starting from an inaccurate guess Hessian. For quantum chemical calculations, however, a better guess Hessian can be provided from a lower level of theory that in some way encodes information about the atoms and the bonding between them in the molecular system. This generally improves the performance of these methods in terms of the total number of steps taken to reach a minimum. A thorough description of this method and its implementation can be found in Ref. [52]. This thesis uses the BFGS algorithm coded into GAMESS for building a new geometry optimization procedure, as described in Chapter 2.
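The update of Eqs. 1.29-1.30 translates almost line-for-line into code. The sketch below (with invented function and variable names, and H taken as the approximate inverse Hessian) applies one BFGS update:

```python
import numpy as np

def bfgs_update(h, x_old, x_new, f_old, f_new):
    """One BFGS update of the approximate inverse Hessian, Eqs. 1.29-1.30.

    h            : current approximate inverse Hessian (often the identity
                   to start with, as noted in the text),
    x_old, x_new : geometries at successive steps,
    f_old, f_new : gradients at those geometries.
    """
    dx = x_new - x_old                 # displacement X_(k+1) - X_k
    df = f_new - f_old                 # gradient change f_(k+1) - f_k
    hdf = h @ df
    u = dx / (dx @ df) - hdf / (df @ hdf)          # Eq. 1.30
    return (h
            + np.outer(dx, dx) / (dx @ df)         # first term of Eq. 1.29
            - np.outer(hdf, hdf) / (df @ hdf)      # second term
            + (df @ hdf) * np.outer(u, u))         # third (BFGS) term
```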

1.2.4 Parallel Computing in Quantum Chemistry Codes

Parallel computing is a method for dramatically improving the total throughput of a task, generally by splitting the workload among a number of interconnected processors[7, 8]. Not all tasks are parallelizable, however; in a large code, only a few tasks within it may be parallelizable. Thus a parallel program will generally be composed of a mixture of parallel and serial code. A parallel computing environment can be distinguished from a purely serial one by examining the order in which instructions are executed. While in a serial computing model instructions are executed one at a time, a number of concurrent instructions can be executed in a parallel computing environment.

The need for parallel computing arose in the early days of computing as a means of obtaining the tremendous computing power required by problems characterized by their sheer size and complexity. These include weather modeling, astronomical data processing and modeling, accurate quantum chemical calculations, and financial modeling and prediction. In the past few years, a new justification has emerged for adopting the parallel computing paradigm: the fundamental physical laws that hinder the miniaturization of uni-processor systems. This has resulted in the recent explosion of multi-core processor offerings from various chip manufacturers.

For a tightly connected parallel computing framework, the kind of network topology used to connect discrete nodes has a large effect on the overall performance of the system. This is especially true when the parallel part of a program is heavily communication oriented. To name a few, ring, bus, mesh and tree are some of the more commonly used network topologies for connecting computers in a tightly coupled supercomputing environment. For a more detailed explanation of parallel architectures and issues of network topology, readers are directed elsewhere[7, 8].

Depending on the problem being parallelized, the workload may be distributed evenly among the available compute nodes. In general, however, it is common for there to be a mismatch between the parallelizable workload and the number of available processors[7, 8]. Thus, when a parallel code is written, it is often necessary to explicitly account for balancing the load among the available nodes. This can be achieved either statically, i.e. before the parallel execution starts, or dynamically as the parallel program proceeds. The former strategy is useful when the compute load of each parallel task is known beforehand, while the latter is typically used when the compute load of the parallel tasks is uneven or not known beforehand. The ensuing text gives a few more insights into the terms, tools and hardware used in parallel computing.

Amdahl's Law

Amdahl's law is named after computer architect Gene Amdahl[54] and is used to find the maximum expected improvement to an overall system when only part of the system is improved. This law is quite often used in parallel computing to predict the theoretical maximum speedup from using multiple processors to solve a problem.

Generally, the maximum speedup S_max for a serial program, a portion of which can be improved (e.g. parallelized) to run p times faster, can be written as:

S_max <= p / (1 + f × (p - 1))    (1.31)

where f always lies between 0.0 and 1.0 and is the fraction of time spent in the portion of the program that cannot be improved (or parallelized). As a special case for the parallelization of a sequential program, Amdahl's law states that if F is the fraction of a calculation that cannot be parallelized, and (1 - F) is the fraction that can be, then the maximum speedup S^N_max achievable using N processors is:

S^N_max <= 1 / (F + (1 - F)/N)    (1.32)

The above law basically serves as a benchmarking tool for sequential codes that can be parallelized in a number of ways; it can be used to measure the performance of a variety of parallelization algorithms for converting a sequential code. However, there is one known limitation of Amdahl's law. According to the law, the theoretical maximum speedup using N processors is N, also termed linear speedup. In practice, however, it is not uncommon to observe more than N-fold speedup on a machine with N processors, termed superlinear speedup[55]. A number of factors (extensive use of cache being one of them) can result in superlinear speedup, and one should be aware of this limitation when applying the above law as a benchmarking tool.
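A small worked example of Eq. 1.32: a code that is only 5% serial already saturates far below linear speedup, as the sketch below shows.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Maximum speedup from Eq. 1.32 for a code whose fraction
    `serial_fraction` (F) cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# A 5% serial code tops out well below linear speedup:
for n in (4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 4 -> 3.48, 16 -> 9.14, 64 -> 15.42, 1024 -> 19.63
```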

Tools for parallel software environments

Along with the evolution of parallel hardware, there has been a constant increase in the maturity of the software environments that support and ease the development of parallel codes. Today parallel programs are mostly written in one, or a combination, of the following environments: Message Passing Interface (MPI)[56, 57], OpenMP[58], Threads[59] and TCP/IP Sockets[60]. All these programming environments support a number of programming languages including, but not limited to, C/C++, FORTRAN, Java and Python.

While MPI provides a high-level programming library to support parallel application development, it assumes reliable communication among nodes. Thus MPI does not provide any fault-tolerance capabilities to applications developed in this environment. Though there has been some recent progress towards building fault-tolerance capabilities into MPI, it is largely not useful for computing environments that do not have very reliable interconnects. Though MPI is a fairly powerful and flexible programming environment, porting an existing serial code to a parallel environment with it might turn out to be a daunting task. For systems that provide large amounts of shared memory, OpenMP provides an easier and cleaner way to develop parallel programs. OpenMP is a directive-based programming environment wherein possible parallelism, such as loop splitting, is specified using one or more directives within the sequential code. This also makes it easier to port originally serial codes to a parallel environment, as minimal changes are made to the logical flow of the code.

While MPI and OpenMP provide a high-level interface for task parallelism, finer control can be obtained by using TCP/IP sockets and threading libraries. In fact, the underlying implementations of MPI and OpenMP on most platforms are calls to lower-level socket or threading libraries. Features like fault tolerance and customized load balancing can easily be incorporated using these lower-level libraries. This is one of the primary reasons for using socket and thread libraries to implement the distributed-mode parallelism described in Chapter 2.

Looking to the future, the development of new parallel software environments like the Fortress programming language from Sun Microsystems[61], X10 from IBM[62] and the Chapel language from Cray[63] would open up interesting possibilities for utilizing massively parallel computing resources, wherein not a few hundred but tens of thousands of processors are used together to solve a large and complex problem. These languages are intended to provide a simpler framework and syntax than MPI and OpenMP for developing highly parallel programs.
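As a flavor of the message-passing style, the sketch below uses the mpi4py Python bindings to MPI (assuming they are installed; the workload is a toy sum, not a QCC kernel) to split work statically across ranks and reduce the result to rank 0:

```python
# Run as, e.g., `mpiexec -n 4 python example.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Static load split: rank k takes every size-th term of a toy workload.
partial = sum(i * i for i in range(rank, 1000, size))
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print("sum of squares:", total)
```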

Advances in parallel hardware platform

The preceding Sections brought out the perennial non-linear scalability of QCC. This resulted in a rather restricted application of QCC to systems with only a few tens of nuclei, at least until about the 1980s. Over the years, however, computing systems have evolved from highly memory-constrained systems (such as the early mainframes) to new, highly coupled multiprocessor systems with terabytes of shared memory (such as the SGI Altix system).

A few such systems, such as IBM's Blue Gene/L[64], have gained a lot of publicity and respect from the computing world due to the introduction of many new concepts in computer architecture and software development for massively parallel computing environments. A number of Blue Gene installations around the world are actively used in research on weather modeling, earthquake prediction and the study of the human brain. While IBM's Blue Gene machines use a humongous 65K or more highly connected processors with about 512MB of local memory per processor, NEC's Earth Simulator[65] is based on highly powerful vector processors. The Earth Simulator was probably the first non-US machine to top the list of fastest supercomputers, until it was in turn overtaken by bigger installations of IBM's Blue Gene. SGI's Altix platform[66] and the Cray platform[67] from Cray Inc. have been the other heavily used architectures for large-scale shared-memory parallel machines. For a detailed description of many of the computer architectures used in these high-end platforms see Refs. [7, 8].

Away from the "big iron" supercomputing facilities, there is an increasing trend of connecting relatively cheap, off-the-shelf commodity computing boxes together into what is popularly called a cluster. In fact, the top500.org website[68], which maintains the list of the world's 500 most powerful supercomputers, has seen an increasing number of these cluster computers over the past few years, and they are giving strong competition to the likes of Cray, SGI and NEC. The author himself, during the course of his work, has had a fair amount of exposure to coding and porting a few existing and new QCC codes on some of these machines in past and present top500 lists[69].

Over the past few years, another term, "Grid computing", has caught the attention of many researchers who need access to high-performance computing resources but individually have very limited local computing resources. Grid computing provides the means to connect a number of computing resources, mostly placed in geographically distinct locations, and use this "ad hoc" computing power to solve CPU-intensive problems that require such large computing power. Though this form of computing is still at an experimental stage, a number of initiatives, largely supported by governments across the world, have come up[70].

Exploiting parallelism for QCC

One of the most fascinating aspects of this fast-changing hardware field has been the equal amount of interest and effort put into harnessing computing power specifically for computational chemistry. The field of computational quantum chemistry in particular has had the advantage of contributions from a continuous stream of numerous researchers. This is largely due to the highly non-linear scaling of QC methods: each new hardware platform is used up, and a computational chemist is always ready with his next big system that cannot be run on this "new" hardware! In contrast to the very slow adoption of QCC in the early 1980s, the availability of ever more powerful high-performance hardware in the past two decades has provided a long-awaited boost to the development of parallel quantum chemical codes as well as their application.

Some of the early work on parallelization of quantum chemical codes was largely centered around GAUSSIAN[50]. Other developments, many of them concurrent, were products of work done in many laboratories around the world. These included TURBOMOLE[49], INDMOL[71, 72], GAMESS[73, 51], ADF[74], [75] and a number of other developments. Apart from developing parallel SCF codes, researchers have also investigated a number of algorithmic modifications to MP2 energy evaluation[76] and tuned it to various architectures, ranging from distributed memory setups[77] to vector computers such as the Cray[78].

The SCF parallelization schemes built into modern QCC have been based either on distributing the workload of computing the two-electron integrals among the available processors, or on using replicated storage and setting up parts of the Fock matrix in parallel[79, 80]. Two-electron integral evaluation, being the most time-consuming part, has received the most attention in actual implementations. As each integral evaluation is independent of the others, these are "embarrassingly parallelizable". Two-electron integrals are generally evaluated using four nested loops as in Algorithm 1.

Algorithm 1 Pseudocode representing the loops for evaluating the two-electron integrals.
Require: Integral indices i, j, k, l
Require: numberOfBasisFunctions, the total number of basis functions representing the molecule
1: for i = 1 to numberOfBasisFunctions do
2:   for j = 1 to i do
3:     for k = 1 to numberOfBasisFunctions do
4:       for l = 1 to k do
5:         Evaluate the 2E integral < ij|kl >
6:       end for
7:     end for
8:   end for
9: end for

The easiest way to parallelize the above is to split the outermost loop evenly among the available processors. Another way is to split all the nested loops among the processors. However, both of these may result in an uneven distribution of load among the processors, as the time each integral requires depends upon the shells on which it is centered. A purely s-type integral (i.e. < ss|ss >) requires less time to evaluate than a mixed p- and s-type integral (such as < ps|ss >)[71, 72]. Thus, it is more profitable to sort these integrals by type and then schedule them on the processors. However, this scheme has the drawback that the pre-processing time is very large. Another scheme is to use dynamic load balancing, wherein only one integral (or a small batch) is sent to each processor; when a processor completes the set allocated to it, the master node sends the next in the list. Though the dynamic load balancing scheme requires frequent communication between the master processor and the compute processors, this is largely offset by the fact that the compute load is more evenly distributed. Dynamic load balancing is by far one of the most effective ways to parallelize two-electron integral evaluation; a sketch of the scheme is given after this discussion.

Another expensive operation in an SCF procedure is setting up the Fock matrix. From Eq. 1.5 it is clear that setting up a Fock element F_ij requires the complete density matrix P and a few relevant two-electron integrals. Generally, Fock matrix parallelization is done by duplicating the P matrix on each node and then sending the relevant two-electron integrals (in the case of conventional HF), or computing them on the fly (in the case of direct HF). If the two-electron integrals are evaluated on the fly, the amount of communication is far less than when they are transferred from a master node. In supercomputing facilities where network IO is slow or comes at a premium, it is best to compute them on the fly, as the high computing speed of the individual nodes offsets the cost of recomputing the integrals during the SCF iterations. MP2 parallelization, on the other hand, is more tricky: most parallel implementations of MP2 energy and gradient evaluation require large local storage and memory as well as very good communication capabilities among the compute nodes.

In the recent past, the availability of highly coupled and massively parallel computational resources has paved the way for highly parallel quantum chemistry codes like MPQC[81] and NWChem[82]. These codes have been designed to scale almost linearly on hundreds of tightly coupled processors. Fast methods such as Resolution of Identity MP2[83] (or simply RI-MP2) coded into these packages provide very fast evaluation of MP2 energies and gradients with very high accuracy compared to exact MP2 evaluations. On the algorithmic front, for exact MP2 energies and gradients without any approximations, there have been recent developments in improving the sequential as well as parallel MP2 implementations in GAMESS by Nagase and his group[84, 85]. These developments have enabled the computation of the MP2 energy for molecules with 2000 basis functions. However, the memory, disk and communication overheads are still prohibitively large for routine application. For instance, the per-node memory requirement for computing the MP2 energy (without gradients) of the taxol molecule with the 6-31G(d) basis (1032 contractions) was about 700MB on 16 nodes.
While the disk requirements were about 90GB per node, the calculations also involved communication amounting to a total of 90GB across all nodes[84, 85]. Further, the scaling order in terms of both memory and disk in all these improved methods is not linear, and hence the requirements increase quite drastically with the size of the system being handled.
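A minimal sketch of the dynamic (master/worker) load balancing scheme described above is given below, using Python's multiprocessing pool as a stand-in for a real master/compute-node setup; evaluate_batch is a placeholder for an actual two-electron integral routine, and the batch size is arbitrary:

```python
from multiprocessing import Pool

def evaluate_batch(batch):
    # Placeholder: a real code would evaluate < ij|kl > for every index
    # quadruple in the batch using an integral library.
    return [sum(quad) * 1e-6 for quad in batch]

def make_batches(n_basis, batch_size=64):
    # Index quadruples following the loop structure of Algorithm 1.
    quads = [(i, j, k, l)
             for i in range(n_basis) for j in range(i + 1)
             for k in range(n_basis) for l in range(k + 1)]
    return [quads[p:p + batch_size] for p in range(0, len(quads), batch_size)]

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # imap_unordered hands a new batch to whichever worker is free,
        # mimicking the master/worker scheme discussed in the text: fast
        # workers simply pull more batches instead of idling.
        for result in pool.imap_unordered(evaluate_batch, make_batches(30)):
            pass  # a real code would accumulate these into the Fock matrix
```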

Alternative hardware for improving QCC performance

The idea of using special purpose devices to speed up computation is not new[86], but recent improvements in electronics have allowed a new kind of parallelism to be explored of late. This largely owes to the availability of low-cost GPUs that can do enormous amounts of array processing at much faster speeds than a general-purpose CPU. Though GPUs are restricted in the kinds of computations that can be performed on them, one major drawback being very scant support for double-precision arithmetic, Yasuda[87] has intelligently exploited the GPU core for computing certain two-electron integrals (for DFT) that do not require high precision. These lower-precision integrals are evaluated on the GPU, asynchronously with others requiring high precision on the CPU, thus providing considerable speedup over using the conventional CPU alone. With chip-building technology becoming cheaper by the day, researchers like Ramdas[88] have gone even further and proposed the development of special purpose computers just for performing Hartree-Fock calculations. While architectures like these may not become widely available in the coming few years, they definitely point to a new way of speeding up quantum chemical calculations. All these developments have surely brought better utilization of available resources for performing quantum chemical calculations, but none of them truly addresses the fundamental, perennial problem of the non-linear scaling of these methods.

1.2.5 Linear Scaling Quantum Chemistry Codes

In an attempt to reduce the CPU and memory requirements of ab initio methods, various researchers have attempted to devise linear-scaling algorithms. Of these, linear-scaling divide-and-conquer (D&C) type algorithms have had a fair amount of success in computing one-electron properties and for structure determination[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]. Christoffersen[89], as early as 1972, had proposed a fragment-based method to yield various chemical properties of molecules with about 200 electrons by applying an ab initio procedure. However, the actual applications reported by Christoffersen were very limited, partly due to the restricted availability of computer hardware in that era. Moreover, this work was based on FSGO approximations for representing the molecular orbitals, which are not as accurate as HF energies obtained using Gaussian basis sets. The group of Yang[92] reported one of the earliest attempts at applying D&C-type methodology to geometry optimization within density functional theory (DFT). Although the results obtained by them were reported to be close to the actual computations (when applied to small molecules like a β-tetrapeptide comprising glycine), there was no apparent advantage in terms of CPU time over the corresponding full Kohn-Sham calculation. Gadre and co-workers[94, 95] independently developed a molecular tailoring approach (MTA) and applied it to the computation of one-electron properties, such as the electron density and electrostatic potential, of many large non-linear molecules such as silicalite, an ibuprofen cluster and many others. Exner and Mezey[96] have developed and employed their adjustable density matrix approximation (ADMA) for determining one-electron properties of various classes of polypeptides. Using the molecular fractionation with conjugate caps (MFCC) approach, developed by Li et al.[98] and later extended by Zhang et al.[102], the estimation of Hartree-Fock (HF), Møller-Plesset second order perturbation (MP2) and B3LYP energies with a maximum error of up to a few millihartrees has been demonstrated. However, most of the systems reported by these researchers are either small or linear, except for the case of the crambin molecule[103]. Although the geometry optimization of the crambin molecule appears to be one of the largest ab initio calculations carried out so far, we find that there are still sufficient computational bottlenecks that need to be overcome before these methods can be routinely applied to biologically interesting systems.

In an almost parallel, independent development, Kitaura's group has developed a similar fragment-based strategy, termed the Fragment Molecular Orbital (FMO) method, which has been incorporated into the GAMESS package[99, 100, 104, 105, 106, 107]. The group of Kitaura has also applied FMO to simulations[108]. However, the FMO technique has largely been tested on and applied to single point energy evaluation of large molecules, obtaining binding/interaction energies of protein or protein-ligand complexes[106, 104] etc. Using the FMO technique, a single point calculation on lysozyme, a protein with about 2000 atoms, at the HF/4-31G level has also been reported[99, 100]; a computational time of 4 days on a cluster of 18 Pentium III computers was reported for this investigation. The largest FMO-enabled calculation, reported recently[104], is on a 20881-atom photosynthetic protein at the RHF/6-31G(d) level of theory using the huge computing power of 600 CPUs. A few systems have also been subjected to geometry optimization, predominantly within the FMO-HF framework. In a related work, Nakai et al. have used the density matrix obtained from Yang's[91] scheme to obtain MP2 energies of medium to large systems[109, 110] such as polyalanine chains, the crambin protein etc. It is worth mentioning here that most of the above groups have introduced ways to compute the single point energy for a given configuration. A few, such as the FMO method of Kitaura et al., have developed strategies to compute gradients and have applied them to geometry optimization of a few molecules only recently[111].

Apart from D&C techniques, there has also been growing interest in developing methods for geometry optimization of large molecules. Of these, the ONIOM[50, 112, 113] method, available through the popular Gaussian'03 ab initio package[50], has been extensively used in recent years. These "semi" ab initio methods usually apply a lower level of theory to the outer region of the molecule and a sufficiently higher level to the region of interest or activity, thus considerably reducing the computational cost. This method has been extensively used by Wieczorek and Dannenberg[114] to demonstrate the stability and cooperative interactions due to H-bonds of β-strands, α-helices and 3_10 helices of polyalanine. Nemeth and Challacombe[115], on the other hand, follow a different approach towards optimization, using a weighted extrapolation of energy gradients to achieve faster convergence for large biological systems.

The present thesis introduces a new version of MTA adapted for geometry optimization as well as property calculation of large molecules. This new scheme, termed Cardinality Guided MTA (CG-MTA), is applied at the HF, B3LYP and MP2 levels of theory to a variety of molecular systems. A distributed and parallel implementation of CG-MTA is also presented, which clearly brings out the ease with which grid-enabled ab initio computations could be achieved.

1.3 Programmable and Visualization environments for Computational Chemists

A computational chemist is, and probably will remain, one of the most active users of computers and computing environments. Be it a desktop PC or a supercomputer, computational chemists (often in collaboration with computational or computer scientists, or in many cases on their own) have over the years developed tools to harness computing power to solve both routine and grand-challenge problems facing chemistry and the other chemical sciences. Several data analysis tools involving visualization, statistical analysis, graphing and charting etc. have also come out, and many of these are available as free or opensource programs.

Visualization tools

For a computational chemist, visualization of the numerical results produced by computer programs has been of prime importance from the very beginning of this field. Owing to this requirement, a number of visualization tools tailored specifically towards structure-based computational chemistry have been developed over the years. These not only provide an intuitive way to present data, but also open up possibilities to examine the data at hand via the various probing aids that many of these tools provide. A number of specialized visualization tools like RasMol[116], MacMolPlot[117] and Univis[118] have been built to address this growing visualization requirement of computational chemists. In recent years, there have been a number of high-quality opensource packages, such as PyMol[119] and JMol[120], that provide support for reading outputs generated by a number of quantum chemical packages. All these packages provide the basic functionality of viewing molecules in different models, ranging from a simple wire model to the more appealing ball-and-stick and sphere models. These packages also provide viewing of contours and surfaces for molecular properties such as the electrostatic potential, solvent accessible surface etc. The quality of graphics produced by these packages varies a lot depending upon the underlying graphics libraries used. While MacMolPlot, Univis and PyMol employ high-quality OpenGL[121] based libraries, packages such as JMol have built their own graphics systems. While all the packages mentioned are available on the PC platform, some, such as MacMolPlot, PyMol and JMol, are available on multiple platforms.

Another class of visualization tools increasingly gaining popularity among computational chemists is the volume exploration tools[122, 123, 124]. While IBM's OpenDX[122] is a complete visualization framework, Drishti[123] provides a user-friendly interface for volumetric data interpretation and visualization. Note that these tools are quite general and are not specifically programmed for computational chemistry applications. However, a few researchers have used these visualization tools for interpretation and applied them to computational chemistry. For instance, MacDougall and Henze[124] have used volume rendering tools for rapid screening of pharmacophores, the potential drug molecules for a known biological target.

Most of the above tools are restricted by the fact that the kinds of visualization that can be performed are only those that have already been coded into the packages. Except for OpenDX, which is in fact an external library for visualization, these packages provide limited or no support for programmability. Of the computational chemistry specific visualization tools, RasMol[116], PyMol[119] and JMol[120] provide limited support for scripting, mostly via a RasMol[116]-style programming language. The scripting ability of these packages is limited to functions for manipulating graphics (like adding trackers, displaying isosurfaces etc.) and reading data (input coordinates, outputs of QCC codes) into the program.

Programmable environments

Ever since the mainstream acceptance of computational chemistry as a tool for new discovery, a number of applications and codes have been written to help computational chemists. However, developing such applications and trying out new ideas on top of an already written set of codes is getting more and more difficult, as none of them have been developed with extensibility and rapid application

development in mind. This problem has been recognized by many researchers working in the field of computational chemistry, and there have been recent attempts to introduce component based libraries that cater to new application development. Consequently, researchers in the field of computational science are now trying to build applications based on component architectures[125, 126]. Ray et al.[125] have built a component architecture that allows extension of ab initio packages like NWChem[82] and MPQC[81]; this component oriented development enables their easy integration into other packages. On the visualization front, Sanner[126] has built visualization toolkits that are programmable and extendable using the Python programming language[127]. This provides scripting capabilities in the visualization environment and allows a user to perform a number of customizations in an automated manner, which is otherwise not possible through a GUI alone. Demetropoulos et al.[128], in a more niche area, have developed a programmable interface to the GAMESS package[73] which can be used to plug in different optimization packages.

The problems handled by computational chemists are increasingly vast and are many a time tackled by a number of people working in different areas of research. Collaborative tools are very helpful in such situations, and this is exactly what Weber et al.[129] have attempted with their AMMP-Vis package. AMMP-Vis is built around a client-server architecture and provides a collaborative virtual environment that aids molecular modeling. Apart from the application point of view, a huge effort has been put forth by many computational chemists around the world to build Java class libraries (such as JOElib[130] and CDK[131]) that provide the most commonly used functionalities for building new applications. However, all these attempts to bring in programmability are restricted either to one niche area (such as a particular package) or to visualization alone. Another class of packages, such as Mathematica[132], is beginning to provide more and more tools that generalize scientific programming through auxiliary tools usable by non-mathematicians. One of the strengths of Mathematica-type packages is the availability of a symbolic processing language that eases the solution of a particular problem without imposing a steep learning curve. However, in this case too, the programming interface is restricted to the in-built language, and programming these environments in a language of one's choice is not an easy task.

As a part of this dissertation, an attempt to build a cross-platform programmable environment for computational chemists is described. A novel feature of this environment is that it provides a complete application development environment, with a set of extensive Application Programming Interfaces (APIs), editors, and a visualization tool in a single integrated package. It also provides an easy mechanism to use existing Java class libraries such as JOElib[130] and CDK[131], or any other user developed libraries. Further, the tool provides a number of utilities for trying out and experimenting with divide and conquer type algorithms for structure based computational chemistry, specifically supporting the CG-MTA algorithms discussed later in Chapter 2.
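As a small illustration of the reuse such class libraries make possible, the following Java sketch builds a molecule from a SMILES string and reports its molecular formula using CDK[131]. This is only an indicative example: the class and method names follow CDK's published API, though exact signatures may differ between CDK releases.

    import org.openscience.cdk.DefaultChemObjectBuilder;
    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.smiles.SmilesParser;
    import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator;

    public class FormulaDemo {
        public static void main(String[] args) throws Exception {
            // parse a SMILES string (here, caffeine) into a molecule object
            SmilesParser parser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
            IAtomContainer caffeine = parser.parseSmiles("CN1C=NC2=C1C(=O)N(C)C(=O)N2C");
            // derive and print the molecular formula (expected: C8H10N4O2)
            System.out.println(MolecularFormulaManipulator.getString(
                    MolecularFormulaManipulator.getMolecularFormula(caffeine)));
        }
    }

A handful of lines thus accomplishes what would otherwise require a custom file parser and valence bookkeeping, which is precisely the argument for building new applications on top of such libraries.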

1.4 Motivation and Scope of Present Work

The growth in computational power and storage driven by Moore's law[133] has enabled the scientific community to attempt solutions of difficult problems in their areas of research. For example, it has become possible to routinely apply ab initio theories to many challenging problems in chemistry, physics and biology. However, the formidable computational complexity of these methods is a major bottleneck in applying them to larger chemical or biological systems. In consequence, even with huge computational resources, practical applications of conventionally coded ab initio methods are feasible only for systems containing fewer than 100 atoms at a sufficiently reliable level of theory and basis set. The main objective of this thesis is to develop reliable linear scaling algorithms and computer codes so as to extend the applicability of these accurate methods to larger systems including, but not restricted to, protein fragments and molecular clusters.

The second Chapter will discuss the development of the Cardinality Guided Molecular Tailoring Approach (CG-MTA) for ab initio geometry optimization and one-electron property calculation of large molecular systems. Ab initio quality one-electron properties like the Molecular Electron Density (MED) and the Molecular Electrostatic Potential (MESP) have been investigated extensively to gain insights into the structure and reactivity of molecules. Computation of these properties requires one to obtain the first order density matrix (DM) of the concerned system. Geometry optimization, on the other hand, is a more involved and expensive process, the objective being to systematically obtain a local energy minimum structure. Gradient based optimization algorithms commonly used in such codes require the energy and its partial derivatives for updating the values of the geometric variables in the optimization scheme. Since finding a local energy minimum structure is an iterative task, many such steps are usually required starting from an initial guess configuration of coordinates. As compared to one-electron property evaluation, where it is enough to obtain a good quality first order DM, geometry optimization requires an accurate evaluation of the energy and its derivatives.
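The essence of this loop can be conveyed by the minimal steepest-descent sketch below, written in Java. The PotentialSurface interface is a hypothetical stand-in for a full SCF energy and gradient evaluation, and the fixed-step downhill update is for illustration only; production optimizers employ more sophisticated quasi-Newton updates.

    // Skeleton of gradient-based geometry optimization.
    // PotentialSurface is a hypothetical stand-in for an SCF energy/gradient code.
    interface PotentialSurface {
        double energy(double[] q);     // total energy at the given geometry
        double[] gradient(double[] q); // partial derivatives dE/dq_i
    }

    public class SteepestDescent {
        static double[] optimize(PotentialSurface pes, double[] q,
                                 double step, double gradTol, int maxIter) {
            for (int iter = 0; iter < maxIter; iter++) {
                double[] g = pes.gradient(q);
                double gnorm = 0.0;
                for (double gi : g) gnorm += gi * gi;
                if (Math.sqrt(gnorm) < gradTol) break; // stationary point reached
                for (int i = 0; i < q.length; i++)
                    q[i] -= step * g[i];               // move downhill
            }
            return q;
        }

        public static void main(String[] args) {
            // toy one-dimensional quadratic surface with its minimum at q = 1
            PotentialSurface toy = new PotentialSurface() {
                public double energy(double[] q) { return (q[0] - 1) * (q[0] - 1); }
                public double[] gradient(double[] q) { return new double[]{2 * (q[0] - 1)}; }
            };
            double[] qmin = optimize(toy, new double[]{5.0}, 0.1, 1e-8, 1000);
            System.out.println("optimized q = " + qmin[0]); // approaches 1.0
        }
    }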

Calculation of the energy and DM for a fixed geometry requires one to follow the Self Consistent Field (SCF) procedure; evaluation of the derivatives of the energy is, however, sometimes an even more expensive process for medium to large sized systems. CG-MTA addresses both these issues in an effective manner by completely bypassing the calculation on the entire system and replacing it by calculations on smaller, overlapping sub-fragments of the main molecule. Implementation details and the parallelization strategies used in the code will be thoroughly discussed in this Chapter. WebProp, a web based interface for computing one-electron properties of medium to large molecules, will also be elaborated. This interface uses the CG-MTA code at the backend to obtain the density matrix needed for computing one-electron properties of large molecules, whose calculation is otherwise an extremely expensive task.

Chapter 3 will contain in-depth details of the testing and benchmarking of the CG-MTA code on a few chemical and biological systems. The code is tested and benchmarked to evaluate the automated fragmentation routine and its ability to handle a variety of molecules, including molecular clusters. The accuracy of the energy and its derivatives at various levels of theory, as compared to the corresponding complete calculation, will be reported. A comparison of the CPU time, memory and disk requirements of the CG-MTA code with those of a standard ab initio code will also be reported. A few test cases of complete geometry optimization using CG-MTA, along with comparative actual runs, will be presented to gauge the accuracy of CG-MTA. The scalability of the code and its current restrictions will also be discussed in detail. Porting of this code to a few available high performance computing platforms, and the issues encountered therein, will be elaborated.

A few applications of the code developed will be illustrated in Chapter 4. Applications to geometry optimization and property evaluation of medium and moderately large molecules and clusters will be demonstrated. Geometry optimization and property evaluation using CG-MTA for molecules such as cholesterol, α-tocopherol, γ-cyclodextrin, α-helical glycine, taxol and an albumin binding protein will be documented. Other areas where the MTA code written by the author has been used by others for deriving more information about the chemical nature of clusters and H-bonds will be glanced through. These include an application to Many Body Analysis of Clusters and a brief note on how the tailoring idea of CG-MTA is used in estimating H-bond energies.

The final Chapter will bring out the development of a programmable integrated environment (called MeTA Studio) specifically tailored (but not restricted) to computational chemists working in the area of quantum chemistry, with an emphasis on handling large molecules. MeTA Studio, apart from being a general tool for the computational chemist, is an indispensable part of the CG-MTA implementation, assisting the user in visualizing molecules and scalar fields and in manual fragmentation. The MeTA Studio viewer introduces some innovative yet simple techniques for viewing large molecules, including a simple Find language and a multi-camera view of the molecule being displayed. The viewer also incorporates a feature for manual fragmentation of molecules and provides immediate feedback on the goodness of the fragmentation scheme.
MeTA Studio also incorporates basic tools needed for collaborative computing, effectively opening up a new way of sharing ideas and information among computational chemists working on a common large problem. It also provides a powerful programming environment and a rich set of APIs that can be used to easily extend its functionality or to build new applications as needed by the users.
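The extension mechanism can be conveyed by a small, purely hypothetical Java sketch of the plug-in pattern that such an environment may expose: user code implements a tiny interface that the environment registers and invokes. None of the names below (StudioPlugin, PluginRegistry) are taken from the actual MeTA Studio API, which is detailed in the final Chapter.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical plug-in contract: one method run against the current molecule file.
    interface StudioPlugin {
        String name();
        void run(String moleculeFile);
    }

    // Hypothetical registry: the environment collects user plug-ins and runs them.
    class PluginRegistry {
        private final List<StudioPlugin> plugins = new ArrayList<>();
        void register(StudioPlugin p) { plugins.add(p); }
        void runAll(String moleculeFile) {
            for (StudioPlugin p : plugins) {
                System.out.println("running plug-in: " + p.name());
                p.run(moleculeFile);
            }
        }
    }

    public class PluginDemo {
        public static void main(String[] args) {
            PluginRegistry registry = new PluginRegistry();
            // a user-supplied extension, written without touching the host IDE's code
            registry.register(new StudioPlugin() {
                public String name() { return "atom-counter"; }
                public void run(String f) { System.out.println("would count atoms in " + f); }
            });
            registry.runAll("taxol.xyz");
        }
    }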

References

[1] D. Pham et al., The design and implementation of a first-generation Cell processor, in Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International, pages 184-592, IEEE Computer Society, 2005.

[2] H. P. Hofstee, Power efficient processor architecture and the cell processor, in HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 258-262, IEEE Computer Society, 2005.

[3] S. Williams et al., The potential of the cell processor for scientific computing, in CF '06: Proceedings of the 3rd conference on Computing frontiers, pages 9-20, New York, 2006, ACM Press.

[4] C. McNairy and R. Bhatia, IEEE Micro 25, 10 (2005).

[5] AMD, Advanced Micro Devices, http://www.amd.com (2007).

[6] P. Kongetira, K. Aingaran, and K. Olukotun, IEEE Micro 25, 21 (2005).

[7] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to parallel computing: design and analysis of algorithms, Benjamin-Cummings, Redwood City, 1994.

[8] M. Quinn, Parallel computing: theory and practice, McGraw-Hill, New York, 1994.

[9] PS3, Sony Playstation, http://www.playstation.com (2007).

[10] J. Andrews and N. Baker, IEEE Micro 26, 25 (2006).

[11] C. Tilstone, Lancet Oncology 8, 201 (2007).

[12] V. Pande, Folding@Home, http://folding.stanford.edu/, 2007.

[13] J. Krüger and R. Westermann, Linear algebra operators for GPU implementation of numerical algorithms, in SIGGRAPH '05: ACM SIGGRAPH 2005 Courses, page 234, New York, 2005, ACM Press.

[14] mobihf, Quantum Chemistry on Mobile based on PyQuante, http://tovganesh.googlepages.com/s60 (2006).

[15] Nature, Future Computing, http://www.nature.com/news/infocus/futurecomputing.html, 2006.

[16] R. P. Feynman, A. J. Hey, and R. W. Allen, Feynman Lectures on Computation, Perseus Books, Cambridge, USA, 2000.

[17] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2000.

[18] P. W. Shor, SIAM Journal on Computing 26, 1484 (1997).

[19] W. L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc. 110, 1666 (1988).

[20] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys. 79, 926 (1983).

[21] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart, J. Am. Chem. Soc. 107, 3902 (1985).

[22] J. J. P. Stewart, J. Comput. Chem. 10, 209 (1989).

[23] J. J. P. Stewart, J. Comput. Chem. 10, 221 (1989).

[24] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry, McGraw-Hill, New York, 1989.

[25] W. J. Hehre, L. Radom, and P. V. R. Schleyer, Ab Initio Molecular Orbital Theory, John Wiley, New York, 1986.

[26] D. B. Cook, Handbook of Computational Chemistry, Oxford University Press, Oxford, 1998.

[27] M. Born and J. R. Oppenheimer, Ann. Phys. 84, 457 (1927).

[28] D. R. Hartree, The Calculation of Atomic Structure, Wiley, New York, 1957.

[29] V. Fock, Z. Phys. 61, 126 (1930).

[30] C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).

[31] G. G. Hall, Proc. Roy. Soc. (London) A205, 541 (1951).

[32] C. Møller and M. S. Plesset, Phys. Rev. 46, 618 (1934).

[33] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).

[34] W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).

[35] C. Lee, W. Yang, and R. G. Parr, Phys. Rev. B 37, 785 (1988).

[36] R. G. Parr and W. Yang, Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.

[37] R. F. W. Bader, Atoms in Molecules: A Quantum Theory, Oxford University Press, Oxford, 1990.

[38] S. R. Gadre and R. N. Shirsat, Electrostatics of Atoms and Molecules, Universities Press, Hyderabad, 2000.

[39] J. Almlof, K. Faegri, and K. Korsell, J. Comput. Chem. 3, 385 (1982).

[40] M. Haser and R. Ahlrichs, J. Comput. Chem. 10, 104 (1989).

[41] M. E. Colvin, C. L. Janssen, R. A. Whiteside, and C. H. Tong, Theor. Chim. Acta 84, 301 (1992).

[42] S. Brode et al., J. Comput. Chem. 14, 1142 (1993).

[43] M. Haser, R. Ahlrichs, H. P. Baron, P. Weis, and H. Horn, Theor. Chim. Acta 83, 455 (1992).

[44] A. R. Leach, Molecular Modelling: Principles and Applications, Addison Wesley Longman, England, 1996.

[45] P. Pulay and G. Fogarasi, J. Chem. Phys. 96, 2856 (1992).

[46] F. Eckert, P. Pulay, and H. J. Werner, J. Comput. Chem. 18, 1473 (1997).

[47] B. Paizs, G. Fogarasi, and P. Pulay, J. Chem. Phys. 109, 6571 (1998).

[48] S. M. Colwell, A. R. Marshall, R. D. Amos, and N. C. Handy, Chem. Br. 21, 665 (1985).

[49] R. Ahlrichs, M. Bar, M. Haser, H. Horn, and C. Kolmel, Chem. Phys. Lett. 162, 165 (1989).

[50] M. J. Frisch et al., Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford, CT, 2004.

[51] M. W. Schmidt et al., J. Comput. Chem. 14, 1347 (1993).

[52] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing (2nd ed.), Cambridge University Press, Cambridge, 1992.

[53] J. A. Nelder and R. Mead, Comput. J. 7, 308 (1965).

[54] G. Amdahl, Validity of the single processor approach to achieving large-scale computing capabilities, in AFIPS Conference Proceedings, pages 483-485, 1967.

[55] D. Parkinson, Parallel Computing, 261-262 (1986).

[56] D. W. Walker, Parallel Computing 20(4), 657 (1994).

[57] MPICH, MPICH: A multi-platform implementation of MPI, http://www.mcs.anl.gov/mpi/, 2007.

[58] OpenMP, OpenMP: Simple, Portable, Scalable SMP Programming, http://www.openmp.org/, 2007.

[59] B. Nichols, D. Buttlar, and J. P. Farrell, Pthreads Programming: A POSIX Standard for Better Multiprocessing, O'Reilly, 1996.

[60] W. R. Stevens, UNIX Network Programming, Volumes 1 and 2, Prentice Hall, 1998.

[61] G. L. Steele, Fortress: a new programming language designed for high-performance computing, http://fortress.sunsource.net/, 2007.

[62] IBM, X10, http://x10.sourceforge.net, 2007.

[63] Cray, Chapel, http://chapel.cs.washington.edu/, 2007.

[64] IBM, Blue Gene/L, http://research.ibm.com/bluegene/, 2007.

[65] NEC, The Earth Simulator Center: Japan Agency for Marine-Earth Science and Technology, http://www.es.jamstec.go.jp/index.en.html, 2007.

[66] SGI, Altix Family of Clusters and Supercomputers, http://www.sgi.com/products/servers/altix/, 2007.

[67] Cray Inc., The Supercomputer Company, http://www.cray.com/, 2007.

[68] top500.org, Top 500 Supercomputers, http://www.top500.org/, 2007.

[69] PARAM and APAC, The IBM p690 series based PARAM supercomputing facility provided by C-DAC and the SGI Altix based APAC systems at the Australian National University.

[70] Grid Computing, See for example Tera Grid in the USA, EUGrid of the European Union, GARUDA grid in India (being created by C-DAC), and the EU-India grid connecting the EUGrid and the GARUDA grid.

[71] S. R. Gadre, S. A. Kulkarni, A. C. Limaye, and R. N. Shirsat, Z. Phys. D-Atoms, Molec. and Clusters 18, 357 (1991).

[72] S. R. Gadre, A. C. Limaye, and R. N. Shirsat, J. Comput. Chem. 14, 445 (1993).

[73] M. S. Gordon, The GAMESS package, http://www.msg.ameslab.gov/GAMESS/GAMESS.html, 2003.

[74] ADF, Amsterdam Density Functional package, http://www.scm.com/, 2007.

[75] Schrodinger Inc., Jaguar: Rapid ab initio electronic structure package, http://www.schrodinger.com/, 2007.

[76] P. Pulay, S. Saebo, and K. Wolinski, Chem. Phys. Lett. 344, 543 (2001).

[77] A. C. Limaye, J. Comput. Chem. 18, 552 (1998).

[78] G. D. Fletcher, A. P. Rendell, and P. Sherwood, Mol. Phys. 91, 431 (1997).

[79] U. Wedig, A. Burkhardt, and H. G. Schnering, Z. Phys. D Atoms, Molec. Clusters 13, 377 (1989).

[80] R. J. Harrison and R. A. Kendall, Theor. Chim. Acta (Berl.) 79, 337 (1991).

[81] MPQC, The Massively Parallel Quantum Chemistry Program, http://www.mpqc.org, 2007.

[82] E. Apra et al., NWChem, A computational chemistry package for parallel computers, version 4.7, Pacific Northwest National Laboratory, Richland, Washington, USA, 2005.

[83] F. Weigend and M. Haser, Theor. Chem. Acc. 97, 331 (1998).

[84] K. Ishimura, P. Pulay, and S. Nagase, J. Comput. Chem. 27, 407 (2006).

[85] K. Ishimura, P. Pulay, and S. Nagase, J. Comput. Chem. 28, 2034 (2007).

[86] V. Bush and S. H. Caldwell, Phys. Rev. 38, 1898 (1931).

[87] K. Yasuda, J. Comput. Chem., ASAP article (2007).

[88] T. Ramdas, G. Egan, D. Abramson, and K. Baldridge, Theor. Chem. Acc., DOI: 10.1007/s00214 (2007).

[89] D. Spangler and R. E. Christoffersen, Adv. Quant. Chem. 6, 333 (1972).

[90] Q. Zhao and W. Yang, J. Chem. Phys. 102, 9598 (1995).

[91] W. Yang, Phys. Rev. Lett. 66, 1432 (1991).

[92] W. Yang and T. S. Lee, J. Chem. Phys. 103, 5674 (1995).

[93] X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).

[94] S. R. Gadre, R. N. Shirsat, and A. C. Limaye, J. Phys. Chem. 98, 9165 (1994).

[95] K. Babu and S. R. Gadre, J. Comput. Chem. 24, 484 (2003).

[96] T. E. Exner and P. G. Mezey, J. Phys. Chem. A 108, 4301 (2004).

[97] W. Li and S. Li, J. Chem. Phys. 122, 194109 (2005).

[98] S. Li, W. Li, and T. Fang, J. Am. Chem. Soc. 127, 7215 (2005).

[99] T. Nakano et al., Chem. Phys. Lett. 318, 614 (2000).

[100] K. Kitaura, S. Sugiki, T. Nakano, Y. Akiyama, and M. Uebayasi, Chem. Phys. Lett. 336, 163 (2001).

[101] V. Deev and M. A. Collins, J. Chem. Phys. 122, 154102 (2005).

[102] X. Chen, Y. Zhang, and J. Z. H. Zhang, J. Chem. Phys. 122, 184 (2005).

[103] C. V. Alsenoy, Y. Ching-Hsing, A. Peters, J. M. L. Martin, and L. Schafer, J. Phys. Chem. A 102, 2246 (1998).

[104] T. Ikegami et al., Proc. of Supercomputing, IEEE Computer Society 12, 10 (2005).

[105] D. G. Fedorov and K. Kitaura, J. Chem. Phys. 121, 2483 (2004).

[106] K. Fukuzawa et al., J. Comput. Chem. 26, 1 (2005).

[107] D. G. Fedorov, R. M. Olson, K. Kitaura, M. S. Gordon, and S. Koseki, J. Comput. Chem. 25, 872 (2004).

[108] Y. Komeiji et al., Chem. Phys. Lett. 372, 342 (2003).

[109] M. Kobayashi, T. Akama, and H. Nakai, J. Chem. Phys. 125, 204106 (2006).

[110] M. Kobayashi, Y. Imamura, and H. Nakai, J. Chem. Phys. 127, 074103 (2007).

[111] D. G. Fedorov, T. Ishida, M. Uebayasi, and K. Kitaura, J. Phys. Chem. A, DOI: 10.1021/jp0671042 (2007).

[112] T. Vreven, K. Morokuma, O. Farkas, H. B. Schlegel, and M. J. Frisch, J. Comput. Chem. 24, 760 (2003).

[113] K. S. Byun, K. Morokuma, and M. J. Frisch, J. Mol. Struct. (Theochem) 462, 1 (1999).

[114] R. Wieczorek and J. J. Dannenberg, J. Am. Chem. Soc. 126, 14198 (2004).

[115] K. Nemeth and M. Challacombe, J. Chem. Phys. 121, 2877 (2004).

[116] RasMol, Molecular Visualization Freeware, http://www.umass.edu/microbio/rasmol/, 2007.

[117] B. M. Bode and M. S. Gordon, J. Mol. Graphics Mod. 16, 133 (1998).

[118] A. C. Limaye and S. R. Gadre, Curr. Sci. (India) 80, 1296 (2001).

[119] W. L. DeLano, The PyMOL Molecular Graphics System (2002), http://www.pymol.org.

[120] Jmol, A Java based opensource molecular visualization tool, http://www.jmol.org, 2007.

[121] OpenGL, The Industry Standard for High Performance Graphics, http://www.opengl.org/, 2007.

[122] IBM, OpenDX: Open Visualization Data Explorer, http://www.research.ibm.com/dx/, 2002.

[123] A. C. Limaye, Drishti - Volume Exploration and Presentation Tool, Poster presentation, Vis 2006, Baltimore, 2006. Also see URL: http://sf.anu.edu.au/Vizlab/drishti/.

[124] P. J. MacDougall and C. E. Henze, Fleshing-out pharmacophores with volume rendering of the Laplacian of the charge density and hyperwall visualization technology, in The Quantum Theory of Atoms in Molecules, pages 499-514, Wiley-VCH, 2007.

[125] G. Kumfert et al., J. Phys. (Conference Series) 46, 479 (2006).

[126] M. F. Sanner, Structure 13, 447 (2005).

[127] Python, The Python Programming language, http://www.python.org, 2007.

[128] F. G. Kalatzis, D. G. Papageorgiou, and I. N. Demetropoulos, Comp. Phys. Comm. 175, 359 (2006).

[129] J. W. Chastine et al., AMMP-Vis: A collaborative virtual environment for molecular modeling, in VRST '05, November 7, 2005, Monterey, California, USA, pages 8-15, 2005.

[130] JOELib, A Cheminformatics algorithm library, which was designed for prototyping, data mining, graph mining and algorithm development, http://joelib.sourceforge.net/, 2007.

[131] CDK, Chemistry Development Kit, A Java library for structural chemo- and bioinformatics, http://cdk.sourceforge.net/, 2007.

[132] Mathematica, Wolfram Research, http://www.wolfram.com/, 2007.

[133] G. E. Moore, Electronics 8, 38 (1965).