Chapter 1

Introduction

"If you teach a computer to write a piece of music by feeding it an algorithm, have you composed the resulting piece or has the computer?" - Alexander Gelfand; The Sounds of Science; The Walrus (Toronto, Canada); Jun 2007

1.1 A Journey through Possibilities in Scientific Computing

In broad terms, Scientific Computing is the application of computers to solving a variety of computational problems in Science. This ever-expanding field, originally oriented purely towards applications of computing methods to Science, has matured over the past decade or so and has, in fact, contributed many original algorithms and techniques to the general field of Computer Science. With advances in modern scientific techniques enabling better cures for diseases, a deeper view into the origin of our universe and better control of tiny electronics for designing the next generation of computing devices, the related data and the complexity of handling its interpretation have become a major challenge. At this juncture, methods in Computer Science and Scientific Computing have helped scientists gain a faster and better understanding of the data at their disposal. Today, the methods of Scientific Computing have a far-reaching impact on the way Science is understood, explored and put to daily use in everyone's life.

The use of the computer as a tool for scientific discovery has largely been made possible by immense technological advances in the development of the Central Processing Unit (CPU), commonly referred to as the processor. In the latter half of the last decade, these developments have brought the power of a supercomputer to a desktop machine requiring far less infrastructure (in terms of electrical power and cooling) for tapping into its computational resources. The development of IBM's Cell processor[1, 2, 3] and multi-core processors from Intel[4], AMD[5] and Sun Microsystems[6] has brought to the desktop what was possible only on a supercomputer a few years ago: Parallel Programming[7, 8].

In recent years, computing power has slowly started moving away from the mere desktop to the living room (set-top boxes, high-end gaming stations such as the PS3[9] and X-Box[10]) and to mobile devices, which has opened up a whole new possibility of utilizing the computing power offered by these devices for scientific exploration. This trend is evident from the recent release of the Folding@Home clients for the PS3 and X-Box gaming machines[11]. These new generation gaming devices are, in fact, powered by highly parallel vector processors such as IBM's Cell processor, which in turn provides a very attractive platform for high-performance scientific code. One of the first codes to take advantage of this new architecture is the Folding@Home project. This project uses a highly distributed setup of voluntary contributions from a number of PC or PS3 owners to study the protein folding problem, largely in relation to various disorders affecting humans[12]. There is also a trend towards using the auxiliary power offered by a graphics processing unit (GPU) to do some computations asynchronously with the main CPU[13]. Out of curiosity, the author himself has been exploring the use of programmable mobile devices for performing small quantum chemical calculations[14].

So much is the thrust in the field of Scientific Computing that the leading peer-reviewed journal Nature carried a special news feature series titled "2020 Computing"[15]. In this series of articles, scientists working in various fields evaluated the possible uses of computers in the coming years based on current trends and research. Fields as diverse as Geography, Earth and Seismic Science, Astronomy, Economic and Financial analysis and even Psychology increasingly use massive computing power which, many a time, handles real-time physical data. Apart from conventional semiconductor-based computers, there is also a trend towards building highly parallel computers based on quantum theory[16, 17]. These computers use a Quantum Bit (generally known as a qubit) as the building block rather than a conventional Binary Digit (referred to as just a Bit). A qubit, in comparison to its classical counterpart, can take three kinds of states: 1, 0 and a superposition of 1 and 0, potentially allowing a vast amount of information to be represented in a single such quantum bit. Further, these computers are claimed to be far superior at solving certain problems that are very difficult or not solvable using present-day technology[18].

All these interesting aspects of Scientific Computing are the basic propelling factors behind the author's interest in investigating them, though in this case restricted to quantum chemical methods and related visualization tools. The current work focuses on the scaling problem of current ab initio quantum chemical codes. It also delves into the development of Web-based tools and an integrated development environment (IDE) specifically tailored for structure-based computational chemistry.

1.2 Codes, Scalability and Linear Scaling Algorithms

Over the past decade or so, computational chemistry has played a major role in assisting the experimental chemist, either in explaining the results of experiments or in discovering unexplored chemical pathways. In recent years, however, the computational chemist has come to play a more proactive role as a medium of future discoveries and the design of novel molecules and clusters.

[Figure 1.1 appears here. The schematic orders methods by computational cost: empirical force-field (FF) and semi-empirical methods can handle system sizes of roughly O(10000) down to O(1000) atoms, while ab initio HF, MP2 and CC methods are restricted to roughly O(100), O(10) and a few atoms respectively; time/storage complexity increases in the same direction as the system size that can be treated decreases.]

Figure 1.1: Complexity and sophistication of various methods used in structure based computational chemistry.

Of the various methods available to a computational chemist, ab initio quantum chemical methods enjoy wider acceptance for their ability to predict experimental results quite accurately. Other methods, such as molecular mechanics[19, 20] and semi-empirical[21, 22, 23] methods, are fast but not as reliable, being based on a number of assumptions. Ab initio quantum chemical methods, on the other hand, provide approximate solutions to the Schrödinger equation (Eq. 1.1) which are accurate enough to describe the interactions in a many-particle system. In Eq. 1.1, H is the Hamiltonian, E represents the total energy and Ψ represents the molecular wavefunction.

HΨ = EΨ    (1.1)

However, these approximate solutions, such as the Hartree-Fock (HF) equations, have a very high scaling factor (typically O(N^3)) with respect to the number of basis functions N used to approximate the atomic orbitals[24, 25]. More accurate methods which take electron correlation effects into consideration, such as Møller-Plesset second order perturbation theory (MP2) and coupled cluster (CC) methods, have an even worse scaling factor of O(N^5) or higher[24, 26]. This scaling is not only in terms of compute requirements but also in terms of memory requirements, which further restricts the use of these methods to only a few hundred atoms for the HF method and to only a few tens of atoms for correlated (or post-HF) methods. The schematic representation in Fig. 1.1 sums up the memory and time complexities of various methods for a single point evaluation of the energy. This also clearly brings out the need for improvements in the current conventional methods and codes if they are ever to be routinely applied to larger systems of chemical and biological interest. The ensuing Sections bring out some methodological and computational details pertaining to the ab initio quantum chemical methods investigated in this thesis.
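To get a rough feel for what these scaling factors mean in practice, the short illustration below (not taken from any particular code; the sizes are arbitrary) computes how the formal operation counts quoted above grow when the number of basis functions is doubled:

```python
# Illustrative only: relative growth of formal operation counts with basis
# set size for the methods discussed in the text. Numbers are schematic
# growth factors, not timings.

def relative_cost(n_small: int, n_large: int, order: int) -> float:
    """Factor by which an O(N^order) step grows when N increases."""
    return (n_large / n_small) ** order

for order, label in [(2, "one-electron integrals"),
                     (3, "diagonalization"),
                     (4, "two-electron integrals (ERIs)"),
                     (5, "MP2 transformation")]:
    print(f"{label}: doubling N costs {relative_cost(100, 200, order):.0f}x more")
```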

1.2.1 Hartree-Fock, Møller-Plesset and Density Functional Theory

The goal of all quantum chemical methods applied to many-electron systems is to calculate the chemical properties of molecular systems from first principles. This essentially involves solving the Schrödinger equation (Eq. 1.1), as accurately as possible, to determine the wavefunction Ψ.

Hartree-Fock Method

The time-independent Schrödinger equation cannot be solved exactly except for the simplest systems, so for practical applications a series of approximations has been introduced. One of the most important of these is the Born-Oppenheimer approximation[27], wherein the motions of the electrons and the nuclei are treated separately owing to the enormous difference in their masses. Another major simplification, based on the Born-Oppenheimer approximation, was introduced by Hartree and Fock[28, 29]. The so-called canonical Hartree-Fock equations are given as:

F ψ_i = ε_i ψ_i    (1.2)

where F is known as the Fock operator and the index i runs over the wavefunctions used to represent the system, whose orbital energies are ε_i. Roothaan and Hall[30, 31] derived a particular case of the Hartree-Fock equations by expanding the molecular orbitals ψ_i as linear combinations of atomic orbitals (AOs) φ, which are in turn expanded in a fixed set of basis functions. Writing the expansion over the basis functions φ_μ as

ψ_i = Σ_μ C_μi φ_μ    (1.3)

the Hartree-Fock equations can be represented in a matrix form known as the Roothaan equations. For closed shell systems they are of the form:

FC = SCε    (1.4)

In the above equation, F is the Fock matrix, C is the expansion coefficient matrix and S is the overlap matrix, while ε contains the eigenvalues representing the orbital energies of the system. All the matrices are of dimension N × N for a molecular system represented with N basis functions. Eq. 1.4 is a form of pseudo-eigenvalue problem and hence cannot be solved directly; an iterative method is employed for solving it. This method involves providing a trial function as an initial guess and then successively improving it using the Self-Consistent Field (SCF) procedure. The basic computational steps involved in the HF SCF procedure are depicted in Fig. 1.2. The major task in the SCF procedure is evaluating the one- and two-electron integrals over the basis functions φ, which are needed at each iteration of the SCF. However, the integrals themselves do not change during the SCF procedure. Of the two types of integrals, the two-electron integrals, also termed electron repulsion integrals (ERIs), are the more compute intensive. While ERIs have a complexity of O(N^4), for the one-electron integrals it is O(N^2). In fact, ERI evaluation is the most computationally expensive as well as storage-intensive part of the SCF procedure. Once the integrals are evaluated, the overlap matrix S and a guess density matrix P are set up. Next, the initial Fock matrix F is set up from the one-electron Hcore matrix, the density matrix P and the two-electron integrals < ij|kl > as follows:

F_ij = H^core_ij + Σ_kl P_kl [ < ij|kl > - (1/2) < ik|jl > ]    (1.5)

The Fock matrix is then diagonalized to obtain the MO coefficient matrix C, which is then used to refine the guess P matrix. This iterative process is continued till the difference between the old P matrix and the newer refined matrix is below a threshold value. The total energy of the system is then computed as:

E = E_NN + (1/2) Σ_ij P_ij H^core_ji + (1/2) Σ_ij P_ij F_ji    (1.6)

[Figure 1.2 appears here. The flowchart runs: Start; read nuclear coordinates etc.; calculate {S_ij}, {h_ij} and the two-electron integrals; diagonalize S and form X = S^(-1/2); guess P^(0); set up the initial Fock matrix F; form F' = X^T F X and diagonalize it (F'C' = C'E), obtaining C = XC'; form P^(new) from C; if not converged, repeat from the Fock build; finally write the MOs (matrix C) and P^(n); Stop.]

Figure 1.2: Flowchart for the iterative SCF procedure of a Hartree-Fock calculation at a fixed geometry. The following notation is used: S represents the overlap matrix, h_ij are the elements of the Hcore matrix, F represents the Fock matrix, P denotes the density matrix and C represents the MO coefficient matrix. See text for details.

In the above equation, the E_NN, P_ij H^core_ji and P_ij F_ji terms represent the nuclear-nuclear repulsion energy, the nuclear-electron attraction energy and the two-electron repulsion energy, respectively. Note that the diagonalization procedure (which typically scales as O(N^3)) is also one of the computationally expensive parts of the SCF procedure. However, this cost is far less than that of evaluating the two-electron integrals. Section 1.2.2 provides a few more details on the computational and storage requirements.
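The SCF cycle of Fig. 1.2 can be summarized in a few lines of code. The following is a minimal, dense-matrix Python sketch under stated assumptions (an in-core ERI array in the < ij|kl > convention of Eq. 1.5, and the hypothetical argument names in the docstring); it is not the implementation used by GAMESS or any other production code, which use far more elaborate integral-driven schemes:

```python
import numpy as np
from scipy.linalg import eigh  # solves the generalized problem F C = S C e

def scf(h_core, s, eri, n_occ, max_iter=50, tol=1e-6):
    """Minimal closed-shell SCF sketch following Fig. 1.2.

    h_core : (N, N) one-electron Hcore matrix, s : (N, N) overlap matrix,
    eri    : (N, N, N, N) array of < ij|kl > integrals,
    n_occ  : number of doubly occupied orbitals.
    """
    n = h_core.shape[0]
    p = np.zeros((n, n))                      # initial guess density
    e_old = 0.0
    for _ in range(max_iter):
        # F_ij = Hcore_ij + sum_kl P_kl [<ij|kl> - 1/2 <ik|jl>]  (Eq. 1.5)
        j = np.einsum('ijkl,kl->ij', eri, p)  # Coulomb contribution
        k = np.einsum('ikjl,kl->ij', eri, p)  # exchange contribution
        f = h_core + j - 0.5 * k
        _, c = eigh(f, s)                     # diagonalize F in the S metric
        c_occ = c[:, :n_occ]
        p = 2.0 * c_occ @ c_occ.T             # refined density matrix
        e_el = 0.5 * np.sum(p * (h_core + f)) # electronic part of Eq. 1.6
        if abs(e_el - e_old) < tol:           # converged on the energy
            break
        e_old = e_el
    return e_el, c, p
```

Adding the nuclear repulsion E_NN to the returned electronic energy gives the total energy of Eq. 1.6.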

Møller-Plesset Method

While the HF approximation is extremely successful in many cases, it has some known limitations[24]. A number of methods, generally termed post-HF methods, provide more accurate descriptions of chemical systems. It must be emphasized here, however, that the HF SCF procedure forms the basis of all post-HF methods. In other words, all post-HF methods are corrections introduced to the HF energy and are not a completely different theory. Examples of such post-HF methods include configuration interaction (CI), Møller-Plesset perturbation theory (MPPT) and coupled-cluster (CC) methods[24, 32]. In the present work, one such method, viz. Møller-Plesset perturbation theory, is considered. MPPT is a perturbation method wherein the zeroth order term is the HF wavefunction. The perturbation is the difference between the approximate Hamiltonian and the actual molecular Hamiltonian. When such a perturbation expansion is carried out, corrections to the HF-SCF energy and wavefunction to various orders can be obtained. A perturbation expansion truncated at order 2 is called second order Møller-Plesset perturbation theory (MP2). The present work restricts its investigations to the MP2 level of theory. The MP2 correction to the energy is given by:

E^(2) = (1/4) Σ_{p,q,r,s} | < pq||rs > |^2 / (ε_p + ε_q - ε_r - ε_s)    (1.7)

where < pq||rs > = < pq|rs > - < pq|sr >, with p, q running over occupied and r, s over virtual spin orbitals.

Here ε_p is the SCF energy of orbital p. The integrals < pq|rs > are termed superintegrals and are formed by transformation of the AO-level integrals. These transformed integrals (also termed MO-level integrals) are the same two-electron integrals required for the SCF procedure, except that they are now transformed from the AO basis to the MO basis using the MO coefficients obtained from the SCF procedure. The transformation is given as follows:

< pq|rs > = Σ_i Σ_j Σ_k Σ_l C_pi C_qj C_rk C_sl < ij|kl >    (1.8)

Here p, q, r, s are MO indices, i, j, k, l are AO indices and C is the MO coefficient matrix. The MO indices take values over all the occupied as well as virtual orbitals, while the AO indices run over all the basis functions. Evaluation of all < pq|rs > as per Eq. 1.8 has a computational complexity of O(N^8). However, it can be rearranged as a series of partial transformations so that the complexity is reduced to O(N^5). One commonly employed scheme, in which four quarter transformations are performed, is shown below:

< pj|kl > = Σ_{i=1}^{N} C_pi < ij|kl >    (1.9)

< pq|kl > = Σ_{j=1}^{N} C_qj < pj|kl >    (1.10)

< pq|rl > = Σ_{k=1}^{N} C_rk < pq|kl >    (1.11)

< pq|rs > = Σ_{l=1}^{N} C_sl < pq|rl >    (1.12)

Note that each of the quarter transformations involves O(N^5) terms. This scaling can be understood from the fact that Σ_{i=1}^{N} C_pi < ij|kl > (i.e. the RHS of Eq. 1.9) involves N operations and is varied over p, j, k, l. The index p takes values over all occupied as well as virtual orbitals, say N', while j, k, l take values over all N basis functions. Thus the total number of computations is N'·N^4 and hence, in general, a quarter transform scales as O(N^5). Overall, the full transformation requires 4·O(N^5) evaluations. For the above equations, the intermediate integrals can be stored either in main memory or in secondary storage. The minimum storage requirement of such a quarter transform is O(N^3), owing to the fact that the intermediate storage array needs to accommodate about N^3 elements for each of the N' transformed indices.
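A minimal sketch of the four quarter transformations (Eqs. 1.9-1.12) and the subsequent MP2 sum is given below, assuming an in-core AO integral array; the function names and arguments are illustrative only. Note that the MP2 routine shown uses the closed-shell spatial-orbital form of the correction rather than the spin-orbital form of Eq. 1.7:

```python
import numpy as np

def ao_to_mo(eri_ao, c):
    """Four O(N^5) quarter transformations of Eqs. 1.9-1.12.

    eri_ao : (N, N, N, N) array of AO integrals < ij|kl >,
    c      : (N, N) MO coefficient matrix, c[i, p] = C_pi.
    """
    # Each einsum contracts one AO index with C at O(N^5) cost; the
    # one-shot transform of Eq. 1.8 would instead cost O(N^8).
    q1 = np.einsum('ip,ijkl->pjkl', c, eri_ao)   # Eq. 1.9
    q2 = np.einsum('jq,pjkl->pqkl', c, q1)       # Eq. 1.10
    q3 = np.einsum('kr,pqkl->pqrl', c, q2)       # Eq. 1.11
    return np.einsum('ls,pqrl->pqrs', c, q3)     # Eq. 1.12

def mp2_energy(eri_mo, eps, n_occ):
    """Closed-shell, spatial-orbital variant of the MP2 sum (cf. Eq. 1.7).

    eri_mo : MO-level integrals from ao_to_mo (same < pq|rs > convention
             as the text), eps : orbital energies from the SCF step,
    n_occ  : number of doubly occupied orbitals.
    """
    e2 = 0.0
    n = len(eps)
    for i in range(n_occ):
        for j in range(n_occ):
            for a in range(n_occ, n):
                for b in range(n_occ, n):
                    iajb = eri_mo[i, a, j, b]
                    ibja = eri_mo[i, b, j, a]
                    e2 += iajb * (2.0 * iajb - ibja) / (
                        eps[i] + eps[j] - eps[a] - eps[b])
    return e2
```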

Density Functional Method

Density Functional Theory (DFT) is one of the most widely used ab initio methods, as it provides a correlation correction to the energy with a timing factor similar to HF. The basic foundation of DFT was laid by the formulation of the celebrated Hohenberg-Kohn (HK) theorems[33], in which the electron density plays the role of the basic variable. They can be stated as:

(1) The external potential is determined, within an additive constant, by the ground state electron density, and conversely.

(2) The energy due to any normalized, non-negative trial density that satisfies certain conditions is variational.

As an implication of the above theorems, the energy can be expressed as a functional E[ρ(r)] of the ground state density ρ(r). In DFT, the electronic energy functional is expressed in terms of contributions from the kinetic energy (T), the external potential (V) and the electron-electron interaction energy (U) as: E[ρ] = T[ρ] + V[ρ] + U[ρ]. A practical, orbital-based solution to this formulation was provided by Kohn and Sham[34] with the introduction of orbitals in 1965. Applying the HK theorems with respect to the KS orbitals yields the KS equations:

( -(1/2)∇^2 + V_eff(r) ) ψ_i(r) = ε_i ψ_i(r)    (1.13)

The effective potential V_eff is a sum of the external potential V_ext, the electron-electron repulsion term and the exchange-correlation potential V_xc, as given below:

V_eff(r) = V_ext(r) + ∫ ρ(r')/|r - r'| dr' + V_xc(r)    (1.14)

The KS equivalent of the HF matrix form (Eq. 1.4), using a finite basis expansion, can now be written as:

H^KS C = SCε    (1.15)

Here H^KS resembles F from the HF equations (Eq. 1.4). This term, as indicated in Eq. 1.13 and Eq. 1.14, comprises the contributions from the two-electron integrals (as in the case of HF) and involves two additional terms: exchange and correlation. Note that the exchange and correlation term in H^KS is not known exactly and is derived from E_xc. This results in an SCF procedure similar to that of HF, although each iteration is more expensive than its HF counterpart, as it involves numerical integration for evaluating the exchange-correlation integrals. The most widely used exchange-correlation functionals are the so-called hybrid functionals, which combine HF exchange with the exchange-correlation functionals of DFT. The present thesis uses one such popular functional, Becke's three-parameter exchange functional, also called B3LYP[35], but the methods presented could be used with any other functional without any loss of generality. The B3LYP model is defined by:

E_xc^B3LYP = (1 - a) E_x^LSDA + a E_x^HF + b ΔE_x^B88 + (1 - c) E_c^LSDA + c E_c^LYP    (1.16)

In the above equation, a, b and c are taken to be 0.20, 0.72 and 0.81, respectively. E_x^LSDA and E_c^LSDA are the local spin density approximation exchange and correlation energies, E_x^HF is the HF exchange term, while ΔE_x^B88 and E_c^LYP are the DFT exchange and correlation terms provided by Becke and by Lee, Yang and Parr, respectively. For more elaborate details regarding the theory and application of these methods, the reader is directed elsewhere[24, 26, 36].

1.2.2 Ab initio one-electron Property Evaluation, Geometry Optimization and its Scalability

The methods described in the previous section can be used for describing various one-electron properties of a molecular system. These one-electron properties, like the electron density, electrostatic potential, electronic moments etc., can be computed from the molecular wavefunction or the density matrix obtained from the SCF procedure. Computation of a property itself is an O(N^2) process, which is far less expensive than obtaining the density matrix or the wavefunction, as evident from the previous section. The O(N^2) complexity comes from the fact that the basic operation involved in all property evaluations is the binary product over the N basis functions. For instance, the molecular electrostatic potential (MESP) at a point r can be computed from the ab initio molecular wavefunction as:

V(r) = Σ_A Z_A / |r - R_A| - Σ_ij P_ij ∫ φ_i(r') φ_j(r') / |r - r'| dr'    (1.17)

Here Z_A is the charge on nucleus A located at R_A, P_ij is an element of the density matrix and φ_i, φ_j are the basis functions.
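A sketch of how Eq. 1.17 translates to code is shown below. The nuclear term is computed directly, while the electronic term is assumed to come from precomputed attraction-type one-electron integrals (the hypothetical v_ints array, as would be supplied by an integral library); it is the O(N^2) contraction of these with the density matrix that dominates the cost:

```python
import numpy as np

def mesp_at_point(r, nuclei, p, v_ints):
    """Sketch of Eq. 1.17: MESP at a point r.

    nuclei : list of (Z_A, R_A) tuples (charge, position array),
    p      : (N, N) density matrix from the SCF procedure,
    v_ints : (N, N) array of integrals int phi_i(r') phi_j(r') / |r - r'| dr'
             for this r, assumed precomputed by an integral library.
    """
    v_nuc = sum(z / np.linalg.norm(r - pos) for z, pos in nuclei)
    v_el = np.sum(p * v_ints)   # the O(N^2) contraction noted in the text
    return v_nuc - v_el
```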

1.2.3 A note on Geometry Optimization methods

Prediction and reproduction of molecular conformation has been one of the most successful applications of quantum chemistry[24, 26, 44, 45, 46, 47]. Even with minimal basis sets, quantum chemical programs have been able to reproduce bond lengths accurate to ±0.02 Å and bond angles to ±5° in a number of cases[24]. Application of larger basis sets and better levels of theory that incorporate electron correlation effects can now produce geometries that challenge crystallographic data for accuracy. Even though there is a strong case for using quantum chemical calculations to predict molecular structure, the process is an extremely demanding task. For M atoms, the energy is a function of 3M - 6 (or 3M - 5) vibrational degrees of freedom. An exhaustive search for the minima might need to visit all "3M" regions; however, this thesis restricts itself to finding the best possible local minima. From the previous Sections it is clear that the energy of a molecule E is a parametric function of the nuclear positions X = (X_1, X_2, ..., X_3M). A geometry optimization procedure involves moving from X to X_new such that E(X_new) < E(X). The energy can then be expanded in a Taylor series[24, 44] about X as:

E(X_new) = E(X) + q^t f(X) + (1/2) q^t H(X) q + ...    (1.18)

Here q is the displacement, q = (X_new - X). The energy gradient f is given as:

f_i = ∂E(X)/∂X_i    (1.19)

The corresponding Hessian matrix element is given by:

H_ij(X) = ∂^2 E(X) / (∂X_i ∂X_j)    (1.20)

Although the Taylor series is infinite, the quadratic form is an accurate enough description near an extremal position (say X_e). The extremum point is by definition characterized by f(X_e) = 0, i.e. the derivative of the energy w.r.t. the nuclear positions vanishes. Thus, for X in the neighbourhood of X_e,

E(X) = E(X_e) + (1/2) q^t H(X_e) q    (1.21)

The gradients can also be written as:

f(X_new) = f(X) + H(X) q    (1.22)

When X_new = X_e, f(X_e) = 0 and the above equation becomes:

f(X) = -H(X) q    (1.23)

The solution of Eq. 1.23 leads to some of the most efficient procedures used to find extrema of functions of several variables for which the functional form of E(X) is not explicit in X. With the assumption that H is nonsingular, the displacement q, which leads to a solution X_e from any given X "near enough" for the energy function to be nearly quadratic, can be computed as:

q = -H^(-1)(X) f(X)    (1.24)

All the algorithms basically use the above equation to decide the direction and the step size to be taken from a starting point X in search of an extremum. In the case of the HF and MP2 methods, analytic calculation of the derivatives and the Hessian is possible and is generally used instead of numerically differentiating the energy[26]. For DFT, however, it is common to use numerical derivatives of the energy, though some quantum chemical codes do provide analytic derivatives. One of the earliest works on analytic derivatives for HF was reported by Handy et al. and implemented in a once popular package called MICROMOL[48]. MICROMOL in itself started a sort of revolution in quantum chemistry codes (QCC) by targeting its development at microcomputers and PCs, in contrast to the mainframes on which all other popular codes were based. Later on, this trend was followed by others like TURBOMOLE[49], Gaussian[50] and GAMESS[51], which were ported to the PC platform as its power and availability grew dramatically during the 1990s. Once the gradients and Hessian are defined, the necessary and sufficient conditions for a minimum energy structure are:

f_i = 0    (1.25)

H_ij > 0    (1.26)

However, in actual practice, f_i is checked against a "near zero" threshold (say |f_i| < ε, where ε = 10^-4) rather than for equality with zero. The algorithms that drive a given guess geometry towards the minimum energy structure can be broadly classified as:

(1) those that work without gradients

(2) those employing numerical gradients and second derivatives

(3) those based on analytic gradients and numerical second derivatives

(4) those using analytic gradients and second derivatives

A number of such methods are described in detail elsewhere[52]; a few relevant methods are described here. Of the types of methods enumerated above, quantum chemical programs generally use type 3 algorithms. Type 4 methods are the most reliable ones, as they utilize the most information about the energy function; however, they turn out to be far too expensive for routine application to geometry optimization. Type 1 algorithms are among the simplest to implement.

Direct function based methods

Methods such as simplex optimization[52], for instance the one given by Nelder and Mead[53], fall under this category. A brief algorithmic description of this method is presented here:

(1) The algorithm begins by constructing a simplex for the optimization problem. A simplex is a geometrical construct with M + 1 interconnected vertices, where M is the dimensionality of the energy function (i.e. the number of coordinates representing the molecular system). For a problem involving minimization over three variables, a tetrahedral simplex is used. Similarly, for a non-linear arrangement of 3M Cartesian coordinates a simplex of 3M + 1 vertices is used; if internal coordinates are used instead, the simplex has 3M - 5 vertices.

(2) Once the simplex is set up, the algorithm locates a minimum by moving around on the potential energy surface in a manner akin to the motion of an amoeba[52]. There are three basic kinds of moves possible:

(a) The first move involves reflection of the vertex with the highest value through the opposite face of the simplex, so as to attempt to generate a new point that is lower in value.

(b) If this new point is found to be lower than the previous one, the reflection is followed by an expansion; the second kind of move. When the optimizer reaches the "bottom of a valley", the above moves would leave it stuck. To avoid this, the simplex contracts along the direction of the highest-valued vertex; again in an attempt to "zero in" on the minimum.

(c) If this move too fails, a third kind of move is possible in which the simplex contracts in all directions.

(3) Step 2 is repeated until no further moves are possible or none of the moves gives a lower energy configuration. At this juncture the simplex algorithm is said to have converged.

The simplex method is attractive because of its relatively straightforward implementation and because it requires no gradients whatsoever. However, the disadvantages of this method completely outweigh its usefulness for routine application to ab initio geometry optimization. Of the many disadvantages, the prime one is setting up the simplex itself, which requires 3M + 1 energy evaluations. With each energy evaluation being a very expensive proposition for QC methods (see Section 1.2.1), the simplex algorithm turns out to be inefficient and impractical for geometry optimization.
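For illustration, the sketch below drives SciPy's Nelder-Mead implementation on a cheap analytic surrogate (the toy_energy function is invented for this example); as argued above, building the initial simplex alone would cost 3M + 1 real ab initio energy evaluations, which is exactly why the method is impractical for QC geometry optimization:

```python
import numpy as np
from scipy.optimize import minimize

def toy_energy(x):
    # Cheap analytic stand-in for an ab initio energy surface.
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.cos(5 * x))

x0 = np.zeros(6)  # e.g. 3M - 6 internal coordinates of a 4-atom system
res = minimize(toy_energy, x0, method='Nelder-Mead',
               options={'xatol': 1e-6, 'fatol': 1e-6})
print(res.x, res.fun)
```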

Methods using gradients

Gradient-based methods overcome this problem by taking no more than M steps to reach a minimum for a system with M variables. Of the various gradient-based methods, the steepest descents and conjugate gradient minimization procedures[52] are among the most popular. In the steepest descents method, steps are taken in a direction parallel to the net force, which is analogous to walking straight downhill. For a molecular system with 3M Cartesian coordinates, this direction can be represented using a 3M-dimensional unit vector s_k = -f_k/|f_k|, where f_k is the gradient of the energy w.r.t. the coordinates at point k. While s_k provides the direction of movement, the amount of movement needed to locate the minimum can be decided either arbitrarily or using a simple line search procedure. In the conjugate gradient method, the gradients at each point are orthogonal but the directions are conjugate. A set of conjugate directions has the property that for a quadratic function of M variables, the minimum will always be reached in M steps. The conjugate gradient method moves in a direction v_k from the starting point X, where v_k is computed from the gradient at that point and the previous direction vector v_(k-1) as:

v_k = -f_k + γ_k v_(k-1)    (1.27)

Here γ_k is a scalar given by:

γ_k = (f_k · f_k) / (f_(k-1) · f_(k-1))    (1.28)

Note that Eq. 1.27 can only be applied from the second step onward; the first step is taken as in the steepest descents method, either arbitrarily or using a simple line search procedure. The gradient-based methods described above provide quite good performance in terms of the number of steps required to reach the minimum, which is generally M if the function depends on M variables. As the direction of the gradient is determined by the largest of the interatomic forces, the method is generally robust even if the initial configuration is far away from the extremum.
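A minimal sketch of this conjugate gradient scheme (Eqs. 1.27-1.28, with the Fletcher-Reeves form of γ_k) is shown below; the grad callable and the fixed step length alpha, standing in for a proper line search, are assumptions of the illustration:

```python
import numpy as np

def conjugate_gradient(grad, x, n_steps, alpha=0.1):
    """Sketch of the conjugate gradient scheme of Eqs. 1.27-1.28.

    grad  : callable returning the energy gradient f at geometry x,
    alpha : fixed step length standing in for a line search.
    """
    f = grad(x)
    v = -f                               # first step: steepest descent
    for _ in range(n_steps):
        x = x + alpha * v
        f_new = grad(x)
        gamma = (f_new @ f_new) / (f @ f)  # Eq. 1.28
        v = -f_new + gamma * v             # Eq. 1.27
        f = f_new
    return x
```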

Methods using second derivatives

The Newton-Raphson method is one of the simplest of the type 4 algorithms that explicitly use the Hessian. Geometrically, this method consists of extending a tangent line at the current point X_k until it crosses zero and then setting the next guess X_(k+1) to the abscissa of that zero crossing. Algebraically, Newton-Raphson derives from the Taylor series expansion of the function about a starting point (cf. Eq. 1.18). However, computation of the second-derivative matrix is in itself a very expensive process. Moreover, in a number of cases (especially in quantum chemical calculations) computation of analytic second derivatives might be either too expensive or not possible at all. Thus, a class of methods called quasi-Newton methods has become popular. These methods start with an approximate Hessian H or its inverse and gradually improve it over successive iterations. The most popular of these quasi-Newton update procedures, currently used in most quantum chemical codes, are the Davidon-Fletcher-Powell (DFP) method and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm[52]. The BFGS Hessian update procedure is given by the following expression:

H_(k+1) = H_k + [ (X_(k+1) - X_k) ⊗ (X_(k+1) - X_k) ] / [ (X_(k+1) - X_k) · (f_(k+1) - f_k) ]
        - [ H_k · (f_(k+1) - f_k) ] ⊗ [ H_k · (f_(k+1) - f_k) ] / [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ]
        + [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ] u ⊗ u    (1.29)

In the above expression, ⊗ is used to denote the outer product of two vectors and u is defined as:

u = (X_(k+1) - X_k) / [ (X_(k+1) - X_k) · (f_(k+1) - f_k) ] - [ H_k · (f_(k+1) - f_k) ] / [ (f_(k+1) - f_k) · H_k · (f_(k+1) - f_k) ]    (1.30)

The matrix H is often initialized to a unit matrix. Generally, for an energy function of M variables, quasi-Newton methods converge to a minimum in about M steps, even when starting from an inaccurate guess Hessian. For quantum chemical calculations, however, a better guess Hessian can be provided from a lower level of theory that in some way encodes information about the atoms and the bonding between them in the molecular system. This generally improves the performance of these methods in terms of the total number of steps taken to reach a minimum. A thorough description of this method and its implementation can be found in Ref. [52]. This thesis uses the BFGS algorithm coded into GAMESS for building a new geometry optimization procedure, as described in Chapter 2.
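The update of Eqs. 1.29-1.30 translates almost line-for-line into code. The sketch below (with invented function and variable names, and H taken as the approximate inverse Hessian) applies one BFGS update:

```python
import numpy as np

def bfgs_update(h, x_old, x_new, f_old, f_new):
    """One BFGS update of the approximate inverse Hessian, Eqs. 1.29-1.30.

    h            : current approximate inverse Hessian (often the identity
                   to start with, as noted in the text),
    x_old, x_new : geometries at successive steps,
    f_old, f_new : gradients at those geometries.
    """
    dx = x_new - x_old                 # displacement X_(k+1) - X_k
    df = f_new - f_old                 # gradient change f_(k+1) - f_k
    hdf = h @ df
    u = dx / (dx @ df) - hdf / (df @ hdf)          # Eq. 1.30
    return (h
            + np.outer(dx, dx) / (dx @ df)         # first term of Eq. 1.29
            - np.outer(hdf, hdf) / (df @ hdf)      # second term
            + (df @ hdf) * np.outer(u, u))         # third (BFGS) term
```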

1.2.4 Parallel Computing in Quantum Chemistry Codes

Parallel computing is a method for dramatically improving the total throughput of a task, generally by splitting the workload among a number of interconnected processors[7, 8]. Not all tasks are parallelizable, however; in a large code, only a few tasks within it may be parallelizable. Thus a parallel program will generally be composed of a mixture of parallel and serial code. A parallel computing environment can be distinguished from a purely serial one by examining the order in which instructions are executed. While in a serial computing model instructions are executed one at a time, a number of concurrent instructions can be executed in a parallel computing environment.

The need for parallel computing arose in the early days of computing as a means of obtaining the tremendous computing power required by problems characterized by their sheer size and complexity. These include weather modeling, astronomical data processing and modeling, accurate quantum chemical calculations, and financial modeling and prediction. In the past few years, a new justification has emerged for adopting the parallel computing paradigm: the fundamental physical laws that hinder the miniaturization of uni-processor systems. This has resulted in the recent explosion of multi-core processor offerings from various chip manufacturers.

For a tightly connected parallel computing framework, the kind of network topology used to connect discrete nodes has a large effect on the overall performance of the system. This is especially true when the parallel part of a program is heavily communication oriented. To name a few, ring, bus, mesh and tree are some of the more commonly used network topologies for connecting computers in a tightly coupled supercomputing environment. For a more detailed explanation of parallel architectures and issues of network topology, readers are directed elsewhere[7, 8].

Depending on the problem being parallelized, the workload may be distributed evenly among the available compute nodes. In general, however, it is common for there to be a mismatch between the parallelizable workload and the number of available processors[7, 8]. Thus, when a parallel code is written, it is often necessary to explicitly account for balancing the load among the available nodes. This can be achieved either statically, i.e. before the parallel execution starts, or dynamically as the parallel program proceeds. The former strategy is useful when the compute load of each parallel task is known beforehand, while the latter is typically used when the compute load of the parallel tasks is uneven or not known beforehand. The ensuing text gives a few more insights into the terms, tools and hardware used in parallel computing.

Amdahl's Law

Amdahl's law is named after computer architect Gene Amdahl[54] and is used to find the maximum expected improvement to an overall system when only part of the system is improved. This law is quite often used in parallel computing to predict the theoretical maximum speedup from using multiple processors to solve a problem.

Generally, the maximum speedup S_max for a serial program, a portion of which can be improved (e.g. parallelized) to run p times faster, can be written as:

S_max <= p / (1 + f × (p - 1))    (1.31)

where f always lies between 0.0 and 1.0 and is the fraction of time spent in the portion of the program that cannot be improved (or parallelized). As a special case for the parallelization of a sequential program, Amdahl's law states that if F is the fraction of a calculation that cannot be parallelized, and (1 - F) is the fraction that can be, then the maximum speedup S^N_max achievable using N processors is:

S^N_max <= 1 / (F + (1 - F)/N)    (1.32)

The above law basically serves as a benchmarking tool for sequential codes that can be parallelized in a number of ways; it can be used to measure the performance of a variety of parallelization algorithms for converting a sequential code. However, there is one known limitation of Amdahl's law. According to the law, the theoretical maximum speedup using N processors is N, also termed linear speedup. In practice, however, it is not uncommon to observe more than N-fold speedup on a machine with N processors, termed superlinear speedup[55]. A number of factors (extensive use of cache being one of them) can result in superlinear speedup, and one should be aware of this limitation when applying the above law as a benchmarking tool.
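A small worked example of Eq. 1.32: a code that is only 5% serial already saturates far below linear speedup, as the sketch below shows.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Maximum speedup from Eq. 1.32 for a code whose fraction
    `serial_fraction` (F) cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# A 5% serial code tops out well below linear speedup:
for n in (4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 4 -> 3.48, 16 -> 9.14, 64 -> 15.42, 1024 -> 19.63
```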

Tools for parallel software environments

Along with the evolution of parallel hardware, there has been a constant increase in the maturity of the software environments that support and ease the development of parallel codes. Today parallel programs are mostly written in one, or a combination, of the following environments: Message Passing Interface (MPI)[56, 57], OpenMP[58], Threads[59] and TCP/IP Sockets[60]. All these programming environments support a number of programming languages including, but not limited to, C/C++, FORTRAN, Java and Python.

While MPI provides a high-level programming library to support parallel application development, it assumes reliable communication among nodes. Thus MPI does not provide any fault-tolerance capabilities to applications developed in this environment. Though there has been some recent progress towards building fault-tolerance capabilities into MPI, it is largely not useful for computing environments that do not have very reliable interconnects. Though MPI is a fairly powerful and flexible programming environment, porting an existing serial code to a parallel environment with it might turn out to be a daunting task. For systems that provide large amounts of shared memory, OpenMP provides an easier and cleaner way to develop parallel programs. OpenMP is a directive-based programming environment wherein possible parallelism, such as loop splitting, is specified using one or more directives within the sequential code. This also makes it easier to port originally serial codes to a parallel environment, as minimal changes are made to the logical flow of the code.

While MPI and OpenMP provide a high-level interface for task parallelism, finer control can be obtained by using TCP/IP sockets and threading libraries. In fact, the underlying implementations of MPI and OpenMP on most platforms are calls to lower-level socket or threading libraries. Features like fault tolerance and customized load balancing can easily be incorporated using these lower-level libraries. This is one of the primary reasons for using socket and thread libraries to implement the distributed-mode parallelism described in Chapter 2.

Looking to the future, the development of new parallel software environments like the Fortress programming language from Sun Microsystems[61], X10 from IBM[62] and the Chapel language from Cray[63] would open up interesting possibilities for utilizing massively parallel computing resources, wherein not a few hundred but tens of thousands of processors are used together to solve a large and complex problem. These languages are intended to provide a simpler framework and syntax than MPI and OpenMP for developing highly parallel programs.
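As a flavor of the message-passing style, the sketch below uses the mpi4py Python bindings to MPI (assuming they are installed; the workload is a toy sum, not a QCC kernel) to split work statically across ranks and reduce the result to rank 0:

```python
# Run as, e.g., `mpiexec -n 4 python example.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Static load split: rank k takes every size-th term of a toy workload.
partial = sum(i * i for i in range(rank, 1000, size))
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print("sum of squares:", total)
```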

Advances in parallel hardware platform

The preceding Sections brought out the perennial non-linear scalability of QCC. This resulted in a rather restricted application of QCC to systems with only a few tens of nuclei, at least until about the 1980s. Over the years, however, computing systems have evolved from highly memory-constrained systems (such as the early mainframes) to new, highly coupled multiprocessor systems with terabytes of shared memory (such as the SGI Altix system).

A few such systems, such as IBM's Blue Gene/L[64], have gained a lot of publicity and respect from the computing world due to the introduction of many new concepts in computer architecture and software development for massively parallel computing environments. A number of Blue Gene installations around the world are actively used in research on weather modeling, earthquake prediction and the study of the human brain. While IBM's Blue Gene machines use a humongous 65K or more highly connected processors with about 512MB of local memory per processor, NEC's Earth Simulator[65] is based on highly powerful vector processors. The Earth Simulator was probably the first non-US machine to top the list of fastest supercomputers, until it was in turn overtaken by bigger installations of IBM's Blue Gene. SGI's Altix platform[66] and the Cray platform[67] from Cray Inc. have been the other heavily used architectures for large-scale shared-memory parallel machines. For a detailed description of many of the computer architectures used in these high-end platforms see Refs. [7, 8].

Away from the "big iron" supercomputing facilities, there is an increasing trend of connecting relatively cheap, off-the-shelf commodity computing boxes together into what is popularly called a cluster. In fact, the top500.org website[68], which maintains the list of the world's 500 most powerful supercomputers, has seen an increasing number of these cluster computers over the past few years, and they are giving strong competition to the likes of Cray, SGI and NEC. The author himself, during the course of his work, has had a fair amount of exposure to coding and porting a few existing and new QCC codes on some of these machines in past and present top500 lists[69].

Over the past few years, another term, "Grid computing", has caught the attention of many researchers who need access to high-performance computing resources but individually have very limited local computing resources. Grid computing provides the means to connect a number of computing resources, mostly placed in geographically distinct locations, and use this "ad hoc" computing power to solve CPU-intensive problems that require such large computing power. Though this form of computing is still at an experimental stage, a number of initiatives, largely supported by governments across the world, have come up[70].

Exploiting parallelism for QCC

One of the most fascinating aspects of this fast-changing hardware field has been the equal amount of interest and effort put into harnessing computing power specifically for computational chemistry. The field of computational quantum chemistry in particular has had the advantage of contributions from a continuous stream of numerous researchers. This is largely due to the highly non-linear scaling of QC methods: each new hardware platform is used up, and a computational chemist is always ready with his next big system that cannot be run on this "new" hardware! In contrast to the very slow adoption of QCC in the early 1980s, the availability of ever more powerful high-performance hardware in the past two decades has provided a long-awaited boost to the development of parallel quantum chemical codes as well as their application.

Some of the early work on parallelization of quantum chemical codes was largely centered around GAUSSIAN[50]. Other developments, many of them concurrent, were products of work done in many laboratories around the world. These included TURBOMOLE[49], INDMOL[71, 72], GAMESS[73, 51], ADF[74], [75] and a number of other developments. Apart from developing parallel SCF codes, researchers have also investigated a number of algorithmic modifications to MP2 energy evaluation[76] and tuned it to various architectures, ranging from distributed memory setups[77] to vector computers such as the Cray[78].

The SCF parallelization schemes built into modern QCC have been based either on distributing the workload of computing the two-electron integrals among the available processors, or on using replicated storage and setting up parts of the Fock matrix in parallel[79, 80]. Two-electron integral evaluation, being the most time-consuming part, has received the most attention in actual implementations. As each integral evaluation is independent of the others, these are "embarrassingly parallelizable". Two-electron integrals are generally evaluated using four nested loops as in Algorithm 1.

Algorithm 1 Pseudocode representing the loops for evaluating the two-electron integrals.
Require: Integral indices i, j, k, l
Require: numberOfBasisFunctions, the total number of basis functions representing the molecule
1: for i = 1 to numberOfBasisFunctions do
2:   for j = 1 to i do
3:     for k = 1 to numberOfBasisFunctions do
4:       for l = 1 to k do
5:         Evaluate the 2E integral < ij|kl >
6:       end for
7:     end for
8:   end for
9: end for

The easiest way to parallelize the above is to split the outermost loop evenly among the available processors. Another way is to split all the nested loops among the processors. However, both of these may result in an uneven distribution of load among the processors, as the time each integral requires depends upon the shells on which it is centered. A purely s-type integral (i.e. < ss|ss >) requires less time to evaluate than a mixed p- and s-type integral (such as < ps|ss >)[71, 72]. Thus, it is more profitable to sort these integrals by type and then schedule them on the processors. However, this scheme has the drawback that the pre-processing time is very large. Another scheme is to use dynamic load balancing, wherein only one integral (or a small batch) is sent to each processor; when a processor completes the set allocated to it, the master node sends the next in the list. Though the dynamic load balancing scheme requires frequent communication between the master processor and the compute processors, this is largely offset by the fact that the compute load is more evenly distributed. Dynamic load balancing is by far one of the most effective ways to parallelize two-electron integral evaluation; a sketch of the scheme is given after this discussion.

Another expensive operation in an SCF procedure is setting up the Fock matrix. From Eq. 1.5 it is clear that setting up a Fock element F_ij requires the complete density matrix P and a few relevant two-electron integrals. Generally, Fock matrix parallelization is done by duplicating the P matrix on each node and then sending the relevant two-electron integrals (in the case of conventional HF), or computing them on the fly (in the case of direct HF). If the two-electron integrals are evaluated on the fly, the amount of communication is far less than when they are transferred from a master node. In supercomputing facilities where network IO is slow or comes at a premium, it is best to compute them on the fly, as the high computing speed of the individual nodes offsets the cost of recomputing the integrals during the SCF iterations. MP2 parallelization, on the other hand, is more tricky: most parallel implementations of MP2 energy and gradient evaluation require large local storage and memory as well as very good communication capabilities among the compute nodes.

In the recent past, the availability of highly coupled and massively parallel computational resources has paved the way for highly parallel quantum chemistry codes like MPQC[81] and NWChem[82]. These codes have been designed to scale almost linearly on hundreds of tightly coupled processors. Fast methods such as Resolution of Identity MP2[83] (or simply RI-MP2) coded into these packages provide very fast evaluation of MP2 energies and gradients with very high accuracy compared to exact MP2 evaluations. On the algorithmic front, for exact MP2 energies and gradients without any approximations, there have been recent developments in improving the sequential as well as parallel MP2 implementations in GAMESS by Nagase and his group[84, 85]. These developments have enabled the computation of the MP2 energy for molecules with 2000 basis functions. However, the memory, disk and communication overheads are still prohibitively large for routine application. For instance, the per-node memory requirement for computing the MP2 energy (without gradients) of the taxol molecule with the 6-31G(d) basis (1032 contractions) was about 700MB on 16 nodes.
While the disk requirements were about 90GB per node, the calculations also involved communication amounting to a total of 90GB across all nodes[84, 85]. Further, the scaling order in terms of both memory and disk in all these improved methods is not linear, and hence the requirements increase quite drastically with the size of the system being handled.
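A minimal sketch of the dynamic (master/worker) load balancing scheme described above is given below, using Python's multiprocessing pool as a stand-in for a real master/compute-node setup; evaluate_batch is a placeholder for an actual two-electron integral routine, and the batch size is arbitrary:

```python
from multiprocessing import Pool

def evaluate_batch(batch):
    # Placeholder: a real code would evaluate < ij|kl > for every index
    # quadruple in the batch using an integral library.
    return [sum(quad) * 1e-6 for quad in batch]

def make_batches(n_basis, batch_size=64):
    # Index quadruples following the loop structure of Algorithm 1.
    quads = [(i, j, k, l)
             for i in range(n_basis) for j in range(i + 1)
             for k in range(n_basis) for l in range(k + 1)]
    return [quads[p:p + batch_size] for p in range(0, len(quads), batch_size)]

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # imap_unordered hands a new batch to whichever worker is free,
        # mimicking the master/worker scheme discussed in the text: fast
        # workers simply pull more batches instead of idling.
        for result in pool.imap_unordered(evaluate_batch, make_batches(30)):
            pass  # a real code would accumulate these into the Fock matrix
```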

Alternative hardware for improving QCC performance

The idea of using special purpose devices to speed up computation is not new[86], but recent improvements in electronics have allowed a new kind of parallelism to be explored of late. This largely owes to the availability of low-cost GPUs that can do enormous amounts of array processing at much faster speeds than a general-purpose CPU. Though GPUs are restricted in the kinds of computations that can be performed on them, one major drawback being very scant support for double-precision arithmetic, Yasuda[87] has intelligently exploited the GPU core for computing certain two-electron integrals (for DFT) that do not require high precision. These lower-precision integrals are evaluated on the GPU, asynchronously with others requiring high precision on the CPU, thus providing considerable speedup over using the conventional CPU alone. With chip-building technology becoming cheaper by the day, researchers like Ramdas[88] have gone even further and proposed the development of special purpose computers just for performing Hartree-Fock calculations. While architectures like these may not become widely available in the coming few years, they definitely point to a new way of speeding up quantum chemical calculations. All these developments have surely brought better utilization of available resources for performing quantum chemical calculations, but none of them truly addresses the fundamental, perennial problem of the non-linear scaling of these methods.

1.2.5 Linear Scaling Quantum Chemistry Codes

In an attempt to reduce the CPU and memory requirements of ab initio methods, various researchers have attempted to devise linear-scaling algorithms. Of these, linear-scaling divide-and-conquer (D&C) type algorithms have had a fair amount of success in computing one-electron properties and for structure determination[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]. Christoffersen[89], as early as 1972, had proposed a fragment-based method to yield various chemical properties of molecules with about 200 electrons by applying an ab initio procedure. However, the actual applications reported by Christoffersen were very limited, partly due to the restricted availability of computer hardware in that era. Moreover, this work was based on FSGO approximations for representing the molecular orbitals, which are not as accurate as HF energies obtained using Gaussian basis sets. The group of Yang[92] reported one of the earliest attempts at applying D&C-type methodology to geometry optimization within density functional theory (DFT). Although the results obtained by them were reported to be close to the actual computations (when applied to small molecules like a β-tetrapeptide comprising glycine), there was no apparent advantage in terms of CPU time over the corresponding full Kohn-Sham calculation. Gadre and co-workers[94, 95] independently developed a molecular tailoring approach (MTA) and applied it to the computation of one-electron properties, such as the electron density and electrostatic potential, of many large non-linear molecules such as silicalite, an ibuprofen cluster and many others. Exner and Mezey[96] have developed and employed their adjustable density matrix approximation (ADMA) for determining one-electron properties of various classes of polypeptides. Using the molecular fractionation with conjugate caps (MFCC) approach, developed by Li et al.[98] and later extended by Zhang et al.[102], the estimation of Hartree-Fock (HF), Møller-Plesset second order perturbation (MP2) and B3LYP energies with a maximum error of up to a few millihartrees has been demonstrated. However, most of the systems reported by these researchers are either small or linear, except for the case of the crambin molecule[103]. Although the geometry optimization of the crambin molecule appears to be one of the largest ab initio calculations carried out so far, we find that there are still sufficient computational bottlenecks that need to be overcome before these methods can be routinely applied to biologically interesting systems.

In an almost parallel, independent development, Kitaura's group has developed a similar fragment-based strategy, termed the Fragment Molecular Orbital (FMO) method, which has been incorporated into the GAMESS package[99, 100, 104, 105, 106, 107]. The group of Kitaura has also applied FMO to simulations[108]. However, the FMO technique has largely been tested on and applied to single point energy evaluation of large molecules, obtaining binding/interaction energies of protein or protein-ligand complexes[106, 104] etc. Using the FMO technique, a single point calculation on lysozyme, a protein with about 2000 atoms, at the HF/4-31G level has also been reported[99, 100]; a computational time of 4 days on a cluster of 18 Pentium III computers was reported for this investigation. The largest FMO-enabled calculation, reported recently[104], is on a 20881-atom photosynthetic protein at the RHF/6-31G(d) level of theory using the huge computing power of 600 CPUs. A few systems have also been subjected to geometry optimization, predominantly within the FMO-HF framework. In a related work, Nakai et al. have used the density matrix obtained from Yang's[91] scheme to obtain MP2 energies of medium to large systems[109, 110] such as polyalanine chains, the crambin protein etc. It is worth mentioning here that most of the above groups have introduced ways to compute the single point energy for a given configuration. A few, such as the FMO method of Kitaura et al., have developed strategies to compute gradients and have applied them to geometry optimization of a few molecules only recently[111].

Apart from D&C techniques, there has also been growing interest in developing methods for geometry optimization of large molecules. Of these, the ONIOM[50, 112, 113] method, available through the popular Gaussian'03 ab initio package[50], has been extensively used in recent years. These "semi" ab initio methods usually apply a lower level of theory to the outer region of the molecule and a sufficiently higher level to the region of interest or activity, thus considerably reducing the computational cost. This method has been extensively used by Wieczorek and Dannenberg[114] to demonstrate the stability and cooperative interactions due to H-bonds of β-strands, α-helices and 3_10 helices of polyalanine. Nemeth and Challacombe[115], on the other hand, follow a different approach towards optimization, using a weighted extrapolation of energy gradients to achieve faster convergence for large biological systems.

The present thesis introduces a new version of MTA adapted for geometry optimization as well as property calculation of large molecules. This new scheme, termed Cardinality Guided MTA (CG-MTA), is applied at the HF, B3LYP and MP2 levels of theory to a variety of molecular systems. A distributed and parallel implementation of CG-MTA is also presented, which clearly brings out the ease with which grid-enabled ab initio computations could be achieved.

1.3 Programmable and Visualization environments for Computational Chemists

A computational chemist is, and probably will remain, one of the most active users of computers and computing environments. Be it a desktop PC or a supercomputer, computational chemists (often in collaboration with computational or computer scientists, or in many cases on their own) have over the years developed tools to harness computing power to solve both routine and grand-challenge problems facing chemistry and the other chemical sciences. Several data analysis tools involving visualization, statistical analysis, graphing and charting etc. have also come out, and many of these are available as free or opensource programs.

Visualization tools

For a computational chemist, visualization of the numerical results produced by computer programs has been of prime importance from the very beginning of this field. Owing to this requirement, a number of visualization tools tailored specifically towards structure-based computational chemistry have been developed over the years. These not only provide an intuitive way to present data, but also open up possibilities to examine the data at hand via the various probing aids that many of these tools provide. A number of specialized visualization tools like RasMol[116], MacMolPlot[117] and Univis[118] have been built to address this growing visualization requirement of computational chemists. In recent years, there have been a number of high-quality opensource packages, such as PyMol[119] and JMol[120], that provide support for reading outputs generated by a number of quantum chemical packages. All these packages provide the basic functionality of viewing molecules in different models, ranging from a simple wire model to the more appealing ball-and-stick and sphere models. These packages also provide viewing of contours and surfaces for molecular properties such as the electrostatic potential, solvent accessible surface etc. The quality of graphics produced by these packages varies a lot depending upon the underlying graphics libraries used. While MacMolPlot, Univis and PyMol employ high-quality OpenGL[121] based libraries, packages such as JMol have built their own graphics systems. While all the packages mentioned are available on the PC platform, some, such as MacMolPlot, PyMol and JMol, are available on multiple platforms.

Another class of visualization tools increasingly gaining popularity among computational chemists is the volume exploration tools[122, 123, 124]. While IBM's OpenDX[122] is a complete visualization framework, Drishti[123] provides a user-friendly interface for volumetric data interpretation and visualization. Note that these tools are quite general and are not specifically programmed for computational chemistry applications. However, a few researchers have used these visualization tools for interpretation and applied them to computational chemistry. For instance, MacDougall and Henze[124] have used volume rendering tools for rapid screening of pharmacophores, the potential drug molecules for a known biological target.

Most of the above tools are restricted by the fact that the kinds of visualization that can be performed are only those that have already been coded into the packages. Except for OpenDX, which is in fact an external library for visualization, these packages provide limited or no support for programmability. Of the computational chemistry specific visualization tools, RasMol[116], PyMol[119] and JMol[120] provide limited support for scripting, mostly via a RasMol[116]-style programming language. The scripting ability of these packages is limited to functions for manipulating graphics (like adding trackers, displaying isosurfaces etc.) and reading data (input coordinates, outputs of QCC codes) into the program.

Programmable environments

Ever since the mainstream acceptance of computational chemistry as a tool for new discovery, a number of applications and codes have been written to help computational chemists. However, developing such applications and trying out new ideas on top of an already written set of codes is getting more and more difficult, as none of them have been developed with extensibility and rapid application

development in mind. This problem has been recognized by many researchers working in the field of computational chemistry, and there have been recent attempts to introduce component based libraries that cater to new application development. Consequently, researchers in the field of computational science are now trying to build applications based on component architectures[125, 126]. Ray et al.[125] have built a component architecture that allows extension of ab initio packages like NWChem[82] and MPQC[81]; this component oriented development enables their easy integration into other packages. On the visualization front, Sanner[126] has built visualization toolkits that are programmable and extendable using the Python programming language[127]. This provides scripting capabilities in the visualization environment and allows a user to perform a number of customizations in an automated manner, which is otherwise not possible through a GUI alone. Demetropoulos et al.[128], in a more niche area, have developed a programmable interface to the GAMESS package[73] which can be used to plug in different optimization packages.

The problems handled by computational chemists are increasingly vast and are many a time tackled by a number of people working in different areas of research. Collaborative tools are very helpful in such situations, and this is exactly what Weber et al.[129] have attempted with their AMMP-Vis package. AMMP-Vis is built around a client-server architecture and provides a collaborative virtual environment that aids molecular modeling. Apart from the application point of view, a huge effort has been put forth by many computational chemists around the world to build Java class libraries (such as JOElib[130] and CDK[131]) that provide the most commonly used functionalities for building new applications. However, all these attempts to bring in programmability are restricted either to one niche area (such as a particular package) or to visualization alone. Another class of packages, such as Mathematica[132], is beginning to provide more and more tools that generalize scientific programming through auxiliary tools usable by non-mathematicians. One of the strengths of Mathematica-type packages is the availability of a symbolic processing language that eases the solution of a particular problem without imposing a steep learning curve. However, in this case too, the programming interface is restricted to the in-built language, and programming these environments in a language of one's choice is not an easy task.

As a part of this dissertation, an attempt to build a cross-platform programmable environment for computational chemists is described. A novel feature of this environment is that it provides a complete application development environment, with a set of extensive Application Programming Interfaces (APIs), editors, and a visualization tool in a single integrated package. It also provides an easy mechanism to use existing Java class libraries such as JOElib[130] and CDK[131], or any other user developed libraries. Further, the tool provides a number of utilities for trying out and experimenting with divide and conquer type algorithms for structure based computational chemistry, specifically supporting the CG-MTA algorithms discussed later in Chapter 2.
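As a small illustration of the reuse such class libraries make possible, the following Java sketch builds a molecule from a SMILES string and reports its molecular formula using CDK[131]. This is only an indicative example: the class and method names follow CDK's published API, though exact signatures may differ between CDK releases.

    import org.openscience.cdk.DefaultChemObjectBuilder;
    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.smiles.SmilesParser;
    import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator;

    public class FormulaDemo {
        public static void main(String[] args) throws Exception {
            // parse a SMILES string (here, caffeine) into a molecule object
            SmilesParser parser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
            IAtomContainer caffeine = parser.parseSmiles("CN1C=NC2=C1C(=O)N(C)C(=O)N2C");
            // derive and print the molecular formula (expected: C8H10N4O2)
            System.out.println(MolecularFormulaManipulator.getString(
                    MolecularFormulaManipulator.getMolecularFormula(caffeine)));
        }
    }

A handful of lines thus accomplishes what would otherwise require a custom file parser and valence bookkeeping, which is precisely the argument for building new applications on top of such libraries.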

1.4 Motivation and Scope of Present Work

The growth in computational power and storage driven by Moore's law[133] has enabled the scientific community to attempt solutions of difficult problems in their areas of research. For example, it has become possible to routinely apply ab initio theories to many challenging problems in chemistry, physics and biology. However, the formidable computational complexity of these methods is a major bottleneck in applying them to larger chemical or biological systems. In consequence, even with huge computational resources, practical applications of conventionally coded ab initio methods are feasible only for systems containing fewer than 100 atoms at a sufficiently reliable level of theory and basis set. The main objective of this thesis is to develop reliable linear scaling algorithms and computer codes so as to extend the applicability of these accurate methods to larger systems including, but not restricted to, protein fragments and molecular clusters.

The second Chapter will discuss the development of the Cardinality Guided Molecular Tailoring Approach (CG-MTA) for ab initio geometry optimization and one-electron property calculation of large molecular systems. Ab initio quality one-electron properties like the Molecular Electron Density (MED) and the Molecular Electrostatic Potential (MESP) have been investigated extensively to gain insights into the structure and reactivity of molecules. Computation of these properties requires one to obtain the first order density matrix (DM) of the concerned system. Geometry optimization, on the other hand, is a more involved and expensive process, the objective being to systematically obtain a local energy minimum structure. Gradient based optimization algorithms commonly used in such codes require the energy and its partial derivatives for updating the values of the geometric variables in the optimization scheme. Since finding a local energy minimum structure is an iterative task, many such steps are usually required starting from an initial guess configuration of coordinates. As compared to one-electron property evaluation, where it is enough to obtain a good quality first order DM, geometry optimization requires an accurate evaluation of the energy and its derivatives.
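The essence of this loop can be conveyed by the minimal steepest-descent sketch below, written in Java. The PotentialSurface interface is a hypothetical stand-in for a full SCF energy and gradient evaluation, and the fixed-step downhill update is for illustration only; production optimizers employ more sophisticated quasi-Newton updates.

    // Skeleton of gradient-based geometry optimization.
    // PotentialSurface is a hypothetical stand-in for an SCF energy/gradient code.
    interface PotentialSurface {
        double energy(double[] q);     // total energy at the given geometry
        double[] gradient(double[] q); // partial derivatives dE/dq_i
    }

    public class SteepestDescent {
        static double[] optimize(PotentialSurface pes, double[] q,
                                 double step, double gradTol, int maxIter) {
            for (int iter = 0; iter < maxIter; iter++) {
                double[] g = pes.gradient(q);
                double gnorm = 0.0;
                for (double gi : g) gnorm += gi * gi;
                if (Math.sqrt(gnorm) < gradTol) break; // stationary point reached
                for (int i = 0; i < q.length; i++)
                    q[i] -= step * g[i];               // move downhill
            }
            return q;
        }

        public static void main(String[] args) {
            // toy one-dimensional quadratic surface with its minimum at q = 1
            PotentialSurface toy = new PotentialSurface() {
                public double energy(double[] q) { return (q[0] - 1) * (q[0] - 1); }
                public double[] gradient(double[] q) { return new double[]{2 * (q[0] - 1)}; }
            };
            double[] qmin = optimize(toy, new double[]{5.0}, 0.1, 1e-8, 1000);
            System.out.println("optimized q = " + qmin[0]); // approaches 1.0
        }
    }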

Calculation of the energy and DM for a fixed geometry requires one to follow the Self Consistent Field (SCF) procedure; evaluation of the derivatives of the energy is, however, sometimes an even more expensive process for medium to large sized systems. CG-MTA addresses both these issues in an effective manner by completely bypassing the calculation on the entire system and replacing it by calculations on smaller, overlapping sub-fragments of the main molecule. Implementation details and the parallelization strategies used in the code will be thoroughly discussed in this Chapter. WebProp, a web based interface for computing one-electron properties of medium to large molecules, will also be elaborated. This interface uses the CG-MTA code at the backend to obtain the density matrix needed for computing one-electron properties of large molecules, whose calculation is otherwise an extremely expensive task.

Chapter 3 will contain in-depth details of the testing and benchmarking of the CG-MTA code on a few chemical and biological systems. The code is tested and benchmarked to evaluate the automated fragmentation routine and its ability to handle a variety of molecules, including molecular clusters. The accuracy of the energy and its derivatives at various levels of theory, as compared to the corresponding complete calculation, will be reported. A comparison of the CPU time, memory and disk requirements of the CG-MTA code with those of a standard ab initio code will also be reported. A few test cases of complete geometry optimization using CG-MTA, along with comparative actual runs, will be presented to gauge the accuracy of CG-MTA. The scalability of the code and its current restrictions will also be discussed in detail. Porting of this code to a few available high performance computing platforms, and the issues encountered therein, will be elaborated.

A few applications of the code developed will be illustrated in Chapter 4. Applications to geometry optimization and property evaluation of medium and moderately large molecules and clusters will be demonstrated. Geometry optimization and property evaluation using CG-MTA for molecules such as cholesterol, α-tocopherol, γ-cyclodextrin, α-helical glycine, taxol and an albumin binding protein will be documented. Other areas where the MTA code written by the author has been used by others for deriving more information about the chemical nature of clusters and H-bonds will be glanced through. These include an application to Many Body Analysis of Clusters and a brief note on how the tailoring idea of CG-MTA is used in estimating H-bond energies.

The final Chapter will bring out the development of a programmable integrated environment (called MeTA Studio) specifically tailored (but not restricted) to computational chemists working in the area of quantum chemistry, with an emphasis on handling large molecules. MeTA Studio, apart from being a general tool for the computational chemist, is an indispensable part of the CG-MTA implementation, assisting the user in visualizing molecules and scalar fields and in manual fragmentation. The MeTA Studio viewer introduces some innovative yet simple techniques for viewing large molecules, including a simple Find language and a multi-camera view of the molecule being displayed. The viewer also incorporates a feature for manual fragmentation of molecules and provides immediate feedback on the goodness of the fragmentation scheme.
MeTA Studio also incorporates basic tools needed for collaborative computing, effectively opening up a new way of sharing ideas and information among computational chemists working on a common large problem. It also provides a powerful programming environment and a rich set of APIs that can be used to easily extend its functionality or to build new applications as needed by the users.
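The extension mechanism can be conveyed by a small, purely hypothetical Java sketch of the plug-in pattern that such an environment may expose: user code implements a tiny interface that the environment registers and invokes. None of the names below (StudioPlugin, PluginRegistry) are taken from the actual MeTA Studio API, which is detailed in the final Chapter.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical plug-in contract: one method run against the current molecule file.
    interface StudioPlugin {
        String name();
        void run(String moleculeFile);
    }

    // Hypothetical registry: the environment collects user plug-ins and runs them.
    class PluginRegistry {
        private final List<StudioPlugin> plugins = new ArrayList<>();
        void register(StudioPlugin p) { plugins.add(p); }
        void runAll(String moleculeFile) {
            for (StudioPlugin p : plugins) {
                System.out.println("running plug-in: " + p.name());
                p.run(moleculeFile);
            }
        }
    }

    public class PluginDemo {
        public static void main(String[] args) {
            PluginRegistry registry = new PluginRegistry();
            // a user-supplied extension, written without touching the host IDE's code
            registry.register(new StudioPlugin() {
                public String name() { return "atom-counter"; }
                public void run(String f) { System.out.println("would count atoms in " + f); }
            });
            registry.runAll("taxol.xyz");
        }
    }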

References

[1] D. Pham et al., The design and implementation of a first-generation Cell processor, in Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International, pages 184-592, IEEE Computer Society, 2005.

[2] H. P. Hofstee, Power efficient processor architecture and the cell processor, in HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 258-262, IEEE Computer Society, 2005.

[3] S. Williams et al., The potential of the cell processor for scientific computing, in CF '06: Proceedings of the 3rd conference on Computing frontiers, pages 9-20, New York, 2006, ACM Press.

[4] C. McNairy and R. Bhatia, IEEE Micro 25, 10 (2005).

[5] AMD, Advanced Micro Devices, http://www.amd.com (2007).

[6] P. Kongetira, K. Aingaran, and K. Olukotun, IEEE Micro 25, 21 (2005).

[7] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to parallel computing: design and analysis of algorithms, Benjamin-Cummings, Redwood City, 1994.

[8] M. Quinn, Parallel computing: theory and practice, McGraw-Hill, New York, 1994.

[9] PS3, Sony Playstation, http://www.playstation.com (2007).

[10] J. Andrews and N. Baker, IEEE Micro 26, 25 (2006).

[11] C. Tilstone, Lancet Oncology 8, 201 (2007).

[12] V. Pande, Folding@Home, http://folding.stanford.edu/, 2007.

[13] J. Krüger and R. Westermann, Linear algebra operators for GPU implementation of numerical algorithms, in SIGGRAPH '05: ACM SIGGRAPH 2005 Courses, page 234, New York, 2005, ACM Press.

[14] mobihf, Quantum Chemistry on Mobile based on PyQuante, http://tovganesh.googlepages.com/s60 (2006).

[15] Nature, Future Computing, http://www.nature.com/news/infocus/futurecomputing.html, 2006.

[16] R. P. Feynman, A. J. Hey, and R. W. Allen, Feynman Lectures on Computation, Perseus Books, Cambridge, USA, 2000.

[17] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2000.

[18] P. W. Shor, SIAM Journal on Computing 26, 1484 (1997).

[19] W. L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc. 110, 1666 (1988).

[20] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys. 79, 926 (1983).

[21] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart, J. Am. Chem. Soc. 107, 3902 (1985).

[22] J. J. P. Stewart, J. Comput. Chem. 10, 209 (1989).

[23] J. J. P. Stewart, J. Comput. Chem. 10, 221 (1989).

[24] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry, McGraw-Hill, New York, 1989.

[25] W. J. Hehre, L. Radom, and P. V. R. Schleyer, Ab Initio Molecular Orbital Theory, John Wiley, New York, 1986.

[26] D. B. Cook, Handbook of Computational Chemistry, Oxford University Press, Oxford, 1998.

[27] M. Born and J. R. Oppenheimer, Ann. Phys. 84, 457 (1927).

[28] D. R. Hartree, The Calculation of Atomic Structure, Wiley, New York, 1957.

[29] V. Fock, Z. Phys. 61, 126 (1930).

[30] C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).

[31] G. G. Hall, Proc. Roy. Soc. (London) A205, 541 (1951).

[32] C. Møller and M. S. Plesset, Phys. Rev. 46, 618 (1934).

[33] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).

[34] W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).

[35] C. Lee, W. Yang, and R. G. Parr, Phys. Rev. B 37, 785 (1988).

[36] R. G. Parr and W. Yang, Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.

[37] R. F. W. Bader, Atoms in Molecules: A Quantum Theory, Oxford University Press, Oxford, 1990.

[38] S. R. Gadre and R. N. Shirsat, Electrostatics of Atoms and Molecules, Universities Press, Hyderabad, 2000.

[39] J. Almlof, K. Faegri, and K. Korsell, J. Comput. Chem. 3, 385 (1982).

[40] M. Haser and R. Ahlrichs, J. Comput. Chem. 10, 104 (1989).

[41] M. E. Colvin, C. L. Janssen, R. A. Whiteside, and C. H. Tong, Theor. Chim. Acta 84, 301 (1992).

[42] S. Brode et al., J. Comput. Chem. 14, 1142 (1993).

[43] M. Haser, R. Ahlrichs, H. P. Baron, P. Weis, and H. Horn, Theor. Chim. Acta 83, 455 (1992).

[44] A. R. Leach, Molecular Modelling: Principles and Applications, Addison Wesley Longman, England, 1996.

[45] P. Pulay and G. Fogarasi, J. Chem. Phys. 96, 2856 (1992).

[46] F. Eckert, P. Pulay, and H. J. Werner, J. Comput. Chem. 18, 1473 (1997).

[47] B. Paizs, G. Fogarasi, and P. Pulay, J. Chem. Phys. 109, 6571 (1998).

[48] S. M. Colwell, A. R. Marshall, R. D. Amos, and N. C. Handy, Chem. Br. 21, 665 (1985).

[49] R. Ahlrichs, M. Bar, M. Haser, H. Horn, and C. Kolmel, Chem. Phys. Lett. 162, 165 (1989).

[50] M. J. Frisch et al., Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford, CT, 2004.

[51] M. W. Schmidt et al., J. Comput. Chem. 14, 1347 (1993).

[52] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing (2nd ed.), Cambridge University Press, Cambridge, 1992.

[53] J. A. Nelder and R. Mead, Comput. J. 7, 308 (1965).

[54] G. Amdahl, Validity of the single processor approach to achieving large-scale computing capabilities, in AFIPS Conference Proceedings, pages 483-485, 1967.

[55] D. Parkinson, Parallel Computing, 261-262 (1986).

[56] D. W. Walker, Parallel Computing 20(4), 657 (1994).

[57] MPICH, MPICH: A multi-platform implementation of MPI, http://www.mcs.anl.gov/mpi/, 2007.

[58] OpenMP, OpenMP: Simple, Portable, Scalable SMP Programming, http://www.openmp.org/, 2007.

[59] B. Nichols, D. Buttlar, and J. P. Farrell, Pthreads Programming: A POSIX Standard for Better Multiprocessing, O'Reilly, 1996.

[60] W. R. Stevens, UNIX Network Programming, Volumes 1 and 2, Prentice Hall, 1998.

[61] G. L. Steele, Fortress: a new programming language designed for high-performance computing, http://fortress.sunsource.net/, 2007.

[62] IBM, X10, http://x10.sourceforge.net, 2007.

[63] Cray, Chapel, http://chapel.cs.washington.edu/, 2007.

[64] IBM, Blue Gene/L, http://research.ibm.com/bluegene/, 2007.

[65] NEC, The Earth Simulator Center: Japan Agency for Marine-Earth Science and Technology, http://www.es.jamstec.go.jp/index.en.html, 2007.

[66] SGI, Altix Family of Clusters and Supercomputers, http://www.sgi.com/products/servers/altix/, 2007.

[67] Cray Inc., The Supercomputer Company, http://www.cray.com/, 2007.

[68] top500.org, Top 500 Supercomputers, http://www.top500.org/, 2007.

[69] PARAM and APAC, The IBM p690 series based PARAM supercomputing facility provided by C-DAC and the SGI Altix based APAC systems at the Australian National University.

[70] Grid Computing, See for example Tera Grid in the USA, EUGrid of the European Union, GARUDA grid in India (being created by C-DAC), and the EU-India grid connecting the EUGrid and the GARUDA grid.

[71] S. R. Gadre, S. A. Kulkarni, A. C. Limaye, and R. N. Shirsat, Z. Phys. D-Atoms, Molec. and Clusters 18, 357 (1991).

[72] S. R. Gadre, A. C. Limaye, and R. N. Shirsat, J. Comput. Chem. 14, 445 (1993).

[73] M. S. Gordon, The GAMESS package, http://www.msg.ameslab.gov/GAMESS/GAMESS.html, 2003.

[74] ADF, Amsterdam Density Functional package, http://www.scm.com/, 2007.

[75] Schrodinger Inc., Jaguar: Rapid ab initio electronic structure package, http://www.schrodinger.com/, 2007.

[76] P. Pulay, S. Saebo, and K. Wolinski, Chem. Phys. Lett. 344, 543 (2001).

[77] A. C. Limaye, J. Comput. Chem. 18, 552 (1998).

[78] G. D. Fletcher, A. P. Rendell, and P. Sherwood, Mol. Phys. 91, 431 (1997).

[79] U. Wedig, A. Burkhardt, and H. G. Schnering, Z. Phys. D Atoms, Molec. Clusters 13, 377 (1989).

[80] R. J. Harrison and R. A. Kendall, Theor. Chim. Acta (Berl.) 79, 337 (1991).

[81] MPQC, The Massively Parallel Quantum Chemistry Program, http://www.mpqc.org, 2007.

[82] E. Apra et al., NWChem, A computational chemistry package for parallel computers, version 4.7, Pacific Northwest National Laboratory, Richland, Washington, USA, 2005.

[83] F. Weigend and M. Haser, Theor. Chem. Acc. 97, 331 (1998).

[84] K. Ishimura, P. Pulay, and S. Nagase, J. Comput. Chem. 27, 407 (2006).

[85] K. Ishimura, P. Pulay, and S. Nagase, J. Comput. Chem. 28, 2034 (2007).

[86] V. Bush and S. H. Caldwell, Phys. Rev. 38, 1898 (1931).

[87] K. Yasuda, J. Comput. Chem., ASAP article (2007).

[88] T. Ramdas, G. Egan, D. Abramson, and K. Baldridge, Theor. Chem. Acc., DOI: 10.1007/s00214 (2007).

[89] D. Spangler and R. E. Christoffersen, Adv. Quant. Chem. 6, 333 (1972).

[90] Q. Zhao and W. Yang, J. Chem. Phys. 102, 9598 (1995).

[91] W. Yang, Phys. Rev. Lett. 66, 1432 (1991).

[92] W. Yang and T. S. Lee, J. Chem. Phys. 103, 5674 (1995).

[93] X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).

[94] S. R. Gadre, R. N. Shirsat, and A. C. Limaye, J. Phys. Chem. 98, 9165 (1994).

[95] K. Babu and S. R. Gadre, J. Comput. Chem. 24, 484 (2003).

[96] T. E. Exner and P. G. Mezey, J. Phys. Chem. A 108, 4301 (2004).

[97] W. Li and S. Li, J. Chem. Phys. 122, 194109 (2005).

[98] S. Li, W. Li, and T. Fang, J. Am. Chem. Soc. 127, 7215 (2005).

[99] T. Nakano et al., Chem. Phys. Lett. 318, 614 (2000).

[100] K. Kitaura, S. Sugiki, T. Nakano, Y. Akiyama, and M. Uebayasi, Chem. Phys. Lett. 336, 163 (2001).

[101] V. Deev and M. A. Collins, J. Chem. Phys. 122, 154102 (2005).

[102] X. Chen, Y. Zhang, and J. Z. H. Zhang, J. Chem. Phys. 122, 184 (2005).

[103] C. V. Alsenoy, Y. Ching-Hsing, A. Peters, J. M. L. Martin, and L. Schafer, J. Phys. Chem. A 102, 2246 (1998).

[104] T. Ikegami et al., Proc. of Supercomputing, IEEE Computer Society 12, 10 (2005).

[105] D. G. Fedorov and K. Kitaura, J. Chem. Phys. 121, 2483 (2004).

[106] K. Fukuzawa et al., J. Comput. Chem. 26, 1 (2005).

[107] D. G. Fedorov, R. M. Olson, K. Kitaura, M. S. Gordon, and S. Koseki, J. Comput. Chem. 25, 872 (2004).

[108] Y. Komeiji et al., Chem. Phys. Lett. 372, 342 (2003).

[109] M. Kobayashi, T. Akama, and H. Nakai, J. Chem. Phys. 125, 204106 (2006).

[110] M. Kobayashi, Y. Imamura, and H. Nakai, J. Chem. Phys. 127, 074103 (2007).

[111] D. G. Fedorov, T. Ishida, M. Uebayasi, and K. Kitaura, J. Phys. Chem. A, DOI: 10.1021/jp0671042 (2007).

[112] T. Vreven, K. Morokuma, O. Farkas, H. B. Schlegel, and M. J. Frisch, J. Comput. Chem. 24, 760 (2003).

[113] K. S. Byun, K. Morokuma, and M. J. Frisch, J. Mol. Struct. (Theochem) 462, 1 (1999).

[114] R. Wieczorek and J. J. Dannenberg, J. Am. Chem. Soc. 126, 14198 (2004).

[115] K. Nemeth and M. Challacombe, J. Chem. Phys. 121, 2877 (2004).

[116] RasMol, Molecular Visualization Freeware, http://www.umass.edu/microbio/rasmol/, 2007.

[117] B. M. Bode and M. S. Gordon, J. Mol. Graphics Mod. 16, 133 (1998).

[118] A. C. Limaye and S. R. Gadre, Curr. Sci. (India) 80, 1296 (2001).

[119] W. L. DeLano, The PyMOL Molecular Graphics System (2002), http://www.pymol.org.

[120] Jmol, A Java based opensource molecular visualization tool, http://www.jmol.org, 2007.

[121] OpenGL, The Industry Standard for High Performance Graphics, http://www.opengl.org/, 2007.

[122] IBM, OpenDX: Open Visualization Data Explorer, http://www.research.ibm.com/dx/, 2002.

[123] A. C. Limaye, Drishti - Volume Exploration and Presentation Tool, Poster presentation, Vis 2006, Baltimore, 2006. Also see URL: http://sf.anu.edu.au/Vizlab/drishti/.

[124] P. J. MacDougall and C. E. Henze, Fleshing-out pharmacophores with volume rendering of the Laplacian of the charge density and hyperwall visualization technology, in The Quantum Theory of Atoms in Molecules, pages 499-514, Wiley-VCH, 2007.

[125] G. Kumfert et al., J. Phys. (Conference Series) 46, 479 (2006).

[126] M. F. Sanner, Structure 13, 447 (2005).

[127] Python, The Python Programming language, http://www.python.org, 2007.

[128] F. G. Kalatzis, D. G. Papageorgiou, and I. N. Demetropoulos, Comp. Phys. Comm. 175, 359 (2006).

[129] J. W. Chastine et al., AMMP-Vis: A collaborative virtual environment for molecular modeling, in VRST '05, November 7, 2005, Monterey, California, USA, pages 8-15, 2005.

[130] JOELib, A Cheminformatics algorithm library, which was designed for prototyping, data mining, graph mining and algorithm development, http://joelib.sourceforge.net/, 2007.

[131] CDK, Chemistry Development Kit, A Java library for structural chemo- and bioinformatics, http://cdk.sourceforge.net/, 2007.

[132] Mathematica, Wolfram Research, http://www.wolfram.com/, 2007.

[133] G. E. Moore, Electronics 8, 38 (1965).