APPLICATIONS OF COMPUTER ALGEBRA IN PHYSICAL CHEMISTRY

J. F. OGILVIE Research School of Chemistry, Australian National University, P.O. Box 4, Canberra, A.C.T. 2600, Australia

(Received 25 February 1982; in revised form 23 March 1982)

Abstract-After a brief discussion of the nature of computer algebra, the facilities of some algebraic or symbolic processors are outlined, followed by some instances of applications of important features td problems in physical chemistry.

1. INTRODUCTION 2. SURVEY OF PROCESSORS Among the variety of methods that modern chemists In fact the first application of symbolic computation apply in their work, numerical computing is well (Kahrimanian, 1953) dates from the earliest years of established. Through the use of such unstructured lan- modern electronic digital computers, at a time that pre- guages (compilers or interpreters) as BASIC or FOR- ceded active development of FORTRAN in U.S.A. and TRAN or such structured languages as ALGOL or PL-1, ALGOL in Europe. However the really concerted or Iverson’s array language APL, one can readily make development of algebraic processors started about 1960. numerical calculations to perform such diverse tasks as FORMAC, a formula manipulation compiler, is a super- to compute the electronic density distribution of a mole- set of PL-1: because its usage is closely similar to that of cule according to a quantum theory, to find the ther- numerical PL-I, it is easy to use by programmers with modynamic properties of a liquid substance by simula- knowledge of this language. Furthermore this for- tion of the motions of the molecular particles, by a mulation has potentially enormous power due to the Monte-Carlo sampling technique, or to find the velocity combination of algebraic and numerical functions. Un- coefficient of a chemical reaction by regression of some fortunately development of FORMAC was terminated by function of reactant concentrations with time. In prin- IBM before a completely satisfactory processor had ciple, all such computations (i.e. the execution of the been achieved; some improvements and enhancements programs) can be performed by a (patient) human being were however made during further development in on an abacus. In the history of the development of Eurooe that resulted in FORMAC-73 (Bahr. 1975). human numeracy, there was long ago discovered the Nevertheless both versions are useful. F’ORMAC has capacity to think in a purely symbolic manner. been distributed in the form of macros that restricted its Many of the branches of mathematics, such as algebra, implementation to IBM (or compatible) computers. differential and integral calculus, group theory, etc. are LISP, a list processor, was designed not only as a founded on such symbolic representations, although of language for instructing computers but also as a formal course purely arithmetic operations are an essential mathematical language primarily for symbolic data component of any such practice. Perhaps many chem- processing (Berkeley & Dobrow, 1964). In fact the first ists, even those familiar with numerical computations, application of computer algebra to physical chemistry may be unaware of both the existence of symbolic was in LISP, where wavefunctions were formulated both processors (or languages) to conduct mechanically these algebraically and numerically in order to check the mathematics (as opposed to primarily arithmetic) and the results (Chandler et al., 1968). This language, developed potentially powerful applications of such processors in at Massachusetts Institute of Technology, is charac- their scientific work. In physics, already extensive re- terised by a proliferation of parentheses in its typical views (Barton & Fitch, 1972; Campbell, 1974; anony- expressions; thus the formal construction of such mous, 1981; Pavelle et al., 1981) have been published expressions became an impediment to casual use by with much discussion of applications, and r&ently a non-professional programmers. But the importance of book (Howard, 1980) with FORMAC and LISP has far transcended its visibIe applications, at least programs for aeronautical applications, paiticle for physical scientists, because it has been used as a dynamics, fluid mechanics, cosmological models and tra- preprocessor or interpreter of programs through its abil- jectory calculations has appeared. ity to generate programs for further execution. In this article we outline some exemplary uses of One processor based on LISP was MATHLAB(68), computer algebra in physical chemistry after describing also developed at M.I.T. (Engleman, 1969). This inter- briefly some processors and their facilities for such ap- active processor enabled such common procedures as plications. So utilised, the computer can be regarded as simplification, substitution, differentiation, polynomial having not only the numerical capabilities of traditional factorisation, indefinite integration {of a restricted class scientific computing, but also a word-processing facility of functions), direct and inverse Laplace transforms, and knowledge of rules of higher mathematics that per- solution of linear differential equations with constant mit it still to execute purely numerically while simulating coefficients, solution of simultaneous linear equations, an algebraic or symbolic capacity. and a collection of matrix functions for the introduction

169 170 J. F. OCILV[E or modification of matrices, extraction of sub-matrices or istry. Perhaps one should first comment that these pro- elements, matrix arithmetic (addition, multiplication or cessors are likely to be useful for any symbolic com- inversion), and extraction of significant properties (rank, outation of which the method is well understood but for determinant, characteristic polynomial and trace). khich the operations are tedious to perform. In the Development of MATHLAB has been superseded by following paragraphs, we specify some facilities that any that of MACSYMA, a much more advanced and power- generally useful algebraic processor must provide and ful processor, only recently released for use outside give instances of how these have been used. Some M.I.T. typical times for the central processor unit, of A product of the University of Wisconsin (Madison) moderately large computers common during the mid was SAC-l, for symbolic and algebraic calculations. This 1970’s, are given as an indication of machine require- system consists of several FORTRAN subroutines and ments, but of course these will vary depending on the functions that can be called in a user’s program to particular processor and the machine on which it is facilitate these types of calculations. All the subroutines implemented. and functions form thirteen different sets of programs, called subsystems, each of which is designed for the (1) AlI types of manipulation of polynomials or multi- manipulation of a special type of mathematical object. nomials should be possible, although division may be The philosophy of the producers of SAC-l was to regard dificulf the mathem&al objects as lists, so that in principle the Consider the Dunham (1932) potential-energy function nromams containing SAC-l extend FORTRAN to V(x) in terms of the reduced internuclear separation x: become a special&d list processing language. The availability of SAC-2 has recently been announced. A language and system for performing symbolic com- V(x) = aoxZ (I + gj &]. putations on algebraic data originating in Bell Labora- tories, U.S.A., ALTRAN, algebra translator, has the basic capability to perform rational operations on Sandeman (1940) has given expressions for energy rational expressions in one or more indeterminates (al- coefficients Ykr in terms of ci defined in terms of the gebraic quantities or variables), with integer coefficients, inverse relationship: and is designed lo handle very large problems involving such data with considerable efficiency. The syntax of the language is similar to that of FORTRAN. An attractive feature of the language is that it has been itself prepared in a very transportable form of FORTRAN, although some assembly language primitives for several popular In order to use Sandernan’s formulae in the usual form computer systems are included on the system tape. Thus Yk,(a,), it is necessary to revert the first series to obtain it is readily installed on almost any computer having both ci as a function of u,. This task is easily done in AL- a FORTRAN compiler and sufficient core memory. TRAN, for instance, and the expressions for crclO, to Another popular processor for computer algebra has check and extend Wooltey’s (1962) results, require only a been REDUCE (Hearn, 1973). This programming system, few seconds of CPU time. built on LISP, is capable of expansion and ordering of polynomials and rational functions, symbolic differen- (2) Diflerentiation should be available tiation, substitution and pattern matching in a wide If we wish to determine the discrete energy states of variety of forms, calculation of greatest common divisor Ar2 from the potential-energy function of Barker et al. of two polynomials, automatic and user-controlled sim- (1971) from thermodynamic measurements, for instance, plication of expressions, matrix and tensor operations, of the form and spin-; and spin-l algebra for calculations in high- energy physics. REDUCE is still in active development and has recently incorporated a general integration V(X) = E eC”” 5: ,+I’ - 2 f&+,/(6 + (1 + X)2i+6 capability. ,=” 1=0 Although really satisfactory languages for computer algebra reauire both a word length of at least 32 bits and we might solve the Schrijdinger equation numerically, a machine-size of at least 32006 words of core memory, but the accuracy may be inadequate. Alternatively we recentlv two uackaees. MUMATH and PICOMATH. for could differentiate numerically this V(X) function at x = computer algebra have become available for popular 0 and equate the derivatives to the Dunham potential- microcomputers (Stoutemyer, 1980). Despite not being energy coefficients 4 as: capable of extensive computations, these programs demonstrate the nature of computer algebra, and are a0anm2n!, for n z==2. appropriate to the context of high-school mathematics. ($$-y = There are of course other algebraic processors desig- ned for more specialised applications, such as Numerical accuracy is almost certainly a problem if one SCHOONSCHIP (C.E.R.N., Geneva) for quantum elec- wanted to determine the twelfth derivative in order to trodynamics, CAMAL (Cambridge University, U.K.) for obtain al0. Perhaps the most precise approach is to general relativity, TRIGMAN for celestial mechanics conduct the twelve differentiations analytically, with etc., but these are unlikely to be generally useful for “infinite” precision, and then to substitute the values of chemical applications. the known Ai and Ci to obtain the desired numbers. The latter orocedure reauires about a minute of CPU time in 3. APPLICATIONS IN PHYSICALCHEMISTRY FORMAC, for instance, to derive ai up to alo, whereas Having outlined some available algebraic processors, manually even the first eight derivatives constitute a we can now discuss some applications in physical chem- day’s work. Applications of computer algebra in physical chemistry 171

Another example can be found in determination of standard errors of parameters from the standard devia- tions of observable quantities. For instance if we wish to estimate the significance (Ogilvie & Koo, I976) of some set of potential-energy coefficients, 0, f a,“, resulting where the elements of R involve the angular parameters from the experimental error of some set of Ykr2 CT&, as follows: then the procedure for error propagation (Clifford, 1973) requires the knowledge of values of the derivatives, R,,=l; R,~=cosY; R,3=~~~P; d Y&,/da,. Such analytic expressions for the derivatives R,, = R3, = R,, = 0; can be readily generated, in REDUCE for instance, in a few minutes of CPU time; in this case furthermore, the R,, = sin y; R2, = (cos a - cos f3 cos y)lsin y; output can be expressed as statements directly accept- R,, = k”*lsin y, able for inclusion in FORTRAN programs in which the actual numerical computations are better carried out. where (3) Substitution Some of the most important operations in computer algebra are substitutions: these can be local or global, they can occur in general circumstances, or an exact and the unit cell volume pattern-matching criterion might be applied. V = k”‘abc. As an instance of this type of operation, we can consider the generation of the energy coefficients Ykr In REDUCE, the statements from the rotationless Yku. The latter quantities can be determined from solution of the Schriidinger equation RZN:= R ** (- I) or RZN:= l/R for a vibrational potential-energy function, that of Dun- ham (1932) expressed as produce the results:

R;,‘= I; R;: = -cos V(x) = B,iyq1 + ,g, u,xj y/sin y; R,~=(~~~a~~~-,-sin~ycos~ - cos p cos* r)l(k”’ sin y); where y = Z&Jo, The Y&, can be obtained from the corresponding Yko by making Z-dependent all the Ryi = l/sin y; R;i = (-cos a +cos p cos y)/k”‘sin 7); parameters of the V(x) function, such that y+ y’, x+x’ R;;’ = sin y/k”‘; R;,’ = R;,’ = R_$ = 0. and a, + ui’ (where the superscript J denotes a depen- dence upon the quantum number J of rotational angular Because the transformation matrix is inverted symbolic- momentum, not an exponent). For instance, ally there can be no problems of numerical ill-condition- ing to affect adversely the precision of the results. J = a,+ y2[J(I+ l)J{4(a,+ I)-3a,(a,+ 1)) al (5) Znfegrntion + y’[J(J + I)]‘{. . . I Perhaps many chemists labour under the mistaken impression that there is no general algorithm for Substitution of these J-dependent parameters into the indefinite integration. In fact s&h an algorithm was YkO has been used to determine all the 84 expressions discovered by Risch (1969) and imolemented in relation (containing up to alo) of Y,,, up to [_Z(Z+ l)]“. The to the CAMAL system by Norman (Norman & Moore, expression for Y1.10 for instance contains 139 terms, 1977). Now this scheme has been incorporated into the such as 87091200000n,“a,a,. In a few minutes of CPU REDUCE system, to enable use of expressions compris- time with REDUCE, it was possible to double the num- ing polynomials, logarithmic functions, exponential ber of expressions found since Dunham’s work (1932). A functions, tan and arctan functions, and also to attempt smaller collection of Y,, required nearer an hour of CPU to integrate expressions involving error functions, dilo- time in a numerical computation according to pertur- garithms and other trigonometric expressions. For in- bation theory (Bouanich, 1978), and derivation of Yo.,, stance the expression INT(LOG(X), x) will return the would probably require a year of manual effort. result X * (LOG(X) - 1). Of course the result can always be checked by differentiation. Without this expiicit in- (4) General matrix operalions tegration facility, one can stilt inchrde the definition of Although some operations on matrices are trivial to integration operators for specific classes of functions. program, such as addition or multiplication, others like Thus integration stages can be incorporated into com- inversion or evaluation of the determinant arc less puter algebra programs just as easily and naturally as simple. differentiation. As an example from crystal chemistry, consider the transformation from reduced coordinates (in terms of lattice parameters a, b, c, a, p, y) to orthogonal Cartesian 4. CONCLUSIONS coordinates (x, y, z) according to a laboratory frame- Computer algebra makes possible a new approach to work. While it is convenient to perform symmetry machine solution of problems in physical chemistry. operations upon the lattice points expressed in reduced Because calculations are exact (of “infinite” precision), coordinates, all calculations of internuclear distances with no approximations (in principle), and because of the rely upon the Cartesian coordinates. Therefore we direct algebraic form of the results, thus conveying more require the following transformation: analytical or physical meaning than a mere table of

CAC Vol. 6. No. 4-C 172 J. F. OGILVIE

numbers, this approach complements traditional Barker, J. A., Fisher, R. A. & Watts, R. 0. (1971). Mol. Phys. 21, numerical computing. 657. A distinction as to mode of use of numerical and Barton, D. & Fitch, J. P. (1972), Rep. Prog. Phys. 35, 235. symbolic computing should be stated. Typically one Berkeley, E. C. & Dobrow, D. G. (1966), The Programming develops an extensive numerical program with the in- L.anguage LISP, Cambridge, U.S.A., M.I.T. Press. Bouanich. J. P. (1978), .I. Quant. Specirosc. Radiat. Trans. 19, tention of making many production runs with different 381. data sets in order to generate lesser or greater quantities Campbell. J. A. (1974). Acta Phys. Aw~K, ~uppl. XIII, 595. of numerical results of perhaps uncertain accuracy or Chandler, G. S., Thirnnamachaodran, T. & Campbell, J. A. validity. In contrast, a program of computer algebra is (1968), J. Chem. Phys. 49, 3640. typically run once successfully, to produce exact sym- Clifford, A. A. (1973), M&iuariate Error Analysis. London, bolic expressions probably checked easily (to some Applied Science Publishers. extent) by inspection, although the resulting expressions Dunham, 1. L. (1932), Phys. Rev. 41,721. might later be incorporated into numerical programs to Engleman. C. (1%9), IItformalion Processing 68 (Edited by produce several sets of meaningful numbers. Computer Morrell, A. J. H.), pp. 462-467, Amsterdam. North Holland. Hearn, A. C. (1973), Proc. Third Inl. Coiloq. on Advanced algebra is thus best conducted in an interactive fashion,

Comautine Methods in Theoretical Phvsics.I ,. o. AVl-19. Mar- as the user engages in transformations or substitutions se&s, C.N.R.S. on the basis of intermediate results in order to produce Howard, I. C. (1980), Practical Appiicntions of Symbolic Com- the final output in the most readable form. Although putation (Mathematical Modelling of Diverse Phenomena). algebraic computing may sometimes be relatively less Guildford. U.K., IPC Science and Technology Press. efficient than purely numerical computing if only the Kahrimanian, H. & (1953), Analytic Difieren&ion by (I Digital CPU time to yield a given numerical result is considered, Computer. M.A. Thesis, Temple University, Philadelphia, it is likely that a judicious combination of algebraic and U.S.A. numerical computing can produce more reliable and Norman, A. C. & Moore, P. M. A. (1977), Implementing the New Risch Intemation Ahoritbm. Fourfh Znr. Colloq. on Advaanced meaningful results for many problems of interest to Cornout& Methods% Theoretical Phvsics. M&seilles. physical chemists, accompanied by both greater physical Ogilvi~, J. Fy& Koo. D. (1976), I. Mol. ipecirosc. 61, 332. insight of the mathematical expressions and a significant Pavelle. R., Rothstein, M. & Fitch, J. (1981), Sci. Am. Z&(6). LO2. decrease of programming effort and computer time, than Risch. R. H. (1969). Trans. AMS 139. 167. can be achieved by numerical programming alone. Sandernan, I. (1940), Proc. Roy. Sot. Edinburgh 60, 210. Stoutemyer, D. R. (1980). SZGSAM Bull. A.C.M. 14(3), 5. REFERENCES Woolley, H. W. (1%2), J. Chem. Phys. 37, 1307. Anonymous (1981), Nature 290, 198. Bahr, K. (1975), SIGSAM Bull. ACM 9(l), 21.