
THEME ARTICLE

THE DECOMPOSITIONAL APPROACH TO MATRIX COMPUTATION

The introduction of matrix decompositions into numerical linear algebra revolutionized matrix computations. This article outlines the decompositional approach, comments on its history, and surveys the six most widely used decompositions.

G.W. Stewart
University of Maryland

In 1951, Paul S. Dwyer published Linear Computations, perhaps the first book devoted entirely to numerical linear algebra [1]. Digital computing was in its infancy, and Dwyer focused on computation with mechanical calculators. Nonetheless, the book was state of the art. Figure 1 reproduces a page of the book dealing with Gaussian elimination. In 1954, Alston S. Householder published Principles of Numerical Analysis [2], one of the first modern treatments of high-speed digital computation. Figure 2 reproduces a page from this book, also dealing with Gaussian elimination.

The contrast between these two pages is striking. The most obvious difference is that Dwyer used equations whereas Householder used partitioned matrices. But a deeper difference is that while Dwyer started from a system of equations, Householder worked with a (block) LU decomposition—the factorization of a matrix into the product of lower and upper triangular matrices.

Generally speaking, a decomposition is a factorization of a matrix into simpler factors. The underlying principle of the decompositional approach to matrix computation is that it is not the business of the matrix algorithmists to solve particular problems but to construct computational platforms from which a variety of problems can be solved. This approach, which was in full swing by the mid-1960s, has revolutionized matrix computation.

To illustrate the nature of the decompositional approach and its consequences, I begin with a discussion of the Cholesky decomposition and the solution of linear systems—essentially the decomposition that Gauss computed in his elimination algorithm.

Figure 1. This page from Linear Computations shows that Paul Dwyer's approach begins with a system of scalar equations. Courtesy of John Wiley & Sons.

Figure 2. On this page from Principles of Numerical Analysis, Alston Householder uses partitioned matrices and LU decomposition. Courtesy of McGraw-Hill.

This article also provides a tour of the five other major matrix decompositions, including the pivoted LU decomposition, the QR decomposition, the spectral decomposition, the Schur decomposition, and the singular value decomposition.

A disclaimer is in order. This article deals primarily with dense matrix computations. Although the decompositional approach has greatly influenced iterative and direct methods for sparse matrices, the ways in which it has affected them are different from what I describe here.

The Cholesky decomposition and linear systems

We can use the decompositional approach to solve the system

Ax = b   (1)

where A is positive definite. It is well known that A can be factored in the form

A = R^T R   (2)

where R is an upper triangular matrix. The factorization is called the Cholesky decomposition of A.

The factorization in Equation 2 can be used to solve linear systems as follows. If we write the system in the form R^T Rx = b and set y = R^{-T} b, then x is the solution of the triangular system Rx = y. However, by definition y is the solution of the triangular system R^T y = b. Consequently, we have reduced the problem to the solution of two triangular systems, as illustrated in the following algorithm:

1. Solve the system R^T y = b.
2. Solve the system Rx = y.   (3)

Because triangular systems are easy to solve, the introduction of the Cholesky decomposition has transformed our problem into one for which the solution can be readily computed.
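In today's terms, the two triangular solves of Equation 3 take only a few lines. The following is a minimal sketch in modern NumPy/SciPy, with a randomly generated positive definite test matrix assumed for the illustration:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# An assumed positive definite test matrix and right-hand side.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M + 5 * np.eye(5)            # positive definite by construction
b = rng.standard_normal(5)

R = cholesky(A)                        # upper triangular factor, A = R^T R

y = solve_triangular(R, b, trans='T')  # step 1: solve R^T y = b
x = solve_triangular(R, y)             # step 2: solve R x = y

print(np.allclose(A @ x, b))           # True, up to rounding error
```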

We can use such decompositions to solve more than one problem. For example, the following algorithm solves the system A^T x = b:

1. Solve the system Ry = b.
2. Solve the system R^T x = y.   (4)

Again, in many statistical applications we want to compute the quantity ρ = x^T A^{-1} x. Because

x^T A^{-1} x = x^T (R^T R)^{-1} x = (R^{-T} x)^T (R^{-T} x)   (5)

we can compute ρ as follows:

1. Solve the system R^T y = x.
2. ρ = y^T y.   (6)

The decompositional approach can also save computation. For example, the Cholesky decomposition requires O(n^3) operations to compute, whereas the solution of triangular systems requires only O(n^2) operations. Thus, if you recognize that a Cholesky decomposition is being used to solve a system at one point in a computation, you can reuse the decomposition to do the same thing later without having to recompute it (the code sketch at the end of this section makes this concrete). Historically, Gaussian elimination and its variants (including Cholesky's algorithm) have solved the system in Equation 1 by reducing it to an equivalent triangular system. This mixes the computation of the decomposition with the solution of the first triangular system in Equation 3, and it is not obvious how to reuse the elimination when a new right-hand side presents itself. A naive programmer is in danger of performing the reduction from the beginning, thus repeating the lion's share of the work. On the other hand, a program that knows a decomposition is in the background can reuse it as needed.

(By the way, the problem of recomputing decompositions has not gone away. Some matrix packages hide the fact that they repeatedly compute a decomposition by providing drivers to solve linear systems with a call to a single routine. If the program calls the routine again with the same matrix, it recomputes the decomposition—unnecessarily. Interpretive matrix systems such as Matlab and Mathematica have the same problem—they hide decompositions behind operators and function calls. Such are the consequences of not stressing the decompositional approach to the consumers of matrix algorithms.)

Another advantage of working with decompositions is unity. There are different ways of organizing the operations involved in solving linear systems by Gaussian elimination in general and Cholesky's algorithm in particular. Figure 3 illustrates some of these arrangements: a white area contains elements from the original matrix, a dark area contains the factors, a light gray area contains partially processed elements, and the boundary strips contain elements about to be processed. Most of these variants were originally presented in scalar form as new algorithms. Once you recognize that a decomposition is involved, it is easy to see the essential unity of the various algorithms.

Figure 3. These varieties of Gaussian elimination are all numerically equivalent.

All the variants in Figure 3 are numerically equivalent. This means that one rounding-error analysis serves all. For example, the Cholesky algorithm, in whatever guise, is backward stable: the computed factor R satisfies

(A + E) = R^T R   (7)

where E is of the size of the rounding unit relative to A. Establishing this backward stability is usually the most difficult part of an analysis of the use of a decomposition to solve a problem.
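A minimal sketch of this reuse in modern SciPy, with assumed random test data: cho_factor does the O(n^3) work once, each cho_solve costs only the O(n^2) pair of triangular solves, and the statistic ρ of Equations 5 and 6 comes from one more triangular solve.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

rng = np.random.default_rng(1)         # assumed test data
M = rng.standard_normal((6, 6))
A = M.T @ M + 6 * np.eye(6)

factor = cho_factor(A)                 # O(n^3), done once

for _ in range(3):                     # reuse for several right-hand sides
    b = rng.standard_normal(6)
    x = cho_solve(factor, b)           # O(n^2) per solve
    assert np.allclose(A @ x, b)

# Equations 5 and 6: rho = x^T A^{-1} x = y^T y, where R^T y = x.
x = rng.standard_normal(6)
R = np.triu(factor[0])                 # extract the upper triangular factor R
y = solve_triangular(R, x, trans='T')  # solve R^T y = x
rho = y @ y
assert np.isclose(rho, x @ np.linalg.solve(A, x))
```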

For example, once Equation 7 has been established, the rounding errors involved in the solutions of the triangular systems in Equation 3 can be incorporated in E with relative ease. Thus, another advantage of the decompositional approach is that it concentrates the most difficult aspects of rounding-error analysis in one place.

In general, if you change the elements of a positive definite matrix, you must recompute its Cholesky decomposition from scratch. However, if the change is structured, it may be possible to compute the new decomposition directly from the old—a process known as updating. For example, you can compute the Cholesky decomposition of A + xx^T from that of A in O(n^2) operations, an enormous savings over the ab initio computation of the decomposition.

Finally, the decompositional approach has greatly affected the development of software for matrix computation. Instead of wasting energy developing code for a variety of specific applications, the producers of a matrix package can concentrate on the decompositions themselves, perhaps with a few auxiliary routines to handle the most important applications. This approach has informed the major public-domain packages: the Handbook series [3], Eispack [4], Linpack [5], and Lapack [6]. A consequence of this emphasis on decompositions is that software developers have found that most algorithms have broad computational features in common—features that can be relegated to basic linear-algebra subprograms (such as the Blas), which can then be optimized for specific machines [7–9].

For easy reference, the sidebar "Benefits of the decompositional approach" summarizes the advantages of decomposition.

Benefits of the decompositional approach

• A matrix decomposition solves not one but many problems.
• A matrix decomposition, which is generally expensive to compute, can be reused to solve new problems involving the original matrix.
• The decompositional approach often shows that apparently different algorithms are actually computing the same object.
• The decompositional approach facilitates rounding-error analysis.
• Many matrix decompositions can be updated, sometimes with great savings in computation.
• By focusing on a few decompositions instead of a host of specific problems, software developers have been able to produce highly effective matrix packages.
History

All the widely used decompositions had made their appearance by 1909, when Schur introduced the decomposition that now bears his name. However, with the exception of the Schur decomposition, they were not cast in the language of matrices (in spite of the fact that matrices had been introduced in 1858 [10]). I provide some historical background for the individual decompositions later, but it is instructive here to consider how the originators proceeded in the absence of matrices.

Gauss, who worked with positive definite systems defined by the normal equations for least squares, described his elimination procedure as the reduction of a quadratic form φ(x) = x^T Ax (I am simplifying a little here). In terms of the Cholesky factorization A = R^T R, Gauss wrote φ(x) in the form

φ(x) = (r_1^T x)^2 + (r_2^T x)^2 + … + (r_n^T x)^2
     ≡ ρ_1(x)^2 + ρ_2(x)^2 + … + ρ_n(x)^2   (8)

where r_i^T is the ith row of R. Thus Gauss reduced φ(x) to a sum of squares of linear functions ρ_i (a short numerical check of this identity appears at the end of this section). Because R is upper triangular, the function ρ_i(x) depends only on the components x_i, …, x_n of x. Since the coefficients in the linear forms ρ_i are the elements of R, Gauss, by showing how to compute the ρ_i, effectively computed the Cholesky decomposition of A.

Other decompositions were introduced in other ways. For example, Jacobi introduced the LU decomposition as a decomposition of a bilinear form into a sum of products of linear functions having an appropriate triangularity with respect to the variables. The spectral decomposition made its appearance as an orthogonal change of variables that diagonalized a bilinear form. Eventually, all these decompositions found expressions as factorizations of matrices [11].

The process by which decomposition became so important to matrix computations was slow and incremental. Gauss certainly had the spirit. He used his decomposition to perform many tasks, such as computing variances, and even used it to update least-squares solutions. But Gauss never regarded his decomposition as a matrix factorization, and it would be anachronistic to consider him the father of the decompositional approach.
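Equation 8 is easy to verify numerically: since A = R^T R, the quadratic form x^T Ax equals the squared norm of Rx, and the components of Rx are exactly the linear forms ρ_i(x). A minimal sketch, assuming a random positive definite test matrix:

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(2)         # assumed test data
M = rng.standard_normal((4, 4))
A = M.T @ M + 4 * np.eye(4)            # positive definite
x = rng.standard_normal(4)

R = cholesky(A)                        # A = R^T R, R upper triangular
phi = x @ A @ x                        # the quadratic form phi(x)
rho = R @ x                            # rho_i(x) = r_i^T x, as in Equation 8
print(np.isclose(phi, np.sum(rho**2))) # True: phi(x) is a sum of squares
```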

In the 1940s, awareness grew that the usual algorithms for solving linear systems involved matrix factorization [12, 13]. John von Neumann and H.H. Goldstine, in their ground-breaking error analysis of the solution of linear systems, pointed out the division of labor between computing a factorization and solving the system [14]:

We may therefore interpret the elimination method as one which bases the inverting of an arbitrary matrix A on the combination of two tricks: First it decomposes A into the product of two semi-diagonal matrices C, B′ …, and consequently the inverse of A obtains immediately from those of C and B′. Second it forms their inverses by a simple, explicit, inductive process.

In the 1950s and early 1960s, Householder systematically explored the relation between various algorithms in matrix terms. His book The Theory of Matrices in Numerical Analysis is the mathematical epitome of the decompositional approach [15].

In 1954, Givens showed how to reduce a symmetric matrix A to tridiagonal form by orthogonal transformation [16]. The reduction was merely a way station to the computation of the eigenvalues of A, and at the time no one thought of it as a decomposition. However, it and other intermediate forms have proven useful in their own right and have become a staple of the decompositional approach.

In 1961, James Wilkinson gave the first backward rounding-error analysis of the solutions of linear systems [17]. Here, the division of labor is complete. He gives one analysis of the computation of the LU decomposition and another of the solution of triangular systems and then combines the two. Wilkinson continued analyzing various algorithms for computing decompositions, introducing uniform techniques for dealing with the transformations used in the computations. By the time his book The Algebraic Eigenvalue Problem [18] appeared in 1965, the decompositional approach was firmly established.

The big six

There are many matrix decompositions, old and new, and the list of the latter seems to grow daily. Nonetheless, six decompositions hold the center. The reason is that they are useful and stable—they have important applications and the algorithms that compute them have a satisfactory backward rounding-error analysis (see Equation 7). In this brief tour, I provide references only for details that cannot be found in the many excellent texts and monographs on numerical linear algebra [18–26], the Handbook series [3], or the LINPACK Users' Guide [5].

The Cholesky decomposition

Description. Given a positive definite matrix A, there is a unique upper triangular matrix R with positive diagonal elements such that

A = R^T R.

In this form, the decomposition is known as the Cholesky decomposition. It is often written in the form

A = LDL^T

where D is diagonal and L is unit lower triangular (that is, L is lower triangular with ones on the diagonal).

Applications. The Cholesky decomposition is used primarily to solve positive definite linear systems, as in Equations 3 and 4. It can also be employed to compute quantities useful in statistics, as in Equation 6.

Algorithms. A Cholesky decomposition can be computed using any of the variants of Gaussian elimination (see Figure 3)—modified, of course, to take advantage of symmetry. All these algorithms take approximately n^3/6 floating-point additions and multiplications. The algorithm Cholesky proposed corresponds to the diagram in the lower right of Figure 3.

Updating. Given a Cholesky decomposition A = R^T R, you can calculate the Cholesky decomposition of A + xx^T from R and x in O(n^2) floating-point additions and multiplications. The Cholesky decomposition of A − xx^T can be calculated with the same number of operations. The latter process, which is called downdating, is numerically less stable than updating.

The pivoted Cholesky decomposition. If P is a permutation matrix and A is positive definite, then P^T AP is said to be a diagonal permutation of A (among other things, it permutes the diagonals of A). Any diagonal permutation of A is positive definite and has a Cholesky factor. Such a factorization is called a pivoted Cholesky factorization. There are many ways to pivot a Cholesky decomposition, but the most common one produces a factor satisfying

r_kk^2 ≥ Σ_{i=k}^{j} r_ij^2,   j > k.   (9)

In particular, if A is positive semidefinite, this strategy will assure that R has the form

R = [ R_11  R_12 ]
    [  0     0   ]

where R_11 is nonsingular and has the same order as the rank of A. Hence, the pivoted Cholesky decomposition is widely used for rank determination.

History. The Cholesky decomposition (more precisely, an LDL^T decomposition) was the decomposition of Gauss's elimination algorithm, which he sketched in 1809 [27] and presented in full in 1810 [28]. Benoît published Cholesky's variant posthumously in 1924 [29].

The pivoted LU decomposition

Description. Given a matrix A of order n, there are permutations P and Q such that

P^T AQ = LU

where L is unit lower triangular and U is upper triangular. The matrices P and Q are not unique, and the process of selecting them is known as pivoting.

Applications. Like the Cholesky decomposition, the LU decomposition is used primarily for solving linear systems. However, since A is a general matrix, this application covers a wide range. For example, the LU decomposition is used to compute the steady-state vector of Markov chains and, with the inverse power method, to compute eigenvectors.

Algorithms. The basic algorithm for computing LU decompositions is a generalization of Gaussian elimination to nonsymmetric matrices. When and how this generalization arose is obscure (see Dwyer [1] for comments and references). Except for special matrices (such as positive definite and diagonally dominant matrices), the method requires some form of pivoting for stability. The most common form is partial pivoting, in which pivot elements are chosen from the column to be eliminated. This algorithm requires about n^3/3 additions and multiplications. Certain contrived examples show that Gaussian elimination with partial pivoting can be unstable. Nonetheless, it works well for the overwhelming majority of real-life problems [30, 31]. Why this should be so is an open question.

History. In establishing the existence of the LU decomposition, Jacobi [32] showed that under certain conditions a bilinear form φ(x, y) can be written in the form

φ(x, y) = ρ_1(x)σ_1(y) + ρ_2(x)σ_2(y) + … + ρ_n(x)σ_n(y)

where ρ_i and σ_i are linear functions that depend only on the last (n − i + 1) components of their arguments. The coefficients of the functions are the elements of L and U.
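As a modern illustration with assumed random test data, SciPy's LU driver returns a compact factorization from partial pivoting that can be reused for Ax = b and A^T x = c alike:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(4)         # assumed test data
A = rng.standard_normal((6, 6))
b = rng.standard_normal(6)
c = rng.standard_normal(6)

lu, piv = lu_factor(A)                 # partial pivoting: PA = LU, O(n^3) once

x = lu_solve((lu, piv), b)             # two triangular solves, O(n^2)
print(np.allclose(A @ x, b))           # True

xt = lu_solve((lu, piv), c, trans=1)   # the same factors solve A^T x = c
print(np.allclose(A.T @ xt, c))        # True
```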

The QR decomposition

Description. Let A be an m × n matrix with m ≥ n. There is an orthogonal matrix Q such that

Q^T A = [ R ]
        [ 0 ]

where R is upper triangular with nonnegative diagonal elements (or positive diagonal elements if A is of rank n). If we partition Q in the form

Q = (Q_A  Q_⊥)

where Q_A has n columns, then we can write

A = Q_A R.   (10)

This is sometimes called the QR factorization of A.

Applications. When A is of rank n, the columns of Q_A form an orthonormal basis for the column space R(A) of A, and the columns of Q_⊥ form an orthonormal basis of the orthogonal complement of R(A). In particular, Q_A Q_A^T is the orthogonal projection onto R(A). For this reason, the QR decomposition is widely used in applications with a geometric flavor, especially least squares.

Algorithms. There are two distinct classes of algorithms for computing the QR decomposition: Gram–Schmidt algorithms and orthogonal triangularization.

Gram–Schmidt algorithms proceed stepwise by orthogonalizing the kth column of A against the first (k − 1) columns of Q to get the kth column of Q. There are two forms of the Gram–Schmidt algorithm, the classical and the modified, and they both compute only the factorization in Equation 10. The classical Gram–Schmidt is unstable. The modified form can produce a matrix Q_A whose columns deviate from orthogonality. But the deviation is bounded, and the computed factorization can be used in certain applications—notably computing least-squares solutions.

If the orthogonalization step is repeated at each stage—a process known as reorthogonalization—both algorithms will produce a fully orthogonal factorization. When m » n, the algorithms without reorthogonalization require about mn^2 additions and multiplications.

The method of orthogonal triangularization proceeds by premultiplying A by certain simple orthogonal matrices until the elements below the diagonal are zero. The product of the orthogonal matrices is Q, and R is the upper triangular part of the reduced A. Again, there are two versions. The first reduces the matrix by Householder transformations. The method has the advantage that it represents the entire matrix Q in the same amount of memory that is required to hold A, a great savings when m » n. The second method reduces A by plane rotations. It is less efficient than the first method, but is better suited for matrices with structured patterns of nonzero elements.

Relation to the Cholesky decomposition. From Equation 10, it follows that

A^T A = R^T R.   (11)

In other words, the triangular factor of the QR decomposition of A is the triangular factor of the Cholesky decomposition of the cross-product matrix A^T A. Consequently, many problems—particularly least-squares problems—can be solved using either a QR decomposition from a least-squares matrix or the Cholesky decomposition from the normal equation. The QR decomposition usually gives more accurate results, whereas the Cholesky decomposition is often faster.

Updating. Given a QR factorization of A, there are stable, efficient algorithms for recomputing the QR factorization after rows and columns have been added to or removed from A. In addition, the QR decomposition of the rank-one modification A + xy^T can be stably updated.

The pivoted QR decomposition. If P is a permutation matrix, then AP is a permutation of the columns of A, and (AP)^T (AP) is a diagonal permutation of A^T A. In view of the relation of the QR and the Cholesky decompositions, it is not surprising that there is a pivoted QR factorization whose triangular factor R satisfies Equation 9. In particular, if A has rank k, then its pivoted QR factorization has the form

AP = (Q_1  Q_2) [ R_11  R_12 ]
                [  0     0   ]

It follows that either Q_1 or the first k columns of AP form a basis for the column space of A. Thus, the pivoted QR decomposition can be used to extract a set of linearly independent columns from A (the sketch at the end of this section illustrates this).

History. The QR factorization first appeared in a work by Erhard Schmidt on integral equations [33]. Specifically, Schmidt showed how to orthogonalize a sequence of functions by what is now known as the Gram–Schmidt algorithm. (Curiously, Laplace produced the basic formulas [34] but had no notion of orthogonality.) The name QR comes from the QR algorithm, named by Francis (see the history notes for the Schur decomposition, discussed later). Householder introduced Householder transformations to matrix computations and showed how they could be used to triangularize a general matrix [35]. Plane rotations were introduced by Givens [16], who used them to reduce a symmetric matrix to tridiagonal form. Bogert and Burris appear to be the first to use them in orthogonal triangularization [36]. The first updating algorithm (adding a row) is due to Golub [37], who also introduced the idea of pivoting.
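In code, the relation to the normal equations and the rank-revealing use of column pivoting look roughly as follows. This is a minimal NumPy/SciPy sketch with assumed random data:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(5)             # assumed test data
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)

QA, R = qr(A, mode='economic')             # thin factorization, Equation 10
x = solve_triangular(R, QA.T @ b)          # least squares: solve R x = Q_A^T b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True

# Equation 11: up to row signs, R is the Cholesky factor of A^T A.
C = np.linalg.cholesky(A.T @ A).T
print(np.allclose(np.abs(R), np.abs(C)))   # True

# Column pivoting orders the columns so the leading ones are independent.
Qp, Rp, piv = qr(A, mode='economic', pivoting=True)
print(piv)                                 # the permutation chosen by pivoting
```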
The spectral decomposition

Description. Let A be a symmetric matrix of order n. There is an orthogonal matrix V such that

A = VΛV^T,   Λ = diag(λ_1, …, λ_n).   (12)

If v_i denotes the ith column of V, then Av_i = λ_i v_i. Thus (λ_i, v_i) is an eigenpair of A, and the spectral decomposition shown in Equation 12 exhibits the eigenvalues of A along with a complete orthonormal system of eigenvectors.

Applications. The spectral decomposition finds applications wherever the eigensystem of a symmetric matrix is needed, which is to say in virtually all technical disciplines.
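A one-line sketch of the decomposition in modern NumPy, with an assumed symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(6)             # assumed test data
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                          # a symmetric test matrix

lam, V = np.linalg.eigh(A)                 # Equation 12: A = V diag(lam) V^T
print(np.allclose(V @ np.diag(lam) @ V.T, A))       # True
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # (lam_1, v_1) is an eigenpair
```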

Algorithms. There are three classes of algorithms to compute the spectral decomposition: the QR algorithm, the divide-and-conquer algorithm, and Jacobi's algorithm. The first two require a preliminary reduction to tridiagonal form by orthogonal similarities. I discuss the QR algorithm in the next section on the Schur decomposition. The divide-and-conquer algorithm [38, 39] is comparatively recent and is usually faster than the QR algorithm when both eigenvalues and eigenvectors are desired; however, it is not suitable for problems in which the eigenvalues vary widely in magnitude. The Jacobi algorithm is much slower than the other two, but for positive definite matrices it may be more accurate [40]. All these algorithms require O(n^3) operations.

Updating. The spectral decomposition can be updated. Unlike the Cholesky and QR decompositions, the algorithm does not result in a reduction in the order of the work—it still remains O(n^3), although the order constant is lower.

History. The spectral decomposition dates back to an 1829 paper by Cauchy [41], who introduced the eigenvectors as solutions of equations of the form Ax = λx and proved the orthogonality of eigenvectors belonging to distinct eigenvalues. In 1846, Jacobi [42] gave his famous algorithm for spectral decomposition, which iteratively reduces the matrix in question to diagonal form by a special type of plane rotations, now called Jacobi rotations. The reduction to tridiagonal form by plane rotations is due to Givens [16], and by Householder transformations to Householder [43].

The Schur decomposition

Description. Let A be a matrix of order n. There is a unitary matrix U such that

A = UTU^H

where T is upper triangular and the superscript H denotes the conjugate transpose. The diagonal elements of T are the eigenvalues of A, which, by appropriate choice of U, can be made to appear in any order. This decomposition is called a Schur decomposition of A.

A real matrix can have complex eigenvalues and hence a complex Schur form. By allowing T to have real 2 × 2 blocks on its diagonal that contain its complex eigenvalues, the entire decomposition can be made real. This is sometimes called a real Schur form.

Applications. An important use of the Schur form is as an intermediate form from which the eigenvalues and eigenvectors of a matrix can be computed. On the other hand, the Schur decomposition can often be used in place of a complete system of eigenpairs, which, in fact, may not exist. A good example is the solution of Sylvester's equation and its relatives [44, 45].

Algorithms. After a preliminary reduction to Hessenberg form, which is usually done with Householder transformations, the Schur form is computed using the QR algorithm [46]. Elsewhere in this issue, Beresford Parlett discusses the modern form of the algorithm. It is one of the most flexible algorithms in the repertoire, having variants for the spectral decomposition, the singular value decomposition, and the generalized eigenvalue problem.

History. Schur introduced his decomposition in 1909 [47]. It was the only one of the big six to have been derived in terms of matrices. It was largely ignored until Francis's QR algorithm pushed it into the limelight.
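SciPy exposes both the real and the complex Schur forms; a short sketch with an assumed random test matrix:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(7)             # assumed test data
A = rng.standard_normal((5, 5))

T, U = schur(A)                            # real Schur form: A = U T U^T
print(np.allclose(U @ T @ U.T, A))         # True
print(np.allclose(U.T @ U, np.eye(5)))     # U is orthogonal

# The complex form is genuinely triangular; its diagonal holds the eigenvalues.
Tc, Uc = schur(A, output='complex')
print(np.allclose(np.sort_complex(np.diag(Tc)),
                  np.sort_complex(np.linalg.eigvals(A))))   # True
```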

The singular value decomposition

Description. Let A be an m × n matrix with m ≥ n. There are orthogonal matrices U and V such that

U^T AV = [ Σ ]
         [ 0 ]

where

Σ = diag(σ_1, …, σ_n),   σ_1 ≥ σ_2 ≥ … ≥ σ_n ≥ 0.

This decomposition is called the singular value decomposition of A. If U_A consists of the first n columns of U, we can write

A = U_A Σ V^T   (13)

which is sometimes called the singular value factorization of A.

The diagonal elements of Σ are called the singular values of A. The corresponding columns of U and V are called left and right singular vectors of A.

Applications. Most of the applications of the QR decomposition can also be handled by the singular value decomposition. In addition, the singular value decomposition gives a basis for the row space of A and is more reliable in determining rank. It can also be used to compute optimal low-rank approximations and to compute angles between subspaces.
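The rank and low-rank uses are short in modern NumPy. The sketch below, with an assumed nearly rank-2 test matrix, also shows the optimal-approximation property (the 2-norm error of the rank-k truncation is σ_{k+1}):

```python
import numpy as np

rng = np.random.default_rng(8)             # assumed test data
# A matrix of rank 2 plus a little noise.
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))
A += 1e-8 * rng.standard_normal((8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T
rank = int(np.sum(s > 1e-6 * s[0]))        # numerical rank via a tolerance
print(rank)                                # 2

k = 2                                      # optimal rank-k approximation
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(A - Ak, 2))           # equals sigma_{k+1}, tiny here
```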

Relation to the spectral decomposition. The singular value factorization is related to the spectral decomposition in much the same way as the QR factorization is related to the Cholesky decomposition. Specifically, from Equation 13 it follows that

A^T A = VΣ^2 V^T.

Thus the eigenvalues of the cross-product matrix A^T A are the squares of the singular values of A, and the eigenvectors of A^T A are right singular vectors of A.

Algorithms. As with the spectral decomposition, there are three classes of algorithms for computing the singular value decomposition: the QR algorithm, a divide-and-conquer algorithm, and a Jacobi-like algorithm. The first two require a reduction of A to bidiagonal form. The divide-and-conquer algorithm [49] is often faster than the QR algorithm, and the Jacobi algorithm is the slowest.

History. The singular value decomposition was introduced independently by Beltrami in 1873 [50] and Jordan in 1874 [51]. The reduction to bidiagonal form is due to Golub and Kahan [52], as is the variant of the QR algorithm. The first Jacobi-like algorithm for computing the singular value decomposition was given by Kogbetliantz [53].

The big six are not the only decompositions in use; in fact, there are many more. As mentioned earlier, certain intermediate forms—such as tridiagonal and Hessenberg forms—have come to be regarded as decompositions in their own right. Since the singular value decomposition is expensive to compute and not readily updated, rank-revealing alternatives have received considerable attention [54, 55]. There are also generalizations of the singular value decomposition and the Schur decomposition for pairs of matrices [56, 57]. All crystal balls become cloudy when they look to the future, but it seems safe to say that as long as new matrix problems arise, new decompositions will be devised to solve them.
Acknowledgment

This work was supported by the National Science Foundation under Grant No. 970909-8562.

References

1. P.S. Dwyer, Linear Computations, John Wiley & Sons, New York, 1951.
2. A.S. Householder, Principles of Numerical Analysis, McGraw-Hill, New York, 1953.
3. J.H. Wilkinson and C. Reinsch, Handbook for Automatic Computation, Vol. II, Linear Algebra, Springer-Verlag, New York, 1971.
4. B.S. Garbow et al., "Matrix Eigensystem Routines—Eispack Guide Extension," Lecture Notes in Computer Science, Springer-Verlag, New York, 1977.
5. J.J. Dongarra et al., LINPACK User's Guide, SIAM, Philadelphia, 1979.
6. E. Anderson et al., LAPACK Users' Guide, 2nd ed., SIAM, Philadelphia, 1995.
7. J.J. Dongarra et al., "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Mathematical Software, Vol. 14, 1988, pp. 1–17.
8. J.J. Dongarra et al., "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Mathematical Software, Vol. 16, 1990, pp. 1–17.
9. C.L. Lawson et al., "Basic Linear Algebra Subprograms for Fortran Usage," ACM Trans. Mathematical Software, Vol. 5, 1979, pp. 308–323.
10. A. Cayley, "A Memoir on the Theory of Matrices," Philosophical Trans. Royal Soc. of London, Vol. 148, 1858, pp. 17–37.
11. C.C. MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
12. P.S. Dwyer, "A Matrix Presentation of Least Squares and Correlation Theory with Matrix Justification of Improved Methods of Solution," Annals Math. Statistics, Vol. 15, 1944, pp. 82–89.
13. H. Jensen, An Attempt at a Systematic Classification of Some Methods for the Solution of Normal Equations, Report No. 18, Geodætisk Institut, Copenhagen, 1944.
14. J. von Neumann and H.H. Goldstine, "Numerical Inverting of Matrices of High Order," Bull. Am. Math. Soc., Vol. 53, 1947, pp. 1021–1099.
15. A.S. Householder, The Theory of Matrices in Numerical Analysis, Dover, New York, 1964.
16. W. Givens, Numerical Computation of the Characteristic Values of a Real Matrix, Tech. Report 1574, Oak Ridge Nat'l Laboratory, Oak Ridge, Tenn., 1954.
17. J.H. Wilkinson, "Error Analysis of Direct Methods of Matrix Inversion," J. ACM, Vol. 8, 1961, pp. 281–330.
18. J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, U.K., 1965.
19. Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
20. B.N. Datta, Numerical Linear Algebra and Applications, Brooks/Cole, Pacific Grove, Calif., 1995.
21. J.W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
22. N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
23. B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, N.J., 1980; reissued with revisions by SIAM, Philadelphia, 1998.
24. G.W. Stewart, Introduction to Matrix Computations, Academic Press, New York, 1973.
25. G.W. Stewart, Matrix Algorithms I: Basic Decompositions, SIAM, Philadelphia, 1998.
26. D.S. Watkins, Fundamentals of Matrix Computations, John Wiley & Sons, New York, 1991.
27. C.F. Gauss, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium [Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections], Perthes and Besser, Hamburg, Germany, 1809.
28. C.F. Gauss, "Disquisitio de elementis ellipticis Palladis [Disquisition on the Elliptical Elements of Pallas]," Commentationes societatis regiae scientiarum Gottingensis recentiores, Vol. 1, 1810.
29. C. Benoît, "Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues — application de la méthode à la résolution d'un système défini d'équations linéaires (Procédé du Commandant Cholesky) [Note on a Method for Solving the Normal Equations Arising from the Application of the Method of Least Squares to a System of Linear Equations whose Number Is Less than the Number of Unknowns — Application of the Method to the Solution of a Definite System of Linear Equations]," Bulletin Géodésique (Toulouse), Vol. 2, 1924, pp. 5–77.
30. L.N. Trefethen and R.S. Schreiber, "Average-Case Stability of Gaussian Elimination," SIAM J. Matrix Analysis and Applications, Vol. 11, 1990, pp. 335–360.
31. L.N. Trefethen and D. Bau III, Numerical Linear Algebra, SIAM, Philadelphia, 1997, pp. 166–170.
32. C.G.J. Jacobi, "Über einen algebraischen Fundamentalsatz und seine Anwendungen [On a Fundamental Theorem in Algebra and Its Applications]," Journal für die reine und angewandte Mathematik, Vol. 53, 1857, pp. 275–280.
33. E. Schmidt, "Zur Theorie der linearen und nichtlinearen Integralgleichungen. I Teil, Entwicklung willkürlicher Funktionen nach System vorgeschriebener [On the Theory of Linear and Nonlinear Integral Equations. Part I, Expansion of Arbitrary Functions by a Prescribed System]," Mathematische Annalen, Vol. 63, 1907, pp. 433–476.
34. P.S. Laplace, Théorie Analytique des Probabilités [Analytical Theory of Probabilities], 3rd ed., Courcier, Paris, 1820.
35. A.S. Householder, "Unitary Triangularization of a Nonsymmetric Matrix," J. ACM, Vol. 5, 1958, pp. 339–342.
36. D. Bogert and W.R. Burris, Comparison of Least Squares Algorithms, Tech. Report ORNL-3499, V.1, S5.5, Neutron Physics Div., Oak Ridge Nat'l Laboratory, Oak Ridge, Tenn., 1963.
37. G.H. Golub, "Numerical Methods for Solving Least Squares Problems," Numerische Mathematik, Vol. 7, 1965, pp. 206–216.
38. J.J.M. Cuppen, "A Divide and Conquer Method for the Symmetric Eigenproblem," Numerische Mathematik, Vol. 36, 1981, pp. 177–195.
39. M. Gu and S.C. Eisenstat, "A Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem," SIAM J. Matrix Analysis and Applications, Vol. 15, 1994, pp. 1266–1276.
40. J. Demmel and K. Veselić, "Jacobi's Method Is More Accurate than QR," SIAM J. Matrix Analysis and Applications, Vol. 13, 1992, pp. 1204–1245.
41. A.L. Cauchy, "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes [On the Equation by which the Inequalities for the Secular Movement of Planets Is Determined]," Oeuvres Complètes (IIe Série), Vol. 9, 1829.
42. C.G.J. Jacobi, "Über ein leichtes Verfahren die in der Theorie der Säculärstörungen vorkommenden Gleichungen numerisch aufzulösen [On an Easy Method for the Numerical Solution of the Equations that Appear in the Theory of Secular Perturbations]," Journal für die reine und angewandte Mathematik, Vol. 30, 1846, pp. 51–94.
43. J.H. Wilkinson, "Householder's Method for the Solution of the Algebraic Eigenvalue Problem," Computer J., Vol. 3, 1960, pp. 23–27.
44. R.H. Bartels and G.W. Stewart, "Algorithm 432: The Solution of the Matrix Equation AX + XB = C," Comm. ACM, Vol. 15, 1972, pp. 820–826.
45. G.H. Golub, S. Nash, and C. Van Loan, "A Hessenberg–Schur Method for the Problem AX + XB = C," IEEE Trans. Automatic Control, Vol. AC-24, 1979, pp. 909–913.
46. J.G.F. Francis, "The QR Transformation, Parts I and II," Computer J., Vol. 4, 1961, pp. 265–271, and 1962, pp. 332–345.
47. J. Schur, "Über die charakteristischen Würzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen [On the Characteristic Roots of a Linear Transformation with an Application to the Theory of Integral Equations]," Mathematische Annalen, Vol. 66, 1909, pp. 448–510.
49. M. Gu and S.C. Eisenstat, "A Divide-and-Conquer Algorithm for the Bidiagonal SVD," SIAM J. Matrix Analysis and Applications, Vol. 16, 1995, pp. 79–92.
50. E. Beltrami, "Sulle funzioni bilineari [On Bilinear Functions]," Giornale di Matematiche ad Uso degli Studenti delle Università, Vol. 11, 1873, pp. 98–106. An English translation by D. Boley is available in Tech. Report 90–37, Dept. of Computer Science, Univ. of Minnesota, Minneapolis, 1990.
51. C. Jordan, "Mémoire sur les formes bilinéaires [Memoir on Bilinear Forms]," Journal de Mathématiques Pures et Appliquées, Deuxième Série, Vol. 19, 1874, pp. 35–54.
52. G.H. Golub and W. Kahan, "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Numerical Analysis, Vol. 2, 1965, pp. 205–224.
53. E.G. Kogbetliantz, "Solution of Linear Systems by Diagonalization of Coefficients Matrix," Quarterly of Applied Math., Vol. 13, 1955, pp. 123–132.
54. T.F. Chan, "Rank Revealing QR Factorizations," Linear Algebra and Its Applications, Vol. 88/89, 1987, pp. 67–82.
55. G.W. Stewart, "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Processing, Vol. 40, 1992, pp. 1535–1541.
56. Z. Bai and J.W. Demmel, "Computing the Generalized Singular Value Decomposition," SIAM J. Scientific and Statistical Computing, Vol. 14, 1993, pp. 1464–1468.
57. C.B. Moler and G.W. Stewart, "An Algorithm for Generalized Matrix Eigenvalue Problems," SIAM J. Numerical Analysis, Vol. 10, 1973, pp. 241–256.

G.W. Stewart is a professor in the Department of Computer Science and in the Institute for Advanced Computer Studies at the University of Maryland. His research interests focus on numerical linear algebra, perturbation theory, and rounding-error analysis. He earned his PhD from the University of Tennessee, where his advisor was A.S. Householder. Contact Stewart at the Dept. of Computer Science, Univ. of Maryland, College Park, MD 20742; [email protected].
