The Decompositional Approach to Matrix Computation

the Top T HEME ARTICLE THE DECOMPOSITIONAL APPROACH TO MATRIX COMPUTATION The introduction of matrix decomposition into numerical linear algebra revolutionized matrix computations. This article outlines the decompositional approach, comments on its history, and surveys the six most widely used decompositions. n 1951, Paul S. Dwyer published Linear used scalar equations whereas Householder used Computations, perhaps the first book de- partitioned matrices. But a deeper difference is voted entirely to numerical linear algebra.1 that while Dwyer started from a system of equa- Digital computing was in its infancy, and tions, Householder worked with a (block) LU IDwyer focused on computation with mechani- decomposition—the factorization of a matrix cal calculators. Nonetheless, the book was state into the product of lower and upper triangular of the art. Figure 1 reproduces a page of the matrices. book dealing with Gaussian elimination. In Generally speaking, a decomposition is a fac- 1954, Alston S. Householder published Princi- torization of a matrix into simpler factors. The ples of Numerical Analysis,2 one of the first mod- underlying principle of the decompositional ap- ern treatments of high-speed digital computa- proach to matrix computation is that it is not the tion. Figure 2 reproduces a page from this book, business of the matrix algorithmists to solve par- also dealing with Gaussian elimination. ticular problems but to construct computational The contrast between these two pages is strik- platforms from which a variety of problems can ing. The most obvious difference is that Dwyer be solved. This approach, which was in full swing by the mid-1960s, has revolutionized matrix computation. To illustrate the nature of the decompositional 1521-9615/00/$10.00 © 2000 IEEE approach and its consequences, I begin with a discussion of the Cholesky decomposition and G.W. STEWART the solution of linear systems—essentially the University of Maryland decomposition that Gauss computed in his elim- 50 COMPUTING IN SCIENCE & ENGINEERING Figure 1. This page from Linear Computations Figure 2. On this page from Principles of Numerical shows that Paul Dwyer’s approach begins with a Analysis, Alston Householder uses partitioned system of scalar equations. Courtesy of John Wiley matrices and LU decomposition. Courtesy of Mc- & Sons. Graw-Hill. ination algorithm. This article also provides a A can be factored in the form tour of the five other major matrix decompositions, including the pivoted LU decomposition, A = RTR (2) the QR decomposition, the spectral decomposition, the Schur decomposition, and the singu- where R is an upper triangular matrix. The fac- lar value decomposition. torization is called the Cholesky decomposition of A disclaimer is in order. This article deals pri- A. marily with dense matrix computations. Al- The factorization in Equation 2 can be used though the decompositional approach has greatly to solve linear systems as follows. If we write the − influenced iterative and direct methods for sparse system in the form RTRx = b and set y = R Tb, matrices, the ways in which it has affected them then x is the solution of the triangular system are different from what I describe here. Rx = y. However, by definition y is the solution of the system RTy = b. Consequently, we have re- duced the problem to the solution of two trian- The Cholesky decomposition and gular systems, as illustrated in the following al- linear systems gorithm: We can use the decompositional approach to solve the system 1. Solve the system RTy = b. 2. Solve the system Rx = y. (3) Ax = b (1) Because triangular systems are easy to solve, the where A is positive definite. It is well known that introduction of the Cholesky decomposition has JANUARY/FEBRUARY 2000 51 used to solve a system at one point in a computation, you can reuse the decomposition to do the same thing later without having to recompute it. Historically, Gaussian elimination and its variants (including Cholesky’s algorithm) have solved the system in Equation 1 by reducing it to an equivalent triangular system. This mixes the computation of the decomposition with the solution of the first triangular system in Equa- tion 3, and it is not obvious how to reuse the elimination when a new right-hand side presents itself. A naive programmer is in danger of per- forming the reduction from the beginning, thus repeating the lion’s share of the work. On the other hand, a program that knows a decomposition is in the background can reuse it as needed. (By the way, the problem of recomputing decompositions has not gone away. Some matrix packages hide the fact that they repeatedly compute a decomposition by providing drivers to solve linear systems with a call to a single routine. If the program calls the routine again with the same matrix, it recomputes the decomposition—unnecessarily. Interpretive matrix systems such as Matlab and Mathematica have the same problemthey hide decompositions behind op- Figure 3. These varieties of Gaussian elimination are all numerically erators and function calls. Such are the conse- equivalent. quences of not stressing the decompositional approach to the consumers of matrix algorithms.) Another advantage of working with decompo- transformed our problem into one for which the sitions is unity. There are different ways of or- solution can be readily computed. ganizing the operations involved in solving lin- We can use such decompositions to solve ear systems by Gaussian elimination in general more than one problem. For example, the fol- and Cholesky’s algorithm in particular. Figure 3 lowing algorithm solves the system ATx = b: illustrates some of these arrangements: a white area contains elements from the original matrix, 1. Solve the system Ry = b. a dark area contains the factors, a light gray 2. Solve the system RTx = y. (4) area contains partially processed elements, and the boundary strips contain elements about to Again, in many statistical applications we want be processed. Most of these variants were origi- − to compute the quantity ρ = xTA 1x. Because nally presented in scalar form as new algorithms. Once you recognize that a decomposition is in- − − − − xTA 1x = xT(RTR) 1x = (R Tx)T(R Tx) (5) volved, it is easy to see the essential unity of the various algorithms. we can compute ρ as follows: All the variants in Figure 3 are numerically equivalent. This means that one rounding-error 1. Solve the system RTy = x. analysis serves all. For example, the Cholesky al- 2. ρ = yTy. (6) gorithm, in whatever guise, is backward stable: the computed factor R satisfies The decompositional approach can also save computation. For example, the Cholesky de- (A + E) = RTR (7) composition requires O(n3) operations to compute, whereas the solution of triangular systems where E is of the size of the rounding unit rela- requires only O(n2) operations. Thus, if you rec- tive to A. Establishing this backward is usually ognize that a Cholesky decomposition is being the most difficult part of an analysis of the use 52 COMPUTING IN SCIENCE & ENGINEERING of a decomposition to solve a problem. For example, once Equation 7 has been established, the Benefits of the decompositional rounding errors involved in the solutions of the approach triangular systems in Equation 3 can be incor- porated in E with relative ease. Thus, another • A matrix decomposition solves not one but many problems. advantage of the decompositional approach is • A matrix decomposition, which is generally expensive to that it concentrates the most difficult aspects of compute, can be reused to solve new problems involving the rounding-error analysis in one place. original matrix. In general, if you change the elements of a • The decompositional approach often shows that apparently positive definite matrix, you must recompute its different algorithms are actually computing the same object. Cholesky decomposition from scratch. However, • The decompositional approach facilitates rounding-error if the change is structured, it may be possible to analysis. compute the new decomposition directly from • Many matrix decompositions can be updated, sometimes the old—a process known as updating. For ex- with great savings in computation. ample, you can compute the Cholesky decom- • By focusing on a few decompositions instead of a host of position of A + xxT from that of A in O(n2) op- specific problems, software developers have been able to erations, an enormous savings over the ab initio produce highly effective matrix packages. computation of the decomposition. Finally, the decompositional approach has greatly affected the development of software for matrix computation. Instead of wasting energy the reduction of a quadratic form ϕ(x) = xTAx developing code for a variety of specific applica- (I am simplifying a little here). In terms of the tions, the producers of a matrix package can con- Cholesky factorization A = RTR, Gauss wrote centrate on the decompositions themselves, per- ϕ(x) in the form haps with a few auxiliary routines to handle the 2 2 2 most important applications. This approach has ϕ = T + T + + T (x) (r1 x) (r2 x) K (rn x) informed the major public-domain packages: the ≡ ρ2 + ρ2 + + ρ2 Handbook series,3 Eispack,4 Linpack,5 and La- 1 (x) 2(x) K n(x) (8) pack.6 A consequence of this emphasis on de- T compositions is that software developers have where ri is the ith row of R. Thus Gauss re- found that most algorithms have broad compu- duced ϕ(x) to a sum of squares of linear functions ρ tational features in common—features than can i. Because R is upper triangular, the function ρ be relegated to basic linear-algebra subprograms i(x) depends only on the components xi,…xn (such as Blas), which can then be optimized for of x.

The Decompositional Approach to Matrix Computation

Conjugate Gradient Method (Part 4)  Pre-Conditioning  Nonlinear Conjugate Gradient Method

Hybrid Algorithms for Efficient Cholesky Decomposition And

Geometric Polarimetry − Part I: Spinors and Wave States

Matrix Inversion Using Cholesky Decomposition

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Deﬁnite Systems; Cholesky Factorization

A Fast Method for Computing the Inverse of Symmetric Block Arrowhead Matrices

1. Positive Definite Matrices a Matrix a Is Positive Deﬁnite If X>Ax > 0 for All Nonzero X

Cholesky Decomposition 1 Cholesky Decomposition

Structured Factorizations in Scalar Product Spaces

The CMA Evolution Strategy: a Tutorial

Linear Algebra - Part II Projection, Eigendecomposition, SVD

Reducing Dimensionality in Text Mining Using Conjugate Gradients and Hybrid Cholesky Decomposition