Focus Article

Cholesky factorization

Nicholas J. Higham∗

This article, aimed at a general audience of computational scientists, surveys the Cholesky factorization for symmetric positive definite matrices, covering algorithms for computing it, the numerical stability of the algorithms, and updating and downdating of the factorization. Cholesky factorization with pivoting for semidefinite matrices is also treated. © 2009 John Wiley & Sons, Inc. WIREs Comp Stat 2009 1 251–254

INTRODUCTION

A symmetric n × n matrix A is positive definite if the quadratic form x^T A x is positive for all non-zero vectors x or, equivalently, if all the eigenvalues of A are positive. Positive definite matrices have many important properties, not least that they can be expressed in the form A = X^T X for a non-singular matrix X. The Cholesky factorization is a particular form of this factorization in which X is upper triangular with positive diagonal elements; it is usually written as A = R^T R or A = L L^T and it is unique. In the case of a scalar (n = 1), the Cholesky factor R is just the positive square root of A. However, R should, in general, not be confused with the square roots of A, which are the matrices Y such that A = Y^2, among which there is a unique symmetric positive definite square root, denoted A^{1/2} (Section 1.7 in Ref 1).

The Cholesky factorization (sometimes called the Cholesky decomposition) is named after André-Louis Cholesky (1875–1918), a French military officer involved in geodesy (Ref 2). It is commonly used to solve the normal equations A^T A x = A^T b that characterize the least squares solution to the overdetermined linear system Ax = b.

A variant of the Cholesky factorization is the factorization A = L D L^T, where L is unit lower triangular (i.e., has unit diagonal) and D is diagonal. This factorization exists and is unique for positive definite matrices. If D is allowed to have non-positive diagonal entries, the factorization exists for some (but not all) indefinite matrices. When A is positive definite the Cholesky factor is given by R = D^{1/2} L^T.

COMPUTATION

The Cholesky factorization can be computed by a form of Gaussian elimination that takes advantage of the symmetry and definiteness. Equating (i, j) elements in the equation A = R^T R gives

    j = i:  a_{ii} = sum_{k=1}^{i} r_{ki}^2,
    j > i:  a_{ij} = sum_{k=1}^{i} r_{ki} r_{kj}.    (1)

These equations can be solved to yield R a column at a time, according to the following algorithm:

    for j = 1:n
        for i = 1:j-1
            r_{ij} = (a_{ij} - sum_{k=1}^{i-1} r_{ki} r_{kj}) / r_{ii}
        end
        r_{jj} = (a_{jj} - sum_{k=1}^{j-1} r_{kj}^2)^{1/2}
    end

The positive definiteness of A guarantees that the argument of the square root in this algorithm is always positive and hence that R has a real, positive diagonal.

The algorithm requires n^3/3 + O(n^2) flops and n square roots, where a flop is any of the four elementary scalar arithmetic operations +, −, ∗, and /. The algorithm above is just one of many ways of arranging Cholesky factorization and can be identified as the 'jik' form based on the ordering of the indices of the three nested loops. There are five other orderings, yielding algorithms that are mathematically equivalent but that have quite different efficiency for large dimensions depending on the computing environment, by which we mean both the programming language and the hardware.

∗Correspondence to: [email protected]; School of Mathematics, The University of Manchester, Manchester M13 9PL, UK. DOI: 10.1002/wics.018
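As a concrete illustration, the column-at-a-time ('jik') algorithm of the Computation section can be transcribed almost line for line into NumPy. The function name and error handling below are ours, and a production code would call an optimized library routine instead:

```python
import numpy as np

def cholesky_jik(A):
    """Compute upper triangular R with A = R^T R by the column-oriented
    ('jik') algorithm: column j of R is formed using columns 1..j-1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    R = np.zeros((n, n))
    for j in range(n):
        for i in range(j):
            # r_ij = (a_ij - sum_{k<i} r_ki r_kj) / r_ii
            R[i, j] = (A[i, j] - R[:i, i] @ R[:i, j]) / R[i, i]
        # r_jj = (a_jj - sum_{k<j} r_kj^2)^{1/2}
        s = A[j, j] - R[:j, j] @ R[:j, j]
        if s <= 0:
            raise np.linalg.LinAlgError("matrix is not positive definite")
        R[j, j] = np.sqrt(s)
    return R
```

For a positive definite A, the result agrees (up to rounding) with the transpose of np.linalg.cholesky(A), which returns the lower triangular factor L = R^T.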

Volume 1, September/October 2009  2009 John Wiley & Sons, Inc. 251 Focus Article www.wiley.com/wires/compstats

In modern libraries such as LAPACK (Ref 3), the factorization is implemented in partitioned form, which introduces another level of looping in order to extract the best performance from the memory hierarchies of modern computers. To illustrate, we describe a partitioned Cholesky factorization algorithm. For a given block size r, we can write

    [ A_{11}    A_{12} ]   [ R_{11}^T    0     ] [ I_r  0 ] [ R_{11}  R_{12}  ]
    [ A_{12}^T  A_{22} ] = [ R_{12}^T  I_{n-r} ] [ 0    S ] [   0    I_{n-r} ],    (2)

where A_{11} and R_{11} are r × r. One step of the algorithm consists of computing the Cholesky factorization A_{11} = R_{11}^T R_{11}, solving the multiple right-hand side triangular system R_{11}^T R_{12} = A_{12} for R_{12}, and then forming the Schur complement S = A_{22} − R_{12}^T R_{12}; this procedure is repeated on S. This partitioned algorithm does precisely the same arithmetic operations as any other variant of Cholesky factorization, but it does the operations in an order that permits them to be expressed as matrix operations. The operations defining R_{12} and S are level 3 BLAS operations (Ref 4), for which efficient computational kernels are available on most machines.

In contrast, a block LDL^T factorization (the most useful form of block factorization for a symmetric positive definite matrix) has the form A = L D L^T, where

        [ I                         ]
        [ L_{21}   I                ]
    L = [  ...          ...         ],    D = diag(D_{ii}),    (3)
        [ L_{m1} ... L_{m,m-1}   I  ]

where the diagonal blocks D_{ii} are, in general, full matrices. This factorization is mathematically different from a Cholesky or LDL^T factorization (in fact, for an indefinite matrix, it may exist when the factorization with 1 × 1 blocks does not). It is of most interest when A is block tridiagonal (Ref 5 [Chapter 13]).

Once a Cholesky factorization of A is available, it is straightforward to solve a linear system Ax = b. The system is R^T R x = b, which can be solved in two steps, costing 2n^2 flops:

1. Solve the lower triangular system R^T y = b,
2. Solve the upper triangular system Rx = y.

NUMERICAL STABILITY

Rounding error analysis shows that Cholesky factorization has excellent numerical stability properties. We will state two results in terms of the vector 2-norm ||x||_2 = (x^T x)^{1/2} and the corresponding subordinate matrix norm ||A||_2 = max_{x≠0} ||Ax||_2 / ||x||_2, where for symmetric A we have ||A||_2 = max{|λ_i| : λ_i is an eigenvalue of A}. If the factorization runs to completion in floating point arithmetic, with the argument of the square root always positive, then the computed R̂ satisfies

    R̂^T R̂ = A + ΔA_1,    ||ΔA_1||_2 ≤ c_1 n^2 u ||A||_2,    (4)

where the subscripted c_i denotes a constant of order 1 and u is the unit roundoff (or machine precision). Most modern computing environments use IEEE double precision arithmetic, for which u = 2^{−53} ≈ 1.1 × 10^{−16}. Moreover, the computed solution x̂ to Ax = b satisfies

    (A + ΔA_2) x̂ = b,    ||ΔA_2||_2 ≤ c_2 n^2 u ||A||_2.    (5)

This is a backward error result that can be interpreted as saying that the computed solution x̂ is the true solution to a slightly perturbed problem. The factorization is guaranteed to run to completion if c_3 n^{3/2} κ_2(A) u < 1, where κ_2(A) = ||A||_2 ||A^{−1}||_2 ≥ 1 is the condition number of the matrix with respect to inversion. By applying standard perturbation theory for linear systems to Eq. (5), a bound is obtained for the forward error:

    ||x − x̂||_2 / ||x||_2 ≤ c_2 n^2 κ_2(A) u / (1 − c_2 n^2 κ_2(A) u).    (6)

The excellent numerical stability of Cholesky factorization is essentially due to the equality ||A||_2 = ||R^T R||_2 = ||R||_2^2, which guarantees that R is bounded in norm relative to A. For proofs of these results and more refined error bounds, see Ref 5 [Chapter 10].

SEMIDEFINITE MATRICES

A symmetric matrix A is positive semidefinite if the quadratic form x^T A x is non-negative for all x; thus A may be singular. For such matrices a Cholesky factorization A = R^T R exists, now with R possibly having some zero elements on the diagonal, but the diagonal of R may not display the rank of A. For example,

        [  1  −1   1 ]   [  1  0  0 ] [ 1  −1   1 ]
    A = [ −1   1  −1 ] = [ −1  0  0 ] [ 0   0   1 ] = R^T R,    (7)
        [  1  −1   2 ]   [  1  1  0 ] [ 0   0   0 ]

and A has rank 2 but R has only one non-zero diagonal element. However, with P the permutation matrix comprising the identity matrix with its columns in reverse order, P^T A P = R_1^T R_1, where

          [ √2   −1/√2    1/√2 ]
    R_1 = [  0    1/√2   −1/√2 ].    (8)
          [  0      0       0  ]

More generally, any symmetric positive semidefinite A has a factorization

    P^T A P = R^T R,    R = [ R_{11}  R_{12} ]
                            [   0       0   ],    (9)

where P is a permutation matrix, R_{11} is r × r upper triangular with positive diagonal elements, and rank(A) = r. This factorization is produced by using complete pivoting, which at each stage permutes the largest diagonal element in the active submatrix into the pivot position. The following algorithm implements Cholesky factorization with complete pivoting and overwrites the upper triangle of A with R. It is a 'kji' form of the algorithm.

    Set p_i = i, i = 1:n.
    for k = 1:n
        Find s such that a_{ss} = max_{k≤i≤n} a_{ii}.
        Swap rows and columns k and s of A and swap p_k and p_s.
        a_{kk} = (a_{kk})^{1/2}
        for j = k+1:n
            a_{kj} = a_{kj} / a_{kk}
        end
        for j = k+1:n
            for i = k+1:j
                a_{ij} = a_{ij} − a_{ki} a_{kj}
            end
        end
    end
    Set P to the matrix whose jth column is the p_j th column of I.

An efficient implementation of this algorithm that uses level 3 BLAS (Ref 6) is available in LAPACK Version 3.2. Complete pivoting produces a matrix R that satisfies the inequalities

    r_{kk}^2 ≥ sum_{i=k}^{min(j,r)} r_{ij}^2,    j = k+1:n,  k = 1:r,    (10)

which imply r_{11} ≥ r_{22} ≥ ··· ≥ r_{nn}.

An important use of Cholesky factorization is for testing whether a symmetric matrix is positive definite. The test is simply to run the Cholesky factorization algorithm and declare the matrix positive definite if the algorithm completes without encountering any negative or zero pivots and not positive definite otherwise. This test is much faster than computing all the eigenvalues of A, and it can be shown to be numerically stable: the answer is correct for a matrix A + ΔA with ΔA satisfying Eq. (4) (Ref 7). When an attempted Cholesky factorization breaks down with a non-positive pivot, it is sometimes useful to compute a vector p such that p^T A p ≤ 0. In optimization, when A is the Hessian of an underlying function to be minimized, p is termed a direction of negative curvature. Such a p is the first column of the matrix

    Z = [ R_{11}^{−1} R_{12} ]
        [        −I         ],    (11)

where [R_{11}  R_{12}] is the partially computed Cholesky factor, and this choice makes p^T A p equal to the next pivot, which is non-positive by assumption. This choice of p is not necessarily the best that can be obtained from Cholesky factorization, either in terms of producing a small value of p^T A p or in terms of the effects of rounding errors on the computation of p. Indeed this is a situation where Cholesky factorization with complete pivoting can profitably be used. For more details, see Ref 8.

UPDATING AND DOWNDATING

In some applications, it is necessary to modify a Cholesky factorization A = R^T R after a rank 1 change to the matrix A. Specifically, given a vector x such that A − x x^T is positive definite we may need to compute R̃ such that A − x x^T = R̃^T R̃ (the downdating problem), or given a vector y we may need to compute R̃ such that A + y y^T = R̃^T R̃. For the updating problem, we can write

    A + y y^T = R^T R + y y^T = [ R^T  y ] [  R  ]
                                           [ y^T ]

              = [ R^T  y ] Q^T · Q [  R  ]
                                   [ y^T ],    Q^T Q = I.    (12)

We aim to use the orthogonal matrix Q to restore triangularity; thus, for n = 3, for example, we want Q to effect, pictorially,

      [ × × × ]   [ × × × ]
    Q [ 0 × × ] = [ 0 × × ]
      [ 0 0 × ]   [ 0 0 × ],    (13)
      [ × × × ]   [ 0 0 0 ]

where × denotes a non-zero element. This can be achieved by taking Q as a product of suitably chosen Givens rotations. The downdating problem is more delicate because of possible cancellation in removing x x^T from A, and several methods are available, all more complicated than the updating procedure outlined above. For more on updating and downdating, see Ref 9 [Section 3.3].
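A minimal sketch of the updating step of Eqs (12) and (13), with our own function name: the row y^T is appended to R and then eliminated, one column at a time, by Givens rotations acting on rows k and n:

```python
import numpy as np

def cholesky_update(R, y):
    """Given upper triangular R with A = R^T R, return upper triangular
    R1 with R1^T R1 = A + y y^T, by retriangularizing [R; y^T] with
    Givens rotations, as in Eqs (12) and (13)."""
    n = R.shape[0]
    M = np.vstack([R, np.reshape(y, (1, n))]).astype(float)
    for k in range(n):
        a, b = M[k, k], M[n, k]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        # Rotate rows k and n to zero the appended row's entry in column k;
        # the new (k, k) entry is r >= 0, so the diagonal stays non-negative.
        rk, rn = M[k, k:].copy(), M[n, k:].copy()
        M[k, k:] = c * rk + s * rn
        M[n, k:] = -s * rk + c * rn
    return M[:n, :]
```

The downdating problem admits no equally simple transcription, for the reasons of cancellation noted above; see Ref 9.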

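The definiteness test described in the section on semidefinite matrices, namely run the factorization and report failure at a non-positive pivot, is in practice a one-line wrapper around a library Cholesky routine. A sketch using NumPy (function name ours):

```python
import numpy as np

def is_positive_definite(A):
    """Declare a symmetric matrix positive definite iff Cholesky
    factorization runs to completion; much cheaper than computing
    all the eigenvalues of A."""
    try:
        np.linalg.cholesky(A)  # raises LinAlgError on breakdown
        return True
    except np.linalg.LinAlgError:
        return False
```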
REFERENCES

1. Higham NJ. Functions of Matrices: Theory and Computation. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2008, xx+425 pp. (ISBN 978-0-898716-46-7).

2. Brezinski C. The life and work of André Cholesky. Numer Algorithms 2006, 43:279–288.

3. Anderson E, Bai Z, Bischof CH, Blackford S, Demmel JW, et al. LAPACK Users' Guide. 3rd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1999, xxvi+407 pp. (ISBN 0-89871-447-8).

4. Dongarra JJ, Du Croz JJ, Duff IS, Hammarling SJ. A set of Level 3 basic linear algebra subprograms. ACM Trans Math Softw 1990, 16:1–17.

5. Higham NJ. Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2002, xxx+680 pp. (ISBN 0-89871-521-0).

6. Hammarling S, Higham NJ, Lucas C. LAPACK-style codes for pivoted Cholesky and QR updating. In: Kågström B, Elmroth E, Dongarra J, Waśniewski J, eds. Applied Parallel Computing. State of the Art in Scientific Computing. 8th International Workshop, PARA 2006. Number 4699 in Lecture Notes in Computer Science. Berlin: Springer-Verlag; 2007, 137–146.

7. Higham NJ. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra Appl 1988, 103:103–118.

8. Guo CH, Higham NJ, Tisseur F. An Improved Arc Algorithm for Detecting Definite Hermitian Pairs. MIMS EPrint 2008.115. Manchester: Manchester Institute for Mathematical Sciences, The University of Manchester; 2008.

9. Björck Å. Numerical Methods for Least Squares Problems. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1996, xvii+408 pp. (ISBN 0-89871-360-9).
