Focus Article

Cholesky factorization

Nicholas J. Higham∗

This article, aimed at a general audience of computational scientists, surveys the Cholesky factorization for symmetric positive definite matrices, covering algorithms for computing it, the numerical stability of the algorithms, and updating and downdating of the factorization. Cholesky factorization with pivoting for semidefinite matrices is also treated. © 2009 John Wiley & Sons, Inc. WIREs Comp Stat 2009 1 251–254

INTRODUCTION

A symmetric n × n matrix A is positive definite if the quadratic form x^T A x is positive for all non-zero vectors x or, equivalently, if all the eigenvalues of A are positive. Positive definite matrices have many important properties, not least that they can be expressed in the form A = X^T X for a non-singular matrix X. The Cholesky factorization is a particular form of this factorization in which X is upper triangular with positive diagonal elements; it is usually written as A = R^T R or A = L L^T and it is unique. In the case of a scalar (n = 1), the Cholesky factor R is just the positive square root of A. However, R should, in general, not be confused with the square roots of A, which are the matrices Y such that A = Y^2, among which there is a unique symmetric positive definite square root, denoted A^{1/2} (Section 1.7 in Ref 1).

The Cholesky factorization (sometimes called the Cholesky decomposition) is named after André-Louis Cholesky (1875–1918), a French military officer involved in geodesy (Ref 2). It is commonly used to solve the normal equations A^T A x = A^T b that characterize the least squares solution to the overdetermined linear system Ax = b.

A variant of the Cholesky factorization is the factorization A = L D L^T, where L is unit lower triangular (i.e., has unit diagonal) and D is diagonal. This factorization exists and is unique for positive definite matrices. If D is allowed to have non-positive diagonal entries, the factorization exists for some (but not all) indefinite matrices. When A is positive definite the Cholesky factor is given by R = D^{1/2} L^T.

COMPUTATION

The Cholesky factorization can be computed by a form of Gaussian elimination that takes advantage of the symmetry and definiteness. Equating (i, j) elements in the equation A = R^T R gives

    j = i:  a_{ii} = sum_{k=1}^{i} r_{ki}^2,
    j > i:  a_{ij} = sum_{k=1}^{i} r_{ki} r_{kj}.    (1)

These equations can be solved to yield R a column at a time, according to the following algorithm:

    for j = 1:n
        for i = 1:j-1
            r_{ij} = (a_{ij} - sum_{k=1}^{i-1} r_{ki} r_{kj}) / r_{ii}
        end
        r_{jj} = (a_{jj} - sum_{k=1}^{j-1} r_{kj}^2)^{1/2}
    end

The positive definiteness of A guarantees that the argument of the square root in this algorithm is always positive and hence that R has a real, positive diagonal.

The algorithm requires n^3/3 + O(n^2) flops and n square roots, where a flop is any of the four elementary scalar arithmetic operations +, −, ∗, and /. The algorithm above is just one of many ways of arranging Cholesky factorization and can be identified as the 'jik' form based on the ordering of the indices of the three nested loops. There are five other orderings, yielding algorithms that are mathematically equivalent but that have quite different efficiency for large dimensions depending on the computing environment, by which we mean both the programming language and the hardware.

∗Correspondence to: [email protected]; School of Mathematics, The University of Manchester, Manchester M13 9PL, UK. DOI: 10.1002/wics.018
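As a concrete illustration, the column-at-a-time ('jik') algorithm of the Computation section can be transcribed almost line for line into NumPy. The function name and error handling below are ours, and a production code would call an optimized library routine instead:

```python
import numpy as np

def cholesky_jik(A):
    """Compute upper triangular R with A = R^T R by the column-oriented
    ('jik') algorithm: column j of R is formed using columns 1..j-1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    R = np.zeros((n, n))
    for j in range(n):
        for i in range(j):
            # r_ij = (a_ij - sum_{k<i} r_ki r_kj) / r_ii
            R[i, j] = (A[i, j] - R[:i, i] @ R[:i, j]) / R[i, i]
        # r_jj = (a_jj - sum_{k<j} r_kj^2)^{1/2}
        s = A[j, j] - R[:j, j] @ R[:j, j]
        if s <= 0:
            raise np.linalg.LinAlgError("matrix is not positive definite")
        R[j, j] = np.sqrt(s)
    return R
```

For a positive definite A, the result agrees (up to rounding) with the transpose of np.linalg.cholesky(A), which returns the lower triangular factor L = R^T.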

Volume 1, September/October 2009  2009 John Wiley & Sons, Inc. 251 Focus Article www.wiley.com/wires/compstats

In modern libraries such as LAPACK (Ref 3), the factorization is implemented in partitioned form, which introduces another level of looping in order to extract the best performance from the memory hierarchies of modern computers. To illustrate, we describe a partitioned Cholesky factorization algorithm. For a given block size r, we can write

    [ A_{11}    A_{12} ]   [ R_{11}^T    0     ] [ I_r  0 ] [ R_{11}  R_{12}  ]
    [ A_{12}^T  A_{22} ] = [ R_{12}^T  I_{n-r} ] [ 0    S ] [   0    I_{n-r} ],    (2)

where A_{11} and R_{11} are r × r. One step of the algorithm consists of computing the Cholesky factorization A_{11} = R_{11}^T R_{11}, solving the multiple right-hand side triangular system R_{11}^T R_{12} = A_{12} for R_{12}, and then forming the Schur complement S = A_{22} − R_{12}^T R_{12}; this procedure is repeated on S. This partitioned algorithm does precisely the same arithmetic operations as any other variant of Cholesky factorization, but it does the operations in an order that permits them to be expressed as matrix operations. The operations defining R_{12} and S are level 3 BLAS operations (Ref 4), for which efficient computational kernels are available on most machines.

In contrast, a block LDL^T factorization (the most useful form of block factorization for a symmetric positive definite matrix) has the form A = L D L^T, where

        [ I                         ]
        [ L_{21}   I                ]
    L = [  ...          ...         ],    D = diag(D_{ii}),    (3)
        [ L_{m1} ... L_{m,m-1}   I  ]

where the diagonal blocks D_{ii} are, in general, full matrices. This factorization is mathematically different from a Cholesky or LDL^T factorization (in fact, for an indefinite matrix, it may exist when the factorization with 1 × 1 blocks does not). It is of most interest when A is block tridiagonal (Ref 5 [Chapter 13]).

Once a Cholesky factorization of A is available, it is straightforward to solve a linear system Ax = b. The system is R^T R x = b, which can be solved in two steps, costing 2n^2 flops:

1. Solve the lower triangular system R^T y = b,
2. Solve the upper triangular system Rx = y.

NUMERICAL STABILITY

Rounding error analysis shows that Cholesky factorization has excellent numerical stability properties. We will state two results in terms of the vector 2-norm ||x||_2 = (x^T x)^{1/2} and the corresponding subordinate matrix norm ||A||_2 = max_{x≠0} ||Ax||_2 / ||x||_2, where for symmetric A we have ||A||_2 = max{|λ_i| : λ_i is an eigenvalue of A}. If the factorization runs to completion in floating point arithmetic, with the argument of the square root always positive, then the computed R̂ satisfies

    R̂^T R̂ = A + ΔA_1,    ||ΔA_1||_2 ≤ c_1 n^2 u ||A||_2,    (4)

where the subscripted c_i denotes a constant of order 1 and u is the unit roundoff (or machine precision). Most modern computing environments use IEEE double precision arithmetic, for which u = 2^{−53} ≈ 1.1 × 10^{−16}. Moreover, the computed solution x̂ to Ax = b satisfies

    (A + ΔA_2) x̂ = b,    ||ΔA_2||_2 ≤ c_2 n^2 u ||A||_2.    (5)

This is a backward error result that can be interpreted as saying that the computed solution x̂ is the true solution to a slightly perturbed problem. The factorization is guaranteed to run to completion if c_3 n^{3/2} κ_2(A) u < 1, where κ_2(A) = ||A||_2 ||A^{−1}||_2 ≥ 1 is the condition number of the matrix with respect to inversion. By applying standard perturbation theory for linear systems to Eq. (5), a bound is obtained for the forward error:

    ||x − x̂||_2 / ||x||_2 ≤ c_2 n^2 κ_2(A) u / (1 − c_2 n^2 κ_2(A) u).    (6)

The excellent numerical stability of Cholesky factorization is essentially due to the equality ||A||_2 = ||R^T R||_2 = ||R||_2^2, which guarantees that R is bounded in norm relative to A. For proofs of these results and more refined error bounds, see Ref 5 [Chapter 10].

SEMIDEFINITE MATRICES

A symmetric matrix A is positive semidefinite if the quadratic form x^T A x is non-negative for all x; thus A may be singular. For such matrices a Cholesky factorization A = R^T R exists, now with R possibly having some zero elements on the diagonal, but the diagonal of R may not display the rank of A. For example,

        [  1  −1   1 ]   [  1  0  0 ] [ 1  −1   1 ]
    A = [ −1   1  −1 ] = [ −1  0  0 ] [ 0   0   1 ] = R^T R,    (7)
        [  1  −1   2 ]   [  1  1  0 ] [ 0   0   0 ]

and A has rank 2 but R has only one non-zero diagonal element. However, with P the permutation matrix comprising the identity matrix with its columns in reverse order, P^T A P = R_1^T R_1, where

          [ √2   −1/√2    1/√2 ]
    R_1 = [  0    1/√2   −1/√2 ].    (8)
          [  0      0       0  ]

More generally, any symmetric positive semidefinite A has a factorization

    P^T A P = R^T R,    R = [ R_{11}  R_{12} ]
                            [   0       0   ],    (9)

where P is a permutation matrix, R_{11} is r × r upper triangular with positive diagonal elements, and rank(A) = r. This factorization is produced by using complete pivoting, which at each stage permutes the largest diagonal element in the active submatrix into the pivot position. The following algorithm implements Cholesky factorization with complete pivoting and overwrites the upper triangle of A with R. It is a 'kji' form of the algorithm.

    Set p_i = i, i = 1:n.
    for k = 1:n
        Find s such that a_{ss} = max_{k≤i≤n} a_{ii}.
        Swap rows and columns k and s of A and swap p_k and p_s.
        a_{kk} = (a_{kk})^{1/2}
        for j = k+1:n
            a_{kj} = a_{kj} / a_{kk}
        end
        for j = k+1:n
            for i = k+1:j
                a_{ij} = a_{ij} − a_{ki} a_{kj}
            end
        end
    end
    Set P to the matrix whose jth column is the p_j th column of I.

An efficient implementation of this algorithm that uses level 3 BLAS (Ref 6) is available in LAPACK Version 3.2. Complete pivoting produces a matrix R that satisfies the inequalities

    r_{kk}^2 ≥ sum_{i=k}^{min(j,r)} r_{ij}^2,    j = k+1:n,  k = 1:r,    (10)

which imply r_{11} ≥ r_{22} ≥ ··· ≥ r_{nn}.

An important use of Cholesky factorization is for testing whether a symmetric matrix is positive definite. The test is simply to run the Cholesky factorization algorithm and declare the matrix positive definite if the algorithm completes without encountering any negative or zero pivots and not positive definite otherwise. This test is much faster than computing all the eigenvalues of A, and it can be shown to be numerically stable: the answer is correct for a matrix A + ΔA with ΔA satisfying Eq. (4) (Ref 7). When an attempted Cholesky factorization breaks down with a non-positive pivot, it is sometimes useful to compute a vector p such that p^T A p ≤ 0. In optimization, when A is the Hessian of an underlying function to be minimized, p is termed a direction of negative curvature. Such a p is the first column of the matrix

    Z = [ R_{11}^{−1} R_{12} ]
        [        −I         ],    (11)

where [R_{11}  R_{12}] is the partially computed Cholesky factor, and this choice makes p^T A p equal to the next pivot, which is non-positive by assumption. This choice of p is not necessarily the best that can be obtained from Cholesky factorization, either in terms of producing a small value of p^T A p or in terms of the effects of rounding errors on the computation of p. Indeed this is a situation where Cholesky factorization with complete pivoting can profitably be used. For more details, see Ref 8.

UPDATING AND DOWNDATING

In some applications, it is necessary to modify a Cholesky factorization A = R^T R after a rank 1 change to the matrix A. Specifically, given a vector x such that A − x x^T is positive definite we may need to compute R̃ such that A − x x^T = R̃^T R̃ (the downdating problem), or given a vector y we may need to compute R̃ such that A + y y^T = R̃^T R̃. For the updating problem, we can write

    A + y y^T = R^T R + y y^T = [ R^T  y ] [  R  ]
                                           [ y^T ]

              = [ R^T  y ] Q^T · Q [  R  ]
                                   [ y^T ],    Q^T Q = I.    (12)

We aim to use the orthogonal matrix Q to restore triangularity; thus, for n = 3, for example, we want Q to effect, pictorially,

      [ × × × ]   [ × × × ]
    Q [ 0 × × ] = [ 0 × × ]
      [ 0 0 × ]   [ 0 0 × ],    (13)
      [ × × × ]   [ 0 0 0 ]

where × denotes a non-zero element. This can be achieved by taking Q as a product of suitably chosen Givens rotations. The downdating problem is more delicate because of possible cancellation in removing x x^T from A, and several methods are available, all more complicated than the updating procedure outlined above. For more on updating and downdating, see Ref 9 [Section 3.3].
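A minimal sketch of the updating step of Eqs (12) and (13), with our own function name: the row y^T is appended to R and then eliminated, one column at a time, by Givens rotations acting on rows k and n:

```python
import numpy as np

def cholesky_update(R, y):
    """Given upper triangular R with A = R^T R, return upper triangular
    R1 with R1^T R1 = A + y y^T, by retriangularizing [R; y^T] with
    Givens rotations, as in Eqs (12) and (13)."""
    n = R.shape[0]
    M = np.vstack([R, np.reshape(y, (1, n))]).astype(float)
    for k in range(n):
        a, b = M[k, k], M[n, k]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        # Rotate rows k and n to zero the appended row's entry in column k;
        # the new (k, k) entry is r >= 0, so the diagonal stays non-negative.
        rk, rn = M[k, k:].copy(), M[n, k:].copy()
        M[k, k:] = c * rk + s * rn
        M[n, k:] = -s * rk + c * rn
    return M[:n, :]
```

The downdating problem admits no equally simple transcription, for the reasons of cancellation noted above; see Ref 9.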

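The definiteness test described in the section on semidefinite matrices, namely run the factorization and report failure at a non-positive pivot, is in practice a one-line wrapper around a library Cholesky routine. A sketch using NumPy (function name ours):

```python
import numpy as np

def is_positive_definite(A):
    """Declare a symmetric matrix positive definite iff Cholesky
    factorization runs to completion; much cheaper than computing
    all the eigenvalues of A."""
    try:
        np.linalg.cholesky(A)  # raises LinAlgError on breakdown
        return True
    except np.linalg.LinAlgError:
        return False
```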
REFERENCES

1. Higham NJ. Functions of Matrices: Theory and Computation. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2008, xx+425 pp. (ISBN 978-0-898716-46-7).

2. Brezinski C. The life and work of André Cholesky. Numer Algorithms 2006, 43:279–288.

3. Anderson E, Bai Z, Bischof CH, Blackford S, Demmel JW, et al. LAPACK Users' Guide. 3rd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1999, xxvi+407 pp. (ISBN 0-89871-447-8).

4. Dongarra JJ, Du Croz JJ, Duff IS, Hammarling SJ. A set of Level 3 basic linear algebra subprograms. ACM Trans Math Softw 1990, 16:1–17.

5. Higham NJ. Accuracy and Stability of Numerical Algorithms. 2nd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2002, xxx+680 pp. (ISBN 0-89871-521-0).

6. Hammarling S, Higham NJ, Lucas C. LAPACK-style codes for pivoted Cholesky and QR updating. In: Kågström B, Elmroth E, Dongarra J, Waśniewski J, eds. Applied Parallel Computing. State of the Art in Scientific Computing. 8th International Workshop, PARA 2006. Number 4699 in Lecture Notes in Computer Science. Berlin: Springer-Verlag; 2007, 137–146.

7. Higham NJ. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra Appl 1988, 103:103–118.

8. Guo CH, Higham NJ, Tisseur F. An Improved Arc Algorithm for Detecting Definite Hermitian Pairs. MIMS EPrint 2008.115. Manchester: Manchester Institute for Mathematical Sciences, The University of Manchester; 2008.

9. Björck Å. Numerical Methods for Least Squares Problems. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1996, xvii+408 pp. (ISBN 0-89871-360-9).
