AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Deﬁnite Systems; Cholesky Factorization

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric Positive-Definite Matrices n×n Symmetric matrix A 2 R is symmetric positive definite (SPD) if T n x Ax > 0 for x 2 R nf0g n×n Hermitian matrix A 2 C is Hermitian positive definite (HPD) if ∗ n x Ax > 0 for x 2 C nf0g SPD matrices have positive real eigenvalues and orthogonal eigenvectors Note: Most textbooks only talk about SPD or HPD matrices, but a positive-definite matrix does not need to be symmetric or Hermitian! A real matrix A is positive definite iff A + AT is SPD T n If x Ax ≥ 0 for x 2 R nf0g, then A is said to be positive semidefinite Xiangmin Jiao Numerical Analysis I 2 / 11 Properties of Symmetric Positive-Definite Matrices SPD matrix often arises as Hessian matrix of some convex functional I E.g., least squares problems; partial differential equations If A is SPD, then A is nonsingular Let X be any n × m matrix with full rank and n ≥ m. Then T I X X is symmetric positive definite, and T I XX is symmetric positive semidefinite n×m If A is n × n SPD and X 2 R has full rank and n ≥ m, then X T AX is SPD Any principal submatrix (picking some rows and corresponding columns) of A is SPD and aii > 0 Xiangmin Jiao Numerical Analysis I 3 / 11 Cholesky Factorization If A is symmetric positive definite, then there is factorization of A A = RT R where R is upper triangular, and all its diagonal entries are positive Note: Textbook calls is “Cholesky decomposition”, but “Cholesky factorization” is a better term Key idea: take advantage and preserve symmetry and positive-definiteness during factorization Eliminate below diagonal and to the right of diagonal T T a11 b r11 0 r11 b =r11 A = = T b K b=r11 I 0 K − bb =a11 T r11 0 1 0 r11 b =r11 T = T = R1 A1R1 b=r11 I 0 K − bb =a11 0 I p where r11 = a11, where a11 > 0 T −T −1 K − bb =a11 is principal submatrix of SPD A1 = R1 AR1 and therefore is SPD, with positive diagonal entries Xiangmin Jiao Numerical Analysis I 4 / 11 Cholesky Factorization Apply recursively to obtain T T T T A = R1 R2 ··· Rn (Rn ··· R2R1) = R R; rjj > 0 which is known as Cholesky factorization How to obtain R from Rn, ::: , R2, R1? r 0 1 0 r sT A = 11 11 s I 0 A1 0 I T r11 0 1 0 1 0 r11 s = T s I 0 R~ 0 R~1 0 I r 0 r sT = 11 11 = RT R s R~ T 0 R~ T T R is “union” of kth rows of Rk (R is “union” of columns of Rk ) Matrix A1 is called the Schur complement of a11 in A Xiangmin Jiao Numerical Analysis I 5 / 11 Existence and Uniqueness Every SPD matrix has a unique Cholesky factorization I Exists because algorithm for Cholesky factorization always works for SPD matrices p I Unique because once α = a11 is determined at each step, entire column w/α is determined Question: How to check whether a symmetric matrix is positive definite? Answer: Run Cholesky factorization and it succeeds iff the matrix is positive definite. Xiangmin Jiao Numerical Analysis I 6 / 11 Algorithm of Cholesky Factorization n×n T Factorize SPD matrix A 2 R into A = R R Algorithm: Cholesky factorization R = A for k = 1 to n for j = k + 1 to n r r − (r =r )r j;j:n jp;j:n kj kk k;j:n rk;k:n rk;k:n= rkk Note: rj;j:n denotes subvector of jth row with columns j; j + 1;:::; n Operation count n n n k n X X X X X n3 2(n − j) ≈ 2 j ≈ k2 ≈ 3 k=1 j=k+1 k=1 j=1 k=1 In practice, R overwrites A, and only upper-triangular part is stored. Xiangmin Jiao Numerical Analysis I 7 / 11 Bordered Form of Cholesky Factorization Another version of Cholesky factorization is the bordered form Let Aj denote the j × j leading principle submatrix of A. T Aj−1 c Rj−1 0 Rj−1 h Aj = T = T ; c ajj h rjj 0 rjj T T T 2 where Aj−1 = Rj−1Rj−1, c = Rj−1h, and ajj = h h + rjj Therefore, q −T T h = Rj−1c; rjj = ajj − h h: h is solved for using forward substitution Xiangmin Jiao Numerical Analysis I 8 / 11 Notes on Cholesky Factorization Implementations I Different versions of Cholesky factorization can all use block-matrix operators to achieve better performance, and actual performance depends on the sizes of blocks I Different versions may have different amount of parallelism Hermitian positive definite (HPD) matrices ∗ I Cholesky factorization A = R R, exists for HPD matrices, where R is upper-triangular and its diagonal entries are positive real values Question: What happens if A is symmetric but not positive definite? Xiangmin Jiao Numerical Analysis I 9 / 11 LDLT Factorization Cholesky factorization is sometimes given by A = LDLT where D is diagonal matrix and L is unit lower triangular matrix This avoids computing square roots Symmetric indefinite systems can be factorized with PAPT = LDLT , where I P is a permutation matrix I D is block diagonal with 1 × 1 and 2 × 2 blocks I its cost is similar to Cholesky factorization Xiangmin Jiao Numerical Analysis I 10 / 11 Banded Positive Definite Systems A matrix A is banded if there is a narrow band around the main diagonal such that all of the entries of A outside of the band are zero If A is n × n, and there is an s n such that aij = 0 whenever ji − jj > s, then we say A is banded with bandwidth 2s + 1 For symmetric matrices, only half of band is stored. We say that A has semi-bandwidth s. Theorem Let A be a banded, symmetric positive definite matrix with semi-bandwidth s. Then its Cholesky factor R also has semi-bandwidth s. This is easy to prove using bordered form of Cholesky factorization Total flop count of Cholesky factorization is only ∼ ns2 However, A−1 of a banded matrix may be dense, so it is not economical to compute A−1 Xiangmin Jiao Numerical Analysis I 11 / 11.

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Deﬁnite Systems; Cholesky Factorization

Conjugate Gradient Method (Part 4)  Pre-Conditioning  Nonlinear Conjugate Gradient Method

Hybrid Algorithms for Efficient Cholesky Decomposition And

Matrix Inversion Using Cholesky Decomposition

1. Positive Definite Matrices a Matrix a Is Positive Deﬁnite If X>Ax > 0 for All Nonzero X

Cholesky Decomposition 1 Cholesky Decomposition

The CMA Evolution Strategy: a Tutorial

Reducing Dimensionality in Text Mining Using Conjugate Gradients and Hybrid Cholesky Decomposition

Stable Computations of Generalized Inverses of Positive Semidefinite

Hierarchical Sparse Cholesky Decomposition with Applications to High-Dimensional Spatio-Temporal ﬁltering

LAPACK Working Note

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators LAPACK Working Note #223

A Simple Yet Efficient Rank One Update for Covariance Matrix