
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2014 LECTURE 8

• we look at LU and some of its variants: condensed LU, LDU, LDL^T, and Cholesky

1. existence of lu factorization
• the solution method for a linear system Ax = b depends on the structure of A: A may be sparse or dense, or it may have one of many well-known structures, such as being a banded matrix
• for the general case of a dense, unstructured matrix A, the most common method is to obtain a decomposition A = LU, where L is lower triangular and U is upper triangular
• this decomposition is called the LU factorization or LU decomposition
• we deduce its existence via a constructive proof, namely, Gaussian elimination
• the motivation for this is something you learnt in middle school, i.e., solving Ax = b by eliminating variables

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2\\
&\ \ \vdots\\
a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\]

• we proceed by multiplying the first equation by $-a_{21}/a_{11}$ and adding it to the second equation and, in general, multiplying the first equation by $-a_{i1}/a_{11}$ and adding it to equation $i$; this leaves you with the equivalent system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a^{(2)}_{22}x_2 + \cdots + a^{(2)}_{2n}x_n &= b^{(2)}_2\\
&\ \ \vdots\\
a^{(2)}_{n2}x_2 + \cdots + a^{(2)}_{nn}x_n &= b^{(2)}_n
\end{aligned}
\]
• continuing in this fashion, adding multiples of the second equation to each subsequent equation to make all elements below the diagonal equal to zero, you obtain an upper triangular system and may then solve for all $x_n, x_{n-1}, \dots, x_1$ by back substitution (see the sketch below)
• getting the LU factorization A = LU is very similar; the main difference is that you want not just the final upper triangular matrix (which is your U) but also to keep track of all the elimination steps (which is your L)
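• to make the elimination-plus-back-substitution procedure concrete, here is a minimal sketch in Python with numpy (the language, and the name solve_by_elimination, are illustrative choices, not part of the notes); it assumes every pivot is nonzero, a caveat addressed below

```python
import numpy as np

def solve_by_elimination(A, b):
    """Solve Ax = b by forward elimination then back substitution.
    Assumes every pivot encountered is nonzero (no pivoting)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):                 # eliminate below pivot k
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier a_ik / a_kk
            A[i, k:] -= m * A[k, k:]       # row_i <- row_i - m * row_k
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):         # back substitution
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

# e.g. 2x + y = 3, x + 3y = 5 gives x = 0.8, y = 1.4
print(solve_by_elimination(np.array([[2., 1.], [1., 3.]]), np.array([3., 5.])))
```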

2. Gaussian elimination revisited
• we are going to look at Gaussian elimination in a slightly different light from what you learnt in your undergraduate class
• we think of Gaussian elimination as the process of transforming A into an upper triangular matrix U; this process is equivalent to multiplying A on the left by a sequence of matrices

Date: November 5, 2014, version 1.0. Comments, bug reports: [email protected].

• but instead of elementary matrices, we consider again a rank-1 change to I, i.e., a matrix of the form $I - uv^T$ where $u, v \in \mathbb{R}^n$
• in Householder QR, we used Householder reflection matrices of the form $H = I - 2uu^T$
• in Gaussian elimination, we use so-called Gauss transformation or elimination matrices of the form
\[ M = I - me_i^T \]
where $e_i = [0, \dots, 1, \dots, 0]^T$ is the $i$th standard basis vector
• the same trick that led us to the appropriate $u$ in the Householder matrix can be applied to find the appropriate $m$ too: suppose we want $M_1 = I - m_1e_1^T$ to 'zero out' all the entries beneath the first in a vector $a$, i.e.,
\[ M_1 \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix} = \begin{bmatrix} \gamma\\ 0\\ \vdots\\ 0 \end{bmatrix} \]
i.e.,
\[
\begin{aligned}
(I - m_1e_1^T)a &= \gamma e_1\\
a - (e_1^Ta)m_1 &= \gamma e_1\\
a_1m_1 &= a - \gamma e_1
\end{aligned}
\]

and if $a_1 \neq 0$, then we may set
\[ \gamma = a_1, \qquad m_1 = \begin{bmatrix} 0\\ a_2/a_1\\ \vdots\\ a_n/a_1 \end{bmatrix} \]
• so we get
\[ M_1 = I - m_1e_1^T = \begin{bmatrix} 1 & & & \\ -a_2/a_1 & 1 & & \\ \vdots & & \ddots & \\ -a_n/a_1 & & & 1 \end{bmatrix} \]
and, as required,
\[ M_1a = \begin{bmatrix} 1 & & & \\ -a_2/a_1 & 1 & & \\ \vdots & & \ddots & \\ -a_n/a_1 & & & 1 \end{bmatrix} \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix} = \begin{bmatrix} a_1\\ 0\\ \vdots\\ 0 \end{bmatrix} = a_1e_1 \]
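• in code, the Gauss transformation is just a rank-1 update of the identity; a small numpy sketch (the name gauss_transform is illustrative):

```python
import numpy as np

def gauss_transform(a):
    """Return M1 = I - m1 e1^T that zeroes out a[1:], assuming a[0] != 0."""
    n = len(a)
    m1 = np.zeros(n)
    m1[1:] = a[1:] / a[0]                 # m1 = (0, a2/a1, ..., an/a1)^T
    e1 = np.zeros(n); e1[0] = 1.0
    return np.eye(n) - np.outer(m1, e1)   # I - m1 e1^T

a = np.array([2.0, 4.0, -6.0])
print(gauss_transform(a) @ a)             # [2. 0. 0.], i.e. a1 * e1
```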

• applying this to zero out the entries beneath $a_{11}$ in the first column of a matrix A, we get $M_1A = A_2$ where
\[ A_2 = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ 0 & a^{(2)}_{22} & \cdots & a^{(2)}_{2n}\\ \vdots & \vdots & & \vdots\\ 0 & a^{(2)}_{n2} & \cdots & a^{(2)}_{nn} \end{bmatrix} \]
where the superscripts in parentheses denote that the entries have changed
• we will write
\[ M_1 = \begin{bmatrix} 1 & & & \\ -\ell_{21} & 1 & & \\ \vdots & & \ddots & \\ -\ell_{n1} & & & 1 \end{bmatrix}, \qquad \ell_{i1} = \frac{a_{i1}}{a_{11}} \]
for $i = 2, \dots, n$
• if we do this recursively, defining $M_2$ by
\[ M_2 = \begin{bmatrix} 1 & & & \\ 0 & 1 & & \\ 0 & -\ell_{32} & 1 & \\ \vdots & \vdots & & \ddots \\ 0 & -\ell_{n2} & & & 1 \end{bmatrix}, \qquad \ell_{i2} = \frac{a^{(2)}_{i2}}{a^{(2)}_{22}} \]
for $i = 3, \dots, n$, then
\[ M_2A_2 = A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ 0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2n}\\ 0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3n}\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & a^{(3)}_{n3} & \cdots & a^{(3)}_{nn} \end{bmatrix} \]
• in general, we have
\[ M_k = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & -\ell_{k+1,k} & 1 & \\ & & \vdots & & \ddots \\ & & -\ell_{nk} & & & 1 \end{bmatrix}, \qquad \ell_{ik} = \frac{a^{(k)}_{ik}}{a^{(k)}_{kk}} \]
for $i = k+1, \dots, n$, and
\[ M_{n-1}M_{n-2} \cdots M_1A = A_n \equiv \begin{bmatrix} u_{11} & \cdots & u_{1n}\\ & u_{22} & \cdots & u_{2n}\\ & & \ddots & \vdots\\ & & & u_{nn} \end{bmatrix} \]
or, equivalently,
\[ A = M_1^{-1}M_2^{-1} \cdots M_{n-1}^{-1}U \]
• it turns out that $M_j^{-1}$ is very easy to compute; we claim that
\[ M_1^{-1} = \begin{bmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \vdots & & \ddots & \\ \ell_{n1} & & & 1 \end{bmatrix} \tag{2.1} \]
• to see this, consider the product
\[ M_1M_1^{-1} = \begin{bmatrix} 1 & & & \\ -\ell_{21} & 1 & & \\ \vdots & & \ddots & \\ -\ell_{n1} & & & 1 \end{bmatrix} \begin{bmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \vdots & & \ddots & \\ \ell_{n1} & & & 1 \end{bmatrix} \]
which can easily be verified to be equal to the identity matrix
• in general, we have
\[ M_k^{-1} = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & \ell_{k+1,k} & 1 & \\ & & \vdots & & \ddots \\ & & \ell_{nk} & & & 1 \end{bmatrix} \tag{2.2} \]
• now, consider the product
\[ M_1^{-1}M_2^{-1} = \begin{bmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \ell_{31} & 0 & 1 & \\ \vdots & \vdots & & \ddots \\ \ell_{n1} & 0 & & & 1 \end{bmatrix} \begin{bmatrix} 1 & & & \\ 0 & 1 & & \\ 0 & \ell_{32} & 1 & \\ \vdots & \vdots & & \ddots \\ 0 & \ell_{n2} & & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \ell_{31} & \ell_{32} & 1 & \\ \vdots & \vdots & & \ddots \\ \ell_{n1} & \ell_{n2} & & & 1 \end{bmatrix} \]
• so inductively we get
\[ M_1^{-1}M_2^{-1} \cdots M_{n-1}^{-1} = \begin{bmatrix} 1 & & & & \\ \ell_{21} & 1 & & & \\ \ell_{31} & \ell_{32} & \ddots & & \\ \vdots & \vdots & \ddots & \ddots & \\ \ell_{n1} & \ell_{n2} & \cdots & \ell_{n,n-1} & 1 \end{bmatrix} \]

 1 0 0 ··· 0 u11 u12 ······ u1n `21 1 0 ··· 0  0 u22 ······ u2n  . .  . .   .. .  .. .  L = `31 `32 . ,U =  0 0 .   . . . .   . . . .   ...... 0  . . .. .  `n1 `n2 ··· `n,n−1 1 0 0 ··· 0 unn • what exactly are proper circumstances? (k) • we must have akk 6= 0, or we cannot proceed with the decomposition 4 • for example, if 0 1 11 1 3 4 A = 3 7 2  or A = 2 6 4 2 9 3 7 1 2 Gaussian elimination will fail; note that both matrices are nonsingular • in the first case, it fails immediately; in the second case, it fails after the subdiagonal entries (k) in the first column are zeroed, and we find that a22 = 0 • in general, we must have det Aii 6= 0 for i = 1, . . . , n where   a11 ··· a1i  . .  Aii =  . .  ai1 ··· aii for the LU factorization to exist • the existence of LU factorization (without pivoting) can be guaranteed by several conditions, 1 n×n one example is column diagonal dominance: if a nonsingular A ∈ R satisfies n X |ajj| ≥ |aij|, j = 1, . . . , n, i=1 i6=j then one can guarantee that Gaussian elimination as described above produces A = LU with |`ij| ≤ 1 • there are necessary and sufficient conditions guaranteeing the existence of LU decomposition but those are difficult to check in practice and we do not state them here • how can we obtain the LU factorization for a general non-singular matrix? • if A is nonsingular, then some element of the first column must be nonzero • if ai1 6= 0, then we can interchange row i with row 1 and proceed • this is equivalent to multiplying A by a Π1 that interchanges row 1 and row i: 0 ······ 0 1 0 ··· 0  1   .   ..       1  Π1 =   1 0 ······ 0 ······ 0    1   .   ..  1

• how can we obtain the LU factorization for a general nonsingular matrix?
• if A is nonsingular, then some element of the first column must be nonzero
• if $a_{i1} \neq 0$, then we can interchange row $i$ with row 1 and proceed
• this is equivalent to multiplying A by a permutation matrix $\Pi_1$ that interchanges row 1 and row $i$:
\[ \Pi_1 = \begin{bmatrix} 0 & \cdots & \cdots & 0 & 1 & 0 & \cdots & 0\\ & 1 & & & & & & \\ & & \ddots & & & & & \\ & & & 1 & & & & \\ 1 & 0 & \cdots & \cdots & 0 & \cdots & \cdots & 0\\ & & & & & 1 & & \\ & & & & & & \ddots & \\ & & & & & & & 1 \end{bmatrix} \]
• thus $M_1\Pi_1A = A_2$ (refer to earlier lecture notes for more information about permutation matrices)
• then, since $A_2$ is nonsingular, some element of column 2 of $A_2$ below the diagonal must be nonzero
• proceeding as before, we compute $M_2\Pi_2A_2 = A_3$, where $\Pi_2$ is another permutation matrix
• continuing, we obtain
\[ A = (M_{n-1}\Pi_{n-1} \cdots M_1\Pi_1)^{-1}U \]
• it can easily be shown that $\Pi A = LU$ where $\Pi$ is a permutation matrix (easy but a bit of a pain because the notation is cumbersome)
• so we will be informal but you'll get the idea

• for example, if after two steps we get (recall that permutation matrices are orthogonal matrices)
\[
\begin{aligned}
A &= (M_2\Pi_2M_1\Pi_1)^{-1}A_3\\
&= \Pi_1^TM_1^{-1}\Pi_2^TM_2^{-1}A_3\\
&= \Pi_1^T\Pi_2^T(\Pi_2M_1^{-1}\Pi_2^T)M_2^{-1}A_3\\
&= \Pi^TL_1L_2A_3
\end{aligned}
\]
then
– $\Pi = \Pi_2\Pi_1$ is a permutation matrix
– $L_2 = M_2^{-1}$ is a unit lower triangular matrix
– $L_1 = \Pi_2M_1^{-1}\Pi_2^T$ will always be a unit lower triangular matrix because $M_1^{-1}$ is of the form in (2.1),
\[ M_1^{-1} = \begin{bmatrix} 1 & \\ \ell & I \end{bmatrix} \]

whereas $\Pi_2$ must be of the form
\[ \Pi_2 = \begin{bmatrix} 1 & \\ & \widehat{\Pi}_2 \end{bmatrix} \]

for some $(n-1)\times(n-1)$ permutation matrix $\widehat{\Pi}_2$, and so
\[ \Pi_2M_1^{-1}\Pi_2^T = \begin{bmatrix} 1 & 0\\ \widehat{\Pi}_2\ell & I \end{bmatrix} \]
in other words, $\Pi_2M_1^{-1}\Pi_2^T$ also has the form in (2.1)
• if we do one more step we get
\[
\begin{aligned}
A &= (M_3\Pi_3M_2\Pi_2M_1\Pi_1)^{-1}A_4\\
&= \Pi_1^TM_1^{-1}\Pi_2^TM_2^{-1}\Pi_3^TM_3^{-1}A_4\\
&= \Pi_1^T\Pi_2^T\Pi_3^T(\Pi_3\Pi_2M_1^{-1}\Pi_2^T\Pi_3^T)(\Pi_3M_2^{-1}\Pi_3^T)M_3^{-1}A_4\\
&= \Pi^TL_1L_2L_3A_4
\end{aligned}
\]
where
– $\Pi = \Pi_3\Pi_2\Pi_1$ is a permutation matrix
– $L_3 = M_3^{-1}$ is a unit lower triangular matrix
– $L_2 = \Pi_3M_2^{-1}\Pi_3^T$ will always be a unit lower triangular matrix because $M_2^{-1}$ is of the form in (2.2),
\[ M_2^{-1} = \begin{bmatrix} 1 & & \\ & 1 & \\ & \ell & I \end{bmatrix} \]

whereas $\Pi_3$ must be of the form
\[ \Pi_3 = \begin{bmatrix} 1 & & \\ & 1 & \\ & & \widehat{\Pi}_3 \end{bmatrix} \]

for some $(n-2)\times(n-2)$ permutation matrix $\widehat{\Pi}_3$, and so
\[ \Pi_3M_2^{-1}\Pi_3^T = \begin{bmatrix} 1 & & \\ & 1 & 0\\ & \widehat{\Pi}_3\ell & I \end{bmatrix} \]
in other words, $\Pi_3M_2^{-1}\Pi_3^T$ also has the form in (2.2)
– $L_1 = \Pi_3\Pi_2M_1^{-1}\Pi_2^T\Pi_3^T$ will always be a unit lower triangular matrix for the same reason as above, because $\Pi_3\Pi_2$ must have the form
\[ \Pi_3\Pi_2 = \begin{bmatrix} 1 & & \\ & 1 & \\ & & \widehat{\Pi}_3 \end{bmatrix} \begin{bmatrix} 1 & \\ & \widehat{\Pi}_2 \end{bmatrix} = \begin{bmatrix} 1 & \\ & \Pi_{32} \end{bmatrix} \]
for some $(n-1)\times(n-1)$ permutation matrix
\[ \Pi_{32} = \begin{bmatrix} 1 & \\ & \widehat{\Pi}_3 \end{bmatrix} \widehat{\Pi}_2 \]
• more generally, if we keep doing this, then
\[ A = \Pi^TL_1L_2 \cdots L_{n-1}A_n \]
where
– $\Pi = \Pi_{n-1}\Pi_{n-2} \cdots \Pi_1$ is a permutation matrix
– $L_{n-1} = M_{n-1}^{-1}$ is a unit lower triangular matrix
– $L_k = \Pi_{n-1} \cdots \Pi_{k+1}M_k^{-1}\Pi_{k+1}^T \cdots \Pi_{n-1}^T$ is a unit lower triangular matrix for all $k = 1, \dots, n-2$
– $A_n = U$ is an upper triangular matrix
– $L = L_1L_2 \cdots L_{n-1}$ is a unit lower triangular matrix, so that $\Pi A = LU$
• this algorithm with the row permutations is called Gaussian elimination with partial pivoting or gepp for short (a sketch follows); we will say more in the next section
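• the whole gepp derivation collapses into a few lines of code; a minimal sketch (Python with numpy, the name lu_partial_pivot is illustrative) returning $\Pi$, L, U with $\Pi A = LU$:

```python
import numpy as np

def lu_partial_pivot(A):
    """Sketch of gepp: returns P, L, U with P @ A = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    piv = np.arange(n)                         # tracks the row interchanges
    L = np.zeros((n, n))
    for k in range(n - 1):
        j = k + np.argmax(np.abs(A[k:, k]))    # largest entry on/below the pivot
        A[[k, j]] = A[[j, k]]                  # interchange rows k and j
        L[[k, j], :k] = L[[j, k], :k]          # ... and the stored multipliers
        piv[[k, j]] = piv[[j, k]]
        L[k+1:, k] = A[k+1:, k] / A[k, k]      # multipliers, |l_ik| <= 1
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    np.fill_diagonal(L, 1.0)
    return np.eye(n)[piv], L, np.triu(A)

A = np.array([[0., 1., 1.], [3., 7., 2.], [2., 9., 3.]])   # first example above
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))               # True
```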

3. pivoting strategies
• the $(k,k)$ entry at step $k$ during Gaussian elimination is called the pivoting entry or just pivot for short
• in the preceding section, we said that if the pivoting entry is zero, i.e., $a^{(k)}_{kk} = 0$, then we just need to find a nonzero entry below it in the same column, i.e., $a^{(k)}_{jk}$ for some $j > k$, and then permute this entry into the pivoting position, before carrying on with the algorithm
• but it is really better to choose the largest entry on or below the pivot, and not just any nonzero entry
• that is, the permutation $\Pi_k$ is chosen so that row $k$ is interchanged with row $j$, where
\[ |a^{(k)}_{jk}| = \max_{i=k, k+1, \dots, n} |a^{(k)}_{ik}| \]
• this guarantees that $|\ell_{ik}| \le 1$ for all $i$ and $k$
• this strategy is known as partial pivoting, which is guaranteed to produce an LU factorization if $A \in \mathbb{R}^{m\times n}$ has full row rank, i.e., $\operatorname{rank}(A) = m \le n$
• another common strategy, complete pivoting, uses both row and column interchanges to ensure that at step $k$ of the algorithm, the element $a^{(k)}_{kk}$ is the largest element in absolute value in the entire submatrix obtained by deleting the first $k-1$ rows and columns, i.e.,
\[ |a^{(k)}_{kk}| = \max_{\substack{i=k, k+1, \dots, n\\ j=k, k+1, \dots, n}} |a^{(k)}_{ij}| \]
• in this case we need both row and column permutation matrices, i.e., we get
\[ \Pi_1A\Pi_2 = LU \]
when we do complete pivoting (a sketch follows)
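• a corresponding sketch of complete pivoting (again Python with numpy, the name lu_complete_pivot is illustrative), returning $\Pi_1, \Pi_2, L, U$ with $\Pi_1A\Pi_2 = LU$; it stops early when the trailing submatrix is zero, which is what happens in the rank-deficient case discussed next:

```python
import numpy as np

def lu_complete_pivot(A):
    """Sketch of complete pivoting: returns P1, P2, L, U with P1 @ A @ P2 = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    rp, cp = np.arange(n), np.arange(n)        # row and column interchanges
    L = np.eye(n)
    for k in range(n - 1):
        # position of the largest |entry| in the trailing submatrix A[k:, k:]
        i, j = np.unravel_index(np.argmax(np.abs(A[k:, k:])), (n - k, n - k))
        i, j = i + k, j + k
        if A[i, j] == 0.0:
            break                              # trailing submatrix is all zero
        A[[k, i], :] = A[[i, k], :]; rp[[k, i]] = rp[[i, k]]
        A[:, [k, j]] = A[:, [j, k]]; cp[[k, j]] = cp[[j, k]]
        L[[k, i], :k] = L[[i, k], :k]          # keep multipliers consistent
        L[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    return np.eye(n)[rp], np.eye(n)[:, cp], L, np.triu(A)
```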

• complete pivoting is necessary when $\operatorname{rank}(A) < \min\{m, n\}$
• there are yet other pivoting strategies due to considerations such as preserving sparsity (if you're interested, look up minimum degree algorithm or Markowitz algorithm) or a tradeoff between partial and complete pivoting (e.g., rook pivoting)

4. uniqueness of the LU factorization
• the LU decomposition of a nonsingular matrix, if it exists (i.e., without row or column permutations), is unique
• suppose A has two LU decompositions, $A = L_1U_1$ and $A = L_2U_2$
• from $L_1U_1 = L_2U_2$ we obtain $L_2^{-1}L_1 = U_2U_1^{-1}$
• the inverse of a unit lower triangular matrix is a unit lower triangular matrix, and the product of two unit lower triangular matrices is a unit lower triangular matrix, so $L_2^{-1}L_1$ must be a unit lower triangular matrix
• similarly, $U_2U_1^{-1}$ is an upper triangular matrix
• the only matrix that is both upper triangular and unit lower triangular is the identity matrix $I$, so we must have $L_1 = L_2$ and $U_1 = U_2$

5. Gauss–Jordan elimination
• a variant of Gaussian elimination is called Gauss–Jordan elimination
• it entails zeroing elements above the diagonal as well as below, transforming an $m \times n$ matrix into reduced row echelon form, i.e., a form where all pivoting entries in U are 1 and all entries above the pivots are zeros
• this is what you probably learnt in your undergraduate linear algebra class, e.g.,
\[ A = \begin{bmatrix} 1 & 3 & 1 & 9\\ 1 & 1 & -1 & 1\\ 3 & 11 & 5 & 35 \end{bmatrix} \to \begin{bmatrix} 1 & 3 & 1 & 9\\ 0 & -2 & -2 & -8\\ 0 & 2 & 2 & 8 \end{bmatrix} \to \begin{bmatrix} 1 & 3 & 1 & 9\\ 0 & -2 & -2 & -8\\ 0 & 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -2 & -3\\ 0 & 1 & 1 & 4\\ 0 & 0 & 0 & 0 \end{bmatrix} = U \]
• the main drawback is that the elimination process can be numerically unstable, since the multipliers can be large
• furthermore, the way it is done in undergraduate linear algebra courses, the elimination matrices (i.e., the L and $\Pi$) are not stored (a sketch follows)
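• a minimal sketch of Gauss–Jordan reduction to reduced row echelon form (Python with numpy, the name rref is illustrative; a pivot search is added for stability even though the hand computation above does not use one):

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to reduced row echelon form by Gauss-Jordan elimination."""
    A = A.astype(float).copy()
    m, n = A.shape
    r = 0                                      # index of the next pivot row
    for c in range(n):
        j = r + np.argmax(np.abs(A[r:, c]))    # best available pivot in column c
        if abs(A[j, c]) < tol:
            continue                           # no pivot in this column
        A[[r, j]] = A[[j, r]]                  # move the pivot row up
        A[r] /= A[r, c]                        # scale the pivot to 1
        for i in range(m):
            if i != r:
                A[i] -= A[i, c] * A[r]         # eliminate above AND below
        r += 1
        if r == m:
            break
    return A

A = np.array([[1., 3., 1., 9.], [1., 1., -1., 1.], [3., 11., 5., 35.]])
print(rref(A))      # reproduces the U in the example above
```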

6. condensed LU factorization
• just like QR and svd, LU factorization with complete pivoting has a condensed form too
• let $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = r \le \min\{m, n\}$; recall that Gaussian elimination with complete pivoting yields

\[ \Pi_1A\Pi_2 = LU = \begin{bmatrix} L_{11} & 0\\ L_{21} & I_{m-r} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12}\\ 0 & 0 \end{bmatrix} = \begin{bmatrix} L_{11}\\ L_{21} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \end{bmatrix} =: \widetilde{L}\widetilde{U} \]

where $L_{11} \in \mathbb{R}^{r\times r}$ is unit lower triangular (thus nonsingular) and $U_{11} \in \mathbb{R}^{r\times r}$ is also nonsingular
• note that $\widetilde{L} \in \mathbb{R}^{m\times r}$ and $\widetilde{U} \in \mathbb{R}^{r\times n}$, and so
\[ A = (\Pi_1^T\widetilde{L})(\widetilde{U}\Pi_2^T) \]
is a rank-retaining factorization (see the sketch below)
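• numerically, the condensed form just keeps the first r columns of L and the first r rows of U; a sketch reusing the lu_complete_pivot function from the pivoting sketch above (assumed to be in scope):

```python
import numpy as np

A = np.outer([1., 2., 3.], [4., 5., 6.])       # rank(A) = 1
P1, P2, L, U = lu_complete_pivot(A)            # sketch from section 3
r = 1
Lt, Ut = L[:, :r], U[:r, :]                    # L-tilde (m x r), U-tilde (r x n)
print(np.allclose(A, P1.T @ (Lt @ Ut) @ P2.T)) # True: rank-retaining
```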

7. LDU and LDL^T factorizations
• if $A \in \mathbb{R}^{n\times n}$ has nonsingular principal submatrices $A_{1:k,1:k}$ for $k = 1, \dots, n$, then there exist a unit lower triangular matrix $L \in \mathbb{R}^{n\times n}$, a unit upper triangular matrix $U \in \mathbb{R}^{n\times n}$, and a diagonal matrix $D = \operatorname{diag}(d_{11}, \dots, d_{nn}) \in \mathbb{R}^{n\times n}$ such that
\[ A = LDU = \begin{bmatrix} 1 & & & \\ \ell_{21} & 1 & & \\ \vdots & & \ddots & \\ \ell_{n1} & \ell_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} d_{11} & & & \\ & d_{22} & & \\ & & \ddots & \\ & & & d_{nn} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & \cdots & u_{1n}\\ & 1 & & u_{2n}\\ & & \ddots & \vdots\\ & & & 1 \end{bmatrix} \]
• this is called the LDU factorization of A
• if A is furthermore symmetric, then $L = U^T$ and this is called the LDL^T factorization
• if they exist, then both LDU and LDL^T factorizations are unique (exercise)
• if a symmetric A has an LDL^T factorization and $d_{ii} > 0$ for all $i = 1, \dots, n$, then A is positive definite
• in fact, even though $d_{11}, \dots, d_{nn}$ are not the eigenvalues of A (why not?), they must have the same signs as the eigenvalues of A, i.e., if A has p positive eigenvalues, q negative eigenvalues, and z zero eigenvalues, then there are exactly p, q, and z positive, negative, and zero entries in $d_{11}, \dots, d_{nn}$ (a consequence of the Sylvester law of inertia)
• unfortunately, both LDU and LDL^T factorizations are difficult to compute because
– the condition on the principal submatrices is difficult to check in advance
– algorithms for computing them are invariably unstable because the size of the multipliers cannot be bounded in terms of the entries of A
• for example, the LDL^T factorization of a $2\times 2$ symmetric matrix is
\[ \begin{bmatrix} a & c\\ c & d \end{bmatrix} = \begin{bmatrix} 1 & 0\\ c/a & 1 \end{bmatrix} \begin{bmatrix} a & c\\ 0 & d - (c/a)c \end{bmatrix} = \begin{bmatrix} 1 & 0\\ c/a & 1 \end{bmatrix} \begin{bmatrix} a & 0\\ 0 & d - (c/a)c \end{bmatrix} \begin{bmatrix} 1 & c/a\\ 0 & 1 \end{bmatrix} \]
• so with
\[ \begin{bmatrix} \varepsilon & 1\\ 1 & \varepsilon \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 1/\varepsilon & 1 \end{bmatrix} \begin{bmatrix} \varepsilon & 0\\ 0 & \varepsilon - 1/\varepsilon \end{bmatrix} \begin{bmatrix} 1 & 1/\varepsilon\\ 0 & 1 \end{bmatrix} \]
the elements of L and D can be arbitrarily large when $|\varepsilon|$ is small (see the sketch below)
• nonetheless there is one special case when the LDL^T factorization not only exists but can be computed in an efficient and stable way: when A is positive definite
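• a minimal sketch of the LDL^T recurrence (Python with numpy, the name ldlt is illustrative), run on the $\varepsilon$ example to show the blow-up:

```python
import numpy as np

def ldlt(A):
    """LDL^T of a symmetric A, assuming all leading principal submatrices
    are nonsingular; no pivoting, hence unstable in general."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = A[k, k] - (L[k, :k] ** 2) @ d[:k]
        for i in range(k + 1, n):
            L[i, k] = (A[i, k] - (L[i, :k] * L[k, :k]) @ d[:k]) / d[k]
    return L, d

eps = 1e-8
L, d = ldlt(np.array([[eps, 1.0], [1.0, eps]]))
print(L[1, 0], d)    # 1/eps = 1e8 and d = [eps, eps - 1/eps]: huge entries
```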

8. positive definite matrices
• a symmetric matrix $A \in \mathbb{R}^{n\times n}$ is positive definite if $x^TAx > 0$ for all nonzero $x$
• a symmetric positive definite matrix has real and positive eigenvalues, and its leading principal submatrices all have positive determinants
• from the definition, it is easy to see that all diagonal elements are positive
• to solve the system $Ax = b$ where A is symmetric positive definite, we can compute the Cholesky factorization $A = R^TR$ where R is upper triangular
• this factorization exists if and only if A is symmetric positive definite
• in fact, attempting to compute the Cholesky factorization of A is an efficient method for checking whether A is symmetric positive definite
• it is important to distinguish the Cholesky factorization from the square root factorization
• a square root of a matrix A is defined as a matrix S such that $S^2 = SS = A$
• note that the matrix R in $A = R^TR$ is not the square root of A, since it does not hold that $R^2 = A$ unless A is a diagonal matrix
• the square root of a symmetric positive definite A can be computed by using the fact that A has an eigendecomposition $A = U\Lambda U^T$ where $\Lambda$ is a diagonal matrix whose diagonal elements are the positive eigenvalues of A and U is an orthogonal matrix whose columns are the eigenvectors of A
• it follows that
\[ A = U\Lambda U^T = (U\Lambda^{1/2}U^T)(U\Lambda^{1/2}U^T) = SS \]
and so $S = U\Lambda^{1/2}U^T$ is a square root of A (see the sketch below)
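• a minimal sketch of the square root via the eigendecomposition (numpy's eigh; the name sqrtm_spd is illustrative), contrasted with the Cholesky factor:

```python
import numpy as np

def sqrtm_spd(A):
    """Square root S = U Lambda^{1/2} U^T of a symmetric positive definite A."""
    lam, U = np.linalg.eigh(A)                 # symmetric eigendecomposition
    return U @ np.diag(np.sqrt(lam)) @ U.T

A = np.array([[4.0, 1.0], [1.0, 3.0]])
S = sqrtm_spd(A)
R = np.linalg.cholesky(A).T                    # upper triangular, A = R^T R
print(np.allclose(S @ S, A))                   # True: S is a square root
print(np.allclose(R.T @ R, A), np.allclose(R @ R, A))   # True False: R is not
```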

9. cholesky factorization
• the Cholesky factorization can be computed directly from the matrix equation $A = R^TR$ where R is upper triangular
• while it is conventional to write the Cholesky factorization in the form $A = R^TR$, it will be more natural later, when we discuss the vectorized version of the algorithm, to write $F = R^T$ and $A = FF^T$
• we can derive the algorithm for computing F by examining the matrix equation $A = R^TR = FF^T$ on an element-by-element basis, writing
\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{12} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} f_{11} & & & \\ f_{21} & f_{22} & & \\ \vdots & \vdots & \ddots & \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix} \begin{bmatrix} f_{11} & f_{21} & \cdots & f_{n1}\\ & f_{22} & & f_{n2}\\ & & \ddots & \vdots\\ & & & f_{nn} \end{bmatrix} \]

• from the above we see that $f_{11}^2 = a_{11}$, from which it follows that
\[ f_{11} = \sqrt{a_{11}} \]

• from the relationship $f_{11}f_{i1} = a_{i1}$ and the fact that we already know $f_{11}$, we obtain

\[ f_{i1} = \frac{a_{i1}}{f_{11}}, \qquad i = 2, \dots, n \]

• proceeding to the second column of F, we see that $f_{21}^2 + f_{22}^2 = a_{22}$
• since we already know $f_{21}$, we have
\[ f_{22} = \sqrt{a_{22} - f_{21}^2} \]

• if you know the fact that a positive definite matrix must have positive leading principal minors, then you can deduce that the term above in the square root is positive by examining the $2\times 2$ leading principal minor:
\[ a_{11}a_{22} - a_{12}^2 > 0 \]
and therefore
\[ a_{22} > \frac{a_{12}^2}{a_{11}} = f_{21}^2 \]

• next, we use the relation $f_{21}f_{i1} + f_{22}f_{i2} = a_{i2}$ to compute

\[ f_{i2} = \frac{a_{i2} - f_{21}f_{i1}}{f_{22}}, \qquad i = 3, \dots, n \]
• hence we get

\[
\begin{aligned}
a_{11} &= f_{11}^2, & a_{i1} &= f_{11}f_{i1}, \quad i = 2, \dots, n\\
&\ \ \vdots\\
a_{kk} &= f_{k1}^2 + f_{k2}^2 + \cdots + f_{kk}^2, & a_{ik} &= f_{k1}f_{i1} + \cdots + f_{kk}f_{ik}, \quad i = k+1, \dots, n
\end{aligned}
\]
• the resulting algorithm, which runs for $k = 1, \dots, n$, is

\[ f_{kk} = \Bigl(a_{kk} - \sum_{j=1}^{k-1} f_{kj}^2\Bigr)^{1/2}, \qquad f_{ik} = \frac{a_{ik} - \sum_{j=1}^{k-1} f_{kj}f_{ij}}{f_{kk}}, \quad i = k+1, \dots, n \]
• you could use induction to show that the term in the square root is always positive, but we'll soon see a more elegant vectorized version showing that this algorithm never requires taking square roots of negative numbers
• this algorithm requires roughly half as many operations as Gaussian elimination (a sketch follows)
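• the elementwise recurrence translates directly into code; a minimal sketch (Python with numpy, the name cholesky_elementwise is illustrative):

```python
import numpy as np

def cholesky_elementwise(A):
    """Compute lower triangular F with A = F F^T, one column at a time,
    following the recurrence above; A must be symmetric positive definite."""
    n = A.shape[0]
    F = np.zeros((n, n))
    for k in range(n):
        F[k, k] = np.sqrt(A[k, k] - F[k, :k] @ F[k, :k])
        for i in range(k + 1, n):
            F[i, k] = (A[i, k] - F[k, :k] @ F[i, :k]) / F[k, k]
    return F

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
F = cholesky_elementwise(A)
print(np.allclose(F @ F.T, A))     # True
```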

• instead of considering an elementwise algorithm, we can also derive a vectorized version
• we use the relationship $A = FF^T = \sum_k f_kf_k^T$, where $f_k$ is the $k$th column of F, to compute one column of F at a time
• we start by computing
\[ f_1 = \frac{1}{\sqrt{a_{11}}}\, a_1 \]
where $a_i$ is the $i$th column of A
• then we set $A^{(1)} = A$ and compute

0 0 ··· 0 0  (2) (1) T   A = A − f1f1 = .  . A2  0

• note that
\[ A^{(1)} = B \begin{bmatrix} 1 & 0\\ 0 & A_2 \end{bmatrix} B^T \]

where B is the identity matrix with its first column replaced by $f_1$
• writing $C = B^{-1}$, we see that $A_2$ is positive definite, since
\[ \begin{bmatrix} 1 & 0\\ 0 & A_2 \end{bmatrix} = CAC^T \]
is positive definite:
\[ x^TA_2x = \begin{bmatrix} 0\\ x \end{bmatrix}^T \begin{bmatrix} 1 & 0\\ 0 & A_2 \end{bmatrix} \begin{bmatrix} 0\\ x \end{bmatrix} = (C^Ty)^TA(C^Ty) > 0, \qquad y = \begin{bmatrix} 0\\ x \end{bmatrix}, \]
for all $x \neq 0$ (or, if you know the Sylvester law of inertia, you can apply it to deduce the same thing)
• so we may repeat the process on $A_2$
• we partition the matrix $A_2$ into columns, writing $A_2 = \bigl[a_2^{(2)}\ a_3^{(2)}\ \cdots\ a_n^{(2)}\bigr]$, and then compute
\[ f_2 = \frac{1}{\sqrt{a_{22}^{(2)}}} \begin{bmatrix} 0\\ a_2^{(2)} \end{bmatrix} \]
• we then compute
\[ A^{(3)} = A^{(2)} - f_2f_2^T \]
and so on
• note that
\[ a_{kk} = f_{k1}^2 + f_{k2}^2 + \cdots + f_{kk}^2 \]
which implies that
\[ |f_{ki}| \le \sqrt{a_{kk}} \]
• in other words, the elements of F are bounded
• we also have the relationship
\[ \det A = \det F \det F^T = (\det F)^2 = f_{11}^2f_{22}^2 \cdots f_{nn}^2 \]
• is the Cholesky decomposition unique?
• employing a similar approach to the one used to prove the uniqueness of the LU factorization, we assume that A has two Cholesky factorizations, $A = F_1F_1^T = F_2F_2^T$
• then
\[ F_2^{-1}F_1 = F_2^TF_1^{-T} \]
but the left-hand side is lower triangular while the right-hand side is upper triangular, so both matrices must be diagonal
• let
\[ F_2^{-1}F_1 = D = F_2^TF_1^{-T} \]
• so $F_1 = F_2D$ and thus $F_1^{-T} = F_2^{-T}D^{-1}$, which gives $F_2^TF_1^{-T} = D^{-1}$
• in other words, $D^{-1} = D$, or $D^2 = I$
• hence D must have diagonal elements equal to $\pm 1$
• since we require that the diagonal elements be positive, it follows that the factorization is unique
• in computing the Cholesky factorization, no row interchanges are necessary because A is positive definite, so the number of operations required to compute F is approximately $n^3/3$
• a simple variant of the Cholesky factorization algorithm yields the LDL^T factorization $A = LDL^T$, where L is a unit lower triangular matrix and D is a diagonal matrix with positive diagonal elements
• the algorithm is sometimes called the square-root-free Cholesky factorization since, unlike the usual Cholesky factorization, it does not require taking square roots (which can be expensive; most computer hardware and software use the Newton–Raphson method to extract square roots)
• the LDL^T and Cholesky factorizations are related by $F = LD^{1/2}$ (see the sketch below)
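• the vectorized (outer-product) version is even shorter; a minimal sketch (Python with numpy, the name cholesky_outer is illustrative), checked against numpy's built-in Cholesky:

```python
import numpy as np

def cholesky_outer(A):
    """Vectorized Cholesky: at step k, scale column k of the current A^(k)
    to get f_k, then subtract the rank-1 piece f_k f_k^T, as derived above."""
    Ak = A.astype(float).copy()
    n = A.shape[0]
    F = np.zeros((n, n))
    for k in range(n):
        F[:, k] = Ak[:, k] / np.sqrt(Ak[k, k])   # f_k; zero above position k
        Ak -= np.outer(F[:, k], F[:, k])         # A^(k+1) = A^(k) - f_k f_k^T
    return F

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
F = cholesky_outer(A)
print(np.allclose(F, np.linalg.cholesky(A)))     # True (F = L D^{1/2} in LDL^T terms)
```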
