Notes on LU

Robert A. van de Geijn
Department of Computer Science
The University of Texas
Austin, TX 78712
[email protected]

October 27, 2014

The LU factorization is also known as the LU decomposition and the operations it performs are equivalent to those performed by Gaussian elimination. We STRONGLY recommend that the reader consult "Linear Algebra: Foundations to Frontiers - Notes to LAFF With" [12], Weeks 6 and 7.

1 Definition and Existence

Definition 1 (LU factorization (decomposition)). Given a matrix A ∈ C^{m×n} with m ≥ n, its LU factorization is given by A = LU, where L ∈ C^{m×n} is unit lower trapezoidal and U ∈ C^{n×n} is upper triangular.

The first question we will ask is when the LU factorization exists. For this, we need a definition.

Definition 2. The k × k principal leading submatrix of a matrix A is defined to be the square matrix ATL ∈ C^{k×k} such that A = [ ATL ATR ; ABL ABR ].

This definition allows us to indicate when a matrix has an LU factorization:

Theorem 3 (Existence). Let A ∈ C^{m×n} with m ≥ n have linearly independent columns. Then A has a unique LU factorization if and only if all its principal leading submatrices are nonsingular.

The proof of this theorem is a bit involved and can be found in Section 4.
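As a small numerical illustration of the theorem (the matrix and variable names below are ours, not from the notes): the matrix A below is nonsingular, but its 1 × 1 principal leading submatrix is singular, so no (unpivoted) LU factorization exists.

```python
import numpy as np

# A is nonsingular, but its 1x1 principal leading submatrix (the (0,0)
# entry) is singular, so by the existence theorem A has no LU factorization.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

det_1x1 = np.linalg.det(A[:1, :1])   # determinant of the 1x1 leading submatrix
det_2x2 = np.linalg.det(A)           # determinant of A itself

# Elimination breaks down immediately: the first pivot is zero, so the
# multiplier l21 = a21 / alpha11 cannot be formed.
alpha11 = A[0, 0]
```

Indeed, writing A = LU would force υ11 = α11 = 0, and then a21 = l21 υ11 = 0 would be required, contradicting a21 = 1.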

2 LU Factorization

We are going to present two different ways of deriving the most commonly known algorithm. The first is a straightforward derivation. The second presents the operation as the application of a sequence of Gauss transforms.

2.1 First derivation

Partition A, L, and U as follows:

    A → [ α11 a12^T ; a21 A22 ],  L → [ 1 0 ; l21 L22 ],  and  U → [ υ11 u12^T ; 0 U22 ].

Then A = LU means that

    [ α11 a12^T ; a21 A22 ] = [ 1 0 ; l21 L22 ] [ υ11 u12^T ; 0 U22 ] = [ υ11 u12^T ; l21 υ11 (l21 u12^T + L22 U22) ].

This means that

    α11 = υ11         a12^T = u12^T
    a21 = υ11 l21     A22 = l21 u12^T + L22 U22

or, equivalently,

    α11 = υ11         a12^T = u12^T
    a21 = υ11 l21     A22 − l21 u12^T = L22 U22.

If we let U overwrite the original matrix A, this suggests the algorithm

• l21 := a21/α11.

• a21 := 0.

• A22 := A22 − l21 a12^T.

• Continue by overwriting the updated A22 with its LU factorization. This is captured in the algorithm in Figure 1.
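The steps above can be sketched in a few lines. The following is a minimal unpivoted sketch (the function name and test matrix are ours, not from the notes), storing L strictly below the diagonal of A and U on and above it:

```python
import numpy as np

def lu_right_looking(A):
    """Unpivoted right-looking LU: overwrite a copy of A with L (strictly
    below the diagonal; the unit diagonal of L is implicit) and U (on and
    above the diagonal). Assumes no zero pivots arise."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]                               # l21 := a21/alpha11
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # A22 := A22 - l21 a12^T
    return A

# A matrix whose principal leading submatrices are all nonsingular.
A = np.array([[4.0, 3.0, 2.0],
              [2.0, 4.0, 1.0],
              [2.0, 2.0, 3.0]])
LU = lu_right_looking(A)
L = np.tril(LU, -1) + np.eye(3)   # recover L (unit diagonal)
U = np.triu(LU)                   # recover U
```

Multiplying the recovered factors reproduces A, confirming the derivation above.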

2.2 Gauss transforms

Definition 4. A matrix L̂k of the form

    L̂k = [ Ik 0 0 ; 0 1 0 ; 0 −l21 I ],

where Ik is k × k, is called a Gauss transform.

Example 5. Gauss transforms can be used to take multiples of a row and subtract these multiples from other rows:

    [ 1 0 0 0 ; 0 1 0 0 ; 0 −λ21 1 0 ; 0 −λ31 0 1 ] [ â0^T ; â1^T ; â2^T ; â3^T ] = [ â0^T ; â1^T ; â2^T − λ21 â1^T ; â3^T − λ31 â1^T ].

Notice the similarity with what one does in Gaussian elimination: take multiples of one row and subtract these from other rows. Now assume that the LU factorization in the previous subsection has proceeded to where A contains

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

where A00 is upper triangular (recall: it is being overwritten by U!). What we would like to do is eliminate the elements in a21 by taking multiples of the "current row" ( α11 a12^T ) and subtract these from the rest

Algorithm: Compute the LU factorization of A, overwriting L with the factor L and A with the factor U

    Partition A → [ ATL ATR ; ABL ABR ], L → [ LTL 0 ; LBL LBR ]
      where ATL and LTL are 0 × 0

    while n(ATL) < n(A) do

      Repartition
        [ ATL ATR ; ABL ABR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] → [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ]
          where α11 and λ11 are 1 × 1

        l21 := a21/α11
        A22 := A22 − l21 a12^T
        (a21 := 0)

      or, alternatively,

        l21 := a21/α11
        [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]
                                                   = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]

      Continue with
        [ ATL ATR ; ABL ABR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] ← [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ]

    endwhile

Figure 1: Most commonly known algorithm for overwriting a matrix with its LU factorization.

of the rows: ( a21 A22 ). The vehicle is a Gauss transform: we must determine l21 so that

    [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ].

This means we must pick l21 = a21/α11 since

    [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] = [ A00 a01 A02 ; 0 α11 a12^T ; 0 (a21 − α11 l21) (A22 − l21 a12^T) ].

The resulting algorithm is summarized in Figure 1 under "or, alternatively,". Notice that this algorithm is

identical to the algorithm for computing the LU factorization discussed before! How can this be? The following set of exercises explains it.

Exercise 6. Show that

    [ Ik 0 0 ; 0 1 0 ; 0 −l21 I ]^(−1) = [ Ik 0 0 ; 0 1 0 ; 0 l21 I ].

Answer:

    [ Ik 0 0 ; 0 1 0 ; 0 −l21 I ] [ Ik 0 0 ; 0 1 0 ; 0 l21 I ] = [ Ik 0 0 ; 0 1 0 ; 0 (−l21 + l21) I ] = [ Ik 0 0 ; 0 1 0 ; 0 0 I ].

End of Answer

Now, clearly, what the algorithm does is compute a sequence of n Gauss transforms L̂0, ..., L̂(n−1) such that L̂(n−1) L̂(n−2) ··· L̂1 L̂0 A = U or, equivalently, A = L0 L1 ··· L(n−2) L(n−1) U, where Lk = L̂k^(−1). What we will show next is that L = L0 L1 ··· L(n−2) L(n−1) is the unit lower triangular matrix computed by the LU factorization.

Exercise 7. Let L̃k = L0 L1 ··· Lk. Assume that L̃(k−1) has the form L̃(k−1) = [ L00 0 0 ; l10^T 1 0 ; L20 0 I ], where L00 is k × k. Show that L̃k is given by

    L̃k = [ L00 0 0 ; l10^T 1 0 ; L20 l21 I ].

(Recall: L̂k = [ I 0 0 ; 0 1 0 ; 0 −l21 I ].)

Answer:

    L̃k = L̃(k−1) Lk = L̃(k−1) L̂k^(−1)
       = [ L00 0 0 ; l10^T 1 0 ; L20 0 I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ]
       = [ L00 0 0 ; l10^T 1 0 ; L20 l21 I ].

End of Answer

What this exercise shows is that L = L0 L1 ··· L(n−2) L(n−1) is the unit lower triangular matrix created by simply placing the computed vectors l21 below its diagonal.
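This fact is easy to check numerically. The sketch below (names are ours, not from the notes) forms the inverses Lk of random Gauss transforms explicitly, multiplies them out in order, and compares against simply placing the vectors l21 below the diagonal:

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
# Multiplier vectors l21 for steps k = 0, ..., n-2 (the last step has none).
ls = [rng.standard_normal(n - k - 1) for k in range(n - 1)]

def L_inv_gauss(k, l21, n):
    """Lk = inverse of the k-th Gauss transform: the identity with l21
    placed below the diagonal in column k (inversion flips the sign -l21)."""
    M = np.eye(n)
    M[k+1:, k] = l21
    return M

# Multiply L0 L1 ... L(n-2) explicitly ...
L_prod = np.eye(n)
for k, l21 in enumerate(ls):
    L_prod = L_prod @ L_inv_gauss(k, l21, n)

# ... and compare against simply placing each l21 below the diagonal.
L_placed = np.eye(n)
for k, l21 in enumerate(ls):
    L_placed[k+1:, k] = l21
```

No actual matrix multiplication is needed to build L; the product has exactly the "placed" form.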

2.3 Cost of LU factorization The cost of the LU factorization algorithm given in Figure 1 can be analyzed as follows:

• Assume A is n × n.

• During the kth iteration, ATL is initially k × k.

• Computing l21 := a21/α11 is typically implemented as β := 1/α11 and then the scaling l21 := β a21. The reason is that divisions are expensive relative to multiplications. We will ignore the cost of the division (which will be insignificant if n is large). Thus, we count this as n − k − 1 multiplies.

• The rank-1 update of A22 requires (n − k − 1)^2 multiplications and (n − k − 1)^2 additions.

• Thus, the total cost (in flops) can be approximated by

    Σ_{k=0}^{n−1} [ (n − k − 1) + 2(n − k − 1)^2 ]
        = Σ_{j=0}^{n−1} ( j + 2j^2 )              (change of variable: j = n − k − 1)
        = Σ_{j=0}^{n−1} j + 2 Σ_{j=0}^{n−1} j^2
        ≈ n(n − 1)/2 + 2 ∫_0^n x^2 dx
        = n(n − 1)/2 + (2/3) n^3
        ≈ (2/3) n^3

Notice that this involves roughly half the number of floating point operations as are required for a Householder transformation based QR factorization.
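The approximation can be checked directly. The small sketch below (a hypothetical helper, not from the notes) sums the per-iteration costs exactly and compares against (2/3) n^3:

```python
# Exact flop count of the LU loop versus the (2/3) n^3 approximation:
# iteration k costs (n-k-1) multiplies for l21 plus 2 (n-k-1)^2 flops
# for the rank-1 update of A22.
def lu_flops(n):
    return sum((n - k - 1) + 2 * (n - k - 1) ** 2 for k in range(n))

n = 1000
exact = lu_flops(n)
approx = 2 * n**3 / 3
rel_err = abs(exact - approx) / exact   # well under 1% already at n = 1000
```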

3 LU Factorization with Partial Pivoting

It is well-known that the LU factorization is numerically unstable under general circumstances. In particular, a backward stability analysis, given for example in [2, 5, 4] and summarized in Section 9, shows that the computed matrices Ľ and Ǔ satisfy

    (A + ∆A) = Ľ Ǔ, where |∆A| ≤ γn |Ľ| |Ǔ|.

(This is the backward error result for the Crout variant of LU factorization, discussed later in this note. Some of the other variants have an error result of (A + ∆A) = Ľ Ǔ where |∆A| ≤ γn (|A| + |Ľ| |Ǔ|).) Now, if α11 is small in magnitude compared to the entries of a21, then not only will l21 have large entries, but the update A22 − l21 a12^T will potentially introduce large entries in the updated A22 (in other words, the part of matrix A from which the future matrix U will be computed), a phenomenon referred to as element growth. To overcome this, we will swap rows in A as the factorization proceeds, resulting in an algorithm known as LU factorization with partial pivoting.
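A tiny example makes the element growth concrete (the numbers below are ours, chosen for illustration): with α11 = 10^−8, the multiplier and the updated entry both blow up to order 10^8, while swapping the rows first keeps the multiplier at most 1 in magnitude.

```python
import numpy as np

# Element growth without pivoting: a tiny pivot alpha11 produces a huge
# multiplier and a huge entry in the updated A22.
eps = 1e-8
A = np.array([[eps, 1.0],
              [1.0, 1.0]])

# Unpivoted elimination: l21 = 1/eps ~ 1e8, and the update gives
# u22 = 1 - (1/eps)*1, an enormous entry (element growth).
l21 = A[1, 0] / A[0, 0]
u22 = A[1, 1] - l21 * A[0, 1]

# With partial pivoting the rows are swapped first, so the multiplier
# is eps (tiny) and no growth occurs.
l21_piv = A[0, 0] / A[1, 0]
```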

3.1 Permutation matrices

Definition 8. An n × n matrix P is said to be a permutation matrix, or permutation, if, when applied to a vector x = (χ0, χ1, ..., χ(n−1))^T, it merely rearranges the order of the elements in that vector. Such a permutation can be represented by the vector of integers (π0, π1, ..., π(n−1))^T, where {π0, π1, ..., π(n−1)} is a permutation of the integers {0, 1, ..., n − 1}, and the permuted vector P x is given by (χ(π0), χ(π1), ..., χ(π(n−1)))^T.

If P is a permutation matrix then PA rearranges the rows of A exactly as the elements of x are rearranged by P x. We will see that when discussing the LU factorization with partial pivoting, a permutation matrix that swaps the first element of a vector with the π-th element of that vector is a fundamental tool. We will denote

Algorithm: Compute the LU factorization with partial pivoting of A, overwriting L with the factor L and A with the factor U. The pivot vector is returned in p.

    Partition A → [ ATL ATR ; ABL ABR ], L → [ LTL 0 ; LBL LBR ], p → [ pT ; pB ]
      where ATL and LTL are 0 × 0 and pT is 0 × 1

    while n(ATL) < n(A) do

      Repartition
        [ ATL ATR ; ABL ABR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] → [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ],
        [ pT ; pB ] → [ p0 ; π1 ; p2 ]
          where α11, λ11, and π1 are 1 × 1

        π1 := maxi( [ α11 ; a21 ] )
        [ α11 a12^T ; a21 A22 ] := P(π1) [ α11 a12^T ; a21 A22 ]
        l21 := a21/α11
        A22 := A22 − l21 a12^T
        (a21 := 0)

      or, alternatively,

        π1 := maxi( [ α11 ; a21 ] )
        [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 ; 0 P(π1) ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]
        l21 := a21/α11
        [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]
                                                   = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]

      Continue with
        [ ATL ATR ; ABL ABR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] ← [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ],
        [ pT ; pB ] ← [ p0 ; π1 ; p2 ]

    endwhile

Figure 2: LU factorization with partial pivoting.

that matrix by

    P(π) = In  if π = 0, and otherwise

    P(π) = [ 0 0 1 0 ; 0 I(π−1) 0 0 ; 1 0 0 0 ; 0 0 0 I(n−π−1) ],

where n is the dimension of the permutation matrix. In the following we will use the notation Pn to indicate that the matrix P is of size n. Let p be a vector of integers satisfying the conditions

    p = (π0, ..., π(k−1))^T, where 1 ≤ k ≤ n and 0 ≤ πi < n − i,    (1)

then Pn(p) will denote the permutation

    Pn(p) = [ I(k−1) 0 ; 0 P(n−k+1)(π(k−1)) ] [ I(k−2) 0 ; 0 P(n−k+2)(π(k−2)) ] ··· [ 1 0 ; 0 P(n−1)(π1) ] Pn(π0).

Remark 9. In the algorithms, the subscript that indicates the matrix dimensions is omitted.

Example 10. Let a0^T, a1^T, ..., a(n−1)^T be the rows of a matrix A. The application of P(p) to A yields a matrix that results from swapping row a0^T with a(π0)^T, then swapping a1^T with a(π1+1)^T, a2^T with a(π2+2)^T, until finally a(k−1)^T is swapped with a(π(k−1)+k−1)^T.

Remark 11. For those familiar with how pivot information is stored in LINPACK and LAPACK, notice that those packages store the vector of pivot information (π0 + 1, π1 + 2, ..., π(k−1) + k)^T.

3.2 The algorithm

Having introduced our notation for permutation matrices, we can now define the LU factorization with partial pivoting: Given an n × n matrix A, we wish to compute a) a vector p of n integers which satisfies the conditions (1), b) a unit lower trapezoidal matrix L, and c) an upper triangular matrix U so that P(p)A = LU. An algorithm for computing this operation is typically represented by

[A, p] := LUpiv(A),

where upon completion A has been overwritten by {L\U}.

Let us start by revisiting the first derivation of the LU factorization. The first step is to find a permutation matrix P(π1) that moves the element of largest magnitude in the first column to the diagonal. For this, we will introduce the function maxi(x) which, given a vector x, returns the index of the element in x with maximal magnitude (absolute value). The algorithm then proceeds as follows:

• Partition A and L as follows:

    A → [ α11 a12^T ; a21 A22 ],  and  L → [ 1 0 ; l21 L22 ].

• Compute π1 = maxi( [ α11 ; a21 ] ).

• Permute the rows: [ α11 a12^T ; a21 A22 ] := P(π1) [ α11 a12^T ; a21 A22 ].

• Compute l21 := a21/α11.

• Update A22 := A22 − l21 a12^T.

Now, in general, assume that the computation has proceeded to the point where matrix A has been overwritten by

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

where A00 is upper triangular. If no pivoting were added, one would compute l21 := a21/α11 followed by the update

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ].

Now, instead one performs the steps

• Compute π1 = maxi( [ α11 ; a21 ] ).

• Permute the rows:

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 ; 0 P(π1) ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

• Compute l21 := a21/α11.

• Update

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ].

This algorithm is summarized in Figure 2. Now, what this algorithm computes is a sequence of Gauss transforms L̂0, ..., L̂(n−1) and permutations P0, ..., P(n−1) such that

    L̂(n−1) P(n−1) ··· L̂0 P0 A = U

or, equivalently,

    A = P0^T L0 ··· P(n−1)^T L(n−1) U,

where Lk = L̂k^(−1). What we will finally show is that there are Gauss transforms L̄0, ..., L̄(n−1) (here the "bar" does NOT mean conjugation; it is just a symbol) such that

    A = P0^T ··· P(n−1)^T (L̄0 ··· L̄(n−1)) U,  where the parenthesized product is L,

or, equivalently,

    P(p) A = P(n−1) ··· P0 A = (L̄0 ··· L̄(n−1)) U = L U,

which is what we set out to compute.

Here is the insight. Assume that after k steps of LU factorization we have computed pT, LTL, LBL, etc. so that

    P(pT) A = [ LTL 0 ; LBL I ] [ ATL ATR ; 0 ABR ],

where ATL is upper triangular and k × k. Now compute the next step of LU factorization with partial pivoting with [ ATL ATR ; 0 ABR ]:

• Partition

    [ ATL ATR ; 0 ABR ] → [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

• Compute π1 = maxi( [ α11 ; a21 ] )

• Permute

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 ; 0 P(π1) ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

• Compute l21 := a21/α11.

• Update

    [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]

After this,

    P(pT) A = [ LTL 0 ; LBL I ] [ I 0 ; 0 P(π1) ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ].    (2)

But

    [ LTL 0 ; LBL I ] [ I 0 ; 0 P(π1) ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]
  = [ I 0 ; 0 P(π1) ] [ LTL 0 ; P(π1)LBL I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]
  = [ I 0 ; 0 P(π1) ] [ LTL 0 ; L̄BL I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ],

Algorithm: Compute the LU factorization with partial pivoting of A, overwriting L with the factor L and A with the factor U. The pivot vector is returned in p.

    Partition A → [ ATL ATR ; ABL ABR ], L → [ LTL 0 ; LBL LBR ], p → [ pT ; pB ]
      where ATL and LTL are 0 × 0 and pT is 0 × 1

    while n(ATL) < n(A) do

      Repartition
        [ ATL ATR ; ABL ABR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] → [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ],
        [ pT ; pB ] → [ p0 ; π1 ; p2 ]
          where α11, λ11, and π1 are 1 × 1

        π1 := maxi( [ α11 ; a21 ] )
        [ l10^T α11 a12^T ; L20 a21 A22 ] := P(π1) [ l10^T α11 a12^T ; L20 a21 A22 ]
        l21 := a21/α11
        A22 := A22 − l21 a12^T
        (a21 := 0)

      or, alternatively,

        π1 := maxi( [ α11 ; a21 ] )
        [ A00 a01 A02 ; l10^T α11 a12^T ; L20 a21 A22 ] := [ I 0 ; 0 P(π1) ] [ A00 a01 A02 ; l10^T α11 a12^T ; L20 a21 A22 ]
        l21 := a21/α11
        [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]
                                                   = [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]

      Continue with
        [ ATL ATR ; ABL ABR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ LTL 0 ; LBL LBR ] ← [ L00 0 0 ; l10^T λ11 0 ; L20 l21 L22 ],
        [ pT ; pB ] ← [ p0 ; π1 ; p2 ]

    endwhile

Figure 3: LU factorization with partial pivoting.

Algorithm: Compute the LU factorization with partial pivoting of A, overwriting A with the factors L and U. The pivot vector is returned in p.

    Partition A → [ ATL ATR ; ABL ABR ], p → [ pT ; pB ]
      where ATL is 0 × 0 and pT is 0 × 1

    while n(ATL) < n(A) do

      Repartition
        [ ATL ATR ; ABL ABR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ pT ; pB ] → [ p0 ; π1 ; p2 ]
          where α11 and π1 are 1 × 1

        π1 := maxi( [ α11 ; a21 ] )
        [ a10^T α11 a12^T ; A20 a21 A22 ] := P(π1) [ a10^T α11 a12^T ; A20 a21 A22 ]
        a21 := a21/α11
        A22 := A22 − a21 a12^T

      Continue with
        [ ATL ATR ; ABL ABR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
        [ pT ; pB ] ← [ p0 ; π1 ; p2 ]

    endwhile

Figure 4: LU factorization with partial pivoting, overwriting A with the factors.

NOTE: There is a "bar" above some of the L's and l's, which is very hard to see (in the original typesetting these also appear in red). Here we use the fact that P(π1)^T = P(π1) because of its very special structure. Bringing the permutation to the left of (2) and "repartitioning" we get

    P( [ p0 ; π1 ] ) A = ( [ L00 0 0 ; l̄10^T 1 0 ; L̄20 0 I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] ) [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ]
                       = [ L00 0 0 ; l̄10^T 1 0 ; L̄20 l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 (A22 − l21 a12^T) ].

This explains how the algorithm in Figure 3 computes p, L, and U (overwriting A with U) so that P(p)A = LU. Finally, we recognize that L can overwrite the entries of A below its diagonal, yielding the algorithm in Figure 4.
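The algorithm of Figure 4 can be sketched compactly as follows (the function names and the pivot-offset convention p[k] = offset of the pivot row from row k are ours, not from the notes):

```python
import numpy as np

def lu_piv(A):
    """LU with partial pivoting, a sketch of the algorithm of Figure 4:
    overwrite a copy of A with L (strictly below the diagonal) and U, and
    return the pivot vector p, with p[k] = offset of the pivot row from
    row k (so 0 <= p[k] < n - k, matching conditions (1))."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    p = np.zeros(n, dtype=int)
    for k in range(n):
        p[k] = np.argmax(np.abs(A[k:, k]))          # pi1 := maxi([alpha11; a21])
        A[[k, k + p[k]], :] = A[[k + p[k], k], :]   # swap the entire rows
        if k < n - 1:
            A[k+1:, k] /= A[k, k]                   # a21 := a21/alpha11
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def apply_P(p, B):
    """Apply P(p) to the rows of B: swap row k with row k + p[k], k = 0, 1, ..."""
    B = B.copy()
    for k, pi in enumerate(p):
        B[[k, k + pi], :] = B[[k + pi, k], :]
    return B

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
LU, p = lu_piv(A)
L = np.tril(LU, -1) + np.eye(5)
U = np.triu(LU)
```

Because each pivot has maximal magnitude in its column, all multipliers satisfy |l21| ≤ 1, and P(p)A = LU holds.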

4 Proof of Theorem 3

Proof: (⇒) Let nonsingular A have a (unique) LU factorization. We will show that its principal leading submatrices are nonsingular. Let

    [ ATL ATR ; ABL ABR ] = [ LTL 0 ; LBL LBR ] [ UTL UTR ; 0 UBR ]

be the LU factorization of A, where ATL, LTL, and UTL are k × k. Notice that U cannot have a zero on its diagonal since then A would not have linearly independent columns. Now, the k × k principal leading submatrix ATL equals ATL = LTL UTL, which is nonsingular since LTL has a unit diagonal and UTL has no zeroes on its diagonal. Since k was chosen arbitrarily, this means that all principal leading submatrices are nonsingular.

(⇐) We will do a proof by induction on n.

Base Case: n = 1. Then A has the form A = [ α11 ; a21 ] where α11 is a scalar. Since the principal leading submatrices are nonsingular, α11 ≠ 0. Hence

    A = [ 1 ; a21/α11 ] ( α11 ),  with  L = [ 1 ; a21/α11 ]  and  U = ( α11 ),

is the LU factorization of A. This LU factorization is unique because the first element of L must be 1.

Inductive Step: Assume the result is true for all matrices with n = k. Show that it is true for matrices with n = k + 1. Let A of size n = k + 1 have nonsingular principal leading submatrices. Now, if an LU factorization of A exists, A = LU, then it would have the form

    [ A00 a01 ; a10^T α11 ; A20 a21 ] = [ L00 0 ; l10^T 1 ; L20 l21 ] [ U00 u01 ; 0 υ11 ].    (3)

If we can show that the different parts of L and U exist and are unique, we are done. Equation (3) can be rewritten as

    [ A00 ; a10^T ; A20 ] = [ L00 ; l10^T ; L20 ] U00   and   [ a01 ; α11 ; a21 ] = [ L00 u01 ; l10^T u01 + υ11 ; L20 u01 + l21 υ11 ].

Now, by the Induction Hypothesis, L00, l10^T, and L20 exist and are unique. So the question is whether u01, υ11, and l21 exist and are unique:

• u01 exists and is unique. Since L00 is nonsingular (it is unit lower triangular, with ones on its diagonal), the system L00 u01 = a01 has a unique solution.

• υ11 exists, is unique, and is nonzero. Since l10^T and u01 exist and are unique, υ11 = α11 − l10^T u01 exists and is unique. It is also nonzero since the principal leading submatrix of A given by

    [ A00 a01 ; a10^T α11 ] = [ L00 0 ; l10^T 1 ] [ U00 u01 ; 0 υ11 ]

is nonsingular by assumption and therefore υ11 must be nonzero.

• l21 exists and is unique. Since υ11 exists and is nonzero, l21 = a21/υ11 exists and is uniquely determined. Thus the m × (k + 1) matrix A has a unique LU factorization.

By the Principle of Mathematical Induction the result holds.

Exercise 12. Implement LU factorization with partial pivoting with the FLAME@lab API, in M-script.

Answer:

function [ A_out, p_out ] = LU_piv_unb( A )

  [ ATL, ATR, ...
    ABL, ABR ] = FLA_Part_2x2( A, ...
                               0, 0, 'FLA_TL' );

  p_out = zeros( size( A, 1 ), 1 );

  while ( size( ATL, 1 ) < size( A, 1 ) )

    [ A00,  a01,     A02,  ...
      a10t, alpha11, a12t, ...
      A20,  a21,     A22 ] = FLA_Repart_2x2_to_3x3( ATL, ATR, ...
                                                    ABL, ABR, ...
                                                    1, 1, 'FLA_BR' );

    %------------------------------------------------%
    k = size( A00, 2 );

    % pi1 := maxi( [ alpha11; a21 ] ), stored as a 0-based offset in p_out
    [ pivot_val, pi1 ] = max( abs( [ alpha11; a21 ] ) );
    p_out( k+1 ) = pi1 - 1;

    if ( pi1 > 1 )
      % Swap the current row with the pivot row, including the part of
      % the rows that already holds computed entries of L (as in Figure 4).
      tmp             = [ a10t, alpha11, a12t ];
      a10t            = A20( pi1-1, : );
      alpha11         = a21( pi1-1 );
      a12t            = A22( pi1-1, : );
      A20( pi1-1, : ) = tmp( 1:k );
      a21( pi1-1 )    = tmp( k+1 );
      A22( pi1-1, : ) = tmp( k+2:end );
    end

    a21 = a21 / alpha11;
    A22 = A22 - a21 * a12t;
    %------------------------------------------------%

    [ ATL, ATR, ...
      ABL, ABR ] = FLA_Cont_with_3x3_to_2x2( A00,  a01,     A02,  ...
                                             a10t, alpha11, a12t, ...
                                             A20,  a21,     A22, ...
                                             'FLA_TL' );

  end

  A_out = [ ATL, ATR
            ABL, ABR ];

return

End of Answer

5 LU with Complete Pivoting

LU factorization with partial pivoting builds on the insight that pivoting (rearranging) rows in a linear system does not change the solution: if Ax = b then P(p)Ax = P(p)b, where p is a pivot vector. Now, if r is another pivot vector, then notice that P(r)^T P(r) = I (a simple property of pivot matrices) and A P(r)^T permutes the columns of A in exactly the same order as P(r)A permutes the rows of A.

What this means is that if Ax = b then P(p) A P(r)^T [P(r)x] = P(p)b. This supports the idea that one might want to not only permute rows of A, as in partial pivoting, but also columns of A. This is done in a variation on LU factorization that is known as LU factorization with complete pivoting. The idea is as follows: Given matrix A, partition

    A = [ α11 a12^T ; a21 A22 ].

Now, instead of finding the largest element in magnitude in the first column, find the largest element in magnitude in the entire matrix. Let’s say it is element (π0, ρ0). Then, one permutes

    [ α11 a12^T ; a21 A22 ] := P(π0) [ α11 a12^T ; a21 A22 ] P(ρ0)^T,

making α11 the largest element in magnitude. This reduces the magnitude of the multipliers and the element growth. It can be shown that the worst-case element growth with complete pivoting is indeed dramatically smaller than with partial pivoting. The problem is that it requires O(n^2) comparisons per iteration. Worse, it completely destroys the ability to utilize blocked algorithms, which attain much greater performance. In practice LU with complete pivoting is not used.
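For completeness, a sketch of complete pivoting (ours, not code from the notes): at each step the largest entry of the trailing matrix is located with O((n−k)^2) comparisons and moved to the pivot position by one row swap and one column swap, so that P(p) A P(r)^T = LU.

```python
import numpy as np

def lu_complete(A):
    """LU with complete pivoting (a sketch): at each step the largest
    entry in magnitude of the trailing matrix is moved to the pivot
    position by a row swap and a column swap. Returns the overwritten
    matrix and index arrays p, r with (P(p) A P(r)^T)[i,j] = A[p[i], r[j]]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    p = np.arange(n)   # accumulated row permutation
    r = np.arange(n)   # accumulated column permutation
    for k in range(n - 1):
        # O((n-k)^2) comparisons to find the largest trailing entry
        i, j = divmod(int(np.argmax(np.abs(A[k:, k:]))), n - k)
        A[[k, k + i], :] = A[[k + i, k], :]
        p[[k, k + i]] = p[[k + i, k]]
        A[:, [k, k + j]] = A[:, [k + j, k]]
        r[[k, k + j]] = r[[k + j, k]]
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p, r

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
LU, p, r = lu_complete(A)
L = np.tril(LU, -1) + np.eye(4)
U = np.triu(LU)
# The factorization satisfies A[p][:, r] = L U.
```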

6 Solving Ax = y Via the LU Factorization with Pivoting

Given a nonsingular matrix A ∈ C^{m×m}, the above discussions have yielded an algorithm for computing a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U such that PA = LU. We now discuss how these can be used to solve the system of linear equations Ax = y. Starting with Ax = y, we multiply both sides of the equation by the permutation matrix P,

    P A x = P y = ŷ,

and substitute LU for PA:

    L (U x) = ŷ,  where we set z = U x.

We now notice that we can solve the lower triangular system

    L z = ŷ,

after which x can be computed by solving the upper triangular system

Ux = z.
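Putting the pieces together (a sketch with hypothetical helper names, not code from the notes): permute y, forward substitute with the unit lower triangular L, then back substitute with U.

```python
import numpy as np

def solve_via_lu(L, U, p, y):
    """Solve A x = y given P(p) A = L U: form yhat = P(p) y, solve
    L z = yhat by forward substitution (unit diagonal), then solve
    U x = z by back substitution."""
    yhat = np.array(y, dtype=float)
    for k, pi in enumerate(p):              # yhat := P(p) y
        yhat[[k, k + pi]] = yhat[[k + pi, k]]
    n = len(yhat)
    z = yhat.copy()
    for i in range(n):                      # forward substitution: L z = yhat
        z[i] -= L[i, :i] @ z[:i]
    x = z.copy()
    for i in reversed(range(n)):            # back substitution: U x = z
        x[i] = (x[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# Manufacture a consistent example: pick L, U, p, then recover A = P^T (L U).
L = np.array([[ 1.0,  0.0, 0.0],
              [ 0.5,  1.0, 0.0],
              [-0.25, 0.5, 1.0]])
U = np.array([[4.0, 1.0, 2.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 2.0]])
p = np.array([1, 0, 0])          # swap rows 0 and 1 at the first step
A = L @ U                        # this is P(p) A ...
for k in reversed(range(3)):     # ... so undo the swaps in reverse order
    A[[k, k + p[k]], :] = A[[k + p[k], k], :]
y = np.array([1.0, 2.0, 3.0])
x = solve_via_lu(L, U, p, y)
```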

Algorithm: Solve Lz = y, overwriting y (Variant 1)

    Partition L → [ LTL LTR ; LBL LBR ], y → [ yT ; yB ]
      where LTL is 0 × 0 and yT has 0 rows

    while m(LTL) < m(L) do

      Repartition
        [ LTL LTR ; LBL LBR ] → [ L00 l01 L02 ; l10^T λ11 l12^T ; L20 l21 L22 ],
        [ yT ; yB ] → [ y0 ; ψ1 ; y2 ]
          where λ11 is 1 × 1 and ψ1 has 1 row

        y2 := y2 − ψ1 l21

      Continue with
        [ LTL LTR ; LBL LBR ] ← [ L00 l01 L02 ; l10^T λ11 l12^T ; L20 l21 L22 ],
        [ yT ; yB ] ← [ y0 ; ψ1 ; y2 ]

    endwhile

Algorithm: Solve Lz = y, overwriting y (Variant 2)

    Partition, Repartition, and Continue as in Variant 1; the update is

        ψ1 := ψ1 − l10^T y0

    endwhile

Figure 5: Algorithms for the solution of a unit lower triangular system Lz = y that overwrite y with z.

7 Solving Triangular Systems of Equations

7.1 Lz = y

First, we discuss solving Lz = y where L is a unit lower triangular matrix.

Variant 1

Consider Lz = y, where L is unit lower triangular. Partition

    L → [ 1 0 ; l21 L22 ],  z → [ ζ1 ; z2 ]  and  y → [ ψ1 ; y2 ].

Then

    [ 1 0 ; l21 L22 ] [ ζ1 ; z2 ] = [ ψ1 ; y2 ].

Multiplying out the left-hand side yields

    [ ζ1 ; ζ1 l21 + L22 z2 ] = [ ψ1 ; y2 ]

and the equalities

ζ1 = ψ1

ζ1l21 + L22z2 = y2,

15 which can be rearranged as

ζ1 = ψ1

L22z2 = y2 − ζ1l21.

These insights justify the algorithm in Figure 5 (left), which overwrites y with the solution to Lz = y.

Variant 2

An alternative algorithm can be derived as follows: Partition

    L → [ L00 0 ; l10^T 1 ],  z → [ z0 ; ζ1 ]  and  y → [ y0 ; ψ1 ].

Then

    [ L00 0 ; l10^T 1 ] [ z0 ; ζ1 ] = [ y0 ; ψ1 ].

Multiplying out the left-hand side yields

    [ L00 z0 ; l10^T z0 + ζ1 ] = [ y0 ; ψ1 ]

and the equalities

    L00 z0 = y0
    l10^T z0 + ζ1 = ψ1.

The idea now is as follows: Assume that the elements of z0 were computed in previous iterations of the algorithm in Figure 5 (right), overwriting y0. Then in the current iteration we can compute ζ1 := ψ1 − l10^T z0, overwriting ψ1.

Discussion

Notice that Variant 1 casts the computation in terms of axpy operations while Variant 2 casts it in terms of dot products.
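The two variants can be sketched side by side (function names are ours, not from the notes); both overwrite a copy of y with the solution z of Lz = y:

```python
import numpy as np

def trsv_lower_axpy(L, y):
    """Variant 1: after fixing zeta1 = psi1, immediately subtract
    zeta1 * l21 from the remainder of y (one axpy per iteration)."""
    y = np.array(y, dtype=float)
    n = len(y)
    for j in range(n):
        y[j+1:] -= y[j] * L[j+1:, j]    # y2 := y2 - psi1 l21
    return y

def trsv_lower_dot(L, y):
    """Variant 2: compute each zeta1 from the already-computed z0
    with a dot product: zeta1 := psi1 - l10^T z0."""
    y = np.array(y, dtype=float)
    n = len(y)
    for j in range(n):
        y[j] -= L[j, :j] @ y[:j]
    return y

L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0]])
y = np.array([1.0, 4.0, 9.0])
z1 = trsv_lower_axpy(L, y)
z2 = trsv_lower_dot(L, y)
```

Both variants perform the same flops, just in a different order.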

7.2 Ux = z

Next, we discuss solving Ux = y where U is an upper triangular matrix (with no assumptions about its diagonal entries).

Exercise 13. Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most computation in terms of dot products. Hint: Partition

    U → [ υ11 u12^T ; 0 U22 ].

Call this Variant 1 and use Figure 6 to state the algorithm.

Answer:

Algorithm: Solve Ux = y, overwriting y (Variant 1)

    Partition U → [ UTL UTR ; UBL UBR ], y → [ yT ; yB ]
      where UBR is 0 × 0 and yB has 0 rows

    while m(UBR) < m(U) do

      Repartition
        [ UTL UTR ; UBL UBR ] → [ U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ],
        [ yT ; yB ] → [ y0 ; ψ1 ; y2 ]
          where υ11 is 1 × 1 and ψ1 has 1 row

        (updates to be filled in; see Exercise 13)

      Continue with
        [ UTL UTR ; UBL UBR ] ← [ U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ],
        [ yT ; yB ] ← [ y0 ; ψ1 ; y2 ]

    endwhile

Algorithm: Solve Ux = y, overwriting y (Variant 2)

    Same skeleton as Variant 1; updates to be filled in (see Exercise 14).

Figure 6: Algorithms for the solution of an upper triangular system Ux = y that overwrite y with x.

Partition

    [ υ11 u12^T ; 0 U22 ] [ χ1 ; x2 ] = [ ψ1 ; y2 ].

Multiplying this out yields

    [ υ11 χ1 + u12^T x2 ; U22 x2 ] = [ ψ1 ; y2 ].

So, if we assume that x2 has already been computed and has overwritten y2, then χ1 can be computed as

    χ1 = (ψ1 − u12^T x2)/υ11,

which can then overwrite ψ1. The resulting algorithm is given in Figure 6 (Answer) (left). End of Answer

Exercise 14. Derive an algorithm for solving Ux = y, overwriting y with the solution, that casts most computation in terms of axpy operations. Call this Variant 2 and use Figure 6 to state the algorithm.

Answer: Partition

    [ U00 u01 ; 0 υ11 ] [ x0 ; χ1 ] = [ y0 ; ψ1 ].

Multiplying this out yields

    [ U00 x0 + u01 χ1 ; υ11 χ1 ] = [ y0 ; ψ1 ].

Algorithm: Solve Ux = y, overwriting y (Variant 1)

    Partition U → [ UTL UTR ; UBL UBR ], y → [ yT ; yB ]
      where UBR is 0 × 0 and yB has 0 rows

    while m(UBR) < m(U) do

      Repartition
        [ UTL UTR ; UBL UBR ] → [ U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ],
        [ yT ; yB ] → [ y0 ; ψ1 ; y2 ]
          where υ11 is 1 × 1 and ψ1 has 1 row

        ψ1 := ψ1 − u12^T y2
        ψ1 := ψ1/υ11

      Continue with
        [ UTL UTR ; UBL UBR ] ← [ U00 u01 U02 ; u10^T υ11 u12^T ; U20 u21 U22 ],
        [ yT ; yB ] ← [ y0 ; ψ1 ; y2 ]

    endwhile

Algorithm: Solve Ux = y, overwriting y (Variant 2)

    Partition, Repartition, and Continue as in Variant 1; the updates are

        ψ1 := ψ1/υ11
        y0 := y0 − ψ1 u01

    endwhile

Figure 6 (Answer): Algorithms for the solution of an upper triangular system Ux = y that overwrite y with x.

So, χ1 = ψ1/υ11, after which x0 can be computed by solving U00 x0 = y0 − χ1 u01. The resulting algorithm is given in Figure 6 (Answer) (right). End of Answer
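Analogously to the lower triangular case, the two variants for Ux = y can be sketched as follows (names are ours, not from the notes); Variant 1 is dot-based, Variant 2 axpy-based:

```python
import numpy as np

def trsv_upper_dot(U, y):
    """Variant 1 (dot-based): chi1 := (psi1 - u12^T x2) / upsilon11,
    sweeping from the bottom row upward."""
    y = np.array(y, dtype=float)
    n = len(y)
    for j in reversed(range(n)):
        y[j] = (y[j] - U[j, j+1:] @ y[j+1:]) / U[j, j]
    return y

def trsv_upper_axpy(U, y):
    """Variant 2 (axpy-based): chi1 := psi1 / upsilon11, then
    y0 := y0 - chi1 u01 updates the remaining right-hand side."""
    y = np.array(y, dtype=float)
    n = len(y)
    for j in reversed(range(n)):
        y[j] /= U[j, j]
        y[:j] -= y[j] * U[:j, j]
    return y

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
y = np.array([5.0, 7.0, 8.0])
x1 = trsv_upper_dot(U, y)
x2 = trsv_upper_axpy(U, y)
```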

8 Other LU Factorization Algorithms

There are actually five different (unblocked) algorithms for computing the LU factorization that were discovered over the course of the centuries¹. The LU factorization algorithm in Figure 1 is sometimes called the classical LU factorization or the right-looking algorithm. We now briefly describe how to derive the other algorithms. Finding the algorithms starts with the following observations.

• Our algorithms will overwrite the matrix A, and hence we introduce Â to denote the original contents of A. We will say that the precondition for the algorithm is that

    A = Â

(A starts by containing the original contents of A.)

• We wish to overwrite A with L and U. Thus, the postcondition for the algorithm (the state in which we wish to exit the algorithm) is that

    A = {L\U} ∧ L U = Â

¹For a thorough discussion of the different LU factorization algorithms that also gives a historic perspective, we recommend "Matrix Algorithms Volume 1" by G.W. Stewart [13].

(A is overwritten by L below the diagonal and U on and above the diagonal, where multiplying L and U yields the original matrix A.)

• All the algorithms will march through the matrices from top-left to bottom-right. Thus, at a representative point in the algorithm, the matrices are viewed as quadrants:

    A → [ ATL ATR ; ABL ABR ],  L → [ LTL 0 ; LBL LBR ],  and  U → [ UTL UTR ; 0 UBR ],

where ATL, LTL, and UTL are all square and equally sized.

• In terms of these exposed quadrants, in the end we wish for matrix A to contain

    [ ATL ATR ; ABL ABR ] = [ {L\U}TL UTR ; LBL {L\U}BR ],

  where

    [ LTL 0 ; LBL LBR ] [ UTL UTR ; 0 UBR ] = [ ÂTL ÂTR ; ÂBL ÂBR ].

• Manipulating this yields what we call the Partitioned Matrix Expression (PME), which can be viewed as a recursive definition of the LU factorization:

    [ ATL ATR ; ABL ABR ] = [ {L\U}TL UTR ; LBL {L\U}BR ]
    ∧  LTL UTL = ÂTL    LTL UTR = ÂTR
       LBL UTL = ÂBL    LBL UTR + LBR UBR = ÂBR

Now, consider the code skeleton for the LU factorization in Figure 7. At the top of the loop (right after the while), we want to maintain certain contents in matrix A. Since we are in a loop, we haven't yet overwritten A with the final result. Instead, some progress toward this final result has been made. The way we can find the state of A that we would like to maintain is to take the PME and delete subexpressions. For example, consider the following condition on the contents of A:

    [ ATL ATR ; ABL ABR ] = [ {L\U}TL UTR ; LBL (ÂBR − LBL UTR) ]
    ∧  LTL UTL = ÂTL    LTL UTR = ÂTR    LBL UTL = ÂBL

(the constraint LBL UTR + LBR UBR = ÂBR has been deleted).

What we are saying is that ATL, ATR, and ABL have been completely updated with the corresponding parts of L and U, and ABR has been partially updated. This is exactly the state that the algorithm we discussed previously in this document maintains! What is left is to factor ABR, since it contains ÂBR − LBL UTR, and ÂBR − LBL UTR = LBR UBR.

• By carefully analyzing the order in which computation must occur (in compiler lingo: by performing a dependence analysis), we can identify five states that can be maintained at the top of the loop by deleting subexpressions from the PME. These are called loop invariants and are listed in Figure 8.

• Key to figuring out what updates must occur in the loop for each of the variants is to look at how the matrices are repartitioned at the top and bottom of the loop body.

19 Algorithm: A := LU(A) ! ! ! ATL ATR LTL LTR UTL UTR Partition A → , L → , U → ABL ABR LBL LBR UBL UBR where ATL is 0 × 0, LTL is 0 × 0, UTL is 0 × 0

while m(ATL) < m(A) do

Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\;
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix},\;
\begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}
$$
where α11 is 1 × 1, λ11 is 1 × 1, υ11 is 1 × 1

Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix},\;
\begin{pmatrix} L_{TL} & L_{TR} \\ L_{BL} & L_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} L_{00} & l_{01} & L_{02} \\ l_{10}^T & \lambda_{11} & l_{12}^T \\ L_{20} & l_{21} & L_{22} \end{pmatrix},\;
\begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}
$$
endwhile

Figure 7: Code skeleton for LU factorization.

Variant 1 (Bordered):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & \widehat A_{TR} \\ \widehat A_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL}
$$

Variant 2 (Left-looking):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & \widehat A_{TR} \\ L_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{BL} U_{TL} = \widehat A_{BL}
$$

Variant 3 (Up-looking):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ \widehat A_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{TL} U_{TR} = \widehat A_{TR}
$$

Variant 4 (Crout variant):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{TL} U_{TR} = \widehat A_{TR},\; L_{BL} U_{TL} = \widehat A_{BL}
$$

Variant 5 (Classical LU):
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \widehat A_{BR} - L_{BL} U_{TR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{TL} U_{TR} = \widehat A_{TR},\; L_{BL} U_{TL} = \widehat A_{BL}
$$

Figure 8: Loop invariants for various LU factorization algorithms.

8.1 Variant 1: Bordered algorithm

Consider the loop invariant:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & \widehat A_{TR} \\ \widehat A_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL}
$$
At the top of the loop, after repartitioning, A contains

$$
\begin{pmatrix} L\backslash U_{00} & \widehat a_{01} & \widehat A_{02} \\ \widehat a_{10}^T & \widehat\alpha_{11} & \widehat a_{12}^T \\ \widehat A_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$
while at the bottom it must contain
$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & \widehat A_{02} \\ l_{10}^T & \upsilon_{11} & \widehat a_{12}^T \\ \widehat A_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$
where $u_{01}$, $l_{10}^T$, and $\upsilon_{11}$ are to be computed. Now, considering $LU = \widehat A$ we notice that

$$
\begin{array}{lll}
L_{00} U_{00} = \widehat A_{00} & L_{00} u_{01} = \widehat a_{01} & L_{00} U_{02} = \widehat A_{02} \\
l_{10}^T U_{00} = \widehat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \widehat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \widehat a_{12}^T \\
L_{20} U_{00} = \widehat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \widehat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \widehat A_{22}
\end{array}
$$
where $L_{00}$ and $U_{00}$ are already known. Three of these equalities can be used to compute the desired parts of L and U:

• Solve L00u01 = a01 for u01, overwriting a01 with the result.

• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}$), overwriting $a_{10}^T$ with the result.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.
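The three updates above can be sketched in NumPy. This is an illustration with names of our own choosing (`lu_bordered` is not from the notes), assuming a square matrix whose leading principal submatrices are nonsingular:

```python
import numpy as np

def lu_bordered(A):
    """Variant 1 (bordered) LU, overwriting a copy of A with L\\U.
    A sketch assuming square A with nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        if k:
            L00 = np.tril(A[:k, :k], -1) + np.eye(k)  # unit lower triangular factor so far
            U00 = np.triu(A[:k, :k])                  # upper triangular factor so far
            A[:k, k] = np.linalg.solve(L00, A[:k, k])    # solve L00 u01 = a01
            A[k, :k] = np.linalg.solve(U00.T, A[k, :k])  # solve l10^T U00 = a10^T
        A[k, k] -= A[k, :k] @ A[:k, k]                   # upsilon11 = alpha11 - l10^T u01
    return A
```

After the loop, the strictly lower part of the result holds L and the upper part holds U.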

Exercise 15. If A is an n × n matrix, show that the cost of Variant 1 is approximately $\frac{2}{3} n^3$ flops.

Answer: During the kth iteration, L00 is k × k, for k = 0, . . . , n − 1. Then the (approximate) cost of the steps are given by

• Solve $L_{00} u_{01} = a_{01}$ for $u_{01}$, overwriting $a_{01}$ with the result. Cost: approximately $k^2$ flops.

• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}$), overwriting $a_{10}^T$ with the result. Cost: approximately $k^2$ flops.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result. Cost: approximately $2k$ flops.

Thus, the total cost is given by

$$
\sum_{k=0}^{n-1} \left( k^2 + k^2 + 2k \right) \approx 2 \sum_{k=0}^{n-1} k^2 \approx 2 \cdot \frac{1}{3} n^3 = \frac{2}{3} n^3.
$$
End of Answer

8.2 Variant 2: Left-looking algorithm

Consider the loop invariant:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & \widehat A_{TR} \\ L_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{BL} U_{TL} = \widehat A_{BL}
$$
At the top of the loop, after repartitioning, A contains

$$
\begin{pmatrix} L\backslash U_{00} & \widehat a_{01} & \widehat A_{02} \\ l_{10}^T & \widehat\alpha_{11} & \widehat a_{12}^T \\ L_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$
while at the bottom it must contain
$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & \widehat A_{02} \\ l_{10}^T & \upsilon_{11} & \widehat a_{12}^T \\ L_{20} & l_{21} & \widehat A_{22} \end{pmatrix}
$$

where $u_{01}$, $\upsilon_{11}$, and $l_{21}$ are to be computed. Now, considering $LU = \widehat A$ we notice that

$$
\begin{array}{lll}
L_{00} U_{00} = \widehat A_{00} & L_{00} u_{01} = \widehat a_{01} & L_{00} U_{02} = \widehat A_{02} \\
l_{10}^T U_{00} = \widehat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \widehat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \widehat a_{12}^T \\
L_{20} U_{00} = \widehat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \widehat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \widehat A_{22}
\end{array}
$$
The following equalities can be used to compute the desired parts of L and U:

• Solve L00u01 = a01 for u01, overwriting a01 with the result.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.

• Compute l21 := (a21 − L20u01)/υ11, overwriting a21 with the result.
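These updates can likewise be sketched in NumPy. Again, the function name is ours and the code is an illustrative sketch, not the FLAME@lab implementation the notes work with:

```python
import numpy as np

def lu_left_looking(A):
    """Variant 2 (left-looking) LU: each iteration finishes one column of L\\U.
    A sketch assuming square A with nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        if k:
            L00 = np.tril(A[:k, :k], -1) + np.eye(k)
            A[:k, k] = np.linalg.solve(L00, A[:k, k])  # solve L00 u01 = a01
        A[k, k] -= A[k, :k] @ A[:k, k]                 # upsilon11 = alpha11 - l10^T u01
        A[k+1:, k] -= A[k+1:, :k] @ A[:k, k]           # a21 := a21 - L20 u01
        A[k+1:, k] /= A[k, k]                          # l21 = a21 / upsilon11
    return A
```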

8.3 Variant 3: Up-looking variant

Exercise 16. Derive the up-looking variant for computing the LU factorization.

Answer: Consider the loop invariant:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ \widehat A_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{TL} U_{TR} = \widehat A_{TR}
$$
At the top of the loop, after repartitioning, A contains

$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & U_{02} \\ \widehat a_{10}^T & \widehat\alpha_{11} & \widehat a_{12}^T \\ \widehat A_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$

while at the bottom it must contain
$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & U_{02} \\ l_{10}^T & \upsilon_{11} & u_{12}^T \\ \widehat A_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$
where $l_{10}^T$, $\upsilon_{11}$, and $u_{12}^T$ are to be computed. Now, considering $LU = \widehat A$ we notice that

$$
\begin{array}{lll}
L_{00} U_{00} = \widehat A_{00} & L_{00} u_{01} = \widehat a_{01} & L_{00} U_{02} = \widehat A_{02} \\
l_{10}^T U_{00} = \widehat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \widehat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \widehat a_{12}^T \\
L_{20} U_{00} = \widehat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \widehat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \widehat A_{22}
\end{array}
$$
The following equalities can be used to compute the desired parts of L and U:

• Solve $l_{10}^T U_{00} = a_{10}^T$ for $l_{10}^T$, overwriting $a_{10}^T$ with the result.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.

• Compute $u_{12}^T := a_{12}^T - l_{10}^T U_{02}$, overwriting $a_{12}^T$ with the result.

End of Answer
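A NumPy sketch of this row-at-a-time variant (our own function name, same hedged assumptions as before):

```python
import numpy as np

def lu_up_looking(A):
    """Variant 3 (up-looking) LU: each iteration finishes one row of L\\U.
    A sketch assuming square A with nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        if k:
            U00 = np.triu(A[:k, :k])
            A[k, :k] = np.linalg.solve(U00.T, A[k, :k])  # solve l10^T U00 = a10^T
        A[k, k] -= A[k, :k] @ A[:k, k]                   # upsilon11
        A[k, k+1:] -= A[k, :k] @ A[:k, k+1:]             # u12^T = a12^T - l10^T U02
    return A
```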

8.4 Variant 4: Crout variant

Consider the loop invariant:
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{TL} & U_{TR} \\ L_{BL} & \widehat A_{BR} \end{pmatrix}
\;\wedge\; L_{TL} U_{TL} = \widehat A_{TL},\; L_{TL} U_{TR} = \widehat A_{TR},\; L_{BL} U_{TL} = \widehat A_{BL}
$$
At the top of the loop, after repartitioning, A contains

$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & U_{02} \\ l_{10}^T & \widehat\alpha_{11} & \widehat a_{12}^T \\ L_{20} & \widehat a_{21} & \widehat A_{22} \end{pmatrix}
$$
while at the bottom it must contain
$$
\begin{pmatrix} L\backslash U_{00} & u_{01} & U_{02} \\ l_{10}^T & \upsilon_{11} & u_{12}^T \\ L_{20} & l_{21} & \widehat A_{22} \end{pmatrix}
$$
where $\upsilon_{11}$, $l_{21}$, and $u_{12}^T$ are to be computed. Now, considering $LU = \widehat A$ we notice that

$$
\begin{array}{lll}
L_{00} U_{00} = \widehat A_{00} & L_{00} u_{01} = \widehat a_{01} & L_{00} U_{02} = \widehat A_{02} \\
l_{10}^T U_{00} = \widehat a_{10}^T & l_{10}^T u_{01} + \upsilon_{11} = \widehat\alpha_{11} & l_{10}^T U_{02} + u_{12}^T = \widehat a_{12}^T \\
L_{20} U_{00} = \widehat A_{20} & L_{20} u_{01} + \upsilon_{11} l_{21} = \widehat a_{21} & L_{20} U_{02} + l_{21} u_{12}^T + L_{22} U_{22} = \widehat A_{22}
\end{array}
$$
The following equalities can be used to compute the desired parts of L and U:

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01}$, overwriting $\alpha_{11}$ with the result.

• Compute $l_{21} := (a_{21} - L_{20} u_{01})/\upsilon_{11}$, overwriting $a_{21}$ with the result.

• Compute $u_{12}^T := a_{12}^T - l_{10}^T U_{02}$, overwriting $a_{12}^T$ with the result.
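The Crout updates translate directly into a NumPy sketch (again, our own function name and an illustration only):

```python
import numpy as np

def lu_crout(A):
    """Variant 4 (Crout) LU: each iteration finishes one column of L and one row of U.
    A sketch assuming square A with nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        A[k, k] -= A[k, :k] @ A[:k, k]                                 # upsilon11
        A[k+1:, k] = (A[k+1:, k] - A[k+1:, :k] @ A[:k, k]) / A[k, k]   # l21
        A[k, k+1:] -= A[k, :k] @ A[:k, k+1:]                           # u12^T
    return A
```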

Algorithm: A := L\U = LU(A)

Partition
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
$$
where ATL is 0 × 0

while n(ATL) < n(A) do

Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
$$
where α11 is 1 × 1

A00 contains L00 and U00 in its strictly lower and upper triangular part, respectively.

Variant 1 (Bordered):
  $a_{01} := L_{00}^{-1} a_{01}$
  $a_{10}^T := a_{10}^T U_{00}^{-1}$
  $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$

Variant 2 (Left-looking):
  $a_{01} := L_{00}^{-1} a_{01}$
  $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$
  $a_{21} := a_{21} - A_{20} a_{01}$
  $a_{21} := a_{21}/\alpha_{11}$

Variant 3 (Up-looking): Exercise in 8.3

Variant 4 (Crout variant):
  $\alpha_{11} := \alpha_{11} - a_{10}^T a_{01}$
  $a_{21} := a_{21} - A_{20} a_{01}$
  $a_{21} := a_{21}/\alpha_{11}$
  $a_{12}^T := a_{12}^T - a_{10}^T A_{02}$

Variant 5 (Classical LU):
  $a_{21} := a_{21}/\alpha_{11}$
  $A_{22} := A_{22} - a_{21} a_{12}^T$

Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
$$
endwhile

Figure 9: All five LU factorization algorithms.

8.5 Variant 5: Classical LU factorization We have already derived this algorithm. You may want to try rederiving it using the techniques discussed in this section.

8.6 All algorithms All five algorithms for LU factorization are summarized in Figure 9. Exercise 17. Implement all five LU factorization algorithms with the FLAME@lab API, in M-script.
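For readers working outside FLAME@lab, the remaining variant can also be sketched in NumPy to round out the set shown in Figure 9 (our own function name; an illustration of Variant 5, not a reference implementation):

```python
import numpy as np

def lu_classical(A):
    """Variant 5 (classical) LU via rank-1 updates; overwrites a copy of A with L\\U.
    A sketch assuming square A with nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        A[k+1:, k] /= A[k, k]                              # l21 = a21 / alpha11
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # A22 := A22 - l21 u12^T
    return A
```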

Exercise 18. Which of the five variants can be modified to incorporate partial pivoting?

Answer: Variants 2, 4, and 5. End of Answer

8.7 Formal derivation of algorithms

The described approach to deriving algorithms, linking the process to the a priori identification of loop invariants, was first proposed in [10]. It was refined into what we call the "worksheet" for deriving algorithms hand-in-hand with their proofs of correctness in [3]. A book that describes the process at a level also appropriate for the novice is "The Science of Programming Matrix Computations" [14].

9 Results

The numerical stability of the various LU factorization algorithms as well as the triangular solve algorithms can be found in standard graduate level texts and references [8, 11, 13]. Of particular interest may be the analysis of the Crout variant (Variant 4) in [5], since it uses our notation as well as the results in "Notes on Numerical Stability". (We recommend the technical report version [4] of the paper, since it has more details as well as exercises to help the reader understand.) In that paper, a systematic approach towards the derivation of backward error results is given that mirrors the systematic approach to deriving the algorithms given in [10, 3, 14].

Here are pertinent results from that paper, assuming floating point arithmetic obeys the model of computation given in "Notes on Numerical Stability" (as well as [5, 4, 11]). It is assumed that the reader is familiar with those notes.

Theorem 19. Let $A \in \mathbb{R}^{n\times n}$ and let the LU factorization of A be computed via the Crout variant (Variant 4), yielding approximate factors $\check L$ and $\check U$. Then

$$(A + \Delta A) = \check L \check U \quad\mbox{with}\quad |\Delta A| \leq \gamma_n |\check L| |\check U|.$$

Theorem 20. Let $L \in \mathbb{R}^{n\times n}$ be lower triangular and $y, z \in \mathbb{R}^n$ with $Lz = y$. Let $\check z$ be the approximate solution that is computed. Then $(L + \Delta L)\check z = y$ with $|\Delta L| \leq \gamma_n |L|$.

Theorem 21. Let $U \in \mathbb{R}^{n\times n}$ be upper triangular and $x, z \in \mathbb{R}^n$ with $Ux = z$. Let $\check x$ be the approximate solution that is computed. Then $(U + \Delta U)\check x = z$ with $|\Delta U| \leq \gamma_n |U|$.

Theorem 22. Let $A \in \mathbb{R}^{n\times n}$ and $x, y \in \mathbb{R}^n$ with $Ax = y$. Let $\check x$ be the approximate solution computed via the following steps:

• Compute the LU factorization, yielding approximate factors $\check L$ and $\check U$.

• Solve Lzˇ = y, yielding approximate solution zˇ. • Solve Uxˇ =z ˇ, yielding approximate solution xˇ.

Then $(A + \Delta A)\check x = y$ with $|\Delta A| \leq (3\gamma_n + \gamma_n^2)|\check L||\check U|$.

The analysis of LU factorization without partial pivoting is related to that of LU factorization with partial pivoting. We have shown that LU with partial pivoting is equivalent to the LU factorization without partial pivoting on a pre-permuted matrix: PA = LU, where P is a permutation matrix. The permutation doesn't involve any floating point operations and therefore does not generate error. It can therefore be argued that, as a result, the error that is accumulated is equivalent with or without partial pivoting.
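The elementwise bound of Theorem 19 can be observed numerically. The sketch below uses helper names of our own choosing; in the check, the factor 2 and the small absolute slack are our way of accounting for the extra rounding incurred when the product of the computed factors is itself formed in floating point:

```python
import numpy as np

def crout_lu(A):
    """Crout (Variant 4) LU without pivoting; returns (L, U). A sketch."""
    F = np.array(A, dtype=float)
    n = F.shape[0]
    for k in range(n):
        F[k, k] -= F[k, :k] @ F[:k, k]
        F[k+1:, k] = (F[k+1:, k] - F[k+1:, :k] @ F[:k, k]) / F[k, k]
        F[k, k+1:] -= F[k, :k] @ F[:k, k+1:]
    return np.tril(F, -1) + np.eye(n), np.triu(F)

def gamma(n, u=np.finfo(float).eps / 2):
    # gamma_n = n u / (1 - n u), with u the unit roundoff
    return n * u / (1 - n * u)
```

One can then generate a matrix A, factor it, and verify that `np.abs(A - L @ U)` stays below (a small multiple of) `gamma(n) * (np.abs(L) @ np.abs(U))` elementwise.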

10 Is LU with Partial Pivoting Stable?

Exercise 23. Apply LU with partial pivoting to

 1 0 0 ··· 0 1     −1 1 0 ··· 0 1     −1 −1 1 ··· 0 1    A =  ......  .  ......       −1 −1 ··· 1 1  −1 −1 · · · −1 1

Pivot only when necessary.

Answer: Notice that no pivoting is necessary. Eliminating the entries below the diagonal in the first column yields:
$$
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 2 \\
0 & -1 & 1 & \cdots & 0 & 2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & -1 & \cdots & -1 & 1 & 2 \\
0 & -1 & \cdots & -1 & -1 & 2
\end{pmatrix}.
$$
Eliminating the entries below the diagonal in the second column yields:

 1 0 0 ··· 0 1     0 1 0 ··· 0 2     0 0 1 ··· 0 4     ......  .  ......       0 0 ··· 1 4  0 0 · · · −1 4

Eliminating the entries below the diagonal in the (n − 1)st column yields:

 1 0 0 ··· 0 1     0 1 0 ··· 0 2     0 0 1 ··· 0 4     ......  .  ......     n−2   0 0 ··· 1 2  0 0 ··· 0 2n−1

End of Answer

From this exercise we conclude that even LU factorization with partial pivoting can yield large (exponential) element growth in U. You may enjoy the collection of problems for which Gaussian elimination with partial pivoting is unstable by Stephen Wright [15]. In practice, such growth does not seem to happen, and LU factorization with partial pivoting is considered to be stable.
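The growth can be reproduced numerically. In the sketch below (helper names are ours), partial pivoting performs no row swaps on this matrix, and the last diagonal entry of U comes out as exactly $2^{n-1}$:

```python
import numpy as np

def growth_matrix(n):
    """Unit diagonal, -1 strictly below it, last column all ones."""
    A = np.tril(-np.ones((n, n)), -1) + np.eye(n)
    A[:, -1] = 1.0
    return A

def lu_partial_pivoting(A):
    """LU with partial pivoting; returns (p, F) with F holding L\\U. A sketch."""
    F = np.array(A, dtype=float)
    n = F.shape[0]
    p = np.arange(n)
    for k in range(n):
        m = k + int(np.argmax(np.abs(F[k:, k])))  # index of the pivot row
        F[[k, m]] = F[[m, k]]
        p[[k, m]] = p[[m, k]]
        F[k+1:, k] /= F[k, k]
        F[k+1:, k+1:] -= np.outer(F[k+1:, k], F[k, k+1:])
    return p, F
```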

Algorithm: A := LU(A)

Partition
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}
$$
where ATL is 0 × 0

while m(ATL) < m(A) do

Determine block size b

Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
$$
where A11 is b × b

A00 contains L00 and U00 in its strictly lower and upper triangular part, respectively.

Variant 1:
  $A_{01} := L_{00}^{-1} A_{01}$
  $A_{10} := A_{10} U_{00}^{-1}$
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{11} := LU(A_{11})$

Variant 2:
  $A_{01} := L_{00}^{-1} A_{01}$
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{11} := LU(A_{11})$
  $A_{21} := (A_{21} - A_{20} A_{01}) U_{11}^{-1}$

Variant 3:
  $A_{10} := A_{10} U_{00}^{-1}$
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{11} := LU(A_{11})$
  $A_{12} := A_{12} - A_{10} A_{02}$
  $A_{12} := L_{11}^{-1} A_{12}$

Variant 4:
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{11} := LU(A_{11})$
  $A_{21} := (A_{21} - A_{20} A_{01}) U_{11}^{-1}$
  $A_{12} := L_{11}^{-1} (A_{12} - A_{10} A_{02})$

Variant 5:
  $A_{11} := LU(A_{11})$
  $A_{21} := A_{21} U_{11}^{-1}$
  $A_{12} := L_{11}^{-1} A_{12}$
  $A_{22} := A_{22} - A_{21} A_{12}$

Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix}
$$
endwhile

Figure 10: Blocked algorithms for computing the LU factorization.

11 Blocked Algorithms

It is well-known that matrix-matrix multiplication can achieve high performance on most computer architectures [1, 9, 7]. As a result, many dense matrix algorithms are reformulated to be rich in matrix-matrix multiplication operations. An interface to a library of such operations is known as the level-3 Basic Linear Algebra Subprograms (BLAS) [6]. In this section, we show how LU factorization can be rearranged so that most computation is in matrix-matrix multiplications.

Algorithm: [A, p] := LUpiv blk(A, p)

Partition
$$
A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix},\;
p \rightarrow \begin{pmatrix} p_T \\ p_B \end{pmatrix}
$$
where ATL is 0 × 0, pT has 0 elements.

while n(ATL) < n(A) do

Determine block size b

Repartition
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix},\;
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \rightarrow
\begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix}
$$
where A11 is b × b, p1 has b elements

Variant 2:
  $A_{01} := L_{00}^{-1} A_{01}$
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{21} := A_{21} - A_{20} A_{01}$
  $\left[ \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right] := LUpiv\left( \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right)$
  $\begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix} := P(p_1) \begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix}$

Variant 4:
  $A_{11} := A_{11} - A_{10} A_{01}$
  $A_{21} := A_{21} - A_{20} A_{01}$
  $\left[ \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right] := LUpiv\left( \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right)$
  $\begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix} := P(p_1) \begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix}$
  $A_{12} := A_{12} - A_{10} A_{02}$
  $A_{12} := L_{11}^{-1} A_{12}$

Variant 5:
  $\left[ \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right] := LUpiv\left( \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}, p_1 \right)$
  $\begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix} := P(p_1) \begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix}$
  $A_{12} := L_{11}^{-1} A_{12}$
  $A_{22} := A_{22} - A_{21} A_{12}$

Continue with
$$
\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix},\;
\begin{pmatrix} p_T \\ p_B \end{pmatrix} \leftarrow
\begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix}
$$
endwhile

Figure 11: Blocked algorithms for computing the LU factorization with partial pivoting.

11.1 Blocked classical LU factorization (Variant 5)

Partition A, L, and U as follows:
$$
A \rightarrow \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},\;
L \rightarrow \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix},\;
U \rightarrow \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix},
$$

where A11, L11, and U11 are b × b. Then A = LU means that
$$
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} =
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix} =
\begin{pmatrix} L_{11} U_{11} & L_{11} U_{12} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} \end{pmatrix}.
$$

This means that
$$
\begin{array}{ll}
A_{11} = L_{11} U_{11} & A_{12} = L_{11} U_{12} \\
A_{21} = L_{21} U_{11} & A_{22} = L_{21} U_{12} + L_{22} U_{22}
\end{array}
$$
or, equivalently,
$$
\begin{array}{ll}
A_{11} = L_{11} U_{11} & A_{12} = L_{11} U_{12} \\
A_{21} = L_{21} U_{11} & A_{22} - L_{21} U_{12} = L_{22} U_{22}.
\end{array}
$$
If we let L and U overwrite the original matrix A this suggests the algorithm

• Compute the LU factorization A11 = L11U11, overwriting A11 with L\U11. Notice that any of the “unblocked” algorithms previously discussed in this note can be used for this factorization.

• Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be expressed as $A_{12} := L_{11}^{-1} A_{12}$.)

• Solve $L_{21} U_{11} = A_{21}$ for $L_{21}$, overwriting $A_{21}$ with $L_{21}$. (This can also be expressed as $A_{21} := A_{21} U_{11}^{-1}$.)

• Update A22 := A22 − A21A12.

• Continue by overwriting the updated A22 with its LU factorization.

If b is small relative to n, then most computation is in the last step, which is a matrix-matrix multiplication. Similarly, blocked algorithms for the other variants can be derived. All are given in Figure 10.
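The blocked steps above can be sketched in NumPy. The function name and block size parameter `b` are ours; inside each iteration, the diagonal block is factored with the unblocked classical algorithm:

```python
import numpy as np

def lu_blocked(A, b=2):
    """Blocked classical LU (Variant 5), no pivoting; overwrites a copy of A with L\\U.
    A sketch assuming nonsingular leading principal submatrices."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(0, n, b):
        e = min(k + b, n)
        # Factor the diagonal block A11 with the unblocked classical algorithm
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        U11 = np.triu(A[k:e, k:e])
        A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])        # A12 := L11^{-1} A12
        A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T  # A21 := A21 U11^{-1}
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]                 # A22 := A22 - A21 A12 (bulk of the flops)
    return A
```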

11.2 Blocked classical LU factorization with pivoting (Variant 5)

Pivoting can be added to some of the blocked algorithms. Let us focus once again on Variant 5. Partition A, L, and U as follows:
$$
A \rightarrow \begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix},\;
L \rightarrow \begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & L_{22} \end{pmatrix},\;
U \rightarrow \begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & U_{11} & U_{12} \\ 0 & 0 & U_{22} \end{pmatrix},
$$

where A00, L00, and U00 are k × k, and A11, L11, and U11 are b × b. Assume that the computation has proceeded to the point where A contains
$$
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & \widehat A_{11} - L_{10} U_{01} & \widehat A_{12} - L_{10} U_{02} \\ L_{20} & \widehat A_{21} - L_{20} U_{01} & \widehat A_{22} - L_{20} U_{02} \end{pmatrix},
$$

where, as before, $\widehat A$ denotes the original contents of A and
$$
P(p_0) \begin{pmatrix} \widehat A_{00} & \widehat A_{01} & \widehat A_{02} \\ \widehat A_{10} & \widehat A_{11} & \widehat A_{12} \\ \widehat A_{20} & \widehat A_{21} & \widehat A_{22} \end{pmatrix} =
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & I & 0 \\ L_{20} & 0 & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & A_{11} & A_{12} \\ 0 & A_{21} & A_{22} \end{pmatrix}.
$$

In the current blocked step, we now perform the following computations

• Compute the LU factorization with pivoting of the "current panel" $\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}$:
$$
P(p_1) \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} = \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} U_{11},
$$

overwriting A11 with L\U11 and A21 with L21.

• Correspondingly, swap rows in the remainder of the matrix:
$$
\begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix} := P(p_1) \begin{pmatrix} A_{10} & A_{12} \\ A_{20} & A_{22} \end{pmatrix}.
$$

• Solve $L_{11} U_{12} = A_{12}$, overwriting $A_{12}$ with $U_{12}$. (This can also be more concisely written as $A_{12} := L_{11}^{-1} A_{12}$.)

• Update $A_{22} := A_{22} - A_{21} A_{12}$.

Careful consideration shows that this puts the matrix A in the state
$$
\begin{pmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{pmatrix} =
\begin{pmatrix} L\backslash U_{00} & U_{01} & U_{02} \\ L_{10} & L\backslash U_{11} & U_{12} \\ L_{20} & L_{21} & \widehat A_{22} - L_{20} U_{02} - L_{21} U_{12} \end{pmatrix},
$$
where
$$
P\begin{pmatrix} p_0 \\ p_1 \end{pmatrix}
\begin{pmatrix} \widehat A_{00} & \widehat A_{01} & \widehat A_{02} \\ \widehat A_{10} & \widehat A_{11} & \widehat A_{12} \\ \widehat A_{20} & \widehat A_{21} & \widehat A_{22} \end{pmatrix} =
\begin{pmatrix} L_{00} & 0 & 0 \\ L_{10} & L_{11} & 0 \\ L_{20} & L_{21} & I \end{pmatrix}
\begin{pmatrix} U_{00} & U_{01} & U_{02} \\ 0 & U_{11} & U_{12} \\ 0 & 0 & A_{22} \end{pmatrix}.
$$
Similarly, blocked algorithms with pivoting for some of the other variants can be derived. All are given in Figure 11.

12 Variations on a Triple-Nested Loop

All LU factorization algorithms presented in this note perform exactly the same floating point operations (with some rearrangement of data thrown in for the algorithms that perform pivoting) as does the triple-nested loop that implements Gaussian elimination:

for j = 0, . . . , n − 1                      (zero the elements below the (j, j) element)
    for i = j + 1, . . . , n − 1
        αi,j := αi,j/αj,j                     (compute multiplier λi,j, overwriting αi,j)
        for k = j + 1, . . . , n − 1          (subtract λi,j times the jth row from the ith row)
            αi,k := αi,k − αi,j αj,k
        endfor
    endfor
endfor
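A direct transliteration into Python (scalar loops, deliberately unoptimized; the function name is ours) makes the equivalence concrete:

```python
import numpy as np

def gaussian_elimination(A):
    """The triple-nested loop above, verbatim; overwrites a copy of A with the
    multipliers (strictly lower part) and U (upper part)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for j in range(n):                 # zero the elements below the (j, j) element
        for i in range(j + 1, n):
            A[i, j] /= A[j, j]         # compute multiplier lambda_{i,j}, overwriting alpha_{i,j}
            for k in range(j + 1, n):  # subtract lambda_{i,j} times row j from row i
                A[i, k] -= A[i, j] * A[j, k]
    return A
```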

References

[1] E. Anderson, Z. Bai, J. Demmel, J. E. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. E. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide. SIAM, Philadelphia, 1992.

[2] Paolo Bientinesi. Mechanical Derivation and Systematic Analysis of Correct Linear Algebra Algorithms. PhD thesis, 2006.

[3] Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Ortí, and Robert A. van de Geijn. The science of deriving dense linear algebra algorithms. ACM Trans. Math. Soft., 31(1):1–26, March 2005. Download free by visiting http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html (Journal publication #3).

[4] Paolo Bientinesi and Robert A. van de Geijn. The science of deriving stability analyses. FLAME Working Note #33. Technical Report AICES-2008-2, Aachen Institute for Computational Engineering Sciences, RWTH Aachen, November 2008.

[5] Paolo Bientinesi and Robert A. van de Geijn. Goal-oriented and modular stability analysis. SIAM J. Matrix Anal. Appl., 32(1):286–308, March 2011. We suggest you read FLAME Working Note #33 for more details.

[6] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Soft., 16(1):1–17, March 1990.

[7] Jack J. Dongarra, Iain S. Duff, Danny C. Sorensen, and Henk A. van der Vorst. Solving Linear Systems on Vector and Shared Memory Computers. SIAM, Philadelphia, PA, 1991.

[8] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.

[9] Kazushige Goto and Robert van de Geijn. Anatomy of high-performance matrix multiplication. ACM Trans. Math. Soft., 34(3):12:1–12:25, May 2008. Download free by visiting http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html (Journal publication #11).

[10] John A. Gunnels, Fred G. Gustavson, Greg M. Henry, and Robert A. van de Geijn. FLAME: Formal linear algebra methods environment. ACM Trans. Math. Soft., 27(4):422–455, December 2001. Download free by visiting http://www.cs.utexas.edu/users/flame/web/FLAMEPublications.html (Journal publication #1).

[11] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002.

[12] Margaret E. Myers, Pierce M. van de Geijn, and Robert A. van de Geijn. Linear Algebra: Foundations to Frontiers - Notes to LAFF With. Self published, 2014. Available from http://www.ulaff.net.

[13] G. W. Stewart. Matrix Algorithms Volume 1: Basic Decompositions. SIAM, 1998.

[14] Robert A. van de Geijn and Enrique S. Quintana-Ortí. The Science of Programming Matrix Computations. www.lulu.com/contents/contents/1911788/, 2008.

[15] Stephen J. Wright. A collection of problems for which Gaussian elimination with partial pivoting is unstable. SIAM J. Sci. Comput., 14(1):231–238, 1993.
