Linear Algebra – Brief Review LAPACK Solving Linear Simultaneous Equations LU Decomposition – Rephrase Gaussian Elimination

Linear Algebra { brief review LAPACK Many good long textbooks Free packages. Download library. DO NOT CODE { use excellent free packages Search engine to find correct routine for you Nonlinear fluids ! many linear sub-problems, linear equations or linear least squares, e.g. Poisson problem, e.g. linear stability I or eigenvalues, singular decomposition, generalised Questions I precision: single/double, real/complex I \matrix inversion": Ax = b I matrix type: symmetric, SPD, banded I eigenvalues: Ae = λe Driver routine, calls computational routines, calls auxiliary (BLAS) Matrices Real, single, general matrix, linear equations SGESV(N; Nrhs; A; LDA; IPIV ; B; LBD; info) dense or sparse I where matrix A is N × N, with Nrhs b's in B. I symmetric, positive definite, banded,. Solving linear simultaneous equations LU decomposition { rephrase Gaussian elimination 1. Gaussian elimination Lower and Upper triangular a11x1 + a12x2 + ::: + a1nxn = b1 a21x1 + a22x2 + ::: + a2nxn = b2 01 0 0 01 0::::1 . B: 1 0 0C B0 :::C L = B C U = B C an1x1 + an2x2 + ::: + annxn = bn @:: 1 0A @0 0 ::A ::: 1 0 0 0 : Divide 1st eqn by a11, so coef x1 is 1 Subtract 1st eqn ×ak1 from kth eqn, so coef x1 becomes 0 Repeat on (n − 1) × (n − 1) subsystem of eqn 2 ! n Step k = 1 ! n: Repeat on even smaller subsystems ukj = akj for j = k ! n Finally back-solve ìk = aik =akk for i = k ! n aij aij − ìk ukj for i = k + 1 ! n, for j = k + 1 ! n annxn = bn ! xn an−1 n−1xn−1+ an−1 nxn = bn−1 ! xn−1 For a dense matrix 1 n3 multiplies . 3 . For a tridiagonal matrix, avoiding zeros 2n multiplies ! x1 Solve LUx = b by LU: pivoting Forward Ly = b `11y1 = b1 ! y1 Problem at step k if akk = 0 `21y1 + `22y2 = b1 ! y − 2 Find largest ajk in j = k ! n, say at j = ` . Swap rows k and ` { use index mapping (permutation matrix) ! yn Partial pivoting = swapping rows Full pivoting = swap rows and columns { rarely better Backward Ux = y unnxn = yn ! xn I Note det A = Πi uii un−1 n−1xn−1 + un−1 nxn = yn−1 ! xn−1 Symmetric A: A = LDLT with diagonal D . I . 1=2 1=2 T . I Sym & positive definite: A = (LD )(LD ) Cholesky ! x1 I Tridiagonal A: L diagonal and one under, U diagonal and one above. Finding LU is O(n3) but solving LUx = b for a new b is only O(n2) Errors Ax = b QR decomposition A = QR Small error in b could become /λmin error in solution, while worst solution is b/λmax I R upper triangular Thus relative error in solution could increase by factor T I Q orthogonal, QQ = I , i.e. columns orthonormal λ So at no cost Q−1 = QT K = max = condition number of A λmin I May not stretch/increase errors like LU I Used for eigenvalues I det A = Π r Theoretically LU decomposition gives bigger errors, but not often i ii Q not unique 3 methods: Gram-Schmidt, Givens, Householder QR Gram-Schmidt QR Givens rotation Columns of A a ; a ;:::; a 1 2 n Q = product of many rotations 0 0 0 q1 = a1 q1 = q1=jq1j 0 1 0 0 0 ij q2 = a2 −(a2 · q1)q1 q2 = q2=jq2j 0 0 0 B 1 C q = a3 −(a3 · q1)q1 −(a3 · q2)q2 q3 = q =jq j B C 3 3 3 B 1 C . B C . Bi cos θ sin θ C B C Gij = B 1 C Q = matrix with columns q1; q2;:::; qn B C B 1 C B C Let Bj − sin θ cos θ C 0 B C rii = jqi j; and rij = aj · qi ; i < j @ 1 A Then 1 j X G A alters rows and columns i and j a = q r i.e. A = QR ij j j ij Choose θ to zero an off-diagonal i=1 Strategy to avoid filling previous zeros Can parallelise Better: when produce qi project it out of aj j > i QR Householder Sparse matrices Q = product of many reflections Do not store all A, just non-zero elements in \packed" form hhT H = I − 2 h · h Evaluating Ax cheaper than O(n2) e.g. Poisson on N × N grid, A is N2 × N2 with 5N2 non-zero, T 2 4 Take h1 = a1 + (α1; 0;:::; 0) with α1 = ja1jsign(a11) so Ax is 5N not N So 2 h1 · a1 = ja1j + ja11jja1j and h1 · h1 = twice LU and QR \direct methods" for dense (faster if banded) Hence Use iterative method for sparse A T H1a1 = (−α1; 0;:::; 0) i.e. A = B + C ! x = B−1(b − Cx ) Now work on (n − 1) × (n − 1) subsystem in same way n+1 n converges if jB−1Cj < 1, e.g. Sor Note Hx is O(n) operations, not O(n3) Hence forming Q is O(n3) Conjugate gradients { A symmetric, positive definite GC not steepest descent rf { actually a direct method, but usually converges well before n steps Solve Ax = b by minimising quadratic Steepest descent ! rattle from side to side across steep valley 1 T −1 1 T T 1 T with no movement along the valley floor f (x) = 2 (Ax − b) A (Ax − b) = 2 x Ax − x b + 2 b Ab with Need new direction v which does not reset u minimisation rf = Ax − b 1 2 T f (xn + αu + βv) = f (xn) + αu · rfn + 2 α u Au T 1 2 T +αβu Av + βv · rfn + 2 β v Av From xn look in direction u for minimum 1 2 T f (xn + αu) = f (xn) + αu · rfn + 2 α u Au Hence need uT Av = 0 \conjugate directions" A i.e. minimum at α = −u · rfn=u u Choose u? steepest descent u = rf ? NO Conjugate Gradient Algorithm Precondition Ax = b same solution as B−1Ax = B−1b Start x0 and u0 Choose B with easy inverse and B−1A sparse Residual rn = Axn − b = rfn Typical ILU = incomplete LU, few large elements Iterate T un rn Non-symmetric A Minimising α = − T xn+1 = xn + αun un Aun Gmres minimises (Ax − b)T (Ax − b) r = r + αAu T { but condition number K 2 n+1 n n rn+1Aun Conj grad β = − T Gmres(n) restart after n { avoids large storage un+1 = rn+1 + βun un Aun Note only one matrix evaluation per iteration { good sparse If tough, then SVD = singular value decomposition X T Can show un+1 conjugate all ui i = 1; 2;:::; n A = USV = ui λi vi i r T r r T r n n n+1 n+1 with v and u eigenvalues and adjoints, λ eigenvalues Can show α = T , β = T i un Aun rn rn Eigenproblems Ae = λe and generalised Ae = λBe Power iteration { for largest evalue Start random x0 n I No finite/direct method { must iterate Iterate a few times xn+1 = Axn = A x0 I A real & symmetric { nice orthogonal evectors xn becomes dominated by evector with largest evalue, so I A not symmetric { possible degenerate cases also non-normal modes (& pseud-spectra. ) λapprox = jAxx j=jxnj; eapprox = Axx =jAxnj d x −1 k2 x = IC x(0) = 0y(0) = 1 With this crude approximation invert dt y 0 −1 − k y (A − λ I )−1 has solution x = k(e−t − e(1+k)t ) approx which eventually decays but before is k larger than IC. which has one very large evalue 1=(λcorrect − λapprox), so power iteration on this converges very rapidly Henceforth A real and symmetric Find other evalues with µ-shifts (A − µI )−1 Jacobi { small A only Main method Step 1: reduce to Hessenberg H, upper triangular plus one below diagonal 2 Arnoldi (GS on Kyrlov space q1; Aq1; A q1;:::) Find maximum off-diagonal aij Given unit q1, step k = 1 ! n − 1 Givens rotation Gij with θ to zero aij , and aji by symmetry v = Aqk A0 = GAG T has same evalues for j = 1 ! k Hjk = qj · v, v v − Hjk qj Hkk = jvj Does fill in previous zeros, qk+1 = v=Hk+1 k but sum of off-diagonals squared decreases by a2 ij Hence Hence converges to diagonal (=evalues) form original v = Aqk = Hk+1 k qk+1 + Hkk qk + ::: + H1k q1 i.e. A (q1; q2;:::; qn) = (q1; q2;:::; qn) H i.e. AQ = QH or H = QT AQ with same evalues as A Cost O(n2) if dense Main method, step 2 H = QT AQ Hessenberg a. QR Find QR decomposition of H Set H0 = RQ = QT AQ A symmetric ! H symmetric, hence tridiagonal { remains Hessenberg/Tridiagonal Hence reduce `for j = 1 ! k' to `for j = k − 1; k', { off-diagonals reduced by λi /λj Cost ! O(n2) (Lanzcos) ! converges to diagonal, of evalues b. Power iteration { quick when tridiagonal NB: making qk+1 orthogonal to qk & qk−1 gives qk+1 orthogonal to qj j = k; k − 1; k − 2;:::; 1 c. Root solve det(A − λI ) = 0 { quick if tridiagonal cf conjugate gradient BUT USE PACKAGES.

Linear Algebra – Brief Review LAPACK Solving Linear Simultaneous Equations LU Decomposition – Rephrase Gaussian Elimination

Recursive Approach in Sparse Matrix LU Factorization

Lecture 5 - Triangular Factorizations & Operation Counts

LU Factorization LU Decomposition LU Decomposition: Motivation LU Decomposition

LU-Factorization 1 Introduction 2 Upper and Lower Triangular Matrices

LU, QR and Cholesky Factorizations Using Vector Capabilities of Gpus LAPACK Working Note 202 Vasily Volkov James W

LU and Cholesky Decompositions. J. Demmel, Chapter 2.7

Math 2270 - Lecture 27 : Calculating Determinants

LU Decomposition S = LU, Then We Can  1 0 0   2 4 −2   2 4 −2  Use It to Solve Sx = F

Scalable Sparse Symbolic LU Factorization on Gpus

LU, QR and Cholesky Factorizations Using Vector Capabilities of Gpus LAPACK Working Note 202 Vasily Volkov James W

Gaussian Elimination and Lu Decomposition (Supplement for Ma511)

Effective GPU Strategies for LU Decomposition H