Linear Algebra, Gaussian Elimination, Norms, Numerical Stability
Thomas Finley, [email protected]

Linear Algebra

A subspace is a set S ⊆ R^n such that 0 ∈ S and ∀x, y ∈ S and α, β ∈ R, αx + βy ∈ S.

The span of {v_1, ..., v_k} is the set of all vectors in R^n that are linear combinations of v_1, ..., v_k.

A basis B of a subspace S, B = {v_1, ..., v_k} ⊂ S, has Span(B) = S with all v_i linearly independent. The dimension of S is |B| for any basis B of S.

For subspaces S, T with S ⊆ T, dim(S) ≤ dim(T); if additionally dim(S) = dim(T), then S = T.

A linear transformation T : R^n → R^m has T(αx + βy) = αT(x) + βT(y) for all x, y ∈ R^n and α, β ∈ R. Further, ∃A ∈ R^{m×n} such that T(x) ≡ Ax. For two linear transformations T : R^n → R^m and S : R^m → R^p, S ∘ T ≡ S(T(x)) is a linear transformation, and (T(x) ≡ Ax) ∧ (S(y) ≡ By) ⇒ (S ∘ T)(x) ≡ BAx.

A matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is the dimension of either of these spaces. For A ∈ R^{m×n}, rank(A) ≤ min(m, n). A has full row (or column) rank if rank(A) = m (or n).

A diagonal matrix D ∈ R^{n×n} has d_{j,k} = 0 for j ≠ k. The identity matrix I has i_{j,j} = 1.

The upper (or lower) bandwidth of A is max |i − j| among i, j with i ≤ j (or i ≥ j) such that A_{i,j} ≠ 0. A matrix with lower bandwidth 1 is upper Hessenberg.

For A, B ∈ R^{n×n}, B is A's inverse if AB = BA = I. If such a B exists, A is invertible or nonsingular, and B = A^{−1}. The inverse of A is A^{−1} = [x_1, ..., x_n] where Ax_i = e_i. For A ∈ R^{n×n} the following are equivalent: A is nonsingular; rank(A) = n; Ax = b has a solution x for any b; Ax = 0 implies x = 0.

The nullspace of A ∈ R^{m×n} is {x ∈ R^n : Ax = 0}. For A ∈ R^{m×n}, Range(A) and Nullspace(A^T) are orthogonal complements, i.e., x ∈ Range(A), y ∈ Nullspace(A^T) ⇒ x^T y = 0, and every p ∈ R^m can be written p = x + y for unique such x and y.

For a permutation matrix P ∈ R^{n×n}, PA permutes the rows of A, and AP the columns of A. P^{−1} = P^T.

For A ∈ R^{m×n}, if rank(A) = n, then A^T A is symmetric positive definite (SPD).

Norms

A vector norm is a function ‖·‖ : R^n → R satisfying:
1. ‖x‖ ≥ 0, and ‖x‖ = 0 ⇔ x = 0.
2. ‖γx‖ = |γ| · ‖x‖ for all γ ∈ R and all x ∈ R^n.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n.

Common norms include:
1. ‖x‖_1 = |x_1| + |x_2| + ··· + |x_n|
2. ‖x‖_2 = sqrt(x_1^2 + x_2^2 + ··· + x_n^2)
3. ‖x‖_∞ = lim_{p→∞} (|x_1|^p + ··· + |x_n|^p)^{1/p} = max_{i=1..n} |x_i|

An induced matrix norm is ‖A‖ = sup_{x≠0} ‖Ax‖ / ‖x‖. It satisfies the three properties of norms. For all x ∈ R^n and A ∈ R^{m×n}, ‖Ax‖ ≤ ‖A‖ ‖x‖. Further, ‖AB‖ ≤ ‖A‖ ‖B‖ (submultiplicativity), and a^T b ≤ ‖a‖_2 ‖b‖_2 (the Cauchy-Schwarz inequality).
1. ‖A‖_∞ = max_{i=1..m} Σ_{j=1..n} |a_{i,j}| (max row sum).
2. ‖A‖_1 = max_{j=1..n} Σ_{i=1..m} |a_{i,j}| (max column sum).
3. ‖A‖_2 is hard: it takes O(n^3), not O(n^2), operations.
4. ‖A‖_F = sqrt(Σ_i Σ_j a_{i,j}^2). ‖·‖_F often replaces ‖·‖_2.
(These identities are checked numerically in a sketch after the Linear Least Squares section.)

Determinant

The determinant det : R^{n×n} → R satisfies:
1. det(AB) = det(A) det(B).
2. det(A) = 0 iff A is singular.
3. det(L) = ℓ_{1,1} ℓ_{2,2} ··· ℓ_{n,n} for triangular L.
4. det(A) = det(A^T).

To compute det(A), factor A = PLU. det(P) = (−1)^s where s is the number of row swaps, and det(L) = 1. When computing det(U), watch out for overflow! (An overflow-safe variant is sketched after the Gaussian Elimination section.)

Basic Linear Algebra Subroutines

0. Scalar ops, like x + y: O(1) flops, O(1) data.
1. Vector ops, like y = ax + y: O(n) flops, O(n) data.
2. Matrix-vector ops, like the rank-one update A = A + xy^T: O(n^2) flops, O(n^2) data.
3. Matrix-matrix ops, like C = C + AB: O(n^3) flops, O(n^2) data.

Use the highest BLAS level possible. Operations are architecture tuned, e.g., data processed in cache-sized bites.

Orthogonal Matrices

For Q ∈ R^{n×n}, these statements are equivalent:
1. Q^T Q = QQ^T = I (i.e., Q is orthogonal).
2. The 2-norm of every row and column of Q is 1, and the inner product of any row (or column) with another is 0.
3. For all x ∈ R^n, ‖Qx‖_2 = ‖x‖_2.

A matrix Q ∈ R^{m×n} with m > n has orthonormal columns if the columns are orthonormal, i.e., Q^T Q = I. The product of orthogonal matrices is orthogonal. For orthogonal Q, ‖QA‖_2 = ‖A‖_2 and ‖AQ‖_2 = ‖A‖_2.

Linear Least Squares

Suppose we have points (u_1, v_1), ..., (u_5, v_5) that we want to fit a quadratic curve au^2 + bu + c through. We want to solve

    [ u_1^2  u_1  1 ]  [ a ]     [ v_1 ]
    [   :     :   : ]  [ b ]  =  [  :  ]
    [ u_5^2  u_5  1 ]  [ c ]     [ v_5 ]

This is overdetermined, so an exact solution is out. Instead, find the least squares solution x that minimizes ‖Ax − b‖_2.

For the method of normal equations, solve for x in A^T Ax = A^T b by using Cholesky factorization. This takes mn^2 + n^3/3 + O(mn) flops. It is conditionally but not backwards stable: forming A^T A squares the condition number.

Alternatively, factor A = QR. Let c = [c_1; c_2] = Q^T b with c_1 ∈ R^n. The least squares solution is x = R_1^{−1} c_1.

If rank(A) = r and r < n (rank deficient), factor A = UΣV^T, let y = V^T x and c = U^T b. Then min ‖Ax − b‖_2^2 = min Σ_{i=1..r} (σ_i y_i − c_i)^2 + Σ_{i=r+1..m} c_i^2, so y_i = c_i / σ_i. For i = r + 1 : n, y_i is arbitrary.
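To make the three routes concrete, here is a minimal NumPy sketch (not from the original sheet; the sample points and names are invented for illustration):

```python
import numpy as np

# Hypothetical data: five points to fit v ~ a*u^2 + b*u + c.
u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v = np.array([1.1, 2.9, 9.2, 18.8, 33.1])

A = np.column_stack([u**2, u, np.ones_like(u)])   # 5x3, overdetermined
b = v

# Normal equations: A^T A x = A^T b. (The sheet prescribes Cholesky;
# np.linalg.solve is a stand-in here.) Squares the condition number.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: with A = Q1 R1 (reduced QR), x = R1^{-1} Q1^T b.
Q1, R1 = np.linalg.qr(A)
x_qr = np.linalg.solve(R1, Q1.T @ b)

# Library reference (SVD-based; handles rank deficiency as in the text).
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_ne, x_qr, x_ls)   # all three should agree closely
```

The QR route avoids forming A^T A, which is why it is preferred when κ_2(A) is large.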
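The induced-norm formulas in the Norms section above are cheap to sanity-check against a library; a small sketch, assuming NumPy and an arbitrary test matrix:

```python
import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [4.0,  0.0, -1.0]])

# ||A||_inf is the max row sum of absolute values.
assert np.isclose(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))
# ||A||_1 is the max column sum of absolute values.
assert np.isclose(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))
# ||A||_2 is sigma_1, the largest singular value (an O(n^3) computation).
assert np.isclose(np.linalg.svd(A, compute_uv=False)[0], np.linalg.norm(A, 2))
# ||A||_F is the square root of the sum of squared entries.
assert np.isclose(np.sqrt((A**2).sum()), np.linalg.norm(A, 'fro'))
```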
Numerical Stability

Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.

A floating point number has the form

    ± d_1.d_2 d_3 ··· d_t × β^e
    (sign, mantissa, base, exponent)

For single precision, t = 24 and e ∈ {−126, ..., 127}; for double, t = 53 and e ∈ {−1022, ..., 1023}.

The relative error in x̂ approximating x is |x̂ − x| / |x|. Unit roundoff or machine epsilon is ε_mach = β^{1−t}. Arithmetic operations have relative error bounded by ε_mach. E.g., consider z = x − y with inputs x, y. This program has three roundoff errors: ẑ = ((1 + δ_1)x − (1 + δ_2)y)(1 + δ_3), where δ_1, δ_2, δ_3 ∈ [−ε_mach, ε_mach], so

    |z − ẑ| / |z| = |(δ_1 + δ_3)x − (δ_2 + δ_3)y + O(ε_mach^2)| / |x − y|.

The bad case is δ_1 = ε_mach, δ_2 = −ε_mach, δ_3 = 0:

    |z − ẑ| / |z| = ε_mach |x + y| / |x − y|.

There is inaccuracy if |x + y| ≫ |x − y|, called catastrophic cancellation. (A short demonstration follows the QR-factorization section.)

Conditioning & Backwards Stability

A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, computing sin 1 is well conditioned, but computing sin 12392193 is ill conditioned.

QR-factorization

For any A ∈ R^{m×n} with m ≥ n, we can factor A = QR, where Q ∈ R^{m×m} is orthogonal and R = [R_1; 0] ∈ R^{m×n} with R_1 ∈ R^{n×n} upper triangular. rank(A) = n iff R_1 is invertible. Q's first n (or last m − n) columns form an orthonormal basis for span(A) (or nullspace(A^T)).

A Householder reflection is H = I − 2vv^T / (v^T v). H is symmetric and orthogonal. Explicit Householder QR-factorization is:

1: for k = 1 : n do
2:   v = A(k:m, k) ± ‖A(k:m, k)‖_2 e_1
3:   A(k:m, k:n) = (I − 2vv^T / (v^T v)) A(k:m, k:n)
4: end for

We get H_n H_{n−1} ··· H_1 A = R, so Q = H_1 H_2 ··· H_n. This takes 2mn^2 − (2/3)n^3 + O(mn) flops. Givens rotations require 50% more flops; they are preferable for sparse A.

Gram-Schmidt produces a skinny/reduced QR-factorization A = Q_1 R_1, where Q_1 ∈ R^{m×n} has orthonormal columns. The Gram-Schmidt algorithm comes in left- and right-looking variants:

Left Looking
1: for k = 1 : n do
2:   q_k = a_k
3:   for j = 1 : k − 1 do
4:     R(j, k) = q_j^T a_k
5:     q_k = q_k − R(j, k) q_j
6:   end for
7:   R(k, k) = ‖q_k‖_2
8:   q_k = q_k / R(k, k)
9: end for

Right Looking
1: Q = A
2: for k = 1 : n do
3:   R(k, k) = ‖q_k‖_2
4:   q_k = q_k / R(k, k)
5:   for j = k + 1 : n do
6:     R(k, j) = q_k^T q_j
7:     q_j = q_j − R(k, j) q_k
8:   end for
9: end for
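The right-looking variant is modified Gram-Schmidt; a NumPy transcription (a sketch assuming A has full column rank; `mgs_qr` is our name, not the sheet's):

```python
import numpy as np

def mgs_qr(A):
    """Skinny QR, A = Q1 @ R1, via right-looking (modified) Gram-Schmidt."""
    Q = A.astype(float)                      # 1: Q = A (working copy)
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):                       # 2: for k = 1:n
        R[k, k] = np.linalg.norm(Q[:, k])    # 3: R(k,k) = ||q_k||_2
        Q[:, k] /= R[k, k]                   # 4: q_k = q_k / R(k,k)
        for j in range(k + 1, n):            # 5: for j = k+1:n
            R[k, j] = Q[:, k] @ Q[:, j]      # 6: R(k,j) = q_k^T q_j
            Q[:, j] -= R[k, j] * Q[:, k]     # 7: q_j = q_j - R(k,j) q_k
    return Q, R

A = np.random.default_rng(0).standard_normal((6, 3))
Q1, R1 = mgs_qr(A)
print(np.allclose(Q1 @ R1, A), np.allclose(Q1.T @ Q1, np.eye(3)))
```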
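The explicit Householder loop above can be transcribed just as directly; again a sketch, with the sign in v chosen to avoid cancellation and full column rank assumed:

```python
import numpy as np

def householder_qr(A):
    """Full QR by Householder reflections: returns Q (m x m), R (m x n)."""
    R = A.astype(float)
    m, n = R.shape
    Q = np.eye(m)
    for k in range(n):
        v = R[k:, k].copy()                  # v = A(k:m, k) +/- ||.||_2 e_1
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        beta = 2.0 / (v @ v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing submatrix.
        R[k:, k:] -= beta * np.outer(v, v @ R[k:, k:])
        R[k+1:, k] = 0.0                     # clean roundoff below diagonal
        # Accumulate Q = H_1 H_2 ... H_n by right-multiplication.
        Q[:, k:] -= beta * np.outer(Q[:, k:] @ v, v)
    return Q, R

B = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(B)
print(np.allclose(Q @ R, B), np.allclose(Q.T @ Q, np.eye(5)))
```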
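Catastrophic cancellation from the Numerical Stability section, demonstrated in a few lines (the 1 − cos rewrite is a standard trick, not from the sheet):

```python
import numpy as np

# z = x - y with |x + y| >> |x - y|: the leading digits cancel.
x, y = 1.0 + 1e-12, 1.0
print(x - y)              # ~1.0000889e-12 vs exact 1e-12 (~9e-5 rel. error)

# The cure is a cancellation-free rewrite when one exists:
t = 1e-8
print(1 - np.cos(t))      # 0.0 -- every significant digit lost
print(2 * np.sin(t/2)**2) # 5.0e-17 -- correct to full precision
```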
Singular Value Decomposition

For any A ∈ R^{m×n}, we can express A = UΣV^T such that U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ = diag(σ_1, ···, σ_p) ∈ R^{m×n} where p = min(m, n) and σ_1 ≥ σ_2 ≥ ··· ≥ σ_p ≥ 0. The σ_i are singular values.
1. Matrix 2-norm: ‖A‖_2 = σ_1.
2. The condition number κ_2(A) = ‖A‖_2 ‖A^{−1}‖_2 = σ_1/σ_n, or the rectangular condition number κ_2(A) = σ_1/σ_{min(m,n)}. Note that κ_2(A^T A) = κ_2(A)^2.
3. For a rank-k approximation to A, let Σ_k = diag(σ_1, ···, σ_k, 0^T). Then A_k = UΣ_k V^T. rank(A_k) ≤ k, and rank(A_k) = k iff σ_k > 0. Among matrices of rank k or lower, A_k minimizes ‖A − A_k‖_2.

Gaussian Elimination

GE produces a factorization A = LU; GEPP (GE with partial pivoting) produces PA = LU.
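A sketch of GEPP and of the overflow-safe determinant promised in the Determinant section, assuming SciPy's scipy.linalg.lu (whose convention is A = PLU, i.e. its P is the transpose of the one in PA = LU):

```python
import numpy as np
from scipy.linalg import lu

A = np.random.default_rng(2).standard_normal((4, 4))

P, L, U = lu(A)                       # GEPP: A = P @ L @ U
print(np.allclose(P @ L @ U, A))

# det(A) = det(P) det(L) det(U), with det(L) = 1 and det(P) = (-1)^s.
s = round(np.linalg.det(P))           # +/- 1
d = np.diag(U)
# Summing log|u_ii| instead of multiplying guards against overflow.
log_abs_det = np.sum(np.log(np.abs(d)))
sign = s * np.prod(np.sign(d))
print(sign * np.exp(log_abs_det), np.linalg.det(A))   # should agree
# np.linalg.slogdet(A) packages exactly this (sign, log|det|) pair.
```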
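And a short check of the SVD facts above (2-norm, condition number, rank-k approximation), assuming NumPy:

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A)           # A = U @ diag(s) @ Vt, s descending

print(np.isclose(s[0], np.linalg.norm(A, 2)))       # ||A||_2 = sigma_1
print(np.isclose(s[0] / s[-1], np.linalg.cond(A)))  # kappa_2 = s_1/s_min

# Rank-k approximation A_k = U Sigma_k V^T; the 2-norm error it leaves
# behind is the first discarded singular value, sigma_{k+1}.
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))
```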