[email protected], Thomas Finley

A subspace is a set S ⊆ R^n such that 0 ∈ S and ∀ x, y ∈ S, α, β ∈ R, αx + βy ∈ S.
x ∈ R^n is a linear combination of v_1, ..., v_k if ∃ β_1, ..., β_k ∈ R such that x = β_1 v_1 + ... + β_k v_k.
The span of v_1, ..., v_k is the set of all vectors in R^n that are linear combinations of v_1, ..., v_k.
A basis B of subspace S, B = {v_1, ..., v_k} ⊂ S, has Span(B) = S with all v_i linearly independent.
The dimension of S is |B| for a basis B of S. For subspaces S, T with S ⊆ T, dim(S) ≤ dim(T), and further if dim(S) = dim(T), then S = T.
A linear transformation T : R^n → R^m has, ∀ x, y ∈ R^n, α, β ∈ R, T(αx + βy) = αT(x) + βT(y). Further, ∃ A ∈ R^{m×n} such that ∀ x, T(x) ≡ Ax.
For two linear transformations T : R^n → R^m, S : R^m → R^p, S ∘ T ≡ S(T(x)) is a linear transformation. (T(x) ≡ Ax) ∧ (S(y) ≡ By) ⇒ (S ∘ T)(x) ≡ BAx.
A matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is the dimension of either of these spaces.
For A ∈ R^{m×n}, rank(A) ≤ min(m, n). A has full row (or column) rank if rank(A) = m (or n).
A diagonal matrix D ∈ R^{n×n} has d_{j,k} = 0 for j ≠ k. The identity matrix I is diagonal with i_{j,j} = 1.
The upper (or lower) bandwidth of A is max |i − j| among i, j with i ≤ j (or i ≥ j) such that a_{i,j} ≠ 0. A matrix with lower bandwidth 1 is upper Hessenberg.
For A, B ∈ R^{n×n}, B is A's inverse if AB = BA = I. If such a B exists, A is invertible or nonsingular, and B = A^{-1}. The inverse of A is A^{-1} = [x_1, ..., x_n] where Ax_i = e_i.
For A ∈ R^{n×n} the following are equivalent: A is nonsingular, rank(A) = n, Ax = b is solvable for any b, Ax = 0 iff x = 0.
The inner product of x, y ∈ R^n is x^T y = Σ_{i=1}^n x_i y_i. Vectors x, y are orthogonal if x^T y = 0.
The nullspace or kernel of A ∈ R^{m×n} is {x ∈ R^n : Ax = 0}.
For A ∈ R^{m×n}, Range(A) and Nullspace(A^T) are orthogonal complements, i.e., x ∈ Range(A), y ∈ Nullspace(A^T) ⇒ x^T y = 0. For every p ∈ R^m there exist unique x and y from these two spaces so that p = x + y.
For a permutation matrix P ∈ R^{n×n}, PA permutes the rows of A, AP the columns of A. P^{-1} = P^T.

Norms
A vector norm ||·|| : R^n → R satisfies:
1. ||x|| ≥ 0, and ||x|| = 0 ⇔ x = 0.
2. ||γx|| = |γ| · ||x|| for all γ ∈ R and all x ∈ R^n.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.
Common norms include:
1. ||x||_1 = |x_1| + |x_2| + ... + |x_n|
2. ||x||_2 = sqrt(x_1^2 + x_2^2 + ... + x_n^2)
3. ||x||_∞ = lim_{p→∞} (|x_1|^p + ... + |x_n|^p)^{1/p} = max_{i=1..n} |x_i|
An induced norm is ||A|| = sup_{x≠0} ||Ax|| / ||x||. It satisfies the three properties of norms.
∀ x ∈ R^n, A ∈ R^{m×n}, ||Ax|| ≤ ||A|| ||x||.
||AB|| ≤ ||A|| ||B||, called submultiplicativity.
a^T b ≤ ||a||_2 ||b||_2, called the Cauchy-Schwarz inequality.
1. ||A||_∞ = max_{i=1..m} Σ_{j=1}^n |a_{i,j}| (max row sum).
2. ||A||_1 = max_{j=1..n} Σ_{i=1}^m |a_{i,j}| (max column sum).
3. ||A||_2 is hard: it takes O(n^3), not O(n^2), operations.
4. ||A||_F = sqrt(Σ_{i=1}^m Σ_{j=1}^n a_{i,j}^2). ||·||_F often replaces ||·||_2.
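A minimal numpy sketch (random test matrices chosen only for illustration) checking the max-row-sum and max-column-sum formulas above against the library's induced norms, and the submultiplicativity bound in the 2-norm.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Max row sum and max column sum, per the formulas above.
inf_norm = np.abs(A).sum(axis=1).max()   # ||A||_inf
one_norm = np.abs(A).sum(axis=0).max()   # ||A||_1

assert np.isclose(inf_norm, np.linalg.norm(A, np.inf))
assert np.isclose(one_norm, np.linalg.norm(A, 1))

# Submultiplicativity: ||AB||_2 <= ||A||_2 ||B||_2.
B = rng.standard_normal((3, 5))
assert np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12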
Numerical Stability
Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.
A floating point number is ±d_1.d_2d_3...d_t × β^e (sign, mantissa, base, exponent). For single precision t = 24, e ∈ {−126, ..., 127}; for double precision t = 53, e ∈ {−1022, ..., 1023}.
The relative error in x̂ approximating x is |x̂ − x| / |x|.
Unit roundoff or machine epsilon is ε_mach = β^{−t+1}. Arithmetic operations have relative error bounded by ε_mach.
E.g., consider z = x − y with input x, y. This program has three roundoff errors: ẑ = ((1 + δ_1)x − (1 + δ_2)y)(1 + δ_3), where δ_1, δ_2, δ_3 ∈ [−ε_mach, ε_mach]. Then
|z − ẑ| / |z| = |(δ_1 + δ_3)x − (δ_2 + δ_3)y + O(ε_mach^2)| / |x − y|.
The bad case is δ_1 = ε_mach, δ_2 = −ε_mach, δ_3 = 0: |z − ẑ| / |z| = ε_mach (|x| + |y|) / |x − y|. Inaccuracy results if |x| + |y| ≫ |x − y|, called catastrophic cancellation.

Conditioning & Backwards Stability
A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, sin 1 is well conditioned, but sin 12392193 is ill conditioned.
Suppose we perturb Ax = b to (A + E)x̂ = b + e where ||E||/||A|| ≤ δ and ||e||/||b|| ≤ δ. Then ||x̂ − x||/||x|| ≤ 2δκ(A) + O(δ^2), where κ(A) = ||A|| ||A^{-1}|| is the condition number of A.
1. ∀ A ∈ R^{n×n}, κ(A) ≥ 1.
2. κ(I) = 1.
3. If γ ≠ 0, κ(γA) = κ(A).
4. For diagonal D and all p, ||D||_p = max_{i=1..n} |d_{ii}|. So κ(D) = max_{i=1..n} |d_{ii}| / min_{i=1..n} |d_{ii}|.
If κ(A) ≥ 1/ε_mach, A may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.

Basic Linear Algebra Subroutines
0. Scalar ops, like sqrt(x^2 + y^2). O(1) flops, O(1) data.
1. Vector ops, like y = ax + y. O(n) flops, O(n) data.
2. Matrix-vector ops, like the rank-one update A = A + xy^T. O(n^2) flops, O(n^2) data.
3. Matrix-matrix ops, like C = C + AB. O(n^3) flops, O(n^2) data.
Use the highest BLAS level possible. Operators are architecture tuned, e.g., data processed in cache-sized bites.
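A small numpy sketch (values invented to exaggerate the effect) of catastrophic cancellation: subtracting two nearly equal numbers loses most of the significant digits, roughly as the worst-case bound above predicts.

import numpy as np

x = 1.0 + 1e-12          # two nearby values; the true difference is 1e-12
y = 1.0
z_true = 1e-12

z_hat = np.float64(x) - np.float64(y)           # computed in double precision
rel_err = abs(z_hat - z_true) / abs(z_true)

# eps_mach * (|x| + |y|) / |x - y| is the worst-case bound from above.
bound = np.finfo(np.float64).eps * (abs(x) + abs(y)) / abs(x - y)
print(f"relative error ~ {rel_err:.2e}, worst-case bound ~ {bound:.2e}")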
GE produces a factorization A = LU; GEPP produces PA = LU.
Plain GE
1: for k = 1 to n − 1 do
2:   if a_kk = 0 then stop
3:   ℓ_{k+1:n,k} = a_{k+1:n,k} / a_kk
4:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
5: end for
GE with Partial Pivoting
1: for k = 1 to n − 1 do
2:   γ = argmax_{i ∈ {k,...,n}} |a_ik|
3:   a_{[γ,k],k:n} = a_{[k,γ],k:n}   (swap rows k and γ)
4:   ℓ_{[γ,k],1:k−1} = ℓ_{[k,γ],1:k−1}
5:   p_k = γ
6:   ℓ_{k+1:n,k} = a_{k+1:n,k} / a_kk
7:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
8: end for
Backward Substitution
1: x = zeros(n, 1)
2: for j = n to 1 do
3:   x_j = (w_j − u_{j,j+1:n} x_{j+1:n}) / u_{j,j}
4: end for
To solve Ax = b, factor A = LU (or PA = LU), solve Lw = b (or Lw = Pb) for w using forward substitution, then solve Ux = w for x using backward substitution.
The complexity of GE and GEPP is (2/3)n^3 + O(n^2) flops. GEPP encounters an exact 0 pivot iff A is singular.
For banded A, L + U has the same bandwidths as A.
GEPP solves Ax = b by returning x̂ where (A + E)x̂ = b. It is backwards stable if ||E||_∞ / ||A||_∞ ≤ O(ε_mach). With GEPP, ||E||_∞ / ||A||_∞ ≤ c_n ε_mach + O(ε_mach^2), where c_n is worst case exponential in n, but in practice almost always a low order polynomial.
Combining stability and conditioning analysis yields ||x̂ − x|| / ||x|| ≤ c_n · κ(A) ε_mach + O(ε_mach^2).
The determinant det : R^{n×n} → R satisfies:
1. det(AB) = det(A) det(B).
2. det(A) = 0 iff A is singular.
3. det(A) = det(A^T).
4. det(L) = ℓ_{1,1} ℓ_{2,2} ... ℓ_{n,n} for triangular L.
To compute det(A), factor A = PLU. det(P) = (−1)^s where P performs s swaps, and det(L) = 1. When calculating det(U), beware of overflow!

Positive Definite, A = LDL^T
A ∈ R^{n×n} is positive definite (PD) (or semidefinite (PSD)) if x^T Ax > 0 (or x^T Ax ≥ 0) for all x ≠ 0.
When LU-factorizing symmetric A, the result is A = LDL^T; L is unit lower triangular, D is diagonal. A is SPD iff D has all positive entries. The Cholesky factorization is A = LDL^T = LD^{1/2}D^{1/2}L^T = GG^T. It can be computed directly in n^3/3 + O(n^2) flops. If G's diagonal is positive, A is SPD.
To solve Ax = b for SPD A, factor A = GG^T, solve Gw = b by forward substitution, then solve G^T x = w by backward substitution, which takes n^3/3 + O(n^2) flops.
For A ∈ R^{m×n}, if rank(A) = n, then A^T A is SPD.
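A short sketch (assuming SciPy is available; the SPD test matrix is invented for illustration) of the Cholesky-based solve described above: factor A = GG^T, then one forward and one backward triangular solve.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # symmetric plus a positive shift, so SPD
b = rng.standard_normal(5)

G = cholesky(A, lower=True)                    # A = G G^T, positive diagonal
w = solve_triangular(G, b, lower=True)         # forward substitution: G w = b
x = solve_triangular(G.T, w, lower=False)      # backward substitution: G^T x = w

assert np.allclose(A @ x, b)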
Singular Value Decomposition
For any A ∈ R^{m×n}, we can express A = UΣV^T such that U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ = diag(σ_1, ..., σ_p) ∈ R^{m×n} where p = min(m, n) and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. The σ_i are singular values.
1. Matrix 2-norm: ||A||_2 = σ_1.
2. The condition number κ_2(A) = ||A||_2 ||A^{-1}||_2 = σ_1/σ_n, or the rectangular condition number κ_2(A) = σ_1/σ_min(m,n). Note that κ_2(A^T A) = κ_2(A)^2.
3. For a rank k approximation to A, let Σ_k = diag(σ_1, ..., σ_k, 0, ..., 0). Then A_k = UΣ_kV^T. rank(A_k) ≤ k, and rank(A_k) = k iff σ_k > 0. Among matrices of rank k or lower, A_k minimizes ||A − A_k||_2 = σ_{k+1}.
4. Rank determination: rank(A) = r equals the number of nonzero σ, or in machine arithmetic, perhaps the number of σ ≥ ε_mach × σ_1.
Partitioning A = UΣV^T = [U_1 U_2] [Σ(1:r,1:r) 0; 0 0] [V_1^T; V_2^T], we see that range(U_1) = range(A). The SVD gives an orthonormal basis for the range and nullspace of A and of A^T.
Compute the SVD by using shifted QR on A^T A.

Orthogonal Matrices
For Q ∈ R^{n×n}, these statements are equivalent:
1. Q^T Q = QQ^T = I (i.e., Q is orthogonal).
2. The 2-norm of each row and column of Q is 1, and the inner product of any row (or column) with another is 0.
3. For all x ∈ R^n, ||Qx||_2 = ||x||_2.
A matrix Q ∈ R^{m×n} with m > n has orthonormal columns if the columns are orthonormal, so Q^T Q = I. The product of orthogonal matrices is orthogonal. For orthogonal Q, ||QA||_2 = ||A||_2 and ||AQ||_2 = ||A||_2.

QR-factorization
For any A ∈ R^{m×n} with m ≥ n, we can factor A = QR, where Q ∈ R^{m×m} is orthogonal, and R = [R_1; 0] ∈ R^{m×n} is upper triangular. rank(A) = n iff R_1 is invertible.
Q's first n (or last m − n) columns form an orthonormal basis for span(A) (or nullspace(A^T)).
A Householder reflection is H = I − 2vv^T / (v^T v). H is symmetric and orthogonal. Explicit Householder QR-factorization is:
1: for k = 1:n do
2:   v = A(k:m, k) ± ||A(k:m, k)||_2 e_1
3:   A(k:m, k:n) = (I − 2vv^T / (v^T v)) A(k:m, k:n)
4: end for
We get H_n ... H_2 H_1 A = R, so Q = H_1 H_2 ... H_n. This takes 2mn^2 − (2/3)n^3 + O(mn) flops.
Givens rotations require 50% more flops. Preferable for sparse A.
Gram-Schmidt produces a skinny/reduced QR-factorization A = Q_1 R_1, where Q_1 ∈ R^{m×n} has orthonormal columns. The Gram-Schmidt algorithm is:
Left Looking
1: for k = 1:n do
2:   q_k = a_k
3:   for j = 1:k − 1 do
4:     R(j,k) = q_j^T a_k
5:     q_k = q_k − R(j,k) q_j
6:   end for
7:   R(k,k) = ||q_k||_2
8:   q_k = q_k / R(k,k)
9: end for
Right Looking
1: Q = A
2: for k = 1:n do
3:   R(k,k) = ||q_k||_2
4:   q_k = q_k / R(k,k)
5:   for j = k + 1:n do
6:     R(k,j) = q_k^T q_j
7:     q_j = q_j − R(k,j) q_k
8:   end for
9: end for
In left looking, let line 4 be R(j,k) = q_j^T q_k for modified G.S., to make it backwards stable.

Linear Least Squares
Suppose we have points (u_1, v_1), ..., (u_5, v_5) through which we want to fit a quadratic curve au^2 + bu + c. We want to solve
[u_1^2 u_1 1; ...; u_5^2 u_5 1] [a; b; c] = [v_1; ...; v_5].
This is overdetermined, so an exact solution is out. Instead, find the least squares solution x that minimizes ||Ax − b||_2.
For the method of normal equations, solve for x in A^T Ax = A^T b with Cholesky factorization. This takes mn^2 + n^3/3 + O(mn) flops. It is conditionally but not backwards stable: A^T A squares the condition number, κ_2(A^T A) = κ_2(A)^2.
Alternatively, factor A = QR. Let c = [c_1; c_2] = Q^T b. The least squares solution is x = R_1^{-1} c_1.
If rank(A) = r and r < n (rank deficient), factor A = UΣV^T, let y = V^T x and c = U^T b. Then min ||Ax − b||_2^2 = min Σ_{i=1}^r (σ_i y_i − c_i)^2 + Σ_{i=r+1}^m c_i^2, so y_i = c_i/σ_i for i = 1:r. For i = r+1:n, y_i is arbitrary.
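A small numpy/SciPy sketch (data invented for illustration) of the quadratic fit above, solved both by the normal equations and by the reduced QR route x = R_1^{-1} Q_1^T b; the two agree for this well-conditioned problem.

import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v = np.array([1.1, 2.9, 7.2, 12.8, 21.1])        # roughly quadratic data

A = np.column_stack([u**2, u, np.ones_like(u)])   # 5x3 coefficient matrix

# Normal equations: A^T A x = A^T b, solved with Cholesky.
x_ne = cho_solve(cho_factor(A.T @ A), A.T @ v)

# Reduced QR: A = Q1 R1, then x = R1^{-1} Q1^T b.
Q1, R1 = np.linalg.qr(A, mode="reduced")
x_qr = solve_triangular(R1, Q1.T @ v, lower=False)

print(np.allclose(x_ne, x_qr))    # both give the coefficients a, b, c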
Information Retrieval & LSI
In the bag of words model, w_d ∈ R^m, where w_d(i) is the (perhaps weighted) frequency of term i in document d. The corpus matrix is A = [w_1, ..., w_n] ∈ R^{m×n}. For a query q ∈ R^m, rank documents according to the score q^T w_d / ||w_d||_2.
In latent semantic indexing, you do the same, but in a k dimensional subspace. Factor A = UΣV^T, then define A* = Σ_{1:k,1:k} V_{:,1:k}^T ∈ R^{k×n}. Each w*_d = A*_{:,d} = U_{:,1:k}^T w_d, and q* = U_{:,1:k}^T q.
In the Ando-Lee analysis, for a corpus with k topics, for t ∈ 1:k and d ∈ 1:n, let R_{t,d} ≥ 0 be document d's relevance to topic t, with ||R_{:,d}||_2 = 1. True document similarity is R^T R ∈ R^{n×n}, where entry (i,j) is the relevance of i to j. Using LSI, if A contains information about R^T R, then (A*)^T A* will approximate R^T R well. LSI depends on an even distribution of topics, where the distribution is ρ = max_t ||R_{t,:}||_2 / min_t ||R_{t,:}||_2. Great when ρ is near 1, but if ρ ≫ 1, LSI does worse.

Complex Numbers
Complex numbers are written z = x + iy ∈ C for i = sqrt(−1). The real part is x = ℜ(z); the imaginary part is y = ℑ(z). The conjugate of z is z̄ = x − iy. Ā x̄ = conj(Ax) and Ā B̄ = conj(AB). The absolute value of z is |z| = sqrt(x^2 + y^2). The conjugate transpose of x is x^H = (x̄)^T. A ∈ C^{n×n} is Hermitian or self-adjoint if A = A^H. If Q^H Q = I, Q is unitary.

Eigenvalues & Eigenvectors
For A ∈ C^{n×n}, if Ax = λx where x ≠ 0, x is an eigenvector of A and λ is the corresponding eigenvalue.
Remember, A − λI is singular iff det(A − λI) = 0. With λ as a variable, det(A − λI) is A's characteristic polynomial.
For nonsingular T ∈ C^{n×n}, T^{-1}AT (the similarity transformation) is similar to A. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors). This relationship is reflexive, transitive, and symmetric.
A is diagonalizable if A is similar to a diagonal matrix D = T^{-1}AT. A's eigenvalues are D's diagonals, and the eigenvectors are the columns of T since AT_{:,i} = D_{i,i} T_{:,i}. A is diagonalizable iff it has n linearly independent eigenvectors.
For symmetric A ∈ R^{n×n}, A is diagonalizable, has all real eigenvalues, and the eigenvectors can be taken as the columns of an orthogonal matrix Q, where A = QDQ^T is the eigendecomposition of A. Further, for symmetric A:
1. The singular values are the absolute values of the eigenvalues.
2. A is SPD (or SPSD) iff its eigenvalues are > 0 (or ≥ 0).
3. For SPD A, the singular values equal the eigenvalues.
4. For B ∈ R^{m×n}, m ≥ n, the singular values of B are the square roots of B^T B's eigenvalues.
For any A ∈ C^{n×n}, the Schur form of A is A = QTQ^H with unitary Q ∈ C^{n×n} and upper triangular T ∈ C^{n×n}.
In this sheet |λ|_max denotes max_{λ ∈ {λ_1,...,λ_n}} |λ|. For B ∈ C^{n×n}, lim_{k→∞} B^k = 0 if |λ|_max(B) < 1.

Power Methods for Eigenvalues
x^(k+1) = Ax^(k) converges to the eigenvector belonging to |λ|_max(A).
Once you find an eigenvector x, find the associated eigenvalue λ through the Rayleigh quotient λ = x^T Ax / x^T x.
The inverse shifted power method is x^(k+1) = (A − σI)^{-1} x^(k). If A has eigenpairs (λ_1, u_1), ..., (λ_n, u_n), then (A − σI)^{-1} has eigenpairs (1/(λ_1 − σ), u_1), ..., (1/(λ_n − σ), u_n).
Factor A = QHQ^T where H is upper Hessenberg: find successive Householder reflections H_1, H_2, ... that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc. Then Q = H_1 ... H_{n−2}.
Shifted QR iteration:
1: A^(0) = A
2: for k = 0, 1, 2, ... do
3:   Set A^(k) − σ^(k) I = Q^(k) R^(k)
4:   A^(k+1) = R^(k) Q^(k) + σ^(k) I
5: end for
A^(k) is similar to A by the orthogonal transform U = Q^(0) ... Q^(k). Perhaps choose σ^(k) as eigenvalues of submatrices of A^(k).
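A minimal numpy sketch (random symmetric test matrix and a fixed iteration count, both chosen arbitrarily) of the power method with a Rayleigh quotient eigenvalue estimate, compared against the library eigensolver.

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                    # symmetric, so eigenvalues are real

x = rng.standard_normal(6)
for _ in range(200):                 # x^(k+1) = A x^(k), normalized each step
    x = A @ x
    x /= np.linalg.norm(x)

lam = x @ A @ x / (x @ x)            # Rayleigh quotient
print(abs(lam), np.max(np.abs(np.linalg.eigvalsh(A))))   # should agree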
Arnoldi and Lanczos
Given A ∈ R^{n×n} and unit length q_1 ∈ R^n, output Q, H such that A = QHQ^T. Use Lanczos for symmetric A.
Arnoldi
1: for k = 1:n − 1 do
2:   q̃_{k+1} = Aq_k
3:   for ℓ = 1:k do
4:     H(ℓ,k) = q_ℓ^T q̃_{k+1}
5:     q̃_{k+1} = q̃_{k+1} − H(ℓ,k) q_ℓ
6:   end for
7:   H(k+1,k) = ||q̃_{k+1}||_2
8:   q_{k+1} = q̃_{k+1} / H(k+1,k)
9: end for
Lanczos
1: β_0 = ||w_0||
2: for k = 1, 2, ... do
3:   q_k = w_{k−1} / β_{k−1}
4:   u_k = Aq_k
5:   v_k = u_k − β_{k−1} q_{k−1}
6:   α_k = q_k^T v_k
7:   w_k = v_k − α_k q_k
8:   β_k = ||w_k||_2
9: end for
For Lanczos, the α_k and β_k are the diagonal and subdiagonal entries of the Hermitian tridiagonal T_k; in Arnoldi we have H itself. After very few iterations of either method, the eigenvalues of T_k and H will be excellent approximations to the "extreme" eigenvalues of A.
For k iterations, Arnoldi is O(nk^2) time and O(nk) space; Lanczos is O(nk) + kM time (M is the time for a matrix-vector multiplication) and O(nk) space, or O(n + k) space if old q_k's are discarded.

Iterative Methods for Ax = b
Useful for sparse A where GE would cause fill-in.
In the splitting method, A = M − N and Mv = c is easily solvable. Then x^(k+1) = M^{-1}(Nx^(k) + b). If it converges, the limit point x* is a solution to Ax = b.
The error is e^(k) = (M^{-1}N)^k e^(0), so splitting methods converge if |λ|_max(M^{-1}N) < 1.
In the Jacobi method, take M as the diagonal of A. This will fail if A has any zero diagonal entries.

Conjugate Gradient
Conjugate gradient iteratively solves Ax = b for SPD A. It is derived from Lanczos and exploits that if A is SPD then T is SPD. It produces the exact solution after n iterations. Time per iteration is O(n) + M.
1: x^(0) = arbitrary (0 is okay)
2: r_0 = b − Ax^(0)
3: p_0 = r_0
4: for k = 0, 1, 2, ... do
5:   α_k = (r_k^T r_k) / (p_k^T A p_k)
6:   x^(k+1) = x^(k) + α_k p_k
7:   r_{k+1} = r_k − α_k A p_k
8:   β_{k+1} = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
9:   p_{k+1} = r_{k+1} + β_{k+1} p_k
10: end for
Error is reduced by a factor of (sqrt(κ(A)) − 1)/(sqrt(κ(A)) + 1) per iteration. Thus, for κ(A) = 1, CG converges after 1 iteration. To speed up CG, use a preconditioner M such that κ(MA) ≪ κ(A) and solve MAx = Mb instead.

Optimization
In continuous optimization, f : R^n → R is the objective function, g : R^n → R^m holds equality constraints, h : R^n → R^p holds inequality constraints:
min f(x)  s.t.  g(x) = 0,  h(x) ≥ 0.
We did unconstrained optimization min f(x) in the course.
A ball is a set B(x, r) = {y ∈ R^n : ||x − y|| < r}.
We have local minimizers x*, which are the best in a region, i.e., ∃ r > 0 such that f(x*) ≤ f(x) for all x ∈ B(x*, r). A global minimizer is the best local minimizer.
Assume f is C^2. If x* is a local minimizer, then ∇f(x*) = 0 and ∇^2 f(x*) is PSD. Semi-conversely, if ∇f(x*) = 0 and ∇^2 f(x*) is PD, then x* is a local minimizer.

Multivariate Calculus
Provided f : R^n → R, the gradient and Hessian are
∇f = [∂f/∂x_1; ...; ∂f/∂x_n],   ∇^2 f = [∂^2 f / ∂x_i ∂x_j]_{i,j=1..n}.
If f is C^2 (2nd partials are all continuous), ∇^2 f is symmetric.
The Taylor expansion for f is f(x + h) = f(x) + h^T ∇f(x) + (1/2) h^T ∇^2 f(x) h + O(||h||^3).
Provided f : R^n → R^m, the Jacobian is ∇f = [∂f_i/∂x_j]_{i=1..m, j=1..n}, and f's Taylor expansion is f(x + h) = f(x) + ∇f(x) h + O(||h||^2).
A linear (or quadratic) model approximates a function f by the first two (or three) terms of f's Taylor expansion.

Steepest Descent
Go where the function (locally) decreases most rapidly via x^(k+1) = x^(k) − α_k ∇f(x^(k)). The step size α_k is explained under line search. SD is stateless: it depends only on the current point. Too slow.

Newton's Method for Unconstrained Min.
Iterate by x^(k+1) = x^(k) − (∇^2 f(x^(k)))^{-1} ∇f(x^(k)), derived by solving for where ∇f(x*) = 0. If ∇^2 f(x^(k)) is PD and ∇f(x^(k)) ≠ 0, the step is a descent direction.
What if the Hessian isn't PD? Use (a) a secant method, (b) a direction of negative curvature h or −h where h^T ∇^2 f(x^(k)) h < 0 (doesn't work well in practice), (c) the trust region idea, h = −(∇^2 f(x^(k)) + tI)^{-1} ∇f(x^(k)) (an interpolation of NMUM and SD), (d) factor ∇^2 f(x^(k)) by Cholesky; when checking for PD, detect nonpositive pivots, modify that diagonal entry of ∇^2 f(x^(k)), and keep going (unjustified by theory, but works in practice).

Line Search
Line search, given x^(k) and a step h (perhaps derived from SD or NMUM), finds an α > 0 for x^(k+1) = x^(k) + αh.
In exact line search, optimize min_α f(x^(k) + αh). Frowned upon because it's computationally expensive.
In Armijo or backtracking line search, initialize α. While f(x^(k) + αh) > f(x^(k)) + 0.1 α ∇f(x^(k))^T h, halve α.
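A compact Python sketch (the test function, starting point, and constants are illustrative choices, not from the course) combining Newton's method for unconstrained minimization with the Armijo backtracking rule above. The chosen f has a PD Hessian everywhere, so the Newton step is always a descent direction.

import numpy as np

def f(x):                                   # smooth convex test function
    return x[0]**4 + x[0]**2 + x[1]**2 - x[0]*x[1]

def grad(x):
    return np.array([4*x[0]**3 + 2*x[0] - x[1], 2*x[1] - x[0]])

def hess(x):
    return np.array([[12*x[0]**2 + 2, -1.0], [-1.0, 2.0]])

x = np.array([2.0, -3.0])
for _ in range(20):
    g = grad(x)
    h = -np.linalg.solve(hess(x), g)        # Newton step
    alpha = 1.0
    while f(x + alpha*h) > f(x) + 0.1*alpha*(g @ h):   # Armijo condition
        alpha /= 2                          # halve alpha
    x = x + alpha*h

print(x, np.linalg.norm(grad(x)))           # gradient should be ~0 at the minimizer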
Nonlinear Equation Solving
Given f : R^n → R^m, we want x such that f(x) = 0.
In fixed point iteration, we choose g : R^n → R^n such that x^(k+1) = g(x^(k)). If it converges to x*, then g(x*) − x* = 0.
g(x^(k)) = g(x*) + ∇g(x*)(x^(k) − x*) + O(||x^(k) − x*||^2). For small e^(k) = x^(k) − x*, ignore the last term. If ∇g(x*) has |λ|_max < 1, then x^(k) → x* with ||e^(k)|| ≤ c^k ||e^(0)|| for large k, where c = |λ|_max + ε, with ε the influence of the ignored last term. This indicates a linear rate of convergence.
Suppose for ∇g(x*) = QTQ^H that T is non-normal, i.e., T's superdiagonal portion is large relative to the diagonal. Then the iteration may not converge, as ||(∇g(x*))^k|| initially grows!
In Newton's method, x^(k+1) = x^(k) − (∇f(x^(k)))^{-1} f(x^(k)). This converges quadratically, i.e., ||e^(k+1)|| ≤ c||e^(k)||^2.
Automatic differentiation takes advantage of the notion that a computer program is nothing but arithmetic operations, and one can apply the chain rule to get the derivative. This may be used to compute Jacobians and Hessians.
Secant/quasi-Newton methods use an approximate, always PD ∇^2 f. In Broyden-Fletcher-Goldfarb-Shanno (BFGS):
1: B_0 = initial approximate Hessian  {OK to use I}
2: for k = 0, 1, 2, ... do
3:   s_k = −B_k^{-1} ∇f(x^(k))
4:   x^(k+1) = x^(k) + α_k s_k   {Use a special line search for α_k!}
5:   y_k = ∇f(x^(k+1)) − ∇f(x^(k))
6:   B_{k+1} = B_k + (y_k y_k^T)/(α_k y_k^T s_k) − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)
7: end for
By maintaining B_k in factored form, one can iterate in O(n^2) flops. B_k is SPD provided s_k^T y_k > 0 (use the line search to increase α_k if needed). The secant condition α_k B_{k+1} s_k = y_k holds. If BFGS converges, it converges superlinearly.

Non-linear Least Squares
For g : R^n → R^m, m ≥ n, we want the x for min ||g(x)||_2.
In the Gauss-Newton method, x^(k+1) = x^(k) − h where h = (∇g(x^(k))^T ∇g(x^(k)))^{-1} ∇g(x^(k))^T g(x^(k)). Note that h is a solution to the linear least squares problem min_h ||∇g(x^(k)) h − g(x^(k))||! GN is derived by applying NMUM to g(x)^T g(x) and dropping a resulting tensor (the derivative of the Jacobian). You keep quadratic convergence when g(x*) = 0, since the tensor term → 0 as k → ∞.

Ordinary Differential Equations
An ODE (or PDE) has one (or multiple) independent variables. In initial value problems, given dy/dt = f(y, t), y(t) ∈ R^n, and y(0) = y_0, we want y(t) for t > 0. Examples include:
1. Exponential growth/decay with dy/dt = ay, with closed form y(t) = y_0 e^{at}. Growth if a > 0, decay if a < 0.
2. Ecological models, dy_i/dt = f_i(y_1, ..., y_n, t) for species i = 1, ..., n. y_i is population size, f_i encodes species relationships.
3. Mechanics, e.g. wall-spring-block models from F = ma (a = d^2x/dt^2) and F = −kx, so d^2x/dt^2 = −kx/m. This yields d[x, v]/dt = [v, −kx/m] with given initial position and velocity.
For stability of an ODE, let dy/dt = Ay for A ∈ C^{n×n}. The stable, neutrally stable, or unstable case is where max_i ℜ(λ_i(A)) < 0, = 0, or > 0 respectively.
In finite difference methods, approximate y(t) by discrete points y_0 (given), y_1, y_2, ... so y_k ≈ y(t_k) for increasing t_k.
For many IVPs and FDMs, if the local truncation error (error at each step) is O(h^{p+1}), the global truncation error (error overall) is O(h^p). Call p the order of accuracy.
To find p, substitute the exact solution into the FDM formula, insert a remainder term +R on the RHS, use a Taylor series expansion, solve for R, and keep only the leading term.
In Euler's method, let y_{k+1} = y_k + f(y_k, t_k) h_k where h_k = t_{k+1} − t_k is the step size, and y' = f(y, t) is perhaps computed by finite difference. p = 1, very low. Explicit!
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that in the true solution disappears immediately, chemical reaction rate variability over temperature, transients in electrical circuits. An explicit method requires h_k to be on the smallest scale!
Backward Euler has y_{k+1} = y_k + hf(y_{k+1}, t_{k+1}). BE is implicit (y_{k+1} appears on the RHS). If the original problem is stable, any h will work!

Miscellaneous
Σ_{k=1}^n k^p = n^{p+1}/(p+1) + O(n^p).
ax^2 + bx + c = 0 has roots r_1, r_2 = (−b ± sqrt(b^2 − 4ac))/(2a), with r_1 r_2 = c/a.
Exact arithmetic is slow, futile for inexact observations, and NA relies on approximate algorithms.
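A brief numpy sketch (the decay rate and step size are chosen to exaggerate the effect) contrasting explicit Euler and backward Euler on the stiff scalar problem dy/dt = ay with a ≪ 0, as discussed above.

import numpy as np

a, h, steps = -50.0, 0.1, 20       # stiff decay; h is far too large for explicit Euler
y_fe = y_be = 1.0                   # y(0) = 1

for _ in range(steps):
    y_fe = y_fe + h * a * y_fe      # forward Euler: y_{k+1} = y_k + h a y_k
    y_be = y_be / (1 - h * a)       # backward Euler: y_{k+1} = y_k + h a y_{k+1}

print(y_fe)   # blows up, since |1 + h a| = 4 > 1
print(y_be)   # decays toward 0, like the true solution y(t) = exp(a t)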