Notes on Matrix Computation
University of Chicago, 2014
Vivak Patel
September 7, 2014

Contents

1 Introduction
  1.1 Variations of solving Ax = b
  1.2 Norms
  1.3 Error Analysis
  1.4 Floating Point Numbers
2 Eigenvalue Decomposition
  2.1 Eigenvalues and Eigenvectors
  2.2 Jordan Canonical Form
  2.3 Spectra
  2.4 Spectral Radius
  2.5 Diagonal Dominance and Gerschgorin's Disk Theorem
3 Singular Value Decomposition
  3.1 Theory
  3.2 Applications
4 Rank Retaining Factorization
  4.1 Theory
  4.2 Applications
5 QR & Complete Orthogonal Factorization
  5.1 Theory
  5.2 Applications
  5.3 Givens Rotations
  5.4 Householder Reflections
6 LU, LDU, Cholesky and LDL Decompositions
7 Iterative Methods
  7.1 Overview
  7.2 Splitting Methods
  7.3 Semi-Iterative Methods
  7.4 Krylov Space Methods

1 Introduction

1.1 Variations of solving Ax = b

1. Linear Regression. $A$ is known and $b$ is known but corrupted by some unknown error $r$. Our goal is to find $x$ such that:
   $$x \in \arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 = \arg\min_{x \in \mathbb{R}^n}\{\|r\|_2^2 : Ax + r = b\}$$

2. Data Least Squares. $A$ is known but corrupted by some unknown error $E$. We want to determine:
   $$x \in \arg\min_{x \in \mathbb{R}^n} \{\|E\|_F^2 : (A + E)x = b\}$$

3. Total Least Squares. $A$ and $b$ are both corrupted by errors $E$ and $r$ (respectively). We want to determine:
   $$x \in \arg\min_{x \in \mathbb{R}^n} \{\|E\|_F^2 + \|r\|_2^2 : (A + E)x + r = b\}$$

4. Minimum Norm Least Squares. Given any $A$ and $b$, we want $x$ such that:
   $$x \in \arg\min\{\|z\|_2^2 : z \in \arg\min \|Az - b\|_2^2\} = \arg\min\{\|z\|_2^2 : A^T A z = A^T b\}$$

5. Robust Regression. The linear regression problem with a different norm for the error $r$.

6. Regularized Least Squares. Given a matrix $\Gamma$, we want to find:
   $$x \in \arg\min\{\|Ax - b\|_2^2 + \|\Gamma x\|_2^2\}$$

7. Linear Programming. $x \in \arg\min\{c^T x : Ax \le b\}$

8. Quadratic Programming. $x \in \arg\min\{0.5\, x^T A x + c^T x : Bx = d\}$

1.2 Norms

1. Norm.

   Definition 1.1. A norm is a real-valued function defined over a vector space, denoted by $\|\cdot\|$, such that:
   (a) $\|x\| \ge 0$
   (b) $\|x\| = 0$ if and only if $x = 0$
   (c) $\|x + y\| \le \|x\| + \|y\|$
   (d) $\|\alpha x\| = |\alpha|\,\|x\|$ for any scalar $\alpha$

2. Vector Norms

   (a) p-norms
       i. For $p \ge 1$, the p-norm of $x \in V$ is $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$.
       ii. For $p = \infty$, the $\infty$-norm or Chebyshev norm is $\|x\|_\infty = \max_{i=1,\dots,n} |x_i|$.
       iii. The Chebyshev norm is the limit of the p-norms.

           Lemma 1.1. Let $x \in V$. Then $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$.

           Proof. Without loss of generality, let $\|x\|_\infty = 1$. Then $\exists j \in \{1,\dots,n\}$ such that $|x_j| = 1$. Since each $|x_i| \le 1$,
           $$\lim_{p\to\infty}\left(\sum_{i=1}^n |x_i|^p\right)^{1/p} \le \lim_{p\to\infty} n^{1/p} = 1$$
           Secondly,
           $$\lim_{p\to\infty}\left(\sum_{i=1}^n |x_i|^p\right)^{1/p} \ge \lim_{p\to\infty} |x_j| = 1$$

       iv. Weighted p-norms: add a non-negative weight term to each component in the sum.

   (b) Mahalanobis norm. Let $A$ be a symmetric positive definite matrix. Then
       $$\|x\|_A = \sqrt{x^* A x}$$

3. Matrix Norms

   (a) Compatible. Submultiplicative/Consistent.

       Definition 1.2. Let $\|\cdot\|_M$ be a matrix norm and $\|\cdot\|_V$ be a vector norm.
       i. A matrix norm is compatible with a vector norm if $\|Ax\|_V \le \|A\|_M \|x\|_V$.
       ii. A matrix norm is consistent or submultiplicative if $\|AB\|_M \le \|A\|_M \|B\|_M$.

   (b) Hölder Norms
       i. The Hölder p-norm of $A$ is $\|A\|_{H,p} = \left(\sum_{j=1}^n \sum_{i=1}^m |a_{ij}|^p\right)^{1/p}$.
       ii. The Hölder 2-norm is called the Frobenius norm.
       iii. The Hölder $\infty$-norm is $\|A\|_{H,\infty} = \max_{i,j} |a_{ij}|$.

   (c) Induced Norms
       i. Induced Norms. Spectral Norm.

          Definition 1.3. Let $\|\cdot\|_\alpha, \|\cdot\|_\beta$ be vector norms. The matrix norm $\|\cdot\|_{\alpha,\beta}$ is the induced norm defined by:
          $$\|A\|_{\alpha,\beta} = \max_{x \ne 0} \frac{\|Ax\|_\alpha}{\|x\|_\beta}$$
          When $\alpha = \beta = 2$, the induced norm is called the spectral norm.

       ii. Equivalent Definitions

          Lemma 1.2. The following are equivalent definitions for an induced norm:
          A. $\|A\|_{\alpha,\beta} = \sup\{\|Ax\|_\alpha : \|x\|_\beta = 1\}$
          B. $\|A\|_{\alpha,\beta} = \sup\{\|Ax\|_\alpha : \|x\|_\beta \le 1\}$

          Proof. For any $v \ne 0$, let $x = \frac{v}{\|v\|_\beta}$, so that $\|x\|_\beta = 1$ and $\|Ax\|_\alpha = \|Av\|_\alpha / \|v\|_\beta$. Using this in the definition, we see that the definition and the first characterization are equivalent. For the second characterization, the supremum over the unit ball is at least the supremum over the unit sphere. Conversely, for any $x$ with $0 < \|x\|_\beta \le 1$,
          $$\|Ax\|_\alpha \le \frac{\|Ax\|_\alpha}{\|x\|_\beta} = \left\|A \tfrac{x}{\|x\|_\beta}\right\|_\alpha,$$
          which is at most the supremum over the unit sphere. Hence the two characterizations agree.

       iii. Compatibility

          Lemma 1.3. Letting $\|\cdot\| = \|\cdot\|_{\alpha,\beta}$:
          $$\|Ax\| \le \|A\|\,\|x\|$$

          Proof. For any $x \ne 0$:
          $$\|A\| \ge \frac{\|Ax\|}{\|x\|}$$
          When $x = 0$, the result holds simply by plugging in values.

       iv. Consistency

          Lemma 1.4. Letting $\|\cdot\| = \|\cdot\|_{\alpha,\beta}$:
          $$\|AB\| \le \|A\|\,\|B\|$$

          Proof.
          $$\|AB\| = \max_{x \ne 0} \frac{\|A(Bx)\|}{\|x\|} \le \|A\| \max_{x \ne 0} \frac{\|Bx\|}{\|x\|} \le \|A\|\,\|B\|$$

       v. Computing the $\|\cdot\|_{1,1}$ norm, for $A \in \mathbb{F}^{m \times n}$.

          Lemma 1.5.
          $$\|A\|_{1,1} = \max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|$$

          Proof. There exists an $x$ such that $\|x\|_1 = 1$ and $\|Ax\|_1 = \|A\|_{1,1}$. Therefore:
          $$\|A\|_{1,1} = \sum_{i=1}^m \Big|\sum_{j=1}^n a_{ij} x_j\Big| \le \sum_{j=1}^n |x_j| \sum_{i=1}^m |a_{ij}| \le \left(\max_{j}\sum_{i=1}^m |a_{ij}|\right) \sum_{j=1}^n |x_j| = \max_{j}\sum_{i=1}^m |a_{ij}|,$$
          using $\|x\|_1 = 1$ in the last step. For the other direction, suppose the maximum occurs at the $k$th column. Then $\|Ae_k\|_1 \le \|A\|_{1,1}$, and the left-hand side equals $\max_{j}\sum_{i=1}^m |a_{ij}|$.

       vi. Computing $\|\cdot\|_\infty$.

          Lemma 1.6.
          $$\|A\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^n |a_{ij}|$$

          Proof. There is an $x$ such that $\|x\|_\infty = 1$ and $\|Ax\|_\infty = \|A\|_\infty$. Therefore:
          $$\|A\|_\infty = \|Ax\|_\infty = \max_{i=1,\dots,m} \Big|\sum_{j=1}^n a_{ij} x_j\Big| \le \max_{i=1,\dots,m} \sum_{j=1}^n |a_{ij}|$$
          For the other direction, let $k$ be the index of the maximizing row. Let $x$ be a vector such that $x_j = \mathrm{sgn}(a_{kj})$. Then $\|x\|_\infty = 1$ and $\|A\|_\infty \ge \|Ax\|_\infty \ge \sum_{j=1}^n |a_{kj}|$.
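The column-sum and row-sum formulas of Lemmas 1.5 and 1.6 are easy to check numerically. Below is a minimal NumPy sketch (not part of the original notes; the test matrix is an arbitrary choice) comparing the closed-form expressions with the induced norms returned by np.linalg.norm.

```python
import numpy as np

# An arbitrary real test matrix.
A = np.array([[1.0, -2.0,  3.0],
              [4.0,  0.5, -6.0]])

# Lemma 1.5: the induced (1,1)-norm is the maximum absolute column sum.
col_sum_norm = np.abs(A).sum(axis=0).max()

# Lemma 1.6: the induced infinity-norm is the maximum absolute row sum.
row_sum_norm = np.abs(A).sum(axis=1).max()

# np.linalg.norm computes the same induced norms directly.
assert np.isclose(col_sum_norm, np.linalg.norm(A, 1))
assert np.isclose(row_sum_norm, np.linalg.norm(A, np.inf))
print("||A||_{1,1} =", col_sum_norm, "  ||A||_inf =", row_sum_norm)
```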
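Looking back at the variations in Section 1.1, the following sketch illustrates minimum norm least squares (variation 4) and regularized least squares (variation 6) on a small rank-deficient system; the data and the choice $\Gamma = \sqrt{\lambda}\, I$ are assumptions made only for this example.

```python
import numpy as np

# A rank-deficient system: the second column duplicates the first, so the
# least-squares problem has infinitely many minimizers.
A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Minimum norm least squares (variation 4): among all minimizers of
# ||Az - b||_2, np.linalg.lstsq returns the one with smallest ||z||_2.
x_min_norm, *_ = np.linalg.lstsq(A, b, rcond=None)

# Regularized least squares (variation 6) with Gamma = sqrt(lam) * I,
# i.e. minimize ||Ax - b||_2^2 + lam * ||x||_2^2 via the normal equations.
lam = 1e-2
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

print("minimum norm solution:", x_min_norm)
print("regularized solution: ", x_reg)
```

For small $\lambda$ the regularized solution stays close to the minimum norm solution, since the penalty term breaks the tie among minimizers in the same direction.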
1.3 Error Analysis

1. Types of error, given the true value $x$ and the computed value $\hat{x}$:
   (a) $\|\hat{x} - x\|$ is the absolute error, but it depends on units.
   (b) $\|\hat{x} - x\| / \|x\|$ is the relative error, and it does not depend on units.
   (c) Pointwise error: compute $\|y\|$ where $y_i = \frac{\hat{x}_i - x_i}{x_i}\, \mathbb{1}[x_i \ne 0]$.

2. Backwards Error Analysis

   (a) Notation
       i. Suppose we want to solve $Ax = b$, and we denote by $\Delta(A, b) = \hat{x}$ the algorithm which produces the estimate.
       ii. The condition number of a matrix is $\kappa(A) = \|A\|\,\|A^{-1}\|$.
       iii. Let $\rho = \|A^{-1}\delta A\|$ for some small perturbation matrix $\delta A$, with $\rho < 1$.

   (b) Idea: view $\hat{x}$ as the exact solution to a nearby system $(A + \delta A)\hat{x} = b + \delta b$.

   (c) Error bound

       Lemma 1.7. Suppose $A$ is an invertible matrix and we have a compatible norm. If $\frac{\|\delta A\|}{\|A\|} \le \epsilon$ and $\frac{\|\delta b\|}{\|b\|} \le \epsilon$, then
       $$\frac{\|x - \hat{x}\|}{\|x\|} \le \frac{2\epsilon\,\kappa(A)}{1 - \rho}$$

       Proof. Note that:
       $$(I + A^{-1}\delta A)(\hat{x} - x) = A^{-1}(\delta b - \delta A\, x)$$
       Then:
       $$(1 - \rho)\|\hat{x} - x\| \le \|A^{-1}\|\left(\|\delta b\| + \|\delta A\|\,\|x\|\right)$$
       Dividing both sides by $\|x\|$ and multiplying the right-hand side by $1 = \|A\|/\|A\|$:
       $$\frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \rho}\left(\frac{\|\delta b\|}{\|A\|\,\|x\|} + \frac{\|\delta A\|}{\|A\|}\right)$$
       Noting that $\|b\| \le \|A\|\,\|x\|$, the result follows.
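To see Lemma 1.7 in action, the sketch below perturbs an ill-conditioned $2 \times 2$ system and compares the observed relative error with the bound $2\epsilon\,\kappa(A)/(1-\rho)$; the matrix, right-hand side, and perturbation directions are assumptions chosen purely for illustration.

```python
import numpy as np

# An ill-conditioned system: the rows of A are nearly parallel, so kappa(A) is large.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)

# Perturbations scaled so that ||dA||/||A|| = ||db||/||b|| = eps (2-norms throughout).
eps = 1e-8
dA = eps * np.linalg.norm(A, 2) * np.array([[0.0, 1.0],
                                            [0.0, 0.0]])
db = eps * np.linalg.norm(b) * np.array([1.0, 0.0])
x_hat = np.linalg.solve(A + dA, b + db)

kappa = np.linalg.cond(A, 2)                    # kappa(A) = ||A|| ||A^{-1}||
rho = np.linalg.norm(np.linalg.inv(A) @ dA, 2)  # rho = ||A^{-1} dA||, here << 1
observed = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = 2 * eps * kappa / (1 - rho)

print(f"kappa(A) = {kappa:.3e}")
print(f"relative error = {observed:.3e}  <=  bound = {bound:.3e}")
```

The observed error is far larger than the size of the perturbations themselves, which is exactly the amplification by $\kappa(A)$ that the lemma predicts.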
1.4 Floating Point Numbers

1. Motivation: computers do not have infinite memory, and so they can only store numbers up to a certain precision.

2. Floating point numbers; sign, mantissa, exponent, base.

   Definition 1.4. Floating Point Numbers are $F \subset \mathbb{Q}$ that have the following representation:
   $$\pm a_1 a_2 \dots a_k \times b^{e_1 \dots e_l}$$
   (a) $\pm$ is called the sign
   (b) $a_1 \dots a_k$ is called the mantissa, and the $a_i$ are values in some finite field
   (c) $e_1 \dots e_l$ is called the exponent, and the $e_j$ are values in some finite field
   (d) $b$ is the base

3. Floating Point Representation Standards

   (a) Floating Point Representation. Machine Precision.

       Definition 1.5. A floating point representation is a function $fl : \mathbb{R} \to F$ which is characterized by the machine precision, denoted $\epsilon_m$ and defined as:
       $$\epsilon_m = \inf\{x \in \mathbb{R} : x > 0,\ fl(1 + x) \ne 1\}$$

   (b) Standard 1: $\forall x \in \mathbb{R}$, $\exists x' \in F$ such that $|x - x'| \le \epsilon_m |x|$.
   (c) Standard 2: $\forall x, y \in \mathbb{R}$, there is an $|\epsilon_1| \le \epsilon_m$ such that $fl(x \pm y) = (x \pm y)(1 + \epsilon_1)$.
   (d) Standard 3: $\forall x, y \in \mathbb{R}$, there is an $|\epsilon_2| \le \epsilon_m$ such that $fl(xy) = (xy)(1 + \epsilon_2)$.
   (e) Standard 4: $\forall x, y \in \mathbb{R}$ with $y \ne 0$, there is an $|\epsilon_3| \le \epsilon_m$ such that $fl(x/y) = (x/y)(1 + \epsilon_3)$.

4. Floating Point in Computers

   (a) Fields: $b = 2$ and $a_i, e_j \in \{0, 1\}$.
   (b) Storage
       i. A floating point number requires $1 + l + k$ bits of storage, laid out as:
          $$\pm \mid e_1 \mid e_2 \mid \cdots \mid e_l \mid a_1 \mid a_2 \mid \cdots \mid a_k$$
       ii. 32-bit (single precision) numbers: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.
       iii. 64-bit (double precision) numbers: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa.
   (c) Errors and (typical) handling
       i. Round-off error occurs when a number is more precise than the mantissa allows, and is handled by cutting off the lower-priority digits.
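As a small illustration of Definition 1.5 and the 64-bit layout above, the sketch below inspects machine precision and the sign/exponent/mantissa bits of a double; the sample value $-0.15625$ is an arbitrary (exactly representable) choice, and np.finfo and struct are used only for this demonstration.

```python
import struct
import numpy as np

# Machine precision for IEEE double precision (1 sign, 11 exponent, 52 mantissa bits):
# the gap between 1.0 and the next representable number.
eps_m = np.finfo(np.float64).eps
print("machine precision:", eps_m)      # roughly 2.22e-16

# Consistent with Definition 1.5: adding something well below eps_m to 1.0
# rounds back to 1.0, while adding eps_m itself does not.
print(1.0 + eps_m == 1.0)        # False
print(1.0 + eps_m / 4.0 == 1.0)  # True

# Storage layout of a double: sign | exponent | mantissa.
bits = format(struct.unpack(">Q", struct.pack(">d", -0.15625))[0], "064b")
print("sign    :", bits[0])       # 1 bit
print("exponent:", bits[1:12])    # 11 bits
print("mantissa:", bits[12:])     # 52 bits
```

Single precision works the same way, only with the 8-bit exponent and 23-bit mantissa noted above.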