Low Rank Approximation Lecture 1

Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
[email protected]

Organizational aspects

- Lectures: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
- Exercises: Tuesday 8-10, MA A110. First: September 25, Last: December 18.
- Exam: Miniproject + oral exam.
- Webpage: https://anchp.epfl.ch/lowrank
- [email protected], [email protected]

From http://www.niemanlab.org

... his [Aleksandr Kogan's] message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.

Rank and basic properties

For a field $F$, let $A \in F^{m \times n}$. Then
$$\operatorname{rank}(A) := \dim(\operatorname{range}(A)).$$
For simplicity, $F = \mathbb{R}$ throughout the lecture and often $m \ge n$.

Lemma. Let $A \in \mathbb{R}^{m \times n}$. Then
1. $\operatorname{rank}(A^T) = \operatorname{rank}(A)$;
2. $\operatorname{rank}(PAQ) = \operatorname{rank}(A)$ for invertible matrices $P \in \mathbb{R}^{m \times m}$, $Q \in \mathbb{R}^{n \times n}$;
3. $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$ for any matrix $B \in \mathbb{R}^{n \times p}$;
4. $\operatorname{rank}\left(\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}\right) \ge \operatorname{rank}(A_{11}) + \operatorname{rank}(A_{22})$ for $A_{11} \in \mathbb{R}^{m_1 \times n_1}$, $A_{12} \in \mathbb{R}^{m_1 \times n_2}$, $A_{22} \in \mathbb{R}^{m_2 \times n_2}$.

Proof: See Linear Algebra 1 / Exercises.

Rank and matrix factorizations

Let $\mathcal{B} = \{b_1, \ldots, b_r\} \subset \mathbb{R}^m$ with $r = \operatorname{rank}(A)$ be a basis of $\operatorname{range}(A)$. Then each of the columns of $A = [a_1, a_2, \ldots, a_n]$ can be expressed as a linear combination of $\mathcal{B}$:
$$a_i = b_1 c_{i1} + b_2 c_{i2} + \cdots + b_r c_{ir} = [b_1, \ldots, b_r] \begin{bmatrix} c_{i1} \\ \vdots \\ c_{ir} \end{bmatrix}$$
for some coefficients $c_{ij} \in \mathbb{R}$ with $i = 1, \ldots, n$, $j = 1, \ldots, r$. Stacking these relations column by column gives
$$[a_1, \ldots, a_n] = [b_1, \ldots, b_r] \begin{bmatrix} c_{11} & \cdots & c_{n1} \\ \vdots & & \vdots \\ c_{1r} & \cdots & c_{nr} \end{bmatrix}.$$

Rank and matrix factorizations

Lemma. A matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$ admits a factorization of the form
$$A = BC^T, \qquad B \in \mathbb{R}^{m \times r}, \quad C \in \mathbb{R}^{n \times r}.$$
We say that $A$ has low rank if $\operatorname{rank}(A) \ll m, n$.

Illustration of low-rank factorization: storing $A$ requires $mn$ entries, whereas storing the factors $B$ and $C^T$ requires only $mr + nr$ entries.

- Generically (and in most applications), $A$ has full rank, that is, $\operatorname{rank}(A) = \min\{m, n\}$.
- We therefore aim instead at approximating $A$ by a low-rank matrix.

Questions addressed in lecture series

What? Theoretical foundations of low-rank approximation.
When? A priori and a posteriori estimates for low-rank approximation. Situations that allow for low-rank approximation techniques.
Why? Applications in engineering, scientific computing, data analysis, ..., where low-rank approximation plays a central role.
How? State-of-the-art algorithms for performing and working with low-rank approximations.

We will cover both matrices and tensors.

Literature for Lecture 1

- Golub/Van Loan 2013: Golub, Gene H.; Van Loan, Charles F. Matrix Computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013.
- Horn/Johnson 2013: Horn, Roger A.; Johnson, Charles R. Matrix Analysis. Second edition. Cambridge University Press, 2013.
+ References on the slides.

1. Fundamental tools

- SVD
- Relation to eigenvalues
- Norms
- Best low-rank approximation

The singular value decomposition

Theorem (SVD). Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. Then there are orthogonal matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ such that
$$A = U \Sigma V^T, \qquad \Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix} \in \mathbb{R}^{m \times n},$$
and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.

- $\sigma_1, \ldots, \sigma_n$ are called singular values.
- $u_1, \ldots, u_n$ are called left singular vectors.
- $v_1, \ldots, v_n$ are called right singular vectors.
- $A v_i = \sigma_i u_i$ and $A^T u_i = \sigma_i v_i$ for $i = 1, \ldots, n$.
- The singular values are always uniquely defined by $A$.
- The singular vectors are never unique. If $\sigma_1 > \sigma_2 > \cdots > \sigma_n > 0$, they are unique up to the sign changes $u_i \mapsto \pm u_i$, $v_i \mapsto \pm v_i$ (with matching signs).
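To connect the low-rank factorization and the SVD relations above with computation, here is a minimal MATLAB sketch; the sizes m, n, r and the random factors are illustrative assumptions, not part of the lecture material. It builds a rank-r matrix $A = BC^T$, checks its rank, and verifies $A v_i = \sigma_i u_i$ and $A^T u_i = \sigma_i v_i$.

    % Minimal sketch (illustrative sizes): a rank-r matrix A = B*C' and basic SVD checks.
    m = 8; n = 5; r = 2;                 % m >= n, r << m, n
    B = randn(m, r); C = randn(n, r);
    A = B*C';                            % generically rank(A) = r
    disp(rank(A))                        % prints 2

    [U, S, V] = svd(A);                  % full SVD: U is m-by-m, S is m-by-n, V is n-by-n
    sigma = diag(S);                     % singular values sigma_1 >= ... >= sigma_n >= 0
    % Verify A*v_i = sigma_i*u_i and A'*u_i = sigma_i*v_i for i = 1,...,n:
    disp(norm(A*V - U(:,1:n)*diag(sigma), 'fro'))    % ~ machine precision
    disp(norm(A'*U(:,1:n) - V*diag(sigma), 'fro'))   % ~ machine precision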
SVD: Sketch of proof

Induction over $n$; the case $n = 1$ is trivial. For general $n$, let $v_1$ solve
$$\max\{\|Av\|_2 : \|v\|_2 = 1\} =: \|A\|_2.$$
Set $\sigma_1 := \|A\|_2$ and $u_1 := A v_1 / \sigma_1$.[1] By definition, $A v_1 = \sigma_1 u_1$. After completion to orthogonal matrices $U_1 = [u_1, U_\perp] \in \mathbb{R}^{m \times m}$ and $V_1 = [v_1, V_\perp] \in \mathbb{R}^{n \times n}$:
$$U_1^T A V_1 = \begin{bmatrix} u_1^T A v_1 & u_1^T A V_\perp \\ U_\perp^T A v_1 & U_\perp^T A V_\perp \end{bmatrix} = \begin{bmatrix} \sigma_1 & w^T \\ 0 & A_1 \end{bmatrix},$$
with $w := V_\perp^T A^T u_1$ and $A_1 := U_\perp^T A V_\perp$. Since $\|\cdot\|_2$ is invariant under orthogonal transformations,
$$\sigma_1 = \|A\|_2 = \|U_1^T A V_1\|_2 = \left\| \begin{bmatrix} \sigma_1 & w^T \\ 0 & A_1 \end{bmatrix} \right\|_2 \ge \sqrt{\sigma_1^2 + \|w\|_2^2}.$$
Hence, $w = 0$. The proof is completed by applying the induction hypothesis to $A_1$.

[1] If $\sigma_1 = 0$, choose an arbitrary unit vector $u_1$.

Very basic properties of the SVD

- $r = \operatorname{rank}(A)$ is the number of nonzero singular values of $A$.
- $\operatorname{kernel}(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}$
- $\operatorname{range}(A) = \operatorname{span}\{u_1, \ldots, u_r\}$

SVD: Computation (for small dense matrices)

Computation of the SVD proceeds in two steps:

1. Reduction to bidiagonal form: By applying $n$ Householder reflectors from the left and $n - 1$ Householder reflectors from the right, compute orthogonal matrices $U_1$, $V_1$ such that
$$U_1^T A V_1 = \begin{bmatrix} B_1 \\ 0 \end{bmatrix},$$
where $B_1 \in \mathbb{R}^{n \times n}$ is an upper bidiagonal matrix.
2. Reduction to diagonal form: Use Divide & Conquer to compute orthogonal matrices $U_2$, $V_2$ such that $\Sigma = U_2^T B_1 V_2$ is diagonal. Set $U = U_1 U_2$ and $V = V_1 V_2$.

Step 1 is usually the most expensive. Remarks on Step 1:
- If $m$ is significantly larger than $n$, say $m \ge 3n/2$, first computing a QR decomposition of $A$ reduces the cost.
- Most modern implementations reduce $A$ successively via banded form to bidiagonal form.[2]

[2] Bischof, C. H.; Lang, B.; Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Software 26 (2000), no. 4, 581-601.

SVD: Computation (for small dense matrices)

In most applications, the vectors $u_{n+1}, \ldots, u_m$ are not of interest. Omitting them yields the following variant of the SVD.

Theorem (Economy-size SVD). Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. Then there is a matrix $U \in \mathbb{R}^{m \times n}$ with orthonormal columns and an orthogonal matrix $V \in \mathbb{R}^{n \times n}$ such that
$$A = U \Sigma V^T, \qquad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n \times n},$$
and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.

Computed by MATLAB's [U,S,V] = svd(A,'econ').

Complexity:
                          memory           operations
  singular values only    $O(mn)$          $O(mn^2)$
  economy-size SVD        $O(mn)$          $O(mn^2)$
  (full) SVD              $O(m^2 + mn)$    $O(m^2 n + mn^2)$

SVD: Computation (for small dense matrices)

Beware of roundoff error when interpreting singular value plots. Example: semilogy(svd(hilb(100)))

[Figure: semilogy plot of the singular values of the 100x100 Hilbert matrix; vertical axis from $10^{-20}$ to $10^{0}$, horizontal axis from 0 to 100.]

- The kink is caused by roundoff error and does not reflect the true behavior of the singular values.
- The exact singular values are known to decay exponentially.[3]
- Sometimes more accuracy is possible.[4]

[3] Beckermann, B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer. Math. 85 (2000), no. 4, 553-577.
[4] Drmač, Z.; Veselić, K. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1322-1342.

Singular/eigenvalue relations: symmetric matrices

A symmetric matrix $A = A^T \in \mathbb{R}^{n \times n}$ admits a spectral decomposition
$$A = U \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) U^T$$
with an orthogonal matrix $U$. After reordering we may assume $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. The spectral decomposition can be turned into an SVD $A = U \Sigma V^T$ by defining
$$\Sigma = \operatorname{diag}(|\lambda_1|, \ldots, |\lambda_n|), \qquad V = U \operatorname{diag}(\operatorname{sign}(\lambda_1), \ldots, \operatorname{sign}(\lambda_n)).$$

Remark: This extends to the more general case of normal matrices (e.g., orthogonal or symmetric) via complex spectral or real Schur decompositions.
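As a quick illustration of this relation, the following MATLAB sketch (random symmetric test matrix, size chosen arbitrarily) checks that the singular values of a symmetric matrix coincide with the absolute values of its eigenvalues.

    % Sketch: for symmetric A, the singular values are |eigenvalues|.
    n = 6;
    A = randn(n); A = (A + A')/2;                     % symmetric test matrix
    lambda = eig(A);                                  % eigenvalues, possibly negative
    sigma  = svd(A);                                  % singular values, decreasing order
    disp(norm(sort(abs(lambda),'descend') - sigma))   % ~ machine precision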
Singular/eigenvalue relations: general matrices

Consider an SVD $A = U \Sigma V^T$ of $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. We then have:

1. Spectral decomposition of the Gramian $A^T A$:
$$A^T A = V \Sigma^T \Sigma V^T = V \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) V^T.$$
Hence $A^T A$ has eigenvalues $\sigma_1^2, \ldots, \sigma_n^2$, and the right singular vectors of $A$ are eigenvectors of $A^T A$.

2. Spectral decomposition of the Gramian $A A^T$:
$$A A^T = U \Sigma \Sigma^T U^T = U \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2, 0, \ldots, 0) U^T.$$
Hence $A A^T$ has eigenvalues $\sigma_1^2, \ldots, \sigma_n^2$ and, additionally, $m - n$ zero eigenvalues; the first $n$ left singular vectors of $A$ are eigenvectors of $A A^T$.

3. Decomposition of the Golub-Kahan matrix:
$$\mathcal{A} = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix} \begin{bmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix}^T.$$
The eigenvalues of $\mathcal{A}$ are $\pm \sigma_j$, $j = 1, \ldots, n$, and zero ($m - n$ times).

Norms: Spectral and Frobenius norm

Given an SVD $A = U \Sigma V^T$, one defines:
- Spectral norm: $\|A\|_2 = \sigma_1$.
- Frobenius norm: $\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$.

Basic properties:
- $\|A\|_2 = \max\{\|Av\|_2 : \|v\|_2 = 1\}$ (see the proof of the SVD).
- $\|\cdot\|_2$ and $\|\cdot\|_F$ are both (submultiplicative) matrix norms.
- $\|\cdot\|_2$ and $\|\cdot\|_F$ are both unitarily invariant, that is, $\|QAZ\|_2 = \|A\|_2$ and $\|QAZ\|_F = \|A\|_F$ for any orthogonal matrices $Q$, $Z$.
- $\|A\|_2 \le \|A\|_F \le \sqrt{r}\, \|A\|_2$, where $r = \operatorname{rank}(A)$.
- $\|AB\|_F \le \min\{\|A\|_2 \|B\|_F, \|A\|_F \|B\|_2\}$.

Euclidean geometry on matrices

Let $B \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$. Then
$$\operatorname{trace}(B) := b_{11} + \cdots + b_{nn} = \lambda_1 + \cdots + \lambda_n.$$
In turn,
$$\|A\|_F^2 = \operatorname{trace}(A^T A) = \operatorname{trace}(A A^T) = \sum_{i,j} a_{ij}^2.$$
Two simple consequences:
- $\|\cdot\|_F$ is the norm induced by the matrix inner product
$$\langle A, B \rangle := \operatorname{trace}(A B^T), \qquad A, B \in \mathbb{R}^{m \times n}.$$
- Partition $A = [a_1, a_2, \ldots, a_n]$ and define the vectorization
$$\operatorname{vec}(A) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \in \mathbb{R}^{mn}.$$
Then $\langle A, B \rangle = \langle \operatorname{vec}(A), \operatorname{vec}(B) \rangle$ and $\|A\|_F = \|\operatorname{vec}(A)\|_2$.
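The norm and trace identities above are easy to check numerically. The following MATLAB sketch (random test matrices of arbitrary size, purely for illustration) compares $\|A\|_2$ with $\sigma_1$, $\|A\|_F$ with the singular values and with $\operatorname{trace}(A^T A)$, and the trace inner product with the inner product of the vectorizations.

    % Sketch: spectral/Frobenius norms and the trace inner product (illustrative sizes).
    m = 7; n = 4;
    A = randn(m, n); B = randn(m, n);
    sigma = svd(A);

    disp(norm(A, 2)       - sigma(1))       % ||A||_2 = sigma_1
    disp(norm(A, 'fro')   - norm(sigma))    % ||A||_F = sqrt(sigma_1^2 + ... + sigma_n^2)
    disp(norm(A, 'fro')^2 - trace(A'*A))    % ||A||_F^2 = trace(A^T A)
    disp(trace(A*B') - A(:)'*B(:))          % <A,B> = trace(A B^T) = <vec(A), vec(B)>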
