REU 2013: Apprentice Program Summer 2013
Lecture 15: July 22, 2013
Madhur Tulsiani
Scribe: David Kim

1 Expander Mixing Lemma

Let $G = (V, E)$ be $d$-regular, such that its adjacency matrix $A$ has eigenvalues $\mu_1 = d \geq \mu_2 \geq \cdots \geq \mu_n$. Let $S, T \subseteq V$, not necessarily disjoint, and let $E(S, T) = \{(s, t) : s \in S,\ t \in T,\ \{s, t\} \in E\}$.

Question: How close is $|E(S, T)|$ to what is "expected"?

Exercise 1.1 Given $G$ as above, if one picks a random pair $i, j \in V(G)$, what is $\Pr[i, j \text{ are adjacent}]$?

Answer:
$$\Pr[\{i, j\} \in E] = \frac{\#\text{good cases}}{\#\text{all cases}} = \frac{\#\text{edges}}{\binom{n}{2}} = \frac{dn/2}{\binom{n}{2}} = \frac{d}{n-1}.$$

Since for $S, T \subseteq V$ the number of possible pairs of vertices from $S$ and $T$ is $|S| \cdot |T|$, we "expect" $|S| \cdot |T| \cdot \frac{d}{n-1}$ of those pairs to be edges.

Exercise 1.2 (*) $\left|\, |E(S, T)| - |S| \cdot |T| \cdot \frac{d}{n} \,\right| \leq \mu \sqrt{|S| \cdot |T|}$, where $\mu = \max_{i \geq 2} |\mu_i|$.

Note that the smaller the value of $\mu$, the closer $|E(S, T)|$ is to what is "expected"; hence expander graphs are sometimes called pseudorandom graphs when $\mu$ is small.

We will now show a generalization of the Spectral Theorem, with a generalized notion of eigenvalues for matrices that are not necessarily square.

2 Singular Value Decomposition

Recall that by the Spectral Theorem, given a normal matrix $A \in \mathbb{C}^{n \times n}$ ($A^*A = AA^*$), we have
$$A = U D U^* = \sum_{i=1}^{n} \lambda_i u_i u_i^*$$
where $U$ is unitary and $D$ is diagonal, so we have a decomposition of $A$ in terms of a nice orthonormal basis, consisting of the columns of $U$. We show that any rectangular matrix $A \in \mathbb{C}^{m \times n}$ also has a decomposition of the form
$$A = U \Sigma V^*$$
where $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary (orthogonal if real), and $\Sigma$ is "almost diagonal", giving us a nice pair of orthonormal bases to work with.

Definition 2.1 (Singular Value) $\sigma \geq 0$ is called a singular value if there exist $u \in \mathbb{C}^m$ and $v \in \mathbb{C}^n$ with $\|u\| = \|v\| = 1$ such that $Av = \sigma u$ and $A^*u = \sigma v$, i.e. $u$ and $v$ come in a "pair". $v$ is called the right singular vector, and $u$ is called the left singular vector.

Exercise 2.2 ($\sigma$ is a singular value of $A$) $\Leftrightarrow$ ($\sigma^2$ is an eigenvalue of $AA^*$ and $A^*A$).

($\Rightarrow$) By assumption, there exist $u, v$ with $Av = \sigma u$ and $A^*u = \sigma v$. Then $AA^*u = A(\sigma v) = \sigma Av = \sigma^2 u$. Likewise, $A^*Av = A^*(\sigma u) = \sigma A^*u = \sigma^2 v$.

($\Leftarrow$) Before proving the converse, what can we infer about $AA^*$, $A^*A$, and $\sigma^2$, assuming that $\sigma^2$ is an eigenvalue?

Recall that a matrix $M \in \mathbb{C}^{n \times n}$ is Hermitian if $M = M^*$. We proved in a previous lecture that the eigenvalues of a Hermitian matrix are real.

Clearly $AA^*$ and $A^*A$ are Hermitian, so $\sigma^2 \in \mathbb{R}$. Also, $(\forall x \in \mathbb{C}^m)(x^* AA^* x = \langle A^*x, A^*x \rangle \geq 0)$, so $AA^*$ is positive semidefinite, and hence $\sigma^2 \geq 0$, as $\sigma^2$ is an eigenvalue.

The next exercise shows that given such a $\sigma^2$, we can assume that $\sigma \geq 0$.

Exercise 2.3 WLOG, all singular values satisfy $\sigma \geq 0$.

Proof: If $\sigma < 0$, let $\sigma' = -\sigma > 0$. Then $Av = \sigma u = (-\sigma)(-u) = \sigma'(-u)$, and $A^*u = \sigma v \Rightarrow A^*(-u) = -\sigma v = \sigma' v$, so $\sigma'$ is a singular value with the pair $(-u, v)$.

For the rest of this lecture we will stick to real matrices, but the same results generalize to complex matrices. Recall we defined $B \in \mathbb{R}^{n \times n}$ with $B = B^T$ to be PSD (positive semi-definite) if $(\forall x \in \mathbb{R}^n)(x^T B x \geq 0)$. The same definition applies to complex matrices: $B \in \mathbb{C}^{n \times n}$ with $B = B^*$ is PSD if $(\forall x \in \mathbb{C}^n)(\langle x, Bx \rangle \geq 0)$.

We now prove the main theorem:

Theorem 2.4 (Singular Value Decomposition) Let $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rk}(A) = r$. Then there exist $\sigma_1 \geq \cdots \geq \sigma_r > 0$, orthogonal matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$, and $\Sigma \in \mathbb{R}^{m \times n}$ such that $A = U \Sigma V^T$, where $\Sigma_{ii} = \sigma_i$ for $i = 1, \ldots, r$, and $\Sigma_{ij} = 0$ everywhere else.

Proof strategy: We will use the Spectral Theorem to decompose $A^T A = V D V^T$, and take the diagonal entries of $D$ as the $\sigma_i^2$, the eigenvalues.
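Before carrying out this strategy, here is a small numerical sanity check of the theorem and of the strategy itself (not part of the lecture; a sketch using NumPy on an arbitrary random low-rank matrix): the nonzero eigenvalues of $A^T A$ are exactly the squares of the singular values, and $U \Sigma V^T$ reconstructs $A$.

    # Sketch: numerically checking Theorem 2.4 on a random rank-r matrix (illustration only).
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 5, 4, 3
    # Build a rank-r matrix A in R^{m x n} as a product of random factors.
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

    # Full SVD: U is m x m, Vt is n x n, and s holds the singular values.
    U, s, Vt = np.linalg.svd(A)
    Sigma = np.zeros((m, n))
    Sigma[:len(s), :len(s)] = np.diag(s)

    # A = U Sigma V^T, with U and V orthogonal.
    assert np.allclose(U @ Sigma @ Vt, A)
    assert np.allclose(U.T @ U, np.eye(m)) and np.allclose(Vt @ Vt.T, np.eye(n))

    # Proof strategy: the eigenvalues of A^T A are the sigma_i^2 (plus zeros).
    eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    assert np.allclose(eigvals[:r], s[:r] ** 2)
    print("rank:", np.linalg.matrix_rank(A), "singular values:", np.round(s, 4))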
Question: Why is $\sigma_r > 0$?

Answer: By the following exercise, $\mathrm{rk}(A) = \mathrm{rk}(A^T A) = r$, and $A^T A \sim D$ with the $\sigma_i^2$ on the diagonal.

Exercise 2.5 Prove: $(\forall A \in \mathbb{R}^{m \times n})(\mathrm{rk}(A^T A) = \mathrm{rk}(A))$. Hint: rank-nullity theorem.

Proof: [Singular Value Decomposition]

$A^T A$ is symmetric, hence by the Spectral Theorem $A^T A = V D V^T$, where $D \in \mathbb{R}^{n \times n}$ is diagonal with non-negative entries and $V$ is orthogonal. The rank of $D$ is $r$, so it has $r$ positive entries on the diagonal, say $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ in the first $r$ diagonal entries, and $0$'s everywhere else. Write $D' \in \mathbb{R}^{r \times r}$ for the top-left diagonal block of $D$ containing $\lambda_1, \ldots, \lambda_r$.

We simply use the same $V$ above (the eigenbasis of $A^T A$) and see what we get from $AV$. The claim is that this gives us the desired decomposition.

• Let $V = \begin{pmatrix} V_1 & V_2 \end{pmatrix} \in \mathbb{R}^{n \times n}$, where $V_1$ consists of the first $r$ columns and $V_2$ of the last $n - r$ columns of $V$.

• Let $U = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \in \mathbb{R}^{m \times m}$, where $U_1 = A V_1 (D')^{-1/2} \in \mathbb{R}^{m \times r}$ and $(D')^{-1/2} \in \mathbb{R}^{r \times r}$ with $\big((D')^{-1/2}\big)_{ii} = \frac{1}{\sqrt{\lambda_i}}$. Let $U_2$ consist of any $m - r$ additional columns extending the columns of $U_1$ to an orthonormal basis.

• Let $\Sigma = \begin{pmatrix} (D')^{1/2} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times n}$, where $(D')^{1/2} \in \mathbb{R}^{r \times r}$ with $\big((D')^{1/2}\big)_{ii} = \sqrt{\lambda_i}$.

Then $U_1 = \begin{pmatrix} \frac{Av_1}{\sqrt{\lambda_1}} & \cdots & \frac{Av_r}{\sqrt{\lambda_r}} \end{pmatrix}$. $U$ will be orthogonal if $U_1$ consists of orthonormal columns.

Claim 2.6 The columns of $U_1$ are orthonormal.

Proof. Since $V$ is an orthogonal matrix,
$$\langle u_i, u_j \rangle = \left\langle \frac{Av_i}{\sqrt{\lambda_i}}, \frac{Av_j}{\sqrt{\lambda_j}} \right\rangle = \frac{v_i^T (A^T A v_j)}{\sqrt{\lambda_i \lambda_j}} = \frac{\lambda_j v_i^T v_j}{\sqrt{\lambda_i \lambda_j}} = \frac{\lambda_j \langle v_i, v_j \rangle}{\sqrt{\lambda_i \lambda_j}} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}$$

Thus we can extend the columns of $U_1$ with $U_2$ so that $U$ is orthogonal. To complete the proof, we only need to show that these $U, V, \Sigma$ satisfy the properties we desire.

Claim 2.7 Given $A, V, U, \Sigma$ as above, $A = U \Sigma V^T$.

Proof.
$$U \Sigma V^T = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} (D')^{1/2} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} (D')^{1/2} V_1^T \\ 0 \end{pmatrix} = U_1 (D')^{1/2} V_1^T = A V_1 (D')^{-1/2} (D')^{1/2} V_1^T = A V_1 V_1^T = A,$$
where the last step holds because $A V_2 = 0$ (for $i > r$ we have $\|A v_i\|^2 = v_i^T A^T A v_i = 0$), so $A V_1 V_1^T = A (V V^T - V_2 V_2^T) = A$.

Now that we have the decomposition, we check that it indeed gives us the singular values, which completes the proof for Exercise 2.2.

Exercise 2.8 If $A = U \Sigma V^T$, where $\mathrm{rk}(A) = \mathrm{rk}(\Sigma) = r$, then for all $i = 1, \ldots, r$, with $\sigma_i = \Sigma_{ii}$,
$$A v_i = \sigma_i u_i, \qquad A^T u_i = \sigma_i v_i.$$

Proof: Since the columns of $V$ are orthonormal, $V^T v_i = e_i$, the $i$-th standard basis vector, so
$$A v_i = U \Sigma V^T v_i = U \Sigma e_i = \sigma_i U e_i = \sigma_i u_i.$$
(Likewise for $A^T u_i = \sigma_i v_i$.)

Exercise 2.9 If $A = U \Sigma V^T$, then $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$, respectively.

3 Extension of Courant-Fischer

Now we have the SVD theorem. We will show that singular values also have a characterization analogous to that of Courant-Fischer for eigenvalues.

Exercise 3.1 Show that for $A \in \mathbb{R}^{m \times n}$, the $k$-th largest singular value of $A$ is
$$\sigma_k = \min_{\substack{S \subseteq \mathbb{R}^n \\ \dim(S) = n-k+1}} \; \max_{x \in S,\, x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{\substack{T \subseteq \mathbb{R}^n \\ \dim(T) = k}} \; \min_{x \in T,\, x \neq 0} \frac{\|Ax\|}{\|x\|}.$$
Hint: Use Courant-Fischer for $A^T A$.

Proof: Let us show the case $k = 1$ (the rest is an exercise).
$$\sigma_1 = \max_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|}{\|x\|} \iff \sigma_1^2 = \max_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|^2}{\|x\|^2} = \max_{x \in \mathbb{R}^n,\, x \neq 0} \frac{x^T A^T A x}{x^T x} = \max_{x \in \mathbb{R}^n,\, x \neq 0} R_{A^T A}(x),$$
where $R_{A^T A}$ is the Rayleigh quotient of $A^T A$. We know the right-hand side from Courant-Fischer, as $\sigma_1^2$ is the largest eigenvalue of $A^T A$.

Question: what is the $x$ that achieves $\sigma_1$? $v_1$, the first column of $V$ in $A = U \Sigma V^T$.
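As a quick numerical illustration of this characterization (a sketch, not from the lecture; the random matrix and the crude sampling over unit vectors are choices of this example), one can check that $v_1$ attains $\|Av_1\| = \sigma_1$ while no sampled unit vector exceeds it:

    # Sketch: sigma_1 = max_{||x||=1} ||Ax||, attained at v_1 (illustration only).
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 4))

    U, s, Vt = np.linalg.svd(A)
    sigma1, v1 = s[0], Vt[0]        # rows of Vt are the right singular vectors v_i

    # v_1 attains the maximum of ||Ax|| over unit vectors x.
    assert np.isclose(np.linalg.norm(A @ v1), sigma1)

    # No randomly sampled unit vector does better (up to floating point error).
    xs = rng.standard_normal((10000, 4))
    xs /= np.linalg.norm(xs, axis=1, keepdims=True)
    ratios = np.linalg.norm(xs @ A.T, axis=1)       # ||A x|| for each sampled unit x
    assert ratios.max() <= sigma1 + 1e-9
    print("sigma_1 =", round(sigma1, 4), "best sampled value =", round(ratios.max(), 4))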
4 Applications

4.1 Data Fitting

The characterization of singular values from the extension of Courant-Fischer has many good applications, and a nice interpretation in terms of fitting data to a line, or a subspace.

Suppose we have $m$ points in $\mathbb{R}^n$, and we wish to find a line passing through the origin that "fits the data" best. By "fitting the data" best, we mean finding the line $l$ which minimizes $\sum_{i=1}^{m} (d_i(l))^2$ (least squared distance), where $d_i(l)$ is the distance from the $i$-th data point to the closest point of $l$.

How do you specify a line passing through the origin? By a unit vector in the direction of the line. So let $v$ be the unit vector in the direction of a line $l$.

Question: Let $a_i \in \mathbb{R}^n$, and let $l$ be the line in the direction of the unit vector $v$. What is $d_i(l)$?

Answer: By a simple geometric argument (the projection of $a_i$ onto $l$ is orthogonal to the residual),
$$\|a_i\|^2 = |\langle a_i, v \rangle|^2 + (d_i(l))^2 \;\Rightarrow\; (d_i(l))^2 = \|a_i\|^2 - |\langle a_i, v \rangle|^2.$$
Since $\|a_i\|^2$ does not depend on $v$,
$$\min_{\|v\|=1,\, v \in \mathbb{R}^n} \sum_{i=1}^{m} (d_i(l))^2 = \min_{\|v\|=1} \sum_{i=1}^{m} \left( \|a_i\|^2 - |\langle a_i, v \rangle|^2 \right) = \sum_{i=1}^{m} \|a_i\|^2 - \max_{\|v\|=1} \sum_{i=1}^{m} |\langle a_i, v \rangle|^2 = \sum_{i=1}^{m} \|a_i\|^2 - \max_{\|v\|=1} \|Av\|^2,$$
where $A \in \mathbb{R}^{m \times n}$ is the matrix whose $i$-th row is $a_i^T$.
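By the previous section, $\max_{\|v\|=1} \|Av\| = \sigma_1$ is attained at $v_1$, the top right singular vector of $A$, so the best-fit line through the origin is the one in the direction $v_1$. A small numerical sketch of this (my own illustration, with made-up data points scattered around a known direction; the direction and noise level are arbitrary):

    # Sketch: best-fit line through the origin via the top right singular vector (illustration only).
    import numpy as np

    rng = np.random.default_rng(2)
    true_dir = np.array([3.0, 4.0]) / 5.0              # hypothetical "true" direction of the data
    # 200 points near the line spanned by true_dir, with small noise.
    a = np.outer(rng.standard_normal(200), true_dir) + 0.05 * rng.standard_normal((200, 2))

    # Rows of a are the data points a_i^T; the best-fit direction is v_1.
    _, _, Vt = np.linalg.svd(a)
    v1 = Vt[0]

    residuals = a - np.outer(a @ v1, v1)               # component of each a_i orthogonal to v_1
    print("fitted direction:", np.round(v1, 3))
    print("sum of squared distances:", round(float((residuals ** 2).sum()), 4))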