Introduction to Machine Learning (67577) Lecture 11


Shai Shalev-Shwartz
School of CS and Engineering, The Hebrew University of Jerusalem

Dimensionality Reduction

Dimensionality reduction means taking data in a high dimensional space and mapping it into a low dimensional space.

Why?
- Reduces training (and testing) time
- Reduces estimation error
- Interpretability of the data, finding meaningful structure in data, illustration

Linear dimensionality reduction: $x \mapsto Wx$, where $W \in \mathbb{R}^{n,d}$ and $n < d$.
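As a concrete illustration, here is a minimal numpy sketch of such a linear mapping. The data matrix and the random choice of $W$ are illustrative assumptions only; PCA, introduced below, is a principled way to choose $W$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 100, 5, 50                 # original dimension, reduced dimension (n < d), #examples
X = rng.normal(size=(m, d))          # m examples in R^d, one per row

W = rng.normal(size=(n, d))          # some W in R^{n,d}; PCA chooses this matrix in a principled way
Y = X @ W.T                          # row i is y_i = W x_i, now in R^n
print(X.shape, "->", Y.shape)        # (50, 100) -> (50, 5)
```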
Outline
1. Principal Component Analysis (PCA)
2. Random Projections
3. Compressed Sensing

Principal Component Analysis (PCA)

We map $x \mapsto Wx$. What makes $W$ a good matrix for dimensionality reduction? A natural criterion: we want to be able to approximately recover $x$ from $y = Wx$.

PCA:
- Linear recovery: $\tilde{x} = Uy = UWx$.
- Measure "approximate recovery" by the averaged squared norm: given examples $x_1, \ldots, x_m$, solve
$$\operatorname*{argmin}_{W \in \mathbb{R}^{n,d},\; U \in \mathbb{R}^{d,n}} \; \sum_{i=1}^{m} \|x_i - UWx_i\|^2 .$$

Solving the PCA Problem

Theorem. Let $A = \sum_{i=1}^{m} x_i x_i^\top$ and let $u_1, \ldots, u_n$ be the $n$ leading eigenvectors of $A$. Then the solution to the PCA problem is to set the columns of $U$ to be $u_1, \ldots, u_n$ and to set $W = U^\top$.
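Before the proof, a small numerical sketch of the theorem using numpy; the synthetic data and dimensions are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 200, 10, 3
X = rng.normal(size=(m, d))              # rows are the examples x_1, ..., x_m

A = X.T @ X                              # A = sum_i x_i x_i^T  (d x d)
eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues ascending, orthonormal eigenvectors as columns
U = eigvecs[:, ::-1][:, :n]              # the n leading eigenvectors as columns (d x n)
W = U.T                                  # the PCA matrix: x -> Wx reduces to R^n

Y = X @ W.T                              # y_i = W x_i
X_rec = Y @ U.T                          # x~_i = U W x_i (linear recovery)
print(np.sum((X - X_rec) ** 2))          # the PCA objective at the optimum
```

Here `np.linalg.eigh` is used because $A$ is symmetric; it returns eigenvalues in ascending order, so the columns are reversed to obtain the leading eigenvectors.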
Proof main ideas
- $UW$ is of rank $n$, therefore its range is an $n$-dimensional subspace, denoted $S$.
- The transformation $x \mapsto UWx$ moves $x$ into this subspace.
- The point in $S$ closest to $x$ is $VV^\top x$, where the columns of $V$ are an orthonormal basis of $S$.
- Therefore, we can assume w.l.o.g. that $W = U^\top$ and that the columns of $U$ are orthonormal.

Observe:
$$\|x - UU^\top x\|^2 = \|x\|^2 - 2x^\top UU^\top x + x^\top UU^\top UU^\top x = \|x\|^2 - x^\top UU^\top x = \|x\|^2 - \operatorname{trace}(U^\top x x^\top U).$$

Therefore, an equivalent PCA problem is
$$\operatorname*{argmax}_{U \in \mathbb{R}^{d,n} : \, U^\top U = I} \; \operatorname{trace}\!\left(U^\top \left(\sum_{i=1}^{m} x_i x_i^\top\right) U\right).$$

The solution is to set $U$ to be the $n$ leading eigenvectors of $A = \sum_{i=1}^{m} x_i x_i^\top$.

Value of the objective

It is easy to see that
$$\min_{W \in \mathbb{R}^{n,d},\; U \in \mathbb{R}^{d,n}} \; \sum_{i=1}^{m} \|x_i - UWx_i\|^2 = \sum_{i=n+1}^{d} \lambda_i(A),$$
where $\lambda_1(A) \ge \cdots \ge \lambda_d(A)$ are the eigenvalues of $A$.

Centering

It is common practice to "center" the examples before applying PCA, namely:
- First calculate $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$.
- Then apply PCA to the vectors $(x_1 - \mu), \ldots, (x_m - \mu)$.
This is also related to the interpretation of PCA as variance maximization (given in the exercise).

Efficient implementation for $d \gg m$ and kernel PCA

Recall: $A = \sum_{i=1}^{m} x_i x_i^\top = X^\top X$, where $X \in \mathbb{R}^{m,d}$ is the matrix whose $i$'th row is $x_i^\top$.
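Below is a minimal numpy sketch of the efficient $d \gg m$ computation, assuming the standard identity behind it: the nonzero eigenvalues of $A = X^\top X$ coincide with those of the $m \times m$ matrix $B = XX^\top$, and if $Bv = \lambda v$ then $X^\top v$, after normalization, is an eigenvector of $A$ with the same eigenvalue (replacing $B$ by a kernel matrix is what underlies kernel PCA). The data and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 30, 1000, 4                    # far more features than examples (d >> m)
X = rng.normal(size=(m, d))              # rows are x_1, ..., x_m
Xc = X - X.mean(axis=0)                  # centering: subtract mu = (1/m) sum_i x_i

B = Xc @ Xc.T                            # m x m Gram matrix, instead of the d x d matrix A
eigvals, V = np.linalg.eigh(B)           # eigenvalues ascending, eigenvectors as columns
order = np.argsort(eigvals)[::-1][:n]    # indices of the n leading eigenpairs of B
U = Xc.T @ V[:, order]                   # each column X^T v_j is an eigenvector of A = X^T X
U /= np.linalg.norm(U, axis=0)           # normalize columns -> orthonormal U, and W = U^T

# Sanity check against the "Value of the objective" slide:
# sum_i ||x_i - U U^T x_i||^2 should equal the sum of the trailing eigenvalues of A.
err = np.sum((Xc - (Xc @ U) @ U.T) ** 2)
tail = np.sort(eigvals)[::-1][n:].sum()  # nonzero eigenvalues of A coincide with those of B
print(err, tail)                         # the two numbers should agree up to rounding
```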
