Introduction to Machine Learning (67577) Lecture 11


Introduction to Machine Learning (67577), Lecture 11: Dimensionality Reduction. Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem.

Dimensionality Reduction
Dimensionality reduction = taking data in a high-dimensional space and mapping it into a low-dimensional space.
Why?
• Reduces training (and testing) time
• Reduces estimation error
• Interpretability of the data, finding meaningful structure in data, illustration
Linear dimensionality reduction: $x \mapsto Wx$ where $W \in \mathbb{R}^{n,d}$ and $n < d$.

Outline
1 Principal Component Analysis (PCA)
2 Random Projections
3 Compressed Sensing

Principal Component Analysis (PCA)
$x \mapsto Wx$. What makes $W$ a good matrix for dimensionality reduction?
Natural criterion: we want to be able to approximately recover $x$ from $y = Wx$.
PCA:
• Linear recovery: $\tilde{x} = Uy = UWx$
• Measures "approximate recovery" by the averaged squared norm: given examples $x_1, \ldots, x_m$, solve
$$\operatorname*{argmin}_{W \in \mathbb{R}^{n,d},\, U \in \mathbb{R}^{d,n}} \sum_{i=1}^{m} \| x_i - U W x_i \|^2 .$$
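As a quick illustration of this criterion (not part of the lecture; the function name and the random data are my own), the objective can be evaluated directly for any candidate pair $(W, U)$; PCA then picks the pair that minimises it.

```python
import numpy as np

def reconstruction_error(X, W, U):
    """Sum of ||x_i - U W x_i||^2 over the rows x_i of X.
    Shapes: X is (m, d), W is (n, d), U is (d, n)."""
    X_rec = X @ W.T @ U.T          # row i equals (U W x_i)^T
    return float(np.sum((X - X_rec) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))     # m = 100 examples in d = 50 dimensions
W = rng.normal(size=(5, 50))       # an arbitrary (not optimal) candidate pair
U = rng.normal(size=(50, 5))
print(reconstruction_error(X, W, U))
```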
Solving the PCA Problem
$$\operatorname*{argmin}_{W \in \mathbb{R}^{n,d},\, U \in \mathbb{R}^{d,n}} \sum_{i=1}^{m} \| x_i - U W x_i \|^2$$
Theorem. Let $A = \sum_{i=1}^{m} x_i x_i^\top$ and let $u_1, \ldots, u_n$ be the $n$ leading eigenvectors of $A$. Then the solution to the PCA problem is to set the columns of $U$ to be $u_1, \ldots, u_n$ and to set $W = U^\top$.
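A direct NumPy sketch of the theorem (the function name and the convention that the rows of X hold the examples are my own choices):

```python
import numpy as np

def pca_fit(X, n):
    """Columns of U are the n leading eigenvectors of A = sum_i x_i x_i^T;
    W = U^T.  X has shape (m, d), one example per row."""
    A = X.T @ X                            # equals sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :n]            # n leading eigenvectors as columns
    return U.T, U                          # (W, U)

# Usage: compress with Y = X @ W.T, approximately reconstruct with Y @ U.T.
```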
Proof main ideas
• $UW$ is of rank $n$, therefore its range is an $n$-dimensional subspace, denoted $S$.
• The transformation $x \mapsto UWx$ moves $x$ to this subspace.
• The point in $S$ which is closest to $x$ is $VV^\top x$, where the columns of $V$ are an orthonormal basis of $S$.
• Therefore, we can assume w.l.o.g. that $W = U^\top$ and that the columns of $U$ are orthonormal.

Proof main ideas (continued)
Observe:
$$\|x - UU^\top x\|^2 = \|x\|^2 - 2x^\top UU^\top x + x^\top UU^\top UU^\top x = \|x\|^2 - x^\top UU^\top x = \|x\|^2 - \operatorname{trace}(U^\top x x^\top U).$$
Therefore, an equivalent PCA problem is
$$\operatorname*{argmax}_{U \in \mathbb{R}^{d,n} :\, U^\top U = I} \; \operatorname{trace}\!\left( U^\top \left( \sum_{i=1}^{m} x_i x_i^\top \right) U \right).$$
The solution is to set $U$ to be the $n$ leading eigenvectors of $A = \sum_{i=1}^{m} x_i x_i^\top$.

Value of the objective
It is easy to see that
$$\min_{W \in \mathbb{R}^{n,d},\, U \in \mathbb{R}^{d,n}} \sum_{i=1}^{m} \| x_i - U W x_i \|^2 = \sum_{i=n+1}^{d} \lambda_i(A),$$
where $\lambda_1(A) \ge \cdots \ge \lambda_d(A)$ are the eigenvalues of $A$.

Centering
It is a common practice to "center" the examples before applying PCA, namely:
• First calculate $\mu = \frac{1}{m} \sum_{i=1}^{m} x_i$.
• Then apply PCA to the vectors $(x_1 - \mu), \ldots, (x_m - \mu)$.
This is also related to the interpretation of PCA as variance maximization (will be given in exercise).

Efficient implementation for $d \gg m$ and kernel PCA
Recall: $A = \sum_{i=1}^{m} x_i x_i^\top = X^\top X$, where $X \in \mathbb{R}^{m,d}$ is the matrix whose $i$'th row is $x_i^\top$.
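The excerpt stops here. A standard way the $d \gg m$ case is usually handled (an assumption on my part, since the slide is cut off) is to eigendecompose the $m \times m$ Gram matrix $B = XX^\top$ instead of the $d \times d$ matrix $A$: if $Bv = \lambda v$ with $\lambda > 0$, then $u = X^\top v / \sqrt{\lambda}$ is a unit eigenvector of $A$ with the same eigenvalue. A sketch of that route, including the centering step above:

```python
import numpy as np

def pca_fit_gram(Xc, n):
    """PCA via the m x m Gram matrix B = X X^T, useful when d >> m.
    Xc holds the already-centered examples as rows (shape (m, d)).
    A sketch under the assumption stated above; the names are mine."""
    B = Xc @ Xc.T                                      # m x m instead of d x d
    lam, V = np.linalg.eigh(B)                         # ascending eigenvalues
    lam, V = lam[::-1][:n], V[:, ::-1][:, :n]          # n leading eigenpairs
    U = (Xc.T @ V) / np.sqrt(np.maximum(lam, 1e-12))   # columns u_i = X^T v_i / sqrt(lam_i)
    return U.T, U                                      # (W, U)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10_000))                      # m = 50 examples, d = 10,000
mu = X.mean(axis=0)                                    # centering, as on the slide
W, U = pca_fit_gram(X - mu, n=5)
```

Replacing the inner products in $B$ with kernel evaluations is what gives kernel PCA, which is presumably where the truncated slide is heading.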
Recommended publications
  • Experiments with Random Projection
Experiments with Random Projection. Sanjoy Dasgupta, AT&T Labs – Research.
Abstract: Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.
From the introduction: It has recently been suggested that the learning of high-dimensional mixtures of Gaussians might be facilitated by first projecting the data into a randomly chosen subspace of low dimension (Dasgupta, 1999). In this paper we present a comprehensive series of experiments intended to precisely illustrate the benefits of this technique. [...] (in a PAC-like sense) algorithm for learning mixtures of Gaussians (Dasgupta, 1999). Random projection can also easily be used in conjunction with EM. To test this combination, we have performed experiments on synthetic data from a variety of Gaussian mixtures. In these, EM with random projection is seen to consistently yield models of quality (log-likelihood on a test set) comparable to or better than that of models found by regular EM. And the reduction in dimension saves a lot of time. Finally, we have used random projection to construct a classifier for handwritten digits, from a canonical USPS data set in which each digit is represented as a vector in R^256. We projected the training data randomly into R^40, and were able to fit a mixture of fifty Gaussians (five per digit) to this data quickly and easily, without any tweaking or covariance restrictions. The details of the experiment directly corroborated our theoretical results.
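A toy version of this kind of experiment can be put together with NumPy and scikit-learn; the dimensions, component counts and the use of the adjusted Rand index below are my choices, not the paper's protocol.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
d, k, n_per = 256, 5, 200                     # toy sizes, not the paper's setup
means = rng.normal(scale=6.0, size=(k, d))
X = np.vstack([rng.normal(loc=m, size=(n_per, d)) for m in means])
labels = np.repeat(np.arange(k), n_per)

target_dim = 40                               # as in the digits experiment quoted above
R = rng.normal(scale=1.0 / np.sqrt(target_dim), size=(d, target_dim))
X_low = X @ R                                 # random projection into R^40

# Diagonal covariances in the original space (full ones would be heavily
# over-parameterised here), unrestricted full covariances after the projection.
for name, data, cov in [("original", X, "diag"), ("projected", X_low, "full")]:
    gm = GaussianMixture(n_components=k, covariance_type=cov, random_state=0).fit(data)
    print(name, adjusted_rand_score(labels, gm.predict(data)))
```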
  • Random Projections for Machine Learning and Data Mining
Random Projections for Machine Learning and Data Mining: Theory and Applications. Robert J. Durrant & Ata Kabán, University of Birmingham. {r.j.durrant, a.kaban}@cs.bham.ac.uk, www.cs.bham.ac.uk/~durranrj, sites.google.com/site/rpforml. ECML-PKDD 2012, Friday 28th September 2012.
Outline:
1 Background and Preliminaries
2 Johnson-Lindenstrauss Lemma (JLL) and extensions
3 Applications of JLL (1): Approximate Nearest Neighbour Search, RP Perceptron, Mixtures of Gaussians, Random Features
4 Compressed Sensing: SVM from RP sparse data
5 Applications of JLL (2): RP LLS Regression, Randomized low-rank matrix approximation, Randomized approximate SVM solver
6 Beyond JLL and Compressed Sensing: Compressed FLD, Ensembles of RP
Motivation: the "curse of dimensionality" is a collection of pervasive, and often counterintuitive, issues associated with working with high-dimensional data. What constitutes high-dimensional depends on the problem setting, but data vectors with arity in the thousands are very common in practice. An obvious way to mitigate the curse: dimensionality d is too large, so reduce d to k << d.
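As a concrete warm-up for the JLL part of that outline, here is a quick numerical check (my own toy code, not the tutorial's) that a Gaussian random projection into k dimensions roughly preserves pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 200, 5000, 400
X = rng.normal(size=(m, d))
R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))   # E ||R^T x||^2 = ||x||^2

def sq_dists(A):
    """All pairwise squared Euclidean distances between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

i, j = np.triu_indices(m, 1)
ratio = sq_dists(X @ R)[i, j] / sq_dists(X)[i, j]
print(ratio.min(), ratio.max())   # concentrates around 1 as k grows, as the JLL predicts
```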
  • Targeted Random Projection for Prediction from High-Dimensional Features
Targeted Random Projection for Prediction from High-Dimensional Features. Minerva Mukhopadhyay and David B. Dunson (arXiv:1712.02445v1 [math.ST], 6 Dec 2017).
Abstract: We consider the problem of computationally-efficient prediction from high dimensional and highly correlated predictors in challenging settings where accurate variable selection is effectively impossible. Direct application of penalization or Bayesian methods implemented with Markov chain Monte Carlo can be computationally daunting and unstable. Hence, some type of dimensionality reduction prior to statistical analysis is in order. Common solutions include application of screening algorithms to reduce the regressors, or dimension reduction using projections of the design matrix. The former approach can be highly sensitive to threshold choice in finite samples, while the latter can have poor performance in very high-dimensional settings. We propose a TArgeted Random Projection (TARP) approach that combines positive aspects of both strategies to boost performance. In particular, we propose to use information from independent screening to order the inclusion probabilities of the features in the projection matrix used for dimension reduction, leading to data-informed sparsity. We provide theoretical support for a Bayesian predictive algorithm based on TARP, including both statistical and computational complexity guarantees. Examples for simulated and real data applications illustrate gains relative to a variety of competitors. Some key words: Bayesian; Dimension reduction; High-dimensional; Large p, small n; Random projection; Screening. Short title: Targeted Random Projection.
From the introduction: In many applications, the focus is on prediction of a response variable y given a massive-dimensional vector of predictors x = (x1, x2, ...
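One loose way to read the TARP idea in code (a sketch under my own simplifications, not the authors' algorithm): use marginal screening statistics to set feature-inclusion probabilities in a sparse random projection, then regress on the projected features.

```python
import numpy as np

def targeted_random_projection(X, y, k, rng):
    """Sketch: feature-inclusion probabilities proportional to |marginal correlation|,
    so informative features are more likely to enter each projection column."""
    corr = np.abs(np.corrcoef(X, y, rowvar=False)[-1, :-1])   # p marginal scores
    probs = corr / corr.sum()
    p = X.shape[1]
    R = np.zeros((p, k))
    for j in range(k):
        keep = rng.random(p) < np.minimum(1.0, k * probs)     # data-informed sparsity
        R[keep, j] = rng.normal(size=keep.sum())
    return R

rng = np.random.default_rng(1)
n, p = 100, 2000
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:10] = 3.0                            # 10 truly relevant features
y = X @ beta + rng.normal(size=n)
R = targeted_random_projection(X, y, k=20, rng=rng)
Z = X @ R                                                      # reduced design matrix
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)                   # then any low-dim regression
```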
  • Bilinear Random Projections for Locality-Sensitive Binary Codes
Bilinear Random Projections for Locality-Sensitive Binary Codes. Saehoon Kim and Seungjin Choi, Department of Computer Science and Engineering, Pohang University of Science and Technology, Korea. {kshkawa,seungjin}@postech.ac.kr
Abstract: Locality-sensitive hashing (LSH) is a popular data-independent indexing method for approximate similarity search, where random projections followed by quantization hash the points from the database so as to ensure that the probability of collision is much higher for objects that are close to each other than for those that are far apart. Most high-dimensional visual descriptors for images exhibit a natural matrix structure. When visual descriptors are represented by high-dimensional feature vectors and long binary codes are assigned, a random projection matrix requires expensive complexities in both space and time. In this paper we analyze a bilinear random projection method where feature matrices are transformed to binary codes by two smaller random projection matrices. We base our theoretical analysis on extending Raginsky and Lazebnik's result where random Fourier features are composed with random binary quantizers to form locality sensitive binary codes.
[Figure 1: Two exemplary visual descriptors, which have a natural matrix structure, are often converted to long vectors. The top illustration describes LLC [15], where spatial information of initial descriptors is summarized into a final concatenated feature with spatial pyramid structure (feature extraction, pooling, concatenation). The bottom shows VLAD [7], where the residual between initial descriptors and their nearest visual vocabulary (marked as a triangle) is encoded in a matrix form (feature extraction, feature averaging, concatenation).]
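A minimal sketch of the bilinear idea (toy dimensions of my own choosing, not the authors' implementation): instead of one large projection applied to vec(Z), apply two small matrices, one per side of the descriptor matrix, and quantise the result to bits.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 128, 500              # descriptor matrix Z is d1 x d2
k1, k2 = 16, 64                # code length = k1 * k2 bits

R1 = rng.normal(size=(d1, k1))     # two small projection matrices instead of one
R2 = rng.normal(size=(d2, k2))     # huge (d1*d2) x (k1*k2) matrix

Z = rng.normal(size=(d1, d2))      # a stand-in visual descriptor with matrix structure
code = (R1.T @ Z @ R2).reshape(-1) > 0   # k1*k2-bit binary code via sign quantisation

# Storage for the projections: d1*k1 + d2*k2 numbers, versus d1*d2*k1*k2 for a full RP.
print(d1 * k1 + d2 * k2, "vs", d1 * d2 * k1 * k2)
```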
  • Large Scale Canonical Correlation Analysis with Iterative Least Squares
Large Scale Canonical Correlation Analysis with Iterative Least Squares. Yichao Lu, University of Pennsylvania, [email protected]; Dean P. Foster, Yahoo Labs, NYC, [email protected].
Abstract: Canonical Correlation Analysis (CCA) is a widely used statistical tool with both well established theory and favorable performance for a wide range of machine learning problems. However, computing CCA for huge datasets can be very slow since it involves implementing QR decomposition or singular value decomposition of huge matrices. In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets. Theory on both the asymptotic convergence and finite time accuracy of L-CCA is established. The experiments also show that L-CCA outperforms other fast CCA approximation schemes on two real datasets.
From the introduction: Canonical Correlation Analysis (CCA) is a widely used spectrum method for finding correlation structures in multi-view datasets introduced by [15]. Recently, [3, 9, 17] proved that CCA is able to find the right latent structure under certain hidden state model. For modern machine learning problems, CCA has already been successfully used as a dimensionality reduction technique for the multi-view setting. For example, a CCA between the text description and image of the same object will find common structures between the two different views, which generates a natural vector representation of the object. In [9], CCA is performed on a large unlabeled dataset in order to generate low dimensional features to a regression problem where the size of labeled dataset is small. In [6, 7] a CCA between words and its context is implemented on several large corpora to generate low dimensional vector representations of words which captures useful semantic features.
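This is not the L-CCA algorithm itself, but a small alternating-least-squares sketch of the underlying idea that the leading canonical directions can be reached through repeated regressions; the ridge term, iteration count and function name are my additions.

```python
import numpy as np

def cca_first_pair(X, Y, iters=100, reg=1e-6):
    """First pair of canonical directions via alternating least squares.
    Assumes the columns of X and Y are already centred (a sketch, not L-CCA)."""
    v = np.random.default_rng(0).normal(size=Y.shape[1])
    for _ in range(iters):
        # regress Y v on X, then X u on Y, renormalising the canonical variates
        u = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ (Y @ v))
        u /= np.linalg.norm(X @ u)
        v = np.linalg.solve(Y.T @ Y + reg * np.eye(Y.shape[1]), Y.T @ (X @ u))
        v /= np.linalg.norm(Y @ v)
    corr = float((X @ u) @ (Y @ v))          # empirical canonical correlation
    return u, v, corr
```

The iteration is a power method on the whitened cross-covariance, so it converges to the top canonical pair; further pairs would require deflation.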
  • Efficient Dimensionality Reduction Using Random Projection
Computer Vision Winter Workshop 2010, Libor Špaček, Vojtěch Franc (eds.), Nové Hrady, Czech Republic, February 3–5, Czech Pattern Recognition Society. Efficient Dimensionality Reduction Using Random Projection. Vildana Sulić, Janez Perš, Matej Kristan, and Stanislav Kovačič, Faculty of Electrical Engineering, University of Ljubljana, Slovenia. [email protected]
Abstract: Dimensionality reduction techniques are especially important in the context of embedded vision systems. A promising dimensionality reduction method for a use in such systems is the random projection. In this paper we explore the performance of the random projection method, which can be easily used in embedded cameras. Random projection is compared to Principal Component Analysis in the terms of recognition efficiency on the COIL-20 image data set. Results show surprisingly good performance of the random projection in comparison to the principal component analysis even without explicit orthogonalization or normalization of transformation subspace. These results support the use of random projection in our hierarchical ...
From the introduction: ... transformations have been very popular in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions (i.e., basis vectors). The most famous method in this category is the Principal Component Analysis (PCA) [11]. PCA (also known as the Karhunen-Loève transform) is a vector-space transform that reduces multidimensional data sets to lower dimensions while minimizing the loss of information. A low-dimensional representation of the data is constructed in such a way that it describes as much of the variance in the data as possible. This is achieved by finding a linear basis of reduced dimensionality for the data (a set of eigenvectors) in which the variance in the data is maximal [27].
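The comparison in the abstract can be mimicked on synthetic data (my own stand-in for COIL-20, using nearest-neighbour recognition with scikit-learn; the class counts and dimensions are illustrative only):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes, n_per, d, k = 20, 72, 1024, 30        # loosely COIL-20-sized synthetic data
means = rng.normal(scale=0.5, size=(n_classes, d))
X = np.vstack([rng.normal(loc=m, size=(n_per, d)) for m in means])
y = np.repeat(np.arange(n_classes), n_per)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)

R = rng.normal(size=(d, k)) / np.sqrt(k)         # plain RP: no orthogonalisation or normalisation
pca = PCA(n_components=k).fit(Xtr)
variants = {"random projection": (Xtr @ R, Xte @ R),
            "PCA": (pca.transform(Xtr), pca.transform(Xte))}
for name, (tr, te) in variants.items():
    acc = KNeighborsClassifier(n_neighbors=1).fit(tr, ytr).score(te, yte)
    print(name, round(acc, 3))
```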
• Random Projections: Data Perturbation for Classification Problems (arXiv:1911.10800v1)
Random projections: data perturbation for classification problems. Timothy I. Cannings, School of Mathematics, University of Edinburgh.
Abstract: Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where we have many covariates recorded for each observation. In classification problems there are two general techniques using random projections. The first involves many projections in an ensemble: the idea here is to aggregate the results after applying different random projections, with the aim of achieving superior statistical accuracy. The second class of methods include hashing and sketching techniques, which are straightforward ways to reduce the complexity of a problem, perhaps therefore with a huge computational saving, while approximately preserving the statistical efficiency.
[Figure 1: Projections determine distributions! Left: two 2-dimensional distributions, one uniform on the unit circle (black), the other uniform on the unit disk (blue). Right: the corresponding densities after projecting into a 1-dimensional space. In fact, any p-dimensional distribution is determined by its one-dimensional projections (cf. Theorem 1).]
From the introduction: Modern approaches to data analysis go far beyond what early statisticians, such as Ronald A. Fisher, may have dreamt up. Rapid advances in the way we can collect, store and process data, as well as the value in what we can learn from data, has led to a vast number of innovative and creative new methods. Broadly speaking, random projections offer a universal and flexible approach to complex statistical problems.
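A bare-bones ensemble-of-projections classifier in the spirit of the first technique described above; the base learner, ensemble size and majority-vote aggregation are my own choices for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class RPEnsemble:
    """Aggregate base classifiers fitted on different random projections (a sketch).
    Assumes integer class labels 0, 1, ..., C-1."""
    def __init__(self, n_proj=20, k=10, seed=0):
        self.n_proj, self.k, self.rng = n_proj, k, np.random.default_rng(seed)

    def fit(self, X, y):
        d = X.shape[1]
        self.members = []
        for _ in range(self.n_proj):
            R = self.rng.normal(size=(d, self.k)) / np.sqrt(self.k)
            clf = LogisticRegression(max_iter=1000).fit(X @ R, y)
            self.members.append((R, clf))
        return self

    def predict(self, X):
        votes = np.stack([clf.predict(X @ R) for R, clf in self.members])
        # majority vote over the ensemble, one column per test point
        return np.apply_along_axis(lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

# Usage sketch:
# model = RPEnsemble(n_proj=20, k=10).fit(X_train, y_train)
# y_hat = model.predict(X_test)
```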
  • Experiments with Random Projection Ensembles: Linear Versus Quadratic Discriminants
Experiments with Random Projection Ensembles: Linear versus Quadratic Discriminants. Xi Zhang (Microsoft (China) Co., Ltd, Beijing, China; [email protected]) and Ata Kabán (School of Computer Science, University of Birmingham, Birmingham, UK; [email protected]).
Abstract: Gaussian classifiers make the modelling assumption that each class follows a multivariate Gaussian distribution, and include Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). In high dimensional low sample size settings, a general drawback of these methods is that the sample covariance matrix may be singular, so its inverse is not available. Among many regularisation methods that attempt to get around this problem, the random projection ensemble approach for LDA has been both practically successful and theoretically well justified. The aim of this work is to investigate random projection ensembles for QDA, and to experimentally determine the relative strengths of LDA vs QDA in this context. We identify two different ways to construct averaging ensembles for QDA. The first is to aggregate low dimensional matrices to construct a smoothed estimate of the inverse covariance, which is ...
From the introduction: ...more, an averaging ensemble of RP-ed LDA classifiers was shown to enjoy state-of-the-art performance [4]. It is then natural to wonder if this idea could be extended to devise QDA ensembles? This work is motivated by the above question. We investigate random projection ensembles for QDA, and conduct experiments to assess its potential relative to the analogous LDA ensemble. The main idea is to use the random projections to reduce the dimensionality of the covariance matrix, so the smaller dimensional covariance matrix will not be singular. We identify two different ways to construct averaging ensembles for QDA, which will be referred to as ΣRP-QDA and RP Ensemble-QDA respectively.
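A sketch of the first construction mentioned above, aggregating low-dimensional matrices into a smoothed inverse-covariance estimate; the exact weighting and scaling in the paper may differ, so treat this as illustrative only.

```python
import numpy as np

def rp_smoothed_precision(S, k, n_proj=100, seed=0):
    """Average R (R^T is k x d here written as R of shape (k, d)):
    mean of R^T (R S R^T)^{-1} R over random projections, used as a regularised
    stand-in for S^{-1} when the d x d sample covariance S is singular.
    Choose k no larger than the rank of S, or the inner inverse may fail."""
    d = S.shape[0]
    rng = np.random.default_rng(seed)
    P = np.zeros((d, d))
    for _ in range(n_proj):
        R = rng.normal(size=(k, d)) / np.sqrt(k)
        P += R.T @ np.linalg.inv(R @ S @ R.T) @ R
    return P / n_proj

# In a QDA-style rule one could then score class c (mean mu_c, sample covariance S_c,
# prior prior_c) with -(x - mu_c)^T P_c (x - mu_c) + 2*log(prior_c), where
# P_c = rp_smoothed_precision(S_c, k); the log-determinant term of exact QDA is
# omitted in this sketch.
```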
• Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
Applied Sciences (article). Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data. Hyeongmin Cho and Sangkyun Lee, School of Cybersecurity, Korea University, Seoul 02841, Korea; [email protected]. Correspondence: [email protected].
Abstract: Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
Keywords: data quality; large-scale; high-dimensionality; linear discriminant analysis; random projection; bootstrapping.
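One plausible shape for such a measure, purely illustrative and not the paper's actual definition: a Fisher-style class-separability score averaged over random projections of bootstrap resamples.

```python
import numpy as np

def rp_separability(X, y, k=20, n_rounds=30, seed=0):
    """Average between-class / within-class scatter ratio over random
    projections of bootstrap resamples (a sketch of the general recipe)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        R = rng.normal(size=(d, k)) / np.sqrt(k)      # random projection to k dims
        Z, labels = X[idx] @ R, y[idx]
        mu = Z.mean(axis=0)
        between = within = 0.0
        for c in np.unique(labels):
            Zc = Z[labels == c]
            between += len(Zc) * np.sum((Zc.mean(axis=0) - mu) ** 2)
            within += np.sum((Zc - Zc.mean(axis=0)) ** 2)
        scores.append(between / within)
    return float(np.mean(scores))
```

Working in the k-dimensional projected space keeps each round cheap even when d is very large, which is the computational point the abstract emphasises.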
• Dimension Reduction with Extreme Learning Machine (Liyanaarachchi Lekamalage Chamara Kasun, Yan Yang, Guang-Bin Huang and Zhengyou Zhang)
Dimension Reduction with Extreme Learning Machine. Liyanaarachchi Lekamalage Chamara Kasun, Yan Yang, Guang-Bin Huang and Zhengyou Zhang. IEEE Transactions on Image Processing, DOI 10.1109/TIP.2016.2570569.
Abstract: Data may often contain noise or irrelevant information which negatively affects the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms such as Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), random projection (RP) and auto-encoder (AE) is to reduce the noise or irrelevant information of the data. The features of PCA (eigenvectors) and the linear AE are not able to represent data as parts (e.g. the nose in a face image); on the other hand, NMF and the non-linear AE are hampered by slow learning speed, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed and learns the between-class scatter subspace.
From the introduction: PCA is a holistic-based representation algorithm which represents the original data sample to different degrees, and learns a set of linearly uncorrelated features named principal components that describe the variance of the data. Dimension reduction is achieved by projecting the input data via a subset of principal components that describes the most variance of the data. The features that have less contribution to the variance are considered to be less descriptive and therefore removed. The PCA algorithm converges to a global minimum. NMF is a parts-based representation algorithm which decomposes the data to a positive basis matrix and positive ...
  • Constrained Extreme Learning Machines: a Study on Classification Cases
Constrained Extreme Learning Machines: A Study on Classification Cases. Wentao Zhu, Jun Miao, and Laiyun Qing.
Abstract: Extreme learning machine (ELM) is an extremely fast learning method and has a powerful performance for pattern recognition tasks proven by enormous researches and engineers. However, its good generalization ability is built on large numbers of hidden neurons, which is not beneficial to real time response in the test process. In this paper, we proposed new ways, named "constrained extreme learning machines" (CELMs), to randomly select hidden neurons based on sample distribution. Compared to completely random selection of hidden nodes in ELM, the CELMs randomly select hidden nodes from the constrained vector space containing some basic combinations of original sample vectors. The experimental results show that the CELMs have better generalization ability than traditional ELM, SVM and some other related methods. Additionally, the CELMs have a similar fast learning speed as ELM.
From the introduction: ... (LM) method [5], dynamically network construction [6], evolutionary algorithms [7] and generic optimization [8], the enhanced methods require heavy computation or cannot obtain a global optimal solution. 2. Optimization based learning methods. One of the most popular optimization based SLFNs is Support Vector Machine (SVM) [9]. The objective function of SVM is to optimize the weights for maximum margin corresponding to structural risk minimization. The solution of SVM can be obtained by convex optimization methods in the dual problem space and is the global optimal solution. SVM is a very popular method attracting many researchers [10]. 3. ...