Principal Component Analysis — Exercises without Solutions —

Laurenz Wiskott, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, EU

4 February 2017

© 2016, 2017 Laurenz Wiskott (ORCID http://orcid.org/0000-0001-6237-740X, homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, see http://creativecommons.org/licenses/by-sa/4.0/. These exercises complement my corresponding lecture notes, and there is a version with and one without solutions. The table of contents of the lecture notes is reproduced here to indicate at which point in the lecture the exercises can reasonably be solved. For the best learning effect I recommend that you first make a serious attempt at solving the exercises yourself before looking into the solutions. More teaching material is available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/.

Contents

1 Intuition
  1.1 Problem statement
    1.1.1 Exercise: Second moment from mean and variance
    1.1.2 Exercise: Second moment of a uniform distribution
  1.2 Projection and reconstruction error
    1.2.1 Exercise: Projection by an inner product is orthogonal
    1.2.2 Exercise: Error function
  1.3 Reconstruction error and variance
  1.4 Covariance matrix
    1.4.1 Exercise: Relation among the elements of a second-moment matrix
    1.4.2 Exercise: From data distribution to second-moment matrix
    1.4.3 Exercise: From data distribution to second-moment matrix
    1.4.4 Exercise: From second-moment matrix to data
    1.4.5 Exercise: Data distributions with and without mean
  1.5 Covariance matrix and higher order structure
  1.6 PCA by diagonalizing the covariance matrix

2 Formalism
  2.1 Definition of the PCA-optimization problem
  2.2 Matrix $V^T$: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system
  2.3 Matrix $V$: Mapping from low-dimensional new coordinate system to subspace in old coordinate system
    2.3.1 Exercise: Norm of a vector
  2.4 Matrix $(V^T V)$: Identity mapping within new coordinate system
  2.5 Matrix $(V V^T)$: Projection from high- to low-dimensional (sub)space within old coordinate system
  2.6 Variance
  2.7 Reconstruction error
  2.8 Covariance matrix
    2.8.1 Exercise: Second-moment matrices are positive semi-definite
    2.8.2 Exercise: Covariance matrix from mean and second-moment matrix
  2.9 Eigenvalue equation of the covariance matrix
    2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal
  2.10 Total variance of the data $\mathbf{x}$
  2.11 Diagonalizing the covariance matrix
  2.12 Variance of $\mathbf{y}$ for a diagonalized covariance matrix
  2.13 Constraints of matrix $V'$
  2.14 Finding the optimal subspace
  2.15 Interpretation of the result
    2.15.1 Exercise: Moments of a data distribution: Simple example
    2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
    2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
    2.15.4 Exercise: Dimensionality reduction
  2.16 PCA Algorithm
  2.17 Intuition of the Results
  2.18 Whitening or sphering
    2.18.1 Exercise: Sphered data is uncorrelated
  2.19 Singular value decomposition +

3 Application
  3.1 Face processing

4 Acknowledgment

1 Intuition

1.1 Problem statement

1.1.1 Exercise: Second moment from mean and variance

How are the mean $m$, the variance $v$, and the second moment $s$ of a one-dimensional distribution related to each other? In other words, if the mean and the variance of a one-dimensional distribution are given, how can you compute the corresponding second moment?

Hint: Let $x$ be the data values and $\bar{x}$ their mean. Then play around with the corresponding expressions for the mean $\bar{x} = \langle x \rangle$, the variance $\langle (x - \bar{x})^2 \rangle$, and the second moment $\langle x^2 \rangle$.
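If you want to sanity-check the relation you derive, the following minimal Python sketch (using numpy; the test data is arbitrary) estimates all three quantities from the same sample so you can compare your formula against $s$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)  # arbitrary one-dimensional test data

m = x.mean()        # mean <x>
v = x.var()         # variance <(x - mean)^2>
s = np.mean(x**2)   # second moment <x^2>

print(m, v, s)      # insert m and v into your formula and compare with s
```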

1.1.2 Exercise: Second moment of a uniform distribution

Calculate the second moment of a uniform, i.e. flat, distribution over the interval $[-1, +1]$. This is a distribution where every value between $-1$ and $+1$ is equally likely and all other values are impossible.
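A Monte Carlo estimate is a convenient way to check your analytic result; a small sketch using numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)  # samples from the flat distribution on [-1, +1]

print(np.mean(x**2))  # empirical second moment; compare with your analytic value
```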

1.2 Projection and reconstruction error

1.2.1 Exercise: Projection by an inner product is orthogonal

1. We have defined the projected vector $\mathbf{x}_\parallel$ by

$$\mathbf{x}_\parallel = \mathbf{v}\,\mathbf{v}^T \mathbf{x} \qquad (1)$$

where $\mathbf{x}$ is the data point and $\mathbf{v}$ is the unit vector along the principal axis of the projection. Show that the difference vector between the data point and the projected data point,

$$\mathbf{x}_\perp = \mathbf{x} - \mathbf{x}_\parallel \,, \qquad (2)$$

is orthogonal to $\mathbf{v}$.

2. Give a reason why the orthogonality of the two vectors is useful.
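Once you have a proof, you can also convince yourself numerically. A minimal sketch (the data point and the projection axis are arbitrary random choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # an arbitrary data point
v = rng.normal(size=3)
v /= np.linalg.norm(v)        # unit vector along the projection axis

x_par = v * (v @ x)           # x_parallel = v v^T x, equation (1)
x_perp = x - x_par            # difference vector, equation (2)

print(v @ x_perp)             # inner product with v; should be ~0 up to rounding
```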

1.2.2 Exercise: Error function

Why should the reconstruction error $E$ be defined as the mean of the squared differences between the original and the reconstructed data vectors, and not simply as the mean of the differences or the mean of the absolute differences?

1.3 Reconstruction error and variance

1.4 Covariance matrix

1.4.1 Exercise: Relation among the elements of a second-moment matrix

For a set of data vectors $\mathbf{x}^\mu$, $\mu = 1, ..., M$, the second-moment matrix $C$ is defined as $C_{ij} := \langle x_i^\mu x_j^\mu \rangle_\mu$. What are the upper and lower limits of $C_{ij}$ if $C_{ii}$ and $C_{jj}$ are known?

Hint: Consider $\langle x_i^\mu x_j^\mu \rangle_\mu = \frac{1}{M} \sum_\mu x_i^\mu x_j^\mu$ as a scalar product of two vectors.
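To probe a conjectured bound empirically before proving it, a short sketch with arbitrary correlated random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))  # M = 1000 correlated data points

C = (X.T @ X) / len(X)   # C_ij = <x_i x_j> averaged over the data points

i, j = 0, 1
print(C[i, i], C[j, j], C[i, j])  # test your conjectured limits on C_ij against these
```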

1.4.2 Exercise: From data distribution to second-moment matrix

Give an estimate of the second-moment matrix for the following data distributions.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

1.4.3 Exercise: From data distribution to second-moment matrix

Give an estimate of the second-moment matrix for the following data distributions.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

1.4.4 Exercise: From second-moment matrix to data

For each of the following second-moment matrices $C$, draw a data distribution that is qualitatively consistent with it.

$$\text{(a) } C = \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} \qquad \text{(b) } C = \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix} \qquad \text{(c) } C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
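One way to check a sketched distribution numerically is to draw samples whose second-moment matrix equals a given $C$ and scatter-plot them. A sketch, assuming zero-mean Gaussian data (for zero-mean data the second-moment matrix coincides with the covariance matrix):

```python
import numpy as np
import matplotlib.pyplot as plt

C = np.array([[1.0, -0.5],
              [-0.5, 1.0]])   # matrix (a); try (b) and (c) as well

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=C, size=2000)  # zero-mean Gaussian samples

print((X.T @ X) / len(X))     # empirical second-moment matrix, close to C
plt.scatter(X[:, 0], X[:, 1], s=2)
plt.gca().set_aspect("equal")
plt.show()
```

Note that matrix (c) is singular, so its samples collapse exactly onto a line.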

1.4.5 Exercise: Data distributions with and without mean

1. Define a procedure by which you can turn any mean-free data distribution into a distribution with finite (non-zero) mean but identical second-moment matrix. (Are there exceptions?)

2. Conversely, define a procedure by which you can turn any data distribution with finite mean into a distribution with zero mean but identical second-moment matrix. (Are there exceptions?)

Hint: Think about what happens if you flip a point $\mathbf{x}^\mu$ at the origin, i.e. if you replace $\mathbf{x}^\mu$ by $-\mathbf{x}^\mu$ in the data set.
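To experiment with the hint, a small sketch (with arbitrary random data) that flips a single point at the origin and compares means and second-moment matrices before and after:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.array([1.0, -0.5])  # arbitrary data with non-zero mean

def second_moment(X):
    return (X.T @ X) / len(X)

Y = X.copy()
Y[0] = -Y[0]   # flip the point mu = 0 at the origin

print(X.mean(axis=0), Y.mean(axis=0))        # the means differ ...
print(second_moment(X) - second_moment(Y))   # ... the second-moment matrices do not
```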

1.5 Covariance matrix and higher order structure

1.6 PCA by diagonalizing the covariance matrix

2 Formalism

2.1 Definition of the PCA-optimization problem

2.2 Matrix $V^T$: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system

2.3 Matrix $V$: Mapping from low-dimensional new coordinate system to subspace in old coordinate system

2.3.1 Exercise: Norm of a vector

Let $\mathbf{b}_i$, $i = 1, ..., N$, be an orthonormal basis. Then we have $(\mathbf{b}_i, \mathbf{b}_j) = \delta_{ij}$ and

$$\mathbf{v} = \sum_{i=1}^{N} v_i \mathbf{b}_i \quad \text{with} \quad v_i := (\mathbf{v}, \mathbf{b}_i) \qquad \forall \mathbf{v} \,. \qquad (1)$$

Show that

$$\|\mathbf{v}\|^2 = \sum_{i=1}^{N} v_i^2 \,. \qquad (2)$$
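A quick numerical check of equations (1) and (2), using a random orthonormal basis obtained from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
B, _ = np.linalg.qr(rng.normal(size=(N, N)))  # columns of B form an orthonormal basis b_i

v = rng.normal(size=N)     # an arbitrary vector
vi = B.T @ v               # coefficients v_i = (v, b_i)

print(np.linalg.norm(v)**2, np.sum(vi**2))  # the two sides of equation (2)
```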

2.4 Matrix $(V^T V)$: Identity mapping within new coordinate system

2.5 Matrix $(V V^T)$: Projection from high- to low-dimensional (sub)space within old coordinate system

2.6 Variance

2.7 Reconstruction error

2.8 Covariance matrix

2.8.1 Exercise: Second-moment matrices are positive semi-definite

µ µ T (//10/11 min)Show that a second-moment matrix C := hx (x ) iµ is always positive semi-definite, i.e. for each vector v we find vT Cv ≥ 0. For which vectors v does vT Cv = 0 hold?

2.8.2 Exercise: Covariance matrix from mean and second-moment matrix

Given some data $\mathbf{x}^\mu$, $\mu = 1, ..., M$, with mean

$$\bar{\mathbf{x}} := \langle \mathbf{x} \rangle = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (1)$$

and second-moment matrix

$$C := \langle \mathbf{x}\mathbf{x}^T \rangle = \begin{pmatrix} 4 & -1 \\ -1 & 2 \end{pmatrix} \,. \qquad (2)$$

Calculate the covariance matrix

$$\Sigma := \langle (\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T \rangle \,. \qquad (3)$$

First derive a general formula and then evaluate it for the concrete values given.
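To test your general formula, a sketch that estimates $\bar{\mathbf{x}}$, $C$, and $\Sigma$ independently from arbitrary random data, so you can check that your formula maps the first two quantities onto the third:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2)) + np.array([1.0, -1.0])  # arbitrary data with non-zero mean

x_bar = X.mean(axis=0)                            # mean
C = (X.T @ X) / len(X)                            # second-moment matrix
Sigma = ((X - x_bar).T @ (X - x_bar)) / len(X)    # covariance matrix

print(x_bar, C, Sigma, sep="\n")  # insert x_bar and C into your formula, compare with Sigma
```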

2.9 Eigenvalue equation of the covariance matrix

2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal

Prove that the eigenvectors of a symmetric matrix are orthogonal if their eigenvalues are different. Proceed as follows:

1. Let $A$ be a symmetric $N \times N$ matrix, i.e. $A = A^T$. Show first that $(\mathbf{v}, A\mathbf{w}) = (A\mathbf{v}, \mathbf{w})$ for any vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^N$, with $(\cdot, \cdot)$ indicating the Euclidean inner product.

2. Let $\{\mathbf{a}_i\}$ be the eigenvectors of the matrix $A$ with the eigenvalues $\lambda_i$. Show with the help of part one that $(\mathbf{a}_i, \mathbf{a}_j) = 0$ if $\lambda_i \neq \lambda_j$.

Hint: $\lambda_i(\mathbf{a}_i, \mathbf{a}_j) = \ldots$
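This is not a proof, but the statement is easy to illustrate numerically with a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M + M.T                    # a random symmetric matrix, A = A^T

lam, a = np.linalg.eigh(A)     # eigenvalues and eigenvectors (as columns of a)

print(lam)                     # generically all different
print(np.round(a.T @ a, 10))   # pairwise inner products (a_i, a_j): the identity matrix
```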

2.10 Total variance of the data x

2.11 Diagonalizing the covariance matrix

2.12 Variance of y for a diagonalized covariance matrix

2.13 Constraints of matrix $V'$

2.14 Finding the optimal subspace

2.15 Interpretation of the result

2.15.1 Exercise: Moments of a data distribution: Simple example

Given a data distribution $\mathbf{x}^\mu$ with

$$\mathbf{x}^1 = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \quad \mathbf{x}^2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \mathbf{x}^3 = \begin{pmatrix} -2 \\ 3 \end{pmatrix} \,. \qquad (1)$$

1. Calculate the mean $\bar{\mathbf{x}} = \langle \mathbf{x}^\mu \rangle_\mu$ and the second-moment matrix $C = \langle \mathbf{x}^\mu (\mathbf{x}^\mu)^T \rangle_\mu$.

2. Determine the normalized eigenvectors $\mathbf{c}_1$ and $\mathbf{c}_2$ of $C$ and the corresponding eigenvalues.

Hint: Look at the data distribution and guess the eigenvectors on the basis of the symmetry of the distribution. Then insert the guessed eigenvectors into the eigenvalue equation, verify that they are eigenvectors, and calculate the eigenvalues. Otherwise you have to go the hard way via the characteristic polynomial.

3. Determine the first and second moment of

$$y^\mu = \mathbf{c}_\alpha^T \mathbf{x}^\mu \,, \qquad (2)$$

i.e. $\langle y^\mu \rangle_\mu$ and $\langle (y^\mu)^2 \rangle_\mu$, for $\alpha \in \{1, 2\}$.

Hint: You don't have to compute the projected data. There is a simpler way.
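After working through all three parts by hand, you can verify your results with a short numpy sketch (this is only a check; it bypasses the guessing strategy from the hint):

```python
import numpy as np

X = np.array([[-3.0,  2.0],
              [ 1.0, -1.0],
              [-2.0,  3.0]])    # rows: the data points x^mu

x_bar = X.mean(axis=0)          # mean
C = (X.T @ X) / len(X)          # second-moment matrix

lam, c = np.linalg.eigh(C)      # eigenvalues and eigenvectors (columns of c)
print(x_bar, C, lam, c, sep="\n")

Y = X @ c                       # projected data y^mu for alpha = 1, 2
print(Y.mean(axis=0), np.mean(Y**2, axis=0))  # first and second moments of y
```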

2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

2.15.4 Exercise: Dimensionality reduction

Given some data in $\mathbb{R}^3$ with the corresponding $3 \times 3$ second-moment matrix $C$ with eigenvectors $\mathbf{c}_\alpha$ and eigenvalues $\lambda_\alpha$, where $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 0.2$.

1. Define a matrix $A \in \mathbb{R}^{2 \times 3}$ that maps the data into a two-dimensional space while preserving as much variance as possible.

2. Define a matrix $B \in \mathbb{R}^{3 \times 2}$ that places the reduced data back into $\mathbb{R}^3$ with minimal reconstruction error. How large is the reconstruction error?

3. Prove that $AB$ is an identity matrix. Why would one expect that intuitively?

4. Prove that $BA$ is a projection matrix but not the identity matrix.
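A possible numerical test bed for this exercise; the construction of $C$ from random orthonormal eigenvectors is my own, and be warned that the lines defining A and B encode one possible answer, so derive yours first:

```python
import numpy as np

rng = np.random.default_rng(0)

# An example C with the given spectrum and random orthonormal eigenvectors c_alpha.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C = Q @ np.diag([3.0, 1.0, 0.2]) @ Q.T

lam, c = np.linalg.eigh(C)          # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]       # reorder: largest eigenvalue first
lam, c = lam[order], c[:, order]

A = c[:, :2].T   # candidate 2x3 matrix (spoiler: one possible choice)
B = A.T          # candidate 3x2 matrix (spoiler: one possible choice)

print(np.round(A @ B, 10))          # compare with the 2x2 identity matrix
P = B @ A
print(np.allclose(P @ P, P), np.allclose(P, np.eye(3)))  # projection? identity?
```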

2.16 PCA Algorithm

2.17 Intuition of the Results

2.18 Whitening or sphering

2.18.1 Exercise: Sphered data is uncorrelated

Prove that sphered zero-mean data $\hat{\mathbf{x}}$ projected onto two orthogonal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ is uncorrelated.

Hint: The correlation coefficient for two scalar data sets $y_1$ and $y_2$ with means $\bar{y}_i := \langle y_i \rangle$ is defined as

$$c := \frac{\langle (y_1 - \bar{y}_1)(y_2 - \bar{y}_2) \rangle}{\sqrt{\langle (y_1 - \bar{y}_1)^2 \rangle}\,\sqrt{\langle (y_2 - \bar{y}_2)^2 \rangle}} \,. \qquad (1)$$
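A numerical illustration of the claim; a sketch that spheres arbitrary correlated data via the eigendecomposition of its covariance matrix and then checks the correlation coefficient of two orthogonal projections:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.5],
                                           [0.5, 1.0]])  # correlated raw data
X -= X.mean(axis=0)                  # make the data zero-mean

# Sphering: rotate into the eigenbasis of the covariance matrix and rescale.
lam, U = np.linalg.eigh((X.T @ X) / len(X))
X_hat = (X @ U) / np.sqrt(lam)       # sphered data x-hat, unit covariance

n1 = np.array([1.0, 1.0]) / np.sqrt(2)   # two arbitrary orthogonal unit vectors
n2 = np.array([1.0, -1.0]) / np.sqrt(2)

y1, y2 = X_hat @ n1, X_hat @ n2
print(np.corrcoef(y1, y2)[0, 1])     # correlation coefficient c, ~0 up to rounding
```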

2.19 Singular value decomposition +

3 Application

3.1 Face processing

4 Acknowledgment
