Principal Component Analysis — Exercises without Solutions —

Laurenz Wiskott, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, EU

4 February 2017

© 2016, 2017 Laurenz Wiskott (ORCID http://orcid.org/0000-0001-6237-740X, homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, see http://creativecommons.org/licenses/by-sa/4.0/. These exercises complement my corresponding lecture notes, and there is a version with and one without solutions. The table of contents of the lecture notes is reproduced here to indicate at which point in the lecture the exercises can reasonably be solved. For the best learning effect I recommend that you first make a serious attempt at solving the exercises yourself before looking into the solutions. More teaching material is available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/.

Contents

1 Intuition
  1.1 Problem statement
    1.1.1 Exercise: Second moment from mean and variance
    1.1.2 Exercise: Second moment of a uniform distribution
  1.2 Projection and reconstruction error
    1.2.1 Exercise: Projection by an inner product is orthogonal
    1.2.2 Exercise: Error function
  1.3 Reconstruction error and variance
  1.4 Covariance matrix
    1.4.1 Exercise: Relation among the elements of a second-moment matrix
    1.4.2 Exercise: From data distribution to second-moment matrix
    1.4.3 Exercise: From data distribution to second-moment matrix
    1.4.4 Exercise: From second-moment matrix to data
    1.4.5 Exercise: Data distributions with and without mean
  1.5 Covariance matrix and higher order structure
  1.6 PCA by diagonalizing the covariance matrix

2 Formalism
  2.1 Definition of the PCA-optimization problem
  2.2 Matrix $V^T$: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system
  2.3 Matrix $V$: Mapping from low-dimensional new coordinate system to subspace in old coordinate system
    2.3.1 Exercise: Norm of a vector
  2.4 Matrix $(V^T V)$: Identity mapping within new coordinate system
  2.5 Matrix $(V V^T)$: Projection from high- to low-dimensional (sub)space within old coordinate system
  2.6 Variance
  2.7 Reconstruction error
  2.8 Covariance matrix
    2.8.1 Exercise: Second-moment matrices are positive semi-definite
    2.8.2 Exercise: Covariance matrix from mean and second-moment matrix
  2.9 Eigenvalue equation of the covariance matrix
    2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal
  2.10 Total variance of the data $\mathbf{x}$
  2.11 Diagonalizing the covariance matrix
  2.12 Variance of $\mathbf{y}$ for a diagonalized covariance matrix
  2.13 Constraints of matrix $V'$
  2.14 Finding the optimal subspace
  2.15 Interpretation of the result
    2.15.1 Exercise: Moments of a data distribution: Simple example
    2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
    2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
    2.15.4 Exercise: Dimensionality reduction
  2.16 PCA Algorithm
  2.17 Intuition of the Results
  2.18 Whitening or sphering
    2.18.1 Exercise: Sphered data is uncorrelated
  2.19 Singular value decomposition +

3 Application
  3.1 Face processing

4 Acknowledgment

1 Intuition

1.1 Problem statement

1.1.1 Exercise: Second moment from mean and variance

How are the mean $m$, the variance $v$, and the second moment $s$ of a one-dimensional distribution related to each other? In other words, if the mean and the variance of a one-dimensional distribution are given, how can you compute the corresponding second moment?

Hint: Let $x$ be the data values and $\bar{x}$ their mean. Then play around with the corresponding expressions for the mean $\bar{x} = \langle x \rangle$, the variance $\langle (x - \bar{x})^2 \rangle$, and the second moment $\langle x^2 \rangle$.
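If you want to sanity-check the relation you derive, the following minimal Python sketch (using numpy; the test data is arbitrary) estimates all three quantities from the same sample so you can compare your formula against $s$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)  # arbitrary one-dimensional test data

m = x.mean()        # mean <x>
v = x.var()         # variance <(x - mean)^2>
s = np.mean(x**2)   # second moment <x^2>

print(m, v, s)      # insert m and v into your formula and compare with s
```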

1.1.2 Exercise: Second moment of a uniform distribution

Calculate the second moment of a uniform, i.e. flat, distribution over the interval $[-1, +1]$. This is a distribution where every value between $-1$ and $+1$ is equally likely and all other values are impossible.
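A Monte Carlo estimate is a convenient way to check your analytic result; a small sketch using numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)  # samples from the flat distribution on [-1, +1]

print(np.mean(x**2))  # empirical second moment; compare with your analytic value
```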

1.2 Projection and reconstruction error

1.2.1 Exercise: Projection by an inner product is orthogonal

1. We have defined the projected vector $\mathbf{x}_\parallel$ by

$$\mathbf{x}_\parallel = \mathbf{v}\,\mathbf{v}^T \mathbf{x} \qquad (1)$$

where $\mathbf{x}$ is the data point and $\mathbf{v}$ is the unit vector along the principal axis of the projection. Show that the difference vector between the data point and the projected data point,

$$\mathbf{x}_\perp = \mathbf{x} - \mathbf{x}_\parallel \,, \qquad (2)$$

is orthogonal to $\mathbf{v}$.

2. Give a reason why the orthogonality of the two vectors is useful.
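Once you have a proof, you can also convince yourself numerically. A minimal sketch (the data point and the projection axis are arbitrary random choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # an arbitrary data point
v = rng.normal(size=3)
v /= np.linalg.norm(v)        # unit vector along the projection axis

x_par = v * (v @ x)           # x_parallel = v v^T x, equation (1)
x_perp = x - x_par            # difference vector, equation (2)

print(v @ x_perp)             # inner product with v; should be ~0 up to rounding
```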

1.2.2 Exercise: Error function

Why should the reconstruction error $E$ be defined as the mean of the squared differences between the original and the reconstructed data vectors, and not simply as the mean of the differences or the mean of the absolute differences?

1.3 Reconstruction error and variance

1.4 Covariance matrix

1.4.1 Exercise: Relation among the elements of a second-moment matrix

For a set of data vectors $\mathbf{x}^\mu$, $\mu = 1, ..., M$, the second-moment matrix $C$ is defined as $C_{ij} := \langle x_i^\mu x_j^\mu \rangle_\mu$. What are the upper and lower limits of $C_{ij}$ if $C_{ii}$ and $C_{jj}$ are known?

Hint: Consider $\langle x_i^\mu x_j^\mu \rangle_\mu = \frac{1}{M} \sum_\mu x_i^\mu x_j^\mu$ as a scalar product of two vectors.
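To probe a conjectured bound empirically before proving it, a short sketch with arbitrary correlated random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))  # M = 1000 correlated data points

C = (X.T @ X) / len(X)   # C_ij = <x_i x_j> averaged over the data points

i, j = 0, 1
print(C[i, i], C[j, j], C[i, j])  # test your conjectured limits on C_ij against these
```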

1.4.2 Exercise: From data distribution to second-moment matrix

Give an estimate of the second-moment matrix for the following data distributions.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

1.4.3 Exercise: From data distribution to second-moment matrix

Give an estimate of the second-moment matrix for the following data distributions.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

1.4.4 Exercise: From second-moment matrix to data

For each of the following second-moment matrices $C$, draw a data distribution that is qualitatively consistent with it.

$$\text{(a) } C = \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} \qquad \text{(b) } C = \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix} \qquad \text{(c) } C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
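One way to check a sketched distribution numerically is to draw samples whose second-moment matrix equals a given $C$ and scatter-plot them. A sketch, assuming zero-mean Gaussian data (for zero-mean data the second-moment matrix coincides with the covariance matrix):

```python
import numpy as np
import matplotlib.pyplot as plt

C = np.array([[1.0, -0.5],
              [-0.5, 1.0]])   # matrix (a); try (b) and (c) as well

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=C, size=2000)  # zero-mean Gaussian samples

print((X.T @ X) / len(X))     # empirical second-moment matrix, close to C
plt.scatter(X[:, 0], X[:, 1], s=2)
plt.gca().set_aspect("equal")
plt.show()
```

Note that matrix (c) is singular, so its samples collapse exactly onto a line.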

1.4.5 Exercise: Data distributions with and without mean

1. Define a procedure by which you can turn any mean-free data distribution into a distribution with finite (non-zero) mean but identical second-moment matrix. (Are there exceptions?)

2. Conversely, define a procedure by which you can turn any data distribution with finite mean into a distribution with zero mean but identical second-moment matrix. (Are there exceptions?)

Hint: Think about what happens if you flip a point $\mathbf{x}^\mu$ at the origin, i.e. if you replace $\mathbf{x}^\mu$ by $-\mathbf{x}^\mu$ in the data set.
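To experiment with the hint, a small sketch (with arbitrary random data) that flips a single point at the origin and compares means and second-moment matrices before and after:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.array([1.0, -0.5])  # arbitrary data with non-zero mean

def second_moment(X):
    return (X.T @ X) / len(X)

Y = X.copy()
Y[0] = -Y[0]   # flip the point mu = 0 at the origin

print(X.mean(axis=0), Y.mean(axis=0))        # the means differ ...
print(second_moment(X) - second_moment(Y))   # ... the second-moment matrices do not
```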

1.5 Covariance matrix and higher order structure

1.6 PCA by diagonalizing the covariance matrix

2 Formalism

2.1 Definition of the PCA-optimization problem

2.2 Matrix $V^T$: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system

2.3 Matrix $V$: Mapping from low-dimensional new coordinate system to subspace in old coordinate system

2.3.1 Exercise: Norm of a vector

Let $\mathbf{b}_i$, $i = 1, ..., N$, be an orthonormal basis. Then we have $(\mathbf{b}_i, \mathbf{b}_j) = \delta_{ij}$ and

$$\mathbf{v} = \sum_{i=1}^{N} v_i \mathbf{b}_i \quad \text{with} \quad v_i := (\mathbf{v}, \mathbf{b}_i) \qquad \forall \mathbf{v} \,. \qquad (1)$$

Show that

$$\|\mathbf{v}\|^2 = \sum_{i=1}^{N} v_i^2 \,. \qquad (2)$$
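A quick numerical check of equations (1) and (2), using a random orthonormal basis obtained from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
B, _ = np.linalg.qr(rng.normal(size=(N, N)))  # columns of B form an orthonormal basis b_i

v = rng.normal(size=N)     # an arbitrary vector
vi = B.T @ v               # coefficients v_i = (v, b_i)

print(np.linalg.norm(v)**2, np.sum(vi**2))  # the two sides of equation (2)
```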

2.4 Matrix $(V^T V)$: Identity mapping within new coordinate system

2.5 Matrix $(V V^T)$: Projection from high- to low-dimensional (sub)space within old coordinate system

2.6 Variance

2.7 Reconstruction error

2.8 Covariance matrix

2.8.1 Exercise: Second-moment matrices are positive semi-definite

µ µ T (//10/11 min)Show that a second-moment matrix C := hx (x ) iµ is always positive semi-definite, i.e. for each vector v we find vT Cv ≥ 0. For which vectors v does vT Cv = 0 hold?

2.8.2 Exercise: Covariance matrix from mean and second-moment matrix

Given some data $\mathbf{x}^\mu$, $\mu = 1, ..., M$, with mean

$$\bar{\mathbf{x}} := \langle \mathbf{x} \rangle = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (1)$$

and second-moment matrix

$$C := \langle \mathbf{x}\mathbf{x}^T \rangle = \begin{pmatrix} 4 & -1 \\ -1 & 2 \end{pmatrix} \,. \qquad (2)$$

Calculate the covariance matrix

$$\Sigma := \langle (\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T \rangle \,. \qquad (3)$$

First derive a general formula and then evaluate it for the concrete values given.
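To test your general formula, a sketch that estimates $\bar{\mathbf{x}}$, $C$, and $\Sigma$ independently from arbitrary random data, so you can check that your formula maps the first two quantities onto the third:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2)) + np.array([1.0, -1.0])  # arbitrary data with non-zero mean

x_bar = X.mean(axis=0)                            # mean
C = (X.T @ X) / len(X)                            # second-moment matrix
Sigma = ((X - x_bar).T @ (X - x_bar)) / len(X)    # covariance matrix

print(x_bar, C, Sigma, sep="\n")  # insert x_bar and C into your formula, compare with Sigma
```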

2.9 Eigenvalue equation of the covariance matrix

2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal

Prove that the eigenvectors of a symmetric matrix are orthogonal if their eigenvalues are different. Proceed as follows:

1. Let $A$ be a symmetric $N \times N$ matrix, i.e. $A = A^T$. Show first that $(\mathbf{v}, A\mathbf{w}) = (A\mathbf{v}, \mathbf{w})$ for any vectors $\mathbf{v}, \mathbf{w} \in \mathbb{R}^N$, with $(\cdot, \cdot)$ indicating the Euclidean inner product.

2. Let $\{\mathbf{a}_i\}$ be the eigenvectors of the matrix $A$ with the eigenvalues $\lambda_i$. Show with the help of part one that $(\mathbf{a}_i, \mathbf{a}_j) = 0$ if $\lambda_i \neq \lambda_j$.

Hint: $\lambda_i(\mathbf{a}_i, \mathbf{a}_j) = \ldots$
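This is not a proof, but the statement is easy to illustrate numerically with a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M + M.T                    # a random symmetric matrix, A = A^T

lam, a = np.linalg.eigh(A)     # eigenvalues and eigenvectors (as columns of a)

print(lam)                     # generically all different
print(np.round(a.T @ a, 10))   # pairwise inner products (a_i, a_j): the identity matrix
```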

2.10 Total variance of the data x

2.11 Diagonalizing the covariance matrix

2.12 Variance of y for a diagonalized covariance matrix

2.13 Constraints of matrix $V'$

2.14 Finding the optimal subspace

2.15 Interpretation of the result

2.15.1 Exercise: Moments of a data distribution: Simple example

Given a data distribution $\mathbf{x}^\mu$ with

$$\mathbf{x}^1 = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \quad \mathbf{x}^2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \mathbf{x}^3 = \begin{pmatrix} -2 \\ 3 \end{pmatrix} \,. \qquad (1)$$

1. Calculate the mean $\bar{\mathbf{x}} = \langle \mathbf{x}^\mu \rangle_\mu$ and the second-moment matrix $C = \langle \mathbf{x}^\mu (\mathbf{x}^\mu)^T \rangle_\mu$.

2. Determine the normalized eigenvectors $\mathbf{c}_1$ and $\mathbf{c}_2$ of $C$ and the corresponding eigenvalues.

Hint: Look at the data distribution and guess the eigenvectors on the basis of the symmetry of the distribution. Then insert the guessed eigenvectors into the eigenvalue equation, verify that they are eigenvectors, and calculate the eigenvalues. Otherwise you have to go the hard way via the characteristic polynomial.

3. Determine the first and second moment of

$$y^\mu = \mathbf{c}_\alpha^T \mathbf{x}^\mu \,, \qquad (2)$$

i.e. $\langle y^\mu \rangle_\mu$ and $\langle (y^\mu)^2 \rangle_\mu$, for $\alpha \in \{1, 2\}$.

Hint: You don't have to compute the projected data. There is a simpler way.
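After working through all three parts by hand, you can verify your results with a short numpy sketch (this is only a check; it bypasses the guessing strategy from the hint):

```python
import numpy as np

X = np.array([[-3.0,  2.0],
              [ 1.0, -1.0],
              [-2.0,  3.0]])    # rows: the data points x^mu

x_bar = X.mean(axis=0)          # mean
C = (X.T @ X) / len(X)          # second-moment matrix

lam, c = np.linalg.eigh(C)      # eigenvalues and eigenvectors (columns of c)
print(x_bar, C, lam, c, sep="\n")

Y = X @ c                       # projected data y^mu for alpha = 1, 2
print(Y.mean(axis=0), np.mean(Y**2, axis=0))  # first and second moments of y
```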

2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three data distributions (a), (b), and (c) in the $x_1$-$x_2$ plane, with tick marks at 1 on both axes. © CC BY-SA 4.0)

2.15.4 Exercise: Dimensionality reduction

Given some data in $\mathbb{R}^3$ with the corresponding $3 \times 3$ second-moment matrix $C$ with eigenvectors $\mathbf{c}_\alpha$ and eigenvalues $\lambda_\alpha$, where $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 0.2$.

1. Define a matrix $A \in \mathbb{R}^{2 \times 3}$ that maps the data into a two-dimensional space while preserving as much variance as possible.

2. Define a matrix $B \in \mathbb{R}^{3 \times 2}$ that places the reduced data back into $\mathbb{R}^3$ with minimal reconstruction error. How large is the reconstruction error?

3. Prove that $AB$ is an identity matrix. Why would one expect that intuitively?

4. Prove that $BA$ is a projection matrix but not the identity matrix.
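A possible numerical test bed for this exercise; the construction of $C$ from random orthonormal eigenvectors is my own, and be warned that the lines defining A and B encode one possible answer, so derive yours first:

```python
import numpy as np

rng = np.random.default_rng(0)

# An example C with the given spectrum and random orthonormal eigenvectors c_alpha.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C = Q @ np.diag([3.0, 1.0, 0.2]) @ Q.T

lam, c = np.linalg.eigh(C)          # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]       # reorder: largest eigenvalue first
lam, c = lam[order], c[:, order]

A = c[:, :2].T   # candidate 2x3 matrix (spoiler: one possible choice)
B = A.T          # candidate 3x2 matrix (spoiler: one possible choice)

print(np.round(A @ B, 10))          # compare with the 2x2 identity matrix
P = B @ A
print(np.allclose(P @ P, P), np.allclose(P, np.eye(3)))  # projection? identity?
```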

2.16 PCA Algorithm

2.17 Intuition of the Results

2.18 Whitening or sphering

2.18.1 Exercise: Sphered data is uncorrelated

Prove that sphered zero-mean data $\hat{\mathbf{x}}$ projected onto two orthogonal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ is uncorrelated.

Hint: The correlation coefficient for two scalar data sets $y_1$ and $y_2$ with means $\bar{y}_i := \langle y_i \rangle$ is defined as

$$c := \frac{\langle (y_1 - \bar{y}_1)(y_2 - \bar{y}_2) \rangle}{\sqrt{\langle (y_1 - \bar{y}_1)^2 \rangle}\,\sqrt{\langle (y_2 - \bar{y}_2)^2 \rangle}} \,. \qquad (1)$$
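A numerical illustration of the claim; a sketch that spheres arbitrary correlated data via the eigendecomposition of its covariance matrix and then checks the correlation coefficient of two orthogonal projections:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.5],
                                           [0.5, 1.0]])  # correlated raw data
X -= X.mean(axis=0)                  # make the data zero-mean

# Sphering: rotate into the eigenbasis of the covariance matrix and rescale.
lam, U = np.linalg.eigh((X.T @ X) / len(X))
X_hat = (X @ U) / np.sqrt(lam)       # sphered data x-hat, unit covariance

n1 = np.array([1.0, 1.0]) / np.sqrt(2)   # two arbitrary orthogonal unit vectors
n2 = np.array([1.0, -1.0]) / np.sqrt(2)

y1, y2 = X_hat @ n1, X_hat @ n2
print(np.corrcoef(y1, y2)[0, 1])     # correlation coefficient c, ~0 up to rounding
```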

2.19 Singular value decomposition +

3 Application

3.1 Face processing

4 Acknowledgment
