34 | Principal Component Analysis
34.1 Introduction

Principal component analysis (PCA) is a useful method to reduce the dimensionality of a multivariate data set $Y \in \mathbb{R}^{n \times m}$. It is closely related to the singular value decomposition (SVD) of a matrix. The SVD of a data matrix $Y \in \mathbb{R}^{n \times m}$ is defined as a matrix decomposition

$$ Y = U \Sigma V^T, \tag{34.1} $$

where $U \in \mathbb{R}^{n \times n}$ is an orthogonal matrix, $\Sigma \in \mathbb{R}^{n \times m}$ is a rectangular diagonal matrix, and $V \in \mathbb{R}^{m \times m}$ is an orthogonal matrix. The diagonal entries $\sigma_{ii}$ of $\Sigma$ for $i = 1, \ldots, \min(n, m)$, referred to as the singular values of $Y$, correspond to the square roots of the non-zero eigenvalues of both the matrices $Y^T Y \in \mathbb{R}^{m \times m}$ and $Y Y^T \in \mathbb{R}^{n \times n}$. The $n$ columns of $U$ are referred to as the left-singular vectors and correspond to the eigenvectors of $Y Y^T$, while the $m$ columns of $V$ are referred to as the right-singular vectors and correspond to the eigenvectors of $Y^T Y$. The aim of the current chapter is to review the linear algebra terminology involved in both SVD and PCA and, more importantly, to endow the linear algebraic concepts of SVD with some data-analytic intuition. Before we provide an outline of the current chapter, we give two examples of the application of PCA in functional neuroimaging.

Example 1
In fMRI one is often interested in the BOLD signal time-series of anatomical regions of interest, for example as the data basis for biophysical modelling approaches (Chapter 43). If a region of interest comprises both voxels that exhibit MR signal increases for a given experimental perturbation and other voxels that exhibit MR signal decreases for the same perturbation, averaging the voxel time-series over space can artificially create an average time-series that exhibits no modulation by the experimental perturbation, despite the fact that both voxel populations were in fact responsive to it (Figure 34.1A, left panel). This effect can be mitigated by instead summarizing the region's MR signal time-series by the first eigenvector of the region's voxel-by-voxel covariance matrix, sometimes referred to as the first eigenmode (Figure 34.1A, right panel). On the other hand, if the voxel MR signal time-series within a region of interest are spatially coherent, then the average time-series and the first eigenmode of the voxel MR time-series matrix do not differ much.

Example 2
In biophysical modelling approaches for event-related potentials, such as dynamic causal modelling (Chapter 44), the data correspond to a matrix whose dimensions are the number of electrodes and the number of peri-event time bins. For computational efficiency, this potentially large matrix can be projected onto a smaller matrix of feature timecourses, and only this reduced matrix is then subjected to biophysical modelling using the DCM framework. As an example, the leftmost panel of Figure 34.1B visualizes an event-related potential EEG electrode × data sample matrix. The central panel of Figure 34.1B depicts the feature representation of these data, comprising the five eigenvectors of the data covariance matrix that are associated with the largest variance, and the rightmost panel visualizes the data reconstructed from these PCA results only. Notably, the reconstruction based on the PCA-selected features is virtually identical to the original data.

To get at the inner workings of both PCA and SVD we have to revisit some elementary concepts from matrix theory and vector algebra.
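To make the relation between equation (34.1) and the eigendecompositions of $Y^T Y$ and $Y Y^T$ concrete, the following is a minimal NumPy sketch. It is not part of the chapter; the data matrix, its dimensions, and the random seed are arbitrary illustrative assumptions.

```python
# Minimal sketch: the SVD of a data matrix and its link to the eigenvalues and
# eigenvectors of Y^T Y and Y Y^T, as stated in the text around (34.1).
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 5                               # hypothetical sample and variable counts
Y = rng.standard_normal((n, m))             # toy data matrix Y in R^{n x m}

# Singular value decomposition Y = U Sigma V^T
U, s, Vt = np.linalg.svd(Y, full_matrices=True)

# The squared singular values equal the non-zero eigenvalues of Y^T Y.
eigvals_desc = np.linalg.eigvalsh(Y.T @ Y)[::-1]     # eigenvalues, descending
print(np.allclose(s**2, eigvals_desc))               # True

# The columns of V are eigenvectors of Y^T Y: (Y^T Y) V = V diag(s^2).
V = Vt.T
print(np.allclose((Y.T @ Y) @ V, V * s**2))          # True

# The left-singular vectors with non-zero singular values are eigenvectors of Y Y^T.
print(np.allclose((Y @ Y.T) @ U[:, :m], U[:, :m] * s**2))   # True
```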
We proceed as follows: we first review some fundamentals of matrix eigenanalysis, including the notions of eigenvalues, eigenvectors, and the diagonalization of real symmetric matrices. We then review some essential prerequisites from vector space theory, including the notions of abstract vector spaces, linear vector combinations, vector space bases, orthogonal and orthonormal bases, vector projections, and vector coordinate transforms. In essence, PCA corresponds to a coordinate transform of a data set onto a basis that is formed by the eigenvectors of its empirical covariance matrix. In this transformed space, the data features have zero covariance and are hence maximally informative. This property can be used to remove redundant features from the data set and hence allows for data compression.

Figure 34.1. PCA applications in functional neuroimaging. (A) Eigenmode analysis as a spatial summary measure for region-of-interest timecourse extraction. For the current example, it is assumed that a region of interest comprises two voxel populations, one of which is positively modulated by some temporal event of interest (left panel, upper half of voxels), the other of which is negatively modulated by the same event of interest (left panel, lower half of voxels). The right panel depicts the resulting spatial average, which exhibits no systematic variation with the temporal event of interest, as well as the first eigenmode of the voxel × TR matrix shown on the left, which retains the event-related modulation. (B) PCA for feature selection and dimensionality reduction. The leftmost panel depicts an EEG event-related potential electrode × data samples matrix. Using PCA, these data can be compressed to the feature representation shown in the central panel. Notably, the reconstructed data based on this feature representation is virtually identical to the original data, as shown in the rightmost panel.

34.2 Eigenanalysis

An intuitive understanding of the concepts of eigenanalysis requires familiarity with differential equations. We will here thus strive only for a formal understanding. Let $A \in \mathbb{R}^{m \times m}$ be a square matrix. Any vector $v \in \mathbb{R}^m$, $v \neq 0$, that fulfils the equation

$$ Av = \lambda v \tag{34.2} $$

for a scalar $\lambda \in \mathbb{R}$ is called an eigenvector of $A$. The scalar $\lambda$ is called an eigenvalue of $A$. Each eigenvector has an associated eigenvalue, and the eigenvalues of different eigenvectors can be identical. Note that if $v \in \mathbb{R}^m$ is an eigenvector with eigenvalue $\lambda$, then $av \in \mathbb{R}^m$ with $a \in \mathbb{R}$, $a \neq 0$, is also an eigenvector with the same eigenvalue $\lambda$, because $A(av) = aAv = a\lambda v = \lambda (av)$. One therefore assumes without loss of generality that eigenvectors have length one, i.e., $v^T v = 1$.

Computing eigenvectors and eigenvalues

Eigenvectors and eigenvalues of a matrix $A \in \mathbb{R}^{m \times m}$ can be computed as follows. First, from the definition of eigenvectors and eigenvalues we have

$$ Av = \lambda v \Leftrightarrow Av - \lambda v = 0 \Leftrightarrow (A - \lambda I)v = 0. \tag{34.3} $$

This shows that we are interested in a vector $v \in \mathbb{R}^m$ and a scalar $\lambda \in \mathbb{R}$ such that the matrix product of $(A - \lambda I)$ and $v$ results in the zero vector $0 \in \mathbb{R}^m$. A trivial solution would be to set $v = 0$, but this is not allowed by the definition of an eigenvector. For $v \neq 0$, we must thus adjust $\lambda$ and $v$ such that $v$ is an element of the nullspace of $A - \lambda I$.
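As a quick numerical check of this last point, the following minimal sketch (not from the chapter; it assumes NumPy, and the small symmetric example matrix is an arbitrary choice) verifies that an eigenvector returned by a standard eigensolver is indeed mapped onto the zero vector by $A - \lambda I$.

```python
# Minimal sketch: an eigenvector v with eigenvalue lambda lies in the
# nullspace of (A - lambda I), i.e., (A - lambda I) v = 0.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # small symmetric example matrix (assumed)
lam, V = np.linalg.eigh(A)                 # eigenvalues and unit-length eigenvectors

for l, v in zip(lam, V.T):                 # the columns of V are the eigenvectors
    print(np.allclose((A - l * np.eye(2)) @ v, 0))   # True for each pair (lambda, v)
```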
The nullspace of a matrix $M \in \mathbb{R}^{m \times m}$, here denoted by $N(M)$, is the set of all vectors $w \in \mathbb{R}^m$ that are mapped onto the zero vector, i.e.,

$$ N(M) = \{ w \in \mathbb{R}^m \mid Mw = 0 \}. \tag{34.4} $$

If the nullspace of a matrix contains any element other than the zero vector, the matrix is noninvertible (singular). This holds because the zero vector is always mapped onto the zero vector by premultiplication with any matrix. If another vector is also mapped onto the zero vector, we would not know which vector to assign to the zero vector when inverting the matrix multiplication by finding the inverse of the matrix in question. Therefore, the matrix cannot be invertible. We know that determinants can be used to check whether a matrix is invertible or not, and we can make use of this here: if a matrix is not invertible, then its determinant must be zero. Therefore, we are searching for all scalars $\lambda \in \mathbb{R}$ such that

$$ \chi_A(\lambda) := \det(A - \lambda I) = 0. \tag{34.5} $$

The expression $\det(A - \lambda I)$, conceived as a function of $\lambda$, is referred to as the characteristic polynomial of $A$, because, written in full, it corresponds to a polynomial in $\lambda$. Formulation of the characteristic polynomial then allows for the following strategy to compute eigenvalues and eigenvectors of matrices:

1. Solve $\chi_A(\lambda) = 0$ for its zero-crossings (also referred to as roots) $\lambda_i^*, i = 1, 2, \ldots$. The roots of the characteristic polynomial are the eigenvalues of $A$.
2. Substitute the values $\lambda_i^*$ in (34.3), which yields the system of linear equations
$$ (A - \lambda_i^* I) v_i = 0 \tag{34.6} $$
and solve the system for the associated eigenvectors $v_i, i = 1, 2, \ldots$.

For small matrices with nice properties such as symmetry, the above strategy can be applied by hand. In practice, matrices are usually larger than 3 × 3 and eigenanalysis problems are solved using numerical computing.

Eigenvalues and eigenvectors of symmetric matrices

We next consider how eigenvalues and eigenvectors can be used to decompose or diagonalize matrices. To this end, assume that the square matrix $A \in \mathbb{R}^{m \times m}$ is symmetric, for example because $A$ is a covariance matrix. A corollary of a fundamental result from linear algebra, which is known as the spectral theorem, asserts that a symmetric matrix of size $m \times m$ has $m$ real eigenvalues $\lambda_1, \ldots, \lambda_m$ (not necessarily distinct) with associated mutually orthogonal eigenvectors $q_1, \ldots, q_m \in \mathbb{R}^m$.
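As a numerical illustration of the two-step strategy and of the orthogonality asserted by the spectral theorem, here is a minimal NumPy sketch; the 3 × 3 symmetric example matrix is an arbitrary assumption, not taken from the chapter. The eigenvalues are obtained as roots of the characteristic polynomial, and each eigenvector as a unit-length element of the nullspace of $A - \lambda_i^* I$, computed here via the SVD.

```python
# Minimal sketch of the two-step strategy: (1) the roots of the characteristic
# polynomial give the eigenvalues, (2) solving (A - lambda_i I) v_i = 0 gives the
# eigenvectors. Because A is symmetric, the eigenvectors are mutually orthogonal.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # small symmetric example matrix (assumed)

# Step 1: coefficients of chi_A(lambda) = det(A - lambda I), then its roots.
coeffs = np.poly(A)                        # characteristic polynomial coefficients
lambdas = np.sort(np.roots(coeffs).real)   # roots = eigenvalues (real for symmetric A)

# Step 2: an eigenvector spans the nullspace of (A - lambda I); the right-singular
# vector belonging to the smallest singular value is a unit-length nullspace vector.
Q = np.column_stack([np.linalg.svd(A - lam * np.eye(3))[2][-1] for lam in lambdas])

print(np.allclose(A @ Q, Q * lambdas))                # A q_i = lambda_i q_i
print(np.allclose(Q.T @ Q, np.eye(3)))                # eigenvectors are orthonormal
print(np.allclose(np.linalg.eigvalsh(A), lambdas))    # agrees with a library eigensolver
```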