Principal Components Analysis

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)

Updated 16-Mar-2017

Copyright

Copyright © 2017 by Nathaniel E. Helwig

Outline of Notes

1) Background
   Overview
   Data, Covariance, and Correlation
   Orthogonal Rotation

2) Population PCs
   Definition
   Calculation
   Properties

3) Sample PCs
   Definition
   Calculation
   Properties

4) PCA in Practice
   Covariance or Correlation?
   Decathlon Example
   Number of Components


Background

Overview of Principal Components Analysis: Definition and Purposes of PCA

Principal Components Analysis (PCA) finds linear combinations of variables that best explain the covariation structure of the variables.

There are two typical purposes of PCA:
1. Data reduction: explain covariation between p variables using r < p linear combinations
2. Data interpretation: find features (i.e., components) that are important for explaining covariation

Data, Covariance, and Correlation: Data Matrix

The data matrix refers to the array of numbers
\[
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
x_{31} & x_{32} & \cdots & x_{3p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\]

where $x_{ij}$ is the j-th variable collected from the i-th item (e.g., subject).
Items/subjects are rows; variables are columns.

X is a data matrix of order n × p (# items by # variables).

Data, Covariance, and Correlation: Population Covariance Matrix

The population covariance matrix refers to the symmetric array
\[
\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2p} \\
\sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \sigma_{p3} & \cdots & \sigma_{pp}
\end{bmatrix}
\]

where
$\sigma_{jj} = E([X_j - \mu_j]^2)$ is the population variance of the j-th variable
$\sigma_{jk} = E([X_j - \mu_j][X_k - \mu_k])$ is the population covariance between the j-th and k-th variables
$\mu_j = E(X_j)$ is the population mean of the j-th variable

Data, Covariance, and Correlation: Sample Covariance Matrix

The sample covariance matrix refers to the symmetric array

\[
\mathbf{S} = \begin{bmatrix}
s_1^2 & s_{12} & s_{13} & \cdots & s_{1p} \\
s_{21} & s_2^2 & s_{23} & \cdots & s_{2p} \\
s_{31} & s_{32} & s_3^2 & \cdots & s_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & s_{p3} & \cdots & s_p^2
\end{bmatrix}
\]

where
$s_j^2 = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$ is the sample variance of the j-th variable
$s_{jk} = \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$ is the sample covariance between the j-th and k-th variables
$\bar{x}_j = (1/n)\sum_{i=1}^n x_{ij}$ is the sample mean of the j-th variable
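As a quick check of these formulas, here is a minimal R sketch (using R's built-in USArrests data as a stand-in, not data from these notes) that computes the sample means and covariances element-wise and compares them with cov():

# compute sample means and covariances by hand (USArrests is a stand-in dataset)
X <- as.matrix(USArrests)
n <- nrow(X)
xbar <- colMeans(X)                    # sample means
Xc <- sweep(X, 2, xbar)                # mean-centered data
S.manual <- crossprod(Xc) / (n - 1)    # s_jk = sum_i (x_ij - xbar_j)(x_ik - xbar_k) / (n-1)
all.equal(S.manual, cov(X))            # TRUE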

Data, Covariance, and Correlation: Covariance Matrix from Data Matrix

We can calculate the (sample) covariance matrix as
\[
\mathbf{S} = \frac{1}{n-1}\mathbf{X}_c'\mathbf{X}_c
\]
where $\mathbf{X}_c = \mathbf{X} - \mathbf{1}_n\bar{\mathbf{x}}' = \mathbf{C}\mathbf{X}$ with
$\bar{\mathbf{x}} = (\bar{x}_1,\ldots,\bar{x}_p)'$ denoting the vector of variable means
$\mathbf{C} = \mathbf{I}_n - n^{-1}\mathbf{1}_n\mathbf{1}_n'$ denoting a centering matrix

Note that the centered matrix $\mathbf{X}_c$ has the form
\[
\mathbf{X}_c = \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
x_{31} - \bar{x}_1 & x_{32} - \bar{x}_2 & \cdots & x_{3p} - \bar{x}_p \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\]
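The centering-matrix construction can be checked directly; a minimal sketch (again using the USArrests stand-in data) forms C explicitly and reproduces cov():

# form the centering matrix C and verify S = Xc' Xc / (n - 1)
X <- as.matrix(USArrests)
n <- nrow(X)
C <- diag(n) - matrix(1, n, n) / n     # centering matrix I_n - (1/n) 1 1'
Xc <- C %*% X                          # column-centered data matrix
S <- crossprod(Xc) / (n - 1)
all.equal(S, cov(X), check.attributes = FALSE)   # TRUE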

Data, Covariance, and Correlation: Population Correlation Matrix

The population correlation matrix refers to the symmetric array
\[
\mathbf{P} = \begin{bmatrix}
1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2p} \\
\rho_{31} & \rho_{32} & 1 & \cdots & \rho_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \rho_{p3} & \cdots & 1
\end{bmatrix}
\]

where
\[
\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\sigma_{kk}}}
\]

is the Pearson correlation coefficient between variables Xj and Xk .

Data, Covariance, and Correlation: Sample Correlation Matrix

The sample correlation matrix refers to the symmetric array
\[
\mathbf{R} = \begin{bmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1p} \\
r_{21} & 1 & r_{23} & \cdots & r_{2p} \\
r_{31} & r_{32} & 1 & \cdots & r_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & r_{p3} & \cdots & 1
\end{bmatrix}
\]

where
\[
r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}\sqrt{\sum_{i=1}^n (x_{ik} - \bar{x}_k)^2}}
\]

is the Pearson correlation coefficient between variables xj and xk .

Data, Covariance, and Correlation: Correlation Matrix from Data Matrix

We can calculate the (sample) correlation matrix as
\[
\mathbf{R} = \frac{1}{n-1}\mathbf{X}_s'\mathbf{X}_s
\]
where $\mathbf{X}_s = \mathbf{C}\mathbf{X}\mathbf{D}^{-1}$ with
$\mathbf{C} = \mathbf{I}_n - n^{-1}\mathbf{1}_n\mathbf{1}_n'$ denoting a centering matrix
$\mathbf{D} = \mathrm{diag}(s_1,\ldots,s_p)$ denoting a diagonal scaling matrix

Note that the standardized matrix $\mathbf{X}_s$ has the form
\[
\mathbf{X}_s = \begin{bmatrix}
(x_{11} - \bar{x}_1)/s_1 & (x_{12} - \bar{x}_2)/s_2 & \cdots & (x_{1p} - \bar{x}_p)/s_p \\
(x_{21} - \bar{x}_1)/s_1 & (x_{22} - \bar{x}_2)/s_2 & \cdots & (x_{2p} - \bar{x}_p)/s_p \\
(x_{31} - \bar{x}_1)/s_1 & (x_{32} - \bar{x}_2)/s_2 & \cdots & (x_{3p} - \bar{x}_p)/s_p \\
\vdots & \vdots & \ddots & \vdots \\
(x_{n1} - \bar{x}_1)/s_1 & (x_{n2} - \bar{x}_2)/s_2 & \cdots & (x_{np} - \bar{x}_p)/s_p
\end{bmatrix}
\]
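The analogous check for the correlation matrix, as a minimal sketch (USArrests again as stand-in data), uses scale() to produce the centered and scaled matrix:

# verify R = Xs' Xs / (n - 1), with Xs from scale()
X <- as.matrix(USArrests)
n <- nrow(X)
Xs <- scale(X, center = TRUE, scale = TRUE)      # (x_ij - xbar_j) / s_j
R <- crossprod(Xs) / (n - 1)
all.equal(R, cor(X), check.attributes = FALSE)   # TRUE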

Orthogonal Rotation: Rotating Points in Two Dimensions

Suppose we have $\mathbf{z} = (x, y)' \in \mathbb{R}^2$, i.e., points in 2D Euclidean space.

A 2 × 2 orthogonal rotation of (x, y) of the form

\[
\begin{bmatrix} x^* \\ y^* \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
\]

rotates (x, y) counter-clockwise around the origin by an angle of θ and

\[
\begin{bmatrix} x^* \\ y^* \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
\]

rotates (x, y) clockwise around the origin by an angle of θ.

Orthogonal Rotation: Visualization of 2D Clockwise Rotation

[Figure: eleven labeled points (a through k) plotted with no rotation and after clockwise rotations of 30, 45, 60, 90, and 180 degrees]

Orthogonal Rotation: Visualization of 2D Clockwise Rotation (R Code)

rotmat2d <- function(theta){
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
}
x <- seq(-2, 2, length=11)
y <- 4*exp(-x^2) - 2
xy <- cbind(x, y)
rang <- c(30, 45, 60, 90, 180)
dev.new(width=12, height=8, noRStudioGD=TRUE)
par(mfrow=c(2,3))
plot(x, y, xlim=c(-3,3), ylim=c(-3,3), main="No Rotation")
text(x, y, labels=letters[1:11], cex=1.5)
for(j in 1:5){
  rmat <- rotmat2d(rang[j]*2*pi/360)
  xyrot <- xy %*% rmat
  plot(xyrot, xlim=c(-3,3), ylim=c(-3,3))
  text(xyrot, labels=letters[1:11], cex=1.5)
  title(paste(rang[j], " degrees"))
}

Orthogonal Rotation: Orthogonal Rotation in Two Dimensions

Note that the 2 × 2 rotation matrix

\[
\mathbf{R} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}
\]

is an orthogonal matrix for all θ:

\[
\mathbf{R}'\mathbf{R} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} = \begin{bmatrix} \cos^2(\theta) + \sin^2(\theta) & \cos(\theta)\sin(\theta) - \cos(\theta)\sin(\theta) \\ \cos(\theta)\sin(\theta) - \cos(\theta)\sin(\theta) & \cos^2(\theta) + \sin^2(\theta) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

Orthogonal Rotation: Orthogonal Rotation in Higher Dimensions

Suppose we have a data matrix X with p columns.
Rows of X are coordinates of points in p-dimensional space
Note: when p = 2 we have the situation on the previous slides

A p × p orthogonal rotation is an orthogonal linear transformation:
$\mathbf{R}'\mathbf{R} = \mathbf{R}\mathbf{R}' = \mathbf{I}_p$ where $\mathbf{I}_p$ is the p × p identity matrix
If $\tilde{\mathbf{X}} = \mathbf{X}\mathbf{R}$ is the rotated data matrix, then $\tilde{\mathbf{X}}\tilde{\mathbf{X}}' = \mathbf{X}\mathbf{X}'$
Orthogonal rotation preserves relationships between points
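A minimal sketch of this invariance, using a randomly generated orthogonal matrix (not one from these notes):

# an orthogonal rotation leaves X X' (and all pairwise distances) unchanged
set.seed(1)
X <- matrix(rnorm(20), 5, 4)               # 5 points in 4 dimensions
R <- qr.Q(qr(matrix(rnorm(16), 4, 4)))     # a random 4 x 4 orthogonal matrix
Xtilde <- X %*% R                          # rotated data
all.equal(Xtilde %*% t(Xtilde), X %*% t(X))                  # TRUE (up to rounding)
all.equal(dist(Xtilde), dist(X), check.attributes = FALSE)   # distances preserved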


Population Principal Components

Definition: Linear Combinations of Random Variables

$\mathbf{X} = (X_1,\ldots,X_p)'$ is a random vector with covariance matrix $\boldsymbol{\Sigma}$, where $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ are the eigenvalues of $\boldsymbol{\Sigma}$.

Consider forming new variables $Y_1,\ldots,Y_p$ by taking p different linear combinations of the $X_j$ variables:

\begin{align*}
Y_1 &= \mathbf{b}_1'\mathbf{X} = b_{11}X_1 + b_{21}X_2 + \cdots + b_{p1}X_p \\
Y_2 &= \mathbf{b}_2'\mathbf{X} = b_{12}X_1 + b_{22}X_2 + \cdots + b_{p2}X_p \\
&\;\;\vdots \\
Y_p &= \mathbf{b}_p'\mathbf{X} = b_{1p}X_1 + b_{2p}X_2 + \cdots + b_{pp}X_p
\end{align*}

where $\mathbf{b}_k = (b_{1k},\ldots,b_{pk})'$ is the k-th linear combination vector.
$\mathbf{b}_k$ are called the loadings for the k-th principal component

Definition: Defining Principal Components in the Population

Note that $Y_k = \mathbf{b}_k'\mathbf{X}$ has the properties:

\begin{align*}
\mathrm{Var}(Y_k) &= \mathbf{b}_k'\boldsymbol{\Sigma}\mathbf{b}_k \\
\mathrm{Cov}(Y_k, Y_\ell) &= \mathbf{b}_k'\boldsymbol{\Sigma}\mathbf{b}_\ell
\end{align*}

The principal components are the uncorrelated linear combinations $Y_1,\ldots,Y_p$ whose variances are as large as possible.

\begin{align*}
\mathbf{b}_1 &= \underset{\|\mathbf{b}_1\| = 1}{\mathrm{argmax}}\,\{\mathrm{Var}(\mathbf{b}_1'\mathbf{X})\} \\
\mathbf{b}_2 &= \underset{\|\mathbf{b}_2\| = 1}{\mathrm{argmax}}\,\{\mathrm{Var}(\mathbf{b}_2'\mathbf{X})\} \quad \text{subject to } \mathrm{Cov}(\mathbf{b}_1'\mathbf{X}, \mathbf{b}_2'\mathbf{X}) = 0 \\
&\;\;\vdots \\
\mathbf{b}_\ell &= \underset{\|\mathbf{b}_\ell\| = 1}{\mathrm{argmax}}\,\{\mathrm{Var}(\mathbf{b}_\ell'\mathbf{X})\} \quad \text{subject to } \mathrm{Cov}(\mathbf{b}_k'\mathbf{X}, \mathbf{b}_\ell'\mathbf{X}) = 0 \;\;\forall\, k < \ell
\end{align*}

Definition: Visualizing Principal Components for Bivariate Normal

Figure: Figure 8.1 from Applied Multivariate Statistical Analysis, 6th Ed (Johnson & Wichern). Note that e1 and e2 denote the eigenvectors of Σ.

Calculation: PCA Solution via the Eigenvalue Decomposition

We can write the population covariance matrix $\boldsymbol{\Sigma}$ as

\[
\boldsymbol{\Sigma} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}' = \sum_{k=1}^p \lambda_k \mathbf{v}_k\mathbf{v}_k'
\]
where
$\mathbf{V} = [\mathbf{v}_1,\ldots,\mathbf{v}_p]$ contains the eigenvectors ($\mathbf{V}'\mathbf{V} = \mathbf{V}\mathbf{V}' = \mathbf{I}_p$)

$\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ contains the (non-negative) eigenvalues

The PCA solution is obtained by setting $\mathbf{b}_k = \mathbf{v}_k$ for $k = 1,\ldots,p$:
\begin{align*}
\mathrm{Var}(Y_k) &= \mathrm{Var}(\mathbf{v}_k'\mathbf{X}) = \mathbf{v}_k'\boldsymbol{\Sigma}\mathbf{v}_k = \mathbf{v}_k'\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}'\mathbf{v}_k = \lambda_k \\
\mathrm{Cov}(Y_k, Y_\ell) &= \mathrm{Cov}(\mathbf{v}_k'\mathbf{X}, \mathbf{v}_\ell'\mathbf{X}) = \mathbf{v}_k'\boldsymbol{\Sigma}\mathbf{v}_\ell = \mathbf{v}_k'\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}'\mathbf{v}_\ell = 0 \quad \text{if } k \neq \ell
\end{align*}
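A minimal R sketch of this result, using an arbitrary 3 × 3 covariance matrix (not one from these notes): eigen() returns the loading vectors and the component variances.

# eigenvectors of Sigma = PC loadings; eigenvalues = PC variances
Sigma <- matrix(c(4, 2, 1,
                  2, 3, 1,
                  1, 1, 2), 3, 3)          # an arbitrary covariance matrix
ed <- eigen(Sigma, symmetric = TRUE)
ed$values                                  # lambda_1 >= lambda_2 >= lambda_3
V <- ed$vectors                            # columns are the loading vectors v_k
crossprod(V)                               # V'V = I_3 (orthonormal loadings)
t(V) %*% Sigma %*% V                       # diag(lambda): PCs are uncorrelated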

Properties: Variance Explained by Principal Components

$Y_1,\ldots,Y_p$ have the same total variance as $X_1,\ldots,X_p$:

\[
\sum_{j=1}^p \mathrm{Var}(X_j) = \mathrm{tr}(\boldsymbol{\Sigma}) = \mathrm{tr}(\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}') = \mathrm{tr}(\boldsymbol{\Lambda}) = \sum_{j=1}^p \mathrm{Var}(Y_j)
\]

The proportion of the total variance accounted for by the k-th PC is

\[
R_k^2 = \frac{\lambda_k}{\sum_{\ell=1}^p \lambda_\ell}
\]

If $\sum_{k=1}^r R_k^2 \approx 1$ for some $r < p$, we do not lose much by transforming the original variables into fewer new (principal component) variables.

Properties: Covariance of Variables and Principal Components

The covariance between $X_j$ and $Y_k$ has the form

\[
\mathrm{Cov}(X_j, Y_k) = \mathrm{Cov}(\mathbf{e}_j'\mathbf{X}, \mathbf{v}_k'\mathbf{X}) = \mathbf{e}_j'\boldsymbol{\Sigma}\mathbf{v}_k = \mathbf{e}_j'(\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}')\mathbf{v}_k = \mathbf{e}_j'\mathbf{v}_k\lambda_k = v_{jk}\lambda_k
\]

where

$\mathbf{e}_j$ is a vector of zeros with a one in the j-th position
$\mathbf{v}_k = (v_{1k},\ldots,v_{pk})'$ is the k-th eigenvector of $\boldsymbol{\Sigma}$

This implies that the correlation between $X_j$ and $Y_k$ has the form
\[
\mathrm{Cor}(X_j, Y_k) = \frac{\mathrm{Cov}(X_j, Y_k)}{\sqrt{\mathrm{Var}(X_j)}\sqrt{\mathrm{Var}(Y_k)}} = \frac{v_{jk}\lambda_k}{\sqrt{\sigma_{jj}}\sqrt{\lambda_k}} = \frac{v_{jk}\sqrt{\lambda_k}}{\sqrt{\sigma_{jj}}}
\]


Sample Principal Components

Definition: Linear Combinations of Observed Random Variables

$\mathbf{x}_i = (x_{i1},\ldots,x_{ip})'$ is an observed random vector with $\mathbf{x}_i \overset{iid}{\sim} (\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

Consider forming new variables $y_{i1},\ldots,y_{ip}$ by taking p different linear combinations of the $x_{ij}$ variables:

\begin{align*}
y_{i1} &= \mathbf{b}_1'\mathbf{x}_i = b_{11}x_{i1} + b_{21}x_{i2} + \cdots + b_{p1}x_{ip} \\
y_{i2} &= \mathbf{b}_2'\mathbf{x}_i = b_{12}x_{i1} + b_{22}x_{i2} + \cdots + b_{p2}x_{ip} \\
&\;\;\vdots \\
y_{ip} &= \mathbf{b}_p'\mathbf{x}_i = b_{1p}x_{i1} + b_{2p}x_{i2} + \cdots + b_{pp}x_{ip}
\end{align*}

where $\mathbf{b}_k = (b_{1k},\ldots,b_{pk})'$ is the k-th linear combination vector.

Definition: Sample Properties of Linear Combinations

Note that the sample mean and variance of the $y_{ik}$ variables are:

\begin{align*}
\bar{y}_k &= \frac{1}{n}\sum_{i=1}^n y_{ik} = \mathbf{b}_k'\bar{\mathbf{x}} \\
s_{y_k}^2 &= \frac{1}{n-1}\sum_{i=1}^n (y_{ik} - \bar{y}_k)^2 = \frac{1}{n-1}\sum_{i=1}^n (\mathbf{b}_k'\mathbf{x}_i - \mathbf{b}_k'\bar{\mathbf{x}})(\mathbf{b}_k'\mathbf{x}_i - \mathbf{b}_k'\bar{\mathbf{x}})' \\
&= \mathbf{b}_k'\left[\frac{1}{n-1}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\right]\mathbf{b}_k = \mathbf{b}_k'\mathbf{S}\mathbf{b}_k
\end{align*}

and the sample covariance between $y_{ik}$ and $y_{i\ell}$ is given by

\[
s_{y_k y_\ell} = \frac{1}{n-1}\sum_{i=1}^n (y_{ik} - \bar{y}_k)(y_{i\ell} - \bar{y}_\ell) = \mathbf{b}_k'\mathbf{S}\mathbf{b}_\ell
\]

Definition: Defining Principal Components in the Sample

The principal components are the uncorrelated linear combinations $y_{i1},\ldots,y_{ip}$ whose sample variances are as large as possible.

\begin{align*}
\mathbf{b}_1 &= \underset{\|\mathbf{b}_1\| = 1}{\mathrm{argmax}}\,\{\mathbf{b}_1'\mathbf{S}\mathbf{b}_1\} \\
\mathbf{b}_2 &= \underset{\|\mathbf{b}_2\| = 1}{\mathrm{argmax}}\,\{\mathbf{b}_2'\mathbf{S}\mathbf{b}_2\} \quad \text{subject to } \mathbf{b}_1'\mathbf{S}\mathbf{b}_2 = 0 \\
&\;\;\vdots \\
\mathbf{b}_\ell &= \underset{\|\mathbf{b}_\ell\| = 1}{\mathrm{argmax}}\,\{\mathbf{b}_\ell'\mathbf{S}\mathbf{b}_\ell\} \quad \text{subject to } \mathbf{b}_k'\mathbf{S}\mathbf{b}_\ell = 0 \;\;\forall\, k < \ell
\end{align*}

Calculation: PCA Solution via the Eigenvalue Decomposition

We can write the sample covariance matrix $\mathbf{S}$ as

\[
\mathbf{S} = \hat{\mathbf{V}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{V}}' = \sum_{k=1}^p \hat{\lambda}_k \hat{\mathbf{v}}_k\hat{\mathbf{v}}_k'
\]
where
$\hat{\mathbf{V}} = [\hat{\mathbf{v}}_1,\ldots,\hat{\mathbf{v}}_p]$ contains the eigenvectors ($\hat{\mathbf{V}}'\hat{\mathbf{V}} = \hat{\mathbf{V}}\hat{\mathbf{V}}' = \mathbf{I}_p$)
$\hat{\boldsymbol{\Lambda}} = \mathrm{diag}(\hat{\lambda}_1,\ldots,\hat{\lambda}_p)$ contains the (non-negative) eigenvalues

The PCA solution is obtained by setting $\mathbf{b}_k = \hat{\mathbf{v}}_k$ for $k = 1,\ldots,p$:
\begin{align*}
s_{y_k}^2 &= \hat{\mathbf{v}}_k'\mathbf{S}\hat{\mathbf{v}}_k = \hat{\mathbf{v}}_k'\hat{\mathbf{V}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{V}}'\hat{\mathbf{v}}_k = \hat{\lambda}_k \\
s_{y_k y_\ell} &= \hat{\mathbf{v}}_k'\mathbf{S}\hat{\mathbf{v}}_\ell = \hat{\mathbf{v}}_k'\hat{\mathbf{V}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{V}}'\hat{\mathbf{v}}_\ell = 0 \quad \text{if } k \neq \ell
\end{align*}
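A minimal sketch of the sample version (USArrests as stand-in data): the PC scores built from the eigenvectors of S are uncorrelated, with variances equal to the eigenvalues. Note that prcomp() uses the n - 1 divisor, so it matches eigen(cov(X)) exactly; princomp(), used later in these notes, divides by n.

# sample PCs via eigen() of the sample covariance matrix
X <- as.matrix(USArrests)
ed <- eigen(cov(X), symmetric = TRUE)
Y <- scale(X, center = TRUE, scale = FALSE) %*% ed$vectors   # PC scores
round(cov(Y), 8)                          # diagonal = eigenvalues, off-diagonal = 0
ed$values
all.equal(ed$values, prcomp(X)$sdev^2)    # prcomp() gives the same variances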

Properties: Variance Explained by Principal Components

$\{y_{i1},\ldots,y_{ip}\}_{i=1}^n$ have the same total variance as $\{x_{i1},\ldots,x_{ip}\}_{i=1}^n$:

\[
\sum_{j=1}^p s_j^2 = \mathrm{tr}(\mathbf{S}) = \mathrm{tr}(\hat{\mathbf{V}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{V}}') = \mathrm{tr}(\hat{\boldsymbol{\Lambda}}) = \sum_{k=1}^p s_{y_k}^2
\]

The proportion of the total variance accounted for by the k-th PC is

\[
\hat{R}_k^2 = \frac{\hat{\lambda}_k}{\sum_{\ell=1}^p \hat{\lambda}_\ell}
\]

If $\sum_{k=1}^r \hat{R}_k^2 \approx 1$ for some $r < p$, we do not lose much by transforming the original variables into fewer new (principal component) variables.
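A minimal sketch computing these proportions (USArrests again as stand-in data):

# proportion of total variance explained by each sample PC
X <- as.matrix(USArrests)
lambda <- eigen(cov(X), symmetric = TRUE)$values
R2 <- lambda / sum(lambda)        # R^2_k for each component
round(R2, 3)
round(cumsum(R2), 3)              # look for a cumulative value near 1 with r < p
# summary(prcomp(X)) reports the same "Proportion of Variance" values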

Properties: Covariance of Variables and Principal Components

The sample covariance between $x_{ij}$ and $y_{ik}$ has the form
\begin{align*}
\mathrm{Cov}(x_{ij}, y_{ik}) &= \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(y_{ik} - \bar{y}_k) \\
&= \frac{1}{n-1}\sum_{i=1}^n \mathbf{e}_j'(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\hat{\mathbf{v}}_k \\
&= \mathbf{e}_j'\mathbf{S}\hat{\mathbf{v}}_k = \mathbf{e}_j'\hat{\mathbf{v}}_k\hat{\lambda}_k = \hat{v}_{jk}\hat{\lambda}_k
\end{align*}
where
$\mathbf{e}_j$ is a vector of zeros with a one in the j-th position
$\hat{\mathbf{v}}_k = (\hat{v}_{1k},\ldots,\hat{v}_{pk})'$ is the k-th eigenvector of $\mathbf{S}$

This implies that the (sample) correlation between $x_{ij}$ and $y_{ik}$ is
\[
\mathrm{Cor}(x_{ij}, y_{ik}) = \frac{\mathrm{Cov}(x_{ij}, y_{ik})}{\sqrt{\mathrm{Var}(x_{ij})}\sqrt{\mathrm{Var}(y_{ik})}} = \frac{\hat{v}_{jk}\hat{\lambda}_k}{s_j \hat{\lambda}_k^{1/2}} = \frac{\hat{v}_{jk}\hat{\lambda}_k^{1/2}}{s_j}
\]

Properties: Large Sample Properties

Assume that $\mathbf{x}_i \overset{iid}{\sim} N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and that the eigenvalues of $\boldsymbol{\Sigma}$ are strictly positive and unique: $\lambda_1 > \cdots > \lambda_p > 0$.

As $n \to \infty$, we have that
\begin{align*}
\sqrt{n}(\hat{\boldsymbol{\lambda}} - \boldsymbol{\lambda}) &\approx N(\mathbf{0}, 2\boldsymbol{\Lambda}^2) \\
\sqrt{n}(\hat{\mathbf{v}}_k - \mathbf{v}_k) &\approx N(\mathbf{0}, \mathbf{V}_k)
\end{align*}

where $\mathbf{V}_k = \lambda_k \sum_{\ell \neq k} \frac{\lambda_\ell}{(\lambda_\ell - \lambda_k)^2}\mathbf{v}_\ell\mathbf{v}_\ell'$

Furthermore, as $n \to \infty$, we have that $\hat{\lambda}_k$ and $\hat{\mathbf{v}}_k$ are independent.
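One common use of the first result is an approximate large-sample confidence interval for an eigenvalue; a minimal sketch (assuming normal data and large n, with USArrests only as stand-in data):

# approximate 95% CI for lambda_k based on sqrt(n)(lambdahat_k - lambda_k) ~ N(0, 2*lambda_k^2)
X <- as.matrix(USArrests)
n <- nrow(X)
lambdahat <- eigen(cov(X), symmetric = TRUE)$values
z <- qnorm(0.975)
k <- 1                                     # interval for the largest eigenvalue
lower <- lambdahat[k] / (1 + z * sqrt(2/n))
upper <- lambdahat[k] / (1 - z * sqrt(2/n))
c(lower = lower, estimate = lambdahat[k], upper = upper)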


Principal Components Analysis in Practice

Covariance or Correlation Matrix? PCA Solution is Related to SVD (Covariance Matrix)

Let $\hat{\mathbf{U}}\hat{\mathbf{D}}\hat{\mathbf{V}}'$ denote the SVD of $\mathbf{X}_c = \mathbf{C}\mathbf{X}$.

The PCA (covariance) solution is directly related to the SVD of $\mathbf{X}_c$:

\[
\mathbf{Y} = \hat{\mathbf{U}}\hat{\mathbf{D}} \quad \text{and} \quad \mathbf{B} = \hat{\mathbf{V}}
\]

Note that the columns of $\hat{\mathbf{V}}$ are the...

Right singular vectors of the mean-centered data matrix $\mathbf{X}_c$
Eigenvectors of the covariance matrix $\mathbf{S} = \frac{1}{n-1}\mathbf{X}_c'\mathbf{X}_c = \hat{\mathbf{V}}\hat{\boldsymbol{\Lambda}}\hat{\mathbf{V}}'$

where $\hat{\boldsymbol{\Lambda}} = \mathrm{diag}(\hat{\lambda}_1,\ldots,\hat{\lambda}_p)$ with $\hat{\lambda}_k = \hat{d}_{kk}^2/(n-1)$ and $\hat{\mathbf{D}} = \mathrm{diag}(\hat{d}_{11},\ldots,\hat{d}_{pp})$.
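A minimal sketch verifying these relations (USArrests as stand-in data; prcomp() is used because, unlike princomp(), it divides by n - 1):

# SVD of the centered data matrix vs. covariance PCA
X <- as.matrix(USArrests)
n <- nrow(X)
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)
pc <- prcomp(X)
all.equal(sv$d^2/(n - 1), pc$sdev^2)                                       # eigenvalues of S
all.equal(abs(sv$v), abs(pc$rotation), check.attributes = FALSE)           # loadings (up to sign)
all.equal(abs(sv$u %*% diag(sv$d)), abs(pc$x), check.attributes = FALSE)   # scores Y = U D (up to sign)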

Covariance or Correlation Matrix? PCA Solution is Related to SVD (Correlation Matrix)

Let $\tilde{\mathbf{U}}\tilde{\mathbf{D}}\tilde{\mathbf{V}}'$ denote the SVD of $\mathbf{X}_s = \mathbf{C}\mathbf{X}\mathbf{D}^{-1}$ with $\mathbf{D} = \mathrm{diag}(s_1,\ldots,s_p)$.

The PCA (correlation) solution is directly related to the SVD of $\mathbf{X}_s$:

\[
\mathbf{Y} = \tilde{\mathbf{U}}\tilde{\mathbf{D}} \quad \text{and} \quad \mathbf{B} = \tilde{\mathbf{V}}
\]

Note that the columns of $\tilde{\mathbf{V}}$ are the...

Right singular vectors of the centered and scaled data matrix $\mathbf{X}_s$
Eigenvectors of the correlation matrix $\mathbf{R} = \frac{1}{n-1}\mathbf{X}_s'\mathbf{X}_s = \tilde{\mathbf{V}}\tilde{\boldsymbol{\Lambda}}\tilde{\mathbf{V}}'$

where $\tilde{\boldsymbol{\Lambda}} = \mathrm{diag}(\tilde{\lambda}_1,\ldots,\tilde{\lambda}_p)$ with $\tilde{\lambda}_k = \tilde{d}_{kk}^2/(n-1)$ and $\tilde{\mathbf{D}} = \mathrm{diag}(\tilde{d}_{11},\ldots,\tilde{d}_{pp})$.
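The analogous check for the correlation version takes the SVD of the centered and scaled data (USArrests again as stand-in data):

# SVD of the centered and scaled data matrix vs. correlation PCA
X <- as.matrix(USArrests)
n <- nrow(X)
Xs <- scale(X, center = TRUE, scale = TRUE)
sv <- svd(Xs)
pc <- prcomp(X, scale. = TRUE)
all.equal(sv$d^2/(n - 1), pc$sdev^2)                              # eigenvalues of R
all.equal(abs(sv$v), abs(pc$rotation), check.attributes = FALSE)  # loadings (up to sign)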

Covariance or Correlation Matrix? Covariance versus Correlation Considerations

Problem: there is no simple relationship between the SVDs of $\mathbf{X}_c$ and $\mathbf{X}_s$.
No simple relationship between PCs obtained from S and R
Rescaling variables can fundamentally change our results

Note that PCA is trying to explain the variation in S or R:
If the units of the p variables are comparable, covariance PCA may be more informative (because units of measurement are retained)
If the units of the p variables are incomparable, correlation PCA may be more appropriate (because units of measurement are removed)

Decathlon Example: Men's Olympic Decathlon Data from 1988

Data from the men's 1988 Olympic decathlon
Total of n = 34 athletes
Have p = 10 variables giving the score for each decathlon event
Have the overall decathlon score also (score)

> decathlon[1:9,]
          run100 long.jump shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk     11.25 7.43 15.48 2.27 48.90 15.13 49.28 4.7 61.32 268.95 8488
Voss       10.87 7.45 14.97 1.97 47.71 14.46 44.36 5.1 61.76 273.02 8399
Steen      11.18 7.44 14.20 1.97 48.29 14.81 43.66 5.2 64.16 263.20 8328
Thompson   10.62 7.38 15.02 2.03 49.06 14.72 44.80 4.9 64.04 285.11 8306
Blondel    11.02 7.43 12.92 1.97 47.44 14.40 41.20 5.2 57.46 256.64 8286
Plaziat    10.83 7.72 13.58 2.12 48.34 14.18 43.06 4.9 52.18 274.07 8272
Bright     11.18 7.05 14.12 2.06 49.34 14.39 41.68 5.7 61.60 291.20 8216
De.Wit     11.05 6.95 15.34 2.00 48.21 14.36 41.32 4.8 63.00 265.86 8189
Johnson    11.15 7.12 14.52 2.03 49.15 14.66 42.36 4.9 66.46 269.62 8180

Decathlon Example: Re-signing the Running Events

For the running events (run100, run400, run1500, and hurdle), lower scores correspond to better performance, whereas higher scores represent better performance for other events.

To make interpretation simpler, we will re-sign (flip the sign of) the running events:

> decathlon[,c(1,5,6,10)] <- (-1)*decathlon[,c(1,5,6,10)]
> decathlon[1:9,]
          run100 long.jump shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk     -11.25 7.43 15.48 2.27 -48.90 -15.13 49.28 4.7 61.32 -268.95 8488
Voss       -10.87 7.45 14.97 1.97 -47.71 -14.46 44.36 5.1 61.76 -273.02 8399
Steen      -11.18 7.44 14.20 1.97 -48.29 -14.81 43.66 5.2 64.16 -263.20 8328
Thompson   -10.62 7.38 15.02 2.03 -49.06 -14.72 44.80 4.9 64.04 -285.11 8306
Blondel    -11.02 7.43 12.92 1.97 -47.44 -14.40 41.20 5.2 57.46 -256.64 8286
Plaziat    -10.83 7.72 13.58 2.12 -48.34 -14.18 43.06 4.9 52.18 -274.07 8272
Bright     -11.18 7.05 14.12 2.06 -49.34 -14.39 41.68 5.7 61.60 -291.20 8216
De.Wit     -11.05 6.95 15.34 2.00 -48.21 -14.36 41.32 4.8 63.00 -265.86 8189
Johnson    -11.15 7.12 14.52 2.03 -49.15 -14.66 42.36 4.9 66.46 -269.62 8180

Decathlon Example: PCA on Covariance Matrix

# PCA on covariance matrix (default)
> pcaCOV <- princomp(x=decathlon[,1:10])
> names(pcaCOV)
[1] "sdev"     "loadings" "center"   "scale"    "n.obs"
[6] "scores"   "call"

# re-sign PCA solution
> pcsign <- sign(colSums(pcaCOV$loadings^3))
> pcaCOV$loadings <- pcaCOV$loadings %*% diag(pcsign)
> pcaCOV$scores <- pcaCOV$scores %*% diag(pcsign)

# Note: R uses MLE of covariance matrix
> n <- nrow(decathlon)
> sum((pcaCOV$sdev - sqrt(eigen((n-1)/n*cov(decathlon[,1:10]))$values))^2)
[1] 8.861398e-28

Decathlon Example: PCA on Correlation Matrix

# PCA on correlation matrix
> pcaCOR <- princomp(x=decathlon[,1:10], cor=TRUE)
> names(pcaCOR)
[1] "sdev"     "loadings" "center"   "scale"    "n.obs"
[6] "scores"   "call"

# re-sign PCA solution
> pcsign <- sign(colSums(pcaCOR$loadings^3))
> pcaCOR$loadings <- pcaCOR$loadings %*% diag(pcsign)
> pcaCOR$scores <- pcaCOR$scores %*% diag(pcsign)

# Note: PC standard deviations are square roots of the correlation matrix eigenvalues
> sum((pcaCOR$sdev - sqrt(eigen(cor(decathlon[,1:10]))$values))^2)
[1] 2.249486e-30

Decathlon Example: Plot Covariance and Correlation PCA Results

[Figure: PC1 versus PC2 loadings for the PCA of the covariance matrix (left) and the PCA of the correlation matrix (right), with each decathlon event labeled]

> dev.new(width=10, height=5, noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> plot(pcaCOV$loadings[,1:2], xlab="PC1 Loadings", ylab="PC2 Loadings",
+      type="n", main="PCA of Covariance Matrix", xlim=c(-0.15, 1.15), ylim=c(-0.15, 1.15))
> text(pcaCOV$loadings[,1:2], labels=colnames(decathlon))
> plot(pcaCOR$loadings[,1:2], xlab="PC1 Loadings", ylab="PC2 Loadings",
+      type="n", main="PCA of Correlation Matrix", xlim=c(0.05, 0.45), ylim=c(-0.5,0.6))
> text(pcaCOR$loadings[,1:2], labels=colnames(decathlon))

Decathlon Example: Correlation of PC Scores and Overall Decathlon Score

[Figure: overall decathlon score plotted against the PC2 scores from the covariance PCA (left) and the PC1 scores from the correlation PCA (right)]

> round(cor(decathlon$score, pcaCOV$scores),3)
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7]  [,8]  [,9] [,10]
[1,] 0.214 0.792 0.297 0.395 0.113 0.207 0.06 0.061 0.016 0.127
> round(cor(decathlon$score, pcaCOR$scores),3)
      [,1]  [,2]  [,3]  [,4]  [,5]   [,6]  [,7]   [,8]  [,9]  [,10]
[1,] 0.991 0.017 0.079 0.064 -0.04 -0.025 0.013 -0.009 0.011 -0.002

Choosing the Number of Components: Dimensionality Problem

In practice, the optimal number of components is often unknown.

In some cases, possible/feasible values may be known a priori from theory and/or past research.

In other cases, we need to use some data-driven approach to select a reasonable number of components.

Choosing the Number of Components: Scree Plots

A scree plot displays the variance explained by each component.

[Figure: example scree plot of the variance accounted for (vaf) versus the number of components (q)]

We look for the "elbow" of the plot, i.e., the point where the line bends.

Could do a formal test on the derivative of the scree line, but a common sense approach often works fine.

Choosing the Number of Components: Scree Plots for Decathlon PCA

[Figure: scree plots showing the variance of each PC versus the number of PCs for the PCA of the covariance matrix (left) and the PCA of the correlation matrix (right)]

> dev.new(width=10, height=5, noRStudioGD=TRUE)
> par(mfrow=c(1,2))
> plot(1:10, pcaCOV$sdev^2, type="b", xlab="# PCs", ylab="Variance of PC",
+      main="PCA of Covariance Matrix")
> plot(1:10, pcaCOR$sdev^2, type="b", xlab="# PCs", ylab="Variance of PC",
+      main="PCA of Correlation Matrix")
