Data, Covariance, and Correlation Matrix

Data, Covariance, and Correlation Matrix Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 16-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 1 Copyright Copyright c 2017 by Nathaniel E. Helwig Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 2 Outline of Notes 1) The Data Matrix 3) The Correlation Matrix Definition Definition Properties Properties R code R code 2) The Covariance Matrix 4) Miscellaneous Topics Definition Crossproduct calculations Properties Vec and Kronecker R code Visualizing data Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 3 The Data Matrix The Data Matrix Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 4 The Data Matrix Definition The Organization of Data The data matrix refers to the array of numbers 0 1 x11 x12 ··· x1p Bx21 x22 ··· x2pC B C Bx x ··· x C X = B 31 32 3pC B . .. C @ . A xn1 xn2 ··· xnp where xij is the j-th variable collected from the i-th item (e.g., subject). items/subjects are rows variables are columns X is a data matrix of order n × p (# items by # variables). Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 5 The Data Matrix Definition Collection of Column Vectors We can view a data matrix as a collection of column vectors: 0 1 B C X = @x1 x2 ··· xpA where xj is the j-th column of X for j 2 f1;:::; pg. The n × 1 vector xj gives the j-th variable’s scores for the n items. Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 6 The Data Matrix Definition Collection of Row Vectors We can view a data matrix as a collection of row vectors: 0 0 1 x1 B x0 C X = B 2 C B . C @ . A 0 xn 0 where xi is the i-th row of X for i 2 f1;:::; ng. 0 The 1 × p vector xi gives the i-th item’s scores for the p variables. Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 7 The Data Matrix Properties Calculating Variable (Column) Means The sample mean of the j-th variable is given by n 1 X x¯ = x j n ij i=1 −1 0 = n 1nxj where 1n denotes an n × 1 vector of ones xj denotes the j-th column of X Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 8 The Data Matrix Properties Calculating Item (Row) Means The sample mean of the i-th item is given by p 1 X x¯ = x i p ij j=1 −1 0 = p xi 1p where 1p denotes an p × 1 vector of ones 0 xi denotes the i-th row of X Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 9 The Data Matrix R Code Data Frame and Matrix Classes in R > data(mtcars) > class(mtcars) [1] "data.frame" > dim(mtcars) [1] 32 11 > head(mtcars) mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 > X <- as.matrix(mtcars) > class(X) [1] "matrix" Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 10 The Data Matrix R Code Row and Column Means > # get row means (3 ways) > rowMeans(X)[1:3] Mazda RX4 Mazda RX4 Wag Datsun 710 29.90727 29.98136 23.59818 > c(mean(X[1,]), mean(X[2,]), mean(X[3,])) [1] 29.90727 29.98136 23.59818 > apply(X,1,mean)[1:3] Mazda RX4 Mazda RX4 Wag Datsun 710 29.90727 29.98136 23.59818 > # get column means (3 ways) > colMeans(X)[1:3] mpg cyl disp 20.09062 6.18750 230.72188 > c(mean(X[,1]), mean(X[,2]), mean(X[,3])) [1] 20.09062 6.18750 230.72188 > apply(X,2,mean)[1:3] mpg cyl disp 20.09062 6.18750 230.72188 Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 11 The Data Matrix R Code Other Row and Column Functions > # get column medians > apply(X,2,median)[1:3] mpg cyl disp 19.2 6.0 196.3 > c(median(X[,1]), median(X[,2]), median(X[,3])) [1] 19.2 6.0 196.3 > # get column ranges > apply(X,2,range)[,1:3] mpg cyl disp [1,] 10.4 4 71.1 [2,] 33.9 8 472.0 > cbind(range(X[,1]), range(X[,2]), range(X[,3])) [,1] [,2] [,3] [1,] 10.4 4 71.1 [2,] 33.9 8 472.0 Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 12 The Covariance Matrix The Covariance Matrix Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 13 The Covariance Matrix Definition The Covariation of Data The covariance matrix refers to the symmetric array of numbers 0 2 1 s1 s12 s13 ··· s1p 2 Bs21 s s23 ··· s2pC B 2 C Bs s s2 ··· s C S = B 31 32 3 3pC B . .. C @ . A 2 sp1 sp2 sp3 ··· sp where 2 Pn ¯ 2 sj = (1=n) i=1(xij − xj ) is the variance of the j-th variable Pn sjk = (1=n) i=1(xij − x¯j )(xik − x¯k ) is the covariance between the j-th and k-th variables Pn x¯j = (1=n) i=1 xij is the mean of the j-th variable Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 14 The Covariance Matrix Definition Covariance Matrix from Data Matrix We can calculate the covariance matrix such as 1 S = X0 X n c c 0 where Xc = X − 1nx¯ = CX with 0 x¯ = (x¯1;:::; x¯p) denoting the vector of variable means −1 0 C = In − n 1n1n denoting a centering matrix Note that the centered matrix Xc has the form 0 1 x11 − x¯1 x12 − x¯2 ··· x1p − x¯p Bx21 − x¯1 x22 − x¯2 ··· x2p − x¯pC B C Bx − x¯ x − x¯ ··· x − x¯pC Xc = B 31 1 32 2 3p C B . .. C @ . A xn1 − x¯1 xn2 − x¯2 ··· xnp − x¯p Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 15 The Covariance Matrix Properties Variances are Nonnegative 2 Variances are sums-of-squares, which implies that sj ≥ 0 8j. 2 sj > 0 as long as there does not exist an α such that xj = α1n This implies that. tr(S) ≥ 0 where tr(·) denotes the matrix trace function Pp j=1 λj ≥ 0 where (λ1; : : : ; λp) are the eigenvalues of S If n < p, then λj = 0 for at least one j 2 f1;:::; pg. If n ≥ p and the p columns of X are linearly independent, then λj > 0 for all j 2 f1;:::; pg. Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 16 The Covariance Matrix Properties The Cauchy-Schwarz Inequality From the Cauchy-Schwarz inequality we have that 2 2 2 sjk ≤ sj sk with the equality holding if and only if xj and xk are linearly dependent. We could also write the Cauchy-Schwarz inequality as jsjk j ≤ sj sk where sj and sk denote the standard deviations of the variables. Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 17 The Covariance Matrix R Code Covariance Matrix by Hand (hard way) > n <- nrow(X) > C <- diag(n) - matrix(1/n, n, n) > Xc <- C %*%X > S <- t(Xc) %*% Xc / (n-1) > S[1:3,1:6] mpg cyl disp hp drat wt mpg 36.324103 -9.172379 -633.0972 -320.7321 2.1950635 -5.116685 cyl -9.172379 3.189516 199.6603 101.9315 -0.6683669 1.367371 disp -633.097208 199.660282 15360.7998 6721.1587 -47.0640192 107.684204 # or # > Xc <- scale(X, center=TRUE, scale=FALSE) > S <- t(Xc) %*% Xc / (n-1) > S[1:3,1:6] mpg cyl disp hp drat wt mpg 36.324103 -9.172379 -633.0972 -320.7321 2.1950635 -5.116685 cyl -9.172379 3.189516 199.6603 101.9315 -0.6683669 1.367371 disp -633.097208 199.660282 15360.7998 6721.1587 -47.0640192 107.684204 Nathaniel E. Helwig (U of Minnesota) Data, Covariance, and Correlation Matrix Updated 16-Jan-2017 : Slide 18 The Covariance Matrix R Code Covariance Matrix using cov Function (easy way) # calculate covariance matrix > S <- cov(X) > dim(S) [1] 11 11 # check variance > S[1,1] [1] 36.3241 > var(X[,1]) [1] 36.3241 > sum((X[,1]-mean(X[,1]))^2) / (n-1) [1] 36.3241 # check covariance > S[1:3,1:6] mpg cyl disp hp drat wt mpg 36.324103 -9.172379 -633.0972 -320.7321 2.1950635 -5.116685 cyl -9.172379 3.189516 199.6603 101.9315 -0.6683669 1.367371 disp -633.097208 199.660282 15360.7998 6721.1587 -47.0640192 107.684204 Nathaniel E.

Data, Covariance, and Correlation Matrix

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support