STA135 Lecture 2: Sample Mean Vector and Sample Covariance Matrix
Xiaodong Li UC Davis
1 Sample mean and sample covariance
Recall that in the 1-dimensional case, for a sample x_1, \ldots, x_n, we define the sample mean
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]
and the (unbiased) sample variance
\[
s^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.
\]
p-dimensional case: Suppose we have p variates X_1, \ldots, X_p. For the vector of variates
\[
\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix},
\]
we have a p-variate sample of size n: \vec{x}_1, \ldots, \vec{x}_n \in \mathbb{R}^p. This sample of n observations gives the following data matrix:
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \ldots & x_{1p} \\
x_{21} & x_{22} & \ldots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \ldots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1}
\]
Notice that here each column in the data matrix corresponds to a particular variate Xj.
Sample mean: For each variate X_j, define the sample mean
\[
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad j = 1, \ldots, p.
\]
Then the sample mean vector is
\[
\bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n} x_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^{n} \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^{n} \vec{x}_i.
\]
Sample covariance matrix: For each variate X_j, define its sample variance as
\[
s_{jj} = s_j^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad j = 1, \ldots, p,
\]
and the sample covariance between X_j and X_k as
\[
s_{jk} = s_{kj} := \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad 1 \le j, k \le p, \; j \ne k.
\]
The sample covariance matrix is defined as
\[
S = \begin{bmatrix}
s_{11} & s_{12} & \ldots & s_{1p} \\
s_{21} & s_{22} & \ldots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \ldots & s_{pp}
\end{bmatrix}.
\]
Then
\[
\begin{aligned}
S &= \begin{bmatrix}
\frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)^2 & \ldots & \frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
\frac{1}{n-1}\sum_{i=1}^{n}(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \ldots & \frac{1}{n-1}\sum_{i=1}^{n}(x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n}
\begin{bmatrix}
(x_{i1}-\bar{x}_1)^2 & \ldots & (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \ldots & (x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n}
\begin{bmatrix} x_{i1}-\bar{x}_1 \\ \vdots \\ x_{ip}-\bar{x}_p \end{bmatrix}
\begin{bmatrix} x_{i1}-\bar{x}_1 & \ldots & x_{ip}-\bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\end{aligned}
\]
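As a quick numerical check of these definitions, the following sketch computes the sample mean vector and the sample covariance matrix with NumPy (the data matrix here is arbitrary illustrative data, not from the lecture):

```python
import numpy as np

# A small 3-variate sample with n = 5 observations (illustrative data).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 4.0, 2.5],
              [4.0, 3.0, 3.5],
              [5.0, 5.0, 4.0]])
n, p = X.shape

# Sample mean vector: average of the rows of the data matrix.
x_bar = X.mean(axis=0)

# Sample covariance via the outer-product formula,
# S = (1/(n-1)) * sum_i (x_i - x_bar)(x_i - x_bar)^T.
Xc = X - x_bar                      # mean-centered data matrix
S = Xc.T @ Xc / (n - 1)

# Cross-check against NumPy's built-in estimator (rowvar=False treats
# columns as variates; the default already divides by n - 1).
assert np.allclose(S, np.cov(X, rowvar=False))
```

Note that centering the data matrix and forming `Xc.T @ Xc / (n - 1)` is exactly the sum of outer products written above, just expressed as one matrix product.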
2 Linear transformation of observations
Consider a sample of \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix} with size n: \vec{x}_1, \ldots, \vec{x}_n. The corresponding data matrix is represented as
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \ldots & x_{1p} \\
x_{21} & x_{22} & \ldots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \ldots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}.
\]
For some C \in \mathbb{R}^{q \times p} and \vec{d} \in \mathbb{R}^q, consider the linear transformation
\[
\vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C\vec{X} + \vec{d}.
\]
Then we get a q-variate sample:
\[
\vec{y}_i = C\vec{x}_i + \vec{d}, \qquad i = 1, \ldots, n.
\]
The sample mean of \vec{y}_1, \ldots, \vec{y}_n is
\[
\bar{\vec{y}} = \frac{1}{n}\sum_{i=1}^{n} \vec{y}_i = \frac{1}{n}\sum_{i=1}^{n} (C\vec{x}_i + \vec{d}) = C\Big(\frac{1}{n}\sum_{i=1}^{n} \vec{x}_i\Big) + \vec{d} = C\bar{\vec{x}} + \vec{d}.
\]
And the sample covariance is
\[
\begin{aligned}
S_y &= \frac{1}{n-1}\sum_{i=1}^{n} (\vec{y}_i - \bar{\vec{y}})(\vec{y}_i - \bar{\vec{y}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (C\vec{x}_i - C\bar{\vec{x}})(C\vec{x}_i - C\bar{\vec{x}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^{n} C(\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top C^\top \\
&= C\Big(\frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top\Big) C^\top \\
&= C S_x C^\top.
\end{aligned}
\]
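The two identities \bar{\vec{y}} = C\bar{\vec{x}} + \vec{d} and S_y = C S_x C^\top can be verified numerically; the sketch below uses an arbitrary transformation C and shift \vec{d} on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
X = rng.normal(size=(n, p))          # original p-variate sample, one row per observation

# An arbitrary linear transformation y_i = C x_i + d (illustrative values).
C = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])     # q x p
d = np.array([3.0, -1.0])            # shift in R^q

Y = X @ C.T + d                      # transformed sample, one row per y_i

# Sample means and covariances (np.cov uses the 1/(n-1) convention by default).
x_bar = X.mean(axis=0)
y_bar = Y.mean(axis=0)
Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)

# The identities derived above.
assert np.allclose(y_bar, C @ x_bar + d)
assert np.allclose(Sy, C @ Sx @ C.T)
```

The shift \vec{d} drops out of the covariance, as the derivation shows: only the centered deviations enter S_y.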
3 Block structure of the sample covariance
For the vector of variables \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}, we can divide it into two parts: \vec{X}^{(1)} = \begin{bmatrix} X_1 \\ \vdots \\ X_q \end{bmatrix} and \vec{X}^{(2)} = \begin{bmatrix} X_{q+1} \\ \vdots \\ X_p \end{bmatrix}. In other words,
\[
\vec{X} = \begin{bmatrix} \vec{X}^{(1)} \\ \vec{X}^{(2)} \end{bmatrix}, \qquad \text{in correspondence} \qquad \vec{x}_i = \begin{bmatrix} \vec{x}_i^{(1)} \\ \vec{x}_i^{(2)} \end{bmatrix},
\]
where
\[
\vec{x}_i^{(1)} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iq} \end{bmatrix}, \qquad
\vec{x}_i^{(2)} = \begin{bmatrix} x_{i(q+1)} \\ x_{i(q+2)} \\ \vdots \\ x_{ip} \end{bmatrix}.
\]
We have the partition of the sample mean and the sample covariance matrix as follows:
\[
\bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix},
\qquad
S = \begin{bmatrix}
s_{11} & \ldots & s_{1q} & s_{1,q+1} & \ldots & s_{1p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{q1} & \ldots & s_{qq} & s_{q,q+1} & \ldots & s_{qp} \\
s_{q+1,1} & \ldots & s_{q+1,q} & s_{q+1,q+1} & \ldots & s_{q+1,p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{p1} & \ldots & s_{pq} & s_{p,q+1} & \ldots & s_{pp}
\end{bmatrix}
= \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}.
\]
By definition, S_{11} is the sample covariance matrix of \vec{X}^{(1)} and S_{22} is the sample covariance matrix of \vec{X}^{(2)}. Here S_{12} is referred to as the sample cross-covariance matrix between \vec{X}^{(1)} and \vec{X}^{(2)}. In fact, we can derive the following formula:
\[
S_{21} = S_{12}^\top = \frac{1}{n-1}\sum_{i=1}^{n} \big(\vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)}\big)\big(\vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)}\big)^\top.
\]
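In code, the blocks are just submatrices of S, and the cross-covariance formula above can be checked directly; a sketch on random data with p = 5 variates split as q = 2 plus 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 5, 2                  # split the p = 5 variates into the first q = 2 and last 3
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)

# Blocks of the partitioned sample covariance matrix.
S11 = S[:q, :q]                     # covariance of the first q variates
S22 = S[q:, q:]                     # covariance of the last p - q variates
S12 = S[:q, q:]                     # cross-covariance block
S21 = S[q:, :q]

# S21 equals the cross-covariance formula and also S12 transposed.
Xc = X - X.mean(axis=0)
assert np.allclose(S21, Xc[:, q:].T @ Xc[:, :q] / (n - 1))
assert np.allclose(S21, S12.T)
```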
4 Standardization and Sample Correlation Matrix
For the data matrix (1.1), the sample mean vector is denoted as \bar{\vec{x}} and the sample covariance matrix as S. In particular, for j = 1, \ldots, p, let \bar{x}_j be the sample mean of the j-th variate and \sqrt{s_{jj}} its sample standard deviation. For each entry x_{ij}, i = 1, \ldots, n, j = 1, \ldots, p, we get the standardized entry
\[
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}.
\]
Then the data matrix X is standardized to
\[
Z = \begin{bmatrix}
z_{11} & z_{12} & \ldots & z_{1p} \\
z_{21} & z_{22} & \ldots & z_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \ldots & z_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}.
\]
Denote by R the sample covariance matrix of the sample \vec{z}_1, \ldots, \vec{z}_n. What is the connection between R and S? The i-th row of Z can be written as
\[
\begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ip} \end{bmatrix}
= \begin{bmatrix} (x_{i1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{i2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{ip} - \bar{x}_p)/\sqrt{s_{pp}} \end{bmatrix}
= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix}.
\]
Let
\[
V^{-\frac{1}{2}} = \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}.
\]
This transformation can be represented as
\[
\vec{z}_i = V^{-\frac{1}{2}}(\vec{x}_i - \bar{\vec{x}}) = V^{-\frac{1}{2}}\vec{x}_i - V^{-\frac{1}{2}}\bar{\vec{x}}, \qquad i = 1, \ldots, n.
\]
This implies that the sample mean for the new data matrix is
\[
\bar{\vec{z}} = V^{-\frac{1}{2}}(\bar{\vec{x}} - \bar{\vec{x}}) = \vec{0}.
\]
By the formula for the sample covariance of linear transformations of variates, the sample covariance matrix for the new data matrix Z is
\[
R = V^{-\frac{1}{2}} S \big(V^{-\frac{1}{2}}\big)^\top
= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix}
s_{11} & s_{12} & \ldots & s_{1p} \\
s_{21} & s_{22} & \ldots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \ldots & s_{pp}
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
= \begin{bmatrix}
1 & \frac{s_{12}}{\sqrt{s_{11}s_{22}}} & \ldots & \frac{s_{1p}}{\sqrt{s_{11}s_{pp}}} \\
\frac{s_{21}}{\sqrt{s_{22}s_{11}}} & 1 & \ldots & \frac{s_{2p}}{\sqrt{s_{22}s_{pp}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{s_{p1}}{\sqrt{s_{pp}s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp}s_{22}}} & \ldots & 1
\end{bmatrix}
:= \begin{bmatrix}
r_{11} & r_{12} & \ldots & r_{1p} \\
r_{21} & r_{22} & \ldots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \ldots & r_{pp}
\end{bmatrix}.
\]
The matrix R is called the sample correlation matrix for the original data matrix X.
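The relation R = V^{-1/2} S V^{-1/2} can be checked numerically against both direct standardization and NumPy's built-in correlation estimator; a sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))

S = np.cov(X, rowvar=False)

# V^{-1/2}: diagonal matrix of reciprocal sample standard deviations.
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))

# R = V^{-1/2} S V^{-1/2} ...
R = V_inv_sqrt @ S @ V_inv_sqrt

# ... which equals the sample covariance of the standardized data Z ...
Z = (X - X.mean(axis=0)) / np.sqrt(np.diag(S))
assert np.allclose(R, np.cov(Z, rowvar=False))

# ... and matches NumPy's built-in sample correlation matrix.
assert np.allclose(R, np.corrcoef(X, rowvar=False))
assert np.allclose(np.diag(R), 1.0)   # diagonal entries r_jj = 1
```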
5 Mahalanobis distance and mean-centered ellipse
Sample covariance is p.s.d. Recall that the sample covariance is
\[
S = \frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\]
Is S always positive semidefinite? Consider the spectral decomposition
\[
S = \sum_{j=1}^{p} \lambda_j \vec{u}_j \vec{u}_j^\top.
\]
Then S\vec{u}_j = \lambda_j \vec{u}_j, which implies that
\[
\vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top (\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j.
\]
On the other hand,
\[
\begin{aligned}
\vec{u}_j^\top S \vec{u}_j
&= \vec{u}_j^\top \Big(\frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top\Big) \vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \big|\vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})\big|^2 \ge 0.
\end{aligned}
\]
This implies that all eigenvalues of S are nonnegative, so S is positive semidefinite.
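A quick numerical illustration of this fact: compute the spectral decomposition of a sample covariance matrix and confirm every eigenvalue \lambda_j = \vec{u}_j^\top S \vec{u}_j is nonnegative (a sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))

S = np.cov(X, rowvar=False)

# Spectral decomposition of the symmetric matrix S.
eigvals, U = np.linalg.eigh(S)      # columns of U are the eigenvectors u_j

# Every eigenvalue is nonnegative (up to floating-point rounding),
# confirming S is positive semidefinite.
assert np.all(eigvals >= -1e-12)

# Each lambda_j equals the quadratic form u_j^T S u_j, as derived above.
for j in range(4):
    u = U[:, j]
    assert np.isclose(u @ S @ u, eigvals[j])
```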
In this course, we always assume n > p and that S is positive definite, which implies that the inverse sample covariance matrix S^{-1} is positive definite as well.
Mahalanobis distance: For any two vectors \vec{x}, \vec{y} \in \mathbb{R}^p, their Mahalanobis distance based on S is defined as
\[
d_M(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})}.
\]
By the spectral decomposition of S^{-1},
\[
S^{-1} = \sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top,
\]
the Mahalanobis distance is well-defined since
\[
(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})
= (\vec{x} - \vec{y})^\top \Big(\sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top\Big) (\vec{x} - \vec{y})
= \sum_{j=1}^{p} \frac{1}{\lambda_j} \big|(\vec{x} - \vec{y})^\top \vec{u}_j\big|^2 \ge 0.
\]
The mean-centered ellipse with Mahalanobis radius c is defined as
\[
\{\vec{x} \in \mathbb{R}^p : d_M(\vec{x}, \bar{\vec{x}}) \le c\} = \{\vec{x} \in \mathbb{R}^p : (\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}}) \le c^2\}.
\]
Mean-centered ellipse: For any \vec{x}, we have
\[
(\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}})
= (\vec{x} - \bar{\vec{x}})^\top \Big(\sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top\Big) (\vec{x} - \bar{\vec{x}})
= \sum_{j=1}^{p} \frac{1}{\lambda_j} \big|(\vec{x} - \bar{\vec{x}})^\top \vec{u}_j\big|^2.
\]
Consider a new Cartesian coordinate system with center \bar{\vec{x}} and axes \vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p. The coordinate of \vec{x} along the axis \vec{u}_j becomes w_j = (\vec{x} - \bar{\vec{x}})^\top \vec{u}_j, j = 1, \ldots, p. Then the mean-centered ellipse \{\vec{x} : (\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}}) \le c^2\} becomes
\[
\Big\{\vec{w} : \sum_{j=1}^{p} \frac{1}{(\sqrt{\lambda_j})^2} w_j^2 \le c^2\Big\}
= \Big\{\vec{w} : \sum_{j=1}^{p} \frac{1}{(c\sqrt{\lambda_j})^2} w_j^2 \le 1\Big\}
\]
in the new coordinate system, which is an ellipse with half-axis lengths c\sqrt{\lambda_1}, c\sqrt{\lambda_2}, \ldots, c\sqrt{\lambda_p}.
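The squared Mahalanobis distance and its eigen-coordinate form \sum_j w_j^2/\lambda_j can be checked against each other numerically; a sketch on random 2-variate data with an arbitrary test point:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))       # n = 100 > p = 2, so S is invertible here

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_inv = np.linalg.inv(S)

def mahalanobis(x, y):
    """d_M(x, y) = sqrt((x - y)^T S^{-1} (x - y))."""
    diff = x - y
    return np.sqrt(diff @ S_inv @ diff)

x = np.array([1.0, 2.0])            # an arbitrary test point

# Equivalent computation through the spectral decomposition of S:
# sum_j (1/lambda_j) |(x - x_bar)^T u_j|^2.
lam, U = np.linalg.eigh(S)
w = U.T @ (x - x_bar)               # coordinates w_j along the eigenvector axes
assert np.isclose(mahalanobis(x, x_bar) ** 2, np.sum(w ** 2 / lam))
```

Points at a fixed Mahalanobis distance c from \bar{\vec{x}} trace out exactly the mean-centered ellipse described above.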
[Figure: a mean-centered ellipse in the (X_1, X_2) plane, centered at \bar{\vec{x}}, with axes along \vec{u}_1 and \vec{u}_2 and half-axis lengths c\sqrt{\lambda_1} and c\sqrt{\lambda_2}.]
6 Examples
Example 1: Consider a 2-variate data matrix
\[
X = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
\vdots & \vdots \\
x_{n1} & x_{n2}
\end{bmatrix}
\]
with sample mean vector \bar{\vec{x}} and sample covariance matrix S_{\vec{x}}.
Define the new sample y_1 = x_{11} + x_{12}, \; y_2 = x_{21} + x_{22}, \; \ldots, \; y_n = x_{n1} + x_{n2}.
Can we compute its sample mean and sample variance directly through \bar{\vec{x}} and S_{\vec{x}}? Denote C = [1, 1]. Then
\[
y_i = x_{i1} + x_{i2} = [1, 1] \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C\vec{x}_i.
\]
The sample mean of y_1, \ldots, y_n can be represented as
\[
\bar{y} = \frac{1}{n}\big[(x_{11} + x_{12}) + \cdots + (x_{n1} + x_{n2})\big]
= \frac{1}{n}\big[x_{11} + \cdots + x_{n1}\big] + \frac{1}{n}\big[x_{12} + \cdots + x_{n2}\big]
= \bar{x}_1 + \bar{x}_2 = C\bar{\vec{x}}.
\]
Represent the sample variance of y_1, \ldots, y_n by s_y^2. Then
\[
\begin{aligned}
(n-1)s_y^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \big((x_{i1} + x_{i2}) - (\bar{x}_1 + \bar{x}_2)\big)^2 \\
&= \sum_{i=1}^{n} \big((x_{i1} - \bar{x}_1) + (x_{i2} - \bar{x}_2)\big)^2 \\
&= \sum_{i=1}^{n} \big[(x_{i1} - \bar{x}_1)^2 + 2(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + (x_{i2} - \bar{x}_2)^2\big] \\
&= \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + 2\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2 \\
&= (n-1)s_{11} + 2(n-1)s_{12} + (n-1)s_{22}.
\end{aligned}
\]
Then
\[
s_y^2 = s_{11} + 2s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22}
= [1, 1] \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= C S_{\vec{x}} C^\top.
\]
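Example 1 can be reproduced numerically: sum the two columns and compare the resulting sample mean and variance with C\bar{\vec{x}} and C S_{\vec{x}} C^\top (a sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 2))        # a 2-variate sample

y = X[:, 0] + X[:, 1]               # y_i = x_i1 + x_i2, i.e. C = [1, 1]

x_bar = X.mean(axis=0)
Sx = np.cov(X, rowvar=False)
C = np.array([1.0, 1.0])

# Sample mean: y_bar = x_bar_1 + x_bar_2 = C x_bar.
assert np.isclose(y.mean(), C @ x_bar)

# Sample variance: s_y^2 = s11 + 2 s12 + s22 = C S_x C^T
# (np.var with ddof=1 uses the 1/(n-1) convention).
assert np.isclose(np.var(y, ddof=1), C @ Sx @ C)
```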
Example 2: Suppose X \in \mathbb{R}^{n \times 4} is a data matrix for the variables \vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}, with the following sample covariance matrix:
\[
S_x = \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}.
\]
What is the sample cross-covariance matrix between \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} and \begin{bmatrix} X_2 \\ X_4 \end{bmatrix}?
Solution: Since
\[
\vec{Y} := \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}
=: C\vec{X},
\]
we know its sample covariance matrix is
\[
S_y = C S_x C^\top
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 \\
0 & 2 & 1 & 0 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{bmatrix}.
\]
From the partition
\[
\vec{Y} = \begin{bmatrix} \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} \\ \begin{bmatrix} X_2 \\ X_4 \end{bmatrix} \end{bmatrix}
\]
we have the partition
\[
S_y = \left[\begin{array}{cc|cc}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
\hline
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{array}\right].
\]
Then the sample cross-covariance matrix between \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} and \begin{bmatrix} X_2 \\ X_4 \end{bmatrix} is \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}. This result can be verified entrywise.
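The matrix arithmetic in Example 2 can be verified in a few lines: build the permutation matrix C, form S_y = C S_x C^\top, and read off the top-right block:

```python
import numpy as np

# The given 4 x 4 sample covariance matrix from Example 2.
Sx = np.array([[2.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 1.0, 0.0],
               [0.0, 1.0, 2.0, 1.0],
               [0.0, 0.0, 1.0, 2.0]])

# Permutation matrix reordering (X1, X2, X3, X4) to (X1, X3, X2, X4).
C = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

Sy = C @ Sx @ C.T

# The top-right 2 x 2 block of Sy is the cross-covariance
# between (X1, X3) and (X2, X4).
S12 = Sy[:2, 2:]
assert np.array_equal(S12, np.array([[0.0, 0.0],
                                     [1.0, 1.0]]))
```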