STA135 Lecture 2: Sample Mean Vector and Sample Covariance Matrix
Xiaodong Li UC Davis
1 Sample mean and sample covariance
Recall that in the 1-dimensional case, for a sample x_1, \ldots, x_n, we define the sample mean
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]
and the (unbiased) sample variance
\[
s^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.
\]
p-dimensional case: Suppose we have p variates X_1, \ldots, X_p. For the vector of variates
\[
\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix},
\]
we have a p-variate sample of size n: \vec{x}_1, \ldots, \vec{x}_n \in \mathbb{R}^p. This sample of n observations gives the following data matrix:
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \ldots & x_{1p} \\
x_{21} & x_{22} & \ldots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \ldots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1}
\]
Notice that here each column in the data matrix corresponds to a particular variate Xj.
Sample mean: For each variate X_j, define the sample mean
\[
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad j = 1, \ldots, p.
\]
Then the sample mean vector is
\[
\bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n} x_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^{n} \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^{n} \vec{x}_i.
\]
Sample covariance matrix: For each variate X_j, define its sample variance as
\[
s_{jj} = s_j^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad j = 1, \ldots, p,
\]
and the sample covariance between X_j and X_k as
\[
s_{jk} = s_{kj} := \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad 1 \le j, k \le p, \; j \ne k.
\]
The sample covariance matrix is defined as
\[
S = \begin{bmatrix}
s_{11} & s_{12} & \ldots & s_{1p} \\
s_{21} & s_{22} & \ldots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \ldots & s_{pp}
\end{bmatrix}.
\]
Then
\[
\begin{aligned}
S &= \begin{bmatrix}
\frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)^2 & \ldots & \frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
\frac{1}{n-1}\sum_{i=1}^{n}(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \ldots & \frac{1}{n-1}\sum_{i=1}^{n}(x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n}
\begin{bmatrix}
(x_{i1}-\bar{x}_1)^2 & \ldots & (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \ldots & (x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n}
\begin{bmatrix} x_{i1}-\bar{x}_1 \\ \vdots \\ x_{ip}-\bar{x}_p \end{bmatrix}
\begin{bmatrix} x_{i1}-\bar{x}_1 & \ldots & x_{ip}-\bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\end{aligned}
\]
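As a quick numerical check of these definitions, the following sketch computes the sample mean vector and the sample covariance matrix with NumPy (the data matrix here is arbitrary illustrative data, not from the lecture):

```python
import numpy as np

# A small 3-variate sample with n = 5 observations (illustrative data).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 4.0, 2.5],
              [4.0, 3.0, 3.5],
              [5.0, 5.0, 4.0]])
n, p = X.shape

# Sample mean vector: average of the rows of the data matrix.
x_bar = X.mean(axis=0)

# Sample covariance via the outer-product formula,
# S = (1/(n-1)) * sum_i (x_i - x_bar)(x_i - x_bar)^T.
Xc = X - x_bar                      # mean-centered data matrix
S = Xc.T @ Xc / (n - 1)

# Cross-check against NumPy's built-in estimator (rowvar=False treats
# columns as variates; the default already divides by n - 1).
assert np.allclose(S, np.cov(X, rowvar=False))
```

Note that centering the data matrix and forming `Xc.T @ Xc / (n - 1)` is exactly the sum of outer products written above, just expressed as one matrix product.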
2 Linear transformation of observations
Consider a sample of \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix} with size n: \vec{x}_1, \ldots, \vec{x}_n. The corresponding data matrix is represented as
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \ldots & x_{1p} \\
x_{21} & x_{22} & \ldots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \ldots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}.
\]
For some C \in \mathbb{R}^{q \times p} and \vec{d} \in \mathbb{R}^q, consider the linear transformation
\[
\vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C\vec{X} + \vec{d}.
\]
Then we get a q-variate sample:
\[
\vec{y}_i = C\vec{x}_i + \vec{d}, \qquad i = 1, \ldots, n.
\]
The sample mean of \vec{y}_1, \ldots, \vec{y}_n is
\[
\bar{\vec{y}} = \frac{1}{n}\sum_{i=1}^{n} \vec{y}_i = \frac{1}{n}\sum_{i=1}^{n} (C\vec{x}_i + \vec{d}) = C\Big(\frac{1}{n}\sum_{i=1}^{n} \vec{x}_i\Big) + \vec{d} = C\bar{\vec{x}} + \vec{d}.
\]
And the sample covariance is
\[
\begin{aligned}
S_y &= \frac{1}{n-1}\sum_{i=1}^{n} (\vec{y}_i - \bar{\vec{y}})(\vec{y}_i - \bar{\vec{y}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (C\vec{x}_i - C\bar{\vec{x}})(C\vec{x}_i - C\bar{\vec{x}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^{n} C(\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top C^\top \\
&= C\Big(\frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top\Big) C^\top \\
&= C S_x C^\top.
\end{aligned}
\]
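The two identities \bar{\vec{y}} = C\bar{\vec{x}} + \vec{d} and S_y = C S_x C^\top can be verified numerically; the sketch below uses an arbitrary transformation C and shift \vec{d} on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
X = rng.normal(size=(n, p))          # original p-variate sample, one row per observation

# An arbitrary linear transformation y_i = C x_i + d (illustrative values).
C = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])     # q x p
d = np.array([3.0, -1.0])            # shift in R^q

Y = X @ C.T + d                      # transformed sample, one row per y_i

# Sample means and covariances (np.cov uses the 1/(n-1) convention by default).
x_bar = X.mean(axis=0)
y_bar = Y.mean(axis=0)
Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)

# The identities derived above.
assert np.allclose(y_bar, C @ x_bar + d)
assert np.allclose(Sy, C @ Sx @ C.T)
```

The shift \vec{d} drops out of the covariance, as the derivation shows: only the centered deviations enter S_y.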
3 Block structure of the sample covariance
For the vector of variables \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}, we can divide it into two parts: \vec{X}^{(1)} = \begin{bmatrix} X_1 \\ \vdots \\ X_q \end{bmatrix} and \vec{X}^{(2)} = \begin{bmatrix} X_{q+1} \\ \vdots \\ X_p \end{bmatrix}. In other words,
\[
\vec{X} = \begin{bmatrix} \vec{X}^{(1)} \\ \vec{X}^{(2)} \end{bmatrix}, \qquad \text{in correspondence} \qquad \vec{x}_i = \begin{bmatrix} \vec{x}_i^{(1)} \\ \vec{x}_i^{(2)} \end{bmatrix},
\]
where
\[
\vec{x}_i^{(1)} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iq} \end{bmatrix}, \qquad
\vec{x}_i^{(2)} = \begin{bmatrix} x_{i(q+1)} \\ x_{i(q+2)} \\ \vdots \\ x_{ip} \end{bmatrix}.
\]
We have the partition of the sample mean and the sample covariance matrix as follows:
\[
\bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix},
\qquad
S = \begin{bmatrix}
s_{11} & \ldots & s_{1q} & s_{1,q+1} & \ldots & s_{1p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{q1} & \ldots & s_{qq} & s_{q,q+1} & \ldots & s_{qp} \\
s_{q+1,1} & \ldots & s_{q+1,q} & s_{q+1,q+1} & \ldots & s_{q+1,p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{p1} & \ldots & s_{pq} & s_{p,q+1} & \ldots & s_{pp}
\end{bmatrix}
= \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}.
\]
By definition, S_{11} is the sample covariance matrix of \vec{X}^{(1)} and S_{22} is the sample covariance matrix of \vec{X}^{(2)}. Here S_{12} is referred to as the sample cross-covariance matrix between \vec{X}^{(1)} and \vec{X}^{(2)}. In fact, we can derive the following formula:
\[
S_{21} = S_{12}^\top = \frac{1}{n-1}\sum_{i=1}^{n} \big(\vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)}\big)\big(\vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)}\big)^\top.
\]
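In code, the blocks are just submatrices of S, and the cross-covariance formula above can be checked directly; a sketch on random data with p = 5 variates split as q = 2 plus 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 5, 2                  # split the p = 5 variates into the first q = 2 and last 3
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)

# Blocks of the partitioned sample covariance matrix.
S11 = S[:q, :q]                     # covariance of the first q variates
S22 = S[q:, q:]                     # covariance of the last p - q variates
S12 = S[:q, q:]                     # cross-covariance block
S21 = S[q:, :q]

# S21 equals the cross-covariance formula and also S12 transposed.
Xc = X - X.mean(axis=0)
assert np.allclose(S21, Xc[:, q:].T @ Xc[:, :q] / (n - 1))
assert np.allclose(S21, S12.T)
```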
4 Standardization and Sample Correlation Matrix
For the data matrix (1.1), the sample mean vector is denoted as \bar{\vec{x}} and the sample covariance matrix as S. In particular, for j = 1, \ldots, p, let \bar{x}_j be the sample mean of the j-th variate and \sqrt{s_{jj}} its sample standard deviation. For each entry x_{ij}, i = 1, \ldots, n, j = 1, \ldots, p, we get the standardized entry
\[
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}.
\]
Then the data matrix X is standardized to
\[
Z = \begin{bmatrix}
z_{11} & z_{12} & \ldots & z_{1p} \\
z_{21} & z_{22} & \ldots & z_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \ldots & z_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}.
\]
Denote by R the sample covariance matrix of the sample \vec{z}_1, \ldots, \vec{z}_n. What is the connection between R and S? The i-th row of Z can be written as
\[
\begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ip} \end{bmatrix}
= \begin{bmatrix} (x_{i1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{i2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{ip} - \bar{x}_p)/\sqrt{s_{pp}} \end{bmatrix}
= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix}.
\]
Let
\[
V^{-\frac{1}{2}} = \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}.
\]
This transformation can be represented as
\[
\vec{z}_i = V^{-\frac{1}{2}}(\vec{x}_i - \bar{\vec{x}}) = V^{-\frac{1}{2}}\vec{x}_i - V^{-\frac{1}{2}}\bar{\vec{x}}, \qquad i = 1, \ldots, n.
\]
This implies that the sample mean for the new data matrix is
\[
\bar{\vec{z}} = V^{-\frac{1}{2}}(\bar{\vec{x}} - \bar{\vec{x}}) = \vec{0}.
\]
By the formula for the sample covariance of linear transformations of variates, the sample covariance matrix for the new data matrix Z is
\[
R = V^{-\frac{1}{2}} S \big(V^{-\frac{1}{2}}\big)^\top
= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix}
s_{11} & s_{12} & \ldots & s_{1p} \\
s_{21} & s_{22} & \ldots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \ldots & s_{pp}
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
= \begin{bmatrix}
1 & \frac{s_{12}}{\sqrt{s_{11}s_{22}}} & \ldots & \frac{s_{1p}}{\sqrt{s_{11}s_{pp}}} \\
\frac{s_{21}}{\sqrt{s_{22}s_{11}}} & 1 & \ldots & \frac{s_{2p}}{\sqrt{s_{22}s_{pp}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{s_{p1}}{\sqrt{s_{pp}s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp}s_{22}}} & \ldots & 1
\end{bmatrix}
:= \begin{bmatrix}
r_{11} & r_{12} & \ldots & r_{1p} \\
r_{21} & r_{22} & \ldots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \ldots & r_{pp}
\end{bmatrix}.
\]
The matrix R is called the sample correlation matrix for the original data matrix X.
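The relation R = V^{-1/2} S V^{-1/2} can be checked numerically against both direct standardization and NumPy's built-in correlation estimator; a sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))

S = np.cov(X, rowvar=False)

# V^{-1/2}: diagonal matrix of reciprocal sample standard deviations.
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))

# R = V^{-1/2} S V^{-1/2} ...
R = V_inv_sqrt @ S @ V_inv_sqrt

# ... which equals the sample covariance of the standardized data Z ...
Z = (X - X.mean(axis=0)) / np.sqrt(np.diag(S))
assert np.allclose(R, np.cov(Z, rowvar=False))

# ... and matches NumPy's built-in sample correlation matrix.
assert np.allclose(R, np.corrcoef(X, rowvar=False))
assert np.allclose(np.diag(R), 1.0)   # diagonal entries r_jj = 1
```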
5 Mahalanobis distance and mean-centered ellipse
Sample covariance is p.s.d. Recall that the sample covariance is
\[
S = \frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\]
Is S always positive semidefinite? Consider the spectral decomposition
\[
S = \sum_{j=1}^{p} \lambda_j \vec{u}_j \vec{u}_j^\top.
\]
Then S\vec{u}_j = \lambda_j \vec{u}_j, which implies that
\[
\vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top (\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j.
\]
On the other hand,
\[
\begin{aligned}
\vec{u}_j^\top S \vec{u}_j
&= \vec{u}_j^\top \Big(\frac{1}{n-1}\sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top\Big) \vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^{n} \big|\vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})\big|^2 \ge 0.
\end{aligned}
\]
This implies that all eigenvalues of S are nonnegative, so S is positive semidefinite.
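A quick numerical illustration of this fact: compute the spectral decomposition of a sample covariance matrix and confirm every eigenvalue \lambda_j = \vec{u}_j^\top S \vec{u}_j is nonnegative (a sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))

S = np.cov(X, rowvar=False)

# Spectral decomposition of the symmetric matrix S.
eigvals, U = np.linalg.eigh(S)      # columns of U are the eigenvectors u_j

# Every eigenvalue is nonnegative (up to floating-point rounding),
# confirming S is positive semidefinite.
assert np.all(eigvals >= -1e-12)

# Each lambda_j equals the quadratic form u_j^T S u_j, as derived above.
for j in range(4):
    u = U[:, j]
    assert np.isclose(u @ S @ u, eigvals[j])
```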
In this course, we always assume n > p and that S is positive definite, which implies that the inverse sample covariance matrix S^{-1} is positive definite as well.
Mahalanobis distance: For any two vectors \vec{x}, \vec{y} \in \mathbb{R}^p, their Mahalanobis distance based on S is defined as
\[
d_M(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})}.
\]
By the spectral decomposition of S^{-1},
\[
S^{-1} = \sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top,
\]
the Mahalanobis distance is well-defined since
\[
(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})
= (\vec{x} - \vec{y})^\top \Big(\sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top\Big) (\vec{x} - \vec{y})
= \sum_{j=1}^{p} \frac{1}{\lambda_j} \big|(\vec{x} - \vec{y})^\top \vec{u}_j\big|^2 \ge 0.
\]
The mean-centered ellipse with Mahalanobis radius c is defined as
\[
\{\vec{x} \in \mathbb{R}^p : d_M(\vec{x}, \bar{\vec{x}}) \le c\} = \{\vec{x} \in \mathbb{R}^p : (\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}}) \le c^2\}.
\]
Mean-centered ellipse: For any \vec{x}, we have
\[
(\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}})
= (\vec{x} - \bar{\vec{x}})^\top \Big(\sum_{j=1}^{p} \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top\Big) (\vec{x} - \bar{\vec{x}})
= \sum_{j=1}^{p} \frac{1}{\lambda_j} \big|(\vec{x} - \bar{\vec{x}})^\top \vec{u}_j\big|^2.
\]
Consider a new Cartesian coordinate system with center \bar{\vec{x}} and axes \vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p. The coordinate of \vec{x} along the axis \vec{u}_j becomes w_j = (\vec{x} - \bar{\vec{x}})^\top \vec{u}_j, j = 1, \ldots, p. Then the mean-centered ellipse \{\vec{x} : (\vec{x} - \bar{\vec{x}})^\top S^{-1} (\vec{x} - \bar{\vec{x}}) \le c^2\} becomes
\[
\Big\{\vec{w} : \sum_{j=1}^{p} \frac{1}{(\sqrt{\lambda_j})^2} w_j^2 \le c^2\Big\}
= \Big\{\vec{w} : \sum_{j=1}^{p} \frac{1}{(c\sqrt{\lambda_j})^2} w_j^2 \le 1\Big\}
\]
in the new coordinate system, which is an ellipse with half-axis lengths c\sqrt{\lambda_1}, c\sqrt{\lambda_2}, \ldots, c\sqrt{\lambda_p}.
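The squared Mahalanobis distance and its eigen-coordinate form \sum_j w_j^2/\lambda_j can be checked against each other numerically; a sketch on random 2-variate data with an arbitrary test point:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))       # n = 100 > p = 2, so S is invertible here

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_inv = np.linalg.inv(S)

def mahalanobis(x, y):
    """d_M(x, y) = sqrt((x - y)^T S^{-1} (x - y))."""
    diff = x - y
    return np.sqrt(diff @ S_inv @ diff)

x = np.array([1.0, 2.0])            # an arbitrary test point

# Equivalent computation through the spectral decomposition of S:
# sum_j (1/lambda_j) |(x - x_bar)^T u_j|^2.
lam, U = np.linalg.eigh(S)
w = U.T @ (x - x_bar)               # coordinates w_j along the eigenvector axes
assert np.isclose(mahalanobis(x, x_bar) ** 2, np.sum(w ** 2 / lam))
```

Points at a fixed Mahalanobis distance c from \bar{\vec{x}} trace out exactly the mean-centered ellipse described above.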
[Figure: a mean-centered ellipse in the (X_1, X_2) plane, centered at \bar{\vec{x}}, with axes along \vec{u}_1 and \vec{u}_2 and half-axis lengths c\sqrt{\lambda_1} and c\sqrt{\lambda_2}.]
6 Examples
Example 1: Consider a 2-variate data matrix
\[
X = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
\vdots & \vdots \\
x_{n1} & x_{n2}
\end{bmatrix}
\]
with sample mean vector \bar{\vec{x}} and sample covariance matrix S_{\vec{x}}.
Define the new sample y_1 = x_{11} + x_{12}, \; y_2 = x_{21} + x_{22}, \; \ldots, \; y_n = x_{n1} + x_{n2}.
Can we compute its sample mean and sample variance directly through \bar{\vec{x}} and S_{\vec{x}}? Denote C = [1, 1]. Then
\[
y_i = x_{i1} + x_{i2} = [1, 1] \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C\vec{x}_i.
\]
The sample mean of y_1, \ldots, y_n can be represented as
\[
\bar{y} = \frac{1}{n}\big[(x_{11} + x_{12}) + \cdots + (x_{n1} + x_{n2})\big]
= \frac{1}{n}\big[x_{11} + \cdots + x_{n1}\big] + \frac{1}{n}\big[x_{12} + \cdots + x_{n2}\big]
= \bar{x}_1 + \bar{x}_2 = C\bar{\vec{x}}.
\]
Represent the sample variance of y_1, \ldots, y_n by s_y^2. Then
\[
\begin{aligned}
(n-1)s_y^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \big((x_{i1} + x_{i2}) - (\bar{x}_1 + \bar{x}_2)\big)^2 \\
&= \sum_{i=1}^{n} \big((x_{i1} - \bar{x}_1) + (x_{i2} - \bar{x}_2)\big)^2 \\
&= \sum_{i=1}^{n} \big[(x_{i1} - \bar{x}_1)^2 + 2(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + (x_{i2} - \bar{x}_2)^2\big] \\
&= \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + 2\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2 \\
&= (n-1)s_{11} + 2(n-1)s_{12} + (n-1)s_{22}.
\end{aligned}
\]
Then
\[
s_y^2 = s_{11} + 2s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22}
= [1, 1] \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= C S_{\vec{x}} C^\top.
\]
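Example 1 can be reproduced numerically: sum the two columns and compare the resulting sample mean and variance with C\bar{\vec{x}} and C S_{\vec{x}} C^\top (a sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 2))        # a 2-variate sample

y = X[:, 0] + X[:, 1]               # y_i = x_i1 + x_i2, i.e. C = [1, 1]

x_bar = X.mean(axis=0)
Sx = np.cov(X, rowvar=False)
C = np.array([1.0, 1.0])

# Sample mean: y_bar = x_bar_1 + x_bar_2 = C x_bar.
assert np.isclose(y.mean(), C @ x_bar)

# Sample variance: s_y^2 = s11 + 2 s12 + s22 = C S_x C^T
# (np.var with ddof=1 uses the 1/(n-1) convention).
assert np.isclose(np.var(y, ddof=1), C @ Sx @ C)
```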
Example 2: Suppose X \in \mathbb{R}^{n \times 4} is a data matrix for the variables \vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}, with the following sample covariance matrix:
\[
S_x = \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}.
\]
What is the sample cross-covariance matrix between \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} and \begin{bmatrix} X_2 \\ X_4 \end{bmatrix}?
Solution: Since
\[
\vec{Y} := \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}
=: C\vec{X},
\]
we know its sample covariance matrix is
\[
S_y = C S_x C^\top
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 \\
0 & 2 & 1 & 0 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{bmatrix}.
\]
From the partition
\[
\vec{Y} = \begin{bmatrix} \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} \\ \begin{bmatrix} X_2 \\ X_4 \end{bmatrix} \end{bmatrix}
\]
we have the partition
\[
S_y = \left[\begin{array}{cc|cc}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
\hline
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{array}\right].
\]
Then the sample cross-covariance matrix between \begin{bmatrix} X_1 \\ X_3 \end{bmatrix} and \begin{bmatrix} X_2 \\ X_4 \end{bmatrix} is \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}. This result can be verified entrywise.
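The matrix arithmetic in Example 2 can be verified in a few lines: build the permutation matrix C, form S_y = C S_x C^\top, and read off the top-right block:

```python
import numpy as np

# The given 4 x 4 sample covariance matrix from Example 2.
Sx = np.array([[2.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 1.0, 0.0],
               [0.0, 1.0, 2.0, 1.0],
               [0.0, 0.0, 1.0, 2.0]])

# Permutation matrix reordering (X1, X2, X3, X4) to (X1, X3, X2, X4).
C = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

Sy = C @ Sx @ C.T

# The top-right 2 x 2 block of Sy is the cross-covariance
# between (X1, X3) and (X2, X4).
S12 = Sy[:2, 2:]
assert np.array_equal(S12, np.array([[0.0, 0.0],
                                     [1.0, 1.0]]))
```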