
STA135 Lecture 2: Sample Mean Vector and Sample Covariance Matrix

Xiaodong Li UC Davis

1 Sample mean and sample covariance

Recall that in the 1-dimensional case, for a sample $x_1, \ldots, x_n$, we can define
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \]
as the sample mean and
\[ s^2 := \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 \]
as the (unbiased) sample variance.

$p$-dimensional case: Suppose we have $p$ variates $X_1, \ldots, X_p$. For the vector of variates
\[ \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}, \]

we have a $p$-variate sample of size $n$: $\vec{x}_1, \ldots, \vec{x}_n \in \mathbb{R}^p$. This sample of $n$ observations gives the following data matrix:

\[ \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1} \]

Notice that here each column in the data matrix corresponds to a particular variate $X_j$.

Sample mean: For each variate $X_j$, define the sample mean:

\[ \bar{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij}, \quad j = 1, \ldots, p. \]
Then the sample mean vector
\[ \bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \frac{1}{n}\sum_{i=1}^n x_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^n x_{ip} \end{bmatrix} = \frac{1}{n}\sum_{i=1}^n \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix} = \frac{1}{n}\sum_{i=1}^n \vec{x}_i. \]
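To make this concrete, here is a minimal NumPy sketch (the data matrix and all numbers are made up for illustration): the sample mean vector is simply the vector of column means of the data matrix.

\begin{verbatim}
import numpy as np

# Illustrative 3-observation, 2-variate data matrix: rows are observations.
X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [5.0, 8.0]])

xbar = X.mean(axis=0)   # column-wise means = sample mean vector
print(xbar)             # [3. 5.]
\end{verbatim}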

Sample variance: For each variate $X_j$, define its sample variance as
\[ s_{jj} = s_j^2 := \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2, \quad j = 1, \ldots, p, \]

and the sample covariance between $X_j$ and $X_k$:
\[ s_{jk} = s_{kj} := \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \quad 1 \le k, j \le p, \ j \ne k. \]
The sample covariance matrix is defined as
\[ S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix}. \]
Then
\[ \begin{aligned}
S &= \begin{bmatrix} \frac{1}{n-1}\sum_{i=1}^n (x_{i1}-\bar{x}_1)^2 & \cdots & \frac{1}{n-1}\sum_{i=1}^n (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\ \vdots & \ddots & \vdots \\ \frac{1}{n-1}\sum_{i=1}^n (x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \cdots & \frac{1}{n-1}\sum_{i=1}^n (x_{ip}-\bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n \begin{bmatrix} (x_{i1}-\bar{x}_1)^2 & \cdots & (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\ \vdots & \ddots & \vdots \\ (x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \cdots & (x_{ip}-\bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n \begin{bmatrix} x_{i1}-\bar{x}_1 \\ \vdots \\ x_{ip}-\bar{x}_p \end{bmatrix} \begin{bmatrix} x_{i1}-\bar{x}_1 & \cdots & x_{ip}-\bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\end{aligned} \]
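The final outer-product form translates directly into code. Below is a minimal NumPy sketch on synthetic data; np.cov with rowvar=False uses the same $1/(n-1)$ normalization, so the two computations should agree.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))             # synthetic data matrix

xbar = X.mean(axis=0)
D = X - xbar                            # rows are (x_i - xbar)^T
S = D.T @ D / (n - 1)                   # sum of outer products over n - 1

assert np.allclose(S, np.cov(X, rowvar=False))   # matches the built-in
\end{verbatim}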

2 Linear transformation of observations

Consider a sample of $\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}$ with size $n$: $\vec{x}_1, \ldots, \vec{x}_n$. The corresponding data matrix is represented as
\[ \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \]

For some $C \in \mathbb{R}^{q\times p}$ and $\vec{d} \in \mathbb{R}^q$, consider the linear transformation
\[ \vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C\vec{X} + \vec{d}. \]

Then we get a $q$-variate sample:
\[ \vec{y}_i = C\vec{x}_i + \vec{d}, \quad i = 1, \ldots, n. \]

The sample mean of $\vec{y}_1, \ldots, \vec{y}_n$ is
\[ \bar{\vec{y}} = \frac{1}{n}\sum_{i=1}^n \vec{y}_i = \frac{1}{n}\sum_{i=1}^n (C\vec{x}_i + \vec{d}) = C\left(\frac{1}{n}\sum_{i=1}^n \vec{x}_i\right) + \vec{d} = C\bar{\vec{x}} + \vec{d}, \]
and the sample covariance is

\[ \begin{aligned}
S_y &= \frac{1}{n-1}\sum_{i=1}^n (\vec{y}_i - \bar{\vec{y}})(\vec{y}_i - \bar{\vec{y}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^n (C\vec{x}_i - C\bar{\vec{x}})(C\vec{x}_i - C\bar{\vec{x}})^\top \\
&= \frac{1}{n-1}\sum_{i=1}^n C(\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top C^\top \\
&= C\left(\frac{1}{n-1}\sum_{i=1}^n (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top\right) C^\top \\
&= C S_x C^\top.
\end{aligned} \]
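Both identities can be checked numerically. A minimal sketch with synthetic data and an arbitrary $C$ and $\vec{d}$ (all values illustrative):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 100, 3, 2
X = rng.normal(size=(n, p))      # synthetic sample x_1, ..., x_n
C = rng.normal(size=(q, p))      # arbitrary transformation matrix
d = rng.normal(size=q)           # arbitrary shift vector

Y = X @ C.T + d                  # row i is y_i = C x_i + d

Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)
assert np.allclose(Y.mean(axis=0), C @ X.mean(axis=0) + d)  # ybar = C xbar + d
assert np.allclose(Sy, C @ Sx @ C.T)                        # Sy = C Sx C^T
\end{verbatim}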

3 Block structure of the sample covariance

For the vector of variables $\vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}$, we can divide it into two parts: $\vec{X}^{(1)} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_q \end{bmatrix}$ and $\vec{X}^{(2)} = \begin{bmatrix} X_{q+1} \\ X_{q+2} \\ \vdots \\ X_p \end{bmatrix}$. In other words,
\[ \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_q \\ X_{q+1} \\ \vdots \\ X_p \end{bmatrix} = \begin{bmatrix} \vec{X}^{(1)} \\ \vec{X}^{(2)} \end{bmatrix}, \quad \text{in correspondence with} \quad \vec{x}_i = \begin{bmatrix} x_{i1} \\ \vdots \\ x_{iq} \\ x_{i(q+1)} \\ \vdots \\ x_{ip} \end{bmatrix} = \begin{bmatrix} \vec{x}_i^{(1)} \\ \vec{x}_i^{(2)} \end{bmatrix}, \]
where
\[ \vec{x}_i^{(1)} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iq} \end{bmatrix}, \quad \vec{x}_i^{(2)} = \begin{bmatrix} x_{i(q+1)} \\ x_{i(q+2)} \\ \vdots \\ x_{ip} \end{bmatrix}. \]

We have the partition of the sample mean vector and the sample covariance matrix as follows:
\[ \bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix}, \qquad S = \left[\begin{array}{ccc|ccc} s_{11} & \cdots & s_{1q} & s_{1,q+1} & \cdots & s_{1p} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ s_{q1} & \cdots & s_{qq} & s_{q,q+1} & \cdots & s_{qp} \\ \hline s_{q+1,1} & \cdots & s_{q+1,q} & s_{q+1,q+1} & \cdots & s_{q+1,p} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ s_{p1} & \cdots & s_{pq} & s_{p,q+1} & \cdots & s_{pp} \end{array}\right] = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}. \]

By definition, $S_{11}$ is the sample covariance matrix of $\vec{X}^{(1)}$ and $S_{22}$ is the sample covariance matrix of $\vec{X}^{(2)}$. Here $S_{12}$ is referred to as the sample cross-covariance matrix between $\vec{X}^{(1)}$ and $\vec{X}^{(2)}$. In fact, we can derive the following formula:

\[ S_{21} = S_{12}^\top = \frac{1}{n-1}\sum_{i=1}^n \left(\vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)}\right)\left(\vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)}\right)^\top. \]
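In code, the partition amounts to slicing sub-blocks out of $S$. A minimal sketch on synthetic data (the split point $q$ is arbitrary) that also checks the displayed formula for $S_{21}$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 80, 5, 2
X = rng.normal(size=(n, p))         # synthetic data matrix
S = np.cov(X, rowvar=False)

S11, S12 = S[:q, :q], S[:q, q:]     # blocks of the partitioned S
S21, S22 = S[q:, :q], S[q:, q:]

D = X - X.mean(axis=0)              # centered observations
S21_direct = D[:, q:].T @ D[:, :q] / (n - 1)   # the displayed formula
assert np.allclose(S21, S21_direct)
assert np.allclose(S21, S12.T)      # S21 = S12^T
\end{verbatim}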

4 Standardization and Sample Correlation Matrix

Consider the data matrix (1.1). The sample mean vector is denoted as $\bar{\vec{x}}$ and the sample covariance matrix as $S$. In particular, for $j = 1, \ldots, p$, let $\bar{x}_j$ be the sample mean of the $j$-th variate and $\sqrt{s_{jj}}$ be its sample standard deviation. For each entry $x_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, we get the standardized entry

\[ z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}. \]
Then the data matrix $\mathbf{X}$ is standardized to
\[ \mathbf{Z} = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1p} \\ z_{21} & z_{22} & \cdots & z_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{np} \end{bmatrix} = \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}. \]

Denote by $R$ the sample covariance matrix of the sample $\vec{z}_1, \ldots, \vec{z}_n$. What is the connection between $R$ and $S$? The $i$-th row of $\mathbf{Z}$ can be written as

   √   √1    zi1 (xi1 − x¯1)/ s11 s11 xi1 − x¯1 √ √1 zi2 (xi2 − x¯2)/ s22  s  xi2 − x¯2   =   =  22     .   .   ..   .   .   .   .   .  √  1  zip (xip − x¯p)/ spp √ xip − x¯p spp Let  √1  s11  √1  − 1  s22  V 2 =  .  .  ..    √1 spp This transformation can be represented as

\[ \vec{z}_i = V^{-1/2}(\vec{x}_i - \bar{\vec{x}}) = V^{-1/2}\vec{x}_i - V^{-1/2}\bar{\vec{x}}, \quad i = 1, \ldots, n. \]

This implies that the sample mean for the new data matrix is

\[ \bar{\vec{z}} = V^{-1/2}(\bar{\vec{x}} - \bar{\vec{x}}) = \vec{0}. \]

By the formula for the sample covariance under a linear transformation of observations, the sample covariance matrix for the new data matrix $\mathbf{Z}$ is
\[ R = V^{-1/2} S \left(V^{-1/2}\right)^\top = V^{-1/2} S V^{-1/2}, \]
where the last step uses that $V^{-1/2}$ is diagonal, hence symmetric.

 √1     √1  s11 s11 s12 . . . s1p s11 √1 √1  s  s21 s22 . . . s2p  s  =  22     22   ..   . . .. .   ..   .   . . . .   .   1   1  √ sp1 sp2 . . . spp √ spp spp s  1 √ s12 ... √ 1p    s11s22 s11spp r11 r12 . . . r1p s √ s21 2p  1 ... √  r21 r22 . . . r2p  s22s11 s22spp    =  . . . .  := . . . .  . . . .   . . .. .   . . . .    sp1 sp2 √ √ ... 1 rp1 rp2 . . . rpp spps11 spps22 The matrix R is called the sample correlation matrix for the original data matrix X.

5 Mahalanobis distance and mean-centered ellipse

Sample covariance is p.s.d. Recall that the sample covariance is

\[ S = \frac{1}{n-1}\sum_{i=1}^n (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top. \]
Is $S$ always positive semidefinite? Consider the spectral decomposition

\[ S = \sum_{j=1}^p \lambda_j \vec{u}_j \vec{u}_j^\top. \]

Then $S\vec{u}_j = \lambda_j\vec{u}_j$, which implies that
\[ \vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top(\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j. \]
On the other hand,
\[ \vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top \left(\frac{1}{n-1}\sum_{i=1}^n (\vec{x}_i-\bar{\vec{x}})(\vec{x}_i-\bar{\vec{x}})^\top\right) \vec{u}_j = \frac{1}{n-1}\sum_{i=1}^n \vec{u}_j^\top(\vec{x}_i-\bar{\vec{x}})(\vec{x}_i-\bar{\vec{x}})^\top \vec{u}_j = \frac{1}{n-1}\sum_{i=1}^n \left|\vec{u}_j^\top(\vec{x}_i-\bar{\vec{x}})\right|^2 \ge 0. \]
This implies that all eigenvalues of $S$ are nonnegative, so $S$ is positive semidefinite.

In this course, we always assume $n > p$ and that $S$ is positive definite, which implies that the inverse sample covariance matrix $S^{-1}$ is also positive definite.
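A quick numerical sanity check on synthetic data, using np.linalg.eigvalsh (the eigenvalue routine for symmetric matrices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 5                       # n > p, so S is almost surely positive definite
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)    # eigenvalues of the symmetric matrix S
assert np.all(eigvals >= -1e-12)   # nonnegative up to floating-point error
print(eigvals)
\end{verbatim}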

Mahalanobis distance For any two vectors $\vec{x}, \vec{y} \in \mathbb{R}^p$, their Mahalanobis distance based on $S^{-1}$ is defined as
\[ d_M(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})}. \]

By the spectral decomposition of $S^{-1}$:
\[ S^{-1} = \sum_{j=1}^p \frac{1}{\lambda_j} \vec{u}_j \vec{u}_j^\top, \]
the Mahalanobis distance is well-defined since

\[ (\vec{x}-\vec{y})^\top S^{-1}(\vec{x}-\vec{y}) = (\vec{x}-\vec{y})^\top \left(\sum_{j=1}^p \frac{1}{\lambda_j}\vec{u}_j\vec{u}_j^\top\right)(\vec{x}-\vec{y}) = \sum_{j=1}^p \frac{1}{\lambda_j}\left|(\vec{x}-\vec{y})^\top\vec{u}_j\right|^2 \ge 0. \]
The mean-centered ellipse with Mahalanobis radius $c$ is defined as
\[ \{\vec{x}\in\mathbb{R}^p : d_M(\vec{x}, \bar{\vec{x}}) \le c\} = \{\vec{x}\in\mathbb{R}^p : (\vec{x}-\bar{\vec{x}})^\top S^{-1}(\vec{x}-\bar{\vec{x}}) \le c^2\}. \]
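A direct translation of the definition into NumPy (synthetic data; solving the linear system $S\vec{w} = \vec{x} - \vec{y}$ rather than forming $S^{-1}$ explicitly is the usual, numerically safer choice):

\begin{verbatim}
import numpy as np

def mahalanobis(x, y, S):
    # Mahalanobis distance between x and y based on S^{-1}
    diff = x - y
    return np.sqrt(diff @ np.linalg.solve(S, diff))  # (x-y)^T S^{-1} (x-y)

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))       # synthetic sample
S = np.cov(X, rowvar=False)
xbar = X.mean(axis=0)

print(mahalanobis(X[0], xbar, S))  # distance of the first observation to xbar
\end{verbatim}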

Mean-centered ellipse For any $\vec{x}$, we have
\[ (\vec{x}-\bar{\vec{x}})^\top S^{-1}(\vec{x}-\bar{\vec{x}}) = (\vec{x}-\bar{\vec{x}})^\top \left(\sum_{j=1}^p \frac{1}{\lambda_j}\vec{u}_j\vec{u}_j^\top\right)(\vec{x}-\bar{\vec{x}}) = \sum_{j=1}^p \frac{1}{\lambda_j}\left|(\vec{x}-\bar{\vec{x}})^\top\vec{u}_j\right|^2. \]
Consider a new Cartesian coordinate system with center $\bar{\vec{x}}$ and axes $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p$. The coordinate of $\vec{x}$ along the axis $\vec{u}_j$ becomes $w_j = (\vec{x}-\bar{\vec{x}})^\top\vec{u}_j$, $j = 1, \ldots, p$. Then the mean-centered ellipse $\{\vec{x} : (\vec{x}-\bar{\vec{x}})^\top S^{-1}(\vec{x}-\bar{\vec{x}}) \le c^2\}$ becomes
\[ \left\{\vec{w} : \sum_{j=1}^p \frac{w_j^2}{(\sqrt{\lambda_j})^2} \le c^2\right\} = \left\{\vec{w} : \sum_{j=1}^p \frac{w_j^2}{(c\sqrt{\lambda_j})^2} \le 1\right\} \]
in the new coordinate system, which is an ellipse with half-axis lengths $c\sqrt{\lambda_1}, c\sqrt{\lambda_2}, \ldots, c\sqrt{\lambda_p}$.

[Figure: the mean-centered ellipse in the $(X_1, X_2)$ plane, centered at $\bar{\vec{x}}$, with principal axes $\vec{u}_1, \vec{u}_2$ and half-axis lengths $c\sqrt{\lambda_1}$ and $c\sqrt{\lambda_2}$.]
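The figure's geometry comes straight from the eigendecomposition of $S$. A minimal sketch (synthetic 2-variate data, arbitrary radius $c$) recovering the $w$-coordinates and the half-axis lengths:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))     # synthetic 2-variate sample
S = np.cov(X, rowvar=False)
xbar = X.mean(axis=0)
c = 2.0                           # Mahalanobis radius

lam, U = np.linalg.eigh(S)        # eigenvalues lam[j], eigenvectors U[:, j]
half_axes = c * np.sqrt(lam)      # half-axis lengths c * sqrt(lambda_j)

x = X[0]
w = U.T @ (x - xbar)              # new coordinates w_j = (x - xbar)^T u_j
d2 = (x - xbar) @ np.linalg.solve(S, x - xbar)
assert np.isclose(np.sum(w**2 / lam), d2)   # same quadratic form in w-coordinates
print("inside the ellipse:", d2 <= c**2, "| half-axis lengths:", half_axes)
\end{verbatim}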

6 Examples

Example 1 Consider a 2-variate data matrix
\[ \mathbf{X} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix} \]

with sample mean vector $\bar{\vec{x}}$ and sample covariance matrix $S_x$.

Define the new sample $y_1 = x_{11} + x_{12}$, $y_2 = x_{21} + x_{22}$, $\ldots$, $y_n = x_{n1} + x_{n2}$.

Can we compute its sample mean and sample variance directly through $\bar{\vec{x}}$ and $S_x$? Denote $C = [1, 1]$. Then
\[ y_i = x_{i1} + x_{i2} = [1, 1]\begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C\vec{x}_i. \]

The sample mean of $y_1, \ldots, y_n$ can be represented as
\[ \begin{aligned}
\bar{y} &= \frac{1}{n}\left[(x_{11}+x_{12}) + \cdots + (x_{n1}+x_{n2})\right] \\
&= \frac{1}{n}\left[x_{11} + \cdots + x_{n1}\right] + \frac{1}{n}\left[x_{12} + \cdots + x_{n2}\right] \\
&= \bar{x}_1 + \bar{x}_2 = C\bar{\vec{x}}.
\end{aligned} \]

Represent the sample variance of $y_1, \ldots, y_n$ by $s_y^2$. Then
\[ \begin{aligned}
(n-1)s_y^2 &= \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n \left((x_{i1}+x_{i2}) - (\bar{x}_1+\bar{x}_2)\right)^2 \\
&= \sum_{i=1}^n \left((x_{i1}-\bar{x}_1) + (x_{i2}-\bar{x}_2)\right)^2 \\
&= \sum_{i=1}^n \left[(x_{i1}-\bar{x}_1)^2 + 2(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2) + (x_{i2}-\bar{x}_2)^2\right] \\
&= \sum_{i=1}^n (x_{i1}-\bar{x}_1)^2 + 2\sum_{i=1}^n (x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2) + \sum_{i=1}^n (x_{i2}-\bar{x}_2)^2 \\
&= (n-1)s_{11} + 2(n-1)s_{12} + (n-1)s_{22}.
\end{aligned} \]

Then
\[ s_y^2 = s_{11} + 2s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22} = [1, 1]\begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = C S_x C^\top. \]
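A quick numerical check of $\bar{y} = C\bar{\vec{x}}$ and $s_y^2 = C S_x C^\top$ on a small made-up data matrix:

\begin{verbatim}
import numpy as np

X = np.array([[1.0, 2.0],
              [4.0, 1.0],
              [2.0, 6.0],
              [5.0, 3.0]])         # illustrative 2-variate sample
y = X[:, 0] + X[:, 1]              # y_i = x_i1 + x_i2

Sx = np.cov(X, rowvar=False)
C = np.array([1.0, 1.0])
assert np.isclose(y.mean(), C @ X.mean(axis=0))     # ybar = C xbar
assert np.isclose(np.var(y, ddof=1), C @ Sx @ C)    # s_y^2 = C Sx C^T
\end{verbatim}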

Example 2 Suppose $\mathbf{X} \in \mathbb{R}^{n\times 4}$ is a data matrix for the variables $\vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}$, with the following sample covariance matrix:
\[ S_x = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}. \]

What is the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$?

Solution Since
\[ \vec{Y} := \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} =: C\vec{X}, \]
we know its sample covariance matrix is

\[ \begin{aligned}
S_y = C S_x C^\top &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 1 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix}.
\end{aligned} \]

From the partition
\[ \vec{Y} = \left[\begin{array}{c} X_1 \\ X_3 \\ \hline X_2 \\ X_4 \end{array}\right], \]
we have the partition
\[ S_y = \left[\begin{array}{cc|cc} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 1 \\ \hline 0 & 1 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{array}\right]. \]
Then the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$ is $\begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$. This result can be verified entrywise.
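The whole calculation can be replayed in NumPy with the given $S_x$ (a sketch; only the matrices from this example are used):

\begin{verbatim}
import numpy as np

Sx = np.array([[2., 0., 0., 0.],
               [0., 2., 1., 0.],
               [0., 1., 2., 1.],
               [0., 0., 1., 2.]])

C = np.array([[1., 0., 0., 0.],    # reorders (X1, X2, X3, X4) -> (X1, X3, X2, X4)
              [0., 0., 1., 0.],
              [0., 1., 0., 0.],
              [0., 0., 0., 1.]])

Sy = C @ Sx @ C.T
S12 = Sy[:2, 2:]     # cross-covariance between (X1, X3) and (X2, X4)
print(S12)           # [[0. 0.]
                     #  [1. 1.]]
\end{verbatim}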
