
STA135 Lecture 3: Random Vectors, Multivariate Normality and Random Samples

Xiaodong Li UC Davis

1 Review of expectation, variance and covariance

Let's focus on the case of continuous random variables. Let $f_X(x)$ be the pdf of $X$. Its expectation is defined as
\[
E[X] := \mu_X := \int_{-\infty}^{\infty} x f_X(x)\,dx,
\]
and its variance is defined as
\[
\mathrm{Var}(X) := E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.
\]

Let $X$ and $Y$ be two jointly distributed random variables with joint pdf $f_{X,Y}(x,y)$. Their covariance is defined as
\[
\mathrm{Cov}(X,Y) = E\left[(X - E[X])(Y - E[Y])\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{X,Y}(x,y)\,dx\,dy.
\]

Properties of expectation, variance, covariance:

1. $\mathrm{Var}(X) = E(X^2) - (E(X))^2$.

2. $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$.

3. $E\left(a + \sum_{i=1}^n b_i X_i\right) = a + \sum_{i=1}^n b_i E(X_i)$.

4. $\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$.

5. $\mathrm{Var}(X) = \mathrm{Cov}(X,X)$.

6. $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$.

7. $\mathrm{Cov}\left(a + \sum_{i=1}^n b_i X_i,\; c + \sum_{j=1}^m d_j Y_j\right) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_i, Y_j)$.

8. $\mathrm{Var}\left(a + \sum_{i=1}^n b_i X_i\right) = \sum_{i=1}^n b_i^2\,\mathrm{Var}(X_i) + 2\sum_{1 \le i < j \le n} b_i b_j\,\mathrm{Cov}(X_i, X_j)$.
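As a quick numerical illustration of how these identities behave, the following Python sketch checks properties 2 and 4 by simulation; the distributions and the constants $a, b$ are arbitrary choices, not part of the lecture.

```python
import numpy as np

# Monte Carlo sanity check of properties 2 and 4 above; the distributions
# and constants are illustrative choices, not from the lecture.
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)          # any distribution works here
Y = 0.5 * X + rng.normal(size=n)                # a variable correlated with X

a, b = 3.0, -2.0
# Property 4: Var(a + bX) = b^2 Var(X)
print(np.var(a + b * X), b**2 * np.var(X))
# Property 2: Cov(X, Y) = E(XY) - E(X) E(Y)  (exact for the plug-in estimates)
print(np.mean(X * Y) - X.mean() * Y.mean(), np.cov(X, Y, bias=True)[0, 1])
```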

Normal distribution and properties

The pdf of $N(\mu, \sigma^2)$ is

\[
f(x) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\]

In particular, $N(0,1)$ is referred to as the standard normal distribution. Let $X \sim N(\mu, \sigma^2)$. We have the following properties:

1. $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$;

2. If $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$.

3. If $X_1, \ldots, X_n$ are independent and $X_i \sim N(\mu_i, \sigma_i^2)$, then

\[
a + \sum_{i=1}^n b_i X_i \sim N\left(a + \sum_{i=1}^n b_i \mu_i,\; \sum_{i=1}^n b_i^2 \sigma_i^2\right).
\]

4. If $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(0,1)$, then
\[
\sum_{i=1}^n X_i^2 \sim \chi^2_n.
\]
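Property 4 can be checked numerically. The sketch below (the degrees of freedom $n = 5$ and the replication count are arbitrary choices) compares simulated quantiles of a sum of squared standard normals with the theoretical $\chi^2_n$ quantiles.

```python
import numpy as np
from scipy import stats

# Simulated vs. theoretical chi-square quantiles for property 4.
rng = np.random.default_rng(1)
n, reps = 5, 200_000
Z = rng.standard_normal((reps, n))
q = (Z ** 2).sum(axis=1)                 # each row gives one chi^2_n draw
probs = [0.25, 0.50, 0.90]
print(np.quantile(q, probs))             # empirical quantiles
print(stats.chi2.ppf(probs, df=n))       # theoretical quantiles (close match)
```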

2 Random vectors and matrices

One motivation for studying random vectors: constructing a confidence region for the "true" population mean $\vec\mu$.

A vector
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}
\]
is referred to as a random vector if $X_1, \ldots, X_p$ are jointly distributed random variables. Its expectation or population mean is defined as
\[
E\vec X = \begin{pmatrix} E X_1 \\ E X_2 \\ \vdots \\ E X_p \end{pmatrix} := \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} := \vec\mu.
\]
A matrix
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1m} \\ X_{21} & X_{22} & \ldots & X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k1} & X_{k2} & \ldots & X_{km} \end{pmatrix}
\]
is referred to as a random matrix if $X_{11}, X_{12}, \ldots, X_{km}$ are jointly distributed random variables. Its expectation is defined as
\[
E X = \begin{pmatrix} E X_{11} & E X_{12} & \ldots & E X_{1m} \\ E X_{21} & E X_{22} & \ldots & E X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ E X_{k1} & E X_{k2} & \ldots & E X_{km} \end{pmatrix}.
\]

For random vectors $\vec X, \vec Y \in \mathbb{R}^p$,
\[
E(\vec X + \vec Y) = E\vec X + E\vec Y.
\]

For a random vector $\vec X \in \mathbb{R}^p$ and deterministic $\vec a \in \mathbb{R}^p$ and $c \in \mathbb{R}$,
\[
E(\vec a^\top \vec X + c) = \vec a^\top E\vec X + c.
\]

For a random vector $\vec X \in \mathbb{R}^p$ and deterministic $C \in \mathbb{R}^{q\times p}$ and $\vec d \in \mathbb{R}^q$,
\[
E(C\vec X + \vec d) = C\,E\vec X + \vec d.
\]

For random matrices $X, Y \in \mathbb{R}^{k\times m}$,
\[
E(X + Y) = E X + E Y.
\]
For a random matrix $X \in \mathbb{R}^{k\times m}$ and deterministic $A \in \mathbb{R}^{l\times k}$ and $B \in \mathbb{R}^{m\times n}$,
\[
E(AXB) = A\,(E X)\,B.
\]

3 Population covariance matrix

Population covariance matrix. Let
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \in \mathbb{R}^p
\]
be a random vector with population mean $E\vec X = \vec\mu$. Denote $\mathrm{Cov}(X_j, X_k) = \sigma_{jk}$. Define the population covariance matrix of $\vec X$ as
\[
\mathrm{Cov}(\vec X) = \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp} \end{pmatrix}.
\]

We can derive the following formula:
\[
\begin{aligned}
\Sigma &= \begin{pmatrix} \mathrm{Cov}(X_1,X_1) & \mathrm{Cov}(X_1,X_2) & \ldots & \mathrm{Cov}(X_1,X_p) \\ \mathrm{Cov}(X_2,X_1) & \mathrm{Cov}(X_2,X_2) & \ldots & \mathrm{Cov}(X_2,X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,X_1) & \mathrm{Cov}(X_p,X_2) & \ldots & \mathrm{Cov}(X_p,X_p) \end{pmatrix} \\
&= \begin{pmatrix} E\left[(X_1-\mu_1)(X_1-\mu_1)\right] & \ldots & E\left[(X_1-\mu_1)(X_p-\mu_p)\right] \\ \vdots & \ddots & \vdots \\ E\left[(X_p-\mu_p)(X_1-\mu_1)\right] & \ldots & E\left[(X_p-\mu_p)(X_p-\mu_p)\right] \end{pmatrix} \\
&= E\begin{pmatrix} (X_1-\mu_1)(X_1-\mu_1) & \ldots & (X_1-\mu_1)(X_p-\mu_p) \\ \vdots & \ddots & \vdots \\ (X_p-\mu_p)(X_1-\mu_1) & \ldots & (X_p-\mu_p)(X_p-\mu_p) \end{pmatrix} \\
&= E\left[\begin{pmatrix} X_1-\mu_1 \\ \vdots \\ X_p-\mu_p \end{pmatrix} \begin{pmatrix} X_1-\mu_1 & \ldots & X_p-\mu_p \end{pmatrix}\right] \\
&= E\left[(\vec X - E\vec X)(\vec X - E\vec X)^\top\right].
\end{aligned}
\]
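The outer-product formula $\Sigma = E\left[(\vec X - E\vec X)(\vec X - E\vec X)^\top\right]$ can be approximated by a Monte Carlo average, as in the sketch below; the particular $\vec\mu$ and $\Sigma$ are made-up values used only for illustration.

```python
import numpy as np

# Approximating Cov(X) = E[(X - mu)(X - mu)^T] by averaging outer products.
rng = np.random.default_rng(2)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=500_000)   # rows are draws of X
centered = X - mu
Sigma_hat = centered.T @ centered / X.shape[0]          # average of outer products
print(np.round(Sigma_hat, 2))                           # should be close to Sigma
```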

Population cross-covariance matrix. Let
\[
\vec X := \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \quad\text{and}\quad \vec Y := \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_q \end{pmatrix}
\]
be two jointly distributed random vectors with population means $\vec\mu_x$ and $\vec\mu_y$. Define the population cross-covariance matrix between $\vec X$ and $\vec Y$ as
\[
\mathrm{Cov}(\vec X, \vec Y) = \begin{pmatrix} \mathrm{Cov}(X_1,Y_1) & \mathrm{Cov}(X_1,Y_2) & \ldots & \mathrm{Cov}(X_1,Y_q) \\ \mathrm{Cov}(X_2,Y_1) & \mathrm{Cov}(X_2,Y_2) & \ldots & \mathrm{Cov}(X_2,Y_q) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,Y_1) & \mathrm{Cov}(X_p,Y_2) & \ldots & \mathrm{Cov}(X_p,Y_q) \end{pmatrix}.
\]

We can derive the following formula:

\[
\mathrm{Cov}(\vec X, \vec Y) = E\left[(\vec X - E\vec X)(\vec Y - E\vec Y)^\top\right] \in \mathbb{R}^{p\times q}. \quad\text{(Why?)}
\]

Properties of population covariance

1. $\mathrm{Cov}(\vec X) = \mathrm{Cov}(\vec X, \vec X)$.

2. $\mathrm{Cov}(b\vec X + \vec d) = b^2\,\mathrm{Cov}(\vec X)$.

3. $\mathrm{Cov}(\vec X, \vec Y) = \mathrm{Cov}(\vec Y, \vec X)^\top$.

4. $\mathrm{Cov}(C\vec X + \vec d) = C\,\mathrm{Cov}(\vec X)\,C^\top$.

Proof. The population mean of $C\vec X + \vec d$ is
\[
E(C\vec X + \vec d) = C\,E(\vec X) + \vec d = C\vec\mu + \vec d.
\]

The population covariance matrix of $C\vec X + \vec d$ is
\[
\begin{aligned}
\mathrm{Cov}(C\vec X + \vec d) &= E\left[(C\vec X + \vec d - (C\vec\mu + \vec d))(C\vec X + \vec d - (C\vec\mu + \vec d))^\top\right] \\
&= E\left[C(\vec X - \vec\mu)(\vec X - \vec\mu)^\top C^\top\right] \\
&= C\,E\left[(\vec X - \vec\mu)(\vec X - \vec\mu)^\top\right] C^\top \\
&= C\Sigma C^\top.
\end{aligned}
\]

5. $\mathrm{Cov}(C\vec X, D\vec Y) = C\,\mathrm{Cov}(\vec X, \vec Y)\,D^\top$. (Why?)

6. $\mathrm{Cov}\left(\sum_{i=1}^n b_i \vec X_i,\; \sum_{j=1}^m d_j \vec Y_j\right) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(\vec X_i, \vec Y_j)$.

Proof. To show this result, for any $i = 1, \ldots, n$ and $j = 1, \ldots, m$, we denote
\[
\vec X_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix}, \qquad \vec Y_j = \begin{pmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jq} \end{pmatrix}.
\]

The $(\ell, k)$-entry of the left-hand side is
\[
\mathrm{Cov}\left(\sum_{i=1}^n b_i X_{i\ell},\; \sum_{j=1}^m d_j Y_{jk}\right).
\]
The $(\ell, k)$-entry of the right-hand side is
\[
\sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_{i\ell}, Y_{jk}).
\]
These agree by property 7 of scalar covariances, so the left-hand side equals the right-hand side.

7. If $\vec X \in \mathbb{R}^p$ and $\vec Y \in \mathbb{R}^q$ are independent random vectors, then $\mathrm{Cov}(\vec X, \vec Y) = 0_{p\times q}$. (Why?)

8. For mutually independent random vectors $\vec X_1, \ldots, \vec X_n \in \mathbb{R}^p$, we have
\[
\mathrm{Cov}(a_1\vec X_1 + \ldots + a_n\vec X_n + \vec c) = a_1^2\,\mathrm{Cov}(\vec X_1) + \ldots + a_n^2\,\mathrm{Cov}(\vec X_n).
\]
(Why?)
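Property 4 in this list is the workhorse for later results. The sketch below ($C$, $\vec d$, $\vec\mu$, and $\Sigma$ are arbitrary illustrative choices) verifies $\mathrm{Cov}(C\vec X + \vec d) = C\Sigma C^\top$ by simulation.

```python
import numpy as np

# Numerical illustration of Cov(C X + d) = C Cov(X) C^T.
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + np.eye(3)              # a generic positive definite covariance
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
d = np.array([5.0, -3.0])

X = rng.multivariate_normal(mu, Sigma, size=400_000)
Y = X @ C.T + d                          # each row is C x_i + d
print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(C @ Sigma @ C.T, 2))      # the two matrices should nearly agree
```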

4 Multivariate normality

Definition 1. A $p \times 1$ random vector
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \sim N_p(\vec\mu, \Sigma)
\]
if and only if its pdf is
\[
f(\vec x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\vec x - \vec\mu)^\top \Sigma^{-1} (\vec x - \vec\mu)\right).
\]

In this case, $X_1, \ldots, X_p$ are said to be jointly normally distributed as $N_p(\vec\mu, \Sigma)$. We also have $E[\vec X] = \vec\mu$ and $\mathrm{Cov}(\vec X) = \Sigma$. Moreover, the distribution $N_p(\vec 0, I_p)$ is referred to as the $p$-dimensional standard normal distribution.
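For concreteness, the pdf in Definition 1 can be evaluated directly and cross-checked against scipy's implementation; the $\vec\mu$, $\Sigma$, and evaluation point $\vec x$ below are arbitrary.

```python
import numpy as np
from scipy import stats

# Evaluating the N_p(mu, Sigma) density both from the formula and via scipy.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.5])

p = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
print(f)
print(stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value
```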

Property 1: If $\vec X \sim N_p(\vec\mu, \Sigma)$, then for any $C \in \mathbb{R}^{q\times p}$ and $\vec d \in \mathbb{R}^q$,
\[
C\vec X + \vec d \sim N_q(C\vec\mu + \vec d,\; C\Sigma C^\top).
\]

Property 2: If $\vec X \sim N_p(\vec\mu, \Sigma)$, then for any $\vec a \in \mathbb{R}^p$ and $b \in \mathbb{R}$,
\[
a_1 X_1 + \ldots + a_p X_p + b = \vec a^\top \vec X + b \sim N(\vec a^\top \vec\mu + b,\; \vec a^\top \Sigma \vec a).
\]

Property 3: If $\vec X = (X_1, X_2, \ldots, X_p)^\top \sim N_p(\vec\mu, \Sigma)$, then $X_1, X_2, \ldots, X_p$ are mutually independent if and only if $\Sigma$ is a diagonal matrix, i.e., $\mathrm{Cov}(X_j, X_k) = 0$ for any $j \neq k$.

Property 4: Suppose $\Sigma$ is positive definite. If $\vec X \sim N_p(\vec\mu, \Sigma)$, then
\[
(\vec X - \vec\mu)^\top \Sigma^{-1} (\vec X - \vec\mu) \sim \chi^2_p.
\]

Proof. Let $\vec Z = \Sigma^{-1/2}(\vec X - \vec\mu)$. We have $E(\vec Z) = \Sigma^{-1/2}(\vec\mu - \vec\mu) = \vec 0$ and $\mathrm{Cov}(\vec Z) = \Sigma^{-1/2}\,\mathrm{Cov}(\vec X)\,\Sigma^{-1/2} = I$. This implies that $\vec Z \sim N_p(\vec 0, I)$, i.e., $Z_1, \ldots, Z_p \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then
\[
\begin{aligned}
(\vec X - \vec\mu)^\top \Sigma^{-1} (\vec X - \vec\mu) &= (\vec X - \vec\mu)^\top \Sigma^{-1/2}\Sigma^{-1/2}(\vec X - \vec\mu) \\
&= \left(\Sigma^{-1/2}(\vec X - \vec\mu)\right)^\top \left(\Sigma^{-1/2}(\vec X - \vec\mu)\right) \\
&= \vec Z^\top \vec Z = Z_1^2 + \ldots + Z_p^2 \\
&\sim \chi^2_p.
\end{aligned}
\]
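The quadratic form in Property 4 can also be checked by simulation, as in the sketch below; the values of $\vec\mu$ and $\Sigma$ are arbitrary and used only for illustration.

```python
import numpy as np
from scipy import stats

# Monte Carlo check that (X - mu)^T Sigma^{-1} (X - mu) ~ chi^2_p for p = 3.
rng = np.random.default_rng(4)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)
diff = X - mu
q = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)   # quadratic forms
print(np.quantile(q, [0.50, 0.95]))
print(stats.chi2.ppf([0.50, 0.95], df=3))      # should roughly match
```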

Property 5: If X~ 1,..., X~ n are independent multivariate normal random vectors, then X~ 1 + ... + X~ n is a multivariate normal random vector.

Property 6: (The marginal distributions of a multivariate normal distribution are also multivariate normal distributions.) Suppose $\vec X \sim N_p(\vec\mu, \Sigma)$. Consider the partition
\[
\vec X = \begin{pmatrix} \vec X^{(1)} \\ \vec X^{(2)} \end{pmatrix},
\]
where $\vec X^{(1)} \in \mathbb{R}^q$ and $\vec X^{(2)} \in \mathbb{R}^{p-q}$. Correspondingly,
\[
\vec\mu = \begin{pmatrix} \vec\mu^{(1)} \\ \vec\mu^{(2)} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]

Then $\vec X^{(1)} \sim N_q\left(\vec\mu^{(1)}, \Sigma_{11}\right)$ and $\vec X^{(2)} \sim N_{p-q}\left(\vec\mu^{(2)}, \Sigma_{22}\right)$. Moreover, for any $\vec x \in \mathbb{R}^{p-q}$,
\[
\vec X^{(1)} \mid (\vec X^{(2)} = \vec x) \sim N_q\left(\vec\mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(\vec x - \vec\mu^{(2)}),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\]
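The conditional-distribution formula in Property 6 is easy to compute numerically. The sketch below uses $q = 1$ (so $\vec X^{(1)} = X_1$ and $\vec X^{(2)} = (X_2, X_3)^\top$) with illustrative values of $\vec\mu$, $\Sigma$, and the conditioning point.

```python
import numpy as np

# Conditional mean and covariance from Property 6 with q = 1;
# mu, Sigma, and x2 are illustrative placeholders.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
q = 1
mu1, mu2 = mu[:q], mu[q:]
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x2 = np.array([0.0, 0.0])                          # conditioning value for X^(2)
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```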

5 Random samples

Recall that an observed (fixed) sample $\vec x_1, \ldots, \vec x_n$ from a $p$-variate population forms the data matrix
\[
X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{pmatrix} = \begin{pmatrix} \vec x_1^\top \\ \vec x_2^\top \\ \vdots \\ \vec x_n^\top \end{pmatrix}.
\]

Its sample mean and sample covariance matrix have the formulas
\[
\bar{\vec x} := \frac{1}{n}\sum_{i=1}^n \vec x_i = \frac{1}{n} X^\top \vec 1_n,
\]
and
\[
S := \frac{1}{n-1}\sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})^\top = \frac{1}{n-1}(X - \vec 1_n \bar{\vec x}^\top)^\top (X - \vec 1_n \bar{\vec x}^\top).
\]

In contrast, we say $\vec X_1, \ldots, \vec X_n$ is a random sample of the $p$-variate population $F$ if they are independent and identically distributed random vectors from $F$. A random sample is used to model unobserved data that are intended to be observed. For example, assume the $p$-dimensional population is represented by a $p$-dimensional random vector
\[
\vec X = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix} \sim N(\vec\mu, \Sigma),
\]
where $X_j$ represents the $j$-th variable. A collection of random vectors $\vec X_1, \ldots, \vec X_n \overset{\text{i.i.d.}}{\sim} N(\vec\mu, \Sigma)$ is referred to as a random sample of $\vec X$ with size $n$. As with the fixed sample, each random vector is denoted as
\[
\vec X_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix},
\]
and the random data matrix is represented as
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{pmatrix} = \begin{pmatrix} \vec X_1^\top \\ \vec X_2^\top \\ \vdots \\ \vec X_n^\top \end{pmatrix}.
\]
We can similarly define the sample mean
\[
\bar{\vec X} := \frac{1}{n}\sum_{i=1}^n \vec X_i = \frac{1}{n} X^\top \vec 1_n,
\]
and the sample covariance matrix
\[
S := \frac{1}{n-1}\sum_{i=1}^n (\vec X_i - \bar{\vec X})(\vec X_i - \bar{\vec X})^\top = \frac{1}{n-1}(X - \vec 1_n \bar{\vec X}^\top)^\top (X - \vec 1_n \bar{\vec X}^\top).
\]
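The matrix forms of $\bar{\vec X}$ and $S$ translate directly into code. Below is a minimal sketch using a simulated $n \times p$ data matrix (the data themselves are placeholders) that also checks the formulas against numpy's built-in routines.

```python
import numpy as np

# Sample mean and sample covariance via the matrix formulas above.
rng = np.random.default_rng(5)
n, p = 50, 3
X = rng.standard_normal((n, p))                 # placeholder n x p data matrix
ones = np.ones((n, 1))

xbar = X.T @ ones / n                           # (1/n) X^T 1_n, a p x 1 vector
centered = X - ones @ xbar.T                    # X - 1_n xbar^T
S = centered.T @ centered / (n - 1)

print(np.allclose(xbar.ravel(), X.mean(axis=0)))        # True
print(np.allclose(S, np.cov(X, rowvar=False)))          # True
```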

5.1 Algebraic properties

All the algebraic properties for fixed samples hold in the same way for random samples. For example, the linear transformation of the variates
\[
\underset{(q\times 1)}{\vec Y} = \underset{(q\times p)}{C}\;\underset{(p\times 1)}{\vec X} + \underset{(q\times 1)}{\vec d}
\]
yields the linear transformation of the random sample
\[
\vec Y_i = C\vec X_i + \vec d, \qquad i = 1, \ldots, n.
\]

The sample covariance matrix of $\{\vec Y_i\}_{i=1}^n$ is then
\[
S_Y = C S_X C^\top,
\]
where $S_X$ is the sample covariance matrix of $\{\vec X_i\}_{i=1}^n$.

In particular, under the linear combination
\[
Y_i = \vec b^\top \vec X_i = b_1 X_{i1} + \ldots + b_p X_{ip},
\]
the sample variance of $Y_1, \ldots, Y_n$ is $s_Y^2 = \vec b^\top S_X \vec b$.
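This identity is exact for any finite sample, not just in expectation; a small numerical check (with arbitrary $C$, $\vec d$, and data) is sketched below.

```python
import numpy as np

# Exact check of S_Y = C S_X C^T for a transformed sample Y_i = C X_i + d.
rng = np.random.default_rng(6)
X = rng.standard_normal((40, 3))                 # 40 observations in R^3
C = np.array([[1.0, -1.0, 0.0],
              [2.0, 0.0, 1.0]])
d = np.array([0.5, -0.5])

Y = X @ C.T + d                                  # rows are C x_i + d
S_X = np.cov(X, rowvar=False)
S_Y = np.cov(Y, rowvar=False)
print(np.allclose(S_Y, C @ S_X @ C.T))           # True: the identity is exact
```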

5.2 Probabilistic properties

Let $\vec X_1, \ldots, \vec X_n$ be a random sample from $N_p(\vec\mu, \Sigma)$. We have the following properties of $S$ and $\bar{\vec X}$:

1. $E(\bar{\vec X}) = \vec\mu$;

Proof.
\[
E(\bar{\vec X}) = E\left(\frac{1}{n}\sum_{i=1}^n \vec X_i\right) = \frac{1}{n}\sum_{i=1}^n E(\vec X_i) = \frac{1}{n}(n\vec\mu) = \vec\mu.
\]

2. $\mathrm{Cov}(\bar{\vec X}) = \frac{1}{n}\Sigma$;

Proof. Since $\vec X_1, \ldots, \vec X_n$ are independent, we have
\[
\mathrm{Cov}(\bar{\vec X}) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i=1}^n \vec X_i\right) = \frac{1}{n^2}\,\mathrm{Cov}\left(\sum_{i=1}^n \vec X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Cov}(\vec X_i) = \frac{1}{n^2}(n\Sigma) = \frac{1}{n}\Sigma.
\]

3. $\bar{\vec X} \sim N_p\left(\vec\mu, \frac{1}{n}\Sigma\right)$;

Proof. Since $\bar{\vec X} = \sum_{i=1}^n \frac{1}{n}\vec X_i$, and $\vec X_1, \ldots, \vec X_n$ are independent multivariate normal random vectors, we know $\bar{\vec X}$ is a multivariate normal random vector. Its mean and covariance are given by properties 1 and 2.

4. $n(\bar{\vec X} - \vec\mu)^\top \Sigma^{-1} (\bar{\vec X} - \vec\mu) \sim \chi^2_p$;

Proof. This is simply due to $\bar{\vec X} \sim N_p\left(\vec\mu, \frac{1}{n}\Sigma\right)$ together with Property 4 of the multivariate normal distribution.

5. E(S) = Σ;

Proof. Method 1: To evaluate $E(S)$, we first have
\[
\begin{aligned}
(n-1)S &= \sum_{i=1}^n (\vec X_i - \bar{\vec X})(\vec X_i - \bar{\vec X})^\top \\
&= \sum_{i=1}^n \left[(\vec X_i - \vec\mu) - (\bar{\vec X} - \vec\mu)\right]\left[(\vec X_i - \vec\mu) - (\bar{\vec X} - \vec\mu)\right]^\top \\
&= \sum_{i=1}^n \left[(\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - (\bar{\vec X} - \vec\mu)(\vec X_i - \vec\mu)^\top - (\vec X_i - \vec\mu)(\bar{\vec X} - \vec\mu)^\top + (\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top\right] \\
&= \sum_{i=1}^n (\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - (\bar{\vec X} - \vec\mu)\left(\sum_{i=1}^n \vec X_i - n\vec\mu\right)^\top - \left(\sum_{i=1}^n \vec X_i - n\vec\mu\right)(\bar{\vec X} - \vec\mu)^\top + n(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top \\
&= \sum_{i=1}^n (\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - n(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top.
\end{aligned}
\]
This implies that
\[
\begin{aligned}
E\left[(n-1)S\right] &= \sum_{i=1}^n E\left[(\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top\right] - n\,E\left[(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top\right] \\
&= \sum_{i=1}^n \mathrm{Cov}(\vec X_i) - n\,\mathrm{Cov}(\bar{\vec X}) \\
&= \sum_{i=1}^n \Sigma - n\cdot\frac{1}{n}\Sigma \\
&= (n-1)\Sigma.
\end{aligned}
\]
This implies that $E(S) = \Sigma$.

Method 2: To evaluate $E(S)$, we only need to evaluate $E(S_{k\ell})$ for any $1 \le k, \ell \le p$. We have
\[
S_{k\ell} := \frac{1}{n-1}\sum_{i=1}^n (X_{ik} - \bar X_k)(X_{i\ell} - \bar X_\ell) = \frac{1}{n-1}\left(\sum_{i=1}^n X_{ik}X_{i\ell} - n\bar X_k \bar X_\ell\right).
\]
Then we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n E[X_{ik}X_{i\ell}] - n\,E[\bar X_k \bar X_\ell]\right).
\]
Recall the formula
\[
\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] \quad\Longrightarrow\quad E[XY] = \mathrm{Cov}(X,Y) + E[X]E[Y].
\]
By $E[\vec X_i] = \vec\mu$ and $\mathrm{Cov}(\vec X_i) = \Sigma$, we have
\[
E[X_{ik}X_{i\ell}] = \mathrm{Cov}(X_{ik}, X_{i\ell}) + E[X_{ik}]E[X_{i\ell}] = \sigma_{k\ell} + \mu_k\mu_\ell.
\]
Similarly, by $E[\bar{\vec X}] = \vec\mu$ and $\mathrm{Cov}(\bar{\vec X}) = \frac{1}{n}\Sigma$, we have
\[
E[\bar X_k \bar X_\ell] = \mathrm{Cov}(\bar X_k, \bar X_\ell) + E[\bar X_k]E[\bar X_\ell] = \frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell.
\]
Therefore we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n (\sigma_{k\ell} + \mu_k\mu_\ell) - n\left(\frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell\right)\right) = \sigma_{k\ell}.
\]
This implies that $E[S] = \Sigma$.

6. $n(\bar{\vec X} - \vec\mu)^\top S^{-1} (\bar{\vec X} - \vec\mu) \sim \frac{(n-1)p}{n-p}\,F_{p,\,n-p}$.

The above properties are consistent with the well-known properties for a univariate sample: let

$X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, with sample mean $\bar X$ and sample variance $S^2$. Then

1. $\bar X \sim N\left(\mu, \frac{1}{n}\sigma^2\right)$;

2. $\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \sim N(0,1)$ and $\frac{n(\bar X - \mu)^2}{\sigma^2} \sim \chi^2_1$;

3. $E(S^2) = \sigma^2$;

4. $\frac{\sqrt{n}(\bar X - \mu)}{S} \sim t_{n-1}$ and $\frac{n(\bar X - \mu)^2}{S^2} \sim F_{1,n-1}$.
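Properties 1, 2, and 5 above can be illustrated by repeating the sampling experiment many times and averaging. The sketch below (with arbitrary $\vec\mu$, $\Sigma$, sample size, and replication count) does exactly that.

```python
import numpy as np

# Monte Carlo illustration of E(Xbar) = mu, Cov(Xbar) = Sigma / n, E(S) = Sigma.
rng = np.random.default_rng(7)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
n, reps = 10, 50_000

xbars, Ss = [], []
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbars.append(X.mean(axis=0))
    Ss.append(np.cov(X, rowvar=False))

xbars = np.array(xbars)
print(np.round(xbars.mean(axis=0), 2))              # close to mu
print(np.round(np.cov(xbars, rowvar=False), 3))     # close to Sigma / n
print(np.round(np.mean(Ss, axis=0), 2))             # close to Sigma
```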

6 Examples

Example 1: Let $\vec X \sim N_3(\vec\mu, \Sigma)$, where
\[
\vec\mu = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}.
\]
(a) Find the conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$.

(b) Find the conditional distribution of $X_1$ given $X_2 = X_3$.

Answer:

(a) The conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$ is multivariate normal with
\[
E\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\,(0 - (-1)) = \begin{pmatrix} 1 \\ 1/2 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 5/2 \end{pmatrix}.
\]
(b) We know $\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix}$ is multivariate normal with
\[
E\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} E X_1 \\ E X_2 - E X_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 - (-1) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.
\]
Then we know $X_1 \mid (X_2 - X_3 = 0)$ is conditionally normal. The conditional mean is
\[
E(X_1 \mid X_2 - X_3 = 0) = 1 + 1 \times \frac{1}{3} \times (0 - 1) = \frac{2}{3},
\]
and the conditional variance is
\[
\mathrm{Var}(X_1 \mid X_2 - X_3 = 0) = 3 - 1 \times \frac{1}{3} \times 1 = \frac{8}{3}.
\]
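Part (b) can be re-derived numerically with a few lines; the sketch below simply repeats the calculation above in code.

```python
import numpy as np

# Numerical re-derivation of Example 1(b): distribution of (X1, X2 - X3),
# then the conditional mean and variance of X1 given X2 - X3 = 0.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])          # maps X to (X1, X2 - X3)

m = A @ mu                                # (1, 1)
V = A @ Sigma @ A.T                       # [[3, 1], [1, 3]]
cond_mean = m[0] + V[0, 1] / V[1, 1] * (0 - m[1])    # 2/3
cond_var = V[0, 0] - V[0, 1] ** 2 / V[1, 1]          # 8/3
print(m, V, cond_mean, cond_var)
```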

Example 2: Suppose the population covariance matrix of $\vec X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ is
\[
\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}.
\]

We aim to find $\mathrm{Var}(X_1 + X_2)$.

Method 1: Since $X_1 + X_2 = \begin{pmatrix} 1 & 1 \end{pmatrix}\vec X$, we know its population variance is
\[
\mathrm{Var}(X_1 + X_2) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 7.
\]

Method 2: We can avoid using matrix algebra. Notice that from $\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$, we have
\[
\mathrm{Var}(X_1) = 3, \qquad \mathrm{Var}(X_2) = 2, \qquad \mathrm{Cov}(X_1, X_2) = 1.
\]

Then
\[
\begin{aligned}
\mathrm{Var}(X_1 + X_2) &= \mathrm{Cov}(X_1 + X_2, X_1 + X_2) \\
&= \mathrm{Cov}(X_1, X_1) + \mathrm{Cov}(X_1, X_2) + \mathrm{Cov}(X_2, X_1) + \mathrm{Cov}(X_2, X_2) \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2) = 7.
\end{aligned}
\]
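Both methods amount to one line each in code; a minimal check:

```python
import numpy as np

# Example 2 both ways: quadratic form vs. expanding the scalar covariances.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
a = np.array([1.0, 1.0])
print(a @ Sigma @ a)                                   # Method 1: 7.0
print(Sigma[0, 0] + Sigma[1, 1] + 2 * Sigma[0, 1])     # Method 2: 7.0
```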

Example 3: Let
\[
\vec X \sim N_3\left(\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 5 \end{pmatrix}\right).
\]
Find the marginal distributions of $X_2$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix}$.

Answer: $X_2 \sim N(0, 4)$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix} \sim N\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 \\ 1 & 5 \end{pmatrix}\right)$.
