STA135 Lecture 3: Random Vectors, Multivariate Normality and Random Samples
Xiaodong Li, UC Davis
1 Review of expectation, variance and covariance
Let's focus on the case of continuous random variables. Let $f_X(x)$ be the pdf of the random variable $X$. Its expectation is defined as
\[
E[X] := \mu_X := \int_{-\infty}^{\infty} x f_X(x)\,dx,
\]
and its variance is defined as
\[
\mathrm{Var}(X) := E\big[(X - \mu_X)^2\big] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.
\]
Let $X$ and $Y$ be two jointly distributed random variables with joint pdf $f_{X,Y}(x,y)$. Their covariance is defined as
\[
\mathrm{Cov}(X,Y) = E\big[(X - E[X])(Y - E[Y])\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{X,Y}(x,y)\,dx\,dy.
\]
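These integral definitions can be sanity-checked by Monte Carlo, since sample averages over a large simulated sample approximate the corresponding integrals. The following is a minimal sketch (not part of the original notes); the bivariate normal parameters, sample size, and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Jointly normal (X, Y), chosen so that E[X] = 1, Var(X) = 2, Cov(X, Y) = 0.5
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x, y = rng.multivariate_normal(mu, Sigma, size=n).T

print(x.mean())                                   # approx E[X] = 1.0
print(((x - x.mean()) ** 2).mean())               # approx Var(X) = 2.0
print(((x - x.mean()) * (y - y.mean())).mean())   # approx Cov(X, Y) = 0.5
```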
Properties of expectation, variance, covariance:
1. $\mathrm{Var}(X) = E(X^2) - (E(X))^2$.
2. $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$.

3. $E\big(a + \sum_{i=1}^n b_i X_i\big) = a + \sum_{i=1}^n b_i E(X_i)$.

4. $\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$.
5. $\mathrm{Var}(X) = \mathrm{Cov}(X,X)$.
6. $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$.

7. $\mathrm{Cov}\big(a + \sum_{i=1}^n b_i X_i,\; c + \sum_{j=1}^m d_j Y_j\big) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_i, Y_j)$.

8. $\mathrm{Var}\big(a + \sum_{i=1}^n b_i X_i\big) = \sum_{i=1}^n b_i^2\,\mathrm{Var}(X_i) + 2\sum_{1 \le i < j \le n} b_i b_j\,\mathrm{Cov}(X_i, X_j)$.

1.1 Normal distribution and properties

The pdf of $N(\mu, \sigma^2)$ is
\[
f(x) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\]
In particular, $N(0,1)$ is referred to as the standard normal distribution. Let $X \sim N(\mu, \sigma^2)$. We have the following properties:

1. $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$;

2. If $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b,\; a^2\sigma^2)$.

3. If $X_1, \ldots, X_n$ are independent and $X_i \sim N(\mu_i, \sigma_i^2)$, then
\[
a + \sum_{i=1}^n b_i X_i \sim N\left(a + \sum_{i=1}^n b_i \mu_i,\; \sum_{i=1}^n b_i^2 \sigma_i^2\right).
\]

4. If $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(0,1)$, then
\[
\sum_{i=1}^n X_i^2 \sim \chi^2_n.
\]

2 Random vectors and matrices

A vector
\[
\vec{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}
\]
is referred to as a random vector if $X_1, \ldots, X_p$ are jointly distributed random variables. Its expectation or population mean is defined as
\[
E\vec{X} = \begin{pmatrix} E X_1 \\ E X_2 \\ \vdots \\ E X_p \end{pmatrix} := \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} := \vec{\mu}.
\]
A matrix
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1m} \\ X_{21} & X_{22} & \ldots & X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k1} & X_{k2} & \ldots & X_{km} \end{pmatrix}
\]
is a random matrix if $X_{11}, X_{12}, \ldots, X_{km}$ are jointly distributed random variables. Its expectation is defined as
\[
E X = \begin{pmatrix} E X_{11} & E X_{12} & \ldots & E X_{1m} \\ E X_{21} & E X_{22} & \ldots & E X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ E X_{k1} & E X_{k2} & \ldots & E X_{km} \end{pmatrix}.
\]
The expectation of random vectors and matrices is linear:

- Random vectors $\vec{X}, \vec{Y} \in \mathbb{R}^p$: $E(\vec{X} + \vec{Y}) = E\vec{X} + E\vec{Y}$.
- Random vector $\vec{X} \in \mathbb{R}^p$, deterministic $\vec{a} \in \mathbb{R}^p$ and $c \in \mathbb{R}$: $E(\vec{a}^\top \vec{X} + c) = \vec{a}^\top E\vec{X} + c$.
- Random vector $\vec{X} \in \mathbb{R}^p$, deterministic $C \in \mathbb{R}^{q \times p}$ and $\vec{d} \in \mathbb{R}^q$: $E(C\vec{X} + \vec{d}) = C\,E\vec{X} + \vec{d}$.
- Random matrices $X, Y \in \mathbb{R}^{k \times m}$: $E(X + Y) = E X + E Y$.
- Random matrix $X \in \mathbb{R}^{k \times m}$, deterministic $A \in \mathbb{R}^{l \times k}$ and $B \in \mathbb{R}^{m \times n}$: $E(AXB) = A\,(E X)\,B$.

3 Population covariance matrix

Let $\vec{X} = (X_1, X_2, \ldots, X_p)^\top \in \mathbb{R}^p$ be a random vector with population mean $E\vec{X} = \vec{\mu}$. Denote $\mathrm{Cov}(X_j, X_k) = \sigma_{jk}$. Define the population covariance matrix of $\vec{X}$ as
\[
\mathrm{Cov}(\vec{X}) = \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp} \end{pmatrix}.
\]
We can derive the following formula:
\begin{align*}
\Sigma &= \begin{pmatrix} \mathrm{Cov}(X_1,X_1) & \ldots & \mathrm{Cov}(X_1,X_p) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,X_1) & \ldots & \mathrm{Cov}(X_p,X_p) \end{pmatrix} \\
&= \begin{pmatrix} E[(X_1-\mu_1)(X_1-\mu_1)] & \ldots & E[(X_1-\mu_1)(X_p-\mu_p)] \\ \vdots & \ddots & \vdots \\ E[(X_p-\mu_p)(X_1-\mu_1)] & \ldots & E[(X_p-\mu_p)(X_p-\mu_p)] \end{pmatrix} \\
&= E\begin{pmatrix} X_1-\mu_1 \\ \vdots \\ X_p-\mu_p \end{pmatrix}\begin{pmatrix} X_1-\mu_1 & \ldots & X_p-\mu_p \end{pmatrix} \\
&= E\big[(\vec{X} - E\vec{X})(\vec{X} - E\vec{X})^\top\big].
\end{align*}

Population cross-covariance matrix. Let
\[
\vec{X} := \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \quad \text{and} \quad \vec{Y} := \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_q \end{pmatrix}
\]
be two jointly distributed random vectors with population means $\vec{\mu}_x$ and $\vec{\mu}_y$. Define the population cross-covariance matrix between $\vec{X}$ and $\vec{Y}$:
\[
\mathrm{Cov}(\vec{X}, \vec{Y}) = \begin{pmatrix} \mathrm{Cov}(X_1,Y_1) & \mathrm{Cov}(X_1,Y_2) & \ldots & \mathrm{Cov}(X_1,Y_q) \\ \mathrm{Cov}(X_2,Y_1) & \mathrm{Cov}(X_2,Y_2) & \ldots & \mathrm{Cov}(X_2,Y_q) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,Y_1) & \mathrm{Cov}(X_p,Y_2) & \ldots & \mathrm{Cov}(X_p,Y_q) \end{pmatrix}.
\]
We can derive the following formula:
\[
\mathrm{Cov}(\vec{X}, \vec{Y}) = E\big[(\vec{X} - E\vec{X})(\vec{Y} - E\vec{Y})^\top\big] \in \mathbb{R}^{p \times q}. \quad \text{(Why?)}
\]

Properties of population covariance:

1. $\mathrm{Cov}(\vec{X}) = \mathrm{Cov}(\vec{X}, \vec{X})$.

2. $\mathrm{Cov}(b\vec{X} + \vec{d}) = b^2\,\mathrm{Cov}(\vec{X})$.

3. $\mathrm{Cov}(\vec{X}, \vec{Y}) = \mathrm{Cov}(\vec{Y}, \vec{X})^\top$.

4. $\mathrm{Cov}(C\vec{X} + \vec{d}) = C\,\mathrm{Cov}(\vec{X})\,C^\top$.

Proof. The population mean of $C\vec{X} + \vec{d}$ is
\[
E(C\vec{X} + \vec{d}) = C\,E(\vec{X}) + \vec{d} = C\vec{\mu} + \vec{d}.
\]
The population covariance matrix of $C\vec{X} + \vec{d}$ is
\begin{align*}
\mathrm{Cov}(C\vec{X} + \vec{d}) &= E\big[(C\vec{X} + \vec{d} - (C\vec{\mu} + \vec{d}))(C\vec{X} + \vec{d} - (C\vec{\mu} + \vec{d}))^\top\big] \\
&= E\big[C(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^\top C^\top\big] \\
&= C\,E\big[(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^\top\big]\,C^\top \\
&= C \Sigma_x C^\top.
\end{align*}
(A simulation check of this property appears after this list.)

5. $\mathrm{Cov}(C\vec{X}, D\vec{Y}) = C\,\mathrm{Cov}(\vec{X}, \vec{Y})\,D^\top$. (Why?)

6. $\mathrm{Cov}\big(\sum_{i=1}^n b_i \vec{X}_i,\; \sum_{j=1}^m d_j \vec{Y}_j\big) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}\big(\vec{X}_i, \vec{Y}_j\big)$.

Proof. To show this result, for any $i = 1, \ldots, n$ and $j = 1, \ldots, m$, we denote
\[
\vec{X}_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix}, \quad \vec{Y}_j = \begin{pmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jp} \end{pmatrix}.
\]
The $(\ell k)$-entry of the left-hand side is
\[
\mathrm{Cov}\left(\sum_{i=1}^n b_i X_{i\ell},\; \sum_{j=1}^m d_j Y_{jk}\right),
\]
and the $(\ell k)$-entry of the right-hand side is
\[
\sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_{i\ell}, Y_{jk}).
\]
These agree by the bilinearity of scalar covariance (Property 7 of Section 1). Therefore, the left-hand side equals the right-hand side.

7. If $\vec{X} \in \mathbb{R}^p$ and $\vec{Y} \in \mathbb{R}^q$ are independent random vectors, then $\mathrm{Cov}(\vec{X}, \vec{Y}) = 0_{p \times q}$. (Why?)

8. For mutually independent random vectors $\vec{X}_1, \ldots, \vec{X}_n \in \mathbb{R}^p$, we have
\[
\mathrm{Cov}(a_1\vec{X}_1 + \ldots + a_n\vec{X}_n + \vec{c}) = a_1^2\,\mathrm{Cov}(\vec{X}_1) + \ldots + a_n^2\,\mathrm{Cov}(\vec{X}_n). \quad \text{(Why?)}
\]
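Property 4 above can be checked empirically. The sketch below is my addition, not part of the notes; the particular $\vec\mu$, $\Sigma$, $C$, and $\vec d$ are arbitrary illustrative choices. It simulates a normal random vector $\vec{X}$, forms $C\vec{X} + \vec{d}$, and compares the empirical covariance with $C \Sigma C^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=n)   # rows are draws of X

C = np.array([[1.0, -1.0, 0.0],
              [2.0,  0.5, 1.0]])                 # C in R^{2 x 3}
d = np.array([3.0, -2.0])
Y = X @ C.T + d                                  # rows are draws of C X + d

print(np.cov(Y, rowvar=False))   # empirical Cov(C X + d)
print(C @ Sigma @ C.T)           # theoretical C Sigma C^T; should agree closely
```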
4 Multivariate normality

Definition 1. A $p \times 1$ random vector $\vec{X} = (X_1, X_2, \ldots, X_p)^\top \sim N_p(\vec{\mu}, \Sigma)$ if and only if its pdf is
\[
f(\vec{x}) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\vec{x} - \vec{\mu})^\top \Sigma^{-1} (\vec{x} - \vec{\mu})\right).
\]
In this case, $X_1, \ldots, X_p$ are said to be jointly normally distributed as $N_p(\vec{\mu}, \Sigma)$. We also have $E[\vec{X}] = \vec{\mu}$ and $\mathrm{Cov}(\vec{X}) = \Sigma$. Moreover, the distribution $N(\vec{0}, I_p)$ is referred to as the $p$-dimensional standard normal distribution.

Property 1: If $\vec{X} \sim N_p(\vec{\mu}, \Sigma)$, then for $C \in \mathbb{R}^{q \times p}$ and $\vec{d} \in \mathbb{R}^q$,
\[
C\vec{X} + \vec{d} \sim N_q(C\vec{\mu} + \vec{d},\; C\Sigma C^\top).
\]

Property 2: If $\vec{X} \sim N_p(\vec{\mu}, \Sigma)$, then for $\vec{a} \in \mathbb{R}^p$ and $b \in \mathbb{R}$,
\[
a_1 X_1 + \ldots + a_p X_p + b = \vec{a}^\top \vec{X} + b \sim N(\vec{a}^\top \vec{\mu} + b,\; \vec{a}^\top \Sigma\, \vec{a}).
\]

Property 3: If $\vec{X} = (X_1, X_2, \ldots, X_p)^\top \sim N_p(\vec{\mu}, \Sigma)$, then $X_1, X_2, \ldots, X_p$ are mutually independent if and only if $\Sigma$ is a diagonal matrix, i.e., $\mathrm{Cov}(X_j, X_k) = 0$ for any $j \neq k$.

Property 4: Suppose $\Sigma$ is positive definite. If $\vec{X} \sim N_p(\vec{\mu}, \Sigma)$, then
\[
(\vec{X} - \vec{\mu})^\top \Sigma^{-1} (\vec{X} - \vec{\mu}) \sim \chi^2_p.
\]

Proof. Let $\vec{Z} = \Sigma^{-1/2}(\vec{X} - \vec{\mu})$. We have $E(\vec{Z}) = \Sigma^{-1/2}(\vec{\mu} - \vec{\mu}) = \vec{0}$ and $\mathrm{Cov}(\vec{Z}) = \Sigma^{-1/2}\,\mathrm{Cov}(\vec{X})\,\Sigma^{-1/2} = I$. This implies that $\vec{Z} \sim N_p(\vec{0}, I)$, i.e., $Z_1, \ldots, Z_p \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then
\begin{align*}
(\vec{X} - \vec{\mu})^\top \Sigma^{-1} (\vec{X} - \vec{\mu}) &= (\vec{X} - \vec{\mu})^\top \Sigma^{-1/2}\Sigma^{-1/2} (\vec{X} - \vec{\mu}) \\
&= \big(\Sigma^{-1/2}(\vec{X} - \vec{\mu})\big)^\top \big(\Sigma^{-1/2}(\vec{X} - \vec{\mu})\big) \\
&= \vec{Z}^\top \vec{Z} = Z_1^2 + \ldots + Z_p^2 \sim \chi^2_p.
\end{align*}
(See the simulation sketch at the end of this section.)

Property 5: If $\vec{X}_1, \ldots, \vec{X}_n$ are independent multivariate normal random vectors, then $\vec{X}_1 + \ldots + \vec{X}_n$ is a multivariate normal random vector.

Property 6: (The marginal distributions of a multivariate normal distribution are also multivariate normal.) Suppose $\vec{X} \sim N_p(\vec{\mu}, \Sigma)$. Consider the partition
\[
\vec{X} = \begin{pmatrix} \vec{X}^{(1)} \\ \vec{X}^{(2)} \end{pmatrix},
\]
where $\vec{X}^{(1)} \in \mathbb{R}^q$ and $\vec{X}^{(2)} \in \mathbb{R}^{p-q}$. Correspondingly,
\[
\vec{\mu} = \begin{pmatrix} \vec{\mu}^{(1)} \\ \vec{\mu}^{(2)} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]
Then $\vec{X}^{(1)} \sim N_q\big(\vec{\mu}^{(1)}, \Sigma_{11}\big)$ and $\vec{X}^{(2)} \sim N_{p-q}\big(\vec{\mu}^{(2)}, \Sigma_{22}\big)$. Moreover, for any $\vec{x} \in \mathbb{R}^{p-q}$,
\[
\vec{X}^{(1)} \mid (\vec{X}^{(2)} = \vec{x}) \sim N_q\big(\vec{\mu}^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(\vec{x} - \vec{\mu}^{(2)}),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big).
\]
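As a quick check of Property 4, the following sketch (assumptions mine: an arbitrary mean $\vec\mu$ and positive definite $\Sigma$, chosen only for illustration) simulates the quadratic form $(\vec X - \vec\mu)^\top \Sigma^{-1} (\vec X - \vec\mu)$ and compares its empirical mean and variance with the $\chi^2_p$ values $p$ and $2p$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 500_000

mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=n)

# Squared Mahalanobis distance for each draw: (x - mu)^T Sigma^{-1} (x - mu)
diff = X - mu
m2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

print(m2.mean(), m2.var())   # approx p = 3 and 2p = 6, matching chi^2_3
```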
5 Random samples

Recall that for an observed (fixed) sample $\vec{x}_1, \ldots, \vec{x}_n$ from a $p$-variate population, they form a data matrix
\[
X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{pmatrix} = \begin{pmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{pmatrix}.
\]
Its sample mean and sample covariance matrix have the formulas
\[
\bar{\vec{x}} := \frac{1}{n}\sum_{i=1}^n \vec{x}_i = \frac{1}{n}X^\top \vec{1}_n,
\]
and
\[
S := \frac{1}{n-1}\sum_{i=1}^n \big(\vec{x}_i - \bar{\vec{x}}\big)\big(\vec{x}_i - \bar{\vec{x}}\big)^\top = \frac{1}{n-1}\big(X - \vec{1}_n \bar{\vec{x}}^\top\big)^\top\big(X - \vec{1}_n \bar{\vec{x}}^\top\big).
\]
In contrast, we say $\vec{X}_1, \ldots, \vec{X}_n$ is a random sample of the $p$-variate population $F$ if they are independent and identically distributed random vectors from $F$. A random sample is used to model unobserved data that are intended to be observed. For example, assume the $p$-dimensional population is represented by a $p$-dimensional random vector
\[
\vec{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix} \sim N(\vec{\mu}, \Sigma),
\]
where $X_j$ represents the $j$-th variable. A collection of random vectors $\vec{X}_1, \ldots, \vec{X}_n \overset{\text{i.i.d.}}{\sim} N(\vec{\mu}, \Sigma)$ is referred to as a random sample of $\vec{X}$ with size $n$. As with the fixed sample, each random vector is denoted as
\[
\vec{X}_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix},
\]
and the random data matrix is represented as
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{pmatrix} = \begin{pmatrix} \vec{X}_1^\top \\ \vec{X}_2^\top \\ \vdots \\ \vec{X}_n^\top \end{pmatrix}.
\]
We can similarly define the sample mean
\[
\bar{\vec{X}} := \frac{1}{n}\sum_{i=1}^n \vec{X}_i = \frac{1}{n}X^\top \vec{1}_n,
\]
and the sample covariance matrix
\[
S := \frac{1}{n-1}\sum_{i=1}^n \big(\vec{X}_i - \bar{\vec{X}}\big)\big(\vec{X}_i - \bar{\vec{X}}\big)^\top = \frac{1}{n-1}\big(X - \vec{1}_n \bar{\vec{X}}^\top\big)^\top\big(X - \vec{1}_n \bar{\vec{X}}^\top\big).
\]

5.1 Algebraic properties

All the algebraic properties for fixed samples hold in the same way for random samples. For example, the linear transformation of the variates
\[
\underset{(q \times 1)}{\vec{Y}} = \underset{(q \times p)}{C}\;\underset{(p \times 1)}{\vec{X}} + \underset{(q \times 1)}{\vec{d}}
\]
yields the linear transformation of the random sample
\[
\vec{Y}_i = C\vec{X}_i + \vec{d}, \quad i = 1, \ldots, n.
\]
The sample covariance of $\{\vec{Y}_i\}_{i=1}^n$ is then
\[
S_Y = C S_X C^\top,
\]
where $S_X$ is the sample covariance of $\{\vec{X}_i\}_{i=1}^n$. In particular, under the linear combination
\[
Y_i = \vec{b}^\top \vec{X}_i = b_1 X_{i1} + \ldots + b_p X_{ip},
\]
the sample variance of $Y$ is $S_Y^2 = \vec{b}^\top S_X \vec{b}$.

5.2 Probabilistic properties

Let $\vec{X}_1, \ldots, \vec{X}_n$ be a random sample from $N_p(\vec{\mu}, \Sigma)$. We have the following properties of $S$ and $\bar{\vec{X}}$ (a simulation illustration follows the list):

1. $E\bar{\vec{X}} = \vec{\mu}$;

Proof.
\[
E\bar{\vec{X}} = E\left(\frac{1}{n}\sum_{i=1}^n \vec{X}_i\right) = \frac{1}{n}\sum_{i=1}^n E\vec{X}_i = \frac{1}{n}(n\vec{\mu}) = \vec{\mu}.
\]

2. $\mathrm{Cov}\big(\bar{\vec{X}}\big) = \frac{1}{n}\Sigma$;

Proof. Since $\vec{X}_1, \ldots, \vec{X}_n$ are independent, we have
\[
\mathrm{Cov}\big(\bar{\vec{X}}\big) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i=1}^n \vec{X}_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Cov}\big(\vec{X}_i\big) = \frac{1}{n^2}(n\Sigma) = \frac{1}{n}\Sigma.
\]

3. $\bar{\vec{X}} \sim N_p\big(\vec{\mu}, \frac{1}{n}\Sigma\big)$;

Proof. Since $\bar{\vec{X}} = \sum_{i=1}^n \frac{1}{n}\vec{X}_i$, and $\vec{X}_1, \ldots, \vec{X}_n$ are independent multivariate normal random vectors, we know $\bar{\vec{X}}$ is a multivariate normal random vector; its mean and covariance are given by Properties 1 and 2.

4. $n\big(\bar{\vec{X}} - \vec{\mu}\big)^\top \Sigma^{-1} \big(\bar{\vec{X}} - \vec{\mu}\big) \sim \chi^2_p$;

Proof. This is simply due to $\bar{\vec{X}} \sim N_p\big(\vec{\mu}, \frac{1}{n}\Sigma\big)$ together with Property 4 of Section 4.

5. $E(S) = \Sigma$;

Proof. Method 1: To evaluate $E(S)$, we first have
\begin{align*}
(n-1)S &= \sum_{i=1}^n \big(\vec{X}_i - \bar{\vec{X}}\big)\big(\vec{X}_i - \bar{\vec{X}}\big)^\top \\
&= \sum_{i=1}^n \big((\vec{X}_i - \vec{\mu}) - (\bar{\vec{X}} - \vec{\mu})\big)\big((\vec{X}_i - \vec{\mu}) - (\bar{\vec{X}} - \vec{\mu})\big)^\top \\
&= \sum_{i=1}^n \Big[\big(\vec{X}_i - \vec{\mu}\big)\big(\vec{X}_i - \vec{\mu}\big)^\top - \big(\bar{\vec{X}} - \vec{\mu}\big)\big(\vec{X}_i - \vec{\mu}\big)^\top - \big(\vec{X}_i - \vec{\mu}\big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top + \big(\bar{\vec{X}} - \vec{\mu}\big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top\Big] \\
&= \sum_{i=1}^n \big(\vec{X}_i - \vec{\mu}\big)\big(\vec{X}_i - \vec{\mu}\big)^\top - \big(\bar{\vec{X}} - \vec{\mu}\big)\Big(\sum_{i=1}^n \vec{X}_i - n\vec{\mu}\Big)^\top - \Big(\sum_{i=1}^n \vec{X}_i - n\vec{\mu}\Big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top + n\big(\bar{\vec{X}} - \vec{\mu}\big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top \\
&= \sum_{i=1}^n \big(\vec{X}_i - \vec{\mu}\big)\big(\vec{X}_i - \vec{\mu}\big)^\top - n\big(\bar{\vec{X}} - \vec{\mu}\big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top.
\end{align*}
This implies that
\begin{align*}
E\big((n-1)S\big) &= \sum_{i=1}^n E\Big[\big(\vec{X}_i - \vec{\mu}\big)\big(\vec{X}_i - \vec{\mu}\big)^\top\Big] - n\,E\Big[\big(\bar{\vec{X}} - \vec{\mu}\big)\big(\bar{\vec{X}} - \vec{\mu}\big)^\top\Big] \\
&= \sum_{i=1}^n \mathrm{Cov}\big(\vec{X}_i\big) - n\,\mathrm{Cov}\big(\bar{\vec{X}}\big) \\
&= \sum_{i=1}^n \Sigma - n \cdot \frac{1}{n}\Sigma = (n-1)\Sigma.
\end{align*}
This implies that $E(S) = \Sigma$.

Method 2: To evaluate $E(S)$, we only need to evaluate $E(S_{k\ell})$ for any $1 \le k, \ell \le p$:
\[
S_{k\ell} := \frac{1}{n-1}\sum_{i=1}^n \big(X_{ik} - \bar{X}_k\big)\big(X_{i\ell} - \bar{X}_\ell\big) = \frac{1}{n-1}\left(\sum_{i=1}^n X_{ik}X_{i\ell} - n\bar{X}_k\bar{X}_\ell\right).
\]
Then we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n E[X_{ik}X_{i\ell}] - n\,E[\bar{X}_k\bar{X}_\ell]\right).
\]
Recall the formula
\[
\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] \;\Rightarrow\; E[XY] = \mathrm{Cov}(X,Y) + E[X]E[Y].
\]
By $E[\vec{X}_i] = \vec{\mu}$ and $\mathrm{Cov}(\vec{X}_i) = \Sigma$, we have
\[
E[X_{ik}X_{i\ell}] = \mathrm{Cov}(X_{ik}, X_{i\ell}) + E[X_{ik}]E[X_{i\ell}] = \sigma_{k\ell} + \mu_k\mu_\ell.
\]
Similarly, by $E[\bar{\vec{X}}] = \vec{\mu}$ and $\mathrm{Cov}(\bar{\vec{X}}) = \frac{1}{n}\Sigma$, we have
\[
E[\bar{X}_k\bar{X}_\ell] = \mathrm{Cov}(\bar{X}_k, \bar{X}_\ell) + E[\bar{X}_k]E[\bar{X}_\ell] = \frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell.
\]
Therefore we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n (\sigma_{k\ell} + \mu_k\mu_\ell) - n\left(\frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell\right)\right) = \sigma_{k\ell}.
\]
This implies that $E[S] = \Sigma$.

6. $n\big(\bar{\vec{X}} - \vec{\mu}\big)^\top S^{-1} \big(\bar{\vec{X}} - \vec{\mu}\big) \sim \frac{(n-1)p}{n-p} F_{p,n-p}$.

The above properties are consistent with the well-known properties of a univariate sample: let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$ with sample mean $\bar{X}$ and sample variance $S^2$. Then:

1. $\bar{X} \sim N\big(\mu, \frac{1}{n}\sigma^2\big)$;

2. $\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0,1)$ and $\frac{n(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2_1$;

3. $E(S^2) = \sigma^2$;

4. $\frac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t_{n-1}$ and $\frac{n(\bar{X} - \mu)^2}{S^2} \sim F_{1,n-1}$.
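Properties 1, 2, and 5 can be illustrated by simulation. The sketch below is not from the notes; the dimension, sample size, parameters, and seed are arbitrary choices. It draws many independent random samples of size $n$ and averages $\bar{\vec X}$ and $S$ across replications.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 2, 10, 50_000

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# reps independent random samples, each of size n from N_p(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))       # shape (reps, n, p)

xbar = X.mean(axis=1)                                        # per-sample means
centered = X - xbar[:, None, :]
S = np.einsum('rij,rik->rjk', centered, centered) / (n - 1)  # per-sample S

print(xbar.mean(axis=0))            # approx mu          (E[X-bar] = mu)
print(np.cov(xbar, rowvar=False))   # approx Sigma / n   (Cov(X-bar) = Sigma/n)
print(S.mean(axis=0))               # approx Sigma       (E[S] = Sigma)
```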
6 Examples

Example 1: Let $\vec{X} \sim N_3(\vec{\mu}, \Sigma)$, where
\[
\vec{\mu} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad \text{and} \quad \Sigma = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}.
\]
(a) Find the conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$.

(b) Find the conditional distribution of $X_1$ given $X_2 = X_3$.

Answer:

(a) The conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$ is multivariate normal with
\[
E\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\,(0 - (-1)) = \begin{pmatrix} 1 \\ 1/2 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 5/2 \end{pmatrix}.
\]
(b) We know $\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix}$ is multivariate normal with
\[
E\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} E X_1 \\ E X_2 - E X_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 - (-1) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.
\]
Then we know $X_1 \mid (X_2 - X_3 = 0)$ is conditionally normal. The conditional mean is
\[
E(X_1 \mid X_2 - X_3 = 0) = 1 + 1 \times \frac{1}{3} \times (0 - 1) = \frac{2}{3},
\]
and the conditional variance is
\[
\mathrm{Var}(X_1 \mid X_2 - X_3 = 0) = 3 - 1 \times \frac{1}{3} \times 1 = \frac{8}{3}.
\]

Example 2: Suppose the population covariance matrix of $\vec{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ is
\[
\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}.
\]
We aim at finding $\mathrm{Var}(X_1 + X_2)$.

Method 1: Since $X_1 + X_2 = [1, 1]\vec{X}$, we know its population variance is
\[
\mathrm{Var}(X_1 + X_2) = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 7.
\]
Method 2: We can avoid using matrix algebra. Notice that from $\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$, we have
\[
\mathrm{Var}(X_1) = 3, \quad \mathrm{Var}(X_2) = 2, \quad \mathrm{Cov}(X_1, X_2) = 1.
\]
Then
\begin{align*}
\mathrm{Var}(X_1 + X_2) &= \mathrm{Cov}(X_1 + X_2, X_1 + X_2) \\
&= \mathrm{Cov}(X_1, X_1) + \mathrm{Cov}(X_1, X_2) + \mathrm{Cov}(X_2, X_1) + \mathrm{Cov}(X_2, X_2) \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2) = 7.
\end{align*}

Example 3: Let
\[
\vec{X} \sim N_3\left(\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 5 \end{pmatrix}\right).
\]
Find the marginal distributions of $X_2$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix}$.

Answer: $X_2 \sim N(0, 4)$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix} \sim N\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 \\ 1 & 5 \end{pmatrix}\right)$.
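As a cross-check of Example 1 (a sketch I am adding, not part of the notes), the conditional-distribution formula from Property 6 of Section 4 can be evaluated numerically:

```python
import numpy as np

mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# (a) (X1, X2) | X3 = 0, via the partitioned conditioning formula
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]
cond_mean = mu[:2] + (S12 / S22) @ (np.array([0.0]) - mu[2:])
cond_cov = S11 - S12 @ np.linalg.inv(S22) @ S21
print(cond_mean)   # [1.  0.5]
print(cond_cov)    # [[3.  1. ] [1.  2.5]]

# (b) X1 | X2 - X3 = 0: first transform to (X1, X2 - X3)
A = np.array([[1.0, 0.0,  0.0],
              [0.0, 1.0, -1.0]])
m, V = A @ mu, A @ Sigma @ A.T      # mean (1, 1), covariance [[3, 1], [1, 3]]
print(m[0] + V[0, 1] / V[1, 1] * (0 - m[1]))    # 0.666... = 2/3
print(V[0, 0] - V[0, 1] / V[1, 1] * V[1, 0])    # 2.666... = 8/3
```

Both parts reproduce the hand computations above: mean $(1, 1/2)^\top$ and covariance $\begin{pmatrix} 3 & 1 \\ 1 & 5/2 \end{pmatrix}$ for (a), and mean $2/3$ with variance $8/3$ for (b).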