
STA135 Lecture 3: Random Vectors, Multivariate Normality and Random Samples

Xiaodong Li UC Davis

1 Review of expectation, variance and covariance

Let's focus on the case of continuous random variables. Let $f_X(x)$ be the pdf of $X$. Its expectation is defined as
\[
E[X] := \mu_X := \int_{-\infty}^{\infty} x f_X(x)\,dx,
\]
and its variance is defined as
\[
\mathrm{Var}(X) := E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.
\]

Let $X$ and $Y$ be two jointly distributed random variables with joint pdf $f_{X,Y}(x,y)$. Their covariance is defined as
\[
\mathrm{Cov}(X,Y) = E\left[(X - E[X])(Y - E[Y])\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{X,Y}(x,y)\,dx\,dy.
\]

Properties of expectation, variance, covariance:

1. $\mathrm{Var}(X) = E(X^2) - (E(X))^2$.

2. $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$.

3. $E\left(a + \sum_{i=1}^n b_i X_i\right) = a + \sum_{i=1}^n b_i E(X_i)$.

4. $\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$.

5. $\mathrm{Var}(X) = \mathrm{Cov}(X,X)$.

6. $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$.

7. $\mathrm{Cov}\left(a + \sum_{i=1}^n b_i X_i,\; c + \sum_{j=1}^m d_j Y_j\right) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_i, Y_j)$.

8. $\mathrm{Var}\left(a + \sum_{i=1}^n b_i X_i\right) = \sum_{i=1}^n b_i^2\,\mathrm{Var}(X_i) + 2\sum_{1 \le i < j \le n} b_i b_j\,\mathrm{Cov}(X_i, X_j)$.
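As a quick numerical illustration of how these identities behave, the following Python sketch checks properties 2 and 4 by simulation; the distributions and the constants $a, b$ are arbitrary choices, not part of the lecture.

```python
import numpy as np

# Monte Carlo sanity check of properties 2 and 4 above; the distributions
# and constants are illustrative choices, not from the lecture.
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)          # any distribution works here
Y = 0.5 * X + rng.normal(size=n)                # a variable correlated with X

a, b = 3.0, -2.0
# Property 4: Var(a + bX) = b^2 Var(X)
print(np.var(a + b * X), b**2 * np.var(X))
# Property 2: Cov(X, Y) = E(XY) - E(X) E(Y)  (exact for the plug-in estimates)
print(np.mean(X * Y) - X.mean() * Y.mean(), np.cov(X, Y, bias=True)[0, 1])
```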

Normal distribution and properties

The pdf of $N(\mu, \sigma^2)$ is

\[
f(x) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\]

In particular, $N(0,1)$ is referred to as the standard normal distribution. Let $X \sim N(\mu, \sigma^2)$. We have the following properties:

1. $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$;

2. If $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$.

3. If $X_1, \ldots, X_n$ are independent and $X_i \sim N(\mu_i, \sigma_i^2)$, then

\[
a + \sum_{i=1}^n b_i X_i \sim N\left(a + \sum_{i=1}^n b_i \mu_i,\; \sum_{i=1}^n b_i^2 \sigma_i^2\right).
\]

4. If $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(0,1)$, then
\[
\sum_{i=1}^n X_i^2 \sim \chi^2_n.
\]
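Property 4 can be checked numerically. The sketch below (the degrees of freedom $n = 5$ and the replication count are arbitrary choices) compares simulated quantiles of a sum of squared standard normals with the theoretical $\chi^2_n$ quantiles.

```python
import numpy as np
from scipy import stats

# Simulated vs. theoretical chi-square quantiles for property 4.
rng = np.random.default_rng(1)
n, reps = 5, 200_000
Z = rng.standard_normal((reps, n))
q = (Z ** 2).sum(axis=1)                 # each row gives one chi^2_n draw
probs = [0.25, 0.50, 0.90]
print(np.quantile(q, probs))             # empirical quantiles
print(stats.chi2.ppf(probs, df=n))       # theoretical quantiles (close match)
```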

2 Random vectors and matrices

One motivation for studying random vectors: constructing a confidence region for the "true" population mean $\vec\mu$.

A vector
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}
\]
is referred to as a random vector if $X_1, \ldots, X_p$ are jointly distributed random variables. Its expectation or population mean is defined as
\[
E\vec X = \begin{pmatrix} E X_1 \\ E X_2 \\ \vdots \\ E X_p \end{pmatrix} := \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} := \vec\mu.
\]
A matrix
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1m} \\ X_{21} & X_{22} & \ldots & X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k1} & X_{k2} & \ldots & X_{km} \end{pmatrix}
\]
is referred to as a random matrix if $X_{11}, X_{12}, \ldots, X_{km}$ are jointly distributed random variables. Its expectation is defined as
\[
E X = \begin{pmatrix} E X_{11} & E X_{12} & \ldots & E X_{1m} \\ E X_{21} & E X_{22} & \ldots & E X_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ E X_{k1} & E X_{k2} & \ldots & E X_{km} \end{pmatrix}.
\]

For random vectors $\vec X, \vec Y \in \mathbb{R}^p$,
\[
E(\vec X + \vec Y) = E\vec X + E\vec Y.
\]

For a random vector $\vec X \in \mathbb{R}^p$ and deterministic $\vec a \in \mathbb{R}^p$ and $c \in \mathbb{R}$,
\[
E(\vec a^\top \vec X + c) = \vec a^\top E\vec X + c.
\]

For a random vector $\vec X \in \mathbb{R}^p$ and deterministic $C \in \mathbb{R}^{q\times p}$ and $\vec d \in \mathbb{R}^q$,
\[
E(C\vec X + \vec d) = C\,E\vec X + \vec d.
\]

For random matrices $X, Y \in \mathbb{R}^{k\times m}$,
\[
E(X + Y) = E X + E Y.
\]
For a random matrix $X \in \mathbb{R}^{k\times m}$ and deterministic $A \in \mathbb{R}^{l\times k}$ and $B \in \mathbb{R}^{m\times n}$,
\[
E(AXB) = A\,(E X)\,B.
\]

3 Population covariance matrix

Population covariance matrix. Let
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \in \mathbb{R}^p
\]
be a random vector with population mean $E\vec X = \vec\mu$. Denote $\mathrm{Cov}(X_j, X_k) = \sigma_{jk}$. Define the population covariance matrix of $\vec X$ as
\[
\mathrm{Cov}(\vec X) = \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp} \end{pmatrix}.
\]

We can derive the following formula:
\[
\begin{aligned}
\Sigma &= \begin{pmatrix} \mathrm{Cov}(X_1,X_1) & \mathrm{Cov}(X_1,X_2) & \ldots & \mathrm{Cov}(X_1,X_p) \\ \mathrm{Cov}(X_2,X_1) & \mathrm{Cov}(X_2,X_2) & \ldots & \mathrm{Cov}(X_2,X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,X_1) & \mathrm{Cov}(X_p,X_2) & \ldots & \mathrm{Cov}(X_p,X_p) \end{pmatrix} \\
&= \begin{pmatrix} E\left[(X_1-\mu_1)(X_1-\mu_1)\right] & \ldots & E\left[(X_1-\mu_1)(X_p-\mu_p)\right] \\ \vdots & \ddots & \vdots \\ E\left[(X_p-\mu_p)(X_1-\mu_1)\right] & \ldots & E\left[(X_p-\mu_p)(X_p-\mu_p)\right] \end{pmatrix} \\
&= E\begin{pmatrix} (X_1-\mu_1)(X_1-\mu_1) & \ldots & (X_1-\mu_1)(X_p-\mu_p) \\ \vdots & \ddots & \vdots \\ (X_p-\mu_p)(X_1-\mu_1) & \ldots & (X_p-\mu_p)(X_p-\mu_p) \end{pmatrix} \\
&= E\left[\begin{pmatrix} X_1-\mu_1 \\ \vdots \\ X_p-\mu_p \end{pmatrix} \begin{pmatrix} X_1-\mu_1 & \ldots & X_p-\mu_p \end{pmatrix}\right] \\
&= E\left[(\vec X - E\vec X)(\vec X - E\vec X)^\top\right].
\end{aligned}
\]
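The outer-product formula $\Sigma = E\left[(\vec X - E\vec X)(\vec X - E\vec X)^\top\right]$ can be approximated by a Monte Carlo average, as in the sketch below; the particular $\vec\mu$ and $\Sigma$ are made-up values used only for illustration.

```python
import numpy as np

# Approximating Cov(X) = E[(X - mu)(X - mu)^T] by averaging outer products.
rng = np.random.default_rng(2)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=500_000)   # rows are draws of X
centered = X - mu
Sigma_hat = centered.T @ centered / X.shape[0]          # average of outer products
print(np.round(Sigma_hat, 2))                           # should be close to Sigma
```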

Population cross-covariance matrix. Let
\[
\vec X := \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \quad\text{and}\quad \vec Y := \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_q \end{pmatrix}
\]
be two jointly distributed random vectors with population means $\vec\mu_x$ and $\vec\mu_y$. Define the population cross-covariance matrix between $\vec X$ and $\vec Y$ as
\[
\mathrm{Cov}(\vec X, \vec Y) = \begin{pmatrix} \mathrm{Cov}(X_1,Y_1) & \mathrm{Cov}(X_1,Y_2) & \ldots & \mathrm{Cov}(X_1,Y_q) \\ \mathrm{Cov}(X_2,Y_1) & \mathrm{Cov}(X_2,Y_2) & \ldots & \mathrm{Cov}(X_2,Y_q) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_p,Y_1) & \mathrm{Cov}(X_p,Y_2) & \ldots & \mathrm{Cov}(X_p,Y_q) \end{pmatrix}.
\]

We can derive the following formula:

\[
\mathrm{Cov}(\vec X, \vec Y) = E\left[(\vec X - E\vec X)(\vec Y - E\vec Y)^\top\right] \in \mathbb{R}^{p\times q}. \quad\text{(Why?)}
\]

Properties of population covariance

1. $\mathrm{Cov}(\vec X) = \mathrm{Cov}(\vec X, \vec X)$.

2. $\mathrm{Cov}(b\vec X + \vec d) = b^2\,\mathrm{Cov}(\vec X)$.

3. $\mathrm{Cov}(\vec X, \vec Y) = \mathrm{Cov}(\vec Y, \vec X)^\top$.

4. $\mathrm{Cov}(C\vec X + \vec d) = C\,\mathrm{Cov}(\vec X)\,C^\top$.

Proof. The population mean of $C\vec X + \vec d$ is
\[
E(C\vec X + \vec d) = C\,E(\vec X) + \vec d = C\vec\mu + \vec d.
\]

The population covariance matrix of $C\vec X + \vec d$ is
\[
\begin{aligned}
\mathrm{Cov}(C\vec X + \vec d) &= E\left[(C\vec X + \vec d - (C\vec\mu + \vec d))(C\vec X + \vec d - (C\vec\mu + \vec d))^\top\right] \\
&= E\left[C(\vec X - \vec\mu)(\vec X - \vec\mu)^\top C^\top\right] \\
&= C\,E\left[(\vec X - \vec\mu)(\vec X - \vec\mu)^\top\right] C^\top \\
&= C\Sigma C^\top.
\end{aligned}
\]

5. $\mathrm{Cov}(C\vec X, D\vec Y) = C\,\mathrm{Cov}(\vec X, \vec Y)\,D^\top$. (Why?)

6. $\mathrm{Cov}\left(\sum_{i=1}^n b_i \vec X_i,\; \sum_{j=1}^m d_j \vec Y_j\right) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(\vec X_i, \vec Y_j)$.

Proof. To show this result, for any $i = 1, \ldots, n$ and $j = 1, \ldots, m$, we denote
\[
\vec X_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix}, \qquad \vec Y_j = \begin{pmatrix} Y_{j1} \\ Y_{j2} \\ \vdots \\ Y_{jq} \end{pmatrix}.
\]

The $(\ell, k)$-entry of the left-hand side is
\[
\mathrm{Cov}\left(\sum_{i=1}^n b_i X_{i\ell},\; \sum_{j=1}^m d_j Y_{jk}\right).
\]
The $(\ell, k)$-entry of the right-hand side is
\[
\sum_{i=1}^n \sum_{j=1}^m b_i d_j\,\mathrm{Cov}(X_{i\ell}, Y_{jk}).
\]
These agree by property 7 of scalar covariances, so the left-hand side equals the right-hand side.

7. If $\vec X \in \mathbb{R}^p$ and $\vec Y \in \mathbb{R}^q$ are independent random vectors, then $\mathrm{Cov}(\vec X, \vec Y) = 0_{p\times q}$. (Why?)

8. For mutually independent random vectors $\vec X_1, \ldots, \vec X_n \in \mathbb{R}^p$, we have
\[
\mathrm{Cov}(a_1\vec X_1 + \ldots + a_n\vec X_n + \vec c) = a_1^2\,\mathrm{Cov}(\vec X_1) + \ldots + a_n^2\,\mathrm{Cov}(\vec X_n).
\]
(Why?)
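Property 4 in this list is the workhorse for later results. The sketch below ($C$, $\vec d$, $\vec\mu$, and $\Sigma$ are arbitrary illustrative choices) verifies $\mathrm{Cov}(C\vec X + \vec d) = C\Sigma C^\top$ by simulation.

```python
import numpy as np

# Numerical illustration of Cov(C X + d) = C Cov(X) C^T.
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + np.eye(3)              # a generic positive definite covariance
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
d = np.array([5.0, -3.0])

X = rng.multivariate_normal(mu, Sigma, size=400_000)
Y = X @ C.T + d                          # each row is C x_i + d
print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(C @ Sigma @ C.T, 2))      # the two matrices should nearly agree
```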

4 Multivariate normality

Definition 1. A $p \times 1$ random vector
\[
\vec X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \sim N_p(\vec\mu, \Sigma)
\]
if and only if its pdf is
\[
f(\vec x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\vec x - \vec\mu)^\top \Sigma^{-1} (\vec x - \vec\mu)\right).
\]

In this case, $X_1, \ldots, X_p$ are said to be jointly normally distributed as $N_p(\vec\mu, \Sigma)$. We also have $E[\vec X] = \vec\mu$ and $\mathrm{Cov}(\vec X) = \Sigma$. Moreover, the distribution $N_p(\vec 0, I_p)$ is referred to as the $p$-dimensional standard normal distribution.
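For concreteness, the pdf in Definition 1 can be evaluated directly and cross-checked against scipy's implementation; the $\vec\mu$, $\Sigma$, and evaluation point $\vec x$ below are arbitrary.

```python
import numpy as np
from scipy import stats

# Evaluating the N_p(mu, Sigma) density both from the formula and via scipy.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.5])

p = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
print(f)
print(stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value
```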

Property 1: If $\vec X \sim N_p(\vec\mu, \Sigma)$, then for any $C \in \mathbb{R}^{q\times p}$ and $\vec d \in \mathbb{R}^q$,
\[
C\vec X + \vec d \sim N_q(C\vec\mu + \vec d,\; C\Sigma C^\top).
\]

Property 2: If $\vec X \sim N_p(\vec\mu, \Sigma)$, then for any $\vec a \in \mathbb{R}^p$ and $b \in \mathbb{R}$,
\[
a_1 X_1 + \ldots + a_p X_p + b = \vec a^\top \vec X + b \sim N(\vec a^\top \vec\mu + b,\; \vec a^\top \Sigma \vec a).
\]

Property 3: If $\vec X = (X_1, X_2, \ldots, X_p)^\top \sim N_p(\vec\mu, \Sigma)$, then $X_1, X_2, \ldots, X_p$ are mutually independent if and only if $\Sigma$ is a diagonal matrix, i.e., $\mathrm{Cov}(X_j, X_k) = 0$ for any $j \neq k$.

Property 4: Suppose $\Sigma$ is positive definite. If $\vec X \sim N_p(\vec\mu, \Sigma)$, then
\[
(\vec X - \vec\mu)^\top \Sigma^{-1} (\vec X - \vec\mu) \sim \chi^2_p.
\]

Proof. Let $\vec Z = \Sigma^{-1/2}(\vec X - \vec\mu)$. We have $E(\vec Z) = \Sigma^{-1/2}(\vec\mu - \vec\mu) = \vec 0$ and $\mathrm{Cov}(\vec Z) = \Sigma^{-1/2}\,\mathrm{Cov}(\vec X)\,\Sigma^{-1/2} = I$. This implies that $\vec Z \sim N_p(\vec 0, I)$, i.e., $Z_1, \ldots, Z_p \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then
\[
\begin{aligned}
(\vec X - \vec\mu)^\top \Sigma^{-1} (\vec X - \vec\mu) &= (\vec X - \vec\mu)^\top \Sigma^{-1/2}\Sigma^{-1/2}(\vec X - \vec\mu) \\
&= \left(\Sigma^{-1/2}(\vec X - \vec\mu)\right)^\top \left(\Sigma^{-1/2}(\vec X - \vec\mu)\right) \\
&= \vec Z^\top \vec Z = Z_1^2 + \ldots + Z_p^2 \\
&\sim \chi^2_p.
\end{aligned}
\]
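The quadratic form in Property 4 can also be checked by simulation, as in the sketch below; the values of $\vec\mu$ and $\Sigma$ are arbitrary and used only for illustration.

```python
import numpy as np
from scipy import stats

# Monte Carlo check that (X - mu)^T Sigma^{-1} (X - mu) ~ chi^2_p for p = 3.
rng = np.random.default_rng(4)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)
diff = X - mu
q = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)   # quadratic forms
print(np.quantile(q, [0.50, 0.95]))
print(stats.chi2.ppf([0.50, 0.95], df=3))      # should roughly match
```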

Property 5: If X~ 1,..., X~ n are independent multivariate normal random vectors, then X~ 1 + ... + X~ n is a multivariate normal random vector.

Property 6: (The marginal distributions of a multivariate normal distribution are also multivariate normal distributions.) Suppose $\vec X \sim N_p(\vec\mu, \Sigma)$. Consider the partition
\[
\vec X = \begin{pmatrix} \vec X^{(1)} \\ \vec X^{(2)} \end{pmatrix},
\]
where $\vec X^{(1)} \in \mathbb{R}^q$ and $\vec X^{(2)} \in \mathbb{R}^{p-q}$. Correspondingly,
\[
\vec\mu = \begin{pmatrix} \vec\mu^{(1)} \\ \vec\mu^{(2)} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]

Then $\vec X^{(1)} \sim N_q\left(\vec\mu^{(1)}, \Sigma_{11}\right)$ and $\vec X^{(2)} \sim N_{p-q}\left(\vec\mu^{(2)}, \Sigma_{22}\right)$. Moreover, for any $\vec x \in \mathbb{R}^{p-q}$,
\[
\vec X^{(1)} \mid (\vec X^{(2)} = \vec x) \sim N_q\left(\vec\mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(\vec x - \vec\mu^{(2)}),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\]
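The conditional-distribution formula in Property 6 is easy to compute numerically. The sketch below uses $q = 1$ (so $\vec X^{(1)} = X_1$ and $\vec X^{(2)} = (X_2, X_3)^\top$) with illustrative values of $\vec\mu$, $\Sigma$, and the conditioning point.

```python
import numpy as np

# Conditional mean and covariance from Property 6 with q = 1;
# mu, Sigma, and x2 are illustrative placeholders.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
q = 1
mu1, mu2 = mu[:q], mu[q:]
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x2 = np.array([0.0, 0.0])                          # conditioning value for X^(2)
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```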

5 Random samples

Recall that an observed (fixed) sample $\vec x_1, \ldots, \vec x_n$ from a $p$-variate population forms the data matrix
\[
X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{pmatrix} = \begin{pmatrix} \vec x_1^\top \\ \vec x_2^\top \\ \vdots \\ \vec x_n^\top \end{pmatrix}.
\]

Its sample mean and sample covariance matrix have the formulas
\[
\bar{\vec x} := \frac{1}{n}\sum_{i=1}^n \vec x_i = \frac{1}{n} X^\top \vec 1_n,
\]
and
\[
S := \frac{1}{n-1}\sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})^\top = \frac{1}{n-1}(X - \vec 1_n \bar{\vec x}^\top)^\top (X - \vec 1_n \bar{\vec x}^\top).
\]

In contrast, we say $\vec X_1, \ldots, \vec X_n$ is a random sample of the $p$-variate population $F$ if they are independent and identically distributed random vectors from $F$. A random sample is used to model unobserved data that are intended to be observed. For example, assume the $p$-dimensional population is represented by a $p$-dimensional random vector
\[
\vec X = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix} \sim N(\vec\mu, \Sigma),
\]
where $X_j$ represents the $j$-th variable. A collection of random vectors $\vec X_1, \ldots, \vec X_n \overset{\text{i.i.d.}}{\sim} N(\vec\mu, \Sigma)$ is referred to as a random sample of $\vec X$ with size $n$. As with the fixed sample, each random vector is denoted as
\[
\vec X_i = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{ip} \end{pmatrix},
\]
and the random data matrix is represented as
\[
X = \begin{pmatrix} X_{11} & X_{12} & \ldots & X_{1p} \\ X_{21} & X_{22} & \ldots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \ldots & X_{np} \end{pmatrix} = \begin{pmatrix} \vec X_1^\top \\ \vec X_2^\top \\ \vdots \\ \vec X_n^\top \end{pmatrix}.
\]
We can similarly define the sample mean
\[
\bar{\vec X} := \frac{1}{n}\sum_{i=1}^n \vec X_i = \frac{1}{n} X^\top \vec 1_n,
\]
and the sample covariance matrix
\[
S := \frac{1}{n-1}\sum_{i=1}^n (\vec X_i - \bar{\vec X})(\vec X_i - \bar{\vec X})^\top = \frac{1}{n-1}(X - \vec 1_n \bar{\vec X}^\top)^\top (X - \vec 1_n \bar{\vec X}^\top).
\]
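The matrix forms of $\bar{\vec X}$ and $S$ translate directly into code. Below is a minimal sketch using a simulated $n \times p$ data matrix (the data themselves are placeholders) that also checks the formulas against numpy's built-in routines.

```python
import numpy as np

# Sample mean and sample covariance via the matrix formulas above.
rng = np.random.default_rng(5)
n, p = 50, 3
X = rng.standard_normal((n, p))                 # placeholder n x p data matrix
ones = np.ones((n, 1))

xbar = X.T @ ones / n                           # (1/n) X^T 1_n, a p x 1 vector
centered = X - ones @ xbar.T                    # X - 1_n xbar^T
S = centered.T @ centered / (n - 1)

print(np.allclose(xbar.ravel(), X.mean(axis=0)))        # True
print(np.allclose(S, np.cov(X, rowvar=False)))          # True
```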

5.1 Algebraic properties

All the algebraic properties for fixed samples hold in the same way for random samples. For example, the linear transformation of the variates
\[
\underset{(q\times 1)}{\vec Y} = \underset{(q\times p)}{C}\;\underset{(p\times 1)}{\vec X} + \underset{(q\times 1)}{\vec d}
\]
yields the linear transformation of the random sample
\[
\vec Y_i = C\vec X_i + \vec d, \qquad i = 1, \ldots, n.
\]

The sample covariance matrix of $\{\vec Y_i\}_{i=1}^n$ is then
\[
S_Y = C S_X C^\top,
\]
where $S_X$ is the sample covariance matrix of $\{\vec X_i\}_{i=1}^n$.

In particular, under the linear combination
\[
Y_i = \vec b^\top \vec X_i = b_1 X_{i1} + \ldots + b_p X_{ip},
\]
the sample variance of $Y_1, \ldots, Y_n$ is $s_Y^2 = \vec b^\top S_X \vec b$.
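This identity is exact for any finite sample, not just in expectation; a small numerical check (with arbitrary $C$, $\vec d$, and data) is sketched below.

```python
import numpy as np

# Exact check of S_Y = C S_X C^T for a transformed sample Y_i = C X_i + d.
rng = np.random.default_rng(6)
X = rng.standard_normal((40, 3))                 # 40 observations in R^3
C = np.array([[1.0, -1.0, 0.0],
              [2.0, 0.0, 1.0]])
d = np.array([0.5, -0.5])

Y = X @ C.T + d                                  # rows are C x_i + d
S_X = np.cov(X, rowvar=False)
S_Y = np.cov(Y, rowvar=False)
print(np.allclose(S_Y, C @ S_X @ C.T))           # True: the identity is exact
```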

5.2 Probabilistic properties

Let $\vec X_1, \ldots, \vec X_n$ be a random sample from $N_p(\vec\mu, \Sigma)$. We have the following properties of $S$ and $\bar{\vec X}$:

1. $E(\bar{\vec X}) = \vec\mu$;

Proof.
\[
E(\bar{\vec X}) = E\left(\frac{1}{n}\sum_{i=1}^n \vec X_i\right) = \frac{1}{n}\sum_{i=1}^n E(\vec X_i) = \frac{1}{n}(n\vec\mu) = \vec\mu.
\]

2. $\mathrm{Cov}(\bar{\vec X}) = \frac{1}{n}\Sigma$;

Proof. Since $\vec X_1, \ldots, \vec X_n$ are independent, we have
\[
\mathrm{Cov}(\bar{\vec X}) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i=1}^n \vec X_i\right) = \frac{1}{n^2}\,\mathrm{Cov}\left(\sum_{i=1}^n \vec X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Cov}(\vec X_i) = \frac{1}{n^2}(n\Sigma) = \frac{1}{n}\Sigma.
\]

3. $\bar{\vec X} \sim N_p\left(\vec\mu, \frac{1}{n}\Sigma\right)$;

Proof. Since $\bar{\vec X} = \sum_{i=1}^n \frac{1}{n}\vec X_i$, and $\vec X_1, \ldots, \vec X_n$ are independent multivariate normal random vectors, we know $\bar{\vec X}$ is a multivariate normal random vector. Its mean and covariance are given by properties 1 and 2.

4. $n(\bar{\vec X} - \vec\mu)^\top \Sigma^{-1} (\bar{\vec X} - \vec\mu) \sim \chi^2_p$;

Proof. This is simply due to $\bar{\vec X} \sim N_p\left(\vec\mu, \frac{1}{n}\Sigma\right)$ together with Property 4 of the multivariate normal distribution.

5. E(S) = Σ;

Proof. Method 1: To evaluate $E(S)$, we first have
\[
\begin{aligned}
(n-1)S &= \sum_{i=1}^n (\vec X_i - \bar{\vec X})(\vec X_i - \bar{\vec X})^\top \\
&= \sum_{i=1}^n \left[(\vec X_i - \vec\mu) - (\bar{\vec X} - \vec\mu)\right]\left[(\vec X_i - \vec\mu) - (\bar{\vec X} - \vec\mu)\right]^\top \\
&= \sum_{i=1}^n \left[(\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - (\bar{\vec X} - \vec\mu)(\vec X_i - \vec\mu)^\top - (\vec X_i - \vec\mu)(\bar{\vec X} - \vec\mu)^\top + (\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top\right] \\
&= \sum_{i=1}^n (\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - (\bar{\vec X} - \vec\mu)\left(\sum_{i=1}^n \vec X_i - n\vec\mu\right)^\top - \left(\sum_{i=1}^n \vec X_i - n\vec\mu\right)(\bar{\vec X} - \vec\mu)^\top + n(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top \\
&= \sum_{i=1}^n (\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top - n(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top.
\end{aligned}
\]
This implies that
\[
\begin{aligned}
E\left[(n-1)S\right] &= \sum_{i=1}^n E\left[(\vec X_i - \vec\mu)(\vec X_i - \vec\mu)^\top\right] - n\,E\left[(\bar{\vec X} - \vec\mu)(\bar{\vec X} - \vec\mu)^\top\right] \\
&= \sum_{i=1}^n \mathrm{Cov}(\vec X_i) - n\,\mathrm{Cov}(\bar{\vec X}) \\
&= \sum_{i=1}^n \Sigma - n\cdot\frac{1}{n}\Sigma \\
&= (n-1)\Sigma.
\end{aligned}
\]
This implies that $E(S) = \Sigma$.

Method 2: To evaluate $E(S)$, we only need to evaluate $E(S_{k\ell})$ for any $1 \le k, \ell \le p$. We have
\[
S_{k\ell} := \frac{1}{n-1}\sum_{i=1}^n (X_{ik} - \bar X_k)(X_{i\ell} - \bar X_\ell) = \frac{1}{n-1}\left(\sum_{i=1}^n X_{ik}X_{i\ell} - n\bar X_k \bar X_\ell\right).
\]
Then we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n E[X_{ik}X_{i\ell}] - n\,E[\bar X_k \bar X_\ell]\right).
\]
Recall the formula
\[
\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] \quad\Longrightarrow\quad E[XY] = \mathrm{Cov}(X,Y) + E[X]E[Y].
\]
By $E[\vec X_i] = \vec\mu$ and $\mathrm{Cov}(\vec X_i) = \Sigma$, we have
\[
E[X_{ik}X_{i\ell}] = \mathrm{Cov}(X_{ik}, X_{i\ell}) + E[X_{ik}]E[X_{i\ell}] = \sigma_{k\ell} + \mu_k\mu_\ell.
\]
Similarly, by $E[\bar{\vec X}] = \vec\mu$ and $\mathrm{Cov}(\bar{\vec X}) = \frac{1}{n}\Sigma$, we have
\[
E[\bar X_k \bar X_\ell] = \mathrm{Cov}(\bar X_k, \bar X_\ell) + E[\bar X_k]E[\bar X_\ell] = \frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell.
\]
Therefore we have
\[
E[S_{k\ell}] = \frac{1}{n-1}\left(\sum_{i=1}^n (\sigma_{k\ell} + \mu_k\mu_\ell) - n\left(\frac{1}{n}\sigma_{k\ell} + \mu_k\mu_\ell\right)\right) = \sigma_{k\ell}.
\]
This implies that $E[S] = \Sigma$.

6. $n(\bar{\vec X} - \vec\mu)^\top S^{-1} (\bar{\vec X} - \vec\mu) \sim \frac{(n-1)p}{n-p}\,F_{p,\,n-p}$.

The above properties are consistent with the well-known properties for a univariate sample: let

$X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, with sample mean $\bar X$ and sample variance $S^2$. Then

1. $\bar X \sim N\left(\mu, \frac{1}{n}\sigma^2\right)$;

2. $\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \sim N(0,1)$ and $\frac{n(\bar X - \mu)^2}{\sigma^2} \sim \chi^2_1$;

3. $E(S^2) = \sigma^2$;

4. $\frac{\sqrt{n}(\bar X - \mu)}{S} \sim t_{n-1}$ and $\frac{n(\bar X - \mu)^2}{S^2} \sim F_{1,n-1}$.
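Properties 1, 2, and 5 above can be illustrated by repeating the sampling experiment many times and averaging. The sketch below (with arbitrary $\vec\mu$, $\Sigma$, sample size, and replication count) does exactly that.

```python
import numpy as np

# Monte Carlo illustration of E(Xbar) = mu, Cov(Xbar) = Sigma / n, E(S) = Sigma.
rng = np.random.default_rng(7)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
n, reps = 10, 50_000

xbars, Ss = [], []
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbars.append(X.mean(axis=0))
    Ss.append(np.cov(X, rowvar=False))

xbars = np.array(xbars)
print(np.round(xbars.mean(axis=0), 2))              # close to mu
print(np.round(np.cov(xbars, rowvar=False), 3))     # close to Sigma / n
print(np.round(np.mean(Ss, axis=0), 2))             # close to Sigma
```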

6 Examples

Example 1: Let $\vec X \sim N_3(\vec\mu, \Sigma)$, where
\[
\vec\mu = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}.
\]
(a) Find the conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$.

(b) Find the conditional distribution of $X_1$ given $X_2 = X_3$.

Answer:

(a) The conditional distribution of $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ given $X_3 = 0$ is multivariate normal with
\[
E\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\,(0 - (-1)) = \begin{pmatrix} 1 \\ 1/2 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\left[\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\middle|\, X_3 = 0\right] = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix}\frac{1}{2}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 5/2 \end{pmatrix}.
\]
(b) We know $\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix}$ is multivariate normal with
\[
E\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} E X_1 \\ E X_2 - E X_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 - (-1) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
and
\[
\mathrm{Cov}\begin{pmatrix} X_1 \\ X_2 - X_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.
\]
Then we know $X_1 \mid (X_2 - X_3 = 0)$ is conditionally normal. The conditional mean is
\[
E(X_1 \mid X_2 - X_3 = 0) = 1 + 1 \times \frac{1}{3} \times (0 - 1) = \frac{2}{3},
\]
and the conditional variance is
\[
\mathrm{Var}(X_1 \mid X_2 - X_3 = 0) = 3 - 1 \times \frac{1}{3} \times 1 = \frac{8}{3}.
\]
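Part (b) can be re-derived numerically with a few lines; the sketch below simply repeats the calculation above in code.

```python
import numpy as np

# Numerical re-derivation of Example 1(b): distribution of (X1, X2 - X3),
# then the conditional mean and variance of X1 given X2 - X3 = 0.
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])          # maps X to (X1, X2 - X3)

m = A @ mu                                # (1, 1)
V = A @ Sigma @ A.T                       # [[3, 1], [1, 3]]
cond_mean = m[0] + V[0, 1] / V[1, 1] * (0 - m[1])    # 2/3
cond_var = V[0, 0] - V[0, 1] ** 2 / V[1, 1]          # 8/3
print(m, V, cond_mean, cond_var)
```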

Example 2: Suppose the population covariance matrix of $\vec X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ is
\[
\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}.
\]

We aim to find $\mathrm{Var}(X_1 + X_2)$.

Method 1: Since $X_1 + X_2 = \begin{pmatrix} 1 & 1 \end{pmatrix}\vec X$, we know its population variance is
\[
\mathrm{Var}(X_1 + X_2) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 7.
\]

Method 2: We can avoid using matrix algebra. Notice that from $\Sigma = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$, we have
\[
\mathrm{Var}(X_1) = 3, \qquad \mathrm{Var}(X_2) = 2, \qquad \mathrm{Cov}(X_1, X_2) = 1.
\]

Then
\[
\begin{aligned}
\mathrm{Var}(X_1 + X_2) &= \mathrm{Cov}(X_1 + X_2, X_1 + X_2) \\
&= \mathrm{Cov}(X_1, X_1) + \mathrm{Cov}(X_1, X_2) + \mathrm{Cov}(X_2, X_1) + \mathrm{Cov}(X_2, X_2) \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2) = 7.
\end{aligned}
\]
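Both methods amount to one line each in code; a minimal check:

```python
import numpy as np

# Example 2 both ways: quadratic form vs. expanding the scalar covariances.
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
a = np.array([1.0, 1.0])
print(a @ Sigma @ a)                                   # Method 1: 7.0
print(Sigma[0, 0] + Sigma[1, 1] + 2 * Sigma[0, 1])     # Method 2: 7.0
```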

Example 3: Let
\[
\vec X \sim N_3\left(\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 5 \end{pmatrix}\right).
\]
Find the marginal distributions of $X_2$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix}$.

Answer: $X_2 \sim N(0, 4)$ and $\begin{pmatrix} X_1 \\ X_3 \end{pmatrix} \sim N\left(\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 3 & 1 \\ 1 & 5 \end{pmatrix}\right)$.
