
Lecture 1. Random vectors and multivariate normal distribution

1.1 Moments of a random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \dots, X_p$, written $X = (X_1, \dots, X_p)'$. The mean or expectation of $X$ is defined by the vector of expectations,
\[
\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]

which exists if $E|X_i| < \infty$ for all $i = 1, \dots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ be a random vector of size $q$. For any non-random matrices $A_{(m \times p)}$, $B_{(m \times q)}$, $C_{(1 \times n)}$, and $D_{(m \times n)}$,

E(AX + BY ) = AE(X) + BE(Y ),

E(AXC + D) = AE(X)C + D.
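As a quick numerical illustration (not part of the original notes), the linearity in Lemma 1 can be checked by simulation; the dimensions, matrices, and distributions below are arbitrary choices.

```python
# Monte Carlo sketch of Lemma 1: E(AX + BY) = A E(X) + B E(Y).
# All dimensions, matrices, and distributions here are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
m, p, q, n = 2, 3, 4, 200_000

A = rng.normal(size=(m, p))
B = rng.normal(size=(m, q))

X = rng.exponential(scale=2.0, size=(n, p))      # E(X_i) = 2 for every coordinate
Y = rng.normal(loc=1.0, size=(n, q))             # E(Y_j) = 1 for every coordinate

lhs = (X @ A.T + Y @ B.T).mean(axis=0)           # Monte Carlo estimate of E(AX + BY)
rhs = A @ np.full(p, 2.0) + B @ np.full(q, 1.0)  # A E(X) + B E(Y) computed exactly
print(np.max(np.abs(lhs - rhs)))                 # small, up to Monte Carlo error
```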

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \dots, p$, the variance–covariance matrix (or just covariance matrix) of $X$ is

\[
\Sigma \equiv \mathrm{Cov}(X) = E[(X - EX)(X - EX)'].
\]

The covariance matrix of $X$ is a $p \times p$ symmetric matrix. In particular, $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \Sigma_{ji}$. Some properties:

1. $\mathrm{Cov}(X) = E(XX') - E(X)E(X)'$.

2. If $c = c_{(p \times 1)}$ is a constant vector, $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.

3. If $A_{(m \times p)}$ is a constant matrix, $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)A'$.

Lemma 2. The $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.
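The covariance identities above, and in particular property 3, are easy to verify numerically; a minimal numpy sketch (with an arbitrary $\Sigma$, $A$, and $c$) follows.

```python
# Sketch checking Cov(X + c) = Cov(X) and Cov(AX) = A Cov(X) A'
# for arbitrary illustration choices of Sigma, A, and c.
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 3, 2, 100_000

Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])           # a positive definite covariance matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

A = rng.normal(size=(m, p))
c = np.arange(1.0, p + 1.0)

S = np.cov(X.T)                               # sample covariance of X
print(np.max(np.abs(np.cov((X + c).T) - S)))              # ~0: Cov(X + c) = Cov(X)
print(np.max(np.abs(np.cov((X @ A.T).T) - A @ S @ A.T)))  # ~0: Cov(AX) = A Cov(X) A'
```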

1.2 Multivariate normal distribution - nonsingular case

Recall that the normal distribution with mean $\mu$ and variance $\sigma^2$ has density

\[
f(x) = (2\pi\sigma^2)^{-1/2} \exp\!\left[-\frac{1}{2}(x - \mu)\,\sigma^{-2}(x - \mu)\right].
\]
Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix $\Sigma$ is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma_{(p \times p)} > 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has density
\[
f(x) = |2\pi\Sigma|^{-1/2} \exp\!\left[-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right], \tag{1}
\]

for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If X ∼ Np(µ, Σ) for Σ > 0, then

1. $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$,

2. $X \overset{\mathcal{L}}{=} \Sigma^{1/2} Y + \mu$ where $Y \sim N_p(0, I_p)$,

3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,

4. for any fixed $v \in \mathbb{R}^p$, $v'X$ is univariate normal,

5. $U = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
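Parts 1 and 2 say that any nonsingular multivariate normal can be generated from, and reduced back to, a standard normal vector via $\Sigma^{1/2}$ and $\Sigma^{-1/2}$. A short simulation sketch (with an arbitrary $\mu$ and $\Sigma$) is below.

```python
# Sketch of Theorem 3, parts 1-2: generating X = Sigma^{1/2} Y + mu and whitening it back.
# mu and Sigma are arbitrary example values, not from the notes.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Symmetric square roots via the spectral decomposition Sigma = U diag(lam) U'
lam, U = np.linalg.eigh(Sigma)
Sig_half = U @ np.diag(np.sqrt(lam)) @ U.T
Sig_neg_half = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T

Y = rng.standard_normal(size=(100_000, 2))    # rows are draws of Y ~ N(0, I)
X = Y @ Sig_half + mu                         # part 2: X ~ N(mu, Sigma) (Sig_half is symmetric)

Z = (X - mu) @ Sig_neg_half                   # part 1: Z should be ~ N(0, I)
print(np.round(Z.mean(axis=0), 3))            # approximately (0, 0)
print(np.round(np.cov(Z.T), 3))               # approximately the identity matrix
```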

Example 1 (Bivariate normal).
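(The example itself is left blank in these notes; one standard way to complete it, which may differ from what was presented in lecture, is to write $\Sigma$ in terms of the marginal standard deviations $\sigma_1, \sigma_2$ and the correlation $\rho$.) With $p = 2$,
\[
\mu = \begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \qquad
\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}, \quad |\rho| < 1,
\]
we have $|\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$, and the density (1) becomes
\[
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2}
- 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_2-\mu_2)^2}{\sigma_2^2} \right] \right\}.
\]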

1.2.1 Geometry of multivariate normal

The multivariate normal distribution is determined by its location $\mu$ and its shape, given by $\Sigma > 0$. In particular, let us look into the contour of equal density

\[
E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)'\Sigma^{-1}(x - \mu) = c^2\}.
\]

Moreover, consider the spectral decomposition $\Sigma = U\Lambda U'$, where $U = [u_1, \dots, u_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p > 0$. The set $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ of length proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider a bivariate normal distribution N2(0, Σ) with

\[
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}'.
\]

The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45 degree line, the other along the $-45$ degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \dots$.

Figure 1: Bivariate normal density and its contours.

Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play a similar role.
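A small numpy sketch (reusing the $\Sigma$ from this example) confirms the eigenvalues 3 and 1, the $\pm 45$ degree principal axes, and that the points built from the scaled axes lie on the contour $E_c$.

```python
# Sketch of the geometry: Sigma = [[2,1],[1,2]] has eigenvalues 3 and 1 with principal
# axes along the +/- 45 degree lines, and the parametrized ellipse lies on E_c.
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)                # ascending order: lam = [1., 3.]
print(lam)                                    # [1. 3.]
print(U)                                      # columns span the -45 and +45 degree directions

# Parametrize E_c (with mu = 0): x(theta) = c * (sqrt(lam_2) cos(theta) u_2 + sqrt(lam_1) sin(theta) u_1)
c = 1.0
theta = np.linspace(0.0, 2.0 * np.pi, 200)
ellipse = c * (np.sqrt(lam[1]) * np.outer(np.cos(theta), U[:, 1])
               + np.sqrt(lam[0]) * np.outer(np.sin(theta), U[:, 0]))

# Every point satisfies (x - mu)' Sigma^{-1} (x - mu) = c^2
q = np.einsum('ij,jk,ik->i', ellipse, np.linalg.inv(Sigma), ellipse)
print(np.allclose(q, c**2))                   # True
```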

1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as

\[
\varphi_X(t) = E(e^{it'X}), \qquad t \in \mathbb{R}^p.
\]

Note that the characteristic function is C-valued, and always exists. We collect some important facts.

1. $\varphi_X(t) = \varphi_Y(t)$ for all $t$ if and only if $X \overset{\mathcal{L}}{=} Y$.

2. If X and Y are independent, then ϕX+Y (t) = ϕX (t)ϕY (t).

3. Xn ⇒ X if and only if ϕXn (t) → ϕX (t) for all t.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér–Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t'X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t'X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a 1-dimensional subspace is normal, with the convention that a point mass $\delta_c$ has a normal distribution with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ is nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,
\[
\varphi(t) = \exp\!\left[it'\mu - \frac{1}{2}t'\Sigma t\right].
\]
If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for a non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in the statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A_{(q \times p)}$ and $b_{(q \times 1)}$, then $Y \sim N_q(A\mu + b, A\Sigma A')$.

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be a partition of $X$ with dimensions $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors X1 and X2 are independent if and only if Cov(X1, X2) = Σ12 = 0.

Proposition 8. The conditional distribution of X1 given X2 = x2 is

\[
N_r\!\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right).
\]

Proof. Consider the new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$,
\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad
A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are independent. The result follows by writing
\[
X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2,
\]
and noting that the distribution (law) of $X_1$ given $X_2 = x_2$ is $\mathcal{L}(X_1 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2)$, which is an MVN of dimension $r$.
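A simulation sketch of Proposition 8 (with an arbitrary $\mu$, $\Sigma$, and conditioning value $x_2$) computes the conditional mean and covariance from the formula and compares them with draws whose $X_2$ component falls near $x_2$.

```python
# Sketch of Proposition 8: conditional mean and covariance by formula vs. simulation.
# The particular mu, Sigma, partition, and conditioning value x2 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
r = 2                                          # X1 = first two coordinates, X2 = last coordinate

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

x2 = np.array([0.5])
cond_mean = mu[:r] + S12 @ np.linalg.solve(S22, x2 - mu[r:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov, sep="\n")

# Crude check: keep joint draws whose X2 component is close to x2
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
keep = np.abs(X[:, r] - x2[0]) < 0.02
print(X[keep][:, :r].mean(axis=0))             # close to cond_mean
print(np.cov(X[keep][:, :r].T))                # close to cond_cov
```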

1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \dots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then

\[
n^{-1/2} \sum_{j=1}^{n} (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]

or equivalently,
\[
n^{1/2}(\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where $\bar{X}_n = n^{-1}\sum_{j=1}^{n} X_j$.

The delta method can be used to obtain the asymptotic distribution of $h(\bar{X}_n)$ for some function $h : \mathbb{R}^p \to \mathbb{R}$. In particular, denote by $\nabla h(x)$ the gradient of $h$ at $x$. Using the first two terms of the Taylor expansion,

\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).
\]
Then Slutsky's theorem gives the result,
\begin{align*}
\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) &= (\nabla h(\mu))'\sqrt{n}\,(\bar{X}_n - \mu) + O_p\!\left(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\right) \\
&\Rightarrow (\nabla h(\mu))'\, N_p(0, \Sigma) \quad \text{as } n \to \infty \\
&= N\!\left(0, (\nabla h(\mu))'\,\Sigma\,(\nabla h(\mu))\right).
\end{align*}
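A brief simulation sketch of the delta method, for the arbitrary choice $h(x) = x_1 x_2$, compares the variance of $\sqrt{n}(h(\bar{X}_n) - h(\mu))$ with the limiting variance $(\nabla h(\mu))'\Sigma\,\nabla h(\mu)$.

```python
# Delta-method sketch with h(x) = x1 * x2 (an arbitrary smooth choice of h).
# mu and Sigma are arbitrary example values.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
grad = np.array([mu[1], mu[0]])               # gradient of h(x) = x1*x2 evaluated at mu
avar = grad @ Sigma @ grad                    # limiting variance (grad h)' Sigma (grad h)

n, reps = 1_000, 2_000
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))   # reps independent samples of size n
Xbar = X.mean(axis=1)
stat = np.sqrt(n) * (Xbar[:, 0] * Xbar[:, 1] - mu[0] * mu[1])
print(stat.var(), avar)                       # the two numbers should be close
```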

1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form

\[
Y = X'AX = \sum_{i=1}^{p}\sum_{j=1}^{p} X_i a_{ij} X_j,
\]

where A is a p × p symmetric matrix and Xi is the ith element of X. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.

Example 2. A special case: If X ∼ Np(0, Ip) and A = Ip,

\[
Y = X'AX = X'X = \sum_{i=1}^{p} X_i^2 \sim \chi^2(p).
\]

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.

2. If $A$ is symmetric, then $A = \Gamma\Lambda\Gamma'$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.

3. If $A$ is symmetric and idempotent,

(a) its eigenvalues are either 0 or 1,

(b) $\mathrm{rank}(A) = \#\{\text{nonzero eigenvalues}\} = \mathrm{trace}(A)$.

Theorem 9. Let $X \sim N_p(0, \sigma^2 I)$ and $A$ be a $p \times p$ symmetric matrix. Then
\[
Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)
\]
if and only if $A$ is idempotent of rank $m < p$.

Corollary 10. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix. Then

\[
Y = X'AX \sim \chi^2(m)
\]

if and only if either i) AΣ is idempotent of rank m or ii) ΣA is idempotent of rank m.

Example 3. If $X \sim N_p(\mu, \Sigma)$ then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
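Example 3 is easy to check by simulation; the sketch below (with an arbitrary $\mu$ and $\Sigma$, $p = 2$) compares empirical quantiles of the quadratic form with $\chi^2(2)$ quantiles.

```python
# Simulation check of Example 3: (X - mu)' Sigma^{-1} (X - mu) ~ chi-square(p).
# mu and Sigma are arbitrary 2-dimensional example values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
D = X - mu
U = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Sigma), D)   # quadratic form, one value per draw

qs = [0.5, 0.9, 0.99]
print(np.quantile(U, qs))                     # empirical quantiles
print(stats.chi2.ppf(qs, df=2))               # chi-square(2) quantiles, should be close
```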

Theorem 11. Let $X \sim N_p(0, I)$, $A$ be a $p \times p$ symmetric matrix, and $B$ be a $k \times p$ matrix. If $BA = 0$, then $BX$ and $X'AX$ are independent.

Example 4. Let $X_i \sim N(\mu, \sigma^2)$ i.i.d. The sample mean $\bar{X}_n$ and the sample variance $S_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$ are independent. Moreover, $(n-1)S_n^2/\sigma^2 \sim \chi^2(n-1)$.
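A simulation sketch of Example 4 (with arbitrary $\mu$, $\sigma$, and $n$) checks the $\chi^2(n-1)$ distribution of $(n-1)S_n^2/\sigma^2$ and the lack of correlation between the sample mean and variance.

```python
# Sketch of Example 4: distribution of (n-1) S_n^2 / sigma^2 and (lack of) correlation
# between the sample mean and the sample variance. Parameter values are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma, n, reps = 3.0, 2.0, 10, 50_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)                    # sample variance with divisor n-1

print(np.corrcoef(xbar, s2)[0, 1])            # ~0 (consistent with independence)
qs = [0.5, 0.9, 0.99]
print(np.quantile((n - 1) * s2 / sigma**2, qs))
print(stats.chi2.ppf(qs, df=n - 1))           # should be close to the line above
```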

Theorem 12. Let $X \sim N_p(0, I)$. Suppose $A$ and $B$ are $p \times p$ symmetric matrices. If $BA = 0$, then $X'AX$ and $X'BX$ are independent.

Corollary 13. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix.

1. For $B_{(k \times p)}$, $BX$ and $X'AX$ are independent if $B\Sigma A = 0$;

2. For symmetric $B$, $X'AX$ and $X'BX$ are independent if $B\Sigma A = 0$.

Example 5. The residual sum of squares in the standard linear model has a scaled chi-squared distribution and is independent of the coefficient estimates; a small simulation sketch follows.
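The following illustration (not from the notes) simulates a linear model with a fixed design and compares $\mathrm{RSS}/\sigma^2$ with $\chi^2(n-p)$ quantiles, and checks that it is uncorrelated with a coefficient estimate; the design, coefficients, and noise level are arbitrary.

```python
# Sketch of Example 5: in y = X beta + eps with eps ~ N(0, sigma^2 I) and a fixed design,
# RSS / sigma^2 is chi-square(n - p) and is independent of the OLS coefficients.
# The design matrix, beta, sigma, and sample sizes are arbitrary illustration choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, sigma = 30, 3, 1.5
Xd = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # fixed design matrix
beta = np.array([1.0, 2.0, -0.5])

reps = 20_000
rss = np.empty(reps)
bhat1 = np.empty(reps)
for i in range(reps):
    y = Xd @ beta + sigma * rng.normal(size=n)
    b, res, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss[i] = res[0]                            # residual sum of squares
    bhat1[i] = b[1]                            # one of the OLS coefficient estimates

qs = [0.5, 0.9, 0.99]
print(np.quantile(rss / sigma**2, qs))
print(stats.chi2.ppf(qs, df=n - p))            # should be close to the line above
print(np.corrcoef(bhat1, rss)[0, 1])           # ~0 (consistent with independence)
```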

Next lecture is on the distribution of the sample covariance matrix.
