
Lecture 1. Random Vectors and Multivariate Normal Distribution


1.1 Moments of random vector

A random vector X of size p is a column vector consisting of p random variables X_1, ..., X_p, written X = (X_1, ..., X_p)'. The mean or expectation of X is defined by the vector of expectations,
\[
\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]
which exists if E|X_i| < ∞ for all i = 1, ..., p.

Lemma 1. Let X be a random vector of size p and Y be a random vector of size q. For any non-random matrices A(m×p), B(m×q), C(1×n), and D(m×n),

E(AX + BY) = AE(X) + BE(Y),

E(AXC + D) = AE(X)C + D.

For a random vector X of size p satisfying E(X_i^2) < ∞ for all i = 1, ..., p, the variance–covariance matrix (or just covariance matrix) of X is

Σ ≡ Cov(X) = E[(X − EX)(X − EX)'].

The covariance matrix of X is a p × p symmetric, non-negative definite matrix. In particular, Σ_ij = Cov(X_i, X_j) = Cov(X_j, X_i) = Σ_ji. Some properties:

1. Cov(X) = E(XX') − E(X)E(X)'.

2. If c = c(p×1) is a constant, Cov(X + c) = Cov(X).

3. If A(m×p) is a constant matrix, Cov(AX) = A Cov(X)A'.

Lemma 2. The p × p matrix Σ is a covariance matrix if and only if it is non-negative definite.
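
As a quick numerical illustration of property 3 (a minimal sketch, not part of the original notes; it assumes NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n draws of a random vector X of size p = 3 with a known covariance matrix.
n, p = 200_000, 3
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n)

# A non-random 2 x 3 matrix, so AX is a random vector of size 2.
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  2.0, 1.0]])

cov_X  = np.cov(X, rowvar=False)          # sample estimate of Cov(X)
cov_AX = np.cov(X @ A.T, rowvar=False)    # sample estimate of Cov(AX)

# Property 3: Cov(AX) = A Cov(X) A'  (equality up to Monte Carlo error)
print(np.allclose(cov_AX, A @ cov_X @ A.T, atol=0.05))
```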

1.2 Multivariate normal distribution - nonsingular case

Recall that the normal distribution with mean µ and variance σ² has density

\[
f(x) = (2\pi\sigma^2)^{-1/2} \exp\Big[-\frac{1}{2}(x - \mu)\,\sigma^{-2}(x - \mu)\Big].
\]
Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix Σ is defined as follows.

Definition 1. Let µ ∈ R^p and Σ(p×p) > 0. A random vector X ∈ R^p has a p-variate normal distribution with mean µ and covariance matrix Σ if it has density function
\[
f(x) = |2\pi\Sigma|^{-1/2} \exp\Big[-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\Big], \tag{1}
\]

for x ∈ R^p. We use the notation X ∼ N_p(µ, Σ).

Theorem 3. If X ∼ Np(µ, Σ) for Σ > 0, then

1. Y = Σ^{-1/2}(X − µ) ∼ N_p(0, I_p),

2. X =_d Σ^{1/2}Y + µ, where Y ∼ N_p(0, I_p) and =_d denotes equality in distribution,

3. E(X) = µ and Cov(X) = Σ,

4. for any fixed v ∈ R^p, v'X is univariate normal,

5. U = (X − µ)'Σ^{-1}(X − µ) ∼ χ²(p).
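
The following sketch (not in the original notes; it assumes NumPy and SciPy) illustrates parts 2, 3, and 5 of Theorem 3 by simulating X = Σ^{1/2}Y + µ and checking the Mahalanobis distances against χ²(p):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

p = 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# Symmetric square root Sigma^{1/2} from the spectral decomposition Sigma = U diag(lam) U'.
lam, U = np.linalg.eigh(Sigma)
Sigma_half = U @ np.diag(np.sqrt(lam)) @ U.T

# Part 2: X = Sigma^{1/2} Y + mu with Y ~ N_p(0, I_p); each row below is one draw of X.
n = 100_000
Y = rng.standard_normal((n, p))
X = Y @ Sigma_half + mu

print(X.mean(axis=0))              # approximately mu      (part 3)
print(np.cov(X, rowvar=False))     # approximately Sigma   (part 3)

# Part 5: U = (X - mu)' Sigma^{-1} (X - mu) ~ chi^2(p)
centered = X - mu
U_vals = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(Sigma), centered)
print(stats.kstest(U_vals, stats.chi2(df=p).cdf))  # large p-value: consistent with chi^2(p)
```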

Example 1 (Bivariate normal).

1.2.1 Contours of the multivariate normal

The multivariate normal distribution has location µ and shape Σ > 0. In particular, let’s look into the contours of equal density

\[
E_c = \{x \in R^p : f(x) = c_0\} = \{x \in R^p : (x - \mu)'\Sigma^{-1}(x - \mu) = c^2\}.
\]

Moreover, consider the spectral decomposition Σ = UΛU', where U = [u_1, ..., u_p] and Λ = diag(λ_1, ..., λ_p) with λ_1 ≥ λ_2 ≥ ... ≥ λ_p > 0. The set E_c, for any c > 0, is an ellipsoid centered at µ with principal axes u_i of length proportional to √λ_i. If Σ = I_p, the ellipsoid is the surface of a sphere of radius c centered at µ.

As an example, consider a bivariate normal distribution N2(0, Σ) with

\[
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}'.
\]

The location of the distribution is the origin (µ = 0), and the shape (Σ) of the distribution is determined by the ellipse given by the two principal axes (one at 45 degree line, the other at -45 degree line). Figure 1 shows the density function and the corresponding Ec for c = 0.5, 1, 1.5, 2,....
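
A short check of this decomposition (a sketch, not part of the notes; NumPy assumed): the eigenvalues of Σ are 3 and 1, and the eigenvectors point along the ±45-degree lines.

```python
import numpy as np

# Spectral decomposition of the example covariance matrix Sigma = [[2, 1], [1, 2]].
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)   # eigh returns eigenvalues in ascending order

lam = lam[::-1]                  # reorder so lambda_1 = 3 >= lambda_2 = 1
U = U[:, ::-1]                   # columns ~ (1,1)/sqrt(2) and (-1,1)/sqrt(2), up to sign

print(lam)                       # [3. 1.]
print(U)

# End points of the principal semi-axes of E_c: mu + c * sqrt(lambda_i) * u_i (here mu = 0).
c = 1.0
print(c * np.sqrt(lam) * U)      # column i is the i-th semi-axis vector
```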

Figure 1: Bivariate normal density and its contours. Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions d > 2, ellipsoids play a similar role.

1.3 General multivariate normal distribution

The characteristic function of a random vector X is defined as

\[
\varphi_X(t) = E(e^{it'X}), \quad \text{for } t \in R^p.
\]

Note that the characteristic function is C-valued, and always exists. We collect some important facts.

1. ϕ_X(t) = ϕ_Y(t) for all t if and only if X =_d Y.

2. If X and Y are independent, then ϕ_{X+Y}(t) = ϕ_X(t)ϕ_Y(t).

3. X_n ⇒ X if and only if ϕ_{X_n}(t) → ϕ_X(t) for all t.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér–Wold device). If X is a p × 1 random vector, then its distribution is uniquely determined by the distributions of the linear functions t'X, for every t ∈ R^p.

Corollary 4 paves the way to the definition of (general) multivariate normal distribution.

Definition 2. A random vector X ∈ R^p has a multivariate normal distribution if t'X is univariate normal for all t ∈ R^p.

The definition says that X is MVN if every projection of X onto a 1-dimensional subspace is normal, with the convention that a point mass δ_c at a constant c has a normal distribution with variance 0, i.e., c ∼ N(c, 0). The definition does not require that Cov(X) is nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean µ and covariance matrix Σ ≥ 0 is, for t ∈ R^p,
\[
\varphi(t) = \exp\Big[it'\mu - \frac{1}{2}\,t'\Sigma t\Big].
\]
If Σ > 0, then the pdf exists and is the same as (1).

In the following, the notation X ∼ N(µ, Σ) is valid for a non-negative definite Σ. How- ever, whenever Σ−1 appears in the statement, Σ is assumed to be positive definite.

Proposition 6. If X ∼ N_p(µ, Σ) and Y = AX + b for A(q×p) and b(q×1), then Y ∼ N_q(Aµ + b, AΣA').
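
A one-line proof sketch of Proposition 6 via Theorem 5 (added for completeness, not spelled out in the notes): for t ∈ R^q,
\[
\varphi_Y(t) = E\big[e^{it'(AX+b)}\big] = e^{it'b}\,\varphi_X(A't)
= \exp\Big[it'(A\mu + b) - \frac{1}{2}\,t'A\Sigma A't\Big],
\]
which is the characteristic function of N_q(Aµ + b, AΣA'), and the claim follows from the uniqueness of characteristic functions.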

The next two results concern independence and conditional distributions of normal random vectors. Let X_1 and X_2 be a partition of X with dimensions r and s, r + s = p, and suppose µ and Σ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
\sim N_p\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors X1 and X2 are independent if and only if Cov(X1, X2) = Σ12 = 0.

Proposition 8. The conditional distribution of X1 given X2 = x2 is

N_r(µ_1 + Σ_{12}Σ_{22}^{-1}(x_2 − µ_2), Σ_{11} − Σ_{12}Σ_{22}^{-1}Σ_{21}).

Proof. Consider the new random vectors X_1^* = X_1 − Σ_{12}Σ_{22}^{-1}X_2 and X_2^* = X_2, so that

\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad
A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s\times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, X^* is multivariate normal. An inspection of the covariance matrix of X^* shows that X_1^* and X_2^* are independent. The result follows by writing

X_1 = X_1^* + Σ_{12}Σ_{22}^{-1}X_2,

and noting that the distribution (law) of X_1 given X_2 = x_2 is L(X_1 | X_2 = x_2) = L(X_1^* + Σ_{12}Σ_{22}^{-1}X_2 | X_2 = x_2) = L(X_1^* + Σ_{12}Σ_{22}^{-1}x_2 | X_2 = x_2), which is an MVN of dimension r.
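
The “inspection of the covariance matrix” in the proof amounts to the following computation (added here as the omitted step):
\[
\mathrm{Cov}(X_1^*, X_2^*) = \mathrm{Cov}(X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2,\; X_2)
= \Sigma_{12} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22} = 0,
\]
so the jointly normal X_1^* and X_2^* are independent by Proposition 7. Moreover,
\[
E(X_1^*) = \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2, \qquad
\mathrm{Cov}(X_1^*) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},
\]
and adding back Σ_{12}Σ_{22}^{-1}x_2 gives exactly the conditional mean and covariance stated in Proposition 8.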

1.4 Multivariate central limit theorem

If X_1, X_2, ... ∈ R^p are i.i.d. with E(X_i) = µ and Cov(X_i) = Σ, then
\[
n^{-1/2} \sum_{j=1}^{n} (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
or equivalently,
\[
n^{1/2} (\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where \bar{X}_n = n^{-1} \sum_{j=1}^{n} X_j.

The delta method can be used to obtain the asymptotic normality of h(X̄_n) for a smooth function h : R^p → R. In particular, write ∇h(x) for the gradient of h at x. Using the first two terms of the Taylor expansion of h around µ,

\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).
\]
Then Slutsky’s theorem gives the result,
\[
\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) = (\nabla h(\mu))'\sqrt{n}(\bar{X}_n - \mu) + O_p\big(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\big)
\Rightarrow (\nabla h(\mu))'\, N_p(0, \Sigma) = N\big(0, (\nabla h(\mu))'\Sigma\,\nabla h(\mu)\big) \quad \text{as } n \to \infty.
\]
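
For concreteness, here is a worked instance of the delta method (an illustrative example, not from the notes): take p = 2 and h(x) = x_1/x_2 with µ_2 ≠ 0, so ∇h(µ) = (1/µ_2, -µ_1/µ_2²)'. Then
\[
\sqrt{n}\left(\frac{\bar{X}_{n,1}}{\bar{X}_{n,2}} - \frac{\mu_1}{\mu_2}\right)
\Rightarrow N\!\left(0,\; \frac{\Sigma_{11}}{\mu_2^2} - \frac{2\mu_1\Sigma_{12}}{\mu_2^3} + \frac{\mu_1^2\Sigma_{22}}{\mu_2^4}\right)
\quad \text{as } n \to \infty.
\]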

1.5 Quadratic forms in normal random vectors

Let X ∼ N_p(µ, Σ). A quadratic form in X is a random variable of the form

\[
Y = X'AX = \sum_{i=1}^{p}\sum_{j=1}^{p} X_i a_{ij} X_j,
\]
where A is a p × p symmetric matrix. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.

Example 2. A special case: If X ∼ Np(0, Ip) and A = Ip,

\[
Y = X'AX = X'X = \sum_{i=1}^{p} X_i^2 \sim \chi^2(p).
\]
Fact 1. Recall the following:

1. A p × p matrix A is idempotent if A² = A.

2. If A is symmetric, then A = Γ'ΛΓ, where Λ = diag(λ_i) and Γ is orthogonal.

3. If A is symmetric and idempotent,

(a) its eigenvalues are either 0 or 1,

(b) rank(A) = #{nonzero eigenvalues} = tr(A).

Theorem 9. Let X ∼ N_p(0, σ²I) and A be a p × p symmetric matrix. Then
\[
Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)
\]
if and only if A is idempotent of rank m < p.

Corollary 10. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix. Then

Y = X'AX ∼ χ²(m) if and only if either (i) AΣ is idempotent of rank m or (ii) ΣA is idempotent of rank m.

Example 3. If X ∼ N_p(µ, Σ), then (X − µ)'Σ^{-1}(X − µ) ∼ χ²(p).
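
A simulation sketch of Theorem 9 (not part of the notes; NumPy and SciPy assumed), using a projection matrix, which is symmetric and idempotent with rank equal to the dimension of the subspace it projects onto:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# A = orthogonal projection onto a random 2-dimensional subspace of R^5:
# symmetric, idempotent, rank m = 2.
p, m, sigma2 = 5, 2, 4.0
C = rng.standard_normal((p, m))
A = C @ np.linalg.inv(C.T @ C) @ C.T
print(np.allclose(A @ A, A))                     # A is idempotent

# X ~ N_p(0, sigma^2 I); each row is one draw.
X = rng.multivariate_normal(np.zeros(p), sigma2 * np.eye(p), size=50_000)
Y = np.einsum("ij,jk,ik->i", X, A, X) / sigma2   # quadratic forms X'AX / sigma^2

print(stats.kstest(Y, stats.chi2(df=m).cdf))     # consistent with chi^2(m)
```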

Theorem 11. Let X ∼ N_p(0, I), A be a p × p symmetric matrix, and B be a k × p matrix. If BA = 0, then BX and X'AX are independent.

Example 4. Let X_i ∼ N(µ, σ²) i.i.d. The sample mean X̄_n and the sample variance S_n² = (n − 1)^{-1} \sum_{i=1}^{n} (X_i − X̄_n)² are independent. Moreover, (n − 1)S_n²/σ² ∼ χ²(n − 1).
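
Example 4 can be deduced from Theorems 9 and 11 as follows (a sketch of the standard argument, with notation introduced here): let Z = σ^{-1}(X_1 − µ, ..., X_n − µ)' ∼ N_n(0, I_n), B = n^{-1}1' (a 1 × n matrix with entries 1/n), and A = I_n − n^{-1}11'. Then
\[
\bar{X}_n = \mu + \sigma BZ, \qquad (n-1)S_n^2 = \sigma^2 Z'AZ,
\]
where A is symmetric and idempotent with rank(A) = tr(A) = n − 1, and
\[
BA = n^{-1}\mathbf{1}'(I_n - n^{-1}\mathbf{1}\mathbf{1}') = n^{-1}\mathbf{1}' - n^{-1}\mathbf{1}' = 0.
\]
Theorem 11 gives the independence of BZ and Z'AZ, hence of X̄_n and S_n², and Theorem 9 gives (n − 1)S_n²/σ² = Z'AZ ∼ χ²(n − 1).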

Theorem 12. Let X ∼ N_p(0, I). Suppose A and B are p × p symmetric matrices. If BA = 0, then X'AX and X'BX are independent.

Corollary 13. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix.

1. For B(k×p), BX and X'AX are independent if BΣA = 0;

2. For symmetric B(p×p), X'AX and X'BX are independent if BΣA = 0.

Example 5. The residual sum of squares in the standard linear model has a scaled chi-squared distribution and is independent of the coefficient estimates; a sketch is given below.
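
To make Example 5 concrete, here is a sketch under the standard linear model assumptions (the notation Z, k, H below is introduced here, not taken from the notes): suppose y = Zβ + ε with ε ∼ N_n(0, σ²I_n) and Z an n × k design matrix of full column rank, and let H = Z(Z'Z)^{-1}Z' be the hat matrix. Since (I_n − H)Z = 0,
\[
\hat{\beta} = (Z'Z)^{-1}Z'y = \beta + (Z'Z)^{-1}Z'\varepsilon, \qquad
\mathrm{RSS} = y'(I_n - H)y = \varepsilon'(I_n - H)\varepsilon.
\]
The matrix I_n − H is symmetric and idempotent of rank n − k, so Theorem 9 gives RSS/σ² ∼ χ²(n − k). Moreover (Z'Z)^{-1}Z'(I_n − H) = 0, so Corollary 13 (applied to ε ∼ N_n(0, σ²I_n) with B = (Z'Z)^{-1}Z' and A = I_n − H) shows that β̂ and RSS are independent.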

The next lecture is on the distribution of the sample covariance matrix.
