
Lecture 1. Random vectors and multivariate normal distribution

1.1 Moments of a random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \dots, X_p$, written $X = (X_1, \dots, X_p)'$. The mean or expectation of $X$ is defined by the vector of expectations,
\[
\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]

which exists if $E|X_i| < \infty$ for all $i = 1, \dots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ be a random vector of size $q$. For any non-random matrices $A_{(m \times p)}$, $B_{(m \times q)}$, $C_{(1 \times n)}$, and $D_{(m \times n)}$,

E(AX + BY ) = AE(X) + BE(Y ),

E(AXC + D) = AE(X)C + D.
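As a quick numerical illustration (not part of the original notes), the linearity in Lemma 1 can be checked by simulation; the dimensions, matrices, and distributions below are arbitrary choices.

```python
# Monte Carlo sketch of Lemma 1: E(AX + BY) = A E(X) + B E(Y).
# All dimensions, matrices, and distributions here are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
m, p, q, n = 2, 3, 4, 200_000

A = rng.normal(size=(m, p))
B = rng.normal(size=(m, q))

X = rng.exponential(scale=2.0, size=(n, p))      # E(X_i) = 2 for every coordinate
Y = rng.normal(loc=1.0, size=(n, q))             # E(Y_j) = 1 for every coordinate

lhs = (X @ A.T + Y @ B.T).mean(axis=0)           # Monte Carlo estimate of E(AX + BY)
rhs = A @ np.full(p, 2.0) + B @ np.full(q, 1.0)  # A E(X) + B E(Y) computed exactly
print(np.max(np.abs(lhs - rhs)))                 # small, up to Monte Carlo error
```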

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \dots, p$, the variance–covariance matrix (or just covariance matrix) of $X$ is

\[
\Sigma \equiv \mathrm{Cov}(X) = E[(X - EX)(X - EX)'].
\]

The covariance matrix of $X$ is a $p \times p$ symmetric matrix. In particular, $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \Sigma_{ji}$. Some properties:

1. $\mathrm{Cov}(X) = E(XX') - E(X)E(X)'$.

2. If $c = c_{(p \times 1)}$ is a constant vector, $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.

3. If $A_{(m \times p)}$ is a constant matrix, $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)A'$.

Lemma 2. The $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.
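The covariance identities above, and in particular property 3, are easy to verify numerically; a minimal numpy sketch (with an arbitrary $\Sigma$, $A$, and $c$) follows.

```python
# Sketch checking Cov(X + c) = Cov(X) and Cov(AX) = A Cov(X) A'
# for arbitrary illustration choices of Sigma, A, and c.
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 3, 2, 100_000

Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])           # a positive definite covariance matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

A = rng.normal(size=(m, p))
c = np.arange(1.0, p + 1.0)

S = np.cov(X.T)                               # sample covariance of X
print(np.max(np.abs(np.cov((X + c).T) - S)))              # ~0: Cov(X + c) = Cov(X)
print(np.max(np.abs(np.cov((X @ A.T).T) - A @ S @ A.T)))  # ~0: Cov(AX) = A Cov(X) A'
```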

1.2 Multivariate normal distribution - nonsingular case

Recall that the normal distribution with mean $\mu$ and variance $\sigma^2$ has density

\[
f(x) = (2\pi\sigma^2)^{-1/2} \exp\!\left[-\frac{1}{2}(x - \mu)\,\sigma^{-2}(x - \mu)\right].
\]
Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix $\Sigma$ is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma_{(p \times p)} > 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has density
\[
f(x) = |2\pi\Sigma|^{-1/2} \exp\!\left[-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right], \tag{1}
\]

for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If X ∼ Np(µ, Σ) for Σ > 0, then

1. $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$,

2. $X \overset{\mathcal{L}}{=} \Sigma^{1/2} Y + \mu$ where $Y \sim N_p(0, I_p)$,

3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,

4. for any fixed $v \in \mathbb{R}^p$, $v'X$ is univariate normal,

5. $U = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
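Parts 1 and 2 say that any nonsingular multivariate normal can be generated from, and reduced back to, a standard normal vector via $\Sigma^{1/2}$ and $\Sigma^{-1/2}$. A short simulation sketch (with an arbitrary $\mu$ and $\Sigma$) is below.

```python
# Sketch of Theorem 3, parts 1-2: generating X = Sigma^{1/2} Y + mu and whitening it back.
# mu and Sigma are arbitrary example values, not from the notes.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Symmetric square roots via the spectral decomposition Sigma = U diag(lam) U'
lam, U = np.linalg.eigh(Sigma)
Sig_half = U @ np.diag(np.sqrt(lam)) @ U.T
Sig_neg_half = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T

Y = rng.standard_normal(size=(100_000, 2))    # rows are draws of Y ~ N(0, I)
X = Y @ Sig_half + mu                         # part 2: X ~ N(mu, Sigma) (Sig_half is symmetric)

Z = (X - mu) @ Sig_neg_half                   # part 1: Z should be ~ N(0, I)
print(np.round(Z.mean(axis=0), 3))            # approximately (0, 0)
print(np.round(np.cov(Z.T), 3))               # approximately the identity matrix
```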

Example 1 (Bivariate normal).
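(The example itself is left blank in these notes; one standard way to complete it, which may differ from what was presented in lecture, is to write $\Sigma$ in terms of the marginal standard deviations $\sigma_1, \sigma_2$ and the correlation $\rho$.) With $p = 2$,
\[
\mu = \begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \qquad
\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}, \quad |\rho| < 1,
\]
we have $|\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$, and the density (1) becomes
\[
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2}
- 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}
+ \frac{(x_2-\mu_2)^2}{\sigma_2^2} \right] \right\}.
\]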

1.2.1 Geometry of multivariate normal

The multivariate normal distribution is determined by its location $\mu$ and its shape, given by $\Sigma > 0$. In particular, let us look into the contour of equal density

\[
E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)'\Sigma^{-1}(x - \mu) = c^2\}.
\]

Moreover, consider the spectral decomposition $\Sigma = U\Lambda U'$, where $U = [u_1, \dots, u_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p > 0$. The set $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ of length proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider a bivariate normal distribution N2(0, Σ) with

\[
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}'.
\]

The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45 degree line, the other along the $-45$ degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \dots$.

Figure 1: Bivariate normal density and its contours.

Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play a similar role.
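A small numpy sketch (reusing the $\Sigma$ from this example) confirms the eigenvalues 3 and 1, the $\pm 45$ degree principal axes, and that the points built from the scaled axes lie on the contour $E_c$.

```python
# Sketch of the geometry: Sigma = [[2,1],[1,2]] has eigenvalues 3 and 1 with principal
# axes along the +/- 45 degree lines, and the parametrized ellipse lies on E_c.
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)                # ascending order: lam = [1., 3.]
print(lam)                                    # [1. 3.]
print(U)                                      # columns span the -45 and +45 degree directions

# Parametrize E_c (with mu = 0): x(theta) = c * (sqrt(lam_2) cos(theta) u_2 + sqrt(lam_1) sin(theta) u_1)
c = 1.0
theta = np.linspace(0.0, 2.0 * np.pi, 200)
ellipse = c * (np.sqrt(lam[1]) * np.outer(np.cos(theta), U[:, 1])
               + np.sqrt(lam[0]) * np.outer(np.sin(theta), U[:, 0]))

# Every point satisfies (x - mu)' Sigma^{-1} (x - mu) = c^2
q = np.einsum('ij,jk,ik->i', ellipse, np.linalg.inv(Sigma), ellipse)
print(np.allclose(q, c**2))                   # True
```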

1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as

\[
\varphi_X(t) = E(e^{it'X}), \qquad t \in \mathbb{R}^p.
\]

Note that the characteristic function is C-valued, and always exists. We collect some important facts.

1. $\varphi_X(t) = \varphi_Y(t)$ for all $t$ if and only if $X \overset{\mathcal{L}}{=} Y$.

2. If X and Y are independent, then ϕX+Y (t) = ϕX (t)ϕY (t).

3. Xn ⇒ X if and only if ϕXn (t) → ϕX (t) for all t.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér–Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t'X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t'X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a 1-dimensional subspace is normal, with the convention that a point mass $\delta_c$ has a normal distribution with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ is nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,
\[
\varphi(t) = \exp\!\left[it'\mu - \frac{1}{2}t'\Sigma t\right].
\]
If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for a non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in the statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A_{(q \times p)}$ and $b_{(q \times 1)}$, then $Y \sim N_q(A\mu + b, A\Sigma A')$.

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be a partition of $X$ with dimensions $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors X1 and X2 are independent if and only if Cov(X1, X2) = Σ12 = 0.

Proposition 8. The conditional distribution of X1 given X2 = x2 is

\[
N_r\!\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right).
\]

Proof. Consider the new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$,
\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad
A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are independent. The result follows by writing
\[
X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2,
\]
and noting that the distribution (law) of $X_1$ given $X_2 = x_2$ is $\mathcal{L}(X_1 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2)$, which is an MVN of dimension $r$.
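A simulation sketch of Proposition 8 (with an arbitrary $\mu$, $\Sigma$, and conditioning value $x_2$) computes the conditional mean and covariance from the formula and compares them with draws whose $X_2$ component falls near $x_2$.

```python
# Sketch of Proposition 8: conditional mean and covariance by formula vs. simulation.
# The particular mu, Sigma, partition, and conditioning value x2 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
r = 2                                          # X1 = first two coordinates, X2 = last coordinate

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

x2 = np.array([0.5])
cond_mean = mu[:r] + S12 @ np.linalg.solve(S22, x2 - mu[r:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov, sep="\n")

# Crude check: keep joint draws whose X2 component is close to x2
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
keep = np.abs(X[:, r] - x2[0]) < 0.02
print(X[keep][:, :r].mean(axis=0))             # close to cond_mean
print(np.cov(X[keep][:, :r].T))                # close to cond_cov
```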

1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \dots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then

\[
n^{-1/2} \sum_{j=1}^{n} (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]

or equivalently,
\[
n^{1/2}(\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where $\bar{X}_n = n^{-1}\sum_{j=1}^{n} X_j$.

The delta method can be used to obtain the asymptotic distribution of $h(\bar{X}_n)$ for some function $h : \mathbb{R}^p \to \mathbb{R}$. In particular, denote by $\nabla h(x)$ the gradient of $h$ at $x$. Using the first two terms of the Taylor expansion,

\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).
\]
Then Slutsky's theorem gives the result,
\begin{align*}
\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) &= (\nabla h(\mu))'\sqrt{n}\,(\bar{X}_n - \mu) + O_p\!\left(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\right) \\
&\Rightarrow (\nabla h(\mu))'\, N_p(0, \Sigma) \quad \text{as } n \to \infty \\
&= N\!\left(0, (\nabla h(\mu))'\,\Sigma\,(\nabla h(\mu))\right).
\end{align*}
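A brief simulation sketch of the delta method, for the arbitrary choice $h(x) = x_1 x_2$, compares the variance of $\sqrt{n}(h(\bar{X}_n) - h(\mu))$ with the limiting variance $(\nabla h(\mu))'\Sigma\,\nabla h(\mu)$.

```python
# Delta-method sketch with h(x) = x1 * x2 (an arbitrary smooth choice of h).
# mu and Sigma are arbitrary example values.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
grad = np.array([mu[1], mu[0]])               # gradient of h(x) = x1*x2 evaluated at mu
avar = grad @ Sigma @ grad                    # limiting variance (grad h)' Sigma (grad h)

n, reps = 1_000, 2_000
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))   # reps independent samples of size n
Xbar = X.mean(axis=1)
stat = np.sqrt(n) * (Xbar[:, 0] * Xbar[:, 1] - mu[0] * mu[1])
print(stat.var(), avar)                       # the two numbers should be close
```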

1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form

\[
Y = X'AX = \sum_{i=1}^{p}\sum_{j=1}^{p} X_i a_{ij} X_j,
\]

where A is a p × p symmetric matrix and Xi is the ith element of X. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.

Example 2. A special case: If X ∼ Np(0, Ip) and A = Ip,

\[
Y = X'AX = X'X = \sum_{i=1}^{p} X_i^2 \sim \chi^2(p).
\]

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.

2. If $A$ is symmetric, then $A = \Gamma\Lambda\Gamma'$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.

3. If $A$ is symmetric and idempotent,

(a) its eigenvalues are either 0 or 1,

(b) $\mathrm{rank}(A) = \#\{\text{nonzero eigenvalues}\} = \mathrm{trace}(A)$.

Theorem 9. Let $X \sim N_p(0, \sigma^2 I)$ and $A$ be a $p \times p$ symmetric matrix. Then
\[
Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)
\]
if and only if $A$ is idempotent of rank $m < p$.

Corollary 10. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix. Then

\[
Y = X'AX \sim \chi^2(m)
\]

if and only if either i) AΣ is idempotent of rank m or ii) ΣA is idempotent of rank m.

Example 3. If $X \sim N_p(\mu, \Sigma)$ then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
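Example 3 is easy to check by simulation; the sketch below (with an arbitrary $\mu$ and $\Sigma$, $p = 2$) compares empirical quantiles of the quadratic form with $\chi^2(2)$ quantiles.

```python
# Simulation check of Example 3: (X - mu)' Sigma^{-1} (X - mu) ~ chi-square(p).
# mu and Sigma are arbitrary 2-dimensional example values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
D = X - mu
U = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Sigma), D)   # quadratic form, one value per draw

qs = [0.5, 0.9, 0.99]
print(np.quantile(U, qs))                     # empirical quantiles
print(stats.chi2.ppf(qs, df=2))               # chi-square(2) quantiles, should be close
```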

Theorem 11. Let $X \sim N_p(0, I)$, $A$ be a $p \times p$ symmetric matrix, and $B$ be a $k \times p$ matrix. If $BA = 0$, then $BX$ and $X'AX$ are independent.

Example 4. Let $X_i \sim N(\mu, \sigma^2)$ i.i.d. The sample mean $\bar{X}_n$ and the sample variance $S_n^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$ are independent. Moreover, $(n-1)S_n^2/\sigma^2 \sim \chi^2(n-1)$.
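A simulation sketch of Example 4 (with arbitrary $\mu$, $\sigma$, and $n$) checks the $\chi^2(n-1)$ distribution of $(n-1)S_n^2/\sigma^2$ and the lack of correlation between the sample mean and variance.

```python
# Sketch of Example 4: distribution of (n-1) S_n^2 / sigma^2 and (lack of) correlation
# between the sample mean and the sample variance. Parameter values are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma, n, reps = 3.0, 2.0, 10, 50_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)                    # sample variance with divisor n-1

print(np.corrcoef(xbar, s2)[0, 1])            # ~0 (consistent with independence)
qs = [0.5, 0.9, 0.99]
print(np.quantile((n - 1) * s2 / sigma**2, qs))
print(stats.chi2.ppf(qs, df=n - 1))           # should be close to the line above
```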

Theorem 12. Let $X \sim N_p(0, I)$. Suppose $A$ and $B$ are $p \times p$ symmetric matrices. If $BA = 0$, then $X'AX$ and $X'BX$ are independent.

Corollary 13. Let X ∼ Np(0, Σ) and A be a p × p symmetric matrix.

1. For $B_{(k \times p)}$, $BX$ and $X'AX$ are independent if $B\Sigma A = 0$;

2. For symmetric $B$, $X'AX$ and $X'BX$ are independent if $B\Sigma A = 0$.

Example 5. The residual sum of squares in the standard linear model has a scaled chi-squared distribution and is independent of the coefficient estimates; a small simulation sketch follows.
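The following illustration (not from the notes) simulates a linear model with a fixed design and compares $\mathrm{RSS}/\sigma^2$ with $\chi^2(n-p)$ quantiles, and checks that it is uncorrelated with a coefficient estimate; the design, coefficients, and noise level are arbitrary.

```python
# Sketch of Example 5: in y = X beta + eps with eps ~ N(0, sigma^2 I) and a fixed design,
# RSS / sigma^2 is chi-square(n - p) and is independent of the OLS coefficients.
# The design matrix, beta, sigma, and sample sizes are arbitrary illustration choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, sigma = 30, 3, 1.5
Xd = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # fixed design matrix
beta = np.array([1.0, 2.0, -0.5])

reps = 20_000
rss = np.empty(reps)
bhat1 = np.empty(reps)
for i in range(reps):
    y = Xd @ beta + sigma * rng.normal(size=n)
    b, res, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss[i] = res[0]                            # residual sum of squares
    bhat1[i] = b[1]                            # one of the OLS coefficient estimates

qs = [0.5, 0.9, 0.99]
print(np.quantile(rss / sigma**2, qs))
print(stats.chi2.ppf(qs, df=n - p))            # should be close to the line above
print(np.corrcoef(bhat1, rss)[0, 1])           # ~0 (consistent with independence)
```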

Next lecture is on the distribution of the sample covariance matrix.
