Lecture 1. Random Vectors and Multivariate Normal Distribution
1.1 Moments of a random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \ldots, X_p$, written $X = (X_1, \ldots, X_p)'$. The mean or expectation of $X$ is defined as the vector of expectations,
\[
\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]
which exists if $E|X_i| < \infty$ for all $i = 1, \ldots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ a random vector of size $q$. For any non-random matrices $A_{(m \times p)}$, $B_{(m \times q)}$, $C_{(1 \times n)}$, and $D_{(m \times n)}$,
\[
E(AX + BY) = A\,E(X) + B\,E(Y), \qquad E(AXC + D) = A\,E(X)\,C + D.
\]

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \ldots, p$, the variance-covariance matrix (or just covariance matrix) of $X$ is
\[
\Sigma \equiv \mathrm{Cov}(X) = E[(X - EX)(X - EX)'].
\]
The covariance matrix of $X$ is a $p \times p$ square, symmetric matrix; in particular, $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \Sigma_{ji}$. Some properties:

1. $\mathrm{Cov}(X) = E(XX') - E(X)\,E(X)'$.
2. If $c_{(p \times 1)}$ is a constant vector, $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.
3. If $A_{(m \times p)}$ is a constant matrix, $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)\,A'$.

Lemma 2. The $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.

1.2 Multivariate normal distribution - nonsingular case

Recall that the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has density
\[
f(x) = (2\pi\sigma^2)^{-1/2} \exp\!\left[-\tfrac{1}{2}(x - \mu)\,\sigma^{-2}(x - \mu)\right].
\]
Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix $\Sigma$ is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma_{(p \times p)} > 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has probability density function
\[
f(x) = |2\pi\Sigma|^{-1/2} \exp\!\left[-\tfrac{1}{2}(x - \mu)'\,\Sigma^{-1}(x - \mu)\right], \tag{1}
\]
for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If $X \sim N_p(\mu, \Sigma)$ for $\Sigma > 0$, then

1. $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$,
2. $X \overset{L}{=} \Sigma^{1/2} Y + \mu$ where $Y \sim N_p(0, I_p)$,
3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,
4. for any fixed $v \in \mathbb{R}^p$, $v'X$ is univariate normal,
5. $U = (X - \mu)'\,\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Example 1 (Bivariate normal).

1.2.1 Geometry of multivariate normal

The multivariate normal distribution has location parameter $\mu$ and shape parameter $\Sigma > 0$. In particular, let us look into the contour of equal density
\[
E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)'\,\Sigma^{-1}(x - \mu) = c^2\}.
\]
Moreover, consider the spectral decomposition $\Sigma = U\Lambda U'$, where $U = [u_1, \ldots, u_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. Then $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ of length proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider a bivariate normal distribution $N_2(0, \Sigma)$ with
\[
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}'.
\]
The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45-degree line, the other along the $-45$-degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \ldots$.

[Figure 1: Bivariate normal density and its contours.]

Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play a similar role.
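To make the geometry concrete, here is a minimal numerical sketch (not part of the original notes; it assumes numpy is available) recovering the spectral decomposition of the example covariance matrix, with eigenvalues 3 and 1 and principal axes along the $\pm 45$-degree lines:

```python
import numpy as np

# Covariance matrix of the bivariate example above.
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Spectral decomposition Sigma = U Lambda U'.  For a symmetric matrix,
# np.linalg.eigh returns the eigenvalues in ascending order.
lam, U = np.linalg.eigh(Sigma)

print(lam)                     # [1. 3.] -> axis lengths proportional to sqrt(lam)
print(U)                       # columns are the -45 and +45 degree directions (up to sign)
print(U @ np.diag(lam) @ U.T)  # recovers [[2. 1.], [1. 2.]]
```

Up to sign, the columns of `U` are $(1, \mp 1)'/\sqrt{2}$, matching the rotation by $\pi/4$ written out above.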
1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as
\[
\varphi_X(t) = E(e^{it'X}), \qquad t \in \mathbb{R}^p.
\]
Note that the characteristic function is $\mathbb{C}$-valued and always exists. We collect some important facts:

1. $\varphi_X(t) = \varphi_Y(t)$ for all $t$ if and only if $X \overset{L}{=} Y$.
2. If $X$ and $Y$ are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$.
3. $X_n \Rightarrow X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér-Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t'X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of the (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t'X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a one-dimensional subspace is normal, with the convention that a degenerate distribution $\delta_c$ is normal with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ be nonsingular.

Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,
\[
\varphi(t) = \exp\!\left[it'\mu - \tfrac{1}{2}\,t'\Sigma t\right].
\]
If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in a statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A_{(q \times p)}$ and $b_{(q \times 1)}$, then $Y \sim N_q(A\mu + b, A\Sigma A')$.

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be a partition of $X$ with dimensions $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0$.

Proposition 8. The conditional distribution of $X_1$ given $X_2 = x_2$ is
\[
N_r\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\]

Proof. Consider the new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$, so that
\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are independent: $\mathrm{Cov}(X_1^*, X_2^*) = \Sigma_{12} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22} = 0$, so Proposition 7 applies. The result follows by writing
\[
X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2,
\]
so that the distribution (law) of $X_1$ given $X_2 = x_2$ is $\mathcal{L}(X_1 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2)$, which is an MVN of dimension $r$.

1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \ldots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then
\[
n^{-1/2} \sum_{j=1}^n (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
or equivalently,
\[
n^{1/2}(\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where $\bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j$.

The delta method can be used to establish the asymptotic normality of $h(\bar{X}_n)$ for a smooth function $h: \mathbb{R}^p \to \mathbb{R}$. Write $\nabla h(x)$ for the gradient of $h$ at $x$. Using the first two terms of the Taylor series,
\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p\!\left(\|\bar{X}_n - \mu\|_2^2\right).
\]
Then Slutsky's theorem gives the result,
\begin{align*}
\sqrt{n}\,\big(h(\bar{X}_n) - h(\mu)\big) &= (\nabla h(\mu))'\,\sqrt{n}\,(\bar{X}_n - \mu) + O_p\!\left(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\right) \\
&\Rightarrow (\nabla h(\mu))'\, N_p(0, \Sigma) = N\!\left(0, (\nabla h(\mu))'\,\Sigma\,\nabla h(\mu)\right) \quad \text{as } n \to \infty,
\end{align*}
where the remainder vanishes because $\sqrt{n}\,\|\bar{X}_n - \mu\|_2^2 = O_p(n^{-1/2})$.
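As a concrete illustration of both the CLT and the delta method, here is a small simulation sketch (not part of the original notes; it assumes numpy, and the choice of Exp(1) coordinates and $h(x) = x_1 x_2$ is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 2, 500, 2000

# i.i.d. vectors with independent Exp(1) coordinates, so mu = (1, 1)'
# and Sigma = I_2.  Normality of the X_j is not required for the CLT.
X = rng.exponential(1.0, size=(reps, n, p))
S = np.sqrt(n) * (X.mean(axis=1) - 1.0)   # sqrt(n)(Xbar_n - mu), reps replicates

print(np.cov(S.T))                        # approx Sigma = I_2

# Delta method with h(x) = x_1 x_2: grad h(mu) = (1, 1)', so
# sqrt(n)(h(Xbar_n) - h(mu)) => N(0, (1,1) Sigma (1,1)') = N(0, 2).
H = np.sqrt(n) * (X.mean(axis=1).prod(axis=1) - 1.0)
print(H.var())                            # approx 2
```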
1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form
\[
Y = X'AX = \sum_{i=1}^p \sum_{j=1}^p X_i\, a_{ij}\, X_j,
\]
where $A$ is a $p \times p$ symmetric matrix. We are interested in the distribution of quadratic forms and in the conditions under which two quadratic forms are independent.

Example 2. A special case: if $X \sim N_p(0, I_p)$ and $A = I_p$, then
\[
Y = X'AX = X'X = \sum_{i=1}^p X_i^2 \sim \chi^2(p).
\]

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.
2. If $A$ is symmetric, then $A = \Gamma'\Lambda\Gamma$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.
3. If $A$ is symmetric and idempotent, then
   (a) its eigenvalues are either 0 or 1,
   (b) $\mathrm{rank}(A) = \#\{\text{nonzero eigenvalues}\} = \mathrm{trace}(A)$.

Theorem 9. Let $X \sim N_p(0, \sigma^2 I)$ and let $A$ be a $p \times p$ symmetric matrix. Then
\[
Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)
\]
if and only if $A$ is idempotent of rank $m$.

Corollary 10. Let $X \sim N_p(0, \Sigma)$ and let $A$ be a $p \times p$ symmetric matrix. Then $Y = X'AX \sim \chi^2(m)$ if and only if either (i) $A\Sigma$ is idempotent of rank $m$, or (ii) $\Sigma A$ is idempotent of rank $m$.

Example 3. If $X \sim N_p(\mu, \Sigma)$, then $(X - \mu)'\,\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Theorem 11. Let $X \sim N_p(0, I)$, let $A$ be a $p \times p$ symmetric matrix, and let $B$ be a $k \times p$ matrix.
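As a numerical sanity check of Theorem 9, here is a short simulation sketch (not part of the original notes; it assumes numpy, and the projection construction of $A$ is just one convenient way to produce a symmetric idempotent matrix of rank $m$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, reps = 6, 3, 100_000

# A nontrivial symmetric idempotent matrix: the orthogonal projection
# onto the column space of a random p x m matrix H (rank m a.s.).
H = rng.standard_normal((p, m))
A = H @ np.linalg.solve(H.T @ H, H.T)

print(np.allclose(A @ A, A), np.trace(A))  # True 3.0 -- rank = trace = m

X = rng.standard_normal((reps, p))         # X ~ N_p(0, I), sigma^2 = 1
Y = np.einsum('ri,ij,rj->r', X, A, X)      # the quadratic forms X'AX

print(Y.mean(), Y.var())                   # approx m = 3 and 2m = 6
```

The empirical mean and variance match the first two moments of $\chi^2(m)$, consistent with Theorem 9.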