The Multivariate Normal Distribution
Patrick Breheny · University of Iowa · Likelihood Theory (BIOS 7110) · September 2

Outline: Multivariate normal distribution (linear algebra background; definition; density and MGF) · Linear combinations and quadratic forms · Marginal and conditional distributions

Introduction

• Today we will introduce the multivariate normal distribution and attempt to discuss its properties in a fairly thorough manner
• The multivariate normal distribution is by far the most important multivariate distribution in statistics
• It's important for all the reasons that the one-dimensional Gaussian distribution is important, but even more so in higher dimensions, because many distributions that are useful in one dimension do not extend easily to the multivariate case

Inverse

• Before we get to the multivariate normal distribution, let's review some important results from linear algebra that we will use throughout the course, starting with inverses
• Definition: The inverse of an n × n matrix A, denoted A^{-1}, is the matrix satisfying AA^{-1} = A^{-1}A = I_n, where I_n is the n × n identity matrix.
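The defining property above is easy to check numerically; the following sketch (with an arbitrarily chosen invertible matrix) verifies both AA^{-1} = I and A^{-1}A = I:

```python
import numpy as np

# An arbitrary nonsingular 2 x 2 matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)

# Verify the defining property A A^{-1} = A^{-1} A = I_2
I2 = np.eye(2)
print(np.allclose(A @ A_inv, I2))   # True
print(np.allclose(A_inv @ A, I2))   # True
```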
• Note: We're sort of getting ahead of ourselves by saying that A^{-1} is "the" matrix satisfying AA^{-1} = I_n, but it is indeed the case that if a matrix has an inverse, the inverse is unique

Singular matrices

• However, not all matrices have inverses; for example, if

      A = [ 1  2 ]
          [ 2  4 ]

  then there does not exist any matrix B such that AB = I_2
• Such matrices are said to be singular
• Remark: Only square matrices have inverses; an n × m matrix A might, however, have a left inverse (a matrix B satisfying BA = I_m) or a right inverse (a matrix B satisfying AB = I_n)

Positive definite

• A related notion is that of a "positive definite" matrix, which applies to symmetric matrices
• Definition: A symmetric n × n matrix A is said to be positive definite if x^T A x > 0 for all x ∈ R^n with x ≠ 0
• The two notions are related in the sense that if A is positive definite, then (a) A is not singular and (b) A^{-1} is also positive definite
• If x^T A x ≥ 0 for all x, then A is said to be positive semidefinite
• In statistics, these classifications are particularly important for variance-covariance matrices, which are always positive semidefinite (and positive definite, if they aren't singular)

Square root of a matrix

• These concepts are important with respect to knowing whether a matrix has a "square root"
• Definition: An n × n matrix A is said to have a square root if there exists a matrix B such that BB = A.
• Theorem: Let A be a positive definite matrix. Then there exists a unique positive definite matrix A^{1/2} such that A^{1/2} A^{1/2} = A.
• Positive semidefinite matrices have square roots as well, although they aren't necessarily unique

Rank

• One additional linear algebra concept we need is the rank of a matrix (there are many ways of defining rank; all are equivalent)
• Definition: The rank of a matrix is the dimension of its largest nonsingular square submatrix.
• For example, the following 3 × 3 matrix is singular, but contains a nonsingular 2 × 2 submatrix, so its rank is 2:

      A = [ 1  2  3 ]
          [ 2  4  6 ]
          [ 1  0  1 ]

• Note that a nonsingular n × n matrix has rank n, and is said to be full rank

Rank and multiplication

• There are many results and theorems involving rank; we're not going to cover them all, but it is important to know that rank cannot be increased through the process of multiplication
• Theorem: For any matrices A and B with appropriate dimensions, rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).
• In particular, rank(A^T A) = rank(A A^T) = rank(A).
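These facts can all be illustrated numerically. The sketch below (matrices chosen for illustration; the symmetric square root is computed via the eigendecomposition, one standard construction) checks the singular example, the square-root theorem, the rank identity, and the trace/rank fact for an idempotent matrix:

```python
import numpy as np

# The singular example from the slides: row 2 is twice row 1
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(A))          # 1, so A is singular

# A positive definite matrix and its symmetric square root,
# built from the eigendecomposition B = Q diag(lam) Q^T
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, Q = np.linalg.eigh(B)
B_half = Q @ np.diag(np.sqrt(lam)) @ Q.T
print(np.allclose(B_half @ B_half, B))   # True: B^{1/2} B^{1/2} = B

# rank(C^T C) = rank(C C^T) = rank(C), using the rank-2 example above
C = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
r = np.linalg.matrix_rank(C)
print(r)                                 # 2
print(np.linalg.matrix_rank(C.T @ C) == r)  # True

# tr(P) = rank(P) when PP = P; e.g., projection onto the column
# space of a 3 x 2 design matrix X, which has rank 2
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.isclose(np.trace(P), np.linalg.matrix_rank(P)))  # True
```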
Expectation and variance

• Finally, we need some results on expected values of vectors and functions of vectors
• First of all, we need to define expectation and variance as they pertain to random vectors
• Definition: Let x = (X_1, X_2, ..., X_d)^T denote a vector of random variables. Then E(x) = (EX_1, EX_2, ..., EX_d)^T. Meanwhile, Vx is the d × d matrix

      Vx = E{(x − μ)(x − μ)^T},

  with elements

      (Vx)_{ij} = E{(X_i − μ_i)(X_j − μ_j)},

  where μ_i = EX_i. The matrix Vx is referred to as the variance-covariance matrix of x.

Linear and quadratic forms

• Letting A denote a matrix of constants and x a random vector with mean μ and variance Σ,

      E(A^T x) = A^T μ
      V(A^T x) = A^T Σ A
      E(x^T A x) = μ^T A μ + tr(AΣ),

  where tr(A) = ∑_i A_ii is the trace of A
• Some useful facts about traces:

      tr(AB) = tr(BA)
      tr(A + B) = tr(A) + tr(B)
      tr(cA) = c tr(A)
      tr(A) = rank(A) if AA = A

Motivation

• In the univariate case, the family of normal distributions can be constructed from the standard normal distribution through the location-scale transformation μ + σZ, where Z ∼ N(0, 1); the resulting random variable has a N(μ, σ²) distribution
• A similar approach can be taken with the multivariate normal distribution, although some care needs to be taken with regard to whether the resulting variance is singular or not

Standard normal

• First, the easy case: if Z_1, ..., Z_r are mutually independent and each follows a standard normal distribution, the random vector z is said to follow an r-variate standard normal distribution, denoted z ∼ N_r(0, I_r)
• Remark: For multivariate normal distributions and identity matrices, I will usually leave off the subscript from now on when it is either unimportant or able to be figured out from context
• If z ∼ N_r(0, I), its density is

      p(z) = (2π)^{−r/2} exp{−½ z^T z}

Multivariate normal distribution

• Definition: Let x be a d × 1 random vector with mean vector μ and covariance matrix Σ, where rank(Σ) = r > 0. Let Γ be an r × d matrix such that Σ = Γ^T Γ. Then x is said to have a d-variate normal distribution of rank r if its distribution is the same as that of the random vector μ + Γ^T z, where z ∼ N_r(0, I).
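The construction x = μ + Γ^T z can be carried out directly in simulation. The sketch below (with an illustrative full-rank Σ; the Cholesky factor is used as one valid choice of Γ) draws many such vectors and checks that the sample mean and covariance match μ and Σ, and that the quadratic-form identity E(x^T A x) = μ^T A μ + tr(AΣ) holds approximately:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mean vector and full-rank covariance matrix
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Factor Sigma = Gamma^T Gamma; the Cholesky factor L (Sigma = L L^T)
# gives Gamma = L^T as one valid choice
L = np.linalg.cholesky(Sigma)
Gamma = L.T

# Construct x = mu + Gamma^T z with z ~ N_r(0, I), as in the definition;
# each row of X is one draw (row-wise, (Gamma^T z)^T = z^T Gamma)
n = 200_000
Z = rng.standard_normal((n, 2))
X = mu + Z @ Gamma

print(np.allclose(X.mean(axis=0), mu, atol=0.05))
print(np.allclose(np.cov(X, rowvar=False), Sigma, atol=0.05))

# Check E(x^T A x) = mu^T A mu + tr(A Sigma) for an arbitrary symmetric A
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])
empirical = np.mean(np.einsum('ij,jk,ik->i', X, A, X))  # mean of x^T A x
theoretical = mu @ A @ mu + np.trace(A @ Sigma)
print(np.isclose(empirical, theoretical, rtol=0.02))
```

The same recipe works when Σ is singular (rank r < d): z then has fewer components than x, which is exactly why the definition is stated through Γ rather than through a density.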
• This is typically denoted x ∼ N_d(μ, Σ)

Density

• Suppose x ∼ N_d(μ, Σ) and that Σ is full rank; then x has a density:

      p(x|μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp{−½ (x − μ)^T Σ^{-1} (x − μ)},

  where |Σ| denotes the determinant of Σ
• We will not really concern ourselves with determinants and their properties in this course, although it is worth pointing out that if Σ is singular, then |Σ| = 0 and the above result does not hold (or even make sense)

Singular case

• In fact, if Σ is singular, then x does not even have a density
• This is connected to our earlier discussion of the Lebesgue decomposition theorem: if Σ is singular, then the distribution of x has a singular component (i.e., x is not absolutely continuous)
• This is the reason why the definition of the MVN might seem somewhat roundabout – we can't