7 The multivariate normal model

Up until now all of our statistical models have been univariate models, that is, models for a single measurement on each member of a sample of individuals or each run of a repeated experiment. However, datasets are frequently multivariate, having multiple measurements for each individual or experiment. This chapter covers what is perhaps the most useful model for multivariate data, the multivariate normal model, which allows us to jointly estimate population means, variances and correlations of a collection of variables. After first calculating posterior distributions under semiconjugate prior distributions, we show how the multivariate normal model can be used to impute data that are missing at random.

7.1 The multivariate normal density

Example: Reading comprehension

A sample of twenty-two children are given reading comprehension tests before and after receiving a particular instructional method. Each student i will then have two scores, Y_{i,1} and Y_{i,2}, denoting the pre- and post-instructional scores respectively. We denote each student's pair of scores as a 2 × 1 vector Y_i, so that

Y_i = \begin{pmatrix} Y_{i,1} \\ Y_{i,2} \end{pmatrix} = \begin{pmatrix} \text{score on first test} \\ \text{score on second test} \end{pmatrix}.

Things we might be interested in include the population mean θ,

E[Y] = \begin{pmatrix} E[Y_{i,1}] \\ E[Y_{i,2}] \end{pmatrix} = \begin{pmatrix} θ_1 \\ θ_2 \end{pmatrix}

and the population covariance Σ,

Σ = Cov[Y] = \begin{pmatrix} E[Y_1^2] - E[Y_1]^2 & E[Y_1 Y_2] - E[Y_1]E[Y_2] \\ E[Y_1 Y_2] - E[Y_1]E[Y_2] & E[Y_2^2] - E[Y_2]^2 \end{pmatrix} = \begin{pmatrix} σ_1^2 & σ_{1,2} \\ σ_{1,2} & σ_2^2 \end{pmatrix},

where the expectations above represent the unknown population averages. Having information about θ and Σ may help us in assessing the effectiveness of the teaching method, possibly evaluated with θ_2 − θ_1, or the consistency of the reading comprehension test, which could be evaluated with the correlation coefficient ρ_{1,2} = σ_{1,2}/\sqrt{σ_1^2 σ_2^2}.

The multivariate normal density

Notice that θ and Σ are both functions of population moments, or population averages of powers of Y1 and Y2. In particular, θ and Σ are functions of first- and second-order moments:

first-order moments: E[Y_1], E[Y_2]
second-order moments: E[Y_1^2], E[Y_1 Y_2], E[Y_2^2]

Recall from Chapter 5 that a univariate normal model describes a population in terms of its mean and variance (θ, σ^2), or equivalently its first two moments (E[Y] = θ, E[Y^2] = σ^2 + θ^2). The analogous model for describing first- and second-order moments of multivariate data is the multivariate normal model. We say a p-dimensional data vector Y has a multivariate normal distribution if its sampling density is given by

p(y|θ,Σ) = (2π)^{-p/2} |Σ|^{-1/2} \exp\{ -(y - θ)^T Σ^{-1} (y - θ)/2 \}

where

y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} \quad θ = \begin{pmatrix} θ_1 \\ θ_2 \\ \vdots \\ θ_p \end{pmatrix} \quad Σ = \begin{pmatrix} σ_1^2 & σ_{1,2} & \cdots & σ_{1,p} \\ σ_{1,2} & σ_2^2 & \cdots & σ_{2,p} \\ \vdots & & \ddots & \vdots \\ σ_{1,p} & σ_{2,p} & \cdots & σ_p^2 \end{pmatrix}.

Calculating this density requires a few operations involving matrix algebra. For a matrix A, the value of |A| is called the determinant of A, and measures how "big" A is. The inverse of A is the matrix A^{-1} such that AA^{-1} is equal to the identity matrix I_p, the p × p matrix that has ones for its diagonal entries but is otherwise zero. For a p × 1 vector b, b^T is its transpose, and is simply the 1 × p vector of the same values. Finally, the vector-matrix product b^T A is equal to the 1 × p vector (\sum_{j=1}^p b_j a_{j,1}, \ldots, \sum_{j=1}^p b_j a_{j,p}), and the value of b^T A b is the single number \sum_{j=1}^p \sum_{k=1}^p b_j b_k a_{j,k}. Fortunately, R can compute all of these quantities for us, as we shall see in the forthcoming example code.

Figure 7.1 gives contour plots and 30 samples from each of three different two-dimensional multivariate normal densities. In each one θ = (50, 50)^T, σ_1^2 = 64 and σ_2^2 = 144, but the value of σ_{1,2} varies from plot to plot, with σ_{1,2} = −48 for the left density, 0 for the middle and +48 for the density on the right (giving correlations of −.5, 0 and +.5 respectively). An interesting feature of the multivariate normal distribution is that the marginal distribution of each variable Y_j is a univariate normal distribution, with mean θ_j and variance σ_j^2.

This means that the marginal distributions for Y_1 from the three populations in Figure 7.1 are identical (the same holds for Y_2). The only thing that differs across the three populations is the relationship between Y_1 and Y_2, which is controlled by the covariance parameter σ_{1,2}.

Fig. 7.1. Multivariate normal samples and densities.
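As a concrete illustration of the matrix operations described above, the short R sketch below (not from the text) evaluates the multivariate normal density directly from its formula; the function name dmvnorm.manual and the parameter values are illustrative only.

dmvnorm.manual<-function(y, theta, Sigma)
{
  # evaluate the multivariate normal density at y using det(), solve(), t() and %*%
  p<-length(theta)
  resid<-y - theta
  quad<-t(resid) %*% solve(Sigma) %*% resid      # (y-theta)^T Sigma^{-1} (y-theta)
  as.numeric( (2*pi)^(-p/2) * det(Sigma)^(-1/2) * exp(-quad/2) )
}

theta<-c(50,50)
Sigma<-matrix( c(64,48,48,144), nrow=2, ncol=2 )   # the right-hand density in Figure 7.1
dmvnorm.manual( c(50,50), theta, Sigma )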

7.2 A semiconjugate prior distribution for the mean

Recall from Chapters 5 and 6 that if Y_1, ..., Y_n are independent samples from a univariate normal population, then a convenient prior distribution for the population mean is also univariate normal. Similarly, a convenient prior distribution for the multivariate mean θ is a multivariate normal distribution, which we will parameterize as

p(θ) = multivariate normal(µ_0, Λ_0),

where µ_0 and Λ_0 are the prior mean and variance of θ, respectively. What is the full conditional distribution of θ, given y_1, ..., y_n and Σ? In the univariate case, having normal prior and sampling distributions resulted in a normal full conditional distribution for the population mean. Let's see if this result holds for the multivariate case. We begin by examining the prior distribution as a function of θ:

p(θ) = (2π)^{-p/2} |Λ_0|^{-1/2} \exp\{ -\tfrac{1}{2}(θ - µ_0)^T Λ_0^{-1} (θ - µ_0) \}
     = (2π)^{-p/2} |Λ_0|^{-1/2} \exp\{ -\tfrac{1}{2} θ^T Λ_0^{-1} θ + θ^T Λ_0^{-1} µ_0 - \tfrac{1}{2} µ_0^T Λ_0^{-1} µ_0 \}
     ∝ \exp\{ -\tfrac{1}{2} θ^T Λ_0^{-1} θ + θ^T Λ_0^{-1} µ_0 \}
     = \exp\{ -\tfrac{1}{2} θ^T A_0 θ + θ^T b_0 \},        (7.1)

where A_0 = Λ_0^{-1} and b_0 = Λ_0^{-1} µ_0. Conversely, Equation 7.1 says that if a random vector θ has a density on R^p that is proportional to \exp\{ -θ^T A θ/2 + θ^T b \} for some matrix A and vector b, then θ must have a multivariate normal distribution with covariance A^{-1} and mean A^{-1} b. If our sampling model is that {Y_1, ..., Y_n | θ, Σ} are i.i.d. multivariate normal(θ, Σ), then similar calculations show that the joint sampling density of the observed vectors y_1, ..., y_n is

p(y_1, ..., y_n | θ, Σ) = \prod_{i=1}^n (2π)^{-p/2} |Σ|^{-1/2} \exp\{ -(y_i - θ)^T Σ^{-1} (y_i - θ)/2 \}
                        = (2π)^{-np/2} |Σ|^{-n/2} \exp\{ -\tfrac{1}{2} \sum_{i=1}^n (y_i - θ)^T Σ^{-1} (y_i - θ) \}
                        ∝ \exp\{ -\tfrac{1}{2} θ^T A_1 θ + θ^T b_1 \},        (7.2)

where A_1 = nΣ^{-1}, b_1 = nΣ^{-1} ȳ and ȳ is the vector of variable-specific averages ȳ = ( \tfrac{1}{n}\sum_{i=1}^n y_{i,1}, ..., \tfrac{1}{n}\sum_{i=1}^n y_{i,p} )^T. Combining Equations 7.1 and 7.2 gives

p(θ | y_1, ..., y_n, Σ) ∝ \exp\{ -\tfrac{1}{2} θ^T A_0 θ + θ^T b_0 \} × \exp\{ -\tfrac{1}{2} θ^T A_1 θ + θ^T b_1 \}
                        = \exp\{ -\tfrac{1}{2} θ^T A_n θ + θ^T b_n \},  where        (7.3)

A_n = A_0 + A_1 = Λ_0^{-1} + nΣ^{-1}  and
b_n = b_0 + b_1 = Λ_0^{-1} µ_0 + nΣ^{-1} ȳ.

From the comments in the previous paragraph, Equation 7.3 implies that the conditional distribution of θ therefore must be a multivariate normal distribution with covariance A_n^{-1} and mean A_n^{-1} b_n, so

Cov[θ | y_1, ..., y_n, Σ] = Λ_n = (Λ_0^{-1} + nΣ^{-1})^{-1}        (7.4)
E[θ | y_1, ..., y_n, Σ]   = µ_n = (Λ_0^{-1} + nΣ^{-1})^{-1} (Λ_0^{-1} µ_0 + nΣ^{-1} ȳ)        (7.5)

p(θ | y_1, ..., y_n, Σ) = multivariate normal(µ_n, Λ_n).        (7.6)
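As a small preview of the computations in the Gibbs sampler later in the chapter, the R sketch below evaluates Equations 7.4 and 7.5 for illustrative values of the prior parameters, sample size, sample mean and an assumed known Σ; all numbers here are made up for illustration only.

mu0<-c(50,50) ; L0<-matrix( c(625,312.5,312.5,625), nrow=2, ncol=2 )   # prior mean and variance
Sigma<-matrix( c(182,147,147,244), nrow=2, ncol=2 )                    # an assumed value of Sigma
n<-22 ; ybar<-c(47.18,53.86)                                           # sample size and sample mean

Ln <-solve( solve(L0) + n*solve(Sigma) )                    # Equation 7.4
mun<-Ln %*% ( solve(L0)%*%mu0 + n*solve(Sigma)%*%ybar )     # Equation 7.5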

It looks a bit complicated, but can be made more understandable by analogy with the univariate normal case: Equation 7.4 says that the posterior precision, or inverse variance, is the sum of the prior precision and the data precision, just as in the univariate normal case. Similarly, Equation 7.5 says that the posterior expectation is a weighted average of the prior expectation and the sample mean. Notice that, since the sample mean is consistent for the population mean, the posterior mean also will be consistent for the population mean even if the true distribution of the data is not multivariate normal.

7.3 The inverse-Wishart distribution

Just as a variance σ^2 must be positive, a variance-covariance matrix Σ must be positive definite, meaning that x^T Σ x > 0 for all nonzero vectors x.

Positive definiteness guarantees that σ_j^2 > 0 for all j and that all correlations are between −1 and 1. Another requirement of our covariance matrix is that it is symmetric, which means that σ_{j,k} = σ_{k,j}. Any valid prior distribution for Σ must put all of its probability mass on this complicated set of symmetric, positive definite matrices. How can we formulate such a prior distribution?
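Before constructing such a prior distribution, it can help to see what these two requirements look like numerically; the short R check below (illustrative, not from the text) verifies symmetry and positive definiteness of a candidate matrix via its eigenvalues.

Sigma<-matrix( c(64,48,48,144), nrow=2, ncol=2 )
all( Sigma == t(Sigma) )     # symmetry: sigma_{j,k} = sigma_{k,j}
eigen(Sigma)$values          # both eigenvalues positive, so Sigma is positive definite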

Empirical covariance matrices

The sum of squares matrix of a collection of multivariate vectors z_1, ..., z_n is given by

\sum_{i=1}^n z_i z_i^T = Z^T Z,

where Z is the n × p matrix whose ith row is z_i^T. Recall from matrix algebra that since z_i can be thought of as a p × 1 matrix, z_i z_i^T is the following p × p matrix:

z_i z_i^T = \begin{pmatrix} z_{i,1}^2 & z_{i,1} z_{i,2} & \cdots & z_{i,1} z_{i,p} \\ z_{i,2} z_{i,1} & z_{i,2}^2 & \cdots & z_{i,2} z_{i,p} \\ \vdots & & & \vdots \\ z_{i,p} z_{i,1} & z_{i,p} z_{i,2} & \cdots & z_{i,p}^2 \end{pmatrix}.

If the z_i's are samples from a population with zero mean, we can think of the matrix z_i z_i^T / n as the contribution of vector z_i to the estimate of the covariance matrix of all of the observations. In this mean-zero case, if we divide Z^T Z by n, we get a sample covariance matrix, an unbiased estimator of the population covariance matrix:

\tfrac{1}{n} [Z^T Z]_{j,j} = \tfrac{1}{n} \sum_{i=1}^n z_{i,j}^2 = s_{j,j} = s_j^2
\tfrac{1}{n} [Z^T Z]_{j,k} = \tfrac{1}{n} \sum_{i=1}^n z_{i,j} z_{i,k} = s_{j,k}.

If n > p and the z_i's are linearly independent, then Z^T Z will be positive definite and symmetric. This suggests the following construction of a "random" covariance matrix: For a given positive integer ν_0 and a p × p covariance matrix Φ_0,

1. sample z_1, ..., z_{ν_0} ∼ i.i.d. multivariate normal(0, Φ_0);
2. calculate Z^T Z = \sum_{i=1}^{ν_0} z_i z_i^T.

We can repeat this procedure over and over again, generating matrices Z_1^T Z_1, ..., Z_S^T Z_S. The population distribution of these sum of squares matrices is called a Wishart distribution with parameters (ν_0, Φ_0), which has the following properties:

• If ν_0 > p, then Z^T Z is positive definite with probability 1.
• Z^T Z is symmetric with probability 1.
• E[Z^T Z] = ν_0 Φ_0.

The Wishart distribution is a multivariate analogue of the gamma distribution (recall that if z is a mean-zero univariate normal random variable, then z^2 is a gamma random variable). In the univariate normal model, our prior distribution for the precision 1/σ^2 is a gamma distribution, and our full conditional distribution for the variance is an inverse-gamma distribution. Similarly, it turns out that the Wishart distribution is a semiconjugate prior distribution for the precision matrix Σ^{-1}, and so the inverse-Wishart distribution is our semiconjugate prior distribution for the covariance matrix Σ. With a slight reparameterization, to sample a covariance matrix Σ from an inverse-Wishart distribution we perform the following steps:

1. sample z_1, ..., z_{ν_0} ∼ i.i.d. multivariate normal(0, S_0^{-1});
2. calculate Z^T Z = \sum_{i=1}^{ν_0} z_i z_i^T;
3. set Σ = (Z^T Z)^{-1}.

Under this simulation scheme, the precision matrix Σ^{-1} has a Wishart(ν_0, S_0^{-1}) distribution and the covariance matrix Σ has an inverse-Wishart(ν_0, S_0^{-1}) distribution. The expectations of Σ^{-1} and Σ are

E[Σ^{-1}] = ν_0 S_0^{-1}
E[Σ] = (S_0^{-1})^{-1}/(ν_0 − p − 1) = S_0/(ν_0 − p − 1).
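The three-step construction above translates directly into a few lines of R. The sketch below (an illustration, not code from the text) draws one Σ from an inverse-Wishart(ν_0, S_0^{-1}) distribution by generating the mean-zero normal vectors with a Cholesky factor; the function name rinvwish.sketch is hypothetical, and the Monte Carlo check at the end compares the average sampled precision matrix with ν_0 S_0^{-1}.

rinvwish.sketch<-function(nu0, S0)
{
  # steps 1-3: sample nu0 vectors z_i ~ MVN(0, solve(S0)), form Z^T Z, then invert
  p<-dim(S0)[1]
  L<-t( chol( solve(S0) ) )                                  # lower-triangular factor of S0^{-1}
  Z<-t( L %*% matrix( rnorm(nu0*p), nrow=p, ncol=nu0 ) )     # nu0 x p, rows ~ MVN(0, S0^{-1})
  solve( t(Z) %*% Z )                                        # Sigma = (Z^T Z)^{-1}
}

S0<-matrix( c(625,312.5,312.5,625), nrow=2, ncol=2 )
PREC<-replicate( 2000, solve( rinvwish.sketch(4, S0) ) )     # sampled precision matrices
apply(PREC, c(1,2), mean)                                    # roughly 4*solve(S0), as claimed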

If we are confident that the true covariance matrix is near some covariance matrix Σ_0, then we might choose ν_0 to be large and set S_0 = (ν_0 − p − 1)Σ_0, making the distribution of Σ concentrated around Σ_0. On the other hand, choosing ν_0 = p + 2 and S_0 = Σ_0 makes Σ only loosely centered around Σ_0.

Full conditional distribution of the covariance matrix

The inverse-Wishart(ν_0, S_0^{-1}) density is given by

p(Σ) = \left[ 2^{ν_0 p/2} π^{\binom{p}{2}/2} |S_0|^{-ν_0/2} \prod_{j=1}^p Γ([ν_0 + 1 - j]/2) \right]^{-1} ×
       |Σ|^{-(ν_0 + p + 1)/2} × \exp\{ -\mathrm{tr}(S_0 Σ^{-1})/2 \}.        (7.7)

The normalizing constant is quite intimidating. Fortunately we will only have to work with the second line of the equation. The expression "tr" stands for trace, and for a square p × p matrix A, tr(A) = \sum_{j=1}^p a_{j,j}, the sum of the diagonal elements. We now need to combine the above prior distribution with the sampling model for Y_1, ..., Y_n:

p(y_1, ..., y_n | θ, Σ) = (2π)^{-np/2} |Σ|^{-n/2} \exp\{ -\sum_{i=1}^n (y_i - θ)^T Σ^{-1} (y_i - θ)/2 \}.        (7.8)

An interesting result from matrix algebra is that the sum \sum_{k=1}^K b_k^T A b_k = \mathrm{tr}(B^T B A), where B is the matrix whose kth row is b_k^T. This means that the term in the exponent of Equation 7.8 can be expressed as

\sum_{i=1}^n (y_i - θ)^T Σ^{-1} (y_i - θ) = \mathrm{tr}(S_θ Σ^{-1}),  where  S_θ = \sum_{i=1}^n (y_i - θ)(y_i - θ)^T.
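A quick numerical check of the trace identity used here (illustrative only, with small random matrices) can be reassuring:

set.seed(1)
A<-crossprod( matrix( rnorm(9), 3, 3 ) )     # a 3 x 3 positive definite matrix
B<-matrix( rnorm(15), 5, 3 )                 # rows play the role of the b_k's
sum( apply( B, 1, function(b) t(b) %*% A %*% b ) )   # sum_k b_k^T A b_k
sum( diag( t(B) %*% B %*% A ) )                      # tr(B^T B A); the two numbers agree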

The matrix S_θ is the residual sum of squares matrix for the vectors y_1, ..., y_n if the population mean is presumed to be θ. Conditional on θ, \tfrac{1}{n} S_θ provides an unbiased estimate of the true covariance matrix Cov[Y] (more generally, when θ is not conditioned on, the sample covariance matrix is \sum (y_i - ȳ)(y_i - ȳ)^T / (n - 1), which is an unbiased estimate of Σ). Using the above result to combine Equations 7.7 and 7.8 gives the conditional distribution of Σ:

p(Σ | y_1, ..., y_n, θ) ∝ p(Σ) × p(y_1, ..., y_n | θ, Σ)
  ∝ \left( |Σ|^{-(ν_0+p+1)/2} \exp\{ -\mathrm{tr}(S_0 Σ^{-1})/2 \} \right) × \left( |Σ|^{-n/2} \exp\{ -\mathrm{tr}(S_θ Σ^{-1})/2 \} \right)
  = |Σ|^{-(ν_0+n+p+1)/2} \exp\{ -\mathrm{tr}([S_0 + S_θ] Σ^{-1})/2 \}.

Thus we have

{Σ | y_1, ..., y_n, θ} ∼ inverse-Wishart(ν_0 + n, [S_0 + S_θ]^{-1}).        (7.9)

Hopefully this result seems somewhat intuitive: We can think of ν_0 + n as the "posterior sample size," being the sum of the "prior sample size" ν_0 and the data sample size. Similarly, S_0 + S_θ can be thought of as the "prior" residual sum of squares plus the residual sum of squares from the data. Additionally, the conditional expectation of the population covariance matrix is

E[Σ | y_1, ..., y_n, θ] = (S_0 + S_θ)/(ν_0 + n - p - 1)
                        = [(ν_0 - p - 1)/(ν_0 + n - p - 1)] × S_0/(ν_0 - p - 1) + [n/(ν_0 + n - p - 1)] × S_θ/n,

and so the conditional expectation can be seen as a weighted average of the prior expectation and the unbiased estimator. Because it can be shown that S_θ/n converges to the true population covariance matrix, the posterior expectation of Σ is a consistent estimator of the population covariance, even if the true population distribution is not multivariate normal.

7.4 Gibbs sampling of the mean and covariance

In the last two sections we showed that

{θ | y_1, ..., y_n, Σ} ∼ multivariate normal(µ_n, Λ_n)
{Σ | y_1, ..., y_n, θ} ∼ inverse-Wishart(ν_n, S_n^{-1}),

where {Λ_n, µ_n} are defined in Equations 7.4 and 7.5, ν_n = ν_0 + n and S_n = S_0 + S_θ. These full conditional distributions can be used to construct a Gibbs sampler, providing us with an MCMC approximation to the joint posterior distribution p(θ, Σ | y_1, ..., y_n). Given a starting value Σ^{(0)}, the Gibbs sampler generates {θ^{(s+1)}, Σ^{(s+1)}} from {θ^{(s)}, Σ^{(s)}} via the following two steps:

1. Sample θ^{(s+1)} from its full conditional distribution:
   a) compute µ_n and Λ_n from y_1, ..., y_n and Σ^{(s)};
   b) sample θ^{(s+1)} ∼ multivariate normal(µ_n, Λ_n).
2. Sample Σ^{(s+1)} from its full conditional distribution:
   a) compute S_n from y_1, ..., y_n and θ^{(s+1)};
   b) sample Σ^{(s+1)} ∼ inverse-Wishart(ν_0 + n, S_n^{-1}).

Steps 1.a and 2.a highlight the fact that {µn,Λn} depend on the value of Σ, and that Sn depends on the value of θ, and so these quantities need to be recalculated at every iteration of the sampler.
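The example code below draws from these full conditionals using two helper functions, rmvnorm and rwish, for sampling from multivariate normal and Wishart distributions; these are not part of base R. As a sketch only, minimal stand-ins consistent with the calling conventions used below (assumed conventions: rmvnorm(n, mu, Sigma) returns n rows of multivariate normal draws, and rwish(1, nu, M) returns one Wishart(nu, M) draw with expectation nu*M) could be written as follows:

rmvnorm<-function(n, mu, Sigma)
{
  # n draws from a multivariate normal(mu, Sigma), one per row
  p<-length(mu)
  E<-matrix( rnorm(n*p), n, p )
  t( t( E %*% chol(Sigma) ) + c(mu) )
}

rwish<-function(n, nu, M)
{
  # a single Wishart(nu, M) draw (sufficient for the n=1 calls below); E[draw] = nu*M
  Z<-matrix( rnorm(nu*dim(M)[1]), nu, dim(M)[1] ) %*% chol(M)
  t(Z) %*% Z
}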

Example: Reading comprehension

Let's return to the example from the beginning of the chapter in which each of 22 children was given two reading comprehension exams, one before a certain type of instruction and one after. We'll model these 22 pairs of scores as i.i.d. samples from a multivariate normal distribution. The exam was designed to give average scores of around 50 out of 100, so µ_0 = (50, 50)^T would be a good choice for our prior expectation. Since the true mean cannot be below 0 or above 100, it is desirable to use a prior variance for θ that puts little probability outside of this range. We'll take the prior variances on θ_1 and θ_2 to be λ_{0,1}^2 = λ_{0,2}^2 = (50/2)^2 = 625, so that the prior probability Pr(θ_j ∉ [0, 100]) is only 0.05. Finally, since the two exams are measuring similar things, whatever the true values of θ_1 and θ_2 are it is probable that they are close. We can reflect this with a prior correlation of 0.5, so that λ_{1,2} = 312.5. As for the prior distribution on Σ, some of the same logic about the range of exam scores applies. We'll take S_0 to be the same as Λ_0, but only loosely center Σ around this value by taking ν_0 = p + 2 = 4.

mu0<-c(50,50)
L0<-matrix( c(625,312.5,312.5,625), nrow=2, ncol=2 )

nu0<-4
S0<-matrix( c(625,312.5,312.5,625), nrow=2, ncol=2 )

The observed values y_1, ..., y_{22} are plotted as dots in the second panel of Figure 7.2. The sample mean is ȳ = (47.18, 53.86)^T, the sample variances are s_1^2 = 182.16 and s_2^2 = 243.65, and the sample correlation is s_{1,2}/(s_1 s_2) = 0.70. Let's use the Gibbs sampler described above to combine this sample information with our prior distributions to obtain estimates and confidence intervals for the population parameters. We begin by setting Σ^{(0)} equal to the sample covariance matrix, and iterating from there. In the R-code below, Y is the 22 × 2 data matrix of the observed values.

data(chapter7) ; Y<-Y.reading
n<-dim(Y)[1] ; ybar<-apply(Y,2,mean)
Sigma<-cov(Y) ; THETA<-SIGMA<-NULL

set.seed(1)
for(s in 1:5000)
{

  ###update theta
  Ln<-solve( solve(L0) + n*solve(Sigma) )
  mun<-Ln%*%( solve(L0)%*%mu0 + n*solve(Sigma)%*%ybar )
  theta<-rmvnorm(1,mun,Ln)
  ###

  ###update Sigma
  Sn<- S0 + ( t(Y)-c(theta) )%*%t( t(Y)-c(theta) )
  Sigma<-solve( rwish(1, nu0+n, solve(Sn)) )
  ###

  ### save results
  THETA<-rbind(THETA,theta) ; SIGMA<-rbind(SIGMA,c(Sigma))
  ###

}

The above code generates 5,000 values {θ^{(1)}, Σ^{(1)}}, ..., {θ^{(5000)}, Σ^{(5000)}} whose empirical distribution approximates p(θ, Σ | y_1, ..., y_n). It is left as an exercise to assess the convergence and autocorrelation of this Markov chain. From these samples we can approximate posterior probabilities and confidence regions of interest.

> quantile( THETA[,2]-THETA[,1], prob=c(.025,.5,.975) )
     2.5%       50%     97.5%
 1.513573  6.668097 11.794824

> mean( THETA[,2]>THETA[,1] )
[1] 0.9942

The posterior probability Pr(θ_2 > θ_1 | y_1, ..., y_n) = 0.99 indicates strong evidence that, if we were to give exams and instruction to a large population of children, then the average score on the second exam would be higher than that on the first. This evidence is displayed graphically in the first panel of Figure 7.2, which shows 97.5%, 75%, 50%, 25% and 2.5% highest posterior density contours for the joint posterior distribution of θ = (θ_1, θ_2)^T. A highest posterior density contour is a two-dimensional analogue of a confidence interval. The contours for the posterior distribution of θ are all mostly above the 45-degree line θ_1 = θ_2.

Fig. 7.2. Reading comprehension data and posterior distributions.

Now let's ask a slightly different question: what is the probability that a randomly selected child will score higher on the second exam than on the first? The answer to this question is a function of the posterior predictive distribution of a new sample (Y_1, Y_2)^T, given the observed values. The second panel of Figure 7.2 shows highest posterior density contours of the posterior predictive distribution, which, while mostly being above the line y_2 = y_1, still has substantial overlap with the region below this line, and in fact Pr(Y_2 > Y_1 | y_1, ..., y_n) = 0.71. How should we evaluate the effectiveness of the between-exam instruction? On one hand, the fact that Pr(θ_2 > θ_1 | y_1, ..., y_n) = 0.99 seems to suggest that there is a "highly significant difference" in exam scores before and after the instruction, yet Pr(Y_2 > Y_1 | y_1, ..., y_n) = 0.71 says that almost a third of the students will get a lower score on the second exam. The difference between these two probabilities is that the first is measuring the evidence that θ_2 is larger than θ_1, without regard to whether or not the magnitude of the difference θ_2 - θ_1 is large compared to the sampling variability of the data. Confusion over these two different ways of comparing populations is common in the reporting of results from experiments or surveys: studies with very large values of n often result in values of Pr(θ_2 > θ_1 | y_1, ..., y_n) that are very close to 1 (or p-values that are very close to zero), suggesting a "significant effect," even though such results say nothing about how large of an effect we expect to see for a randomly sampled individual.
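The posterior predictive probability quoted above can be approximated directly from the Gibbs output; the sketch below (illustrative, reusing the THETA and SIGMA matrices and the rmvnorm convention from the code above) draws one predictive pair per saved parameter value.

YPRED<-NULL
for(s in 1:dim(THETA)[1])
{
  Sig<-matrix( SIGMA[s,], nrow=2, ncol=2 )
  YPRED<-rbind( YPRED, rmvnorm(1, THETA[s,], Sig) )   # one (Y1,Y2) pair per posterior draw
}
mean( YPRED[,2]>YPRED[,1] )    # approximately 0.71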

7.5 Missing data and imputation

Fig. 7.3. Physiological data on 200 women.

Figure 7.3 displays univariate histograms and bivariate scatterplots for four variables taken from a dataset involving health-related measurements on 200 women of Pima Indian heritage living near Phoenix, Arizona (Smith et al, 1988). The four variables are glu (blood plasma glucose concentration), bp (diastolic blood pressure), skin (skin fold thickness) and bmi (body mass index). The first ten subjects in this dataset have the following entries:

    glu bp skin  bmi
1    86 68   28 30.2
2   195 70   33   NA
3    77 82   NA 35.8
4    NA 76   43 47.9
5   107 60   NA   NA
6    97 76   27   NA
7    NA 58   31 34.3
8   193 50   16 25.9
9   142 80   15   NA
10  128 78   NA 43.3

The NA's stand for "not available," and so some data for some individuals are "missing." Missing data are fairly common in survey data: Sometimes people accidentally miss a page of a survey, sometimes a doctor forgets to write down a piece of medical data, sometimes the response is unreadable, and so on. Many surveys (such as the General Social Survey) have multiple versions with certain questions appearing in only a subset of the versions. As a result, all the subjects may have missing data.

In such situations it is not immediately clear how to do parameter estimation. The posterior distribution for θ and Σ depends on \prod_{i=1}^n p(y_i | θ, Σ), but p(y_i | θ, Σ) cannot be computed if components of y_i are missing. What can we do? Unfortunately, many software packages either throw away all subjects with incomplete data, or impute missing values with a population mean or some other fixed value, then proceed with the analysis. The first approach is bad because we are throwing away a potentially large amount of useful information. The second is statistically incorrect, as it says we are certain about the values of the missing data when in fact we have not observed them.

Let's carefully think about the information that is available from subjects with missing data. Let O_i = (O_1, ..., O_p)^T be a binary vector of zeros and ones such that O_{i,j} = 1 implies that Y_{i,j} is observed and not missing, whereas O_{i,j} = 0 implies Y_{i,j} is missing. Our observed information about subject i is therefore O_i = o_i and Y_{i,j} = y_{i,j} for variables j such that o_{i,j} = 1. For now, we'll assume that missing data are missing at random, meaning that O_i and Y_i are statistically independent and that the distribution of O_i does not depend on θ or Σ. In cases where the data are missing but not at random, then sometimes inference can be made by modeling the relationship between O_i, Y_i and the parameters (see Chapter 21 of Gelman et al (2004)). In the case where data are missing at random, the sampling probability for the data from subject i is

p(o_i, \{y_{i,j} : o_{i,j} = 1\} | θ, Σ) = p(o_i) × p(\{y_{i,j} : o_{i,j} = 1\} | θ, Σ)
                                        = p(o_i) × \int p(y_{i,1}, \ldots, y_{i,p} | θ, Σ) \prod_{y_{i,j} : o_{i,j} = 0} dy_{i,j}.

In words, our sampling probability for data from subject i is p(o_i) multiplied by the marginal probability of the observed variables, after integrating out the missing variables. To make this more concrete, suppose y_i = (y_{i,1}, NA, y_{i,3}, NA)^T, so o_i = (1, 0, 1, 0)^T. Then

p(o_i, y_{i,1}, y_{i,3} | θ, Σ) = p(o_i) × p(y_{i,1}, y_{i,3} | θ, Σ)
                                = p(o_i) × \int p(y_i | θ, Σ) \, dy_{i,2} \, dy_{i,4}.

So the correct thing to do when data are missing at random is to integrate over the missing data to obtain the marginal probability of the observed data. In this particular case of the multivariate normal model, this marginal probability is easily obtained: p(y_{i,1}, y_{i,3} | θ, Σ) is simply a bivariate normal density with mean (θ_1, θ_3)^T and a covariance matrix made up of (σ_1^2, σ_{1,3}, σ_3^2). But combining marginal densities from subjects having different amounts of information can be notationally awkward. Fortunately, our integration can alternatively be done quite easily using Gibbs sampling.
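In R, extracting the parameters of such a marginal distribution amounts to subsetting θ and Σ; the toy lines below (with made-up, assumed values) show the mean vector and covariance matrix of the marginal distribution of (y_1, y_3):

theta<-c(120,64,26,26)                  # an assumed mean vector
sd<-c(30,12,10,7)                       # assumed standard deviations
R<-matrix(.3,4,4) ; diag(R)<-1          # an assumed correlation matrix
Sigma<-outer(sd,sd)*R                   # a valid covariance matrix built from them
a<-c(1,3)                               # indices of the observed variables
theta[a]                                # mean of the marginal distribution of (y1,y3)
Sigma[a,a]                              # its 2 x 2 covariance matrix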

Gibbs sampling with missing data

In Bayesian inference we use probability distributions to describe our information about unknown quantities. What are the unknown quantities for our multivariate normal model with missing data? The parameters θ and Σ are unknown as usual, but the missing data are also an unknown but key component of our model. Treating them as such allows us to use Gibbs sampling to make inference on θ and Σ, as well as to make predictions for the missing values. Let Y be the n × p matrix of all the potential data, observed and unobserved, and let O be the n × p matrix in which o_{i,j} = 1 if Y_{i,j} is observed and o_{i,j} = 0 if Y_{i,j} is missing. The matrix Y can then be thought of as consisting of two parts:

• Y_obs = {y_{i,j} : o_{i,j} = 1}, the data that we do observe, and
• Y_miss = {y_{i,j} : o_{i,j} = 0}, the data that we do not observe.

From our observed data we want to obtain p(θ, Σ, Y_miss | Y_obs), the posterior distribution of unknown and unobserved quantities. A Gibbs sampling scheme for approximating this posterior distribution can be constructed by simply adding one step to the Gibbs sampler presented in the previous section: Given starting values {Σ^{(0)}, Y_miss^{(0)}}, we generate {θ^{(s+1)}, Σ^{(s+1)}, Y_miss^{(s+1)}} from {θ^{(s)}, Σ^{(s)}, Y_miss^{(s)}} by

1. sampling θ^{(s+1)} from p(θ | Y_obs, Y_miss^{(s)}, Σ^{(s)});
2. sampling Σ^{(s+1)} from p(Σ | Y_obs, Y_miss^{(s)}, θ^{(s+1)});
3. sampling Y_miss^{(s+1)} from p(Y_miss | Y_obs, θ^{(s+1)}, Σ^{(s+1)}).

Note that in steps 1 and 2, the fixed value of Y_obs combines with the current value of Y_miss^{(s)} to form a current version of a complete data matrix Y^{(s)} having no missing values. The n rows of the matrix Y^{(s)} can then be plugged into formulae 7.6 and 7.9 to obtain the full conditional distributions of θ and Σ. Step 3 is a bit more complicated:

p(Y_miss | Y_obs, θ, Σ) ∝ p(Y_miss, Y_obs | θ, Σ)
                        = \prod_{i=1}^n p(y_{i,miss}, y_{i,obs} | θ, Σ)
                        ∝ \prod_{i=1}^n p(y_{i,miss} | y_{i,obs}, θ, Σ),

so for each i we need to sample the missing elements of the data vector conditional on the observed elements. This is made possible via the following result about multivariate normal distributions: Let y ∼ multivariate normal(θ, Σ), let a be a subset of variable indices {1, ..., p} and let b be the complement of a. For example, if p = 4 then perhaps a = {1, 2} and b = {3, 4}. If you know about inverses of partitioned matrices you can show that

{y_[b] | y_[a], θ, Σ} ∼ multivariate normal(θ_{b|a}, Σ_{b|a}),  where
θ_{b|a} = θ_[b] + Σ_[b,a] (Σ_[a,a])^{-1} (y_[a] - θ_[a])        (7.10)
Σ_{b|a} = Σ_[b,b] - Σ_[b,a] (Σ_[a,a])^{-1} Σ_[a,b].        (7.11)

In the above formulae, θ_[b] refers to the elements of θ corresponding to the indices in b, and Σ_[a,b] refers to the matrix made up of the elements that are in rows a and columns b of Σ.

Let's try to gain a little bit of intuition about what is going on in Equations 7.10 and 7.11. Suppose y is a sample from our population of four variables glu, bp, skin and bmi. If we have glu and bp data for someone (a = {1, 2}) but are missing skin and bmi measurements (b = {3, 4}), then we would be interested in the conditional distribution of these missing measurements y_[b] given the observed information y_[a]. Equation 7.10 says that the conditional means of skin and bmi start off at their unconditional means θ_[b], but are then modified by Σ_[b,a](Σ_[a,a])^{-1}(y_[a] - θ_[a]). For example, if a person had higher than average values of glu and bp, then (y_[a] - θ_[a]) would be a 2 × 1 vector of positive numbers. For our data the 2 × 2 matrix Σ_[b,a](Σ_[a,a])^{-1} has all positive entries, and so θ_{b|a} > θ_[b]. This makes sense: If all four variables are positively correlated, then if we observe higher than average values of glu and bp, we should also expect higher than average values of skin and bmi. Also note that Σ_{b|a} is equal to the unconditional variance Σ_[b,b] but with something subtracted off, suggesting that the conditional variance is less than the unconditional variance. Again, this makes sense: having information about some variables should decrease, or at least not increase, our uncertainty about the others.

The R code below implements the Gibbs sampling scheme for missing data described in steps 1, 2 and 3 above:

data(chapter7) ; Y<-Y.pima.miss

### prior parameters
n<-dim(Y)[1] ; p<-dim(Y)[2]
mu0<-c(120,64,26,26)
sd0<-(mu0/2)
L0<-matrix(.1,p,p) ; diag(L0)<-1 ; L0<-L0*outer(sd0,sd0)
nu0<-p+2 ; S0<-L0
###

### starting values
Sigma<-S0
Y.full<-Y
O<-1*(!is.na(Y))
for(j in 1:p)
{
  Y.full[is.na(Y.full[,j]),j]<-mean(Y.full[,j],na.rm=TRUE)
}
###

### Gibbs sampler
THETA<-SIGMA<-Y.MISS<-NULL
set.seed(1)
for(s in 1:1000)
{

  ###update theta
  ybar<-apply(Y.full,2,mean)
  Ln<-solve( solve(L0) + n*solve(Sigma) )
  mun<-Ln%*%( solve(L0)%*%mu0 + n*solve(Sigma)%*%ybar )
  theta<-rmvnorm(1,mun,Ln)
  ###

  ###update Sigma
  Sn<- S0 + ( t(Y.full)-c(theta) )%*%t( t(Y.full)-c(theta) )
  Sigma<-solve( rwish(1, nu0+n, solve(Sn)) )
  ###

  ###update missing data
  for(i in 1:n)
  {
    b <- ( O[i,]==0 )
    a <- ( O[i,]==1 )
    if( any(b) )    # only subjects with at least one missing value need updating
    {
      iSa<- solve(Sigma[a,a])
      beta.j <- Sigma[b,a]%*%iSa
      Sigma.j <- Sigma[b,b] - Sigma[b,a]%*%iSa%*%Sigma[a,b]
      theta.j<- theta[b] + beta.j%*%( t(Y.full[i,a]) - theta[a] )
      Y.full[i,b] <- rmvnorm(1,theta.j,Sigma.j)
    }
  }

  ### save results
  THETA<-rbind(THETA,theta) ; SIGMA<-rbind(SIGMA,c(Sigma))
  Y.MISS<-rbind(Y.MISS, Y.full[O==0] )
  ###
}

The prior mean µ_0 = (120, 64, 26, 26)^T was obtained from national averages, and the prior variances were based primarily on keeping most of the prior mass on values that are above zero. These prior distributions are likely much more diffuse than more informed prior distributions that could be provided by someone who is familiar with this population or these variables.

The Monte Carlo approximation of E[θ | y_1, ..., y_n] is (123.46, 71.03, 29.35, 32.18), obtained by averaging the 1,000 θ-values generated by the Gibbs sampler. Posterior confidence intervals and other quantities can additionally be obtained in the usual way from the Gibbs samples. We can also average the 1,000 values of Σ to obtain E[Σ | y_1, ..., y_n], the posterior expectation of Σ. However, when looking at associations among a set of variables, it is often the correlations that are of interest and not the covariances. To each covariance matrix Σ there corresponds a correlation matrix C, given by

C = \{ c_{j,k} : c_{j,k} = Σ_{[j,k]} / \sqrt{ Σ_{[j,j]} Σ_{[k,k]} } \}.

We can convert our 1,000 posterior samples of Σ into 1,000 posterior samples of C using the following R-code:

COR<-array( dim=c(p,p,1000) )
for(s in 1:1000)
{
  Sig<-matrix( SIGMA[s,], nrow=p, ncol=p )
  COR[,,s]<-Sig/sqrt( outer( diag(Sig), diag(Sig) ) )
}

This code generates a 4 × 4 × 1000 array, where each "slice" is a 4 × 4 correlation matrix generated from the posterior distribution. The posterior expectation of C is

E[C | y_1, ..., y_n] = \begin{pmatrix} 1.00 & 0.23 & 0.25 & 0.19 \\ 0.23 & 1.00 & 0.25 & 0.24 \\ 0.25 & 0.25 & 1.00 & 0.65 \\ 0.19 & 0.24 & 0.65 & 1.00 \end{pmatrix}

and marginal posterior 95% quantile-based confidence intervals can be obtained with the command apply(COR, c(1,2), quantile, prob=c(.025,.975) ). These are displayed graphically in the left panel of Figure 7.4.

Fig. 7.4. Ninety-five percent posterior confidence intervals for correlations (left) and regression coefficients (right).

Prediction and regression

Multivariate models are often used to predict one or more variables given the others. Consider, for example, a predictive model of glu based on measurements of bp, skin and bmi. Using a = {2, 3, 4} and b = {1} in Equation 7.10,

the conditional mean of y_[b] = glu, given numerical values of y_[a] = {bp, skin, bmi}, is given by

E[y_[b] | θ, Σ, y_[a]] = θ_[b] + β_{b|a}^T (y_[a] - θ_[a]),

where β_{b|a}^T = Σ_[b,a] (Σ_[a,a])^{-1}. Since this takes the form of a linear regression model, we call the value of β_{b|a} the regression coefficient for y_[b] given y_[a] based on Σ. Values of β_{b|a} can be computed for each posterior sample of Σ, allowing us to obtain posterior expectations and confidence intervals for these regression coefficients. Quantile-based 95% confidence intervals for each of {β_{1|234}, β_{2|134}, β_{3|124}, β_{4|123}} are shown graphically in the second column of Figure 7.4. The regression coefficients often tell a different story than the correlations: The bottom row of plots, for example, shows that while there is strong evidence that the correlations between bmi and each of the other variables are all positive, the plots on the right-hand side suggest that bmi is nearly conditionally independent of glu and bp given skin.
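As a sketch of how such intervals can be computed (reusing the SIGMA samples and p from the Gibbs sampler above; the object name BETA is illustrative), the posterior samples of β_{1|234}, the coefficients for glu given bp, skin and bmi, could be obtained as follows:

b<-1 ; a<-c(2,3,4)
BETA<-NULL
for(s in 1:dim(SIGMA)[1])
{
  Sig<-matrix( SIGMA[s,], nrow=p, ncol=p )
  BETA<-rbind( BETA, Sig[b,a]%*%solve(Sig[a,a]) )     # beta_{b|a} = Sigma[b,a] Sigma[a,a]^{-1}
}
apply( BETA, 2, quantile, prob=c(.025,.5,.975) )      # intervals as plotted in Figure 7.4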


Fig. 7.5. True values of the missing data versus their posterior expectations.

Out-of-sample validation

Actually, the dataset we just analyzed was created by taking a complete data matrix with no missing values and randomly replacing 10% of the entries with NA's. Since the original dataset is available, we can compare values predicted by the model to the actual sample values. This comparison is made graphically in Figure 7.5, which plots the true value of y_{i,j} against its posterior mean for each {i, j} such that o_{i,j} = 0. It looks like we are able to do a better job predicting missing values of skin and bmi than the other two variables. This makes sense, as these two variables have the highest correlation. If skin is missing, we can make a good prediction for it based on the observed value of bmi, and vice-versa. Such a procedure, where we evaluate how well a model does at predicting data that were not used to estimate the parameters, is called out-of-sample validation, and is often used to quantify the predictive performance of a model.
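A comparison like the one in Figure 7.5 could be produced along the following lines; this is only a sketch, and it assumes the complete data matrix is available under the hypothetical name Y.pima.full, with Y.MISS and O taken from the Gibbs sampler above.

y.pred<-apply(Y.MISS, 2, mean)       # posterior mean of each missing entry
y.true<-Y.pima.full[O==0]            # the held-out true values, in the same order
plot(y.true, y.pred) ; abline(0,1)   # compare, as in Figure 7.5
cor(y.true, y.pred)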

7.6 Discussion and further references

The multivariate normal model can be justified as a sampling model for reasons analogous to those for the univariate normal model (see Section 5.7): It is characterized by independence between the sample mean and sample variance (Rao, 1958), it is a maximum entropy distribution and it provides consistent estimation of the population mean and variance, even if the population is not multivariate normal. The multivariate normal and Wishart distributions form the foundation of multivariate data analysis. A classic text on the subject is Mardia et al (1979), and one with more coverage of Bayesian approaches is Press (1982).

An area of much current Bayesian research involving the multivariate normal distribution is the study of graphical models (Lauritzen, 1996; Jordan, 1998). A graphical model allows elements of the precision matrix to be exactly equal to zero, implying some variables are conditionally independent of each other. A generalization of the Wishart distribution, known as the hyper-inverse-Wishart distribution, has been developed for such models (Dawid and Lauritzen, 1993; Letac and Massam, 2007).