Bayesian Inference Chapter 9. Linear Models and Regression


M. Concepcion Ausin
Universidad Carlos III de Madrid
Master in Business Administration and Quantitative Methods
Master in Mathematical Engineering

Objective

To illustrate the Bayesian approach to fitting normal and generalized linear models.

Recommended reading

• Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion), Journal of the Royal Statistical Society B, 34, 1-41.
• Broemeling, L.D. (1985). Bayesian Analysis of Linear Models, Marcel Dekker.
• Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis, Chapter 8.

A.F.M. Smith developed some of the central ideas in the theory and practice of modern Bayesian statistics.

Contents

0. Introduction
1. The multivariate normal distribution
1.1. Conjugate Bayesian inference when the variance-covariance matrix is known up to a constant
1.2. Conjugate Bayesian inference when the variance-covariance matrix is unknown
2. Normal linear models
2.1. Conjugate Bayesian inference for normal linear models
2.2. Example 1: ANOVA model
2.3. Example 2: Simple linear regression model
3. Generalized linear models

1. The multivariate normal distribution

Firstly, we review the definition and properties of the multivariate normal distribution.

Definition. A random variable X = (X_1, ..., X_k)^T is said to have a multivariate normal distribution with mean µ and variance-covariance matrix Σ if

f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right), \qquad x \in \mathbb{R}^k.

In this case, we write X | µ, Σ ∼ N(µ, Σ).

The following properties of the multivariate normal distribution are well known:

• Any subset of X has a (multivariate) normal distribution.
• Any linear combination \sum_{i=1}^k \alpha_i X_i is normally distributed.
• If Y = a + BX is a linear transformation of X, then Y | µ, Σ ∼ N(a + Bµ, BΣB^T).
• If

X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\Big|\, \mu, \Sigma \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right),

then the conditional density of X_1 given X_2 = x_2 is

X_1 \mid x_2, \mu, \Sigma \sim N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right).

The likelihood function given a sample x = (x_1, ..., x_n) of data from N(µ, Σ) is

l(\mu, \Sigma \mid x) = \frac{1}{(2\pi)^{nk/2} |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)
\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \sum_{i=1}^n (x_i - \bar{x})^T \Sigma^{-1} (x_i - \bar{x}) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right)
\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \operatorname{tr}\left( S \Sigma^{-1} \right) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right),

where \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T, and tr(M) denotes the trace of the matrix M.

It is possible to carry out Bayesian inference with conjugate priors for µ and Σ.
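Before moving on, here is a quick numerical check of the likelihood factorization above. The sketch is our own illustration, not part of the slides: the parameter values, sample size and variable names are arbitrary assumptions. It computes the sufficient statistics x̄ and S and verifies that the trace form of the exponent reproduces the log-likelihood obtained by summing the log-density over the individual observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative (made-up) parameters and sample: n observations from N(mu, Sigma)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
n, k = 50, 2
x = rng.multivariate_normal(mu_true, Sigma_true, size=n)

# Sufficient statistics: sample mean and scatter matrix S
xbar = x.mean(axis=0)
S = (x - xbar).T @ (x - xbar)

def log_lik(mu, Sigma):
    """Log-likelihood via tr(S Sigma^{-1}) + n (mu - xbar)' Sigma^{-1} (mu - xbar)."""
    Sigma_inv = np.linalg.inv(Sigma)
    quad = np.trace(S @ Sigma_inv) + n * (mu - xbar) @ Sigma_inv @ (mu - xbar)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * k * np.log(2 * np.pi) + n * logdet + quad)

# Same value as summing the log-density over the observations one by one
direct = stats.multivariate_normal(mu_true, Sigma_true).logpdf(x).sum()
print(np.isclose(log_lik(mu_true, Sigma_true), direct))  # True
```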
We shall consider two cases which reflect different levels of knowledge about the variance-covariance matrix Σ.

1.1. Conjugate Bayesian inference when Σ = C/φ

Firstly, consider the case where the variance-covariance matrix is known up to a constant, i.e. Σ = C/φ, where C is a known matrix. Then we have X | µ, φ ∼ N(µ, C/φ) and the likelihood function is

l(\mu, \phi \mid x) \propto \phi^{nk/2} \exp\left( -\frac{\phi}{2} \left[ \operatorname{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right).

Analogous to the univariate case, it can be seen that a multivariate normal-gamma prior distribution is conjugate.

Definition. We say that (µ, φ) have a multivariate normal-gamma prior with parameters \left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right) if

\mu \mid \phi \sim N\left( m, \frac{1}{\phi} V \right), \qquad \phi \sim G\left( \frac{a}{2}, \frac{b}{2} \right).

In this case, we write (\mu, \phi) \sim NG\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right).

Analogous to the univariate case, the marginal distribution of µ is a multivariate, non-central t distribution.

Definition. A (k-dimensional) random variable T = (T_1, ..., T_k) has a multivariate t distribution with parameters (d, µ_T, Σ_T) if

f(t) = \frac{\Gamma\left( \frac{d+k}{2} \right)}{(\pi d)^{k/2} |\Sigma_T|^{1/2} \Gamma\left( \frac{d}{2} \right)} \left[ 1 + \frac{1}{d} (t - \mu_T)^T \Sigma_T^{-1} (t - \mu_T) \right]^{-\frac{d+k}{2}}.

In this case, we write T ∼ T(µ_T, Σ_T, d).

Theorem. Let (\mu, \phi) \sim NG\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right). Then the marginal density of µ is

\mu \sim T\left( m, \frac{b}{a} V, a \right).

Theorem. Let X \mid \mu, \phi \sim N\left( \mu, \frac{1}{\phi} C \right) and assume a priori (\mu, \phi) \sim NG\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right). Then, given sample data x, we have

\mu \mid x, \phi \sim N\left( m^*, \frac{1}{\phi} V^* \right), \qquad \phi \mid x \sim G\left( \frac{a^*}{2}, \frac{b^*}{2} \right),

where

V^* = \left( V^{-1} + n C^{-1} \right)^{-1},
m^* = V^* \left( V^{-1} m + n C^{-1} \bar{x} \right),
a^* = a + nk,
b^* = b + \operatorname{tr}\left( S C^{-1} \right) + m^T V^{-1} m + n \bar{x}^T C^{-1} \bar{x} - (m^*)^T (V^*)^{-1} m^*.

Theorem. Given the reference prior p(\mu, \phi) \propto \frac{1}{\phi}, the posterior distribution is

p(\mu, \phi \mid x) \propto \phi^{\frac{nk}{2} - 1} \exp\left( -\frac{\phi}{2} \left[ \operatorname{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right).

Then \mu, \phi \mid x \sim NG\left( \bar{x}, n C^{-1}, \frac{(n-1)k}{2}, \frac{\operatorname{tr}\left( S C^{-1} \right)}{2} \right), which implies that

\mu \mid x, \phi \sim N\left( \bar{x}, \frac{1}{n\phi} C \right),
\phi \mid x \sim G\left( \frac{(n-1)k}{2}, \frac{\operatorname{tr}\left( S C^{-1} \right)}{2} \right),
\mu \mid x \sim T\left( \bar{x}, \frac{\operatorname{tr}\left( S C^{-1} \right)}{n(n-1)k} C, (n-1)k \right).

1.2. Conjugate Bayesian inference when Σ is unknown

In this case, it is useful to reparameterize the normal distribution in terms of the precision matrix Φ = Σ^{-1}. Then the normal likelihood function becomes

l(\mu, \Phi) \propto |\Phi|^{n/2} \exp\left( -\frac{1}{2} \left[ \operatorname{tr}(S \Phi) + n (\mu - \bar{x})^T \Phi (\mu - \bar{x}) \right] \right).

It is clear that a conjugate prior for µ and Φ must take a similar form to the likelihood. This is a normal-Wishart distribution.

Definition. A k × k dimensional symmetric, positive definite random variable W is said to have a Wishart distribution with parameters d and V if

f(W) = \frac{|W|^{\frac{d-k-1}{2}}}{2^{\frac{dk}{2}} |V|^{\frac{d}{2}} \pi^{\frac{k(k-1)}{4}} \prod_{i=1}^k \Gamma\left( \frac{d+1-i}{2} \right)} \exp\left( -\frac{1}{2} \operatorname{tr}\left( V^{-1} W \right) \right),

where d > k − 1. In this case, E[W] = dV and we write W ∼ W(d, V).

If W ∼ W(d, V), then the distribution of W^{-1} is said to be an inverse Wishart distribution, W^{-1} ∼ IW(d, V^{-1}), with mean E\left[ W^{-1} \right] = \frac{1}{d-k-1} V^{-1}.
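Both moments just quoted (E[W] = dV and the inverse Wishart mean) are easy to check by simulation. The Monte Carlo sketch below is our own illustration, not part of the slides; the values of k, d and V are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters: dimension k, degrees of freedom d, scale V
k, d = 3, 10
A = rng.normal(size=(k, k))
V = A @ A.T + k * np.eye(k)   # a positive definite scale matrix

draws = stats.wishart(df=d, scale=V).rvs(size=20_000, random_state=rng)

# E[W] = d V: deviation should be small (Monte Carlo error) relative to d V
print(np.abs(draws.mean(axis=0) - d * V).max())

# E[W^{-1}] = V^{-1} / (d - k - 1): the inverse Wishart mean
inv_draws = np.linalg.inv(draws)
print(np.abs(inv_draws.mean(axis=0) - np.linalg.inv(V) / (d - k - 1)).max())
```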
Theorem. Suppose that X \mid \mu, \Phi \sim N\left( \mu, \Phi^{-1} \right), and let \mu \mid \Phi \sim N\left( m, \frac{1}{\alpha} \Phi^{-1} \right) and \Phi \sim W(d, W). Then

\mu \mid \Phi, x \sim N\left( \frac{\alpha m + n \bar{x}}{\alpha + n}, \frac{1}{\alpha + n} \Phi^{-1} \right),
\Phi \mid x \sim W\left( d + n, \left[ W^{-1} + S + \frac{\alpha n}{\alpha + n} (m - \bar{x})(m - \bar{x})^T \right]^{-1} \right).

Theorem. Given the limiting prior p(\Phi) \propto |\Phi|^{-\frac{k+1}{2}}, the posterior distribution is

\mu \mid \Phi, x \sim N\left( \bar{x}, \frac{1}{n} \Phi^{-1} \right), \qquad \Phi \mid x \sim W\left( n - 1, S^{-1} \right).

The conjugacy assumption that the prior precision of µ is proportional to the model precision Φ is very strong in many cases. Often, we may simply wish to use a prior distribution of the form µ ∼ N(m, V), where m and V are known, together with a Wishart prior for Φ, say Φ ∼ W(d, W), as earlier. In this case, the conditional posterior distributions are

\mu \mid \Phi, x \sim N\left( \left( V^{-1} + n \Phi \right)^{-1} \left( V^{-1} m + n \Phi \bar{x} \right), \left( V^{-1} + n \Phi \right)^{-1} \right),
\Phi \mid \mu, x \sim W\left( d + n, \left[ W^{-1} + S + n (\mu - \bar{x})(\mu - \bar{x})^T \right]^{-1} \right),

and therefore it is straightforward to set up a Gibbs sampling algorithm to sample from the joint posterior, as in the univariate case.
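Since both full conditionals above are standard distributions, the Gibbs sampler takes only a few lines of code. The sketch below is our own minimal implementation under the stated semi-conjugate prior; the function and variable names are ours, and the data x, prior values m, V, d, W would come from the application at hand.

```python
import numpy as np
from scipy import stats

def gibbs_normal_wishart(x, m, V, d, W, n_iter=2000, seed=0):
    """Gibbs sampler for mu ~ N(m, V) and Phi ~ Wishart(d, W) (semi-conjugate prior).

    Alternates between the two full conditionals given in the text.
    A minimal sketch: no burn-in handling or convergence checks.
    """
    rng = np.random.default_rng(seed)
    n, k = x.shape
    xbar = x.mean(axis=0)
    S = (x - xbar).T @ (x - xbar)
    V_inv = np.linalg.inv(V)
    W_inv = np.linalg.inv(W)

    mu = xbar.copy()                      # starting value
    mu_draws, Phi_draws = [], []
    for _ in range(n_iter):
        # Phi | mu, x ~ W(d + n, [W^{-1} + S + n (mu - xbar)(mu - xbar)^T]^{-1})
        scale = np.linalg.inv(W_inv + S + n * np.outer(mu - xbar, mu - xbar))
        Phi = stats.wishart(df=d + n, scale=scale).rvs(random_state=rng)

        # mu | Phi, x ~ N((V^{-1} + n Phi)^{-1}(V^{-1} m + n Phi xbar), (V^{-1} + n Phi)^{-1})
        post_cov = np.linalg.inv(V_inv + n * Phi)
        post_mean = post_cov @ (V_inv @ m + n * Phi @ xbar)
        mu = rng.multivariate_normal(post_mean, post_cov)

        mu_draws.append(mu)
        Phi_draws.append(Phi)
    return np.array(mu_draws), np.array(Phi_draws)
```

Posterior summaries, for example the posterior mean of µ or of Σ = Φ^{-1}, can then be approximated by averaging the retained draws after discarding an initial burn-in period.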