

Bayesian Statistics

Chapter 9. Linear models and regression

M. Concepcion Ausin, Universidad Carlos III de Madrid

Master in Business Administration and Quantitative Methods
Master in Mathematical Engineering

Objective

To illustrate the Bayesian approach to fitting normal and generalized linear models.

Recommended reading

• Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society B, 34, 1-41.
• Broemeling, L.D. (1985). Bayesian Analysis of Linear Models. Marcel Dekker.
• Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis, Chapter 8.

AFM Smith

AFM Smith developed some of the central ideas in the theory and practice of modern Bayesian statistics.

Contents

0. Introduction
1. The multivariate normal distribution
   1.1. Conjugate Bayesian inference when the variance-covariance matrix is known up to a constant
   1.2. Conjugate Bayesian inference when the variance-covariance matrix is unknown
2. Normal linear models
   2.1. Conjugate Bayesian inference for normal linear models
   2.2. Example 1: ANOVA model
   2.3. Example 2: Simple linear regression model
3. Generalized linear models

The multivariate normal distribution

Firstly, we review the definition and properties of the multivariate normal distribution.

Definition

A random variable $X = (X_1, \ldots, X_k)^T$ is said to have a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$ if:
\[
f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),
\]
for $x \in \mathbb{R}^k$.

In this case, we write $X \mid \mu, \Sigma \sim N(\mu, \Sigma)$.

The multivariate normal distribution

The following properties of the multivariate normal distribution are well known:

• Any subset of $X$ has a (multivariate) normal distribution.
• Any linear combination $\sum_{i=1}^k \alpha_i X_i$ is normally distributed.
• If $Y = a + BX$ is a linear transformation of $X$, then:
\[
Y \mid \mu, \Sigma \sim N\left( a + B\mu, B \Sigma B^T \right).
\]
• If
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \,\Big|\, \mu, \Sigma \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right),
\]
then the conditional density of $X_1$ given $X_2 = x_2$ is:
\[
X_1 \mid x_2, \mu, \Sigma \sim N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right).
\]
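The conditioning formula above is easy to verify numerically. Below is a minimal sketch in Python (numpy assumed; the parameter values are illustrative, not from the slides) that computes the conditional mean and covariance of $X_1$ given $X_2 = x_2$ for a 3-dimensional example.

```python
import numpy as np

# Illustrative parameters for a 3-dimensional normal, partitioned
# into X1 = (first two components) and X2 = (third component).
mu = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

i1, i2 = [0, 1], [2]                  # indices of X1 and X2
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]
S22 = Sigma[np.ix_(i2, i2)]

x2 = np.array([1.2])                  # observed value of X2

# Conditional mean and covariance of X1 | X2 = x2
cond_mean = mu[i1] + S12 @ np.linalg.solve(S22, x2 - mu[i2])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```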

The multivariate normal distribution

The likelihood function, given a sample $x = (x_1, \ldots, x_n)$ of data from $N(\mu, \Sigma)$, is:
\[
l(\mu, \Sigma \mid x) = \frac{1}{(2\pi)^{nk/2} |\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \right)
\]
\[
\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \sum_{i=1}^n (x_i - \bar{x})^T \Sigma^{-1} (x_i - \bar{x}) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right)
\]
\[
\propto \frac{1}{|\Sigma|^{n/2}} \exp\left( -\frac{1}{2} \left[ \mathrm{tr}\left( S \Sigma^{-1} \right) + n (\mu - \bar{x})^T \Sigma^{-1} (\mu - \bar{x}) \right] \right),
\]
where $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$, $S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$ and $\mathrm{tr}(M)$ represents the trace of the matrix $M$.

It is possible to carry out Bayesian inference with conjugate priors for $\mu$ and $\Sigma$. We shall consider two cases which reflect different levels of knowledge about the variance-covariance matrix, $\Sigma$.

Conjugate Bayesian inference when $\Sigma = \frac{1}{\phi} C$

Firstly, consider the case where the variance-covariance matrix is known up to a constant, i.e. $\Sigma = \frac{1}{\phi} C$, where $C$ is a known matrix. Then, we have $X \mid \mu, \phi \sim N(\mu, \frac{1}{\phi} C)$ and the likelihood function is:
\[
l(\mu, \phi \mid x) \propto \phi^{\frac{nk}{2}} \exp\left( -\frac{\phi}{2} \left[ \mathrm{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right).
\]

Analogous to the univariate case, it can be seen that a multivariate normal-gamma prior distribution is conjugate.

Conjugate Bayesian inference when $\Sigma = \frac{1}{\phi} C$

Definition

We say that $(\mu, \phi)$ have a multivariate normal-gamma prior with parameters $(m, V^{-1}, \frac{a}{2}, \frac{b}{2})$ if:
\[
\mu \mid \phi \sim N\left( m, \frac{1}{\phi} V \right), \qquad \phi \sim G\left( \frac{a}{2}, \frac{b}{2} \right).
\]

In this case, we write $(\mu, \phi) \sim NG(m, V^{-1}, \frac{a}{2}, \frac{b}{2})$. Analogous to the univariate case, the marginal distribution of $\mu$ is a multivariate, non-central t distribution.

Conjugate Bayesian inference when $\Sigma = \frac{1}{\phi} C$

Definition

A ($k$-dimensional) random variable, $T = (T_1, \ldots, T_k)$, has a multivariate t distribution with parameters $(d, \mu_T, \Sigma_T)$ if:
\[
f(t) = \frac{\Gamma\left( \frac{d+k}{2} \right)}{\Gamma\left( \frac{d}{2} \right) (\pi d)^{k/2} |\Sigma_T|^{1/2}} \left[ 1 + \frac{1}{d} (t - \mu_T)^T \Sigma_T^{-1} (t - \mu_T) \right]^{-\frac{d+k}{2}}.
\]

In this case, we write $T \sim T(\mu_T, \Sigma_T, d)$.

Theorem

Let $(\mu, \phi) \sim NG(m, V^{-1}, \frac{a}{2}, \frac{b}{2})$. Then, the marginal density of $\mu$ is:
\[
\mu \sim T\left( m, \frac{b}{a} V, a \right).
\]

Conjugate Bayesian inference when $\Sigma = \frac{1}{\phi} C$

Theorem

Let $X \mid \mu, \phi \sim N(\mu, \frac{1}{\phi} C)$ and assume a priori $(\mu, \phi) \sim NG(m, V^{-1}, \frac{a}{2}, \frac{b}{2})$. Then, given sample data, $x$, we have that:
\[
\mu \mid x, \phi \sim N\left( m^*, \frac{1}{\phi} V^* \right), \qquad \phi \mid x \sim G\left( \frac{a^*}{2}, \frac{b^*}{2} \right),
\]
where:
\[
V^* = \left( V^{-1} + n C^{-1} \right)^{-1}, \qquad m^* = V^* \left( V^{-1} m + n C^{-1} \bar{x} \right),
\]
\[
a^* = a + nk, \qquad b^* = b + \mathrm{tr}\left( S C^{-1} \right) + m^T V^{-1} m + n \bar{x}^T C^{-1} \bar{x} - m^{*T} V^{*-1} m^*.
\]
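The update above is a few lines of linear algebra. The following sketch (numpy assumed; the function and variable names are mine) computes $(m^*, V^*, a^*, b^*)$ from the data and the prior hyperparameters.

```python
import numpy as np

def ng_posterior(x, C, m, V, a, b):
    """Normal-gamma posterior update when Sigma = C / phi.

    x: (n, k) data matrix; C: known k x k matrix;
    m, V, a, b: hyperparameters of the NG(m, V^{-1}, a/2, b/2) prior.
    """
    n, k = x.shape
    xbar = x.mean(axis=0)
    R = x - xbar
    S = R.T @ R                              # sum-of-squares matrix
    Cinv = np.linalg.inv(C)
    Vinv = np.linalg.inv(V)
    V_star = np.linalg.inv(Vinv + n * Cinv)
    m_star = V_star @ (Vinv @ m + n * Cinv @ xbar)
    a_star = a + n * k
    b_star = (b + np.trace(S @ Cinv) + m @ Vinv @ m
              + n * xbar @ Cinv @ xbar
              - m_star @ np.linalg.inv(V_star) @ m_star)
    return m_star, V_star, a_star, b_star
```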

Conjugate Bayesian inference when $\Sigma = \frac{1}{\phi} C$

Theorem

Given the reference prior $p(\mu, \phi) \propto \frac{1}{\phi}$, the posterior distribution is:
\[
p(\mu, \phi \mid x) \propto \phi^{\frac{nk}{2} - 1} \exp\left( -\frac{\phi}{2} \left[ \mathrm{tr}\left( S C^{-1} \right) + n (\mu - \bar{x})^T C^{-1} (\mu - \bar{x}) \right] \right).
\]

Then, $(\mu, \phi) \mid x \sim NG\left( \bar{x}, n C^{-1}, \frac{(n-1)k}{2}, \frac{\mathrm{tr}(S C^{-1})}{2} \right)$, which implies that:
\[
\mu \mid x, \phi \sim N\left( \bar{x}, \frac{1}{n\phi} C \right), \qquad \phi \mid x \sim G\left( \frac{(n-1)k}{2}, \frac{\mathrm{tr}\left( S C^{-1} \right)}{2} \right),
\]
\[
\mu \mid x \sim T\left( \bar{x}, \frac{\mathrm{tr}\left( S C^{-1} \right)}{n(n-1)k} C, (n-1)k \right).
\]

Conjugate Bayesian inference when $\Sigma$ is unknown

In this case, it is useful to reparameterize the normal distribution in terms of the precision matrix $\Phi = \Sigma^{-1}$. Then, the normal likelihood function becomes:
\[
l(\mu, \Phi) \propto |\Phi|^{\frac{n}{2}} \exp\left( -\frac{1}{2} \left[ \mathrm{tr}(S\Phi) + n (\mu - \bar{x})^T \Phi (\mu - \bar{x}) \right] \right).
\]

It is clear that a conjugate prior for $\mu$ and $\Sigma$ must take a similar form to the likelihood. This is a normal-Wishart distribution.

Conjugate Bayesian inference when $\Sigma$ is unknown

Definition

A $k \times k$ dimensional symmetric, positive definite random variable $W$ is said to have a Wishart distribution with parameters $d$ and $V$ if:
\[
f(W) = \frac{|W|^{\frac{d-k-1}{2}}}{2^{\frac{dk}{2}} |V|^{\frac{d}{2}} \pi^{\frac{k(k-1)}{4}} \prod_{i=1}^k \Gamma\left( \frac{d+1-i}{2} \right)} \exp\left( -\frac{1}{2} \mathrm{tr}\left( V^{-1} W \right) \right),
\]
where $d > k - 1$. In this case, $E[W] = dV$ and we write $W \sim W(d, V)$.

If $W \sim W(d, V)$, then the distribution of $W^{-1}$ is said to be an inverse Wishart distribution, $W^{-1} \sim IW(d, V^{-1})$, with mean
\[
E\left[ W^{-1} \right] = \frac{1}{d - k - 1} V^{-1}.
\]
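This parameterization matches the one used by scipy, where $E[W] = dV$ corresponds to df times scale. A small Monte Carlo sanity check, assuming scipy is available:

```python
import numpy as np
from scipy.stats import wishart

# Illustrative parameters: d degrees of freedom, scale matrix V
d = 5
V = np.array([[1.0, 0.3],
              [0.3, 2.0]])

# Check that the sample mean of W ~ W(d, V) is close to d * V
samples = wishart.rvs(df=d, scale=V, size=20000, random_state=0)
print(samples.mean(axis=0))   # should be close to d * V
print(d * V)
```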

Conjugate Bayesian inference when $\Sigma$ is unknown

Theorem

Suppose that $X \mid \mu, \Phi \sim N(\mu, \Phi^{-1})$ and let $\mu \mid \Phi \sim N(m, \frac{1}{\alpha} \Phi^{-1})$ and $\Phi \sim W(d, W)$. Then:
\[
\mu \mid \Phi, x \sim N\left( \frac{\alpha m + n\bar{x}}{\alpha + n}, \frac{1}{\alpha + n} \Phi^{-1} \right),
\]
\[
\Phi \mid x \sim W\left( d + n, \left[ W^{-1} + S + \frac{\alpha n}{\alpha + n} (m - \bar{x})(m - \bar{x})^T \right]^{-1} \right).
\]

Theorem

Given the limiting prior, $p(\Phi) \propto |\Phi|^{-\frac{k+1}{2}}$, the posterior distribution is:
\[
\mu \mid \Phi, x \sim N\left( \bar{x}, \frac{1}{n} \Phi^{-1} \right), \qquad \Phi \mid x \sim W\left( n - 1, S^{-1} \right).
\]

Conjugate Bayesian inference when $\Sigma$ is unknown

The conjugacy assumption that the prior precision of $\mu$ is proportional to the model precision, $\Phi$, is very strong in many cases. Often, we may simply wish to use a prior distribution of the form $\mu \sim N(m, V)$, where $m$ and $V$ are known, and a Wishart prior for $\Phi$, say $\Phi \sim W(d, W)$, as earlier.

In this case, the conditional posterior distributions are:
\[
\mu \mid \Phi, x \sim N\left( \left( V^{-1} + n\Phi \right)^{-1} \left( V^{-1} m + n\Phi \bar{x} \right), \left( V^{-1} + n\Phi \right)^{-1} \right),
\]
\[
\Phi \mid \mu, x \sim W\left( d + n, \left[ W^{-1} + S + n (\mu - \bar{x})(\mu - \bar{x})^T \right]^{-1} \right),
\]
and therefore, it is straightforward to set up a Gibbs sampler to sample from the joint posterior, as in the univariate case.
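A minimal Gibbs sampler for this semi-conjugate model might look as follows. This is a sketch in Python (numpy/scipy assumed; the function and variable names are mine), with the Wishart parameterized exactly as in the definition above.

```python
import numpy as np
from scipy.stats import wishart

def gibbs_mvn(x, m, V, d, W, n_iter=5000, seed=0):
    """Gibbs sampler for (mu, Phi) with priors mu ~ N(m, V), Phi ~ W(d, W)."""
    rng = np.random.default_rng(seed)
    n, k = x.shape
    xbar = x.mean(axis=0)
    S = (x - xbar).T @ (x - xbar)
    Vinv = np.linalg.inv(V)
    Winv = np.linalg.inv(W)
    Phi = np.eye(k)                                  # initial precision matrix
    mus, Phis = [], []
    for _ in range(n_iter):
        # mu | Phi, x ~ N((V^-1 + n Phi)^-1 (V^-1 m + n Phi xbar), (V^-1 + n Phi)^-1)
        cov = np.linalg.inv(Vinv + n * Phi)
        mean = cov @ (Vinv @ m + n * Phi @ xbar)
        mu = rng.multivariate_normal(mean, cov)
        # Phi | mu, x ~ W(d + n, [W^-1 + S + n (mu - xbar)(mu - xbar)^T]^-1)
        scale = np.linalg.inv(Winv + S + n * np.outer(mu - xbar, mu - xbar))
        Phi = wishart.rvs(df=d + n, scale=scale, random_state=rng)
        mus.append(mu)
        Phis.append(Phi)
    return np.array(mus), np.array(Phis)
```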

Normal linear models

Definition

A normal linear model is of the form:
\[
y = X\theta + \epsilon,
\]
where $y = (y_1, \ldots, y_n)'$ are the observed data, $X$ is a given design matrix of size $n \times k$ and $\theta = (\theta_1, \ldots, \theta_k)'$ is the set of model parameters. Finally, it is usually assumed that:
\[
\epsilon \sim N\left( \mathbf{0}_n, \frac{1}{\phi} I_n \right).
\]

Normal linear models

A simple example of a normal linear model is the simple linear regression model, where
\[
X^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}
\]
and $\theta = (\alpha, \beta)^T$.

It is easy to see that there is a conjugate, multivariate normal-gamma prior distribution for any normal linear model.

Conjugate Bayesian inference

Theorem

Consider a normal linear model, $y = X\theta + \epsilon$, and assume a normal-gamma prior distribution,
\[
(\theta, \phi) \sim NG\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right).
\]
Then, the posterior distribution given $y$ is also normal-gamma:
\[
(\theta, \phi) \mid y \sim NG\left( m^*, V^{*-1}, \frac{a^*}{2}, \frac{b^*}{2} \right),
\]
where:
\[
m^* = \left( X^T X + V^{-1} \right)^{-1} \left( X^T y + V^{-1} m \right), \qquad V^* = \left( X^T X + V^{-1} \right)^{-1},
\]
\[
a^* = a + n, \qquad b^* = b + y^T y + m^T V^{-1} m - m^{*T} V^{*-1} m^*.
\]
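These updates are direct to implement. A sketch in Python (numpy assumed; names are mine):

```python
import numpy as np

def nlm_posterior(X, y, m, V, a, b):
    """Normal-gamma posterior for the normal linear model y = X theta + eps."""
    n = len(y)
    Vinv = np.linalg.inv(V)
    V_star = np.linalg.inv(X.T @ X + Vinv)
    m_star = V_star @ (X.T @ y + Vinv @ m)
    a_star = a + n
    b_star = b + y @ y + m @ Vinv @ m - m_star @ np.linalg.inv(V_star) @ m_star
    return m_star, V_star, a_star, b_star
```

For the simple regression example above, one would take `X = np.column_stack([np.ones(n), x])`; the ANOVA example later in the chapter corresponds to a block design matrix of group indicators.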

Conjugate Bayesian inference

Given a normal linear model, $y = X\theta + \epsilon$, and assuming a normal-gamma prior distribution, $(\theta, \phi) \sim NG(m, V^{-1}, \frac{a}{2}, \frac{b}{2})$, it is easy to see that the predictive distribution of $y$ is:
\[
y \sim T\left( Xm, \frac{b}{a} \left( X V X^T + I \right), a \right).
\]
To see this, note that the distribution of $y \mid \phi$ is:
\[
y \mid \phi \sim N\left( Xm, \frac{1}{\phi} \left( X V X^T + I \right) \right),
\]
so the joint distribution of $y$ and $\phi$ is multivariate normal-gamma:
\[
(y, \phi) \sim NG\left( Xm, \left( X V X^T + I \right)^{-1}, \frac{a}{2}, \frac{b}{2} \right),
\]
and therefore the marginal distribution of $y$ is the multivariate t distribution obtained before.

Interpretation of the posterior mean

We have that:
\[
E[\theta \mid y] = \left( X^T X + V^{-1} \right)^{-1} \left( X^T y + V^{-1} m \right)
\]
\[
= \left( X^T X + V^{-1} \right)^{-1} \left( X^T X \left( X^T X \right)^{-1} X^T y + V^{-1} m \right)
\]
\[
= \left( X^T X + V^{-1} \right)^{-1} \left( X^T X \hat{\theta} + V^{-1} m \right),
\]
where $\hat{\theta} = (X^T X)^{-1} X^T y$ is the maximum likelihood estimator.

Thus, this expression may be interpreted as a weighted average of the prior estimator, $m$, and the MLE, $\hat{\theta}$, with weights proportional to their precisions: recall that, conditional on $\phi$, the prior variance was $\frac{1}{\phi} V$, and that, from the classical viewpoint, the distribution of the MLE is
\[
\hat{\theta} \mid \phi \sim N\left( \theta, \frac{1}{\phi} \left( X^T X \right)^{-1} \right).
\]

Conjugate Bayesian inference

Theorem

Consider a normal linear model, $y = X\theta + \epsilon$, and assume the limiting prior distribution $p(\theta, \phi) \propto \frac{1}{\phi}$. Then, we have that:
\[
\theta \mid y, \phi \sim N\left( \hat{\theta}, \frac{1}{\phi} \left( X^T X \right)^{-1} \right), \qquad \phi \mid y \sim G\left( \frac{n-k}{2}, \frac{y^T y - \hat{\theta}^T \left( X^T X \right) \hat{\theta}}{2} \right).
\]

And then,
\[
\theta \mid y \sim T\left( \hat{\theta}, \hat{\sigma}^2 \left( X^T X \right)^{-1}, n - k \right),
\]
where
\[
\hat{\sigma}^2 = \frac{y^T y - \hat{\theta}^T \left( X^T X \right) \hat{\theta}}{n - k}.
\]
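Under this limiting prior the posterior summaries coincide with classical least squares output, which makes a quick sketch easy to check against standard software (numpy/scipy assumed; names are mine):

```python
import numpy as np
from scipy.stats import t as t_dist

def reference_posterior(X, y):
    """theta_hat, sigma2_hat and 95% credible intervals under p(theta, phi) ∝ 1/phi."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ y
    sigma2_hat = (y @ y - theta_hat @ X.T @ X @ theta_hat) / (n - k)
    # Marginal posterior of each theta_j is a t with n - k degrees of freedom
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    q = t_dist.ppf(0.975, df=n - k)
    intervals = np.column_stack([theta_hat - q * se, theta_hat + q * se])
    return theta_hat, sigma2_hat, intervals
```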

Conjugate Bayesian inference

Note that $\hat{\sigma}^2 = \frac{y^T y - \hat{\theta}^T (X^T X) \hat{\theta}}{n - k}$ is the usual classical estimator of $\sigma^2 = \frac{1}{\phi}$. In this case, Bayesian credible intervals, estimators etc. will coincide with their classical counterparts. One should note, however, that the propriety of the posterior distribution in this case relies on two conditions:

1. $n > k$.
2. $X^T X$ is of full rank.

If either of these two conditions is not satisfied, then the posterior distribution will be improper.

ANOVA model

The ANOVA model is an example of a normal linear model where:
\[
y_{ij} = \theta_i + \epsilon_{ij},
\]
where $\epsilon_{ij} \sim N(0, \frac{1}{\phi})$, for $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$.

Thus, the parameters are $\theta = (\theta_1, \ldots, \theta_k)$, the observed data are $y = (y_{11}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}, \ldots, y_{k1}, \ldots, y_{kn_k})^T$, and the design matrix is:
\[
X = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k} \end{pmatrix},
\]
where $\mathbf{1}_{n_i}$ denotes a column vector of $n_i$ ones.

ANOVA model

If we use conditionally independent normal priors, $\theta_i \sim N\left( m_i, \frac{1}{\alpha_i \phi} \right)$, for $i = 1, \ldots, k$, and a gamma prior, $\phi \sim G\left( \frac{a}{2}, \frac{b}{2} \right)$, then we have that:
\[
(\theta, \phi) \sim NG\left( m, V^{-1}, \frac{a}{2}, \frac{b}{2} \right),
\]
where $m = (m_1, \ldots, m_k)$ and $V = \mathrm{diag}\left( \frac{1}{\alpha_1}, \ldots, \frac{1}{\alpha_k} \right)$.

Noting that $X^T X + V^{-1} = \mathrm{diag}\left( n_1 + \alpha_1, \ldots, n_k + \alpha_k \right)$, we can obtain the following posterior distributions.

ANOVA model

It is obtained that:
\[
\theta \mid y, \phi \sim N\left( \begin{pmatrix} \frac{n_1 \bar{y}_{1\cdot} + \alpha_1 m_1}{n_1 + \alpha_1} \\ \vdots \\ \frac{n_k \bar{y}_{k\cdot} + \alpha_k m_k}{n_k + \alpha_k} \end{pmatrix}, \frac{1}{\phi} \, \mathrm{diag}\left( \frac{1}{\alpha_1 + n_1}, \ldots, \frac{1}{\alpha_k + n_k} \right) \right)
\]
and
\[
\phi \mid y \sim G\left( \frac{a + n}{2}, \frac{b + \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^k \frac{n_i \alpha_i}{n_i + \alpha_i} (\bar{y}_{i\cdot} - m_i)^2}{2} \right).
\]

ANOVA model

Often we are interested in the differences in group means, e.g. $\theta_1 - \theta_2$. Here we have:
\[
\theta_1 - \theta_2 \mid y, \phi \sim N\left( \frac{n_1 \bar{y}_{1\cdot} + \alpha_1 m_1}{n_1 + \alpha_1} - \frac{n_2 \bar{y}_{2\cdot} + \alpha_2 m_2}{n_2 + \alpha_2}, \frac{1}{\phi} \left( \frac{1}{\alpha_1 + n_1} + \frac{1}{\alpha_2 + n_2} \right) \right),
\]
and therefore a 95% posterior interval for $\theta_1 - \theta_2$ is given by:
\[
\frac{n_1 \bar{y}_{1\cdot} + \alpha_1 m_1}{n_1 + \alpha_1} - \frac{n_2 \bar{y}_{2\cdot} + \alpha_2 m_2}{n_2 + \alpha_2} \pm \sqrt{ \frac{b + \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^k \frac{n_i \alpha_i}{n_i + \alpha_i} (\bar{y}_{i\cdot} - m_i)^2}{a + n} \left( \frac{1}{\alpha_1 + n_1} + \frac{1}{\alpha_2 + n_2} \right) } \; t_{a+n}(0.975).
\]
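A sketch of this interval computation in Python (numpy/scipy assumed; the data and hyperparameter values are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import t as t_dist

# Illustrative data: k = 2 groups, with conjugate hyperparameters
groups = [np.array([4.1, 5.0, 4.6, 5.3]), np.array([3.2, 3.9, 3.5])]
m_prior = np.array([4.0, 4.0])        # prior means m_i
alpha = np.array([1.0, 1.0])          # prior precision factors alpha_i
a, b = 2.0, 2.0                       # gamma prior hyperparameters

n_i = np.array([len(g) for g in groups])
ybar = np.array([g.mean() for g in groups])
n = n_i.sum()

post_mean = (n_i * ybar + alpha * m_prior) / (n_i + alpha)
b_star = (b + sum(((g - g.mean())**2).sum() for g in groups)
          + (n_i * alpha / (n_i + alpha) * (ybar - m_prior)**2).sum())
a_star = a + n

# 95% posterior interval for theta_1 - theta_2
diff = post_mean[0] - post_mean[1]
scale = np.sqrt(b_star / a_star * (1/(alpha[0] + n_i[0]) + 1/(alpha[1] + n_i[1])))
q = t_dist.ppf(0.975, df=a_star)
print(diff - q * scale, diff + q * scale)
```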

ANOVA model

If, alternatively, we assume the reference prior, $p(\theta, \phi) \propto \frac{1}{\phi}$, we have:
\[
\theta \mid y, \phi \sim N\left( \begin{pmatrix} \bar{y}_{1\cdot} \\ \vdots \\ \bar{y}_{k\cdot} \end{pmatrix}, \frac{1}{\phi} \, \mathrm{diag}\left( \frac{1}{n_1}, \ldots, \frac{1}{n_k} \right) \right), \qquad \phi \mid y \sim G\left( \frac{n-k}{2}, \frac{(n-k) \hat{\sigma}^2}{2} \right),
\]
where $\hat{\sigma}^2 = \frac{1}{n-k} \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$ is the classical variance estimate for this problem.

A 95% posterior interval for $\theta_1 - \theta_2$ is given by:
\[
\bar{y}_{1\cdot} - \bar{y}_{2\cdot} \pm \hat{\sigma} \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} } \; t_{n-k}(0.975),
\]
which is equal to the usual, classical interval.

Simple linear regression model

Consider the simple regression model:
\[
y_i = \alpha + \beta x_i + \epsilon_i,
\]
for $i = 1, \ldots, n$, where $\epsilon_i \sim N\left( 0, \frac{1}{\phi} \right)$, and suppose that we use the limiting prior:
\[
p(\alpha, \beta, \phi) \propto \frac{1}{\phi}.
\]

Simple linear regression model

Then, we have that:
\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \Big|\; y, \phi \sim N\left( \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix}, \frac{1}{\phi n s_x} \begin{pmatrix} \sum_{i=1}^n x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix} \right),
\]
\[
\phi \mid y \sim G\left( \frac{n-2}{2}, \frac{s_y \left( 1 - r^2 \right)}{2} \right),
\]
\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} \Big|\; y \sim T\left( \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix}, \frac{\hat{\sigma}^2}{n s_x} \begin{pmatrix} \sum_{i=1}^n x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}, n - 2 \right),
\]
where
\[
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}, \qquad \hat{\beta} = \frac{s_{xy}}{s_x}, \qquad s_x = \sum_{i=1}^n (x_i - \bar{x})^2, \qquad s_y = \sum_{i=1}^n (y_i - \bar{y})^2,
\]
\[
s_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}), \qquad r = \frac{s_{xy}}{\sqrt{s_x s_y}}, \qquad \hat{\sigma}^2 = \frac{s_y \left( 1 - r^2 \right)}{n - 2}.
\]

Simple linear regression model

Thus, the marginal distributions of $\alpha$ and $\beta$ are:
\[
\alpha \mid y \sim T\left( \hat{\alpha}, \frac{\hat{\sigma}^2 \sum_{i=1}^n x_i^2}{n s_x}, n - 2 \right), \qquad \beta \mid y \sim T\left( \hat{\beta}, \frac{\hat{\sigma}^2}{s_x}, n - 2 \right),
\]
and therefore, for example, a 95% credible interval for $\beta$ is given by:
\[
\hat{\beta} \pm \frac{\hat{\sigma}}{\sqrt{s_x}} \, t_{n-2}(0.975),
\]
equal to the usual classical interval.

Simple linear regression model

Suppose now that we wish to predict a future observation:
\[
y_{new} = \alpha + \beta x_{new} + \epsilon_{new}.
\]
Note that:
\[
E[y_{new} \mid \phi, y] = \hat{\alpha} + \hat{\beta} x_{new},
\]
\[
V[y_{new} \mid \phi, y] = \frac{1}{\phi} \left( \frac{\sum_{i=1}^n x_i^2 + n x_{new}^2 - 2n\bar{x}x_{new}}{n s_x} + 1 \right) = \frac{1}{\phi} \left( \frac{s_x + n\bar{x}^2 + n x_{new}^2 - 2n\bar{x}x_{new}}{n s_x} + 1 \right).
\]
Therefore:
\[
y_{new} \mid \phi, y \sim N\left( \hat{\alpha} + \hat{\beta} x_{new}, \frac{1}{\phi} \left( \frac{(\bar{x} - x_{new})^2}{s_x} + \frac{1}{n} + 1 \right) \right).
\]

Simple linear regression model

And then:
\[
y_{new} \mid y \sim T\left( \hat{\alpha} + \hat{\beta} x_{new}, \hat{\sigma}^2 \left( \frac{(\bar{x} - x_{new})^2}{s_x} + \frac{1}{n} + 1 \right), n - 2 \right),
\]
leading to the following 95% credible interval for $y_{new}$:
\[
\hat{\alpha} + \hat{\beta} x_{new} \pm \hat{\sigma} \sqrt{ \frac{(\bar{x} - x_{new})^2}{s_x} + \frac{1}{n} + 1 } \; t_{n-2}(0.975),
\]
which coincides with the usual, classical interval.
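A compact sketch of these regression and prediction formulas in Python (numpy/scipy assumed; names are mine):

```python
import numpy as np
from scipy.stats import t as t_dist

def predictive_interval(x, y, x_new, level=0.95):
    """Predictive interval for y_new under the limiting prior p ∝ 1/phi."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    s_x = ((x - xbar)**2).sum()
    s_xy = ((x - xbar) * (y - ybar)).sum()
    beta_hat = s_xy / s_x
    alpha_hat = ybar - beta_hat * xbar
    resid = y - alpha_hat - beta_hat * x
    sigma2_hat = (resid**2).sum() / (n - 2)   # equals s_y (1 - r^2) / (n - 2)
    center = alpha_hat + beta_hat * x_new
    scale = np.sqrt(sigma2_hat * ((xbar - x_new)**2 / s_x + 1/n + 1))
    q = t_dist.ppf(0.5 + level / 2, df=n - 2)
    return center - q * scale, center + q * scale
```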

Generalized linear models

The generalized linear model (GLM) generalizes the normal linear model by allowing the possibility of non-normal error distributions and by allowing for a non-linear relationship between $y$ and $x$.

A generalized linear model is specified by two functions:

1. A conditional density function of $y$ given $x$, parameterized by a mean parameter, $\mu = \mu(x) = E[Y \mid x]$, and (possibly) a dispersion parameter, $\phi > 0$, that is independent of $x$.

2. A (one-to-one) link function, $g(\cdot)$, which relates the mean, $\mu = \mu(x)$, to the covariate vector, $x$, as $g(\mu) = x\theta$.

Generalized linear models

The following are generalized linear models with the canonical link function, which is the natural parameterization that leaves the exponential family distribution in canonical form.

Logistic regression
It is often used for predicting the occurrence of an event given covariates:
\[
Y_i \mid p_i \sim \mathrm{Bin}(n_i, p_i), \qquad \log \frac{p_i}{1 - p_i} = x_i \theta.
\]

Poisson regression
It is used for predicting the number of events in a time period given covariates:
\[
Y_i \mid \lambda_i \sim \mathcal{P}(\lambda_i), \qquad \log \lambda_i = x_i \theta.
\]
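These models have no conjugate posterior (as discussed below), but sampling from them is routine. As a concrete illustration, here is a minimal random-walk Metropolis sketch for logistic regression with a normal prior; the choice of sampler, tuning constants, and names are my own, not the slides', which discuss MCMC and adaptive rejection sampling more generally.

```python
import numpy as np

def logistic_metropolis(X, y, n_trials, tau=10.0, step=0.1,
                        n_iter=20000, seed=0):
    """Random-walk Metropolis for Y_i ~ Bin(n_i, p_i), logit(p_i) = x_i theta,
    with prior theta ~ N(0, tau^2 I)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]

    def log_post(theta):
        eta = X @ theta
        # Binomial log likelihood (up to a constant) plus normal log prior
        loglik = np.sum(y * eta - n_trials * np.logaddexp(0, eta))
        return loglik - theta @ theta / (2 * tau**2)

    theta = np.zeros(k)
    lp = log_post(theta)
    draws = np.empty((n_iter, k))
    for s in range(n_iter):
        prop = theta + step * rng.standard_normal(k)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            theta, lp = prop, lp_prop
        draws[s] = theta
    return draws
```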

Generalized linear models

The Bayesian specification of a GLM is completed by defining (typically normal or normal-gamma) prior distributions $p(\theta, \phi)$ over the unknown model parameters. As with standard linear models, when improper priors are used, it is important to check that these lead to valid posterior distributions.

Clearly, these models will not have conjugate posterior distributions, but, usually, they are easily handled by MCMC methods.

In particular, the posterior distributions from these models are usually log-concave and are thus easily sampled via adaptive rejection sampling.