Bayesian Inference in the Normal Linear Regression Model
Bayesian Analysis of the Normal Linear Regression Model

Now we see how the general Bayesian theory of the overview lecture works in the familiar regression model.

Reading: textbook chapters 2, 3 and 6. Chapter 2 presents the theory for the simple regression model (no matrix algebra); chapter 3 does multiple regression. In the lecture, I will go straight to multiple regression. We begin with the regression model under classical assumptions (independent errors, homoskedasticity, etc.). Chapter 6 frees up the classical assumptions in several ways; the lecture will cover one of them: a Bayesian treatment of a particular type of heteroskedasticity.

The Regression Model

Assume $k$ explanatory variables, $x_{i1}, \dots, x_{ik}$, for $i = 1, \dots, N$ and the regression model

$$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i.$$

Note that $x_{i1}$ is implicitly set to 1 to allow for an intercept.

In matrix notation, $y = (y_1, y_2, \dots, y_N)'$ is an $N \times 1$ vector, $\varepsilon$ is an $N \times 1$ vector stacked in the same way as $y$, $\beta$ is a $k \times 1$ vector, and $X$ is the $N \times k$ matrix

$$X = \begin{bmatrix} 1 & x_{12} & \cdots & x_{1k} \\ 1 & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N2} & \cdots & x_{Nk} \end{bmatrix},$$

so the regression model can be written as

$$y = X\beta + \varepsilon.$$

The Likelihood Function

The likelihood is derived under the classical assumptions: $\varepsilon \sim N(0_N, h^{-1} I_N)$, where $h = \sigma^{-2}$ is the error precision, and all elements of $X$ are fixed (i.e. not random variables).

Exercise 10.1 of Bayesian Econometric Methods shows that the likelihood function can be written in terms of the OLS quantities

$$\nu = N - k, \qquad \widehat{\beta} = (X'X)^{-1} X'y, \qquad s^2 = \frac{(y - X\widehat{\beta})'(y - X\widehat{\beta})}{\nu}.$$

The likelihood function is

$$p(y \mid \beta, h) = \frac{1}{(2\pi)^{N/2}} \left\{ h^{k/2} \exp\left[-\frac{h}{2}(\beta - \widehat{\beta})' X'X (\beta - \widehat{\beta})\right] \right\} \left\{ h^{\nu/2} \exp\left[-\frac{h\nu}{2 s^{-2}}\right] \right\}.$$

The Prior

A common starting point is the natural conjugate Normal-Gamma prior. $\beta$ conditional on $h$ is multivariate Normal,

$$\beta \mid h \sim N(\underline{\beta}, h^{-1} \underline{V}),$$

and the prior for the error precision $h$ is Gamma,

$$h \sim G(\underline{s}^{-2}, \underline{\nu}).$$

Here $\underline{\beta}$, $\underline{V}$, $\underline{s}^{-2}$ and $\underline{\nu}$ are prior hyperparameter values chosen by the researcher. Notation: $\beta, h \sim NG(\underline{\beta}, \underline{V}, \underline{s}^{-2}, \underline{\nu})$.

The Posterior

Multiply the likelihood by the prior and collect terms (see Bayesian Econometric Methods, Exercise 10.1). The posterior is

$$\beta, h \mid y \sim NG(\overline{\beta}, \overline{V}, \overline{s}^{-2}, \overline{\nu}),$$

where

$$\overline{V} = \left(\underline{V}^{-1} + X'X\right)^{-1}, \qquad \overline{\beta} = \overline{V}\left(\underline{V}^{-1} \underline{\beta} + X'X \widehat{\beta}\right), \qquad \overline{\nu} = \underline{\nu} + N,$$

and $\overline{s}^{2}$ is defined implicitly through

$$\overline{\nu}\, \overline{s}^2 = \underline{\nu}\, \underline{s}^2 + \nu s^2 + (\widehat{\beta} - \underline{\beta})' \left[\underline{V} + (X'X)^{-1}\right]^{-1} (\widehat{\beta} - \underline{\beta}).$$

The marginal posterior for $\beta$ is a multivariate t distribution,

$$\beta \mid y \sim t(\overline{\beta}, \overline{s}^2 \overline{V}, \overline{\nu}),$$

with useful results for estimation:

$$E(\beta \mid y) = \overline{\beta}, \qquad \mathrm{var}(\beta \mid y) = \frac{\overline{\nu}\, \overline{s}^2}{\overline{\nu} - 2}\, \overline{V}.$$

Intuition: the posterior mean and variance are weighted averages of information in the prior and the data.

A Noninformative Prior

A noninformative prior sets $\underline{\nu} = 0$ and makes $\underline{V}$ big (a big prior variance implies large prior uncertainty). But there is not a unique way of doing the latter (see Exercise 10.4 in Bayesian Econometric Methods). A common way is to set $\underline{V}^{-1} = cI_k$, where $c$ is a scalar, and let $c$ go to zero. This noninformative prior is improper and becomes

$$p(\beta, h) \propto \frac{1}{h}.$$

With this choice we get OLS results:

$$\beta, h \mid y \sim NG(\overline{\beta}, \overline{V}, \overline{s}^{-2}, \overline{\nu})$$

with $\overline{V} = (X'X)^{-1}$, $\overline{\beta} = \widehat{\beta}$, $\overline{\nu} = N$ and $\overline{\nu}\, \overline{s}^2 = \nu s^2$.
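As a concrete illustration of the updating formulas above, here is a minimal numerical sketch in Python with NumPy. The function name `ng_posterior` and the argument names `beta0`, `V0`, `s0sq`, `nu0` (standing in for the prior hyperparameters $\underline{\beta}$, $\underline{V}$, $\underline{s}^2$, $\underline{\nu}$) are illustrative choices, not code from the textbook.

```python
import numpy as np

def ng_posterior(y, X, beta0, V0, s0sq, nu0):
    """Posterior hyperparameters under the natural conjugate Normal-Gamma prior."""
    N, k = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)     # OLS estimate (X'X)^{-1} X'y
    nu = N - k
    resid = y - X @ beta_hat
    s2 = resid @ resid / nu                      # OLS variance estimate s^2

    V0_inv = np.linalg.inv(V0)
    V_bar = np.linalg.inv(V0_inv + XtX)          # posterior V
    beta_bar = V_bar @ (V0_inv @ beta0 + XtX @ beta_hat)  # posterior mean
    nu_bar = nu0 + N
    d = beta_hat - beta0
    quad = d @ np.linalg.solve(V0 + np.linalg.inv(XtX), d)
    s2_bar = (nu0 * s0sq + nu * s2 + quad) / nu_bar
    return beta_bar, V_bar, s2_bar, nu_bar
```

The posterior mean of $\beta$ is `beta_bar` and, for $\overline{\nu} > 2$, its covariance is `nu_bar * s2_bar / (nu_bar - 2) * V_bar`, matching the marginal posterior results above.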
Model Comparison

Case 1: $M_1$ imposes a linear restriction and $M_2$ does not (nested). Case 2: $M_1: y = X_1 \beta^{(1)} + \varepsilon_1$ and $M_2: y = X_2 \beta^{(2)} + \varepsilon_2$, where $X_1$ and $X_2$ contain different explanatory variables (non-nested).

Both cases can be handled by defining the models as (for $j = 1, 2$)

$$M_j : y_j = X_j \beta^{(j)} + \varepsilon_j.$$

Non-nested model comparison involves $y_1 = y_2$. Nested model comparison defines $M_2$ as the unrestricted regression; imposing the restriction in $M_1$ can involve a redefinition of the explanatory and dependent variables.

Example: Nested Model Comparison

$M_2$ is the unrestricted model

$$y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon.$$

$M_1$ restricts $\beta_3 = 1$ and can be written as

$$y - x_3 = \beta_1 + \beta_2 x_2 + \varepsilon,$$

so $M_1$ has dependent variable $y - x_3$, with the intercept and $x_2$ as explanatory variables.

The marginal likelihood is (for $j = 1, 2$)

$$p(y_j \mid M_j) = c_j \left( \frac{|\overline{V}_j|}{|\underline{V}_j|} \right)^{1/2} \left( \overline{\nu}_j \overline{s}_j^2 \right)^{-\overline{\nu}_j / 2},$$

where $c_j$ is a constant depending on prior hyperparameters, etc. The posterior odds ratio is

$$PO_{12} = \frac{ c_1 \left( |\overline{V}_1| / |\underline{V}_1| \right)^{1/2} \left( \overline{\nu}_1 \overline{s}_1^2 \right)^{-\overline{\nu}_1 / 2} p(M_1) }{ c_2 \left( |\overline{V}_2| / |\underline{V}_2| \right)^{1/2} \left( \overline{\nu}_2 \overline{s}_2^2 \right)^{-\overline{\nu}_2 / 2} p(M_2) }.$$

The posterior odds ratio depends on the prior odds ratio and contains rewards for model fit, for coherency between prior and data information, and for parsimony.

Model Comparison with Noninformative Priors

Important rule: when comparing models using posterior odds ratios, it is acceptable to use noninformative priors over parameters which are common to all models. However, informative, proper priors should be used over all other parameters.

If we set $\underline{\nu}_1 = \underline{\nu}_2 = 0$, the posterior odds ratio still has a sensible interpretation: a noninformative prior for $h_1$ and $h_2$ is fine, since these parameters are common to both models. But noninformative priors for the $\beta^{(j)}$'s cause problems, which occur largely when $k_1 \neq k_2$ (Exercise 10.4 of Bayesian Econometric Methods). E.g. take a noninformative prior for $\beta^{(j)}$ based on $\underline{V}_j^{-1} = cI_{k_j}$ and let $c \to 0$. Since $|\underline{V}_j| = \left(\frac{1}{c}\right)^{k_j}$, the terms involving $k_j$ do not cancel out: if $k_1 < k_2$, $PO_{12}$ becomes infinite, while if $k_1 > k_2$, $PO_{12}$ goes to zero.

Prediction

We want to predict

$$y^* = X^* \beta + \varepsilon^*,$$

where $X^*$ is the $T \times k$ matrix of explanatory variables at which the predictions are made. Remember, prediction is based on

$$p(y^* \mid y) = \iint p(y^* \mid y, \beta, h)\, p(\beta, h \mid y)\, d\beta\, dh.$$

The resulting predictive is

$$y^* \mid y \sim t\left(X^* \overline{\beta},\ \overline{s}^2 \left(I_T + X^* \overline{V} X^{*\prime}\right),\ \overline{\nu}\right).$$

Model comparison, prediction and posterior inference about $\beta$ can all be done analytically, so there is no need for posterior simulation in this model. However, let us illustrate Monte Carlo integration in this model.

Monte Carlo Integration

Remember the basic LLN we used for Monte Carlo integration: let $\beta^{(s)}$ for $s = 1, \dots, S$ be a random sample from $p(\beta \mid y)$, let $g(\cdot)$ be any function, and define

$$\widehat{g}_S = \frac{1}{S} \sum_{s=1}^{S} g\left(\beta^{(s)}\right);$$

then $\widehat{g}_S$ converges to $E[g(\beta) \mid y]$ as $S$ goes to infinity.

How would you write a computer program which did this?

Step 1: Take a random draw, $\beta^{(s)}$, from the posterior for $\beta$ using a random number generator for the multivariate t distribution.
Step 2: Calculate $g(\beta^{(s)})$ and keep this result.
Step 3: Repeat Steps 1 and 2 $S$ times.
Step 4: Take the average of the $S$ draws $g(\beta^{(1)}), \dots, g(\beta^{(S)})$.

These steps will yield an estimate of $E[g(\beta) \mid y]$ for any function of interest. Remember: Monte Carlo integration yields only an approximation of $E[g(\beta) \mid y]$, since you cannot set $S = \infty$. By choosing $S$, you can control the degree of approximation error. Using a CLT, we can obtain a 95% confidence interval for $E[g(\beta) \mid y]$, or a numerical standard error can be reported. A code sketch of these steps follows.
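Steps 1-4 map directly into code. Below is a minimal sketch, assuming the posterior hyperparameters produced by the earlier `ng_posterior` sketch; the multivariate t draw uses the standard Normal/chi-squared mixture representation. The function names are illustrative choices.

```python
import numpy as np

def draw_beta(beta_bar, V_bar, s2_bar, nu_bar, rng):
    """Step 1: one draw from the multivariate t posterior t(beta_bar, s2_bar*V_bar, nu_bar)."""
    k = len(beta_bar)
    L = np.linalg.cholesky(s2_bar * V_bar)    # factor of the scale matrix
    z = rng.standard_normal(k)                # N(0, I_k) draw
    w = rng.chisquare(nu_bar) / nu_bar        # chi-squared mixing variable
    return beta_bar + (L @ z) / np.sqrt(w)

def mc_estimate(g, beta_bar, V_bar, s2_bar, nu_bar, S=10_000, seed=0):
    """Steps 2-4: average g(beta^(s)) over S posterior draws."""
    rng = np.random.default_rng(seed)
    draws = np.array([g(draw_beta(beta_bar, V_bar, s2_bar, nu_bar, rng))
                      for _ in range(S)])
    g_hat = draws.mean()                      # estimate of E[g(beta)|y]
    nse = draws.std(ddof=1) / np.sqrt(S)      # numerical standard error via the CLT
    return g_hat, nse
```

For example, `mc_estimate(lambda b: b[1], beta_bar, V_bar, s2_bar, nu_bar)` approximates $E(\beta_2 \mid y)$, and $\widehat{g}_S \pm 1.96 \times \mathrm{nse}$ gives the approximate 95% confidence interval mentioned above.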
Empirical Illustration

Data set on $N = 546$ houses sold in Windsor, Canada in 1987:

- $y_i$ = sale price of the $i$th house, measured in Canadian dollars,
- $x_{i2}$ = the lot size of the $i$th house, measured in square feet,
- $x_{i3}$ = the number of bedrooms in the $i$th house,
- $x_{i4}$ = the number of bathrooms in the $i$th house,
- $x_{i5}$ = the number of storeys in the $i$th house.

The example uses informative and noninformative priors. The textbook discusses how you might elicit a prior. Our prior implies statements of the form: "if we compare two houses which are identical except that the first house has one bedroom more than the second, then we expect the first house to be worth $5,000 more than the second". This yields the prior mean; we then choose a large prior variance to indicate prior uncertainty.

The following tables present some empirical results (the textbook has lots of discussion of how you would interpret them). 95% HPDI = highest posterior density interval, the shortest interval $[a, b]$ such that

$$p(a \leq \beta_j \leq b \mid y) = 0.95.$$

Prior and Posterior Means for $\beta$ (standard deviations in parentheses)

| | Prior | Posterior Using Noninformative Prior | Posterior Using Informative Prior |
|---|---|---|---|
| $\beta_1$ | 0 (10,000) | -4,009.55 (3,593.16) | -4,035.05 (3,530.16) |
| $\beta_2$ | 10 (5) | 5.43 (0.37) | 5.43 (0.37) |
| $\beta_3$ | 5,000 (2,500) | 2,824.61 (1,211.45) | 2,886.81 (1,184.93) |
| $\beta_4$ | 10,000 (5,000) | 17,105.17 (1,729.65) | 16,965.24 (1,708.02) |
| $\beta_5$ | 10,000 (5,000) | 7,634.90 (1,005.19) | 7,641.23 (997.02) |

Model Comparison involving $\beta_j$

Informative Prior:

| | $p(\beta_j > 0 \mid y)$ | 95% HPDI | Posterior Odds for $\beta_j = 0$ |
|---|---|---|---|
| $\beta_1$ | 0.13 | [-10,957, 2,887] | 4.14 |
| $\beta_2$ | 1.00 | [4.71, 6.15] | $2.25 \times 10^{-39}$ |
| $\beta_3$ | 0.99 | [563.5, 5,210.1] | 0.39 |
| $\beta_4$ | 1.00 | [13,616, 20,314] | $1.72 \times 10^{-19}$ |
| $\beta_5$ | 1.00 | [5,686, 9,596] | $1.22 \times 10^{-11}$ |

Noninformative Prior:

| | $p(\beta_j > 0 \mid y)$ | 95% HPDI | Posterior Odds for $\beta_j = 0$ |
|---|---|---|---|
| $\beta_1$ | 0.13 | [-11,055, 3,036] | – |
| $\beta_2$ | 1.00 | [4.71, 6.15] | – |
| $\beta_3$ | 0.99 | [449.3, 5,200] | – |
| $\beta_4$ | 1.00 | [13,714, 20,497] | – |
| $\beta_5$ | 1.00 | [5,664, 9,606] | – |

Posterior Results for $\beta_2$ Calculated Various Ways (standard deviation and numerical standard error)
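For completeness, here is a minimal sketch of how a 95% HPDI like those reported above can be approximated from posterior draws (for example, draws produced by the Monte Carlo sketch earlier). It relies on the assumption of a unimodal posterior, which holds for the t marginals here; the function name `hpdi` is an illustrative choice.

```python
import numpy as np

def hpdi(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    m = int(np.ceil(prob * n))            # number of draws inside the interval
    widths = x[m - 1:] - x[:n - m + 1]    # widths of all candidate intervals
    i = int(np.argmin(widths))            # index of the shortest one
    return x[i], x[i + m - 1]
```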