Bayesian Estimation and Inference
18.1 INTRODUCTION

The preceding chapters (and those that follow this one) are focused primarily on parametric specifications and classical estimation methods. These elements of the econometric method present a bit of a methodological dilemma for the researcher. They appear to straightjacket the analyst into a fixed and immutable specification of the model. But in any analysis, there is uncertainty as to the magnitudes, sometimes the signs, and, at the extreme, even the meaning of parameters. It is rare that the presentation of a set of empirical results has not been preceded by at least some exploratory analysis. Proponents of the Bayesian methodology argue that the process of "estimation" is not one of deducing the values of fixed parameters, but rather, in accordance with the scientific method, one of continually updating and sharpening our subjective beliefs about the state of the world. Of course, this adherence to a subjective approach to model building is not necessarily a virtue. If one holds that "models" and "parameters" represent objective truths that the analyst seeks to discover, then the subjectivity of Bayesian methods may be less than perfectly comfortable.

Contemporary applications of Bayesian methods typically advance little of this theological debate. The modern practice of Bayesian econometrics is much more pragmatic. As we will see below in several examples, Bayesian methods have produced some remarkably efficient solutions to difficult estimation problems. Researchers often choose the techniques on practical grounds, rather than in adherence to their philosophical basis; indeed, for some, the Bayesian estimator is merely an algorithm.1

1 For example, from the home website of MLwiN, a widely used program for multilevel (random parameters) modeling, http://www.cmm.bris.ac.uk/MLwiN/features/mcmc.shtml, we find "Markov Chain Monte Carlo (MCMC) methods allow Bayesian models to be fitted, where prior distributions for the model parameters are specified. By default MLwiN sets diffuse priors which can be used to approximate maximum likelihood estimation." Train (2001) is an interesting application that compares Bayesian and classical estimators of a random parameters model.

Bayesian methods have been employed by econometricians since well before Zellner's classic (1971) presentation of the methodology to economists, but until fairly recently, were more or less at the margin of the field. With recent advances in technique (notably the Gibbs sampler) and the advance of computer software and hardware that has made simulation-based estimation routine, Bayesian methods that rely heavily on both have become widespread throughout the social sciences. There are libraries of work on Bayesian econometrics and a rapidly expanding applied literature.2 This chapter will introduce the vocabulary and techniques of Bayesian econometrics. Section 18.2 lays out the essential foundation for the method. The canonical application, the linear regression model, is developed in Section 18.3. Section 18.4 continues the methodological development. The fundamental tool of contemporary Bayesian econometrics, the Gibbs sampler, is presented in Section 18.5. Three applications and several more limited examples are presented in Sections 18.6, 18.7, and 18.8.
Section 18.6 shows how to use the Gibbs sampler to estimate the parameters of a probit model without maximizing the likelihood function. This application also introduces the technique of data augmentation. Bayesian counterparts to the panel data random and fixed effects models are presented in Section 18.7. A hierarchical Bayesian treatment of the random parameters model is presented in Section 18.8 with a comparison to the classical treatment of the same model. Some conclusions are drawn in Section 18.9. The presentation here is nontechnical. A much more extensive entry-level presentation is given by Lancaster (2004). Intermediate-level presentations appear in Cameron and Trivedi (2005, Chapter 13) and Koop (2003). A more challenging treatment is offered in Geweke (2005). The other sources listed in footnote 2 are oriented to applications.

2 Recent additions to the dozens of books on the subject include Gelman et al. (2004), Geweke (2005), Gill (2002), Koop (2003), Lancaster (2004), Congdon (2005), and Rossi et al. (2005). Readers with an historical bent will find Zellner (1971) and Leamer (1978) worthwhile reading. There are also many methodological surveys. Poirier and Tobias (2006) as well as Poirier (1988, 1995) sharply focus the nature of the methodological distinctions between the classical (frequentist) and Bayesian approaches.

18.2 BAYES THEOREM AND THE POSTERIOR DENSITY

The centerpiece of the Bayesian methodology is the Bayes theorem: for events A and B, the conditional probability of event A given that B has occurred is

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}. \tag{18-1} \]

Paraphrased for our applications here, we would write

\[ P(\text{parameters} \mid \text{data}) = \frac{P(\text{data} \mid \text{parameters})\, P(\text{parameters})}{P(\text{data})}. \]

In this setting, the data are viewed as constants whose distributions do not involve the parameters of interest. For the purpose of the study, we treat the data as only a fixed set of additional information to be used in updating our beliefs about the parameters. Note the similarity to (14-1). Thus, we write

\[
\begin{aligned}
P(\text{parameters} \mid \text{data}) &\propto P(\text{data} \mid \text{parameters})\, P(\text{parameters}) \\
&= \text{Likelihood function} \times \text{Prior density}.
\end{aligned} \tag{18-2}
\]

The symbol ∝ means "is proportional to." In the preceding equation, we have dropped the marginal density of the data, so what remains is not a proper density until it is scaled by what will be an inessential proportionality constant. The first term on the right is the joint distribution of the observed random variables y, given the parameters. As we shall analyze it here, this distribution is the normal distribution we have used in our previous analysis; see (14-1). The second term is the prior beliefs of the analyst. The left-hand side is the posterior density of the parameters, given the current body of data, or our revised beliefs about the distribution of the parameters after "seeing" the data. The posterior is a mixture of the prior information and the "current information," that is, the data. Once obtained, this posterior density is available to be the prior density function when the next body of data or other usable information becomes available. The principle involved, which appears nowhere in the classical analysis, is one of continual accretion of knowledge about the parameters.

Traditional Bayesian estimation is heavily parameterized. The prior density and the likelihood function are crucial elements of the analysis, and both must be fully specified for estimation to proceed.
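To make the updating rule in (18-2) concrete, here is a minimal sketch of the grid-approximation idea in Python (NumPy is assumed to be available; the grid size and the binomial setup, which anticipates Example 18.1 below, are illustrative choices rather than anything taken from the text). It multiplies a likelihood by a prior point by point and then rescales so the product integrates to one, which is precisely the role of the proportionality constant dropped in (18-2).

```python
import numpy as np

# Grid approximation of (18-2): posterior is proportional to likelihood * prior.
# Illustrative binomial setup: D defectives observed in N trials,
# unknown defect probability theta, flat prior on (0, 1).
N, D = 25, 8

theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                     # uniform prior density, p(theta) = 1
likelihood = theta**D * (1.0 - theta)**(N - D)  # L(theta | data)

unnormalized = likelihood * prior               # right-hand side of (18-2)
posterior = unnormalized / (unnormalized.sum() * dtheta)  # rescale to a proper density

# The Bayesian point estimate discussed next is the posterior mean.
posterior_mean = (theta * posterior).sum() * dtheta
print(f"posterior mean ≈ {posterior_mean:.4f}")  # close to (D + 1)/(N + 2) = 0.3333
```

Nothing in this sketch requires a closed-form posterior; the same multiply-and-rescale logic, carried out by simulation rather than on a grid, is what the Monte Carlo methods used later in the chapter exploit.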
The Bayesian "estimator" is the mean of the posterior density of the parameters, a quantity that is usually obtained either by integration (when closed forms exist), approximation of integrals by numerical techniques, or by Monte Carlo methods, which are discussed in Section 17.3.

Example 18.1  Bayesian Estimation of a Probability

Consider estimation of the probability that a production process will produce a defective product. In case 1, suppose the sampling design is to choose N = 25 items from the production line and count the number of defectives. If the probability that any item is defective is a constant θ between zero and one, then the likelihood for the sample of data is

\[ L(\theta \mid \text{data}) = \theta^{D} (1 - \theta)^{25 - D}, \]

where D is the number of defectives, say, 8. The maximum likelihood estimator of θ will be p = D/25 = 0.32, and the asymptotic variance of the maximum likelihood estimator is estimated by p(1 − p)/25 = 0.008704.

Now, consider a Bayesian approach to the same analysis. The posterior density is obtained by the following reasoning:

\[ p(\theta \mid \text{data}) = \frac{p(\theta, \text{data})}{p(\text{data})} = \frac{p(\theta, \text{data})}{\int_{\theta} p(\theta, \text{data})\, d\theta} = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})} = \frac{\text{Likelihood}(\text{data} \mid \theta) \times p(\theta)}{p(\text{data})}, \]

where p(θ) is the prior density assumed for θ. [We have taken some license with the terminology, since the likelihood function is conventionally defined as L(θ | data).] Inserting the results of the sample first drawn, we have the posterior density:

\[ p(\theta \mid \text{data}) = \frac{\theta^{D} (1 - \theta)^{N - D}\, p(\theta)}{\int_{\theta} \theta^{D} (1 - \theta)^{N - D}\, p(\theta)\, d\theta}. \]

What follows depends on the assumed prior for θ. Suppose we begin with a "noninformative" prior that treats all allowable values of θ as equally likely. This would imply a uniform distribution over (0, 1). Thus, p(θ) = 1, 0 ≤ θ ≤ 1. The denominator with this assumption is a beta integral (see Section E2.3) with parameters a = D + 1 and b = N − D + 1, so the posterior density is

\[ p(\theta \mid \text{data}) = \frac{\theta^{D} (1 - \theta)^{N - D}}{\Gamma(D + 1)\,\Gamma(N - D + 1)\big/\Gamma(D + 1 + N - D + 1)} = \frac{\Gamma(N + 2)\, \theta^{D} (1 - \theta)^{N - D}}{\Gamma(D + 1)\,\Gamma(N - D + 1)}. \]

This is the density of a random variable with a beta distribution with parameters (α, β) = (D + 1, N − D + 1). (See Section B.4.6.) The mean of this random variable is (D + 1)/(N + 2) = 9/27 = 0.3333 (as opposed to 0.32, the MLE). The posterior variance is

\[ \frac{(D + 1)(N - D + 1)}{(N + 3)(N + 2)^{2}} = 0.007936. \]

There is a loose end in this example. If the uniform prior were noninformative, that would mean that the only information we had was in the likelihood function. Why didn't the Bayesian estimator and the MLE coincide? The reason is that the uniform prior over [0,1] is not really noninformative.
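Returning to the numbers in Example 18.1, the short check below (plain Python; the variable names are ours, not the text's) reproduces both the classical and the Bayesian results from their closed forms.

```python
N, D = 25, 8

# Classical results: MLE and its estimated asymptotic variance.
p_mle = D / N                              # 0.32
asy_var = p_mle * (1.0 - p_mle) / N        # 0.008704

# Bayesian results under the uniform prior: the posterior is Beta(a, b)
# with a = D + 1 and b = N - D + 1, so its moments have closed forms.
a, b = D + 1, N - D + 1
post_mean = a / (a + b)                            # (D + 1)/(N + 2) = 9/27 ≈ 0.3333
post_var = (a * b) / ((a + b) ** 2 * (a + b + 1))  # (D+1)(N-D+1)/[(N+3)(N+2)^2]

print(f"MLE = {p_mle:.4f}, asymptotic variance = {asy_var:.6f}")
print(f"posterior mean = {post_mean:.4f}, posterior variance = {post_var:.6f}")
```

The computed posterior variance, 162/20412 ≈ 0.0079365, matches the 0.007936 reported in the example up to rounding, and the gap between the posterior mean of 0.3333 and the MLE of 0.32 is exactly the effect of the "not really noninformative" uniform prior discussed above.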