
Journal of Econometrics 185 (2015) 409–419
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/jeconom

Bayesian regression with nonparametric heteroskedasticity

Andriy Norets
Department of Economics, Brown University, United States
E-mail address: [email protected].

Article info

Article history: Received 3 March 2014; received in revised form 1 September 2014; accepted 19 December 2014; available online 7 January 2015.

Keywords: Bayesian linear regression; Heteroskedasticity; Misspecification; Posterior consistency; Semiparametric Bernstein–von Mises theorem; Semiparametric efficiency; Gaussian process priors; Multivariate Bernstein polynomials

Abstract

This paper studies large sample properties of a semiparametric Bayesian approach to inference in a linear regression model. The approach is to model the distribution of the regression error term by a normal distribution with the variance that is a flexible function of covariates. The main result of the paper is a semiparametric Bernstein–von Mises theorem under misspecification: even when the distribution of the regression error term is not normal, the posterior distribution of the properly recentered and rescaled regression coefficients converges to a normal distribution with the zero mean and the variance equal to the semiparametric efficiency bound.

© 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jeconom.2014.12.006
0304-4076/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

A linear model $Y_i = X_i'\beta_0 + \epsilon_i$ with the conditional moment restriction $E(\epsilon_i \mid X_i) = 0$ is a standard regression model, which is widely used in statistics and econometrics. This paper analyzes asymptotic properties of a Bayesian semiparametric approach to estimation of this model. The approach is to model the distribution of the error term by a normal distribution with the variance that is a flexible function of covariates. For example, Gaussian process priors, splines, or polynomials can be used to build a prior for the variance. Normality of the error term guarantees that the Kullback–Leibler (KL) distance between the model and the data generating process (DGP), which does not necessarily satisfy the normality assumption, is minimized at the data generating values of the linear coefficients and the conditional variance of the error term. Thus, one can expect that the posterior asymptotically concentrates around the true values for these two parameters. The normality assumption can also be justified by appealing to the principle of maximum entropy of Jaynes (1957) when only the first two conditional moments are of interest.

The main result of the paper is a semiparametric Bernstein–von Mises theorem under misspecification: even when the distribution of the regression error term is not normal in the DGP, the posterior distribution of the properly recentered and rescaled regression coefficients converges to a normal distribution with the zero mean and the variance equal to the semiparametric efficiency bound. The equality of the variance to the semiparametric efficiency bound suggests that the Bayesian inference about the linear coefficients based on this model is conservative in the following sense: the posterior variance in a correctly specified parametric model is likely to be smaller than the posterior variance in a model that postulates normally distributed errors with the flexibly modeled variance. With carefully specified priors, Bayesian procedures usually behave well in small samples. Thus, the Bayesian normal linear regression with nonparametric heteroskedasticity can also be an attractive alternative to classical semiparametrically efficient estimators from Carroll (1982) and Robinson (1987). At the same time, the results of the paper provide a Bayesian interpretation to these classical estimators.

Several different approaches to inference in a regression model have been proposed in the Bayesian framework. In a standard textbook linear regression model, normality of the error terms is assumed. More recent literature relaxed the normality assumption by using mixtures of normal or Student t distributions. However, if the shape of the error distribution depends on covariates, then the posterior may not concentrate around the data generating values of the linear coefficients (Müller, 2013). Lancaster (2003) and Poirier (2011) do not assume linearity of the regression function and treat the linear projection coefficients as the parameters of interest. They use the Bayesian bootstrap (Rubin, 1981) to justify from the Bayesian perspective the use of the ordinary least squares estimator with a heteroskedasticity robust covariance matrix. Pelenis (2014) demonstrates posterior consistency in a semiparametric model with a parametric specification for the regression function and a nonparametric specification for the conditional distribution of the regression error term. It is also possible to estimate a fully nonparametric model for the distribution of the response conditional on covariates; see, for example, Peng et al. (1996), Wood et al. (2002), Geweke and Keane (2007), Villani et al. (2009), and Norets (2010) for Bayesian models based on smoothly mixing regressions or mixtures of experts, and MacEachern (1999), De Iorio et al. (2004), Griffin and Steel (2006), Dunson and Park (2008), Chung and Dunson (2009), Norets and Pelenis (2014), and Pati et al. (2013) for models based on dependent Dirichlet processes. These fully nonparametric models require a lot of data for reliable estimation results, and prior specification is nontrivial. The model considered in the present paper is more parsimonious. Nevertheless, it delivers consistent estimation of the first two conditional moments, conservative inference about the regression coefficients, and it is robust to misspecification of the regression error distribution. Thus, it can be thought of as a useful intermediate step between fully nonparametric and oversimplistic models.

Bayesian Markov chain Monte Carlo (MCMC) estimation algorithms for the normal regression with flexibly modeled variance have been developed in the literature; see, for example, Yau and Kohn (2003) and Chib and Greenberg (2013), who use transformed splines, or Goldberg et al. (1998), who use a transformed Gaussian process prior for modeling the variance. In those papers, the models with flexibly modeled variances were shown to perform well in simulation studies. Thus, the present paper considers only the theoretical properties of the model.

The rest of the paper is organized as follows. Section 2 describes the DGP. The model is described in Section 3. The Bernstein–von Mises theorem is presented in Section 4. The assumptions of the theorem are verified in Section 5 for priors based on truncated Gaussian processes and multivariate Bernstein polynomials. Section 6 concludes. Proofs are relegated to Section 7.

2. Data generating process

The data are assumed to include $n$ observations on a response variable and covariates $(Y^n, X^n) = (Y_1, \ldots, Y_n, X_1, \ldots, X_n)$, where $Y_i \in \mathcal{Y} \subset \mathbb{R}$ and $X_i \in \mathcal{X} \subset \mathbb{R}^d$, $i \in \{1, \ldots, n\}$. $\mathcal{X}$ is assumed to be a convex and bounded set with a nonempty interior. The observations are independently and identically distributed (iid), $(Y_i, X_i) \sim F_0$. The joint DGP distribution $F_0$ is assumed to have a conditional density $f_0(Y_i \mid X_i)$ with respect to (w.r.t.) the Lebesgue measure. The distribution of the infinite sequence of observations, $(Y^\infty, X^\infty)$, is denoted by $F_0^\infty$. Hereafter, the expectations $E(\cdot)$ and $E(\cdot \mid \cdot)$ are taken w.r.t. the DGP $F_0^\infty$.

Let us make the following assumptions about the data generating process. First, $E(Y_i \mid X_i) = X_i'\beta_0$.

[...]

[...] on a finite constant $B$ is specified below, and a prior distribution for $\sigma$ on $\mathcal{S}$. It is also assumed that $\sigma_0 \in \mathcal{S}$.

The prior on $\mathcal{S}$ will be assumed to put a sufficiently large probability on the following class of smooth functions
\[
\mathcal{S}_{M,\alpha} = \Big\{ \sigma : \mathcal{X} \to [\underline{\sigma}, \overline{\sigma}] :
\max_{k_1 + \cdots + k_d \le \underline{\alpha}} \, \sup_{x \in \mathcal{X}} |\partial^k \sigma(x)|
+ \max_{k_1 + \cdots + k_d = \underline{\alpha}} \, \sup_{x \ne z \in \mathcal{X}}
\frac{|\partial^k \sigma(x) - \partial^k \sigma(z)|}{\|x - z\|^{\alpha - \underline{\alpha}}} \le M \Big\},
\]
where $k = (k_1, \ldots, k_d)$ is a multi-index, $\partial^k = \partial^{k_1 + \cdots + k_d} / \partial x_1^{k_1} \cdots \partial x_d^{k_d}$ is a partial derivative operator, and $\underline{\alpha}$ is the greatest integer strictly smaller than $\alpha > 0$.

The distribution of covariates is assumed to be ancillary and it is not modeled. The likelihood function is given by
\[
p(Y^n \mid X^n, \beta, \sigma) = \prod_{i=1}^n p_{\beta,\sigma}(Y_i \mid X_i)
= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2(X_i)}} \exp\Big\{ -\frac{(Y_i - X_i'\beta)^2}{2\sigma^2(X_i)} \Big\}.
\]
For $A \in \mathcal{A}$, the posterior is given by
\[
\Pi(A \mid Y^n, X^n) = \frac{\int_A p(Y^n \mid X^n, \beta, \sigma)\, d\Pi(\beta, \sigma)}{\int_{\mathbb{R}^d \times \mathcal{S}} p(Y^n \mid X^n, \beta, \sigma)\, d\Pi(\beta, \sigma)}.
\]

In misspecified models, parameter values minimizing the KL distance between the model and the DGP are called pseudo true parameter values. It is well known that in models with finite dimensional parameters the maximum likelihood and Bayesian estimators are consistent for the pseudo true parameter values under weak regularity conditions (see Huber (1967), White (1982), and Gourieroux et al. (1984) for classical results and Geweke (2005) and Kleijn and van der Vaart (2012) for Bayesian results). Analogous results for misspecified infinite dimensional models are obtained in Kleijn and van der Vaart (2006). Thus, the following lemma suggests that in the regression model described above the posterior concentrates around $(\beta_0, \sigma_0)$ in large samples.

Lemma 1. Consider the DGP and the model described above. Suppose $E(|\log f_0(Y_i \mid X_i)|) < \infty$. Then,
\[
(\beta_0, \sigma_0) \in \operatorname*{argmin}_{\beta \in \mathbb{R}^d,\; \sigma : \mathcal{X} \to [\underline{\sigma}, \overline{\sigma}]}
E\Big( \log \frac{f_0(Y_i \mid X_i)}{p_{\beta,\sigma}(Y_i \mid X_i)} \Big).
\]
If $E(X_i X_i')$ is positive definite, then the minimizer is $F_0$ almost surely unique.

The lemma is proved in Section 7.

4. Semiparametric Bernstein–von Mises theorem
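
The claim of Lemma 1 can be illustrated numerically. The sketch below is not part of the paper: it simulates data whose errors are uniform rather than normal, and compares the sample average of the Gaussian negative log-likelihood at the data generating $(\beta_0, \sigma_0)$ with its value at two perturbed parameter choices. Minimizing $E(\log f_0 / p_{\beta,\sigma})$ is equivalent to minimizing $E(-\log p_{\beta,\sigma})$, because $E(\log f_0)$ does not depend on $(\beta, \sigma)$. The function names, the uniform error distribution, the single covariate on $[0, 1]$, and the particular values of `beta0` and `sigma0` are all assumptions chosen only for illustration.

```python
import math
import random


def gaussian_neg_loglik(y, x, beta, sigma):
    """Sample average of -log p_{beta,sigma}(Y_i | X_i) for the working model
    Y_i | X_i ~ N(X_i' beta, sigma(X_i)^2)."""
    total = 0.0
    for yi, xi in zip(y, x):
        mean = sum(b * v for b, v in zip(beta, xi))
        s2 = sigma(xi) ** 2
        total += 0.5 * math.log(2.0 * math.pi * s2) + (yi - mean) ** 2 / (2.0 * s2)
    return total / len(y)


def simulate(n, beta, sigma, rng):
    """Draw n iid pairs with E(Y|X) = X' beta and Var(Y|X) = sigma(X)^2,
    using non-normal (uniform) errors so the Gaussian model is misspecified."""
    half_width = math.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has mean 0, variance 1
    xs, ys = [], []
    for _ in range(n):
        xi = (1.0, rng.uniform(0.0, 1.0))  # intercept plus one bounded covariate
        eps = rng.uniform(-half_width, half_width)
        ys.append(sum(b * v for b, v in zip(beta, xi)) + sigma(xi) * eps)
        xs.append(xi)
    return ys, xs


def sigma0(xi):
    """Illustrative data generating conditional standard deviation."""
    return 0.5 + xi[1]


def sigma_flat(xi):
    """A misspecified alternative: constant standard deviation."""
    return 1.0


beta0 = (1.0, 2.0)            # illustrative data generating coefficients
rng = random.Random(0)        # fixed seed for reproducibility
y, x = simulate(20000, beta0, sigma0, rng)

nll_true = gaussian_neg_loglik(y, x, beta0, sigma0)
nll_wrong_beta = gaussian_neg_loglik(y, x, (1.0, 2.5), sigma0)
nll_wrong_sigma = gaussian_neg_loglik(y, x, beta0, sigma_flat)

print("at (beta0, sigma0):    ", nll_true)
print("perturbed beta:        ", nll_wrong_beta)
print("constant sigma:        ", nll_wrong_sigma)
```

In this setup the criterion should be smallest at the data generating values even though the errors are not normal, which is the sample analogue of the KL minimization in Lemma 1. A check over two perturbations is only suggestive; it does not verify the argmin over the whole parameter space.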