Bayesian Analysis of Logistic Regression with an Unknown Change Point

Gössl, Küchenhoff: Bayesian analysis of logistic regression with an unknown change point. Sonderforschungsbereich 386, Paper 148 (1999). Available online at: http://epub.ub.uni-muenchen.de/

Christoff Gössl, Helmut Küchenhoff
University of Munich, Institute of Statistics, Akademiestraße, München

Abstract

We discuss Bayesian estimation of a logistic regression model with an unknown threshold limiting value (TLV). In these models it is assumed that there is no effect of a covariate on the response under a certain unknown TLV. The estimation of these models, with a focus on the TLV, in a Bayesian context by Markov chain Monte Carlo (MCMC) methods is considered. We extend the model by accounting for measurement error in the covariate. The Bayesian solution is compared with the likelihood solution proposed by Küchenhoff and Carroll, using a data set concerning the relationship between dust concentration in the working place and the occurrence of chronic bronchitis.

Keywords: threshold limiting value (TLV), segmented regression, measurement error, MCMC

1 Introduction

In toxicology and in environmental and occupational epidemiology, the assessment of threshold limiting values (TLVs) is an important task. In a dose-response relationship the TLV is the dose of a toxin or other substance below which there is no influence on the response. In many applications the existence of such a TLV is controversial from a substantive point of view. In empirical studies, evidence for the existence of a TLV and its estimation are often difficult to obtain, since distinguishing between no effect and a small effect requires very large data sets. There are different models and methods for assessing a TLV, see e.g. Küchenhoff and Ulm. In this paper we concentrate on a fully parametric logistic regression model proposed by Ulm. In this model, which is a segmented regression model, the TLV is treated as an unknown parameter which can be estimated assuming
its existence. The interval estimates of the TLV give some evidence about its existence, since a TLV which is smaller than the smallest observed dose is equivalent to a nonexistent TLV. While the theoretical and practical problems in maximum likelihood estimation and the frequentist treatment of this model have been discussed by Küchenhoff and Wellisch, we use a Bayesian approach. In this context no differentiability assumptions are necessary, and the model can be implemented with Markov chain Monte Carlo (MCMC) methods.

We apply our methods to a study concerning the relationship between dust concentration in the working place and the occurrence of chronic bronchitis. In this study the exposure can only be measured with substantial measurement error. Therefore we also show how to incorporate this measurement error in our model. Since there are different approaches and possibilities concerning the MCMC algorithm and the assumption on the distribution of the regressor variable, we give a detailed discussion of the bronchitis example. The results are compared with those of a frequentist approach.

The paper is organized as follows. In Section 2 we present the model and a Bayesian solution to the problem of estimating the limiting value of a logistic threshold model. We propose a way to calculate the estimates by means of MCMC methods. The modeling and handling of measurement error in the dose covariate of our model is treated in Section 3. In Section 4 we apply our methods to analyze in detail an occupational study on the assessment of a TLV for dust concentration in the working place. Further, our methods are compared with the different approaches investigated by Küchenhoff and Carroll.

2 A Bayesian Approach to the Logistic Threshold Model

In the following we focus our analysis on the logistic threshold model proposed by Ulm:

\[ P(Y = 1 \mid X = x, Z = z) = G\bigl(z'\beta_{-k} + \beta_k (x - \tau)_+\bigr), \]

where $G(t) = (1 + \exp(-t))^{-1}$, $(x - \tau)_+ = \max(x - \tau, 0)$ and $\beta = (\beta_{-k}', \beta_k)' \in \mathbb{R}^k$. Here Y denotes the response variable, X is the dose variable, and Z refers to further covariates.
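As a minimal sketch of the threshold model above, the response probability can be written as a short NumPy function. The function name, argument names and the numerical values are illustrative assumptions and do not come from the paper; Z is taken to contain only an intercept column.

```python
import numpy as np

def threshold_logistic_prob(x, z, beta_rest, beta_k, tau):
    """P(Y = 1 | X = x, Z = z) under the logistic threshold model.

    x         : doses, shape (n,)
    z         : further covariates, shape (n, p)
    beta_rest : coefficients of z (beta_{-k}), shape (p,)
    beta_k    : slope of the dose above the threshold
    tau       : threshold limiting value (TLV)
    """
    x = np.asarray(x, dtype=float)
    z = np.atleast_2d(np.asarray(z, dtype=float))
    excess = np.maximum(x - tau, 0.0)                 # (x - tau)_+
    eta = z @ np.asarray(beta_rest, dtype=float) + beta_k * excess
    return 1.0 / (1.0 + np.exp(-eta))                 # G(t) = 1 / (1 + exp(-t))

# Illustrative values (assumptions): intercept-only Z, TLV at dose 1.0.
x = np.array([0.0, 0.5, 1.0, 2.0])
z = np.ones((4, 1))
beta_rest = np.array([-2.0])
beta_k = 1.5
tau = 1.0

probs = threshold_logistic_prob(x, z, beta_rest, beta_k, tau)
# The first three doses lie at or below tau, so their probabilities coincide;
# only the dose above tau raises the response probability.
print(probs)
```

The kink at $\tau$ is the reason the likelihood is not differentiable in $\tau$, which is the difficulty the Bayesian treatment avoids.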
The unknown model parameters are $\beta = (\beta_{-k}', \beta_k)'$ and the TLV $\tau$. As can be seen from the model equation, there is no influence of X on Y if X is smaller than $\tau$, which exactly reflects the concept of a TLV.

In contrast to classical frequentist inference, the parameters of a Bayesian model are not supposed to be fixed but random. For each of them there exists a probability distribution which reflects the prior knowledge of its value, the so-called prior. According to Bayes' theorem it is now possible to determine, in combination with the likelihood function of the data, a so-called posterior of the parameters. This posterior distribution includes all knowledge relating to the parameters, on the one hand from the prior and on the other hand from the likelihood. Bayes' theorem in its simplest form reads

\[ p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})}. \]

Here data denotes the observed data and $\theta$ the unknown parameters and latent variables. The numerator is the product of the likelihood and the priors. Note that, in contrast to likelihood analysis, no further assumptions on $p(\text{data} \mid \theta)$, like differentiability in $\theta$, are needed for the analysis. From this posterior the Bayesian point and interval estimates are derived. Depending on the loss function used, the median or the mean of the marginal densities is an appropriate estimate for the parameters. In this paper we use the mean of the posterior as point estimate, together with probability intervals, which can be regarded as Bayesian equivalents of the classical confidence sets.

Thus, to derive the Bayesian posterior for our logistic threshold model, we have to determine the conditional likelihood function and the prior distributions. The conditional likelihood of the i.i.d. sample $(y_i, z_i, x_i)$, $i = 1, \dots, n$, is, according to the model equation, given by

\[ [Y \mid Z, X, \beta, \tau] = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}, \]

where $\pi_i = G\bigl(z_i'\beta_{-k} + \beta_k (x_i - \tau)_+\bigr)$. As usual, $[\cdot]$ refers to the density or probability of the corresponding random variables. We assume that the threshold lies in the range of our observed data X and use a uniform prior on that range for the threshold $\tau$. For $\beta$ a flat prior is assumed. Now the posterior
density of the parameters is

\[ [\beta, \tau \mid Y, Z, X] = \frac{[Y \mid Z, X, \beta, \tau]\,[\beta]\,[\tau]}{\int [Y \mid Z, X, \beta, \tau]\,[\beta]\,[\tau]\, d(\beta, \tau)}. \]

Based on this density the above-mentioned estimates are to be calculated. Although the determination of the numerator is easy in most practical cases, it is not possible to evaluate the denominator analytically. For the threshold model we solve this problem by means of MCMC methods. These methods allow us to draw a sample from a density known only up to a normalizing constant, which in our particular problem is the denominator. On the basis of this sample from the posterior, the Bayesian estimates can then simply be calculated, see e.g. Gilks, Richardson and Spiegelhalter.

For the logistic threshold model we use a two-step Metropolis-Hastings (MH) algorithm with multivariate random walk proposals in each step. We sample the parameter $\beta_k$ and the threshold $\tau$ in step one and the parameter vector $\beta_{-k}$ in step two. The full conditionals are straightforward. Since these densities cannot be determined analytically, it is not possible to apply the Gibbs sampler. We take two steps because of the strong dependence between the threshold $\tau$ and $\beta_k$. The covariance matrices of the proposals are tuned in test runs to reach acceptance rates within a target range. The starting values are chosen overdispersed at random, and the burn-in phase has to be determined by comparing several runs and is then discarded. Due to high autocorrelation and slow convergence in the MH output, it is often necessary to thin out the simulated chain by taking only every $k$th observation into the sample. We choose $k$ such that the autocorrelation decreases to a sufficiently low level. The total length of the runs depends on the convergence of the Markov chains and can be determined by comparing the point estimates of several runs. Figure 1 shows a trajectory of such a run and the corresponding histogram with kernel density estimate for a simulated data set. For results on real data we refer to Section 4.

[Figure 1]

3 Errors in Variables

In many practical regression problems the regressors can only be
measured with measurement error. Here we propose a solution for incorporating measurement error of the variable X in our Bayesian model. A general introduction to the measurement error problem in threshold models is given by Küchenhoff and Carroll, see also Carroll, Ruppert and Stefanski. We assume an additive measurement error model, i.e. instead of X the variable $W = X + U$ is observed, where U is the measurement error, which is independent of $(Y, X, Z)$ and normally distributed with $E(U) = 0$ and $V(U) = \sigma_u^2$. Now we split our model into three parts:

main model: $[Y \mid X, Z, \theta_1]$
error model: $[W \mid X, \theta_2]$
covariable model: $[X \mid Z, \theta_3]$

where $\theta_1$, $\theta_2$ and $\theta_3$ are the model parameters. Using the independence assumption on U, the likelihood of the whole model can be written as

\[ [Y, W \mid Z, \theta] = \prod_{i=1}^{n} \int [y_i \mid x, z_i, \theta_1]\,[w_i \mid x, \theta_2]\,[x \mid z_i, \theta_3]\, dx, \]

where $\theta = (\theta_1, \theta_2, \theta_3)$. For our Bayesian approach the underlying main model is the threshold model of Section 2, as measurement error model we define $W \mid X \sim N(X, \sigma_u^2)$, and finally we assume X to be independent of Z and to have a normal distribution. This approach is extended in Section 4 to a finite mixture of normal distributions, which is a flexible model for X with few parameters.

With respect to the priors we suppose, as in Section 2, that no additional information is available and therefore again define noninformative priors for $\beta$ and $\tau$. For the parameters $\mu_x$ and $\sigma_x^2$ of the covariable model we assume, similar to the noninformative priors, a normal distribution with a given mean and a very large variance $s$ and a highly dispersed inverse gamma
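The three-part likelihood of this section can be illustrated numerically. The sketch below simulates data from the main, error and covariable models and evaluates the integral over the unobserved X by Gauss-Hermite quadrature. Quadrature is used here purely for illustration and is not taken from the text; all function names and parameter values are assumptions, Z contains only an intercept, and X is normal and independent of Z as assumed above.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def normal_pdf(v, mean, var):
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def integrated_likelihood(y, w, z, beta_rest, beta_k, tau,
                          mu_x, var_x, var_u, n_nodes=40):
    """Per-observation value of the integral of [y|x,z][w|x][x] over x."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = np.atleast_2d(np.asarray(z, dtype=float))
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Factor [w|x][x] = [w][x|w]; for normal X and U, X given W is normal.
    lam = var_x / (var_x + var_u)
    cond_mean = mu_x + lam * (w - mu_x)
    cond_var = lam * var_u
    # Quadrature points for X | W = w_i, one row per observation.
    x_nodes = cond_mean[:, None] + np.sqrt(2.0 * cond_var) * nodes[None, :]
    eta = (z @ beta_rest)[:, None] + beta_k * np.maximum(x_nodes - tau, 0.0)
    p1 = expit(eta)
    p_y = np.where(y[:, None] == 1, p1, 1.0 - p1)     # main model [y | x, z]
    inner = (p_y * weights[None, :]).sum(axis=1) / np.sqrt(np.pi)
    return normal_pdf(w, mu_x, var_x + var_u) * inner

# Illustrative parameter values (assumptions, not estimates from the paper).
beta_rest, beta_k, tau = np.array([-2.0]), 1.5, 1.0
mu_x, var_x, var_u = 1.0, 1.0, 0.25

rng = np.random.default_rng(0)
n = 200
z = np.ones((n, 1))
x = rng.normal(mu_x, np.sqrt(var_x), size=n)          # covariable model
w = x + rng.normal(0.0, np.sqrt(var_u), size=n)       # error model W = X + U
p = expit(z @ beta_rest + beta_k * np.maximum(x - tau, 0.0))
y = rng.binomial(1, p)                                # main model

lik = integrated_likelihood(y, w, z, beta_rest, beta_k, tau,
                            mu_x, var_x, var_u)
log_lik = np.log(lik).sum()
print(log_lik)
```

Summing the integrated likelihood over both values of y must return the marginal density of W, a normal distribution with mean $\mu_x$ and variance $\sigma_x^2 + \sigma_u^2$, which gives a simple consistency check for the factorization.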
