Interval Estimation for Messy Observational Data
Statistical Science 2009, Vol. 24, No. 3, 328–342. DOI: 10.1214/09-STS305. © Institute of Mathematical Statistics, 2009. arXiv:1010.0306v1 [stat.ME] 2 Oct 2010.

Paul Gustafson and Sander Greenland

Abstract. We review some aspects of Bayesian and frequentist interval estimation, focusing first on their relative strengths and weaknesses when used in "clean" or "textbook" contexts. We then turn attention to observational-data situations which are "messy," where modeling that acknowledges the limitations of study design and data collection leads to nonidentifiability. We argue, via a series of examples, that Bayesian interval estimation is an attractive way to proceed in this context even for frequentists, because it can be supplied with a diagnostic in the form of a calibration-sensitivity simulation analysis. We illustrate the basis for this approach in a series of theoretical considerations, simulations and an application to a study of silica exposure and lung cancer.

Key words and phrases: Bayesian analysis, bias, confounding, epidemiology, hierarchical prior, identifiability, interval coverage, observational studies.

Paul Gustafson is Professor, Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada; e-mail: [email protected]. Sander Greenland is Professor, Departments of Epidemiology and Statistics, University of California, Los Angeles, CA 90095-1772, USA; e-mail: [email protected].

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2009, Vol. 24, No. 3, 328–342. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

The conventional approach to observational-data analysis is to apply statistical methods that assume a designed experiment or survey has been conducted. In other words, they assume that all unmodeled sources of variation are randomized under the design. In most settings, deviations of the reality from this ideal are dealt with informally in post-analysis discussion of study problems. Unfortunately, such informal discussion seldom appreciates the potential size and interaction of sources of bias and, as a consequence, the conventional approach encourages far too much certainty in inference (Eddy, Hasselblad and Schachter, 1992; Greenland, 2005, 2009; Greenland and Lash, 2008; Molitor et al., 2009; Turner et al., 2009).

The entrenchment of the conventional approach derives in part from the fact that realistic models for observational studies are not identified by the data, a fact which renders conventional methods and software useless (except perhaps as part of a larger fitting cycle). The most commonly proposed mode of addressing this problem is sensitivity analysis, which, however, leads to problems of dimensionality and summarization. The latter problems have in turn been addressed by Bayesian and related informal simulation methods for examining nonidentified models (which are often dealt with under the topic of nonignorability). These methods include hierarchical (multilevel) modeling of biases (Greenland, 2003, 2005), which is intertwined with the theme of the present paper.

We start in Section 2 by reviewing some notions of interval estimator performance, with emphasis on coverage averaged over different parameter values. Section 3 then extends this discussion to include intervals arising from hierarchical Bayesian analysis when data from multiple studies are at hand. These two sections reframe existing theory and results in a manner suited for our present needs. We emphasize a well-known tradeoff: To the extent the selected prior distribution is biased relative to reality, the coverage of a Bayesian posterior interval will be off, but perhaps not by much; and in return the intervals can deliver substantial gains in precision and reduced false-discovery rates compared to frequentist confidence intervals. In addition, hierarchical priors provide a means to reduce prior misspecification as studies unfold.

In Section 4 we turn to the more novel aspect of our work, by studying the case which we believe better captures observational-study reality, in which priors are essential for identification. Here the usual order of robustness of frequentist vs. Bayesian procedures reverses: Confidence intervals become extreme posterior intervals, obtained under degenerate priors, with coverage that rapidly deteriorates as reality moves away from these point priors. In contrast, the general Bayesian framework with proper priors offers some protection against catastrophic undercoverage, with good coverage guaranteed under a spectrum of conditions specified by the investigator and transparent to the consumer.

Section 5 summarizes the lessons we take away from our observations and makes a recommendation concerning the practical assessment of interval estimator performance. We conclude that Bayesian interval estimation is an attractive way to proceed even for frequentists, because its relevant calibration properties can be checked in each application via simulation analysis. We close with an illustration of our proposed practical approach in an application to a study of silica exposure and lung cancer in which an unmeasured confounder (smoking) renders the target parameter nonidentified.

2. THE WELL-CALIBRATED LAB

Let θ denote the parameter vector, and D the observable data, for a study that is to be carried out. Assume for now that the distribution of (D|θ) (i.e., "the model") is known correctly. Say that φ = g(θ) is the scalar parameter of interest, and that I(D) is an interval estimator for this target. We define the labwise coverage (LWC) of I with respect to a parameter-generating distribution (PGD) P as

(1) C(I, P) = Pr{φ ∈ I(D)}.

Here the probability is taken with respect to the distribution of (θ, D) jointly, with θ ∼ P and (D|θ) following the model distribution.

Interval coverage with respect to a joint distribution on parameters and data, as in (1), has been considered by many authors, but not with a consistent terminology. While it might be tempting to refer to (1) as "Bayesian" coverage, we find this confusing since (1) can be evaluated for Bayesian or non-Bayesian interval estimators. We choose to call it labwise coverage since C(I, P) is the proportion of right answers reported by a lab or research team applying estimator I in a long series of studies of different phenomena (different exposure-disease relationships, say) within a research domain. The role of the PGD P is then to describe the corresponding across-phenomena variation in the underlying parameter values. Interest in labwise coverage might be very direct in some contexts, in that estimator operating characteristics in a long sequence of actual studies really are the primary consideration. Or interest may be more oblique, in that performance on the "next" study is of interest, and this performance is being measured conceptually by regarding the next study as a random draw from the population of "potential" or "future" studies.

If I is a frequentist confidence interval (abbreviated FCI), then it will attain nominal coverage exactly for any PGD. That is, if Pr{φ ∈ I(D)|θ} = 1 − α for every value of θ, then C(I, P) = 1 − α for any P. Thus, correct coverage for a hypothetical sequence of studies with the same parameter values implies correct coverage in the more realistic setting of repeatedly applying a procedure in a sequence of differing real problems. While this fact is often viewed as a robustness property of an FCI, Bayarri and Berger (2004), citing Neyman (1977), emphasize that it is the labwise coverage that is relevant for practice. Put another way, if a lab is well calibrated in the LWC sense of producing 95% intervals that capture the true parameter for 95% of studies, and the cost of failing to capture is the same across studies (as might be the case in some genome studies or screening projects), there is little obvious benefit if the intervals happen to also have correct frequentist coverage.

2.1 Bayesian Intervals under PGDs

For a given choice of prior distribution Π on the parameter vector θ, a 1 − α Bayesian posterior credible interval (BPCI) for the target parameter φ would be any interval having Bayesian probability 1 − α of containing φ given the observed data D. The most common choices of BPCI are the equal-tailed BPCI (i.e., the interval formed by the α/2 and 1 − α/2 posterior quantiles of the target parameter), and the highest-posterior-density (HPD) BPCI. Though HPD intervals are optimally short, we consider only equal-tailed intervals here, given their simple interpretation and widespread use.

If the prior Π and the PGD P coincide, then a BPCI is guaranteed to have correct labwise coverage. This strikes us as a fundamental property of BPCIs, though it is surprisingly unemphasized in most introductions to Bayesian techniques. Henceforth, we refer to a BPCI arising from a prior distribution set equal to the PGD as an omniscient or "oracular" BPCI (abbreviated OBPCI), in the sense that the investigator is omniscient in knowing the PGD.

2.2 Example: Mixture of Near-Null and Important Effects

Say that θ represents the strength of a putative exposure-disease relationship (which may indeed be one of a sequence of such exposure-disease combinations to be investigated). For instance, θ might be a risk difference or a log odds-ratio relating binary exposure and disease variables. Suppose that D is a univariate sufficient statistic such that D|θ ∼ N(θ, σ²) where σ² is known. Then (D ± qα/2 σ) can be reported as a 100 × (1 − α)% frequentist confidence interval (FCI) for θ, where qα/2 is the 1 − α/2 standard normal quantile.
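Because this normal setup gives both intervals in closed form, the labwise-coverage claims of Section 2 can be checked directly by simulation. The sketch below is ours, not code from the paper: for tractability it uses a single-normal PGD rather than the near-null/important mixture of this example, and the values μ0 = 0, τ = 0.5, σ = 1 and the number of studies are illustrative assumptions.

```python
import numpy as np

# Simulation of labwise coverage in the normal model D | theta ~ N(theta, sigma^2),
# sigma known. The PGD here is a single normal, theta ~ N(mu0, tau^2); all
# numeric settings are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(2009)
n_studies = 100_000   # length of the lab's series of studies
sigma = 1.0           # known sampling SD of D given theta
mu0, tau = 0.0, 0.5   # assumed PGD: theta ~ N(mu0, tau^2)
z = 1.96              # q_{alpha/2} for alpha = 0.05

# Each study: draw theta from the PGD, then data D from the model.
theta = rng.normal(mu0, tau, size=n_studies)
D = rng.normal(theta, sigma)

# FCI: D +/- z*sigma. Coverage is 1 - alpha conditionally on every theta,
# so its labwise coverage C(I, P) is 1 - alpha for ANY PGD P.
fci_cover = np.mean(np.abs(D - theta) <= z * sigma)

# Omniscient BPCI: prior set equal to the PGD. A normal prior with a normal
# model gives a N(m, v) posterior with precision-weighted mean, so the
# equal-tailed 95% BPCI is m +/- z*sqrt(v).
v = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
m = v * (mu0 / tau**2 + D / sigma**2)
bpci_cover = np.mean(np.abs(m - theta) <= z * np.sqrt(v))

width_ratio = np.sqrt(v) / sigma  # BPCI half-width relative to FCI half-width
print(fci_cover, bpci_cover, width_ratio)
```

Both empirical coverages land near the nominal 0.95, while the omniscient BPCI is markedly shorter (width ratio √v/σ ≈ 0.45 under these assumptions), illustrating the precision gain available when the prior matches the PGD.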