
Case-Control Studies, Inference in

Gary King, Harvard University, Cambridge, Massachusetts, U.S.A.

Langche Zeng, George Washington University, Washington, District of Columbia, U.S.A.

(This entry is a somewhat revised and extended version of Gary King and Langche Zeng, 2002, "Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies," Statistics in Medicine, 21: 1409-1427.)

Encyclopedia of Biopharmaceutical Statistics, DOI: 10.1081/E-EBS 120023390. Copyright © 2004 by Marcel Dekker, Inc. All rights reserved.

INTRODUCTION

Classic (or "cumulative") case-control sampling designs, used in many areas of medical research, do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, number needed to treat, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort, such as the number of controls in each full risk set, is available. Most scholars who have considered the issue recommend reporting more than just risk and rate ratios, but the auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of, or only partially knowledgeable about, relevant auxiliary population information.

OVERVIEW

Moynihan et al.[1] express the conclusions of nearly all who have written about the standards of statistical reporting for academic and general audiences:

   In general, giving only the absolute or only the relative benefits does not tell the full story; it is more informative if both researchers and the media make data available in both absolute and relative terms. For individual decisions...consumers need information to weigh the probability of benefit and harm; in such cases it [also] seems desirable for media stories to include actual event [probabilities] with and without treatment.

Unfortunately, existing methods make this consensus methodological advice impossible to follow in the most commonly used research design in medical research: case-control studies. In practice, medical researchers have historically used classic (i.e., "cumulative") case-control designs along with the rare events assumption (i.e., that the exposure and nonexposure incidence fractions approach zero) to estimate risk ratios, or they abandon these quantities of interest altogether and merely report odds ratios. Although virtually no one supports the publication of odds ratios alone, this remains the dominant practice in the field. In recent years, medical researchers have been switching to density sampling, which requires no such assumption for estimating rate ratios. (We discuss the failure-time matched version of density sampling, called risk-set sampling by biostatisticians.) Unfortunately, researchers rarely have the population information needed by existing methods to estimate almost any other quantity of interest, such as absolute risks and rates, risk and rate differences, attributable fractions, or numbers needed to treat. We provide a way out of this situation by developing methods of estimating all relevant quantities of interest under classic and density case-control sampling designs when completely ignorant of, or only partially knowledgeable about, the relevant population information.

We begin with theoretical work on cumulative case-control sampling by Manski,[2,3] who shows that informative bounds on the risk ratio and difference are identified for this sampling design even when no auxiliary population information is available. We build on these results and improve them in several ways to make them more useful in practice. First, we provide a substantial simplification of Manski's risk difference bounds, which also makes estimation feasible. Second, we show how to provide meaningful bounds for a variety of quantities of interest in situations of partial ignorance. Third, we provide confidence intervals for all quantities and a "robust Bayesian" interpretation of our methods that works even for researchers who are completely ignorant of prior information. Fourth, through the reanalysis of the hypothetical example from Manski's work and a replication and extension of an epidemiological study of

bacterial pneumonia in individuals infected with human immunodeficiency virus (HIV), we demonstrate that adding information in the way we suggest is quite powerful, as it can substantially narrow the bounds on the quantities of interest. Fifth, we extend our methods to the density case-control sampling design and provide informative bounds for all quantities of interest when auxiliary information on the population is not available or is only partially available. Finally, we suggest new reporting standards for applied research and offer software in Stata and in Gauss that implements the methods developed in this paper (available at http://GKing.Harvard.edu).

QUANTITIES OF INTEREST

For subject i (i = 1,...,n), define the outcome variable Y_{i,(t,t+Δt)} as 1 when one or more "events" (such as disease incidence) occur in the interval (t, t+Δt) (for Δt > 0) and 0 otherwise. The variable t usually indexes time but can denote any continuous variable. In etiological studies, we shall be interested in Y_it = lim_{Δt→0} Y_{i,(t,t+Δt)}. In other studies, such as of perinatal epidemiology, of conditions with brief risk periods such as acute intoxication, and of some prevalence data, scholars only measure, or only can measure, Y_{i,(t,t+Δt)}, which we refer to as Y_i because the observation period in these studies is usually the same for all i.[4] Define a k-vector of covariates and a constant term as X_i. In addition, let X_0 and X_ℓ each denote k-vectors of possibly hypothetical values of the explanatory variables (often chosen so that the treatment variable changes and the others remain constant at their means).

Quantities of interest that are generally a function of t include the rate (or "hazard rate" or "instantaneous rate"),

   λ_i(t) = lim_{Δt→0} Pr(Y_{i,(t,t+Δt)} = 1 | Y_{is} = 0, for all s ≤ t) / Δt   (1)

the rate ratio, rr_t = λ_ℓ(t)/λ_0(t), and the rate difference, rd_t = λ_ℓ(t) − λ_0(t). Quantities based on the risk, p = Pr(Y = 1 | X), include the risk ratio, RR = Pr(Y = 1 | X_ℓ)/Pr(Y = 1 | X_0), and the risk difference (or "first difference," as it is called in political science, or the "attributable risk," as economists and some epidemiologists call it), RD = Pr(Y = 1 | X_ℓ) − Pr(Y = 1 | X_0). The probability is evaluated at some values of the explanatory variables, such as X_0 or X_ℓ. The quantity RD is the increase in probability, and RR is the factor by which the probability increases [an (RR − 1) × 100 percent increase] when the explanatory variables change from X_0 to X_ℓ.

The quantities λ(t), rr_t, rd_t, p, RR, and RD are normally used to study incidence in etiological analyses, but they are sometimes used to study prevalence by changing the definition of an "event." Functions of these quantities, such as the proportionate increase in the risk or rate difference, the attributable fraction, or the expected number of people needed to treat to prevent one adverse event (or, when the "treatment" has an adverse effect, the expected number of people needed to harm to result in one adverse event), can also be computed with the methods described below. For example, to compute the number needed to treat (NNT), we merely calculate

   NNT = 1/|RD|   (2)
       = 1/|Pr(Y = 1 | X_ℓ) − Pr(Y = 1 | X_0)|   (3)

That is, we simply use the reciprocal of the size of the risk difference between the "treatment" and "control" groups. The number needed to harm (NNH) is obtained in the same way.
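As a minimal Python sketch of these definitions (the two risk values below are hypothetical, not estimates from any study):

```python
# Minimal sketch: RR, RD, and NNT (Eqs. 2-3) from two hypothetical risks,
# Pr(Y=1|X_l) for the "treatment" group and Pr(Y=1|X_0) for the "control" group.

def quantities_of_interest(p_treat, p_control):
    """Return the risk ratio, risk difference, and number needed to treat."""
    rr = p_treat / p_control      # factor by which the probability increases
    rd = p_treat - p_control      # increase in probability
    nnt = 1.0 / abs(rd)           # Eq. 2: reciprocal of the size of RD
    return rr, rd, nnt

rr, rd, nnt = quantities_of_interest(0.15, 0.10)   # hypothetical values
```

With these inputs the "treatment" raises the risk by a factor of 1.5, i.e., a 50 percent increase, and 20 people would need to be treated (here, harmed) per additional event.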

The other quantity frequently discussed and presented in the epidemiological literature is the odds ratio:

   OR = [Pr(Y = 1 | X_ℓ)/Pr(Y = 0 | X_ℓ)] / [Pr(Y = 1 | X_0)/Pr(Y = 0 | X_0)]
      = [Pr(X_ℓ | Y = 1) Pr(X_0 | Y = 0)] / [Pr(X_0 | Y = 1) Pr(X_ℓ | Y = 0)]   (4)

Many researchers seem more comfortable with OR than with the other quantities defined above; we have found no author who claims to be more comfortable communicating with the general public using an odds ratio than one of the other quantities.[5] The odds ratio has been used "largely because it serves as a link between results obtainable from follow-up studies and those obtainable from case-control studies."[6] We provide this link with other quantities so that OR is no longer the only choice available.

Concluding a controversy on this subject in the British Medical Journal, Davies et al.[7] write, "On one thing we are in clear agreement: odds ratios can lead to confusion and alternative measures should be used when these are available." We show here how to make all relevant alternative measures always available, even in case-control data. With the results in this article, scholars should never again feel forced into presenting odds ratios.[b]

[b] Unfortunately, the term "relative risk" no longer seems useful because it has been co-opted to denote such diverse quantities as rr, RR, and OR.

CASE-CONTROL SAMPLING DESIGNS

We now introduce the two key case-control sampling designs. With appropriate modifications, most of the methods we introduce also work with many variants of them.

Classic (or cumulative) case-control designs involve sampling (usually all) "cases" (subjects for which Y_i = 1) and a random sample of "controls" (subjects for which Y_i = 0) at the end of the study period. This design is most appropriate in studies of closed populations and when Y_i but not Y_it is observed. Statistical models for such data specify the risk Pr(Y_i = 1 | X_i) as a function of the input X_i. The most commonly used model is the logistic:

   Pr(Y_i = 1 | X_i) = 1 / (1 + e^(−X_i β))   (5)

Some examples in medical research using the cumulative case-control design are studies of congenital malformation, analyses of "chronic conditions with ill-defined onset times and limited effects on mortality, such as obesity and multiple sclerosis, and studies of health services utilization,"[8] and research that has as its goal descriptive, rather than causal, inferences (such as ascertaining the types of people who now have lung cancer so we can better prepare health-care facilities in anticipation).

The newer density case-control sampling design involves, in the study of cohort data organized by risk sets, sampling "cases" (subjects for which Y_it = 1) at their failure times and a subset of "controls" (subjects for which Y_it = 0) from all individuals at risk at the time of each failure, possibly matched on a set of other control variables. So each sampled risk set R_j (j = 1,...,M, where M is the total number of cases in the data) is composed of one case (or more in the case of timing ties in their occurrence) and a random sample of controls at the same time (or other continuous index) t. A subject may appear in multiple risk sets. This administratively convenient data collection strategy can result in a substantial reduction in the resources required for the study and also has the advantage of controlling nonparametrically (i.e., without functional form assumptions) for all omitted variables, and unmeasured heterogeneity, related to t. Statistical models for such data specify the incidence (hazard) rate as a function of t and X. The most commonly used model is the Cox proportional hazard:

   λ_i(t) = λ(t) r(X_i, β)   (6)

where, in most applications, r(X_i, β) = e^(X_i β) and the baseline hazard λ(t) does not vary over subjects.[c]

[c] The proportionality assumption holds when X_i is time-invariant. When this is not appropriate, a model that allows time-dependent covariates (such as the so-called Cox regression model) can be used.

INFERENCE WITHOUT POPULATION INFORMATION

Classic Case-Control

Without additional population information or assumptions, the literature provides no method for estimating any quantity of interest from classic case-control data. Historically, medical researchers have used the rare events assumption to estimate the risk ratio by the odds ratio. Denote the population fraction of incident cases by τ, which is the key piece of auxiliary information about the population not reflected in the sample [although other quantities, such as P(X), the possibly multivariate density of X, are also sufficient]. The rare events assumption states that τ is arbitrarily small [while P(X) stays bounded away from zero, or instead that Pr(Y_i = 1 | X_j) → 0 for j = 0, ℓ]. This assumption is not merely that cases are "rare," but that they occur, at the limit, with zero probability.

The advantage of this assumption is that, when correct, OR is a good approximation to the risk ratio: lim_{τ→0} OR = RR. This limit result is attractive because OR is often easy to estimate in case-control designs. For example, if X is a single binary variable, OR could be estimated by replacing elements in the second line of Eq. 4 with their sample analogs [e.g., Pr(X_ℓ | Y = 1) can be estimated by the fraction of cases for which X_i = X_ℓ among all observations where Y_i = 1].
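As a minimal Python illustration of this sample-analog calculation (the 2x2 counts are hypothetical), the second line of Eq. 4 reduces to the familiar cross-product ratio:

```python
# Sketch: estimating OR from a 2x2 case-control table with hypothetical counts.
# a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.

def odds_ratio(a, b, c, d):
    # Second line of Eq. 4 with sample analogs:
    # [Pr(X_l|Y=1) Pr(X_0|Y=0)] / [Pr(X_0|Y=1) Pr(X_l|Y=0)]
    pr_xl_case, pr_x0_case = a / (a + b), b / (a + b)
    pr_xl_ctrl, pr_x0_ctrl = c / (c + d), d / (c + d)
    return (pr_xl_case * pr_x0_ctrl) / (pr_x0_case * pr_xl_ctrl)

or_hat = odds_ratio(30, 70, 20, 80)   # algebraically equal to (30*80)/(70*20)
```

Under the rare events assumption, or_hat approximates RR; when τ is not near zero, it overstates effects above 1.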

For another example, in logistic regression (Eq. 5), case-control sampling biases only the intercept term, which drops out in the odds ratio expression, OR = e^((X_ℓ − X_0)β). The coefficients of the control variables also drop out (because the elements of X_ℓ − X_0 corresponding to those parameters are zero).

As is well known (e.g., Ref. [8]), the rare events assumption is problematic when τ is not nearly zero, and as a result, the odds ratio overestimates RR (when both are above 1; OR underestimates RR otherwise). In addition, this assumption implies, implausibly for most applications, that the risk difference tends to zero (lim_{τ→0} RD = 0), no matter how strong the real effect. In practice, RD is therefore not estimated, even when it is of more interest than RR. When τ is very small but not zero, the bias introduced by using OR as if it were RR can be ignored without practical consequence. The rare events assumption is inappropriate in etiology studies with more commonly occurring events, such as in highly infectious diseases. The assumption is also often not plausible when studying diseases with nonabsorbing states or when studying prevalence.

Density Case-Control

In the commonly used proportional hazard model (Eq. 6), the coefficients β can be consistently estimated without any auxiliary information on the population risk sets. The contribution of each sampled risk set to the likelihood function is the ex ante incidence probability for the individual who will actually acquire the disease, conditional on the total number of cases in the set being one:

   Pr(Y_{it_j} = 1 | R_j) = Pr(Y_{it_j} = 1) / Σ_{k∈R_j} Pr(Y_{kt_j} = 1) = e^(X_i β) / Σ_{k∈R_j} e^(X_k β)   (7)

where i indexes the case in the risk set and the summation in each denominator is over all observations in the risk set, k = 1,...,n_j, where n_j is the number of observations, or the size, of R_j. The likelihood function then is the product of terms such as Eq. 7 over all M sampled risk sets (see, e.g., Ref. [9]). When a risk set includes multiple cases because of timing ties, the conditional probability expression is more complicated, but the approach remains the same.

This same estimator was independently proposed in econometrics by Chamberlain[10] to consistently estimate the parameters in a fixed-effect logit model when the number of subjects per group remains fixed even as the total number of subjects grows. Note also that Eq. 7 takes the same form as the probability expressions in the popular multinomial logit model (or McFadden's choice model, as some call it), and indeed the same software can be used on the data organized in risk sets to estimate β. Eq. 7 also takes the same form as the conditional probability expression for matched cohort data (for which Greenland[11] gives an alternative modeling strategy for estimating the risk ratio). Finally, Eq. 7 takes the same form as the likelihood function contribution of a full risk set in a Cox regression model for full cohort data, only that now the sampled risk sets are used. Fortunately, this does not bias the first-order condition and preserves the consistency and asymptotic normality of the maximum likelihood estimator (see Refs. [12-14]). Moreover, values of n_j of only about 6 can result in nearly full efficiency.[12,15] For a discussion of the selection of controls, see Refs. [16,17], and see Ref. [18] for a discussion of alternative sampling designs.

With β consistently estimated by the conditional logit approach, we can compute the rate ratio using the incidence model (Eq. 6), although the baseline hazard rate λ(t) is not estimated:

   rr = λ_ℓ(t)/λ_0(t) = λ(t)e^(X_ℓ β) / [λ(t)e^(X_0 β)] = e^(X_ℓ β) / e^(X_0 β) = e^((X_ℓ − X_0)β)   (8)

Thus a key advantage of density designs over classic case-control designs is that we can estimate rr without any additional (rare events or other) assumptions or any population information. However, without auxiliary information on the population, the literature provides no method for estimating any other quantity of interest, such as the rate, rate difference, risk, risk difference, risk ratio, or number needed to treat.
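A minimal Python sketch of Eqs. 7 and 8 follows; the coefficient vector and covariates are hypothetical, and this is an illustration of the formulas, not the authors' Stata/Gauss software:

```python
import math

# Sketch of Eq. 7 (the conditional probability for one sampled risk set)
# and Eq. 8 (the rate ratio). Coefficients b and covariates are hypothetical.

def risk_set_probability(x_case, x_controls, b):
    """Eq. 7: Pr(case i fails | R_j) = e^(X_i b) / sum over k in R_j of e^(X_k b)."""
    score = lambda x: math.exp(sum(xi * bi for xi, bi in zip(x, b)))
    return score(x_case) / (score(x_case) + sum(score(x) for x in x_controls))

def rate_ratio(x1, x0, b):
    """Eq. 8: rr = e^((X_l - X_0) b); the baseline hazard lambda(t) cancels."""
    return math.exp(sum((a - c) * bi for a, c, bi in zip(x1, x0, b)))

b = [0.5, -1.0]                                    # hypothetical estimates
p_case = risk_set_probability([1, 0], [[0, 0], [0, 1]], b)
rr = rate_ratio([1, 0], [0, 0], b)                 # here e^0.5
```

The conditional-logit likelihood is the product of such risk-set probabilities over the M sampled sets; note that rr depends only on b, so no population information is needed.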

INFERENCE UNDER FULL POPULATION INFORMATION

We now discuss the quantities of interest that can be estimated under each sampling design when the required auxiliary population information is available.

Classic Case-Control

In the classic case-control design, the population fraction of cases, τ, can supply the required auxiliary information.[19-21] For example, in the simple case of a single binary explanatory variable, the risk can be estimated as

   p = Pr(Y = 1 | X, τ)
     = P(X | Y = 1) Pr(Y = 1) / P(X)
     = P(X | Y = 1)τ / [P(X | Y = 1)τ + P(X | Y = 0)(1 − τ)]   (9)

where P(X | Y = 1) and P(X | Y = 0) are replaced with the sample mean of X among subjects where Y = 1 and Y = 0, respectively. From this expression, we can compute RR = Pr(Y = 1 | X_ℓ, τ)/Pr(Y = 1 | X_0, τ), RD = Pr(Y = 1 | X_ℓ, τ) − Pr(Y = 1 | X_0, τ), or NNT = 1/|RD|.

For another example, the commonly used logistic regression model (Eq. 5) on case-control data yields consistent maximum likelihood estimates of the slope coefficients; only the estimated intercept b_0 requires a correction, such that the quantity estimated should instead be

   b_0 − ln[((1 − τ)/τ)(ȳ/(1 − ȳ))]   (10)

where ȳ is the mean of Y, or the sampling probability of Y = 1. Thus, with exact knowledge of τ, all logit parameters can be consistently estimated (see Refs. [22-26]; see Ref. [19] for a comprehensive review). Let b_τ denote the logit coefficient vector with the first element corrected as in Eq. 10. Then the quantities of interest are

   p = Pr(Y = 1 | X, τ) = [1 + e^(−X b_τ)]^(−1),

   RR = [1 + e^(−X_ℓ b_τ)]^(−1) / [1 + e^(−X_0 b_τ)]^(−1),

   RD = [1 + e^(−X_ℓ b_τ)]^(−1) − [1 + e^(−X_0 b_τ)]^(−1), and

   NNT = 1/|RD|

For estimation, we can plug in the maximum likelihood estimates for β or use improved methods with lower mean square error in the presence of rare events.[26] Standard errors for any of these quantities can be computed easily by simulation.[27]
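The intercept correction and the resulting quantities can be sketched as follows; the values of b_0, the slope, τ, and ȳ below are hypothetical, not estimates from real data:

```python
import math

# Sketch of the prior correction (Eq. 10) and the corrected-logit quantities.
# b0, slope, tau, and ybar are hypothetical placeholders.

def correct_intercept(b0, tau, ybar):
    """Eq. 10: b0 - ln[((1 - tau)/tau) * (ybar/(1 - ybar))]."""
    return b0 - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def p_logit(x, b0_tau, slope):
    """p = [1 + e^(-X b_tau)]^(-1), here with a single binary covariate x."""
    return 1.0 / (1.0 + math.exp(-(b0_tau + slope * x)))

b0_tau = correct_intercept(b0=-0.2, tau=0.02, ybar=0.5)
p1, p0 = p_logit(1, b0_tau, slope=0.8), p_logit(0, b0_tau, slope=0.8)
rr, rd, nnt = p1 / p0, p1 - p0, 1 / abs(p1 - p0)
```

Note that when τ = ȳ (as in a simple random sample), the correction term vanishes and the fitted intercept is used unchanged.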

The correction in Eq. 10 applies not only to logit models, but also to any multiplicative intercept model, even including many neural networks (such as the feed-forward perceptron with a logit output function[28,29]). Statistical research in medicine still predominantly uses the logit specification for parametric analysis, whereas numerous other scholarly fields have been steadily switching to more flexible functional forms such as neural networks. In other fields, scholars have recognized that the logit model makes specific assumptions that are normally without substantive justification and, indeed, in the vast majority of research in this field are not even addressed explicitly. This is acknowledged in medical research in its use of matching models, but not when it comes to parametric approaches. Unfortunately, when using highly restrictive parametric forms such as logit, our empirical answers are functions of our theoretical specification rather than our data. By recognizing that the correction above applies to a much broader class of models, and makes considerably less stringent assumptions, researchers using case-control designs will have many new and far more valid parametric approaches to choose from.

Density Case-Control

To estimate quantities of interest other than rr under density case-control sampling, we require information that enables the estimation of the baseline hazard rate. We employ the baseline hazard rate estimator proposed in Ref. [30], where the key population information needed is τ_j = n_j/N_j, the sampling fraction for each observed risk set R_j (j = 1,...,M). With this information, and denoting the maximum likelihood estimate of β from the conditional logit procedure as b, the baseline incidence rate λ at time t_j is estimated as[d]

   λ(t_j) = 1 / Σ_{k∈R_j} (e^(X_k b))(1/τ_j) = 1 / Σ_{k∈R_j} e^(X_k b − ln(τ_j))   (11)

and with this, we can estimate the rate, λ_i(t), from Eq. 6:

   λ_i(t_j) = e^(X_i b) / Σ_{k∈R_j} e^(X_k b − ln(τ_j))   (12)

With the rate, we can further estimate the cumulative rate,

   H(T_i, X_i) = Σ_{t_j∈T_i} λ_i(t_j) = Σ_{t_j∈T_i} [e^(X_i b) / Σ_{k∈R_j} e^(X_k b − ln(τ_j))]   (13)

and hence the risk,

   p_i = Pr(Y = 1 | X_i) = 1 − e^(−H(T_i, X_i)) = 1 − exp(−Σ_{t_j∈T_i} [e^(X_i b) / Σ_{k∈R_j} e^(X_k b − ln(τ_j))])   (14)

and any of the other quantities of interest. The factor 1/τ_j serves as a weighting factor that enables us to weight up each sampled risk set to the full risk set size.

[d] For simplicity, we use the same notation for the estimated baseline hazard as for the theoretical version. We do the same for the other quantities below as well.
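A minimal Python sketch of Eqs. 12-14 follows; the risk sets, coefficients, and sampling fractions are hypothetical, and the 1/τ_j factor appears as the −ln(τ_j) term weighting each sampled set up to its full size:

```python
import math

# Sketch of Eqs. 12-14: rate, cumulative rate, and risk from sampled risk sets.
# Risk sets, coefficients b, and sampling fractions tau_j are hypothetical.

def xb(x, b):
    return sum(xk * bk for xk, bk in zip(x, b))

def rate(x_i, risk_set, b, tau_j):
    """Eq. 12: lambda_i(t_j) = e^(X_i b) / sum_k e^(X_k b - ln(tau_j))."""
    denom = sum(math.exp(xb(x, b) - math.log(tau_j)) for x in risk_set)
    return math.exp(xb(x_i, b)) / denom

def risk(x_i, risk_sets, b, taus):
    """Eqs. 13-14: H = sum of rates over T_i, then p_i = 1 - e^(-H)."""
    H = sum(rate(x_i, rs, b, tj) for rs, tj in zip(risk_sets, taus))
    return 1.0 - math.exp(-H)
```

For instance, with a single sampled risk set of two members, a zero coefficient, and τ_j = 0.5, each member's term is doubled, giving a rate of 1/4 and a risk of 1 − e^(−1/4).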

INFERENCE WITH BAYESIAN PRIOR ASSUMPTIONS

"Inference Without Population Information" and "Inference Under Full Population Information" discuss inference under conditions of ignorance and full knowledge of the relevant auxiliary population information (τ and τ_j). When this information is not known exactly but some prior information exists, Bayesian methods are appropriate and straightforward (although we are not aware of their use in applications in this context). Ideally, this prior information would come from registry or survey data from the target population, but similar information from closely related populations would also help form reasonable priors. The procedure then is simply to put a prior distribution on τ (in classic case-control designs) or τ_j (in density case-control designs) and draw inferences about the quantity of interest from the posterior. These inferences have all the desirable properties of Bayesian estimates: they are consistent, efficient, asymptotically normal, etc., when averaging over the prior. We strongly support their use when prior information is available and known to be valid.

Bayesian estimates also share at least two disadvantages. First, they are not calculable when the analyst is completely ignorant of τ or τ_j in some or all parts of its range, because the required prior density cannot be fully specified, and, in Bayesian inference, one must have a full prior. Second, Bayesian estimates yield biased inferences when the prior information is biased. This second problem is especially severe in case-control studies because the data contain no information about τ and τ_j, and so the prior does not become dominated by the likelihood as the sample size grows. That is, even for very large samples, inferences from case-control data depend heavily on the prior.

These disadvantages sometimes combine in unfortunate ways when analysts are unsure of the validity of their prior information. In these cases, the classic Bayesian paradigm may, in practice, have the effect of encouraging researchers to guess values for their prior, hence introducing biased information into their analyses and adversely affecting their inferences. "Diffuse" priors with large variances are also no solution here because, in general, increasing the variance of τ or τ_j will also make the mean tend toward 0.5, which of course is a statement of knowledge, not ignorance. We therefore pursue "robust Bayesian" methods below that allow full or partial ignorance to be represented accurately without biasing inferences.

INFERENCES WITHOUT FULL POPULATION INFORMATION OR ASSUMPTIONS: CLASSIC CASE-CONTROL

We now discuss inferences under classic case-control designs about p, RR, and RD without the rare events assumption or population information about τ. We begin by extending Manski's results in the situation of pure ignorance, and then add methods for partial ignorance, confidence intervals, and our preferred robust Bayesian interpretation. We conclude the section with two examples.

Extending Manski's "Ignorance" Results

As an alternative to full knowledge of τ and the rare events assumption, Manski[2,3] studied what inferences we could make when the researcher has no knowledge of τ. It is widely known that under these circumstances, no information about the risk is available [i.e., p ∈ (0,1)] and RR is bounded between 1 and OR. That is, RR ∈ [min(1, OR), max(1, OR)], apart from sampling error. That this expression depends on the odds ratio is useful because of how often it is easily estimable.

Manski's advance is that he also shows that RD can be bounded, although it requires the following complicated expression. Let

   f = {[Pr(X_0 | Y = 1) Pr(X_0 | Y = 0)] / [Pr(X_ℓ | Y = 1) Pr(X_ℓ | Y = 0)]}^(1/2)   (15)

and

   g = a / (a − [f Pr(X_ℓ | Y = 1) − Pr(X_0 | Y = 1)])   (16)

where a = f Pr(X_ℓ | Y = 0) − Pr(X_0 | Y = 0), and

   RD_g = Pr(X_ℓ | Y = 1)g / [Pr(X_ℓ | Y = 1)g + Pr(X_ℓ | Y = 0)(1 − g)]
        − Pr(X_0 | Y = 1)g / [Pr(X_0 | Y = 1)g + Pr(X_0 | Y = 0)(1 − g)]   (17)

Then RD is bounded between 0 and RD_g, apart from sampling error, where RD_g is the value of the risk difference if τ were equal to g.

Manski's expression for the risk difference is useful, but only when it can be estimated. Unfortunately, except for very simple cases, sophisticated nonparametric methods are required to estimate each of the component probabilities in Eqs. 15-17, and f, g, and RD_g are not easy to estimate directly; to our knowledge, they have never been estimated in a real application. We remedy this situation by showing, in Appendix A, that these equations can be simplified so that RD_g = (√OR − 1)/(√OR + 1) (which, surprisingly, is exactly Yule's[31] "coefficient of colligation," sometimes called Yule's Y). The bounds are thus a simple function of the odds ratio:

   RD ∈ [min(0, (√OR − 1)/(√OR + 1)), max(0, (√OR − 1)/(√OR + 1))]   (18)

The advantages of our expression in Eq. 18 are not only algebraic simplicity and the familiarity of the odds ratio among applied researchers, but also that OR can be estimated very easily without nonparametric methods in simple discrete cases, in logistic regression, and in a wide variety of multiplicative intercept models such as neural network models. Because these neural network models have arbitrary approximation capabilities, Eq. 18 can effectively always be applied.
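A minimal Python sketch of the pure-ignorance bounds, taking an estimated odds ratio as input (the OR below uses the probabilities from Manski's smoking example, discussed later in this entry):

```python
import math

# Sketch: bounds on RR and RD under complete ignorance of tau, from an
# estimated odds ratio. The RD bound (Eq. 18) uses Yule's coefficient of
# colligation, RD_g = (sqrt(OR) - 1) / (sqrt(OR) + 1).

def ignorance_bounds(or_hat):
    rd_g = (math.sqrt(or_hat) - 1.0) / (math.sqrt(or_hat) + 1.0)
    rr_bounds = (min(1.0, or_hat), max(1.0, or_hat))
    rd_bounds = (min(0.0, rd_g), max(0.0, rd_g))
    return rr_bounds, rd_bounds

or_hat = (0.6 * 0.51) / (0.4 * 0.49)   # odds ratio implied by the example
rr_b, rd_b = ignorance_bounds(or_hat)
```

However strong the estimated OR, these bounds always include 1 for RR and 0 for RD, which motivates the partial-information approach below.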

A Proposed "Available Information" Assumption

Applied researchers have been reluctant to adopt Manski's "ignorance" assumption, perhaps in part because the knowledge they have about τ is discarded entirely, often resulting in very wide bounds on the quantities of interest. Particularly uncomfortable for researchers is that no matter how strong the empirical relationship among the variables is, the bounds on RR always include 1 and those on RD always include 0, which in both cases denote no treatment effect.

Thus the existing literature effectively requires researchers to choose among three extreme assumptions: τ is essentially zero, is known exactly (possibly apart from sampling error), or is completely unknown. Our alternative approach is to elicit from researchers a range of values into which they are willing to say that τ must fall (e.g., [0.001, 0.05]), which appears to be a better reflection of the nature of prior information available in applied research settings than the extremes of exact knowledge or complete ignorance. Our approach seems consistent with Manski's[2] goals for future research and, like his specific methods, does not require a fully Bayesian prior distribution. Our approach could also be applied to bring available information and probabilistic inference to the methods Manski[2] has offered in other areas.

Let p_τ, RR_τ, RD_τ, and NNT_τ denote values of the probability, relative risk, risk difference, and number needed to treat, respectively, evaluated at τ. Suppose that τ is known only to fall within the range [τ_0, τ_1], where 0 < τ_0 ≤ τ_1 < 1. Because p_τ and RR_τ are monotonic functions of τ, their bounds are, apart from sampling error,

   p ∈ [min(p_τ0, p_τ1), max(p_τ0, p_τ1)]   (19)

and

   RR ∈ [min(RR_τ0, RR_τ1), max(RR_τ0, RR_τ1)]   (20)

The bounds for the risk difference are more complicated because RD_τ is a parabolic function of τ, and so the bounds differ in the monotonic and nonmonotonic regions. The relationship is monotonic in regions where τ_0 and τ_1 are both greater than or both less than the value of τ that corresponds to RD_g = (√OR − 1)/(√OR + 1). This region corresponds to cases where the derivative of RD_τ with respect to τ, evaluated at τ_0 and τ_1, has the same sign. (This derivative can easily be checked numerically by comparing the signs of RD_{τ0+ε} − RD_{τ0} and RD_{τ1+ε} − RD_{τ1} for a suitably small value of ε.) When the relationship is monotonic, the bounds are

   RD ∈ [min(RD_τ0, RD_τ1), max(RD_τ0, RD_τ1)]   (21)

and otherwise, they are

   RD ∈ [min(RD_τ0, RD_τ1, RD_g), max(RD_τ0, RD_τ1, RD_g)]   (22)

The bounds for the number needed to treat can be derived directly from those for the risk difference by using the reciprocals of the latter.

Revisiting a Numerical Example

We illustrate our methods by extending the numerical example concerning smoking and heart disease given by Manski.[2,3] For clarity, we follow Manski in ignoring estimation uncertainty (i.e., equating sample fractions with sampling probabilities, as if n → ∞) in this section (only); in the next two sections, we show how to include estimation uncertainty and compute confidence intervals. This section also demonstrates the degree to which results under our approach are sensitive to assumptions about the interval [τ_0, τ_1] while holding estimation uncertainty constant (at zero). (The effect of estimation uncertainty, while holding the interval constant, follows standard sampling theory.)

In Manski's example, X is a binary explanatory variable taking values 1 for smokers and 0 for nonsmokers, and Y takes on the values 1 for coronary heart disease and 0 for healthy individuals. The assumptions in his example imply that Pr(X=1 | Y=1) = 0.6, Pr(X=1 | Y=0) = 0.49, Pr(X=0 | Y=1) = 0.4, and Pr(X=0 | Y=0) = 0.51. Hence, using Eq. 9, we can write the probabilities as functions of τ:

   Pr(Y=1 | X=1, τ) = 0.6τ / [0.6τ + 0.49(1 − τ)] = τ/(0.82 + 0.18τ)
   Pr(Y=1 | X=0, τ) = 0.4τ / [0.4τ + 0.51(1 − τ)] = τ/(1.28 − 0.28τ)

For each of the quantities of interest, we now compare the case where τ is unknown, as Manski does, to where it is known to lie in the interval [0.05, 0.15] (Manski's example implies that τ = 0.1, which he treats as not known).
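The example's interval calculations can be sketched in a few lines of Python; RD_τ is monotonic on [0.05, 0.15], so Eq. 21 applies and the endpoints give the bounds:

```python
# Sketch of the smoking example: Eq. 9 evaluated at the endpoints of
# tau in [0.05, 0.15]; RD_tau is monotonic here, so Eq. 21 gives the RD bounds.

def p_given_x(tau, px_case, px_control):
    """Eq. 9: Pr(Y=1 | X, tau) for a binary X."""
    return px_case * tau / (px_case * tau + px_control * (1 - tau))

t0, t1 = 0.05, 0.15
p_smoke = [p_given_x(t, 0.60, 0.49) for t in (t0, t1)]    # Pr(Y=1|X=1, tau)
p_nosmoke = [p_given_x(t, 0.40, 0.51) for t in (t0, t1)]  # Pr(Y=1|X=0, tau)

rr_bounds = sorted(a / b for a, b in zip(p_smoke, p_nosmoke))
rd_bounds = sorted(a - b for a, b in zip(p_smoke, p_nosmoke))
nnt_bounds = sorted(1 / abs(d) for d in rd_bounds)
```

These few lines reproduce the interval estimates reported for this example in the text.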
Without bounds on τ, the problem provides no information about any probability, whereas the additional information about τ gives much more informative bounds: the probability of heart disease among smokers is Pr(Y=1 | X=1) ∈ [0.06, 0.18], whereas among nonsmokers, it is Pr(Y=1 | X=0) ∈ [0.04, 0.12]. For the relative risk, Manski's "ignorance" assumption gives RR ∈ [1, 1.57], whereas our alternative approach implies the very tight bounds of RR ∈ [1.46, 1.53], indicating that smoking increases the risk of heart disease by between 46% and 53%. For the risk difference, the bounds given no information on τ are RD ∈ [0, 0.11], whereas our approach yields the much narrower bounds of RD ∈ [0.021, 0.056], the increase in probability due to smoking. Because smoking has an adverse effect, we can also obtain bounds for NNT, here the number needed to harm, based on those for the risk difference. With no information on τ, Manski's approach would result in NNT ∈ [9, ∞), which is hardly informative at all, while our method gives NNT ∈ [18, 48], a range infinitely narrower than "between 9 and infinity" and of course more substantively meaningful.

Classical Confidence Intervals

We now provide a method of computing classical confidence intervals, saving our preferred robust Bayesian interpretation for the following section. Because each end of the bounds on p, RR, and RD is measured with error, upper and lower confidence intervals could be computed and reported for each. However, the inner bounds (the upper confidence limit on the lower bound and the lower confidence limit on the upper bound) are not of interest. Thus we recommend defining a confidence interval (CI) as the range between the outer confidence limits. The actual CI coverage of the resulting interval is always at least as great as the nominal coverage.

For all the methods provided above, confidence intervals can easily be computed by simulation (the delta method is also possible but difficult because of the discontinuities caused by the minimum and maximum functions). The bounds are known functions of, and derive their sampling distributions from, the estimated model parameters. Therefore, the distribution of the bounds can be simulated using random draws from the sampling distributions of the parameters.[27] For example, in the logit model, the asymptotic distribution of the estimated parameters is normal, with mean vector and covariance matrix estimated by the usual maximum likelihood procedures. Random draws from this distribution can then be converted into random draws from the distribution of the bounds through the relevant formulas relating the bounds and the model parameters. A 90% (for example) CI for the bounds can be obtained by sorting the m random draws of the bounds and taking the 5th and 95th percentile values as the lower and upper bounds, respectively. Our software implements these procedures. Increasing the number of simulations m improves accuracy and reduces speed. The required m depends on the example; 1000 will often be enough, but it is easy to verify: rerun the simulation, and if anything changes in as many significant digits as is needed, increase m and try again.

A Robust Bayesian Interpretation

Our estimation procedure is not strictly Bayesian in that choosing an interval for τ is not equivalent to imposing a uniform (or any version of a "noninformative") prior density within those bounds. However, our procedure can be thought of as a special case of "robust Bayesian analysis" (e.g., Refs. [32,33]), and one that happens to be easier to apply and gives results that are considerably easier for applied researchers to use than most examples in this literature.

From this robust Bayesian perspective, the interval chosen for τ can be thought of as narrowing the choice of a prior to only a class of densities rather than a (fully Bayesian) single density. In our case, the class of priors is defined to include all densities P(τ) subject to the constraint that ∫ from τ_0 to τ_1 of P(τ) dτ = 1. The advantages of this approach are that prior elicitation is much easier, it does not force analysts to give priors when no prior information exists, and, more importantly, estimates depend only on real information and so are the same for any density within the class. If a classical Bayesian prior is inaccurate, classical Bayesian inferences will be incorrect. In contrast, our "robust Bayesian" approach will give valid inferences even if we can only narrow the prior to a class of densities rather than to one particular density function. (Because the data do not help in making inferences about τ, this model is an example of the type of analysis for which Berger[32] argues that robust Bayesian analysis is required.)

The cost of this approach is that the information about our quantity of interest can only be narrowed to a class of posterior densities. Fortunately, in the present case, this class can be conveniently summarized as an inequality (rather than an equality) statement regarding the credible intervals: the probability that the actual quantity of interest is within the computed interval is always at least as great as the nominal coverage.[e]

[e] One objection to our procedure is that assuming a zero prior probability for τ outside the interval [τ_0, τ_1] may be unrealistic. Of course, one may simply enlarge the prior interval to include the nonzero density area, but a probabilistic version is also easy to construct: first, elicit the interval endpoints, τ_0 and τ_1, and also a fraction α (e.g., 0.05) such that a 1 − α fraction of the time τ falls within [τ_0, τ_1], i.e., ∫ from τ_0 to τ_1 of P(τ) dτ = 1 − α. Then, outside this interval, use a portion of a (single) density to allocate the remaining probability to [0, τ_0) and (τ_1, 1], so that the sum of the two integrals adds to α. In this way, every member of the class of densities on the full interval [0, 1] is proper, and robust Bayesian analysis can proceed as before. We have found these changes inconsequential in the real
The choice of m reflects the tradeoff applications we have studied, although our software offers the option to between accuracy and speed: larger values of m improve handle this situation. ORDER REPRINTS
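The simulation recipe just described can be sketched in a few lines. This is an illustrative sketch only, not the authors' software: the logit estimates and covariance matrix below are hypothetical stand-ins for the usual maximum likelihood output, and the quantity simulated is the risk ratio implied by a simple logit fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logit estimates and covariance matrix (illustrative
# stand-ins; not values from the article).
beta_hat = np.array([-2.0, 0.4])                  # intercept, exposure coef
V = np.array([[0.010, -0.002], [-0.002, 0.004]])  # estimated covariance

m = 1000  # number of simulations; rerun with larger m to verify stability
draws = rng.multivariate_normal(beta_hat, V, size=m)

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# For each draw, compute a quantity that is a known function of the
# parameters: here, the risk ratio for exposed vs. unexposed.
p1 = inv_logit(draws[:, 0] + draws[:, 1])  # Pr(Y=1|X=1) per draw
p0 = inv_logit(draws[:, 0])                # Pr(Y=1|X=0) per draw
rr = p1 / p0

# 90% CI: sort the m draws and take the 5th and 95th percentiles.
lo, hi = np.percentile(rr, [5, 95])
print("90%% CI for RR: [%.3f, %.3f]" % (lo, hi))
```

The same loop applies to any bound that is a known function of the parameters: replace the risk-ratio line with the formula for the bound of interest.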


Table 1  Replication and extension of 95% CIs

                          Risk factor
Quantity of interest    IV drug use    Smoking      Pneumonia    Cirrhosis
OR                      1.44–2.70      1.81–3.64    1.01–1.88    1.01–2.49
RR                      1.31–2.45      1.52–3.13    1.01–1.73    1.03–2.17
RD                      0.03–0.19      0.05–0.27    0.00–0.13    0.00–0.20
Pr(Y=1|X=0)             0.05–0.26      0.05–0.26    0.08–0.31    0.08–0.31
Pr(Y=1|X=1)             0.10–0.38      0.13–0.47    0.09–0.40    0.10–0.50

The OR row is an exact replication and the others are extensions using the methods developed here. The last two rows give the probability of contracting bacterial pneumonia given the absence and presence of the given risk factor, respectively. (From Ref. [34].)
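The extension rows of such a table come from applying Bayes' rule across the assumed interval for t. The sketch below illustrates only that step, with hypothetical Pr(X|Y) values rather than the study's data, and it captures only the uncertainty in t; sampling variability would be added by the simulation step described in the text.

```python
import numpy as np

# Hypothetical case-control estimates of Pr(X=1 | Y); illustrative only.
P11, P10 = 0.60, 0.45  # Pr(X=1|Y=1), Pr(X=1|Y=0)

def pr_y1_given_x(px_case, px_ctrl, t):
    """Pr(Y=1 | X) by Bayes' rule, given population incidence fraction t."""
    return px_case * t / (px_case * t + px_ctrl * (1 - t))

# Partial knowledge: t known only to lie in [0.05, 0.15].
ts = np.linspace(0.05, 0.15, 1001)
p1 = pr_y1_given_x(P11, P10, ts)          # Pr(Y=1|X=1) over the t grid
p0 = pr_y1_given_x(1 - P11, 1 - P10, ts)  # Pr(Y=1|X=0) over the t grid
rr, rd = p1 / p0, p1 - p0

# RD can be parabolic in t, so take min/max over the whole grid rather
# than only at the interval endpoints.
print("Pr(Y=1|X=1) in [%.3f, %.3f]" % (p1.min(), p1.max()))
print("RR in [%.3f, %.3f]" % (rr.min(), rr.max()))
print("RD in [%.4f, %.4f]" % (rd.min(), rd.max()))
print("NNT in [%.1f, %.1f]" % (1 / rd.max(), 1 / rd.min()))
```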

Thus reporting results from our methods can be as simple as from existing methods used in this field. This is a considerable advantage over other applications of robust Bayesian methods because, although considerable research in statistics has focused on these approaches, applications are still fairly rare. The reason is that the class of priors turns into a class of posteriors, and it is the entire class of posteriors that constitutes the research results. Summarizing a single multidimensional posterior is already fairly difficult; summarizing a class of posteriors in multidimensional space only creates more problems. Yet in our case, the class of posteriors can be reduced for presentation purposes to a credible interval, the only disadvantage of which is that it provides an inequality rather than an equality statement.

An Empirical Example

We now reanalyze data provided in Tumbarello et al.,[34] the largest case-control study ever conducted of the risk factors leading to bacterial pneumonia in HIV-infected patients. We focus on their univariate analysis of risk factors in 350 cases and 700 controls. The authors report prior knowledge of t (the fraction of HIV-seropositive individuals in the general population who have an episode of bacterial pneumonia), based on previous studies, as falling in the interval [0.097, 0.29]. We interpret this to be a 99% prior interval and, for simplicity, assume the remaining a = 0.01 mass to be uniform in [0, 0.097) and (0.29, 0.6]; our experiments (not shown) indicate that inferences change very little across many reasonable choices for the density outside the [0.097, 0.29] interval for t.

The first row of Table 1 replicates the CI for the univariate odds ratio reported in Ref. [34] for each of the four risk factors they considered. The second row of the table gives 95% CIs for the risk ratio of each risk factor, using the robust Bayesian methods described in "A Robust Bayesian Interpretation" (with m = 1000 simulations). As the table shows, the intervals for RR indicate somewhat smaller effects than OR, with the most noticeable differences for IV drug use and smoking.

Perhaps more interesting are the final three rows of the table, which offer information not reported in any form in the original article. For example, Table 1 shows that smoking increases the probability of bacterial pneumonia by between 0.05 and 0.27 (a 95% CI for the risk difference, RD). For another example, the 95% CI for the probability of an IV drug user contracting bacterial pneumonia is 0.10–0.38. These examples and the other information in Table 1 all seem like valuable information for researchers and others interested in the study and its results. The information existed in the data from this study, but it is revealed only by application of the methods offered here.

INFERENCE WITHOUT FULL POPULATION INFORMATION OR ADDITIONAL ASSUMPTIONS: DENSITY CASE-CONTROL

We now give results for density case-control designs analogous to those provided in "Inference Without Full Population Information or Assumptions: Classic Case-Control" for classic case-control designs, and provide informative bounds for p, RR, RD, λ_i(t_j), and rd when no information or only partial information is available about the t_j's in "Inference Under Full Population Information" (we skip rr because it does not depend on t_j and is estimable from the conditional logit procedure). Inference from density case-control samples is complicated by the fact that more than one piece of population information is involved: there is a t_j associated with each of the M risk sets.^f We

^f See Ref. [35] for methods of estimating the risk using only the overall cohort disease rate. However, this estimator is less efficient than the one proposed by Langholz and Borgan[30] and employed here. See also Ref. [6] on estimating rates using information on the crude incidence density.

elicit the minimum and maximum values of each t_j, which we denote t_j^lo and t_j^hi, respectively. The interval (t_j^lo, t_j^hi) can change with j or can be constant over risk sets; it can be specified to include 100% of the prior density, as in "A Proposed 'Available Information' Assumption," or a 1 − a fraction of the density, as in "A Robust Bayesian Interpretation." When we are completely ignorant about t_j, the interval is (0, 1).

To simplify notation, let r_k = e^{x_k β} and r^j = Σ_{k∈R_j} r_k. Clearly r_k > 0 and r^j > 0 for all k, j. Then the estimators for the rate, cumulative rate, and risk in Eqs. 12–14, respectively, can be rewritten as

λ_i(t_j) = r_i t_j / r^j   (23)

H(T_i, X_i) = Σ_{t_j∈T_i} λ_i(t_j) = Σ_{t_j∈T_i} r_i t_j / r^j   (24)

p_i = Pr(Y=1|X_i) = 1 − e^{−H(T_i,X_i)} = 1 − exp(−Σ_{t_j∈T_i} r_i t_j / r^j)   (25)

We now develop bounds for the quantities of interest as functions of t_j^lo and t_j^hi.

Risk

From Eq. 25, we have

∂p_i/∂t_j = e^{−H(T_i,X_i)} r_i / r^j > 0 for all j

hence the risk is a monotonically increasing function of every t_j. Denote p_i^lo and p_i^hi as the values of p_i with all t_j set to t_j^lo and t_j^hi, respectively; then the bounds for p_i are simply p_i ∈ [p_i^lo, p_i^hi]. For example, when we are completely ignorant about t_j, so that (t_j^lo, t_j^hi] is (0, 1], the bounds give p_i ∈ (0, 1 − exp(−Σ_{t_j∈R_j} r_i/r^j)].

Risk Ratio

We now examine RR = p_1/p_0, where p_i is as in Eq. 25, i = 0, 1. We have

∂RR/∂t_j = [r_1(1 − p_1)p_0 − r_0(1 − p_0)p_1] / (p_0² r^j)   (26)

the sign of which is determined by that of the numerator. When the numerator is positive, i.e., when

rr = r_1/r_0 > p_1(1 − p_0) / [(1 − p_1)p_0] = OR   (27)

the partial derivative is positive and so RR increases with respect to t_j. Otherwise, the derivative is negative and RR decreases with respect to t_j. In Appendix B, we show that Eq. 27 holds whenever rr < 1, independent of the values of t_j. Similarly, the sign is reversed whenever rr > 1. Hence RR is either monotonically increasing or monotonically decreasing with respect to t_j, for all j.

Thus when rr < 1, RR is monotonically increasing with respect to all t_j and is therefore bounded by RR^lo and RR^hi, the values of RR with all t_j set to their minimum and maximum values, respectively. Otherwise, RR is monotonically decreasing and the bounds are RR^hi and RR^lo. In short, RR is bounded by

RR ∈ [min(RR^lo, RR^hi), max(RR^lo, RR^hi)]   (28)

It is easy to see that lim_{t_j→0} RR = r_1/r_0; hence, when no information is available for t_j and therefore (t_j^lo, t_j^hi] is (0, 1), the bounds become

RR ∈ (min(r_1/r_0, RR^hi), max(r_1/r_0, RR^hi))

where RR^hi is here RR evaluated at t_j = 1 for all j.

Risk Difference

The case of RD = p_1 − p_0 is more complicated because RD is not a monotonic function of the t_j's and ∂RD/∂t_j = [r_1 e^{−H(T_1,X_1)} − r_0 e^{−H(T_0,X_0)}]/r^j can change sign depending on the values of t_j. Under the proportional hazards model, where r_i and r^j are not functions of time, however, we can reduce the analytically difficult or even intractable problem of constrained optimization in multidimensional space to a simple one in which RD is a one-dimensional function of the cumulative baseline hazard, which is a monotone function of the t_j's.

Let Q(t_j) = Σ_{j=1}^{M} t_j/r^j denote the cumulative baseline hazard rate, and note that ∂Q/∂t_j = 1/r^j > 0 for all j, so Q is monotonically increasing in all t_j's and therefore bounded between Q^lo = Q(t_j^lo) and Q^hi = Q(t_j^hi). Now rewrite RD in terms of Q. From Eq. 24, we have H(T_k, X_k) = r_k Q for k = 0, 1; hence

RD = (1 − e^{−r_1 Q}) − (1 − e^{−r_0 Q}) = e^{−r_0 Q} − e^{−r_1 Q}   (29)

and

∂RD/∂t_j = (r_1 e^{−r_1 Q} − r_0 e^{−r_0 Q}) / r^j   (30)

RD is not monotone in Q, but Q is a scalar and we know its bounds, which brings us to a situation mathematically similar to analyzing RD in classic case-control designs.
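The monotonicity results for the risk and risk ratio translate directly into an endpoint evaluation. The sketch below assumes a toy setup: three risk sets, hypothetical risk-set sums standing in for r^j, hypothetical hazard multipliers r_1 and r_0, and elicited intervals for each t_j.

```python
import numpy as np

# Hypothetical setup: M = 3 risk sets. r_sum[j] plays the role of
# r^j = sum of exp(x_k * beta) over risk set j; r1, r0 are the relative
# hazards for the exposed and unexposed profiles of interest.
r_sum = np.array([10.0, 8.0, 12.0])
r1, r0 = 1.8, 1.0

# Elicited lower and upper bounds on each baseline component t_j.
t_lo = np.array([0.01, 0.01, 0.01])
t_hi = np.array([0.10, 0.10, 0.10])

def risk(r, t):
    """p = 1 - exp(-H) with H = sum_j r * t_j / r^j (cf. Eqs. 24-25)."""
    return 1.0 - np.exp(-np.sum(r * t / r_sum))

# The risk is monotonically increasing in every t_j, so its bounds come
# from evaluating at the endpoint vectors.
p1_lo, p1_hi = risk(r1, t_lo), risk(r1, t_hi)

# RR is monotone in the t_j's (in one direction or the other), so its
# bounds also come from the two endpoint vectors (cf. Eq. 28).
def rr_at(t):
    return risk(r1, t) / risk(r0, t)

rr_lo, rr_hi = sorted([rr_at(t_lo), rr_at(t_hi)])

print("p1 in [%.4f, %.4f]" % (p1_lo, p1_hi))
print("RR in [%.4f, %.4f]" % (rr_lo, rr_hi))
```

The risk difference is handled separately in the text, since it is not monotone in the t_j's and requires the one-dimensional reduction through the cumulative baseline hazard.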


Let Q* be the solution to the first-order condition ∂RD/∂t_j = 0. From Eq. 30, we can solve for Q* = [1/(r_1 − r_0)] ln(r_1/r_0). Then from Eq. 29, we have

RD(Q*) = (r_0/r_1)^{r_0/(r_1−r_0)} − (r_0/r_1)^{r_1/(r_1−r_0)}   (31)

To obtain the bounds for RD, we first see whether [Q^lo, Q^hi] contains Q*. If it does, then

RD ∈ [min(RD^lo, RD^hi, RD*), max(RD^lo, RD^hi, RD*)]   (32)

where RD^lo = RD(Q^lo), RD^hi = RD(Q^hi), and RD* = RD(Q*). Otherwise, the bounds are

RD ∈ [min(RD^lo, RD^hi), max(RD^lo, RD^hi)]   (33)

When no information is available for t_j, the bounds become

RD ∈ [min(0, RD*), max(0, RD*)]   (34)

Rate

From Eq. 23, we see that ∂λ_i(t_j)/∂t_j = r_i/r^j > 0; hence λ_i(t_j) is monotonically increasing in t_j. It is therefore bounded in (λ_i(t_j^lo), λ_i(t_j^hi)), where λ_i(t_j^lo) and λ_i(t_j^hi) are λ_i(t_j) evaluated at t_j^lo and t_j^hi, respectively. When we are ignorant with respect to the t_j's, the rate is bounded as (0, r_i/r^j).

Rate Difference

Because ∂rd/∂t_j = (r_1 − r_0)/r^j, rd is monotonically increasing in t_j if r_1 > r_0 and decreasing otherwise. Hence the bound on the rate difference is

rd ∈ [min(rd(t_j^lo), rd(t_j^hi)), max(rd(t_j^lo), rd(t_j^hi))]   (35)

and when ignorant of all information on t_j, the bounds are

rd ∈ [min(0, (r_1 − r_0)/r^j), max(0, (r_1 − r_0)/r^j)]

CONCLUSION

As is increasingly recognized, the quantity of interest in most case-control studies is not the odds ratio, but rather some version or function of a probability, risk ratio, risk difference, rate, rate ratio, or rate difference, depending on context.[6,7,36–41] We provide methods to estimate each of these quantities from case-control studies, even if no auxiliary information, or only limited auxiliary information, is available.

Unless the odds ratio happens to approximate a parameter of central substantive interest, which is quite rare, we suggest that it should not be reported any more frequently than any other intermediate quantity in statistical calculations. We suggest instead that researchers justify their assumption regarding bounds on t (in classic case-control studies) or t_j (in risk set case-control studies) in the data or methods section of their work. Then, they can substitute the confidence interval (CI) now reported for the odds ratio with the CI for their chosen quantity (or quantities) of interest. For example, instead of the uninformative but presently common reporting style:

the effect of smoking on lung cancer is positive, OR = 1.38 (95% CI 1.30–1.46)

researchers could give the much more interesting:

smoking increases the risk of contracting lung cancer by a factor of between 2.5 and 3.1 (a 95% CI)

or

smoking increases the probability of contracting lung cancer by between 0.022 and 0.051 (a 95% CI)

If uncertainty exists over the appropriate bounds for the unknown quantities, we suggest using the widest bounds, conducting sensitivity analyses showing how the CI depends on different assumptions, or setting a to a value other than zero.

The methods discussed here are meant to improve presentation and increase the amount of information that can be extracted from existing models and data collections. They do not enable scholars to ignore the usual threats to inference (measurement error, selection bias, confounding, etc.) that must be avoided in any study.

APPENDIX A
SIMPLIFYING MANSKI'S BOUNDS ON THE RISK DIFFERENCE

Proving Eq. 18 requires algebra only. For simplicity, let P_ab = Pr(X=a|Y=b), so that OR = (P_11 P_00)/(P_01 P_10). Then, omitting tedious but straightforward algebra at several stages, f = (OR P_01²/P_11²)^{1/2} = √OR · P_01/P_11, and g = √OR/(√OR + P_11/P_10). Then the components of RD_g are

P_11 g = √OR P_10 P_11 / (√OR P_10 + P_11),   P_01 g = √OR P_01 P_10 / (√OR P_10 + P_11)

P_10(1 − g) = P_10 P_11 / (√OR P_10 + P_11),   P_00(1 − g) = P_11 P_00 / (√OR P_10 + P_11)

and so putting the terms together yields RD_g = √OR/(1 + √OR) − 1/(1 + √OR) = (√OR − 1)/(√OR + 1).
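The Appendix A identity can be spot-checked numerically. The exposure probabilities below are arbitrary choices for illustration; g is the special value of t given above, and the risk difference at t = g should match (√OR − 1)/(√OR + 1) up to floating-point error.

```python
import math

# Arbitrary (hypothetical) exposure probabilities Pab = Pr(X=a | Y=b).
P11, P10 = 0.70, 0.40        # Pr(X=1|Y=1), Pr(X=1|Y=0)
P01, P00 = 1 - P11, 1 - P10  # Pr(X=0|Y=1), Pr(X=0|Y=0)

OR = (P11 * P00) / (P01 * P10)
# g = sqrt(OR)/(sqrt(OR) + P11/P10), written with a common denominator.
g = math.sqrt(OR) * P10 / (math.sqrt(OR) * P10 + P11)

def pr_y1(px_case, px_ctrl, t):
    """Pr(Y=1 | X) by Bayes' rule at incidence fraction t."""
    return px_case * t / (px_case * t + px_ctrl * (1 - t))

# Risk difference evaluated at t = g, vs. the closed-form target.
rd_g = pr_y1(P11, P10, g) - pr_y1(P01, P00, g)
target = (math.sqrt(OR) - 1) / (math.sqrt(OR) + 1)
print(abs(rd_g - target))  # ~0 up to floating-point error
```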


APPENDIX B
MONOTONICITY OF THE RISK RATIO UNDER DENSITY CASE-CONTROL DESIGNS

We show here that if rr < 1, then rr > [p_1(1 − p_0)]/[(1 − p_1)p_0] (the r_1/r_0 > 1 case is similar). Let H_k = H(T_k, X_k), k = 0, 1. From the definitions of p_1 and p_0, [p_1(1 − p_0)]/[(1 − p_1)p_0] simplifies to (e^{H_1} − 1)/(e^{H_0} − 1). Because rr = H_1/H_0, we only need to show that if H_1/H_0 < 1, then H_1/H_0 > (e^{H_1} − 1)/(e^{H_0} − 1) or, equivalently, H_1(e^{H_0} − 1) > H_0(e^{H_1} − 1).

The Taylor series expansions of e^{H_1} − 1 and e^{H_0} − 1 at 0 give

e^{H_1} − 1 = H_1 + (1/2)H_1² + (1/3!)H_1³ + ...   (36)

e^{H_0} − 1 = H_0 + (1/2)H_0² + (1/3!)H_0³ + ...   (37)

and hence

H_1(e^{H_0} − 1) = H_1 H_0 + (1/2)H_0² H_1 + (1/3!)H_0³ H_1 + ...   (38)

H_0(e^{H_1} − 1) = H_0 H_1 + (1/2)H_1² H_0 + (1/3!)H_1³ H_0 + ...   (39)

The first terms in Eqs. 38 and 39 are equal, and when H_1/H_0 < 1, so that H_1 < H_0, every subsequent term in Eq. 38 is larger than the corresponding term in Eq. 39 (because H_0 > 0 and H_1 > 0 always). Thus when H_1/H_0 < 1, H_1(e^{H_0} − 1) > H_0(e^{H_1} − 1).

ACKNOWLEDGMENTS

We thank Sander Greenland for his generosity, insight, and wisdom about the epidemiological literature, Norm Breslow and Ken Rothman for many helpful explanations, and Chuck Manski for his suggestions, provocative econometric work, and other discussions. Thanks also to Neal Beck, Rebecca Betensky, Josué Guzmán, Bryan Langholz, Meghan Murray, Adrian Raftery, Ted Thompson, Jon Wakefield, Clarice Weinberg, and David Williamson for helpful discussions, Ethan Katz for research assistance, and Mike Tomz for spotting an error in an earlier version. Thanks to the National Science Foundation (IIS-9874747), the Centers for Disease Control and Prevention (Division of Diabetes Translation), the National Institute on Aging (P01 AG17625-01), the World Health Organization, and the Center for Basic Research in the Social Sciences for research support. Software to implement the methods in this paper is available for R, Stata, and Gauss from http://GKing.Harvard.Edu.

REFERENCES

1. Moynihan, R.; Bero, L.; Ross-Degnan, D.; Henry, D.; Lee, K.; Watkins, J.; Mah, C.; Soumerai, S.B. Coverage by the news media of the benefits and risks of medications. N. Engl. J. Med. June 1 2000, 342 (22), 1645–1650.
2. Manski, C.F. Identification Problems in the Social Sciences; Harvard University Press, 1995; p. 31.
3. Manski, C.F. Nonparametric Identification Under Response-Based Sampling. In Nonlinear Statistical Inference: Essays in Honor of Takeshi Amemiya; Hsiao, C., Morimune, K., Powell, J., Eds.; Cambridge University Press, 1999.
4. Greenland, S. On the need for the rare disease assumption in case-control studies. Am. J. Epidemiol. 1982, 116 (3), 547–553.
5. Deeks, J.; Sackett, D.; Altman, D. Down with odds ratios! Evid.-Based Med. 1996, 1 (6), 164–166.
6. Greenland, S. Interpretation and choice of effect measures in epidemiologic analysis. Am. J. Epidemiol. 1987, 125 (5), 761–768.
7. Davies, H.T.O.; Crombie, I.K.; Tavakoli, M. When can odds ratios mislead? Br. Med. J. March 28 1998, 316, 989–991.
8. Rothman, K.J.; Greenland, S. Modern Epidemiology, 2nd Ed.; Lippincott-Raven: Philadelphia, 1998; pp. 113, 244–.
9. Prentice, R.L.; Breslow, N.E. Retrospective studies and failure-time models. Biometrika 1978, 65, 153–158.
10. Chamberlain, G. Analysis of covariance with qualitative data. Rev. Econ. Stud. 1980, XLVII, 225–238.
11. Greenland, S. Modeling risk ratios from matched cohort data: An estimating equation approach. Appl. Stat. 1994, 43 (1), 223–232.
12. Goldstein, L.; Langholz, B. Asymptotic theory for nested case-control sampling in the Cox regression model. Ann. Stat. 1992, 20 (4), 1903–1928.
13. Borgan, Ø.; Langholz, B.; Goldstein, L. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann. Stat. 1995, 23, 1749–1778.
14. Langholz, B.; Goldstein, L. Risk set sampling in epidemiologic cohort studies. Stat. Sci. 1996, 11 (1), 35–53.
15. Langholz, B.; Thomas, D.C. Efficiency of cohort sampling designs: Some surprising results. Biometrics 1991, 47, 1563–1571.
16. Lubin, J.H.; Gail, M.H. Sampling strategies in nested case-control studies. Environ. Health Perspect. 1994, 102 (Suppl. 8), 47–51.
17. Robins, J.M.; Gail, M.H.; Lubin, J.H. More on biased selection of controls for case-control analyses of cohort studies. Biometrics 1986, 42, 293–299.
18. Prentice, R.L. A case-cohort design for epidemiologic studies and disease prevention trials. Biometrika 1986, 73, 1–11.
19. Greenland, S. Multivariate estimation of exposure-specific incidence from case-control studies. J. Chronic Dis. 1981, 34, 445–453.
20. Neutra, R.R.; Drolette, M.E. Estimating exposure-specific disease rates from case-control studies using Bayes' theorem. Am. J. Epidemiol. 1978, 108 (3), 214–222.
21. Cornfield, J. A method of estimating comparative rates from clinical data: Application to cancer of the lung, breast and cervix. J. Natl. Cancer Inst. 1951, 11, 1269–1275.
22. Anderson, J.A. Separate-sample logistic discrimination. Biometrika 1972, 59, 19–35.
23. Prentice, R.L.; Pyke, R. Logistic disease incidence models and case-control studies. Biometrika 1979, 66, 403–411.
24. Mantel, N. Synthetic retrospective studies and related topics. Biometrics 1973, 29, 479–486.
25. Manski, C.F. The estimation of choice probabilities from choice based samples. Econometrica November 1977, 45 (8), 1977–1988.
26. King, G.; Zeng, L. Logistic regression in rare events data. Polit. Anal. Spring 2001, 9 (2), 137–163.
27. King, G.; Tomz, M.; Wittenberg, J. Making the most of statistical analyses: Improving interpretation and presentation. Am. J. Polit. Sci. April 2000, 44 (2), 341–355.
28. Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, 1995.
29. Beck, N.; King, G.; Zeng, L. Improving quantitative studies of international conflict: A conjecture. Am. Polit. Sci. Rev. March 2000, 94 (1), 21–36.
30. Langholz, B.; Borgan, Ø. Estimation of absolute risk from nested case-control data. Biometrics June 1997, 53, 767–774.
31. Yule, G.U. On the methods of measuring the association between two attributes. J. R. Stat. Soc. 1912, 75, 579–642.
32. Berger, J. An overview of robust Bayesian analysis (with discussion). Test 1994, 3, 5–124.
33. Insua, D.R.; Ruggeri, F., Eds. Robust Bayesian Analysis; Springer-Verlag, 2000.
34. Tumbarello, M.; Tacconelli, E.; de Gaetano, K.; Ardito, F.; Pirronti, T.; Cauda, R.; Ortona, L. Bacterial pneumonia in HIV-infected patients: Analysis of risk factors and prognostic indicators. J. Acquir. Immune Defic. Syndr. Hum. Retrovirol. 1998, 18, 39–45.
35. Benichou, J.; Gail, M. Methods of inference for estimates of absolute risk derived from population-based case-control studies. Biometrics 1995, 51, 182–194.
36. Nurminen, M. To use or not to use the odds ratio in epidemiologic analysis. Eur. J. Epidemiol. 1995, 11, 365–371.
37. Zhang, J.; Yu, K.F. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998, 280 (19), 1690–1691.
38. Davies, H.T.O.; Crombie, I.K.; Tavakoli, M. Authors' reply. Br. Med. J. October 24 1998, 317, 1156–1157.
39. Deeks, J. Odds ratios should be used only in case-control studies and logistic regression analyses. Br. Med. J. October 24 1998, 317, 1155–1156.
40. Bracken, M.B.; Sinclair, J.C. Avoidable systematic error in estimating treatment effects must not be tolerated. Br. Med. J. October 24 1998, 317, 1156.
41. Altman, D.G.; Deeks, J.J.; Sackett, D.L. Odds ratios should be avoided when events are common. Br. Med. J. Nov. 7 1998, 317, 1318.
