Model-Based Estimation of Relative Risks and Other Epidemiologic

Model-Based Estimation of Relative Risks and Other Epidemiologic

American Journal of EPIDEMIOLOGY Volume 160 Copyright © 2004 by The Johns Hopkins Bloomberg School of Public Health Number 4 Sponsored by the Society for Epidemiologic Research August 15, 2004 Published by Oxford University Press SPECIAL ARTICLE Model-based Estimation of Relative Risks and Other Epidemiologic Measures in Downloaded from https://academic.oup.com/aje/article/160/4/301/165918 by guest on 30 September 2021 Studies of Common Outcomes and in Case-Control Studies Sander Greenland From the Departments of Epidemiology and Statistics, University of California, Los Angeles, Los Angeles, CA. Received for publication November 7, 2003; accepted for publication May 27, 2004. Some recent articles have discussed biased methods for estimating risk ratios from adjusted odds ratios when the outcome is common, and the problem of setting confidence limits for risk ratios. These articles have overlooked the extensive literature on valid estimation of risks, risk ratios, and risk differences from logistic and other models, including methods that remain valid when the outcome is common, and methods for risk and rate estimation from case-control studies. The present article describes how most of these methods can be subsumed under a general formulation that also encompasses traditional standardization methods and methods for projecting the impact of partially successful interventions. Approximate variance formulas for the resulting estimates allow interval estimation; these intervals can be closely approximated by rapid simulation procedures that require only standard software functions. absolute risk; case-control studies; clinical trials; cohort studies; logistic regression; odds ratio; relative risk; risk assessment Abbreviation: GEE, generalized estimating equations. Recently, McNutt et al. (1) noted a bias in a popular by Zhang and Yu (2). Greenland and Holland (4) described method by Zhang and Yu (2) for converting odds ratios to the biases in that method and gave valid formulas for risk ratios. Both articles overlooked the extensive literature converting odds-ratio estimates into risk-difference esti- on estimating relative risks and other measures from fitted mates. There are now many model-based estimates and models. This literature addresses the problems they noted confidence intervals for risks (incidence proportions), rates, and provides valid methods for all study designs, including and their ratios and differences (5, pp. 414–415; 6–14) and case-control studies, cohort studies, and clinical trials. for attributable fractions (15, 16). These methods can make use of input from logistic and other models, and most require BACKGROUND no rare-disease assumption. One can also get valid confi- In 1989, Holland (3) proposed an “adjusted risk differ- dence intervals for risk ratios by using Poisson regression ence” using the same biased risk-ratio formula rediscovered with robust (generalized estimating equations (GEE) or Correspondence to Dr. Sander Greenland, Departments of Epidemiology and Statistics, University of California, Los Angeles, Los Angeles, CA 90095-1772 (e-mail: [email protected]). 301 Am J Epidemiol 2004;160:301–305 302 Greenland “sandwich”) variance estimates (10), which avoid the overly A key advantage of model-based estimates is that they do wide intervals noted elsewhere (1). not require large numbers at each regressor level; the To describe the general ideas, suppose that r(x) is the risk regressor values may even be unique to each individual (as or rate at level x of a regressor vector X. X is a function of all would be expected when some covariates are continuous) exposures, confounders, and modifiers in the model; it may (6–14). To avoid sparse-data artifacts, they do require that contain powers, product terms, splines, and so forth. For the numbers of cases and noncases be adequate relative to example, a study of cannabis smoking and lung cancer might the number of model parameters, although this restriction use X = (cannabis grams/year, pack-years cigarettes, age, can be reduced by using penalized estimation (shrinkage) or age2, female, age × female), where female = 1 for women, 0 Bayesian methods to fit the model (19–21). for men. Suppose we want to compare average risks or rates Model-based estimates of r(x) can be sensitive to influen- tial data points and to model misspecification, but they none- when the distribution of X in a target population is p1(x) theless tend to have a smaller mean-squared error than do versus p0(x). These distributions usually correspond to raw covariate-specific estimates (which become wildly everyone exposed versus everyone unexposed to some risk Downloaded from https://academic.oup.com/aje/article/160/4/301/165918 by guest on 30 September 2021 unstable in sparse data) if the model fits well (22, chap. 12). factor. In policy applications, p1(x) and p0(x) may represent the population distribution without versus with the applica- When one standardizes over distributions similar to those in tion of some intervention program. Some X components the data, the resulting summary estimates will be far less sensitive than the specific r(x) estimates; this robustness (e.g., age, sex) may have the same distribution in p1(x) and p (x); only those components affected by the exposure will derives from the tendency of residual errors to average to 0 zero over the data distribution. differ. For example, p1(x) could represent the existing joint distribution of cannabis smoking, cigarette smoking, age, and sex in the target, while p0(x) could represent the same EXAMPLE distribution of cigarette smoking, age, and sex, with zero cannabis assigned to everyone (for everyone, the first entry Table 1 presents data on the relation of receptor level and in X is shifted to zero, and the rest are unchanged). staging to survival in a cohort of women with breast cancer The standardized (population-averaged) risk or rate under (23, table 5.3), along with risk estimates from use of several methods. Let x = (x , x , x ), with x an indicator of low- exposure or intervention j is R = Σ p (x)r(x), where the sum 1 2 3 1 j x j receptor status and x and x indicators of stage II and III. is over the range of X in the target. The adjusted risk or rate 2 3 The first set of estimates is the observed proportion of ratio and difference are then RR = R /R and RD = R – R , 10 1 0 10 1 0 women in each column who died. The second set is from respectively. The adjusted attributable fraction is (R – R )/ 1 0 maximum-likelihood logistic regression using the model r(x) = R = RD /R = (RR – 1)/RR ; the population attributable 1 10 1 10 10 expit(α + xβ), where expit(u) = eu/(1 + eu); the model fits fraction is the special case in which p (x) is the current distri- 1 well (e.g., likelihood-ratio p = 0.8) and the fitted risks are bution and p (x) is the distribution after exposure removal 0 expit(a + xb), where a and b are the α and β estimates. The (15, 16). The covariate-specific RR formula given by α + β third set is from a log-linear model r(x) = e x fit by bino- McNutt et al. (1, p. 941) is a special case in which the distri- mial maximum likelihood (23); this model also fits well x x x butions p1( ) and p0( ) are concentrated at single values 1 (e.g., p = 0.8) and the fitted risks are ea + xb. The fourth set is and x0 of X, and the exposure differs between x1 and x0 but the from this log-linear model fit using the incorrect Poisson other covariates do not. For example, to compare the risk likelihood, as one would obtain by entering the observed from use of 200 g/year of cannabis with that of nonuse column totals as person-years in a Poisson regression among males aged 50 years with 10 cigarette pack-years of program (1, 10, 14). smoking, if X = (cannabis grams/year, cigarette pack-years, To obtain a standardized risk ratio comparing the low and 2 × age, age , female, age female), one would take x1 = (200, high receptor group using the total group as the standard, × 2 × 10, 50, 502, 0, 50 0) and x0 = (0, 10, 50, 50 , 0, 50 0). take p1(x) and p0(x) shown in the final two rows of table 1, multiply them against a row of estimated risks, and sum the ESTIMATION results to get the R1 and R0 estimates. Under the log-linear model, the RR10 estimate simplifies to exp(b1), where b1 is Model-based confidence intervals for the above quantities the estimated x coefficient. The estimated standardized 1 ˆ ˆ can be obtained from variance formulas (8–14). To avoid risks, ratios, and differences are R1 = 0.392, R0 = 0.237, ˆ ˆ programming these formulas, intervals can instead be RR10 = 1.65, and RD10 = 0.156 if the observed proportions ˆ ˆ ˆ ˆ obtained by simulation or other resampling methods such as are used; R1 = 0.401, R0 = 0.239, RR10 = 1.68, and RD10 = ˆ ˆ bootstrapping (16–18). If the comparison distributions p1(x) 0.162 if the logistic model is used; R = 0.371, R0 = 0.238, ˆ ˆ 1 and p0(x) are also estimated, their estimates should also be RR10 = 1.56, and RD10 = 0.133 if the log-linear model fit by ˆ ˆ resampled. There are many ways to make use of the resam- binomial maximum likelihood is used; and R = 0.383, R0 = ˆ ˆ 1 pling distribution of estimates; one should avoid naive use of 0.235, RR10 = 1.63, and RD10 = 0.148 if log-linear Poisson the percentiles of the bootstrap distribution to set confidence regression is used.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us