Estimation and Inference for the Population Attributable Risk in the Presence of Misclassification
Downloaded from https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxz067/5767138 by Harvard College Library, Cabot Science Library user on 27 April 2020

Biostatistics (2020) 0, 0, pp. 1–14
doi:10.1093/biostatistics/kxz067

Estimation and inference for the population attributable risk in the presence of misclassification

BENEDICT H. W. WONG
Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA

JOOYOUNG LEE
Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA

DONNA SPIEGELMAN
Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA, Department of Epidemiology, Harvard T.H. Chan School of Public Health, 181 Longwood Ave, Boston, MA 02115, USA, Department of Nutrition and Global Health & Population, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Department of Biostatistics, Center on Methods in Implementation and Prevention Science, Yale School of Public Health, 60 College St, New Haven, CT 06510, USA

MOLIN WANG∗
Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA, Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA and Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, Boston, MA 02115
[email protected]

SUMMARY

Because it describes the proportion of disease cases that could be prevented if an exposure were entirely eliminated from a target population as a result of an intervention, estimation of the population attributable risk (PAR) has become an important goal of public health research.
In epidemiologic studies, categorical covariates are often misclassified. We present methods for obtaining point and interval estimates of the PAR and the partial PAR (pPAR) in the presence of misclassification, filling an important existing gap in public health evaluation methods. We use a likelihood-based approach to estimate parameters in the models for the disease and for the misclassification process, under main study/internal validation study and main study/external validation study designs, and various plausible assumptions about transportability. We assessed the finite sample performance of this method via a simulation study, and used it to obtain corrected point and interval estimates of the pPAR for high red meat intake and alcohol intake in relation to colorectal cancer incidence in the Health Professionals Follow-up Study (HPFS), where we found that the estimated pPAR for the two risk factors increased by up to 317% after correcting for bias due to misclassification.

Keywords: Attributable fraction; Attributable risk; Measurement error; Misclassification; Partial population attributable risk; Population attributable risk; Validation study.

∗To whom correspondence should be addressed.
© The Author 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].

1. INTRODUCTION

The population attributable risk (PAR) is the fraction of disease cases that would be prevented if an exposure were to be eliminated from a population of interest. It has attracted much attention in epidemiology and health policy research, as it evaluates the impact of public health interventions that remove the harmful exposure.
If the research goal is to estimate the amount or proportion of cases of a disease attributable to a given risk factor, or to predict the impact of public health interventions on the health status of a population, then PARs are particularly relevant (Northridge, 1995). In 2018, a British Journal of Cancer editorial entitled “Population attributable fractions continue to unmask the power of prevention” exhorted that “The population attributable fraction is a critical driver of evidence-based cancer prevention” (Bray and Soerjomataram, 2018).

In a single-exposure setting, the PAR is a function of the relative risk (RR) and the prevalence of the exposure (Levin, 1952). In the presence of risk factors for the disease under study whose distribution is not affected by the interventions, the effect of the interventions can be evaluated using the partial PAR (pPAR) (Spiegelman and others, 2007). The pPAR is also called the adjusted attributable risk (Bruzzi and others, 1985; Benichou, 2001).

In the research motivating this article, cancer epidemiologists were interested in estimating the proportion of colorectal cancer (CRC) cases among men in the Health Professionals Follow-up Study (HPFS) that are attributable to a number of modifiable exposures, and thus might be preventable (Platz and others, 2000). The HPFS began in 1986 when 51 529 male health professionals were enrolled by responding to mailed questionnaires (Rimm and others, 1991). Every 2 years since the start of the study, these participants filled in questionnaires inquiring about topics such as dietary intake and health status. The accuracy of the responses in the food frequency questionnaires was assessed by validation with dietary records in a sub-sample of 127 study participants (Rimm and others, 1992).
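As a concrete illustration of the single-exposure relationship noted above, the following minimal Python sketch evaluates the formula usually credited to Levin (1952), PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the exposure prevalence. The function name levin_par and the numerical inputs are assumptions chosen for illustration; they are not HPFS estimates and this is not the estimation method developed in this article.

```python
def levin_par(prevalence: float, relative_risk: float) -> float:
    """Single-exposure PAR from exposure prevalence and relative risk.

    Uses PAR = p(RR - 1) / (1 + p(RR - 1)), the form usually credited to
    Levin (1952). All inputs below are hypothetical, not HPFS estimates.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    # Excess risk contributed by the exposed fraction of the population.
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 30% of the population exposed, RR of 2.
example_par = levin_par(0.30, 2.0)
print(f"PAR = {example_par:.3f}")
```

With an RR of 1 or a prevalence of 0 the function returns 0, matching the intuition that no cases are attributable to an exposure that is harmless or absent.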
By comparing the dietary records of the validation study participants to their responses in the food frequency questionnaire, we saw that red meat intake and alcohol intake were measured with moderate to substantial levels of misclassification. Most notably, the specificity for high red meat intake was 0.29, and the sensitivity for high alcohol intake was 0.78, leading to a large number of individuals being falsely classified into the high red meat category and/or falsely classified into the low alcohol category. Decisions about which risk factors health promotion programs should target for reduction could be misguided if the pPAR estimates, which quantify the extent to which disease can be prevented by reducing each individual factor, are biased. This article provides a methodology to correct for this bias.

In epidemiologic studies, when categorical variables are misclassified, bias will arise in estimates of the exposure prevalences and the RRs. This, in turn, affects the validity of the PAR and pPAR estimates, which are functions of the exposure prevalences and the RR estimates and will be biased if either of these is estimated with bias. Both the impact of misclassification on estimates of exposure prevalences and its impact on exposure-disease associations have been studied (Goldberg, 1975; Copeland and others, 1977; Hsieh and Walter, 1988). The effect of non-differential exposure misclassification on the PAR estimates in the single-exposure setting has also been studied previously. Misclassification is said to be non-differential when it is independent of disease status, that is, when exposure sensitivity and specificity are the same for both the disease cases and the non-cases (Johnson and others, 2014).
One article (Hsieh and Walter, 1988) showed that when there is imperfect sensitivity of a single binary exposure, both the disease-exposure odds ratio (OR) and the PAR will be underestimated. This article also showed that when there is perfect sensitivity and imperfect specificity, the OR is again underestimated but the PAR is unbiased. On the other hand, when misclassification is differential, the bias in the OR can be in either direction (Copeland and others, 1977). Misclassification is said to be differential when exposure sensitivity and specificity differ between the disease cases and the non-cases (Johnson and others, 2014) and can arise through the dichotomization of a continuous exposure which is subject to non-differential measurement error (Dalen and others, 2009).

Other studies have also examined the effect of outcome misclassification on the PAR (Hsieh, 1991; Vogel and others, 2005), and one article has examined the effect of exposure misclassification on the estimation of the pPAR in the two-exposure setting (Wong and others, 2018). In this latter study, it was shown that in the presence of non-differential exposure misclassification, the bias in the pPAR can be in either direction, unlike the bias in the single-exposure PAR which can only be toward the null. In addition, these authors found that the magnitude of the bias is most dependent on the sensitivity of the exposure being eliminated. These findings further motivate the need for developing tools that can help researchers estimate unbiased pPARs and confidence intervals (CIs) in the presence of misclassification.
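The single-exposure results cited above for non-differential misclassification can be checked numerically. The sketch below is an illustration only and is not the correction method of this article: it assumes hypothetical values for the true exposure prevalence, the baseline risk, and the RR, applies a given sensitivity and specificity to the binary exposure, and compares the true PAR with the PAR that would be computed from the misclassified exposure. It relies on the identity PAR = 1 - P(D | unexposed) / P(D), which is algebraically equivalent to the prevalence-and-RR form. All function names and inputs are assumptions.

```python
def par_from_risks(p_exposed, risk_exposed, risk_unexposed):
    """PAR = 1 - P(D | unexposed) / P(D) for a single binary exposure."""
    overall_risk = p_exposed * risk_exposed + (1.0 - p_exposed) * risk_unexposed
    return 1.0 - risk_unexposed / overall_risk


def observed_par(p_exposed, risk_unexposed, rr, sensitivity, specificity):
    """PAR computed from a non-differentially misclassified exposure.

    Assumes the observed exposed and unexposed groups are both non-empty.
    """
    risk_exposed = rr * risk_unexposed
    p_unexposed = 1.0 - p_exposed

    # Fraction of the population classified as exposed: true positives
    # plus false positives.
    obs_exposed = sensitivity * p_exposed + (1.0 - specificity) * p_unexposed
    obs_unexposed = 1.0 - obs_exposed

    # Joint probability of disease and each observed exposure category.
    cases_obs_exposed = (sensitivity * p_exposed * risk_exposed
                         + (1.0 - specificity) * p_unexposed * risk_unexposed)
    cases_obs_unexposed = ((1.0 - sensitivity) * p_exposed * risk_exposed
                           + specificity * p_unexposed * risk_unexposed)

    return par_from_risks(obs_exposed,
                          cases_obs_exposed / obs_exposed,
                          cases_obs_unexposed / obs_unexposed)


# Hypothetical scenario: 30% truly exposed, baseline risk 1%, RR of 2.
P, R0, RR = 0.30, 0.01, 2.0
true_par = par_from_risks(P, RR * R0, R0)
par_low_sens = observed_par(P, R0, RR, sensitivity=0.8, specificity=1.0)
par_low_spec = observed_par(P, R0, RR, sensitivity=1.0, specificity=0.8)

print(f"true PAR:                      {true_par:.4f}")
print(f"imperfect sensitivity (0.8):   {par_low_sens:.4f}")
print(f"imperfect specificity (0.8):   {par_low_spec:.4f}")
```

Under these assumed inputs, the imperfect-sensitivity scenario yields a PAR below the true value, while the perfect-sensitivity, imperfect-specificity scenario reproduces the true PAR, because every individual classified as unexposed is then truly unexposed and the risk in the observed unexposed group is unchanged.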
Statistical methods exist for correcting for the misclassification-caused bias in the prevalence estimators as well as association estimators, and there is an especially large literature on the latter (Marshall, 1990; Spiegelman and others, 2000; Yi and others, 2015). However, there are no existing statistical methods for correcting