Air Pollution and COVID-19 Mortality in the United States: Strengths and Limitations of an Ecological Regression Analysis
SCIENCE ADVANCES | RESEARCH ARTICLE

CORONAVIRUS

X. Wu1*, R. C. Nethery1*, M. B. Sabath1, D. Braun1,2, F. Dominici1†

Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

Assessing whether long-term exposure to air pollution increases the severity of COVID-19 health outcomes, including death, is an important public health objective. Limitations in COVID-19 data availability and quality remain obstacles to conducting conclusive studies on this topic. At present, publicly available COVID-19 outcome data for representative populations are available only as area-level counts. Therefore, studies of long-term exposure to air pollution and COVID-19 outcomes using these data must use an ecological regression analysis, which precludes controlling for individual-level COVID-19 risk factors. We describe these challenges in the context of one of the first preliminary investigations of this question in the United States, where we found that higher historical PM2.5 exposures are positively associated with higher county-level COVID-19 mortality rates after accounting for many area-level confounders. Motivated by this study, we lay the groundwork for future research on this important topic, describe the challenges, and outline promising directions and opportunities.
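To make concrete why area-level data limit what can be said about individuals, the following toy sketch regresses area means on area means and compares the result with the slopes inside each area. Every number here is invented for illustration and is unrelated to the study's data. The helper `ols_slope` and the three hypothetical areas are assumptions of this sketch, and it uses a plain least-squares slope rather than the negative binomial mixed model used in the study.

```python
# Toy illustration only: all values are made up, not taken from the study.

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Three hypothetical areas. Across areas, a higher mean exposure goes with
# a higher mean outcome; within each area, the individual-level relation
# is deliberately constructed to run the other way.
deviations = (-1.0, 0.0, 1.0)
areas = []
for area_mean in (5.0, 10.0, 15.0):
    exposures = [area_mean + d for d in deviations]
    outcomes = [2.0 * area_mean - 1.0 * d for d in deviations]
    areas.append((exposures, outcomes))

# Ecological (area-level) regression: one data point per area.
area_exposure = [sum(xs) / len(xs) for xs, _ in areas]
area_outcome = [sum(ys) / len(ys) for _, ys in areas]
ecological_slope = ols_slope(area_exposure, area_outcome)

# Individual-level regressions, one per area.
within_area_slopes = [ols_slope(xs, ys) for xs, ys in areas]

print("area-level slope:", ecological_slope)
print("within-area slopes:", within_area_slopes)
```

With these invented numbers the area-level slope is 2.0 while every within-area slope is -1.0, so an analyst who saw only the area means could not recover the individual-level relation. Real data need not behave this way; the sketch only shows that the two levels of association can differ.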
1Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. 2Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
*These authors contributed equally to this work.
†Corresponding author. Email: [email protected]
Wu et al., Sci. Adv. 2020; 6 : eabd4049 4 November 2020

INTRODUCTION

The suddenness and global scope of the coronavirus disease 2019 (COVID-19) pandemic have raised urgent questions that require coordinated investigation to slow the disease’s devastation. A critically important public health objective is to identify key modifiable environmental factors that may contribute to the severity of health outcomes [e.g., intensive care unit (ICU) hospitalization and death] among individuals with COVID-19. Numerous scientific studies reviewed by the U.S. Environmental Protection Agency (EPA) have linked fine particles (PM2.5; particles with diameter ≤ 2.5 μm) to a variety of adverse health events (1) including death (2). It has been hypothesized that because long-term exposure to PM2.5 adversely affects the respiratory and cardiovascular systems and increases mortality risk (3–5), it may also exacerbate the severity of COVID-19 symptoms and worsen the prognosis of this disease (6).

Epidemiological research estimating the association between long-term exposure to air pollution and COVID-19 hospitalization and death is a rapidly expanding area that is attracting attention around the world. Two studies have been published using data from European countries (7, 8), and many more are available as preprints. However, because of the unprecedented nature of the pandemic, researchers face serious challenges when conducting these studies. One key challenge is that, to our knowledge, individual-level data on COVID-19 health outcomes for large, representative populations are not publicly available or accessible to the scientific community. Therefore, the only way to generate preliminary evidence on the link between PM2.5 and COVID-19 severity and outcomes using these aggregate data is to use an ecological regression analysis. With this study design, publicly available area-level COVID-19 mortality rates are regressed against area-level air pollution concentrations while accounting for area-level potential confounding factors.

Here, we discuss the strengths and limitations of conducting ecological regression analyses of air pollution and COVID-19 health outcomes and describe additional challenges related to evolving data quality, statistical modeling, and control of measured and unmeasured confounding, paving the way for future research on this topic. We discuss these challenges and illustrate them in the context of a specific study, in which we investigated the impact of long-term PM2.5 exposure on COVID-19 mortality rates in 3089 counties in the United States, covering 98% of the population.

Illustration of an ecological regression analysis of historical exposure to PM2.5 and COVID-19 mortality rate

We begin by describing how to conduct an ecological regression analysis in this setting. COVID-19 death counts (a total of 116,747 deaths) were obtained from the Johns Hopkins University Coronavirus Resource Center and were cumulative up to 18 June 2020. We used data from 3089 counties, of which 1244 (40.3%) had reported zero COVID-19 deaths at the time of our analysis. Daily PM2.5 concentrations were estimated across the United States on a 0.01° × 0.01° grid for the period 2000–2016 using well-validated atmospheric chemistry and machine learning models (9). We used zonal statistics to aggregate PM2.5 concentration estimates to the county level and then averaged across the period 2000–2016 to perform health outcome analyses. Figure 1 illustrates the spatial variation in 2000–2016 average (hereafter referred to as “long-term average”) PM2.5 concentrations and COVID-19 mortality rates (per 1 million population) by county.

Fig. 1. National maps of historical PM2.5 concentrations and COVID-19 deaths. Maps show (A) county-level 17-year long-term average of PM2.5 concentrations (2000–2016) in the United States in μg/m³ and (B) county-level number of COVID-19 deaths per 1 million population in the United States up to and including 18 June 2020.

We fit a negative binomial mixed model using COVID-19 mortality rates as the outcome and long-term average PM2.5 as the exposure of interest, adjusting for 20 county-level covariates. We conducted more than 80 sensitivity analyses to assess the robustness of the findings to various modeling assumptions. We found that an increase of 1 μg/m³ in the long-term average PM2.5 is associated with a statistically significant 11% (95% CI, 6 to 17%) increase in the county’s COVID-19 mortality rate (see Table 1); this association continues to be stable as more data accumulate (fig. S3). We also found that population density, days since the first COVID-19 case was reported, median household income, percent of owner-occupied housing, percent of the adult population with less than high school education, age distribution, and percent of Black residents are important predictors of the COVID-19 mortality rate in the model. We found a 49% (95% CI, 38 to 61%) increase in COVID-19 mortality rate associated with a 1-SD (per 14.1%) increase in percent Black residents of the county. Details on the data sources, statistical methods, and analyses are summarized in the Supplementary Materials. All data sources used in the analyses, along with fully reproducible code, are publicly available at https://github.com/wxwx1993/PM_COVID.

Strengths and limitations of an ecological regression analysis

Ecological regression analysis provides a simple and cost-effective approach for studying potential associations between historical exposure to air pollution and increased vulnerability to COVID-19 in large representative populations, as illustrated in our study in the previous section. This approach is regularly applied in many areas of research (10). Using our study as an example, we summarize in Table 2 the strengths, limitations, and opportunities considering (i) study design, (ii) COVID-19 health outcome data, (iii) historical exposure to air pollution, and (iv) measured and unmeasured confounders, with the goal of paving the way for future research.

Among the key limitations, by design, ecological regression analyses are unable to adjust for individual-level risk factors (e.g., age, race, and smoking status); when individual-level data are unavailable, this approach leaves us unable to make conclusions regarding individual-level associations. In the context of COVID-19 health outcomes, this is a severe limitation, as individual-level risk factors are known to affect COVID-19 health outcomes. It is important to note that confusing ecological associations with individual associations constitutes an ecological fallacy. In extreme cases, this fallacy can lead to associations detected in ecological regression that do not exist or are in the opposite direction of true associations at the individual level. However, ecological regression analyses still allow us to make conclusions at the area level, which can be useful for policy-making (11). For the association between COVID-19 health outcomes and PM2.5 exposure, we argue that area-level conclusions are valuable, as they can inform important immediate policy actions that will benefit public health, such as (i) prioritization of precautionary measures [e.g., personal protective equipment (PPE) allocations and hospital beds] to areas with historically higher air pollution and (ii) further strengthening the scientific argument for lowering the U.S. National Ambient Air Quality Standards for PM2.5 and other pollutants. To completely avoid potential ecological bias, a representative sample of individual-level data is necessary. While this may not be feasible in the near future, as some COVID-19 outcome data become available at the individual level, existing approaches that augment county-level data with individual-level data (12) could be used to correct for ecological bias.

Furthermore, air pollution exposure misclassification, due to between-area mobility and within-area variation, is another potential

Table 1. Mortality rate ratios (MRR), 95% confidence intervals (CI), and P values for all variables in the main analysis.