ASSOCIATIONS OF UNCONVENTIONAL NATURAL GAS DEVELOPMENT WITH ASTHMA EXACERBATIONS AND DEPRESSIVE SYMPTOMS IN

by Sara Rasmussen

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland
April 2017

Abstract

Background: Unconventional natural gas development (UNGD) has proceeded rapidly in Pennsylvania, which now accounts for over 25% of the country’s unconventional natural gas production. UNGD has been associated with air quality and community impacts.

Objectives: 1) Evaluate associations of UNGD metrics with asthma exacerbations. 2) Compare different approaches to UNGD activity assessment with one another and in their associations with mild asthma exacerbations. 3) Evaluate associations of UNGD metrics with depression symptoms. 4) Evaluate whether and how other aspects of UNGD (impoundments, compressor engines, flaring events) should be incorporated into UNGD activity metrics.

Methods: The health studies used electronic health record data from the Geisinger Clinic in Pennsylvania. We created UNGD metrics for four phases of well development. We conducted a nested case-control study comparing asthma patients with exacerbations to asthma patients without exacerbations from 2005 to 2012. We then re-evaluated the mild exacerbation associations after replacing our UNGD metrics with those used in prior studies. We evaluated the association of UNGD metrics with depression symptoms ascertained from questionnaire data. We identified UNGD-related impoundments, compressor engines, and flaring events using crowdsourcing, abstraction of paper records, and satellite data, respectively. We then conducted a principal component analysis (PCA) of UNGD metrics created for wells, impoundments, and compressor engines.

Results: From the mid-2000s through 2015, 9,669 wells were drilled in Pennsylvania. We found consistent associations of UNGD metrics with three types of asthma exacerbations. When we compared UNGD metrics created in different studies, the metrics differed in the magnitude of their associations with mild asthma exacerbations, though the highest category of each metric (vs. the lowest) was associated with the outcome. In the depression study, the UNGD metric was associated with depression symptoms. We identified 361 compressor stations, 1,218 impoundments, and 216 locations with flaring events. The PCA showed that a single component captured most of the variation among the metrics and was an approximately equal mix of the metrics for compressors, impoundments, and well development.

Conclusions: UNGD metrics were associated with asthma exacerbations and depression symptoms, and these associations were robust to increasing covariate control and in sensitivity analyses. Determining whether these associations are causal requires further research, including more detailed exposure assessment.

Thesis readers: Brian Schwartz, Karen Bandeen-Roche, Hugh Ellis, and Paul Strickland

Alternates: Jessie Buckley, Mary Fox, and Holly Wilcox


Acknowledgments

I could not have completed this thesis without the support of many people, some of whom I would like to thank here. Brian Schwartz has been a tremendous mentor. He gave me both the freedom to pursue my own ideas and the structure to complete this thesis. I thank him for so generously giving his expertise and time, and for teaching me to be an environmental epidemiologist. I am so grateful for the opportunities he has provided me.

I want to thank Hugh Ellis, Karen Bandeen-Roche, Holly Wilcox, Meredith McCormack, Kirsten Koehler, Paul Strickland, and Anne-Marie Hirsch for their guidance and insight. I also would like to thank Chris Heaney and Ana Navas-Acien for leading journal club, which has been instrumental to my education. Thanks to my fellow Hopkins students for their friendship and support, and thanks to Jon Pollak and Joan Casey for all of their help. Thanks to Cindy Parker for encouraging me to pursue a Ph.D. as a master’s student and for providing mentorship throughout my time at Johns Hopkins. I would like to thank the Johns Hopkins Bloomberg School of Public Health for providing me with a great education. Thanks also to the School of Engineering and the IGERT program, namely Grace Brush and Shahin Zand, for broadening my horizons. This thesis would not have been possible without our collaboration with Geisinger, and I would like to thank Dione Mercer and Joe DeWalle. I would also like to thank the team at SkyTruth for their collaboration.

Thank you to my parents, Joan and Per Rasmussen, for their encouragement from preschool through 22nd grade, and thank you to my family: Lisbeth Rasmussen; Robert Jacobs; and Andrea, Tony, and Kristin Rogers. Thank you to my friends: Nicole Tatz and Marta Schantz, who spent countless hours with me at every coffee shop in DC, and Ruth Mandelbaum, Hillary Smith, and Becca Shareff, for a group text that kept me sane. Finally, thank you to Tim Rogers for his love and support, and for helping me spell all the hard words in this thesis. Thanks for sticking with me through three degrees.


Table of Contents

Abstract
Acknowledgments
Table of Contents
List of tables
List of figures
List of equations
Chapter 1: Introduction
1.0 Rationale
1.1 Unconventional natural gas terminology
1.2 Unconventional natural gas development in the United States
1.3 Development of a shale gas well
1.4 Environmental and social impacts
1.4.1 Soil impacts
1.4.2 Water impacts
1.4.3 Air impacts
1.4.4 Community and social impacts
1.5 UNGD and health studies
1.5.1 Occupational studies of UNGD
1.5.2 Epidemiology studies of UNGD
1.6 The use of electronic health records for epidemiology studies
1.6.1 Overview of the Geisinger Clinic and its EHR
1.6.2 Environmental epidemiology studies using the Geisinger EHR
1.7 Outcomes selected for study in this thesis
1.7.1 Epidemiology of asthma and asthma exacerbations
1.7.1.1 Overview and definition of asthma exacerbations
1.7.1.2 Asthma exacerbations and air pollution
1.7.1.3 Asthma exacerbations and stress
1.7.2 Epidemiology of depression
1.7.2.1 Association of community characteristics and depression symptoms
1.7.2.2 Association of environmental variables and depressive symptoms
1.7.3 Relationship between depression and asthma
1.8 Specific aims
1.9 References
Chapter 2: Detailed Methods
2.0 Chapter overview
2.1 Data sources
2.1.1 Geisinger Clinic
2.1.2 Pennsylvania Department of Environmental Protection
2.1.2.1 Natural gas wells
2.1.2.2 Compressors
2.1.2.3 Municipal water supply
2.1.3 Pennsylvania Department of Conservation and Natural Resources
2.1.4 U.S. Census data
2.1.5 Federal Highway Administration
2.1.6 U.S. Department of Agriculture National Agriculture Imagery Program
2.1.7 Satellite data
2.1.8 Environmental Protection Agency
2.1.8.1 Air quality monitoring network
2.1.8.2 National Emissions Inventory
2.2 Data acquisition
2.2.1 Geisinger EHR data
2.2.2 Crowdsourced data on impoundments and well pads from SkyTruth
2.2.3 Compressor data
2.3 Data processing
2.3.1 Creation of the unconventional natural gas well dataset
2.3.1.1 Well data sources
2.3.1.2 Inclusion criteria
2.3.1.3 Creation of well variables
2.3.1.3.1 Well latitude and longitude
2.3.1.3.2 Spud date
2.3.1.3.3 Total depth
2.3.1.3.4 Stimulation date
2.3.1.3.5 Production start date and production quantities
2.3.1.3.6 Well pad
2.3.2 Creation of the UNGD-related compressor engine dataset
2.3.2.1 Data abstraction
2.3.2.2 Data checking
2.3.2.3 Creation of a Compressor Station Database
2.4 Selection of study population and outcomes
2.4.1 Asthma study
2.4.1.1 Identification of asthma population
2.4.1.2 Identification of asthma exacerbations
2.4.1.3 Identification and matching of control index dates
2.4.2 Depression symptom study
2.4.2.1 Study population
2.4.2.2 Outcome and mediating variables created from the questionnaires
2.4.2.2.1 Fatigue
2.4.2.2.2 Migraine headache
2.4.2.2.3 Depression symptoms
2.4.2.3 Case and control dates for the disordered sleep outcome
2.5 Exposure study
2.5.1 Creation of the regular grid
2.5.2 Estimation of impoundment start and stop dates
2.6 Geocoding of study population
2.7 Creation of study variables
2.7.1 Covariates created from the electronic health record
2.7.1.1 Sex
2.7.1.2 Age
2.7.1.3 Season
2.7.1.4 Race/ethnicity
2.7.1.5 Smoking
2.7.1.6 Family history
2.7.1.7 Medical Assistance
2.7.1.8 Diabetes
2.7.1.9 Overweight/obesity
2.7.1.10 Alcohol use
2.7.1.11 Anti-depressant use
2.7.2 Covariates created using patients’ coordinates
2.7.2.1 Place type
2.7.2.2 Community socioeconomic deprivation
2.7.2.3 Maximum temperature on prior day
2.7.2.4 Distance to nearest major and minor road
2.7.2.5 Distance to hospital
2.7.2.6 Well water supply
2.7.2.7 Greenness
2.7.3 UNGD activity metrics
2.7.3.1 Durations of phases of well development
2.7.3.2 Assignment of unconventional natural gas activity metrics for wells
2.7.3.3 Assignment of unconventional natural gas activity metrics for impoundments and compressors
2.8 References
Chapter 3: Asthma Exacerbations and Unconventional Natural Gas Development in the Marcellus Shale
3.0 Cover page
3.1 Abstract
3.2 Introduction
3.3 Methods
3.3.1 Study population
3.3.2 Outcome Ascertainment
3.3.3 Controls and Matching
3.3.4 Covariates
3.3.5 Well Data
3.3.6 Activity Metric Assignment
3.3.7 Statistical Analysis
3.3.7.1 Model Building
3.3.7.2 Sensitivity Analyses
3.4 Results
3.4.1 Descriptions of Wells and Patients
3.4.2 Associations of UNGD Activity Metrics with Asthma Outcomes
3.4.3 Sensitivity Analyses
3.5 Discussion
3.6 References
Chapter 4: Associations of unconventional natural gas development with disordered sleep and depression symptoms in Pennsylvania
4.0 Cover Page
4.1 Abstract
4.2 Introduction
4.3 Methods
4.3.1 Survey design and study population
4.3.2 Outcome ascertainment
4.3.2.1 Depression symptoms
4.3.2.2 Disordered sleep diagnoses
4.3.3 Potential mediating variables: migraine and fatigue symptoms
4.3.4 Well data and activity metric assignment
4.3.5 Covariates
4.3.6 Statistical analysis
4.4 Results
4.4.1 Description of study population
4.4.2 Associations of UNGD with depression symptoms
4.4.3 Associations of UNGD with disordered sleep
4.5 Discussion
4.6 Conclusions
4.7 References
Chapter 5: Exposure assessment using secondary data sources in unconventional natural gas development and health studies
5.0 Cover page
5.1 Abstract
5.2 Introduction
5.3 Methods
5.3.1 UNGD-related compressor engines, impoundments, and flaring events in Pennsylvania
5.3.2 Incorporate impoundments and compressor engines into exposure assessment
5.3.3 Comparison of GIS-based metrics and their associations with mild asthma exacerbations
5.4 Results
5.4.1 UNGD-related compressor engines, impoundments, and flaring events in Pennsylvania
5.4.2 PCA applied to wells, compressor stations, and impoundments
5.4.3 Comparison of GIS-based UNGD metrics
5.5 Discussion
5.6 References
Chapter 6: Miscellaneous results
6.1 Additional Results for Chapter 3
6.1.1 Associations of covariates with event status
6.1.1.1 Race/ethnicity
6.1.1.2 Family history
6.1.1.3 Smoking status
6.1.1.4 Season
6.1.1.5 Type 2 diabetes
6.1.1.6 Community socioeconomic deprivation
6.1.1.7 Maximum temperature on prior day
6.1.1.8 Distance to nearest roadway
6.1.2 Additional sensitivity analyses
6.1.2.1 Stimulation extrapolation methods
6.1.2.2 Control Encounter Dates
6.1.2.3 Inverse distance squared vs. cubed metric
6.1.2.4 Distance to hospital
6.2 Additional Results for Chapter 4
6.2.1 Associations of covariates with event status
6.2.1.1 Race/ethnicity
6.2.1.2 Sex
6.2.1.3 Age
6.2.1.4 Smoking status
6.2.1.5 Alcohol status
6.2.1.6 Medical assistance
6.2.1.7 Body mass index
6.2.1.8 Community socioeconomic deprivation
6.2.1.9 Well water
6.3 Comparing asthma patients identified in the electronic health record and by self-report
6.3.1 Methods
6.3.2 Results
6.3.3 Discussion
6.4 Unconventional natural gas development and asthma symptom study
6.4.1 Survey data
6.4.2 Asthma symptom outcomes
6.4.3 UNGD metrics
6.4.4 Covariates
6.4.5 Data analysis
6.4.6 Results
6.4.7 Discussion
6.5 Greenness and asthma exacerbation study
6.5.1 Study population and covariates
6.5.2 Greenness measurement and exposure assignment
6.5.3 Statistical analysis
6.5.4 Results
6.5.5 Discussion
6.6 Greenness and depression symptoms study
6.6.1 Greenness measure
6.6.2 Depression symptom data
6.6.3 Covariates
6.6.4 Data analysis
6.6.5 Results
6.6.6 Discussion
6.7 References
Chapter 7: Discussion
7.1 Summary of findings
7.2 Health impacts of energy production and use
7.3 Future research directions and policy implications
7.3.1 Research opportunities
7.3.1.1 Replication in other shale basins
7.3.1.2 Improving UNGD exposure assessment in epidemiology studies
7.3.1.3 Reducing potential sources of bias in epidemiology studies of UNGD
7.3.1.4 Employ causal inference methods in studies of UNGD and health
7.3.2 Policy implications of studies on UNGD and health
7.3.2.1 Improve data collection on UNGD
7.3.2.2 Expand air quality monitoring in rural oil and natural gas producing areas
7.3.2.3 Fund research on UNGD and health
7.3.2.4 Incorporate externalities into energy prices
7.3.3 Health implications of our research
7.4 Final Remarks
7.5 References
Appendix
Institutional review board documents
Curriculum Vita – Sara G. Rasmussen


List of tables

Table 1.4. Description of the phases of Marcellus shale well development
Table 1.5.2.1. Published epidemiology studies of unconventional natural gas development by other research groups
Table 1.5.2.2. Published epidemiology studies of unconventional natural gas development by our research group
Table 1.7.1.2. Epidemiology studies of air pollution and asthma
Table 1.7.1.3. Studies of psychosocial stress and asthma exacerbations
Table 1.7.2.2. Epidemiology studies of environmental variables and mental health outcomes
Table 2.3.3.2.1. Median days from spud to stimulation by year and region, based on the 2013 well dataset
Table 2.3.3.2.2. Median days from stimulation to production start by year and region, based on the 2013 well dataset
Table 2.3.3.2.3. Spud date missingness percent (number) by data set iteration
Table 2.3.3.4. Stimulation date missingness percent (number) by data set iteration
Table 2.3.3.6. Percentage (number) of well pads by data source
Table 2.4.2.2.1. Symptoms included in the Patient-Reported Outcomes Measurement Information System fatigue short form 8a
Table 2.4.2.2.2. Symptoms included in the ID Migraine questionnaire
Table 2.4.2.2.3. Symptoms included in the Personal Health Questionnaire Depression Scale (PHQ-8) questionnaire
Table 2.4.2.3. ICD-9 codes used to identify disordered sleep
Table 2.6. Geocoding level for the asthma and depression study populations
Table 2.7.1. Variables created from the electronic health record used in health studies
Table 2.7.1.5.1. Smoking status categories considered as evidence of current smoking
Table 2.7.1.5.2. Procedure codes considered as evidence of smoking
Table 2.7.1.5.3. ICD-9 codes considered as evidence of smoking
Table 2.7.2. Variables created from the electronic health record used in health studies
Table 2.7.2.2. Variables used to create the socioeconomic deprivation index
Table 2.7.3.2. Spearman correlation coefficient of the drilling metric assigned for different durations and lags for 446 randomly chosen asthma hospitalizations
Table 3.4.1. Descriptive statistics of cases and controls by exacerbation type for selected study variables by variable type (constant vs. time-varying)
Table 3.4.2. Associations of unconventional natural gas activity metrics and asthma outcomes
Table 4.3.2.2. ICD-9 codes used to identify disordered sleep
Table 4.3.6. Calculation of sample weights
Table 4.4.1. Descriptive statistics by depression symptoms
Table 4.4.2.1. Association of UNGD and depression symptoms in survey multinomial logistic models
Table 4.4.2.2. Association of UNGD and depression symptoms in survey negative binomial models
Table 4.4.2.3. Association of UNGD (assigned at baseline) and depression symptoms in survey negative binomial models that include migraine or fatigue
Table 4.4.3. Association between UNGD and sleep deprivation in a survey-weighted generalized estimating equations model
Table 5.4.2.1. Results of PCA with Percentage of Variation Explained by Component 1 and Component 1 Loadings
Table 6.1.1.1. Odds ratios from oral corticosteroid (mild exacerbation) models
Table 6.1.1.2. Odds ratios from emergency encounter (moderate exacerbation) models
Table 6.1.2.1. Associations of UNGD stimulation metrics created with extrapolated and sensitivity stimulation dates and asthma hospitalizations
Table 6.1.2.2. Associations of UNGD spud metrics assigned on random encounter dates vs. random dates and asthma hospitalizations
Table 6.1.2.3. Associations of spud activity metrics assigned using distance squared vs. distance cubed
Table 6.1.2.4.1. Median distance to closer Geisinger Hospital by event and event status, km
Table 6.1.2.4.2. Associations of the UNGD spud metric and hospitalization outcome without and with distance to hospital in the model
Table 6.2.1.1. Exponentiated coefficients from the truncated-weighted negative binomial model
Table 6.2.1.2. Odds ratios from the truncated-weighted multinomial logistic model
Table 6.2.1.3. Exponentiated coefficients from the fully-weighted negative binomial model
Table 6.2.1.4. Odds ratios from the fully-weighted multinomial logistic model
Table 6.2.1.5. Exponentiated coefficients from the unweighted negative binomial model
Table 6.2.1.6. Odds ratios from the unweighted multinomial logistic model
Table 6.3. Studies comparing self-reported asthma and asthma in the medical record
Table 6.3.2.1. Classification by the electronic health record asthma algorithm by self-reported asthma
Table 6.3.2.2. Characteristics of patients with and without EHR and self-reported asthma
Table 6.4.5. Calculation of survey weights at baseline and follow-up (cells are counts unless otherwise specified)
Table 6.4.6.1. Asthma symptoms and missingness at baseline and follow-up
Table 6.4.6.2. Association of UNGD metrics with number of asthma symptoms at baseline among patients with and without asthma from adjusted multinomial models
Table 6.4.6.3. Association of UNGD metrics with number of asthma symptoms at follow-up among patients with and without asthma from adjusted multinomial logistic models
Table 6.4.6.4. Association of UNGD metrics with number of asthma symptoms (multinomial) at baseline among patients with and without asthma from adjusted logistic models
Table 6.5.1. Studies on greenness and asthma prevalence
Table 6.5.2. Studies on greenness and asthma exacerbations
Table 6.5.4. Association between UNGD, NDVI, and asthma exacerbations
Table 6.6.5.1. Association of peak normalized difference vegetation index with depression symptoms among study participants in boroughs in survey negative binomial regressions
Table 6.6.5.2. Association of peak normalized difference vegetation index with depression symptoms among study participants in townships in survey negative binomial regressions


List of figures

Figure 1.1.1. Formations containing unconventional natural gas
Figure 1.1.2. Shale gas plays in the United States
Figure 1.2. U.S. natural gas production, 2000-2015. Data from the U.S. Energy Information Administration
Figure 1.4. Impacts of Unconventional Natural Gas Development
Figure 1.4.3.1. The spatial extent of the Haynesville Shale in Texas and Louisiana
Figure 1.4.3.2. Modeled ozone impacts from Haynesville shale development compared to baseline
Figure 1.7. Conceptual diagram of outcomes considered in this thesis
Figure 2.1.2.1. The well record form
Figure 2.3.3.2.1. Counties considered eastern and northern for the purposes of well variable imputation and extrapolation
Figure 2.7.2.3. Weather stations reporting daily maximum temperature between 2005 and 2012 in New York, New Jersey, and Pennsylvania
Figure 2.7.2.4. Locations of major and minor roads in New York and Pennsylvania
Figure 2.7.2.6. Public water supply areas in Pennsylvania
Figure 2.7.3.1. Timeline of well development with estimated durations of each phase
Figure 3.3.2. Flow diagram for identification of new asthma oral corticosteroid (OCS) medication orders
Figure 3.4.1.1. Number of developed pads (blue), and spudded (red), stimulated (green), and producing wells (yellow), 2005-12
Figure 3.4.1.2. The location of spudded wells as of December 2012 and residential location of Geisinger asthma patients
Figure 3.4.1.3. Locations of cases and controls by quartile of spud activity metric
Figure 3.4.3. Counties Associated with Asthma Hospitalization Case Status
Figure 4.2. Relationships among UNGD and moderating, mediating, and outcome variables
Figure 4.3.2.2. Flow diagram for identification of disordered sleep diagnoses
Figure 5.3.2. Location of UNG-related impoundments, compressor engines, and UNG wells
Figure 5.4.1. Total number of drilled unconventional natural gas wells and operating unconventional natural gas related impoundments and compressor engines in Pennsylvania by year
Figure 6.1.2.1. Locations of Wells with Extrapolated Stimulation Dates
Figure 6.4.1. Chronic Rhinosinusitis Integrative Studies Program survey design
Figure 6.5.1. Directed acyclic graph of UNGD, NDVI, and asthma exacerbations
Figure 6.6.4.1. Peak normalized difference vegetation index in 2014 by place type among study participants
Figure 6.6.4.2. Lowess smoother and scatter plot of population density (per km2) and peak normalized difference vegetation index in 2014 among study participants in townships
Figure 6.6.5. Association of peak normalized difference vegetation index with depression symptoms among study participants in townships in adjusted survey negative binomial regressions


List of equations

Equation 2.7.1.9. BMI formula for adults
Equation 2.7.3.1. Activity metrics for unconventional natural gas wells
Equation 3.3.6.1. Pad preparation and spud metric
Equation 3.3.6.2. Stimulation activity metric
Equation 3.3.6.3. Production activity metric
Equation 3.3.7. Statistical Model
Equation 4.3.4. Activity metric
Equation 5.3.2. Inverse distance squared (IDS) metric
Equation 5.3.3. Inverse distance metric based on the drilling phase (IDD)


Chapter 1: Introduction

1.0 Rationale

Over the last decade, unconventional natural gas development (UNGD) has rapidly become a major energy source in the United States. Although UNGD is a major industrial undertaking with community and environmental impacts, research on its health effects is limited. To help fill gaps in the understanding of the health impacts of UNGD, we completed four primary tasks: 1) evaluation of associations of UNGD metrics with asthma exacerbations; 2) evaluation of associations of UNGD metrics with depression symptoms; 3) comparison of different approaches to UNGD activity assessment with one another and in their associations with mild asthma exacerbations; and 4) evaluation of whether and how other exposure-relevant aspects of UNGD, such as impoundments, compressor engines, and flaring events, should be incorporated into UNGD activity metrics.

1.1 Unconventional natural gas terminology

Unconventional natural gas refers to the resource (e.g., the natural gas in shale geologic layers), not to the drilling technologies used to extract it. Unconventional natural gas resources include natural gas from shale (natural gas trapped in shale rock formations), tight sand (natural gas trapped in sandstone formations), and coalbed methane (methane stored within coal) (Figure 1.1.1).1 In the United States, most UNGD extracts natural gas from shale. There are many shale gas plays in the United States (Figure 1.1.2).


Figure 1.1.1. Formations containing unconventional natural gas.1

Figure 1.1.2. Shale gas plays in the United States.2


1.2 Unconventional natural gas development in the United States

The United States is the first country to produce shale gas on an industrial scale.2,3 Advances in drilling technologies (e.g., horizontal drilling) and the use of hydraulic fracturing (“fracking”) have enabled rapid growth in shale gas production, from 4% of total United States natural gas production in 2005 to 46% in 2015.2 Since 2008, conventional natural gas production has declined, although the country’s total natural gas production has increased as shale gas production has grown (Figure 1.2). The Barnett shale in Texas drove the growth in United States shale gas production in the early 2000s, but since 2012 the Marcellus shale in Pennsylvania has been the most productive shale gas play in the country.2

Figure 1.2. U.S. natural gas production, 2000-2015. Data from the U.S. Energy Information Administration.2

1.3 Development of a shale gas well

The first step of UNGD is well pad preparation, during which the land (about seven acres) is cleared and materials are brought to the site.4 This requires 1,420-1,975 diesel truck trips per well.5 The well is then drilled vertically and horizontally; the day drilling begins is known as the spud date. After drilling is completed, the horizontal portion of the well is perforated. Hydraulic fracturing (“fracking”), also called stimulation, is the next step. Fracking requires three to seven million gallons of fluid, which is 90-95% water, 5-10% sand proppant, and 0.1-1% chemical additives (including friction reducers, biocides, acids, and gelling agents). The 10-20% of the injected fracking fluid that returns to the surface in the first 20 to 30 days is called flowback water, and it is chemically similar to fracking fluids.6 During this time, the gases produced by the well are either flared off, releasing pollutants such as carbon monoxide, nitrogen oxides (NOx), and PM2.5 (particulate matter less than or equal to 2.5 micrometers in aerodynamic diameter), or captured and separated into their components by equipment known as green completions, so that they can be sold.7
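The fluid figures above can be turned into approximate per-well volumes of each component. The Python sketch below does this. It assumes the percentages are shares of total fluid volume, which the text does not specify. To bracket the extremes, it pairs the lowest total volume with the lowest share and the highest with the highest. The names `FLUID_GALLONS`, `COMPOSITION`, and `component_volume_range` are illustrative only.

```python
# Illustrative arithmetic: bracket the volume of each fracking-fluid component
# per well from the ranges quoted in the text. Percentages are treated as
# shares of total fluid volume, which is an assumption made for this sketch.
FLUID_GALLONS = (3_000_000, 7_000_000)  # total fluid per well: (low, high)

COMPOSITION = {  # component -> (low, high) share of the fluid
    "water": (0.90, 0.95),
    "sand proppant": (0.05, 0.10),
    "chemical additives": (0.001, 0.01),
}


def component_volume_range(share, fluid=FLUID_GALLONS):
    """Return (min, max) gallons: lowest volume x lowest share, highest x highest."""
    low_share, high_share = share
    return fluid[0] * low_share, fluid[1] * high_share


for name, share in COMPOSITION.items():
    low, high = component_volume_range(share)
    print(f"{name}: {low:,.0f} to {high:,.0f} gallons")
```

Under these assumptions, the chemical additives alone come to roughly 3,000 to 70,000 gallons per well.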

Gas production then begins: gas flows through the enhanced fracture network to the well, and at the surface it is separated from other organics and water.6 Next, the gas is compressed using diesel- or natural gas-powered compressor engines, and it is then distributed or stored.6 During gas production, water continues to flow up the well; this is called produced water.8 Data to date suggest that 60% of an average well’s lifetime natural gas production occurs in the first year, and 88% by the end of the second year.9
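The two cumulative shares above also show how steeply production falls off. The Python sketch below uses only those two figures (60% by the end of year 1, 88% by the end of year 2). It derives the share produced during the second year alone and the share left for all later years. This is plain arithmetic on the cited numbers, not a decline-curve model, and `share_in_year` is a hypothetical helper name.

```python
# Cumulative share of an average well's lifetime gas production, as quoted in the text.
CUMULATIVE_SHARE = {1: 0.60, 2: 0.88}  # end of year -> fraction of lifetime total


def share_in_year(year, cumulative=CUMULATIVE_SHARE):
    """Fraction of lifetime production that falls within a single year."""
    previous = cumulative.get(year - 1, 0.0)  # nothing produced before year 1
    return cumulative[year] - previous


second_year = share_in_year(2)               # production during year 2 only
after_year_two = 1.0 - CUMULATIVE_SHARE[2]   # production in all later years combined

print(f"Year 1: {share_in_year(1):.0%}")
print(f"Year 2: {second_year:.0%}")
print(f"After year 2: {after_year_two:.0%}")
```

By this arithmetic, the second year accounts for about 28% of lifetime production and all later years together for about 12%.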

1.4 Environmental and social impacts

UNGD is rapidly transforming rural communities into industrial areas. Studies have documented air, water, soil, and community impacts of UNGD, and these impacts range in scale from local to global (Figure 1.4).10 This literature on environmental and community impacts is summarized in the sections below.

The period from the start of pad preparation to the beginning of well production is about 3 months.11 Under Pennsylvania Code § 78.122, the Pennsylvania Department of Environmental Protection requires companies to submit forms and permits at most stages of well development.12 The required forms and permits, and the information collected in them, vary across the stages of well development (Table 1.4).

Figure 1.4. Impacts of Unconventional Natural Gas Development.10


Table 1.4. Description of the phases of Marcellus shale well development. Each phase is listed with its approximate duration and description, followed by the reports filed for it and the data available from each report. Abbreviation: Mcf, thousand cubic feet of natural gas.

Well permitting (approximate duration: --)
  Permits Issued Detail Report: location, operator, permit issue date, unconventional (yes/no)
  Well Location Plat: location of the proposed well
Pad preparation (approximate duration: 1 month). Requires tree clearing and building a foundation for the pad.
  Report: --
Well drilling (approximate duration: 1 month). The drilling commence date is also known as the spud date. Rigs run non-stop during drilling.
  Directional Survey: local coordinates that describe the path of the well bore in terms of the total measured depth and the north/south and east/west offsets from the surface hole location
  Spud Data Report: date, location, operator, well status
  Well Record: the drilling process (drill method, drilling started, drilling complete, completion date) and the physical characteristics of the well (type, depth, cement, casing, and tubing)
Well completion: perforation (approximate duration: 3 months)
  Perforation Record (part of Completion Report): when and where the well was perforated
  Stimulation Information (part of Completion Report): date, average pump rate, average treatment pressure, maximum breakdown pressure, instantaneous shut-in pressure, proppant type, and proppant mesh size
Well completion: stimulation (approximate duration: 1 week)
  Stimulation Fluid Additives: a list of all the additives in the stimulation fluid, including the percent mass of each chemical component used in the total base fluid
Flowback (approximate duration: 1 month). 10-20% of the injected hydraulic fracturing volume.
  Statewide Data Download of Waste Data: waste type, waste quantity, disposal method, and name and location of the disposal site
Well production (approximate duration: --). Gas is treated (separated from other organics and water), compressed, and sent to be stored or distributed.
  Statewide Data Download of Production Data: gas quantity (Mcf) and gas production days by 6-month window (January-June or July-December)


1.4.1 Soil impacts

Soil impacts occur primarily during the drilling phase, which produces cuttings: the debris from the geologic layers that are drilled through. In 2011, drilling of Marcellus wells in Pennsylvania produced 789,632 tons of cuttings, which contain low levels of naturally occurring radioactive materials. The cuttings were primarily disposed of in landfills.6

1.4.2 Water impacts

Much of the opposition to UNGD has cited its potential impacts on ground and surface water.13 Unconventional natural gas wells are drilled as deep as 10,000 feet, passing through drinking water aquifers, which lie less than 300 feet below the surface; this creates a concern for water contamination.14 Additionally, UNGD can stress both the availability of water and the capacity to treat wastewater. Each well requires 3-7 million gallons of fluid for hydraulic fracturing. In 2011, Marcellus wells in Pennsylvania generated 7,878,587 barrels (330 million gallons) of flowback and 9,065,470 barrels (380 million gallons) of produced water. Most of the flowback water (90%) was reused within Pennsylvania for purposes other than road spreading. Over half (56%) of the produced water was likewise reused within Pennsylvania for purposes other than road spreading, a quarter (26%) was disposed of in injection disposal wells in Ohio, and 12% was treated in industrial wastewater treatment facilities in Pennsylvania.6
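The barrel and gallon figures above are linked by the standard conversion of 42 U.S. gallons per barrel. The Python sketch below, offered only as an illustration, reproduces that conversion (about 331 and 381 million gallons, which the text reports as approximately 330 and 380 million) and splits the 2011 produced-water total across the reported management routes. The helper names and route labels are hypothetical, and the 6% "unallocated" remainder is simply what the three reported shares leave over; the text does not say how that portion was managed.

```python
GALLONS_PER_BARREL = 42  # U.S. oil barrel


def barrels_to_million_gallons(barrels: float) -> float:
    """Convert a volume in barrels to millions of U.S. gallons."""
    return barrels * GALLONS_PER_BARREL / 1_000_000


# 2011 Marcellus volumes in Pennsylvania, as reported in the text
FLOWBACK_BBL = 7_878_587
PRODUCED_BBL = 9_065_470

# Reported shares of produced water by management route (labels are illustrative)
PRODUCED_SHARES = {
    "reused (other than road spreading), PA": 0.56,
    "injection disposal wells, OH": 0.26,
    "industrial wastewater treatment, PA": 0.12,
}


def allocate(total_bbl: float, shares: dict) -> dict:
    """Split a total volume across routes; any share not reported is labeled 'unallocated'."""
    volumes = {route: total_bbl * share for route, share in shares.items()}
    volumes["unallocated"] = total_bbl * (1 - sum(shares.values()))
    return volumes


if __name__ == "__main__":
    print(f"Flowback: {barrels_to_million_gallons(FLOWBACK_BBL):.1f} million gallons")
    print(f"Produced: {barrels_to_million_gallons(PRODUCED_BBL):.1f} million gallons")
    for route, bbl in allocate(PRODUCED_BBL, PRODUCED_SHARES).items():
        print(f"  {route}: {bbl:,.0f} barrels")
```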

Following reports of water contamination in 2008 in Pavillion, Wyoming, and in 2009 in Dimock, Pennsylvania,8 studies were conducted that documented impacts of UNGD on ground and surface water quality. Three studies of private water wells in northeastern Pennsylvania and upstate New York found no evidence that Marcellus shale formation water or saline fracturing fluids were contaminating drinking water aquifers, but did find evidence that some water wells within 1 kilometer of a natural gas well were contaminated with stray methane, ethane, and propane.14-16 The methane carbon isotope signatures of water wells closer to active drilling were consistent with deeper thermogenic methane sources (such as the Marcellus and Utica shales), whereas those of wells farther away were more consistent with biogenic or mixed biogenic/thermogenic sources.15 Unlike methane, ethane and propane do not have biogenic sources and thus must have arisen from the shale layer. Ethane concentrations were 23 times higher in water wells within 1 kilometer of a natural gas well, and all of the water wells with detectable propane concentrations were within 1 kilometer of a natural gas well.14 The authors hypothesized that leaky gas well casings, inadequate cement seals between the casing and rock, or enhanced connectivity between the shale layer and shallower layers due to hydraulic fracturing may have been responsible for the water contamination.14,15 The only conclusive evidence of groundwater impacts from UNGD comes from Pavillion, Wyoming, where these same authors demonstrated impacts to groundwater drinking water sources by comparing concentrations of major ions in groundwater and stimulation fluids, and concluded that the impacts to groundwater in Pavillion were due to hydraulic fracturing.17

Surface water may also be affected by UNGD, but through a different pathway than groundwater: surface water impacts from UNGD may result from waste management and well pad preparation. Wastewater from UNGD is high in chloride (Cl-), which wastewater treatment plants may not remove effectively. A study from Pennsylvania found that the release of treated flowback and produced water from upstream treatment facilities was associated with increased Cl- concentrations in downstream watersheds. The study also suggested that clearing land for upstream well pad development may have contributed to runoff and increased the concentration of total suspended solids downstream.4

1.4.3 Air impacts

Each stage of UNGD has air impacts.10 Both direct emissions, such as volatile organic compounds (VOCs), NOx, and PM, and ozone, a secondary pollutant formed from VOCs and NOx in the presence of sunlight, are of concern. During the pad preparation and drilling phases, the trucks that bring materials to the site and the heavy machinery used at the site emit diesel exhaust, which includes PM, NOx, and SOx. During stimulation, diesel trucks bring water to the site, and the fluid injected into the well returns to the surface carrying VOCs. Fugitive emissions of VOCs, including benzene, toluene, ethylbenzene, and xylenes (BTEX), occur during production. UNGD infrastructure other than the wells may contribute more to emissions than the wells themselves: two studies estimated that compressor engines were the largest UNGD-related source of emissions (including VOCs, PM, and NOx) in Pennsylvania.5,18 However, neither study incorporated impoundments, because reliable emission factors for them were not available, so impoundments remain an uncharacterized source of emissions.

The spatial scale of the pollutants described above varies tremendously, from hundreds of meters to hundreds of kilometers. Concentrations of hazardous air pollutants, such as BTEX, drop off quickly with distance from the well site, but these pollutants likely represent an important exposure for people living very close to a well site.19 Other pollutants have a regional scale. Vinciguerra 2015 attributed to UNGD an increase in ethane in the Baltimore/Washington area, hundreds of kilometers away.20 Studies of UNGD’s impact on ozone, PM2.5, NOx, SO2, and/or VOCs in the Marcellus shale,18 and in the Eagle Ford,21 Barnett,22 and Haynesville23 shale plays in Texas and Louisiana, have also demonstrated air quality impacts several counties away from where the natural gas was actually produced. An emissions inventory estimated that, in 2020, Marcellus-related development would contribute 12% of regional VOC emissions and 12% of regional NOx emissions in an area covering much of and Pennsylvania, and the southern counties in New York.18 In a modeling study of the Haynesville shale in Texas and Louisiana, the authors note that ozone impacts of UNGD in the Haynesville shale “may extend well outside the immediate vicinity of the Haynesville shale into other regions of Texas and Louisiana.” Comparing the extent of the Haynesville shale (Figure 1.4.3.1, from Kemball-Cook 201023) with the extent of ozone impacts (Figure 1.4.3.2, from Kemball-Cook 201023) shows that the episode maximum differences in daily maximum 8-hour ozone for the low and high development scenarios (panels C and D of Figure 1.4.3.2, originally figure 4 of that study) reach up to 6 ppb in areas several counties away from shale development.23

Figure 1.4.3.1. The spatial extent of the Haynesville Shale in Texas and Louisiana.23


Figure 1.4.3.2. Modeled ozone impacts from Haynesville shale development compared to baseline. Panels A and B show average daily differences and panels C and D show maximum daily differences; panels A and C are based on a low development scenario and panels B and D are based on a high development scenario.23

Emissions from UNGD also have climate impacts, which, although not the focus of this research, have implications for public health.10 UNGD produces fugitive methane emissions, and methane is a powerful greenhouse gas, with a global warming potential over 100 years more than 20 times that of carbon dioxide. Studies have reached conflicting conclusions about the magnitude of fugitive methane emissions; some suggest that, over time scales ranging from 25 to 100 years, natural gas produced by UNGD is worse for the climate than coal because of these fugitive emissions.24-27
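Because methane’s warming effect relative to carbon dioxide depends on the time horizon, the choice of horizon can change comparisons like the one above. A minimal Python sketch, assuming the IPCC Fifth Assessment Report global warming potentials for methane without climate-carbon feedbacks (84 over 20 years and 28 over 100 years; these values are an assumption, not figures from the studies cited here), converts a hypothetical quantity of fugitive methane into carbon dioxide equivalents:

```python
# Assumed methane GWPs (IPCC AR5, no climate-carbon feedback); not taken from the cited studies
GWP_CH4 = {20: 84, 100: 28}


def co2_equivalent(methane_tonnes: float, horizon_years: int) -> float:
    """Return tonnes of CO2-equivalent for a methane mass over the given GWP horizon."""
    return methane_tonnes * GWP_CH4[horizon_years]


if __name__ == "__main__":
    leak_t = 1_000.0  # hypothetical fugitive methane emissions, tonnes
    for horizon in (20, 100):
        print(f"GWP{horizon}: {co2_equivalent(leak_t, horizon):,.0f} t CO2e")
```

With these assumed values, the same leak counts three times as heavily over a 20-year horizon as over a 100-year horizon, which is one reason conclusions about natural gas and coal can differ with the time scale chosen.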

1.4.4 Community and social impacts


UNGD has many potential impacts on communities, including changes to employment opportunities, community composition, and neighborhood aesthetics.10,28 An increase in heavy truck traffic is one of the most noticeable changes observed by residents of communities undergoing UNGD.13 This increased traffic can lead to more motor vehicle crashes: counties with high levels of drilling in northeastern Pennsylvania had 23% higher crash rates than counties with no drilling.29 Two non-peer-reviewed studies with ecologic designs found increased rates of calls to police,30 arrests for driving under the influence,30 traffic violations,30 violent crimes,31 cases of sexually transmitted diseases,31 and traffic fatalities31 in counties with UNGD compared with those without.
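Findings such as the 23% higher crash rate above are usually expressed as a rate ratio (RR), here 1.23. The Python sketch below computes an RR from event counts and exposure denominators, together with an approximate 95% confidence interval on the log scale (a common Wald-type interval for Poisson counts). The counts and vehicle-mile denominators are hypothetical and are not taken from the cited study.

```python
import math


def rate_ratio(events_exp: int, time_exp: float, events_unexp: int, time_unexp: float, z: float = 1.96):
    """Rate ratio (exposed vs. unexposed) with a Wald confidence interval on the log scale."""
    rr = (events_exp / time_exp) / (events_unexp / time_unexp)
    se_log_rr = math.sqrt(1 / events_exp + 1 / events_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical crash counts per 100 million vehicle-miles in drilling vs. non-drilling counties
    rr, lo, hi = rate_ratio(1230, 100.0, 1000, 100.0)
    print(f"RR = {rr:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```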

Two studies have evaluated the impact of UNGD on home prices in Pennsylvania. One found that houses that used groundwater and were within 2 km of a spudded well lost $16,059 in value on average, whereas houses with public water within 2 km of a well gained $5,070 on average.32 The second found a 20% decrease in value for houses with well water within 0.75 miles of a well permitted within the last 6 months (approximately $30,000 for the study’s average home price of $148,401).33
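The approximate dollar figure in the second study follows directly from applying the 20% decrease to the study’s average home price:

```latex
\[ 0.20 \times \$148{,}401 = \$29{,}680.20 \approx \$30{,}000 \]
```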

1.5 UNGD and health studies

As discussed above, the construction and operation of shale gas wells present risks to the environment and the community, and these risks can have occupational and environmental health impacts. UNGD may be an example of chronic environmental contamination: a long-lasting environmental exposure that has contextual effects on health through psychosocial stress pathways or by influencing health-related behaviors.34 Psychosocial stress, sleep disruption, low socioeconomic status, exposure to truck traffic, and exposure to air pollution are all biologically plausible pathways by which UNGD could affect health. There is a small but growing number of health studies on UNGD.


1.5.1 Occupational studies of UNGD

Although there are many potential occupational health risks from UNGD, including injury, noise exposure, and chemical hazards, research on the subject to date is limited.10 A National Institute for Occupational Safety and Health (NIOSH) study evaluated worker exposure to the sand used for hydraulic fracturing. Inhalation of respirable crystalline silica, a component of sand, is a recognized cause of silicosis and lung cancer. The study measured worker exposure to respirable crystalline silica during hydraulic fracturing at 11 well drilling sites in five shale gas plays, including one site in the Marcellus shale, and found that in all of the shale gas plays studied, workers were exposed to levels above the Occupational Safety and Health Administration’s permissible exposure limit.35
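Occupational exposures like these are typically compared with a permissible exposure limit (PEL) as an 8-hour time-weighted average (TWA): the sum of each measured concentration multiplied by its duration, divided by 8 hours. A minimal Python sketch with hypothetical task-level silica samples is shown below. It assumes a comparison value of 50 µg/m³, the PEL set by the Occupational Safety and Health Administration’s 2016 respirable crystalline silica standard; the NIOSH study predates that standard and compared its measurements with the limit in force at the time.

```python
PEL_UG_M3 = 50.0  # assumption: OSHA 2016 respirable crystalline silica PEL (8-hour TWA)


def eight_hour_twa(samples):
    """8-hour TWA from (concentration in ug/m3, duration in hours) pairs.

    Assumes a shift of at most 8 hours; unsampled time is treated as zero exposure.
    """
    total_hours = sum(hours for _, hours in samples)
    if total_hours > 8:
        raise ValueError("sampled durations exceed an 8-hour shift")
    return sum(conc * hours for conc, hours in samples) / 8.0


if __name__ == "__main__":
    # Hypothetical task-based samples for one worker
    shift = [(120.0, 3.0), (40.0, 4.0), (0.0, 1.0)]
    twa = eight_hour_twa(shift)
    print(f"TWA = {twa:.1f} ug/m3; exceeds assumed PEL: {twa > PEL_UG_M3}")
```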

1.5.2 Epidemiology studies of UNGD

Research on UNGD and health remains limited. Although a review article claims to have identified 31 studies on UNGD and health,36 most of those studies were exposure assessment studies that did not evaluate health outcomes. Other research groups have published eight epidemiology studies on UNGD and health (Table 1.5.2.1) that categorized study participants by UNGD, identified a health outcome, and evaluated associations between UNGD and that outcome.29,37-43 Most of these studies were conducted in Pennsylvania. Two evaluated birth outcomes, two evaluated cancers, two evaluated symptoms, one evaluated car crashes, and one evaluated inpatient hospitalization rates. Six used government databases or registries for outcome assessment and two used questionnaires. Sample sizes ranged from very small (72 responders) to very large (124,842 births).

Several of these studies have important limitations that affect the interpretation of their results. Fryzek 201337 may not have adequately accounted for the latency of the cancers studied. The four ecologic studies were subject to the ecologic fallacy.29,37,41,42 McKenzie et al. 201439 included primarily conventional wells in their well metric. Saberi et al. 201443 ascertained information on both UNGD and health outcomes in the same questionnaire, creating the potential for dependent measurement error.

Concurrently with the articles presented in this thesis, our research group published two additional studies on UNGD and health outcomes (Table 1.5.2.2); I was a co-author on both, but they are not presented in this thesis. One study evaluated associations of UNGD with birth outcomes (ascertained from an electronic health record [EHR])44 and the other with chronic rhinosinusitis (CRS), migraine headache, and fatigue symptoms, ascertained from a questionnaire.45 Both were conducted in Pennsylvania.

Environmental epidemiology studies need to rank study participants along an exposure gradient, but in studies of UNGD this is challenging because UNGD is not a single exposure but multiple exposures (Section 1.4). Instead of assessing air quality impacts, noise, light, vibration, flaring events, odors, and stress individually, studies to date have used geographic information system (GIS)-based proxies that incorporate the distance between study participants’ home addresses and unconventional natural gas wells, and in some cases additional information about well characteristics, to capture multiple potential pathways at once. Of the studies that have assigned a UNGD metric at the individual level, one asked study participants to attribute their health outcome to UNGD, and the remaining five assigned UNGD at the individual level using GIS-based proxies, namely nearest neighbor distance or gravity models. These GIS-based proxies are inexpensive and easy to apply retrospectively. However, they also have limitations. To date, all GIS-based proxies used in UNGD and health studies have included only wells in their UNGD assessment, even though air pollution modeling studies have shown that components of UNGD other than wells, namely impoundments and compressor engines, may contribute substantially to emissions (Section 1.4.3). Additionally, studies have used different GIS-based UNGD metrics, which makes it challenging to understand what each metric captures and to compare results across studies.
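To make the two families of GIS-based proxies concrete, the Python sketch below computes, for one home address, (a) the nearest neighbor distance to a well and (b) a simple gravity-type metric, the sum of inverse squared distances to all wells. It is a minimal illustration under stated assumptions: the coordinates are hypothetical, distances use the haversine great-circle formula with a mean Earth radius of 6,371 km, distances are floored at 10 m to avoid division by zero, and no well-specific weights (such as depth, production volume, or phase) are applied, although the metrics used in the studies cited above may include such weights.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for the haversine formula)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_well_km(home, wells):
    """Nearest neighbor distance from a home to any well."""
    return min(haversine_km(*home, *well) for well in wells)


def inverse_distance_squared(home, wells, min_km=0.01):
    """Sum of 1/d^2 over all wells; each distance is floored at min_km (an illustrative choice)."""
    return sum(1.0 / max(haversine_km(*home, *well), min_km) ** 2 for well in wells)


if __name__ == "__main__":
    home = (41.50, -76.50)  # hypothetical geocoded address
    wells = [(41.51, -76.50), (41.55, -76.45), (41.70, -76.30)]  # hypothetical wells
    print(f"Nearest well: {nearest_well_km(home, wells):.2f} km")
    print(f"Sum of 1/d^2: {inverse_distance_squared(home, wells):.3f} per km^2")
```

A phase-specific version, in the spirit of the four-phase metrics listed in Table 1.5.2.2, would compute such a sum separately for the wells in each phase; how those studies weighted and combined the phases is not reproduced here.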

Although all studies found significant associations of UNGD with at least one of the health outcomes evaluated, associations were not consistent across studies, even for the same outcome. For example, three studies evaluated associations of UNGD with birth weight, and each found different results.39,40,44 Stacy found that UNGD was associated with lower birth weight; Casey also found that UNGD was associated with lower birth weight, but the association became null when year was added to the model; and McKenzie found that UNGD was associated with increased birth weight. These differing results may reflect differences in the confounders included in each study’s models (e.g., only Casey included year as a confounder) or in the UNGD metrics used.


Table 1.5.2.1 Published epidemiology studies of unconventional natural gas development by other research groups. Abbreviations: C, cohort; E, ecologic; CS, cross-sectional; PA, Pennsylvania; CO, Colorado; G, government database; Q, questionnaire; UNGD, unconventional natural gas development; UNG, unconventional natural gas; OR, odds ratio; CI, confidence interval; SIR, standardized incidence ratio; RR, rate ratio

| First author, year | Study design | Sample size | Location | Well metric | Outcome(s) | Outcome data source | Model(s) | Significant findings |
|---|---|---|---|---|---|---|---|---|
| Finkel 201641 | E | 11,508 urinary bladder cancer cases, 6,222 thyroid cancer cases, and 5,061 leukemia cases | PA | County count of producing UNG wells (high vs. low) in six counties | Urinary bladder cancer, thyroid cancer, and leukemia cases | G | Standardized incidence ratios | Counties with higher well counts had higher rates of urinary bladder cancer compared to those without UNGD (e.g., in Washington County from 2008-12, for urinary cancer, SIR 130.7, 95% CI: 104.8, 156.6). |
| Fryzek 201337 | E | 10,708 childhood cancer cases | PA | Before vs. after drilling by county | Cancer (all cancers, central nervous system tumors, and leukemia) rates by county | G | Standardized incidence ratios | Counties after drilling had a higher rate of central nervous system tumors compared to before drilling (SIR = 1.13; 95% CI, 1.02-1.25). |
| Graham 201529 | E | 6,432 car crashes | PA | County level high/low drilling | Car crashes | G | Linear generalized estimating equation models | Counties with high drilling were associated with higher vehicle crash rates (e.g., in northern drilling counties vs. non-drilling counties, RR = 1.15, 95% CI: 1.04-1.27). |
| Jemielita 201542 | E | 92,805 hospitalizations | PA | Wells per zip code; wells per km2 | Inpatient prevalence rates by medical category per zip code | G | Poisson regression models | Cardiology inpatient rates were associated with number of UNG wells per zip code and UNG wells per km2. Neurology inpatient rates were associated with UNG wells per km2 (e.g., for wells per zip code and cardiology inpatient rates, RR = 1.0007, p = 0.0007). |
| McKenzie 201439 | C | 124,842 births | CO | Inverse distance to spudded well, includes conventional and unconventional | Congenital heart defects, neural tube defects, oral clefts, preterm birth, and term low birth weight | G | Linear and logistic regression models | UNGD was associated with increased odds of congenital heart defects and neural tube defects (e.g., for UNGD and congenital heart defects, OR = 1.3, 95% CI: 1.2, 1.5). UNGD was negatively associated with preterm birth and positively associated with fetal growth. |
| Rabinowitz 201538 | CS | 180 responders reporting on the health status of 492 household members | PA | Distance to well in buffers | Symptoms | Q | Generalized linear mixed model | Proximity to UNG wells was associated with increased odds of dermal and respiratory symptoms (e.g., for skin conditions, OR = 4.1, 95% CI: 1.4, 12.3). |
| Saberi 201443 | CS | 72 responders | PA | Questionnaire asked whether responder attributes health symptoms to UNGD | Symptoms | Q | Descriptive statistics | Of the 72 responders, 13% attributed a symptom to UNGD. |
| Stacy 201540 | C | 15,451 births | PA | Inverse distance to spudded well | Birth weight, small for gestational age, and prematurity | G | Linear and logistic regression models | UNGD was associated with lower birth weight and higher odds of small for gestational age (e.g., for small for gestational age, OR = 1.34, 95% CI: 1.10–1.63). |


Table 1.5.2.2 Published epidemiology studies of unconventional natural gas development by our research group. Abbreviations: C, cohort; CS, cross-sectional; PA, Pennsylvania; EHR, electronic health record; Q, questionnaire; UNGD, unconventional natural gas development; OR, odds ratio; CI, confidence interval.

| First author, year | Study design | Sample size | Location | Well metric | Outcome(s) | Outcome data source | Model(s) | Significant findings |
|---|---|---|---|---|---|---|---|---|
| Casey 201644 | C | 9,384 mothers; 10,496 neonates | PA | Inverse distance squared UNGD activity metric incorporating four phases of well development | Term birth weight, preterm birth, low 5-minute Apgar score, and small size for gestational age | EHR | Multilevel linear and logistic regression models | UNGD was associated with higher odds of preterm birth and high-risk pregnancy (e.g., high UNGD and preterm birth OR = 1.41, 95% CI: 1.04-1.92). |
| Tustin 201645 | CS | 7,785 responders | PA | Inverse distance squared UNGD activity metric incorporating four phases of well development | Chronic rhinosinusitis (CRS), migraine headache, and fatigue symptoms | Q | Survey-weighted logistic regression | UNGD was associated with increased odds of CRS and fatigue together, migraine and fatigue together, and all three outcomes together (high UNGD and all three symptoms OR = 1.84, 95% CI: 1.08, 3.14). |


1.6 The use of electronic health records for epidemiology studies

EHRs have been rapidly adopted by health systems over the past decade. Although they were originally designed for clinical and administrative (e.g., medical billing) purposes, they present an opportunity for epidemiological research, and their use in epidemiology studies is growing. EHRs can provide a longitudinal data source with large population sizes at much lower cost than cohort studies, although they differ in important ways from the primary data collection methods used in most epidemiology studies.46

1.6.1 Overview of the Geisinger Clinic and its EHR

Geisinger is one health system using EHRs for epidemiology research. The Geisinger Clinic provides primary care services to over 400,000 patients in over 35 counties of central and northeastern Pennsylvania, including counties with and without Marcellus shale wells. Since 2001, the Geisinger Clinic has been expanding, with an increasing number of hospitals (now around 7 or 8) and outpatient community clinics (now approaching 50). Geisinger has used an EHR since 2001.

Electronic health records capture data at all clinical encounters (inpatient, outpatient, emergency, and telephone). The EHR includes information on diagnoses (ICD-9 codes), vital signs, medications, procedures, and laboratory tests. It also includes sociodemographic information on patients, including address, age, sex, race/ethnicity, and use of Medical Assistance for health insurance (a surrogate for family socioeconomic status), as well as information on habits such as tobacco and alcohol use.

Importantly, patients who have a Geisinger Clinic primary care provider represent the general population of the region.47 The Geisinger Clinic’s EHR is a powerful source of data for epidemiology studies because it includes a large sample size, can be obtained relatively inexpensively, is longitudinal, has detailed health information, and is already in an electronic form.


1.6.2 Environmental epidemiology studies using the Geisinger EHR

Several environmental epidemiology studies have been conducted using the Geisinger EHR. These include studies evaluating associations of abandoned coal mines with diabetes,47,48 the built environment with obesity,49-51 industrial food animal production with methicillin-resistant Staphylococcus aureus,52,53 unconventional natural gas development with several health outcomes (conducted concurrently with the studies presented in this thesis, as discussed in Section 1.5.2),44,45 greenness with pregnancy outcomes,54 and several environmental exposures with CRS.55 Many of these studies have used similar methods. First, patient data were obtained from the EHR. The study population of cases and controls was identified using the EHR, including diagnosis codes, laboratory results, vital signs, and medication orders. Patients were geocoded to their home addresses, and geographic information systems (GIS) were used to create exposure metrics. Biostatistical analyses were used to evaluate the association of exposures and outcomes while taking into account correlations within people, places, and time. Finally, sensitivity analyses were conducted to evaluate the robustness of associations.

The use of EHR data in epidemiology studies can present limitations, but many of these limitations can be overcome. For example, the Geisinger EHR does not contain direct information on socioeconomic status, but Medical Assistance, a means-tested program, can be used as a surrogate. The EHR also does not retain previous addresses when patients move; only the most recent address is recorded. However, addresses can be compared across different EHR data pulls, and an analysis of address changes revealed little residential mobility among Geisinger patients.56 The EHR captures only health conditions for which patients seek care, so conditions that patients treat over the counter, or for which they do not seek care, are not well captured. For these types of outcomes, the EHR can be supplemented by a questionnaire (e.g., migraine headache, fatigue, and nasal and sinus symptoms were ascertained from a questionnaire in an analysis of the association of UNGD with these symptoms45).
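Because the EHR keeps only the current address, residential moves can be detected only by comparing successive data pulls. A minimal Python sketch, assuming each pull is available as a mapping from a hypothetical patient identifier to an address string and using deliberately crude normalization (lowercasing and collapsing whitespace; comparisons of geocoded coordinates would be more robust):

```python
def normalize(address: str) -> str:
    """Crude normalization: lowercase and collapse runs of whitespace."""
    return " ".join(address.lower().split())


def address_changes(pull_a: dict, pull_b: dict):
    """Return IDs present in both pulls whose address differs, and the share of those patients."""
    common = pull_a.keys() & pull_b.keys()
    moved = sorted(pid for pid in common if normalize(pull_a[pid]) != normalize(pull_b[pid]))
    share = len(moved) / len(common) if common else 0.0
    return moved, share


if __name__ == "__main__":
    # Hypothetical pulls keyed by hypothetical patient IDs
    earlier = {"p1": "12 Main St, Town A", "p2": "4 Oak Ave ", "p3": "9 Elm St"}
    later = {"p1": "12 MAIN ST,  Town A", "p2": "77 Pine Rd", "p4": "1 New St"}
    print(address_changes(earlier, later))
```

Patients who appear in only one pull (p3 and p4 in the example) are excluded from the denominator, since a move cannot be assessed for them.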

1.7 Outcomes selected for study in this thesis

This thesis research first focused on asthma exacerbations using EHR data. As associations were discovered in that analysis, data from another NIH-funded study (Tustin et al., discussed in Section 1.5.2) that assessed depression symptoms by questionnaire became available. These data allowed the evaluation of additional hypotheses regarding UNGD and health pathways. The rationales for selecting asthma exacerbations as an outcome were: asthma exacerbations are common and severe; patients seek care for them, so they are well captured in an EHR; they can be affected by stress and by air pollution, which we hypothesized were the two primary pathways by which UNGD could affect health; and they have a short latency from exposure to care seeking, and thus to EHR documentation. In contrast, while there is public and scientific concern over carcinogens in drinking water as a result of UNGD,19 the latency from exposure to carcinogens to development of cancer could be decades. After completing the study of UNGD and asthma exacerbations (Chapter 3), we selected depression symptoms and disordered sleep as outcomes in our second epidemiology study of UNGD. Like asthma exacerbations, depression symptoms can be affected by stress and air pollution. Unlike asthma exacerbations, depression symptoms are not well captured in an EHR, because patients do not always seek care for symptoms or may take years to do so. In a survey representative of the United States population, only 13%, 19.5%, and 35.3% of people with mild, moderate, and moderately severe/severe depression symptoms, respectively, reported contacting a mental health professional in the prior 12 months.57 However, we were still able to ascertain depression symptoms in a population of Geisinger patients because questions on depression symptoms were included in a questionnaire for a study of chronic rhinosinusitis epidemiology that was sent to Geisinger patients in 2014. We wanted to evaluate whether sleep deprivation mediated the association between UNGD and depressive symptoms; because the association of UNGD with disordered sleep diagnoses had not previously been studied, we also evaluated this association directly. We also wanted to evaluate depression symptoms as a mediator of the association between UNGD and asthma exacerbations, but this was not possible because there was insufficient time between the return of the depressive symptom questionnaire and the latest events collected in the EHR data (Figure 1.7).

Figure 1.7. Conceptual diagram of outcomes considered in this thesis. Blue arrows identify associations evaluated in a study concurrent to, but not included in, this thesis (Tustin 2016), red arrows identify associations evaluated in this thesis, and the yellow arrow identifies an association that could not be evaluated because an insufficient number of events of asthma exacerbations were available in the EHR data at the time of this analysis. Abbreviation: EHR, electronic health record

1.7.1 Epidemiology of asthma and asthma exacerbations


Asthma is a common, chronic disease: in 2010, 25.7 million people in the United States had asthma, a prevalence of 8.4%.58 Asthma is characterized by variable and recurring symptoms (including cough, wheezing, shortness of breath, and chest tightness), reversible airflow obstruction, bronchial hyper-responsiveness, and underlying inflammation.59,60 In 2009, there were 11.8 million outpatient visits, 2.1 million emergency department (ED) visits, and 479,300 hospitalizations for asthma.58 There is no cure for asthma, but asthma can be controlled in most patients.61 Asthma control is the degree to which asthma symptoms have been reduced by treatment; it includes both the current state of symptoms and the future risk of symptoms. Asthma severity reflects the level of treatment needed to achieve asthma control.62

1.7.1.1 Overview and definition of asthma exacerbations

The goal of asthma treatment is to control symptoms and prevent exacerbations. Asthma exacerbations are acute episodes of shortness of breath, cough, wheezing, and/or chest tightness. Some exacerbations can be managed at home by patients, but others require emergency medical care or hospitalization.60 Asthma exacerbations add to healthcare costs and reduce quality of life for patients. The 20% of asthma patients who have exacerbations requiring hospitalization account for 80% of all asthma healthcare costs.59 Asthma exacerbations can cause children to miss school and adults to miss work.63 Following the National Institutes of Health standardized asthma outcomes, we identified an asthma exacerbation as one of the following, defining severe, moderate, and mild exacerbations, respectively: (1) an asthma-specific hospitalization, (2) an asthma-specific ED visit, or (3) a new oral corticosteroid (OCS) medication order.64 We defined an asthma hospitalization or ED visit using the International Classification of Diseases, 9th Revision, Clinical Modification code for asthma (493.x), and a new OCS medication order by distinguishing new OCS medication orders from standing orders, and OCS orders for an asthma exacerbation from those for other diseases (Chapter 3, Figure 3.3.2).
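The three-level exacerbation definition above amounts to a precedence rule, checked from most to least severe. A minimal Python sketch, assuming a hypothetical per-encounter record with an encounter type, an ICD-9-CM code, and a flag marking a new asthma-related OCS order; the logic that separates new from standing orders and asthma from non-asthma indications (Chapter 3, Figure 3.3.2) is assumed to have been applied already, and matching asthma codes by the "493" prefix is a simplification of the 493.x definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    encounter_type: str  # hypothetical values: "inpatient", "emergency", "outpatient"
    icd9_code: str  # e.g. "493.92" or "49392"
    new_asthma_ocs_order: bool = False  # assumed pre-classified as a new, asthma-related OCS order


def is_asthma_code(code: str) -> bool:
    """ICD-9-CM 493.x (asthma), matched here by the '493' prefix."""
    return code.replace(".", "").startswith("493")


def classify_exacerbation(enc: Encounter) -> Optional[str]:
    """Return 'severe', 'moderate', 'mild', or None, checking the most severe category first."""
    if enc.encounter_type == "inpatient" and is_asthma_code(enc.icd9_code):
        return "severe"  # asthma-specific hospitalization
    if enc.encounter_type == "emergency" and is_asthma_code(enc.icd9_code):
        return "moderate"  # asthma-specific ED visit
    if enc.new_asthma_ocs_order:
        return "mild"  # new oral corticosteroid order for asthma
    return None
```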

1.7.1.2 Asthma exacerbations and air pollution

Outdoor air pollution is a recognized cause of asthma exacerbations, as pollutants can cause airway inflammation. A large body of literature links asthma exacerbations to acute and chronic exposure to air pollutants, in particular ozone and PM, but also nitrogen dioxide and sulfur dioxide.60 Epidemiology studies have found associations between low levels of air pollutants and asthma exacerbations, including hospital and emergency visits and medication use (Table 1.7.1.2).65-83 Many studies on asthma and air pollution have evaluated the effect of an increase in ambient levels of air pollutants on rates of asthma hospitalizations or emergency room visits, and have found significant results. These studies have been conducted in many countries worldwide and often use an ecologic time series design to evaluate the short-term effects of air pollution on asthma exacerbations.
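Time series studies like those summarized in Table 1.7.1.2 typically align each day’s outcome count with pollutant levels on the same day (lag 0), on single earlier days (lag 1, lag 2, and so on), or averaged or summed over several days. A minimal Python sketch with a hypothetical daily ozone series shows how single-day lags and a 3-day moving average (lags 0-2) can be built; the function names are illustrative. In an actual analysis these columns would then enter a model for daily counts, such as Poisson regression, adjusted for confounders like weather, day of week, and season.

```python
from typing import List, Optional


def single_day_lag(series: List[float], lag: int) -> List[Optional[float]]:
    """Value of the series `lag` days before each day (None where unavailable)."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def moving_average(series: List[float], max_lag: int) -> List[Optional[float]]:
    """Mean over lags 0..max_lag (max_lag=2 gives the 3-day average of lags 0-2)."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i < max_lag:
            out.append(None)  # not enough earlier days
        else:
            window = series[i - max_lag : i + 1]
            out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    ozone_ppb = [40.0, 55.0, 62.0, 48.0, 70.0]  # hypothetical daily 8-hour maximum ozone
    print("lag 1:  ", single_day_lag(ozone_ppb, 1))
    print("lag 0-2:", moving_average(ozone_ppb, 2))
```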


Table 1.7.1.2. Epidemiology studies of air pollution and asthma. Abbreviations: PM, particulate matter; NOx, nitrogen oxides; CO, carbon monoxide; SOx, sulfur oxides

The pollutant columns (Ozone through SOx) show the association with a measure of asthma outcome (a).

Author, year | Design | Sample size | Outcomes (b) | Confounders (c) | Exposure assigned at area level (A) or individual level (I) | Duration and lag used for exposure (d) | Country | Population (e) | Ozone | PM2.5 | PM10 | NOx | CO | SOx

Abe 2007 [83] | Time series | 6447 events | H | M | A | 0, 1 | Japan | A | -- | -- | -- | null | null | null

Barnett 2005 [65] | Meta-analysis | Not reported | H | M, S | A | 0, 1 | Australia and New Zealand | C | null | null | null | yes, expected | null | null

Chew 2001 [66] | Time series | 23,373 child ER visits; 6939 child hospital admissions; 22,277 adolescent ER visits; 5,478 adolescent hospital admissions | H | M | A | 0, 1, 2 | Singapore | C | null | -- | -- | yes, expected | -- | yes, expected

Clark 2010 [67] | Nested case-control | 3,482 cases; 17,410 controls | H, P | Parity, breast-feeding, income quintile (neighborhood level), maternal education status (neighborhood level), birth weight, and gestational length | I | Sum over gestation; sum over first year of life | Canada | C | null | yes, expected | yes, expected | yes, expected | yes, expected | yes, expected

Delfino 2003 [84] | Panel study | 1123 person-days | S | D, I, M | A | 0, 1, 2, 3, 4 | USA | C | yes, expected | -- | yes, expected | yes, expected | null | yes, expected

Fauroux 2000 [85] | Time series | 715 events | H | I, P, M | A | 0, 1, 2, 3 | France | C | yes, expected | -- | null | null | -- | null

Friedman 2001 [80] | Time series | 73 days | H | D, M | A | Average of days 0-2 | USA | C | yes, expected | -- | -- | -- | -- | --

Gent 2003 [68] | Time series | 271 participants | M, S | M | A | 0, 1 | USA | C | yes, expected | null | -- | -- | -- | --

Gouveia 2000 [81] | Time series | 5,933 events | H | D, H, M, S | A | 0, 1, 2 | Brazil | C | null | -- | null | null | null | null

Halonen 2010 [69] | Time series | 1972 events | H | H, I, M, P | A | Single day: 0, 1, 2, 3, 4, 5; average of lag days 0-4 | Finland | C | yes, expected | yes, expected | -- | -- | -- | --

Jaffe 2003 [82] | Cohort | 4416 events | H | M, S | A | 1, 2, 3 | USA | A | yes, expected | -- | null | null | -- | yes, expected

Jalaludin 2008 [70] | Case-crossover | Mean of 174 events per day for 1826 days | H | D, H, M | A | Single day: 0, 1, 2, 3; sum of days 0 and 1 | Australia | C | yes, expected | yes, expected | yes, expected | yes, expected | yes, expected | yes, expected

Ko 2007 [71] | Time series | 69,716 events | H | M, S | A | Single day: 0, 1, 2, 3, 4, 5; sum of days 0-1, 0-2, and 0-5 | Hong Kong | A | yes, expected | yes, expected | yes, expected | yes, expected | -- | null

Magas 2007 [72] | Time series | 1270 events | H | M | A | 1 | USA | C | null | null | -- | yes, expected | -- | --

Meng 2009 [79] | Case-control | 1502 participants | H, S | A, sex, R, poverty level, and health insurance status | I | 1 year | USA | A | yes, expected | yes, expected | yes | null | null | --

Ostro 2001 [73] | Panel study | 10,022 person-days | M, S | M, R | A | 0, 1, 2, 3 | USA | C | yes, expected | yes, expected | yes, expected | yes, expected | -- | --

Petroeschevsky 2001 [74] | Time series | 13,246 events | H | M, S | A | Single day: 1, 2, 3; sum of days 0-2 and 0-4 | Australia | A | yes, expected | -- | -- | null | -- | null

Samoli 2011 [77] | Time series | 3601 events | H | D, I, M, S | A | Single day: 0, 1, 2; sum of days 0-2 | Greece | C | null | -- | yes, expected | null | -- | yes, expected

Schildcrout 2006 [75] | Meta-analysis | 54,450 person-days | M | M, S | A | Single day: 0, 1, 2; 3-day moving sum of 0-, 1-, and 2-day lags | USA and Canada | C | null | -- | null | yes, expected | yes, expected | yes, expected

Stieb 1996 [86] | Time series | 1987 events | H | M, S | A | 0 | Canada | A | yes, expected | -- | -- | -- | -- | --

Tenías 1998 [78] | Time series | 734 events | H | D, I, M, S | A | 0, 1, 2, 3, 4, 5 | Spain | A | yes, expected | -- | null | yes, expected | -- | yes, expected

Thurston 1997 [76] | Cohort | 830 person-days | M, S | M | A | 0, 1, 2, 3 | USA | C | yes, expected | -- | -- | -- | -- | --

(a) "--" means the pollutant was not studied.
(b) "H" is hospital visits or admissions; "P" is physician diagnosis; "M" is extra medication use; "S" is symptoms.
(c) "A" is age, "D" is day of the week, "H" is holidays, "I" is influenza or respiratory infections, "M" is meteorological factors, "P" is pollen, "R" is race/ethnicity, "S" is seasonality or time trends.
(d) Single day used for duration unless otherwise noted. Numbers refer to lag in days unless otherwise noted. "0" is the same day for exposure as the event.
(e) "C" is children, "A" is children and adults.
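Footnote d distinguishes single-day lags (e.g., lag 1 assigns the exposure measured on the day before the event) from multi-day sums and averages (e.g., "sum of days 0-2"). A minimal sketch of how such exposure variables can be built from a daily pollutant series follows. The ozone values are hypothetical, the function names are illustrative, and the tabulated studies may have handled missing days and window definitions differently.

```python
from typing import List, Optional

Series = List[float]


def single_day_lag(daily: Series, k: int) -> List[Optional[float]]:
    """Lag k: the exposure assigned to day t is the value measured on day t - k.

    Lag 0 is the same day as the event. Days without k prior days of data get None.
    """
    if k < 0:
        raise ValueError("lag must be non-negative")
    return [daily[t - k] if t - k >= 0 else None for t in range(len(daily))]


def lag_window(daily: Series, first: int, last: int, average: bool) -> List[Optional[float]]:
    """Sum (or average) of lags first..last, e.g. 'sum of days 0-2' = day t + day t-1 + day t-2."""
    if not 0 <= first <= last:
        raise ValueError("need 0 <= first <= last")
    out: List[Optional[float]] = []
    for t in range(len(daily)):
        if t - last < 0:
            out.append(None)  # not enough history to fill the whole window
            continue
        window = [daily[t - k] for k in range(first, last + 1)]
        total = sum(window)
        out.append(total / len(window) if average else total)
    return out


# Hypothetical daily ozone concentrations (ppb) for four consecutive days.
ozone = [40.0, 50.0, 60.0, 30.0]

print(single_day_lag(ozone, 1))                # lag 1 -> [None, 40.0, 50.0, 60.0]
print(lag_window(ozone, 0, 1, average=False))  # sum of days 0-1 -> [None, 90.0, 110.0, 90.0]
print(lag_window(ozone, 0, 2, average=True))   # average of days 0-2 -> [None, None, 50.0, 46.666...]
```

Returning None for days without a full window keeps the exposure series aligned with the outcome series; an actual time series analysis would then drop or otherwise handle those days.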


1.7.1.3 Asthma exacerbations and stress

Psychosocial stress can modify the effects of environmental triggers87 and has been associated with asthma exacerbations, worse asthma control, and worse medication adherence.88-93 Both chronic and acute psychosocial stress can exacerbate asthma. Studies of psychosocial stress have found stronger associations with asthma exacerbations than studies of air pollution have. For example, in children with asthma, the risk of an asthma exacerbation increased 4.7 times in the two days following a very stressful event,89 and adults exposed to violence in their community had 2.3 and 2.5 times the odds of an asthma ED visit or hospitalization, respectively, compared to those not exposed to community violence.91 Epidemiology studies of stress and asthma exacerbations are summarized in Table 1.7.1.3.


Table 1.7.1.3. Studies of psychosocial stress and asthma exacerbations. Abbreviations: OR, odds ratio; 95% CI, 95% confidence interval

First author, year | Study design | Sample size | Stress measure | Outcome(s) | Outcome data source | Model(s) | Primary findings

Weil 1999 [93] | Longitudinal | 1528 children with asthma | A questionnaire on psychosocial status of the child and the caregiver | Asthma symptoms, health care utilization | Caregiver report | Logistic regression | Children with caretakers with mental health problems were more likely to have asthma hospitalizations (OR = 1.8, 95% CI [1.2-2.7]).

Sandberg 2000 [90] | Longitudinal | 90 children with asthma aged 6–13 years | Three standardized interviews on the child's stressful life events | Asthma exacerbation, defined as a peak-flow recording of less than 30% of the recorded maximum | Self-report | Logistic regression | Exposure to a stressful event was associated with an increased risk of an asthma exacerbation in the subsequent 4 weeks (OR = 1.71, 95% CI [1.04–2.82]).

Sandberg 2004 [89] | Longitudinal | 60 children with asthma aged 6–13 years (the same data as Sandberg 2000) | Three standardized interviews on the child's stressful life events | Asthma exacerbation, defined as a mean of the day's two readings below 70% of the normal value and an increase in reported symptoms | Self-report | Cox regression | The risk of an asthma exacerbation increased 4.7 times (p<0.01) in the 2 days following a very stressful event.

Apter 2010 [91] | Longitudinal | 397 adults with prior asthma encounters | Exposure to community violence | Asthma-related ED visits, self-reported asthma-related hospitalizations, asthma-related quality of life, and forced expiratory volume in 1 second | Self-report | Logistic regression | Adults exposed to violence had higher odds of asthma emergency visits and hospitalizations (OR = 2.3, 95% CI [1.3-3.9] and 2.5, 95% CI [1.1-5.6], respectively) than those unexposed.

Kopel 2015 [92] | Cross-sectional | 219 children with asthma | Caregiver assessment of neighborhood safety | Asthma symptoms, medication use, missed school due to asthma, health care utilization, lung function | Kopel, 2015 | Logistic regression | Children in neighborhoods perceived as less safe had increased risk of asthma exacerbations (e.g., for medication use, OR = 4.0, 95% CI [1.8–8.8]).


1.7.2 Epidemiology of depression

Depression is a common but serious disease. It is a symptom-based condition, and symptoms of major depressive disorder include hopelessness, helplessness, sad or irritable mood, loss of interest in activities, and fatigue.94 Depression has significant public health and economic costs: major depressive disorder is one of the top five contributors to disability-adjusted life years lost in the United States.95 Depression ranges in severity from mild to severe.96 Using data from the 2009-12 National Health and Nutrition Examination Survey and the PHQ-9, the Centers for Disease Control and Prevention estimated that the prevalence of mild, moderate, and moderately severe/severe depression symptoms in the general U.S. population was 15.3%, 4.7%, and 2.9%, respectively. Rates of depression symptoms differed by race/ethnicity and poverty: rates of severe depression symptoms were higher among non-Hispanic black people than non-Hispanic white people, and rates of mild and moderate depression symptoms were higher among non-Hispanic black people and Hispanic people than non-Hispanic white people. Rates of moderate, moderately severe, and severe depression symptoms were higher among people living below the federal poverty line than among those living above it.57

In epidemiology studies, depression can be ascertained using a questionnaire about depression symptoms (e.g., the Patient Health Questionnaire [PHQ-9], which asks about depression symptoms in the two weeks before the questionnaire is completed) or using clinical diagnoses and antidepressant medication use. However, because many patients with depression do not seek care, clinical diagnoses and antidepressant medication use could have low sensitivity for identifying depression (Section 1.7).57,97
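To make PHQ-9 scoring concrete: the questionnaire has nine items, each scored from 0 ("not at all") to 3 ("nearly every day"), giving a total score of 0 to 27, and severity categories such as those cited above are defined by cut points on that total. The sketch below assumes the commonly used cut points of 5, 10, 15, and 20 (mild, moderate, moderately severe, severe). The function names and the example responses are hypothetical and are not taken from the questionnaire data analyzed in this dissertation.

```python
from typing import Sequence

# Commonly used PHQ-9 severity bands: (lowest total, highest total, label).
SEVERITY_BANDS = [
    (0, 4, "none/minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine item scores; each item must be 0, 1, 2, or 3."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each PHQ-9 item is scored 0-3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a total score (0-27) to its severity band."""
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("PHQ-9 total must be between 0 and 27")


# Hypothetical respondent: "several days" (1) on six items, "more than half the days" (2) on three.
responses = [1, 1, 1, 1, 1, 1, 2, 2, 2]
total = phq9_total(responses)
print(total, phq9_severity(total))  # → 12 moderate
```

A dichotomized outcome, such as the PHQ-8 score of 10 or more used to define current depression in one study in Table 1.7.2.2, corresponds to collapsing the bands at the moderate cut point.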

1.7.2.1 Association of community characteristics and depression symptoms

UNGD may change the social and physical characteristics of a community (Section 1.4.4), and these neighborhood changes could affect depression symptoms. Several potential pathways from neighborhood characteristics to depression have been identified, including changes in health behaviors and psychosocial stress (Figure 1.7.2.1).98 The neighborhood characteristics that could be affected by UNGD, which include noise, traffic, changes in community composition, and changes in socioeconomic status, are highlighted in yellow in Figure 1.7.2.1.

Two recent reviews evaluated studies on associations of community characteristics with depression symptoms: one of cross-sectional and longitudinal studies and one of only longitudinal studies.98,99 In these studies, community characteristics were often quantified using a single community-level variable or an index of variables, including poverty, employment, home ownership, availability of retail outlets, and community population composition (e.g., percent minority, percent single parents, or percent foreign born). The review of cross-sectional and longitudinal studies concluded that community characteristics were associated with depression symptoms,98 whereas the review of longitudinal studies concluded that the evidence for this association was mixed.99


Figure 1.7.2.1. Relationship between community characteristics and depression symptoms.1 The community characteristics that could be affected by UNGD are highlighted in yellow.

1.7.2.2 Association of environmental variables and depressive symptoms

Several studies have evaluated associations of a wide range of environmental variables with depression symptoms (Table 1.7.2.2).100-106 These studies have modeled depression symptoms as binary and continuous outcomes and have largely been cross-sectional. In particular, studies of coal mining and the Deepwater Horizon oil spill provide evidence of an association between chronic environmental contamination (which can affect health contextually, through psychosocial stress pathways or by influencing health-related behaviors) and depressive symptoms.


Table 1.7.2.2. Epidemiology studies of environmental variables and mental health outcomes. Abbreviations: OR, odds ratio; RR, relative risk; 95% CI, 95% confidence interval; PHQ, Patient Health Questionnaire; NHANES, National Health and Nutrition Examination Survey

Study | Study design | Sample size | Location | Outcome | Exposure(s) | Model(s) | Primary findings

Speldewinde 2009 [100] | Ecological | 2,669 people with a hospitalization for depression | Western Australia | Hospitalization rates for depression | Dryland salinity | Linear regression | Dryland salinity at the district level was associated with hospitalizations for depression.

Hendryx 2013 [101] | Cross-sectional | 8,591 adults | Kentucky, Tennessee, Virginia, and West Virginia | Depression symptoms, measured with the PHQ-9 | Residence in a county with mountaintop removal, other non-mountaintop removal coal mining, or no coal mining | Logistic regression | Living in a county with mountaintop removal, compared to no coal mining, was associated with depression symptoms (OR = 1.40, 95% CI: 1.15, 1.71).

Bandiera 2011 [102] | Cross-sectional | 2,901 children | United States | Depression symptoms, measured with the PHQ-9 | Secondhand smoke exposure | Survey-weighted (NHANES data) zero-inflated Poisson regression | Exposure to secondhand smoke was associated with an increased level of depression symptoms (β = 0.09, p = .03).

Fan 2014 [103] | Cross-sectional | 38,361 adults | Alabama, Florida, Louisiana, and Mississippi | Depression symptoms using the dichotomized PHQ-8 (a score of greater than or equal to 10 was used to define current depression) | Four measures of exposure to the Deepwater Horizon oil spill: 1. Location of home with respect to the oil spill (proximal vs. non-proximal); 2. Did participants have direct contact with the oil? (yes vs. no); 3. Were participants involved in cleanup activities? (yes/no); 4. How did the oil spill impact their job and/or income? (1 = no job loss or no income decrease; 2 = job loss; 3 = income decrease; 4 = both job loss and income decrease) | Loglinear regression models | Having direct contact with oil (OR = 1.58, 95% CI: 1.15, 2.16) and having had a direct impact on job/income from the oil spill (OR = 1.52, 95% CI: 1.08, 2.12) were associated with depression symptoms, but proximity of residence to the oil spill and being involved in cleanup were not.

Rung 2016 [104] | Cross-sectional | 2,842 women | Louisiana | 20-item Center for Epidemiological Studies Depression Scale, dichotomized | A factor analysis of 9 questions on exposure to the Deepwater Horizon oil spill was used to create two factors: 1. Economic exposure; 2. Physical exposure | Poisson | Economic and physical exposure to the oil spill were associated with depression symptoms (RR for economic exposure = 1.2, 95% CI: 1.02-1.41; RR for physical exposure = 1.2, 95% CI: 1.01-1.43).

Cerdá 2013 [105] | Cross-sectional | 1,323 adults | Haiti | Depression symptoms, measured with the PHQ-9 | Four questions on exposure to the Haiti earthquake: 1. Major damage to house; 2. Trapped under rubble; 3. Physically injured; 4. Involved in rescue/recovery | Logistic | Major damage to the house was associated with depression symptoms (OR = 1.34, 95% CI: 1.01, 1.78); the other three measures of exposure were not statistically significantly associated with depression symptoms.

Golub 2010 [106] | Cross-sectional | 4,159 adults | United States | Depression symptoms, measured with the PHQ-9 | Blood lead quartiles | Two survey-weighted (NHANES) models: 1. Poisson, with depression symptoms as a dichotomous variable; 2. Ordinal logistic regression | Blood lead levels were not statistically significantly associated with depression symptoms in either model.


1.7.3 Relationship between depression and asthma

Asthma and depression symptoms are often comorbid, but the temporal relationship between asthma and depression is not clear. A recent meta-analysis of depression and asthma identified six studies that evaluated the association between depression at baseline and the subsequent risk of adult-onset asthma,107-112 and two studies that evaluated the association between asthma at baseline and the subsequent risk of depression.112,113 The meta-analysis found that depression at baseline was associated with increased risk of developing asthma (relative risk [RR] = 1.43, 95% confidence interval [CI]: 1.28, 1.61), but that asthma at baseline was not associated with increased risk of developing depression (RR = 1.23, 95% CI: 0.72, 2.10).114 However, the authors note that the analysis of asthma and incident depression may not have been statistically significant because of the limited number of studies available. Research on the association of depression symptoms with asthma exacerbations is limited, but in a longitudinal study of asthma patients, having depression symptoms was associated with subsequent asthma ED visits but not with OCS medication orders.115
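The contrast between the two pooled estimates above can be made explicit with a back-of-the-envelope calculation. Assuming each 95% CI is symmetric on the log scale (the usual normal approximation), the standard error of log(RR) is (ln upper - ln lower) / (2 x 1.96), and a Wald z statistic is ln(RR) / SE. The sketch below applies this to the rounded published values; it is an illustrative reader's check, not a computation reported by the meta-analysis, and the function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(RR), assuming a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def two_sided_p(rr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald p-value for the null hypothesis RR = 1."""
    z = math.log(rr) / log_se_from_ci(lower, upper)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


estimates = {
    "depression -> incident asthma": (1.43, 1.28, 1.61),
    "asthma -> incident depression": (1.23, 0.72, 2.10),
}
for label, (rr, lo, hi) in estimates.items():
    se = log_se_from_ci(lo, hi)
    print(f"{label}: SE(log RR) = {se:.3f}, p = {two_sided_p(rr, lo, hi):.3g}")
```

With these inputs, the standard error for the asthma-to-depression estimate is roughly 4 to 5 times larger than for the depression-to-asthma estimate, and its approximate p-value is about 0.45, which is consistent with the authors' point that this analysis was limited by the small number of contributing studies.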

1.8 Specific aims

As described above, there were few published studies on UNGD and health, despite calls for more research, when this thesis research began.116,117 Notably, there were no prior published studies on UNGD and asthma exacerbations or UNGD and a mental health outcome. Furthermore, the published studies on UNGD and health outcomes used different methods of exposure assessment (nearest neighbor distance and gravity metrics), making comparison of results across studies difficult. To fill these gaps, the aims of this dissertation research were to:

1. Using a nested case-control design, evaluate associations between UNGD activity (using surrogate measures created with geographic information systems [GIS]) and asthma exacerbations (hospitalizations, ED visits, and new OCS medication orders) in a cohort of Geisinger patients with asthma using EHR data.

2. Evaluate associations between UNGD activity and depression symptoms using questionnaire data.

3. Characterize impoundments, compressor engines, and flaring events related to UNGD in Pennsylvania; evaluate whether and how to incorporate impoundments and compressor engines into UNGD activity assessment; and compare the different approaches to UNGD activity assessment with one another and in their associations with mild asthma exacerbations.
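The two exposure-assessment approaches named above, nearest neighbor distance and gravity metrics, differ in what they summarize: the first keeps only the closest well, while the second sums contributions from all wells, down-weighted by distance. A minimal sketch follows, assuming residence and well locations are already projected to planar coordinates in kilometers. The inverse-distance exponent (`power=2.0`), the optional per-well `weights`, and the `min_distance_km` floor are illustrative assumptions, not the specific metrics used in this dissertation or in the prior studies it cites.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in projected coordinates, kilometers (assumed)


def nearest_well_distance(home: Point, wells: Sequence[Point]) -> float:
    """Nearest neighbor metric: distance (km) from the residence to the closest well."""
    return min(math.dist(home, well) for well in wells)


def gravity_metric(
    home: Point,
    wells: Sequence[Point],
    weights: Optional[Sequence[float]] = None,
    power: float = 2.0,
    min_distance_km: float = 0.1,
) -> float:
    """Gravity-style metric: sum over wells of weight / distance**power.

    weights could encode well activity (default: 1 per well); min_distance_km is a
    hypothetical floor that prevents division by zero for wells at the residence.
    """
    if weights is None:
        weights = [1.0] * len(wells)
    total = 0.0
    for well, weight in zip(wells, weights):
        d = max(math.dist(home, well), min_distance_km)
        total += weight / d ** power
    return total


home = (0.0, 0.0)
wells = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]  # hypothetical well locations
print(nearest_well_distance(home, wells))  # → 1.0
print(gravity_metric(home, wells))         # 1/1 + 1/4 + 1/25 = 1.29 (up to float rounding)
```

In this example, a residence with only the first well nearby would have the same nearest neighbor distance (1.0 km) but a smaller gravity value (1.0 instead of about 1.29), which illustrates why results based on the two metric types are not directly comparable.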

1.9 References

1. U.S. Energy Information Administration. The geology of natural gas resources. Today in Energy Web site. http://www.eia.gov/todayinenergy/detail.cfm?id=110. Published 2/14/2011. Updated 2011. Accessed 5/19, 2016.

2. U.S. Energy Information Administration. Shale in the United States. Energy in Brief Web site. https://www.eia.gov/energy_in_brief/article/shale_in_the_united_states.cfm#shaledata. Updated 2016. Accessed 5/19, 2016.

3. U.S. Energy Information Administration. Technically recoverable shale oil and shale gas resources. 2013(8/19/2013).

4. Olmstead SM, Muehlenbachs LA, Shih J, Chu Z, Krupnick AJ. Shale gas development impacts on surface water quality in Pennsylvania. Proceedings of the National Academy of Sciences. 2013;110(13):4962-4967.

5. Litovitz A, Curtright A, Abramzon S, Burger N, Samaras C. Estimation of regional air-quality damages from Marcellus shale natural gas extraction in Pennsylvania. Environmental Research Letters. 2013;8(1):014017.

6. Maloney KO, Yoxtheimer DA. Production and disposal of waste materials from gas and oil extraction from the Marcellus shale play in Pennsylvania. Env Prac. 2012;14(04):278-287.

7. Weinhold B. The future of fracking: New rules target air emissions for cleaner natural gas production. Environ Health Perspect. 2012;120(7):a272-9.

8. Vidic R, Brantley S, Vandenbossche J, Yoxtheimer D, Abad J. Impact of shale gas development on regional water quality. Science. 2013;340(6134).

9. Hughes JD. Drill baby drill. Nature. 2013(7437):7437-308. http://www.postcarbon.org/reports/DBD-report-FINAL.pdf.

10. Adgate JL, Goldstein BD, McKenzie LM. Potential public health hazards, exposures and health effects from unconventional natural gas development. Environ Sci Technol. 2014;48(15):8307-8320.

11. Hill EL. Unconventional Natural Gas Development and Infant Health: Evidence from Pennsylvania. 2012.

12. Pennsylvania Code. Subchapter E. Well reporting § 78.121-§ 78.125. http://www.pacode.com/secure/data/025/chapter78/subchapEtoc.html.

13. Powers M, Saberi P, Pepino R, Strupp E, Bugos E, Cannuscio CC. Popular epidemiology and “fracking”: Citizens’ concerns regarding the economic, environmental, health and social impacts of unconventional natural gas drilling operations. J Community Health. 2015;40(3):534-541.

14. Jackson RB, Vengosh A, Darrah TH, et al. Increased stray gas abundance in a subset of drinking water wells near Marcellus shale gas extraction. Proceedings of the National Academy of Sciences. 2013;110(28):11250-11255.

15. Osborn SG, Vengosh A, Warner NR, Jackson RB. Methane contamination of drinking water accompanying gas-well drilling and hydraulic fracturing. Proc Natl Acad Sci U S A. 2011;108(20):8172-8176.

16. Warner NR, Jackson RB, Darrah TH, et al. Geochemical evidence for possible natural migration of brine to shallow aquifers in Pennsylvania. Proceedings of the National Academy of Sciences. 2012;109(30):11961-11966.

17. DiGiulio DC, Jackson RB. Impact to underground sources of drinking water and domestic wells from production well stimulation and completion practices in the Pavillion, Wyoming, field. Environ Sci Technol. 2016;50(8):4524-4536.

18. Roy AA, Adams PJ, Robinson AL. Air pollutant emissions from the development, production, and processing of Marcellus shale natural gas. J Air Waste Manage Assoc. 2013;64(1):19-37.

19. McKenzie LM, Witter RZ, Newman LS, Adgate JL. Human health risk assessment of air emissions from development of unconventional natural gas resources. Sci Total Environ. 2012;424:79-87.

20. Vinciguerra T, Yao S, Dadzie J, et al. Regional air quality impacts of hydraulic fracturing and shale natural gas activity: Evidence from ambient VOC observations. Atmos Environ. 2015;110:144-150.

21. Pacsi AP, Kimura Y, McGaughey G, McDonald-Buller E, Allen DT. Regional ozone impacts of increased natural gas use in the Texas power sector and development in the Eagle Ford shale. Environ Sci Technol. 2015;49(6):3966-3973.

22. Pacsi AP, Alhajeri NS, Zavala-Araiza D, Webster MD, Allen DT. Regional air quality impacts of increased natural gas production and use in Texas. Environ Sci Technol. 2013;47(7):3521-3527.

23. Kemball-Cook S, Bar-Ilan A, Grant J, et al. Ozone impacts of natural gas development in the Haynesville shale. Environ Sci Technol. 2010;44(24):9357-9363.

24. Howarth RW, Santoro R, Ingraffea A. Methane and the greenhouse-gas footprint of natural gas from shale formations. Clim Change. 2011;106(4):679-690.

25. Allen DT, Torres VM, Thomas J, et al. Measurements of methane emissions at natural gas production sites in the United States. Proc Natl Acad Sci U S A. 2013;110(44):17768-17773.

26. Jiang M, Griffin WM, Hendrickson C, Jaramillo P, VanBriesen J, Venkatesh A. Life cycle greenhouse gas emissions of Marcellus shale gas. Environmental Research Letters. 2011;6(3):034014.

27. Karion A, Sweeney C, Pétron G, et al. Methane emissions estimate from airborne measurements over a western United States natural gas field. Geophys Res Lett. 2013;40(16):4393-4397.

28. Sangaramoorthy T, Jamison AM, Boyle MD, et al. Place-based perceptions of the impacts of fracking along the Marcellus shale. Soc Sci Med. 2016.

29. Graham J, Irving J, Tang X, et al. Increased traffic accident rates associated with shale gas drilling in Pennsylvania. Accident Analysis & Prevention. 2015;74:203-209.

30. Brasier KJ, Rhubart D. Effects of Marcellus shale development on the criminal justice system (The Marcellus Impacts Project Report #6). 2014.

31. Price M, Basurto L, Herzenberg S, Polson D, Ward S, Wazeter E. The shale tipping point: The relationship of drilling to crime, traffic fatalities, STDs, and rents in Pennsylvania, West Virginia, and Ohio. (The Multi-State Shale Research Collaborative).

32. Muehlenbachs L, Spiller E, Timmins C. The housing market impacts of shale gas development. Am Econ Rev. 2015;105(12):3633-59.

33. Gopalakrishnan S, Klaiber HA. Is the shale energy boom a bust for nearby residents? Evidence from housing values in Pennsylvania. Am J Agric Econ. 2014;96(1):43-66.

34. Couch SR, Coles CJ. Community stress, psychosocial hazards, and EPA decision-making in communities impacted by chronic technological disasters. Am J Public Health. 2011;101(S1).

35. Esswein EJ, Breitenstein M, Snawder J, Kiefer M, Sieber WK. Occupational exposures to respirable crystalline silica during hydraulic fracturing. Journal of Occupational and Environmental Hygiene. 2013;10(7):347-356.

36. Hays J, Shonkoff SB. Toward an understanding of the environmental and public health impacts of unconventional natural gas development: A categorical assessment of the peer-reviewed scientific literature, 2009-2015. PLoS One. 2016;11(4):e0154164.

37. Fryzek J, Pastula S, Jiang X, Garabrant DH. Childhood cancer incidence in Pennsylvania counties in relation to living in counties with hydraulic fracturing sites. Journal of Occupational and Environmental Medicine. 2013;55(7):796-801.

38. Rabinowitz PM, Slizovskiy IB, Lamers V, et al. Proximity to natural gas wells and reported health status: Results of a household survey in Washington County, Pennsylvania. Environ Health Perspect. 2014.

39. McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environ Health Perspect. 2014.

40. Stacy SL, Brink LL, Larkin JC, et al. Perinatal outcomes and unconventional natural gas operations in southwest Pennsylvania. PLoS One. 2015;10(6):e0126425.

41. Finkel M. Shale gas development and cancer incidence in southwest Pennsylvania. Public Health. 2016;141:198-206.

42. Jemielita T, Gerton GL, Neidell M, et al. Unconventional gas and oil drilling is associated with increased hospital utilization rates. PLoS One. 2015;10(7):e0131093.

43. Saberi P, Propert KJ, Powers M, Emmett E, Green-McKenzie J. Field survey of health perception and complaints of Pennsylvania residents in the Marcellus shale region. Int J Environ Res Public Health. 2014;11(6):6517-6527.

44. Casey JA, Savitz DA, Rasmussen SG, et al. Unconventional natural gas development and birth outcomes in Pennsylvania, USA. Epidemiology. 2015.

45. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2016.

46. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: A review of methods and applications. Annu Rev Public Health. 2015(0).

47. Liu AY, Curriero FC, Glass TA, Stewart WF, Schwartz BS. The contextual influence of coal abandoned mine lands in communities and type 2 diabetes in Pennsylvania. Health Place. 2013.

48. Liu AY, Curriero FC, Glass TA, Stewart WF, Schwartz BS. Associations of the burden of coal abandoned mine lands with three dimensions of community context in Pennsylvania. ISRN Public Health. 2012;2012.

49. Nau C, Schwartz BS, Bandeen-Roche K, et al. Community socioeconomic deprivation and obesity trajectories in children using electronic health records. Obesity. 2015;23(1):207-212.

50. Nau C, Ellis H, Huang H, et al. Exploring the forest instead of the trees: An innovative method for defining obesogenic and obesoprotective environments. Health Place. 2015;35:136-146.

51. Schwartz BS, Stewart WF, Godby S, et al. Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med. 2011;41(4):e17-e28.

52. Casey JA, Curriero FC, Cosgrove SE, Nachman KE, Schwartz BS. High-density livestock operations, crop field application of manure, and risk of community-associated methicillin-resistant Staphylococcus aureus infection in Pennsylvania. JAMA Internal Medicine. 2013;173(21):1980-1990.

53. Casey JA, Shopsin B, Cosgrove SE, et al. High-density livestock production and molecularly characterized MRSA infections in Pennsylvania. Environ Health Perspect. 2014;122(5):464-470.

54. Casey JA, James P, Rudolph KE, Wu CD, Schwartz BS. Greenness and birth outcomes in a range of Pennsylvania communities. Int J Environ Res Public Health. 2016;13(3):10.3390/ijerph13030311.

55. Hirsch AG, Stewart WF, Sundaresan AS, et al. Nasal and sinus symptoms and chronic rhinosinusitis in a population-based sample. Allergy. 2016.

56. Rasmussen SG, Ogburn EL, McCormack M, et al. Association between unconventional natural gas development in the Marcellus shale and asthma exacerbations. JAMA Intern Med. 2016;176(9):1334-1343.

57. Pratt LA, Brody DJ. Depression in the U.S. household population, 2009-2012. NCHS Data Brief. 2014;(172):1-8.

58. Moorman JE, Akinbami LJ, Bailey CM, et al. National surveillance of asthma: United States, 2001-2010. National Center for Health Statistics, Vital Health Stat. 2012;3:35.

59. Dougherty R, Fahy JV. Acute exacerbations of asthma: Epidemiology, biology and the exacerbation‐prone phenotype. Clinical & Experimental Allergy. 2009;39(2):193-202.

60. National Heart, Lung, and Blood Institute. National Asthma Education and Prevention Program. Expert Panel on the Management of Asthma. Expert panel report 3: Guidelines for the diagnosis and management of asthma: Full report. US Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute; 2007.

61. Taylor D, Bateman E, Boulet L, et al. A new perspective on concepts of asthma severity and control. European Respiratory Journal. 2008;32(3):545-554.

62. Reddel HK, Taylor DR, Bateman ED, et al. An official American Thoracic Society/European Respiratory Society statement: Asthma control and exacerbations: Standardizing endpoints for clinical asthma trials and clinical practice. American Journal of Respiratory and Critical Care Medicine. 2009;180(1):59-99.

63. Jackson DJ, Sykes A, Mallia P, Johnston SL. Asthma exacerbations: Origin, effect, and prevention. J Allergy Clin Immunol. 2011;128(6):1165-1174.

64. Fuhlbrigge A, Peden D, Apter AJ, et al. Asthma outcomes: Exacerbations. J Allergy Clin Immunol. 2012;129(3):S34-S48.

65. Barnett AG, Williams GM, Schwartz J, et al. Air pollution and child respiratory health: A case-crossover study in Australia and New Zealand. American Journal of Respiratory and Critical Care Medicine. 2005;171(11):1272-1278.

66. Chew F, Goh D, Ooi B, Saharom R, Hui J, Lee B. Association of ambient air-pollution levels with acute asthma exacerbation among children in Singapore. Allergy. 1999;54(4):320-329.

67. Clark NA, Demers PA, Karr CJ, et al. Effect of early life exposure to air pollution on development of childhood asthma. Environ Health Perspect. 2010;118(2):284.

68. Gent JF, Triche EW, Holford TR, et al. Association of low-level ozone and fine particles with respiratory symptoms in children with asthma. JAMA: The Journal of the American Medical Association. 2003;290(14):1859-1867.

69. Halonen JI, Lanki T, Yli-Tuomi T, Kulmala M, Tiittanen P, Pekkanen J. Urban air pollution, and asthma and COPD hospital emergency room visits. Thorax. 2008;63(7):635-641.

70. Jalaludin B, Khalaj B, Sheppeard V, Morgan G. Air pollution and ED visits for asthma in Australian children: A case-crossover analysis. Int Arch Occup Environ Health. 2008;81(8):967-974.

71. Ko F, Tam W, Wong T, et al. Effects of air pollution on asthma hospitalization rates in different age groups in Hong Kong. Clinical & Experimental Allergy. 2007;37(9):1312-1319.

72. Magas OK, Gunter JT, Regens JL. Ambient air pollution and daily pediatric hospitalizations for asthma. Environ Sci Pollut Res Int. 2007;14(1):19-23.

73. Ostro B, Lipsett M, Mann J, Braxton-Owens H, White M. Air pollution and exacerbation of asthma in African-American children in Los Angeles. Epidemiology. 2001;12(2):200-208.

74. Petroeschevsky A, Simpson RW, Thalib L, Rutherford S. Associations between outdoor air pollution and hospital admissions in Brisbane, Australia. Archives of Environmental Health: An International Journal. 2001;56(1):37-52.

75. Schildcrout JS, Sheppard L, Lumley T, Slaughter JC, Koenig JQ, Shapiro GG.

Ambient air pollution and asthma exacerbations in children: An eight-city analysis. Am J

Epidemiol. 2006;164(6):505-517.

76. Thurston GD, Lippmann M, Scott MB, Fine JM. Summertime haze air pollution and

children with asthma. Am J Respir Crit Care Med. 1997;155(2):654-660.

77. Samoli E, Nastos P, Paliatsos A, Katsouyanni K, Priftis K. Acute effects of air

pollution on pediatric asthma exacerbation: Evidence of association and effect

modification. Environ Res. 2011;111(3):418-424.

78. Tenias JM, Ballester F, Rivera ML. Association between hospital emergency visits

for asthma and air pollution in Valencia, Spain. Occup Environ Med. 1998;55(8):541-

547.

79. Meng YY, Rull RP, Wilhelm M, Lombardi C, Balmes J, Ritz B. Outdoor air pollution

and uncontrolled asthma in the San Joaquin valley, California. J Epidemiol Community

Health. 2010;64(2):142-147.

80. Friedman MS, Powell KE, Hutwagner L, Graham LM, Teague WG. Impact of changes in transportation and commuting behaviors during the 1996 summer Olympic

Games in Atlanta on air quality and childhood asthma. JAMA. 2001;285(7):897-905.

81. Gouveia N, Fletcher T. Respiratory diseases in children and outdoor air pollution in

Sao Paulo, brazil: A time series analysis. Occup Environ Med. 2000;57(7):477-483.

82. Jaffe DH, Singer ME, Rimm AA. Air pollution and emergency department visits for asthma among Ohio Medicaid recipients, 1991–1996. Environ Res. 2003;91(1):21-28.


83. Abe T, Tokuda Y, Ohde S, Ishimatsu S, Nakamura T, Birrer RB. The relationship of short-term air pollution and weather to ED visits for asthma in Japan. Am J Emerg Med. 2009;27(2):153-159.

84. Delfino RJ, Gong H Jr, Linn WS, Pellizzari ED, Hu Y. Asthma symptoms in Hispanic children and daily ambient exposures to toxic and criteria air pollutants. Environ Health Perspect. 2003;111(4):647-656.

85. Fauroux B, Sampil M, Quenel P, Lemoullec Y. Ozone: A trigger for hospital pediatric asthma emergency room visits. Pediatr Pulmonol. 2000;30(1):41-46.

86. Stieb DM, Burnett RT, Beveridge RC, Brook JR. Association between ozone and asthma emergency department visits in Saint John, New Brunswick, Canada. Environ Health Perspect. 1996;104(12):1354-1360.

87. Chen E, Miller GE. Stress and inflammation in exacerbations of asthma. Brain Behav Immun. 2007;21(8):993-999.

88. Wisnivesky JP, Lorenzo J, Feldman JM, Leventhal H, Halm EA. The relationship between perceived stress and morbidity among adult inner-city asthmatics. J Asthma. 2010;47(1):100-104.

89. Sandberg S, Jarvenpaa S, Penttinen A, Paton JY, McCann DC. Asthma exacerbations in children immediately following stressful life events: A Cox's hierarchical regression. Thorax. 2004;59(12):1046-1051.

90. Sandberg S, Paton JY, Ahola S, et al. The role of acute and chronic stress in asthma attacks in children. Lancet. 2000;356(9234):982-987.


91. Apter AJ, Garcia LA, Boyd RC, Wang X, Bogen DK, Ten Have T. Exposure to community violence is associated with asthma hospitalizations and emergency department visits. J Allergy Clin Immunol. 2010;126(3):552-557.

92. Kopel LS, Gaffin JM, Ozonoff A, et al. Perceived neighborhood safety and asthma morbidity in the School Inner-City Asthma Study. Pediatr Pulmonol. 2015;50(1):17-24.

93. Weil CM, Wade SL, Bauman LJ, Lynn H, Mitchell H, Lavigne J. The relationship between psychosocial factors and asthma morbidity in inner-city children with asthma. Pediatrics. 1999;104(6):1274-1280.

94. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub; 2013.

95. Murray CJ, Abraham J, Ali MK, et al. The state of US health, 1990-2010: Burden of diseases, injuries, and risk factors. JAMA. 2013;310(6):591-606.

96. Kroenke K, Strine TW, Spitzer RL, Williams JB, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1):163-173.

97. González HM, Vega WA, Williams DR, Tarraf W, West BT, Neighbors HW. Depression care in the United States: Too little for too few. Arch Gen Psychiatry. 2010;67(1):37-46.

98. Kim D. Blues from the neighborhood? Neighborhood characteristics and depression. Epidemiol Rev. 2008;30:101-117.


99. Richardson R, Westley T, Gariépy G, Austin N, Nandi A. Neighborhood socioeconomic conditions and depression: A systematic review and meta-analysis. Soc Psychiatry Psychiatr Epidemiol. 2015;50(11):1641-1656.

100. Speldewinde PC, Cook A, Davies P, Weinstein P. A relationship between environmental degradation and mental health in rural Western Australia. Health Place. 2009;15(3):880-887.

101. Hendryx M, Innes-Wimsatt KA. Increased risk of depression for people living in coal mining areas of central Appalachia. Ecopsychology. 2013;5(3):179-187.

102. Bandiera FC, Richardson AK, Lee DJ, He J, Merikangas KR. Secondhand smoke exposure and mental health among children and adolescents. Arch Pediatr Adolesc Med. 2011;165(4):332-338.

103. Fan AZ, Prescott MR, Zhao G, Gotway CA, Galea S. Individual and community-level determinants of mental and physical health after the Deepwater Horizon oil spill: Findings from the Gulf States Population Survey. J Behav Health Serv Res. 2015;42(1):23-41.

104. Rung AL, Gaston S, Oral E, et al. Depression, mental distress and domestic conflict among Louisiana women exposed to the Deepwater Horizon oil spill in the WaTCH study. Environ Health Perspect. 2016.

105. Cerdá M, Paczkowski M, Galea S, Nemethy K, Péan C, Desvarieux M. Psychopathology in the aftermath of the Haiti earthquake: A population-based study of posttraumatic stress disorder and major depression. Depress Anxiety. 2013;30(5):413-424.


106. Golub NI, Winters PC, van Wijngaarden E. A population-based study of blood lead levels in relation to depression in the United States. Int Arch Occup Environ Health. 2010;83(7):771-777.

107. Jonas BS, Wagener DK, Lando JF, Feldman JJ. Symptoms of anxiety and depression as risk factors for development of asthma. J Appl Biobehav Res. 1999;4(2):91-110.

108. Patten SB, Williams JV, Lavorato DH, Modgill G, Jetté N, Eliasziw M. Major depression as a risk factor for chronic disease incidence: Longitudinal analyses in a general population cohort. Gen Hosp Psychiatry. 2008;30(5):407-413.

109. Loerbroks A, Apfelbacher CJ, Bosch JA, Sturmer T. Depressive symptoms, social support, and risk of adult asthma in a population-based cohort study. Psychosom Med. 2010;72(3):309-315.

110. Brumpton BM, Leivseth L, Romundstad PR, et al. The joint association of anxiety, depression and obesity with incident asthma in adults: The HUNT study. Int J Epidemiol. 2013;42(5):1455-1463.

111. Coogan PF, Yu J, O'Connor GT, Brown TA, Palmer JR, Rosenberg L. Depressive symptoms and the incidence of adult-onset asthma in African American women. Ann Allergy Asthma Immunol. 2014;112(4):333-338.e1.

112. Brunner WM, Schreiner PJ, Sood A, Jacobs DR Jr. Depression and risk of incident asthma in adults. The CARDIA study. Am J Respir Crit Care Med. 2014;189(9):1044-1051.


113. Walters P, Schofield P, Howard L, Ashworth M, Tylee A. The relationship between asthma and depression in primary care patients: A historical cohort and nested case control study. PLoS One. 2011;6(6):e20750.

114. Gao Y, Zhao H, Zhang F, et al. The relationship between depression and asthma: A meta-analysis of prospective studies. PLoS One. 2015;10(7):e0132424.

115. Ahmedani BK, Peterson EL, Wells KE, Williams LK. Examining the relationship between depression and asthma exacerbations in a prospective follow-up study. Psychosom Med. 2013;75(3):305-310.

116. Mitka M. Rigorous evidence slim for determining health risks from natural gas fracking. JAMA. 2012;307(20).

117. Kovats S, Depledge M, Haines A, et al. The health implications of fracking. Lancet. 2014;383(9919):757-758.


Chapter 2: Detailed Methods

2.0 Chapter overview

This chapter describes the data sources used in this research and provides detailed methods, with their rationales, for the analyses in Chapters 3-6. It covers only methods that are not described in Chapters 3-6.

2.1 Data sources

2.1.1 Geisinger Clinic

The Geisinger Clinic provides primary care services to over 450,000 patients in more than 35 counties of central and northeastern Pennsylvania, covering a range of place types (townships, boroughs, and cities). Geisinger has had an electronic health record (EHR) since 2001. The EHR captures data at all clinical encounters (inpatient, outpatient, emergency, urgent care, and telephone).1 The EHR is a powerful source of data for epidemiologic studies because it covers a large sample, can be obtained relatively inexpensively, is longitudinal, contains detailed health information, and is already in electronic form.2

2.1.2 Pennsylvania Department of Environmental Protection

The Pennsylvania Department of Environmental Protection (DEP) is the state agency responsible for enforcing the state's environmental laws.

2.1.2.1 Natural gas wells

Oil and gas development in Pennsylvania is regulated under the Pennsylvania Oil and Gas Act (58 P.S. §§ 601.101-601.607), which was passed in 1984, overhauled in 2012 with the passage of Act 13, and further updated by the Unconventional Well Report Act passed in 2014.3-5 Under these laws, well operators must submit to the DEP a well record form within 30 days of drilling a well and a well completion report within 30 days of stimulating a well. The well record form collects information on the well's location, permit number, depth, and the date drilling started (spud date) (Figure 2.1.2.1). The well completion report collects information on the stimulation phase, including the date of stimulation (Figure 2.1.2.2).

Reporting requirements for natural gas production have changed over time. Before July 2010, production was reported in yearlong increments. From July 2010 to December 2014, production was reported in 6-month increments (January-June and July-December). As of January 2015, production is reported monthly. The DEP compiles the spud and production data electronically, and the data is available for download on its website (https://www.paoilandgasreporting.state.pa.us/publicreports).

Figure 2.1.2.1. The well record form.


Figure 2.1.2.2. The well completion report.

2.1.2.2 Compressors

Compressors have been documented to be an important source of air pollution from unconventional natural gas development (UNGD).6,7 As emitters of air pollution, compressor stations in Pennsylvania are required to obtain a plan approval and an operating permit from the DEP. Smaller compressor stations (non-major facilities) can obtain a General Permit for Air Pollution Control in Natural Gas Compression and/or Processing Facilities (referred to as a GP-5 permit). This general permit serves as both a plan approval and an operating permit; companies can apply for it as long as the facility meets the qualifying criteria, which include limits (that vary by county) on the yearly emissions of several pollutants. To obtain a GP-5, companies must submit a suite of standardized documents, including an application and a General Information Form (GIF). The GP-5 offers faster approval times than the non-general plan approvals used by larger compressor stations. Non-general plan approvals do not have standardized forms.8

2.1.2.3 Municipal water supply

As the agency in charge of enforcing the state's water quality laws, the DEP maintains a shapefile of areas in the state with a public water supply. We downloaded the shapefile from its website (http://www.depgis.state.pa.us/emappa/). We used this information to identify patients who were likely using groundwater for drinking water: those whose residences geocoded to areas outside these public water supply boundaries.

2.1.3 Pennsylvania Department of Conservation and Natural Resources

After the DEP receives a completion report (described in Section 2.1.2.1) from a well operator, it sends the report to the Pennsylvania Department of Conservation and Natural Resources (DCNR). The DCNR is charged with analyzing and interpreting the geologic information provided on the completion report. To do so, the DCNR maintains the Internet Record Imaging System/Wells Information System (PA*IRIS/WIS), which was renamed the Exploration and Development Well Information Network (EDWIN) in 2016. PA*IRIS/WIS is a subscription service that provides electronic access to scanned images of Location Plats, Completion Reports, Geophysical Logs, and Plugging Certificates, as well as to an electronic database of the variables contained on the completion report.9

2.1.4 U.S. Census data

We used Census data to create community-level variables. As required by the Constitution, the census is conducted every 10 years. Starting in 2010, the census contained only 10 questions on name, sex, age, Hispanic origin, race, relationship, and home ownership; questions on economic, housing, and social characteristics were moved to the American Community Survey, a continuous monthly survey.10 In this thesis, we used data from the 2010 American Community Survey (5-year estimates) to create area-level social environmental variables.

2.1.5 Federal Highway Administration

The Federal Highway Administration is the agency within the U.S. Department of Transportation that is in charge of the highway system. We downloaded a shapefile of the highway system from the Federal Highway Administration's Highway Performance Monitoring System website (http://www.fhwa.dot.gov/policyinformation/hpms/shapefiles.cfm). We used this information to create metrics for distance to major and minor roadways.

2.1.6 U.S. Department of Agriculture National Agricultural Imagery Program

The U.S. Department of Agriculture's National Agricultural Imagery Program (NAIP) provides publicly available aerial imagery of the continental U.S. at 1-meter resolution during the agricultural growing season. The imagery can be downloaded from the program's website (https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/index). For Pennsylvania, aerial imagery is available for 2005, 2008, 2010, and 2013. We used this imagery to confirm the location of well pads.

2.1.7 Satellite data

To create area-level variables, we used data from three satellites: Landsat 7, Suomi NPP, and Terra. NASA's Landsat program is the longest-running program collecting satellite imagery (both visible and infrared) of the Earth, and Landsat 7 is one of the satellites in this program.11 We used the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP satellite to identify natural gas flares.12 The Terra satellite carries the Moderate-resolution Imaging Spectroradiometer (MODIS), which is used to create a measure of greenness (normalized difference vegetation index [NDVI]) every 16 days at 250-meter resolution.13 Satellite data is available from NASA's Earth Observing System Clearing House (https://reverb.echo.nasa.gov/reverb/).
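For reference, NDVI is the standard normalized ratio of near-infrared (NIR) and red surface reflectance; MODIS computes it from its red and near-infrared bands:

```latex
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{red}}}
```

Because reflectances are non-negative, NDVI ranges from -1 to 1, with higher values indicating denser green vegetation.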

2.1.8 Environmental Protection Agency

The Environmental Protection Agency (EPA) is the federal agency in charge of enforcing environmental laws. The EPA was the source of several types of data for this research, as follows.

2.1.8.1 Air quality monitoring network

The National Ambient Air Quality Standards (NAAQS) are the federal standards for criteria air pollutants (carbon monoxide, lead, nitrogen dioxide, ozone, particulate matter, and sulfur dioxide), and the EPA requires that states maintain a monitoring network for these pollutants. Data from these monitors, as well as monitor locations, are available from the EPA's AirData website (https://aqs.epa.gov/api).

2.1.8.2 National Emissions Inventory

The National Emissions Inventory (NEI) is a database of sources that emit criteria and hazardous air pollutants, including each source's type, location, and emissions. It is used as an input to air pollution models and is available on the EPA's website (https://www.epa.gov/air-emissions-inventories). We considered using these data as a source of air pollution emissions in Chapter 5.

2.2 Data acquisition

2.2.1 Geisinger EHR data

The Geisinger EHR data was provided in 13 files: vitals, family history, demographics, contact information, procedures, social history, encounter diagnoses, outpatient encounters, hospital encounters, medication orders, medication order diagnoses, medication record, lab orders, and problem list. Each patient had a study identification number, which was used to link patients across datasets, and each encounter had an encounter identification number, which was used to link information about encounters across datasets.
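To illustrate how the two identifiers link the files, the following minimal Python/pandas sketch joins hypothetical extracts of the demographics, encounter diagnoses, and hospital encounters files. The file contents and column names (`study_id`, `encounter_id`, `icd9`, `admit_date`) are assumptions for illustration, not the actual Geisinger file layout.

```python
import pandas as pd

# Hypothetical extracts of three of the EHR files; contents and column names are illustrative.
demographics = pd.DataFrame({"study_id": [1, 2, 3], "sex": ["F", "M", "F"]})
encounter_dx = pd.DataFrame({
    "study_id": [1, 1, 3],
    "encounter_id": [101, 102, 301],
    "icd9": ["493.90", "401.9", "493.92"],
})
hospital_enc = pd.DataFrame({
    "study_id": [1, 3],
    "encounter_id": [102, 301],
    "admit_date": pd.to_datetime(["2010-03-01", "2011-07-15"]),
})

# Patient-level link: the study ID ties each diagnosis row to exactly one demographics row.
linked = encounter_dx.merge(demographics, on="study_id", how="left", validate="many_to_one")

# Encounter-level link: the encounter ID ties diagnoses to the matching hospital encounter.
# Encounters that were not hospitalizations end up with a missing admit_date.
linked = linked.merge(hospital_enc, on=["study_id", "encounter_id"], how="left")
```

The `validate="many_to_one"` argument makes pandas raise an error if a study ID ever appeared twice in the demographics extract, which is a cheap guard against silently duplicating encounters during linkage.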

2.2.2 Crowdsourced data on impoundments and well pads from SkyTruth

We collected the impoundment and well pad data in partnership with SkyTruth. For impoundments, SkyTruth created a collaborative image analysis application on its website (skytruth.org) that displayed aerial imagery, collected by the USDA National Agricultural Imagery Program,14 of the 1-square-kilometer area around UNG wells from the summers of 2005, 2008, 2010, and 2013 (Figure 2.2.2). Trained volunteers and staff identified and outlined impoundments. Each image was reviewed by at least three staff members or ten volunteers; an identification required 66.6% agreement among staff or 70% agreement among volunteers, and assignments were validated by a GIS analyst before inclusion in the final dataset. The methods were the same for well pads, except that volunteers identified only points (not outlines) of well pads.
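The agreement rule can be expressed as a small check. The Python sketch below assumes that "agreement" means the share of reviewers who identified the same feature, and it applies the reviewer counts and thresholds stated above; the function name is hypothetical, and the final GIS-analyst validation step is not modeled.

```python
def consensus_reached(n_reviewers: int, n_agree: int, staff: bool) -> bool:
    """Check the stated agreement rule for one crowdsourced identification.

    Assumed thresholds (from the text): at least 3 staff reviewers with at least
    66.6% agreement, or at least 10 volunteer reviewers with at least 70% agreement.
    The subsequent GIS-analyst validation is not modeled.
    """
    min_reviewers, min_share = (3, 0.666) if staff else (10, 0.70)
    if n_reviewers < min_reviewers:
        return False
    return n_agree / n_reviewers >= min_share


# 2 of 3 staff agree (66.7%): accepted. 6 of 10 volunteers agree (60%): rejected.
print(consensus_reached(3, 2, staff=True), consensus_reached(10, 6, staff=False))
```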


Figure 2.2.2. Screenshot of the SkyTruth application used to crowdsource impoundment locations.

2.2.3 Compressor data

We started with a list of UNGD-related compressor stations from the DEP to define our population of compressor stations (n=506). To characterize the compressor stations in our population, we needed to collect the following variables on each compressor engine: station name, location, compressor engine horsepower, compressor engine emissions, and start and stop dates of operation. These variables appear in several different documents in each station's files, including applications, GIFs, authorization letters, start letters, and cancelation letters. Applications, GIFs, and plan approvals are submitted before the construction of a new compressor station, before the alteration of an existing compressor station, or to renew a station's permit. They contain information on the station location, number of compressor engines, compressor engine horsepower, compressor engine emissions, and expected start date of operation. Authorization letters are sent from the DEP to the company notifying it that its permit is authorized. Start letters are sent from the company to the DEP notifying it that an engine has started operation. Cancelation letters are sent from the company to the DEP notifying it that an engine has stopped operation.

These documents are not available electronically. Paper copies are kept in files at four DEP regional offices: Northeast, North-central, Northwest, and Southwest. We also scanned other documents that, during file review, were found to contain information on station location, number of compressor engines, compressor engine horsepower, compressor engine emissions, start date of operation, and stop date of operation. Between October 28, 2013 and May 1, 2014, we made a total of 17 visits (each lasting between two days and one week) to the DEP regional offices and scanned a total of 6,007 documents on our population of compressor stations.

2.3 Data processing

2.3.1 Creation of the unconventional natural gas well dataset

To estimate patients' UNGD activity metrics, we needed complete data on the location, dates of development, grouping of wells on well pads, total depth, and production quantities of all drilled unconventional natural gas wells in Pennsylvania. However, these variables were not available from a single dataset, the datasets containing them covered different populations of wells, and each dataset had variables with missing values. To create a complete well dataset, we merged data from several sources and abstracted, extrapolated, or imputed missing variables using the methods described below. We initially created a complete well dataset through June 2013, which identified 6,915 spudded wells. We then updated the well data twice: first through December 2014, which identified 8,888 spudded wells, and then through December 2015, which identified 9,669 spudded wells.

2.3.1.1 Well data sources

In our analysis, we used data on well latitude and longitude; well pad latitude and longitude; dates of spudding, stimulation, and production; total depth; and volume of natural gas produced and number of production days. All variables except total depth, date of stimulation, and well pad latitude and longitude were available electronically from the PA DEP. Total depth and date of stimulation were available from the Pennsylvania Internet Record Imaging System (PA*IRIS). Well pad data came from SkyTruth, which crowdsourced the identification of well pad locations in U.S. Department of Agriculture aerial photographs from 2005, 2008, 2010, and 2013 (Section 2.2.2).15 We merged data across sources using the API number, a unique well identifier.

2.3.1.2 Inclusion criteria

Because the spud, stimulation, and production reports included different populations of wells, we used the following inclusion criteria to identify unconventional wells:

Wells with a spud date and marked unconventional in the spud report; OR

Wells with a total depth greater than 10,848.5 feet (the median depth), well type of gas, and a stimulation date in the stimulation report; OR

Wells with at least one non-zero production period and marked unconventional in the production report.

We merged data on wells that met at least one inclusion criterion from the spud, stimulation, and production reports using wells’ permit numbers.
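The three inclusion criteria can be sketched as a single predicate in Python. This assumes a hypothetical merged record per permit number; the field names, and the encoding of the well type as the string "GAS", are illustrative assumptions rather than the DEP report layouts.

```python
from dataclasses import dataclass
from typing import Optional

MEDIAN_DEPTH_FT = 10_848.5  # median total depth, as stated in the text


@dataclass
class MergedWell:
    """Hypothetical merged record for one permit number; field names are illustrative."""
    spud_date: Optional[str] = None
    unconventional_in_spud_report: bool = False
    total_depth_ft: Optional[float] = None
    well_type: Optional[str] = None
    stimulation_date: Optional[str] = None
    nonzero_production_periods: int = 0
    unconventional_in_production_report: bool = False


def meets_inclusion_criteria(w: MergedWell) -> bool:
    # Criterion 1: spud date present and flagged unconventional in the spud report.
    from_spud = w.spud_date is not None and w.unconventional_in_spud_report
    # Criterion 2: deeper than the median, a gas well, and a stimulation date reported.
    from_stim = (
        w.total_depth_ft is not None
        and w.total_depth_ft > MEDIAN_DEPTH_FT
        and w.well_type == "GAS"  # assumed encoding of "well type of gas"
        and w.stimulation_date is not None
    )
    # Criterion 3: at least one non-zero production period, flagged unconventional.
    from_prod = (w.nonzero_production_periods >= 1
                 and w.unconventional_in_production_report)
    return from_spud or from_stim or from_prod
```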


2.3.1.3 Creation of well variables

The variables in each of the three iterations of the UNGD datasets that were most important to UNGD activity assessment included: well permit number, well location, well pad identification number, well pad location, spud date, total depth, stimulation date, production start date, and production quantities. We required the spud date to be before the stimulation date, which was required to be before the production start date. We also created indicator variables to identify the data source (present in the original report, abstracted, extrapolated, or imputed) for the spud date, total depth, and stimulation date variables.

2.3.1.3.1 Well latitude and longitude

The spud, production, and stimulation reports all contained latitudes and longitudes. If the largest difference between any two of the decimal latitudes, and between any two of the decimal longitudes, was less than 0.001 degrees (approximately 100 meters), we took the average of the latitudes and longitudes. For wells with differences in the latitudes or longitudes greater than or equal to 0.001 (n = 35), we used Google Earth to locate the well pad and then used the latitude and longitude from Google Earth. If none of the locations looked like a well pad (n = 3), we used the latitude and longitude from the spud report.
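The averaging rule can be sketched in Python as follows, assuming the coordinates from the three reports are already in decimal degrees. Returning `None` stands in for the manual Google Earth review, and the function name is hypothetical. For scale, 0.001 degrees is about 111 meters of latitude and about 84 meters of longitude near 41 degrees N, consistent with the "approximately 100 meters" above.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

TOLERANCE_DEG = 0.001  # ~111 m of latitude; ~84 m of longitude near 41 degrees N


def reconcile_location(lats: Sequence[float],
                       lons: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Average the report coordinates when they agree; otherwise flag for manual review.

    The largest difference between any two values equals max - min, so one
    comparison per coordinate covers every pair of reports.
    """
    if max(lats) - min(lats) < TOLERANCE_DEG and max(lons) - min(lons) < TOLERANCE_DEG:
        return mean(lats), mean(lons)
    return None  # stand-in for locating the pad in Google Earth
```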

2.3.1.3.2 Spud date

All wells in the dataset were required to have a spud date. If a spud date was after the well's stimulation date or after its start date of production, we deleted the spud date and treated it as missing. To fill in missing spud dates, we first calculated, for wells not missing those dates, the median number of days from spud date to stimulation date and from spud date to production start date by time period (2009 and before vs. 2010 and after) and region (north vs. east; Figure 2.3.3.2.1) (Tables 2.3.3.2.1 and 2.3.3.2.2). For wells missing a spud date but not a stimulation date, we extrapolated the spud date by subtracting the median days from spud to stimulation for the well's time period and region from the well's stimulation date. For wells missing both a spud date and a stimulation date, we imputed the spud date by subtracting the median days from spud to start date of production for the well's time period and region from the well's start date of production. To update the well dataset in 2014 and 2015, we searched the spud report for wells with a previously extrapolated spud date, and if we found a spud date there, we replaced the extrapolated spud date with the reported one. The percentage of wells with a spud date present in the spud report (rather than extrapolated or imputed) remained roughly constant at about 98% over the three iterations of the dataset (Table 2.3.3.2.3).
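The following Python sketch shows the spud date logic using the medians reported for the 2013 well dataset, treating Table 2.3.3.2.2 as the spud-to-production medians, as the text describes. Two details are assumptions not fixed by the text: the time period is taken from the year of the date being subtracted from, and the region labels `"north"` and `"east"` are placeholders for the county groupings in Figure 2.3.3.2.1.

```python
from datetime import date, timedelta
from typing import Optional

# Median days keyed by (region, period), as reported for the 2013 well dataset.
SPUD_TO_STIM = {("north", "early"): 179, ("north", "late"): 192,
                ("east", "early"): 111, ("east", "late"): 244}   # Table 2.3.3.2.1
SPUD_TO_PROD = {("north", "early"): 202, ("north", "late"): 330,
                ("east", "early"): 143, ("east", "late"): 334}   # Table 2.3.3.2.2


def period(d: date) -> str:
    # Assumption: the period comes from the year of the date being subtracted from.
    return "early" if d.year <= 2009 else "late"


def fill_spud_date(region: str, spud: Optional[date], stim: Optional[date],
                   prod_start: Optional[date]) -> Optional[date]:
    # Treat a spud date that falls after stimulation or production start as missing.
    if spud is not None and ((stim is not None and spud > stim)
                             or (prod_start is not None and spud > prod_start)):
        spud = None
    if spud is not None:
        return spud
    if stim is not None:  # extrapolate from the stimulation date
        return stim - timedelta(days=SPUD_TO_STIM[(region, period(stim))])
    if prod_start is not None:  # impute from the production start date
        return prod_start - timedelta(days=SPUD_TO_PROD[(region, period(prod_start))])
    return None
```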

Figure 2.3.3.2.1. Counties considered eastern and northern for the purposes of well variable imputation and extrapolation.

Table 2.3.3.2.1. Median days from spud to stimulation by year and region, based on the 2013 well dataset

Region      Production start date in 2009 and earlier    Production start date in 2010 and later
Northern    179                                           192
Eastern     111                                           244


Table 2.3.3.2.2. Median days from spud to production start by year and region, based on the 2013 well dataset

Region      Production start date in 2009 and earlier    Production start date in 2010 and later
Northern    202                                           330
Eastern     143                                           334

Table 2.3.3.2.3. Spud date missingness, percent (number), by dataset iteration

                                        2013 well dataset   2014 well dataset   2015 well dataset
Spud date not missing in spud report    97.8 (6,766)        98.3 (8,733)        98.4 (9,512)
Spud date missing in spud report        2.2 (149)           1.7 (155)           1.6 (157)
Total wells in dataset                  100 (6,915)         100 (8,888)         100 (9,669)

2.3.1.3.3 Total depth

All wells in the dataset were required to have a total depth. The total depth variable is from PA*IRIS/WIS. For wells missing this variable, we looked up and abstracted total depths from the scanned forms in PA*IRIS/WIS. We imputed the remaining missing total depths using the predictions from a regression of total depth on county (indicator variables) and spud year (indicator variables). As with spud dates, to update the well dataset in 2014 and 2015, we searched the stimulation report for wells with a previously imputed total depth, and we replaced the imputed total depth with the total depth in the stimulation report if one was present. The percentage of wells with total depths present in the stimulation report declined from 62.4% in the 2013 dataset to 53.1% in the 2015 dataset, and the percentage of wells with an imputed total depth rose from less than 1% in the 2013 and 2014 datasets to 7.6% in the 2015 dataset (Table 2.3.3.3).
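The regression imputation can be sketched with ordinary least squares on indicator (dummy) variables. This NumPy version is an illustration under stated assumptions: it uses the first sorted level of each factor as the reference category and fits with `numpy.linalg.lstsq`. The software and any model details beyond the county and spud-year indicators are not specified here.

```python
import numpy as np


def impute_total_depth(county, spud_year, depth):
    """Fill missing depths (NaN) with OLS predictions from county and spud-year indicators."""
    county = np.asarray(county)
    spud_year = np.asarray(spud_year)
    depth = np.asarray(depth, dtype=float)

    # Design matrix: intercept plus one indicator per non-reference level of each factor
    # (the first sorted level of each factor serves as the reference category).
    columns = [np.ones(len(depth))]
    columns += [(county == c).astype(float) for c in np.unique(county)[1:]]
    columns += [(spud_year == y).astype(float) for y in np.unique(spud_year)[1:]]
    X = np.column_stack(columns)

    # Fit on wells with an observed depth only.
    observed = ~np.isnan(depth)
    beta, *_ = np.linalg.lstsq(X[observed], depth[observed], rcond=None)

    # Replace only the missing values with their predicted depths.
    filled = depth.copy()
    filled[~observed] = X[~observed] @ beta
    return filled
```

With an additive toy example (county B 1,000 feet deeper than A; 2011 wells 500 feet deeper than 2010), a missing depth for a county B well spudded in 2011 is filled with the sum of both effects.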


Table 2.3.3.3. Total depth missingness, percent (number), by dataset iteration

                                                       2013 well dataset   2014 well dataset   2015 well dataset
Total depth not missing in stimulation report          62.4 (4,312)        56.6 (5,030)        53.1 (5,135)
Total depth missing and abstracted from PA*IRIS/WIS    37.0 (2,558)        42.7 (3,795)        39.3 (3,803)
Total depth missing and imputed                        0.6 (45)            0.7 (63)            7.6 (731)
Total wells in dataset                                 100 (6,915)         100 (8,888)         100 (9,669)

2.3.1.3.4 Stimulation date

Because a well could be spudded but not yet stimulated or producing at the time the dataset was created, we considered stimulation dates missing only if the well had reported production quantities. If the stimulation date was after the well's estimated start date of production, we deleted the stimulation date and treated it as missing. Where possible, we abstracted missing stimulation dates from PA*IRIS/WIS; we extrapolated the remaining missing dates using a process similar to that used for missing spud dates. We divided the median number of days from spud to stimulation (Table 2.3.3.2.1) by the number of days from spud to start date of production (Table 2.3.3.2.2), and calculated the median proportion by region and time period. We multiplied this proportion by the number of days from spud to the start date of production for a given well, and added the resulting number of days to the well's spud date. From the 2013 to the 2015 well dataset, the percentage of wells not needing a stimulation date decreased, reflecting the increasing number of wells in production (Table 2.3.3.4).
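A minimal Python sketch of the stimulation date extrapolation follows. The text can be read as computing either a ratio of medians or a median of per-well ratios; this sketch assumes the latter (each well's share of the spud-to-production interval that had elapsed at stimulation, summarized by its median) and rounds the offset to whole days. Grouping by region and time period is left to the caller, and the function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import median
from typing import Iterable, Tuple


def median_proportion(wells: Iterable[Tuple[date, date, date]]) -> float:
    """Median share of the spud-to-production interval elapsed at stimulation.

    Each tuple is (spud, stimulation, production start) for a well with all three
    dates; callers would pass one region/time-period group at a time.
    """
    ratios = [(stim - spud).days / (prod - spud).days
              for spud, stim, prod in wells if prod > spud]
    return median(ratios)


def extrapolate_stim(spud: date, prod_start: date, proportion: float) -> date:
    """Place the stimulation date that fraction of the way from spud to production."""
    offset_days = round(proportion * (prod_start - spud).days)
    return spud + timedelta(days=offset_days)
```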


Table 2.3.3.4. Stimulation date missingness, percent (number), by dataset iteration

                                                               2013 well dataset   2014 well dataset   2015 well dataset
Stimulation date not missing in stimulation report             42.2 (2,917)        43.9 (3,901)        45.5 (4,404)
Stimulation date missing and abstracted from PA*IRIS/WIS       1.8 (121)           1.4 (127)           2.1 (202)
Stimulation date missing and extrapolated                      24.9 (1,718)        29.5 (2,625)        31.8 (3,071)
Stimulation date not needed (well does not have production)    31.2 (2,159)        25.2 (2,235)        20.6 (1,992)
Total wells in dataset                                         100 (6,915)         100 (8,888)         100 (9,669)

2.3.1.3.5 Production start date and production quantities

The production report includes well production days and production quantities (in MCF, thousand cubic feet of natural gas) by reporting period for each well. Reporting periods were year-long in 2009 and prior, and half-year-long in 2010 and after. We estimated the start date of production by subtracting the number of gas production days in a well's first production period from the last day of that period. For wells missing a production quantity in one period but with reported quantities in the periods immediately before and after, we filled in the missing quantity with the average of those two quantities.
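Both steps are simple date and list operations, sketched below in Python with hypothetical function names. The start-date function follows the text literally (last day of the first period minus its production days); whether an extra day should be added when production days are counted inclusively is not specified. Gap filling uses only the immediately adjacent periods, and only when both are reported.

```python
from datetime import date, timedelta
from typing import List, Optional


def production_start(first_period_end: date, production_days: int) -> date:
    """Last day of the first producing period minus that period's production days."""
    return first_period_end - timedelta(days=production_days)


def fill_single_gaps(quantities: List[Optional[float]]) -> List[Optional[float]]:
    """Fill a missing period (None) with the mean of its two neighbors when both are reported."""
    filled = list(quantities)
    for i in range(1, len(quantities) - 1):
        before, after = quantities[i - 1], quantities[i + 1]
        if quantities[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled
```

Checking neighbors in the original list (not the partly filled copy) means runs of two or more missing periods are left as gaps, matching the "before and after" condition in the text.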

2.3.1.3.6 Well pad

Using the well permit number, we merged our well dataset with the SkyTruth well pad dataset. If a well was in the SkyTruth well pad dataset, we assigned it the SkyTruth well pad ID. However, some wells were missing from the SkyTruth well pad dataset. For these wells, we grouped wells within 150 meters of one another using ArcGIS. If a well not in the SkyTruth well pad dataset grouped with a well in the SkyTruth well pad dataset, we assigned it the SkyTruth well pad ID of the well in its group. Wells that did not group with a well in the SkyTruth well pad dataset were assigned a well pad ID designating them as GIS-created. The SkyTruth well pad dataset was not updated between the creation of the 2013 and 2015 datasets, so over time the number of well pads from SkyTruth remained constant while the number of well pads created in GIS increased (Table 2.3.3.6).
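The 150-meter grouping can be approximated with single-linkage clustering: any two wells within 150 meters join the same group, so groups can chain across a pad. The Python sketch below uses haversine distances and a union-find structure. It is an assumption about how the ArcGIS grouping behaves, not a reproduction of the ArcGIS tool. Wells keep their own SkyTruth ID, wells without one inherit the first SkyTruth ID found in their group, and remaining groups get a hypothetical `GIS-` prefixed ID. The pairwise loop is quadratic, which is acceptable for a sketch but slow for thousands of wells.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def assign_pad_ids(coords: List[Tuple[float, float]],
                   skytruth_ids: List[Optional[str]],
                   max_dist_m: float = 150.0) -> List[str]:
    parent = list(range(len(coords)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Single linkage: any pair of wells within max_dist_m ends up in the same group.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if haversine_m(coords[i], coords[j]) <= max_dist_m:
                parent[find(i)] = find(j)

    # The first SkyTruth ID seen in each group becomes that group's pad ID.
    group_pad: Dict[int, str] = {}
    for i, pad in enumerate(skytruth_ids):
        if pad is not None:
            group_pad.setdefault(find(i), pad)

    # Own SkyTruth ID first, then the group's SkyTruth ID, then a GIS-created ID.
    return [skytruth_ids[i] or group_pad.get(find(i), f"GIS-{find(i)}")
            for i in range(len(coords))]
```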

Table 2.3.3.6. Percentage (number) of well pads by data source

                               2013 well dataset   2014 well dataset   2015 well dataset
Well pad from SkyTruth         40 (1,174)          36 (1,174)          35 (1,174)
Well pad created in GIS        60 (1,736)          64 (2,096)          65 (2,218)
Total well pads in dataset     100 (2,910)         100 (3,270)         100 (3,392)

2.3.2 Creation of the UNGD-related compressor engine dataset

We used the following methods to create a dataset on UNGD-related compressor engines from the documents we scanned at the DEP.

2.3.2.1 Data abstraction

To systematically extract the variables we needed from the scanned compressor station documents, we created six data abstraction forms using Google Docs: applications, GIFs, authorization letters, start letters, cancelation letters, and other documents. One of three primary data abstractors read each scanned PDF and abstracted its data onto the corresponding form. Documents that did not contain information on the key variables identified above were not abstracted.

2.3.2.2 Data checking

We exported the spreadsheets from each of the six abstraction forms to Excel. We then took a 10% random sample of the compressor stations. A data abstractor (different from the person who originally abstracted the data) re-abstracted the scanned source documents from each sampled station, compared the re-abstracted data to the originally abstracted data, noted whether any errors were entry errors or the result of differential decision making by a data abstractor, and corrected the errors. We repeated this process with four random samples in total. The first three samples contained errors, which did not appear to be the result of differential decision making; we corrected these errors in the database. The fourth sample contained no errors, so we considered the data abstraction complete.

2.3.2.3 Creation of a Compressor Station Database

We merged the six compressor engine spreadsheets, using station ID and station name to link stations across spreadsheets. The database was formatted with one row per type of engine at each station and contained the following variables: station ID, station name, station latitude and longitude, the date the engine was authorized, the date the engine started operating, the date the engine was canceled (stopped operating), engine horsepower, engine emissions (NOx, VOC, and CO), and the number of engines at the station with all of these same characteristics. We then reformatted the data in Stata to one row per compressor engine using the “expand” command.
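For readers working outside Stata, the same one-row-per-engine reshaping can be sketched in pandas by repeating each row by its engine count, which mirrors what Stata's `expand` does. The column names and values below are hypothetical.

```python
import pandas as pd

# One row per engine type per station; values and column names are hypothetical.
engine_types = pd.DataFrame({
    "station_id": ["S1", "S1", "S2"],
    "horsepower": [1380, 690, 1775],
    "n_engines": [3, 1, 2],
})

# Counterpart of Stata's `expand n_engines`: repeat each row n_engines times.
# reset_index() keeps the original row label in a column named "index".
engines = engine_types.loc[engine_types.index.repeat(engine_types["n_engines"].to_numpy())]
engines = engines.reset_index()

# Number the copies of each original row (1, 2, ...), then drop the helper column.
engines["engine_seq"] = engines.groupby("index").cumcount() + 1
engines = engines.drop(columns="index")
```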

2.4 Selection of study population and outcomes

Discussed below are the methods for selecting the study population and identifying health outcomes for the two epidemiologic studies in this thesis (Chapters 3 and 4).

2.4.1 Asthma study

The UNGD and asthma exacerbation study compared asthma patients with asthma exacerbations to asthma patients without asthma exacerbations (up to that point in time). Because everyone in the study had asthma, we needed to first identify patients with asthma from the general Geisinger Clinic population, and then identify all asthma exacerbations among patients in this population.

2.4.1.1 Identification of asthma population

To identify patients with asthma from the general Geisinger Clinic population, we first restricted the Geisinger Clinic population to patients with a Pennsylvania or New York address. Next, based on a study that used electronic health records to identify patients with asthma,16 we excluded patients with two or more encounters with ICD-9 codes for cystic fibrosis (277.0x), chronic pulmonary heart disease (416.x), paralysis of vocal cords or larynx (478.3x), bronchiectasis (494.xx), or pneumoconiosis (500.xx-508.xx). We then required patients to have at least two encounters with an ICD-9 code for asthma (493.x) on different days, or at least one encounter with an ICD-9 code for asthma and at least one medication order with an ICD-9 code for asthma on a different day. Finally, we dropped patients who did not geocode to any level (Section 2.6) and patients missing information on sex or date of birth in the EHR.
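The inclusion and exclusion rules above can be expressed as a small Python sketch. It assumes, for illustration only, that each patient's records arrive as (day, ICD-9 code, source) tuples, that the two-encounter exclusion is pooled across all of the listed conditions, and that distinct encounter days stand in for distinct encounters; the geocoding and demographic checks are not shown.

```python
from datetime import date

# Exclusion conditions: cystic fibrosis, chronic pulmonary heart disease,
# vocal cord/larynx paralysis, bronchiectasis (pneumoconiosis handled below).
EXCLUSION_PREFIXES = ("277.0", "416", "478.3", "494")


def is_exclusion_code(code: str) -> bool:
    if code.startswith(EXCLUSION_PREFIXES):
        return True
    head = code.split(".")[0]
    return head.isdigit() and 500 <= int(head) <= 508  # pneumoconiosis 500-508


def is_asthma_code(code: str) -> bool:
    return code.split(".")[0] == "493"


def meets_asthma_definition(records) -> bool:
    """records: iterable of (day, icd9_code, source); source is 'encounter' or 'medication'."""
    records = list(records)
    exclusion_days = {d for d, c, s in records if s == "encounter" and is_exclusion_code(c)}
    if len(exclusion_days) >= 2:
        return False
    enc_days = {d for d, c, s in records if s == "encounter" and is_asthma_code(c)}
    med_days = {d for d, c, s in records if s == "medication" and is_asthma_code(c)}
    if len(enc_days) >= 2:
        return True  # two asthma encounters on different days
    # One asthma encounter plus an asthma-coded medication order on a different day.
    return any(m != e for e in enc_days for m in med_days)
```

Working with sets of days makes the "on different days" conditions a simple size or inequality check.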

2.4.1.2 Identification of asthma exacerbations

We identified three types of asthma exacerbations among the study population of patients with asthma: mild (new oral corticosteroid [OCS] medication order), moderate (asthma emergency department visit), and severe (asthma hospitalization). For asthma emergency department visits and asthma hospitalizations, we first combined, for each patient, all emergency department or hospitalization encounters that overlapped or occurred within 72 hours of each other. We considered combined encounters that included both emergency department visits and hospitalizations to be hospitalizations. We excluded emergency department visits within a week before or after a hospitalization. We identified moderate and severe asthma exacerbations (2005 to 2012) by selecting those with an ICD-9 encounter code for asthma (493.x), using both primary and secondary diagnoses.
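The 72-hour merging rule and the one-week emergency department exclusion can be sketched in Python as follows. This is an illustration under stated assumptions, not the study code: encounters are assumed to be (start, end, kind) tuples for one patient, and "within a week before or after a hospitalization" is interpreted as an emergency department start falling between seven days before the hospitalization start and seven days after its end.

```python
from datetime import datetime, timedelta

GAP = timedelta(hours=72)


def merge_encounters(encounters):
    """encounters: (start, end, kind) for ONE patient; kind is 'ed' or 'hosp'.
    Overlapping encounters, or those starting within 72 hours of the previous end,
    are combined; a combined episode containing any hospitalization is a hospitalization."""
    merged = []
    for start, end, kind in sorted(encounters):
        if merged and start <= merged[-1][1] + GAP:
            prev_start, prev_end, prev_kind = merged[-1]
            new_kind = "hosp" if "hosp" in (prev_kind, kind) else "ed"
            merged[-1] = (prev_start, max(prev_end, end), new_kind)
        else:
            merged.append((start, end, kind))
    return merged


def drop_ed_near_hosp(episodes, window=timedelta(days=7)):
    """Drop ED episodes that start within `window` before or after a hospitalization."""
    hosps = [(s, e) for s, e, k in episodes if k == "hosp"]
    return [
        ep for ep in episodes
        if ep[2] == "hosp" or not any(s - window <= ep[0] <= e + window for s, e in hosps)
    ]


encounters = [
    (datetime(2010, 1, 1, 8), datetime(2010, 1, 1, 12), "ed"),
    (datetime(2010, 1, 3, 9), datetime(2010, 1, 6, 9), "hosp"),    # < 72 h after ED: merged
    (datetime(2010, 1, 12, 9), datetime(2010, 1, 12, 15), "ed"),   # 6 days after: dropped
    (datetime(2010, 3, 1, 9), datetime(2010, 3, 1, 13), "ed"),
]
episodes = drop_ed_near_hosp(merge_encounters(encounters))
```

The asthma ICD-9 filter would then be applied to the merged episodes.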

For OCS medication orders, we needed to distinguish new OCS orders for an asthma exacerbation from standing orders and from OCS ordered for other diseases. To do so, we identified all OCS orders among asthma patients from 2008 to 2012. OCS orders from before 2008 were excluded because inpatient medication orders were not consistently captured in the EHR before then. We dropped OCS orders placed from seven days before to seven days after a hospitalization or emergency department encounter. To separate new orders from standing orders, we dropped OCS orders that were submitted while the patient was already on OCS, as reported in the medication record file, or already had another order for OCS. We also dropped OCS orders within a week of the previous order, and we required the outpatient visit reason or the medication order diagnosis to be asthma-related. Finally, we dropped OCS orders associated with an outpatient visit for the following reasons, because OCS are prescribed for these conditions independently of asthma: suppurative and unspecified otitis media (ICD-9 code 382.x), nonsuppurative otitis media and Eustachian tube disorders (ICD-9 code 381.x), contact dermatitis and other eczema (ICD-9 code 692.x), and other and unspecified disorders of the back (i.e., spine) (ICD-9 code 724.x).
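The OCS filtering steps can be chained in a single pass, as in the Python sketch below. The dictionary keys are hypothetical, the "already on OCS" check is assumed to be precomputed from the medication record file, "within a week" is taken as at most 7 days, and the "previous order" is assumed to be any earlier order, whether or not it was kept.

```python
from datetime import date, timedelta

# Outpatient visit reasons for which OCS are prescribed for non-asthma conditions.
NON_ASTHMA_PREFIXES = ("381", "382", "692", "724")
WEEK = timedelta(days=7)


def new_ocs_orders(orders, acute_days):
    """orders: dicts with keys 'day', 'on_ocs', 'asthma_related', 'visit_codes' for one
    patient (2008-2012). acute_days: days of ED visits or hospitalizations."""
    kept = []
    prev_day = None
    for order in sorted(orders, key=lambda o: o["day"]):
        day = order["day"]
        near_previous = prev_day is not None and day - prev_day <= WEEK
        prev_day = day  # assumption: "previous order" means any earlier order
        if any(abs(day - a) <= WEEK for a in acute_days):
            continue  # within 7 days of an ED visit or hospitalization
        if order["on_ocs"] or near_previous or not order["asthma_related"]:
            continue  # standing order, repeat order, or not asthma-related
        if any(code.startswith(NON_ASTHMA_PREFIXES) for code in order["visit_codes"]):
            continue  # visit reason points to a non-asthma indication
        kept.append(order)
    return kept
```

A single sorted pass keeps the "previous order" comparison well defined.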

For all three types of asthma exacerbations, we dropped exacerbations among patients less than 5 years of age on the day of the encounter, exacerbations after the patient’s date of death (likely because of erroneous recording of one of the two dates), and exacerbations before 2005 or after 2012. Finally, we randomly selected and retained one exacerbation per type per person per year.
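The final step, keeping one randomly selected exacerbation per type per person per year, can be sketched in Python as follows. The tuple layout and the fixed seed are illustrative assumptions; the seed simply makes the random choice reproducible.

```python
import random
from collections import defaultdict
from datetime import date


def one_per_person_year(events, seed=2017):
    """events: (patient_id, exacerbation_type, day) tuples. Keeps one randomly chosen
    event per patient, exacerbation type, and calendar year."""
    rng = random.Random(seed)  # arbitrary fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for event in events:
        patient_id, etype, day = event
        groups[(patient_id, etype, day.year)].append(event)
    # Sorting groups and their members makes the result independent of input order.
    return [rng.choice(sorted(groups[key])) for key in sorted(groups)]


events = [
    ("A", "ocs", date(2010, 1, 5)),
    ("A", "ocs", date(2010, 8, 1)),
    ("A", "ocs", date(2011, 2, 1)),
    ("A", "ed", date(2010, 3, 3)),
    ("B", "ocs", date(2010, 1, 5)),
]
kept = one_per_person_year(events)
```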

2.4.1.3 Identification and matching of control index dates

Because our study design compared asthma patients with asthma exacerbations (case events) to asthma patients without asthma exacerbations (control dates), we needed to identify contact dates for asthma patients who had not yet had an asthma exacerbation. To do so, we started with the population of asthma patients (Section 2.4.1.1) and created a list of all contact dates with the health system (for lab orders, outpatient encounters, hospital encounters, medication orders, medication order diagnoses, procedures, and vitals) for these patients. For patients with each type of asthma exacerbation, we dropped potential control dates after they had a case event.

For hospitalization controls, we dropped potential control index dates in the year of and after the control had an asthma hospitalization; for emergency department controls, we dropped potential control dates in the year of and after the control had an asthma hospitalization or an emergency department visit; and for OCS controls, we dropped potential control dates in the year of and after the control had an OCS medication order, emergency department visit, or hospitalization. Next, we randomly selected one control date per person per year, so that patients with many contact dates with the health system would not contribute much more information to the analysis than patients with fewer contact dates. For patients with contact with Geisinger in one year and again two years later, but not in the intervening year, we considered the patient under observation for the entire period and took the average of the two dates to create a date in the middle year. We dropped potential control dates among patients less than 5 years of age on the day of the encounter, potential control dates after the patient’s date of death, and potential control dates before 2005 or after 2012. Finally, to select which control dates to include in the analysis, we frequency-matched controls to case events by age category (5 to < 13, 13 to < 19, 19 to < 45, 45 to < 62, 62 to < 75, ≥ 75 years), year, and sex.
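Two of these steps, filling a missing middle year with the midpoint of the surrounding control dates and assigning the matching age category, can be sketched in Python. This is a minimal illustration: age is assumed to be completed years on the control date, and the input is assumed to be one already-selected control date per observed year.

```python
from datetime import date

AGE_BREAKS = [(5, 13, "5-<13"), (13, 19, "13-<19"), (19, 45, "19-<45"),
              (45, 62, "45-<62"), (62, 75, "62-<75")]


def age_years(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def age_category(age: int):
    if age < 5:
        return None  # excluded from the study
    for lo, hi, label in AGE_BREAKS:
        if lo <= age < hi:
            return label
    return ">=75"


def fill_gap_years(control_dates):
    """control_dates: one date per observed year for one patient. A single missing year
    bracketed by observed years gets the midpoint of the two surrounding dates."""
    by_year = {d.year: d for d in control_dates}
    filled = dict(by_year)
    for year, d1 in by_year.items():
        d2 = by_year.get(year + 2)
        if d2 is not None and (year + 1) not in by_year:
            filled[year + 1] = d1 + (d2 - d1) // 2  # midpoint always lands in the middle year
    return [filled[y] for y in sorted(filled)]
```

Because the first date lies in year y and the second in year y + 2, their midpoint always falls in year y + 1.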

2.4.2 Depression symptom study

The depression symptom study was conducted using data from the Chronic Rhinosinusitis Integrative Studies Program (CRISP), a study of chronic rhinosinusitis (CRS) conducted in the Geisinger Clinic.17

2.4.2.1 Study population

The CRISP study population consisted of adult primary care patients of the Geisinger Clinic. EHR data from 2006 to 2013 were used to categorize all adult primary care patients into one of three groups based on a history of sinus-related diagnoses and/or evaluations in the EHR, and into one of three groups based on race/ethnicity, for a total of nine groups. The nasal and sinus symptom groups were: patients with at least two ICD-9 codes for CRS (ICD-9 codes 473.x or 471.x); patients with at least one ICD-9 code for asthma or allergic rhinitis (ICD-9 codes 493.x or 477.x) or a single ICD-9 code for CRS; and patients with no ICD-9 codes for CRS, asthma, or allergic rhinitis. The three race/ethnicity groups were white, non-Hispanic; black, non-Hispanic; and Hispanic. The survey design oversampled patients with a history of nasal and sinus symptoms and patients who were not white, and 23,700 patients were randomly selected for inclusion in the CRISP study population.17

2.4.2.2 Outcome and mediating variables created from the questionnaires

The 23,700 patients selected for the CRISP study were sent the baseline questionnaire in April 2014, and all patients who responded to the baseline questionnaire were sent the follow-up questionnaire in October 2014. The baseline and follow-up questionnaires included validated symptom instruments that we used to create the outcome and mediating variables.

2.4.2.2.1 Fatigue

Fatigue symptoms were ascertained using the Patient-Reported Outcomes Measurement Information System (PROMIS) fatigue short form 8a, which was included in the baseline questionnaire.18 It asks eight questions about fatigue symptoms over the past seven days (Table 2.4.2.2.1), with answer choices of “not at all” (1 point), “a little bit” (2 points), “somewhat” (3 points), “quite a bit” (4 points), and “very much” (5 points). We summed the points for each responder, giving a score from 8 to 40, and considered responders in the highest quartile of fatigue scores to have severe fatigue.

Table 2.4.2.2.1. Symptoms included in the Patient-Reported Outcomes Measurement Information System fatigue short form 8a.
I feel fatigued.
I have trouble starting things because I am tired.
How run-down did you feel on average?
How fatigued were you on average?
How much were you bothered by your fatigue on average?
To what degree did your fatigue interfere with your physical functioning?
How often did you have to push yourself to get things done because of your fatigue?
How often did you have trouble finishing things because of your fatigue?
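Scoring and the severe-fatigue cut point can be sketched in Python. The definition of "highest quartile" is an assumption here: responders scoring at or above the 75th percentile computed by `statistics.quantiles` (default exclusive method) are flagged, and other quantile conventions could shift the cut point when scores are tied.

```python
import statistics


def fatigue_score(responses):
    """responses: eight item scores, each 1 ('not at all') to 5 ('very much')."""
    if len(responses) != 8 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected eight responses scored 1-5")
    return sum(responses)  # raw score ranges from 8 to 40


def severe_fatigue_flags(scores):
    # Assumption: "highest quartile" = score at or above the 75th percentile.
    q3 = statistics.quantiles(scores, n=4)[2]
    return [s >= q3 for s in scores]


scores = [8, 12, 16, 20, 24, 28, 32, 36]
flags = severe_fatigue_flags(scores)  # the 75th percentile here is 31.0
```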


2.4.2.2.2 Migraine headache

We ascertained migraine headache using the ID Migraine questionnaire, which was included in the baseline questionnaire.19 This questionnaire first asks responders how often they have had headaches in the past 12 months, with answer choices of “never,” “once in a while,” “some of the time,” “most of the time,” or “all of the time.” Those who responded to that question with at least “some of the time” were asked three additional questions on headache-associated disability, nausea, and light sensitivity (Table 2.4.2.2.2), with response choices of “never,” “rarely,” “less than half the time,” or “half the time or more.” Responses of “never” or “rarely” were scored as no and responses of “less than half the time” or “half the time or more” as yes, and responders with “yes” on at least two of the three questions were considered to have migraines.

Table 2.4.2.2.2. Symptoms included in the ID Migraine questionnaire.
How often do your headaches interfere with your ability to work, study, or enjoy life?
How often do you have nausea with your headaches?
How often have you been unusually sensitive to light during your headaches?
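The ID Migraine scoring rule can be written as a small Python function. The answer strings mirror the response options above. Handling of unanswered follow-up items is not described, so this sketch simply raises a `KeyError` for missing or unrecognized answers.

```python
# Headache frequency answers that trigger the three follow-up items.
FREQ_SCREEN = {"never": False, "once in a while": False, "some of the time": True,
               "most of the time": True, "all of the time": True}
# Follow-up answers scored as "yes".
ITEM_YES = {"never": False, "rarely": False,
            "less than half the time": True, "half the time or more": True}


def id_migraine(frequency, disability=None, nausea=None, photophobia=None):
    """Return True if the responder screens positive for migraine."""
    if not FREQ_SCREEN[frequency]:
        return False  # follow-up items were not asked
    yes = sum(ITEM_YES[a] for a in (disability, nausea, photophobia))
    return yes >= 2  # "yes" on at least two of the three items
```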

2.4.2.2.3 Depression symptoms

We used the validated Personal Health Questionnaire Depression Scale (PHQ-8), which was included in the follow-up questionnaire, to ascertain depression symptoms. The PHQ-8 is a measure of current depression symptoms, used both in clinical settings and in epidemiology studies, that asks responders how often over the past two weeks they were bothered by the symptoms in Table 2.4.2.2.3: “not at all” (0 points), “several days” (1 point), “more than half the days” (2 points), or “nearly every day” (3 points). The points are summed for a total score from 0 to 24. A score of 0 to < 5 was considered no or minimal depression symptoms, 5 to < 10 mild depression symptoms, 10 to < 15 moderate depression symptoms, 15 to < 20 moderately severe depression symptoms, and 20 or greater severe depression symptoms.20


Table 2.4.2.2.3. Symptoms included in the Personal Health Questionnaire Depression Scale (PHQ-8).
Little interest or pleasure in doing things
Feeling down, depressed, or hopeless
Trouble falling or staying asleep, or sleeping too much
Feeling tired or having little energy
Poor appetite or overeating
Feeling bad about yourself, or that you are a failure, or have let yourself or your family down
Trouble concentrating on things, such as reading the newspaper or watching television
Moving or speaking so slowly that other people could have noticed. Or the opposite: being so fidgety or restless that you have been moving around a lot more than usual
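The PHQ-8 scoring bands described above translate directly into a short Python function. It is a minimal sketch that assumes complete responses, and the category labels are illustrative.

```python
def phq8_category(item_scores):
    """item_scores: eight item scores, each 0 ('not at all') to 3 ('nearly every day')."""
    if len(item_scores) != 8 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected eight item scores of 0-3")
    total = sum(item_scores)  # 0 to 24
    if total < 5:
        return "none/minimal"
    if total < 10:
        return "mild"
    if total < 15:
        return "moderate"
    if total < 20:
        return "moderately severe"
    return "severe"
```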

2.4.2.3 Case and control dates for the disordered sleep outcome

We identified disordered sleep diagnoses (case events), consisting of encounters and medication orders, in Geisinger’s EHR among the study population included in the depression study. Encounters were identified in the EHR using the ICD-9 codes for disordered sleep (Table 2.4.2.3).21 We identified orders for disordered sleep medications within the drug class “hypnotics,” using drug subclass and drug name. We included all medications in the drug subclasses antihistamine hypnotics, selective melatonin receptor agonists, hypnotics – tricyclic agents, and orexin receptor antagonists. In the subclass non-barbiturate hypnotics, we included all medications except midazolam hydrochloride. Either an appropriate medication order or an encounter with an appropriate ICD-9 code was considered a disordered sleep outcome. We excluded disordered sleep outcomes from before 2009, retained only disordered sleep diagnoses from when the participant was 18 years of age or older, and randomly selected one disordered sleep diagnosis per participant per year so that study subjects with many encounters for sleep disorders would not contribute unduly. We identified control encounters and matched them to case events using the same methods as in the asthma exacerbation study (Section 2.4.1.3).


Table 2.4.2.3. ICD-9 codes used to identify disordered sleep.
ICD-9 code | Description
780.52 | Insomnia
780.50 | Sleep disturbance, unspecified
307.47 | Other dysfunctions of sleep stages or arousal from sleep
780.59 | Other sleep disturbances
307.42 | Persistent disorder of initiating or maintaining sleep
780.5 | Sleep disturbances
307.41 | Transient disorder of initiating or maintaining sleep
307.40 | Nonorganic sleep disorder, unspecified
307.48 | Repetitive intrusions of sleep
780.56 | Dysfunctions associated with sleep stages or arousal from sleep
780.55 | Disruptions of 24-hour sleep-wake cycle
Abbreviation: ICD-9 = International Classification of Diseases, 9th Revision, Clinical Modification

2.5 Exposure study

2.5.1 Creation of the regular grid

In Chapter 5, we wanted to explore the relationships among UNGD metrics (Section 2.7.3) for wells, compressors, and impoundments. We did not want to assign the UNGD metrics at the locations of Geisinger patients, because the analysis would then be influenced by population density and residential patterns. Instead, we created a regular grid of points across the Geisinger region using the “Create Fishnet” tool in ArcGIS, specifying a spacing of 5 km between points. We then exported the coordinates of the points to R to assign them the UNGD metrics (Section 2.7.3).
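A regular grid of this kind can be sketched in a few lines of Python. This is a simplified stand-in, not a reimplementation of the ArcGIS "Create Fishnet" tool. It assumes the bounding box is given in a projected coordinate system with units of meters, so that 5 km corresponds to 5,000 coordinate units.

```python
def regular_grid(xmin, ymin, xmax, ymax, spacing=5000.0):
    """Points spaced `spacing` meters apart covering a bounding box in a projected CRS."""
    nx = int((xmax - xmin) // spacing) + 1
    ny = int((ymax - ymin) // spacing) + 1
    return [(xmin + i * spacing, ymin + j * spacing) for j in range(ny) for i in range(nx)]


points = regular_grid(0.0, 0.0, 10000.0, 5000.0)  # 3 columns x 2 rows
```

Using a projected (not latitude/longitude) system keeps the spacing uniform in meters across the study region.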

2.5.2 Estimation of impoundment start and stop dates

Rutherford Platt at Gettysburg College estimated an installation date and a removal date for each impoundment. He used a trend analysis of Landsat data to identify sudden spectral changes in the grid cell that contained each impoundment. He compiled all available Landsat 5, 7, and 8 surface reflectance imagery with < 30% cloud cover for the years 2000-2015, a total of 754 images across four Landsat path/rows. For each impoundment location, he masked remaining clouds and then interpolated monthly time series for the near infrared band and the normalized difference vegetation index (NDVI). He used the Breaks for Additive Season and Trend (BFAST) package in R to identify discrete breaks in the time series after the removal of seasonal effects.22 The dataset has a nominal temporal resolution of 1 month, but cloud cover and data gaps can delay the detection of the creation or removal of impoundments. Based on the direction, magnitude, and timing of the time series breaks, he identified approximate dates of creation and removal of impoundments. He verified the estimates for a sample of impoundments by comparing the Landsat-derived dates to dates derived from historical imagery on Google Earth.

2.6 Geocoding of study population

Joseph Dewalle at the Geisinger Center for Health Research geocoded the Geisinger patients included in this thesis. First, all addresses were validated against a US Postal Service database, which standardizes address components, adds ZIP + 4 codes, and converts some rural-style addresses and P.O. boxes into city-style addresses. He then sequentially used the following base maps to geocode addresses: residential structure points, created as part of Pennsylvania’s effort to convert rural-style addresses to city-style addresses; StreetMap Premium TomTom Edition, commercial products from ESRI, versions 3, 2, and 1 for the years 2012, 2011, and 2010; Census TIGER 2013 and Census TIGER 2010, base maps created by the U.S. Census Bureau for the 2010 Census; TeleAtlas 2009, an ESRI product; and Census TIGER 2000, a base map created by the U.S. Census Bureau for the 2000 Census. These base maps were used in order of decreasing quality or increasing age. Only matches with a high spelling sensitivity (> 90) were accepted. Addresses that could not be geocoded to the street level were instead geocoded to the centroid of the address’s ZIP + 4 or ZIP code. In the asthma exacerbation study (Chapter 3), we included only patients who geocoded to Pennsylvania or New York. In the depression symptom study (Chapter 4), we included only patients who geocoded to Pennsylvania. The numbers and percentages of the study populations that geocoded to the street, ZIP + 4, or ZIP code level are provided in Table 2.6.


Table 2.6. Geocoding level for the asthma and depression study populations.
Geocoding level | Asthma study population, n (%) | Depression study population, n (%)
Street address | 31,567 (88.9) | 4,396 (89.1)
ZIP + 4 centroid | 923 (2.6) | 155 (3.1)
ZIP centroid | 3,018 (8.5) | 381 (7.7)
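The sequential fallback logic of Section 2.6 can be outlined in Python. The sketch is only an outline of the control flow: each base map is modeled as a hypothetical dictionary from a standardized address key to coordinates, and exact lookup stands in for a real geocoding engine, which would perform fuzzy matching subject to the spelling-sensitivity threshold.

```python
def geocode(address_key, zip4, zip5, basemaps, zip4_centroids, zip_centroids):
    """Return ((x, y), level) for the first successful match.

    basemaps: ordered list of (name, {address_key: (x, y)}), highest quality first.
    Exact dictionary lookup is a stand-in for a real geocoding engine."""
    for name, lookup in basemaps:
        point = lookup.get(address_key)
        if point is not None:
            return point, f"street ({name})"
    if zip4 in zip4_centroids:
        return zip4_centroids[zip4], "ZIP + 4 centroid"
    if zip5 in zip_centroids:
        return zip_centroids[zip5], "ZIP centroid"
    return None, "not geocoded"


# Hypothetical base maps, ordered best first.
basemaps = [("structure points", {"1 MAIN ST": (1.0, 2.0)}),
            ("TIGER 2000", {"2 OAK RD": (3.0, 4.0)})]
```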

2.7 Creation of study variables

2.7.1 Covariates created from the electronic health record

The covariates created from the electronic health record data are summarized in Table 2.7.1 and described in detail along with their rationale in the following sections.

Table 2.7.1. Variables created from the electronic health record used in health studies.
Variable | Type | Units | Study
Sex | Binary | Male/female | Asthma, depression symptoms
Age | Categorical | Years | Asthma
Age | Continuous | Years | Depression symptoms
Season | Categorical | Spring, summer, fall, winter | Asthma
Race/ethnicity | Categorical | White, black, Hispanic, other | Asthma
Race/ethnicity | Categorical | White, black, Hispanic | Depression symptoms
Smoking status | Categorical | Current, former, never, missing | Asthma, depression symptoms
Family history of asthma | Binary | Yes/no | Asthma
Family history of mental disorders | Binary | Yes/no | Depression symptoms
Medical Assistance | Binary | Yes/no | Asthma, depression symptoms
Overweight/obesity | Categorical | Not overweight/obese, overweight, obese, missing | Asthma, depression symptoms
Diabetes | Binary | Yes/no | Asthma
Alcohol use | Categorical | Yes, not heavy; yes, heavy; no | Depression symptoms
Anti-depressant use | Binary | Yes/no | Depression symptoms


2.7.1.1 Sex

Patient sex was determined from the sex variable in the demographics file, which classified patients as female, male, or unknown/missing. Patients with unknown/missing sex were excluded from the analysis. Sex is an important covariate in studies of asthma and of depression. In children, asthma prevalence is higher among males than females, but in adults asthma prevalence is higher among females than males.23 Male and female children with asthma have similar risk for asthma exacerbations, but female adults have higher risk for asthma exacerbation than male adults.24 Females aged 12 years and older are more likely to have depression symptoms than males,25 and are more likely to take anti-depressant medication than males at all levels of depression severity.26

2.7.1.2 Age

Age is also an important covariate in studies of asthma and depression. The rates of these diseases vary by age, although both diseases can affect children and adults. The rates of depressive symptoms increase from age group 12-17 years to 18-39 years and from 18-39 years to 40-59 years, but then decrease from 40-59 years to 60 years and over, among both males and females.25 Asthma prevalence is higher among children than adults,23 and children are at higher risk for asthma exacerbations than adults.24

Age was calculated as the years between date of birth, from the demographics file, and the index date (in the asthma study) or the date of survey return (in the depression study). In the asthma study, patients were categorized into six age groups (5 to < 13, 13 to < 19, 19 to < 45, 45 to < 62, 62 to < 75, ≥ 75 years), the same categories used for matching. In the depression analysis, age was treated as a continuous variable because this analysis had fewer study participants than the asthma analysis, so we were more concerned with having a parsimonious model.

2.7.1.3 Season


Asthma exacerbations tend to peak in the fall, especially among children but also among adults, a pattern that has been attributed to children returning to school and to respiratory viruses.27,28 In the asthma study, season was assigned from the month and day of the index date and categorized as spring (March 22-June 21), summer (June 22-September 21), fall (September 22-December 21), and winter (December 22-March 21). Because the survey that included the depression questions was mailed to all recipients on the same day, we did not use season as a covariate in the depression analysis.
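The season boundaries above can be checked with a short Python function. This is a minimal sketch that compares (month, day) pairs so the same cut dates apply in every year. Winter is the fall-through case because it wraps across the calendar year.

```python
from datetime import date


def season(d: date) -> str:
    md = (d.month, d.day)
    if (3, 22) <= md <= (6, 21):
        return "spring"
    if (6, 22) <= md <= (9, 21):
        return "summer"
    if (9, 22) <= md <= (12, 21):
        return "fall"
    return "winter"  # December 22 through March 21 wraps the calendar year
```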

2.7.1.4 Race/ethnicity

In the United States, black race/ethnicity is associated with higher risk of asthma hospitalization than white race/ethnicity,29 and black or Hispanic race/ethnicity is associated with higher rates of depression but lower use of anti-depressant medication.25,26 In the asthma study, patient race/ethnicity was determined from the race/ethnicity variable in the demographics file, which categorized patients into five mutually exclusive categories: white, black, Hispanic, other, and missing. We combined the other and missing categories to create four categories: white, black, Hispanic, or other/missing.

In the depression study, three race/ethnicity categories (white, black, Hispanic) were used in the survey design, and these same categories were used in the analysis.

2.7.1.5 Smoking

It is well established that smoking can aggravate asthma.30 Smoking is also associated with depression, though the direction of the association is not clear.31 Data from the social history, procedure, and encounter diagnosis files were combined to create the smoking variable, which classified patients as current, former, never, or missing smoking status. We started with the social history file, which included variables on the date the social history was taken, smoking status, packs per day, smoke years, and smoke quit date. The categories of the smoking status variable, and those that we considered to indicate current smoking, are given in Table 2.7.1.5.1. We assumed passive smokers were not smokers. Former smokers were reclassified as current smokers if their quit date was later than the date the social history was taken. We treated “unknown if ever smoked” and “never assessed” as missing and did not use these social history records.

Next, we looked in the procedure file (Table 2.7.1.5.2) and in the encounter diagnosis file (Table 2.7.1.5.3) for smoking-related codes. All smoking-related procedure and encounter diagnosis codes were considered to be indicative of current smoking. We identified the most recent smoking status, smoking-related procedure code, or smoking-related encounter diagnosis code before the index date.

Table 2.7.1.5.1. Smoking status categories considered as evidence of current smoking.
Smoking status category | Considered current smoker?
Current everyday smoker | Yes
Current some day smoker | Yes
Smoker, current status unknown | Yes
Former smoker; never assessed | No
Never smoker | No
Passive smoker | No
Unknown if ever smoked | No

Table 2.7.1.5.2. Procedure codes considered as evidence of smoking.
Code | Description
99406 | “BEHAV CHNG SMOKING 3-10 MIN”
99407 | “BEHAV CHNG SMOKING > 10 MIN”
G0375 | “DEMO-SMOKING CESSATION COUN”
G0376 | “SMOKING & TOBAC CESSATION”
G9016 | “SMOKING & TOBAC CESSATION-INT”
W5963 | “SMOKING (TOBACCO) CESSATION CO”

Table 2.7.1.5.3. ICD-9 codes considered as evidence of smoking.
Code | Description | Excluded subcategories
305.1 | “Tobacco use disorder” | “Chewing tobacco use,” “Chews tobacco,” and “Tobacco dipper”
V15.82 | “History of tobacco use” |
989.84 | “Tobacco” | “Toxic effect of secondhand tobacco smoke”
649.0 | “Tobacco use disorder complicating pregnancy, childbirth, or the puerperium” |


Next, we looked for any evidence of former smoking before the index date. This included smoking statuses, smoking-related procedure codes, smoking-related encounter diagnosis codes, packs per day, smoke years, and smoke quit dates. Patients were moved from the never smoker category to the former smoker category if their most recent smoking status was never smoker but they had evidence of smoking in the past. We assumed patients aged 15 and younger were non-smokers if they had no smoking information from the social history, procedure, or encounter diagnosis files. Patients over the age of 15 were categorized as missing if they had no smoking information from the social history, procedure, or encounter diagnosis files.
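The decision sequence for the smoking variable can be sketched in Python under simplifying assumptions. All sources are assumed to be pre-coded upstream into one (day, status) list per patient: smoking-related procedure and diagnosis codes as "current"; packs per day, smoke years, or a quit date as "former"; and the quit-date reclassification of former to current already applied. Same-day ties are resolved by input order, which the text does not specify.

```python
from datetime import date

# Statuses that carry no information and were not used (treated as missing).
UNUSABLE = {"unknown if ever smoked", "never assessed"}


def smoking_status(records, index_day, age):
    """records: (day, status) pairs for one patient; status is 'current', 'former',
    'never', 'passive', or an UNUSABLE label. Returns current/former/never/missing."""
    prior = sorted(
        (r for r in records if r[0] < index_day and r[1] not in UNUSABLE),
        key=lambda r: r[0],  # stable sort: same-day records keep input order
    )
    if not prior:
        # No usable smoking information before the index date.
        return "never" if age <= 15 else "missing"
    latest = prior[-1][1]
    if latest == "passive":
        latest = "never"  # passive smokers were treated as non-smokers
    if latest == "never" and any(s in ("current", "former") for _, s in prior[:-1]):
        return "former"  # most recent says never, but earlier evidence of smoking
    return latest
```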

2.7.1.6 Family history

For both asthma and depression, having a family history of the disease is a risk factor for the disease in relatives.30,32 The family history file includes family history of asthma and of mental disorders (a more specific family history of depression is not available). Family history of asthma was created as a binary variable, which distinguished patients recorded in the family history file as having one or more first-degree relatives (i.e., father, mother, brother, sister) with asthma from those without a first-degree relative with asthma. We created a variable for family history of mental disorders in the same way.

2.7.1.7 Medical Assistance

People below the poverty level have a higher risk of depression and of asthma.25,29 Medical Assistance health insurance is a means-tested program. It has been used as a surrogate for low family socioeconomic status (SES) in prior studies, which found associations with various health outcomes in patterns consistent with prior knowledge about low SES, supporting its use for this purpose.33,34 Patients were considered to be on Medical Assistance if, up to their index date, they had at least 3 outpatient visits with any of the following payors: ACCESS PLUS D15, BLUE CHIP S18, CHIP UHC COMM PL KIDS H64, GHP CHILD HLTH INS PROG (CHIP), MA PENDING D99, MEDICAID SSU D74, or PENNA M A PROGRAM D01. If a patient did not have an outpatient encounter before their index date, we looked at outpatient encounters up to a year after their index date. Patients with few visits in total needed fewer visits with these payors: patients with only 3 outpatient visits in total up to their index date needed 2 visits with any of the above payors to be categorized as having Medical Assistance, and patients with only 1 or 2 outpatient visits in total up to a year after their index date needed 1 visit with any of the above payors.
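The Medical Assistance rule, including the reduced thresholds for patients with few visits, can be sketched in Python. Two details are assumptions: "up to the index date" is treated as inclusive, and "a year after" is approximated as 365 days. Visits are modeled as hypothetical (day, payor) pairs.

```python
from datetime import date, timedelta

MA_PAYORS = {
    "ACCESS PLUS D15", "BLUE CHIP S18", "CHIP UHC COMM PL KIDS H64",
    "GHP CHILD HLTH INS PROG (CHIP)", "MA PENDING D99", "MEDICAID SSU D74",
    "PENNA M A PROGRAM D01",
}


def visits_in_window(visits, index_day):
    """visits: (day, payor) outpatient visits for one patient."""
    before = [v for v in visits if v[0] <= index_day]
    if before:
        return before
    # No visit by the index date: look up to one year after it instead.
    return [v for v in visits if v[0] <= index_day + timedelta(days=365)]


def on_medical_assistance(visits, index_day):
    window = visits_in_window(visits, index_day)
    total = len(window)
    ma = sum(payor in MA_PAYORS for _, payor in window)
    if total == 0:
        return False
    # The required number of MA-payor visits shrinks when a patient has few visits.
    required = 3 if total >= 4 else 2 if total == 3 else 1
    return ma >= required
```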

2.7.1.8 Diabetes

We used diagnoses from both the medication and encounter files to classify patients as having diabetes mellitus. Patients were considered to have type 1 diabetes if they had at least one ICD-9 code (from either a medication order or an encounter) of 250.X1 or 250.X3 before their index date. Patients were considered to have type 2 diabetes if they had ICD-9 codes (from either a medication order or an encounter) of 250.X0 or 250.X2 on at least two different days before their index date.

We did not allow patients to have both type 1 and type 2 diabetes. However, some patients did have ICD-9 codes for both type 1 and type 2 diabetes. For these patients, we calculated the number of different days on which the patient had diagnoses for each type of diabetes. We assigned patients as having type 1 diabetes if they had more type 1 than type 2 diagnosis days and had at least one prior medication order for insulin. We assigned patients as having type 2 diabetes if they had more type 2 than type 1 diagnosis days, or if they had more type 1 than type 2 diagnosis days but had a BMI greater than 40.
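The diabetes rules, including the tie-breaking for patients with codes for both types, can be sketched in Python. Inputs are assumed to be precomputed counts of distinct diagnosis days before the index date. Where the written rules overlap or are silent, the sketch makes explicit assumptions: the BMI > 40 rule is checked before the insulin rule, and equal day counts (or more type 1 days with neither insulin nor BMI > 40) return `None` for manual review.

```python
def classify_diabetes(t1_days, t2_days, insulin_order, bmi=None):
    """t1_days: distinct days with 250.x1/250.x3 codes; t2_days: with 250.x0/250.x2."""
    has_t1 = t1_days >= 1  # one type 1 code suffices
    has_t2 = t2_days >= 2  # type 2 needs codes on two different days
    if not has_t1 and not has_t2:
        return "none"
    if has_t1 != has_t2:
        return "type 1" if has_t1 else "type 2"
    # Both definitions met: resolve by comparing counts of diagnosis days.
    if t2_days > t1_days:
        return "type 2"
    if t1_days > t2_days:
        if bmi is not None and bmi > 40:  # assumption: BMI rule takes precedence
            return "type 2"
        if insulin_order:
            return "type 1"
    return None  # not covered by the written rules (e.g., equal day counts)
```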

2.7.1.9 Overweight/obesity


Obesity is associated with worse asthma severity, and obesity and depression are often comorbid.30,35 The vitals file contains information on height and weight, which we used to create the overweight and obese variables based on body mass index (BMI). For adults in the asthma and depression studies, we calculated BMI from the most recent weight and the most recent height using Equation 2.7.1.9. For adults, we assumed that heights of less than 36 inches or greater than 90 inches and weights of less than 50 pounds or greater than 600 pounds were not possible, and we dropped those measurements. If no height measurements before the index date were available, we used the average of height measurements after the index date, assuming that adult height does not increase. We classified BMIs less than 25 as not overweight or obese, BMIs greater than or equal to 25 and less than 30 as overweight, and BMIs greater than or equal to 30 as obese.36

Equation 2.7.1.9. BMI formula for adults.

$$\text{BMI} = 703 \times \frac{\text{weight (lb)}}{[\text{height (in)}]^{2}}$$

In the asthma study, we calculated BMI z-scores for children using the CDC SAS growth chart program.37 We used the most recent height and weight before the index date and considered a BMI greater than or equal to the 85th percentile but less than the 95th percentile to be overweight, and greater than or equal to the 95th percentile to be obese.36 For both children and adults, we created a fourth category of missing BMI if the weight and height data needed to calculate the BMI z-score or BMI were not available.
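The adult BMI steps can be sketched in Python. The sketch assumes the standard imperial form of the BMI formula (703 × weight in pounds / height in inches squared), since the plausibility limits above are stated in inches and pounds. The children's percentile classification relies on CDC reference data and is not reproduced.

```python
def adult_bmi(weight_lb, height_in):
    """BMI from pounds and inches, or None if either measurement is implausible."""
    if not (36 <= height_in <= 90) or not (50 <= weight_lb <= 600):
        return None  # implausible measurement, dropped
    return 703.0 * weight_lb / height_in ** 2


def bmi_category(bmi):
    if bmi is None:
        return "missing"
    if bmi < 25:
        return "not overweight/obese"
    if bmi < 30:
        return "overweight"
    return "obese"
```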

2.7.1.10 Alcohol use

Alcohol use is a standard confounder in studies of depression.38 We created a categorical variable for alcohol use (yes, not heavy; yes, heavy; no; missing) using the social history and encounter diagnosis files. The CDC defines heavy alcohol use as 15 or more drinks a week for males and 8 or more drinks a week for females.39 We looked at social histories taken within a year before the survey was returned. The social history file includes variables on alcohol status (yes, no, unknown, not asked) and drinks per week. We classified patients with any social history of heavy drinking in the past year as “yes, heavy,” and patients with any social history of drinking below the heavy threshold as “yes, not heavy.” Patients recorded as not drinking in the year prior to survey return were classified as “no,” and patients with no social histories in that year were classified as “missing.” We then looked in encounter diagnoses for ICD-9 codes “305.0” (nondependent abuse of alcohol) and “303” (alcohol dependence syndrome). Patients with either of these ICD-9 codes in the year prior to survey return were reclassified as “yes, heavy.”
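The alcohol classification can be sketched in Python. Social histories are modeled as hypothetical (status, drinks per week) pairs from the year before survey return. Histories recorded only as "unknown" or "not asked" are assumed to count as missing, and the ICD-9 check is assumed to be a precomputed flag.

```python
HEAVY_DRINKS_PER_WEEK = {"male": 15, "female": 8}  # CDC heavy-drinking thresholds


def alcohol_category(histories, sex, alcohol_dx_in_year=False):
    """histories: (status, drinks_per_week) pairs from social histories taken in the
    year before survey return; status is 'yes', 'no', 'unknown', or 'not asked'.
    alcohol_dx_in_year: any ICD-9 305.0 or 303 code in that year."""
    if alcohol_dx_in_year:
        return "yes, heavy"  # diagnosis codes override the social history
    threshold = HEAVY_DRINKS_PER_WEEK[sex]
    usable = [(s, d) for s, d in histories if s in ("yes", "no")]
    if any(s == "yes" and d is not None and d >= threshold for s, d in usable):
        return "yes, heavy"
    if any(s == "yes" for s, _ in usable):
        return "yes, not heavy"
    if usable:
        return "no"
    return "missing"
```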

2.7.1.11 Anti-depressant use

We hypothesized that patients taking antidepressants may not be susceptible to the potential effects of UNGD on depression. We created an antidepressant variable using the medication record file. We identified antidepressant medications (selective serotonin reuptake inhibitors, serotonin and norepinephrine reuptake inhibitors, serotonin antagonist and reuptake inhibitors, tricyclic antidepressants, tetracyclic antidepressants, bupropion, serotonin modulators and stimulators, and monoamine oxidase inhibitors) from the antidepressant and miscellaneous psychotherapeutic pharmacy classes.40 Medication orders were classified using the Medi-Span Therapeutic Classification System and the 14-digit Generic Product Identifier (GPI), which identifies the drug group (e.g., antidepressants), drug class (e.g., tricyclics), drug subclass, and drug name (e.g., amitriptyline). Most orders were identified by drug group, but some required more extensive searching (e.g., fluoxetine was also found in the drug class “miscellaneous psychotherapeutic”). Using the start and end dates of medication use, we classified patients as taking antidepressants if they were on an antidepressant on at least one of the 30 days before survey return. We did not use antidepressant use as an outcome because patients with depression may take several years to seek care.
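The 30-day rule is an interval-overlap test. Below is a minimal Python sketch. It assumes the antidepressant orders have already been identified from the GPI classes and reduced to (start, end) dates. Treating a missing end date as an ongoing order is a hypothetical convention; the text does not specify it.

```python
from datetime import date, timedelta


def on_antidepressant(medication_periods, survey_date, window_days=30):
    """True if any antidepressant period overlaps the window_days days
    before survey return.

    medication_periods: list of (start_date, end_date) for orders already
        identified as antidepressants; end_date None means no recorded end
        (assumed ongoing, a hypothetical convention).
    """
    window_start = survey_date - timedelta(days=window_days)
    window_end = survey_date - timedelta(days=1)  # last day before return
    for start, end in medication_periods:
        if end is None:
            end = window_end  # treat open-ended orders as still active
        # Two closed date intervals overlap if each starts before the other ends.
        if start <= window_end and end >= window_start:
            return True
    return False


print(on_antidepressant([(date(2014, 12, 1), date(2015, 1, 10))], date(2015, 1, 31)))
```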

2.7.2 Covariates created using patients’ coordinates

We also created covariates using patients’ geocoded coordinates, as summarized in Table 2.7.2 and described below along with their rationale.

Table 2.7.2. Variables created from the electronic health record used in health studies

Variable | Type | Units | Study
Place type | Categorical | City, borough, township | Asthma, depression
Community socioeconomic deprivation | Continuous | Z-score | Asthma, depression
Maximum temperature on prior day | Continuous | Degrees Celsius | Asthma
Distance to nearest major and minor road | Continuous | Meters | Asthma
Distance to hospital | Continuous | Meters | Asthma
Well water supply | Binary | Public/well water | Depression

2.7.2.1 Place type

Because patients are clustered in communities, we needed a variable to describe community type and an identifier to group people in the same community together. We used a mixed definition of place, so called because it combines two sets of place boundaries: minor civil divisions (townships and boroughs) and census tracts (in cities).41 Census tracts are too large in rural areas to adequately represent communities, whereas cities are too large and heterogeneous to be treated as single minor civil divisions. This definition was thought to be more culturally, behaviorally, and experientially relevant to the concept of neighborhood condition.41 We downloaded U.S. Census shapefiles for minor civil divisions and census tracts in Pennsylvania.42 In ArcGIS, we plotted the townships and boroughs from the minor civil divisions, and the census tracts in the cities. We assigned each patient to a place under the mixed definition to create the place type variable (city, borough, or township). We also assigned each patient the unique geographic identifier for their city census tract, borough, or township.43
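Once each patient has been located in a minor civil division and a census tract (for example, by a spatial join in ArcGIS), the mixed definition reduces to a simple rule. Below is a minimal Python sketch. The function name and the GEOID values in the usage line are hypothetical.

```python
def mixed_place(mcd_type, mcd_geoid, tract_geoid):
    """Assign (place type, place identifier) under the mixed definition of place.

    mcd_type: "city", "borough", or "township" for the minor civil division
        containing the patient (assumed to come from a prior spatial join).
    mcd_geoid, tract_geoid: Census GEOIDs of that division and of the census
        tract containing the patient.
    """
    if mcd_type == "city":
        # Cities are too large and heterogeneous: use the census tract instead.
        return "city", tract_geoid
    if mcd_type in ("borough", "township"):
        # Elsewhere the township or borough itself is the community.
        return mcd_type, mcd_geoid
    raise ValueError(f"unexpected minor civil division type: {mcd_type!r}")


# Hypothetical identifiers, for illustration only.
print(mixed_place("township", "4209312345", "42093050100"))
```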

2.7.2.2 Community socioeconomic deprivation

Community economic factors can affect health, even after accounting for individual-level economic status.44 We assigned a measure of community socioeconomic deprivation (CSD) to each place under the mixed definition. CSD is based on commonly used deprivation indexes first derived from the Townsend Index45 and shown to be associated with health outcomes in many prior studies in the social epidemiology literature.46,47 The CSD index was calculated as the sum of six transformed census variables (Table 2.7.2.2) using 2010 data.

Table 2.7.2.2. Variables used to create the socioeconomic deprivation index

Variable | Transformation
Proportion of the population (25 years and older) with less than high school education | Z-score
Proportion of the population (16 years and older) unemployed | Log transformed, z-score
Proportion of the population (16 years and older) not in labor force | Z-score
Proportion of the population in poverty in the last 12 months | Z-score
Proportion of the population receiving public assistance in the last 12 months | Log transformed, z-score
Proportion of households without a car | Log transformed, z-score
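Below is a minimal Python sketch of the index construction in Table 2.7.2.2. It assumes each place's six proportions are already available. The variable names, the use of the population standard deviation, and the small offset added before taking logarithms are assumptions; the text does not state how zero proportions were handled. Higher values indicate greater deprivation, because every component increases with disadvantage.

```python
import math
from statistics import mean, pstdev

# Census proportions in Table 2.7.2.2 (variable names are hypothetical).
CSD_VARIABLES = ["less_than_high_school", "unemployed", "not_in_labor_force",
                 "poverty", "public_assistance", "no_car"]
LOG_TRANSFORMED = {"unemployed", "public_assistance", "no_car"}


def zscores(values):
    """Standardize across places (population SD; assumes SD > 0)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def csd_index(places, log_offset=0.001):
    """Sum of six (optionally log-transformed) z-scored proportions per place.

    places: list of dicts mapping each CSD variable to a proportion in [0, 1].
    log_offset: hypothetical constant added before logging so that a
        proportion of zero does not raise a math domain error.
    """
    totals = [0.0] * len(places)
    for var in CSD_VARIABLES:
        values = [place[var] for place in places]
        if var in LOG_TRANSFORMED:
            values = [math.log(v + log_offset) for v in values]
        for k, z in enumerate(zscores(values)):
            totals[k] += z
    return totals


places = [dict.fromkeys(CSD_VARIABLES, p) for p in (0.05, 0.10, 0.20)]
print(csd_index(places))
```

Because each component is standardized across places, the index is relative: it ranks places within the study region rather than measuring deprivation on an absolute scale.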

2.7.2.3 Maximum temperature on prior day

Temperature may have a direct effect on asthma exacerbations and an indirect effect through asthma triggers (e.g., air pollution), and its effects may not be adequately captured by season alone.48 We downloaded daily maximum temperature data for 2005-2012 for New York, New Jersey, and Pennsylvania from the National Climatic Data Center (Figure 2.7.2.3).49 Using R, we calculated the distance between each patient and each weather station. We then assigned each patient to the closest weather station that reported maximum temperature on the day before each of the patient’s index dates.

Figure 2.7.2.3. Weather stations reporting daily maximum temperature between 2005-2012 in New York, New Jersey, and Pennsylvania.
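The station assignment amounts to a nearest-neighbor search restricted to stations with a reading on the prior day. The study did this in R. The sketch below is an illustrative Python version that assumes great-circle (haversine) distances and a hypothetical dictionary layout for stations and readings.

```python
import math
from datetime import date, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def prior_day_tmax(patient, index_date, stations, readings):
    """Tmax (deg C) from the nearest station reporting on the day before index_date.

    patient: (lat, lon); stations: {station_id: (lat, lon)};
    readings: {(station_id, date): tmax_celsius}. Returns None if no station reported.
    """
    day = index_date - timedelta(days=1)
    # Only stations with a reading on the prior day are candidates.
    reporting = [sid for sid in stations if (sid, day) in readings]
    if not reporting:
        return None
    nearest = min(reporting, key=lambda sid: haversine_m(*patient, *stations[sid]))
    return readings[(nearest, day)]
```

Filtering to reporting stations before taking the minimum means a patient whose closest station has a gap on a given day falls back to the next-closest station, as the text describes.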

2.7.2.4 Distance to nearest major and minor road

Living very close to roads (e.g., within 200 meters) is associated with increased risk of prevalent asthma and of asthma symptoms.50 We downloaded the Federal Highway Administration’s 2011 shapefile of highways in Pennsylvania and New York (Figure 2.7.2.4).51 Using ArcGIS, we calculated the distance in meters from each patient’s geocoded address to the closest major road (defined as interstates; principal arterials, other freeways and expressways; and other principal arterials) and to the closest minor road (defined as minor arterials).


Figure 2.7.2.4. Locations of major and minor roads in New York and Pennsylvania.
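The road distances were computed in ArcGIS. The underlying geometry is the minimum point-to-segment distance over each road polyline. Below is a minimal Python sketch. It assumes coordinates in a projected system measured in meters (e.g., a UTM zone) and uses hypothetical labels for the functional classes.

```python
import math

# Functional classes named in the text; the string labels are hypothetical.
MAJOR_CLASSES = {"interstate", "other freeway or expressway", "other principal arterial"}
MINOR_CLASSES = {"minor arterial"}


def point_segment_distance(p, a, b):
    """Planar distance from point p to segment a-b (projected coordinates in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_road_distance(p, roads, classes):
    """Minimum distance from p to any polyline whose class is in `classes`.

    roads: list of (road_class, [(x, y), ...]) polylines.
    Returns math.inf when no road of the requested classes exists.
    """
    best = math.inf
    for road_class, vertices in roads:
        if road_class in classes:
            for a, b in zip(vertices, vertices[1:]):
                best = min(best, point_segment_distance(p, a, b))
    return best
```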

2.7.2.5 Distance to hospital

We were concerned that asthma patients who lived closer to a Geisinger hospital might be more likely to seek care at the hospital than patients who lived farther away. Most asthma emergency department encounters and hospitalizations occurred at two Geisinger hospitals, Geisinger Medical Center in Danville, Pennsylvania, and Geisinger Wyoming Valley Medical Center in Wilkes-Barre, Pennsylvania (Figure 2.7.2.5). In R, we calculated the distance in meters from each patient’s geocoded address to both hospitals and assigned each patient the smaller of the two distances.

2.7.2.6 Well water supply

Studies of the effect of UNGD on home prices found larger decreases in prices for homes on well water near unconventional wells than for homes at the same distance with a public water supply. The authors proposed that, given public concern about the effect of UNGD on groundwater, having well water may increase the perception of risk from UNGD.52,53 We created a well water variable using the shapefile of public water suppliers’ service areas from the Pennsylvania Department of Environmental Protection (Figure 2.7.2.6).54 We assumed that patients with geocoded coordinates inside the service areas had public water and that those outside them had well water.

Figure 2.7.2.6. Public water supply areas in Pennsylvania.
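Classifying water supply is a point-in-polygon test. The sketch below illustrates it in Python using the standard ray-casting (even-odd) algorithm. It assumes simple polygons without holes; GIS software would also handle multipart polygons and holes.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside a simple polygon given as [(x, y), ...]?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips parity
        j = i
    return inside


def water_supply(point, service_areas):
    """'public' if the point lies in any public water service area, else 'well'."""
    x, y = point
    if any(point_in_polygon(x, y, area) for area in service_areas):
        return "public"
    return "well"
```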

2.7.2.7 Greenness

We assigned a measure of residential greenness, the normalized difference vegetation index (NDVI) derived from MODIS satellite imagery (Section 2.1.7), to each patient’s geocoded home address. The NDVI data from NASA were on a 250 meter by 250 meter grid. Using the Focal Statistics tool in ArcGIS, we computed for each grid cell the mean over a five-cell by five-cell neighborhood. The value assigned to each geocoded address was therefore the mean NDVI of the surrounding 25 cells (1,250 meters by 1,250 meters).
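The following is a rough Python equivalent of the five-by-five focal mean. It assumes the NDVI raster is held as a list of rows with None for missing cells, and that edge cells average only the neighbors that exist (our assumption about edge behavior). The grid-origin convention in the lookup function is hypothetical.

```python
def focal_mean(grid, size=5):
    """Mean of each cell's size-by-size neighborhood, ignoring None (no data).

    Edge cells use only the neighbors that fall inside the grid.
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[rr][cc]
                    for rr in range(max(0, r - half), min(rows, r + half + 1))
                    for cc in range(max(0, c - half), min(cols, c + half + 1))
                    if grid[rr][cc] is not None]
            out[r][c] = sum(vals) / len(vals) if vals else None
    return out


def ndvi_at(x, y, smoothed, origin_x, origin_y, cell_m=250.0):
    """Look up the smoothed NDVI for projected point (x, y) in meters.

    origin_x, origin_y: upper-left corner of the grid; rows increase southward.
    """
    col = int((x - origin_x) // cell_m)
    row = int((origin_y - y) // cell_m)
    return smoothed[row][col]
```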

2.7.3 UNGD activity metrics

The general purpose of exposure assessment is to rank study participants on the intensity, duration, and/or frequency of an exposure during relevant time periods. This ranking takes into account disease latency and whether recent or cumulative exposure matters more, which depends on whether the exposure is thought to cause acute or chronic disease.55 In our studies, we needed an exposure assessment method that could be applied retrospectively and would be sensitive to the time-varying nature of UNGD. We wanted the metric to capture all potential pathways by which UNGD could affect health, and to rank patients higher if they lived closer to wells, among a greater density of wells, and/or near larger wells. We therefore used a gravity metric, which has been used in previous epidemiologic studies, for example, in a study of methicillin-resistant Staphylococcus aureus infections in relation to industrial food animal production.33

2.7.3.1 Durations of phases of well development

Using descriptions of the process and our data, we estimated the durations of the well development phases to determine when each well would contribute to the UNGD metrics (Figure 2.7.3.1).56 We assumed that pad preparation lasted 30 days for the first well on each pad. To estimate the duration of drilling, we assumed that the deepest well was drilled in 30 days. We divided the largest total depth of an unconventional well drilled as of 2013 (20,664 feet) by 30, and assumed that the result, 688.8 feet, was the depth drilled in one day. We assumed that stimulation lasted seven days and that production occurred daily for each production period in which a well reported production.


Figure 2.7.3.1. Timeline of well development with the estimated duration of each phase.
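The duration rules can be turned into date windows for each well. Below is a minimal Python sketch. Rounding partial drilling days up, placing pad preparation in the 30 days before the spud date, and treating the stimulation start date as known are assumptions not stated in the text. The sketch uses exact rational arithmetic (`fractions.Fraction`) because floating-point division can leave 20,664 / 688.8 a hair above 30, which a ceiling would turn into 31 days.

```python
import math
from datetime import date, timedelta
from fractions import Fraction

# Deepest unconventional well drilled as of 2013, assumed drilled in 30 days.
MAX_DEPTH_FT = 20_664
MAX_DRILLING_DAYS = 30
FT_PER_DAY = Fraction(MAX_DEPTH_FT, MAX_DRILLING_DAYS)  # exactly 688.8 ft/day
PAD_DAYS = 30
STIMULATION_DAYS = 7


def drilling_days(total_depth_ft):
    """Days of drilling, rounding partial days up (the rounding is an assumption)."""
    return math.ceil(Fraction(total_depth_ft) / FT_PER_DAY)


def phase_windows(spud, total_depth_ft, stim_start, first_on_pad):
    """Inclusive (first_day, last_day) windows for pad, drilling, and stimulation.

    Assumes pad preparation occupies the 30 days before the spud date and that
    the stimulation start date is known. Production is handled separately as
    daily output for each reported production period.
    """
    windows = {}
    if first_on_pad:
        windows["pad"] = (spud - timedelta(days=PAD_DAYS), spud - timedelta(days=1))
    n = drilling_days(total_depth_ft)
    windows["drilling"] = (spud, spud + timedelta(days=n - 1))
    windows["stimulation"] = (stim_start,
                              stim_start + timedelta(days=STIMULATION_DAYS - 1))
    return windows


print(phase_windows(date(2011, 3, 1), 10_000, date(2011, 5, 1), True))
```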

2.7.3.2 Assignment of unconventional natural gas activity metrics for wells

For each of the four phases of well development (pad preparation, drilling, stimulation, and production), the metric was assigned using Equation 2.7.3.1.

Equation 2.7.3.1. Activity metrics for unconventional natural gas wells

$$\text{UNGD activity metric}_j = \sum_{d=-D}^{-1} \sum_{i=1}^{n} \frac{s_i}{d_{ij}^{2}}$$

In Equation 2.7.3.1, n was the number of wells in the given phase on day d, D was the number of days in the exposure window, $d_{ij}^{2}$ was the squared distance between well i and participant j (with distance in meters), and $s_i$ was 1 for the pad preparation and drilling phases, the total well depth (meters) of well i for the stimulation phase, and the daily natural gas production volume (m³) of well i for the production phase.

We summed the metric over the relevant exposure period for each outcome (d is negative in the formula above because it indexes days before the index date). For the asthma exacerbation outcomes, we assigned the metrics on the single day before the index date, because studies of air pollution and asthma typically use a short exposure window (Table 1.6.1). We compared several lags and durations of the drilling metric for a subset of 446 randomly selected asthma hospitalizations (Table 2.7.3.2). Because these were highly correlated (Spearman coefficients of 0.96-0.99), we used a lag of one day and a duration of one day for each UNGD metric for computational simplicity.
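Once wells are grouped by day and phase, Equation 2.7.3.1 can be evaluated directly. Below is a minimal Python sketch, not the study's implementation. It assumes projected coordinates in meters and a hypothetical dictionary of active wells keyed by date. It also assumes no well lies exactly at a patient's location, which would make the distance zero.

```python
from datetime import date, timedelta


def daily_metric(patient_xy, active_wells):
    """Inner sum of Equation 2.7.3.1 for one day: sum of s_i / d_ij^2.

    patient_xy: projected (x, y) in meters.
    active_wells: iterable of ((x, y), s_i) for wells in the phase that day,
        where s_i is 1 (pad, drilling), total depth (stimulation), or daily
        production volume (production). Assumes d_ij > 0.
    """
    px, py = patient_xy
    total = 0.0
    for (wx, wy), s in active_wells:
        d2 = (wx - px) ** 2 + (wy - py) ** 2  # squared distance, m^2
        total += s / d2
    return total


def ungd_metric(patient_xy, index_date, wells_by_day, window_days):
    """Outer sum of Equation 2.7.3.1 over d = -window_days, ..., -1.

    wells_by_day: {date: [((x, y), s_i), ...]} for one phase (hypothetical layout).
    window_days: 1 for the asthma outcomes, 14 for depression symptoms.
    """
    return sum(
        daily_metric(patient_xy, wells_by_day.get(index_date + timedelta(days=d), []))
        for d in range(-window_days, 0)
    )


wells = {date(2010, 6, 14): [((100.0, 0.0), 1.0), ((0.0, 200.0), 1.0)]}
print(ungd_metric((0.0, 0.0), date(2010, 6, 15), wells, window_days=1))
```

The inverse-squared-distance weighting ranks a patient higher for nearer, more numerous, or larger wells, the three properties the text asks of the metric.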


For the depression symptoms outcome, for each phase of development, the metric was summed over the 14 days before the date the follow-up questionnaire was returned, because the depression symptom questionnaire asked about the prior 14 days.20 In the analysis evaluating whether fatigue or migraine mediated the UNGD-depression symptom association, we assigned the UNGD metric for the three months before baseline questionnaire return. We did so because the prior study that evaluated associations of UNGD with symptoms from the baseline questionnaire also summed the UNGD metric over three months.57

Table 2.7.3.2. Spearman correlation coefficients of the drilling metric assigned for different durations and lags for 446 randomly chosen asthma hospitalizations.

Drilling metric | Lagged day 3 | Lagged day 1 | Lagged days 3-5 | Lagged days 1-5
Lagged day 3 | 1 | | |
Lagged day 1 | 0.97 | 1 | |
Lagged days 3-5 | 0.99 | 0.96 | 1 |
Lagged days 1-5 | 0.99 | 0.98 | 0.99 | 1
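Spearman's coefficient is the Pearson correlation of the ranks. The sketch below implements it in plain Python, with average ranks for ties, only to show what Table 2.7.3.2 reports. In practice R's `cor(x, y, method = "spearman")` or `scipy.stats.spearmanr` would be used; the text does not say which tool produced the table.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (assumes neither variable is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```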

2.7.3.3 Assignment of unconventional natural gas activity metrics for impoundments and compressors

In the exposure assessment study (Chapter 5), the UNGD metrics for compressor engines and impoundments were also assigned using Equation 2.7.3.1. For compressor engines, $s_i$ was the compressor engine horsepower, and each engine contributed to the metric from its start date to its removal date. For impoundments, $s_i$ was the area (m²) of the impoundment, and each impoundment contributed to the metric from its installation date to its removal date.

2.8 References

1. Paulus RA, Davis K, Steele GD. Continuous innovation in health care: Implications of the Geisinger experience. Health Affairs. 2008;27(5):1235-1245.

2. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: A review of methods and applications. Annu Rev Public Health. 2015(0).

3. Pennsylvania Code. Unconventional well report act. http://www.legis.state.pa.us/cfdocs/Legis/LI/uconsCheck.cfm?txtType=HTM&yr=2014&sessInd=0&smthLwInd=0&act=173. Updated 2014. Accessed 1/11, 2017.

4. Pennsylvania Code. Act 13. http://www.legis.state.pa.us/CFDOCS/LEGIS/LI/uconsCheck.cfm?txtType=HTM&yr=2012&sessInd=0&smthLwInd=0&act=0013.&CFID=126352892&CFTOKEN=56814378. Updated 2012. Accessed 1/11, 2017.

5. Pennsylvania Code. Oil and gas act. http://www.legis.state.pa.us/WU01/LI/LI/CT/HTM/58/00.032..HTM. Updated 1984. Accessed 1/11, 2017.

6. Roy AA, Adams PJ, Robinson AL. Air pollutant emissions from the development, production, and processing of Marcellus shale natural gas. J Air Waste Manage Assoc. 2013;64(1):19-37.

7. Litovitz A, Curtright A, Abramzon S, Burger N, Samaras C. Estimation of regional air-quality damages from Marcellus shale natural gas extraction in Pennsylvania. Environmental Research Letters. 2013;8(1):014017.

8. Pennsylvania Department of Environmental Protection Bureau of Air Quality. GP-5 fact sheet. http://www.dep.state.pa.us/dep/deputate/airwaste/aq/permits/gp/Fact_Sheet_GP5.pdf. Accessed 1/12, 2017.

9. Pennsylvania Department of Conservation and Natural Resources. Exploration and development well information network. http://dcnr.state.pa.us/topogeo/econresource/oilandgas/EDWIN_home/index.htm. Updated 2016. Accessed 1/5, 2017.

10. U.S. Census Bureau. History of the census. 2010 Overview Web site. https://www.census.gov/history/www/through_the_decades/overview/2010_overview_1.html. Updated 2016. Accessed 1/11, 2017.

11. National Aeronautics and Space Administration. Landsat overview. Updated 2015. Accessed 1/12, 2017.

12. National Oceanic and Atmospheric Administration. The earth at night: Suomi NPP satellite offers unprecedented views. http://research.noaa.gov/News/TabId/496/ArtMID/1377/ArticleID/10133/The-Earth-at-night-Suomi-NPP-satellite-offers-unprecedented-views.aspx. Updated 2012. Accessed 1/12, 2017.

13. National Aeronautics and Space Administration. Moderate resolution imaging spectroradiometer. http://terra.nasa.gov/about/terra-instruments/modis. Web site. Updated 20171/12.

14. U.S. Department of Agriculture. National agriculture imagery program. https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/index. Updated 2013. Accessed 1/10, 2017.

15. SkyTruth. TADPOLE Pennsylvania results. http://frack.skytruth.org/frackfinder/frackfinder-news/tadpolepennsylvaniaresults. Published Feb 12, 2014. Updated 2014. Accessed June 30, 2014.

16. Pacheco JA, Avila PC, Thompson JA, et al. A highly specific algorithm for identifying asthma cases and controls for genome-wide association studies. AMIA Annu Symp Proc. 2009;2009:497-501.

17. Hirsch AG, Stewart WF, Sundaresan AS, et al. Nasal and sinus symptoms and chronic rhinosinusitis in a population-based sample. Allergy. 2016.

18. Patient-Reported Outcomes Measurement Information System. PROMIS fatigue short form 8a. http://www.assessmentcenter.net. Updated 2015. Accessed October 10, 2015.

19. Lipton RB, Dodick D, Sadovsky R, et al. A self-administered screener for migraine in primary care: The ID migraine validation study. Neurology. 2003;61(3):375-382.

20. Kroenke K, Strine TW, Spitzer RL, Williams JB, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1):163-173.

21. Balkrishnan R, Rasu RS, Rajagopalan R. Physician and patient determinants of pharmacologic treatment of sleep difficulties in outpatient settings in the United States. Sleep. 2005;28(6):715.

22. Verbesselt J, Hyndman R, Newnham G, Culvenor D. Detecting trend and seasonal changes in satellite image time series. Remote Sens Environ. 2010;114(1):106-115.

23. Moorman JE, Zahran H, Truman BI, Molla MT, Centers for Disease Control and Prevention (CDC). Current asthma prevalence: United States, 2006-2008. MMWR Surveill Summ. 2011;60(Suppl):84-86.

24. Moorman JE, Person CJ, Zahran HS. Asthma attacks among persons with current asthma: United States, 2001-2010. MMWR Surveill Summ. 2013;62(suppl 3):93-98.

25. Pratt LA, Brody DJ. Depression in the U.S. household population, 2009-2012. NCHS Data Brief. 2014;(172):1-8.

26. Pratt LA, Brody DJ, Gu Q, National Center for Health Statistics (US). Antidepressant use in persons aged 12 and over: United States, 2005-2008. 2011.

27. Johnston NW, Sears MR. Asthma exacerbations. 1: Epidemiology. Thorax. 2006;61(8):722-728.

28. Johnston NW, Johnston SL, Norman GR, Dai J, Sears MR. The September epidemic of asthma hospitalization: School children as disease vectors. J Allergy Clin Immunol. 2006;117(3):557-562.

29. Moorman JE, Akinbami LJ, Bailey CM, et al. National surveillance of asthma: United States, 2001-2010. National Center for Health Statistics, Vital Health Stat. 2012;3:35.

30. National Heart, Lung, and Blood Institute. National Asthma Education Program. Expert Panel on the Management of Asthma. Expert panel report 3: Guidelines for the diagnosis and management of asthma: Full report. US Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute; 2007.

31. Munafo MR, Araya R. Cigarette smoking and depression: A question of causation. Br J Psychiatry. 2010;196(6):425-426.

32. Schreier A, Höfler M, Wittchen H, Lieb R. Clinical characteristics of major depressive disorder run in families–a community study of 933 mothers and their children. J Psychiatr Res. 2006;40(4):283-292.

33. Casey JA, Curriero FC, Cosgrove SE, Nachman KE, Schwartz BS. High-density livestock operations, crop field application of manure, and risk of community-associated methicillin-resistant Staphylococcus aureus infection in Pennsylvania. JAMA Internal Medicine. 2013;173(21):1980-1990.

34. Schwartz BS, Bailey-Davis L, Bandeen-Roche K, et al. Attention deficit disorder, stimulant use, and childhood body mass index trajectory. Pediatrics. 2014;133(4):668-676.

35. Preiss K, Brennan L, Clarke D. A systematic review of variables associated with the relationship between obesity and depression. Obesity Reviews. 2013;14(11):906-918.

36. Ogden CL, Carroll MD, Kit BK, Flegal KM. Prevalence of childhood and adult obesity in the United States, 2011-2012. JAMA. 2014;311(8):806.

37. Centers for Disease Control and Prevention. A SAS Program for the 2000 CDC Growth Charts (ages 0 to <20 years). 2014.

38. Mair C, Diez Roux AV, Galea S. Are neighbourhood characteristics associated with depressive symptoms? A review of evidence. J Epidemiol Community Health. 2008;62(11):940-6.

39. Stahre M. Contribution of excessive alcohol consumption to deaths and years of potential life lost in the United States. Preventing Chronic Disease. 2014;11.

40. Schwartz BS, Glass TA, Pollak J, et al. Depression, its comorbidities and treatment, and childhood body mass index trajectories. Obesity (Silver Spring). 2016.

41. Schwartz BS, Stewart WF, Godby S, et al. Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med. 2011;41(4):e17-e28.

42. U.S. Census Bureau. TIGER/line shapefiles. https://www.census.gov/geo/maps-data/data/tiger-line.html. Updated 2016. Accessed 5/24, 2016.

43. U.S. Census Bureau. Understanding geographic identifiers. https://www.census.gov/geo/reference/geoidentifiers.html. Updated 2015. Accessed 5/24, 2016.

44. Pickett KE, Pearl M. Multilevel analyses of neighbourhood socioeconomic context and health outcomes: A critical review. J Epidemiol Community Health. 2001;55(2):111-122.

45. Townsend P, Phillimore P, Beattie A. Health and deprivation: Inequality and the north. London; New York: Croom Helm; 1988.

46. Nau C, Schwartz BS, Bandeen-Roche K, et al. Community socioeconomic deprivation and obesity trajectories in children using electronic health records. Obesity. 2015;23(1):207-212.

47. Liu AY, Curriero FC, Glass TA, Stewart WF, Schwartz BS. The contextual influence of coal abandoned mine lands in communities and type 2 diabetes in Pennsylvania. Health Place. 2013.

48. Buckley JP, Richardson DB. Seasonal modification of the association between temperature and adult emergency department visits for asthma: A case-crossover study. Environ Health. 2012;11(1):55.

49. National Climatic Data Center. Climate Data Online Web site. Accessed May 11, 2011.

50. McConnell R, Berhane K, Yao L, et al. Traffic, susceptibility, and childhood asthma. Environ Health Perspect. 2006:766-772.

51. U.S. Department of Transportation Federal Highway Administration. Highway Performance Monitoring System Web site. http://www.fhwa.dot.gov/policyinformation/hpms/shapefiles.cfm. Updated 2013. Accessed March 27, 2015.

52. Gopalakrishnan S, Klaiber HA. Is the shale energy boom a bust for nearby residents? Evidence from housing values in Pennsylvania. Am J Agric Econ. 2014;96(1):43-66.

53. Muehlenbachs L, Spiller E, Timmins C. The housing market impacts of shale gas development. Am Econ Rev. 2015;105(12):3633-59.

54. Pennsylvania Department of Health. Public water systems. Environmental Health Tracking Program Web site. http://www.health.pa.gov/My%20Health/Environmental%20Health/Environmental%20Public%20Health%20Tracking/Pages/Metadata-for-Drinking-Water-Quality.aspx#.V0Xr8JErKM8. Updated 2015. Accessed 5/25, 2016.

55. Porta M. A dictionary of epidemiology. Oxford University Press; 2008.

56. Gaines M. PennDOT’s posting and bonding program and impact of unconventional oil & gas. http://extension.psu.edu/natural-resources/natural-gas/webinars/shale-energy-developments-effect-on-the-posting-bonding-and-maintenance-of-roads-in-rural-pa/mark-gaines-may-16-2013-powerpoint. Published May 16, 2013.

57. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2016.


Chapter 3: Asthma Exacerbations and Unconventional Natural Gas Development in the Marcellus Shale

3.0 Cover page

Sara G. Rasmussen, MHS1; Elizabeth L. Ogburn, PhD2; Meredith McCormack, MD3; Joan A. Casey, PhD4; Karen Bandeen-Roche, PhD2; Dione G. Mercer, BS5; and Brian S. Schwartz, MD, MS1,3,5

1Department of Environmental Health Sciences, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA; 2Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA; 3Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA; 4Robert Wood Johnson Foundation Health and Society Scholars Program, UC San Francisco and UC Berkeley, California, USA; 5Center for Health Research, Geisinger Health System, Danville, Pennsylvania, USA

Acknowledgements: We thank Joseph J. DeWalle, BS, Jennifer K. Irving, BA, and Joshua M. Crisp, BS (Geisinger Center for Health Research) for patient geocoding and assistance in assembling the UNGD dataset; SkyTruth in Shepherdstown, WV for the well pad data; Kirsten Koehler, PhD (JHSPH) for assistance with the temperature data; Kara Rudolph, PhD, MHS (Robert Wood Johnson Foundation Health and Society Scholars Program, UC San Francisco and UC Berkeley) for the code for the unmeasured confounder graphs; and Jonathan S. Pollak, MPP (JHSPH) for identifying the asthma patients from the general Geisinger population. All except KK and KR received compensation for their contributions. This study was funded by the National Institute of Environmental Health Sciences grant ES023675-01 (PI: B S Schwartz) and training grant ES07141 (S G Rasmussen). Additional support was provided by the Degenstein Foundation for compilation of well data, the Robert Wood Johnson Foundation Health & Society Scholars program (J A Casey), and the National Science Foundation Integrative Graduate Education and Research Traineeship (S G Rasmussen). No funders had input into the study design, conduct, data collection or analysis, or manuscript preparation.

Rasmussen SG, Ogburn EL, McCormack M, et al. Association between unconventional natural gas development in the Marcellus shale and asthma exacerbations. JAMA Intern Med. 2016;176(9):1334-1343.


3.1 Abstract

Importance: Asthma is common and can be exacerbated by air pollution and stress.

Unconventional natural gas development (UNGD) has community and environmental impacts. In Pennsylvania, development began in 2005 and by 2012, 6,253 wells were drilled. There are no prior studies of UNGD and objective respiratory outcomes.

Objective: To evaluate associations between UNGD and asthma exacerbations.

Design: A nested case-control study comparing asthma patients with exacerbations to asthma patients without exacerbations from 2005-12.

Setting: The Geisinger Clinic, which provides primary care services to over 400,000 patients in Pennsylvania.

Participants: Asthma patients aged 5-90 years (n = 35,508) were identified in electronic health records; those with exacerbations were frequency-matched on age, sex, and year of event to those without.

Exposure(s): On the day before each patient’s index date (cases: date of event or medication order; controls: contact date), we estimated UNGD activity metrics for four phases (pad preparation, drilling, stimulation [“fracking”], and production) using distance from the patient’s home to the well, well characteristics, and the dates and durations of phases.

Main Outcome(s) and Measure(s): We identified mild, moderate, and severe asthma exacerbations (new oral corticosteroid medication order, emergency department encounter, and hospitalization, respectively).

Results: We identified 20,749 mild, 1,870 moderate, and 4,782 severe asthma exacerbations, and frequency-matched these to 14,104, 9,350, and 18,693 control index dates, respectively. In three-level adjusted models, there was an association between the highest group of the activity metric for each UNGD phase compared to the lowest group for 11 out of 12 UNGD-outcome pairs (odds ratios [95% CI] ranged from 1.5 [1.2-1.7] for the association of the pad metric with severe exacerbations to 4.4 [3.8-5.2] for the association of the production metric with mild exacerbations). Six of the 12 UNGD-outcome associations had increasing odds ratios across quartiles. Our findings were robust to increasing levels of covariate control and in sensitivity analyses that included evaluation of some possible sources of unmeasured confounding.

Conclusions and Relevance: Residential UNGD activity metrics were statistically associated with increased odds of mild, moderate, and severe asthma exacerbations.

Whether these associations are causal awaits further investigation, including more detailed exposure assessment.

3.2 Introduction

Asthma is a common, chronic disease – in 2010, 25.7 million people in the United States had asthma, a prevalence of 8.4%.1 Asthma is characterized by variable and recurring symptoms (including cough, wheezing, shortness of breath, and chest tightness), reversible airflow obstruction, bronchial hyper-responsiveness, and underlying inflammation.2,3 In 2009, there were 11.8 million outpatient visits, 2.1 million emergency department visits, and 479,300 hospitalizations for asthma in the US.1

Outdoor air pollution is a recognized cause of asthma exacerbations. A large body of literature links asthma exacerbations to exposure to air pollutants, including ozone, particulate matter, nitrogen dioxide, and sulfur dioxide,2,4 and exposure to even low levels of these pollutants has been associated with asthma hospitalizations, emergency department visits, and rescue medication use, with latencies of 0 to 5 days.5-11 Stress at the individual and community levels is also associated with asthma exacerbations.12 Psychosocial stress can modify the effects of environmental triggers13 and is associated with worse asthma control and medication adherence.14


Unconventional natural gas development (UNGD) has recently become a major energy source domestically and worldwide. Pennsylvania has proceeded with UNGD rapidly: between the mid-2000s and 2012, 6,253 wells were drilled. In contrast, New York and Maryland, which also overlie the Marcellus shale, have not pursued development.15,16 Despite calls for research on the health effects of the industry, there are few published studies of the public health impacts of UNGD.17,18

The first step of UNGD is well pad preparation, lasting about 30 days,

during which 3-5 acres are cleared and materials are brought to the site.19 Drilling begins

on the spud date and typically lasts up to a month as a well is drilled vertically 2,000-

3,000 meters and horizontally 600-3,000 meters.19 After drilling is completed, the

horizontal portion is perforated. Stimulation, also called hydraulic fracturing or “fracking,”

follows, lasts around a week, and requires 11-19 million liters of water, sand, and

chemical additives (e.g., friction reducers, biocides, gelling agents).19,20 Development to

this point requires over 1,000 truck trips per well.19 After stimulation, gas production

begins. The Pennsylvania Department of Environmental Protection (PA DEP) requires

companies to submit documentation at most of these stages of well development.21

UNGD has been associated with air quality and community social impacts.22-29 Psychosocial stress,12 exposure to air pollution4,30 including truck traffic,31 sleep disruption,32,33 and reduced socioeconomic status34 are all biologically plausible pathways by which UNGD could affect asthma exacerbations. To date, there have been no epidemiologic studies of UNGD and objective respiratory outcomes. Respiratory outcomes are well suited to assessing the potential health impacts of UNGD: they have clear links to air pollution and stress, have a short latency between exposure and health effects, are common in the general population, and prompt patients to seek care, so they are captured by health system data. Using electronic health record (EHR) data from the Geisinger Clinic, which operates in over 35 counties in Pennsylvania, including many with active UNGD, we conducted a nested case-control study of the association between four UNGD activity metrics and asthma exacerbations.

3.3 Methods

3.3.1 Study population

We identified asthma patients from the Geisinger Clinic population, which is representative of the general population in the region.35 We included Pennsylvania and New York patients. Using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9) codes, we excluded patients with cystic fibrosis (277.0x), chronic pulmonary heart disease (416.x), paralysis of vocal cords or larynx (478.3x), bronchiectasis (494.xx), and pneumoconiosis (500.xx-508.xx). We required patients to have at least two encounters or medication orders with ICD-9 codes for asthma on different days.36 Patients were geocoded using previously published methods:37 88.9% to home address, 2.6% to ZIP+4, and 8.5% to ZIP code centroid. Patients were also required to have had contact with Geisinger from 2005-2012 while aged 5-90 years and to have sex recorded (n=35,508). The study was approved by the Geisinger Institutional Review Board (with an IRB authorization agreement with Johns Hopkins Bloomberg School of Public Health). Patients did not receive a stipend, and informed consent was obtained through a waiver of HIPAA authorization.

3.3.2 Outcome Ascertainment

We identified new oral corticosteroid (OCS) medication orders, asthma emergency department encounters, and asthma hospitalizations, termed mild, moderate, and severe exacerbations, respectively. For patients with more than one exacerbation of a given type within a calendar year, we randomly selected one event. For mild exacerbations, we distinguished new OCS medication orders for an asthma exacerbation from 2008-2012 from standing orders or OCS ordered for other diseases (Figure 3.3.2). The medication order date was considered the index date. OCS orders from before 2008 were excluded because they were not consistently captured before then.

For moderate and severe exacerbations, we identified all emergency department and hospitalization encounters from 2005-2012 with a primary or secondary diagnosis of asthma (ICD-9 493.x). Multiple emergency or hospitalization encounters by a patient within 72 hours were considered a single event, and emergency and hospitalization encounters within 72 hours of each other were combined into a single hospitalization. The index date was the first encounter or admission date of each group of combined encounters. For patients with more than one type of exacerbation within a week, we retained only the higher category.
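The encounter-combining rules can be sketched as follows. This is an illustrative Python version, not the study's code. It makes three assumptions: the 72-hour window chains from the previous encounter in a group (the text does not say whether it is measured from the first or the previous encounter), "within a week" means within 7 days, and encounters use a hypothetical tuple layout.

```python
from datetime import datetime, timedelta

SEVERITY = {"mild": 1, "moderate": 2, "severe": 3}


def merge_encounters(encounters, gap=timedelta(hours=72)):
    """Combine one patient's asthma ED/hospital encounters into events.

    encounters: list of (timestamp, kind) with kind "ed" or "hospital".
    An encounter within `gap` of the previous one joins the same event. An
    event containing any hospitalization is severe; otherwise it is moderate.
    The event's index time is its first encounter.
    """
    events = []
    for ts, kind in sorted(encounters):
        if events and ts - events[-1]["last"] <= gap:
            events[-1]["last"] = ts
            if kind == "hospital":
                events[-1]["severity"] = "severe"
        else:
            events.append({"index": ts, "last": ts,
                           "severity": "severe" if kind == "hospital" else "moderate"})
    return events


def keep_highest_within_week(events):
    """Drop an event if a more severe event occurs within 7 days of it.

    events: list of (date_or_datetime, severity) with severity in SEVERITY.
    """
    return [
        (d, sev) for d, sev in events
        if not any(abs((d2 - d).days) <= 7 and SEVERITY[s2] > SEVERITY[sev]
                   for d2, s2 in events)
    ]


t0 = datetime(2010, 1, 1, 8)
print(merge_encounters([(t0, "ed"), (t0 + timedelta(hours=48), "hospital")]))
```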


Figure 3.3.2. Flow diagram for identification of new asthma oral corticosteroid (OCS) medication orders.

3.3.3 Controls and Matching

We identified controls from asthma patients under observation by the health system, so that if a patient were to have an exacerbation, it would be captured by the EHR. All patient contact dates were identified (e.g., encounter, order, test). Because many of the covariates and the UNGD metrics were time-varying, we needed a single date on which to assign these variables. Therefore, for controls, we randomly selected one contact date per year per patient. A case was always eligible to serve as a control in the analysis of a less severe event type, and in the analysis of an event type of equal or greater severity only until the year of the case's event. We frequency-matched cases to controls on age category (5-12, 13-18, 19-44, 45-61, 62-74, 75+ years), sex (male, female), and year of encounter.
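Control date selection and frequency matching can be sketched as below. This is a hypothetical illustration: the record layouts and function names are assumptions, and the `ratio` argument (controls per case within an age-sex-year stratum) is a placeholder because the text does not state a matching ratio. Capping the draw at the available pool is also an assumption; it shows how some cells can end up unbalanced.

```python
import random
from collections import Counter, defaultdict
from datetime import date

def sample_control_dates(contacts, seed=0):
    """contacts: iterable of (patient_id, contact_date) for any contact with the
    health system (encounter, order, test). Returns {(patient_id, year): date}
    holding one randomly selected contact date per patient per calendar year."""
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for pid, d in contacts:
        by_key[(pid, d.year)].append(d)
    return {key: rng.choice(sorted(ds)) for key, ds in by_key.items()}

def frequency_match(cases, pool, ratio, seed=0):
    """cases, pool: lists of dicts with 'age_cat', 'sex', and 'year' keys.
    Draws up to ratio x (number of cases) controls, without replacement,
    from the eligible pool within each age-category/sex/year stratum."""
    rng = random.Random(seed)
    stratum = lambda r: (r["age_cat"], r["sex"], r["year"])
    n_cases = Counter(stratum(c) for c in cases)
    by_stratum = defaultdict(list)
    for r in pool:
        by_stratum[stratum(r)].append(r)
    controls = []
    for key, n in n_cases.items():
        available = by_stratum.get(key, [])
        # Strata with too few eligible controls stay unbalanced.
        controls.extend(rng.sample(available, min(len(available), ratio * n)))
    return controls
```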

3.3.4 Covariates

We created time-varying covariates (age, season of event, smoking status, overweight and obesity, Medical Assistance [as a measure of low family socioeconomic status], type 2 diabetes) for each index date, and non-time-varying covariates (sex and race/ethnicity) for each patient. Race/ethnicity was assessed by patient self-report and was included because it is a well-documented confounder in studies of asthma.2 We estimated each patient's distance to the nearest major and minor road using a road network from the Federal Highway Administration,38 and used patients' geographic coordinates to assign them to a community using a mixed definition of place; we then calculated community socioeconomic deprivation (CSD) for these places.37,39 In cities, communities were defined by census tracts; elsewhere, communities were defined by minor civil divisions (townships and boroughs). We estimated the peak temperature on the day before each index date using data from the weather station nearest to each patient.40 We did not control for place type because it is closely related to UNGD activity, and adjusting for it could amount to controlling for the exposure itself.

3.3.5 Well Data

Well data were obtained from the PA DEP, for well spud (start of drilling) and production data; the Pennsylvania Department of Conservation and Natural Resources, for information on well stimulation (hydraulic fracturing) and depths; and SkyTruth (Shepherdstown, WV), which used crowdsourcing of aerial photographs from the U.S. Department of Agriculture to identify the location of well pads.41 For each well, we had information on well pad; latitude and longitude; dates of spudding, stimulation, and production; total depth; volume of natural gas produced; and number of production days. We imputed missing total depths (0.4%) using conditional mean imputation. We estimated missing production quantities (0.2%) by averaging production quantities in the prior and following periods. We imputed missing spud (2.0%) and stimulation (34.6%) dates from each well's available development dates, using median durations between phases from wells without any missing dates and requiring that the stimulation date fall between the spud and production dates.
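The date imputation described above can be sketched as follows. The text does not give the exact rule, so this is one plausible reading: anchor on the spud date (or, failing that, the production date), add the median phase duration from complete wells, and then clip the result so it falls strictly between spud and production. Field names (`spud`, `stim`, `prod`) and function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import median

PHASES = ("spud", "stim", "prod")

def median_gap_days(wells, a, b):
    """Median number of days from phase a to phase b among wells with no missing dates."""
    complete = [w for w in wells if all(w.get(p) is not None for p in PHASES)]
    return round(median((w[b] - w[a]).days for w in complete))

def impute_stim_date(well, gap_spud_stim, gap_stim_prod):
    """Return the well's stimulation date, imputing it if missing."""
    if well.get("stim") is not None:
        return well["stim"]
    spud, prod = well.get("spud"), well.get("prod")
    if spud is not None:
        guess = spud + timedelta(days=gap_spud_stim)
    elif prod is not None:
        guess = prod - timedelta(days=gap_stim_prod)
    else:
        return None  # nothing to anchor the imputation on
    # Require the imputed date to fall strictly between spud and production.
    if spud is not None and guess <= spud:
        guess = spud + timedelta(days=1)
    if prod is not None and guess >= prod:
        guess = prod - timedelta(days=1)
    return guess
```

Missing spud dates could be handled symmetrically, working backward from the stimulation or production date.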

3.3.6 Activity Metric Assignment

We estimated the UNGD activity metrics using an inverse distance-squared method for the pad preparation, spud, stimulation, and production phases. We compared activity metrics on the day before, three days before, the sum of three to five days before, and the sum of one to five days before the index date; because they were highly correlated (Spearman correlation coefficients ranged from 0.96 to 1.00), we used only the day before the index date.

For the pad preparation and spud metrics, we used Equation 3.3.6.1, where $n$ is the number of wells and $d_{ij}^{2}$ is the squared distance between well $i$ and patient $j$ (distance in meters).

Equation 3.3.6.1. Pad preparation and spud metric.

$$\text{Pad or spud metric}_j = \sum_{i=1}^{n} \frac{1}{d_{ij}^{2}}$$

For the stimulation metric, we used Equation 3.3.6.2, where $n$ is the number of wells, $d_{ij}^{2}$ is the squared distance between well $i$ and patient $j$ (distance in meters), and $t_i$ is the total well depth (meters) of well $i$.

Equation 3.3.6.2. Stimulation activity metric.

$$\text{Stimulation metric}_j = \sum_{i=1}^{n} \frac{t_i}{d_{ij}^{2}}$$

Total depth was used as a surrogate for truck traffic because the volume of water used during stimulation42 was highly correlated with total depth, and water is trucked to the well during stimulation. For the production metric, we used Equation 3.3.6.3, where $n$ is the number of wells, $d_{ij}^{2}$ is the squared distance between well $i$ and patient $j$ (distance in meters), and $v_i$ is the daily natural gas production volume (m³) of well $i$.

Equation 3.3.6.3. Production activity metric.

$$\text{Production metric}_j = \sum_{i=1}^{n} \frac{v_i}{d_{ij}^{2}}$$

Production volume was used as a surrogate for fugitive emissions and compressor engine activity.22
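Equations 3.3.6.1-3.3.6.3 share one form: a weighted sum of inverse squared distances, where the weight is 1 (pad, spud), total depth (stimulation), or daily production volume (production). The Python sketch below computes that sum for one patient. It assumes coordinates have already been projected to a planar system in meters (the source records latitude and longitude, so a geodesic distance may be what was actually used), and the 1 m² floor on the squared distance is a hypothetical guard against division by zero, not part of the published method.

```python
def idw2_metric(patient_xy, wells, weight_key=None):
    """Inverse distance-squared activity metric for one patient j:
    sum over the wells i active in a phase of w_i / d_ij^2, where w_i = 1 for
    the pad and spud metrics, total depth t_i (m) for stimulation, and daily
    gas volume v_i (m^3) for production. Coordinates are assumed projected,
    in meters, so a planar distance is used."""
    px, py = patient_xy
    total = 0.0
    for well in wells:
        d2 = (well["x"] - px) ** 2 + (well["y"] - py) ** 2
        # Hypothetical guard: floor the squared distance at 1 m^2.
        d2 = max(d2, 1.0)
        w = 1.0 if weight_key is None else well[weight_key]
        total += w / d2
    return total
```

For a patient at the origin with one well 1 km away and another 2 km away, the spud metric is 1/10⁶ + 1/(4 × 10⁶) = 1.25 × 10⁻⁶ per m²; because of the squared distance, the nearer well contributes four times as much as the farther one.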

Based on descriptions of the process19 and our data, we estimated that pad development lasted 30 days before the spud date for the first well on a pad, drilling lasted between 1 and 30 days after the spud date depending on total depth, and stimulation lasted seven days. All wells in Pennsylvania in a given phase on the day prior to an index date contributed to that phase's activity metric (Equations 3.3.6.1-3.3.6.3). We divided the four continuous metrics into quartiles (very low, low, medium, and high) using all 69,548 index dates from all three outcomes, so that the cutpoints were the same for all outcomes.

3.3.7 Statistical Analysis

To assess the association of the four UNGD activity metrics with the three types of asthma exacerbations, we used multilevel logistic regression with random intercepts for patient and community to account for multiple events per patient and for clustering of patients within communities. The base model included one of the four UNGD activity metrics (very low, low, medium, high) and, as covariates, age category (5-12, 13-18, 19-44, 45-61, 62-74, 75+ years), sex (male, female), race/ethnicity (black, Hispanic, other, white), family history of asthma (yes, no), smoking status (former, current, missing, never), season (summer, fall, winter, spring), Medical Assistance (yes, no), and overweight/obesity (using BMI percentile for children and BMI for adults43). We then added, one at a time, type 2 diabetes (yes, no), CSD (quartiles), distances to the nearest major and minor arterial road (meters, z-transformed), and maximum temperature on the day prior to the event (°C per interquartile range) (Equation 3.3.7). We included the continuous covariates as linear and quadratic terms to allow for non-linearity. We used a two-sided type I error rate of 0.05 for significance testing. We used Stata version 11.2 (StataCorp Inc.) and R version 3.1.2 (R Foundation for Statistical Computing).

Equation 3.3.7. Statistical Model

$$
\begin{aligned}
\operatorname{logit}(Y_{ijk}) ={}& \beta_0 + \beta_1(\text{UNGD Q2})_{ijk} + \beta_2(\text{UNGD Q3})_{ijk} + \beta_3(\text{UNGD Q4})_{ijk} \\
&+ \beta_4(\text{age 13-18})_{ijk} + \beta_5(\text{age 19-44})_{ijk} + \beta_6(\text{age 45-61})_{ijk} + \beta_7(\text{age 62-74})_{ijk} + \beta_8(\text{age 75+})_{ijk} \\
&+ \beta_9(\text{male sex})_{ij} + \beta_{10}(\text{race/ethnicity, black})_{ij} + \beta_{11}(\text{race/ethnicity, Hispanic})_{ij} + \beta_{12}(\text{race/ethnicity, other/missing})_{ij} \\
&+ \beta_{13}(\text{family history of asthma})_{ij} + \beta_{14}(\text{smoking, current})_{ijk} + \beta_{15}(\text{smoking, former})_{ijk} + \beta_{16}(\text{smoking, missing})_{ijk} \\
&+ \beta_{17}(\text{season, summer})_{ijk} + \beta_{18}(\text{season, fall})_{ijk} + \beta_{19}(\text{season, winter})_{ijk} + \beta_{20}(\text{Medical Assistance})_{ijk} \\
&+ \beta_{21}(\text{overweight})_{ijk} + \beta_{22}(\text{obese})_{ijk} + \beta_{23}(\text{BMI missing})_{ijk} + \beta_{24}(\text{type 2 diabetes})_{ijk} \\
&+ \beta_{25}(\text{CSD Q2})_{i} + \beta_{26}(\text{CSD Q3})_{i} + \beta_{27}(\text{CSD Q4})_{i} \\
&+ \beta_{28}(\text{distance to nearest major road})_{ij} + \beta_{29}(\text{distance to nearest major road})^{2}_{ij} \\
&+ \beta_{30}(\text{distance to nearest minor road})_{ij} + \beta_{31}(\text{distance to nearest minor road})^{2}_{ij} \\
&+ \beta_{32}(\text{maximum temperature, prior day})_{ij} + \beta_{33}(\text{maximum temperature, prior day})^{2}_{ij} + u_i + u_{ij}
\end{aligned}
$$

where i=community, j=person, k=index date and (u_i, u_ij) are independent normally distributed random effects with mean 0 and separate variances.

UNGD, unconventional natural gas development activity metric; Q, quartile; CSD, community socioeconomic deprivation.

3.3.7.1 Model Building

We calculated the intraclass correlation coefficient at the person and community levels. The proportions of total variance accounted for by between-community variation and between-person variation, respectively, were 14% and 63% for severe exacerbations, 41% and 89% for moderate exacerbations, and 1.2% and 59% for mild exacerbations. We evaluated covariates for conditional significance as they were added to the models.
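The text does not say how the intraclass correlations were computed. One common choice for random-intercept logistic models is the latent-variable approach, in which the level-1 residual variance is fixed at π²/3, the variance of the standard logistic distribution. The sketch below assumes that approach and a community / person-within-community nesting. Under this definition the person-level value includes the community variance, so it is always at least as large as the community-level value, consistent with the paired percentages above.

```python
import math

def latent_iccs(var_community, var_person):
    """Latent-variable ICCs for a logistic model with random intercepts for
    community and for person nested in community (assumed method).
    Returns (community-level ICC, person-level ICC)."""
    residual = math.pi ** 2 / 3  # variance of the standard logistic distribution
    total = var_community + var_person + residual
    icc_community = var_community / total
    icc_person = (var_community + var_person) / total  # same person implies same community
    return icc_community, icc_person
```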

3.3.7.2 Sensitivity Analyses

To evaluate how the four separate UNGD activity metrics compared to a summary measure, we calculated z-scores using the continuous metrics, summed the z-scores, and re-ran the final models with this combined UNGD activity metric (quartiles).
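The combined metric can be sketched as a sum of per-phase z-scores computed over all index dates. This is an illustration under assumptions: the source does not say whether the population or sample standard deviation was used, or whether the metrics were transformed before standardizing. The function names are hypothetical. The combined score would then be cut into quartiles like the individual metrics.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def combined_metric(metrics):
    """metrics: dict mapping phase name -> list of continuous metric values,
    one per index date, in the same order for every phase.
    Returns the sum of the phase-specific z-scores for each index date."""
    z = [zscores(vals) for vals in metrics.values()]
    return [sum(col) for col in zip(*z)]
```

Standardizing first keeps the production metric, whose raw values span a much wider range than the others, from dominating the sum.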


To explore whether an unmeasured confounder was responsible for our associations, we evaluated associations with encounters for a negative control disease44 (intestinal infectious disease and noninfectious gastroenteritis, ICD-9 codes 001-009 and 558.9) among asthma patients, and we also replaced the UNGD activity metric with indicators for counties. Because the numbers of cases and controls were unbalanced for certain combinations of age category, sex, and year in the mild exacerbation analysis, we reran that analysis after dropping the unbalanced cells. To check sensitivity to geocoding level, we reran the final model for the production UNGD metric and each outcome using only patients geocoded to their home address. Finally, we estimated how large an unmeasured confounder would need to be to account for the observed associations, in whole or in part.45
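The unmeasured-confounder calculation can be illustrated with a classic external-adjustment formula for a binary confounder. The formula is exact for risk ratios and approximate for odds ratios when the outcome is rare. It is not necessarily the method of reference 45, and the function names are hypothetical. The inputs are the confounder-outcome odds ratio, the confounder's prevalence among the exposed, and the exposure-confounder odds ratio, which is used to back out the prevalence among the unexposed.

```python
def confounder_bias_factor(or_confounder_outcome, p_exposed, or_exposure_confounder):
    """Bias factor for a binary unmeasured confounder (assumed formula).
    p_exposed: confounder prevalence in the exposed group."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = odds_exposed / or_exposure_confounder
    p_unexposed = odds_unexposed / (1 + odds_unexposed)
    g = or_confounder_outcome - 1
    return (1 + p_exposed * g) / (1 + p_unexposed * g)

def adjust_or(observed_or, ci, bias):
    """Divide the point estimate and confidence limits by the bias factor."""
    lo, hi = ci
    return observed_or / bias, (lo / bias, hi / bias)
```

With both odds ratios equal to 3.0 and a prevalence of 0.3 among the exposed, the implied prevalence among the unexposed is 0.125, and this formula gives a bias factor of 1.6/1.25 = 1.28. Applied to the high production metric odds ratio for hospitalization in Table 3.4.2, 1.74 (1.45-2.09), it would give about 1.36 with a lower limit of about 1.13.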

3.4 Results

3.4.1 Descriptions of Wells and Patients

Between 2005 and 2012, 6,253 unconventional natural gas wells were spudded on 2,710 pads, 4,728 were stimulated, and 3,706 were in production. The median number of wells per pad was 1 (IQR 1-3), and the median total depth was 3,394 m (IQR 2,934-3,839). Most development occurred after 2007 (Figure 3.4.1.1). On their index date, patients in the highest group of the spud metric lived a median of 19 km from the closest spudded well, compared with 63 km for patients in the lowest group. We identified 5,600 severe, 2,291 moderate, and 25,647 mild exacerbations. After retaining one event per type per year per person, 4,782 severe, 1,870 moderate, and 20,749 mild exacerbations were included. There was substantial overlap of patients and wells in the northern counties (Figure 3.4.1.2), and substantial overlap of patients by quartile of UNGD activity metric (Figure 3.4.1.3).


Figure 3.4.1.1. Number of developed pads (blue), and spudded (red), stimulated (green), and producing wells (yellow), 2005-12.


Figure 3.4.1.2. The location of spudded wells as of December 2012 and residential location of Geisinger asthma patients.


Figure 3.4.1.3. Locations of cases and controls by quartile of spud activity metric.

Demographic and clinical variables differed by outcome (Table 3.4.1). Compared with patients with mild and moderate exacerbations, patients with severe exacerbations were more likely to be female, older, current smokers, and obese (all p<0.001). Patients with moderate exacerbations were more likely than patients with the other two outcomes to be on Medical Assistance and of black race, and patients with mild exacerbations were more likely than patients with the other two outcomes to live in townships (all p<0.001).

3.4.2 Associations of UNGD Activity Metrics with Asthma Outcomes

For severe, moderate, and mild exacerbations, the average percent changes in all odds ratios, from simple models with random intercepts for person and place without covariates to fully adjusted multilevel models, were -8.5%, -0.2%, and 6.0%, respectively, suggesting little sensitivity of the associations to measured covariates. In adjusted models, high activity (vs. very low) of each UNGD metric was associated with each asthma outcome (Table 3.4.2), except for the pad metric with moderate exacerbations. Associations for the other 11 exposure-outcome pairs ranged from (odds ratio [95% confidence interval]) 1.5 (1.2-1.7) for the pad metric with severe exacerbations to 4.4 (3.8-5.2) for the production metric with mild exacerbations. Of the 12 activity metric-outcome pairs, six had increasing odds ratios across quartiles 2-4.
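The "average percent change" summary can be sketched as follows, assuming the change is computed on the odds ratio scale for each odds ratio and then averaged within an outcome. The source does not specify whether the log scale was used. The function names and example odds ratios are hypothetical.

```python
def percent_change(or_crude, or_adjusted):
    """Percent change in an odds ratio from the unadjusted to the adjusted model."""
    return 100.0 * (or_adjusted - or_crude) / or_crude

def average_percent_change(pairs):
    """pairs: iterable of (unadjusted OR, adjusted OR) for every odds ratio in a set of models."""
    changes = [percent_change(c, a) for c, a in pairs]
    return sum(changes) / len(changes)
```

Averaging signed changes lets increases and decreases offset each other. A value near zero, such as the -0.2% for moderate exacerbations, therefore reflects little net change but does not by itself rule out larger movements in individual odds ratios.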


Table 3.4.1. Descriptive statistics of cases and controls by exacerbation type for selected study variables by variable type (constant vs. time-varying)

Variable | Hospitalization: Control n (%)ᵃ | Hospitalization: Case n (%) | Emergency Department Encounter: Control n (%) | Emergency Department Encounter: Case n (%) | Oral Corticosteroid Order: Control n (%) | Oral Corticosteroid Order: Case n (%)

Non-time-varying (constant) variables
Total number of patients | 14104 (100) | 3576 (100) | 9350 (100) | 1454 (100) | 18693 (100) | 13196 (100)
Female | 10093 (71.6) | 2520 (70.5) | 5660 (60.5) | 872 (60) | 11297 (60.4) | 8173 (61.9)
Family history of asthma | 1324 (9.4) | 404 (11.3) | 1147 (12.3) | 266 (18.3) | 2047 (11) | 1672 (12.7)
Race/ethnicity
  White | 13309 (94.4) | 3316 (92.7) | 8705 (93.1) | 1223 (84.1) | 17160 (91.8) | 12177 (92.3)
  Black | 345 (2.4) | 111 (3.1) | 286 (3.1) | 125 (8.6) | 676 (3.6) | 431 (3.3)
  Hispanic | 344 (2.4) | 126 (3.5) | 273 (2.9) | 93 (6.4) | 674 (3.6) | 471 (3.6)
  Other/missing | 106 (0.8) | 23 (0.6) | 86 (0.9) | 13 (0.9) | 183 (1.0) | 117 (0.9)
Place type
  Township | 8583 (60.9) | 2017 (56.4) | 5590 (59.8) | 659 (45.3) | 11324 (60.6) | 7917 (60)
  Borough | 4192 (29.7) | 1108 (31) | 2786 (29.8) | 490 (33.7) | 5445 (29.1) | 3891 (29.5)
  City | 1329 (9.4) | 451 (12.6) | 974 (10.4) | 305 (21) | 1924 (10.3) | 1388 (10.5)
Community socioeconomic deprivation
  Quartile 1 | 2967 (21) | 673 (18.8) | 1936 (20.7) | 226 (15.5) | 3897 (20.8) | 2751 (20.8)
  Quartile 2 | 3677 (26.1) | 886 (24.8) | 2454 (26.2) | 307 (21.1) | 4839 (25.9) | 3259 (24.7)
  Quartile 3 | 3561 (25.2) | 920 (25.7) | 2294 (24.5) | 378 (26.0) | 4659 (24.9) | 3427 (26.0)
  Quartile 4 | 3899 (27.6) | 1097 (30.7) | 2666 (28.5) | 543 (37.3) | 5298 (28.3) | 3759 (28.5)
Total number of eventsᵇ
  0 | 14104 (100) | 0 (0) | 9350 (100) | 0 | 18693 (100) | 0 (0)
  1 | 0 (0) | 2732 (76.4) | 0 (0) | 1169 (80.4) | 0 (0) | 8205 (62.2)
  2 | 0 (0) | 605 (16.9) | 0 (0) | 208 (14.3) | 0 (0) | 3138 (23.8)
  3 | 0 (0) | 162 (4.5) | 0 (0) | 46 (3.2) | 0 (0) | 1273 (9.6)
  4 | 0 (0) | 48 (1.3) | 0 (0) | 20 (1.4) | 0 (0) | 451 (3.4)
  5 | 0 (0) | 20 (0.6) | 0 (0) | 5 (0.3) | 0 (0) | 129 (1)
  6 | 0 (0) | 3 (0.1) | 0 (0) | 3 (0.2) | 0 (0) | 0 (0)
  7 | 0 (0) | 4 (0.1) | 0 (0) | 0 (0) | 0 (0) | 0 (0)
  8 | 0 (0) | 2 (0.1) | 0 (0) | 3 (0.2) | 0 (0) | 0 (0)

Time-varying variables
Encounters (controls) or events (cases)ᶜ | 14104 (100) | 4782 (100) | 9350 (100) | 1870 (100) | 18693 (100) | 20749 (100)
Age (years) at event or matched encounter
  5 to < 13 | 1062 (7.5) | 354 (7.4) | 2265 (24.2) | 453 (24.2) | 4157 (22.2) | 4245 (20.5)
  13 to < 19 | 810 (5.7) | 269 (5.6) | 995 (10.6) | 199 (10.6) | 1926 (10.3) | 1926 (9.3)
  19 to < 45 | 5253 (37.2) | 1751 (36.6) | 4105 (43.9) | 821 (43.9) | 6013 (32.2) | 6323 (30.5)
  45 to < 62 | 4014 (28.5) | 1338 (28) | 1390 (14.9) | 278 (14.9) | 4313 (23.1) | 5353 (25.8)
  62 to < 75 | 1983 (14.1) | 661 (13.8) | 405 (4.3) | 81 (4.3) | 1613 (8.6) | 2113 (10.2)
  75+ | 982 (7.0) | 409 (8.6) | 190 (2.0) | 38 (2.0) | 671 (3.6) | 789 (3.8)
Year of encounter
  2005 | 1593 (11.3) | 531 (11.1) | 845 (9) | 169 (9) | 0 (0) | 0 (0)
  2006 | 1767 (12.5) | 589 (12.3) | 905 (9.7) | 181 (9.7) | 0 (0) | 0 (0)
  2007 | 1659 (11.8) | 552 (11.5) | 1185 (12.7) | 237 (12.7) | 0 (0) | 0 (0)
  2008 | 1563 (11.1) | 526 (11) | 1220 (13) | 244 (13) | 3375 (18.1) | 3375 (16.3)
  2009 | 1819 (12.9) | 608 (12.7) | 1380 (14.8) | 276 (14.8) | 4038 (21.6) | 4038 (19.5)
  2010 | 1794 (12.7) | 603 (12.6) | 1205 (12.9) | 241 (12.9) | 4019 (21.5) | 4019 (19.4)
  2011 | 1886 (13.4) | 648 (13.6) | 1230 (13.2) | 246 (13.2) | 4286 (22.9) | 4624 (22.3)
  2012 | 2023 (14.3) | 725 (15.2) | 1380 (14.8) | 276 (14.8) | 2975 (15.9) | 4693 (22.6)
Season of encounterᵈ
  Spring | 3447 (24.4) | 1219 (25.5) | 2218 (23.7) | 456 (24.4) | 4337 (23.2) | 4618 (22.3)
  Summer | 3357 (23.8) | 1134 (23.7) | 2253 (24.1) | 380 (20.3) | 4536 (24.3) | 3207 (15.5)
  Fall | 4171 (29.6) | 1183 (24.7) | 2724 (29.1) | 553 (29.6) | 5695 (30.5) | 6995 (33.7)
  Winter | 3129 (22.2) | 1246 (26.1) | 2155 (23) | 481 (25.7) | 4125 (22.1) | 5929 (28.6)
Obesityᵉ
  Not overweight/obese | 3728 (26.4) | 1046 (21.9) | 3366 (36) | 569 (30.4) | 6591 (35.3) | 5737 (27.6)
  Overweight | 3605 (25.6) | 1077 (22.5) | 2173 (23.2) | 376 (20.1) | 4441 (23.8) | 4821 (23.2)
  Obese | 6683 (47.4) | 2641 (55.2) | 3762 (40.2) | 895 (47.9) | 7577 (40.5) | 10137 (48.9)
  Missing | 88 (0.6) | 18 (0.4) | 49 (0.5) | 30 (1.6) | 84 (0.4) | 54 (0.3)
Smoking status
  Never | 7454 (52.9) | 2014 (42.1) | 5335 (57.1) | 826 (44.2) | 11375 (60.9) | 11556 (55.7)
  Current | 2552 (18.1) | 1204 (25.2) | 1466 (15.7) | 387 (20.7) | 2589 (13.9) | 3672 (17.7)
  Former | 3204 (22.7) | 1238 (25.9) | 1395 (14.9) | 304 (16.3) | 3231 (17.3) | 4251 (20.5)
  Missing | 894 (6.3) | 326 (6.8) | 1154 (12.3) | 353 (18.9) | 1498 (8) | 1270 (6.1)
Medical Assistanceᶠ | 2657 (18.8) | 1568 (32.8) | 2529 (27) | 741 (39.6) | 4956 (26.5) | 5850 (28.2)
Type 2 diabetes | 1504 (10.7) | 917 (19.2) | 517 (5.5) | 156 (8.3) | 1420 (7.6) | 1905 (9.2)
On inhaled corticosteroids | 4061 (28.8) | 1577 (33) | 2545 (27.2) | 713 (38.1) | 5319 (28.5) | 10458 (50.4)
Distance to nearest major roadᵍ, median, meters | 1042 | 826 | 1077.5 | 651.5 | 1064 | 1032
Distance to nearest minor roadʰ, median, meters | 708.5 | 535 | 682 | 411 | 687 | 691
Temperature on the prior day, median, °C | 16.1 | 16.7 | 16.7 | 15 | 16.1 | 13.3
Pad activity metric, 1010 /m2
  Very low, less than 10.7 | 5988 (42.5) | 2004 (41.9) | 3671 (39.3) | 719 (38.4) | 2344 (12.5) | 2661 (12.8)
  Low, 10.7 to 25.7 | 2811 (19.9) | 816 (17.1) | 2096 (22.4) | 350 (18.7) | 5281 (28.3) | 6033 (29.1)
  Medium, 25.8 to 48.7 | 2675 (19) | 887 (18.5) | 1819 (19.5) | 363 (19.4) | 5489 (29.4) | 6154 (29.7)
  High, greater than 48.7 | 2630 (18.6) | 1075 (22.5) | 1764 (18.9) | 438 (23.4) | 5579 (29.8) | 5901 (28.4)
Spud activity metric, 1010 /m2
  Very low, less than 5.1 | 6009 (42.6) | 2032 (42.5) | 3701 (39.6) | 742 (39.7) | 2352 (12.6) | 2551 (12.3)
  Low, 5.1 to 32.3 | 2796 (19.8) | 819 (17.1) | 2030 (21.7) | 371 (19.8) | 5491 (29.4) | 5880 (28.3)
  Medium, 32.4 to 66.8 | 2719 (19.3) | 821 (17.2) | 1832 (19.6) | 317 (17) | 5389 (28.8) | 6309 (30.4)
  High, greater than 66.8 | 2580 (18.3) | 1110 (23.2) | 1787 (19.1) | 440 (23.5) | 5461 (29.2) | 6009 (29)
Stimulation activity metric, 1013 x m/m2
  Very low, less than 2.7 | 5829 (41.3) | 1986 (41.5) | 3598 (38.5) | 729 (39) | 2577 (13.8) | 2668 (12.9)
  Low, 2.7 to 25.5 | 2876 (20.4) | 858 (17.9) | 2089 (22.3) | 391 (20.9) | 5573 (29.8) | 5600 (27)
  Medium, 25.6 to 67.4 | 2736 (19.4) | 841 (17.6) | 1835 (19.6) | 310 (16.6) | 5415 (29) | 6250 (30.1)
  High, greater than 67.4 | 2663 (18.9) | 1097 (22.9) | 1828 (19.6) | 440 (23.5) | 5128 (27.4) | 6231 (30)
Production activity metric, 1015 x m3/m2
  Very low, less than 2.3 | 6079 (43.1) | 2087 (43.6) | 3776 (40.4) | 765 (40.9) | 2345 (12.5) | 2335 (11.3)
  Low, 2.3 to 133.2 | 2629 (18.6) | 794 (16.6) | 1953 (20.9) | 363 (19.4) | 5713 (30.6) | 5935 (28.6)
  Medium, 133.3 to 759.7 | 2636 (18.7) | 798 (16.7) | 1789 (19.1) | 271 (14.5) | 5787 (31) | 6106 (29.4)
  High, greater than 759.7 | 2760 (19.6) | 1103 (23.1) | 1832 (19.6) | 471 (25.2) | 4848 (25.9) | 6373 (30.7)

ᵃ Percentages may not add to 100 due to rounding.
ᵇ Cases contribute up to one hospitalization per year (hospitalizations are randomly chosen from patients with multiple hospitalizations in a year). Controls cannot have had a hospitalization up to the year of the hospitalization in the frequency-matched case, but can serve as a case later.
ᶜ For controls, the encounter is a randomly selected encounter during the year of the matched case's hospitalization and before the year of any subsequent asthma hospitalization in the control. For cases, the event is an asthma hospitalization.
ᵈ Spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21.
ᵉ For children and adults, respectively: normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m2; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m2; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m2.
ᶠ A means-tested program that is a surrogate for family SES.
ᵍ Principal arterial or interstate.
ʰ Minor arterial road.


Table 3.4.2. Associations of unconventional natural gas activity metrics and asthma outcomes

Activity metric | Quartile | Asthma Hospitalizationsᵃ, Odds Ratio (95% CIᵇ) | Asthma Emergency Department Visitsᵃ, Odds Ratio (95% CI) | OCS Ordersᵃ, Odds Ratio (95% CI)
Pad | Lowᶜ | 1.26 (1.06-1.50) | 1.53 (1.06-2.23) | 1.54 (1.37-1.74)
Pad | Medium | 1.37 (1.15-1.64) | 1.77 (1.2-2.6) | 1.66 (1.47-1.87)
Pad | High | 1.45 (1.21-1.73) | 1.37 (0.94-1.99) | 1.59 (1.41-1.81)
Spud | Low | 1.16 (0.98-1.37) | 1.53 (1.06-2.21) | 1.45 (1.29-1.63)
Spud | Medium | 1.26 (1.05-1.50) | 1.54 (1.04-2.27) | 1.98 (1.75-2.24)
Spud | High | 1.64 (1.38-1.97) | 1.57 (1.08-2.29) | 1.99 (1.75-2.26)
Stimulation | Low | 1.13 (0.96-1.33) | 1.51 (1.05-2.19) | 1.23 (1.09-1.39)
Stimulation | Medium | 1.31 (1.10-1.57) | 1.74 (1.17-2.61) | 2.22 (1.95-2.53)
Stimulation | High | 1.66 (1.38-1.98) | 1.71 (1.16-2.52) | 3.00 (2.60-3.45)
Production | Low | 1.10 (0.92-1.30) | 1.47 (1.01-2.14) | 1.28 (1.13-1.46)
Production | Medium | 1.16 (0.97-1.38) | 1.10 (0.74-1.65) | 2.15 (1.87-2.47)
Production | High | 1.74 (1.45-2.09) | 2.19 (1.47-3.25) | 4.43 (3.75-5.22)

ᵃ Multilevel models with a random intercept for patient and community, adjusted for age category (5-12, 13-18, 19-44, 45-61, 62-74, 75+ years), sex (male, female), race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m2; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m2; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m2, for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).
ᵇ Confidence interval.
ᶜ Very low is the reference group.

3.4.3 Sensitivity Analyses

The four UNGD activity metrics, calculated for all case and control index dates (n=69,548), were correlated with one another (Spearman correlation coefficients of the continuous variables ranged from 0.73 to 0.91). In the analysis of a combined UNGD activity metric for the four phases of development, the odds ratio point estimates fell between those from the regressions of each phase separately. In the negative control disease analysis, we found no association of the spud activity metric with gastrointestinal illness. In a model evaluating associations of counties with outcomes (UNGD metrics removed), counties with high UNGD activity were not associated with outcomes (Figure 3.4.3). In the analysis that removed cells with unbalanced numbers of cases and controls in the mild exacerbation analysis, associations were attenuated: the high-group odds ratios for the pad, spud, stimulation, and production metrics decreased by 5%, 17%, 37%, and 55%, respectively, although all remained statistically significant (p<0.05). In the analysis evaluating the impact of geocoding quality, associations were unchanged. For mild and severe exacerbations, we determined that even an unmeasured confounder strongly associated with both UNGD activity and the outcome (e.g., both odds ratios = 3.0), with a prevalence of 0.3 in the exposed group, would not likely change our inference about associations, given our models. However, for moderate exacerbations, an unmeasured confounder with the same characteristics could account for two of the three statistically significant associations.


Figure 3.4.3. Counties Associated with Asthma Hospitalization Case Status.


3.5 Discussion

We conducted a nested case-control study in a large number of asthma patients using EHR data in Pennsylvania from 2005-2012, a period of rapid UNGD development. In this first study of UNGD and objective respiratory outcomes, we found consistent associations of four UNGD activity metrics with three types of asthma exacerbations.

Whether these associations are causal awaits further investigation, including more detailed exposure assessment.

Asthma is a suitable outcome because UNGD has community and environmental impacts that could affect it; it is highly prevalent; it can be exacerbated by stress and by small changes in air quality with short latency; and patients usually seek care for exacerbations, so they are captured by an EHR. By leveraging longitudinal EHR data, we were able to complete a number of sensitivity analyses that suggested the associations were robust to increasing levels of adjustment, although in some cases they were attenuated.

Studies of air pollution and asthma exacerbations have generally found small but consistently increased risks. A study of pediatric emergency department visits for asthma in Atlanta found that a one standard deviation increase in pollution was associated with risk ratios of 1.020, 1.036, and 1.062 for particulate matter < 10 μm, nitrogen dioxide, and ozone, respectively.46 Studies of psychosocial stress have found that in children with asthma, the risk of an asthma exacerbation increased 4.7-fold in the two days following a very stressful event.47 Adults exposed to violence in their community have 2.3 and 2.5 times the risk of an asthma emergency department visit and hospitalization, respectively, compared with those not exposed to community violence.48

Two sensitivity analyses addressed the important possibility that unmeasured confounding could account for our results. First, UNGD metrics were not associated with the negative control disease. Second, in the analysis replacing UNGD metrics with indicators for counties, counties with UNGD were not associated with severe exacerbations. Both provide evidence that unmeasured confounding is unlikely to account for our findings, but we acknowledge that the possibility still exists. We note that an unmeasured confounder would need to be strongly associated with both UNGD and asthma outcomes to account for our results. In the sensitivity analysis addressing unbalanced numbers of cases and controls, results were attenuated; the majority of dropped patients came from the most susceptible groups (younger and older) in the most exposed years, so attenuation was not unexpected. Finally, neither the geocoding method nor the analysis with an overall activity metric changed our inferences.

This study had several strengths, including a large sample size from a population that represents the general population in the region. Additionally, our exposure assessment improved on that of prior studies,49,50 which used categorical distance-based metrics that did not account for UNGD phases. Our metric incorporated the timing and duration of phases, gas production volume, and a surrogate for truck traffic. This study also improved on the outcome ascertainment used in the previous study of UNGD and respiratory outcomes,50 which relied on self-reported outcomes and grouped several respiratory symptoms and conditions together (including asthma); we used documented asthma exacerbations. Our findings were robust to increasing levels of covariate control and in several sensitivity analyses.

This study also had limitations. The EHR does not collect information on occupation and keeps only patients' most recent address. However, when we compared addresses used in a prior study35 with addresses used in this study (39 months apart), 79.8% of patients were at the same address, and an additional 7.4% and 7.6% were less than 3.2 km and 3.2-16 km, respectively, from their prior address, indicating little residential mobility. The EHR collects data only on events that occur at Geisinger facilities, and ambulances take patients to the closest hospital, which may not be a Geisinger facility, so we may have under-counted events. We were unable to distinguish asthma exacerbations that led to hospitalization from those that occurred during a hospitalization. We frequency-matched cases and controls on year because UNGD activity metrics and year were highly correlated. Because of this high correlation, we did not include year in the final model, so residual confounding by unmeasured factors that vary strongly by year remains possible. We kept all four UNGD metrics because of a priori evidence that exposures differed by phase, but because the metrics were highly correlated, we were unable to definitively distinguish among them. Furthermore, our UNGD metrics do not provide insight into the mechanism of the associations we observed.

Asthma is a common disease with large individual and societal burdens, so the possibility that UNGD may increase risk for asthma exacerbations requires public health attention. As ours is the first study of UNGD and objective respiratory outcomes, and several other health outcomes have not been investigated to date, there is an urgent need for more health studies. These should include more detailed exposure assessment to better characterize pathways and identify the phases of development that present the most risk.


3.6 References 1. Akinbami LJ, Bailey CM, Johnson CA, et al. National surveillance of asthma: United

States, 2001-2010. National Center for Health Statistics, Vital Health Stat. 2012;3:35.

2. National Heart, Lung, and Blood Institute. National Asthma Education Program.

Expert Panel on the Management of Asthma. Expert panel report 3: Guidelines for the

diagnosis and management of asthma: Full report. US Department of Health and Human

Services, National Institutes of Health, National Heart, Lung, and Blood Institute; 2007.

3. Dougherty R, Fahy JV. Acute exacerbations of asthma: Epidemiology, biology and the exacerbation Clinical‐prone phenotype.& Experimental Allergy. 2009;39(2):193-202.

4. Guarnieri M, Balmes JR. Outdoor air pollution and asthma. The Lancet.

2014;383(9928):1581-1592.

5. Gent JF, Triche EW, Holford TR, et al. Association of low-level ozone and fine particles with respiratory symptoms in children with asthma. JAMA. 2003;290(14):1859-

1867.

6. Peel JL, Tolbert PE, Klein M, et al. Ambient air pollution and respiratory emergency department visits. Epidemiology. 2005;16(2):164-174.

7. Ostro B, Lipsett M, Mann J, Braxton-Owens H, White M. Air pollution and exacerbation of asthma in African-American children in Los Angeles. Epidemiology.

2001;12(2):200-208.

8. Dominici F, Peng RD, Bell ML, et al. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. JAMA. 2006;295(10):1127-1134.

9. Ko F, Tam W, Wong T, et al. Effects of air pollution on asthma hospitalization rates in different age groups in Hong Kong. Clinical & Experimental Allergy. 2007;37(9):1312-

1319.

127

10. Schildcrout JS, Sheppard L, Lumley T, Slaughter JC, Koenig JQ, Shapiro GG. Ambient air pollution and asthma exacerbations in children: An eight-city analysis. Am J Epidemiol. 2006;164(6):505-517.

11. Halonen JI, Lanki T, Yli-Tuomi T, Kulmala M, Tiittanen P, Pekkanen J. Urban air pollution, and asthma and COPD hospital emergency room visits. Thorax. 2008;63(7):635-641.

12. Yonas MA, Lange NE, Celedón JC. Psychosocial stress and asthma morbidity. Curr Opin Allergy Clin Immunol. 2012;12(2):202-210.

13. Chen E, Miller GE. Stress and inflammation in exacerbations of asthma. Brain Behav Immun. 2007;21(8):993-999.

14. Wisnivesky JP, Lorenzo J, Feldman JM, Leventhal H, Halm EA. The relationship between perceived stress and morbidity among adult inner-city asthmatics. J Asthma. 2010;47(1):100-104.

15. New York State Department of Health Completes Review of High-volume Hydraulic Fracturing [press release]. Albany, NY: New York State Department of Environmental Conservation, December 17, 2014.

16. Cox E. Assembly votes to ban fracking for two years. Baltimore Sun. April 10, 2015.

17. Werner AK, Vink S, Watt K, Jagals P. Environmental health impacts of unconventional natural gas development: A review of the current strength of evidence. Sci Total Environ. 2015;505:1127-1141.

18. Mitka M. Rigorous evidence slim for determining health risks from natural gas fracking. JAMA. 2012;307(20).

19. Gaines M. PennDOT’s posting and bonding program and impact of unconventional oil & gas [webinar]. http://extension.psu.edu/natural-resources/natural-gas/webinars/shale-energy-developments-effect-on-the-posting-bonding-and-maintenance-of-roads-in-rural-pa/mark-gaines-may-16-2013-powerpoint. Published May 16, 2013.

20. Maloney KO, Yoxtheimer DA. Production and disposal of waste materials from gas and oil extraction from the Marcellus shale play in Pennsylvania. Environ Pract. 2012;14(4):278-287.

21. Pennsylvania Code. Subchapter E. Well reporting § 78.121-§ 78.125. http://www.pacode.com/secure/data/025/chapter78/subchapEtoc.html.

22. Roy AA, Adams PJ, Robinson AL. Air pollutant emissions from the development, production, and processing of Marcellus shale natural gas. J Air Waste Manag Assoc. 2013;64(1):19-37.

23. Litovitz A, Curtright A, Abramzon S, Burger N, Samaras C. Estimation of regional air-quality damages from Marcellus shale natural gas extraction in Pennsylvania. Environ Res Lett. 2013;8(1):014017.

24. McKenzie LM, Witter RZ, Newman LS, Adgate JL. Human health risk assessment of air emissions from development of unconventional natural gas resources. Sci Total Environ. 2012;424:79-87.

25. Sangaramoorthy T, Jamison AM, Boyle MD, et al. Place-based perceptions of the impacts of fracking along the Marcellus shale. Soc Sci Med. 2016.

26. Adgate JL, Goldstein BD, McKenzie LM. Potential public health hazards, exposures and health effects from unconventional natural gas development. Environ Sci Technol. 2014;48(15):8307-8320.

27. Vinciguerra T, Yao S, Dadzie J, et al. Regional air quality impacts of hydraulic fracturing and shale natural gas activity: Evidence from ambient VOC observations. Atmos Environ. 2015;110:144-150.


28. Gopalakrishnan S, Klaiber HA. Is the shale energy boom a bust for nearby residents? Evidence from housing values in Pennsylvania. Am J Agric Econ. 2014;96(1):43-66.

29. Muehlenbachs L, Spiller E, Timmins C. The housing market impacts of shale gas development. Am Econ Rev. 2015;105(12):3633-3659.

30. Brunekreef B, Holgate ST. Air pollution and health. Lancet. 2002;360:1233-1242.

31. Salam MT, Islam T, Gilliland FD. Recent evidence for adverse effects of residential proximity to traffic sources on asthma. Curr Opin Pulm Med. 2008;14(1):3-8.

32. Hanson MD, Chen E. Brief report: The temporal relationships between sleep, cortisol, and lung functioning in youth with asthma. J Pediatr Psychol. 2007;33(3):312-316.

33. Daniel LC, Boergers J, Kopel SJ, Koinis-Mitchell D. Missed sleep and asthma morbidity in urban children. Ann Allergy Asthma Immunol. 2012;109(1):41-46.

34. Griswold SK, Nordstrom CR, Clark S, Gaeta TJ, Price ML, Camargo CA. Asthma exacerbations in North American adults: Who are the “frequent fliers” in the emergency department? Chest. 2005;127(5):1579-1586.

35. Casey JA, Cosgrove SE, Stewart WF, Pollak J, Schwartz BS. A population-based study of the epidemiology and clinical features of methicillin-resistant Staphylococcus aureus infection in Pennsylvania, 2001-2010. Epidemiol Infect. 2013;141(6):1166-1179.

36. Pacheco JA, Avila PC, Thompson JA, et al. A highly specific algorithm for identifying asthma cases and controls for genome-wide association studies. AMIA Annu Symp Proc. 2009;2009:497-501.

37. Schwartz BS, Stewart WF, Godby S, et al. Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med. 2011;41(4):e17-e28.


38. U.S. Department of Transportation Federal Highway Administration. Highway Performance Monitoring System website. http://www.fhwa.dot.gov/policyinformation/hpms/shapefiles.cfm. Updated 2013. Accessed March 27, 2015.

39. Liu AY, Curriero FC, Glass TA, Stewart WF, Schwartz BS. The contextual influence of coal abandoned mine lands in communities and type 2 diabetes in Pennsylvania. Health Place. 2013.

40. National Climatic Data Center. Climate Data Online website. http://www.ncdc.noaa.gov/cdo-web/. Accessed May 11, 2011.

41. SkyTruth. TADPOLE Pennsylvania results. http://frack.skytruth.org/frackfinder/frackfinder-news/tadpolepennsylvaniaresults. Published Feb 12, 2014. Updated 2014. Accessed June 30, 2014.

42. SkyTruth. Fracking chemical database. http://frack.skytruth.org/fracking-chemical-database. Updated 2013. Accessed November 27, 2013.

43. Ogden CL, Carroll MD, Kit BK, Flegal KM. Prevalence of childhood and adult obesity in the United States, 2011-2012. JAMA. 2014;311(8):806.

44. Lipsitch M, Tchetgen Tchetgen E, Cohen T. Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology. 2010;21(3):383-388.

45. VanderWeele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011;22(1):42-52.

46. Strickland MJ, Darrow LA, Klein M, et al. Short-term associations between ambient air pollutants and pediatric asthma emergency department visits. Am J Respir Crit Care Med. 2010;182(3):307-316.


47. Sandberg S, Jarvenpaa S, Penttinen A, Paton JY, McCann DC. Asthma exacerbations in children immediately following stressful life events: A Cox's hierarchical regression. Thorax. 2004;59(12):1046-1051.

48. Apter AJ, Garcia LA, Boyd RC, Wang X, Bogen DK, Ten Have T. Exposure to community violence is associated with asthma hospitalizations and emergency department visits. J Allergy Clin Immunol. 2010;126(3):552-557.

49. McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environ Health Perspect. 2014.

50. Rabinowitz PM, Slizovskiy IB, Lamers V, et al. Proximity to natural gas wells and reported health status: Results of a household survey in Washington counties, Pennsylvania. Environ Health Perspect. 2014.


Chapter 4: Associations of unconventional natural gas development with disordered sleep and depression symptoms in Pennsylvania

4.0 Cover Page

Sara G. Rasmussen, MHS¹; Holly C. Wilcox, PhD²; Annemarie G. Hirsch³; Jonathan Pollak¹; Brian S. Schwartz, MD, MS¹,³,⁴

¹Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA; ²Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA; ³Department of Epidemiology and Health Services Research, Geisinger Health System, Danville, Pennsylvania, USA; ⁴Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA

Acknowledgements: We thank Joseph J. DeWalle, BS (Geisinger Health System) for patient geocoding; Aaron Tustin (JHSPH) for assistance with survey weights; and Karen Bandeen-Roche, PhD (JHSPH) for assistance interpreting the mediation analysis results. This research was funded by National Institutes of Health U19 AI106683 (PI Robert Schleimer), R21 ES023675 (PI Brian Schwartz), and training grant ES07141 (Sara Rasmussen); the Degenstein Foundation; and the National Science Foundation Integrative Graduate Education and Research Traineeship (Sara Rasmussen). No funders had input into the study design, conduct, data collection or analysis, or manuscript preparation. The authors declare they have no actual or potential competing financial interests. Dr. Schwartz is a Fellow of the Post Carbon Institute (PCI), serving as an informal advisor on climate, energy, and health issues. He receives no payment for this role. His research is entirely independent of PCI, and is not motivated, reviewed, or funded by PCI.


4.1 Abstract

Background: Social and environmental factors are associated with depression. Unconventional natural gas development (UNGD) has community and environmental impacts.

Objectives: In this study, we evaluated the association of UNGD with depression symptoms and disordered sleep diagnoses. There are no prior studies of UNGD and mental health or sleep outcomes.

Methods: We identified depression symptoms among 4,762 adult primary care patients from Geisinger Clinic in Pennsylvania who responded to a questionnaire that included the eight-item Patient Health Questionnaire depression scale (PHQ-8). For these patients, we used electronic health records (EHR) to identify 3,868 disordered sleep diagnoses and frequency-matched these to control dates on age, sex, and year. We assigned each person (depression analysis) or each diagnosis or control date (sleep analysis) a metric for residential UNGD (very low, low, medium, and high) that incorporated dates and durations of well development, distance from patient homes to wells, and well characteristics. We estimated associations of the residential UNGD metric with depression symptoms using negative binomial and multinomial logistic regression, and evaluated mediation by migraine and fatigue symptoms. We estimated associations of the residential UNGD metric with disordered sleep using generalized estimating equations. Models were weighted to account for sampling design and participation.

Results: High UNGD activity (vs. very low) was associated with a greater burden of depression symptoms (exponentiated coefficient = 1.18, 95% confidence interval [CI]: 1.04-1.34). We observed weak evidence of mediation by fatigue. UNGD was not associated with disordered sleep.


Conclusions: UNGD activity was associated with depression symptoms in adults in Pennsylvania.


4.2 Introduction

Unconventional natural gas development (UNGD) is a long-lasting industrial process with environmental and social impacts, including noise; light; vibration; truck traffic; air, water, and soil pollution; social disruption; changes in home prices; and stress and anxiety related to rapid industrial development.1-17 The process involves pad preparation, drilling, stimulation (“fracking”), and production, and requires development of both wells and associated infrastructure (e.g., pipelines, compressor stations).1 Pennsylvania has proceeded with UNGD rapidly: development began in the state in the mid-2000s, and by 2015, 9,669 wells had been drilled. UNGD has been associated with several health outcomes that have environmental and social risk factors,18-22 but to date no epidemiologic study has evaluated a mental health outcome. Here, we evaluated the association of UNGD with depression symptoms, measured by the Patient Health Questionnaire depression scale (PHQ-8), which is used in both epidemiologic studies and clinical settings.23 Depression is a symptom-based condition defined by hopelessness, helplessness, sad or irritable mood, loss of interest in activities, and fatigue.24

This study was conducted in a sample of adults from the Geisinger Clinic,25 which has had an electronic health record (EHR) since the 2000s and is located in central and northeastern Pennsylvania, a region with a range of UNGD activity. A study of the patient sample reported herein showed associations between UNGD and nasal and sinus, migraine, and fatigue symptoms,26 and a study of Geisinger patients with asthma showed an association of UNGD with asthma exacerbations (using objectively documented events from the EHR).27 We hypothesized that these associations could be related to air pollution and/or stress pathways. Here, we studied depression because it can also be affected by stress and air pollution, is common, and has significant public health and economic costs.23,24,28-34 We explored effect modification of the UNGD-depression symptom association by antidepressant use, and mediation of the association by self-reported fatigue and migraine symptoms and by disordered sleep diagnoses from the EHR (Figure 4.2), each of which can be comorbid with depression symptoms.35-39 Many aspects of UNGD, such as noise, light, and vibration, could affect sleep. Disordered sleep is one cause of fatigue, but fatigue also has several causes independent of disordered sleep.40 Because the association of UNGD with disordered sleep diagnoses has not been previously studied, we also directly evaluated this association.


Figure 4.2. Relationships among UNGD and moderating, mediating, and outcome variables.

The baseline questionnaire was mailed in April 2014 and the follow-up questionnaire in October 2014. The dashed lines identify the associations evaluated in this study; the dotted line identifies an association that could not be evaluated because too few asthma exacerbation events were available in EHR data at the time of this analysis; and the solid lines identify associations evaluated in prior studies (1 = Rasmussen et al. 2016, 2 = Tustin et al. 2016). Abbreviations: UNGD = unconventional natural gas development; EHR = electronic health record.

4.3 Methods

4.3.1 Survey design and study population

Depression symptoms were ascertained in a questionnaire that was designed to study nasal and sinus symptoms (methods described previously25,26). The survey design oversampled racial/ethnic minorities and people more likely to have nasal and sinus symptoms. Briefly, in April 2014, a baseline questionnaire that included questions on migraine headache and fatigue symptoms was sent to 23,700 adults 18 years of age and older, of whom 7,847 responded (response rate = 33.1%). Six months later, a follow-up questionnaire, which included questions on depression symptoms, was sent to all responders to the baseline questionnaire. Of the 7,847 study subjects who received the follow-up questionnaire, 4,966 returned it (response rate = 63.3%). After excluding respondents who lived outside Pennsylvania (n = 34), the analysis included 4,932 participants. Returned follow-up questionnaires were received between November 4, 2014 and May 14, 2015 (median date, November 12, 2014).

The study was approved by the Geisinger Institutional Review Board (with an IRB authorization agreement with Johns Hopkins Bloomberg School of Public Health). Consent was implied if the participant returned the mailed questionnaire.

4.3.2 Outcome ascertainment

4.3.2.1 Depression symptoms

The follow-up questionnaire included the PHQ-8. Each of its eight questions has the response options “not at all,” “several days,” “more than half the days,” and “nearly every day,” scored 0 to 3, respectively. For participants who answered all eight questions, the total score was the sum of the eight item scores.23 To include as many participants as possible, for those who answered fewer than eight questions we calculated the total score as a pro-rated sum: (sum of answered questions × 8)/(number of questions answered). We used the PHQ-8 depression severity categories but combined the two most severe groups because few participants had a “severe” total score. Scores were categorized as 0 to <5, no significant depression symptoms; 5 to <10, mild depression symptoms; 10 to <15, moderate depression symptoms; and 15 to 24, moderately severe/severe depression symptoms.23
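The PHQ-8 scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, unanswered items are assumed to be coded as `None`, and a participant with no answered items returns `None` (such participants were excluded from the analysis).

```python
from typing import Optional, Sequence


def phq8_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the (pro-rated) PHQ-8 total score.

    `responses` holds the eight item scores (0-3); unanswered items are None.
    """
    if len(responses) != 8:
        raise ValueError("the PHQ-8 has exactly eight items")
    answered = [r for r in responses if r is not None]
    if not answered:
        return None  # no items answered: cannot be scored
    if any(r not in (0, 1, 2, 3) for r in answered):
        raise ValueError("item scores must be 0, 1, 2, or 3")
    # Pro-rated sum: (sum of answered items x 8) / (number answered).
    # When all eight items are answered this equals the plain sum.
    return sum(answered) * 8 / len(answered)


def phq8_group(score: float) -> str:
    """Map a total score to the collapsed PHQ-8 severity groups."""
    if score < 5:
        return "none"
    if score < 10:
        return "mild"
    if score < 15:
        return "moderate"
    return "moderately severe/severe"


# A participant who answered three items, each "several days" (1):
print(phq8_total([1, 1, 1, None, None, None, None, None]))  # 8.0
print(phq8_group(8.0))  # mild
```

Because pro-rated totals need not be integers, the group boundaries use strict `<` comparisons, matching the "0 to <5" style of the cut points in the text.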

4.3.2.2 Disordered sleep diagnoses


Disordered sleep diagnoses (case-events) among the study population were identified in Geisinger’s EHR. We identified encounters (98% outpatient) that were accompanied by ICD-9 codes for disordered sleep (Table 4.3.2.2).41 We also identified orders for disordered sleep medications using the drug class “hypnotics” together with drug subclass and name. We included all medications in the drug subclasses antihistamine hypnotics, selective melatonin receptor agonists, hypnotics – tricyclic agents, and orexin receptor antagonists; in the subclass non-barbiturate hypnotics, we included all medications except midazolam hydrochloride. Either an appropriate medication order or an encounter with an appropriate ICD-9 code counted as a disordered sleep outcome. We excluded disordered sleep outcomes from before 2009 (when UNGD activity was low), retained only disordered sleep diagnoses from when the participant was 18 years of age or older, and randomly selected one disordered sleep diagnosis per participant per year so that study subjects with many encounters for sleep disorders would not contribute unduly (Figure 4.3.2.2).
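The event-selection steps just described (drop events before 2009, keep adult events, then keep one randomly chosen event per participant per year) can be sketched as follows. This is an illustrative sketch only: the input format, the function names, and the reading of "per year" as calendar year are assumptions, and the seeded random generator simply makes the draw reproducible.

```python
import random
from collections import defaultdict
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def select_sleep_events(events, birth_dates, first_year=2009, seed=0):
    """Keep one randomly chosen disordered sleep event per participant per year.

    events: iterable of (participant_id, event_date) pairs, one per qualifying
        ICD-9 encounter or medication order (hypothetical format).
    birth_dates: dict mapping participant_id -> date of birth.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    pools = defaultdict(list)
    for pid, when in events:
        if when.year < first_year:
            continue  # drop events before 2009, when UNGD activity was low
        if age_on(birth_dates[pid], when) < 18:
            continue  # keep adult events only
        pools[(pid, when.year)].append(when)
    # One event per (participant, calendar year) stratum.
    return sorted((pid, rng.choice(dates)) for (pid, _year), dates in pools.items())
```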

Table 4.3.2.2. ICD-9 codes used to identify disordered sleep.

ICD-9 code | Description
780.52 | Insomnia
780.50 | Sleep disturbance, unspecified
307.47 | Other dysfunctions of sleep stages or arousal from sleep
780.59 | Other sleep disturbances
307.42 | Persistent disorder of initiating or maintaining sleep
780.5 | Sleep disturbances
307.41 | Transient disorder of initiating or maintaining sleep
307.40 | Nonorganic sleep disorder, unspecified
307.48 | Repetitive intrusions of sleep
780.56 | Dysfunctions associated with sleep stages or arousal from sleep
780.55 | Disruptions of 24-hour sleep-wake cycle

Abbreviation: ICD-9 = International Classification of Diseases, 9th Revision, Clinical Modification


Figure 4.3.2.2. Flow diagram for identification of disordered sleep diagnoses. Disordered sleep events were identified using medications and ICD-9 codes from encounters (98% outpatient). In this figure, the numbers refer to counts of medications for disordered sleep and encounters with ICD-9 codes for disordered sleep.

Control dates were frequency-matched to cases on age category (18-44, 45-61, 62-74, 75+ years), sex, and year. To construct the pool of control dates, we identified all of each participant’s dates of contact with the health system (e.g., medications, inpatient and outpatient visits, procedures), excluded contact dates within one year of a disordered sleep diagnosis in participants who were cases, and randomly selected one encounter date per year per participant. Encounter dates, not patients, had to be the selection frame for this analysis because the UNGD activity metrics and many covariates were time-varying.
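A sketch of how the pool of candidate control dates described above could be built is shown below. It assumes contact dates and diagnosis dates are available per participant as Python `date` objects, reads "within one year" as at most 365 days, and treats "per year" as calendar year; all of these, and the function names, are illustrative assumptions. The frequency-matching step itself (drawing control dates so that their distribution across age category, sex, and year matches the cases) is not shown.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def age_category(age: int) -> str:
    """Age strata used for frequency matching."""
    if age < 18:
        raise ValueError("only adults were included")
    if age <= 44:
        return "18-44"
    if age <= 61:
        return "45-61"
    if age <= 74:
        return "62-74"
    return "75+"


def control_date_frame(contact_dates, case_dates, seed=0):
    """Pick one candidate control date per participant per calendar year.

    contact_dates: dict participant_id -> list of health-system contact dates.
    case_dates: dict participant_id -> list of that participant's disordered
        sleep diagnosis dates (absent for participants who were never cases).
    """
    rng = random.Random(seed)
    one_year = timedelta(days=365)  # "within one year" taken as <= 365 days
    pools = defaultdict(list)
    for pid, dates in contact_dates.items():
        diagnoses = case_dates.get(pid, [])
        for when in dates:
            if any(abs(when - dx) <= one_year for dx in diagnoses):
                continue  # too close to this participant's own diagnosis
            pools[(pid, when.year)].append(when)
    return {key: rng.choice(dates) for key, dates in sorted(pools.items())}
```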

4.3.3 Potential mediating variables: migraine and fatigue symptoms

From the baseline questionnaire, we defined migraine headache and fatigue symptom score groups as previously described.26 We used the validated scoring method of the ID Migraine questionnaire, which covers the past twelve months, to distinguish participants with migraine headaches from those without.42 We used the Patient-Reported Outcomes Measurement Information System (PROMIS) fatigue short form 8a to create fatigue symptom score groups.43 For participants who answered all eight questions, we summed the responses, which ranged from “not at all” (1) to “very much” (5). For participants who answered between four and seven questions, we assigned an adjusted score: (sum of answered questions × 8)/(number of questions answered).43 We considered participants in the highest quartile of fatigue scores to have severe fatigue and compared them with those in the bottom three quartiles.
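The fatigue scoring and grouping above can be sketched as follows. This is a minimal, hypothetical illustration: unanswered items are assumed to be `None`, the quartile cut point is taken from the unweighted sample with NumPy's default linear interpolation, and scores exactly at the 75th percentile are assumed to fall in the lower group. The text does not state how survey weights or ties at the cut point were handled.

```python
import numpy as np


def promis_fatigue_raw(responses):
    """PROMIS Fatigue 8a raw score from items coded 1-5 (None = unanswered).

    Returns None when fewer than four items were answered.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < 4:
        return None
    # Pro-rate partial responses to the 8-item scale; this equals the plain
    # sum when all eight items were answered.
    return sum(answered) * 8 / len(answered)


def severe_fatigue(scores):
    """Flag scores above the 75th percentile (highest quartile) as severe."""
    arr = np.asarray(scores, dtype=float)
    cutoff = np.quantile(arr, 0.75)
    return arr > cutoff
```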

4.3.4 Well data and activity metric assignment

Well data were compiled from the Pennsylvania Department of Environmental Protection, the Pennsylvania Department of Conservation and Natural Resources, and SkyTruth, as described previously.18,26,27,44 For all unconventional natural gas wells in Pennsylvania from 2005 to 2015, the data included latitude and longitude; well pad; dates of drilling, stimulation, and production; total depth; and volume of natural gas produced.

We assigned UNGD activity for the four phases of well development (pad preparation, drilling, stimulation, and production) to each study subject (in the depression symptom analysis) or index date (in the disordered sleep analysis) using metrics that incorporated distances from participant residences to wells and the density and size of wells, as in prior studies.18,26,27 For each phase of well development, the metric was assigned using Equation 4.3.4.

Equation 4.3.4. Activity metric.

$$\text{Activity}_j = \sum_{d=-T}^{-1} \sum_{i=1}^{n} \frac{s_i}{d_{ij}^{2}}$$

In Equation 4.3.4, the outer sum ran over the T days of the exposure window (indexed d = -T, ..., -1), n was the number of wells in the given phase on day d, $d_{ij}^{2}$ was the squared distance (m²) between well i and participant j, and $s_i$ was 1 for the pad preparation and drilling phases, the total well depth (meters) of well i for the stimulation phase, and the daily natural gas production volume (m³) of well i for the production phase.
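Equation 4.3.4 can be computed for one phase and one participant with a few lines of Python. In this sketch, `WellOnDay` is a hypothetical record type, and well and residence locations are assumed to be already projected to planar coordinates in meters (the text reports only latitude and longitude, so the projection step is an assumption and is not shown).

```python
from dataclasses import dataclass


@dataclass
class WellOnDay:
    """One well in a given phase on one day of the exposure window (hypothetical)."""
    x_m: float    # projected easting of the well, meters (assumed planar coordinates)
    y_m: float    # projected northing of the well, meters
    scale: float  # s_i: 1 (pad preparation, drilling), depth in m (stimulation),
                  # or daily production volume in m^3 (production)


def phase_activity(home_xy, window):
    """Sum s_i / d_ij^2 over all wells in this phase on each day of the window.

    home_xy: (x, y) of participant j's residence, meters.
    window: list with one entry per day (d = -T, ..., -1); each entry is the
        list of WellOnDay records for wells in this phase on that day.
    """
    hx, hy = home_xy
    total = 0.0
    for wells_on_day in window:
        for w in wells_on_day:
            d2 = (w.x_m - hx) ** 2 + (w.y_m - hy) ** 2  # squared distance, m^2
            if d2 == 0:
                raise ValueError("well located at the residence; distance must be > 0")
            total += w.scale / d2
    return total
```

The inverse-squared-distance term means that a well 1 km away contributes four times as much as an otherwise identical well 2 km away.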

For the depression symptom analysis, for each phase of development, the metric was summed over the 14 days before the date the follow-up questionnaire was returned (d is negative in the formula above because it indexes days before the survey was returned). We chose this 14-day window because the PHQ-8 asks about depression symptoms over the past two weeks. In the analysis evaluating whether fatigue or migraine symptoms mediated the UNGD-depression symptom association, we assigned the UNGD metric for the three months before baseline questionnaire return, because we had previously observed that UNGD activity summed over three months was associated with migraine and fatigue symptoms at baseline.26 For the disordered sleep analysis, UNGD activity was summed over the three months before the index date (sleep disorder diagnosis or control date). In all analyses, we z-transformed the activity metrics for each of the four phases of development, summed the transformed values, and divided the sums into quartiles to create a composite UNGD metric with very low, low, medium, and high activity groups.
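The composite metric (z-transform each phase, sum, then split into quartiles) can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions: unweighted means and standard deviations are used, a phase with no variation is assumed to contribute zero, and a value equal to a quartile cut point is assigned to the lower group. The choice of `ddof` does not affect the groups, because it rescales every phase by the same factor.

```python
import numpy as np

LABELS = np.array(["very low", "low", "medium", "high"])


def composite_ungd(phase_metrics):
    """Composite UNGD metric and activity group for each participant.

    phase_metrics: array-like, shape (n_participants, 4), one column per phase
    (pad preparation, drilling, stimulation, production).
    """
    m = np.asarray(phase_metrics, dtype=float)
    sd = m.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # a phase with no variation contributes z = 0 everywhere
    z = (m - m.mean(axis=0)) / sd  # z-transform each phase across participants
    composite = z.sum(axis=1)
    cuts = np.quantile(composite, [0.25, 0.5, 0.75])
    # right=True: a value equal to a cut point falls in the lower group
    groups = LABELS[np.digitize(composite, cuts, right=True)]
    return composite, groups
```

With many participants far from any well, several phase metrics can be tied at zero; in that case the four groups need not have equal sizes.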

4.3.5 Covariates

Using the EHR, we created covariates for race/ethnicity; sex; Medical Assistance, a means-tested program that was used as a surrogate for family socioeconomic status; age; smoking and alcohol status; body mass index; and antidepressant medication use in the month before survey return, identified using drug group, class, subclass, and name.45 Time-varying covariates (all but race/ethnicity and sex) were assigned using data from before the date of survey return (for the depression symptom analysis) or before the diagnosis or control date (for the disordered sleep analysis). Using previously described methods,46 we geocoded study subjects to their residential address in the EHR: 89.13% to street address, 3.14% to ZIP+4, and 7.73% to ZIP code centroid. Using participants’ geocoded coordinates, we assigned people to a community using a mixed definition of place (township, borough, or census tract in cities), calculated community socioeconomic deprivation for each community,46,47 and created a covariate for water source (municipal water or well water) using the locations of public water service areas from the Pennsylvania Department of Environmental Protection.48

4.3.6 Statistical analysis

All models were weighted to account for the stratified survey sampling design, the response rate to the baseline questionnaire, and loss to follow-up from the baseline to the follow-up questionnaire (Table 4.3.6). Because one weight was much larger than the others, we truncated the largest weight to the next largest for our primary analyses.49 To build models, we first included the composite UNGD activity index (very low, low, medium, high), and then added, one at a time, race/ethnicity (white non-Hispanic, black non-Hispanic, Hispanic), sex (male, female), Medical Assistance (no, yes), age (years), smoking status (never, former, current), alcohol status (yes, heavy [based on the Centers for Disease Control and Prevention definition of heavy drinking as 8 or more drinks per week for females and 15 or more drinks per week for males50]; yes, not heavy; no), body mass index (BMI, kg/m²), community socioeconomic deprivation, and water source (municipal water, well water). We centered the continuous covariates (age, BMI, and community socioeconomic deprivation) and included them as linear and quadratic terms to allow for non-linearity. We used a two-sided type I error rate of 0.05 for significance testing. We used Stata version 11.2 (StataCorp Inc.) and R version 3.2.2 (R Foundation for Statistical Computing) for analysis.
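Two of the covariate-construction steps above, the three-level alcohol variable and the centered linear and quadratic terms, are sketched below. The input representation (a weekly drink count and a sex string) is a hypothetical simplification, since the text does not describe how alcohol use is recorded in the EHR, and centering at the unweighted sample mean is likewise an assumption.

```python
def alcohol_status(drinks_per_week: float, sex: str) -> str:
    """Three-level alcohol covariate using the CDC heavy-drinking thresholds."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if drinks_per_week <= 0:
        return "no"
    heavy_threshold = 8 if sex == "female" else 15  # drinks per week
    return "yes, heavy" if drinks_per_week >= heavy_threshold else "yes, not heavy"


def centered_linear_quadratic(values):
    """Center a continuous covariate at its mean; return linear and squared terms."""
    mean = sum(values) / len(values)
    linear = [v - mean for v in values]
    return linear, [c * c for c in linear]
```

Centering before squaring keeps the linear and quadratic terms less correlated and makes the intercept refer to a participant at the mean of each continuous covariate.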


Table 4.3.6. Calculation of sample weights.

Race/ethnicity | Higher likelihood of CRS | Intermediate likelihood of CRS | Lower likelihood of CRS
White non-Hispanic
  Identified using EHR | 13,132 | 47,892 | 131,366
  Received baseline survey | 12,209 | 4,224 | 2,775
  Responded to baseline survey | 4,691 | 1,481 | 871
  Responded to follow-up survey | 3,076 | 950 | 551
  Sample weight | 4.27 | 50.41 | 238.41
Black non-Hispanic
  Identified using EHR | 170 | 991 | 2,832
  Received baseline survey | 159 | 903 | 1,109
  Responded to baseline survey | 35 | 144 | 155
  Responded to follow-up survey | 17 | 67 | 62
  Sample weight | 10.00 | 14.79 | 45.68
Hispanic
  Identified using EHR | 192 | 1,035 | 3,159
  Received baseline survey | 181 | 966 | 1,174
  Responded to baseline survey | 35 | 206 | 167
  Responded to follow-up survey | 19 | 92 | 98
  Sample weight | 10.11 | 11.25 | 32.23

Abbreviations: EHR = electronic health record; CRS = chronic rhinosinusitis
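The sample weights in Table 4.3.6 are consistent with a simple rule: in each race/ethnicity and CRS-likelihood stratum, the weight equals the number of people identified in the EHR divided by the number who responded to the follow-up survey (e.g., 13,132 / 3,076 ≈ 4.27). That ratio is the inverse of the product of the sampling fraction, the baseline response rate, and follow-up retention. The sketch below reproduces the tabulated weights from the table counts and applies the truncation described in Section 4.3.6 (largest weight set to the next largest). The dictionary layout and function names are our own, and the weights used in the fitted models may have been computed differently.

```python
# Counts from Table 4.3.6: (identified using EHR, responded to follow-up survey)
TABLE_4_3_6 = {
    ("White non-Hispanic", "higher"): (13_132, 3_076),
    ("White non-Hispanic", "intermediate"): (47_892, 950),
    ("White non-Hispanic", "lower"): (131_366, 551),
    ("Black non-Hispanic", "higher"): (170, 17),
    ("Black non-Hispanic", "intermediate"): (991, 67),
    ("Black non-Hispanic", "lower"): (2_832, 62),
    ("Hispanic", "higher"): (192, 19),
    ("Hispanic", "intermediate"): (1_035, 92),
    ("Hispanic", "lower"): (3_159, 98),
}


def sample_weights(counts):
    """Weight = people identified in the EHR / follow-up responders, per stratum."""
    return {stratum: identified / responders
            for stratum, (identified, responders) in counts.items()}


def truncate_largest(weights):
    """Set the largest weight value equal to the next-largest value."""
    distinct = sorted(set(weights.values()), reverse=True)
    if len(distinct) < 2:
        return dict(weights)
    largest, runner_up = distinct[0], distinct[1]
    return {k: (runner_up if w == largest else w) for k, w in weights.items()}


full = sample_weights(TABLE_4_3_6)
primary = truncate_largest(full)
print(round(full[("White non-Hispanic", "lower")], 2))     # 238.41
print(round(primary[("White non-Hispanic", "lower")], 2))  # 50.41
```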


We evaluated the association of UNGD with depression symptoms in two ways: using multinomial logistic regression and using negative binomial regression. We used multinomial logistic regression to evaluate the association of UNGD with each level of depression symptoms separately because we hypothesized that UNGD would be differentially associated with depression symptom severity. We fit multinomial logistic models to estimate the association of the UNGD activity metric with each level of depression symptoms (mild, moderate, moderately severe/severe) compared with no depression symptoms (base outcome). We also evaluated the association of UNGD with depression symptoms using negative binomial regression. Negative binomial regression models the PHQ-8 total score as a count-type outcome, which allowed us to evaluate associations between UNGD and the overall burden of depression symptoms rather than with the screening-tool cutoffs.51
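Written out in our own notation (a sketch, not the authors' specification), the two models take the standard forms below. Here $Y_j$ is participant j's depression symptom category, $S_j$ is the PHQ-8 total score, $\text{Low}_j$, $\text{Medium}_j$, and $\text{High}_j$ are indicators for the UNGD activity groups (very low is the reference), and $\mathbf{z}_j$ holds the adjustment covariates. The log link and the quadratic (NB2) variance function are the usual defaults for negative binomial regression and are assumed here; the survey weights enter estimation but are not shown.

$$\log\frac{\Pr(Y_j = k)}{\Pr(Y_j = \text{none})} = \alpha_k + \beta_{k,\text{low}}\,\text{Low}_j + \beta_{k,\text{med}}\,\text{Medium}_j + \beta_{k,\text{high}}\,\text{High}_j + \boldsymbol{\gamma}_k^{\top}\mathbf{z}_j, \quad k \in \{\text{mild},\ \text{moderate},\ \text{moderately severe/severe}\}$$

$$\log \mu_j = \alpha + \beta_{\text{low}}\,\text{Low}_j + \beta_{\text{med}}\,\text{Medium}_j + \beta_{\text{high}}\,\text{High}_j + \boldsymbol{\gamma}^{\top}\mathbf{z}_j, \qquad \mu_j = \mathbb{E}[S_j], \quad \operatorname{Var}(S_j) = \mu_j + \kappa\,\mu_j^{2}$$

In the multinomial model, $e^{\beta_{k,\text{high}}}$ is the odds ratio for category k versus no depression symptoms. In the negative binomial model, $e^{\beta_{\text{high}}}$ is the ratio of expected PHQ-8 scores for the high versus very low group, so an exponentiated coefficient of 1.18 corresponds to an 18% higher expected score.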

We hypothesized that participants not taking antidepressants might be more susceptible to potential stressors such as UNGD. To test this hypothesis, we evaluated effect modification by antidepressant use by adding cross-products of the UNGD indicator variables and antidepressant medication use to the final multinomial logistic and negative binomial models, and used a Wald test to evaluate the joint significance of the cross-products. We evaluated whether migraine or fatigue symptoms, measured on the baseline questionnaire, mediated the associations between UNGD and depression symptoms (measured on the follow-up questionnaire) by adding each of these variables to the negative binomial model and comparing the UNGD effect estimates from models with and without the potential mediator.
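The with/without comparison above reduces to the percent change in the UNGD coefficient when the candidate mediator is added. A minimal sketch follows, assuming the change is computed on the coefficient (log) scale, which the text implies but does not state; the function names are hypothetical and the numbers in the usage line are illustrative, not study results.

```python
import math


def percent_decrease(beta_without: float, beta_with: float) -> float:
    """Percent decrease in an exposure coefficient after adding a candidate mediator."""
    if beta_without == 0:
        raise ValueError("the coefficient without the mediator must be non-zero")
    return 100.0 * (beta_without - beta_with) / beta_without


def percent_decrease_from_ratios(ratio_without: float, ratio_with: float) -> float:
    """Same comparison, starting from exponentiated coefficients (e.g., 1.20 and 1.18)."""
    return percent_decrease(math.log(ratio_without), math.log(ratio_with))


# Hypothetical coefficients, for illustration only:
print(round(percent_decrease(0.20, 0.18), 1))  # 10.0
```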

Because weighted models tend to be less biased but less precise, whereas unweighted models tend to be more precise but potentially more biased,52 we evaluated the influence of weighting. In a sensitivity analysis, we estimated associations of UNGD with depression symptoms among all subjects using the final multinomial logistic and negative binomial models without weights and with full weights (i.e., without truncating the largest weight to the second largest, as was done in the primary analysis).

To assess the association of the UNGD activity metrics with disordered sleep diagnoses, we fit a survey-weighted generalized estimating equations (GEE) model to account for multiple events within participants. We planned to evaluate whether disordered sleep mediated the UNGD-depression symptom association only if UNGD was associated with disordered sleep.

4.4 Results

4.4.1 Description of study population

Of the 4,932 subjects in the study, 170 did not answer any PHQ-8 questions, 2,976 had no significant depression symptoms, 1,075 had mild depression symptoms, 454 had moderate depression symptoms, and 257 had moderately severe or severe depression symptoms (Table 4.4.1). Participants with more severe depression symptoms, compared with those with no or less severe symptoms, were more likely to be female, to take antidepressants, and to have heavy alcohol use recorded in the EHR (all p < 0.01). Using EHR data, we identified 8,578 disordered sleep diagnoses among 1,699 of the 4,932 study subjects; the remaining 3,233 study subjects did not have disordered sleep diagnoses. After selecting one disordered sleep diagnosis per person per year, 3,868 disordered sleep diagnoses were included in the study (Figure 4.3.2.2). Participants with at least one disordered sleep diagnosis, compared with those with none, were more likely to be female and to be older at the time of survey return (both p < 0.05).


Table 4.4.1. Descriptive statistics by depression symptoms.

Variable | No significant depression symptoms | Mild | Moderate | Moderately severe / severe | Missing | pᵈ
Total numberᵃ, n (%ᵇ) | 2,976 (100) | 1,075 (100) | 454 (100) | 257 (100) | 170 (100) |
UNGDᶜ metric, n (%) | | | | | | 0.65
  Very low | 756 (25.4) | 259 (24.1) | 117 (25.8) | 62 (24.1) | 39 (22.9) |
  Low | 726 (24.4) | 285 (26.5) | 113 (24.9) | 66 (25.7) | 43 (25.3) |
  Medium | 776 (26.1) | 253 (23.5) | 101 (22.2) | 58 (22.6) | 45 (26.5) |
  High | 718 (24.1) | 278 (25.9) | 123 (27.1) | 71 (27.6) | 43 (25.3) |
Race, n (%) | | | | | | 0.11
  White | 2766 (92.9) | 1005 (93.5) | 420 (92.5) | 228 (88.7) | 158 (92.9) |
  Black | 88 (3.0) | 33 (3.1) | 14 (3.1) | 8 (3.1) | 3 (1.8) |
  Hispanic | 122 (4.1) | 37 (3.4) | 20 (4.4) | 21 (8.2) | 9 (5.3) |
Female, n (%) | 1829 (61.5) | 721 (67.1) | 294 (64.8) | 191 (74.3) | 87 (51.2) | < 0.01
Medical Assistance, n (%) | 138 (4.6) | 107 (10.0) | 80 (17.6) | 84 (32.7) | 12 (7.1) | < 0.01
Smoking status, n (%) | | | | | | < 0.01
  Never | 1774 (59.6) | 588 (54.7) | 233 (51.3) | 107 (41.6) | 83 (48.8) |
  Current | 278 (9.3) | 162 (15.1) | 81 (17.8) | 67 (26.1) | 18 (10.6) |
  Former | 924 (31.0) | 325 (30.2) | 140 (30.8) | 83 (32.3) | 69 (40.6) |
Community type, n (%) | | | | | | < 0.01
  Borough | 799 (26.8) | 284 (26.4) | 131 (28.9) | 80 (31.1) | 46 (27.1) |
  City | 202 (6.8) | 99 (9.2) | 45 (9.9) | 34 (13.2) | 9 (5.3) |
  Township | 1975 (66.4) | 692 (64.4) | 278 (61.2) | 143 (55.6) | 115 (67.6) |
Well water, n (%) | 1129 (37.9) | 410 (38.1) | 147 (32.4) | 66 (25.7) | 75 (44.1) | < 0.01
Alcohol status, n (%) | | | | | | < 0.01
  No | 1256 (42.2) | 431 (40.1) | 183 (40.3) | 121 (47.1) | 82 (48.2) |
  Yes, not heavy | 1505 (50.6) | 524 (48.7) | 191 (42.1) | 92 (35.8) | 78 (45.9) |
  Yes, heavy | 215 (7.2) | 120 (11.2) | 80 (17.6) | 44 (17.1) | 10 (5.9) |
On depression medication, n (%) | 601 (20.2) | 396 (36.8) | 213 (46.9) | 138 (53.7) | 43 (25.3) | < 0.01
Number of PHQ-8 questions missing, n (%) | | | | | | < 0.01
  0 | 2796 (94) | 977 (90.9) | 411 (90.5) | 235 (91.4) | 0 (0) |
  1-7 | 180 (6) | 98 (9.1) | 43 (9.5) | 22 (8.6) | 0 (0) |
  All 8 | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 170 (100) |
BMI, mean | 29.6 | 30.5 | 31.5 | 32.1 | 29.4 |

Abbreviation: UNGD = unconventional natural gas development
ᵃ The follow-up responders outside of Pennsylvania (n = 34) were excluded.
ᵇ Column percent.
ᶜ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to survey return.
ᵈ p-values from chi-squared tests of each covariate with the different levels of depression symptoms (no, mild, moderate, moderately severe / severe depression symptoms; missing).


4.4.2 Associations of UNGD with depression symptoms

The low and high groups of the UNGD activity index (vs. very low) were associated with mild depression symptoms (vs. none) in an adjusted multinomial logistic model (Table 4.4.2.1) and with the burden of depression symptoms in an adjusted negative binomial model (Table 4.4.2.2). The medium UNGD activity group (vs. very low) was not associated with depression symptoms in either model. When we added a cross-product term between UNGD and antidepressant medication use, the p-value from the Wald test of the cross-product was 0.14 in the multinomial logistic model and 0.12 in the negative binomial model, indicating no statistically significant effect modification by treatment. When we added fatigue or migraine to the negative binomial models of UNGD and depression symptoms, the exponentiated coefficient for high UNGD activity decreased by 9.4% and 3.4%, respectively (Table 4.4.2.3), providing weak evidence of mediation by fatigue symptoms of the association of UNGD with depression symptoms. In the sensitivity analysis of the influence of weighting, associations between UNGD and depression symptoms were stronger with full survey weights and largely null without survey weights (Tables 4.4.2.1 and 4.4.2.2).
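The effect-modification check above is a Wald test on the UNGD by antidepressant cross-product. With four UNGD activity groups the cross-product plausibly involves three coefficients, tested jointly as W = b'V^-1 b against a chi-squared distribution whose degrees of freedom equal the number of coefficients. The sketch below implements that generic test in plain Python; it is not the software used in the study, and the coefficient vector and covariance matrix in the example are hypothetical placeholders (in a survey analysis, V would be the design-based covariance from the fitted model).

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability, using closed forms for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 0.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        term = math.sqrt(x) if i == 1 else term * x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def wald_test(b: Sequence[float], v: List[List[float]]):
    """Joint Wald test of H0: every coefficient in b is zero. Returns (W, p)."""
    w = sum(bi * zi for bi, zi in zip(b, solve(v, list(b))))
    return w, chi2_sf(w, len(b))


# Hypothetical cross-product coefficients and covariance, not study estimates.
w, p = wald_test([0.10, 0.20], [[0.01, 0.0], [0.0, 0.04]])
print(round(w, 2), round(p, 3))  # 2.0 0.368
```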


Table 4.4.2.1. Association of UNGD and depression symptoms in survey multinomial logistic models (n = 4,762^a).

UNGD activity group^b | Mild depression symptoms^c,d, OR (95% CI) | Moderate depression symptoms^c,d, OR (95% CI) | Moderately severe/severe depression symptoms^c,d, OR (95% CI)
Truncated survey weights^e | | |
Low^f | 1.63 (1.21-2.19) | 1.22 (0.80-1.86) | 1.13 (0.61-2.06)
Medium | 1.25 (0.92-1.71) | 1.04 (0.68-1.60) | 0.89 (0.47-1.69)
High | 1.51 (1.12-2.04) | 1.26 (0.83-1.92) | 1.39 (0.76-2.54)
Full survey weights | | |
Low | 1.72 (1.14-2.59) | 1.20 (0.67-2.14) | 0.93 (0.37-2.34)
Medium | 1.29 (0.84-1.98) | 1.23 (0.66-2.28) | 0.68 (0.26-1.79)
High | 1.95 (1.28-2.97) | 1.77 (0.98-3.20) | 1.47 (0.63-3.46)
No survey weights | | |
Low | 1.23 (1.003-1.50) | 1.04 (0.78-1.39) | 1.19 (0.82-1.74)
Medium | 0.996 (0.81-1.22) | 0.84 (0.63-1.13) | 0.91 (0.61-1.34)
High | 1.12 (0.92-1.37) | 1.06 (0.80-1.40) | 1.11 (0.76-1.61)

Abbreviations: UNGD = unconventional natural gas development, OR = odds ratio, CI = confidence interval.
^a The follow-up responders outside of Pennsylvania (n = 34) and those who answered no depression symptom questions (n = 170) were excluded.
^b The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
^c Models adjusted for race/ethnicity (white non-Hispanic, black non-Hispanic, Hispanic), sex (male, female), Medical Assistance (no, yes), age (years, linear and quadratic terms), smoking status (never, former, current), alcohol status (no; yes, not heavy; yes, heavy), body mass index (BMI, kg/m², linear and quadratic terms), community socioeconomic deprivation (linear and quadratic terms), and water source (municipal water, well water).
^d No depression symptoms was the base outcome.
^e Primary analysis.
^f Very low was the reference group.
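Table 4.4.2.1 contrasts truncated, full, and no survey weights. The truncation rule is cited to Potter (reference 49) but not restated in this section, so the following is only a minimal sketch of one common trimming scheme, assumed here for illustration: weights above a cap are set to the cap, and the removed weight is redistributed proportionally among the untrimmed weights so that the weighted total is unchanged, repeating until no weight exceeds the cap. The cap value and function name are hypothetical.

```python
from typing import List, Sequence


def trim_weights(weights: Sequence[float], cap: float) -> List[float]:
    """Cap survey weights and redistribute the excess so the total is preserved."""
    w = [float(x) for x in weights]
    total = sum(w)
    if cap * len(w) < total:
        raise ValueError("cap too small to preserve the weighted total")
    while True:
        for i, x in enumerate(w):
            if x > cap:
                w[i] = cap  # trim an extreme weight
        excess = total - sum(w)
        under = [i for i, x in enumerate(w) if x < cap]
        if excess <= 1e-12 or not under:
            return w
        base = sum(w[i] for i in under)
        for i in under:
            w[i] += excess * w[i] / base  # proportional redistribution


print(trim_weights([1, 1, 1, 7], cap=4))  # [2.0, 2.0, 2.0, 4.0]
```

Redistribution can push a previously untrimmed weight over the cap, which is why the loop repeats until every weight is at or below it.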


Table 4.4.2.2. Association of UNGD and depression symptoms in survey negative binomial models (n = 4,762^a).

UNGD activity group^b | Depression symptoms^c, exponentiated coefficient^d (95% CI)
Truncated survey weights^e |
Low^f | 1.14 (1.01-1.29)
Medium | 1.03 (0.91-1.17)
High | 1.18 (1.04-1.34)
Full survey weights |
Low | 1.12 (0.94-1.34)
Medium | 1.07 (0.88-1.29)
High | 1.29 (1.08-1.56)
No survey weights |
Low | 1.05 (0.96-1.15)
Medium | 0.96 (0.88-1.05)
High | 1.03 (0.94-1.13)

Abbreviations: UNGD = unconventional natural gas development, CI = confidence interval.
^a The follow-up responders outside of Pennsylvania (n = 34) and those who answered no depression symptom questions (n = 170) were excluded.
^b The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
^c Models adjusted for race/ethnicity (white non-Hispanic, black non-Hispanic, Hispanic), sex (male, female), Medical Assistance (no, yes), age (years, linear and quadratic terms), smoking status (never, former, current), alcohol status (no; yes, not heavy; yes, heavy), body mass index (BMI, kg/m², linear and quadratic terms), community socioeconomic deprivation (linear and quadratic terms), and water source (municipal water, well water).
^d Ratio of mean symptom counts.
^e Primary analysis.
^f Very low was the reference group.
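Because the negative binomial models use a log link, each exponentiated coefficient is a ratio of mean symptom counts, and a Wald interval is symmetric on the log scale. That symmetry lets a reader recover an approximate standard error from a reported interval. The sketch below does this for the high-activity group with truncated weights (1.18, 95% CI 1.04-1.34); the function name is hypothetical, and since the published values are rounded, the recovered quantities are approximate.

```python
import math

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def from_reported_ci(ratio: float, lower: float, upper: float):
    """Recover the log-scale SE, the CI's geometric midpoint, and an approximate two-sided p."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * Z975)
    midpoint = math.exp((log_lo + log_hi) / 2)  # should be close to the reported ratio
    z = math.log(ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, midpoint, p


se, midpoint, p = from_reported_ci(1.18, 1.04, 1.34)
print(round(se, 3), round(midpoint, 2))  # 0.065 1.18
```

If the midpoint differs noticeably from the reported ratio, the interval was probably not a symmetric log-scale Wald interval, and the back-calculation should not be trusted.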


Table 4.4.2.3. Association of UNGD (assigned at baseline) and depression symptoms in survey negative binomial models that include migraine or fatigue (n = 4,762^a).

UNGD activity group^b | UNGD 3 months prior to baseline^c, exponentiated coefficient^d (95% CI) | UNGD 3 months prior to baseline with fatigue as a covariate, exponentiated coefficient^d (95% CI) | UNGD 3 months prior to baseline with migraine as a covariate, exponentiated coefficient^d (95% CI)
Low^e | 1.07 (0.95-1.21) | 1.03 (0.91-1.16) | 1.08 (0.96-1.22)
Medium | 1.02 (0.90-1.15) | 0.997 (0.88-1.13) | 1.05 (0.92-1.19)
High | 1.17 (1.03-1.33) | 1.06 (0.94-1.21) | 1.13 (0.999-1.29)
Fatigue | | 2.58 (2.39-2.80) |
Migraine | | | 1.77 (1.60-1.96)

Abbreviations: UNGD = unconventional natural gas development, CI = confidence interval.
^a The follow-up responders outside of Pennsylvania (n = 34) and those who answered no depression symptom questions (n = 170) were excluded.
^b The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the three months prior to baseline survey return.
^c Models adjusted for race/ethnicity (white non-Hispanic, black non-Hispanic, Hispanic), sex (male, female), Medical Assistance (no, yes), age (years, linear and quadratic terms), smoking status (never, former, current), alcohol status (no; yes, not heavy; yes, heavy), body mass index (BMI, kg/m², linear and quadratic terms), community socioeconomic deprivation (linear and quadratic terms), and water source (municipal water, well water).
^d Ratio of mean symptom counts.
^e Very low was the reference group.
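The 9.4% and 3.4% reductions quoted in the results are relative changes in the exponentiated coefficient for high UNGD activity in Table 4.4.2.3. A minimal sketch of that change-in-estimate arithmetic, using the rounded table values (the function name is hypothetical):

```python
def percent_change(base: float, adjusted: float) -> float:
    """Relative change (%) in an exponentiated coefficient after adding a covariate."""
    return (base - adjusted) / base * 100


high_base = 1.17      # UNGD 3 months prior to baseline, high vs. very low
with_fatigue = 1.06   # same model plus fatigue
with_migraine = 1.13  # same model plus migraine

print(round(percent_change(high_base, with_fatigue), 1))   # 9.4
print(round(percent_change(high_base, with_migraine), 1))  # 3.4
```

The percentage depends on the scale used: computed on the log (coefficient) scale, the same adjustment would shrink the estimate proportionally more, so a change-in-estimate figure is best reported together with the scale on which it was computed.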

4.4.3 Associations of UNGD with disordered sleep

In the multilevel model for the longitudinal disordered sleep outcome, UNGD was not associated with disordered sleep diagnoses (Table 4.4.3). Because UNGD was not associated with disordered sleep, we did not evaluate disordered sleep as a mediator of the association between UNGD and depression symptoms.


Table 4.4.3. Association between UNGD and disordered sleep in a survey-weighted^a generalized estimating equations model.

UNGD activity group^b | Disordered sleep^c, OR (95% CI)
Low^d | 0.96 (0.73-1.25)
Medium | 1.06 (0.80-1.40)
High | 1.06 (0.79-1.42)

Abbreviations: UNGD = unconventional natural gas development, OR = odds ratio, CI = confidence interval.
^a Truncated survey weights were used.
^b The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the three months prior to each event.
^c Models adjusted for race/ethnicity (white non-Hispanic, black non-Hispanic, Hispanic), sex (male, female), Medical Assistance (no, yes), age (years, linear and quadratic terms), smoking status (never, former, current), alcohol status (no; yes, not heavy; yes, heavy), body mass index (BMI, kg/m², linear and quadratic terms), community socioeconomic deprivation (linear and quadratic terms), and water source (municipal water, well water).
^d Very low was the reference group.

4.5 Discussion

We observed an association between UNGD activity and depression symptoms, but only in weighted models. Because unweighted estimates do not account for the survey's oversampling and non-response and are therefore more prone to bias, the weighted analysis was our primary analysis. Antidepressant use did not appear to modify this association. We observed suggestive evidence of mediation by fatigue of the association between UNGD and depression symptoms, but we cannot rule out the possibility that fatigue was instead a confounder of the association. Although we built temporality into the mediation analysis (the UNGD metric preceded fatigue, which preceded depression symptoms), confounding remains possible because migraine, fatigue, depression symptoms, and the UNGD activity metrics were all correlated within a person over time. Finally, we did not observe an association between UNGD and disordered sleep.

There are several biologically plausible pathways through which UNGD could affect depression symptoms, including air pollution, psychosocial stress, and changes to the built or social environment (which may themselves act through stress30). Each stage of UNGD affects air quality, including through truck traffic, diesel-powered machinery, and fugitive emissions.1,2,8-12,17 UNGD has contributed to psychosocial stress in the region: in an analysis of letters to the editor about UNGD in a Pennsylvania newspaper, the authors identified stress as a major theme,16 and among a convenience sample of people living near UNGD in Pennsylvania, the most commonly reported symptom was stress.53 In a study that found dermal and upper respiratory symptoms were more common in people living less than 1 km from a drilled well than in those living more than 2 km away, the authors suggested stress as a potential pathway for these associations.21

Short- and long-term exposures to air pollution have been associated with depression symptoms.29,54,55 For example, in a Korean study of long-term exposure, a 10 µg/m³ increase in PM2.5 over the prior year was associated with 1.47 times the risk of a diagnosis of major depressive disorder.55 Air pollution is hypothesized to affect depression through an inflammatory pathway.55
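A ratio reported per 10 µg/m³ can be re-expressed for another increment only under an assumption, made here purely for illustration, that the association is log-linear in PM2.5 over the relevant range: RR(delta) = RR10^(delta/10). A minimal sketch of that arithmetic (function name hypothetical):

```python
def rescale_ratio(ratio_per_unit: float, unit: float, increment: float) -> float:
    """Rescale a ratio estimate to a new exposure increment, assuming log-linearity."""
    return ratio_per_unit ** (increment / unit)


# 1.47 per 10 ug/m3 (reference 55) expressed per 5 ug/m3
print(round(rescale_ratio(1.47, 10, 5), 2))  # 1.21
```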

Exposure to technological disasters has also been associated with depression. Technological disasters, such as oil spills, are long-lasting and of human origin, and they can affect health through psychosocial stress pathways or by influencing health-related behaviors.56 For example, women who reported income loss from the Deepwater Horizon oil spill or physical exposure to the oil spill were more likely to have depression symptoms than those without income loss or physical exposure.57 Similarly, UNGD is a long-lasting exposure of human origin that may affect health through psychosocial stress pathways.

Studies of neighborhood conditions, including socioeconomic measures, crime rates, and the built environment, have generally, but not always, reported that worse neighborhood conditions were associated with depression.30,31 These associations may be mediated through stress: residents of disadvantaged neighborhoods may experience greater exposure to stressors and poorer access to mental and physical health resources, both of which could contribute to depression.30,31

This study had several strengths, in particular its large sample size, and it is the first study to evaluate the association of UNGD with a mental health outcome. It assessed depression symptoms with a questionnaire, which is a strength because depression and its symptoms may not be well captured in EHRs. Unlike a prior study of UNGD and self-reported outcomes,22 the questionnaire did not mention UNGD, which reduced the possibility of dependent measurement error. Additionally, the UNGD metric captured the time-varying nature of well development, though it could not identify the pathway through which UNGD was associated with depression symptoms.

This study had several limitations. Responders tended to be sicker than the general population because the survey was designed to oversample patients with nasal and sinus symptoms.25,26 We used survey weights to account for the survey design and non-response, but the weighted population may still differ from the general population. We used ICD-9 codes and medication orders in the EHR to identify disordered sleep diagnoses, but if many participants treated disordered sleep with over-the-counter products, the EHR would have low sensitivity for identifying disordered sleep.58 This could explain why we did not observe an association between UNGD and disordered sleep. Future studies could identify disordered sleep from clinical notes or by questionnaire. Additionally, we did not have information on the onset of migraine, fatigue, or depression symptoms. In longitudinal studies, fatigue and depression are risk factors for one another,59 but because this study asked about each symptom at only one time point, we could not determine the timing of onset or changes in symptom status. We also did not know whether survey responders had signed a lease with a drilling company. Leaseholders are more supportive of UNGD than non-leaseholders,60 so lease-holding could be an effect modifier if people who have gained financially from UNGD experience this development differently than those who have not.

4.6 Conclusions

UNGD was associated with depression symptoms in a large population, and this association may be partly mediated by fatigue symptoms. UNGD was not associated with disordered sleep diagnoses ascertained from EHR data. This was the first study of UNGD and a mental health or disordered sleep outcome.

4.7 References

1. Adgate JL, Goldstein BD, McKenzie LM. Potential public health hazards, exposures and health effects from unconventional natural gas development. Environ Sci Technol. 2014;48(15):8307-8320.

2. McKenzie LM, Witter RZ, Newman LS, Adgate JL. Human health risk assessment of air emissions from development of unconventional natural gas resources. Sci Total Environ. 2012;424:79-87.

3. Maloney KO, Yoxtheimer DA. Production and disposal of waste materials from gas and oil extraction from the Marcellus shale play in Pennsylvania. Env Prac. 2012;14(04):278-287.

4. Olmstead SM, Muehlenbachs LA, Shih J, Chu Z, Krupnick AJ. Shale gas development impacts on surface water quality in Pennsylvania. Proc Natl Acad Sci U S A. 2013;110(13):4962-4967.

5. Jackson RB, Vengosh A, Darrah TH, et al. Increased stray gas abundance in a subset of drinking water wells near Marcellus shale gas extraction. Proc Natl Acad Sci U S A. 2013;110(28):11250-11255.


6. Warner NR, Jackson RB, Darrah TH, et al. Geochemical evidence for possible natural migration of Marcellus formation brine to shallow aquifers in Pennsylvania. Proc Natl Acad Sci U S A. 2012;109(30):11961-11966.

7. Osborn SG, Vengosh A, Warner NR, Jackson RB. Methane contamination of drinking water accompanying gas-well drilling and hydraulic fracturing. Proc Natl Acad Sci U S A. 2011;108(20):8172-8176.

8. Roy AA, Adams PJ, Robinson AL. Air pollutant emissions from the development, production, and processing of Marcellus shale natural gas. J Air Waste Manage Assoc. 2013;64(1):19-37.

9. Pacsi AP, Alhajeri NS, Zavala-Araiza D, Webster MD, Allen DT. Regional air quality impacts of increased natural gas production and use in Texas. Environ Sci Technol. 2013;47(7):3521-3527.

10. Pacsi AP, Kimura Y, McGaughey G, McDonald-Buller E, Allen DT. Regional ozone impacts of increased natural gas use in the Texas power sector and development in the Eagle Ford shale. Environ Sci Technol. 2015;49(6):3966-3973.

11. Litovitz A, Curtright A, Abramzon S, Burger N, Samaras C. Estimation of regional air-quality damages from Marcellus shale natural gas extraction in Pennsylvania. Environmental Research Letters. 2013;8(1):014017.

12. Kemball-Cook S, Bar-Ilan A, Grant J, et al. Ozone impacts of natural gas development in the Haynesville shale. Environ Sci Technol. 2010;44(24):9357-9363.

13. Sangaramoorthy T, Jamison AM, Boyle MD, et al. Place-based perceptions of the impacts of fracking along the Marcellus shale. Soc Sci Med. 2016.

14. Muehlenbachs L, Spiller E, Timmins C. The housing market impacts of shale gas development. Am Econ Rev. 2015;105(12):3633-59.


15. Gopalakrishnan S, Klaiber HA. Is the shale energy boom a bust for nearby residents? Evidence from housing values in Pennsylvania. Am J Agric Econ. 2014;96(1):43-66.

16. Powers M, Saberi P, Pepino R, Strupp E, Bugos E, Cannuscio CC. Popular epidemiology and “fracking”: Citizens’ concerns regarding the economic, environmental, health and social impacts of unconventional natural gas drilling operations. J Community Health. 2015;40(3):534-541.

17. Vinciguerra T, Yao S, Dadzie J, et al. Regional air quality impacts of hydraulic fracturing and shale natural gas activity: Evidence from ambient VOC observations. Atmos Environ. 2015;110:144-150.

18. Casey JA, Savitz DA, Rasmussen SG, et al. Unconventional natural gas development and birth outcomes in Pennsylvania, USA. Epidemiology. 2015.

19. McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environ Health Perspect. 2014.

20. Stacy SL, Brink LL, Larkin JC, et al. Perinatal outcomes and unconventional natural gas operations in southwest Pennsylvania. PLOS ONE. 2015;10(6):e0126425.

21. Rabinowitz PM, Slizovskiy IB, Lamers V, et al. Proximity to natural gas wells and reported health status: Results of a household survey in Washington County, Pennsylvania. Environ Health Perspect. 2014.

22. Saberi P, Propert KJ, Powers M, Emmett E, Green-McKenzie J. Field survey of health perception and complaints of Pennsylvania residents in the Marcellus shale region. Int J Environ Res Public Health. 2014;11(6):6517-6527.

23. Kroenke K, Strine TW, Spitzer RL, Williams JB, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1):163-173.


24. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub; 2013.

25. Hirsch AG, Stewart WF, Sundaresan AS, et al. Nasal and sinus symptoms and chronic rhinosinusitis in a population-based sample. Allergy. 2016.

26. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2016.

27. Rasmussen SG, Ogburn EL, McCormack M, et al. Association between unconventional natural gas development in the Marcellus shale and asthma exacerbations. JAMA Intern Med. 2016;176(9):1334-1343.

28. Murray CJ, Abraham J, Ali MK, et al. The state of US health, 1990-2010: Burden of diseases, injuries, and risk factors. JAMA. 2013;310(6):591-606.

29. Lim YH, Kim H, Kim JH, Bae S, Park HY, Hong YC. Air pollution and symptoms of depression in elderly adults. Environ Health Perspect. 2012;120(7):1023-1028.

30. Kim D. Blues from the neighborhood? Neighborhood characteristics and depression. Epidemiol Rev. 2008;30:101-117.

31. Richardson R, Westley T, Gariépy G, Austin N, Nandi A. Neighborhood socioeconomic conditions and depression: A systematic review and meta-analysis. Soc Psychiatry Psychiatr Epidemiol. 2015;50(11):1641-1656.

32. Kendler KS, Gardner CO, Prescott CA. Toward a comprehensive developmental model for major depression in men. Am J Psychiatry. 2006;163(1):115-124.

33. Kendler KS, Gardner CO, Prescott CA. Toward a comprehensive developmental model for major depression in women. Am J Psychiatry. 2002;159(7):1133-1145.

34. Hammen C. Stress and depression. Annu Rev Clin Psychol. 2005;1:293-319.


35. Riemann D. Insomnia and comorbid psychiatric disorders. Sleep Med. 2007;8 Suppl 4:S15-20.

36. Riemann D, Voderholzer U. Primary insomnia: A risk factor to develop depression? J Affect Disord. 2003;76(1-3):255-259.

37. Bigal ME, Lipton RB. The epidemiology, burden, and comorbidities of migraine. Neurol Clin. 2009;27(2):321-334.

38. Jette N, Patten S, Williams J, Becker W, Wiebe S. Comorbidity of migraine and psychiatric disorders: A national population-based study. Headache. 2008;48(4):501-516.

39. Antonaci F, Nappi G, Galli F, Manzoni GC, Calabresi P, Costa A. Migraine and psychiatric comorbidity: A review of clinical findings. J Headache Pain. 2011;12(2):115-125.

40. O'Donnell JF. Insomnia in cancer patients. Clin Cornerstone. 2004;6(1):S6-S14. doi: http://dx.doi.org/10.1016/S1098-3597(05)80002-X.

41. Balkrishnan R, Rasu RS, Rajagopalan R. Physician and patient determinants of pharmacologic treatment of sleep difficulties in outpatient settings in the United States. Sleep. 2005;28(6):715.

42. Lipton RB, Dodick D, Sadovsky R, et al. A self-administered screener for migraine in primary care: The ID migraine validation study. Neurology. 2003;61(3):375-382.

43. Patient-Reported Outcomes Measurement Information System. PROMIS fatigue short form 8a. http://www.assessmentcenter.net. Updated 2015. Accessed October 10, 2015.

44. Casey JA, Ogburn EL, Rasmussen SG, et al. Predictors of indoor radon concentrations in Pennsylvania, 1989-2013. Environ Health Perspect. 2015;123(11):1130-1137.


45. Schwartz BS, Glass TA, Pollak J, et al. Depression, its comorbidities and treatment, and childhood body mass index trajectories. Obesity (Silver Spring). 2016.

46. Schwartz BS, Stewart WF, Godby S, et al. Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med. 2011;41(4):e17-e28.

47. Liu AY, Curriero FC, Glass TA, Stewart WF, Schwartz BS. Associations of the burden of coal abandoned mine lands with three dimensions of community context in Pennsylvania. ISRN Public Health. 2012;2012.

48. Pennsylvania Department of Health. Public water systems. Environmental Health Tracking Program Web site. http://www.health.pa.gov/My%20Health/Environmental%20Health/Environmental%20Public%20Health%20Tracking/Pages/Metadata-for-Drinking-Water-Quality.aspx#.V0Xr8JErKM8. Updated 2015. Accessed May 25, 2016.

49. Potter F. Survey of procedures to control extreme sampling weights. 1988:453-458.

50. Esser MB, Hedden SL, Kanny D, Brewer RD, Gfroerer JC, Naimi TS. Prevalence of alcohol dependence among US adult drinkers, 2009-2011. Prev Chronic Dis. 2014;11:E206.

51. Gries CJ, Engelberg RA, Kross EK, et al. Predictors of symptoms of posttraumatic stress and depression in family members after patient death in the ICU. Chest. 2010;137(2):280-287.

52. Pike GR. Using weighting adjustments to compensate for survey nonresponse. Research in Higher Education. 2008;49(2):153-171.

53. Ferrar KJ, Kriesky J, Christen CL, et al. Assessment and longitudinal analysis of health impacts and stressors perceived to result from unconventional shale gas development in the Marcellus shale region. Int J Occup Environ Health. 2013;19(2):104-112.


54. Szyszkowicz M, Rowe BH, Colman I. Air pollution and daily emergency department visits for depression. Int J Occup Med Environ Health. 2009;22(4):355-362.

55. Kim KN, Lim YH, Bae HJ, Kim M, Jung K, Hong YC. Long-term fine particulate matter exposure and major depressive disorder in a community-based urban cohort. Environ Health Perspect. 2016.

56. Couch SR, Coles CJ. Community stress, psychosocial hazards, and EPA decision-making in communities impacted by chronic technological disasters. Am J Public Health. 2011;101(S1).

57. Rung AL, Gaston S, Oral E, et al. Depression, mental distress and domestic conflict among Louisiana women exposed to the Deepwater Horizon oil spill in the WaTCH study. Environ Health Perspect. 2016.

58. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: A review of methods and applications. Annu Rev Public Health. 2015.

59. Skapinakis P, Lewis G, Mavreas V. Temporal relations between unexplained fatigue and depression: Longitudinal data from an international study in primary care. Psychosom Med. 2004;66(3):330-335.

60. Kriesky J, Goldstein BD, Zell K, Beach S. Differing opinions about natural gas drilling in two adjacent counties with different levels of drilling activity. Energy Policy. 2013;58:228-236.


Chapter 5: Exposure assessment using secondary data sources in unconventional natural gas development and health studies

5.0 Cover page

Sara Rasmussen, MHS (1); Kirsten Koehler, PhD (1); J. Hugh Ellis, PhD (1); David Manthos, BA (2); Karen Bandeen-Roche, PhD (3); Rutherford Platt, PhD (4); and Brian S. Schwartz*, MD, MS (1, 5, 6)

(1) Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
(2) SkyTruth, Shepherdstown, WV, USA
(3) Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
(4) Department of Environmental Studies, Gettysburg College, Gettysburg, Pennsylvania, USA
(5) Department of Epidemiology and Health Services Research, Geisinger Health System, Danville, Pennsylvania, USA
(6) Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA

Acknowledgements: This study was funded by the National Institute of Environmental Health Sciences grant ES023675-01 (PI: B S Schwartz), training grant ES07141 (S G Rasmussen), the Degenstein Foundation, and the National Science Foundation Integrative Graduate Education and Research Traineeship (S G Rasmussen). No funders had input into the study design, conduct, data collection or analysis, or manuscript preparation. Dr. Schwartz is a Fellow of the Post Carbon Institute (PCI), serving as an informal advisor on climate, energy, and health issues. He receives no payment for this role. His research is entirely independent of PCI, and is not motivated, reviewed, or funded by PCI. We thank Joseph J. DeWalle (Geisinger Health System) for creating the map; John Amos (SkyTruth) for guidance with the flaring and impoundment data; and Chloe Quinlan (Johns Hopkins Bloomberg School of Public Health), Jennifer Irving, and Joshua Crisp (Geisinger Health System) for compiling the compressor data.

Reproduced with permission from Environmental Science and Technology, submitted for publication. Unpublished work copyright 2017 American Chemical Society.


5.1 Abstract

Studies of unconventional natural gas development (UNGD) and health have ranked participants along an “exposure” gradient using geographic information system (GIS)-based proxies that incorporated the distance between participants’ home addresses and unconventional natural gas wells. However, studies have used different GIS-based proxies, making comparison of results across studies difficult. Furthermore, these proxies have incorporated only wells and neglected other components of development, namely compressors, impoundments, and flaring events, which may be relevant to health. Here, we characterized UNGD-related impoundments, compressors, and flaring events in Pennsylvania and used a principal component analysis to evaluate whether and how to incorporate them into exposure assessment. We compared three GIS-based UNGD metrics used in health studies with one another and in their associations with a health outcome, mild asthma exacerbations. We identified 361 compressor stations, 1,218 impoundments, and 216 locations with flaring events. The principal component analysis identified a single component that was approximately an equal mix of the metrics for compressors, impoundments, and four phases of well development (pad preparation, drilling, stimulation, and production). The three GIS-based UNGD metrics had different magnitudes of association with mild asthma exacerbations, although for every metric the highest category (vs. the lowest) was associated with the outcome.
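To make "approximately an equal mix" concrete, the sketch below extracts the first principal component of a correlation matrix (that is, PCA on standardized metrics) by power iteration. The correlation matrix is hypothetical, not the study's: all six metrics (compressors, impoundments, and the four well phases) are assumed to correlate at 0.6. Under that exchangeable structure the loadings are exactly equal (1/sqrt(6) each) and the component captures 1 + 5 × 0.6 = 4 of the 6 units of total variance.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(corr: Sequence[Sequence[float]],
                              iters: int = 200) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit-length loadings of a symmetric matrix via power iteration."""
    p = len(corr)
    v = [float(i + 1) for i in range(p)]  # arbitrary start with a component along the leading vector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured by the component
    cv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigval = sum(v[i] * cv[i] for i in range(p))
    return eigval, v


metrics = ["compressors", "impoundments", "pad preparation", "drilling", "stimulation", "production"]
r = 0.6  # hypothetical common correlation between every pair of metrics
corr = [[1.0 if i == j else r for j in range(len(metrics))] for i in range(len(metrics))]
eigval, loadings = first_principal_component(corr)
print(round(eigval, 3), [round(x, 3) for x in loadings])  # 4.0 and six loadings of 0.408
```

With real data the correlations are unequal, so the loadings will differ somewhat; near-equal loadings indicate that no single metric dominates the component.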


5.2 Introduction

Unconventional natural gas (UNG) constitutes over 40% of the natural gas produced in the U.S., up from less than 10% in 2007. Pennsylvania’s Marcellus shale accounts for over a quarter of the country’s UNG production.1 Several epidemiologic studies have evaluated associations of unconventional natural gas development (UNGD) with health outcomes, but they used different UNGD metrics to categorize participants, which makes their results difficult to compare. Moreover, these metrics incorporated only wells, which are just one component of UNGD-related infrastructure.

UNGD involves pad preparation, drilling, perforation, stimulation, and gas production. The fluid returning to the surface with the gas can be stored in surface impoundments, where volatile organic compounds (VOCs) evaporate. Gas is compressed using diesel- or natural gas-powered compressor engines before distribution.2 Residents of regions undergoing UNGD face potential chemical exposures through water, soil, and air; physical exposures, including noise, light, and vibration; and community impacts.2-21 Exposure to UNGD is therefore not a single exposure but multiple time-varying exposures, each with a different scale of impact.

Epidemiologic studies have evaluated the associations of UNGD with birth outcomes,22-24 asthma exacerbations,25 symptoms,26-28 cancer,29,30 hospitalization rates,31 and car crashes,5 using several different metrics to rank study participants along a gradient of UNGD activity. The studies that assigned UNGD metrics at the individual level22-26,28 used a geographic information system (GIS)-based proxy that incorporated the distance between study participants’ home addresses and UNG wells, using either a nearest-neighbor distance or a gravity model approach. The primary advantages of GIS-based proxies are that they are inexpensive compared with multiple-pathway exposure assessment of physical, chemical, and social impacts, and that they can be applied retrospectively.
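The two families of GIS-based proxies named above can be sketched as follows. This is a hypothetical illustration, not any cited study's exact metric: distances use the haversine great-circle formula on a spherical Earth (radius 6,371 km), the nearest-neighbor proxy is the distance to the closest well, and the gravity-style proxy sums a per-well weight divided by squared distance. Published gravity metrics differ in the weight (for example, a well characteristic such as depth or production volume) and in how distance enters, and the minimum-distance floor below is an assumption to avoid division by zero. The equatorial coordinates are toy values chosen so the distances are easy to check.

```python
import math
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_well_km(home: Point, wells: Sequence[Point]) -> float:
    """Nearest-neighbor proxy: distance (km) from home to the closest well."""
    return min(haversine_km(home[0], home[1], lat, lon) for lat, lon in wells)


def gravity_metric(home: Point, wells: Sequence[Point],
                   weights: Optional[Sequence[float]] = None, min_km: float = 0.1) -> float:
    """Gravity-style proxy: sum of weight / distance**2 over wells (hypothetical form)."""
    if weights is None:
        weights = [1.0] * len(wells)
    total = 0.0
    for (lat, lon), wt in zip(wells, weights):
        d = max(haversine_km(home[0], home[1], lat, lon), min_km)
        total += wt / d ** 2
    return total


home = (0.0, 0.0)
wells = [(1.0, 0.0), (0.0, 2.0)]
print(round(nearest_well_km(home, wells), 1))  # 111.2
```

The nearest-neighbor proxy ignores every well but one, whereas the gravity form lets many distant wells add up, which is one reason the two families can rank the same participants differently.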

However, the GIS-based proxies used in these studies had limitations. Thus far, these metrics have incorporated only wells, even though other components of UNGD, such as compressor stations and impoundments, may affect air quality.12,21 An air emissions study estimated that compressor stations were responsible for the majority of UNGD-related emissions of VOCs, nitrogen oxides, and PM10 and PM2.5 (particulate matter less than or equal to 10 and 2.5 micrometers in aerodynamic diameter, respectively) in Pennsylvania in 2011.21 Impoundments remain a largely uncharacterized source of air emissions.12 Because no prior study has attempted to incorporate impoundments and compressor engines into UNGD metrics, it is not clear what, if anything, they add to metric creation. Additionally, no study has incorporated flaring events, which are sources of combustion products.2,32 Finally, because studies have used different approaches to defining metrics, comparing results across studies is problematic.

To address these limitations of the UNGD metrics used in epidemiologic studies, the three primary aims of the analyses in this paper were to: 1) characterize UNGD-related impoundments, compressor engines, and flaring events in Pennsylvania; 2) evaluate whether and how to incorporate impoundments, compressor engines, and flaring events into a UNGD metric; and 3) compare different GIS-based UNGD metrics used in existing studies with one another and in their associations with mild asthma exacerbations.

5.3 Methods

5.3.1 UNGD-related compressor engines, impoundments, and flaring events in Pennsylvania


Unlike data on wells,33,34 data on compressor engines and impoundments are not available electronically. To identify compressor engines, we obtained from the Pennsylvania Department of Environmental Protection (DEP) a list of compressor stations thought to be UNGD-related (n = 506). Between October 2013 and May 2014, we visited four DEP regional offices (Northeast, North-central, Northwest, and Southwest) and scanned relevant documents (including applications, General Information Forms, authorizations, start letters, and cancelations; n = 6,007). From these documents we abstracted station name, location, number of compressor engines, compressor engine horsepower, compressor engine emissions, expected start date of operation, authorization date, start date, and cancelation date; 2,700 documents contained at least one of these variables (early on, we scanned some unnecessary documents; we later refined our criteria for which documents to scan). We excluded compressor stations that had no available documents or that, upon document review, were not UNGD-related (n = 49). After data entry, we checked the entered data for accuracy. We used compressor station names and site identification numbers to link data across documents; if a variable (e.g., horsepower) was missing from one document, we looked for it in other documents for the same compressor engine.
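The linkage step described above, pooling abstracted fields across all documents that share a site identifier and filling a field that is missing in one document from another, can be sketched as follows. The record layout and field names are hypothetical, and the rule for conflicting non-missing values (keep the first value and flag the disagreement for manual review) is an assumption, not the documented procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def link_documents(docs: List[Record], key: str = "site_id") -> Tuple[Dict[object, Record], list]:
    """Merge abstracted documents by site ID, filling gaps and flagging disagreements."""
    by_site = defaultdict(list)
    for doc in docs:
        by_site[doc[key]].append(doc)
    merged, conflicts = {}, []
    for site, site_docs in by_site.items():
        record: Record = {}
        for doc in site_docs:
            for field, value in doc.items():
                if value is None:
                    continue  # missing in this document; another document may supply it
                if record.get(field) is None:
                    record[field] = value
                elif record[field] != value:
                    conflicts.append((site, field, record[field], value))  # for manual review
        merged[site] = record
    return merged, conflicts


docs = [
    {"site_id": "A1", "horsepower": None, "engines": 2},
    {"site_id": "A1", "horsepower": 1380, "engines": 2},
    {"site_id": "B7", "horsepower": 800, "engines": None},
]
merged, conflicts = link_documents(docs)
print(merged["A1"]["horsepower"])  # 1380
```

Fields that no document supplies (such as the engine count for "B7" here) simply stay absent from the merged record, leaving them visibly missing rather than imputed.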

Information on impoundment locations and sizes was obtained in partnership with SkyTruth, which created a collaborative image analysis application on its website (skytruth.org). The application displayed aerial imagery of the one square kilometer area around UNG wells, collected by the USDA National Agriculture Imagery Program35 in the summers of 2005, 2008, 2010, and 2013 (Figure 2.2.2). Trained volunteers and staff identified and outlined impoundments. Each image was reviewed by at least three staff members or ten volunteers; 66.6% agreement was required, and assignments were validated by a GIS analyst before inclusion in the final dataset.


To estimate an installation and removal date for each impoundment, we used a trend analysis of Landsat data to identify sudden spectral changes in the grid cell that contained each impoundment. To do so, we compiled all available Landsat 5, 7, and 8 surface reflectance imagery with < 30% cloud cover for the years 2000-2015, a total of 754 images across four Landsat path/rows. For each impoundment location, we masked remaining clouds and then interpolated a monthly time series for the near infrared band and the normalized difference vegetation index (NDVI). We used the Breaks for Additive Season and Trend (bfast) package in R to identify discrete breaks in the time series after the removal of seasonal effects.36 The dataset has a nominal temporal resolution of 1 month, but cloud cover and data gaps can delay the detection of the creation or removal of impoundments. Based on the direction, magnitude, and timing of the time series breaks, we identified approximate dates of creation and removal of impoundments. We verified estimates for a sample of impoundments by comparing Landsat-derived dates to photointerpretation-derived dates from historical imagery on Google Earth.
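To illustrate the idea behind the break detection, the following sketch uses a deliberately simplified stand-in rather than BFAST itself. It removes seasonality by subtracting each calendar month's mean (BFAST instead fits a seasonal model and iteratively tests for breaks in both the trend and seasonal components) and then finds the single split point that best divides the series into two constant-mean segments by least squares. The synthetic series, the one-break assumption, and the minimum segment length are all assumptions for illustration.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def deseasonalize(series, period=12):
    """Subtract the mean of each calendar month (a simple stand-in for BFAST's
    seasonal model); assumes a monthly series starting in calendar month 0."""
    series = np.asarray(series, dtype=float)
    out = series.copy()
    for month in range(period):
        out[month::period] -= series[month::period].mean()
    return out


def single_break(series, min_segment=3):
    """Index k that best splits the series into constant-mean segments
    series[:k] and series[k:], i.e. with the smallest total squared error."""
    x = np.asarray(series, dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(min_segment, len(x) - min_segment + 1):
        left, right = x[:k], x[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


if __name__ == "__main__":
    # Synthetic 4-year monthly NDVI with a seasonal cycle and a sharp drop at
    # month 20, the kind of change expected when vegetation is cleared.
    months = np.arange(48)
    series = np.where(months < 20, 0.7, 0.2) + 0.15 * np.sin(2 * np.pi * months / 12)
    print("Estimated break at month", single_break(deseasonalize(series)))
```

In a real analysis, the sign and size of the break (a drop versus a recovery in NDVI) would then be used to decide whether it marks creation or removal of an impoundment.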

To identify flaring events, we used nighttime detections recorded by the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi NPP satellite operated by the National Oceanic and Atmospheric Administration (NOAA). We identified detections in Pennsylvania from September 9, 2012 to August 3, 2015 with a temperature >1773 K and <5273.15 K, excluding temperatures of 1810 K, the value NOAA assigned to detections for which the temperature could not be estimated. Because there were often several detections close together, we grouped detections that were within 150 m of each other on the same day.
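The temperature filter and the same-day, 150 m grouping can be sketched as below. Two choices are assumptions not stated in the text: distances are great-circle (haversine) distances on a spherical Earth of radius 6,371 km, and grouping is single linkage, so a chain of detections each within 150 m of the next forms one group even if its endpoints are farther apart. The detection record fields (`date`, `lat`, `lon`, `temp_k`) are hypothetical.

```python
import math
from collections import defaultdict

FILL_VALUE_K = 1810  # temperature NOAA assigned when it could not be estimated


def keep_detection(temp_k):
    """Filter from the text: >1773 K and <5273.15 K, excluding the 1810 K fill value."""
    return 1773 < temp_k < 5273.15 and temp_k != FILL_VALUE_K


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def group_detections(detections, max_dist_m=150.0):
    """Return a group label per detection; same-date detections within
    max_dist_m of each other are merged (single linkage via union-find)."""
    parent = list(range(len(detections)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_date = defaultdict(list)
    for idx, det in enumerate(detections):
        by_date[det["date"]].append(idx)

    for indices in by_date.values():
        for pos, a in enumerate(indices):
            for b in indices[pos + 1:]:
                da, db = detections[a], detections[b]
                if haversine_m(da["lat"], da["lon"], db["lat"], db["lon"]) <= max_dist_m:
                    parent[find(a)] = find(b)
    return [find(i) for i in range(len(detections))]


if __name__ == "__main__":
    dets = [
        {"date": "2013-01-05", "lat": 41.0000, "lon": -76.0, "temp_k": 1900},
        {"date": "2013-01-05", "lat": 41.0009, "lon": -76.0, "temp_k": 2100},  # ~100 m north
        {"date": "2013-01-05", "lat": 41.0090, "lon": -76.0, "temp_k": 1810},  # fill value
        {"date": "2013-01-06", "lat": 41.0000, "lon": -76.0, "temp_k": 1850},  # next day
    ]
    kept = [d for d in dets if keep_detection(d["temp_k"])]
    print(group_detections(kept))
```

The pairwise loop is quadratic within a day, which is adequate for a few detections per night; a spatial index would be needed for much larger volumes.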

5.3.2 Incorporating impoundments and compressor engines into exposure assessment

We used principal components analysis (PCA) to assess the relationship between metrics created for four phases of well development (pad preparation, drilling, stimulation, and production), compressor engines, and impoundments. We created a regular grid (5 by 5 km) across 38 counties in central and northeastern Pennsylvania (Figure 5.3.2, in green) (number of grid points = 2,627). On January 1 and July 1 of each year from 2005 to 2013, we assigned inverse distance-squared (IDS) metrics to each grid point for the four phases of well development (as used in our prior health studies22, 25, 26), impoundments, and compressor engines using Equation 5.3.2.

Equation 5.3.2. Inverse distance squared (IDS) metric.

$$\mathrm{IDS}_j = \sum_{i=1}^{m} \frac{s_i}{d_{ij}^{2}}$$

For each IDS metric, m was the number of wells in the given phase, started compressor engines, or installed impoundments, and d_ij was the distance (meters) between well, compressor engine, or impoundment i and point j, so that d_ij^2 is the squared distance. For the four phases of well development, s_i was 1 for the pad preparation and drilling phases, the total well depth (meters) of well i for the stimulation phase, and the daily natural gas production volume (m³) of well i for the production phase. For compressor engines, s_i was the compressor engine horsepower, and engines contributed to the metric from their start date to their removal date. For impoundments, s_i was the area (m²) of the impoundment, and each impoundment contributed to the metric from its installation date to its removal date. For years with aerial imagery (2005, 2008, 2010, and 2013), we assigned six metrics (impoundments, compressor engines, and four phases of well development). For the remaining dates, we assigned all metrics except the impoundment metric. On some dates there were no wells in a given phase, so that metric was not included in the PCA for that date. We did not incorporate flaring events into this analysis because we did not have information on flaring events before 2013 and only four locations had flaring events identified in 2013.
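Equation 5.3.2 can be computed for all grid points at once with NumPy broadcasting, as in the sketch below. It assumes projected coordinates in meters and that the caller has already selected the sources active on the evaluation date (the hypothetical helper `is_active` shows the start/removal-date rule). It also floors distances at 1 m to avoid division by zero; that floor is a hypothetical choice, not part of the original method.

```python
from datetime import date

import numpy as np


def is_active(on, start, end=None):
    """True if a source (well phase, engine, impoundment) is active on a date."""
    return start <= on and (end is None or on <= end)


def ids_metric(points_xy, sources_xy, weights=None, min_dist_m=1.0):
    """IDS_j = sum_i s_i / d_ij**2 for every point j.

    points_xy:  (J, 2) projected coordinates (meters) of grid points or homes.
    sources_xy: (m, 2) projected coordinates (meters) of sources active on the date.
    weights:    s_i per source; defaults to 1 (as for pad preparation and drilling).
    min_dist_m: hypothetical floor on distance to avoid division by zero.
    """
    points = np.asarray(points_xy, dtype=float).reshape(-1, 2)
    sources = np.asarray(sources_xy, dtype=float).reshape(-1, 2)
    if sources.shape[0] == 0:
        return np.zeros(points.shape[0])
    s = np.ones(sources.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    diff = points[:, None, :] - sources[None, :, :]               # shape (J, m, 2)
    d2 = np.maximum((diff ** 2).sum(axis=-1), min_dist_m ** 2)    # squared distances
    return (s[None, :] / d2).sum(axis=1)


if __name__ == "__main__":
    # Hypothetical compressor engines: (x, y, horsepower, start date, removal date)
    engines = [
        (1000.0, 0.0, 1340.0, date(2011, 3, 1), None),
        (0.0, 2000.0, 1775.0, date(2013, 9, 1), None),  # not yet started on the date below
    ]
    on = date(2013, 1, 1)
    active = [e for e in engines if is_active(on, e[3], e[4])]
    print(ids_metric([[0.0, 0.0]], [e[:2] for e in active], [e[2] for e in active]))
```

Because the weights enter linearly, the same function serves every metric: pass no weights for pad preparation and drilling, well depth for stimulation, daily production volume for production, horsepower for compressor engines, and area for impoundments.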


On each date evaluated, we truncated the UNGD metrics at their 98th percentile, log- and z-transformed the truncated values to normalize distributions and put the metrics on the same scale, and conducted a PCA using the Pearson correlation matrix in Stata. We compared loadings and scree plots across the evaluated dates. We also compared the first component from the PCA to a summed z-score of all UNGD metrics available on that date.
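The per-date PCA steps can be sketched with NumPy as follows. This is not the Stata code used in the analysis. The log offset (half the smallest positive value, so that zeros can be logged) is a hypothetical choice because the text does not state how zeros were handled, and the Spearman helper breaks ties by position, which is adequate only for this illustration. The synthetic data assume a single shared spatial intensity plus metric-specific noise, mimicking co-located sources.

```python
import numpy as np


def prepare_metrics(metrics, pct=98):
    """Truncate at the 98th percentile, log-transform, drop zero-variance
    metrics, and z-score a (points x metrics) array."""
    x = np.asarray(metrics, dtype=float)
    x = np.minimum(x, np.percentile(x, pct, axis=0))  # per-metric truncation
    positive = x[x > 0]
    offset = positive.min() / 2 if positive.size else 1.0  # hypothetical zero handling
    x = np.log(x + offset)
    keep = x.std(axis=0) > 0
    x = x[:, keep]
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return z, keep


def pca_correlation(z):
    """PCA on the Pearson correlation matrix: variance shares, loadings, scores."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    signs = np.sign(eigvecs.sum(axis=0))
    signs[signs == 0] = 1.0
    eigvecs = eigvecs * signs  # fix the arbitrary sign so loadings sum to a positive value
    return eigvals / eigvals.sum(), eigvecs, z @ eigvecs


def spearman(a, b):
    """Spearman correlation via ranks (ties broken by position)."""
    return np.corrcoef(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=500)  # hypothetical shared intensity of UNGD activity
    raw = np.exp(latent[:, None] + 0.3 * rng.normal(size=(500, 5)))
    z, keep = prepare_metrics(raw)
    share, loadings, scores = pca_correlation(z)
    print("PC1 share:", round(share[0], 2))
    print("eigenvalues > 1:", int((share * z.shape[1] > 1).sum()))
    print("PC1 loadings:", np.round(loadings[:, 0], 2))
    print("Spearman(PC1, summed z):", round(spearman(scores[:, 0], z.sum(axis=1)), 3))
```

Eigenvalues of a correlation matrix sum to the number of metrics, so multiplying the variance shares by that number recovers the eigenvalues used in the "eigenvalue above one" rule. When loadings are nearly equal, the first component is close to a rescaled summed z-score, which is why the two rank points almost identically.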

Figure 5.3.2. Location of UNG-related impoundments, compressor engines, and UNG wells. Impoundments included those identified in 2005, 2008, 2010, and 2013 (n = 1,218); compressor engines included those started by 2013 (n = 861), and wells included those drilled by 2015 (n = 9,669). Counties in green are those that were included in the fishnet grid. Ozone monitors (n = 55) are those that were active in 2012. Abbreviations: UNGD, unconventional natural gas development; UNG, unconventional natural gas.

5.3.3 Comparison of GIS-based metrics and their associations with mild asthma exacerbations

Studies of UNGD and health have used different approaches to GIS-based UNGD metrics. Here, we compared how different UNGD metrics categorized patients and evaluated the sensitivity of associations of UNGD with a health outcome to the choice of metric. In our prior study,25 we evaluated the associations of four phases of UNG well development with mild, moderate, and severe asthma exacerbations among 35,508 primary care patients with asthma of the Geisinger Clinic in Pennsylvania from 2005 to 2012. We identified case encounters (mild, moderate, and severe asthma exacerbations: asthma oral corticosteroid [OCS] medication orders, asthma emergency department visits, and asthma hospitalizations, respectively) and control encounters (patient contact dates with the health system) from the Geisinger electronic health record. To compare how different metrics categorized patients, we assigned three different UNGD metrics (described below) to the case and control encounters identified in our previous study (n = 69,548) and compared how each metric ranked case and control encounter dates, using Spearman correlations for continuous metrics and cross-tabulations for categorical metrics. We then evaluated associations of each of these metrics with mild asthma exacerbations (the largest outcome, with 39,442 case and control dates) using our previously reported adjusted multilevel model.25
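When comparing metrics, ties matter: many encounter dates have an IDD value of zero (no wells within 10 miles), so a Spearman correlation should give tied values their average rank. The plain-Python sketch below shows one way to do that, along with a simple cross-tabulation for categorical metrics. Function names are illustrative and not taken from the original analysis code.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def crosstab(categories_a, categories_b):
    """Count encounter dates for each (category of metric A, category of metric B) pair."""
    return Counter(zip(categories_a, categories_b))


if __name__ == "__main__":
    idd = [0.0, 0.0, 0.0, 0.4, 1.2]           # hypothetical continuous IDD values
    ids4pc = [-1.5, -0.2, 0.3, 0.1, 2.0]      # hypothetical summed z-scores
    print(round(spearman(idd, ids4pc), 2))
    print(crosstab(["Q1", "Q4", "Q4"], [">2 km", "<1 km", ">2 km"]))
```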

We evaluated three different approaches to UNGD metrics: 1) categorical distance to the nearest drilled well (DNDW), 2) an inverse distance metric based on the drilling phase (IDD), and 3) an IDS metric incorporating four phases of well development and compressor engines (IDS4PC). The DNDW approach, used by Rabinowitz,28 was based on the distance from a patient's home to the closest drilled well of any age, categorized as less than 1 km, 1-2 km, or greater than 2 km. The IDD metric, used by McKenzie and Stacy,23, 24 was assigned using Equation 5.3.3.

Equation 5.3.3. Inverse distance metric based on the drilling phase (IDD).

$$\mathrm{IDD}_j = \sum_{i=1}^{n} \frac{1}{d_{ij}}$$


In Equation 5.3.3, n was the number of drilled wells within 10 miles of a patient's home and d_ij was the distance between the patient's home j and well i. We tertiled the IDD metric using case and control encounters with at least one well within 10 miles, and created a reference group of case and control encounters with no wells within 10 miles.24
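A minimal sketch of how the DNDW and IDD metrics could be assigned follows. The handling of distances exactly at 1 km or 2 km, the treatment of a home with no drilled wells as ">2 km", the 1 m distance floor, and the use of `statistics.quantiles` (default exclusive method) to set tertile cut points are all assumptions; the original studies may have used different boundary and quantile conventions.

```python
import statistics

MILE_M = 1609.344  # meters per international mile


def dndw_category(distances_km):
    """Distance to the nearest drilled well (any age), in the three DNDW categories."""
    if not distances_km:
        return ">2 km"  # assumption: no drilled wells at all falls in the farthest category
    nearest = min(distances_km)
    if nearest < 1:
        return "<1 km"
    if nearest <= 2:  # assumption: exactly 1 km and 2 km fall in the middle category
        return "1-2 km"
    return ">2 km"


def idd_metric(distances_m, radius_m=10 * MILE_M, min_dist_m=1.0):
    """IDD: sum of 1/d over drilled wells within 10 miles (min_dist_m is a hypothetical floor)."""
    return sum(1.0 / max(d, min_dist_m) for d in distances_m if d <= radius_m)


def idd_categories(values):
    """Reference group for zero values; tertiles among dates with at least one well."""
    cut1, cut2 = statistics.quantiles([v for v in values if v > 0], n=3)
    labels = []
    for v in values:
        if v == 0:
            labels.append("0 wells in 10 miles")
        elif v <= cut1:
            labels.append("T1")
        elif v <= cut2:
            labels.append("T2")
        else:
            labels.append("T3")
    return labels


if __name__ == "__main__":
    print(dndw_category([0.5, 3.0]), dndw_category([2.5]))
    print(idd_metric([1000.0, 2000.0, 20000.0]))  # the well at 20 km is beyond 10 miles
    print(idd_categories([0, 0, 1, 2, 3, 4, 5, 6]))
```

Note that DNDW uses only the single nearest well, while IDD sums over every well inside the radius, so two homes with the same nearest-well distance can differ greatly in IDD.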

The IDS4PC metric included four phases of well development and UNG-related compressor engines. As described above (Equation 5.3.2), we assigned each encounter date a value for each of the four phases of well development and for compressor engines, created z-scores for each of the five values, summed the z-scores, and quartiled the sum using all patient events (exacerbations or control dates). The results of the PCA (Section 5.4.2) informed the creation of the IDS4PC metric. We did not include impoundments because data were not available for all years.
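The IDS4PC construction (z-score each of the five IDS metrics across encounter dates, sum the z-scores, then assign quartiles) can be sketched as below. This sketch z-scores the values as given; whether truncation or a log transform was applied first, as in the PCA, is left to the caller. Quartile cut points use `statistics.quantiles` with its default method, which is an assumption.

```python
import statistics


def zscores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def ids4pc(metric_columns):
    """Summed z-score across IDS metrics (four well phases plus compressor engines).

    metric_columns maps a metric name to its values, one per encounter date.
    """
    z_columns = [zscores(column) for column in metric_columns.values()]
    return [sum(row) for row in zip(*z_columns)]


def quartile_labels(values):
    """Label each value Q1-Q4 using quartile cut points computed from all values."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    labels = []
    for v in values:
        if v <= q1:
            labels.append("Q1")
        elif v <= q2:
            labels.append("Q2")
        elif v <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


if __name__ == "__main__":
    columns = {  # hypothetical IDS values for four encounter dates
        "pad": [0.0, 1.0, 2.0, 3.0],
        "drilling": [0.0, 2.0, 4.0, 6.0],
        "stimulation": [1.0, 1.0, 2.0, 2.0],
        "production": [0.0, 0.0, 0.0, 4.0],
        "compressor": [3.0, 2.0, 1.0, 0.0],
    }
    scores = ids4pc(columns)
    print([round(s, 2) for s in scores])
```

Summing z-scores weights each metric equally, which matches the approximately equal first-component loadings from the PCA.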

To evaluate the sensitivity of associations with a health outcome to the approach used to create the UNGD metric, we re-ran the model for mild asthma exacerbations from our prior study25 with the DNDW, IDD, and IDS4PC metrics. We then compared the odds ratios from each of these models. The study was approved by the Geisinger Institutional Review Board (with an IRB authorization agreement with Johns Hopkins Bloomberg School of Public Health).

5.4 Results

5.4.1 UNGD-related compressor engines, impoundments, and flaring events in Pennsylvania

We identified 1,218 impoundments and 457 compressor stations in Pennsylvania (Figures 5.3.2 and 5.4.1). The median area (m²) of impoundments in 2005, 2008, 2010, and 2013 was 344.0, 558.8, 1,990.2, and 6,209.7, respectively. The average estimated duration of an impoundment from installation to removal was 1.9 years. At the 457 compressor stations, we identified 1,419 compressor engines, though only 861 engines at 361 stations had start letters stating they were operational.

Between September 2012 and August 2015, we identified 1,174 flaring observations on 380 days. After grouping flares within 150 m, we identified flares at 216 locations (Figure 5.3.2). At 114 locations (53%), the flaring event was observed on one day, and at the remaining 102 sites, there was a median of 115 days from the first to the last flaring event.

Figure 5.4.1. Total number of drilled unconventional natural gas wells and operating unconventional natural gas related impoundments and compressor engines in Pennsylvania by year.


5.4.2 PCA applied to wells, compressor stations, and impoundments

In each PCA, the first component explained between 58 and 94% (median 79%) of the total variation (Table 5.4.2.1). For 15 of the 18 dates, only the first component had an eigenvalue above one. The loadings of the first component were consistently an approximately equal mix of the UNGD metrics. The first component was highly correlated with a summed z-score of the metrics on each date (Spearman correlations > 0.99). In contrast, the second component, which explained between 4 and 29% of the variation, did not have consistent loadings, although the compressor metric tended to have the largest loading (Table 5.4.2.2).


Table 5.4.2.1. Results of PCA with Percentage of Variation Explained by Component 1 and Component 1 Loadings

Columns: Var. = proportion of variance explained by component 1; Comp. = compressor engine metric; Pad, Drill., Stim., and Prod. = well metrics for the pad preparation, drilling, stimulation, and production phases; Imp. = impoundment metric (Comp. through Imp. are component 1 loadings); r = correlation of component 1 with the summed z-score.

Date        Var.   Comp.  Pad    Drill. Stim.  Prod.  Imp.   r
1/1/2005    0.77   0.50   a      a      a      0.62   0.59   0.99
7/1/2005    0.76   0.47   0.46   0.33   a      0.47   0.49   0.99
1/1/2006    0.91   0.56   0.59   a      a      0.58   b      0.99
7/1/2006    0.94   0.42   0.46   0.46   0.46   0.44   b      0.99
1/1/2007    0.85   0.46   0.37   0.45   0.47   0.47   b      0.99
7/1/2007    0.72   0.50   0.21   0.50   0.45   0.51   b      0.99
1/1/2008    0.72   0.46   0.36   0.43   0.43   0.46   0.30   0.99
7/1/2008    0.58   0.46   0.35   0.34   0.43   0.48   0.36   0.99
1/1/2009    0.58   0.47   0.53   0.41   0.47   0.32   b      0.99
7/1/2009    0.69   0.24   0.50   0.50   0.48   0.46   b      0.99
1/1/2010    0.67   0.33   0.36   0.45   0.39   0.45   0.45   0.99
7/1/2010    0.81   0.34   0.43   0.43   0.42   0.40   0.42   0.99
1/1/2011    0.80   0.38   0.47   0.46   0.46   0.46   b      0.99
7/1/2011    0.83   0.41   0.46   0.46   0.45   0.46   b      0.99
1/1/2012    0.84   0.41   0.44   0.46   0.46   0.46   b      0.99
7/1/2012    0.79   0.40   0.43   0.47   0.45   0.48   b      0.99
1/1/2013    0.83   0.38   0.41   0.42   0.42   0.43   0.39   0.99
7/1/2013    0.78   0.41   0.37   0.41   0.41   0.44   0.40   0.99

a All grid points had a value of zero for this variable on this date, and variables with zero variance were dropped from the PCA.
b Impoundment data were only available in 2005, 2008, 2010, and 2013.


Table 5.4.2.2. Results of PCA with Percentage of Variation Explained by Component 2 and Component 2 Loadings

Columns: Var. = proportion of variance explained by component 2; Comp. = compressor engine metric; Pad, Drill., Stim., and Prod. = well metrics for the pad preparation, drilling, stimulation, and production phases; Imp. = impoundment metric (Comp. through Imp. are component 2 loadings).

Date        Var.   Comp.  Pad    Drill. Stim.  Prod.  Imp.
1/1/2005    0.20   0.83   a      a      a      -0.19  0.59
7/1/2005    0.15   -0.15  -0.46  0.33   a      -0.08  0.07
1/1/2006    0.07   0.81   -0.23  a      a      0.58   b
7/1/2006    0.04   0.86   -0.24  0.06   0.46   0.44   b
1/1/2007    0.11   0.08   0.83   -0.45  -0.32  0.01   b
7/1/2007    0.18   -0.13  0.97   -0.18  0.05   -0.13  b
1/1/2008    0.12   -0.09  0.21   -0.11  -0.34  -0.23  0.88
7/1/2008    0.27   -0.31  0.49   0.53   -0.43  -0.30  0.33
1/1/2009    0.29   -0.35  -0.12  0.54   -0.41  0.63   b
7/1/2009    0.19   0.90   0.02   -0.14  0.05   -0.40  b
1/1/2010    0.22   0.56   0.49   -0.32  -0.46  -0.29  0.20
7/1/2010    0.11   0.78   -0.17  -0.05  -0.24  -0.46  0.29
1/1/2011    0.10   0.87   -0.05  -0.33  0.01   -0.36  b
7/1/2011    0.08   0.84   -0.36  -0.23  0.13   -0.29  b
1/1/2012    0.07   0.86   -0.33  -0.24  0.08   -0.29  b
7/1/2012    0.11   0.76   -0.63  0.05   0.02   -0.15  b
1/1/2013    0.10   0.58   -0.39  -0.34  -0.24  -0.10  0.57
7/1/2013    0.09   -0.48  0.37   0.08   0.52   0.12   -0.59

a All grid points had a value of zero for this variable on this date, and variables with zero variance were dropped from the PCA.
b Impoundment data were only available in 2005, 2008, 2010, and 2013.


5.4.3 Comparison of GIS-based UNGD metrics

We sought to compare how the DNDW, IDD, and IDS4PC metrics categorized the index dates. Comparing the DNDW and IDS4PC metrics (Table 5.4.3.1), 96.4% of the index dates in the IDS4PC metric's highest quartile were in the farthest category of the DNDW metric (greater than 2 km from the closest well), but 98.6% of index dates in the IDS4PC metric's highest category were greater than 2 km from the closest well. For the IDD and IDS4PC metrics, we compared both the continuous and categorical metrics. The Spearman correlation between the continuous IDD and IDS4PC metrics was 0.36.

While 80.3% of assignments for the IDD metric’s highest tertile were also in the highest quartile of IDS4PC, 18.5% of assignments for IDD’s lowest category (no wells within 10 miles) were in IDS4PC’s highest quartile (Table 5.4.3.2).

Table 5.4.3.1. Categorization of case and control encounter dates (counts) by distance to nearest drilled well (DNDW) and by an inverse distance squared metric incorporating four phases of well development and compressor engines (IDS4PC)

                        DNDW categories(a)
IDS4PC categories(b)    <1 km    1-2 km   >2 km     Total
Q1(c)                   2        4        17,381    17,387
Q2                      4        30       17,353    17,387
Q3                      4        46       17,337    17,387
Q4                      238      385      16,764    17,387
Total                   248      465      68,835    69,548

a Distance to the nearest drilled well, based on Rabinowitz.
b An inverse distance squared metric incorporating four phases of well development (pad preparation, drilling, stimulation, and production) and UNG-related compressor stations, based on Casey, Tustin, and Rasmussen.
c Quartile.


Table 5.4.3.2. Categorization of case and control encounter dates (counts) by an inverse distance metric based only on the drilling phase (IDD) and by an inverse distance squared metric incorporating four phases of well development and compressor engines (IDS4PC)

                        IDD tertiles(a)
IDS4PC quartiles(c)     0 wells in 10 miles   T1(b)    T2       T3       Total
Q1(d)                   16,999                159      146      83       17,387
Q2                      15,158                954      965      310      17,387
Q3                      14,866                1,050    1,086    385      17,387
Q4                      10,649                1,796    1,762    3,180    17,387
Total                   57,672                3,959    3,959    3,958    69,548

a An inverse distance metric incorporating drilled wells, based on McKenzie and Stacy.
b Tertile.
c An inverse distance squared metric incorporating four phases of well development (pad preparation, drilling, stimulation, and production) and UNG-related compressor stations, based on Casey, Tustin, and Rasmussen.
d Quartile.
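As a check on the percentages reported in the text, the following snippet recomputes them from the cell counts transcribed from Tables 5.4.3.1 and 5.4.3.2; it reproduces 96.4%, 80.3%, and 18.5%. Summing the cells also shows that the third IDD tertile column contains 3,958 dates, which is consistent with the grand total of 69,548.

```python
# Cell counts transcribed from Tables 5.4.3.1 and 5.4.3.2.
DNDW_BY_IDS4PC = {  # rows: IDS4PC quartile; columns: <1 km, 1-2 km, >2 km
    "Q1": (2, 4, 17_381),
    "Q2": (4, 30, 17_353),
    "Q3": (4, 46, 17_337),
    "Q4": (238, 385, 16_764),
}
IDD_BY_IDS4PC = {  # rows: IDS4PC quartile; columns: 0 wells in 10 miles, T1, T2, T3
    "Q1": (16_999, 159, 146, 83),
    "Q2": (15_158, 954, 965, 310),
    "Q3": (14_866, 1_050, 1_086, 385),
    "Q4": (10_649, 1_796, 1_762, 3_180),
}


def column_totals(table):
    """Sum each column across all rows."""
    return [sum(column) for column in zip(*table.values())]


def row_percent(table, row, col):
    """Percentage of a row's dates that fall in a given column."""
    return 100 * table[row][col] / sum(table[row])


def column_percent(table, row, col):
    """Percentage of a column's dates that fall in a given row."""
    return 100 * table[row][col] / column_totals(table)[col]


if __name__ == "__main__":
    print(round(row_percent(DNDW_BY_IDS4PC, "Q4", 2), 1))    # IDS4PC Q4 dates >2 km from a well
    print(round(column_percent(IDD_BY_IDS4PC, "Q4", 3), 1))  # IDD T3 dates in IDS4PC Q4
    print(round(column_percent(IDD_BY_IDS4PC, "Q4", 0), 1))  # no-well dates in IDS4PC Q4
    print(column_totals(IDD_BY_IDS4PC))
```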

We compared associations of the DNDW, IDD, and IDS4PC metrics with a health outcome, mild asthma exacerbations. In these models, the highest group of each metric (vs. the lowest) was associated with increased odds of mild exacerbation, though the magnitudes of association differed (IDD < DNDW < IDS4PC; Table 5.4.3.3). The DNDW and IDS4PC metrics had increasing odds ratios across UNGD categories, whereas the second tertile of the IDD metric had a slightly stronger association with the outcome than the third tertile. Associations of the IDS4PC metric with mild asthma exacerbations were intermediate between those from the four regressions of each phase of well development separately in our prior study.25


Table 5.4.3.3. Associations of unconventional natural gas development (UNGD) metrics with mild asthma exacerbations(a)

UNGD metric included in model(b)  Category                         Odds ratio (95% CI(c))
DNDW(d)                           > 2 km (REF)                     1.0
                                  1-2 km                           1.13 (0.76-1.69)
                                  < 1 km                           1.83 (1.03-3.25)
IDD(e)                            No wells within 10 miles (REF)   1.0
                                  Tertile 1                        0.96 (0.83-1.13)
                                  Tertile 2                        1.21 (1.03-1.42)
                                  Tertile 3                        1.19 (1.01-1.41)
IDS4PC(f)                         Quartile 1 (REF)                 1.0
                                  Quartile 2                       1.31 (1.16-1.48)
                                  Quartile 3                       2.20 (1.93-2.52)
                                  Quartile 4                       3.69 (3.16-4.30)

a New oral corticosteroid medication orders.
b Multilevel models with a random intercept for patient and community, adjusted for age category (5-12, 13-18, 19-44, 45-61, 62-74, 75+ years), sex (male, female), race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI 85th to < 95th percentile or BMI 25 to < 30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to the event (degrees Celsius), and squared maximum temperature on the day prior to the event (degrees Celsius).
c Confidence interval.
d Distance to the nearest drilled well, based on Rabinowitz.
e An inverse distance metric based only on the drilling phase, based on McKenzie and Stacy.
f An inverse distance squared metric incorporating four phases of well development (pad preparation, drilling, stimulation, and production) and UNG-related compressor engines, based on Casey, Tustin, and Rasmussen.
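If the confidence intervals in Table 5.4.3.3 are Wald intervals on the log scale (an assumption; the table does not say), the standard error of each log odds ratio can be recovered from its interval, and the reported odds ratio should sit at the interval's geometric midpoint. The sketch below does this for three rows of the table using only the standard library.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # standard normal 97.5th percentile, about 1.96


def log_or_se(lower, upper, z=Z_975):
    """Standard error of ln(OR) implied by a 95% Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_midpoint(lower, upper):
    """Geometric midpoint of the CI; for a log-scale Wald CI this is the point estimate."""
    return math.sqrt(lower * upper)


if __name__ == "__main__":
    rows = {  # (reported OR, lower, upper) from Table 5.4.3.3
        "DNDW < 1 km": (1.83, 1.03, 3.25),
        "IDD tertile 3": (1.19, 1.01, 1.41),
        "IDS4PC quartile 4": (3.69, 3.16, 4.30),
    }
    for label, (odds_ratio, low, high) in rows.items():
        print(f"{label}: OR {odds_ratio}, midpoint {ci_midpoint(low, high):.2f}, "
              f"SE(ln OR) {log_or_se(low, high):.3f}")
```

For example, the DNDW < 1 km estimate has a standard error several times larger than the IDS4PC fourth-quartile estimate, reflecting how few encounter dates fall within 1 km of a drilled well (Table 5.4.3.1).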

5.5 Discussion

Compressor engines, impoundments, and flaring events, which are potential sources of emissions, have not previously been described or incorporated in epidemiology studies, in part because data are not readily available. Additionally, approaches to incorporating wells have differed across epidemiology studies. In this study, we described UNGD-related compressor engines, impoundments, and flaring events in Pennsylvania, evaluated the impact of including compressor engines and impoundments in a UNGD metric, and compared associations of different metrics with a health outcome.

We identified 361 compressor stations with operational engines, 1,218 impoundments, and 216 sites with flares. The timing of development of compressor engines and impoundments was similar to that of wells. Although the number of impoundments decreased from 2010 to 2013, the total area of impoundments increased from 1.96 to 3.96 km².

The PCAs suggested that on a majority of days evaluated, a single component captured most, but not all, of the variation in the compressor engine, impoundment, and well IDS metrics. That single component ranked points similarly to a summed z-score of the metrics. It was not unexpected that the metrics loaded on a single component, since wells, impoundments, and compressor engines have similar spatial and temporal distributions. Based on these results, we incorporated compressor engines and the four phases of well development into a UNGD metric by summing the z-scores of each metric (the IDS4PC metric). We then compared how that metric classified case and control dates to classifications by two other metrics (DNDW and IDD). There were substantial differences in how the DNDW, IDD, and IDS4PC metrics ranked dates.

Differences between the three metrics were expected because the metrics were designed for studies conducted in different regions and time periods. The DNDW metric was designed for a study in southwestern Pennsylvania in 2012,28 and the IDD metric for a study in Colorado from 1996 to 2009.24 Participants in those studies lived, on average, closer to wells than participants in the studies that used the IDS metric, which were conducted from 2005 to 2015 in northeastern Pennsylvania,22, 25, 26 because the study in southwestern Pennsylvania did not include the earlier years of UNGD, when wells were less dense, and the Colorado study included both conventional and UNG wells.


Additionally, each of the three metrics incorporated different information. Because the DNDW metric was based on the distance to the single closest drilled well, it did not take into account the density of wells. The DNDW and IDD metrics incorporated only the drilling phase of development, whereas the IDS4PC metric distinguished between four phases of development. Both the DNDW and IDD metrics assumed that exposures from a well were continuous after it was drilled and that exposures were equal from all drilled wells, regardless of phase of development, well depth, or volume of natural gas produced. However, phase of development is important to incorporate into metric formulation because exposures such as air emissions differ by phase, and because not all drilled wells are later stimulated or produce natural gas. Of the 9,669 unconventional natural gas wells drilled in Pennsylvania by 2015, 1,992 did not have stimulation dates, and of wells with stimulation dates, 377 did not report production (although it is not possible to distinguish between missing and zero values in the data). The DNDW metric assumed that wells farther than 2 km did not contribute to exposure, and the IDD metric assumed that wells farther than 10 miles from a patient's home did not contribute to exposure; these assumptions may not hold for exposures such as regional air pollutants (e.g., ozone and particulate matter).

We designed our IDS4PC metric to capture all potential exposure pathways associated with UNGD. The IDS4PC metric assumed that wells contributed to exposure only during the four phases of development (pad preparation, drilling, stimulation, and production) and did not contribute to the metric between these phases. It also assumed that wells contributed differently to exposure during the stimulation phase (in proportion to total depth) and the production phase (in proportion to the volume of gas produced), and that compressor engines contributed in proportion to their total horsepower. We hypothesize that these are reasonable assumptions because well depth is correlated with the amount of fluid used in stimulation,37, 38 and thus also likely correlated with the truck trips needed to bring fluids to the site, and we hypothesize that fugitive emissions are correlated with the volume of natural gas produced. However, we acknowledge that without environmental measurements, we cannot definitively say how well our metric captures each potential exposure pathway.

We compared the associations of the different UNGD metrics with mild asthma exacerbations. Although inference was the same across the three metrics (the highest group of each was associated with mild asthma exacerbations), the magnitude of the odds ratios differed. The IDS4PC metric was most strongly associated with OCS orders, and the IDD metric was the least strongly associated. Had we used the IDD or DNDW metric in our original study, we would have come to different conclusions about the strength of the association of UNGD and asthma exacerbations. Because the associations of the IDS4PC metric with mild asthma exacerbations fell between those from the models of each phase of well development separately (as in our prior study),25 we concluded that incorporating compressor engines, which required considerable time, effort, and expense, did not substantively change interpretation of the association between the IDS UNGD metrics and mild asthma exacerbations. It is possible that the DNDW and IDD metrics had more exposure misclassification than the IDS4PC metric did, but without environmental measurements, we cannot quantify how well each metric captures potential exposures from UNGD, so we cannot definitively interpret the smaller magnitude of association with the DNDW and IDD metrics compared with the IDS4PC metric. We also acknowledge that we cannot rule out the potential for unmeasured confounding in each model, just as we could not do so in our original study.25

One important pathway through which UNGD could influence health is through air quality impacts. We wanted to evaluate the adequacy of a GIS-based metric for air quality impacts by comparing it to air quality estimates. To do this, we needed air pollution data at a fine spatial resolution and a daily time step that included emissions from UNGD and covered the years of UNGD (2005-2015) in Pennsylvania. Because Environmental Protection Agency (EPA) monitors are too sparse in counties with UNGD (Figure 5.3.2), we considered using Community Multi-scale Air Quality (CMAQ) model output on a 12 km grid for PM2.5 and ozone in 2007 and 2011. However, the National Emissions Inventory (NEI), which CMAQ uses, “likely underestimates oil and gas emissions.”39 It included only 2,675 unconventional natural gas wells in Pennsylvania in 2011, whereas our analysis identified 4,951 spudded wells by the end of 2011. The EPA is working to improve UNGD emission estimates for future versions of the NEI, so it may be possible to compare UNGD metrics to CMAQ output in the future.

This study had several strengths. No prior study has described the size and the temporal and spatial distribution of UNGD-related compressor stations, impoundments, and flaring in Pennsylvania, evaluated what information they add to GIS-based metrics, or compared associations of different UNGD metrics with a health outcome. This study also had several limitations. No UNGD metric took into consideration the full variability of potential exposures: for example, safety practices (and potential accidental exposures) differ between well operators,40 and impacts also vary over time as regulations (such as Act 13 in 2012) are enacted and industry practices change. We also recognize that we likely captured different potential exposures with differing amounts of measurement error. There are rigorous approaches to characterize each of the potential chemical, physical, and social exposures from UNGD, but such approaches may not work well retrospectively and are much more time-consuming and costly than GIS-based approaches. The IDS approach should capture exposures that are constant during a given phase of well development, are absent between phases, are the same across wells of the same depth, and decay with the inverse square of distance, but we did not take environmental measurements to identify the exposures for which these assumptions hold.


We likely underestimated counts of impoundments because we only had aerial imagery for four years between 2005 and 2013. Because the average estimated duration of an impoundment from installation to removal was 1.9 years, some impoundments were likely installed and removed between the years with imagery and thus could not be captured in our dataset. Additionally, we did not look for impoundments that were more than 1 km from a well. We likely underestimated the number of compressor engines because we could not distinguish between compressor engines missing a start letter and those never started. Additionally, we were not able to evaluate whether the original list of UNGD-related compressor stations from the DEP was missing any stations. We also likely underestimated the number of flaring events because we could not identify flaring events on cloudy nights. We did not have information on whether compressor engines were diesel or natural gas powered. For the PCA, we assigned metrics to points in a regular grid, instead of to the residential locations of Geisinger patients, so that the locations of the points included in the PCA would not be affected by residential patterns or population density. Although there is still a spatial correlation structure between the grid points included in the PCA, we aimed primarily to build an index rather than to study correlation structure, and thus we do not consider this a major limitation. Finally, the PCA was restricted to the Geisinger region and therefore may not be generalizable to areas where wells, compressors, and impoundments are not co-located.

GIS proxies for UNGD were defensible metrics to capture multiple pathways retrospectively and at low cost in the initial studies of UNGD and health. However, without environmental measurements, it is not possible to determine which pathways are captured by the GIS proxies. This study highlights the need for future UNGD and health studies to improve exposure assessment by collecting environmental measurements or biomarkers. Only when we understand how UNGD is affecting health can we effectively design interventions to reduce exposure.


5.6 References 1. Shale in the United States; https://www.eia.gov/energy_in_brief/article/shale_in_the_united_states.cfm#shaledata.

2. Adgate, J.L.; Goldstein, B.D.; McKenzie, L.M. Potential Public Health Hazards,

Exposures and Health Effects from Unconventional Natural Gas Development. Environ.

Sci. Technol. 2014, 48 (15), 8307-8320; DOI: 10.1021/es404621d.

3. Sangaramoorthy, T.; Jamison, A.M.; Boyle, M.D.; Payne-Sturges, D.; Sapkota, A.;

Milton, D.K.; Wilson, S.M. Place-Based Perceptions of the Impacts of Fracking along the

Marcellus Shale. Soc. Sci. Med. 2016, 151; DOI: 10.1016/j.socscimed.2016.01.002.

4. Powers, M.; Saberi, P.; Pepino, R.; Strupp, E.; Bugos, E.; Cannuscio, C.C. Popular

epidemiology and “fracking”: citizens’ concerns regarding the economic, environmental,

health and social impacts of unconventional natural gas drilling operations. J.

Community Health 2015, 40 (3), 534-541; DOI:10.1007/s10900-014-9968-x

5. Graham, J.; Irving, J.; Tang, X.; Sellers, S.; Crisp, J.; Horwitz, D.; Muehlenbachs, L.;

Krupnick, A.; Carey, D. Increased traffic accident rates associated with shale gas drilling in Pennsylvania. Accident Analysis & Prevention 2015, 74, 203-209; DOI:

10.1016/j.aap.2014.11.003.

6. Gopalakrishnan, S.; Klaiber, H.A. Is the shale energy boom a bust for nearby residents? Evidence from housing values in Pennsylvania. Am. J. Agric. Econ. 2014, 96

(1), 43-66; DOI: 10.1093/ajae/aat065.

7. Muehlenbachs, L.; Spiller, E.; Timmins, C. The Housing Market Impacts of Shale Gas

Development. Am. Econ. Rev. 2015, 105 (12), 3633-59; DOI: 10.1257/aer.20140079.

8. The Shale Tipping Point: The Relationship of Drilling to Crime, Traffic Fatalities,

STDs, and Rents in Pennsylvania, West Virginia, and Ohio; http://www.multistateshale.org/shale-tipping-point.

187

9. Brasier, K.J.; Rhubart, D. Effects of Marcellus shale development on the criminal justice system (The Marcellus impacts Project Report # 6); http://www.rural.palegislature.us/documents/reports/Marcellus-Report-6-Crime%20.pdf;

2014.

10. Pacsi, A.P.; Alhajeri, N.S.; Zavala-Araiza, D.; Webster, M.D.; Allen, D.T. Regional air quality impacts of increased natural gas production and use in Texas. Environ. Sci. Technol. 2013, 47 (7), 3521-3527; DOI: 10.1021/es3044714.

11. Pacsi, A.P.; Kimura, Y.; McGaughey, G.; McDonald-Buller, E.; Allen, D.T. Regional Ozone Impacts of Increased Natural Gas Use in the Texas Power Sector and Development in the Eagle Ford Shale. Environ. Sci. Technol. 2015, 49 (6), 3966-3973; DOI: 10.1021/es5055012.

12. Roy, A.A.; Adams, P.J.; Robinson, A.L. Air pollutant emissions from the development, production, and processing of Marcellus Shale natural gas. J. Air Waste Manage. Assoc. 2013, 64 (1), 19-37; DOI: 10.1080/10962247.2013.826151.

13. Vinciguerra, T.; Yao, S.; Dadzie, J.; Chittams, A.; Deskins, T.; Ehrman, S.; Dickerson, R.R. Regional air quality impacts of hydraulic fracturing and shale natural gas activity: Evidence from ambient VOC observations. Atmos. Environ. 2015, 110, 144-150; DOI: 10.1016/j.atmosenv.2015.03.056.

14. McKenzie, L.M.; Witter, R.Z.; Newman, L.S.; Adgate, J.L. Human health risk assessment of air emissions from development of unconventional natural gas resources. Sci. Total Environ. 2012, 424, 79-87; DOI: 10.1016/j.scitotenv.2012.02.018.

15. Osborn, S.G.; Vengosh, A.; Warner, N.R.; Jackson, R.B. Methane contamination of drinking water accompanying gas-well drilling and hydraulic fracturing. Proc. Natl. Acad. Sci. U. S. A. 2011, 108 (20), 8172-8176; DOI: 10.1073/pnas.1100682108.

16. Jackson, R.B.; Vengosh, A.; Darrah, T.H.; Warner, N.R.; Down, A.; Poreda, R.J.; Osborn, S.G.; Zhao, K.; Karr, J.D. Increased stray gas abundance in a subset of drinking water wells near Marcellus shale gas extraction. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (28), 11250-11255; DOI: 10.1073/pnas.1221635110.

17. Warner, N.R.; Jackson, R.B.; Darrah, T.H.; Osborn, S.G.; Down, A.; Zhao, K.; White, A.; Vengosh, A. Geochemical evidence for possible natural migration of Marcellus Formation brine to shallow aquifers in Pennsylvania. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (30), 11961-11966; DOI: 10.1073/pnas.1121181109.

18. Olmstead, S.M.; Muehlenbachs, L.A.; Shih, J.; Chu, Z.; Krupnick, A.J. Shale gas development impacts on surface water quality in Pennsylvania. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (13), 4962-4967; DOI: 10.1073/pnas.1213871110.

19. Maloney, K.O.; Yoxtheimer, D.A. Production and Disposal of Waste Materials from Gas and Oil Extraction from the Marcellus Shale Play in Pennsylvania. Environ. Pract. 2012, 14 (4), 278-287; DOI: 10.1017/s146604661200035x.

20. Vengosh, A.; Jackson, R.B.; Warner, N.; Darrah, T.H.; Kondash, A. A critical review of the risks to water resources from unconventional shale gas development and hydraulic fracturing in the United States. Environ. Sci. Technol. 2014, 48 (15), 8334-8348; DOI: 10.1021/es405118y.

21. Litovitz, A.; Curtright, A.; Abramzon, S.; Burger, N.; Samaras, C. Estimation of regional air-quality damages from Marcellus Shale natural gas extraction in Pennsylvania. Environ. Res. Lett. 2013, 8 (1), 014017; DOI: 10.1088/1748-9326/8/1/014017.

22. Casey, J.A.; Savitz, D.A.; Rasmussen, S.G.; Ogburn, E.L.; Pollak, J.; Mercer, D.G.; Schwartz, B.S. Unconventional Natural Gas Development and Birth Outcomes in Pennsylvania, USA. Epidemiology 2015; DOI: 10.1097/EDE.0000000000000387.


23. Stacy, S.L.; Brink, L.L.; Larkin, J.C.; Sadovsky, Y.; Goldstein, B.D.; Pitt, B.R.; Talbott, E.O. Perinatal Outcomes and Unconventional Natural Gas Operations in Southwest Pennsylvania. PLoS One 2015, 10 (6), e0126425; DOI: 10.1371/journal.pone.0126425.

24. McKenzie, L.M.; Guo, R.; Witter, R.Z.; Savitz, D.A.; Newman, L.S.; Adgate, J.L. Birth Outcomes and Maternal Residential Proximity to Natural Gas Development in Rural Colorado. Environ. Health Perspect. 2014; DOI: 10.1289/ehp.1306722.

25. Rasmussen, S.G.; Ogburn, E.L.; McCormack, M.; Casey, J.A.; Bandeen-Roche, K.; Mercer, D.G.; Schwartz, B.S. Association Between Unconventional Natural Gas Development in the Marcellus Shale and Asthma Exacerbations. JAMA Intern. Med. 2016, 176 (9), 1334-1343; DOI: 10.1001/jamainternmed.2016.2436.

26. Tustin, A.W.; Hirsch, A.G.; Rasmussen, S.G.; Casey, J.A.; Bandeen-Roche, K.; Schwartz, B.S. Associations between Unconventional Natural Gas Development and Nasal and Sinus, Migraine Headache, and Fatigue Symptoms in Pennsylvania. Environ. Health Perspect. 2016; DOI: 10.1289/EHP281.

27. Saberi, P.; Propert, K.J.; Powers, M.; Emmett, E.; Green-McKenzie, J. Field survey of health perception and complaints of Pennsylvania residents in the Marcellus Shale region. Int. J. Environ. Res. Public Health 2014, 11 (6), 6517-6527; DOI: 10.3390/ijerph110606517.

28. Rabinowitz, P.M.; Slizovskiy, I.B.; Lamers, V.; Trufan, S.J.; Holford, T.R.; Dziura, J.D.; Peduzzi, P.N.; Kane, M.J.; Reif, J.S.; Weiss, T.R.; Stowe, M.H. Proximity to Natural Gas Wells and Reported Health Status: Results of a Household Survey in Washington County, Pennsylvania. Environ. Health Perspect. 2014; DOI: 10.1289/ehp.1307732.

29. Finkel, M. Shale gas development and cancer incidence in southwest Pennsylvania. Public Health 2016, 141, 198-206; DOI: 10.1016/j.puhe.2016.09.008.

30. Fryzek, J.; Pastula, S.; Jiang, X.; Garabrant, D.H. Childhood Cancer Incidence in Pennsylvania Counties in Relation to Living in Counties With Hydraulic Fracturing Sites. J. Occup. Environ. Med. 2013, 55 (7), 796-801; DOI: 10.1097/jom.0b013e318289ee02.

31. Jemielita, T.; Gerton, G.L.; Neidell, M.; Chillrud, S.; Yan, B.; Stute, M.; Howarth, M.; Saberi, P.; Fausti, N.; Penning, T.M.; Roy, J.; Propert, K.J.; Panettieri, R.A., Jr. Unconventional Gas and Oil Drilling Is Associated with Increased Hospital Utilization Rates. PLoS One 2015, 10 (7), e0131093; DOI: 10.1371/journal.pone.0131093.

32. Olaguer, E.P. The potential near-source ozone impacts of upstream oil and gas industry emissions. J. Air Waste Manage. Assoc. 2012, 62 (8), 966-977; DOI: 10.1080/10962247.2012.688923.

33. Pennsylvania Internet Record Imaging System Well Information System website, http://www.dcnr.state.pa.us/topogeo/econresource/oilandgas/resrefs/wis_home/.

34. Oil & Gas Reporting Website, https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Welcome.aspx.

35. National Agriculture Imagery Program website, https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/index.

36. Verbesselt, J.; Hyndman, R.; Newnham, G.; Culvenor, D. Detecting trend and seasonal changes in satellite image time series. Remote Sens. Environ. 2010, 114 (1), 106-115; DOI: 10.1016/j.rse.2009.08.014.

37. Fracking Chemical Database website, http://frack.skytruth.org/fracking-chemical-database.

38. Schmid, K.W. The Marcellus Shale Gas Play. Pennsylvania Geology 2012; http://www.dcnr.state.pa.us/cs/groups/public/documents/document/dcnr_20027757.pdf.

39. EPA Needs to Improve Air Emissions Data for the Oil and Natural Gas Production Sector; United States Environmental Protection Agency: Washington, DC, 2013; https://www.epa.gov/sites/production/files/2015-09/documents/20130220-13-p-0161.pdf.


40. Abualfaraj, N.; Olson, M.S.; Gurian, P.L.; De Roos, A.; Gross-Davis, C.A. Statistical analysis of compliance violations for natural gas wells in Pennsylvania. Energy Policy 2016, 97, 421-428; DOI: 10.1016/j.enpol.2016.07.051.


Chapter 6: Miscellaneous results

In this chapter, we present additional results from Chapters 3 and 4, additional sensitivity analyses not reported in those chapters, and additional hypotheses that we explored.

6.1 Additional results for Chapter 3

6.1.1 Associations of covariates with event status

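Presented below are the odds ratios for the associations of the covariates with event status in the 12 models that pair each UNGD phase (pad, spud, stimulation, production) with each asthma outcome (hospitalization, emergency department, and oral corticosteroid [OCS]) (Tables 6.1.1.1-6.1.1.3).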
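Throughout this section, an odds ratio is described as statistically significant when its 95% confidence interval excludes 1. The sketch below makes that convention explicit and shows how an odds ratio and interval relate to a model's log-odds coefficient. The Wald form exp(beta ± 1.96·SE) is an assumption for illustration; the interval method used by the multilevel models is not restated here.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its
    standard error, using a Wald-type interval exp(beta +/- z * se).
    The Wald form is an assumption for illustration."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_null(lower, upper):
    """True when a 95% CI for an odds ratio does not contain 1, the usual
    criterion for calling an OR statistically significant at the 5% level."""
    return not (lower <= 1.0 <= upper)


# Reported intervals for non-Hispanic black race/ethnicity, pad models
print(excludes_null(0.58, 0.90))  # OCS model (Table 6.1.1.1): True
print(excludes_null(0.77, 1.73))  # hospitalization model (Table 6.1.1.3): False
```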

6.1.1.1 Race/ethnicity

For the hospitalization outcome, race/ethnicity was not statistically significantly associated with event status. For the emergency department outcome, patients with non-Hispanic black race/ethnicity and patients with Hispanic race/ethnicity had higher odds of the outcome than patients with non-Hispanic white race/ethnicity. For example, in the pad and emergency department model, patients with non-Hispanic black race/ethnicity had 4.87 times the odds of having an event (95% confidence interval [CI]: 2.45 - 9.66), and patients with Hispanic race/ethnicity had 3.44 times the odds (95% CI: 1.70 - 6.94). For the oral corticosteroid (OCS) outcome, patients with non-Hispanic black race/ethnicity had lower odds of event status (e.g., in the pad model, odds ratio [OR] = 0.72, 95% CI: 0.58 - 0.90), and Hispanic race/ethnicity was not associated with event status.

6.1.1.2 Family history

Family history of asthma was associated with event status in every model. For example, in the pad and hospitalization model, patients with a family history of asthma had 1.48 times the odds (95% CI: 1.20 - 1.83) of having the outcome compared with patients without a family history. Similarly, the odds ratio for family history was 3.25 (95% CI: 2.14 - 4.95) in the pad and emergency department model and 1.45 (95% CI: 1.29 - 1.64) in the pad and OCS model.


6.1.1.3 Smoking status

For each of the outcomes, current and former smoking status (compared with never smoking) were both associated with event status, current smoking was more strongly associated than former smoking, and odds ratios were similar across models (the former smoking odds ratio in the pad and emergency department model was borderline, 95% CI: 0.997 - 2.37). In the pad and OCS model, for example, current smokers had 1.81 times the odds of an OCS event (95% CI: 1.61 - 2.04) compared with never smokers, and former smokers had 1.55 times the odds (95% CI: 1.39 - 1.73).

6.1.1.4 Season

For all three outcomes, index dates in the winter were more likely to be case events than those in the spring, although the odds ratios were consistently statistically significant across the four phases of well development only for the OCS outcome (e.g., in the pad and OCS model, OR = 1.52, 95% CI: 1.34 - 1.73). Index dates in the fall were less likely to be case events than those in the spring, although the odds ratios were consistently statistically significant across the four phases only for the hospitalization outcome (e.g., in the pad and hospitalization model, OR = 0.67, 95% CI: 0.56 - 0.80). Finally, index dates in the summer were also less likely to be case events than those in the spring, although the odds ratios were consistently statistically significant across the four phases only for the OCS outcome (e.g., in the pad and OCS model, OR = 0.65, 95% CI: 0.58 - 0.73).
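Season was assigned from each index date using the boundaries given in the table footnotes (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21). A minimal sketch of that rule, assuming index dates are available as Python `date` objects; the function name is illustrative, not the study's code:

```python
from datetime import date


def season(d: date) -> str:
    """Assign season by month/day using the footnote boundaries:
    spring Mar 22-Jun 21, summer Jun 22-Sep 21,
    fall Sep 22-Dec 21, winter Dec 22-Mar 21."""
    md = (d.month, d.day)  # tuples compare month first, then day
    if (3, 22) <= md <= (6, 21):
        return "spring"
    if (6, 22) <= md <= (9, 21):
        return "summer"
    if (9, 22) <= md <= (12, 21):
        return "fall"
    return "winter"  # Dec 22-Dec 31 and Jan 1-Mar 21


print(season(date(2010, 7, 4)))
```

Because winter wraps around the new year, it is handled as the fall-through case rather than as a single interval.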

6.1.1.5 Type 2 diabetes

Patients with type 2 diabetes had higher odds of being cases than patients without type 2 diabetes, although the odds ratios were statistically significant only for the hospitalization outcome (e.g., in the pad and hospitalization model, OR = 2.48, 95% CI: 2.04 - 3.01).

6.1.1.6 Community socioeconomic deprivation


Community socioeconomic deprivation was not statistically significantly associated with event status in models for any of the three outcomes. For the emergency department outcome, odds ratios for the third and fourth quartiles (compared with the first) were elevated but not statistically significant (e.g., in the pad model, quartile 4 OR = 1.23, 95% CI: 0.44 - 3.45).

6.1.1.7 Maximum temperature on prior day

Temperature was not associated with event status for the emergency department or hospitalization outcomes. For the OCS outcome, higher maximum temperatures on the prior day were statistically significantly associated with lower odds of event status (e.g., in the pad and OCS model, OR for centered temperature = 0.98, 95% CI: 0.98 - 0.99; OR for centered and squared temperature = 0.998, 95% CI: 0.998 - 0.999).
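Because temperature enters the models as a centered term and its square, the two odds ratios have to be read together: the odds at x degrees from the centering value, relative to x = 0, are OR_linear^x · OR_squared^(x²). The sketch below evaluates that expression and the turning point of the curve using the rounded pad and OCS estimates. The centering value is not restated in this chapter, and rounding of the reported ORs makes the results approximate; both helper names are hypothetical.

```python
import math


def relative_odds(or_linear, or_squared, x):
    """Odds at centered value x relative to x = 0 for a model with linear
    and squared terms: exp(b1*x + b2*x**2) = or_linear**x * or_squared**(x**2)."""
    b1, b2 = math.log(or_linear), math.log(or_squared)
    return math.exp(b1 * x + b2 * x ** 2)


def vertex(or_linear, or_squared):
    """Centered value where the quadratic log-odds curve turns
    (a maximum when the squared-term OR is below 1)."""
    b1, b2 = math.log(or_linear), math.log(or_squared)
    return -b1 / (2 * b2)


# Pad and OCS model (Table 6.1.1.1): linear OR 0.98, squared OR 0.998
for x in (-10, 0, 10):
    print(x, round(relative_odds(0.98, 0.998, x), 2))
print(round(vertex(0.98, 0.998), 1))
```

With both ORs below 1, the curve peaks slightly below the centering value and declines on either side, declining more steeply toward warmer days.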

6.1.1.8 Distance to nearest roadway

Distance to the nearest major road was not associated with event status for any of the three outcomes. Distance to the nearest minor road was not associated with event status in the hospitalization and OCS models. However, for the emergency department outcome, patients living closer to minor roads had greater odds of event status than patients living farther away, an association carried by the squared distance term (e.g., in the pad and emergency department model, OR for linear distance = 0.90, 95% CI: 0.57 - 1.40; OR for squared distance = 0.68, 95% CI: 0.53 - 0.89).


Table 6.1.1.1. Odds ratios from oral corticosteroid (mild exacerbation) models.

Values are odds ratiosᵃ (95% CI) by UNGD phase.

| Covariate | Pad | Spud | Stimulation | Production |
|---|---|---|---|---|
| UNGD activity metric (ref: very low) | | | | |
| Low | 1.54 (1.37 - 1.74) | 1.45 (1.29 - 1.63) | 1.23 (1.09 - 1.39) | 1.28 (1.13 - 1.46) |
| Medium | 1.66 (1.47 - 1.87) | 1.98 (1.75 - 2.24) | 2.22 (1.95 - 2.53) | 2.15 (1.87 - 2.47) |
| High | 1.59 (1.41 - 1.81) | 1.99 (1.75 - 2.26) | 3.00 (2.60 - 3.45) | 4.43 (3.75 - 5.22) |
| Race/ethnicity (ref: white) | | | | |
| Black | 0.72 (0.58 - 0.90) | 0.71 (0.57 - 0.90) | 0.69 (0.54 - 0.88) | 0.66 (0.51 - 0.85) |
| Hispanic | 0.86 (0.70 - 1.07) | 0.85 (0.68 - 1.06) | 0.84 (0.66 - 1.06) | 0.79 (0.62 - 1.02) |
| Other/missing | 0.85 (0.56 - 1.27) | 0.83 (0.55 - 1.26) | 0.82 (0.53 - 1.29) | 0.82 (0.51 - 1.32) |
| Family history (ref: no) | 1.45 (1.29 - 1.64) | 1.46 (1.29 - 1.66) | 1.48 (1.29 - 1.69) | 1.49 (1.29 - 1.72) |
| Smoking status (ref: never) | | | | |
| Current | 1.81 (1.61 - 2.04) | 1.85 (1.64 - 2.09) | 1.91 (1.68 - 2.18) | 2.00 (1.74 - 2.29) |
| Former | 1.55 (1.39 - 1.73) | 1.57 (1.40 - 1.76) | 1.60 (1.41 - 1.80) | 1.64 (1.44 - 1.86) |
| Missing | 0.65 (0.56 - 0.76) | 0.67 (0.57 - 0.78) | 0.71 (0.60 - 0.84) | 0.76 (0.64 - 0.91) |
| Seasonᵇ (ref: spring) | | | | |
| Summer | 0.65 (0.58 - 0.73) | 0.65 (0.58 - 0.73) | 0.59 (0.52 - 0.67) | 0.58 (0.51 - 0.66) |
| Fall | 0.99 (0.90 - 1.09) | 0.98 (0.88 - 1.08) | 0.91 (0.82 - 1.02) | 0.85 (0.76 - 0.95) |
| Winter | 1.52 (1.34 - 1.73) | 1.49 (1.31 - 1.70) | 1.52 (1.33 - 1.74) | 1.51 (1.31 - 1.74) |
| Medical Assistance (ref: no) | 1.19 (1.09 - 1.32) | 1.19 (1.07 - 1.31) | 1.16 (1.04 - 1.29) | 1.14 (1.02 - 1.28) |
| Overweight/obesityᶜ (ref: normal) | | | | |
| Overweight | 1.42 (1.28 - 1.58) | 1.44 (1.29 - 1.60) | 1.47 (1.31 - 1.65) | 1.51 (1.34 - 1.70) |
| Obese | 1.90 (1.72 - 2.09) | 1.93 (1.75 - 2.14) | 2.02 (1.82 - 2.25) | 2.10 (1.88 - 2.35) |
| Missing BMI | 0.60 (0.32 - 1.14) | 0.59 (0.31 - 1.15) | 0.57 (0.28 - 1.14) | 0.57 (0.27 - 1.21) |
| Type 2 diabetes (ref: no) | 1.04 (0.90 - 1.21) | 1.05 (0.90 - 1.22) | 1.05 (0.89 - 1.23) | 1.07 (0.90 - 1.27) |
| CSD quartile (ref: quartile 1) | | | | |
| Quartile 2 | 0.89 (0.76 - 1.04) | 0.89 (0.76 - 1.04) | 0.88 (0.74 - 1.05) | 0.86 (0.71 - 1.04) |
| Quartile 3 | 0.95 (0.82 - 1.11) | 0.95 (0.81 - 1.11) | 0.95 (0.79 - 1.13) | 0.92 (0.76 - 1.11) |
| Quartile 4 | 0.90 (0.77 - 1.05) | 0.90 (0.76 - 1.06) | 0.90 (0.75 - 1.08) | 0.89 (0.73 - 1.08) |
| Maximum temperature on prior day, degrees Celsius | | | | |
| Centered | 0.98 (0.98 - 0.99) | 0.98 (0.98 - 0.99) | 0.98 (0.98 - 0.99) | 0.98 (0.98 - 0.99) |
| Centered and squared | 0.998 (0.998 - 0.999) | 0.998 (0.998 - 0.999) | 0.998 (0.998 - 0.999) | 0.998 (0.998 - 0.999) |
| Distance to nearest major road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.99 (0.90 - 1.09) | 0.99 (0.90 - 1.09) | 1.001 (0.90 - 1.11) | 0.999 (0.89 - 1.12) |
| Truncated at the 98th percentile, z-transformed, squared | 1.01 (0.97 - 1.05) | 1.01 (0.96 - 1.05) | 0.999 (0.95 - 1.05) | 0.997 (0.95 - 1.05) |
| Distance to nearest minor road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.96 (0.87 - 1.05) | 0.96 (0.87 - 1.06) | 0.96 (0.86 - 1.06) | 0.97 (0.87 - 1.09) |
| Truncated at the 98th percentile, z-transformed, squared | 1.02 (0.97 - 1.07) | 1.02 (0.97 - 1.07) | 1.02 (0.96 - 1.07) | 1.01 (0.96 - 1.07) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Multilevel models with a random intercept for patient and community.
ᵇ Spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21.
ᶜ Normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively.


Table 6.1.1.2. Odds ratios from emergency encounter (moderate exacerbation) models.

Values are odds ratiosᵃ (95% CI) by UNGD phase.

| Covariate | Pad | Spud | Stimulation | Production |
|---|---|---|---|---|
| UNGD activity metric (ref: very low) | | | | |
| Low | 1.53 (1.06 - 2.23) | 1.53 (1.06 - 2.21) | 1.51 (1.05 - 2.19) | 1.47 (1.01 - 2.14) |
| Medium | 1.77 (1.20 - 2.60) | 1.54 (1.04 - 2.27) | 1.74 (1.17 - 2.61) | 1.10 (0.74 - 1.65) |
| High | 1.37 (0.94 - 1.99) | 1.57 (1.08 - 2.29) | 1.71 (1.16 - 2.52) | 2.19 (1.47 - 3.25) |
| Race/ethnicity (ref: white) | | | | |
| Black | 4.87 (2.45 - 9.66) | 4.81 (2.42 - 9.58) | 4.82 (2.41 - 9.65) | 5.03 (2.49 - 10.15) |
| Hispanic | 3.44 (1.70 - 6.94) | 3.45 (1.70 - 6.98) | 3.41 (1.68 - 6.95) | 3.40 (1.66 - 6.98) |
| Other/missing | 0.83 (0.18 - 3.81) | 0.83 (0.18 - 3.84) | 0.82 (0.17 - 3.86) | 0.80 (0.17 - 3.86) |
| Family history (ref: no) | 3.25 (2.14 - 4.95) | 3.28 (2.15 - 5.00) | 3.31 (2.16 - 5.06) | 3.33 (2.17 - 5.13) |
| Smoking status (ref: never) | | | | |
| Current | 1.91 (1.26 - 2.90) | 1.90 (1.25 - 2.89) | 1.91 (1.25 - 2.92) | 1.95 (1.27 - 3.00) |
| Former | 1.54 (0.997 - 2.37) | 1.55 (1.002 - 2.39) | 1.56 (1.01 - 2.42) | 1.58 (1.01 - 2.46) |
| Missing | 3.24 (2.10 - 5.00) | 3.24 (2.09 - 5.01) | 3.35 (2.15 - 5.22) | 3.50 (2.23 - 5.50) |
| Seasonᵇ (ref: spring) | | | | |
| Summer | 0.76 (0.50 - 1.17) | 0.77 (0.50 - 1.19) | 0.75 (0.48 - 1.15) | 0.75 (0.48 - 1.16) |
| Fall | 0.88 (0.60 - 1.28) | 0.87 (0.60 - 1.27) | 0.83 (0.57 - 1.22) | 0.82 (0.56 - 1.21) |
| Winter | 1.57 (0.97 - 2.55) | 1.49 (0.92 - 2.42) | 1.53 (0.94 - 2.50) | 1.48 (0.90 - 2.42) |
| Medical Assistance (ref: no) | 2.25 (1.60 - 3.17) | 2.26 (1.60 - 3.18) | 2.22 (1.57 - 3.13) | 2.23 (1.58 - 3.16) |
| Overweight/obesityᶜ (ref: normal) | | | | |
| Overweight | 1.11 (0.75 - 1.63) | 1.11 (0.75 - 1.64) | 1.12 (0.76 - 1.66) | 1.12 (0.75 - 1.66) |
| Obese | 1.95 (1.36 - 2.78) | 1.95 (1.36 - 2.78) | 1.96 (1.37 - 2.81) | 1.98 (1.38 - 2.85) |
| Missing BMI | 14.01 (2.90 - 67.65) | 14.27 (2.92 - 69.69) | 14.58 (2.94 - 72.21) | 14.62 (2.89 - 74.00) |
| Type 2 diabetes (ref: no) | 1.73 (0.95 - 3.14) | 1.72 (0.94 - 3.14) | 1.73 (0.94 - 3.17) | 1.75 (0.95 - 3.22) |
| CSD quartile (ref: quartile 1) | | | | |
| Quartile 2 | 0.89 (0.33 - 2.41) | 0.89 (0.33 - 2.40) | 0.89 (0.33 - 2.45) | 0.90 (0.33 - 2.47) |
| Quartile 3 | 1.20 (0.44 - 3.32) | 1.19 (0.43 - 3.29) | 1.21 (0.43 - 3.37) | 1.21 (0.43 - 3.38) |
| Quartile 4 | 1.23 (0.44 - 3.45) | 1.24 (0.44 - 3.48) | 1.25 (0.44 - 3.54) | 1.24 (0.44 - 3.53) |
| Maximum temperature on prior day, degrees Celsius | | | | |
| Centered | 0.998 (0.98 - 1.02) | 0.996 (0.98 - 1.02) | 0.998 (0.98 - 1.02) | 0.998 (0.98 - 1.02) |
| Centered and squared | 0.999 (0.997 - 1.00001) | 0.999 (0.997 - 1.00004) | 0.999 (0.997 - 0.99998) | 0.999 (0.997 - 1.0001) |
| Distance to nearest major road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.92 (0.56 - 1.52) | 0.92 (0.55 - 1.53) | 0.92 (0.55 - 1.53) | 0.91 (0.54 - 1.52) |
| Truncated at the 98th percentile, z-transformed, squared | 1.01 (0.81 - 1.26) | 1.01 (0.81 - 1.26) | 1.01 (0.81 - 1.26) | 1.01 (0.81 - 1.27) |
| Distance to nearest minor road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.90 (0.57 - 1.40) | 0.90 (0.58 - 1.41) | 0.90 (0.57 - 1.41) | 0.91 (0.58 - 1.43) |
| Truncated at the 98th percentile, z-transformed, squared | 0.68 (0.53 - 0.89) | 0.68 (0.52 - 0.89) | 0.68 (0.52 - 0.89) | 0.67 (0.51 - 0.88) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Multilevel models with a random intercept for patient and community.
ᵇ Spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21.
ᶜ Normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively.


Table 6.1.1.3. Odds ratios from hospitalization (severe exacerbation) models.

Values are odds ratiosᵃ (95% CI) by UNGD phase.

| Covariate | Pad | Spud | Stimulation | Production |
|---|---|---|---|---|
| UNGD activity metric (ref: very low) | | | | |
| Low | 1.26 (1.06 - 1.50) | 1.16 (0.98 - 1.37) | 1.13 (0.96 - 1.33) | 1.10 (0.92 - 1.30) |
| Medium | 1.37 (1.15 - 1.64) | 1.26 (1.05 - 1.50) | 1.31 (1.10 - 1.57) | 1.16 (0.97 - 1.38) |
| High | 1.45 (1.21 - 1.73) | 1.64 (1.38 - 1.97) | 1.66 (1.38 - 1.98) | 1.74 (1.45 - 2.09) |
| Race/ethnicity (ref: white) | | | | |
| Black | 1.15 (0.77 - 1.73) | 1.15 (0.77 - 1.73) | 1.14 (0.76 - 1.72) | 1.13 (0.75 - 1.70) |
| Hispanic | 1.41 (0.96 - 2.08) | 1.43 (0.97 - 2.11) | 1.41 (0.96 - 2.09) | 1.41 (0.95 - 2.08) |
| Other/missing | 0.87 (0.39 - 1.91) | 0.86 (0.39 - 1.91) | 0.86 (0.39 - 1.91) | 0.85 (0.38 - 1.90) |
| Family history (ref: no) | 1.48 (1.20 - 1.83) | 1.47 (1.19 - 1.82) | 1.48 (1.19 - 1.83) | 1.48 (1.19 - 1.83) |
| Smoking status (ref: never) | | | | |
| Current | 1.92 (1.61 - 2.28) | 1.94 (1.62 - 2.31) | 1.94 (1.63 - 2.32) | 1.96 (1.64 - 2.34) |
| Former | 1.51 (1.28 - 1.78) | 1.52 (1.29 - 1.79) | 1.51 (1.28 - 1.79) | 1.51 (1.28 - 1.78) |
| Missing | 1.36 (1.04 - 1.77) | 1.36 (1.04 - 1.78) | 1.37 (1.05 - 1.80) | 1.38 (1.05 - 1.81) |
| Seasonᵇ (ref: spring) | | | | |
| Summer | 0.87 (0.72 - 1.05) | 0.88 (0.73 - 1.07) | 0.85 (0.71 - 1.03) | 0.86 (0.71 - 1.04) |
| Fall | 0.67 (0.56 - 0.80) | 0.67 (0.56 - 0.80) | 0.66 (0.55 - 0.78) | 0.65 (0.54 - 0.77) |
| Winter | 1.26 (1.02 - 1.57) | 1.24 (0.996 - 1.54) | 1.25 (1.002 - 1.55) | 1.24 (0.998 - 1.54) |
| Medical Assistance (ref: no) | 3.32 (2.79 - 3.95) | 3.31 (2.78 - 3.93) | 3.34 (2.80 - 3.97) | 3.35 (2.81 - 3.99) |
| Overweight/obesityᶜ (ref: normal) | | | | |
| Overweight | 1.14 (0.95 - 1.37) | 1.14 (0.95 - 1.37) | 1.14 (0.95 - 1.37) | 1.15 (0.95 - 1.38) |
| Obese | 1.52 (1.29 - 1.79) | 1.53 (1.29 - 1.80) | 1.53 (1.29 - 1.81) | 1.53 (1.30 - 1.81) |
| Missing BMI | 0.65 (0.26 - 1.64) | 0.65 (0.26 - 1.64) | 0.65 (0.26 - 1.64) | 0.64 (0.25 - 1.63) |
| Type 2 diabetes (ref: no) | 2.48 (2.04 - 3.01) | 2.48 (2.04 - 3.02) | 2.49 (2.04 - 3.03) | 2.50 (2.05 - 3.05) |
| CSD quartile (ref: quartile 1) | | | | |
| Quartile 2 | 0.85 (0.59 - 1.21) | 0.84 (0.59 - 1.21) | 0.85 (0.59 - 1.21) | 0.84 (0.58 - 1.20) |
| Quartile 3 | 0.95 (0.66 - 1.36) | 0.95 (0.66 - 1.36) | 0.94 (0.65 - 1.36) | 0.94 (0.65 - 1.35) |
| Quartile 4 | 0.75 (0.52 - 1.09) | 0.75 (0.52 - 1.10) | 0.75 (0.51 - 1.09) | 0.74 (0.51 - 1.08) |
| Maximum temperature on prior day, degrees Celsius | | | | |
| Centered | 1.01 (0.997 - 1.01) | 1.01 (0.996 - 1.01) | 1.01 (0.997 - 1.02) | 1.01 (0.997 - 1.02) |
| Centered and squared | 1.0002 (0.9996 - 1.001) | 1.0002 (0.9996 - 1.001) | 1.0002 (0.996 - 1.001) | 1.0002 (0.9996 - 1.001) |
| Distance to nearest major road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.88 (0.72 - 1.07) | 0.89 (0.73 - 1.08) | 0.88 (0.73 - 1.08) | 0.89 (0.73 - 1.08) |
| Truncated at the 98th percentile, z-transformed, squared | 1.04 (0.96 - 1.14) | 1.04 (0.95 - 1.13) | 1.04 (0.96 - 1.14) | 1.04 (0.95 - 1.14) |
| Distance to nearest minor road, meters | | | | |
| Truncated at the 98th percentile, z-transformed | 0.96 (0.81 - 1.15) | 0.96 (0.80 - 1.15) | 0.96 (0.80 - 1.15) | 0.97 (0.81 - 1.16) |
| Truncated at the 98th percentile, z-transformed, squared | 0.99 (0.90 - 1.08) | 0.99 (0.90 - 1.08) | 0.99 (0.90 - 1.08) | 0.98 (0.90 - 1.08) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Multilevel models with a random intercept for patient and community.
ᵇ Spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21.
ᶜ Normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively.


6.1.2 Additional sensitivity analyses

Presented here are additional sensitivity analyses for the study on UNGD and asthma exacerbations.

6.1.2.1 Stimulation extrapolation methods

More than a third (34.6%) of wells were missing stimulation dates, and we were concerned that these missing data could cause bias. However, because stimulation dates are bounded by the spud and production start dates, we hypothesized that the potential for exposure misclassification caused by extrapolated dates was minimal. As long as wells with and without extrapolated stimulation dates were randomly distributed in space, as they appear to be in Figure 6.1.2.1, the extrapolation of stimulation dates should not account for our results.

Figure 6.1.2.1. Locations of Wells with Extrapolated Stimulation Dates.

To evaluate the sensitivity of our results to stimulation date extrapolation, we conducted an analysis that replaced all extrapolated stimulation dates with a date 30 days after the spud date. We used these new dates to calculate a new stimulation metric and then re-ran the final model for the hospitalization outcome. The results of this sensitivity analysis (Table 6.1.2.1) show comparable odds ratios and overlapping confidence intervals for the metric created with the extrapolated stimulation dates (as presented in Chapter 3) and the metric created with the sensitivity stimulation dates (extrapolated dates replaced with a date 30 days after the spud date).
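The date substitution used in this sensitivity analysis is simple enough to state as code. The sketch below keeps reported stimulation dates, replaces extrapolated ones with spud date + 30 days, and includes a check that a date falls between spud and production start. The `Well` record and its field names are hypothetical stand-ins for the well data described in Chapter 3, and the original extrapolation method itself is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Well:
    """Hypothetical well record; the field names are illustrative only."""
    well_id: str
    spud_date: date
    stim_date: date
    stim_extrapolated: bool  # True if stim_date was filled in, not reported
    production_start: Optional[date] = None


def sensitivity_stim_date(well: Well, offset_days: int = 30) -> date:
    """Keep reported stimulation dates; replace extrapolated ones with the
    spud date plus offset_days (30 days in this sensitivity analysis)."""
    if well.stim_extrapolated:
        return well.spud_date + timedelta(days=offset_days)
    return well.stim_date


def within_bounds(well: Well, stim: date) -> bool:
    """Check that a stimulation date falls between spud and production start
    (no upper bound is applied when production start is unknown)."""
    if stim < well.spud_date:
        return False
    return well.production_start is None or stim <= well.production_start


w = Well("A", date(2011, 5, 1), date(2011, 8, 15), True, date(2011, 10, 1))
print(sensitivity_stim_date(w), within_bounds(w, sensitivity_stim_date(w)))
```

The bounds check only reports whether a date is consistent with the spud-to-production window; it does not alter any date.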

Table 6.1.2.1. Associations of UNGD stimulation metrics created with extrapolated and sensitivity stimulation dates with asthma hospitalizations

| UNGD stimulation metric | Extrapolatedᵃ stimulation datesᶜ, odds ratio (95% CIᵈ) | Sensitivityᵇ stimulation datesᶜ, odds ratio (95% CI) |
|---|---|---|
| Lowᵉ | 1.14 (0.97 - 1.34) | 1.22 (1.04 - 1.44) |
| Medium | 1.32 (1.10 - 1.57) | 1.50 (1.26 - 1.78) |
| High | 1.66 (1.39 - 1.98) | 1.47 (1.23 - 1.76) |

ᵃ As presented in Chapter 3.
ᵇ Extrapolated stimulation dates replaced with a date 30 days after the spud date.
ᶜ Multilevel models with a random intercept for patient and community, adjusted for race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).
ᵈ Confidence interval.
ᵉ Very low is the reference group.

6.1.2.2 Control encounter dates

As described in Section 3.3.3, for each control we selected a random encounter date with the Geisinger Health System on which to assign time-varying covariates. Using a date with actual patient contact has several advantages. First, it ensured that the patient was under observation, so that had the patient had an exacerbation, it would have been evaluated at a Geisinger facility. Second, using a specific contact date gave us confidence that the patient was in the area and so could have been exposed on the day before. Third, using a specific contact date increased the likelihood that, if there had been a major change in one of the patient's time-varying covariates (e.g., a change in BMI or smoking status), the EHR would have recorded this change.

However, we also considered assigning controls a random date rather than a contact date. To evaluate the sensitivity of our results to this choice, we randomly selected a date from all dates within the year and assigned the spud activity metric to controls on that date instead of on the randomly selected contact date. The results of this sensitivity analysis (Table 6.1.2.2) show comparable odds ratios and overlapping confidence intervals whether a random encounter date or a random date from the entire year was used for controls.
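The two ways of choosing a control's assignment date can be contrasted in a short sketch: drawing uniformly from the control's recorded encounter dates (the primary approach) versus drawing uniformly from every calendar day of the year (the sensitivity approach). This assumes "within the year" means the calendar year and that encounter dates are already extracted from the EHR; the function names and example dates are hypothetical.

```python
import random
from datetime import date, timedelta


def random_encounter_date(encounter_dates, rng):
    """Primary approach (sketch): choose one of the control's actual
    encounter dates with equal probability."""
    return rng.choice(sorted(encounter_dates))  # sort so a seed is reproducible


def random_date_in_year(year, rng):
    """Sensitivity approach (sketch): choose any calendar date in the year
    with equal probability, regardless of patient contact."""
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days  # 365 or 366
    return start + timedelta(days=rng.randrange(n_days))


rng = random.Random(2017)  # fixed seed for reproducibility
encounters = {date(2010, 2, 3), date(2010, 6, 17), date(2010, 11, 9)}
print(random_encounter_date(encounters, rng))
print(random_date_in_year(2010, rng))
```

The second function deliberately ignores patient contact, which is exactly the property the encounter-date approach was chosen to avoid.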

Table 6.1.2.2. Associations of UNGD spud metrics assigned on random encounter dates vs. random dates with asthma hospitalizations

| Spud activity metric | Control spud activity metric assigned on randomly selected index dateᵃ,ᵇ, odds ratio (95% CIᵈ) | Control spud activity metric assigned on randomly selected dateᵇ,ᶜ, odds ratio (95% CI) |
|---|---|---|
| Lowᵉ | 1.17 (0.98 - 1.39) | 1.30 (1.11 - 1.54) |
| Medium | 1.26 (1.06 - 1.50) | 1.28 (1.08 - 1.52) |
| High | 1.65 (1.38 - 1.97) | 1.45 (1.22 - 1.72) |

ᵃ As presented in Chapter 3.
ᵇ Multilevel models with a random intercept for patient and community, adjusted for race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).
ᶜ Random date in year selected.
ᵈ Confidence interval.
ᵉ Very low is the reference group.

6.1.2.3 Inverse distance squared vs. cubed metric


Our UNGD activity metrics used inverse-distance-squared weighting, which assumes that the contribution of each well decays with the square of distance. To evaluate the sensitivity of our results to this functional form, we conducted an analysis that assigned the spud activity metric using distance cubed in the denominator instead, and then re-ran the final model for the hospitalization outcome. The results of this sensitivity analysis (Table 6.1.2.3) show comparable odds ratios and overlapping confidence intervals for the spud activity metric created using distance squared and the metric created using distance cubed.
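The difference between the two weightings can be seen in a minimal inverse-distance-weighted sum, where each well contributes weight / distance^p and only the exponent p changes between the main (p = 2) and sensitivity (p = 3) metrics. The `weight` value is a hypothetical placeholder for whatever phase-specific quantity Chapter 3 assigns to each well, and the distances are made up for illustration.

```python
def idw_activity(wells, exponent=2):
    """Inverse-distance-weighted activity: sum of weight / distance**exponent.

    wells: (distance_m, weight) pairs for wells in the relevant phase on the
    assignment date. 'weight' is a hypothetical stand-in for whatever
    phase-specific quantity Chapter 3 assigns to each well.
    """
    total = 0.0
    for distance_m, weight in wells:
        if distance_m <= 0:
            raise ValueError("distance must be positive")
        total += weight / distance_m ** exponent
    return total


wells = [(1000.0, 1.0), (2000.0, 1.0)]  # illustrative distances in meters
squared = idw_activity(wells, exponent=2)
cubed = idw_activity(wells, exponent=3)
print(squared, cubed)
```

Doubling a well's distance divides its contribution by 4 under the squared form but by 8 under the cubed form, so the cubed metric concentrates more of the weight on the nearest wells.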

Table 6.1.2.3. Associations of spud activity metrics assigned using distance squared vs. distance cubed with asthma hospitalizations

| Spud activity metric | Assigned using distance squaredᵃ,ᵇ, odds ratio (95% CIᶜ) | Assigned using distance cubedᵇ, odds ratio (95% CI) |
|---|---|---|
| Lowᵈ | 1.17 (0.98 - 1.39) | 1.20 (1.02 - 1.41) |
| Medium | 1.26 (1.06 - 1.50) | 1.44 (1.21 - 1.71) |
| High | 1.65 (1.38 - 1.97) | 1.44 (1.21 - 1.71) |

ᵃ As presented in Chapter 3.
ᵇ Multilevel models with a random intercept for patient and community, adjusted for race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m²; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m²; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m², for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).
ᶜ Confidence interval.
ᵈ Very low is the reference group.

6.1.2.4 Distance to hospital

We were concerned that distance to hospital might be a confounder in the hospitalization and emergency department analyses. Patients who lived closer to the hospital might be more likely to go to the hospital for an exacerbation, while patients who lived farther away might seek care over the phone or at an outpatient center. For the hospitalization and emergency department outcomes, the median distance to hospital was much shorter among cases than among controls; this was not the case for the OCS outcome (Table 6.1.2.4.1).

To evaluate distance to hospital as a confounder, we added it as a covariate to the spud activity and hospitalization model. We z-transformed distance to hospital and added both the standardized variable and its square to the model. The results of this sensitivity analysis (Table 6.1.2.4.2) show comparable odds ratios and overlapping confidence intervals whether or not distance to hospital was in the model, although the odds ratio for the high category was somewhat attenuated (1.65 vs. 1.49).
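This minimal sketch shows how the standardized distance and its square could be constructed before being added to the model. The distances are hypothetical. Whether the original analysis used the sample or the population standard deviation is not stated; the sample SD is assumed here.

```python
import statistics


def z_and_squared(values):
    """Return the z-transformed values and their squares (two model columns)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption about the original
    z = [(v - mean) / sd for v in values]
    return z, [zi ** 2 for zi in z]


# Hypothetical distances to the closer Geisinger hospital (km).
distance_km = [10.0, 20.0, 30.0]
z, z_sq = z_and_squared(distance_km)
print(z, z_sq)  # [-1.0, 0.0, 1.0] [1.0, 0.0, 1.0]
```

The squared term lets the log-odds of hospitalization vary non-linearly with distance, so the model is not forced to assume a straight-line relationship.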

Table 6.1.2.4.1. Median distance (km) to the closer Geisinger hospital, by event type and case/control status

| Status | Hospitalization | Emergency department | OCS |
|---|---|---|---|
| Control | 37.1 | 37.7 | 36.8 |
| Case | 21.0 | 12.4 | 38.9 |

Table 6.1.2.4.2. Associations of the UNGD spud metric with the hospitalization outcome, without and with distance to hospital in the model

| Spud activity metricᵉ | Final modelᵃ,ᵇ: odds ratio (95% CIᵈ) | Adding distance to hospital to the modelᶜ: odds ratio (95% CIᵈ) |
|---|---|---|
| Low | 1.17 (0.98 - 1.39) | 1.17 (0.99 - 1.38) |
| Medium | 1.26 (1.06 - 1.50) | 1.23 (1.03 - 1.46) |
| High | 1.65 (1.38 - 1.97) | 1.49 (1.26 - 1.77) |

ᵃ As presented in Chapter 3.
ᵇ Multilevel models with a random intercept for patient and community, adjusted for race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m2; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m2; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m2, for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).
ᶜ Distance to hospital added as a z-score and a squared z-score.
ᵈ Confidence interval.
ᵉ Very low is the reference group.


6.2 Additional Results for Chapter 4

6.2.1 Associations of covariates with depression symptoms

Presented below are the associations of the covariates in the models of UNGD with the level of depression symptoms (multinomial logistic regression) and with the burden of depression symptoms (negative binomial regression) (Tables 6.2.1.1 and 6.2.1.2). Only results from the truncated survey-weighted models (Section 4.2.6), which were the primary analyses, are discussed below. The associations of the covariates in the fully-weighted and unweighted models, which were sensitivity analyses, are presented in Tables 6.2.1.3 - 6.2.1.6. Associations generally had similar interpretations in the fully-weighted and unweighted models as in the truncated-weighted models, though they tended to be stronger in the fully-weighted models and weaker in the unweighted models.

In all models, continuous variables were first centered, and both linear and quadratic terms were then included to evaluate non-linearity.
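The models include both a centered linear term and its square. As a result, the exponentiated linear coefficient describes the change per unit only near the centering value; elsewhere, the quadratic term also contributes. The hypothetical helper below computes the implied ratio (on the odds or rate scale) for moving a covariate from x0 to x1, given log-scale coefficients. The coefficient values in the example are illustrative and are not estimates from the tables.

```python
import math


def implied_ratio(x0, x1, center, b_linear, b_quadratic):
    """Exponentiated effect of moving a covariate from x0 to x1.

    Assumes the model contains (x - center) and (x - center)**2 with
    log-scale coefficients b_linear and b_quadratic.
    """
    c0, c1 = x0 - center, x1 - center
    return math.exp(b_linear * (c1 - c0) + b_quadratic * (c1 ** 2 - c0 ** 2))


# With no quadratic term, a per-year ratio of 0.99 compounds over 10 years.
print(round(implied_ratio(50, 60, 50, math.log(0.99), 0.0), 3))  # 0.904

# With a quadratic term, the same one-unit change depends on the starting point.
print(implied_ratio(40, 41, 50, 0.0, 0.01) < 1 < implied_ratio(60, 61, 50, 0.0, 0.01))  # True
```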

6.2.1.1 Race / ethnicity

Race/ethnicity was not associated with the level of depression symptoms or with the burden of depression symptoms.

6.2.1.2 Sex

Female sex was associated with a higher burden of depression symptoms and with mild (but not moderate or severe) depression symptoms, compared to no/minimal symptoms. For example, females had 1.46 times the odds (95% CI: 1.15 - 1.84) of having mild depression symptoms compared to males.

6.2.1.3 Age

Older age was associated with both lower odds of mild, moderate, and moderately severe/severe depression symptoms (compared to no/minimal symptoms) and a lower burden of depression symptoms (exponentiated coefficient per additional year of age, modeled as a centered continuous variable, = 0.99, 95% CI: 0.99 - 0.99).


6.2.1.4 Smoking status

Current and former smoking were associated with a higher burden of depression symptoms and with severe depression symptoms (compared to no/minimal symptoms), but not with mild or moderate depression symptoms. Current smokers had 2.88 times the odds of having severe depression symptoms (95% CI: 1.45 - 5.73) compared to never smokers, and former smokers had 2.15 times the odds (95% CI: 1.26 - 3.65) compared to never smokers.

6.2.1.5 Alcohol status

Alcohol status was not statistically significantly associated with either the burden or the level of depression symptoms. Odds ratios were elevated, but not statistically significant, for the association of heavy alcohol use with mild and moderate depression symptoms (e.g., for moderate depression symptoms, OR = 1.84, 95% CI: 0.95 - 3.58).

6.2.1.6 Medical Assistance

Patients with Medical Assistance, compared to those without, had both a higher burden of depression symptoms and higher odds of mild, moderate, and moderately severe/severe depression symptoms. For example, having Medical Assistance was associated with 1.62 times the odds of having mild depression symptoms (95% CI: 1.07 - 2.44) compared to not having Medical Assistance.

6.2.1.7 Body mass index

Higher body mass index (BMI), modeled as a continuous variable, was associated with both a higher burden of depression symptoms and higher odds of mild and moderate (but not moderately severe/severe) depression symptoms. For example, each additional unit of BMI (1 kg/m2) was associated with 1.04 times the odds (95% CI: 1.02 - 1.06) of mild depression symptoms.

6.2.1.8 Community socioeconomic deprivation


Community socioeconomic deprivation was not statistically significantly associated with either the burden of depression symptoms or the level of depression symptoms.

6.2.1.9 Well water

Residential well water, compared to municipal water, was not associated with the burden of depression symptoms or with mild or moderate depression symptoms. Patients with well water were less likely than patients with municipal water to have moderately severe/severe depression symptoms (odds ratio = 0.53, 95% CI: 0.32 - 0.87).


Table 6.2.1.1. Exponentiated coefficients from the truncated-weighted negative binomial model

| Covariate | Exponentiated coefficientᵃ (95% CI) |
|---|---|
| UNGD activity metricᵇ (ref: very low) | |
| Low | 1.14 (1.01 - 1.29) |
| Medium | 1.03 (0.91 - 1.17) |
| High | 1.18 (1.04 - 1.34) |
| Race/ethnicity (ref: white) | |
| Black | 0.83 (0.65 - 1.06) |
| Hispanic | 0.85 (0.71 - 1.02) |
| Age | |
| Centered | 0.99 (0.99 - 0.99) |
| Centered and squared | 0.9999 (0.9997 - 1.000048) |
| Sex (ref: male) | 1.14 (1.04 - 1.26) |
| Smoking status (ref: never) | |
| Current | 1.30 (1.09 - 1.54) |
| Former | 1.18 (1.06 - 1.31) |
| Alcohol status (ref: no) | |
| Yes, not heavy | 1.01 (0.92 - 1.11) |
| Yes, heavyᶜ | 1.14 (0.95 - 1.37) |
| Medical Assistance (ref: no) | 1.54 (1.32 - 1.81) |
| Body mass index | |
| Centered | 1.02 (1.01 - 1.02) |
| Centered and squared | 0.9999 (0.9992 - 1.001) |
| CSD | |
| Continuous | 1.01 (0.996 - 1.02) |
| Centered and squared | 1.0006 (0.998 - 1.003) |
| Well water (ref: municipal water) | 0.94 (0.86 - 1.04) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Truncated-weighted negative binomial model.
ᵇ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᶜ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.
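The heavy-drinking footnote can be read as a sex-specific threshold rule. The sketch below assumes that a weekly drink count and sex are available for each patient. The function name, the input fields, and the treatment of zero drinks as "no" are assumptions for illustration; they do not describe how alcohol status was actually derived from the EHR.

```python
# Weekly thresholds for heavy drinking cited in the table footnotes (CDC).
HEAVY_DRINKS_PER_WEEK = {"female": 8, "male": 15}


def alcohol_status(drinks_per_week, sex):
    """Classify alcohol use into the three categories used in the models.

    Hypothetical inputs: drinks_per_week (non-negative number) and sex
    ("female" or "male").
    """
    if drinks_per_week < 0:
        raise ValueError("drinks_per_week must be non-negative")
    if drinks_per_week == 0:
        return "no"
    if drinks_per_week >= HEAVY_DRINKS_PER_WEEK[sex]:
        return "yes, heavy"
    return "yes, not heavy"


print(alcohol_status(8, "female"), "|", alcohol_status(8, "male"))
# yes, heavy | yes, not heavy
```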


Table 6.2.1.2. Odds ratios from the truncated-weighted multinomial logistic model

| Covariate (outcome: level of depression symptomsᵇ) | Mild: odds ratioᵃ (95% CI) | Moderate: odds ratioᵃ (95% CI) | Severe: odds ratioᵃ (95% CI) |
|---|---|---|---|
| UNGD activity metricᶜ (ref: very low) | | | |
| Low | 1.63 (1.21 - 2.19) | 1.22 (0.80 - 1.86) | 1.13 (0.61 - 2.06) |
| Medium | 1.25 (0.92 - 1.71) | 1.04 (0.68 - 1.60) | 0.89 (0.47 - 1.69) |
| High | 1.51 (1.12 - 2.04) | 1.26 (0.83 - 1.92) | 1.39 (0.76 - 2.54) |
| Race/ethnicity (ref: white) | | | |
| Black | 0.78 (0.47 - 1.28) | 0.56 (0.27 - 1.15) | 0.79 (0.3 - 2.09) |
| Hispanic | 0.68 (0.43 - 1.08) | 0.66 (0.36 - 1.19) | 1.13 (0.58 - 2.2) |
| Age | | | |
| Centered | 0.99 (0.98 - 0.998) | 0.98 (0.97 - 0.99) | 0.97 (0.95 - 0.99) |
| Centered and squared | 1.0002 (0.9998 - 1.0005) | 0.9996 (0.9991 - 1.0002) | 0.9995 (0.999 - 1.0005) |
| Sex (ref: male) | 1.46 (1.15 - 1.84) | 0.99 (0.71 - 1.37) | 1.41 (0.85 - 2.34) |
| Smoking status (ref: never) | | | |
| Current | 1.48 (0.97 - 2.24) | 1.12 (0.59 - 2.13) | 2.88 (1.45 - 5.73) |
| Former | 1.25 (0.97 - 1.60) | 1.24 (0.87 - 1.79) | 2.15 (1.26 - 3.65) |
| Alcohol status (ref: no) | | | |
| Yes, not heavy | 1.15 (0.91 - 1.44) | 1.06 (0.75 - 1.48) | 0.69 (0.42 - 1.12) |
| Yes, heavyᵈ | 1.43 (0.90 - 2.29) | 1.84 (0.95 - 3.58) | 1.03 (0.47 - 2.26) |
| Medical Assistance (ref: no) | 1.62 (1.07 - 2.44) | 2.01 (1.19 - 3.41) | 4.14 (2.24 - 7.64) |
| Body mass index | | | |
| Centered | 1.04 (1.02 - 1.06) | 1.05 (1.02 - 1.08) | 1.02 (0.99 - 1.06) |
| Centered and squared | 0.996 (0.99 - 0.998) | 1.001 (0.998 - 1.003) | 0.9998 (0.997 - 1.003) |
| CSD | | | |
| Continuous | 1.03 (0.996 - 1.07) | 1.01 (0.97 - 1.06) | 1.03 (0.967 - 1.09) |
| Centered and squared | 1.0004 (0.99 - 1.01) | 1.0013 (0.99 - 1.01) | 0.9998 (0.99 - 1.01) |
| Well water (ref: municipal water) | 1.21 (0.95 - 1.53) | 0.90 (0.64 - 1.27) | 0.53 (0.32 - 0.87) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Truncated-weighted multinomial logistic model.
ᵇ No depression symptoms was the base outcome.
ᶜ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᵈ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.


Table 6.2.1.3. Exponentiated coefficients from the fully-weighted negative binomial model

| Covariate | Exponentiated coefficientᵃ (95% CI) |
|---|---|
| UNGD activity metricᵇ (ref: very low) | |
| Low | 1.12 (0.94 - 1.34) |
| Medium | 1.07 (0.88 - 1.29) |
| High | 1.29 (1.08 - 1.56) |
| Race/ethnicity (ref: white) | |
| Black | 0.89 (0.69 - 1.15) |
| Hispanic | 0.93 (0.76 - 1.13) |
| Age | |
| Centered | 0.99 (0.99 - 0.99) |
| Centered and squared | 0.9999 (0.9997 - 1.0001) |
| Sex (ref: male) | 1.22 (1.06 - 1.4) |
| Smoking status (ref: never) | |
| Current | 1.21 (0.97 - 1.53) |
| Former | 1.19 (1.02 - 1.39) |
| Alcohol status (ref: no) | |
| Yes, not heavy | 1.04 (0.91 - 1.20) |
| Yes, heavyᶜ | 1.19 (0.93 - 1.51) |
| Medical Assistance (ref: no) | 1.44 (1.15 - 1.81) |
| Body mass index | |
| Centered | 1.01 (1.001 - 1.02) |
| Centered and squared | 0.99993 (0.999 - 1.0009) |
| CSD | |
| Continuous | 1.02 (1.001 - 1.04) |
| Centered and squared | 0.9995 (0.996 - 1.003) |
| Well water (ref: municipal water) | 0.89 (0.78 - 1.03) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Fully-weighted negative binomial model.
ᵇ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᶜ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.


Table 6.2.1.4. Odds ratios from the fully-weighted multinomial logistic model

| Covariate (outcome: level of depression symptomsᵇ) | Mild: odds ratioᵃ (95% CI) | Moderate: odds ratioᵃ (95% CI) | Severe: odds ratioᵃ (95% CI) |
|---|---|---|---|
| UNGD activity metricᶜ (ref: very low) | | | |
| Low | 1.72 (1.14 - 2.59) | 1.20 (0.67 - 2.14) | 0.93 (0.37 - 2.34) |
| Medium | 1.29 (0.84 - 1.98) | 1.23 (0.66 - 2.28) | 0.68 (0.26 - 1.79) |
| High | 1.95 (1.28 - 2.97) | 1.77 (0.98 - 3.20) | 1.47 (0.63 - 3.46) |
| Race/ethnicity (ref: white) | | | |
| Black | 0.75 (0.43 - 1.30) | 0.67 (0.32 - 1.42) | 0.85 (0.29 - 2.47) |
| Hispanic | 0.71 (0.43 - 1.16) | 0.77 (0.42 - 1.44) | 1.16 (0.53 - 2.53) |
| Age | | | |
| Centered | 0.99 (0.98 - 0.999) | 0.97 (0.96 - 0.99) | 0.96 (0.94 - 0.988) |
| Centered and squared | 1.0001 (0.9996 - 1.0006) | 0.9996 (0.999 - 1.0004) | 0.9997 (0.998 - 1.0012) |
| Sex (ref: male) | 1.52 (1.09 - 2.11) | 1.2 (0.77 - 1.86) | 1.65 (0.78 - 3.49) |
| Smoking status (ref: never) | | | |
| Current | 1.61 (0.89 - 2.92) | 0.68 (0.25 - 1.83) | 2.01 (0.62 - 6.58) |
| Former | 1.10 (0.78 - 1.56) | 1.26 (0.77 - 2.08) | 1.92 (0.89 - 4.14) |
| Alcohol status (ref: no) | | | |
| Yes, not heavy | 1.19 (0.85 - 1.65) | 0.96 (0.58 - 1.58) | 0.79 (0.39 - 1.62) |
| Yes, heavyᵈ | 1.58 (0.80 - 3.12) | 2.09 (0.77 - 5.62) | 1.24 (0.37 - 4.23) |
| Medical Assistance (ref: no) | 1.80 (0.998 - 3.25) | 1.24 (0.58 - 2.66) | 4.00 (1.53 - 10.44) |
| Body mass index | | | |
| Centered | 1.04 (1.01 - 1.06) | 1.05 (1.004 - 1.10) | 1.002 (0.95 - 1.06) |
| Centered and squared | 0.996 (0.99 - 0.999) | 1.00004 (0.997 - 1.003) | 0.998 (0.993 - 1.003) |
| CSD | | | |
| Continuous | 1.06 (1.011 - 1.11) | 1.04 (0.971 - 1.11) | 1.07 (0.981 - 1.16) |
| Centered and squared | 0.9983 (0.99 - 1.01) | 0.9992 (0.99 - 1.01) | 0.99 (0.97 - 1.004) |
| Well water (ref: municipal water) | 1.23 (0.89 - 1.71) | 0.65 (0.40 - 1.05) | 0.44 (0.20 - 0.99) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Fully-weighted multinomial logistic model.
ᵇ No depression symptoms was the base outcome.
ᶜ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᵈ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.


Table 6.2.1.5. Exponentiated coefficients from the unweighted negative binomial model

| Covariate | Exponentiated coefficientᵃ (95% CI) |
|---|---|
| UNGD activity metricᵇ (ref: very low) | |
| Low | 1.05 (0.96 - 1.15) |
| Medium | 0.96 (0.88 - 1.05) |
| High | 1.03 (0.94 - 1.13) |
| Race/ethnicity (ref: white) | |
| Black | 0.81 (0.67 - 0.97) |
| Hispanic | 0.89 (0.76 - 1.05) |
| Age | |
| Centered | 0.99 (0.99 - 0.99) |
| Centered and squared | 0.9998 (0.9997 - 0.9999) |
| Sex (ref: male) | 1.12 (1.05 - 1.20) |
| Smoking status (ref: never) | |
| Current | 1.25 (1.10 - 1.41) |
| Former | 1.14 (1.06 - 1.22) |
| Alcohol status (ref: no) | |
| Yes, not heavy | 0.93 (0.87 - 0.999) |
| Yes, heavyᶜ | 1.09 (0.95 - 1.25) |
| Medical Assistance (ref: no) | 1.65 (1.47 - 1.86) |
| Body mass index | |
| Centered | 1.02 (1.01 - 1.02) |
| Centered and squared | 1.00005 (0.9996 - 1.0005) |
| CSD | |
| Continuous | 1.01 (0.998 - 1.02) |
| Centered and squared | 1.0003 (0.999 - 1.002) |
| Well water (ref: municipal water) | 0.93 (0.86 - 0.99) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Unweighted negative binomial model.
ᵇ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᶜ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.


Table 6.2.1.6. Odds ratios from the unweighted multinomial logistic model

| Covariate (outcome: level of depression symptomsᵇ) | Mild: odds ratioᵃ (95% CI) | Moderate: odds ratioᵃ (95% CI) | Severe: odds ratioᵃ (95% CI) |
|---|---|---|---|
| UNGD activity metricᶜ (ref: very low) | | | |
| Low | 1.23 (1.003 - 1.50) | 1.04 (0.78 - 1.39) | 1.19 (0.82 - 1.74) |
| Medium | 0.996 (0.81 - 1.22) | 0.84 (0.63 - 1.13) | 0.91 (0.61 - 1.34) |
| High | 1.12 (0.92 - 1.37) | 1.06 (0.80 - 1.40) | 1.11 (0.76 - 1.61) |
| Race/ethnicity (ref: white) | | | |
| Black | 0.75 (0.50 - 1.15) | 0.64 (0.35 - 1.17) | 0.49 (0.23 - 1.08) |
| Hispanic | 0.67 (0.46 - 0.99) | 0.76 (0.46 - 1.26) | 1.14 (0.67 - 1.94) |
| Age | | | |
| Centered | 0.99 (0.98 - 0.994) | 0.98 (0.98 - 0.991) | 0.97 (0.96 - 0.985) |
| Centered and squared | 0.99996 (0.9997 - 1.0002) | 0.9997 (0.9993 - 1.0001) | 0.999 (0.998 - 0.9996) |
| Sex (ref: male) | 1.27 (1.09 - 1.49) | 1.03 (0.83 - 1.28) | 1.41 (1.04 - 1.92) |
| Smoking status (ref: never) | | | |
| Current | 1.51 (1.15 - 1.99) | 1.14 (0.78 - 1.68) | 2.68 (1.73 - 4.14) |
| Former | 1.14 (0.96 - 1.34) | 1.17 (0.92 - 1.49) | 1.71 (1.25 - 2.35) |
| Alcohol status (ref: no) | | | |
| Yes, not heavy | 1.01 (0.86 - 1.18) | 0.87 (0.69 - 1.08) | 0.63 (0.47 - 0.85) |
| Yes, heavyᵈ | 1.16 (0.85 - 1.59) | 1.93 (1.30 - 2.86) | 0.78 (0.47 - 1.27) |
| Medical Assistance (ref: no) | 1.71 (1.28 - 2.27) | 2.98 (2.14 - 4.15) | 5.51 (3.85 - 7.89) |
| Body mass index | | | |
| Centered | 1.03 (1.02 - 1.05) | 1.03 (1.015 - 1.05) | 1.041 (1.02 - 1.07) |
| Centered and squared | 0.998 (0.997 - 0.9995) | 1.00054 (0.999 - 1.002) | 1.0001 (0.998 - 1.002) |
| CSD | | | |
| Continuous | 1.04 (1.021 - 1.07) | 1.01 (0.978 - 1.04) | 1.01 (0.971 - 1.05) |
| Centered and squared | 0.99 (0.99 - 1.002) | 1.0023 (1 - 1.01) | 1.002 (0.99 - 1.009) |
| Well water (ref: municipal water) | 1.14 (0.97 - 1.33) | 0.85 (0.68 - 1.07) | 0.65 (0.47 - 0.90) |

Abbreviations: UNGD, unconventional natural gas development; CI, confidence interval; CSD, community socioeconomic deprivation; ref, reference group.
ᵃ Unweighted multinomial logistic model.
ᵇ No depression symptoms was the base outcome.
ᶜ The UNGD metric was a composite for four phases of well development (pad preparation, drilling, stimulation, and production) and was assigned for the two weeks prior to follow-up survey return.
ᵈ Heavy drinking was defined using the Centers for Disease Control and Prevention definition: 8 or more drinks per week for females and 15 or more drinks per week for males.


6.3 Comparing asthma patients identified in the electronic health record and by self-report

In the unconventional natural gas development (UNGD) and asthma exacerbation study, we identified asthma patients from asthma visits and medications documented in the electronic health record (EHR) (Section 3.3.1), using an algorithm developed in a prior study.1 In the UNGD and asthma symptom study (Section 6.4), we used the chronic rhinosinusitis baseline questionnaire, which included a question on doctor-diagnosed asthma. Here, we compared the characteristics of patients identified as having asthma by the self-reported doctor-diagnosis question with those of patients identified by the EHR algorithm. Prior studies comparing asthma diagnoses from medical records and from self-report have found a range of agreement (kappa coefficients from 0.43 to 0.7) (Table 6.3) and have identified several variables associated with discordance between the medical record and self-report.2-4 However, in contrast to these prior studies, which used abstraction or audit of medical records, our EHR asthma classification is based on an algorithm.


Table 6.3. Studies comparing self-reported asthma and asthma in the medical record

| Article | Patient source | Number of patients | Outcome | Medical record data source | Self-report source | Agreement | Variables associated with discordance |
|---|---|---|---|---|---|---|---|
| Tisnado, Medical Care, 2006 | Adult outpatient clinics in California, Washington, and Oregon | 1270 | History of asthma | Medical records abstracted by study fieldworkers | Mailed, self-administered survey | Kappa=0.7, agreement=91% | None evaluated |
| Skinner, J Ambulatory Care Manage, 2006 | Elderly male veterans using Veterans Affairs ambulatory care | 402 | Chronic lung disease (defined as chronic bronchitis, emphysema, or asthma) | Medical records abstracted by authors | Self-reported screening questionnaires administered at the time of a clinic visit | Kappa=0.55 | Age, marital status, and education were not associated with discordance. |
| Corser, BMC Health Services Research, 2008 | Hospitalized acute coronary syndrome patients | 525 | Chronic pulmonary disease/asthma and bronchitis | Medical record audits by study fieldworkers of paper or electronic medical records (approximately 50% in each form) | Telephone interview | Kappa=0.43 | Asthma in self-report but not in the medical record was associated with depression. Education and age were not associated. |


6.3.1 Methods

The baseline questionnaire, completed by 7,785 adults residing in Pennsylvania, asked respondents whether they had ever been told by a doctor that they had asthma.

We used this answer to create a variable for self-reported asthma (yes, no, missing).

Using the EHR-based algorithm from the UNGD and asthma study, we classified each baseline questionnaire respondent as having or not having EHR asthma (yes, no) on the date of baseline survey return. We then compared patients' classification by self-reported asthma and by EHR asthma. We created covariates for Medical Assistance, smoking status, sex, community socioeconomic deprivation (CSD), community type, Charlson comorbidity index, race/ethnicity, years since first Geisinger encounter, and body mass index (BMI). We compared these characteristics among patients in the four groups defined by self-reported asthma (yes/no) and EHR asthma (yes/no).

6.3.2 Results

Among the 7,785 baseline survey responders, 7,780 had electronic health record data available, and these responders made up the study population. Among the study population, 2,059 (26%) had self-reported asthma and 1,715 (22%) were classified as having asthma by the EHR algorithm (Table 6.3.2.1). Among patients who answered the question on doctor-diagnosed asthma (n=7,590), the kappa statistic between the two asthma classifications was 0.70.

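Table 6.3.2.1. Classification by the electronic health record (EHR) asthma algorithm, by self-reported asthma

| EHR asthma | Self-reported asthma: No | Self-reported asthma: Yes | Self-reported asthma: Missing | Total |
|---|---|---|---|---|
| No | 5,288 | 611 | 166 | 6,065 |
| Yes | 243 | 1,448 | 24 | 1,715 |
| Total | 5,531 | 2,059 | 190 | 7,780 |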
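The kappa statistic reported above can be recomputed from the four non-missing cells of Table 6.3.2.1. The short function below implements Cohen's kappa for a 2 × 2 table. Excluding the 190 patients with a missing self-report answer leaves the 7,590 patients used for the kappa.

```python
def cohens_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two binary classifications (rater A vs. rater B)."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n  # proportion classified yes by A
    b_yes = (both_yes + a_no_b_yes) / n  # proportion classified yes by B
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    return (observed - expected) / (1 - expected)


# Rater A = EHR algorithm, rater B = self-report (Table 6.3.2.1, missing excluded).
kappa = cohens_kappa(both_yes=1448, a_yes_b_no=243, a_no_b_yes=611, both_no=5288)
print(round(kappa, 2))  # 0.7
```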

All covariates evaluated differed statistically significantly among the four groups: patients with both EHR and self-reported asthma, patients with self-reported but not EHR asthma, patients with EHR but not self-reported asthma, and patients with neither EHR nor self-reported asthma (Table 6.3.2.2). After excluding patients with neither self-reported nor EHR asthma, community type no longer differed statistically significantly among patients with self-reported and/or EHR asthma. Community socioeconomic deprivation and BMI also no longer differed statistically significantly, though the p-values were marginal (p=0.08 and p=0.06, respectively).
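The group comparisons in Table 6.3.2.2 are chi-square tests. As a self-contained illustration, the sketch below implements a Pearson chi-square test of independence in pure Python; in practice, a library routine such as `scipy.stats.chi2_contingency` would normally be used. The sketch applies the test to the Medical Assistance counts twice: once across all four groups, and once excluding patients with neither classification. The text does not state whether a continuity correction was applied; none is applied here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the regularized upper incomplete gamma Q(a, y) at a = 1 or 1/2.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test for an r x c table of counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Medical Assistance (rows: no, yes) by group, from Table 6.3.2.2:
# EHR and self-report, self-report only, EHR only, neither.
medical_assistance = [[1146, 525, 212, 4828], [302, 86, 31, 460]]
all_groups = chi2_independence(medical_assistance)
asthma_only = chi2_independence([row[:3] for row in medical_assistance])
print(all_groups[1], all_groups[2] < 0.001)    # 3 True
print(asthma_only[1], asthma_only[2] < 0.001)  # 2 True
```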

Of the four groups, patients with both EHR and self-reported asthma were the most likely to be female (p<0.001). Patients with EHR but not self-reported asthma tended to have more years of Geisinger EHR data than patients with self-reported but not EHR asthma (p<0.001).

6.3.3 Discussion

The agreement between self-reported asthma and asthma in the EHR was substantial (kappa = 0.70). However, several characteristics differed across patients with self-reported asthma only, EHR asthma only, or both.


Table 6.3.2.2. Characteristics of patients with and without EHR and self-reported asthma. Values are n (%).

| Characteristic | EHR and self-reported asthma (n=1,448) | Self-reported but not EHR asthma (n=611) | EHR but not self-reported asthma (n=243) | Neither (n=5,288) | Total (n=7,590) | p-value, Chi2 test among all responders | p-value, Chi2 test among responders with self-reported and/or EHR asthma |
|---|---|---|---|---|---|---|---|
| Medical Assistance | | | | | | p<0.001 | p<0.001 |
| No | 1146 (79.1) | 525 (85.9) | 212 (87.2) | 4828 (91.3) | 6711 (88.4) | | |
| Yes | 302 (20.9) | 86 (14.1) | 31 (12.8) | 460 (8.7) | 879 (11.6) | | |
| Smoking status | | | | | | p=0.03 | p=0.01 |
| Never | 833 (57.5) | 302 (49.4) | 133 (54.7) | 2900 (54.8) | 4168 (54.9) | | |
| Current | 195 (13.5) | 111 (18.2) | 37 (15.2) | 753 (14.2) | 1096 (14.4) | | |
| Former | 420 (29) | 198 (32.4) | 73 (30) | 1635 (30.9) | 2326 (30.6) | | |
| Female | 1049 (72.4) | 389 (63.7) | 156 (64.2) | 3152 (59.6) | 4746 (62.5) | p<0.001 | p<0.001 |
| CSD | | | | | | p<0.001 | p=0.08 |
| Q1 | 318 (22) | 165 (27) | 69 (28.4) | 1397 (26.4) | 1949 (25.7) | | |
| Q2 | 345 (23.8) | 126 (20.6) | 59 (24.3) | 1334 (25.2) | 1864 (24.6) | | |
| Q3 | 372 (25.7) | 156 (25.5) | 58 (23.9) | 1326 (25.1) | 1912 (25.2) | | |
| Q4 | 413 (28.5) | 164 (26.8) | 57 (23.5) | 1231 (23.3) | 1865 (24.6) | | |
| Community type | | | | | | p=0.045 | p=0.41 |
| Borough | 413 (28.5) | 172 (28.2) | 65 (26.7) | 1427 (27) | 2077 (27.4) | | |
| City | 144 (9.9) | 72 (11.8) | 19 (7.8) | 448 (8.5) | 683 (9) | | |
| Township | 891 (61.5) | 367 (60.1) | 159 (65.4) | 3413 (64.5) | 4830 (63.6) | | |
| Charlson index | | | | | | p<0.001 | p<0.001 |
| 0 | 81 (5.6) | 146 (23.9) | 20 (8.2) | 1466 (27.7) | 1713 (22.6) | | |
| 1 | 124 (8.6) | 166 (27.2) | 45 (18.5) | 1734 (32.8) | 2069 (27.3) | | |
| 2 | 256 (17.7) | 133 (21.8) | 42 (17.3) | 1150 (21.7) | 1581 (20.8) | | |
| 3+ | 987 (68.2) | 166 (27.2) | 136 (56) | 938 (17.7) | 2227 (29.3) | | |
| Race/ethnicity | | | | | | p<0.001 | p=0.01 |
| White | 1271 (87.8) | 538 (88.1) | 225 (92.6) | 4834 (91.4) | 6868 (90.5) | | |
| Black | 71 (4.9) | 43 (7) | 9 (3.7) | 201 (3.8) | 324 (4.3) | | |
| Hispanic | 106 (7.3) | 30 (4.9) | 9 (3.7) | 253 (4.8) | 398 (5.2) | | |
| Years in Geisinger | | | | | | p<0.001 | p<0.001 |
| 0-4 | 61 (4.2) | 54 (8.8) | 5 (2.1) | 280 (5.3) | 400 (5.3) | | |
| 5-9 | 204 (14.1) | 126 (20.6) | 23 (9.5) | 856 (16.2) | 1209 (15.9) | | |
| 10+ | 1183 (81.7) | 431 (70.5) | 215 (88.5) | 4152 (78.5) | 5981 (78.8) | | |
| BMI | | | | | | p<0.001 | p=0.06 |
| Not overweight/obese | 300 (20.7) | 139 (22.7) | 43 (17.7) | 1341 (25.4) | 1823 (24.0) | | |
| Overweight | 410 (28.3) | 200 (32.7) | 74 (30.5) | 1758 (33.2) | 2442 (32.2) | | |
| Obese | 738 (51.0) | 272 (44.5) | 126 (51.9) | 2189 (41.4) | 3325 (43.8) | | |
| Total | 1448 (100) | 611 (100) | 243 (100) | 5288 (100) | 7590 (100) | | |

Abbreviations: EHR, electronic health record; CSD, community socioeconomic deprivation; BMI, body mass index.


6.4 Unconventional natural gas development and asthma symptom study

After completing the study of UNGD and clinically documented asthma exacerbations, we evaluated the association of UNGD with asthma symptoms. The rationale for this study was to evaluate whether the association of UNGD with asthma symptoms was similar to the associations we observed between UNGD and clinically documented asthma exacerbations. We hypothesized that UNGD would be associated with increased odds of asthma symptoms.

6.4.1 Survey data

Asthma symptom data came from baseline and follow-up questionnaires sent to adult patients of the Geisinger Clinic in April and October 2014 by the Chronic Rhinosinusitis Integrative Studies Program.5 The survey design oversampled patients with nasal and sinus symptoms and racial/ethnic minorities. The baseline survey, mailed to 23,700 patients, included questions on asthma symptoms (cough, wheeze, chest tightness, and shortness of breath) in the past three months and a question on doctor diagnosis of asthma. The follow-up questionnaire was mailed to all responders to the baseline questionnaire and included questions on asthma symptoms in the past six months. A total of 7,847 patients responded to the baseline questionnaire (response rate 33.1%) (Figure 6.4.1), and 4,966 patients responded to the follow-up questionnaire. We excluded patients who lived outside of Pennsylvania, leaving 7,785 patients at baseline and 4,932 at follow-up.


Figure 6.4.1. Chronic Rhinosinusitis Integrative Studies Program survey design. Abbreviation: LTFU, loss to follow-up

6.4.2 Asthma symptom outcomes

Responses to the asthma symptom questions were on a five-point scale (“never,” “once in a while,” “some of the time,” “most of the time,” “all of the time”). We created two types of outcomes. First, we evaluated each symptom (cough, wheeze, chest tightness, and shortness of breath) separately, excluding patients who did not respond to the question for that symptom. Second, we created an outcome for the number of asthma symptoms with a “most of the time” or “all of the time” response, which ranged from zero to four. For this outcome, we assumed that patients who did not respond to a symptom question did not have that symptom.
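The two outcome definitions differ mainly in how a missing answer is handled. The sketch below assumes that each patient's answers are stored in a dictionary keyed by symptom name; this data layout and the function names are hypothetical.

```python
SCALE = ("never", "once in a while", "some of the time",
         "most of the time", "all of the time")
SYMPTOMS = ("cough", "wheeze", "chest tightness", "shortness of breath")
FREQUENT = {"most of the time", "all of the time"}


def symptom_level(responses, symptom):
    """Per-symptom outcome: position on the five-point scale (0-4), or None
    if missing (such patients are excluded from that symptom's model)."""
    answer = responses.get(symptom)
    return SCALE.index(answer) if answer in SCALE else None


def frequent_symptom_count(responses):
    """Number of symptoms (0-4) answered "most" or "all of the time";
    a missing answer counts as not having the symptom."""
    return sum(1 for s in SYMPTOMS if responses.get(s) in FREQUENT)


patient = {"cough": "all of the time", "wheeze": "most of the time",
           "chest tightness": None}  # shortness of breath not answered
print(symptom_level(patient, "cough"), symptom_level(patient, "chest tightness"))  # 4 None
print(frequent_symptom_count(patient))  # 2
```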

6.4.3 UNGD metrics

We assigned the same UNGD metrics as used in the UNGD and objectively-documented asthma exacerbations study (Section 3.3.6). For the analysis of symptoms at baseline, which asked about symptoms over the past three months, we assigned the metrics over a window running from the day before the questionnaire was received back 90 days. For the analysis of symptoms at follow-up, which asked about symptoms over the past six months, we assigned the metrics from the day before the questionnaire was received back 180 days. In a sensitivity analysis, we re-assigned the UNGD metrics at follow-up with two different exposure windows: from the day before the questionnaire was received back 14 days, and from the day before the questionnaire was received back 365 days. We summed the z-scores of the metrics for the four phases of development (pad preparation, spud, stimulation, and production) and divided the sum into quartiles to create summary metrics.
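The window definition and the summary metric can be sketched as follows. The helper names are hypothetical. Several details are assumptions, not choices the text specifies: the z-scores are computed across the patients passed in, the quartile cut points come from the same data, and both window ends are treated as inclusive.

```python
import statistics
from datetime import date, timedelta


def exposure_window(received, days_back):
    """(start, end): end is the day before receipt; start is days_back days
    before end. Treating both ends as inclusive is an assumption."""
    end = received - timedelta(days=1)
    return end - timedelta(days=days_back), end


def summary_quartiles(phase_metrics):
    """Sum per-phase z-scores for each patient and return quartiles (1-4).

    phase_metrics maps a phase name to per-patient values in the same order.
    """
    n = len(next(iter(phase_metrics.values())))
    total = [0.0] * n
    for values in phase_metrics.values():
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        for i, v in enumerate(values):
            total[i] += (v - mean) / sd
    cuts = statistics.quantiles(total, n=4)  # three cut points
    return [1 + sum(t > c for c in cuts) for t in total]


print(exposure_window(date(2014, 5, 1), 90))
# (datetime.date(2014, 1, 30), datetime.date(2014, 4, 30))

# Hypothetical metric values for eight patients and four phases.
phases = {p: [float(v) for v in range(1, 9)]
          for p in ("pad", "spud", "stimulation", "production")}
print(summary_quartiles(phases))  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Because each phase is standardized before summing, a phase with large raw values (for example, production) does not dominate the composite.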

6.4.4 Covariates

From the electronic health record, we created covariates for race/ethnicity (white, black, Hispanic); sex (male, female); use of Medical Assistance, a measure of low socioeconomic status (no, yes); age at survey return (years); smoking status at survey return (current, former, never); and family history of asthma (no, yes). From the baseline survey, we created a variable for self-reported doctor diagnosis of asthma (no/missing, yes).

6.4.5 Data analysis

We used multinomial logistic regression to evaluate the association of the summary UNGD metric with the asthma symptom outcomes. Race/ethnicity, sex, use of Medical Assistance, age, smoking status, family history of asthma, and doctor diagnosis of asthma were included as covariates. Age was included as centered linear and centered squared terms to allow for non-linearity. We evaluated effect modification by asthma diagnosis by including an interaction between the summary UNGD metric and doctor diagnosis of asthma. In a sensitivity analysis, we re-ran the model for the number-of-asthma-symptoms outcome at follow-up with the two-week and one-year summary UNGD metrics (described in Section 6.4.3).
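Effect modification by asthma diagnosis means that each non-reference symptom category has its own UNGD-quartile and interaction coefficients. The odds ratio for a quartile therefore differs between patients with and without a doctor diagnosis. The sketch below uses hypothetical column names and illustrative coefficient values (not estimates from this study). It shows how the design columns for one patient could be built, and how a stratum-specific odds ratio is recovered from the main-effect and interaction coefficients.

```python
import math

QUARTILES = (2, 3, 4)  # Q1 is the reference category


def design_row(ungd_quartile, asthma_dx, age, age_center):
    """UNGD, asthma, interaction, and age columns for one patient."""
    dx = 1.0 if asthma_dx else 0.0
    row = {}
    for k in QUARTILES:
        q = 1.0 if ungd_quartile == k else 0.0
        row[f"ungd_q{k}"] = q
        row[f"ungd_q{k}_x_asthma"] = q * dx
    row["asthma_dx"] = dx
    row["age_c"] = age - age_center
    row["age_c_sq"] = (age - age_center) ** 2
    return row


def quartile_odds_ratio(coefs, k, asthma_dx):
    """Odds ratio for quartile k vs. Q1 within one asthma stratum, for one
    outcome category vs. the reference ("never") category."""
    log_or = coefs[f"ungd_q{k}"]
    if asthma_dx:
        log_or += coefs[f"ungd_q{k}_x_asthma"]
    return math.exp(log_or)


row = design_row(ungd_quartile=4, asthma_dx=True, age=60, age_center=50)
print(row["ungd_q4"], row["ungd_q4_x_asthma"], row["age_c_sq"])  # 1.0 1.0 100

illustrative = {"ungd_q4": math.log(2.0), "ungd_q4_x_asthma": math.log(1.5)}
print(round(quartile_odds_ratio(illustrative, 4, asthma_dx=True), 2))   # 3.0
print(round(quartile_odds_ratio(illustrative, 4, asthma_dx=False), 2))  # 2.0
```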

We calculated baseline and follow-up weights as shown in Table 6.4.5. We ran each model three times: with unweighted logistic regression, with logistic regression using truncated weights (in which the largest weight in Table 6.4.5 was replaced with the second-largest weight), and with logistic regression using the full weights (as in Table 6.4.5). We ran the three models to examine the trade-off between bias and precision: the fully weighted model should be the least biased, whereas the unweighted model should be the most precise.


Table 6.4.5. Calculation of survey weights at baseline and follow-up (cells are counts unless otherwise specified)

| White | | | |
|---|---|---|---|
| 1. Identified using EHR | 13,132 | 47,892 | 131,366 |
| 2. Received survey | 12,209 | 4,224 | 2,775 |
| Survey weight (#1 ÷ #2) | 1.08 | 11.34 | 47.34 |
| 3. Responded to BL survey | 4,730 | 1,489 | 876 |
| Response weight (#2 ÷ #3) | 2.58 | 2.84 | 3.17 |
| Sample weight, baseline (#1 ÷ #3) | 2.78 | 32.16 | 149.96 |
| 4. Responded to FU6mo | 3,095 | 955 | 554 |
| Response weight fu6mo (#3 ÷ #4) | 1.53 | 1.56 | 1.58 |
| Sample weight, follow-up (#1 ÷ #4) | 4.24 | 50.15 | 237.12 |

| Black | | | |
|---|---|---|---|
| 1. Identified using EHR | 170 | 991 | 2,832 |
| 2. Received survey | 159 | 903 | 1,109 |
| Survey weight (#1 ÷ #2) | 1.07 | 1.10 | 2.55 |
| 3. Responded to BL survey | 35 | 147 | 160 |
| Response weight (#2 ÷ #3) | 4.54 | 6.14 | 6.93 |
| Sample weight, baseline (#1 ÷ #3) | 4.86 | 6.74 | 17.70 |
| 4. Responded to FU6mo | 17 | 69 | 65 |
| Response weight fu6mo (#3 ÷ #4) | 2.06 | 2.13 | 2.46 |
| Sample weight, follow-up (#1 ÷ #4) | 10.00 | 14.36 | 43.57 |

| Hispanic | | | |
|---|---|---|---|
| 1. Identified using EHR | 192 | 1,035 | 3,159 |
| 2. Received survey | 181 | 966 | 1,174 |
| Survey weight (#1 ÷ #2) | 1.06 | 1.07 | 2.69 |
| 3. Responded to BL survey | 35 | 207 | 168 |
| Response weight (#2 ÷ #3) | 5.17 | 4.67 | 6.99 |
| Sample weight, baseline (#1 ÷ #3) | 5.49 | 5.00 | 18.80 |
| 4. Responded to FU6mo | 19 | 93 | 99 |
| Response weight fu6mo (#3 ÷ #4) | 1.84 | 2.23 | 1.70 |
| Sample weight, follow-up (#1 ÷ #4) | 10.11 | 11.13 | 31.91 |

Abbreviations: EHR, electronic health record; BL, baseline; FU6mo, six-month follow-up.
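Each weight in Table 6.4.5 is a ratio of successive counts, and each sample weight is the product of the preceding weights. The sketch below reproduces the first White column and illustrates the truncation rule described in Section 6.4.5 (replacing the largest weight with the second-largest). The function names are hypothetical, and applying truncation to the nine baseline sample weights as a single set is an assumption.

```python
def survey_weights(identified, received, responded_bl, responded_fu):
    """Ratios defined in Table 6.4.5 for one column of counts."""
    return {
        "survey": identified / received,
        "response_bl": received / responded_bl,
        "sample_bl": identified / responded_bl,
        "response_fu": responded_bl / responded_fu,
        "sample_fu": identified / responded_fu,
    }


def truncate_largest(weights):
    """Replace the largest weight value with the second-largest distinct value."""
    distinct = sorted(set(weights))
    if len(distinct) < 2:
        return list(weights)
    largest, second = distinct[-1], distinct[-2]
    return [second if w == largest else w for w in weights]


w = survey_weights(identified=13132, received=12209, responded_bl=4730, responded_fu=3095)
print({k: round(v, 2) for k, v in w.items()})
# {'survey': 1.08, 'response_bl': 2.58, 'sample_bl': 2.78, 'response_fu': 1.53, 'sample_fu': 4.24}

# Baseline sample weights from Table 6.4.5 (White, Black, Hispanic columns).
baseline = [2.78, 32.16, 149.96, 4.86, 6.74, 17.70, 5.49, 5.00, 18.80]
print(max(truncate_largest(baseline)))  # 32.16
```

Truncating only the single largest weight (149.96 at baseline) limits the influence of the few responders with very large weights while keeping most of the design-based correction. This matches the bias-precision trade-off the three model versions are meant to explore.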

6.4.6 Results

Each outcome had missing data, ranging from 1.5% for cough to 2.2% for shortness of breath at baseline, and from 2.3% for cough to 3.2% for shortness of breath at follow-up (Table 6.4.6.1). The percentage of survey responders reporting a symptom “most of the time” or “all of the time” ranged from 5% for chest tightness to 14% for cough at baseline, and from 4% for chest tightness to 13% for cough at follow-up.

Table 6.4.6.1. Asthma symptoms and missingness at baseline (n=7,785 survey responders) and follow-up (n=4,932 survey responders)

| Response | Baseline: cough | Baseline: wheeze | Baseline: chest tightness | Baseline: shortness of breath | Follow-up: cough | Follow-up: wheeze | Follow-up: chest tightness | Follow-up: shortness of breath |
|---|---|---|---|---|---|---|---|---|
| Never | 1,309 | 3,860 | 4,143 | 3,367 | 908 | 2,594 | 2,813 | 2,331 |
| Once in a while | 3,144 | 1,926 | 2,050 | 2,321 | 2,051 | 1,229 | 1,243 | 1,436 |
| Some of the time | 2,153 | 1,282 | 1,067 | 1,281 | 1,216 | 668 | 563 | 684 |
| Most of the time | 791 | 423 | 291 | 474 | 479 | 214 | 135 | 237 |
| All of the time | 268 | 161 | 83 | 170 | 167 | 96 | 38 | 86 |
| Missing | 120 | 133 | 151 | 172 | 111 | 131 | 140 | 158 |

In the analysis of baseline symptoms among patients with asthma, UNGD was not associated with wheeze or shortness of breath symptoms. The fourth quartile of the summary UNGD metric (compared to the first quartile) was associated with increased odds of cough symptoms “once in a while” and “all of the time” in unweighted models only. The second and third quartiles of the summary UNGD metric (compared to the first quartile) were associated with decreased odds of chest tightness symptoms “once in a while” in truncated weighted models only.

Among patients without asthma, the fourth quartile of the summary UNGD metric (compared to the first quartile) was associated with increased odds of each severity of shortness of breath symptoms in unweighted models, but associations were not consistent in truncated or full weighted models at baseline (Table 6.4.6.2). The fourth quartile of the summary UNGD metric (compared to the first quartile) was associated with increased odds of chest tightness symptoms “once in a while” in an unweighted model only, and with increased odds of cough symptoms “all of the time” in a full weighted model only. The fourth quartile of the summary UNGD metric (compared to the first quartile) was associated with increased odds of wheeze symptoms “some of the time” and “most of the time” in unweighted models, and of wheeze symptoms “all of the time” in a full weighted model.

Table 6.4.6.2. Association of UNGD metrics with number of asthma symptoms at baseline among patients with and without asthma from adjusted multinomial models. Having the symptom “never” was the reference outcome. In each cell, the odds ratio and 95% confidence interval were reported from unweighted logistic regression, survey logistic regression with truncated weights, and survey logistic regression with full weights. Associations statistically significant at p=0.05 are bolded. Abbreviations: OR, odds ratio; CI, confidence interval

Outcome: Cough | Wheeze | Chest tightness | Shortness of breath

Association among asthma patients with the symptom “once in a while” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.14 (0.76 - 1.71) | 0.83 (0.60 - 1.16) | 0.89 (0.66 - 1.21) | 0.80 (0.57 - 1.11)
Q2 vs. Q1, truncated weights: 1.11 (0.60 - 2.04) | 0.62 (0.38 - 1.03) | 0.59 (0.38 - 0.94) | 0.73 (0.44 - 1.20)
Q2 vs. Q1, full weights: 1.20 (0.51 - 2.82) | 0.72 (0.35 - 1.45) | 0.66 (0.35 - 1.23) | 0.997 (0.49 - 2.04)
Q3 vs. Q1, unweighted: 1.28 (0.85 - 1.95) | 0.84 (0.60 - 1.16) | 0.80 (0.59 - 1.08) | 0.87 (0.63 - 1.21)
Q3 vs. Q1, truncated weights: 1.32 (0.72 - 2.42) | 0.59 (0.36 - 0.98) | 0.56 (0.35 - 0.89) | 0.71 (0.43 - 1.17)
Q3 vs. Q1, full weights: 1.42 (0.60 - 3.32) | 0.71 (0.35 - 1.44) | 0.62 (0.32 - 1.18) | 1.04 (0.50 - 2.14)
Q4 vs. Q1, unweighted: 1.63 (1.03 - 2.57) | 1.26 (0.87 - 1.82) | 0.96 (0.69 - 1.33) | 1.02 (0.72 - 1.46)
Q4 vs. Q1, truncated weights: 1.51 (0.76 - 3.02) | 1.14 (0.65 - 2.01) | 0.77 (0.47 - 1.27) | 1.13 (0.65 - 1.97)
Q4 vs. Q1, full weights: 2.24 (0.83 - 6.10) | 1.31 (0.58 - 2.95) | 0.78 (0.39 - 1.58) | 1.28 (0.58 - 2.80)

Association among asthma patients with the symptom “some of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.22 (0.81 - 1.84) | 1.05 (0.75 - 1.46) | 0.96 (0.69 - 1.32) | 0.97 (0.69 - 1.35)
Q2 vs. Q1, truncated weights: 1.59 (0.86 - 2.94) | 0.87 (0.52 - 1.46) | 0.86 (0.82 - 1.42) | 0.83 (0.50 - 1.37)
Q2 vs. Q1, full weights: 2.06 (0.86 - 4.96) | 1.11 (0.54 - 2.28) | 0.89 (0.45 - 1.74) | 0.94 (0.49 - 1.82)
Q3 vs. Q1, unweighted: 1.29 (0.85 - 1.96) | 1.001 (0.72 - 1.40) | 0.86 (0.62 - 1.19) | 0.94 (0.67 - 1.32)
Q3 vs. Q1, truncated weights: 1.11 (0.59 - 2.06) | 1.001 (0.59 - 1.69) | 0.86 (0.52 - 1.42) | 0.77 (0.46 - 1.30)
Q3 vs. Q1, full weights: 1.37 (0.56 - 3.34) | 1.30 (0.62 - 2.73) | 0.83 (0.42 - 1.63) | 0.96 (0.48 - 1.91)
Q4 vs. Q1, unweighted: 1.37 (0.86 - 2.18) | 1.70 (1.17 - 2.46) | 1.19 (0.85 - 1.67) | 1.22 (0.84 - 1.76)
Q4 vs. Q1, truncated weights: 1.57 (0.78 - 3.18) | 1.33 (0.74 - 2.38) | 0.95 (0.55 - 1.63) | 1.10 (0.63 - 1.94)
Q4 vs. Q1, full weights: 2.34 (0.85 - 6.43) | 1.63 (0.70 - 3.77) | 1.24 (0.56 - 2.73) | 1.12 (0.52 - 2.41)

Association among asthma patients with the symptom “most of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.08 (0.68 - 1.74) | 0.88 (0.56 - 1.36) | 0.80 (0.48 - 1.34) | 0.83 (0.53 - 1.30)
Q2 vs. Q1, truncated weights: 0.85 (0.41 - 1.76) | 0.61 (0.31 - 1.19) | 0.58 (0.26 - 1.29) | 0.63 (0.31 - 1.26)
Q2 vs. Q1, full weights: 0.97 (0.35 - 2.73) | 0.53 (0.22 - 1.28) | 0.49 (0.18 - 1.38) | 0.53 (0.22 - 1.30)
Q3 vs. Q1, unweighted: 1.19 (0.74 - 1.93) | 0.73 (0.46 - 1.14) | 0.83 (0.50 - 1.37) | 0.82 (0.52 - 1.28)
Q3 vs. Q1, truncated weights: 1.06 (0.52 - 2.18) | 0.55 (0.27 - 1.10) | 0.80 (0.35 - 1.77) | 0.94 (0.46 - 1.90)
Q3 vs. Q1, full weights: 0.86 (0.34 - 2.22) | 0.47 (0.19 - 1.14) | 0.63 (0.23 - 1.74) | 0.98 (0.36 - 2.68)
Q4 vs. Q1, unweighted: 1.63 (0.98 - 2.72) | 1.49 (0.94 - 2.38) | 1.23 (0.74 - 2.04) | 1.06 (0.66 - 1.71)
Q4 vs. Q1, truncated weights: 1.39 (0.63 - 3.07) | 1.10 (0.54 - 2.27) | 1.31 (0.58 - 2.96) | 0.74 (0.35 - 1.56)
Q4 vs. Q1, full weights: 1.31 (0.46 - 3.74) | 1.08 (0.38 - 3.08) | 1.06 (0.37 - 3.00) | 0.51 (0.20 - 1.31)

Association among asthma patients with the symptom “all of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.24 (0.66 - 2.35) | 0.85 (0.47 - 1.54) | 1.26 (0.54 - 2.92) | 1.11 (0.59 - 2.08)
Q2 vs. Q1, truncated weights: 0.86 (0.31 - 2.36) | 0.58 (0.22 - 1.49) | 0.80 (0.22 - 2.86) | 0.60 (0.22 - 1.62)
Q2 vs. Q1, full weights: 0.74 (0.20 - 2.67) | 0.71 (0.24 - 2.11) | 0.89 (0.22 - 3.61) | 0.74 (0.24 - 2.28)
Q3 vs. Q1, unweighted: 1.69 (0.91 - 3.16) | 0.64 (0.34 - 1.19) | 0.64 (0.24 - 1.67) | 0.64 (0.32 - 1.29)
Q3 vs. Q1, truncated weights: 1.10 (0.42 - 2.89) | 0.39 (0.15 - 1.02) | 0.45 (0.11 - 1.79) | 0.54 (0.19 - 1.53)
Q3 vs. Q1, full weights: 0.82 (0.24 - 2.82) | 0.47 (0.16 - 1.44) | 0.55 (0.12 - 2.48) | 0.72 (0.22 - 2.32)
Q4 vs. Q1, unweighted: 2.33 (1.22 - 4.47) | 1.35 (0.73 - 2.51) | 1.28 (0.53 - 3.09) | 1.56 (0.82 - 2.98)
Q4 vs. Q1, truncated weights: 2.00 (0.74 - 5.43) | 1.19 (0.59 - 3.72) | 0.80 (0.23 - 2.78) | 1.45 (0.54 - 3.89)
Q4 vs. Q1, full weights: 1.74 (0.47 - 6.54) | 2.85 (0.88 - 9.27) | 1.58 (0.38 - 6.59) | 3.41 (0.98 - 11.87)

Association among patients without asthma with the symptom “once in a while” (OR, 95% CI)
Q2 vs. Q1, unweighted: 0.89 (0.72 - 1.09) | 0.98 (0.82 - 1.18) | 1.10 (0.92 - 1.31) | 1.07 (0.90 - 1.28)
Q2 vs. Q1, truncated weights: 0.91 (0.70 - 1.20) | 1.13 (0.87 - 1.47) | 1.13 (0.88 - 1.46) | 0.998 (0.78 - 1.27)
Q2 vs. Q1, full weights: 0.86 (0.60 - 1.23) | 1.14 (0.80 - 1.63) | 0.95 (0.67 - 1.34) | 0.999 (0.72 - 1.39)
Q3 vs. Q1, unweighted: 0.89 (0.73 - 1.10) | 1.10 (0.92 - 1.32) | 1.14 (0.95 - 1.36) | 1.16 (0.97 - 1.37)
Q3 vs. Q1, truncated weights: 0.79 (0.60 - 1.04) | 1.22 (0.94 - 1.59) | 1.22 (0.94 - 1.57) | 1.11 (0.87 - 1.41)
Q3 vs. Q1, full weights: 0.82 (0.58 - 1.17) | 1.18 (0.83 - 1.68) | 1.07 (0.76 - 1.50) | 1.04 (0.75 - 1.44)
Q4 vs. Q1, unweighted: 0.91 (0.74 - 1.11) | 1.12 (0.94 - 1.35) | 1.21 (1.02 - 1.44) | 1.22 (1.03 - 1.44)
Q4 vs. Q1, truncated weights: 0.87 (0.66 - 1.14) | 1.10 (0.85 - 1.44) | 1.18 (0.92 - 1.52) | 1.20 (0.94 - 1.53)
Q4 vs. Q1, full weights: 0.77 (0.54 - 1.10) | 0.89 (0.62 - 1.28) | 0.84 (0.59 - 1.18) | 1.15 (0.83 - 1.60)

Association among patients without asthma with the symptom “some of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.02 (0.81 - 1.27) | 1.13 (0.88 - 1.44) | 1.02 (0.78 - 1.33) | 1.25 (0.99 - 1.59)
Q2 vs. Q1, truncated weights: 1.11 (0.81 - 1.51) | 1.06 (0.74 - 1.52) | 1.19 (0.80 - 1.77) | 1.47 (1.02 - 2.10)
Q2 vs. Q1, full weights: 1.27 (0.84 - 1.93) | 1.06 (0.64 - 1.73) | 1.75 (1.01 - 3.03) | 1.77 (1.07 - 2.90)
Q3 vs. Q1, unweighted: 0.95 (0.76 - 1.18) | 1.20 (0.94 - 1.54) | 1.05 (0.81 - 1.38) | 1.10 (0.86 - 1.41)
Q3 vs. Q1, truncated weights: 1.03 (0.76 - 1.41) | 1.13 (0.79 - 1.61) | 0.35 (0.91 - 1.99) | 1.43 (0.997 - 2.04)
Q3 vs. Q1, full weights: 1.23 (0.81 - 1.85) | 1.11 (0.68 - 1.80) | 1.53 (0.88 - 2.64) | 1.71 (1.01 - 2.80)
Q4 vs. Q1, unweighted: 1.03 (0.83 - 1.29) | 1.51 (1.19 - 1.90) | 1.35 (1.05 - 1.74) | 1.36 (1.07 - 1.72)
Q4 vs. Q1, truncated weights: 1.16 (0.85 - 1.58) | 1.36 (0.96 - 1.91) | 1.50 (1.03 - 2.19) | 1.66 (1.16 - 2.36)
Q4 vs. Q1, full weights: 1.31 (0.86 - 1.98) | 1.17 (0.73 - 1.88) | 2.08 (1.23 - 3.53) | 1.90 (1.18 - 3.18)

Association among patients without asthma with the symptom “most of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.05 (0.76 - 1.43) | 1.29 (0.83 - 2.00) | 1.32 (0.80 - 2.19) | 1.14 (0.76 - 1.69)
Q2 vs. Q1, truncated weights: 1.11 (0.81 - 1.51) | 0.85 (0.43 - 1.68) | 0.82 (0.38 - 1.74) | 0.997 (0.56 - 1.79)
Q2 vs. Q1, full weights: 1.25 (0.63 - 2.47) | 1.43 (0.55 - 3.73) | 0.64 (0.22 - 1.86) | 1.40 (0.63 - 3.11)
Q3 vs. Q1, unweighted: 1.01 (0.74 - 1.39) | 1.19 (0.76 - 1.87) | 1.23 (0.74 - 2.04) | 1.27 (0.86 - 1.87)
Q3 vs. Q1, truncated weights: 1.03 (0.76 - 1.41) | 0.99 (0.51 - 1.95) | 0.80 (0.38 - 1.72) | 0.95 (0.54 - 1.69)
Q3 vs. Q1, full weights: 1.56 (0.83 - 2.95) | 1.50 (0.60 - 3.78) | 0.95 (0.32 - 2.78) | 1.17 (0.52 - 2.62)
Q4 vs. Q1, unweighted: 1.26 (0.93 - 1.70) | 1.54 (1.004 - 2.35) | 1.20 (0.72 - 1.99) | 1.47 (1.01 - 2.14)
Q4 vs. Q1, truncated weights: 1.16 (0.85 - 1.58) | 1.39 (0.74 - 2.61) | 0.87 (0.41 - 1.84) | 1.42 (0.83 - 2.45)
Q4 vs. Q1, full weights: 1.76 (0.94 - 3.30) | 2.41 (1.01 - 5.79) | 0.80 (0.27 - 2.41) | 2.31 (1.09 - 4.91)

Association among patients without asthma with the symptom “all of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.25 (0.74 - 2.13) | 0.96 (0.45 - 2.05) | 0.68 (0.24 - 1.94) | 1.14 (0.55 - 2.39)
Q2 vs. Q1, truncated weights: 1.39 (0.64 - 3.02) | 0.91 (0.27 - 3.10) | 0.23 (0.60 - 0.85) | 1.11 (0.37 - 3.31)
Q2 vs. Q1, full weights: 1.69 (0.58 - 4.95) | 1.89 (0.41 - 8.75) | 0.23 (0.06 - 0.89) | 0.66 (0.15 - 2.95)
Q3 vs. Q1, unweighted: 1.08 (0.63 - 1.85) | 0.63 (0.27 - 1.48) | 0.87 (0.33 - 2.28) | 1.58 (0.80 - 3.12)
Q3 vs. Q1, truncated weights: 1.15 (0.53 - 2.53) | 0.43 (0.11 - 1.75) | 0.95 (0.22 - 4.16) | 1.31 (0.46 - 3.76)
Q3 vs. Q1, full weights: 1.47 (0.50 - 4.38) | 1.32 (0.20 - 8.58) | 0.99 (0.22 - 4.51) | 0.95 (0.23 - 3.99)
Q4 vs. Q1, unweighted: 1.59 (0.96 - 2.64) | 1.55 (0.79 - 3.04) | 1.54 (0.67 - 3.54) | 1.95 (1.02 - 3.76)
Q4 vs. Q1, truncated weights: 1.65 (0.77 - 3.50) | 2.21 (0.80 - 6.08) | 1.49 (0.43 - 5.19) | 2.75 (1.07 - 7.08)
Q4 vs. Q1, full weights: 3.58 (1.30 - 9.85) | 3.68 (1.12 - 12.07) | 2.71 (0.52 - 14.04) | 1.77 (0.53 - 5.90)
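Significance in Tables 6.4.6.2 to 6.4.6.6 is judged at p=0.05, so it can also be read from each 95% confidence interval. For Wald-type intervals, an interval that excludes 1 corresponds approximately to p < 0.05.

The Python sketch below illustrates this check; it is not the analysis code used in this study. It parses cells in the "OR (lower - upper)" format used in these tables and labels each one. It also flags cells whose point estimate falls outside its own interval, since such a cell is more likely a transcription error than a result.

```python
import re

# Matches cells such as "1.63 (1.03 - 2.57)" or "0.997 (0.49 - 2.04)".
CELL = re.compile(r"(\d*\.?\d+)\s*\(\s*(\d*\.?\d+)\s*-\s*(\d*\.?\d+)\s*\)")


def parse_cells(text):
    """Return (odds ratio, lower, upper) float triples for every cell in `text`."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(text)]


def classify(odds_ratio, lower, upper):
    """Label one cell by whether its 95% CI excludes 1."""
    if not lower <= odds_ratio <= upper:
        return "check"      # estimate lies outside its own interval
    if lower > 1:
        return "increased"  # whole interval above 1
    if upper < 1:
        return "decreased"  # whole interval below 1
    return "null"           # interval includes 1


if __name__ == "__main__":
    row = "1.63 (1.03 - 2.57) 0.59 (0.38 - 0.94) 0.89 (0.66 - 1.21)"
    for cell in parse_cells(row):
        print(cell, classify(*cell))
```

This check would flag one cell in Table 6.4.6.2: chest tightness “all of the time”, Q2 vs. Q1, truncated weights, among patients without asthma, reported as 0.23 (0.60 - 0.85).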


In the analysis of symptoms at follow-up among patients with asthma, UNGD was not associated with chest tightness symptoms (Table 6.4.6.3). UNGD quartile four (vs. quartile one) was associated with increased odds of cough symptoms “most of the time” only, and only in an unweighted model. UNGD quartile two (vs. quartile one) was associated with increased odds of wheeze symptoms “all of the time” and of shortness of breath symptoms “most of the time”, in each case only in an unweighted model.

Among patients without asthma, UNGD was not associated with cough, wheeze, or chest tightness symptoms at follow-up. UNGD quartile three (vs. quartile one) was associated with increased odds of shortness of breath “once in a while” in truncated and full weighted models. Quartile four (vs. quartile one) was associated with increased odds of shortness of breath symptoms “once in a while” in an unweighted model.

Table 6.4.6.3. Association of UNGD metrics with number of asthma symptoms at follow-up among patients with and without asthma from adjusted multinomial logistic models. Having the symptom “never” was the reference outcome. In each cell, the odds ratio and 95% confidence interval were reported from unweighted logistic regression, survey logistic regression with truncated weights, and survey logistic regression with full weights. Associations statistically significant at p=0.05 are bolded. Abbreviations: OR, odds ratio; CI, confidence interval

Outcome: Cough | Wheeze | Chest tightness | Shortness of breath

Association among asthma patients with the symptom “once in a while” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.17 (0.71 - 1.93) | 1.28 (0.86 - 1.91) | 1.03 (0.71 - 1.49) | 1.28 (0.86 - 1.92)
Q2 vs. Q1, truncated weights: 1.39 (0.66 - 2.91) | 0.95 (0.52 - 1.75) | 1.05 (0.60 - 1.85) | 1.36 (0.74 - 2.51)
Q2 vs. Q1, full weights: 0.96 (0.35 - 2.65) | 0.95 (0.40 - 2.24) | 0.98 (0.46 - 2.11) | 0.96 (0.41 - 2.23)
Q3 vs. Q1, unweighted: 1.22 (0.74 - 2.00) | 0.90 (0.61 - 1.34) | 0.99 (0.68 - 1.44) | 1.23 (0.82 - 1.83)
Q3 vs. Q1, truncated weights: 0.88 (0.44 - 1.79) | 0.75 (0.41 - 1.38) | 0.74 (0.42 - 1.30) | 0.99 (0.54 - 1.83)
Q3 vs. Q1, full weights: 0.76 (0.29 - 2.01) | 0.97 (0.40 - 2.33) | 0.70 (0.32 - 1.56) | 0.58 (0.25 - 1.36)
Q4 vs. Q1, unweighted: 1.003 (0.59 - 1.72) | 1.24 (0.81 - 1.91) | 1.05 (0.71 - 1.55) | 1.25 (0.81 - 1.91)
Q4 vs. Q1, truncated weights: 1.004 (0.45 - 2.25) | 0.67 (0.34 - 1.32) | 0.86 (0.48 - 1.56) | 1.09 (0.57 - 2.09)
Q4 vs. Q1, full weights: 1.002 (0.34 - 2.99) | 0.46 (0.19 - 1.14) | 0.71 (0.32 - 1.53) | 1.09 (0.45 - 2.68)

Association among asthma patients with the symptom “some of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.56 (0.93 - 2.61) | 1.15 (0.74 - 1.80) | 1.05 (0.68 - 1.61) | 1.14 (0.73 - 1.76)
Q2 vs. Q1, truncated weights: 1.53 (0.70 - 3.34) | 0.88 (0.44 - 1.76) | 0.97 (0.49 - 1.92) | 1.05 (0.52 - 2.10)
Q2 vs. Q1, full weights: 1.31 (0.45 - 3.87) | 0.72 (0.28 - 1.84) | 0.99 (0.44 - 2.25) | 0.79 (0.33 - 1.88)
Q3 vs. Q1, unweighted: 1.26 (0.75 - 2.13) | 0.86 (0.55 - 1.33) | 0.98 (0.64 - 1.52) | 1.08 (0.70 - 1.67)
Q3 vs. Q1, truncated weights: 0.92 (0.43 - 1.99) | 0.78 (0.39 - 1.58) | 0.81 (0.41 - 1.61) | 0.99 (0.50 - 1.97)
Q3 vs. Q1, full weights: 0.99 (0.35 - 2.82) | 0.59 (0.23 - 1.49) | 0.95 (0.39 - 2.32) | 0.88 (0.34 - 2.29)
Q4 vs. Q1, unweighted: 1.63 (0.94 - 2.83) | 1.47 (0.93 - 2.33) | 1.09 (0.69 - 1.71) | 1.13 (0.71 - 1.80)
Q4 vs. Q1, truncated weights: 1.75 (0.76 - 4.01) | 1.06 (0.52 - 2.15) | 0.83 (0.41 - 1.7) | 1.21 (0.59 - 2.49)
Q4 vs. Q1, full weights: 2.78 (0.86 - 9.00) | 0.85 (0.31 - 2.35) | 1.08 (0.42 - 2.8) | 1.34 (0.51 - 3.48)

Association among asthma patients with the symptom “most of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 1.47 (0.79 - 2.74) | 1.50 (0.82 - 2.74) | 1.33 (0.65 - 2.76) | 2.29 (1.26 - 4.14)
Q2 vs. Q1, truncated weights: 1.62 (0.63 - 4.21) | 0.84 (0.32 - 2.17) | 1.78 (0.56 - 5.60) | 1.65 (0.66 - 4.11)
Q2 vs. Q1, full weights: 1.05 (0.30 - 3.65) | 0.43 (0.13 - 1.48) | 0.93 (0.2 - 4.41) | 0.85 (0.24 - 2.97)
Q3 vs. Q1, unweighted: 1.72 (0.94 - 3.16) | 1.17 (0.64 - 2.12) | 1.65 (0.82 - 3.32) | 1.09 (0.56 - 2.1)
Q3 vs. Q1, truncated weights: 1.15 (0.46 - 2.87) | 0.43 (0.16 - 1.12) | 0.73 (0.25 - 2.19) | 0.55 (0.2 - 1.49)
Q3 vs. Q1, full weights: 0.78 (0.24 - 2.55) | 0.22 (0.06 - 0.76) | 0.38 (0.09 - 1.63) | 0.25 (0.07 - 0.90)
Q4 vs. Q1, unweighted: 2.24 (1.19 - 4.19) | 1.60 (0.85 - 3.01) | 1.07 (0.49 - 2.32) | 1.66 (0.87 - 3.16)
Q4 vs. Q1, truncated weights: 2.10 (0.80 - 5.50) | 0.94 (0.35 - 2.55) | 1.11 (0.33 - 3.78) | 0.95 (0.35 - 2.59)
Q4 vs. Q1, full weights: 1.92 (0.54 - 6.78) | 0.36 (0.09 - 1.43) | 0.58 (0.12 - 2.74) | 0.62 (0.17 - 2.27)

Association among asthma patients with the symptom “all of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 2.02 (0.89 - 4.56) | 2.57 (1.15 - 5.76) | 2.37 (0.76 - 7.40) | 1.84 (0.75 - 4.52)
Q2 vs. Q1, truncated weights: 1.03 (0.30 - 3.52) | 1.9 (0.53 - 6.82) | 1.03 (0.17 - 6.24) | 1.65 (0.40 - 6.91)
Q2 vs. Q1, full weights: 0.43 (0.09 - 2.09) | 1.81 (0.45 - 7.25) | 1.10 (0.18 - 6.67) | 1.35 (0.30 - 6.00)
Q3 vs. Q1, unweighted: 1.62 (0.70 - 3.72) | 1.36 (0.58 - 3.17) | 0.89 (0.23 - 3.51) | 1.27 (0.50 - 3.21)
Q3 vs. Q1, truncated weights: 0.61 (0.18 - 2.12) | 0.85 (0.22 - 3.25) | 0.83 (0.14 - 4.76) | 0.77 (0.18 - 3.22)
Q3 vs. Q1, full weights: 0.53 (0.08 - 3.39) | 0.89 (0.21 - 3.71) | 0.78 (0.13 - 4.75) | 0.51 (0.11 - 2.34)
Q4 vs. Q1, unweighted: 1.62 (0.66 - 3.95) | 1.51 (0.59 - 3.85) | 0.96 (0.24 - 3.82) | 1.86 (0.74 - 4.67)
Q4 vs. Q1, truncated weights: 0.39 (0.12 - 1.3) | 1.04 (0.26 - 4.25) | 0.16 (0.03 - 0.93) | 0.62 (0.17 - 2.34)
Q4 vs. Q1, full weights: 0.23 (0.05 - 1.07) | 0.89 (0.20 - 3.92) | 0.2 (0.03 - 1.18) | 0.68 (0.17 - 2.73)

Association among patients without asthma with the symptom “once in a while” (OR, 95% CI)
Q2 vs. Q1, unweighted: 0.84 (0.66 - 1.07) | 0.97 (0.77 - 1.22) | 1.22 (0.97 - 1.54) | 1.03 (0.83 - 1.28)
Q2 vs. Q1, truncated weights: 0.95 (0.68 - 1.31) | 0.93 (0.66 - 1.29) | 1.38 (0.99 - 1.93) | 1.18 (0.86 - 1.61)
Q2 vs. Q1, full weights: 0.82 (0.53 - 1.28) | 0.89 (0.55 - 1.42) | 1.27 (0.80 - 2.02) | 1.31 (0.85 - 2.02)
Q3 vs. Q1, unweighted: 0.9 (0.70 - 1.15) | 1.08 (0.85 - 1.35) | 1.14 (0.90 - 1.43) | 1.12 (0.90 - 1.39)
Q3 vs. Q1, truncated weights: 1.01 (0.73 - 1.41) | 1.08 (0.77 - 1.50) | 1.39 (0.99 - 1.95) | 1.37 (1.01 - 1.87)
Q3 vs. Q1, full weights: 1.07 (0.70 - 1.66) | 1.15 (0.73 - 1.82) | 1.38 (0.87 - 2.18) | 1.69 (1.11 - 2.57)
Q4 vs. Q1, unweighted: 1.15 (0.90 - 1.48) | 1.03 (0.82 - 1.30) | 1.23 (0.98 - 1.54) | 1.24 (1.003 - 1.54)
Q4 vs. Q1, truncated weights: 1.20 (0.86 - 1.68) | 1.01 (0.73 - 1.42) | 1.43 (1.03 - 2.00) | 1.21 (0.89 - 1.65)
Q4 vs. Q1, full weights: 1.23 (0.79 - 1.92) | 1.06 (0.67 - 1.66) | 1.47 (0.93 - 2.33) | 1.44 (0.94 - 2.21)

Association among patients without asthma with the symptom “some of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 0.96 (0.73 - 1.27) | 1.18 (0.86 - 1.63) | 1.09 (0.77 - 1.56) | 1.07 (0.77 - 1.48)
Q2 vs. Q1, truncated weights: 1.07 (0.73 - 1.57) | 1.07 (0.67 - 1.70) | 0.91 (0.54 - 1.54) | 1.11 (0.69 - 1.79)
Q2 vs. Q1, full weights: 0.95 (0.57 - 1.60) | 1.22 (0.63 - 2.35) | 0.83 (0.40 - 1.71) | 1.43 (0.75 - 2.76)
Q3 vs. Q1, unweighted: 1.05 (0.80 - 1.40) | 1.05 (0.76 - 1.47) | 1.13 (0.80 - 1.62) | 1.18 (0.86 - 1.63)
Q3 vs. Q1, truncated weights: 1.05 (0.71 - 1.56) | 1.07 (0.66 - 1.71) | 1.16 (0.70 - 1.93) | 1.08 (0.67 - 1.73)
Q3 vs. Q1, full weights: 1.01 (0.60 - 1.70) | 1.17 (0.60 - 2.27) | 1.50 (0.75 - 2.98) | 1.30 (0.67 - 2.54)
Q4 vs. Q1, unweighted: 1.26 (0.95 - 1.68) | 1.28 (0.94 - 1.76) | 1.33 (0.94 - 1.86) | 1.30 (0.95 - 1.78)
Q4 vs. Q1, truncated weights: 1.10 (0.73 - 1.64) | 1.01 (0.64 - 1.61) | 1.10 (0.66 - 1.82) | 1.16 (0.73 - 1.85)
Q4 vs. Q1, full weights: 1.05 (0.61 - 1.81) | 1.10 (0.57 - 2.12) | 0.84 (0.41 - 1.69) | 1.17 (0.60 - 2.26)

Association among patients without asthma with the symptom “most of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 0.95 (0.65 - 1.39) | 1.23 (0.65 - 2.30) | 1.10 (0.48 - 2.53) | 1.18 (0.65 - 2.14)
Q2 vs. Q1, truncated weights: 1.001 (0.57 - 1.76) | 0.86 (0.33 - 2.26) | 0.96 (0.26 - 3.48) | 1.13 (0.48 - 2.66)
Q2 vs. Q1, full weights: 1.22 (0.56 - 2.65) | 0.63 (0.21 - 1.84) | 0.45 (0.09 - 2.23) | 0.78 (0.25 - 2.39)
Q3 vs. Q1, unweighted: 0.83 (0.56 - 1.23) | 1.17 (0.62 - 2.22) | 1.56 (0.72 - 3.35) | 1.72 (0.99 - 2.99)
Q3 vs. Q1, truncated weights: 0.85 (0.47 - 1.52) | 1.43 (0.57 - 3.58) | 1.81 (0.53 - 6.18) | 1.62 (0.71 - 3.70)
Q3 vs. Q1, full weights: 0.91 (0.41 - 2.02) | 1.32 (0.39 - 4.45) | 0.83 (0.17 - 4.1) | 1.56 (0.50 - 4.86)
Q4 vs. Q1, unweighted: 1.18 (0.81 - 1.73) | 1.50 (0.83 - 2.73) | 1.47 (0.68 - 3.16) | 1.43 (0.81 - 2.53)
Q4 vs. Q1, truncated weights: 1.24 (0.71 - 2.17) | 1.45 (0.60 - 3.50) | 1.47 (0.43 - 4.97) | 1.38 (0.60 - 3.17)
Q4 vs. Q1, full weights: 2.05 (0.95 - 4.40) | 2.16 (0.68 - 6.90) | 0.66 (0.14 - 3.04) | 1.05 (0.35 - 3.14)

Association among patients without asthma with the symptom “all of the time” (OR, 95% CI)
Q2 vs. Q1, unweighted: 0.62 (0.32 - 1.18) | 0.62 (0.22 - 1.76) | 1.86 (0.34 - 10.29) | 0.76 (0.31 - 1.87)
Q2 vs. Q1, truncated weights: 0.76 (0.30 - 1.90) | 0.75 (0.17 - 3.31) | 2.46 (0.26 - 23.04) | 0.51 (0.14 - 1.83)
Q2 vs. Q1, full weights: 0.81 (0.23 - 2.86) | 1.85 (0.3 - 11.30) | 2.25 (0.24 - 21.38) | 0.48 (0.13 - 1.81)
Q3 vs. Q1, unweighted: 0.86 (0.47 - 1.58) | 1.04 (0.42 - 2.59) | 1.18 (0.19 - 7.21) | 0.33 (0.10 - 1.05)
Q3 vs. Q1, truncated weights: 0.64 (0.25 - 1.62) | 1.19 (0.31 - 4.64) | 1.18 (0.09 - 15.18) | 0.42 (0.09 - 1.88)
Q3 vs. Q1, full weights: 0.99 (0.26 - 3.68) | 2.28 (0.43 - 11.99) | 5.27 (0.35 - 80.09) | 1.92 (0.38 - 9.63)
Q4 vs. Q1, unweighted: 1.50 (0.86 - 2.63) | 1.30 (0.55 - 3.04) | 2.33 (0.46 - 11.74) | 1.25 (0.57 - 2.74)
Q4 vs. Q1, truncated weights: 1.75 (0.80 - 3.84) | 0.62 (0.16 - 2.41) | 1.27 (0.14 - 11.39) | 0.72 (0.24 - 2.19)
Q4 vs. Q1, full weights: 2.64 (0.90 - 7.69) | 0.59 (0.15 - 2.33) | 1.14 (0.12 - 10.65) | 1.24 (0.33 - 4.65)

Among patients with asthma, the fourth quartile (vs. the first quartile) of the summary UNGD metric was associated with decreased odds of having four asthma symptoms at baseline (vs. no symptoms) in the full weighted model (Table 6.4.6.4). Among patients without asthma, the fourth quartile (vs. the first quartile) of the summary UNGD metric was associated with increased odds of having two asthma symptoms at baseline (vs. no symptoms) in unweighted and full weighted models, but not with one, three, or four symptoms. At follow-up, the summary UNGD metric was associated with increased odds of one and two symptoms (vs. zero) among patients with asthma in the unweighted model only. It was not associated with number of asthma symptoms among patients without asthma (Table 6.4.6.5).

In a sensitivity analysis, we re-ran the follow-up model for number of asthma symptoms with a UNGD metric assigned over the two weeks before survey return; UNGD was not associated with number of asthma symptoms. When we instead assigned the UNGD metric over the year before survey return, the fourth quartile was associated with increased odds of having one asthma symptom among patients without asthma. There were no associations among patients with asthma, nor with two, three, or four symptoms among patients without asthma (Table 6.4.6.6).
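The sensitivity analysis changes only the window over which UNGD is assigned before survey return. The minimal Python sketch below illustrates that idea under several assumptions:

- Each patient has a hypothetical daily UNGD value, which is averaged over the window.
- Window averages are split into quartiles using cut points from the pooled distribution.
- The windows are approximated as 14, 182 and 365 days.

The construction of the summary UNGD metric is not described in this section, and the actual method may differ.

```python
from datetime import date, timedelta
from statistics import quantiles


def window_mean(daily_metric, end_date, days):
    """Average a daily metric over the `days` days before `end_date` (end_date excluded).

    `daily_metric` maps datetime.date -> float (hypothetical input). Days without a
    value are skipped; returns None if the window contains no values.
    """
    values = []
    for k in range(1, days + 1):
        day = end_date - timedelta(days=k)
        if day in daily_metric:
            values.append(daily_metric[day])
    return sum(values) / len(values) if values else None


def assign_quartiles(values):
    """Label each value 1-4 using the three quartile cut points of the pooled values.

    A value equal to a cut point falls in the lower quartile.
    """
    cuts = quantiles(values, n=4)
    return [1 + sum(v > c for c in cuts) for v in values]


if __name__ == "__main__":
    start = date(2014, 1, 1)
    daily = {start + timedelta(days=d): float(d) for d in range(400)}
    returned = start + timedelta(days=380)  # hypothetical survey return date
    for label, days in [("two weeks", 14), ("six months", 182), ("one year", 365)]:
        print(label, window_mean(daily, returned, days))
```

Shorter windows capture recent activity, which is closer to the day-before assignment used in the exacerbation study. Longer windows smooth over short-term changes in activity.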


Table 6.4.6.4. Association of UNGD metrics with number of asthma symptoms (multinomial) at baseline among patients with and without asthma from adjusted logistic models. Zero symptoms was the reference outcome. In each cell, the odds ratio and 95% confidence interval were reported from unweighted logistic regression, survey logistic regression with truncated weights, and survey logistic regression with full weights. Associations statistically significant at p=0.05 are bolded. Abbreviations: OR, odds ratio; CI, confidence interval

Outcome: Number of asthma symptoms (unweighted | truncated weights | full weights)

Association among asthma patients with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 0.95 (0.69 - 1.31) | 0.95 (0.57 - 1.60) | 0.89 (0.44 - 1.79)
Q3 vs. Q1: 0.94 (0.68 - 1.30) | 0.88 (0.52 - 1.49) | 0.79 (0.39 - 1.62)
Q4 vs. Q1: 1.14 (0.82 - 1.60) | 0.81 (0.47 - 1.40) | 0.54 (0.28 - 1.05)

Association among asthma patients with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.06 (0.67 - 1.67) | 0.82 (0.39 - 1.72) | 0.66 (0.28 - 1.55)
Q3 vs. Q1: 0.63 (0.38 - 1.05) | 0.62 (0.26 - 1.44) | 0.45 (0.17 - 1.16)
Q4 vs. Q1: 1.14 (0.71 - 1.83) | 1.15 (0.54 - 2.44) | 1.44 (0.51 - 4.04)

Association among asthma patients with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.98 (0.54 - 1.78) | 0.76 (0.30 - 1.88) | 0.77 (0.29 - 2.03)
Q3 vs. Q1: 0.86 (0.47 - 1.57) | 1.0004 (0.38 - 2.66) | 0.99 (0.35 - 2.76)
Q4 vs. Q1: 1.59 (0.90 - 2.80) | 2.20 (0.94 - 5.15) | 2.37 (0.93 - 6.06)

Association among asthma patients with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.79 (0.41 - 1.54) | 0.44 (0.15 - 1.26) | 0.33 (0.10 - 1.12)
Q3 vs. Q1: 1.004 (0.54 - 1.88) | 0.78 (0.31 - 1.95) | 0.55 (0.19 - 1.62)
Q4 vs. Q1: 0.91 (0.46 - 1.80) | 0.42 (0.14 - 1.23) | 0.27 (0.08 - 0.92)

Association among patients without asthma with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 1.03 (0.81 - 1.30) | 1.006 (0.70 - 1.45) | 1.16 (0.69 - 1.96)
Q3 vs. Q1: 0.92 (0.72 - 1.17) | 1.12 (0.78 - 1.61) | 1.19 (0.72 - 1.98)
Q4 vs. Q1: 1.17 (0.93 - 1.47) | 1.33 (0.93 - 1.89) | 1.40 (0.85 - 2.31)

Association among patients without asthma with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.45 (0.87 - 2.44) | 1.30 (0.64 - 2.62) | 1.31 (0.50 - 3.47)
Q3 vs. Q1: 1.84 (1.12 - 3.01) | 0.97 (0.47 - 1.99) | 1.44 (0.52 - 3.98)
Q4 vs. Q1: 1.98 (1.12 - 3.21) | 1.88 (0.98 - 3.61) | 2.98 (1.20 - 7.43)

Association among patients without asthma with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.34 (0.64 - 2.77) | 0.56 (0.18 - 1.70) | 0.61 (0.20 - 1.89)
Q3 vs. Q1: 1.33 (0.65 - 2.73) | 1.37 (0.49 - 3.80) | 1.44 (0.52 - 4.06)
Q4 vs. Q1: 1.19 (0.58 - 2.48) | 1.45 (0.52 - 4.08) | 4.61 (1.39 - 15.25)

Association among patients without asthma with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.01 (0.42 - 2.44) | 0.59 (0.14 - 2.48) | 1.77 (0.26 - 12.21)
Q3 vs. Q1: 0.85 (0.34 - 2.11) | 0.80 (0.19 - 3.35) | 1.98 (0.34 - 11.52)
Q4 vs. Q1: 1.76 (0.82 - 3.78) | 1.42 (0.42 - 4.78) | 1.55 (0.46 - 5.19)

Table 6.4.6.5. Association of UNGD metrics with number of asthma symptoms (multinomial) at follow-up among patients with and without asthma from adjusted logistic models. Zero symptoms was the reference outcome. In each cell, the odds ratio and 95% confidence interval were reported from unweighted logistic regression, survey logistic regression with truncated weights, and survey logistic regression with full weights. Associations statistically significant at p=0.05 are bolded. Abbreviations: OR, odds ratio; CI, confidence interval

Outcome: Number of asthma symptoms (unweighted | truncated weights | full weights)

Association among asthma patients with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 1.39 (0.90 - 2.14) | 0.93 (0.46 - 1.89) | 0.80 (0.35 - 1.86)
Q3 vs. Q1: 1.40 (0.91 - 2.15) | 1.18 (0.59 - 2.36) | 1.09 (0.42 - 2.82)
Q4 vs. Q1: 1.92 (1.24 - 2.95) | 1.53 (0.77 - 3.07) | 1.24 (0.54 - 2.85)

Association among asthma patients with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 3.40 (1.60 - 7.23) | 2.57 (0.77 - 8.57) | 1.18 (0.25 - 5.67)
Q3 vs. Q1: 2.25 (1.02 - 4.97) | 0.66 (0.21 - 2.01) | 0.27 (0.06 - 1.24)
Q4 vs. Q1: 2.56 (1.14 - 5.77) | 2.05 (0.57 - 7.42) | 0.85 (0.15 - 4.74)

Association among asthma patients with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.58 (0.72 - 3.50) | 2.05 (0.56 - 7.48) | 1.97 (0.50 - 7.72)
Q3 vs. Q1: 1.27 (0.56 - 2.87) | 0.71 (0.19 - 2.72) | 0.68 (0.17 - 2.75)
Q4 vs. Q1: 1.46 (0.63 - 3.37) | 1.16 (0.29 - 4.67) | 1.14 (0.27 - 4.90)

Association among asthma patients with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.38 (0.58 - 3.30) | 0.79 (0.22 - 2.86) | 0.38 (0.08 - 1.84)
Q3 vs. Q1: 1.17 (0.48 - 2.84) | 0.64 (0.18 - 2.30) | 0.30 (0.61 - 1.46)
Q4 vs. Q1: 1.03 (0.39 - 2.71) | 0.43 (0.11 - 1.74) | 0.22 (0.05 - 1.09)

Association among patients without asthma with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 1.14 (0.83 - 1.56) | 1.17 (0.73 - 1.89) | 1.32 (0.67 - 2.58)
Q3 vs. Q1: 1.001 (0.72 - 1.38) | 1.19 (0.74 - 1.93) | 1.41 (0.72 - 2.76)
Q4 vs. Q1: 1.35 (0.99 - 1.83) | 1.49 (0.94 - 2.35) | 1.80 (0.94 - 2.46)

Association among patients without asthma with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.71 (0.38 - 1.35) | 0.60 (0.24 - 1.48) | 0.66 (0.19 - 2.28)
Q3 vs. Q1: 1.01 (0.57 - 1.80) | 0.50 (0.21 - 1.24) | 0.52 (0.13 - 2.08)
Q4 vs. Q1: 0.95 (0.54 - 1.70) | 0.94 (0.43 - 2.09) | 1.33 (0.46 - 3.82)

Association among patients without asthma with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.93 (0.39 - 2.67) | 0.83 (0.22 - 3.22) | 0.84 (0.21 - 3.32)
Q3 vs. Q1: 1.07 (0.46 - 2.51) | 1.49 (0.41 - 5.42) | 2.55 (0.54 - 12.01)
Q4 vs. Q1: 0.91 (0.38 - 2.16) | 0.99 (0.25 - 3.84) | 2.04 (0.37 - 11.18)

Association among patients without asthma with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.50 (0.36 - 6.36) | 1.25 (0.13 - 12.30) | 1.12 (0.11 - 10.92)
Q3 vs. Q1: 1.09 (0.24 - 4.93) | 0.61 (0.08 - 4.82) | 0.57 (0.07 - 4.41)
Q4 vs. Q1: 2.58 (0.70 - 9.52) | 1.52 (0.20 - 11.36) | 1.31 (0.17 - 9.97)

Table 6.4.6.6. Association of UNGD metrics assigned over prior six months, two weeks, or one year and number of asthma symptoms (multinomial) at follow-up among patients with and without asthma from adjusted models. Zero symptoms is the reference outcome. In each cell, the odds ratio and 95% confidence interval were reported from survey logistic regression with truncated weights. Associations statistically significant at p=0.05 are bolded. Abbreviations: OR, odds ratio; CI, confidence interval

UNGD assigned over: past six months | past two weeks | past one year

Association among asthma patients with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 0.93 (0.46 - 1.89) | 0.70 (0.34 - 1.42) | 1.40 (0.69 - 2.85)
Q3 vs. Q1: 1.18 (0.59 - 2.36) | 0.86 (0.43 - 1.72) | 1.37 (0.67 - 2.80)
Q4 vs. Q1: 1.53 (0.77 - 3.07) | 1.28 (0.65 - 2.51) | 1.80 (0.88 - 3.68)

Association among asthma patients with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 2.57 (0.77 - 8.57) | 0.87 (0.27 - 2.79) | 2.60 (0.75 - 9.03)
Q3 vs. Q1: 0.66 (0.21 - 2.01) | 1.35 (0.45 - 4.11) | 0.66 (0.21 - 2.11)
Q4 vs. Q1: 2.05 (0.57 - 7.42) | 1.09 (0.33 - 3.67) | 2.18 (0.61 - 7.84)

Association among asthma patients with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 2.05 (0.56 - 7.48) | 0.96 (0.26 - 3.45) | 2.07 (0.57 - 7.50)
Q3 vs. Q1: 0.71 (0.19 - 2.72) | 0.7 (0.21 - 2.3) | 0.69 (0.17 - 2.84)
Q4 vs. Q1: 1.16 (0.29 - 4.67) | 0.78 (0.2 - 2.97) | 1.23 (0.32 - 4.77)

Association among asthma patients with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.79 (0.22 - 2.86) | 0.63 (0.18 - 2.18) | 1.49 (0.41 - 5.46)
Q3 vs. Q1: 0.64 (0.18 - 2.30) | 0.63 (0.17 - 2.32) | 0.54 (0.12 - 2.39)
Q4 vs. Q1: 0.43 (0.11 - 1.74) | 0.38 (0.09 - 1.53) | 1.01 (0.24 - 4.29)

Association among patients without asthma with 1 symptom (OR, 95% CI)
Q2 vs. Q1: 1.17 (0.73 - 1.89) | 0.91 (0.55 - 1.49) | 1.21 (0.75 - 1.96)
Q3 vs. Q1: 1.19 (0.74 - 1.93) | 1.34 (0.84 - 2.13) | 1.28 (0.79 - 2.08)
Q4 vs. Q1: 1.49 (0.94 - 2.35) | 1.44 (0.91 - 2.28) | 1.64 (1.03 - 2.60)

Association among patients without asthma with 2 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.60 (0.24 - 1.48) | 1.19 (0.52 - 2.71) | 0.75 (0.31 - 1.8)
Q3 vs. Q1: 0.50 (0.21 - 1.24) | 0.31 (0.11 - 0.87) | 0.49 (0.19 - 1.26)
Q4 vs. Q1: 0.94 (0.43 - 2.09) | 1.14 (0.5 - 2.6) | 1.05 (0.47 - 2.33)

Association among patients without asthma with 3 symptoms (OR, 95% CI)
Q2 vs. Q1: 0.83 (0.22 - 3.22) | 1.20 (0.32 - 4.52) | 0.84 (0.23 - 3.11)
Q3 vs. Q1: 1.49 (0.41 - 5.42) | 1.41 (0.36 - 5.46) | 1.48 (0.41 - 5.28)
Q4 vs. Q1: 0.99 (0.25 - 3.84) | 1.46 (0.39 - 5.48) | 0.96 (0.25 - 3.76)

Association among patients without asthma with 4 symptoms (OR, 95% CI)
Q2 vs. Q1: 1.25 (0.13 - 12.30) | 1.10 (0.10 - 11.65) | 1.56 (0.15 - 16.31)
Q3 vs. Q1: 0.61 (0.08 - 4.82) | 0.75 (0.10 - 5.40) | 0.68 (0.08 - 5.86)
Q4 vs. Q1: 1.52 (0.20 - 11.36) | 1.40 (0.18 - 10.88) | 1.71 (0.21 - 14.23)

6.4.7 Discussion

We conducted a study of the association of UNGD and self-reported asthma symptoms using survey data. Our earlier study of UNGD and objective asthma exacerbations found consistent associations between UNGD (assigned on the day prior to the exacerbation) and asthma exacerbations among patients with asthma. In contrast, here we found inconsistent, but generally null, associations between UNGD and asthma symptoms at baseline and follow-up, which were assessed over three- and six-month windows. At baseline, the statistically significant associations were primarily among patients without asthma. These results were somewhat unexpected, but there are several possible explanations, as discussed below.

The differences in the associations at baseline and follow-up may be due to seasonality. The baseline surveys were returned between March and October 2014, with a median date of April 22, 2014. The follow-up surveys were returned between November 2014 and May 2015, with a median date of November 12, 2014. Asthma exacerbations tend to peak in the fall, especially among children but also among adults. This pattern has been attributed to children returning to school and to respiratory viruses.6,7 Seasonal trends in asthma exacerbations could therefore have obscured an association between UNGD and asthma symptoms.

An additional potential problem is that the questions on asthma symptoms were not validated as a measure of asthma exacerbations. The two analyses (exacerbations vs. symptoms) therefore address two very different outcomes.8 The most likely explanation, however, is that the questionnaire asked about symptoms over too long a time frame (three or six months). In the exacerbation study, UNGD activity on the day prior to the exacerbation event was strongly associated with exacerbations. In the symptom study, by contrast, UNGD activity and symptoms were assessed over much longer periods. Finally, symptoms may have been too common and nonspecific to discriminate asthma exacerbations. For these reasons, we may not have observed associations between UNGD and asthma symptoms in these data, even though we previously observed associations between UNGD and clinically documented asthma exacerbations.

6.5 Greenness and asthma exacerbation study

There is increasing interest in the association between greenness and health outcomes. Potential pathways by which greenness may affect health include physical activity, air quality, stress reduction, and increased social contact and interaction.9

Greenness can be measured in a number of ways. These include the normalized difference vegetation index (NDVI), a measure of greenness derived from satellite data,9 and Light Detection and Ranging (LiDAR), a laser-based remote sensing method.10 Several studies have evaluated greenness and asthma prevalence or exacerbations (defined as hospitalizations) (Tables 6.5.1 and 6.5.2).

Studies of greenness and asthma prevalence have found conflicting results. One study did not find an association of greenness (measured using NDVI) with asthma prevalence.11 Two studies found increased odds of asthma prevalence with increasing greenness,12,13 and one study found decreased odds of asthma prevalence with increasing greenness.14

Both studies that evaluated associations between NDVI and asthma hospitalizations used an ecological design. One found significant negative correlations between NDVI and asthma hospitalization rates: in the spring only in municipalities with moderate urban cover, and in all seasons in municipalities with high urban cover.15 The other found no significant association.16

In this analysis, we evaluated the association of NDVI with asthma exacerbations, and whether NDVI was a confounder of the association between UNGD and asthma exacerbations (Figure 6.5.1). NDVI could be a confounder because it is associated with UNGD and may also be associated with asthma exacerbations. Our primary hypothesis was that NDVI could confound the UNGD-asthma exacerbation relationship. However, NDVI could also mediate the association of UNGD and asthma exacerbations; for example, UNGD could alter the landscape and result in less greenness.

Figure 6.5.1. Directed acyclic graph of UNGD, NDVI, and asthma exacerbations.

a: NDVI as a mediator of the association of UNGD and asthma exacerbations
b: NDVI as a confounder of the association of UNGD and asthma exacerbations
c: Ayres-Sampaio 2014 observed a negative association between increasing NDVI and asthma hospitalization rates.
d: Rasmussen 2016 observed a positive association between increasing UNGD and asthma exacerbations.
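The distinction between panels a and b of Figure 6.5.1 depends only on the direction of the arrow between NDVI and UNGD. The toy Python sketch below classifies a third variable from its direct edges with the exposure and the outcome. It assumes the edges listed in the code; in the confounding panel, NDVI is drawn as a cause of UNGD. A full analysis would apply the back-door criterion to the complete DAG, which this sketch does not do.

```python
def role_of(third, exposure, outcome, edges):
    """Classify `third` using only its direct edges with the exposure and outcome.

    `edges` is an iterable of (cause, effect) pairs.
    """
    e = set(edges)
    causes_outcome = (third, outcome) in e
    if (third, exposure) in e and causes_outcome:
        return "confounder"  # common cause of exposure and outcome
    if (exposure, third) in e and causes_outcome:
        return "mediator"    # lies on a causal path from exposure to outcome
    return "other"


# Panel b (assumed edges): NDVI as a confounder
PANEL_B = [("NDVI", "UNGD"), ("NDVI", "asthma exacerbation"), ("UNGD", "asthma exacerbation")]
# Panel a (assumed edges): NDVI as a mediator
PANEL_A = [("UNGD", "NDVI"), ("NDVI", "asthma exacerbation"), ("UNGD", "asthma exacerbation")]

if __name__ == "__main__":
    for name, edges in [("panel a", PANEL_A), ("panel b", PANEL_B)]:
        print(name, role_of("NDVI", "UNGD", "asthma exacerbation", edges))
```

Under the confounding structure, NDVI belongs in the adjustment set. Under the mediating structure, adjusting for NDVI would estimate only the part of the UNGD effect that does not pass through greenness.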


Table 6.5.1. Studies on greenness and asthma prevalence.

Dadvand 2014
Location: Sabadell, Spain
Population: 3,178 children aged 9-12 years
Design: Cross-sectional
Outcome: Asthma diagnosis
Exposure: NDVI in buffers of 100 m, 250 m, 500 m, and 1,000 m
Results, OR (95% CI): 100 m: 1.00 (0.82, 1.21); 250 m: 1.00 (0.78, 1.27); 500 m: 1.03 (0.79, 1.34); 1,000 m: 1.06 (0.85, 1.32)

Lovasi 2013
Location: New York City
Population: 492 5-year-olds; 427 7-year-olds
Design: Longitudinal (but analyzed cross-sectionally)
Outcome: Asthma diagnosis
Exposure: LiDAR and multispectral imagery at a 0.25 km radius
Results, OR (95% CI): Age 5: 1.11 (0.85, 1.45); Age 7: 1.17 (1.02, 1.33)

Sbihi 2015
Location: British Columbia, Canada
Population: 65,000 children
Design: Longitudinal (but analyzed cross-sectionally)
Outcome: Incident asthma during preschool age (0–5 years old)
Exposure: NDVI
Results, OR per IQR of NDVI (95% CI): 0.96 (0.93, 0.99)

Andrusaityte 2016
Location: Kaunas, Lithuania
Population: 112 children with asthma and 1,377 children without asthma
Design: Cross-sectional case-control
Outcome: Asthma diagnosis
Exposure: NDVI in buffers of 100, 300, and 500 m
Results, OR per IQR of NDVI (95% CI): 100 m: 1.43 (1.10, 1.85); 300 m: 1.23 (0.94, 1.61); 500 m: 1.18 (0.88, 1.57)

Abbreviations: OR, odds ratio; CI, confidence interval; IQR, interquartile range

Table 6.5.2. Studies on greenness and asthma exacerbations.

| Author | Location | Population | Design | Outcome | Exposure | Results |
| Ayres-Sampaio 2014 | Portugal | All people | Ecologic | Asthma hospitalization rates by municipality | Average NDVI in municipality | Pearson correlations: low urban cover municipalities, no statistically significant correlation; moderate urban cover municipalities, statistically significant negative correlation (-0.257) in spring only; high urban cover municipalities, statistically significant negative correlation in all seasons (ranging from -0.50 to -0.38) |
| Erdman 2015 | Northeastern United States | People aged 65 years and older | Ecologic | Asthma hospitalization rates by ZIP code | Most frequent NDVI value in ZIP code | RR (95% CI): 0.96 (0.86 - 1.06) |

Abbreviations: OR, odds ratio; CI, confidence interval; RR, relative risk.


6.5.1 Study population and covariates

We included all events from the UNGD and asthma exacerbation study (Chapter 3). Briefly, we identified 35,508 asthma patients from the Geisinger Clinic population. Among these patients, we identified 20,749 mild, 1,870 moderate, and 4,782 severe asthma exacerbations, and frequency-matched these on age, sex, and year of event to 14,104, 9,350, and 18,693 control index dates, respectively. For each event, we created covariates for age, sex, race/ethnicity, season of event, smoking status, overweight and obesity, Medical Assistance, type 2 diabetes, community socioeconomic deprivation, distance to nearest major and minor arterial road, and maximum temperature on the day prior to the event (Section 3.3.4).

6.5.2 Greenness measurement and exposure assignment

We used NDVI to measure greenness. NDVI data were obtained from the National Aeronautics and Space Administration's Moderate Resolution Imaging Spectroradiometer (MODIS), which has global coverage at 250-meter resolution and provides an NDVI measurement every 16 days. For this study, NDVI values were generalized in ArcGIS to a grid of 5 by 5 MODIS pixels, yielding grid cells of 1,250 meters by 1,250 meters. For each year of data (2006-12), we assigned each asthma patient's events the peak NDVI value for the year of the event in the grid cell that contained the patient's geocoded home address. Because data were available only for 2006-12, events in 2005 were assigned the 2006 NDVI values. Continuous NDVI values were highly correlated within each patient over 2006-2012 (correlations ranged from 0.91 to 0.97).
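The exposure-assignment steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ArcGIS workflow actually used: the 250-meter composites are held as nested lists, the 5 by 5 generalization is assumed to be a block mean, peak NDVI is taken as the per-cell maximum across a year's 16-day composites, and the function names (`generalize`, `peak_ndvi`, `assign_event_ndvi`) are hypothetical. The text does not say whether peaks were taken before or after generalization.

```python
from typing import Dict, List

Grid = List[List[float]]  # rows x columns of NDVI values


def generalize(grid: Grid, block: int = 5) -> Grid:
    """Average non-overlapping block-by-block windows of 250 m pixels.

    Assumption: mean aggregation. With block=5, each output cell covers
    5 x 250 m = 1,250 m on a side. Edge pixels that do not fill a whole
    block are dropped.
    """
    n_rows = len(grid) // block
    n_cols = len(grid[0]) // block
    out: Grid = []
    for br in range(n_rows):
        out_row = []
        for bc in range(n_cols):
            vals = [grid[r][c]
                    for r in range(br * block, (br + 1) * block)
                    for c in range(bc * block, (bc + 1) * block)]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out


def peak_ndvi(composites: List[Grid]) -> Grid:
    """Per-cell maximum across one year's 16-day composites (same grid shape)."""
    n_rows, n_cols = len(composites[0]), len(composites[0][0])
    return [[max(g[r][c] for g in composites) for c in range(n_cols)]
            for r in range(n_rows)]


def assign_event_ndvi(event_year: int, row: int, col: int,
                      peak_by_year: Dict[int, Grid]) -> float:
    """Peak NDVI of the cell (row, col) containing the geocoded home address.

    NDVI was not available for 2005, so 2005 events use the 2006 values.
    """
    year = 2006 if event_year == 2005 else event_year
    return peak_by_year[year][row][col]
```

Locating the cell that contains a geocoded address (the `row`, `col` arguments) is left out here; in practice that is a point-in-raster lookup in the projected coordinate system of the grid.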

6.5.3 Statistical analysis

For each asthma exacerbation outcome, we fit a multilevel logistic model with random intercepts for patient (to account for multiple events per patient) and community (using a mixed definition of place: townships, boroughs, and census tracts in cities). The models included NDVI (in quartiles), the spud (drilling-phase) UNGD metric, sex, race/ethnicity, season of event, smoking status, overweight and obesity, Medical Assistance, type 2 diabetes, community socioeconomic deprivation, distance to nearest major and minor arterial road, and maximum temperature on the day prior to the event as covariates. Continuous covariates (community socioeconomic deprivation, distance to nearest major and minor arterial road, and maximum temperature on the day prior to the event) were included as linear and quadratic terms.
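The two kinds of model terms described above, quartile indicators with the first quartile as reference and linear plus quadratic terms for a continuous covariate, can be constructed as in this sketch. It is an illustration, not the study code: the percentile method (Python's linear-interpolation "inclusive" method), the rule that values on a cut point go to the upper quartile, and the function names are assumptions. The continuous covariate is z-transformed before squaring, as the footnote to Table 6.5.4 describes for road distance.

```python
import bisect
import statistics
from typing import List, Sequence, Tuple


def quartile_cutpoints(values: Sequence[float]) -> List[float]:
    """Three cut points (25th, 50th, 75th percentiles), linear interpolation."""
    return statistics.quantiles(values, n=4, method="inclusive")


def quartile_indicators(x: float, cuts: Sequence[float]) -> List[int]:
    """Indicators for Q2, Q3, and Q4, with Q1 as the reference category.

    Values equal to a cut point are placed in the upper quartile (assumption).
    """
    q = bisect.bisect_right(cuts, x) + 1  # quartile number, 1-4
    return [int(q == k) for k in (2, 3, 4)]


def linear_and_quadratic(values: Sequence[float]) -> List[Tuple[float, float]]:
    """z-transform a continuous covariate and pair each value with its square."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [((v - mu) / sd, ((v - mu) / sd) ** 2) for v in values]
```

The resulting indicator and polynomial columns would then enter the multilevel logistic model alongside the random intercepts, which require a mixed-model fitting routine not shown here.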

6.5.4 Results

There was no association between NDVI and severe or mild asthma exacerbations. In the moderate asthma exacerbation analysis, the third quartile of NDVI was associated with lower odds of case status (odds ratio = 0.48, 95% confidence interval [0.27-0.86]), and the fourth quartile had similarly lower odds (odds ratio = 0.53) that did not reach statistical significance (p = 0.06). For all three outcomes, the odds ratios for UNGD were largely unchanged compared with models that did not include NDVI (Table 3.4.2).
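Because Wald confidence intervals for odds ratios are symmetric on the log scale, a reported estimate can be checked against its interval. The sketch below, which assumes normal-approximation (Wald) intervals and uses hypothetical function names, recovers the point estimate and two-sided p-value implied by a 95% CI, using the moderate-exacerbation NDVI estimates in Table 6.5.4.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_log_or(lower: float, upper: float) -> float:
    """Standard error of log(OR) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def implied_or(lower: float, upper: float) -> float:
    """Point estimate implied by the interval: its geometric midpoint."""
    return math.sqrt(lower * upper)


def two_sided_p(odds_ratio: float, lower: float, upper: float) -> float:
    """Two-sided Wald p-value, 2 * (1 - Phi(|z|)), written with erfc."""
    z = math.log(odds_ratio) / se_log_or(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# Moderate exacerbations, NDVI Q4 vs. Q1 in Table 6.5.4: 0.53 (0.27 - 1.03)
print(round(implied_or(0.27, 1.03), 2))         # 0.53
print(round(two_sided_p(0.53, 0.27, 1.03), 2))  # 0.06
```

The interval 0.27 to 1.03 implies an odds ratio of about 0.53 and p of about 0.06; the same check on 0.27 to 0.86 gives 0.48 for the third quartile.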

Table 6.5.4. Association between UNGD, NDVI, and asthma exacerbations (a)

| Exposure | Comparison | Severe, OR (95% CI) | Moderate, OR (95% CI) | Mild, OR (95% CI) |
| UNGD | Q2 vs. Q1 | 1.16 (0.98 - 1.37) | 1.57 (1.09 - 2.27) | 1.44 (1.28 - 1.62) |
| UNGD | Q3 vs. Q1 | 1.26 (1.05 - 1.50) | 1.55 (1.05 - 2.29) | 1.98 (1.75 - 2.24) |
| UNGD | Q4 vs. Q1 | 1.64 (1.37 - 1.96) | 1.56 (1.07 - 2.27) | 2.00 (1.75 - 2.27) |
| NDVI | Q2 vs. Q1 | 0.86 (0.70 - 1.06) | 0.80 (0.50 - 1.27) | 1.08 (0.96 - 1.22) |
| NDVI | Q3 vs. Q1 | 1.02 (0.80 - 1.30) | 0.48 (0.27 - 0.86) | 1.06 (0.93 - 1.22) |
| NDVI | Q4 vs. Q1 | 0.92 (0.70 - 1.21) | 0.53 (0.27 - 1.03) | 1.12 (0.96 - 1.31) |

Abbreviations: UNGD, unconventional natural gas development; NDVI, Normalized Difference Vegetation Index; OR, odds ratio; CI, confidence interval.
(a) Multilevel models with a random intercept for patient and community, adjusted for age category (5-12, 13-18, 19-44, 45-61, 62-74, 75+ years), sex (male, female), race/ethnicity (white, black, Hispanic, other), family history of asthma (yes vs. no), smoking status (never, former, current, missing), season (spring, March 22-June 21; summer, June 22-September 21; fall, September 22-December 21; winter, December 22-March 21), Medical Assistance (yes vs. no), overweight/obesity (normal, body mass index [BMI] < 85th percentile or BMI < 25 kg/m2; overweight, BMI = 85th-<95th percentile or BMI = 25-<30 kg/m2; obese, BMI ≥ 95th percentile or BMI ≥ 30 kg/m2, for children and adults, respectively; BMI missing), type 2 diabetes (yes vs. no), community socioeconomic deprivation (quartiles), distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), squared distance to nearest major and minor arterial road (truncated at the 98th percentile, meters, z-transformed), maximum temperature on the day prior to event (degrees Celsius), and squared maximum temperature on the day prior to event (degrees Celsius).

6.5.5 Discussion

We did not observe an association between NDVI and mild or severe asthma exacerbations. We observed lower odds of moderate exacerbations in the third quartile of NDVI, compared with the first, and similarly lower, though not statistically significant, odds in the fourth quartile. Unlike the ecological study in Portugal, which found a negative correlation between NDVI and asthma hospitalization rates at the municipality level, our study used individual-level data, so our results bear on the causes of asthma exacerbations in individual patients rather than on the causes of differences between populations (for example, by ZIP code or municipality). It is also possible that the relationship between greenness and asthma exacerbations differs between Europe and the United States, so that the association observed in Portugal may not hold in the United States. Finally, the association between UNGD and asthma exacerbations was largely unchanged between models with and without NDVI, indicating that it was not confounded by NDVI.

6.6 Greenness and depression symptoms study

There is a growing literature on greenness and health outcomes, including mental health.9 However, most studies of greenness and mental health used a general measure of mental health, the General Health Questionnaire (GHQ-12), as the outcome.17-24 Only one study evaluated the association of greenness with a depression-specific questionnaire.25 Here, we evaluated the association of greenness with depression symptoms. We hypothesized that greater greenness would be associated with fewer depression symptoms and that this association might differ by place type.

6.6.1 Greenness measure


As described in the study of greenness and asthma exacerbations (Section 6.5.2), we used the normalized difference vegetation index (NDVI), a satellite-derived measure of greenness.9 As in that study, we used peak NDVI and assigned each study participant the NDVI value for the grid cell that contained their geocoded home address. Within each place type, we truncated NDVI values at the 2nd and 98th percentiles and converted the truncated values to z-scores.
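The within-place-type truncation and standardization can be sketched as below. This is an illustration only: it assumes "truncated" means values beyond the 2nd and 98th percentiles are capped at those percentiles (rather than excluded), uses Python's linear-interpolation percentile method, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def truncate_and_standardize(values: Sequence[float]) -> List[float]:
    """Cap values at the 2nd and 98th percentiles, then convert to z-scores."""
    cuts = statistics.quantiles(values, n=50, method="inclusive")  # 2nd, 4th, ..., 98th
    lo, hi = cuts[0], cuts[-1]
    capped = [min(max(v, lo), hi) for v in values]
    mu = statistics.mean(capped)
    sd = statistics.stdev(capped)
    return [(v - mu) / sd for v in capped]


def ndvi_z_by_place_type(records: Sequence[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Standardize NDVI separately within each place type (e.g., borough, township)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for place_type, ndvi in records:
        groups[place_type].append(ndvi)
    return {pt: truncate_and_standardize(vals) for pt, vals in groups.items()}
```

Standardizing within place type means a z-score of 0 refers to the mean NDVI of boroughs in the borough model and of townships in the township model, not to a common reference value.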

6.6.2 Depression symptom data

The depression symptom data were described in the unconventional natural gas development and depression symptoms study (Chapter 4). Briefly, data came from a questionnaire sent to adult patients of the Geisinger Clinic in October 2014 by the Chronic Rhinosinusitis Integrative Studies Program.5,26 The survey design oversampled patients with diagnostic codes for chronic rhinosinusitis, allergic rhinitis, and asthma, as well as racial/ethnic minorities. A baseline questionnaire was mailed to 23,700 patients, and a follow-up questionnaire was mailed to all responders to the baseline questionnaire (Section 4.3.1). The follow-up questionnaire included the validated eight-item Patient Health Questionnaire depression scale (PHQ-8). We excluded patients who lived outside Pennsylvania and patients who answered no PHQ-8 questions, leaving 4,762 patients in this analysis.
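For reference, the PHQ-8 is scored by summing its eight items, each rated from 0 ("not at all") to 3 ("nearly every day"), for a total of 0 to 24; this bounded count is one reason count models are a natural choice for the outcome. The sketch below is illustrative: the function name is hypothetical, and scoring a partially completed questionnaire by summing only the answered items is an assumption, not necessarily the rule used in this study.

```python
from typing import Optional, Sequence

PHQ8_ITEMS = 8


def phq8_score(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Sum of eight items scored 0-3 (total 0-24); None marks an unanswered item.

    Returns None when no item was answered (such respondents were excluded).
    Assumption: partial questionnaires are scored by summing answered items only.
    """
    if len(responses) != PHQ8_ITEMS:
        raise ValueError("PHQ-8 has exactly eight items")
    answered = [r for r in responses if r is not None]
    if any(r not in (0, 1, 2, 3) for r in answered):
        raise ValueError("PHQ-8 items are scored 0-3")
    if not answered:
        return None
    return sum(answered)
```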

6.6.3 Covariates

As described in Chapter 4, we created covariates from the electronic health record for race/ethnicity (white, black, Hispanic); sex (male, female); use of Medical Assistance for health insurance, a measure of low family socioeconomic status (no, yes); age at survey return (years); smoking status at survey return (current, former, never); alcohol use at survey return (no; current, not heavy; current, heavy); and body mass index (BMI, kg/m2). We created covariates for well water and community socioeconomic deprivation (CSD) using patients' geocoded coordinates. We also created a covariate for the population density (per km2) of each study participant's community, using the mixed definition of place and data from the U.S. Census 2014 American Community Survey.27

6.6.4 Data analysis

As in the UNGD and depression symptoms study (Chapter 4), we used negative binomial regression to evaluate the association of NDVI with depression symptoms. All models were weighted using truncated survey weights (Section 4.3.6).

We included the NDVI z-score as linear and quadratic terms. Race/ethnicity, sex, use of Medical Assistance, age, smoking status, alcohol use, BMI, well water, and CSD were included as covariates; age, BMI, and CSD were entered as centered linear and centered squared terms to allow for nonlinearity. We stratified by place type (borough and township) because NDVI distributions overlapped little across place types (Figure 6.6.4.1), as has been observed in another study of greenness and health outcomes in the Geisinger region.28
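As a sketch of what a survey-weighted negative binomial fit maximizes, the code below writes the weighted pseudo-log-likelihood for the common NB2 parameterization with a log link (variance = mu + alpha * mu^2). The text does not state the parameterization, so NB2 is an assumption, and the function names are hypothetical. Maximizing this over the coefficients and alpha (e.g., by Newton-Raphson) gives point estimates; exponentiated coefficients are the multiplicative changes in the expected symptom score reported in Tables 6.6.5.1 and 6.6.5.2. Design-based standard errors require a separate variance estimator, which is not shown.

```python
import math
from typing import Sequence


def nb2_logpmf(y: int, mu: float, alpha: float) -> float:
    """log P(Y = y) for a negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # size parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def weighted_loglik(beta: Sequence[float], alpha: float,
                    X: Sequence[Sequence[float]], y: Sequence[int],
                    w: Sequence[float]) -> float:
    """Survey-weighted pseudo-log-likelihood with log link: mu_i = exp(x_i . beta)."""
    total = 0.0
    for xi, yi, wi in zip(X, y, w):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += wi * nb2_logpmf(yi, mu, alpha)
    return total
```

Each respondent's log-likelihood contribution is multiplied by their (truncated) survey weight, so oversampled groups count for less and undersampled groups for more.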

We did not fit a model for cities because few responders lived in cities (n = 380). In the township model, we hypothesized that social isolation could confound the association between NDVI and depression symptoms. We used population density (per km2) as a measure of social isolation. Social isolation is a risk factor for depression, and in our data, population density appeared to be associated with NDVI (Figure 6.6.4.2). To test this hypothesis, we added population density (centered, linear and quadratic) as a covariate to the township model of NDVI and depression symptoms.

Figure 6.6.4.1. Peak normalized difference vegetation index in 2014 by place type among study participants.


Abbreviation: NDVI, normalized difference vegetation index

Figure 6.6.4.2. LOWESS smoother and scatter plot of population density (per km2) and peak normalized difference vegetation index in 2014 among study participants in townships. Abbreviation: NDVI, normalized difference vegetation index


6.6.5 Results

We identified 3,088 study participants in townships and 1,294 in boroughs.

Among study participants in boroughs, NDVI was not associated with depression symptoms in unadjusted or adjusted models (Table 6.6.5.1). Among study participants in townships, the linear term was not associated with depression symptoms, but there was a quadratic association of NDVI with depression symptoms (global p value for the linear and quadratic terms = 0.01); associations were largely unchanged when population density was added to the model (Table 6.6.5.2 and Figure 6.6.5).

Table 6.6.5.1. Association of peak normalized difference vegetation index with depression symptoms among study participants in boroughs in survey-weighted negative binomial regressions (a)

| | Unadjusted | Adjusted |
| Included in model (n) | 1,294 | 1,294 |
| Covariates | None | All (b) |
| Exponentiated coefficient (95% CI): | | |
| NDVI (c) | 0.98 (0.90 - 1.07) | 1.03 (0.94 - 1.13) |
| NDVI² | 1.02 (0.94 - 1.11) | 1.02 (0.93 - 1.11) |

Abbreviations: CI, confidence interval; NDVI, normalized difference vegetation index.
(a) Using truncated survey weights.
(b) Covariates included: race/ethnicity, sex, Medical Assistance, age (centered, centered and squared), smoking status, BMI (centered, centered and squared), well water, alcohol use, and CSD (centered, centered and squared).
(c) z-transformed NDVI.

Table 6.6.5.2. Association of peak normalized difference vegetation index with depression symptoms among study participants in townships in survey-weighted negative binomial regressions (a)

| | Unadjusted | Adjusted | Adjusted additionally for population density |
| Included in model (n) | 3,088 | 3,088 | 3,088 |
| Covariates | None | All (b) | All plus population density (linear and quadratic) |
| Exponentiated coefficient (95% CI): | | | |
| NDVI (c) | 1.001 (0.94 - 1.07) | 1.01 (0.93 - 1.08) | 1.02 (0.94 - 1.10) |
| NDVI² | 1.08 (1.02 - 1.12) | 1.06 (1.01 - 1.11) | 1.06 (1.01 - 1.11) |

Abbreviations: CI, confidence interval; NDVI, normalized difference vegetation index.
(a) Using truncated survey weights.
(b) Covariates included: race/ethnicity, sex, Medical Assistance, age (centered, centered and squared), smoking status, BMI (centered, centered and squared), well water, alcohol use, and CSD (centered, centered and squared).
(c) z-transformed NDVI.

Figure 6.6.5. Association of peak normalized difference vegetation index with depression symptoms among study participants in townships in adjusted survey negative binomial regressions. Abbreviation: NDVI, normalized difference vegetation index
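Because NDVI entered the township models as a z-score with linear and quadratic terms, the shape of the fitted curve can be reconstructed from the exponentiated coefficients: the expected score at z, relative to z = 0, is exp(b1*z + b2*z^2), with a turning point at z = -b1 / (2*b2). The sketch below plugs in the rounded adjusted estimates from Table 6.6.5.2 (1.01 for NDVI and 1.06 for NDVI²); because of rounding and because uncertainty is ignored, it is illustrative only, and the function names are hypothetical.

```python
import math


def score_ratio(z: float, exp_b1: float, exp_b2: float) -> float:
    """Expected depression-symptom score at NDVI z-score z relative to z = 0."""
    b1, b2 = math.log(exp_b1), math.log(exp_b2)
    return math.exp(b1 * z + b2 * z ** 2)


def turning_point(exp_b1: float, exp_b2: float) -> float:
    """z at which the log-scale quadratic is minimized (b2 > 0) or maximized."""
    b1, b2 = math.log(exp_b1), math.log(exp_b2)
    return -b1 / (2 * b2)


print(round(turning_point(1.01, 1.06), 2))  # -0.09
```

With these estimates, the minimum expected score falls just below the township mean NDVI, and expected scores rise on either side of it, which is the pattern described in the discussion below.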

6.6.6 Discussion

We studied the association of NDVI, a measure of residential greenness, with depression symptoms. We stratified the analysis by place type because NDVI distributions differed greatly across place types, with little overlap. We observed no association between NDVI and depression symptoms among study participants in boroughs. In townships, we observed a nonlinear association between NDVI and depression symptoms: below the mean, increasing NDVI was associated with fewer depression symptoms, whereas above the mean, increasing NDVI was associated with more depression symptoms. These results contrast with those of the prior study of NDVI and depression symptoms, which used data from a statewide survey of Wisconsin residents.25

That study found that higher NDVI was associated with fewer depression symptoms. It controlled for urbanicity by including in its models Rural-Urban Commuting Area codes (a categorical census tract-level variable created by the U.S. Department of Agriculture to measure urbanicity based on commuting flow estimates) and census tract population density. However, that study did not stratify by place type or urbanicity, so if greenness in Wisconsin differs as much between urban and rural areas as it does in our study, its estimates may have relied on extrapolation beyond the data.

Additionally, that study did not evaluate nonlinear relationships between NDVI and depression symptoms, which could also help explain the differences between our results and theirs.

We hypothesized that social isolation (in this study, measured by population density) could explain the positive association for study participants with NDVI above the mean, but when we added population density to the model, associations remained largely unchanged. It is possible that population density is a poor measure of social isolation and that we would see different results with a different measure of social isolation.

6.7 References

1. Pacheco JA, Avila PC, Thompson JA, et al. A highly specific algorithm for identifying asthma cases and controls for genome-wide association studies. AMIA Annu Symp Proc. 2009;2009:497-501.

2. Tisnado DM, Adams JL, Liu H, et al. What is the concordance between the medical record and patient self-report as data sources for ambulatory care? Med Care. 2006;44(2):132-140.

3. Skinner KM, Miller DR, Lincoln E, Lee A, Kazis LE. Concordance between respondent self-reports and medical records for chronic conditions: Experience from the Veterans Health Study. J Ambul Care Manage. 2005;28(2):102-110.

4. Corser W, Sikorskii A, Olomu A, Stommel M, Proden C, Holmes-Rovner M. Concordance between comorbidity data from patient self-report interviews and medical record documentation. BMC Health Serv Res. 2008;8:85.

5. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2016.

6. Johnston NW, Sears MR. Asthma exacerbations. 1: Epidemiology. Thorax. 2006;61(8):722-728.

7. Johnston NW, Johnston SL, Norman GR, Dai J, Sears MR. The September epidemic of asthma hospitalization: School children as disease vectors. J Allergy Clin Immunol. 2006;117(3):557-562.

8. Juniper EF. Validated questionnaires should not be modified. Eur Respir J. 2009;34(5):1015-1017. doi: 10.1183/09031936.00110209.

9. James P, Banay RF, Hart JE, Laden F. A review of the health benefits of greenness. Curr Epidemiol Rep. 2015;2(2):131-142.

10. National Ocean Service. What is LIDAR? National Oceanic and Atmospheric Administration Web site. http://oceanservice.noaa.gov/facts/lidar.html. Published May 29, 2015. Updated 2015. Accessed May 9, 2016.

11. Dadvand P, Villanueva CM, Font-Ribera L, et al. Risks and benefits of green spaces for children: A cross-sectional study of associations with sedentary behavior, obesity, asthma, and allergy. Environ Health Perspect. 2014;122(12):1329-1335.

12. Andrusaityte S, Grazuleviciene R, Kudzyte J, Bernotiene A, Dedele A, Nieuwenhuijsen MJ. Associations between neighbourhood greenness and asthma in preschool children in Kaunas, Lithuania: A case-control study. BMJ Open. 2016;6(4):e010341.

13. Lovasi GS, O'Neil-Dunne JP, Lu JW, et al. Urban tree canopy and asthma, wheeze, rhinitis, and allergic sensitization to tree pollen in a New York City birth cohort. Environ Health Perspect. 2013;121(4):494-500.

14. Sbihi H, Tamburic L, Koehoorn M, Brauer M. Greenness and incident childhood asthma: A 10-year follow-up in a population-based birth cohort. Am J Respir Crit Care Med. 2015;192(9):1131-1133.

15. Ayres-Sampaio D, Teodoro AC, Sillero N, Santos C, Fonseca J, Freitas A. An investigation of the environmental determinants of asthma hospitalizations: An applied spatial approach. Appl Geogr. 2014;47:10-19.

16. Erdman E, Liss A, Gute D, Rioux C, Koch M, Naumova E. Does the presence of vegetation affect asthma hospitalizations among the elderly? A comparison between rural, suburban, and urban areas. International Journal of Environment and Sustainability (IJES). 2015;4(1).

17. Sugiyama T, Leslie E, Giles-Corti B, Owen N. Associations of neighbourhood greenness with physical and mental health: Do walking, social coherence and local social interaction explain the relationships? J Epidemiol Community Health. 2008;62(5):e9.

18. Maas J, van Dillen SM, Verheij RA, Groenewegen PP. Social contacts as a possible mechanism behind the relation between green space and health. Health Place. 2009;15(2):586-595.

19. Triguero-Mas M, Dadvand P, Cirach M, et al. Natural outdoor environments and mental and physical health: Relationships and mechanisms. Environ Int. 2015;77:35-41.

20. Sarkar C, Gallacher J, Webster C. Urban built environment configuration and psychological distress in older men: Results from the Caerphilly study. BMC Public Health. 2013;13:695.

21. Mitchell R. Is physical activity in natural environments better for mental health than physical activity in other environments? Soc Sci Med. 2013;91:130-134.

22. White MP, Alcock I, Wheeler BW, Depledge MH. Would you be happier living in a greener urban area? A fixed-effects analysis of panel data. Psychol Sci. 2013;24(6):920-928.

23. Annerstedt M, Ostergren PO, Bjork J, Grahn P, Skarback E, Wahrborg P. Green qualities in the neighbourhood and mental health - results from a longitudinal cohort study in southern Sweden. BMC Public Health. 2012;12:337.

24. Astell-Burt T, Mitchell R, Hartig T. The association between green space and mental health varies across the lifecourse. A longitudinal study. J Epidemiol Community Health. 2014;68(6):578-583.

25. Beyer KM, Kaltenbach A, Szabo A, Bogar S, Nieto FJ, Malecki KM. Exposure to neighborhood green space and mental health: Evidence from the Survey of the Health of Wisconsin. Int J Environ Res Public Health. 2014;11(3):3453-3472.

26. Hirsch AG, Stewart WF, Sundaresan AS, et al. Nasal and sinus symptoms and chronic rhinosinusitis in a population-based sample. Allergy. 2016.

27. U.S. Census Bureau. American Community Survey, 2014. Table B01003. http://factfinder2.census.gov. Accessed August 22, 2016.

28. Casey JA, James P, Rudolph KE, Wu CD, Schwartz BS. Greenness and birth outcomes in a range of Pennsylvania communities. Int J Environ Res Public Health. 2016;13(3). doi: 10.3390/ijerph13030311.


Chapter 7: Discussion

7.1 Summary of findings

The aims of this thesis were to: 1) evaluate associations of UNGD activity metrics, based on wells only, with asthma exacerbations; 2) evaluate associations of these well-based UNGD activity metrics with depressive symptoms and with disordered sleep; 3) compare the different approaches to UNGD activity assessment used in published studies with one another and in their associations with mild asthma exacerbations; and 4) determine whether and how other exposure-relevant aspects of UNGD, such as impoundments, compressor engines, and flaring events, should be incorporated into UNGD activity metrics. We began data analysis in fall 2013 and finished in late 2016. This chapter summarizes the findings for these four primary aims (presented in three manuscripts) and discusses policy implications and future research directions.

The first two aims were addressed with epidemiologic studies, which are presented in separate chapters of this thesis. In Chapter 3, we evaluated the associations of four well-based UNGD metrics (pad preparation, drilling, stimulation, and production) with mild, moderate, and severe asthma exacerbation outcomes (new oral corticosteroid medication orders for asthma, asthma emergency department visits, and asthma hospitalizations, respectively). We found associations for 11 of the 12 UNGD activity metric-asthma exacerbation pairs. Odds ratios (OR) for the high UNGD group, compared with the very low group, ranged from 1.5 (95% confidence interval [CI], 1.2 - 1.7) for the association of the pad metric with severe exacerbations to 4.4 (95% CI, 3.8 - 5.2) for the association of the production metric with mild exacerbations. These associations were robust to increasing levels of covariate control and in several sensitivity analyses. We hypothesized that these associations, if causal, were biologically plausible and could operate through air pollution and/or stress pathways.


For the second aim (Chapter 4), we evaluated the association of UNGD activity with depression symptoms in a population of adults surveyed about their health symptoms. We chose depression symptoms as the outcome because, like asthma exacerbations, depression and its symptoms have been associated with exposure to air pollution and stress in prior studies. In this study, we used a summary UNGD metric combining the four phases of well development, rather than modeling each phase separately as in the asthma exacerbation study. We evaluated whether migraine headache or fatigue mediated this association because a prior study of this surveyed population found associations of UNGD activity with symptoms of migraine and fatigue.1 We also evaluated the association of UNGD activity with disordered sleep diagnoses in the Geisinger electronic health record (EHR) because we hypothesized that disordered sleep could mediate a UNGD activity-depression association. We observed an association between UNGD activity and depression (e.g., for mild depression, OR = 1.5 [95% CI, 1.1 - 2.0] for high vs. very low UNGD activity), but not between UNGD activity and disordered sleep. Fatigue, but not migraine, appeared to partially mediate the association between UNGD activity and depression. These associations were robust to increasing levels of covariate control but were present only in survey-weighted models.

The third and fourth aims are presented in a single thesis chapter (Chapter 5). To address these aims, we completed several analyses related to the UNGD metrics used in epidemiology studies of UNGD to date. First, we identified and described UNGD-related compressor stations, impoundments, and flaring events in Pennsylvania. Second, we used principal component analysis to understand the relationships among GIS-based metrics for compressors, impoundments, and four phases of well development (pad preparation, drilling, stimulation, and production). Finally, we compared how three different metrics used in UNGD and health studies to date categorized the case and control dates identified in the asthma exacerbation study (Chapter 3), and how they were associated with mild asthma exacerbations. These three metrics were a categorical distance to the nearest drilled well (DNDW) metric, based on Rabinowitz2; an inverse-distance metric based on the drilling phase (IDD), based on McKenzie3 and Stacy4; and an inverse-distance-squared metric incorporating four phases of well development and compressor engines (IDS4PC), based on our research. We identified 361 UNGD-related compressor stations, 1,218 impoundments, and 216 locations with flaring events in Pennsylvania. Using principal component analysis, we found that a single component captured most of the variation among the metrics for compressors, impoundments, and the four phases of well development, with approximately equal loading weights for each metric. When we compared the three GIS-based UNGD metrics with each other and in their associations with mild asthma exacerbations, we found that they ranked persons differently across a gradient of UNGD and had different magnitudes of association with mild asthma exacerbations. Although the highest category of each metric (vs. the lowest) was associated with the outcome, the IDS4PC metric was most strongly associated with mild asthma exacerbations and showed the clearest pattern of increasing odds across categories of increasing UNGD, followed by the DNDW metric and then the IDD metric.
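The three metric families compared above differ mainly in how they summarize proximity: the distance to the single nearest well, a sum of inverse distances, or a sum of inverse squared distances across several source types. The sketch below shows generic versions of each for illustration only. The distance cutoffs (1, 2, and 3 km), the 1-meter distance floor, the equal weighting across source types, and the function names are hypothetical, and the well-characteristic weights, distance limits, and time windows used in Chapter 5 and the cited studies are not reproduced.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # projected x, y coordinates in meters


def distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_well_category(home: Point, wells: Sequence[Point],
                          cutoffs_m: Sequence[float] = (1000.0, 2000.0, 3000.0)) -> int:
    """DNDW-style metric: category of distance to the nearest drilled well.

    0 is the closest category; len(cutoffs_m) means beyond the last cutoff
    (or no wells). The cutoffs are hypothetical placeholders.
    """
    if not wells:
        return len(cutoffs_m)
    d = min(distance(home, w) for w in wells)
    for k, cutoff in enumerate(cutoffs_m):
        if d <= cutoff:
            return k
    return len(cutoffs_m)


def inverse_distance_sum(home: Point, sources: Sequence[Point], power: int = 1,
                         weights: Optional[Sequence[float]] = None,
                         min_distance_m: float = 1.0) -> float:
    """Sum of w / d**power; power=1 resembles IDD, power=2 an IDS-type metric.

    min_distance_m guards against division by zero (hypothetical choice).
    """
    if weights is None:
        weights = [1.0] * len(sources)
    return sum(w / max(distance(home, s), min_distance_m) ** power
               for s, w in zip(sources, weights))


def combined_ids(home: Point, sources_by_type: Dict[str, Sequence[Point]]) -> float:
    """IDS4PC-style sketch: inverse-distance-squared sums for each source type
    (e.g., pad, drilling, stimulation, production, compressor), added with equal weight."""
    return sum(inverse_distance_sum(home, pts, power=2)
               for pts in sources_by_type.values())
```

The contrast is visible even in this simplified form: a nearest-well category ignores how many wells surround a home, whereas the inverse-distance sums grow with both the number and the closeness of sources, and squaring the distance concentrates weight on the nearest ones.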

7.2 Health impacts of energy production and use

Before discussing the health implications of UNGD, it is important to note that energy production and use, regardless of the source, have both health benefits and adverse health impacts. In low-income countries, a lack of access to affordable energy is a barrier to health and economic potential,5 and increasing energy use improves health. At the national level, for example, life expectancy initially increases and infant mortality decreases with increasing energy consumption, though the gains quickly level off (Figure 7.2.1). In higher-income countries, the overuse or inefficient use of energy has health impacts when the energy source affects air quality. Producing and burning fossil fuels results in occupational accidents and the emission of pollutants including particulate matter, nitrogen oxides, and sulfur dioxide. In the United Kingdom, for example, the use of energy for electricity generation in 2001 was estimated to be responsible for 3,778 deaths, 85% of which were attributed to coal.6 In the United States, energy use per person is much higher than in other countries with similar levels of economic development, but the additional energy use does not translate to improved health or wellbeing outcomes, suggesting that United States energy use could be lowered.

Figure 7.2.1. Scatter plot of infant mortality and life expectancy vs. energy consumption per person.6 The size of each bubble is proportionate to the country's population.

While not the focus of this thesis, the production and use of energy also have climate implications, which in turn affect public health. When burned for electricity, natural gas emits only half as much carbon dioxide per unit of energy as coal.7 However, UNGD releases fugitive methane, and studies of the magnitude of these emissions have reached conflicting conclusions; some suggest that, because of fugitive emissions, natural gas produced by UNGD is worse for the climate than coal.8-12

7.3 Future research directions and policy implications

When the research in this thesis began, there was one unpublished epidemiology study of UNGD and pregnancy outcomes.13 Since then, several studies have been conducted and published,1-4,14-19 including those in this thesis, but the evidence on UNGD and health remains inconclusive. Below we discuss future research directions for studies of UNGD and health, as well as policy implications given the current state of knowledge.

7.3.1 Research opportunities

Several frameworks exist to evaluate whether these associations are causal, from Bradford Hill's criteria in 1965 (Table 7.3.1)20 to modern causal inference methods.21 In studies of UNGD and health, we are limited to observational studies because we cannot expose populations to something that may be harmful, so the experiment criterion is not relevant.

Table 7.3.1. Bradford Hill's causal criteria.
Strength
Consistency
Specificity
Temporality
Biological gradient
Plausibility
Coherence
Experiment
Analogy


Although experimental studies of UNGD and health outcomes in people are not possible, many other potential studies could help determine whether the relationship between UNGD and health outcomes is causal.

7.3.1.1 Replication in other shale basins

A causal relationship cannot be established from a single study. This thesis contains the first studies to evaluate associations of UNGD with asthma exacerbations, depression, and disordered sleep. These studies have not yet been replicated, but they need to be, particularly in other shale basins, where UNGD practices, and thus potential health impacts, may differ from those in the Marcellus shale in Pennsylvania. The asthma exacerbation and disordered sleep studies would be difficult to replicate without EHR data. Researchers could conduct these replication studies in health systems with EHRs in other regions with UNGD, for example, Texas and Colorado. Both states have health care systems that are members of the Health Care Systems Research Network (www.hcsrn.org), which could provide a source of EHR data. The depression study, which relied on a questionnaire, could be replicated without EHR data or within a health system with EHR data, as we did. Once the studies in this thesis have been replicated, it would be possible to evaluate whether the associations are consistent across studies and whether the strength of association is similar across studies, pieces of evidence that would inform whether the observed associations are causal.

7.3.1.2 Improving UNGD exposure assessment in epidemiology studies

A limitation of the epidemiology studies in this thesis is that they did not incorporate environmental measurements (e.g., air pollution measurements) or biomarkers (e.g., cortisol). Instead, we designed the GIS-based proxies for UNGD to capture all potential pathways; without environmental measurements, however, we cannot say definitively which components of UNGD our metrics capture. We hypothesize that, if the relationships between UNGD and the health outcomes we evaluated are causal, stress and air pollution are the two primary pathways, but without exposure assessment methods specific to stress or air pollution we cannot test these hypotheses. While the GIS-based proxies we used for UNGD were defensible as a low-cost exposure assessment method in the initial studies of UNGD and health, future studies should aim to improve exposure assessment so that they can evaluate specific pathways, including air pollution, stress, and noise.

One study that would be useful both for policy and for informing future research would measure levels of noise, criteria air pollutants, and hazardous air pollutants at different distances from UNG wells, at different phases of well development, and at wells of varying depths and volumes of natural gas production. Such a study would help policymakers determine minimum distances (setbacks) from UNG wells to homes. Setback distances vary across jurisdictions and have largely been decided through political negotiation rather than scientific study (in Texas, the minimum setback distance is 200 feet, but in Pennsylvania and Colorado, it is 500 feet22). This study would also help epidemiologists gain insight into which pathways may be operating at varying distances from wells.
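A distance-based exposure study of this kind starts from a simple calculation: how far each home is from each well, and whether that distance falls inside a jurisdiction's setback. The sketch below is a hypothetical illustration, not a method used in this thesis. It assumes homes and wells are given as latitude/longitude in degrees, uses a spherical-Earth (haversine) approximation, hard-codes only the setback values cited in the text (Texas 200 feet; Pennsylvania and Colorado 500 feet), and uses made-up distance bands for stratifying measurements.

```python
import math

# Minimum setbacks from UNG wells to homes (feet), as cited in the text.
SETBACK_FT = {"TX": 200, "PA": 500, "CO": 500}

FT_PER_M = 3.28084
EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def distance_band(dist_ft, edges=(500, 1000, 2500, 5280)):
    """Label a distance with a band for stratifying measurements.
    The band edges (feet) are hypothetical, chosen only for illustration."""
    lower = 0
    for upper in edges:
        if dist_ft < upper:
            return f"{lower}-{upper} ft"
        lower = upper
    return f">={lower} ft"


def check_home(home, well, state):
    """Return (distance in feet, inside the state's setback?, distance band).
    Distance is measured to the wellhead point, an assumption of this sketch."""
    dist_ft = haversine_m(*home, *well) * FT_PER_M
    return dist_ft, dist_ft < SETBACK_FT[state], distance_band(dist_ft)
```

For example, `check_home((41.001, -76.5), (41.0, -76.5), "PA")` gives a distance of roughly 365 feet, which is inside Pennsylvania's 500-foot setback but would be outside Texas's 200-foot setback. A real study would likely use projected coordinates and measure from the edge of the well pad.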

Future epidemiology studies could evaluate the association of UNGD with health outcomes using environmental measurements and biomarkers instead of GIS-based proxies. For example, a cohort study could periodically collect cortisol biomarkers from patients living at varying distances from UNGD. The analysis could then compare cortisol levels within the same person at different phases of well development (and in between phases of development), and across people living at different densities of UNG wells. A similar study could be conducted for noise and air pollutants using personal monitors. Because drilling has slowed in recent years, studies improving exposure assessment methods may need to wait until drilling picks up again so that they can include a large enough exposed population.

Epidemiology studies incorporating cortisol biomarkers or personal measurements of air pollution or noise would strengthen the evidence on UNGD and health by clarifying the mechanisms linking UNGD to health outcomes.
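The within-person cortisol comparison proposed above could, in its simplest form, be computed as follows. For each participant, take their mean biomarker level during active nearby development minus their mean level during inactive periods, then average these differences across participants. The sketch below is a hypothetical illustration, not an analysis from this thesis. It assumes development phases have already been collapsed to a single active/inactive flag, and it ignores diurnal variation in cortisol and other covariates, which a real analysis (for example, a mixed-effects model) would need to handle.

```python
from collections import defaultdict
from statistics import mean


def within_person_contrast(samples):
    """Average within-person difference in a biomarker between active and
    inactive periods of nearby well development.

    samples: iterable of (person_id, active, value) tuples, where `active` is a
    hypothetical flag for whether nearby development was in an active phase.
    Returns (mean difference, number of contributing participants).
    Participants sampled in only one type of period cannot contribute and are skipped.
    """
    by_person = defaultdict(lambda: {True: [], False: []})
    for person_id, active, value in samples:
        by_person[person_id][bool(active)].append(value)

    diffs = [
        mean(periods[True]) - mean(periods[False])
        for periods in by_person.values()
        if periods[True] and periods[False]
    ]
    if not diffs:
        raise ValueError("no participant has samples in both active and inactive periods")
    return mean(diffs), len(diffs)


# Hypothetical cortisol values (arbitrary units) for three participants.
samples = [
    ("A", True, 12.0), ("A", True, 14.0), ("A", False, 10.0),
    ("B", True, 8.0), ("B", False, 7.0), ("B", False, 9.0),
    ("C", True, 20.0),  # only active-period samples, so C is skipped
]
result = within_person_contrast(samples)
```

Because each person serves as their own comparison, stable person-level confounders cancel out of the differences. This is the main appeal of the within-person design over a purely between-person comparison.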

7.3.1.3 Reducing potential sources of bias in epidemiology studies of UNGD

Future studies of UNGD and health outcomes should address potential sources of bias that may have affected the studies in this thesis. For example, we used ICD-9 codes and medication orders to identify disordered sleep outcomes, but this method could have resulted in bias if many patients with disordered sleep did not seek treatment or used over-the-counter treatments. A future study on UNGD and disordered sleep should consider a different method to ascertain the outcome, for example, questionnaires. Additionally, participants in our study of UNGD and depression symptoms tended to be sicker than the general population because the survey's sampling design oversampled patients with nasal and sinus symptoms. We used survey weights to make our study population more similar to the general population in the region, but there may still be differences between the weighted population and the general population. A future study on UNGD and depression symptoms could consider using a study population that more closely matches the general population.
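The survey-weighting step mentioned above can be illustrated with design weights. Each participant's weight is the inverse of their stratum's sampling probability, so oversampled participants count for less in the weighted estimate. This is a minimal sketch: the stratum names, sampling probabilities, and outcomes below are hypothetical and are not the weights or data used in our depression study.

```python
def design_weights(strata, sampling_prob):
    """Inverse-probability design weight for each participant, given its stratum."""
    return [1.0 / sampling_prob[s] for s in strata]


def weighted_prevalence(outcomes, weights):
    """Weighted proportion sum(w_i * y_i) / sum(w_i), with y_i coded 0/1."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for w, y in zip(weights, outcomes)) / total


# Hypothetical sample: patients with nasal and sinus symptoms were sampled at a
# higher rate (0.5) than other patients (0.1).
strata = ["sinus", "sinus", "sinus", "other", "other"]
outcomes = [1, 1, 0, 1, 0]  # 1 = depression symptoms present (hypothetical)
weights = design_weights(strata, {"sinus": 0.5, "other": 0.1})
unweighted = sum(outcomes) / len(outcomes)
weighted = weighted_prevalence(outcomes, weights)
```

Here the weighted prevalence (about 0.54) is lower than the unweighted one (0.60) because the oversampled stratum, which has the higher prevalence, is down-weighted. Real survey weights often also include nonresponse and post-stratification adjustments, which this sketch omits.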

7.3.1.4 Employ causal inference methods in studies of UNGD and health

There are several opportunities to employ causal inference methods in studies of UNGD and health outcomes, and such studies could help determine whether the relationship between UNGD and health outcomes is causal. For example, the epidemiology studies in this thesis could be repeated using propensity scores to make patients in the different UNGD activity groups (very low, low, medium, and high) more similar on measured confounders. The studies could also be repeated using a difference-in-differences approach, as used in an unpublished study of UNGD and pregnancy outcomes.13 This approach would compare the health outcomes of patients living near permitted wells that are later drilled to those of patients living near permitted wells that are not drilled. We chose not to use a difference-in-differences approach in this thesis because such an approach has limitations, namely, that the exposure metric can only be dichotomous. However, this could be an advantage in studies of setbacks (Section 7.3.1.2), where the exposure of interest is inherently dichotomous.
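In its simplest two-group, two-period form, the difference-in-differences comparison described above is a contrast of four means. It takes the change in outcome among patients near permitted wells that were later drilled and subtracts the change among patients near permitted wells that were never drilled. The sketch below shows only that basic 2×2 estimator. It is not intended to reproduce the model in the study cited above, it includes no covariates, and it relies on the standard parallel-trends assumption. The record layout and the example values are hypothetical.

```python
from statistics import mean


def did_estimate(records):
    """Basic 2x2 difference-in-differences estimate.

    records: iterable of (drilled, post, outcome) tuples, where `drilled` marks
    patients near permitted wells that were later drilled, `post` marks
    observations in the period after drilling began (same calendar period for
    both groups), and `outcome` is a numeric health outcome.
    """
    cells = {(d, p): [] for d in (True, False) for p in (True, False)}
    for drilled, post, outcome in records:
        cells[(bool(drilled), bool(post))].append(outcome)
    for key, values in cells.items():
        if not values:
            raise ValueError(f"no observations for (drilled, post) = {key}")
    m = {key: mean(values) for key, values in cells.items()}
    change_drilled = m[(True, True)] - m[(True, False)]
    change_not_drilled = m[(False, True)] - m[(False, False)]
    return change_drilled - change_not_drilled


# Hypothetical outcome rates: (drilled, post, outcome)
records = [
    (True, False, 0.10), (True, False, 0.12),   # drilled, before
    (True, True, 0.20), (True, True, 0.18),     # drilled, after
    (False, False, 0.10), (False, False, 0.10), # not drilled, before
    (False, True, 0.13), (False, True, 0.11),   # not drilled, after
]
estimate = did_estimate(records)  # about 0.08 - 0.02 = 0.06
```

Because "drilled" is a yes/no contrast, this estimator has the dichotomous exposure noted in the text. That same feature is what makes it a natural fit for setback questions, such as inside versus outside a given distance.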

7.3.2 Policy implications of studies on UNGD and health

Policy makers at the local, state, and international levels have been interested in the results of studies on UNGD and health from our research group and from others, and in developing policies to reduce the public health impact of UNGD. I have presented results from our studies to policy makers from the Maryland House of Delegates and the European Union Directorate-General for the Environment. Even though the current body of research on UNGD and health is not conclusive, several policy recommendations make sense given the current state of knowledge.

7.3.2.1 Improve data collection on UNGD

Studies of UNGD are affected by the quality of recordkeeping on UNG wells. In Pennsylvania, reporting on UNG wells needs to improve. In particular, stimulation dates are frequently missing, though other dates of development (spud dates, production start dates) and natural gas production quantities also have missing values (Section 2.3.1). Pennsylvania requires well operators to report these data,23 so the state needs to ensure that the required information is actually collected without missing values and made public. Additionally, the state should publish other information on wells that would be useful for health studies but that is not currently collected and made available electronically. This includes natural gas production on a daily basis, the duration of stimulation, the duration of pad preparation, dates of flaring, vertical and horizontal depth (reported individually, not just as total depth, as is currently done), and the volume of fluid used during stimulation at each well.

Pennsylvania also needs to improve collection of data on infrastructure related to wells, including compressor engines, impoundments, and pipelines. Currently, data on these are not available electronically, which makes incorporating them into health studies challenging. We identified impoundments using crowdsourcing and compressor engines by abstracting data from paper documents, but we likely underestimated the counts of both. We only looked for impoundments within a kilometer of the nearest well, and we could not distinguish between compressor engines missing a start letter and those never started (Chapter 5). The state already collects data on compressor engines, impoundments, and pipelines, because proposals for new compressor engines, impoundments, and pipelines are published in the Pennsylvania Bulletin (pabulletin.com). However, the details included in the Pennsylvania Bulletin for proposed compressor engines, impoundments, and pipelines are not consistent (for example, some entries include latitude and longitude and others do not). This makes the Pennsylvania Bulletin a poor source of information for exposure assessment in epidemiology studies. The state should systematically compile data on the locations, dates of development, and sizes of this infrastructure into an electronic format available online. Our evaluation of compressor engines and impoundments did not suggest that incorporating them into an inverse-distance-squared metric changed the interpretation of that metric's association with an adverse health outcome. Even so, information on the locations, sizes, and dates of development of compressor engines and impoundments could be important for exposure assessment studies of specific pathways. For example, the locations and sizes of compressor engines would be needed to design a study of their noise impacts.
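An inverse-distance-squared metric of the kind mentioned above sums, over nearby sources, a size term divided by the squared distance from the patient's home. The sketch below computes one such sum per source type (wells, compressor engines, impoundments). This keeps quantities with different size units separate, as they would be before a step such as principal component analysis. It is a hypothetical illustration, not the exact specification used in this thesis. The coordinates are assumed to be projected and in kilometers, the size weights (for example, gas volume, horsepower, or surface area) and the 10-meter minimum distance are assumptions, and phase of development is reduced to an active/inactive flag.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    kind: str     # "well", "compressor", or "impoundment"
    x_km: float   # projected easting in kilometers (assumed coordinate system)
    y_km: float   # projected northing in kilometers
    size: float   # hypothetical size weight, in units specific to `kind`
    active: bool  # active during the exposure window (hypothetical flag)


def idw2_by_kind(home_xy_km, sources,
                 kinds=("well", "compressor", "impoundment"),
                 min_dist_km=0.01):
    """Sum of size / distance**2 over active sources, computed separately per kind.

    Distances below `min_dist_km` are floored to avoid dividing by (near) zero.
    """
    hx, hy = home_xy_km
    totals = {kind: 0.0 for kind in kinds}
    for s in sources:
        if not s.active or s.kind not in totals:
            continue
        d = max(math.hypot(s.x_km - hx, s.y_km - hy), min_dist_km)
        totals[s.kind] += s.size / d ** 2
    return totals


# Hypothetical sources around a home at the origin.
sources = [
    Source("well", 1.0, 0.0, 1.0, True),
    Source("well", 2.0, 0.0, 1.0, True),
    Source("compressor", 0.0, 0.5, 2.0, True),
    Source("impoundment", 0.0, 0.3, 1.0, False),  # inactive, so excluded
]
metric = idw2_by_kind((0.0, 0.0), sources)
```

Combining the per-kind sums into a single score would require putting them on a common scale, for example by standardizing each across patients. That choice is left open here.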

7.3.2.2 Expand air quality monitoring in rural oil and natural gas producing areas


The National Ambient Air Quality Standards (NAAQS) are the federal standards for criteria air pollutants (carbon monoxide, lead, nitrogen dioxide, ozone, particulate matter, and sulfur dioxide), and the EPA requires that states maintain a monitoring network for these pollutants. However, the requirements for monitors outside of urban areas are minimal.24 As noted in Chapter 5, we were not able to compare our UNGD metrics against EPA monitor data because the EPA monitor network is not dense enough in the areas of Pennsylvania with UNGD. The Environmental Defense Fund and other environmental organizations have called on the EPA to increase air pollution monitoring in rural areas with oil and gas development and have laid out a legal framework for the EPA to do so.25 The EPA should move forward with increasing air pollution monitoring, particularly of ozone and particulate matter, in rural oil- and gas-producing areas so that future studies can evaluate the impact of UNGD on air quality.

For hazardous air pollutants (HAPs) and ozone precursors, the monitoring strategy should be different. Studies of HAP and ozone-precursor emissions from UNGD show that these tend to be short-lived, high-emission events with large spatial variability over a small area. The existing EPA network cannot detect these events because of their short duration and high spatial variability.

One potential strategy to monitor these emission events is to drive mobile air pollution monitoring stations, equipped with monitors designed to measure emissions from point sources such as these, around well pads during phases of development when emissions are likely (e.g., stimulation).26

7.3.2.3 Fund research on UNGD and health

While Pennsylvania proceeded rapidly with UNGD, other states in the Marcellus shale, namely Maryland and New York, enacted moratoriums because of possible but uncharacterized environmental and health impacts. At the time these moratoriums were enacted, few health studies had been published. Now, several studies have found associations between UNGD and health outcomes, but the research on UNGD and health is still far from conclusive. For states to determine whether the risks from UNGD outweigh the benefits, they need more epidemiology and exposure studies on UNGD, so this research needs to be funded. This funding could be modeled after the Deepwater Horizon Research Consortia, a National Institute of Environmental Health Sciences (NIEHS) program to study the health effects of oil spills, funded in part by BP. While BP provided some of the funding, it is not otherwise involved in the program or the research.27 Similarly, companies involved with UNGD could fund an NIEHS program to study UNGD and health, and the NIEHS could ensure that these companies are not involved with the program or its research.

7.3.2.4 Incorporate externalities into energy prices

Several state and local governments have considered enacting or have enacted moratoriums or bans on UNGD. But UNGD has decreased the cost of natural gas, and as a result, the use of coal to produce electricity has declined in favor of natural gas.28 An unintended consequence of UNGD moratoriums or bans could be to increase the use of coal for electricity production by decreasing the supply of natural gas from UNG wells. If methane leakage from UNG wells can be reduced through regulations and best practices, UNGD moratoriums or bans could actually speed up climate change by shifting power plants back to coal.

Instead of UNGD moratoriums and bans, energy needs to be priced to incorporate the externalities of energy production and use. Currently, the producers of energy sources that have health and climate impacts do not pay the costs of the impacts they create. The leading proposal to incorporate these externalities into energy prices is a carbon tax, which is a tax based on the carbon dioxide content (and potentially the content of other greenhouse gases, like methane) of fuels.29 Under a carbon tax, the price of coal and natural gas (from conventional and unconventional sources) would rise. By increasing the cost of fossil fuels, the tax would decrease their use in favor of renewables and energy efficiency. A carbon tax would have positive health effects over the short term because fuels like coal, which have high greenhouse gas emissions, also emit particulate matter and other air pollutants that affect health. If the carbon tax also incorporated other greenhouse gases, such as methane, it would discourage UNGD. This could reduce any negative health effects of UNGD, if the associations observed in epidemiology studies to date are causal. A carbon tax would also have positive health effects over the longer term by mitigating climate change and its associated health impacts.

7.3.3 Health implications of our research

If UNGD is causally associated with adverse health outcomes, populations will continue to be exposed as drilling continues in Pennsylvania. While a policy intervention to reduce exposure would be ideal, until such a measure is passed, residents of regions with UNGD should be aware of potential health impacts and take steps to protect their health. The Southwest Pennsylvania Environmental Health Project (environmentalhealthproject.org), a nonprofit organization working on health impacts of UNGD in Pennsylvania, provides recommendations for nearby residents. These include frequently vacuuming with a HEPA filter vacuum, keeping notes on health symptoms over time, and no longer drinking tap water if it causes a rash or pain for someone in the household. However, the effectiveness of these recommendations has not been evaluated. Instead, we recommend that residents be aware of UNGD occurring near their homes and workplaces, inform their doctors about it, and advocate for policies to reduce potential exposures.

7.4 Final Remarks

This thesis contributes to the body of research on UNGD and health, and more broadly, energy and health. Research on the health effects of UNGD needs to continue and address the limitations of prior studies, most importantly by incorporating biomarkers and environmental measurements. This would allow epidemiologists to determine whether the associations observed in this thesis are causal and allow research to better inform policy decisions. As greenhouse gas emissions rise to critical levels and supplies of conventional fossil fuels diminish, policy makers have important decisions to make about which sources of energy will power the future. Historically, economic considerations were key in making decisions about energy, but this thesis and other epidemiology studies of UNGD show that health and environmental concerns must be considered too.


7.5 References

1. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2016.

2. Rabinowitz PM, Slizovskiy IB, Lamers V, et al. Proximity to natural gas wells and reported health status: Results of a household survey in Washington County, Pennsylvania. Environ Health Perspect. 2014.

3. McKenzie LM, Guo R, Witter RZ, Savitz DA, Newman LS, Adgate JL. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado. Environ Health Perspect. 2014.

4. Stacy SL, Brink LL, Larkin JC, et al. Perinatal outcomes and unconventional natural gas operations in southwest Pennsylvania. PLOS ONE. 2015;10(6):e0126425.

5. Markandya A, Wilkinson P. Electricity generation and health. The Lancet. 2007;370(9591):979-990.

6. Wilkinson P, Smith KR, Joffe M, Haines A. A global perspective on energy: Health effects and injustices. The Lancet. 2007;370(9591):965-978.

7. U.S. Energy Information Administration. How much carbon dioxide is produced when different fuels are burned? http://www.eia.gov/tools/faqs/faq.cfm?id=73&t=11.

8. Howarth RW, Santoro R, Ingraffea A. Methane and the greenhouse-gas footprint of natural gas from shale formations. Clim Change. 2011;106(4):679-690.

9. Allen DT, Torres VM, Thomas J, et al. Measurements of methane emissions at natural gas production sites in the United States. Proc Natl Acad Sci U S A. 2013;110(44):17768-17773.

10. Jiang M, Griffin WM, Hendrickson C, Jaramillo P, VanBriesen J, Venkatesh A. Life cycle greenhouse gas emissions of Marcellus shale gas. Environmental Research Letters. 2011;6(3):034014.

11. Karion A, Sweeney C, Pétron G, et al. Methane emissions estimate from airborne measurements over a western United States natural gas field. Geophys Res Lett. 2013;40(16):4393-4397.

12. Adgate JL, Goldstein BD, McKenzie LM. Potential public health hazards, exposures and health effects from unconventional natural gas development. Environ Sci Technol. 2014;48(15):8307-8320.

13. Hill EL. Unconventional Natural Gas Development and Infant Health: Evidence from Pennsylvania. 2012.

14. Fryzek J, Pastula S, Jiang X, Garabrant DH. Childhood cancer incidence in Pennsylvania counties in relation to living in counties with hydraulic fracturing sites. Journal of Occupational and Environmental Medicine. 2013;55(7):796-801.

15. Finkel M. Shale gas development and cancer incidence in southwest Pennsylvania. Public Health. 2016;141:198-206.

16. Graham J, Irving J, Tang X, et al. Increased traffic accident rates associated with shale gas drilling in Pennsylvania. Accident Analysis & Prevention. 2015;74:203-209.

17. Jemielita T, Gerton GL, Neidell M, et al. Unconventional gas and oil drilling is associated with increased hospital utilization rates. PLOS ONE. 2015;10(7):e0131093.

18. Saberi P, Propert KJ, Powers M, Emmett E, Green-McKenzie J. Field survey of health perception and complaints of Pennsylvania residents in the Marcellus shale region. Int J Environ Res Public Health. 2014;11(6):6517-6527.

19. Casey JA, Savitz DA, Rasmussen SG, et al. Unconventional natural gas development and birth outcomes in Pennsylvania, USA. Epidemiology. 2015.

20. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965;58(5):295-300.

21. Glass TA, Goodman SN, Hernan MA, Samet JM. Causal inference in public health. Annu Rev Public Health. 2013;34:61-75.

22. Haley M, McCawley M, Epstein AC, Arrington B, Bjerke EF. Adequacy of current state setbacks for directional high-volume hydraulic fracturing in the Marcellus, Barnett, and Niobrara shale plays. Environ Health Perspect. 2016;124(9):1323-1333.

23. Pennsylvania Code. Subchapter E. Well reporting § 78.121-§ 78.125. http://www.pacode.com/secure/data/025/chapter78/subchapEtoc.html.

24. Environmental Protection Agency. Network design criteria for ambient air quality monitoring.

25. Environmental Defense Fund. Petition for the U.S. Environmental Protection Agency to promptly require oil and gas owners and operators to monitor for ozone and to issue control techniques guidelines for oil and natural gas operations in nonattainment areas. 2012.

26. Olaguer EP, Erickson M, Wijesinghe A, Neish B, Williams J, Colvin J. Updated methods for assessing the impacts of nearby gas drilling and production on neighborhood air quality and human health. J Air Waste Manag Assoc. 2016;66(2):173-183.

27. National Institute of Environmental Health Sciences. Deepwater Horizon research consortia. https://www.niehs.nih.gov/research/supported/centers/gulfconsortium/. Updated 2016. Accessed 1/6, 2017.

28. Culver WJ, Hong M. Coal’s decline: Driven by policy or technology? The Electricity Journal. 2016;29(7):50-61.

29. Schnoor JL. Responding to climate change with a carbon tax. Environ Sci Technol. 2014;48(21):12475-12476.


Appendix

Institutional review board documents


Curriculum Vita – Sara G. Rasmussen

WORK ADDRESS

Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St., W7508, Baltimore, MD 21205

EDUCATION

2012–2017 (expected) Doctor of Philosophy in Environmental Health Sciences Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland Advisor: Brian Schwartz, MD MS Dissertation Title: “Associations of unconventional natural gas development with asthma exacerbations and depressive symptoms in Pennsylvania”

2010–2011 Master of Health Science in Environmental Health Sciences Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland

2006–2010 Bachelor of Arts in Anthropology, cum laude Washington University in St. Louis, St. Louis, Missouri

PROFESSIONAL TRAINING

Fall 2013 Risk Sciences and Public Policy Certificate, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland

PROFESSIONAL EXPERIENCE

May 2011–August 2012 Staff Researcher, Earth Policy Institute, Washington, DC Research, data collection, and fact-checking for Lester Brown's book, Full Planet, Empty Plates: The New Geopolitics of Food

TEACHING EXPERIENCE

Fall 2016 Teaching assistant, Global Sustainability and Health Seminar (188.688.01, 1 credit), Johns Hopkins Bloomberg School of Public Health Lead weekly discussions on readings

Fall 2012–Fall 2015 Lead teaching assistant, The Global Environment and Public Health (180.611.01, 4 credits), Johns Hopkins Bloomberg School of Public Health Formulated assignments, graded exams, addressed student questions

Spring 2013 Teaching assistant, Environmental and Occupational Health Law and Policy (180.629.01, 4 credits), Johns Hopkins Bloomberg School of Public Health Graded exams and homework assignments


HONORS AND AWARDS

March 2016 Morgan James Endowment Award

March 2015 Morgan James Endowment Award

February 2015 Johns Hopkins Bloomberg School of Public Health Delta Omega Poster Competition, Second Place

May 2011 Delta Omega Honor Society

PUBLICATIONS

1. Casey JA, Ogburn EL, Rasmussen SG, Irving JK, Pollak J, Locke PA, Schwartz BS. Predictors of indoor radon concentrations in Pennsylvania, 1989-2013. Environ Health Perspect. 2015 Nov;123(11):1130-1137.

2. Casey JA, Savitz DA, Rasmussen SG, Ogburn EL, Pollak J, Mercer DG, Schwartz BS. Unconventional natural gas development and birth outcomes in Pennsylvania, USA. Epidemiology. 2016 Mar;27(2):163-72.

3. Rasmussen SG, Ogburn EL, McCormack M, Casey JA, Bandeen-Roche K, Mercer DG, Schwartz BS. Association between unconventional natural gas development in the Marcellus shale and asthma exacerbations. JAMA Intern Med. 2016;176(9):1334-1343.

4. Tustin AW, Hirsch AG, Rasmussen SG, Casey JA, Schwartz BS. Associations between unconventional natural gas development and nasal and sinus, migraine headache, and fatigue symptoms in Pennsylvania. Environ Health Perspect. 2017 Feb;125(2):189-197.

PAPERS IN PROGRESS

1. Rasmussen SG, Wilcox H, Hirsch AG, Pollak J, Schwartz BS. Associations of unconventional natural gas development with disordered sleep and depression symptoms in Pennsylvania.

2. Rasmussen SG, Koehler K, Ellis H, Manthos D, Bandeen-Roche K, Platt R, Schwartz BS. Exposure assessment using secondary data sources in unconventional natural gas development and health studies

NON-PEER REVIEWED ARTICLES

1. Rasmussen SG, Casey JA, Schwartz BS. Fracking and health: What we know from Pennsylvania’s natural gas boom. The Conversation. 25 August 2016.

2. Rasmussen SG, Schwartz BS. Unconventional Natural Gas Development: Epidemiology Studies and Public Health Implications. Society of General Internal Medicine Forum. 2016.

SCIENTIFIC CONFERENCE PRESENTATIONS

1. Rasmussen SG, McCormack M, Casey JA, Ogburn EL, Schwartz BS. Marcellus shale development, air pollution, and asthma exacerbations. Poster session presented at: 27th Conference of the International Society for Environmental Epidemiology; 2015 Aug 30-Sep 3; São Paulo, Brazil.

2. Rasmussen SG, Casey JA, Bandeen-Roche K, Schwartz BS. Proximity to industrial food animal production and asthma exacerbations in Pennsylvania, 2005-2012. Poster session at: 49th Annual Meeting of the Society for Epidemiologic Research; 2016 June 21-24; Miami, Florida.


3. Rasmussen SG, Hirsch AG, McCormack M, Schwartz BS. Associations between unconventional natural gas development, respiratory symptoms, and mental health in Pennsylvania. Poster session at: 28th Conference of the International Society for Environmental Epidemiology; 2016 September 1-4; Rome, Italy.

4. Rasmussen SG, Ellis H, Koehler KA, Schwartz BS. Exposure assessment in unconventional natural gas and health studies. Oral presentation at: Annual Conference of International Society of Exposure Science; 2016 October 9-13; Utrecht, The Netherlands.

INVITED PRESENTATIONS

1. Rasmussen SG. Marcellus shale development, air pollution, and asthma exacerbations. Lecture in Occupational and Environmental Hygiene Seminar (182.860.81) at Johns Hopkins University, 2015 March, Johns Hopkins Bloomberg School of Public Health.

2. Rasmussen SG. Public Health Practice Seminar on the Future of the Maryland Fracking Moratorium at Johns Hopkins University, 2016 October, Johns Hopkins Bloomberg School of Public Health.

3. Rasmussen SG. Unconventional Natural Gas Development & Health Studies. Presentation at the European Union Technical Workshop on Public Health Impacts and Risks Resulting from Hydrocarbons Exploration and Production, Brussels, Belgium, 2016 November.

4. Schwartz BS and Rasmussen SG. New Research on Asthma and Other Public Health Considerations of Shale Gas. Presentation at the League of Women Voters of Pennsylvania Shale & Public Health Conference, Pittsburgh, Pennsylvania, 2016 November.

RESEARCH GRANT PARTICIPATION

2014–2016 National Science Foundation Water, Climate, and Health Integrative Graduate Education and Research Traineeship

2012–2014 National Institute of Environmental Health Sciences Training Grant, ES07141

EDITORIAL ACTIVITIES

2016–Present Ad Hoc Reviewer, Environmental Research

PROFESSIONAL MEMBERSHIPS

2015–Present International Society for Environmental Epidemiology
