The University of Manchester Research

Sputum microbiome temporal variability and dysbiosis in chronic obstructive pulmonary disease exacerbations: an analysis of the COPDMAP study DOI: 10.1136/thoraxjnl-2017-210741

Document Version Accepted author manuscript

Link to publication record in Manchester Research Explorer

Citation for published version (APA): Kolsum, U., Singh, D., & COPDMAP (2017). Sputum microbiome temporal variability and dysbiosis in chronic obstructive pulmonary disease exacerbations: an analysis of the COPDMAP study. Thorax. https://doi.org/10.1136/thoraxjnl-2017-210741 Published in: Thorax

Citing this paper Please note that where the full-text provided on Manchester Research Explorer is the Author Accepted Manuscript or Proof version this may differ from the final Published version. If citing, it is advised that you check and use the publisher's definitive version.

General rights Copyright and moral rights for the publications made accessible in the Research Explorer are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

Takedown policy If you believe that this document breaches copyright please refer to the University of Manchester’s Takedown Procedures [http://man.ac.uk/04Y6Bo] or contact [email protected] providing relevant details, so we can investigate your claim.

Download date:11. Oct. 2021 1 Sputum microbiome temporal variability and dysbiosis in chronic obstructive pulmonary 2 disease exacerbations: an analysis of the COPDMAP study 3 4 Zhang Wang1†, Richa Singh4†, Bruce E. Miller2, Ruth Tal-Singer2, Stephanie Van Horn3, Lynn 5 Tomsho3, Alexander Mackay4, James P. Allinson4, Adam J. Webb5, Anthony J. Brookes5, Leena 6 M. George6, Bethan Barker6, Umme Kolsum7, Louise E Donnelly4, Kylie Belchamber4, Peter J. 7 Barnes4, Dave Singh7, Christopher E. Brightling6, Gavin C. Donaldson4, Jadwiga A. Wedzicha4, 8 James R. Brown1* on behalf of COPDMAP 9 10 1 Computational Biology, Target Sciences, Research and Development (R&D), GlaxoSmithKline 11 (GSK), Collegeville, PA 19426, USA 12 2 Respiratory Therapy Area Unit, R&D, GSK, King of Prussia, PA 19406, USA 13 3 Target and Pathway Validation, Target Sciences, R&D, GSK, Collegeville, PA 19426, USA 14 4 National Heart and Lung Institute, Imperial College London, London, SW3 6NP, UK 15 5 Department of Genetics, University of Leicester, Leicester, LE1 7RH, UK 16 6 Institute for Lung Health, University of Leicester, Leicester, LE3 9QP, UK 17 7 University of Manchester and University Hospital of South Manchester, Manchester, M23 9QZ, 18 UK 19 20 † These authors contributed equally to the manuscript. 21 22 * Correspondence to: 23 James R. Brown ([email protected]) 24 1250 S. Collegeville Road, Collegeville, Pennsylvania, 19426-0989, United States 25 Mobile: +16102478580 26 Tel: +16109176374 27 28 Word count abstract: 240 29 Word count text body: 3500 30

1

31 Abstract 32 Background 33 Recent studies suggest that lung microbiome dysbiosis, the disease associated disruption of the 34 lung microbial community, might play a key role in chronic obstructive pulmonary disease 35 (COPD) exacerbations. However, characterizing temporal variability of the microbiome from 36 large longitudinal COPD cohorts is needed to better understand this phenomenon. 37 38 Methods 39 We performed a 16S ribosomal RNA survey of microbiome on 716 sputum samples collected 40 longitudinally at baseline and exacerbations from 281 COPD subjects at three UK clinical 41 centers as part of the COPDMAP consortium. 42 43 Results 44 The microbiome composition was similar among centers and between stable and exacerbations 45 except for a small significant decrease of Veillonella at exacerbations. The abundance of 46 Moraxella was negatively associated with bacterial alpha diversity. Microbiomes were distinct 47 between exacerbations associated with versus eosinophilic airway inflammation. 48 Dysbiosis at exacerbations, measured as significant within subject deviation of microbial 49 composition relative to baseline, was present in 41% of exacerbations. Dysbiosis was associated 50 with increased exacerbation severity indicated by a greater fall in FEV1, FVC and a greater 51 increase in CAT score, particularly in exacerbations with concurrent eosinophilic inflammation. 52 There was a significant difference of temporal variability of microbial alpha and beta diversity 53 among centers. The variation of beta diversity significantly decreased in those subjects with 54 frequent exacerbations. 55 56 Conclusions 57 Microbial dysbiosis is a feature of some exacerbations and its presence, especially in concert 58 with eosinophilic inflammation, is associated with more severe exacerbations indicated by a 59 greater fall in lung function. 60

2

61 Key messages: 62 What is the key question? 63 How does the lung microbial community vary over time within COPD subjects and how is 64 microbial dysbiosis in exacerbations implicated in disease characteristics? 65 66 What is the bottom line? 67 Dysbiosis of the sputum microbiome in COPD exacerbations, particularly in concert with 68 eosinophilic inflammation, is associated with a greater decline in lung capacity during the 69 exacerbation event. 70 71 Why read on? 72 The presented study entails the largest COPD sputum microbiome cohort to date with multiple 73 study centers, aiming at in-depth examination of microbial temporal variability, dysbiosis, and 74 disease phenotypes.

3

75 Introduction 76 Chronic obstructive pulmonary disease (COPD) is characterized by persistent symptoms and 77 impaired lung function as a consequence of small airway obliteration and alveolar destruction, 78 and is associated with chronic lung inflammation 1-3. Acute exacerbations of COPD are a sudden 79 onset of sustained worsening of these symptoms. Bacteria potentially play a key role in COPD 80 pathogenesis 4 5, with respiratory bacterial pathogens such as Haemophilus influenzae, Moraxella 81 catarrhalis and Streptococcus pneumoniae capable of driving host inflammatory responses 6-9. 82 Since bacteria frequently interact with each other and respond to altered environmental 83 conditions, the consortium of the lung microbial community, known as the lung microbiome, 84 could be important in the crosstalk between respiratory tract pathogens and host response 10 11. 85 86 Emerging studies collectively suggest that the lung microbiome differs between stable and 87 exacerbations in COPD (11-15, for review see 16). For example, Molyneaux et al. found an 88 increased representation of pathogenic in particular Haemophilus in 89 exacerbations following rhinovirus infection 12. Huang et al. observed an increase of 90 Proteobacteria during exacerbations with a predicted loss of function in maintenance of 91 microbial homeostasis 13. Recently, several of us published a longitudinal analysis of the sputum 92 microbiome from 87 subjects from BEAT-COPD cohort 11. Our analysis revealed an increased 93 Proteobacteria versus Firmicutes during exacerbations. In addition, we found distinct 94 microbiome composition between bacterial and eosinophilic exacerbations. In light of the 95 heterogeneous nature of COPD exacerbations, the lung microbiome has potential as a biomarker 96 to assist in the precision medicine treatments for specific COPD patient subpopulations. 97 98 Although insightful, results from these previous studies have limitations in terms of 99 understanding microbial dysbiosis during exacerbations, as most of these studies comparing the 100 microbiome at stable and exacerbations involved only one single sampling point of each state. 101 The lung microbiome is temporally dynamic and can vary even in stable state 10.Thus the 102 microbial changes during exacerbations are a mixture of both the disease associated disruption of 103 microbial community or dysbiosis, and the regular temporal perturbations of the lung microbial 104 composition. Therefore, examining the baseline variation of the lung microbiome is an important

4

105 first step to more precisely assess the extent of microbial dysbiosis during exacerbations. On the 106 other hand, understanding temporal variability of the lung microbiome within individuals is also 107 important in disease understanding. Disorder of the temporal balance of microbial ecosystem in 108 the respiratory tract could trigger a dysregulated host immune response that results in negative 109 effects on host biology 10. Linking microbial temporal variation to disease characteristics and 110 host inflammatory profiles could potentially lead to monitoring and manipulating the stability of 111 airway microbial composition as a therapeutic strategy for COPD. 112 113 A finer-grained longitudinal sampling of microbiome at multiple stable and exacerbation visits is 114 necessary to quantitatively measure temporal variability of the microbiome and assess the 115 significance of microbial dysbiosis during exacerbations. Here we describe a longitudinal 16S 116 ribosomal RNA (rRNA) gene based microbiome survey on 716 sputum samples collected 117 sequentially at baseline and exacerbations over a period of up to two years duration from 281 118 COPD subjects at three UK centers as part of the COPDMAP consortium. This entails one of the 119 largest COPD sputum microbiome cohorts to date aiming at in-depth examination of temporal 120 variability of the microbiome. We provide new insights into temporal changes of the microbiome 121 and its potential implication in disease progression. 122

5

123 Material and Methods 124 Subjects and samples 125 Full information on subject inclusion/exclusion criteria, sputum sample collection, microbiome 126 and statistical analyses are provided in the online supplementary appendix. Briefly, sputum 127 samples were collected at multiple longitudinal baseline and exacerbation visits from COPD 128 subjects at three clinical centers, Imperial College London, University of Leicester and 129 University Hospital of South Manchester (hereafter referred to as London, Leicester and 130 Manchester, respectively) as part of the COPDMAP consortium (www.copdmap.org). All 131 sputum samples were immediately stored at -80oC and shipped frozen in batches for analysis. 132 Exacerbations were treated with corticosteroids and antibiotics according to guidelines 17. The 133 protocol summary is available at https://clinicaltrials.gov/ (Identifier: NCT01620645). 134 135 Microbiome analysis 136 For quality control purposes, all DNA extractions, sequencing and data analyses were performed 137 in a single, centralized lab at the GSK R&D facility in Collegeville, Pennsylvania, USA. 138 Bacterial genomic DNA was extracted from frozen sputum samples using the Qiagen DNA Mini 139 kit (Qiagen, CA, USA) as per manufacture protocol. The V4 hypervariable region of the 16S 140 rRNA gene was PCR amplified and sequenced using multiplexed Illumina Miseq platform with 141 the proper controls against reagent contamination as described previously 11. Sequencing reads 142 were processed using QIIME pipeline version 1.9 18. The default set of criteria was used to 143 remove low quality and chimeric reads. The remaining reads were subject to a close reference 144 OTU picking (97% identity cutoff). Sequence data are deposited at the National Center for 145 Biotechnology Information Sequence Read Archive (SRP102480). 146 147 Statistical analysis 148 Exacerbation phenotypes were defined using microbiological and clinical criteria as established 149 previously [12]. Phenotypes of 146 exacerbations samples were undetermined due to missing 150 data. Partial Least Squares Discriminant Analysis (PLS-DA) was performed on exacerbation 151 phenotypes and microbiome and/or clinical data using SIMCA-P (Umetrics, Stockholm, 152 Sweden) 19. Dysbiosis at exacerbations was measured as the deviation (Z-score) of the first

6

153 Principal Coordinate (PC1) of the weighted UniFrac distance for exacerbation samples relative to 154 all baseline PC1s from the same subject. Temporal variability of microbial alpha and beta 155 diversity was measured using the metrics described by Flores et al. 20. A general linear model 156 (GLM) was constructed between demographic and baseline clinical variables and temporal 157 variability of alpha and beta diversity among subjects. The model was optimized in a stepwise 158 algorithm using the “step” function in the R stats package 21. The false discovery rate (FDR) 159 method was used to adjust P-values for multiple testing wherever applicable 22. 160

7

161 Results 162 Overview of the COPDMAP sputum microbiome 163 Microbial composition was determined for 716 sputum samples collected at baseline and 164 exacerbations from 281 COPDMAP subjects at three centers. The number of samples varies 165 from one to nine per subject (Fig. S1). Demographic and baseline clinical data were recorded for 166 subjects at initiation of sample collection (Table 1, Table S1). A set of 16 clinical and 167 biochemical characteristics were further collected longitudinally (Table 2, Table S2). From DNA 168 sequences of the V4 hypervariable region of the 16S rRNA gene, a total of 3,784 operational 169 taxonomic units (OTUs) were identified using 97% identity cut-off after rarefaction. 170 171 Table 1. Major demographic and baseline clinical features of all subjects. Demographic and baseline features All subjects (N=281) *

Gender † Male: 187 (70.3%), Female: 79 (29.7%)

Age ‡ 70 (8.1)

BMI 27.2 (5.4)

1: 30 (11.4%), 2: 132 (50.2%), 3: 78 (29.7%), 4: 23 Baseline GOLD status (8.7%)

Antibiotics: 38 (15.3%), Steroids: 9 (3.6%), Both: 202 Treatment # (81.1%)

Number of cigarette packs per year 1 47 (30)

Number of exacerbation per year 1 1.1 (1.6)

Baseline FEV1 1.5 (0.6)

Baseline FEV1% 56.3 (18.9)

Baseline FEV1 predicted 2.6 (0.5)

Baseline FVC 2.9 (1.0)

Baseline FEV1/FVC ratio 0.5 (0.1)

CAT score 18.7 (7.3)

CES-D score 1 10 (13)

8

SGRQ total score 47.4 (18.2) 172 † Categorical data present as number (proportion). ‡ Continuous data present as mean (SD) unless stated below. 173 1 Median (interquartile range). 174 * 15 subjects were missing any demographic or clinical data. 175 # The numbers represent exacerbation events, thus include subjects with more than one exacerbation. 176 177 Table 2. Major longitudinal clinical features at baseline and exacerbations of all samples. Longitudinal features All samples Visits P-value ‡ (N=716) Baseline Exacerbations (N=446) (N=270) FEV1 † 1.4 (0.5) 1.5 (0.5) 1.2 (0.5) <0.001

FVC 2.8 (0.9) 3.0 (0.8) 2.5 (0.9) <0.001

FEV1/FVC ratio 0.5 (0.2) 0.5 (0.2) 0.5 (0.2) 0.26

CAT score 21.1 (7.4) 19.6 (7.1) 24.2 (7.0) <0.001

C-reactive protein (CRP) 1 5.0 (11.0) 3.0 (5.0) 10.0 (27.0) <0.001 2

Blood neutrophil count (X109 cells/L) 5.5 (2.3) 4.9 (1.7) 6.2 (2.7) <0.001

Blood lymphocyte count (X109 cells/L) 1.8 (0.7) 1.8 (0.6) 1.8 (0.7) 0.49

Blood monocyte count (X109 cells/L) 0.7 (0.3) 0.6 (0.2) 0.7 (0.3) <0.001

Blood eosinophil count (X109 cells/L) 1 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.18 2

Blood basophil count (X109 cells/L) 0.0 (0.0) 0.1 (0.0) 0.0 (0.0) 0.01

Sputum neutrophil count % 1 78.8 (33.8) 75.1 (34.0) 84.2 (28.5) <0.001 2

Sputum lymphocyte count % 1 0.0 (0.5) 0.0 (0.3) 0.2 (1.0) 0.028 2

Sputum eosinophil count % 1 0.8 (2.0) 0.8 (2.2) 0.5 (2.0) 0.07 2

Sputum macrophage count % 1 13.0 (21.2) 14.5 (23.2) 8.5 (19.0) <0.001 2

Sputum epithelial count % 1 3.2 (8.0) 4.0 (9.8) 2.0 (4.8) <0.001 2

178 † Data present as mean (SD) unless stated below. 179 1 Median (interquartile range). 180 ‡ P-value was calculated for baseline and exacerbations comparison using T-test unless stated below. 2 181 Mann-Whitney-Wilcoxon Test. 182

9

183 Similar to other sputum or lung microbiome studies 11-15 23-26, the vast majority of OTUs 184 belonged to Proteobacteria (52.3%), Firmicutes (28.7%), Bacteroidetes (15.0%) and 185 Actinobacteria (1.9%) at the phylum level (Table S3, Fig. S2). At the genus level, Haemophilus 186 (25.8%) was most abundant across all samples, followed by Veillonella (15.8%) and Prevotella 187 (13.2%). Other common genera in the airway such as Streptococcus (4.4%) and Moraxella 188 (4.0%) were also among the most abundant genera identified. As a quality control for sample 189 processing and sequence analyses, an additional aliquot of sputum was collected as duplicates for 190 11 samples from the same subject at the same visit. Duplicates all had low UniFrac distance and 191 were highly similar in microbial composition (Fig. S3). 192 193 Overall, the microbiome composition was similar between baseline and exacerbation samples 194 with a small significant decrease of Veillonella at exacerbations (repeated measures ANOVA, 195 FDR-adjusted (adj.) P= 0.042) (Fig. 1A). The microbiome composition was similar among 196 centers with a significantly higher alpha diversity in the London cohort (Fig. S4A). Within each 197 center, there was a significant decrease of alpha diversity (Shannon, repeated measures ANOVA, 198 P=1.1e-4) and increase of Moraxella at Leicester (Fig. S4B). A strong negative correlation was 199 found between the abundance of Moraxella and alpha diversity for all samples (Shannon, R=- 200 0.445, adj. P<9.6e-14, Fig. 1B). 201 202 Similar to previously observed 11, distinct microbial populations were found in bacterial and 203 eosinophilic exacerbations, with a significantly decreased alpha diversity (Shannon, T-test 204 P=0.008) and significantly increased proportion of Proteobacteria (T-test, adj. P=0.001) versus 205 Bacteroidetes (T-test, adj. P=0.002) in bacterial exacerbations compared to eosinophilic 206 exacerbations (Fig. 1C-D, Fig. S5A). An improvement in predicting the two phenotypes was 207 observed according to PLS-DA by combining the clinical and microbiome datasets versus using 208 the clinical data only (Fig. S5B). Within individual centers, this trend was more pronounced for 209 Leicester samples than those of London or Manchester (Fig. S5C). 210 211 We performed multivariate analysis to identify clinical factors significantly associated with 212 microbial alpha and beta diversity. Among all clinical variables, C-reactive protein (CRP), a

10

213 known inflammatory marker for COPD prognosis 27, was the most significant factor correlated 214 with both alpha diversity (Shannon, P<0.01, Fig. S6) and beta diversity at the phylum level 215 among all samples (Table S4). No factors significantly predicted variation at the genus and OTU 216 levels. CRP was also significantly associated with alpha and beta diversity of the predicted 217 functional profiles of the sputum microbiome using the software PICRUSt 28 (Table S5). 218 219 Increased disease severity in exacerbations with dysbiosis 220 Longitudinal sampling of a large cohort over a two year period enables temporal variability and 221 dysbiosis of the sputum microbiome to be quantitatively measured. To explore variation of the 222 sputum microbiome over time, we plotted the first Principal Coordinate (PC1) of the weighted 223 UniFrac distance for all samples within each subject as a proxy for their microbial compositions, 224 as it explains 49.0% of the total beta diversity (Fig. 2A). Only subjects with at least two baseline 225 and one exacerbation samples were included. Visual inspection of the plot revealed a deviation 226 of PC1 for many exacerbation samples relative to baseline samples from the same subject (Fig. 227 2A, Fig. S7A), indicating specific exacerbations were particularly susceptible to alternation of 228 microbial composition or dysbiosis. In comparison, the sputum microbiome was much less 229 variable among baseline samples. This is supported by a significantly increased within subject 230 standard deviation of PC1 (paired T-test, P=6.7e-4) combining baseline and exacerbation 231 samples compared to baseline samples only, with the most profound changes at the Leicester 232 center (Fig. S7B). 233 234 Having assessed temporal variability of the sputum microbiome at baseline, we measured the 235 dysbiosis of exacerbation as a Z-score that measures how much its PC1 deviated from all 236 baseline PC1s from the same individual. A total of 49 exacerbations (out of 119 exacerbations 237 with a Z-score, 41.2%) were identified as in significant dysbiosis state with an absolute Z-score 238 greater than 2 (P<0.05, Fig. 2A, Fig. S7C). In most of these exacerbations, the sputum 239 microbiome shifted from a balanced composition to a more biased one predominated by one or a 240 few taxa with a decreased alpha diversity (Fig. S8). Bacterial genera of Veillonella, Cronobacter 241 and Haemophilus were among the key taxa associated with the dysbiosis (Fig. S9). Across all 242 exacerbation subtypes, bacterial exacerbations had the highest number of dysbiosis events than

11

243 other subtypes (Fig. S7C), with the caveat that phenotype could not be defined for 21 of the 49 244 exacerbations due to missing data. 245 246 For exacerbations with or without significant dysbiosis, we compared the exacerbation severity 247 determined by change in lung function and symptoms relative to the last baseline measurement. 248 We found a non-significantly greater decrease in FEV1 and FVC and a greater increase in CAT 249 score for exacerbations with dysbiosis compared to those without (Fig. 2B). Such trends were 250 overall consistent within each center, except for a reversal trend of FVC in Manchester which 251 has a smaller sample size (Fig. S10A). Also, the exacerbation Z-score was positively correlated, 252 albeit non-significantly, with changes of FEV1 and FVC, and negatively correlated with change 253 of CAT score (Fig. S10B), suggesting that the more dysbiotic the exacerbation was, the more 254 severe the clinical outcome could possibly be. As eosinophil abundance is another important 255 factor for COPD exacerbations, we reclassified exacerbations according to both the dysbiosis 256 and blood eosinophil indices. Doing so revealed four subgroups of exacerbations where 257 dysbiosis and/or high blood eosinophil level (>3 x108 cells/L) are the predominant feature. 258 Exacerbations with both dysbiosis and high eosinophil level had the greatest changes of FEV1 259 (statistically significant, ANOVA P=0.02), FVC and CAT score, whereas exacerbations with 260 neither dysbiosis nor high eosinophils level were associated with the least of such changes (Fig. 261 2B). 262 263 Exacerbation frequency associated with temporal variability of the sputum microbiome 264 We next sought to quantify temporal variability of the sputum microbiome within subjects using 265 the metrics described by Flores et al. 20. Only subjects with at least three samples were included. 266 The variability of microbial alpha diversity was denoted as the coefficient of variation of 267 Shannon for samples within each subject. The variability of beta diversity was calculated as the 268 median of pairwise UniFrac distances for samples within each subject. A wide range of temporal 269 variability of alpha and beta diversity was observed across subjects (Fig. 3A). We noted that 270 there was a significantly lower variation of alpha and beta diversity among London subjects than 271 Leicester or Manchester ones (T-test, P<0.001). As expected, both variations of alpha and beta

12

272 diversity were significantly higher in subjects with dysbiosis exacerbations than those without 273 (T-test, P<0.01). 274 275 We constructed a generalized linear model (GLM) to look for clinical characteristics associated 276 with temporal variability of the sputum microbiome. A set of 14 demographic and baseline 277 clinical variables were included for each subject. Center, FEV1/FVC ratio and number of 278 exacerbations per year (prior to the sampling visits) were significant factors for the variation of 279 alpha diversity across all subjects (Table 3). When reconstructing GLM for each center, number 280 of exacerbations per year was significant for London and Leicester subjects. In addition, center 281 and number of exacerbations per year were significantly associated with beta diversity variation 282 across all subjects (Table 2). A continuous decreasing trend of number of exacerbations per year 283 was observed toward subjects with greater variation of beta diversity (Fig. 3B). Likewise, a 284 continuous decreasing trend of beta diversity variation was observed toward subjects with higher 285 number of exacerbations per year (ANOVA, P<0.05, Fig. 3C). Similar trends were observed 286 within each center (Fig. S11) and for temporal variability of baseline microbiomes only (Fig. 287 S12). 288 289 Table 3. List of demographic and baseline clinical variables significantly associated with 290 temporal variability of microbial alpha and beta diversity among subjects. P-values are indicated 291 for significant variables in the model. Temporal variability Alpha diversity (Shannon) Beta diversity (Weighted UniFrac distance) All London Leicester Manchester All London Leicester Manchester Number of exacerbations 0.01 0.01 3E-4 # 0.01 0.03 # # per year BMI 0.11 ‡ # 0.01 # # # 0.04 # CES-D score # 0.03 # # # # # 0.01 Packs of cigarette per year # # # # 0.07 ‡ # 0.02 # FEV1 0.03 # # # # # 0.17 ‡ # FEV1/FVC ratio # # 0.01 # # # # # Age # 0.03 # # # # # # SGRQ total score # # # # # # # 0.03 Center 2.6E-12 NA NA NA 1.8E-10 NA NA NA 292 # P≥0.05 and absent in the model. 293 ‡ Variables not statistically significant but present in the model. 13

294 Discussion 295 Culture-independent analyses have uncovered a previously unappreciated complexity of the lung 296 bacterial community that has reshaped our understanding of COPD etiology 11-15 23-26. Our study 297 reveals a diverse sputum microbiome among the COPDMAP subjects and further validates the 298 association of microbiome with specific exacerbation phenotypes. We also show in-depth 299 temporal variation of the sputum microbial community within subjects and identified potential 300 new relationships of the microbiome variation with patient disease progression. 301 302 One advantage of our study is the longitudinal sampling at multiple baseline and exacerbation 303 visits compared to most previous studies where a single snapshot of exacerbations was taken. It 304 is well appreciated that the lung microbiome is inherently variable shaped by the balance of 305 ecological factors like microbial immigration and elimination 10. During exacerbations, the 306 balance goes awry with dysregulated host immune response and inflammation leading to further 307 microbial changes or dysbiosis. Therefore, to explicitly determine the extent of disease 308 associated dysbiosis one would need to first distill the normal perturbations of microbial 309 composition. Our study underscores the importance of considering temporal variability of the 310 microbiome in understanding the significance of microbial dysbiosis in COPD exacerbations. 311 312 From assessing temporal variation of the sputum microbiome, we identified a subset of 313 exacerbation events in which significant dysbiosis is a feature. In these exacerbations, the 314 microbiome composition shifted from a highly diverse microbial community to a less diverse 315 one characterized by the predominance of only one or few genera. These dysbiosis exacerbations 316 appear to be the main source of microbial temporal variation and are associated with a greater 317 worsening of health status and decrease of lung capacity. To our knowledge, this is the first 318 evidence to suggest that respiratory dysbiosis is associated with increased exacerbation severity 319 in COPD, although the strength of this association is weak and needs to be further validated in 320 additional cohorts and by other measures of disease severity. Altered environmental conditions 321 in exacerbations could disturb the composition of the lung microbial community 29 30, which in 322 turn elicit a dysregulated host immune response through bacterial metabolites and virulent 323 factors, resulting in a sustained damage cycle with an accelerated decline in lung function 31.

14

324 Whether dysbiosis is the cause or consequence of the increased exacerbation severity and how 325 this imbalance is implicated in host inflammatory pathways are new questions that will impact 326 on how we understand and treat COPD exacerbations. 327 328 It has been recently emphasized that not all COPD exacerbations are the same 32. Our results 329 suggest the existence of subgroups of exacerbations associated with or without significant 330 microbial dysbiosis or increased eosinophilia. Importantly, these subgroups likely reflect 331 fundamental differences in their immuno-pathogenesis driving the exacerbations, and therefore 332 might require alternative therapeutic approaches. Interestingly, the most severe exacerbations 333 were observed in the small subgroup that had evidence of bacterial dysbiosis in concert with 334 eosinophilic inflammation. It is possible that this group might require interventions such as 335 antibiotics and steroids (i.e. prednisolone) to target both bacteria and eosinophilic inflammation 336 whereas in contrast those without bacterial dysbiosis nor eosinophilic inflammation might not 337 require these therapies. Our results perhaps establish a new paradigm in stratifying COPD 338 exacerbations according to both dysbiosis and eosinophil measurements, which could be 339 informative guiding future personalized therapies. Further efforts in identifying biomarkers for 340 these subgroups in larger populations could help refine exacerbation subtypes toward phenotype- 341 specific clinical management. 342 343 We found a significantly decreased exacerbation frequency in subjects with higher temporal 344 variation of the microbial beta diversity. In COPD there is a subset of frequent exacerbators that 345 are particularly susceptible to recurrent exacerbations independent of other risk factors 33 34. Thus 346 low temporal variability of the sputum microbiome might come as a predicative factor for the 347 frequent exacerbator phenotypes. Frequently recurrent exacerbations are often associated with 348 persistent propensity to inflammation with high levels of airway biomarkers such as IL-6 and IL- 349 8 35 36. It is possible that elevated airway inflammation in frequent exacerbators maintains the 350 microbiome in a sustainable dysbiotic state and prevents it from regular fluctuation. 351 352 An important novelty of our study is that there were three unique study centers. All samples were 353 processed in a central lab, which minimizes microbiome variation due to differences in

15

354 experimental protocols. Interestingly, there was a significant difference in temporal variability of 355 the microbiome among subjects in the three centers, even though their overall microbiome 356 profiles were highly similar. Factors accounting for the among-center variation could include 357 differences in the frequency of clinical visits and compliance with medications, although we lack 358 the comparative data across centers to suggest specific causes. Our study suggests that the impact 359 of demographics and clinical procedures on the lung microbial community needs to be broadly 360 considered in future studies. 361 362 Our study has several caveats. First, only a proportion of the bacterial 16S rRNA gene was 363 sequenced to characterize the microbial population both here and in previous lung microbiome 364 studies 11-14. Thus the resolution is insufficient when it comes to species-level characterization of 365 the microbiome, whereas ecological and functional interaction of individual species or strains 366 could be important in the underlying disease etiology. Second, despite a large cohort size, 367 longitudinal sampling remains relatively sparse for many subjects with variation in the timing of 368 their sampling visits. Further efforts on characterizing respiratory tract metagenomes in a more 369 regularly and intensively followed patient cohort together with host multi-omics profiling would 370 promise to bring in a more comprehensive picture of the intrinsic variability of the lung 371 microbiome and its implications in disease heterogeneity. 372 373 In summary, our study revealed a temporally dynamic sputum microbiome in COPD subjects in 374 which microbial dysbiosis in exacerbations, particularly in concert with eosinophilic 375 inflammation, was associated with increased exacerbation severity. Our findings underscore the 376 importance of considering temporal variability of the sputum microbiome in COPD 377 heterogeneity and its potential as a biomarker toward more precise treatment of COPD.

16

378 Figure legends 379 Figure 1. Baseline and exacerbation microbiome profiles across centers. A) Alpha diversity 380 (Shannon) and compositions of major phyla and genera in samples at baseline and exacerbations. 381 B) Correlation between alpha diversity (Shannon) and relative abundance of Moraxella. Each dot 382 represents a sample colored by baseline or exacerbations. C) Alpha diversity (Shannon) and 383 composition of major phyla and genera in exacerbation samples of different exacerbation 384 phenotypes. D) Principal Coordinate Analysis (PCoA) showing distinct clustering of samples 385 with bacterial and eosinophilic exacerbations. The number of samples is indicated in the 386 parenthesis under each subgroup in the bar chart. B: bacterial; V: viral; E: eosinophilic; BE: 387 bacterial and eosinophilic; BV: bacterial and viral; and Pauci: pauci-inflammatory. Error bars are 388 within 1.5 interquartile range of the upper and lower quartiles. *** adj. P<0.001; ** adj. P<0.01; 389 * adj. P<0.05. 390 391 Figure 2. Dysbiosis of the sputum microbiome. A) Scatter plot of the first Principal Coordinate 392 (PC1) of all samples within each subject at each center. Only subjects with at least two baseline 393 and one exacerbation samples were included. Exacerbation samples are highlighted in red. Box- 394 whisker plots indicate the distribution of baseline PC1s within each subject. The confidence 395 bands indicate the 95% confidence interval for the mean baseline PC1s within each subject. 396 Exacerbations outside the confidence bands are the ones with significant dysbiosis (absolute Z- 397 score>2, P<0.05). B) Box-whisker plots showing changes of FEV1, FVC and CAT score 398 between dysbiosis and non-dysbiosis exacerbations, and among four subgroups of exacerbations 399 classified by dysbiosis and blood eosinophils level. Error bars are within 1.5 interquartile range 400 of the upper and lower quartiles. 401 402 Figure 3. Temporal variability of the sputum microbiome. A) Temporal variability of microbial 403 alpha (coefficient of variation of Shannon) and beta diversity (median of pairwise weighted 404 UniFrac distances) for each subject. Only subjects with at least three samples were included. 405 Subjects with dysbiosis exacerbations are highlighted in yellow. B) Box-whisker plots showing 406 exacerbation frequency of subjects within different quartile groups of temporal variability of 407 alpha and beta diversity, with the first quartile defined as ‘low’, the second and third quartiles as

17

408 ‘medium’ and the fourth quartile as ‘high’. C) Box-whisker plots showing temporal variability of 409 alpha and beta diversity in subjects with different classes of exacerbation frequency. The number 410 of samples is indicated in the parenthesis under each subgroup in the box-whisker plot. Error 411 bars are within 1.5 interquartile range of the upper and lower quartiles. ANOVA test for 412 temporal variability of alpha and beta diversity: *** adj. P<0.001; ** adj. P<0.01; * adj. P<0.05. 413

18

414 Reference 415 1. Lopez AD, Shibuya K, Rao C, et al. Chronic obstructive pulmonary disease: current burden and future 416 projections. The European respiratory journal 2006;27(2):397-412. doi: 417 10.1183/09031936.06.00025805 418 2. Franklin W, Lowell FC, Michelson AL, et al. Chronic obstructive pulmonary emphysema; a disease of 419 smokers. Annals of internal medicine 1956;45(2):268-74. 420 3. Taraseviciene-Stewart L, Douglas IS, Nana-Sinkam PS, et al. Is alveolar destruction and emphysema in 421 chronic obstructive pulmonary disease an immune disease? Proceedings of the American 422 Thoracic Society 2006;3(8):687-90. doi: 10.1513/pats.200605-105SF 423 4. Erkan L, Uzun O, Findik S, et al. Role of bacteria in acute exacerbations of chronic obstructive 424 pulmonary disease. International journal of chronic obstructive pulmonary disease 425 2008;3(3):463-7. 426 5. Sethi S, Murphy TF. Bacterial infection in chronic obstructive pulmonary disease in 2000: a state-of- 427 the-art review. Clinical microbiology reviews 2001;14(2):336-63. doi: 10.1128/CMR.14.2.336- 428 363.2001 429 6. Miravitlles M, Espinosa C, Fernandez-Laso E, et al. Relationship between bacterial flora in sputum and 430 functional impairment in patients with acute exacerbations of COPD. Study Group of Bacterial 431 Infection in COPD. Chest 1999;116(1):40-6. 432 7. Ball P. Epidemiology and treatment of chronic bronchitis and its exacerbations. Chest 1995;108(2 433 Suppl):43S-52S. 434 8. Soler N, Torres A, Ewig S, et al. Bronchial microbial patterns in severe exacerbations of chronic 435 obstructive pulmonary disease (COPD) requiring mechanical ventilation. American journal of 436 respiratory and critical care medicine 1998;157(5 Pt 1):1498-505. doi: 437 10.1164/ajrccm.157.5.9711044 438 9. Monso E, Ruiz J, Rosell A, et al. Bacterial infection in chronic obstructive pulmonary disease. A study of 439 stable and exacerbated outpatients using the protected specimen brush. American journal of 440 respiratory and critical care medicine 1995;152(4 Pt 1):1316-20. doi: 441 10.1164/ajrccm.152.4.7551388 442 10. Dickson RP, Martinez FJ, Huffnagle GB. The role of the microbiome in exacerbations of chronic lung 443 diseases. Lancet 2014;384(9944):691-702. doi: 10.1016/S0140-6736(14)61136-3 444 11. Wang Z, Bafadhel M, Haldar K, et al. Lung microbiome dynamics in chronic obstructive pulmonary 445 disease exacerbations. The European respiratory journal 2016 doi: 10.1183/13993003.01406- 446 2015 447 12. Molyneaux PL, Mallia P, Cox MJ, et al. Outgrowth of the bacterial airway microbiome after rhinovirus 448 exacerbation of chronic obstructive pulmonary disease. American journal of respiratory and 449 critical care medicine 2013;188(10):1224-31. doi: 10.1164/rccm.201302-0341OC 450 13. Huang YJ, Sethi S, Murphy T, et al. Airway microbiome dynamics in exacerbations of chronic 451 obstructive pulmonary disease. Journal of clinical microbiology 2014;52(8):2813-23. doi: 452 10.1128/JCM.00035-14 453 14. Millares L, Ferrari R, Gallego M, et al. Bronchial microbiome of severe COPD patients colonised by 454 Pseudomonas aeruginosa. European journal of clinical microbiology & infectious diseases : 455 official publication of the European Society of Clinical Microbiology 2014;33(7):1101-11. doi: 456 10.1007/s10096-013-2044-0 457 15. Huang YJ, Kim E, Cox MJ, et al. A persistent and diverse airway microbiota present during chronic 458 obstructive pulmonary disease exacerbations. Omics : a journal of integrative biology 459 2010;14(1):9-59. doi: 10.1089/omi.2009.0100 19

460 16. Huang YJ, Erb-Downward JR, Dickson RP, et al. Understanding the role of the microbiome in chronic 461 obstructive pulmonary disease: principles, challenges, and future directions. Translational 462 research : the journal of laboratory and clinical medicine 2017;179:71-83. doi: 463 10.1016/j.trsl.2016.06.007 464 17. National Institute for Health and Clinical Excellence. Chronic obstructive pulmonary disease: 465 management of chronic obstructive pulmonary disease in adults in primary and secondary care. 466 London: National Clinical Guideline Centre: Available from: 467 http://guidance.nice.org.uk/CG101/Guidance/pdf/English2010. 2010 468 18. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community 469 sequencing data. Nature methods 2010;7(5):335-6. doi: 10.1038/nmeth.f.303 470 19. Eriksson L JE, Kettaneh-Wold N. Multi- and Megavariate Data Analysis, Part 2, Advanced Applications 471 and Method Extensions. MKS Umetrics AB 2006 472 20. Flores GE, Caporaso JG, Henley JB, et al. Temporal variability is a personalized feature of the human 473 microbiome. Genome biology 2014;15(12):531. doi: 10.1186/s13059-014-0531-y 474 21. R Core Team (2014) R: A language and environment for statistical computing. R Foundation for 475 Statistical Computing, Vienna, Austria.;ISBN 3-900051-07-0, URL http://www.R-project.org/ 476 22. Benjamini Y, and Hochberg, Y. Controlling the false discovery rate – a practical and powerful 477 approach to multiple testing. J R Stat Soc Ser B Methodol 1995;57:289-300. 478 23. Hilty M, Burke C, Pedro H, et al. Disordered microbial communities in asthmatic airways. PloS one 479 2010;5(1):e8578. doi: 10.1371/journal.pone.0008578 480 24. Erb-Downward JR, Thompson DL, Han MK, et al. Analysis of the lung microbiome in the "healthy" 481 smoker and in COPD. PloS one 2011;6(2):e16384. doi: 10.1371/journal.pone.0016384 482 25. Zakharkina T, Heinzel E, Koczulla RA, et al. Analysis of the airway microbiota of healthy individuals 483 and patients with chronic obstructive pulmonary disease by T-RFLP and clone sequencing. PloS 484 one 2013;8(7):e68302. doi: 10.1371/journal.pone.0068302 485 26. Pragman AA, Kim HB, Reilly CS, et al. The lung microbiome in moderate and severe chronic 486 obstructive pulmonary disease. PloS one 2012;7(10):e47305. doi: 10.1371/journal.pone.0047305 487 27. Dahl M, Vestbo J, Lange P, et al. C-reactive protein as a predictor of prognosis in chronic obstructive 488 pulmonary disease. American journal of respiratory and critical care medicine 2007;175(3):250- 489 5. doi: 10.1164/rccm.200605-713OC 490 28. Langille MG, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities 491 using 16S rRNA marker gene sequences. Nature biotechnology 2013;31(9):814-21. doi: 492 10.1038/nbt.2676 493 29. Worlitzsch D, Tarran R, Ulrich M, et al. Effects of reduced mucus oxygen concentration in airway 494 Pseudomonas infections of cystic fibrosis patients. The Journal of clinical investigation 495 2002;109(3):317-25. doi: 10.1172/JCI13870 496 30. Schmidt A, Belaaouaj A, Bissinger R, et al. Neutrophil elastase-mediated increase in airway 497 temperature during inflammation. Journal of cystic fibrosis : official journal of the European 498 Cystic Fibrosis Society 2014;13(6):623-31. doi: 10.1016/j.jcf.2014.03.004 499 31. Marsland BJ, Gollwitzer ES. Host-microorganism interactions in lung diseases. Nature reviews 500 Immunology 2014;14(12):827-35. doi: 10.1038/nri3769 501 32. Lopez-Campos JL, Agusti A. Heterogeneity of chronic obstructive pulmonary disease exacerbations: a 502 two-axes classification proposal. The Lancet Respiratory medicine 2015;3(9):729-34. doi: 503 10.1016/S2213-2600(15)00242-8 504 33. Wedzicha JA, Brill SE, Allinson JP, et al. Mechanisms and impact of the frequent exacerbator 505 phenotype in chronic obstructive pulmonary disease. BMC medicine 2013;11:181. doi: 506 10.1186/1741-7015-11-181 20

507 34. Hurst JR, Vestbo J, Anzueto A, et al. Susceptibility to exacerbation in chronic obstructive pulmonary 508 disease. The New England journal of medicine 2010;363(12):1128-38. doi: 509 10.1056/NEJMoa0909883 510 35. Bhowmik A, Seemungal TA, Sapsford RJ, et al. Relation of sputum inflammatory markers to 511 symptoms and lung function changes in COPD exacerbations. Thorax 2000;55(2):114-20. 512 36. Perera WR, Hurst JR, Wilkinson TM, et al. Inflammatory changes, recovery and recurrence at COPD 513 exacerbation. The European respiratory journal 2007;29(3):527-34. doi: 514 10.1183/09031936.00092506

515

21

1 Sputum microbiome temporal variability and dysbiosis in chronic obstructive pulmonary 2 disease exacerbations: an analysis of the COPDMAP study 3 4 Zhang Wang, Richa Singh, Bruce E. Miller, Ruth Tal-Singer, Stephanie Van Horn, Lynn 5 Tomsho, Alexander Mackay, James P. Allinson, Adam J. Webb, Anthony J. Brookes, Leena M. 6 George, Bethan Barker, Umme Kolsum, Louise E Donnelly, Kylie Belchamber, Peter J. Barnes, 7 Dave Singh, Christopher E. Brightling, Gavin C. Donaldson, Jadwiga A. Wedzicha, James R. 8 Brown on behalf of COPDMAP 9 10 11 12 13 SUPPLEMENTARY MATERIAL 14

1

15 Material and Methods 16 Study subjects and sample collection 17 COPDMAP was conducted in accordance with the Declaration of Helsinki 1 and Good Clinical 18 Practice 2, and was approved by the Imperial College London, University of Leicester and 19 University of Manchester Research Ethics Committee. All participants provided written 20 informed consent. Subjects with a physician diagnosis of COPD were recruited from three 21 clinical centers at Imperial College London, University of Leicester and University Hospital of 22 South Manchester, and through local advertising to enter studies investigating biomarkers in 23 COPD as previously described 3 4. The Imperial samples were collected at Royal Free Hospital of 24 University College London at the time of the study. Subjects with asthma, or significant 25 respiratory disease other than COPD, or the inability to produce sputum after sputum induction 26 were excluded from the study. Sputum samples from COPD subjects were collected at multiple 27 longitudinal visits including both baseline (defined as no evidence of symptom-defined 28 exacerbations in the preceding four weeks and the subsequent two weeks post-clinic visit) and 29 exacerbations (defined according to Anthonisen criteria 5 and/or healthcare utilization 6). All 30 exacerbation sputum samples were collected prior to the institution of any exacerbation 31 treatment. Demographic, baseline and longitudinal clinical data were recorded for samples. A 32 number of 15 subjects were missing any demographic or clinical data and were excluded for 33 biostatistical analysis. 34 35 16S rRNA sequencing 36 For quality control purposes, all DNA extractions, sequencing and data analyses occurred in a 37 single, centralized lab at the GSK R&D facility in Collegeville, Pennsylvania, USA. Frozen 38 sputum samples homogenized in sterile 1x sputasol (0.1% DTT) was thawed completely on 39 bench. Bacterial genomic DNA was extracted from sputum samples using the Qiagen DNA Mini 40 kit (Qiagen, CA, USA) as per manufacture protocol. The V4 hypervariable region of the 16S 41 rRNA gene was PCR amplified using specific primers (515F: 5’ 42 GTGCCAGCMGCCGCGGTAA3’, 806R: 5’GGACTACHVGGGTWTCTAAT3’), including 43 Illumina sequencing adapters 7. The reverse amplification primer contained a 12 bp error- 44 correcting Golay barcode sequence allowing for pooling of multiple samples in the same 45 flowcell 8. Negative controls for extraction (no sputum material) and PCR amplification (no

2

46 template, Qiagen Elution Buffer only) were included in each experiment. The extraction negative 47 control for each experiment was subsequently sequenced to identify any potential contaminating 48 bacterial species. 49 50 The amplification mix (25 μl) contained 4 μl sputum DNA, 2 μl (0.2 µM) each of forward and 51 reverse primers (Integrated DNA Technologies, Coralville, IA), 12.5 μl of 2x KAPA HiFi 52 HotStart Ready Mix (KK2602, Kapa biosystems, Boston MA), and 4.5 μl RNase free water. 53 PCR amplification was performed on an ABI 9700 thermocycler using the following cycling 54 protocol: initial denaturation at 95°C for 3 min, followed by 35 cycles of 98°C for 20 sec, 66°C 55 for 15 sec, and 72°C for 15 sec, with a final hold of 72°C for 1 min. Aliquots of reaction mixture 56 (3 µl each) were analyzed by 2% agarose gel (2% Egel, Invitrogen) with samples containing a 57 band of approximately 385 bp considered ‘PCR positive’. Samples with no visible amplified 58 product were considered ‘PCR negative’. Unincorporated nucleotides and remaining primers 59 were removed using Agencourt AMPure XP-PCR clean up (A63882, Beckman Coulter, 60 Pasadena, CA), according to the manufacturer’s protocol. The DNA concentration of the eluted 61 product was quantified using the KAPA Library Quantification Kit for Ilumina platform 62 (KK4835, Kapa biosystems, Boston MA). PCR products were normalized to 10 nM and 63 quantified again using the KAPA Library Quantification kit and pooled into equimolar 4 nM 64 pools. 65 66 The amplified PCR products were sequenced in five runs on an Illumina MiSeq sequencer 67 (Illumina, San Diego, CA). Following cluster formation on the MiSeq instrument, the amplicons 68 were sequenced using primers complimentary to the V4 region and designed for paired-ends 69 sequencing. A third sequencing primer was used for reading the barcodes. To check for proper 70 cluster density and sample normalization, a MiSeq single-end 26 bp+12 bp index sequencing run 71 was performed using the MiSeq instrument. The pool was mixed with a PhiX library (Illumina, 72 San Diego CA) at a ratio of 1:9 in order to increase the entropy of the library. A final MiSeq 2x 73 150 bp+12 bp index sequencing run was performed on the pooled samples. 74 75 Although negative reagent controls were performed for all DNA isolation, extraction and PCR 76 amplification step, we performed further analyses to ensure that potential contamination risks

3

77 were minimized. We compared our results against the 92 contaminant genera detected in 78 sequenced negative ‘blank’ controls by Salter et al. 9. We failed to detect 42 out of the 92 79 contaminant genera in our dataset (Table S6). Of the remaining genera that were found in our 80 data, none had an average relative abundance greater than 0.002, or had a relative abundance 81 greater than 0.1 in any particular sample, except for Pseudomonas and Streptococcus which are 82 known lung pathogens (Table S6). 83 84 16S rRNA sequence analysis and OTU classification 85 First, all reads mapping to PhiX reference sequence (GenBank: NC_001422.1) using bowtie 86 v1.0.1 10 were removed from the analysis. Remaining paired-end reads were merged using pear 87 v0.9.5-64 11, discarding all reads containing ambiguous bases (option ‘-u 0’). A paired-end read 88 was discarded if one of the following conditions was met: overlap < 10 bp, assembly length < 50 89 bp or p-value of alignment > 0.01. Sequencing reads were processed using QIIME pipeline 90 version 1.9 12. Eukaryotic, mitochondria and chloroplast sequences were filtered by BLASTN 91 against the SILVA database [5]. Chimeric reads were identified using UCHIME using both de 92 novo and reference based methods with default parameters 13. A total of 68,643,967 reads were 93 generated, and 55,786,582 reads were retained after filtering processes. The remaining reads 94 were subject to a 97% identity cutoff close reference OTU picking using the UCLUST method 14 95 against the August 2013 edition of the Greengenes 16S rRNA database (v13_8) 15. OTU 96 clustering was performed on each run separately and the resulting OTU tables were merged 97 afterwards. OTUs that contain a single read (singleton OTUs) were excluded to remove potential 98 sequencing artifacts. All 716 samples were rarefied to 46,056 reads which is the minimum 99 number of aligned reads across all samples. The rarefied OTU table was used for assessing 100 alpha, beta diversity and for subsequent statistical analyses. Alpha and beta diversity was also 101 calculated from functional prediction of microbial gene families and pathways using the software 102 PICRUSt 16. 103 104 Statistical analysis 105 PLS-DA between the sputum microbiome and exacerbation phenotypes 106 Exacerbation phenotypes were defined using slightly modified microbiological and clinical 107 criteria as established previously 4 17. In particular, the total bacterial load was estimated by the

4

108 qPCR copy number of Haemophilus influenzae normalized by the proportion of the species in 109 the sputum microbiome, and a bacterial exacerbation was defined as a total bacterial load >= 107 110 cells 4 17. A virus exacerbation was defined as a positive sputum viral detection by PCR. An 111 eosinophilic exacerbation was defined as the presence of more than 3% non-squamous cells in 112 sputum. Samples with multiple criteria satisfied were classified as the corresponding 113 combination of phenotypes. The remaining samples associated with limited changes in the 114 inflammatory profile were classified as ‘pauci-inflammatory’. The phenotype could not be 115 defined for 146 exacerbation samples due to missing data. A total of 25 clinical variables were 116 included in the analysis. PLS-DA was performed using SIMCA-P (Umetrics, Stockholm, 117 Sweden) 18 as previously described 17. For subjects with multiple exacerbations, only the initial 118 exacerbation sample was included in the analysis to meet the independence assumption for PLS- 119 DA. 120 121 Dysbiosis and temporal variability of the sputum microbiome 122 A total of 64 subjects that had at least two baseline and one exacerbation samples were included 123 in the dysbiosis analysis. We measured the dysbiosis of exacerbations relative to baseline 124 samples of the same subject using PC1 of weighted UniFrac distance. Assuming a normal 125 distribution of the baseline PC1s within each subject, the mean and standard deviation of the 126 baseline PC1s were calculated. And a Z-score was calculated for each exacerbation as: 1 = 127 𝑒𝑒 𝑏𝑏 𝑒𝑒 𝑃𝑃𝑃𝑃 − 𝜇𝜇 e b b 𝑏𝑏 128 where PC1 is the exacerbation PC1, and𝑍𝑍 μ and σ are the mean and standard deviation of all 𝜎𝜎 129 baseline PC1s of the same subject, respectively. An absolute Z-score greater than 2 was used as 130 cutoff for dysbiosis, which is equivalent to a probability of 0.05 in observing the exacerbation 131 PC1 from the subject under the distribution of its baseline PC1s. 132 133 A total of 126 subjects that had at least three samples were included in the analysis of the 134 microbial temporal variability. We adopted the metrics from Flores et al. 19 to assess temporal 135 variability of microbial alpha and beta diversity. For the variability of alpha diversity, we 136 calculated the coefficient of variation (CV) as standard deviation normalized by mean for the 137 Shannon of all samples within each subject. For the variability of beta diversity, we calculated

5

138 the median of the pairwise weighted UniFrac distances of all samples within each subject. 139 Higher values of these measurements represent a more variable microbial community. Subjects 140 were then divided into quartiles based on the CV of Shannon and the median of pairwise 141 weighted UniFrac distances where the first quartile was defined as ‘low’, the second and third 142 quartiles as ‘medium’ and the fourth quartile as ‘high’ for each measurement of temporal 143 variability. 144 145 Clinical predictors of the sputum microbiome diversity and temporal variability 146 Multivariate models were constructed to assess the significant association between patient 147 demographic and clinical variables to alpha and beta diversity and their temporal variability. 148 Both the microbiome and clinical datasets were pre-processed as described previously 17. To 149 identify clinical predictors of temporal variability of alpha and beta diversity, a general linear 150 model (GLM) was constructed between demographic/baseline clinical variables and each 151 measurement respectively, for all subjects and subjects within each center. A set of 14 152 demographic and baseline clinical variables were included for each subject. A total of 113 153 subjects with complete measurements of these data were included in the analysis. The 113 154 subjects were not significantly different from the remaining subjects in terms of study center 155 distribution, major demographic and baseline clinical variables (age, FEV1, FVC, CAT score, 156 etc) and microbiome profiles. As each subject was associated a single measurement, we were 157 able to meet the independence assumption. The model was optimized in terms of Akaike 158 information criterion (AIC) through backward elimination of non-significant effects in a 159 stepwise algorithm using the “step” function in the R stats package 20. 160 161 To identify clinical predictors of microbial alpha diversity, a general linear mixed model 162 (GLMM) was constructed between clinical variables and Shannon for all samples as well as 163 samples within each center. Subject ID of each sample was used as the random factor to adjust 164 for dependency of repeated measurements on the same subject. A set of 22 demographic, 165 baseline and longitudinal clinical variables were included for all samples. A total of 391 samples 166 with complete measurements of these data were included in the analysis. The 391 samples were 167 not significantly different from the remaining samples in terms of study center distribution, 168 major clinical variables (FEV1, FVC, CAT score, etc) and microbiome profiles. The model was

6

169 optimized in terms of Akaike information criterion (AIC) through backward elimination of non- 170 significant effects in a stepwise algorithm using the “step” function in the R lmerTest package 21. 171 A GLMM was also constructed between clinical variables and Shannon for the predicted 172 metagenome from microbial taxa. 173 174 To identify clinical predictors of beta diversity, we carried out a canonical correspondence 175 analysis (CCA) using the R Vegan package 22. To meet the independence assumption, only the 176 initial baseline samples from each subject were included in the analysis. The same set of 22 177 demographic, baseline and longitudinal clinical variables were included in the analysis. A total 178 of 129 initial baseline samples with complete measurements of these data were included in the 179 analysis. CCA was performed on clinical variables and the relative abundance of taxa at each of 180 the phylum, genus and OTU level, for all samples as well as samples within each center. At each 181 level, taxa present in at least 10% of samples were included in the analysis. The model was 182 optimized in terms of Akaike information criterion (AIC) in a stepwise algorithm using the 183 “step” function in the R stats package 20. The statistical significance of each clinical variable was 184 obtained by permutation test. CCA analysis was also performed between clinical variables and 185 L2, L3 functional categories for the predicted metagenome from microbial taxa. 186 187 188

7

189 Supplementary Results

190 191 Figure S1. Time points of sample collection from subjects at London, Leicester and Manchester. 192 Each set of connected dots represents samples collected from the same subject at different visits. 193 The X axis represents days of sample collection date relative to the earliest collection date of all 194 samples at each center. Subjects were firstly grouped by the number of samples and then ordered 195 by the initial sample collection date. 196

8

197 198 Figure S2. Overview of the sputum microbiome across all 716 samples. Each column represents 199 one sample. Y-axis represents relative abundance of major phyla and genera. Samples were 200 clustered by UPGMA clustering based on the weighted UniFrac distances. 201

9

202 203 Figure S3. Highly similar microbial profiles between duplicate samples collected from the same 204 individual at the same date. Duplicate samples were grouped by subject. Genus level microbial 205 composition was shown for each sample. Genera with average relative abundance greater than 206 0.005 were included. 207

10

208 209

210 211 Figure S4. Alpha diversity (Shannon) and composition of major phyla and genera in samples A) 212 at London, Leicester and Manchester, and B) baseline and exacerbations within each center.

213

11

214 215 Figure S5. Microbiome distinguished bacterial and eosinophilic exacerbations. A) Unweighted 216 pair group method with arithmetic mean clustering showing distinct clustering of samples with 217 bacterial and eosinophilic exacerbations. B) PLS-DA classification of bacterial and eosinophilic 218 exacerbations using clinical, microbiome and their combined variables at phylum (L2), genus 219 (L6) and OTU levels. The models were evaluated in terms of area under Receiver Operating 220 Characteristic curve (AUC), R2 and Q2 scores. C) Alpha diversity (Shannon) and composition of 221 major phyla and genera in exacerbation samples of different exacerbation phenotypes at London, 222 Leicester and Manchester. The number of samples is indicated in the parenthesis under each 223 subgroup in the bar chart. Error bars are within 1.5 interquartile range of the upper and lower 224 quartiles. B: bacterial; V: viral; E: eosinophilic; BE: bacterial and eosinophilic; BV: bacterial and 225 viral; and Pauci: pauci-inflammatory. *** adj. P<0.001; ** adj. P<0.01; * adj. P<0.05. 226

12

227 228 Figure S6. Significant negative correlation between CRP and alpha diversity (Shannon). Each 229 dot represents a sample colored by center. 230

13

231 232 Figure S7. Dysbiosis of the sputum microbiome. A) Scatter plot of the first Principal Coordinate 233 (PC1) of all samples within each subject at each center. Only subjects with at least two baseline 234 and one exacerbation samples were included. Exacerbation samples were colored by different 235 exacerbation phenotypes. Most deviations of PC1 occurred at exacerbations. B) Box-whisker 236 plots showing significantly increased standard deviation of PC1 combining baseline and 237 exacerbation samples within each subject compared to baseline samples only. C). Scatter plot of 238 absolute Z-score for exacerbations at each center. Exacerbations were colored by different 239 exacerbation phenotypes. Error bars are within 1.5 interquartile range of the upper and lower 240 quartiles. B: bacterial; V: viral; E: eosinophilic; BE: bacterial and eosinophilic; BV: bacterial and 241 viral; and Pauci: pauci-inflammatory. *** adj. P<0.001; ** adj. P<0.01; * adj. P<0.05. 242 243 244

14

245 246 Figure S8. Temporal dynamics of the sputum microbiome in subjects with dysbiosis 247 exacerbations at London, Leicester and Manchester. Each horizontal bar represents alpha 248 diversity (Shannon) and genus level microbial composition of one sample. Samples were 249 grouped by subject and ordered by collection dates from bottom to top. Baseline or exacerbation 250 samples are indicated at the left of the horizontal bars. Exacerbations with dysbiosis are 251 highlighted in asterisks. 252

15

253

254 255 Figure S9. Bacterial genera associated with dysbiosis in exacerbations. Box-whisker plots 256 showing the relative abundance changes of bacterial genera relative to the last baseline 257 measurements between exacerbations with or without dysbiosis. The genera were ordered by the 258 FDR-adjusted P-value in T-test. Error bars are within 1.5 interquartile range of the upper and 259 lower quartiles. 260 261

16

262

263 Figure S10. A) Box-whisker plots showing changes of FEV1, FVC and CAT score between 264 dysbiosis and non-dysbiosis exacerbations at London, Leicester and Manchester. No data is 265 available for CAT score change in dysbiosis exacerbations at London due to missing data. B) 266 Correlations between absolute Z-score measuring exacerbation dysbiosis and changes of FEV1, 267 FVC and CAT score from baseline. Error bars are within 1.5 interquartile range of the upper and 268 lower quartiles.

269

17

270

271 Figure S11. A) Box-whisker plots showing exacerbation frequency of subjects within different 272 quartile groups of temporal variability of alpha and beta diversity at each center, with the first 273 quartile defined as ‘low’, the second and third quartiles as ‘medium’ and the fourth quartile as 274 ‘high’. B) Box-whisker plots showing temporal variability of alpha and beta diversity in subjects 275 within different classes of exacerbation frequency at each center. The number of samples is 276 indicated in the parenthesis under each subgroup in the box-whisker plot. Error bars are within 277 1.5 interquartile range of the upper and lower quartiles. *** adj. P<0.001; ** adj. P<0.01; * adj. 278 P<0.05. 279

18

280 281 Figure S12. A) Box-whisker plots showing exacerbation frequency of subjects within different 282 quartile groups of temporal variability of alpha and beta diversity for baseline samples only, with 283 the first quartile defined as ‘low’, the second and third quartiles as ‘medium’ and the fourth 284 quartile as ‘high’. B) Box-whisker plots showing temporal variability of alpha and beta diversity 285 for baseline samples only in subjects within different classes of exacerbation frequency. Error 286 bars are within 1.5 interquartile range of the upper and lower quartiles. Each comparison was 287 performed on all samples and Leicester samples. The sample sizes for London and Manchester 288 are too small to generate meaningful conclusion. The number of samples is indicated in the 289 parenthesis under each subgroup in the box-whisker plot. Error bars are within 1.5 interquartile 290 range of the upper and lower quartiles.

291

19

292 Table S1. Major demographic and baseline clinical features of all subjects and subjects at each 293 center. Features All subjects (N=281) * Center London (N=128) Leicester (N=100) Manchester (N=53) Gender † Male: 187 (70.3%), Male: 75 (64.7%), Male: 76 (76.8%), Male: 36 (70.6%), Female: 79 (29.7%) Female: 41 (35.3%) Female: 23 (23.2%) Female: 15 (29.4%) Age ‡ 70 (8.1) 71 (8.6) 69 (7.6) 67 (7.4) BMI 27.2 (5.4) 26.7 (5.7) 27.8 (5.0) 26.9 (5.1) Baseline GOLD status 1: 30 (11.4%), 2: 132 1: 9 (7.8%), 2 62 1: 8 (8.1%), 2: 51 1: 13 (26.5%), 2: 19 (50.2%), 3: 78 (29.7%), (53.9%), 3: 33 (28.7%), (51.5%), 3: 32 (38.8%), 3: 13 4: 23 (8.7%) 4: 11 (9.6%) (32.3%), 4: 8 (8.1%) (26.5%), 4: 4 (8.2%) Treatment # Antibiotics: 38 (15.3%), Antibiotics: 22 (13.9%), Antibiotics: 11 Antibiotics: 5 Steroids: 9 (3.6%), Steroids: 1 (0.6%), (15.1%), Steroids: 8 (27.8%), Steroids: 0 Both: 202 (81.1%) Both: 135 (85.4%) (11.0%), Both: 54 (0.0%), Both: 13 (74.0%) (72.2%) Number of cigarette 47 (30) 45 (34) 47 (28) 49 (32) packs per year 1 Number of 1.1 (1.6) 1.6 (1.8) 1 (1.7) 0 (0.7) exacerbation per year 1 Baseline FEV1 ‡ 1.5 (0.6) 1.3 (0.5) 1.4 (0.6) 1.7 (0.6) Baseline FEV1% 56.3 (18.9) 54.6 (17.3) 54.5 (17.4) 63.7 (23.5) Baseline FEV1 2.6 (0.5) 2.5 (0.5) 2.7 (0.6) 2.7 (0.5) predicted Baseline FVC 2.9 (1.0) 2.8 (1.0) 2.7 (0.8) 3.6 (0.8) Baseline FEV1/FVC 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) ratio CAT score 18.7 (7.3) 16.7 (7.5) 20.0 (6.4) 19.5 (8.1) CES-D score 1 10 (13) 10 (12) 10 (13) 13 (16) SGRQ total score 47.4 (18.2) 45.3 (15.3) 48.7 (18.7) 48.8 (22.3) 294 † Categorical data present as number (proportion). 295 ‡ Continuous data present as mean (SD) unless stated below. 296 1 Median (IQR). 297 * 15 subjects were missing any demographic or clinical data. 298 # The numbers represent exacerbation events, thus include subjects with more than one exacerbation. 299 300

20

301 Table S2. Major longitudinal clinical features at baseline and exacerbations of all samples and samples at each center. Features All London Leicester Manchester All Base Exac All Base Exac All Base Exac All Base Exac (N=716) (N=446) (N=270) (N=301) (N=132) (N=169) (N=303) (N=221) (N=82) (N=112) (N=93) (N=19)

FEV1 1.4 (0.5) 1.5 (0.5) 1.2 (0.5) 1.2 (0.5) 1.3 (0.5) 1.2 (0.5) 1.4 (0.5) 1.5 (0.5) 1.2 (0.4) 1.6 (0.6) 1.7 (0.6) 1.0 (0.3) FVC 2.8 (0.9) 3.0 (0.8) 2.5 (0.9) 2.7 (1.0) 3.0 (1.0) 2.6 (1.0) 2.7 (0.7) 2.7 (0.7) 2.5 (0.6) 3.3 (0.9) 3.4 (0.8) 2.1 (0.9) FEV1/FVC ratio 0.5 (0.2) 0.5 (0.2) 0.5 (0.2) 0.5 (0.2) 0.5 (0.1) 0.5 (0.2) 0.5 (0.1) 0.5 (0.1) 0.5 (0.1) 0.5 (0.4) 0.5 (0.4) 0.4 (0.2) CAT score 21.1 19.6 24.2 19.9 17.3 22.7 21.9 (6.8) 20.5 (6.7) 25.7 21.1 20.0 26.4 (7.4) (7.1) (7.0) (7.8) (6.9) (7.8) (5.5) (8.0) (8.0) (6.6) C-reactive protein 5.0 3.0 (5.0) 10.0 6.0 4.0 (6.0) 9.0 5.0 (10.0) 3.0 (5.0) 10.0 5.0 (8.0) 4.0 (5.0) 13.0 (CRP) 1 (11.0) (27.0) (16.0) (25.0) (31.0) (28.0) Blood neutrophil count 5.5 (2.3) 4.9 (1.7) 6.2 (2.7) 5.9 (2.5) 5.5 (1.9) 6.2 (2.8) 5.3 (2.1) 4.9 (1.7) 6.3 (2.6) 4.8 (1.7) 4.5 (1.4) 6.3 (2.2) (X109 cells/L) Blood lymphocyte 1.8 (0.7) 1.8 (0.6) 1.8 (0.7) 1.8 (0.7) 2.0 (0.7) 1.7 (0.7) 1.9 (0.7) 1.8 (0.6) 2.0 (0.7) 1.7 (0.6) 1.8 (0.6) 1.6 (0.8) count (X109 cells/L) Blood monocyte count 0.7 (0.3) 0.6 (0.2) 0.7 (0.3) 0.8 (0.3) 0.8 (0.2) 0.8 (0.3) 0.5 (0.2) 0.5 (0.1) 0.6 (0.2) 0.6 (0.2) 0.6 (0.2) 0.7 (0.3) (X109 cells/L) Blood eosinophil count 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) 0.2 (0.2) (X109 cells/L) 1 Blood basophil count 0.0 (0.0) 0.1 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.1 (0.0) 0.1 (0.0) 0.1 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) (X109 cells/L) Sputum neutrophil 78.8 75.1 84.2 73.7 73.0 73.7 77.4 73.0 89.5 88.2 88.0 89.2 count % 1 (33.8) (34.0) (28.5) (34.0) (29.0) (36.0) (34.2) (34.2) (22.2) (13.5) (17.5) (7.8) Sputum lymphocyte 0.0 (0.5) 0.0 (0.3) 0.2 (1.0) 0.0 (2.0) 0.0 (0.0) 2.0 (3.0) 0.2 (0.4) 0.2 (0.5) 0.0 (0.2) 0.0 (0.1) 0.0 (0.2) 0.0 (0.0) count % 1 Sputum eosinophil 0.8 (2.0) 0.8 (2.2) 0.5 (2.0) 0.0 (2.0) 0.0 (1.0) 1.0 (2.0) 0.8 (2.2) 0.8 (2.8) 0.5 (1.5) 1.1 (3.0) 1.2 (3.0) 1.0 (8.2) count % 1

21

Sputum macrophage 13.0 14.5 8.5 20.0 22.0 20.0 10.8 14.2 5.2 (9.5) 7.0 8.0 5.5 (9.5) count % 1 (21.2) (23.2) (19.0) (27.0) (22.0) (29.0) (19.2) (23.2) (10.8) (12.9) Sputum epithelial cell 3.2 (8.0) 4.0 (9.8) 2.0 (4.8) 0.0 (1.0) 0.0 (0.0) 0.0 (2.0) 4.8 (11.8) 6.0 (12.2) 3.5 (7.8) 1.9 (3.2) 1.5 (5.2) 3.0 (2.5) count % 1 302 Data present as mean (SD) unless stated below. 303 1 Median (IQR).

22

304 Table S3. The relative abundances of major phyla and genera (average relative abundance > 1%) in the sputum microbiome. The 305 phylum and genus level taxa were separated by the dotted line. Taxa All London Leicester Manchester All Base Exac All Base Exac All Base Exac All Base Exac (N=716 (N=446) (N=270) (N=301) (N=132) (N=169) (N=303) (N=221) (N=82) (N=112) (N=93) (N=19) ) Proteobacteria 52.3 51.5 53.7 53.0 51.4 54.2 52.1 51.8 52.8 51.3 50.9 53.3 Firmicutes 28.7 29.0 28.1 27.2 28.0 26.5 30.0 29.4 31.8 29.3 29.8 26.8 Bacteroidetes 15.0 15.3 14.5 15.5 16.1 15.0 14.4 14.9 12.9 15.5 15.2 16.8 Actinobacteria 1.9 2.0 1.8 2.1 2.2 2.0 1.7 1.9 1.3 1.9 2.0 1.4 Fusobacteria 1.4 1.5 1.3 1.6 1.6 1.6 1.3 1.4 0.9 1.3 1.4 1.0 Haemophilus 25.8 25.6 26.1 27.0 26.5 27.3 25.2 26.1 22.7 24.0 23.0 29.3 Veillonella 15.8 16.1 15.3 13.7 14.0 13.5 17.3 16.7 18.9 17.4 17.7 16.3 Prevotella 13.2 13.5 12.6 13.6 14.3 13.1 12.6 13.2 11.2 13.4 13.1 15.0 Erwinia 7.0 7.3 6.6 7.8 8.2 7.5 6.1 6.6 4.8 7.4 7.6 6.3 Granulicatella 6.8 6.8 6.9 7.5 8.0 7.1 6.9 6.9 6.9 4.8 4.7 5.4 Cronobacter 6.4 6.5 6.3 6.5 5.8 7.0 6.0 6.4 4.8 7.3 7.4 6.3 Streptococcus 4.4 4.5 4.3 4.3 4.4 4.3 4.3 4.2 4.5 5.1 5.4 3.7 Moraxella 4.0 3.6 4.8 3.1 2.7 3.4 4.8 3.7 7.8 4.4 4.5 4.1 Actinomyces 1.0 1.1 0.9 1.1 1.2 1.1 0.9 1.0 0.6 1.1 1.2 0.7 306

23

307 Table S4. List of clinical variables significantly associated with microbial alpha diversity and 308 phylum level beta diversity among samples. P-values are indicated for significant variables in the 309 model. Microbial diversity Alpha diversity (Shannon) Beta diversity (Phylum level abundance) All London Leicester Manchester All London Leicester Manchester C-reactive protein (CRP) 0.02 0.005 0.04 0.02 0.04 # # 0.01 FEV1/FVC ratio 0.003 # 0.03 0.003 0.04 # # # Age 0.03 # # # 0.02 # # 0.01 Number of exacerbations # # 0.04 # # # 0.01 # per year White blood cell # 0.02 # 0.01 # # # # Blood neutrophil count # 0.03 # 0.01 # # # # Blood lymphocyte count # 0.01 # 0.01 # # # # Blood basophil count # 0.01 # # # # # 0.11 ‡ FEV1 # # # 0.04 # # # # FVC # # # 0.01 # # # # CAT score # # 0.02 # # # # # BMI # # # # # # 0.07 ‡ # Visit type <1E-7 0.001 <1E-7 # NA a NA NA NA Center <1E-7 NA NA NA 0.001 NA NA NA 310 # P≥0.05 and absent in the model. 311 ‡ Variables not statistically significant but present in the model. 312 a Only initial baseline samples were used.

24

313 Table S5. List of clinical variables significantly associated with alpha and beta diversity of the inferred metagenomic profiles of the 314 sputum microbiome by PICRUSt 16 among samples. P-values are indicated for significant variables in the model. Inferred functional Alpha diversity (Shannon) Beta diversity (L1) Beta diversity (L2) profile All London Leicester Manchester All London Leicester Manchester All London Leicester Manchester CRP 0.05 <1E-7 # 0.01 0.03 0.02 # 0.09 ‡ 0.04 0.14 ‡ # # Basophil count # # 0.01 # # 0.15 ‡ # 0.03 # # # 0.02 FEV1 0.002 # 0.02 # # # # 0.11 ‡ # # # 0.10 ‡ Age # # # # 0.11 ‡ # # 0.11 ‡ # # # 0.11 ‡ FEV1/FVC ratio 0.002 # 0.005 # # # # # # # # # FVC 0.005 # 0.03 # # # # # # # # # Lymphocyte count # # # # # 0.02 # # # 0.05 # # Years smoked # # # # # 0.07 ‡ # # # 0.01 # # While blood cells # 0.01 # # # # # # # # # # CAT score # # 0.05 # # # # # # # # # Number of # # # # 0.11 ‡ # # # # # # # exacerbations per year Visit type <1E-7 0.01 <1E-7 # NA a NA NA NA NA NA NA NA Center <1E-7 NA NA NA # NA NA NA # NA NA NA 315 # P≥0.05 and absent in the model. 316 ‡ Variables not statistically significant but present in the model. 317 a Only initial baseline samples were used.

25

318 Table S6. Occurrence and average relative abundance of contaminate genera detected in 319 sequenced negative ‘blank’ controls by Salter et al. 9 in COPDMAP dataset. The first column 320 (occurrence rel abundance > 0) was calculated as the fraction of samples in which each genus has 321 abundance greater than 0. The second column (occurrence rel abundance > 0.1) was calculated as 322 the fraction of samples in which each genus has abundance greater than 0.1. And the third 323 column is the average relative abundance of each genus across all samples.

Occurrence (rel Occurrence (rel Average rel abundance abundance > 0) abundance > 0.1) Afipia 0 0 0 Aquabacterium 0 0 0 Asticcacaulis 0.002793296 0 6.06E-08 Aurantimonas 0.025139665 0 6.97E-07 Beijerinckia 0 0 0 Bosea 0.001396648 0 6.06E-08 Bradyhizobium 0 0 0 Brevundimonas 0.044692737 0 1.97E-05 Caulobacter 0.001396648 0 3.03E-08 Craurococcus 0 0 0 Devosia 0.011173184 0 4.25E-07 Hoeflea 0 0 0 Mesorhizobium 0 0 0 Methylobacterium 0.118715084 0 4.52E-06 Novosphingobioum 0 0 0 Ochrobactrum 0.698324022 0 4.58E-05 Paracoccus 0.086592179 0 2.30E-06 Pedomicrobiom 0 0 0 Phyllobacterium 0.009776536 0 3.34E-07 Rhizobium 0.005586592 0 1.21E-07 Roseomonas 0 0 0 Sphingobium 0.036312849 0 6.22E-06 Sphingomonas 0.160614525 0 3.02E-05 Sphingopyxis 0.018156425 0 8.49E-07 Betaproteobacteria Acidovorax 0.019553073 0 7.28E-07 Azoarcus 0 0 0 Azospira 0 0 0 Burkholderia 0 0 0 Comamonas 0.008379888 0 1.82E-07

26

Cupriavidus 0.001396648 0 9.10E-08 Curvibacter 0 0 0 Delftia 0.005586592 0 1.52E-07 Duganella 0 0 0 Herbaspirillum 0 0 0 Janthinobacterium 0.002793296 0 1.52E-07 Kingella 0.995810056 0 0.000529353 Leptothrix 0 0 0 Limnobacter 0 0 0 Massilia 0 0 0 Methylophilus 0 0 0 Methyloversatilis 0 0 0 Oxalobacter 0.025139665 0 1.27E-06 Pelomonas 0 0 0 Polaromonas 0 0 0 Ralstonia 0.005586592 0 1.21E-07 Schlegelella 0 0 0 Sulfuritalea 0 0 0 Undibacterium 0 0 0 Variovorax 0.906424581 0 0.000159964 Gammaproteobacteria Acinetobacter 1 0 0.001896829 Enhydrobacter 0.083798883 0 3.49E-06 Enterobacter 0.997206704 0 0.000509339 Escherichia 0.048882682 0 1.09E-06 Nevskia 0.001396648 0 6.06E-08 Pseudomonas 0.995810056 0.004189944 0.004687729 Pseudoxanthomonas 0.019553073 0 8.79E-07 Psychobacter 0 0 0 Stenotrophomonas 0.780726257 0 0.000139556 Xanthomonas 0 0 0 Actinobacteria Aeromicrobium 0 0 0 Arthrobacter 0.005586592 0 2.73E-07 Beutenbergia 0 0 0 Brevibacterium 0.027932961 0 1.27E-06 Corynebacterium 1 0 0.001012521 Curtobacterium 0 0 0 Dietzia 0.002793296 0 2.12E-07 Geodermatophilus 0 0 0 Janibacter 0.018156425 0 4.25E-07

27

Kocuria 0 0 0 Microbacterium 0.159217877 0 6.22E-06 Micrococcus 0.054469274 0 2.03E-06 Microlunatus 0.002793296 0 9.10E-08 Patulibacter 0 0 0 Propionibacterum 0 0 0 Rhodococcus 0.909217877 0 0.000154081 Tsukamurella 0 0 0 Firmicutes Abiotrophia 0 0 0 Bacillus 0.048882682 0 2.03E-06 Brevibacillus 0 0 0 Brochothrix 0.002793296 0 9.10E-08 Facklamia 0.002793296 0 9.10E-08 Paenibacillus 0.060055866 0 2.30E-06 Streptococcus 1 0.032122905 0.044335024 Bacteroidetes Chryseobacterium 0.205307263 0 1.11E-05 Dyadobacter 0.002793296 0 6.06E-08 Flavobacterium 0.026536313 0 1.46E-06 Hydrotalea 0 0 0 Niatella 0 0 0 Olivibacter 0 0 0 Pedobacter 0.009776536 0 4.25E-07 Wautersiella 0 0 0 Deinococcus-Thermus Deinococcus 0.124301676 0 5.34E-06 324

325

326

28

327 References 328 1. General Assembly of the World Medical A. World Medical Association Declaration of Helsinki: ethical 329 principles for medical research involving human subjects. The Journal of the American College of 330 Dentists 2014;81(3):14-8. 331 2. International Conference on Harmonisation of technical requirements for registration of 332 pharmaceuticals for human u. ICH harmonized tripartite guideline: Guideline for Good Clinical 333 Practice. Journal of postgraduate medicine 2001;47(1):45-50. 334 3. Bafadhel M, McKenna S, Terry S, et al. Blood eosinophils to direct corticosteroid treatment of 335 exacerbations of chronic obstructive pulmonary disease: a randomized placebo-controlled trial. 336 American journal of respiratory and critical care medicine 2012;186(1):48-55. doi: 337 10.1164/rccm.201108-1553OC 338 4. Bafadhel M, McKenna S, Terry S, et al. Acute exacerbations of chronic obstructive pulmonary disease: 339 identification of biologic clusters and their biomarkers. American journal of respiratory and 340 critical care medicine 2011;184(6):662-71. doi: 10.1164/rccm.201104-0597OC 341 5. Anthonisen NR, Manfreda J, Warren CP, et al. Antibiotic therapy in exacerbations of chronic 342 obstructive pulmonary disease. Annals of internal medicine 1987;106(2):196-204. 343 6. Rodriguez-Roisin R. Toward a consensus definition for COPD exacerbations. Chest 2000;117(5 Suppl 344 2):398S-401S. 345 7. Caporaso JG, Lauber CL, Walters WA, et al. Global patterns of 16S rRNA diversity at a depth of millions 346 of sequences per sample. Proceedings of the National Academy of Sciences of the United States 347 of America 2011;108 Suppl 1:4516-22. doi: 10.1073/pnas.1000080107 348 8. Caporaso JG, Lauber CL, Walters WA, et al. Ultra-high-throughput microbial community analysis on 349 the Illumina HiSeq and MiSeq platforms. The ISME journal 2012;6(8):1621-4. doi: 350 10.1038/ismej.2012.8 351 9. Salter SJ, Cox MJ, Turek EM, et al. Reagent and laboratory contamination can critically impact 352 sequence-based microbiome analyses. BMC biology 2014;12:87. doi: 10.1186/s12915-014-0087- 353 z 354 10. Langmead B, Trapnell C, Pop M, et al. Ultrafast and memory-efficient alignment of short DNA 355 sequences to the human genome. Genome biology 2009;10(3):R25. doi: 10.1186/gb-2009-10-3- 356 r25 357 11. Zhang J, Kobert K, Flouri T, et al. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. 358 Bioinformatics 2014;30(5):614-20. doi: 10.1093/bioinformatics/btt593 359 12. Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community 360 sequencing data. Nature methods 2010;7(5):335-6. doi: 10.1038/nmeth.f.303 361 13. Edgar RC, Haas BJ, Clemente JC, et al. UCHIME improves sensitivity and speed of chimera detection. 362 Bioinformatics 2011;27(16):2194-200. doi: 10.1093/bioinformatics/btr381 363 14. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 364 2010;26(19):2460-1. doi: 10.1093/bioinformatics/btq461 365 15. McDonald D, Price MN, Goodrich J, et al. An improved Greengenes with explicit ranks for 366 ecological and evolutionary analyses of bacteria and archaea. The ISME journal 2012;6(3):610-8. 367 doi: 10.1038/ismej.2011.139 368 16. Langille MG, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities 369 using 16S rRNA marker gene sequences. Nature biotechnology 2013;31(9):814-21. doi: 370 10.1038/nbt.2676 371 17. Wang Z, Bafadhel M, Haldar K, et al. Lung microbiome dynamics in chronic obstructive pulmonary 372 disease exacerbations. The European respiratory journal 2016 doi: 10.1183/13993003.01406- 373 2015

29

374 18. Eriksson L JE, Kettaneh-Wold N. Multi- and Megavariate Data Analysis, Part 2, Advanced Applications 375 and Method Extensions. MKS Umetrics AB 2006 376 19. Flores GE, Caporaso JG, Henley JB, et al. Temporal variability is a personalized feature of the human 377 microbiome. Genome biology 2014;15(12):531. doi: 10.1186/s13059-014-0531-y 378 20. R Core Team (2014) R: A language and environment for statistical computing. R Foundation for 379 Statistical Computing, Vienna, Austria.;ISBN 3-900051-07-0, URL http://www.R-project.org/ 380 21. Kuznetsova A BPaCR. lmerTest: Tests in Linear Mixed Effects Models. 2014 381 22. Oksanen J BF, Kindt R, Legendre P, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MHH and 382 Wagner H. vegan: Community Ecology Package. 2013

383

30

C A

Relative abundance % 1 9 8 7 6 5 4 3 2 1 0 1

Relative abundance % 0 0 0 0 0 0 0 0 0 0 0 9 8 7 6 5 4 3 2 1 0 % % % % % % % % % % 0 % 0 0 0 0 0 0 0 0 0 0 Shannon Shannon % 3 4 5 6 3 4 5 6 % % % % % % % % % % (37) Baseline B (446) ** ** ** ** * (13) E * Exacerbation (38) V (270) Pauci (27) (3) BE P P P F F F B A F O i i i u r r r a c r r r t o o o s h c m m m ti o t t t t e n e e e e i i i b c c c r (6) BV o o o o r a s u u u o b b b b c t t t i a e e e a a a t d e c c c c s s s e t r

t t t e V S O t e e e i a e t r e r r r t r s i h i i i i a e a a a l l e p

o H M O r t n s a o t o e h e c r l e m o l a a r c x o s c e p u l l h s a i l u s PC3 (8.24%) D B

Shannon 3 4 5 6 PC2 (13.56%) 0.0 0.2 Moraxella 0.4 FDR P<9.6e-14,R=-0.445 0.6 E Baseline xacerbations E B PC1 (60.23%) 0.8 A

−0.8 Principal−0.4 Coordinate 1 (49.0%) 0.0 0.4 0.8 B

−0.8 Exacerbation−0.4 changes from baseline −0.8 −0.4 London 0.0 0.4 0.0 0.4 EB Dys eB FEV1 FEV1* NonDys Eb † eb −1 −1 0 1 0 1 EB Dys Subject eB FVC FVC NonDys Eb † † eb Leicester −10 −10 10 20 10 20 0 0 EB Dys CAT score CAT score eB NonDys Eb eb * † NonDys: Dys: eb: Eb: eB: Non-eosinophil+Dysbiosis EB: Eosinophil+Dysbiosis P <0.05 0.05

0.1 CV of Shannon 0

0.6

0.4

0.2

Median of paired weighted UniFrac weighted Median of paired 0.0 London Leicester Manchester Subject 0.3 0.6 B C 6

0.2 0.4 4

2 0.1 0.2 Variation of beta diversity * diversity Variation of beta No. of exacerbations per year of exacerbations per No.

0 Variation of alpha diversity Low Medium High Low Medium High (31) (60) (32) (31) (60) (32) 0 0 0-1 1-2 2-3 >3 Variation of alpha diversity * Variation of beta diversity *** 0-1 1-2 2-3 >3 (46) (32) (21) (24) (46) (32) (21) (24) Number of exacerbations per year